[PATCH 0/2] dma-mapping: provide a benchmark for streaming DMA mapping

2020-10-26 Thread Barry Song
Nowadays, there are increasing requirements to benchmark the performance
of dma_map and dma_unmap, particularly while the device is attached to an
IOMMU.

This patchset provides the benchmark infrastructure for streaming DMA
mapping. The architecture of the code closely mirrors the GUP benchmark:
* mm/gup_benchmark.c provides the kernel interface;
* tools/testing/selftests/vm/gup_benchmark.c provides the user program that
calls the interface provided by mm/gup_benchmark.c.

In our case, kernel/dma/map_benchmark.c plays the role of mm/gup_benchmark.c;
tools/testing/selftests/dma/dma_map_benchmark.c plays the role of
tools/testing/selftests/vm/gup_benchmark.c.

A major difference from the GUP benchmark is that the DMA_MAP benchmark
needs to run on a device. Consider a board with the following devices and
IOMMUs:
device A  --- IOMMU 1
device B  --- IOMMU 2
device C  --- non-IOMMU

Different devices might be attached to different IOMMUs, or to no IOMMU at
all. To make the benchmark run, we can either
* create a virtual device and hack the kernel code to attach the virtual
device to IOMMU1, IOMMU2 or non-IOMMU; or
* use the existing driver_override mechanism: unbind device A, B, or C from
its original driver and bind it to the "dma_map_benchmark" platform_driver
or pci_driver for benchmarking.

In this patchset, I prefer to use driver_override and avoid various hacks
in the kernel. We can dynamically switch devices behind different IOMMUs
to measure the performance of dma_map on IOMMU and non-IOMMU paths.
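
The rebinding commands for a platform device are shown in patch 1/2. Once
PCI support lands (for now patch 1/2 covers platform devices only), the
equivalent driver_override sequence for a PCI device should look roughly
like the following, with "0000:xx:xx.x" as a placeholder BDF:

echo dma_map_benchmark > /sys/bus/pci/devices/0000:xx:xx.x/driver_override
echo 0000:xx:xx.x > /sys/bus/pci/devices/0000:xx:xx.x/driver/unbind
echo 0000:xx:xx.x > /sys/bus/pci/drivers/dma_map_benchmark/bind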

Barry Song (2):
  dma-mapping: add benchmark support for streaming DMA APIs
  selftests/dma: add test application for DMA_MAP_BENCHMARK

 MAINTAINERS                                   |   6 +
 kernel/dma/Kconfig                            |   8 +
 kernel/dma/Makefile                           |   1 +
 kernel/dma/map_benchmark.c                    | 202 ++++++++++++++++++
 tools/testing/selftests/dma/Makefile          |   6 +
 tools/testing/selftests/dma/config            |   1 +
 .../testing/selftests/dma/dma_map_benchmark.c |  72 +++++++
 7 files changed, 296 insertions(+)
 create mode 100644 kernel/dma/map_benchmark.c
 create mode 100644 tools/testing/selftests/dma/Makefile
 create mode 100644 tools/testing/selftests/dma/config
 create mode 100644 tools/testing/selftests/dma/dma_map_benchmark.c

-- 
2.25.1



[PATCH 2/2] selftests/dma: add test application for DMA_MAP_BENCHMARK

2020-10-26 Thread Barry Song
This patch provides the test application for DMA_MAP_BENCHMARK.

Before running the test application, we need to bind a device to the
dma_map_benchmark driver. For example, unbind "xxx" from its original
driver and bind it to dma_map_benchmark:

echo dma_map_benchmark > /sys/bus/platform/devices/xxx/driver_override
echo xxx > /sys/bus/platform/drivers/xxx/unbind
echo xxx > /sys/bus/platform/drivers/dma_map_benchmark/bind

Then, run 10 threads on NUMA node 1 for 10 seconds on device "xxx":
./dma_map_benchmark -t 10 -s 10 -n 1
dma mapping benchmark: average map_nsec:3619 average unmap_nsec:2423
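
(At these numbers, each thread completes roughly 10 s / (3619 + 2423) ns,
i.e. on the order of 1.65 million map/unmap pairs during the run, ignoring
loop overhead.)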

Cc: Joerg Roedel 
Cc: Will Deacon 
Cc: Shuah Khan 
Cc: Christoph Hellwig 
Cc: Marek Szyprowski 
Cc: Robin Murphy 
Signed-off-by: Barry Song 
---
 MAINTAINERS                                   |  6 ++
 tools/testing/selftests/dma/Makefile          |  6 ++
 tools/testing/selftests/dma/config            |  1 +
 .../testing/selftests/dma/dma_map_benchmark.c | 72 +++++++++++
 4 files changed, 85 insertions(+)
 create mode 100644 tools/testing/selftests/dma/Makefile
 create mode 100644 tools/testing/selftests/dma/config
 create mode 100644 tools/testing/selftests/dma/dma_map_benchmark.c

diff --git a/MAINTAINERS b/MAINTAINERS
index f310f0a09904..552389874ca2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5220,6 +5220,12 @@ F:   include/linux/dma-mapping.h
 F: include/linux/dma-map-ops.h
 F: kernel/dma/
 
+DMA MAPPING BENCHMARK
+M: Barry Song 
+L: iommu@lists.linux-foundation.org
+F: kernel/dma/map_benchmark.c
+F: tools/testing/selftests/dma/
+
 DMA-BUF HEAPS FRAMEWORK
 M: Sumit Semwal 
 R: Andrew F. Davis 
diff --git a/tools/testing/selftests/dma/Makefile b/tools/testing/selftests/dma/Makefile
new file mode 100644
index ..aa8e8b5b3864
--- /dev/null
+++ b/tools/testing/selftests/dma/Makefile
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0
+CFLAGS += -I../../../../usr/include/
+
+TEST_GEN_PROGS := dma_map_benchmark
+
+include ../lib.mk
diff --git a/tools/testing/selftests/dma/config b/tools/testing/selftests/dma/config
new file mode 100644
index ..6102ee3c43cd
--- /dev/null
+++ b/tools/testing/selftests/dma/config
@@ -0,0 +1 @@
+CONFIG_DMA_MAP_BENCHMARK=y
diff --git a/tools/testing/selftests/dma/dma_map_benchmark.c b/tools/testing/selftests/dma/dma_map_benchmark.c
new file mode 100644
index ..e03bd03e101e
--- /dev/null
+++ b/tools/testing/selftests/dma/dma_map_benchmark.c
@@ -0,0 +1,72 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 Hisilicon Limited.
+ */
+
+#include <fcntl.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <sys/ioctl.h>
+#include <linux/types.h>
+
+#define DMA_MAP_BENCHMARK  _IOWR('d', 1, struct map_benchmark)
+
+struct map_benchmark {
+   __u64 map_nsec;
+   __u64 unmap_nsec;
+   __u32 threads; /* how many threads will do map/unmap in parallel */
+   __u32 seconds; /* how long the test will last */
+   int node; /* which numa node this benchmark will run on */
+   __u64 expansion[10];/* For future use */
+};
+
+int main(int argc, char **argv)
+{
+   struct map_benchmark map;
+   int fd, opt, threads = 0, seconds = 0, node = -1;
+   int cmd = DMA_MAP_BENCHMARK;
+   char *p;
+
+   while ((opt = getopt(argc, argv, "t:s:n:")) != -1) {
+   switch (opt) {
+   case 't':
+   threads = atoi(optarg);
+   break;
+   case 's':
+   seconds = atoi(optarg);
+   break;
+   case 'n':
+   node = atoi(optarg);
+   break;
+   default:
+   return -1;
+   }
+   }
+
+   if (threads <= 0 || seconds <= 0) {
+   perror("invalid number of threads or seconds");
+   exit(1);
+   }
+
+   fd = open("/sys/kernel/debug/dma_map_benchmark", O_RDWR);
+   if (fd == -1) {
+   perror("open");
+   exit(1);
+   }
+
+   map.seconds = seconds;
+   map.threads = threads;
+   map.node = node;
+   if (ioctl(fd, cmd, &map)) {
+   perror("ioctl");
+   exit(1);
+   }
+
+   printf("dma mapping benchmark: average map_nsec:%lld average 
unmap_nsec:%lld\n",
+   map.map_nsec,
+   map.unmap_nsec);
+
+   return 0;
+}
-- 
2.25.1



[PATCH 1/2] dma-mapping: add benchmark support for streaming DMA APIs

2020-10-26 Thread Barry Song
Nowadays, there are increasing requirements to benchmark the performance
of dma_map and dma_unmap, particularly while the device is attached to an
IOMMU.

This patch enables that support. Users can run a specified number of
threads to do dma_map_page and dma_unmap_page on a specific NUMA node for
a specified duration. Then dma_map_benchmark will calculate the average
latency for map and unmap.

A difficulty for this benchmark is that the dma_map/unmap APIs must run on
a particular device, and each device might be backed by an IOMMU or not.

So we use driver_override to bind dma_map_benchmark to a particular
device by:
echo dma_map_benchmark > /sys/bus/platform/devices/xxx/driver_override
echo xxx > /sys/bus/platform/drivers/xxx/unbind
echo xxx > /sys/bus/platform/drivers/dma_map_benchmark/bind

For the moment, it supports platform devices only; PCI devices will be
supported afterwards.

Cc: Joerg Roedel 
Cc: Will Deacon 
Cc: Shuah Khan 
Cc: Christoph Hellwig 
Cc: Marek Szyprowski 
Cc: Robin Murphy 
Signed-off-by: Barry Song 
---
 kernel/dma/Kconfig |   8 ++
 kernel/dma/Makefile|   1 +
 kernel/dma/map_benchmark.c | 202 +
 3 files changed, 211 insertions(+)
 create mode 100644 kernel/dma/map_benchmark.c

diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index c99de4a21458..949c53da5991 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -225,3 +225,11 @@ config DMA_API_DEBUG_SG
  is technically out-of-spec.
 
  If unsure, say N.
+
+config DMA_MAP_BENCHMARK
+   bool "Enable benchmarking of streaming DMA mapping"
+   help
+ Provides /sys/kernel/debug/dma_map_benchmark that helps with testing
+ performance of dma_(un)map_page.
+
+ See tools/testing/selftests/dma/dma_map_benchmark.c
diff --git a/kernel/dma/Makefile b/kernel/dma/Makefile
index dc755ab68aab..7aa6b26b1348 100644
--- a/kernel/dma/Makefile
+++ b/kernel/dma/Makefile
@@ -10,3 +10,4 @@ obj-$(CONFIG_DMA_API_DEBUG)   += debug.o
 obj-$(CONFIG_SWIOTLB)  += swiotlb.o
 obj-$(CONFIG_DMA_COHERENT_POOL)+= pool.o
 obj-$(CONFIG_DMA_REMAP)+= remap.o
+obj-$(CONFIG_DMA_MAP_BENCHMARK)+= map_benchmark.o
diff --git a/kernel/dma/map_benchmark.c b/kernel/dma/map_benchmark.c
new file mode 100644
index ..16a5d7779d67
--- /dev/null
+++ b/kernel/dma/map_benchmark.c
@@ -0,0 +1,202 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 Hisilicon Limited.
+ */
+
+#include <linux/debugfs.h>
+#include <linux/delay.h>
+#include <linux/device.h>
+#include <linux/dma-mapping.h>
+#include <linux/kernel.h>
+#include <linux/kthread.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/slab.h>
+#include <linux/timekeeping.h>
+
+#define DMA_MAP_BENCHMARK  _IOWR('d', 1, struct map_benchmark)
+
+struct map_benchmark {
+   __u64 map_nsec;
+   __u64 unmap_nsec;
+   __u32 threads; /* how many threads will do map/unmap in parallel */
+   __u32 seconds; /* how long the test will last */
+   int node; /* which numa node this benchmark will run on */
+   __u64 expansion[10];/* For future use */
+};
+
+struct map_benchmark_data {
+   struct map_benchmark bparam;
+   struct device *dev;
+   struct dentry  *debugfs;
+   atomic64_t total_map_nsecs;
+   atomic64_t total_map_loops;
+   atomic64_t total_unmap_nsecs;
+   atomic64_t total_unmap_loops;
+};
+
+static int map_benchmark_thread(void *data)
+{
+   struct page *page;
+   dma_addr_t dma_addr;
+   struct map_benchmark_data *map = data;
+   int ret = 0;
+
+   page = alloc_page(GFP_KERNEL);
+   if (!page)
+   return -ENOMEM;
+
+   while (!kthread_should_stop())  {
+   ktime_t map_stime, map_etime, unmap_stime, unmap_etime;
+
+   map_stime = ktime_get();
+   dma_addr = dma_map_page(map->dev, page, 0, PAGE_SIZE, DMA_BIDIRECTIONAL);
+   if (unlikely(dma_mapping_error(map->dev, dma_addr))) {
+   dev_err(map->dev, "dma_map_page failed\n");
+   ret = -ENOMEM;
+   goto out;
+   }
+   map_etime = ktime_get();
+
+   unmap_stime = ktime_get();
+   /* unmap with the page variant, matching dma_map_page() above */
+   dma_unmap_page(map->dev, dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL);
+   unmap_etime = ktime_get();
+
+   atomic64_add((long long)ktime_to_ns(ktime_sub(map_etime, map_stime)),
+   &map->total_map_nsecs);
+   atomic64_add((long long)ktime_to_ns(ktime_sub(unmap_etime, unmap_stime)),
+   &map->total_unmap_nsecs);
+   atomic64_inc(&map->total_map_loops);
+   atomic64_inc(&map->total_unmap_loops);
+   }
+
+out:
+   __free_page(page);
+   return ret;
+}
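+
+/*
+ * do_map_benchmark() below is truncated in this archive. A sketch of how
+ * the reported averages are presumably derived from the counters above
+ * (an assumption, not the exact patch code):
+ *
+ *	map->bparam.map_nsec = atomic64_read(&map->total_map_nsecs) /
+ *			       atomic64_read(&map->total_map_loops);
+ *	map->bparam.unmap_nsec = atomic64_read(&map->total_unmap_nsecs) /
+ *				 atomic64_read(&map->total_unmap_loops);
+ */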
+
+static int do_map_benchmark(struct map_benchmark_data *map)
+{
+   struct task_struct **tsk;
+   int threads = map->bparam.threads;
+   int node = map->bparam.node;
+   const cpumask_t *cpu_mask = 

[PATCH] fix swiotlb panic on Xen

2020-10-26 Thread Stefano Stabellini
From: Stefano Stabellini 

kernel/dma/swiotlb.c:swiotlb_init gets called first and tries to
allocate a buffer for the swiotlb. It does so by calling

  memblock_alloc_low(PAGE_ALIGN(bytes), PAGE_SIZE);

If the allocation fails, no_iotlb_memory is set.


Later during initialization, swiotlb-xen comes in
(drivers/xen/swiotlb-xen.c:xen_swiotlb_init) and, given that io_tlb_start
is != 0, it thinks the memory is ready to use when actually it is not.

When the swiotlb is actually needed, swiotlb_tbl_map_single gets called
and since no_iotlb_memory is set the kernel panics.

Instead, if swiotlb-xen.c:xen_swiotlb_init knew the swiotlb hadn't been
initialized, it would do the initialization itself, which might still
succeed.


Fix the panic by setting io_tlb_start to 0 on swiotlb initialization
failure, and also by setting no_iotlb_memory to false on swiotlb
initialization success.

Signed-off-by: Stefano Stabellini 


diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index c19379fabd20..9924214df60a 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -231,6 +231,7 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
io_tlb_orig_addr[i] = INVALID_PHYS_ADDR;
}
io_tlb_index = 0;
+   no_iotlb_memory = false;
 
if (verbose)
swiotlb_print_info();
@@ -262,9 +263,11 @@ swiotlb_init(int verbose)
if (vstart && !swiotlb_init_with_tbl(vstart, io_tlb_nslabs, verbose))
return;
 
-   if (io_tlb_start)
+   if (io_tlb_start) {
memblock_free_early(io_tlb_start,
PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT));
+   io_tlb_start = 0;
+   }
pr_warn("Cannot allocate buffer");
no_iotlb_memory = true;
 }
@@ -362,6 +365,7 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
io_tlb_orig_addr[i] = INVALID_PHYS_ADDR;
}
io_tlb_index = 0;
+   no_iotlb_memory = false;
 
swiotlb_print_info();
 


Re: [PATCH v3 01/14] docs: Document IO Address Space ID (IOASID) APIs

2020-10-26 Thread Jacob Pan
Hi Jean-Philippe,

Thanks a lot for the review. Comments inline.

On Tue, 20 Oct 2020 15:58:09 +0200, Jean-Philippe Brucker
 wrote:

> On Mon, Sep 28, 2020 at 02:38:28PM -0700, Jacob Pan wrote:
> > IOASID is used to identify address spaces that can be targeted by device
> > DMA. It is a system-wide resource that is essential to its many users.
> > This document is an attempt to help developers from all vendors navigate
> > the APIs. At this time, ARM SMMU and Intel’s Scalable IO Virtualization
> > (SIOV) enabled platforms are the primary users of IOASID. Examples of
> > how SIOV components interact with IOASID APIs are provided in that many
> > APIs are driven by the requirements from SIOV.
> > 
> > Cc: Jonathan Corbet 
> > Cc: linux-...@vger.kernel.org
> > Cc: Randy Dunlap 
> > Signed-off-by: Liu Yi L 
> > Signed-off-by: Wu Hao 
> > Signed-off-by: Jacob Pan   
> 
> This looks good to me, with small comments below.
> 
Can I add your Reviewed-by tag after addressing the comments?

> > ---
> >  Documentation/driver-api/ioasid.rst | 648
> >  1 file changed, 648 insertions(+)
> >  create mode 100644 Documentation/driver-api/ioasid.rst
> > 
> > diff --git a/Documentation/driver-api/ioasid.rst
> > b/Documentation/driver-api/ioasid.rst new file mode 100644
> > index ..7f8e702997ab
> > --- /dev/null
> > +++ b/Documentation/driver-api/ioasid.rst
> > @@ -0,0 +1,648 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +.. ioasid:
> > +
> > +===
> > +IO Address Space ID
> > +===
> > +
> > +IOASID is a generic name for PCIe Process Address ID (PASID) or ARM
> > +SMMU SubstreamID. An IOASID identifies an address space that DMA
> > +requests can target.
> > +
> > +The primary use cases for IOASID are Shared Virtual Address (SVA) and
> > +multiple IOVA spaces per device. However, the requirements for IOASID
> > +management can vary among hardware architectures.
> > +
> > +For baremetal IOVA, IOASID #0 is used for DMA request without  
> 
>bare metal
got it

> 
> > +PASID. Even though some architectures such as VT-d also offers
> > +the flexibility of using any PASIDs for DMA request without PASID.
> > +PASID #0 is reserved and not allocated from any ioasid_set.
> > +
> > +Multiple IOVA spaces per device are mapped to auxiliary domains which
> > +can be used for mediated device assignment with and without a virtual
> > +IOMMU (vIOMMU). An IOASID is allocated for each auxiliary domain as
> > default +PASID. Without vIOMMU, default IOASID is used for DMA map/unmap
> > +APIs. With vIOMMU, default IOASID is used for guest IOVA where DMA
> > +request with PASID is required for the device. The reason is that
> > +there is only one PASID #0 per device, e.g. VT-d, RID_PASID is per
> > PCI  
> 
>on VT-d
got that
> 
> > +device.
> > +
> > +This document covers the generic features supported by IOASID
> > +APIs. Vendor-specific use cases are also illustrated with Intel's VT-d
> > +based platforms as the first example.
> > +
> > +.. contents:: :local:
> > +
> > +Glossary
> > +
> > +PASID - Process Address Space ID
> > +
> > +IOASID - IO Address Space ID (generic term for PCIe PASID and
> > +SubstreamID in SMMU)
> > +
> > +SVA/SVM - Shared Virtual Addressing/Memory
> > +
> > +ENQCMD - Intel X86 ISA for efficient workqueue submission [1]
> > +!!!TODO: Link to Spec at the bottom  
> 
> Yes, or maybe hyperlinks at the end of this section would be better. There
> are references and lists all over the document so keeping things as close
> as possible avoids confusion.
> 
sounds good.

> > +
> > +DSA - Intel Data Streaming Accelerator [2]
> > +
> > +VDCM - Virtual Device Composition Module [3]
> > +
> > +SIOV - Intel Scalable IO Virtualization
> > +
> > +
> > +Key Concepts
> > +
> > +
> > +IOASID Set
> > +---
> > +An IOASID set is a group of IOASIDs allocated from the system-wide
> > +IOASID pool. Refer to IOASID set APIs for more details.  
> 
>  IOASID Set Level APIs
> 
Yes, I should use the exact section title.

> > +
> > +IOASID set is particularly useful for guest SVA where each guest could
> > +have its own IOASID set for security and efficiency reasons.
> > +
> > +IOASID Set Private ID (SPID)
> > +
> > +Each IOASID set has a private namespace of SPIDs. An SPID maps to a
> > +single system-wide IOASID. Conversely, each IOASID may be associated
> > +with an alias ID, local to the IOASID set, named SPID.
> > +SPIDs can be used as guest IOASIDs where each guest could do
> > +IOASID allocation from its own pool and map them to host physical
> > +IOASIDs. SPIDs are particularly useful for supporting live migration
> > +where decoupling guest and host physical resources are necessary.
> > +
> > +For example, two VMs can both allocate guest PASID/SPID #101 but map to
> > +different host PASIDs #201 and #202 respectively as shown in 

Re: [PATCH v3 00/24] MT8192 IOMMU support

2020-10-26 Thread Krzysztof Kozlowski
On Wed, Sep 30, 2020 at 03:06:23PM +0800, Yong Wu wrote:
> This patch mainly adds support for mt8192 IOMMU and SMI.
> 
> mt8192 also is MTK IOMMU gen2 which uses ARM Short-Descriptor translation
> table format. The M4U-SMI HW diagram is as below:
> 
>                  EMI
>                   |
>                  M4U
>                   |
>              -----------
>              SMI Common
>              -----------
>                   |
>   +-------+------+------+------ ... ------+--------+
>   |       |      |      |                 |        |
> larb0   larb1  larb2  larb4     ..      larb19   larb20
> disp0   disp1   mdp    vdec              IPE      IPE
> 
> All the connections are HW fixed, SW can NOT adjust it.
> 
> Compared with the previous SoC, this patchset mainly adds two new functions:
> a) add iova 34 bits support.
> b) add multi domains support since several HWs have special iova
> region requirements.
> 
> This patchset depends on v5.9-rc1.

Hi,

I think there will be v4 of this, right? If yes, please also describe
the dependencies between the patches. If the entire patchset is strictly
ordered, then mention this as well.

Best regards,
Krzysztof



RE: [PATCH] dma: Per-NUMA-node CMA should depend on NUMA

2020-10-26 Thread Song Bao Hua (Barry Song)



> -Original Message-
> From: Robin Murphy [mailto:robin.mur...@arm.com]
> Sent: Tuesday, October 27, 2020 1:25 AM
> To: h...@lst.de
> Cc: iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org; Song Bao
> Hua (Barry Song) 
> Subject: [PATCH] dma: Per-NUMA-node CMA should depend on NUMA
> 
> Offering DMA_PERNUMA_CMA to non-NUMA configs is pointless.
> 

This is right.

> Signed-off-by: Robin Murphy 
> ---
>  kernel/dma/Kconfig | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
> index c99de4a21458..964b74c9b7e3 100644
> --- a/kernel/dma/Kconfig
> +++ b/kernel/dma/Kconfig
> @@ -125,7 +125,8 @@ if  DMA_CMA
> 
>  config DMA_PERNUMA_CMA
>   bool "Enable separate DMA Contiguous Memory Area for each NUMA
> Node"
> - default NUMA && ARM64
> + depends on NUMA
> + default ARM64

On the other hand, at this moment, only ARM64 is calling the init code
to get per-NUMA CMA. Do we need
"depends on NUMA && ARM64"
so that this is not enabled on non-ARM64?

>   help
> Enable this option to get pernuma CMA areas so that devices like
> ARM64 SMMU can get local memory by DMA coherent APIs.
> --
 
Thanks
Barry




[PATCH v2 4/4] iommu: avoid taking iova_rbtree_lock twice

2020-10-26 Thread John Garry
From: Cong Wang 

Both find_iova() and __free_iova() take iova_rbtree_lock,
there is no reason to take and release it twice inside
free_iova().

Fold them into one critical section by calling the unlock
versions instead.
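
(private_find_iova() and private_free_iova() are the existing internal
helpers in iova.c that expect the caller to already hold
iova_rbtree_lock.)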

Signed-off-by: Cong Wang 
Reviewed-by: Robin Murphy 
Signed-off-by: John Garry 
---
 drivers/iommu/iova.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 386005055aca..3b32e6746c70 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -398,10 +398,14 @@ EXPORT_SYMBOL_GPL(__free_iova);
 void
 free_iova(struct iova_domain *iovad, unsigned long pfn)
 {
-   struct iova *iova = find_iova(iovad, pfn);
+   unsigned long flags;
+   struct iova *iova;
 
+   spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
+   iova = private_find_iova(iovad, pfn);
if (iova)
-   __free_iova(iovad, iova);
+   private_free_iova(iovad, iova);
+   spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
 
 }
 EXPORT_SYMBOL_GPL(free_iova);
-- 
2.26.2



[PATCH v2 2/4] iommu/iova: Avoid double-negatives in magazine helpers

2020-10-26 Thread John Garry
A similar crash to the following could be observed if initial CPU rcache
magazine allocations fail in init_iova_rcaches():

Unable to handle kernel NULL pointer dereference at virtual address 

Mem abort info:
   ESR = 0x9604
   EC = 0x25: DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
Data abort info:
   ISV = 0, ISS = 0x0004
   CM = 0, WnR = 0
[] user address but active_mm is swapper
Internal error: Oops: 9604 [#1] PREEMPT SMP
Modules linked in:
CPU: 11 PID: 696 Comm: irq/40-hisi_sas Not tainted 5.9.0-rc7-dirty #109
Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI RC0 - V1.16.01 
03/15/2019
Call trace:
  free_iova_fast+0xfc/0x280
  iommu_dma_free_iova+0x64/0x70
  __iommu_dma_unmap+0x9c/0xf8
  iommu_dma_unmap_sg+0xa8/0xc8
  dma_unmap_sg_attrs+0x28/0x50
  cq_thread_v3_hw+0x2dc/0x528
  irq_thread_fn+0x2c/0xa0
  irq_thread+0x130/0x1e0
  kthread+0x154/0x158
  ret_from_fork+0x10/0x34

Code: f9400060 f102001f 54000981 d421 (f9400043)

 ---[ end trace 4afcbdfc61b60467 ]---

The issue is that the expression !iova_magazine_full(NULL) evaluates to
true; this falls over in __iova_rcache_insert() when we attempt to cache a
mag and cpu_rcache->loaded == NULL:

if (!iova_magazine_full(cpu_rcache->loaded)) {
can_insert = true;
...

if (can_insert)
iova_magazine_push(cpu_rcache->loaded, iova_pfn);

As above, can_insert evaluates to true, which it shouldn't, and we try
to insert pfns into a NULL mag, which is not safe.

To avoid this, stop using double-negatives, like !iova_magazine_full() and
!iova_magazine_empty(), and use positive tests, like
iova_magazine_has_space() and iova_magazine_has_pfns(), respectively; these
can safely deal with cpu_rcache->{loaded, prev} = NULL.

Signed-off-by: John Garry 
---
 drivers/iommu/iova.c | 29 +
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 81b7399dd5e8..1f3f0f8b12e0 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -827,14 +827,18 @@ iova_magazine_free_pfns(struct iova_magazine *mag, struct iova_domain *iovad)
mag->size = 0;
 }
 
-static bool iova_magazine_full(struct iova_magazine *mag)
+static bool iova_magazine_has_space(struct iova_magazine *mag)
 {
-   return (mag && mag->size == IOVA_MAG_SIZE);
+   if (!mag)
+   return false;
+   return mag->size < IOVA_MAG_SIZE;
 }
 
-static bool iova_magazine_empty(struct iova_magazine *mag)
+static bool iova_magazine_has_pfns(struct iova_magazine *mag)
 {
-   return (!mag || mag->size == 0);
+   if (!mag)
+   return false;
+   return mag->size;
 }
 
 static unsigned long iova_magazine_pop(struct iova_magazine *mag,
@@ -843,7 +847,7 @@ static unsigned long iova_magazine_pop(struct iova_magazine *mag,
int i;
unsigned long pfn;
 
-   BUG_ON(iova_magazine_empty(mag));
+   BUG_ON(!iova_magazine_has_pfns(mag));
 
/* Only fall back to the rbtree if we have no suitable pfns at all */
for (i = mag->size - 1; mag->pfns[i] > limit_pfn; i--)
@@ -859,7 +863,7 @@ static unsigned long iova_magazine_pop(struct iova_magazine *mag,
 
 static void iova_magazine_push(struct iova_magazine *mag, unsigned long pfn)
 {
-   BUG_ON(iova_magazine_full(mag));
+   BUG_ON(!iova_magazine_has_space(mag));
 
mag->pfns[mag->size++] = pfn;
 }
@@ -905,9 +909,9 @@ static bool __iova_rcache_insert(struct iova_domain *iovad,
cpu_rcache = raw_cpu_ptr(rcache->cpu_rcaches);
	spin_lock_irqsave(&cpu_rcache->lock, flags);
 
-   if (!iova_magazine_full(cpu_rcache->loaded)) {
+   if (iova_magazine_has_space(cpu_rcache->loaded)) {
can_insert = true;
-   } else if (!iova_magazine_full(cpu_rcache->prev)) {
+   } else if (iova_magazine_has_space(cpu_rcache->prev)) {
swap(cpu_rcache->prev, cpu_rcache->loaded);
can_insert = true;
} else {
@@ -916,8 +920,9 @@ static bool __iova_rcache_insert(struct iova_domain *iovad,
if (new_mag) {
	spin_lock(&rcache->lock);
if (rcache->depot_size < MAX_GLOBAL_MAGS) {
-   rcache->depot[rcache->depot_size++] =
-   cpu_rcache->loaded;
+   if (cpu_rcache->loaded)
+   rcache->depot[rcache->depot_size++] =
+   cpu_rcache->loaded;
} else {
mag_to_free = cpu_rcache->loaded;
}
@@ -968,9 +973,9 @@ static unsigned long __iova_rcache_get(struct iova_rcache *rcache,
cpu_rcache = raw_cpu_ptr(rcache->cpu_rcaches);
	spin_lock_irqsave(&cpu_rcache->lock, flags);
 
-   if (!iova_magazine_empty(cpu_rcache->loaded)) {
+   if (iova_magazine_has_pfns(cpu_rcache->loaded)) {

[PATCH v2 0/4] iommu/iova: Solve longterm IOVA issue

2020-10-26 Thread John Garry
This series contains a patch to solve the longterm IOVA issue which
leizhen originally tried to address at [0].

Along with this, I included the following:
- A smaller helper to clear all IOVAs for a domain
- Change polarity of the IOVA magazine helpers
- Small optimisation from Cong Wang included, which was never applied [1].
  There was some debate of the other patches in that series, but this one
  is quite straightforward.

Differences to v1:
- Add IOVA clearing helper
- Add patch to change polarity of mag helpers
- Avoid logically-redundant extra variable in __iova_rcache_insert()

[0] 
https://lore.kernel.org/linux-iommu/20190815121104.29140-3-thunder.leiz...@huawei.com/
[1] 
https://lore.kernel.org/linux-iommu/4b74d40a-22d1-af53-fcb6-5d7018370...@huawei.com/

Cong Wang (1):
  iommu: avoid taking iova_rbtree_lock twice

John Garry (3):
  iommu/iova: Add free_all_cpu_cached_iovas()
  iommu/iova: Avoid double-negatives in magazine helpers
  iommu/iova: Flush CPU rcache for when a depot fills

 drivers/iommu/iova.c | 66 +---
 1 file changed, 38 insertions(+), 28 deletions(-)

-- 
2.26.2



[PATCH v2 3/4] iommu/iova: Flush CPU rcache for when a depot fills

2020-10-26 Thread John Garry
Leizhen reported some time ago that IOVA performance may degrade over time
[0], but unfortunately his solution to fix this problem was not given
attention.

To summarize, the issue is that as time goes by, the CPU rcache and depot
rcache continue to grow. As such, IOVA RB tree access time also continues
to grow.

At a certain point, the depot may become full, and some CPU rcaches may
also be full when inserting another IOVA is attempted. For this scenario,
currently the "loaded" CPU rcache is freed and a new one is created. This
freeing means that many IOVAs in the RB tree need to be freed, which
makes IO throughput performance fall off a cliff in some storage scenarios:

Jobs: 12 (f=12): [] [0.0% done] [6314MB/0KB/0KB /s] [1616K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [5669MB/0KB/0KB /s] [1451K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [6031MB/0KB/0KB /s] [1544K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [6673MB/0KB/0KB /s] [1708K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [6705MB/0KB/0KB /s] [1717K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [6031MB/0KB/0KB /s] [1544K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [6761MB/0KB/0KB /s] [1731K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [6705MB/0KB/0KB /s] [1717K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [6685MB/0KB/0KB /s] [1711K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [6178MB/0KB/0KB /s] [1582K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [6731MB/0KB/0KB /s] [1723K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [2387MB/0KB/0KB /s] [611K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [2689MB/0KB/0KB /s] [688K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [2278MB/0KB/0KB /s] [583K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [1288MB/0KB/0KB /s] [330K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [1632MB/0KB/0KB /s] [418K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [1765MB/0KB/0KB /s] [452K/0/0 iops]

And it continues in this fashion, without recovering. Note that in this
example it was required to wait 16 hours for this to occur. Also note that
IO throughput gradually becomes more unstable leading up to this point.

As a solution to this issue, judge that the IOVA caches have grown too big
when a cached magazine needs to be freed, and just flush all the CPU
rcaches instead.

The depot rcaches, however, are not flushed, as they can be used to
immediately replenish active CPUs.

In future, some IOVA compaction could be implemented to solve the
instability issue, which I figure could be quite complex to implement.

[0] 
https://lore.kernel.org/linux-iommu/20190815121104.29140-3-thunder.leiz...@huawei.com/

Analyzed-by: Zhen Lei 
Reported-by: Xiang Chen 
Signed-off-by: John Garry 
---
 drivers/iommu/iova.c | 16 ++--
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 1f3f0f8b12e0..386005055aca 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -901,7 +901,6 @@ static bool __iova_rcache_insert(struct iova_domain *iovad,
 struct iova_rcache *rcache,
 unsigned long iova_pfn)
 {
-   struct iova_magazine *mag_to_free = NULL;
struct iova_cpu_rcache *cpu_rcache;
bool can_insert = false;
unsigned long flags;
@@ -923,13 +922,12 @@ static bool __iova_rcache_insert(struct iova_domain *iovad,
if (cpu_rcache->loaded)
rcache->depot[rcache->depot_size++] =
cpu_rcache->loaded;
-   } else {
-   mag_to_free = cpu_rcache->loaded;
+   can_insert = true;
+   cpu_rcache->loaded = new_mag;
}
	spin_unlock(&rcache->lock);
-
-   cpu_rcache->loaded = new_mag;
-   can_insert = true;
+   if (!can_insert)
+   iova_magazine_free(new_mag);
}
}
 
@@ -938,10 +936,8 @@ static bool __iova_rcache_insert(struct iova_domain *iovad,
 
	spin_unlock_irqrestore(&cpu_rcache->lock, flags);
 
-   if (mag_to_free) {
-   iova_magazine_free_pfns(mag_to_free, iovad);
-   iova_magazine_free(mag_to_free);
-   }
+   if (!can_insert)
+   free_all_cpu_cached_iovas(iovad);
 
return can_insert;
 }
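
free_all_cpu_cached_iovas() is added by patch 1/4 of this series, which is
not included in this digest. A minimal sketch of such a helper, assuming
the existing per-CPU free_cpu_cached_iovas() in iova.c:

	void free_all_cpu_cached_iovas(struct iova_domain *iovad)
	{
		int cpu;

		for_each_online_cpu(cpu)
			free_cpu_cached_iovas(cpu, iovad);
	}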
-- 
2.26.2



Re: [PATCHv6 4/6] drm/msm/a6xx: Add support for using system cache(LLC)

2020-10-26 Thread Jordan Crouse
On Mon, Oct 26, 2020 at 05:24:03PM +0530, Sai Prakash Ranjan wrote:
> From: Sharat Masetty 
> 
> The last level system cache can be partitioned to 32 different
> slices of which GPU has two slices preallocated. One slice is
> used for caching GPU buffers and the other slice is used for
> caching the GPU SMMU pagetables. This talks to the core system
> cache driver to acquire the slice handles, configure the SCIDs
> for those slices, and activate and deactivate the slices upon
> GPU power collapse and restore.
> 
> Some support from the IOMMU driver is also needed to make use
> of the system cache to set the right TCR attributes. GPU then
> has the ability to override a few cacheability parameters which
> it does to override write-allocate to write-no-allocate as the
> GPU hardware does not benefit much from it.
> 
> DOMAIN_ATTR_SYS_CACHE is another domain level attribute used by the
> IOMMU driver to set the right attributes to cache the hardware
> pagetables into the system cache.
> 
> Signed-off-by: Sharat Masetty 
> [saiprakash.ranjan: fix to set attr before device attach to iommu and rebase]
> Signed-off-by: Sai Prakash Ranjan 

As with the previous patch this doesn't exactly need the IOMMU side changes
outside of the update to the domain attribute enum.

If the attribute didn't exist we would just lose no-write-allocate which is
undesirable but not devastating.

Hopefully the arm-smmu changes are ready to go but I'm just trying to figure
out a game plan to keep Sai from having to maintain these patches
for another cycle.

Jordan

> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 83 +
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h   |  4 ++
>  drivers/gpu/drm/msm/adreno/adreno_gpu.c | 17 +
>  3 files changed, 104 insertions(+)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index 8915882e..151190ff62f7 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -8,7 +8,9 @@
>  #include "a6xx_gpu.h"
>  #include "a6xx_gmu.xml.h"
>  
> +#include <linux/bitfield.h>
>  #include <linux/devfreq.h>
> +#include <linux/soc/qcom/llcc-qcom.h>
>  
>  #define GPU_PAS_ID 13
>  
> @@ -1022,6 +1024,79 @@ static irqreturn_t a6xx_irq(struct msm_gpu *gpu)
>   return IRQ_HANDLED;
>  }
>  
> +static void a6xx_llc_rmw(struct a6xx_gpu *a6xx_gpu, u32 reg, u32 mask, u32 or)
> +{
> + return msm_rmw(a6xx_gpu->llc_mmio + (reg << 2), mask, or);
> +}
> +
> +static void a6xx_llc_write(struct a6xx_gpu *a6xx_gpu, u32 reg, u32 value)
> +{
> + return msm_writel(value, a6xx_gpu->llc_mmio + (reg << 2));
> +}
> +
> +static void a6xx_llc_deactivate(struct a6xx_gpu *a6xx_gpu)
> +{
> + llcc_slice_deactivate(a6xx_gpu->llc_slice);
> + llcc_slice_deactivate(a6xx_gpu->htw_llc_slice);
> +}
> +
> +static void a6xx_llc_activate(struct a6xx_gpu *a6xx_gpu)
> +{
> + u32 cntl1_regval = 0;
> +
> + if (IS_ERR(a6xx_gpu->llc_mmio))
> + return;
> +
> + if (!llcc_slice_activate(a6xx_gpu->llc_slice)) {
> + u32 gpu_scid = llcc_get_slice_id(a6xx_gpu->llc_slice);
> +
> + gpu_scid &= 0x1f;
> + cntl1_regval = (gpu_scid << 0) | (gpu_scid << 5) | (gpu_scid << 10) |
> +                (gpu_scid << 15) | (gpu_scid << 20);
> + }
> +
> + if (!llcc_slice_activate(a6xx_gpu->htw_llc_slice)) {
> + u32 gpuhtw_scid = llcc_get_slice_id(a6xx_gpu->htw_llc_slice);
> +
> + gpuhtw_scid &= 0x1f;
> + cntl1_regval |= FIELD_PREP(GENMASK(29, 25), gpuhtw_scid);
> + }
> +
> + if (cntl1_regval) {
> + /*
> +  * Program the slice IDs for the various GPU blocks and GPU MMU
> +  * pagetables
> +  */
> + a6xx_llc_write(a6xx_gpu, REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_1, 
> cntl1_regval);
> +
> + /*
> +  * Program cacheability overrides to not allocate cache lines on
> +  * a write miss
> +  */
> + a6xx_llc_rmw(a6xx_gpu, REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_0, 
> 0xF, 0x03);
> + }
> +}
> +
> +static void a6xx_llc_slices_destroy(struct a6xx_gpu *a6xx_gpu)
> +{
> + llcc_slice_putd(a6xx_gpu->llc_slice);
> + llcc_slice_putd(a6xx_gpu->htw_llc_slice);
> +}
> +
> +static void a6xx_llc_slices_init(struct platform_device *pdev,
> + struct a6xx_gpu *a6xx_gpu)
> +{
> + a6xx_gpu->llc_mmio = msm_ioremap(pdev, "cx_mem", "gpu_cx");
> + if (IS_ERR(a6xx_gpu->llc_mmio))
> + return;
> +
> + a6xx_gpu->llc_slice = llcc_slice_getd(LLCC_GPU);
> + a6xx_gpu->htw_llc_slice = llcc_slice_getd(LLCC_GPUHTW);
> +
> + if (IS_ERR(a6xx_gpu->llc_slice) && IS_ERR(a6xx_gpu->htw_llc_slice))
> + a6xx_gpu->llc_mmio = ERR_PTR(-EINVAL);
> +}
> +
>  static int a6xx_pm_resume(struct msm_gpu *gpu)
>  {
>   struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> @@ -1038,6 +1113,8 @@ static int a6xx_pm_resume(struct msm_gpu *gpu)
>  
>   

Re: [PATCHv6 3/6] drm/msm: rearrange the gpu_rmw() function

2020-10-26 Thread Jordan Crouse
On Mon, Oct 26, 2020 at 05:24:02PM +0530, Sai Prakash Ranjan wrote:
> From: Sharat Masetty 
> 
> The register read-modify-write construct is generic enough
> that it can be used by other subsystems as needed, create
> a more generic rmw() function and have the gpu_rmw() use
> this new function.
> 
> Signed-off-by: Sharat Masetty 
> Reviewed-by: Jordan Crouse 
> Signed-off-by: Sai Prakash Ranjan 

Rob - this should be safe to pull with msm-next regardless of the merge status
of the iommu side of things. Hopefully everything will be pulled for 5.11 but if
it isn't it would be good to get this out of the cycle.

Jordan

> ---
>  drivers/gpu/drm/msm/msm_drv.c | 8 
>  drivers/gpu/drm/msm/msm_drv.h | 1 +
>  drivers/gpu/drm/msm/msm_gpu.h | 5 +
>  3 files changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
> index 49685571dc0e..a1e22b974b77 100644
> --- a/drivers/gpu/drm/msm/msm_drv.c
> +++ b/drivers/gpu/drm/msm/msm_drv.c
> @@ -180,6 +180,14 @@ u32 msm_readl(const void __iomem *addr)
>   return val;
>  }
>  
> +void msm_rmw(void __iomem *addr, u32 mask, u32 or)
> +{
> + u32 val = msm_readl(addr);
> +
> + val &= ~mask;
> + msm_writel(val | or, addr);
> +}
> +
>  struct msm_vblank_work {
>   struct work_struct work;
>   int crtc_id;
> diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
> index b9dd8f8f4887..655b3b0424a1 100644
> --- a/drivers/gpu/drm/msm/msm_drv.h
> +++ b/drivers/gpu/drm/msm/msm_drv.h
> @@ -478,6 +478,7 @@ void __iomem *msm_ioremap_quiet(struct platform_device 
> *pdev, const char *name,
>   const char *dbgname);
>  void msm_writel(u32 data, void __iomem *addr);
>  u32 msm_readl(const void __iomem *addr);
> +void msm_rmw(void __iomem *addr, u32 mask, u32 or);
>  
>  struct msm_gpu_submitqueue;
>  int msm_submitqueue_init(struct drm_device *drm, struct msm_file_private 
> *ctx);
> diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
> index 6c9e1fdc1a76..b2b419277953 100644
> --- a/drivers/gpu/drm/msm/msm_gpu.h
> +++ b/drivers/gpu/drm/msm/msm_gpu.h
> @@ -246,10 +246,7 @@ static inline u32 gpu_read(struct msm_gpu *gpu, u32 reg)
>  
>  static inline void gpu_rmw(struct msm_gpu *gpu, u32 reg, u32 mask, u32 or)
>  {
> - uint32_t val = gpu_read(gpu, reg);
> -
> - val &= ~mask;
> - gpu_write(gpu, reg, val | or);
> + msm_rmw(gpu->mmio + (reg << 2), mask, or);
>  }
>  
>  static inline u64 gpu_read64(struct msm_gpu *gpu, u32 lo, u32 hi)
> -- 
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
> of Code Aurora Forum, hosted by The Linux Foundation
> 

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH v4 4/7] of: unittest: Add test for of_dma_get_max_cpu_address()

2020-10-26 Thread Rob Herring
On Wed, 21 Oct 2020 14:34:34 +0200, Nicolas Saenz Julienne wrote:
> Introduce a test for of_dma_get_max_cpu_address(); it uses the same DT
> data as the rest of the dma-ranges unit tests.
> 
> Signed-off-by: Nicolas Saenz Julienne 
> 
> ---
> Changes since v3:
>  - Remove HAS_DMA guards
> 
>  drivers/of/unittest.c | 18 ++
>  1 file changed, 18 insertions(+)
> 

Reviewed-by: Rob Herring 


[PATCH] dma: Per-NUMA-node CMA should depend on NUMA

2020-10-26 Thread Robin Murphy
Offering DMA_PERNUMA_CMA to non-NUMA configs is pointless.

Signed-off-by: Robin Murphy 
---
 kernel/dma/Kconfig | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index c99de4a21458..964b74c9b7e3 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -125,7 +125,8 @@ if  DMA_CMA
 
 config DMA_PERNUMA_CMA
bool "Enable separate DMA Contiguous Memory Area for each NUMA Node"
-   default NUMA && ARM64
+   depends on NUMA
+   default ARM64
help
  Enable this option to get pernuma CMA areas so that devices like
  ARM64 SMMU can get local memory by DMA coherent APIs.
-- 
2.28.0.dirty



[PATCH] iommu/arm-smmu: Use new devm_krealloc()

2020-10-26 Thread Robin Murphy
The implementation-specific subclassing of struct arm_smmu_device really
wanted an appropriate version of realloc(). Now that one exists, take
full advantage of it to clarify what's actually being done here.
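
(Each implementation subclass embeds struct arm_smmu_device as its first
member, so growing the original allocation in place with devm_krealloc()
preserves the already-initialised contents and makes the old copy-and-free
dance unnecessary.)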

Signed-off-by: Robin Murphy 
---
 drivers/iommu/arm/arm-smmu/arm-smmu-impl.c   |  5 +
 drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c | 17 +
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c   |  5 +
 3 files changed, 3 insertions(+), 24 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
index 88f17cc33023..336f36ed9ed7 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
@@ -91,15 +91,12 @@ static struct arm_smmu_device *cavium_smmu_impl_init(struct arm_smmu_device *smmu)
 {
struct cavium_smmu *cs;
 
-   cs = devm_kzalloc(smmu->dev, sizeof(*cs), GFP_KERNEL);
+   cs = devm_krealloc(smmu->dev, smmu, sizeof(*cs), GFP_KERNEL);
if (!cs)
return ERR_PTR(-ENOMEM);
 
-   cs->smmu = *smmu;
	cs->smmu.impl = &cavium_impl;
 
-   devm_kfree(smmu->dev, smmu);
-
	return &cs->smmu;
 }
 
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c b/drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c
index 31368057e9be..29117444e5a0 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c
@@ -242,18 +242,10 @@ struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu)
struct nvidia_smmu *nvidia_smmu;
struct platform_device *pdev = to_platform_device(dev);
 
-   nvidia_smmu = devm_kzalloc(dev, sizeof(*nvidia_smmu), GFP_KERNEL);
+   nvidia_smmu = devm_krealloc(dev, smmu, sizeof(*nvidia_smmu), GFP_KERNEL);
if (!nvidia_smmu)
return ERR_PTR(-ENOMEM);
 
-   /*
-* Copy the data from struct arm_smmu_device *smmu allocated in
-* arm-smmu.c. The smmu from struct nvidia_smmu replaces the smmu
-* pointer used in arm-smmu.c once this function returns.
-* This is necessary to derive nvidia_smmu from smmu pointer passed
-* through arm_smmu_impl function calls subsequently.
-*/
-   nvidia_smmu->smmu = *smmu;
/* Instance 0 is ioremapped by arm-smmu.c. */
nvidia_smmu->bases[0] = smmu->base;
 
@@ -267,12 +259,5 @@ struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu)
 
	nvidia_smmu->smmu.impl = &nvidia_smmu_impl;
 
-   /*
-* Free the struct arm_smmu_device *smmu allocated in arm-smmu.c.
-* Once this function returns, arm-smmu.c would use arm_smmu_device
-* allocated as part of struct nvidia_smmu.
-*/
-   devm_kfree(dev, smmu);
-
	return &nvidia_smmu->smmu;
 }
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index be4318044f96..600670588521 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -69,14 +69,11 @@ struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu)
 {
struct qcom_smmu *qsmmu;
 
-   qsmmu = devm_kzalloc(smmu->dev, sizeof(*qsmmu), GFP_KERNEL);
+   qsmmu = devm_krealloc(smmu->dev, smmu, sizeof(*qsmmu), GFP_KERNEL);
if (!qsmmu)
return ERR_PTR(-ENOMEM);
 
-   qsmmu->smmu = *smmu;
-
	qsmmu->smmu.impl = &qcom_smmu_impl;
-   devm_kfree(smmu->dev, smmu);
 
	return &qsmmu->smmu;
 }
-- 
2.28.0.dirty



[PATCHv6 6/6] iommu: arm-smmu-impl: Add a space before open parenthesis

2020-10-26 Thread Sai Prakash Ranjan
Fix the checkpatch warning for space required before the open
parenthesis.

Signed-off-by: Sai Prakash Ranjan 
---
 drivers/iommu/arm/arm-smmu/arm-smmu-impl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
index ffaf3f91ba52..f16da4a21270 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
@@ -12,7 +12,7 @@
 
 static int arm_smmu_gr0_ns(int offset)
 {
-   switch(offset) {
+   switch (offset) {
case ARM_SMMU_GR0_sCR0:
case ARM_SMMU_GR0_sACR:
case ARM_SMMU_GR0_sGFSR:
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation



[PATCHv6 3/6] drm/msm: rearrange the gpu_rmw() function

2020-10-26 Thread Sai Prakash Ranjan
From: Sharat Masetty 

The register read-modify-write construct is generic enough
that it can be used by other subsystems as needed, create
a more generic rmw() function and have the gpu_rmw() use
this new function.

Signed-off-by: Sharat Masetty 
Reviewed-by: Jordan Crouse 
Signed-off-by: Sai Prakash Ranjan 
---
 drivers/gpu/drm/msm/msm_drv.c | 8 
 drivers/gpu/drm/msm/msm_drv.h | 1 +
 drivers/gpu/drm/msm/msm_gpu.h | 5 +
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 49685571dc0e..a1e22b974b77 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -180,6 +180,14 @@ u32 msm_readl(const void __iomem *addr)
return val;
 }
 
+void msm_rmw(void __iomem *addr, u32 mask, u32 or)
+{
+   u32 val = msm_readl(addr);
+
+   val &= ~mask;
+   msm_writel(val | or, addr);
+}
+
 struct msm_vblank_work {
struct work_struct work;
int crtc_id;
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index b9dd8f8f4887..655b3b0424a1 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -478,6 +478,7 @@ void __iomem *msm_ioremap_quiet(struct platform_device 
*pdev, const char *name,
const char *dbgname);
 void msm_writel(u32 data, void __iomem *addr);
 u32 msm_readl(const void __iomem *addr);
+void msm_rmw(void __iomem *addr, u32 mask, u32 or);
 
 struct msm_gpu_submitqueue;
 int msm_submitqueue_init(struct drm_device *drm, struct msm_file_private *ctx);
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index 6c9e1fdc1a76..b2b419277953 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -246,10 +246,7 @@ static inline u32 gpu_read(struct msm_gpu *gpu, u32 reg)
 
 static inline void gpu_rmw(struct msm_gpu *gpu, u32 reg, u32 mask, u32 or)
 {
-   uint32_t val = gpu_read(gpu, reg);
-
-   val &= ~mask;
-   gpu_write(gpu, reg, val | or);
+   msm_rmw(gpu->mmio + (reg << 2), mask, or);
 }
 
 static inline u64 gpu_read64(struct msm_gpu *gpu, u32 lo, u32 hi)
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation



[PATCHv6 1/6] iommu/io-pgtable-arm: Add support to use system cache

2020-10-26 Thread Sai Prakash Ranjan
Add a quirk IO_PGTABLE_QUIRK_SYS_CACHE to override the
attributes set in TCR for the page table walker when
using system cache.
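
(Concretely, as the TCR setup below shows, the quirk keeps the walker
outer-shareable with an inner non-cacheable region but sets the outer
region to write-back write-allocate, so pagetable walks can allocate in
the system cache.)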

Signed-off-by: Sai Prakash Ranjan 
---
 drivers/iommu/io-pgtable-arm.c | 7 ++-
 include/linux/io-pgtable.h | 4 
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index dc7bcf858b6d..828426c16fa9 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -789,7 +789,8 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, 
void *cookie)
 
if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS |
IO_PGTABLE_QUIRK_NON_STRICT |
-   IO_PGTABLE_QUIRK_ARM_TTBR1))
+   IO_PGTABLE_QUIRK_ARM_TTBR1 |
+   IO_PGTABLE_QUIRK_SYS_CACHE))
return NULL;
 
data = arm_lpae_alloc_pgtable(cfg);
@@ -801,6 +802,10 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, 
void *cookie)
tcr->sh = ARM_LPAE_TCR_SH_IS;
tcr->irgn = ARM_LPAE_TCR_RGN_WBWA;
tcr->orgn = ARM_LPAE_TCR_RGN_WBWA;
+   } else if (cfg->quirks & IO_PGTABLE_QUIRK_SYS_CACHE) {
+   tcr->sh = ARM_LPAE_TCR_SH_OS;
+   tcr->irgn = ARM_LPAE_TCR_RGN_NC;
+   tcr->orgn = ARM_LPAE_TCR_RGN_WBWA;
} else {
tcr->sh = ARM_LPAE_TCR_SH_OS;
tcr->irgn = ARM_LPAE_TCR_RGN_NC;
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 23285ba645db..ecc9d2248b84 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -86,6 +86,9 @@ struct io_pgtable_cfg {
 *
 * IO_PGTABLE_QUIRK_ARM_TTBR1: (ARM LPAE format) Configure the table
 *  for use in the upper half of a split address space.
+*
+* IO_PGTABLE_QUIRK_SYS_CACHE: Override the attributes set in TCR for
+*  the page table walker when using system cache.
 */
#define IO_PGTABLE_QUIRK_ARM_NS BIT(0)
#define IO_PGTABLE_QUIRK_NO_PERMS   BIT(1)
@@ -93,6 +96,7 @@ struct io_pgtable_cfg {
	#define IO_PGTABLE_QUIRK_ARM_MTK_EXT	BIT(3)
#define IO_PGTABLE_QUIRK_NON_STRICT BIT(4)
#define IO_PGTABLE_QUIRK_ARM_TTBR1  BIT(5)
+   #define IO_PGTABLE_QUIRK_SYS_CACHE  BIT(6)
unsigned long   quirks;
unsigned long   pgsize_bitmap;
	unsigned int	ias;
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation



Re: [PATCH v3 11/24] iommu/io-pgtable-arm-v7s: Quad lvl1 pgtable for MediaTek

2020-10-26 Thread Robin Murphy

On 2020-10-26 07:41, Yong Wu wrote:

On Fri, 2020-10-23 at 15:10 +0100, Robin Murphy wrote:

On 2020-09-30 08:06, Yong Wu wrote:

The standard input iova bits is 32. MediaTek quadruples the lvl1 pagetable
(4 * lvl1). No change for the lvl2 pagetable. Then the iova bits can reach
34bit.

Signed-off-by: Yong Wu 
---
   drivers/iommu/io-pgtable-arm-v7s.c | 13 ++---
   drivers/iommu/mtk_iommu.c  |  2 +-
   2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
index 8362fdf76657..306bae2755ed 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -50,10 +50,17 @@
*/
   #define ARM_V7S_ADDR_BITS32
   #define _ARM_V7S_LVL_BITS(lvl)   (16 - (lvl) * 4)
+/* MediaTek: totally 34bits, 14bits at lvl1 and 8bits at lvl2. */
+#define _ARM_V7S_LVL_BITS_MTK(lvl) (20 - (lvl) * 6)


This should be defined in terms of both lvl and cfg->ias. The formula here
is nothing more than a disgusting trick I made up since a linear
interpolation happened to fit the required numbers. That said, all of
these bits pretending that short-descriptor is a well-defined recursive
format only served to allow the rest of the code to look more like the
LPAE code - IIRC they've already diverged a fair bit since then, so
frankly a lot of this could stand to be unpicked and made considerably
clearer by simply accepting that level 1 and level 2 are different from
each other.


If the formula is not good, how about this to make it clearer?


/*
 * We have 32 bits total; 12 bits resolved at level 1, 8 bits at level 2,
-* and 12 bits in a page. With some carefully-chosen coefficients we can
-* hide the ugly inconsistencies behind these macros and at least let the
-* rest of the code pretend to be somewhat sane.
+* and 12 bits in a page.
+*
+* MediaTek extend 2 bits to reach 34 bits, 14 bits at lvl1 and 8 bits at lvl2.
 */

-#define _ARM_V7S_LVL_BITS(lvl) (16 - (lvl) * 4)
+#define _ARM_V7S_LVL1_BITS_NR(cfg) (((cfg)->ias == 32) ? 12 : 14)
+#define _ARM_V7S_LVL2_BITS_NR  8
+
+#define _ARM_V7S_LVL_BITS(lvl, cfg)\
+  (((lvl) == 1) ? _ARM_V7S_LVL1_BITS_NR(cfg):_ARM_V7S_LVL2_BITS_NR)


Well, I'd have gone for something really simple and clear like:

#define ARM_V7S_LVL_BITS(lvl, cfg) ((lvl) == 1 ? (cfg)->ias - 20 : 8)
#define ARM_V7S_LVL_SHIFT(lvl) ((lvl) == 1 ? 20 : 12)

Then maybe see if enough of the users could resolve lvl significantly 
earlier to make it worth splitting things up further.
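
(As a worked example: with cfg->ias == 34 that gives 34 - 20 = 14 bits at
level 1, i.e. 16384 level-1 PTEs, which is exactly the quadrupled level-1
table the commit message describes; the standard ias == 32 yields the
usual 4096.)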


Robin.


   #define ARM_V7S_LVL_SHIFT(lvl)	(ARM_V7S_ADDR_BITS - (4 + 8 * (lvl)))
   #define ARM_V7S_TABLE_SHIFT  10
   
-#define ARM_V7S_PTES_PER_LVL(lvl, cfg)	(1 << _ARM_V7S_LVL_BITS(lvl))

+#define ARM_V7S_PTES_PER_LVL(lvl, cfg) ({  \
+   int _lvl = lvl; \
+   !arm_v7s_is_mtk_enabled(cfg) ?  \
+(1 << _ARM_V7S_LVL_BITS(_lvl)) : (1 << _ARM_V7S_LVL_BITS_MTK(_lvl));\
+})
+
   #define ARM_V7S_TABLE_SIZE(lvl, cfg) \
(ARM_V7S_PTES_PER_LVL(lvl, cfg) * sizeof(arm_v7s_iopte))
   
@@ -63,7 +70,7 @@

   #define _ARM_V7S_IDX_MASK(lvl, cfg)  (ARM_V7S_PTES_PER_LVL(lvl, cfg) - 1)
   #define ARM_V7S_LVL_IDX(addr, lvl, cfg)  ({  \
int _l = lvl;   \
-   ((u32)(addr) >> ARM_V7S_LVL_SHIFT(_l)) & _ARM_V7S_IDX_MASK(_l, cfg); \
+   ((addr) >> ARM_V7S_LVL_SHIFT(_l)) & _ARM_V7S_IDX_MASK(_l, cfg); \
   })
   
   /*

@@ -755,7 +762,7 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg,
   {
struct arm_v7s_io_pgtable *data;
   
-	if (cfg->ias > ARM_V7S_ADDR_BITS)

+   if (cfg->ias > (arm_v7s_is_mtk_enabled(cfg) ? 34 : ARM_V7S_ADDR_BITS))
return NULL;
   
   	if (cfg->oas > (arm_v7s_is_mtk_enabled(cfg) ? 35 : ARM_V7S_ADDR_BITS))

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index f6a2e3eb59d2..6e85c9976a33 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -316,7 +316,7 @@ static int mtk_iommu_domain_finalise(struct mtk_iommu_domain *dom)
IO_PGTABLE_QUIRK_TLBI_ON_MAP |
IO_PGTABLE_QUIRK_ARM_MTK_EXT,
.pgsize_bitmap = mtk_iommu_ops.pgsize_bitmap,
-   .ias = 32,
+   .ias = 34,
.oas = 35,
	.tlb = &mtk_iommu_flush_ops,
.iommu_dev = data->dev,






RE: WARNING in dma_map_page_attrs

2020-10-26 Thread Parav Pandit
Hi Christoph,

> From: Jakub Kicinski 
> Sent: Saturday, October 24, 2020 11:45 PM
> 
> CC: rdma, looks like rdma from the stack trace
> 
> On Fri, 23 Oct 2020 20:07:17 -0700 syzbot wrote:
> > syzbot has found a reproducer for the following issue on:
> >
> > HEAD commit:3cb12d27 Merge tag 'net-5.10-rc1' of git://git.kernel.org/..

In [1] you mentioned that dma_mask should not be set for dma_virt_ops.
So patch [2] removed it.

But a check to validate the dma mask for all dma_ops was added in [3].

What is the right way? Did I misunderstand your comment about dma_mask in [1]?

[1] https://www.spinics.net/lists/linux-rdma/msg96374.html
[2] e0477b34d9d ("RDMA: Explicitly pass in the dma_device to 
ib_register_device")
[3] f959dcd6ddfd ("dma-direct: Fix potential NULL pointer dereference")

> > WARNING: CPU: 1 PID: 8488 at kernel/dma/mapping.c:149 dma_map_page_attrs+0x493/0x700 kernel/dma/mapping.c:149
> > Modules linked in:
> >  dma_map_single_attrs include/linux/dma-mapping.h:279 [inline]
> >  ib_dma_map_single include/rdma/ib_verbs.h:3967 [inline]
> >  ib_mad_post_receive_mads+0x23f/0xd60 drivers/infiniband/core/mad.c:2715
> >  ib_mad_port_start drivers/infiniband/core/mad.c:2862 [inline]
> >  ib_mad_port_open drivers/infiniband/core/mad.c:3016 [inline]
> >  ib_mad_init_device+0x72b/0x1400 drivers/infiniband/core/mad.c:3092
> >  add_client_context+0x405/0x5e0 drivers/infiniband/core/device.c:680
> >  enable_device_and_get+0x1d5/0x3c0 drivers/infiniband/core/device.c:1301
> >  ib_register_device drivers/infiniband/core/device.c:1376 [inline]
> >  ib_register_device+0x7a7/0xa40 drivers/infiniband/core/device.c:1335
> >  rxe_register_device+0x46d/0x570 drivers/infiniband/sw/rxe/rxe_verbs.c:1182
> >  rxe_add+0x12fe/0x16d0 drivers/infiniband/sw/rxe/rxe.c:247
> >  rxe_net_add+0x8c/0xe0 drivers/infiniband/sw/rxe/rxe_net.c:507
> >  rxe_newlink drivers/infiniband/sw/rxe/rxe.c:269 [inline]
> >  rxe_newlink+0xb7/0xe0 drivers/infiniband/sw/rxe/rxe.c:250
> >  nldev_newlink+0x30e/0x540 drivers/infiniband/core/nldev.c:1555
> >  rdma_nl_rcv_msg+0x367/0x690 drivers/infiniband/core/netlink.c:195
> >  rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline]
> >  rdma_nl_rcv+0x2f2/0x440 drivers/infiniband/core/netlink.c:259
> >  netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
> >  netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1330
> >  netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1919
> >  sock_sendmsg_nosec net/socket.c:651 [inline]
> >  sock_sendmsg+0xcf/0x120 net/socket.c:671
> >  sys_sendmsg+0x6e8/0x810 net/socket.c:2353
> >  ___sys_sendmsg+0xf3/0x170 net/socket.c:2407
> >  __sys_sendmsg+0xe5/0x1b0 net/socket.c:2440
> >  do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
> >  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > RIP: 0033:0x443699
> > Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 db 0d fc ff c3 66 2e 0f 1f 84 00 00 00 00
> > RSP: 002b:7ffc067db418 EFLAGS: 0246 ORIG_RAX: 002e
> > RAX: ffda RBX: 0003 RCX: 00443699
> > RDX:  RSI: 22c0 RDI: 0003
> > RBP: 7ffc067db420 R08: 01bb R09: 01bb
> > R10:  R11: 0246 R12: 7ffc067db430
> > R13:  R14:  R15: 
> >



Re: [PATCH v3 08/24] iommu/io-pgtable-arm-v7s: Use ias to check the valid iova in unmap

2020-10-26 Thread Yong Wu
On Fri, 2020-10-23 at 12:17 +0100, Will Deacon wrote:
> On Wed, Sep 30, 2020 at 03:06:31PM +0800, Yong Wu wrote:
> > Use the ias for the valid iova check in arm_v7s_unmap. This is a
> > preparatory patch for supporting 34-bit iova for MediaTek.
> > BTW, change the ias/oas checking format in arm_v7s_map.
> > 
> > Signed-off-by: Yong Wu 
> > ---
> >  drivers/iommu/io-pgtable-arm-v7s.c | 5 ++---
> >  1 file changed, 2 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
> > index a688f22cbe3b..4c9d8dccfc5a 100644
> > --- a/drivers/iommu/io-pgtable-arm-v7s.c
> > +++ b/drivers/iommu/io-pgtable-arm-v7s.c
> > @@ -526,8 +526,7 @@ static int arm_v7s_map(struct io_pgtable_ops *ops, unsigned long iova,
> > if (!(prot & (IOMMU_READ | IOMMU_WRITE)))
> > return 0;
> >  
> > -   if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias) ||
> > -   paddr >= (1ULL << data->iop.cfg.oas)))
> > +   if (WARN_ON(iova >> data->iop.cfg.ias || paddr >> data->iop.cfg.oas))
> > return -ERANGE;
> 
> As discussed when reviewing these for Android, please leave this code as-is.
> 
> >  
> > ret = __arm_v7s_map(data, iova, paddr, size, prot, 1, data->pgd, gfp);
> > @@ -717,7 +716,7 @@ static size_t arm_v7s_unmap(struct io_pgtable_ops *ops, unsigned long iova,
> >  {
> > struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(ops);
> >  
> > -   if (WARN_ON(upper_32_bits(iova)))
> > +   if (WARN_ON(iova >> data->iop.cfg.ias))
> > return 0;
> 
> And avoid the UB here for 32-bit machines by comparing with 1ULL <<
> iop.cfg.ias instead.

Thanks. I will fix it in next version.

> 
> Thanks,
> 
> Will
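
For reference, the unmap-side check with both of Will's comments applied
might look like this (a sketch only; the posted next version may differ):

static size_t arm_v7s_unmap(struct io_pgtable_ops *ops, unsigned long iova,
			    size_t size, struct iommu_iotlb_gather *gather)
{
	struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(ops);

	/* 1ULL << ias avoids an undefined 32-bit shift when ias == 32 */
	if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias)))
		return 0;

	return __arm_v7s_unmap(data, gather, iova, size, 1, data->pgd);
}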



Re: [PATCH v3 11/24] iommu/io-pgtable-arm-v7s: Quad lvl1 pgtable for MediaTek

2020-10-26 Thread Yong Wu
On Fri, 2020-10-23 at 12:21 +0100, Will Deacon wrote:
> On Wed, Sep 30, 2020 at 03:06:34PM +0800, Yong Wu wrote:
> > The standard input iova width is 32 bits. MediaTek quadruples the lvl1
> > pagetable (4 * lvl1); the lvl2 pagetable is unchanged. The iova width
> > can then reach 34 bits.
> > 
> > Signed-off-by: Yong Wu 
> > ---
> >  drivers/iommu/io-pgtable-arm-v7s.c | 13 ++---
> >  drivers/iommu/mtk_iommu.c  |  2 +-
> >  2 files changed, 11 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
> > index 8362fdf76657..306bae2755ed 100644
> > --- a/drivers/iommu/io-pgtable-arm-v7s.c
> > +++ b/drivers/iommu/io-pgtable-arm-v7s.c
> > @@ -50,10 +50,17 @@
> >   */
> >  #define ARM_V7S_ADDR_BITS  32
> 
> If we rename this to _ARM_V7S_ADDR_BITS can we then have ARM_V7S_ADDR_BITS
> take a cfg parameter and move the arm_v7s_is_mtk_enabled() check in there?
> Same for _ARM_V7S_LVL_BITS.
> 
> That would avoid scattering arm_v7s_is_mtk_enabled() into all the users.

I added "cfg" to _ARM_V7S_LVL_BITS in my reply to Robin's mail. Is that ok?

Regarding ARM_V7S_ADDR_BITS, I'd like to keep it as is (without adding
cfg); this macro is only used in ARM_V7S_LVL_SHIFT and in the checking
of the ias/oas values.

a) ARM_V7S_LVL_SHIFT always expects ARM_V7S_ADDR_BITS to be 32.

b) Our ias and oas differ (ias is 34 while oas is 35). If we defined new
macros, we would need two of them, like:
#define ARM_V7S_IADDR_BITS(cfg) (!arm_v7s_is_mtk_enabled(cfg) ? 32 : 34)
#define ARM_V7S_OADDR_BITS(cfg) (!arm_v7s_is_mtk_enabled(cfg) ? 32 : 35)
and the two would only be used in the ias/oas checks.

Thus it looks unnecessary?

> 
> Will



Re: [PATCH v3 11/24] iommu/io-pgtable-arm-v7s: Quad lvl1 pgtable for MediaTek

2020-10-26 Thread Yong Wu
On Fri, 2020-10-23 at 15:10 +0100, Robin Murphy wrote:
> On 2020-09-30 08:06, Yong Wu wrote:
> > The standard input iova width is 32 bits. MediaTek quadruples the lvl1
> > pagetable (4 * lvl1); the lvl2 pagetable is unchanged. The iova width
> > can then reach 34 bits.
> > 
> > Signed-off-by: Yong Wu 
> > ---
> >   drivers/iommu/io-pgtable-arm-v7s.c | 13 ++---
> >   drivers/iommu/mtk_iommu.c  |  2 +-
> >   2 files changed, 11 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
> > index 8362fdf76657..306bae2755ed 100644
> > --- a/drivers/iommu/io-pgtable-arm-v7s.c
> > +++ b/drivers/iommu/io-pgtable-arm-v7s.c
> > @@ -50,10 +50,17 @@
> >   */
> >  #define ARM_V7S_ADDR_BITS		32
> >  #define _ARM_V7S_LVL_BITS(lvl)	(16 - (lvl) * 4)
> > +/* MediaTek: totally 34bits, 14bits at lvl1 and 8bits at lvl2. */
> > +#define _ARM_V7S_LVL_BITS_MTK(lvl)	(20 - (lvl) * 6)
> 
> This should be defined in terms of both lvl and cfg->ias. The formula here 
> is nothing more than a disgusting trick I made up since a linear 
> interpolation happened to fit the required numbers. That said, all of 
> these bits pretending that short-descriptor is a well-defined recursive 
> format only served to allow the rest of the code to look more like the 
> LPAE code - IIRC they've already diverged a fair bit since then, so 
> frankly a lot of this could stand to be unpicked and made considerably 
> clearer by simply accepting that level 1 and level 2 are different from 
> each other.

If the formula is not good, how about the following to make it clearer?


 /*
  * We have 32 bits total; 12 bits resolved at level 1, 8 bits at level 2,
- * and 12 bits in a page. With some carefully-chosen coefficients we can
- * hide the ugly inconsistencies behind these macros and at least let the
- * rest of the code pretend to be somewhat sane.
+ * and 12 bits in a page.
+ *
+ * MediaTek extend 2 bits to reach 34 bits, 14 bits at lvl1 and 8 bits at lvl2.
  */

-#define _ARM_V7S_LVL_BITS(lvl)	(16 - (lvl) * 4)
+#define _ARM_V7S_LVL1_BITS_NR(cfg)	(((cfg)->ias == 32) ? 12 : 14)
+#define _ARM_V7S_LVL2_BITS_NR		8
+
+#define _ARM_V7S_LVL_BITS(lvl, cfg)	\
+	(((lvl) == 1) ? _ARM_V7S_LVL1_BITS_NR(cfg) : _ARM_V7S_LVL2_BITS_NR)
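
As a standalone illustration of the resulting sizes (these numbers are
mine, not from the patch): the level-1 table grows from 16 KB to 64 KB
when ias goes from 32 to 34, i.e. the "quad lvl1" of the subject.

#include <stdio.h>

/* Mirror of the proposed macros, duplicated here for illustration only. */
#define LVL1_BITS(ias)	((ias) == 32 ? 12 : 14)
#define LVL2_BITS	8
#define PTE_SIZE	4	/* sizeof(arm_v7s_iopte) */

int main(void)
{
	int ias[] = { 32, 34 };

	for (int i = 0; i < 2; i++) {
		int entries = 1 << LVL1_BITS(ias[i]);

		printf("ias=%d: lvl1 = %d entries (%d KB), lvl2 = %d entries\n",
		       ias[i], entries, entries * PTE_SIZE / 1024,
		       1 << LVL2_BITS);
	}
	/* prints 16 KB for ias=32 and 64 KB (4 x 16 KB) for ias=34 */
	return 0;
}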

> Robin.
> 
> >  #define ARM_V7S_LVL_SHIFT(lvl)	(ARM_V7S_ADDR_BITS - (4 + 8 * (lvl)))
> >  #define ARM_V7S_TABLE_SHIFT		10
> >  
> > -#define ARM_V7S_PTES_PER_LVL(lvl, cfg)	(1 << _ARM_V7S_LVL_BITS(lvl))
> > +#define ARM_V7S_PTES_PER_LVL(lvl, cfg)	({			\
> > +	int _lvl = lvl;						\
> > +	!arm_v7s_is_mtk_enabled(cfg) ?				\
> > +	 (1 << _ARM_V7S_LVL_BITS(_lvl)) : (1 << _ARM_V7S_LVL_BITS_MTK(_lvl));\
> > +})
> > +
> >  #define ARM_V7S_TABLE_SIZE(lvl, cfg)				\
> >  	(ARM_V7S_PTES_PER_LVL(lvl, cfg) * sizeof(arm_v7s_iopte))
> >  
> > @@ -63,7 +70,7 @@
> >  #define _ARM_V7S_IDX_MASK(lvl, cfg)	(ARM_V7S_PTES_PER_LVL(lvl, cfg) - 1)
> >  #define ARM_V7S_LVL_IDX(addr, lvl, cfg)	({			\
> >  	int _l = lvl;						\
> > -	((u32)(addr) >> ARM_V7S_LVL_SHIFT(_l)) & _ARM_V7S_IDX_MASK(_l, cfg); \
> > +	((addr) >> ARM_V7S_LVL_SHIFT(_l)) & _ARM_V7S_IDX_MASK(_l, cfg); \
> >  })
> >  
> >  /*
> > @@ -755,7 +762,7 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg,
> >  {
> >  	struct arm_v7s_io_pgtable *data;
> >  
> > -	if (cfg->ias > ARM_V7S_ADDR_BITS)
> > +	if (cfg->ias > (arm_v7s_is_mtk_enabled(cfg) ? 34 : ARM_V7S_ADDR_BITS))
> >  		return NULL;
> >  
> >  	if (cfg->oas > (arm_v7s_is_mtk_enabled(cfg) ? 35 : ARM_V7S_ADDR_BITS))
> > diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> > index f6a2e3eb59d2..6e85c9976a33 100644
> > --- a/drivers/iommu/mtk_iommu.c
> > +++ b/drivers/iommu/mtk_iommu.c
> > @@ -316,7 +316,7 @@ static int mtk_iommu_domain_finalise(struct mtk_iommu_domain *dom)
> >  			   IO_PGTABLE_QUIRK_TLBI_ON_MAP |
> >  			   IO_PGTABLE_QUIRK_ARM_MTK_EXT,
> >  		.pgsize_bitmap = mtk_iommu_ops.pgsize_bitmap,
> > -		.ias = 32,
> > +		.ias = 34,
> >  		.oas = 35,
> >  		.tlb = &mtk_iommu_flush_ops,
> >  		.iommu_dev = data->dev,
> > 



[PATCH 1/1] iommu: Fix deferred domain attachment in iommu_probe_device()

2020-10-26 Thread Lu Baolu
The IOMMU core supports deferring the attachment of the default domain
to a device. Fix a missed deferred-attach check in iommu_probe_device().

Fixes: cf193888bfbd3 ("iommu: Move new probe_device path to separate function")
Cc: sta...@vger.kernel.org # v5.8+
Signed-off-by: Lu Baolu 
---
 drivers/iommu/iommu.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 6d847027d35e..690abf60239d 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -93,6 +93,8 @@ static void __iommu_detach_group(struct iommu_domain *domain,
 static int iommu_create_device_direct_mappings(struct iommu_group *group,
 					       struct device *dev);
 static struct iommu_group *iommu_group_get_for_dev(struct device *dev);
+static bool iommu_is_attach_deferred(struct iommu_domain *domain,
+				     struct device *dev);
 
 #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store)		\
 struct iommu_group_attribute iommu_group_attr_##_name =	\
@@ -264,7 +266,8 @@ int iommu_probe_device(struct device *dev)
 	 */
 	iommu_alloc_default_domain(group, dev);
 
-	if (group->default_domain)
+	if (group->default_domain &&
+	    !iommu_is_attach_deferred(group->default_domain, dev))
 		ret = __iommu_attach_device(group->default_domain, dev);
 
 	iommu_create_device_direct_mappings(group, dev);
-- 
2.17.1
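
For reference, iommu_is_attach_deferred(), which the patch forward-declares,
just defers to the driver callback; a sketch of its shape around this kernel
version (it may not match every tree exactly):

static bool iommu_is_attach_deferred(struct iommu_domain *domain,
				     struct device *dev)
{
	/* Ask the IOMMU driver; the default is "attach immediately". */
	if (domain->ops->is_attach_deferred)
		return domain->ops->is_attach_deferred(domain, dev);

	return false;
}

So the fix changes nothing for drivers without ->is_attach_deferred; only
drivers implementing the callback skip the early __iommu_attach_device().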
