Re: [PATCH] iommu/io-pgtable-arm: Allow non-coherent masters to use system cache

2021-01-07 Thread Sai Prakash Ranjan

On 2021-01-07 22:27, isa...@codeaurora.org wrote:

On 2021-01-06 03:56, Will Deacon wrote:

On Thu, Dec 24, 2020 at 12:10:07PM +0530, Sai Prakash Ranjan wrote:
commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
removed the unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
the memory type setting required for the non-coherent masters to use
system cache. Now that system cache support for GPU is added, we will
need to mark the memory as normal sys-cached for GPU to use system cache.
Without this, the system cache lines are not allocated for GPU. We use
the IO_PGTABLE_QUIRK_ARM_OUTER_WBWA quirk instead of a page protection
flag as the flag cannot be exposed via the DMA API because of no in-tree
users.

Signed-off-by: Sai Prakash Ranjan 
---
 drivers/iommu/io-pgtable-arm.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 7c9ea9d7874a..3fb7de8304a2 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -415,6 +415,9 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
 	else if (prot & IOMMU_CACHE)
 		pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE
 			<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
+	else if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA)
+		pte |= (ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE
+			<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
 }



While this approach of enabling system cache globally for both page
tables and other buffers
works for the GPU usecase, this isn't ideal for other clients that use
system cache. For example,
video clients only want to cache a subset of their buffers in the
system cache, due to the sizing constraint
imposed by how much of the system cache they can use. So, it would be
ideal to have
a way of expressing the desire to use the system cache on a per-buffer
basis. Additionally,
our video clients use the DMA layer, and since the requirement is for
caching in the system cache
to be a per buffer attribute, it seems like we would have to have a
DMA attribute to express
this on a per-buffer basis.



I did bring this up initially [1]. Also, where is this video client
upstream? AFAIK, the only system cache user upstream is the GPU.
We cannot add any DMA attribute unless there is a user upstream,
as per [2]. So when support for such a client is added, wouldn't
((data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA) || PROT_FLAG)
work?

[1] 
https://lore.kernel.org/dri-devel/ecfda7ca80f6d7b4ff3d89b8758f4...@codeaurora.org/

[2] https://lore.kernel.org/linux-iommu/20191026053026.ga14...@lst.de/T/
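To make that concrete, a rough sketch of the combined check I have in
mind, assuming a hypothetical IOMMU_SYS_CACHE prot flag is reinstated
once an in-tree user exists (the flag name is illustrative only):

	/* in arm_lpae_prot_to_pte() */
	else if ((data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA) ||
		 (prot & IOMMU_SYS_CACHE))	/* IOMMU_SYS_CACHE: hypothetical */
		pte |= (ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE
			<< ARM_LPAE_PTE_ATTRINDX_SHIFT);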

Thanks,
Sai

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


Re: [RFC PATCH 1/2] dt-bindings: iommu: add bindings for sprd iommu

2021-01-07 Thread Rob Herring
On Wed, Dec 23, 2020 at 07:16:32PM +0800, Chunyan Zhang wrote:
> From: Chunyan Zhang 
> 
> This patch only adds bindings to support the display IOMMU; support for
> others would be added once tests with those devices are finished, such as
> the image codec (JPEG) processor and a few signal processors, including
> VSP (video), GSP (graphic), ISP (image), and camera CPP.
> 
> Signed-off-by: Chunyan Zhang 
> ---
>  .../devicetree/bindings/iommu/sprd,iommu.yaml | 44 +++
>  1 file changed, 44 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/iommu/sprd,iommu.yaml
> 
> diff --git a/Documentation/devicetree/bindings/iommu/sprd,iommu.yaml 
> b/Documentation/devicetree/bindings/iommu/sprd,iommu.yaml
> new file mode 100644
> index ..4d9a578a7cc9
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/iommu/sprd,iommu.yaml
> @@ -0,0 +1,44 @@
> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +# Copyright 2020 Unisoc Inc.
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/iommu/sprd,iommu.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Unisoc IOMMU and Multi-media MMU
> +
> +maintainers:
> +  - Chunyan Zhang 
> +
> +properties:
> +  compatible:
> +enum:
> +  - sprd,iommu-disp

Needs to be SoC specific. Is this block specific to the display subsys, or
does that just happen to be where this instance is?

> +
> +  reg:
> +maxItems: 1
> +
> +  "#iommu-cells":
> +const: 0
> +description:
> +  Unisoc IOMMUs are all single-master IOMMU devices, therefore no
> +  additional information needs to associate with its master device.
> +  Please refer to the generic bindings document for more details,
> +  Documentation/devicetree/bindings/iommu/iommu.txt
> +
> +required:
> +  - compatible
> +  - reg
> +  - "#iommu-cells"
> +
> +additionalProperties: false
> +
> +examples:
> +  - |
> +iommu_disp: iommu@6300 {
> +  compatible = "sprd,iommu-disp";
> +  reg = <0x6300 0x880>;
> +  #iommu-cells = <0>;
> +};
> +
> +...
> -- 
> 2.25.1
> 


[PATCH] dma-mapping: benchmark: fix kernel crash when dma_map_single fails

2021-01-07 Thread Barry Song
If dma_map_single() fails, the kernel gives the oops below: the benchmark
threads exit on their own, their task_structs are destroyed, and the
subsequent kthread_stop() runs into a use-after-free, corrupting memory:

[   48.095310] Unable to handle kernel paging request at virtual address 00c473548040
[   48.095736] Mem abort info:
[   48.095864]   ESR = 0x9604
[   48.096025]   EC = 0x25: DABT (current EL), IL = 32 bits
[   48.096268]   SET = 0, FnV = 0
[   48.096401]   EA = 0, S1PTW = 0
[   48.096538] Data abort info:
[   48.096659]   ISV = 0, ISS = 0x0004
[   48.096820]   CM = 0, WnR = 0
[   48.097079] user pgtable: 4k pages, 48-bit VAs, pgdp=000104639000
[   48.098099] [00c473548040] pgd=, p4d=
[   48.098832] Internal error: Oops: 9604 [#1] PREEMPT SMP
[   48.099232] Modules linked in:
[   48.099387] CPU: 0 PID: 2 Comm: kthreadd Tainted: GW
[   48.099887] Hardware name: linux,dummy-virt (DT)
[   48.100078] pstate: 6005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
[   48.100516] pc : __kmalloc_node+0x214/0x368
[   48.100944] lr : __kmalloc_node+0x1f4/0x368
[   48.101458] sp : 800011f0bb80
[   48.101843] x29: 800011f0bb80 x28: c0098ec0
[   48.102330] x27:  x26: 001d4600
[   48.102648] x25: c0098ec0 x24: 800011b6a000
[   48.102988] x23:  x22: c0098ec0
[   48.10] x21: 8000101d7a54 x20: 0dc0
[   48.103657] x19: c0001e00 x18: 
[   48.104069] x17:  x16: 
[   48.105449] x15: 01aa0304e7b9 x14: 03b1
[   48.106401] x13: 8000122d5000 x12: 80001228d000
[   48.107296] x11: c0154340 x10: 
[   48.107862] x9 : 8fff x8 : c473527f
[   48.108326] x7 : 800011e62f58 x6 : c01c8ed8
[   48.108778] x5 : c0098ec0 x4 : 
[   48.109223] x3 : 001d4600 x2 : 0040
[   48.109656] x1 : 0001 x0 : ffc473548000
[   48.110104] Call trace:
[   48.110287]  __kmalloc_node+0x214/0x368
[   48.110493]  __vmalloc_node_range+0xc4/0x298
[   48.110805]  copy_process+0x2c8/0x15c8
[   48.33]  kernel_clone+0x5c/0x3c0
[   48.111373]  kernel_thread+0x64/0x90
[   48.111604]  kthreadd+0x158/0x368
[   48.111810]  ret_from_fork+0x10/0x30
[   48.112336] Code: 17e9 b9402a62 b94008a1 11000421 (f8626802)
[   48.112884] ---[ end trace d4890e21e75419d5 ]---
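The underlying pattern of the fix: a kthread that can exit on its own
must have its task_struct pinned before kthread_stop() is called on it.
A minimal sketch (worker_fn and data are placeholders, not the actual
benchmark code):

	struct task_struct *tsk;

	tsk = kthread_create(worker_fn, data, "dma-map-benchmark");
	get_task_struct(tsk);	/* pin: the thread may exit early on error */
	wake_up_process(tsk);

	/* ... if dma_map_single() fails, the thread exits on its own,
	 * but the reference keeps its task_struct alive ... */

	ret = kthread_stop(tsk);	/* safe even if the thread already exited */
	put_task_struct(tsk);		/* drop the pin */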

Signed-off-by: Barry Song 
---
 kernel/dma/map_benchmark.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/dma/map_benchmark.c b/kernel/dma/map_benchmark.c
index b1496e744c68..1b1b8ff875cb 100644
--- a/kernel/dma/map_benchmark.c
+++ b/kernel/dma/map_benchmark.c
@@ -147,8 +147,10 @@ static int do_map_benchmark(struct map_benchmark_data *map)
atomic64_set(&map->sum_sq_unmap, 0);
atomic64_set(&map->loops, 0);
 
-   for (i = 0; i < threads; i++)
+   for (i = 0; i < threads; i++) {
+   get_task_struct(tsk[i]);
wake_up_process(tsk[i]);
+   }
 
msleep_interruptible(map->bparam.seconds * 1000);
 
@@ -183,6 +185,8 @@ static int do_map_benchmark(struct map_benchmark_data *map)
}
 
 out:
+   for (i = 0; i < threads; i++)
+   put_task_struct(tsk[i]);
put_device(map->dev);
kfree(tsk);
return ret;
-- 
2.25.1



Re: [PATCH 3/5] iommu/vt-d: Remove unused dma map/unmap trace events

2021-01-07 Thread Lu Baolu

Hi Will,

On 2021/1/7 22:40, Will Deacon wrote:

On Wed, Jan 06, 2021 at 09:14:22AM +0800, Lu Baolu wrote:

On 2021/1/6 3:04, Will Deacon wrote:

On Thu, Dec 31, 2020 at 08:53:21AM +0800, Lu Baolu wrote:

With commit c588072bba6b5 ("iommu/vt-d: Convert intel iommu driver to
the iommu ops"), the trace events for dma map/unmap have no users any
more. Remove them so that they don't show up under
/sys/kernel/debug/tracing/events/intel_iommu. The users should use the
map/unmap traces defined in the iommu core from now on.

Fixes: c588072bba6b5 ("iommu/vt-d: Convert intel iommu driver to the iommu ops")
Signed-off-by: Lu Baolu 
---
   include/trace/events/intel_iommu.h | 119 -
   1 file changed, 119 deletions(-)


Is this needed in 5.11, or can it wait until 5.12?


It's necessary for 5.11 as far as I can see. Without this, users still
see the events under /sys/kernel/debug/tracing/events/intel_iommu, but
they will get nothing traced even if they enable the events.


I'm just a bit wary about breaking userspace by removing them altogether,
although I see that there's plenty of precedent for removing events from
the include/trace/events directory, so it's probably fine.

However, the patch as-is results in this warning for me:

  | In file included from include/trace/define_trace.h:102,
  |  from include/trace/events/intel_iommu.h:22,
  |  from drivers/iommu/intel/trace.c:14:
  | include/trace/trace_events.h:27:23: warning: ‘str__intel_iommu__trace_system_name’ defined but not used [-Wunused-const-variable=]
  |27 | #define __app__(x, y) str__##x##y
  |   |   ^
  | include/trace/trace_events.h:28:21: note: in expansion of macro ‘__app__’
  |28 | #define __app(x, y) __app__(x, y)
  |   | ^~~
  | include/trace/trace_events.h:30:29: note: in expansion of macro ‘__app’
  |30 | #define TRACE_SYSTEM_STRING __app(TRACE_SYSTEM_VAR,__trace_system_name)
  |   | ^
  | include/trace/trace_events.h:33:20: note: in expansion of macro ‘TRACE_SYSTEM_STRING’
  |33 |  static const char TRACE_SYSTEM_STRING[] = \
  |   |^~~
  | include/trace/trace_events.h:36:1: note: in expansion of macro ‘TRACE_MAKE_SYSTEM_STR’
  |36 | TRACE_MAKE_SYSTEM_STR();
  |   | ^

so I'll drop this for now.


Okay, I will rework this. Thanks!



Will



Best regards,
baolu

Re: [PATCH 2/5] iommu/vt-d: Fix unaligned addresses for intel_flush_svm_range_dev()

2021-01-07 Thread Lu Baolu

Hi Will,

On 2021/1/6 9:09, Lu Baolu wrote:

Hi Will,

Happy New Year!

On 2021/1/6 3:03, Will Deacon wrote:

On Thu, Dec 31, 2020 at 08:53:20AM +0800, Lu Baolu wrote:

The VT-d hardware will ignore those Addr bits which have been masked by
the AM field in the PASID-based-IOTLB invalidation descriptor. As a
result, if the starting address in the descriptor is not aligned with
the address mask, some IOTLB caches might not be invalidated. Hence
people will see errors like the ones below.

[ 1093.704661] dmar_fault: 29 callbacks suppressed
[ 1093.704664] DMAR: DRHD: handling fault status reg 3
[ 1093.712738] DMAR: [DMA Read] Request device [7a:02.0] PASID 2 fault addr 7f81c968d000 [fault reason 113] SM: Present bit in first-level paging entry is clear

Fix this by using aligned address for PASID-based-IOTLB invalidation.
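To illustrate with assumed numbers (pages = 2, i.e. an 8KiB range; the
address is taken from the fault log above):

	/*
	 * shift = ilog2(__roundup_pow_of_two(2))   = 1
	 * align = 1UL << (VTD_PAGE_SHIFT + shift)  = 0x2000
	 * address = 0x7f81c968d000 (not 0x2000-aligned)
	 *
	 * A single descriptor with AM = 1 has address bit 12 masked by
	 * hardware, so it invalidates [...c000, ...e000) and silently
	 * misses the [...e000, ...f000) page. The fix instead starts at
	 * ALIGN_DOWN(address, align) and emits aligned descriptors for
	 * [...c000, ...e000) and [...e000, ...10000), covering the whole
	 * requested range.
	 */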

Fixes: 1c4f88b7f1f92 ("iommu/vt-d: Shared virtual address in scalable mode")

Reported-and-tested-by: Guo Kaijie 
Signed-off-by: Lu Baolu 
---
  drivers/iommu/intel/svm.c | 22 --
  1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 69566695d032..b16a4791acfb 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -118,8 +118,10 @@ void intel_svm_check(struct intel_iommu *iommu)
  iommu->flags |= VTD_FLAG_SVM_CAPABLE;
  }
-static void intel_flush_svm_range_dev (struct intel_svm *svm, struct intel_svm_dev *sdev,
-				unsigned long address, unsigned long pages, int ih)
+static void __flush_svm_range_dev(struct intel_svm *svm,
+  struct intel_svm_dev *sdev,
+  unsigned long address,
+  unsigned long pages, int ih)
  {
  struct qi_desc desc;
@@ -170,6 +172,22 @@ static void intel_flush_svm_range_dev (struct intel_svm *svm, struct intel_svm_d
  }
  }
+static void intel_flush_svm_range_dev(struct intel_svm *svm,
+  struct intel_svm_dev *sdev,
+  unsigned long address,
+  unsigned long pages, int ih)
+{
+    unsigned long shift = ilog2(__roundup_pow_of_two(pages));
+    unsigned long align = (1ULL << (VTD_PAGE_SHIFT + shift));
+    unsigned long start = ALIGN_DOWN(address, align);
+    unsigned long end = ALIGN(address + (pages << VTD_PAGE_SHIFT), align);
+
+    while (start < end) {
+    __flush_svm_range_dev(svm, sdev, start, align >> VTD_PAGE_SHIFT, ih);
+    start += align;
+    }
+}


Given that this only seems to be called from intel_invalidate_range(),
which has to compute 'pages' only to have it pulled apart again here,
perhaps it would be cleaner for intel_flush_svm_range() to take something
like an 'order' argument instead?

What do you think?


We need to clean up here: it duplicates the qi_flush_piotlb() helper.
I have a patch under testing for this and will post it for review
later.


I'm sorry, the above reply was a little vague.

I meant to say: let's take 'pages' as the argument. We are going to use
qi_flush_piotlb() here to avoid duplicate QI interactions. The
qi_flush_piotlb() helper also takes 'pages', so keeping 'pages' here
will make things easier.

My cleanup patch is for v5.12. Can you please take this one for v5.11?
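For reference, the direction would be roughly this (untested sketch;
the sdev->iommu and sdev->did fields are my assumptions based on the
existing code):

static void intel_flush_svm_range_dev(struct intel_svm *svm,
				      struct intel_svm_dev *sdev,
				      unsigned long address,
				      unsigned long pages, int ih)
{
	/* Let the common helper build the aligned PASID-based-IOTLB
	 * invalidation descriptors instead of open-coding them here. */
	qi_flush_piotlb(sdev->iommu, sdev->did, svm->pasid,
			address, pages, ih);
}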

Best regards,
baolu

Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool

2021-01-07 Thread Konrad Rzeszutek Wilk
On Thu, Jan 07, 2021 at 10:09:14AM -0800, Florian Fainelli wrote:
> On 1/7/21 9:57 AM, Konrad Rzeszutek Wilk wrote:
> > On Fri, Jan 08, 2021 at 01:39:18AM +0800, Claire Chang wrote:
> >> Hi Greg and Konrad,
> >>
> >> This change is intended to be non-arch specific. Any arch that lacks DMA 
> >> access
> >> control and has devices not behind an IOMMU can make use of it. Could you 
> >> share
> >> why you think this should be arch specific?
> > 
> > The idea behind non-arch-specific code is for it to be generic. The devicetree
> > is specific to PowerPC, Sparc, and ARM, and not to x86 - hence it should
> > be in arch-specific code.
> 
> In premise the same code could be used with an ACPI enabled system with
> an appropriate service to identify the restricted DMA regions and unlock
> them.

Which this patchset is not.

> 
> More than one architecture requiring this function (ARM and ARM64 are the
> two I can think of needing this immediately) sort of calls for making
> the code architecture agnostic, since past two you need something that scales.

I believe the use-case is for ARM64 at this moment.

> 
> There is already code today under kernel/dma/contiguous.c that is only
> activated on a CONFIG_OF=y && CONFIG_OF_RESERVED_MEM=y system, this is
> no different.
> -- 
> Florian


Re: [RFC PATCH v3 5/6] dt-bindings: of: Add restricted DMA pool

2021-01-07 Thread Florian Fainelli
On 1/7/21 10:00 AM, Konrad Rzeszutek Wilk wrote:
>>>
>>>
>>>  - Nothing stops the physical device from bypassing the SWIOTLB buffer.
>>>That is if an errant device screwed up the length or DMA address, the
>>>SWIOTLB would gladly do what the device told it do?
>>
>> So the system needs to provide a way to lock down the memory access, e.g. 
>> MPU.
> 
> OK! Would it be prudent to have this in the description above perhaps?

Yes, this is something that must be documented as a requirement for
restricted DMA pool users; otherwise attempting to do a restricted DMA
pool is no different than, say, using a device-private CMA region.
Without the enforcement, this is just best effort.
-- 
Florian


Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool

2021-01-07 Thread Florian Fainelli
On 1/7/21 9:57 AM, Konrad Rzeszutek Wilk wrote:
> On Fri, Jan 08, 2021 at 01:39:18AM +0800, Claire Chang wrote:
>> Hi Greg and Konrad,
>>
>> This change is intended to be non-arch specific. Any arch that lacks DMA 
>> access
>> control and has devices not behind an IOMMU can make use of it. Could you 
>> share
>> why you think this should be arch specific?
> 
> The idea behind non-arch-specific code is for it to be generic. The devicetree
> is specific to PowerPC, Sparc, and ARM, and not to x86 - hence it should
> be in arch-specific code.

In premise the same code could be used with an ACPI enabled system with
an appropriate service to identify the restricted DMA regions and unlock
them.

More than one architecture requiring this function (ARM and ARM64 are the
two I can think of needing this immediately) sort of calls for making
the code architecture agnostic, since past two you need something that scales.

There is already code today under kernel/dma/contiguous.c that is only
activated on a CONFIG_OF=y && CONFIG_OF_RESERVED_MEM=y system, this is
no different.
-- 
Florian



Re: [RFC PATCH v3 5/6] dt-bindings: of: Add restricted DMA pool

2021-01-07 Thread Konrad Rzeszutek Wilk
On Fri, Jan 08, 2021 at 01:39:43AM +0800, Claire Chang wrote:
> On Thu, Jan 7, 2021 at 2:58 AM Konrad Rzeszutek Wilk
>  wrote:
> >
> > On Wed, Jan 06, 2021 at 11:41:23AM +0800, Claire Chang wrote:
> > > Introduce the new compatible string, restricted-dma-pool, for restricted
> > > DMA. One can specify the address and length of the restricted DMA memory
> > > region by restricted-dma-pool in the device tree.
> > >
> > > Signed-off-by: Claire Chang 
> > > ---
> > >  .../reserved-memory/reserved-memory.txt   | 24 +++
> > >  1 file changed, 24 insertions(+)
> > >
> > > diff --git a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > > index e8d3096d922c..44975e2a1fd2 100644
> > > --- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > > +++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > > @@ -51,6 +51,20 @@ compatible (optional) - standard definition
> > >    used as a shared pool of DMA buffers for a set of devices. It can
> > >    be used by an operating system to instantiate the necessary pool
> > >    management subsystem if necessary.
> > > +- restricted-dma-pool: This indicates a region of memory meant to be
> > > +  used as a pool of restricted DMA buffers for a set of devices. The
> > > +  memory region would be the only region accessible to those devices.
> > > +  When using this, the no-map and reusable properties must not be set,
> > > +  so the operating system can create a virtual mapping that will be used
> > > +  for synchronization. The main purpose for restricted DMA is to
> > > +  mitigate the lack of DMA access control on systems without an IOMMU,
> > > +  which could result in the DMA accessing the system memory at
> > > +  unexpected times and/or unexpected addresses, possibly leading to data
> > > +  leakage or corruption. The feature on its own provides a basic level
> > > +  of protection against the DMA overwriting buffer contents at
> > > +  unexpected times. However, to protect against general data leakage and
> > > +  system memory corruption, the system needs to provide a way to restrict
> > > +  the DMA to a predefined memory region.
> >
> > Heya!
> >
> > I think I am missing something obvious here so please bear with my
> > questions:
> >
> >  - This code adds the means of having the SWIOTLB pool tied to a specific
> >    memory, correct?
> 
> It doesn't affect the existing SWIOTLB. It just utilizes the existing SWIOTLB
> code to create another DMA pool tied to a specific memory region for a given set
> of devices. It bounces the streaming DMA (map/unmap) in and out of that region
> and does the memory allocation (dma_direct_alloc) from the same region.

Right, so why can't it follow the same mechanism that Xen SWIOTLB does -
which had exactly the same problem (needed special handling on the pool) -
and use similar code?

> 
> >
> >
> >  - Nothing stops the physical device from bypassing the SWIOTLB buffer.
> >That is if an errant device screwed up the length or DMA address, the
> >SWIOTLB would gladly do what the device told it do?
> 
> So the system needs to provide a way to lock down the memory access, e.g. MPU.

OK! Would it be prudent to have this in the description above perhaps?
> 
> >
> >  - This has to be combined with SWIOTLB-force-ish to always use the
> >bounce buffer, otherwise you could still do DMA without using
> >SWIOTLB (by not hitting the criteria for needing to use SWIOTLB)?
> 
> Since restricted DMA is for the devices that are not behind an IOMMU, I change
> the criteria
> `if (unlikely(swiotlb_force == SWIOTLB_FORCE))`
> to
> `if (unlikely(swiotlb_force == SWIOTLB_FORCE) || dev->dma_io_tlb_mem)`
> in dma_direct_map_page().
> 
> Also, even if SWIOTLB=force, the restricted DMA pool is preferred if available
> (get_io_tlb_mem in https://lore.kernel.org/patchwork/patch/1360995/).
> 
> Thanks!


Re: [RFC PATCH v3 0/6] Restricted DMA

2021-01-07 Thread Florian Fainelli
On 1/7/21 9:42 AM, Claire Chang wrote:

>> Can you explain how ATF gets involved and to what extent it does help,
>> besides enforcing a secure region from the ARM CPU's perpsective? Does
>> the PCIe root complex not have an IOMMU but can somehow be denied access
>> to a region that is marked NS=0 in the ARM CPU's MMU? If so, that is
>> still some sort of basic protection that the HW enforces, right?
> 
> We need the ATF support for the memory MPU (memory protection unit).
> Restricted DMA (with reserved-memory in dts) makes sure the predefined memory
> region is for PCIe DMA only, but we still need the MPU to lock down PCIe access
> to those specific regions.

OK, so you do have a protection unit of some sort to enforce which region
in DRAM the PCIe bridge is allowed to access; that makes sense, since
otherwise the restricted DMA region would only be a hint and nothing you
could really enforce. This is almost entirely analogous to our systems then.

There may be some value in standardizing on an ARM SMCCC call then since
you already support two different SoC vendors.

> 
>>
>> On Broadcom STB SoCs we have had something similar for a while however
>> and while we don't have an IOMMU for the PCIe bridge, we do have a
>> basic protection mechanism whereby we can configure a region in DRAM to
>> be PCIe read/write and CPU read/write which then gets used as the PCIe
>> inbound region for the PCIe EP. By default the PCIe bridge is not
>> allowed access to DRAM so we must call into a security agent to allow
>> the PCIe bridge to access the designated DRAM region.
>>
>> We have done this using a private CMA area region assigned via Device
>> Tree, assigned with a and requiring the PCIe EP driver to use
>> dma_alloc_from_contiguous() in order to allocate from this device
>> private CMA area. The only drawback with that approach is that it
>> requires knowing how much memory you need up front for buffers and DMA
>> descriptors that the PCIe EP will need to process. The problem is that
>> it requires driver modifications and that does not scale over the number
>> of PCIe EP drivers, some we absolutely do not control, but there is no
>> need to bounce buffer. Your approach scales better across PCIe EP
>> drivers however it does require bounce buffering which could be a
>> performance hit.
> 
> Only the streaming DMA (map/unmap) needs bounce buffering.

True, and typically only on transmit since you don't really control
where the sk_buff are allocated from, right? On RX since you need to
hand buffer addresses to the WLAN chip prior to DMA, you can allocate
them from a pool that already falls within the restricted DMA region, right?

> I also added alloc/free support in this series
> (https://lore.kernel.org/patchwork/patch/1360995/), so dma_direct_alloc() will
> try to allocate memory from the predefined memory region.
> 
> As for the performance hit, it should be similar to the default swiotlb.
> Here are my experiment results. Both SoCs lack IOMMU for PCIe.
> 
> PCIe wifi vht80 throughput -
> 
>   MTK SoC                tcp_tx   tcp_rx   udp_tx   udp_rx
>   w/o Restricted DMA     244.1    134.66   312.56   350.79
>   w/ Restricted DMA      246.95   136.59   363.21   351.99
> 
>   Rockchip SoC           tcp_tx   tcp_rx   udp_tx   udp_rx
>   w/o Restricted DMA     237.87   133.86   288.28   361.88
>   w/ Restricted DMA      256.01   130.95   292.28   353.19

How come you get better throughput with restricted DMA? Is it because
doing DMA to/from a contiguous region allows for better grouping of
transactions from the DRAM controller's perspective somehow?

> 
> The CPU usage doesn't increase too much either.
> Although I didn't measure the CPU usage very precisely, it's ~3% with a single
> big core (Cortex-A72) and ~5% with a single small core (Cortex-A53).
> 
> Thanks!
> 
>>
>> Thanks!
>> --
>> Florian


-- 
Florian


Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool

2021-01-07 Thread Konrad Rzeszutek Wilk
On Fri, Jan 08, 2021 at 01:39:18AM +0800, Claire Chang wrote:
> Hi Greg and Konrad,
> 
> This change is intended to be non-arch specific. Any arch that lacks DMA 
> access
> control and has devices not behind an IOMMU can make use of it. Could you 
> share
> why you think this should be arch specific?

The idea behind non-arch-specific code is for it to be generic. The devicetree
is specific to PowerPC, Sparc, and ARM, and not to x86 - hence it should
be in arch-specific code.

> 
> Thanks!


Re: [RFC PATCH v3 0/6] Restricted DMA

2021-01-07 Thread Claire Chang
On Thu, Jan 7, 2021 at 2:48 AM Florian Fainelli  wrote:
>
> Hi,
>
> First of all let me say that I am glad that someone is working on a
> upstream solution for this issue, would appreciate if you could CC and
> Jim Quinlan on subsequent submissions.

Sure!

>
> On 1/5/21 7:41 PM, Claire Chang wrote:
> > This series implements mitigations for lack of DMA access control on
> > systems without an IOMMU, which could result in the DMA accessing the
> > system memory at unexpected times and/or unexpected addresses, possibly
> > leading to data leakage or corruption.
> >
> > For example, we plan to use the PCI-e bus for Wi-Fi and that PCI-e bus is
> > not behind an IOMMU. As PCI-e, by design, gives the device full access to
> > system memory, a vulnerability in the Wi-Fi firmware could easily escalate
> > to a full system exploit (remote wifi exploits: [1a], [1b] that shows a
> > full chain of exploits; [2], [3]).
> >
> > To mitigate the security concerns, we introduce restricted DMA. Restricted
> > DMA utilizes the existing swiotlb to bounce streaming DMA in and out of a
> > specially allocated region and does memory allocation from the same region.
> > The feature on its own provides a basic level of protection against the DMA
> > overwriting buffer contents at unexpected times. However, to protect
> > against general data leakage and system memory corruption, the system needs
> > to provide a way to restrict the DMA to a predefined memory region (this is
> > usually done at firmware level, e.g. in ATF on some ARM platforms).
>
> Can you explain how ATF gets involved and to what extent it does help,
> besides enforcing a secure region from the ARM CPU's perpsective? Does
> the PCIe root complex not have an IOMMU but can somehow be denied access
> to a region that is marked NS=0 in the ARM CPU's MMU? If so, that is
> still some sort of basic protection that the HW enforces, right?

We need the ATF support for the memory MPU (memory protection unit).
Restricted DMA (with reserved-memory in dts) makes sure the predefined memory
region is for PCIe DMA only, but we still need the MPU to lock down PCIe access
to those specific regions.

>
> On Broadcom STB SoCs we have had something similar for a while however
> and while we don't have an IOMMU for the PCIe bridge, we do have a
> basic protection mechanism whereby we can configure a region in DRAM to
> be PCIe read/write and CPU read/write which then gets used as the PCIe
> inbound region for the PCIe EP. By default the PCIe bridge is not
> allowed access to DRAM so we must call into a security agent to allow
> the PCIe bridge to access the designated DRAM region.
>
> We have done this using a private CMA area region assigned via Device
> Tree, assigned with a and requiring the PCIe EP driver to use
> dma_alloc_from_contiguous() in order to allocate from this device
> private CMA area. The only drawback with that approach is that it
> requires knowing how much memory you need up front for buffers and DMA
> descriptors that the PCIe EP will need to process. The problem is that
> it requires driver modifications and that does not scale over the number
> of PCIe EP drivers, some we absolutely do not control, but there is no
> need to bounce buffer. Your approach scales better across PCIe EP
> drivers however it does require bounce buffering which could be a
> performance hit.

Only the streaming DMA (map/unmap) needs bounce buffering.
I also added alloc/free support in this series
(https://lore.kernel.org/patchwork/patch/1360995/), so dma_direct_alloc() will
try to allocate memory from the predefined memory region.

As for the performance hit, it should be similar to the default swiotlb.
Here are my experiment results. Both SoCs lack IOMMU for PCIe.

PCIe wifi vht80 throughput -

  MTK SoC                tcp_tx   tcp_rx   udp_tx   udp_rx
  w/o Restricted DMA     244.1    134.66   312.56   350.79
  w/ Restricted DMA      246.95   136.59   363.21   351.99

  Rockchip SoC           tcp_tx   tcp_rx   udp_tx   udp_rx
  w/o Restricted DMA     237.87   133.86   288.28   361.88
  w/ Restricted DMA      256.01   130.95   292.28   353.19

The CPU usage doesn't increase too much either.
Although I didn't measure the CPU usage very precisely, it's ~3% with a single
big core (Cortex-A72) and ~5% with a single small core (Cortex-A53).

Thanks!

>
> Thanks!
> --
> Florian


Re: [RFC PATCH v3 5/6] dt-bindings: of: Add restricted DMA pool

2021-01-07 Thread Claire Chang
On Thu, Jan 7, 2021 at 2:58 AM Konrad Rzeszutek Wilk
 wrote:
>
> On Wed, Jan 06, 2021 at 11:41:23AM +0800, Claire Chang wrote:
> > Introduce the new compatible string, restricted-dma-pool, for restricted
> > DMA. One can specify the address and length of the restricted DMA memory
> > region by restricted-dma-pool in the device tree.
> >
> > Signed-off-by: Claire Chang 
> > ---
> >  .../reserved-memory/reserved-memory.txt   | 24 +++
> >  1 file changed, 24 insertions(+)
> >
> > diff --git a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > index e8d3096d922c..44975e2a1fd2 100644
> > --- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > +++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > @@ -51,6 +51,20 @@ compatible (optional) - standard definition
> >    used as a shared pool of DMA buffers for a set of devices. It can
> >    be used by an operating system to instantiate the necessary pool
> >    management subsystem if necessary.
> > +- restricted-dma-pool: This indicates a region of memory meant to be
> > +  used as a pool of restricted DMA buffers for a set of devices. The
> > +  memory region would be the only region accessible to those devices.
> > +  When using this, the no-map and reusable properties must not be set,
> > +  so the operating system can create a virtual mapping that will be used
> > +  for synchronization. The main purpose for restricted DMA is to
> > +  mitigate the lack of DMA access control on systems without an IOMMU,
> > +  which could result in the DMA accessing the system memory at
> > +  unexpected times and/or unexpected addresses, possibly leading to data
> > +  leakage or corruption. The feature on its own provides a basic level
> > +  of protection against the DMA overwriting buffer contents at
> > +  unexpected times. However, to protect against general data leakage and
> > +  system memory corruption, the system needs to provide a way to restrict
> > +  the DMA to a predefined memory region.
>
> Heya!
>
> I think I am missing something obvious here so please bear with my
> questions:
>
>  - This code adds the means of having the SWIOTLB pool tied to a specific
>    memory, correct?

It doesn't affect the existing SWIOTLB. It just utilizes the existing SWIOTLB
code to create another DMA pool tied to a specific memory region for a given set
of devices. It bounces the streaming DMA (map/unmap) in and out of that region
and does the memory allocation (dma_direct_alloc) from the same region.
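Conceptually it looks something like this (assumed shape only; the
series' real fields may differ):

/* Sketch: a per-device restricted pool reusing the swiotlb bookkeeping. */
struct io_tlb_mem {
	phys_addr_t start;	/* base of the reserved-memory region */
	phys_addr_t end;
	unsigned long nslabs;	/* bounce slots carved from the region */
	/* ... same bookkeeping as the global swiotlb ... */
};

/* struct device then grows a pointer:
 *	struct io_tlb_mem *dma_io_tlb_mem;   (NULL => use the global pool)
 * which both dma_direct_map_page() and dma_direct_alloc() consult.
 */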

>
>
>  - Nothing stops the physical device from bypassing the SWIOTLB buffer.
>That is if an errant device screwed up the length or DMA address, the
>SWIOTLB would gladly do what the device told it do?

So the system needs to provide a way to lock down the memory access, e.g. MPU.

>
>  - This has to be combined with SWIOTLB-force-ish to always use the
>bounce buffer, otherwise you could still do DMA without using
>SWIOTLB (by not hitting the criteria for needing to use SWIOTLB)?

Since restricted DMA is for the devices that are not behind an IOMMU, I change
the criteria
`if (unlikely(swiotlb_force == SWIOTLB_FORCE))`
to
`if (unlikely(swiotlb_force == SWIOTLB_FORCE) || dev->dma_io_tlb_mem)`
in dma_direct_map_page().

Also, even if SWIOTLB=force, the restricted DMA pool is preferred if available
(get_io_tlb_mem in https://lore.kernel.org/patchwork/patch/1360995/).
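In other words, roughly (the helper name below is mine, not from the
series):

/* Assumed helper: decide whether this device must bounce via swiotlb. */
static inline bool dma_should_use_swiotlb(struct device *dev)
{
	return unlikely(swiotlb_force == SWIOTLB_FORCE) ||
	       dev->dma_io_tlb_mem;	/* device restricted to its own pool */
}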

Thanks!


Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool

2021-01-07 Thread Claire Chang
Hi Greg and Konrad,

This change is intended to be non-arch specific. Any arch that lacks DMA access
control and has devices not behind an IOMMU can make use of it. Could you share
why you think this should be arch specific?

Thanks!


Re: [PATCH] iommu/io-pgtable-arm: Allow non-coherent masters to use system cache

2021-01-07 Thread isaacm

On 2021-01-06 03:56, Will Deacon wrote:

On Thu, Dec 24, 2020 at 12:10:07PM +0530, Sai Prakash Ranjan wrote:

commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
removed the unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
the memory type setting required for the non-coherent masters to use
system cache. Now that system cache support for GPU is added, we will
need to mark the memory as normal sys-cached for GPU to use system cache.
Without this, the system cache lines are not allocated for GPU. We use
the IO_PGTABLE_QUIRK_ARM_OUTER_WBWA quirk instead of a page protection
flag as the flag cannot be exposed via the DMA API because of no in-tree
users.

Signed-off-by: Sai Prakash Ranjan 
---
 drivers/iommu/io-pgtable-arm.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 7c9ea9d7874a..3fb7de8304a2 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -415,6 +415,9 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,

 	else if (prot & IOMMU_CACHE)
 		pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE
 			<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
+	else if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA)
+		pte |= (ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE
+			<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
 }


While this approach of enabling system cache globally for both page
tables and other buffers works for the GPU usecase, this isn't ideal
for other clients that use system cache. For example, video clients
only want to cache a subset of their buffers in the system cache, due
to the sizing constraint imposed by how much of the system cache they
can use. So, it would be ideal to have a way of expressing the desire
to use the system cache on a per-buffer basis. Additionally, our video
clients use the DMA layer, and since the requirement is for caching in
the system cache to be a per-buffer attribute, it seems like we would
have to have a DMA attribute to express this on a per-buffer basis.
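Purely as an illustration of the shape such an interface could take
(DMA_ATTR_SYS_CACHE_ONLY below is hypothetical; no such attribute exists
in-tree today):

	#define DMA_ATTR_SYS_CACHE_ONLY	(1UL << 14)	/* hypothetical bit */

	buf = dma_alloc_attrs(dev, size, &dma_handle, GFP_KERNEL,
			      DMA_ATTR_SYS_CACHE_ONLY);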

Thanks,
Isaac

drivers/iommu/io-pgtable.c currently documents this quirk as applying only
to the page-table walker. Given that we only have one user at the moment,
I think it's ok to change that, but please update the comment.

We also need to decide on whether we want to allow the quirk to be passed
if the coherency of the page-table walker differs from the DMA device,
since we have these combinations:

	Coherent walker?   IOMMU_CACHE   IO_PGTABLE_QUIRK_ARM_OUTER_WBWA
0:	N                  0             0
1:	N                  0             1
2:	N                  1             0
3:	N                  1             1
4:	Y                  0             0
5:	Y                  0             1
6:	Y                  1             0
7:	Y                  1             1

Some of them are obviously bogus, such as (7), but I don't know what to
do about cases such as (3) and (5).
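For instance, rejecting the quirk outright for a coherent walker would
rule out rows (5) and (7) at configuration time (sketch only, e.g. next
to the existing quirk validation in io-pgtable-arm.c; whether (5) should
really be rejected is part of the open question):

	if (cfg->coherent_walk &&
	    (cfg->quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA))
		return NULL;	/* outer-WBWA hint on a coherent walker */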

Will



[PATCH 2/2] dt-bindings: arm-smmu: Add binding for Qcom SDX55 SMMU

2021-01-07 Thread Manivannan Sadhasivam
Add devicetree binding for Qualcomm SDX55 SMMU.

Cc: Will Deacon 
Cc: Robin Murphy 
Cc: Joerg Roedel 
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Manivannan Sadhasivam 
Reviewed-by: Vinod Koul 
---
 Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index 3b63f2ae24db..3a1cefce8bc4 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -35,6 +35,7 @@ properties:
         - enum:
             - qcom,sc7180-smmu-500
             - qcom,sdm845-smmu-500
+            - qcom,sdx55-smmu-500
             - qcom,sm8150-smmu-500
             - qcom,sm8250-smmu-500
         - const: arm,mmu-500
-- 
2.25.1



Re: [PATCH 3/5] iommu/vt-d: Remove unused dma map/unmap trace events

2021-01-07 Thread Will Deacon
On Wed, Jan 06, 2021 at 09:14:22AM +0800, Lu Baolu wrote:
> On 2021/1/6 3:04, Will Deacon wrote:
> > On Thu, Dec 31, 2020 at 08:53:21AM +0800, Lu Baolu wrote:
> > > With commit c588072bba6b5 ("iommu/vt-d: Convert intel iommu driver to
> > > the iommu ops"), the trace events for dma map/unmap have no users any
> > > more. Remove them so that they don't show up under
> > > /sys/kernel/debug/tracing/events/intel_iommu. The users should use the
> > > map/unmap traces defined in the iommu core from now on.
> > > 
> > > Fixes: c588072bba6b5 ("iommu/vt-d: Convert intel iommu driver to the iommu ops")
> > > Signed-off-by: Lu Baolu 
> > > ---
> > >   include/trace/events/intel_iommu.h | 119 -
> > >   1 file changed, 119 deletions(-)
> > 
> > Is this needed in 5.11, or can it wait until 5.12?
> 
> It's necessary for 5.11 as far as I can see. Without this, users still
> see the events under /sys/kernel/debug/tracing/events/intel_iommu, but
> they will get nothing traced even if they enable the events.

I'm just a bit wary about breaking userspace by removing them altogether,
although I see that there's plenty of precedent for removing events from
the include/trace/events directory, so it's probably fine.

However, the patch as-is results in this warning for me:

 | In file included from include/trace/define_trace.h:102,
 |  from include/trace/events/intel_iommu.h:22,
 |  from drivers/iommu/intel/trace.c:14:
 | include/trace/trace_events.h:27:23: warning: ‘str__intel_iommu__trace_system_name’ defined but not used [-Wunused-const-variable=]
 |27 | #define __app__(x, y) str__##x##y
 |   |   ^
 | include/trace/trace_events.h:28:21: note: in expansion of macro ‘__app__’
 |28 | #define __app(x, y) __app__(x, y)
 |   | ^~~
 | include/trace/trace_events.h:30:29: note: in expansion of macro ‘__app’
 |30 | #define TRACE_SYSTEM_STRING __app(TRACE_SYSTEM_VAR,__trace_system_name)
 |   | ^
 | include/trace/trace_events.h:33:20: note: in expansion of macro ‘TRACE_SYSTEM_STRING’
 |33 |  static const char TRACE_SYSTEM_STRING[] = \
 |   |^~~
 | include/trace/trace_events.h:36:1: note: in expansion of macro ‘TRACE_MAKE_SYSTEM_STR’
 |36 | TRACE_MAKE_SYSTEM_STR();
 |   | ^

so I'll drop this for now.

Will

Re: [PATCH v4 2/3] iommu/vt-d: Track device aux-attach with subdevice_domain_info

2021-01-07 Thread Will Deacon
On Thu, Jan 07, 2021 at 10:16:57PM +0800, Lu Baolu wrote:
> On 2021/1/7 21:49, Will Deacon wrote:
> > On Thu, Jan 07, 2021 at 12:03:56AM +0800, Liu Yi L wrote:
> > > In the existing code, loop all devices attached to a domain does not
> > > include sub-devices attached via iommu_aux_attach_device().
> > > 
> > > This was found by when I'm working on the below patch, There is no
> > > device in the domain->devices list, thus unable to get the cap and
> > > ecap of iommu unit. But this domain actually has subdevice which is
> > > attached via aux-manner. But it is tracked by domain. This patch is
> > > going to fix it.
> > > 
> > > https://lore.kernel.org/kvm/1599734733-6431-17-git-send-email-yi.l@intel.com/
> > > 
> > > And this fix goes beyond the patch above, such sub-device tracking is
> > > necessary for other cases. For example, flushing device_iotlb for a
> > > domain which has sub-devices attached by auxiliary manner.
> > 
> > Sorry, but I'm having a really hard time understanding what this patch is
> > doing based on this commit message. Baolu -- do you think you could reword
> > it for me please? No need to resend the patch.
> 
> iommu/vt-d: Fix general protection fault in aux_detach_device()

[...]

Thanks! I'll push this out shortly.

Will


Re: [PATCH] iommu/arm-smmu-qcom: Initialize SCTLR of the bypass context

2021-01-07 Thread Will Deacon
On Tue, 5 Jan 2021 16:50:38 -0800, Bjorn Andersson wrote:
> On SM8150 it's occasionally observed that the boot hangs in between the
> writing of SMEs and context banks in arm_smmu_device_reset().
> 
> The problem seems to coincide with a display refresh happening after
> updating the stream mapping, but before clearing - and thereby
> disabling translation - the context bank picked to emulate translation
> bypass.
> 
> [...]

Applied to arm64 (for-next/iommu/fixes), thanks!

[1/1] iommu/arm-smmu-qcom: Initialize SCTLR of the bypass context
  https://git.kernel.org/arm64/c/aded8c7c2b72

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev


Re: [PATCH 1/5] iommu/vt-d: Fix misuse of ALIGN in qi_flush_piotlb()

2021-01-07 Thread Will Deacon
On Thu, 31 Dec 2020 08:53:19 +0800, Lu Baolu wrote:
> Use IS_ALIGNED() instead. Otherwise, an unaligned address will be ignored.
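The distinction in a nutshell (illustrative only, not the driver's exact
lines; do_flush() is a placeholder):

	/* ALIGN() rounds up and returns the aligned value, so it is
	 * almost always non-zero and useless as a predicate: */
	if (ALIGN(addr, VTD_PAGE_SIZE))		/* buggy: true for any addr > 0 */
		do_flush(addr);

	/* IS_ALIGNED() is the boolean test that was intended: */
	if (IS_ALIGNED(addr, VTD_PAGE_SIZE))	/* true only when aligned */
		do_flush(addr);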

Applied patches 1, 4 and 5 to arm64 (for-next/iommu/fixes), thanks!

[1/5] iommu/vt-d: Fix misuse of ALIGN in qi_flush_piotlb()
  https://git.kernel.org/arm64/c/1efd17e7acb6
[4/5] Revert "iommu: Add quirk for Intel graphic devices in map_sg"
  https://git.kernel.org/arm64/c/4df7b2268ad8
[5/5] iommu/vt-d: Fix lockdep splat in sva bind()/unbind()
  https://git.kernel.org/arm64/c/420d42f6f9db

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev


Re: [PATCH v4 2/3] iommu/vt-d: Track device aux-attach with subdevice_domain_info

2021-01-07 Thread Lu Baolu

Hi Will,

On 2021/1/7 21:49, Will Deacon wrote:

On Thu, Jan 07, 2021 at 12:03:56AM +0800, Liu Yi L wrote:

In the existing code, loop all devices attached to a domain does not
include sub-devices attached via iommu_aux_attach_device().

This was found by when I'm working on the below patch, There is no
device in the domain->devices list, thus unable to get the cap and
ecap of iommu unit. But this domain actually has subdevice which is
attached via aux-manner. But it is tracked by domain. This patch is
going to fix it.

https://lore.kernel.org/kvm/1599734733-6431-17-git-send-email-yi.l@intel.com/

And this fix goes beyond the patch above, such sub-device tracking is
necessary for other cases. For example, flushing device_iotlb for a
domain which has sub-devices attached by auxiliary manner.


Sorry, but I'm having a really hard time understanding what this patch is
doing based on this commit message. Baolu -- do you think you could reword
it for me please? No need to resend the patch.


iommu/vt-d: Fix general protection fault in aux_detach_device()

The aux-domain attach/detach operations are not tracked, so some data
structures might be used after free. This causes general protection
faults when multiple subdevices are created and assigned to the same
guest machine.

The symptoms of this look like:
[ 1548.992644] general protection fault, probably for non-canonical address 0xdead0100: [#1] SMP NOPTI
[ 1549.078610] RIP: 0010:intel_iommu_aux_detach_device+0x12a/0x1f0
[ 1549.095668] Code: 25 88 01 49 8b 8c 24 a0 02 00 00 85 c0 0f 84 b3 00 00 00 48 85 c9 0f 84 ac 00 00 00 48 8b 8b 68 f8 ff ff 48 8b 83 70 f8 ff ff <48> 89 41 08 48 89 08 48 b9 00 01 00 00 00 00 ad de 48 b8 22 01 00
[ 1549.142717] RSP: 0018:a19eca067b88 EFLAGS: 00010082
[ 1549.157403] RAX: dead0122 RBX: 8910a1d0a0b8 RCX: dead0100
[ 1549.175075] RDX: 8910ba752b00 RSI:  RDI: 8910daee7ea0
[ 1549.192746] RBP: a19eca067bc0 R08:  R09: 0018
[ 1549.208495] R10: 8910d5eb03b0 R11: 0001 R12: 8910cf3900b8
[ 1549.223652] R13: 8910daeba600 R14: 0246 R15: 8910a1fe1e58
[ 1549.240158] FS:  7fc4449cad80() GS:8910de80() knlGS:
[ 1549.257316] CS:  0010 DS:  ES:  CR0: 80050033
[ 1549.272424] CR2: 5604b8b9 CR3: 000859542004 CR4: 03762ee0
[ 1549.288201] DR0:  DR1:  DR2: 
[ 1549.302623] DR3:  DR6: fffe0ff0 DR7: 0400
[ 1549.316768] PKRU: 5554
[ 1549.327711] Call Trace:
[ 1549.337400]  iommu_aux_detach_device+0x24/0x70
[ 1549.348268]  vfio_mdev_detach_domain+0x3b/0x60
[ 1549.361050]  ? vfio_mdev_set_domain+0x50/0x50
[ 1549.372811]  iommu_group_for_each_dev+0x4f/0x80
[ 1549.386160]  vfio_iommu_detach_group.isra.0+0x22/0x30
[ 1549.398160]  vfio_iommu_type1_detach_group.cold+0x71/0x211
[ 1549.411491]  ? find_exported_symbol_in_section+0x4a/0xd0
[ 1549.423097]  ? each_symbol_section+0x28/0x50
[ 1549.435137]  __vfio_group_unset_container+0x4d/0x150
[ 1549.448735]  vfio_group_try_dissolve_container+0x25/0x30
[ 1549.461213]  vfio_group_put_external_user+0x13/0x20
[ 1549.474632]  kvm_vfio_group_put_external_user+0x27/0x40 [kvm]
[ 1549.488017]  kvm_vfio_destroy+0x45/0xb0 [kvm]
[ 1549.500137]  kvm_put_kvm+0x1bb/0x2e0 [kvm]
[ 1549.509965]  kvm_vm_release+0x22/0x30 [kvm]
[ 1549.520706]  __fput+0xcc/0x260
[ 1549.530202]  fput+0xe/0x10
[ 1549.539426]  task_work_run+0x8f/0xb0
[ 1549.549440]  do_exit+0x358/0xaf0
[ 1549.558165]  ? wake_up_state+0x10/0x20
[ 1549.568318]  ? signal_wake_up_state+0x1a/0x30
[ 1549.579739]  do_group_exit+0x47/0xb0
[ 1549.589337]  __x64_sys_exit_group+0x18/0x20
[ 1549.599069]  do_syscall_64+0x57/0x1d0
[ 1549.609082]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1549.619044] RIP: 0033:0x7fc44e7e38a0
[ 1549.627488] Code: Bad RIP value.
[ 1549.635181] RSP: 002b:7ffd23b6c038 EFLAGS: 0246 ORIG_RAX: 00e7
[ 1549.647761] RAX: ffda RBX: 7fc44e8f2470 RCX: 7fc44e7e38a0
[ 1549.661439] RDX:  RSI: 003c RDI: 
[ 1549.673452] RBP: 7fc44e8f2470 R08: 00e7 R09: ddb8
[ 1549.686293] R10:  R11: 0246 R12: 
[ 1549.699347] R13:  R14: 0304 R15: 


Fix it by tracking the subdevices when attaching and detaching aux-domains.
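The shape of that tracking (from the patch title; the field details are
my reading of the patch, not verbatim):

struct subdev_domain_info {
	struct list_head link_phys;	/* on the physical device's subdevice list */
	struct list_head link_domain;	/* on the domain's subdevice list */
	struct device *pdev;		/* physical device the subdevice sits on */
	int users;			/* attach refcount */
};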

Best regards,
baolu



Thanks,

Will




Re: [PATCH v3 5/6] media: uvcvideo: Use dma_alloc_noncontiguos API

2021-01-07 Thread Ricardo Ribalda
Hi Christoph

Happy new year!

On Wed, Dec 9, 2020 at 12:16 PM Christoph Hellwig wrote:
>
> On Tue, Dec 08, 2020 at 04:13:20PM +0900, Sergey Senozhatsky wrote:
> > On (20/12/08 13:54), Tomasz Figa wrote:
> > >
> > > In any case, Sergey is going to share a preliminary patch on how the
> > > current API would be used in the V4L2 videobuf2 framework. That should
> > > give us more input on how such a helper could look.
> >
> > HUGE apologies for the previous screw up! I replied in the
> > gmail web-interface and that did not work out as expected
> > (at all, big times).
>
> Actually the previous mail was a mime multipart one, and the plain text
> version displayed just fine here.  My, the gmail engineers finally learned
> something after all.
>
> > Another thing to notice is that the new API requires us to have two
> > execution branches in allocators - one for the current API, and one for
> > the new API (if it's supported and if user-space requested a
> > non-coherent allocation).
>
> So I think we do want these branches for coherent vs non-coherent as they
> have very different semantics and I do not think that hiding them under
> the same API helps people to understand those vastly different semantics.
>
> OTOH we should look into a fallback for DMA API instances that do not
> support discontiguous allocations.
>
> I think between your comments and those from Ricardo I have a good idea
> for a somewhat updated API.  I'll try to post something in the next days.

Did you have time to look into this?

No hurry, I just want to make sure that I didn't miss anything ;)

Best regards!



-- 
Ricardo Ribalda


Re: [PATCH v4 2/3] iommu/vt-d: Track device aux-attach with subdevice_domain_info

2021-01-07 Thread Will Deacon
On Thu, Jan 07, 2021 at 12:03:56AM +0800, Liu Yi L wrote:
> In the existing code, loop all devices attached to a domain does not
> include sub-devices attached via iommu_aux_attach_device().
> 
> This was found by when I'm working on the below patch, There is no
> device in the domain->devices list, thus unable to get the cap and
> ecap of iommu unit. But this domain actually has subdevice which is
> attached via aux-manner. But it is tracked by domain. This patch is
> going to fix it.
> 
> https://lore.kernel.org/kvm/1599734733-6431-17-git-send-email-yi.l@intel.com/
> 
> And this fix goes beyond the patch above, such sub-device tracking is
> necessary for other cases. For example, flushing device_iotlb for a
> domain which has sub-devices attached by auxiliary manner.

Sorry, but I'm having a really hard time understanding what this patch is
doing based on this commit message. Baolu -- do you think you could reword
it for me please? No need to resend the patch.

Thanks,

Will


Re: [PATCH] iommu/arm-smmu-v3: Handle duplicated Stream IDs from other masters

2021-01-07 Thread Will Deacon
On Thu, Jan 07, 2021 at 03:03:40PM +0530, Ajay Kumar wrote:
> When PCI function drivers (e.g. pci-endpoint-test) are probed for an already
> initialized PCIe RC (Root Complex), and the PCIe RC is already bound to the
> SMMU, we encounter a situation where the function driver tries to attach
> itself to the SMMU with the same stream ID as the PCIe RC and re-initializes
> an already initialized STE. This causes the ste_live BUG_ON() in the driver.

I don't understand why the endpoint is using the same stream ID as the root
complex in this case. Why is that? Is the grouping logic not working
properly?

> The driver already has a check to manage duplicated IDs added to the same
> master device, but there can be scenarios like the above where the check
> needs to be extended to other masters using the same stream ID.
> 
> Signed-off-by: Ajay Kumar 
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 33 +
>  1 file changed, 33 insertions(+)

It doesn't feel like the driver is the right place to fix this, as the same
issue could surely occur for other IOMMUs too, right? In which case, I think
we should avoid getting into the situation where different groups have
overlapping stream IDs.

Will


[PATCH v4 7/7] iommu/mediatek: Remove the tlb-ops for v7s

2021-01-07 Thread Yong Wu
Now that we use the tlb operations from the iommu framework, the tlb
operations for v7s can be removed.

Correspondingly, switch the parameter "cookie" to the internal structure.

Signed-off-by: Yong Wu 
---
 drivers/iommu/mtk_iommu.c | 27 ---
 1 file changed, 4 insertions(+), 23 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index d3b8a1649093..86ab577c9520 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -182,10 +182,8 @@ static struct mtk_iommu_domain *to_mtk_domain(struct 
iommu_domain *dom)
return container_of(dom, struct mtk_iommu_domain, domain);
 }
 
-static void mtk_iommu_tlb_flush_all(void *cookie)
+static void mtk_iommu_tlb_flush_all(struct mtk_iommu_data *data)
 {
-   struct mtk_iommu_data *data = cookie;
-
for_each_m4u(data) {
writel_relaxed(F_INVLD_EN1 | F_INVLD_EN0,
   data->base + data->plat_data->inv_sel_reg);
@@ -195,9 +193,9 @@ static void mtk_iommu_tlb_flush_all(void *cookie)
 }
 
 static void mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size,
-  size_t granule, void *cookie)
+  size_t granule,
+  struct mtk_iommu_data *data)
 {
-   struct mtk_iommu_data *data = cookie;
unsigned long flags;
int ret;
u32 tmp;
@@ -219,7 +217,7 @@ static void mtk_iommu_tlb_flush_range_sync(unsigned long 
iova, size_t size,
if (ret) {
dev_warn(data->dev,
 "Partial TLB flush timed out, falling back to 
full flush\n");
-   mtk_iommu_tlb_flush_all(cookie);
+   mtk_iommu_tlb_flush_all(data);
}
/* Clear the CPE status */
writel_relaxed(0, data->base + REG_MMU_CPE_DONE);
@@ -227,22 +225,6 @@ static void mtk_iommu_tlb_flush_range_sync(unsigned long 
iova, size_t size,
}
 }
 
-static void mtk_iommu_tlb_flush_page_nosync(struct iommu_iotlb_gather *gather,
-   unsigned long iova, size_t granule,
-   void *cookie)
-{
-   struct mtk_iommu_data *data = cookie;
-   struct iommu_domain *domain = &data->m4u_dom->domain;
-
-   iommu_iotlb_gather_add_page(domain, gather, iova, granule);
-}
-
-static const struct iommu_flush_ops mtk_iommu_flush_ops = {
-   .tlb_flush_all = mtk_iommu_tlb_flush_all,
-   .tlb_flush_walk = mtk_iommu_tlb_flush_range_sync,
-   .tlb_add_page = mtk_iommu_tlb_flush_page_nosync,
-};
-
 static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
 {
struct mtk_iommu_data *data = dev_id;
@@ -326,7 +308,6 @@ static int mtk_iommu_domain_finalise(struct 
mtk_iommu_domain *dom)
.pgsize_bitmap = mtk_iommu_ops.pgsize_bitmap,
.ias = 32,
.oas = 34,
-   .tlb = &mtk_iommu_flush_ops,
.iommu_dev = data->dev,
};
 
-- 
2.18.0



[PATCH v4 6/7] iommu/mediatek: Gather iova in iommu_unmap to achieve tlb sync once

2021-01-07 Thread Yong Wu
In the current iommu_unmap, the code is:

iommu_iotlb_gather_init(&iotlb_gather);
ret = __iommu_unmap(domain, iova, size, &iotlb_gather);
iommu_iotlb_sync(domain, &iotlb_gather);

We could gather the whole iova range in __iommu_unmap, and then do the
tlb synchronization once in iommu_iotlb_sync.

This patch implements that: gather the range in mtk_iommu_unmap, then let
iommu_iotlb_sync perform the tlb synchronization for the gathered iova
range. We don't call iommu_iotlb_gather_add_page, since our tlb
synchronization is independent of the granule size.

This way, gather->start can never be ULONG_MAX, so remove that check.

This patch aims to do the tlb synchronization *once* per iommu_unmap.

Signed-off-by: Yong Wu 
---
 drivers/iommu/mtk_iommu.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 66a00a2cb445..d3b8a1649093 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -430,7 +430,12 @@ static size_t mtk_iommu_unmap(struct iommu_domain *domain,
  struct iommu_iotlb_gather *gather)
 {
struct mtk_iommu_domain *dom = to_mtk_domain(domain);
+   unsigned long end = iova + size - 1;
 
+   if (gather->start > iova)
+   gather->start = iova;
+   if (gather->end < end)
+   gather->end = end;
return dom->iop->unmap(dom->iop, iova, size, gather);
 }
 
@@ -445,9 +450,6 @@ static void mtk_iommu_iotlb_sync(struct iommu_domain 
*domain,
struct mtk_iommu_data *data = mtk_iommu_get_m4u_data();
size_t length = gather->end - gather->start + 1;
 
-   if (gather->start == ULONG_MAX)
-   return;
-
mtk_iommu_tlb_flush_range_sync(gather->start, length, gather->pgsize,
   data);
 }
-- 
2.18.0



[PATCH v4 5/7] iommu/io-pgtable: Allow io_pgtable_tlb ops to be optional

2021-01-07 Thread Yong Wu
This patch allows the io_pgtable_tlb ops to be NULL, since IOMMU drivers
may use the tlb ops from the iommu framework instead.

Signed-off-by: Yong Wu 
---
 include/linux/io-pgtable.h | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index ea727eb1a1a9..2a5686ca2ba3 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -214,14 +214,16 @@ struct io_pgtable_domain_attr {
 
 static inline void io_pgtable_tlb_flush_all(struct io_pgtable *iop)
 {
-   iop->cfg.tlb->tlb_flush_all(iop->cookie);
+   if (iop->cfg.tlb && iop->cfg.tlb->tlb_flush_all)
+   iop->cfg.tlb->tlb_flush_all(iop->cookie);
 }
 
 static inline void
 io_pgtable_tlb_flush_walk(struct io_pgtable *iop, unsigned long iova,
  size_t size, size_t granule)
 {
-   iop->cfg.tlb->tlb_flush_walk(iova, size, granule, iop->cookie);
+   if (iop->cfg.tlb && iop->cfg.tlb->tlb_flush_walk)
+   iop->cfg.tlb->tlb_flush_walk(iova, size, granule, iop->cookie);
 }
 
 static inline void
@@ -229,7 +231,7 @@ io_pgtable_tlb_add_page(struct io_pgtable *iop,
struct iommu_iotlb_gather * gather, unsigned long iova,
size_t granule)
 {
-   if (iop->cfg.tlb->tlb_add_page)
+   if (iop->cfg.tlb && iop->cfg.tlb->tlb_add_page)
iop->cfg.tlb->tlb_add_page(gather, iova, granule, iop->cookie);
 }
 
-- 
2.18.0



[PATCH v4 4/7] iommu: Switch gather->end to the inclusive end

2021-01-07 Thread Yong Wu
Currently gather->end is an "unsigned long", which may overflow on arch32
in the corner case iova + size = 0xfff00000 + 0x100000. Although this
doesn't affect the size (end - start), it breaks the check
"gather->end < end".

This patch changes this "end" to the real, inclusive end address
(end = start + size - 1). Correspondingly, update the length to
"end - start + 1".

Fixes: a7d20dc19d9e ("iommu: Introduce struct iommu_iotlb_gather for batching 
TLB flushes")
Signed-off-by: Yong Wu 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 2 +-
 drivers/iommu/mtk_iommu.c   | 2 +-
 drivers/iommu/tegra-gart.c  | 2 +-
 include/linux/iommu.h   | 4 ++--
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 8ca7415d785d..c70d6e79f534 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2280,7 +2280,7 @@ static void arm_smmu_iotlb_sync(struct iommu_domain 
*domain,
 {
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 
-   arm_smmu_tlb_inv_range(gather->start, gather->end - gather->start,
+   arm_smmu_tlb_inv_range(gather->start, gather->end - gather->start + 1,
   gather->pgsize, true, smmu_domain);
 }
 
diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index f579fc21971e..66a00a2cb445 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -443,7 +443,7 @@ static void mtk_iommu_iotlb_sync(struct iommu_domain 
*domain,
 struct iommu_iotlb_gather *gather)
 {
struct mtk_iommu_data *data = mtk_iommu_get_m4u_data();
-   size_t length = gather->end - gather->start;
+   size_t length = gather->end - gather->start + 1;
 
if (gather->start == ULONG_MAX)
return;
diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
index 05e8e19b8269..6f130e51f072 100644
--- a/drivers/iommu/tegra-gart.c
+++ b/drivers/iommu/tegra-gart.c
@@ -270,7 +270,7 @@ static void gart_iommu_sync_map(struct iommu_domain 
*domain, unsigned long iova,
 static void gart_iommu_sync(struct iommu_domain *domain,
struct iommu_iotlb_gather *gather)
 {
-   size_t length = gather->end - gather->start;
+   size_t length = gather->end - gather->start + 1;
 
gart_iommu_sync_map(domain, gather->start, length);
 }
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 9ce0aa9e236b..ae8eddd4621b 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -170,7 +170,7 @@ enum iommu_dev_features {
  * struct iommu_iotlb_gather - Range information for a pending IOTLB flush
  *
  * @start: IOVA representing the start of the range to be flushed
- * @end: IOVA representing the end of the range to be flushed (exclusive)
+ * @end: IOVA representing the end of the range to be flushed (inclusive)
  * @pgsize: The interval at which to perform the flush
  *
  * This structure is intended to be updated by multiple calls to the
@@ -539,7 +539,7 @@ static inline void iommu_iotlb_gather_add_page(struct 
iommu_domain *domain,
   struct iommu_iotlb_gather 
*gather,
   unsigned long iova, size_t size)
 {
-   unsigned long start = iova, end = start + size;
+   unsigned long start = iova, end = start + size - 1;
 
/*
 * If the new page is disjoint from the current range or is mapped at
-- 
2.18.0



[PATCH v4 3/7] iommu/mediatek: Add iotlb_sync_map to sync the whole iova range

2021-01-07 Thread Yong Wu
Remove IO_PGTABLE_QUIRK_TLBI_ON_MAP to avoid a tlb sync for each small
chunk of memory. Use the new iotlb_sync_map to sync the tlb once for the
whole iova range of iommu_map.

Signed-off-by: Yong Wu 
Reviewed-by: Robin Murphy 
---
 drivers/iommu/mtk_iommu.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 8e56cec532e7..f579fc21971e 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -322,7 +322,6 @@ static int mtk_iommu_domain_finalise(struct 
mtk_iommu_domain *dom)
dom->cfg = (struct io_pgtable_cfg) {
.quirks = IO_PGTABLE_QUIRK_ARM_NS |
IO_PGTABLE_QUIRK_NO_PERMS |
-   IO_PGTABLE_QUIRK_TLBI_ON_MAP |
IO_PGTABLE_QUIRK_ARM_MTK_EXT,
.pgsize_bitmap = mtk_iommu_ops.pgsize_bitmap,
.ias = 32,
@@ -453,6 +452,14 @@ static void mtk_iommu_iotlb_sync(struct iommu_domain 
*domain,
   data);
 }
 
+static void mtk_iommu_sync_map(struct iommu_domain *domain, unsigned long iova,
+  size_t size)
+{
+   struct mtk_iommu_data *data = mtk_iommu_get_m4u_data();
+
+   mtk_iommu_tlb_flush_range_sync(iova, size, size, data);
+}
+
 static phys_addr_t mtk_iommu_iova_to_phys(struct iommu_domain *domain,
  dma_addr_t iova)
 {
@@ -539,6 +546,7 @@ static const struct iommu_ops mtk_iommu_ops = {
.unmap  = mtk_iommu_unmap,
.flush_iotlb_all = mtk_iommu_flush_iotlb_all,
.iotlb_sync = mtk_iommu_iotlb_sync,
+   .iotlb_sync_map = mtk_iommu_sync_map,
.iova_to_phys   = mtk_iommu_iova_to_phys,
.probe_device   = mtk_iommu_probe_device,
.release_device = mtk_iommu_release_device,
-- 
2.18.0



[PATCH v4 2/7] iommu: Add iova and size as parameters in iotlb_sync_map

2021-01-07 Thread Yong Wu
iotlb_sync_map allows IOMMU drivers to sync the tlb after completing the
whole mapping. This patch adds iova and size as parameters to it, so that
the IOMMU driver can flush the tlb once for the whole range after the iova
mapping, to improve performance.

Signed-off-by: Yong Wu 
Reviewed-by: Robin Murphy 
---
 drivers/iommu/iommu.c  | 4 ++--
 drivers/iommu/tegra-gart.c | 7 +--
 include/linux/iommu.h  | 3 ++-
 3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index c304a6a30d42..3d099a31ddca 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2443,7 +2443,7 @@ static int _iommu_map(struct iommu_domain *domain, 
unsigned long iova,
 
ret = __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL);
if (ret == 0 && ops->iotlb_sync_map)
-   ops->iotlb_sync_map(domain);
+   ops->iotlb_sync_map(domain, iova, size);
 
return ret;
 }
@@ -2575,7 +2575,7 @@ static size_t __iommu_map_sg(struct iommu_domain *domain, 
unsigned long iova,
}
 
if (ops->iotlb_sync_map)
-   ops->iotlb_sync_map(domain);
+   ops->iotlb_sync_map(domain, iova, mapped);
return mapped;
 
 out_err:
diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
index fac720273889..05e8e19b8269 100644
--- a/drivers/iommu/tegra-gart.c
+++ b/drivers/iommu/tegra-gart.c
@@ -261,7 +261,8 @@ static int gart_iommu_of_xlate(struct device *dev,
return 0;
 }
 
-static void gart_iommu_sync_map(struct iommu_domain *domain)
+static void gart_iommu_sync_map(struct iommu_domain *domain, unsigned long 
iova,
+   size_t size)
 {
FLUSH_GART_REGS(gart_handle);
 }
@@ -269,7 +270,9 @@ static void gart_iommu_sync_map(struct iommu_domain *domain)
 static void gart_iommu_sync(struct iommu_domain *domain,
struct iommu_iotlb_gather *gather)
 {
-   gart_iommu_sync_map(domain);
+   size_t length = gather->end - gather->start;
+
+   gart_iommu_sync_map(domain, gather->start, length);
 }
 
 static const struct iommu_ops gart_iommu_ops = {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index b3f0e2018c62..9ce0aa9e236b 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -246,7 +246,8 @@ struct iommu_ops {
size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
 size_t size, struct iommu_iotlb_gather *iotlb_gather);
void (*flush_iotlb_all)(struct iommu_domain *domain);
-   void (*iotlb_sync_map)(struct iommu_domain *domain);
+   void (*iotlb_sync_map)(struct iommu_domain *domain, unsigned long iova,
+  size_t size);
void (*iotlb_sync)(struct iommu_domain *domain,
   struct iommu_iotlb_gather *iotlb_gather);
phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t 
iova);
-- 
2.18.0



[PATCH v4 1/7] iommu: Move iotlb_sync_map out from __iommu_map

2021-01-07 Thread Yong Wu
At the end of __iommu_map, it always calls iotlb_sync_map.

This patch moves iotlb_sync_map out from __iommu_map, since it is
unnecessary to call it for each sg segment, especially as iotlb_sync_map
currently flushes the whole tlb. Add a little helper, _iommu_map, for this.

Signed-off-by: Yong Wu 
Reviewed-by: Robin Murphy 
---
 drivers/iommu/iommu.c | 23 ++-
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index ffeebda8d6de..c304a6a30d42 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2426,9 +2426,6 @@ static int __iommu_map(struct iommu_domain *domain, 
unsigned long iova,
size -= pgsize;
}
 
-   if (ops->iotlb_sync_map)
-   ops->iotlb_sync_map(domain);
-
/* unroll mapping in case something went wrong */
if (ret)
iommu_unmap(domain, orig_iova, orig_size - size);
@@ -2438,18 +2435,31 @@ static int __iommu_map(struct iommu_domain *domain, 
unsigned long iova,
return ret;
 }
 
+static int _iommu_map(struct iommu_domain *domain, unsigned long iova,
+ phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
+{
+   const struct iommu_ops *ops = domain->ops;
+   int ret;
+
+   ret = __iommu_map(domain, iova, paddr, size, prot, gfp);
+   if (ret == 0 && ops->iotlb_sync_map)
+   ops->iotlb_sync_map(domain);
+
+   return ret;
+}
+
 int iommu_map(struct iommu_domain *domain, unsigned long iova,
  phys_addr_t paddr, size_t size, int prot)
 {
might_sleep();
-   return __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL);
+   return _iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL);
 }
 EXPORT_SYMBOL_GPL(iommu_map);
 
 int iommu_map_atomic(struct iommu_domain *domain, unsigned long iova,
  phys_addr_t paddr, size_t size, int prot)
 {
-   return __iommu_map(domain, iova, paddr, size, prot, GFP_ATOMIC);
+   return _iommu_map(domain, iova, paddr, size, prot, GFP_ATOMIC);
 }
 EXPORT_SYMBOL_GPL(iommu_map_atomic);
 
@@ -2533,6 +2543,7 @@ static size_t __iommu_map_sg(struct iommu_domain *domain, 
unsigned long iova,
 struct scatterlist *sg, unsigned int nents, int 
prot,
 gfp_t gfp)
 {
+   const struct iommu_ops *ops = domain->ops;
size_t len = 0, mapped = 0;
phys_addr_t start;
unsigned int i = 0;
@@ -2563,6 +2574,8 @@ static size_t __iommu_map_sg(struct iommu_domain *domain, 
unsigned long iova,
sg = sg_next(sg);
}
 
+   if (ops->iotlb_sync_map)
+   ops->iotlb_sync_map(domain);
return mapped;
 
 out_err:
-- 
2.18.0



[PATCH v4 0/7] MediaTek IOMMU improve tlb flush performance in map/unmap

2021-01-07 Thread Yong Wu
This patchset improves tlb flushing performance in iommu_map/unmap
for the MediaTek IOMMU.

For iommu_map, the MediaTek IOMMU currently uses IO_PGTABLE_QUIRK_TLBI_ON_MAP
to do a tlb flush for each memory chunk, which is unnecessary. We can
improve this by flushing the tlb once at the end of iommu_map.

For iommu_unmap, we have already improved this performance with gather.
But the current gather has to take care of its granule size: if the
granule size differs, it does a tlb flush and gathers again. Our HW
doesn't care about the granule size, so we gather the range in our driver.

After this patchset, we achieve only one tlb flush in iommu_map
and one in iommu_unmap.

Regardless of sg, for a single segment, I did a simple test:
  
  size = 20 * SZ_1M;
  /* the worst case, all are 4k mapping. */
  ret = iommu_map(domain, 0x5bb02000, 0x123f1000, size, IOMMU_READ);
  iommu_unmap(domain, 0x5bb02000, size);

This is the comparison of time (unit: us):

                original   after-improve
   map-20M      59943      2347
   unmap-20M    264        36

This patchset also flushes the tlb only once in the iommu_map_sg case.

Patches [1/7][2/7][3/7] are for map, while the others are for unmap.

change note:
v4: a. Rebase on v5.11-rc1.
b. Add a little helper _iommu_map.
c. Fix a build failure in tegra-gart.c; I didn't notice there was another
place calling gart_iommu_sync_map.
d. Switch gather->end to the real end address ("start + size - 1").

v3: 
https://lore.kernel.org/linux-iommu/20201216103607.23050-1-yong...@mediatek.com/#r
Refactor the unmap flow suggested by Robin.
 
v2: 
https://lore.kernel.org/linux-iommu/20201119061836.15238-1-yong...@mediatek.com/
Refactor all the code.
base on v5.10-rc1.

Yong Wu (7):
  iommu: Move iotlb_sync_map out from __iommu_map
  iommu: Add iova and size as parameters in iotlb_sync_map
  iommu/mediatek: Add iotlb_sync_map to sync the whole iova range
  iommu: Switch gather->end to the inclusive end
  iommu/io-pgtable: Allow io_pgtable_tlb ops to be optional
  iommu/mediatek: Gather iova in iommu_unmap to achieve tlb sync once
  iommu/mediatek: Remove the tlb-ops for v7s

 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  2 +-
 drivers/iommu/iommu.c   | 23 +++---
 drivers/iommu/mtk_iommu.c   | 47 +
 drivers/iommu/tegra-gart.c  |  7 ++-
 include/linux/io-pgtable.h  |  8 ++--
 include/linux/iommu.h   |  7 +--
 6 files changed, 52 insertions(+), 42 deletions(-)

-- 
2.18.0




[PATCH] iommu/arm-smmu-v3: Handle duplicated Stream IDs from other masters

2021-01-07 Thread Ajay Kumar
When PCI function drivers (e.g. pci-endpoint-test) are probed for an already
initialized PCIe RC (Root Complex), and the PCIe RC is already bound to the
SMMU, we encounter a situation where the function driver tries to attach
itself to the SMMU with the same stream ID as the PCIe RC and re-initializes
an already initialized STE. This causes the ste_live BUG_ON() in the driver.

The driver already has a check to manage duplicated IDs added to the same
master device, but there can be scenarios like the above where the check
needs to be extended to other masters using the same stream ID.

Signed-off-by: Ajay Kumar 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 33 +
 1 file changed, 33 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index e634bbe60573..a91c3c0e9ee8 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2022,10 +2022,26 @@ static __le64 *arm_smmu_get_step_for_sid(struct 
arm_smmu_device *smmu, u32 sid)
return step;
 }
 
+static bool arm_smmu_check_duplicated_sid(struct arm_smmu_master *master,
+   int sid)
+{
+   int i;
+
+   for (i = 0; i < master->num_sids; ++i)
+   if (master->sids[i] == sid)
+   return true;
+
+   return false;
+}
+
 static void arm_smmu_install_ste_for_dev(struct arm_smmu_master *master)
 {
+   bool sid_in_other_masters;
int i, j;
struct arm_smmu_device *smmu = master->smmu;
+   unsigned long flags;
+   struct arm_smmu_domain *smmu_domain = master->domain;
+   struct arm_smmu_master *other_masters;
 
for (i = 0; i < master->num_sids; ++i) {
u32 sid = master->sids[i];
@@ -2038,6 +2054,23 @@ static void arm_smmu_install_ste_for_dev(struct 
arm_smmu_master *master)
if (j < i)
continue;
 
+   /* Check for stream-ID duplication in masters in given domain */
+   sid_in_other_masters = false;
+   spin_lock_irqsave(&smmu_domain->devices_lock, flags);
+   list_for_each_entry(other_masters, &smmu_domain->devices,
+   domain_head) {
+   sid_in_other_masters =
+   arm_smmu_check_duplicated_sid(other_masters,
+   sid);
+   if (sid_in_other_masters)
+   break;
+   }
+   spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
+
+   /* Skip STE re-init if stream-id found in other masters */
+   if (sid_in_other_masters)
+   continue;
+
arm_smmu_write_strtab_ent(master, sid, step);
}
 }
-- 
2.17.1



[PATCH 5/5] vfio/iommu_type1: Move sanity_check_pfn_list to unmap_unpin_all

2021-01-07 Thread Keqian Zhu
Both the external domain and the iommu backend domain can use the pinning
interface, thus both can add pfns to the dma pfn_list. Moving
sanity_check_pfn_list into unmap_unpin_all applies it to all types of
domain.

Fixes: a54eb55045ae ("vfio iommu type1: Add support for mediated devices")
Signed-off-by: Keqian Zhu 
---
 drivers/vfio/vfio_iommu_type1.c | 38 -
 1 file changed, 18 insertions(+), 20 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 9776a059904d..d796be8bcbc5 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2225,10 +2225,28 @@ static int vfio_iommu_type1_attach_group(void 
*iommu_data,
return ret;
 }
 
+static void vfio_sanity_check_pfn_list(struct vfio_iommu *iommu)
+{
+   struct rb_node *n;
+
+   n = rb_first(&iommu->dma_list);
+   for (; n; n = rb_next(n)) {
+   struct vfio_dma *dma;
+
+   dma = rb_entry(n, struct vfio_dma, node);
+
+   if (WARN_ON(!RB_EMPTY_ROOT(&dma->pfn_list)))
+   break;
+   }
+   /* mdev vendor driver must unregister notifier */
+   WARN_ON(iommu->notifier.head);
+}
+
 static void vfio_iommu_unmap_unpin_all(struct vfio_iommu *iommu)
 {
struct rb_node *node;
 
+   vfio_sanity_check_pfn_list(iommu);
while ((node = rb_first(&iommu->dma_list)))
vfio_remove_dma(iommu, rb_entry(node, struct vfio_dma, node));
 }
@@ -2256,23 +2274,6 @@ static void vfio_iommu_unmap_unpin_reaccount(struct 
vfio_iommu *iommu)
}
 }
 
-static void vfio_sanity_check_pfn_list(struct vfio_iommu *iommu)
-{
-   struct rb_node *n;
-
-   n = rb_first(&iommu->dma_list);
-   for (; n; n = rb_next(n)) {
-   struct vfio_dma *dma;
-
-   dma = rb_entry(n, struct vfio_dma, node);
-
-   if (WARN_ON(!RB_EMPTY_ROOT(&dma->pfn_list)))
-   break;
-   }
-   /* mdev vendor driver must unregister notifier */
-   WARN_ON(iommu->notifier.head);
-}
-
 /*
  * Called when a domain is removed in detach. It is possible that
  * the removed domain decided the iova aperture window. Modify the
@@ -2371,8 +2372,6 @@ static void vfio_iommu_type1_detach_group(void 
*iommu_data,
kfree(group);
 
if (list_empty(&iommu->external_domain->group_list)) {
-   vfio_sanity_check_pfn_list(iommu);
-
/*
 * During dirty page tracking, we can't remove
 * vfio_dma because dirty log will lose.
@@ -2503,7 +2502,6 @@ static void vfio_iommu_type1_release(void *iommu_data)
 
if (iommu->external_domain) {
vfio_release_domain(iommu->external_domain, true);
-   vfio_sanity_check_pfn_list(iommu);
kfree(iommu->external_domain);
}
 
-- 
2.19.1



[PATCH 2/5] vfio/iommu_type1: Populate dirty bitmap for new vfio_dma

2021-01-07 Thread Keqian Zhu
We should populate the dirty bitmap for a newly added vfio_dma, as it can
be accessed arbitrarily if the vfio_iommu is not promoted to pinned_scope.
vfio_dma_populate_bitmap can handle this properly.

Fixes: d6a4c185660c ("vfio iommu: Implementation of ioctl for dirty pages 
tracking")
Signed-off-by: Keqian Zhu 
---
 drivers/vfio/vfio_iommu_type1.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index b0a26e8e0adf..29c8702c3b6e 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1413,7 +1413,9 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
 
if (!ret && iommu->dirty_page_tracking) {
ret = vfio_dma_bitmap_alloc(dma, pgsize);
-   if (ret)
+   if (!ret)
+   vfio_dma_populate_bitmap(iommu, dma);
+   else
vfio_remove_dma(iommu, dma);
}
 
-- 
2.19.1



[PATCH 3/5] vfio/iommu_type1: Populate dirty bitmap when attaching group

2021-01-07 Thread Keqian Zhu
Attaching an iommu backend group will potentially access all dma ranges,
so we should traverse all dma ranges to mark them dirty.

Fixes: d6a4c185660c ("vfio iommu: Implementation of ioctl for dirty pages 
tracking")
Signed-off-by: Keqian Zhu 
---
 drivers/vfio/vfio_iommu_type1.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 29c8702c3b6e..26b7eb2a5cfc 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -255,6 +255,17 @@ static void vfio_dma_populate_bitmap(struct vfio_iommu 
*iommu,
vfio_dma_populate_bitmap_full(dma, pgsize);
 }
 
+static void vfio_iommu_populate_bitmap(struct vfio_iommu *iommu)
+{
+   struct rb_node *n;
+   struct vfio_dma *dma;
+
+   for (n = rb_first(&iommu->dma_list); n; n = rb_next(n)) {
+   dma = rb_entry(n, struct vfio_dma, node);
+   vfio_dma_populate_bitmap(iommu, dma);
+   }
+}
+
 static int vfio_dma_bitmap_alloc_all(struct vfio_iommu *iommu)
 {
struct rb_node *n;
@@ -2190,7 +2201,12 @@ static int vfio_iommu_type1_attach_group(void 
*iommu_data,
 * demotes the iommu scope until it declares itself dirty tracking
 * capable via the page pinning interface.
 */
-   iommu->pinned_page_dirty_scope = false;
+   if (iommu->pinned_page_dirty_scope) {
+   iommu->pinned_page_dirty_scope = false;
+   if (iommu->dirty_page_tracking)
+   vfio_iommu_populate_bitmap(iommu);
+   }
+
mutex_unlock(&iommu->lock);
vfio_iommu_resv_free(&group_resv_regions);
 
-- 
2.19.1



[PATCH 1/5] vfio/iommu_type1: Fix vfio_dma_populate_bitmap to avoid losing dirty log

2021-01-07 Thread Keqian Zhu
Deferring the check of whether a vfio_dma is fully dirty to
update_user_bitmap makes it easy to lose the dirty log. For example, after
promoting the pinned_scope of a vfio_iommu, the vfio_dma is no longer
considered fully dirty, so we may lose dirty log that was generated before
the vfio_iommu was promoted.

The key point is that pinned-dirty is not a real dirty tracking mechanism:
it can't continuously track dirty pages, it merely restricts the dirty
scope. It is essentially the same as fully-dirty; fully-dirty covers the
full scope and pinned-dirty covers the pinned scope.

So we must mark pages pinned-dirty or fully-dirty right after we start
dirty tracking or clear the dirty bitmap, to ensure the dirty log is
recorded right away.

Fixes: d6a4c185660c ("vfio iommu: Implementation of ioctl for dirty pages 
tracking")
Signed-off-by: Keqian Zhu 
---
 drivers/vfio/vfio_iommu_type1.c | 33 ++---
 1 file changed, 22 insertions(+), 11 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index bceda5e8baaa..b0a26e8e0adf 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -224,7 +224,7 @@ static void vfio_dma_bitmap_free(struct vfio_dma *dma)
dma->bitmap = NULL;
 }
 
-static void vfio_dma_populate_bitmap(struct vfio_dma *dma, size_t pgsize)
+static void vfio_dma_populate_bitmap_pinned(struct vfio_dma *dma, size_t 
pgsize)
 {
struct rb_node *p;
unsigned long pgshift = __ffs(pgsize);
@@ -236,6 +236,25 @@ static void vfio_dma_populate_bitmap(struct vfio_dma *dma, 
size_t pgsize)
}
 }
 
+static void vfio_dma_populate_bitmap_full(struct vfio_dma *dma, size_t pgsize)
+{
+   unsigned long pgshift = __ffs(pgsize);
+   unsigned long nbits = dma->size >> pgshift;
+
+   bitmap_set(dma->bitmap, 0, nbits);
+}
+
+static void vfio_dma_populate_bitmap(struct vfio_iommu *iommu,
+struct vfio_dma *dma)
+{
+   size_t pgsize = (size_t)1 << __ffs(iommu->pgsize_bitmap);
+
+   if (iommu->pinned_page_dirty_scope)
+   vfio_dma_populate_bitmap_pinned(dma, pgsize);
+   else if (dma->iommu_mapped)
+   vfio_dma_populate_bitmap_full(dma, pgsize);
+}
+
 static int vfio_dma_bitmap_alloc_all(struct vfio_iommu *iommu)
 {
struct rb_node *n;
@@ -257,7 +276,7 @@ static int vfio_dma_bitmap_alloc_all(struct vfio_iommu 
*iommu)
}
return ret;
}
-   vfio_dma_populate_bitmap(dma, pgsize);
+   vfio_dma_populate_bitmap(iommu, dma);
}
return 0;
 }
@@ -987,13 +1006,6 @@ static int update_user_bitmap(u64 __user *bitmap, struct 
vfio_iommu *iommu,
unsigned long shift = bit_offset % BITS_PER_LONG;
unsigned long leftover;
 
-   /*
-* mark all pages dirty if any IOMMU capable device is not able
-* to report dirty pages and all pages are pinned and mapped.
-*/
-   if (!iommu->pinned_page_dirty_scope && dma->iommu_mapped)
-   bitmap_set(dma->bitmap, 0, nbits);
-
if (shift) {
bitmap_shift_left(dma->bitmap, dma->bitmap, shift,
  nbits + shift);
@@ -1019,7 +1031,6 @@ static int vfio_iova_dirty_bitmap(u64 __user *bitmap, 
struct vfio_iommu *iommu,
struct vfio_dma *dma;
struct rb_node *n;
unsigned long pgshift = __ffs(iommu->pgsize_bitmap);
-   size_t pgsize = (size_t)1 << pgshift;
int ret;
 
/*
@@ -1055,7 +1066,7 @@ static int vfio_iova_dirty_bitmap(u64 __user *bitmap, 
struct vfio_iommu *iommu,
 * pages which are marked dirty by vfio_dma_rw()
 */
bitmap_clear(dma->bitmap, 0, dma->size >> pgshift);
-   vfio_dma_populate_bitmap(dma, pgsize);
+   vfio_dma_populate_bitmap(iommu, dma);
}
return 0;
 }
-- 
2.19.1



[PATCH 4/5] vfio/iommu_type1: Carefully use unmap_unpin_all during dirty tracking

2021-01-07 Thread Keqian Zhu
If we detach a group during dirty page tracking, we shouldn't remove the
vfio_dma, because the dirty log would be lost.

But we don't prevent unmap_unpin_all in vfio_iommu_release, because under
the normal procedure, dirty tracking has already been stopped.

Fixes: d6a4c185660c ("vfio iommu: Implementation of ioctl for dirty pages 
tracking")
Signed-off-by: Keqian Zhu 
---
 drivers/vfio/vfio_iommu_type1.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 26b7eb2a5cfc..9776a059904d 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2373,7 +2373,12 @@ static void vfio_iommu_type1_detach_group(void 
*iommu_data,
if (list_empty(&iommu->external_domain->group_list)) {
vfio_sanity_check_pfn_list(iommu);
 
-   if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu))
+   /*
+* During dirty page tracking, we can't remove
+* vfio_dma because the dirty log would be lost.
+*/
+   if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu) &&
+   !iommu->dirty_page_tracking)
vfio_iommu_unmap_unpin_all(iommu);
 
kfree(iommu->external_domain);
@@ -2406,10 +2411,15 @@ static void vfio_iommu_type1_detach_group(void 
*iommu_data,
 * iommu and external domain doesn't exist, then all the
 * mappings go away too. If it's the last domain with iommu and
 * external domain exist, update accounting
+*
+* Note: During dirty page tracking, we can't remove vfio_dma
+* because the dirty log would be lost. Just updating the accounting
+* is a good choice.
 */
if (list_empty(&domain->group_list)) {
if (list_is_singular(&iommu->domain_list)) {
-   if (!iommu->external_domain)
+   if (!iommu->external_domain &&
+   !iommu->dirty_page_tracking)
vfio_iommu_unmap_unpin_all(iommu);
else
vfio_iommu_unmap_unpin_reaccount(iommu);
-- 
2.19.1



[PATCH 0/5] vfio/iommu_type1: Some fixes about dirty tracking

2021-01-07 Thread Keqian Zhu
Hi,

In the patch series [1], I put forward some fixes and optimizations for
vfio_iommu_type1. This series extracts and improves the fix part of it.

[1] 
https://lore.kernel.org/linux-iommu/20201210073425.25960-1-zhukeqi...@huawei.com/T/#t

Thanks,
Keqian

Keqian Zhu (5):
  vfio/iommu_type1: Fix vfio_dma_populate_bitmap to avoid losing dirty log
  vfio/iommu_type1: Populate dirty bitmap for new vfio_dma
  vfio/iommu_type1: Populate dirty bitmap when attaching group
  vfio/iommu_type1: Carefully use unmap_unpin_all during dirty tracking
  vfio/iommu_type1: Move sanity_check_pfn_list to unmap_unpin_all

 drivers/vfio/vfio_iommu_type1.c | 107 +---
 1 file changed, 72 insertions(+), 35 deletions(-)

-- 
2.19.1



Re: [PATCH v2 6/6] iommu: Delete iommu_dev_has_feature()

2021-01-07 Thread John Garry

On 07/01/2021 01:18, Lu Baolu wrote:
> On 1/6/21 9:35 PM, John Garry wrote:
>> Function iommu_dev_has_feature() has never been referenced in the tree,
>> and there does not appear to be anything coming soon to use it, so delete
>> it.

Hi baolu,

> It will be used by the device driver which wants to support the aux-
> domain capability, for example, the below series is under discussion.
>
> https://lore.kernel.org/linux-pci/160408357912.912050.17005584526266191420.st...@djiang5-desk3.ch.intel.com/

So I did check linux-iommu lore for recent references, above, but did
not see this one - that really should have cc'ed the linux-iommu list
(which it didn't).

> The typical use case is
>
>	if (iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)) {
>		rc = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_AUX);
>		if (rc < 0) {
>			dev_warn(dev, "Failed to enable aux-domain: %d\n", rc);
>			return rc;
>		}
>	}
>
> So please don't remove it.

OK, fine. It also seems that this API has not had a user since it was
introduced in v5.2.

Thanks,
John

Ps. I suppose a v3 series is not needed ATM - this patch can just be
dropped.
