Re: WARNING in dma_map_page_attrs

2020-10-23 Thread syzbot
syzbot has found a reproducer for the following issue on:

HEAD commit:    3cb12d27 Merge tag 'net-5.10-rc1' of git://git.kernel.org/..
git tree:   net
console output: https://syzkaller.appspot.com/x/log.txt?x=1312539050
kernel config:  https://syzkaller.appspot.com/x/.config?x=46c6fea3eb827ae1
dashboard link: https://syzkaller.appspot.com/bug?extid=34dc2fea3478e659af01
compiler:   gcc (GCC) 10.1.0-syz 20200507
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=1685866450
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11f402ef90

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+34dc2fea3478e659a...@syzkaller.appspotmail.com

netdevsim netdevsim0 netdevsim1: set [1, 0] type 2 family 0 port 6081 - 0
netdevsim netdevsim0 netdevsim2: set [1, 0] type 2 family 0 port 6081 - 0
netdevsim netdevsim0 netdevsim3: set [1, 0] type 2 family 0 port 6081 - 0
infiniband syz2: set active
infiniband syz2: added macvlan0
------------[ cut here ]------------
WARNING: CPU: 1 PID: 8488 at kernel/dma/mapping.c:149 
dma_map_page_attrs+0x493/0x700 kernel/dma/mapping.c:149
Modules linked in:
CPU: 1 PID: 8488 Comm: syz-executor144 Not tainted 5.9.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
RIP: 0010:dma_map_page_attrs+0x493/0x700 kernel/dma/mapping.c:149
Code: 80 3c 10 00 0f 85 ed 01 00 00 48 8b 1d 86 38 e9 0c e9 2d fc ff ff 48 89 
c3 e9 d1 fd ff ff e8 04 11 12 00 0f 0b e8 fd 10 12 00 <0f> 0b 49 c7 c4 ff ff ff 
ff e9 d5 fd ff ff e8 ea 10 12 00 48 8d 7b
RSP: 0018:c9fdec68 EFLAGS: 00010293
RAX:  RBX: 894d1060 RCX: 815df1e3
RDX: 8880208c1a40 RSI: 815df5b3 RDI: 8880196f8b00
RBP: 88801412d800 R08: 0002 R09: 
R10: 0002 R11:  R12: ea504b40
R13: 8880196f86e8 R14: 08b8 R15: 0002
FS:  01b26880() GS:8880b9f0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 22c0 CR3: 22446000 CR4: 001506e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 dma_map_single_attrs include/linux/dma-mapping.h:279 [inline]
 ib_dma_map_single include/rdma/ib_verbs.h:3967 [inline]
 ib_mad_post_receive_mads+0x23f/0xd60 drivers/infiniband/core/mad.c:2715
 ib_mad_port_start drivers/infiniband/core/mad.c:2862 [inline]
 ib_mad_port_open drivers/infiniband/core/mad.c:3016 [inline]
 ib_mad_init_device+0x72b/0x1400 drivers/infiniband/core/mad.c:3092
 add_client_context+0x405/0x5e0 drivers/infiniband/core/device.c:680
 enable_device_and_get+0x1d5/0x3c0 drivers/infiniband/core/device.c:1301
 ib_register_device drivers/infiniband/core/device.c:1376 [inline]
 ib_register_device+0x7a7/0xa40 drivers/infiniband/core/device.c:1335
 rxe_register_device+0x46d/0x570 drivers/infiniband/sw/rxe/rxe_verbs.c:1182
 rxe_add+0x12fe/0x16d0 drivers/infiniband/sw/rxe/rxe.c:247
 rxe_net_add+0x8c/0xe0 drivers/infiniband/sw/rxe/rxe_net.c:507
 rxe_newlink drivers/infiniband/sw/rxe/rxe.c:269 [inline]
 rxe_newlink+0xb7/0xe0 drivers/infiniband/sw/rxe/rxe.c:250
 nldev_newlink+0x30e/0x540 drivers/infiniband/core/nldev.c:1555
 rdma_nl_rcv_msg+0x367/0x690 drivers/infiniband/core/netlink.c:195
 rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline]
 rdma_nl_rcv+0x2f2/0x440 drivers/infiniband/core/netlink.c:259
 netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
 netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1330
 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1919
 sock_sendmsg_nosec net/socket.c:651 [inline]
 sock_sendmsg+0xcf/0x120 net/socket.c:671
 sys_sendmsg+0x6e8/0x810 net/socket.c:2353
 ___sys_sendmsg+0xf3/0x170 net/socket.c:2407
 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2440
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x443699
Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 
89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 
db 0d fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:7ffc067db418 EFLAGS: 0246 ORIG_RAX: 002e
RAX: ffda RBX: 0003 RCX: 00443699
RDX:  RSI: 22c0 RDI: 0003
RBP: 7ffc067db420 R08: 01bb R09: 01bb
R10:  R11: 0246 R12: 7ffc067db430
R13:  R14:  R15: 
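
For context, the call trace corresponds to creating a soft-RoCE (rxe) link on top
of the macvlan device via RDMA netlink (RDMA_NLDEV_CMD_NEWLINK); a rough userspace
equivalent of the reproducer, assuming the iproute2 "rdma" tool, would be:

    rdma link add syz2 type rxe netdev macvlan0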



Re: [PATCH v4 0/7] arm64: Default to 32-bit wide ZONE_DMA

2020-10-23 Thread Jeremy Linton

Hi,

On 10/21/20 7:34 AM, Nicolas Saenz Julienne wrote:

Using two distinct DMA zones turned out to be problematic. Here's an
attempt go back to a saner default.

I tested this on both a RPi4 and QEMU.


I've tested this in ACPI mode on the rpi4 (4+8G with/without the 3G 
limiter) as well, with Ard's IORT patch. Nothing seems to have regressed.


Thanks,

Tested-by: Jeremy Linton 






---

Changes since v3:
  - Drop patch adding define in dma-mapping
  - Address small review changes
  - Update Ard's patch
  - Add new patch removing examples from mmzone.h

Changes since v2:
  - Introduce Ard's patch
  - Improve OF dma-ranges parsing function
  - Add unit test for OF function
  - Address small changes
  - Move crashkernel reservation later in boot process

Changes since v1:
  - Parse dma-ranges instead of using machine compatible string

Ard Biesheuvel (1):
   arm64: mm: Set ZONE_DMA size based on early IORT scan

Nicolas Saenz Julienne (6):
   arm64: mm: Move reserve_crashkernel() into mem_init()
   arm64: mm: Move zone_dma_bits initialization into zone_sizes_init()
   of/address: Introduce of_dma_get_max_cpu_address()
   of: unittest: Add test for of_dma_get_max_cpu_address()
   arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges
   mm: Remove examples from enum zone_type comment

  arch/arm64/mm/init.c  | 16 ++--
  drivers/acpi/arm64/iort.c | 52 +++
  drivers/of/address.c  | 42 +++
  drivers/of/unittest.c | 18 ++
  include/linux/acpi_iort.h |  4 +++
  include/linux/mmzone.h| 20 ---
  include/linux/of.h|  7 ++
  7 files changed, 130 insertions(+), 29 deletions(-)





Re: [PATCH v4 5/7] arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges

2020-10-23 Thread Catalin Marinas
On Fri, Oct 23, 2020 at 05:27:49PM +0200, Nicolas Saenz Julienne wrote:
> On Thu, 2020-10-22 at 19:06 +0100, Catalin Marinas wrote:
> > On Wed, Oct 21, 2020 at 02:34:35PM +0200, Nicolas Saenz Julienne wrote:
> > > @@ -188,9 +186,11 @@ static phys_addr_t __init max_zone_phys(unsigned int 
> > > zone_bits)
> > >  static void __init zone_sizes_init(unsigned long min, unsigned long max)
> > >  {
> > >   unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
> > > + unsigned int __maybe_unused dt_zone_dma_bits;
> > >  
> > >  #ifdef CONFIG_ZONE_DMA
> > > - zone_dma_bits = ARM64_ZONE_DMA_BITS;
> > > + dt_zone_dma_bits = ilog2(of_dma_get_max_cpu_address(NULL));
> > > + zone_dma_bits = min(32U, dt_zone_dma_bits);
> > 
> > A thought: can we remove the min here and expand ZONE_DMA to whatever
> > dt_zone_dma_bits says? More on this below.
> 
> On most platforms we'd get PHYS_ADDR_MAX, or something bigger than the actual
> amount of RAM. Which would ultimately create a system wide ZONE_DMA. At first
> sight, I don't see it breaking dma-direct in any way.
> 
> On the other hand, there is a big amount of MMIO devices out there that can
> only handle 32-bit addressing. Be it PCI cards or actual IP cores. To make
> things worse, this limitation is often expressed in the driver, not FW (with
> dma_set_mask() and friends). If those devices aren't behind an IOMMU we have to
> be able to provide at least 32-bit addressable memory. See this comment from
> dma_direct_supported():
> 
> /*
>  * Because 32-bit DMA masks are so common we expect every architecture
>  * to be able to satisfy them - either by not supporting more physical
>  * memory, or by providing a ZONE_DMA32.  If neither is the case, the
>  * architecture needs to use an IOMMU instead of the direct mapping.
>  */
> 
> I think, for the common case, we're stuck with at least one zone spanning the
> 32-bit address space.

You are right, I guess it makes sense to keep a 32-bit zone as not all
devices would be described as such.

> > >   arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
> > >   max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
> > >  #endif
> > 
> > I was talking earlier to Ard and Robin on the ZONE_DMA32 history and the
> > need for max_zone_phys(). This was rather theoretical, the Seattle
> > platform has all RAM starting above 4GB and that led to an empty
> > ZONE_DMA32 originally. The max_zone_phys() hack was meant to lift
> > ZONE_DMA32 into the bottom of the RAM, on the assumption that such
> > 32-bit devices would have a DMA offset hardwired. We are not aware of
> > any such case on arm64 systems and even on Seattle, IIUC 32-bit devices
> > only work if they are behind an SMMU (so no hardwired offset).
> > 
> > In hindsight, it would have made more sense on platforms with RAM above
> > 4GB to expand ZONE_DMA32 to cover the whole memory (so empty
> > ZONE_NORMAL). Something like:
> > 
> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > index a53c1e0fb017..7d5e3dd85617 100644
> > --- a/arch/arm64/mm/init.c
> > +++ b/arch/arm64/mm/init.c
> > @@ -187,8 +187,12 @@ static void __init reserve_elfcorehdr(void)
> >   */
> >  static phys_addr_t __init max_zone_phys(unsigned int zone_bits)
> >  {
> > -   phys_addr_t offset = memblock_start_of_DRAM() & GENMASK_ULL(63, 
> > zone_bits);
> > -   return min(offset + (1ULL << zone_bits), memblock_end_of_DRAM());
> > +   phys_addr_t zone_mask = 1ULL << zone_bits;
> > +
> > +   if (!(memblock_start_of_DRAM() & zone_mask))
> > +   zone_mask = PHYS_ADDR_MAX;
> > +
> > +   return min(zone_mask, memblock_end_of_DRAM());
> >  }
> >  
> >  static void __init zone_sizes_init(unsigned long min, unsigned long max)
> > 
> > I don't think this makes any difference for ZONE_DMA unless a
> > broken DT or IORT reports the max CPU address below the start of DRAM.
> > 
> > There's a minor issue if of_dma_get_max_cpu_address() matches
> > memblock_end_of_DRAM() but they are not a power of 2. We'd be left with
> > a bit of RAM at the end in ZONE_NORMAL due to ilog2 truncation.
> 
> I agree it makes no sense to create more than one zone when the beginning of
> RAM is located above the 32-bit address space. I'm all for disregarding the
> possibility of hardwired offsets. As a bonus, as we already discussed some 
> time
> ago, this is something that never played well with current dma-direct code[1].
> 
> [1] https://lkml.org/lkml/2020/9/8/377

Maybe this one is still worth fixing, at least for consistency. But it's
not urgent.

My diff above has a side-effect that if dt_zone_dma_bits is below the
start of DRAM, ZONE_DMA gets expanded to PHYS_ADDR_MAX. If this was
32-bit, that's fine but if it was, say, 30-bit because of some firmware
misdescription with RAM starting at 2GB, we end up with no ZONE_DMA32. I
think max_zone_phys() could cap this at 32, as a safety mechanism:

static phys_addr_t __init max_zone_phys(unsigned int zone_bits)
{
phys_addr_t zone_mask = (1ULL << zone_bits) - 1;
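
A rough sketch of that 32-bit cap, as an illustration only (reusing the memblock
helpers from the diff above, not the actual follow-up code):

static phys_addr_t __init max_zone_phys(unsigned int zone_bits)
{
	phys_addr_t zone_mask = (1ULL << zone_bits) - 1;
	phys_addr_t phys_start = memblock_start_of_DRAM();

	/* RAM starts above 4GB: no 32-bit zone is possible, so span everything */
	if (phys_start > U32_MAX)
		zone_mask = PHYS_ADDR_MAX;
	/* RAM starts above the requested limit: fall back to a 32-bit zone */
	else if (phys_start > zone_mask)
		zone_mask = U32_MAX;

	return min(zone_mask, memblock_end_of_DRAM() - 1) + 1;
}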

Re: [PATCH 2/4] iommu/mediatek: Add iotlb_sync_range() support

2020-10-23 Thread Robin Murphy

On 2020-10-23 06:57, chao hao wrote:

On Wed, 2020-10-21 at 17:55 +0100, Robin Murphy wrote:

On 2020-10-19 12:30, Chao Hao wrote:

MTK_IOMMU driver writes one page entry and does tlb flush at a time
currently. More optimal would be to aggregate the writes and flush
BUS buffer in the end.


That's exactly what iommu_iotlb_gather_add_page() is meant to achieve.
Rather than jumping straight into hacking up a new API to go round the
back of the existing API design, it would be far better to ask the
question of why that's not behaving as expected.


Thanks for you review!

iommu_iotlb_gather_add_page() is called from io_pgtable_tlb_add_page(), which is
only invoked on the unmap path; the map flow in the Linux IOMMU core doesn't use
it. But the MTK IOMMU needs to do a TLB sync on both map and unmap to avoid stale
entries lingering in the IOMMU TLB.


Right, as I suspected, if it's primarily about the map path then the 
answer is to use the existing API intended to accommodate that specific 
case; if that API doesn't quite do what you need, then just fix it! It 
doesn't make sense to clutter up the IOMMU core with multiple 
overlapping APIs for TLB invalidation, especially when TLB invalidation 
ultimately isn't all that complicated.



In addition, we hope to do the TLB sync only once, after all the pages have been
mapped. iommu_iotlb_gather_add_page() may sync more than once, because a single
buffer can consist of different page sizes (1MB/64K/4K).


So if your hardware doesn't care about the granule size used for 
invalidation matching the underlying mapping, use an implementation of 
iommu_flush_ops::tlb_add_page that doesn't care about the granule size!



Based on the previous considerations, we didn't find a more appropriate way to do
the TLB sync for the MTK IOMMU, so we added a new API.


I know I'm probably more familiar with this code than most people, but 
from my perspective, this reads like "my car was dirty, so I had to buy 
a new car" ;)



For a 50MB buffer mapping, if the mtk_iommu driver uses iotlb_sync_range()
instead of tlb_add_range() and tlb_flush_walk/leaf(), performance improves by
50% or more (depending on the page sizes involved) compared with flushing after
each page entry update. So we prefer to use iotlb_sync_range() to replace
iotlb_sync(), tlb_add_range() and tlb_flush_walk/leaf() on MTK platforms.


In the case of mapping, it sounds like what you actually want to do is
hook up .iotlb_sync_map and generally make IO_PGTABLE_QUIRK_TLBI_ON_MAP
cleverer, because the current implementation is as dumb as it could
possibly be.


iotlb_sync_map only has one parameter (the iommu_domain), but an MTK
iommu_domain may cover the whole IOVA space; if mtk_iommu did the TLB sync based
on the iommu_domain, it would in fact be equivalent to a full TLB flush.


The first, and so far only, user of this API is the Tegra GART, which 
can only do a "flush all" operation, so the API is currently only as 
complex as it needs to be, which is to say "not very". There are plenty 
of options.



The IOMMU driver does a TLB sync for every page mapped when the MTK IOMMU sets
IO_PGTABLE_QUIRK_TLBI_ON_MAP (via io_pgtable_tlb_flush_walk); as the commit
message mentioned, this hurts mapping performance on MTK platforms.


And as I said, that quirk is implemented in a really simplistic way, 
which is sure to be functionally correct, but has never been given any 
performance consideration.



In fact if we simply passed an address range to
.iotlb_sync_map, io-pgtable probably wouldn't need to be involved at all
any more.


I know adding a new API is probably not a good idea, but I found that the TLB
sync is only done after mapping a single page. If mtk_iommu wants to do the TLB
sync once after all the pages have been mapped, could you give me some advice?
Thanks!


Getting rid of IO_PGTABLE_QUIRK_TLBI_ON_MAP and simply wiring 
.iotlb_sync_map to .flush_iotlb_all is certainly the easiest way to make 
.map quicker, although it will obviously have *some* impact on any other 
live mappings. Given that in principle this should have the least CPU 
overhead, then depending on TLB usage patterns there's a small chance it 
might actually work out as a reasonable tradeoff, so I wouldn't 
necessarily rule it out as a viable option without at least trying some 
tests.
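
As a minimal illustration of that wiring (a sketch only, assuming the driver's
existing mtk_iommu_flush_iotlb_all() callback and the current one-argument
prototype of .iotlb_sync_map):

/* Reuse the full flush as the post-map sync, so .map no longer needs
 * IO_PGTABLE_QUIRK_TLBI_ON_MAP to invalidate after every PTE write.
 */
static void mtk_iommu_iotlb_sync_map(struct iommu_domain *domain)
{
	mtk_iommu_flush_iotlb_all(domain);
}

with the function hooked up as .iotlb_sync_map in mtk_iommu_ops.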


If it's cheap to *issue* invalidation commands, and the expense is in 
waiting for them to actually complete, then you could still fire off an 
invalidate for each page from .map, then wire up the sync (wait) step to 
.iotlb_sync_map, still without needing any core changes.


Otherwise, it would seem reasonable to pass the complete address and 
size of the iommu_map() operation to .iotlb_sync_map, so drivers can 
perform their whole invalidation operation synchronously there.
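
As a hypothetical sketch of that interface change (not an existing kernel API in
this form at the time of writing; the name mirrors the current callback):

--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
-	void (*iotlb_sync_map)(struct iommu_domain *domain);
+	void (*iotlb_sync_map)(struct iommu_domain *domain, unsigned long iova,
+			       size_t size);

iommu_map() would then call ops->iotlb_sync_map(domain, iova, size) once after
the whole operation succeeds, and a driver could invalidate exactly that range in
one go.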


What I *don't* think makes sense is to try passing a gather structure 
through .map in the same way as for .unmap. That seems a bit too 
invasive for what is still a fairly exceptional case, and half the stuff 
that unmap operations will use the 

Re: [PATCH v4 5/7] arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges

2020-10-23 Thread Nicolas Saenz Julienne
Hi Catalin,

On Thu, 2020-10-22 at 19:06 +0100, Catalin Marinas wrote:
> On Wed, Oct 21, 2020 at 02:34:35PM +0200, Nicolas Saenz Julienne wrote:
> > @@ -188,9 +186,11 @@ static phys_addr_t __init max_zone_phys(unsigned int 
> > zone_bits)
> >  static void __init zone_sizes_init(unsigned long min, unsigned long max)
> >  {
> > unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
> > +   unsigned int __maybe_unused dt_zone_dma_bits;
> >  
> >  #ifdef CONFIG_ZONE_DMA
> > -   zone_dma_bits = ARM64_ZONE_DMA_BITS;
> > +   dt_zone_dma_bits = ilog2(of_dma_get_max_cpu_address(NULL));
> > +   zone_dma_bits = min(32U, dt_zone_dma_bits);
> 
> A thought: can we remove the min here and expand ZONE_DMA to whatever
> dt_zone_dma_bits says? More on this below.

On most platforms we'd get PHYS_ADDR_MAX, or something bigger than the actual
amount of RAM. Which would ultimately create a system wide ZONE_DMA. At first
sight, I don't see it breaking dma-direct in any way.

On the other hand, there is a big amount of MMIO devices out there that can
only handle 32-bit addressing. Be it PCI cards or actual IP cores. To make
things worse, this limitation is often expressed in the driver, not FW (with
dma_set_mask() and friends). If those devices aren't behind an IOMMU we have to be
able to provide at least 32-bit addressable memory. See this comment from
dma_direct_supported():

/*
 * Because 32-bit DMA masks are so common we expect every architecture
 * to be able to satisfy them - either by not supporting more physical
 * memory, or by providing a ZONE_DMA32.  If neither is the case, the
 * architecture needs to use an IOMMU instead of the direct mapping.
 */

I think, for the common case, we're stuck with at least one zone spanning the
32-bit address space.

> > arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
> > max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
> >  #endif
> 
> I was talking earlier to Ard and Robin on the ZONE_DMA32 history and the
> need for max_zone_phys(). This was rather theoretical, the Seattle
> platform has all RAM starting above 4GB and that led to an empty
> ZONE_DMA32 originally. The max_zone_phys() hack was meant to lift
> ZONE_DMA32 into the bottom of the RAM, on the assumption that such
> 32-bit devices would have a DMA offset hardwired. We are not aware of
> any such case on arm64 systems and even on Seattle, IIUC 32-bit devices
> only work if they are behind an SMMU (so no hardwired offset).
> 
> In hindsight, it would have made more sense on platforms with RAM above
> 4GB to expand ZONE_DMA32 to cover the whole memory (so empty
> ZONE_NORMAL). Something like:
> 
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index a53c1e0fb017..7d5e3dd85617 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -187,8 +187,12 @@ static void __init reserve_elfcorehdr(void)
>   */
>  static phys_addr_t __init max_zone_phys(unsigned int zone_bits)
>  {
> - phys_addr_t offset = memblock_start_of_DRAM() & GENMASK_ULL(63, 
> zone_bits);
> - return min(offset + (1ULL << zone_bits), memblock_end_of_DRAM());
> + phys_addr_t zone_mask = 1ULL << zone_bits;
> +
> + if (!(memblock_start_of_DRAM() & zone_mask))
> + zone_mask = PHYS_ADDR_MAX;
> +
> + return min(zone_mask, memblock_end_of_DRAM());
>  }
>  
>  static void __init zone_sizes_init(unsigned long min, unsigned long max)
> 
> I don't think this makes any difference for ZONE_DMA unless a
> broken DT or IORT reports the max CPU address below the start of DRAM.
> 
> There's a minor issue if of_dma_get_max_cpu_address() matches
> memblock_end_of_DRAM() but they are not a power of 2. We'd be left with
> a bit of RAM at the end in ZONE_NORMAL due to ilog2 truncation.

I agree it makes no sense to create more than one zone when the beginning of
RAM is located above the 32-bit address space. I'm all for disregarding the
possibility of hardwired offsets. As a bonus, as we already discussed some time
ago, this is something that never played well with current dma-direct code[1].

Regards,
Nicolas

[1] https://lkml.org/lkml/2020/9/8/377




Re: [PATCH v3 11/24] iommu/io-pgtable-arm-v7s: Quad lvl1 pgtable for MediaTek

2020-10-23 Thread Robin Murphy

On 2020-09-30 08:06, Yong Wu wrote:

The standard input IOVA width is 32 bits. MediaTek quadruples the lvl1 pagetable
(4 * lvl1); the lvl2 pagetable is unchanged. The IOVA width can then reach
34 bits.

Signed-off-by: Yong Wu 
---
  drivers/iommu/io-pgtable-arm-v7s.c | 13 ++---
  drivers/iommu/mtk_iommu.c  |  2 +-
  2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm-v7s.c 
b/drivers/iommu/io-pgtable-arm-v7s.c
index 8362fdf76657..306bae2755ed 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -50,10 +50,17 @@
   */
  #define ARM_V7S_ADDR_BITS 32
  #define _ARM_V7S_LVL_BITS(lvl)(16 - (lvl) * 4)
+/* MediaTek: totally 34bits, 14bits at lvl1 and 8bits at lvl2. */
+#define _ARM_V7S_LVL_BITS_MTK(lvl) (20 - (lvl) * 6)


This should be defined in terms of both lvl and cfg->ias. The formula here 
is nothing more than a disgusting trick I made up since a linear 
interpolation happened to fit the required numbers. That said, all of 
these bits pretending that short-descriptor is a well-defined recursive 
format only served to allow the rest of the code to look more like the 
LPAE code - IIRC they've already diverged a fair bit since then, so 
frankly a lot of this could stand to be unpicked and made considerably 
clearer by simply accepting that level 1 and level 2 are different from 
each other.
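
For illustration, one way to express the level-1 width in terms of cfg->ias
instead of a MediaTek-specific constant (a sketch, not the existing macro):

/* Level 2 keeps the architectural 8 index bits; level 1 gets whatever of
 * the input address space remains above the 20 bits consumed by the
 * level-2 index (8) plus the 4K page offset (12). ias = 32 gives 12 bits
 * at level 1, ias = 34 gives the 14 bits MediaTek needs.
 */
#define _ARM_V7S_LVL_BITS(lvl, cfg)	((lvl) == 1 ? ((cfg)->ias - 20) : 8)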


Robin.


  #define ARM_V7S_LVL_SHIFT(lvl)(ARM_V7S_ADDR_BITS - (4 + 8 * 
(lvl)))
  #define ARM_V7S_TABLE_SHIFT   10
  
-#define ARM_V7S_PTES_PER_LVL(lvl, cfg)	(1 << _ARM_V7S_LVL_BITS(lvl))

+#define ARM_V7S_PTES_PER_LVL(lvl, cfg) ({  \
+   int _lvl = lvl; \
+   !arm_v7s_is_mtk_enabled(cfg) ?  \
+(1 << _ARM_V7S_LVL_BITS(_lvl)) : (1 << _ARM_V7S_LVL_BITS_MTK(_lvl));\
+})
+
  #define ARM_V7S_TABLE_SIZE(lvl, cfg)  \
(ARM_V7S_PTES_PER_LVL(lvl, cfg) * sizeof(arm_v7s_iopte))
  
@@ -63,7 +70,7 @@

  #define _ARM_V7S_IDX_MASK(lvl, cfg)   (ARM_V7S_PTES_PER_LVL(lvl, cfg) - 1)
  #define ARM_V7S_LVL_IDX(addr, lvl, cfg)   ({  \
int _l = lvl;   \
-   ((u32)(addr) >> ARM_V7S_LVL_SHIFT(_l)) & _ARM_V7S_IDX_MASK(_l, cfg); \
+   ((addr) >> ARM_V7S_LVL_SHIFT(_l)) & _ARM_V7S_IDX_MASK(_l, cfg); \
  })
  
  /*

@@ -755,7 +762,7 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct 
io_pgtable_cfg *cfg,
  {
struct arm_v7s_io_pgtable *data;
  
-	if (cfg->ias > ARM_V7S_ADDR_BITS)

+   if (cfg->ias > (arm_v7s_is_mtk_enabled(cfg) ? 34 : ARM_V7S_ADDR_BITS))
return NULL;
  
  	if (cfg->oas > (arm_v7s_is_mtk_enabled(cfg) ? 35 : ARM_V7S_ADDR_BITS))

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index f6a2e3eb59d2..6e85c9976a33 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -316,7 +316,7 @@ static int mtk_iommu_domain_finalise(struct 
mtk_iommu_domain *dom)
IO_PGTABLE_QUIRK_TLBI_ON_MAP |
IO_PGTABLE_QUIRK_ARM_MTK_EXT,
.pgsize_bitmap = mtk_iommu_ops.pgsize_bitmap,
-   .ias = 32,
+   .ias = 34,
.oas = 35,
.tlb = _iommu_flush_ops,
.iommu_dev = data->dev,




Re: [PATCH v4 0/4] Add system mmu support for Armada-806

2020-10-23 Thread Tomasz Nowicki

On 10/23/20 12:33 PM, Robin Murphy wrote:

On 2020-10-23 13:19, Tomasz Nowicki wrote:

Hi Denis,

Sorry for late response, we had to check few things. Please see 
comments inline.


On 10/6/20 3:16 PM, Denis Odintsov wrote:

Hi,


Am 15.07.2020 um 09:06 schrieb Tomasz Nowicki :

The series is meant to support SMMU for AP806 and a workaround
for accessing ARM SMMU 64bit registers is the gist of it.

For the record, AP-806 can't access SMMU registers with 64bit width.
This patches split the readq/writeq into two 32bit accesses instead
and update DT bindings.

The series was successfully tested on a vanilla v5.8-rc3 kernel and
Intel e1000e PCIe NIC. The same for platform devices like SATA and USB.

For reference, previous versions are listed below:
V1: https://lkml.org/lkml/2018/10/15/373
V2: https://lkml.org/lkml/2019/7/11/426
V3: https://lkml.org/lkml/2020/7/2/1114



1) After enabling SMMU on Armada 8040, and 
ARM_SMMU_DISABLE_BYPASS_BY_DEFAUL=y by default in kernel since 
954a03be033c7cef80ddc232e7cbdb17df735663,
internal eMMC is prevented from being initialised (as there is no 
iommus property for ap_sdhci0)
Disabling "Disable bypass by default" make it work, but the patch 
highly suggest doing it properly.
I wasn't able to find correct path for ap_sdhci for iommus in any 
publicly available documentation,

would be highly appreciated addressed properly, thank you!

2) Second issue I got (btw I have ClearFog GT 8k armada-8040 based 
board) is mpci ath10k card.
It is found, it is enumerated, it is visible in lspci, but it fails 
to be initialised. Here is the log:


Firmware has to configure and assign device StreamIDs. Most of the 
devices are configured properly and supported in public FW. However, 
for both these cases (ap_sdhci0 and PCIe) some extra (u-boot/UEFI/ATF) 
patches are required which are not available yet. Sorry we let that 
happen.


Since we have dependency on custom FW and we cannot enforce people to 
patch their FW we will send the follow up fix patch (v5.9+) and revert 
respective DTS changes.


Note that it should be sufficient to simply keep the SMMU node disabled, 
rather than fully revert everything. For example, the PCIe SMMU for Arm 
Juno boards has been in that state for a long time - there are reasons 
why it isn't (yet) 100% usable for everyone, but it can easily be 
enabled locally for development (as I do).




Actually that was our plan :) but then we decided to keep DTS clean if 
something is not used. Your reasoning, however, does make sense and we 
will go for it.


Thanks,
Tomasz



The most important Armada-806 SMMU driver enhancements were merged so 
people who still willing to use SMMU need to provide proper DTB and 
use ARM_SMMU_DISABLE_BYPASS_BY_DEFAUL=n (or via kernel command line) 
with extra cautious.


Thanks,
Tomasz



[    1.743754] armada8k-pcie f260.pcie: host bridge 
/cp0/pcie@f260 ranges:
[    1.751116] armada8k-pcie f260.pcie:  MEM 
0x00f600..0x00f6ef -> 0x00f600

[    1.964690] armada8k-pcie f260.pcie: Link up
[    1.969379] armada8k-pcie f260.pcie: PCI host bridge to bus 
:00

[    1.976026] pci_bus :00: root bus resource [bus 00-ff]
[    1.981537] pci_bus :00: root bus resource [mem 
0xf600-0xf6ef]

[    1.988462] pci :00:00.0: [11ab:0110] type 01 class 0x060400
[    1.994504] pci :00:00.0: reg 0x10: [mem 0x-0x000f]
[    2.000843] pci :00:00.0: supports D1 D2
[    2.005132] pci :00:00.0: PME# supported from D0 D1 D3hot
[    2.011853] pci :01:00.0: [168c:003c] type 00 class 0x028000
[    2.018001] pci :01:00.0: reg 0x10: [mem 0x-0x001f 
64bit]
[    2.025002] pci :01:00.0: reg 0x30: [mem 0x-0x 
pref]

[    2.032111] pci :01:00.0: supports D1 D2
[    2.049409] pci :00:00.0: BAR 14: assigned [mem 
0xf600-0xf61f]
[    2.056322] pci :00:00.0: BAR 0: assigned [mem 
0xf620-0xf62f]
[    2.063142] pci :00:00.0: BAR 15: assigned [mem 
0xf630-0xf63f pref]
[    2.070484] pci :01:00.0: BAR 0: assigned [mem 
0xf600-0xf61f 64bit]
[    2.077880] pci :01:00.0: BAR 6: assigned [mem 
0xf630-0xf630 pref]

[    2.085135] pci :00:00.0: PCI bridge to [bus 01-ff]
[    2.090384] pci :00:00.0:   bridge window [mem 
0xf600-0xf61f]
[    2.097202] pci :00:00.0:   bridge window [mem 
0xf630-0xf63f pref]

[    2.104539] pcieport :00:00.0: Adding to iommu group 4
[    2.110232] pcieport :00:00.0: PME: Signaling with IRQ 38
[    2.116141] pcieport :00:00.0: AER: enabled with IRQ 38
[    8.131135] ath10k_pci :01:00.0: Adding to iommu group 4
[    8.131874] ath10k_pci :01:00.0: enabling device ( -> 0002)
[    8.132203] ath10k_pci :01:00.0: pci irq msi oper_irq_mode 2 
irq_mode 0 reset_mode 0


up to that point the log is the same as without SMMU enabled, except 
"Adding to iommu group N" lines, and IRQ being 37


[    8.221328] ath10k_pci 

Re: [RFC PATCH 0/2] iommu: Avoid unnecessary PRI queue flushes

2020-10-23 Thread Jean-Philippe Brucker
On Mon, Oct 19, 2020 at 02:16:08PM -0700, Raj, Ashok wrote:
> Hi Jean
> 
> On Mon, Oct 19, 2020 at 04:08:24PM +0200, Jean-Philippe Brucker wrote:
> > On Sat, Oct 17, 2020 at 04:25:25AM -0700, Raj, Ashok wrote:
> > > > For devices that *don't* use a stop marker, the PCIe spec says 
> > > > (10.4.1.2):
> > > > 
> > > >   To stop [using a PASID] without using a Stop Marker Message, the
> > > >   function shall:
> > > >   1. Stop queueing new Page Request Messages for this PASID.
> > > 
> > > The device driver would need to tell stop sending any new PR's.
> > > 
> > > >   2. Finish transmitting any multi-page Page Request Messages for this
> > > >  PASID (i.e. send the Page Request Message with the L bit Set).
> > > >   3. Wait for PRG Response Messages associated any outstanding Page
> > > >  Request Messages for the PASID.
> > > > 
> > > > So they have to flush their PR themselves. And since the device driver
> > > > completes this sequence before calling unbind(), then there shouldn't be
> > > > any oustanding PR for the PASID, and unbind() doesn't need to flush,
> > > > right?
> > > 
> > > I can see how the device can complete #2,3 above. But the device driver
> > > isn't the one managing page-responses right. So in order for the device to
> > > know the above sequence is complete, it would need to get some assist from
> > > IOMMU driver?
> > 
> > No the device driver just waits for the device to indicate that it has
> > completed the sequence. That's what the magic stop-PASID mechanism
> > described by PCIe does. In 6.20.1 "Managing PASID TLP Prefix Usage" it
> > says:
> 
> The goal is we do this when the device is in a messed-up state. So we can't
> take for granted the device is properly operating which is why we are going
> to wack the device with a flr().
> 
> The only thing that's supposed to work without a brain freeze is the
> invalidation logic. Spec requires devices to respond to invalidations even 
> when
> they are in the process of flr().
> 
> So when IOMMU does an invalidation wait with a Page-Drain, IOMMU waits till
> the response for that arrives before completing the descriptor. Due to 
> the posted semantics it will ensure any PR's issued and in the fabric are 
> flushed 
> out to memory. 
> 
> I suppose you can wait for the device to vouch for all responses, but that
> is assuming the device is still functioning properly. Given that we use it
> in two places,
> 
> * Reclaiming a PASID - only during a tear down sequence, skipping it
>   doesn't really buy us much.

Yes I was only wondering about normal PASID reclaim operations, in
unbind(). Agreed that for FLR we need to properly clean the queue, though
I do need to do more thinking about this.

Anyway, having a full priq drain in unbind() isn't harmful, just
unnecessary delay in my opinion. I'll drop these patches for now but
thanks for the discussion.

Thanks,
Jean

> * During FLR this can't be skipped anyway due to the above sequence
>   requirement. 
> 
> > 
> > "A Function must have a mechanism to request that it gracefully stop using
> >  a specific PASID. This mechanism is device specific but must satisfy the
> >  following rules:
> >  [...]
> >  * When the stop request mechanism indicates completion, the Function has:
> >[...]
> >* Complied with additional rules described in Address Translation
> >  Services (Chapter 10 [10.4.1.2 quoted above]) if Address Translations
> >  or Page Requests were issued on the behalf of this PASID."
> > 
> > So after the device driver initiates this mechanism in the device, the
> > device must be able to indicate completion of the mechanism, which
> > includes completing all in-flight Page Requests. At that point the device
> > driver can call unbind() knowing there is no pending PR for this PASID.
> > 
> 
> Cheers,
> Ashok


Re: [RFC PATCH 0/2] iommu: Avoid unnecessary PRI queue flushes

2020-10-23 Thread Jean-Philippe Brucker
On Mon, Oct 19, 2020 at 11:33:16AM -0700, Jacob Pan wrote:
> Hi Jean-Philippe,
> 
> On Mon, 19 Oct 2020 16:08:24 +0200, Jean-Philippe Brucker
>  wrote:
> 
> > On Sat, Oct 17, 2020 at 04:25:25AM -0700, Raj, Ashok wrote:
> > > > For devices that *don't* use a stop marker, the PCIe spec says
> > > > (10.4.1.2):
> > > > 
> > > >   To stop [using a PASID] without using a Stop Marker Message, the
> > > >   function shall:
> > > >   1. Stop queueing new Page Request Messages for this PASID.  
> > > 
> > > The device driver would need to tell stop sending any new PR's.
> > >   
> > > >   2. Finish transmitting any multi-page Page Request Messages for this
> > > >  PASID (i.e. send the Page Request Message with the L bit Set).
> > > >   3. Wait for PRG Response Messages associated any outstanding Page
> > > >  Request Messages for the PASID.
> > > > 
> > > > So they have to flush their PR themselves. And since the device driver
> > > > completes this sequence before calling unbind(), then there shouldn't
> > > > be any oustanding PR for the PASID, and unbind() doesn't need to
> > > > flush, right?  
> > > 
> > > I can see how the device can complete #2,3 above. But the device driver
> > > isn't the one managing page-responses right. So in order for the device
> > > to know the above sequence is complete, it would need to get some
> > > assist from IOMMU driver?  
> > 
> > No the device driver just waits for the device to indicate that it has
> > completed the sequence. That's what the magic stop-PASID mechanism
> > described by PCIe does. In 6.20.1 "Managing PASID TLP Prefix Usage" it
> > says:
> > 
> > "A Function must have a mechanism to request that it gracefully stop using
> >  a specific PASID. This mechanism is device specific but must satisfy the
> >  following rules:
> >  [...]
> >  * When the stop request mechanism indicates completion, the Function has:
> >[...]
> >* Complied with additional rules described in Address Translation
> >  Services (Chapter 10 [10.4.1.2 quoted above]) if Address Translations
> >  or Page Requests were issued on the behalf of this PASID."
> > 
> > So after the device driver initiates this mechanism in the device, the
> > device must be able to indicate completion of the mechanism, which
> > includes completing all in-flight Page Requests. At that point the device
> > driver can call unbind() knowing there is no pending PR for this PASID.
> > 
> In step #3, I think it is possible that device driver received page response
> as part of the auto page response, so it may not guarantee all the in-flight
> PRQs are completed inside IOMMU. Therefore, drain is _always_ needed to be
> sure?

So I don't think that's a problem, because non-last PR are not handled by
IOPF until the last PR is received. And when the IOMMU driver detects that
the queue has been overflowing and could have auto-responded to last PR,
we discard any pending non-last PR. I couldn't find a leak here.

Thanks,
Jean



Re: [PATCH v4 0/4] Add system mmu support for Armada-806

2020-10-23 Thread Denis Odintsov
Hi,

> Am 23.10.2020 um 14:19 schrieb Tomasz Nowicki :
> 
> Hi Denis,
> 
> Sorry for late response, we had to check few things. Please see comments 
> inline.
> 
> On 10/6/20 3:16 PM, Denis Odintsov wrote:
>> Hi,
>>> Am 15.07.2020 um 09:06 schrieb Tomasz Nowicki :
>>> 
>>> The series is meant to support SMMU for AP806 and a workaround
>>> for accessing ARM SMMU 64bit registers is the gist of it.
>>> 
>>> For the record, AP-806 can't access SMMU registers with 64bit width.
>>> This patches split the readq/writeq into two 32bit accesses instead
>>> and update DT bindings.
>>> 
>>> The series was successfully tested on a vanilla v5.8-rc3 kernel and
>>> Intel e1000e PCIe NIC. The same for platform devices like SATA and USB.
>>> 
>>> For reference, previous versions are listed below:
>>> V1: https://lkml.org/lkml/2018/10/15/373
>>> V2: https://lkml.org/lkml/2019/7/11/426
>>> V3: https://lkml.org/lkml/2020/7/2/1114
>>> 
>> 1) After enabling SMMU on Armada 8040, and 
>> ARM_SMMU_DISABLE_BYPASS_BY_DEFAUL=y by default in kernel since 
>> 954a03be033c7cef80ddc232e7cbdb17df735663,
>> internal eMMC is prevented from being initialised (as there is no iommus 
>> property for ap_sdhci0)
>> Disabling "Disable bypass by default" make it work, but the patch highly 
>> suggest doing it properly.
>> I wasn't able to find correct path for ap_sdhci for iommus in any publicly 
>> available documentation,
>> would be highly appreciated addressed properly, thank you!
>> 2) Second issue I got (btw I have ClearFog GT 8k armada-8040 based board) is 
>> mpci ath10k card.
>> It is found, it is enumerated, it is visible in lspci, but it fails to be 
>> initialised. Here is the log:
> 
> Firmware has to configure and assign device StreamIDs. Most of the devices 
> are configured properly and supported in public FW. However, for both these 
> cases (ap_sdhci0 and PCIe) some extra (u-boot/UEFI/ATF) patches are required 
> which are not available yet. Sorry we let that happen.
> 
> Since we have dependency on custom FW and we cannot enforce people to patch 
> their FW we will send the follow up fix patch (v5.9+) and revert respective 
> DTS changes.
> 
> The most important Armada-806 SMMU driver enhancements were merged so people 
> who still willing to use SMMU need to provide proper DTB and use 
> ARM_SMMU_DISABLE_BYPASS_BY_DEFAUL=n (or via kernel command line) with extra 
> cautious.
> 

Thank you very much for the reply; I'm a mere computer enthusiast who likes to
progress with the technology and keep up with the software.
There was no harm at all in those changes, and I see how they are all planned
for a good reason.
I'm looking forward to those patches and further progress, and thank you for
your work.

Denis.

> Thanks,
> Tomasz
> 
>> [1.743754] armada8k-pcie f260.pcie: host bridge /cp0/pcie@f260 
>> ranges:
>> [1.751116] armada8k-pcie f260.pcie:  MEM 
>> 0x00f600..0x00f6ef -> 0x00f600
>> [1.964690] armada8k-pcie f260.pcie: Link up
>> [1.969379] armada8k-pcie f260.pcie: PCI host bridge to bus :00
>> [1.976026] pci_bus :00: root bus resource [bus 00-ff]
>> [1.981537] pci_bus :00: root bus resource [mem 0xf600-0xf6ef]
>> [1.988462] pci :00:00.0: [11ab:0110] type 01 class 0x060400
>> [1.994504] pci :00:00.0: reg 0x10: [mem 0x-0x000f]
>> [2.000843] pci :00:00.0: supports D1 D2
>> [2.005132] pci :00:00.0: PME# supported from D0 D1 D3hot
>> [2.011853] pci :01:00.0: [168c:003c] type 00 class 0x028000
>> [2.018001] pci :01:00.0: reg 0x10: [mem 0x-0x001f 64bit]
>> [2.025002] pci :01:00.0: reg 0x30: [mem 0x-0x pref]
>> [2.032111] pci :01:00.0: supports D1 D2
>> [2.049409] pci :00:00.0: BAR 14: assigned [mem 0xf600-0xf61f]
>> [2.056322] pci :00:00.0: BAR 0: assigned [mem 0xf620-0xf62f]
>> [2.063142] pci :00:00.0: BAR 15: assigned [mem 0xf630-0xf63f 
>> pref]
>> [2.070484] pci :01:00.0: BAR 0: assigned [mem 0xf600-0xf61f 
>> 64bit]
>> [2.077880] pci :01:00.0: BAR 6: assigned [mem 0xf630-0xf630 
>> pref]
>> [2.085135] pci :00:00.0: PCI bridge to [bus 01-ff]
>> [2.090384] pci :00:00.0:   bridge window [mem 0xf600-0xf61f]
>> [2.097202] pci :00:00.0:   bridge window [mem 0xf630-0xf63f 
>> pref]
>> [2.104539] pcieport :00:00.0: Adding to iommu group 4
>> [2.110232] pcieport :00:00.0: PME: Signaling with IRQ 38
>> [2.116141] pcieport :00:00.0: AER: enabled with IRQ 38
>> [8.131135] ath10k_pci :01:00.0: Adding to iommu group 4
>> [8.131874] ath10k_pci :01:00.0: enabling device ( -> 0002)
>> [8.132203] ath10k_pci :01:00.0: pci irq msi oper_irq_mode 2 irq_mode 
>> 0 reset_mode 0
>> up to that point the log is the same as without SMMU enabled, except "Adding 
>> to iommu group N" lines, and IRQ 

Re: [PATCH v4 0/4] Add system mmu support for Armada-806

2020-10-23 Thread Robin Murphy

On 2020-10-23 13:19, Tomasz Nowicki wrote:

Hi Denis,

Sorry for late response, we had to check few things. Please see comments 
inline.


On 10/6/20 3:16 PM, Denis Odintsov wrote:

Hi,


Am 15.07.2020 um 09:06 schrieb Tomasz Nowicki :

The series is meant to support SMMU for AP806 and a workaround
for accessing ARM SMMU 64bit registers is the gist of it.

For the record, AP-806 can't access SMMU registers with 64bit width.
This patches split the readq/writeq into two 32bit accesses instead
and update DT bindings.

The series was successfully tested on a vanilla v5.8-rc3 kernel and
Intel e1000e PCIe NIC. The same for platform devices like SATA and USB.

For reference, previous versions are listed below:
V1: https://lkml.org/lkml/2018/10/15/373
V2: https://lkml.org/lkml/2019/7/11/426
V3: https://lkml.org/lkml/2020/7/2/1114



1) After enabling SMMU on Armada 8040, and 
ARM_SMMU_DISABLE_BYPASS_BY_DEFAUL=y by default in kernel since 
954a03be033c7cef80ddc232e7cbdb17df735663,
internal eMMC is prevented from being initialised (as there is no 
iommus property for ap_sdhci0)
Disabling "Disable bypass by default" make it work, but the patch 
highly suggest doing it properly.
I wasn't able to find correct path for ap_sdhci for iommus in any 
publicly available documentation,

would be highly appreciated addressed properly, thank you!

2) Second issue I got (btw I have ClearFog GT 8k armada-8040 based 
board) is mpci ath10k card.
It is found, it is enumerated, it is visible in lspci, but it fails to 
be initialised. Here is the log:


Firmware has to configure and assign device StreamIDs. Most of the 
devices are configured properly and supported in public FW. However, for 
both these cases (ap_sdhci0 and PCIe) some extra (u-boot/UEFI/ATF) 
patches are required which are not available yet. Sorry we let that happen.


Since we have dependency on custom FW and we cannot enforce people to 
patch their FW we will send the follow up fix patch (v5.9+) and revert 
respective DTS changes.


Note that it should be sufficient to simply keep the SMMU node disabled, 
rather than fully revert everything. For example, the PCIe SMMU for Arm 
Juno boards has been in that state for a long time - there are reasons 
why it isn't (yet) 100% usable for everyone, but it can easily be 
enabled locally for development (as I do).


Robin.

The most important Armada-806 SMMU driver enhancements were merged so 
people who still willing to use SMMU need to provide proper DTB and use 
ARM_SMMU_DISABLE_BYPASS_BY_DEFAUL=n (or via kernel command line) with 
extra cautious.


Thanks,
Tomasz



[    1.743754] armada8k-pcie f260.pcie: host bridge 
/cp0/pcie@f260 ranges:
[    1.751116] armada8k-pcie f260.pcie:  MEM 
0x00f600..0x00f6ef -> 0x00f600

[    1.964690] armada8k-pcie f260.pcie: Link up
[    1.969379] armada8k-pcie f260.pcie: PCI host bridge to bus 
:00

[    1.976026] pci_bus :00: root bus resource [bus 00-ff]
[    1.981537] pci_bus :00: root bus resource [mem 
0xf600-0xf6ef]

[    1.988462] pci :00:00.0: [11ab:0110] type 01 class 0x060400
[    1.994504] pci :00:00.0: reg 0x10: [mem 0x-0x000f]
[    2.000843] pci :00:00.0: supports D1 D2
[    2.005132] pci :00:00.0: PME# supported from D0 D1 D3hot
[    2.011853] pci :01:00.0: [168c:003c] type 00 class 0x028000
[    2.018001] pci :01:00.0: reg 0x10: [mem 0x-0x001f 
64bit]
[    2.025002] pci :01:00.0: reg 0x30: [mem 0x-0x 
pref]

[    2.032111] pci :01:00.0: supports D1 D2
[    2.049409] pci :00:00.0: BAR 14: assigned [mem 
0xf600-0xf61f]
[    2.056322] pci :00:00.0: BAR 0: assigned [mem 
0xf620-0xf62f]
[    2.063142] pci :00:00.0: BAR 15: assigned [mem 
0xf630-0xf63f pref]
[    2.070484] pci :01:00.0: BAR 0: assigned [mem 
0xf600-0xf61f 64bit]
[    2.077880] pci :01:00.0: BAR 6: assigned [mem 
0xf630-0xf630 pref]

[    2.085135] pci :00:00.0: PCI bridge to [bus 01-ff]
[    2.090384] pci :00:00.0:   bridge window [mem 
0xf600-0xf61f]
[    2.097202] pci :00:00.0:   bridge window [mem 
0xf630-0xf63f pref]

[    2.104539] pcieport :00:00.0: Adding to iommu group 4
[    2.110232] pcieport :00:00.0: PME: Signaling with IRQ 38
[    2.116141] pcieport :00:00.0: AER: enabled with IRQ 38
[    8.131135] ath10k_pci :01:00.0: Adding to iommu group 4
[    8.131874] ath10k_pci :01:00.0: enabling device ( -> 0002)
[    8.132203] ath10k_pci :01:00.0: pci irq msi oper_irq_mode 2 
irq_mode 0 reset_mode 0


up to that point the log is the same as without SMMU enabled, except 
"Adding to iommu group N" lines, and IRQ being 37


[    8.221328] ath10k_pci :01:00.0: failed to poke copy engine: -16
[    8.313362] ath10k_pci :01:00.0: failed to poke copy engine: -16
[    8.409373] ath10k_pci :01:00.0: failed to poke copy engine: -16
[    8.553433] ath10k_pci 

Re: [PATCH v4 0/4] Add system mmu support for Armada-806

2020-10-23 Thread Tomasz Nowicki

Hi Denis,

Sorry for the late response, we had to check a few things. Please see comments 
inline.


On 10/6/20 3:16 PM, Denis Odintsov wrote:

Hi,


Am 15.07.2020 um 09:06 schrieb Tomasz Nowicki :

The series is meant to support SMMU for AP806 and a workaround
for accessing ARM SMMU 64bit registers is the gist of it.

For the record, AP-806 can't access SMMU registers with 64bit width.
This patches split the readq/writeq into two 32bit accesses instead
and update DT bindings.

The series was successfully tested on a vanilla v5.8-rc3 kernel and
Intel e1000e PCIe NIC. The same for platform devices like SATA and USB.

For reference, previous versions are listed below:
V1: https://lkml.org/lkml/2018/10/15/373
V2: https://lkml.org/lkml/2019/7/11/426
V3: https://lkml.org/lkml/2020/7/2/1114



1) After enabling SMMU on Armada 8040, and ARM_SMMU_DISABLE_BYPASS_BY_DEFAUL=y 
by default in kernel since 954a03be033c7cef80ddc232e7cbdb17df735663,
internal eMMC is prevented from being initialised (as there is no iommus 
property for ap_sdhci0)
Disabling "Disable bypass by default" make it work, but the patch highly 
suggest doing it properly.
I wasn't able to find correct path for ap_sdhci for iommus in any publicly 
available documentation,
would be highly appreciated addressed properly, thank you!

2) Second issue I got (btw I have ClearFog GT 8k armada-8040 based board) is 
mpci ath10k card.
It is found, it is enumerated, it is visible in lspci, but it fails to be 
initialised. Here is the log:


Firmware has to configure and assign device StreamIDs. Most of the 
devices are configured properly and supported in public FW. However, for 
both these cases (ap_sdhci0 and PCIe) some extra (u-boot/UEFI/ATF) 
patches are required which are not available yet. Sorry we let that happen.


Since we have dependency on custom FW and we cannot enforce people to 
patch their FW we will send the follow up fix patch (v5.9+) and revert 
respective DTS changes.


The most important Armada-806 SMMU driver enhancements were merged, so people 
who are still willing to use the SMMU need to provide a proper DTB and use 
ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT=n (or the equivalent option on the kernel 
command line) with extra caution.
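
For example (assuming the arm-smmu driver's existing "disable_bypass" module
parameter; shown for illustration only), bypass for unconfigured streams can be
allowed at boot without rebuilding the kernel:

    arm-smmu.disable_bypass=0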


Thanks,
Tomasz



[1.743754] armada8k-pcie f260.pcie: host bridge /cp0/pcie@f260 
ranges:
[1.751116] armada8k-pcie f260.pcie:  MEM 0x00f600..0x00f6ef 
-> 0x00f600
[1.964690] armada8k-pcie f260.pcie: Link up
[1.969379] armada8k-pcie f260.pcie: PCI host bridge to bus :00
[1.976026] pci_bus :00: root bus resource [bus 00-ff]
[1.981537] pci_bus :00: root bus resource [mem 0xf600-0xf6ef]
[1.988462] pci :00:00.0: [11ab:0110] type 01 class 0x060400
[1.994504] pci :00:00.0: reg 0x10: [mem 0x-0x000f]
[2.000843] pci :00:00.0: supports D1 D2
[2.005132] pci :00:00.0: PME# supported from D0 D1 D3hot
[2.011853] pci :01:00.0: [168c:003c] type 00 class 0x028000
[2.018001] pci :01:00.0: reg 0x10: [mem 0x-0x001f 64bit]
[2.025002] pci :01:00.0: reg 0x30: [mem 0x-0x pref]
[2.032111] pci :01:00.0: supports D1 D2
[2.049409] pci :00:00.0: BAR 14: assigned [mem 0xf600-0xf61f]
[2.056322] pci :00:00.0: BAR 0: assigned [mem 0xf620-0xf62f]
[2.063142] pci :00:00.0: BAR 15: assigned [mem 0xf630-0xf63f 
pref]
[2.070484] pci :01:00.0: BAR 0: assigned [mem 0xf600-0xf61f 
64bit]
[2.077880] pci :01:00.0: BAR 6: assigned [mem 0xf630-0xf630 
pref]
[2.085135] pci :00:00.0: PCI bridge to [bus 01-ff]
[2.090384] pci :00:00.0:   bridge window [mem 0xf600-0xf61f]
[2.097202] pci :00:00.0:   bridge window [mem 0xf630-0xf63f 
pref]
[2.104539] pcieport :00:00.0: Adding to iommu group 4
[2.110232] pcieport :00:00.0: PME: Signaling with IRQ 38
[2.116141] pcieport :00:00.0: AER: enabled with IRQ 38
[8.131135] ath10k_pci :01:00.0: Adding to iommu group 4
[8.131874] ath10k_pci :01:00.0: enabling device ( -> 0002)
[8.132203] ath10k_pci :01:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 
reset_mode 0

up to that point the log is the same as without SMMU enabled, except "Adding to 
iommu group N" lines, and IRQ being 37

[8.221328] ath10k_pci :01:00.0: failed to poke copy engine: -16
[8.313362] ath10k_pci :01:00.0: failed to poke copy engine: -16
[8.409373] ath10k_pci :01:00.0: failed to poke copy engine: -16
[8.553433] ath10k_pci :01:00.0: failed to poke copy engine: -16
[8.641370] ath10k_pci :01:00.0: failed to poke copy engine: -16
[8.737979] ath10k_pci :01:00.0: failed to poke copy engine: -16
[8.807356] ath10k_pci :01:00.0: Failed to get pcie state addr: -16
[8.814032] ath10k_pci :01:00.0: failed to setup init config: -16
[8.820605] ath10k_pci :01:00.0: could not power on hif bus 

Re: [PATCH v3 11/14] iommu/ioasid: Support mm type ioasid_set notifications

2020-10-23 Thread Jean-Philippe Brucker
On Mon, Sep 28, 2020 at 02:38:38PM -0700, Jacob Pan wrote:
> As a system-wide resource, IOASID is often shared by multiple kernel
> subsystems that are independent of each other. However, at the
> ioasid_set level, these kernel subsystems must communicate with each
> other for ownership checking, event notifications, etc. For example, on
> Intel Scalable IO Virtualization (SIOV) enabled platforms, KVM and VFIO
> instances under the same process/guest must be aware of a shared IOASID
> set.
> IOASID_SET_TYPE_MM token type was introduced to explicitly mark an
> IOASID set that belongs to a process, thus use the same mm_struct
> pointer as a token. Users of the same process can then identify with
> each other based on this token.
> 
> This patch introduces MM token specific event registration APIs. Event
> subscribers such as KVM instances can register IOASID event handler
> without the knowledge of its ioasid_set. Event handlers are registered
> based on its mm_struct pointer as a token. In case when subscribers
> register handler *prior* to the creation of the ioasid_set, the
> handler’s notification block is stored in a pending list within IOASID
> core. Once the ioasid_set of the MM token is created, the notification
> block will be registered by the IOASID core.
> 
> Signed-off-by: Liu Yi L 
> Signed-off-by: Wu Hao 
> Signed-off-by: Jacob Pan 
> ---
>  drivers/iommu/ioasid.c | 117 
> +
>  include/linux/ioasid.h |  15 +++
>  2 files changed, 132 insertions(+)
> 
> diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> index 894b17c06ead..d5faeb559a43 100644
> --- a/drivers/iommu/ioasid.c
> +++ b/drivers/iommu/ioasid.c
> @@ -889,6 +889,29 @@ void ioasid_set_put(struct ioasid_set *set)
>  }
>  EXPORT_SYMBOL_GPL(ioasid_set_put);
>  
> +/*
> + * ioasid_find_mm_set - Retrieve IOASID set with mm token
> + * Take a reference of the set if found.
> + */
> +static struct ioasid_set *ioasid_find_mm_set(struct mm_struct *token)
> +{
> + struct ioasid_set *set;
> + unsigned long index;
> +
> + spin_lock(_allocator_lock);

This could deadlock since ioasid_set_put() takes the nb_lock while holding
the allocator_lock.
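
To spell out the inversion (a hypothetical interleaving based on the code in this
patch):

  CPU0: ioasid_register_notifier_mm()     CPU1: ioasid_set_put()
    spin_lock(&ioasid_nb_lock);             spin_lock(&ioasid_allocator_lock);
    ioasid_find_mm_set()                    ...
      spin_lock(&ioasid_allocator_lock);      spin_lock(&ioasid_nb_lock);

Each CPU then waits for the lock the other one holds, so both paths need to take
the two locks in the same order.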

> +
> + xa_for_each(_sets, index, set) {
> + if (set->type == IOASID_SET_TYPE_MM && set->token == token) {
> + refcount_inc(>ref);
> + goto exit_unlock;
> + }
> + }
> + set = NULL;
> +exit_unlock:
> + spin_unlock(_allocator_lock);
> + return set;
> +}
> +
>  /**
>   * ioasid_adjust_set - Adjust the quota of an IOASID set
>   * @set: IOASID set to be assigned
> @@ -1121,6 +1144,100 @@ void ioasid_unregister_notifier(struct ioasid_set 
> *set,
>  }
>  EXPORT_SYMBOL_GPL(ioasid_unregister_notifier);
>  
> +/**
> + * ioasid_register_notifier_mm - Register a notifier block on the IOASID set
> + *   created by the mm_struct pointer as the 
> token
> + *
> + * @mm: the mm_struct token of the ioasid_set
> + * @nb: notfier block to be registered on the ioasid_set

Maybe add that the priority in @nb needs to be set by the caller using a
ioasid_notifier_priority value

> + *
> + * This a variant of ioasid_register_notifier() where the caller intends to
> + * listen to IOASID events belong the ioasid_set created under the same

 that belong to

> + * process. Caller is not aware of the ioasid_set, no need to hold reference
> + * of the ioasid_set.
> + */
> +int ioasid_register_notifier_mm(struct mm_struct *mm, struct notifier_block 
> *nb)
> +{
> + struct ioasid_set_nb *curr;
> + struct ioasid_set *set;
> + int ret = 0;
> +
> + if (!mm)
> + return -EINVAL;
> +
> + spin_lock(_nb_lock);
> +
> + /* Check for duplicates, nb is unique per set */
> + list_for_each_entry(curr, _nb_pending_list, list) {
> + if (curr->token == mm && curr->nb == nb) {
> + ret = -EBUSY;
> + goto exit_unlock;
> + }
> + }
> +
> + /* Check if the token has an existing set */
> + set = ioasid_find_mm_set(mm);
> + if (!set) {
> + /* Add to the rsvd list as inactive */
> + curr->active = false;

curr is invalid here

> + } else {
> + /* REVISIT: Only register empty set for now. Can add an option
> +  * in the future to playback existing PASIDs.
> +  */

You can drop this comment

> + if (set->nr_ioasids) {
> + pr_warn("IOASID set %d not empty\n", set->id);
> + ret = -EBUSY;
> + goto exit_unlock;
> + }
> + curr = kzalloc(sizeof(*curr), GFP_ATOMIC);

Needs to be in the common path, before this block
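
A sketch of that reordering (illustrative only, reusing the names from the patch;
any later error path that bails out would then also need a kfree(curr)):

	curr = kzalloc(sizeof(*curr), GFP_ATOMIC);
	if (!curr) {
		ret = -ENOMEM;
		goto exit_unlock;
	}
	curr->token = mm;
	curr->nb = nb;

	set = ioasid_find_mm_set(mm);
	/* stays inactive until the mm's ioasid_set actually exists */
	curr->active = !!set;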

> + if (!curr) {
> + ret = -ENOMEM;
> + goto exit_unlock;
> + }
> + curr->token = mm;
> +

Re: [PATCH v3 10/14] iommu/ioasid: Introduce notification APIs

2020-10-23 Thread Jean-Philippe Brucker
On Mon, Sep 28, 2020 at 02:38:37PM -0700, Jacob Pan wrote:
> Relations among IOASID users largely follow a publisher-subscriber
> pattern. E.g. to support guest SVA on Intel Scalable I/O Virtualization
> (SIOV) enabled platforms, VFIO, IOMMU, device drivers, KVM are all users
> of IOASIDs. When a state change occurs, VFIO publishes the change event
> that needs to be processed by other users/subscribers.
> 
> This patch introduced two types of notifications: global and per
> ioasid_set. The latter is intended for users who only needs to handle
> events related to the IOASID of a given set.
> For more information, refer to the kernel documentation at
> Documentation/ioasid.rst.
> 
> Signed-off-by: Liu Yi L 
> Signed-off-by: Wu Hao 
> Signed-off-by: Jacob Pan 
> ---
>  drivers/iommu/ioasid.c | 141 
> +
>  include/linux/ioasid.h |  57 +++-
>  2 files changed, 197 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> index 378fef8f23d9..894b17c06ead 100644
> --- a/drivers/iommu/ioasid.c
> +++ b/drivers/iommu/ioasid.c
> @@ -10,12 +10,35 @@
>  #include 
>  #include 
>  
> +/*
> + * An IOASID can have multiple consumers where each consumer may have
> + * hardware contexts associated with the IOASID.
> + * When a status change occurs, like on IOASID deallocation, notifier chains
> + * are used to keep the consumers in sync.
> + * This is a publisher-subscriber pattern where publisher can change the
> + * state of each IOASID, e.g. alloc/free, bind IOASID to a device and mm.
> + * On the other hand, subscribers get notified for the state change and
> + * keep local states in sync.
> + */
> +static ATOMIC_NOTIFIER_HEAD(ioasid_notifier);
> +/* List to hold pending notification block registrations */
> +static LIST_HEAD(ioasid_nb_pending_list);

nit: it might be tidier to deal with the pending list in the next patch,
since it's only relevant for mm set notifier

> +static DEFINE_SPINLOCK(ioasid_nb_lock);
> +
>  /* Default to PCIe standard 20 bit PASID */
>  #define PCI_PASID_MAX 0x10
>  static ioasid_t ioasid_capacity = PCI_PASID_MAX;
>  static ioasid_t ioasid_capacity_avail = PCI_PASID_MAX;
>  static DEFINE_XARRAY_ALLOC(ioasid_sets);
>  
> +struct ioasid_set_nb {
> + struct list_headlist;
> + struct notifier_block   *nb;
> + void*token;
> + struct ioasid_set   *set;
> + boolactive;
> +};
> +
>  enum ioasid_state {
>   IOASID_STATE_INACTIVE,
>   IOASID_STATE_ACTIVE,
> @@ -365,6 +388,42 @@ void ioasid_detach_data(ioasid_t ioasid)
>  }
>  EXPORT_SYMBOL_GPL(ioasid_detach_data);
>  
> +/**
> + * ioasid_notify - Send notification on a given IOASID for status change.
> + *
> + * @data:   The IOASID data for which the notification will be sent
> + * @cmd: Notification event sent by IOASID external users, can be
> + *   IOASID_BIND or IOASID_UNBIND.
> + *
> + * @flags:   Special instructions, e.g. notify within a set or global by
> + *   IOASID_NOTIFY_FLAG_SET or IOASID_NOTIFY_FLAG_ALL flags
> + * Caller must hold ioasid_allocator_lock and reference to the IOASID
> + */
> +static int ioasid_notify(struct ioasid_data *data,
> +  enum ioasid_notify_val cmd, unsigned int flags)
> +{
> + struct ioasid_nb_args args = { 0 };
> + int ret = 0;
> +
> + /* IOASID_FREE/ALLOC are internal events emitted by IOASID core only */
> + if (cmd <= IOASID_NOTIFY_FREE)
> + return -EINVAL;

This function is now called with ALLOC and FREE

> +
> + if (flags & ~(IOASID_NOTIFY_FLAG_ALL | IOASID_NOTIFY_FLAG_SET))
> + return -EINVAL;
> +
> + args.id = data->id;
> + args.set = data->set;
> + args.pdata = data->private;
> + args.spid = data->spid;
> + if (flags & IOASID_NOTIFY_FLAG_ALL)
> + ret = atomic_notifier_call_chain(&ioasid_notifier, cmd, &args);
> + if (flags & IOASID_NOTIFY_FLAG_SET)
> + ret = atomic_notifier_call_chain(&data->set->nh, cmd, &args);

If both flags are set, we'll miss errors within the global notification.
Is that a problem?
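
If it is, one option (untested, just to illustrate) would be to bail out
when the global chain reports a failure:

	if (flags & IOASID_NOTIFY_FLAG_ALL) {
		ret = atomic_notifier_call_chain(&ioasid_notifier, cmd, &args);
		if (ret != NOTIFY_OK && ret != NOTIFY_DONE)
			return ret;
	}
	if (flags & IOASID_NOTIFY_FLAG_SET)
		ret = atomic_notifier_call_chain(&data->set->nh, cmd, &args);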

> +
> + return ret;
> +}
> +
>  static ioasid_t ioasid_find_by_spid_locked(struct ioasid_set *set, ioasid_t 
> spid)
>  {
>   ioasid_t ioasid = INVALID_IOASID;
> @@ -417,6 +476,7 @@ int ioasid_attach_spid(ioasid_t ioasid, ioasid_t spid)
>   goto done_unlock;
>   }
>   data->spid = spid;
> + ioasid_notify(data, IOASID_NOTIFY_BIND, IOASID_NOTIFY_FLAG_SET);
>  
>  done_unlock:
>   spin_unlock(&ioasid_allocator_lock);
> @@ -436,6 +496,7 @@ void ioasid_detach_spid(ioasid_t ioasid)
>   goto done_unlock;
>   }
>   data->spid = INVALID_IOASID;
> + ioasid_notify(data, IOASID_NOTIFY_UNBIND, IOASID_NOTIFY_FLAG_SET);
>  
>  done_unlock:
>   spin_unlock(&ioasid_allocator_lock);
> @@ -469,6 +530,28 @@ static inline bool ioasid_set_is_valid(struct ioasid_set 
> *set)
>   return 

Re: [PATCH v3 09/14] iommu/ioasid: Introduce ioasid_set private ID

2020-10-23 Thread Jean-Philippe Brucker
On Mon, Sep 28, 2020 at 02:38:36PM -0700, Jacob Pan wrote:
> When an IOASID set is used for guest SVA, each VM will acquire its
> ioasid_set for IOASID allocations. IOASIDs within the VM must have a
> host/physical IOASID backing; the mapping between guest and host IOASIDs
> can be non-identical. IOASID set private ID (SPID) is introduced in this
> patch to be used as guest IOASID. However, the concept of ioasid_set
> specific namespace is generic, thus named SPID.
> 
> As the SPID namespace is within the IOASID set, the IOASID core can
> provide lookup services in both directions. An SPID may not be available
> when its IOASID is allocated; the mapping between SPID and IOASID is
> usually established when a guest page table is bound to a host PASID.
> 
> Signed-off-by: Jacob Pan 
> ---
>  drivers/iommu/ioasid.c | 102 
> +
>  include/linux/ioasid.h |  19 +
>  2 files changed, 121 insertions(+)
> 
> diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> index 828cc44b1b1c..378fef8f23d9 100644
> --- a/drivers/iommu/ioasid.c
> +++ b/drivers/iommu/ioasid.c
> @@ -26,6 +26,7 @@ enum ioasid_state {
>   * struct ioasid_data - Meta data about ioasid
>   *
>   * @id:  Unique ID
> + * @spid:Private ID unique within a set
>   * @users:   Number of active users
>   * @state:   Track state of the IOASID
>   * @set: ioasid_set of the IOASID belongs to
> @@ -34,6 +35,7 @@ enum ioasid_state {
>   */
>  struct ioasid_data {
>   ioasid_t id;
> + ioasid_t spid;
>   refcount_t users;
>   enum ioasid_state state;
>   struct ioasid_set *set;
> @@ -363,6 +365,105 @@ void ioasid_detach_data(ioasid_t ioasid)
>  }
>  EXPORT_SYMBOL_GPL(ioasid_detach_data);
>  
> +static ioasid_t ioasid_find_by_spid_locked(struct ioasid_set *set, ioasid_t 
> spid)
> +{
> + ioasid_t ioasid = INVALID_IOASID;
> + struct ioasid_data *entry;
> + unsigned long index;
> +
> + if (!xa_load(&ioasid_sets, set->id)) {
> + pr_warn("Invalid set\n");

Could use ioasid_set_is_valid(), and perhaps a WARN_ON() instead of
pr_warn() since this is a programming error.

> + goto done;

Or just return INVALID_IOASID
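
i.e. something like:

	if (WARN_ON(!ioasid_set_is_valid(set)))
		return INVALID_IOASID;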

> + }
> +
> + xa_for_each(&set->xa, index, entry) {
> + if (spid == entry->spid) {
> + refcount_inc(&entry->users);
> + ioasid = index;

break

> + }
> + }
> +done:
> + return ioasid;
> +}
> +
> +/**
> + * ioasid_attach_spid - Attach ioasid_set private ID to an IOASID
> + *
> + * @ioasid: the system-wide IOASID to attach
> + * @spid:   the ioasid_set private ID of @ioasid
> + *
> + * After attching SPID, future lookup can be done via ioasid_find_by_spid().

attaching

> + */
> +int ioasid_attach_spid(ioasid_t ioasid, ioasid_t spid)
> +{
> + struct ioasid_data *data;
> + int ret = 0;
> +
> + if (spid == INVALID_IOASID)
> + return -EINVAL;
> +
> + spin_lock(&ioasid_allocator_lock);
> + data = xa_load(&active_allocator->xa, ioasid);
> +
> + if (!data) {
> + pr_err("No IOASID entry %d to attach SPID %d\n",
> + ioasid, spid);
> + ret = -ENOENT;
> + goto done_unlock;
> + }
> + /* Check if SPID is unique within the set */
> + if (ioasid_find_by_spid_locked(data->set, spid) != INVALID_IOASID) {

We need an additional parameter to ioasid_find_by_spid_locked(), telling
it not to take a reference to the conflicting entry. Here we return with
the reference, which will never be released.
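
Roughly (sketch only, with a hypothetical "get" flag added to the
function's parameters):

	xa_for_each(&set->xa, index, entry) {
		if (spid == entry->spid) {
			if (get)
				refcount_inc(&entry->users);
			ioasid = index;
			break;
		}
	}

and the duplicate check here would then pass get = false.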

> + ret = -EINVAL;
> + goto done_unlock;
> + }
> + data->spid = spid;
> +
> +done_unlock:
> + spin_unlock(&ioasid_allocator_lock);
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(ioasid_attach_spid);
> +
> +void ioasid_detach_spid(ioasid_t ioasid)

Could add a small comment to this public function
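
Something along these lines, perhaps:

/**
 * ioasid_detach_spid - Remove the ioasid_set private ID from an IOASID
 *
 * @ioasid: the system-wide IOASID whose SPID is cleared
 */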

Thanks,
Jean

> +{
> + struct ioasid_data *data;
> +
> + spin_lock(&ioasid_allocator_lock);
> + data = xa_load(&active_allocator->xa, ioasid);
> +
> + if (!data || data->spid == INVALID_IOASID) {
> + pr_err("Invalid IOASID entry %d to detach\n", ioasid);
> + goto done_unlock;
> + }
> + data->spid = INVALID_IOASID;
> +
> +done_unlock:
> + spin_unlock(&ioasid_allocator_lock);
> +}
> +EXPORT_SYMBOL_GPL(ioasid_detach_spid);
> +
> +/**
> + * ioasid_find_by_spid - Find the system-wide IOASID by a set private ID and
> + * its set.
> + *
> + * @set: the ioasid_set to search within
> + * @spid:the set private ID
> + *
> + * Given a set private ID and its IOASID set, find the system-wide IOASID. 
> Take
> + * a reference upon finding the matching IOASID. Return INVALID_IOASID if the
> + * IOASID is not found in the set or the set is not valid.
> + */
> +ioasid_t ioasid_find_by_spid(struct ioasid_set *set, ioasid_t spid)
> +{
> + ioasid_t ioasid;
> +
> + spin_lock(&ioasid_allocator_lock);
> + ioasid = ioasid_find_by_spid_locked(set, spid);
> + 

Re: [PATCH v3 10/24] iommu/io-pgtable-arm-v7s: Add cfg as a param in some macros

2020-10-23 Thread Will Deacon
On Wed, Sep 30, 2020 at 03:06:33PM +0800, Yong Wu wrote:
> Add "cfg" as a parameter for some macros. This is a preparing patch for
> mediatek extend the lvl1 pgtable. No functional change.
> 
> Signed-off-by: Yong Wu 
> ---
>  drivers/iommu/io-pgtable-arm-v7s.c | 34 +++---
>  1 file changed, 17 insertions(+), 17 deletions(-)

Acked-by: Will Deacon 

(but see my later comments about doing this for some of the 'constants' too)

Will


Re: [PATCH v3 09/24] iommu/io-pgtable-arm-v7s: Extend PA34 for MediaTek

2020-10-23 Thread Will Deacon
On Wed, Sep 30, 2020 at 03:06:32PM +0800, Yong Wu wrote:
> MediaTek extends bit5 in the lvl1 and lvl2 descriptors as PA34.
> 
> Signed-off-by: Yong Wu 
> ---
>  drivers/iommu/io-pgtable-arm-v7s.c | 9 +++--
>  drivers/iommu/mtk_iommu.c  | 2 +-
>  include/linux/io-pgtable.h | 4 ++--
>  3 files changed, 10 insertions(+), 5 deletions(-)

Acked-by: Will Deacon 

Will


Re: [PATCH v3 11/24] iommu/io-pgtable-arm-v7s: Quad lvl1 pgtable for MediaTek

2020-10-23 Thread Will Deacon
On Wed, Sep 30, 2020 at 03:06:34PM +0800, Yong Wu wrote:
> The standard input iova is 32 bits. MediaTek quadruples the lvl1 pagetable
> (4 * lvl1). No change for the lvl2 pagetable. Then the iova can reach
> 34 bits.
> 
> Signed-off-by: Yong Wu 
> ---
>  drivers/iommu/io-pgtable-arm-v7s.c | 13 ++---
>  drivers/iommu/mtk_iommu.c  |  2 +-
>  2 files changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/iommu/io-pgtable-arm-v7s.c 
> b/drivers/iommu/io-pgtable-arm-v7s.c
> index 8362fdf76657..306bae2755ed 100644
> --- a/drivers/iommu/io-pgtable-arm-v7s.c
> +++ b/drivers/iommu/io-pgtable-arm-v7s.c
> @@ -50,10 +50,17 @@
>   */
>  #define ARM_V7S_ADDR_BITS32

If we rename this to _ARM_V7S_ADDR_BITS can we then have ARM_V7S_ADDR_BITS
take a cfg parameter and move the arm_v7s_is_mtk_enabled() check in there?
Same for _ARM_V7S_LVL_BITS.

That would avoid scattering arm_v7s_is_mtk_enabled() into all the users.
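
e.g. roughly (untested, assuming the 34-bit MTK case from this series):

#define _ARM_V7S_ADDR_BITS	32
#define ARM_V7S_ADDR_BITS(cfg)	(arm_v7s_is_mtk_enabled(cfg) ? 34 : \
				 _ARM_V7S_ADDR_BITS)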

Will


Re: [PATCH v3 08/24] iommu/io-pgtable-arm-v7s: Use ias to check the valid iova in unmap

2020-10-23 Thread Will Deacon
On Wed, Sep 30, 2020 at 03:06:31PM +0800, Yong Wu wrote:
> Use the ias for the valid iova check in arm_v7s_unmap. This is a
> preparatory patch for supporting 34-bit iova for MediaTek.
> BTW, change the ias/oas checking format in arm_v7s_map.
> 
> Signed-off-by: Yong Wu 
> ---
>  drivers/iommu/io-pgtable-arm-v7s.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/io-pgtable-arm-v7s.c 
> b/drivers/iommu/io-pgtable-arm-v7s.c
> index a688f22cbe3b..4c9d8dccfc5a 100644
> --- a/drivers/iommu/io-pgtable-arm-v7s.c
> +++ b/drivers/iommu/io-pgtable-arm-v7s.c
> @@ -526,8 +526,7 @@ static int arm_v7s_map(struct io_pgtable_ops *ops, 
> unsigned long iova,
>   if (!(prot & (IOMMU_READ | IOMMU_WRITE)))
>   return 0;
>  
> - if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias) ||
> - paddr >= (1ULL << data->iop.cfg.oas)))
> + if (WARN_ON(iova >> data->iop.cfg.ias || paddr >> data->iop.cfg.oas))
>   return -ERANGE;

As discussed when reviewing these for Android, please leave this code as-is.

>  
>   ret = __arm_v7s_map(data, iova, paddr, size, prot, 1, data->pgd, gfp);
> @@ -717,7 +716,7 @@ static size_t arm_v7s_unmap(struct io_pgtable_ops *ops, 
> unsigned long iova,
>  {
>   struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(ops);
>  
> - if (WARN_ON(upper_32_bits(iova)))
> + if (WARN_ON(iova >> data->iop.cfg.ias))
>   return 0;

And avoid the UB here for 32-bit machines by comparing with 1ULL <<
iop.cfg.ias instead.
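
i.e.:

	if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias)))
		return 0;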

Thanks,

Will


Re: [PATCH] dma-mapping: document dma_{alloc,free}_pages

2020-10-23 Thread Robin Murphy

On 2020-10-23 07:45, Christoph Hellwig wrote:

Document the new dma_alloc_pages and dma_free_pages APIs, and fix
up the documentation for dma_alloc_noncoherent and dma_free_noncoherent.

Reported-by: Robin Murphy 
Signed-off-by: Christoph Hellwig 
---
  Documentation/core-api/dma-api.rst | 49 ++
  1 file changed, 43 insertions(+), 6 deletions(-)

diff --git a/Documentation/core-api/dma-api.rst 
b/Documentation/core-api/dma-api.rst
index ea0413276ddb70..209d7978cdaa1b 100644
--- a/Documentation/core-api/dma-api.rst
+++ b/Documentation/core-api/dma-api.rst
@@ -519,10 +519,9 @@ routines, e.g.:::
  Part II - Non-coherent DMA allocations
  --
  
-These APIs allow to allocate pages in the kernel direct mapping that are

-guaranteed to be DMA addressable.  This means that unlike dma_alloc_coherent,
-virt_to_page can be called on the resulting address, and the resulting
-struct page can be used for everything a struct page is suitable for.
+These APIs allow to allocate pages that are guranteed to be DMA addressable


Typo: guaranteed

Otherwise,

Reviewed-by: Robin Murphy 


+by the passed in device, but which need explicit management of memory
+ownership for the kernel vs the device.
  
  If you don't understand how cache line coherency works between a processor and

  an I/O device, you should not be using this part of the API.
@@ -537,7 +536,7 @@ an I/O device, you should not be using this part of the API.
This routine allocates a region of <size> bytes of consistent memory.  It
  returns a pointer to the allocated region (in the processor's virtual address
  space) or NULL if the allocation failed.  The returned memory may or may not
-be in the kernels direct mapping.  Drivers must not call virt_to_page on
+be in the kernel direct mapping.  Drivers must not call virt_to_page on
  the returned memory region.
  
It also returns a <dma_handle> which may be cast to an unsigned integer the

@@ -565,7 +564,45 @@ reused.
  Free a region of memory previously allocated using dma_alloc_noncoherent().
  dev, size and dma_handle and dir must all be the same as those passed into
  dma_alloc_noncoherent().  cpu_addr must be the virtual address returned by
-the dma_alloc_noncoherent().
+dma_alloc_noncoherent().
+
+::
+
+   struct page *
+   dma_alloc_pages(struct device *dev, size_t size, dma_addr_t *dma_handle,
+   enum dma_data_direction dir, gfp_t gfp)
+
+This routine allocates a region of <size> bytes of non-coherent memory.  It
+returns a pointer to first struct page for the region, or NULL if the
+allocation failed. The resulting struct page can be used for everything a
+struct page is suitable for.
+
+It also returns a <dma_handle> which may be cast to an unsigned integer the
+same width as the bus and given to the device as the DMA address base of
+the region.
+
+The dir parameter specified if data is read and/or written by the device,
+see dma_map_single() for details.
+
+The gfp parameter allows the caller to specify the ``GFP_`` flags (see
+kmalloc()) for the allocation, but rejects flags used to specify a memory
+zone such as GFP_DMA or GFP_HIGHMEM.
+
+Before giving the memory to the device, dma_sync_single_for_device() needs
+to be called, and before reading memory written by the device,
+dma_sync_single_for_cpu(), just like for streaming DMA mappings that are
+reused.
+
+::
+
+   void
+   dma_free_pages(struct device *dev, size_t size, struct page *page,
+   dma_addr_t dma_handle, enum dma_data_direction dir)
+
+Free a region of memory previously allocated using dma_alloc_pages().
+dev, size and dma_handle and dir must all be the same as those passed into
+dma_alloc_noncoherent().  page must be the pointer returned by
+dma_alloc_pages().
  
  ::
  




[PATCH] iommu: Modify the description of iommu_sva_unbind_device

2020-10-23 Thread Chen Jun
From: Chen Jun 

iommu_sva_unbind_device has no return value.

Remove the description of the return value of the function.

Signed-off-by: Chen Jun 
---
 drivers/iommu/iommu.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 8c470f4..bb51d53 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2995,8 +2995,6 @@ EXPORT_SYMBOL_GPL(iommu_sva_bind_device);
  * Put reference to a bond between device and address space. The device should
  * not be issuing any more transaction for this PASID. All outstanding page
  * requests for this PASID must have been flushed to the IOMMU.
- *
- * Returns 0 on success, or an error value
  */
 void iommu_sva_unbind_device(struct iommu_sva *handle)
 {
-- 
2.7.4



Re: [PATCH v4 7/7] mm: Remove examples from enum zone_type comment

2020-10-23 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH 2/3] drm/msm: add DRM_MSM_GEM_SYNC_CACHE for non-coherent cache maintenance

2020-10-23 Thread Christoph Hellwig
On Thu, Oct 15, 2020 at 04:43:01PM +0100, Christoph Hellwig wrote:
> > Somewhat related, but is there a way to tell the dma-api to fail instead
> > of falling back to swiotlb? In many case for gpu drivers it's much better
> > if we fall back to dma_alloc_coherent and manage the copying ourselves
> > instead of abstracting this away in the dma-api. Currently that's "solved"
> > rather pessimistically by always allocating from dma_alloc_coherent if
> > swiotlb could be in the picture (at least for ttm based drivers, i915 just
> > falls over).
> 
> Is this for the alloc_pages plus manually map logic in various drivers?
> 
> They should switch to the new dma_alloc_pages API that I'll send to
> Linus for 5.10 soon.

Daniel, can you clarify this?


[PATCH] dma-mapping: document dma_{alloc,free}_pages

2020-10-23 Thread Christoph Hellwig
Document the new dma_alloc_pages and dma_free_pages APIs, and fix
up the documentation for dma_alloc_noncoherent and dma_free_noncoherent.

Reported-by: Robin Murphy 
Signed-off-by: Christoph Hellwig 
---
 Documentation/core-api/dma-api.rst | 49 ++
 1 file changed, 43 insertions(+), 6 deletions(-)

diff --git a/Documentation/core-api/dma-api.rst 
b/Documentation/core-api/dma-api.rst
index ea0413276ddb70..209d7978cdaa1b 100644
--- a/Documentation/core-api/dma-api.rst
+++ b/Documentation/core-api/dma-api.rst
@@ -519,10 +519,9 @@ routines, e.g.:::
 Part II - Non-coherent DMA allocations
 --
 
-These APIs allow to allocate pages in the kernel direct mapping that are
-guaranteed to be DMA addressable.  This means that unlike dma_alloc_coherent,
-virt_to_page can be called on the resulting address, and the resulting
-struct page can be used for everything a struct page is suitable for.
+These APIs allow to allocate pages that are guranteed to be DMA addressable
+by the passed in device, but which need explicit management of memory
+ownership for the kernel vs the device.
 
 If you don't understand how cache line coherency works between a processor and
 an I/O device, you should not be using this part of the API.
@@ -537,7 +536,7 @@ an I/O device, you should not be using this part of the API.
This routine allocates a region of <size> bytes of consistent memory.  It
 returns a pointer to the allocated region (in the processor's virtual address
 space) or NULL if the allocation failed.  The returned memory may or may not
-be in the kernels direct mapping.  Drivers must not call virt_to_page on
+be in the kernel direct mapping.  Drivers must not call virt_to_page on
 the returned memory region.
 
It also returns a <dma_handle> which may be cast to an unsigned integer the
@@ -565,7 +564,45 @@ reused.
 Free a region of memory previously allocated using dma_alloc_noncoherent().
 dev, size and dma_handle and dir must all be the same as those passed into
 dma_alloc_noncoherent().  cpu_addr must be the virtual address returned by
-the dma_alloc_noncoherent().
+dma_alloc_noncoherent().
+
+::
+
+   struct page *
+   dma_alloc_pages(struct device *dev, size_t size, dma_addr_t *dma_handle,
+   enum dma_data_direction dir, gfp_t gfp)
+
+This routine allocates a region of <size> bytes of non-coherent memory.  It
+returns a pointer to first struct page for the region, or NULL if the
+allocation failed. The resulting struct page can be used for everything a
+struct page is suitable for.
+
+It also returns a <dma_handle> which may be cast to an unsigned integer the
+same width as the bus and given to the device as the DMA address base of
+the region.
+
+The dir parameter specified if data is read and/or written by the device,
+see dma_map_single() for details.
+
+The gfp parameter allows the caller to specify the ``GFP_`` flags (see
+kmalloc()) for the allocation, but rejects flags used to specify a memory
+zone such as GFP_DMA or GFP_HIGHMEM.
+
+Before giving the memory to the device, dma_sync_single_for_device() needs
+to be called, and before reading memory written by the device,
+dma_sync_single_for_cpu(), just like for streaming DMA mappings that are
+reused.
+
+::
+
+   void
+   dma_free_pages(struct device *dev, size_t size, struct page *page,
+   dma_addr_t dma_handle, enum dma_data_direction dir)
+
+Free a region of memory previously allocated using dma_alloc_pages().
+dev, size and dma_handle and dir must all be the same as those passed into
+dma_alloc_noncoherent().  page must be the pointer returned by
+dma_alloc_pages().
 
 ::
 
-- 
2.28.0



[PATCH for-5.10] swiotlb: remove the tbl_dma_addr argument to swiotlb_tbl_map_single

2020-10-23 Thread Christoph Hellwig
The tbl_dma_addr argument is used to check the DMA boundary for the
allocations, and thus needs to be a dma_addr_t.  swiotlb-xen instead
passed a physical address, which could lead to incorrect results for
strange offsets.  Fix this by removing the parameter entirely and hard
code the DMA address for io_tlb_start instead.

Fixes: 91ffe4ad534a ("swiotlb-xen: introduce phys_to_dma/dma_to_phys 
translations")
Signed-off-by: Christoph Hellwig 
Reviewed-by: Stefano Stabellini 
---
 drivers/iommu/intel/iommu.c |  5 ++---
 drivers/xen/swiotlb-xen.c   |  3 +--
 include/linux/swiotlb.h | 10 +++---
 kernel/dma/swiotlb.c| 16 ++--
 4 files changed, 12 insertions(+), 22 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 8651f6d4dfa032..6b560e6f193096 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3815,9 +3815,8 @@ bounce_map_single(struct device *dev, phys_addr_t paddr, 
size_t size,
 * page aligned, we don't need to use a bounce page.
 */
if (!IS_ALIGNED(paddr | size, VTD_PAGE_SIZE)) {
-   tlb_addr = swiotlb_tbl_map_single(dev,
-   phys_to_dma_unencrypted(dev, io_tlb_start),
-   paddr, size, aligned_size, dir, attrs);
+   tlb_addr = swiotlb_tbl_map_single(dev, paddr, size,
+   aligned_size, dir, attrs);
if (tlb_addr == DMA_MAPPING_ERROR) {
goto swiotlb_error;
} else {
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 71ce1b7a23d1cc..2b385c1b4a99cb 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -395,8 +395,7 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, 
struct page *page,
 */
trace_swiotlb_bounced(dev, dev_addr, size, swiotlb_force);
 
-   map = swiotlb_tbl_map_single(dev, virt_to_phys(xen_io_tlb_start),
-phys, size, size, dir, attrs);
+   map = swiotlb_tbl_map_single(dev, phys, size, size, dir, attrs);
if (map == (phys_addr_t)DMA_MAPPING_ERROR)
return DMA_MAPPING_ERROR;
 
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 513913ff748626..3bb72266a75a1d 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -45,13 +45,9 @@ enum dma_sync_target {
SYNC_FOR_DEVICE = 1,
 };
 
-extern phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
- dma_addr_t tbl_dma_addr,
- phys_addr_t phys,
- size_t mapping_size,
- size_t alloc_size,
- enum dma_data_direction dir,
- unsigned long attrs);
+phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t phys,
+   size_t mapping_size, size_t alloc_size,
+   enum dma_data_direction dir, unsigned long attrs);
 
 extern void swiotlb_tbl_unmap_single(struct device *hwdev,
 phys_addr_t tlb_addr,
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index b4eea0abc3f002..92e2f54f24c01b 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -441,14 +441,11 @@ static void swiotlb_bounce(phys_addr_t orig_addr, 
phys_addr_t tlb_addr,
}
 }
 
-phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
-  dma_addr_t tbl_dma_addr,
-  phys_addr_t orig_addr,
-  size_t mapping_size,
-  size_t alloc_size,
-  enum dma_data_direction dir,
-  unsigned long attrs)
+phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
+   size_t mapping_size, size_t alloc_size,
+   enum dma_data_direction dir, unsigned long attrs)
 {
+   dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(hwdev, io_tlb_start);
unsigned long flags;
phys_addr_t tlb_addr;
unsigned int nslots, stride, index, wrap;
@@ -667,9 +664,8 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t 
paddr, size_t size,
trace_swiotlb_bounced(dev, phys_to_dma(dev, paddr), size,
  swiotlb_force);
 
-   swiotlb_addr = swiotlb_tbl_map_single(dev,
-   phys_to_dma_unencrypted(dev, io_tlb_start),
-   paddr, size, size, dir, attrs);
+   swiotlb_addr = swiotlb_tbl_map_single(dev, paddr, size, size, dir,
+   attrs);
if (swiotlb_addr == (phys_addr_t)DMA_MAPPING_ERROR)
return DMA_MAPPING_ERROR;
 
-- 
2.28.0


Re: [PATCH 2/4] iommu/mediatek: Add iotlb_sync_range() support

2020-10-23 Thread chao hao
On Fri, 2020-10-23 at 13:57 +0800, chao hao wrote:
> On Wed, 2020-10-21 at 17:55 +0100, Robin Murphy wrote:
> > On 2020-10-19 12:30, Chao Hao wrote:
> > > MTK_IOMMU driver writes one page entry and does tlb flush at a time
> > > currently. More optimal would be to aggregate the writes and flush
> > > BUS buffer in the end.
> > 
> > That's exactly what iommu_iotlb_gather_add_page() is meant to achieve. 
> > Rather than jumping straight into hacking up a new API to go round the 
> > back of the existing API design, it would be far better to ask the 
> > question of why that's not behaving as expected.
> 
> Thanks for your review!
> 
> iommu_iotlb_gather_add_page() is called from io_pgtable_tlb_add_page(),
> which is only invoked on the unmap path; the map path in the Linux iommu
> driver doesn't have it. But the MTK IOMMU needs to do a TLB sync on both
> map and unmap to avoid stale data sitting in the IOMMU TLB.
> 
> In addition, we hope to do the TLB sync once after all the pages are
> mapped. iommu_iotlb_gather_add_page() may do the TLB sync more than once,
> because one whole buffer consists of different page sizes (1MB/64K/4K).
> 
> Based on the above, we didn't find a more appropriate way to do the TLB
> sync for the MTK IOMMU, so we added a new API.
> 
> > 
> > > For 50MB buffer mapping, if mtk_iommu driver use iotlb_sync_range()
> > > instead of tlb_add_range() and tlb_flush_walk/leaf(), it can increase
> > > 50% performance or more(depending on size of every page size) in
> > > comparison to flushing after each page entry update. So we prefer to
> > > use iotlb_sync_range() to replace iotlb_sync(), tlb_add_range() and
> > > tlb_flush_walk/leaf() for MTK platforms.
> > 
> > In the case of mapping, it sounds like what you actually want to do is 
> > hook up .iotlb_sync_map and generally make IO_PGTABLE_QUIRK_TLBI_ON_MAP 
> > cleverer, because the current implementation is as dumb as it could 
> > possibly be. 
> 
> iotlb_sync_map() only has one parameter (the iommu_domain), but the MTK
> iommu_domain may cover the whole IOVA space, so if mtk_iommu did the TLB
> sync based on the iommu_domain, it would in fact be equivalent to a full
> TLB flush.
> The iommu driver will do a TLB sync for every mapped page when mtk_iommu
> sets IO_PGTABLE_QUIRK_TLBI_ON_MAP (io_pgtable_tlb_flush_walk); as the
> commit message mentions, that hurts mapping performance on the MTK
> platform.
> 
> 
> > In fact if we simply passed an address range to 
> > .iotlb_sync_map, io-pgtable probably wouldn't need to be involved at all 
> > any more.
Sorry, I forgot to reply to this question in my previous mail.
Do you mean we need to modify the iotlb_sync_map() parameters (e.g. add a
start/end iova)?
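
Just to check my understanding, something like this (only a sketch, the
mtk_iommu_sync_map name is made up)?

	void (*iotlb_sync_map)(struct iommu_domain *domain, unsigned long iova,
			       size_t size);

and in mtk_iommu.c:

	static void mtk_iommu_sync_map(struct iommu_domain *domain,
				       unsigned long iova, size_t size)
	{
		mtk_iommu_tlb_flush_range_sync(iova, size, 0, NULL);
	}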

> 
> I know adding a new API is probably not a good idea, but I found that the
> TLB sync is only done after mapping a single page. If mtk_iommu hopes to
> do the TLB sync once after all the pages are mapped, could you give me
> some advice? Thanks!
> 
> > 
> > Robin.
> > 
> > > Signed-off-by: Chao Hao 
> > > ---
> > >   drivers/iommu/mtk_iommu.c | 6 ++
> > >   1 file changed, 6 insertions(+)
> > > 
> > > diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> > > index 785b228d39a6..d3400c15ff7b 100644
> > > --- a/drivers/iommu/mtk_iommu.c
> > > +++ b/drivers/iommu/mtk_iommu.c
> > > @@ -224,6 +224,11 @@ static void mtk_iommu_tlb_flush_range_sync(unsigned 
> > > long iova, size_t size,
> > >   }
> > >   }
> > >   
> > > +static void __mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t 
> > > size)
> > > +{
> > > + mtk_iommu_tlb_flush_range_sync(iova, size, 0, NULL)
> > > +}
> > > +
> > >   static void mtk_iommu_tlb_flush_page_nosync(struct iommu_iotlb_gather 
> > > *gather,
> > >   unsigned long iova, size_t 
> > > granule,
> > >   void *cookie)
> > > @@ -536,6 +541,7 @@ static const struct iommu_ops mtk_iommu_ops = {
> > >   .map= mtk_iommu_map,
> > >   .unmap  = mtk_iommu_unmap,
> > >   .flush_iotlb_all = mtk_iommu_flush_iotlb_all,
> > > + .iotlb_sync_range = __mtk_iommu_tlb_flush_range_sync,
> > >   .iotlb_sync = mtk_iommu_iotlb_sync,
> > >   .iova_to_phys   = mtk_iommu_iova_to_phys,
> > >   .probe_device   = mtk_iommu_probe_device,
> > > 
> 



Re: [PATCH 2/4] iommu/mediatek: Add iotlb_sync_range() support

2020-10-23 Thread chao hao
On Wed, 2020-10-21 at 17:55 +0100, Robin Murphy wrote:
> On 2020-10-19 12:30, Chao Hao wrote:
> > MTK_IOMMU driver writes one page entry and does tlb flush at a time
> > currently. More optimal would be to aggregate the writes and flush
> > BUS buffer in the end.
> 
> That's exactly what iommu_iotlb_gather_add_page() is meant to achieve. 
> Rather than jumping straight into hacking up a new API to go round the 
> back of the existing API design, it would be far better to ask the 
> question of why that's not behaving as expected.

Thanks for your review!

iommu_iotlb_gather_add_page() is called from io_pgtable_tlb_add_page(),
which is only invoked on the unmap path; the map path in the Linux iommu
driver doesn't have it. But the MTK IOMMU needs to do a TLB sync on both
map and unmap to avoid stale data sitting in the IOMMU TLB.

In addition, we hope to do the TLB sync once after all the pages are
mapped. iommu_iotlb_gather_add_page() may do the TLB sync more than once,
because one whole buffer consists of different page sizes (1MB/64K/4K).

Based on the above, we didn't find a more appropriate way to do the TLB
sync for the MTK IOMMU, so we added a new API.

> 
> > For 50MB buffer mapping, if mtk_iommu driver use iotlb_sync_range()
> > instead of tlb_add_range() and tlb_flush_walk/leaf(), it can increase
> > 50% performance or more(depending on size of every page size) in
> > comparison to flushing after each page entry update. So we prefer to
> > use iotlb_sync_range() to replace iotlb_sync(), tlb_add_range() and
> > tlb_flush_walk/leaf() for MTK platforms.
> 
> In the case of mapping, it sounds like what you actually want to do is 
> hook up .iotlb_sync_map and generally make IO_PGTABLE_QUIRK_TLBI_ON_MAP 
> cleverer, because the current implementation is as dumb as it could 
> possibly be. 

iotlb_sync_map() only has one parameter (the iommu_domain), but the MTK
iommu_domain may cover the whole IOVA space, so if mtk_iommu did the TLB
sync based on the iommu_domain, it would in fact be equivalent to a full
TLB flush.
The iommu driver will do a TLB sync for every mapped page when mtk_iommu
sets IO_PGTABLE_QUIRK_TLBI_ON_MAP (io_pgtable_tlb_flush_walk); as the
commit message mentions, that hurts mapping performance on the MTK
platform.


> In fact if we simply passed an address range to 
> .iotlb_sync_map, io-pgtable probably wouldn't need to be involved at all 
> any more.

I know adding a new API is probably not a good idea, but I found that the
TLB sync is only done after mapping a single page. If mtk_iommu hopes to
do the TLB sync once after all the pages are mapped, could you give me
some advice? Thanks!

> 
> Robin.
> 
> > Signed-off-by: Chao Hao 
> > ---
> >   drivers/iommu/mtk_iommu.c | 6 ++
> >   1 file changed, 6 insertions(+)
> > 
> > diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> > index 785b228d39a6..d3400c15ff7b 100644
> > --- a/drivers/iommu/mtk_iommu.c
> > +++ b/drivers/iommu/mtk_iommu.c
> > @@ -224,6 +224,11 @@ static void mtk_iommu_tlb_flush_range_sync(unsigned 
> > long iova, size_t size,
> > }
> >   }
> >   
> > +static void __mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t 
> > size)
> > +{
> > +   mtk_iommu_tlb_flush_range_sync(iova, size, 0, NULL)
> > +}
> > +
> >   static void mtk_iommu_tlb_flush_page_nosync(struct iommu_iotlb_gather 
> > *gather,
> > unsigned long iova, size_t granule,
> > void *cookie)
> > @@ -536,6 +541,7 @@ static const struct iommu_ops mtk_iommu_ops = {
> > .map= mtk_iommu_map,
> > .unmap  = mtk_iommu_unmap,
> > .flush_iotlb_all = mtk_iommu_flush_iotlb_all,
> > +   .iotlb_sync_range = __mtk_iommu_tlb_flush_range_sync,
> > .iotlb_sync = mtk_iommu_iotlb_sync,
> > .iova_to_phys   = mtk_iommu_iova_to_phys,
> > .probe_device   = mtk_iommu_probe_device,
> > 
