Re: [PATCH v4 3/7] of/address: Introduce of_dma_get_max_cpu_address()

2020-10-22 Thread Ard Biesheuvel
On Wed, 21 Oct 2020 at 14:35, Nicolas Saenz Julienne wrote:
>
> Introduce of_dma_get_max_cpu_address(), which provides the highest CPU
> physical address addressable by all DMA masters in the system. It's
> especially useful for setting memory zone sizes at early boot time.
>
> Signed-off-by: Nicolas Saenz Julienne 
>
> ---
>
> Changes since v3:
>  - use u64 with cpu_end
>
> Changes since v2:
>  - Use PHYS_ADDR_MAX
>  - return phys_dma_t
>  - Rename function
>  - Correct subject
>  - Add support to start parsing from an arbitrary device node in order
>for the function to work with unit tests
>
>  drivers/of/address.c | 42 ++
>  include/linux/of.h   |  7 +++
>  2 files changed, 49 insertions(+)
>
> diff --git a/drivers/of/address.c b/drivers/of/address.c
> index eb9ab4f1e80b..47dfe5881e18 100644
> --- a/drivers/of/address.c
> +++ b/drivers/of/address.c
> @@ -1024,6 +1024,48 @@ int of_dma_get_range(struct device_node *np, const struct bus_dma_region **map)
>  }
>  #endif /* CONFIG_HAS_DMA */
>
> +/**
> + * of_dma_get_max_cpu_address - Gets highest CPU address suitable for DMA
> + * @np: The node to start searching from or NULL to start from the root
> + *
> + * Gets the highest CPU physical address that is addressable by all DMA masters
> + * in the sub-tree pointed by np, or the whole tree if NULL is passed. If no
> + * DMA constrained device is found, it returns PHYS_ADDR_MAX.
> + */
> +phys_addr_t __init of_dma_get_max_cpu_address(struct device_node *np)
> +{
> +   phys_addr_t max_cpu_addr = PHYS_ADDR_MAX;
> +   struct of_range_parser parser;
> +   phys_addr_t subtree_max_addr;
> +   struct device_node *child;
> +   struct of_range range;
> +   const __be32 *ranges;
> +   u64 cpu_end = 0;
> +   int len;
> +
> +   if (!np)
> +   np = of_root;
> +
> +   ranges = of_get_property(np, "dma-ranges", &len);
> +   if (ranges && len) {
> +   of_dma_range_parser_init(&parser, np);
> +   for_each_of_range(&parser, &range)
> +   if (range.cpu_addr + range.size > cpu_end)
> +   cpu_end = range.cpu_addr + range.size;

Shouldn't this be 'range.cpu_addr + range.size - 1' ?

> +
> +   if (max_cpu_addr > cpu_end)
> +   max_cpu_addr = cpu_end;
> +   }
> +
> +   for_each_available_child_of_node(np, child) {
> +   subtree_max_addr = of_dma_get_max_cpu_address(child);
> +   if (max_cpu_addr > subtree_max_addr)
> +   max_cpu_addr = subtree_max_addr;
> +   }
> +
> +   return max_cpu_addr;
> +}
> +
>  /**
>   * of_dma_is_coherent - Check if device is coherent
>   * @np:device node
> diff --git a/include/linux/of.h b/include/linux/of.h
> index 481ec0467285..db8db8f2c967 100644
> --- a/include/linux/of.h
> +++ b/include/linux/of.h
> @@ -558,6 +558,8 @@ int of_map_id(struct device_node *np, u32 id,
>const char *map_name, const char *map_mask_name,
>struct device_node **target, u32 *id_out);
>
> +phys_addr_t of_dma_get_max_cpu_address(struct device_node *np);
> +
>  #else /* CONFIG_OF */
>
>  static inline void of_core_init(void)
> @@ -995,6 +997,11 @@ static inline int of_map_id(struct device_node *np, u32 id,
> return -EINVAL;
>  }
>
> +static inline phys_addr_t of_dma_get_max_cpu_address(struct device_node *np)
> +{
> +   return PHYS_ADDR_MAX;
> +}
> +
>  #define of_match_ptr(_ptr) NULL
>  #define of_match_node(_matches, _node) NULL
>  #endif /* CONFIG_OF */
> --
> 2.28.0
>


Re: WARNING in dma_map_page_attrs

2020-10-22 Thread Christoph Hellwig
I don't think the merge commit makes sense here.  But what we see here
is that dma_map_page is called on the rxe device, without that device
having a DMA mask.  For now this needs a workaround in rxe, but for
5.11 I'll send a patch to remove dma-virt and just handle this case
inside of the rdma core.
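
(For illustration only: the kind of short-term workaround being suggested is to
give the rxe software device a DMA mask before ib_register_device() runs. The
call site and field names below are assumptions, not the actual fix.)

	/*
	 * Hypothetical sketch only: placement inside rxe_register_device()
	 * and the 64-bit mask are assumed, not taken from a merged patch.
	 */
	err = dma_coerce_mask_and_coherent(&rxe->ib_dev.dev, DMA_BIT_MASK(64));
	if (err)
		return err;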

On Wed, Oct 21, 2020 at 12:03:19PM -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:c4d6fe73 Merge tag 'xarray-5.9' of git://git.infradead.org..
> git tree:   upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=14862ff050
> kernel config:  https://syzkaller.appspot.com/x/.config?x=7d790573d3e379c4
> dashboard link: https://syzkaller.appspot.com/bug?extid=34dc2fea3478e659af01
> compiler:   gcc (GCC) 10.1.0-syz 20200507
> 
> Unfortunately, I don't have any reproducer for this issue yet.
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+34dc2fea3478e659a...@syzkaller.appspotmail.com
> 
> infiniband syz1: set active
> infiniband syz1: added vcan0
> [ cut here ]
> WARNING: CPU: 1 PID: 9851 at kernel/dma/mapping.c:149 
> dma_map_page_attrs+0x493/0x700 kernel/dma/mapping.c:149
> Modules linked in:
> CPU: 1 PID: 9851 Comm: syz-executor.1 Not tainted 5.9.0-syzkaller #0
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
> rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
> RIP: 0010:dma_map_page_attrs+0x493/0x700 kernel/dma/mapping.c:149
> Code: 80 3c 10 00 0f 85 ed 01 00 00 48 8b 1d 36 c3 fa 0c e9 2d fc ff ff 48 89 
> c3 e9 d1 fd ff ff e8 04 12 12 00 0f 0b e8 fd 11 12 00 <0f> 0b 49 c7 c4 ff ff 
> ff ff e9 d5 fd ff ff e8 ea 11 12 00 48 8d 7b
> RSP: 0018:c90001546c68 EFLAGS: 00010246
> RAX: 0004 RBX: 894d0040 RCX: c9000dbe4000
> RDX: 0004 RSI: 815d3b03 RDI: 88806a988b00
> RBP: 8880236cc400 R08: 0002 R09: 
> R10: 0002 R11:  R12: ea8db300
> R13: 88806a9886e8 R14: 04b8 R15: 0002
> FS:  7f678fae2700() GS:88802ce0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 7f299a39b190 CR3: 69f31000 CR4: 00350ee0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Call Trace:
>  dma_map_single_attrs include/linux/dma-mapping.h:279 [inline]
>  ib_dma_map_single include/rdma/ib_verbs.h:3967 [inline]
>  ib_mad_post_receive_mads+0x23f/0xd60 drivers/infiniband/core/mad.c:2715
>  ib_mad_port_start drivers/infiniband/core/mad.c:2862 [inline]
>  ib_mad_port_open drivers/infiniband/core/mad.c:3016 [inline]
>  ib_mad_init_device+0x72b/0x1400 drivers/infiniband/core/mad.c:3092
>  add_client_context+0x405/0x5e0 drivers/infiniband/core/device.c:680
>  enable_device_and_get+0x1d5/0x3c0 drivers/infiniband/core/device.c:1301
>  ib_register_device drivers/infiniband/core/device.c:1376 [inline]
>  ib_register_device+0x7a7/0xa40 drivers/infiniband/core/device.c:1335
>  rxe_register_device+0x46d/0x570 drivers/infiniband/sw/rxe/rxe_verbs.c:1182
>  rxe_add+0x12fe/0x16d0 drivers/infiniband/sw/rxe/rxe.c:247
>  rxe_net_add+0x8c/0xe0 drivers/infiniband/sw/rxe/rxe_net.c:507
>  rxe_newlink drivers/infiniband/sw/rxe/rxe.c:269 [inline]
>  rxe_newlink+0xb7/0xe0 drivers/infiniband/sw/rxe/rxe.c:250
>  nldev_newlink+0x30e/0x540 drivers/infiniband/core/nldev.c:1555
>  rdma_nl_rcv_msg+0x367/0x690 drivers/infiniband/core/netlink.c:195
>  rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline]
>  rdma_nl_rcv+0x2f2/0x440 drivers/infiniband/core/netlink.c:259
>  netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
>  netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1330
>  netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1919
>  sock_sendmsg_nosec net/socket.c:651 [inline]
>  sock_sendmsg+0xcf/0x120 net/socket.c:671
>  sys_sendmsg+0x6e8/0x810 net/socket.c:2353
>  ___sys_sendmsg+0xf3/0x170 net/socket.c:2407
>  __sys_sendmsg+0xe5/0x1b0 net/socket.c:2440
>  do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> RIP: 0033:0x45d9f9
> Code: bd b1 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 
> 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 
> 83 8b b1 fb ff c3 66 2e 0f 1f 84 00 00 00 00
> RSP: 002b:7f678fae1c88 EFLAGS: 0246 ORIG_RAX: 002e
> RAX: ffda RBX: 0071f480 RCX: 0045d9f9
> RDX:  RSI: 2200 RDI: 0003
> RBP: 004aab13 R08:  R09: 
> R10:  R11: 0246 R12: 0075bf00
> R13: 7ffc6f9b8bbf R14: 7f678fac2000 R15: 0003
> 
> 
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for m

Re: [PATCH v4 3/7] of/address: Introduce of_dma_get_max_cpu_address()

2020-10-22 Thread Nicolas Saenz Julienne
On Thu, 2020-10-22 at 14:23 +0200, Ard Biesheuvel wrote:
> > +/**
> > + * of_dma_get_max_cpu_address - Gets highest CPU address suitable for DMA
> > + * @np: The node to start searching from or NULL to start from the root
> > + *
> > + * Gets the highest CPU physical address that is addressable by all DMA masters
> > + * in the sub-tree pointed by np, or the whole tree if NULL is passed. If no
> > + * DMA constrained device is found, it returns PHYS_ADDR_MAX.
> > + */
> > +phys_addr_t __init of_dma_get_max_cpu_address(struct device_node *np)
> > +{
> > +   phys_addr_t max_cpu_addr = PHYS_ADDR_MAX;
> > +   struct of_range_parser parser;
> > +   phys_addr_t subtree_max_addr;
> > +   struct device_node *child;
> > +   struct of_range range;
> > +   const __be32 *ranges;
> > +   u64 cpu_end = 0;
> > +   int len;
> > +
> > +   if (!np)
> > +   np = of_root;
> > +
> > +   ranges = of_get_property(np, "dma-ranges", &len);
> > +   if (ranges && len) {
> > +   of_dma_range_parser_init(&parser, np);
> > +   for_each_of_range(&parser, &range)
> > +   if (range.cpu_addr + range.size > cpu_end)
> > +   cpu_end = range.cpu_addr + range.size;
> 
> 
> Shouldn't this be 'range.cpu_addr + range.size - 1' ?

Yes, I agree. In that case arm64's counterpart should be:

zone_dma_bits = max(32U, fls64(of_dma_get_max_cpu_address(NULL)));

I'll update it.
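
For the of/address.c side, the updated loop would presumably look like this
(sketch only):

	/* Track the last addressable byte rather than one past the end. */
	for_each_of_range(&parser, &range)
		if (range.cpu_addr + range.size - 1 > cpu_end)
			cpu_end = range.cpu_addr + range.size - 1;

	if (max_cpu_addr > cpu_end)
		max_cpu_addr = cpu_end;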

Regards,
Nicolas




Re: [PATCH v4 5/7] arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges

2020-10-22 Thread Catalin Marinas
On Wed, Oct 21, 2020 at 02:34:35PM +0200, Nicolas Saenz Julienne wrote:
> @@ -188,9 +186,11 @@ static phys_addr_t __init max_zone_phys(unsigned int zone_bits)
>  static void __init zone_sizes_init(unsigned long min, unsigned long max)
>  {
>   unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
> + unsigned int __maybe_unused dt_zone_dma_bits;
>  
>  #ifdef CONFIG_ZONE_DMA
> - zone_dma_bits = ARM64_ZONE_DMA_BITS;
> + dt_zone_dma_bits = ilog2(of_dma_get_max_cpu_address(NULL));
> + zone_dma_bits = min(32U, dt_zone_dma_bits);

A thought: can we remove the min here and expand ZONE_DMA to whatever
dt_zone_dma_bits says? More on this below.

>   arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
>   max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
>  #endif

I was talking earlier to Ard and Robin on the ZONE_DMA32 history and the
need for max_zone_phys(). This was rather theoretical, the Seattle
platform has all RAM starting above 4GB and that led to an empty
ZONE_DMA32 originally. The max_zone_phys() hack was meant to lift
ZONE_DMA32 into the bottom of the RAM, on the assumption that such
32-bit devices would have a DMA offset hardwired. We are not aware of
any such case on arm64 systems and even on Seattle, IIUC 32-bit devices
only work if they are behind an SMMU (so no hardwired offset).

In hindsight, it would have made more sense on platforms with RAM above
4GB to expand ZONE_DMA32 to cover the whole memory (so empty
ZONE_NORMAL). Something like:

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index a53c1e0fb017..7d5e3dd85617 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -187,8 +187,12 @@ static void __init reserve_elfcorehdr(void)
  */
 static phys_addr_t __init max_zone_phys(unsigned int zone_bits)
 {
-   phys_addr_t offset = memblock_start_of_DRAM() & GENMASK_ULL(63, zone_bits);
-   return min(offset + (1ULL << zone_bits), memblock_end_of_DRAM());
+   phys_addr_t zone_mask = 1ULL << zone_bits;
+
+   if (!(memblock_start_of_DRAM() & zone_mask))
+   zone_mask = PHYS_ADDR_MAX;
+
+   return min(zone_mask, memblock_end_of_DRAM());
 }
 
 static void __init zone_sizes_init(unsigned long min, unsigned long max)

I don't think this makes any difference for ZONE_DMA unless a
broken DT or IORT reports the max CPU address below the start of DRAM.

There's a minor issue if of_dma_get_max_cpu_address() matches
memblock_end_of_DRAM() but they are not a power of 2. We'd be left with
a bit of RAM at the end in ZONE_NORMAL due to ilog2 truncation.
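
To put numbers on that truncation (made-up values, purely illustrative):

	phys_addr_t end = 0xE0000000;		/* memblock_end_of_DRAM(), 3.5GB */
	unsigned int bits = ilog2(end);		/* 31, since ilog2() truncates */

	/*
	 * zone_dma_bits = min(32U, 31) = 31, so ZONE_DMA stops at 0x80000000
	 * and 0x80000000-0xDFFFFFFF lands in ZONE_NORMAL even though it is
	 * all DMA-addressable.
	 */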

-- 
Catalin


Re: [PATCH v5 0/3] iommu/arm-smmu-qcom: Support maintaining bootloader mappings

2020-10-22 Thread Steev Klimaszewski


On 10/19/20 1:23 PM, Bjorn Andersson wrote:
> This is the revised fourth attempt of inheriting the stream mapping for
> the framebuffer on many Qualcomm platforms, in order to not hit
> catastrophic faults during arm-smmu initialization.
>
> The new approach does, based on Robin's suggestion, take a much more
> direct approach with the allocation of a context bank for bypass
> emulation and use of this context bank pretty much isolated to the
> Qualcomm specific implementation.
>
> The patchset has been tested to boot DB845c and RB5 (with splash screen)
> and Lenovo Yoga C630 (with EFI framebuffer).
>
> Bjorn Andersson (3):
>   iommu/arm-smmu: Allow implementation specific write_s2cr
>   iommu/arm-smmu-qcom: Read back stream mappings
>   iommu/arm-smmu-qcom: Implement S2CR quirk
>
>  drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 90 ++
>  drivers/iommu/arm/arm-smmu/arm-smmu.c  | 13 +++-
>  drivers/iommu/arm/arm-smmu/arm-smmu.h  |  1 +
>  3 files changed, 101 insertions(+), 3 deletions(-)
>
Tested series here on my Lenovo C630.

Tested-by: Steev Klimaszewski 



Re: [PATCH 2/4] iommu/mediatek: Add iotlb_sync_range() support

2020-10-22 Thread chao hao
On Wed, 2020-10-21 at 17:55 +0100, Robin Murphy wrote:
> On 2020-10-19 12:30, Chao Hao wrote:
> > MTK_IOMMU driver writes one page entry and does tlb flush at a time
> > currently. More optimal would be to aggregate the writes and flush
> > BUS buffer in the end.
> 
> That's exactly what iommu_iotlb_gather_add_page() is meant to achieve. 
> Rather than jumping straight into hacking up a new API to go round the 
> back of the existing API design, it would be far better to ask the 
> question of why that's not behaving as expected.

Thanks for your review!

iommu_iotlb_gather_add_page() is called from io_pgtable_tlb_add_page(),
which is only invoked on the unmap path; the map path in the generic
IOMMU code doesn't use it. However, the MTK IOMMU needs to do a TLB sync
on both map and unmap to avoid stale entries lingering in the IOMMU TLB.

In addition, we hope to do the TLB sync only once, after all the pages
have been mapped. iommu_iotlb_gather_add_page() may sync more than once,
because a single buffer is mapped with a mix of page sizes
(1MB/64KB/4KB).

Based on these considerations, we couldn't find a more appropriate way
to do the TLB sync for the MTK IOMMU, so we added a new API.
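
As a rough illustration of the cost (assumed sizes, not a measurement):

	/*
	 * One large buffer mapped as a mix of 1MB, 64KB and 4KB entries:
	 * iommu_iotlb_gather_add_page() starts a new flush whenever the
	 * granule changes or the range becomes discontiguous, so a single
	 * buffer can trigger several TLB syncs, while a ranged sync issued
	 * once after the whole map needs only one.
	 */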

> 
> > For 50MB buffer mapping, if mtk_iommu driver use iotlb_sync_range()
> > instead of tlb_add_range() and tlb_flush_walk/leaf(), it can increase
> > 50% performance or more(depending on size of every page size) in
> > comparison to flushing after each page entry update. So we prefer to
> > use iotlb_sync_range() to replace iotlb_sync(), tlb_add_range() and
> > tlb_flush_walk/leaf() for MTK platforms.
> 
> In the case of mapping, it sounds like what you actually want to do is 
> hook up .iotlb_sync_map and generally make IO_PGTABLE_QUIRK_TLBI_ON_MAP 
> cleverer, because the current implementation is as dumb as it could 
> possibly be. 

iotlb_sync_map() only takes one parameter (the iommu_domain), but an MTK
iommu_domain may cover the whole IOVA space, so doing the TLB sync based
on the iommu_domain alone is effectively a full TLB flush.
When the MTK IOMMU sets IO_PGTABLE_QUIRK_TLBI_ON_MAP, the IOMMU core does
a TLB sync (via io_pgtable_tlb_flush_walk) for every page mapped; as the
commit message mentions, that hurts mapping performance on MTK platforms.


> In fact if we simply passed an address range to 
> .iotlb_sync_map, io-pgtable probably wouldn't need to be involved at all 
> any more.

I know adding a new API is probably not a good idea, but as far as I can
see the TLB sync is only done after mapping a single page. If mtk_iommu
wants to do the TLB sync once after all the pages have been mapped, could
you give me some advice? Thanks!

> 
> Robin.
> 
> > Signed-off-by: Chao Hao 
> > ---
> >   drivers/iommu/mtk_iommu.c | 6 ++
> >   1 file changed, 6 insertions(+)
> > 
> > diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> > index 785b228d39a6..d3400c15ff7b 100644
> > --- a/drivers/iommu/mtk_iommu.c
> > +++ b/drivers/iommu/mtk_iommu.c
> > @@ -224,6 +224,11 @@ static void mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size,
> > }
> >   }
> >   
> > +static void __mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size)
> > +{
> > +   mtk_iommu_tlb_flush_range_sync(iova, size, 0, NULL);
> > +}
> > +
> >   static void mtk_iommu_tlb_flush_page_nosync(struct iommu_iotlb_gather *gather,
> > unsigned long iova, size_t granule,
> > void *cookie)
> > @@ -536,6 +541,7 @@ static const struct iommu_ops mtk_iommu_ops = {
> > .map= mtk_iommu_map,
> > .unmap  = mtk_iommu_unmap,
> > .flush_iotlb_all = mtk_iommu_flush_iotlb_all,
> > +   .iotlb_sync_range = __mtk_iommu_tlb_flush_range_sync,
> > .iotlb_sync = mtk_iommu_iotlb_sync,
> > .iova_to_phys   = mtk_iommu_iova_to_phys,
> > .probe_device   = mtk_iommu_probe_device,
> > 



Re: [PATCH 2/4] iommu/mediatek: Add iotlb_sync_range() support

2020-10-22 Thread chao hao
On Fri, 2020-10-23 at 13:57 +0800, chao hao wrote:
> On Wed, 2020-10-21 at 17:55 +0100, Robin Murphy wrote:
> > On 2020-10-19 12:30, Chao Hao wrote:
> > > MTK_IOMMU driver writes one page entry and does tlb flush at a time
> > > currently. More optimal would be to aggregate the writes and flush
> > > BUS buffer in the end.
> > 
> > That's exactly what iommu_iotlb_gather_add_page() is meant to achieve. 
> > Rather than jumping straight into hacking up a new API to go round the 
> > back of the existing API design, it would be far better to ask the 
> > question of why that's not behaving as expected.
> 
> Thanks for your review!
> 
> iommu_iotlb_gather_add_page() is called from io_pgtable_tlb_add_page(),
> which is only invoked on the unmap path; the map path in the generic
> IOMMU code doesn't use it. However, the MTK IOMMU needs to do a TLB sync
> on both map and unmap to avoid stale entries lingering in the IOMMU TLB.
> 
> In addition, we hope to do the TLB sync only once, after all the pages
> have been mapped. iommu_iotlb_gather_add_page() may sync more than once,
> because a single buffer is mapped with a mix of page sizes
> (1MB/64KB/4KB).
> 
> Based on these considerations, we couldn't find a more appropriate way
> to do the TLB sync for the MTK IOMMU, so we added a new API.
> 
> > 
> > > For 50MB buffer mapping, if mtk_iommu driver use iotlb_sync_range()
> > > instead of tlb_add_range() and tlb_flush_walk/leaf(), it can increase
> > > 50% performance or more(depending on size of every page size) in
> > > comparison to flushing after each page entry update. So we prefer to
> > > use iotlb_sync_range() to replace iotlb_sync(), tlb_add_range() and
> > > tlb_flush_walk/leaf() for MTK platforms.
> > 
> > In the case of mapping, it sounds like what you actually want to do is 
> > hook up .iotlb_sync_map and generally make IO_PGTABLE_QUIRK_TLBI_ON_MAP 
> > cleverer, because the current implementation is as dumb as it could 
> > possibly be. 
> 
> iotlb_sync_map() only takes one parameter (the iommu_domain), but an MTK
> iommu_domain may cover the whole IOVA space, so doing the TLB sync based
> on the iommu_domain alone is effectively a full TLB flush.
> When the MTK IOMMU sets IO_PGTABLE_QUIRK_TLBI_ON_MAP, the IOMMU core does
> a TLB sync (via io_pgtable_tlb_flush_walk) for every page mapped; as the
> commit message mentions, that hurts mapping performance on MTK platforms.
> 
> 
> > In fact if we simply passed an address range to 
> > .iotlb_sync_map, io-pgtable probably wouldn't need to be involved at all 
> > any more.
Sorry, I forgot to reply to this question in my previous mail.
Do you mean we should modify the iotlb_sync_map() parameters (e.g. add a
start/end IOVA)?
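
If so, a rough sketch of that idea (only to confirm the direction, not a
final interface):

/*
 * Hypothetical shape of a ranged .iotlb_sync_map:
 *
 *	void (*iotlb_sync_map)(struct iommu_domain *domain,
 *			       unsigned long iova, size_t size);
 *
 * The MTK callback could then flush the whole just-mapped range in one go
 * instead of once per page:
 */
static void mtk_iommu_iotlb_sync_map(struct iommu_domain *domain,
				     unsigned long iova, size_t size)
{
	mtk_iommu_tlb_flush_range_sync(iova, size, 0, NULL);
}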

> 
> I know adding a new API is probably not a good idea, but as far as I can
> see the TLB sync is only done after mapping a single page. If mtk_iommu
> wants to do the TLB sync once after all the pages have been mapped, could
> you give me some advice? Thanks!
> 
> > 
> > Robin.
> > 
> > > Signed-off-by: Chao Hao 
> > > ---
> > >   drivers/iommu/mtk_iommu.c | 6 ++
> > >   1 file changed, 6 insertions(+)
> > > 
> > > diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> > > index 785b228d39a6..d3400c15ff7b 100644
> > > --- a/drivers/iommu/mtk_iommu.c
> > > +++ b/drivers/iommu/mtk_iommu.c
> > > @@ -224,6 +224,11 @@ static void mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size,
> > >   }
> > >   }
> > >   
> > > +static void __mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size)
> > > +{
> > > + mtk_iommu_tlb_flush_range_sync(iova, size, 0, NULL);
> > > +}
> > > +
> > >   static void mtk_iommu_tlb_flush_page_nosync(struct iommu_iotlb_gather *gather,
> > >   unsigned long iova, size_t granule,
> > >   void *cookie)
> > > @@ -536,6 +541,7 @@ static const struct iommu_ops mtk_iommu_ops = {
> > >   .map= mtk_iommu_map,
> > >   .unmap  = mtk_iommu_unmap,
> > >   .flush_iotlb_all = mtk_iommu_flush_iotlb_all,
> > > + .iotlb_sync_range = __mtk_iommu_tlb_flush_range_sync,
> > >   .iotlb_sync = mtk_iommu_iotlb_sync,
> > >   .iova_to_phys   = mtk_iommu_iova_to_phys,
> > >   .probe_device   = mtk_iommu_probe_device,
> > > 
> 



[PATCH for-5.10] swiotlb: remove the tbl_dma_addr argument to swiotlb_tbl_map_single

2020-10-22 Thread Christoph Hellwig
The tbl_dma_addr argument is used to check the DMA boundary for the
allocations, and thus needs to be a dma_addr_t.  swiotlb-xen instead
passed a physical address, which could lead to incorrect results for
strange offsets.  Fix this by removing the parameter entirely and hard
code the DMA address for io_tlb_start instead.

Fixes: 91ffe4ad534a ("swiotlb-xen: introduce phys_to_dma/dma_to_phys translations")
Signed-off-by: Christoph Hellwig 
Reviewed-by: Stefano Stabellini 
---
 drivers/iommu/intel/iommu.c |  5 ++---
 drivers/xen/swiotlb-xen.c   |  3 +--
 include/linux/swiotlb.h | 10 +++---
 kernel/dma/swiotlb.c| 16 ++--
 4 files changed, 12 insertions(+), 22 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 8651f6d4dfa032..6b560e6f193096 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3815,9 +3815,8 @@ bounce_map_single(struct device *dev, phys_addr_t paddr, size_t size,
 * page aligned, we don't need to use a bounce page.
 */
if (!IS_ALIGNED(paddr | size, VTD_PAGE_SIZE)) {
-   tlb_addr = swiotlb_tbl_map_single(dev,
-   phys_to_dma_unencrypted(dev, io_tlb_start),
-   paddr, size, aligned_size, dir, attrs);
+   tlb_addr = swiotlb_tbl_map_single(dev, paddr, size,
+   aligned_size, dir, attrs);
if (tlb_addr == DMA_MAPPING_ERROR) {
goto swiotlb_error;
} else {
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 71ce1b7a23d1cc..2b385c1b4a99cb 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -395,8 +395,7 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
 */
trace_swiotlb_bounced(dev, dev_addr, size, swiotlb_force);
 
-   map = swiotlb_tbl_map_single(dev, virt_to_phys(xen_io_tlb_start),
-phys, size, size, dir, attrs);
+   map = swiotlb_tbl_map_single(dev, phys, size, size, dir, attrs);
if (map == (phys_addr_t)DMA_MAPPING_ERROR)
return DMA_MAPPING_ERROR;
 
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 513913ff748626..3bb72266a75a1d 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -45,13 +45,9 @@ enum dma_sync_target {
SYNC_FOR_DEVICE = 1,
 };
 
-extern phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
- dma_addr_t tbl_dma_addr,
- phys_addr_t phys,
- size_t mapping_size,
- size_t alloc_size,
- enum dma_data_direction dir,
- unsigned long attrs);
+phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t phys,
+   size_t mapping_size, size_t alloc_size,
+   enum dma_data_direction dir, unsigned long attrs);
 
 extern void swiotlb_tbl_unmap_single(struct device *hwdev,
 phys_addr_t tlb_addr,
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index b4eea0abc3f002..92e2f54f24c01b 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -441,14 +441,11 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr,
}
 }
 
-phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
-  dma_addr_t tbl_dma_addr,
-  phys_addr_t orig_addr,
-  size_t mapping_size,
-  size_t alloc_size,
-  enum dma_data_direction dir,
-  unsigned long attrs)
+phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
+   size_t mapping_size, size_t alloc_size,
+   enum dma_data_direction dir, unsigned long attrs)
 {
+   dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(hwdev, io_tlb_start);
unsigned long flags;
phys_addr_t tlb_addr;
unsigned int nslots, stride, index, wrap;
@@ -667,9 +664,8 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t paddr, size_t size,
trace_swiotlb_bounced(dev, phys_to_dma(dev, paddr), size,
  swiotlb_force);
 
-   swiotlb_addr = swiotlb_tbl_map_single(dev,
-   phys_to_dma_unencrypted(dev, io_tlb_start),
-   paddr, size, size, dir, attrs);
+   swiotlb_addr = swiotlb_tbl_map_single(dev, paddr, size, size, dir,
+   attrs);
if (swiotlb_addr == (phys_addr_t)DMA_MAPPING_ERROR)
return DMA_MAPPING_ERROR;
 
-- 
2.28.0


[PATCH] dma-mapping: document dma_{alloc,free}_pages

2020-10-22 Thread Christoph Hellwig
Document the new dma_alloc_pages and dma_free_pages APIs, and fix
up the documentation for dma_alloc_noncoherent and dma_free_noncoherent.

Reported-by: Robin Murphy 
Signed-off-by: Christoph Hellwig 
---
 Documentation/core-api/dma-api.rst | 49 ++
 1 file changed, 43 insertions(+), 6 deletions(-)

diff --git a/Documentation/core-api/dma-api.rst b/Documentation/core-api/dma-api.rst
index ea0413276ddb70..209d7978cdaa1b 100644
--- a/Documentation/core-api/dma-api.rst
+++ b/Documentation/core-api/dma-api.rst
@@ -519,10 +519,9 @@ routines, e.g.:::
 Part II - Non-coherent DMA allocations
 --
 
-These APIs allow to allocate pages in the kernel direct mapping that are
-guaranteed to be DMA addressable.  This means that unlike dma_alloc_coherent,
-virt_to_page can be called on the resulting address, and the resulting
-struct page can be used for everything a struct page is suitable for.
+These APIs allow to allocate pages that are guaranteed to be DMA addressable
+by the passed in device, but which need explicit management of memory
+ownership for the kernel vs the device.
 
 If you don't understand how cache line coherency works between a processor and
 an I/O device, you should not be using this part of the API.
@@ -537,7 +536,7 @@ an I/O device, you should not be using this part of the API.
This routine allocates a region of <size> bytes of consistent memory.  It
 returns a pointer to the allocated region (in the processor's virtual address
 space) or NULL if the allocation failed.  The returned memory may or may not
-be in the kernels direct mapping.  Drivers must not call virt_to_page on
+be in the kernel direct mapping.  Drivers must not call virt_to_page on
 the returned memory region.
 
It also returns a <dma_handle> which may be cast to an unsigned integer the
@@ -565,7 +564,45 @@ reused.
 Free a region of memory previously allocated using dma_alloc_noncoherent().
 dev, size and dma_handle and dir must all be the same as those passed into
 dma_alloc_noncoherent().  cpu_addr must be the virtual address returned by
-the dma_alloc_noncoherent().
+dma_alloc_noncoherent().
+
+::
+
+   struct page *
+   dma_alloc_pages(struct device *dev, size_t size, dma_addr_t *dma_handle,
+   enum dma_data_direction dir, gfp_t gfp)
+
+This routine allocates a region of <size> bytes of non-coherent memory.  It
+returns a pointer to first struct page for the region, or NULL if the
+allocation failed. The resulting struct page can be used for everything a
+struct page is suitable for.
+
+It also returns a <dma_handle> which may be cast to an unsigned integer the
+same width as the bus and given to the device as the DMA address base of
+the region.
+
+The dir parameter specifies if data is read and/or written by the device,
+see dma_map_single() for details.
+
+The gfp parameter allows the caller to specify the ``GFP_`` flags (see
+kmalloc()) for the allocation, but rejects flags used to specify a memory
+zone such as GFP_DMA or GFP_HIGHMEM.
+
+Before giving the memory to the device, dma_sync_single_for_device() needs
+to be called, and before reading memory written by the device,
+dma_sync_single_for_cpu(), just like for streaming DMA mappings that are
+reused.
+
+::
+
+   void
+   dma_free_pages(struct device *dev, size_t size, struct page *page,
+   dma_addr_t dma_handle, enum dma_data_direction dir)
+
+Free a region of memory previously allocated using dma_alloc_pages().
+dev, size, dma_handle and dir must all be the same as those passed into
+dma_alloc_pages().  page must be the pointer returned by
+dma_alloc_pages().
 
 ::
 
-- 
2.28.0
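
As a quick usage illustration of the calls documented above (hypothetical
driver code; the device, direction and error handling are placeholders):

/* Hypothetical example of the dma_alloc_pages()/dma_free_pages() flow. */
static int example_dma_pages(struct device *dev, size_t size)
{
	dma_addr_t dma_handle;
	struct page *page;

	page = dma_alloc_pages(dev, size, &dma_handle, DMA_TO_DEVICE, GFP_KERNEL);
	if (!page)
		return -ENOMEM;

	/* The CPU owns the buffer after allocation: fill it... */
	memset(page_address(page), 0, size);

	/* ...then hand ownership to the device before it starts DMA. */
	dma_sync_single_for_device(dev, dma_handle, size, DMA_TO_DEVICE);

	/* device DMA to/from dma_handle happens here */

	dma_free_pages(dev, size, page, dma_handle, DMA_TO_DEVICE);
	return 0;
}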



Re: [PATCH 2/3] drm/msm: add DRM_MSM_GEM_SYNC_CACHE for non-coherent cache maintenance

2020-10-22 Thread Christoph Hellwig
On Thu, Oct 15, 2020 at 04:43:01PM +0100, Christoph Hellwig wrote:
> > Somewhat related, but is there a way to tell the dma-api to fail instead
> > of falling back to swiotlb? In many case for gpu drivers it's much better
> > if we fall back to dma_alloc_coherent and manage the copying ourselves
> > instead of abstracting this away in the dma-api. Currently that's "solved"
> > rather pessimistically by always allocating from dma_alloc_coherent if
> > swiotlb could be in the picture (at least for ttm based drivers, i915 just
> > falls over).
> 
> Is this for the alloc_pages plus manually map logic in various drivers?
> 
> They should switch to the new dma_alloc_pages API that I'll send to
> Linus for 5.10 soon.

Daniel, can you clarify this?


Re: [PATCH v4 7/7] mm: Remove examples from enum zone_type comment

2020-10-22 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 