Re: [PATCH v3 6/7] iommu/mediatek: Gather iova in iommu_unmap to achieve tlb sync once

2021-01-18 Thread Robin Murphy

On 2021-01-18 16:58, Will Deacon wrote:
> On Mon, Jan 18, 2021 at 04:35:22PM +0000, Robin Murphy wrote:
>> On 2020-12-16 10:36, Yong Wu wrote:
>>> In the current iommu_unmap, the code is:
>>>
>>> 	iommu_iotlb_gather_init(&iotlb_gather);
>>> 	ret = __iommu_unmap(domain, iova, size, &iotlb_gather);
>>> 	iommu_iotlb_sync(domain, &iotlb_gather);
>>>
>>> We could gather the whole iova range in __iommu_unmap, and then do the tlb
>>> synchronization in iommu_iotlb_sync.
>>>
>>> This patch implements this: gather the range in mtk_iommu_unmap, then
>>> iommu_iotlb_sync performs the tlb synchronization for the gathered iova range.
>>> We don't call iommu_iotlb_gather_add_page since our tlb synchronization
>>> does not depend on the granule size.
>>>
>>> With this change, gather->start can never be ULONG_MAX, so remove the check.
>>>
>>> This patch aims to do the tlb synchronization *once* in iommu_unmap.
>>
>> Assuming the update to patch #4 simply results in "unsigned long end = iova
>> + size - 1;" here,
>>
>> Reviewed-by: Robin Murphy
>
> There's a v4 here:
>
> https://lore.kernel.org/r/20210107122909.16317-1-yong...@mediatek.com


Ha, so there is! Apparently I missed that in my post-holiday sweep last 
week and leant too heavily on the inbox-in-date-order assumption. Lemme 
just go catch up...


Thanks,
Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 6/7] iommu/mediatek: Gather iova in iommu_unmap to achieve tlb sync once

2021-01-18 Thread Will Deacon
On Mon, Jan 18, 2021 at 04:35:22PM +0000, Robin Murphy wrote:
> On 2020-12-16 10:36, Yong Wu wrote:
> > In the current iommu_unmap, the code is:
> >
> > 	iommu_iotlb_gather_init(&iotlb_gather);
> > 	ret = __iommu_unmap(domain, iova, size, &iotlb_gather);
> > 	iommu_iotlb_sync(domain, &iotlb_gather);
> >
> > We could gather the whole iova range in __iommu_unmap, and then do the tlb
> > synchronization in iommu_iotlb_sync.
> >
> > This patch implements this: gather the range in mtk_iommu_unmap, then
> > iommu_iotlb_sync performs the tlb synchronization for the gathered iova range.
> > We don't call iommu_iotlb_gather_add_page since our tlb synchronization
> > does not depend on the granule size.
> >
> > With this change, gather->start can never be ULONG_MAX, so remove the check.
> >
> > This patch aims to do the tlb synchronization *once* in iommu_unmap.
> 
> Assuming the update to patch #4 simply results in "unsigned long end = iova
> + size - 1;" here,
> 
> Reviewed-by: Robin Murphy 

There's a v4 here:

https://lore.kernel.org/r/20210107122909.16317-1-yong...@mediatek.com

Will


Re: [PATCH v3 6/7] iommu/mediatek: Gather iova in iommu_unmap to achieve tlb sync once

2021-01-18 Thread Robin Murphy

On 2020-12-16 10:36, Yong Wu wrote:
> In the current iommu_unmap, the code is:
>
> 	iommu_iotlb_gather_init(&iotlb_gather);
> 	ret = __iommu_unmap(domain, iova, size, &iotlb_gather);
> 	iommu_iotlb_sync(domain, &iotlb_gather);
>
> We could gather the whole iova range in __iommu_unmap, and then do the tlb
> synchronization in iommu_iotlb_sync.
>
> This patch implements this: gather the range in mtk_iommu_unmap, then
> iommu_iotlb_sync performs the tlb synchronization for the gathered iova range.
> We don't call iommu_iotlb_gather_add_page since our tlb synchronization
> does not depend on the granule size.
>
> With this change, gather->start can never be ULONG_MAX, so remove the check.
>
> This patch aims to do the tlb synchronization *once* in iommu_unmap.

Assuming the update to patch #4 simply results in "unsigned long end =
iova + size - 1;" here,

Reviewed-by: Robin Murphy

> Signed-off-by: Yong Wu
> ---
>  drivers/iommu/mtk_iommu.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index db7d43adb06b..89cec51405cd 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -506,7 +506,12 @@ static size_t mtk_iommu_unmap(struct iommu_domain *domain,
>  				      struct iommu_iotlb_gather *gather)
>  {
>  	struct mtk_iommu_domain *dom = to_mtk_domain(domain);
> +	unsigned long long end = iova + size;
>
> +	if (gather->start > iova)
> +		gather->start = iova;
> +	if (gather->end < end)
> +		gather->end = end;
>  	return dom->iop->unmap(dom->iop, iova, size, gather);
>  }
>
> @@ -523,9 +528,6 @@ static void mtk_iommu_iotlb_sync(struct iommu_domain *domain,
>  	struct mtk_iommu_domain *dom = to_mtk_domain(domain);
>  	size_t length = gather->end - gather->start;
>
> -	if (gather->start == ULONG_MAX)
> -		return;
> -
>  	mtk_iommu_tlb_flush_range_sync(gather->start, length, gather->pgsize,
>  				       dom->data);
>  }
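
The "iova + size - 1" form in the review comment above matters at the top of the address space. As a rough illustration only (not the driver's code), and assuming gather->end is meant to hold the last address of the range rather than one past it, the sketch below uses hypothetical names (struct gather_range, exclusive_end, gather_add_inclusive, gather_length) to show that a one-past-the-end value wraps to 0 for a range that finishes at ULONG_MAX, while the inclusive form does not:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for the start/end fields of struct iommu_iotlb_gather. */
struct gather_range {
	unsigned long start;
	unsigned long end;	/* assumed inclusive: last address in the range */
};

/* Empty range: start above end, matching the ULONG_MAX check in the patch. */
static void gather_init(struct gather_range *g)
{
	g->start = ULONG_MAX;
	g->end = 0;
}

/* One-past-the-end address; wraps to 0 when the range ends at ULONG_MAX. */
static unsigned long exclusive_end(unsigned long iova, size_t size)
{
	return iova + (unsigned long)size;
}

/* Grow the range using an inclusive end, i.e. "iova + size - 1". */
static void gather_add_inclusive(struct gather_range *g,
				 unsigned long iova, size_t size)
{
	unsigned long end;

	assert(size > 0);
	end = iova + (unsigned long)size - 1;
	if (g->start > iova)
		g->start = iova;
	if (g->end < end)
		g->end = end;
}

/* Byte length of a non-empty inclusive range (hence the + 1). */
static size_t gather_length(const struct gather_range *g)
{
	return (size_t)(g->end - g->start) + 1;
}
```

With an exclusive end, the "gather->end < end" comparison would see 0 for the last page of the address space and never widen the range; with an inclusive end, any length computed from the range needs the extra + 1 shown in gather_length().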




Re: [PATCH v3 6/7] iommu/mediatek: Gather iova in iommu_unmap to achieve tlb sync once

2021-01-08 Thread Tomasz Figa
On Wed, Dec 23, 2020 at 8:00 PM Robin Murphy  wrote:
>
> On 2020-12-23 08:56, Tomasz Figa wrote:
> > On Wed, Dec 16, 2020 at 06:36:06PM +0800, Yong Wu wrote:
> >> In the current iommu_unmap, the code is:
> >>
> >> 	iommu_iotlb_gather_init(&iotlb_gather);
> >> 	ret = __iommu_unmap(domain, iova, size, &iotlb_gather);
> >> 	iommu_iotlb_sync(domain, &iotlb_gather);
> >>
> >> We could gather the whole iova range in __iommu_unmap, and then do the tlb
> >> synchronization in iommu_iotlb_sync.
> >>
> >> This patch implements this: gather the range in mtk_iommu_unmap, then
> >> iommu_iotlb_sync performs the tlb synchronization for the gathered iova range.
> >> We don't call iommu_iotlb_gather_add_page since our tlb synchronization
> >> does not depend on the granule size.
> >>
> >> With this change, gather->start can never be ULONG_MAX, so remove the check.
> >>
> >> This patch aims to do the tlb synchronization *once* in iommu_unmap.
> >>
> >> Signed-off-by: Yong Wu
> >> ---
> >>  drivers/iommu/mtk_iommu.c | 8 +++++---
> >>  1 file changed, 5 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> >> index db7d43adb06b..89cec51405cd 100644
> >> --- a/drivers/iommu/mtk_iommu.c
> >> +++ b/drivers/iommu/mtk_iommu.c
> >> @@ -506,7 +506,12 @@ static size_t mtk_iommu_unmap(struct iommu_domain *domain,
> >>  				      struct iommu_iotlb_gather *gather)
> >>  {
> >>  	struct mtk_iommu_domain *dom = to_mtk_domain(domain);
> >> +	unsigned long long end = iova + size;
> >>
> >> +	if (gather->start > iova)
> >> +		gather->start = iova;
> >> +	if (gather->end < end)
> >> +		gather->end = end;
> >
> > I don't know how common the case is, but what happens if
> > gather->start...gather->end is a disjoint range from iova...end? E.g.
> >
> >  | gather      | ..XXX... | iova      |
> >  |             |          |           |
> >  gather->start |          iova        |
> >                gather->end            end
> >
> > We would also end up invalidating the TLB for the XXX area, which could
> > affect the performance.
>
> Take a closer look at iommu_unmap() - the gather data is scoped to each
> individual call, so that can't possibly happen.
>
> > Also, why is the existing code in __arm_v7s_unmap() not enough? It seems
> > to call io_pgtable_tlb_add_page() already, so it should be batching the
> > flushes.
>
> Because if we leave io-pgtable in charge of maintenance it will also
> inject additional invalidations and syncs for the sake of strictly
> correct walk cache maintenance. Apparently we can get away without that
> on this hardware, so the fundamental purpose of this series is to
> sidestep it.
>
> It's proven to be cleaner overall to devolve this kind of "non-standard"
> TLB maintenance back to drivers rather than try to cram yet more
> special-case complexity into io-pgtable itself. I'm planning to clean up
> the remains of the TLBI_ON_MAP quirk entirely after this.

(Sorry, I sent an empty email accidentally.)

I see, thanks for clarifying. The patch looks good to me then.

Best regards,
Tomasz

>
> Robin.
>
> >>  	return dom->iop->unmap(dom->iop, iova, size, gather);
> >>  }
> >>
> >> @@ -523,9 +528,6 @@ static void mtk_iommu_iotlb_sync(struct iommu_domain *domain,
> >>  	struct mtk_iommu_domain *dom = to_mtk_domain(domain);
> >>  	size_t length = gather->end - gather->start;
> >>
> >> -	if (gather->start == ULONG_MAX)
> >> -		return;
> >> -
> >>  	mtk_iommu_tlb_flush_range_sync(gather->start, length, gather->pgsize,
> >>  				       dom->data);
> >>  }
> >> --
> >> 2.18.0
> >>


Re: [PATCH v3 6/7] iommu/mediatek: Gather iova in iommu_unmap to achieve tlb sync once

2020-12-23 Thread Robin Murphy

On 2020-12-23 08:56, Tomasz Figa wrote:
> On Wed, Dec 16, 2020 at 06:36:06PM +0800, Yong Wu wrote:
>> In the current iommu_unmap, the code is:
>>
>> 	iommu_iotlb_gather_init(&iotlb_gather);
>> 	ret = __iommu_unmap(domain, iova, size, &iotlb_gather);
>> 	iommu_iotlb_sync(domain, &iotlb_gather);
>>
>> We could gather the whole iova range in __iommu_unmap, and then do the tlb
>> synchronization in iommu_iotlb_sync.
>>
>> This patch implements this: gather the range in mtk_iommu_unmap, then
>> iommu_iotlb_sync performs the tlb synchronization for the gathered iova range.
>> We don't call iommu_iotlb_gather_add_page since our tlb synchronization
>> does not depend on the granule size.
>>
>> With this change, gather->start can never be ULONG_MAX, so remove the check.
>>
>> This patch aims to do the tlb synchronization *once* in iommu_unmap.
>>
>> Signed-off-by: Yong Wu
>> ---
>>  drivers/iommu/mtk_iommu.c | 8 +++++---
>>  1 file changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
>> index db7d43adb06b..89cec51405cd 100644
>> --- a/drivers/iommu/mtk_iommu.c
>> +++ b/drivers/iommu/mtk_iommu.c
>> @@ -506,7 +506,12 @@ static size_t mtk_iommu_unmap(struct iommu_domain *domain,
>>  				      struct iommu_iotlb_gather *gather)
>>  {
>>  	struct mtk_iommu_domain *dom = to_mtk_domain(domain);
>> +	unsigned long long end = iova + size;
>>
>> +	if (gather->start > iova)
>> +		gather->start = iova;
>> +	if (gather->end < end)
>> +		gather->end = end;
>
> I don't know how common the case is, but what happens if
> gather->start...gather->end is a disjoint range from iova...end? E.g.
>
>  | gather      | ..XXX... | iova      |
>  |             |          |           |
>  gather->start |          iova        |
>                gather->end            end
>
> We would also end up invalidating the TLB for the XXX area, which could
> affect the performance.

Take a closer look at iommu_unmap() - the gather data is scoped to each
individual call, so that can't possibly happen.

> Also, why is the existing code in __arm_v7s_unmap() not enough? It seems
> to call io_pgtable_tlb_add_page() already, so it should be batching the
> flushes.

Because if we leave io-pgtable in charge of maintenance it will also
inject additional invalidations and syncs for the sake of strictly
correct walk cache maintenance. Apparently we can get away without that
on this hardware, so the fundamental purpose of this series is to
sidestep it.

It's proven to be cleaner overall to devolve this kind of "non-standard"
TLB maintenance back to drivers rather than try to cram yet more
special-case complexity into io-pgtable itself. I'm planning to clean up
the remains of the TLBI_ON_MAP quirk entirely after this.

Robin.

>>  	return dom->iop->unmap(dom->iop, iova, size, gather);
>>  }
>>
>> @@ -523,9 +528,6 @@ static void mtk_iommu_iotlb_sync(struct iommu_domain *domain,
>>  	struct mtk_iommu_domain *dom = to_mtk_domain(domain);
>>  	size_t length = gather->end - gather->start;
>>
>> -	if (gather->start == ULONG_MAX)
>> -		return;
>> -
>>  	mtk_iommu_tlb_flush_range_sync(gather->start, length, gather->pgsize,
>>  				       dom->data);
>>  }
>> --
>> 2.18.0
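
The point in this reply that the gather data is scoped to each individual iommu_unmap() call can be pictured with a small userspace model. This is only a sketch under stated assumptions: struct gather_range, sketch_gather, sketch_sync, sketch_iommu_unmap and the flushed[] log are hypothetical names, and the real core and driver do considerably more work. It shows that two unmaps of disjoint ranges each start from an empty gather, so neither flush covers the gap between them:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for the start/end fields of struct iommu_iotlb_gather. */
struct gather_range {
	unsigned long start;
	unsigned long end;	/* exclusive, as in the v3 hunk */
};

#define MAX_FLUSHES 8
static struct gather_range flushed[MAX_FLUSHES];	/* log of flushed ranges */
static int nr_flushes;

/* Driver side: widen the gathered range, as the mtk_iommu_unmap() hunk does. */
static void sketch_gather(struct gather_range *g, unsigned long iova, size_t size)
{
	unsigned long end = iova + (unsigned long)size;

	if (g->start > iova)
		g->start = iova;
	if (g->end < end)
		g->end = end;
}

/* Driver side: one flush for whatever was gathered; here it is only logged. */
static void sketch_sync(const struct gather_range *g)
{
	assert(nr_flushes < MAX_FLUSHES);
	flushed[nr_flushes++] = *g;
}

/* Core side: the gather lives on this call's stack and starts out empty. */
static void sketch_iommu_unmap(unsigned long iova, size_t size)
{
	struct gather_range g = { .start = ULONG_MAX, .end = 0 };

	sketch_gather(&g, iova, size);
	sketch_sync(&g);
}
```

Calling sketch_iommu_unmap(0x1000, 0x1000) followed by sketch_iommu_unmap(0x8000, 0x2000) logs two separate ranges, [0x1000, 0x2000) and [0x8000, 0xa000), rather than one range spanning both.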



Re: [PATCH v3 6/7] iommu/mediatek: Gather iova in iommu_unmap to achieve tlb sync once

2020-12-23 Thread Tomasz Figa
On Wed, Dec 16, 2020 at 06:36:06PM +0800, Yong Wu wrote:
> In the current iommu_unmap, the code is:
>
> 	iommu_iotlb_gather_init(&iotlb_gather);
> 	ret = __iommu_unmap(domain, iova, size, &iotlb_gather);
> 	iommu_iotlb_sync(domain, &iotlb_gather);
>
> We could gather the whole iova range in __iommu_unmap, and then do the tlb
> synchronization in iommu_iotlb_sync.
>
> This patch implements this: gather the range in mtk_iommu_unmap, then
> iommu_iotlb_sync performs the tlb synchronization for the gathered iova range.
> We don't call iommu_iotlb_gather_add_page since our tlb synchronization
> does not depend on the granule size.
>
> With this change, gather->start can never be ULONG_MAX, so remove the check.
>
> This patch aims to do the tlb synchronization *once* in iommu_unmap.
>
> Signed-off-by: Yong Wu
> ---
>  drivers/iommu/mtk_iommu.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index db7d43adb06b..89cec51405cd 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -506,7 +506,12 @@ static size_t mtk_iommu_unmap(struct iommu_domain *domain,
>  				      struct iommu_iotlb_gather *gather)
>  {
>  	struct mtk_iommu_domain *dom = to_mtk_domain(domain);
> +	unsigned long long end = iova + size;
>
> +	if (gather->start > iova)
> +		gather->start = iova;
> +	if (gather->end < end)
> +		gather->end = end;

I don't know how common the case is, but what happens if
gather->start...gather->end is a disjoint range from iova...end? E.g.

 | gather      | ..XXX... | iova      |
 |             |          |           |
 gather->start |          iova        |
               gather->end            end

We would also end up invalidating the TLB for the XXX area, which could
affect the performance.

Also, why is the existing code in __arm_v7s_unmap() not enough? It seems
to call io_pgtable_tlb_add_page() already, so it should be batching the
flushes.
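
To put numbers on the disjoint-range question above, here is a small sketch. It assumes a single gather is reused across two separate unmaps, which may or may not be possible depending on how the caller scopes the gather; struct gather_range, gather_merge and gather_span are hypothetical names, not kernel API. The min/max merge from the hunk then produces one range that also spans the untouched XXX gap:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for the start/end fields of struct iommu_iotlb_gather. */
struct gather_range {
	unsigned long start;
	unsigned long end;	/* exclusive, as in the v3 hunk */
};

/* The same min/max update as the quoted mtk_iommu_unmap() hunk. */
static void gather_merge(struct gather_range *g, unsigned long iova, size_t size)
{
	unsigned long end = iova + (unsigned long)size;

	if (g->start > iova)
		g->start = iova;
	if (g->end < end)
		g->end = end;
}

/* Bytes that a single flush of the merged range would cover. */
static size_t gather_span(const struct gather_range *g)
{
	assert(g->end >= g->start);
	return (size_t)(g->end - g->start);
}
```

Merging [0x1000, 0x2000) and [0x8000, 0x9000) into one gather gives a span of 0x8000 bytes, although only 0x2000 bytes were unmapped; the remaining 0x6000 bytes are the XXX area in the diagram.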

>  	return dom->iop->unmap(dom->iop, iova, size, gather);
>  }
>
> @@ -523,9 +528,6 @@ static void mtk_iommu_iotlb_sync(struct iommu_domain *domain,
>  	struct mtk_iommu_domain *dom = to_mtk_domain(domain);
>  	size_t length = gather->end - gather->start;
>
> -	if (gather->start == ULONG_MAX)
> -		return;
> -
>  	mtk_iommu_tlb_flush_range_sync(gather->start, length, gather->pgsize,
>  				       dom->data);
>  }
> -- 
> 2.18.0
> 


[PATCH v3 6/7] iommu/mediatek: Gather iova in iommu_unmap to achieve tlb sync once

2020-12-16 Thread Yong Wu
In the current iommu_unmap, the code is:

	iommu_iotlb_gather_init(&iotlb_gather);
	ret = __iommu_unmap(domain, iova, size, &iotlb_gather);
	iommu_iotlb_sync(domain, &iotlb_gather);

We could gather the whole iova range in __iommu_unmap, and then do the tlb
synchronization in iommu_iotlb_sync.

This patch implements this: gather the range in mtk_iommu_unmap, then
iommu_iotlb_sync performs the tlb synchronization for the gathered iova range.
We don't call iommu_iotlb_gather_add_page since our tlb synchronization
does not depend on the granule size.

With this change, gather->start can never be ULONG_MAX, so remove the check.

This patch aims to do the tlb synchronization *once* in iommu_unmap.

Signed-off-by: Yong Wu 
---
 drivers/iommu/mtk_iommu.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index db7d43adb06b..89cec51405cd 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -506,7 +506,12 @@ static size_t mtk_iommu_unmap(struct iommu_domain *domain,
 				      struct iommu_iotlb_gather *gather)
 {
 	struct mtk_iommu_domain *dom = to_mtk_domain(domain);
+	unsigned long long end = iova + size;
 
+	if (gather->start > iova)
+		gather->start = iova;
+	if (gather->end < end)
+		gather->end = end;
 	return dom->iop->unmap(dom->iop, iova, size, gather);
 }
 
@@ -523,9 +528,6 @@ static void mtk_iommu_iotlb_sync(struct iommu_domain *domain,
 	struct mtk_iommu_domain *dom = to_mtk_domain(domain);
 	size_t length = gather->end - gather->start;
 
-	if (gather->start == ULONG_MAX)
-		return;
-
 	mtk_iommu_tlb_flush_range_sync(gather->start, length, gather->pgsize,
 				       dom->data);
 }
-- 
2.18.0
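
The flow this patch describes (initialise the gather, let the driver's unmap record the range, flush once in the sync) can be modelled in plain userspace C. This is a minimal sketch rather than the kernel implementation: it assumes the core calls the driver's unmap once per page-sized chunk, and SKETCH_PAGE_SIZE, struct gather_range and the sketch_* functions and counters are all hypothetical stand-ins.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 0x1000UL	/* assumed 4 KiB chunk size for the demo */

/* Hypothetical stand-in for the start/end fields of struct iommu_iotlb_gather. */
struct gather_range {
	unsigned long start;
	unsigned long end;	/* exclusive, as in the v3 hunk */
};

static int nr_syncs;			/* how many flushes were issued */
static unsigned long synced_start;	/* start of the last flushed range */
static size_t synced_length;		/* length of the last flushed range */

/* Stand-in for mtk_iommu_unmap(): record the range, flush nothing yet. */
static size_t sketch_driver_unmap(struct gather_range *g,
				  unsigned long iova, size_t size)
{
	unsigned long end = iova + (unsigned long)size;

	if (g->start > iova)
		g->start = iova;
	if (g->end < end)
		g->end = end;
	return size;	/* pretend the page-table unmap succeeded */
}

/* Stand-in for mtk_iommu_iotlb_sync(): a single range flush, only logged. */
static void sketch_driver_sync(const struct gather_range *g)
{
	nr_syncs++;
	synced_start = g->start;
	synced_length = (size_t)(g->end - g->start);
}

/* Stand-in for iommu_unmap(): init, unmap chunk by chunk, then sync once. */
static size_t sketch_iommu_unmap(unsigned long iova, size_t size)
{
	struct gather_range g = { .start = ULONG_MAX, .end = 0 };
	size_t unmapped = 0;

	assert(size > 0 && size % SKETCH_PAGE_SIZE == 0);
	while (unmapped < size)
		unmapped += sketch_driver_unmap(&g, iova + (unsigned long)unmapped,
						SKETCH_PAGE_SIZE);
	sketch_driver_sync(&g);
	return unmapped;
}
```

Unmapping four pages at 0x10000 calls the driver's unmap four times but issues one flush covering [0x10000, 0x14000). Because at least one unmap call always runs before the sync in this model, start is no longer ULONG_MAX by the time the sync sees it.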
