Re: [PATCH v3 2/2] vfio/iommu_type1: Fix some sanity checks in detach group

2021-01-28 Thread Keqian Zhu



On 2021/1/28 7:46, Alex Williamson wrote:
> On Fri, 22 Jan 2021 17:26:35 +0800
> Keqian Zhu  wrote:
> 
>> vfio_sanity_check_pfn_list() is used to check whether pfn_list and
>> notifier are empty when removing the external domain, so it makes the
>> wrong assumption that only the external domain will use the pinning
>> interface.
>>
>> Now we apply the pfn_list check when a vfio_dma is removed and apply
>> the notifier check when all domains are removed.
>>
>> Fixes: a54eb55045ae ("vfio iommu type1: Add support for mediated devices")
>> Signed-off-by: Keqian Zhu 
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 33 ++---
>>  1 file changed, 10 insertions(+), 23 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 161725395f2f..d8c10f508321 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -957,6 +957,7 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
>>  
>>  static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>>  {
>> +WARN_ON(!RB_EMPTY_ROOT(&dma->pfn_list));
>>  vfio_unmap_unpin(iommu, dma, true);
>>  vfio_unlink_dma(iommu, dma);
>>  put_task_struct(dma->task);
>> @@ -2250,23 +2251,6 @@ static void vfio_iommu_unmap_unpin_reaccount(struct vfio_iommu *iommu)
>>  }
>>  }
>>  
>> -static void vfio_sanity_check_pfn_list(struct vfio_iommu *iommu)
>> -{
>> -struct rb_node *n;
>> -
>> -n = rb_first(&iommu->dma_list);
>> -for (; n; n = rb_next(n)) {
>> -struct vfio_dma *dma;
>> -
>> -dma = rb_entry(n, struct vfio_dma, node);
>> -
>> -if (WARN_ON(!RB_EMPTY_ROOT(&dma->pfn_list)))
>> -break;
>> -}
>> -/* mdev vendor driver must unregister notifier */
>> -WARN_ON(iommu->notifier.head);
>> -}
>> -
>>  /*
>>   * Called when a domain is removed in detach. It is possible that
>>   * the removed domain decided the iova aperture window. Modify the
>> @@ -2366,10 +2350,10 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
>>  kfree(group);
>>  
>>  if (list_empty(&iommu->external_domain->group_list)) {
>> -vfio_sanity_check_pfn_list(iommu);
>> -
>> -if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu))
>> +if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) {
>> +WARN_ON(iommu->notifier.head);
>>  vfio_iommu_unmap_unpin_all(iommu);
>> +}
>>  
>>  kfree(iommu->external_domain);
>>  iommu->external_domain = NULL;
>> @@ -2403,10 +2387,12 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
>>   */
>>  if (list_empty(&domain->group_list)) {
>>  if (list_is_singular(&iommu->domain_list)) {
>> -if (!iommu->external_domain)
>> +if (!iommu->external_domain) {
>> +WARN_ON(iommu->notifier.head);
>>  vfio_iommu_unmap_unpin_all(iommu);
>> -else
>> +} else {
>>  vfio_iommu_unmap_unpin_reaccount(iommu);
>> +}
>>  }
>>  iommu_domain_free(domain->domain);
>>  list_del(&domain->next);
>> @@ -2488,9 +2474,10 @@ static void vfio_iommu_type1_release(void *iommu_data)
>>  struct vfio_iommu *iommu = iommu_data;
>>  struct vfio_domain *domain, *domain_tmp;
>>  
>> +WARN_ON(iommu->notifier.head);
> 
> I don't see that this does any harm, but isn't it actually redundant?
> It seems vfio-core only calls the iommu backend release function after
> removing all the groups, so the tests in _detach_group should catch all
> cases.  We're expecting the vfio bus/mdev driver to remove the notifier
> when a device is closed, which necessarily occurs before detaching the
> group.  Thanks,
Right. Devices of a specific group must be closed before detaching the group.
Detaching the last group already performs this check, so vfio_iommu_type1_release
doesn't need to do it again.

Could you please queue this patch and delete this check btw? Thanks. ;-)

Keqian.

> 
> Alex
> 
>> +
>>  if (iommu->external_domain) {
>>  vfio_release_domain(iommu->external_domain, true);
>> -vfio_sanity_check_pfn_list(iommu);
>>  kfree(iommu->external_domain);
>>  }
>>  
> 
> .
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/1] iommu/arm-smmu-v3: add support for BBML

2021-01-28 Thread Keqian Zhu



On 2021/1/29 0:17, Robin Murphy wrote:
> On 2021-01-28 15:18, Keqian Zhu wrote:
>>
>>
>> On 2021/1/27 17:39, Robin Murphy wrote:
>>> On 2021-01-27 07:36, Keqian Zhu wrote:


 On 2021/1/27 10:01, Leizhen (ThunderTown) wrote:
>
>
> On 2021/1/26 18:12, Will Deacon wrote:
>> On Mon, Jan 25, 2021 at 08:23:40PM +, Robin Murphy wrote:
>>> Now we probably will need some degree of BBML feature awareness for the
>>> sake of SVA if and when we start using it for CPU pagetables, but I 
>>> still
>>> cannot see any need to consider it in io-pgtable.
>>
>> Agreed; I don't think this is something that io-pgtable should have to 
>> care
>> about.
 Hi,

 I have a question here :-).
 If the old table is not live, then the break procedure seems unnecessary. 
 Do I miss something?
>>>
>>> The MMU is allowed to prefetch translations at any time, so not following 
>>> the proper update procedure could still potentially lead to a TLB conflict, 
>>> even if there's no device traffic to worry about disrupting.
>>>
>>> Robin.
>>
>> Thanks. Does the MMU you mention here include both the MMU and the SMMU? I
>> know that on the SMMU side, ATS can prefetch translations.
> 
> Yes, both - VMSAv8 allows speculative translation table walks, so SMMUv3 
> inherits from there (per 3.21.1 "Translation tables and TLB invalidation 
> completion behavior").
OK, I get it. Thanks.

Keqian.
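
For reference, the update procedure under discussion is the standard break-before-make
sequence. A minimal sketch (illustrative only, not the actual io-pgtable-arm code;
tlb_flush_walk() here stands in for the driver's real invalidation hook):

```c
/*
 * Break-before-make sketch: even with no DMA traffic in flight, the walker
 * may already have prefetched the old entry into a TLB, so it must be
 * invalidated before the replacement becomes visible, otherwise old and new
 * translations can conflict.
 */
static void bbm_replace_entry(u64 *ptep, u64 new_pte,
			      void (*tlb_flush_walk)(void *cookie), void *cookie)
{
	WRITE_ONCE(*ptep, 0);		/* 1. break: zap the existing entry     */
	tlb_flush_walk(cookie);		/* 2. flush any cached/prefetched copy  */
	WRITE_ONCE(*ptep, new_pte);	/* 3. make: install the new entry       */
}
```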

> 
> Robin.
> 
>>
>> Keqian
>>>
 Thanks,
 Keqian

>
> Yes, the SVA works in stall mode, and the failed device access requests 
> are not
> discarded.
>
> Let me look for examples. The BBML usage scenario was told by a former 
> colleague.
>
>>
>> Will
>>
>> .
>>
>
>
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> .
>
 ___
 iommu mailing list
 iommu@lists.linux-foundation.org
 https://lists.linuxfoundation.org/mailman/listinfo/iommu

>>> .
>>>
> .
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v6 07/33] iommu: Avoid reallocate default domain for a group

2021-01-28 Thread Yong Wu
On Thu, 2021-01-28 at 21:14 +, Will Deacon wrote:
> On Thu, Jan 28, 2021 at 09:10:20PM +, Will Deacon wrote:
> > On Wed, Jan 27, 2021 at 05:39:16PM +0800, Yong Wu wrote:
> > > On Tue, 2021-01-26 at 22:23 +, Will Deacon wrote:
> > > > On Mon, Jan 11, 2021 at 07:18:48PM +0800, Yong Wu wrote:
> > > > > If group->default_domain exists, avoid reallocating it.
> > > > > 
> > > > > In some iommu drivers, several devices may share a group. Avoid
> > > > > reallocating the default domain for this case.
> > > > > 
> > > > > Signed-off-by: Yong Wu 
> > > > > ---
> > > > >  drivers/iommu/iommu.c | 3 ++-
> > > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > > > > index 3d099a31ddca..f4b87e6abe80 100644
> > > > > --- a/drivers/iommu/iommu.c
> > > > > +++ b/drivers/iommu/iommu.c
> > > > > @@ -266,7 +266,8 @@ int iommu_probe_device(struct device *dev)
> > > > >* support default domains, so the return value is not yet
> > > > >* checked.
> > > > >*/
> > > > > - iommu_alloc_default_domain(group, dev);
> > > > > + if (!group->default_domain)
> > > > > + iommu_alloc_default_domain(group, dev);
> > > > 
> > > > I don't really get what this achieves, since 
> > > > iommu_alloc_default_domain()
> > > > looks like this:
> > > > 
> > > > static int iommu_alloc_default_domain(struct iommu_group *group,
> > > >   struct device *dev)
> > > > {
> > > > unsigned int type;
> > > > 
> > > > if (group->default_domain)
> > > > return 0;
> > > > 
> > > > ...
> > > > 
> > > > in which case, it should be fine?
> > > 
> > > oh. sorry, I overlooked this. the current code is enough.
> > > I will remove this patch. and send the next version in this week.
> > > Thanks very much.
> > 
> > Actually, looking at this again, if we're dropping this patch and patch 6
> > just needs the kfree() moving about, then there's no need to repost. The
> > issue that Robin and Paul are discussing can be handled separately.
> 
> Argh, except that this version of the patches doesn't apply :)
> 
> So after all that, please go ahead and post a v7 ASAP based on this branch:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=for-joerg/mtk

After confirming with Tomasz, he still needs some time to take a look at v6,
so I will wait for his feedback before sending v7.

Thanks for your comment, and thanks Tomasz for the review.

> 
> Will

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v6 07/33] iommu: Avoid reallocate default domain for a group

2021-01-28 Thread Robin Murphy

On 2021-01-28 21:14, Will Deacon wrote:

On Thu, Jan 28, 2021 at 09:10:20PM +, Will Deacon wrote:

On Wed, Jan 27, 2021 at 05:39:16PM +0800, Yong Wu wrote:

On Tue, 2021-01-26 at 22:23 +, Will Deacon wrote:

On Mon, Jan 11, 2021 at 07:18:48PM +0800, Yong Wu wrote:

If group->default_domain exists, avoid reallocating it.

In some iommu drivers, several devices may share a group. Avoid
reallocating the default domain for this case.

Signed-off-by: Yong Wu 
---
  drivers/iommu/iommu.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3d099a31ddca..f4b87e6abe80 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -266,7 +266,8 @@ int iommu_probe_device(struct device *dev)
 * support default domains, so the return value is not yet
 * checked.
 */
-   iommu_alloc_default_domain(group, dev);
+   if (!group->default_domain)
+   iommu_alloc_default_domain(group, dev);


I don't really get what this achieves, since iommu_alloc_default_domain()
looks like this:

static int iommu_alloc_default_domain(struct iommu_group *group,
  struct device *dev)
{
unsigned int type;

if (group->default_domain)
return 0;

...

in which case, it should be fine?


oh. sorry, I overlooked this. the current code is enough.
I will remove this patch. and send the next version in this week.
Thanks very much.


Actually, looking at this again, if we're dropping this patch and patch 6
just needs the kfree() moving about, then there's no need to repost. The
issue that Robin and Paul are discussing can be handled separately.


FWIW patch #6 gets dropped as well now, since Rob has applied the 
standalone fix (89c7cb1608ac).


Robin.


Argh, except that this version of the patches doesn't apply :)

So after all that, please go ahead and post a v7 ASAP based on this branch:

https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=for-joerg/mtk

Will


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v6 07/33] iommu: Avoid reallocate default domain for a group

2021-01-28 Thread Will Deacon
On Thu, Jan 28, 2021 at 09:10:20PM +, Will Deacon wrote:
> On Wed, Jan 27, 2021 at 05:39:16PM +0800, Yong Wu wrote:
> > On Tue, 2021-01-26 at 22:23 +, Will Deacon wrote:
> > > On Mon, Jan 11, 2021 at 07:18:48PM +0800, Yong Wu wrote:
> > > > If group->default_domain exists, avoid reallocating it.
> > > > 
> > > > In some iommu drivers, several devices may share a group. Avoid
> > > > reallocating the default domain for this case.
> > > > 
> > > > Signed-off-by: Yong Wu 
> > > > ---
> > > >  drivers/iommu/iommu.c | 3 ++-
> > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > > > index 3d099a31ddca..f4b87e6abe80 100644
> > > > --- a/drivers/iommu/iommu.c
> > > > +++ b/drivers/iommu/iommu.c
> > > > @@ -266,7 +266,8 @@ int iommu_probe_device(struct device *dev)
> > > >  * support default domains, so the return value is not yet
> > > >  * checked.
> > > >  */
> > > > -   iommu_alloc_default_domain(group, dev);
> > > > +   if (!group->default_domain)
> > > > +   iommu_alloc_default_domain(group, dev);
> > > 
> > > I don't really get what this achieves, since iommu_alloc_default_domain()
> > > looks like this:
> > > 
> > > static int iommu_alloc_default_domain(struct iommu_group *group,
> > > struct device *dev)
> > > {
> > >   unsigned int type;
> > > 
> > >   if (group->default_domain)
> > >   return 0;
> > > 
> > >   ...
> > > 
> > > in which case, it should be fine?
> > 
> > oh. sorry, I overlooked this. the current code is enough.
> > I will remove this patch. and send the next version in this week.
> > Thanks very much.
> 
> Actually, looking at this again, if we're dropping this patch and patch 6
> just needs the kfree() moving about, then there's no need to repost. The
> issue that Robin and Paul are discussing can be handled separately.

Argh, except that this version of the patches doesn't apply :)

So after all that, please go ahead and post a v7 ASAP based on this branch:

https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=for-joerg/mtk

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v6 07/33] iommu: Avoid reallocate default domain for a group

2021-01-28 Thread Will Deacon
On Wed, Jan 27, 2021 at 05:39:16PM +0800, Yong Wu wrote:
> On Tue, 2021-01-26 at 22:23 +, Will Deacon wrote:
> > On Mon, Jan 11, 2021 at 07:18:48PM +0800, Yong Wu wrote:
> > > If group->default_domain exists, avoid reallocating it.
> > > 
> > > In some iommu drivers, several devices may share a group. Avoid
> > > reallocating the default domain for this case.
> > > 
> > > Signed-off-by: Yong Wu 
> > > ---
> > >  drivers/iommu/iommu.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > > index 3d099a31ddca..f4b87e6abe80 100644
> > > --- a/drivers/iommu/iommu.c
> > > +++ b/drivers/iommu/iommu.c
> > > @@ -266,7 +266,8 @@ int iommu_probe_device(struct device *dev)
> > >* support default domains, so the return value is not yet
> > >* checked.
> > >*/
> > > - iommu_alloc_default_domain(group, dev);
> > > + if (!group->default_domain)
> > > + iommu_alloc_default_domain(group, dev);
> > 
> > I don't really get what this achieves, since iommu_alloc_default_domain()
> > looks like this:
> > 
> > static int iommu_alloc_default_domain(struct iommu_group *group,
> >   struct device *dev)
> > {
> > unsigned int type;
> > 
> > if (group->default_domain)
> > return 0;
> > 
> > ...
> > 
> > in which case, it should be fine?
> 
> oh. sorry, I overlooked this. the current code is enough.
> I will remove this patch. and send the next version in this week.
> Thanks very much.

Actually, looking at this again, if we're dropping this patch and patch 6
just needs the kfree() moving about, then there's no need to repost. The
issue that Robin and Paul are discussing can be handled separately.

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/2] iommu/msm: Hook up iotlb_sync_map

2021-01-28 Thread Will Deacon
On Wed, 27 Jan 2021 16:29:28 +, Robin Murphy wrote:
> The core API can now accommodate invalidate-on-map style behaviour in a
> single efficient call, so hook that up instead of having io-pgtable do
> it piecemeal.

Applied to arm64 (for-joerg/mtk), thanks!

[1/2] iommu/msm: Hook up iotlb_sync_map
  https://git.kernel.org/arm64/c/c867c78acae9
[2/2] iommu/io-pgtable: Remove TLBI_ON_MAP quirk
  https://git.kernel.org/arm64/c/3d5eab41451f

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev
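
For context, the hookup described above amounts to implementing the core API's
iotlb_sync_map callback so the whole just-mapped range is flushed once; a rough
sketch with a made-up "foo" driver (not the actual msm code):

```c
/*
 * Sketch only: "foo" and its helpers are hypothetical.  The core calls
 * .iotlb_sync_map once after a map request completes, so the driver can
 * flush the whole range in one operation instead of io-pgtable invalidating
 * entry by entry.
 */
static void foo_iommu_sync_map(struct iommu_domain *domain,
			       unsigned long iova, size_t size)
{
	struct foo_priv *priv = to_foo_priv(domain);	/* hypothetical helper */

	foo_flush_iotlb_range(priv, iova, size);	/* hypothetical flush  */
}

static struct iommu_ops foo_iommu_ops = {
	/* ... */
	.iotlb_sync_map	= foo_iommu_sync_map,
};
```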
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 0/3] perf/smmuv3: Don't reserve the PMCG register spaces

2021-01-28 Thread Will Deacon
On Wed, Jan 27, 2021 at 07:32:55PM +0800, Zhen Lei wrote:
> v2 --> v3:
> Patch 3 is updated because https://lkml.org/lkml/2021/1/22/532 has been 
> queued in advance.
> 
> v1 --> v2:
> According to Robin Murphy's suggestion: https://lkml.org/lkml/2021/1/20/470
> Don't reserve the PMCG register spaces, and reserve the entire SMMU register 
> space.
> 
> v1:
> The PMCG may implement its registers space (4KB Page0 and 4KB Page1)
> within the SMMUv3 64KB Page0. In this case, when the SMMUv3 driver reserves
> the 64KB Page0 resource in advance and the PMCG driver then tries to reserve
> its Page0 and Page1 resources, a resource conflict occurs.
> 
> commit 52f3fab0067d6fa ("iommu/arm-smmu-v3: Don't reserve implementation
> defined register space") reduce the resource reservation range of the SMMUv3
> driver, it only reserves the first 0xe00 bytes in the 64KB Page0, to avoid
> the above-mentioned resource conflicts.
> 
> But SMMUv3.3 adds support for ECMDQ, whose register space is also implemented
> in the SMMUv3 64KB Page0. This means we need to build two separate mappings.
> New features may be added in the future, and more independent mappings may be
> required. A simple problem is complicated because the user expects to map the
> entire SMMUv3 64KB Page0.
> 
> Therefore, the proper solution is: if the PMCG register resources are located
> in the 64KB Page0 of the SMMU, the PMCG driver does not reserve the conflicting
> resources when the SMMUv3 driver has already reserved them. Instead, the PMCG
> driver only performs devm_ioremap() to ensure that it can work properly.
> 
> Zhen Lei (3):
>   perf/smmuv3: Don't reserve the PMCG register spaces
>   perf/smmuv3: Add a MODULE_SOFTDEP() to indicate dependency on SMMU
>   iommu/arm-smmu-v3: Reserving the entire SMMU register space

I'll need Robin's ack on these.

Will
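
For reference, the solution the cover letter describes boils down to mapping without
reserving when the PMCG's pages fall inside the SMMU's already-reserved 64KB Page0;
a rough sketch (pmcg_overlaps_smmu_page0() is a hypothetical helper, not from the series):

```c
/*
 * Sketch only: devm_ioremap_resource() = reserve + map (what currently
 * conflicts); devm_ioremap() = map only, so it cannot conflict with the
 * SMMUv3 driver's reservation of the surrounding register space.
 */
static void __iomem *pmcg_map_regs(struct platform_device *pdev)
{
	struct resource *res = platform_get_resource(pdev, IORESOURCE_MEM, 0);

	if (pmcg_overlaps_smmu_page0(res))	/* hypothetical helper */
		return devm_ioremap(&pdev->dev, res->start, resource_size(res));

	return devm_ioremap_resource(&pdev->dev, res);
}
```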
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver.

2021-01-28 Thread Jianxiong Gao via iommu
I have tried to set it once at probe and then leave it in place, however
the NVMe driver does not seem to like it, and the VM does not boot
correctly. I have spent a couple days debugging but I am a bit lost
now.

Basically whenever nvme_setup_prp_simple() is mapping with the mask,
I am getting timeout issues on boot, which to my knowledge shows NVMe
driver failure:

> [5.500662] random: crng init done
> [5.502933] random: 7 urandom warning(s) missed due to ratelimiting
> [  132.077795] dracut-initqueue[472]: Warning: dracut-initqueue timeout -
> starting timeout scripts
> [  132.614755] dracut-initqueue[472]: Warning: dracut-initqueue timeout -
> starting timeout scripts
>

I have checked that all the mappings happened correctly:

> [4.773570] nvme :00:04.0: nvme setup prp simple is mapping 200
> data, with offset 200, from fc9acd6c6040, mapped at 7affb200
> [4.784540] nvme :00:04.0: nvme setup prp simple is mapping 200
> data, with offset 400, from fc9acd6c6040, mapped at 7affc400
> [4.794096] nvme :00:04.0: nvme setup prp simple is mapping 200
> data, with offset 600, from fc9acd6c6040, mapped at 7affd600
> [4.801983] nvme :00:04.0: nvme setup prp simple is mapping 200
> data, with offset 800, from fc9acd6c6040, mapped at 7affe800
> [4.806873] nvme :00:04.0: nvme setup prp simple is mapping 200
> data, with offset a00, from fc9acd6c6040, mapped at 7afffa00
> [4.812382] nvme :00:04.0: nvme setup prp simple is mapping 200
> data, with offset c00, from fc9acd6c6040, mapped at 7b000c00
> [4.817423] nvme :00:04.0: nvme setup prp simple is mapping 200
> data, with offset e00, from fc9acd6c6040, mapped at 7b001e00
> [4.823652] nvme :00:04.0: nvme setup prp simple is mapping 200
> data, with offset 200, from fc9acd6c60c0, mapped at 7b003200
> [4.828679] nvme :00:04.0: nvme setup prp simple is mapping 200
> data, with offset 400, from fc9acd6c60c0, mapped at 7b004400
> [4.833875] nvme :00:04.0: nvme setup prp simple is mapping 200
> data, with offset 600, from fc9acd6c60c0, mapped at 7b005600
> [4.838926] nvme :00:04.0: nvme setup prp simple is mapping 200
> data, with offset 800, from fc9acd6c60c0, mapped at 7b006800

...

I have compared it to not setting the mask. The only difference in the result is
that instead of being mapped to *200|*400|*600 etc., they are all mapped
to *000. So I believe the mapping is done correctly, and according to the NVMe
spec (figures 108/109), the mapping should keep the offset. I am not
sure what caused the error that eventually led to the failure. Is there another
bug in the NVMe driver?
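
For anyone following along, the reason the offset matters is that the PRP entries are
built from the CPU-side page offset. A simplified sketch of the driver's single-segment
path, nvme_setup_prp_simple() (details of the real function may differ):

```c
/*
 * The PRP1/PRP2 split point comes from the CPU-side offset, so a bounce
 * buffer whose DMA address carries a different offset makes PRP2 point at
 * the wrong place and the transfer touches the wrong bytes.
 */
static blk_status_t prp_simple_sketch(struct nvme_dev *dev, struct request *req,
				      struct nvme_rw_command *cmnd, struct bio_vec *bv)
{
	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
	unsigned int offset = bv->bv_offset & (NVME_CTRL_PAGE_SIZE - 1);
	unsigned int first_prp_len = NVME_CTRL_PAGE_SIZE - offset;

	iod->first_dma = dma_map_bvec(dev->dev, bv, rq_dma_dir(req), 0);
	if (dma_mapping_error(dev->dev, iod->first_dma))
		return BLK_STS_RESOURCE;

	cmnd->dptr.prp1 = cpu_to_le64(iod->first_dma);
	if (bv->bv_len > first_prp_len)
		cmnd->dptr.prp2 = cpu_to_le64(iod->first_dma + first_prp_len);
	return BLK_STS_OK;
}
```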

On Thu, Jan 28, 2021 at 10:18 AM Christoph Hellwig  wrote:

> On Thu, Jan 28, 2021 at 06:00:58PM +, Robin Murphy wrote:
> > If it were possible for this to fail, you might leak the DMA mapping
> here.
> > However if dev->dma_parms somehow disappeared since a dozen lines above
> > then I think you've got far bigger problems anyway.
> >
> > That said, do you really need to keep toggling this back and forth all
> the
> > time? Even if the device does make other mappings elsewhere that don't
> > necessarily need the same strict alignment, would it be significantly
> > harmful just to set it once at probe and leave it in place anyway?
>
> Yes, we should keep it set all the time.  While some NVMe devices have
> the option to use SGLs that do not have this limitation, I have
> absolutely no sympathy for anyone running NVMe with swiotlb as that
> means their system imposes an addressing limitation.  We need to make
> sure it does not corrupt data, but we're not going to make any effort
> to optimize for such a degenerated setup.
>


-- 
Jianxiong Gao
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver.

2021-01-28 Thread Christoph Hellwig
On Thu, Jan 28, 2021 at 06:00:58PM +, Robin Murphy wrote:
> If it were possible for this to fail, you might leak the DMA mapping here. 
> However if dev->dma_parms somehow disappeared since a dozen lines above 
> then I think you've got far bigger problems anyway.
>
> That said, do you really need to keep toggling this back and forth all the 
> time? Even if the device does make other mappings elsewhere that don't 
> necessarily need the same strict alignment, would it be significantly 
> harmful just to set it once at probe and leave it in place anyway?

Yes, we should keep it set all the time.  While some NVMe devices have
the option to use SGLs that do not have this limitation, I have
absolutely no sympathy for anyone running NVMe with swiotlb as that
means their system imposes an addressing limitation.  We need to make
sure it does not corrupt data, but we're not going to make any effort
to optimize for such a degenerated setup.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/3] Add swiotlb offset preserving mapping when dma_dma_parameters->page_offset_mask is non zero.

2021-01-28 Thread Christoph Hellwig
On Wed, Jan 27, 2021 at 04:38:28PM -0800, Jianxiong Gao wrote:
> For devices that need to preserve address offset on mapping through
> swiotlb, this patch adds offset preserving based on page_offset_mask
> and keeps the offset if the mask is non zero. This is needed for
> device drivers like NVMe.
> 
> Signed-off-by: Jianxiong Gao 
> ---
>  kernel/dma/swiotlb.c | 25 ++---
>  1 file changed, 22 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 7c42df6e6100..4cab35f2c9bc 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -468,7 +468,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, 
> phys_addr_t orig_addr,
>   dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(hwdev, io_tlb_start);
>   unsigned long flags;
>   phys_addr_t tlb_addr;
> - unsigned int nslots, stride, index, wrap;
> + unsigned int nslots, stride, index, wrap, page_offset_mask, page_offset;
>   int i;
>   unsigned long mask;
>   unsigned long offset_slots;
> @@ -500,12 +500,16 @@ phys_addr_t swiotlb_tbl_map_single(struct device 
> *hwdev, phys_addr_t orig_addr,
>   ? ALIGN(mask + 1, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT
>   : 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
>  
> + page_offset_mask = dma_get_page_offset_mask(hwdev);
> + page_offset = orig_addr & page_offset_mask;
> + alloc_size += page_offset;
> +
>   /*
>* For mappings greater than or equal to a page, we limit the stride
>* (and hence alignment) to a page size.
>*/
>   nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
> - if (alloc_size >= PAGE_SIZE)
> + if ((alloc_size >= PAGE_SIZE) || (page_offset_mask > (1 << 
> IO_TLB_SHIFT)))
>   stride = (1 << (PAGE_SHIFT - IO_TLB_SHIFT));
>   else
>   stride = 1;
> @@ -583,6 +587,11 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, 
> phys_addr_t orig_addr,
>*/
>   for (i = 0; i < nslots; i++)
>   io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
> + /*
> +  * When keeping the offset of the original data, we need to advance
> +  * the tlb_addr by the offset of orig_addr.
> +  */
> + tlb_addr += page_offset;
>   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
>   (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
>   swiotlb_bounce(orig_addr, tlb_addr, mapping_size, 
> DMA_TO_DEVICE);
> @@ -598,7 +607,9 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, 
> phys_addr_t tlb_addr,
> enum dma_data_direction dir, unsigned long attrs)
>  {
>   unsigned long flags;
> - int i, count, nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
> + unsigned int num_page_offset_slabs, page_offset_mask = dma_get_page_offset_mask(hwdev);

Yikes, please avoid these crazy long lines.

> + num_page_offset_slabs =  (tlb_addr & page_offset_mask) / (1 << IO_TLB_SHIFT);

also a double whitespace here.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/3] Adding page_offset_mask to device_dma_parameters

2021-01-28 Thread Christoph Hellwig
On Thu, Jan 28, 2021 at 05:27:25PM +, Robin Murphy wrote:
> On 2021-01-28 00:38, Jianxiong Gao wrote:
>> Some devices rely on the address offset in a page to function
>> correctly (NVMe driver as an example). These devices may use
>> a different page size than the Linux kernel. The address offset
>> has to be preserved upon mapping, and in order to do so, we
>> need to record the page_offset_mask first.
>>
>> Signed-off-by: Jianxiong Gao 
>> ---
>>   include/linux/device.h  |  1 +
>>   include/linux/dma-mapping.h | 17 +
>>   2 files changed, 18 insertions(+)
>>
>> diff --git a/include/linux/device.h b/include/linux/device.h
>> index 1779f90eeb4c..f44e0659fc66 100644
>> --- a/include/linux/device.h
>> +++ b/include/linux/device.h
>> @@ -292,6 +292,7 @@ struct device_dma_parameters {
>>   */
>>  unsigned int max_segment_size;
>>  unsigned long segment_boundary_mask;
>> +unsigned int page_offset_mask;
>
> Could we call this something more like "min_align_mask" (sorry, I can't 
> think of a name that's actually good and descriptive right now). 
> Essentially I worry that having "page" in there is going to be too easy to 
> misinterpret as having anything to do what "page" means almost everywhere 
> else (even before you throw IOMMU pages into the mix).
>
> Also note that of all the possible ways to pack two ints and a long, this 
> one is the worst ;)

The block layer uses virt_boundary for the related concept, but that
is pretty horrible too.
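
On the packing aside: with the current member order the structure carries two 4-byte
holes on an LP64 target; grouping the two ints avoids that. A hypothetical reordering
(illustration only, also using Robin's suggested rename):

```c
/*
 * { int, long, int } occupies 24 bytes on LP64 with a hole after each int,
 * whereas grouping the ints packs the same fields into 16 bytes.
 */
struct device_dma_parameters {
	unsigned int max_segment_size;
	unsigned int min_align_mask;		/* suggested name for page_offset_mask */
	unsigned long segment_boundary_mask;
};
```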
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/3] Add swiotlb offset preserving mapping when dma_dma_parameters->page_offset_mask is non zero.

2021-01-28 Thread Jianxiong Gao via iommu
The error can't be fixed by just updating the NVMe driver.
The NVMe spec (and, as pointed out by Christoph, some other drivers) relies on
the address offset to copy data correctly. When data is mapped via swiotlb,
the current implementation always copies the data to a 2k/4k-aligned address.


On Thu, Jan 28, 2021 at 9:35 AM Keith Busch  wrote:

> On Thu, Jan 28, 2021 at 12:15:28PM -0500, Konrad Rzeszutek Wilk wrote:
> > On Wed, Jan 27, 2021 at 04:38:28PM -0800, Jianxiong Gao wrote:
> > > For devices that need to preserve address offset on mapping through
> > > swiotlb, this patch adds offset preserving based on page_offset_mask
> > > and keeps the offset if the mask is non zero. This is needed for
> > > device drivers like NVMe.
> >
> > 
> >
> > Didn't you send this patch like a month ago and someone pointed
> > out that the right fix would be in the NVMe driver?
> >
> > Is there an issue with fixing the NVMe driver?
>
> You got it backwards. The initial "fix" used a flag specific to the nvme
> driver, and it was pointed out that it should just be the generic
> behaviour.
>


-- 
Jianxiong Gao
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver.

2021-01-28 Thread Robin Murphy

On 2021-01-28 00:38, Jianxiong Gao wrote:

NVMe driver relies on the address offset to function properly.
This patch adds the offset preserve mask to NVMe driver when mapping
via dma_map_sg_attrs and unmapping via nvme_unmap_sg. The mask
depends on the page size defined by CC.MPS register of NVMe
controller.

Signed-off-by: Jianxiong Gao 
---
  drivers/nvme/host/pci.c | 19 +--
  1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 856aa31931c1..0b23f04068be 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -580,12 +580,15 @@ static void nvme_free_sgls(struct nvme_dev *dev, struct 
request *req)
  static void nvme_unmap_sg(struct nvme_dev *dev, struct request *req)
  {
struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
-
+   if (dma_set_page_offset_mask(dev->dev, NVME_CTRL_PAGE_SIZE - 1))
+   dev_warn(dev->dev, "dma_set_page_offset_mask failed to set 
offset\n");
if (is_pci_p2pdma_page(sg_page(iod->sg)))
pci_p2pdma_unmap_sg(dev->dev, iod->sg, iod->nents,
rq_dma_dir(req));
else
dma_unmap_sg(dev->dev, iod->sg, iod->nents, rq_dma_dir(req));
+   if (dma_set_page_offset_mask(dev->dev, 0))
+   dev_warn(dev->dev, "dma_set_page_offset_mask failed to reset 
offset\n");
  }
  
  static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)

@@ -842,7 +845,7 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, 
struct request *req,
  {
struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
blk_status_t ret = BLK_STS_RESOURCE;
-   int nr_mapped;
+   int nr_mapped, offset_ret;
  
  	if (blk_rq_nr_phys_segments(req) == 1) {

struct bio_vec bv = req_bvec(req);
@@ -868,12 +871,24 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, 
struct request *req,
if (!iod->nents)
goto out_free_sg;
  
+	offset_ret = dma_set_page_offset_mask(dev->dev, NVME_CTRL_PAGE_SIZE - 1);

+   if (offset_ret) {
+   dev_warn(dev->dev, "dma_set_page_offset_mask failed to set 
offset\n");
+   goto out_free_sg;
+   }
+
if (is_pci_p2pdma_page(sg_page(iod->sg)))
nr_mapped = pci_p2pdma_map_sg_attrs(dev->dev, iod->sg,
iod->nents, rq_dma_dir(req), DMA_ATTR_NO_WARN);
else
nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
 rq_dma_dir(req), DMA_ATTR_NO_WARN);
+
+   offset_ret = dma_set_page_offset_mask(dev->dev, 0);
+   if (offset_ret) {
+   dev_warn(dev->dev, "dma_set_page_offset_mask failed to reset 
offset\n");
+   goto out_free_sg;


If it were possible for this to fail, you might leak the DMA mapping 
here. However if dev->dma_parms somehow disappeared since a dozen lines 
above then I think you've got far bigger problems anyway.


That said, do you really need to keep toggling this back and forth all 
the time? Even if the device does make other mappings elsewhere that 
don't necessarily need the same strict alignment, would it be 
significantly harmful just to set it once at probe and leave it in place 
anyway?


Robin.


+   }
if (!nr_mapped)
goto out_free_sg;
  


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 6/6] media: uvcvideo: Use dma_alloc_noncontiguos API

2021-01-28 Thread Ricardo Ribalda
On architectures where there is no coherent caching, such as ARM, use the
dma_alloc_noncontiguous API and handle the cache flushing manually using
dma_sync_sgtable().

With this patch on the affected architectures we can measure up to 20x
performance improvement in uvc_video_copy_data_work().

Eg: aarch64 with an external usb camera

NON_CONTIGUOUS
frames:  999
packets: 999
empty:   0 (0 %)
errors:  0
invalid: 0
pts: 0 early, 0 initial, 999 ok
scr: 0 count ok, 0 diff ok
sof: 2048 <= sof <= 0, freq 0.000 kHz
bytes 67034480 : duration 33303
FPS: 29.99
URB: 523446/4993 uS/qty: 104.836 avg 132.532 std 13.230 min 831.094 max (uS)
header: 76564/4993 uS/qty: 15.334 avg 15.229 std 3.438 min 186.875 max (uS)
latency: 468945/4992 uS/qty: 93.939 avg 132.577 std 9.531 min 824.010 max (uS)
decode: 54161/4993 uS/qty: 10.847 avg 6.313 std 1.614 min 111.458 max (uS)
raw decode speed: 9.931 Gbits/s
raw URB handling speed: 1.025 Gbits/s
throughput: 16.102 Mbits/s
URB decode CPU usage 0.162600 %

COHERENT
frames:  999
packets: 999
empty:   0 (0 %)
errors:  0
invalid: 0
pts: 0 early, 0 initial, 999 ok
scr: 0 count ok, 0 diff ok
sof: 2048 <= sof <= 0, freq 0.000 kHz
bytes 54683536 : duration 33302
FPS: 29.99
URB: 1478135/4000 uS/qty: 369.533 avg 390.357 std 22.968 min 3337.865 max (uS)
header: 79761/4000 uS/qty: 19.940 avg 18.495 std 1.875 min 336.719 max (uS)
latency: 281077/4000 uS/qty: 70.269 avg 83.102 std 5.104 min 735.000 max (uS)
decode: 1197057/4000 uS/qty: 299.264 avg 318.080 std 1.615 min 2806.667 max (uS)
raw decode speed: 365.470 Mbits/s
raw URB handling speed: 295.986 Mbits/s
throughput: 13.136 Mbits/s
URB decode CPU usage 3.594500 %

Signed-off-by: Ricardo Ribalda 
Signed-off-by: Christoph Hellwig 
---

v2: 
- Replace DMA_BIDIRECTIONAL with DMA_FROM_DEVICE
- Add invalidate_kernel_vmap_range
- Tested in an X86 notebook and an ARM Chromebook

 drivers/media/usb/uvc/uvc_video.c | 80 ++-
 drivers/media/usb/uvc/uvcvideo.h  |  4 +-
 2 files changed, 61 insertions(+), 23 deletions(-)

diff --git a/drivers/media/usb/uvc/uvc_video.c 
b/drivers/media/usb/uvc/uvc_video.c
index a6a441d92b94..378fb5f29920 100644
--- a/drivers/media/usb/uvc/uvc_video.c
+++ b/drivers/media/usb/uvc/uvc_video.c
@@ -6,11 +6,13 @@
  *  Laurent Pinchart (laurent.pinch...@ideasonboard.com)
  */
 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1097,6 +1099,26 @@ static int uvc_video_decode_start(struct uvc_streaming 
*stream,
return data[0];
 }
 
+static inline struct device *stream_to_dmadev(struct uvc_streaming *stream)
+{
+   return bus_to_hcd(stream->dev->udev->bus)->self.sysdev;
+}
+
+static void uvc_urb_dma_sync(struct uvc_urb *uvc_urb, bool for_device)
+{
+   struct device *dma_dev = stream_to_dmadev(uvc_urb->stream);
+
+   if (for_device) {
+   dma_sync_sgtable_for_device(dma_dev, uvc_urb->sgt,
+   DMA_FROM_DEVICE);
+   } else {
+   dma_sync_sgtable_for_cpu(dma_dev, uvc_urb->sgt,
+DMA_FROM_DEVICE);
+   invalidate_kernel_vmap_range(uvc_urb->buffer,
+uvc_urb->stream->urb_size);
+   }
+}
+
 /*
  * uvc_video_decode_data_work: Asynchronous memcpy processing
  *
@@ -1118,6 +1140,8 @@ static void uvc_video_copy_data_work(struct work_struct 
*work)
uvc_queue_buffer_release(op->buf);
}
 
+   uvc_urb_dma_sync(uvc_urb, true);
+
ret = usb_submit_urb(uvc_urb->urb, GFP_KERNEL);
if (ret < 0)
uvc_printk(KERN_ERR, "Failed to resubmit video URB (%d).\n",
@@ -1539,10 +1563,12 @@ static void uvc_video_complete(struct urb *urb)
 * Process the URB headers, and optionally queue expensive memcpy tasks
 * to be deferred to a work queue.
 */
+   uvc_urb_dma_sync(uvc_urb, false);
stream->decode(uvc_urb, buf, buf_meta);
 
/* If no async work is needed, resubmit the URB immediately. */
if (!uvc_urb->async_operations) {
+   uvc_urb_dma_sync(uvc_urb, true);
ret = usb_submit_urb(uvc_urb->urb, GFP_ATOMIC);
if (ret < 0)
uvc_printk(KERN_ERR,
@@ -1559,24 +1585,47 @@ static void uvc_video_complete(struct urb *urb)
  */
 static void uvc_free_urb_buffers(struct uvc_streaming *stream)
 {
+   struct device *dma_dev = stream_to_dmadev(stream);
struct uvc_urb *uvc_urb;
 
for_each_uvc_urb(uvc_urb, stream) {
if (!uvc_urb->buffer)
continue;
 
-#ifndef CONFIG_DMA_NONCOHERENT
-   usb_free_coherent(stream->dev->udev, stream->urb_size,
- uvc_urb->buffer, uvc_urb->dma);
-#else
-   kfree(uvc_urb->buffer);
-#endif
+   dma_vunmap_noncontiguous(dma_dev, uvc_urb->buffer);
+   
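
The allocation side pairs the new API calls roughly as follows; this is a sketch of the
proposed dma_alloc_noncontiguous usage (signature per the eventually-merged API, which may
differ slightly from this series), not the exact patch hunk:

```c
/*
 * Allocate device-visible pages, then create a kernel mapping for the
 * CPU-side memcpy work; uvc_urb_dma_sync() from the patch handles ownership
 * transfer between CPU and device.
 */
static int uvc_alloc_urb_buffer_sketch(struct device *dma_dev,
					struct uvc_streaming *stream,
					struct uvc_urb *uvc_urb)
{
	uvc_urb->sgt = dma_alloc_noncontiguous(dma_dev, stream->urb_size,
					       DMA_FROM_DEVICE, GFP_KERNEL, 0);
	if (!uvc_urb->sgt)
		return -ENOMEM;

	uvc_urb->buffer = dma_vmap_noncontiguous(dma_dev, stream->urb_size,
						 uvc_urb->sgt);
	if (!uvc_urb->buffer) {
		dma_free_noncontiguous(dma_dev, stream->urb_size,
				       uvc_urb->sgt, DMA_FROM_DEVICE);
		return -ENOMEM;
	}

	return 0;
}
```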

Re: [PATCH 2/3] Add swiotlb offset preserving mapping when dma_dma_parameters->page_offset_mask is non zero.

2021-01-28 Thread Keith Busch
On Thu, Jan 28, 2021 at 12:15:28PM -0500, Konrad Rzeszutek Wilk wrote:
> On Wed, Jan 27, 2021 at 04:38:28PM -0800, Jianxiong Gao wrote:
> > For devices that need to preserve address offset on mapping through
> > swiotlb, this patch adds offset preserving based on page_offset_mask
> > and keeps the offset if the mask is non zero. This is needed for
> > device drivers like NVMe.
> 
> 
> 
> Didn't you send this patch like a month ago and someone pointed
> out that the right fix would be in the NVMe driver?
> 
> Is there an issue with fixing the NVMe driver?

You got it backwards. The initial "fix" used a flag specific to the nvme
driver, and it was pointed out that it should just be the generic
behaviour.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/3] Adding page_offset_mask to device_dma_parameters

2021-01-28 Thread Robin Murphy

On 2021-01-28 00:38, Jianxiong Gao wrote:

Some devices rely on the address offset in a page to function
correctly (NVMe driver as an example). These devices may use
a different page size than the Linux kernel. The address offset
has to be preserved upon mapping, and in order to do so, we
need to record the page_offset_mask first.

Signed-off-by: Jianxiong Gao 
---
  include/linux/device.h  |  1 +
  include/linux/dma-mapping.h | 17 +
  2 files changed, 18 insertions(+)

diff --git a/include/linux/device.h b/include/linux/device.h
index 1779f90eeb4c..f44e0659fc66 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -292,6 +292,7 @@ struct device_dma_parameters {
 */
unsigned int max_segment_size;
unsigned long segment_boundary_mask;
+   unsigned int page_offset_mask;


Could we call this something more like "min_align_mask" (sorry, I can't 
think of a name that's actually good and descriptive right now). 
Essentially I worry that having "page" in there is going to be too easy 
to misinterpret as having anything to do what "page" means almost 
everywhere else (even before you throw IOMMU pages into the mix).


Also note that of all the possible ways to pack two ints and a long, 
this one is the worst ;)


Robin.


  };
  
  /**

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 2e49996a8f39..5529a31fefba 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -500,6 +500,23 @@ static inline int dma_set_seg_boundary(struct device *dev, 
unsigned long mask)
return -EIO;
  }
  
+static inline unsigned int dma_get_page_offset_mask(struct device *dev)

+{
+   if (dev->dma_parms)
+   return dev->dma_parms->page_offset_mask;
+   return 0;
+}
+
+static inline int dma_set_page_offset_mask(struct device *dev,
+   unsigned int page_offset_mask)
+{
+   if (dev->dma_parms) {
+   dev->dma_parms->page_offset_mask = page_offset_mask;
+   return 0;
+   }
+   return -EIO;
+}
+
  static inline int dma_get_cache_alignment(void)
  {
  #ifdef ARCH_DMA_MINALIGN


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/3] Add swiotlb offset preserving mapping when dma_dma_parameters->page_offset_mask is non zero.

2021-01-28 Thread Konrad Rzeszutek Wilk
On Wed, Jan 27, 2021 at 04:38:28PM -0800, Jianxiong Gao wrote:
> For devices that need to preserve address offset on mapping through
> swiotlb, this patch adds offset preserving based on page_offset_mask
> and keeps the offset if the mask is non zero. This is needed for
> device drivers like NVMe.



Didn't you send this patch like a month ago and someone pointed
out that the right fix would be in the NVMe driver?

Is there an issue with fixing the NVMe driver?

> 
> Signed-off-by: Jianxiong Gao 
> ---
>  kernel/dma/swiotlb.c | 25 ++---
>  1 file changed, 22 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 7c42df6e6100..4cab35f2c9bc 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -468,7 +468,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, 
> phys_addr_t orig_addr,
>   dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(hwdev, io_tlb_start);
>   unsigned long flags;
>   phys_addr_t tlb_addr;
> - unsigned int nslots, stride, index, wrap;
> + unsigned int nslots, stride, index, wrap, page_offset_mask, page_offset;
>   int i;
>   unsigned long mask;
>   unsigned long offset_slots;
> @@ -500,12 +500,16 @@ phys_addr_t swiotlb_tbl_map_single(struct device 
> *hwdev, phys_addr_t orig_addr,
>   ? ALIGN(mask + 1, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT
>   : 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
>  
> + page_offset_mask = dma_get_page_offset_mask(hwdev);
> + page_offset = orig_addr & page_offset_mask;
> + alloc_size += page_offset;
> +
>   /*
>* For mappings greater than or equal to a page, we limit the stride
>* (and hence alignment) to a page size.
>*/
>   nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
> - if (alloc_size >= PAGE_SIZE)
> + if ((alloc_size >= PAGE_SIZE) || (page_offset_mask > (1 << 
> IO_TLB_SHIFT)))
>   stride = (1 << (PAGE_SHIFT - IO_TLB_SHIFT));
>   else
>   stride = 1;
> @@ -583,6 +587,11 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, 
> phys_addr_t orig_addr,
>*/
>   for (i = 0; i < nslots; i++)
>   io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
> + /*
> +  * When keeping the offset of the original data, we need to advance
> +  * the tlb_addr by the offset of orig_addr.
> +  */
> + tlb_addr += page_offset;
>   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
>   (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
>   swiotlb_bounce(orig_addr, tlb_addr, mapping_size, 
> DMA_TO_DEVICE);
> @@ -598,7 +607,9 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, 
> phys_addr_t tlb_addr,
> enum dma_data_direction dir, unsigned long attrs)
>  {
>   unsigned long flags;
> - int i, count, nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> 
> IO_TLB_SHIFT;
> + unsigned int num_page_offset_slabs, page_offset_mask = 
> dma_get_page_offset_mask(hwdev);
> + int i, count;
> + int nslots = ALIGN(alloc_size + tlb_addr & page_offset_mask, 1 << 
> IO_TLB_SHIFT) >> IO_TLB_SHIFT;
>   int index = (tlb_addr - io_tlb_start) >> IO_TLB_SHIFT;
>   phys_addr_t orig_addr = io_tlb_orig_addr[index];
>  
> @@ -610,6 +621,14 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, 
> phys_addr_t tlb_addr,
>   ((dir == DMA_FROM_DEVICE) || (dir == DMA_BIDIRECTIONAL)))
>   swiotlb_bounce(orig_addr, tlb_addr, mapping_size, 
> DMA_FROM_DEVICE);
>  
> + /*
> +  * When dma_get_page_offset_mask is used, we may have padded more slabs
> +  * when padding exceeds one slab. We need to move index back to the
> +  * beginning of the padding.
> +  */
> + num_page_offset_slabs =  (tlb_addr & page_offset_mask) / (1 << 
> IO_TLB_SHIFT);
> + index -= num_page_offset_slabs;
> +
>   /*
>* Return the buffer to the free list by setting the corresponding
>* entries to indicate the number of contiguous entries available.
> -- 
> 2.27.0
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 6/6] media: uvcvideo: Use dma_alloc_noncontiguos API

2021-01-28 Thread Ricardo Ribalda
HI Christoph

Thanks for your comments

On Thu, Jan 28, 2021 at 4:09 PM Christoph Hellwig  wrote:
>
> I just included this patch as-is, but here are a few comments:
>
> On Thu, Jan 28, 2021 at 03:58:37PM +0100, Christoph Hellwig wrote:
> > +static void uvc_urb_dma_sync(struct uvc_urb *uvc_urb, bool for_device)
> > +{
> > + struct device *dma_dev = dma_dev = stream_to_dmadev(uvc_urb->stream);
> > +
> > + if (for_device)
> > + dma_sync_sgtable_for_device(dma_dev, uvc_urb->sgt,
> > + DMA_FROM_DEVICE);
> > + else
> > + dma_sync_sgtable_for_cpu(dma_dev, uvc_urb->sgt,
> > +  DMA_FROM_DEVICE);
> > +}
>
> Given that we vmap the addresses this also needs
> flush_kernel_vmap_range / invalidate_kernel_vmap_range calls for
> VIVT architectures.

We only read from the device to the CPU. Then can we run only
invalidate_kernel_vmap_range()?

Something like:

else {
	dma_sync_sgtable_for_cpu(dma_dev, uvc_urb->sgt, DMA_FROM_DEVICE);
	invalidate_kernel_vmap_range(uvc_urb->buffer,
				     uvc_urb->stream->urb_size);
}

Thanks!






--
Ricardo Ribalda
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 6/6] media: uvcvideo: Use dma_alloc_noncontiguos API

2021-01-28 Thread Ricardo Ribalda
On Thu, Jan 28, 2021 at 4:03 PM Christoph Hellwig  wrote:
>
> From: Ricardo Ribalda 
>
> On architectures where the is no coherent caching such as ARM use the
> dma_alloc_noncontiguos API and handle manually the cache flushing using
> dma_sync_sgtable().
>
> With this patch on the affected architectures we can measure up to 20x
> performance improvement in uvc_video_copy_data_work().
>
> Eg: aarch64 with an external usb camera
>
> NON_CONTIGUOUS
> frames:  999
> packets: 999
> empty:   0 (0 %)
> errors:  0
> invalid: 0
> pts: 0 early, 0 initial, 999 ok
> scr: 0 count ok, 0 diff ok
> sof: 2048 <= sof <= 0, freq 0.000 kHz
> bytes 67034480 : duration 33303
> FPS: 29.99
> URB: 523446/4993 uS/qty: 104.836 avg 132.532 std 13.230 min 831.094 max (uS)
> header: 76564/4993 uS/qty: 15.334 avg 15.229 std 3.438 min 186.875 max (uS)
> latency: 468945/4992 uS/qty: 93.939 avg 132.577 std 9.531 min 824.010 max (uS)
> decode: 54161/4993 uS/qty: 10.847 avg 6.313 std 1.614 min 111.458 max (uS)
> raw decode speed: 9.931 Gbits/s
> raw URB handling speed: 1.025 Gbits/s
> throughput: 16.102 Mbits/s
> URB decode CPU usage 0.162600 %
>
> COHERENT
> frames:  999
> packets: 999
> empty:   0 (0 %)
> errors:  0
> invalid: 0
> pts: 0 early, 0 initial, 999 ok
> scr: 0 count ok, 0 diff ok
> sof: 2048 <= sof <= 0, freq 0.000 kHz
> bytes 54683536 : duration 33302
> FPS: 29.99
> URB: 1478135/4000 uS/qty: 369.533 avg 390.357 std 22.968 min 3337.865 max (uS)
> header: 79761/4000 uS/qty: 19.940 avg 18.495 std 1.875 min 336.719 max (uS)
> latency: 281077/4000 uS/qty: 70.269 avg 83.102 std 5.104 min 735.000 max (uS)
> decode: 1197057/4000 uS/qty: 299.264 avg 318.080 std 1.615 min 2806.667 max 
> (uS)
> raw decode speed: 365.470 Mbits/s
> raw URB handling speed: 295.986 Mbits/s
> throughput: 13.136 Mbits/s
> URB decode CPU usage 3.594500 %
>
> Signed-off-by: Ricardo Ribalda 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/media/usb/uvc/uvc_video.c | 76 ++-
>  drivers/media/usb/uvc/uvcvideo.h  |  4 +-
>  2 files changed, 57 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/media/usb/uvc/uvc_video.c 
> b/drivers/media/usb/uvc/uvc_video.c
> index a6a441d92b9488..9c051b55dc7bc6 100644
> --- a/drivers/media/usb/uvc/uvc_video.c
> +++ b/drivers/media/usb/uvc/uvc_video.c
> @@ -11,6 +11,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -1097,6 +1098,23 @@ static int uvc_video_decode_start(struct uvc_streaming 
> *stream,
> return data[0];
>  }
>
> +static inline struct device *stream_to_dmadev(struct uvc_streaming *stream)
> +{
> +   return bus_to_hcd(stream->dev->udev->bus)->self.sysdev;
> +}
> +
> +static void uvc_urb_dma_sync(struct uvc_urb *uvc_urb, bool for_device)
> +{
> +   struct device *dma_dev = dma_dev = stream_to_dmadev(uvc_urb->stream);
> +
> +   if (for_device)
> +   dma_sync_sgtable_for_device(dma_dev, uvc_urb->sgt,
> +   DMA_FROM_DEVICE);
> +   else
> +   dma_sync_sgtable_for_cpu(dma_dev, uvc_urb->sgt,
> +DMA_FROM_DEVICE);
> +}
> +
>  /*
>   * uvc_video_decode_data_work: Asynchronous memcpy processing
>   *
> @@ -1118,6 +1136,8 @@ static void uvc_video_copy_data_work(struct work_struct 
> *work)
> uvc_queue_buffer_release(op->buf);
> }
>
> +   uvc_urb_dma_sync(uvc_urb, true);
> +
> ret = usb_submit_urb(uvc_urb->urb, GFP_KERNEL);
> if (ret < 0)
> uvc_printk(KERN_ERR, "Failed to resubmit video URB (%d).\n",
> @@ -1539,10 +1559,12 @@ static void uvc_video_complete(struct urb *urb)
>  * Process the URB headers, and optionally queue expensive memcpy 
> tasks
>  * to be deferred to a work queue.
>  */
> +   uvc_urb_dma_sync(uvc_urb, false);
> stream->decode(uvc_urb, buf, buf_meta);
>
> /* If no async work is needed, resubmit the URB immediately. */
> if (!uvc_urb->async_operations) {
> +   uvc_urb_dma_sync(uvc_urb, true);
> ret = usb_submit_urb(uvc_urb->urb, GFP_ATOMIC);
> if (ret < 0)
> uvc_printk(KERN_ERR,
> @@ -1559,24 +1581,47 @@ static void uvc_video_complete(struct urb *urb)
>   */
>  static void uvc_free_urb_buffers(struct uvc_streaming *stream)
>  {
> +   struct device *dma_dev = dma_dev = stream_to_dmadev(stream);
> struct uvc_urb *uvc_urb;
>
> for_each_uvc_urb(uvc_urb, stream) {
> if (!uvc_urb->buffer)
> continue;
>
> -#ifndef CONFIG_DMA_NONCOHERENT
> -   usb_free_coherent(stream->dev->udev, stream->urb_size,
> - uvc_urb->buffer, uvc_urb->dma);
> -#else
> -   kfree(uvc_urb->buffer);
> -#endif
> +   dma_vunmap_noncontiguous(dma_dev, uvc_urb->buffer);
> +   dma_free_noncontiguous(dma_dev, stream->urb_size, 
> 

Re: [PATCH 1/1] iommu/arm-smmu-v3: add support for BBML

2021-01-28 Thread Robin Murphy

On 2021-01-28 15:18, Keqian Zhu wrote:



On 2021/1/27 17:39, Robin Murphy wrote:

On 2021-01-27 07:36, Keqian Zhu wrote:



On 2021/1/27 10:01, Leizhen (ThunderTown) wrote:



On 2021/1/26 18:12, Will Deacon wrote:

On Mon, Jan 25, 2021 at 08:23:40PM +, Robin Murphy wrote:

Now we probably will need some degree of BBML feature awareness for the
sake of SVA if and when we start using it for CPU pagetables, but I still
cannot see any need to consider it in io-pgtable.


Agreed; I don't think this is something that io-pgtable should have to care
about.

Hi,

I have a question here :-).
If the old table is not live, then the break procedure seems unnecessary. Do I 
miss something?


The MMU is allowed to prefetch translations at any time, so not following the 
proper update procedure could still potentially lead to a TLB conflict, even if 
there's no device traffic to worry about disrupting.

Robin.


Thanks. Does the MMU you mention here include both the MMU and the SMMU? I know
that on the SMMU side, ATS can prefetch translations.


Yes, both - VMSAv8 allows speculative translation table walks, so SMMUv3 
inherits from there (per 3.21.1 "Translation tables and TLB invalidation 
completion behavior").


Robin.



Keqian



Thanks,
Keqian



Yes, the SVA works in stall mode, and the failed device access requests are not
discarded.

Let me look for examples. The BBML usage scenario was told by a former 
colleague.



Will

.




___
linux-arm-kernel mailing list
linux-arm-ker...@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
.


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


.


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH 2/6] dma-mapping: add a dma_mmap_pages helper

2021-01-28 Thread David Laight
From: Christoph Hellwig
> Sent: 28 January 2021 14:59
> 
> Add a helper to map memory allocated using dma_alloc_pages into
> a user address space, similar to the dma_alloc_attrs function for
> coherent allocations.
> 
...
> +::
> +
> + int
> + dma_mmap_pages(struct device *dev, struct vm_area_struct *vma,
> +size_t size, struct page *page)
> +
> +Map an allocation returned from dma_alloc_pages() into a user address space.
> +dev and size must be the same as those passed into dma_alloc_pages().
> +page must be the pointer returned by dma_alloc_pages().

To be useful this needs to specify the offset into the user address area.
(ie the offset in the mmap() buffer.)

For example we have an fpga based PCIe card that converts internal
addresses that refer to one of 512 16k 'pages' to 64bit PCIe bus
master addresses.
So it (sort of) contains its own iommu.

So we can allocate (aligned) 16k kernel memory buffers with
dma_alloc_coherent() and make them appear contiguous to the
on-board PCIe bus master users.
We then mmap() them into contiguous user addresses.
So both 'ends' see contiguous addresses without requiring
contiguous physical memory or requiring all the memory be 
allocated at the same time.

Clearly in-kernel users have to allow for the 16k boundaries.
But the large structures are accesses from user space.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC PATCH 07/11] iommu/arm-smmu-v3: Clear dirty log according to bitmap

2021-01-28 Thread Keqian Zhu
From: jiangkunkun 

After the dirty log is retrieved, the user should clear it to re-enable
dirty tracking for the dirtied pages. This adds a new interface named
clear_dirty_log, and arm-smmu-v3 implements it by clearing the dirty
state of the TTDs specified by the user-provided bitmap (as we only
enable HTTU for stage 1, this means setting the AP[2] bit).

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 24 ++
 drivers/iommu/io-pgtable-arm.c  | 95 +
 drivers/iommu/iommu.c   | 71 +++
 include/linux/io-pgtable.h  |  4 +
 include/linux/iommu.h   | 17 
 5 files changed, 211 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 43d0536b429a..0c24503d29d3 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2574,6 +2574,29 @@ static int arm_smmu_sync_dirty_log(struct iommu_domain *domain,
base_iova, bitmap_pgshift);
 }
 
+static int arm_smmu_clear_dirty_log(struct iommu_domain *domain,
+   unsigned long iova, size_t size,
+   unsigned long *bitmap,
+   unsigned long base_iova,
+   unsigned long bitmap_pgshift)
+{
+   struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
+   struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
+
+   if (!(smmu->features & ARM_SMMU_FEAT_HTTU_HD)) {
+   dev_err(smmu->dev, "don't support HTTU_HD and clear dirty log\n");
+   return -EPERM;
+   }
+
+   if (!ops || !ops->clear_dirty_log) {
+   pr_err("don't support clear dirty log\n");
+   return -ENODEV;
+   }
+
+   return ops->clear_dirty_log(ops, iova, size, bitmap, base_iova,
+   bitmap_pgshift);
+}
+
 static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
 {
return iommu_fwspec_add_ids(dev, args->args, 1);
@@ -2676,6 +2699,7 @@ static struct iommu_ops arm_smmu_ops = {
.split_block= arm_smmu_split_block,
.merge_page = arm_smmu_merge_page,
.sync_dirty_log = arm_smmu_sync_dirty_log,
+   .clear_dirty_log= arm_smmu_clear_dirty_log,
.of_xlate   = arm_smmu_of_xlate,
.get_resv_regions   = arm_smmu_get_resv_regions,
.put_resv_regions   = generic_iommu_put_resv_regions,
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 6cfe1ef3fedd..2256e37bcb3a 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -966,6 +966,100 @@ static int arm_lpae_sync_dirty_log(struct io_pgtable_ops 
*ops,
 bitmap, base_iova, bitmap_pgshift);
 }
 
+static int __arm_lpae_clear_dirty_log(struct arm_lpae_io_pgtable *data,
+ unsigned long iova, size_t size,
+ int lvl, arm_lpae_iopte *ptep,
+ unsigned long *bitmap,
+ unsigned long base_iova,
+ unsigned long bitmap_pgshift)
+{
+   arm_lpae_iopte pte;
+   struct io_pgtable *iop = &data->iop;
+   unsigned long offset;
+   size_t base, next_size;
+   int nbits, ret, i;
+
+   if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
+   return -EINVAL;
+
+   ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
+   pte = READ_ONCE(*ptep);
+   if (WARN_ON(!pte))
+   return -EINVAL;
+
+   if (size == ARM_LPAE_BLOCK_SIZE(lvl, data)) {
+   if (iopte_leaf(pte, lvl, iop->fmt)) {
+   if (pte & ARM_LPAE_PTE_AP_RDONLY)
+   return 0;
+
+   /* Ensure all corresponding bits are set */
+   nbits = size >> bitmap_pgshift;
+   offset = (iova - base_iova) >> bitmap_pgshift;
+   for (i = offset; i < offset + nbits; i++) {
+   if (!test_bit(i, bitmap))
+   return 0;
+   }
+
+   /* Race does not exist */
+   pte |= ARM_LPAE_PTE_AP_RDONLY;
+   __arm_lpae_set_pte(ptep, pte, >cfg);
+   return 0;
+   } else {
+   /* To traverse next level */
+   next_size = ARM_LPAE_BLOCK_SIZE(lvl + 1, data);
+   ptep = iopte_deref(pte, data);
+   for (base = 0; base < size; base += next_size) {
+   ret = 

Re: [PATCH v3 2/2] vfio/iommu_type1: Fix some sanity checks in detach group

2021-01-28 Thread Keqian Zhu



On 2021/1/28 7:46, Alex Williamson wrote:
> On Fri, 22 Jan 2021 17:26:35 +0800
> Keqian Zhu  wrote:
> 
>> vfio_sanity_check_pfn_list() is used to check whether pfn_list and
>> notifier are empty when remove the external domain, so it makes a
>> wrong assumption that only external domain will use the pinning
>> interface.
>>
>> Now we apply the pfn_list check when a vfio_dma is removed and apply
>> the notifier check when all domains are removed.
>>
>> Fixes: a54eb55045ae ("vfio iommu type1: Add support for mediated devices")
>> Signed-off-by: Keqian Zhu 
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 33 ++---
>>  1 file changed, 10 insertions(+), 23 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c 
>> b/drivers/vfio/vfio_iommu_type1.c
>> index 161725395f2f..d8c10f508321 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -957,6 +957,7 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, 
>> struct vfio_dma *dma,
>>  
>>  static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>>  {
>> +WARN_ON(!RB_EMPTY_ROOT(>pfn_list));
>>  vfio_unmap_unpin(iommu, dma, true);
>>  vfio_unlink_dma(iommu, dma);
>>  put_task_struct(dma->task);
>> @@ -2250,23 +2251,6 @@ static void vfio_iommu_unmap_unpin_reaccount(struct 
>> vfio_iommu *iommu)
>>  }
>>  }
>>  
>> -static void vfio_sanity_check_pfn_list(struct vfio_iommu *iommu)
>> -{
>> -struct rb_node *n;
>> -
>> -n = rb_first(>dma_list);
>> -for (; n; n = rb_next(n)) {
>> -struct vfio_dma *dma;
>> -
>> -dma = rb_entry(n, struct vfio_dma, node);
>> -
>> -if (WARN_ON(!RB_EMPTY_ROOT(>pfn_list)))
>> -break;
>> -}
>> -/* mdev vendor driver must unregister notifier */
>> -WARN_ON(iommu->notifier.head);
>> -}
>> -
>>  /*
>>   * Called when a domain is removed in detach. It is possible that
>>   * the removed domain decided the iova aperture window. Modify the
>> @@ -2366,10 +2350,10 @@ static void vfio_iommu_type1_detach_group(void 
>> *iommu_data,
>>  kfree(group);
>>  
>>  if (list_empty(>external_domain->group_list)) {
>> -vfio_sanity_check_pfn_list(iommu);
>> -
>> -if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu))
>> +if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) {
>> +WARN_ON(iommu->notifier.head);
>>  vfio_iommu_unmap_unpin_all(iommu);
>> +}
>>  
>>  kfree(iommu->external_domain);
>>  iommu->external_domain = NULL;
>> @@ -2403,10 +2387,12 @@ static void vfio_iommu_type1_detach_group(void 
>> *iommu_data,
>>   */
>>  if (list_empty(>group_list)) {
>>  if (list_is_singular(>domain_list)) {
>> -if (!iommu->external_domain)
>> +if (!iommu->external_domain) {
>> +WARN_ON(iommu->notifier.head);
>>  vfio_iommu_unmap_unpin_all(iommu);
>> -else
>> +} else {
>>  vfio_iommu_unmap_unpin_reaccount(iommu);
>> +}
>>  }
>>  iommu_domain_free(domain->domain);
>>  list_del(>next);
>> @@ -2488,9 +2474,10 @@ static void vfio_iommu_type1_release(void *iommu_data)
>>  struct vfio_iommu *iommu = iommu_data;
>>  struct vfio_domain *domain, *domain_tmp;
>>  
>> +WARN_ON(iommu->notifier.head);
> 
> I don't see that this does any harm, but isn't it actually redundant?
> It seems vfio-core only calls the iommu backend release function after
> removing all the groups, so the tests in _detach_group should catch all
> cases.  We're expecting the vfio bus/mdev driver to remove the notifier
> when a device is closed, which necessarily occurs before detaching the
> group.  Thanks,
> 
> Alex
Hi Alex,

Sorry, today I was busy sending the smmu HTTU based DMA dirty log tracking series.
I will reply to you tomorrow. Thanks!

Keqian.

> 
>> +
>>  if (iommu->external_domain) {
>>  vfio_release_domain(iommu->external_domain, true);
>> -vfio_sanity_check_pfn_list(iommu);
>>  kfree(iommu->external_domain);
>>  }
>>  
> 
> .
> 


Re: [PATCH 1/1] iommu/arm-smmu-v3: add support for BBML

2021-01-28 Thread Keqian Zhu



On 2021/1/27 17:39, Robin Murphy wrote:
> On 2021-01-27 07:36, Keqian Zhu wrote:
>>
>>
>> On 2021/1/27 10:01, Leizhen (ThunderTown) wrote:
>>>
>>>
>>> On 2021/1/26 18:12, Will Deacon wrote:
 On Mon, Jan 25, 2021 at 08:23:40PM +, Robin Murphy wrote:
> Now we probably will need some degreee of BBML feature awareness for the
> sake of SVA if and when we start using it for CPU pagetables, but I still
> cannot see any need to consider it in io-pgtable.

 Agreed; I don't think this is something that io-pgtable should have to care
 about.
>> Hi,
>>
>> I have a question here :-).
>> If the old table is not live, then the break procedure seems unnecessary. Am I missing something?
> 
> The MMU is allowed to prefetch translations at any time, so not following the 
> proper update procedure could still potentially lead to a TLB conflict, even 
> if there's no device traffic to worry about disrupting.
> 
> Robin.

Thanks. Does the MMU you mention here include both the CPU MMU and the SMMU? I know
that on the SMMU side, ATS can prefetch translations.

Keqian
> 
>> Thanks,
>> Keqian
>>
>>>
>>> Yes, the SVA works in stall mode, and the failed device access requests are not discarded.
>>>
>>> Let me look for examples. The BBML usage scenario was described to me by a former colleague.
>>>

 Will

 .

>>>
>>>
> .
> 


[RFC PATCH 01/11] iommu/arm-smmu-v3: Add feature detection for HTTU

2021-01-28 Thread Keqian Zhu
From: jiangkunkun 

An SMMU that supports HTTU (Hardware Translation Table Update) can
update the access flag and the dirty state of a TTD in hardware. This is
essential for tracking pages dirtied by DMA.

This adds feature detection only; no functional change.

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 16 
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  8 
 include/linux/io-pgtable.h  |  1 +
 3 files changed, 25 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 8ca7415d785d..0f0fe71cc10d 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1987,6 +1987,7 @@ static int arm_smmu_domain_finalise(struct iommu_domain 
*domain,
.pgsize_bitmap  = smmu->pgsize_bitmap,
.ias= ias,
.oas= oas,
+   .httu_hd= smmu->features & ARM_SMMU_FEAT_HTTU_HD,
.coherent_walk  = smmu->features & ARM_SMMU_FEAT_COHERENCY,
.tlb= _smmu_flush_ops,
.iommu_dev  = smmu->dev,
@@ -3224,6 +3225,21 @@ static int arm_smmu_device_hw_probe(struct 
arm_smmu_device *smmu)
if (reg & IDR0_HYP)
smmu->features |= ARM_SMMU_FEAT_HYP;
 
+   switch (FIELD_GET(IDR0_HTTU, reg)) {
+   case IDR0_HTTU_NONE:
+   break;
+   case IDR0_HTTU_HA:
+   smmu->features |= ARM_SMMU_FEAT_HTTU_HA;
+   break;
+   case IDR0_HTTU_HAD:
+   smmu->features |= ARM_SMMU_FEAT_HTTU_HA;
+   smmu->features |= ARM_SMMU_FEAT_HTTU_HD;
+   break;
+   default:
+   dev_err(smmu->dev, "unknown/unsupported HTTU!\n");
+   return -ENXIO;
+   }
+
/*
 * The coherency feature as set by FW is used in preference to the ID
 * register, but warn on mismatch.
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 96c2e9565e00..e91bea44519e 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -33,6 +33,10 @@
 #define IDR0_ASID16(1 << 12)
 #define IDR0_ATS   (1 << 10)
 #define IDR0_HYP   (1 << 9)
+#define IDR0_HTTU  GENMASK(7, 6)
+#define IDR0_HTTU_NONE 0
+#define IDR0_HTTU_HA   1
+#define IDR0_HTTU_HAD  2
 #define IDR0_COHACC(1 << 4)
 #define IDR0_TTF   GENMASK(3, 2)
 #define IDR0_TTF_AARCH64   2
@@ -286,6 +290,8 @@
 #define CTXDESC_CD_0_TCR_TBI0  (1ULL << 38)
 
 #define CTXDESC_CD_0_AA64  (1UL << 41)
+#define CTXDESC_CD_0_HD(1UL << 42)
+#define CTXDESC_CD_0_HA(1UL << 43)
 #define CTXDESC_CD_0_S (1UL << 44)
 #define CTXDESC_CD_0_R (1UL << 45)
 #define CTXDESC_CD_0_A (1UL << 46)
@@ -604,6 +610,8 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_RANGE_INV(1 << 15)
 #define ARM_SMMU_FEAT_BTM  (1 << 16)
 #define ARM_SMMU_FEAT_SVA  (1 << 17)
+#define ARM_SMMU_FEAT_HTTU_HA  (1 << 18)
+#define ARM_SMMU_FEAT_HTTU_HD  (1 << 19)
u32 features;
 
 #define ARM_SMMU_OPT_SKIP_PREFETCH (1 << 0)
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index ea727eb1a1a9..1a00ea8562c7 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -97,6 +97,7 @@ struct io_pgtable_cfg {
unsigned long   pgsize_bitmap;
unsigned intias;
unsigned intoas;
+   boolhttu_hd;
boolcoherent_walk;
const struct iommu_flush_ops*tlb;
struct device   *iommu_dev;
-- 
2.19.1



[RFC PATCH 10/11] vfio/iommu_type1: Optimize dirty bitmap population based on iommu HWDBM

2021-01-28 Thread Keqian Zhu
From: jiangkunkun 

Previously, if the vfio_iommu was not of pinned_page_dirty_scope and the
vfio_dma was iommu_mapped, we populated the full dirty bitmap for this
vfio_dma. Now we can try to get the dirty log from the iommu before
falling back to that coarse decision.

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/vfio/vfio_iommu_type1.c | 97 -
 1 file changed, 94 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 3b8522ebf955..1cd10f3e7ed4 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -999,6 +999,25 @@ static bool vfio_group_supports_hwdbm(struct vfio_group 
*group)
return true;
 }
 
+static int vfio_iommu_dirty_log_clear(struct vfio_iommu *iommu,
+ dma_addr_t start_iova, size_t size,
+ unsigned long *bitmap_buffer,
+ dma_addr_t base_iova, size_t pgsize)
+{
+   struct vfio_domain *d;
+   unsigned long pgshift = __ffs(pgsize);
+   int ret;
+
+   list_for_each_entry(d, >domain_list, next) {
+   ret = iommu_clear_dirty_log(d->domain, start_iova, size,
+   bitmap_buffer, base_iova, pgshift);
+   if (ret)
+   return ret;
+   }
+
+   return 0;
+}
+
 static int update_user_bitmap(u64 __user *bitmap, struct vfio_iommu *iommu,
  struct vfio_dma *dma, dma_addr_t base_iova,
  size_t pgsize)
@@ -1010,13 +1029,28 @@ static int update_user_bitmap(u64 __user *bitmap, 
struct vfio_iommu *iommu,
unsigned long shift = bit_offset % BITS_PER_LONG;
unsigned long leftover;
 
+   if (iommu->pinned_page_dirty_scope || !dma->iommu_mapped)
+   goto bitmap_done;
+
+   /* try to get dirty log from IOMMU */
+   if (!iommu->num_non_hwdbm_groups) {
+   struct vfio_domain *d;
+
+   list_for_each_entry(d, >domain_list, next) {
+   if (iommu_sync_dirty_log(d->domain, dma->iova, 
dma->size,
+   dma->bitmap, dma->iova, 
pgshift))
+   return -EFAULT;
+   }
+   goto bitmap_done;
+   }
+
/*
 * mark all pages dirty if any IOMMU capable device is not able
 * to report dirty pages and all pages are pinned and mapped.
 */
-   if (!iommu->pinned_page_dirty_scope && dma->iommu_mapped)
-   bitmap_set(dma->bitmap, 0, nbits);
+   bitmap_set(dma->bitmap, 0, nbits);
 
+bitmap_done:
if (shift) {
bitmap_shift_left(dma->bitmap, dma->bitmap, shift,
  nbits + shift);
@@ -1078,6 +1112,18 @@ static int vfio_iova_dirty_bitmap(u64 __user *bitmap, 
struct vfio_iommu *iommu,
 */
bitmap_clear(dma->bitmap, 0, dma->size >> pgshift);
vfio_dma_populate_bitmap(dma, pgsize);
+
+   /* Clear iommu dirty log to re-enable dirty log tracking */
+   if (!iommu->pinned_page_dirty_scope &&
+   dma->iommu_mapped && !iommu->num_non_hwdbm_groups) {
+   ret = vfio_iommu_dirty_log_clear(iommu, dma->iova,
+   dma->size, dma->bitmap, dma->iova,
+   pgsize);
+   if (ret) {
+   pr_warn("dma dirty log clear failed!\n");
+   return ret;
+   }
+   }
}
return 0;
 }
@@ -2780,6 +2826,48 @@ static int vfio_iommu_type1_unmap_dma(struct vfio_iommu 
*iommu,
-EFAULT : 0;
 }
 
+static void vfio_dma_dirty_log_start(struct vfio_iommu *iommu,
+struct vfio_dma *dma)
+{
+   struct vfio_domain *d;
+
+   list_for_each_entry(d, >domain_list, next) {
+   /* Go through all domain anyway even if we fail */
+   iommu_split_block(d->domain, dma->iova, dma->size);
+   }
+}
+
+static void vfio_dma_dirty_log_stop(struct vfio_iommu *iommu,
+   struct vfio_dma *dma)
+{
+   struct vfio_domain *d;
+
+   list_for_each_entry(d, >domain_list, next) {
+   /* Go through all domain anyway even if we fail */
+   iommu_merge_page(d->domain, dma->iova, dma->size,
+d->prot | dma->prot);
+   }
+}
+
+static void vfio_iommu_dirty_log_switch(struct vfio_iommu *iommu, bool start)
+{
+   struct rb_node *n;
+
+   /* Split and merge even if all iommu don't support HWDBM now */
+   for (n = rb_first(>dma_list); n; n = rb_next(n)) {
+   struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
+
+   if (!dma->iommu_mapped)
+   

[RFC PATCH 11/11] vfio/iommu_type1: Add support for manual dirty log clear

2021-01-28 Thread Keqian Zhu
From: jiangkunkun 

Previously, we cleared the dirty log immediately after syncing it to
userspace. This may cause redundant dirty handling if userspace
processes the dirty log iteratively:

After vfio clears the dirty log, new dirty state starts to accumulate.
This new dirty state will be reported to userspace in the next round
even if it was generated before userspace handled the same dirty page.

In other words, we should minimize the time gap between dirty log
clearing and dirty log handling. To do so, give userspace an explicit
interface to clear the dirty log.
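
For illustration only, here is a rough sketch of the userspace flow this enables;
the helper names below are hypothetical placeholders for the VFIO dirty-page
ioctls, not the actual uapi added by this patch:

/* Hypothetical helpers standing in for the real VFIO dirty-page ioctls */
extern unsigned long *get_dirty_bitmap(void);            /* sync, do not clear */
extern void handle_dirty_pages(const unsigned long *b);  /* e.g. migrate pages */
extern void clear_dirty_bitmap(const unsigned long *b);  /* explicit clear     */

/*
 * With manual clear, the log is cleared only after the pages have been
 * handled, so dirty state produced in between is neither lost nor reported
 * redundantly in the next iteration.
 */
static void migration_iteration(void)
{
        unsigned long *bitmap = get_dirty_bitmap();

        handle_dirty_pages(bitmap);
        clear_dirty_bitmap(bitmap);
}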

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/vfio/vfio_iommu_type1.c | 103 ++--
 include/uapi/linux/vfio.h   |  28 -
 2 files changed, 126 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 1cd10f3e7ed4..a32dc684b86e 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -73,6 +73,7 @@ struct vfio_iommu {
boolv2;
boolnesting;
booldirty_page_tracking;
+   booldirty_log_manual_clear;
boolpinned_page_dirty_scope;
uint64_tnum_non_hwdbm_groups;
 };
@@ -1018,6 +1019,78 @@ static int vfio_iommu_dirty_log_clear(struct vfio_iommu 
*iommu,
return 0;
 }
 
+static int vfio_iova_dirty_log_clear(u64 __user *bitmap,
+struct vfio_iommu *iommu,
+dma_addr_t iova, size_t size,
+size_t pgsize)
+{
+   struct vfio_dma *dma;
+   struct rb_node *n;
+   dma_addr_t start_iova, end_iova, riova;
+   unsigned long pgshift = __ffs(pgsize);
+   unsigned long bitmap_size;
+   unsigned long *bitmap_buffer = NULL;
+   bool clear_valid;
+   int rs, re, start, end, dma_offset;
+   int ret = 0;
+
+   bitmap_size = DIRTY_BITMAP_BYTES(size >> pgshift);
+   bitmap_buffer = kvmalloc(bitmap_size, GFP_KERNEL);
+   if (!bitmap_buffer) {
+   ret = -ENOMEM;
+   goto out;
+   }
+
+   if (copy_from_user(bitmap_buffer, bitmap, bitmap_size)) {
+   ret = -EFAULT;
+   goto out;
+   }
+
+   for (n = rb_first(>dma_list); n; n = rb_next(n)) {
+   dma = rb_entry(n, struct vfio_dma, node);
+   if (!dma->iommu_mapped)
+   continue;
+   if ((dma->iova + dma->size - 1) < iova)
+   continue;
+   if (dma->iova > iova + size - 1)
+   break;
+
+   start_iova = max(iova, dma->iova);
+   end_iova = min(iova + size, dma->iova + dma->size);
+
+   /* Similar logic as the tail of vfio_iova_dirty_bitmap */
+
+   clear_valid = false;
+   start = (start_iova - iova) >> pgshift;
+   end = (end_iova - iova) >> pgshift;
+   bitmap_for_each_set_region(bitmap_buffer, rs, re, start, end) {
+   clear_valid = true;
+   riova = iova + (rs << pgshift);
+   dma_offset = (riova - dma->iova) >> pgshift;
+   bitmap_clear(dma->bitmap, dma_offset, re - rs);
+   }
+
+   if (clear_valid)
+   vfio_dma_populate_bitmap(dma, pgsize);
+
+   if (clear_valid && !iommu->pinned_page_dirty_scope &&
+   dma->iommu_mapped && !iommu->num_non_hwdbm_groups) {
+   ret = vfio_iommu_dirty_log_clear(iommu, start_iova,
+   end_iova - start_iova,  bitmap_buffer,
+   iova, pgsize);
+   if (ret) {
+   pr_warn("dma dirty log clear failed!\n");
+   goto out;
+   }
+   }
+
+   }
+
+out:
+   kfree(bitmap_buffer);
+   return ret;
+}
+
 static int update_user_bitmap(u64 __user *bitmap, struct vfio_iommu *iommu,
  struct vfio_dma *dma, dma_addr_t base_iova,
  size_t pgsize)
@@ -1067,6 +1140,10 @@ static int update_user_bitmap(u64 __user *bitmap, struct 
vfio_iommu *iommu,
 DIRTY_BITMAP_BYTES(nbits + shift)))
return -EFAULT;
 
+   if (shift && iommu->dirty_log_manual_clear)
+   bitmap_shift_right(dma->bitmap, dma->bitmap, shift,
+  nbits + shift);
+
return 0;
 }
 
@@ -1105,6 +1182,9 @@ static int vfio_iova_dirty_bitmap(u64 __user *bitmap, 
struct vfio_iommu *iommu,
if (ret)
return ret;
 
+   if (iommu->dirty_log_manual_clear)
+   continue;
+
/*
 * Re-populate bitmap to include 

[RFC PATCH 09/11] vfio/iommu_type1: Add HWDBM status maintenance

2021-01-28 Thread Keqian Zhu
From: jiangkunkun 

We are going to optimize dirty log tracking based on the iommu HWDBM
feature, but the dirty log from the iommu is useful only when all
iommu-backed groups are attached to iommus that support HWDBM. This
maintains a counter of groups that lack the feature.

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/vfio/vfio_iommu_type1.c | 33 +
 1 file changed, 33 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 0b4dedaa9128..3b8522ebf955 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -74,6 +74,7 @@ struct vfio_iommu {
boolnesting;
booldirty_page_tracking;
boolpinned_page_dirty_scope;
+   uint64_tnum_non_hwdbm_groups;
 };
 
 struct vfio_domain {
@@ -102,6 +103,7 @@ struct vfio_group {
struct list_headnext;
boolmdev_group; /* An mdev group */
boolpinned_page_dirty_scope;
+   booliommu_hwdbm;/* Valid for non-mdev group */
 };
 
 struct vfio_iova {
@@ -976,6 +978,27 @@ static void vfio_update_pgsize_bitmap(struct vfio_iommu 
*iommu)
}
 }
 
+static int vfio_dev_has_feature(struct device *dev, void *data)
+{
+   enum iommu_dev_features *feat = data;
+
+   if (!iommu_dev_has_feature(dev, *feat))
+   return -ENODEV;
+
+   return 0;
+}
+
+static bool vfio_group_supports_hwdbm(struct vfio_group *group)
+{
+   enum iommu_dev_features feat = IOMMU_DEV_FEAT_HWDBM;
+
+   if (iommu_group_for_each_dev(group->iommu_group, ,
+vfio_dev_has_feature))
+   return false;
+
+   return true;
+}
+
 static int update_user_bitmap(u64 __user *bitmap, struct vfio_iommu *iommu,
  struct vfio_dma *dma, dma_addr_t base_iova,
  size_t pgsize)
@@ -2189,6 +2212,12 @@ static int vfio_iommu_type1_attach_group(void 
*iommu_data,
 * capable via the page pinning interface.
 */
iommu->pinned_page_dirty_scope = false;
+
+   /* Update the hwdbm status of group and iommu */
+   group->iommu_hwdbm = vfio_group_supports_hwdbm(group);
+   if (!group->iommu_hwdbm)
+   iommu->num_non_hwdbm_groups++;
+
mutex_unlock(>lock);
vfio_iommu_resv_free(_resv_regions);
 
@@ -2342,6 +2371,7 @@ static void vfio_iommu_type1_detach_group(void 
*iommu_data,
struct vfio_domain *domain;
struct vfio_group *group;
bool update_dirty_scope = false;
+   bool update_iommu_hwdbm = false;
LIST_HEAD(iova_copy);
 
mutex_lock(>lock);
@@ -2380,6 +2410,7 @@ static void vfio_iommu_type1_detach_group(void 
*iommu_data,
 
vfio_iommu_detach_group(domain, group);
update_dirty_scope = !group->pinned_page_dirty_scope;
+   update_iommu_hwdbm = !group->iommu_hwdbm;
list_del(>next);
kfree(group);
/*
@@ -2417,6 +2448,8 @@ static void vfio_iommu_type1_detach_group(void 
*iommu_data,
 */
if (update_dirty_scope)
update_pinned_page_dirty_scope(iommu);
+   if (update_iommu_hwdbm)
+   iommu->num_non_hwdbm_groups--;
mutex_unlock(>lock);
 }
 
-- 
2.19.1



[RFC PATCH 06/11] iommu/arm-smmu-v3: Scan leaf TTD to sync hardware dirty log

2021-01-28 Thread Keqian Zhu
From: jiangkunkun 

During dirty log tracking, the user will try to retrieve the dirty log
from the iommu if it supports hardware dirty logging. This adds a new
interface named sync_dirty_log to the iommu layer, implemented by arm
smmuv3, which scans leaf TTDs and treats a TTD as dirty if it is
writable (since we only enable HTTU for stage 1, this means checking
that AP[2] is not set).
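
As a quick worked example (illustration only) of the bitmap math used by the new
helper below: with a 4K bitmap granule (bitmap_pgshift = 12), a dirty 2MB leaf
block at iova 0x80200000 inside a tracked range starting at base_iova 0x80000000
is recorded as:

        offset = (iova - base_iova) >> bitmap_pgshift = 0x200000 >> 12 = 512
        nbits  = size >> bitmap_pgshift               = 0x200000 >> 12 = 512

so bitmap_set(bitmap, 512, 512) marks bits 512..1023, i.e. the 512 4K pages
covered by that block, as dirty.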

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 27 +++
 drivers/iommu/io-pgtable-arm.c  | 90 +
 drivers/iommu/iommu.c   | 41 ++
 include/linux/io-pgtable.h  |  4 +
 include/linux/iommu.h   | 17 
 5 files changed, 179 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 2434519e4bb6..43d0536b429a 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2548,6 +2548,32 @@ static size_t arm_smmu_merge_page(struct iommu_domain 
*domain, unsigned long iov
return ops->merge_page(ops, iova, paddr, size, prot);
 }
 
+static int arm_smmu_sync_dirty_log(struct iommu_domain *domain,
+  unsigned long iova, size_t size,
+  unsigned long *bitmap,
+  unsigned long base_iova,
+  unsigned long bitmap_pgshift)
+{
+   struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
+   struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
+
+   if (!(smmu->features & ARM_SMMU_FEAT_HTTU_HD)) {
+   dev_err(smmu->dev, "don't support HTTU_HD and sync dirty 
log\n");
+   return -EPERM;
+   }
+
+   if (!ops || !ops->sync_dirty_log) {
+   pr_err("don't support sync dirty log\n");
+   return -ENODEV;
+   }
+
+   /* To ensure all inflight transactions are completed */
+   arm_smmu_flush_iotlb_all(domain);
+
+   return ops->sync_dirty_log(ops, iova, size, bitmap,
+   base_iova, bitmap_pgshift);
+}
+
 static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
 {
return iommu_fwspec_add_ids(dev, args->args, 1);
@@ -2649,6 +2675,7 @@ static struct iommu_ops arm_smmu_ops = {
.domain_set_attr= arm_smmu_domain_set_attr,
.split_block= arm_smmu_split_block,
.merge_page = arm_smmu_merge_page,
+   .sync_dirty_log = arm_smmu_sync_dirty_log,
.of_xlate   = arm_smmu_of_xlate,
.get_resv_regions   = arm_smmu_get_resv_regions,
.put_resv_regions   = generic_iommu_put_resv_regions,
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 17390f258eb1..6cfe1ef3fedd 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -877,6 +877,95 @@ static size_t arm_lpae_merge_page(struct io_pgtable_ops 
*ops, unsigned long iova
return __arm_lpae_merge_page(data, iova, paddr, size, lvl, ptep, prot);
 }
 
+static int __arm_lpae_sync_dirty_log(struct arm_lpae_io_pgtable *data,
+unsigned long iova, size_t size,
+int lvl, arm_lpae_iopte *ptep,
+unsigned long *bitmap,
+unsigned long base_iova,
+unsigned long bitmap_pgshift)
+{
+   arm_lpae_iopte pte;
+   struct io_pgtable *iop = >iop;
+   size_t base, next_size;
+   unsigned long offset;
+   int nbits, ret;
+
+   if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
+   return -EINVAL;
+
+   ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
+   pte = READ_ONCE(*ptep);
+   if (WARN_ON(!pte))
+   return -EINVAL;
+
+   if (size == ARM_LPAE_BLOCK_SIZE(lvl, data)) {
+   if (iopte_leaf(pte, lvl, iop->fmt)) {
+   if (pte & ARM_LPAE_PTE_AP_RDONLY)
+   return 0;
+
+   /* It is writable, set the bitmap */
+   nbits = size >> bitmap_pgshift;
+   offset = (iova - base_iova) >> bitmap_pgshift;
+   bitmap_set(bitmap, offset, nbits);
+   return 0;
+   } else {
+   /* To traverse next level */
+   next_size = ARM_LPAE_BLOCK_SIZE(lvl + 1, data);
+   ptep = iopte_deref(pte, data);
+   for (base = 0; base < size; base += next_size) {
+   ret = __arm_lpae_sync_dirty_log(data,
+   iova + base, next_size, lvl + 1,
+   ptep, bitmap, base_iova, 
bitmap_pgshift);
+   if 

[RFC PATCH 05/11] iommu/arm-smmu-v3: Merge a span of page to block descriptor

2021-01-28 Thread Keqian Zhu
From: jiangkunkun 

When stopping dirty log tracking, we need to recover all block descriptors
that were split when dirty log tracking started. This adds a new
interface named merge_page to the iommu layer, implemented by arm smmuv3,
which reinstalls block mappings and unmaps the corresponding span of page
mappings.

It is the caller's duty to find contiguous physical memory.

While merging pages, other interfaces are not expected to be in use, so
no race condition exists. We flush all iotlbs after the merge procedure
completes to ease the pressure on the iommu, as we will generally merge a
huge range of page mappings.

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 20 ++
 drivers/iommu/io-pgtable-arm.c  | 78 +
 drivers/iommu/iommu.c   | 75 
 include/linux/io-pgtable.h  |  2 +
 include/linux/iommu.h   | 10 +++
 5 files changed, 185 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 5469f4fca820..2434519e4bb6 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2529,6 +2529,25 @@ static size_t arm_smmu_split_block(struct iommu_domain 
*domain,
return ops->split_block(ops, iova, size);
 }
 
+static size_t arm_smmu_merge_page(struct iommu_domain *domain, unsigned long 
iova,
+ phys_addr_t paddr, size_t size, int prot)
+{
+   struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
+   struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
+
+   if (!(smmu->features & (ARM_SMMU_FEAT_BBML1 | ARM_SMMU_FEAT_BBML2))) {
+   dev_err(smmu->dev, "don't support BBML1/2 and merge page\n");
+   return 0;
+   }
+
+   if (!ops || !ops->merge_page) {
+   pr_err("don't support merge page\n");
+   return 0;
+   }
+
+   return ops->merge_page(ops, iova, paddr, size, prot);
+}
+
 static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
 {
return iommu_fwspec_add_ids(dev, args->args, 1);
@@ -2629,6 +2648,7 @@ static struct iommu_ops arm_smmu_ops = {
.domain_get_attr= arm_smmu_domain_get_attr,
.domain_set_attr= arm_smmu_domain_set_attr,
.split_block= arm_smmu_split_block,
+   .merge_page = arm_smmu_merge_page,
.of_xlate   = arm_smmu_of_xlate,
.get_resv_regions   = arm_smmu_get_resv_regions,
.put_resv_regions   = generic_iommu_put_resv_regions,
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index f3b7f7115e38..17390f258eb1 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -800,6 +800,83 @@ static size_t arm_lpae_split_block(struct io_pgtable_ops 
*ops,
return __arm_lpae_split_block(data, iova, size, lvl, ptep);
 }
 
+static size_t __arm_lpae_merge_page(struct arm_lpae_io_pgtable *data,
+   unsigned long iova, phys_addr_t paddr,
+   size_t size, int lvl, arm_lpae_iopte *ptep,
+   arm_lpae_iopte prot)
+{
+   arm_lpae_iopte pte, *tablep;
+   struct io_pgtable *iop = >iop;
+   struct io_pgtable_cfg *cfg = >iop.cfg;
+
+   if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
+   return 0;
+
+   ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
+   pte = READ_ONCE(*ptep);
+   if (WARN_ON(!pte))
+   return 0;
+
+   if (size == ARM_LPAE_BLOCK_SIZE(lvl, data)) {
+   if (iopte_leaf(pte, lvl, iop->fmt))
+   return size;
+
+   /* Race does not exist */
+   if (cfg->bbml == 1) {
+   prot |= ARM_LPAE_PTE_NT;
+   __arm_lpae_init_pte(data, paddr, prot, lvl, ptep);
+   io_pgtable_tlb_flush_walk(iop, iova, size,
+ ARM_LPAE_GRANULE(data));
+
+   prot &= ~(ARM_LPAE_PTE_NT);
+   __arm_lpae_init_pte(data, paddr, prot, lvl, ptep);
+   } else {
+   __arm_lpae_init_pte(data, paddr, prot, lvl, ptep);
+   }
+
+   tablep = iopte_deref(pte, data);
+   __arm_lpae_free_pgtable(data, lvl + 1, tablep);
+   return size;
+   } else if (iopte_leaf(pte, lvl, iop->fmt)) {
+   /* The size is too small, already merged */
+   return size;
+   }
+
+   /* Keep on walkin */
+   ptep = iopte_deref(pte, data);
+   return __arm_lpae_merge_page(data, iova, paddr, size, lvl + 1, ptep, 
prot);
+}
+
+static size_t arm_lpae_merge_page(struct io_pgtable_ops *ops, unsigned long 
iova,
+

[RFC PATCH 02/11] iommu/arm-smmu-v3: Enable HTTU for SMMU stage1 mapping

2021-01-28 Thread Keqian Zhu
From: jiangkunkun 

If HTTU is supported, we enable the HA/HD bits in the SMMU CD (stage 1
mapping), and set the DBM bit for writable TTDs.

The dirty state information is encoded using the access permission
bits AP[2] (stage 1) or S2AP[1] (stage 2) in conjunction with the
DBM (Dirty Bit Modifier) bit: a set DBM bit means writable, and a
cleared AP[2]/S2AP[1] bit (together with DBM set) means dirty.
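
To make this encoding concrete, here is a minimal sketch (illustration only, not
part of the patch) of how a stage 1 leaf TTD could be classified; the bit
positions mirror ARM_LPAE_PTE_DBM (bit 51, added below) and ARM_LPAE_PTE_AP_RDONLY
(AP[2], bit 7) as used by io-pgtable-arm, and the enum/function names are made up:

#include <linux/types.h>

#define EX_PTE_DBM              (1ULL << 51)    /* Dirty Bit Modifier: writable */
#define EX_PTE_AP_RDONLY        (1ULL << 7)     /* AP[2]: read-only / clean     */

enum ex_ttd_state { EX_TTD_READONLY, EX_TTD_WRITABLE_CLEAN, EX_TTD_WRITABLE_DIRTY };

static enum ex_ttd_state ex_classify_ttd(u64 pte)
{
        if (!(pte & EX_PTE_DBM))
                return EX_TTD_READONLY;         /* DBM=0: HW never marks dirty  */
        if (pte & EX_PTE_AP_RDONLY)
                return EX_TTD_WRITABLE_CLEAN;   /* DBM=1, AP[2]=1: clean        */
        return EX_TTD_WRITABLE_DIRTY;           /* DBM=1, AP[2]=0: dirtied by HW */
}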

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 5 +
 drivers/iommu/io-pgtable-arm.c  | 7 ++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 0f0fe71cc10d..8cc9d7536b08 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1036,6 +1036,11 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_domain 
*smmu_domain, int ssid,
FIELD_PREP(CTXDESC_CD_0_ASID, cd->asid) |
CTXDESC_CD_0_V;
 
+   if (smmu->features & ARM_SMMU_FEAT_HTTU_HA)
+   val |= CTXDESC_CD_0_HA;
+   if (smmu->features & ARM_SMMU_FEAT_HTTU_HD)
+   val |= CTXDESC_CD_0_HD;
+
/* STALL_MODEL==0b10 && CD.S==0 is ILLEGAL */
if (smmu->features & ARM_SMMU_FEAT_STALL_FORCE)
val |= CTXDESC_CD_0_S;
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 87def58e79b5..e299a44808ae 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -72,6 +72,7 @@
 
 #define ARM_LPAE_PTE_NSTABLE   (((arm_lpae_iopte)1) << 63)
 #define ARM_LPAE_PTE_XN(((arm_lpae_iopte)3) << 53)
+#define ARM_LPAE_PTE_DBM   (((arm_lpae_iopte)1) << 51)
 #define ARM_LPAE_PTE_AF(((arm_lpae_iopte)1) << 10)
 #define ARM_LPAE_PTE_SH_NS (((arm_lpae_iopte)0) << 8)
 #define ARM_LPAE_PTE_SH_OS (((arm_lpae_iopte)2) << 8)
@@ -81,7 +82,7 @@
 
 #define ARM_LPAE_PTE_ATTR_LO_MASK  (((arm_lpae_iopte)0x3ff) << 2)
 /* Ignore the contiguous bit for block splitting */
-#define ARM_LPAE_PTE_ATTR_HI_MASK  (((arm_lpae_iopte)6) << 52)
+#define ARM_LPAE_PTE_ATTR_HI_MASK  (((arm_lpae_iopte)13) << 51)
 #define ARM_LPAE_PTE_ATTR_MASK (ARM_LPAE_PTE_ATTR_LO_MASK |\
 ARM_LPAE_PTE_ATTR_HI_MASK)
 /* Software bit for solving coherency races */
@@ -379,6 +380,7 @@ static int __arm_lpae_map(struct arm_lpae_io_pgtable *data, 
unsigned long iova,
 static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
   int prot)
 {
+   struct io_pgtable_cfg *cfg = >iop.cfg;
arm_lpae_iopte pte;
 
if (data->iop.fmt == ARM_64_LPAE_S1 ||
@@ -386,6 +388,9 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct 
arm_lpae_io_pgtable *data,
pte = ARM_LPAE_PTE_nG;
if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ))
pte |= ARM_LPAE_PTE_AP_RDONLY;
+   else if (cfg->httu_hd)
+   pte |= ARM_LPAE_PTE_DBM;
+
if (!(prot & IOMMU_PRIV))
pte |= ARM_LPAE_PTE_AP_UNPRIV;
} else {
-- 
2.19.1



[RFC PATCH 04/11] iommu/arm-smmu-v3: Split block descriptor to a span of page

2021-01-28 Thread Keqian Zhu
From: jiangkunkun 

A block descriptor is not a proper granule for dirty log tracking. This
adds a new interface named split_block to the iommu layer, implemented by
arm smmuv3, which splits a block descriptor into an equivalent span of
page descriptors.

While splitting blocks, other interfaces are not expected to be in use,
so no race condition exists. We flush all iotlbs after the split
procedure completes to ease the pressure on the iommu, as we will
generally split a huge range of block mappings.

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  20 
 drivers/iommu/io-pgtable-arm.c  | 122 
 drivers/iommu/iommu.c   |  40 +++
 include/linux/io-pgtable.h  |   2 +
 include/linux/iommu.h   |  10 ++
 5 files changed, 194 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 9208881a571c..5469f4fca820 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2510,6 +2510,25 @@ static int arm_smmu_domain_set_attr(struct iommu_domain 
*domain,
return ret;
 }
 
+static size_t arm_smmu_split_block(struct iommu_domain *domain,
+  unsigned long iova, size_t size)
+{
+   struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
+   struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
+
+   if (!(smmu->features & (ARM_SMMU_FEAT_BBML1 | ARM_SMMU_FEAT_BBML2))) {
+   dev_err(smmu->dev, "don't support BBML1/2 and split block\n");
+   return 0;
+   }
+
+   if (!ops || !ops->split_block) {
+   pr_err("don't support split block\n");
+   return 0;
+   }
+
+   return ops->split_block(ops, iova, size);
+}
+
 static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
 {
return iommu_fwspec_add_ids(dev, args->args, 1);
@@ -2609,6 +2628,7 @@ static struct iommu_ops arm_smmu_ops = {
.device_group   = arm_smmu_device_group,
.domain_get_attr= arm_smmu_domain_get_attr,
.domain_set_attr= arm_smmu_domain_set_attr,
+   .split_block= arm_smmu_split_block,
.of_xlate   = arm_smmu_of_xlate,
.get_resv_regions   = arm_smmu_get_resv_regions,
.put_resv_regions   = generic_iommu_put_resv_regions,
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index e299a44808ae..f3b7f7115e38 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -79,6 +79,8 @@
 #define ARM_LPAE_PTE_SH_IS (((arm_lpae_iopte)3) << 8)
 #define ARM_LPAE_PTE_NS(((arm_lpae_iopte)1) << 5)
 #define ARM_LPAE_PTE_VALID (((arm_lpae_iopte)1) << 0)
+/* Block descriptor bits */
+#define ARM_LPAE_PTE_NT(((arm_lpae_iopte)1) << 16)
 
 #define ARM_LPAE_PTE_ATTR_LO_MASK  (((arm_lpae_iopte)0x3ff) << 2)
 /* Ignore the contiguous bit for block splitting */
@@ -679,6 +681,125 @@ static phys_addr_t arm_lpae_iova_to_phys(struct 
io_pgtable_ops *ops,
return iopte_to_paddr(pte, data) | iova;
 }
 
+static size_t __arm_lpae_split_block(struct arm_lpae_io_pgtable *data,
+unsigned long iova, size_t size, int lvl,
+arm_lpae_iopte *ptep);
+
+static size_t arm_lpae_do_split_blk(struct arm_lpae_io_pgtable *data,
+   unsigned long iova, size_t size,
+   arm_lpae_iopte blk_pte, int lvl,
+   arm_lpae_iopte *ptep)
+{
+   struct io_pgtable_cfg *cfg = >iop.cfg;
+   arm_lpae_iopte pte, *tablep;
+   phys_addr_t blk_paddr;
+   size_t tablesz = ARM_LPAE_GRANULE(data);
+   size_t split_sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
+   int i;
+
+   if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
+   return 0;
+
+   tablep = __arm_lpae_alloc_pages(tablesz, GFP_ATOMIC, cfg);
+   if (!tablep)
+   return 0;
+
+   blk_paddr = iopte_to_paddr(blk_pte, data);
+   pte = iopte_prot(blk_pte);
+   for (i = 0; i < tablesz / sizeof(pte); i++, blk_paddr += split_sz)
+   __arm_lpae_init_pte(data, blk_paddr, pte, lvl, [i]);
+
+   if (cfg->bbml == 1) {
+   /* Race does not exist */
+   blk_pte |= ARM_LPAE_PTE_NT;
+   __arm_lpae_set_pte(ptep, blk_pte, cfg);
+   io_pgtable_tlb_flush_walk(>iop, iova, size, size);
+   }
+   /* Race does not exist */
+   pte = arm_lpae_install_table(tablep, ptep, blk_pte, cfg);
+
+   /* Have splited it into page? */
+   if (lvl == (ARM_LPAE_MAX_LEVELS - 1))
+   return size;
+
+   /* Go back to lvl - 1 */
+   ptep 

[RFC PATCH 03/11] iommu/arm-smmu-v3: Add feature detection for BBML

2021-01-28 Thread Keqian Zhu
From: jiangkunkun 

When altering a translation table descriptor for certain reasons, the
break-before-make procedure is required. But it might cause problems when
the TTD is live: the I/O streams might not tolerate translation faults.

If the SMMU supports BBML level 1 or BBML level 2, we can change the block
size without using break-before-make.

This adds feature detection for BBML only; no functional change.
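
For context, a rough sketch (illustration only; helper names are placeholders) of
the BBML level 1 update sequence as used later in this series when replacing a
live block descriptor, mirroring the nT-bit handling in patches 4 and 5:

#include <linux/compiler.h>
#include <linux/types.h>

#define EX_PTE_NT       (1ULL << 16)    /* block descriptor nT bit */

extern void ex_tlb_flush_walk(void);    /* stand-in for io_pgtable_tlb_flush_walk() */

static void ex_bbml1_replace(u64 *ptep, u64 new_pte)
{
        /* 1. Install the new descriptor with nT set, so it is not cached by TLBs */
        WRITE_ONCE(*ptep, new_pte | EX_PTE_NT);
        /* 2. Invalidate any cached copies of the old descriptor */
        ex_tlb_flush_walk();
        /* 3. Install the final descriptor with nT cleared */
        WRITE_ONCE(*ptep, new_pte);
}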

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 24 -
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  6 ++
 include/linux/io-pgtable.h  |  1 +
 3 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 8cc9d7536b08..9208881a571c 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1947,7 +1947,7 @@ static int arm_smmu_domain_finalise_s2(struct 
arm_smmu_domain *smmu_domain,
 static int arm_smmu_domain_finalise(struct iommu_domain *domain,
struct arm_smmu_master *master)
 {
-   int ret;
+   int ret, bbml;
unsigned long ias, oas;
enum io_pgtable_fmt fmt;
struct io_pgtable_cfg pgtbl_cfg;
@@ -1988,12 +1988,20 @@ static int arm_smmu_domain_finalise(struct iommu_domain 
*domain,
return -EINVAL;
}
 
+   if (smmu->features & ARM_SMMU_FEAT_BBML2)
+   bbml = 2;
+   else if (smmu->features & ARM_SMMU_FEAT_BBML1)
+   bbml = 1;
+   else
+   bbml = 0;
+
pgtbl_cfg = (struct io_pgtable_cfg) {
.pgsize_bitmap  = smmu->pgsize_bitmap,
.ias= ias,
.oas= oas,
.httu_hd= smmu->features & ARM_SMMU_FEAT_HTTU_HD,
.coherent_walk  = smmu->features & ARM_SMMU_FEAT_COHERENCY,
+   .bbml   = bbml,
.tlb= _smmu_flush_ops,
.iommu_dev  = smmu->dev,
};
@@ -3328,6 +3336,20 @@ static int arm_smmu_device_hw_probe(struct 
arm_smmu_device *smmu)
 
/* IDR3 */
reg = readl_relaxed(smmu->base + ARM_SMMU_IDR3);
+   switch (FIELD_GET(IDR3_BBML, reg)) {
+   case IDR3_BBML0:
+   break;
+   case IDR3_BBML1:
+   smmu->features |= ARM_SMMU_FEAT_BBML1;
+   break;
+   case IDR3_BBML2:
+   smmu->features |= ARM_SMMU_FEAT_BBML2;
+   break;
+   default:
+   dev_err(smmu->dev, "unknown/unsupported BBM behavior level\n");
+   return -ENXIO;
+   }
+
if (FIELD_GET(IDR3_RIL, reg))
smmu->features |= ARM_SMMU_FEAT_RANGE_INV;
 
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index e91bea44519e..11e526ab7239 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -55,6 +55,10 @@
 #define IDR1_SIDSIZE   GENMASK(5, 0)
 
 #define ARM_SMMU_IDR3  0xc
+#define IDR3_BBML  GENMASK(12, 11)
+#define IDR3_BBML0 0
+#define IDR3_BBML1 1
+#define IDR3_BBML2 2
 #define IDR3_RIL   (1 << 10)
 
 #define ARM_SMMU_IDR5  0x14
@@ -612,6 +616,8 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_SVA  (1 << 17)
 #define ARM_SMMU_FEAT_HTTU_HA  (1 << 18)
 #define ARM_SMMU_FEAT_HTTU_HD  (1 << 19)
+#define ARM_SMMU_FEAT_BBML1(1 << 20)
+#define ARM_SMMU_FEAT_BBML2(1 << 21)
u32 features;
 
 #define ARM_SMMU_OPT_SKIP_PREFETCH (1 << 0)
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 1a00ea8562c7..26583beeb5d9 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -99,6 +99,7 @@ struct io_pgtable_cfg {
unsigned intoas;
boolhttu_hd;
boolcoherent_walk;
+   int bbml;
const struct iommu_flush_ops*tlb;
struct device   *iommu_dev;
 
-- 
2.19.1



[RFC PATCH 08/11] iommu/arm-smmu-v3: Add HWDBM device feature reporting

2021-01-28 Thread Keqian Zhu
From: jiangkunkun 

We have implemented the interfaces required to support iommu dirty log
tracking. The last step is reporting this capability to the upper-layer
user, so that the user can build higher-level policy on it.
This adds a new device feature named IOMMU_DEV_FEAT_HWDBM to the iommu
layer. For arm smmuv3, it is equivalent to ARM_SMMU_FEAT_HTTU_HD.

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 2 ++
 include/linux/iommu.h   | 1 +
 2 files changed, 3 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 0c24503d29d3..cbde0489cf31 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2629,6 +2629,8 @@ static bool arm_smmu_dev_has_feature(struct device *dev,
switch (feat) {
case IOMMU_DEV_FEAT_SVA:
return arm_smmu_master_sva_supported(master);
+   case IOMMU_DEV_FEAT_HWDBM:
+   return !!(master->smmu->features & ARM_SMMU_FEAT_HTTU_HD);
default:
return false;
}
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 1cb6cd0cfc7b..77e561ed57fd 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -160,6 +160,7 @@ struct iommu_resv_region {
 enum iommu_dev_features {
IOMMU_DEV_FEAT_AUX, /* Aux-domain feature */
IOMMU_DEV_FEAT_SVA, /* Shared Virtual Addresses */
+   IOMMU_DEV_FEAT_HWDBM,   /* Hardware Dirty Bit Management */
 };
 
 #define IOMMU_PASID_INVALID(-1U)
-- 
2.19.1



[RFC PATCH 00/11] vfio/iommu_type1: Implement dirty log tracking based on smmuv3 HTTU

2021-01-28 Thread Keqian Zhu
Hi all,

This patch series implement a new dirty log tracking method for vfio dma.

Intention:

As we know, vfio live migration is an important and valuable feature, but there
are still many hurdles to solve, including migration of interrupts, device state,
DMA dirty log tracking, etc.

For now, the only dirty log tracking interface is pinning. It has some drawbacks:
1. Only smart vendor drivers are aware of this.
2. It's coarse-grained; the pinned scope is generally bigger than what the device
   actually accesses.
3. It can't track dirty state continuously and precisely; vfio populates the whole
   pinned scope as dirty, so it doesn't work well with iterative dirty log handling.

About SMMU HTTU:

HTTU (Hardware Translation Table Update) is a feature of ARM SMMUv3; it can update
the access flag and/or the dirty state of a TTD (Translation Table Descriptor) in
hardware. With HTTU, a stage 1 TTD is classified into 3 types:

                         DBM bit    AP[2] (readonly bit)
1. writable_clean           1                1
2. writable_dirty           1                0
3. readonly                 0                1

If HTTU_HD (manage dirty state) is enabled, the smmu can change a TTD from
writable_clean to writable_dirty. Software can then scan the TTDs to sync the
dirty state into a dirty bitmap. With this feature, we can track the dirty log of
DMA continuously and precisely.

About this series:

Patch 1-3: Add feature detection for smmu HTTU and enable HTTU for smmu stage 1
           mapping. Also add feature detection for smmu BBML. We need to split
           block mappings when dirty log tracking starts and merge page mappings
           when it stops, which requires the break-before-make procedure. But
           that might cause problems when the TTD is live: the I/O streams might
           not tolerate translation faults. So BBML should be used.

Patch 4-7: Add four interfaces (split_block, merge_page, sync_dirty_log and
           clear_dirty_log) to the IOMMU layer; they are essential to implement
           dma dirty log tracking for vfio. We implement these interfaces for
           arm smmuv3.

Patch   8: Add HWDBM (Hardware Dirty Bit Management) device feature reporting to
           the IOMMU layer.

Patch 9-11: Implement a new dirty log tracking method for vfio based on iommu
           hwdbm. A new ioctl operation named VFIO_DIRTY_LOG_MANUAL_CLEAR is
           added, which can eliminate some redundant dirty handling in userspace.

Optimizations To Do:

1. We recognized that each smmu_domain (a vfio_container may have several
   smmu_domains) has its own stage 1 mapping, and we must scan all these mappings
   to sync the dirty state. We plan to refactor smmu_domain to support more than
   one smmu in one smmu_domain, so that these smmus can share the same stage 1
   mapping.
2. We also recognized that scanning TTDs is a performance hotspot. Recently, I
   implemented a SW/HW combined dirty log tracking at the MMU side [1], which can
   effectively solve this problem. This idea can be applied to the smmu side too.

Thanks,
Keqian


[1] 
https://lore.kernel.org/linux-arm-kernel/2021012612.27136-1-zhukeqi...@huawei.com/

jiangkunkun (11):
  iommu/arm-smmu-v3: Add feature detection for HTTU
  iommu/arm-smmu-v3: Enable HTTU for SMMU stage1 mapping
  iommu/arm-smmu-v3: Add feature detection for BBML
  iommu/arm-smmu-v3: Split block descriptor to a span of page
  iommu/arm-smmu-v3: Merge a span of page to block descriptor
  iommu/arm-smmu-v3: Scan leaf TTD to sync hardware dirty log
  iommu/arm-smmu-v3: Clear dirty log according to bitmap
  iommu/arm-smmu-v3: Add HWDBM device feature reporting
  vfio/iommu_type1: Add HWDBM status maintenance
  vfio/iommu_type1: Optimize dirty bitmap population based on iommu
HWDBM
  vfio/iommu_type1: Add support for manual dirty log clear

 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 138 ++-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  14 +
 drivers/iommu/io-pgtable-arm.c  | 392 +++-
 drivers/iommu/iommu.c   | 227 
 drivers/vfio/vfio_iommu_type1.c | 235 +++-
 include/linux/io-pgtable.h  |  14 +
 include/linux/iommu.h   |  55 +++
 include/uapi/linux/vfio.h   |  28 +-
 8 files changed, 1093 insertions(+), 10 deletions(-)

-- 
2.19.1


Re: [PATCH 6/6] media: uvcvideo: Use dma_alloc_noncontiguous API

2021-01-28 Thread Christoph Hellwig
I just included this patch as-is, but here are a few comments:

On Thu, Jan 28, 2021 at 03:58:37PM +0100, Christoph Hellwig wrote:
> +static void uvc_urb_dma_sync(struct uvc_urb *uvc_urb, bool for_device)
> +{
> + struct device *dma_dev = dma_dev = stream_to_dmadev(uvc_urb->stream);
> +
> + if (for_device)
> + dma_sync_sgtable_for_device(dma_dev, uvc_urb->sgt,
> + DMA_FROM_DEVICE);
> + else
> + dma_sync_sgtable_for_cpu(dma_dev, uvc_urb->sgt,
> +  DMA_FROM_DEVICE);
> +}

Given that we vmap the addresses, this also needs flush_kernel_vmap_range /
invalidate_kernel_vmap_range calls for VIVT architectures.
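
For reference, a rough sketch of what that could look like in uvc_urb_dma_sync()
(an illustration of the suggestion only, not a tested patch; it assumes
uvc_urb->buffer is the vmap'ed address and stream->urb_size its length):

static void uvc_urb_dma_sync(struct uvc_urb *uvc_urb, bool for_device)
{
        struct device *dma_dev = stream_to_dmadev(uvc_urb->stream);

        if (for_device) {
                /* Write back dirty CPU lines of the vmap alias first */
                flush_kernel_vmap_range(uvc_urb->buffer,
                                        uvc_urb->stream->urb_size);
                dma_sync_sgtable_for_device(dma_dev, uvc_urb->sgt,
                                            DMA_FROM_DEVICE);
        } else {
                dma_sync_sgtable_for_cpu(dma_dev, uvc_urb->sgt,
                                         DMA_FROM_DEVICE);
                /* Drop stale lines of the vmap alias before the CPU reads */
                invalidate_kernel_vmap_range(uvc_urb->buffer,
                                             uvc_urb->stream->urb_size);
        }
}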


[PATCH 6/6] media: uvcvideo: Use dma_alloc_noncontiguous API

2021-01-28 Thread Christoph Hellwig
From: Ricardo Ribalda 

On architectures where there is no coherent caching, such as ARM, use the
dma_alloc_noncontiguous API and handle the cache flushing manually using
dma_sync_sgtable().

With this patch, on the affected architectures we can measure up to a 20x
performance improvement in uvc_video_copy_data_work().

Eg: aarch64 with an external usb camera

NON_CONTIGUOUS
frames:  999
packets: 999
empty:   0 (0 %)
errors:  0
invalid: 0
pts: 0 early, 0 initial, 999 ok
scr: 0 count ok, 0 diff ok
sof: 2048 <= sof <= 0, freq 0.000 kHz
bytes 67034480 : duration 33303
FPS: 29.99
URB: 523446/4993 uS/qty: 104.836 avg 132.532 std 13.230 min 831.094 max (uS)
header: 76564/4993 uS/qty: 15.334 avg 15.229 std 3.438 min 186.875 max (uS)
latency: 468945/4992 uS/qty: 93.939 avg 132.577 std 9.531 min 824.010 max (uS)
decode: 54161/4993 uS/qty: 10.847 avg 6.313 std 1.614 min 111.458 max (uS)
raw decode speed: 9.931 Gbits/s
raw URB handling speed: 1.025 Gbits/s
throughput: 16.102 Mbits/s
URB decode CPU usage 0.162600 %

COHERENT
frames:  999
packets: 999
empty:   0 (0 %)
errors:  0
invalid: 0
pts: 0 early, 0 initial, 999 ok
scr: 0 count ok, 0 diff ok
sof: 2048 <= sof <= 0, freq 0.000 kHz
bytes 54683536 : duration 33302
FPS: 29.99
URB: 1478135/4000 uS/qty: 369.533 avg 390.357 std 22.968 min 3337.865 max (uS)
header: 79761/4000 uS/qty: 19.940 avg 18.495 std 1.875 min 336.719 max (uS)
latency: 281077/4000 uS/qty: 70.269 avg 83.102 std 5.104 min 735.000 max (uS)
decode: 1197057/4000 uS/qty: 299.264 avg 318.080 std 1.615 min 2806.667 max (uS)
raw decode speed: 365.470 Mbits/s
raw URB handling speed: 295.986 Mbits/s
throughput: 13.136 Mbits/s
URB decode CPU usage 3.594500 %

Signed-off-by: Ricardo Ribalda 
Signed-off-by: Christoph Hellwig 
---
 drivers/media/usb/uvc/uvc_video.c | 76 ++-
 drivers/media/usb/uvc/uvcvideo.h  |  4 +-
 2 files changed, 57 insertions(+), 23 deletions(-)

diff --git a/drivers/media/usb/uvc/uvc_video.c 
b/drivers/media/usb/uvc/uvc_video.c
index a6a441d92b9488..9c051b55dc7bc6 100644
--- a/drivers/media/usb/uvc/uvc_video.c
+++ b/drivers/media/usb/uvc/uvc_video.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1097,6 +1098,23 @@ static int uvc_video_decode_start(struct uvc_streaming 
*stream,
return data[0];
 }
 
+static inline struct device *stream_to_dmadev(struct uvc_streaming *stream)
+{
+   return bus_to_hcd(stream->dev->udev->bus)->self.sysdev;
+}
+
+static void uvc_urb_dma_sync(struct uvc_urb *uvc_urb, bool for_device)
+{
+   struct device *dma_dev = dma_dev = stream_to_dmadev(uvc_urb->stream);
+
+   if (for_device)
+   dma_sync_sgtable_for_device(dma_dev, uvc_urb->sgt,
+   DMA_FROM_DEVICE);
+   else
+   dma_sync_sgtable_for_cpu(dma_dev, uvc_urb->sgt,
+DMA_FROM_DEVICE);
+}
+
 /*
  * uvc_video_decode_data_work: Asynchronous memcpy processing
  *
@@ -1118,6 +1136,8 @@ static void uvc_video_copy_data_work(struct work_struct 
*work)
uvc_queue_buffer_release(op->buf);
}
 
+   uvc_urb_dma_sync(uvc_urb, true);
+
ret = usb_submit_urb(uvc_urb->urb, GFP_KERNEL);
if (ret < 0)
uvc_printk(KERN_ERR, "Failed to resubmit video URB (%d).\n",
@@ -1539,10 +1559,12 @@ static void uvc_video_complete(struct urb *urb)
 * Process the URB headers, and optionally queue expensive memcpy tasks
 * to be deferred to a work queue.
 */
+   uvc_urb_dma_sync(uvc_urb, false);
stream->decode(uvc_urb, buf, buf_meta);
 
/* If no async work is needed, resubmit the URB immediately. */
if (!uvc_urb->async_operations) {
+   uvc_urb_dma_sync(uvc_urb, true);
ret = usb_submit_urb(uvc_urb->urb, GFP_ATOMIC);
if (ret < 0)
uvc_printk(KERN_ERR,
@@ -1559,24 +1581,47 @@ static void uvc_video_complete(struct urb *urb)
  */
 static void uvc_free_urb_buffers(struct uvc_streaming *stream)
 {
+   struct device *dma_dev = dma_dev = stream_to_dmadev(stream);
struct uvc_urb *uvc_urb;
 
for_each_uvc_urb(uvc_urb, stream) {
if (!uvc_urb->buffer)
continue;
 
-#ifndef CONFIG_DMA_NONCOHERENT
-   usb_free_coherent(stream->dev->udev, stream->urb_size,
- uvc_urb->buffer, uvc_urb->dma);
-#else
-   kfree(uvc_urb->buffer);
-#endif
+   dma_vunmap_noncontiguous(dma_dev, uvc_urb->buffer);
+   dma_free_noncontiguous(dma_dev, stream->urb_size, uvc_urb->sgt,
+  uvc_urb->dma, DMA_BIDIRECTIONAL);
+
uvc_urb->buffer = NULL;
}
 
stream->urb_size = 0;
 }
 
+static bool uvc_alloc_urb_buffer(struct uvc_streaming *stream,
+struct uvc_urb *uvc_urb, gfp_t gfp_flags)
+{
+   

[PATCH 5/6] dma-iommu: implement ->alloc_noncontiguous

2021-01-28 Thread Christoph Hellwig
Implement support for allocating a non-contiguous DMA region.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/dma-iommu.c | 47 +++
 1 file changed, 47 insertions(+)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 65af875ba8495c..938a2856b4a455 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -756,6 +756,49 @@ static void *iommu_dma_alloc_remap(struct device *dev, 
size_t size,
return NULL;
 }
 
+#ifdef CONFIG_DMA_REMAP
+static struct sg_table *iommu_dma_alloc_noncontiguous(struct device *dev,
+   size_t size, dma_addr_t *dma_handle,
+   enum dma_data_direction dir, gfp_t gfp)
+{
+   unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;
+   struct dma_sgt_handle *sh;
+
+   sh = kmalloc(sizeof(*sh), gfp);
+   if (!sh)
+   return NULL;
+
+   sh->pages = __iommu_dma_alloc_noncontiguous(dev, size, >sgt,
+   dma_handle, gfp,
+   PAGE_KERNEL, 0);
+   if (!sh->pages)
+   goto out_free_sh;
+
+   if (sg_alloc_table_from_pages(>sgt, sh->pages, count, 0, size,
+ GFP_KERNEL))
+   goto out_free_pages;
+
+   return >sgt;
+
+out_free_pages:
+   __iommu_dma_free_pages(sh->pages, count);
+out_free_sh:
+   kfree(sh);
+   return NULL;
+}
+
+static void iommu_dma_free_noncontiguous(struct device *dev, size_t size,
+   struct sg_table *sgt, dma_addr_t dma_handle,
+   enum dma_data_direction dir)
+{
+   struct dma_sgt_handle *sh = sgt_handle(sgt);
+
+   __iommu_dma_unmap(dev, dma_handle, size);
+   __iommu_dma_free_pages(sh->pages, PAGE_ALIGN(size) >> PAGE_SHIFT);
+   sg_free_table(>sgt);
+}
+#endif /* CONFIG_DMA_REMAP */
+
 static void iommu_dma_sync_single_for_cpu(struct device *dev,
dma_addr_t dma_handle, size_t size, enum dma_data_direction dir)
 {
@@ -1271,6 +1314,10 @@ static const struct dma_map_ops iommu_dma_ops = {
.free   = iommu_dma_free,
.alloc_pages= dma_common_alloc_pages,
.free_pages = dma_common_free_pages,
+#ifdef CONFIG_DMA_REMAP
+   .alloc_noncontiguous= iommu_dma_alloc_noncontiguous,
+   .free_noncontiguous = iommu_dma_free_noncontiguous,
+#endif
.mmap   = iommu_dma_mmap,
.get_sgtable= iommu_dma_get_sgtable,
.map_page   = iommu_dma_map_page,
-- 
2.29.2



[PATCH 4/6] dma-iommu: refactor iommu_dma_alloc_remap

2021-01-28 Thread Christoph Hellwig
Split out a new helper that only allocates a sg_table worth of
memory without mapping it into contiguous kernel address space.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/dma-iommu.c | 66 +--
 1 file changed, 35 insertions(+), 31 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 255533faf90599..65af875ba8495c 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -661,24 +661,13 @@ static struct page **__iommu_dma_alloc_pages(struct 
device *dev,
return pages;
 }
 
-/**
- * iommu_dma_alloc_remap - Allocate and map a buffer contiguous in IOVA space
- * @dev: Device to allocate memory for. Must be a real device
- *  attached to an iommu_dma_domain
- * @size: Size of buffer in bytes
- * @dma_handle: Out argument for allocated DMA handle
- * @gfp: Allocation flags
- * @prot: pgprot_t to use for the remapped mapping
- * @attrs: DMA attributes for this allocation
- *
- * If @size is less than PAGE_SIZE, then a full CPU page will be allocated,
+/*
+ * If size is less than PAGE_SIZE, then a full CPU page will be allocated,
  * but an IOMMU which supports smaller pages might not map the whole thing.
- *
- * Return: Mapped virtual address, or NULL on failure.
  */
-static void *iommu_dma_alloc_remap(struct device *dev, size_t size,
-   dma_addr_t *dma_handle, gfp_t gfp, pgprot_t prot,
-   unsigned long attrs)
+static struct page **__iommu_dma_alloc_noncontiguous(struct device *dev,
+   size_t size, struct sg_table *sgt, dma_addr_t *dma_handle,
+   gfp_t gfp, pgprot_t prot, unsigned long attrs)
 {
struct iommu_domain *domain = iommu_get_dma_domain(dev);
struct iommu_dma_cookie *cookie = domain->iova_cookie;
@@ -687,9 +676,7 @@ static void *iommu_dma_alloc_remap(struct device *dev, 
size_t size,
int ioprot = dma_info_to_prot(DMA_BIDIRECTIONAL, coherent, attrs);
unsigned int count, min_size, alloc_sizes = domain->pgsize_bitmap;
struct page **pages;
-   struct sg_table sgt;
dma_addr_t iova;
-   void *vaddr;
 
*dma_handle = DMA_MAPPING_ERROR;
 
@@ -717,34 +704,26 @@ static void *iommu_dma_alloc_remap(struct device *dev, 
size_t size,
if (!iova)
goto out_free_pages;
 
-   if (sg_alloc_table_from_pages(, pages, count, 0, size, GFP_KERNEL))
+   if (sg_alloc_table_from_pages(sgt, pages, count, 0, size, GFP_KERNEL))
goto out_free_iova;
 
if (!(ioprot & IOMMU_CACHE)) {
struct scatterlist *sg;
int i;
 
-   for_each_sg(sgt.sgl, sg, sgt.orig_nents, i)
+   for_each_sg(sgt->sgl, sg, sgt->orig_nents, i)
arch_dma_prep_coherent(sg_page(sg), sg->length);
}
 
-   if (iommu_map_sg_atomic(domain, iova, sgt.sgl, sgt.orig_nents, ioprot)
+   if (iommu_map_sg_atomic(domain, iova, sgt->sgl, sgt->orig_nents, ioprot)
< size)
goto out_free_sg;
 
-   vaddr = dma_common_pages_remap(pages, size, prot,
-   __builtin_return_address(0));
-   if (!vaddr)
-   goto out_unmap;
-
*dma_handle = iova;
-   sg_free_table();
-   return vaddr;
+   return pages;
 
-out_unmap:
-   __iommu_dma_unmap(dev, iova, size);
 out_free_sg:
-   sg_free_table();
+   sg_free_table(sgt);
 out_free_iova:
iommu_dma_free_iova(cookie, iova, size, NULL);
 out_free_pages:
@@ -752,6 +731,31 @@ static void *iommu_dma_alloc_remap(struct device *dev, 
size_t size,
return NULL;
 }
 
+static void *iommu_dma_alloc_remap(struct device *dev, size_t size,
+   dma_addr_t *dma_handle, gfp_t gfp, pgprot_t prot,
+   unsigned long attrs)
+{
+   struct page **pages;
+   struct sg_table sgt;
+   void *vaddr;
+
+   pages = __iommu_dma_alloc_noncontiguous(dev, size, &sgt, dma_handle,
+   gfp, prot, attrs);
+   if (!pages)
+   return NULL;
+   sg_free_table(&sgt);
+   vaddr = dma_common_pages_remap(pages, size, prot,
+   __builtin_return_address(0));
+   if (!vaddr)
+   goto out_unmap;
+   return vaddr;
+
+out_unmap:
+   __iommu_dma_unmap(dev, *dma_handle, size);
+   __iommu_dma_free_pages(pages, PAGE_ALIGN(size) >> PAGE_SHIFT);
+   return NULL;
+}
+
 static void iommu_dma_sync_single_for_cpu(struct device *dev,
dma_addr_t dma_handle, size_t size, enum dma_data_direction dir)
 {
-- 
2.29.2



[PATCH 3/6] dma-mapping: add a dma_alloc_noncontiguous API

2021-01-28 Thread Christoph Hellwig
Add a new API that returns a potentially virtually non-contiguous sg_table
and a DMA address.  This API is only properly implemented for dma-iommu
and will simply return a contiguous chunk as a fallback.

The intent is that media drivers can use this API if either:

 - no kernel mapping or only temporary kernel mappings are required.
   That is as a better replacement for DMA_ATTR_NO_KERNEL_MAPPING
 - a kernel mapping is required for cached and DMA mapped pages, but
   the driver also needs the pages to e.g. map them to userspace.
   In that sense it is a replacement for some aspects of the recently
   removed and never fully implemented DMA_ATTR_NON_CONSISTENT

Signed-off-by: Christoph Hellwig 
---
 Documentation/core-api/dma-api.rst | 72 +++
 include/linux/dma-map-ops.h| 20 +++
 include/linux/dma-mapping.h| 34 +++
 kernel/dma/mapping.c   | 92 ++
 4 files changed, 218 insertions(+)

diff --git a/Documentation/core-api/dma-api.rst 
b/Documentation/core-api/dma-api.rst
index 157a474ae54416..1dd676a8217137 100644
--- a/Documentation/core-api/dma-api.rst
+++ b/Documentation/core-api/dma-api.rst
@@ -594,6 +594,78 @@ dev, size, dma_handle and dir must all be the same as 
those passed into
 dma_alloc_noncoherent().  cpu_addr must be the virtual address returned by
 dma_alloc_noncoherent().
 
+::
+
+   struct sg_table *
+   dma_alloc_noncontiguous(struct device *dev, size_t size,
+   dma_addr_t *dma_handle,
+   enum dma_data_direction dir, gfp_t gfp)
+
+This routine allocates <size> bytes of non-coherent and possibly non-contiguous
+memory.  It returns a pointer to struct sg_table that describes the allocated
+memory, or NULL if the allocation failed. The resulting memory can be used for
+everything a struct page is suitable for.
+
+It also returns a <dma_handle> which may be cast to an unsigned integer the
+same width as the bus and given to the device as the DMA address base of
+the region.
+
+The dir parameter specified if data is read and/or written by the device,
+see dma_map_single() for details.
+
+The gfp parameter allows the caller to specify the ``GFP_`` flags (see
+kmalloc()) for the allocation, but rejects flags used to specify a memory
+zone such as GFP_DMA or GFP_HIGHMEM.
+
+Before giving the memory to the device, dma_sync_sgtable_for_device() needs
+to be called, and before reading memory written by the device,
+dma_sync_sgtable_for_cpu(), just like for streaming DMA mappings that are
+reused.
+
+::
+
+   void
+   dma_free_noncontiguous(struct device *dev, size_t size,
+  struct sg_table *sgt, dma_addr_t dma_handle,
+  enum dma_data_direction dir)
+
+Free memory previously allocated using dma_alloc_noncontiguous().  dev, size,
+dma_handle and dir must all be the same as those passed into
+dma_alloc_noncontiguous().  sgt must be the pointer returned by
+dma_alloc_noncontiguous().
+
+::
+
+   void *
+   dma_vmap_noncontiguous(struct device *dev, size_t size,
+   struct sg_table *sgt)
+
+Return a contiguous kernel mapping for an allocation returned from
+dma_alloc_noncontiguous().  dev and size must be the same as those passed into
+dma_alloc_noncontiguous().  sgt must be the pointer returned by
+dma_alloc_noncontiguous().
+
+::
+
+   void
+   dma_vunmap_noncontiguous(struct device *dev, void *vaddr)
+
+Unmap a kernel mapping returned by dma_vmap_noncontiguous().  dev must be the
+same as the one passed into dma_alloc_noncontiguous().  vaddr must be the pointer
+returned by dma_vmap_noncontiguous().
+
+
+::
+
+   int
+   dma_mmap_noncontiguous(struct device *dev, struct vm_area_struct *vma,
+  size_t size, struct sg_table *sgt)
+
+Map an allocation returned from dma_alloc_noncontiguous() into a user address
+space.  dev and size must be the same as those passed into
+dma_alloc_noncontiguous().  sgt must be the pointer returned by
+dma_alloc_noncontiguous().
+
 ::
 
int
diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index 11e02537b9e01b..82efa36d8b09c4 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -22,6 +22,12 @@ struct dma_map_ops {
gfp_t gfp);
void (*free_pages)(struct device *dev, size_t size, struct page *vaddr,
dma_addr_t dma_handle, enum dma_data_direction dir);
+   struct sg_table *(*alloc_noncontiguous)(struct device *dev, size_t size,
+   dma_addr_t *dma_handle, enum dma_data_direction dir,
+   gfp_t gfp);
+   void (*free_noncontiguous)(struct device *dev, size_t size,
+   struct sg_table *sgt, dma_addr_t dma_handle,
+   enum dma_data_direction dir);
int (*mmap)(struct device *, struct vm_area_struct *,
   

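To make the documented call sequence concrete, here is a minimal usage sketch for a hypothetical consumer of the API described in this patch. It follows only the documentation text above; my_dev_alloc() and the surrounding driver context are illustrative assumptions, not part of the series:

#include <linux/dma-mapping.h>

/* Illustrative only: allocate, optionally vmap, sync, and free a buffer. */
static int my_dev_alloc(struct device *dev, size_t size)
{
        struct sg_table *sgt;
        dma_addr_t dma_handle;
        void *vaddr;

        sgt = dma_alloc_noncontiguous(dev, size, &dma_handle,
                                      DMA_BIDIRECTIONAL, GFP_KERNEL);
        if (!sgt)
                return -ENOMEM;

        /* Optional: CPU access through a temporary kernel mapping. */
        vaddr = dma_vmap_noncontiguous(dev, size, sgt);
        if (vaddr) {
                memset(vaddr, 0, size);
                dma_vunmap_noncontiguous(dev, vaddr);
        }

        /* Hand the buffer to the device before starting DMA. */
        dma_sync_sgtable_for_device(dev, sgt, DMA_BIDIRECTIONAL);

        /* ... program the device with dma_handle, wait for completion ... */

        dma_sync_sgtable_for_cpu(dev, sgt, DMA_BIDIRECTIONAL);
        dma_free_noncontiguous(dev, size, sgt, dma_handle, DMA_BIDIRECTIONAL);
        return 0;
}

The dma_handle is what gets programmed into the device; the sg_table (and the optional vmap) is only for CPU-side access.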
[PATCH 2/6] dma-mapping: add a dma_mmap_pages helper

2021-01-28 Thread Christoph Hellwig
Add a helper to map memory allocated using dma_alloc_pages into
a user address space, similar to the dma_alloc_attrs function for
coherent allocations.

Signed-off-by: Christoph Hellwig 
---
 Documentation/core-api/dma-api.rst | 10 ++
 include/linux/dma-mapping.h|  2 ++
 kernel/dma/mapping.c   | 13 +
 3 files changed, 25 insertions(+)

diff --git a/Documentation/core-api/dma-api.rst 
b/Documentation/core-api/dma-api.rst
index e6d23f117308df..157a474ae54416 100644
--- a/Documentation/core-api/dma-api.rst
+++ b/Documentation/core-api/dma-api.rst
@@ -563,6 +563,16 @@ Free a region of memory previously allocated using 
dma_alloc_pages().
 dev, size, dma_handle and dir must all be the same as those passed into
 dma_alloc_pages().  page must be the pointer returned by dma_alloc_pages().
 
+::
+
+   int
+   dma_mmap_pages(struct device *dev, struct vm_area_struct *vma,
+  size_t size, struct page *page)
+
+Map an allocation returned from dma_alloc_pages() into a user address space.
+dev and size must be the same as those passed into dma_alloc_pages().
+page must be the pointer returned by dma_alloc_pages().
+
 ::
 
void *
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index fbfa3f5abd9498..4977a748cb9483 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -263,6 +263,8 @@ struct page *dma_alloc_pages(struct device *dev, size_t 
size,
dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp);
 void dma_free_pages(struct device *dev, size_t size, struct page *page,
dma_addr_t dma_handle, enum dma_data_direction dir);
+int dma_mmap_pages(struct device *dev, struct vm_area_struct *vma,
+   size_t size, struct page *page);
 
 static inline void *dma_alloc_noncoherent(struct device *dev, size_t size,
dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp)
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 68992e35c8c3a7..c1e515496c067b 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -515,6 +515,19 @@ void dma_free_pages(struct device *dev, size_t size, 
struct page *page,
 }
 EXPORT_SYMBOL_GPL(dma_free_pages);
 
+int dma_mmap_pages(struct device *dev, struct vm_area_struct *vma,
+   size_t size, struct page *page)
+{
+   unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;
+
+   if (vma->vm_pgoff >= count || vma_pages(vma) > count - vma->vm_pgoff)
+   return -ENXIO;
+   return remap_pfn_range(vma, vma->vm_start,
+  page_to_pfn(page) + vma->vm_pgoff,
+  vma_pages(vma) << PAGE_SHIFT, vma->vm_page_prot);
+}
+EXPORT_SYMBOL_GPL(dma_mmap_pages);
+
 int dma_supported(struct device *dev, u64 mask)
 {
const struct dma_map_ops *ops = get_dma_ops(dev);
-- 
2.29.2
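As a rough illustration of how a driver might use the new helper, the sketch below wires dma_mmap_pages() into an mmap file operation. struct my_ctx and my_mmap() are hypothetical names, not taken from the patch:

#include <linux/dma-mapping.h>
#include <linux/fs.h>
#include <linux/mm.h>

/* Hypothetical driver context holding a dma_alloc_pages() allocation. */
struct my_ctx {
        struct device *dev;
        struct page *pages;     /* returned by dma_alloc_pages() */
        size_t size;
};

static int my_mmap(struct file *file, struct vm_area_struct *vma)
{
        struct my_ctx *ctx = file->private_data;

        /* dma_mmap_pages() validates the vma size/offset against the allocation. */
        return dma_mmap_pages(ctx->dev, vma, ctx->size, ctx->pages);
}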



[PATCH 1/6] dma-mapping: remove the {alloc,free}_noncoherent methods

2021-01-28 Thread Christoph Hellwig
It turns out allowing non-contiguous allocations here was a rather bad
idea, as we'll now need to define ways to get the pages for mmapping
or dma_buf sharing.  Revert this change and stick to the original
concept.  A different API for the use case of non-contiguous allocations
will be added back later.

Signed-off-by: Christoph Hellwig 
---
 Documentation/core-api/dma-api.rst | 64 ++
 drivers/iommu/dma-iommu.c  | 30 --
 include/linux/dma-map-ops.h|  5 ---
 include/linux/dma-mapping.h| 17 ++--
 kernel/dma/mapping.c   | 40 ---
 5 files changed, 35 insertions(+), 121 deletions(-)

diff --git a/Documentation/core-api/dma-api.rst 
b/Documentation/core-api/dma-api.rst
index 75cb757bbff00a..e6d23f117308df 100644
--- a/Documentation/core-api/dma-api.rst
+++ b/Documentation/core-api/dma-api.rst
@@ -528,16 +528,14 @@ an I/O device, you should not be using this part of the 
API.
 
 ::
 
-   void *
-   dma_alloc_noncoherent(struct device *dev, size_t size,
-   dma_addr_t *dma_handle, enum dma_data_direction dir,
-   gfp_t gfp)
+   struct page *
+   dma_alloc_pages(struct device *dev, size_t size, dma_addr_t *dma_handle,
+   enum dma_data_direction dir, gfp_t gfp)
 
-This routine allocates a region of <size> bytes of consistent memory.  It
-returns a pointer to the allocated region (in the processor's virtual address
-space) or NULL if the allocation failed.  The returned memory may or may not
-be in the kernel direct mapping.  Drivers must not call virt_to_page on
-the returned memory region.
+This routine allocates a region of <size> bytes of non-coherent memory.  It
+returns a pointer to first struct page for the region, or NULL if the
+allocation failed. The resulting struct page can be used for everything a
+struct page is suitable for.
 
 It also returns a <dma_handle> which may be cast to an unsigned integer the
 same width as the bus and given to the device as the DMA address base of
@@ -558,51 +556,33 @@ reused.
 ::
 
void
-   dma_free_noncoherent(struct device *dev, size_t size, void *cpu_addr,
+   dma_free_pages(struct device *dev, size_t size, struct page *page,
dma_addr_t dma_handle, enum dma_data_direction dir)
 
-Free a region of memory previously allocated using dma_alloc_noncoherent().
-dev, size and dma_handle and dir must all be the same as those passed into
-dma_alloc_noncoherent().  cpu_addr must be the virtual address returned by
-dma_alloc_noncoherent().
+Free a region of memory previously allocated using dma_alloc_pages().
+dev, size, dma_handle and dir must all be the same as those passed into
+dma_alloc_pages().  page must be the pointer returned by dma_alloc_pages().
 
 ::
 
-   struct page *
-   dma_alloc_pages(struct device *dev, size_t size, dma_addr_t *dma_handle,
-   enum dma_data_direction dir, gfp_t gfp)
-
-This routine allocates a region of <size> bytes of non-coherent memory.  It
-returns a pointer to first struct page for the region, or NULL if the
-allocation failed. The resulting struct page can be used for everything a
-struct page is suitable for.
-
-It also returns a <dma_handle> which may be cast to an unsigned integer the
-same width as the bus and given to the device as the DMA address base of
-the region.
-
-The dir parameter specified if data is read and/or written by the device,
-see dma_map_single() for details.
-
-The gfp parameter allows the caller to specify the ``GFP_`` flags (see
-kmalloc()) for the allocation, but rejects flags used to specify a memory
-zone such as GFP_DMA or GFP_HIGHMEM.
+   void *
+   dma_alloc_noncoherent(struct device *dev, size_t size,
+   dma_addr_t *dma_handle, enum dma_data_direction dir,
+   gfp_t gfp)
 
-Before giving the memory to the device, dma_sync_single_for_device() needs
-to be called, and before reading memory written by the device,
-dma_sync_single_for_cpu(), just like for streaming DMA mappings that are
-reused.
+This routine is a convenient wrapper around dma_alloc_pages that returns the
+kernel virtual address for the allocated memory instead of the page structure.
 
 ::
 
void
-   dma_free_pages(struct device *dev, size_t size, struct page *page,
+   dma_free_noncoherent(struct device *dev, size_t size, void *cpu_addr,
dma_addr_t dma_handle, enum dma_data_direction dir)
 
-Free a region of memory previously allocated using dma_alloc_pages().
-dev, size and dma_handle and dir must all be the same as those passed into
-dma_alloc_noncoherent().  page must be the pointer returned by
-dma_alloc_pages().
+Free a region of memory previously allocated using dma_alloc_noncoherent().
+dev, size, dma_handle and dir must all be the same as those passed into
+dma_alloc_noncoherent().  cpu_addr must be the virtual address returned by
+dma_alloc_noncoherent().
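For reference, the wrappers described above could be as small as the following sketch. This is illustrative, based only on the documentation text, and not necessarily the exact code in this patch (whose remaining hunks are truncated here):

static inline void *dma_alloc_noncoherent(struct device *dev, size_t size,
                dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp)
{
        struct page *page = dma_alloc_pages(dev, size, dma_handle, dir, gfp);

        /* Return the kernel virtual address instead of the struct page. */
        return page ? page_address(page) : NULL;
}

static inline void dma_free_noncoherent(struct device *dev, size_t size,
                void *cpu_addr, dma_addr_t dma_handle, enum dma_data_direction dir)
{
        dma_free_pages(dev, size, virt_to_page(cpu_addr), dma_handle, dir);
}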

add a new dma_alloc_noncontiguous API

2021-01-28 Thread Christoph Hellwig
Hi all,

this series adds the new noncontiguous DMA allocation API requested by
various media driver maintainers.


Re: [PATCH RFC 0/9] Possible set of VT-d optimizations

2021-01-28 Thread Chuck Lever



> On Jan 28, 2021, at 8:59 AM, Robin Murphy  wrote:
> 
> On 2021-01-27 20:00, Chuck Lever wrote:
>> Hi-
>> This collection of patches seems to get the best throughput results
>> so far. The NFS WRITE result is fully restored, and the NFS READ
>> result is very close to fully restored.
>>  Children see throughput for 12 initial writers  = 5008474.03 kB/sec
>>  Parent sees throughput for 12 initial writers   = 4996927.80 kB/sec
>>  Min throughput per process  = 416956.88 kB/sec
>>  Max throughput per process  = 417910.22 kB/sec
>>  Avg throughput per process  = 417372.84 kB/sec
>>  Min xfer= 1046272.00 kB
>>  CPU Utilization: Wall time 2.515  CPU time 1.996  CPU utilization 79.37 %
>>  Children see throughput for 12 rewriters= 5020584.59 kB/sec
>>  Parent sees throughput for 12 rewriters = 5012539.29 kB/sec
>>  Min throughput per process  = 417799.00 kB/sec
>>  Max throughput per process  = 419082.22 kB/sec
>>  Avg throughput per process  = 418382.05 kB/sec
>>  Min xfer= 1046528.00 kB
>>  CPU utilization: Wall time 2.507  CPU time 2.024  CPU utilization 80.73 %
>>  Children see throughput for 12 readers  = 5805484.25 kB/sec
>>  Parent sees throughput for 12 readers   = 5799535.68 kB/sec
>>  Min throughput per process  = 482888.16 kB/sec
>>  Max throughput per process  = 48.16 kB/sec
>>  Avg throughput per process  = 483790.35 kB/sec
>>  Min xfer= 1045760.00 kB
>>  CPU utilization: Wall time 2.167  CPU time 1.964  CPU utilization 90.63 %
>>  Children see throughput for 12 re-readers   = 5812227.16 kB/sec
>>  Parent sees throughput for 12 re-readers= 5803793.06 kB/sec
>>  Min throughput per process  = 483242.97 kB/sec
>>  Max throughput per process  = 485724.41 kB/sec
>>  Avg throughput per process  = 484352.26 kB/sec
>>  Min xfer= 1043456.00 kB
>>  CPU utilization: Wall time 2.161  CPU time 1.976  CPU utilization 91.45 %
>> I've included a simple-minded implementation of a map_sg op for
>> the Intel IOMMU. This is nothing more than a copy of the loop in
>> __iommu_map_sg() with the call to __iommu_map() replaced with a
>> call to intel_iommu_map().
> 
> ...which is the main reason I continue to strongly dislike patches #4-#9 (#3 
> definitely seems to make sense either way, now that #1 and #2 are going to 
> land). If a common operation is worth optimising anywhere, then it deserves 
> optimising everywhere, so we end up with a dozen diverging copies of 
> essentially the same code - particularly when the driver-specific 
> functionality *is* already in the drivers, so what gets duplicated is solely 
> the "generic" parts.

I don't disagree with that assessment, but I don't immediately see an
alternative API arrangement that would be more successful in the short
term. If 4/9 - 9/9 are not acceptable, then the responsible thing to
do would be to revert:

 - 58a8bb39490d ("iommu/vt-d: Cleanup after converting to dma-iommu ops")
 - c588072bba6b ("iommu/vt-d: Convert intel iommu driver to the iommu ops")

for v5.11, work out the proper API design, and then try the VT-d conversion
again.

IMHO.


> And if there's justification for pushing iommu_map_sg() entirely into 
> drivers, then it's verging on self-contradictory not to do the same for 
> iommu_map() and iommu_unmap(). Some IOMMU drivers - mainly intel-iommu, as it 
> happens - are already implementing hacks around the "one call per page" 
> interface being inherently inefficient, so the logical thing to do here is 
> take a step back and reconsider the fundamental design of the whole map/unmap 
> interface. Implementing hacks on top of hacks to make particular things 
> faster on particular systems that particular people care about is not going 
> to do us any favours in the long run.
> 
> As it stands, I can easily see a weird anti-pattern emerging where people 
> start adding code to fake up scatterlists in random drivers because they see 
> dma_map_sg() performing paradoxically better than dma_map_page().
> 
> Robin.
> 
>> ---
>> Chuck Lever (1):
>>   iommu/vt-d: Introduce map_sg() for Intel IOMMUs
>> Isaac J. Manjarres (5):
>>   iommu/io-pgtable: Introduce map_sg() as a page table op
>>   iommu/io-pgtable-arm: Hook up map_sg()
>>   iommu/io-pgtable-arm-v7s: Hook up map_sg()
>>   iommu: Introduce map_sg() as an IOMMU op for IOMMU drivers
>>   iommu/arm-smmu: Hook up map_sg()
>> Lu Baolu (1):
>>   iommu/vt-d: Add 

Re: [PATCH RFC 0/9] Possible set of VT-d optimizations

2021-01-28 Thread Robin Murphy

On 2021-01-27 20:00, Chuck Lever wrote:

Hi-

This collection of patches seems to get the best throughput results
so far. The NFS WRITE result is fully restored, and the NFS READ
result is very close to fully restored.

Children see throughput for 12 initial writers  = 5008474.03 kB/sec
Parent sees throughput for 12 initial writers   = 4996927.80 kB/sec
Min throughput per process  = 416956.88 kB/sec
Max throughput per process  = 417910.22 kB/sec
Avg throughput per process  = 417372.84 kB/sec
Min xfer= 1046272.00 kB
CPU Utilization: Wall time 2.515  CPU time 1.996  CPU utilization 79.37 %


Children see throughput for 12 rewriters= 5020584.59 kB/sec
Parent sees throughput for 12 rewriters = 5012539.29 kB/sec
Min throughput per process  = 417799.00 kB/sec
Max throughput per process  = 419082.22 kB/sec
Avg throughput per process  = 418382.05 kB/sec
Min xfer= 1046528.00 kB
CPU utilization: Wall time 2.507  CPU time 2.024  CPU utilization 80.73 %


Children see throughput for 12 readers  = 5805484.25 kB/sec
Parent sees throughput for 12 readers   = 5799535.68 kB/sec
Min throughput per process  = 482888.16 kB/sec
Max throughput per process  = 48.16 kB/sec
Avg throughput per process  = 483790.35 kB/sec
Min xfer= 1045760.00 kB
CPU utilization: Wall time 2.167  CPU time 1.964  CPU utilization 90.63 %


Children see throughput for 12 re-readers   = 5812227.16 kB/sec
Parent sees throughput for 12 re-readers= 5803793.06 kB/sec
Min throughput per process  = 483242.97 kB/sec
Max throughput per process  = 485724.41 kB/sec
Avg throughput per process  = 484352.26 kB/sec
Min xfer= 1043456.00 kB
CPU utilization: Wall time 2.161  CPU time 1.976  CPU utilization 91.45 %

I've included a simple-minded implementation of a map_sg op for
the Intel IOMMU. This is nothing more than a copy of the loop in
__iommu_map_sg() with the call to __iommu_map() replaced with a
call to intel_iommu_map().


...which is the main reason I continue to strongly dislike patches #4-#9 
(#3 definitely seems to make sense either way, now that #1 and #2 are 
going to land). If a common operation is worth optimising anywhere, then 
it deserves optimising everywhere, so we end up with a dozen diverging 
copies of essentially the same code - particularly when the 
driver-specific functionality *is* already in the drivers, so what gets 
duplicated is solely the "generic" parts.


And if there's justification for pushing iommu_map_sg() entirely into 
drivers, then it's verging on self-contradictory not to do the same for 
iommu_map() and iommu_unmap(). Some IOMMU drivers - mainly intel-iommu, 
as it happens - are already implementing hacks around the "one call per 
page" interface being inherently inefficient, so the logical thing to do 
here is take a step back and reconsider the fundamental design of the 
whole map/unmap interface. Implementing hacks on top of hacks to make 
particular things faster on particular systems that particular people 
care about is not going to do us any favours in the long run.


As it stands, I can easily see a weird anti-pattern emerging where 
people start adding code to fake up scatterlists in random drivers 
because they see dma_map_sg() performing paradoxically better than 
dma_map_page().


Robin.


---

Chuck Lever (1):
   iommu/vt-d: Introduce map_sg() for Intel IOMMUs

Isaac J. Manjarres (5):
   iommu/io-pgtable: Introduce map_sg() as a page table op
   iommu/io-pgtable-arm: Hook up map_sg()
   iommu/io-pgtable-arm-v7s: Hook up map_sg()
   iommu: Introduce map_sg() as an IOMMU op for IOMMU drivers
   iommu/arm-smmu: Hook up map_sg()

Lu Baolu (1):
   iommu/vt-d: Add iotlb_sync_map callback

Yong Wu (2):
   iommu: Move iotlb_sync_map out from __iommu_map
   iommu: Add iova and size as parameters in iotlb_sync_map


  drivers/iommu/arm/arm-smmu/arm-smmu.c |  19 
  drivers/iommu/intel/iommu.c   | 131 --
  drivers/iommu/io-pgtable-arm-v7s.c|  90 ++
  drivers/iommu/io-pgtable-arm.c|  86 +
  drivers/iommu/iommu.c |  47 +++--
  drivers/iommu/tegra-gart.c|   7 +-
  include/linux/iommu.h |  16 +++-
  7 files changed, 353 insertions(+), 43 deletions(-)

--

Re: [PATCH RFC 5/9] iommu/io-pgtable-arm: Hook up map_sg()

2021-01-28 Thread Will Deacon
On Wed, Jan 27, 2021 at 03:00:53PM -0500, Chuck Lever wrote:
> From: Isaac J. Manjarres 
> 
> Implement the map_sg io-pgtable op for the ARM LPAE io-pgtable
> code, so that IOMMU drivers can call it when they need to map
> a scatter-gather list.
> 
> Signed-off-by: Isaac J. Manjarres 
> Tested-by: Sai Prakash Ranjan 
> Signed-off-by: Chuck Lever 
> ---
>  drivers/iommu/io-pgtable-arm.c |   86 
> 
>  drivers/iommu/iommu.c  |   12 +++---
>  include/linux/iommu.h  |8 
>  3 files changed, 101 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index 87def58e79b5..0c11529442b8 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -473,6 +473,91 @@ static int arm_lpae_map(struct io_pgtable_ops *ops, 
> unsigned long iova,
>   return ret;
>  }
>  
> +static int arm_lpae_map_by_pgsize(struct io_pgtable_ops *ops,
> +   unsigned long iova, phys_addr_t paddr,
> +   size_t size, int iommu_prot, gfp_t gfp,
> +   size_t *mapped)
> +{
> + struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
> + struct io_pgtable_cfg *cfg = &data->iop.cfg;
> + arm_lpae_iopte *ptep = data->pgd;
> + int ret, lvl = data->start_level;
> + arm_lpae_iopte prot = arm_lpae_prot_to_pte(data, iommu_prot);
> + unsigned int min_pagesz = 1 << __ffs(cfg->pgsize_bitmap);
> + long iaext = (s64)(iova + size - 1) >> cfg->ias;
> + size_t pgsize;
> +
> + if (!IS_ALIGNED(iova | paddr | size, min_pagesz)) {
> + pr_err("unaligned: iova 0x%lx pa %pa size 0x%zx min_pagesz 
> 0x%x\n",
> +iova, &paddr, size, min_pagesz);
> + return -EINVAL;
> + }
> +
> + if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
> + iaext = ~iaext;
> + if (WARN_ON(iaext || (paddr + size - 1) >> cfg->oas))
> + return -ERANGE;
> +
> + while (size) {
> + pgsize = iommu_pgsize(cfg->pgsize_bitmap, iova | paddr, size);
> + ret = __arm_lpae_map(data, iova, paddr, pgsize, prot, lvl, ptep,
> +  gfp);
> + if (ret)
> + return ret;
> +
> + iova += pgsize;
> + paddr += pgsize;
> + *mapped += pgsize;
> + size -= pgsize;
> + }
> +
> + return 0;
> +}
> +
> +static int arm_lpae_map_sg(struct io_pgtable_ops *ops, unsigned long iova,
> +struct scatterlist *sg, unsigned int nents,
> +int iommu_prot, gfp_t gfp, size_t *mapped)
> +{
> +
> + size_t len = 0;
> + unsigned int i = 0;
> + int ret;
> + phys_addr_t start;
> +
> + *mapped = 0;
> +
> + /* If no access, then nothing to do */
> + if (!(iommu_prot & (IOMMU_READ | IOMMU_WRITE)))
> + return 0;
> +
> + while (i <= nents) {
> + phys_addr_t s_phys = sg_phys(sg);
> +
> + if (len && s_phys != start + len) {
> + ret = arm_lpae_map_by_pgsize(ops, iova + *mapped, start,
> +  len, iommu_prot, gfp,
> +  mapped);
> +
> + if (ret)
> + return ret;
> +
> + len = 0;
> + }
> +
> + if (len) {
> + len += sg->length;
> + } else {
> + len = sg->length;
> + start = s_phys;
> + }
> +
> + if (++i < nents)
> + sg = sg_next(sg);
> + }
> +
> + return 0;
> +}

Although I really like the idea of reducing the layering here, I think we
need to figure out a way to reduce the amount of boiler-plate that ends up
in the pgtable code. Otherwise it's pretty unmaintainable.

Will


Re: [PATCH RFC 2/9] iommu: Add iova and size as parameters in iotlb_sync_map

2021-01-28 Thread Joerg Roedel
On Thu, Jan 28, 2021 at 01:19:29PM +, Will Deacon wrote:
> Heads-up, but I already queued this patch as part of its original series:
> 
> https://lore.kernel.org/r/20210107122909.16317-1-yong...@mediatek.com
> 
> since it's part of the Mediatek series for 5.12.
> 
> Would you like me to drop that, or can we stick with passing iova and size
> for now, with a view to refactoring it later on?

I think it's okay for now, we can refactor it later.

Regards,

Joerg


Re: [PATCH RFC 2/9] iommu: Add iova and size as parameters in iotlb_sync_map

2021-01-28 Thread Will Deacon
Hi Joerg,

On Thu, Jan 28, 2021 at 01:51:12PM +0100, Joerg Roedel wrote:
> On Wed, Jan 27, 2021 at 03:00:32PM -0500, Chuck Lever wrote:
> > From: Yong Wu 
> > 
> > iotlb_sync_map allows IOMMU drivers to sync the TLB after completing the whole
> > mapping. This patch adds iova and size as parameters to it. Then the
> > IOMMU driver can flush the TLB for the whole range at once after the IOVA
> > mapping, to improve performance.
> > 
> > Signed-off-by: Yong Wu 
> > Reviewed-by: Robin Murphy 
> > Signed-off-by: Chuck Lever 
> > ---
> >  drivers/iommu/iommu.c  |4 ++--
> >  drivers/iommu/tegra-gart.c |7 +--
> >  include/linux/iommu.h  |3 ++-
> >  3 files changed, 9 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index c304a6a30d42..3d099a31ddca 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -2443,7 +2443,7 @@ static int _iommu_map(struct iommu_domain *domain, 
> > unsigned long iova,
> >  
> > ret = __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL);
> > if (ret == 0 && ops->iotlb_sync_map)
> > -   ops->iotlb_sync_map(domain);
> > +   ops->iotlb_sync_map(domain, iova, size);
> 
> How about using 'struct iommu_iotlb_gather' instead of directly passing
> the iova/size parameters here? The iotlb_sync() call-back uses it
> already.

Heads-up, but I already queued this patch as part of its original series:

https://lore.kernel.org/r/20210107122909.16317-1-yong...@mediatek.com

since it's part of the Mediatek series for 5.12.

Would you like me to drop that, or can we stick with passing iova and size
for now, with a view to refactoring it later on?

Will


[PATCH 1/2] iommu/ipmmu-vmsa: refactor ipmmu_of_xlate()

2021-01-28 Thread Yoshihiro Shimoda
Refactor ipmmu_of_xlate() to improve readability/scalability.

Signed-off-by: Yoshihiro Shimoda 
---
 drivers/iommu/ipmmu-vmsa.c | 49 +-
 1 file changed, 18 insertions(+), 31 deletions(-)

diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index 0f18abd..0bdf354 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -735,54 +735,41 @@ static int ipmmu_init_platform_device(struct device *dev,
return 0;
 }
 
-static const struct soc_device_attribute soc_rcar_gen3[] = {
-   { .soc_id = "r8a774a1", },
-   { .soc_id = "r8a774b1", },
-   { .soc_id = "r8a774c0", },
-   { .soc_id = "r8a774e1", },
-   { .soc_id = "r8a7795", },
-   { .soc_id = "r8a77961", },
-   { .soc_id = "r8a7796", },
-   { .soc_id = "r8a77965", },
-   { .soc_id = "r8a77970", },
-   { .soc_id = "r8a77990", },
-   { .soc_id = "r8a77995", },
+static const struct soc_device_attribute soc_needs_opt_in[] = {
+   { .family = "R-Car Gen3", },
+   { .family = "RZ/G2", },
{ /* sentinel */ }
 };
 
-static const struct soc_device_attribute soc_rcar_gen3_whitelist[] = {
-   { .soc_id = "r8a774b1", },
-   { .soc_id = "r8a774c0", },
-   { .soc_id = "r8a774e1", },
-   { .soc_id = "r8a7795", .revision = "ES3.*" },
-   { .soc_id = "r8a77961", },
-   { .soc_id = "r8a77965", },
-   { .soc_id = "r8a77990", },
-   { .soc_id = "r8a77995", },
+static const struct soc_device_attribute soc_denylist[] = {
+   { .soc_id = "r8a774a1", },
+   { .soc_id = "r8a7795", .revision = "ES1.*" },
+   { .soc_id = "r8a7795", .revision = "ES2.*" },
+   { .soc_id = "r8a7796", },
{ /* sentinel */ }
 };
 
-static const char * const rcar_gen3_slave_whitelist[] = {
+static const char * const devices_allowlist[] = {
 };
 
-static bool ipmmu_slave_whitelist(struct device *dev)
+static bool ipmmu_device_is_allowed(struct device *dev)
 {
unsigned int i;
 
/*
-* For R-Car Gen3 use a white list to opt-in slave devices.
+* R-Car Gen3 and RZ/G2 use the allow list to opt-in devices.
 * For Other SoCs, this returns true anyway.
 */
-   if (!soc_device_match(soc_rcar_gen3))
+   if (!soc_device_match(soc_needs_opt_in))
return true;
 
-   /* Check whether this R-Car Gen3 can use the IPMMU correctly or not */
-   if (!soc_device_match(soc_rcar_gen3_whitelist))
+   /* Check whether this SoC can use the IPMMU correctly or not */
+   if (soc_device_match(soc_denylist))
return false;
 
-   /* Check whether this slave device can work with the IPMMU */
-   for (i = 0; i < ARRAY_SIZE(rcar_gen3_slave_whitelist); i++) {
-   if (!strcmp(dev_name(dev), rcar_gen3_slave_whitelist[i]))
+   /* Check whether this device can work with the IPMMU */
+   for (i = 0; i < ARRAY_SIZE(devices_allowlist); i++) {
+   if (!strcmp(dev_name(dev), devices_allowlist[i]))
return true;
}
 
@@ -793,7 +780,7 @@ static bool ipmmu_slave_whitelist(struct device *dev)
 static int ipmmu_of_xlate(struct device *dev,
  struct of_phandle_args *spec)
 {
-   if (!ipmmu_slave_whitelist(dev))
+   if (!ipmmu_device_is_allowed(dev))
return -ENODEV;
 
iommu_fwspec_add_ids(dev, spec->args, 1);
-- 
2.7.4



[PATCH 2/2] iommu/ipmmu-vmsa: Allow SDHI devices

2021-01-28 Thread Yoshihiro Shimoda
Add SDHI devices into devices_allowlist.

Signed-off-by: Yoshihiro Shimoda 
---
 drivers/iommu/ipmmu-vmsa.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index 0bdf354..5715fbe 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -750,6 +750,10 @@ static const struct soc_device_attribute soc_denylist[] = {
 };
 
 static const char * const devices_allowlist[] = {
+   "ee10.mmc",
+   "ee12.mmc",
+   "ee14.mmc",
+   "ee16.mmc"
 };
 
 static bool ipmmu_device_is_allowed(struct device *dev)
-- 
2.7.4



[PATCH 0/2] iommu/ipmmu-vmsa: refactoring and allow SDHI devices

2021-01-28 Thread Yoshihiro Shimoda
I intend to add new SoC support in the near future, but before that,
I would like to refactor ipmmu_of_xlate() to improve
readability/scalability. This series also adds SDHI devices to the allow list.

Yoshihiro Shimoda (2):
  iommu/ipmmu-vmsa: refactor ipmmu_of_xlate()
  iommu/ipmmu-vmsa: Allow SDHI devices

 drivers/iommu/ipmmu-vmsa.c | 53 +++---
 1 file changed, 22 insertions(+), 31 deletions(-)

-- 
2.7.4



Re: [PATCH] iommu: Check dev->iommu in iommu_dev_xxx functions

2021-01-28 Thread Joerg Roedel
On Tue, Jan 26, 2021 at 01:06:29PM +, Shameer Kolothum wrote:
> The device iommu probe/attach might have failed, leaving dev->iommu
> NULL, and device drivers may still invoke these functions, resulting
> in a crash in the iommu vendor driver code. Hence make sure we check that.
> 
> Signed-off-by: Shameer Kolothum 
> ---
>  drivers/iommu/iommu.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index ffeebda8d6de..cb68153c5cc0 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2867,7 +2867,7 @@ bool iommu_dev_has_feature(struct device *dev, enum 
> iommu_dev_features feat)

This function has been removed from the iommu-tree. Can you please
rebase this patch against the latest 'core' branch when I pushed it
later this week (maybe even later today)?
A Fixes tag would be nice too.

Regards,

Joerg


Re: [PATCH RFC 2/9] iommu: Add iova and size as parameters in iotlb_sync_map

2021-01-28 Thread Joerg Roedel
Hi Chuck,

thanks for your optimizations!

On Wed, Jan 27, 2021 at 03:00:32PM -0500, Chuck Lever wrote:
> From: Yong Wu 
> 
> iotlb_sync_map allows IOMMU drivers to sync the TLB after completing the whole
> mapping. This patch adds iova and size as parameters to it. Then the
> IOMMU driver can flush the TLB for the whole range at once after the IOVA
> mapping, to improve performance.
> 
> Signed-off-by: Yong Wu 
> Reviewed-by: Robin Murphy 
> Signed-off-by: Chuck Lever 
> ---
>  drivers/iommu/iommu.c  |4 ++--
>  drivers/iommu/tegra-gart.c |7 +--
>  include/linux/iommu.h  |3 ++-
>  3 files changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index c304a6a30d42..3d099a31ddca 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2443,7 +2443,7 @@ static int _iommu_map(struct iommu_domain *domain, 
> unsigned long iova,
>  
>   ret = __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL);
>   if (ret == 0 && ops->iotlb_sync_map)
> - ops->iotlb_sync_map(domain);
> + ops->iotlb_sync_map(domain, iova, size);

How about using 'struct iommu_iotlb_gather' instead of directly passing
the iova/size parameters here? The iotlb_sync() call-back uses it
already.

Regards,

Joerg
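
A rough sketch of what Joerg's suggestion might look like, purely for illustration (the posted patch keeps explicit iova/size parameters; none of this is code from an actual patch):

/* Hypothetical alternative callback signature: */
void (*iotlb_sync_map)(struct iommu_domain *domain,
                       struct iommu_iotlb_gather *gather);

/* And the call site in _iommu_map(), sketched: */
struct iommu_iotlb_gather gather;

iommu_iotlb_gather_init(&gather);
gather.start = iova;
gather.end   = iova + size - 1;
if (ret == 0 && ops->iotlb_sync_map)
        ops->iotlb_sync_map(domain, &gather);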


Re: [PATCH v3 0/2] iommu: fix the failure of deferred attach for iommu attach device

2021-01-28 Thread Joerg Roedel
On Tue, Jan 26, 2021 at 07:53:35PM +0800, Lianbo Jiang wrote:
> Lianbo Jiang (2):
>   dma-iommu: use static-key to minimize the impact in the fast-path
>   iommu: use the __iommu_attach_device() directly for deferred attach
> 
>  drivers/iommu/dma-iommu.c | 29 +++--
>  drivers/iommu/iommu.c | 10 ++
>  include/linux/iommu.h |  1 +
>  3 files changed, 22 insertions(+), 18 deletions(-)

Sorry, missed that there was a newer version. Applied this instead of
v2.


Re: [PATCH 1/1] iommu/vt-d: Correctly check addr alignment in qi_flush_dev_iotlb_pasid()

2021-01-28 Thread Joerg Roedel
On Tue, Jan 19, 2021 at 12:35:00PM +0800, Lu Baolu wrote:
> An incorrect address mask is being used in the qi_flush_dev_iotlb_pasid()
> to check the address alignment. This leads to a lot of spurious kernel
> warnings:
> 
> [  485.837093] DMAR: Invalidate non-aligned address 7f76f47f9000, order 0
> [  485.837098] DMAR: Invalidate non-aligned address 7f76f47f9000, order 0
> [  492.494145] qi_flush_dev_iotlb_pasid: 5734 callbacks suppressed
> [  492.494147] DMAR: Invalidate non-aligned address 7f772880, order 11
> [  492.508965] DMAR: Invalidate non-aligned address 7f772880, order 11
> 
> Fix it by checking the alignment in the right way.
> 
> Fixes: 288d08e780088 ("iommu/vt-d: Handle non-page aligned address")
> Reported-and-tested-by: Guo Kaijie 
> Signed-off-by: Lu Baolu 
> Cc: Liu Yi L 
> ---
>  drivers/iommu/intel/dmar.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Applied for 5.11, thanks.



Re: [PATCH 0/2 v2] iommu: fix the failure of deferred attach for iommu attach device

2021-01-28 Thread Joerg Roedel
On Tue, Jan 19, 2021 at 07:16:14PM +0800, Lianbo Jiang wrote:
> Lianbo Jiang (2):
>   dma-iommu: use static-key to minimize the impact in the fast-path
>   iommu: use the __iommu_attach_device() directly for deferred attach
> 
>  drivers/iommu/dma-iommu.c | 29 +++--
>  drivers/iommu/iommu.c | 12 
>  include/linux/iommu.h |  2 ++
>  3 files changed, 25 insertions(+), 18 deletions(-)

Applied, thanks.


Re: [PATCH 1/1] iommu/vt-d: Preset Access/Dirty bits for IOVA over FL

2021-01-28 Thread Joerg Roedel
On Fri, Jan 15, 2021 at 08:42:02AM +0800, Lu Baolu wrote:
> The Access/Dirty bits in the first level page table entry will be set
> whenever a page table entry was used for address translation or write
> permission was successfully translated. This is always true when using
> the first-level page table for kernel IOVA. Instead of wasting hardware
> cycles to update these bits, it's better to set them up at the
> beginning.
> 
> Suggested-by: Ashok Raj 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/intel/iommu.c | 14 --
>  include/linux/intel-iommu.h |  2 ++
>  2 files changed, 14 insertions(+), 2 deletions(-)

Applied, thanks.


Re: [PATCH 1/1] iommu/vt-d: Add qi_submit trace event

2021-01-28 Thread Joerg Roedel
On Thu, Jan 14, 2021 at 05:04:00PM +0800, Lu Baolu wrote:
> This adds a new trace event to track the submissions of requests to the
> invalidation queue. This event will provide the information like:
> - IOMMU name
> - Invalidation type
> - Descriptor raw data
> 
> A sample output like:
> | qi_submit: iotlb_inv dmar1: 0x100e2 0x0 0x0 0x0
> | qi_submit: dev_tlb_inv dmar1: 0x13 0x7001 0x0 0x0
> | qi_submit: iotlb_inv dmar2: 0x800f2 0xf9a5 0x0 0x0
> 
> This will be helpful for queued invalidation related debugging.
> 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/intel/dmar.c |  3 +++
>  include/trace/events/intel_iommu.h | 37 ++
>  2 files changed, 40 insertions(+)

Applied, thanks.


Re: [PATCH 1/1] iommu/vt-d: Consolidate duplicate cache invalidation code

2021-01-28 Thread Joerg Roedel
On Thu, Jan 14, 2021 at 04:50:21PM +0800, Lu Baolu wrote:
> The pasid based IOTLB and devTLB invalidation code is duplicated in
> several places. Consolidate them by using the common helpers.
> 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/intel/pasid.c | 18 ++--
>  drivers/iommu/intel/svm.c   | 55 ++---
>  2 files changed, 11 insertions(+), 62 deletions(-)

Applied, thanks.


Re: [PATCH 0/3] Adding offset keeping option when mapping data via SWIOTLB.*

2021-01-28 Thread Greg KH
On Wed, Jan 27, 2021 at 04:38:26PM -0800, Jianxiong Gao wrote:
> NVMe driver and other applications may depend on the data offset
> to operate correctly. Currently when unaligned data is mapped via
> SWIOTLB, the data is mapped as slab aligned with the SWIOTLB. This
> patch adds an option to make sure the mapped data preserves its
> offset of the original address.
> 
> Without the patch, creating an xfs formatted disk on NVMe backends
> with swiotlb=force in the kernel boot options produces the following error:
> meta-data=/dev/nvme2n1   isize=512    agcount=4, agsize=131072 blks
>  =   sectsz=512   attr=2, projid32bit=1
>  =   crc=1        finobt=1, sparse=0, rmapbt=0, reflink=0
> data =   bsize=4096   blocks=524288, imaxpct=25
>  =   sunit=0  swidth=0 blks
> naming   =version 2  bsize=4096   ascii-ci=0 ftype=1
> log  =internal log   bsize=4096   blocks=2560, version=2
>  =   sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none   extsz=4096   blocks=0, rtextents=0
> mkfs.xfs: pwrite failed: Input/output error
> 
> Jianxiong Gao (3):
>   Adding page_offset_mask to device_dma_parameters
>   Add swiotlb offset preserving mapping when
> dma_dma_parameters->page_offset_mask is non zero.
>   Adding device_dma_parameters->offset_preserve_mask to NVMe driver.


Hi,

This is the friendly patch-bot of Greg Kroah-Hartman.  You have sent him
a patch that has triggered this response.  He used to manually respond
to these common problems, but in order to save his sanity (he kept
writing the same thing over and over, yet to different people), I was
created.  Hopefully you will not take offence and will fix the problem
in your patch and resubmit it so that it can be accepted into the Linux
kernel tree.

You are receiving this message because of the following common error(s)
as indicated below:

- You did not write a descriptive Subject: for the patch, allowing Greg,
  and everyone else, to know what this patch is all about.  Please read
  the section entitled "The canonical patch format" in the kernel file,
  Documentation/SubmittingPatches for what a proper Subject: line should
  look like.

If you wish to discuss this problem further, or you have questions about
how to resolve this issue, please feel free to respond to this email and
Greg will reply once he has dug out from the pending patches received
from other developers.

thanks,

greg k-h's patch email bot
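
For context on the series itself, the core idea from the quoted cover letter can be sketched as follows. This is illustrative pseudo-code based only on the cover letter; page_offset_mask is the field proposed there, and pick_bounce_addr() is a hypothetical helper, not code from the posted patches:

/*
 * Illustrative sketch: when a device declares a page offset mask, keep
 * the low bits of the original physical address when placing the data
 * in the bounce buffer, so the device-visible address has the same
 * offset as the original buffer.
 */
static phys_addr_t pick_bounce_addr(struct device *dev, phys_addr_t orig_addr,
                                    phys_addr_t slot_base)
{
        unsigned long offset_mask = 0;

        if (dev->dma_parms)
                offset_mask = dev->dma_parms->page_offset_mask; /* proposed field */

        return slot_base + (orig_addr & offset_mask);
}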