Re: [RFC PATCH v4 00/13] iommu/smmuv3: Implement hardware dirty log tracking

2021-05-17 Thread Keqian Zhu
Hi all,

The VFIO part is here: 
https://lore.kernel.org/kvm/20210507103608.39440-1-zhukeqi...@huawei.com/

Thanks,
Keqian

On 2021/5/7 18:21, Keqian Zhu wrote:
> Hi Robin, Will and everyone,
> 
> I think this series is relatively mature now, please give your valuable
> suggestions, thanks!
> 
> 
> This patch series is split from the series[1] that contains both the IOMMU
> part and the VFIO part. The VFIO part will be sent out in another series.
> 
> [1] 
> https://lore.kernel.org/linux-iommu/20210310090614.26668-1-zhukeqi...@huawei.com/
> 
> changelog:
> 
> v4:
>  - Modify the framework as suggested by Baolu, thanks!
>  - Add trace for iommu ops.
>  - Extract io-pgtable part.
> 
> v3:
>  - Merge start_dirty_log and stop_dirty_log into switch_dirty_log. (Yi Sun)
>  - Maintain the dirty log status in iommu_domain.
>  - Update commit message to make patch easier to review.
> 
> v2:
>  - Address all comments of RFC version, thanks for all of you ;-)
>  - Add a bugfix that starts dirty log for newly added dma ranges and domains.
> 
> 
> 
> Hi everyone,
> 
> This patch series introduces a framework for iommu dirty log tracking, and
> smmuv3 implements this framework. This new feature can be used by VFIO dma
> dirty tracking.
> 
> Intention:
> 
> Some types of IOMMU are capable of tracking DMA dirty log, such as
> ARM SMMU with HTTU or Intel IOMMU with SLADE. This introduces the
> dirty log tracking framework in the IOMMU base layer.
> 
> Three new essential interfaces are added, and we maintain the status
> of dirty log tracking in iommu_domain.
> 1. iommu_switch_dirty_log: Perform actions to start|stop dirty log tracking
> 2. iommu_sync_dirty_log: Sync dirty log from IOMMU into a dirty bitmap
> 3. iommu_clear_dirty_log: Clear dirty log of IOMMU by a mask bitmap
> 
> About SMMU HTTU:
> 
> HTTU (Hardware Translation Table Update) is a feature of ARM SMMUv3; it can
> update the access flag and/or dirty state of the TTD (Translation Table
> Descriptor) by hardware.
> With HTTU, a stage1 TTD is classified into 3 types:
> 
>                       DBM bit    AP[2] (readonly bit)
> 1. writable_clean        1              1
> 2. writable_dirty        1              0
> 3. readonly              0              1
> 
> If HTTU_HD (hardware management of dirty state) is enabled, the smmu can
> change a TTD from writable_clean to writable_dirty. Then software can scan
> the TTDs to sync the dirty state into a dirty bitmap. With this feature, we
> can track the dirty log of DMA continuously and precisely.
> 
> About this series:
> 
> Patch 1-3: Introduce the dirty log tracking framework in the IOMMU base layer,
>            and two common interfaces that can be used by many types of iommu.
> 
> Patch 4-6: Add feature detection for smmu HTTU and enable HTTU for smmu stage1
>            mapping. And add feature detection for smmu BBML. We need to split
>            block mappings when starting dirty log tracking and merge page
>            mappings when stopping dirty log tracking, which requires a
>            break-before-make procedure. But that might cause problems when the
>            TTD is alive: the I/O streams might not tolerate translation
>            faults. So BBML should be used.
> 
> Patch 7-12: We implement these interfaces for arm smmuv3.
> 
> Thanks,
> Keqian
> 
> Jean-Philippe Brucker (1):
>   iommu/arm-smmu-v3: Add support for Hardware Translation Table Update
> 
> Keqian Zhu (1):
>   iommu: Introduce dirty log tracking framework
> 
> Kunkun Jiang (11):
>   iommu/io-pgtable-arm: Add quirk ARM_HD and ARM_BBMLx
>   iommu/io-pgtable-arm: Add and realize split_block ops
>   iommu/io-pgtable-arm: Add and realize merge_page ops
>   iommu/io-pgtable-arm: Add and realize sync_dirty_log ops
>   iommu/io-pgtable-arm: Add and realize clear_dirty_log ops
>   iommu/arm-smmu-v3: Enable HTTU for stage1 with io-pgtable mapping
>   iommu/arm-smmu-v3: Add feature detection for BBML
>   iommu/arm-smmu-v3: Realize switch_dirty_log iommu ops
>   iommu/arm-smmu-v3: Realize sync_dirty_log iommu ops
>   iommu/arm-smmu-v3: Realize clear_dirty_log iommu ops
>   iommu/arm-smmu-v3: Realize support_dirty_log iommu ops
> 
>  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |   2 +
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 268 +++-
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  14 +
>  drivers/iommu/io-pgtable-arm.c| 389 +-
>  drivers/iommu/iommu.c | 206 +-
>  include/linux/io-pgtable.h|  23 ++
>  include/linux/iommu.h |  65 +++
>  include/trace/events/iommu.h  |  63 +++
>  8 files changed, 1026 insertions(+), 4 deletions(-)
> 

Re: [RFC PATCH v4 01/13] iommu: Introduce dirty log tracking framework

2021-05-13 Thread Keqian Zhu



On 2021/5/13 20:02, Lu Baolu wrote:
> On 5/13/21 6:58 PM, Keqian Zhu wrote:
>>
>>
>> On 2021/5/12 19:36, Lu Baolu wrote:
>>> Hi keqian,
>>>
>>> On 5/12/21 4:44 PM, Keqian Zhu wrote:
>>>>
>>>>
>>>> On 2021/5/12 11:20, Lu Baolu wrote:
>>>>> On 5/11/21 3:40 PM, Keqian Zhu wrote:
>>>>>>> For upper layers, before starting page tracking, they check the
>>>>>>> dirty_page_trackable attribute of the domain and start it only if it's
>>>>>>> capable. Once the page tracking is switched on, the vendor iommu driver
>>>>>>> (or iommu core) should block further device attach/detach operations
>>>>>>> until page tracking is stopped.
>>>>>> But when a domain becomes capable after detaching a device, the upper
>>>>>> layer still needs to query it and enable dirty log for it...
>>>>>>
>>>>>> To make things coordinated, maybe the upper layer can register a
>>>>>> notifier. When the domain's capability changes, the upper layer does not
>>>>>> need to query; instead it just needs to implement a callback and apply
>>>>>> its specific policy in the callback.
>>>>>> What do you think?
>>>>>>
>>>>>
>>>>> That might be an option. But why not check the domain's attribute every
>>>>> time a new tracking period is about to start?
>>>> Hi Baolu,
>>>>
>>>> I'll add an attribute in iommu_domain, and the vendor iommu driver will
>>>> update the attribute when attaching/detaching devices.
>>>>
>>>> The attribute should be protected by a lock, so the upper layer shouldn't
>>>> access the attribute directly. Then iommu_domain_support_dirty_log()
>>>> should still be retained. Does this design look good to you?
>>>
>>> Yes, that's what I was thinking of. But I am not sure whether it is worth
>>> a lock here. It seems not to be valid behavior for the upper layer to
>>> attach or detach any device while doing the dirty page tracking.
>> Hi Baolu,
>>
>> Right, if the "detach|attach" interfaces and "dirty tracking" interfaces can 
>> be called concurrently,
>> a lock in iommu_domain_support_dirty_log() is still not enough. I will add 
>> another note for the dirty
>> tracking interfaces.
>>
>> Do you have other suggestions? I will accelerate the progress, so I plan to 
>> send out v5 next week.
> 
> No further comments except the below nit:
> 
> "iommu_switch_dirty_log: Perform actions to start|stop dirty log tracking"
> 
> How about splitting it into
>  - iommu_start_dirty_log()
>  - iommu_stop_dirty_log()
Yeah, actually this is my original version, and the "switch" style was suggested
by Yi Sun.
Anyway, I think both are OK, and the "switch" style can reduce some code.
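
If we do go back to the split, both styles could even coexist as thin wrappers
around the common path, something like this (a sketch only, not part of the
series):

static inline int iommu_start_dirty_log(struct iommu_domain *domain,
					unsigned long iova, size_t size,
					int prot)
{
	return iommu_switch_dirty_log(domain, true, iova, size, prot);
}

static inline int iommu_stop_dirty_log(struct iommu_domain *domain,
				       unsigned long iova, size_t size,
				       int prot)
{
	return iommu_switch_dirty_log(domain, false, iova, size, prot);
}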

Thanks,
Keqian

> 
> Not a strong opinion anyway.
> 
> Best regards,
> baolu
> .
> 


Re: [RFC PATCH v4 01/13] iommu: Introduce dirty log tracking framework

2021-05-13 Thread Keqian Zhu



On 2021/5/12 19:36, Lu Baolu wrote:
> Hi keqian,
> 
> On 5/12/21 4:44 PM, Keqian Zhu wrote:
>>
>>
>> On 2021/5/12 11:20, Lu Baolu wrote:
>>> On 5/11/21 3:40 PM, Keqian Zhu wrote:
>>>>> For upper layers, before starting page tracking, they check the
>>>>> dirty_page_trackable attribute of the domain and start it only if it's
>>>>> capable. Once the page tracking is switched on, the vendor iommu driver
>>>>> (or iommu core) should block further device attach/detach operations
>>>>> until page tracking is stopped.
>>>> But when a domain becomes capable after detaching a device, the upper layer
>>>> still needs to query it and enable dirty log for it...
>>>>
>>>> To make things coordinated, maybe the upper layer can register a notifier.
>>>> When the domain's capability changes, the upper layer does not need to
>>>> query; instead it just needs to implement a callback and apply its
>>>> specific policy in the callback.
>>>> What do you think?
>>>>
>>>
>>> That might be an option. But why not check the domain's attribute every
>>> time a new tracking period is about to start?
>> Hi Baolu,
>>
>> I'll add an attribute in iommu_domain, and the vendor iommu driver will
>> update the attribute when attaching/detaching devices.
>>
>> The attribute should be protected by a lock, so the upper layer shouldn't
>> access the attribute directly. Then iommu_domain_support_dirty_log() should
>> still be retained. Does this design look good to you?
> 
> Yes, that's what I was thinking of. But I am not sure whether it is worth
> a lock here. It seems not to be valid behavior for the upper layer to
> attach or detach any device while doing the dirty page tracking.
Hi Baolu,

Right, if the "detach|attach" interfaces and the "dirty tracking" interfaces can
be called concurrently, a lock in iommu_domain_support_dirty_log() is still not
enough. I will add another note for the dirty tracking interfaces.

Do you have other suggestions? I will accelerate the progress, so I plan to
send out v5 next week.

Thanks,
Keqian


Re: [RFC PATCH v4 01/13] iommu: Introduce dirty log tracking framework

2021-05-12 Thread Keqian Zhu



On 2021/5/12 11:20, Lu Baolu wrote:
> On 5/11/21 3:40 PM, Keqian Zhu wrote:
>>> For upper layers, before starting page tracking, they check the
>>> dirty_page_trackable attribute of the domain and start it only if it's
>>> capable. Once the page tracking is switched on, the vendor iommu driver
>>> (or iommu core) should block further device attach/detach operations
>>> until page tracking is stopped.
>> But when a domain becomes capable after detaching a device, the upper layer
>> still needs to query it and enable dirty log for it...
>>
>> To make things coordinated, maybe the upper layer can register a notifier.
>> When the domain's capability changes, the upper layer does not need to
>> query; instead it just needs to implement a callback and apply its specific
>> policy in the callback.
>> What do you think?
>>
> 
> That might be an option. But why not check the domain's attribute every
> time a new tracking period is about to start?
Hi Baolu,

I'll add an attribute in iommu_domain, and the vendor iommu driver will update
the attribute when attaching/detaching devices.

The attribute should be protected by a lock, so the upper layer shouldn't access
the attribute directly. Then iommu_domain_support_dirty_log() should still be
retained. Does this design look good to you?
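
Roughly like this (just a sketch of the idea; the field name is not final):

/*
 * In struct iommu_domain (sketch):
 *   bool dirty_page_trackable;     updated by the vendor driver in its
 *                                  attach_dev/detach_dev paths
 *   struct mutex switch_log_lock;  already introduced by patch 1
 */
bool iommu_domain_support_dirty_log(struct iommu_domain *domain)
{
	bool trackable;

	mutex_lock(&domain->switch_log_lock);
	trackable = domain->dirty_page_trackable;
	mutex_unlock(&domain->switch_log_lock);

	return trackable;
}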

Thanks,
Keqian


Re: [RFC PATCH v4 01/13] iommu: Introduce dirty log tracking framework

2021-05-11 Thread Keqian Zhu
Hi Baolu,

On 2021/5/11 11:12, Lu Baolu wrote:
> Hi Keqian,
> 
> On 5/10/21 7:07 PM, Keqian Zhu wrote:
>>>>> I suppose this interface is to ask the vendor IOMMU driver to check
>>>>> whether each device/iommu in the domain supports dirty bit tracking.
>>>>> But what will happen if new devices with different tracking capability
>>>>> are added afterward?
>>>> Yep, this is considered in the vfio part. We will query again after 
>>>> attaching or
>>>> detaching devices from the domain.  When the domain becomes capable, we 
>>>> enable
>>>> dirty log for it. When it becomes not capable, we disable dirty log for it.
>>> If that's the case, why not put this logic in the iommu subsystem so
>>> that it doesn't need to be duplicated in different upper layers?
>>>
>>> For example, add something like dirty_page_trackable in the struct of
>>> iommu_domain and ask the vendor iommu driver to update it once any
>>> device is added/removed to/from the domain. It's also better to disallow
>> If we do it, the upper layer still needs to query the capability from the
>> domain and switch dirty log tracking for it. Or do you mean the domain can
>> switch dirty log tracking automatically when its capability changes? If so,
>> I think we lack some flexibility. The upper layer may have its own policy,
>> such as only enabling dirty log tracking when all domains are capable, and
>> disabling dirty log tracking when just one domain is not capable.
> 
> I may not have gotten your point.
> 
> Assume that dirty_page_trackable is an attribute of an iommu_domain.
> This attribute might change once a new device (with different
> capability) is added or removed. So it should be updated every time a new
> device is attached or detached. This work could be done by the vendor
> iommu driver on the path of the dev_attach/dev_detach callback.
Yes, this is how I understood you.

> 
> For upper layers, before starting page tracking, they check the
> dirty_page_trackable attribute of the domain and start it only if it's
> capable. Once the page tracking is switched on, the vendor iommu driver
> (or iommu core) should block further device attach/detach operations
> until page tracking is stopped.
But when a domain becomes capable after detaching a device, the upper layer
still needs to query it and enable dirty log for it...

To make things coordinated, maybe the upper layer can register a notifier.
When the domain's capability changes, the upper layer does not need to query;
instead it just needs to implement a callback and apply its specific policy in
the callback.
What do you think?
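
Roughly something like this (a sketch only; the event name and the registration
helper are made up here, just to show the shape):

#include <linux/iommu.h>
#include <linux/notifier.h>

/* Hypothetical event raised by the iommu core when attach/detach changes
 * a domain's dirty_page_trackable attribute.
 */
#define IOMMU_DOMAIN_TRACKABLE_CHANGED	1

static int upper_trackable_notify(struct notifier_block *nb,
				  unsigned long event, void *data)
{
	struct iommu_domain *domain = data;

	if (event != IOMMU_DOMAIN_TRACKABLE_CHANGED)
		return NOTIFY_DONE;

	/* The upper layer applies its own policy here, e.g. start or stop
	 * dirty log tracking for this domain.
	 */
	if (iommu_support_dirty_log(domain))
		pr_debug("domain became trackable\n");

	return NOTIFY_OK;
}

static struct notifier_block upper_trackable_nb = {
	.notifier_call = upper_trackable_notify,
};
/* registered via a hypothetical
 * iommu_domain_register_notifier(domain, &upper_trackable_nb)
 */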

> 
>>
>>> any domain attach/detach once the dirty page tracking is on.
>> Yep, this can greatly simplify our code logic, but I don't know whether our
>> maintainers agree with that, as they may think that IOMMU dirty logging
>> should not change the original domain behaviors.
> 
> The maintainer owns the last word, but we need to work out a generic and
> self-contained API set.
OK, I see.

Thanks,
Keqian


Re: [RFC PATCH v4 01/13] iommu: Introduce dirty log tracking framework

2021-05-10 Thread Keqian Zhu
Hi Baolu,

On 2021/5/10 9:08, Lu Baolu wrote:
> Hi Keqian,
> 
> On 5/8/21 3:35 PM, Keqian Zhu wrote:
>> Hi Baolu,
>>
>> On 2021/5/8 11:46, Lu Baolu wrote:
>>> Hi Keqian,
>>>
>>> On 5/7/21 6:21 PM, Keqian Zhu wrote:
>>>> Some types of IOMMU are capable of tracking DMA dirty log, such as
>>>> ARM SMMU with HTTU or Intel IOMMU with SLADE. This introduces the
>>>> dirty log tracking framework in the IOMMU base layer.
>>>>
>>>> Four new essential interfaces are added, and we maintain the status
>>>> of dirty log tracking in iommu_domain.
>>>> 1. iommu_support_dirty_log: Check whether domain supports dirty log 
>>>> tracking
>>>> 2. iommu_switch_dirty_log: Perform actions to start|stop dirty log tracking
>>>> 3. iommu_sync_dirty_log: Sync dirty log from IOMMU into a dirty bitmap
>>>> 4. iommu_clear_dirty_log: Clear dirty log of IOMMU by a mask bitmap
>>>>
>>>> Note: Don't concurrently call these interfaces with other ops that
>>>> access underlying page table.
>>>>
>>>> Signed-off-by: Keqian Zhu
>>>> Signed-off-by: Kunkun Jiang
>>>> ---
>>>>drivers/iommu/iommu.c| 201 +++
>>>>include/linux/iommu.h|  63 +++
>>>>include/trace/events/iommu.h |  63 +++
>>>>3 files changed, 327 insertions(+)
>>>>
>>>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>>>> index 808ab70d5df5..0d15620d1e90 100644
>>>> --- a/drivers/iommu/iommu.c
>>>> +++ b/drivers/iommu/iommu.c
>>>> @@ -1940,6 +1940,7 @@ static struct iommu_domain 
>>>> *__iommu_domain_alloc(struct bus_type *bus,
>>>>domain->type = type;
>>>>/* Assume all sizes by default; the driver may override this later 
>>>> */
>>>>domain->pgsize_bitmap  = bus->iommu_ops->pgsize_bitmap;
>>>> +mutex_init(&domain->switch_log_lock);
>>>>  return domain;
>>>>}
>>>> @@ -2703,6 +2704,206 @@ int iommu_set_pgtable_quirks(struct iommu_domain 
>>>> *domain,
>>>>}
>>>>EXPORT_SYMBOL_GPL(iommu_set_pgtable_quirks);
>>>>+bool iommu_support_dirty_log(struct iommu_domain *domain)
>>>> +{
>>>> +const struct iommu_ops *ops = domain->ops;
>>>> +
>>>> +return ops->support_dirty_log && ops->support_dirty_log(domain);
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(iommu_support_dirty_log);
>>> I suppose this interface is to ask the vendor IOMMU driver to check
>>> whether each device/iommu in the domain supports dirty bit tracking.
>>> But what will happen if new devices with different tracking capability
>>> are added afterward?
>> Yep, this is considered in the vfio part. We will query again after 
>> attaching or
>> detaching devices from the domain.  When the domain becomes capable, we 
>> enable
>> dirty log for it. When it becomes not capable, we disable dirty log for it.
> 
> If that's the case, why not put this logic in the iommu subsystem so
> that it doesn't need to be duplicated in different upper layers?
> 
> For example, add something like dirty_page_trackable in the struct of
> iommu_domain and ask the vendor iommu driver to update it once any
> device is added/removed to/from the domain. It's also better to disallow
If we do it, the upper layer still needs to query the capability from the
domain and switch dirty log tracking for it. Or do you mean the domain can
switch dirty log tracking automatically when its capability changes? If so, I
think we lack some flexibility. The upper layer may have its own policy, such
as only enabling dirty log tracking when all domains are capable, and disabling
dirty log tracking when just one domain is not capable.
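
Just to illustrate the kind of policy I mean (a sketch with made-up structures
and helpers, not the actual VFIO code):

/* Re-evaluate after every attach/detach; 'upper', its domain list and the
 * start/stop helpers are hypothetical.
 */
static void upper_update_dirty_log(struct upper_layer *upper)
{
	struct upper_domain *d;
	bool all_capable = true;

	list_for_each_entry(d, &upper->domain_list, next)
		if (!iommu_support_dirty_log(d->domain))
			all_capable = false;

	if (all_capable && !upper->dirty_log_on)
		upper_start_dirty_log(upper);	/* hypothetical helper */
	else if (!all_capable && upper->dirty_log_on)
		upper_stop_dirty_log(upper);	/* hypothetical helper */
}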

> any domain attach/detach once the dirty page tracking is on.
Yep, this can greatly simplify our code logic, but I don't know whether our
maintainers agree with that, as they may think that IOMMU dirty logging should
not change the original domain behaviors.


Thanks,
Keqian


Re: [RFC PATCH v4 01/13] iommu: Introduce dirty log tracking framework

2021-05-08 Thread Keqian Zhu
Hi Baolu,

On 2021/5/8 11:46, Lu Baolu wrote:
> Hi Keqian,
> 
> On 5/7/21 6:21 PM, Keqian Zhu wrote:
>> Some types of IOMMU are capable of tracking DMA dirty log, such as
>> ARM SMMU with HTTU or Intel IOMMU with SLADE. This introduces the
>> dirty log tracking framework in the IOMMU base layer.
>>
>> Four new essential interfaces are added, and we maintain the status
>> of dirty log tracking in iommu_domain.
>> 1. iommu_support_dirty_log: Check whether domain supports dirty log tracking
>> 2. iommu_switch_dirty_log: Perform actions to start|stop dirty log tracking
>> 3. iommu_sync_dirty_log: Sync dirty log from IOMMU into a dirty bitmap
>> 4. iommu_clear_dirty_log: Clear dirty log of IOMMU by a mask bitmap
>>
>> Note: Don't concurrently call these interfaces with other ops that
>> access underlying page table.
>>
>> Signed-off-by: Keqian Zhu 
>> Signed-off-by: Kunkun Jiang 
>> ---
>>   drivers/iommu/iommu.c| 201 +++
>>   include/linux/iommu.h|  63 +++
>>   include/trace/events/iommu.h |  63 +++
>>   3 files changed, 327 insertions(+)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index 808ab70d5df5..0d15620d1e90 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -1940,6 +1940,7 @@ static struct iommu_domain 
>> *__iommu_domain_alloc(struct bus_type *bus,
>>   domain->type = type;
>>   /* Assume all sizes by default; the driver may override this later */
>>   domain->pgsize_bitmap  = bus->iommu_ops->pgsize_bitmap;
>> +mutex_init(&domain->switch_log_lock);
>> return domain;
>>   }
>> @@ -2703,6 +2704,206 @@ int iommu_set_pgtable_quirks(struct iommu_domain 
>> *domain,
>>   }
>>   EXPORT_SYMBOL_GPL(iommu_set_pgtable_quirks);
>>   +bool iommu_support_dirty_log(struct iommu_domain *domain)
>> +{
>> +const struct iommu_ops *ops = domain->ops;
>> +
>> +return ops->support_dirty_log && ops->support_dirty_log(domain);
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_support_dirty_log);
> 
> I suppose this interface is to ask the vendor IOMMU driver to check
> whether each device/iommu in the domain supports dirty bit tracking.
> But what will happen if new devices with different tracking capability
> are added afterward?
Yep, this is considered in the vfio part. We will query again after attaching or
detaching devices from the domain.  When the domain becomes capable, we enable
dirty log for it. When it becomes not capable, we disable dirty log for it.

> 
> To make things simple, is it possible to support this tracking only when
> all underlying IOMMUs support dirty bit tracking?
IIUC, the underlying IOMMUs you refer to are system wide. I think this idea may
have two issues. 1) The target domain may contain just part of the system
IOMMUs. 2) The dirty tracking capability can be related to the capability of
devices. For example, we can track dirty log based on IOPF, which needs device
capability. That is to say, we can make this framework more general.

> 
> Or, the more crazy idea is that we don't need to check this capability
> at all. If dirty bit tracking is not supported by hardware, just mark
> all pages dirty?
Yeah, I think this idea is nice :).

Still, one concern is that we may have other dirty tracking methods in the
future; if we can't track dirty pages through the iommu, we can still try
other methods.

If there is no interface to check this capability, we have no chance to try
other methods. What do you think?

> 
>> +
>> +int iommu_switch_dirty_log(struct iommu_domain *domain, bool enable,
>> +   unsigned long iova, size_t size, int prot)
>> +{
>> +const struct iommu_ops *ops = domain->ops;
>> +unsigned long orig_iova = iova;
>> +unsigned int min_pagesz;
>> +size_t orig_size = size;
>> +bool flush = false;
>> +int ret = 0;
>> +
>> +if (unlikely(!ops->switch_dirty_log))
>> +return -ENODEV;
>> +
>> +min_pagesz = 1 << __ffs(domain->pgsize_bitmap);
>> +if (!IS_ALIGNED(iova | size, min_pagesz)) {
>> +pr_err("unaligned: iova 0x%lx size 0x%zx min_pagesz 0x%x\n",
>> +   iova, size, min_pagesz);
>> +return -EINVAL;
>> +}
>> +
>> +mutex_lock(&domain->switch_log_lock);
>> +if (enable && domain->dirty_log_tracking) {
>> +ret = -EBUSY;
>> +goto out;
>> +} else if (!enable && !domain->dirty_log_tracking) {
>> +ret = -EINVAL;
>>

[RFC PATCH v4 09/13] iommu/arm-smmu-v3: Add feature detection for BBML

2021-05-07 Thread Keqian Zhu
From: Kunkun Jiang 

This detects the BBML feature and, if the SMMU supports it, transfers the
BBMLx quirk to io-pgtable.

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 19 +++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  6 ++
 2 files changed, 25 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index c42e59655fd0..3a2dc3177180 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2051,6 +2051,11 @@ static int arm_smmu_domain_finalise(struct iommu_domain 
*domain,
if (smmu->features & ARM_SMMU_FEAT_HD)
pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_ARM_HD;
 
+   if (smmu->features & ARM_SMMU_FEAT_BBML1)
+   pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_ARM_BBML1;
+   else if (smmu->features & ARM_SMMU_FEAT_BBML2)
+   pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_ARM_BBML2;
+
	pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain);
if (!pgtbl_ops)
return -ENOMEM;
@@ -3419,6 +3424,20 @@ static int arm_smmu_device_hw_probe(struct 
arm_smmu_device *smmu)
 
/* IDR3 */
reg = readl_relaxed(smmu->base + ARM_SMMU_IDR3);
+   switch (FIELD_GET(IDR3_BBML, reg)) {
+   case IDR3_BBML0:
+   break;
+   case IDR3_BBML1:
+   smmu->features |= ARM_SMMU_FEAT_BBML1;
+   break;
+   case IDR3_BBML2:
+   smmu->features |= ARM_SMMU_FEAT_BBML2;
+   break;
+   default:
+   dev_err(smmu->dev, "unknown/unsupported BBM behavior level\n");
+   return -ENXIO;
+   }
+
if (FIELD_GET(IDR3_RIL, reg))
smmu->features |= ARM_SMMU_FEAT_RANGE_INV;
 
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 3edcd31b046e..e3b6bdd292c9 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -54,6 +54,10 @@
 #define IDR1_SIDSIZE   GENMASK(5, 0)
 
 #define ARM_SMMU_IDR3  0xc
+#define IDR3_BBML  GENMASK(12, 11)
+#define IDR3_BBML0 0
+#define IDR3_BBML1 1
+#define IDR3_BBML2 2
 #define IDR3_RIL   (1 << 10)
 
 #define ARM_SMMU_IDR5  0x14
@@ -613,6 +617,8 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_E2H  (1 << 18)
 #define ARM_SMMU_FEAT_HA   (1 << 19)
 #define ARM_SMMU_FEAT_HD   (1 << 20)
+#define ARM_SMMU_FEAT_BBML1(1 << 21)
+#define ARM_SMMU_FEAT_BBML2(1 << 22)
u32 features;
 
 #define ARM_SMMU_OPT_SKIP_PREFETCH (1 << 0)
-- 
2.19.1



[RFC PATCH v4 11/13] iommu/arm-smmu-v3: Realize sync_dirty_log iommu ops

2021-05-07 Thread Keqian Zhu
From: Kunkun Jiang 

This realizes sync_dirty_log iommu ops based on sync_dirty_log
io-pgtable ops.

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 30 +
 1 file changed, 30 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 6de81d6ab652..3d3c0f8e2446 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2721,6 +2721,35 @@ static int arm_smmu_switch_dirty_log(struct iommu_domain 
*domain, bool enable,
return 0;
 }
 
+static int arm_smmu_sync_dirty_log(struct iommu_domain *domain,
+  unsigned long iova, size_t size,
+  unsigned long *bitmap,
+  unsigned long base_iova,
+  unsigned long bitmap_pgshift)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   if (!(smmu->features & ARM_SMMU_FEAT_HD))
+   return -ENODEV;
+   if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1)
+   return -EINVAL;
+
+   if (!ops || !ops->sync_dirty_log) {
+   pr_err("io-pgtable don't realize sync dirty log\n");
+   return -ENODEV;
+   }
+
+   /*
+* Flush iotlb to ensure all inflight transactions are completed.
+* See doc IHI0070Da 3.13.4 "HTTU behavior summary".
+*/
+   arm_smmu_flush_iotlb_all(domain);
+   return ops->sync_dirty_log(ops, iova, size, bitmap, base_iova,
+  bitmap_pgshift);
+}
+
 static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
 {
return iommu_fwspec_add_ids(dev, args->args, 1);
@@ -2820,6 +2849,7 @@ static struct iommu_ops arm_smmu_ops = {
.device_group   = arm_smmu_device_group,
.enable_nesting = arm_smmu_enable_nesting,
.switch_dirty_log   = arm_smmu_switch_dirty_log,
+   .sync_dirty_log = arm_smmu_sync_dirty_log,
.of_xlate   = arm_smmu_of_xlate,
.get_resv_regions   = arm_smmu_get_resv_regions,
.put_resv_regions   = generic_iommu_put_resv_regions,
-- 
2.19.1



[RFC PATCH v4 13/13] iommu/arm-smmu-v3: Realize support_dirty_log iommu ops

2021-05-07 Thread Keqian Zhu
From: Kunkun Jiang 

We have implemented the interfaces required to support iommu
dirty log tracking. The last step is reporting this feature to
the upper user, so the user can apply a higher-level policy based on it.
For arm smmuv3, it is equivalent to ARM_SMMU_FEAT_HD.

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 9b4739247dbb..59d11f084199 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2684,6 +2684,13 @@ static int arm_smmu_merge_page(struct iommu_domain 
*domain, unsigned long iova,
return ret;
 }
 
+static bool arm_smmu_support_dirty_log(struct iommu_domain *domain)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+
+   return !!(smmu_domain->smmu->features & ARM_SMMU_FEAT_HD);
+}
+
 static int arm_smmu_switch_dirty_log(struct iommu_domain *domain, bool enable,
 unsigned long iova, size_t size, int prot)
 {
@@ -2872,6 +2879,7 @@ static struct iommu_ops arm_smmu_ops = {
.release_device = arm_smmu_release_device,
.device_group   = arm_smmu_device_group,
.enable_nesting = arm_smmu_enable_nesting,
+   .support_dirty_log  = arm_smmu_support_dirty_log,
.switch_dirty_log   = arm_smmu_switch_dirty_log,
.sync_dirty_log = arm_smmu_sync_dirty_log,
.clear_dirty_log= arm_smmu_clear_dirty_log,
-- 
2.19.1



[RFC PATCH v4 10/13] iommu/arm-smmu-v3: Realize switch_dirty_log iommu ops

2021-05-07 Thread Keqian Zhu
From: Kunkun Jiang 

This realizes switch_dirty_log. In order to get a finer dirty
granule, it invokes arm_smmu_split_block() when starting dirty
log, and invokes arm_smmu_merge_page() to recover block
mappings when stopping dirty log.

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 142 
 drivers/iommu/iommu.c   |   5 +-
 include/linux/iommu.h   |   2 +
 3 files changed, 147 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 3a2dc3177180..6de81d6ab652 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2580,6 +2580,147 @@ static int arm_smmu_enable_nesting(struct iommu_domain 
*domain)
return ret;
 }
 
+static int arm_smmu_split_block(struct iommu_domain *domain,
+   unsigned long iova, size_t size)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+   struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
+   size_t handled_size;
+
+   if (!(smmu->features & (ARM_SMMU_FEAT_BBML1 | ARM_SMMU_FEAT_BBML2))) {
+   dev_err(smmu->dev, "don't support BBML1/2, can't split block\n");
+   return -ENODEV;
+   }
+   if (!ops || !ops->split_block) {
+   pr_err("io-pgtable don't realize split block\n");
+   return -ENODEV;
+   }
+
+   handled_size = ops->split_block(ops, iova, size);
+   if (handled_size != size) {
+   pr_err("split block failed\n");
+   return -EFAULT;
+   }
+
+   return 0;
+}
+
+static int __arm_smmu_merge_page(struct iommu_domain *domain,
+unsigned long iova, phys_addr_t paddr,
+size_t size, int prot)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
+   size_t handled_size;
+
+   if (!ops || !ops->merge_page) {
+   pr_err("io-pgtable don't realize merge page\n");
+   return -ENODEV;
+   }
+
+   while (size) {
+   size_t pgsize = iommu_pgsize(domain, iova | paddr, size);
+
+   handled_size = ops->merge_page(ops, iova, paddr, pgsize, prot);
+   if (handled_size != pgsize) {
+   pr_err("merge page failed\n");
+   return -EFAULT;
+   }
+
+   pr_debug("merge handled: iova 0x%lx pa %pa size 0x%zx\n",
+		 iova, &paddr, pgsize);
+
+   iova += pgsize;
+   paddr += pgsize;
+   size -= pgsize;
+   }
+
+   return 0;
+}
+
+static int arm_smmu_merge_page(struct iommu_domain *domain, unsigned long iova,
+  size_t size, int prot)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+   struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
+   phys_addr_t phys;
+   dma_addr_t p, i;
+   size_t cont_size;
+   int ret = 0;
+
+   if (!(smmu->features & (ARM_SMMU_FEAT_BBML1 | ARM_SMMU_FEAT_BBML2))) {
+   dev_err(smmu->dev, "don't support BBML1/2, can't merge page\n");
+   return -ENODEV;
+   }
+
+   if (!ops || !ops->iova_to_phys)
+   return -ENODEV;
+
+   while (size) {
+   phys = ops->iova_to_phys(ops, iova);
+   cont_size = PAGE_SIZE;
+   p = phys + cont_size;
+   i = iova + cont_size;
+
+   while (cont_size < size && p == ops->iova_to_phys(ops, i)) {
+   p += PAGE_SIZE;
+   i += PAGE_SIZE;
+   cont_size += PAGE_SIZE;
+   }
+
+   if (cont_size != PAGE_SIZE) {
+   ret = __arm_smmu_merge_page(domain, iova, phys,
+   cont_size, prot);
+   if (ret)
+   break;
+   }
+
+   iova += cont_size;
+   size -= cont_size;
+   }
+
+   return ret;
+}
+
+static int arm_smmu_switch_dirty_log(struct iommu_domain *domain, bool enable,
+unsigned long iova, size_t size, int prot)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   if (!(smmu->features & ARM_SMMU_FEAT_HD))
+   return -ENODEV;
+   if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1)
+

[RFC PATCH v4 07/13] iommu/arm-smmu-v3: Add support for Hardware Translation Table Update

2021-05-07 Thread Keqian Zhu
From: Jean-Philippe Brucker 

If the SMMU supports it and the kernel was built with HTTU support,
enable hardware update of access and dirty flags. This is essential for
shared page tables, to reduce the number of access faults on the fault
queue. Normal DMA with io-pgtables doesn't currently use the access or
dirty flags.

We can enable HTTU even if CPUs don't support it, because the kernel
always checks for HW dirty bit and updates the PTE flags atomically.

Signed-off-by: Jean-Philippe Brucker 
---
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |  2 +
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 41 ++-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  8 
 3 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index bb251cab61f3..ae075e675892 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -121,10 +121,12 @@ static struct arm_smmu_ctx_desc 
*arm_smmu_alloc_shared_cd(struct mm_struct *mm)
if (err)
goto out_free_asid;
 
+   /* HA and HD will be filtered out later if not supported by the SMMU */
tcr = FIELD_PREP(CTXDESC_CD_0_TCR_T0SZ, 64ULL - vabits_actual) |
  FIELD_PREP(CTXDESC_CD_0_TCR_IRGN0, ARM_LPAE_TCR_RGN_WBWA) |
  FIELD_PREP(CTXDESC_CD_0_TCR_ORGN0, ARM_LPAE_TCR_RGN_WBWA) |
  FIELD_PREP(CTXDESC_CD_0_TCR_SH0, ARM_LPAE_TCR_SH_IS) |
+ CTXDESC_CD_0_TCR_HA | CTXDESC_CD_0_TCR_HD |
  CTXDESC_CD_0_TCR_EPD1 | CTXDESC_CD_0_AA64;
 
switch (PAGE_SIZE) {
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 54b2f27b81d4..4ac59a89bc76 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1010,10 +1010,17 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_domain 
*smmu_domain, int ssid,
 * this substream's traffic
 */
} else { /* (1) and (2) */
+   u64 tcr = cd->tcr;
+
cdptr[1] = cpu_to_le64(cd->ttbr & CTXDESC_CD_1_TTB0_MASK);
cdptr[2] = 0;
cdptr[3] = cpu_to_le64(cd->mair);
 
+   if (!(smmu->features & ARM_SMMU_FEAT_HD))
+   tcr &= ~CTXDESC_CD_0_TCR_HD;
+   if (!(smmu->features & ARM_SMMU_FEAT_HA))
+   tcr &= ~CTXDESC_CD_0_TCR_HA;
+
/*
 * STE is live, and the SMMU might read dwords of this CD in any
 * order. Ensure that it observes valid values before reading
@@ -1021,7 +1028,7 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_domain 
*smmu_domain, int ssid,
 */
arm_smmu_sync_cd(smmu_domain, ssid, true);
 
-   val = cd->tcr |
+   val = tcr |
 #ifdef __BIG_ENDIAN
CTXDESC_CD_0_ENDI |
 #endif
@@ -3242,6 +3249,28 @@ static int arm_smmu_device_reset(struct arm_smmu_device 
*smmu, bool bypass)
return 0;
 }
 
+static void arm_smmu_get_httu(struct arm_smmu_device *smmu, u32 reg)
+{
+   u32 fw_features = smmu->features & (ARM_SMMU_FEAT_HA | 
ARM_SMMU_FEAT_HD);
+   u32 features = 0;
+
+   switch (FIELD_GET(IDR0_HTTU, reg)) {
+   case IDR0_HTTU_ACCESS_DIRTY:
+   features |= ARM_SMMU_FEAT_HD;
+   fallthrough;
+   case IDR0_HTTU_ACCESS:
+   features |= ARM_SMMU_FEAT_HA;
+   }
+
+   if (smmu->dev->of_node)
+   smmu->features |= features;
+   else if (features != fw_features)
+   /* ACPI IORT sets the HTTU bits */
+   dev_warn(smmu->dev,
+"IDR0.HTTU overridden by FW configuration (0x%x)\n",
+fw_features);
+}
+
 static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 {
u32 reg;
@@ -3302,6 +3331,8 @@ static int arm_smmu_device_hw_probe(struct 
arm_smmu_device *smmu)
smmu->features |= ARM_SMMU_FEAT_E2H;
}
 
+   arm_smmu_get_httu(smmu, reg);
+
/*
 * The coherency feature as set by FW is used in preference to the ID
 * register, but warn on mismatch.
@@ -3487,6 +3518,14 @@ static int arm_smmu_device_acpi_probe(struct 
platform_device *pdev,
if (iort_smmu->flags & ACPI_IORT_SMMU_V3_COHACC_OVERRIDE)
smmu->features |= ARM_SMMU_FEAT_COHERENCY;
 
+   switch (FIELD_GET(ACPI_IORT_SMMU_V3_HTTU_OVERRIDE, iort_smmu->flags)) {
+   case IDR0_HTTU_ACCESS_DIRTY:
+   smmu->features |= ARM_SMMU_FEAT_HD;
+   fallthrough;
+   case IDR0_HTTU_ACCESS:
+   smmu->features |= ARM_SMMU_FEAT_HA;
+   }
+
return 0;
 }
 #else
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 

[RFC PATCH v4 08/13] iommu/arm-smmu-v3: Enable HTTU for stage1 with io-pgtable mapping

2021-05-07 Thread Keqian Zhu
From: Kunkun Jiang 

As nested mode is not upstreamed now, we just aim to support dirty
log tracking for stage1 with io-pgtable mapping (meaning SVA mapping
is not supported). If HTTU is supported, we enable the HA/HD bits in
the SMMU CD and transfer the ARM_HD quirk to io-pgtable.

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 4ac59a89bc76..c42e59655fd0 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1942,6 +1942,7 @@ static int arm_smmu_domain_finalise_s1(struct 
arm_smmu_domain *smmu_domain,
  FIELD_PREP(CTXDESC_CD_0_TCR_ORGN0, tcr->orgn) |
  FIELD_PREP(CTXDESC_CD_0_TCR_SH0, tcr->sh) |
  FIELD_PREP(CTXDESC_CD_0_TCR_IPS, tcr->ips) |
+ CTXDESC_CD_0_TCR_HA | CTXDESC_CD_0_TCR_HD |
  CTXDESC_CD_0_TCR_EPD1 | CTXDESC_CD_0_AA64;
cfg->cd.mair= pgtbl_cfg->arm_lpae_s1_cfg.mair;
 
@@ -2047,6 +2048,8 @@ static int arm_smmu_domain_finalise(struct iommu_domain 
*domain,
 
if (!iommu_get_dma_strict(domain))
pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
+   if (smmu->features & ARM_SMMU_FEAT_HD)
+   pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_ARM_HD;
 
	pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain);
if (!pgtbl_ops)
-- 
2.19.1



[RFC PATCH v4 06/13] iommu/io-pgtable-arm: Add and realize clear_dirty_log ops

2021-05-07 Thread Keqian Zhu
From: Kunkun Jiang 

After the dirty log is retrieved, the user should clear the dirty log to
re-enable dirty log tracking for these dirtied pages. This clears the dirty
state (as we just set the DBM bit for stage1 mapping, we should set the AP[2]
bit) of the leaf TTDs that are specified by the user-provided bitmap.

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/io-pgtable-arm.c | 93 ++
 include/linux/io-pgtable.h |  4 ++
 2 files changed, 97 insertions(+)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 155d440099ab..2b41b9d0faa3 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -965,6 +965,98 @@ static int arm_lpae_sync_dirty_log(struct io_pgtable_ops 
*ops,
 bitmap, base_iova, bitmap_pgshift);
 }
 
+static int __arm_lpae_clear_dirty_log(struct arm_lpae_io_pgtable *data,
+ unsigned long iova, size_t size,
+ int lvl, arm_lpae_iopte *ptep,
+ unsigned long *bitmap,
+ unsigned long base_iova,
+ unsigned long bitmap_pgshift)
+{
+   arm_lpae_iopte pte;
+   struct io_pgtable *iop = &data->iop;
+   unsigned long offset;
+   size_t base, next_size;
+   int nbits, ret, i;
+
+   if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
+   return -EINVAL;
+
+   ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
+   pte = READ_ONCE(*ptep);
+   if (WARN_ON(!pte))
+   return -EINVAL;
+
+   if (size == ARM_LPAE_BLOCK_SIZE(lvl, data)) {
+   if (iopte_leaf(pte, lvl, iop->fmt)) {
+   if (pte & ARM_LPAE_PTE_AP_RDONLY)
+   return 0;
+
+   /* Ensure all corresponding bits are set */
+   nbits = size >> bitmap_pgshift;
+   offset = (iova - base_iova) >> bitmap_pgshift;
+   for (i = offset; i < offset + nbits; i++) {
+   if (!test_bit(i, bitmap))
+   return 0;
+   }
+
+   /* Race does not exist */
+   pte |= ARM_LPAE_PTE_AP_RDONLY;
+   __arm_lpae_set_pte(ptep, pte, >cfg);
+   return 0;
+   }
+   /* Current level is table, traverse next level */
+   next_size = ARM_LPAE_BLOCK_SIZE(lvl + 1, data);
+   ptep = iopte_deref(pte, data);
+   for (base = 0; base < size; base += next_size) {
+   ret = __arm_lpae_clear_dirty_log(data, iova + base,
+   next_size, lvl + 1, ptep, bitmap,
+   base_iova, bitmap_pgshift);
+   if (ret)
+   return ret;
+   }
+   return 0;
+   } else if (iopte_leaf(pte, lvl, iop->fmt)) {
+   /* Though the size is too small, it is already clean */
+   if (pte & ARM_LPAE_PTE_AP_RDONLY)
+   return 0;
+
+   return -EINVAL;
+   }
+
+   /* Keep on walking */
+   ptep = iopte_deref(pte, data);
+   return __arm_lpae_clear_dirty_log(data, iova, size, lvl + 1, ptep,
+   bitmap, base_iova, bitmap_pgshift);
+}
+
+static int arm_lpae_clear_dirty_log(struct io_pgtable_ops *ops,
+   unsigned long iova, size_t size,
+   unsigned long *bitmap,
+   unsigned long base_iova,
+   unsigned long bitmap_pgshift)
+{
+   struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
+   struct io_pgtable_cfg *cfg = &data->iop.cfg;
+   arm_lpae_iopte *ptep = data->pgd;
+   int lvl = data->start_level;
+   long iaext = (s64)iova >> cfg->ias;
+
+   if (WARN_ON(!size || (size & cfg->pgsize_bitmap) != size))
+   return -EINVAL;
+
+   if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
+   iaext = ~iaext;
+   if (WARN_ON(iaext))
+   return -EINVAL;
+
+   if (data->iop.fmt != ARM_64_LPAE_S1 &&
+   data->iop.fmt != ARM_32_LPAE_S1)
+   return -EINVAL;
+
+   return __arm_lpae_clear_dirty_log(data, iova, size, lvl, ptep,
+   bitmap, base_iova, bitmap_pgshift);
+}
+
 static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg)
 {
unsigned long granule, page_sizes;
@@ -1046,6 +1138,7 @@ arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg)
.split_block= arm_lpae_split_block,
.merge_page 

[RFC PATCH v4 12/13] iommu/arm-smmu-v3: Realize clear_dirty_log iommu ops

2021-05-07 Thread Keqian Zhu
From: Kunkun Jiang 

This realizes clear_dirty_log iommu ops based on clear_dirty_log
io-pgtable ops.

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 3d3c0f8e2446..9b4739247dbb 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2750,6 +2750,30 @@ static int arm_smmu_sync_dirty_log(struct iommu_domain 
*domain,
   bitmap_pgshift);
 }
 
+static int arm_smmu_clear_dirty_log(struct iommu_domain *domain,
+   unsigned long iova, size_t size,
+   unsigned long *bitmap,
+   unsigned long base_iova,
+   unsigned long bitmap_pgshift)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   if (!(smmu->features & ARM_SMMU_FEAT_HD))
+   return -ENODEV;
+   if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1)
+   return -EINVAL;
+
+   if (!ops || !ops->clear_dirty_log) {
+   pr_err("io-pgtable don't realize clear dirty log\n");
+   return -ENODEV;
+   }
+
+   return ops->clear_dirty_log(ops, iova, size, bitmap, base_iova,
+   bitmap_pgshift);
+}
+
 static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
 {
return iommu_fwspec_add_ids(dev, args->args, 1);
@@ -2850,6 +2874,7 @@ static struct iommu_ops arm_smmu_ops = {
.enable_nesting = arm_smmu_enable_nesting,
.switch_dirty_log   = arm_smmu_switch_dirty_log,
.sync_dirty_log = arm_smmu_sync_dirty_log,
+   .clear_dirty_log= arm_smmu_clear_dirty_log,
.of_xlate   = arm_smmu_of_xlate,
.get_resv_regions   = arm_smmu_get_resv_regions,
.put_resv_regions   = generic_iommu_put_resv_regions,
-- 
2.19.1



[RFC PATCH v4 04/13] iommu/io-pgtable-arm: Add and realize merge_page ops

2021-05-07 Thread Keqian Zhu
From: Kunkun Jiang 

If block (largepage) mappings are split when dirty log is started, then
when dirty log is stopped we need to recover them for better DMA performance.

This recovers block mappings and unmaps the span of page mappings. The BBML1
or BBML2 feature is required.

Merging pages is designed to be used only by dirty log tracking, which
does not work concurrently with other pgtable ops that access the underlying
page table, so race conditions do not exist.

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/io-pgtable-arm.c | 78 ++
 include/linux/io-pgtable.h |  2 +
 2 files changed, 80 insertions(+)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 664a9548b199..b9f6e3370032 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -800,6 +800,83 @@ static size_t arm_lpae_split_block(struct io_pgtable_ops 
*ops,
return __arm_lpae_split_block(data, iova, size, lvl, ptep);
 }
 
+static size_t __arm_lpae_merge_page(struct arm_lpae_io_pgtable *data,
+   unsigned long iova, phys_addr_t paddr,
+   size_t size, int lvl, arm_lpae_iopte *ptep,
+   arm_lpae_iopte prot)
+{
+   arm_lpae_iopte pte, *tablep;
+   struct io_pgtable *iop = &data->iop;
+   struct io_pgtable_cfg *cfg = &data->iop.cfg;
+
+   if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
+   return 0;
+
+   ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
+   pte = READ_ONCE(*ptep);
+   if (WARN_ON(!pte))
+   return 0;
+
+   if (size == ARM_LPAE_BLOCK_SIZE(lvl, data)) {
+   if (iopte_leaf(pte, lvl, iop->fmt))
+   return size;
+
+   /* Race does not exist */
+   if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_BBML1) {
+   prot |= ARM_LPAE_PTE_NT;
+   __arm_lpae_init_pte(data, paddr, prot, lvl, ptep);
+   io_pgtable_tlb_flush_walk(iop, iova, size,
+ ARM_LPAE_GRANULE(data));
+
+   prot &= ~(ARM_LPAE_PTE_NT);
+   __arm_lpae_init_pte(data, paddr, prot, lvl, ptep);
+   } else {
+   __arm_lpae_init_pte(data, paddr, prot, lvl, ptep);
+   }
+
+   tablep = iopte_deref(pte, data);
+   __arm_lpae_free_pgtable(data, lvl + 1, tablep);
+   return size;
+   } else if (iopte_leaf(pte, lvl, iop->fmt)) {
+   /* The size is too small, already merged */
+   return size;
+   }
+
+   /* Keep on walking */
+   ptep = iopte_deref(pte, data);
+   return __arm_lpae_merge_page(data, iova, paddr, size, lvl + 1, ptep, prot);
+}
+
+static size_t arm_lpae_merge_page(struct io_pgtable_ops *ops, unsigned long 
iova,
+ phys_addr_t paddr, size_t size, int 
iommu_prot)
+{
+   struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
+   struct io_pgtable_cfg *cfg = &data->iop.cfg;
+   arm_lpae_iopte *ptep = data->pgd;
+   int lvl = data->start_level;
+   arm_lpae_iopte prot;
+   long iaext = (s64)iova >> cfg->ias;
+
+   if (WARN_ON(!size || (size & cfg->pgsize_bitmap) != size))
+   return 0;
+
+   if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
+   iaext = ~iaext;
+   if (WARN_ON(iaext || paddr >> cfg->oas))
+   return 0;
+
+   /* If no access, then nothing to do */
+   if (!(iommu_prot & (IOMMU_READ | IOMMU_WRITE)))
+   return size;
+
+   /* If it is smallest granule, then nothing to do */
+   if (size == ARM_LPAE_BLOCK_SIZE(ARM_LPAE_MAX_LEVELS - 1, data))
+   return size;
+
+   prot = arm_lpae_prot_to_pte(data, iommu_prot);
+   return __arm_lpae_merge_page(data, iova, paddr, size, lvl, ptep, prot);
+}
+
 static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg)
 {
unsigned long granule, page_sizes;
@@ -879,6 +956,7 @@ arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg)
.unmap  = arm_lpae_unmap,
.iova_to_phys   = arm_lpae_iova_to_phys,
.split_block= arm_lpae_split_block,
+   .merge_page = arm_lpae_merge_page,
};
 
return data;
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index eba6c6ccbe49..e77576d946a2 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -169,6 +169,8 @@ struct io_pgtable_ops {
unsigned long iova);
size_t (*split_block)(struct io_pgtable_ops *ops, unsigned long iova,
  size_t size);
+   size_t (*merge_page)(struct io_pgtable_ops *ops, unsigned long iova,

[RFC PATCH v4 05/13] iommu/io-pgtable-arm: Add and realize sync_dirty_log ops

2021-05-07 Thread Keqian Zhu
From: Kunkun Jiang 

During dirty log tracking, the user will try to retrieve the dirty log from
the iommu if it supports hardware dirty log. Scan the leaf TTDs and treat a
TTD as dirty if it's writable. As we just set the DBM bit for stage1 mapping,
we check whether AP[2] is not set.

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/io-pgtable-arm.c | 89 ++
 include/linux/io-pgtable.h |  4 ++
 2 files changed, 93 insertions(+)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index b9f6e3370032..155d440099ab 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -877,6 +877,94 @@ static size_t arm_lpae_merge_page(struct io_pgtable_ops 
*ops, unsigned long iova
return __arm_lpae_merge_page(data, iova, paddr, size, lvl, ptep, prot);
 }
 
+static int __arm_lpae_sync_dirty_log(struct arm_lpae_io_pgtable *data,
+unsigned long iova, size_t size,
+int lvl, arm_lpae_iopte *ptep,
+unsigned long *bitmap,
+unsigned long base_iova,
+unsigned long bitmap_pgshift)
+{
+   arm_lpae_iopte pte;
+   struct io_pgtable *iop = &data->iop;
+   size_t base, next_size;
+   unsigned long offset;
+   int nbits, ret;
+
+   if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
+   return -EINVAL;
+
+   ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
+   pte = READ_ONCE(*ptep);
+   if (WARN_ON(!pte))
+   return -EINVAL;
+
+   if (size == ARM_LPAE_BLOCK_SIZE(lvl, data)) {
+   if (iopte_leaf(pte, lvl, iop->fmt)) {
+   if (pte & ARM_LPAE_PTE_AP_RDONLY)
+   return 0;
+
+   /* It is writable, set the bitmap */
+   nbits = size >> bitmap_pgshift;
+   offset = (iova - base_iova) >> bitmap_pgshift;
+   bitmap_set(bitmap, offset, nbits);
+   return 0;
+   }
+   /* Current level is table, traverse next level */
+   next_size = ARM_LPAE_BLOCK_SIZE(lvl + 1, data);
+   ptep = iopte_deref(pte, data);
+   for (base = 0; base < size; base += next_size) {
+   ret = __arm_lpae_sync_dirty_log(data, iova + base,
+   next_size, lvl + 1, ptep, bitmap,
+   base_iova, bitmap_pgshift);
+   if (ret)
+   return ret;
+   }
+   return 0;
+   } else if (iopte_leaf(pte, lvl, iop->fmt)) {
+   if (pte & ARM_LPAE_PTE_AP_RDONLY)
+   return 0;
+
+   /* Though the size is too small, also set bitmap */
+   nbits = size >> bitmap_pgshift;
+   offset = (iova - base_iova) >> bitmap_pgshift;
+   bitmap_set(bitmap, offset, nbits);
+   return 0;
+   }
+
+   /* Keep on walking */
+   ptep = iopte_deref(pte, data);
+   return __arm_lpae_sync_dirty_log(data, iova, size, lvl + 1, ptep,
+   bitmap, base_iova, bitmap_pgshift);
+}
+
+static int arm_lpae_sync_dirty_log(struct io_pgtable_ops *ops,
+  unsigned long iova, size_t size,
+  unsigned long *bitmap,
+  unsigned long base_iova,
+  unsigned long bitmap_pgshift)
+{
+   struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
+   struct io_pgtable_cfg *cfg = &data->iop.cfg;
+   arm_lpae_iopte *ptep = data->pgd;
+   int lvl = data->start_level;
+   long iaext = (s64)iova >> cfg->ias;
+
+   if (WARN_ON(!size || (size & cfg->pgsize_bitmap) != size))
+   return -EINVAL;
+
+   if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
+   iaext = ~iaext;
+   if (WARN_ON(iaext))
+   return -EINVAL;
+
+   if (data->iop.fmt != ARM_64_LPAE_S1 &&
+   data->iop.fmt != ARM_32_LPAE_S1)
+   return -EINVAL;
+
+   return __arm_lpae_sync_dirty_log(data, iova, size, lvl, ptep,
+bitmap, base_iova, bitmap_pgshift);
+}
+
 static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg)
 {
unsigned long granule, page_sizes;
@@ -957,6 +1045,7 @@ arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg)
.iova_to_phys   = arm_lpae_iova_to_phys,
.split_block= arm_lpae_split_block,
.merge_page = arm_lpae_merge_page,
+   .sync_dirty_log = arm_lpae_sync_dirty_log,
};
 
return data;
diff --git a/include/linux/io-pg

[RFC PATCH v4 01/13] iommu: Introduce dirty log tracking framework

2021-05-07 Thread Keqian Zhu
Some types of IOMMU are capable of tracking DMA dirty log, such as
ARM SMMU with HTTU or Intel IOMMU with SLADE. This introduces the
dirty log tracking framework in the IOMMU base layer.

Four new essential interfaces are added, and we maintain the status
of dirty log tracking in iommu_domain.
1. iommu_support_dirty_log: Check whether domain supports dirty log tracking
2. iommu_switch_dirty_log: Perform actions to start|stop dirty log tracking
3. iommu_sync_dirty_log: Sync dirty log from IOMMU into a dirty bitmap
4. iommu_clear_dirty_log: Clear dirty log of IOMMU by a mask bitmap

Note: Don't concurrently call these interfaces with other ops that
access underlying page table.

Signed-off-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/iommu.c| 201 +++
 include/linux/iommu.h|  63 +++
 include/trace/events/iommu.h |  63 +++
 3 files changed, 327 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 808ab70d5df5..0d15620d1e90 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1940,6 +1940,7 @@ static struct iommu_domain *__iommu_domain_alloc(struct 
bus_type *bus,
domain->type = type;
/* Assume all sizes by default; the driver may override this later */
domain->pgsize_bitmap  = bus->iommu_ops->pgsize_bitmap;
+   mutex_init(&domain->switch_log_lock);
 
return domain;
 }
@@ -2703,6 +2704,206 @@ int iommu_set_pgtable_quirks(struct iommu_domain 
*domain,
 }
 EXPORT_SYMBOL_GPL(iommu_set_pgtable_quirks);
 
+bool iommu_support_dirty_log(struct iommu_domain *domain)
+{
+   const struct iommu_ops *ops = domain->ops;
+
+   return ops->support_dirty_log && ops->support_dirty_log(domain);
+}
+EXPORT_SYMBOL_GPL(iommu_support_dirty_log);
+
+int iommu_switch_dirty_log(struct iommu_domain *domain, bool enable,
+  unsigned long iova, size_t size, int prot)
+{
+   const struct iommu_ops *ops = domain->ops;
+   unsigned long orig_iova = iova;
+   unsigned int min_pagesz;
+   size_t orig_size = size;
+   bool flush = false;
+   int ret = 0;
+
+   if (unlikely(!ops->switch_dirty_log))
+   return -ENODEV;
+
+   min_pagesz = 1 << __ffs(domain->pgsize_bitmap);
+   if (!IS_ALIGNED(iova | size, min_pagesz)) {
+   pr_err("unaligned: iova 0x%lx size 0x%zx min_pagesz 0x%x\n",
+  iova, size, min_pagesz);
+   return -EINVAL;
+   }
+
+   mutex_lock(&domain->switch_log_lock);
+   if (enable && domain->dirty_log_tracking) {
+   ret = -EBUSY;
+   goto out;
+   } else if (!enable && !domain->dirty_log_tracking) {
+   ret = -EINVAL;
+   goto out;
+   }
+
+   pr_debug("switch_dirty_log %s for: iova 0x%lx size 0x%zx\n",
+enable ? "enable" : "disable", iova, size);
+
+   while (size) {
+   size_t pgsize = iommu_pgsize(domain, iova, size);
+
+   flush = true;
+   ret = ops->switch_dirty_log(domain, enable, iova, pgsize, prot);
+   if (ret)
+   break;
+
+   pr_debug("switch_dirty_log handled: iova 0x%lx size 0x%zx\n",
+iova, pgsize);
+
+   iova += pgsize;
+   size -= pgsize;
+   }
+
+   if (flush)
+   iommu_flush_iotlb_all(domain);
+
+   if (!ret) {
+   domain->dirty_log_tracking = enable;
+   trace_switch_dirty_log(orig_iova, orig_size, enable);
+   }
+out:
+   mutex_unlock(&domain->switch_log_lock);
+   return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_switch_dirty_log);
+
+int iommu_sync_dirty_log(struct iommu_domain *domain, unsigned long iova,
+size_t size, unsigned long *bitmap,
+unsigned long base_iova, unsigned long bitmap_pgshift)
+{
+   const struct iommu_ops *ops = domain->ops;
+   unsigned long orig_iova = iova;
+   unsigned int min_pagesz;
+   size_t orig_size = size;
+   int ret = 0;
+
+   if (unlikely(!ops->sync_dirty_log))
+   return -ENODEV;
+
+   min_pagesz = 1 << __ffs(domain->pgsize_bitmap);
+   if (!IS_ALIGNED(iova | size, min_pagesz)) {
+   pr_err("unaligned: iova 0x%lx size 0x%zx min_pagesz 0x%x\n",
+  iova, size, min_pagesz);
+   return -EINVAL;
+   }
+
+   mutex_lock(&domain->switch_log_lock);
+   if (!domain->dirty_log_tracking) {
+   ret = -EINVAL;
+   goto out;
+   }
+
+   pr_debug("sync_dirty_log for: iova 0x%lx size 0x%zx\n", iova, size);
+
+   while (size) {
+   size_t pgsize = iommu_pgsize(domain, iova, size);
+
+   ret = op

[RFC PATCH v4 00/13] iommu/smmuv3: Implement hardware dirty log tracking

2021-05-07 Thread Keqian Zhu
Hi Robin, Will and everyone,

I think this series is relatively mature now, please give your valuable 
suggestions,
thanks!


This patch series is split from the series[1] that contains both IOMMU part and
VFIO part. The VFIO part will be sent out in another series.

[1] 
https://lore.kernel.org/linux-iommu/20210310090614.26668-1-zhukeqi...@huawei.com/

changelog:

v4:
 - Modify the framework as suggested by Baolu, thanks!
 - Add trace for iommu ops.
 - Extract io-pgtable part.

v3:
 - Merge start_dirty_log and stop_dirty_log into switch_dirty_log. (Yi Sun)
 - Maintain the dirty log status in iommu_domain.
 - Update commit message to make patch easier to review.

v2:
 - Address all comments of RFC version, thanks for all of you ;-)
 - Add a bugfix that start dirty log for newly added dma ranges and domain.



Hi everyone,

This patch series introduces a framework of iommu dirty log tracking, and smmuv3
realizes this framework. This new feature can be used by VFIO dma dirty 
tracking.

Intention:

Some types of IOMMU are capable of tracking DMA dirty log, such as
ARM SMMU with HTTU or Intel IOMMU with SLADE. This introduces the
dirty log tracking framework in the IOMMU base layer.

Three new essential interfaces are added, and we maintain the status
of dirty log tracking in iommu_domain.
1. iommu_switch_dirty_log: Perform actions to start|stop dirty log tracking
2. iommu_sync_dirty_log: Sync dirty log from IOMMU into a dirty bitmap
3. iommu_clear_dirty_log: Clear dirty log of IOMMU by a mask bitmap

About SMMU HTTU:

HTTU (Hardware Translation Table Update) is a feature of ARM SMMUv3, it can 
update
access flag or/and dirty state of the TTD (Translation Table Descriptor) by 
hardware.
With HTTU, stage1 TTD is classified into 3 types:
DBM bit AP[2](readonly bit)
1. writable_clean 1   1
2. writable_dirty 1   0
3. readonly   0   1

If HTTU_HD (manage dirty state) is enabled, smmu can change TTD from 
writable_clean to
writable_dirty. Then software can scan TTD to sync dirty state into dirty 
bitmap. With
this feature, we can track the dirty log of DMA continuously and precisely.

About this series:

Patch 1-3: Introduce dirty log tracking framework in the IOMMU base layer, and 
two common
   interfaces that can be used by many types of iommu.

Patch 4-6: Add feature detection for smmu HTTU and enable HTTU for smmu stage1 
mapping.
   And add feature detection for smmu BBML. We need to split block 
mapping when
   start dirty log tracking and merge page mapping when stop dirty log 
tracking,
   which requires break-before-make procedure. But it might 
cause problems when the
   TTD is alive. The I/O streams might not tolerate translation 
faults. So BBML
   should be used.

Patch 7-12: We implement these interfaces for arm smmuv3.

Thanks,
Keqian

Jean-Philippe Brucker (1):
  iommu/arm-smmu-v3: Add support for Hardware Translation Table Update

Keqian Zhu (1):
  iommu: Introduce dirty log tracking framework

Kunkun Jiang (11):
  iommu/io-pgtable-arm: Add quirk ARM_HD and ARM_BBMLx
  iommu/io-pgtable-arm: Add and realize split_block ops
  iommu/io-pgtable-arm: Add and realize merge_page ops
  iommu/io-pgtable-arm: Add and realize sync_dirty_log ops
  iommu/io-pgtable-arm: Add and realize clear_dirty_log ops
  iommu/arm-smmu-v3: Enable HTTU for stage1 with io-pgtable mapping
  iommu/arm-smmu-v3: Add feature detection for BBML
  iommu/arm-smmu-v3: Realize switch_dirty_log iommu ops
  iommu/arm-smmu-v3: Realize sync_dirty_log iommu ops
  iommu/arm-smmu-v3: Realize clear_dirty_log iommu ops
  iommu/arm-smmu-v3: Realize support_dirty_log iommu ops

 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |   2 +
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 268 +++-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  14 +
 drivers/iommu/io-pgtable-arm.c| 389 +-
 drivers/iommu/iommu.c | 206 +-
 include/linux/io-pgtable.h|  23 ++
 include/linux/iommu.h |  65 +++
 include/trace/events/iommu.h  |  63 +++
 8 files changed, 1026 insertions(+), 4 deletions(-)

-- 
2.19.1


[RFC PATCH v4 02/13] iommu/io-pgtable-arm: Add quirk ARM_HD and ARM_BBMLx

2021-05-07 Thread Keqian Zhu
From: Kunkun Jiang 

These features are essential to support dirty log tracking for
SMMU with io-pgtable mapping.

The dirty state information is encoded using the access permission
bits AP[2] (stage 1) or S2AP[1] (stage 2) in conjunction with the
DBM (Dirty Bit Modifier) bit, where DBM means writable and AP[2]/
S2AP[1] means dirty.

When ARM_HD is enabled, we set the DBM bit for stage 1 mappings. As SMMU
nested mode is not upstream for now, we only aim to support dirty log
tracking for stage 1 with io-pgtable mapping (which means SVA is not
supported).

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/io-pgtable-arm.c |  7 ++-
 include/linux/io-pgtable.h | 11 +++
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 87def58e79b5..94d790b8ed27 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -72,6 +72,7 @@
 
 #define ARM_LPAE_PTE_NSTABLE   (((arm_lpae_iopte)1) << 63)
 #define ARM_LPAE_PTE_XN(((arm_lpae_iopte)3) << 53)
+#define ARM_LPAE_PTE_DBM   (((arm_lpae_iopte)1) << 51)
 #define ARM_LPAE_PTE_AF(((arm_lpae_iopte)1) << 10)
 #define ARM_LPAE_PTE_SH_NS (((arm_lpae_iopte)0) << 8)
 #define ARM_LPAE_PTE_SH_OS (((arm_lpae_iopte)2) << 8)
@@ -81,7 +82,7 @@
 
 #define ARM_LPAE_PTE_ATTR_LO_MASK  (((arm_lpae_iopte)0x3ff) << 2)
 /* Ignore the contiguous bit for block splitting */
-#define ARM_LPAE_PTE_ATTR_HI_MASK  (((arm_lpae_iopte)6) << 52)
+#define ARM_LPAE_PTE_ATTR_HI_MASK  (((arm_lpae_iopte)13) << 51)
 #define ARM_LPAE_PTE_ATTR_MASK (ARM_LPAE_PTE_ATTR_LO_MASK |\
 ARM_LPAE_PTE_ATTR_HI_MASK)
 /* Software bit for solving coherency races */
@@ -379,6 +380,7 @@ static int __arm_lpae_map(struct arm_lpae_io_pgtable *data, 
unsigned long iova,
 static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
   int prot)
 {
+   struct io_pgtable_cfg *cfg = &data->iop.cfg;
arm_lpae_iopte pte;
 
if (data->iop.fmt == ARM_64_LPAE_S1 ||
@@ -386,6 +388,9 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct 
arm_lpae_io_pgtable *data,
pte = ARM_LPAE_PTE_nG;
if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ))
pte |= ARM_LPAE_PTE_AP_RDONLY;
+   else if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_HD)
+   pte |= ARM_LPAE_PTE_DBM;
+
if (!(prot & IOMMU_PRIV))
pte |= ARM_LPAE_PTE_AP_UNPRIV;
} else {
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 4d40dfa75b55..92274705b772 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -82,6 +82,14 @@ struct io_pgtable_cfg {
 *
 * IO_PGTABLE_QUIRK_ARM_OUTER_WBWA: Override the outer-cacheability
 *  attributes set in the TCR for a non-coherent page-table walker.
+*
+* IO_PGTABLE_QUIRK_ARM_HD: Support hardware management of dirty status.
+*
+* IO_PGTABLE_QUIRK_ARM_BBML1: ARM SMMU supports BBM Level 1 behavior
+*  when changing block size.
+*
+* IO_PGTABLE_QUIRK_ARM_BBML2: ARM SMMU supports BBM Level 2 behavior
+*  when changing block size.
 */
#define IO_PGTABLE_QUIRK_ARM_NS BIT(0)
#define IO_PGTABLE_QUIRK_NO_PERMS   BIT(1)
@@ -89,6 +97,9 @@ struct io_pgtable_cfg {
#define IO_PGTABLE_QUIRK_NON_STRICT BIT(4)
#define IO_PGTABLE_QUIRK_ARM_TTBR1  BIT(5)
#define IO_PGTABLE_QUIRK_ARM_OUTER_WBWA BIT(6)
+   #define IO_PGTABLE_QUIRK_ARM_HD BIT(7)
+   #define IO_PGTABLE_QUIRK_ARM_BBML1  BIT(8)
+   #define IO_PGTABLE_QUIRK_ARM_BBML2  BIT(9)
unsigned long   quirks;
unsigned long   pgsize_bitmap;
unsigned intias;
-- 
2.19.1



[RFC PATCH v4 03/13] iommu/io-pgtable-arm: Add and realize split_block ops

2021-05-07 Thread Keqian Zhu
From: Kunkun Jiang 

Block (large page) mapping is not a proper granule for dirty log tracking.
Take an extreme example: if DMA writes one byte, then under a 1G mapping the
dirty amount reported is 1G, but under a 4K mapping the dirty amount is
just 4K.

This splits a block descriptor into a span of page descriptors. The BBML1 or
BBML2 feature is required.

Splitting blocks is designed to be used only by dirty log tracking, which
does not run concurrently with other pgtable ops that access the underlying
page table, so no race condition exists.
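
For reference, the BBML1 sequence realized in this patch is roughly the
following (helper names here are illustrative):

        /* 1. pre-fill a new table whose entries cover the same output range */
        tablep = alloc_table();
        /* 2. rewrite the old block descriptor with the nT bit set and flush */
        *ptep = blk_pte | ARM_LPAE_PTE_NT;
        tlb_flush_walk(iova, block_size);
        /* 3. install the table descriptor in place of the block */
        *ptep = table_desc(tablep);

With BBML2 the nT step is unnecessary and the table descriptor can replace
the block directly.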

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/io-pgtable-arm.c | 122 +
 include/linux/io-pgtable.h |   2 +
 2 files changed, 124 insertions(+)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 94d790b8ed27..664a9548b199 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -79,6 +79,8 @@
 #define ARM_LPAE_PTE_SH_IS (((arm_lpae_iopte)3) << 8)
 #define ARM_LPAE_PTE_NS(((arm_lpae_iopte)1) << 5)
 #define ARM_LPAE_PTE_VALID (((arm_lpae_iopte)1) << 0)
+/* Block descriptor bits */
+#define ARM_LPAE_PTE_NT(((arm_lpae_iopte)1) << 16)
 
 #define ARM_LPAE_PTE_ATTR_LO_MASK  (((arm_lpae_iopte)0x3ff) << 2)
 /* Ignore the contiguous bit for block splitting */
@@ -679,6 +681,125 @@ static phys_addr_t arm_lpae_iova_to_phys(struct 
io_pgtable_ops *ops,
return iopte_to_paddr(pte, data) | iova;
 }
 
+static size_t __arm_lpae_split_block(struct arm_lpae_io_pgtable *data,
+unsigned long iova, size_t size, int lvl,
+arm_lpae_iopte *ptep);
+
+static size_t arm_lpae_do_split_blk(struct arm_lpae_io_pgtable *data,
+   unsigned long iova, size_t size,
+   arm_lpae_iopte blk_pte, int lvl,
+   arm_lpae_iopte *ptep)
+{
+   struct io_pgtable_cfg *cfg = &data->iop.cfg;
+   arm_lpae_iopte pte, *tablep;
+   phys_addr_t blk_paddr;
+   size_t tablesz = ARM_LPAE_GRANULE(data);
+   size_t split_sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
+   int i;
+
+   if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
+   return 0;
+
+   tablep = __arm_lpae_alloc_pages(tablesz, GFP_ATOMIC, cfg);
+   if (!tablep)
+   return 0;
+
+   blk_paddr = iopte_to_paddr(blk_pte, data);
+   pte = iopte_prot(blk_pte);
+   for (i = 0; i < tablesz / sizeof(pte); i++, blk_paddr += split_sz)
+   __arm_lpae_init_pte(data, blk_paddr, pte, lvl, &tablep[i]);
+
+   if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_BBML1) {
+   /* Race does not exist */
+   blk_pte |= ARM_LPAE_PTE_NT;
+   __arm_lpae_set_pte(ptep, blk_pte, cfg);
+   io_pgtable_tlb_flush_walk(&data->iop, iova, size, size);
+   }
+   /* Race does not exist */
+   pte = arm_lpae_install_table(tablep, ptep, blk_pte, cfg);
+
+   /* Have we split it down to page granularity? */
+   if (lvl == (ARM_LPAE_MAX_LEVELS - 1))
+   return size;
+
+   /* Go back to lvl - 1 */
+   ptep -= ARM_LPAE_LVL_IDX(iova, lvl - 1, data);
+   return __arm_lpae_split_block(data, iova, size, lvl - 1, ptep);
+}
+
+static size_t __arm_lpae_split_block(struct arm_lpae_io_pgtable *data,
+unsigned long iova, size_t size, int lvl,
+arm_lpae_iopte *ptep)
+{
+   arm_lpae_iopte pte;
+   struct io_pgtable *iop = &data->iop;
+   size_t base, next_size, total_size;
+
+   if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
+   return 0;
+
+   ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
+   pte = READ_ONCE(*ptep);
+   if (WARN_ON(!pte))
+   return 0;
+
+   if (size == ARM_LPAE_BLOCK_SIZE(lvl, data)) {
+   if (iopte_leaf(pte, lvl, iop->fmt)) {
+   if (lvl == (ARM_LPAE_MAX_LEVELS - 1) ||
+   (pte & ARM_LPAE_PTE_AP_RDONLY))
+   return size;
+
+   /* We find a writable block, split it. */
+   return arm_lpae_do_split_blk(data, iova, size, pte,
+   lvl + 1, ptep);
+   } else {
+   /* If it is the last table level, then nothing to do */
+   if (lvl == (ARM_LPAE_MAX_LEVELS - 2))
+   return size;
+
+   total_size = 0;
+   next_size = ARM_LPAE_BLOCK_SIZE(lvl + 1, data);
+   ptep = iopte_deref(pte, data);
+   for (base = 0; base < size; base += next_size)
+   total_size += __arm_lpae_split_block(data,
+   

Re: [PATCH v3 02/12] iommu: Add iommu_split_block interface

2021-04-20 Thread Keqian Zhu
Hi Baolu,

Cheers for the your quick reply.

On 2021/4/20 10:09, Lu Baolu wrote:
> Hi Keqian,
> 
> On 4/20/21 9:25 AM, Keqian Zhu wrote:
>> Hi Baolu,
>>
>> On 2021/4/19 21:33, Lu Baolu wrote:
>>> Hi Keqian,
>>>
>>> On 2021/4/19 17:32, Keqian Zhu wrote:
>>>>>> +EXPORT_SYMBOL_GPL(iommu_split_block);
>>>>> Do you really have any consumers of this interface other than the dirty
>>>>> bit tracking? If not, I don't suggest to make this as a generic IOMMU
>>>>> interface.
>>>>>
>>>>> There is an implicit requirement for such interfaces. The
>>>>> iommu_map/unmap(iova, size) shouldn't be called at the same time.
>>>>> Currently there's no such sanity check in the iommu core. A poorly
>>>>> written driver could mess up the kernel by misusing this interface.
>>>> Yes, I don't think up a scenario except dirty tracking.
>>>>
>>>> Indeed, we'd better not make them as a generic interface.
>>>>
>>>> Do you have any suggestion that underlying iommu drivers can share these 
>>>> code but
>>>> not make it as a generic iommu interface?
>>>>
>>>> I have a not so good idea. Make the "split" interfaces as a static 
>>>> function, and
>>>> transfer the function pointer to start_dirty_log. But it looks weird and 
>>>> inflexible.
>>>
>>> I understand splitting/merging super pages is an optimization, but not a
>>> functional requirement. So is it possible to let the vendor iommu driver
>>> decide whether splitting super pages when starting dirty bit tracking
>>> and the opposite operation during when stopping it? The requirement for
>> Right. If I understand you correct, actually that is what this series does.
> 
> I mean to say no generic APIs, jut do it by the iommu subsystem itself.
> It's totally transparent to the upper level, just like what map() does.
> The upper layer doesn't care about either super page or small page is
> in use when do a mapping, right?
> 
> If you want to consolidate some code, how about putting them in
> start/stop_tracking()?

Yep, this reminds me. What we want to reuse is the "chunk by chunk" logic in
split(). We can implement switch_dirty_log to be "chunk by chunk" too (just the
same as sync/clear), then the vendor iommu driver can invoke its own private
implementation of split(). So we can completely remove split() from the IOMMU
core layer.

example code logic

iommu.c:
switch_dirty_log(big range) {
        for_each_iommu_page(big range) {
                ops->switch_dirty_log(iommu_pgsize)
        }
}

vendor iommu driver:
switch_dirty_log(iommu_pgsize) {
        if (enable) {
                ops->split_block(iommu_pgsize)
                /* And other actions, such as enable hardware capability */
        } else {
                for_each_continuous_physical_address(iommu_pgsize)
                        ops->merge_page()
        }
}

Besides, the vendor iommu driver can invoke split() in clear_dirty_log instead
of in switch_dirty_log. The benefit is that we usually clear the dirty log
gradually during dirty tracking, so we can also split large page mappings
gradually, which speeds up start_dirty_log and has less side effect on DMA
performance.
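
A rough sketch of that lazy variant (function names below are illustrative
only, not real interfaces):

        vendor_clear_dirty_log(domain, iova, size, bitmap)
        {
                for_each_dirty_range(bitmap, iova, size) {
                        /* split only the block mappings that were dirtied */
                        if (range_mapped_by_block(domain, range))
                                split_block(domain, range);
                        /* then write-protect the leaf TTDs again (set AP[2]) */
                        write_protect_range(domain, range);
                }
        }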

Does it look good to you?

Thanks,
Keqian


Re: [PATCH v3 02/12] iommu: Add iommu_split_block interface

2021-04-19 Thread Keqian Zhu
Hi Baolu,

On 2021/4/19 21:33, Lu Baolu wrote:
> Hi Keqian,
> 
> On 2021/4/19 17:32, Keqian Zhu wrote:
>>>> +EXPORT_SYMBOL_GPL(iommu_split_block);
>>> Do you really have any consumers of this interface other than the dirty
>>> bit tracking? If not, I don't suggest to make this as a generic IOMMU
>>> interface.
>>>
>>> There is an implicit requirement for such interfaces. The
>>> iommu_map/unmap(iova, size) shouldn't be called at the same time.
>>> Currently there's no such sanity check in the iommu core. A poorly
>>> written driver could mess up the kernel by misusing this interface.
>> Yes, I don't think up a scenario except dirty tracking.
>>
>> Indeed, we'd better not make them as a generic interface.
>>
>> Do you have any suggestion that underlying iommu drivers can share these 
>> code but
>> not make it as a generic iommu interface?
>>
>> I have a not so good idea. Make the "split" interfaces as a static function, 
>> and
>> transfer the function pointer to start_dirty_log. But it looks weird and 
>> inflexible.
> 
> I understand splitting/merging super pages is an optimization, but not a
> functional requirement. So is it possible to let the vendor iommu driver
> decide whether splitting super pages when starting dirty bit tracking
> and the opposite operation during when stopping it? The requirement for
Right. If I understand you correctly, that is actually what this series does.
We realized split/merge in the IOMMU core layer, but don't force the vendor
driver to use it.

The problem is that when we expose these interfaces to the vendor IOMMU driver,
we will also expose them to upper-level drivers.

> upper layer is that starting/stopping dirty bit tracking and
> mapping/unmapping are mutually exclusive.
OK, I will explicitly add the hints. Thanks.

Thanks,
Keqian
> 
>>
>> On the other hand, if a driver calls map/unmap with split/merge at the same 
>> time,
>> it's a bug of driver, it should follow the rule.
>>
> 
> Best regards,
> baolu
> .
> 


Re: [PATCH v3 02/12] iommu: Add iommu_split_block interface

2021-04-19 Thread Keqian Zhu
Hi Baolu,

On 2021/4/14 15:14, Lu Baolu wrote:
> On 4/13/21 4:54 PM, Keqian Zhu wrote:
>> Block(largepage) mapping is not a proper granule for dirty log tracking.
>> Take an extreme example, if DMA writes one byte, under 1G mapping, the
>> dirty amount reported is 1G, but under 4K mapping, the dirty amount is
>> just 4K.
>>
>> This adds a new interface named iommu_split_block in IOMMU base layer.
>> A specific IOMMU driver can invoke it during start dirty log. If so, the
>> driver also need to realize the split_block iommu ops.
>>
>> We flush all iotlbs after the whole procedure is completed to ease the
>> pressure of IOMMU, as we will hanle a huge range of mapping in general.
>>
>> Signed-off-by: Keqian Zhu 
>> Signed-off-by: Kunkun Jiang 
>> ---
>>   drivers/iommu/iommu.c | 41 +
>>   include/linux/iommu.h | 11 +++
>>   2 files changed, 52 insertions(+)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index 667b2d6d2fc0..bb413a927870 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -2721,6 +2721,47 @@ int iommu_domain_set_attr(struct iommu_domain *domain,
>>   }
>>   EXPORT_SYMBOL_GPL(iommu_domain_set_attr);
>>   +int iommu_split_block(struct iommu_domain *domain, unsigned long iova,
>> +  size_t size)
>> +{
>> +const struct iommu_ops *ops = domain->ops;
>> +unsigned int min_pagesz;
>> +size_t pgsize;
>> +bool flush = false;
>> +int ret = 0;
>> +
>> +if (unlikely(!ops || !ops->split_block))
>> +return -ENODEV;
>> +
>> +min_pagesz = 1 << __ffs(domain->pgsize_bitmap);
>> +if (!IS_ALIGNED(iova | size, min_pagesz)) {
>> +pr_err("unaligned: iova 0x%lx size 0x%zx min_pagesz 0x%x\n",
>> +   iova, size, min_pagesz);
>> +return -EINVAL;
>> +}
>> +
>> +while (size) {
>> +flush = true;
>> +
>> +pgsize = iommu_pgsize(domain, iova, size);
>> +
>> +ret = ops->split_block(domain, iova, pgsize);
>> +if (ret)
>> +break;
>> +
>> +pr_debug("split handled: iova 0x%lx size 0x%zx\n", iova, pgsize);
>> +
>> +iova += pgsize;
>> +size -= pgsize;
>> +}
>> +
>> +if (flush)
>> +iommu_flush_iotlb_all(domain);
>> +
>> +return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_split_block);
> 
> Do you really have any consumers of this interface other than the dirty
> bit tracking? If not, I don't suggest to make this as a generic IOMMU
> interface.
> 
> There is an implicit requirement for such interfaces. The
> iommu_map/unmap(iova, size) shouldn't be called at the same time.
> Currently there's no such sanity check in the iommu core. A poorly
> written driver could mess up the kernel by misusing this interface.

Yes, I can't think of a scenario other than dirty tracking.

Indeed, we'd better not make them a generic interface.

Do you have any suggestion on how underlying iommu drivers can share this code
without making it a generic iommu interface?

I have a not-so-good idea. Make the "split" interfaces static functions, and
pass the function pointer to start_dirty_log. But it looks weird and
inflexible.

On the other hand, if a driver calls map/unmap and split/merge at the same
time, that's a bug in the driver; it should follow the rule.

> 
> This also applies to iommu_merge_page().
>

Thanks,
Keqian


Re: [PATCH v3 01/12] iommu: Introduce dirty log tracking framework

2021-04-16 Thread Keqian Zhu
Hi Baolu,

On 2021/4/15 18:21, Lu Baolu wrote:
> Hi,
> 
> On 2021/4/15 15:43, Keqian Zhu wrote:
>>>> design it as not switchable. I will modify the commit message of patch#12, 
>>>> thanks!
>>> I am not sure that I fully get your point. But I can't see any gaps of
>>> using iommu_dev_enable/disable_feature() to switch dirty log on and off.
>>> Probably I missed anything.
>> IOMMU_DEV_FEAT_HWDBM just tells user whether underlying IOMMU driver supports
>> dirty tracking, it is not used to management the status of dirty log 
>> tracking.
>>
>> The feature reporting is per device, but the status management is per 
>> iommu_domain.
>> Only when all devices in a domain support HWDBM, we can start dirty log for 
>> the domain.
> 
> So why not
> 
> for_each_device_attached_domain()
> iommu_dev_enable_feature(IOMMU_DEV_FEAT_HWDBM)
Looks reasonable, but the problem is that we just need to enable dirty log once 
per domain.

> 
> ?
>>
>> And I think we'd better not mix the feature reporting and status management. 
>> Thoughts?
>>
> 
> I am worrying about having two sets of APIs for single purpose. From
> vendor iommu driver's point of view, this feature is per device. Hence,
> it still needs to do the same thing.
Yes, we can unify the granularity of feature reporting and status management.

The basic granule of dirty tracking is the iommu_domain, which I think is very
reasonable. We need an interface to report the feature of an iommu_domain, then
the logic is much clearer.

Every time we add a device to or remove a device from the domain, we should
update the feature (e.g., maintain a counter of unsupported devices); a rough
sketch follows.
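
        /* Rough sketch of the counter idea; the helper and the field below
         * are illustrative only (iommu_dev_has_feature() would need to be
         * brought back, as discussed elsewhere in this thread):
         */
        attach_dev(domain, dev):
                if (!iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_HWDBM))
                        domain->num_non_hwdbm_devs++;

        detach_dev(domain, dev):
                if (!iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_HWDBM))
                        domain->num_non_hwdbm_devs--;

        domain_supports_hwdbm(domain):
                return domain->num_non_hwdbm_devs == 0;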

What do you think about this idea?

Thanks,
Keqian


Re: [PATCH v3 01/12] iommu: Introduce dirty log tracking framework

2021-04-15 Thread Keqian Zhu



On 2021/4/15 15:03, Lu Baolu wrote:
> On 4/15/21 2:18 PM, Keqian Zhu wrote:
>> Hi Baolu,
>>
>> Thanks for the review!
>>
>> On 2021/4/14 15:00, Lu Baolu wrote:
>>> Hi Keqian,
>>>
>>> On 4/13/21 4:54 PM, Keqian Zhu wrote:
>>>> Some types of IOMMU are capable of tracking DMA dirty log, such as
>>>> ARM SMMU with HTTU or Intel IOMMU with SLADE. This introduces the
>>>> dirty log tracking framework in the IOMMU base layer.
>>>>
>>>> Three new essential interfaces are added, and we maintaince the status
>>>> of dirty log tracking in iommu_domain.
>>>> 1. iommu_switch_dirty_log: Perform actions to start|stop dirty log tracking
>>>> 2. iommu_sync_dirty_log: Sync dirty log from IOMMU into a dirty bitmap
>>>> 3. iommu_clear_dirty_log: Clear dirty log of IOMMU by a mask bitmap
>>>>
>>>> A new dev feature are added to indicate whether a specific type of
>>>> iommu hardware supports and its driver realizes them.
>>>>
>>>> Signed-off-by: Keqian Zhu 
>>>> Signed-off-by: Kunkun Jiang 
>>>> ---
>>>>drivers/iommu/iommu.c | 150 ++
>>>>include/linux/iommu.h |  53 +++
>>>>2 files changed, 203 insertions(+)
>>>>
>>>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>>>> index d0b0a15dba84..667b2d6d2fc0 100644
>>>> --- a/drivers/iommu/iommu.c
>>>> +++ b/drivers/iommu/iommu.c
>>>> @@ -1922,6 +1922,7 @@ static struct iommu_domain 
>>>> *__iommu_domain_alloc(struct bus_type *bus,
>>>>domain->type = type;
>>>>/* Assume all sizes by default; the driver may override this later 
>>>> */
>>>>domain->pgsize_bitmap  = bus->iommu_ops->pgsize_bitmap;
>>>> +mutex_init(&domain->switch_log_lock);
>>>>  return domain;
>>>>}
>>>> @@ -2720,6 +2721,155 @@ int iommu_domain_set_attr(struct iommu_domain 
>>>> *domain,
>>>>}
>>>>EXPORT_SYMBOL_GPL(iommu_domain_set_attr);
>>>>+int iommu_switch_dirty_log(struct iommu_domain *domain, bool enable,
>>>> +   unsigned long iova, size_t size, int prot)
>>>> +{
>>>> +const struct iommu_ops *ops = domain->ops;
>>>> +int ret;
>>>> +
>>>> +if (unlikely(!ops || !ops->switch_dirty_log))
>>>> +return -ENODEV;
>>>> +
>>>> +mutex_lock(&domain->switch_log_lock);
>>>> +if (enable && domain->dirty_log_tracking) {
>>>> +ret = -EBUSY;
>>>> +goto out;
>>>> +} else if (!enable && !domain->dirty_log_tracking) {
>>>> +ret = -EINVAL;
>>>> +goto out;
>>>> +}
>>>> +
>>>> +ret = ops->switch_dirty_log(domain, enable, iova, size, prot);
>>>> +if (ret)
>>>> +goto out;
>>>> +
>>>> +domain->dirty_log_tracking = enable;
>>>> +out:
>>>> +mutex_unlock(&domain->switch_log_lock);
>>>> +return ret;
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(iommu_switch_dirty_log);
>>>
>>> Since you also added IOMMU_DEV_FEAT_HWDBM, I am wondering what's the
>>> difference between
>>>
>>> iommu_switch_dirty_log(on) vs. 
>>> iommu_dev_enable_feature(IOMMU_DEV_FEAT_HWDBM)
>>>
>>> iommu_switch_dirty_log(off) vs. 
>>> iommu_dev_disable_feature(IOMMU_DEV_FEAT_HWDBM)
>> Indeed. As I can see, IOMMU_DEV_FEAT_AUX is not switchable, so enable/disable
>> are not applicable for it. IOMMU_DEV_FEAT_SVA is switchable, so we can use 
>> these
>> interfaces for it.
>>
>> IOMMU_DEV_FEAT_HWDBM is used to indicate whether hardware supports HWDBM, so 
>> we should
> 
> Previously we had iommu_dev_has_feature() and then was cleaned up due to
> lack of real users. If you have a real case for it, why not bringing it
> back?
Yep, good suggestion.

> 
>> design it as not switchable. I will modify the commit message of patch#12, 
>> thanks!
> 
> I am not sure that I fully get your point. But I can't see any gaps of
> using iommu_dev_enable/disable_feature() to switch dirty log on and off.
> Probably I missed anything.
IOMMU_DEV_FEAT_HWDBM just tells user whether underlying IOMMU driver supports
dirty tracking, 

Re: [PATCH v3 01/12] iommu: Introduce dirty log tracking framework

2021-04-15 Thread Keqian Zhu
Hi Baolu,

Thanks for the review!

On 2021/4/14 15:00, Lu Baolu wrote:
> Hi Keqian,
> 
> On 4/13/21 4:54 PM, Keqian Zhu wrote:
>> Some types of IOMMU are capable of tracking DMA dirty log, such as
>> ARM SMMU with HTTU or Intel IOMMU with SLADE. This introduces the
>> dirty log tracking framework in the IOMMU base layer.
>>
>> Three new essential interfaces are added, and we maintaince the status
>> of dirty log tracking in iommu_domain.
>> 1. iommu_switch_dirty_log: Perform actions to start|stop dirty log tracking
>> 2. iommu_sync_dirty_log: Sync dirty log from IOMMU into a dirty bitmap
>> 3. iommu_clear_dirty_log: Clear dirty log of IOMMU by a mask bitmap
>>
>> A new dev feature are added to indicate whether a specific type of
>> iommu hardware supports and its driver realizes them.
>>
>> Signed-off-by: Keqian Zhu 
>> Signed-off-by: Kunkun Jiang 
>> ---
>>   drivers/iommu/iommu.c | 150 ++
>>   include/linux/iommu.h |  53 +++
>>   2 files changed, 203 insertions(+)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index d0b0a15dba84..667b2d6d2fc0 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -1922,6 +1922,7 @@ static struct iommu_domain 
>> *__iommu_domain_alloc(struct bus_type *bus,
>>   domain->type = type;
>>   /* Assume all sizes by default; the driver may override this later */
>>   domain->pgsize_bitmap  = bus->iommu_ops->pgsize_bitmap;
>> +mutex_init(&domain->switch_log_lock);
>> return domain;
>>   }
>> @@ -2720,6 +2721,155 @@ int iommu_domain_set_attr(struct iommu_domain 
>> *domain,
>>   }
>>   EXPORT_SYMBOL_GPL(iommu_domain_set_attr);
>>   +int iommu_switch_dirty_log(struct iommu_domain *domain, bool enable,
>> +   unsigned long iova, size_t size, int prot)
>> +{
>> +const struct iommu_ops *ops = domain->ops;
>> +int ret;
>> +
>> +if (unlikely(!ops || !ops->switch_dirty_log))
>> +return -ENODEV;
>> +
>> +mutex_lock(&domain->switch_log_lock);
>> +if (enable && domain->dirty_log_tracking) {
>> +ret = -EBUSY;
>> +goto out;
>> +} else if (!enable && !domain->dirty_log_tracking) {
>> +ret = -EINVAL;
>> +goto out;
>> +}
>> +
>> +ret = ops->switch_dirty_log(domain, enable, iova, size, prot);
>> +if (ret)
>> +goto out;
>> +
>> +domain->dirty_log_tracking = enable;
>> +out:
>> +mutex_unlock(&domain->switch_log_lock);
>> +return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_switch_dirty_log);
> 
> Since you also added IOMMU_DEV_FEAT_HWDBM, I am wondering what's the
> difference between
> 
> iommu_switch_dirty_log(on) vs. iommu_dev_enable_feature(IOMMU_DEV_FEAT_HWDBM)
> 
> iommu_switch_dirty_log(off) vs. 
> iommu_dev_disable_feature(IOMMU_DEV_FEAT_HWDBM)
Indeed. As I can see, IOMMU_DEV_FEAT_AUX is not switchable, so enable/disable
are not applicable for it. IOMMU_DEV_FEAT_SVA is switchable, so we can use these
interfaces for it.

IOMMU_DEV_FEAT_HWDBM is used to indicate whether hardware supports HWDBM, so we 
should
design it as not switchable. I will modify the commit message of patch#12, 
thanks!

> 
>> +
>> +int iommu_sync_dirty_log(struct iommu_domain *domain, unsigned long iova,
>> + size_t size, unsigned long *bitmap,
>> + unsigned long base_iova, unsigned long bitmap_pgshift)
>> +{
>> +const struct iommu_ops *ops = domain->ops;
>> +unsigned int min_pagesz;
>> +size_t pgsize;
>> +int ret = 0;
>> +
>> +if (unlikely(!ops || !ops->sync_dirty_log))
>> +return -ENODEV;
>> +
>> +min_pagesz = 1 << __ffs(domain->pgsize_bitmap);
>> +if (!IS_ALIGNED(iova | size, min_pagesz)) {
>> +pr_err("unaligned: iova 0x%lx size 0x%zx min_pagesz 0x%x\n",
>> +   iova, size, min_pagesz);
>> +return -EINVAL;
>> +}
>> +
>> +mutex_lock(&domain->switch_log_lock);
>> +if (!domain->dirty_log_tracking) {
>> +ret = -EINVAL;
>> +goto out;
>> +}
>> +
>> +while (size) {
>> +pgsize = iommu_pgsize(domain, iova, size);
>> +
>> +ret = ops->sync_dirty_log(domain, iova, pgsize,
>> +  bitmap, base_iova, bitmap_pgshift);
> 
> Any reason why do you want to do this in a per-4K page manner? This can
> lead to a lot of indirect calls and bad performance.
> 
> How about a sync_dirty_pages()?
The function name of iommu_pgsize() is a bit puzzling. It actually tries to
compute the max page size that fits into 'size', so the pgsize can be a large
page size even if the underlying mapping is 4K. __iommu_unmap() also has
similar logic.
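
For reference, the existing helper is roughly the following (simplified and
quoted from memory, so treat it as a sketch rather than the exact code):

        static size_t iommu_pgsize(struct iommu_domain *domain,
                                   unsigned long addr_merge, size_t size)
        {
                /* max page size that still fits into 'size' */
                unsigned int pgsize_idx = __fls(size);
                size_t pgsize;

                /* constrain it further by the alignment of the address */
                if (addr_merge)
                        pgsize_idx = min_t(unsigned int, pgsize_idx,
                                           __ffs(addr_merge));

                /* mask of acceptable sizes, limited to what the HW supports */
                pgsize = ((1UL << (pgsize_idx + 1)) - 1) & domain->pgsize_bitmap;

                /* pick the biggest remaining page size */
                return 1UL << __fls(pgsize);
        }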

BRs,
Keqian


[PATCH v3 10/12] iommu/arm-smmu-v3: Realize sync_dirty_log iommu ops

2021-04-13 Thread Keqian Zhu
From: Kunkun Jiang 

During dirty log tracking, the user will try to retrieve the dirty log from
the iommu if it supports hardware dirty log. Scan the leaf TTDs and treat a
TTD as dirty if it is writable. As we only enable HTTU for stage 1, check
whether AP[2] is clear.
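
For illustration, a user of iommu_sync_dirty_log() might then walk the bitmap
like this (sketch only; it assumes the sync was done with base_iova == iova so
bit 0 corresponds to the first page of the range):

        unsigned long bit;

        for_each_set_bit(bit, bitmap, size >> bitmap_pgshift) {
                unsigned long dirty_iova = iova + (bit << bitmap_pgshift);
                /* copy or migrate the page backing dirty_iova ... */
        }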

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 30 +++
 drivers/iommu/io-pgtable-arm.c  | 90 +
 include/linux/io-pgtable.h  |  4 +
 3 files changed, 124 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 52c6f3e74d6f..9eb209a07acc 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2627,6 +2627,35 @@ static int arm_smmu_switch_dirty_log(struct iommu_domain 
*domain, bool enable,
return 0;
 }
 
+static int arm_smmu_sync_dirty_log(struct iommu_domain *domain,
+  unsigned long iova, size_t size,
+  unsigned long *bitmap,
+  unsigned long base_iova,
+  unsigned long bitmap_pgshift)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   if (!(smmu->features & ARM_SMMU_FEAT_HD))
+   return -ENODEV;
+   if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1)
+   return -EINVAL;
+
+   if (!ops || !ops->sync_dirty_log) {
+   pr_err("io-pgtable don't realize sync dirty log\n");
+   return -ENODEV;
+   }
+
+   /*
+* Flush iotlb to ensure all inflight transactions are completed.
+* See doc IHI0070Da 3.13.4 "HTTU behavior summary".
+*/
+   arm_smmu_flush_iotlb_all(domain);
+   return ops->sync_dirty_log(ops, iova, size, bitmap, base_iova,
+  bitmap_pgshift);
+}
+
 static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
 {
return iommu_fwspec_add_ids(dev, args->args, 1);
@@ -2729,6 +2758,7 @@ static struct iommu_ops arm_smmu_ops = {
.split_block= arm_smmu_split_block,
.merge_page = arm_smmu_merge_page,
.switch_dirty_log   = arm_smmu_switch_dirty_log,
+   .sync_dirty_log = arm_smmu_sync_dirty_log,
.of_xlate   = arm_smmu_of_xlate,
.get_resv_regions   = arm_smmu_get_resv_regions,
.put_resv_regions   = generic_iommu_put_resv_regions,
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 9028328b99b0..67a208a05ab2 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -877,6 +877,95 @@ static size_t arm_lpae_merge_page(struct io_pgtable_ops 
*ops, unsigned long iova
return __arm_lpae_merge_page(data, iova, paddr, size, lvl, ptep, prot);
 }
 
+static int __arm_lpae_sync_dirty_log(struct arm_lpae_io_pgtable *data,
+unsigned long iova, size_t size,
+int lvl, arm_lpae_iopte *ptep,
+unsigned long *bitmap,
+unsigned long base_iova,
+unsigned long bitmap_pgshift)
+{
+   arm_lpae_iopte pte;
+   struct io_pgtable *iop = &data->iop;
+   size_t base, next_size;
+   unsigned long offset;
+   int nbits, ret;
+
+   if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
+   return -EINVAL;
+
+   ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
+   pte = READ_ONCE(*ptep);
+   if (WARN_ON(!pte))
+   return -EINVAL;
+
+   if (size == ARM_LPAE_BLOCK_SIZE(lvl, data)) {
+   if (iopte_leaf(pte, lvl, iop->fmt)) {
+   if (pte & ARM_LPAE_PTE_AP_RDONLY)
+   return 0;
+
+   /* It is writable, set the bitmap */
+   nbits = size >> bitmap_pgshift;
+   offset = (iova - base_iova) >> bitmap_pgshift;
+   bitmap_set(bitmap, offset, nbits);
+   return 0;
+   } else {
+   /* To traverse next level */
+   next_size = ARM_LPAE_BLOCK_SIZE(lvl + 1, data);
+   ptep = iopte_deref(pte, data);
+   for (base = 0; base < size; base += next_size) {
+   ret = __arm_lpae_sync_dirty_log(data,
+   iova + base, next_size, lvl + 1,
+   ptep, bitmap, base_iova, 
bitmap_pgshift);
+

[PATCH v3 12/12] iommu/arm-smmu-v3: Add HWDBM device feature reporting

2021-04-13 Thread Keqian Zhu
From: Kunkun Jiang 

We have implemented the interfaces required to support iommu
dirty log tracking. The last step is reporting this feature to
the upper user, so the user can apply higher-level policies based on it.

There is a new dev feature named IOMMU_DEV_FEAT_HWDBM in the iommu
layer. For arm smmuv3, it is equal to ARM_SMMU_FEAT_HD and it is
enabled by default if supported. Other types of IOMMU can enable
it by default or when the user invokes enable_feature.

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 59bb1d198631..2d716ee5621f 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2712,6 +2712,9 @@ static bool arm_smmu_dev_has_feature(struct device *dev,
switch (feat) {
case IOMMU_DEV_FEAT_SVA:
return arm_smmu_master_sva_supported(master);
+   case IOMMU_DEV_FEAT_HWDBM:
+   /* No requirement for device, require HTTU HD of smmu */
+   return !!(master->smmu->features & ARM_SMMU_FEAT_HD);
default:
return false;
}
@@ -2728,6 +2731,9 @@ static bool arm_smmu_dev_feature_enabled(struct device 
*dev,
switch (feat) {
case IOMMU_DEV_FEAT_SVA:
return arm_smmu_master_sva_enabled(master);
+   case IOMMU_DEV_FEAT_HWDBM:
+   /* HTTU HD is enabled if supported */
+   return arm_smmu_dev_has_feature(dev, feat);
default:
return false;
}
-- 
2.19.1



[PATCH v3 11/12] iommu/arm-smmu-v3: Realize clear_dirty_log iommu ops

2021-04-13 Thread Keqian Zhu
From: Kunkun Jiang 

After the dirty log is retrieved, the user should clear it to re-enable
dirty log tracking for the dirtied pages. This clears the dirty state (as we
only enable HTTU for stage 1, this means setting the AP[2] bit) of the leaf
TTDs that are specified by the user-provided bitmap.
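
For a single stage 1 TTD this gives the following cycle (sketch, with HTTU_HD
enabled and the DBM bit set):

        writable_dirty  (DBM=1, AP[2]=0)  -- reported as dirty by sync
              | clear_dirty_log sets AP[2]
              v
        writable_clean  (DBM=1, AP[2]=1)  -- writes are not faulted; on the
              | hardware clears AP[2]        next DMA write the SMMU updates
              v                              the TTD itself
        writable_dirty  (DBM=1, AP[2]=0)  -- will be reported again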

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 25 ++
 drivers/iommu/io-pgtable-arm.c  | 95 +
 include/linux/io-pgtable.h  |  4 +
 3 files changed, 124 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 9eb209a07acc..59bb1d198631 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2656,6 +2656,30 @@ static int arm_smmu_sync_dirty_log(struct iommu_domain 
*domain,
   bitmap_pgshift);
 }
 
+static int arm_smmu_clear_dirty_log(struct iommu_domain *domain,
+   unsigned long iova, size_t size,
+   unsigned long *bitmap,
+   unsigned long base_iova,
+   unsigned long bitmap_pgshift)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   if (!(smmu->features & ARM_SMMU_FEAT_HD))
+   return -ENODEV;
+   if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1)
+   return -EINVAL;
+
+   if (!ops || !ops->clear_dirty_log) {
+   pr_err("io-pgtable don't realize clear dirty log\n");
+   return -ENODEV;
+   }
+
+   return ops->clear_dirty_log(ops, iova, size, bitmap, base_iova,
+   bitmap_pgshift);
+}
+
 static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
 {
return iommu_fwspec_add_ids(dev, args->args, 1);
@@ -2759,6 +2783,7 @@ static struct iommu_ops arm_smmu_ops = {
.merge_page = arm_smmu_merge_page,
.switch_dirty_log   = arm_smmu_switch_dirty_log,
.sync_dirty_log = arm_smmu_sync_dirty_log,
+   .clear_dirty_log= arm_smmu_clear_dirty_log,
.of_xlate   = arm_smmu_of_xlate,
.get_resv_regions   = arm_smmu_get_resv_regions,
.put_resv_regions   = generic_iommu_put_resv_regions,
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 67a208a05ab2..e3ef0f50611c 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -966,6 +966,100 @@ static int arm_lpae_sync_dirty_log(struct io_pgtable_ops 
*ops,
 bitmap, base_iova, bitmap_pgshift);
 }
 
+static int __arm_lpae_clear_dirty_log(struct arm_lpae_io_pgtable *data,
+ unsigned long iova, size_t size,
+ int lvl, arm_lpae_iopte *ptep,
+ unsigned long *bitmap,
+ unsigned long base_iova,
+ unsigned long bitmap_pgshift)
+{
+   arm_lpae_iopte pte;
+   struct io_pgtable *iop = &data->iop;
+   unsigned long offset;
+   size_t base, next_size;
+   int nbits, ret, i;
+
+   if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
+   return -EINVAL;
+
+   ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
+   pte = READ_ONCE(*ptep);
+   if (WARN_ON(!pte))
+   return -EINVAL;
+
+   if (size == ARM_LPAE_BLOCK_SIZE(lvl, data)) {
+   if (iopte_leaf(pte, lvl, iop->fmt)) {
+   if (pte & ARM_LPAE_PTE_AP_RDONLY)
+   return 0;
+
+   /* Ensure all corresponding bits are set */
+   nbits = size >> bitmap_pgshift;
+   offset = (iova - base_iova) >> bitmap_pgshift;
+   for (i = offset; i < offset + nbits; i++) {
+   if (!test_bit(i, bitmap))
+   return 0;
+   }
+
+   /* Race does not exist */
+   pte |= ARM_LPAE_PTE_AP_RDONLY;
+   __arm_lpae_set_pte(ptep, pte, &iop->cfg);
+   return 0;
+   } else {
+   /* To traverse next level */
+   next_size = ARM_LPAE_BLOCK_SIZE(lvl + 1, data);
+   ptep = iopte_deref(pte, data);
+   for (base = 0; base < size; base += next_size) {
+   ret = __arm_lpae_clear_dirty_log(data,
+   iova + base, next_s

[PATCH v3 09/12] iommu/arm-smmu-v3: Realize switch_dirty_log iommu ops

2021-04-13 Thread Keqian Zhu
From: Kunkun Jiang 

This realizes switch_dirty_log by invoking iommu_split_block() and
iommu_merge_page(). HTTU HD feature is required.

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 38 +
 1 file changed, 38 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 4d8495d88be2..52c6f3e74d6f 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2590,6 +2590,43 @@ static int arm_smmu_merge_page(struct iommu_domain 
*domain,
return 0;
 }
 
+static int arm_smmu_switch_dirty_log(struct iommu_domain *domain, bool enable,
+unsigned long iova, size_t size, int prot)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   if (!(smmu->features & ARM_SMMU_FEAT_HD))
+   return -ENODEV;
+   if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1)
+   return -EINVAL;
+
+   if (enable) {
+   /*
+* For SMMU, the hardware dirty management is always enabled if
+* hardware supports HTTU HD. The action to start dirty log is
+* splitting block mappings.
+*
+* We don't return an error even if the split operation fails, as we
+* can still track dirty at block granularity, which is still a much
+* better choice compared to the full dirty policy.
+*/
+   iommu_split_block(domain, iova, size);
+   } else {
+   /*
+* For SMMU, the hardware dirty management is always enabled if
+* hardware supports HTTU HD. The action to stop dirty log is
+* merging page mappings.
+*
+* We don't return an error even if the merge operation fails, as it
+* only affects the performance of DMA transactions.
+*/
+   iommu_merge_page(domain, iova, size, prot);
+   }
+
+   return 0;
+}
+
 static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
 {
return iommu_fwspec_add_ids(dev, args->args, 1);
@@ -2691,6 +2728,7 @@ static struct iommu_ops arm_smmu_ops = {
.domain_set_attr= arm_smmu_domain_set_attr,
.split_block= arm_smmu_split_block,
.merge_page = arm_smmu_merge_page,
+   .switch_dirty_log   = arm_smmu_switch_dirty_log,
.of_xlate   = arm_smmu_of_xlate,
.get_resv_regions   = arm_smmu_get_resv_regions,
.put_resv_regions   = generic_iommu_put_resv_regions,
-- 
2.19.1



[PATCH v3 08/12] iommu/arm-smmu-v3: Realize merge_page iommu ops

2021-04-13 Thread Keqian Zhu
From: Kunkun Jiang 

This reinstalls block mappings and unmaps the span of page mappings.
The BBML1 or BBML2 feature is required.

Merging pages does not run concurrently with other pgtable ops, as the only
intended user is vfio, which always holds a lock, so race conditions are not
considered in the pgtable ops.
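
For reference, the BBML1 merge sequence realized in this patch is roughly the
reverse of the split sequence (helper names here are illustrative):

        /* 1. replace the table descriptor with the block, nT set, and flush */
        *ptep = block_desc | ARM_LPAE_PTE_NT;
        tlb_flush_walk(iova, block_size);
        /* 2. rewrite the block descriptor with nT cleared */
        *ptep = block_desc;
        /* 3. the old next-level table can now be freed */
        free_table(old_table);

With BBML2 the block descriptor is written once, without the nT step.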

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 28 
 drivers/iommu/io-pgtable-arm.c  | 78 +
 include/linux/io-pgtable.h  |  2 +
 3 files changed, 108 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index cfa83fa03c89..4d8495d88be2 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2563,6 +2563,33 @@ static int arm_smmu_split_block(struct iommu_domain 
*domain,
return 0;
 }
 
+static int arm_smmu_merge_page(struct iommu_domain *domain,
+  unsigned long iova, phys_addr_t paddr,
+  size_t size, int prot)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+   struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
+   size_t handled_size;
+
+   if (!(smmu->features & (ARM_SMMU_FEAT_BBML1 | ARM_SMMU_FEAT_BBML2))) {
+   dev_err(smmu->dev, "don't support BBML1/2, can't merge page\n");
+   return -ENODEV;
+   }
+   if (!ops || !ops->merge_page) {
+   pr_err("io-pgtable don't realize merge page\n");
+   return -ENODEV;
+   }
+
+   handled_size = ops->merge_page(ops, iova, paddr, size, prot);
+   if (handled_size != size) {
+   pr_err("merge page failed\n");
+   return -EFAULT;
+   }
+
+   return 0;
+}
+
 static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
 {
return iommu_fwspec_add_ids(dev, args->args, 1);
@@ -2663,6 +2690,7 @@ static struct iommu_ops arm_smmu_ops = {
.domain_get_attr= arm_smmu_domain_get_attr,
.domain_set_attr= arm_smmu_domain_set_attr,
.split_block= arm_smmu_split_block,
+   .merge_page = arm_smmu_merge_page,
.of_xlate   = arm_smmu_of_xlate,
.get_resv_regions   = arm_smmu_get_resv_regions,
.put_resv_regions   = generic_iommu_put_resv_regions,
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 4c4eec3c0698..9028328b99b0 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -800,6 +800,83 @@ static size_t arm_lpae_split_block(struct io_pgtable_ops 
*ops,
return __arm_lpae_split_block(data, iova, size, lvl, ptep);
 }
 
+static size_t __arm_lpae_merge_page(struct arm_lpae_io_pgtable *data,
+   unsigned long iova, phys_addr_t paddr,
+   size_t size, int lvl, arm_lpae_iopte *ptep,
+   arm_lpae_iopte prot)
+{
+   arm_lpae_iopte pte, *tablep;
+   struct io_pgtable *iop = &data->iop;
+   struct io_pgtable_cfg *cfg = &data->iop.cfg;
+
+   if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
+   return 0;
+
+   ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
+   pte = READ_ONCE(*ptep);
+   if (WARN_ON(!pte))
+   return 0;
+
+   if (size == ARM_LPAE_BLOCK_SIZE(lvl, data)) {
+   if (iopte_leaf(pte, lvl, iop->fmt))
+   return size;
+
+   /* Race does not exist */
+   if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_BBML1) {
+   prot |= ARM_LPAE_PTE_NT;
+   __arm_lpae_init_pte(data, paddr, prot, lvl, ptep);
+   io_pgtable_tlb_flush_walk(iop, iova, size,
+ ARM_LPAE_GRANULE(data));
+
+   prot &= ~(ARM_LPAE_PTE_NT);
+   __arm_lpae_init_pte(data, paddr, prot, lvl, ptep);
+   } else {
+   __arm_lpae_init_pte(data, paddr, prot, lvl, ptep);
+   }
+
+   tablep = iopte_deref(pte, data);
+   __arm_lpae_free_pgtable(data, lvl + 1, tablep);
+   return size;
+   } else if (iopte_leaf(pte, lvl, iop->fmt)) {
+   /* The size is too small, already merged */
+   return size;
+   }
+
+   /* Keep on walkin */
+   ptep = iopte_deref(pte, data);
+   return __arm_lpae_merge_page(data, iova, paddr, size, lvl + 1, ptep, 
prot);
+}
+
+static size_t arm_lpae_merge_page(struct io_pgtable_ops *ops, unsigned long 
iova,
+ phys_addr_t paddr, size_t size, int 
iommu_prot)
+{
+   

[PATCH v3 03/12] iommu: Add iommu_merge_page interface

2021-04-13 Thread Keqian Zhu
If block (large page) mappings are split when dirty log is started, then
when dirty log is stopped, we need to recover them for better DMA performance.

This adds a new interface named iommu_merge_page in the IOMMU base layer.
A specific IOMMU driver can invoke it when stopping dirty log. If so, the
driver also needs to realize the merge_page iommu ops.

We flush all iotlbs after the whole procedure is completed to ease the
pressure on the IOMMU, as we will handle a huge range of mappings in general.

Signed-off-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/iommu.c | 75 +++
 include/linux/iommu.h | 12 +++
 2 files changed, 87 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index bb413a927870..8f0d71bafb3a 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2762,6 +2762,81 @@ int iommu_split_block(struct iommu_domain *domain, 
unsigned long iova,
 }
 EXPORT_SYMBOL_GPL(iommu_split_block);
 
+static int __iommu_merge_page(struct iommu_domain *domain,
+ unsigned long iova, phys_addr_t paddr,
+ size_t size, int prot)
+{
+   const struct iommu_ops *ops = domain->ops;
+   unsigned int min_pagesz;
+   size_t pgsize;
+   int ret = 0;
+
+   if (unlikely(!ops || !ops->merge_page))
+   return -ENODEV;
+
+   min_pagesz = 1 << __ffs(domain->pgsize_bitmap);
+   if (!IS_ALIGNED(iova | paddr | size, min_pagesz)) {
+   pr_err("unaligned: iova 0x%lx pa %pa size 0x%zx min_pagesz 
0x%x\n",
+   iova, , size, min_pagesz);
+   return -EINVAL;
+   }
+
+   while (size) {
+   pgsize = iommu_pgsize(domain, iova | paddr, size);
+
+   ret = ops->merge_page(domain, iova, paddr, pgsize, prot);
+   if (ret)
+   break;
+
+   pr_debug("merge handled: iova 0x%lx pa %pa size 0x%zx\n",
+iova, , pgsize);
+
+   iova += pgsize;
+   paddr += pgsize;
+   size -= pgsize;
+   }
+
+   return ret;
+}
+
+int iommu_merge_page(struct iommu_domain *domain, unsigned long iova,
+size_t size, int prot)
+{
+   phys_addr_t phys;
+   dma_addr_t p, i;
+   size_t cont_size;
+   bool flush = false;
+   int ret = 0;
+
+   while (size) {
+   flush = true;
+
+   phys = iommu_iova_to_phys(domain, iova);
+   cont_size = PAGE_SIZE;
+   p = phys + cont_size;
+   i = iova + cont_size;
+
+   while (cont_size < size && p == iommu_iova_to_phys(domain, i)) {
+   p += PAGE_SIZE;
+   i += PAGE_SIZE;
+   cont_size += PAGE_SIZE;
+   }
+
+   ret = __iommu_merge_page(domain, iova, phys, cont_size, prot);
+   if (ret)
+   break;
+
+   iova += cont_size;
+   size -= cont_size;
+   }
+
+   if (flush)
+   iommu_flush_iotlb_all(domain);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_merge_page);
+
 int iommu_switch_dirty_log(struct iommu_domain *domain, bool enable,
   unsigned long iova, size_t size, int prot)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index c6c90ac069e3..fea3ecabff3d 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -209,6 +209,7 @@ struct iommu_iotlb_gather {
  * @domain_get_attr: Query domain attributes
  * @domain_set_attr: Change domain attributes
  * @split_block: Split block mapping into page mapping
+ * @merge_page: Merge page mapping into block mapping
  * @switch_dirty_log: Perform actions to start|stop dirty log tracking
  * @sync_dirty_log: Sync dirty log from IOMMU into a dirty bitmap
  * @clear_dirty_log: Clear dirty log of IOMMU by a mask bitmap
@@ -270,6 +271,8 @@ struct iommu_ops {
/* Track dirty log */
int (*split_block)(struct iommu_domain *domain, unsigned long iova,
   size_t size);
+   int (*merge_page)(struct iommu_domain *domain, unsigned long iova,
+ phys_addr_t phys, size_t size, int prot);
int (*switch_dirty_log)(struct iommu_domain *domain, bool enable,
unsigned long iova, size_t size, int prot);
int (*sync_dirty_log)(struct iommu_domain *domain,
@@ -534,6 +537,8 @@ extern int iommu_domain_set_attr(struct iommu_domain 
*domain, enum iommu_attr,
 void *data);
 extern int iommu_split_block(struct iommu_domain *domain, unsigned long iova,
 size_t size);
+extern int iommu_merge_page(struct iommu_domain *domain, unsigned long iova,
+   size_t size, int prot);
 extern int iommu_switch_d

[PATCH v3 02/12] iommu: Add iommu_split_block interface

2021-04-13 Thread Keqian Zhu
Block (large page) mapping is not a proper granule for dirty log tracking.
Take an extreme example: if DMA writes one byte, then under a 1G mapping the
dirty amount reported is 1G, but under a 4K mapping the dirty amount is
just 4K.

This adds a new interface named iommu_split_block in the IOMMU base layer.
A specific IOMMU driver can invoke it when starting dirty log. If so, the
driver also needs to realize the split_block iommu ops.

We flush all iotlbs after the whole procedure is completed to ease the
pressure on the IOMMU, as we will handle a huge range of mappings in general.

Signed-off-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/iommu.c | 41 +
 include/linux/iommu.h | 11 +++
 2 files changed, 52 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 667b2d6d2fc0..bb413a927870 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2721,6 +2721,47 @@ int iommu_domain_set_attr(struct iommu_domain *domain,
 }
 EXPORT_SYMBOL_GPL(iommu_domain_set_attr);
 
+int iommu_split_block(struct iommu_domain *domain, unsigned long iova,
+ size_t size)
+{
+   const struct iommu_ops *ops = domain->ops;
+   unsigned int min_pagesz;
+   size_t pgsize;
+   bool flush = false;
+   int ret = 0;
+
+   if (unlikely(!ops || !ops->split_block))
+   return -ENODEV;
+
+   min_pagesz = 1 << __ffs(domain->pgsize_bitmap);
+   if (!IS_ALIGNED(iova | size, min_pagesz)) {
+   pr_err("unaligned: iova 0x%lx size 0x%zx min_pagesz 0x%x\n",
+  iova, size, min_pagesz);
+   return -EINVAL;
+   }
+
+   while (size) {
+   flush = true;
+
+   pgsize = iommu_pgsize(domain, iova, size);
+
+   ret = ops->split_block(domain, iova, pgsize);
+   if (ret)
+   break;
+
+   pr_debug("split handled: iova 0x%lx size 0x%zx\n", iova, 
pgsize);
+
+   iova += pgsize;
+   size -= pgsize;
+   }
+
+   if (flush)
+   iommu_flush_iotlb_all(domain);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_split_block);
+
 int iommu_switch_dirty_log(struct iommu_domain *domain, bool enable,
   unsigned long iova, size_t size, int prot)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 7f9ed9f520e2..c6c90ac069e3 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -208,6 +208,7 @@ struct iommu_iotlb_gather {
  * @device_group: find iommu group for a particular device
  * @domain_get_attr: Query domain attributes
  * @domain_set_attr: Change domain attributes
+ * @split_block: Split block mapping into page mapping
  * @switch_dirty_log: Perform actions to start|stop dirty log tracking
  * @sync_dirty_log: Sync dirty log from IOMMU into a dirty bitmap
  * @clear_dirty_log: Clear dirty log of IOMMU by a mask bitmap
@@ -267,6 +268,8 @@ struct iommu_ops {
   enum iommu_attr attr, void *data);
 
/* Track dirty log */
+   int (*split_block)(struct iommu_domain *domain, unsigned long iova,
+  size_t size);
int (*switch_dirty_log)(struct iommu_domain *domain, bool enable,
unsigned long iova, size_t size, int prot);
int (*sync_dirty_log)(struct iommu_domain *domain,
@@ -529,6 +532,8 @@ extern int iommu_domain_get_attr(struct iommu_domain 
*domain, enum iommu_attr,
 void *data);
 extern int iommu_domain_set_attr(struct iommu_domain *domain, enum iommu_attr,
 void *data);
+extern int iommu_split_block(struct iommu_domain *domain, unsigned long iova,
+size_t size);
 extern int iommu_switch_dirty_log(struct iommu_domain *domain, bool enable,
  unsigned long iova, size_t size, int prot);
 extern int iommu_sync_dirty_log(struct iommu_domain *domain, unsigned long 
iova,
@@ -929,6 +934,12 @@ static inline int iommu_domain_set_attr(struct 
iommu_domain *domain,
return -EINVAL;
 }
 
+static inline int iommu_split_block(struct iommu_domain *domain,
+   unsigned long iova, size_t size)
+{
+   return -EINVAL;
+}
+
 static inline int iommu_switch_dirty_log(struct iommu_domain *domain,
 bool enable, unsigned long iova,
 size_t size, int prot)
-- 
2.19.1



[PATCH v3 07/12] iommu/arm-smmu-v3: Realize split_block iommu ops

2021-04-13 Thread Keqian Zhu
From: Kunkun Jiang 

This splits a block descriptor into a span of page descriptors. The BBML1
or BBML2 feature is required.

Splitting blocks does not run concurrently with other pgtable ops, as the
only intended user is vfio, which always holds a lock, so race conditions
are not considered in the pgtable ops.

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  27 +
 drivers/iommu/io-pgtable-arm.c  | 122 
 include/linux/io-pgtable.h  |   2 +
 3 files changed, 151 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 443ac19c6da9..cfa83fa03c89 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2537,6 +2537,32 @@ static int arm_smmu_domain_set_attr(struct iommu_domain 
*domain,
return ret;
 }
 
+static int arm_smmu_split_block(struct iommu_domain *domain,
+   unsigned long iova, size_t size)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+   struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
+   size_t handled_size;
+
+   if (!(smmu->features & (ARM_SMMU_FEAT_BBML1 | ARM_SMMU_FEAT_BBML2))) {
+   dev_err(smmu->dev, "don't support BBML1/2, can't split 
block\n");
+   return -ENODEV;
+   }
+   if (!ops || !ops->split_block) {
+   pr_err("io-pgtable don't realize split block\n");
+   return -ENODEV;
+   }
+
+   handled_size = ops->split_block(ops, iova, size);
+   if (handled_size != size) {
+   pr_err("split block failed\n");
+   return -EFAULT;
+   }
+
+   return 0;
+}
+
 static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
 {
return iommu_fwspec_add_ids(dev, args->args, 1);
@@ -2636,6 +2662,7 @@ static struct iommu_ops arm_smmu_ops = {
.device_group   = arm_smmu_device_group,
.domain_get_attr= arm_smmu_domain_get_attr,
.domain_set_attr= arm_smmu_domain_set_attr,
+   .split_block= arm_smmu_split_block,
.of_xlate   = arm_smmu_of_xlate,
.get_resv_regions   = arm_smmu_get_resv_regions,
.put_resv_regions   = generic_iommu_put_resv_regions,
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 94d790b8ed27..4c4eec3c0698 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -79,6 +79,8 @@
 #define ARM_LPAE_PTE_SH_IS (((arm_lpae_iopte)3) << 8)
 #define ARM_LPAE_PTE_NS(((arm_lpae_iopte)1) << 5)
 #define ARM_LPAE_PTE_VALID (((arm_lpae_iopte)1) << 0)
+/* Block descriptor bits */
+#define ARM_LPAE_PTE_NT(((arm_lpae_iopte)1) << 16)
 
 #define ARM_LPAE_PTE_ATTR_LO_MASK  (((arm_lpae_iopte)0x3ff) << 2)
 /* Ignore the contiguous bit for block splitting */
@@ -679,6 +681,125 @@ static phys_addr_t arm_lpae_iova_to_phys(struct 
io_pgtable_ops *ops,
return iopte_to_paddr(pte, data) | iova;
 }
 
+static size_t __arm_lpae_split_block(struct arm_lpae_io_pgtable *data,
+unsigned long iova, size_t size, int lvl,
+arm_lpae_iopte *ptep);
+
+static size_t arm_lpae_do_split_blk(struct arm_lpae_io_pgtable *data,
+   unsigned long iova, size_t size,
+   arm_lpae_iopte blk_pte, int lvl,
+   arm_lpae_iopte *ptep)
+{
+   struct io_pgtable_cfg *cfg = &data->iop.cfg;
+   arm_lpae_iopte pte, *tablep;
+   phys_addr_t blk_paddr;
+   size_t tablesz = ARM_LPAE_GRANULE(data);
+   size_t split_sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
+   int i;
+
+   if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
+   return 0;
+
+   tablep = __arm_lpae_alloc_pages(tablesz, GFP_ATOMIC, cfg);
+   if (!tablep)
+   return 0;
+
+   blk_paddr = iopte_to_paddr(blk_pte, data);
+   pte = iopte_prot(blk_pte);
+   for (i = 0; i < tablesz / sizeof(pte); i++, blk_paddr += split_sz)
+   __arm_lpae_init_pte(data, blk_paddr, pte, lvl, &tablep[i]);
+
+   if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_BBML1) {
+   /* Race does not exist */
+   blk_pte |= ARM_LPAE_PTE_NT;
+   __arm_lpae_set_pte(ptep, blk_pte, cfg);
+   io_pgtable_tlb_flush_walk(&data->iop, iova, size, size);
+   }
+   /* Race does not exist */
+   pte = arm_lpae_install_table(tablep, ptep, blk_pte, cfg);
+
+   /* Have we split it into pages? */
+   if (lvl == (ARM_LPAE_MAX_LEVELS - 1))
+  

[PATCH v3 04/12] iommu/arm-smmu-v3: Add support for Hardware Translation Table Update

2021-04-13 Thread Keqian Zhu
From: Jean-Philippe Brucker 

If the SMMU supports it and the kernel was built with HTTU support,
enable hardware update of access and dirty flags. This is essential for
shared page tables, to reduce the number of access faults on the fault
queue. Normal DMA with io-pgtables doesn't currently use the access or
dirty flags.

We can enable HTTU even if CPUs don't support it, because the kernel
always checks for HW dirty bit and updates the PTE flags atomically.

Signed-off-by: Jean-Philippe Brucker 
---
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |  2 +
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 41 ++-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  8 
 3 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index bb251cab61f3..ae075e675892 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -121,10 +121,12 @@ static struct arm_smmu_ctx_desc 
*arm_smmu_alloc_shared_cd(struct mm_struct *mm)
if (err)
goto out_free_asid;
 
+   /* HA and HD will be filtered out later if not supported by the SMMU */
tcr = FIELD_PREP(CTXDESC_CD_0_TCR_T0SZ, 64ULL - vabits_actual) |
  FIELD_PREP(CTXDESC_CD_0_TCR_IRGN0, ARM_LPAE_TCR_RGN_WBWA) |
  FIELD_PREP(CTXDESC_CD_0_TCR_ORGN0, ARM_LPAE_TCR_RGN_WBWA) |
  FIELD_PREP(CTXDESC_CD_0_TCR_SH0, ARM_LPAE_TCR_SH_IS) |
+ CTXDESC_CD_0_TCR_HA | CTXDESC_CD_0_TCR_HD |
  CTXDESC_CD_0_TCR_EPD1 | CTXDESC_CD_0_AA64;
 
switch (PAGE_SIZE) {
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 8594b4a83043..b6d965504f44 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1012,10 +1012,17 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_domain 
*smmu_domain, int ssid,
 * this substream's traffic
 */
} else { /* (1) and (2) */
+   u64 tcr = cd->tcr;
+
cdptr[1] = cpu_to_le64(cd->ttbr & CTXDESC_CD_1_TTB0_MASK);
cdptr[2] = 0;
cdptr[3] = cpu_to_le64(cd->mair);
 
+   if (!(smmu->features & ARM_SMMU_FEAT_HD))
+   tcr &= ~CTXDESC_CD_0_TCR_HD;
+   if (!(smmu->features & ARM_SMMU_FEAT_HA))
+   tcr &= ~CTXDESC_CD_0_TCR_HA;
+
/*
 * STE is live, and the SMMU might read dwords of this CD in any
 * order. Ensure that it observes valid values before reading
@@ -1023,7 +1030,7 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_domain 
*smmu_domain, int ssid,
 */
arm_smmu_sync_cd(smmu_domain, ssid, true);
 
-   val = cd->tcr |
+   val = tcr |
 #ifdef __BIG_ENDIAN
CTXDESC_CD_0_ENDI |
 #endif
@@ -3196,6 +3203,28 @@ static int arm_smmu_device_reset(struct arm_smmu_device 
*smmu, bool bypass)
return 0;
 }
 
+static void arm_smmu_get_httu(struct arm_smmu_device *smmu, u32 reg)
+{
+   u32 fw_features = smmu->features & (ARM_SMMU_FEAT_HA | 
ARM_SMMU_FEAT_HD);
+   u32 features = 0;
+
+   switch (FIELD_GET(IDR0_HTTU, reg)) {
+   case IDR0_HTTU_ACCESS_DIRTY:
+   features |= ARM_SMMU_FEAT_HD;
+   fallthrough;
+   case IDR0_HTTU_ACCESS:
+   features |= ARM_SMMU_FEAT_HA;
+   }
+
+   if (smmu->dev->of_node)
+   smmu->features |= features;
+   else if (features != fw_features)
+   /* ACPI IORT sets the HTTU bits */
+   dev_warn(smmu->dev,
+"IDR0.HTTU overridden by FW configuration (0x%x)\n",
+fw_features);
+}
+
 static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 {
u32 reg;
@@ -3256,6 +3285,8 @@ static int arm_smmu_device_hw_probe(struct 
arm_smmu_device *smmu)
smmu->features |= ARM_SMMU_FEAT_E2H;
}
 
+   arm_smmu_get_httu(smmu, reg);
+
/*
 * The coherency feature as set by FW is used in preference to the ID
 * register, but warn on mismatch.
@@ -3441,6 +3472,14 @@ static int arm_smmu_device_acpi_probe(struct 
platform_device *pdev,
if (iort_smmu->flags & ACPI_IORT_SMMU_V3_COHACC_OVERRIDE)
smmu->features |= ARM_SMMU_FEAT_COHERENCY;
 
+   switch (FIELD_GET(ACPI_IORT_SMMU_V3_HTTU_OVERRIDE, iort_smmu->flags)) {
+   case IDR0_HTTU_ACCESS_DIRTY:
+   smmu->features |= ARM_SMMU_FEAT_HD;
+   fallthrough;
+   case IDR0_HTTU_ACCESS:
+   smmu->features |= ARM_SMMU_FEAT_HA;
+   }
+
return 0;
 }
 #else
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 

[PATCH v3 06/12] iommu/arm-smmu-v3: Add feature detection for BBML

2021-04-13 Thread Keqian Zhu
From: Kunkun Jiang 

When altering a translation table descriptor for certain specific reasons,
we require the break-before-make procedure. But it might cause problems when
the TTD is alive. The I/O streams might not tolerate translation faults.

If the SMMU supports BBM level 1 or BBM level 2, we can change the block
size without using the break-before-make sequence.

This adds feature detection for BBML; no functional change is expected.

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 19 +++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  6 ++
 include/linux/io-pgtable.h  |  8 
 3 files changed, 33 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 369c0ea7a104..443ac19c6da9 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2030,6 +2030,11 @@ static int arm_smmu_domain_finalise(struct iommu_domain 
*domain,
if (smmu->features & ARM_SMMU_FEAT_HD)
pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_ARM_HD;
 
+   if (smmu->features & ARM_SMMU_FEAT_BBML1)
+   pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_ARM_BBML1;
+   else if (smmu->features & ARM_SMMU_FEAT_BBML2)
+   pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_ARM_BBML2;
+
pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain);
if (!pgtbl_ops)
return -ENOMEM;
@@ -3373,6 +3378,20 @@ static int arm_smmu_device_hw_probe(struct 
arm_smmu_device *smmu)
 
/* IDR3 */
reg = readl_relaxed(smmu->base + ARM_SMMU_IDR3);
+   switch (FIELD_GET(IDR3_BBML, reg)) {
+   case IDR3_BBML0:
+   break;
+   case IDR3_BBML1:
+   smmu->features |= ARM_SMMU_FEAT_BBML1;
+   break;
+   case IDR3_BBML2:
+   smmu->features |= ARM_SMMU_FEAT_BBML2;
+   break;
+   default:
+   dev_err(smmu->dev, "unknown/unsupported BBM behavior level\n");
+   return -ENXIO;
+   }
+
if (FIELD_GET(IDR3_RIL, reg))
smmu->features |= ARM_SMMU_FEAT_RANGE_INV;
 
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 26d6b935b383..a74125675544 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -54,6 +54,10 @@
 #define IDR1_SIDSIZE   GENMASK(5, 0)
 
 #define ARM_SMMU_IDR3  0xc
+#define IDR3_BBML  GENMASK(12, 11)
+#define IDR3_BBML0 0
+#define IDR3_BBML1 1
+#define IDR3_BBML2 2
 #define IDR3_RIL   (1 << 10)
 
 #define ARM_SMMU_IDR5  0x14
@@ -615,6 +619,8 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_E2H  (1 << 18)
 #define ARM_SMMU_FEAT_HA   (1 << 19)
 #define ARM_SMMU_FEAT_HD   (1 << 20)
+#define ARM_SMMU_FEAT_BBML1(1 << 21)
+#define ARM_SMMU_FEAT_BBML2(1 << 22)
u32 features;
 
 #define ARM_SMMU_OPT_SKIP_PREFETCH (1 << 0)
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 64cee6831c97..9e7163ec9447 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -84,6 +84,12 @@ struct io_pgtable_cfg {
 *  attributes set in the TCR for a non-coherent page-table walker.
 *
 * IO_PGTABLE_QUIRK_ARM_HD: Support hardware management of dirty status.
+*
+* IO_PGTABLE_QUIRK_ARM_BBML1: ARM SMMU supports BBM Level 1 behavior
+*  when changing block size.
+*
+* IO_PGTABLE_QUIRK_ARM_BBML2: ARM SMMU supports BBM Level 2 behavior
+*  when changing block size.
 */
#define IO_PGTABLE_QUIRK_ARM_NS BIT(0)
#define IO_PGTABLE_QUIRK_NO_PERMS   BIT(1)
@@ -92,6 +98,8 @@ struct io_pgtable_cfg {
#define IO_PGTABLE_QUIRK_ARM_TTBR1  BIT(5)
#define IO_PGTABLE_QUIRK_ARM_OUTER_WBWA BIT(6)
#define IO_PGTABLE_QUIRK_ARM_HD BIT(7)
+   #define IO_PGTABLE_QUIRK_ARM_BBML1  BIT(8)
+   #define IO_PGTABLE_QUIRK_ARM_BBML2  BIT(9)
unsigned long   quirks;
unsigned long   pgsize_bitmap;
unsigned intias;
-- 
2.19.1



[PATCH v3 05/12] iommu/arm-smmu-v3: Enable HTTU for stage1 with io-pgtable mapping

2021-04-13 Thread Keqian Zhu
From: Kunkun Jiang 

As nested mode is not upstreamed yet, we just aim to support dirty
log tracking for stage1 with io-pgtable mapping (which means SVA
mapping is not supported). If HTTU is supported, we enable the HA/HD
bits in the SMMU CD, and set the DBM bit for writable TTDs.

The dirty state information is encoded using the access permission
bits AP[2] (stage 1) or S2AP[1] (stage 2) in conjunction with the
DBM (Dirty Bit Modifier) bit, where DBM means writable and AP[2]/
S2AP[1] means dirty.
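
(Illustration, not part of the patch: a minimal user-space style sketch of how
this encoding can be decoded, assuming the VMSAv8-64 stage 1 bit positions used
by io-pgtable-arm, i.e. DBM is bit 51 and AP[2] is bit 7.)

#include <stdbool.h>
#include <stdint.h>

#define PTE_DBM        (1ULL << 51)  /* Dirty Bit Modifier: HW may clear AP[2] */
#define PTE_AP_RDONLY  (1ULL << 7)   /* AP[2]: read-only when set */

/* writable_dirty: DBM is set and hardware has cleared AP[2] */
static bool stage1_ttd_is_dirty(uint64_t ttd)
{
        return (ttd & PTE_DBM) && !(ttd & PTE_AP_RDONLY);
}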

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 3 +++
 drivers/iommu/io-pgtable-arm.c  | 7 ++-
 include/linux/io-pgtable.h  | 3 +++
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index b6d965504f44..369c0ea7a104 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1921,6 +1921,7 @@ static int arm_smmu_domain_finalise_s1(struct 
arm_smmu_domain *smmu_domain,
  FIELD_PREP(CTXDESC_CD_0_TCR_ORGN0, tcr->orgn) |
  FIELD_PREP(CTXDESC_CD_0_TCR_SH0, tcr->sh) |
  FIELD_PREP(CTXDESC_CD_0_TCR_IPS, tcr->ips) |
+ CTXDESC_CD_0_TCR_HA | CTXDESC_CD_0_TCR_HD |
  CTXDESC_CD_0_TCR_EPD1 | CTXDESC_CD_0_AA64;
cfg->cd.mair= pgtbl_cfg->arm_lpae_s1_cfg.mair;
 
@@ -2026,6 +2027,8 @@ static int arm_smmu_domain_finalise(struct iommu_domain 
*domain,
 
if (smmu_domain->non_strict)
pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
+   if (smmu->features & ARM_SMMU_FEAT_HD)
+   pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_ARM_HD;
 
pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain);
if (!pgtbl_ops)
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 87def58e79b5..94d790b8ed27 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -72,6 +72,7 @@
 
 #define ARM_LPAE_PTE_NSTABLE   (((arm_lpae_iopte)1) << 63)
 #define ARM_LPAE_PTE_XN(((arm_lpae_iopte)3) << 53)
+#define ARM_LPAE_PTE_DBM   (((arm_lpae_iopte)1) << 51)
 #define ARM_LPAE_PTE_AF(((arm_lpae_iopte)1) << 10)
 #define ARM_LPAE_PTE_SH_NS (((arm_lpae_iopte)0) << 8)
 #define ARM_LPAE_PTE_SH_OS (((arm_lpae_iopte)2) << 8)
@@ -81,7 +82,7 @@
 
 #define ARM_LPAE_PTE_ATTR_LO_MASK  (((arm_lpae_iopte)0x3ff) << 2)
 /* Ignore the contiguous bit for block splitting */
-#define ARM_LPAE_PTE_ATTR_HI_MASK  (((arm_lpae_iopte)6) << 52)
+#define ARM_LPAE_PTE_ATTR_HI_MASK  (((arm_lpae_iopte)13) << 51)
 #define ARM_LPAE_PTE_ATTR_MASK (ARM_LPAE_PTE_ATTR_LO_MASK |\
 ARM_LPAE_PTE_ATTR_HI_MASK)
 /* Software bit for solving coherency races */
@@ -379,6 +380,7 @@ static int __arm_lpae_map(struct arm_lpae_io_pgtable *data, 
unsigned long iova,
 static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
   int prot)
 {
+   struct io_pgtable_cfg *cfg = &data->iop.cfg;
arm_lpae_iopte pte;
 
if (data->iop.fmt == ARM_64_LPAE_S1 ||
@@ -386,6 +388,9 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct 
arm_lpae_io_pgtable *data,
pte = ARM_LPAE_PTE_nG;
if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ))
pte |= ARM_LPAE_PTE_AP_RDONLY;
+   else if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_HD)
+   pte |= ARM_LPAE_PTE_DBM;
+
if (!(prot & IOMMU_PRIV))
pte |= ARM_LPAE_PTE_AP_UNPRIV;
} else {
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index a4c9ca2c31f1..64cee6831c97 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -82,6 +82,8 @@ struct io_pgtable_cfg {
 *
 * IO_PGTABLE_QUIRK_ARM_OUTER_WBWA: Override the outer-cacheability
 *  attributes set in the TCR for a non-coherent page-table walker.
+*
+* IO_PGTABLE_QUIRK_ARM_HD: Support hardware management of dirty status.
 */
#define IO_PGTABLE_QUIRK_ARM_NS BIT(0)
#define IO_PGTABLE_QUIRK_NO_PERMS   BIT(1)
@@ -89,6 +91,7 @@ struct io_pgtable_cfg {
#define IO_PGTABLE_QUIRK_NON_STRICT BIT(4)
#define IO_PGTABLE_QUIRK_ARM_TTBR1  BIT(5)
#define IO_PGTABLE_QUIRK_ARM_OUTER_WBWA BIT(6)
+   #define IO_PGTABLE_QUIRK_ARM_HD BIT(7)
unsigned long   quirks;
unsigned long   pgsize_bitmap;
unsigned intias;
-- 
2

[PATCH v3 01/12] iommu: Introduce dirty log tracking framework

2021-04-13 Thread Keqian Zhu
Some types of IOMMU are capable of tracking DMA dirty log, such as
ARM SMMU with HTTU or Intel IOMMU with SLADE. This introduces the
dirty log tracking framework in the IOMMU base layer.

Three new essential interfaces are added, and we maintain the status
of dirty log tracking in iommu_domain.
1. iommu_switch_dirty_log: Perform actions to start|stop dirty log tracking
2. iommu_sync_dirty_log: Sync dirty log from IOMMU into a dirty bitmap
3. iommu_clear_dirty_log: Clear dirty log of IOMMU by a mask bitmap
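
(Illustration, not part of the patch: a hedged sketch of how a caller such as
VFIO might drive these interfaces for one tracking round. The helper name
track_one_round() is hypothetical, and iommu_clear_dirty_log() is assumed to
mirror the sync signature, which is not shown in full here.)

#include <linux/iommu.h>

static int track_one_round(struct iommu_domain *domain, unsigned long iova,
                           size_t size, unsigned long *bitmap)
{
        unsigned long pgshift = __ffs(domain->pgsize_bitmap);
        int ret;

        /* Start tracking; e.g. the SMMU splits block mappings into pages. */
        ret = iommu_switch_dirty_log(domain, true, iova, size,
                                     IOMMU_READ | IOMMU_WRITE);
        if (ret)
                return ret;

        /* Pull the dirty log into the caller's bitmap ... */
        ret = iommu_sync_dirty_log(domain, iova, size, bitmap, iova, pgshift);
        if (!ret)
                /* ... and clear only what was just reported. */
                ret = iommu_clear_dirty_log(domain, iova, size, bitmap, iova,
                                            pgshift);

        /* Stop tracking; e.g. the SMMU merges pages back into blocks. */
        iommu_switch_dirty_log(domain, false, iova, size, 0);
        return ret;
}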

A new dev feature is added to indicate whether a specific type of
iommu hardware supports dirty log tracking and whether its driver
implements these interfaces.

Signed-off-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/iommu.c | 150 ++
 include/linux/iommu.h |  53 +++
 2 files changed, 203 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index d0b0a15dba84..667b2d6d2fc0 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1922,6 +1922,7 @@ static struct iommu_domain *__iommu_domain_alloc(struct 
bus_type *bus,
domain->type = type;
/* Assume all sizes by default; the driver may override this later */
domain->pgsize_bitmap  = bus->iommu_ops->pgsize_bitmap;
+   mutex_init(&domain->switch_log_lock);
 
return domain;
 }
@@ -2720,6 +2721,155 @@ int iommu_domain_set_attr(struct iommu_domain *domain,
 }
 EXPORT_SYMBOL_GPL(iommu_domain_set_attr);
 
+int iommu_switch_dirty_log(struct iommu_domain *domain, bool enable,
+  unsigned long iova, size_t size, int prot)
+{
+   const struct iommu_ops *ops = domain->ops;
+   int ret;
+
+   if (unlikely(!ops || !ops->switch_dirty_log))
+   return -ENODEV;
+
+   mutex_lock(&domain->switch_log_lock);
+   if (enable && domain->dirty_log_tracking) {
+   ret = -EBUSY;
+   goto out;
+   } else if (!enable && !domain->dirty_log_tracking) {
+   ret = -EINVAL;
+   goto out;
+   }
+
+   ret = ops->switch_dirty_log(domain, enable, iova, size, prot);
+   if (ret)
+   goto out;
+
+   domain->dirty_log_tracking = enable;
+out:
+   mutex_unlock(&domain->switch_log_lock);
+   return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_switch_dirty_log);
+
+int iommu_sync_dirty_log(struct iommu_domain *domain, unsigned long iova,
+size_t size, unsigned long *bitmap,
+unsigned long base_iova, unsigned long bitmap_pgshift)
+{
+   const struct iommu_ops *ops = domain->ops;
+   unsigned int min_pagesz;
+   size_t pgsize;
+   int ret = 0;
+
+   if (unlikely(!ops || !ops->sync_dirty_log))
+   return -ENODEV;
+
+   min_pagesz = 1 << __ffs(domain->pgsize_bitmap);
+   if (!IS_ALIGNED(iova | size, min_pagesz)) {
+   pr_err("unaligned: iova 0x%lx size 0x%zx min_pagesz 0x%x\n",
+  iova, size, min_pagesz);
+   return -EINVAL;
+   }
+
+   mutex_lock(&domain->switch_log_lock);
+   if (!domain->dirty_log_tracking) {
+   ret = -EINVAL;
+   goto out;
+   }
+
+   while (size) {
+   pgsize = iommu_pgsize(domain, iova, size);
+
+   ret = ops->sync_dirty_log(domain, iova, pgsize,
+ bitmap, base_iova, bitmap_pgshift);
+   if (ret)
+   break;
+
+   pr_debug("dirty_log_sync handle: iova 0x%lx pagesz 0x%zx\n",
+iova, pgsize);
+
+   iova += pgsize;
+   size -= pgsize;
+   }
+out:
+   mutex_unlock(&domain->switch_log_lock);
+   return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_sync_dirty_log);
+
+static int __iommu_clear_dirty_log(struct iommu_domain *domain,
+  unsigned long iova, size_t size,
+  unsigned long *bitmap,
+  unsigned long base_iova,
+  unsigned long bitmap_pgshift)
+{
+   const struct iommu_ops *ops = domain->ops;
+   size_t pgsize;
+   int ret = 0;
+
+   if (unlikely(!ops || !ops->clear_dirty_log))
+   return -ENODEV;
+
+   while (size) {
+   pgsize = iommu_pgsize(domain, iova, size);
+
+   ret = ops->clear_dirty_log(domain, iova, pgsize, bitmap,
+  base_iova, bitmap_pgshift);
+   if (ret)
+   break;
+
+   pr_debug("dirty_log_clear handled: iova 0x%lx pagesz 0x%zx\n",
+iova, pgsize);
+
+   iova += pgsize;
+   size -= pgsize;
+   }
+
+   return ret;
+}
+
+int iommu_clear_dirty_log(struct iommu_domain *domain,
+ 

[PATCH v3 00/12] iommu/smmuv3: Implement hardware dirty log tracking

2021-04-13 Thread Keqian Zhu

This patch series is split from the series[1] that contains both the IOMMU part
and the VFIO part. The new VFIO part will be sent out in another series.

[1] 
https://lore.kernel.org/linux-iommu/20210310090614.26668-1-zhukeqi...@huawei.com/

changelog:

v3:
 - Merge start_dirty_log and stop_dirty_log into switch_dirty_log. (Yi Sun)
 - Maintain the dirty log status in iommu_domain.
 - Update commit message to make patch easier to review.

v2:
 - Address all comments of RFC version, thanks for all of you ;-)
 - Add a bugfix that start dirty log for newly added dma ranges and domain.



Hi everyone,

This patch series introduces a framework of iommu dirty log tracking, and smmuv3
realizes this framework. This new feature can be used by VFIO dma dirty 
tracking.

Intention:

Some types of IOMMU are capable of tracking DMA dirty log, such as
ARM SMMU with HTTU or Intel IOMMU with SLADE. This introduces the
dirty log tracking framework in the IOMMU base layer.

Three new essential interfaces are added, and we maintain the status
of dirty log tracking in iommu_domain.
1. iommu_switch_dirty_log: Perform actions to start|stop dirty log tracking
2. iommu_sync_dirty_log: Sync dirty log from IOMMU into a dirty bitmap
3. iommu_clear_dirty_log: Clear dirty log of IOMMU by a mask bitmap

About SMMU HTTU:

HTTU (Hardware Translation Table Update) is a feature of ARM SMMUv3; it can
update the access flag and/or dirty state of the TTD (Translation Table
Descriptor) by hardware.
With HTTU, stage1 TTD is classified into 3 types:
                     DBM bit   AP[2] (readonly bit)
1. writable_clean       1             1
2. writable_dirty       1             0
3. readonly             0             1

If HTTU_HD (manage dirty state) is enabled, smmu can change TTD from 
writable_clean to
writable_dirty. Then software can scan TTD to sync dirty state into dirty 
bitmap. With
this feature, we can track the dirty log of DMA continuously and precisely.

About this series:

Patch 1-3: Introduce dirty log tracking framework in the IOMMU base layer, and
           two common interfaces that can be used by many types of iommu.

Patch 4-6: Add feature detection for smmu HTTU and enable HTTU for smmu stage1
           mapping. And add feature detection for smmu BBML. We need to split
           block mapping when starting dirty log tracking and merge page
           mapping when stopping dirty log tracking, which requires the
           break-before-make procedure. But it might cause problems when the
           TTD is alive. The I/O streams might not tolerate translation
           faults. So BBML should be used.

Patch 7-12: We implement these interfaces for arm smmuv3.

Thanks,
Keqian

Jean-Philippe Brucker (1):
  iommu/arm-smmu-v3: Add support for Hardware Translation Table Update

Keqian Zhu (3):
  iommu: Introduce dirty log tracking framework
  iommu: Add iommu_split_block interface
  iommu: Add iommu_merge_page interface

Kunkun Jiang (8):
  iommu/arm-smmu-v3: Enable HTTU for stage1 with io-pgtable mapping
  iommu/arm-smmu-v3: Add feature detection for BBML
  iommu/arm-smmu-v3: Realize split_block iommu ops
  iommu/arm-smmu-v3: Realize merge_page iommu ops
  iommu/arm-smmu-v3: Realize switch_dirty_log iommu ops
  iommu/arm-smmu-v3: Realize sync_dirty_log iommu ops
  iommu/arm-smmu-v3: Realize clear_dirty_log iommu ops
  iommu/arm-smmu-v3: Add HWDBM device feature reporting

 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |   2 +
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 217 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  14 +
 drivers/iommu/io-pgtable-arm.c| 392 +-
 drivers/iommu/iommu.c | 266 
 include/linux/io-pgtable.h|  23 +
 include/linux/iommu.h |  76 
 7 files changed, 988 insertions(+), 2 deletions(-)

-- 
2.19.1


Re: [PATCH v14 05/13] iommu/smmuv3: Implement attach/detach_pasid_table

2021-03-22 Thread Keqian Zhu
Hi Eric,

On 2021/3/19 21:15, Auger Eric wrote:
> Hi Keqian,
> 
> On 3/2/21 9:35 AM, Keqian Zhu wrote:
>> Hi Eric,
>>
>> On 2021/2/24 4:56, Eric Auger wrote:
>>> On attach_pasid_table() we program STE S1 related info set
>>> by the guest into the actual physical STEs. At minimum
>>> we need to program the context descriptor GPA and compute
>>> whether the stage1 is translated/bypassed or aborted.
>>>
>>> On detach, the stage 1 config is unset and the abort flag is
>>> unset.
>>>
>>> Signed-off-by: Eric Auger 
>>>
>> [...]
>>
>>> +
>>> +   /*
>>> +* we currently support a single CD so s1fmt and s1dss
>>> +* fields are also ignored
>>> +*/
>>> +   if (cfg->pasid_bits)
>>> +   goto out;
>>> +
>>> +   smmu_domain->s1_cfg.cdcfg.cdtab_dma = cfg->base_ptr;
>> only the "cdtab_dma" field of "cdcfg" is set, we are not able to locate a 
>> specific cd using arm_smmu_get_cd_ptr().
>>
>> Maybe we'd better use a specialized function to fill other fields of "cdcfg" 
>> or add a sanity check in arm_smmu_get_cd_ptr()
>> to prevent calling it under nested mode?
>>
>> As now we just call arm_smmu_get_cd_ptr() during finalise_s1(), no problem 
>> found. Just a suggestion ;-)
> 
> forgive me for the delay. yes I can indeed make sure that code is not
> called in nested mode. Please could you detail why you would need to
> call arm_smmu_get_cd_ptr()?
I accidentally called this function in nested mode when verifying the smmu mpam
feature. :)

Yes, in nested mode the context descriptor is owned by the guest, so the
hypervisor does not need to care about its content.
Maybe we'd better add an explicit comment to arm_smmu_get_cd_ptr() to make
callers pay attention to this? :)

Thanks,
Keqian

> 
> Thanks
> 
> Eric
>>
>> Thanks,
>> Keqian
>>
>>
>>> +   smmu_domain->s1_cfg.set = true;
>>> +   smmu_domain->abort = false;
>>> +   break;
>>> +   default:
>>> +   goto out;
>>> +   }
>>> +   spin_lock_irqsave(&smmu_domain->devices_lock, flags);
>>> +   list_for_each_entry(master, &smmu_domain->devices, domain_head)
>>> +   arm_smmu_install_ste_for_dev(master);
>>> +   spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
>>> +   ret = 0;
>>> +out:
>>> +   mutex_unlock(&smmu_domain->init_mutex);
>>> +   return ret;
>>> +}
>>> +
>>> +static void arm_smmu_detach_pasid_table(struct iommu_domain *domain)
>>> +{
>>> +   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>>> +   struct arm_smmu_master *master;
>>> +   unsigned long flags;
>>> +
>>> +   mutex_lock(&smmu_domain->init_mutex);
>>> +
>>> +   if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
>>> +   goto unlock;
>>> +
>>> +   smmu_domain->s1_cfg.set = false;
>>> +   smmu_domain->abort = false;
>>> +
>>> +   spin_lock_irqsave(&smmu_domain->devices_lock, flags);
>>> +   list_for_each_entry(master, &smmu_domain->devices, domain_head)
>>> +   arm_smmu_install_ste_for_dev(master);
>>> +   spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
>>> +
>>> +unlock:
>>> +   mutex_unlock(&smmu_domain->init_mutex);
>>> +}
>>> +
>>>  static bool arm_smmu_dev_has_feature(struct device *dev,
>>>  enum iommu_dev_features feat)
>>>  {
>>> @@ -2939,6 +3026,8 @@ static struct iommu_ops arm_smmu_ops = {
>>> .of_xlate   = arm_smmu_of_xlate,
>>> .get_resv_regions   = arm_smmu_get_resv_regions,
>>> .put_resv_regions   = generic_iommu_put_resv_regions,
>>> +   .attach_pasid_table = arm_smmu_attach_pasid_table,
>>> +   .detach_pasid_table = arm_smmu_detach_pasid_table,
>>> .dev_has_feat   = arm_smmu_dev_has_feature,
>>> .dev_feat_enabled   = arm_smmu_dev_feature_enabled,
>>> .dev_enable_feat= arm_smmu_dev_enable_feature,
>>>
>>
> 
> .
> 


Re: [PATCH v2 06/11] iommu/arm-smmu-v3: Scan leaf TTD to sync hardware dirty log

2021-03-17 Thread Keqian Zhu
On 2021/3/17 18:44, Yi Sun wrote:
> On 21-03-10 17:06:09, Keqian Zhu wrote:
>> From: jiangkunkun 
>>
>> During dirty log tracking, user will try to retrieve dirty log from
>> iommu if it supports hardware dirty log.
>>
>> This adds a new interface named sync_dirty_log in iommu layer and
>> arm smmuv3 implements it, which scans leaf TTD and treats it's dirty
>> if it's writable (As we just enable HTTU for stage1, so check whether
>> AP[2] is not set).
>>
>> Co-developed-by: Keqian Zhu 
>> Signed-off-by: Kunkun Jiang 
>> ---
>>
>> changelog:
>>
>> v2:
>>  - Add new sanity check in arm_smmu_sync_dirty_log(). (smmu_domain->stage != 
>> ARM_SMMU_DOMAIN_S1)
>>  - Document the purpose of flush_iotlb in arm_smmu_sync_dirty_log(). (Robin)
>>  
>> ---
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 30 +++
>>  drivers/iommu/io-pgtable-arm.c  | 90 +
>>  drivers/iommu/iommu.c   | 38 +
>>  include/linux/io-pgtable.h  |  4 +
>>  include/linux/iommu.h   | 18 +
>>  5 files changed, 180 insertions(+)
>>
> Please split iommu common interface out. Thanks!
Yes, I will do it in v3.

> 
> [...]
> 
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index 2a10294b62a3..44dfb78f9050 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -2850,6 +2850,44 @@ int iommu_stop_dirty_log(struct iommu_domain *domain, 
>> unsigned long iova,
>>  }
>>  EXPORT_SYMBOL_GPL(iommu_stop_dirty_log);
>>  
>> +int iommu_sync_dirty_log(struct iommu_domain *domain, unsigned long iova,
>> + size_t size, unsigned long *bitmap,
>> + unsigned long base_iova, unsigned long bitmap_pgshift)
> 
> One open question: shall we add PASID as one parameter to make iommu
> know which address space to visit?
> 
> For live migration, the pasid should not be necessary. But considering
Sure, for live migration we just need to care about level/stage 2 mapping under 
nested mode.

> future extension, it may be required.
It sounds like a good idea. I will consider this, thanks!

> 
> BRs,
> Yi Sun
> .
> 
Thanks,
Keqian


Re: [PATCH v2 04/11] iommu/arm-smmu-v3: Split block descriptor when start dirty log

2021-03-16 Thread Keqian Zhu
Hi Yi,

On 2021/3/16 17:17, Yi Sun wrote:
> On 21-03-10 17:06:07, Keqian Zhu wrote:
>> From: jiangkunkun 
>>
>> Block descriptor is not a proper granule for dirty log tracking.
>> Take an extreme example, if DMA writes one byte, under 1G mapping,
>> the dirty amount reported to userspace is 1G, but under 4K mapping,
>> the dirty amount is just 4K.
>>
>> This adds a new interface named start_dirty_log in iommu layer and
>> arm smmuv3 implements it, which splits block descriptor to an span
>> of page descriptors. Other types of IOMMU will perform architecture
>> specific actions to start dirty log.
>>
>> To allow code reuse, the split_block operation is realized as an
>> iommu_ops too. We flush all iotlbs after the whole procedure is
>> completed to ease the pressure of iommu, as we will hanle a huge
>> range of mapping in general.
>>
>> Spliting block does not simultaneously work with other pgtable ops,
>> as the only designed user is vfio, which always hold a lock, so race
>> condition is not considered in the pgtable ops.
>>
>> Co-developed-by: Keqian Zhu 
>> Signed-off-by: Kunkun Jiang 
>> ---
>>
>> changelog:
>>
>> v2:
>>  - Change the return type of split_block(). size_t -> int.
>>  - Change commit message to properly describe race condition. (Robin)
>>  - Change commit message to properly describe the need of split block.
>>  - Add a new interface named start_dirty_log(). (Sun Yi)
>>  - Change commit message to explain the realtionship of split_block() and 
>> start_dirty_log().
>>
>> ---
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  52 +
>>  drivers/iommu/io-pgtable-arm.c  | 122 
>>  drivers/iommu/iommu.c   |  48 
>>  include/linux/io-pgtable.h  |   2 +
>>  include/linux/iommu.h   |  24 
>>  5 files changed, 248 insertions(+)
>>
> Could you please split iommu common interface to a separate patch?
> This may make review and comments easier.
Yup, good suggestion.

> 
> IMHO, I think the start/stop interfaces could be merged into one, e.g:
> int iommu_domain_set_hwdbm(struct iommu_domain *domain, bool enable,
>unsigned long iova, size_t size,
>int prot);
Looks good, this reduces some code. But I have a concern that this causes a
loss of flexibility, as we must pass the same arguments when starting|stopping
dirty log. What's your opinion about this?

> 
> Same comments to patch 5.
OK. Thanks.

> 
> BRs,
> Yi Sun
> 
>> -- 
>> 2.19.1
> .
Thanks,
Keqian


Re: [PATCH] vfio/type1: fix vaddr_get_pfns() return in vfio_pin_page_external()

2021-03-14 Thread Keqian Zhu


Hi Daniel,

[+Cc iommu mail list]

This patch looks good to me. (but I don't test it too.)

Thanks,
Keqian

On 2021/3/9 1:24, Daniel Jordan wrote:
> vaddr_get_pfns() now returns the positive number of pfns successfully
> gotten instead of zero.  vfio_pin_page_external() might return 1 to
> vfio_iommu_type1_pin_pages(), which will treat it as an error, if
> vaddr_get_pfns() is successful but vfio_pin_page_external() doesn't
> reach vfio_lock_acct().
> 
> Fix it up in vfio_pin_page_external().  Found by inspection.
> 
> Fixes: be16c1fd99f4 ("vfio/type1: Change success value of vaddr_get_pfn()")
> Signed-off-by: Daniel Jordan 
> ---
> 
> I couldn't test this due to lack of hardware.
> 
>  drivers/vfio/vfio_iommu_type1.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 4bb162c1d649..2a0e3b3ce206 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -785,7 +785,12 @@ static int vfio_pin_page_external(struct vfio_dma *dma, 
> unsigned long vaddr,
>   return -ENODEV;
>  
>   ret = vaddr_get_pfns(mm, vaddr, 1, dma->prot, pfn_base, pages);
> - if (ret == 1 && do_accounting && !is_invalid_reserved_pfn(*pfn_base)) {
> + if (ret != 1)
> + goto out;
> +
> + ret = 0;
> +
> + if (do_accounting && !is_invalid_reserved_pfn(*pfn_base)) {
>   ret = vfio_lock_acct(dma, 1, true);
>   if (ret) {
>   put_pfn(*pfn_base, dma->prot);
> @@ -797,6 +802,7 @@ static int vfio_pin_page_external(struct vfio_dma *dma, 
> unsigned long vaddr,
>   }
>   }
>  
> +out:
>   mmput(mm);
>   return ret;
>  }
> 
> base-commit: 144c79ef33536b4ecb4951e07dbc1f2b7fa99d32
> 


[PATCH v2 01/11] iommu/arm-smmu-v3: Add support for Hardware Translation Table Update

2021-03-10 Thread Keqian Zhu
From: Jean-Philippe Brucker 

If the SMMU supports it and the kernel was built with HTTU support,
enable hardware update of access and dirty flags. This is essential for
shared page tables, to reduce the number of access faults on the fault
queue. Normal DMA with io-pgtables doesn't currently use the access or
dirty flags.

We can enable HTTU even if CPUs don't support it, because the kernel
always checks for HW dirty bit and updates the PTE flags atomically.

Signed-off-by: Jean-Philippe Brucker 
---
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |  2 +
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 41 ++-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  8 
 3 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index bb251cab61f3..ae075e675892 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -121,10 +121,12 @@ static struct arm_smmu_ctx_desc 
*arm_smmu_alloc_shared_cd(struct mm_struct *mm)
if (err)
goto out_free_asid;
 
+   /* HA and HD will be filtered out later if not supported by the SMMU */
tcr = FIELD_PREP(CTXDESC_CD_0_TCR_T0SZ, 64ULL - vabits_actual) |
  FIELD_PREP(CTXDESC_CD_0_TCR_IRGN0, ARM_LPAE_TCR_RGN_WBWA) |
  FIELD_PREP(CTXDESC_CD_0_TCR_ORGN0, ARM_LPAE_TCR_RGN_WBWA) |
  FIELD_PREP(CTXDESC_CD_0_TCR_SH0, ARM_LPAE_TCR_SH_IS) |
+ CTXDESC_CD_0_TCR_HA | CTXDESC_CD_0_TCR_HD |
  CTXDESC_CD_0_TCR_EPD1 | CTXDESC_CD_0_AA64;
 
switch (PAGE_SIZE) {
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 8594b4a83043..b6d965504f44 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1012,10 +1012,17 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_domain 
*smmu_domain, int ssid,
 * this substream's traffic
 */
} else { /* (1) and (2) */
+   u64 tcr = cd->tcr;
+
cdptr[1] = cpu_to_le64(cd->ttbr & CTXDESC_CD_1_TTB0_MASK);
cdptr[2] = 0;
cdptr[3] = cpu_to_le64(cd->mair);
 
+   if (!(smmu->features & ARM_SMMU_FEAT_HD))
+   tcr &= ~CTXDESC_CD_0_TCR_HD;
+   if (!(smmu->features & ARM_SMMU_FEAT_HA))
+   tcr &= ~CTXDESC_CD_0_TCR_HA;
+
/*
 * STE is live, and the SMMU might read dwords of this CD in any
 * order. Ensure that it observes valid values before reading
@@ -1023,7 +1030,7 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_domain 
*smmu_domain, int ssid,
 */
arm_smmu_sync_cd(smmu_domain, ssid, true);
 
-   val = cd->tcr |
+   val = tcr |
 #ifdef __BIG_ENDIAN
CTXDESC_CD_0_ENDI |
 #endif
@@ -3196,6 +3203,28 @@ static int arm_smmu_device_reset(struct arm_smmu_device 
*smmu, bool bypass)
return 0;
 }
 
+static void arm_smmu_get_httu(struct arm_smmu_device *smmu, u32 reg)
+{
+   u32 fw_features = smmu->features & (ARM_SMMU_FEAT_HA | 
ARM_SMMU_FEAT_HD);
+   u32 features = 0;
+
+   switch (FIELD_GET(IDR0_HTTU, reg)) {
+   case IDR0_HTTU_ACCESS_DIRTY:
+   features |= ARM_SMMU_FEAT_HD;
+   fallthrough;
+   case IDR0_HTTU_ACCESS:
+   features |= ARM_SMMU_FEAT_HA;
+   }
+
+   if (smmu->dev->of_node)
+   smmu->features |= features;
+   else if (features != fw_features)
+   /* ACPI IORT sets the HTTU bits */
+   dev_warn(smmu->dev,
+"IDR0.HTTU overridden by FW configuration (0x%x)\n",
+fw_features);
+}
+
 static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 {
u32 reg;
@@ -3256,6 +3285,8 @@ static int arm_smmu_device_hw_probe(struct 
arm_smmu_device *smmu)
smmu->features |= ARM_SMMU_FEAT_E2H;
}
 
+   arm_smmu_get_httu(smmu, reg);
+
/*
 * The coherency feature as set by FW is used in preference to the ID
 * register, but warn on mismatch.
@@ -3441,6 +3472,14 @@ static int arm_smmu_device_acpi_probe(struct 
platform_device *pdev,
if (iort_smmu->flags & ACPI_IORT_SMMU_V3_COHACC_OVERRIDE)
smmu->features |= ARM_SMMU_FEAT_COHERENCY;
 
+   switch (FIELD_GET(ACPI_IORT_SMMU_V3_HTTU_OVERRIDE, iort_smmu->flags)) {
+   case IDR0_HTTU_ACCESS_DIRTY:
+   smmu->features |= ARM_SMMU_FEAT_HD;
+   fallthrough;
+   case IDR0_HTTU_ACCESS:
+   smmu->features |= ARM_SMMU_FEAT_HA;
+   }
+
return 0;
 }
 #else
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 

[PATCH v2 11/11] vfio/iommu_type1: Add support for manual dirty log clear

2021-03-10 Thread Keqian Zhu
From: jiangkunkun 

In the past, we cleared the dirty log immediately after syncing it
to userspace. This may cause redundant dirty handling if userspace
handles the dirty log iteratively:

After vfio clears the dirty log, new dirty log entries start to be
generated. These new entries will be reported to userspace even if
they are generated before userspace handles the same dirty page.

That is to say, we should minimize the time gap between dirty log
clearing and dirty log handling. We can give userspace an interface
to clear the dirty log.

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---

changelog:

v2:
 - Rebase to newest code, so change VFIO_DIRTY_LOG_MANUAL_CLEAR from 9 to 11.

---
 drivers/vfio/vfio_iommu_type1.c | 104 ++--
 include/uapi/linux/vfio.h   |  28 -
 2 files changed, 127 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index a7ab0279eda0..94306f567894 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -77,6 +77,7 @@ struct vfio_iommu {
boolv2;
boolnesting;
booldirty_page_tracking;
+   booldirty_log_manual_clear;
boolpinned_page_dirty_scope;
boolcontainer_open;
uint64_tnum_non_hwdbm_groups;
@@ -1226,6 +1227,78 @@ static int vfio_iommu_dirty_log_clear(struct vfio_iommu 
*iommu,
return 0;
 }
 
+static int vfio_iova_dirty_log_clear(u64 __user *bitmap,
+struct vfio_iommu *iommu,
+dma_addr_t iova, size_t size,
+size_t pgsize)
+{
+   struct vfio_dma *dma;
+   struct rb_node *n;
+   dma_addr_t start_iova, end_iova, riova;
+   unsigned long pgshift = __ffs(pgsize);
+   unsigned long bitmap_size;
+   unsigned long *bitmap_buffer = NULL;
+   bool clear_valid;
+   int rs, re, start, end, dma_offset;
+   int ret = 0;
+
+   bitmap_size = DIRTY_BITMAP_BYTES(size >> pgshift);
+   bitmap_buffer = kvmalloc(bitmap_size, GFP_KERNEL);
+   if (!bitmap_buffer) {
+   ret = -ENOMEM;
+   goto out;
+   }
+
+   if (copy_from_user(bitmap_buffer, bitmap, bitmap_size)) {
+   ret = -EFAULT;
+   goto out;
+   }
+
+   for (n = rb_first(&iommu->dma_list); n; n = rb_next(n)) {
+   dma = rb_entry(n, struct vfio_dma, node);
+   if (!dma->iommu_mapped)
+   continue;
+   if ((dma->iova + dma->size - 1) < iova)
+   continue;
+   if (dma->iova > iova + size - 1)
+   break;
+
+   start_iova = max(iova, dma->iova);
+   end_iova = min(iova + size, dma->iova + dma->size);
+
+   /* Similar logic as the tail of vfio_iova_dirty_bitmap */
+
+   clear_valid = false;
+   start = (start_iova - iova) >> pgshift;
+   end = (end_iova - iova) >> pgshift;
+   bitmap_for_each_set_region(bitmap_buffer, rs, re, start, end) {
+   clear_valid = true;
+   riova = iova + (rs << pgshift);
+   dma_offset = (riova - dma->iova) >> pgshift;
+   bitmap_clear(dma->bitmap, dma_offset, re - rs);
+   }
+
+   if (clear_valid)
+   vfio_dma_populate_bitmap(dma, pgsize);
+
+   if (clear_valid && !iommu->pinned_page_dirty_scope &&
+   dma->iommu_mapped && !iommu->num_non_hwdbm_groups) {
+   ret = vfio_iommu_dirty_log_clear(iommu, start_iova,
+   end_iova - start_iova,  bitmap_buffer,
+   iova, pgsize);
+   if (ret) {
+   pr_warn("dma dirty log clear failed!\n");
+   goto out;
+   }
+   }
+
+   }
+
+out:
+   kfree(bitmap_buffer);
+   return ret;
+}
+
 static int update_user_bitmap(u64 __user *bitmap, struct vfio_iommu *iommu,
  struct vfio_dma *dma, dma_addr_t base_iova,
  size_t pgsize)
@@ -1275,6 +1348,11 @@ static int update_user_bitmap(u64 __user *bitmap, struct 
vfio_iommu *iommu,
 DIRTY_BITMAP_BYTES(nbits + shift)))
return -EFAULT;
 
+   /* Recover the bitmap under manual clear */
+   if (shift && iommu->dirty_log_manual_clear)
+   bitmap_shift_right(dma->bitmap, dma->bitmap, shift,
+  nbits + shift);
+
return 0;
 }
 

[PATCH v2 06/11] iommu/arm-smmu-v3: Scan leaf TTD to sync hardware dirty log

2021-03-10 Thread Keqian Zhu
From: jiangkunkun 

During dirty log tracking, user will try to retrieve dirty log from
iommu if it supports hardware dirty log.

This adds a new interface named sync_dirty_log in the iommu layer and
arm smmuv3 implements it, which scans leaf TTDs and treats a TTD as dirty
if it is writable (as we only enable HTTU for stage1, we check whether
AP[2] is not set).

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---

changelog:

v2:
 - Add new sanity check in arm_smmu_sync_dirty_log(). (smmu_domain->stage != 
ARM_SMMU_DOMAIN_S1)
 - Document the purpose of flush_iotlb in arm_smmu_sync_dirty_log(). (Robin)
 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 30 +++
 drivers/iommu/io-pgtable-arm.c  | 90 +
 drivers/iommu/iommu.c   | 38 +
 include/linux/io-pgtable.h  |  4 +
 include/linux/iommu.h   | 18 +
 5 files changed, 180 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index ac0d881c77b8..7407896a710e 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2637,6 +2637,35 @@ static int arm_smmu_stop_dirty_log(struct iommu_domain 
*domain,
return 0;
 }
 
+static int arm_smmu_sync_dirty_log(struct iommu_domain *domain,
+  unsigned long iova, size_t size,
+  unsigned long *bitmap,
+  unsigned long base_iova,
+  unsigned long bitmap_pgshift)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   if (!(smmu->features & ARM_SMMU_FEAT_HD))
+   return -ENODEV;
+   if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1)
+   return -EINVAL;
+
+   if (!ops || !ops->sync_dirty_log) {
+   pr_err("io-pgtable don't realize sync dirty log\n");
+   return -ENODEV;
+   }
+
+   /*
+* Flush iotlb to ensure all inflight transactions are completed.
+* See doc IHI0070Da 3.13.4 "HTTU behavior summary".
+*/
+   arm_smmu_flush_iotlb_all(domain);
+   return ops->sync_dirty_log(ops, iova, size, bitmap, base_iova,
+  bitmap_pgshift);
+}
+
 static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
 {
return iommu_fwspec_add_ids(dev, args->args, 1);
@@ -2740,6 +2769,7 @@ static struct iommu_ops arm_smmu_ops = {
.start_dirty_log= arm_smmu_start_dirty_log,
.merge_page = arm_smmu_merge_page,
.stop_dirty_log = arm_smmu_stop_dirty_log,
+   .sync_dirty_log = arm_smmu_sync_dirty_log,
.of_xlate   = arm_smmu_of_xlate,
.get_resv_regions   = arm_smmu_get_resv_regions,
.put_resv_regions   = generic_iommu_put_resv_regions,
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 9028328b99b0..67a208a05ab2 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -877,6 +877,95 @@ static size_t arm_lpae_merge_page(struct io_pgtable_ops 
*ops, unsigned long iova
return __arm_lpae_merge_page(data, iova, paddr, size, lvl, ptep, prot);
 }
 
+static int __arm_lpae_sync_dirty_log(struct arm_lpae_io_pgtable *data,
+unsigned long iova, size_t size,
+int lvl, arm_lpae_iopte *ptep,
+unsigned long *bitmap,
+unsigned long base_iova,
+unsigned long bitmap_pgshift)
+{
+   arm_lpae_iopte pte;
+   struct io_pgtable *iop = &data->iop;
+   size_t base, next_size;
+   unsigned long offset;
+   int nbits, ret;
+
+   if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
+   return -EINVAL;
+
+   ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
+   pte = READ_ONCE(*ptep);
+   if (WARN_ON(!pte))
+   return -EINVAL;
+
+   if (size == ARM_LPAE_BLOCK_SIZE(lvl, data)) {
+   if (iopte_leaf(pte, lvl, iop->fmt)) {
+   if (pte & ARM_LPAE_PTE_AP_RDONLY)
+   return 0;
+
+   /* It is writable, set the bitmap */
+   nbits = size >> bitmap_pgshift;
+   offset = (iova - base_iova) >> bitmap_pgshift;
+   bitmap_set(bitmap, offset, nbits);
+   return 0;
+   } else {
+   /* To traverse next level */
+   next_size = ARM_LPAE_BLOCK_SIZE(lvl + 1, data);
+   ptep = iopt

[PATCH v2 04/11] iommu/arm-smmu-v3: Split block descriptor when start dirty log

2021-03-10 Thread Keqian Zhu
From: jiangkunkun 

Block descriptor is not a proper granule for dirty log tracking.
Take an extreme example, if DMA writes one byte, under 1G mapping,
the dirty amount reported to userspace is 1G, but under 4K mapping,
the dirty amount is just 4K.

This adds a new interface named start_dirty_log in the iommu layer and
arm smmuv3 implements it, which splits a block descriptor into a span
of page descriptors. Other types of IOMMU will perform architecture
specific actions to start dirty log.

To allow code reuse, the split_block operation is realized as an
iommu_ops too. We flush all iotlbs after the whole procedure is
completed to ease the pressure on the iommu, as we will handle a huge
range of mapping in general.

Splitting a block does not work simultaneously with other pgtable ops,
as the only designed user is vfio, which always holds a lock, so race
conditions are not considered in the pgtable ops.

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---

changelog:

v2:
 - Change the return type of split_block(). size_t -> int.
 - Change commit message to properly describe race condition. (Robin)
 - Change commit message to properly describe the need of split block.
 - Add a new interface named start_dirty_log(). (Sun Yi)
 - Change commit message to explain the relationship of split_block() and
   start_dirty_log().

---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  52 +
 drivers/iommu/io-pgtable-arm.c  | 122 
 drivers/iommu/iommu.c   |  48 
 include/linux/io-pgtable.h  |   2 +
 include/linux/iommu.h   |  24 
 5 files changed, 248 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 443ac19c6da9..5d2fb926a08e 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2537,6 +2537,56 @@ static int arm_smmu_domain_set_attr(struct iommu_domain 
*domain,
return ret;
 }
 
+static int arm_smmu_split_block(struct iommu_domain *domain,
+   unsigned long iova, size_t size)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+   struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
+   size_t handled_size;
+
+   if (!(smmu->features & (ARM_SMMU_FEAT_BBML1 | ARM_SMMU_FEAT_BBML2))) {
+   dev_err(smmu->dev, "don't support BBML1/2, can't split 
block\n");
+   return -ENODEV;
+   }
+   if (!ops || !ops->split_block) {
+   pr_err("io-pgtable don't realize split block\n");
+   return -ENODEV;
+   }
+
+   handled_size = ops->split_block(ops, iova, size);
+   if (handled_size != size) {
+   pr_err("split block failed\n");
+   return -EFAULT;
+   }
+
+   return 0;
+}
+
+/*
+ * For SMMU, the action to start dirty log is spliting block mapping. The
+ * hardware dirty management is always enabled if hardware supports HTTU HD.
+ */
+static int arm_smmu_start_dirty_log(struct iommu_domain *domain,
+   unsigned long iova, size_t size)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   if (!(smmu->features & ARM_SMMU_FEAT_HD))
+   return -ENODEV;
+   if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1)
+   return -EINVAL;
+
+   /*
+* Even if the split operation fails, we can still track dirty at block
+* granule, which is still a much better choice compared to full dirty
+* policy.
+*/
+   iommu_split_block(domain, iova, size);
+   return 0;
+}
+
 static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
 {
return iommu_fwspec_add_ids(dev, args->args, 1);
@@ -2636,6 +2686,8 @@ static struct iommu_ops arm_smmu_ops = {
.device_group   = arm_smmu_device_group,
.domain_get_attr= arm_smmu_domain_get_attr,
.domain_set_attr= arm_smmu_domain_set_attr,
+   .split_block= arm_smmu_split_block,
+   .start_dirty_log= arm_smmu_start_dirty_log,
.of_xlate   = arm_smmu_of_xlate,
.get_resv_regions   = arm_smmu_get_resv_regions,
.put_resv_regions   = generic_iommu_put_resv_regions,
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 94d790b8ed27..4c4eec3c0698 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -79,6 +79,8 @@
 #define ARM_LPAE_PTE_SH_IS (((arm_lpae_iopte)3) << 8)
 #define ARM_LPAE_PTE_NS(((arm_lpae_iopte)1) << 5)
 #define ARM_LPAE_PTE_VALID  

[PATCH v2 08/11] iommu/arm-smmu-v3: Add HWDBM device feature reporting

2021-03-10 Thread Keqian Zhu
From: jiangkunkun 

We have implemented the interfaces required to support iommu
dirty log tracking. The last step is reporting this feature to
the upper-layer user, so the user can apply a higher-level policy
based on it.

This adds a new dev feature named IOMMU_DEV_FEAT_HWDBM in the iommu
layer. For arm smmuv3, it is equivalent to ARM_SMMU_FEAT_HD and it is
enabled by default if supported. Other types of IOMMU can enable
it by default or when dev_enable_feature() is called.
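
(Illustration, not part of the patch: a caller could gate its dirty-log policy
on this feature via the existing iommu_dev_feature_enabled() helper; the
wrapper name below is only for the example.)

#include <linux/iommu.h>

/* Illustrative only: decide whether HWDBM-based dirty tracking can be used */
static bool dev_supports_hwdbm(struct device *dev)
{
        return iommu_dev_feature_enabled(dev, IOMMU_DEV_FEAT_HWDBM);
}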

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---

changelog:

v2:
 - As dev_has_feature() has been removed from iommu layer, IOMMU_DEV_FEAT_HWDBM
   is designed to be used through "enable" interface.

---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 4 
 include/linux/iommu.h   | 1 +
 2 files changed, 5 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 696df51a3282..cd1627123e80 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2722,6 +2722,8 @@ static bool arm_smmu_dev_has_feature(struct device *dev,
switch (feat) {
case IOMMU_DEV_FEAT_SVA:
return arm_smmu_master_sva_supported(master);
+   case IOMMU_DEV_FEAT_HWDBM:
+   return !!(master->smmu->features & ARM_SMMU_FEAT_HD);
default:
return false;
}
@@ -2738,6 +2740,8 @@ static bool arm_smmu_dev_feature_enabled(struct device 
*dev,
switch (feat) {
case IOMMU_DEV_FEAT_SVA:
return arm_smmu_master_sva_enabled(master);
+   case IOMMU_DEV_FEAT_HWDBM:
+   return arm_smmu_dev_has_feature(dev, feat);
default:
return false;
}
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 4f7db5d23b23..88584a2d027c 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -160,6 +160,7 @@ struct iommu_resv_region {
 enum iommu_dev_features {
IOMMU_DEV_FEAT_AUX, /* Aux-domain feature */
IOMMU_DEV_FEAT_SVA, /* Shared Virtual Addresses */
+   IOMMU_DEV_FEAT_HWDBM,   /* Hardware Dirty Bit Management */
 };
 
 #define IOMMU_PASID_INVALID(-1U)
-- 
2.19.1



[PATCH v2 02/11] iommu/arm-smmu-v3: Enable HTTU for stage1 with io-pgtable mapping

2021-03-10 Thread Keqian Zhu
From: jiangkunkun 

If HTTU is supported, we enable the HA/HD bits in the SMMU CD (stage 1
mapping), and set the DBM bit for writable TTDs.

The dirty state information is encoded using the access permission
bits AP[2] (stage 1) or S2AP[1] (stage 2) in conjunction with the
DBM (Dirty Bit Modifier) bit, where DBM means writable and AP[2]/
S2AP[1] means dirty.

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---

changelog:

v2:
 - Use a new quirk flag named IO_PGTABLE_QUIRK_ARM_HD to transfer
   SMMU HD feature to io-pgtable. (Robin)

 - Rebase on Jean's HTTU patch(#1).

---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 3 +++
 drivers/iommu/io-pgtable-arm.c  | 7 ++-
 include/linux/io-pgtable.h  | 3 +++
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index b6d965504f44..369c0ea7a104 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1921,6 +1921,7 @@ static int arm_smmu_domain_finalise_s1(struct 
arm_smmu_domain *smmu_domain,
  FIELD_PREP(CTXDESC_CD_0_TCR_ORGN0, tcr->orgn) |
  FIELD_PREP(CTXDESC_CD_0_TCR_SH0, tcr->sh) |
  FIELD_PREP(CTXDESC_CD_0_TCR_IPS, tcr->ips) |
+ CTXDESC_CD_0_TCR_HA | CTXDESC_CD_0_TCR_HD |
  CTXDESC_CD_0_TCR_EPD1 | CTXDESC_CD_0_AA64;
cfg->cd.mair= pgtbl_cfg->arm_lpae_s1_cfg.mair;
 
@@ -2026,6 +2027,8 @@ static int arm_smmu_domain_finalise(struct iommu_domain 
*domain,
 
if (smmu_domain->non_strict)
pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
+   if (smmu->features & ARM_SMMU_FEAT_HD)
+   pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_ARM_HD;
 
pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain);
if (!pgtbl_ops)
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 87def58e79b5..94d790b8ed27 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -72,6 +72,7 @@
 
 #define ARM_LPAE_PTE_NSTABLE   (((arm_lpae_iopte)1) << 63)
 #define ARM_LPAE_PTE_XN(((arm_lpae_iopte)3) << 53)
+#define ARM_LPAE_PTE_DBM   (((arm_lpae_iopte)1) << 51)
 #define ARM_LPAE_PTE_AF(((arm_lpae_iopte)1) << 10)
 #define ARM_LPAE_PTE_SH_NS (((arm_lpae_iopte)0) << 8)
 #define ARM_LPAE_PTE_SH_OS (((arm_lpae_iopte)2) << 8)
@@ -81,7 +82,7 @@
 
 #define ARM_LPAE_PTE_ATTR_LO_MASK  (((arm_lpae_iopte)0x3ff) << 2)
 /* Ignore the contiguous bit for block splitting */
-#define ARM_LPAE_PTE_ATTR_HI_MASK  (((arm_lpae_iopte)6) << 52)
+#define ARM_LPAE_PTE_ATTR_HI_MASK  (((arm_lpae_iopte)13) << 51)
 #define ARM_LPAE_PTE_ATTR_MASK (ARM_LPAE_PTE_ATTR_LO_MASK |\
 ARM_LPAE_PTE_ATTR_HI_MASK)
 /* Software bit for solving coherency races */
@@ -379,6 +380,7 @@ static int __arm_lpae_map(struct arm_lpae_io_pgtable *data, 
unsigned long iova,
 static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
   int prot)
 {
+   struct io_pgtable_cfg *cfg = &data->iop.cfg;
arm_lpae_iopte pte;
 
if (data->iop.fmt == ARM_64_LPAE_S1 ||
@@ -386,6 +388,9 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct 
arm_lpae_io_pgtable *data,
pte = ARM_LPAE_PTE_nG;
if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ))
pte |= ARM_LPAE_PTE_AP_RDONLY;
+   else if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_HD)
+   pte |= ARM_LPAE_PTE_DBM;
+
if (!(prot & IOMMU_PRIV))
pte |= ARM_LPAE_PTE_AP_UNPRIV;
} else {
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index a4c9ca2c31f1..64cee6831c97 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -82,6 +82,8 @@ struct io_pgtable_cfg {
 *
 * IO_PGTABLE_QUIRK_ARM_OUTER_WBWA: Override the outer-cacheability
 *  attributes set in the TCR for a non-coherent page-table walker.
+*
+* IO_PGTABLE_QUIRK_ARM_HD: Support hardware management of dirty status.
 */
#define IO_PGTABLE_QUIRK_ARM_NS BIT(0)
#define IO_PGTABLE_QUIRK_NO_PERMS   BIT(1)
@@ -89,6 +91,7 @@ struct io_pgtable_cfg {
#define IO_PGTABLE_QUIRK_NON_STRICT BIT(4)
#define IO_PGTABLE_QUIRK_ARM_TTBR1  BIT(5)
#define IO_PGTABLE_QUIRK_ARM_OUTER_WBWA BIT(6)
+   #define IO_PGTABLE_QUIRK_ARM_HD BIT(7)
unsigned long   quirks;
unsigned long   pgsize_bitmap;
unsigned 

[PATCH v2 00/11] vfio/iommu_type1: Implement dirty log tracking based on smmuv3 HTTU

2021-03-10 Thread Keqian Zhu
Hi all,

This patch series implements vfio dma dirty log tracking based on smmuv3 HTTU.

changelog:

v2:
 - Address all comments of RFC version, thanks for all of you ;-)
 - Add a bugfix that start dirty log for newly added dma ranges and domain.

Intention:

As we know, vfio live migration is an important and valuable feature, but there
are still many hurdles to overcome, including migration of interrupts, device state,
DMA dirty log tracking, etc.

For now, the only dirty log tracking interface is pinning. It has some drawbacks:
1. Only smart vendor drivers are aware of this.
2. It's coarse-grained: the pinned scope is generally bigger than what the
   device actually accesses.
3. It can't track dirty pages continuously and precisely; vfio populates the whole
   pinned scope as dirty.
   So it doesn't work well with iterative dirty log handling.

About SMMU HTTU:

HTTU (Hardware Translation Table Update) is a feature of ARM SMMUv3; it can update
the access flag and/or dirty state of a TTD (Translation Table Descriptor) by hardware.
With HTTU, a stage 1 TTD is classified into 3 types:

                     DBM bit   AP[2] (read-only bit)
1. writable_clean       1               1
2. writable_dirty       1               0
3. readonly             0               1

If HTTU_HD (hardware management of dirty state) is enabled, the SMMU can change a TTD
from writable_clean to writable_dirty on a write access. Software can then scan the
TTDs to sync the dirty state into a dirty bitmap. With this feature, we can track the
dirty log of DMA continuously and precisely.
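
As an illustration of the table above, here is a minimal sketch (not part of this
series; the helper name is hypothetical, and the bit positions assume the stage 1
descriptor layout used by io-pgtable-arm) of how software can tell whether a leaf
TTD records a dirty page:

	#include <stdbool.h>
	#include <stdint.h>

	#define PTE_DBM		(UINT64_C(1) << 51)	/* DBM: Dirty Bit Modifier */
	#define PTE_AP_RDONLY	(UINT64_C(1) << 7)	/* AP[2]: read-only bit */

	/* A leaf TTD is dirty when it is writable (DBM set) and not read-only. */
	static bool pte_hw_dirty(uint64_t pte)
	{
		return (pte & PTE_DBM) && !(pte & PTE_AP_RDONLY);
	}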

About this series:

Patch 1-3: Add feature detection for SMMU HTTU and enable HTTU for SMMU stage 1
   mapping. Also add feature detection for SMMU BBML. We need to split block
   mappings when starting dirty log tracking and merge page mappings when
   stopping dirty log tracking, which requires the break-before-make procedure.
   But that might cause problems when the TTD is live: the I/O streams might not
   tolerate translation faults. So BBML should be used.

Patch 4-7: Add four interfaces (start_dirty_log, stop_dirty_log, sync_dirty_log
   and clear_dirty_log) in the IOMMU layer; they are essential for implementing
   DMA dirty log tracking for vfio. We implement these interfaces for arm smmuv3
   (a usage sketch of these interfaces follows this patch list).

Patch   8: Add HWDBM (Hardware Dirty Bit Management) device feature reporting
   in the IOMMU layer.

Patch 9-11: Implement a new dirty log tracking method for vfio based on iommu
   HWDBM. A new ioctl operation named VFIO_DIRTY_LOG_MANUAL_CLEAR is added,
   which can eliminate some redundant dirty handling in userspace.
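
For reference, below is a rough usage sketch of the four interfaces from a caller
such as vfio. It is not part of the series; the iommu_* names and signatures are
assumed from the descriptions above (notably the prot argument of start/stop and
the (bitmap, base_iova, pgshift) arguments of sync/clear) and may differ from the
actual patches:

	#include <linux/iommu.h>

	/* Sketch only: the lifecycle of dirty log tracking for one mapped range. */
	static int dirty_log_lifecycle(struct iommu_domain *domain,
				       unsigned long iova, size_t size,
				       unsigned long *bitmap,
				       unsigned long pgshift)
	{
		int ret;

		/* Split block mappings so dirty state is tracked per page. */
		ret = iommu_start_dirty_log(domain, iova, size,
					    IOMMU_READ | IOMMU_WRITE);
		if (ret)
			return ret;

		/* ... DMA runs; HTTU marks written TTDs writable_dirty ... */

		/* Scan leaf TTDs and fill the dirty bitmap. */
		ret = iommu_sync_dirty_log(domain, iova, size, bitmap, iova, pgshift);
		if (ret)
			return ret;

		/* Re-enable tracking for the pages just reported as dirty. */
		ret = iommu_clear_dirty_log(domain, iova, size, bitmap, iova, pgshift);
		if (ret)
			return ret;

		/* When tracking ends, merge pages back into block mappings. */
		return iommu_stop_dirty_log(domain, iova, size,
					    IOMMU_READ | IOMMU_WRITE);
	}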

Optimizations To Do:

1. We recognized that each smmu_domain (a vfio_container may have several
   smmu_domains) has its own stage 1 mapping, and we must scan all these
   mappings to sync dirty state. We plan to refactor smmu_domain to support more
   than one SMMU in one smmu_domain, so that these SMMUs can share the same
   stage 1 mapping.
2. We also recognized that scanning TTDs is a performance hotspot. Recently, I
   have implemented a SW/HW combined dirty log tracking at the MMU side [1],
   which can effectively solve this problem. This idea can be applied to the
   SMMU side too.

Thanks,
Keqian


[1] 
https://lore.kernel.org/linux-arm-kernel/2021012612.27136-1-zhukeqi...@huawei.com/

Jean-Philippe Brucker (1):
  iommu/arm-smmu-v3: Add support for Hardware Translation Table Update

jiangkunkun (10):
  iommu/arm-smmu-v3: Enable HTTU for stage1 with io-pgtable mapping
  iommu/arm-smmu-v3: Add feature detection for BBML
  iommu/arm-smmu-v3: Split block descriptor when start dirty log
  iommu/arm-smmu-v3: Merge a span of page when stop dirty log
  iommu/arm-smmu-v3: Scan leaf TTD to sync hardware dirty log
  iommu/arm-smmu-v3: Clear dirty log according to bitmap
  iommu/arm-smmu-v3: Add HWDBM device feature reporting
  vfio/iommu_type1: Add HWDBM status maintenance
  vfio/iommu_type1: Optimize dirty bitmap population based on iommu
HWDBM
  vfio/iommu_type1: Add support for manual dirty log clear

 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |   2 +
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 226 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  14 +
 drivers/iommu/io-pgtable-arm.c| 392 +-
 drivers/iommu/iommu.c | 236 +++
 drivers/vfio/vfio_iommu_type1.c   | 270 +++-
 include/linux/io-pgtable.h|  23 +
 include/linux/iommu.h |  84 
 include/uapi/linux/vfio.h |  28 +-
 9 files changed, 1264 insertions(+), 11 deletions(-)

-- 
2.19.1


[PATCH v2 09/11] vfio/iommu_type1: Add HWDBM status maintenance

2021-03-10 Thread Keqian Zhu
From: jiangkunkun 

We are going to optimize dirty log tracking based on the iommu
HWDBM feature, but the dirty log from the iommu is useful only
when all iommu-backed groups are attached to iommus with the
HWDBM feature. This patch maintains a counter to track that condition.

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---

changelog:

v2:
 - Simplify vfio_group_supports_hwdbm().
 - As the feature reporting of HWDBM has been changed, change vfio_dev_has_feature() to
   vfio_dev_enable_feature().

---
 drivers/vfio/vfio_iommu_type1.c | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 4bb162c1d649..876351c061e4 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -79,6 +79,7 @@ struct vfio_iommu {
booldirty_page_tracking;
boolpinned_page_dirty_scope;
boolcontainer_open;
+   uint64_tnum_non_hwdbm_groups;
 };
 
 struct vfio_domain {
@@ -116,6 +117,7 @@ struct vfio_group {
struct list_headnext;
boolmdev_group; /* An mdev group */
boolpinned_page_dirty_scope;
+   booliommu_hwdbm;/* For iommu-backed group */
 };
 
 struct vfio_iova {
@@ -1187,6 +1189,24 @@ static void vfio_update_pgsize_bitmap(struct vfio_iommu 
*iommu)
}
 }
 
+static int vfio_dev_enable_feature(struct device *dev, void *data)
+{
+   enum iommu_dev_features *feat = data;
+
+   if (iommu_dev_feature_enabled(dev, *feat))
+   return 0;
+
+   return iommu_dev_enable_feature(dev, *feat);
+}
+
+static bool vfio_group_supports_hwdbm(struct vfio_group *group)
+{
+   enum iommu_dev_features feat = IOMMU_DEV_FEAT_HWDBM;
+
+   return !iommu_group_for_each_dev(group->iommu_group, &feat,
+vfio_dev_enable_feature);
+}
+
 static int update_user_bitmap(u64 __user *bitmap, struct vfio_iommu *iommu,
  struct vfio_dma *dma, dma_addr_t base_iova,
  size_t pgsize)
@@ -2435,6 +2455,12 @@ static int vfio_iommu_type1_attach_group(void 
*iommu_data,
 * capable via the page pinning interface.
 */
iommu->num_non_pinned_groups++;
+
+   /* Update the hwdbm status of group and iommu */
+   group->iommu_hwdbm = vfio_group_supports_hwdbm(group);
+   if (!group->iommu_hwdbm)
+   iommu->num_non_hwdbm_groups++;
+
	mutex_unlock(&iommu->lock);
	vfio_iommu_resv_free(&group_resv_regions);
 
@@ -2571,6 +2597,7 @@ static void vfio_iommu_type1_detach_group(void 
*iommu_data,
struct vfio_domain *domain;
struct vfio_group *group;
bool update_dirty_scope = false;
+   bool update_iommu_hwdbm = false;
LIST_HEAD(iova_copy);
 
	mutex_lock(&iommu->lock);
@@ -2609,6 +2636,7 @@ static void vfio_iommu_type1_detach_group(void 
*iommu_data,
 
vfio_iommu_detach_group(domain, group);
update_dirty_scope = !group->pinned_page_dirty_scope;
+   update_iommu_hwdbm = !group->iommu_hwdbm;
	list_del(&group->next);
kfree(group);
/*
@@ -2651,6 +2679,8 @@ static void vfio_iommu_type1_detach_group(void 
*iommu_data,
if (iommu->dirty_page_tracking)
vfio_iommu_populate_bitmap_full(iommu);
}
+   if (update_iommu_hwdbm)
+   iommu->num_non_hwdbm_groups--;
	mutex_unlock(&iommu->lock);
 }
 
-- 
2.19.1



[PATCH v2 05/11] iommu/arm-smmu-v3: Merge a span of page when stop dirty log

2021-03-10 Thread Keqian Zhu
From: jiangkunkun 

When stopping dirty log tracking, we need to recover all block descriptors
that were split when dirty log tracking started.

This adds a new interface named stop_dirty_log in the iommu layer and
arm smmuv3 implements it, which reinstalls block mappings and unmaps
the spans of page mappings. Other types of IOMMU perform architecture-
specific actions to stop dirty log tracking.

To allow code reuse, the merge_page operation is realized as an
iommu_ops too. We flush all iotlbs after the whole procedure is
completed to ease the pressure on the iommu, as we will generally
handle a huge range of mappings.

Merging pages does not work simultaneously with other pgtable ops,
as the only designed user is vfio, which always holds a lock, so race
conditions are not considered in the pgtable ops.

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---

changelog:

v2:
 - Change the return type of merge_page(). size_t -> int.
 - Change commit message to properly describe race condition. (Robin)
 - Add a new interface named stop_dirty_log(). (Sun Yi)
 - Change commit message to explain the relationship between merge_page() and
   stop_dirty_log().
 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 52 +
 drivers/iommu/io-pgtable-arm.c  | 78 
 drivers/iommu/iommu.c   | 82 +
 include/linux/io-pgtable.h  |  2 +
 include/linux/iommu.h   | 24 ++
 5 files changed, 238 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 5d2fb926a08e..ac0d881c77b8 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2587,6 +2587,56 @@ static int arm_smmu_start_dirty_log(struct iommu_domain 
*domain,
return 0;
 }
 
+static int arm_smmu_merge_page(struct iommu_domain *domain,
+  unsigned long iova, phys_addr_t paddr,
+  size_t size, int prot)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+   struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
+   size_t handled_size;
+
+   if (!(smmu->features & (ARM_SMMU_FEAT_BBML1 | ARM_SMMU_FEAT_BBML2))) {
+   dev_err(smmu->dev, "don't support BBML1/2, can't merge page\n");
+   return -ENODEV;
+   }
+   if (!ops || !ops->merge_page) {
+   pr_err("io-pgtable don't realize merge page\n");
+   return -ENODEV;
+   }
+
+   handled_size = ops->merge_page(ops, iova, paddr, size, prot);
+   if (handled_size != size) {
+   pr_err("merge page failed\n");
+   return -EFAULT;
+   }
+
+   return 0;
+}
+
+/*
+ * For SMMU, the action to stop dirty log is merge page mapping. The hardware
+ * dirty management is always enabled if hardware supports HTTU HD.
+ */
+static int arm_smmu_stop_dirty_log(struct iommu_domain *domain,
+  unsigned long iova, size_t size, int prot)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   if (!(smmu->features & ARM_SMMU_FEAT_HD))
+   return -ENODEV;
+   if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1)
+   return -EINVAL;
+
+   /*
+* Even if the merge operation fails, it just affects the performance of DMA
+* transactions.
+*/
+   iommu_merge_page(domain, iova, size, prot);
+   return 0;
+}
+
 static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
 {
return iommu_fwspec_add_ids(dev, args->args, 1);
@@ -2688,6 +2738,8 @@ static struct iommu_ops arm_smmu_ops = {
.domain_set_attr= arm_smmu_domain_set_attr,
.split_block= arm_smmu_split_block,
.start_dirty_log= arm_smmu_start_dirty_log,
+   .merge_page = arm_smmu_merge_page,
+   .stop_dirty_log = arm_smmu_stop_dirty_log,
.of_xlate   = arm_smmu_of_xlate,
.get_resv_regions   = arm_smmu_get_resv_regions,
.put_resv_regions   = generic_iommu_put_resv_regions,
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 4c4eec3c0698..9028328b99b0 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -800,6 +800,83 @@ static size_t arm_lpae_split_block(struct io_pgtable_ops 
*ops,
return __arm_lpae_split_block(data, iova, size, lvl, ptep);
 }
 
+static size_t __arm_lpae_merge_page(struct arm_lpae_io_pgtable *data,
+   unsigned long iova, phys_addr_t paddr,
+   size_t size, int lvl, arm_lpae_iopte *ptep,
+   

[PATCH v2 10/11] vfio/iommu_type1: Optimize dirty bitmap population based on iommu HWDBM

2021-03-10 Thread Keqian Zhu
From: jiangkunkun 

In the past, if the vfio_iommu is not of pinned_page_dirty_scope and
the vfio_dma is iommu_mapped, we populate the full dirty bitmap for this
vfio_dma. Now we can try to get the dirty log from the iommu before making
that lousy decision.

In detail, if all vfio_groups are of pinned_page_dirty_scope, the
dirty bitmap population is not affected. If there are vfio_groups
not of pinned_page_dirty_scope and their domains support HWDBM,
then we can try to get the dirty log from the IOMMU. Otherwise, fall
back to the full dirty bitmap.

We also start dirty log tracking for newly added dma ranges and domains.

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---

changelog:

v2:
 - Use new interface to start|stop dirty log. As split_block|merge_page are 
related to ARM SMMU. (Sun Yi)
 - Bugfix: Start dirty log for newly added dma range and domain.
 
---
 drivers/vfio/vfio_iommu_type1.c | 136 +++-
 1 file changed, 132 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 876351c061e4..a7ab0279eda0 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1207,6 +1207,25 @@ static bool vfio_group_supports_hwdbm(struct vfio_group 
*group)
 vfio_dev_enable_feature);
 }
 
+static int vfio_iommu_dirty_log_clear(struct vfio_iommu *iommu,
+ dma_addr_t start_iova, size_t size,
+ unsigned long *bitmap_buffer,
+ dma_addr_t base_iova, size_t pgsize)
+{
+   struct vfio_domain *d;
+   unsigned long pgshift = __ffs(pgsize);
+   int ret;
+
+   list_for_each_entry(d, &iommu->domain_list, next) {
+   ret = iommu_clear_dirty_log(d->domain, start_iova, size,
+   bitmap_buffer, base_iova, pgshift);
+   if (ret)
+   return ret;
+   }
+
+   return 0;
+}
+
 static int update_user_bitmap(u64 __user *bitmap, struct vfio_iommu *iommu,
  struct vfio_dma *dma, dma_addr_t base_iova,
  size_t pgsize)
@@ -1218,13 +1237,28 @@ static int update_user_bitmap(u64 __user *bitmap, 
struct vfio_iommu *iommu,
unsigned long shift = bit_offset % BITS_PER_LONG;
unsigned long leftover;
 
+   if (!iommu->num_non_pinned_groups || !dma->iommu_mapped)
+   goto bitmap_done;
+
+   /* try to get dirty log from IOMMU */
+   if (!iommu->num_non_hwdbm_groups) {
+   struct vfio_domain *d;
+
+   list_for_each_entry(d, &iommu->domain_list, next) {
+   if (iommu_sync_dirty_log(d->domain, dma->iova, 
dma->size,
+   dma->bitmap, dma->iova, 
pgshift))
+   return -EFAULT;
+   }
+   goto bitmap_done;
+   }
+
/*
 * mark all pages dirty if any IOMMU capable device is not able
 * to report dirty pages and all pages are pinned and mapped.
 */
-   if (iommu->num_non_pinned_groups && dma->iommu_mapped)
-   bitmap_set(dma->bitmap, 0, nbits);
+   bitmap_set(dma->bitmap, 0, nbits);
 
+bitmap_done:
if (shift) {
bitmap_shift_left(dma->bitmap, dma->bitmap, shift,
  nbits + shift);
@@ -1286,6 +1320,18 @@ static int vfio_iova_dirty_bitmap(u64 __user *bitmap, 
struct vfio_iommu *iommu,
 */
bitmap_clear(dma->bitmap, 0, dma->size >> pgshift);
vfio_dma_populate_bitmap(dma, pgsize);
+
+   /* Clear iommu dirty log to re-enable dirty log tracking */
+   if (!iommu->pinned_page_dirty_scope &&
+   dma->iommu_mapped && !iommu->num_non_hwdbm_groups) {
+   ret = vfio_iommu_dirty_log_clear(iommu, dma->iova,
+   dma->size, dma->bitmap, dma->iova,
+   pgsize);
+   if (ret) {
+   pr_warn("dma dirty log clear failed!\n");
+   return ret;
+   }
+   }
}
return 0;
 }
@@ -1561,6 +1607,9 @@ static bool vfio_iommu_iova_dma_valid(struct vfio_iommu 
*iommu,
return list_empty(iova);
 }
 
+static void vfio_dma_dirty_log_start(struct vfio_iommu *iommu,
+struct vfio_dma *dma);
+
 static int vfio_dma_do_map(struct vfio_iommu *iommu,
   struct vfio_iommu_type1_dma_map *map)
 {
@@ -1684,8 +1733,13 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
 
if (!ret && iommu->dirty_page_tracking) {
ret = vfio_dma_

[PATCH v2 07/11] iommu/arm-smmu-v3: Clear dirty log according to bitmap

2021-03-10 Thread Keqian Zhu
From: jiangkunkun 

After the dirty log is retrieved, the user should clear the dirty log to re-enable
dirty log tracking for these dirtied pages.

This adds a new interface named clear_dirty_log in the iommu layer and
arm smmuv3 implements it, which clears the dirty state (as we only
enable HTTU for stage 1, this means setting the AP[2] bit) of the TTDs that are
specified by the user-provided bitmap.

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---

changelog:

v2:
 - Add new sanity check in arm_smmu_sync_dirty_log(). (smmu_domain->stage != 
ARM_SMMU_DOMAIN_S1)
 - Remove extra flush_iotlb in __iommu_clear_dirty_log().
 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 25 ++
 drivers/iommu/io-pgtable-arm.c  | 95 +
 drivers/iommu/iommu.c   | 68 +++
 include/linux/io-pgtable.h  |  4 +
 include/linux/iommu.h   | 17 
 5 files changed, 209 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 7407896a710e..696df51a3282 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2666,6 +2666,30 @@ static int arm_smmu_sync_dirty_log(struct iommu_domain 
*domain,
   bitmap_pgshift);
 }
 
+static int arm_smmu_clear_dirty_log(struct iommu_domain *domain,
+   unsigned long iova, size_t size,
+   unsigned long *bitmap,
+   unsigned long base_iova,
+   unsigned long bitmap_pgshift)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   if (!(smmu->features & ARM_SMMU_FEAT_HD))
+   return -ENODEV;
+   if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1)
+   return -EINVAL;
+
+   if (!ops || !ops->clear_dirty_log) {
+   pr_err("io-pgtable don't realize clear dirty log\n");
+   return -ENODEV;
+   }
+
+   return ops->clear_dirty_log(ops, iova, size, bitmap, base_iova,
+   bitmap_pgshift);
+}
+
 static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
 {
return iommu_fwspec_add_ids(dev, args->args, 1);
@@ -2770,6 +2794,7 @@ static struct iommu_ops arm_smmu_ops = {
.merge_page = arm_smmu_merge_page,
.stop_dirty_log = arm_smmu_stop_dirty_log,
.sync_dirty_log = arm_smmu_sync_dirty_log,
+   .clear_dirty_log= arm_smmu_clear_dirty_log,
.of_xlate   = arm_smmu_of_xlate,
.get_resv_regions   = arm_smmu_get_resv_regions,
.put_resv_regions   = generic_iommu_put_resv_regions,
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 67a208a05ab2..e3ef0f50611c 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -966,6 +966,100 @@ static int arm_lpae_sync_dirty_log(struct io_pgtable_ops 
*ops,
 bitmap, base_iova, bitmap_pgshift);
 }
 
+static int __arm_lpae_clear_dirty_log(struct arm_lpae_io_pgtable *data,
+ unsigned long iova, size_t size,
+ int lvl, arm_lpae_iopte *ptep,
+ unsigned long *bitmap,
+ unsigned long base_iova,
+ unsigned long bitmap_pgshift)
+{
+   arm_lpae_iopte pte;
+   struct io_pgtable *iop = &data->iop;
+   unsigned long offset;
+   size_t base, next_size;
+   int nbits, ret, i;
+
+   if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
+   return -EINVAL;
+
+   ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
+   pte = READ_ONCE(*ptep);
+   if (WARN_ON(!pte))
+   return -EINVAL;
+
+   if (size == ARM_LPAE_BLOCK_SIZE(lvl, data)) {
+   if (iopte_leaf(pte, lvl, iop->fmt)) {
+   if (pte & ARM_LPAE_PTE_AP_RDONLY)
+   return 0;
+
+   /* Ensure all corresponding bits are set */
+   nbits = size >> bitmap_pgshift;
+   offset = (iova - base_iova) >> bitmap_pgshift;
+   for (i = offset; i < offset + nbits; i++) {
+   if (!test_bit(i, bitmap))
+   return 0;
+   }
+
+   /* Race does not exist */
+   pte |= ARM_LPAE_PTE_AP_RDONLY;
+   __arm_lpae_set_pte(ptep, pte, &iop->cfg);
+   return 0;
+   

[PATCH v2 03/11] iommu/arm-smmu-v3: Add feature detection for BBML

2021-03-10 Thread Keqian Zhu
From: jiangkunkun 

When altering a translation table descriptor for certain reasons,
the break-before-make procedure is required. But it might cause problems
when the TTD is live: the I/O streams might not tolerate translation faults.

If the SMMU supports BBM level 1 or BBM level 2, we can change the block
size without using the break-before-make sequence.

This adds feature detection for BBML; no functional change expected.

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---

changelog:

v2:
 - Use two new quirk flags named IO_PGTABLE_QUIRK_ARM_BBML1/2 to transfer
   SMMU BBML feature to io-pgtable. (Robin)
   
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 19 +++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  6 ++
 include/linux/io-pgtable.h  |  8 
 3 files changed, 33 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 369c0ea7a104..443ac19c6da9 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2030,6 +2030,11 @@ static int arm_smmu_domain_finalise(struct iommu_domain 
*domain,
if (smmu->features & ARM_SMMU_FEAT_HD)
pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_ARM_HD;
 
+   if (smmu->features & ARM_SMMU_FEAT_BBML1)
+   pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_ARM_BBML1;
+   else if (smmu->features & ARM_SMMU_FEAT_BBML2)
+   pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_ARM_BBML2;
+
pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain);
if (!pgtbl_ops)
return -ENOMEM;
@@ -3373,6 +3378,20 @@ static int arm_smmu_device_hw_probe(struct 
arm_smmu_device *smmu)
 
/* IDR3 */
reg = readl_relaxed(smmu->base + ARM_SMMU_IDR3);
+   switch (FIELD_GET(IDR3_BBML, reg)) {
+   case IDR3_BBML0:
+   break;
+   case IDR3_BBML1:
+   smmu->features |= ARM_SMMU_FEAT_BBML1;
+   break;
+   case IDR3_BBML2:
+   smmu->features |= ARM_SMMU_FEAT_BBML2;
+   break;
+   default:
+   dev_err(smmu->dev, "unknown/unsupported BBM behavior level\n");
+   return -ENXIO;
+   }
+
if (FIELD_GET(IDR3_RIL, reg))
smmu->features |= ARM_SMMU_FEAT_RANGE_INV;
 
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 26d6b935b383..a74125675544 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -54,6 +54,10 @@
 #define IDR1_SIDSIZE   GENMASK(5, 0)
 
 #define ARM_SMMU_IDR3  0xc
+#define IDR3_BBML  GENMASK(12, 11)
+#define IDR3_BBML0 0
+#define IDR3_BBML1 1
+#define IDR3_BBML2 2
 #define IDR3_RIL   (1 << 10)
 
 #define ARM_SMMU_IDR5  0x14
@@ -615,6 +619,8 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_E2H  (1 << 18)
 #define ARM_SMMU_FEAT_HA   (1 << 19)
 #define ARM_SMMU_FEAT_HD   (1 << 20)
+#define ARM_SMMU_FEAT_BBML1(1 << 21)
+#define ARM_SMMU_FEAT_BBML2(1 << 22)
u32 features;
 
 #define ARM_SMMU_OPT_SKIP_PREFETCH (1 << 0)
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 64cee6831c97..857932357f1d 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -84,6 +84,12 @@ struct io_pgtable_cfg {
 *  attributes set in the TCR for a non-coherent page-table walker.
 *
 * IO_PGTABLE_QUIRK_ARM_HD: Support hardware management of dirty status.
+*
+* IO_PGTABLE_QUIRK_ARM_BBML1: ARM SMMU supports BBM Level 1 behavior
+*  when changing block size.
+*
+* IO_PGTABLE_QUIRK_ARM_BBML2: ARM SMMU supports BBM Level 2 behavior
+* when changing block size.
 */
#define IO_PGTABLE_QUIRK_ARM_NS BIT(0)
#define IO_PGTABLE_QUIRK_NO_PERMS   BIT(1)
@@ -92,6 +98,8 @@ struct io_pgtable_cfg {
#define IO_PGTABLE_QUIRK_ARM_TTBR1  BIT(5)
#define IO_PGTABLE_QUIRK_ARM_OUTER_WBWA BIT(6)
#define IO_PGTABLE_QUIRK_ARM_HD BIT(7)
+   #define IO_PGTABLE_QUIRK_ARM_BBML1  BIT(8)
+   #define IO_PGTABLE_QUIRK_ARM_BBML2  BIT(9)
unsigned long   quirks;
unsigned long   pgsize_bitmap;
unsigned intias;
-- 
2.19.1



Re: [PATCH v13 07/10] iommu/arm-smmu-v3: Maintain a SID->device structure

2021-03-02 Thread Keqian Zhu
Hi Jean,

Reviewed-by: Keqian Zhu 

On 2021/3/2 17:26, Jean-Philippe Brucker wrote:
> When handling faults from the event or PRI queue, we need to find the
> struct device associated with a SID. Add a rb_tree to keep track of
> SIDs.
> 
> Acked-by: Jonathan Cameron 
> Reviewed-by: Eric Auger 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  13 +-
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 157 
>  2 files changed, 140 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index f985817c967a..7b15b7580c6e 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -639,6 +639,15 @@ struct arm_smmu_device {
>  
>   /* IOMMU core code handle */
>   struct iommu_device iommu;
> +
> + struct rb_root  streams;
> + struct mutexstreams_mutex;
> +};
> +
> +struct arm_smmu_stream {
> + u32 id;
> + struct arm_smmu_master  *master;
> + struct rb_node  node;
>  };
>  
>  /* SMMU private data for each master */
> @@ -647,8 +656,8 @@ struct arm_smmu_master {
>   struct device   *dev;
>   struct arm_smmu_domain  *domain;
>   struct list_headdomain_head;
> - u32 *sids;
> - unsigned intnum_sids;
> + struct arm_smmu_stream  *streams;
> + unsigned intnum_streams;
>   boolats_enabled;
>   boolsva_enabled;
>   struct list_headbonds;
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 7edce914c45e..d148bb6d4289 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -909,8 +909,8 @@ static void arm_smmu_sync_cd(struct arm_smmu_domain 
> *smmu_domain,
>  
>   spin_lock_irqsave(&smmu_domain->devices_lock, flags);
>   list_for_each_entry(master, &smmu_domain->devices, domain_head) {
> - for (i = 0; i < master->num_sids; i++) {
> - cmd.cfgi.sid = master->sids[i];
> + for (i = 0; i < master->num_streams; i++) {
> + cmd.cfgi.sid = master->streams[i].id;
>   arm_smmu_cmdq_batch_add(smmu, &cmds, &cmd);
>   }
>   }
> @@ -1355,6 +1355,28 @@ static int arm_smmu_init_l2_strtab(struct 
> arm_smmu_device *smmu, u32 sid)
>   return 0;
>  }
>  
> +/* smmu->streams_mutex must be held */
> +__maybe_unused
> +static struct arm_smmu_master *
> +arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid)
> +{
> + struct rb_node *node;
> + struct arm_smmu_stream *stream;
> +
> + node = smmu->streams.rb_node;
> + while (node) {
> + stream = rb_entry(node, struct arm_smmu_stream, node);
> + if (stream->id < sid)
> + node = node->rb_right;
> + else if (stream->id > sid)
> + node = node->rb_left;
> + else
> + return stream->master;
> + }
> +
> + return NULL;
> +}
> +
>  /* IRQ and event handlers */
>  static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
>  {
> @@ -1588,8 +1610,8 @@ static int arm_smmu_atc_inv_master(struct 
> arm_smmu_master *master)
>  
>   arm_smmu_atc_inv_to_cmd(0, 0, 0, &cmd);
>  
> - for (i = 0; i < master->num_sids; i++) {
> - cmd.atc.sid = master->sids[i];
> + for (i = 0; i < master->num_streams; i++) {
> + cmd.atc.sid = master->streams[i].id;
>   arm_smmu_cmdq_issue_cmd(master->smmu, &cmd);
>   }
>  
> @@ -1632,8 +1654,8 @@ int arm_smmu_atc_inv_domain(struct arm_smmu_domain 
> *smmu_domain, int ssid,
>   if (!master->ats_enabled)
>   continue;
>  
> - for (i = 0; i < master->num_sids; i++) {
> - cmd.atc.sid = master->sids[i];
> + for (i = 0; i < master->num_streams; i++) {
> + cmd.atc.sid = master->streams[i].id;
>   arm_smmu_cmdq_batch_add(smmu_domain->smmu, &cmds, &cmd);
>   }
>   }
> @@ -2065,13 +2087,13 @@ static void arm_smmu_install_ste_for_dev(struct 
> arm_smmu_master *master)
>

Re: [PATCH v14 05/13] iommu/smmuv3: Implement attach/detach_pasid_table

2021-03-02 Thread Keqian Zhu
Hi Eric,

On 2021/2/24 4:56, Eric Auger wrote:
> On attach_pasid_table() we program STE S1 related info set
> by the guest into the actual physical STEs. At minimum
> we need to program the context descriptor GPA and compute
> whether the stage1 is translated/bypassed or aborted.
> 
> On detach, the stage 1 config is unset and the abort flag is
> unset.
> 
> Signed-off-by: Eric Auger 
> 
[...]

> +
> + /*
> +  * we currently support a single CD so s1fmt and s1dss
> +  * fields are also ignored
> +  */
> + if (cfg->pasid_bits)
> + goto out;
> +
> + smmu_domain->s1_cfg.cdcfg.cdtab_dma = cfg->base_ptr;
only the "cdtab_dma" field of "cdcfg" is set, we are not able to locate a 
specific cd using arm_smmu_get_cd_ptr().

Maybe we'd better use a specialized function to fill other fields of "cdcfg" or 
add a sanity check in arm_smmu_get_cd_ptr()
to prevent calling it under nested mode?

As now we just call arm_smmu_get_cd_ptr() during finalise_s1(), no problem 
found. Just a suggestion ;-)

Thanks,
Keqian


> + smmu_domain->s1_cfg.set = true;
> + smmu_domain->abort = false;
> + break;
> + default:
> + goto out;
> + }
> + spin_lock_irqsave(&smmu_domain->devices_lock, flags);
> + list_for_each_entry(master, &smmu_domain->devices, domain_head)
> + arm_smmu_install_ste_for_dev(master);
> + spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
> + ret = 0;
> +out:
> + mutex_unlock(&smmu_domain->init_mutex);
> + return ret;
> +}
> +
> +static void arm_smmu_detach_pasid_table(struct iommu_domain *domain)
> +{
> + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> + struct arm_smmu_master *master;
> + unsigned long flags;
> +
> + mutex_lock(&smmu_domain->init_mutex);
> +
> + if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
> + goto unlock;
> +
> + smmu_domain->s1_cfg.set = false;
> + smmu_domain->abort = false;
> +
> + spin_lock_irqsave(&smmu_domain->devices_lock, flags);
> + list_for_each_entry(master, &smmu_domain->devices, domain_head)
> + arm_smmu_install_ste_for_dev(master);
> + spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
> +
> +unlock:
> + mutex_unlock(&smmu_domain->init_mutex);
> +}
> +
>  static bool arm_smmu_dev_has_feature(struct device *dev,
>enum iommu_dev_features feat)
>  {
> @@ -2939,6 +3026,8 @@ static struct iommu_ops arm_smmu_ops = {
>   .of_xlate   = arm_smmu_of_xlate,
>   .get_resv_regions   = arm_smmu_get_resv_regions,
>   .put_resv_regions   = generic_iommu_put_resv_regions,
> + .attach_pasid_table = arm_smmu_attach_pasid_table,
> + .detach_pasid_table = arm_smmu_detach_pasid_table,
>   .dev_has_feat   = arm_smmu_dev_has_feature,
>   .dev_feat_enabled   = arm_smmu_dev_feature_enabled,
>   .dev_enable_feat= arm_smmu_dev_enable_feature,
> 


Re: [RFC PATCH 01/11] iommu/arm-smmu-v3: Add feature detection for HTTU

2021-03-01 Thread Keqian Zhu
Hi Robin,

I am going to send v2 next week, to address these issues you reported.
Many thanks!
And do you have any further comments on patches #4, #5 and #6?

Thanks,
Keqian

On 2021/2/5 3:50, Robin Murphy wrote:
> On 2021-01-28 15:17, Keqian Zhu wrote:
>> From: jiangkunkun 
>>
>> The SMMU which supports HTTU (Hardware Translation Table Update) can
>> update the access flag and the dirty state of TTD by hardware. It is
>> essential to track dirty pages of DMA.
>>
>> This adds feature detection, none functional change.
>>
>> Co-developed-by: Keqian Zhu 
>> Signed-off-by: Kunkun Jiang 
>> ---
>>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 16 
>>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  8 
>>   include/linux/io-pgtable.h  |  1 +
>>   3 files changed, 25 insertions(+)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index 8ca7415d785d..0f0fe71cc10d 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -1987,6 +1987,7 @@ static int arm_smmu_domain_finalise(struct 
>> iommu_domain *domain,
>>   .pgsize_bitmap= smmu->pgsize_bitmap,
>>   .ias= ias,
>>   .oas= oas,
>> +.httu_hd= smmu->features & ARM_SMMU_FEAT_HTTU_HD,
>>   .coherent_walk= smmu->features & ARM_SMMU_FEAT_COHERENCY,
>>   .tlb= _smmu_flush_ops,
>>   .iommu_dev= smmu->dev,
>> @@ -3224,6 +3225,21 @@ static int arm_smmu_device_hw_probe(struct 
>> arm_smmu_device *smmu)
>>   if (reg & IDR0_HYP)
>>   smmu->features |= ARM_SMMU_FEAT_HYP;
>>   +switch (FIELD_GET(IDR0_HTTU, reg)) {
> 
> We need to accommodate the firmware override as well if we need this to be 
> meaningful. Jean-Philippe is already carrying a suitable patch in the SVA 
> stack[1].
> 
>> +case IDR0_HTTU_NONE:
>> +break;
>> +case IDR0_HTTU_HA:
>> +smmu->features |= ARM_SMMU_FEAT_HTTU_HA;
>> +break;
>> +case IDR0_HTTU_HAD:
>> +smmu->features |= ARM_SMMU_FEAT_HTTU_HA;
>> +smmu->features |= ARM_SMMU_FEAT_HTTU_HD;
>> +break;
>> +default:
>> +dev_err(smmu->dev, "unknown/unsupported HTTU!\n");
>> +return -ENXIO;
>> +}
>> +
>>   /*
>>* The coherency feature as set by FW is used in preference to the ID
>>* register, but warn on mismatch.
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> index 96c2e9565e00..e91bea44519e 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> @@ -33,6 +33,10 @@
>>   #define IDR0_ASID16(1 << 12)
>>   #define IDR0_ATS(1 << 10)
>>   #define IDR0_HYP(1 << 9)
>> +#define IDR0_HTTUGENMASK(7, 6)
>> +#define IDR0_HTTU_NONE0
>> +#define IDR0_HTTU_HA1
>> +#define IDR0_HTTU_HAD2
>>   #define IDR0_COHACC(1 << 4)
>>   #define IDR0_TTFGENMASK(3, 2)
>>   #define IDR0_TTF_AARCH642
>> @@ -286,6 +290,8 @@
>>   #define CTXDESC_CD_0_TCR_TBI0(1ULL << 38)
>> #define CTXDESC_CD_0_AA64(1UL << 41)
>> +#define CTXDESC_CD_0_HD(1UL << 42)
>> +#define CTXDESC_CD_0_HA(1UL << 43)
>>   #define CTXDESC_CD_0_S(1UL << 44)
>>   #define CTXDESC_CD_0_R(1UL << 45)
>>   #define CTXDESC_CD_0_A(1UL << 46)
>> @@ -604,6 +610,8 @@ struct arm_smmu_device {
>>   #define ARM_SMMU_FEAT_RANGE_INV(1 << 15)
>>   #define ARM_SMMU_FEAT_BTM(1 << 16)
>>   #define ARM_SMMU_FEAT_SVA(1 << 17)
>> +#define ARM_SMMU_FEAT_HTTU_HA(1 << 18)
>> +#define ARM_SMMU_FEAT_HTTU_HD(1 << 19)
>>   u32features;
>> #define ARM_SMMU_OPT_SKIP_PREFETCH(1 << 0)
>> diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
>> index ea727eb1a1a9..1a00ea8562c7 100644
>> --- a/include/linux/io-pgtable.h
>> +++ b/include/linux/io-pgtable.h
>> @@ -97,6 +97,7 @@ struct io_pgtable_cfg {
>>   unsigned longpgsize_bitmap;
>>   unsigned intias;
>>   unsigned intoas;
>> +boolhttu_hd;
> 
> This is very specific to the AArch64 stage 1 format, not a generic capability 
> - I think it should be a quirk flag rather than a common field.
> 
> Robin.
> 
> [1] 
> https://jpbrucker.net/git/linux/commit/?h=sva/current=1ef7d512fb9082450dfe0d22ca4f7e35625a097b
> 
>>   boolcoherent_walk;
>>   const struct iommu_flush_ops*tlb;
>>   struct device*iommu_dev;
>>
> .
> 


Re: [PATCH v11 01/13] vfio: VFIO_IOMMU_SET_PASID_TABLE

2021-02-22 Thread Keqian Zhu
Hi Eric,

On 2021/2/22 18:53, Auger Eric wrote:
> Hi Keqian,
> 
> On 2/2/21 1:34 PM, Keqian Zhu wrote:
>> Hi Eric,
>>
>> On 2020/11/16 19:00, Eric Auger wrote:
>>> From: "Liu, Yi L" 
>>>
>>> This patch adds an VFIO_IOMMU_SET_PASID_TABLE ioctl
>>> which aims to pass the virtual iommu guest configuration
>>> to the host. This latter takes the form of the so-called
>>> PASID table.
>>>
>>> Signed-off-by: Jacob Pan 
>>> Signed-off-by: Liu, Yi L 
>>> Signed-off-by: Eric Auger 
>>>
>>> ---
>>> v11 -> v12:
>>> - use iommu_uapi_set_pasid_table
>>> - check SET and UNSET are not set simultaneously (Zenghui)
>>>
>>> v8 -> v9:
>>> - Merge VFIO_IOMMU_ATTACH/DETACH_PASID_TABLE into a single
>>>   VFIO_IOMMU_SET_PASID_TABLE ioctl.
>>>
>>> v6 -> v7:
>>> - add a comment related to VFIO_IOMMU_DETACH_PASID_TABLE
>>>
>>> v3 -> v4:
>>> - restore ATTACH/DETACH
>>> - add unwind on failure
>>>
>>> v2 -> v3:
>>> - s/BIND_PASID_TABLE/SET_PASID_TABLE
>>>
>>> v1 -> v2:
>>> - s/BIND_GUEST_STAGE/BIND_PASID_TABLE
>>> - remove the struct device arg
>>> ---
>>>  drivers/vfio/vfio_iommu_type1.c | 65 +
>>>  include/uapi/linux/vfio.h   | 19 ++
>>>  2 files changed, 84 insertions(+)
>>>
>>> diff --git a/drivers/vfio/vfio_iommu_type1.c 
>>> b/drivers/vfio/vfio_iommu_type1.c
>>> index 67e827638995..87ddd9e882dc 100644
>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>> @@ -2587,6 +2587,41 @@ static int vfio_iommu_iova_build_caps(struct 
>>> vfio_iommu *iommu,
>>> return ret;
>>>  }
>>>  
>>> +static void
>>> +vfio_detach_pasid_table(struct vfio_iommu *iommu)
>>> +{
>>> +   struct vfio_domain *d;
>>> +
>>> +   mutex_lock(&iommu->lock);
>>> +   list_for_each_entry(d, &iommu->domain_list, next)
>>> +   iommu_detach_pasid_table(d->domain);
>>> +
>>> +   mutex_unlock(&iommu->lock);
>>> +}
>>> +
>>> +static int
>>> +vfio_attach_pasid_table(struct vfio_iommu *iommu, unsigned long arg)
>>> +{
>>> +   struct vfio_domain *d;
>>> +   int ret = 0;
>>> +
>>> +   mutex_lock(&iommu->lock);
>>> +
>>> +   list_for_each_entry(d, &iommu->domain_list, next) {
>>> +   ret = iommu_uapi_attach_pasid_table(d->domain, (void __user 
>>> *)arg);
>> This design is not very clear to me. This assumes all iommu_domains share 
>> the same pasid table.
>>
>> As I understand, it's reasonable when there is only one group in the domain, 
>> and only one domain in the vfio_iommu.
>> If more than one group in the vfio_iommu, the guest may put them into 
>> different guest iommu_domain, then they have different pasid table.
>>
>> Is this the use scenario?
> 
> the vfio_iommu is attached to a container. all the groups within a
> container share the same set of page tables (linux
> Documentation/driver-api/vfio.rst). So to me if you want to use
> different pasid tables, the groups need to be attached to different
> containers. Does that make sense to you?
OK, so this is what I understand about the design. A small question: when
we perform attach_pasid_table on a container, maybe we ought to do a sanity
check to make sure that only one group is in this container, instead of
iterating over all domains?

To be frank, my main concern is that if we put each group into a different container
under nested mode, then we give up the possibility that they can share stage 2 page
tables, which saves host memory and reduces the time needed to prepare the environment
for a VM.

I'd like to understand "the container shares page tables" to mean:
1) share the stage 2 page table under nested mode.
2) share the stage 1 page table under non-nested mode.

Because when we perform "map" on a container:
1) under nested mode, we set up the stage 2 mapping.
2) under non-nested mode, we set up the stage 1 mapping.

Indeed, to realize stage 2 mapping sharing, we would have to do much more work to
refactor SMMU_DOMAIN...

Hope you can consider this. :)

Thanks,
Keqian

> 
> Thanks
> 
> Eric
>>
>> Thanks,
>> Keqian
>>
>>> +   if (ret)
>>> +   goto unwind;
>>> +   }
>>> +   goto unlock;
>>> +unwind:
>>> +   list_for_each_entry_continue_reverse(d, &iommu->domain_list, next) {
>

Re: [PATCH v13 02/15] iommu: Introduce bind/unbind_guest_msi

2021-02-18 Thread Keqian Zhu
Hi Eric,

On 2021/2/12 16:55, Auger Eric wrote:
> Hi Keqian,
> 
> On 2/1/21 12:52 PM, Keqian Zhu wrote:
>> Hi Eric,
>>
>> On 2020/11/18 19:21, Eric Auger wrote:
>>> On ARM, MSI are translated by the SMMU. An IOVA is allocated
>>> for each MSI doorbell. If both the host and the guest are exposed
>>> with SMMUs, we end up with 2 different IOVAs allocated by each.
>>> guest allocates an IOVA (gIOVA) to map onto the guest MSI
>>> doorbell (gDB). The Host allocates another IOVA (hIOVA) to map
>>> onto the physical doorbell (hDB).
>>>
>>> So we end up with 2 untied mappings:
>>>  S1S2
>>> gIOVA->gDB
>>>   hIOVA->hDB
>>>
>>> Currently the PCI device is programmed by the host with hIOVA
>>> as MSI doorbell. So this does not work.
>>>
>>> This patch introduces an API to pass gIOVA/gDB to the host so
>>> that gIOVA can be reused by the host instead of re-allocating
>>> a new IOVA. So the goal is to create the following nested mapping:
>> Can the gDB be reused under non-nested mode?
> 
> Under non nested mode the hIOVA is allocated within the MSI reserved
> region exposed by the SMMU driver, [0x800, 80f]. see
> iommu_dma_prepare_msi/iommu_dma_get_msi_page in dma_iommu.c. this hIOVA
> is programmed in the physical device so that the physical SMMU
> translates it into the physical doorbell (hDB = host physical ITS
So, AFAIU, under non-nested mode, on the SMMU side we reuse the workflow of the
non-virtualization scenario.

> doorbell). The gDB is not used at pIOMMU programming level. It is only
> used when setting up the KVM irq route.
> 
> Hope this answers your question.
Thanks for your explanation!
> 

Thanks,
Keqian

>>
>>>
>>>  S1S2
>>> gIOVA->gDB ->hDB
>>>
>>> and program the PCI device with gIOVA MSI doorbell.
>>>
>>> In case we have several devices attached to this nested domain
>>> (devices belonging to the same group), they cannot be isolated
>>> on guest side either. So they should also end up in the same domain
>>> on guest side. We will enforce that all the devices attached to
>>> the host iommu domain use the same physical doorbell and similarly
>>> a single virtual doorbell mapping gets registered (1 single
>>> virtual doorbell is used on guest as well).
>>>
>> [...]
>>
>>> + *
>>> + * The associated IOVA can be reused by the host to create a nested
>>> + * stage2 binding mapping translating into the physical doorbell used
>>> + * by the devices attached to the domain.
>>> + *
>>> + * All devices within the domain must share the same physical doorbell.
>>> + * A single MSI GIOVA/GPA mapping can be attached to an iommu_domain.
>>> + */
>>> +
>>> +int iommu_bind_guest_msi(struct iommu_domain *domain,
>>> +dma_addr_t giova, phys_addr_t gpa, size_t size)
>>> +{
>>> +   if (unlikely(!domain->ops->bind_guest_msi))
>>> +   return -ENODEV;
>>> +
>>> +   return domain->ops->bind_guest_msi(domain, giova, gpa, size);
>>> +}
>>> +EXPORT_SYMBOL_GPL(iommu_bind_guest_msi);
>>> +
>>> +void iommu_unbind_guest_msi(struct iommu_domain *domain,
>>> +   dma_addr_t iova)
>> nit: s/iova/giova
> sure
>>
>>> +{
>>> +   if (unlikely(!domain->ops->unbind_guest_msi))
>>> +   return;
>>> +
>>> +   domain->ops->unbind_guest_msi(domain, iova);
>>> +}
>>> +EXPORT_SYMBOL_GPL(iommu_unbind_guest_msi);
>>> +
>> [...]
>>
>> Thanks,
>> Keqian
>>
> 
> Thanks
> 
> Eric
> 
> .
> 


Re: [RFC PATCH 10/11] vfio/iommu_type1: Optimize dirty bitmap population based on iommu HWDBM

2021-02-17 Thread Keqian Zhu
Hi Yi,

On 2021/2/9 19:57, Yi Sun wrote:
> On 21-02-07 18:40:36, Keqian Zhu wrote:
>> Hi Yi,
>>
>> On 2021/2/7 17:56, Yi Sun wrote:
>>> Hi,
>>>
>>> On 21-01-28 23:17:41, Keqian Zhu wrote:
>>>
>>> [...]
>>>
>>>> +static void vfio_dma_dirty_log_start(struct vfio_iommu *iommu,
>>>> +   struct vfio_dma *dma)
>>>> +{
>>>> +  struct vfio_domain *d;
>>>> +
>>>> +  list_for_each_entry(d, &iommu->domain_list, next) {
>>>> +  /* Go through all domain anyway even if we fail */
>>>> +  iommu_split_block(d->domain, dma->iova, dma->size);
>>>> +  }
>>>> +}
>>>
>>> This should be a switch to prepare for dirty log start. Per Intel
>>> Vtd spec, there is SLADE defined in Scalable-Mode PASID Table Entry.
>>> It enables Accessed/Dirty Flags in second-level paging entries.
>>> So, a generic iommu interface here is better. For Intel iommu, it
>>> enables SLADE. For ARM, it splits block.
>> Indeed, a generic interface name is better.
>>
>> The vendor iommu driver performs vendor-specific actions to start dirty log,
>> and Intel iommu and ARM smmu may differ. Besides, we may add more actions in
>> the ARM smmu driver in the future.
>>
>> One question: Though I am not familiar with the Intel iommu, I think it also
>> should split block mappings besides enabling SLADE. Right?
>>
> I am not familiar with ARM smmu. :) So I want to clarify if the block
> in smmu is big page, e.g. 2M page? Intel Vtd manages the memory per
Yes, for ARM, the "block" is a big page :).

> page, 4KB/2MB/1GB. There are two ways to manage dirty pages.
> 1. Keep default granularity. Just set SLADE to enable the dirty track.
> 2. Split big page to 4KB to get finer granularity.
According to your statement, I see that VT-d's SLADE behaves like SMMU HTTU.
They are both based on the page table.

Right, we should give more freedom to the iommu vendor driver, so a generic
interface is better.
1) As you said, set SLADE when enabling dirty log.
2) IOMMUs of other architectures may have completely different dirty tracking
mechanisms.

> 
> But question about the second solution is if it can benefit the user
> space, e.g. live migration. If my understanding about smmu block (i.e.
> the big page) is correct, have you collected some performance data to
> prove that the split can improve performance? Thanks!
The purpose of splitting block mappings is to reduce the amount of dirty bytes,
which depends on the actual DMA transactions.
Take an extreme example: if DMA writes one byte, then under a 1G mapping the dirty
amount reported to userspace is 1G, but under a 4K mapping the dirty amount is
just 4K.

I will detail the commit message in v2.

Thanks,
Keqian


Re: [RFC PATCH 06/11] iommu/arm-smmu-v3: Scan leaf TTD to sync hardware dirty log

2021-02-07 Thread Keqian Zhu



On 2021/2/5 3:52, Robin Murphy wrote:
> On 2021-01-28 15:17, Keqian Zhu wrote:
>> From: jiangkunkun 
>>
>> During dirty log tracking, user will try to retrieve dirty log from
>> iommu if it supports hardware dirty log. This adds a new interface
[...]

>>   static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg)
>>   {
>>   unsigned long granule, page_sizes;
>> @@ -957,6 +1046,7 @@ arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg)
>>   .iova_to_phys= arm_lpae_iova_to_phys,
>>   .split_block= arm_lpae_split_block,
>>   .merge_page= arm_lpae_merge_page,
>> +.sync_dirty_log= arm_lpae_sync_dirty_log,
>>   };
>> return data;
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index f1261da11ea8..69f268069282 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -2822,6 +2822,47 @@ size_t iommu_merge_page(struct iommu_domain *domain, 
>> unsigned long iova,
>>   }
>>   EXPORT_SYMBOL_GPL(iommu_merge_page);
>>   +int iommu_sync_dirty_log(struct iommu_domain *domain, unsigned long iova,
>> + size_t size, unsigned long *bitmap,
>> + unsigned long base_iova, unsigned long bitmap_pgshift)
>> +{
>> +const struct iommu_ops *ops = domain->ops;
>> +unsigned int min_pagesz;
>> +size_t pgsize;
>> +int ret;
>> +
>> +min_pagesz = 1 << __ffs(domain->pgsize_bitmap);
>> +
>> +if (!IS_ALIGNED(iova | size, min_pagesz)) {
>> +pr_err("unaligned: iova 0x%lx size 0x%zx min_pagesz 0x%x\n",
>> +   iova, size, min_pagesz);
>> +return -EINVAL;
>> +}
>> +
>> +if (!ops || !ops->sync_dirty_log) {
>> +pr_err("don't support sync dirty log\n");
>> +return -ENODEV;
>> +}
>> +
>> +while (size) {
>> +pgsize = iommu_pgsize(domain, iova, size);
>> +
>> +ret = ops->sync_dirty_log(domain, iova, pgsize,
>> +  bitmap, base_iova, bitmap_pgshift);
> 
> Once again, we have a worst-of-both-worlds iteration that doesn't make much 
> sense. iommu_pgsize() essentially tells you the best supported size that an 
> IOVA range *can* be mapped with, but we're iterating a range that's already 
> mapped, so we don't know if it's relevant, and either way it may not bear any 
> relation to the granularity of the bitmap, which is presumably what actually 
> matters.
> 
> Logically, either we should iterate at the bitmap granularity here, and the 
> driver just says whether the given iova chunk contains any dirty pages or 
> not, or we just pass everything through to the driver and let it do the whole 
> job itself. Doing a little bit of both is just an overcomplicated mess.
> 
> I'm skimming patch #7 and pretty much the same comments apply, so I can't be 
> bothered to repeat them there...
> 
> Robin.
Sorry that I missed these comments...

As I clarified in #4, due to an unsuitable variable name, @pgsize actually is
the max size that meets the alignment requirement and fits into the pgsize_bitmap.

All iommu interfaces require that @size fit into the pgsize_bitmap to simplify
their implementation. And the logic is very similar to "unmap" here.

Thanks,
Keqian

> 
>> +if (ret)
>> +break;
>> +
>> +pr_debug("dirty_log_sync: iova 0x%lx pagesz 0x%zx\n", iova,
>> + pgsize);
>> +
>> +iova += pgsize;
>> +size -= pgsize;
>> +}
>> +
>> +return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_sync_dirty_log);
>> +
>>   void iommu_get_resv_regions(struct device *dev, struct list_head *list)
>>   {
>>   const struct iommu_ops *ops = dev->bus->iommu_ops;
>> diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
>> index 754b62a1bbaf..f44551e4a454 100644
>> --- a/include/linux/io-pgtable.h
>> +++ b/include/linux/io-pgtable.h
>> @@ -166,6 +166,10 @@ struct io_pgtable_ops {
>> size_t size);
>>   size_t (*merge_page)(struct io_pgtable_ops *ops, unsigned long iova,
>>phys_addr_t phys, size_t size, int prot);
>> +int (*sync_dirty_log)(struct io_pgtable_ops *ops,
>> +  unsigned long iova, size_t size,
>> +  unsigned long *bitmap, unsigned long base_iova,
>> +  unsigned long bitmap_pgshift);
>>   };
>> /**
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> in

Re: [RFC PATCH 06/11] iommu/arm-smmu-v3: Scan leaf TTD to sync hardware dirty log

2021-02-07 Thread Keqian Zhu
Hi Robin,

On 2021/2/5 3:52, Robin Murphy wrote:
> On 2021-01-28 15:17, Keqian Zhu wrote:
>> From: jiangkunkun 
>>
>> During dirty log tracking, user will try to retrieve dirty log from
>> iommu if it supports hardware dirty log. This adds a new interface
>> named sync_dirty_log in iommu layer and arm smmuv3 implements it,
>> which scans leaf TTD and treats it's dirty if it's writable (As we
>> just enable HTTU for stage1, so check AP[2] is not set).
>>
>> Co-developed-by: Keqian Zhu 
>> Signed-off-by: Kunkun Jiang 
>> ---
>>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 27 +++
>>   drivers/iommu/io-pgtable-arm.c  | 90 +
>>   drivers/iommu/iommu.c   | 41 ++
>>   include/linux/io-pgtable.h  |  4 +
>>   include/linux/iommu.h   | 17 
>>   5 files changed, 179 insertions(+)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index 2434519e4bb6..43d0536b429a 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -2548,6 +2548,32 @@ static size_t arm_smmu_merge_page(struct iommu_domain 
>> *domain, unsigned long iov
>>   return ops->merge_page(ops, iova, paddr, size, prot);
>>   }
>>   +static int arm_smmu_sync_dirty_log(struct iommu_domain *domain,
>> +   unsigned long iova, size_t size,
>> +   unsigned long *bitmap,
>> +   unsigned long base_iova,
>> +   unsigned long bitmap_pgshift)
>> +{
>> +struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
>> +struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
>> +
>> +if (!(smmu->features & ARM_SMMU_FEAT_HTTU_HD)) {
>> +dev_err(smmu->dev, "don't support HTTU_HD and sync dirty log\n");
>> +return -EPERM;
>> +}
>> +
>> +if (!ops || !ops->sync_dirty_log) {
>> +pr_err("don't support sync dirty log\n");
>> +return -ENODEV;
>> +}
>> +
>> +/* To ensure all inflight transactions are completed */
>> +arm_smmu_flush_iotlb_all(domain);
> 
> What about transactions that arrive between the point that this completes, 
> and the point - potentially much later - that we actually access any given 
> PTE during the walk? I don't see what this is supposed to be synchronising 
> against, even if it were just a CMD_SYNC (I especially don't see why we'd 
> want to knock out the TLBs).
The idea is that pgtable may be updated by HTTU *before* or *after* actual DMA 
access.

1) For PCI ATS. As SMMU spec (3.13.6.1 Hardware flag update for ATS & PRI) 
states:

"In addition to the behavior that is described earlier in this section, if 
hardware-management of Dirty state is enabled
and an ATS request for write access (with NW == 0) is made to a page that is 
marked Writable Clean, the SMMU
assumes a write will be made to that page and marks the page as Writable Dirty 
before returning the ATS response
that grants write access. When this happens, the modification to the page data 
by a device is not visible before
the page state is visible as Writable Dirty."

The problem is that guest memory may be dirtied *after* we actually handle it.

2) For inflight DMA. As SMMU spec (3.13.4 HTTU behavior summary) states:

"In addition, the completion of a TLB invalidation operation makes TTD updates 
that were caused by
transactions that are themselves completed by the completion of the TLB 
invalidation visible. Both
broadcast and explicit CMD_TLBI_* invalidations have this property."

The problem is that we should flush all DMA transactions after the guest stops.



The key to solving these problems is that we should invalidate the related TLB entries.
1) TLBI can flush inflight DMA translations (before dirty_log_sync()).
2) If a DMA translation uses the ATC and occurs after we have handled dirty memory,
then the ATC has been invalidated, so the page will be re-marked as dirty (in
dirty_log_clear()).

Thanks,
Keqian

> 
>> +
>> +return ops->sync_dirty_log(ops, iova, size, bitmap,
>> +base_iova, bitmap_pgshift);
>> +}
>> +
>>   static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args 
>> *args)
>>   {
>>   return iommu_fwspec_add_ids(dev, args->args, 1);
>> @@ -2649,6 +2675,7 @@ static struct iommu_ops arm_smmu_ops = {
>>   .domain_set_attr= arm_smmu_domain_set_attr,
>>   .split_block= arm_smmu_split_block,
>>   .merge_page= arm_smmu_m

Re: [RFC PATCH 05/11] iommu/arm-smmu-v3: Merge a span of page to block descriptor

2021-02-07 Thread Keqian Zhu
Hi Robin,

On 2021/2/5 3:52, Robin Murphy wrote:
> On 2021-01-28 15:17, Keqian Zhu wrote:
>> From: jiangkunkun 
>>
>> When stopping dirty log tracking, we need to recover all block descriptors
>> which were split when dirty log tracking started. This adds a new
>> interface named merge_page in the iommu layer and arm smmuv3 implements it,
>> which reinstalls block mappings and unmaps the span of page mappings.
>>
>> It's the caller's duty to find continuous physical memory.
>>
>> During merging page, other interfaces are not expected to be working,
>> so race condition does not exist. And we flush all iotlbs after the merge
>> procedure is completed to ease the pressure of iommu, as we will merge a
>> huge range of page mappings in general.
> 
> Again, I think we need better reasoning than "race conditions don't exist 
> because we don't expect them to exist".
Sure, because they can't. ;-)

> 
>> Co-developed-by: Keqian Zhu 
>> Signed-off-by: Kunkun Jiang 
>> ---
>>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 20 ++
>>   drivers/iommu/io-pgtable-arm.c  | 78 +
>>   drivers/iommu/iommu.c   | 75 
>>   include/linux/io-pgtable.h  |  2 +
>>   include/linux/iommu.h   | 10 +++
>>   5 files changed, 185 insertions(+)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index 5469f4fca820..2434519e4bb6 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -2529,6 +2529,25 @@ static size_t arm_smmu_split_block(struct 
>> iommu_domain *domain,
>>   return ops->split_block(ops, iova, size);
>>   }
[...]

>> +
>> +size_t iommu_merge_page(struct iommu_domain *domain, unsigned long iova,
>> +size_t size, int prot)
>> +{
>> +phys_addr_t phys;
>> +dma_addr_t p, i;
>> +size_t cont_size, merged_size;
>> +size_t merged = 0;
>> +
>> +while (size) {
>> +phys = iommu_iova_to_phys(domain, iova);
>> +cont_size = PAGE_SIZE;
>> +p = phys + cont_size;
>> +i = iova + cont_size;
>> +
>> +while (cont_size < size && p == iommu_iova_to_phys(domain, i)) {
>> +p += PAGE_SIZE;
>> +i += PAGE_SIZE;
>> +cont_size += PAGE_SIZE;
>> +}
>> +
>> +merged_size = __iommu_merge_page(domain, iova, phys, cont_size,
>> +prot);
> 
> This is incredibly silly. The amount of time you'll spend just on walking the 
> tables in all those iova_to_phys() calls is probably significantly more than 
> it would take the low-level pagetable code to do the entire operation for 
> itself. At this level, any knowledge of how mappings are actually constructed 
> is lost once __iommu_map() returns, so we just don't know, and for this 
> operation in particular there seems little point in trying to guess - the 
> driver backend still has to figure out whether something we *think* might be 
> mergeable actually is, so it's better off doing the entire operation in a 
> single pass by itself.
>
> There's especially little point in starting all this work *before* checking 
> that it's even possible...
>
> Robin.

Well, this does look silly. But the iova->phys info is only stored in the
pgtable, and there seems to be no other way to find contiguous physical
addresses :-( (actually, vfio_iommu_replay() has similar logic).

We put the procedure for finding contiguous physical addresses in the common
iommu layer because it is common logic for all types of iommu driver.

If a vendor iommu driver decides that (iova, phys, cont_size) is not merge-able,
it can make its own decision about how to map it. This is consistent with
iommu_map(), which provides (iova, paddr, pgsize) to the vendor driver, and the
vendor driver makes its own decision about how to map it.

Do I understand your idea correctly?

Thanks,
Keqian
> 
>> +iova += merged_size;
>> +size -= merged_size;
>> +merged += merged_size;
>> +
>> +if (merged_size != cont_size)
>> +break;
>> +}
>> +iommu_flush_iotlb_all(domain);
>> +
>> +return merged;
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_merge_page);
>> +
>>   void iommu_get_resv_regions(struct device *dev, struct list_head *list)
>>   {
>>   const struct iommu_ops *ops = dev->bus->iommu_ops;
>> diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h

Re: [RFC PATCH 10/11] vfio/iommu_type1: Optimize dirty bitmap population based on iommu HWDBM

2021-02-07 Thread Keqian Zhu
Hi Yi,

On 2021/2/7 17:56, Yi Sun wrote:
> Hi,
> 
> On 21-01-28 23:17:41, Keqian Zhu wrote:
> 
> [...]
> 
>> +static void vfio_dma_dirty_log_start(struct vfio_iommu *iommu,
>> + struct vfio_dma *dma)
>> +{
>> +struct vfio_domain *d;
>> +
>> +list_for_each_entry(d, &iommu->domain_list, next) {
>> +/* Go through all domain anyway even if we fail */
>> +iommu_split_block(d->domain, dma->iova, dma->size);
>> +}
>> +}
> 
> This should be a switch to prepare for dirty log start. Per Intel
> Vtd spec, there is SLADE defined in Scalable-Mode PASID Table Entry.
> It enables Accessed/Dirty Flags in second-level paging entries.
> So, a generic iommu interface here is better. For Intel iommu, it
> enables SLADE. For ARM, it splits block.
Indeed, a generic interface name is better.

The vendor iommu driver performs vendor-specific actions to start dirty log
tracking, and the Intel iommu and ARM smmu may differ. Besides, we may add more
actions to the ARM smmu driver in the future.
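
As a rough sketch of what such a generic entry point could look like (the op
name and the hook are assumptions for illustration, not the exact code of this
series):

static int example_switch_dirty_log(struct iommu_domain *domain, bool enable,
				    unsigned long iova, size_t size, int prot)
{
	const struct iommu_ops *ops = domain->ops;

	/*
	 * Each vendor driver does its own preparation behind this single op:
	 * ARM SMMU splits block mappings, Intel IOMMU enables SLADE, etc.
	 */
	if (!ops || !ops->switch_dirty_log)
		return -ENODEV;

	return ops->switch_dirty_log(domain, enable, iova, size, prot);
}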

One question: though I am not familiar with the Intel iommu, I think it should
also split block mappings besides enabling SLADE. Right?

Thanks,
Keqian


Re: [RFC PATCH 04/11] iommu/arm-smmu-v3: Split block descriptor to a span of page

2021-02-07 Thread Keqian Zhu
Hi Robin,

On 2021/2/5 3:51, Robin Murphy wrote:
> On 2021-01-28 15:17, Keqian Zhu wrote:
>> From: jiangkunkun 
>>
>> Block descriptor is not a proper granule for dirty log tracking. This
>> adds a new interface named split_block in iommu layer and arm smmuv3
>> implements it, which splits block descriptor to an equivalent span of
>> page descriptors.
>>
>> During spliting block, other interfaces are not expected to be working,
>> so race condition does not exist. And we flush all iotlbs after the split
>> procedure is completed to ease the pressure of iommu, as we will split a
>> huge range of block mappings in general.
> 
> "Not expected to be" is not the same thing as "can not". Presumably the whole 
> point of dirty log tracking is that it can be run speculatively in the 
> background, so is there any actual guarantee that the guest can't, say, issue 
> a hotplug event that would cause some memory to be released back to the host 
> and unmapped while a scan might be in progress? Saying effectively "there is 
> no race condition as long as you assume there is no race condition" isn't all 
> that reassuring...
Sorry for my inaccurate expression. "Not expected to be" is inappropriate here;
the actual meaning is "can not".

The only user of these newly added interfaces is vfio_iommu_type1 for now, and
vfio_iommu_type1 always acquires "iommu->lock" before invoking them.

> 
> That said, it's not very clear why patches #4 and #5 are here at all, given 
> that patches #6 and #7 appear quite happy to handle block entries.
Splitting blocks into pages is very important for dirty page tracking. Page
mappings can greatly reduce the amount of dirty memory that has to be handled.
The KVM stage2 MMU side has the same logic.

Yes, #6 (log_sync) and #7 (log_clear) are designed to handle both block and
page mappings. The "split" operation may fail (e.g. without BBML1/2, or on
ENOMEM), but we can still track dirty state at block granularity, which is
still a much better choice than the full-dirty policy.
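
As a rough sketch of that fallback (illustration only, with a hypothetical
example_split_block() helper that returns the size it actually managed to
split):

static void example_start_dirty_log(struct iommu_domain *domain,
				    unsigned long iova, size_t size)
{
	size_t split;

	/* Try to get page granularity; keep going even if we cannot. */
	split = example_split_block(domain, iova, size);
	if (split != size)
		pr_warn("dirty log: %zu bytes stay as block mappings, tracked at block granule\n",
			size - split);

	/* log_sync()/log_clear() then walk both block and page descriptors. */
}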

> 
>> Co-developed-by: Keqian Zhu 
>> Signed-off-by: Kunkun Jiang 
>> ---
>>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  20 
>>   drivers/iommu/io-pgtable-arm.c  | 122 
>>   drivers/iommu/iommu.c   |  40 +++
>>   include/linux/io-pgtable.h  |   2 +
>>   include/linux/iommu.h   |  10 ++
>>   5 files changed, 194 insertions(+)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index 9208881a571c..5469f4fca820 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -2510,6 +2510,25 @@ static int arm_smmu_domain_set_attr(struct 
>> iommu_domain *domain,
>>   return ret;
>>   }
>>   +static size_t arm_smmu_split_block(struct iommu_domain *domain,
>> +   unsigned long iova, size_t size)
>> +{
>> +struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
>> +struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
>> +
>> +if (!(smmu->features & (ARM_SMMU_FEAT_BBML1 | ARM_SMMU_FEAT_BBML2))) {
>> +dev_err(smmu->dev, "don't support BBML1/2 and split block\n");
>> +return 0;
>> +}
>> +
>> +if (!ops || !ops->split_block) {
>> +pr_err("don't support split block\n");
>> +return 0;
>> +}
>> +
>> +return ops->split_block(ops, iova, size);
>> +}
>> +
>>   static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args 
>> *args)
>>   {
>>   return iommu_fwspec_add_ids(dev, args->args, 1);
>> @@ -2609,6 +2628,7 @@ static struct iommu_ops arm_smmu_ops = {
>>   .device_group= arm_smmu_device_group,
>>   .domain_get_attr= arm_smmu_domain_get_attr,
>>   .domain_set_attr= arm_smmu_domain_set_attr,
>> +.split_block= arm_smmu_split_block,
>>   .of_xlate= arm_smmu_of_xlate,
>>   .get_resv_regions= arm_smmu_get_resv_regions,
>>   .put_resv_regions= generic_iommu_put_resv_regions,
>> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
>> index e299a44808ae..f3b7f7115e38 100644
>> --- a/drivers/iommu/io-pgtable-arm.c
>> +++ b/drivers/iommu/io-pgtable-arm.c
>> @@ -79,6 +79,8 @@
>>   #define ARM_LPAE_PTE_SH_IS(((arm_lpae_iopte)3) << 8)
>>   #define ARM_LPAE_P

Re: [RFC PATCH 01/11] iommu/arm-smmu-v3: Add feature detection for HTTU

2021-02-06 Thread Keqian Zhu
Hi Robin,

On 2021/2/5 19:48, Robin Murphy wrote:
> On 2021-02-05 09:13, Keqian Zhu wrote:
>> Hi Robin and Jean,
>>
>> On 2021/2/5 3:50, Robin Murphy wrote:
>>> On 2021-01-28 15:17, Keqian Zhu wrote:
>>>> From: jiangkunkun 
>>>>
>>>> The SMMU which supports HTTU (Hardware Translation Table Update) can
>>>> update the access flag and the dirty state of TTD by hardware. It is
>>>> essential to track dirty pages of DMA.
>>>>
>>>> This adds feature detection, none functional change.
>>>>
>>>> Co-developed-by: Keqian Zhu 
>>>> Signed-off-by: Kunkun Jiang 
>>>> ---
>>>>drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 16 
>>>>drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  8 
>>>>include/linux/io-pgtable.h  |  1 +
>>>>3 files changed, 25 insertions(+)
>>>>
>>>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
>>>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> index 8ca7415d785d..0f0fe71cc10d 100644
>>>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> @@ -1987,6 +1987,7 @@ static int arm_smmu_domain_finalise(struct 
>>>> iommu_domain *domain,
>>>>.pgsize_bitmap= smmu->pgsize_bitmap,
>>>>.ias= ias,
>>>>.oas= oas,
>>>> +.httu_hd= smmu->features & ARM_SMMU_FEAT_HTTU_HD,
>>>>.coherent_walk= smmu->features & ARM_SMMU_FEAT_COHERENCY,
>>>>.tlb= _smmu_flush_ops,
>>>>.iommu_dev= smmu->dev,
>>>> @@ -3224,6 +3225,21 @@ static int arm_smmu_device_hw_probe(struct 
>>>> arm_smmu_device *smmu)
>>>>if (reg & IDR0_HYP)
>>>>smmu->features |= ARM_SMMU_FEAT_HYP;
>>>>+switch (FIELD_GET(IDR0_HTTU, reg)) {
>>>
>>> We need to accommodate the firmware override as well if we need this to be 
>>> meaningful. Jean-Philippe is already carrying a suitable patch in the SVA 
>>> stack[1].
>> Robin, Thanks for pointing it out.
>>
>> Jean, I see that the IORT HTTU flag overrides the hardware register info 
>> unconditionally. I have some concern about it:
>>
>> If the override flag has HTTU but hardware doesn't support it, then driver 
>> will use this feature but receive access fault or permission fault from SMMU 
>> unexpectedly.
>> 1) If IOPF is not supported, then kernel can not work normally.
>> 2) If IOPF is supported, kernel will perform useless actions, such as HTTU 
>> based dma dirty tracking (this series).
> 
> Yes, if the IORT describes the SMMU incorrectly, things will not work well. 
> Just like if it describes the wrong base address or the wrong interrupt 
> numbers, things will also not work well. The point is that incorrect firmware 
> can be updated in the field fairly easily; incorrect hardware can not.
Agree.

> 
> Say the SMMU designer hard-codes the ID register field to 0x2 because the 
> SMMU itself is capable of HTTU, and they assume it's always going to be wired 
> up coherently, but then a customer integrates it to a non-coherent 
> interconnect. Firmware needs to override that value to prevent an OS thinking 
> that the claimed HTTU capability is ever going to work.
> 
> Or say the SMMU *is* integrated correctly, but due to an erratum discovered 
> later in the interconnect or SMMU itself, it turns out DBM doesn't always 
> work reliably, but AF is still OK. Firmware needs to downgrade the indicated 
> level of support from that which was intended to that which works reliably.
> 
> Or say someone forgets to set an integration tieoff so their SMMU reports 0x0 
> even though it and the interconnect *can* happily support HTTU. In that case, 
> firmware may want to upgrade the value to *allow* an OS to use HTTU despite 
> the ID register being wrong.
Fair enough. A mask can achieve "downgrade" but not "upgrade", and you make a
reasonable point about upgrade.

BTW, my original intention was that a mask could provide some convenience for
the BIOS maker, as the override flag could stay the same for all SMMUs
regardless of whether they support HTTU. But clearly a mask cannot cover every
scenario.

> 
>> As the IORT spec doesn't give an explicit explanation for HTTU override, can 
>> we comprehend it as a mask for HTTU related hardware register?
>> So the logic becomes: smmu->feature = HTTU override &

Re: [RFC PATCH 01/11] iommu/arm-smmu-v3: Add feature detection for HTTU

2021-02-06 Thread Keqian Zhu
Hi Robin,

On 2021/2/6 0:11, Robin Murphy wrote:
> On 2021-02-05 11:48, Robin Murphy wrote:
>> On 2021-02-05 09:13, Keqian Zhu wrote:
>>> Hi Robin and Jean,
>>>
>>> On 2021/2/5 3:50, Robin Murphy wrote:
>>>> On 2021-01-28 15:17, Keqian Zhu wrote:
>>>>> From: jiangkunkun 
>>>>>
>>>>> The SMMU which supports HTTU (Hardware Translation Table Update) can
>>>>> update the access flag and the dirty state of TTD by hardware. It is
>>>>> essential to track dirty pages of DMA.
>>>>>
>>>>> This adds feature detection, none functional change.
>>>>>
>>>>> Co-developed-by: Keqian Zhu 
>>>>> Signed-off-by: Kunkun Jiang 
>>>>> ---
>>>>>drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 16 
>>>>>drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  8 
>>>>>include/linux/io-pgtable.h  |  1 +
>>>>>3 files changed, 25 insertions(+)
>>>>>
>>>>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
>>>>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>>> index 8ca7415d785d..0f0fe71cc10d 100644
>>>>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>>> @@ -1987,6 +1987,7 @@ static int arm_smmu_domain_finalise(struct 
>>>>> iommu_domain *domain,
>>>>>.pgsize_bitmap= smmu->pgsize_bitmap,
>>>>>.ias= ias,
>>>>>.oas= oas,
>>>>> +.httu_hd= smmu->features & ARM_SMMU_FEAT_HTTU_HD,
>>>>>.coherent_walk= smmu->features & ARM_SMMU_FEAT_COHERENCY,
>>>>>.tlb= _smmu_flush_ops,
>>>>>.iommu_dev= smmu->dev,
>>>>> @@ -3224,6 +3225,21 @@ static int arm_smmu_device_hw_probe(struct 
>>>>> arm_smmu_device *smmu)
>>>>>if (reg & IDR0_HYP)
>>>>>smmu->features |= ARM_SMMU_FEAT_HYP;
>>>>>+switch (FIELD_GET(IDR0_HTTU, reg)) {
>>>>
>>>> We need to accommodate the firmware override as well if we need this to be 
>>>> meaningful. Jean-Philippe is already carrying a suitable patch in the SVA 
>>>> stack[1].
>>> Robin, Thanks for pointing it out.
>>>
>>> Jean, I see that the IORT HTTU flag overrides the hardware register info 
>>> unconditionally. I have some concern about it:
>>>
>>> If the override flag has HTTU but hardware doesn't support it, then driver 
>>> will use this feature but receive access fault or permission fault from 
>>> SMMU unexpectedly.
>>> 1) If IOPF is not supported, then kernel can not work normally.
>>> 2) If IOPF is supported, kernel will perform useless actions, such as HTTU 
>>> based dma dirty tracking (this series).
>>
>> Yes, if the IORT describes the SMMU incorrectly, things will not work well. 
>> Just like if it describes the wrong base address or the wrong interrupt 
>> numbers, things will also not work well. The point is that incorrect 
>> firmware can be updated in the field fairly easily; incorrect hardware can 
>> not.
>>
>> Say the SMMU designer hard-codes the ID register field to 0x2 because the 
>> SMMU itself is capable of HTTU, and they assume it's always going to be 
>> wired up coherently, but then a customer integrates it to a non-coherent 
>> interconnect. Firmware needs to override that value to prevent an OS 
>> thinking that the claimed HTTU capability is ever going to work.
>>
>> Or say the SMMU *is* integrated correctly, but due to an erratum discovered 
>> later in the interconnect or SMMU itself, it turns out DBM doesn't always 
>> work reliably, but AF is still OK. Firmware needs to downgrade the indicated 
>> level of support from that which was intended to that which works reliably.
>>
>> Or say someone forgets to set an integration tieoff so their SMMU reports 
>> 0x0 even though it and the interconnect *can* happily support HTTU. In that 
>> case, firmware may want to upgrade the value to *allow* an OS to use HTTU 
>> despite the ID register being wrong.
>>
>>> As the IORT spec doesn't give an explicit explanation for HTTU override, 
>>> can we comprehend it as a mask for HTTU related hardware register?
>>> So the logic becomes: smmu->feature = HTTU override & IDR0_HTTU;
>>
>> No, it literally states that the OS must use the value of the firmware field 
>> *instead* of the value from the hardware field.
> 
> Oops, apologies for an oversight there - I've been reviewing IORT spec 
> updates lately so naturally had the newest version open already. Turns out 
> these descriptions were only clarified in the most recent release, so if you 
> were looking at an older document they *were* horribly vague.
Yep, my local version is E, which was released in July 2020. I downloaded
version E.a just now, thanks. ;-)

Thanks,
Keqian


Re: [RFC PATCH 01/11] iommu/arm-smmu-v3: Add feature detection for HTTU

2021-02-06 Thread Keqian Zhu
Hi Jean,

On 2021/2/5 17:51, Jean-Philippe Brucker wrote:
> Hi Keqian,
> 
> On Fri, Feb 05, 2021 at 05:13:50PM +0800, Keqian Zhu wrote:
>>> We need to accommodate the firmware override as well if we need this to be 
>>> meaningful. Jean-Philippe is already carrying a suitable patch in the SVA 
>>> stack[1].
>> Robin, Thanks for pointing it out.
>>
>> Jean, I see that the IORT HTTU flag overrides the hardware register info 
>> unconditionally. I have some concern about it:
>>
>> If the override flag has HTTU but hardware doesn't support it, then driver 
>> will use this feature but receive access fault or permission fault from SMMU 
>> unexpectedly.
>> 1) If IOPF is not supported, then kernel can not work normally.
>> 2) If IOPF is supported, kernel will perform useless actions, such as HTTU 
>> based dma dirty tracking (this series).
>>
>> As the IORT spec doesn't give an explicit explanation for HTTU override, can 
>> we comprehend it as a mask for HTTU related hardware register?
> 
> To me "Overrides the value of SMMU_IDR0.HTTU" is clear enough: disregard
> the value of SMMU_IDR0.HTTU and use the one specified by IORT instead. And
> that's both ways, since there is no validity mask for the IORT value: if
> there is an IORT table, always ignore SMMU_IDR0.HTTU.
> 
> That's how the SMMU driver implements the COHACC bit, which has the same
> wording in IORT. So I think we should implement HTTU the same way.
OK, and Robin said that the latest IORT spec literally states it.

> 
> One complication is that there is no equivalent override for device tree.
> I think it can be added later if necessary, because unlike IORT it can be
> tri state (property not present, overriden positive, overridden negative).
Yeah, that would be more flexible. ;-)

> 
> Thanks,
> Jean
> 
> .
> 
Thanks,
Keqian


Re: [RFC PATCH 01/11] iommu/arm-smmu-v3: Add feature detection for HTTU

2021-02-05 Thread Keqian Zhu
Hi Robin and Jean,

On 2021/2/5 3:50, Robin Murphy wrote:
> On 2021-01-28 15:17, Keqian Zhu wrote:
>> From: jiangkunkun 
>>
>> The SMMU which supports HTTU (Hardware Translation Table Update) can
>> update the access flag and the dirty state of TTD by hardware. It is
>> essential to track dirty pages of DMA.
>>
>> This adds feature detection, none functional change.
>>
>> Co-developed-by: Keqian Zhu 
>> Signed-off-by: Kunkun Jiang 
>> ---
>>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 16 
>>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  8 
>>   include/linux/io-pgtable.h  |  1 +
>>   3 files changed, 25 insertions(+)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index 8ca7415d785d..0f0fe71cc10d 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -1987,6 +1987,7 @@ static int arm_smmu_domain_finalise(struct 
>> iommu_domain *domain,
>>   .pgsize_bitmap= smmu->pgsize_bitmap,
>>   .ias= ias,
>>   .oas= oas,
>> +.httu_hd= smmu->features & ARM_SMMU_FEAT_HTTU_HD,
>>   .coherent_walk= smmu->features & ARM_SMMU_FEAT_COHERENCY,
>>   .tlb= _smmu_flush_ops,
>>   .iommu_dev= smmu->dev,
>> @@ -3224,6 +3225,21 @@ static int arm_smmu_device_hw_probe(struct 
>> arm_smmu_device *smmu)
>>   if (reg & IDR0_HYP)
>>   smmu->features |= ARM_SMMU_FEAT_HYP;
>>   +switch (FIELD_GET(IDR0_HTTU, reg)) {
> 
> We need to accommodate the firmware override as well if we need this to be 
> meaningful. Jean-Philippe is already carrying a suitable patch in the SVA 
> stack[1].
Robin, Thanks for pointing it out.

Jean, I see that the IORT HTTU flag overrides the hardware register info
unconditionally. I have some concerns about it:

If the override flag advertises HTTU but the hardware doesn't support it, the
driver will use this feature but unexpectedly receive access faults or
permission faults from the SMMU.
1) If IOPF is not supported, the kernel cannot work normally.
2) If IOPF is supported, the kernel will perform useless actions, such as
HTTU-based DMA dirty tracking (this series).

As the IORT spec doesn't give an explicit explanation of the HTTU override, can
we interpret it as a mask for the HTTU-related hardware register field?
So the logic becomes: smmu->feature = HTTU override & IDR0_HTTU;
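
For illustration only (the firmware helper below is hypothetical, not the
actual override patch), the two interpretations would differ as follows:

static u32 example_httu_interpretations(u32 idr0_reg)
{
	u32 hw_httu = FIELD_GET(IDR0_HTTU, idr0_reg);
	u32 fw_httu = example_iort_httu_override();	/* hypothetical */
	u32 masked, overridden;

	/* (a) Mask interpretation: firmware can only downgrade what
	 * SMMU_IDR0.HTTU claims.
	 */
	masked = hw_httu & fw_httu;

	/* (b) Override interpretation (this is how COHACC is handled): the
	 * firmware value replaces the ID register value entirely, so it can
	 * downgrade or upgrade the reported capability.
	 */
	overridden = fw_httu;

	(void)masked;
	return overridden;
}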

> 
>> +case IDR0_HTTU_NONE:
>> +break;
>> +case IDR0_HTTU_HA:
>> +smmu->features |= ARM_SMMU_FEAT_HTTU_HA;
>> +break;
>> +case IDR0_HTTU_HAD:
>> +smmu->features |= ARM_SMMU_FEAT_HTTU_HA;
>> +smmu->features |= ARM_SMMU_FEAT_HTTU_HD;
>> +break;
>> +default:
>> +dev_err(smmu->dev, "unknown/unsupported HTTU!\n");
>> +return -ENXIO;
>> +}
>> +
>>   /*
>>* The coherency feature as set by FW is used in preference to the ID
>>* register, but warn on mismatch.
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> index 96c2e9565e00..e91bea44519e 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> @@ -33,6 +33,10 @@
>>   #define IDR0_ASID16(1 << 12)
>>   #define IDR0_ATS(1 << 10)
>>   #define IDR0_HYP(1 << 9)
>> +#define IDR0_HTTUGENMASK(7, 6)
>> +#define IDR0_HTTU_NONE0
>> +#define IDR0_HTTU_HA1
>> +#define IDR0_HTTU_HAD2
>>   #define IDR0_COHACC(1 << 4)
>>   #define IDR0_TTFGENMASK(3, 2)
>>   #define IDR0_TTF_AARCH642
>> @@ -286,6 +290,8 @@
>>   #define CTXDESC_CD_0_TCR_TBI0(1ULL << 38)
>> #define CTXDESC_CD_0_AA64(1UL << 41)
>> +#define CTXDESC_CD_0_HD(1UL << 42)
>> +#define CTXDESC_CD_0_HA(1UL << 43)
>>   #define CTXDESC_CD_0_S(1UL << 44)
>>   #define CTXDESC_CD_0_R(1UL << 45)
>>   #define CTXDESC_CD_0_A(1UL << 46)
>> @@ -604,6 +610,8 @@ struct arm_smmu_device {
>>   #define ARM_SMMU_FEAT_RANGE_INV(1 << 15)
>>   #define ARM_SMMU_FEAT_BTM(1 << 16)
>>   #define ARM_SMMU_FEAT_SVA(1 << 17)
>> +#define ARM_SMMU_FEAT_

Re: [RFC] Use SMMU HTTU for DMA dirty page tracking

2021-02-04 Thread Keqian Zhu
Hi Jean and Kevin,

FYI, I sent out the SMMUv3 HTTU support for DMA dirty tracking[1] a week ago.

Thanks,
Keqian

[1] 
https://lore.kernel.org/linux-iommu/20210128151742.18840-1-zhukeqi...@huawei.com/

On 2020/5/27 17:14, Jean-Philippe Brucker wrote:
> On Wed, May 27, 2020 at 08:40:47AM +, Tian, Kevin wrote:
>>> From: Xiang Zheng 
>>> Sent: Wednesday, May 27, 2020 2:45 PM
>>>
>>>
>>> On 2020/5/27 11:27, Tian, Kevin wrote:
> From: Xiang Zheng
> Sent: Monday, May 25, 2020 7:34 PM
>
> [+cc Kirti, Yan, Alex]
>
> On 2020/5/23 1:14, Jean-Philippe Brucker wrote:
>> Hi,
>>
>> On Tue, May 19, 2020 at 05:42:55PM +0800, Xiang Zheng wrote:
>>> Hi all,
>>>
>>> Is there any plan for enabling SMMU HTTU?
>>
>> Not outside of SVA, as far as I know.
>>
>
>>> I have seen the patch locates in the SVA series patch, which adds
>>> support for HTTU:
>>> https://www.spinics.net/lists/arm-kernel/msg798694.html
>>>
>>> HTTU reduces the number of access faults on SMMU fault queue
>>> (permission faults also benefit from it).
>>>
>>> Besides reducing the faults, HTTU also helps to track dirty pages for
>>> device DMA. Is it feasible to utilize HTTU to get dirty pages on device
>>> DMA during VFIO live migration?
>>
>> As you know there is a VFIO interface for this under discussion:
>> https://lore.kernel.org/kvm/1589781397-28368-1-git-send-email-
> kwankh...@nvidia.com/
>> It doesn't implement an internal API to communicate with the IOMMU
> driver
>> about dirty pages.

 We plan to add such API later, e.g. to utilize A/D bit in VT-d 2nd-level
 page tables (Rev 3.0).

>>>
>>> Thank you, Kevin.
>>>
>>> When will you send this series patches? Maybe(Hope) we can also support
>>> hardware-based dirty pages tracking via common APIs based on your
>>> patches. :)
>>
>> Yan is working with Kirti on basic live migration support now. After that
>> part is done, we will start working on A/D bit support. Yes, common APIs
>> are definitely the goal here.
>>
>>>
>
>>
>>> If SMMU can track dirty pages, devices are not required to implement
>>> additional dirty pages tracking to support VFIO live migration.
>>
>> It seems feasible, though tracking it in the device might be more
>> efficient. I might have misunderstood but I think for live migration of
>> the Intel NIC they trap guest accesses to the device and introspect its
>> state to figure out which pages it is accessing.

 Does HTTU implement A/D-like mechanism in SMMU page tables, or just
 report dirty pages in a log buffer? Either way tracking dirty pages in 
 IOMMU
 side is generic thus doesn't require device-specific tweak like in Intel 
 NIC.

>>>
>>> Currently HTTU just implement A/D-like mechanism in SMMU page tables.
>>> We certainly
>>> expect SMMU can also implement PML-like feature so that we can avoid
>>> walking the
>>> whole page table to get the dirty pages.
> 
> There is no reporting of dirty pages in log buffer. It might be possible
> to do software logging based on PRI or Stall, but that requires special
> support in the endpoint as well as the SMMU.
> 
>> Is there a link to HTTU introduction?
> 
> I don't know any gentle introduction, but there are sections D5.4.11
> "Hardware management of the Access flag and dirty state" in the ARM
> Architecture Reference Manual (DDI0487E), and section 3.13 "Translation
> table entries and Access/Dirty flags" in the SMMU specification
> (IHI0070C). HTTU stands for "Hardware Translation Table Update".
> 
> In short, when HTTU is enabled, the SMMU translation performs an atomic
> read-modify-write on the leaf translation table descriptor, setting some
> bits depending on the type of memory access. This can be enabled
> independently on both stage-1 and stage-2 tables (equivalent to your 1st
> and 2nd page tables levels, I think).
> 
> Thanks,
> Jean


Re: [PATCH] iommu: Update the document of IOMMU_DOMAIN_UNMANAGED

2021-02-02 Thread Keqian Zhu



On 2021/2/2 20:58, Robin Murphy wrote:
> On 2021-02-02 08:53, Keqian Zhu wrote:
>> Signed-off-by: Keqian Zhu 
>> ---
>>   include/linux/iommu.h | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index 77e561ed57fd..e8f2efae212b 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -68,7 +68,7 @@ struct iommu_domain_geometry {
>>*  devices
>>*IOMMU_DOMAIN_IDENTITY- DMA addresses are system physical 
>> addresses
>>*IOMMU_DOMAIN_UNMANAGED- DMA mappings managed by IOMMU-API user, 
>> used
>> - *  for VMs
>> + *  for VMs or userspace driver frameworks
> 
> Given that "VMs" effectively has to mean VFIO, doesn't it effectively already 
> imply other uses of VFIO anyway? Unmanaged domains are also used in other 
> subsystems/drivers inside the kernel and we're not naming those, so I don't 
> see that it's particularly helpful to specifically call out one more VFIO 
> use-case.
> 
> Perhaps the current wording could be generalised a little more, but we 
> certainly don't want to start trying to maintain an exhaustive list of users 
> here...
Yep, a more generalised description is better. I will take a look at all the
use cases first...

Thanks,
Keqian

> 
> Robin.
> 
>>*IOMMU_DOMAIN_DMA- Internally used for DMA-API implementations.
>>*  This flag allows IOMMU drivers to implement
>>*  certain optimizations for these domains
>>
> .
> 


Re: [PATCH v11 01/13] vfio: VFIO_IOMMU_SET_PASID_TABLE

2021-02-02 Thread Keqian Zhu
Hi Eric,

On 2020/11/16 19:00, Eric Auger wrote:
> From: "Liu, Yi L" 
> 
> This patch adds an VFIO_IOMMU_SET_PASID_TABLE ioctl
> which aims to pass the virtual iommu guest configuration
> to the host. This latter takes the form of the so-called
> PASID table.
> 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Liu, Yi L 
> Signed-off-by: Eric Auger 
> 
> ---
> v11 -> v12:
> - use iommu_uapi_set_pasid_table
> - check SET and UNSET are not set simultaneously (Zenghui)
> 
> v8 -> v9:
> - Merge VFIO_IOMMU_ATTACH/DETACH_PASID_TABLE into a single
>   VFIO_IOMMU_SET_PASID_TABLE ioctl.
> 
> v6 -> v7:
> - add a comment related to VFIO_IOMMU_DETACH_PASID_TABLE
> 
> v3 -> v4:
> - restore ATTACH/DETACH
> - add unwind on failure
> 
> v2 -> v3:
> - s/BIND_PASID_TABLE/SET_PASID_TABLE
> 
> v1 -> v2:
> - s/BIND_GUEST_STAGE/BIND_PASID_TABLE
> - remove the struct device arg
> ---
>  drivers/vfio/vfio_iommu_type1.c | 65 +
>  include/uapi/linux/vfio.h   | 19 ++
>  2 files changed, 84 insertions(+)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 67e827638995..87ddd9e882dc 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -2587,6 +2587,41 @@ static int vfio_iommu_iova_build_caps(struct 
> vfio_iommu *iommu,
>   return ret;
>  }
>  
> +static void
> +vfio_detach_pasid_table(struct vfio_iommu *iommu)
> +{
> + struct vfio_domain *d;
> +
> + mutex_lock(&iommu->lock);
> + list_for_each_entry(d, &iommu->domain_list, next)
> + iommu_detach_pasid_table(d->domain);
> +
> + mutex_unlock(&iommu->lock);
> +}
> +
> +static int
> +vfio_attach_pasid_table(struct vfio_iommu *iommu, unsigned long arg)
> +{
> + struct vfio_domain *d;
> + int ret = 0;
> +
> + mutex_lock(&iommu->lock);
> +
> + list_for_each_entry(d, &iommu->domain_list, next) {
> + ret = iommu_uapi_attach_pasid_table(d->domain, (void __user 
> *)arg);
This design is not very clear to me. It assumes all iommu_domains share the
same pasid table.

As I understand it, that is reasonable when there is only one group in the
domain and only one domain in the vfio_iommu.
If there is more than one group in the vfio_iommu, the guest may put them into
different guest iommu_domains, and then they have different pasid tables.

Is this the intended use scenario?

Thanks,
Keqian

> + if (ret)
> + goto unwind;
> + }
> + goto unlock;
> +unwind:
> + list_for_each_entry_continue_reverse(d, &iommu->domain_list, next) {
> + iommu_detach_pasid_table(d->domain);
> + }
> +unlock:
> + mutex_unlock(&iommu->lock);
> + return ret;
> +}
> +
>  static int vfio_iommu_migration_build_caps(struct vfio_iommu *iommu,
>  struct vfio_info_cap *caps)
>  {
> @@ -2747,6 +2782,34 @@ static int vfio_iommu_type1_unmap_dma(struct 
> vfio_iommu *iommu,
>   -EFAULT : 0;
>  }
>  
> +static int vfio_iommu_type1_set_pasid_table(struct vfio_iommu *iommu,
> + unsigned long arg)
> +{
> + struct vfio_iommu_type1_set_pasid_table spt;
> + unsigned long minsz;
> + int ret = -EINVAL;
> +
> + minsz = offsetofend(struct vfio_iommu_type1_set_pasid_table, flags);
> +
> + if (copy_from_user(&spt, (void __user *)arg, minsz))
> + return -EFAULT;
> +
> + if (spt.argsz < minsz)
> + return -EINVAL;
> +
> + if (spt.flags & VFIO_PASID_TABLE_FLAG_SET &&
> + spt.flags & VFIO_PASID_TABLE_FLAG_UNSET)
> + return -EINVAL;
> +
> + if (spt.flags & VFIO_PASID_TABLE_FLAG_SET)
> + ret = vfio_attach_pasid_table(iommu, arg + minsz);
> + else if (spt.flags & VFIO_PASID_TABLE_FLAG_UNSET) {
> + vfio_detach_pasid_table(iommu);
> + ret = 0;
> + }
> + return ret;
> +}
> +
>  static int vfio_iommu_type1_dirty_pages(struct vfio_iommu *iommu,
>   unsigned long arg)
>  {
> @@ -2867,6 +2930,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>   return vfio_iommu_type1_unmap_dma(iommu, arg);
>   case VFIO_IOMMU_DIRTY_PAGES:
>   return vfio_iommu_type1_dirty_pages(iommu, arg);
> + case VFIO_IOMMU_SET_PASID_TABLE:
> + return vfio_iommu_type1_set_pasid_table(iommu, arg);
>   default:
>   return -ENOTTY;
>   }
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 2f313a238a8f..78ce3ce6c331 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -14,6 +14,7 @@
>  
>  #include 
>  #include 
> +#include 
>  
>  #define VFIO_API_VERSION 0
>  
> @@ -1180,6 +1181,24 @@ struct vfio_iommu_type1_dirty_bitmap_get {
>  
>  #define VFIO_IOMMU_DIRTY_PAGES _IO(VFIO_TYPE, VFIO_BASE + 17)
>  
> +/*
> + * VFIO_IOMMU_SET_PASID_TABLE - _IOWR(VFIO_TYPE, VFIO_BASE + 22,
> + *   struct vfio_iommu_type1_set_pasid_table)
> + 

Re: [PATCH v11 03/13] vfio: VFIO_IOMMU_SET_MSI_BINDING

2021-02-02 Thread Keqian Zhu
Hi Eric,

On 2020/11/16 19:00, Eric Auger wrote:
> This patch adds the VFIO_IOMMU_SET_MSI_BINDING ioctl which aim
> to (un)register the guest MSI binding to the host. This latter
> then can use those stage 1 bindings to build a nested stage
> binding targeting the physical MSIs.
[...]

> +static int vfio_iommu_type1_set_msi_binding(struct vfio_iommu *iommu,
> + unsigned long arg)
> +{
> + struct vfio_iommu_type1_set_msi_binding msi_binding;
> + unsigned long minsz;
> + int ret = -EINVAL;
> +
> + minsz = offsetofend(struct vfio_iommu_type1_set_msi_binding,
> + size);
> +
> + if (copy_from_user(&msi_binding, (void __user *)arg, minsz))
> + return -EFAULT;
> +
> + if (msi_binding.argsz < minsz)
> + return -EINVAL;
We could check that BIND and UNBIND are not set simultaneously, just like
VFIO_IOMMU_SET_PASID_TABLE does.
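
A minimal sketch of that check, reusing the flag names from the quoted patch
below (illustration only):

	if ((msi_binding.flags & VFIO_IOMMU_BIND_MSI) &&
	    (msi_binding.flags & VFIO_IOMMU_UNBIND_MSI))
		return -EINVAL;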

> +
> + if (msi_binding.flags == VFIO_IOMMU_UNBIND_MSI) {
> + vfio_unbind_msi(iommu, msi_binding.iova);
> + ret = 0;
> + } else if (msi_binding.flags == VFIO_IOMMU_BIND_MSI) {
> + ret = vfio_bind_msi(iommu, msi_binding.iova,
> + msi_binding.gpa, msi_binding.size);
> + }
> + return ret;
> +}
> +

Thanks,
Keqian


[PATCH] iommu: Update the document of IOMMU_DOMAIN_UNMANAGED

2021-02-02 Thread Keqian Zhu
Signed-off-by: Keqian Zhu 
---
 include/linux/iommu.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 77e561ed57fd..e8f2efae212b 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -68,7 +68,7 @@ struct iommu_domain_geometry {
  *   devices
  * IOMMU_DOMAIN_IDENTITY   - DMA addresses are system physical addresses
  * IOMMU_DOMAIN_UNMANAGED  - DMA mappings managed by IOMMU-API user, used
- *   for VMs
+ *   for VMs or userspace driver frameworks
  * IOMMU_DOMAIN_DMA- Internally used for DMA-API implementations.
  *   This flag allows IOMMU drivers to implement
  *   certain optimizations for these domains
-- 
2.19.1



Re: [PATCH v13 06/15] iommu/smmuv3: Implement attach/detach_pasid_table

2021-02-02 Thread Keqian Zhu
Hi Eric,

On 2020/11/18 19:21, Eric Auger wrote:
> On attach_pasid_table() we program STE S1 related info set
> by the guest into the actual physical STEs. At minimum
> we need to program the context descriptor GPA and compute
> whether the stage1 is translated/bypassed or aborted.
> 
> Signed-off-by: Eric Auger 
> 
> ---
> v7 -> v8:
> - remove smmu->features check, now done on domain finalize
> 
> v6 -> v7:
> - check versions and comment the fact we don't need to take
>   into account s1dss and s1fmt
> v3 -> v4:
> - adapt to changes in iommu_pasid_table_config
> - different programming convention at s1_cfg/s2_cfg/ste.abort
> 
> v2 -> v3:
> - callback now is named set_pasid_table and struct fields
>   are laid out differently.
> 
> v1 -> v2:
> - invalidate the STE before changing them
> - hold init_mutex
> - handle new fields
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 89 +
>  1 file changed, 89 insertions(+)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 412ea1bafa50..805acdc18a3a 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -2661,6 +2661,93 @@ static void arm_smmu_get_resv_regions(struct device 
> *dev,
>   iommu_dma_get_resv_regions(dev, head);
>  }
>  
> +static int arm_smmu_attach_pasid_table(struct iommu_domain *domain,
> +struct iommu_pasid_table_config *cfg)
> +{
> + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> + struct arm_smmu_master *master;
> + struct arm_smmu_device *smmu;
> + unsigned long flags;
> + int ret = -EINVAL;
> +
> + if (cfg->format != IOMMU_PASID_FORMAT_SMMUV3)
> + return -EINVAL;
> +
> + if (cfg->version != PASID_TABLE_CFG_VERSION_1 ||
> + cfg->vendor_data.smmuv3.version != PASID_TABLE_SMMUV3_CFG_VERSION_1)
> + return -EINVAL;
> +
> + mutex_lock(&smmu_domain->init_mutex);
> +
> + smmu = smmu_domain->smmu;
> +
> + if (!smmu)
> + goto out;
> +
> + if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
> + goto out;
> +
> + switch (cfg->config) {
> + case IOMMU_PASID_CONFIG_ABORT:
> + smmu_domain->s1_cfg.set = false;
> + smmu_domain->abort = true;
> + break;
> + case IOMMU_PASID_CONFIG_BYPASS:
> + smmu_domain->s1_cfg.set = false;
> + smmu_domain->abort = false;
I didn't test it, but it seems this will trigger the BUG() in
arm_smmu_write_strtab_ent(), at the line "BUG_ON(ste_live && !nested);".
Maybe I am missing something?

> + break;
> + case IOMMU_PASID_CONFIG_TRANSLATE:
> + /* we do not support S1 <-> S1 transitions */
> + if (smmu_domain->s1_cfg.set)
> + goto out;
> +
> + /*
> +  * we currently support a single CD so s1fmt and s1dss
> +  * fields are also ignored
> +  */
> + if (cfg->pasid_bits)
> + goto out;
> +
> + smmu_domain->s1_cfg.cdcfg.cdtab_dma = cfg->base_ptr;
> + smmu_domain->s1_cfg.set = true;
> + smmu_domain->abort = false;
> + break;
> + default:
> + goto out;
> + }
> + spin_lock_irqsave(&smmu_domain->devices_lock, flags);
> + list_for_each_entry(master, &smmu_domain->devices, domain_head)
> + arm_smmu_install_ste_for_dev(master);
> + spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
> + ret = 0;
> +out:
> + mutex_unlock(&smmu_domain->init_mutex);
> + return ret;
> +}
> +
[...]

Thanks,
Keqian


Re: [PATCH v13 03/15] iommu/arm-smmu-v3: Maintain a SID->device structure

2021-02-01 Thread Keqian Zhu
Hi Eric,

On 2021/2/2 1:19, Auger Eric wrote:
> Hi Keqian,
> 
> On 2/1/21 1:26 PM, Keqian Zhu wrote:
>> Hi Eric,
>>
>> On 2020/11/18 19:21, Eric Auger wrote:
>>> From: Jean-Philippe Brucker 
>>>
>>> When handling faults from the event or PRI queue, we need to find the
>>> struct device associated to a SID. Add a rb_tree to keep track of SIDs.
>>>
>>> Signed-off-by: Jean-Philippe Brucker 
>> [...]
>>
>>>  }
>>>  
>>> +static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
>>> + struct arm_smmu_master *master)
[...]

>>> kfree(master);
>>
>> Thanks,
>> Keqian
>>
> Thank you for the review. Jean will address this issues in his own
> series and on my end I will rebase on this latter.
> 
> Best Regards
> 
> Eric
>

Yeah, and I hope this series can be accepted soon ;-)

Thanks,
Keqian


Re: [PATCH v13 05/15] iommu/smmuv3: Get prepared for nested stage support

2021-02-01 Thread Keqian Zhu
Hi Eric,

On 2020/11/18 19:21, Eric Auger wrote:
> When nested stage translation is setup, both s1_cfg and
> s2_cfg are set.
> 
> We introduce a new smmu domain abort field that will be set
> upon guest stage1 configuration passing.
> 
> arm_smmu_write_strtab_ent() is modified to write both stage
> fields in the STE and deal with the abort field.
> 
> In nested mode, only stage 2 is "finalized" as the host does
> not own/configure the stage 1 context descriptor; guest does.
> 
> Signed-off-by: Eric Auger 
> 
> ---
> v10 -> v11:
> - Fix an issue reported by Shameer when switching from with vSMMU
>   to without vSMMU. Despite the spec does not seem to mention it
>   seems to be needed to reset the 2 high 64b when switching from
>   S1+S2 cfg to S1 only. Especially dst[3] needs to be reset (S2TTB).
>   On some implementations, if the S2TTB is not reset, this causes
>   a C_BAD_STE error
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 64 +
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  2 +
>  2 files changed, 56 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 18ac5af1b284..412ea1bafa50 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -1181,8 +1181,10 @@ static void arm_smmu_write_strtab_ent(struct 
> arm_smmu_master *master, u32 sid,
>* three cases at the moment:
>*
>* 1. Invalid (all zero) -> bypass/fault (init)
> -  * 2. Bypass/fault -> translation/bypass (attach)
> -  * 3. Translation/bypass -> bypass/fault (detach)
> +  * 2. Bypass/fault -> single stage translation/bypass (attach)
> +  * 3. Single or nested stage Translation/bypass -> bypass/fault (detach)
> +  * 4. S2 -> S1 + S2 (attach_pasid_table)
> +  * 5. S1 + S2 -> S2 (detach_pasid_table)

The following line, "BUG_ON(ste_live && !nested);", forbids this transition.
Also, looking at the 6th patch, the transition seems to be S1 + S2 -> abort,
so after detach the state is not the same as before attach. Does that match
our expectation?

>*
>* Given that we can't update the STE atomically and the SMMU
>* doesn't read the thing in a defined order, that leaves us
> @@ -1193,7 +1195,8 @@ static void arm_smmu_write_strtab_ent(struct 
> arm_smmu_master *master, u32 sid,
>* 3. Update Config, sync
>*/
>   u64 val = le64_to_cpu(dst[0]);
> - bool ste_live = false;
> + bool s1_live = false, s2_live = false, ste_live;
> + bool abort, nested = false, translate = false;
>   struct arm_smmu_device *smmu = NULL;
>   struct arm_smmu_s1_cfg *s1_cfg;
>   struct arm_smmu_s2_cfg *s2_cfg;
> @@ -1233,6 +1236,8 @@ static void arm_smmu_write_strtab_ent(struct 
> arm_smmu_master *master, u32 sid,
>   default:
>   break;
>   }
> + nested = s1_cfg->set && s2_cfg->set;
> + translate = s1_cfg->set || s2_cfg->set;
>   }
>  
>   if (val & STRTAB_STE_0_V) {
> @@ -1240,23 +1245,36 @@ static void arm_smmu_write_strtab_ent(struct 
> arm_smmu_master *master, u32 sid,
>   case STRTAB_STE_0_CFG_BYPASS:
>   break;
>   case STRTAB_STE_0_CFG_S1_TRANS:
> + s1_live = true;
> + break;
>   case STRTAB_STE_0_CFG_S2_TRANS:
> - ste_live = true;
> + s2_live = true;
> + break;
> + case STRTAB_STE_0_CFG_NESTED:
> + s1_live = true;
> + s2_live = true;
>   break;
>   case STRTAB_STE_0_CFG_ABORT:
> - BUG_ON(!disable_bypass);
>   break;
>   default:
>   BUG(); /* STE corruption */
>   }
>   }
>  
> + ste_live = s1_live || s2_live;
> +
>   /* Nuke the existing STE_0 value, as we're going to rewrite it */
>   val = STRTAB_STE_0_V;
>  
>   /* Bypass/fault */
> - if (!smmu_domain || !(s1_cfg->set || s2_cfg->set)) {
> - if (!smmu_domain && disable_bypass)
> +
> + if (!smmu_domain)
> + abort = disable_bypass;
> + else
> + abort = smmu_domain->abort;
> +
> + if (abort || !translate) {
> + if (abort)
>   val |= FIELD_PREP(STRTAB_STE_0_CFG, 
> STRTAB_STE_0_CFG_ABORT);
>   else
>   val |= FIELD_PREP(STRTAB_STE_0_CFG, 
> STRTAB_STE_0_CFG_BYPASS);
> @@ -1274,8 +1292,16 @@ static void arm_smmu_write_strtab_ent(struct 
> arm_smmu_master *master, u32 sid,
>   return;
>   }
>  
> + BUG_ON(ste_live && !nested);
> +
> + if (ste_live) {
> + /* First invalidate the live STE */
> + dst[0] = cpu_to_le64(STRTAB_STE_0_CFG_ABORT);
> + 

Re: [PATCH v13 03/15] iommu/arm-smmu-v3: Maintain a SID->device structure

2021-02-01 Thread Keqian Zhu
Hi Jean,

On 2021/2/1 23:15, Jean-Philippe Brucker wrote:
> On Mon, Feb 01, 2021 at 08:26:41PM +0800, Keqian Zhu wrote:
>>> +static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
>>> + struct arm_smmu_master *master)
>>> +{
>>> +   int i;
>>> +   int ret = 0;
>>> +   struct arm_smmu_stream *new_stream, *cur_stream;
>>> +   struct rb_node **new_node, *parent_node = NULL;
>>> +   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev);
>>> +
>>> +   master->streams = kcalloc(fwspec->num_ids,
>>> + sizeof(struct arm_smmu_stream), GFP_KERNEL);
>>> +   if (!master->streams)
>>> +   return -ENOMEM;
>>> +   master->num_streams = fwspec->num_ids;
>> This is not roll-backed when fail.
> 
> No need, the caller frees master
OK.

> 
>>> +
>>> +   mutex_lock(&smmu->streams_mutex);
>>> +   for (i = 0; i < fwspec->num_ids && !ret; i++) {
>> Check ret at here, makes it hard to decide the start index of rollback.
>>
>> If we fail at here, then start index is (i-2).
>> If we fail in the loop, then start index is (i-1).
>>
> [...]
>>> +   if (ret) {
>>> +   for (; i > 0; i--)
>> should be (i >= 0)?
>> And the start index seems not correct.
> 
> Indeed, this whole bit is wrong. I'll fix it while resending the IOPF
> series.
> 
> Thanks,
> Jean
OK, I am glad it helps.
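
For reference, this is roughly the shape I had in mind (illustration only; the
example_insert_one_stream() helper is hypothetical and stands for the range
check, l2 strtab init and rb-tree insertion of one entry):

static int example_insert_streams(struct arm_smmu_device *smmu,
				  struct arm_smmu_master *master,
				  struct iommu_fwspec *fwspec)
{
	int i, ret = 0;

	mutex_lock(&smmu->streams_mutex);
	for (i = 0; i < fwspec->num_ids; i++) {
		ret = example_insert_one_stream(smmu, master, i);
		if (ret)
			break;		/* streams[i] was NOT inserted */
	}

	if (ret) {
		while (i--)		/* unwind streams[i-1] ... streams[0] */
			rb_erase(&master->streams[i].node, &smmu->streams);
		kfree(master->streams);
		master->streams = NULL;
	}
	mutex_unlock(&smmu->streams_mutex);

	return ret;
}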

Thanks,
Keqian


Re: [PATCH v13 04/15] iommu/smmuv3: Allow s1 and s2 configs to coexist

2021-02-01 Thread Keqian Zhu
Hi Eric,

On 2020/11/18 19:21, Eric Auger wrote:
> In true nested mode, both s1_cfg and s2_cfg will coexist.
> Let's remove the union and add a "set" field in each
> config structure telling whether the config is set and needs
> to be applied when writing the STE. In legacy nested mode,
> only the 2d stage is used. In true nested mode, the "set" field
nit: s/2d/2nd

> will be set when the guest passes the pasid table.
nit: ... the "set" field of s1_cfg and s2_cfg will be set ...

> 
> Signed-off-by: Eric Auger 
> 
> ---
> v12 -> v13:
> - does not dynamically allocate s1-cfg and s2_cfg anymore. Add
>   the set field
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 43 +
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  8 ++--
>  2 files changed, 31 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 1e4acc7f3d3c..18ac5af1b284 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -1195,8 +1195,8 @@ static void arm_smmu_write_strtab_ent(struct 
> arm_smmu_master *master, u32 sid,
>   u64 val = le64_to_cpu(dst[0]);
>   bool ste_live = false;
>   struct arm_smmu_device *smmu = NULL;
> - struct arm_smmu_s1_cfg *s1_cfg = NULL;
> - struct arm_smmu_s2_cfg *s2_cfg = NULL;
> + struct arm_smmu_s1_cfg *s1_cfg;
> + struct arm_smmu_s2_cfg *s2_cfg;
>   struct arm_smmu_domain *smmu_domain = NULL;
>   struct arm_smmu_cmdq_ent prefetch_cmd = {
>   .opcode = CMDQ_OP_PREFETCH_CFG,
> @@ -1211,13 +1211,24 @@ static void arm_smmu_write_strtab_ent(struct 
> arm_smmu_master *master, u32 sid,
>   }
>  
>   if (smmu_domain) {
> + s1_cfg = &smmu_domain->s1_cfg;
> + s2_cfg = &smmu_domain->s2_cfg;
> +
>   switch (smmu_domain->stage) {
>   case ARM_SMMU_DOMAIN_S1:
> - s1_cfg = _domain->s1_cfg;
> + s1_cfg->set = true;
> + s2_cfg->set = false;
>   break;
>   case ARM_SMMU_DOMAIN_S2:
> + s1_cfg->set = false;
> + s2_cfg->set = true;
> + break;
>   case ARM_SMMU_DOMAIN_NESTED:
> - s2_cfg = &smmu_domain->s2_cfg;
> + /*
> +  * Actual usage of stage 1 depends on nested mode:
> +  * legacy (2d stage only) or true nested mode
> +  */
> + s2_cfg->set = true;
>   break;
>   default:
>   break;
> @@ -1244,7 +1255,7 @@ static void arm_smmu_write_strtab_ent(struct 
> arm_smmu_master *master, u32 sid,
>   val = STRTAB_STE_0_V;
>  
>   /* Bypass/fault */
> - if (!smmu_domain || !(s1_cfg || s2_cfg)) {
> + if (!smmu_domain || !(s1_cfg->set || s2_cfg->set)) {
>   if (!smmu_domain && disable_bypass)
>   val |= FIELD_PREP(STRTAB_STE_0_CFG, 
> STRTAB_STE_0_CFG_ABORT);
>   else
> @@ -1263,7 +1274,7 @@ static void arm_smmu_write_strtab_ent(struct 
> arm_smmu_master *master, u32 sid,
>   return;
>   }
>  
> - if (s1_cfg) {
> + if (s1_cfg->set) {
>   BUG_ON(ste_live);
>   dst[1] = cpu_to_le64(
>FIELD_PREP(STRTAB_STE_1_S1DSS, 
> STRTAB_STE_1_S1DSS_SSID0) |
> @@ -1282,7 +1293,7 @@ static void arm_smmu_write_strtab_ent(struct 
> arm_smmu_master *master, u32 sid,
>   FIELD_PREP(STRTAB_STE_0_S1FMT, s1_cfg->s1fmt);
>   }
>  
> - if (s2_cfg) {
> + if (s2_cfg->set) {
>   BUG_ON(ste_live);
>   dst[2] = cpu_to_le64(
>FIELD_PREP(STRTAB_STE_2_S2VMID, s2_cfg->vmid) |
> @@ -1846,24 +1857,24 @@ static void arm_smmu_domain_free(struct iommu_domain 
> *domain)
>  {
>   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>   struct arm_smmu_device *smmu = smmu_domain->smmu;
> + struct arm_smmu_s1_cfg *s1_cfg = &smmu_domain->s1_cfg;
> + struct arm_smmu_s2_cfg *s2_cfg = &smmu_domain->s2_cfg;
>  
>   iommu_put_dma_cookie(domain);
>   free_io_pgtable_ops(smmu_domain->pgtbl_ops);
>  
>   /* Free the CD and ASID, if we allocated them */
> - if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
> - struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
> -
> + if (s1_cfg->set) {
>   /* Prevent SVA from touching the CD while we're freeing it */
>   mutex_lock(&arm_smmu_asid_lock);
> - if (cfg->cdcfg.cdtab)
> + if (s1_cfg->cdcfg.cdtab)
>   arm_smmu_free_cd_tables(smmu_domain);
> - arm_smmu_free_asid(&cfg->cd);
> + arm_smmu_free_asid(&s1_cfg->cd);
>   mutex_unlock(&arm_smmu_asid_lock);
> - } else {
> - struct arm_smmu_s2_cfg *cfg = 

Re: [PATCH v13 03/15] iommu/arm-smmu-v3: Maintain a SID->device structure

2021-02-01 Thread Keqian Zhu
Hi Eric,

On 2020/11/18 19:21, Eric Auger wrote:
> From: Jean-Philippe Brucker 
> 
> When handling faults from the event or PRI queue, we need to find the
> struct device associated to a SID. Add a rb_tree to keep track of SIDs.
> 
> Signed-off-by: Jean-Philippe Brucker 
[...]

>  }
>  
> +static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
> +   struct arm_smmu_master *master)
> +{
> + int i;
> + int ret = 0;
> + struct arm_smmu_stream *new_stream, *cur_stream;
> + struct rb_node **new_node, *parent_node = NULL;
> + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev);
> +
> + master->streams = kcalloc(fwspec->num_ids,
> +   sizeof(struct arm_smmu_stream), GFP_KERNEL);
> + if (!master->streams)
> + return -ENOMEM;
> + master->num_streams = fwspec->num_ids;
This is not rolled back on failure.

> +
> + mutex_lock(&smmu->streams_mutex);
> + for (i = 0; i < fwspec->num_ids && !ret; i++) {
Checking ret here makes it hard to decide the start index of the rollback.

If we fail here, the start index is (i-2).
If we fail inside the loop, the start index is (i-1).

> + u32 sid = fwspec->ids[i];
> +
> + new_stream = >streams[i];
> + new_stream->id = sid;
> + new_stream->master = master;
> +
> + /*
> +  * Check the SIDs are in range of the SMMU and our stream table
> +  */
> + if (!arm_smmu_sid_in_range(smmu, sid)) {
> + ret = -ERANGE;
> + break;
> + }
> +
> + /* Ensure l2 strtab is initialised */
> + if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) {
> + ret = arm_smmu_init_l2_strtab(smmu, sid);
> + if (ret)
> + break;
> + }
> +
> + /* Insert into SID tree */
> + new_node = &(smmu->streams.rb_node);
> + while (*new_node) {
> + cur_stream = rb_entry(*new_node, struct arm_smmu_stream,
> +   node);
> + parent_node = *new_node;
> + if (cur_stream->id > new_stream->id) {
> + new_node = &((*new_node)->rb_left);
> + } else if (cur_stream->id < new_stream->id) {
> + new_node = &((*new_node)->rb_right);
> + } else {
> + dev_warn(master->dev,
> +  "stream %u already in tree\n",
> +  cur_stream->id);
> + ret = -EINVAL;
> + break;
> + }
> + }
> +
> + if (!ret) {
> + rb_link_node(&new_stream->node, parent_node, new_node);
> + rb_insert_color(&new_stream->node, &smmu->streams);
> + }
> + }
> +
> + if (ret) {
> + for (; i > 0; i--)
Should this be (i >= 0)?
And the start index does not seem correct.

> + rb_erase(&master->streams[i].node, &smmu->streams);
> + kfree(master->streams);
> + }
> + mutex_unlock(&smmu->streams_mutex);
> +
> + return ret;
> +}
> +
> +static void arm_smmu_remove_master(struct arm_smmu_master *master)
> +{
> + int i;
> + struct arm_smmu_device *smmu = master->smmu;
> + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev);
> +
> + if (!smmu || !master->streams)
> + return;
> +
> + mutex_lock(&smmu->streams_mutex);
> + for (i = 0; i < fwspec->num_ids; i++)
> + rb_erase(&master->streams[i].node, &smmu->streams);
> + mutex_unlock(&smmu->streams_mutex);
> +
> + kfree(master->streams);
> +}
> +
>  static struct iommu_ops arm_smmu_ops;
>  
>  static struct iommu_device *arm_smmu_probe_device(struct device *dev)
>  {
> - int i, ret;
> + int ret;
>   struct arm_smmu_device *smmu;
>   struct arm_smmu_master *master;
>   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> @@ -2331,27 +2447,12 @@ static struct iommu_device 
> *arm_smmu_probe_device(struct device *dev)
>  
>   master->dev = dev;
>   master->smmu = smmu;
> - master->sids = fwspec->ids;
> - master->num_sids = fwspec->num_ids;
>   INIT_LIST_HEAD(>bonds);
>   dev_iommu_priv_set(dev, master);
>  
> - /* Check the SIDs are in range of the SMMU and our stream table */
> - for (i = 0; i < master->num_sids; i++) {
> - u32 sid = master->sids[i];
> -
> - if (!arm_smmu_sid_in_range(smmu, sid)) {
> - ret = -ERANGE;
> - goto err_free_master;
> - }
> -
> - /* Ensure l2 strtab is initialised */
> - if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) {
> - ret = arm_smmu_init_l2_strtab(smmu, 

Re: [PATCH v13 02/15] iommu: Introduce bind/unbind_guest_msi

2021-02-01 Thread Keqian Zhu
Hi Eric,

On 2020/11/18 19:21, Eric Auger wrote:
> On ARM, MSI are translated by the SMMU. An IOVA is allocated
> for each MSI doorbell. If both the host and the guest are exposed
> with SMMUs, we end up with 2 different IOVAs allocated by each.
> guest allocates an IOVA (gIOVA) to map onto the guest MSI
> doorbell (gDB). The Host allocates another IOVA (hIOVA) to map
> onto the physical doorbell (hDB).
> 
> So we end up with 2 untied mappings:
>  S1S2
> gIOVA->gDB
>   hIOVA->hDB
> 
> Currently the PCI device is programmed by the host with hIOVA
> as MSI doorbell. So this does not work.
> 
> This patch introduces an API to pass gIOVA/gDB to the host so
> that gIOVA can be reused by the host instead of re-allocating
> a new IOVA. So the goal is to create the following nested mapping:
Can the gDB be reused in non-nested mode?

> 
>  S1S2
> gIOVA->gDB ->hDB
> 
> and program the PCI device with gIOVA MSI doorbell.
> 
> In case we have several devices attached to this nested domain
> (devices belonging to the same group), they cannot be isolated
> on guest side either. So they should also end up in the same domain
> on guest side. We will enforce that all the devices attached to
> the host iommu domain use the same physical doorbell and similarly
> a single virtual doorbell mapping gets registered (1 single
> virtual doorbell is used on guest as well).
> 
[...]

> + *
> + * The associated IOVA can be reused by the host to create a nested
> + * stage2 binding mapping translating into the physical doorbell used
> + * by the devices attached to the domain.
> + *
> + * All devices within the domain must share the same physical doorbell.
> + * A single MSI GIOVA/GPA mapping can be attached to an iommu_domain.
> + */
> +
> +int iommu_bind_guest_msi(struct iommu_domain *domain,
> +  dma_addr_t giova, phys_addr_t gpa, size_t size)
> +{
> + if (unlikely(!domain->ops->bind_guest_msi))
> + return -ENODEV;
> +
> + return domain->ops->bind_guest_msi(domain, giova, gpa, size);
> +}
> +EXPORT_SYMBOL_GPL(iommu_bind_guest_msi);
> +
> +void iommu_unbind_guest_msi(struct iommu_domain *domain,
> + dma_addr_t iova)
nit: s/iova/giova

> +{
> + if (unlikely(!domain->ops->unbind_guest_msi))
> + return;
> +
> + domain->ops->unbind_guest_msi(domain, iova);
> +}
> +EXPORT_SYMBOL_GPL(iommu_unbind_guest_msi);
> +
[...]

Thanks,
Keqian



Re: [PATCH v13 01/15] iommu: Introduce attach/detach_pasid_table API

2021-02-01 Thread Keqian Zhu
Hi Eric,

On 2020/11/18 19:21, Eric Auger wrote:
> In virtualization use case, when a guest is assigned
> a PCI host device, protected by a virtual IOMMU on the guest,
> the physical IOMMU must be programmed to be consistent with
> the guest mappings. If the physical IOMMU supports two
> translation stages it makes sense to program guest mappings
> onto the first stage/level (ARM/Intel terminology) while the host
> owns the stage/level 2.
> 
> In that case, it is mandated to trap on guest configuration
> settings and pass those to the physical iommu driver.
> 
> This patch adds a new API to the iommu subsystem that allows
> to set/unset the pasid table information.
> 
> A generic iommu_pasid_table_config struct is introduced in
> a new iommu.h uapi header. This is going to be used by the VFIO
> user API.
> 
> Signed-off-by: Jean-Philippe Brucker 
> Signed-off-by: Liu, Yi L 
> Signed-off-by: Ashok Raj 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Eric Auger 
> 
> ---
> 
> v12 -> v13:
> - Fix config check
> 
> v11 -> v12:
> - add argsz, name the union
> ---
>  drivers/iommu/iommu.c  | 68 ++
>  include/linux/iommu.h  | 21 
>  include/uapi/linux/iommu.h | 54 ++
>  3 files changed, 143 insertions(+)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index b53446bb8c6b..978fe34378fb 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2171,6 +2171,74 @@ int iommu_uapi_sva_unbind_gpasid(struct iommu_domain 
> *domain, struct device *dev
>  }
>  EXPORT_SYMBOL_GPL(iommu_uapi_sva_unbind_gpasid);
>  
> +int iommu_attach_pasid_table(struct iommu_domain *domain,
> +  struct iommu_pasid_table_config *cfg)
> +{
> + if (unlikely(!domain->ops->attach_pasid_table))
> + return -ENODEV;
> +
> + return domain->ops->attach_pasid_table(domain, cfg);
> +}
nit: missing EXPORT_SYMBOL_GPL()?

> +
> +int iommu_uapi_attach_pasid_table(struct iommu_domain *domain,
> +   void __user *uinfo)
> +{
> + struct iommu_pasid_table_config pasid_table_data = { 0 };
> + u32 minsz;
> +
> + if (unlikely(!domain->ops->attach_pasid_table))
> + return -ENODEV;
> +
> + /*
> +  * No new spaces can be added before the variable sized union, the
> +  * minimum size is the offset to the union.
> +  */
> + minsz = offsetof(struct iommu_pasid_table_config, vendor_data);
> +
> + /* Copy minsz from user to get flags and argsz */
> + if (copy_from_user(&pasid_table_data, uinfo, minsz))
> + return -EFAULT;
> +
> + /* Fields before the variable size union are mandatory */
> + if (pasid_table_data.argsz < minsz)
> + return -EINVAL;
> +
> + /* PASID and address granu require additional info beyond minsz */
> + if (pasid_table_data.version != PASID_TABLE_CFG_VERSION_1)
> + return -EINVAL;
> + if (pasid_table_data.format == IOMMU_PASID_FORMAT_SMMUV3 &&
> + pasid_table_data.argsz <
> + offsetofend(struct iommu_pasid_table_config, 
> vendor_data.smmuv3))
> + return -EINVAL;
> +
> + /*
> +  * User might be using a newer UAPI header which has a larger data
> +  * size, we shall support the existing flags within the current
> +  * size. Copy the remaining user data _after_ minsz but not more
> +  * than the current kernel supported size.
> +  */
> + if (copy_from_user((void *)&pasid_table_data + minsz, uinfo + minsz,
> +    min_t(u32, pasid_table_data.argsz,
> +          sizeof(pasid_table_data)) - minsz))
> + return -EFAULT;
> +
> + /* Now the argsz is validated, check the content */
> + if (pasid_table_data.config < IOMMU_PASID_CONFIG_TRANSLATE ||
> + pasid_table_data.config > IOMMU_PASID_CONFIG_ABORT)
> + return -EINVAL;
> +
> + return domain->ops->attach_pasid_table(domain, &pasid_table_data);
> +}
> +EXPORT_SYMBOL_GPL(iommu_uapi_attach_pasid_table);
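
For illustration, a sketch of how a user of this uapi might populate the config
before passing it in (the exact smmuv3 vendor_data fields live in the uapi
header and are not shown in this hunk, so they are only hinted at here):

	struct iommu_pasid_table_config cfg = {
		.argsz   = sizeof(cfg),	/* covers everything up to vendor_data.smmuv3 */
		.version = PASID_TABLE_CFG_VERSION_1,
		.format  = IOMMU_PASID_FORMAT_SMMUV3,
		.config  = IOMMU_PASID_CONFIG_TRANSLATE,
		/* .vendor_data.smmuv3 = { ... PASID table base, format, ... } */
	};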
> +
> +void iommu_detach_pasid_table(struct iommu_domain *domain)
> +{
> + if (unlikely(!domain->ops->detach_pasid_table))
> + return;
> +
> + domain->ops->detach_pasid_table(domain);
> +}
> +EXPORT_SYMBOL_GPL(iommu_detach_pasid_table);
> +
>  static void __iommu_detach_device(struct iommu_domain *domain,
> struct device *dev)
>  {
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index b95a6f8db6ff..464fcbecf841 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -223,6 +223,8 @@ struct iommu_iotlb_gather {
>   * @cache_invalidate: invalidate translation caches
>   * @sva_bind_gpasid: bind guest pasid and mm
>   * @sva_unbind_gpasid: unbind guest pasid and mm
> + * @attach_pasid_table: attach a pasid table
> + * @detach_pasid_table: detach the pasid table
>   * @def_domain_type: device default domain type, return value:
>   *   - IOMMU_DOMAIN_IDENTITY: 

Re: [PATCH v3 2/2] vfio/iommu_type1: Fix some sanity checks in detach group

2021-01-28 Thread Keqian Zhu



On 2021/1/28 7:46, Alex Williamson wrote:
> On Fri, 22 Jan 2021 17:26:35 +0800
> Keqian Zhu  wrote:
> 
>> vfio_sanity_check_pfn_list() is used to check whether pfn_list and
>> notifier are empty when remove the external domain, so it makes a
>> wrong assumption that only external domain will use the pinning
>> interface.
>>
>> Now we apply the pfn_list check when a vfio_dma is removed and apply
>> the notifier check when all domains are removed.
>>
>> Fixes: a54eb55045ae ("vfio iommu type1: Add support for mediated devices")
>> Signed-off-by: Keqian Zhu 
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 33 ++---
>>  1 file changed, 10 insertions(+), 23 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c 
>> b/drivers/vfio/vfio_iommu_type1.c
>> index 161725395f2f..d8c10f508321 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -957,6 +957,7 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, 
>> struct vfio_dma *dma,
>>  
>>  static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>>  {
>> +WARN_ON(!RB_EMPTY_ROOT(&dma->pfn_list));
>>  vfio_unmap_unpin(iommu, dma, true);
>>  vfio_unlink_dma(iommu, dma);
>>  put_task_struct(dma->task);
>> @@ -2250,23 +2251,6 @@ static void vfio_iommu_unmap_unpin_reaccount(struct 
>> vfio_iommu *iommu)
>>  }
>>  }
>>  
>> -static void vfio_sanity_check_pfn_list(struct vfio_iommu *iommu)
>> -{
>> -struct rb_node *n;
>> -
>> -n = rb_first(&iommu->dma_list);
>> -for (; n; n = rb_next(n)) {
>> -struct vfio_dma *dma;
>> -
>> -dma = rb_entry(n, struct vfio_dma, node);
>> -
>> -if (WARN_ON(!RB_EMPTY_ROOT(&dma->pfn_list)))
>> -break;
>> -}
>> -/* mdev vendor driver must unregister notifier */
>> -WARN_ON(iommu->notifier.head);
>> -}
>> -
>>  /*
>>   * Called when a domain is removed in detach. It is possible that
>>   * the removed domain decided the iova aperture window. Modify the
>> @@ -2366,10 +2350,10 @@ static void vfio_iommu_type1_detach_group(void 
>> *iommu_data,
>>  kfree(group);
>>  
>> if (list_empty(&iommu->external_domain->group_list)) {
>> -vfio_sanity_check_pfn_list(iommu);
>> -
>> -if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu))
>> +if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) {
>> +WARN_ON(iommu->notifier.head);
>>  vfio_iommu_unmap_unpin_all(iommu);
>> +}
>>  
>>  kfree(iommu->external_domain);
>>  iommu->external_domain = NULL;
>> @@ -2403,10 +2387,12 @@ static void vfio_iommu_type1_detach_group(void 
>> *iommu_data,
>>   */
>>  if (list_empty(&domain->group_list)) {
>>  if (list_is_singular(&iommu->domain_list)) {
>> -if (!iommu->external_domain)
>> +if (!iommu->external_domain) {
>> +WARN_ON(iommu->notifier.head);
>>  vfio_iommu_unmap_unpin_all(iommu);
>> -else
>> +} else {
>>  vfio_iommu_unmap_unpin_reaccount(iommu);
>> +}
>>  }
>>  iommu_domain_free(domain->domain);
>>  list_del(&domain->next);
>> @@ -2488,9 +2474,10 @@ static void vfio_iommu_type1_release(void *iommu_data)
>>  struct vfio_iommu *iommu = iommu_data;
>>  struct vfio_domain *domain, *domain_tmp;
>>  
>> +WARN_ON(iommu->notifier.head);
> 
> I don't see that this does any harm, but isn't it actually redundant?
> It seems vfio-core only calls the iommu backend release function after
> removing all the groups, so the tests in _detach_group should catch all
> cases.  We're expecting the vfio bus/mdev driver to remove the notifier
> when a device is closed, which necessarily occurs before detaching the
> group.  Thanks,
Right. Devices of a specific group must be closed before the group is detached.
Detaching the last group already performs this check, so
vfio_iommu_type1_release doesn't need to repeat it.

Could you please queue this patch and drop this check while you're at it? Thanks. ;-)

Keqian.

> 
> Alex
> 
>> +
>>  if (iommu->external_domain) {
>>  vfio_release_domain(iommu->external_domain, true);
>> -vfio_sanity_check_pfn_list(iommu);
>>  kfree(iommu->external_domain);
>>  }
>>  
> 
> .
> 


Re: [PATCH 1/1] iommu/arm-smmu-v3: add support for BBML

2021-01-28 Thread Keqian Zhu



On 2021/1/29 0:17, Robin Murphy wrote:
> On 2021-01-28 15:18, Keqian Zhu wrote:
>>
>>
>> On 2021/1/27 17:39, Robin Murphy wrote:
>>> On 2021-01-27 07:36, Keqian Zhu wrote:
>>>>
>>>>
>>>> On 2021/1/27 10:01, Leizhen (ThunderTown) wrote:
>>>>>
>>>>>
>>>>> On 2021/1/26 18:12, Will Deacon wrote:
>>>>>> On Mon, Jan 25, 2021 at 08:23:40PM +, Robin Murphy wrote:
>>>>>>> Now we probably will need some degreee of BBML feature awareness for the
>>>>>>> sake of SVA if and when we start using it for CPU pagetables, but I 
>>>>>>> still
>>>>>>> cannot see any need to consider it in io-pgtable.
>>>>>>
>>>>>> Agreed; I don't think this is something that io-pgtable should have to 
>>>>>> care
>>>>>> about.
>>>> Hi,
>>>>
>>>> I have a question here :-).
>>>> If the old table is not live, then the break procedure seems unnecessary.
>>>> Am I missing something?
>>>
>>> The MMU is allowed to prefetch translations at any time, so not following 
>>> the proper update procedure could still potentially lead to a TLB conflict, 
>>> even if there's no device traffic to worry about disrupting.
>>>
>>> Robin.
>>
>> Thanks. Does the MMU you mention here include both the CPU MMU and the SMMU?
>> I know that on the SMMU side, ATS can prefetch translations.
> 
> Yes, both - VMSAv8 allows speculative translation table walks, so SMMUv3 
> inherits from there (per 3.21.1 "Translation tables and TLB invalidation 
> completion behavior").
OK, I get it. Thanks.

Keqian.

> 
> Robin.
> 
>>
>> Keqian
>>>
>>>> Thanks,
>>>> Keqian
>>>>
>>>>>
>>>>> Yes, the SVA works in stall mode, and the failed device access requests 
>>>>> are not
>>>>> discarded.
>>>>>
>>>>> Let me look for examples. The BBML usage scenario was told by a former 
>>>>> colleague.
>>>>>
>>>>>>
>>>>>> Will
>>>>>>
>>>>>> .
>>>>>>
>>>>>
>>>>>
>>> .
>>>
> .
> 


[RFC PATCH 07/11] iommu/arm-smmu-v3: Clear dirty log according to bitmap

2021-01-28 Thread Keqian Zhu
From: jiangkunkun 

After the dirty log is retrieved, the user should clear it to re-enable
dirty log tracking for the dirtied pages. This adds a new interface
named clear_dirty_log, and arm smmuv3 implements it: it clears the dirty
state of the TTDs specified by the user-provided bitmap (as we only
enable HTTU for stage 1, clearing means setting the AP[2] bit).
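
In other words, the per-page clear is the reverse of what the SMMU does when it
marks a page dirty. A one-line illustration (the helper name is made up, it is
not a function added by this patch):

	/* writable_dirty -> writable_clean: the next DMA write re-dirties it */
	static inline arm_lpae_iopte arm_lpae_tte_mk_clean(arm_lpae_iopte pte)
	{
		return pte | ARM_LPAE_PTE_AP_RDONLY;
	}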

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 24 ++
 drivers/iommu/io-pgtable-arm.c  | 95 +
 drivers/iommu/iommu.c   | 71 +++
 include/linux/io-pgtable.h  |  4 +
 include/linux/iommu.h   | 17 
 5 files changed, 211 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 43d0536b429a..0c24503d29d3 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2574,6 +2574,29 @@ static int arm_smmu_sync_dirty_log(struct iommu_domain 
*domain,
base_iova, bitmap_pgshift);
 }
 
+static int arm_smmu_clear_dirty_log(struct iommu_domain *domain,
+   unsigned long iova, size_t size,
+   unsigned long *bitmap,
+   unsigned long base_iova,
+   unsigned long bitmap_pgshift)
+{
+   struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
+   struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
+
+   if (!(smmu->features & ARM_SMMU_FEAT_HTTU_HD)) {
+   dev_err(smmu->dev, "don't support HTTU_HD and clear dirty 
log\n");
+   return -EPERM;
+   }
+
+   if (!ops || !ops->clear_dirty_log) {
+   pr_err("don't support clear dirty log\n");
+   return -ENODEV;
+   }
+
+   return ops->clear_dirty_log(ops, iova, size, bitmap, base_iova,
+   bitmap_pgshift);
+}
+
 static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
 {
return iommu_fwspec_add_ids(dev, args->args, 1);
@@ -2676,6 +2699,7 @@ static struct iommu_ops arm_smmu_ops = {
.split_block= arm_smmu_split_block,
.merge_page = arm_smmu_merge_page,
.sync_dirty_log = arm_smmu_sync_dirty_log,
+   .clear_dirty_log= arm_smmu_clear_dirty_log,
.of_xlate   = arm_smmu_of_xlate,
.get_resv_regions   = arm_smmu_get_resv_regions,
.put_resv_regions   = generic_iommu_put_resv_regions,
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 6cfe1ef3fedd..2256e37bcb3a 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -966,6 +966,100 @@ static int arm_lpae_sync_dirty_log(struct io_pgtable_ops 
*ops,
 bitmap, base_iova, bitmap_pgshift);
 }
 
+static int __arm_lpae_clear_dirty_log(struct arm_lpae_io_pgtable *data,
+ unsigned long iova, size_t size,
+ int lvl, arm_lpae_iopte *ptep,
+ unsigned long *bitmap,
+ unsigned long base_iova,
+ unsigned long bitmap_pgshift)
+{
+   arm_lpae_iopte pte;
+   struct io_pgtable *iop = &data->iop;
+   unsigned long offset;
+   size_t base, next_size;
+   int nbits, ret, i;
+
+   if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
+   return -EINVAL;
+
+   ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
+   pte = READ_ONCE(*ptep);
+   if (WARN_ON(!pte))
+   return -EINVAL;
+
+   if (size == ARM_LPAE_BLOCK_SIZE(lvl, data)) {
+   if (iopte_leaf(pte, lvl, iop->fmt)) {
+   if (pte & ARM_LPAE_PTE_AP_RDONLY)
+   return 0;
+
+   /* Ensure all corresponding bits are set */
+   nbits = size >> bitmap_pgshift;
+   offset = (iova - base_iova) >> bitmap_pgshift;
+   for (i = offset; i < offset + nbits; i++) {
+   if (!test_bit(i, bitmap))
+   return 0;
+   }
+
+   /* Race does not exist */
+   pte |= ARM_LPAE_PTE_AP_RDONLY;
+   __arm_lpae_set_pte(ptep, pte, &iop->cfg);
+   return 0;
+   } else {
+   /* To traverse next level */
+   next_size = ARM_LPAE_BLOCK_SIZE(lvl + 1, data);
+   ptep = iopte_deref(pte, data);
+   for (base = 0; base < size; base += n

Re: [PATCH v3 2/2] vfio/iommu_type1: Fix some sanity checks in detach group

2021-01-28 Thread Keqian Zhu



On 2021/1/28 7:46, Alex Williamson wrote:
> On Fri, 22 Jan 2021 17:26:35 +0800
> Keqian Zhu  wrote:
> 
>> vfio_sanity_check_pfn_list() is used to check whether pfn_list and
>> notifier are empty when remove the external domain, so it makes a
>> wrong assumption that only external domain will use the pinning
>> interface.
>>
>> Now we apply the pfn_list check when a vfio_dma is removed and apply
>> the notifier check when all domains are removed.
>>
>> Fixes: a54eb55045ae ("vfio iommu type1: Add support for mediated devices")
>> Signed-off-by: Keqian Zhu 
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 33 ++---
>>  1 file changed, 10 insertions(+), 23 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c 
>> b/drivers/vfio/vfio_iommu_type1.c
>> index 161725395f2f..d8c10f508321 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -957,6 +957,7 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, 
>> struct vfio_dma *dma,
>>  
>>  static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>>  {
>> +WARN_ON(!RB_EMPTY_ROOT(&dma->pfn_list));
>>  vfio_unmap_unpin(iommu, dma, true);
>>  vfio_unlink_dma(iommu, dma);
>>  put_task_struct(dma->task);
>> @@ -2250,23 +2251,6 @@ static void vfio_iommu_unmap_unpin_reaccount(struct 
>> vfio_iommu *iommu)
>>  }
>>  }
>>  
>> -static void vfio_sanity_check_pfn_list(struct vfio_iommu *iommu)
>> -{
>> -struct rb_node *n;
>> -
>> -n = rb_first(&iommu->dma_list);
>> -for (; n; n = rb_next(n)) {
>> -struct vfio_dma *dma;
>> -
>> -dma = rb_entry(n, struct vfio_dma, node);
>> -
>> -if (WARN_ON(!RB_EMPTY_ROOT(&dma->pfn_list)))
>> -break;
>> -}
>> -/* mdev vendor driver must unregister notifier */
>> -WARN_ON(iommu->notifier.head);
>> -}
>> -
>>  /*
>>   * Called when a domain is removed in detach. It is possible that
>>   * the removed domain decided the iova aperture window. Modify the
>> @@ -2366,10 +2350,10 @@ static void vfio_iommu_type1_detach_group(void 
>> *iommu_data,
>>  kfree(group);
>>  
>> if (list_empty(&iommu->external_domain->group_list)) {
>> -vfio_sanity_check_pfn_list(iommu);
>> -
>> -if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu))
>> +if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) {
>> +WARN_ON(iommu->notifier.head);
>>  vfio_iommu_unmap_unpin_all(iommu);
>> +}
>>  
>>  kfree(iommu->external_domain);
>>  iommu->external_domain = NULL;
>> @@ -2403,10 +2387,12 @@ static void vfio_iommu_type1_detach_group(void 
>> *iommu_data,
>>   */
>>  if (list_empty(&domain->group_list)) {
>>  if (list_is_singular(&iommu->domain_list)) {
>> -if (!iommu->external_domain)
>> +if (!iommu->external_domain) {
>> +WARN_ON(iommu->notifier.head);
>>  vfio_iommu_unmap_unpin_all(iommu);
>> -else
>> +} else {
>>  vfio_iommu_unmap_unpin_reaccount(iommu);
>> +}
>>  }
>>  iommu_domain_free(domain->domain);
>>  list_del(&domain->next);
>> @@ -2488,9 +2474,10 @@ static void vfio_iommu_type1_release(void *iommu_data)
>>  struct vfio_iommu *iommu = iommu_data;
>>  struct vfio_domain *domain, *domain_tmp;
>>  
>> +WARN_ON(iommu->notifier.head);
> 
> I don't see that this does any harm, but isn't it actually redundant?
> It seems vfio-core only calls the iommu backend release function after
> removing all the groups, so the tests in _detach_group should catch all
> cases.  We're expecting the vfio bus/mdev driver to remove the notifier
> when a device is closed, which necessarily occurs before detaching the
> group.  Thanks,
> 
> Alex
Hi Alex,

Sorry, I was busy today sending out the SMMU HTTU based DMA dirty log tracking
series. I will reply to you tomorrow. Thanks!

Keqian.

> 
>> +
>>  if (iommu->external_domain) {
>>  vfio_release_domain(iommu->external_domain, true);
>> -vfio_sanity_check_pfn_list(iommu);
>>  kfree(iommu->external_domain);
>>  }
>>  
> 
> .
> 


Re: [PATCH 1/1] iommu/arm-smmu-v3: add support for BBML

2021-01-28 Thread Keqian Zhu



On 2021/1/27 17:39, Robin Murphy wrote:
> On 2021-01-27 07:36, Keqian Zhu wrote:
>>
>>
>> On 2021/1/27 10:01, Leizhen (ThunderTown) wrote:
>>>
>>>
>>> On 2021/1/26 18:12, Will Deacon wrote:
>>>> On Mon, Jan 25, 2021 at 08:23:40PM +, Robin Murphy wrote:
>>>>> Now we probably will need some degreee of BBML feature awareness for the
>>>>> sake of SVA if and when we start using it for CPU pagetables, but I still
>>>>> cannot see any need to consider it in io-pgtable.
>>>>
>>>> Agreed; I don't think this is something that io-pgtable should have to care
>>>> about.
>> Hi,
>>
>> I have a question here :-).
>> If the old table is not live, then the break procedure seems unnecessary.
>> Am I missing something?
> 
> The MMU is allowed to prefetch translations at any time, so not following the 
> proper update procedure could still potentially lead to a TLB conflict, even 
> if there's no device traffic to worry about disrupting.
> 
> Robin.

Thanks. Does the MMU you mention here include both the CPU MMU and the SMMU?
I know that on the SMMU side, ATS can prefetch translations.

Keqian
> 
>> Thanks,
>> Keqian
>>
>>>
>>> Yes, the SVA works in stall mode, and the failed device access requests are 
>>> not
>>> discarded.
>>>
>>> Let me look for examples. The BBML usage scenario was told by a former 
>>> colleague.
>>>
>>>>
>>>> Will
>>>>
>>>> .
>>>>
>>>
>>>
> .
> 


[RFC PATCH 01/11] iommu/arm-smmu-v3: Add feature detection for HTTU

2021-01-28 Thread Keqian Zhu
From: jiangkunkun 

An SMMU that supports HTTU (Hardware Translation Table Update) can
update the access flag and the dirty state of TTDs in hardware. This is
essential for tracking the dirty pages of DMA.

This adds feature detection only; no functional change.

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 16 
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  8 
 include/linux/io-pgtable.h  |  1 +
 3 files changed, 25 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 8ca7415d785d..0f0fe71cc10d 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1987,6 +1987,7 @@ static int arm_smmu_domain_finalise(struct iommu_domain 
*domain,
.pgsize_bitmap  = smmu->pgsize_bitmap,
.ias= ias,
.oas= oas,
+   .httu_hd= smmu->features & ARM_SMMU_FEAT_HTTU_HD,
.coherent_walk  = smmu->features & ARM_SMMU_FEAT_COHERENCY,
.tlb= &arm_smmu_flush_ops,
.iommu_dev  = smmu->dev,
@@ -3224,6 +3225,21 @@ static int arm_smmu_device_hw_probe(struct 
arm_smmu_device *smmu)
if (reg & IDR0_HYP)
smmu->features |= ARM_SMMU_FEAT_HYP;
 
+   switch (FIELD_GET(IDR0_HTTU, reg)) {
+   case IDR0_HTTU_NONE:
+   break;
+   case IDR0_HTTU_HA:
+   smmu->features |= ARM_SMMU_FEAT_HTTU_HA;
+   break;
+   case IDR0_HTTU_HAD:
+   smmu->features |= ARM_SMMU_FEAT_HTTU_HA;
+   smmu->features |= ARM_SMMU_FEAT_HTTU_HD;
+   break;
+   default:
+   dev_err(smmu->dev, "unknown/unsupported HTTU!\n");
+   return -ENXIO;
+   }
+
/*
 * The coherency feature as set by FW is used in preference to the ID
 * register, but warn on mismatch.
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 96c2e9565e00..e91bea44519e 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -33,6 +33,10 @@
 #define IDR0_ASID16(1 << 12)
 #define IDR0_ATS   (1 << 10)
 #define IDR0_HYP   (1 << 9)
+#define IDR0_HTTU  GENMASK(7, 6)
+#define IDR0_HTTU_NONE 0
+#define IDR0_HTTU_HA   1
+#define IDR0_HTTU_HAD  2
 #define IDR0_COHACC(1 << 4)
 #define IDR0_TTF   GENMASK(3, 2)
 #define IDR0_TTF_AARCH64   2
@@ -286,6 +290,8 @@
 #define CTXDESC_CD_0_TCR_TBI0  (1ULL << 38)
 
 #define CTXDESC_CD_0_AA64  (1UL << 41)
+#define CTXDESC_CD_0_HD(1UL << 42)
+#define CTXDESC_CD_0_HA(1UL << 43)
 #define CTXDESC_CD_0_S (1UL << 44)
 #define CTXDESC_CD_0_R (1UL << 45)
 #define CTXDESC_CD_0_A (1UL << 46)
@@ -604,6 +610,8 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_RANGE_INV(1 << 15)
 #define ARM_SMMU_FEAT_BTM  (1 << 16)
 #define ARM_SMMU_FEAT_SVA  (1 << 17)
+#define ARM_SMMU_FEAT_HTTU_HA  (1 << 18)
+#define ARM_SMMU_FEAT_HTTU_HD  (1 << 19)
u32 features;
 
 #define ARM_SMMU_OPT_SKIP_PREFETCH (1 << 0)
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index ea727eb1a1a9..1a00ea8562c7 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -97,6 +97,7 @@ struct io_pgtable_cfg {
unsigned long   pgsize_bitmap;
unsigned intias;
unsigned intoas;
+   boolhttu_hd;
boolcoherent_walk;
const struct iommu_flush_ops*tlb;
struct device   *iommu_dev;
-- 
2.19.1



[RFC PATCH 10/11] vfio/iommu_type1: Optimize dirty bitmap population based on iommu HWDBM

2021-01-28 Thread Keqian Zhu
From: jiangkunkun 

In the past, if the vfio_iommu is not of pinned_page_dirty_scope and the
vfio_dma is iommu_mapped, we populate the full dirty bitmap for this
vfio_dma. Now we can try to get the dirty log from the IOMMU before
making that lousy decision.
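
In outline, the decision implemented by the hunk below is (a summary sketch
only; the helper name is not part of the patch):

	static bool can_use_iommu_dirty_log(struct vfio_iommu *iommu,
					    struct vfio_dma *dma)
	{
		/* The IOMMU dirty log is only trusted when no pinning-based
		 * scope applies, the dma is mapped through the IOMMU, and
		 * every IOMMU-backed group supports HWDBM. */
		return !iommu->pinned_page_dirty_scope &&
		       dma->iommu_mapped &&
		       !iommu->num_non_hwdbm_groups;
	}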

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/vfio/vfio_iommu_type1.c | 97 -
 1 file changed, 94 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 3b8522ebf955..1cd10f3e7ed4 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -999,6 +999,25 @@ static bool vfio_group_supports_hwdbm(struct vfio_group 
*group)
return true;
 }
 
+static int vfio_iommu_dirty_log_clear(struct vfio_iommu *iommu,
+ dma_addr_t start_iova, size_t size,
+ unsigned long *bitmap_buffer,
+ dma_addr_t base_iova, size_t pgsize)
+{
+   struct vfio_domain *d;
+   unsigned long pgshift = __ffs(pgsize);
+   int ret;
+
+   list_for_each_entry(d, &iommu->domain_list, next) {
+   ret = iommu_clear_dirty_log(d->domain, start_iova, size,
+   bitmap_buffer, base_iova, pgshift);
+   if (ret)
+   return ret;
+   }
+
+   return 0;
+}
+
 static int update_user_bitmap(u64 __user *bitmap, struct vfio_iommu *iommu,
  struct vfio_dma *dma, dma_addr_t base_iova,
  size_t pgsize)
@@ -1010,13 +1029,28 @@ static int update_user_bitmap(u64 __user *bitmap, 
struct vfio_iommu *iommu,
unsigned long shift = bit_offset % BITS_PER_LONG;
unsigned long leftover;
 
+   if (iommu->pinned_page_dirty_scope || !dma->iommu_mapped)
+   goto bitmap_done;
+
+   /* try to get dirty log from IOMMU */
+   if (!iommu->num_non_hwdbm_groups) {
+   struct vfio_domain *d;
+
+   list_for_each_entry(d, &iommu->domain_list, next) {
+   if (iommu_sync_dirty_log(d->domain, dma->iova, 
dma->size,
+   dma->bitmap, dma->iova, 
pgshift))
+   return -EFAULT;
+   }
+   goto bitmap_done;
+   }
+
/*
 * mark all pages dirty if any IOMMU capable device is not able
 * to report dirty pages and all pages are pinned and mapped.
 */
-   if (!iommu->pinned_page_dirty_scope && dma->iommu_mapped)
-   bitmap_set(dma->bitmap, 0, nbits);
+   bitmap_set(dma->bitmap, 0, nbits);
 
+bitmap_done:
if (shift) {
bitmap_shift_left(dma->bitmap, dma->bitmap, shift,
  nbits + shift);
@@ -1078,6 +1112,18 @@ static int vfio_iova_dirty_bitmap(u64 __user *bitmap, 
struct vfio_iommu *iommu,
 */
bitmap_clear(dma->bitmap, 0, dma->size >> pgshift);
vfio_dma_populate_bitmap(dma, pgsize);
+
+   /* Clear iommu dirty log to re-enable dirty log tracking */
+   if (!iommu->pinned_page_dirty_scope &&
+   dma->iommu_mapped && !iommu->num_non_hwdbm_groups) {
+   ret = vfio_iommu_dirty_log_clear(iommu, dma->iova,
+   dma->size, dma->bitmap, dma->iova,
+   pgsize);
+   if (ret) {
+   pr_warn("dma dirty log clear failed!\n");
+   return ret;
+   }
+   }
}
return 0;
 }
@@ -2780,6 +2826,48 @@ static int vfio_iommu_type1_unmap_dma(struct vfio_iommu 
*iommu,
-EFAULT : 0;
 }
 
+static void vfio_dma_dirty_log_start(struct vfio_iommu *iommu,
+struct vfio_dma *dma)
+{
+   struct vfio_domain *d;
+
+   list_for_each_entry(d, &iommu->domain_list, next) {
+   /* Go through all domain anyway even if we fail */
+   iommu_split_block(d->domain, dma->iova, dma->size);
+   }
+}
+
+static void vfio_dma_dirty_log_stop(struct vfio_iommu *iommu,
+   struct vfio_dma *dma)
+{
+   struct vfio_domain *d;
+
+   list_for_each_entry(d, &iommu->domain_list, next) {
+   /* Go through all domain anyway even if we fail */
+   iommu_merge_page(d->domain, dma->iova, dma->size,
+d->prot | dma->prot);
+   }
+}
+
+static void vfio_iommu_dirty_log_switch(struct vfio_iommu *iommu, bool start)
+{
+   struct rb_node *n;
+
+   /* Split and merge even if all iommu don't support HWDBM now */
+   for (n = rb_first(

[RFC PATCH 11/11] vfio/iommu_type1: Add support for manual dirty log clear

2021-01-28 Thread Keqian Zhu
From: jiangkunkun 

In the past, we cleared the dirty log immediately after syncing it
to userspace. This may cause redundant dirty handling if userspace
handles the dirty log iteratively:

After vfio clears the dirty log, new dirty log starts to be
generated. These new dirty entries will be reported to userspace
even if they were generated before userspace handled the same
dirty pages.

That is to say, we should minimize the time gap between dirty log
clearing and dirty log handling. So we give userspace an interface
to clear the dirty log explicitly.
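
For illustration, the intended userspace loop looks roughly like this
(pseudo-code; the get/clear helpers are hypothetical stand-ins for the
VFIO_IOMMU_DIRTY_PAGES flags added in the uapi hunk below):

	/* enable dirty tracking with manual clear once, then per iteration: */
	get_dirty_bitmap(container_fd, iova, size, bitmap);    /* report only  */
	send_dirty_pages(iova, size, bitmap);                  /* handle pages */
	clear_dirty_bitmap(container_fd, iova, size, bitmap);  /* re-arm log   */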

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/vfio/vfio_iommu_type1.c | 103 ++--
 include/uapi/linux/vfio.h   |  28 -
 2 files changed, 126 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 1cd10f3e7ed4..a32dc684b86e 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -73,6 +73,7 @@ struct vfio_iommu {
boolv2;
boolnesting;
booldirty_page_tracking;
+   booldirty_log_manual_clear;
boolpinned_page_dirty_scope;
uint64_tnum_non_hwdbm_groups;
 };
@@ -1018,6 +1019,78 @@ static int vfio_iommu_dirty_log_clear(struct vfio_iommu 
*iommu,
return 0;
 }
 
+static int vfio_iova_dirty_log_clear(u64 __user *bitmap,
+struct vfio_iommu *iommu,
+dma_addr_t iova, size_t size,
+size_t pgsize)
+{
+   struct vfio_dma *dma;
+   struct rb_node *n;
+   dma_addr_t start_iova, end_iova, riova;
+   unsigned long pgshift = __ffs(pgsize);
+   unsigned long bitmap_size;
+   unsigned long *bitmap_buffer = NULL;
+   bool clear_valid;
+   int rs, re, start, end, dma_offset;
+   int ret = 0;
+
+   bitmap_size = DIRTY_BITMAP_BYTES(size >> pgshift);
+   bitmap_buffer = kvmalloc(bitmap_size, GFP_KERNEL);
+   if (!bitmap_buffer) {
+   ret = -ENOMEM;
+   goto out;
+   }
+
+   if (copy_from_user(bitmap_buffer, bitmap, bitmap_size)) {
+   ret = -EFAULT;
+   goto out;
+   }
+
+   for (n = rb_first(&iommu->dma_list); n; n = rb_next(n)) {
+   dma = rb_entry(n, struct vfio_dma, node);
+   if (!dma->iommu_mapped)
+   continue;
+   if ((dma->iova + dma->size - 1) < iova)
+   continue;
+   if (dma->iova > iova + size - 1)
+   break;
+
+   start_iova = max(iova, dma->iova);
+   end_iova = min(iova + size, dma->iova + dma->size);
+
+   /* Similar logic as the tail of vfio_iova_dirty_bitmap */
+
+   clear_valid = false;
+   start = (start_iova - iova) >> pgshift;
+   end = (end_iova - iova) >> pgshift;
+   bitmap_for_each_set_region(bitmap_buffer, rs, re, start, end) {
+   clear_valid = true;
+   riova = iova + (rs << pgshift);
+   dma_offset = (riova - dma->iova) >> pgshift;
+   bitmap_clear(dma->bitmap, dma_offset, re - rs);
+   }
+
+   if (clear_valid)
+   vfio_dma_populate_bitmap(dma, pgsize);
+
+   if (clear_valid && !iommu->pinned_page_dirty_scope &&
+   dma->iommu_mapped && !iommu->num_non_hwdbm_groups) {
+   ret = vfio_iommu_dirty_log_clear(iommu, start_iova,
+   end_iova - start_iova,  bitmap_buffer,
+   iova, pgsize);
+   if (ret) {
+   pr_warn("dma dirty log clear failed!\n");
+   goto out;
+   }
+   }
+
+   }
+
+out:
+   kfree(bitmap_buffer);
+   return ret;
+}
+
 static int update_user_bitmap(u64 __user *bitmap, struct vfio_iommu *iommu,
  struct vfio_dma *dma, dma_addr_t base_iova,
  size_t pgsize)
@@ -1067,6 +1140,10 @@ static int update_user_bitmap(u64 __user *bitmap, struct 
vfio_iommu *iommu,
 DIRTY_BITMAP_BYTES(nbits + shift)))
return -EFAULT;
 
+   if (shift && iommu->dirty_log_manual_clear)
+   bitmap_shift_right(dma->bitmap, dma->bitmap, shift,
+  nbits + shift);
+
return 0;
 }
 
@@ -1105,6 +1182,9 @@ static int vfio_iova_dirty_bitmap(u64 __user *bitmap, 
struct vfio_iommu *iommu,
if (ret)
return ret;
 
+   if (iommu-&g

[RFC PATCH 09/11] vfio/iommu_type1: Add HWDBM status maintanance

2021-01-28 Thread Keqian Zhu
From: jiangkunkun 

We are going to optimize dirty log tracking based on the IOMMU HWDBM
feature, but the dirty log from the IOMMU is useful only when all
IOMMU-backed groups are attached through IOMMUs with the HWDBM
feature. This maintains a counter to track that condition.

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/vfio/vfio_iommu_type1.c | 33 +
 1 file changed, 33 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 0b4dedaa9128..3b8522ebf955 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -74,6 +74,7 @@ struct vfio_iommu {
boolnesting;
booldirty_page_tracking;
boolpinned_page_dirty_scope;
+   uint64_tnum_non_hwdbm_groups;
 };
 
 struct vfio_domain {
@@ -102,6 +103,7 @@ struct vfio_group {
struct list_headnext;
boolmdev_group; /* An mdev group */
boolpinned_page_dirty_scope;
+   booliommu_hwdbm;/* Valid for non-mdev group */
 };
 
 struct vfio_iova {
@@ -976,6 +978,27 @@ static void vfio_update_pgsize_bitmap(struct vfio_iommu 
*iommu)
}
 }
 
+static int vfio_dev_has_feature(struct device *dev, void *data)
+{
+   enum iommu_dev_features *feat = data;
+
+   if (!iommu_dev_has_feature(dev, *feat))
+   return -ENODEV;
+
+   return 0;
+}
+
+static bool vfio_group_supports_hwdbm(struct vfio_group *group)
+{
+   enum iommu_dev_features feat = IOMMU_DEV_FEAT_HWDBM;
+
+   if (iommu_group_for_each_dev(group->iommu_group, &feat,
+vfio_dev_has_feature))
+   return false;
+
+   return true;
+}
+
 static int update_user_bitmap(u64 __user *bitmap, struct vfio_iommu *iommu,
  struct vfio_dma *dma, dma_addr_t base_iova,
  size_t pgsize)
@@ -2189,6 +2212,12 @@ static int vfio_iommu_type1_attach_group(void 
*iommu_data,
 * capable via the page pinning interface.
 */
iommu->pinned_page_dirty_scope = false;
+
+   /* Update the hwdbm status of group and iommu */
+   group->iommu_hwdbm = vfio_group_supports_hwdbm(group);
+   if (!group->iommu_hwdbm)
+   iommu->num_non_hwdbm_groups++;
+
mutex_unlock(&iommu->lock);
vfio_iommu_resv_free(_resv_regions);
 
@@ -2342,6 +2371,7 @@ static void vfio_iommu_type1_detach_group(void 
*iommu_data,
struct vfio_domain *domain;
struct vfio_group *group;
bool update_dirty_scope = false;
+   bool update_iommu_hwdbm = false;
LIST_HEAD(iova_copy);
 
mutex_lock(&iommu->lock);
@@ -2380,6 +2410,7 @@ static void vfio_iommu_type1_detach_group(void 
*iommu_data,
 
vfio_iommu_detach_group(domain, group);
update_dirty_scope = !group->pinned_page_dirty_scope;
+   update_iommu_hwdbm = !group->iommu_hwdbm;
list_del(>next);
kfree(group);
/*
@@ -2417,6 +2448,8 @@ static void vfio_iommu_type1_detach_group(void 
*iommu_data,
 */
if (update_dirty_scope)
update_pinned_page_dirty_scope(iommu);
+   if (update_iommu_hwdbm)
+   iommu->num_non_hwdbm_groups--;
mutex_unlock(&iommu->lock);
 }
 
-- 
2.19.1



[RFC PATCH 06/11] iommu/arm-smmu-v3: Scan leaf TTD to sync hardware dirty log

2021-01-28 Thread Keqian Zhu
From: jiangkunkun 

During dirty log tracking, the user will try to retrieve the dirty log
from the IOMMU if it supports hardware dirty log. This adds a new
interface named sync_dirty_log in the IOMMU layer and arm smmuv3
implements it: it scans the leaf TTDs and treats a TTD as dirty if it
is writable (as we only enable HTTU for stage 1, the check is that
AP[2] is not set).
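
The leaf check boils down to the following predicate (illustrative helper, not
part of the patch): with the DBM bit set at map time, a stage-1 leaf TTD that
is still writable has been written through by the SMMU since it was last
cleaned.

	static inline bool arm_lpae_tte_dirty(arm_lpae_iopte pte)
	{
		return !(pte & ARM_LPAE_PTE_AP_RDONLY);	/* AP[2] clear => dirty */
	}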

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 27 +++
 drivers/iommu/io-pgtable-arm.c  | 90 +
 drivers/iommu/iommu.c   | 41 ++
 include/linux/io-pgtable.h  |  4 +
 include/linux/iommu.h   | 17 
 5 files changed, 179 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 2434519e4bb6..43d0536b429a 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2548,6 +2548,32 @@ static size_t arm_smmu_merge_page(struct iommu_domain 
*domain, unsigned long iov
return ops->merge_page(ops, iova, paddr, size, prot);
 }
 
+static int arm_smmu_sync_dirty_log(struct iommu_domain *domain,
+  unsigned long iova, size_t size,
+  unsigned long *bitmap,
+  unsigned long base_iova,
+  unsigned long bitmap_pgshift)
+{
+   struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
+   struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
+
+   if (!(smmu->features & ARM_SMMU_FEAT_HTTU_HD)) {
+   dev_err(smmu->dev, "don't support HTTU_HD and sync dirty 
log\n");
+   return -EPERM;
+   }
+
+   if (!ops || !ops->sync_dirty_log) {
+   pr_err("don't support sync dirty log\n");
+   return -ENODEV;
+   }
+
+   /* To ensure all inflight transactions are completed */
+   arm_smmu_flush_iotlb_all(domain);
+
+   return ops->sync_dirty_log(ops, iova, size, bitmap,
+   base_iova, bitmap_pgshift);
+}
+
 static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
 {
return iommu_fwspec_add_ids(dev, args->args, 1);
@@ -2649,6 +2675,7 @@ static struct iommu_ops arm_smmu_ops = {
.domain_set_attr= arm_smmu_domain_set_attr,
.split_block= arm_smmu_split_block,
.merge_page = arm_smmu_merge_page,
+   .sync_dirty_log = arm_smmu_sync_dirty_log,
.of_xlate   = arm_smmu_of_xlate,
.get_resv_regions   = arm_smmu_get_resv_regions,
.put_resv_regions   = generic_iommu_put_resv_regions,
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 17390f258eb1..6cfe1ef3fedd 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -877,6 +877,95 @@ static size_t arm_lpae_merge_page(struct io_pgtable_ops 
*ops, unsigned long iova
return __arm_lpae_merge_page(data, iova, paddr, size, lvl, ptep, prot);
 }
 
+static int __arm_lpae_sync_dirty_log(struct arm_lpae_io_pgtable *data,
+unsigned long iova, size_t size,
+int lvl, arm_lpae_iopte *ptep,
+unsigned long *bitmap,
+unsigned long base_iova,
+unsigned long bitmap_pgshift)
+{
+   arm_lpae_iopte pte;
+   struct io_pgtable *iop = &data->iop;
+   size_t base, next_size;
+   unsigned long offset;
+   int nbits, ret;
+
+   if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
+   return -EINVAL;
+
+   ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
+   pte = READ_ONCE(*ptep);
+   if (WARN_ON(!pte))
+   return -EINVAL;
+
+   if (size == ARM_LPAE_BLOCK_SIZE(lvl, data)) {
+   if (iopte_leaf(pte, lvl, iop->fmt)) {
+   if (pte & ARM_LPAE_PTE_AP_RDONLY)
+   return 0;
+
+   /* It is writable, set the bitmap */
+   nbits = size >> bitmap_pgshift;
+   offset = (iova - base_iova) >> bitmap_pgshift;
+   bitmap_set(bitmap, offset, nbits);
+   return 0;
+   } else {
+   /* To traverse next level */
+   next_size = ARM_LPAE_BLOCK_SIZE(lvl + 1, data);
+   ptep = iopte_deref(pte, data);
+   for (base = 0; base < size; base += next_size) {
+   ret = __arm_lpae_sync_dirty_log(data,
+   iova + base, next_size, lvl + 1,
+  

[RFC PATCH 05/11] iommu/arm-smmu-v3: Merge a span of page to block descriptor

2021-01-28 Thread Keqian Zhu
From: jiangkunkun 

When dirty log tracking stops, we need to recover all the block
descriptors that were split when dirty log tracking started. This adds
a new interface named merge_page in the IOMMU layer and arm smmuv3
implements it, which reinstalls the block mappings and unmaps the span
of page mappings.

It's the caller's duty to find contiguous physical memory.

While pages are being merged, other interfaces are not expected to be
in use, so no race condition exists. And we flush all IOTLBs after the
merge procedure completes to ease the pressure on the IOMMU, as we will
generally merge a huge range of page mappings.
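
For reference, the BBML level 1 sequence that the code below follows when
reinstalling a block over a live table entry, written out as an illustrative
helper (not a standalone function from this series):

	static void bbml1_install_block(struct arm_lpae_io_pgtable *data,
					unsigned long iova, phys_addr_t paddr,
					size_t size, int lvl,
					arm_lpae_iopte *ptep, arm_lpae_iopte prot)
	{
		/* 1. install the new block descriptor with the nT bit set */
		__arm_lpae_init_pte(data, paddr, prot | ARM_LPAE_PTE_NT, lvl, ptep);
		/* 2. invalidate the stale walk entries covering the region */
		io_pgtable_tlb_flush_walk(&data->iop, iova, size,
					  ARM_LPAE_GRANULE(data));
		/* 3. rewrite the block descriptor with nT cleared */
		__arm_lpae_init_pte(data, paddr, prot & ~ARM_LPAE_PTE_NT, lvl, ptep);
	}

With BBML level 2, the old table entry can simply be replaced by the block
descriptor, so only a single write plus the final TLB maintenance is needed.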

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 20 ++
 drivers/iommu/io-pgtable-arm.c  | 78 +
 drivers/iommu/iommu.c   | 75 
 include/linux/io-pgtable.h  |  2 +
 include/linux/iommu.h   | 10 +++
 5 files changed, 185 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 5469f4fca820..2434519e4bb6 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2529,6 +2529,25 @@ static size_t arm_smmu_split_block(struct iommu_domain 
*domain,
return ops->split_block(ops, iova, size);
 }
 
+static size_t arm_smmu_merge_page(struct iommu_domain *domain, unsigned long 
iova,
+ phys_addr_t paddr, size_t size, int prot)
+{
+   struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
+   struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
+
+   if (!(smmu->features & (ARM_SMMU_FEAT_BBML1 | ARM_SMMU_FEAT_BBML2))) {
+   dev_err(smmu->dev, "don't support BBML1/2 and merge page\n");
+   return 0;
+   }
+
+   if (!ops || !ops->merge_page) {
+   pr_err("don't support merge page\n");
+   return 0;
+   }
+
+   return ops->merge_page(ops, iova, paddr, size, prot);
+}
+
 static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
 {
return iommu_fwspec_add_ids(dev, args->args, 1);
@@ -2629,6 +2648,7 @@ static struct iommu_ops arm_smmu_ops = {
.domain_get_attr= arm_smmu_domain_get_attr,
.domain_set_attr= arm_smmu_domain_set_attr,
.split_block= arm_smmu_split_block,
+   .merge_page = arm_smmu_merge_page,
.of_xlate   = arm_smmu_of_xlate,
.get_resv_regions   = arm_smmu_get_resv_regions,
.put_resv_regions   = generic_iommu_put_resv_regions,
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index f3b7f7115e38..17390f258eb1 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -800,6 +800,83 @@ static size_t arm_lpae_split_block(struct io_pgtable_ops 
*ops,
return __arm_lpae_split_block(data, iova, size, lvl, ptep);
 }
 
+static size_t __arm_lpae_merge_page(struct arm_lpae_io_pgtable *data,
+   unsigned long iova, phys_addr_t paddr,
+   size_t size, int lvl, arm_lpae_iopte *ptep,
+   arm_lpae_iopte prot)
+{
+   arm_lpae_iopte pte, *tablep;
+   struct io_pgtable *iop = &data->iop;
+   struct io_pgtable_cfg *cfg = &data->iop.cfg;
+
+   if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
+   return 0;
+
+   ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
+   pte = READ_ONCE(*ptep);
+   if (WARN_ON(!pte))
+   return 0;
+
+   if (size == ARM_LPAE_BLOCK_SIZE(lvl, data)) {
+   if (iopte_leaf(pte, lvl, iop->fmt))
+   return size;
+
+   /* Race does not exist */
+   if (cfg->bbml == 1) {
+   prot |= ARM_LPAE_PTE_NT;
+   __arm_lpae_init_pte(data, paddr, prot, lvl, ptep);
+   io_pgtable_tlb_flush_walk(iop, iova, size,
+ ARM_LPAE_GRANULE(data));
+
+   prot &= ~(ARM_LPAE_PTE_NT);
+   __arm_lpae_init_pte(data, paddr, prot, lvl, ptep);
+   } else {
+   __arm_lpae_init_pte(data, paddr, prot, lvl, ptep);
+   }
+
+   tablep = iopte_deref(pte, data);
+   __arm_lpae_free_pgtable(data, lvl + 1, tablep);
+   return size;
+   } else if (iopte_leaf(pte, lvl, iop->fmt)) {
+   /* The size is too small, already merged */
+   return size;
+   }
+
+   /* Keep on walkin */
+   ptep = iopte_deref(pte, data);
+   return __arm_lpae_merge_page(data, iova, paddr, size, lvl + 1, ptep, 
prot);
+}
+
+static size_t 

[RFC PATCH 02/11] iommu/arm-smmu-v3: Enable HTTU for SMMU stage1 mapping

2021-01-28 Thread Keqian Zhu
From: jiangkunkun 

If HTTU is supported, we enable the HA/HD bits in the SMMU CD (stage 1
mapping), and set the DBM bit for writable TTDs.

The dirty state information is encoded using the access permission
bits AP[2] (stage 1) or S2AP[1] (stage 2) in conjunction with the
DBM (Dirty Bit Modifier) bit, where DBM means writable and AP[2]/
S2AP[1] means dirty.
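
With the two bits named above, the three stage-1 TTD states can be told apart
by simple predicates, for example (illustrative helpers, not part of the patch):

	static inline bool tte_writable_clean(arm_lpae_iopte pte)
	{
		return (pte & ARM_LPAE_PTE_DBM) && (pte & ARM_LPAE_PTE_AP_RDONLY);
	}

	static inline bool tte_writable_dirty(arm_lpae_iopte pte)
	{
		return (pte & ARM_LPAE_PTE_DBM) && !(pte & ARM_LPAE_PTE_AP_RDONLY);
	}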

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 5 +
 drivers/iommu/io-pgtable-arm.c  | 7 ++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 0f0fe71cc10d..8cc9d7536b08 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1036,6 +1036,11 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_domain 
*smmu_domain, int ssid,
FIELD_PREP(CTXDESC_CD_0_ASID, cd->asid) |
CTXDESC_CD_0_V;
 
+   if (smmu->features & ARM_SMMU_FEAT_HTTU_HA)
+   val |= CTXDESC_CD_0_HA;
+   if (smmu->features & ARM_SMMU_FEAT_HTTU_HD)
+   val |= CTXDESC_CD_0_HD;
+
/* STALL_MODEL==0b10 && CD.S==0 is ILLEGAL */
if (smmu->features & ARM_SMMU_FEAT_STALL_FORCE)
val |= CTXDESC_CD_0_S;
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 87def58e79b5..e299a44808ae 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -72,6 +72,7 @@
 
 #define ARM_LPAE_PTE_NSTABLE   (((arm_lpae_iopte)1) << 63)
 #define ARM_LPAE_PTE_XN(((arm_lpae_iopte)3) << 53)
+#define ARM_LPAE_PTE_DBM   (((arm_lpae_iopte)1) << 51)
 #define ARM_LPAE_PTE_AF(((arm_lpae_iopte)1) << 10)
 #define ARM_LPAE_PTE_SH_NS (((arm_lpae_iopte)0) << 8)
 #define ARM_LPAE_PTE_SH_OS (((arm_lpae_iopte)2) << 8)
@@ -81,7 +82,7 @@
 
 #define ARM_LPAE_PTE_ATTR_LO_MASK  (((arm_lpae_iopte)0x3ff) << 2)
 /* Ignore the contiguous bit for block splitting */
-#define ARM_LPAE_PTE_ATTR_HI_MASK  (((arm_lpae_iopte)6) << 52)
+#define ARM_LPAE_PTE_ATTR_HI_MASK  (((arm_lpae_iopte)13) << 51)
 #define ARM_LPAE_PTE_ATTR_MASK (ARM_LPAE_PTE_ATTR_LO_MASK |\
 ARM_LPAE_PTE_ATTR_HI_MASK)
 /* Software bit for solving coherency races */
@@ -379,6 +380,7 @@ static int __arm_lpae_map(struct arm_lpae_io_pgtable *data, 
unsigned long iova,
 static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
   int prot)
 {
+   struct io_pgtable_cfg *cfg = &data->iop.cfg;
arm_lpae_iopte pte;
 
if (data->iop.fmt == ARM_64_LPAE_S1 ||
@@ -386,6 +388,9 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct 
arm_lpae_io_pgtable *data,
pte = ARM_LPAE_PTE_nG;
if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ))
pte |= ARM_LPAE_PTE_AP_RDONLY;
+   else if (cfg->httu_hd)
+   pte |= ARM_LPAE_PTE_DBM;
+
if (!(prot & IOMMU_PRIV))
pte |= ARM_LPAE_PTE_AP_UNPRIV;
} else {
-- 
2.19.1



[RFC PATCH 04/11] iommu/arm-smmu-v3: Split block descriptor to a span of page

2021-01-28 Thread Keqian Zhu
From: jiangkunkun 

A block descriptor is not a proper granule for dirty log tracking. This
adds a new interface named split_block in the IOMMU layer and arm
smmuv3 implements it, which splits a block descriptor into an
equivalent span of page descriptors.

While a block is being split, other interfaces are not expected to be
in use, so no race condition exists. And we flush all IOTLBs after the
split procedure completes to ease the pressure on the IOMMU, as we will
generally split a huge range of block mappings.

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  20 
 drivers/iommu/io-pgtable-arm.c  | 122 
 drivers/iommu/iommu.c   |  40 +++
 include/linux/io-pgtable.h  |   2 +
 include/linux/iommu.h   |  10 ++
 5 files changed, 194 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 9208881a571c..5469f4fca820 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2510,6 +2510,25 @@ static int arm_smmu_domain_set_attr(struct iommu_domain 
*domain,
return ret;
 }
 
+static size_t arm_smmu_split_block(struct iommu_domain *domain,
+  unsigned long iova, size_t size)
+{
+   struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
+   struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
+
+   if (!(smmu->features & (ARM_SMMU_FEAT_BBML1 | ARM_SMMU_FEAT_BBML2))) {
+   dev_err(smmu->dev, "don't support BBML1/2 and split block\n");
+   return 0;
+   }
+
+   if (!ops || !ops->split_block) {
+   pr_err("don't support split block\n");
+   return 0;
+   }
+
+   return ops->split_block(ops, iova, size);
+}
+
 static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
 {
return iommu_fwspec_add_ids(dev, args->args, 1);
@@ -2609,6 +2628,7 @@ static struct iommu_ops arm_smmu_ops = {
.device_group   = arm_smmu_device_group,
.domain_get_attr= arm_smmu_domain_get_attr,
.domain_set_attr= arm_smmu_domain_set_attr,
+   .split_block= arm_smmu_split_block,
.of_xlate   = arm_smmu_of_xlate,
.get_resv_regions   = arm_smmu_get_resv_regions,
.put_resv_regions   = generic_iommu_put_resv_regions,
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index e299a44808ae..f3b7f7115e38 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -79,6 +79,8 @@
 #define ARM_LPAE_PTE_SH_IS (((arm_lpae_iopte)3) << 8)
 #define ARM_LPAE_PTE_NS(((arm_lpae_iopte)1) << 5)
 #define ARM_LPAE_PTE_VALID (((arm_lpae_iopte)1) << 0)
+/* Block descriptor bits */
+#define ARM_LPAE_PTE_NT(((arm_lpae_iopte)1) << 16)
 
 #define ARM_LPAE_PTE_ATTR_LO_MASK  (((arm_lpae_iopte)0x3ff) << 2)
 /* Ignore the contiguous bit for block splitting */
@@ -679,6 +681,125 @@ static phys_addr_t arm_lpae_iova_to_phys(struct 
io_pgtable_ops *ops,
return iopte_to_paddr(pte, data) | iova;
 }
 
+static size_t __arm_lpae_split_block(struct arm_lpae_io_pgtable *data,
+unsigned long iova, size_t size, int lvl,
+arm_lpae_iopte *ptep);
+
+static size_t arm_lpae_do_split_blk(struct arm_lpae_io_pgtable *data,
+   unsigned long iova, size_t size,
+   arm_lpae_iopte blk_pte, int lvl,
+   arm_lpae_iopte *ptep)
+{
+   struct io_pgtable_cfg *cfg = &data->iop.cfg;
+   arm_lpae_iopte pte, *tablep;
+   phys_addr_t blk_paddr;
+   size_t tablesz = ARM_LPAE_GRANULE(data);
+   size_t split_sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
+   int i;
+
+   if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
+   return 0;
+
+   tablep = __arm_lpae_alloc_pages(tablesz, GFP_ATOMIC, cfg);
+   if (!tablep)
+   return 0;
+
+   blk_paddr = iopte_to_paddr(blk_pte, data);
+   pte = iopte_prot(blk_pte);
+   for (i = 0; i < tablesz / sizeof(pte); i++, blk_paddr += split_sz)
+   __arm_lpae_init_pte(data, blk_paddr, pte, lvl, &tablep[i]);
+
+   if (cfg->bbml == 1) {
+   /* Race does not exist */
+   blk_pte |= ARM_LPAE_PTE_NT;
+   __arm_lpae_set_pte(ptep, blk_pte, cfg);
+   io_pgtable_tlb_flush_walk(&data->iop, iova, size, size);
+   }
+   /* Race does not exist */
+   pte = arm_lpae_install_table(tablep, ptep, blk_pte, cfg);
+
+   /* Have we split it down to page granularity? */
+   if (lvl == (ARM_LPAE_MAX_LE

[RFC PATCH 03/11] iommu/arm-smmu-v3: Add feature detection for BBML

2021-01-28 Thread Keqian Zhu
From: jiangkunkun 

When altering a translation table descriptor for certain reasons, the
break-before-make procedure is required. But it might cause problems
when the TTD is live: the I/O streams might not tolerate translation
faults.

If the SMMU supports BBML level 1 or BBML level 2, we can change the
block size without using break-before-make.

This adds feature detection for BBML only; no functional change.

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 24 -
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  6 ++
 include/linux/io-pgtable.h  |  1 +
 3 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 8cc9d7536b08..9208881a571c 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1947,7 +1947,7 @@ static int arm_smmu_domain_finalise_s2(struct 
arm_smmu_domain *smmu_domain,
 static int arm_smmu_domain_finalise(struct iommu_domain *domain,
struct arm_smmu_master *master)
 {
-   int ret;
+   int ret, bbml;
unsigned long ias, oas;
enum io_pgtable_fmt fmt;
struct io_pgtable_cfg pgtbl_cfg;
@@ -1988,12 +1988,20 @@ static int arm_smmu_domain_finalise(struct iommu_domain 
*domain,
return -EINVAL;
}
 
+   if (smmu->features & ARM_SMMU_FEAT_BBML2)
+   bbml = 2;
+   else if (smmu->features & ARM_SMMU_FEAT_BBML1)
+   bbml = 1;
+   else
+   bbml = 0;
+
pgtbl_cfg = (struct io_pgtable_cfg) {
.pgsize_bitmap  = smmu->pgsize_bitmap,
.ias= ias,
.oas= oas,
.httu_hd= smmu->features & ARM_SMMU_FEAT_HTTU_HD,
.coherent_walk  = smmu->features & ARM_SMMU_FEAT_COHERENCY,
+   .bbml   = bbml,
.tlb= &arm_smmu_flush_ops,
.iommu_dev  = smmu->dev,
};
@@ -3328,6 +3336,20 @@ static int arm_smmu_device_hw_probe(struct 
arm_smmu_device *smmu)
 
/* IDR3 */
reg = readl_relaxed(smmu->base + ARM_SMMU_IDR3);
+   switch (FIELD_GET(IDR3_BBML, reg)) {
+   case IDR3_BBML0:
+   break;
+   case IDR3_BBML1:
+   smmu->features |= ARM_SMMU_FEAT_BBML1;
+   break;
+   case IDR3_BBML2:
+   smmu->features |= ARM_SMMU_FEAT_BBML2;
+   break;
+   default:
+   dev_err(smmu->dev, "unknown/unsupported BBM behavior level\n");
+   return -ENXIO;
+   }
+
if (FIELD_GET(IDR3_RIL, reg))
smmu->features |= ARM_SMMU_FEAT_RANGE_INV;
 
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index e91bea44519e..11e526ab7239 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -55,6 +55,10 @@
 #define IDR1_SIDSIZE   GENMASK(5, 0)
 
 #define ARM_SMMU_IDR3  0xc
+#define IDR3_BBML  GENMASK(12, 11)
+#define IDR3_BBML0 0
+#define IDR3_BBML1 1
+#define IDR3_BBML2 2
 #define IDR3_RIL   (1 << 10)
 
 #define ARM_SMMU_IDR5  0x14
@@ -612,6 +616,8 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_SVA  (1 << 17)
 #define ARM_SMMU_FEAT_HTTU_HA  (1 << 18)
 #define ARM_SMMU_FEAT_HTTU_HD  (1 << 19)
+#define ARM_SMMU_FEAT_BBML1(1 << 20)
+#define ARM_SMMU_FEAT_BBML2(1 << 21)
u32 features;
 
 #define ARM_SMMU_OPT_SKIP_PREFETCH (1 << 0)
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 1a00ea8562c7..26583beeb5d9 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -99,6 +99,7 @@ struct io_pgtable_cfg {
unsigned intoas;
boolhttu_hd;
boolcoherent_walk;
+   int bbml;
const struct iommu_flush_ops*tlb;
struct device   *iommu_dev;
 
-- 
2.19.1



[RFC PATCH 08/11] iommu/arm-smmu-v3: Add HWDBM device feature reporting

2021-01-28 Thread Keqian Zhu
From: jiangkunkun 

We have implemented the interfaces required to support IOMMU dirty log
tracking. The last step is to report this feature to the upper-level
user, so that the user can build higher-level policy on top of it.
This adds a new device feature named IOMMU_DEV_FEAT_HWDBM in the IOMMU
layer. For arm smmuv3, it maps to ARM_SMMU_FEAT_HTTU_HD.

Co-developed-by: Keqian Zhu 
Signed-off-by: Kunkun Jiang 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 2 ++
 include/linux/iommu.h   | 1 +
 2 files changed, 3 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 0c24503d29d3..cbde0489cf31 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2629,6 +2629,8 @@ static bool arm_smmu_dev_has_feature(struct device *dev,
switch (feat) {
case IOMMU_DEV_FEAT_SVA:
return arm_smmu_master_sva_supported(master);
+   case IOMMU_DEV_FEAT_HWDBM:
+   return !!(master->smmu->features & ARM_SMMU_FEAT_HTTU_HD);
default:
return false;
}
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 1cb6cd0cfc7b..77e561ed57fd 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -160,6 +160,7 @@ struct iommu_resv_region {
 enum iommu_dev_features {
IOMMU_DEV_FEAT_AUX, /* Aux-domain feature */
IOMMU_DEV_FEAT_SVA, /* Shared Virtual Addresses */
+   IOMMU_DEV_FEAT_HWDBM,   /* Hardware Dirty Bit Management */
 };
 
 #define IOMMU_PASID_INVALID(-1U)
-- 
2.19.1



[RFC PATCH 00/11] vfio/iommu_type1: Implement dirty log tracking based on smmuv3 HTTU

2021-01-28 Thread Keqian Zhu
Hi all,

This patch series implements a new dirty log tracking method for vfio DMA.

Intention:

As we know, vfio live migration is an important and valuable feature, but there
are still many hurdles to solve, including migration of interrupt state, device
state, DMA dirty log tracking, etc.

For now, the only dirty log tracking interface is pinning. It has some drawbacks:
1. Only smart vendor drivers are aware of this.
2. It's coarse-grained; the pinned scope is generally bigger than what the
   device actually accesses.
3. It can't track dirty pages continuously and precisely; vfio reports the whole
   pinned scope as dirty, so it doesn't work well with iterative dirty log
   handling.

About SMMU HTTU:

HTTU (Hardware Translation Table Update) is a feature of ARM SMMUv3, it can 
update
access flag or/and dirty state of the TTD (Translation Table Descriptor) by 
hardware.
With HTTU, stage1 TTD is classified into 3 types:
DBM bit AP[2](readonly bit)
1. writable_clean 1   1
2. writable_dirty 1   0
3. readonly   0   1

If HTTU_HD (manage dirty state) is enabled, smmu can change TTD from 
writable_clean to
writable_dirty. Then software can scan TTD to sync dirty state into dirty 
bitmap. With
this feature, we can track the dirty log of DMA continuously and precisely.

About this series:

Patch 1-3: Add feature detection for SMMU HTTU and enable HTTU for SMMU stage 1
   mapping. Also add feature detection for SMMU BBML. We need to split
   block mappings when dirty log tracking starts and merge page mappings
   when it stops, which requires the break-before-make procedure. But
   that might cause problems when the TTD is live: the I/O streams might
   not tolerate translation faults. So BBML should be used.

Patch 4-7: Add four interfaces (split_block, merge_page, sync_dirty_log and
   clear_dirty_log) in the IOMMU layer; they are essential to implement
   DMA dirty log tracking for vfio. We implement these interfaces for
   arm smmuv3.

Patch   8: Add HWDBM (Hardware Dirty Bit Management) device feature reporting
   in the IOMMU layer.

Patch 9-11: Implement a new dirty log tracking method for vfio based on IOMMU
   HWDBM. A new ioctl operation named VFIO_DIRTY_LOG_MANUAL_CLEAR is
   added, which can eliminate some redundant dirty handling in userspace.

Optimizations To Do:

1. We recognized that each smmu_domain (a vfio_container may have several
   smmu_domains) has its own stage 1 mapping, and we must scan all these
   mappings to sync the dirty state. We plan to refactor smmu_domain to
   support more than one SMMU in one smmu_domain, so that these SMMUs can
   share the same stage 1 mapping.
2. We also recognized that scanning the TTDs is a performance hotspot.
   Recently, I implemented a SW/HW combined dirty log tracking at the MMU
   side [1], which can effectively solve this problem. This idea can be
   applied to the SMMU side too.

Thanks,
Keqian


[1] 
https://lore.kernel.org/linux-arm-kernel/2021012612.27136-1-zhukeqi...@huawei.com/

jiangkunkun (11):
  iommu/arm-smmu-v3: Add feature detection for HTTU
  iommu/arm-smmu-v3: Enable HTTU for SMMU stage1 mapping
  iommu/arm-smmu-v3: Add feature detection for BBML
  iommu/arm-smmu-v3: Split block descriptor to a span of page
  iommu/arm-smmu-v3: Merge a span of page to block descriptor
  iommu/arm-smmu-v3: Scan leaf TTD to sync hardware dirty log
  iommu/arm-smmu-v3: Clear dirty log according to bitmap
  iommu/arm-smmu-v3: Add HWDBM device feature reporting
  vfio/iommu_type1: Add HWDBM status maintanance
  vfio/iommu_type1: Optimize dirty bitmap population based on iommu
HWDBM
  vfio/iommu_type1: Add support for manual dirty log clear

 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 138 ++-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  14 +
 drivers/iommu/io-pgtable-arm.c  | 392 +++-
 drivers/iommu/iommu.c   | 227 
 drivers/vfio/vfio_iommu_type1.c | 235 +++-
 include/linux/io-pgtable.h  |  14 +
 include/linux/iommu.h   |  55 +++
 include/uapi/linux/vfio.h   |  28 +-
 8 files changed, 1093 insertions(+), 10 deletions(-)

-- 
2.19.1

