Re: Plan for /dev/ioasid RFC v2

2021-06-17 Thread Lu Baolu

Hi David,

On 6/17/21 1:22 PM, David Gibson wrote:

The iommu_group can guarantee isolation among different physical
devices (represented by RIDs). But when it comes to sub-devices (e.g. mdev or
vDPA devices represented by RID + SSID), we have to rely on the
device driver for isolation. Devices which are able to generate sub-
devices should either use their own on-device mechanisms or use
platform features like Intel Scalable IOV to isolate the sub-devices.

This seems like a misunderstanding of groups.  Groups are not tied to
any PCI meaning.  Groups are the smallest unit of isolation, no matter
what is providing that isolation.

If mdevs are isolated from each other by clever software, even though
they're on the same PCI device they are in different groups from each
other *by definition*.  They are also in a different group from their
parent device (however, the mdevs only exist when the mdev driver is
active, which implies that the parent device's group is owned by the
kernel).



You are right. This is also my understanding of an "isolation group".

But, as I understand it, iommu_group is only the isolation group visible
to the IOMMU. When we talk about sub-devices (sw-mdev or mdev w/ pasid),
only the device and its driver know the details of isolation, hence
iommu_group cannot be extended to cover them. The device drivers
should define their own isolation groups.

Otherwise, the device driver has to fake an iommu_group and add hacky
code to link the related IOMMU elements (iommu device, domain, group,
etc.) together. Actually this is part of the problem that this proposal
tries to solve.




Under the above conditions, different sub-devices of the same RID device
could use different IOASIDs. This seems to mean that we can't support a
mixed mode where, for example, two RIDs share an iommu_group and
one (or both) of them have sub-devices.

That doesn't necessarily follow.  mdevs which can be successfully
isolated by their mdev driver are in a different group from their
parent device, and therefore need not be affected by whether the
parent device shares a group with some other physical device.  They
*might*  be, but that's up to the mdev driver to determine based on
what it can safely isolate.



If we understand it as multiple levels of isolation, can we classify the
devices into the following categories?

1) Legacy devices
   - devices without device-level isolation
   - multiple devices could sit in a single iommu_group
   - only a single I/O address space could be bound to the IOMMU

2) Modern devices
   - devices capable of device-level isolation
   - able to have sub-devices
   - self-isolated, hence do not share an iommu_group with others
   - multiple I/O address spaces could be bound to the IOMMU

For 1), all devices in an iommu_group should be bound to a single
IOASID; the isolation is guaranteed by the iommu_group.

For 2), a single device could be bound to multiple IOASIDs, with each sub-
device corresponding to one IOASID. The isolation of each sub-device is
guaranteed by the device driver.
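
To make the distinction concrete, here is a rough, purely illustrative
sketch of how an IOASID attach path might treat the two categories. All
helper names below (device_has_subdevice_isolation(), ioasid_attach_group(),
ioasid_attach_pasid(), dev_to_iommu_group()) are hypothetical and do not
exist in the kernel today:

/* Hypothetical sketch only -- none of these helpers exist today. */
static int ioasid_attach(struct device *dev, u32 pasid, int ioasid)
{
	if (!device_has_subdevice_isolation(dev))
		/* 1) legacy: the whole iommu_group is bound to one IOASID */
		return ioasid_attach_group(dev_to_iommu_group(dev), ioasid);

	/* 2) modern: each sub-device (RID + PASID) may use its own IOASID */
	return ioasid_attach_pasid(dev, pasid, ioasid);
}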

Best regards,
baolu



Re: [PATCH] iommu/io-pgtable-arm: Optimize partial walk flush for large scatter-gather list

2021-06-17 Thread Sai Prakash Ranjan
On 2021-06-15 17:21, Sai Prakash Ranjan wrote:
> Hi Krishna,
> 
> On 2021-06-14 23:18, Krishna Reddy wrote:
>>> Right, but we won't know until we profile the specific usecases or try
>>> them in a generic workload to see if they affect the performance. Sure,
>>> over-invalidation is a concern where multiple buffers can be mapped to the
>>> same context and the cache is not usable at the time for lookup and such,
>>> but we don't do it for small buffers and only for large buffers, which
>>> means thousands of TLB entry mappings, in which case TLBIASID is preferred
>>> (note: I mentioned the HW team recommendation to use it for anything
>>> greater than 128 TLB entries in my earlier reply). And also note that we
>>> do this only for partial walk flush; we are not arbitrarily changing all
>>> the TLBIs to ASID based.
>>
>> Most of the heavy bandwidth use cases do involve processing larger buffers.
>> When the physical memory is allocated discontiguously at page_size
>> (let's use 4KB here) granularity, each aligned 2MB chunk's IOVA unmap would
>> involve performing a TLBIASID, as 2MB is not a leaf. Essentially, it
>> happens all the time during large buffer unmaps and potentially impacts
>> active traffic on other large buffers. Depending on how much latency HW
>> engines can absorb, the overflow/underflow issues for ISO engines can be
>> sporadic and vendor specific.
>> Performing TLBIASID by default for all SoCs is not a safe operation.
>>
> 
> Ok, so what I gather from this is that it's not easy to test for the
> negative impact, you don't have data on it yet, and the behaviour is
> very vendor specific. To add on the qcom impl, we have several performance
> improvements for TLB cache invalidations in HW, like wait-for-safe (for
> realtime clients such as camera and display) and a few others, to allow
> cache lookups/updates while a TLBI is in progress for the same context
> bank, so at least we are good here.
> 
>>
>>> I am no camera expert, but what the camera team mentioned is that there
>>> is a thread which frees memory (large unused memory buffers) periodically,
>>> which ends up taking around 100+ms and causing some camera test failures
>>> with frame drops. Parallel efforts are already being made to optimize this
>>> usage of the thread, but as I mentioned previously, this is *not camera
>>> specific*; if someone else invokes such large unmaps, they are going to
>>> face the same issue.
>>
>> From the above, it doesn't look like the root cause of the frame drops is
>> fully understood.
>> Why is a 100+ms delay causing camera frame drops? Is the same thread
>> submitting the buffers to the camera after the unmap is complete? If not,
>> how is the unmap latency causing the issue here?
>>
> 
> Ok, since you are interested in the camera usecase, I have requested more
> details from the camera team and will share them once they come back.
> However, I don't think it's good to have unmap latency at all, and that is
> being addressed by this patch.
> 

As promised, here are some more details shared by the camera team:

Mapping of a framework buffer happens at the time of processing a request, and
unmapping of a framework buffer happens once the buffer is available from
hardware and the result is notified to the camera framework.
 * When there is a delay in unmapping a buffer, the result notification to the
   framework will be delayed and, based on the pipeline delay depth, new
   requests from the framework will be delayed.
 * The camera stack uses internal buffer managers for internal and framework
   buffers. These managers are accessed while mapping and unmapping, under a
   common lock, so the call is blocking. An unmapping delay therefore delays
   the mapping of a new request and leads to frame drops.

Map and unmap happen in the camera service process context. There is no
separate perf path to perform unmapping.

In the camera stack, along with map/unmap delay, additional delays are due to
HW. So HW should be able to get requests in time from SW to avoid frame drops.

Thanks,
Sai
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


Re: Re: [PATCH v8 03/10] eventfd: Increase the recursion depth of eventfd_signal()

2021-06-17 Thread Yongji Xie
On Thu, Jun 17, 2021 at 4:34 PM He Zhe  wrote:
>
>
>
> On 6/15/21 10:13 PM, Xie Yongji wrote:
> > Increase the recursion depth of eventfd_signal() to 1. This
> > is the maximum recursion depth we have found so far, which
> > can be triggered with the following call chain:
> >
> > kvm_io_bus_write[kvm]
> >   --> ioeventfd_write   [kvm]
> > --> eventfd_signal  [eventfd]
> >   --> vhost_poll_wakeup [vhost]
> > --> vduse_vdpa_kick_vq  [vduse]
> >   --> eventfd_signal[eventfd]
> >
> > Signed-off-by: Xie Yongji 
> > Acked-by: Jason Wang 
>
> The fix had been posted one year ago.
>
> https://lore.kernel.org/lkml/20200410114720.24838-1-zhe...@windriver.com/
>

OK, so it seems to be a fix for the RT system if my understanding is
correct? Any reason why it's not merged? I'm happy to rebase my series
on your patch if you'd like to repost it.

BTW, I also notice another thread for this issue:

https://lore.kernel.org/linux-fsdevel/dm6pr11mb420291b550a10853403c7592ff...@dm6pr11mb4202.namprd11.prod.outlook.com/T/

>
> > ---
> >  fs/eventfd.c| 2 +-
> >  include/linux/eventfd.h | 5 -
> >  2 files changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/eventfd.c b/fs/eventfd.c
> > index e265b6dd4f34..cc7cd1dbedd3 100644
> > --- a/fs/eventfd.c
> > +++ b/fs/eventfd.c
> > @@ -71,7 +71,7 @@ __u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n)
> >* it returns true, the eventfd_signal() call should be deferred to a
> >* safe context.
> >*/
> > - if (WARN_ON_ONCE(this_cpu_read(eventfd_wake_count)))
> > + if (WARN_ON_ONCE(this_cpu_read(eventfd_wake_count) > EFD_WAKE_DEPTH))
> >   return 0;
> >
> >   spin_lock_irqsave(&ctx->wqh.lock, flags);
> > diff --git a/include/linux/eventfd.h b/include/linux/eventfd.h
> > index fa0a524baed0..886d99cd38ef 100644
> > --- a/include/linux/eventfd.h
> > +++ b/include/linux/eventfd.h
> > @@ -29,6 +29,9 @@
> >  #define EFD_SHARED_FCNTL_FLAGS (O_CLOEXEC | O_NONBLOCK)
> >  #define EFD_FLAGS_SET (EFD_SHARED_FCNTL_FLAGS | EFD_SEMAPHORE)
> >
> > +/* Maximum recursion depth */
> > +#define EFD_WAKE_DEPTH 1
> > +
> >  struct eventfd_ctx;
> >  struct file;
> >
> > @@ -47,7 +50,7 @@ DECLARE_PER_CPU(int, eventfd_wake_count);
> >
> >  static inline bool eventfd_signal_count(void)
> >  {
> > - return this_cpu_read(eventfd_wake_count);
> > + return this_cpu_read(eventfd_wake_count) > EFD_WAKE_DEPTH;
>
> count is just a count. How deep is acceptable should be decided where
> eventfd_signal_count() is called.
>

The return value of this function is boolean rather than integer.
Please see the comments in eventfd_signal():

"then it should check eventfd_signal_count() before calling this
function. If it returns true, the eventfd_signal() call should be
deferred to a safe context."
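
For illustration, a minimal sketch of that deferral pattern in a
hypothetical wakeup path (struct my_ctx, my_irqfd_inject() and the
workqueue plumbing are made up for the example; eventfd_signal() and
eventfd_signal_count() are the real interfaces discussed above, and the
INIT_WORK() setup is omitted):

#include <linux/eventfd.h>
#include <linux/workqueue.h>

struct my_ctx {
	struct eventfd_ctx *trigger;
	struct work_struct inject_work;		/* queued when we must defer */
};

static void my_inject_work_fn(struct work_struct *work)
{
	struct my_ctx *ctx = container_of(work, struct my_ctx, inject_work);

	eventfd_signal(ctx->trigger, 1);	/* safe: process context */
}

static void my_irqfd_inject(struct my_ctx *ctx)
{
	/*
	 * If we are already inside an eventfd wakeup chain deeper than
	 * the allowed depth, defer the signal to a workqueue instead of
	 * recursing further.
	 */
	if (eventfd_signal_count())
		schedule_work(&ctx->inject_work);
	else
		eventfd_signal(ctx->trigger, 1);
}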

Thanks,
Yongji


[PATCHv2 2/3] iommu/io-pgtable: Optimize partial walk flush for large scatter-gather list

2021-06-17 Thread Sai Prakash Ranjan
Currently, for iommu_unmap() of a large scatter-gather list with page size
elements, the majority of the time is spent flushing partial walks in
__arm_lpae_unmap(), which is a VA-based TLB invalidation that invalidates
page-by-page on IOMMUs like arm-smmu-v2 (TLBIVA) which do not support
range-based invalidations like arm-smmu-v3.2 does.

For example, to unmap a 32MB scatter-gather list with page size elements
(8192 entries), there are 16 2MB buffer unmaps based on the pgsize (2MB
for a 4K granule), and each 2MB unmap further results in 512 TLBIVAs (2MB/4K),
for a total of 8192 TLBIVAs (512*16) for the 16 2MB chunks, causing a huge
overhead.
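
For reference, the arithmetic above can be written out as a short
back-of-the-envelope sketch (illustrative only, not part of the patch;
SZ_1M/SZ_2M/SZ_4K are the usual macros from linux/sizes.h):

	/* 32MB SG list, 4K granule, 2MB partial-walk chunks */
	unsigned long sg_size     = 32UL * SZ_1M;
	unsigned long nr_chunks   = sg_size / SZ_2M;              /* 16 */
	unsigned long nr_tlbiva   = nr_chunks * (SZ_2M / SZ_4K);  /* 16 * 512 = 8192 */
	unsigned long nr_tlbiasid = nr_chunks;                    /* 16 with the quirk */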

So instead, use the tlb_flush_all() callback (TLBIALL/TLBIASID) to invalidate
the entire context for a partial walk flush on the select few platforms where
the cost of over-invalidation is less than the unmap latency, using the newly
introduced quirk IO_PGTABLE_QUIRK_TLB_INV_ALL. We also do this for
non-strict mode, given that it is all about over-invalidation saving time on
individual unmaps and is generally non-deterministic.

For this example of a 32MB scatter-gather list unmap, this results in just
16 ASID-based TLB invalidations (TLBIASIDs) as opposed to 8192 TLBIVAs,
thereby drastically increasing unmap performance.

Test on QTI SM8150 SoC for 10 iterations of iommu_{map_sg}/unmap:
(average over 10 iterations)

Before this optimization:

size    iommu_map_sg    iommu_unmap
  4K        2.067 us       1.854 us
 64K        9.598 us       8.802 us
  1M      148.890 us     130.718 us
  2M      305.864 us      67.291 us
 12M     1793.604 us     390.838 us
 16M     2386.848 us     518.187 us
 24M     3563.296 us     775.989 us
 32M     4747.171 us    1033.364 us

After this optimization:

size    iommu_map_sg    iommu_unmap
  4K        1.723 us       1.765 us
 64K        9.880 us       8.869 us
  1M      155.364 us     135.223 us
  2M      303.906 us       5.385 us
 12M     1786.557 us      21.250 us
 16M     2391.890 us      27.437 us
 24M     3570.895 us      39.937 us
 32M     4755.234 us      51.797 us

This is further reduced once the map/unmap_pages() support gets in which
will result in just 1 TLBIASID as compared to 16 TLBIASIDs.

Real-world data also shows a big difference in unmap performance, as below:

There were reports of camera frame drops caused by the high overhead of
iommu unmap without this optimization: frequent unmaps issued by the camera
at about 100MB/s were taking more than 100ms, thereby causing frame
drops.

Signed-off-by: Sai Prakash Ranjan 
---
 include/linux/io-pgtable.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 45441592a0e6..fd6b30cfdbf7 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -219,6 +219,12 @@ static inline void
 io_pgtable_tlb_flush_walk(struct io_pgtable *iop, unsigned long iova,
  size_t size, size_t granule)
 {
+   if (iop->cfg.quirks & IO_PGTABLE_QUIRK_NON_STRICT ||
+   iop->cfg.quirks & IO_PGTABLE_QUIRK_TLB_INV_ALL) {
+   iop->cfg.tlb->tlb_flush_all(iop->cookie);
+   return;
+   }
+
if (iop->cfg.tlb && iop->cfg.tlb->tlb_flush_walk)
iop->cfg.tlb->tlb_flush_walk(iova, size, granule, iop->cookie);
 }
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation



[PATCHv2 3/3] iommu/arm-smmu-qcom: Set IO_PGTABLE_QUIRK_TLB_INV_ALL for QTI SoC impl

2021-06-17 Thread Sai Prakash Ranjan
Set the pgtable quirk IO_PGTABLE_QUIRK_TLB_INV_ALL for the QTI SoC
implementation to use ::tlb_flush_all() for partial walk flush
to improve unmap performance.

Signed-off-by: Sai Prakash Ranjan 
---
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index 7771d40176de..b8ae51592d00 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -146,6 +146,8 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
 {
struct adreno_smmu_priv *priv;
 
+   pgtbl_cfg->quirks |= IO_PGTABLE_QUIRK_TLB_INV_ALL;
+
/* Only enable split pagetables for the GPU device (SID 0) */
if (!qcom_adreno_smmu_is_gpu_device(dev))
return 0;
@@ -185,6 +187,14 @@ static const struct of_device_id qcom_smmu_client_of_match[] __maybe_unused = {
{ }
 };
 
+static int qcom_smmu_init_context(struct arm_smmu_domain *smmu_domain,
+   struct io_pgtable_cfg *pgtbl_cfg, struct device *dev)
+{
+   pgtbl_cfg->quirks |= IO_PGTABLE_QUIRK_TLB_INV_ALL;
+
+   return 0;
+}
+
 static int qcom_smmu_cfg_probe(struct arm_smmu_device *smmu)
 {
	unsigned int last_s2cr = ARM_SMMU_GR0_S2CR(smmu->num_mapping_groups - 1);
@@ -308,6 +318,7 @@ static int qcom_smmu500_reset(struct arm_smmu_device *smmu)
 }
 
 static const struct arm_smmu_impl qcom_smmu_impl = {
+   .init_context = qcom_smmu_init_context,
.cfg_probe = qcom_smmu_cfg_probe,
.def_domain_type = qcom_smmu_def_domain_type,
.reset = qcom_smmu500_reset,
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation



[PATCHv2 1/3] iommu/io-pgtable: Add a quirk to use tlb_flush_all() for partial walk flush

2021-06-17 Thread Sai Prakash Ranjan
Add a quirk, IO_PGTABLE_QUIRK_TLB_INV_ALL, to invalidate the entire context
with the tlb_flush_all() callback in partial walk flush, to improve unmap
performance on the select few platforms where the cost of over-invalidation
is less than the unmap latency.

Signed-off-by: Sai Prakash Ranjan 
---
 drivers/iommu/io-pgtable-arm.c | 3 ++-
 include/linux/io-pgtable.h | 5 +
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 87def58e79b5..5d362f2214bd 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -768,7 +768,8 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS |
IO_PGTABLE_QUIRK_NON_STRICT |
IO_PGTABLE_QUIRK_ARM_TTBR1 |
-   IO_PGTABLE_QUIRK_ARM_OUTER_WBWA))
+   IO_PGTABLE_QUIRK_ARM_OUTER_WBWA |
+   IO_PGTABLE_QUIRK_TLB_INV_ALL))
return NULL;
 
data = arm_lpae_alloc_pgtable(cfg);
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 4d40dfa75b55..45441592a0e6 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -82,6 +82,10 @@ struct io_pgtable_cfg {
 *
 * IO_PGTABLE_QUIRK_ARM_OUTER_WBWA: Override the outer-cacheability
 *  attributes set in the TCR for a non-coherent page-table walker.
+*
+* IO_PGTABLE_QUIRK_TLB_INV_ALL: Use TLBIALL/TLBIASID to invalidate
+*  entire context for partial walk flush to increase unmap
+*  performance on select few platforms.
 */
#define IO_PGTABLE_QUIRK_ARM_NS BIT(0)
#define IO_PGTABLE_QUIRK_NO_PERMS   BIT(1)
@@ -89,6 +93,7 @@ struct io_pgtable_cfg {
#define IO_PGTABLE_QUIRK_NON_STRICT BIT(4)
#define IO_PGTABLE_QUIRK_ARM_TTBR1  BIT(5)
#define IO_PGTABLE_QUIRK_ARM_OUTER_WBWA BIT(6)
+   #define IO_PGTABLE_QUIRK_TLB_INV_ALLBIT(7)
unsigned long   quirks;
unsigned long   pgsize_bitmap;
unsigned intias;
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation



[PATCHv2 0/3] iommu/io-pgtable: Optimize partial walk flush for large scatter-gather list

2021-06-17 Thread Sai Prakash Ranjan
Currently, for iommu_unmap() of a large scatter-gather list with page size
elements, the majority of the time is spent flushing partial walks in
__arm_lpae_unmap(), which is a VA-based TLB invalidation that invalidates
page-by-page on IOMMUs like arm-smmu-v2 (TLBIVA) which do not support
range-based invalidations like arm-smmu-v3.2 does.

For example, to unmap a 32MB scatter-gather list with page size elements
(8192 entries), there are 16 2MB buffer unmaps based on the pgsize (2MB
for a 4K granule), and each 2MB unmap further results in 512 TLBIVAs (2MB/4K),
for a total of 8192 TLBIVAs (512*16) for the 16 2MB chunks, causing a huge
overhead.

So instead, use the tlb_flush_all() callback (TLBIALL/TLBIASID) to invalidate
the entire context for a partial walk flush on the select few platforms where
the cost of over-invalidation is less than the unmap latency, using the newly
introduced quirk IO_PGTABLE_QUIRK_TLB_INV_ALL. We also do this for
non-strict mode, given that it is all about over-invalidation saving time on
individual unmaps and is generally non-deterministic.

For this example of a 32MB scatter-gather list unmap, this results in just
16 ASID-based TLB invalidations (TLBIASIDs) as opposed to 8192 TLBIVAs,
thereby drastically increasing unmap performance.

Test on QTI SM8150 SoC for 10 iterations of iommu_{map_sg}/unmap:
(average over 10 iterations)

Before this optimization:

size    iommu_map_sg    iommu_unmap
  4K        2.067 us       1.854 us
 64K        9.598 us       8.802 us
  1M      148.890 us     130.718 us
  2M      305.864 us      67.291 us
 12M     1793.604 us     390.838 us
 16M     2386.848 us     518.187 us
 24M     3563.296 us     775.989 us
 32M     4747.171 us    1033.364 us

After this optimization:

size    iommu_map_sg    iommu_unmap
  4K        1.723 us       1.765 us
 64K        9.880 us       8.869 us
  1M      155.364 us     135.223 us
  2M      303.906 us       5.385 us
 12M     1786.557 us      21.250 us
 16M     2391.890 us      27.437 us
 24M     3570.895 us      39.937 us
 32M     4755.234 us      51.797 us

This is further reduced once the map/unmap_pages() support gets in which
will result in just 1 TLBIASID as compared to 16 TLBIASIDs.

Real-world data also shows a big difference in unmap performance, as below:

There were reports of camera frame drops caused by the high overhead of
iommu unmap without this optimization: frequent unmaps issued by the camera
at about 100MB/s were taking more than 100ms, thereby causing frame
drops.

Changes in v2:
 * Add a quirk to choose tlb_flush_all in partial walk flush
 * Set the quirk for QTI SoC implementation

Sai Prakash Ranjan (3):
  iommu/io-pgtable: Add a quirk to use tlb_flush_all() for partial walk
flush
  iommu/io-pgtable: Optimize partial walk flush for large scatter-gather
list
  iommu/arm-smmu-qcom: Set IO_PGTABLE_QUIRK_TLB_INV_ALL for QTI SoC impl

 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 11 +++
 drivers/iommu/io-pgtable-arm.c |  3 ++-
 include/linux/io-pgtable.h | 11 +++
 3 files changed, 24 insertions(+), 1 deletion(-)

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation



Re: [PATCH] iommu/io-pgtable-arm: Optimize partial walk flush for large scatter-gather list

2021-06-17 Thread Sai Prakash Ranjan

Hi Krishna,

On 2021-06-18 02:48, Krishna Reddy wrote:
Instead of flush_ops in the init_context hook, perhaps an io_pgtable quirk,
since this is related to the TLB; probably a bad name, but
IO_PGTABLE_QUIRK_TLB_INV, which will be set in the init_context impl hook,
and the previous condition in io_pgtable_tlb_flush_walk() becomes something
like below. Seems very minimal and neat instead of poking into the
tlb_flush_walk functions or touching dma strict with some flag?

if (iop->cfg.quirks & IO_PGTABLE_QUIRK_NON_STRICT ||
    iop->cfg.quirks & IO_PGTABLE_QUIRK_TLB_INV) {
	iop->cfg.tlb->tlb_flush_all(iop->cookie);
	return;
}


Can you name it as IO_PGTABLE_QUIRK_TLB_INV_ASID or
IO_PGTABLE_QUIRK_TLB_INV_ALL_ASID?



tlb_flush_all() callback implementations can use TLBIALL or TLBIASID, so
having ASID in the quirk name doesn't sound right, given this quirk should
be generic enough to be usable on other implementations as well.
Instead I will go with IO_PGTABLE_QUIRK_TLB_INV_ALL and will be happy to
change it if others have some other preference.

Thanks,
Sai

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH v13 6/6] iommu: Remove mode argument from iommu_set_dma_strict()

2021-06-17 Thread Lu Baolu

On 6/17/21 3:41 PM, John Garry wrote:



@@ -349,10 +349,9 @@ static int __init iommu_dma_setup(char *str)
  }
  early_param("iommu.strict", iommu_dma_setup);
-void iommu_set_dma_strict(bool strict)
+void iommu_set_dma_strict(void)
  {
-    if (strict || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT))
-    iommu_dma_strict = strict;
+    iommu_dma_strict = true;


Sorry, I still can't get how iommu.strict kernel option works.

static int __init iommu_dma_setup(char *str)
{
 int ret = kstrtobool(str, &iommu_dma_strict);

 if (!ret)
 iommu_cmd_line |= IOMMU_CMD_LINE_STRICT;
 return ret;
}
early_param("iommu.strict", iommu_dma_setup);

The bit IOMMU_CMD_LINE_STRICT is only set, but not used anywhere.


It is used in patch 2/6:

+    pr_info("DMA domain TLB invalidation policy: %s mode %s\n",
+    iommu_dma_strict ? "strict" : "lazy",
+    (iommu_cmd_line & IOMMU_CMD_LINE_STRICT) ?
+    "(set via kernel command line)" : "");


Hence, I am wondering how it could work. A bug, or did I miss anything?


It is really just used for informative purposes now.


I am clear now. Thanks!

Best regards,
baolu

Re: [PATCH v13 6/6] iommu: Remove mode argument from iommu_set_dma_strict()

2021-06-17 Thread Lu Baolu

Hi Robin,

On 6/18/21 2:56 AM, Robin Murphy wrote:

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 60b1ec42e73b..ff221d3ddcbc 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -349,10 +349,9 @@ static int __init iommu_dma_setup(char *str)
  }
  early_param("iommu.strict", iommu_dma_setup);
-void iommu_set_dma_strict(bool strict)
+void iommu_set_dma_strict(void)
  {
-    if (strict || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT))
-    iommu_dma_strict = strict;
+    iommu_dma_strict = true;


Sorry, I still can't get how iommu.strict kernel option works.

static int __init iommu_dma_setup(char *str)
{
 int ret = kstrtobool(str, &iommu_dma_strict);


Note that this is the bit that does the real work - if the argument 
parses OK then iommu_dma_strict is reassigned with the appropriate 
value. The iommu_cmd_line stuff is a bit of additional bookkeeping, 
basically just so we can see whether default values have been overridden.


Ah, got it. Thanks a lot. I missed this part and naively thought it just
converts a string to an integer.

Best regards,
baolu

Re: [PATCH v13 4/6] iommu/vt-d: Add support for IOMMU default DMA mode build options

2021-06-17 Thread Lu Baolu

Hi John,

On 6/17/21 4:00 PM, John Garry wrote:

On 17/06/2021 08:32, Lu Baolu wrote:

On 6/16/21 7:03 PM, John Garry wrote:

@@ -4382,9 +4380,9 @@ int __init intel_iommu_init(void)
   * is likely to be much lower than the overhead of synchronizing
   * the virtual and physical IOMMU page-tables.
   */
-    if (!intel_iommu_strict && cap_caching_mode(iommu->cap)) {
-    pr_warn("IOMMU batching is disabled due to virtualization");
-    intel_iommu_strict = 1;
+    if (cap_caching_mode(iommu->cap)) {
+    pr_warn("IOMMU batching disallowed due to virtualization\n");
+    iommu_set_dma_strict(true);


With this change, a VM guest will always show this warning.


Would they have got it before also normally?

I mean, the default is intel_iommu_strict=0, so if cap_caching_mode(iommu->cap)
is true and intel_iommu_strict was not set to 1 elsewhere previously, then we
would get this print.


Yes. You are right.




How about
removing this message? Users could get the same information through the
kernel message added by "[PATCH v13 2/6] iommu: Print strict or lazy
mode at init time".


I think that the print from 2/6 should occur before this print.

Regardless I would think that you would still like to be notified of 
this change in policy, right?


However I now realize that the print is in a loop per iommu, so we would 
get it per iommu:


for_each_active_iommu(iommu, drhd) {
 /*
  * The flush queue implementation does not perform
  * page-selective invalidations that are required for efficient
  * TLB flushes in virtual environments.  The benefit of batching
  * is likely to be much lower than the overhead of synchronizing
  * the virtual and physical IOMMU page-tables.
  */
 if (!intel_iommu_strict && cap_caching_mode(iommu->cap)) {
     pr_warn("IOMMU batching is disabled due to virtualization");
     intel_iommu_strict = 1;
 }
 ...
}

I need to change that. How about this:

bool print_warning = false;

for_each_active_iommu(iommu, drhd) {
 /*
  * The flush queue implementation does not perform
  * page-selective invalidations that are required for efficient
  * TLB flushes in virtual environments.  The benefit of batching
  * is likely to be much lower than the overhead of synchronizing
  * the virtual and physical IOMMU page-tables.
  */
 if (!print_warning && cap_caching_mode(iommu->cap)) {
     pr_warn("IOMMU batching disallowed due to virtualization\n");
     iommu_set_dma_strict(true);
     print_warning = true;
 }
 ...
}

or use pr_warn_once().
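
For comparison, a rough sketch of the pr_warn_once() variant (illustrative
only, built from the snippet above rather than taken from the final patch):

for_each_active_iommu(iommu, drhd) {
	/*
	 * The flush queue implementation does not perform
	 * page-selective invalidations that are required for efficient
	 * TLB flushes in virtual environments.
	 */
	if (cap_caching_mode(iommu->cap)) {
		pr_warn_once("IOMMU batching disallowed due to virtualization\n");
		iommu_set_dma_strict(true);
	}
	...
}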


From my p.o.v., pr_*_once() is better.

How about using pr_info_once()? I don't think it's a warning; it's
just a policy choice in a VM environment.

Best regards,
baolu

Re: Plan for /dev/ioasid RFC v2

2021-06-17 Thread Jason Gunthorpe
On Thu, Jun 17, 2021 at 07:31:03AM +, Tian, Kevin wrote:
> > > Yes. function 1 is block-DMA while function 0 still attached to IOASID.
> > > Actually unbind from IOMMU fd doesn't change the security context.
> > > the change is conducted when attaching/detaching device to/from an
> > > IOASID.
> > 
> > But I think you're suggesting that the IOMMU context is simply the
> > device's default domain, so vfio is left in the position where the user
> > gained access to the device by binding it to an iommu_fd, but now the
> > device exists outside of the iommu_fd.

I don't think unbind should be allowed. Close the fd and re-open it if
you want to attach to a different iommu_fd.

> > to gate device access on binding the device to the iommu_fd?  The user
> > can get an accessible device_fd unbound from an iommu_fd on the reverse
> > path.
> 
> yes, binding to iommu_fd is not the appropriate point of gating
> device access.

Binding is the only point we have enough information to make a
full security decision. Device FDs that are not bound must be
inoperable until bound.

The complexities with revoking mmap/etc are what lead me to conclude
that unbind is not worth doing - we can't go back to an inoperable
state very easily.

> Yes, that was the original impression. But after figuring out the new
> block-DMA behavior, I'm not sure whether /dev/iommu must maintain
> its own group integrity check. If it trusts vfio, I feel it's fine to avoid
> such a check, which even allows a group of devices to be bound to different
> IOMMU fds if the user likes. Also, if we want to sustain the current vfio
> semantics, which don't require all devices in the group to be bound to the
> vfio driver, it seems pointless to enforce such an integrity check in
> /dev/iommu.
> 
> Jason, what's your opinion?

I think the iommu code should do all of this, I don't see why vfio
should be dealing with *iommu* isolation.

The rest of this email got a bit long for me to catch up on, sorry :\

Jason


Re: Plan for /dev/ioasid RFC v2

2021-06-17 Thread Jason Gunthorpe
On Thu, Jun 17, 2021 at 03:14:52PM -0600, Alex Williamson wrote:

> I've referred to this as a limitation of type1, that we can't put
> devices within the same group into different address spaces, such as
> behind separate vRoot-Ports in a vIOMMU config, but really, who cares?
> As isolation support improves we see fewer multi-device groups, this
> scenario becomes the exception.  Buy better hardware to use the devices
> independently.

This is basically my thinking too, but my conclusion is that we should
not continue to make groups central to the API.

As I've explained to David this is actually causing functional
problems and mess - and I don't see a clean way to keep groups central
but still have the device in control of what is happening. We need
this device <-> iommu connection to be direct to robustly model all
the things that are in the RFC.

To keep groups central someone needs to sketch out how to solve
today's mdev SW page table and mdev PASID issues in a clean
way. Device centric is my suggestion on how to make it clean, but I
haven't heard an alternative??

So, I view the purpose of this discussion to scope out what a
device-centric world looks like and then if we can securely fit in the
legacy non-isolated world on top of that clean future oriented
API. Then decide if it is work worth doing or not.

To my mind it looks like it is not so bad, granted not every detail is
clear, and no code has been sketched, but I don't see a big scary
blocker emerging. An extra ioctl or two, some special logic that
activates for >1 device groups that looks a lot like VFIO's current
logic..

At some level I would be perfectly fine if we made the group FD part
of the API for >1 device groups - except that complexifies every user
space implementation to deal with that. It doesn't feel like a good
trade off.

Jason

(I've been off this week so I didn't try to read/answer absolutely
everything, just a few things - though it looks like this is settling
down into 'kevin make a specific proposal' kind of situation..)


Re: Plan for /dev/ioasid RFC v2

2021-06-17 Thread Jason Gunthorpe
On Tue, Jun 15, 2021 at 10:12:15AM -0600, Alex Williamson wrote:
> 
> 1) A dual-function PCIe e1000e NIC where the functions are grouped
>together due to ACS isolation issues.
> 
>a) Initial state: functions 0 & 1 are both bound to e1000e driver.
> 
>b) Admin uses driverctl to bind function 1 to vfio-pci, creating
>   vfio device file, which is chmod'd to grant to a user.
> 
>c) User opens vfio function 1 device file and an iommu_fd, binds
>device_fd to iommu_fd.
> 
>Does this succeed?
>  - if no, specifically where does it fail?

No, the e1000e driver is still connected to the device.

It fails during the VFIO_BIND_IOASID_FD call because the iommu common
code checks the group membership for consistency.

We detect it basically the same way things work today, just moved to
the iommu code.

>d) Repeat b) for function 0.
>e) Repeat c), still using function 1, is it different?  Where?  Why?

Succeeds because all group device members are now bound to vfio

It is hard to predict the nicest way to do all of this, but I would
start by imagining that iommu_fd using drivers (like vfio) will call
some kind of iommu_fd_allow_dma_blocking() call during their probe()
which organizes the machinery to drive this.

> 2) The same NIC as 1)
> 
>a) Initial state: functions 0 & 1 bound to vfio-pci, vfio device
>   files granted to user, user has bound both device_fds to the same
>   iommu_fd.
> 
>AIUI, even though not bound to an IOASID, vfio can now enable access
>through the device_fds, right?

Yes

>What specific entity has placed these
>devices into a block DMA state, when, and how?

To keep all the semantics the same it must be done as part of
VFIO_BIND_IOASID_FD. 

This will have to go over every device in the group and put it in the
dma blocked state. Riffing on the above this is possible if there is
no attached device driver, or the device driver that is attached has
called iommu_fd_allow_dma_blocking() during its probe()
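
A minimal sketch of what that could look like on the driver side
(iommu_fd_allow_dma_blocking() is the hypothetical helper named above, not
an existing API, and the vfio-pci probe shown here is only illustrative):

static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
	int ret;

	/*
	 * Hypothetical: tell the iommu layer that this driver tolerates
	 * its device being put into the "block DMA" state while other
	 * members of the group are bound to an iommu_fd.
	 */
	ret = iommu_fd_allow_dma_blocking(&pdev->dev);
	if (ret)
		return ret;

	/* ... normal vfio-pci setup continues ... */
	return 0;
}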

I haven't gone through all of Kevin's notes about how this could be
sorted out directly in the iommu code, though...

>b) Both devices are attached to the same IOASID.
>
>Are we assuming that each device was atomically moved to the new
>IOMMU context by the IOASID code?  What if the IOMMU cannot change
>the domain atomically?

What does "atomically" mean here? I assume all IOMMU HW can
change IOASIDs without accidentally leaking traffic
through.

Otherwise that is a major design restriction..

> c) The device_fd for function 1 is detached from the IOASID.
> 
>Are we assuming the reverse of b) performed by the IOASID code?

Yes, the IOMMU will change from the active IOASID to the "block DMA"
ioasid in a way that is secure.

>d) The device_fd for function 1 is unbound from the iommu_fd.
> 
>Does this succeed?

Yes

>  - if yes, what is the resulting IOMMU context of the device and
>who owns it?

device_fd for function 1 remains set to the "block DMA"
ioasid.

Attempting to attach a kernel driver triggers bug_on as today

Attempting to open it again and use it with a different iommu_fd fails

>e) Function 1 is unbound from vfio-pci.
> 
>Does this work or is it blocked?  If blocked, by what entity
>specifically?

As today, it is allowed. The IOASID would have to remain at the "block
all dma" until the implicit connection to the group in the iommu_fd is
released.

>f) Function 1 is bound to e1000e driver.

As today bug_on is triggered via the same maze of notifiers (gross,
but where we are for now). The notifiers would be done by the iommu_fd
instead of vfio

> 3) A dual-function conventional PCI e1000 NIC where the functions are
>grouped together due to shared RID.

This operates effectively the same as today. Manipulating a device
implicitly manipulates the group. Instead of doing dma block the
devices track the IOASID the group is using. 

We model it by demanding that all devices attach to the same IOASID
and instead of doing the DMA block step the device remains attached to
the group's IOASID.  Today this is such an uncommon configuration (a
PCI bridge!) we shouldn't design the entire API around it.

> If vfio gets to offload all of it's group management to IOASID code,
> that's great, but I'm afraid that IOASID is so focused on a
> device-level API that we're instead just ignoring the group dynamics
> and vfio will be forced to provide oversight to maintain secure
> userspace access.

I think it would be a major design failure if VFIO is required to
provide additional security on top of the iommu code. This is
basically the refactoring exercise - to move the VFIO code that is
only about iommu concerns to the iommu layer and VFIO becomes thinner.

Otherwise we still can't properly share this code - why should VDPA
and VFIO have different isolation models? Is it just because we expect
that everything except VFIO has 1:1 groups or no group at all? Feels
wonky.

Jason

Re: [PATCH v13 09/12] swiotlb: Add restricted DMA alloc/free support

2021-06-17 Thread Stefano Stabellini
On Thu, 17 Jun 2021, Claire Chang wrote:
> Add the functions, swiotlb_{alloc,free} and is_swiotlb_for_alloc to
> support the memory allocation from restricted DMA pool.
> 
> The restricted DMA pool is preferred if available.
> 
> Note that since coherent allocation needs remapping, one must set up
> another device coherent pool by shared-dma-pool and use
> dma_alloc_from_dev_coherent instead for atomic coherent allocation.
> 
> Signed-off-by: Claire Chang 
> Reviewed-by: Christoph Hellwig 
> Tested-by: Stefano Stabellini 
> Tested-by: Will Deacon 

Acked-by: Stefano Stabellini 


> ---
>  include/linux/swiotlb.h | 26 ++
>  kernel/dma/direct.c | 49 +++--
>  kernel/dma/swiotlb.c| 38 ++--
>  3 files changed, 99 insertions(+), 14 deletions(-)
> 
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index 8d8855c77d9a..a73fad460162 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -85,6 +85,7 @@ extern enum swiotlb_force swiotlb_force;
>   * @debugfs: The dentry to debugfs.
>   * @late_alloc:  %true if allocated using the page allocator
>   * @force_bounce: %true if swiotlb bouncing is forced
> + * @for_alloc:  %true if the pool is used for memory allocation
>   */
>  struct io_tlb_mem {
>   phys_addr_t start;
> @@ -96,6 +97,7 @@ struct io_tlb_mem {
>   struct dentry *debugfs;
>   bool late_alloc;
>   bool force_bounce;
> + bool for_alloc;
>   struct io_tlb_slot {
>   phys_addr_t orig_addr;
>   size_t alloc_size;
> @@ -156,4 +158,28 @@ static inline void swiotlb_adjust_size(unsigned long 
> size)
>  extern void swiotlb_print_info(void);
>  extern void swiotlb_set_max_segment(unsigned int);
>  
> +#ifdef CONFIG_DMA_RESTRICTED_POOL
> +struct page *swiotlb_alloc(struct device *dev, size_t size);
> +bool swiotlb_free(struct device *dev, struct page *page, size_t size);
> +
> +static inline bool is_swiotlb_for_alloc(struct device *dev)
> +{
> + return dev->dma_io_tlb_mem->for_alloc;
> +}
> +#else
> +static inline struct page *swiotlb_alloc(struct device *dev, size_t size)
> +{
> + return NULL;
> +}
> +static inline bool swiotlb_free(struct device *dev, struct page *page,
> + size_t size)
> +{
> + return false;
> +}
> +static inline bool is_swiotlb_for_alloc(struct device *dev)
> +{
> + return false;
> +}
> +#endif /* CONFIG_DMA_RESTRICTED_POOL */
> +
>  #endif /* __LINUX_SWIOTLB_H */
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index a92465b4eb12..2de33e5d302b 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -75,6 +75,15 @@ static bool dma_coherent_ok(struct device *dev, 
> phys_addr_t phys, size_t size)
>   min_not_zero(dev->coherent_dma_mask, dev->bus_dma_limit);
>  }
>  
> +static void __dma_direct_free_pages(struct device *dev, struct page *page,
> + size_t size)
> +{
> + if (IS_ENABLED(CONFIG_DMA_RESTRICTED_POOL) &&
> + swiotlb_free(dev, page, size))
> + return;
> + dma_free_contiguous(dev, page, size);
> +}
> +
>  static struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
>   gfp_t gfp)
>  {
> @@ -86,6 +95,16 @@ static struct page *__dma_direct_alloc_pages(struct device 
> *dev, size_t size,
>  
>   gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
>  &phys_limit);
> + if (IS_ENABLED(CONFIG_DMA_RESTRICTED_POOL) &&
> + is_swiotlb_for_alloc(dev)) {
> + page = swiotlb_alloc(dev, size);
> + if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
> + __dma_direct_free_pages(dev, page, size);
> + return NULL;
> + }
> + return page;
> + }
> +
>   page = dma_alloc_contiguous(dev, size, gfp);
>   if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
>   dma_free_contiguous(dev, page, size);
> @@ -142,7 +161,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
>   gfp |= __GFP_NOWARN;
>  
>   if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&
> - !force_dma_unencrypted(dev)) {
> + !force_dma_unencrypted(dev) && !is_swiotlb_for_alloc(dev)) {
>   page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO);
>   if (!page)
>   return NULL;
> @@ -155,18 +174,23 @@ void *dma_direct_alloc(struct device *dev, size_t size,
>   }
>  
>   if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) &&
> - !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
> - !dev_is_dma_coherent(dev))
> + !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && !dev_is_dma_coherent(dev) &&
> + !is_swiotlb_for_alloc(dev))
>   return arch_dma_alloc(dev, size, dma_handle, gfp, attrs);
>  
>   /*
>* 

Re: [PATCH v13 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing

2021-06-17 Thread Stefano Stabellini
On Thu, 17 Jun 2021, Claire Chang wrote:
> Propagate the swiotlb_force into io_tlb_default_mem->force_bounce and
> use it to determine whether to bounce the data or not. This will be
> useful later to allow for different pools.
> 
> Signed-off-by: Claire Chang 
> Reviewed-by: Christoph Hellwig 
> Tested-by: Stefano Stabellini 
> Tested-by: Will Deacon 

Acked-by: Stefano Stabellini 


> ---
>  drivers/xen/swiotlb-xen.c |  2 +-
>  include/linux/swiotlb.h   | 11 +++
>  kernel/dma/direct.c   |  2 +-
>  kernel/dma/direct.h   |  2 +-
>  kernel/dma/swiotlb.c  |  4 
>  5 files changed, 18 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> index 0c6ed09f8513..4730a146fa35 100644
> --- a/drivers/xen/swiotlb-xen.c
> +++ b/drivers/xen/swiotlb-xen.c
> @@ -369,7 +369,7 @@ static dma_addr_t xen_swiotlb_map_page(struct device 
> *dev, struct page *page,
>   if (dma_capable(dev, dev_addr, size, true) &&
>   !range_straddles_page_boundary(phys, size) &&
>   !xen_arch_need_swiotlb(dev, phys, dev_addr) &&
> - swiotlb_force != SWIOTLB_FORCE)
> + !is_swiotlb_force_bounce(dev))
>   goto done;
>  
>   /*
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index dd1c30a83058..8d8855c77d9a 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -84,6 +84,7 @@ extern enum swiotlb_force swiotlb_force;
>   *   unmap calls.
>   * @debugfs: The dentry to debugfs.
>   * @late_alloc:  %true if allocated using the page allocator
> + * @force_bounce: %true if swiotlb bouncing is forced
>   */
>  struct io_tlb_mem {
>   phys_addr_t start;
> @@ -94,6 +95,7 @@ struct io_tlb_mem {
>   spinlock_t lock;
>   struct dentry *debugfs;
>   bool late_alloc;
> + bool force_bounce;
>   struct io_tlb_slot {
>   phys_addr_t orig_addr;
>   size_t alloc_size;
> @@ -109,6 +111,11 @@ static inline bool is_swiotlb_buffer(struct device *dev, 
> phys_addr_t paddr)
>   return mem && paddr >= mem->start && paddr < mem->end;
>  }
>  
> +static inline bool is_swiotlb_force_bounce(struct device *dev)
> +{
> + return dev->dma_io_tlb_mem->force_bounce;
> +}
> +
>  void __init swiotlb_exit(void);
>  unsigned int swiotlb_max_segment(void);
>  size_t swiotlb_max_mapping_size(struct device *dev);
> @@ -120,6 +127,10 @@ static inline bool is_swiotlb_buffer(struct device *dev, 
> phys_addr_t paddr)
>  {
>   return false;
>  }
> +static inline bool is_swiotlb_force_bounce(struct device *dev)
> +{
> + return false;
> +}
>  static inline void swiotlb_exit(void)
>  {
>  }
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 7a88c34d0867..a92465b4eb12 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -496,7 +496,7 @@ size_t dma_direct_max_mapping_size(struct device *dev)
>  {
>   /* If SWIOTLB is active, use its maximum mapping size */
>   if (is_swiotlb_active(dev) &&
> - (dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE))
> + (dma_addressing_limited(dev) || is_swiotlb_force_bounce(dev)))
>   return swiotlb_max_mapping_size(dev);
>   return SIZE_MAX;
>  }
> diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
> index 13e9e7158d94..4632b0f4f72e 100644
> --- a/kernel/dma/direct.h
> +++ b/kernel/dma/direct.h
> @@ -87,7 +87,7 @@ static inline dma_addr_t dma_direct_map_page(struct device 
> *dev,
>   phys_addr_t phys = page_to_phys(page) + offset;
>   dma_addr_t dma_addr = phys_to_dma(dev, phys);
>  
> - if (unlikely(swiotlb_force == SWIOTLB_FORCE))
> + if (is_swiotlb_force_bounce(dev))
>   return swiotlb_map(dev, phys, size, dir, attrs);
>  
>   if (unlikely(!dma_capable(dev, dma_addr, size, true))) {
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 409694d7a8ad..13891d5de8c9 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -179,6 +179,10 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem 
> *mem, phys_addr_t start,
>   mem->end = mem->start + bytes;
>   mem->index = 0;
>   mem->late_alloc = late_alloc;
> +
> + if (swiotlb_force == SWIOTLB_FORCE)
> + mem->force_bounce = true;
> +
>   spin_lock_init(&mem->lock);
>   for (i = 0; i < mem->nslabs; i++) {
>   mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
> -- 
> 2.32.0.288.g62a8d224e6-goog
> 


Re: [PATCH v13 05/12] swiotlb: Update is_swiotlb_active to add a struct device argument

2021-06-17 Thread Stefano Stabellini
On Thu, 17 Jun 2021, Claire Chang wrote:
> Update is_swiotlb_active to add a struct device argument. This will be
> useful later to allow for different pools.
> 
> Signed-off-by: Claire Chang 
> Reviewed-by: Christoph Hellwig 
> Tested-by: Stefano Stabellini 
> Tested-by: Will Deacon 

Acked-by: Stefano Stabellini 


> ---
>  drivers/gpu/drm/i915/gem/i915_gem_internal.c | 2 +-
>  drivers/gpu/drm/nouveau/nouveau_ttm.c| 2 +-
>  drivers/pci/xen-pcifront.c   | 2 +-
>  include/linux/swiotlb.h  | 4 ++--
>  kernel/dma/direct.c  | 2 +-
>  kernel/dma/swiotlb.c | 4 ++--
>  6 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
> index a9d65fc8aa0e..4b7afa0fc85d 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
> @@ -42,7 +42,7 @@ static int i915_gem_object_get_pages_internal(struct 
> drm_i915_gem_object *obj)
>  
>   max_order = MAX_ORDER;
>  #ifdef CONFIG_SWIOTLB
> - if (is_swiotlb_active()) {
> + if (is_swiotlb_active(obj->base.dev->dev)) {
>   unsigned int max_segment;
>  
>   max_segment = swiotlb_max_segment();
> diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c 
> b/drivers/gpu/drm/nouveau/nouveau_ttm.c
> index 9662522aa066..be15bfd9e0ee 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_ttm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_ttm.c
> @@ -321,7 +321,7 @@ nouveau_ttm_init(struct nouveau_drm *drm)
>   }
>  
>  #if IS_ENABLED(CONFIG_SWIOTLB) && IS_ENABLED(CONFIG_X86)
> - need_swiotlb = is_swiotlb_active();
> + need_swiotlb = is_swiotlb_active(dev->dev);
>  #endif
>  
>   ret = ttm_bo_device_init(&drm->ttm.bdev, &nouveau_bo_driver,
> diff --git a/drivers/pci/xen-pcifront.c b/drivers/pci/xen-pcifront.c
> index b7a8f3a1921f..0d56985bfe81 100644
> --- a/drivers/pci/xen-pcifront.c
> +++ b/drivers/pci/xen-pcifront.c
> @@ -693,7 +693,7 @@ static int pcifront_connect_and_init_dma(struct 
> pcifront_device *pdev)
>  
>   spin_unlock(&pcifront_dev_lock);
>  
> - if (!err && !is_swiotlb_active()) {
> + if (!err && !is_swiotlb_active(&pdev->xdev->dev)) {
>   err = pci_xen_swiotlb_init_late();
>   if (err)
>   dev_err(&pdev->xdev->dev, "Could not setup SWIOTLB!\n");
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index d1f3d95881cd..dd1c30a83058 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -112,7 +112,7 @@ static inline bool is_swiotlb_buffer(struct device *dev, 
> phys_addr_t paddr)
>  void __init swiotlb_exit(void);
>  unsigned int swiotlb_max_segment(void);
>  size_t swiotlb_max_mapping_size(struct device *dev);
> -bool is_swiotlb_active(void);
> +bool is_swiotlb_active(struct device *dev);
>  void __init swiotlb_adjust_size(unsigned long size);
>  #else
>  #define swiotlb_force SWIOTLB_NO_FORCE
> @@ -132,7 +132,7 @@ static inline size_t swiotlb_max_mapping_size(struct 
> device *dev)
>   return SIZE_MAX;
>  }
>  
> -static inline bool is_swiotlb_active(void)
> +static inline bool is_swiotlb_active(struct device *dev)
>  {
>   return false;
>  }
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 84c9feb5474a..7a88c34d0867 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -495,7 +495,7 @@ int dma_direct_supported(struct device *dev, u64 mask)
>  size_t dma_direct_max_mapping_size(struct device *dev)
>  {
>   /* If SWIOTLB is active, use its maximum mapping size */
> - if (is_swiotlb_active() &&
> + if (is_swiotlb_active(dev) &&
>   (dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE))
>   return swiotlb_max_mapping_size(dev);
>   return SIZE_MAX;
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index de79e9437030..409694d7a8ad 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -664,9 +664,9 @@ size_t swiotlb_max_mapping_size(struct device *dev)
>   return ((size_t)IO_TLB_SIZE) * IO_TLB_SEGSIZE;
>  }
>  
> -bool is_swiotlb_active(void)
> +bool is_swiotlb_active(struct device *dev)
>  {
> - return io_tlb_default_mem != NULL;
> + return dev->dma_io_tlb_mem != NULL;
>  }
>  EXPORT_SYMBOL_GPL(is_swiotlb_active);
>  
> -- 
> 2.32.0.288.g62a8d224e6-goog
> 


Re: [PATCH v13 04/12] swiotlb: Update is_swiotlb_buffer to add a struct device argument

2021-06-17 Thread Stefano Stabellini
On Thu, 17 Jun 2021, Claire Chang wrote:
> Update is_swiotlb_buffer to add a struct device argument. This will be
> useful later to allow for different pools.
> 
> Signed-off-by: Claire Chang 
> Reviewed-by: Christoph Hellwig 
> Tested-by: Stefano Stabellini 
> Tested-by: Will Deacon 

Acked-by: Stefano Stabellini 


> ---
>  drivers/iommu/dma-iommu.c | 12 ++--
>  drivers/xen/swiotlb-xen.c |  2 +-
>  include/linux/swiotlb.h   |  7 ---
>  kernel/dma/direct.c   |  6 +++---
>  kernel/dma/direct.h   |  6 +++---
>  5 files changed, 17 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 3087d9fa6065..10997ef541f8 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -507,7 +507,7 @@ static void __iommu_dma_unmap_swiotlb(struct device *dev, 
> dma_addr_t dma_addr,
>  
>   __iommu_dma_unmap(dev, dma_addr, size);
>  
> - if (unlikely(is_swiotlb_buffer(phys)))
> + if (unlikely(is_swiotlb_buffer(dev, phys)))
>   swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
>  }
>  
> @@ -578,7 +578,7 @@ static dma_addr_t __iommu_dma_map_swiotlb(struct device 
> *dev, phys_addr_t phys,
>   }
>  
>   iova = __iommu_dma_map(dev, phys, aligned_size, prot, dma_mask);
> - if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(phys))
> + if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(dev, phys))
>   swiotlb_tbl_unmap_single(dev, phys, org_size, dir, attrs);
>   return iova;
>  }
> @@ -749,7 +749,7 @@ static void iommu_dma_sync_single_for_cpu(struct device 
> *dev,
>   if (!dev_is_dma_coherent(dev))
>   arch_sync_dma_for_cpu(phys, size, dir);
>  
> - if (is_swiotlb_buffer(phys))
> + if (is_swiotlb_buffer(dev, phys))
>   swiotlb_sync_single_for_cpu(dev, phys, size, dir);
>  }
>  
> @@ -762,7 +762,7 @@ static void iommu_dma_sync_single_for_device(struct 
> device *dev,
>   return;
>  
>   phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
> - if (is_swiotlb_buffer(phys))
> + if (is_swiotlb_buffer(dev, phys))
>   swiotlb_sync_single_for_device(dev, phys, size, dir);
>  
>   if (!dev_is_dma_coherent(dev))
> @@ -783,7 +783,7 @@ static void iommu_dma_sync_sg_for_cpu(struct device *dev,
>   if (!dev_is_dma_coherent(dev))
>   arch_sync_dma_for_cpu(sg_phys(sg), sg->length, dir);
>  
> - if (is_swiotlb_buffer(sg_phys(sg)))
> + if (is_swiotlb_buffer(dev, sg_phys(sg)))
>   swiotlb_sync_single_for_cpu(dev, sg_phys(sg),
>   sg->length, dir);
>   }
> @@ -800,7 +800,7 @@ static void iommu_dma_sync_sg_for_device(struct device 
> *dev,
>   return;
>  
>   for_each_sg(sgl, sg, nelems, i) {
> - if (is_swiotlb_buffer(sg_phys(sg)))
> + if (is_swiotlb_buffer(dev, sg_phys(sg)))
>   swiotlb_sync_single_for_device(dev, sg_phys(sg),
>  sg->length, dir);
>  
> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> index 4c89afc0df62..0c6ed09f8513 100644
> --- a/drivers/xen/swiotlb-xen.c
> +++ b/drivers/xen/swiotlb-xen.c
> @@ -100,7 +100,7 @@ static int is_xen_swiotlb_buffer(struct device *dev, 
> dma_addr_t dma_addr)
>* in our domain. Therefore _only_ check address within our domain.
>*/
>   if (pfn_valid(PFN_DOWN(paddr)))
> - return is_swiotlb_buffer(paddr);
> + return is_swiotlb_buffer(dev, paddr);
>   return 0;
>  }
>  
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index 216854a5e513..d1f3d95881cd 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -2,6 +2,7 @@
>  #ifndef __LINUX_SWIOTLB_H
>  #define __LINUX_SWIOTLB_H
>  
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -101,9 +102,9 @@ struct io_tlb_mem {
>  };
>  extern struct io_tlb_mem *io_tlb_default_mem;
>  
> -static inline bool is_swiotlb_buffer(phys_addr_t paddr)
> +static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
>  {
> - struct io_tlb_mem *mem = io_tlb_default_mem;
> + struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
>  
>   return mem && paddr >= mem->start && paddr < mem->end;
>  }
> @@ -115,7 +116,7 @@ bool is_swiotlb_active(void);
>  void __init swiotlb_adjust_size(unsigned long size);
>  #else
>  #define swiotlb_force SWIOTLB_NO_FORCE
> -static inline bool is_swiotlb_buffer(phys_addr_t paddr)
> +static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
>  {
>   return false;
>  }
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index f737e3347059..84c9feb5474a 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -343,7 +343,7 @@ void dma_direct_sync_sg_for_device(struct device *dev,
>   for_each_sg(sgl, sg, 

Re: [PATCH v13 03/12] swiotlb: Set dev->dma_io_tlb_mem to the swiotlb pool used

2021-06-17 Thread Stefano Stabellini
On Thu, 17 Jun 2021, Claire Chang wrote:
> Always have the pointer to the swiotlb pool used in struct device. This
> could help simplify the code for other pools.
> 
> Signed-off-by: Claire Chang 
> Reviewed-by: Christoph Hellwig 
> Tested-by: Stefano Stabellini 
> Tested-by: Will Deacon 

Acked-by: Stefano Stabellini 

> ---
>  drivers/base/core.c| 4 
>  include/linux/device.h | 4 
>  kernel/dma/swiotlb.c   | 8 
>  3 files changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index f29839382f81..cb3123e3954d 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -27,6 +27,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include  /* for dma_default_coherent */
>  
> @@ -2736,6 +2737,9 @@ void device_initialize(struct device *dev)
>  defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
>   dev->dma_coherent = dma_default_coherent;
>  #endif
> +#ifdef CONFIG_SWIOTLB
> + dev->dma_io_tlb_mem = io_tlb_default_mem;
> +#endif
>  }
>  EXPORT_SYMBOL_GPL(device_initialize);
>  
> diff --git a/include/linux/device.h b/include/linux/device.h
> index ba660731bd25..240d652a0696 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -416,6 +416,7 @@ struct dev_links_info {
>   * @dma_pools:   Dma pools (if dma'ble device).
>   * @dma_mem: Internal for coherent mem override.
>   * @cma_area:Contiguous memory area for dma allocations
> + * @dma_io_tlb_mem: Pointer to the swiotlb pool used.  Not for driver use.
>   * @archdata:For arch-specific additions.
>   * @of_node: Associated device tree node.
>   * @fwnode:  Associated device node supplied by platform firmware.
> @@ -518,6 +519,9 @@ struct device {
>  #ifdef CONFIG_DMA_CMA
>   struct cma *cma_area;   /* contiguous memory area for dma
>  allocations */
> +#endif
> +#ifdef CONFIG_SWIOTLB
> + struct io_tlb_mem *dma_io_tlb_mem;
>  #endif
>   /* arch specific additions */
>   struct dev_archdata archdata;
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 2dba659a1e73..de79e9437030 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -340,7 +340,7 @@ void __init swiotlb_exit(void)
>  static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t 
> size,
>  enum dma_data_direction dir)
>  {
> - struct io_tlb_mem *mem = io_tlb_default_mem;
> + struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
>   int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;
>   unsigned int offset = (tlb_addr - mem->start) & (IO_TLB_SIZE - 1);
>   phys_addr_t orig_addr = mem->slots[index].orig_addr;
> @@ -431,7 +431,7 @@ static unsigned int wrap_index(struct io_tlb_mem *mem, 
> unsigned int index)
>  static int find_slots(struct device *dev, phys_addr_t orig_addr,
>   size_t alloc_size)
>  {
> - struct io_tlb_mem *mem = io_tlb_default_mem;
> + struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
>   unsigned long boundary_mask = dma_get_seg_boundary(dev);
>   dma_addr_t tbl_dma_addr =
>   phys_to_dma_unencrypted(dev, mem->start) & boundary_mask;
> @@ -508,7 +508,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, 
> phys_addr_t orig_addr,
>   size_t mapping_size, size_t alloc_size,
>   enum dma_data_direction dir, unsigned long attrs)
>  {
> - struct io_tlb_mem *mem = io_tlb_default_mem;
> + struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
>   unsigned int offset = swiotlb_align_offset(dev, orig_addr);
>   unsigned int i;
>   int index;
> @@ -559,7 +559,7 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, 
> phys_addr_t tlb_addr,
> size_t mapping_size, enum dma_data_direction dir,
> unsigned long attrs)
>  {
> - struct io_tlb_mem *mem = io_tlb_default_mem;
> + struct io_tlb_mem *mem = hwdev->dma_io_tlb_mem;
>   unsigned long flags;
>   unsigned int offset = swiotlb_align_offset(hwdev, tlb_addr);
>   int index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
> -- 
> 2.32.0.288.g62a8d224e6-goog
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v13 01/12] swiotlb: Refactor swiotlb init functions

2021-06-17 Thread Stefano Stabellini
On Thu, 17 Jun 2021, Claire Chang wrote:
> Add a new function, swiotlb_init_io_tlb_mem, for the io_tlb_mem struct
> initialization to make the code reusable.
> 
> Signed-off-by: Claire Chang 
> Reviewed-by: Christoph Hellwig 
> Tested-by: Stefano Stabellini 
> Tested-by: Will Deacon 
> ---
>  kernel/dma/swiotlb.c | 50 ++--
>  1 file changed, 25 insertions(+), 25 deletions(-)
> 
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 52e2ac526757..47bb2a766798 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -168,9 +168,28 @@ void __init swiotlb_update_mem_attributes(void)
>   memset(vaddr, 0, bytes);
>  }
>  
> -int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int 
> verbose)
> +static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t 
> start,
> + unsigned long nslabs, bool late_alloc)
>  {
> + void *vaddr = phys_to_virt(start);
>   unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
> +
> + mem->nslabs = nslabs;
> + mem->start = start;
> + mem->end = mem->start + bytes;
> + mem->index = 0;
> + mem->late_alloc = late_alloc;
> + spin_lock_init(&mem->lock);
> + for (i = 0; i < mem->nslabs; i++) {
> + mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
> + mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
> + mem->slots[i].alloc_size = 0;
> + }
> + memset(vaddr, 0, bytes);
> +}
> +
> +int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int 
> verbose)
> +{
>   struct io_tlb_mem *mem;
>   size_t alloc_size;
>  
> @@ -186,16 +205,8 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned 
> long nslabs, int verbose)
>   if (!mem)
>   panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
> __func__, alloc_size, PAGE_SIZE);
> - mem->nslabs = nslabs;
> - mem->start = __pa(tlb);
> - mem->end = mem->start + bytes;
> - mem->index = 0;
> - spin_lock_init(&mem->lock);
> - for (i = 0; i < mem->nslabs; i++) {
> - mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
> - mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
> - mem->slots[i].alloc_size = 0;
> - }
> +
> + swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
>  
>   io_tlb_default_mem = mem;
>   if (verbose)
> @@ -282,8 +293,8 @@ swiotlb_late_init_with_default_size(size_t default_size)
>  int
>  swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
>  {
> - unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
>   struct io_tlb_mem *mem;
> + unsigned long bytes = nslabs << IO_TLB_SHIFT;
>  
>   if (swiotlb_force == SWIOTLB_NO_FORCE)
>   return 0;
> @@ -297,20 +308,9 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long 
> nslabs)
>   if (!mem)
>   return -ENOMEM;
>  
> - mem->nslabs = nslabs;
> - mem->start = virt_to_phys(tlb);
> - mem->end = mem->start + bytes;
> - mem->index = 0;
> - mem->late_alloc = 1;
> - spin_lock_init(&mem->lock);
> - for (i = 0; i < mem->nslabs; i++) {
> - mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
> - mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
> - mem->slots[i].alloc_size = 0;
> - }
> -
> + memset(mem, 0, sizeof(*mem));
> + swiotlb_init_io_tlb_mem(mem, virt_to_phys(tlb), nslabs, true);
>   set_memory_decrypted((unsigned long)tlb, bytes >> PAGE_SHIFT);
> - memset(tlb, 0, bytes);
 
This is good for swiotlb_late_init_with_tbl. However I have just noticed
that mem could also be allocated from swiotlb_init_with_tbl, in which
case the zeroing is missing. I think we need another memset in
swiotlb_init_with_tbl as well. Or maybe it could be better to have a
single memset at the beginning of swiotlb_init_io_tlb_mem instead. Up to
you.
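
A minimal sketch of the "single memset" option, assuming the helper keeps its
current signature (the per-slot fields are still initialized by the existing
loop, so zeroing the fixed part of the struct here covers both the early and
the late init callers):

static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
				    unsigned long nslabs, bool late_alloc)
{
	void *vaddr = phys_to_virt(start);
	unsigned long bytes = nslabs << IO_TLB_SHIFT, i;

	/* Zero the bookkeeping struct once here instead of in the callers. */
	memset(mem, 0, sizeof(*mem));
	mem->nslabs = nslabs;
	mem->start = start;
	mem->end = mem->start + bytes;
	mem->index = 0;
	mem->late_alloc = late_alloc;
	spin_lock_init(&mem->lock);
	for (i = 0; i < mem->nslabs; i++) {
		mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
		mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
		mem->slots[i].alloc_size = 0;
	}
	memset(vaddr, 0, bytes);
}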
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: Plan for /dev/ioasid RFC v2

2021-06-17 Thread Jason Gunthorpe
On Thu, Jun 17, 2021 at 02:45:46PM +1000, David Gibson wrote:
> On Wed, Jun 09, 2021 at 09:39:19AM -0300, Jason Gunthorpe wrote:
> > On Wed, Jun 09, 2021 at 02:24:03PM +0200, Joerg Roedel wrote:
> > > On Mon, Jun 07, 2021 at 02:58:18AM +, Tian, Kevin wrote:
> > > > -   Device-centric (Jason) vs. group-centric (David) uAPI. David is not 
> > > > fully
> > > > convinced yet. Based on discussion v2 will continue to have ioasid 
> > > > uAPI
> > > > being device-centric (but it's fine for vfio to be group-centric). 
> > > > A new
> > > > section will be added to elaborate this part;
> > > 
> > > I would vote for group-centric here. Or do the reasons for which VFIO is
> > > group-centric not apply to IOASID? If so, why?
> > 
> > VFIO being group centric has made it very ugly/difficult to inject
> > device driver specific knowledge into the scheme.
> > 
> > The device driver is the only thing that knows to ask:
> >  - I need a SW table for this ioasid because I am like a mdev
> >  - I will issue TLPs with PASID
> >  - I need a IOASID linked to a PASID
> >  - I am a devices that uses ENQCMD and vPASID
> >  - etc in future
> 
> mdev drivers might know these, but shim drivers, like basic vfio-pci
> often won't.

The generic drivers say 'I will do every kind of DMA possible', which
is in and of itself a special kind of information to convey.

There are a lot of weird corners to think about here, like what if the
guest asks for a PASID on an mdev that doesn't support PASID, but is
hooked to a RID that does, or other nonsensical combinations. These
need to be blocked/handled/whatever properly, which is made much
easier if the common code actually knows the details of what is going
on.

> I still think you're having a tendency to partially conflate several
> meanings of "group":
>   1. the unavoidable hardware unit of non-isolation
>   2. the kernel internal concept and interface to it
>   3. the user visible fd and interface

I think I have those pretty clearly separated :)
 
> We can't avoid having (1) somewhere, (3) and to a lesser extent (2)
> are what you object to.

I don't like (3) either, and am yet to hear a definitive reason why we
must have it..
 
> > The current approach has the group try to guess the device driver
> > intention in the vfio type 1 code.
> 
> I agree this has gotten ugly.  What I'm not yet convinced of is that
> reworking groups to make this not-ugly necessarily requires totally
> minimizing the importance of groups.

I think it does - we can't have the group in the middle and still put
the driver in charge; it doesn't really work.

At least, if someone can see an arrangement otherwise, let's hear it -
start with how to keep groups and remove the mdev hackery from type1.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: Plan for /dev/ioasid RFC v2

2021-06-17 Thread Jason Gunthorpe
On Thu, Jun 17, 2021 at 03:02:33PM +1000, David Gibson wrote:

> In other words, do we really have use cases where we need to identify
> different devices IDs, even though we know they're not isolated.

I think when PASID is added in and all the complexity that brings, it
does become more important, yes.

At the minimum we should scope the complexity.

I'm not convinced it is so complicated, really it is just a single bit
of information toward userspace: 'all devices in this group must use
the same IOASID'

Something like qemu consumes this bit and creates the pci/pcie bridge
to model this to the guest and so on.

Something like dpdk just doesn't care (same as today).

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] dt-bindings: Drop redundant minItems/maxItems

2021-06-17 Thread Rob Herring
On Thu, Jun 17, 2021 at 10:06 AM Suman Anna  wrote:
>
> Hi Rob,
>
> On 6/15/21 2:15 PM, Rob Herring wrote:
> > If a property has an 'items' list, then a 'minItems' or 'maxItems' with the
> > same size as the list is redundant and can be dropped. Note that this is DT
> > schema specific behavior and not standard json-schema behavior. The tooling
> > will fixup the final schema adding any unspecified minItems/maxItems.
> >
> > This condition is partially checked with the meta-schema already, but
> > only if both 'minItems' and 'maxItems' are equal to the 'items' length.
> > An improved meta-schema is pending.
> >
> > Cc: Jens Axboe 
> > Cc: Stephen Boyd 
> > Cc: Herbert Xu 
> > Cc: "David S. Miller" 
> > Cc: David Airlie 
> > Cc: Daniel Vetter 
> > Cc: Vinod Koul 
> > Cc: Bartosz Golaszewski 
> > Cc: Kamal Dasu 
> > Cc: Jonathan Cameron 
> > Cc: Lars-Peter Clausen 
> > Cc: Thomas Gleixner 
> > Cc: Marc Zyngier 
> > Cc: Joerg Roedel 
> > Cc: Jassi Brar 
> > Cc: Mauro Carvalho Chehab 
> > Cc: Krzysztof Kozlowski 
> > Cc: Ulf Hansson 
> > Cc: Jakub Kicinski 
> > Cc: Wolfgang Grandegger 
> > Cc: Marc Kleine-Budde 
> > Cc: Andrew Lunn 
> > Cc: Vivien Didelot 
> > Cc: Vladimir Oltean 
> > Cc: Bjorn Helgaas 
> > Cc: Kishon Vijay Abraham I 
> > Cc: Linus Walleij 
> > Cc: "Uwe Kleine-König" 
> > Cc: Lee Jones 
> > Cc: Ohad Ben-Cohen 
> > Cc: Mathieu Poirier 
> > Cc: Philipp Zabel 
> > Cc: Paul Walmsley 
> > Cc: Palmer Dabbelt 
> > Cc: Albert Ou 
> > Cc: Alessandro Zummo 
> > Cc: Alexandre Belloni 
> > Cc: Greg Kroah-Hartman 
> > Cc: Mark Brown 
> > Cc: Zhang Rui 
> > Cc: Daniel Lezcano 
> > Cc: Wim Van Sebroeck 
> > Cc: Guenter Roeck 
> > Signed-off-by: Rob Herring 
> > ---
> >  .../devicetree/bindings/ata/nvidia,tegra-ahci.yaml  | 1 -
> >  .../devicetree/bindings/clock/allwinner,sun4i-a10-ccu.yaml  | 2 --
> >  .../devicetree/bindings/clock/qcom,gcc-apq8064.yaml | 1 -
> >  Documentation/devicetree/bindings/clock/qcom,gcc-sdx55.yaml | 2 --
> >  .../devicetree/bindings/clock/qcom,gcc-sm8350.yaml  | 2 --
> >  .../devicetree/bindings/clock/sprd,sc9863a-clk.yaml | 1 -
> >  .../devicetree/bindings/crypto/allwinner,sun8i-ce.yaml  | 2 --
> >  Documentation/devicetree/bindings/crypto/fsl-dcp.yaml   | 1 -
> >  .../display/allwinner,sun4i-a10-display-backend.yaml| 6 --
> >  .../bindings/display/allwinner,sun6i-a31-mipi-dsi.yaml  | 1 -
> >  .../bindings/display/allwinner,sun8i-a83t-dw-hdmi.yaml  | 4 
> >  .../bindings/display/allwinner,sun8i-a83t-hdmi-phy.yaml | 2 --
> >  .../bindings/display/allwinner,sun8i-r40-tcon-top.yaml  | 2 --
> >  .../devicetree/bindings/display/bridge/cdns,mhdp8546.yaml   | 2 --
> >  .../bindings/display/rockchip/rockchip,dw-hdmi.yaml | 2 --
> >  Documentation/devicetree/bindings/display/st,stm32-dsi.yaml | 2 --
> >  .../devicetree/bindings/display/st,stm32-ltdc.yaml  | 1 -
> >  .../devicetree/bindings/display/xlnx/xlnx,zynqmp-dpsub.yaml | 4 
> >  .../devicetree/bindings/dma/renesas,rcar-dmac.yaml  | 1 -
> >  .../devicetree/bindings/edac/amazon,al-mc-edac.yaml | 2 --
> >  Documentation/devicetree/bindings/eeprom/at24.yaml  | 1 -
> >  Documentation/devicetree/bindings/example-schema.yaml   | 2 --
> >  Documentation/devicetree/bindings/gpu/brcm,bcm-v3d.yaml | 1 -
> >  Documentation/devicetree/bindings/gpu/vivante,gc.yaml   | 1 -
> >  Documentation/devicetree/bindings/i2c/brcm,brcmstb-i2c.yaml | 1 -
> >  .../devicetree/bindings/i2c/marvell,mv64xxx-i2c.yaml| 2 --
> >  .../devicetree/bindings/i2c/mellanox,i2c-mlxbf.yaml | 1 -
> >  .../devicetree/bindings/iio/adc/amlogic,meson-saradc.yaml   | 1 -
> >  .../devicetree/bindings/iio/adc/st,stm32-dfsdm-adc.yaml | 2 --
> >  .../bindings/interrupt-controller/fsl,irqsteer.yaml | 1 -
> >  .../bindings/interrupt-controller/loongson,liointc.yaml | 1 -
> >  Documentation/devicetree/bindings/iommu/arm,smmu-v3.yaml| 1 -
> >  .../devicetree/bindings/iommu/renesas,ipmmu-vmsa.yaml   | 1 -
> >  .../devicetree/bindings/mailbox/st,stm32-ipcc.yaml  | 2 --
> >  .../devicetree/bindings/media/amlogic,gx-vdec.yaml  | 1 -
> >  Documentation/devicetree/bindings/media/i2c/adv7604.yaml| 1 -
> >  .../devicetree/bindings/media/marvell,mmp2-ccic.yaml| 1 -
> >  .../devicetree/bindings/media/qcom,sc7180-venus.yaml| 1 -
> >  .../devicetree/bindings/media/qcom,sdm845-venus-v2.yaml | 1 -
> >  .../devicetree/bindings/media/qcom,sm8250-venus.yaml| 1 -
> >  Documentation/devicetree/bindings/media/renesas,drif.yaml   | 1 -
> >  .../bindings/memory-controllers/mediatek,smi-common.yaml| 6 ++
> >  .../bindings/memory-controllers/mediatek,smi-larb.yaml  | 1 -
> >  .../devicetree/bindings/mmc/allwinner,sun4i-a10-mmc.yaml| 2 --
> >  Documentation/devicetree/bindings/mmc/fsl-imx-esdhc.yaml| 1 -
> >  Documentation/devicetree/bindings/mmc/mtk-sd.yaml   | 2 --
> >  

Re: [PATCH] iommu/vt-d: Fix W=1 clang warning in intel/perf.c

2021-06-17 Thread Nathan Chancellor

On 6/17/2021 1:30 PM, Joerg Roedel wrote:

On Thu, Jun 17, 2021 at 10:16:50AM -0700, Nick Desaulniers wrote:

On Thu, Jun 17, 2021 at 7:54 AM Joerg Roedel  wrote:


From: Joerg Roedel 

Fix this warning when compiled with clang and W=1:

 drivers/iommu/intel/perf.c:16: warning: Function parameter or member 
'latency_lock' not described in 'DEFINE_SPINLOCK'
 drivers/iommu/intel/perf.c:16: warning: expecting prototype for 
perf.c(). Prototype was for DEFINE_SPINLOCK() instead


I think these warnings are actually produced by kernel-doc? (not clang)


Will kernel-doc check automatically when COMPILER=clang is set and W=1?
Because I did not explicitly enable any kernel-doc checks.

Regards,

Joerg



kernel-doc is run automatically with W=1, regardless of gcc versus clang.
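
For anyone wanting to reproduce these without a compiler in the loop, the
script can usually be run directly from the kernel tree, e.g.:

	./scripts/kernel-doc -none drivers/iommu/intel/perf.c

which should emit the same two warnings.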

Cheers,
Nathan
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH] iommu/io-pgtable-arm: Optimize partial walk flush for large scatter-gather list

2021-06-17 Thread Krishna Reddy
> Instead of flush_ops in the init_context hook, perhaps an io_pgtable quirk,
> since this is related to the TLB (probably a bad name, but
> IO_PGTABLE_QUIRK_TLB_INV), which would be set in the init_context impl hook,
> and the previous condition in io_pgtable_tlb_flush_walk() becomes something
> like below. Seems very minimal and neat instead of poking into the
> tlb_flush_walk functions or touching dma strict with some flag?
> 
> if (iop->cfg.quirks & IO_PGTABLE_QUIRK_NON_STRICT ||
>  iop->cfg.quirks & IO_PGTABLE_QUIRK_TLB_INV) {
>  iop->cfg.tlb->tlb_flush_all(iop->cookie);
>  return;
> }

Can you name it as IO_PGTABLE_QUIRK_TLB_INV_ASID or 
IO_PGTABLE_QUIRK_TLB_INV_ALL_ASID?
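
Whichever name is picked, the rough shape would be something like the sketch
below. This is purely illustrative: the quirk's bit value and the qcom
init_context hook used as the example setter are assumptions, not code taken
from this thread.

/* include/linux/io-pgtable.h */
#define IO_PGTABLE_QUIRK_TLB_INV_ALL_ASID	BIT(7)	/* placeholder bit value */

/* example impl->init_context() hook opting in, e.g. for the qcom SMMU */
static int qcom_smmu_init_context(struct arm_smmu_domain *smmu_domain,
				  struct io_pgtable_cfg *pgtbl_cfg,
				  struct device *dev)
{
	/*
	 * Partial walk flushes are expensive on this SMMU, so ask io-pgtable
	 * to collapse them into a single ASID-wide tlb_flush_all().
	 */
	pgtbl_cfg->quirks |= IO_PGTABLE_QUIRK_TLB_INV_ALL_ASID;
	return 0;
}

io_pgtable_tlb_flush_walk() would then test this bit alongside
IO_PGTABLE_QUIRK_NON_STRICT, exactly as in the snippet quoted above.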

-KR
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: Plan for /dev/ioasid RFC v2

2021-06-17 Thread Alex Williamson
On Thu, 17 Jun 2021 07:31:03 +
"Tian, Kevin"  wrote:

> > From: Alex Williamson 
> > Sent: Thursday, June 17, 2021 3:40 AM
> > 
> > On Wed, 16 Jun 2021 06:43:23 +
> > "Tian, Kevin"  wrote:
> >   
> > > > From: Alex Williamson 
> > > > Sent: Wednesday, June 16, 2021 12:12 AM
> > > >
> > > > On Tue, 15 Jun 2021 02:31:39 +
> > > > "Tian, Kevin"  wrote:
> > > >  
> > > > > > From: Alex Williamson 
> > > > > > Sent: Tuesday, June 15, 2021 12:28 AM
> > > > > >  
> > > > > [...]  
> > > > > > > IOASID. Today the group fd requires an IOASID before it hands out 
> > > > > > > a
> > > > > > > device_fd. With iommu_fd the device_fd will not allow IOCTLs 
> > > > > > > until  
> > it  
> > > > > > > has a blocked DMA IOASID and is successefully joined to an  
> > iommu_fd.  
> > > > > >
> > > > > > Which is the root of my concern.  Who owns ioctls to the device fd?
> > > > > > It's my understanding this is a vfio provided file descriptor and 
> > > > > > it's
> > > > > > therefore vfio's responsibility.  A device-level IOASID interface
> > > > > > therefore requires that vfio manage the group aspect of device 
> > > > > > access.
> > > > > > AFAICT, that means that device access can therefore only begin when 
> > > > > >  
> > all  
> > > > > > devices for a given group are attached to the IOASID and must halt 
> > > > > > for
> > > > > > all devices in the group if any device is ever detached from an 
> > > > > > IOASID,
> > > > > > even temporarily.  That suggests a lot more oversight of the 
> > > > > > IOASIDs  
> > by  
> > > > > > vfio than I'd prefer.
> > > > > >  
> > > > >
> > > > > This is possibly the point that is worthy of more clarification and
> > > > > alignment, as it sounds like the root of controversy here.
> > > > >
> > > > > I feel the goal of vfio group management is more about ownership, i.e.
> > > > > all devices within a group must be assigned to a single user. 
> > > > > Following
> > > > > the three rules defined by Jason, what we really care is whether a 
> > > > > group
> > > > > of devices can be isolated from the rest of the world, i.e. no access 
> > > > > to
> > > > > memory/device outside of its security context and no access to its
> > > > > security context from devices outside of this group. This can be  
> > achieved  
> > > > > as long as every device in the group is either in block-DMA state when
> > > > > it's not attached to any security context or attached to an IOASID  
> > context  
> > > > > in IOMMU fd.
> > > > >
> > > > > As long as group-level isolation is satisfied, how devices within a 
> > > > > group
> > > > > are further managed is decided by the user (unattached, all attached 
> > > > > to
> > > > > same IOASID, attached to different IOASIDs) as long as the user
> > > > > understands the implication of lacking of isolation within the group. 
> > > > >  
> > This  
> > > > > is what a device-centric model comes to play. Misconfiguration just  
> > hurts  
> > > > > the user itself.
> > > > >
> > > > > If this rationale can be agreed, then I didn't see the point of 
> > > > > having VFIO
> > > > > to mandate all devices in the group must be attached/detached in
> > > > > lockstep.  
> > > >
> > > > In theory this sounds great, but there are still too many assumptions
> > > > and too much hand waving about where isolation occurs for me to feel
> > > > like I really have the complete picture.  So let's walk through some
> > > > examples.  Please fill in and correct where I'm wrong.  
> > >
> > > Thanks for putting these examples. They are helpful for clearing the
> > > whole picture.
> > >
> > > Before filling in let's first align on what is the key difference between
> > > current VFIO model and this new proposal. With this comparison we'll
> > > know which of following questions are answered with existing VFIO
> > > mechanism and which are handled differently.
> > >
> > > With Yi's help we figured out the current mechanism:
> > >
> > > 1) vfio_group_viable. The code comment explains the intention clearly:
> > >
> > > --
> > > * A vfio group is viable for use by userspace if all devices are in
> > >  * one of the following states:
> > >  *  - driver-less
> > >  *  - bound to a vfio driver
> > >  *  - bound to an otherwise allowed driver
> > >  *  - a PCI interconnect device
> > > --
> > >
> > > Note this check is not related to an IOMMU security context.  
> > 
> > Because this is a pre-requisite for imposing that IOMMU security
> > context.
> >   
> > > 2) vfio_iommu_group_notifier. When an IOMMU_GROUP_NOTIFY_
> > > BOUND_DRIVER event is notified, vfio_group_viable is re-evaluated.
> > > If the affected group was previously viable but now becomes not
> > > viable, BUG_ON() as it implies that this device is bound to a non-vfio
> > > driver which breaks the group isolation.  
> > 
> > This notifier action is conditional on there being users of devices
> > within a secure group IOMMU context.
> >   
> > > 3) vfio_group_get_device_fd. User can acquire a 

[Patch V2 2/2] iommu/arm-smmu: Fix race condition during iommu_group creation

2021-06-17 Thread Ashish Mhetre
From: Krishna Reddy 

The iommu_group is created more than once during the asynchronous probe of
multiple display heads (devices) on the Tegra194 SoC. All the display heads
share the same SID and are expected to be in the same iommu_group.
As arm_smmu_device_group() does not protect group creation across devices,
multiple groups get created for devices with the same SID, leading to
subsequent IOMMU faults.
During the race, the iommu_probe_device() calls for two display devices both
end up in arm_smmu_device_group(), and hence two groups get created. Ideally,
after the group has been created for the first display device, the same group
should be reused by the second display device.
This race leads to context faults when one display device accesses an IOVA
mapped by the other display device, which shouldn't happen for devices
sharing the same SID.
Fix this by protecting group creation with smmu->stream_map_mutex.

Signed-off-by: Krishna Reddy 
---
Changes since V1:
- Update the commit message per Will's suggestion

 drivers/iommu/arm/arm-smmu/arm-smmu.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 6f72c4d..21af179 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1458,6 +1458,7 @@ static struct iommu_group *arm_smmu_device_group(struct 
device *dev)
struct iommu_group *group = NULL;
int i, idx;
 
+   mutex_lock(&smmu->stream_map_mutex);
for_each_cfg_sme(cfg, fwspec, i, idx) {
if (group && smmu->s2crs[idx].group &&
group != smmu->s2crs[idx].group)
@@ -1466,8 +1467,10 @@ static struct iommu_group *arm_smmu_device_group(struct 
device *dev)
group = smmu->s2crs[idx].group;
}
 
-   if (group)
+   if (group) {
+   mutex_unlock(&smmu->stream_map_mutex);
return iommu_group_ref_get(group);
+   }
 
if (dev_is_pci(dev))
group = pci_device_group(dev);
@@ -1481,6 +1484,7 @@ static struct iommu_group *arm_smmu_device_group(struct 
device *dev)
for_each_cfg_sme(cfg, fwspec, i, idx)
smmu->s2crs[idx].group = group;
 
+   mutex_unlock(&smmu->stream_map_mutex);
return group;
 }
 
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[Patch V2 1/2] iommu: Fix race condition during default domain allocation

2021-06-17 Thread Ashish Mhetre
The domain is created more than once during the asynchronous probe of
multiple display heads (devices). All the display heads share the same SID
and are expected to be in the same domain. As the iommu_alloc_default_domain()
call is not protected, it ends up creating two domains for two display
devices which should ideally be in the same domain.
iommu_alloc_default_domain() checks whether a domain is already allocated for
the given iommu group, but due to this race the check fails and two different
domains get created.
This leads to context faults when one device accesses an IOVA mapped by the
other device.
Fix this by protecting the iommu_alloc_default_domain() call with
group->mutex. With this fix, serialization happens only for devices sharing
the same group. Also, only the first device in the group holds the mutex
until the default domain is allocated; the rest of the devices just check for
an existing domain and then release the mutex.

Signed-off-by: Ashish Mhetre 
---
Changes since V1:
- Update the commit message per Will's suggestion

 drivers/iommu/iommu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 808ab70..2700500 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -273,7 +273,9 @@ int iommu_probe_device(struct device *dev)
 * support default domains, so the return value is not yet
 * checked.
 */
+   mutex_lock(&group->mutex);
iommu_alloc_default_domain(group, dev);
+   mutex_unlock(&group->mutex);
 
if (group->default_domain) {
ret = __iommu_attach_device(group->default_domain, dev);
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/vt-d: Fix W=1 clang warning in intel/perf.c

2021-06-17 Thread Joerg Roedel
On Thu, Jun 17, 2021 at 10:16:50AM -0700, Nick Desaulniers wrote:
> On Thu, Jun 17, 2021 at 7:54 AM Joerg Roedel  wrote:
> >
> > From: Joerg Roedel 
> >
> > Fix this warning when compiled with clang and W=1:
> >
> > drivers/iommu/intel/perf.c:16: warning: Function parameter or 
> > member 'latency_lock' not described in 'DEFINE_SPINLOCK'
> > drivers/iommu/intel/perf.c:16: warning: expecting prototype for 
> > perf.c(). Prototype was for DEFINE_SPINLOCK() instead
> 
> I think these warnings are actually produced by kernel-doc? (not clang)

Will kernel-doc check automatically when COMPILER=clang is set and W=1?
Because I did not explicitly enable any kernel-doc checks.

Regards,

Joerg
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[Patch V2 0/2] iommu/arm-smmu: Fix races in iommu domain/group creation

2021-06-17 Thread Ashish Mhetre
Multiple iommu domains and iommu groups get created for devices sharing the
same SID. Devices sharing the same SID are expected to be in the same iommu
group and the same iommu domain.
This leads to context faults when one device accesses an IOVA mapped by the
other device, which shouldn't happen for devices sharing the same SID.
Fix this by protecting iommu domain and iommu group creation with mutexes.

Ashish Mhetre (1):
  iommu: Fix race condition during default domain allocation

Krishna Reddy (1):
  iommu/arm-smmu: Fix race condition during iommu_group creation

 drivers/iommu/arm/arm-smmu/arm-smmu.c | 6 +-
 drivers/iommu/iommu.c | 2 ++
 2 files changed, 7 insertions(+), 1 deletion(-)

-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v13 4/6] iommu/vt-d: Add support for IOMMU default DMA mode build options

2021-06-17 Thread Robin Murphy

On 2021-06-17 09:00, John Garry wrote:

On 17/06/2021 08:32, Lu Baolu wrote:

On 6/16/21 7:03 PM, John Garry wrote:

@@ -4382,9 +4380,9 @@ int __init intel_iommu_init(void)
   * is likely to be much lower than the overhead of 
synchronizing

   * the virtual and physical IOMMU page-tables.
   */
-    if (!intel_iommu_strict && cap_caching_mode(iommu->cap)) {
-    pr_warn("IOMMU batching is disabled due to 
virtualization");

-    intel_iommu_strict = 1;
+    if (cap_caching_mode(iommu->cap)) {
+    pr_warn("IOMMU batching disallowed due to 
virtualization\n");

+    iommu_set_dma_strict(true);


With this change, VM guest will always show this warning.


Would they normally have got it before as well?

I mean, the default is intel_iommu_strict=0, so if 
cap_caching_mode(iommu->cap) is true and intel_iommu_strict was not 
previously set to 1 elsewhere, then we would get this print.



How about
removing this message? Users could get the same information through the
kernel message added by "[PATCH v13 2/6] iommu: Print strict or lazy
mode at init time".


I think that the print from 2/6 should occur before this print.

Regardless I would think that you would still like to be notified of 
this change in policy, right?


However I now realize that the print is in a loop per iommu, so we would 
get it per iommu:


for_each_active_iommu(iommu, drhd) {
 /*
  * The flush queue implementation does not perform
  * page-selective invalidations that are required for efficient
  * TLB flushes in virtual environments.  The benefit of batching
  * is likely to be much lower than the overhead of synchronizing
  * the virtual and physical IOMMU page-tables.
  */
 if (!intel_iommu_strict && cap_caching_mode(iommu->cap)) {
     pr_warn("IOMMU batching is disabled due to virtualization");
     intel_iommu_strict = 1;
 }
 ...
}

I need to change that. How about this:

bool print_warning = false;

for_each_active_iommu(iommu, drhd) {
 /*
  * The flush queue implementation does not perform
  * page-selective invalidations that are required for efficient
  * TLB flushes in virtual environments.  The benefit of batching
  * is likely to be much lower than the overhead of synchronizing
  * the virtual and physical IOMMU page-tables.
  */
 if (!print_warning && cap_caching_mode(iommu->cap)) {
     pr_warn("IOMMU batching disallowed due to virtualization\n");
     iommu_set_dma_strict(true);
     print_warning = true;
 }
 ...
}

or use pr_warn_once().


Maybe even downgrade it to pr_info_once(), since AIUI it's not really 
anything scary?


I suppose you could technically fake up a domain on the stack to get the 
global setting out of iommu_get_dma_strict(), or perhaps give 
iommu_set_dma_strict() a cheeky return value to indicate what the 
previous setting was, in order to suppress the message entirely if 
strict is already set, but I'm not at all convinced it's worth the bother.
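
For illustration, the "cheeky return value" variant could look roughly like
the sketch below (assuming the argument-less iommu_set_dma_strict() from
patch 6/6; this is just a sketch, not something actually posted as a patch):

bool iommu_set_dma_strict(void)
{
	bool was_strict = iommu_dma_strict;

	iommu_dma_strict = true;
	return was_strict;	/* report the previous policy to the caller */
}

/* caller in intel_iommu_init(), inside the for_each_active_iommu() loop */
if (cap_caching_mode(iommu->cap) && !iommu_set_dma_strict())
	pr_info_once("IOMMU batching disallowed due to virtualization\n");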


Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v13 1/6] iommu: Deprecate Intel and AMD cmdline methods to enable strict mode

2021-06-17 Thread Robin Murphy

On 2021-06-16 12:03, John Garry wrote:

Now that the x86 drivers support iommu.strict, deprecate the custom
methods.

Signed-off-by: John Garry 
---
  Documentation/admin-guide/kernel-parameters.txt | 5 +++--
  drivers/iommu/amd/init.c| 4 +++-
  drivers/iommu/intel/iommu.c | 1 +
  3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 30e9dd52464e..fcbb36d6eea7 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -290,7 +290,8 @@
amd_iommu=  [HW,X86-64]
Pass parameters to the AMD IOMMU driver in the system.
Possible values are:
-   fullflush - enable flushing of IO/TLB entries when
+   fullflush   [Deprecated, use iommu.strict instead]
+ - enable flushing of IO/TLB entries when
they are unmapped. Otherwise they are
flushed before they will be reused, which
is a lot of faster
@@ -1947,7 +1948,7 @@
bypassed by not enabling DMAR with this option. In
this case, gfx device will use physical address for
DMA.
-   strict [Default Off]
+   strict [Default Off] [Deprecated, use iommu.strict instead]
With this option on every unmap_single operation will
result in a hardware IOTLB flush operation as opposed
to batching them for performance.


FWIW I'd be inclined to replace both whole descriptions with just 
something like "Deprecated, equivalent to iommu.strict=1".



diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 46280e6e1535..9f3096d650aa 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -3098,8 +3098,10 @@ static int __init parse_amd_iommu_intr(char *str)
  static int __init parse_amd_iommu_options(char *str)
  {
for (; *str; ++str) {
-   if (strncmp(str, "fullflush", 9) == 0)
+   if (strncmp(str, "fullflush", 9) == 0) {
+   pr_warn("amd_iommu=fullflush deprecated; use iommu.strict 
instead\n");


Nit: maybe we should spell out "...use =1 instead" in all of 
these messages just in case anyone takes them literally? (I'm not sure 
the options parse correctly with no argument)


Either way,

Acked-by: Robin Murphy 

Thanks,
Robin.


amd_iommu_unmap_flush = true;
+   }
if (strncmp(str, "force_enable", 12) == 0)
amd_iommu_force_enable = true;
if (strncmp(str, "off", 3) == 0)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index bd93c7ec879e..821d8227a4e6 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -454,6 +454,7 @@ static int __init intel_iommu_setup(char *str)
pr_warn("intel_iommu=forcedac deprecated; use iommu.forcedac 
instead\n");
iommu_dma_forcedac = true;
} else if (!strncmp(str, "strict", 6)) {
+   pr_warn("intel_iommu=strict deprecated; use iommu.strict 
instead\n");
pr_info("Disable batched IOTLB flush\n");
intel_iommu_strict = 1;
} else if (!strncmp(str, "sp_off", 6)) {


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v13 6/6] iommu: Remove mode argument from iommu_set_dma_strict()

2021-06-17 Thread Robin Murphy

On 2021-06-17 08:36, Lu Baolu wrote:

On 6/16/21 7:03 PM, John Garry wrote:

We only ever now set strict mode enabled in iommu_set_dma_strict(), so
just remove the argument.

Signed-off-by: John Garry 
Reviewed-by: Robin Murphy 
---
  drivers/iommu/amd/init.c    | 2 +-
  drivers/iommu/intel/iommu.c | 6 +++---
  drivers/iommu/iommu.c   | 5 ++---
  include/linux/iommu.h   | 2 +-
  4 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index fb3618af643b..7bc460052678 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -3099,7 +3099,7 @@ static int __init parse_amd_iommu_options(char 
*str)

  for (; *str; ++str) {
  if (strncmp(str, "fullflush", 9) == 0) {
  pr_warn("amd_iommu=fullflush deprecated; use 
iommu.strict instead\n");

-    iommu_set_dma_strict(true);
+    iommu_set_dma_strict();
  }
  if (strncmp(str, "force_enable", 12) == 0)
  amd_iommu_force_enable = true;
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index d586990fa751..0618c35cfb51 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -454,7 +454,7 @@ static int __init intel_iommu_setup(char *str)
  iommu_dma_forcedac = true;
  } else if (!strncmp(str, "strict", 6)) {
  pr_warn("intel_iommu=strict deprecated; use iommu.strict 
instead\n");

-    iommu_set_dma_strict(true);
+    iommu_set_dma_strict();
  } else if (!strncmp(str, "sp_off", 6)) {
  pr_info("Disable supported super page\n");
  intel_iommu_superpage = 0;
@@ -4382,7 +4382,7 @@ int __init intel_iommu_init(void)
   */
  if (cap_caching_mode(iommu->cap)) {
  pr_warn("IOMMU batching disallowed due to 
virtualization\n");

-    iommu_set_dma_strict(true);
+    iommu_set_dma_strict();
  }
  iommu_device_sysfs_add(&iommu->iommu, NULL,
 intel_iommu_groups,
@@ -5699,7 +5699,7 @@ static void quirk_calpella_no_shadow_gtt(struct 
pci_dev *dev)

  } else if (dmar_map_gfx) {
  /* we have to ensure the gfx device is idle before we flush */
  pci_info(dev, "Disabling batched IOTLB flush on Ironlake\n");
-    iommu_set_dma_strict(true);
+    iommu_set_dma_strict();
  }
  }
  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x0040, 
quirk_calpella_no_shadow_gtt);

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 60b1ec42e73b..ff221d3ddcbc 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -349,10 +349,9 @@ static int __init iommu_dma_setup(char *str)
  }
  early_param("iommu.strict", iommu_dma_setup);
-void iommu_set_dma_strict(bool strict)
+void iommu_set_dma_strict(void)
  {
-    if (strict || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT))
-    iommu_dma_strict = strict;
+    iommu_dma_strict = true;


Sorry, I still can't get how iommu.strict kernel option works.

static int __init iommu_dma_setup(char *str)
{
     int ret = kstrtobool(str, &iommu_dma_strict);


Note that this is the bit that does the real work - if the argument 
parses OK then iommu_dma_strict is reassigned with the appropriate 
value. The iommu_cmd_line stuff is a bit of additional bookkeeping, 
basically just so we can see whether default values have been overridden.
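
(If it helps connect the dots: the place where that bookkeeping is consumed
is the init-time print added by patch 2/6 of this series, roughly along the
lines of the sketch below; wording from memory, not the exact patch text.)

	pr_info("DMA domain TLB invalidation policy: %s mode %s\n",
		iommu_dma_strict ? "strict" : "lazy",
		(iommu_cmd_line & IOMMU_CMD_LINE_STRICT) ?
			"(set via kernel command line)" : "");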


Robin.



     if (!ret)
     iommu_cmd_line |= IOMMU_CMD_LINE_STRICT;
     return ret;
}
early_param("iommu.strict", iommu_dma_setup);

The bit IOMMU_CMD_LINE_STRICT is only set, but not used anywhere. Hence,
I am wondering how it could work. A bug, or did I miss something?

Best regards,
baolu


  }
  bool iommu_get_dma_strict(struct iommu_domain *domain)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 32d448050bf7..754f67d6dd90 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -476,7 +476,7 @@ int iommu_enable_nesting(struct iommu_domain 
*domain);

  int iommu_set_pgtable_quirks(struct iommu_domain *domain,
  unsigned long quirks);
-void iommu_set_dma_strict(bool val);
+void iommu_set_dma_strict(void);
  bool iommu_get_dma_strict(struct iommu_domain *domain);
  extern int report_iommu_fault(struct iommu_domain *domain, struct 
device *dev,



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 1/2] iommu: Fix race condition during default domain allocation

2021-06-17 Thread Will Deacon
On Thu, Jun 17, 2021 at 11:21:39AM +0530, Ashish Mhetre wrote:
> 
> 
> On 6/11/2021 6:19 PM, Robin Murphy wrote:
> > External email: Use caution opening links or attachments
> > 
> > 
> > On 2021-06-11 11:45, Will Deacon wrote:
> > > On Thu, Jun 10, 2021 at 09:46:53AM +0530, Ashish Mhetre wrote:
> > > > Domain is getting created more than once during asynchronous multiple
> > > > display heads(devices) probe. All the display heads share same SID and
> > > > are expected to be in same domain. As iommu_alloc_default_domain() call
> > > > is not protected, the group->default_domain and group->domain are ending
> > > > up with different domains and leading to subsequent IOMMU faults.
> > > > Fix this by protecting iommu_alloc_default_domain() call with
> > > > group->mutex.
> > > 
> > > Can you provide some more information about exactly what the h/w
> > > configuration is, and the callstack which exhibits the race, please?
> > 
> > It'll be basically the same as the issue reported long ago with PCI
> > groups in the absence of ACS not being constructed correctly. Triggering
> > the iommu_probe_device() replay in of_iommu_configure() off the back of
> > driver probe is way too late and allows calls to happen in the wrong
> > order, or indeed race in parallel as here. Fixing that is still on my
> > radar, but will not be simple, and will probably go hand-in-hand with
> > phasing out the bus ops (for the multiple-driver-coexistence problem).
> > 
> For iommu group creation, the call flow during the race looks like:
> Display device 1:
> iommu_probe_device -> iommu_group_get_for_dev -> arm_smmu_device_group
> Display device 2:
> iommu_probe_device -> iommu_group_get_for_dev -> arm_smmu_device_group
> 
> This way it ends up creating 2 groups for the 2 display devices sharing the
> same SID.
> Ideally, for the 2nd display device, the iommu_group_get() call from
> iommu_group_get_for_dev() should return the same group as for the 1st
> display device. But due to the race, it ends up with 2 groups.
> 
> For the default domain, the call flow during the race looks like:
> Display device 1:
> iommu_probe_device -> iommu_alloc_default_domain -> arm_smmu_domain_alloc
> Display device 2:
> iommu_probe_device -> iommu_alloc_default_domain -> arm_smmu_domain_alloc
> 
> Here also, the 2nd device should already have a domain allocated, and the
> 'if (group->default_domain)' condition in iommu_alloc_default_domain()
> should be true for the 2nd device.
> 
> The issue is that IOVA accesses from the 2nd device result in context faults.

Thanks for the explanation (also Robin and Krishna). Please put some of this
in the commit message for the next version.

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/vt-d: Fix W=1 clang warning in intel/perf.c

2021-06-17 Thread Nick Desaulniers via iommu
On Thu, Jun 17, 2021 at 7:54 AM Joerg Roedel  wrote:
>
> From: Joerg Roedel 
>
> Fix this warning when compiled with clang and W=1:
>
> drivers/iommu/intel/perf.c:16: warning: Function parameter or member 
> 'latency_lock' not described in 'DEFINE_SPINLOCK'
> drivers/iommu/intel/perf.c:16: warning: expecting prototype for 
> perf.c(). Prototype was for DEFINE_SPINLOCK() instead

I think these warnings are actually produced by kernel-doc? (not clang)

>
> Cc: Lu Baolu 
> Reported-by: kernel test robot 
> Fixes: 55ee5e67a59a ("iommu/vt-d: Add common code for dmar latency 
> performance monitors")
> Signed-off-by: Joerg Roedel 
> ---
>  drivers/iommu/intel/perf.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/intel/perf.c b/drivers/iommu/intel/perf.c
> index 73b7ec705552..0e8e03252d92 100644
> --- a/drivers/iommu/intel/perf.c
> +++ b/drivers/iommu/intel/perf.c
> @@ -1,5 +1,5 @@
>  // SPDX-License-Identifier: GPL-2.0
> -/**
> +/*
>   * perf.c - performance monitor
>   *
>   * Copyright (C) 2021 Intel Corporation
> --
> 2.31.1
>
> --
> You received this message because you are subscribed to the Google Groups 
> "Clang Built Linux" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to clang-built-linux+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/clang-built-linux/20210617145339.2692-1-joro%408bytes.org.



-- 
Thanks,
~Nick Desaulniers
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu: Update "iommu.strict" documentation

2021-06-17 Thread Joerg Roedel
On Mon, Jun 14, 2021 at 03:57:26PM +0100, Robin Murphy wrote:
> Consolidating the flush queue logic also meant that the "iommu.strict"
> option started taking effect on x86 as well. Make sure we document that.
> 
> Fixes: a250c23f15c2 ("iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE")
> Signed-off-by: Robin Murphy 
> ---
>  Documentation/admin-guide/kernel-parameters.txt | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)

Applied, thanks.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [GIT PULL] iommu/arm-smmu: Updates for 5.14

2021-06-17 Thread Joerg Roedel
On Wed, Jun 16, 2021 at 11:58:13AM +0100, Will Deacon wrote:
> The following changes since commit c4681547bcce777daf576925a966ffa824edd09d:
> 
>   Linux 5.13-rc3 (2021-05-23 11:42:48 -1000)
> 
> are available in the Git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git 
> tags/arm-smmu-updates

Pulled, thanks Will.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] iommu/vt-d: Fix W=1 clang warning in intel/perf.c

2021-06-17 Thread Joerg Roedel
From: Joerg Roedel 

Fix this warning when compiled with clang and W=1:

drivers/iommu/intel/perf.c:16: warning: Function parameter or member 
'latency_lock' not described in 'DEFINE_SPINLOCK'
drivers/iommu/intel/perf.c:16: warning: expecting prototype for 
perf.c(). Prototype was for DEFINE_SPINLOCK() instead

Cc: Lu Baolu 
Reported-by: kernel test robot 
Fixes: 55ee5e67a59a ("iommu/vt-d: Add common code for dmar latency performance 
monitors")
Signed-off-by: Joerg Roedel 
---
 drivers/iommu/intel/perf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/perf.c b/drivers/iommu/intel/perf.c
index 73b7ec705552..0e8e03252d92 100644
--- a/drivers/iommu/intel/perf.c
+++ b/drivers/iommu/intel/perf.c
@@ -1,5 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0
-/**
+/*
  * perf.c - performance monitor
  *
  * Copyright (C) 2021 Intel Corporation
-- 
2.31.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] dt-bindings: Drop redundant minItems/maxItems

2021-06-17 Thread Jassi Brar
On Tue, Jun 15, 2021 at 2:15 PM Rob Herring  wrote:
>
> If a property has an 'items' list, then a 'minItems' or 'maxItems' with the
> same size as the list is redundant and can be dropped. Note that this is DT
> schema specific behavior and not standard json-schema behavior. The tooling
> will fixup the final schema adding any unspecified minItems/maxItems.
>
> This condition is partially checked with the meta-schema already, but
> only if both 'minItems' and 'maxItems' are equal to the 'items' length.
> An improved meta-schema is pending.
>
> Cc: Jens Axboe 
> Cc: Stephen Boyd 
> Cc: Herbert Xu 
> Cc: "David S. Miller" 
> Cc: David Airlie 
> Cc: Daniel Vetter 
> Cc: Vinod Koul 
> Cc: Bartosz Golaszewski 
> Cc: Kamal Dasu 
> Cc: Jonathan Cameron 
> Cc: Lars-Peter Clausen 
> Cc: Thomas Gleixner 
> Cc: Marc Zyngier 
> Cc: Joerg Roedel 
> Cc: Jassi Brar 
> Cc: Mauro Carvalho Chehab 
> Cc: Krzysztof Kozlowski 
> Cc: Ulf Hansson 
> Cc: Jakub Kicinski 
> Cc: Wolfgang Grandegger 
> Cc: Marc Kleine-Budde 
> Cc: Andrew Lunn 
> Cc: Vivien Didelot 
> Cc: Vladimir Oltean 
> Cc: Bjorn Helgaas 
> Cc: Kishon Vijay Abraham I 
> Cc: Linus Walleij 
> Cc: "Uwe Kleine-König" 
> Cc: Lee Jones 
> Cc: Ohad Ben-Cohen 
> Cc: Mathieu Poirier 
> Cc: Philipp Zabel 
> Cc: Paul Walmsley 
> Cc: Palmer Dabbelt 
> Cc: Albert Ou 
> Cc: Alessandro Zummo 
> Cc: Alexandre Belloni 
> Cc: Greg Kroah-Hartman 
> Cc: Mark Brown 
> Cc: Zhang Rui 
> Cc: Daniel Lezcano 
> Cc: Wim Van Sebroeck 
> Cc: Guenter Roeck 
> Signed-off-by: Rob Herring 
> ---
>  .../devicetree/bindings/ata/nvidia,tegra-ahci.yaml  | 1 -
>  .../devicetree/bindings/clock/allwinner,sun4i-a10-ccu.yaml  | 2 --
>  .../devicetree/bindings/clock/qcom,gcc-apq8064.yaml | 1 -
>  Documentation/devicetree/bindings/clock/qcom,gcc-sdx55.yaml | 2 --
>  .../devicetree/bindings/clock/qcom,gcc-sm8350.yaml  | 2 --
>  .../devicetree/bindings/clock/sprd,sc9863a-clk.yaml | 1 -
>  .../devicetree/bindings/crypto/allwinner,sun8i-ce.yaml  | 2 --
>  Documentation/devicetree/bindings/crypto/fsl-dcp.yaml   | 1 -
>  .../display/allwinner,sun4i-a10-display-backend.yaml| 6 --
>  .../bindings/display/allwinner,sun6i-a31-mipi-dsi.yaml  | 1 -
>  .../bindings/display/allwinner,sun8i-a83t-dw-hdmi.yaml  | 4 
>  .../bindings/display/allwinner,sun8i-a83t-hdmi-phy.yaml | 2 --
>  .../bindings/display/allwinner,sun8i-r40-tcon-top.yaml  | 2 --
>  .../devicetree/bindings/display/bridge/cdns,mhdp8546.yaml   | 2 --
>  .../bindings/display/rockchip/rockchip,dw-hdmi.yaml | 2 --
>  Documentation/devicetree/bindings/display/st,stm32-dsi.yaml | 2 --
>  .../devicetree/bindings/display/st,stm32-ltdc.yaml  | 1 -
>  .../devicetree/bindings/display/xlnx/xlnx,zynqmp-dpsub.yaml | 4 
>  .../devicetree/bindings/dma/renesas,rcar-dmac.yaml  | 1 -
>  .../devicetree/bindings/edac/amazon,al-mc-edac.yaml | 2 --
>  Documentation/devicetree/bindings/eeprom/at24.yaml  | 1 -
>  Documentation/devicetree/bindings/example-schema.yaml   | 2 --
>  Documentation/devicetree/bindings/gpu/brcm,bcm-v3d.yaml | 1 -
>  Documentation/devicetree/bindings/gpu/vivante,gc.yaml   | 1 -
>  Documentation/devicetree/bindings/i2c/brcm,brcmstb-i2c.yaml | 1 -
>  .../devicetree/bindings/i2c/marvell,mv64xxx-i2c.yaml| 2 --
>  .../devicetree/bindings/i2c/mellanox,i2c-mlxbf.yaml | 1 -
>  .../devicetree/bindings/iio/adc/amlogic,meson-saradc.yaml   | 1 -
>  .../devicetree/bindings/iio/adc/st,stm32-dfsdm-adc.yaml | 2 --
>  .../bindings/interrupt-controller/fsl,irqsteer.yaml | 1 -
>  .../bindings/interrupt-controller/loongson,liointc.yaml | 1 -
>  Documentation/devicetree/bindings/iommu/arm,smmu-v3.yaml| 1 -
>  .../devicetree/bindings/iommu/renesas,ipmmu-vmsa.yaml   | 1 -
>  .../devicetree/bindings/mailbox/st,stm32-ipcc.yaml  | 2 --
>  .../devicetree/bindings/media/amlogic,gx-vdec.yaml  | 1 -
>  Documentation/devicetree/bindings/media/i2c/adv7604.yaml| 1 -
>  .../devicetree/bindings/media/marvell,mmp2-ccic.yaml| 1 -
>  .../devicetree/bindings/media/qcom,sc7180-venus.yaml| 1 -
>  .../devicetree/bindings/media/qcom,sdm845-venus-v2.yaml | 1 -
>  .../devicetree/bindings/media/qcom,sm8250-venus.yaml| 1 -
>  Documentation/devicetree/bindings/media/renesas,drif.yaml   | 1 -
>  .../bindings/memory-controllers/mediatek,smi-common.yaml| 6 ++
>  .../bindings/memory-controllers/mediatek,smi-larb.yaml  | 1 -
>  .../devicetree/bindings/mmc/allwinner,sun4i-a10-mmc.yaml| 2 --
>  Documentation/devicetree/bindings/mmc/fsl-imx-esdhc.yaml| 1 -
>  Documentation/devicetree/bindings/mmc/mtk-sd.yaml   | 2 --
>  Documentation/devicetree/bindings/mmc/renesas,sdhi.yaml | 2 --
>  Documentation/devicetree/bindings/mmc/sdhci-am654.yaml  | 1 -
>  Documentation/devicetree/bindings/mmc/sdhci-pxa.yaml| 1 -
>  .../devicetree/bindings/net/amlogic,meson-dwmac.yaml

Re: [PATCH v4 3/6] ACPI: Add driver for the VIOT table

2021-06-17 Thread Rafael J. Wysocki
On Thu, Jun 10, 2021 at 10:03 AM Jean-Philippe Brucker
 wrote:
>
> The ACPI Virtual I/O Translation Table describes topology of
> para-virtual platforms, similarly to vendor tables DMAR, IVRS and IORT.
> For now it describes the relation between virtio-iommu and the endpoints
> it manages.
>
> Three steps are needed to configure DMA of endpoints:
>
> (1) acpi_viot_init(): parse the VIOT table, find or create the fwnode
> associated to each vIOMMU device.
>
> (2) When probing the vIOMMU device, the driver registers its IOMMU ops
> within the IOMMU subsystem. This step doesn't require any
> intervention from the VIOT driver.
>
> (3) viot_iommu_configure(): before binding the endpoint to a driver,
> find the associated IOMMU ops. Register them, along with the
> endpoint ID, into the device's iommu_fwspec.
>
> If step (3) happens before step (2), it is deferred until the IOMMU is
> initialized, then retried.
>
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/acpi/Kconfig  |   3 +
>  drivers/iommu/Kconfig |   1 +
>  drivers/acpi/Makefile |   2 +
>  include/linux/acpi_viot.h |  19 ++
>  drivers/acpi/bus.c|   2 +
>  drivers/acpi/scan.c   |   3 +
>  drivers/acpi/viot.c   | 364 ++
>  MAINTAINERS   |   8 +
>  8 files changed, 402 insertions(+)
>  create mode 100644 include/linux/acpi_viot.h
>  create mode 100644 drivers/acpi/viot.c
>
> diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
> index eedec61e3476..3758c6940ed7 100644
> --- a/drivers/acpi/Kconfig
> +++ b/drivers/acpi/Kconfig
> @@ -526,6 +526,9 @@ endif
>
>  source "drivers/acpi/pmic/Kconfig"
>
> +config ACPI_VIOT
> +   bool
> +
>  endif  # ACPI
>
>  config X86_PM_TIMER
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 1f111b399bca..aff8a4830dd1 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -403,6 +403,7 @@ config VIRTIO_IOMMU
> depends on ARM64
> select IOMMU_API
> select INTERVAL_TREE
> +   select ACPI_VIOT if ACPI
> help
>   Para-virtualised IOMMU driver with virtio.
>
> diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
> index 700b41adf2db..a6e644c48987 100644
> --- a/drivers/acpi/Makefile
> +++ b/drivers/acpi/Makefile
> @@ -118,3 +118,5 @@ video-objs  += acpi_video.o video_detect.o
>  obj-y  += dptf/
>
>  obj-$(CONFIG_ARM64)+= arm64/
> +
> +obj-$(CONFIG_ACPI_VIOT)+= viot.o
> diff --git a/include/linux/acpi_viot.h b/include/linux/acpi_viot.h
> new file mode 100644
> index ..1eb8ee5b0e5f
> --- /dev/null
> +++ b/include/linux/acpi_viot.h
> @@ -0,0 +1,19 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +
> +#ifndef __ACPI_VIOT_H__
> +#define __ACPI_VIOT_H__
> +
> +#include 
> +
> +#ifdef CONFIG_ACPI_VIOT
> +void __init acpi_viot_init(void);
> +int viot_iommu_configure(struct device *dev);
> +#else
> +static inline void acpi_viot_init(void) {}
> +static inline int viot_iommu_configure(struct device *dev)
> +{
> +   return -ENODEV;
> +}
> +#endif
> +
> +#endif /* __ACPI_VIOT_H__ */
> diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
> index be7da23fad76..b835ca702ff0 100644
> --- a/drivers/acpi/bus.c
> +++ b/drivers/acpi/bus.c
> @@ -27,6 +27,7 @@
>  #include 
>  #endif
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -1339,6 +1340,7 @@ static int __init acpi_init(void)
> pci_mmcfg_late_init();
> acpi_iort_init();
> acpi_scan_init();
> +   acpi_viot_init();

Is there a specific reason why to call it right here?

In particular, does it need to be called after acpi_scan_init()?  And
does it need to be called before the subsequent functions?  If so,
then why?

> acpi_ec_init();
> acpi_debugfs_init();
> acpi_sleep_proc_init();
> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> index 0c53c8533300..4fa684fdfda8 100644
> --- a/drivers/acpi/scan.c
> +++ b/drivers/acpi/scan.c
> @@ -9,6 +9,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -1556,6 +1557,8 @@ static const struct iommu_ops 
> *acpi_iommu_configure_id(struct device *dev,
> return ops;
>
> err = iort_iommu_configure_id(dev, id_in);
> +   if (err && err != -EPROBE_DEFER)
> +   err = viot_iommu_configure(dev);
>
> /*
>  * If we have reason to believe the IOMMU driver missed the initial
> diff --git a/drivers/acpi/viot.c b/drivers/acpi/viot.c
> new file mode 100644
> index ..892cd9fa7b6d
> --- /dev/null
> +++ b/drivers/acpi/viot.c
> @@ -0,0 +1,364 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Virtual I/O topology
> + *
> + * The Virtual I/O Translation Table (VIOT) describes the topology of
> + * para-virtual IOMMUs and the endpoints they manage. The OS uses it to
> + * initialize devices in the right order, 

Re: swiotlb/caamjr regression (Was: [GIT PULL] (swiotlb) stable/for-linus-5.12)

2021-06-17 Thread Christoph Hellwig
On Wed, Jun 16, 2021 at 08:27:39PM -0400, Konrad Rzeszutek Wilk wrote:
> How unique is this NVMe? Should I be able to reproduce this with any
> type or is it specific to Google Cloud?

With swiotlb=force this should be reproducible everywhere.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] dt-bindings: Drop redundant minItems/maxItems

2021-06-17 Thread Ulf Hansson
On Tue, 15 Jun 2021 at 21:15, Rob Herring  wrote:
>
> If a property has an 'items' list, then a 'minItems' or 'maxItems' with the
> same size as the list is redundant and can be dropped. Note that this is DT
> schema specific behavior and not standard json-schema behavior. The tooling
> will fixup the final schema adding any unspecified minItems/maxItems.
>
> This condition is partially checked with the meta-schema already, but
> only if both 'minItems' and 'maxItems' are equal to the 'items' length.
> An improved meta-schema is pending.
>
> Cc: Jens Axboe 
> Cc: Stephen Boyd 
> Cc: Herbert Xu 
> Cc: "David S. Miller" 
> Cc: David Airlie 
> Cc: Daniel Vetter 
> Cc: Vinod Koul 
> Cc: Bartosz Golaszewski 
> Cc: Kamal Dasu 
> Cc: Jonathan Cameron 
> Cc: Lars-Peter Clausen 
> Cc: Thomas Gleixner 
> Cc: Marc Zyngier 
> Cc: Joerg Roedel 
> Cc: Jassi Brar 
> Cc: Mauro Carvalho Chehab 
> Cc: Krzysztof Kozlowski 
> Cc: Ulf Hansson 
> Cc: Jakub Kicinski 
> Cc: Wolfgang Grandegger 
> Cc: Marc Kleine-Budde 
> Cc: Andrew Lunn 
> Cc: Vivien Didelot 
> Cc: Vladimir Oltean 
> Cc: Bjorn Helgaas 
> Cc: Kishon Vijay Abraham I 
> Cc: Linus Walleij 
> Cc: "Uwe Kleine-König" 
> Cc: Lee Jones 
> Cc: Ohad Ben-Cohen 
> Cc: Mathieu Poirier 
> Cc: Philipp Zabel 
> Cc: Paul Walmsley 
> Cc: Palmer Dabbelt 
> Cc: Albert Ou 
> Cc: Alessandro Zummo 
> Cc: Alexandre Belloni 
> Cc: Greg Kroah-Hartman 
> Cc: Mark Brown 
> Cc: Zhang Rui 
> Cc: Daniel Lezcano 
> Cc: Wim Van Sebroeck 
> Cc: Guenter Roeck 
> Signed-off-by: Rob Herring 

Acked-by: Ulf Hansson  # for MMC

[...]

Kind regards
Uffe
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v8 03/10] eventfd: Increase the recursion depth of eventfd_signal()

2021-06-17 Thread He Zhe



On 6/15/21 10:13 PM, Xie Yongji wrote:
> Increase the recursion depth of eventfd_signal() to 1. This
> is the maximum recursion depth we have found so far, which
> can be triggered with the following call chain:
>
> kvm_io_bus_write[kvm]
>   --> ioeventfd_write   [kvm]
> --> eventfd_signal  [eventfd]
>   --> vhost_poll_wakeup [vhost]
> --> vduse_vdpa_kick_vq  [vduse]
>   --> eventfd_signal[eventfd]
>
> Signed-off-by: Xie Yongji 
> Acked-by: Jason Wang 

The fix had been posted one year ago.

https://lore.kernel.org/lkml/20200410114720.24838-1-zhe...@windriver.com/


> ---
>  fs/eventfd.c| 2 +-
>  include/linux/eventfd.h | 5 -
>  2 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/fs/eventfd.c b/fs/eventfd.c
> index e265b6dd4f34..cc7cd1dbedd3 100644
> --- a/fs/eventfd.c
> +++ b/fs/eventfd.c
> @@ -71,7 +71,7 @@ __u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n)
>* it returns true, the eventfd_signal() call should be deferred to a
>* safe context.
>*/
> - if (WARN_ON_ONCE(this_cpu_read(eventfd_wake_count)))
> + if (WARN_ON_ONCE(this_cpu_read(eventfd_wake_count) > EFD_WAKE_DEPTH))
>   return 0;
>  
>   spin_lock_irqsave(&ctx->wqh.lock, flags);
> diff --git a/include/linux/eventfd.h b/include/linux/eventfd.h
> index fa0a524baed0..886d99cd38ef 100644
> --- a/include/linux/eventfd.h
> +++ b/include/linux/eventfd.h
> @@ -29,6 +29,9 @@
>  #define EFD_SHARED_FCNTL_FLAGS (O_CLOEXEC | O_NONBLOCK)
>  #define EFD_FLAGS_SET (EFD_SHARED_FCNTL_FLAGS | EFD_SEMAPHORE)
>  
> +/* Maximum recursion depth */
> +#define EFD_WAKE_DEPTH 1
> +
>  struct eventfd_ctx;
>  struct file;
>  
> @@ -47,7 +50,7 @@ DECLARE_PER_CPU(int, eventfd_wake_count);
>  
>  static inline bool eventfd_signal_count(void)
>  {
> - return this_cpu_read(eventfd_wake_count);
> + return this_cpu_read(eventfd_wake_count) > EFD_WAKE_DEPTH;

The count is just a count. The decision about how deep is acceptable should
be made where eventfd_signal_count() is called.
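
To make that concrete, a rough sketch of the idea (illustration only, not a
patch; the deferral work item is hypothetical):

static inline int eventfd_signal_count(void)
{
	/* keep returning the raw per-CPU count, no policy here */
	return this_cpu_read(eventfd_wake_count);
}

/* a caller that knows it legitimately nests one level deep */
if (eventfd_signal_count() > 1) {
	/* too deep for this caller: defer instead of signalling inline */
	schedule_work(&my_deferred_signal_work);	/* hypothetical work item */
} else {
	eventfd_signal(ctx, 1);
}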


Zhe

>  }
>  
>  #else /* CONFIG_EVENTFD */

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v13 4/6] iommu/vt-d: Add support for IOMMU default DMA mode build options

2021-06-17 Thread John Garry

On 17/06/2021 08:32, Lu Baolu wrote:

On 6/16/21 7:03 PM, John Garry wrote:

@@ -4382,9 +4380,9 @@ int __init intel_iommu_init(void)
   * is likely to be much lower than the overhead of 
synchronizing

   * the virtual and physical IOMMU page-tables.
   */
-    if (!intel_iommu_strict && cap_caching_mode(iommu->cap)) {
-    pr_warn("IOMMU batching is disabled due to virtualization");
-    intel_iommu_strict = 1;
+    if (cap_caching_mode(iommu->cap)) {
+    pr_warn("IOMMU batching disallowed due to 
virtualization\n");

+    iommu_set_dma_strict(true);


With this change, VM guest will always show this warning.


Would they normally have gotten this warning before as well?

I mean, the default is intel_iommu_strict=0, so if
cap_caching_mode(iommu->cap) is true and intel_iommu_strict was not set to 1
elsewhere previously, then we would get this print.



How about
removing this message? Users could get the same information through the
kernel message added by "[PATCH v13 2/6] iommu: Print strict or lazy
mode at init time".


I think that the print from 2/6 should occur before this print.

Regardless I would think that you would still like to be notified of 
this change in policy, right?


However, I now realize that the print is inside a per-IOMMU loop, so we
would get it once per IOMMU:


for_each_active_iommu(iommu, drhd) {
/*
 * The flush queue implementation does not perform
 * page-selective invalidations that are required for efficient
 * TLB flushes in virtual environments.  The benefit of batching
 * is likely to be much lower than the overhead of synchronizing
 * the virtual and physical IOMMU page-tables.
 */
if (!intel_iommu_strict && cap_caching_mode(iommu->cap)) {
pr_warn("IOMMU batching is disabled due to virtualization");
intel_iommu_strict = 1;
}
...
}

I need to change that. How about this:

bool print_warning = false;

for_each_active_iommu(iommu, drhd) {
/*
 * The flush queue implementation does not perform
 * page-selective invalidations that are required for efficient
 * TLB flushes in virtual environments.  The benefit of batching
 * is likely to be much lower than the overhead of synchronizing
 * the virtual and physical IOMMU page-tables.
 */
if (!print_warning && cap_caching_mode(iommu->cap)) {
pr_warn("IOMMU batching disallowed due to virtualization\n");
iommu_set_dma_strict(true);
print_warning = true;
}
...
}

or use pr_warn_once().
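
For reference, the pr_warn_once() variant would be a smaller change, roughly
(sketch only):

for_each_active_iommu(iommu, drhd) {
	/*
	 * Batching (lazy invalidation) is not worthwhile when the IOMMU is
	 * emulated, so force strict mode; warn only once across all IOMMUs.
	 */
	if (cap_caching_mode(iommu->cap)) {
		pr_warn_once("IOMMU batching disallowed due to virtualization\n");
		iommu_set_dma_strict(true);
	}
	...
}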

Thanks,
John
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v13 6/6] iommu: Remove mode argument from iommu_set_dma_strict()

2021-06-17 Thread John Garry



@@ -349,10 +349,9 @@ static int __init iommu_dma_setup(char *str)
  }
  early_param("iommu.strict", iommu_dma_setup);
-void iommu_set_dma_strict(bool strict)
+void iommu_set_dma_strict(void)
  {
-    if (strict || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT))
-    iommu_dma_strict = strict;
+    iommu_dma_strict = true;


Sorry, I still can't get how iommu.strict kernel option works.

static int __init iommu_dma_setup(char *str)
{
     int ret = kstrtobool(str, &iommu_dma_strict);

     if (!ret)
     iommu_cmd_line |= IOMMU_CMD_LINE_STRICT;
     return ret;
}
early_param("iommu.strict", iommu_dma_setup);

The bit IOMMU_CMD_LINE_STRICT is only set, but not used anywhere.


It is used in patch 2/6:

+   pr_info("DMA domain TLB invalidation policy: %s mode %s\n",
+   iommu_dma_strict ? "strict" : "lazy",
+   (iommu_cmd_line & IOMMU_CMD_LINE_STRICT) ?
+   "(set via kernel command line)" : "");


Hence,
I am wondering how it could work. Is this a bug, or did I miss something?


It is really just used for informational purposes now.
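
Putting the two fragments side by side may help (condensed from the patches
quoted above, for illustration):

static int __init iommu_dma_setup(char *str)
{
	int ret = kstrtobool(str, &iommu_dma_strict);	/* policy decided here */

	if (!ret)
		iommu_cmd_line |= IOMMU_CMD_LINE_STRICT;	/* only records how it was set */
	return ret;
}

/* later, at init time (patch 2/6), the flag only annotates the log line */
pr_info("DMA domain TLB invalidation policy: %s mode %s\n",
	iommu_dma_strict ? "strict" : "lazy",
	(iommu_cmd_line & IOMMU_CMD_LINE_STRICT) ?
	"(set via kernel command line)" : "");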

Thanks,
john
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v13 6/6] iommu: Remove mode argument from iommu_set_dma_strict()

2021-06-17 Thread Lu Baolu

On 6/16/21 7:03 PM, John Garry wrote:

We only ever now set strict mode enabled in iommu_set_dma_strict(), so
just remove the argument.

Signed-off-by: John Garry 
Reviewed-by: Robin Murphy 
---
  drivers/iommu/amd/init.c| 2 +-
  drivers/iommu/intel/iommu.c | 6 +++---
  drivers/iommu/iommu.c   | 5 ++---
  include/linux/iommu.h   | 2 +-
  4 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index fb3618af643b..7bc460052678 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -3099,7 +3099,7 @@ static int __init parse_amd_iommu_options(char *str)
for (; *str; ++str) {
if (strncmp(str, "fullflush", 9) == 0) {
pr_warn("amd_iommu=fullflush deprecated; use iommu.strict 
instead\n");
-   iommu_set_dma_strict(true);
+   iommu_set_dma_strict();
}
if (strncmp(str, "force_enable", 12) == 0)
amd_iommu_force_enable = true;
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index d586990fa751..0618c35cfb51 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -454,7 +454,7 @@ static int __init intel_iommu_setup(char *str)
iommu_dma_forcedac = true;
} else if (!strncmp(str, "strict", 6)) {
pr_warn("intel_iommu=strict deprecated; use iommu.strict 
instead\n");
-   iommu_set_dma_strict(true);
+   iommu_set_dma_strict();
} else if (!strncmp(str, "sp_off", 6)) {
pr_info("Disable supported super page\n");
intel_iommu_superpage = 0;
@@ -4382,7 +4382,7 @@ int __init intel_iommu_init(void)
 */
if (cap_caching_mode(iommu->cap)) {
pr_warn("IOMMU batching disallowed due to 
virtualization\n");
-   iommu_set_dma_strict(true);
+   iommu_set_dma_strict();
}
 		iommu_device_sysfs_add(&iommu->iommu, NULL,
   intel_iommu_groups,
@@ -5699,7 +5699,7 @@ static void quirk_calpella_no_shadow_gtt(struct pci_dev 
*dev)
} else if (dmar_map_gfx) {
/* we have to ensure the gfx device is idle before we flush */
pci_info(dev, "Disabling batched IOTLB flush on Ironlake\n");
-   iommu_set_dma_strict(true);
+   iommu_set_dma_strict();
}
  }
  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x0040, 
quirk_calpella_no_shadow_gtt);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 60b1ec42e73b..ff221d3ddcbc 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -349,10 +349,9 @@ static int __init iommu_dma_setup(char *str)
  }
  early_param("iommu.strict", iommu_dma_setup);
  
-void iommu_set_dma_strict(bool strict)

+void iommu_set_dma_strict(void)
  {
-   if (strict || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT))
-   iommu_dma_strict = strict;
+   iommu_dma_strict = true;


Sorry, I still can't get how iommu.strict kernel option works.

static int __init iommu_dma_setup(char *str)
{
	int ret = kstrtobool(str, &iommu_dma_strict);

if (!ret)
iommu_cmd_line |= IOMMU_CMD_LINE_STRICT;
return ret;
}
early_param("iommu.strict", iommu_dma_setup);

The bit IOMMU_CMD_LINE_STRICT is only set, but not used anywhere. Hence,
I am wondering how it could work. Is this a bug, or did I miss something?

Best regards,
baolu


  }
  
  bool iommu_get_dma_strict(struct iommu_domain *domain)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 32d448050bf7..754f67d6dd90 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -476,7 +476,7 @@ int iommu_enable_nesting(struct iommu_domain *domain);
  int iommu_set_pgtable_quirks(struct iommu_domain *domain,
unsigned long quirks);
  
-void iommu_set_dma_strict(bool val);

+void iommu_set_dma_strict(void);
  bool iommu_get_dma_strict(struct iommu_domain *domain);
  
  extern int report_iommu_fault(struct iommu_domain *domain, struct device *dev,



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v13 4/6] iommu/vt-d: Add support for IOMMU default DMA mode build options

2021-06-17 Thread Lu Baolu

On 6/16/21 7:03 PM, John Garry wrote:

@@ -4382,9 +4380,9 @@ int __init intel_iommu_init(void)
 * is likely to be much lower than the overhead of synchronizing
 * the virtual and physical IOMMU page-tables.
 */
-   if (!intel_iommu_strict && cap_caching_mode(iommu->cap)) {
-   pr_warn("IOMMU batching is disabled due to 
virtualization");
-   intel_iommu_strict = 1;
+   if (cap_caching_mode(iommu->cap)) {
+   pr_warn("IOMMU batching disallowed due to 
virtualization\n");
+   iommu_set_dma_strict(true);


With this change, a VM guest will always show this warning. How about
removing this message? Users could get the same information through the
kernel message added by "[PATCH v13 2/6] iommu: Print strict or lazy
mode at init time".

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: Plan for /dev/ioasid RFC v2

2021-06-17 Thread Tian, Kevin
> From: Alex Williamson 
> Sent: Thursday, June 17, 2021 3:40 AM
> 
> On Wed, 16 Jun 2021 06:43:23 +
> "Tian, Kevin"  wrote:
> 
> > > From: Alex Williamson 
> > > Sent: Wednesday, June 16, 2021 12:12 AM
> > >
> > > On Tue, 15 Jun 2021 02:31:39 +
> > > "Tian, Kevin"  wrote:
> > >
> > > > > From: Alex Williamson 
> > > > > Sent: Tuesday, June 15, 2021 12:28 AM
> > > > >
> > > > [...]
> > > > > > IOASID. Today the group fd requires an IOASID before it hands out a
> > > > > > device_fd. With iommu_fd the device_fd will not allow IOCTLs until
> it
> > > > > > has a blocked DMA IOASID and is successefully joined to an
> iommu_fd.
> > > > >
> > > > > Which is the root of my concern.  Who owns ioctls to the device fd?
> > > > > It's my understanding this is a vfio provided file descriptor and it's
> > > > > therefore vfio's responsibility.  A device-level IOASID interface
> > > > > therefore requires that vfio manage the group aspect of device access.
> > > > > AFAICT, that means that device access can therefore only begin when
> all
> > > > > devices for a given group are attached to the IOASID and must halt for
> > > > > all devices in the group if any device is ever detached from an 
> > > > > IOASID,
> > > > > even temporarily.  That suggests a lot more oversight of the IOASIDs
> by
> > > > > vfio than I'd prefer.
> > > > >
> > > >
> > > > This is possibly the point that is worthy of more clarification and
> > > > alignment, as it sounds like the root of controversy here.
> > > >
> > > > I feel the goal of vfio group management is more about ownership, i.e.
> > > > all devices within a group must be assigned to a single user. Following
> > > > the three rules defined by Jason, what we really care is whether a group
> > > > of devices can be isolated from the rest of the world, i.e. no access to
> > > > memory/device outside of its security context and no access to its
> > > > security context from devices outside of this group. This can be
> achieved
> > > > as long as every device in the group is either in block-DMA state when
> > > > it's not attached to any security context or attached to an IOASID
> context
> > > > in IOMMU fd.
> > > >
> > > > As long as group-level isolation is satisfied, how devices within a 
> > > > group
> > > > are further managed is decided by the user (unattached, all attached to
> > > > same IOASID, attached to different IOASIDs) as long as the user
> > > > understands the implication of lacking of isolation within the group.
> This
> > > > is what a device-centric model comes to play. Misconfiguration just
> hurts
> > > > the user itself.
> > > >
> > > > If this rationale can be agreed, then I didn't see the point of having 
> > > > VFIO
> > > > to mandate all devices in the group must be attached/detached in
> > > > lockstep.
> > >
> > > In theory this sounds great, but there are still too many assumptions
> > > and too much hand waving about where isolation occurs for me to feel
> > > like I really have the complete picture.  So let's walk through some
> > > examples.  Please fill in and correct where I'm wrong.
> >
> > Thanks for putting these examples. They are helpful for clearing the
> > whole picture.
> >
> > Before filling in let's first align on what is the key difference between
> > current VFIO model and this new proposal. With this comparison we'll
> > know which of following questions are answered with existing VFIO
> > mechanism and which are handled differently.
> >
> > With Yi's help we figured out the current mechanism:
> >
> > 1) vfio_group_viable. The code comment explains the intention clearly:
> >
> > --
> > * A vfio group is viable for use by userspace if all devices are in
> >  * one of the following states:
> >  *  - driver-less
> >  *  - bound to a vfio driver
> >  *  - bound to an otherwise allowed driver
> >  *  - a PCI interconnect device
> > --
> >
> > Note this check is not related to an IOMMU security context.
> 
> Because this is a pre-requisite for imposing that IOMMU security
> context.
> 
> > 2) vfio_iommu_group_notifier. When an IOMMU_GROUP_NOTIFY_
> > BOUND_DRIVER event is notified, vfio_group_viable is re-evaluated.
> > If the affected group was previously viable but now becomes not
> > viable, BUG_ON() as it implies that this device is bound to a non-vfio
> > driver which breaks the group isolation.
> 
> This notifier action is conditional on there being users of devices
> within a secure group IOMMU context.
> 
> > 3) vfio_group_get_device_fd. User can acquire a device fd only after
> > a) the group is viable;
> > b) the group is attached to a container;
> > c) iommu is set on the container (implying a security context
> > established);
> 
> The order is actually b) a) c) but arguably b) is a no-op until:
> 
> d) a device fd is provided to the user
> 
> > The new device-centric proposal suggests:
> >
> > 1) vfio_group_viable;
> > 2) vfio_iommu_group_notifier;
> > 3) block-DMA if a device is detached from 

Re: Plan for /dev/ioasid RFC v2

2021-06-17 Thread David Gibson
On Wed, Jun 09, 2021 at 09:39:19AM -0300, Jason Gunthorpe wrote:
> On Wed, Jun 09, 2021 at 02:24:03PM +0200, Joerg Roedel wrote:
> > On Mon, Jun 07, 2021 at 02:58:18AM +, Tian, Kevin wrote:
> > > -   Device-centric (Jason) vs. group-centric (David) uAPI. David is not 
> > > fully
> > > convinced yet. Based on discussion v2 will continue to have ioasid 
> > > uAPI
> > > being device-centric (but it's fine for vfio to be group-centric). A 
> > > new
> > > section will be added to elaborate this part;
> > 
> > I would vote for group-centric here. Or do the reasons for which VFIO is
> > group-centric not apply to IOASID? If so, why?
> 
> VFIO being group centric has made it very ugly/difficult to inject
> device driver specific knowledge into the scheme.
> 
> The device driver is the only thing that knows to ask:
>  - I need a SW table for this ioasid because I am like a mdev
>  - I will issue TLPs with PASID
>  - I need a IOASID linked to a PASID
>  - I am a devices that uses ENQCMD and vPASID
>  - etc in future

mdev drivers might know these, but shim drivers, like basic vfio-pci,
often won't.  In that case only the userspace driver will know that
for certain.  The shim driver at best has a fairly loose bound on what
the userspace driver *could* do.

I still think you're having a tendency to partially conflate several
meanings of "group":
1. the unavoidable hardware unit of non-isolation
2. the kernel internal concept and interface to it
3. the user visible fd and interface

We can't avoid having (1) somewhere; (3), and to a lesser extent (2),
are what you object to.

> The current approach has the group try to guess the device driver
> intention in the vfio type 1 code.

I agree this has gotten ugly.  What I'm not yet convinced of is that
reworking groups to make this not-ugly necessarily requires totally
minimizing the importance of groups.

> I want to see this be clean and have the device driver directly tell
> the iommu layer what kind of DMA it plans to do, and thus how it needs
> the IOMMU and IOASID configured.

> 
> This is the source of the ugly symbol_get and the very, very hacky 'if
> you are a mdev *and* a iommu then you must want a single PASID' stuff
> in type1.
> 
> The group is causing all this mess because the group knows nothing
> about what the device drivers contained in the group actually want.
> 
> Further being group centric eliminates the possibility of working in
> cases like !ACS. How do I use PASID functionality of a device behind a
> !ACS switch if the uAPI forces all IOASID's to be linked to a group,
> not a device?
> 
> Device centric with an report that "all devices in the group must use
> the same IOASID" covers all the new functionality, keep the old, and
> has a better chance to keep going as a uAPI into the future.
> 
> Jason
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: Plan for /dev/ioasid RFC v2

2021-06-17 Thread David Gibson
On Fri, Jun 11, 2021 at 01:45:29PM -0300, Jason Gunthorpe wrote:
> On Thu, Jun 10, 2021 at 09:38:42AM -0600, Alex Williamson wrote:
> 
> > Opening the group is not the extent of the security check currently
> > required, the group must be added to a container and an IOMMU model
> > configured for the container *before* the user can get a devicefd.
> > Each devicefd creates a reference to this security context, therefore
> > access to a device does not exist without such a context.
> 
> Okay, I missed that detail in the organization..
> 
> So, if we have an independent vfio device fd then it needs to be
> kept disable until the user joins it to an ioasid that provides the
> security proof to allow it to work?
> 
> > What happens on detach?  As we've discussed elsewhere in this thread,
> > revoking access is more difficult than holding a reference to the
> > secure context, but I'm under the impression that moving a device
> > between IOASIDs could be standard practice in this new model.  A device
> > that's detached from a secure context, even temporarily, is a
> > problem.
> 
> This is why I think the single iommu FD is critical, it is the FD, not
> the IOASID that has to authorize the security. You shouldn't move
> devices between FDs, but you can move them between IOASIDs inside the
> same FD.
> 
> > How to label a device seems like a relatively mundane issue relative to
> > ownership and isolated contexts of groups and devices.  The label is
> > essentially just creating an identifier to device mapping, where the
> > identifier (label) will be used in the IOASID interface, right? 
> 
> It looks that way
> 
> > As I note above, that makes it difficult for vfio to maintain that a
> > user only accesses a device in a secure context.  This is exactly
> > why vfio has the model of getting a devicefd from a groupfd only
> > when that group is in a secure context and maintaining references to
> > that secure context for each device.  Split ownership of the secure
> > context in IOASID vs device access in vfio and exposing devicefds
> > outside the group is still a big question mark for me.  Thanks,
> 
> I think the protection model becomes different once we allow
> individual devices inside a group to be attached to different
> IOASID's.

I'm really wary of this.  They might be rare, but we still need to
consider the case of devices which can't be distinguished on the bus,
and therefore can't be attached to different IOASIDs.  That means that
if we allow attaching devices within a group to different IOASIDs we
effectively need to introduce two levels of "group-like" things.
First the identification group, then the isolation group.


You're using "group" for the isolation group, but then we have to
somehow expose this concept of identification group.  That seems like
a heap of complexity and confusion in the interface.
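
Just to illustrate the shape of that complexity (hypothetical structures,
not a proposal):

/*
 * Hypothetical sketch: with non-isolated but individually addressable
 * devices, the kernel would end up tracking two group-like objects.
 */
struct identification_group {		/* devices the bus can tell apart (e.g. per RID) */
	struct list_head devices;
	struct isolation_group *isolation;	/* the unit that is actually isolated */
};

struct isolation_group {		/* smallest unit the platform can isolate */
	struct list_head ident_groups;	/* one or more identification groups */
};

Every uAPI object and every attach operation would then have to say which of
the two it is talking about, which is the confusion I mean.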

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: Plan for /dev/ioasid RFC v2

2021-06-17 Thread David Gibson
On Thu, Jun 10, 2021 at 01:50:22PM +0800, Lu Baolu wrote:
> On 6/9/21 8:39 PM, Jason Gunthorpe wrote:
> > On Wed, Jun 09, 2021 at 02:24:03PM +0200, Joerg Roedel wrote:
> > > On Mon, Jun 07, 2021 at 02:58:18AM +, Tian, Kevin wrote:
> > > > -   Device-centric (Jason) vs. group-centric (David) uAPI. David is not 
> > > > fully
> > > >  convinced yet. Based on discussion v2 will continue to have ioasid 
> > > > uAPI
> > > >  being device-centric (but it's fine for vfio to be group-centric). 
> > > > A new
> > > >  section will be added to elaborate this part;
> > > I would vote for group-centric here. Or do the reasons for which VFIO is
> > > group-centric not apply to IOASID? If so, why?
> > VFIO being group centric has made it very ugly/difficult to inject
> > device driver specific knowledge into the scheme.
> > 
> > The device driver is the only thing that knows to ask:
> >   - I need a SW table for this ioasid because I am like a mdev
> >   - I will issue TLPs with PASID
> >   - I need a IOASID linked to a PASID
> >   - I am a devices that uses ENQCMD and vPASID
> >   - etc in future
> > 
> > The current approach has the group try to guess the device driver
> > intention in the vfio type 1 code.
> > 
> > I want to see this be clean and have the device driver directly tell
> > the iommu layer what kind of DMA it plans to do, and thus how it needs
> > the IOMMU and IOASID configured.
> > 
> > This is the source of the ugly symbol_get and the very, very hacky 'if
> > you are a mdev*and*  a iommu then you must want a single PASID' stuff
> > in type1.
> > 
> > The group is causing all this mess because the group knows nothing
> > about what the device drivers contained in the group actually want.
> > 
> > Further being group centric eliminates the possibility of working in
> > cases like !ACS. How do I use PASID functionality of a device behind a
> > !ACS switch if the uAPI forces all IOASID's to be linked to a group,
> > not a device?
> > 
> > Device centric with an report that "all devices in the group must use
> > the same IOASID" covers all the new functionality, keep the old, and
> > has a better chance to keep going as a uAPI into the future.
> 
> The iommu_group can guarantee the isolation among different physical
> devices (represented by RIDs). But when it comes to sub-devices (ex. mdev or
> vDPA devices represented by RID + SSID), we have to rely on the
> device driver for isolation. The devices which are able to generate sub-
> devices should either use their own on-device mechanisms or use the
> platform features like Intel Scalable IOV to isolate the sub-devices.

This seems like a misunderstanding of groups.  Groups are not tied to
any PCI meaning.  Groups are the smallest unit of isolation, no matter
what is providing that isolation.

If mdevs are isolated from each other by clever software, even though
they're on the same PCI device they are in different groups from each
other *by definition*.  They are also in a different group from their
parent device (however the mdevs only exist when mdev driver is
active, which implies that the parent device's group is owned by the
kernel).

> Under above conditions, different sub-device from a same RID device
> could be able to use different IOASID. This seems to means that we can't
> support mixed mode where, for example, two RIDs share an iommu_group and
> one (or both) of them have sub-devices.

That doesn't necessarily follow.  mdevs which can be successfully
isolated by their mdev driver are in a different group from their
parent device, and therefore need not be affected by whether the
parent device shares a group with some other physical device.  They
*might* be, but that's up to the mdev driver to determine based on
what it can safely isolate.

> AIUI, when we attach a "RID + SSID" to an IOASID, we should require that
> the RID doesn't share the iommu_group with any other RID.
> 
> Best regards,
> baolu
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [RFC] /dev/ioasid uAPI proposal

2021-06-17 Thread David Gibson
On Tue, Jun 08, 2021 at 10:17:56AM -0300, Jason Gunthorpe wrote:
> On Tue, Jun 08, 2021 at 12:37:04PM +1000, David Gibson wrote:
> 
> > > The PPC/SPAPR support allows KVM to associate a vfio group to an IOMMU
> > > page table so that it can handle iotlb programming from pre-registered
> > > memory without trapping out to userspace.
> > 
> > To clarify that's a guest side logical vIOMMU page table which is
> > partially managed by KVM.  This is an optimization - things can work
> > without it, but it means guest iomap/unmap becomes a hot path because
> > each map/unmap hypercall has to go
> > guest -> KVM -> qemu -> VFIO
> > 
> > So there are multiple context transitions.
> 
> Isn't this overhead true of many of the vIOMMUs?

Yes, but historically it bit much harder on POWER for a couple of reasons:

1) POWER guests *always* have a vIOMMU - the platform has no concept
   of passthrough mode.  We therefore had a vIOMMU implementation some
   time before the AMD or Intel IOMMUs were implemented as vIOMMUs in
   qemu.

2) At the time we were implementing this the supported IOVA window for
   the paravirtualized IOMMU was pretty small (1G, I think) making
   vIOMMU maps and unmaps a pretty common operation.

> Can the fast path be
> generalized?

Not really.  This is a paravirtualized guest IOMMU, so it's a platform
specific group of hypercalls that's being interpreted by KVM and
passed through to the IOMMU side using essentially the same backend
that the userspace implementation would eventually get to after a
bunch more context switches.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: Plan for /dev/ioasid RFC v2

2021-06-17 Thread David Gibson
On Wed, Jun 09, 2021 at 10:15:32AM -0600, Alex Williamson wrote:
> On Wed, 9 Jun 2021 17:51:26 +0200
> Joerg Roedel  wrote:
> 
> > On Wed, Jun 09, 2021 at 12:00:09PM -0300, Jason Gunthorpe wrote:
> > > Only *drivers* know what the actual device is going to do, devices do
> > > not. Since the group doesn't have drivers it is the wrong layer to be
> > > making choices about how to configure the IOMMU.  
> > 
> > Groups don't carry how to configure IOMMUs, that information is
> > mostly in the IOMMU domains. And those (or an abstraction of them) is
> > configured through /dev/ioasid. So not sure what you wanted to say with
> > the above.
> > 
> > All a group carries is information about which devices are not
> > sufficiently isolated from each other and thus need to always be in the
> > same domain.
> > 
> > > The device centric approach is my attempt at this, and it is pretty
> > > clean, I think.  
> > 
> > Clean, but still insecure.
> > 
> > > All ACS does is prevent P2P operations, if you assign all the group
> > > devices into the same /dev/iommu then you may not care about that
> > > security isolation property. At the very least it is policy for user
> > > to decide, not kernel.  
> > 
> > It is a kernel decision, because a fundamental task of the kernel is to
> > ensure isolation between user-space tasks as good as it can. And if a
> > device assigned to one task can interfer with a device of another task
> > (e.g. by sending P2P messages), then the promise of isolation is broken.
> 
> AIUI, the IOASID model will still enforce IOMMU groups, but it's not an
> explicit part of the interface like it is for vfio.  For example the
> IOASID model allows attaching individual devices such that we have
> granularity to create per device IOASIDs, but all devices within an
> IOMMU group are required to be attached to an IOASID before they can be
> used.  It's not entirely clear to me yet how that last bit gets
> implemented though, ie. what barrier is in place to prevent device
> usage prior to reaching this viable state.
>
> > > Groups should be primarily about isolation security, not about IOASID
> > > matching.  
> > 
> > That doesn't make any sense, what do you mean by 'IOASID matching'?
> 
> One of the problems with the vfio interface use of groups is that we
> conflate the IOMMU group for both isolation and granularity.  I think
> what Jason is referring to here is that we still want groups to be the
> basis of isolation, but we don't want a uAPI that presumes all devices
> within the group must use the same IOASID.  For example, if a user owns
> an IOMMU group consisting of non-isolated functions of a multi-function
> device, they should be able to create a vIOMMU VM where each of those
> functions has its own address space.  That can't be done today, the
> entire group would need to be attached to the VM under a PCIe-to-PCI
> bridge to reflect the address space limitation imposed by the vfio
> group uAPI model.  Thanks,

I'm fairly sceptical of the idea of allowing the "identifiable
requestor" grouping to be different from the isolation grouping.
Certainly it's possible in hardware, but I think it makes the
interface horribly complex to understand without buying much.

"Good" modern devices on modern systems will be both fully isolated
and well identified, so for the uses cases that people seem to mostly
care about here we'll still have identification group == isolation
group == one device.

In other words, do we really have use cases where we need to identify
different device IDs, even though we know they're not isolated?

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [RFC] /dev/ioasid uAPI proposal

2021-06-17 Thread David Gibson
On Thu, Jun 03, 2021 at 08:12:27AM +, Tian, Kevin wrote:
> > From: David Gibson 
> > Sent: Wednesday, June 2, 2021 2:15 PM
> >
> [...]
>  
> > >
> > > /*
> > >   * Get information about an I/O address space
> > >   *
> > >   * Supported capabilities:
> > >   *   - VFIO type1 map/unmap;
> > >   *   - pgtable/pasid_table binding
> > >   *   - hardware nesting vs. software nesting;
> > >   *   - ...
> > >   *
> > >   * Related attributes:
> > >   *   - supported page sizes, reserved IOVA ranges (DMA mapping);
> > 
> > Can I request we represent this in terms of permitted IOVA ranges,
> > rather than reserved IOVA ranges.  This works better with the "window"
> > model I have in mind for unifying the restrictions of the POWER IOMMU
> > with Type1 like mapping.
> 
> Can you elaborate how permitted range work better here?

Pretty much just that MAP operations would fail if they don't entirely
lie within a permitted range.  So, for example, if your IOMMU only
implements, say, 45 bits of IOVA, then you'd have 0..0x1fffffffffff as
your only permitted range.  If, like the POWER paravirtual IOMMU (in
default configuration) you have a small (1G) 32-bit range and a large
(45-bit) 64-bit range at a high address, you'd have, say:
0x0..0x3fffffff (32-bit range)
and
0x800000000000000..0x8001fffffffffff (64-bit range)
as your permitted ranges.

If your IOMMU supports truly full 64-bit addressing, but has a
reserved range (for MSIs or whatever) at 0x000..0x then
you'd have permitted ranges of 0..0xaaa9 and
0x..0x.
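
A minimal sketch of how "permitted ranges" could look, purely to make the
semantics concrete (names are hypothetical): a MAP request is accepted only
if it fits entirely inside one advertised range.

/* hypothetical query result: ranges inside which MAP is allowed */
struct ioas_iova_range {
	__u64 start;
	__u64 last;		/* inclusive */
};

static bool iova_mapping_permitted(const struct ioas_iova_range *ranges,
				   unsigned int nr, __u64 iova, __u64 length)
{
	unsigned int i;

	for (i = 0; i < nr; i++) {
		if (iova >= ranges[i].start &&
		    iova + length - 1 <= ranges[i].last)
			return true;	/* fits entirely in one permitted range */
	}
	return false;			/* MAP would fail */
}

Reserved regions then simply show up as gaps between permitted ranges rather
than as a separate list.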

[snip]
> > For debugging and certain hypervisor edge cases it might be useful to
> > have a call to allow userspace to lookup and specific IOVA in a guest
> > managed pgtable.
> 
> Since all the mapping metadata is from userspace, why would one 
> rely on the kernel to provide such service? Or are you simply asking
> for some debugfs node to dump the I/O page table for a given 
> IOASID?

I'm thinking of this as a debugging aid, so you can make sure that the
kernel is interpreting that metadata in the same way that your
userspace expects it to be interpreted.
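
As an illustration (all names hypothetical), the kind of call meant here
might look like:

/* hypothetical lookup structure for a debug/query interface */
struct ioasid_lookup_iova {
	__u32 ioasid;
	__u64 iova;	/* in:  IOVA to resolve */
	__u64 phys;	/* out: physical address the kernel's walk produces */
	__u32 perms;	/* out: access permissions found in the walk */
};

/*
 * Userspace (or the hypervisor) compares phys/perms with what its own
 * metadata says the guest page table should produce, catching cases where
 * kernel and userspace disagree on how the format is interpreted.
 */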


-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [RFC] /dev/ioasid uAPI proposal

2021-06-17 Thread David Gibson
On Tue, Jun 08, 2021 at 04:04:06PM -0300, Jason Gunthorpe wrote:
> On Tue, Jun 08, 2021 at 10:53:02AM +1000, David Gibson wrote:
> > On Thu, Jun 03, 2021 at 08:52:24AM -0300, Jason Gunthorpe wrote:
> > > On Thu, Jun 03, 2021 at 03:13:44PM +1000, David Gibson wrote:
> > > 
> > > > > We can still consider it a single "address space" from the IOMMU
> > > > > perspective. What has happened is that the address table is not just a
> > > > > 64 bit IOVA, but an extended ~80 bit IOVA formed by "PASID, IOVA".
> > > > 
> > > > True.  This does complexify how we represent what IOVA ranges are
> > > > valid, though.  I'll bet you most implementations don't actually
> > > > implement a full 64-bit IOVA, which means we effectively have a large
> > > > number of windows from (0..max IOVA) for each valid pasid.  This adds
> > > > another reason I don't think my concept of IOVA windows is just a
> > > > power specific thing.
> > > 
> > > Yes
> > > 
> > > Things rapidly get into weird hardware specific stuff though, the
> > > request will be for things like:
> > >   "ARM PASID page table format from SMMU IP block vXX"
> > 
> > So, I'm happy enough for picking a user-managed pagetable format to
> > imply the set of valid IOVA ranges (though a query might be nice).
> 
> I think a query is mandatory, and optionally asking for ranges seems
> generally useful as a HW property.
> 
> The danger is things can get really tricky as the app can ask for
> ranges some HW needs but other HW can't provide. 
> 
> I would encourage a flow where "generic" apps like DPDK can somehow
> just ignore this, or at least be very, very simplified "I want around
> XX GB of IOVA space"
> 
> dpdk type apps vs qemu apps are really quite different and we should
> be carefully that the needs of HW accelerated vIOMMU emulation do not
> trump the needs of simple universal control over a DMA map.

Agreed.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [RFC] /dev/ioasid uAPI proposal

2021-06-17 Thread David Gibson
On Thu, Jun 10, 2021 at 06:37:31PM +0200, Jean-Philippe Brucker wrote:
> On Tue, Jun 08, 2021 at 04:31:50PM +1000, David Gibson wrote:
> > For the qemu case, I would imagine a two stage fallback:
> > 
> > 1) Ask for the exact IOMMU capabilities (including pagetable
> >format) that the vIOMMU has.  If the host can supply, you're
> >good
> > 
> > 2) If not, ask for a kernel managed IOAS.  Verify that it can map
> >all the IOVA ranges the guest vIOMMU needs, and has an equal or
> >smaller pagesize than the guest vIOMMU presents.  If so,
> >software emulate the vIOMMU by shadowing guest io pagetable
> >updates into the kernel managed IOAS.
> > 
> > 3) You're out of luck, don't start.
> > 
> > For both (1) and (2) I'd expect it to be asking this question *after*
> > saying what devices are attached to the IOAS, based on the virtual
> > hardware configuration.  That doesn't cover hotplug, of course, for
> > that you have to just fail the hotplug if the new device isn't
> > supportable with the IOAS you already have.
> 
> Yes. So there is a point in time when the IOAS is frozen, and cannot take
> in new incompatible devices. I think that can support the usage I had in
> mind. If the VMM (non-QEMU, let's say) wanted to create one IOASID FD per
> feature set it could bind the first device, freeze the features, then bind

Are you thinking of this "freeze the features" as an explicitly
triggered action?  I have suggested that an explicit "ENABLE" step
might be useful, but that hasn't had much traction from what I've
seen.

> the second device. If the second bind fails it creates a new FD, allowing
> to fall back to (2) for the second device while keeping (1) for the first
> device. A paravirtual IOMMU like virtio-iommu could easily support this as
> it describes pIOMMU properties for each device to the guest. An emulated
> vIOMMU could also support some hybrid cases as you describe below.

Eh.. in some cases.  The vIOMMU model will often dictate what guest
side devices need to share an address space, which may make it very
impractical to have them in different address spaces on the host side.

> > One can imagine optimizations where for certain intermediate cases you
> > could do a lighter SW emu if the host supports a model that's close to
> > the vIOMMU one, and you're able to trap and emulate the differences.
> > In practice I doubt anyone's going to have time to look for such cases
> > and implement the logic for it.
> > 
> > > For example depending whether the hardware IOMMU is SMMUv2 or SMMUv3, that
> > > completely changes the capabilities offered to the guest (some v2
> > > implementations support nesting page tables, but never PASID nor PRI
> > > unlike v3.) The same vIOMMU could support either, presenting different
> > > capabilities to the guest, even multiple page table formats if we wanted
> > > to be exhaustive (SMMUv2 supports the older 32-bit descriptor), but it
> > > needs to know early on what the hardware is precisely. Then some new page
> > > table format shows up and, although the vIOMMU can support that in
> > > addition to older ones, QEMU will have to pick a single one, that it
> > > assumes the guest knows how to drive?
> > > 
> > > I think once it binds a device to an IOASID fd, QEMU will want to probe
> > > what hardware features are available before going further with the vIOMMU
> > > setup (is there PASID, PRI, which page table formats are supported,
> > > address size, page granule, etc). Obtaining precise information about the
> > > hardware would be less awkward than trying different configurations until
> > > one succeeds. Binding an additional device would then fail if its pIOMMU
> > > doesn't support exactly the features supported for the first device,
> > > because we don't know which ones the guest will choose. QEMU will have to
> > > open a new IOASID fd for that device.
> > 
> > No, this fundamentally misunderstands the qemu model.  The user
> > *chooses* the guest visible platform, and qemu supplies it or fails.
> > There is no negotiation with the guest, because this makes managing
> > migration impossibly difficult.
> 
> I'd like to understand better where the difficulty lies, with migration.
> Is the problem, once we have a guest running on physical machine A, to
> make sure that physical machine B supports the same IOMMU properties
> before migrating the VM over to B?  Why can't QEMU (instead of the user)
> select a feature set on machine A, then when time comes to migrate, query
> all information from the host kernel on machine B and check that it
> matches what was picked for machine A?  Or is it only trying to
> accommodate different sets of features between A and B, that would be too
> difficult?

There are two problems

1) Although it could be done in theory, it's hard, and it would need a
huge rewrite to qemu's whole migration infrastructure to do this.
We'd need a way of representing host features, 

Re: [PATCH v7 08/15] iommu: Add support for the map_pages() callback

2021-06-17 Thread Lu Baolu

On 6/16/21 9:38 PM, Georgi Djakov wrote:

From: "Isaac J. Manjarres" 

Since iommu_pgsize can calculate how many pages of the
same size can be mapped/unmapped before the next largest
page size boundary, add support for invoking an IOMMU
driver's map_pages() callback, if it provides one.

Signed-off-by: Isaac J. Manjarres 
Suggested-by: Will Deacon 
Signed-off-by: Georgi Djakov 
---
  drivers/iommu/iommu.c | 43 +++
  1 file changed, 35 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 725622c7e603..70a729ce88b1 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2429,6 +2429,30 @@ static size_t iommu_pgsize(struct iommu_domain *domain, 
unsigned long iova,
return pgsize;
  }
  
+static int __iommu_map_pages(struct iommu_domain *domain, unsigned long iova,

+phys_addr_t paddr, size_t size, int prot,
+gfp_t gfp, size_t *mapped)
+{
+   const struct iommu_ops *ops = domain->ops;
+   size_t pgsize, count;
+   int ret;
+
+   pgsize = iommu_pgsize(domain, iova, paddr, size, &count);
+
+   pr_debug("mapping: iova 0x%lx pa %pa pgsize 0x%zx count %zu\n",
+iova, &paddr, pgsize, count);
+
+   if (ops->map_pages) {
+   ret = ops->map_pages(domain, iova, paddr, pgsize, count, prot,
+gfp, mapped);
+   } else {
+   ret = ops->map(domain, iova, paddr, pgsize, prot, gfp);
+   *mapped = ret ? 0 : pgsize;
+   }
+
+   return ret;
+}
+
  static int __iommu_map(struct iommu_domain *domain, unsigned long iova,
   phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
  {
@@ -2439,7 +2463,7 @@ static int __iommu_map(struct iommu_domain *domain, 
unsigned long iova,
phys_addr_t orig_paddr = paddr;
int ret = 0;
  
-	if (unlikely(ops->map == NULL ||

+   if (unlikely(!(ops->map || ops->map_pages) ||
 domain->pgsize_bitmap == 0UL))
return -ENODEV;
  
@@ -2463,18 +2487,21 @@ static int __iommu_map(struct iommu_domain *domain, unsigned long iova,

pr_debug("map: iova 0x%lx pa %pa size 0x%zx\n", iova, , size);
  
  	while (size) {

-   size_t pgsize = iommu_pgsize(domain, iova, paddr, size, NULL);
+   size_t mapped = 0;
  
-		pr_debug("mapping: iova 0x%lx pa %pa pgsize 0x%zx\n",

-iova, &paddr, pgsize);
-   ret = ops->map(domain, iova, paddr, pgsize, prot, gfp);
+   ret = __iommu_map_pages(domain, iova, paddr, size, prot, gfp,
+   &mapped);
+   /*
+* Some pages may have been mapped, even if an error occurred,
+* so we should account for those so they can be unmapped.
+*/
+   size -= mapped;
  
  		if (ret)

break;
  
-		iova += pgsize;

-   paddr += pgsize;
-   size -= pgsize;
+   iova += mapped;
+   paddr += mapped;
}
  
  	/* unroll mapping in case something went wrong */




Reviewed-by: Lu Baolu 

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v7 07/15] iommu: Hook up '->unmap_pages' driver callback

2021-06-17 Thread Lu Baolu

On 6/16/21 9:38 PM, Georgi Djakov wrote:

From: Will Deacon 

Extend iommu_pgsize() to populate an optional 'count' parameter so that
we can direct unmapping operation to the ->unmap_pages callback if it
has been provided by the driver.

Signed-off-by: Will Deacon 
Signed-off-by: Isaac J. Manjarres 
Signed-off-by: Georgi Djakov 
---
  drivers/iommu/iommu.c | 59 +++
  1 file changed, 50 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 80e14c139d40..725622c7e603 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2376,11 +2376,11 @@ phys_addr_t iommu_iova_to_phys(struct iommu_domain 
*domain, dma_addr_t iova)
  EXPORT_SYMBOL_GPL(iommu_iova_to_phys);
  
  static size_t iommu_pgsize(struct iommu_domain *domain, unsigned long iova,

-  phys_addr_t paddr, size_t size)
+  phys_addr_t paddr, size_t size, size_t *count)
  {
-   unsigned int pgsize_idx;
+   unsigned int pgsize_idx, pgsize_idx_next;
unsigned long pgsizes;
-   size_t pgsize;
+   size_t offset, pgsize, pgsize_next;
unsigned long addr_merge = paddr | iova;
  
  	/* Page sizes supported by the hardware and small enough for @size */

@@ -2396,7 +2396,36 @@ static size_t iommu_pgsize(struct iommu_domain *domain, 
unsigned long iova,
/* Pick the biggest page size remaining */
pgsize_idx = __fls(pgsizes);
pgsize = BIT(pgsize_idx);
+   if (!count)
+   return pgsize;
  
+	/* Find the next biggest support page size, if it exists */

+   pgsizes = domain->pgsize_bitmap & ~GENMASK(pgsize_idx, 0);
+   if (!pgsizes)
+   goto out_set_count;
+
+   pgsize_idx_next = __ffs(pgsizes);
+   pgsize_next = BIT(pgsize_idx_next);
+
+   /*
+* There's no point trying a bigger page size unless the virtual
+* and physical addresses are similarly offset within the larger page.
+*/
+   if ((iova ^ paddr) & (pgsize_next - 1))
+   goto out_set_count;
+
+   /* Calculate the offset to the next page size alignment boundary */
+   offset = pgsize_next - (addr_merge & (pgsize_next - 1));
+
+   /*
+* If size is big enough to accommodate the larger page, reduce
+* the number of smaller pages.
+*/
+   if (offset + pgsize_next <= size)
+   size = offset;
+
+out_set_count:
+   *count = size >> pgsize_idx;
return pgsize;
  }
  
@@ -2434,7 +2463,7 @@ static int __iommu_map(struct iommu_domain *domain, unsigned long iova,

pr_debug("map: iova 0x%lx pa %pa size 0x%zx\n", iova, , size);
  
  	while (size) {

-   size_t pgsize = iommu_pgsize(domain, iova, paddr, size);
+   size_t pgsize = iommu_pgsize(domain, iova, paddr, size, NULL);
  
  		pr_debug("mapping: iova 0x%lx pa %pa pgsize 0x%zx\n",

 iova, &paddr, pgsize);
@@ -2485,6 +2514,19 @@ int iommu_map_atomic(struct iommu_domain *domain, 
unsigned long iova,
  }
  EXPORT_SYMBOL_GPL(iommu_map_atomic);
  
+static size_t __iommu_unmap_pages(struct iommu_domain *domain,

+ unsigned long iova, size_t size,
+ struct iommu_iotlb_gather *iotlb_gather)
+{
+   const struct iommu_ops *ops = domain->ops;
+   size_t pgsize, count;
+
+   pgsize = iommu_pgsize(domain, iova, iova, size, &count);
+   return ops->unmap_pages ?
+  ops->unmap_pages(domain, iova, pgsize, count, iotlb_gather) :
+  ops->unmap(domain, iova, pgsize, iotlb_gather);
+}
+
  static size_t __iommu_unmap(struct iommu_domain *domain,
unsigned long iova, size_t size,
struct iommu_iotlb_gather *iotlb_gather)
@@ -2494,7 +2536,7 @@ static size_t __iommu_unmap(struct iommu_domain *domain,
unsigned long orig_iova = iova;
unsigned int min_pagesz;
  
-	if (unlikely(ops->unmap == NULL ||

+   if (unlikely(!(ops->unmap || ops->unmap_pages) ||
 domain->pgsize_bitmap == 0UL))
return 0;
  
@@ -2522,10 +2564,9 @@ static size_t __iommu_unmap(struct iommu_domain *domain,

 * or we hit an area that isn't mapped.
 */
while (unmapped < size) {
-   size_t pgsize;
-
-   pgsize = iommu_pgsize(domain, iova, iova, size - unmapped);
-   unmapped_page = ops->unmap(domain, iova, pgsize, iotlb_gather);
+   unmapped_page = __iommu_unmap_pages(domain, iova,
+   size - unmapped,
+   iotlb_gather);
if (!unmapped_page)
break;
  



Reviewed-by: Lu Baolu 

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org

Re: [PATCH v7 06/15] iommu: Split 'addr_merge' argument to iommu_pgsize() into separate parts

2021-06-17 Thread Lu Baolu

On 6/16/21 9:38 PM, Georgi Djakov wrote:

From: Will Deacon 

The 'addr_merge' parameter to iommu_pgsize() is a fabricated address
intended to describe the alignment requirements to consider when
choosing an appropriate page size. On the iommu_map() path, this address
is the logical OR of the virtual and physical addresses.

Subsequent improvements to iommu_pgsize() will need to check the
alignment of the virtual and physical components of 'addr_merge'
independently, so pass them in as separate parameters and reconstruct
'addr_merge' locally.

No functional change.

Signed-off-by: Will Deacon 
Signed-off-by: Isaac J. Manjarres 
Signed-off-by: Georgi Djakov 
---
  drivers/iommu/iommu.c | 10 ++
  1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 80e471ada358..80e14c139d40 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2375,12 +2375,13 @@ phys_addr_t iommu_iova_to_phys(struct iommu_domain 
*domain, dma_addr_t iova)
  }
  EXPORT_SYMBOL_GPL(iommu_iova_to_phys);
  
-static size_t iommu_pgsize(struct iommu_domain *domain,

-  unsigned long addr_merge, size_t size)
+static size_t iommu_pgsize(struct iommu_domain *domain, unsigned long iova,
+  phys_addr_t paddr, size_t size)
  {
unsigned int pgsize_idx;
unsigned long pgsizes;
size_t pgsize;
+   unsigned long addr_merge = paddr | iova;
  
  	/* Page sizes supported by the hardware and small enough for @size */

pgsizes = domain->pgsize_bitmap & GENMASK(__fls(size), 0);
@@ -2433,7 +2434,7 @@ static int __iommu_map(struct iommu_domain *domain, 
unsigned long iova,
pr_debug("map: iova 0x%lx pa %pa size 0x%zx\n", iova, , size);
  
  	while (size) {

-   size_t pgsize = iommu_pgsize(domain, iova | paddr, size);
+   size_t pgsize = iommu_pgsize(domain, iova, paddr, size);
  
  		pr_debug("mapping: iova 0x%lx pa %pa pgsize 0x%zx\n",

 iova, &paddr, pgsize);
@@ -2521,8 +2522,9 @@ static size_t __iommu_unmap(struct iommu_domain *domain,
 * or we hit an area that isn't mapped.
 */
while (unmapped < size) {
-   size_t pgsize = iommu_pgsize(domain, iova, size - unmapped);
+   size_t pgsize;
  
+		pgsize = iommu_pgsize(domain, iova, iova, size - unmapped);

unmapped_page = ops->unmap(domain, iova, pgsize, iotlb_gather);
if (!unmapped_page)
break;



Reviewed-by: Lu Baolu 

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v7 05/15] iommu: Use bitmap to calculate page size in iommu_pgsize()

2021-06-17 Thread Lu Baolu

On 6/16/21 9:38 PM, Georgi Djakov wrote:

From: Will Deacon 

Avoid the potential for shifting values by amounts greater than the
width of their type by using a bitmap to compute page size in
iommu_pgsize().

Signed-off-by: Will Deacon 
Signed-off-by: Isaac J. Manjarres 
Signed-off-by: Georgi Djakov 
---
  drivers/iommu/iommu.c | 31 ---
  1 file changed, 12 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 5419c4b9f27a..80e471ada358 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -8,6 +8,7 @@
  
  #include 

  #include 
+#include 
  #include 
  #include 
  #include 
@@ -2378,30 +2379,22 @@ static size_t iommu_pgsize(struct iommu_domain *domain,
   unsigned long addr_merge, size_t size)
  {
unsigned int pgsize_idx;
+   unsigned long pgsizes;
size_t pgsize;
  
-	/* Max page size that still fits into 'size' */

-   pgsize_idx = __fls(size);
+   /* Page sizes supported by the hardware and small enough for @size */
+   pgsizes = domain->pgsize_bitmap & GENMASK(__fls(size), 0);
  
-	/* need to consider alignment requirements ? */

-   if (likely(addr_merge)) {
-   /* Max page size allowed by address */
-   unsigned int align_pgsize_idx = __ffs(addr_merge);
-   pgsize_idx = min(pgsize_idx, align_pgsize_idx);
-   }
-
-   /* build a mask of acceptable page sizes */
-   pgsize = (1UL << (pgsize_idx + 1)) - 1;
-
-   /* throw away page sizes not supported by the hardware */
-   pgsize &= domain->pgsize_bitmap;
+   /* Constrain the page sizes further based on the maximum alignment */
+   if (likely(addr_merge))
+   pgsizes &= GENMASK(__ffs(addr_merge), 0);
  
-	/* make sure we're still sane */

-   BUG_ON(!pgsize);
+   /* Make sure we have at least one suitable page size */
+   BUG_ON(!pgsizes);
  
-	/* pick the biggest page */

-   pgsize_idx = __fls(pgsize);
-   pgsize = 1UL << pgsize_idx;
+   /* Pick the biggest page size remaining */
+   pgsize_idx = __fls(pgsizes);
+   pgsize = BIT(pgsize_idx);
  
  	return pgsize;

  }



Reviewed-by: Lu Baolu 

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v12 00/12] Restricted DMA

2021-06-17 Thread Claire Chang
v13: https://lore.kernel.org/patchwork/cover/1448001/
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v13 00/12] Restricted DMA

2021-06-17 Thread Claire Chang
v13: https://lore.kernel.org/patchwork/cover/1448001/
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v13 12/12] of: Add plumbing for restricted DMA pool

2021-06-17 Thread Claire Chang
If a device is not behind an IOMMU, we look up the device node and set
up the restricted DMA when the restricted-dma-pool is presented.

Signed-off-by: Claire Chang 
Tested-by: Stefano Stabellini 
Tested-by: Will Deacon 
---
 drivers/of/address.c| 33 +
 drivers/of/device.c |  3 +++
 drivers/of/of_private.h |  6 ++
 3 files changed, 42 insertions(+)

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 73ddf2540f3f..cdf700fba5c4 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1022,6 +1023,38 @@ int of_dma_get_range(struct device_node *np, const 
struct bus_dma_region **map)
of_node_put(node);
return ret;
 }
+
+int of_dma_set_restricted_buffer(struct device *dev, struct device_node *np)
+{
+   struct device_node *node, *of_node = dev->of_node;
+   int count, i;
+
+   count = of_property_count_elems_of_size(of_node, "memory-region",
+   sizeof(u32));
+   /*
+* If dev->of_node doesn't exist or doesn't contain memory-region, try
+* the OF node having DMA configuration.
+*/
+   if (count <= 0) {
+   of_node = np;
+   count = of_property_count_elems_of_size(
+   of_node, "memory-region", sizeof(u32));
+   }
+
+   for (i = 0; i < count; i++) {
+   node = of_parse_phandle(of_node, "memory-region", i);
+   /*
+* There might be multiple memory regions, but only one
+* restricted-dma-pool region is allowed.
+*/
+   if (of_device_is_compatible(node, "restricted-dma-pool") &&
+   of_device_is_available(node))
+   return of_reserved_mem_device_init_by_idx(dev, of_node,
+ i);
+   }
+
+   return 0;
+}
 #endif /* CONFIG_HAS_DMA */
 
 /**
diff --git a/drivers/of/device.c b/drivers/of/device.c
index 6cb86de404f1..e68316836a7a 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -165,6 +165,9 @@ int of_dma_configure_id(struct device *dev, struct 
device_node *np,
 
arch_setup_dma_ops(dev, dma_start, size, iommu, coherent);
 
+   if (!iommu)
+   return of_dma_set_restricted_buffer(dev, np);
+
return 0;
 }
 EXPORT_SYMBOL_GPL(of_dma_configure_id);
diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h
index d9e6a324de0a..25cebbed5f02 100644
--- a/drivers/of/of_private.h
+++ b/drivers/of/of_private.h
@@ -161,12 +161,18 @@ struct bus_dma_region;
 #if defined(CONFIG_OF_ADDRESS) && defined(CONFIG_HAS_DMA)
 int of_dma_get_range(struct device_node *np,
const struct bus_dma_region **map);
+int of_dma_set_restricted_buffer(struct device *dev, struct device_node *np);
 #else
 static inline int of_dma_get_range(struct device_node *np,
const struct bus_dma_region **map)
 {
return -ENODEV;
 }
+static inline int of_dma_set_restricted_buffer(struct device *dev,
+  struct device_node *np)
+{
+   return -ENODEV;
+}
 #endif
 
 #endif /* _LINUX_OF_PRIVATE_H */
-- 
2.32.0.288.g62a8d224e6-goog
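
For driver authors following the series: once a device node carries a memory-region phandle pointing at a restricted-dma-pool, the plumbing above attaches the pool during of_dma_configure_id(), so the consumer keeps using the ordinary DMA API unchanged. A minimal, purely hypothetical probe sketch (demo_probe and the buffer size are made up for illustration):

#include <linux/dma-mapping.h>
#include <linux/platform_device.h>
#include <linux/sizes.h>
#include <linux/slab.h>

static int demo_probe(struct platform_device *pdev)
{
	struct device *dev = &pdev->dev;
	dma_addr_t dma;
	void *buf;

	buf = kzalloc(SZ_4K, GFP_KERNEL);
	if (!buf)
		return -ENOMEM;

	/* Streaming DMA is transparently bounced through the restricted pool. */
	dma = dma_map_single(dev, buf, SZ_4K, DMA_TO_DEVICE);
	if (dma_mapping_error(dev, dma)) {
		kfree(buf);
		return -ENOMEM;
	}

	/* ... hand 'dma' to the device ... */

	dma_unmap_single(dev, dma, SZ_4K, DMA_TO_DEVICE);
	kfree(buf);
	return 0;
}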



[PATCH v13 11/12] dt-bindings: of: Add restricted DMA pool

2021-06-17 Thread Claire Chang
Introduce the new compatible string, restricted-dma-pool, for restricted
DMA. One can specify the address and length of the restricted DMA memory
region by restricted-dma-pool in the reserved-memory node.

Signed-off-by: Claire Chang 
Tested-by: Stefano Stabellini 
Tested-by: Will Deacon 
---
 .../reserved-memory/reserved-memory.txt   | 36 +--
 1 file changed, 33 insertions(+), 3 deletions(-)

diff --git 
a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt 
b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
index e8d3096d922c..39b5f4c5a511 100644
--- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
+++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
@@ -51,6 +51,23 @@ compatible (optional) - standard definition
   used as a shared pool of DMA buffers for a set of devices. It can
   be used by an operating system to instantiate the necessary pool
   management subsystem if necessary.
+- restricted-dma-pool: This indicates a region of memory meant to be
+  used as a pool of restricted DMA buffers for a set of devices. The
+  memory region would be the only region accessible to those devices.
+  When using this, the no-map and reusable properties must not be set,
+  so the operating system can create a virtual mapping that will be 
used
+  for synchronization. The main purpose for restricted DMA is to
+  mitigate the lack of DMA access control on systems without an IOMMU,
+  which could result in the DMA accessing the system memory at
+  unexpected times and/or unexpected addresses, possibly leading to 
data
+  leakage or corruption. The feature on its own provides a basic level
+  of protection against the DMA overwriting buffer contents at
+  unexpected times. However, to protect against general data leakage 
and
+  system memory corruption, the system needs to provide a way to lock 
down
+  the memory access, e.g., MPU. Note that since coherent allocation
+  needs remapping, one must set up another device coherent pool by
+  shared-dma-pool and use dma_alloc_from_dev_coherent instead for 
atomic
+  coherent allocation.
 - vendor specific string in the form ,[-]
 no-map (optional) - empty property
 - Indicates the operating system must not create a virtual mapping
@@ -85,10 +102,11 @@ memory-region-names (optional) - a list of names, one for 
each corresponding
 
 Example
 ---
-This example defines 3 contiguous regions are defined for Linux kernel:
+This example defines 4 contiguous regions for Linux kernel:
 one default of all device drivers (named linux,cma@7200 and 64MiB in size),
-one dedicated to the framebuffer device (named framebuffer@7800, 8MiB), and
-one for multimedia processing (named multimedia-memory@7700, 64MiB).
+one dedicated to the framebuffer device (named framebuffer@7800, 8MiB),
+one for multimedia processing (named multimedia-memory@7700, 64MiB), and
+one for restricted dma pool (named restricted_dma_reserved@0x5000, 64MiB).
 
 / {
#address-cells = <1>;
@@ -120,6 +138,11 @@ one for multimedia processing (named 
multimedia-memory@7700, 64MiB).
compatible = "acme,multimedia-memory";
reg = <0x7700 0x400>;
};
+
+   restricted_dma_reserved: restricted_dma_reserved {
+   compatible = "restricted-dma-pool";
+   reg = <0x5000 0x400>;
+   };
};
 
/* ... */
@@ -138,4 +161,11 @@ one for multimedia processing (named 
multimedia-memory@7700, 64MiB).
memory-region = <_reserved>;
/* ... */
};
+
+   pcie_device: pcie_device@0,0 {
+   reg = <0x8301 0x0 0x 0x0 0x0010
+  0x8301 0x0 0x0010 0x0 0x0010>;
+   memory-region = <&restricted_dma_reserved>;
+   /* ... */
+   };
 };
-- 
2.32.0.288.g62a8d224e6-goog



[PATCH v13 10/12] swiotlb: Add restricted DMA pool initialization

2021-06-17 Thread Claire Chang
Add the initialization function to create restricted DMA pools from
matching reserved-memory nodes.

Regardless of swiotlb setting, the restricted DMA pool is preferred if
available.

The restricted DMA pools provide a basic level of protection against the
DMA overwriting buffer contents at unexpected times. However, to protect
against general data leakage and system memory corruption, the system
needs to provide a way to lock down the memory access, e.g., MPU.

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
Tested-by: Stefano Stabellini 
Tested-by: Will Deacon 
---
 include/linux/swiotlb.h |  3 +-
 kernel/dma/Kconfig  | 14 
 kernel/dma/swiotlb.c| 76 +
 3 files changed, 92 insertions(+), 1 deletion(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index a73fad460162..175b6c113ed8 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -73,7 +73,8 @@ extern enum swiotlb_force swiotlb_force;
  * range check to see if the memory was in fact allocated by this
  * API.
  * @nslabs:The number of IO TLB blocks (in groups of 64) between @start and
- * @end. This is command line adjustable via setup_io_tlb_npages.
+ * @end. For default swiotlb, this is command line adjustable via
+ * setup_io_tlb_npages.
  * @used:  The number of used IO TLB block.
  * @list:  The free list describing the number of free entries available
  * from each index.
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 77b405508743..3e961dc39634 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -80,6 +80,20 @@ config SWIOTLB
bool
select NEED_DMA_MAP_STATE
 
+config DMA_RESTRICTED_POOL
+   bool "DMA Restricted Pool"
+   depends on OF && OF_RESERVED_MEM
+   select SWIOTLB
+   help
+ This enables support for restricted DMA pools which provide a level of
+ DMA memory protection on systems with limited hardware protection
+ capabilities, such as those lacking an IOMMU.
+
+ For more information see
+ 

+ and .
+ If unsure, say "n".
+
 #
 # Should be selected if we can mmap non-coherent mappings to userspace.
 # The only thing that is really required is a way to set an uncached bit
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 6499cfbfe95f..d4099f03b2f0 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -39,6 +39,13 @@
 #ifdef CONFIG_DEBUG_FS
 #include 
 #endif
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+#include 
+#include 
+#include 
+#include 
+#include 
+#endif
 
 #include 
 #include 
@@ -736,4 +743,73 @@ bool swiotlb_free(struct device *dev, struct page *page, 
size_t size)
return true;
 }
 
+static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
+   struct device *dev)
+{
+   struct io_tlb_mem *mem = rmem->priv;
+   unsigned long nslabs = rmem->size >> IO_TLB_SHIFT;
+
+   /*
+* Since multiple devices can share the same pool, the private data,
+* io_tlb_mem struct, will be initialized by the first device attached
+* to it.
+*/
+   if (!mem) {
+   mem = kzalloc(struct_size(mem, slots, nslabs), GFP_KERNEL);
+   if (!mem)
+   return -ENOMEM;
+
+   swiotlb_init_io_tlb_mem(mem, rmem->base, nslabs, false);
+   mem->force_bounce = true;
+   mem->for_alloc = true;
+   set_memory_decrypted((unsigned long)phys_to_virt(rmem->base),
+rmem->size >> PAGE_SHIFT);
+
+   rmem->priv = mem;
+
+   if (IS_ENABLED(CONFIG_DEBUG_FS)) {
+   mem->debugfs =
+   debugfs_create_dir(rmem->name, debugfs_dir);
+   swiotlb_create_debugfs_files(mem);
+   }
+   }
+
+   dev->dma_io_tlb_mem = mem;
+
+   return 0;
+}
+
+static void rmem_swiotlb_device_release(struct reserved_mem *rmem,
+   struct device *dev)
+{
+   dev->dma_io_tlb_mem = io_tlb_default_mem;
+}
+
+static const struct reserved_mem_ops rmem_swiotlb_ops = {
+   .device_init = rmem_swiotlb_device_init,
+   .device_release = rmem_swiotlb_device_release,
+};
+
+static int __init rmem_swiotlb_setup(struct reserved_mem *rmem)
+{
+   unsigned long node = rmem->fdt_node;
+
+   if (of_get_flat_dt_prop(node, "reusable", NULL) ||
+   of_get_flat_dt_prop(node, "linux,cma-default", NULL) ||
+   of_get_flat_dt_prop(node, "linux,dma-default", NULL) ||
+   of_get_flat_dt_prop(node, "no-map", NULL))
+   return -EINVAL;
+
+   if (PageHighMem(pfn_to_page(PHYS_PFN(rmem->base)))) {
+   pr_err("Restricted DMA pool must be accessible within the linear mapping.");
+   return 
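
As a rough sizing note for the pool set up here (assuming the current IO_TLB_SHIFT of 11, i.e. 2 KiB slots, and the 64 MiB example region from the binding patch):

/*
 *	nslabs = rmem->size >> IO_TLB_SHIFT
 *	       = (64 << 20) >> 11
 *	       = 32768 slots
 *
 * rmem_swiotlb_device_init() then kzalloc()s struct_size(mem, slots, nslabs)
 * bytes of bookkeeping the first time a device attaches to the pool.
 */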

[PATCH v13 09/12] swiotlb: Add restricted DMA alloc/free support

2021-06-17 Thread Claire Chang
Add the functions, swiotlb_{alloc,free} and is_swiotlb_for_alloc to
support the memory allocation from restricted DMA pool.

The restricted DMA pool is preferred if available.

Note that since coherent allocation needs remapping, one must set up
another device coherent pool by shared-dma-pool and use
dma_alloc_from_dev_coherent instead for atomic coherent allocation.

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
Tested-by: Stefano Stabellini 
Tested-by: Will Deacon 
---
 include/linux/swiotlb.h | 26 ++
 kernel/dma/direct.c | 49 +++--
 kernel/dma/swiotlb.c| 38 ++--
 3 files changed, 99 insertions(+), 14 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 8d8855c77d9a..a73fad460162 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -85,6 +85,7 @@ extern enum swiotlb_force swiotlb_force;
  * @debugfs:   The dentry to debugfs.
  * @late_alloc:%true if allocated using the page allocator
  * @force_bounce: %true if swiotlb bouncing is forced
+ * @for_alloc:  %true if the pool is used for memory allocation
  */
 struct io_tlb_mem {
phys_addr_t start;
@@ -96,6 +97,7 @@ struct io_tlb_mem {
struct dentry *debugfs;
bool late_alloc;
bool force_bounce;
+   bool for_alloc;
struct io_tlb_slot {
phys_addr_t orig_addr;
size_t alloc_size;
@@ -156,4 +158,28 @@ static inline void swiotlb_adjust_size(unsigned long size)
 extern void swiotlb_print_info(void);
 extern void swiotlb_set_max_segment(unsigned int);
 
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+struct page *swiotlb_alloc(struct device *dev, size_t size);
+bool swiotlb_free(struct device *dev, struct page *page, size_t size);
+
+static inline bool is_swiotlb_for_alloc(struct device *dev)
+{
+   return dev->dma_io_tlb_mem->for_alloc;
+}
+#else
+static inline struct page *swiotlb_alloc(struct device *dev, size_t size)
+{
+   return NULL;
+}
+static inline bool swiotlb_free(struct device *dev, struct page *page,
+   size_t size)
+{
+   return false;
+}
+static inline bool is_swiotlb_for_alloc(struct device *dev)
+{
+   return false;
+}
+#endif /* CONFIG_DMA_RESTRICTED_POOL */
+
 #endif /* __LINUX_SWIOTLB_H */
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index a92465b4eb12..2de33e5d302b 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -75,6 +75,15 @@ static bool dma_coherent_ok(struct device *dev, phys_addr_t 
phys, size_t size)
min_not_zero(dev->coherent_dma_mask, dev->bus_dma_limit);
 }
 
+static void __dma_direct_free_pages(struct device *dev, struct page *page,
+   size_t size)
+{
+   if (IS_ENABLED(CONFIG_DMA_RESTRICTED_POOL) &&
+   swiotlb_free(dev, page, size))
+   return;
+   dma_free_contiguous(dev, page, size);
+}
+
 static struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
gfp_t gfp)
 {
@@ -86,6 +95,16 @@ static struct page *__dma_direct_alloc_pages(struct device 
*dev, size_t size,
 
gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
   &phys_limit);
+   if (IS_ENABLED(CONFIG_DMA_RESTRICTED_POOL) &&
+   is_swiotlb_for_alloc(dev)) {
+   page = swiotlb_alloc(dev, size);
+   if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
+   __dma_direct_free_pages(dev, page, size);
+   return NULL;
+   }
+   return page;
+   }
+
page = dma_alloc_contiguous(dev, size, gfp);
if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
dma_free_contiguous(dev, page, size);
@@ -142,7 +161,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
gfp |= __GFP_NOWARN;
 
if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&
-   !force_dma_unencrypted(dev)) {
+   !force_dma_unencrypted(dev) && !is_swiotlb_for_alloc(dev)) {
page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO);
if (!page)
return NULL;
@@ -155,18 +174,23 @@ void *dma_direct_alloc(struct device *dev, size_t size,
}
 
if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) &&
-   !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
-   !dev_is_dma_coherent(dev))
+   !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && !dev_is_dma_coherent(dev) &&
+   !is_swiotlb_for_alloc(dev))
return arch_dma_alloc(dev, size, dma_handle, gfp, attrs);
 
/*
 * Remapping or decrypting memory may block. If either is required and
 * we can't block, allocate the memory from the atomic pools.
+* If restricted DMA (i.e., is_swiotlb_for_alloc) is required, one must
+* 
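
A purely illustrative consumer view of the alloc path (not from the patch): with is_swiotlb_for_alloc() true for the device, page-backed DMA allocations are served from the restricted pool via swiotlb_alloc(), while atomic coherent allocations still need the separate shared-dma-pool mentioned above.

#include <linux/dma-mapping.h>
#include <linux/sizes.h>

/* demo_get_dma_buffer() and the size are made up for illustration only. */
static int demo_get_dma_buffer(struct device *dev)
{
	dma_addr_t handle;
	struct page *page;

	page = dma_alloc_pages(dev, SZ_64K, &handle, DMA_BIDIRECTIONAL,
			       GFP_KERNEL);
	if (!page)
		return -ENOMEM;

	/* CPU side via page_address(page), device side via 'handle'. */

	dma_free_pages(dev, SZ_64K, page, handle, DMA_BIDIRECTIONAL);
	return 0;
}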

[PATCH v13 08/12] swiotlb: Refactor swiotlb_tbl_unmap_single

2021-06-17 Thread Claire Chang
Add a new function, swiotlb_release_slots, to make the code reusable for
supporting different bounce buffer pools.

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
Tested-by: Stefano Stabellini 
Tested-by: Will Deacon 
---
 kernel/dma/swiotlb.c | 35 ---
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 89049d021d0d..ff09341bb9f5 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -556,27 +556,15 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, 
phys_addr_t orig_addr,
return tlb_addr;
 }
 
-/*
- * tlb_addr is the physical address of the bounce buffer to unmap.
- */
-void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
- size_t mapping_size, enum dma_data_direction dir,
- unsigned long attrs)
+static void swiotlb_release_slots(struct device *dev, phys_addr_t tlb_addr)
 {
-   struct io_tlb_mem *mem = hwdev->dma_io_tlb_mem;
+   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
unsigned long flags;
-   unsigned int offset = swiotlb_align_offset(hwdev, tlb_addr);
+   unsigned int offset = swiotlb_align_offset(dev, tlb_addr);
int index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
int nslots = nr_slots(mem->slots[index].alloc_size + offset);
int count, i;
 
-   /*
-* First, sync the memory before unmapping the entry
-*/
-   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
-   (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL))
-   swiotlb_bounce(hwdev, tlb_addr, mapping_size, DMA_FROM_DEVICE);
-
/*
 * Return the buffer to the free list by setting the corresponding
 * entries to indicate the number of contiguous entries available.
@@ -611,6 +599,23 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, 
phys_addr_t tlb_addr,
	spin_unlock_irqrestore(&mem->lock, flags);
 }
 
+/*
+ * tlb_addr is the physical address of the bounce buffer to unmap.
+ */
+void swiotlb_tbl_unmap_single(struct device *dev, phys_addr_t tlb_addr,
+ size_t mapping_size, enum dma_data_direction dir,
+ unsigned long attrs)
+{
+   /*
+* First, sync the memory before unmapping the entry
+*/
+   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
+   (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL))
+   swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_FROM_DEVICE);
+
+   swiotlb_release_slots(dev, tlb_addr);
+}
+
 void swiotlb_sync_single_for_device(struct device *dev, phys_addr_t tlb_addr,
size_t size, enum dma_data_direction dir)
 {
-- 
2.32.0.288.g62a8d224e6-goog
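
The point of the split becomes clearer in the next patch of the series: a free path for pool-backed allocations is just a slot release, with no CPU sync or unmap semantics attached. A rough sketch of such a helper, sitting next to swiotlb_release_slots() in kernel/dma/swiotlb.c (sketch only, locking is handled inside swiotlb_release_slots()):

bool swiotlb_free(struct device *dev, struct page *page, size_t size)
{
	phys_addr_t tlb_addr = page_to_phys(page);

	/* Only pages that actually came out of this device's pool. */
	if (!is_swiotlb_buffer(dev, tlb_addr))
		return false;

	swiotlb_release_slots(dev, tlb_addr);
	return true;
}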



[PATCH v13 07/12] swiotlb: Move alloc_size to swiotlb_find_slots

2021-06-17 Thread Claire Chang
Rename find_slots to swiotlb_find_slots and move the maintenance of
alloc_size to it for better code reusability later.

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
Tested-by: Stefano Stabellini 
Tested-by: Will Deacon 
---
 kernel/dma/swiotlb.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 13891d5de8c9..89049d021d0d 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -432,8 +432,8 @@ static unsigned int wrap_index(struct io_tlb_mem *mem, 
unsigned int index)
  * Find a suitable number of IO TLB entries size that will fit this request and
  * allocate a buffer from that IO TLB pool.
  */
-static int find_slots(struct device *dev, phys_addr_t orig_addr,
-   size_t alloc_size)
+static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
+ size_t alloc_size)
 {
struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
unsigned long boundary_mask = dma_get_seg_boundary(dev);
@@ -488,8 +488,11 @@ static int find_slots(struct device *dev, phys_addr_t 
orig_addr,
return -1;
 
 found:
-   for (i = index; i < index + nslots; i++)
+   for (i = index; i < index + nslots; i++) {
mem->slots[i].list = 0;
+   mem->slots[i].alloc_size =
+   alloc_size - ((i - index) << IO_TLB_SHIFT);
+   }
for (i = index - 1;
 io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 &&
 mem->slots[i].list; i--)
@@ -530,7 +533,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, 
phys_addr_t orig_addr,
return (phys_addr_t)DMA_MAPPING_ERROR;
}
 
-   index = find_slots(dev, orig_addr, alloc_size + offset);
+   index = swiotlb_find_slots(dev, orig_addr, alloc_size + offset);
if (index == -1) {
if (!(attrs & DMA_ATTR_NO_WARN))
dev_warn_ratelimited(dev,
@@ -544,11 +547,8 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, 
phys_addr_t orig_addr,
 * This is needed when we sync the memory.  Then we sync the buffer if
 * needed.
 */
-   for (i = 0; i < nr_slots(alloc_size + offset); i++) {
+   for (i = 0; i < nr_slots(alloc_size + offset); i++)
mem->slots[index + i].orig_addr = slot_addr(orig_addr, i);
-   mem->slots[index + i].alloc_size =
-   alloc_size - (i << IO_TLB_SHIFT);
-   }
tlb_addr = slot_addr(mem->start, index) + offset;
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
(dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
-- 
2.32.0.288.g62a8d224e6-goog
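
A worked example of the bookkeeping this moves (values assumed for illustration; IO_TLB_SHIFT is 11, so each slot covers 2 KiB):

/*
 * A 5 KiB request starting at slot 'index' spans three slots, and
 * swiotlb_find_slots() now records the remaining length from each slot:
 *
 *	slots[index + 0].alloc_size = 5120 - (0 << 11) = 5120
 *	slots[index + 1].alloc_size = 5120 - (1 << 11) = 3072
 *	slots[index + 2].alloc_size = 5120 - (2 << 11) = 1024
 *
 * swiotlb_release_slots() later reads slots[index].alloc_size to work out
 * how many slots to return to the free list.
 */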



[PATCH v13 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing

2021-06-17 Thread Claire Chang
Propagate the swiotlb_force into io_tlb_default_mem->force_bounce and
use it to determine whether to bounce the data or not. This will be
useful later to allow for different pools.

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
Tested-by: Stefano Stabellini 
Tested-by: Will Deacon 
---
 drivers/xen/swiotlb-xen.c |  2 +-
 include/linux/swiotlb.h   | 11 +++
 kernel/dma/direct.c   |  2 +-
 kernel/dma/direct.h   |  2 +-
 kernel/dma/swiotlb.c  |  4 
 5 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 0c6ed09f8513..4730a146fa35 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -369,7 +369,7 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, 
struct page *page,
if (dma_capable(dev, dev_addr, size, true) &&
!range_straddles_page_boundary(phys, size) &&
!xen_arch_need_swiotlb(dev, phys, dev_addr) &&
-   swiotlb_force != SWIOTLB_FORCE)
+   !is_swiotlb_force_bounce(dev))
goto done;
 
/*
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index dd1c30a83058..8d8855c77d9a 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -84,6 +84,7 @@ extern enum swiotlb_force swiotlb_force;
  * unmap calls.
  * @debugfs:   The dentry to debugfs.
  * @late_alloc:%true if allocated using the page allocator
+ * @force_bounce: %true if swiotlb bouncing is forced
  */
 struct io_tlb_mem {
phys_addr_t start;
@@ -94,6 +95,7 @@ struct io_tlb_mem {
spinlock_t lock;
struct dentry *debugfs;
bool late_alloc;
+   bool force_bounce;
struct io_tlb_slot {
phys_addr_t orig_addr;
size_t alloc_size;
@@ -109,6 +111,11 @@ static inline bool is_swiotlb_buffer(struct device *dev, 
phys_addr_t paddr)
return mem && paddr >= mem->start && paddr < mem->end;
 }
 
+static inline bool is_swiotlb_force_bounce(struct device *dev)
+{
+   return dev->dma_io_tlb_mem->force_bounce;
+}
+
 void __init swiotlb_exit(void);
 unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
@@ -120,6 +127,10 @@ static inline bool is_swiotlb_buffer(struct device *dev, 
phys_addr_t paddr)
 {
return false;
 }
+static inline bool is_swiotlb_force_bounce(struct device *dev)
+{
+   return false;
+}
 static inline void swiotlb_exit(void)
 {
 }
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 7a88c34d0867..a92465b4eb12 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -496,7 +496,7 @@ size_t dma_direct_max_mapping_size(struct device *dev)
 {
/* If SWIOTLB is active, use its maximum mapping size */
if (is_swiotlb_active(dev) &&
-   (dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE))
+   (dma_addressing_limited(dev) || is_swiotlb_force_bounce(dev)))
return swiotlb_max_mapping_size(dev);
return SIZE_MAX;
 }
diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
index 13e9e7158d94..4632b0f4f72e 100644
--- a/kernel/dma/direct.h
+++ b/kernel/dma/direct.h
@@ -87,7 +87,7 @@ static inline dma_addr_t dma_direct_map_page(struct device 
*dev,
phys_addr_t phys = page_to_phys(page) + offset;
dma_addr_t dma_addr = phys_to_dma(dev, phys);
 
-   if (unlikely(swiotlb_force == SWIOTLB_FORCE))
+   if (is_swiotlb_force_bounce(dev))
return swiotlb_map(dev, phys, size, dir, attrs);
 
if (unlikely(!dma_capable(dev, dma_addr, size, true))) {
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 409694d7a8ad..13891d5de8c9 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -179,6 +179,10 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem 
*mem, phys_addr_t start,
mem->end = mem->start + bytes;
mem->index = 0;
mem->late_alloc = late_alloc;
+
+   if (swiotlb_force == SWIOTLB_FORCE)
+   mem->force_bounce = true;
+
	spin_lock_init(&mem->lock);
for (i = 0; i < mem->nslabs; i++) {
mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
-- 
2.32.0.288.g62a8d224e6-goog
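
A minimal sketch (not from the patch) of what the mapping decision effectively becomes: bouncing is now a per-device property, set either by the global swiotlb=force option propagated here or, later in the series, by attaching a restricted pool that sets force_bounce.

#include <linux/dma-direct.h>
#include <linux/swiotlb.h>

static dma_addr_t demo_map(struct device *dev, phys_addr_t phys, size_t size,
			   enum dma_data_direction dir, unsigned long attrs)
{
	dma_addr_t dma_addr = phys_to_dma(dev, phys);

	/* Bounce when forced for this device or when it cannot reach 'phys'. */
	if (is_swiotlb_force_bounce(dev) ||
	    !dma_capable(dev, dma_addr, size, true))
		return swiotlb_map(dev, phys, size, dir, attrs);

	return dma_addr;
}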



[PATCH v13 05/12] swiotlb: Update is_swiotlb_active to add a struct device argument

2021-06-17 Thread Claire Chang
Update is_swiotlb_active to add a struct device argument. This will be
useful later to allow for different pools.

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
Tested-by: Stefano Stabellini 
Tested-by: Will Deacon 
---
 drivers/gpu/drm/i915/gem/i915_gem_internal.c | 2 +-
 drivers/gpu/drm/nouveau/nouveau_ttm.c| 2 +-
 drivers/pci/xen-pcifront.c   | 2 +-
 include/linux/swiotlb.h  | 4 ++--
 kernel/dma/direct.c  | 2 +-
 kernel/dma/swiotlb.c | 4 ++--
 6 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c 
b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
index a9d65fc8aa0e..4b7afa0fc85d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
@@ -42,7 +42,7 @@ static int i915_gem_object_get_pages_internal(struct 
drm_i915_gem_object *obj)
 
max_order = MAX_ORDER;
 #ifdef CONFIG_SWIOTLB
-   if (is_swiotlb_active()) {
+   if (is_swiotlb_active(obj->base.dev->dev)) {
unsigned int max_segment;
 
max_segment = swiotlb_max_segment();
diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c 
b/drivers/gpu/drm/nouveau/nouveau_ttm.c
index 9662522aa066..be15bfd9e0ee 100644
--- a/drivers/gpu/drm/nouveau/nouveau_ttm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_ttm.c
@@ -321,7 +321,7 @@ nouveau_ttm_init(struct nouveau_drm *drm)
}
 
 #if IS_ENABLED(CONFIG_SWIOTLB) && IS_ENABLED(CONFIG_X86)
-   need_swiotlb = is_swiotlb_active();
+   need_swiotlb = is_swiotlb_active(dev->dev);
 #endif
 
	ret = ttm_bo_device_init(&drm->ttm.bdev, &nouveau_bo_driver,
diff --git a/drivers/pci/xen-pcifront.c b/drivers/pci/xen-pcifront.c
index b7a8f3a1921f..0d56985bfe81 100644
--- a/drivers/pci/xen-pcifront.c
+++ b/drivers/pci/xen-pcifront.c
@@ -693,7 +693,7 @@ static int pcifront_connect_and_init_dma(struct 
pcifront_device *pdev)
 
	spin_unlock(&pcifront_dev_lock);
 
-   if (!err && !is_swiotlb_active()) {
+   if (!err && !is_swiotlb_active(&pdev->xdev->dev)) {
err = pci_xen_swiotlb_init_late();
if (err)
			dev_err(&pdev->xdev->dev, "Could not setup SWIOTLB!\n");
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index d1f3d95881cd..dd1c30a83058 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -112,7 +112,7 @@ static inline bool is_swiotlb_buffer(struct device *dev, 
phys_addr_t paddr)
 void __init swiotlb_exit(void);
 unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
-bool is_swiotlb_active(void);
+bool is_swiotlb_active(struct device *dev);
 void __init swiotlb_adjust_size(unsigned long size);
 #else
 #define swiotlb_force SWIOTLB_NO_FORCE
@@ -132,7 +132,7 @@ static inline size_t swiotlb_max_mapping_size(struct device 
*dev)
return SIZE_MAX;
 }
 
-static inline bool is_swiotlb_active(void)
+static inline bool is_swiotlb_active(struct device *dev)
 {
return false;
 }
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 84c9feb5474a..7a88c34d0867 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -495,7 +495,7 @@ int dma_direct_supported(struct device *dev, u64 mask)
 size_t dma_direct_max_mapping_size(struct device *dev)
 {
/* If SWIOTLB is active, use its maximum mapping size */
-   if (is_swiotlb_active() &&
+   if (is_swiotlb_active(dev) &&
(dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE))
return swiotlb_max_mapping_size(dev);
return SIZE_MAX;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index de79e9437030..409694d7a8ad 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -664,9 +664,9 @@ size_t swiotlb_max_mapping_size(struct device *dev)
return ((size_t)IO_TLB_SIZE) * IO_TLB_SEGSIZE;
 }
 
-bool is_swiotlb_active(void)
+bool is_swiotlb_active(struct device *dev)
 {
-   return io_tlb_default_mem != NULL;
+   return dev->dma_io_tlb_mem != NULL;
 }
 EXPORT_SYMBOL_GPL(is_swiotlb_active);
 
-- 
2.32.0.288.g62a8d224e6-goog
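
A small usage sketch (hypothetical caller, mirroring the i915 hunk above): the question is now answered for the pool this particular device is attached to rather than for the global default.

#include <linux/minmax.h>
#include <linux/swiotlb.h>

static unsigned int demo_max_segment(struct device *dev, unsigned int max_seg)
{
	/* Clamp only when this device really goes through a swiotlb pool. */
	if (is_swiotlb_active(dev))
		max_seg = min_t(unsigned int, max_seg, swiotlb_max_segment());

	return max_seg;
}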



[PATCH v13 04/12] swiotlb: Update is_swiotlb_buffer to add a struct device argument

2021-06-17 Thread Claire Chang
Update is_swiotlb_buffer to add a struct device argument. This will be
useful later to allow for different pools.

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
Tested-by: Stefano Stabellini 
Tested-by: Will Deacon 
---
 drivers/iommu/dma-iommu.c | 12 ++--
 drivers/xen/swiotlb-xen.c |  2 +-
 include/linux/swiotlb.h   |  7 ---
 kernel/dma/direct.c   |  6 +++---
 kernel/dma/direct.h   |  6 +++---
 5 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 3087d9fa6065..10997ef541f8 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -507,7 +507,7 @@ static void __iommu_dma_unmap_swiotlb(struct device *dev, 
dma_addr_t dma_addr,
 
__iommu_dma_unmap(dev, dma_addr, size);
 
-   if (unlikely(is_swiotlb_buffer(phys)))
+   if (unlikely(is_swiotlb_buffer(dev, phys)))
swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
 }
 
@@ -578,7 +578,7 @@ static dma_addr_t __iommu_dma_map_swiotlb(struct device 
*dev, phys_addr_t phys,
}
 
iova = __iommu_dma_map(dev, phys, aligned_size, prot, dma_mask);
-   if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(phys))
+   if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(dev, phys))
swiotlb_tbl_unmap_single(dev, phys, org_size, dir, attrs);
return iova;
 }
@@ -749,7 +749,7 @@ static void iommu_dma_sync_single_for_cpu(struct device 
*dev,
if (!dev_is_dma_coherent(dev))
arch_sync_dma_for_cpu(phys, size, dir);
 
-   if (is_swiotlb_buffer(phys))
+   if (is_swiotlb_buffer(dev, phys))
swiotlb_sync_single_for_cpu(dev, phys, size, dir);
 }
 
@@ -762,7 +762,7 @@ static void iommu_dma_sync_single_for_device(struct device 
*dev,
return;
 
phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
-   if (is_swiotlb_buffer(phys))
+   if (is_swiotlb_buffer(dev, phys))
swiotlb_sync_single_for_device(dev, phys, size, dir);
 
if (!dev_is_dma_coherent(dev))
@@ -783,7 +783,7 @@ static void iommu_dma_sync_sg_for_cpu(struct device *dev,
if (!dev_is_dma_coherent(dev))
arch_sync_dma_for_cpu(sg_phys(sg), sg->length, dir);
 
-   if (is_swiotlb_buffer(sg_phys(sg)))
+   if (is_swiotlb_buffer(dev, sg_phys(sg)))
swiotlb_sync_single_for_cpu(dev, sg_phys(sg),
sg->length, dir);
}
@@ -800,7 +800,7 @@ static void iommu_dma_sync_sg_for_device(struct device *dev,
return;
 
for_each_sg(sgl, sg, nelems, i) {
-   if (is_swiotlb_buffer(sg_phys(sg)))
+   if (is_swiotlb_buffer(dev, sg_phys(sg)))
swiotlb_sync_single_for_device(dev, sg_phys(sg),
   sg->length, dir);
 
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 4c89afc0df62..0c6ed09f8513 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -100,7 +100,7 @@ static int is_xen_swiotlb_buffer(struct device *dev, 
dma_addr_t dma_addr)
 * in our domain. Therefore _only_ check address within our domain.
 */
if (pfn_valid(PFN_DOWN(paddr)))
-   return is_swiotlb_buffer(paddr);
+   return is_swiotlb_buffer(dev, paddr);
return 0;
 }
 
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 216854a5e513..d1f3d95881cd 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -2,6 +2,7 @@
 #ifndef __LINUX_SWIOTLB_H
 #define __LINUX_SWIOTLB_H
 
+#include 
 #include 
 #include 
 #include 
@@ -101,9 +102,9 @@ struct io_tlb_mem {
 };
 extern struct io_tlb_mem *io_tlb_default_mem;
 
-static inline bool is_swiotlb_buffer(phys_addr_t paddr)
+static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
 
return mem && paddr >= mem->start && paddr < mem->end;
 }
@@ -115,7 +116,7 @@ bool is_swiotlb_active(void);
 void __init swiotlb_adjust_size(unsigned long size);
 #else
 #define swiotlb_force SWIOTLB_NO_FORCE
-static inline bool is_swiotlb_buffer(phys_addr_t paddr)
+static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
return false;
 }
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index f737e3347059..84c9feb5474a 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -343,7 +343,7 @@ void dma_direct_sync_sg_for_device(struct device *dev,
for_each_sg(sgl, sg, nents, i) {
phys_addr_t paddr = dma_to_phys(dev, sg_dma_address(sg));
 
-   if (unlikely(is_swiotlb_buffer(paddr)))
+   if (unlikely(is_swiotlb_buffer(dev, paddr)))

[PATCH v13 03/12] swiotlb: Set dev->dma_io_tlb_mem to the swiotlb pool used

2021-06-17 Thread Claire Chang
Always have the pointer to the swiotlb pool used in struct device. This
could help simplify the code for other pools.

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
Tested-by: Stefano Stabellini 
Tested-by: Will Deacon 
---
 drivers/base/core.c| 4 
 include/linux/device.h | 4 
 kernel/dma/swiotlb.c   | 8 
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index f29839382f81..cb3123e3954d 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include  /* for dma_default_coherent */
 
@@ -2736,6 +2737,9 @@ void device_initialize(struct device *dev)
 defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
dev->dma_coherent = dma_default_coherent;
 #endif
+#ifdef CONFIG_SWIOTLB
+   dev->dma_io_tlb_mem = io_tlb_default_mem;
+#endif
 }
 EXPORT_SYMBOL_GPL(device_initialize);
 
diff --git a/include/linux/device.h b/include/linux/device.h
index ba660731bd25..240d652a0696 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -416,6 +416,7 @@ struct dev_links_info {
  * @dma_pools: Dma pools (if dma'ble device).
  * @dma_mem:   Internal for coherent mem override.
  * @cma_area:  Contiguous memory area for dma allocations
+ * @dma_io_tlb_mem: Pointer to the swiotlb pool used.  Not for driver use.
  * @archdata:  For arch-specific additions.
  * @of_node:   Associated device tree node.
  * @fwnode:Associated device node supplied by platform firmware.
@@ -518,6 +519,9 @@ struct device {
 #ifdef CONFIG_DMA_CMA
struct cma *cma_area;   /* contiguous memory area for dma
   allocations */
+#endif
+#ifdef CONFIG_SWIOTLB
+   struct io_tlb_mem *dma_io_tlb_mem;
 #endif
/* arch specific additions */
struct dev_archdata archdata;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 2dba659a1e73..de79e9437030 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -340,7 +340,7 @@ void __init swiotlb_exit(void)
 static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t 
size,
   enum dma_data_direction dir)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;
unsigned int offset = (tlb_addr - mem->start) & (IO_TLB_SIZE - 1);
phys_addr_t orig_addr = mem->slots[index].orig_addr;
@@ -431,7 +431,7 @@ static unsigned int wrap_index(struct io_tlb_mem *mem, 
unsigned int index)
 static int find_slots(struct device *dev, phys_addr_t orig_addr,
size_t alloc_size)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
unsigned long boundary_mask = dma_get_seg_boundary(dev);
dma_addr_t tbl_dma_addr =
phys_to_dma_unencrypted(dev, mem->start) & boundary_mask;
@@ -508,7 +508,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, 
phys_addr_t orig_addr,
size_t mapping_size, size_t alloc_size,
enum dma_data_direction dir, unsigned long attrs)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
unsigned int offset = swiotlb_align_offset(dev, orig_addr);
unsigned int i;
int index;
@@ -559,7 +559,7 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, 
phys_addr_t tlb_addr,
  size_t mapping_size, enum dma_data_direction dir,
  unsigned long attrs)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = hwdev->dma_io_tlb_mem;
unsigned long flags;
unsigned int offset = swiotlb_align_offset(hwdev, tlb_addr);
int index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
-- 
2.32.0.288.g62a8d224e6-goog
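
Put together with the rest of the series, the pool a device ends up using can be summarised roughly as follows (assumed summary, cross-referencing patches 10/12 and 12/12):

/*
 *	device_initialize(dev)
 *		dev->dma_io_tlb_mem = io_tlb_default_mem;	 // global pool
 *	of_dma_configure_id(dev, np)
 *		of_dma_set_restricted_buffer(dev, np)		 // patch 12/12
 *			rmem_swiotlb_device_init(rmem, dev)	 // patch 10/12
 *				dev->dma_io_tlb_mem = mem;	 // per-device pool
 *
 * Every swiotlb helper then starts from dev->dma_io_tlb_mem instead of the
 * global io_tlb_default_mem.
 */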



[PATCH v13 02/12] swiotlb: Refactor swiotlb_create_debugfs

2021-06-17 Thread Claire Chang
Split the debugfs creation to make the code reusable for supporting
different bounce buffer pools.

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
Tested-by: Stefano Stabellini 
Tested-by: Will Deacon 
---
 kernel/dma/swiotlb.c | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 47bb2a766798..2dba659a1e73 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -671,19 +671,26 @@ bool is_swiotlb_active(void)
 EXPORT_SYMBOL_GPL(is_swiotlb_active);
 
 #ifdef CONFIG_DEBUG_FS
+static struct dentry *debugfs_dir;
 
-static int __init swiotlb_create_debugfs(void)
+static void swiotlb_create_debugfs_files(struct io_tlb_mem *mem)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
-
-   if (!mem)
-   return 0;
-   mem->debugfs = debugfs_create_dir("swiotlb", NULL);
debugfs_create_ulong("io_tlb_nslabs", 0400, mem->debugfs, >nslabs);
debugfs_create_ulong("io_tlb_used", 0400, mem->debugfs, >used);
+}
+
+static int __init swiotlb_create_default_debugfs(void)
+{
+   struct io_tlb_mem *mem = io_tlb_default_mem;
+
+   debugfs_dir = debugfs_create_dir("swiotlb", NULL);
+   if (mem) {
+   mem->debugfs = debugfs_dir;
+   swiotlb_create_debugfs_files(mem);
+   }
return 0;
 }
 
-late_initcall(swiotlb_create_debugfs);
+late_initcall(swiotlb_create_default_debugfs);
 
 #endif
-- 
2.32.0.288.g62a8d224e6-goog
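
With the per-pool creation added later in the series, the resulting debugfs layout looks roughly like this (illustrative; the restricted pool directory takes its name from the reserved-memory node, here the restricted_dma_reserved node from the binding example):

/*
 *	/sys/kernel/debug/swiotlb/io_tlb_nslabs
 *	/sys/kernel/debug/swiotlb/io_tlb_used
 *	/sys/kernel/debug/swiotlb/restricted_dma_reserved/io_tlb_nslabs
 *	/sys/kernel/debug/swiotlb/restricted_dma_reserved/io_tlb_used
 */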



[PATCH v13 01/12] swiotlb: Refactor swiotlb init functions

2021-06-17 Thread Claire Chang
Add a new function, swiotlb_init_io_tlb_mem, for the io_tlb_mem struct
initialization to make the code reusable.

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
Tested-by: Stefano Stabellini 
Tested-by: Will Deacon 
---
 kernel/dma/swiotlb.c | 50 ++--
 1 file changed, 25 insertions(+), 25 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 52e2ac526757..47bb2a766798 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -168,9 +168,28 @@ void __init swiotlb_update_mem_attributes(void)
memset(vaddr, 0, bytes);
 }
 
-int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
+static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
+   unsigned long nslabs, bool late_alloc)
 {
+   void *vaddr = phys_to_virt(start);
unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
+
+   mem->nslabs = nslabs;
+   mem->start = start;
+   mem->end = mem->start + bytes;
+   mem->index = 0;
+   mem->late_alloc = late_alloc;
+   spin_lock_init(&mem->lock);
+   for (i = 0; i < mem->nslabs; i++) {
+   mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
+   mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
+   mem->slots[i].alloc_size = 0;
+   }
+   memset(vaddr, 0, bytes);
+}
+
+int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
+{
struct io_tlb_mem *mem;
size_t alloc_size;
 
@@ -186,16 +205,8 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long 
nslabs, int verbose)
if (!mem)
panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
  __func__, alloc_size, PAGE_SIZE);
-   mem->nslabs = nslabs;
-   mem->start = __pa(tlb);
-   mem->end = mem->start + bytes;
-   mem->index = 0;
-   spin_lock_init(&mem->lock);
-   for (i = 0; i < mem->nslabs; i++) {
-   mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
-   mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
-   mem->slots[i].alloc_size = 0;
-   }
+
+   swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
 
io_tlb_default_mem = mem;
if (verbose)
@@ -282,8 +293,8 @@ swiotlb_late_init_with_default_size(size_t default_size)
 int
 swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
 {
-   unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
struct io_tlb_mem *mem;
+   unsigned long bytes = nslabs << IO_TLB_SHIFT;
 
if (swiotlb_force == SWIOTLB_NO_FORCE)
return 0;
@@ -297,20 +308,9 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
if (!mem)
return -ENOMEM;
 
-   mem->nslabs = nslabs;
-   mem->start = virt_to_phys(tlb);
-   mem->end = mem->start + bytes;
-   mem->index = 0;
-   mem->late_alloc = 1;
-   spin_lock_init(&mem->lock);
-   for (i = 0; i < mem->nslabs; i++) {
-   mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
-   mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
-   mem->slots[i].alloc_size = 0;
-   }
-
+   memset(mem, 0, sizeof(*mem));
+   swiotlb_init_io_tlb_mem(mem, virt_to_phys(tlb), nslabs, true);
set_memory_decrypted((unsigned long)tlb, bytes >> PAGE_SHIFT);
-   memset(tlb, 0, bytes);
 
io_tlb_default_mem = mem;
swiotlb_print_info();
-- 
2.32.0.288.g62a8d224e6-goog



[PATCH v13 00/12] Restricted DMA

2021-06-17 Thread Claire Chang
This series implements mitigations for lack of DMA access control on
systems without an IOMMU, which could result in the DMA accessing the
system memory at unexpected times and/or unexpected addresses, possibly
leading to data leakage or corruption.

For example, we plan to use the PCI-e bus for Wi-Fi and that PCI-e bus is
not behind an IOMMU. As PCI-e, by design, gives the device full access to
system memory, a vulnerability in the Wi-Fi firmware could easily escalate
to a full system exploit (remote wifi exploits: [1a], [1b] that shows a
full chain of exploits; [2], [3]).

To mitigate the security concerns, we introduce restricted DMA. Restricted
DMA utilizes the existing swiotlb to bounce streaming DMA in and out of a
specially allocated region and does memory allocation from the same region.
The feature on its own provides a basic level of protection against the DMA
overwriting buffer contents at unexpected times. However, to protect
against general data leakage and system memory corruption, the system needs
to provide a way to restrict the DMA to a predefined memory region (this is
usually done at firmware level, e.g. MPU in ATF on some ARM platforms [4]).

[1a] 
https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_4.html
[1b] 
https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_11.html
[2] https://blade.tencent.com/en/advisories/qualpwn/
[3] 
https://www.bleepingcomputer.com/news/security/vulnerabilities-found-in-highly-popular-firmware-for-wifi-chips/
[4] 
https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132

v13:
- Fix xen-swiotlb issues
  - memset in patch 01/12
  - is_swiotlb_force_bounce in patch 06/12
- Fix the dts example typo in reserved-memory.txt
- Add Stefano and Will's Tested-by tag from v12

v12:
Split is_dev_swiotlb_force into is_swiotlb_force_bounce (patch 06/12) and
is_swiotlb_for_alloc (patch 09/12)
https://lore.kernel.org/patchwork/cover/1447254/

v11:
- Rebase against swiotlb devel/for-linus-5.14
- s/mempry/memory/g
- exchange the order of patch 09/12 and 10/12
https://lore.kernel.org/patchwork/cover/1447216/

v10:
Address the comments in v9 to
  - fix the dev->dma_io_tlb_mem assignment
  - propagate swiotlb_force setting into io_tlb_default_mem->force
  - move set_memory_decrypted out of swiotlb_init_io_tlb_mem
  - move debugfs_dir declaration into the main CONFIG_DEBUG_FS block
  - add swiotlb_ prefix to find_slots and release_slots
  - merge the 3 alloc/free related patches
  - move the CONFIG_DMA_RESTRICTED_POOL later
https://lore.kernel.org/patchwork/cover/1446882/

v9:
Address the comments in v7 to
  - set swiotlb active pool to dev->dma_io_tlb_mem
  - get rid of get_io_tlb_mem
  - dig out the device struct for is_swiotlb_active
  - move debugfs_create_dir out of swiotlb_create_debugfs
  - do set_memory_decrypted conditionally in swiotlb_init_io_tlb_mem
  - use IS_ENABLED in kernel/dma/direct.c
  - fix redefinition of 'of_dma_set_restricted_buffer'
https://lore.kernel.org/patchwork/cover/1445081/

v8:
- Fix reserved-memory.txt and add the reg property in example.
- Fix sizeof for of_property_count_elems_of_size in
  drivers/of/address.c#of_dma_set_restricted_buffer.
- Apply Will's suggestion to try the OF node having DMA configuration in
  drivers/of/address.c#of_dma_set_restricted_buffer.
- Fix typo in the comment of drivers/of/address.c#of_dma_set_restricted_buffer.
- Add error message for PageHighMem in
  kernel/dma/swiotlb.c#rmem_swiotlb_device_init and move it to
  rmem_swiotlb_setup.
- Fix the message string in rmem_swiotlb_setup.
https://lore.kernel.org/patchwork/cover/1437112/

v7:
Fix debugfs, PageHighMem and comment style in rmem_swiotlb_device_init
https://lore.kernel.org/patchwork/cover/1431031/

v6:
Address the comments in v5
https://lore.kernel.org/patchwork/cover/1423201/

v5:
Rebase on latest linux-next
https://lore.kernel.org/patchwork/cover/1416899/

v4:
- Fix spinlock bad magic
- Use rmem->name for debugfs entry
- Address the comments in v3
https://lore.kernel.org/patchwork/cover/1378113/

v3:
Using only one reserved memory region for both streaming DMA and memory
allocation.
https://lore.kernel.org/patchwork/cover/1360992/

v2:
Building on top of swiotlb.
https://lore.kernel.org/patchwork/cover/1280705/

v1:
Using dma_map_ops.
https://lore.kernel.org/patchwork/cover/1271660/

Claire Chang (12):
  swiotlb: Refactor swiotlb init functions
  swiotlb: Refactor swiotlb_create_debugfs
  swiotlb: Set dev->dma_io_tlb_mem to the swiotlb pool used
  swiotlb: Update is_swiotlb_buffer to add a struct device argument
  swiotlb: Update is_swiotlb_active to add a struct device argument
  swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing
  swiotlb: Move alloc_size to swiotlb_find_slots
  swiotlb: Refactor swiotlb_tbl_unmap_single
  swiotlb: Add restricted DMA alloc/free support
  swiotlb: Add restricted DMA pool initialization