date:20210118

On Mon, Jan 18, 2021 at 01:16:03PM -0800, Rob Clark wrote:
> On Mon, Dec 21, 2020 at 4:44 PM Isaac J. Manjarres
>  wrote:
> >
> > The MSM DRM driver depends on the availability of the ARM LPAE io-pgtable
> > format code to work properly. In preparation for having the io-pgtable
> > formats as modules, add a "pre" dependency with MODULE_SOFTDEP() to
> > ensure that the io-pgtable-arm format module is loaded before loading
> > the MSM DRM driver module.
> >
> > Signed-off-by: Isaac J. Manjarres 
> 
> Thanks, I've queued this up locally

I don't plan to make the io-pgtable code modular, so please drop this patch.

https://lore.kernel.org/r/20210106123428.GA1798@willie-the-truck

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v2 5/7] drm/msm: Add dependency on io-pgtable-arm format module

2021-01-18 Thread Rob Clark

On Mon, Dec 21, 2020 at 4:44 PM Isaac J. Manjarres
 wrote:
>
> The MSM DRM driver depends on the availability of the ARM LPAE io-pgtable
> format code to work properly. In preparation for having the io-pgtable
> formats as modules, add a "pre" dependency with MODULE_SOFTDEP() to
> ensure that the io-pgtable-arm format module is loaded before loading
> the MSM DRM driver module.
>
> Signed-off-by: Isaac J. Manjarres 

Thanks, I've queued this up locally

BR,
-R

> ---
>  drivers/gpu/drm/msm/msm_drv.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
> index 535a026..8be3506 100644
> --- a/drivers/gpu/drm/msm/msm_drv.c
> +++ b/drivers/gpu/drm/msm/msm_drv.c
> @@ -1369,3 +1369,4 @@ module_exit(msm_drm_unregister);
>  MODULE_AUTHOR("Rob Clark   MODULE_DESCRIPTION("MSM DRM Driver");
>  MODULE_LICENSE("GPL");
> +MODULE_SOFTDEP("pre: io-pgtable-arm");
> --
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project
>
> ___
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: performance regression noted in v5.11-rc after c062db039f40

2021-01-18 Thread Chuck Lever




> On Jan 18, 2021, at 1:00 PM, Robin Murphy  wrote:
> 
> On 2021-01-18 16:18, Chuck Lever wrote:
>>> On Jan 12, 2021, at 9:38 AM, Will Deacon  wrote:
>>> 
>>> [Expanding cc list to include DMA-IOMMU and intel IOMMU folks]
>>> 
>>> On Fri, Jan 08, 2021 at 04:18:36PM -0500, Chuck Lever wrote:
 Hi-
 
 [ Please cc: me on replies, I'm not currently subscribed to
 iommu@lists ].
 
 I'm running NFS performance tests on InfiniBand using CX-3 Pro cards
 at 56Gb/s. The test is iozone on an NFSv3/RDMA mount:
 
 /home/cel/bin/iozone -M -+u -i0 -i1 -s1g -r256k -t12 -I
 
 For those not familiar with the way storage protocols use RDMA, The
 initiator/client sets up memory regions and the target/server uses
 RDMA Read and Write to move data out of and into those regions. The
 initiator/client uses only RDMA memory registration and invalidation
 operations, and the target/server uses RDMA Read and Write.
 
 My NFS client is a two-socket 12-core x86_64 system with its I/O MMU
 enabled using the kernel command line options "intel_iommu=on
 iommu=strict".
 
 Recently I've noticed a significant (25-30%) loss in NFS throughput.
 I was able to bisect on my client to the following commits.
 
 Here's 65f746e8285f ("iommu: Add quirk for Intel graphic devices in
 map_sg"). This is about normal for this test.
 
Children see throughput for 12 initial writers  = 4732581.09 kB/sec
Parent sees throughput for 12 initial writers   = 4646810.21 kB/sec
Min throughput per process  =  387764.34 kB/sec
Max throughput per process  =  399655.47 kB/sec
Avg throughput per process  =  394381.76 kB/sec
Min xfer= 1017344.00 kB
CPU Utilization: Wall time2.671CPU time1.974CPU 
 utilization  73.89 %
Children see throughput for 12 rewriters= 4837741.94 kB/sec
Parent sees throughput for 12 rewriters = 4833509.35 kB/sec
Min throughput per process  =  398983.72 kB/sec
Max throughput per process  =  406199.66 kB/sec
Avg throughput per process  =  403145.16 kB/sec
Min xfer= 1030656.00 kB
CPU utilization: Wall time2.584CPU time1.959CPU 
 utilization  75.82 %
Children see throughput for 12 readers  = 5921370.94 kB/sec
Parent sees throughput for 12 readers   = 5914106.69 kB/sec
Min throughput per process  =  491812.38 kB/sec
Max throughput per process  =  494777.28 kB/sec
Avg throughput per process  =  493447.58 kB/sec
Min xfer= 1042688.00 kB
CPU utilization: Wall time2.122CPU time1.968CPU 
 utilization  92.75 %
Children see throughput for 12 re-readers   = 5947985.69 kB/sec
Parent sees throughput for 12 re-readers= 5941348.51 kB/sec
Min throughput per process  =  492805.81 kB/sec
Max throughput per process  =  497280.19 kB/sec
Avg throughput per process  =  495665.47 kB/sec
Min xfer= 1039360.00 kB
CPU utilization: Wall time2.111CPU time1.968CPU 
 utilization  93.22 %
 
 Here's c062db039f40 ("iommu/vt-d: Update domain geometry in
 iommu_ops.at(de)tach_dev"). It's losing some steam here.
 
Children see throughput for 12 initial writers  = 4342419.12 kB/sec
Parent sees throughput for 12 initial writers   = 4310612.79 kB/sec
Min throughput per process  =  359299.06 kB/sec
Max throughput per process  =  363866.16 kB/sec
Avg throughput per process  =  361868.26 kB/sec
Min xfer= 1035520.00 kB
CPU Utilization: Wall time2.902CPU time1.951CPU 
 utilization  67.22 %
Children see throughput for 12 rewriters= 4408576.66 kB/sec
Parent sees throughput for 12 rewriters = 4404280.87 kB/sec
Min throughput per process  =  364553.88 kB/sec
Max throughput per process  =  370029.28 kB/sec
Avg throughput per process  =  367381.39 kB/sec
Min xfer= 1033216.00 kB
CPU utilization: Wall time2.836CPU time1.956CPU 
 utilization  68.97 %
Children see throughput for 12 readers  = 5406879.47 kB/sec
Parent sees throughput for 12 readers   =

dma_mmap_coherent() breakage with mem_encrypt on AMD Ryzen

2021-01-18 Thread Takashi Iwai

Hi,

we've got a bug report recently about the garbage playback sound from
a PCI sound device with mem_encrypt on AMD Ryzen:
   https://bugzilla.kernel.org/show_bug.cgi?id=27

The debug session showed that this happens only with the mmap using
dma_mmap_coherent() and mem_encrypt.  The mmap with the legacy page
fault (as done in ALSA PCM core) seems still working even with the
mem_encrypt.

Since the problem could be observed on two different PCI drivers, it
looks like a generic problem of DMA-mmap implementation with AMD
memory encryption.

According to the reporter, it's seen on both 5.4.x and 5.10.x
kernels, hence it doesn't like a recent regression.

Can anyone take a look?


Thanks!

Takashi
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v4 7/7] iommu/mediatek: Remove the tlb-ops for v7s


On 2021-01-07 12:29, Yong Wu wrote:

Until now, we have already used the tlb operations from iommu framework,
then the tlb operations for v7s can be removed.

Correspondingly, Switch the paramenter "cookie" to the internal structure.


Reviewed-by: Robin Murphy 


Signed-off-by: Yong Wu 
---
  drivers/iommu/mtk_iommu.c | 27 ---
  1 file changed, 4 insertions(+), 23 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index d3b8a1649093..86ab577c9520 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -182,10 +182,8 @@ static struct mtk_iommu_domain *to_mtk_domain(struct 
iommu_domain *dom)
return container_of(dom, struct mtk_iommu_domain, domain);
  }
  
-static void mtk_iommu_tlb_flush_all(void *cookie)

+static void mtk_iommu_tlb_flush_all(struct mtk_iommu_data *data)
  {
-   struct mtk_iommu_data *data = cookie;
-
for_each_m4u(data) {
writel_relaxed(F_INVLD_EN1 | F_INVLD_EN0,
   data->base + data->plat_data->inv_sel_reg);
@@ -195,9 +193,9 @@ static void mtk_iommu_tlb_flush_all(void *cookie)
  }
  
  static void mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size,

-  size_t granule, void *cookie)
+  size_t granule,
+  struct mtk_iommu_data *data)
  {
-   struct mtk_iommu_data *data = cookie;
unsigned long flags;
int ret;
u32 tmp;
@@ -219,7 +217,7 @@ static void mtk_iommu_tlb_flush_range_sync(unsigned long 
iova, size_t size,
if (ret) {
dev_warn(data->dev,
 "Partial TLB flush timed out, falling back to full 
flush\n");
-   mtk_iommu_tlb_flush_all(cookie);
+   mtk_iommu_tlb_flush_all(data);
}
/* Clear the CPE status */
writel_relaxed(0, data->base + REG_MMU_CPE_DONE);
@@ -227,22 +225,6 @@ static void mtk_iommu_tlb_flush_range_sync(unsigned long 
iova, size_t size,
}
  }
  
-static void mtk_iommu_tlb_flush_page_nosync(struct iommu_iotlb_gather *gather,

-   unsigned long iova, size_t granule,
-   void *cookie)
-{
-   struct mtk_iommu_data *data = cookie;
-   struct iommu_domain *domain = >m4u_dom->domain;
-
-   iommu_iotlb_gather_add_page(domain, gather, iova, granule);
-}
-
-static const struct iommu_flush_ops mtk_iommu_flush_ops = {
-   .tlb_flush_all = mtk_iommu_tlb_flush_all,
-   .tlb_flush_walk = mtk_iommu_tlb_flush_range_sync,
-   .tlb_add_page = mtk_iommu_tlb_flush_page_nosync,
-};
-
  static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
  {
struct mtk_iommu_data *data = dev_id;
@@ -326,7 +308,6 @@ static int mtk_iommu_domain_finalise(struct 
mtk_iommu_domain *dom)
.pgsize_bitmap = mtk_iommu_ops.pgsize_bitmap,
.ias = 32,
.oas = 34,
-   .tlb = _iommu_flush_ops,
.iommu_dev = data->dev,
};
  


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v4 6/7] iommu/mediatek: Gather iova in iommu_unmap to achieve tlb sync once


On 2021-01-07 12:29, Yong Wu wrote:

In current iommu_unmap, this code is:

iommu_iotlb_gather_init(_gather);
ret = __iommu_unmap(domain, iova, size, _gather);
iommu_iotlb_sync(domain, _gather);

We could gather the whole iova range in __iommu_unmap, and then do tlb
synchronization in the iommu_iotlb_sync.

This patch implement this, Gather the range in mtk_iommu_unmap.
then iommu_iotlb_sync call tlb synchronization for the gathered iova range.
we don't call iommu_iotlb_gather_add_page since our tlb synchronization
could be regardless of granule size.

In this way, gather->start is impossible ULONG_MAX, remove the checking.

This patch aims to do tlb synchronization *once* in the iommu_unmap.


Reviewed-by: Robin Murphy 


Signed-off-by: Yong Wu 
---
  drivers/iommu/mtk_iommu.c | 8 +---
  1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 66a00a2cb445..d3b8a1649093 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -430,7 +430,12 @@ static size_t mtk_iommu_unmap(struct iommu_domain *domain,
  struct iommu_iotlb_gather *gather)
  {
struct mtk_iommu_domain *dom = to_mtk_domain(domain);
+   unsigned long end = iova + size - 1;
  
+	if (gather->start > iova)

+   gather->start = iova;
+   if (gather->end < end)
+   gather->end = end;
return dom->iop->unmap(dom->iop, iova, size, gather);
  }
  
@@ -445,9 +450,6 @@ static void mtk_iommu_iotlb_sync(struct iommu_domain *domain,

struct mtk_iommu_data *data = mtk_iommu_get_m4u_data();
size_t length = gather->end - gather->start + 1;
  
-	if (gather->start == ULONG_MAX)

-   return;
-
mtk_iommu_tlb_flush_range_sync(gather->start, length, gather->pgsize,
   data);
  }


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v4 5/7] iommu/io-pgtable: Allow io_pgtable_tlb ops optional


On 2021-01-07 12:29, Yong Wu wrote:

This patch allows io_pgtable_tlb ops could be null since the IOMMU drivers
may use the tlb ops from iommu framework.


For the reasons I gave on v3,

Reviewed-by: Robin Murphy 


Signed-off-by: Yong Wu 
---
  include/linux/io-pgtable.h | 8 +---
  1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index ea727eb1a1a9..2a5686ca2ba3 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -214,14 +214,16 @@ struct io_pgtable_domain_attr {
  
  static inline void io_pgtable_tlb_flush_all(struct io_pgtable *iop)

  {
-   iop->cfg.tlb->tlb_flush_all(iop->cookie);
+   if (iop->cfg.tlb && iop->cfg.tlb->tlb_flush_all)
+   iop->cfg.tlb->tlb_flush_all(iop->cookie);
  }
  
  static inline void

  io_pgtable_tlb_flush_walk(struct io_pgtable *iop, unsigned long iova,
  size_t size, size_t granule)
  {
-   iop->cfg.tlb->tlb_flush_walk(iova, size, granule, iop->cookie);
+   if (iop->cfg.tlb && iop->cfg.tlb->tlb_flush_walk)
+   iop->cfg.tlb->tlb_flush_walk(iova, size, granule, iop->cookie);
  }
  
  static inline void

@@ -229,7 +231,7 @@ io_pgtable_tlb_add_page(struct io_pgtable *iop,
struct iommu_iotlb_gather * gather, unsigned long iova,
size_t granule)
  {
-   if (iop->cfg.tlb->tlb_add_page)
+   if (iop->cfg.tlb && iop->cfg.tlb->tlb_add_page)
iop->cfg.tlb->tlb_add_page(gather, iova, granule, iop->cookie);
  }
  


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v4 4/7] iommu: Switch gather->end to the inclusive end


On 2021-01-07 12:29, Yong Wu wrote:

Currently gather->end is "unsigned long" which may be overflow in
arch32 in the corner case: 0xfff0 + 0x10(iova + size).
Although it doesn't affect the size(end - start), it affects the checking
"gather->end < end"

This patch changes this "end" to the real end address
(end = start + size - 1). Correspondingly, update the length to
"end - start + 1".

Fixes: a7d20dc19d9e ("iommu: Introduce struct iommu_iotlb_gather for batching TLB 
flushes")
Signed-off-by: Yong Wu 
---
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 2 +-
  drivers/iommu/mtk_iommu.c   | 2 +-
  drivers/iommu/tegra-gart.c  | 2 +-
  include/linux/iommu.h   | 4 ++--
  4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 8ca7415d785d..c70d6e79f534 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2280,7 +2280,7 @@ static void arm_smmu_iotlb_sync(struct iommu_domain 
*domain,
  {
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
  
-	arm_smmu_tlb_inv_range(gather->start, gather->end - gather->start,

+   arm_smmu_tlb_inv_range(gather->start, gather->end - gather->start + 1,
   gather->pgsize, true, smmu_domain);
  }
  
diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c

index f579fc21971e..66a00a2cb445 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -443,7 +443,7 @@ static void mtk_iommu_iotlb_sync(struct iommu_domain 
*domain,
 struct iommu_iotlb_gather *gather)
  {
struct mtk_iommu_data *data = mtk_iommu_get_m4u_data();
-   size_t length = gather->end - gather->start;
+   size_t length = gather->end - gather->start + 1;
  
  	if (gather->start == ULONG_MAX)

return;
diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
index 05e8e19b8269..6f130e51f072 100644
--- a/drivers/iommu/tegra-gart.c
+++ b/drivers/iommu/tegra-gart.c
@@ -270,7 +270,7 @@ static void gart_iommu_sync_map(struct iommu_domain 
*domain, unsigned long iova,
  static void gart_iommu_sync(struct iommu_domain *domain,
struct iommu_iotlb_gather *gather)
  {
-   size_t length = gather->end - gather->start;
+   size_t length = gather->end - gather->start + 1;


TBH I don't think there's any need to bother doing precise calculations 
on effectively-uninitialised data (this driver doesn't even do 
address-based invalidation, let alone use the gather mechanism). In fact 
it might make sense to flip things around and define gart_iommu_sync_map 
in terms of gart_iommu_sync now just so there's one less unused argument 
to make up. However we can always do cleanup on top, and right now I'm 
more interested in getting these changes landed, so either way,


Reviewed-by: Robin Murphy 


gart_iommu_sync_map(domain, gather->start, length);
  }
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 9ce0aa9e236b..ae8eddd4621b 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -170,7 +170,7 @@ enum iommu_dev_features {
   * struct iommu_iotlb_gather - Range information for a pending IOTLB flush
   *
   * @start: IOVA representing the start of the range to be flushed
- * @end: IOVA representing the end of the range to be flushed (exclusive)
+ * @end: IOVA representing the end of the range to be flushed (inclusive)
   * @pgsize: The interval at which to perform the flush
   *
   * This structure is intended to be updated by multiple calls to the
@@ -539,7 +539,7 @@ static inline void iommu_iotlb_gather_add_page(struct 
iommu_domain *domain,
   struct iommu_iotlb_gather 
*gather,
   unsigned long iova, size_t size)
  {
-   unsigned long start = iova, end = start + size;
+   unsigned long start = iova, end = start + size - 1;
  
  	/*

 * If the new page is disjoint from the current range or is mapped at


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: performance regression noted in v5.11-rc after c062db039f40


On 2021-01-18 16:18, Chuck Lever wrote:




On Jan 12, 2021, at 9:38 AM, Will Deacon  wrote:

[Expanding cc list to include DMA-IOMMU and intel IOMMU folks]

On Fri, Jan 08, 2021 at 04:18:36PM -0500, Chuck Lever wrote:

Hi-

[ Please cc: me on replies, I'm not currently subscribed to
iommu@lists ].

I'm running NFS performance tests on InfiniBand using CX-3 Pro cards
at 56Gb/s. The test is iozone on an NFSv3/RDMA mount:

/home/cel/bin/iozone -M -+u -i0 -i1 -s1g -r256k -t12 -I

For those not familiar with the way storage protocols use RDMA, The
initiator/client sets up memory regions and the target/server uses
RDMA Read and Write to move data out of and into those regions. The
initiator/client uses only RDMA memory registration and invalidation
operations, and the target/server uses RDMA Read and Write.

My NFS client is a two-socket 12-core x86_64 system with its I/O MMU
enabled using the kernel command line options "intel_iommu=on
iommu=strict".

Recently I've noticed a significant (25-30%) loss in NFS throughput.
I was able to bisect on my client to the following commits.

Here's 65f746e8285f ("iommu: Add quirk for Intel graphic devices in
map_sg"). This is about normal for this test.

Children see throughput for 12 initial writers  = 4732581.09 kB/sec
Parent sees throughput for 12 initial writers   = 4646810.21 kB/sec
Min throughput per process  =  387764.34 kB/sec
Max throughput per process  =  399655.47 kB/sec
Avg throughput per process  =  394381.76 kB/sec
Min xfer= 1017344.00 kB
CPU Utilization: Wall time2.671CPU time1.974CPU 
utilization  73.89 %
Children see throughput for 12 rewriters= 4837741.94 kB/sec
Parent sees throughput for 12 rewriters = 4833509.35 kB/sec
Min throughput per process  =  398983.72 kB/sec
Max throughput per process  =  406199.66 kB/sec
Avg throughput per process  =  403145.16 kB/sec
Min xfer= 1030656.00 kB
CPU utilization: Wall time2.584CPU time1.959CPU 
utilization  75.82 %
Children see throughput for 12 readers  = 5921370.94 kB/sec
Parent sees throughput for 12 readers   = 5914106.69 kB/sec
Min throughput per process  =  491812.38 kB/sec
Max throughput per process  =  494777.28 kB/sec
Avg throughput per process  =  493447.58 kB/sec
Min xfer= 1042688.00 kB
CPU utilization: Wall time2.122CPU time1.968CPU 
utilization  92.75 %
Children see throughput for 12 re-readers   = 5947985.69 kB/sec
Parent sees throughput for 12 re-readers= 5941348.51 kB/sec
Min throughput per process  =  492805.81 kB/sec
Max throughput per process  =  497280.19 kB/sec
Avg throughput per process  =  495665.47 kB/sec
Min xfer= 1039360.00 kB
CPU utilization: Wall time2.111CPU time1.968CPU 
utilization  93.22 %

Here's c062db039f40 ("iommu/vt-d: Update domain geometry in
iommu_ops.at(de)tach_dev"). It's losing some steam here.

Children see throughput for 12 initial writers  = 4342419.12 kB/sec
Parent sees throughput for 12 initial writers   = 4310612.79 kB/sec
Min throughput per process  =  359299.06 kB/sec
Max throughput per process  =  363866.16 kB/sec
Avg throughput per process  =  361868.26 kB/sec
Min xfer= 1035520.00 kB
CPU Utilization: Wall time2.902CPU time1.951CPU 
utilization  67.22 %
Children see throughput for 12 rewriters= 4408576.66 kB/sec
Parent sees throughput for 12 rewriters = 4404280.87 kB/sec
Min throughput per process  =  364553.88 kB/sec
Max throughput per process  =  370029.28 kB/sec
Avg throughput per process  =  367381.39 kB/sec
Min xfer= 1033216.00 kB
CPU utilization: Wall time2.836CPU time1.956CPU 
utilization  68.97 %
Children see throughput for 12 readers  = 5406879.47 kB/sec
Parent sees throughput for 12 readers   = 5401862.78 kB/sec
Min throughput per process  =  449583.03 kB/sec
Max throughput per process  =  451761.69 kB/sec
Avg throughput per process  =  450573.29 kB/sec
Min

Re: [PATCH v3 6/7] iommu/mediatek: Gather iova in iommu_unmap to achieve tlb sync once


On 2021-01-18 16:58, Will Deacon wrote:

On Mon, Jan 18, 2021 at 04:35:22PM +, Robin Murphy wrote:

On 2020-12-16 10:36, Yong Wu wrote:

In current iommu_unmap, this code is:

iommu_iotlb_gather_init(_gather);
ret = __iommu_unmap(domain, iova, size, _gather);
iommu_iotlb_sync(domain, _gather);

We could gather the whole iova range in __iommu_unmap, and then do tlb
synchronization in the iommu_iotlb_sync.

This patch implement this, Gather the range in mtk_iommu_unmap.
then iommu_iotlb_sync call tlb synchronization for the gathered iova range.
we don't call iommu_iotlb_gather_add_page since our tlb synchronization
could be regardless of granule size.

In this way, gather->start is impossible ULONG_MAX, remove the checking.

This patch aims to do tlb synchronization *once* in the iommu_unmap.


Assuming the update to patch #4 simply results in "unsigned long end = iova
+ size - 1;" here,

Reviewed-by: Robin Murphy 


There's a v4 here:

https://lore.kernel.org/r/20210107122909.16317-1-yong...@mediatek.com


Ha, so there is! Apparently I missed that in my post-holiday sweep last 
week and leant too heavily on the inbox-in-date-order assumption. Lemme 
just go catch up...


Thanks,
Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v3 6/7] iommu/mediatek: Gather iova in iommu_unmap to achieve tlb sync once

On Mon, Jan 18, 2021 at 04:35:22PM +, Robin Murphy wrote:
> On 2020-12-16 10:36, Yong Wu wrote:
> > In current iommu_unmap, this code is:
> > 
> > iommu_iotlb_gather_init(_gather);
> > ret = __iommu_unmap(domain, iova, size, _gather);
> > iommu_iotlb_sync(domain, _gather);
> > 
> > We could gather the whole iova range in __iommu_unmap, and then do tlb
> > synchronization in the iommu_iotlb_sync.
> > 
> > This patch implement this, Gather the range in mtk_iommu_unmap.
> > then iommu_iotlb_sync call tlb synchronization for the gathered iova range.
> > we don't call iommu_iotlb_gather_add_page since our tlb synchronization
> > could be regardless of granule size.
> > 
> > In this way, gather->start is impossible ULONG_MAX, remove the checking.
> > 
> > This patch aims to do tlb synchronization *once* in the iommu_unmap.
> 
> Assuming the update to patch #4 simply results in "unsigned long end = iova
> + size - 1;" here,
> 
> Reviewed-by: Robin Murphy 

There's a v4 here:

https://lore.kernel.org/r/20210107122909.16317-1-yong...@mediatek.com

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v3 7/7] iommu/mediatek: Remove the tlb-ops for v7s


On 2020-12-16 10:36, Yong Wu wrote:

Until now, we have already used the tlb operations from iommu framework,
then the tlb operations for v7s can be removed.

Correspondingly, Switch the paramenter "cookie" to internal structure.


FWIW,

Reviewed-by: Robin Murphy 


Signed-off-by: Yong Wu 
---
  drivers/iommu/mtk_iommu.c | 27 ---
  1 file changed, 4 insertions(+), 23 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 89cec51405cd..5656819cd4a1 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -206,10 +206,8 @@ static struct mtk_iommu_domain *to_mtk_domain(struct 
iommu_domain *dom)
return container_of(dom, struct mtk_iommu_domain, domain);
  }
  
-static void mtk_iommu_tlb_flush_all(void *cookie)

+static void mtk_iommu_tlb_flush_all(struct mtk_iommu_data *data)
  {
-   struct mtk_iommu_data *data = cookie;
-
for_each_m4u(data) {
if (!pm_runtime_active(data->dev))
continue;
@@ -221,9 +219,9 @@ static void mtk_iommu_tlb_flush_all(void *cookie)
  }
  
  static void mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size,

-  size_t granule, void *cookie)
+  size_t granule,
+  struct mtk_iommu_data *data)
  {
-   struct mtk_iommu_data *data = cookie;
unsigned long flags;
int ret;
u32 tmp;
@@ -250,7 +248,7 @@ static void mtk_iommu_tlb_flush_range_sync(unsigned long 
iova, size_t size,
if (ret) {
dev_warn(data->dev,
 "Partial TLB flush timed out, falling back to full 
flush\n");
-   mtk_iommu_tlb_flush_all(cookie);
+   mtk_iommu_tlb_flush_all(data);
}
/* Clear the CPE status */
writel_relaxed(0, data->base + REG_MMU_CPE_DONE);
@@ -258,22 +256,6 @@ static void mtk_iommu_tlb_flush_range_sync(unsigned long 
iova, size_t size,
}
  }
  
-static void mtk_iommu_tlb_flush_page_nosync(struct iommu_iotlb_gather *gather,

-   unsigned long iova, size_t granule,
-   void *cookie)
-{
-   struct mtk_iommu_data *data = cookie;
-   struct iommu_domain *domain = >m4u_dom->domain;
-
-   iommu_iotlb_gather_add_page(domain, gather, iova, granule);
-}
-
-static const struct iommu_flush_ops mtk_iommu_flush_ops = {
-   .tlb_flush_all = mtk_iommu_tlb_flush_all,
-   .tlb_flush_walk = mtk_iommu_tlb_flush_range_sync,
-   .tlb_add_page = mtk_iommu_tlb_flush_page_nosync,
-};
-
  static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
  {
struct mtk_iommu_data *data = dev_id;
@@ -380,7 +362,6 @@ static int mtk_iommu_domain_finalise(struct 
mtk_iommu_domain *dom)
.pgsize_bitmap = mtk_iommu_ops.pgsize_bitmap,
.ias = MTK_IOMMU_HAS_FLAG(data->plat_data, IOVA_34_EN) ? 34 : 
32,
.oas = 35,
-   .tlb = _iommu_flush_ops,
.iommu_dev = data->dev,
};
  


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v3 6/7] iommu/mediatek: Gather iova in iommu_unmap to achieve tlb sync once


On 2020-12-16 10:36, Yong Wu wrote:

In current iommu_unmap, this code is:

iommu_iotlb_gather_init(_gather);
ret = __iommu_unmap(domain, iova, size, _gather);
iommu_iotlb_sync(domain, _gather);

We could gather the whole iova range in __iommu_unmap, and then do tlb
synchronization in the iommu_iotlb_sync.

This patch implement this, Gather the range in mtk_iommu_unmap.
then iommu_iotlb_sync call tlb synchronization for the gathered iova range.
we don't call iommu_iotlb_gather_add_page since our tlb synchronization
could be regardless of granule size.

In this way, gather->start is impossible ULONG_MAX, remove the checking.

This patch aims to do tlb synchronization *once* in the iommu_unmap.


Assuming the update to patch #4 simply results in "unsigned long end = 
iova + size - 1;" here,


Reviewed-by: Robin Murphy 


Signed-off-by: Yong Wu 
---
  drivers/iommu/mtk_iommu.c | 8 +---
  1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index db7d43adb06b..89cec51405cd 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -506,7 +506,12 @@ static size_t mtk_iommu_unmap(struct iommu_domain *domain,
  struct iommu_iotlb_gather *gather)
  {
struct mtk_iommu_domain *dom = to_mtk_domain(domain);
+   unsigned long long end = iova + size;
  
+	if (gather->start > iova)

+   gather->start = iova;
+   if (gather->end < end)
+   gather->end = end;
return dom->iop->unmap(dom->iop, iova, size, gather);
  }
  
@@ -523,9 +528,6 @@ static void mtk_iommu_iotlb_sync(struct iommu_domain *domain,

struct mtk_iommu_domain *dom = to_mtk_domain(domain);
size_t length = gather->end - gather->start;
  
-	if (gather->start == ULONG_MAX)

-   return;
-
mtk_iommu_tlb_flush_range_sync(gather->start, length, gather->pgsize,
   dom->data);
  }


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: performance regression noted in v5.11-rc after c062db039f40

2021-01-18 Thread Chuck Lever




> On Jan 12, 2021, at 9:38 AM, Will Deacon  wrote:
> 
> [Expanding cc list to include DMA-IOMMU and intel IOMMU folks]
> 
> On Fri, Jan 08, 2021 at 04:18:36PM -0500, Chuck Lever wrote:
>> Hi-
>> 
>> [ Please cc: me on replies, I'm not currently subscribed to
>> iommu@lists ].
>> 
>> I'm running NFS performance tests on InfiniBand using CX-3 Pro cards
>> at 56Gb/s. The test is iozone on an NFSv3/RDMA mount:
>> 
>> /home/cel/bin/iozone -M -+u -i0 -i1 -s1g -r256k -t12 -I
>> 
>> For those not familiar with the way storage protocols use RDMA, The
>> initiator/client sets up memory regions and the target/server uses
>> RDMA Read and Write to move data out of and into those regions. The
>> initiator/client uses only RDMA memory registration and invalidation
>> operations, and the target/server uses RDMA Read and Write.
>> 
>> My NFS client is a two-socket 12-core x86_64 system with its I/O MMU
>> enabled using the kernel command line options "intel_iommu=on
>> iommu=strict".
>> 
>> Recently I've noticed a significant (25-30%) loss in NFS throughput.
>> I was able to bisect on my client to the following commits.
>> 
>> Here's 65f746e8285f ("iommu: Add quirk for Intel graphic devices in
>> map_sg"). This is about normal for this test.
>> 
>>  Children see throughput for 12 initial writers  = 4732581.09 kB/sec
>>  Parent sees throughput for 12 initial writers   = 4646810.21 kB/sec
>>  Min throughput per process  =  387764.34 kB/sec
>>  Max throughput per process  =  399655.47 kB/sec
>>  Avg throughput per process  =  394381.76 kB/sec
>>  Min xfer= 1017344.00 kB
>>  CPU Utilization: Wall time2.671CPU time1.974CPU 
>> utilization  73.89 %
>>  Children see throughput for 12 rewriters= 4837741.94 kB/sec
>>  Parent sees throughput for 12 rewriters = 4833509.35 kB/sec
>>  Min throughput per process  =  398983.72 kB/sec
>>  Max throughput per process  =  406199.66 kB/sec
>>  Avg throughput per process  =  403145.16 kB/sec
>>  Min xfer= 1030656.00 kB
>>  CPU utilization: Wall time2.584CPU time1.959CPU 
>> utilization  75.82 %
>>  Children see throughput for 12 readers  = 5921370.94 kB/sec
>>  Parent sees throughput for 12 readers   = 5914106.69 kB/sec
>>  Min throughput per process  =  491812.38 kB/sec
>>  Max throughput per process  =  494777.28 kB/sec
>>  Avg throughput per process  =  493447.58 kB/sec
>>  Min xfer= 1042688.00 kB
>>  CPU utilization: Wall time2.122CPU time1.968CPU 
>> utilization  92.75 %
>>  Children see throughput for 12 re-readers   = 5947985.69 kB/sec
>>  Parent sees throughput for 12 re-readers= 5941348.51 kB/sec
>>  Min throughput per process  =  492805.81 kB/sec
>>  Max throughput per process  =  497280.19 kB/sec
>>  Avg throughput per process  =  495665.47 kB/sec
>>  Min xfer= 1039360.00 kB
>>  CPU utilization: Wall time2.111CPU time1.968CPU 
>> utilization  93.22 %
>> 
>> Here's c062db039f40 ("iommu/vt-d: Update domain geometry in
>> iommu_ops.at(de)tach_dev"). It's losing some steam here.
>> 
>>  Children see throughput for 12 initial writers  = 4342419.12 kB/sec
>>  Parent sees throughput for 12 initial writers   = 4310612.79 kB/sec
>>  Min throughput per process  =  359299.06 kB/sec
>>  Max throughput per process  =  363866.16 kB/sec
>>  Avg throughput per process  =  361868.26 kB/sec
>>  Min xfer= 1035520.00 kB
>>  CPU Utilization: Wall time2.902CPU time1.951CPU 
>> utilization  67.22 %
>>  Children see throughput for 12 rewriters= 4408576.66 kB/sec
>>  Parent sees throughput for 12 rewriters = 4404280.87 kB/sec
>>  Min throughput per process  =  364553.88 kB/sec
>>  Max throughput per process  =  370029.28 kB/sec
>>  Avg throughput per process  =  367381.39 kB/sec
>>  Min xfer= 1033216.00 kB
>>  CPU utilization: Wall time2.836CPU time1.956CPU 
>> utilization  68.97 %
>>  Children see throughput for 12 readers  = 5406879.47 kB/sec
>>  Parent sees throughput for 12 readers   = 5401862.78 kB/sec
>>  Min throughput per process  =  449583.03 kB/sec
>>  Max throughput per process  =  451761.69 kB/sec
>>  Avg

Re: [PATCH v3 5/7] iommu: Allow io_pgtable_tlb ops optional


On 2020-12-16 10:36, Yong Wu wrote:

This patch allows io_pgtable_tlb ops could be null since the IOMMU drivers
may use the tlb ops from iommu framework.


There's not much in it, but I guess this does make more sense overall 
than just making .tlb_flush_all optional and drivers having to provide a 
full set of NULL callbacks.


Reviewed-by: Robin Murphy 


Signed-off-by: Yong Wu 
---
  include/linux/io-pgtable.h | 8 +---
  1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index adde9e49be08..c81796814afa 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -206,14 +206,16 @@ struct io_pgtable {
  
  static inline void io_pgtable_tlb_flush_all(struct io_pgtable *iop)

  {
-   iop->cfg.tlb->tlb_flush_all(iop->cookie);
+   if (iop->cfg.tlb && iop->cfg.tlb->tlb_flush_all)
+   iop->cfg.tlb->tlb_flush_all(iop->cookie);
  }
  
  static inline void

  io_pgtable_tlb_flush_walk(struct io_pgtable *iop, unsigned long iova,
  size_t size, size_t granule)
  {
-   iop->cfg.tlb->tlb_flush_walk(iova, size, granule, iop->cookie);
+   if (iop->cfg.tlb && iop->cfg.tlb->tlb_flush_walk)
+   iop->cfg.tlb->tlb_flush_walk(iova, size, granule, iop->cookie);
  }
  
  static inline void

@@ -221,7 +223,7 @@ io_pgtable_tlb_add_page(struct io_pgtable *iop,
struct iommu_iotlb_gather * gather, unsigned long iova,
size_t granule)
  {
-   if (iop->cfg.tlb->tlb_add_page)
+   if (iop->cfg.tlb && iop->cfg.tlb->tlb_add_page)
iop->cfg.tlb->tlb_add_page(gather, iova, granule, iop->cookie);
  }
  


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v6 06/33] of/device: Move dma_range_map before of_iommu_configure


On 2021-01-15 05:30, Yong Wu wrote:

On Thu, 2021-01-14 at 13:27 -0600, Rob Herring wrote:

On Mon, Jan 11, 2021 at 07:18:47PM +0800, Yong Wu wrote:

"dev->dma_range_map" contains the devices' dma_ranges information,
This patch moves dma_range_map before of_iommu_configure. The iommu
driver may need to know the dma_address requirements of its iommu
consumer devices.

CC: Rob Herring 
CC: Frank Rowand 
Signed-off-by: Yong Wu 
---
  drivers/of/device.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/of/device.c b/drivers/of/device.c
index aedfaaafd3e7..1d84636149df 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -170,9 +170,11 @@ int of_dma_configure_id(struct device *dev, struct 
device_node *np,
dev_dbg(dev, "device is%sdma coherent\n",
coherent ? " " : " not ");
  
+	dev->dma_range_map = map;

iommu = of_iommu_configure(dev, np, id);
if (PTR_ERR(iommu) == -EPROBE_DEFER) {
kfree(map);
+   dev->dma_range_map = NULL;


Not really going to matter, but you should probably clear dma_range_map
before what it points to is freed.

With that,

Reviewed-by: Rob Herring 


Thanks for the review. I will move it before "kfree(map)" in next
version.


Paul noticed that we already have a bug in assigning to this 
unconditionally[1] - I'd totally forgotten about this series when I 
theorised about IOMMU drivers wanting the information earlier, but 
sweeping my inbox now only goes to show I was right to think of it :)


We should really get something in as a fix independent of this series, 
taking both angles into account.


Robin.

[1] 
https://lore.kernel.org/linux-arm-kernel/5c7946f3-b56e-da00-a750-be097c7ce...@arm.com/





return -EPROBE_DEFER;
}
  
@@ -181,7 +183,6 @@ int of_dma_configure_id(struct device *dev, struct device_node *np,
  
  	arch_setup_dma_ops(dev, dma_start, size, iommu, coherent);
  
-	dev->dma_range_map = map;

return 0;
  }
  EXPORT_SYMBOL_GPL(of_dma_configure_id);
--
2.18.0



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH] swiotlb: Validate bounce size in the sync/unmap path

2021-01-18 Thread Konrad Rzeszutek Wilk

On Mon, Jan 18, 2021 at 12:44:58PM +0100, Martin Radev wrote:
> On Wed, Jan 13, 2021 at 12:30:17PM +0100, Christoph Hellwig wrote:
> > On Tue, Jan 12, 2021 at 04:07:29PM +0100, Martin Radev wrote:
> > > The size of the buffer being bounced is not checked if it happens
> > > to be larger than the size of the mapped buffer. Because the size
> > > can be controlled by a device, as it's the case with virtio devices,
> > > this can lead to memory corruption.
> > > 
> > 
> > I'm really worried about all these hodge podge hacks for not trusted
> > hypervisors in the I/O stack.  Instead of trying to harden protocols
> > that are fundamentally not designed for this, how about instead coming
> > up with a new paravirtualized I/O interface that is specifically
> > designed for use with an untrusted hypervisor from the start?
> 
> Your comment makes sense but then that would require the cooperation
> of these vendors and the cloud providers to agree on something meaningful.
> I am also not sure whether the end result would be better than hardening
> this interface to catch corruption. There is already some validation in
> unmap path anyway.
> 
> Another possibility is to move this hardening to the common virtio code,
> but I think the code may become more complicated there since it would
> require tracking both the dma_addr and length for each descriptor.

Christoph,

I've been wrestling with the same thing - this is specific to busted
drivers. And in reality you could do the same thing with a hardware
virtio device (see example in http://thunderclap.io/) - where the
mitigation is 'enable the IOMMU to do its job.'.

AMD SEV documents speak about utilizing IOMMU to do this (AMD SEV-SNP)..
and while that is great in the future, SEV without IOMMU is now here.

Doing a full circle here, this issue can be exploited with virtio
but you could say do that with real hardware too if you hacked the
firmware, so if you say used Intel SR-IOV NIC that was compromised
on an AMD SEV machine, and plumbed in the guest - the IOMMU inside
of the guest would be SWIOTLB code. Last line of defense against
bad firmware to say.

As such I am leaning towards taking this code, but I am worried
about the performance hit .. but perhaps I shouldn't as if you
are using SWIOTLB=force already you are kind of taking a
performance hit?

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v4 2/3] iommu/iova: Avoid double-negatives in magazine helpers


On 18/01/2021 12:59, Robin Murphy wrote:

for cpu_rcaches too, and get a similar abort at runtime.

It's not specifically that we expect them (allocation failures for the
loaded magazine), rather we should make safe against it.

So could you be more specific in your concern for the cpu_rcache 
failure?

cpu_rcache magazine assignment comes from this logic.

If this fails:

drivers/iommu/iova.c:847: rcache->cpu_rcaches = 
__alloc_percpu(sizeof(*cpu_rcache), cache_line_size());


then we'll get an Oops in __iova_rcache_get(). So if we're making the
module safer against magazine allocation failure, shouldn't we also
protect against cpu_rcaches allocation failure?


Ah, gotcha. So we have the WARN there, but that's not much use as this 
would still crash, as you say.


So maybe we can embed the cpu rcaches in iova_domain struct, to avoid 
the separate (failable) cpu rcache allocation.


Is that even possible? The size of percpu data isn't known at compile 
time, so at best it would add ugly runtime complexity to any allocation 
of a struct iova_domain by itself, but worse than that it means that 
embedding iova_domain in any other structure becomes completely broken, no?


Ah, now I see that it's not possible. I was thinking of using 
DEFINE_PER_CPU(), but it's not permitted.


So even though this patch saves us from cpu_rcache->loaded / ->prev == 
NULL, I still prefer not to add explicit checks for cpu_rcache == NULL 
in the IOVA alloc/free paths, and would rather pass an error back in 
init_iova_rcaches(), but adding code for tidy-up looks messy.


Thanks,
John
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 6/6] vfio/iommu_type1: Drop parameter "pgsize" of update_user_bitmap




On 2021/1/16 7:44, Alex Williamson wrote:
> On Thu, 7 Jan 2021 12:44:01 +0800
> Keqian Zhu  wrote:
> 
>> We always use the smallest supported page size of vfio_iommu as
>> pgsize. Drop parameter "pgsize" of update_user_bitmap.
>>
>> Signed-off-by: Keqian Zhu 
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 9 -
>>  1 file changed, 4 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c 
>> b/drivers/vfio/vfio_iommu_type1.c
>> index 82649a040148..bceda5e8baaa 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -978,10 +978,9 @@ static void vfio_update_pgsize_bitmap(struct vfio_iommu 
>> *iommu)
>>  }
>>  
>>  static int update_user_bitmap(u64 __user *bitmap, struct vfio_iommu *iommu,
>> -  struct vfio_dma *dma, dma_addr_t base_iova,
>> -  size_t pgsize)
>> +  struct vfio_dma *dma, dma_addr_t base_iova)
>>  {
>> -unsigned long pgshift = __ffs(pgsize);
>> +unsigned long pgshift = __ffs(iommu->pgsize_bitmap);
>>  unsigned long nbits = dma->size >> pgshift;
>>  unsigned long bit_offset = (dma->iova - base_iova) >> pgshift;
>>  unsigned long copy_offset = bit_offset / BITS_PER_LONG;
>> @@ -1046,7 +1045,7 @@ static int vfio_iova_dirty_bitmap(u64 __user *bitmap, 
>> struct vfio_iommu *iommu,
>>  if (dma->iova > iova + size - 1)
>>  break;
>>  
>> -ret = update_user_bitmap(bitmap, iommu, dma, iova, pgsize);
>> +ret = update_user_bitmap(bitmap, iommu, dma, iova);
>>  if (ret)
>>  return ret;
>>  
>> @@ -1192,7 +1191,7 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>>  
>>  if (unmap->flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) {
>>  ret = update_user_bitmap(bitmap->data, iommu, dma,
>> - unmap->iova, pgsize);
>> + unmap->iova);
>>  if (ret)
>>  break;
>>  }
> 
> Same as the previous, both call sites already have both pgsize and
> pgshift, pass both rather than recalculate.  Thanks,
> 
My idea is that the "pgsize" parameter goes here and there, disturbs our sight 
when read code.
To be frankly, the recalculate is negligible. Or we can add new fields in 
vfio_iommu, which are
updated in vfio_update_pgsize_bitmap().

Thanks,
Keqian



> Alex
> 
> .
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 3/6] vfio/iommu_type1: Initially set the pinned_page_dirty_scope




On 2021/1/16 7:30, Alex Williamson wrote:
> On Thu, 7 Jan 2021 12:43:58 +0800
> Keqian Zhu  wrote:
> 
>> For now there are 3 ways to promote the pinned_page_dirty_scope
>> status of vfio_iommu:
>>
>> 1. Through vfio pin interface.
>> 2. Detach a group without pinned_dirty_scope.
>> 3. Attach a group with pinned_dirty_scope.
>>
>> For point 3, the only chance to promote the pinned_page_dirty_scope
>> status is when vfio_iommu is newly created. As we can safely set
>> empty vfio_iommu to be at pinned status, then the point 3 can be
>> removed to reduce operations.
>>
>> Signed-off-by: Keqian Zhu 
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 4 +---
>>  1 file changed, 1 insertion(+), 3 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c 
>> b/drivers/vfio/vfio_iommu_type1.c
>> index 110ada24ee91..b596c482487b 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -2045,11 +2045,8 @@ static int vfio_iommu_type1_attach_group(void 
>> *iommu_data,
>>   * Non-iommu backed group cannot dirty memory directly,
>>   * it can only use interfaces that provide dirty
>>   * tracking.
>> - * The iommu scope can only be promoted with the
>> - * addition of a dirty tracking group.
>>   */
>>  group->pinned_page_dirty_scope = true;
>> -promote_pinned_page_dirty_scope(iommu);
>>  mutex_unlock(>lock);
>>  
>>  return 0;
>> @@ -2436,6 +2433,7 @@ static void *vfio_iommu_type1_open(unsigned long arg)
>>  INIT_LIST_HEAD(>iova_list);
>>  iommu->dma_list = RB_ROOT;
>>  iommu->dma_avail = dma_entry_limit;
>> +iommu->pinned_page_dirty_scope = true;
>>  mutex_init(>lock);
>>  BLOCKING_INIT_NOTIFIER_HEAD(>notifier);
>>  
> 
> This would be resolved automatically if we used the counter approach I
> mentioned on the previous patch, adding a pinned-page scope group simply
> wouldn't increment the iommu counter, which would initially be zero
> indicating no "all-dma" groups.  Thanks,
> 
Sure, I will do that, thanks.

Keqian.

> Alex
> 
> .
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 1/6] vfio/iommu_type1: Make an explicit "promote" semantic




On 2021/1/16 6:42, Alex Williamson wrote:
> On Thu, 7 Jan 2021 12:43:56 +0800
> Keqian Zhu  wrote:
> 
>> When we want to promote the pinned_page_dirty_scope of vfio_iommu,
>> we call the "update" function to visit all vfio_group, but when we
>> want to downgrade this, we can set the flag as false directly.
> 
> I agree that the transition can only go in one direction, but it's
> still conditional on the scope of all groups involved.  We are
> "updating" the iommu state based on the change of a group.  Renaming
> this to "promote" seems like a matter of personal preference.
> 
>> So we'd better make an explicit "promote" semantic to the "update"
>> function. BTW, if vfio_iommu already has been promoted, then return
>> early.
> 
> Currently it's the caller that avoids using this function when the
> iommu scope is already correct.  In fact the changes induces a
> redundant test in the pin_pages code path, we're changing a group from
> non-pinned-page-scope to pinned-page-scope, therefore the iommu scope
> cannot initially be scope limited.  In the attach_group call path,
> we're moving that test from the caller, so at best we've introduced an
> additional function call.
> 
> The function as it exists today is also more versatile whereas the
> "promote" version here forces it to a single task with no appreciable
> difference in complexity or code.  This seems like a frivolous change.
> Thanks,
OK, I will adapt your idea that maintenance a counter of non-pinned groups.
Then we keep the "update" semantic, and the target is the counter ;-).

Thanks,
Keqian

> 
> Alex
> 
>> Signed-off-by: Keqian Zhu 
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 30 ++
>>  1 file changed, 14 insertions(+), 16 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c 
>> b/drivers/vfio/vfio_iommu_type1.c
>> index 0b4dedaa9128..334a8240e1da 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -148,7 +148,7 @@ static int put_pfn(unsigned long pfn, int prot);
>>  static struct vfio_group *vfio_iommu_find_iommu_group(struct vfio_iommu 
>> *iommu,
>> struct iommu_group *iommu_group);
>>  
>> -static void update_pinned_page_dirty_scope(struct vfio_iommu *iommu);
>> +static void promote_pinned_page_dirty_scope(struct vfio_iommu *iommu);
>>  /*
>>   * This code handles mapping and unmapping of user data buffers
>>   * into DMA'ble space using the IOMMU
>> @@ -714,7 +714,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
>>  group = vfio_iommu_find_iommu_group(iommu, iommu_group);
>>  if (!group->pinned_page_dirty_scope) {
>>  group->pinned_page_dirty_scope = true;
>> -update_pinned_page_dirty_scope(iommu);
>> +promote_pinned_page_dirty_scope(iommu);
>>  }
>>  
>>  goto pin_done;
>> @@ -1622,27 +1622,26 @@ static struct vfio_group 
>> *vfio_iommu_find_iommu_group(struct vfio_iommu *iommu,
>>  return group;
>>  }
>>  
>> -static void update_pinned_page_dirty_scope(struct vfio_iommu *iommu)
>> +static void promote_pinned_page_dirty_scope(struct vfio_iommu *iommu)
>>  {
>>  struct vfio_domain *domain;
>>  struct vfio_group *group;
>>  
>> +if (iommu->pinned_page_dirty_scope)
>> +return;
>> +
>>  list_for_each_entry(domain, >domain_list, next) {
>>  list_for_each_entry(group, >group_list, next) {
>> -if (!group->pinned_page_dirty_scope) {
>> -iommu->pinned_page_dirty_scope = false;
>> +if (!group->pinned_page_dirty_scope)
>>  return;
>> -}
>>  }
>>  }
>>  
>>  if (iommu->external_domain) {
>>  domain = iommu->external_domain;
>>  list_for_each_entry(group, >group_list, next) {
>> -if (!group->pinned_page_dirty_scope) {
>> -iommu->pinned_page_dirty_scope = false;
>> +if (!group->pinned_page_dirty_scope)
>>  return;
>> -}
>>  }
>>  }
>>  
>> @@ -2057,8 +2056,7 @@ static int vfio_iommu_type1_attach_group(void 
>> *iommu_data,
>>   * addition of a dirty tracking group.
>>   */
>>  group->pinned_page_dirty_scope = true;
>> -if (!iommu->pinned_page_dirty_scope)
>> -update_pinned_page_dirty_scope(iommu);
>> +promote_pinned_page_dirty_scope(iommu);
>>  mutex_unlock(>lock);
>>  
>>  return 0;
>> @@ -2341,7 +2339,7 @@ static void vfio_iommu_type1_detach_group(void 
>> *iommu_data,
>>  struct vfio_iommu *iommu = iommu_data;
>>  struct vfio_domain *domain;
>>  struct vfio_group *group;
>> -bool update_dirty_scope = false;
>> +bool promote_dirty_scope = false;
>>

Re: [PATCH v2 2/2] vfio/iommu_type1: Sanity check pfn_list when remove vfio_dma




On 2021/1/16 3:14, Alex Williamson wrote:
> On Fri, 15 Jan 2021 17:26:43 +0800
> Keqian Zhu  wrote:
> 
>> vfio_sanity_check_pfn_list() is used to check whether pfn_list of
>> vfio_dma is empty when remove the external domain, so it makes a
>> wrong assumption that only external domain will add pfn to dma pfn_list.
>>
>> Now we apply this check when remove a specific vfio_dma and extract
>> the notifier check just for external domain.
> 
> The page pinning interface is gated by having a notifier registered for
> unmaps, therefore non-external domains would also need to register a
> notifier.  There's currently no other way to add entries to the
> pfn_list.  So if we allow pinning for such domains, then it's wrong to
> WARN_ON() when the notifier list is not-empty when removing an external
> domain.  Long term we should probably extend page {un}pinning for the
> caller to pass their notifier to be validated against the notifier list
> rather than just allowing page pinning if *any* notifier is registered.
> Thanks,
I was misled by the code comments. So when the commit a54eb55045ae is added, 
the only
user of pin interface is mdev vendor driver, but now we also allow iommu backed 
group
to use this interface to constraint dirty scope. Is 
vfio_iommu_unmap_unpin_all() a
proper place to put this WARN()?

Thanks,
Keqian

> 
> Alex
>  
>> Fixes: a54eb55045ae ("vfio iommu type1: Add support for mediated devices")
>> Signed-off-by: Keqian Zhu 
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 24 +---
>>  1 file changed, 5 insertions(+), 19 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c 
>> b/drivers/vfio/vfio_iommu_type1.c
>> index 4e82b9a3440f..a9bc15e84a4e 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -958,6 +958,7 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, 
>> struct vfio_dma *dma,
>>  
>>  static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>>  {
>> +WARN_ON(!RB_EMPTY_ROOT(>pfn_list);
>>  vfio_unmap_unpin(iommu, dma, true);
>>  vfio_unlink_dma(iommu, dma);
>>  put_task_struct(dma->task);
>> @@ -2251,23 +2252,6 @@ static void vfio_iommu_unmap_unpin_reaccount(struct 
>> vfio_iommu *iommu)
>>  }
>>  }
>>  
>> -static void vfio_sanity_check_pfn_list(struct vfio_iommu *iommu)
>> -{
>> -struct rb_node *n;
>> -
>> -n = rb_first(>dma_list);
>> -for (; n; n = rb_next(n)) {
>> -struct vfio_dma *dma;
>> -
>> -dma = rb_entry(n, struct vfio_dma, node);
>> -
>> -if (WARN_ON(!RB_EMPTY_ROOT(>pfn_list)))
>> -break;
>> -}
>> -/* mdev vendor driver must unregister notifier */
>> -WARN_ON(iommu->notifier.head);
>> -}
>> -
>>  /*
>>   * Called when a domain is removed in detach. It is possible that
>>   * the removed domain decided the iova aperture window. Modify the
>> @@ -2367,7 +2351,8 @@ static void vfio_iommu_type1_detach_group(void 
>> *iommu_data,
>>  kfree(group);
>>  
>>  if (list_empty(>external_domain->group_list)) {
>> -vfio_sanity_check_pfn_list(iommu);
>> +/* mdev vendor driver must unregister notifier 
>> */
>> +WARN_ON(iommu->notifier.head);
>>  
>>  if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu))
>>  vfio_iommu_unmap_unpin_all(iommu);
>> @@ -2491,7 +2476,8 @@ static void vfio_iommu_type1_release(void *iommu_data)
>>  
>>  if (iommu->external_domain) {
>>  vfio_release_domain(iommu->external_domain, true);
>> -vfio_sanity_check_pfn_list(iommu);
>> +/* mdev vendor driver must unregister notifier */
>> +WARN_ON(iommu->notifier.head);
>>  kfree(iommu->external_domain);
>>  }
>>  
> 
> .
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v4 2/3] iommu/iova: Avoid double-negatives in magazine helpers


On 2021-01-18 10:55, John Garry wrote:

On 18/01/2021 10:08, Jean-Philippe Brucker wrote:
Any idea why that's happening?  This fix seems ok but if we're 
expecting

allocation failures for the loaded magazine then we could easily get it
for cpu_rcaches too, and get a similar abort at runtime.

It's not specifically that we expect them (allocation failures for the
loaded magazine), rather we should make safe against it.

So could you be more specific in your concern for the cpu_rcache 
failure?

cpu_rcache magazine assignment comes from this logic.

If this fails:

drivers/iommu/iova.c:847: rcache->cpu_rcaches = 
__alloc_percpu(sizeof(*cpu_rcache), cache_line_size());


then we'll get an Oops in __iova_rcache_get(). So if we're making the
module safer against magazine allocation failure, shouldn't we also
protect against cpu_rcaches allocation failure?


Ah, gotcha. So we have the WARN there, but that's not much use as this 
would still crash, as you say.


So maybe we can embed the cpu rcaches in iova_domain struct, to avoid 
the separate (failable) cpu rcache allocation.


Is that even possible? The size of percpu data isn't known at compile 
time, so at best it would add ugly runtime complexity to any allocation 
of a struct iova_domain by itself, but worse than that it means that 
embedding iova_domain in any other structure becomes completely broken, no?


Robin.

Alternatively, we could add NULL checks __iova_rcache_get() et al for 
this allocation failure but that's not preferable as it's fastpath.


Finally so we could pass back an error code from init_iova_rcache() to 
its only caller, init_iova_domain(); but that has multiple callers and 
would need to be fixed up.


Not sure which is best or on other options.

Thanks,
John

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v4 2/3] iommu/iova: Avoid double-negatives in magazine helpers

2021-01-18 Thread Jean-Philippe Brucker

On Mon, Jan 18, 2021 at 10:55:52AM +, John Garry wrote:
> On 18/01/2021 10:08, Jean-Philippe Brucker wrote:
> > > > Any idea why that's happening?  This fix seems ok but if we're expecting
> > > > allocation failures for the loaded magazine then we could easily get it
> > > > for cpu_rcaches too, and get a similar abort at runtime.
> > > It's not specifically that we expect them (allocation failures for the
> > > loaded magazine), rather we should make safe against it.
> > > 
> > > So could you be more specific in your concern for the cpu_rcache failure?
> > > cpu_rcache magazine assignment comes from this logic.
> > If this fails:
> > 
> > drivers/iommu/iova.c:847: rcache->cpu_rcaches = 
> > __alloc_percpu(sizeof(*cpu_rcache), cache_line_size());
> > 
> > then we'll get an Oops in __iova_rcache_get(). So if we're making the
> > module safer against magazine allocation failure, shouldn't we also
> > protect against cpu_rcaches allocation failure?
> 
> Ah, gotcha. So we have the WARN there, but that's not much use as this would
> still crash, as you say.
> 
> So maybe we can embed the cpu rcaches in iova_domain struct, to avoid the
> separate (failable) cpu rcache allocation.
> 
> Alternatively, we could add NULL checks __iova_rcache_get() et al for this
> allocation failure but that's not preferable as it's fastpath.
> 
> Finally so we could pass back an error code from init_iova_rcache() to its
> only caller, init_iova_domain(); but that has multiple callers and would
> need to be fixed up.
> 
> Not sure which is best or on other options.

I would have initially gone with option 2 which seems the simplest, but I
don't have a setup to measure that overhead

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v4 3/3] iommu/iova: Flush CPU rcache for when a depot fills

On 15/01/2021 19:21, Robin Murphy wrote:

It would be good to understand why the rcache doesn't stabilize. Could be
a bug, or just need some tuning

In strict mode, if a driver does Alloc-Free-Alloc and the first alloc
misses the rcache, the second allocation hits it. The same sequence in
non-strict mode misses the cache twice, because the IOVA is added to the
flush queue on Free.

So rather than AFAFAF.. we get AAA..FFF.., only once the fq_timer
triggers

or the FQ is full.

Sounds right

Interestingly the FQ size is 2x IOVA_MAG_SIZE, so we
could allocate 2 magazines worth of fresh IOVAs before alloc starts
hitting the cache. If a job allocates more than that, some magazines are
going to the depot, and with multi-CPU jobs those will get used on other
CPUs during the next alloc bursts, causing the progressive increase in
rcache consumption. I wonder if setting IOVA_MAG_SIZE > IOVA_FQ_SIZE
helps

reuse of IOVAs?

Looking back through the lore history, I don't know where the
IOVA_FQ_SIZE = 256 came from. I guess it's size of 2x IOVA_MAG_SIZE (1x
for loaded and 1x for prev) for the reason you mention.

Then again I haven't worked out the details, might be entirely wrong.
I'll

have another look next week.

cheers

I did start digging into the data (thanks for that!) before Christmas,
but between being generally frazzled and trying to remember how to write
Perl to massage the numbers out of the log dump I never got round to
responding, sorry.

As you may have seen:
https://raw.githubusercontent.com/hisilicon/kernel-dev/064c4dc8869b3f2ad07edffceafde0b129f276b0/lsi3008_dmesg

I had to change some block configs via sysfs to ever get IOVA locations
for size > 0. And even then, I still got none bigger than
IOVA_RANGE_CACHE_MAX_SIZE.

Note: For a log like:
[13175.361915] print_iova2 iova_allocs(=500 ... too_big=47036

47036 is number of IOVA size > IOVA_RANGE_CACHE_MAX_SIZE, in case it was
not clear.

And I never hit the critical point of a depot bin filling, but it may
just take even longer.

However with IOVA size = 0 always occurring, then I noticed that the
depot size = 0 bin fills up relatively quickly. As such, I am now
slightly skeptical of the approach I have taken here, i.e purge the
whole rcache.

The partial thoughts that I can recall right now are firstly that the
total numbers of IOVAs are actually pretty meaningless, it really needs
to be broken down by size (that's where my Perl-hacking stalled...);
secondly that the pattern is far more than just a steady increase - the
CPU rcache count looks to be heading asymptotically towards ~65K IOVAs
all the time, representing (IIRC) two sizes being in heavy rotation,
while the depot is happily ticking along in a steady state as expected,
until it suddenly explodes out of nowhere; thirdly, I'd really like to
see instrumentation of the flush queues at the same time, since I think
they're the real culprit.

My theory so far is that everyone is calling queue_iova() frequently
enough to keep the timer at bay and their own queues drained. Then at
the ~16H mark, *something* happens that pauses unmaps long enough for
the timer to fire, and at that point all hell breaks loose.

So do you think that the freeing the IOVA magazines when the depot fills
is the cause of this? That was our analysis.

The depot is
suddenly flooded with IOVAs of *all* sizes, indicative of all the queues
being flushed at once (note that the two most common sizes have been
hovering perilously close to "full" the whole time), but then,
crucially, *that keeps happening*. My guess is that the load of
fq_flush_timeout() slows things down enough that the the timer then
keeps getting the chance to expire and repeat the situation.

Not sure on that one.

The main conclusion I draw from this is the same one that was my initial
gut feeling; that MAX_GLOBAL_MAGS = 32 is utter bollocks.

Yeah, I tend to agree with that. Or, more specifically, how things work
today is broken, and MAX_GLOBAL_MAGS = 32 is very much involved with that.

The CPU rcache
capacity scales with the number of CPUs; the flush queue capacity scales
with the number of CPUs; it is nonsensical that the depot size does not
correspondingly scale with the number of CPUs (I note that the testing
on the original patchset cites a 16-CPU system, where that depot
capacity is conveniently equal to the total rcache capacity).

Now yes, purging the rcaches when the depot is full does indeed help
mitigate this scenario - I assume it provides enough of a buffer where
the regular free_iova_fast() calls don't hit queue_iova() for a while
(and gives fq_ring_free() some reprieve on the CPU handling the
timeout), giving enough leeway for the flood to finish before anyone
starts hitting queues/locks/etc. and stalling again, and thus break the
self-perpetuating timeout cycle. But that's still only a damage
limitation exercise! It's planning for failure to just lie down and
assume

Re: [PATCH v2 1/2] vfio/iommu_type1: Populate full dirty when detach non-pinned group




On 2021/1/16 2:01, Alex Williamson wrote:
> On Fri, 15 Jan 2021 17:26:42 +0800
> Keqian Zhu  wrote:
> 
>> If a group with non-pinned-page dirty scope is detached with dirty
>> logging enabled, we should fully populate the dirty bitmaps at the
>> time it's removed since we don't know the extent of its previous DMA,
>> nor will the group be present to trigger the full bitmap when the user
>> retrieves the dirty bitmap.
>>
>> Fixes: d6a4c185660c ("vfio iommu: Implementation of ioctl for dirty pages 
>> tracking")
>> Suggested-by: Alex Williamson 
>> Signed-off-by: Keqian Zhu 
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 18 +-
>>  1 file changed, 17 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c 
>> b/drivers/vfio/vfio_iommu_type1.c
>> index 0b4dedaa9128..4e82b9a3440f 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -236,6 +236,19 @@ static void vfio_dma_populate_bitmap(struct vfio_dma 
>> *dma, size_t pgsize)
>>  }
>>  }
>>  
>> +static void vfio_iommu_populate_bitmap_full(struct vfio_iommu *iommu)
>> +{
>> +struct rb_node *n;
>> +unsigned long pgshift = __ffs(iommu->pgsize_bitmap);
>> +
>> +for (n = rb_first(>dma_list); n; n = rb_next(n)) {
>> +struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
>> +
>> +if (dma->iommu_mapped)
>> +bitmap_set(dma->bitmap, 0, dma->size >> pgshift);
>> +}
>> +}
>> +
>>  static int vfio_dma_bitmap_alloc_all(struct vfio_iommu *iommu, size_t 
>> pgsize)
>>  {
>>  struct rb_node *n;
>> @@ -2415,8 +2428,11 @@ static void vfio_iommu_type1_detach_group(void 
>> *iommu_data,
>>   * Removal of a group without dirty tracking may allow the iommu scope
>>   * to be promoted.
>>   */
>> -if (update_dirty_scope)
>> +if (update_dirty_scope) {
>>  update_pinned_page_dirty_scope(iommu);
>> +if (iommu->dirty_page_tracking)
>> +vfio_iommu_populate_bitmap_full(iommu);
>> +}
>>  mutex_unlock(>lock);
>>  }
>>  
> 
> This doesn't do the right thing.  This marks the bitmap dirty if:
> 
>  * The detached group dirty scope was not limited to pinned pages
> 
>  AND
> 
>  * Dirty tracking is enabled
> 
>  AND
> 
>  * The vfio_dma is *currently* (ie. after the detach) iommu_mapped
> 
> We need to mark the bitmap dirty based on whether the vfio_dma *was*
> iommu_mapped by the group that is now detached.  Thanks,
> 
> Alex
> 
Hi Alex,

Yes, I missed this point again :-(. The update_dirty_scope means we detached
an iommu backed group, and that means the vfio_dma *was* iommu_mapped by this
group, so we can populate full bitmap unconditionally, right?

Thanks,
Keqian


> .
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH] swiotlb: Validate bounce size in the sync/unmap path

2021-01-18 Thread Martin Radev

On Wed, Jan 13, 2021 at 12:30:17PM +0100, Christoph Hellwig wrote:
> On Tue, Jan 12, 2021 at 04:07:29PM +0100, Martin Radev wrote:
> > The size of the buffer being bounced is not checked if it happens
> > to be larger than the size of the mapped buffer. Because the size
> > can be controlled by a device, as it's the case with virtio devices,
> > this can lead to memory corruption.
> > 
> 
> I'm really worried about all these hodge podge hacks for not trusted
> hypervisors in the I/O stack.  Instead of trying to harden protocols
> that are fundamentally not designed for this, how about instead coming
> up with a new paravirtualized I/O interface that is specifically
> designed for use with an untrusted hypervisor from the start?

Your comment makes sense but then that would require the cooperation
of these vendors and the cloud providers to agree on something meaningful.
I am also not sure whether the end result would be better than hardening
this interface to catch corruption. There is already some validation in
unmap path anyway.

Another possibility is to move this hardening to the common virtio code,
but I think the code may become more complicated there since it would
require tracking both the dma_addr and length for each descriptor.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [RESEND PATCH 1/1] iommu/amd: Remove unnecessary assignment

On Thu, Dec 10, 2020 at 10:13:30AM +0800, Adrian Huang wrote:
> From: Adrian Huang 
> 
> From: Adrian Huang 
> 
> The values of local variables are assigned after local variables
> are declared, so no need to assign the initial value during the
> variable declaration.
> 
> And, no need to assign NULL for the local variable 'ivrs_base'
> after invoking acpi_put_table().
> 
> Signed-off-by: Adrian Huang 
> ---
>  drivers/iommu/amd/init.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)

Acked-by: Will Deacon 

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v2 0/6] IOMMU: Some more IOVA and core code tidy-up

On Wed, Jan 06, 2021 at 09:35:05PM +0800, John Garry wrote:
> Just some tidy-up to IOVA and core code.
> 
> Based on v5.11-rc2
> 
> Differences to v1:
> - Add core IOMMU patches
> 
> John Garry (6):
>   iova: Make has_iova_flush_queue() private
>   iova: Delete copy_reserved_iova()
>   iova: Stop exporting some more functions
>   iommu: Stop exporting iommu_map_sg_atomic()
>   iommu: Delete iommu_domain_window_disable()
>   iommu: Delete iommu_dev_has_feature()
> 
>  drivers/iommu/iommu.c | 21 -
>  drivers/iommu/iova.c  | 36 +---
>  include/linux/iommu.h | 13 -
>  include/linux/iova.h  | 12 
>  4 files changed, 1 insertion(+), 81 deletions(-)

For the series:

Acked-by: Will Deacon 

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v4 2/3] iommu/iova: Avoid double-negatives in magazine helpers


On 18/01/2021 10:08, Jean-Philippe Brucker wrote:

Any idea why that's happening?  This fix seems ok but if we're expecting
allocation failures for the loaded magazine then we could easily get it
for cpu_rcaches too, and get a similar abort at runtime.

It's not specifically that we expect them (allocation failures for the
loaded magazine), rather we should make safe against it.

So could you be more specific in your concern for the cpu_rcache failure?
cpu_rcache magazine assignment comes from this logic.

If this fails:

drivers/iommu/iova.c:847: rcache->cpu_rcaches = 
__alloc_percpu(sizeof(*cpu_rcache), cache_line_size());

then we'll get an Oops in __iova_rcache_get(). So if we're making the
module safer against magazine allocation failure, shouldn't we also
protect against cpu_rcaches allocation failure?


Ah, gotcha. So we have the WARN there, but that's not much use as this 
would still crash, as you say.


So maybe we can embed the cpu rcaches in iova_domain struct, to avoid 
the separate (failable) cpu rcache allocation.


Alternatively, we could add NULL checks __iova_rcache_get() et al for 
this allocation failure but that's not preferable as it's fastpath.


Finally so we could pass back an error code from init_iova_rcache() to 
its only caller, init_iova_domain(); but that has multiple callers and 
would need to be fixed up.


Not sure which is best or on other options.

Thanks,
John
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v4 2/3] iommu/iova: Avoid double-negatives in magazine helpers

2021-01-18 Thread Jean-Philippe Brucker

On Mon, Jan 18, 2021 at 09:24:17AM +, John Garry wrote:
> On 15/01/2021 17:30, Jean-Philippe Brucker wrote:
> > On Thu, Dec 10, 2020 at 02:23:08AM +0800, John Garry wrote:
> > > A similar crash to the following could be observed if initial CPU rcache
> > > magazine allocations fail in init_iova_rcaches():
> > 
> 
> thanks for having a look
> 
> > Any idea why that's happening?  This fix seems ok but if we're expecting
> > allocation failures for the loaded magazine then we could easily get it
> > for cpu_rcaches too, and get a similar abort at runtime.
> 
> It's not specifically that we expect them (allocation failures for the
> loaded magazine), rather we should make safe against it.
> 
> So could you be more specific in your concern for the cpu_rcache failure?
> cpu_rcache magazine assignment comes from this logic.

If this fails:

drivers/iommu/iova.c:847: rcache->cpu_rcaches = 
__alloc_percpu(sizeof(*cpu_rcache), cache_line_size());

then we'll get an Oops in __iova_rcache_get(). So if we're making the
module safer against magazine allocation failure, shouldn't we also
protect against cpu_rcaches allocation failure?

Thanks,
Jean

> 
> Anyway, logic like "if not full" or "if not empty" is poor as the outcome
> for NULL is ambiguous (maybe there's a better word) and the code is not safe
> against it, and so I replace with "if space" or "if have an IOVA",
> respectively.
> 
> Thanks,
> John
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v4 2/3] iommu/iova: Avoid double-negatives in magazine helpers