Re: [PATCH v3 4/4] vfio: Require that devices support DMA cache coherence

2022-07-04 Thread chenxiang (M) via iommu

Hi,

We encountered an issue with this patch: our platform is ARM64, and we run
DPDK with the SMMU disabled in the VM (without iommu=smmuv3 etc.),


so we use no-IOMMU mode with enable_unsafe_noiommu_mode=1 to pass the device
through to the VM with the following steps (all performed in the VM):


insmod vfio.ko enable_unsafe_noiommu_mode=1
insmod vfio_virqfd.ko
insmod vfio-pci-core.ko
insmod vfio-pci.ko
insmod vfio_iommu_type1.ko

echo vfio-pci > /sys/bus/pci/devices/:00:02.0/driver_override
echo :00:02.0 > /sys/bus/pci/drivers_probe -- failed

I find that the vfio-pci device is not probed because of the additional
check. It works well without this patch.


Do we need to skip the check if enable_unsafe_noiommu_mode=1?
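
Just to illustrate the idea (a rough sketch only, not a tested patch: "vfio_noiommu" here stands in for whatever bool backs the enable_unsafe_noiommu_mode module parameter, and the real fix may need to distinguish IOMMU-backed groups instead):

	/*
	 * Sketch: tolerate a non-coherent bus when the administrator has
	 * explicitly opted into unsafe no-IOMMU mode, since no IOMMU
	 * mapping (and hence no IOMMU_CACHE semantics) is used there.
	 */
	if (!vfio_noiommu &&
	    !iommu_capable(device->dev->bus, IOMMU_CAP_CACHE_COHERENCY))
		return -EINVAL;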

Best regards,

Xiang Chen


On 2022/4/11 23:16, Jason Gunthorpe wrote:

IOMMU_CACHE means that normal DMAs do not require any additional coherency
mechanism and is the basic uAPI that VFIO exposes to userspace. For
instance VFIO applications like DPDK will not work if additional coherency
operations are required.

Therefore check IOMMU_CAP_CACHE_COHERENCY like vdpa & usnic do before
allowing an IOMMU backed VFIO device to be created.

Reviewed-by: Kevin Tian 
Acked-by: Alex Williamson 
Acked-by: Robin Murphy 
Signed-off-by: Jason Gunthorpe 
---
  drivers/vfio/vfio.c | 7 +++
  1 file changed, 7 insertions(+)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index a4555014bd1e72..9edad767cfdad3 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -815,6 +815,13 @@ static int __vfio_register_dev(struct vfio_device *device,
  
  int vfio_register_group_dev(struct vfio_device *device)

  {
+   /*
+* VFIO always sets IOMMU_CACHE because we offer no way for userspace to
+* restore cache coherency.
+*/
+   if (!iommu_capable(device->dev->bus, IOMMU_CAP_CACHE_COHERENCY))
+   return -EINVAL;
+
return __vfio_register_dev(device,
vfio_group_find_or_alloc(device->dev));
  }



Re: [PATCH v2 00/24] iommu: Refactor DMA domain strictness

2021-07-29 Thread chenxiang (M)



On 2021/7/29 18:59, Robin Murphy wrote:

On 2021-07-29 03:55, chenxiang (M) wrote:

Hi Robin,


On 2021/7/28 23:58, Robin Murphy wrote:

Hi all,

Here's v2 where things start to look more realistic, hence the expanded
CC list. The patches are now based on the current iommu/core branch to
take John's iommu_set_dma_strict() cleanup into account.

The series remains in two (or possibly 3) logical parts - for people
CC'd on cookie cleanup patches, the later parts should not affect you
since your drivers don't implement non-strict mode anyway; the cleanup
is all pretty straightforward, but please do yell at me if I've managed
to let a silly mistake slip through and broken your driver.

This time I have also build-tested x86 as well as arm64 :)


I have tested this patchset on ARM64 with SMMUv3, and the test cases
are as follows:

- Boot with iommu.strict=0, run fio, and it works well;
- Boot with iommu.strict=1, run fio, and it works well;
- Change strict mode to lazy mode at build time, and the change takes
effect;
- Boot without iommu.strict (default strict mode), change the sysfs
interface "type" from DMA to DMA-FQ dynamically while running fio, and
it works well;
- Boot without iommu.strict (default strict mode), change the sysfs
interface "type" from DMA-FQ to DMA dynamically; this is not allowed
and prints "Device or resource busy"
(I know this is intended, and we can change non-strict mode to strict
by unbinding the driver -> changing the sysfs interface (type) -> rebinding
the driver (I tested this and it works well),
but I have a small question: is it also possible to change from
DMA-FQ to DMA dynamically?)


As patch #22 mentions, I think it's possible in principle, but it's 
certainly trickier. When enabling a flush queue, it doesn't matter if 
it takes a while for other threads to notice that cookie->fq_domain is 
now set and stop doing synchronous invalidations (and in the SMMU case 
it seems like there are probably enough dependencies to additionally 
prevent the io_pgtable quirk being observable before that). However 
when disabling, we'd need to be absolutely sure that the driver *has* 
started invalidating strictly before we stop queueing freed IOVAs, 
plus we need to be absolutely sure that we've stopped queueing freed 
IOVAs before we attempt to tear down the flush queue itself. I'm not 
sure off-hand how feasible it would be to put all that synchronisation 
in the right places without it also impacting normal operation.


Furthermore, as also noted, there doesn't seem to be a good reason for 
ever actually needing to do that. If a device isn't trusted, it should 
be given a strict domain *before* any driver has a chance to start 
doing anything, or your trust model is broken and pretty useless. I 
can imagine some niche debugging/benchmarking cases where it might 
help save a bit of effort, but nothing with a strong enough 
justification to be worth supporting in mainline.


Ok, thanks.




Anyway, please feel free to add:
Tested-by: Xiang Chen 


That's great, thanks!

Robin.


Changes in v2:

- Add iommu_is_dma_domain() helper to abstract flag check (and help
   avoid silly typos like the one in v1).
- Tweak a few commit messages for spelling and (hopefully) clarity.
- Move the iommu_create_device_direct_mappings() update to patch #14
   where it should have been.
- Rewrite patch #20 as a conversion of the now-existing option.
- Clean up the ops->flush_iotlb_all check which is also made redundant
   by the new domain type
- Add patch #24, which is arguably tangential, but it was something I
   spotted during the rebase, so...

Once again, the whole lot is available on a branch here:

https://gitlab.arm.com/linux-arm/linux-rm/-/tree/iommu/fq

Thanks,
Robin.


CC: Marek Szyprowski 
CC: Yoshihiro Shimoda 
CC: Geert Uytterhoeven 
CC: Yong Wu 
CC: Heiko Stuebner 
CC: Chunyan Zhang 
CC: Chunyan Zhang 
CC: Maxime Ripard 
CC: Jean-Philippe Brucker 

Robin Murphy (24):
   iommu: Pull IOVA cookie management into the core
   iommu/amd: Drop IOVA cookie management
   iommu/arm-smmu: Drop IOVA cookie management
   iommu/vt-d: Drop IOVA cookie management
   iommu/exynos: Drop IOVA cookie management
   iommu/ipmmu-vmsa: Drop IOVA cookie management
   iommu/mtk: Drop IOVA cookie management
   iommu/rockchip: Drop IOVA cookie management
   iommu/sprd: Drop IOVA cookie management
   iommu/sun50i: Drop IOVA cookie management
   iommu/virtio: Drop IOVA cookie management
   iommu/dma: Unexport IOVA cookie management
   iommu/dma: Remove redundant "!dev" checks
   iommu: Introduce explicit type for non-strict DMA domains
   iommu/amd: Prepare for multiple DMA domain types
   iommu/arm-smmu: Prepare for multiple DMA domain types
   iommu/vt-d: Prepare for multiple DMA domain types
   iommu: Express DMA strictness via the domain type
   iommu: Expose DMA domain strictness via sysfs
   iommu: Merge strictness and domain type configs
   iommu/dma: Factor out flush queue init
   iommu

Re: [PATCH v2 00/24] iommu: Refactor DMA domain strictness

2021-07-28 Thread chenxiang (M)

Hi Robin,


On 2021/7/28 23:58, Robin Murphy wrote:

Hi all,

Here's v2 where things start to look more realistic, hence the expanded
CC list. The patches are now based on the current iommu/core branch to
take John's iommu_set_dma_strict() cleanup into account.

The series remains in two (or possibly 3) logical parts - for people
CC'd on cookie cleanup patches, the later parts should not affect you
since your drivers don't implement non-strict mode anyway; the cleanup
is all pretty straightforward, but please do yell at me if I've managed
to let a silly mistake slip through and broken your driver.

This time I have also build-tested x86 as well as arm64 :)


I have tested this patchset on ARM64 with SMMUv3, and the test cases are
as follows:

- Boot with iommu.strict=0, run fio, and it works well;
- Boot with iommu.strict=1, run fio, and it works well;
- Change strict mode to lazy mode at build time, and the change takes effect;
- Boot without iommu.strict (default strict mode), change the sysfs
interface "type" from DMA to DMA-FQ dynamically while running fio, and it
works well;
- Boot without iommu.strict (default strict mode), change the sysfs
interface "type" from DMA-FQ to DMA dynamically; this is not allowed and
prints "Device or resource busy"
(I know this is intended, and we can change non-strict mode to strict by
unbinding the driver -> changing the sysfs interface (type) -> rebinding the
driver (I tested this and it works well),
but I have a small question: is it also possible to change from DMA-FQ
to DMA dynamically?)


Anyway, please feel free to add:
Tested-by: Xiang Chen 



Changes in v2:

- Add iommu_is_dma_domain() helper to abstract flag check (and help
   avoid silly typos like the one in v1).
- Tweak a few commit messages for spelling and (hopefully) clarity.
- Move the iommu_create_device_direct_mappings() update to patch #14
   where it should have been.
- Rewrite patch #20 as a conversion of the now-existing option.
- Clean up the ops->flush_iotlb_all check which is also made redundant
   by the new domain type
- Add patch #24, which is arguably tangential, but it was something I
   spotted during the rebase, so...

Once again, the whole lot is available on a branch here:

https://gitlab.arm.com/linux-arm/linux-rm/-/tree/iommu/fq

Thanks,
Robin.


CC: Marek Szyprowski 
CC: Yoshihiro Shimoda 
CC: Geert Uytterhoeven 
CC: Yong Wu 
CC: Heiko Stuebner 
CC: Chunyan Zhang 
CC: Chunyan Zhang 
CC: Maxime Ripard 
CC: Jean-Philippe Brucker 

Robin Murphy (24):
   iommu: Pull IOVA cookie management into the core
   iommu/amd: Drop IOVA cookie management
   iommu/arm-smmu: Drop IOVA cookie management
   iommu/vt-d: Drop IOVA cookie management
   iommu/exynos: Drop IOVA cookie management
   iommu/ipmmu-vmsa: Drop IOVA cookie management
   iommu/mtk: Drop IOVA cookie management
   iommu/rockchip: Drop IOVA cookie management
   iommu/sprd: Drop IOVA cookie management
   iommu/sun50i: Drop IOVA cookie management
   iommu/virtio: Drop IOVA cookie management
   iommu/dma: Unexport IOVA cookie management
   iommu/dma: Remove redundant "!dev" checks
   iommu: Introduce explicit type for non-strict DMA domains
   iommu/amd: Prepare for multiple DMA domain types
   iommu/arm-smmu: Prepare for multiple DMA domain types
   iommu/vt-d: Prepare for multiple DMA domain types
   iommu: Express DMA strictness via the domain type
   iommu: Expose DMA domain strictness via sysfs
   iommu: Merge strictness and domain type configs
   iommu/dma: Factor out flush queue init
   iommu: Allow enabling non-strict mode dynamically
   iommu/arm-smmu: Allow non-strict in pgtable_quirks interface
   iommu: Only log strictness for DMA domains

  .../ABI/testing/sysfs-kernel-iommu_groups |  2 +
  drivers/iommu/Kconfig | 80 +--
  drivers/iommu/amd/iommu.c | 21 +
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 25 --
  drivers/iommu/arm/arm-smmu/arm-smmu.c | 29 ---
  drivers/iommu/arm/arm-smmu/qcom_iommu.c   |  8 --
  drivers/iommu/dma-iommu.c | 44 +-
  drivers/iommu/exynos-iommu.c  | 18 +
  drivers/iommu/intel/iommu.c   | 23 ++
  drivers/iommu/iommu.c | 53 +++-
  drivers/iommu/ipmmu-vmsa.c| 27 +--
  drivers/iommu/mtk_iommu.c |  6 --
  drivers/iommu/rockchip-iommu.c| 11 +--
  drivers/iommu/sprd-iommu.c|  6 --
  drivers/iommu/sun50i-iommu.c  | 12 +--
  drivers/iommu/virtio-iommu.c  |  8 --
  include/linux/dma-iommu.h |  9 ++-
  include/linux/iommu.h | 15 +++-
  18 files changed, 171 insertions(+), 226 deletions(-)





Re: [PATCH v7 00/15] Optimizing iommu_[map/unmap] performance

2021-07-14 Thread chenxiang (M)



On 2021/7/15 9:23, Lu Baolu wrote:

On 7/14/21 10:24 PM, Georgi Djakov wrote:

On 16.06.21 16:38, Georgi Djakov wrote:
When unmapping a buffer from an IOMMU domain, the IOMMU framework unmaps
the buffer at a granule of the largest page size that is supported by
the IOMMU hardware and fits within the buffer. For every block that
is unmapped, the IOMMU framework will call into the IOMMU driver, and
then the io-pgtable framework to walk the page tables to find the entry
that corresponds to the IOVA, and then unmaps the entry.

This can be suboptimal in scenarios where a buffer or a piece of a
buffer can be split into several contiguous page blocks of the same size.
For example, consider an IOMMU that supports 4 KB page blocks, 2 MB page
blocks, and 1 GB page blocks, and a buffer that is 4 MB in size is being
unmapped at IOVA 0. The current call-flow will result in 4 indirect calls,
and 2 page table walks, to unmap 2 entries that are next to each other in
the page-tables, when both entries could have been unmapped in one shot
by clearing both page table entries in the same call.

The same optimization is applicable to mapping buffers as well, so
these patches implement a set of callbacks called unmap_pages and
map_pages to the io-pgtable code and IOMMU drivers which unmaps or maps
an IOVA range that consists of a number of pages of the same
page size that is supported by the IOMMU hardware, and allows for
manipulating multiple page table entries in the same set of indirect
calls. The reason for introducing these callbacks is to give other IOMMU
drivers/io-pgtable formats time to change to using the new callbacks, so
that the transition to using this approach can be done piecemeal.
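
To make that example concrete: for the 4 MB buffer mapped with 2 MB blocks, the current path goes IOMMU framework -> IOMMU driver -> io-pgtable once per 2 MB block, i.e. 2 blocks x 2 indirect calls = 4 indirect calls and 2 separate table walks, whereas a single unmap_pages(iova = 0, pgsize = 2M, pgcount = 2) call can clear both entries in one walk.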


Hi Will,

Did you get a chance to look at this patchset? Most patches are already
acked/reviewed and all still apply cleanly on rc1.


I also have the ops->[un]map_pages implementation for the Intel IOMMU
driver. I will post them once the iommu/core part gets applied.


I also implemented those callbacks for ARM SMMUv3 based on this series, and
used dma_map_benchmark to test the map/unmap latency as follows; I think it
improves the map/unmap latency a lot. I also plan to post the ARM SMMUv3
implementation after this series is applied.

t = 1 (thread = 1):
                   before opt(us)   after opt(us)
g=1   (4K size)     0.1/1.3          0.1/0.8
g=2   (8K size)     0.2/1.5          0.2/0.9
g=4   (16K size)    0.3/1.9          0.1/1.1
g=8   (32K size)    0.5/2.7          0.2/1.4
g=16  (64K size)    1.0/4.5          0.2/2.0
g=32  (128K size)   1.8/7.9          0.2/3.3
g=64  (256K size)   3.7/14.8         0.4/6.1
g=128 (512K size)   7.1/14.7         0.5/10.4
g=256 (1M size)     14.0/53.9        0.8/19.3
g=512 (2M size)     0.2/0.9          0.2/0.9
g=1024(4M size)     0.5/1.5          0.4/1.0

t = 10 (threads = 10):
                   before opt(us)   after opt(us)
g=1   (4K size)     0.3/7.0          0.1/5.8
g=2   (8K size)     0.4/6.7          0.3/6.0
g=4   (16K size)    0.5/6.3          0.3/5.6
g=8   (32K size)    0.5/8.3          0.2/6.3
g=16  (64K size)    1.0/17.3         0.3/12.4
g=32  (128K size)   1.8/36.0         0.2/24.2
g=64  (256K size)   4.3/67.2         1.2/46.4
g=128 (512K size)   7.8/93.7         1.3/94.2
g=256 (1M size)     14.7/280.8       1.8/191.5
g=512 (2M size)     3.6/3.2          1.5/2.5
g=1024(4M size)     2.0/3.1          1.8/2.6
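
For reference, the SMMUv3 map_pages wiring is essentially just a pass-through to the new io-pgtable callback, mirroring the existing arm_smmu_map(); a rough sketch only (not necessarily the exact patch I will post):

static int arm_smmu_map_pages(struct iommu_domain *domain, unsigned long iova,
			      phys_addr_t paddr, size_t pgsize, size_t pgcount,
			      int prot, gfp_t gfp, size_t *mapped)
{
	struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;

	if (!ops)
		return -ENODEV;

	/* Hand the whole run of same-size pages to io-pgtable in one call. */
	return ops->map_pages(ops, iova, paddr, pgsize, pgcount, prot, gfp, mapped);
}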




Best regards,
baolu

Re: [Linuxarm] Re: [PATCH 0/4] Free cached iovas when rmmod the driver of the last device in group

2021-06-08 Thread chenxiang (M)

Hi Robin,


On 2021/6/7 19:23, Robin Murphy wrote:

On 2021-06-07 03:42, chenxiang wrote:

From: Xiang Chen 

When rmmoding the driver of the last device in the group, the cached IOVAs
are no longer used, and it is better to free them to save memory. Also
export the functions free_rcache_cached_iovas() and iommu_domain_to_iova().


How common is it to use a device a significant amount, then unbind its 
driver and not use it for long enough to care about? Off-hand I can't 
think of a particularly realistic use-case which looks like that - the 
closest I can imagine is unbinding a default kernel driver to replace 
it with VFIO, but I would expect the set of devices intended for 
assignment to be distinct from the set of devices used by the host 
itself, and thus the host driver wouldn't have actually done much to 
generate a lot of DMA mappings in that initial period. Is my 
expectation there wrong?


It is possible that a user uses a driver for a while and then rmmods it
(for example, the SAS card is the driver we always use, but sometimes we
need to compare its performance with a RAID card, so we insmod the RAID
driver and rmmod it after the test). In that situation, the rcache IOVAs of
the RAID card remain even after the driver has been rmmoded (users usually
rmmod the driver rather than remove the device, which would release the
group and all its resources).




If there is such a case, how much memory does this actually save in 
practice? The theoretical worst-case should be roughly 4 * 128 * 6 * 
sizeof(struct iova) bytes per CPU, which is around 192KB assuming 
64-byte kmem cache alignment. 
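
(Spelling that estimate out: 4 x 128 entries for each of the 6 rcache size classes is 3072 cached IOVAs per CPU, and at 64 bytes per struct iova that is 3072 x 64 = 196608 bytes, i.e. roughly 192KB per CPU.)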


There are 128 CPUs in our ARM64 Kunpeng 920 board. I added a debugfs
interface drop_rcache for every group in local code, which calls
free_rcache_cached_iovas(), and we can see that it saves 1~2 MB of memory
after freeing the cached IOVAs.


estuary:/$ free
              total       used       free     shared    buffers     cached
Mem:      526776720    1336216  525440504     177664          0     177664
-/+ buffers/cache:      1158552  525618168
Swap:             0          0          0
estuary:/sys/kernel/debug/iommu/iovad/iommu_domain29$ echo 1 > drop_rcache
estuary:/sys/kernel/debug/iommu/iovad/iommu_domain29$ free
              total       used       free     shared    buffers     cached
Mem:      526776720    1334672  525442048     177664          0     177664
-/+ buffers/cache:      1157008  525619712
Swap:             0          0          0
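
For reference, the local debugfs hook is little more than a write handler that drains the rcaches; a rough sketch (it assumes the free_rcache_cached_iovas() helper exported by this series, and the group_dir/iovad wiring shown here is purely illustrative):

static ssize_t drop_rcache_write(struct file *file, const char __user *buf,
				 size_t count, loff_t *ppos)
{
	struct iova_domain *iovad = file_inode(file)->i_private;

	/* Throw away every cached IOVA range held in the rcaches. */
	free_rcache_cached_iovas(iovad);
	return count;
}

static const struct file_operations drop_rcache_fops = {
	.owner = THIS_MODULE,
	.write = drop_rcache_write,
};

/* ...and wherever the group's debugfs directory is created: */
debugfs_create_file("drop_rcache", 0200, group_dir, iovad, &drop_rcache_fops);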

The other reason I want to free the rcache IOVAs in a suitable place is the
IOVA longterm issue[0] (which is unrecoverable): if we free the cached IOVAs
when rmmoding the driver, or add a debugfs interface to do that, I think we
can recover from it by rmmoding the driver and insmoding it again, or by
calling the debugfs drop_rcache, even though that doesn't solve the
underlying issue.


[0] 
https://lore.kernel.org/linux-iommu/1607538189-237944-4-git-send-email-john.ga...@huawei.com/


However it seems rather unlikely in practice to have every possible 
cache entry of every size used, so if saving smaller amounts of memory 
is a concern wouldn't you also want to explicitly drain the flush 
queues (16KB per CPU) and maybe look at trying to reclaim the unused 
pagetable pages from the domain itself - that ~192KB worth of cached 
IOVAs represents ~32K pages worth of IOVA space, which for an 
implementation like io-pgtable-arm with the 4KB granule means ~256KB 
worth of non-leaf pagetables left behind.
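
(For reference, the arithmetic behind those figures, assuming the six rcache size classes cover 1, 2, 4, 8, 16 and 32 pages: 4 x 128 x (1+2+4+8+16+32) = 32256, i.e. ~32K pages or ~128MB of IOVA space per CPU; with the 4KB granule each last-level table maps 2MB, so that space needs about 64 table pages x 4KB = ~256KB of pagetable memory.)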


OK, we may consider those.



I'm not against the idea of having a mechanism to "compact" an idle 
DMA domain if there are convincing numbers to back it up, but the 
actual implementation would need to be better than this as well - 
having the IOMMU core code poking directly into the internals of the 
iommu-dma abstraction is not the way to go, and exporting anything to 
modules, which the IOMMU core is not, smells suspicious.


Robin.


Xiang Chen (4):
   iommu/iova: add a function to free all rcached iovas and export it
   iommu/iova: use function free_rcache_cached_iovas() to free all
 rcached iovas
   dma-iommu: add a interface to get iova_domain from iommu domain
   iommu: free cached iovas when rmmod the driver of the last device in
 the group

  drivers/iommu/dma-iommu.c |  7 +++
  drivers/iommu/iommu.c |  7 +++
  drivers/iommu/iova.c  | 17 -
  include/linux/dma-iommu.h |  6 ++
  include/linux/iova.h  |  5 +
  5 files changed, 37 insertions(+), 5 deletions(-)



Re: [PATCH] dma-iommu: Add a check to avoid dereference null pointer in function iommu_dma_map_sg()

2021-05-21 Thread chenxiang (M)



On 2021/5/21 18:36, Robin Murphy wrote:

On 2021-05-21 04:05, chenxiang wrote:

From: Xiang Chen 

The issue is reported by the TscanCode tool: it is possible to dereference
a null pointer when prev is NULL, which is its initial value.


No it isn't. This is literally explained in the comment visible in the 
diff context below...


Robin.


ok, thanks




Signed-off-by: Xiang Chen 
---
  drivers/iommu/dma-iommu.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 4cb63b2..88a4f34 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1042,7 +1042,7 @@ static int iommu_dma_map_sg(struct device *dev, 
struct scatterlist *sg,

   *   iova_len == 0, thus we cannot dereference prev the first
   *   time through here (i.e. before it has a meaningful 
value).

   */
-if (pad_len && pad_len < s_length - 1) {
+if (prev && pad_len && pad_len < s_length - 1) {
  prev->length += pad_len;
  iova_len += pad_len;
  }




Re: [PATCH v3] iommu/iova: put free_iova_mem() outside of spinlock iova_rbtree_lock

2021-05-18 Thread chenxiang (M)

Hi Joerg,

On 2021/5/18 17:08, Joerg Roedel wrote:

On Mon, Apr 19, 2021 at 03:13:35PM +0800, chenxiang wrote:

From: Xiang Chen 

It is not necessary to call free_iova_mem() inside the iova_rbtree_lock
spinlock region, which only leads to more contention on the spinlock.
There is a small performance improvement after the change. Also rename
private_free_iova() to remove_iova(), because the function no longer frees
the iova after this change.

Signed-off-by: Xiang Chen 
---
  drivers/iommu/iova.c | 18 +++---
  1 file changed, 11 insertions(+), 7 deletions(-)

Looks good to me, but I'll wait for Robin's opinion before applying.


I have re-sent v3, and Robin has given his acked-by:
https://www.mail-archive.com/iommu@lists.linux-foundation.org/msg50950.html





Re: [RFC PATCH v5 11/15] iommu/io-pgtable-arm: Implement arm_lpae_map_pages()

2021-04-19 Thread chenxiang (M)

Hi Isaac,


On 2021/4/9 1:13, Isaac J. Manjarres wrote:

Implement the map_pages() callback for the ARM LPAE io-pgtable
format.

Signed-off-by: Isaac J. Manjarres 
---
  drivers/iommu/io-pgtable-arm.c | 42 ++
  1 file changed, 32 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 1b690911995a..92978dd9c885 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -344,20 +344,30 @@ static arm_lpae_iopte 
arm_lpae_install_table(arm_lpae_iopte *table,
  }
  
  static int __arm_lpae_map(struct arm_lpae_io_pgtable *data, unsigned long iova,

- phys_addr_t paddr, size_t size, arm_lpae_iopte prot,
- int lvl, arm_lpae_iopte *ptep, gfp_t gfp)
+ phys_addr_t paddr, size_t size, size_t pgcount,
+ arm_lpae_iopte prot, int lvl, arm_lpae_iopte *ptep,
+ gfp_t gfp, size_t *mapped)
  {
arm_lpae_iopte *cptep, pte;
size_t block_size = ARM_LPAE_BLOCK_SIZE(lvl, data);
size_t tblsz = ARM_LPAE_GRANULE(data);
	struct io_pgtable_cfg *cfg = &data->iop.cfg;
+   int ret = 0, num_entries, max_entries, map_idx_start;
  
  	/* Find our entry at the current level */

-   ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
+   map_idx_start = ARM_LPAE_LVL_IDX(iova, lvl, data);
+   ptep += map_idx_start;
  
  	/* If we can install a leaf entry at this level, then do so */

-   if (size == block_size)
-   return arm_lpae_init_pte(data, iova, paddr, prot, lvl, 1, ptep);
+   if (size == block_size) {
+   max_entries = ARM_LPAE_PTES_PER_TABLE(data) - map_idx_start;
+   num_entries = min_t(int, pgcount, max_entries);
+   ret = arm_lpae_init_pte(data, iova, paddr, prot, lvl, 
num_entries, ptep);
+   if (!ret && mapped)
+   *mapped += num_entries * size;
+
+   return ret;
+   }
  
  	/* We can't allocate tables at the final level */

if (WARN_ON(lvl >= ARM_LPAE_MAX_LEVELS - 1))
@@ -386,7 +396,8 @@ static int __arm_lpae_map(struct arm_lpae_io_pgtable *data, 
unsigned long iova,
}
  
  	/* Rinse, repeat */

-   return __arm_lpae_map(data, iova, paddr, size, prot, lvl + 1, cptep, 
gfp);
+   return __arm_lpae_map(data, iova, paddr, size, pgcount, prot, lvl + 1,
+ cptep, gfp, mapped);
  }
  
  static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,

@@ -453,8 +464,9 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct 
arm_lpae_io_pgtable *data,
return pte;
  }
  
-static int arm_lpae_map(struct io_pgtable_ops *ops, unsigned long iova,

-   phys_addr_t paddr, size_t size, int iommu_prot, gfp_t 
gfp)
+static int arm_lpae_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
+ phys_addr_t paddr, size_t pgsize, size_t pgcount,
+ int iommu_prot, gfp_t gfp, size_t *mapped)
  {
struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
	struct io_pgtable_cfg *cfg = &data->iop.cfg;
@@ -463,7 +475,7 @@ static int arm_lpae_map(struct io_pgtable_ops *ops, 
unsigned long iova,
arm_lpae_iopte prot;
long iaext = (s64)iova >> cfg->ias;
  
-	if (WARN_ON(!size || (size & cfg->pgsize_bitmap) != size))

+   if (WARN_ON(!pgsize || (pgsize & cfg->pgsize_bitmap) != pgsize))
return -EINVAL;
  
  	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)

@@ -476,7 +488,8 @@ static int arm_lpae_map(struct io_pgtable_ops *ops, 
unsigned long iova,
return 0;
  
  	prot = arm_lpae_prot_to_pte(data, iommu_prot);

-   ret = __arm_lpae_map(data, iova, paddr, size, prot, lvl, ptep, gfp);
+   ret = __arm_lpae_map(data, iova, paddr, pgsize, pgcount, prot, lvl,
+ptep, gfp, NULL);


The last input parameter should be mapped instead of NULL.
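
That is, presumably:

	ret = __arm_lpae_map(data, iova, paddr, pgsize, pgcount, prot, lvl,
			     ptep, gfp, mapped);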


/*
 * Synchronise all PTE updates for the new mapping before there's
 * a chance for anything to kick off a table walk for the new iova.
@@ -486,6 +499,14 @@ static int arm_lpae_map(struct io_pgtable_ops *ops, 
unsigned long iova,
return ret;
  }
  
+

+static int arm_lpae_map(struct io_pgtable_ops *ops, unsigned long iova,
+   phys_addr_t paddr, size_t size, int iommu_prot, gfp_t 
gfp)
+{
+   return arm_lpae_map_pages(ops, iova, paddr, size, 1, iommu_prot, gfp,
+ NULL);
+}
+
  static void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, int lvl,
arm_lpae_iopte *ptep)
  {
@@ -782,6 +803,7 @@ arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg)
  
  	data->iop.ops = (struct io_pgtable_ops) {

.map= arm_lpae_map,
+   .map_pages  = arm_lpae_map_pages,

Re: [PATCH v2] iommu/iova: put free_iova_mem() outside of spinlock iova_rbtree_lock

2021-04-16 Thread chenxiang (M)



On 2021/4/16 16:53, John Garry wrote:

On 16/04/2021 04:30, chenxiang (M) wrote:


you need to make this:
if (iova)
free_iova_mem(iova);

as free_iova_mem(iova) dereferences iova:

if (iova->pfn_lo != IOVA_ANCHOR)
kmem_cache_free(iova_cache, iova)

So if iova were NULL, we crash.

Or you can make free_iova_mem() NULL safe.


Right, it has the issue. What about changing it as below?

@@ -472,10 +472,13 @@ free_iova(struct iova_domain *iovad, unsigned long pfn)

 	spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
 	iova = private_find_iova(iovad, pfn);
-	if (iova)
-		private_free_iova(iovad, iova);
+	if (!iova) {
+		spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
+		return;
+	}
+	remove_iova(iovad, iova);
 	spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
-
+	free_iova_mem(iova);
 }
 EXPORT_SYMBOL_GPL(free_iova);


I don't feel too strongly about how it's done.

Please note that kmem_cache_free() is not NULL safe. So the NULL check 
could be added in free_iova_mem(), but we prob don't want silent 
free_iova_mem(NULL) calls, so I would stick with changing free_iova().


"so I would stick with changing free_iova()"
- Do you mean free_iova_mem()?

But I think there is already a check (that iova is not NULL) before
free_iova_mem() in the following scenarios, so it is not necessary to change
free_iova_mem():

1) iova_magazine_free_pfns()
2) alloc_iova()
3) free_iova()
4) __free_iova(): Those functions which call __free_iova() have a check

Only for put_iova_domain() is there no check on iova, but I think an iova
in the rbtree should not be NULL.




Thanks,
John


Re: [PATCH v2] iommu/iova: put free_iova_mem() outside of spinlock iova_rbtree_lock

2021-04-15 Thread chenxiang (M)



On 2021/4/15 17:49, John Garry wrote:

On 15/04/2021 04:52, chenxiang wrote:

From: Xiang Chen 

It is not necessary to call free_iova_mem() inside the iova_rbtree_lock
spinlock region, which only leads to more contention on the spinlock.
There is a small performance improvement after the change. Also rename
private_free_iova() to remove_iova(), because the function no longer frees
the iova after this change.

Signed-off-by: Xiang Chen 
---
  drivers/iommu/iova.c | 13 +++--
  1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index b7ecd5b..c10af23 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -412,12 +412,11 @@ private_find_iova(struct iova_domain *iovad, unsigned long pfn)
 	return NULL;
 }
 
-static void private_free_iova(struct iova_domain *iovad, struct iova *iova)
+static void remove_iova(struct iova_domain *iovad, struct iova *iova)
 {
 	assert_spin_locked(&iovad->iova_rbtree_lock);
 	__cached_rbnode_delete_update(iovad, iova);
 	rb_erase(&iova->node, &iovad->rbroot);
-	free_iova_mem(iova);
 }
 
 /**
@@ -452,8 +451,9 @@ __free_iova(struct iova_domain *iovad, struct iova *iova)
 	unsigned long flags;
 
 	spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
-	private_free_iova(iovad, iova);
+	remove_iova(iovad, iova);
 	spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
+	free_iova_mem(iova);
 }
 EXPORT_SYMBOL_GPL(__free_iova);
 
@@ -473,9 +473,9 @@ free_iova(struct iova_domain *iovad, unsigned long pfn)
 	spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
 	iova = private_find_iova(iovad, pfn);
 	if (iova)
-		private_free_iova(iovad, iova);
+		remove_iova(iovad, iova);
 	spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
-
+	free_iova_mem(iova);


you need to make this:
if (iova)
free_iova_mem(iova);

as free_iova_mem(iova) dereferences iova:

if (iova->pfn_lo != IOVA_ANCHOR)
kmem_cache_free(iova_cache, iova)

So if iova were NULL, we crash.

Or you can make free_iova_mem() NULL safe.


Right, it has the issue. What about changing it as below?

@@ -472,10 +472,13 @@ free_iova(struct iova_domain *iovad, unsigned long pfn)

 	spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
 	iova = private_find_iova(iovad, pfn);
-	if (iova)
-		private_free_iova(iovad, iova);
+	if (!iova) {
+		spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
+		return;
+	}
+	remove_iova(iovad, iova);
 	spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
-
+	free_iova_mem(iova);
 }
 EXPORT_SYMBOL_GPL(free_iova);




  }
  EXPORT_SYMBOL_GPL(free_iova);
@@ -825,7 +825,8 @@ iova_magazine_free_pfns(struct iova_magazine *mag, struct iova_domain *iovad)
 		if (WARN_ON(!iova))
 			continue;
 
-		private_free_iova(iovad, iova);
+		remove_iova(iovad, iova);
+		free_iova_mem(iova);
 	}
 	spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);





Re: [PATCH] iommu/iova: put free_iova_mem() outside of spinlock iova_rbtree_lock

2021-04-14 Thread chenxiang (M)

Hi Robin,


On 2021/4/14 21:17, Robin Murphy wrote:

On 2021-04-14 07:38, chenxiang wrote:

From: Xiang Chen 

It is not necessary to call free_iova_mem() inside the iova_rbtree_lock
spinlock region, which only leads to more contention on the spinlock.
There is a small performance improvement after the change.


This seems not entirely unreasonable, but private_free_iova() really 
needs to be renamed (maybe something like remove_iova()?) if it's no 
longer actually freeing anything - otherwise it's just unnecessarily 
misleading.


OK, I will rename private_free_iova() in the next version.



Robin.


Signed-off-by: Xiang Chen 
---
  drivers/iommu/iova.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index c669526f..292ed4a 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -339,7 +339,6 @@ static void private_free_iova(struct iova_domain *iovad, struct iova *iova)
 	assert_spin_locked(&iovad->iova_rbtree_lock);
 	__cached_rbnode_delete_update(iovad, iova);
 	rb_erase(&iova->node, &iovad->rbroot);
-	free_iova_mem(iova);
 }
 
 /**
@@ -376,6 +375,7 @@ __free_iova(struct iova_domain *iovad, struct iova *iova)
 	spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
 	private_free_iova(iovad, iova);
 	spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
+	free_iova_mem(iova);
 }
 EXPORT_SYMBOL_GPL(__free_iova);
 
@@ -397,7 +397,7 @@ free_iova(struct iova_domain *iovad, unsigned long pfn)
 	if (iova)
 		private_free_iova(iovad, iova);
 	spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
-
+	free_iova_mem(iova);
 }
 EXPORT_SYMBOL_GPL(free_iova);
 
@@ -746,6 +746,7 @@ iova_magazine_free_pfns(struct iova_magazine *mag, struct iova_domain *iovad)
 			continue;
 
 		private_free_iova(iovad, iova);
+		free_iova_mem(iova);
 	}
 	spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);




Re: [PATCH] iommu: Add device name to iommu map/unmap trace events

2021-04-06 Thread chenxiang (M)

Hi,


On 2021/2/12 18:50, Joerg Roedel wrote:

On Tue, Feb 09, 2021 at 06:06:20PM +0530, Sai Prakash Ranjan wrote:

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 5e7fe519430a..6064187d9bb6 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -87,6 +87,7 @@ struct iommu_domain {
void *handler_token;
struct iommu_domain_geometry geometry;
void *iova_cookie;
+   char dev_name[32];
  };

No, definitly not. A domain is a device DMA address space which can be
used by more than one device. Just look at IOMMU groups with more than
one member device, in this case just one device name would be very
misleading.


Is it possible to use group id to identify different domains?
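
Purely as an illustration of that idea (it would mean extending the existing map/unmap trace events with an extra field, and assumes the call sites still have the struct device at hand - not a concrete proposal):

	/* Record the device's IOMMU group ID instead of caching a
	 * device name in the shared iommu_domain. */
	trace_map(iommu_group_id(dev->iommu_group), iova, paddr, size);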




Regards,

Joerg

Re: [RFC PATCH 0/5] Optimization for unmapping iommu mapped buffers

2021-03-31 Thread chenxiang (M)

Hi Isaac,


On 2021/3/31 11:00, Isaac J. Manjarres wrote:

When unmapping a buffer from an IOMMU domain, the IOMMU framework unmaps
the buffer at a granule of the largest page size that is supported by
the IOMMU hardware and fits within the buffer. For every block that
is unmapped, the IOMMU framework will call into the IOMMU driver, and
then the io-pgtable framework to walk the page tables to find the entry
that corresponds to the IOVA, and then unmaps the entry.

This can be suboptimal in scenarios where a buffer or a piece of a
buffer can be split into several contiguous page blocks of the same size.
For example, consider an IOMMU that supports 4 KB page blocks, 2 MB page
blocks, and 1 GB page blocks, and a buffer that is 4 MB in size is being
unmapped at IOVA 0. The current call-flow will result in 4 indirect calls,
and 2 page table walks, to unmap 2 entries that are next to each other in
the page-tables, when both entries could have been unmapped in one shot
by clearing both page table entries in the same call.

These patches implement a callback called unmap_pages to the io-pgtable
code and IOMMU drivers which unmaps an IOVA range that consists of a
number of pages of the same page size that is supported by the IOMMU
hardware, and allows for clearing multiple entries in the same set of
indirect calls. The reason for introducing unmap_pages is to give
other IOMMU drivers/io-pgtable formats time to change to using the new
unmap_pages callback, so that the transition to using this approach can be
done piecemeal.

The same optimization is applicable for mapping buffers, however, the
error handling in the io-pgtable layer couldn't be handled cleanly, as we
would need to invoke iommu_unmap to unmap the parts of the buffer that
were mapped, and then do any TLB maintenance. However, that seemed like a
layering violation.

Any feedback is very much appreciated.


I applied this patchset and implemented it for SMMUv3 on my Kunpeng ARM64
platform, and tested the map/unmap latency with the dma_map_benchmark tool
(./dma_map_benchmark -g xxx); it improves the unmap latency a lot (times
in us). Maybe you can also add the SMMUv3 implementation. The test result
is as follows:


                   latency of map/unmap (before opt)   latency of map/unmap (after opt)
g=1   (4K size)     0.1/0.7                             0.1/0.7
g=2   (8K size)     0.2/1.4                             0.2/0.8
g=4   (16K size)    0.3/2.7                             0.3/0.9
g=8   (32K size)    0.5/5.4                             0.5/1.2
g=16  (64K size)    1/10.7                              1/1.8
g=32  (128K size)   1.8/21.4                            1.8/2.8
g=64  (256K size)   3.6/42.9                            3.6/5.1
g=128 (512K size)   7/85.6                              7/8.6
g=256 (1M size)     13.9/171.1                          13.9/15.5
g=512 (2M size)     0.2/0.7                             0.2/0.9
g=1024(4M size)     0.3/1.5                             0.3/1.1


The change to SMMUv3 for the test is as follows:

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 8594b4a..e0268b1 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2292,6 +2292,20 @@ static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
 
 	return ops->unmap(ops, iova, size, gather);
 }
 
+static size_t arm_smmu_unmap_pages(struct iommu_domain *domain, unsigned long iova,
+				   size_t pgsize, size_t pgcount,
+				   struct iommu_iotlb_gather *gather)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
+
+   if (!ops)
+   return 0;
+
+   return ops->unmap_pages(ops, iova, pgsize, pgcount, gather);
+}
+
+
 static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain)
 {
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
@@ -2613,6 +2627,7 @@ static struct iommu_ops arm_smmu_ops = {
.attach_dev = arm_smmu_attach_dev,
.map= arm_smmu_map,
.unmap  = arm_smmu_unmap,
+   .unmap_pages= arm_smmu_unmap_pages,
.flush_iotlb_all= arm_smmu_flush_iotlb_all,
.iotlb_sync = arm_smmu_iotlb_sync,
.iova_to_phys   = arm_smmu_iova_to_phys,



Thanks,
Isaac

Isaac J. Manjarres (5):
   iommu/io-pgtable: Introduce unmap_pages() as a page table op
   iommu: Add an unmap_pages() op for IOMMU drivers
   iommu: Add support for the unmap_pages IOMMU callback
   iommu/io-pgtable-arm: Implement arm_lpae_unmap_pages()
   iommu/arm-smmu: Implement the unmap_pages IOMMU driver callback

  drivers/iommu/arm/arm-smmu/arm-smmu.c |  19 +
  drivers/iommu/io-pgtable-arm.c| 114 +-
  drivers/iommu/iommu.c |  44 --
  include/linux/io-pgtable.h|   4 +
  include/linux/iommu.h |   4 +
  5 files changed, 159 insertions(+), 26 deletions(-)





Re: [PATCH] iommu: Add a check to avoid invalid iotlb sync

2021-03-30 Thread chenxiang (M)



On 2021/3/30 17:25, Will Deacon wrote:

On Tue, Mar 30, 2021 at 10:04:53AM +0100, Robin Murphy wrote:

On 2021-03-30 02:22, chenxiang (M) wrote:

Hi Will,


On 2021/3/29 22:45, Will Deacon wrote:

On Sat, Mar 27, 2021 at 02:23:10PM +0800, chenxiang wrote:

From: Xiang Chen 

Currently it will send an iotlb sync at the end of iommu unmap even if the
iotlb_gather is not valid (iotlb_gather->pgsize = 0). That is actually
unnecessary, so add a check to avoid the invalid iotlb sync.

Signed-off-by: Xiang Chen 
---
   include/linux/iommu.h | 3 +++
   1 file changed, 3 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 9ca6e6b..6afa61b 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -529,6 +529,9 @@ static inline void
iommu_flush_iotlb_all(struct iommu_domain *domain)
   static inline void iommu_iotlb_sync(struct iommu_domain *domain,
 struct iommu_iotlb_gather *iotlb_gather)
   {
+if (!iotlb_gather->pgsize)
+return;

In which circumstances does this occur?

Will

When iommu_unmap() starts, it initializes the iotlb_gather (so
iotlb_gather->pgsize = 0).
If the unmap size == ARM_LPAE_BLOCK_SIZE(lvl) && iopte_leaf(), it fills the
iotlb_gather (sets iotlb_gather->pgsize = granule);
but if the unmap size == ARM_LPAE_BLOCK_SIZE(lvl) && !iopte_leaf() (the
TLB-flush-walk situation), the iotlb_gather is not filled
(iotlb_gather->pgsize = 0),
yet it still calls iommu_iotlb_sync(domain, &iotlb_gather), which ends up
being an invalid iotlb sync.

I guess this can happen in DMA API usage if we've previously mapped a
block's worth of scatterlist pages into a block-sized IOVA region, but this
is not the place to do anything about it. The main thing this patch will do
is break all the other drivers that implement .iotlb_sync but do not use
iotlb_gather.

By all means optimise SMMUv3's sync behaviour, but the only valid place to
do that is in SMMUv3's own sync callback. These particular details are
beyond what the IOMMU core knows about.

Yes, that's where I was heading. Chenxiang, please could you send a v2
with the check inside arm_smmu_iotlb_sync() instead?


OK, thanks, I will send a v2 later.
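
Something along these lines (sketch only, not the actual v2):

static void arm_smmu_iotlb_sync(struct iommu_domain *domain,
				struct iommu_iotlb_gather *gather)
{
	/* Nothing was gathered, so there is nothing to invalidate. */
	if (!gather->pgsize)
		return;

	/* ... existing range invalidation based on the gather ... */
}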



Thanks,

Will


Re: [PATCH] iommu: Add a check to avoid invalid iotlb sync

2021-03-29 Thread chenxiang (M)

Hi Will,


On 2021/3/29 22:45, Will Deacon wrote:

On Sat, Mar 27, 2021 at 02:23:10PM +0800, chenxiang wrote:

From: Xiang Chen 

Currently it will send an iotlb sync at the end of iommu unmap even if the
iotlb_gather is not valid (iotlb_gather->pgsize = 0). That is actually
unnecessary, so add a check to avoid the invalid iotlb sync.

Signed-off-by: Xiang Chen 
---
  include/linux/iommu.h | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 9ca6e6b..6afa61b 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -529,6 +529,9 @@ static inline void iommu_flush_iotlb_all(struct 
iommu_domain *domain)
  static inline void iommu_iotlb_sync(struct iommu_domain *domain,
  struct iommu_iotlb_gather *iotlb_gather)
  {
+   if (!iotlb_gather->pgsize)
+   return;

In which circumstances does this occur?

Will


When iommu_unmap() starts, it initializes the iotlb_gather (so
iotlb_gather->pgsize = 0).
If the unmap size == ARM_LPAE_BLOCK_SIZE(lvl) && iopte_leaf(), it fills the
iotlb_gather (sets iotlb_gather->pgsize = granule);
but if the unmap size == ARM_LPAE_BLOCK_SIZE(lvl) && !iopte_leaf() (the
TLB-flush-walk situation), the iotlb_gather is not filled
(iotlb_gather->pgsize = 0),
yet it still calls iommu_iotlb_sync(domain, &iotlb_gather), which ends up
being an invalid iotlb sync.






Re: [Linuxarm] Re: [PATCH v14 07/13] iommu/smmuv3: Implement cache_invalidate

2021-03-22 Thread chenxiang (M)

Hi Eric,


On 2021/3/22 17:05, Auger Eric wrote:

Hi Chenxiang,

On 3/22/21 7:40 AM, chenxiang (M) wrote:

Hi Eric,


On 2021/3/20 1:36, Auger Eric wrote:

Hi Chenxiang,

On 3/4/21 8:55 AM, chenxiang (M) wrote:

Hi Eric,


On 2021/2/24 4:56, Eric Auger wrote:

Implement domain-selective, pasid selective and page-selective
IOTLB invalidations.

Signed-off-by: Eric Auger 

---

v13 -> v14:
- Add domain invalidation
- do global inval when asid is not provided with addr
granularity

v7 -> v8:
- ASID based invalidation using iommu_inv_pasid_info
- check ARCHID/PASID flags in addr based invalidation
- use __arm_smmu_tlb_inv_context and __arm_smmu_tlb_inv_range_nosync

v6 -> v7
- check the uapi version

v3 -> v4:
- adapt to changes in the uapi
- add support for leaf parameter
- do not use arm_smmu_tlb_inv_range_nosync or arm_smmu_tlb_inv_context
anymore

v2 -> v3:
- replace __arm_smmu_tlb_sync by arm_smmu_cmdq_issue_sync

v1 -> v2:
- properly pass the asid
---
   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 74
+
   1 file changed, 74 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 4c19a1114de4..df3adc49111c 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2949,6 +2949,79 @@ static void
arm_smmu_detach_pasid_table(struct iommu_domain *domain)
    mutex_unlock(&smmu_domain->init_mutex);
   }
   +static int
+arm_smmu_cache_invalidate(struct iommu_domain *domain, struct
device *dev,
+  struct iommu_cache_invalidate_info *inv_info)
+{
+struct arm_smmu_cmdq_ent cmd = {.opcode = CMDQ_OP_TLBI_NSNH_ALL};
+struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
+return -EINVAL;
+
+if (!smmu)
+return -EINVAL;
+
+if (inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
+return -EINVAL;
+
+if (inv_info->cache & IOMMU_CACHE_INV_TYPE_PASID ||
+inv_info->cache & IOMMU_CACHE_INV_TYPE_DEV_IOTLB) {
+return -ENOENT;
+}
+
+if (!(inv_info->cache & IOMMU_CACHE_INV_TYPE_IOTLB))
+return -EINVAL;
+
+/* IOTLB invalidation */
+
+switch (inv_info->granularity) {
+case IOMMU_INV_GRANU_PASID:
+{
+struct iommu_inv_pasid_info *info =
+&inv_info->granu.pasid_info;
+
+if (info->flags & IOMMU_INV_ADDR_FLAGS_PASID)
+return -ENOENT;
+if (!(info->flags & IOMMU_INV_PASID_FLAGS_ARCHID))
+return -EINVAL;
+
+__arm_smmu_tlb_inv_context(smmu_domain, info->archid);
+return 0;
+}
+case IOMMU_INV_GRANU_ADDR:
+{
+struct iommu_inv_addr_info *info = &inv_info->granu.addr_info;
+size_t size = info->nb_granules * info->granule_size;
+bool leaf = info->flags & IOMMU_INV_ADDR_FLAGS_LEAF;
+
+if (info->flags & IOMMU_INV_ADDR_FLAGS_PASID)
+return -ENOENT;
+
+if (!(info->flags & IOMMU_INV_ADDR_FLAGS_ARCHID))
+break;
+
+arm_smmu_tlb_inv_range_domain(info->addr, size,
+  info->granule_size, leaf,
+  info->archid, smmu_domain);

Is it possible to add a check before this call to make sure that
info->granule_size can be recognized by the SMMU?
There is a scenario which will cause a TLBI issue: the RIL feature is
enabled in the guest but disabled on the host, so the host can only
invalidate 4K/2M/1G at a time; but from QEMU, info->nb_granules is always 1
and info->granule_size = size,
so if size is not equal to 4K, 2M or 1G (for example size = granule_size
= 5M), it will only invalidate part of the range it wants to invalidate.

Do you have any idea about this issue?

At the QEMU VFIO notifier level, when I fill the struct
iommu_cache_invalidate_info, I currently miss the granule info, hence this
weird choice of always setting info->nb_granules to 1 and
info->granule_size = size. I think that in arm_smmu_cache_invalidate() I
need to convert this info back to the leaf page size, in case the host does
not support RIL, just as is done in __arm_smmu_tlb_inv_range when RIL is
supported.

Does it make sense to you?


Yes, it makes sense to me.
I am glad to test it when the patchset is ready.




Thank you for spotting the issue.

Eric

I think maybe we can add a check here: if RIL is not enabled and also
size is not the granule_size (4K/2M/1G) supported by
SMMU hardware, can we just simply use the smallest granule_size
supported by hardware all the time?
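
To illustrate what I mean (a rough sketch only; it assumes the driver's RIL capability bit is ARM_SMMU_FEAT_RANGE_INV and reuses the variables already in this function, so the exact form is up for discussion):

	size_t granule = info->granule_size;

	/*
	 * Sketch: if the host SMMU cannot do range invalidation (RIL) and
	 * the guest-provided granule is not one of our supported page
	 * sizes, fall back to the smallest supported leaf granule so the
	 * whole range still gets invalidated.
	 */
	if (!(smmu->features & ARM_SMMU_FEAT_RANGE_INV) &&
	    (!is_power_of_2(granule) || !(domain->pgsize_bitmap & granule)))
		granule = 1UL << __ffs(domain->pgsize_bitmap);

	arm_smmu_tlb_inv_range_domain(info->addr, size, granule, leaf,
				      info->archid, smmu_domain);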


+
+arm_smmu_cmdq_issue_sync(smmu);
+return 0;
+}
+case IOMMU_INV_GRANU_DOMAIN:
+break;

I check the qemu code
(https://github.com/eauger/qemu/tree/v5.2.0-2stage-rfcv8), for opcode
CMD_TLBI_NH_A

Re: [Linuxarm] Re: [PATCH v14 07/13] iommu/smmuv3: Implement cache_invalidate

2021-03-22 Thread chenxiang (M)

Hi Eric,


On 2021/3/20 1:36, Auger Eric wrote:

Hi Chenxiang,

On 3/4/21 8:55 AM, chenxiang (M) wrote:

Hi Eric,


On 2021/2/24 4:56, Eric Auger wrote:

Implement domain-selective, pasid selective and page-selective
IOTLB invalidations.

Signed-off-by: Eric Auger 

---

v13 -> v14:
- Add domain invalidation
- do global inval when asid is not provided with addr
   granularity

v7 -> v8:
- ASID based invalidation using iommu_inv_pasid_info
- check ARCHID/PASID flags in addr based invalidation
- use __arm_smmu_tlb_inv_context and __arm_smmu_tlb_inv_range_nosync

v6 -> v7
- check the uapi version

v3 -> v4:
- adapt to changes in the uapi
- add support for leaf parameter
- do not use arm_smmu_tlb_inv_range_nosync or arm_smmu_tlb_inv_context
   anymore

v2 -> v3:
- replace __arm_smmu_tlb_sync by arm_smmu_cmdq_issue_sync

v1 -> v2:
- properly pass the asid
---
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 74 +
  1 file changed, 74 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 4c19a1114de4..df3adc49111c 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2949,6 +2949,79 @@ static void arm_smmu_detach_pasid_table(struct 
iommu_domain *domain)
	mutex_unlock(&smmu_domain->init_mutex);
  }
  
+static int

+arm_smmu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
+ struct iommu_cache_invalidate_info *inv_info)
+{
+   struct arm_smmu_cmdq_ent cmd = {.opcode = CMDQ_OP_TLBI_NSNH_ALL};
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
+   return -EINVAL;
+
+   if (!smmu)
+   return -EINVAL;
+
+   if (inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
+   return -EINVAL;
+
+   if (inv_info->cache & IOMMU_CACHE_INV_TYPE_PASID ||
+   inv_info->cache & IOMMU_CACHE_INV_TYPE_DEV_IOTLB) {
+   return -ENOENT;
+   }
+
+   if (!(inv_info->cache & IOMMU_CACHE_INV_TYPE_IOTLB))
+   return -EINVAL;
+
+   /* IOTLB invalidation */
+
+   switch (inv_info->granularity) {
+   case IOMMU_INV_GRANU_PASID:
+   {
+   struct iommu_inv_pasid_info *info =
+   &inv_info->granu.pasid_info;
+
+   if (info->flags & IOMMU_INV_ADDR_FLAGS_PASID)
+   return -ENOENT;
+   if (!(info->flags & IOMMU_INV_PASID_FLAGS_ARCHID))
+   return -EINVAL;
+
+   __arm_smmu_tlb_inv_context(smmu_domain, info->archid);
+   return 0;
+   }
+   case IOMMU_INV_GRANU_ADDR:
+   {
+   struct iommu_inv_addr_info *info = &inv_info->granu.addr_info;
+   size_t size = info->nb_granules * info->granule_size;
+   bool leaf = info->flags & IOMMU_INV_ADDR_FLAGS_LEAF;
+
+   if (info->flags & IOMMU_INV_ADDR_FLAGS_PASID)
+   return -ENOENT;
+
+   if (!(info->flags & IOMMU_INV_ADDR_FLAGS_ARCHID))
+   break;
+
+   arm_smmu_tlb_inv_range_domain(info->addr, size,
+ info->granule_size, leaf,
+ info->archid, smmu_domain);

Is it possible to add a check before this call to make sure that
info->granule_size can be recognized by the SMMU?
There is a scenario which will cause a TLBI issue: the RIL feature is
enabled in the guest but disabled on the host, so the host can only
invalidate 4K/2M/1G at a time; but from QEMU,
info->nb_granules is always 1 and info->granule_size = size,
so if size is not equal to 4K, 2M or 1G (for example size = granule_size
= 5M), it will only invalidate part of the range it wants to invalidate.


Do you have any idea about this issue?



I think maybe we can add a check here: if RIL is not enabled and also
size is not the granule_size (4K/2M/1G) supported by
SMMU hardware, can we just simply use the smallest granule_size
supported by hardware all the time?


+
+   arm_smmu_cmdq_issue_sync(smmu);
+   return 0;
+   }
+   case IOMMU_INV_GRANU_DOMAIN:
+   break;

I checked the QEMU code
(https://github.com/eauger/qemu/tree/v5.2.0-2stage-rfcv8): for opcode
CMD_TLBI_NH_ALL or CMD_TLBI_NSNH_ALL from the guest OS,
it calls smmu_inv_notifiers_all() to unmap all notifiers of all MRs,
but it does not seem to set event.entry.granularity, which I think should
be set to IOMMU_INV_GRAN_ADDR.

this is because IOMMU_INV_GRAN_ADDR = 0. But for clarity I should rather
set it explicitly ;-)


ok


BTW, for opcode CMD_TLBI_NH_ALL or CMD_TLBI_NSNH_ALL, it needs to unmap
size = 0x1 on 48bit s

Re: [PATCH] iommu/dma: Fix a typo in a comment

2021-03-21 Thread chenxiang (M)



On 2021/3/19 19:02, Robin Murphy wrote:

On 2021-03-19 01:02, chenxiang (M) wrote:

Hi,


On 2021/3/18 19:01, Robin Murphy wrote:

On 2021-03-18 09:55, chenxiang (M) wrote:

Hi will,


On 2021/3/18 17:34, Will Deacon wrote:

On Thu, Mar 18, 2021 at 11:21:24AM +0800, chenxiang wrote:

From: Xiang Chen 

Fix a typo "SAC" to "DAC" in the comment of the function
iommu_dma_alloc_iova().

Signed-off-by: Xiang Chen 
---
  drivers/iommu/dma-iommu.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index af765c8..3bc17ee 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -443,7 +443,7 @@ static dma_addr_t iommu_dma_alloc_iova(struct 
iommu_domain *domain,

  if (domain->geometry.force_aperture)
  dma_limit = min(dma_limit, 
(u64)domain->geometry.aperture_end);

-/* Try to get PCI devices a SAC address */
+/* Try to get PCI devices a DAC address */
  if (dma_limit > DMA_BIT_MASK(32) && dev_is_pci(dev))
  iova = alloc_iova_fast(iovad, iova_len,
 DMA_BIT_MASK(32) >> shift, false);

This doesn't look like a typo to me... Please explain.


As the author of said comment, it is definitely not a typo.


What is SAC short for?


Single Address Cycle - this is standard terminology from the PCI spec.


Got it, thanks.



Robin.


Re: [PATCH] iommu/dma: Fix a typo in a comment

2021-03-18 Thread chenxiang (M)

Hi,


On 2021/3/18 19:01, Robin Murphy wrote:

On 2021-03-18 09:55, chenxiang (M) wrote:

Hi will,


On 2021/3/18 17:34, Will Deacon wrote:

On Thu, Mar 18, 2021 at 11:21:24AM +0800, chenxiang wrote:

From: Xiang Chen 

Fix a typo "SAC" to "DAC" in the comment of the function
iommu_dma_alloc_iova().

Signed-off-by: Xiang Chen 
---
  drivers/iommu/dma-iommu.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index af765c8..3bc17ee 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -443,7 +443,7 @@ static dma_addr_t iommu_dma_alloc_iova(struct 
iommu_domain *domain,

  if (domain->geometry.force_aperture)
  dma_limit = min(dma_limit, 
(u64)domain->geometry.aperture_end);

-/* Try to get PCI devices a SAC address */
+/* Try to get PCI devices a DAC address */
  if (dma_limit > DMA_BIT_MASK(32) && dev_is_pci(dev))
  iova = alloc_iova_fast(iovad, iova_len,
 DMA_BIT_MASK(32) >> shift, false);

This doesn't look like a typo to me... Please explain.


As the author of said comment, it is definitely not a typo.


What is SAC short for?



I think it means double-address cycle (DAC); on page 452 of LDD3 there is
a description of it: "PCI double-address cycle mappings: Normally,
the DMA support layer works with 32-bit bus addresses, possibly
restricted by a specific device's DMA mask. The PCI bus, however,
also supports a 64-bit addressing mode, the double-address cycle (DAC)."

Please point it out if I am wrong :)


Yes, now look at what the code above does: *if* a PCI device has a DMA 
mask greater than 32 bits (i.e. can support dual address cycles), we 
first try an IOVA allocation with an explicit 32-bit limit (i.e. that 
can be expressed in a single address cycle), in the hope that we don't 
have to fall back to allocating from the upper range and forcing the 
device to actually use DAC (and suffer the potential performance impact).


Robin.


Re: [PATCH] iommu/dma: Fix a typo in a comment

2021-03-18 Thread chenxiang (M)

Hi will,


On 2021/3/18 17:34, Will Deacon wrote:

On Thu, Mar 18, 2021 at 11:21:24AM +0800, chenxiang wrote:

From: Xiang Chen 

Fix a typo "SAC" to "DAC" in the comment of the function
iommu_dma_alloc_iova().

Signed-off-by: Xiang Chen 
---
  drivers/iommu/dma-iommu.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index af765c8..3bc17ee 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -443,7 +443,7 @@ static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain 
*domain,
if (domain->geometry.force_aperture)
dma_limit = min(dma_limit, (u64)domain->geometry.aperture_end);
  
-	/* Try to get PCI devices a SAC address */

+   /* Try to get PCI devices a DAC address */
if (dma_limit > DMA_BIT_MASK(32) && dev_is_pci(dev))
iova = alloc_iova_fast(iovad, iova_len,
   DMA_BIT_MASK(32) >> shift, false);

This doesn't look like a typo to me... Please explain.


I think it means double-address cycle (DAC); on page 452 of LDD3 there is
a description of it: "PCI double-address cycle mappings: Normally,
the DMA support layer works with 32-bit bus addresses, possibly
restricted by a specific device's DMA mask. The PCI bus, however, also
supports a 64-bit addressing mode, the double-address cycle (DAC)."

Please point it out if I am wrong :)

Best Regards,
chenxiang



Will

.





Re: [PATCH v14 07/13] iommu/smmuv3: Implement cache_invalidate

2021-03-03 Thread chenxiang (M)

Hi Eric,


在 2021/2/24 4:56, Eric Auger 写道:

Implement domain-selective, pasid selective and page-selective
IOTLB invalidations.

Signed-off-by: Eric Auger 

---

v13 -> v14:
- Add domain invalidation
- do global inval when asid is not provided with addr
   granularity

v7 -> v8:
- ASID based invalidation using iommu_inv_pasid_info
- check ARCHID/PASID flags in addr based invalidation
- use __arm_smmu_tlb_inv_context and __arm_smmu_tlb_inv_range_nosync

v6 -> v7
- check the uapi version

v3 -> v4:
- adapt to changes in the uapi
- add support for leaf parameter
- do not use arm_smmu_tlb_inv_range_nosync or arm_smmu_tlb_inv_context
   anymore

v2 -> v3:
- replace __arm_smmu_tlb_sync by arm_smmu_cmdq_issue_sync

v1 -> v2:
- properly pass the asid
---
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 74 +
  1 file changed, 74 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 4c19a1114de4..df3adc49111c 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2949,6 +2949,79 @@ static void arm_smmu_detach_pasid_table(struct iommu_domain *domain)
	mutex_unlock(&smmu_domain->init_mutex);
  }
  
+static int
+arm_smmu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
+ struct iommu_cache_invalidate_info *inv_info)
+{
+   struct arm_smmu_cmdq_ent cmd = {.opcode = CMDQ_OP_TLBI_NSNH_ALL};
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
+   return -EINVAL;
+
+   if (!smmu)
+   return -EINVAL;
+
+   if (inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
+   return -EINVAL;
+
+   if (inv_info->cache & IOMMU_CACHE_INV_TYPE_PASID ||
+   inv_info->cache & IOMMU_CACHE_INV_TYPE_DEV_IOTLB) {
+   return -ENOENT;
+   }
+
+   if (!(inv_info->cache & IOMMU_CACHE_INV_TYPE_IOTLB))
+   return -EINVAL;
+
+   /* IOTLB invalidation */
+
+   switch (inv_info->granularity) {
+   case IOMMU_INV_GRANU_PASID:
+   {
+   struct iommu_inv_pasid_info *info =
+   &inv_info->granu.pasid_info;
+
+   if (info->flags & IOMMU_INV_ADDR_FLAGS_PASID)
+   return -ENOENT;
+   if (!(info->flags & IOMMU_INV_PASID_FLAGS_ARCHID))
+   return -EINVAL;
+
+   __arm_smmu_tlb_inv_context(smmu_domain, info->archid);
+   return 0;
+   }
+   case IOMMU_INV_GRANU_ADDR:
+   {
+   struct iommu_inv_addr_info *info = &inv_info->granu.addr_info;
+   size_t size = info->nb_granules * info->granule_size;
+   bool leaf = info->flags & IOMMU_INV_ADDR_FLAGS_LEAF;
+
+   if (info->flags & IOMMU_INV_ADDR_FLAGS_PASID)
+   return -ENOENT;
+
+   if (!(info->flags & IOMMU_INV_ADDR_FLAGS_ARCHID))
+   break;
+
+   arm_smmu_tlb_inv_range_domain(info->addr, size,
+ info->granule_size, leaf,
+ info->archid, smmu_domain);


Is it possible to add a check before this call to make sure that 
info->granule_size can be recognized by the SMMU?
There is a scenario which will cause a TLBI issue: the RIL feature is enabled 
on the guest but disabled on the host, so the host can only invalidate 
4K/2M/1G at a time; but from QEMU, info->nb_granules is always 1 and 
info->granule_size = size, so if size is not equal to 4K, 2M or 1G (for 
example size = granule_size = 5M), it will only invalidate part of the range 
it wants to invalidate.

I think maybe we can add a check here: if RIL is not supported and size is 
not a granule size (4K/2M/1G) supported by the SMMU hardware, can we simply 
fall back to the smallest granule size supported by the hardware all the 
time? A rough sketch of what I mean follows.
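
Something like this, as a minimal sketch only (the use of 
ARM_SMMU_FEAT_RANGE_INV and the domain's pgsize_bitmap here are assumptions 
based on the current driver, not part of this patch):

	/*
	 * If the SMMU cannot do range invalidation (RIL) and the granule
	 * is not one the hardware supports, clamp it to the smallest
	 * supported page size and let the helper iterate over the range.
	 */
	if (!(smmu->features & ARM_SMMU_FEAT_RANGE_INV) &&
	    (!is_power_of_2(info->granule_size) ||
	     !(info->granule_size & smmu_domain->domain.pgsize_bitmap))) {
		unsigned long granule =
			1UL << __ffs(smmu_domain->domain.pgsize_bitmap);

		arm_smmu_tlb_inv_range_domain(info->addr, size, granule,
					      leaf, info->archid, smmu_domain);
		arm_smmu_cmdq_issue_sync(smmu);
		return 0;
	}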



+
+   arm_smmu_cmdq_issue_sync(smmu);
+   return 0;
+   }
+   case IOMMU_INV_GRANU_DOMAIN:
+   break;


I checked the QEMU code 
(https://github.com/eauger/qemu/tree/v5.2.0-2stage-rfcv8): for opcode 
CMD_TLBI_NH_ALL or CMD_TLBI_NSNH_ALL from the guest OS it calls 
smmu_inv_notifiers_all() to unmap all notifiers of all MRs, but it does not 
seem to set event.entry.granularity, which I think should be set to 
IOMMU_INV_GRANU_ADDR.
BTW, for opcode CMD_TLBI_NH_ALL or CMD_TLBI_NSNH_ALL it needs to unmap 
size = 0x1 on a 48-bit system (which may take a long time); maybe it is 
better to set it to IOMMU_INV_GRANU_DOMAIN, so that the host OS can send 
TLBI_NH_ALL directly when it sees IOMMU_INV_GRANU_DOMAIN. A rough sketch of 
the QEMU-side idea follows.
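
For illustration only: the helper name below is made up, and the 
entry.arch_id / entry.flags fields follow Eric's v5.2.0-2stage tree, so 
treat the whole thing as an assumption rather than a proposed patch:

	/*
	 * Hypothetical sketch: report TLBI_NH_ALL as one domain-granularity
	 * notification per notifier instead of walking the address space.
	 */
	static void smmuv3_notify_asid_all(IOMMUNotifier *n, int asid)
	{
	    IOMMUTLBEntry entry = {};

	    entry.target_as = &address_space_memory;
	    entry.iova = n->start;
	    entry.addr_mask = n->end - n->start;
	    entry.perm = IOMMU_NONE;
	    entry.arch_id = asid;
	    entry.flags = IOMMU_INV_GRANU_DOMAIN;  /* instead of GRANU_ADDR */

	    memory_region_notify_one(n, &entry);
	}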




+   default:
+   return -EINVAL;
+   }
+
+   /* Global S1 invalidation */
+   cmd.tlbi.vmid   = smmu_domain->s2_cfg.vmid;
+   arm_smmu_cmdq_issue_cmd(smmu, 

Re: [PATCH 1/1] Revert "iommu/iova: Retry from last rb tree node if iova search fails"

2021-01-29 Thread chenxiang (M)

Hi Robin,


在 2021/1/29 20:03, Robin Murphy 写道:

On 2021-01-29 09:48, Leizhen (ThunderTown) wrote:


We are still thinking about a proper solution to the problem. However, 
because the end of the v5.11 cycle is approaching, this patch is being 
sent first.


However, that commit was made for a reason - how do we justify that 
one thing being slow is more important than another thing being 
completely broken? It's not practical to just keep doing the patch 
hokey-cokey based on whoever shouts loudest :(



On 2021/1/29 17:21, Zhen Lei wrote:

This reverts commit 4e89dce725213d3d0b0475211b500eda4ef4bf2f.

We find that this patch has a great impact on performance. According to
our test: the iops decreases from 1655.6K to 893.5K, about half.

Hardware: 1 SAS expander with 12 SAS SSD
Command:  Only the main parameters are listed.
   fio bs=4k rw=read iodepth=128 cpus_allowed=0-127


FWIW, I'm 99% sure that what you really want is [1], but then you get 
to battle against an unknown quantity of dodgy firmware instead.


Robin.

[1] 
https://lore.kernel.org/linux-iommu/d412c292d222eb36469effd338e985f9d9e24cd6.1594207679.git.robin.mur...@arm.com/


Thank you for pointing that out. I have tested it, and it solves the 
performance drop issue mentioned above.
I noticed that you sent it in July 2020; do you have a plan to merge it soon?





Fixes: 4e89dce72521 ("iommu/iova: Retry from last rb tree node if 
iova search fails")

Tested-by: Xiang Chen 
Signed-off-by: Zhen Lei 
---
  drivers/iommu/iova.c | 23 ++-
  1 file changed, 6 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index d20b8b333d30d17..f840c7207efbced 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -185,9 +185,8 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,

  struct rb_node *curr, *prev;
  struct iova *curr_iova;
  unsigned long flags;
-unsigned long new_pfn, retry_pfn;
+unsigned long new_pfn;
  unsigned long align_mask = ~0UL;
-unsigned long high_pfn = limit_pfn, low_pfn = iovad->start_pfn;
if (size_aligned)
  align_mask <<= fls_long(size - 1);
@@ -200,25 +199,15 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,

curr = __get_cached_rbnode(iovad, limit_pfn);
  curr_iova = rb_entry(curr, struct iova, node);
-retry_pfn = curr_iova->pfn_hi + 1;
-
-retry:
  do {
-high_pfn = min(high_pfn, curr_iova->pfn_lo);
-new_pfn = (high_pfn - size) & align_mask;
+limit_pfn = min(limit_pfn, curr_iova->pfn_lo);
+new_pfn = (limit_pfn - size) & align_mask;
  prev = curr;
  curr = rb_prev(curr);
  curr_iova = rb_entry(curr, struct iova, node);
-} while (curr && new_pfn <= curr_iova->pfn_hi && new_pfn >= low_pfn);

-
-if (high_pfn < size || new_pfn < low_pfn) {
-if (low_pfn == iovad->start_pfn && retry_pfn < limit_pfn) {
-high_pfn = limit_pfn;
-low_pfn = retry_pfn;
-curr = &iovad->anchor.node;
-curr_iova = rb_entry(curr, struct iova, node);
-goto retry;
-}
+} while (curr && new_pfn <= curr_iova->pfn_hi);
+
+if (limit_pfn < size || new_pfn < iovad->start_pfn) {
  iovad->max32_alloc_size = size;
  goto iova32_full;
  }





.





Re: [PATCH v13 08/15] iommu/smmuv3: Implement cache_invalidate

2021-01-15 Thread chenxiang (M)

Hi Eric,


在 2020/11/18 19:21, Eric Auger 写道:

Implement domain-selective and page-selective IOTLB invalidations.

Signed-off-by: Eric Auger 

---
v7 -> v8:
- ASID based invalidation using iommu_inv_pasid_info
- check ARCHID/PASID flags in addr based invalidation
- use __arm_smmu_tlb_inv_context and __arm_smmu_tlb_inv_range_nosync

v6 -> v7
- check the uapi version

v3 -> v4:
- adapt to changes in the uapi
- add support for leaf parameter
- do not use arm_smmu_tlb_inv_range_nosync or arm_smmu_tlb_inv_context
   anymore

v2 -> v3:
- replace __arm_smmu_tlb_sync by arm_smmu_cmdq_issue_sync

v1 -> v2:
- properly pass the asid
---
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 53 +
  1 file changed, 53 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index fdecc9f17b36..24124361dd3b 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2771,6 +2771,58 @@ static void arm_smmu_detach_pasid_table(struct iommu_domain *domain)
	mutex_unlock(&smmu_domain->init_mutex);
  }
  
+static int
+arm_smmu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
+ struct iommu_cache_invalidate_info *inv_info)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
+   return -EINVAL;
+
+   if (!smmu)
+   return -EINVAL;
+
+   if (inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
+   return -EINVAL;
+
+   if (inv_info->cache & IOMMU_CACHE_INV_TYPE_IOTLB) {
+   if (inv_info->granularity == IOMMU_INV_GRANU_PASID) {
+   struct iommu_inv_pasid_info *info =
+   &inv_info->granu.pasid_info;
+
+   if (!(info->flags & IOMMU_INV_PASID_FLAGS_ARCHID) ||
+(info->flags & IOMMU_INV_PASID_FLAGS_PASID))
+   return -EINVAL;
+
+   __arm_smmu_tlb_inv_context(smmu_domain, info->archid);
+
+   } else if (inv_info->granularity == IOMMU_INV_GRANU_ADDR) {
+   struct iommu_inv_addr_info *info = &inv_info->granu.addr_info;
+   size_t size = info->nb_granules * info->granule_size;
+   bool leaf = info->flags & IOMMU_INV_ADDR_FLAGS_LEAF;
+
+   if (!(info->flags & IOMMU_INV_ADDR_FLAGS_ARCHID) ||
+(info->flags & IOMMU_INV_ADDR_FLAGS_PASID))
+   return -EINVAL;
+
+   __arm_smmu_tlb_inv_range(info->addr, size,
+info->granule_size, leaf,
+ smmu_domain, info->archid);


When debugging with vSMMU on an ARM64 Huawei platform, there is an issue: 
the RIL feature is enabled on the guest OS while it is not supported on the 
host OS, and with some operations (such as rmmod of the driver during an 
iperf run between guest and host) SMMU translation errors occur frequently; 
everything works well if the RIL feature is disabled on the guest OS.
We find that in the function vfio_iommu_unmap_notify() (QEMU code) the total 
size of the TLB range (num_pages * granule_size) is passed to the host OS as 
the granule size:

addr_info->granule_size = size
addr_info->nb_granules = 1

So the total size (num_pages * granule_size) is passed as the granule size 
to __arm_smmu_tlb_inv_range(); if the RIL feature is not supported, the 
total size may not be a granule size that the host OS can recognize, and it 
will only invalidate part of the TLB range it wants to invalidate (for 
example, if the 4K/2M/1G page sizes are supported on the host OS and 
total_size = 8K, then addr_info->granule_size = 8K and 
addr_info->nb_granules = 1; as the RIL feature is not supported, it 
invalidates only 4K at a time but believes it has invalidated 8K).
Please have a look at the issue; we have a temporary fix for it below.


hw/arm/smmuv3.c   | 3 ++-
 hw/vfio/common.c  | 6 +++---
 include/exec/memory.h | 1 +
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 6725019..891a65d 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -821,7 +821,8 @@ static void smmuv3_notify_iova(IOMMUMemoryRegion *mr,

 entry.target_as = &address_space_memory;
 entry.iova = iova;
-entry.addr_mask = num_pages * (1 << granule) - 1;
+entry.addr_mask = (1 << granule) - 1;
+entry.num_pages = num_pages;
 entry.perm = IOMMU_NONE;
 entry.arch_id = asid;
 entry.flags = IOMMU_INV_GRANU_ADDR;
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 5d365e0..c0164b1 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -604,7 +604,7 @@ static void vfio_iommu_unmap_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)

 hwaddr start = iotlb->iova + 

Re: Consult on ARM SMMU debugfs

2021-01-15 Thread chenxiang (M)



在 2021/1/12 4:01, Robin Murphy 写道:

On 2021-01-07 02:45, chenxiang (M) wrote:

Hi Will,  Robin or other guys,

When debugging SMMU/SVA issues on a Huawei ARM64 board, we find that the 
ARM SMMU driver lacks sufficient debugfs support (for example the values of 
the STE/CD, which we sometimes need to check). Currently a top-level iommu 
directory is created in debugfs, but there is no debugfs support 
specifically for the ARM SMMU driver. Do you know whether Arm has plans to 
add that in the near future?


FWIW I don't think I've ever felt the need to need to inspect the 
Stream Table on a live system. So far the nature of the STE code has 
been simple enough that it's very hard for any given STE to be *wrong* 
- either it's set up as expected and thus works fine, or it's not 
initialised at all and you get C_BAD_STE, where 99% of the time you 
then just cross-reference the Stream ID against the firmware and find 
that the DT/IORT is wrong.


Similarly I don't think I've even even *seen* an issue that could be 
attributed to a context descriptor, although I appreciate that as we 
start landing more PASID and SVA support the scope for that starts to 
widen considerably.


Thank you for your reply. I agree that it is very hard for the content of 
the STE/CD to be wrong in the current code, but more SMMU-related features 
(such as SVA and vSMMU) are being upstreamed, and when debugging those 
features it is useful for us to have interfaces to dump that information so 
we can locate issues. Also, when debugging together with the hardware 
engineers, the dumped content is important for them too.




Feel free to propose a patch if you believe it would be genuinely 
useful and won't just bit-rot into a maintenance burden, but it's not 
something that's on our roadmap here.


OK, we are considering incorporating the following requirements into the 
plan (a rough sketch of what the first item could look like follows the 
list):

- Dump the value of the STE/CD for devices
- Dump the page table entries for devices
- Dump the entry information of the CMDQ/EVENTQ
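
As a minimal sketch only, assuming a per-master debugfs file created under 
the existing top-level iommu directory (iommu_debugfs_dir); 
arm_smmu_ste_show() and arm_smmu_debugfs_add_master() are made-up names, 
and reusing arm_smmu_get_step_for_sid() as the STE lookup helper is an 
assumption, not a proposal for the final interface:

	/* Hypothetical debugfs hook dumping a device's STE (8 x 64-bit words). */
	static int arm_smmu_ste_show(struct seq_file *m, void *unused)
	{
		struct arm_smmu_master *master = m->private;
		__le64 *ste = arm_smmu_get_step_for_sid(master->smmu,
							master->sids[0]);
		int i;

		for (i = 0; i < 8; i++)
			seq_printf(m, "STE[%d]: 0x%016llx\n",
				   i, le64_to_cpu(ste[i]));
		return 0;
	}
	DEFINE_SHOW_ATTRIBUTE(arm_smmu_ste);

	static void arm_smmu_debugfs_add_master(struct arm_smmu_master *master)
	{
		debugfs_create_file(dev_name(master->dev), 0400,
				    iommu_debugfs_dir, master,
				    &arm_smmu_ste_fops);
	}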



Thanks,
Robin.

.





Consult on ARM SMMU debugfs

2021-01-06 Thread chenxiang (M)

Hi Will,  Robin or other guys,

When debugging SMMU/SVA issues on a Huawei ARM64 board, we find that the 
ARM SMMU driver lacks sufficient debugfs support (for example the values of 
the STE/CD, which we sometimes need to check). Currently a top-level iommu 
directory is created in debugfs, but there is no debugfs support 
specifically for the ARM SMMU driver. Do you know whether Arm has plans to 
add that in the near future?



Best regards,

Shawn

