Re: [PATCH] iommu: remove redundant assignment to variable agaw

2021-06-09 Thread Lu Baolu

On 4/17/21 1:18 AM, Colin King wrote:

From: Colin Ian King

The variable agaw is initialized with a value that is never
read; it is later reassigned as the counter of a for-loop. The
initialization is therefore redundant and can be removed.
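
For illustration, the dead-store pattern being fixed looks roughly like
this (a minimal sketch; the helper names and values are illustrative,
not the exact intel-iommu code):

/* Sketch of the pattern: the initializer of agaw is never read. */
static int pick_agaw(int width)
{
	int agaw = -1;	/* dead store: the for-loop below overwrites it */

	for (agaw = width_to_agaw(width); agaw >= 0; agaw--) {
		if (agaw_supported(agaw))	/* hypothetical predicate */
			break;
	}

	return agaw;	/* -1 only if no supported width was found */
}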

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King


Queued for v5.14. Thanks!

Best regards,
baolu


Re: [PATCH] iommu/vt-d: fix kernel-doc syntax in file header

2021-06-09 Thread Lu Baolu

On 5/23/21 10:32 PM, Aditya Srivastava wrote:

The opening comment mark '/**' is used for highlighting the beginning of
kernel-doc comments.
The header for drivers/iommu/intel/pasid.c follows this syntax, but
the content inside does not comply with kernel-doc.

This header comment was probably not meant for kernel-doc parsing, but
is parsed due to the presence of kernel-doc-like comment syntax (i.e.,
'/**'), which causes unexpected warnings from kernel-doc:
warning: Function parameter or member 'fmt' not described in 'pr_fmt'

Provide a simple fix by replacing this occurrence with the general
comment format, i.e. '/*', to prevent kernel-doc from parsing it.
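
For context: kernel-doc attaches a '/**' comment to whatever definition
follows it, which here is the pr_fmt() macro, hence the complaint about
an undescribed 'fmt' parameter. A simplified sketch of the change (the
header text is paraphrased, not quoted from the file):

/* Before: the double asterisk makes kernel-doc treat this as
 * documentation for the pr_fmt() macro that follows. */
/**
 * pasid.c - PASID table and entry manipulation (paraphrased header)
 */
#define pr_fmt(fmt)	"DMAR: " fmt

/* After: a plain comment, which kernel-doc ignores. */
/*
 * pasid.c - PASID table and entry manipulation (paraphrased header)
 */
#define pr_fmt(fmt)	"DMAR: " fmt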

Signed-off-by: Aditya Srivastava


Queued for v5.14. Thanks!

Best regards,
baolu


Re: [PATCH v2][next] iommu/vt-d: Fix out-bounds-warning in intel_svm_page_response()

2021-06-09 Thread Lu Baolu

On 4/15/21 4:14 AM, Gustavo A. R. Silva wrote:

Replace a couple of calls to memcpy() with simple assignments in order
to fix the following out-of-bounds warning:

drivers/iommu/intel/svm.c:1198:4: warning: 'memcpy' offset [25, 32] from the object at 'desc' is out of the bounds of referenced subobject 'qw2' with type 'long long unsigned int' at offset 16 [-Warray-bounds]

The problem is that the original code is trying to copy data into a
couple of struct members adjacent to each other in a single call to
memcpy(). This causes a legitimate compiler warning because memcpy()
overruns the length of &desc.qw2 and &resp.qw2, respectively.

This helps with the ongoing efforts to globally enable -Warray-bounds
and get us closer to being able to tighten the FORTIFY_SOURCE routines
on memcpy().
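
The shape of the change is roughly the following; this sketch only
assumes two adjacent 64-bit members named qw2 and qw3, as in the
warning text, and is not the exact svm.c hunk:

#include <stdint.h>
#include <string.h>

/* Illustrative layout mirroring the warning: four adjacent u64s. */
struct desc {
	uint64_t qw0, qw1, qw2, qw3;
};

static void fill(struct desc *d, const uint64_t priv[2])
{
	/* Before (warns): one memcpy() spans qw2 *and* qw3, overrunning
	 * the bounds of the referenced subobject qw2:
	 *
	 *	memcpy(&d->qw2, priv, 2 * sizeof(uint64_t));
	 */

	/* After: plain assignments, same layout, no subobject overrun. */
	d->qw2 = priv[0];
	d->qw3 = priv[1];
}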

Link: https://github.com/KSPP/linux/issues/109
Signed-off-by: Gustavo A. R. Silva


Queued for v5.14. Thanks!

Best regards,
baolu


Re: [PATCH -next] iommu/vt-d: use DEVICE_ATTR_RO macro

2021-06-09 Thread Lu Baolu

On 5/28/21 9:02 PM, YueHaibing wrote:

Use the DEVICE_ATTR_RO() helper instead of plain DEVICE_ATTR(),
which makes the code a bit shorter and easier to read.
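
For anyone unfamiliar with the helper, the transformation looks like
this (the attribute name here is illustrative):

/* Before: permissions and the show callback are spelled out by hand. */
static DEVICE_ATTR(address, 0444, address_show, NULL);

/* After: DEVICE_ATTR_RO(name) implies 0444 permissions and wires up
 * the name_show() callback, so the three cannot drift out of sync. */
static DEVICE_ATTR_RO(address);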

Signed-off-by: YueHaibing


Queued for v5.14. Thanks!

Best regards,
baolu


Re: [PATCH 0/5] Short cleanups around DMAR

2021-06-09 Thread Lu Baolu

On 5/30/21 3:50 PM, Parav Pandit wrote:

Hi David, Lu,

This short series contains small cleanup patches for the Intel IOMMU
in the DMAR area.

Patch summary:
Patch-1 uses bitfields for a few DMAR capabilities (see the sketch below)
Patch-2 removes the unused iommu_count
Patch-3 removes unnecessary braces
Patch-4 defines the count data type explicitly as unsigned int
Patch-5 removes unnecessary typecasting
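
As a rough illustration of what Patch-1 does (the field names here are
guesses for the sketch, not the actual patch content):

/* Before: each capability flag occupies a full byte or more. */
struct drhd_flags_open {
	u8 include_all;
	u8 gfx_dedicated;
};

/* After: single-bit bitfields pack the flags into one byte. */
struct drhd_flags_packed {
	u8 include_all:1;
	u8 gfx_dedicated:1;
};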


Parav Pandit (5):
   iommu/intel: Use bitfields for DMAR capabilities
   iommu/intel: Removed unused iommu_count in dmar domain
   iommu/intel: Remove unnecessary braces
   iommu/intel: Define counter explicitly as unsigned int
   iommu/intel: No need to typecast


All patches queued for v5.14. Thanks!

Best regards,
baolu


Re: [PATCH 1/1] iommu: Delete a duplicate check in iommu_change_dev_def_domain()

2021-06-09 Thread Joerg Roedel
On Thu, May 13, 2021 at 03:58:15PM +0800, Zhen Lei wrote:
> Function iommu_group_store_type() is the only caller of the static
> function iommu_change_dev_def_domain() and already performs an
> "if (WARN_ON(!group))" check before calling it, so the check here is
> redundant.
> 
> Signed-off-by: Zhen Lei 
> ---
>  drivers/iommu/iommu.c | 3 ---
>  1 file changed, 3 deletions(-)

Applied, thanks.

> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 971068da67cb91d..8cdf6a1c4bfd773 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -3059,9 +3059,6 @@ static int iommu_change_dev_def_domain(struct iommu_group *group,
>   int ret, dev_def_dom;
>   struct device *dev;
>  
> - if (!group)
> - return -EINVAL;
> -
>   mutex_lock(&group->mutex);
>  
>   if (group->default_domain != group->domain) {
> -- 
> 2.26.0.106.g9fadedd
> 


Re: [PATCH] iommu/vt-d: Dump DMAR translation structure

2021-06-09 Thread Lu Baolu

On 5/27/21 7:35 AM, Kyung Min Park wrote:

When a DMAR translation fault happens, the kernel prints a single-line
fault reason with the corresponding hexadecimal code defined in the
Intel VT-d specification.

Currently, when a user wants to debug a translation fault in detail,
debugfs is used to dump the dmar_translation_struct, which is not
available when the kernel fails to boot.

Dump the DMAR translation structure, pagewalk the IO page table and print
the page table entry when the fault happens.

Signed-off-by: Kyung Min Park 


Please fix the compile error below:

ld: drivers/iommu/intel/dmar.o: in function `dmar_fault_do_one':
/home/allen/Workspace/iommu/drivers/iommu/intel/dmar.c:1965: undefined reference to `dmar_fault_dump_ptes'

make: *** [Makefile:1191: vmlinux] Error 1
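
An undefined reference like this usually means the helper is declared
unconditionally but only compiled under some config option. One common
fix pattern, sketched here with a hypothetical guard (the actual Kconfig
symbol and function signature may differ):

#ifdef CONFIG_DMAR_DEBUG	/* hypothetical option name */
void dmar_fault_dump_ptes(struct intel_iommu *iommu, u16 source_id,
			  unsigned long long addr);
#else
static inline void dmar_fault_dump_ptes(struct intel_iommu *iommu,
					u16 source_id,
					unsigned long long addr) { }
#endif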

Best regards,
baolu



Re: [PATCH v8 0/4] Add IOMMU driver for rk356x

2021-06-09 Thread Joerg Roedel
On Fri, Jun 04, 2021 at 06:44:37PM +0200, Benjamin Gaignard wrote:
> This series adds the IOMMU driver for rk356x SoC.
> Since a new compatible string is needed to distinguish this second version of
> the IOMMU hardware block from the first one, it is an opportunity to convert
> the binding to DT schema.
> 
> version 8:
>  - Fix compilation issue.

Applied, thanks.


Re: Plan for /dev/ioasid RFC v2

2021-06-09 Thread Eric Auger
Hi Kevin,

On 6/7/21 4:58 AM, Tian, Kevin wrote:
> Hi, all,
>
> We plan to work on v2 now, given many good comments already received
> and substantial changes envisioned. This is a very complex topic with
> many sub-threads being discussed. To ensure that I didn't miss valuable 
> suggestions (and also keep everyone on the same page), here I'd like to 
> provide a list of planned changes in my mind. Please let me know if 
> anything important is lost.  :)
>
> --
>
> (Remaining opens in v1)
>
> -   Protocol between kvm/vfio/ioasid for wbinvd/no-snoop. I'll see how
> much can be refined based on discussion progress when v2 is out;
>
> -   Device-centric (Jason) vs. group-centric (David) uAPI. David is not fully
> convinced yet. Based on discussion v2 will continue to have ioasid uAPI
> being device-centric (but it's fine for vfio to be group-centric). A new
> section will be added to elaborate this part;
>
> -   PASID virtualization (section 4) has not been thoroughly discussed yet. 
> Jason gave some suggestion on how to categorize intended usages. 
> I will rephrase this section and hope more discussions can be held for 
> it in v2;
>
> (Adopted suggestions)
>
> -   (Jason) Rename /dev/ioasid to /dev/iommu (so does uAPI e.g. IOASID
> _XXX to IOMMU_XXX). One suggestion (Jason) was to also rename 
> RID+PASID to SID+SSID. But given the familiarity of the former, I will 
> still use RID+PASID in v2 to ease the discussion;
>
> -   (Jason) v1 prevents one device from binding to multiple ioasid_fd's. This 
> will be fixed in v2;
>
> -   (Jean/Jason) No need to track guest I/O page tables on ARM/AMD. When 
> a pasid table is bound, it becomes a container for all guest I/O page 
> tables;
while I am totally in line with that change, I guess we need to revisit
the invalidate ioctl
to support PASID table invalidation.
>
> -   (Jean/Jason) Accordingly a device label is required so iotlb invalidation 
> and fault handling can both support per-device operation. Per Jean's 
> suggestion, this label will come from userspace (when VFIO_BIND_
> IOASID_FD);

what is not totally clear to me is the correspondence between this label
and the SID/SSID tuple.
My understanding is that it rather maps to the SID, because you can attach
several ioasids to the device.
So it is not clear to me how you reconstruct the SSID info.

Thanks

Eric
>
> -   (Jason) Addition of device label allows per-device capability/format 
> check before IOASIDs are created. This leads to another major uAPI 
> change in v2 - specify format info when creating an IOASID (mapping 
> protocol, nesting, coherent, etc.). User is expected to check per-device 
> format and then set proper format for IOASID upon to-be-attached 
> device;

> -   (Jason/David) No restriction on map/unmap vs. bind/invalidate. They
> can be used in either parent or child;
>
> -   (David) Change IOASID_GET_INFO to report permitted range instead of
> reserved IOVA ranges. This works better for PPC;
>
> -   (Jason) For helper functions, expect to have explicit bus-type wrappers
> e.g. ioasid_pci_device_attach;
>
> (Not adopted)
>
> -   (Parav) Make page pinning a syscall;
> -   (Jason. W/Enrico) one I/O page table per fd;
> -   (David) Replace IOASID_REGISTER_MEMORY through another ioasid
> nesting (sort of passthrough mode). Need more thinking. v2 will not 
> change this part;
>
> Thanks
> Kevin
>



Re: [PATCH v11 0/3] Enhance IOMMU default DMA mode build options

2021-06-09 Thread Joerg Roedel
Hi John,

On Tue, Jun 08, 2021 at 09:18:25PM +0800, John Garry wrote:
> Zhen Lei (3):
>   iommu: Enhance IOMMU default DMA mode build options
>   iommu/vt-d: Add support for IOMMU default DMA mode build options
>   iommu/amd: Add support for IOMMU default DMA mode build options

So I like the idea, but can we go a step further and get (mostly) rid of
the driver-specific setup code for lazy/non-lazy mode? This can happen
in the dma-iommu code and the drivers only need to keep the support for
their legacy command line options.

Regards,

Joerg


[PATCH v3 1/2] iommu/iova: Fix spelling mistakes

2021-06-09 Thread Zhen Lei
Fix some spelling mistakes in comments found by "codespell":
detroyed ==> destroyed
defered ==> deferred
entrie ==> entry
alloced ==> allocated
regularily ==> regularly

Signed-off-by: Zhen Lei 
---
 drivers/iommu/iova.c |  2 +-
 include/linux/iova.h | 10 +-
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index b6cf5f16123b..cce4571548c4 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -594,7 +594,7 @@ static void fq_destroy_all_entries(struct iova_domain *iovad)
int cpu;
 
/*
-* This code runs when the iova_domain is being detroyed, so don't
+* This code runs when the iova_domain is being destroyed, so don't
 * bother to free iovas, just call the entry_dtor on all remaining
 * entries.
 */
diff --git a/include/linux/iova.h b/include/linux/iova.h
index 71d8a2de6635..16f671b04a37 100644
--- a/include/linux/iova.h
+++ b/include/linux/iova.h
@@ -49,12 +49,12 @@ typedef void (* iova_entry_dtor)(unsigned long data);
 /* Timeout (in ms) after which entries are flushed from the Flush-Queue */
 #define IOVA_FQ_TIMEOUT10
 
-/* Flush Queue entry for defered flushing */
+/* Flush Queue entry for deferred flushing */
 struct iova_fq_entry {
unsigned long iova_pfn;
unsigned long pages;
unsigned long data;
-   u64 counter; /* Flush counter when this entrie was added */
+   u64 counter; /* Flush counter when this entry was added */
 };
 
 /* Per-CPU Flush Queue structure */
@@ -68,8 +68,8 @@ struct iova_fq {
 struct iova_domain {
spinlock_t  iova_rbtree_lock; /* Lock to protect update of rbtree */
struct rb_root  rbroot; /* iova domain rbtree root */
-   struct rb_node  *cached_node;   /* Save last alloced node */
-   struct rb_node  *cached32_node; /* Save last 32-bit alloced node */
+   struct rb_node  *cached_node;   /* Save last allocated node */
+   struct rb_node  *cached32_node; /* Save last 32-bit allocated node */
unsigned long   granule;/* pfn granularity for this domain */
unsigned long   start_pfn;  /* Lower limit for this domain */
unsigned long   dma_32bit_pfn;
@@ -91,7 +91,7 @@ struct iova_domain {
iova_entry_dtor entry_dtor; /* IOMMU driver specific destructor for
   iova entry */
 
-   struct timer_list fq_timer; /* Timer to regularily empty the
+   struct timer_list fq_timer; /* Timer to regularly empty the
   flush-queues */
atomic_t fq_timer_on;   /* 1 when timer is active, 0
   when not */
-- 
2.25.1




[PATCH v3 2/2] iommu: Fix spelling mistakes

2021-06-09 Thread Zhen Lei
Fix some spelling mistakes in comments found by "codespell":
alignement ==> alignment
implemtation ==> implementation
assignement ==> assignment
initally ==> initially
Returs ==> Returns
Traverese ==> Traverse
guarentees ==> guarantees
resgister ==> register
insufficent ==> insufficient
Specifiction ==> Specification
creats ==> creates
tabke ==> table
shuld ==> should
requeset ==> request
funcions ==> functions
distiguish ==> distinguish
phyiscal ==> physical
Uppon ==> Upon
consits ==> consists

And two were discovered manually by John Garry and Will Deacon:
appropriatley ==> appropriately
Additionally, The ==> Additionally, the

Signed-off-by: Zhen Lei 
---
 drivers/iommu/amd/amd_iommu_types.h   | 2 +-
 drivers/iommu/amd/init.c  | 2 +-
 drivers/iommu/amd/iommu.c | 2 +-
 drivers/iommu/arm/arm-smmu/arm-smmu.c | 2 +-
 drivers/iommu/fsl_pamu.c  | 2 +-
 drivers/iommu/fsl_pamu_domain.c   | 2 +-
 drivers/iommu/intel/dmar.c| 8 
 drivers/iommu/intel/iommu.c   | 2 +-
 drivers/iommu/intel/irq_remapping.c   | 2 +-
 drivers/iommu/intel/svm.c | 4 ++--
 drivers/iommu/iommu.c | 6 +++---
 drivers/iommu/mtk_iommu.c | 4 ++--
 drivers/iommu/omap-iommu.c| 2 +-
 drivers/iommu/sun50i-iommu.c  | 2 +-
 14 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 94c1a7a9876d..67a6c2fb4de9 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -446,7 +446,7 @@ extern struct irq_remap_table **irq_lookup_table;
 /* Interrupt remapping feature used? */
 extern bool amd_iommu_irq_remap;
 
-/* kmem_cache to get tables with 128 byte alignement */
+/* kmem_cache to get tables with 128 byte alignment */
 extern struct kmem_cache *amd_iommu_irq_cache;
 
 /*
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 4e4fb0f4e412..52d450962288 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -2040,7 +2040,7 @@ static int intcapxt_irqdomain_activate(struct irq_domain *domain,
xt.destid_24_31 = cfg->dest_apicid >> 24;
 
/**
-* Current IOMMU implemtation uses the same IRQ for all
+* Current IOMMU implementation uses the same IRQ for all
 * 3 IOMMU interrupts.
 */
writeq(xt.capxt, iommu->mmio_base + MMIO_INTCAPXT_EVT_OFFSET);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index b1fbf2c83df5..1b635d4c2142 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1809,7 +1809,7 @@ int __init amd_iommu_init_api(void)
  * The following functions belong to the exported interface of AMD IOMMU
  *
  * This interface allows access to lower level functions of the IOMMU
- * like protection domain handling and assignement of devices to domains
+ * like protection domain handling and assignment of devices to domains
  * which is not possible with the dma_ops interface.
  *
  */
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index dba15f312cbd..79db6e8c5e31 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1360,7 +1360,7 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
ret = arm_smmu_register_legacy_master(dev, &smmu);
 
/*
-* If dev->iommu_fwspec is initally NULL, arm_smmu_register_legacy_master()
+* If dev->iommu_fwspec is initially NULL, arm_smmu_register_legacy_master()
 * will allocate/initialise a new one. Thus we need to update fwspec for
 * later use.
 */
diff --git a/drivers/iommu/fsl_pamu.c b/drivers/iommu/fsl_pamu.c
index fc38b1fba7cf..53aff2766367 100644
--- a/drivers/iommu/fsl_pamu.c
+++ b/drivers/iommu/fsl_pamu.c
@@ -246,7 +246,7 @@ void get_ome_index(u32 *omi_index, struct device *dev)
  * @stash_dest_hint: L1, L2 or L3
  * @vcpu: vpcu target for a particular cache type.
  *
- * Returs stash on success or ~(u32)0 on failure.
+ * Returns stash on success or ~(u32)0 on failure.
  *
  */
 u32 get_stash_id(u32 stash_dest_hint, u32 vcpu)
diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c
index a47f47307109..2da312645279 100644
--- a/drivers/iommu/fsl_pamu_domain.c
+++ b/drivers/iommu/fsl_pamu_domain.c
@@ -363,7 +363,7 @@ static struct iommu_group *get_shared_pci_device_group(struct pci_dev *pdev)
struct pci_bus *bus = pdev->bus;
 
/*
-* Traverese the pci bus device list to get
+* Traverse the pci bus device list to get
 * the shared iommu group.
 */
while (bus) {
diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 84057cb9596c..e1626f5eb689 100644
--- a/drivers/io

[PATCH v3 0/2] iommu: Fix spelling mistakes

2021-06-09 Thread Zhen Lei
v2 --> v3:
1) Add some new fixes for the latest linux-next:
   drivers/iommu/fsl_pamu_domain.c:366: Traverese ==> Traverse
   drivers/iommu/mtk_iommu.c:977: Uppon ==> Upon
   drivers/iommu/intel/svm.c:488: shuld ==> should
   drivers/iommu/intel/svm.c:920: requeset ==> request
   drivers/iommu/intel/dmar.c:2131: Specifiction ==> Specification
2) Add a new fix "Additionally, The ==> Additionally, the", discovered by Will Deacon
3) Add some new fixes for the header files of iommu/iova
   The header files to be checked are as follows:
   include/linux/iommu*.h
   include/linux/iova.h
   include/uapi/linux/iommu.h
4) Changes to files "iova.h" and "iova.c" are grouped into a new patch.


v1 --> v2:
1. Merge into one patch
2. Add a new fix "appropriatley --> appropriately" in iommu.c, discovered by John Garry

Zhen Lei (2):
  iommu/iova: Fix spelling mistakes
  iommu: Fix spelling mistakes

 drivers/iommu/amd/amd_iommu_types.h   |  2 +-
 drivers/iommu/amd/init.c  |  2 +-
 drivers/iommu/amd/iommu.c |  2 +-
 drivers/iommu/arm/arm-smmu/arm-smmu.c |  2 +-
 drivers/iommu/fsl_pamu.c  |  2 +-
 drivers/iommu/fsl_pamu_domain.c   |  2 +-
 drivers/iommu/intel/dmar.c|  8 
 drivers/iommu/intel/iommu.c   |  2 +-
 drivers/iommu/intel/irq_remapping.c   |  2 +-
 drivers/iommu/intel/svm.c |  4 ++--
 drivers/iommu/iommu.c |  6 +++---
 drivers/iommu/iova.c  |  2 +-
 drivers/iommu/mtk_iommu.c |  4 ++--
 drivers/iommu/omap-iommu.c|  2 +-
 drivers/iommu/sun50i-iommu.c  |  2 +-
 include/linux/iova.h  | 10 +-
 16 files changed, 27 insertions(+), 27 deletions(-)

-- 
2.25.1




Re: [RFC] /dev/ioasid uAPI proposal

2021-06-09 Thread Enrico Weigelt, metux IT consult

On 08.06.21 21:00, Jason Gunthorpe wrote:


Eg I can do open() on a file and I get to keep that FD. I get to keep
that FD even if someone later does chmod() on that file so I can't
open it again.

There are lots of examples where a one time access control check
provides continuing access to a resource. I feel the ongoing proof is
the rarity in Unix.. 'revoke' is an uncommon concept in Unix..


Yes, it's even possible that somebody w/ privileges opens an fd and
hands it over to somebody unprivileged (eg. via unix socket). This is
a very basic unix concept. If some (already opened) fd now suddenly
behaves differently based on the current caller, that would be a break
with traditional unix semantics.
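
For reference, the idiom being described is SCM_RIGHTS fd passing over
an AF_UNIX socket; a minimal sketch using the standard POSIX API, with
error handling trimmed:

#include <string.h>
#include <sys/socket.h>

/* Send an already-open fd to a peer. The receiver gets a new fd for
 * the same open file description; no re-check of file permissions is
 * performed on the receiving side. */
static int send_fd(int sock, int fd)
{
	char dummy = '*';
	struct iovec io = { .iov_base = &dummy, .iov_len = 1 };
	char buf[CMSG_SPACE(sizeof(fd))] = { 0 };
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;

	msg.msg_iov = &io;
	msg.msg_iovlen = 1;
	msg.msg_control = buf;
	msg.msg_controllen = sizeof(buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(fd));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(fd));

	return sendmsg(sock, &msg, 0);
}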


--mtx

--
---
Notice: unencrypted e-mails can easily be intercepted and manipulated!
For confidential communication, please send your GPG/PGP key.
---
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
i...@metux.net -- +49-151-27565287

Re: Plan for /dev/ioasid RFC v2

2021-06-09 Thread Leon Romanovsky
On Mon, Jun 07, 2021 at 02:58:18AM +, Tian, Kevin wrote:
> Hi, all,

<...>

> (Remaining opens in v1)

<...>

> -   Device-centric (Jason) vs. group-centric (David) uAPI. David is not fully
> convinced yet. Based on discussion v2 will continue to have ioasid uAPI
> being device-centric (but it's fine for vfio to be group-centric). A new
> section will be added to elaborate this part;

<...>

> (Adopted suggestions)

<...>

> -   (Jason) Addition of device label allows per-device capability/format 
> check before IOASIDs are created. This leads to another major uAPI 
> change in v2 - specify format info when creating an IOASID (mapping 
> protocol, nesting, coherent, etc.). User is expected to check per-device 
> format and then set proper format for IOASID upon to-be-attached 
> device;

Sorry for my naive question; I still haven't read the whole v1 thread and
maybe the answer is already written there, but I will ask anyway.

Doesn't this adopted suggestion to allow device-specific configuration
actually mean that the uAPI should be device-centric?

The user already needs to be aware of the device, configure it explicitly,
and maybe gracefully clean it up later; it looks like there is not much
left to be group-centric.

Thanks


Re: [RFC] /dev/ioasid uAPI proposal

2021-06-09 Thread Paolo Bonzini

On 09/06/21 10:51, Enrico Weigelt, metux IT consult wrote:

On 08.06.21 21:00, Jason Gunthorpe wrote:


Eg I can do open() on a file and I get to keep that FD. I get to keep
that FD even if someone later does chmod() on that file so I can't
open it again.

There are lots of examples where a one time access control check
provides continuing access to a resource. I feel the ongoing proof is
the rarity in Unix.. 'revoke' is an uncommon concept in Unix..


Yes, it's even possible that somebody w/ privileges opens an fd and
hands it over to somebody unprivileged (eg. via unix socket). This is
a very basic unix concept. If some (already opened) fd now suddenly
behaves differently based on the current caller, that would be a break
with traditional unix semantics.


That's already more or less meaningless for both KVM and VFIO, since 
they are tied to an mm.


Paolo



RE: Plan for /dev/ioasid RFC v2

2021-06-09 Thread Tian, Kevin
> From: Eric Auger 
> Sent: Wednesday, June 9, 2021 4:15 PM
> 
> Hi Kevin,
> 
> On 6/7/21 4:58 AM, Tian, Kevin wrote:
> > Hi, all,
> >
> > We plan to work on v2 now, given many good comments already received
> > and substantial changes envisioned. This is a very complex topic with
> > many sub-threads being discussed. To ensure that I didn't miss valuable
> > suggestions (and also keep everyone on the same page), here I'd like to
> > provide a list of planned changes in my mind. Please let me know if
> > anything important is lost.  :)
> >
> > --
> >
> > (Remaining opens in v1)
> >
> > -   Protocol between kvm/vfio/ioasid for wbinvd/no-snoop. I'll see how
> > much can be refined based on discussion progress when v2 is out;
> >
> > -   Device-centric (Jason) vs. group-centric (David) uAPI. David is not 
> > fully
> > convinced yet. Based on discussion v2 will continue to have ioasid uAPI
> > being device-centric (but it's fine for vfio to be group-centric). A new
> > section will be added to elaborate this part;
> >
> > -   PASID virtualization (section 4) has not been thoroughly discussed yet.
> > Jason gave some suggestion on how to categorize intended usages.
> > I will rephrase this section and hope more discussions can be held for
> > it in v2;
> >
> > (Adopted suggestions)
> >
> > -   (Jason) Rename /dev/ioasid to /dev/iommu (so does uAPI e.g. IOASID
> > _XXX to IOMMU_XXX). One suggestion (Jason) was to also rename
> > RID+PASID to SID+SSID. But given the familiarity of the former, I will
> > still use RID+PASID in v2 to ease the discussion;
> >
> > -   (Jason) v1 prevents one device from binding to multiple ioasid_fd's. 
> > This
> > will be fixed in v2;
> >
> > -   (Jean/Jason) No need to track guest I/O page tables on ARM/AMD.
> When
> > a pasid table is bound, it becomes a container for all guest I/O page
> tables;
> while I am totally in line with that change, I guess we need to revisit
> the invalidate ioctl
> to support PASID table invalidation.

Yes, this is planned when doing this change.

> >
> > -   (Jean/Jason) Accordingly a device label is required so iotlb 
> > invalidation
> > and fault handling can both support per-device operation. Per Jean's
> > suggestion, this label will come from userspace (when VFIO_BIND_
> > IOASID_FD);
> 
> what is not totally clear to me is the correspondence between this label
> and the SID/SSID tuple.
> My understanding is that it rather maps to the SID, because you can attach
> several ioasids to the device.
> So it is not clear to me how you reconstruct the SSID info.
> 

Yes, the device handle maps to the SID. The fault data reported to userspace
will include {device_label, ioasid, vendor_fault_data}. In your case
I believe the SSID will be included in vendor_fault_data, thus no
reconstruction is required. For Intel, the user could figure out the vPASID
according to device_label and ioasid, i.e. there is no need to include PASID
info in vendor_fault_data.
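
To make the shape of that report concrete, it could look something like
the following; these field names are purely hypothetical, and no such
uAPI struct has been posted:

/* Hypothetical sketch of the per-fault record described above. */
struct iommu_fault_report {
	__u32 device_label;	/* set by userspace at VFIO_BIND_IOASID_FD */
	__u32 ioasid;		/* the I/O address space that faulted */
	__u32 vendor_data_len;
	__u8  vendor_fault_data[];	/* e.g. would carry the SSID on ARM */
};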

Thanks
Kevin


RE: Plan for /dev/ioasid RFC v2

2021-06-09 Thread Tian, Kevin
> From: Leon Romanovsky 
> Sent: Wednesday, June 9, 2021 5:02 PM
> 
> On Mon, Jun 07, 2021 at 02:58:18AM +, Tian, Kevin wrote:
> > Hi, all,
> 
> <...>
> 
> > (Remaining opens in v1)
> 
> <...>
> 
> > -   Device-centric (Jason) vs. group-centric (David) uAPI. David is not 
> > fully
> > convinced yet. Based on discussion v2 will continue to have ioasid uAPI
> > being device-centric (but it's fine for vfio to be group-centric). A new
> > section will be added to elaborate this part;
> 
> <...>
> 
> > (Adopted suggestions)
> 
> <...>
> 
> > -   (Jason) Addition of device label allows per-device capability/format
> > check before IOASIDs are created. This leads to another major uAPI
> > change in v2 - specify format info when creating an IOASID (mapping
> > protocol, nesting, coherent, etc.). User is expected to check per-device
> > format and then set proper format for IOASID upon to-be-attached
> > device;
> 
> Sorry for my naive question; I still haven't read the whole v1 thread and
> maybe the answer is already written there, but I will ask anyway.
> 
> Doesn't this adopted suggestion to allow device-specific configuration
> actually mean that the uAPI should be device-centric?
> 
> The user already needs to be aware of the device, configure it explicitly,
> and maybe gracefully clean it up later; it looks like there is not much
> left to be group-centric.
> 

Yes, this is what v2 will lean toward. /dev/ioasid reports format info and
handles IOASID attachment per device. VFIO could still keep its group-
centric uAPI, but in the end it needs to bind each device in the group to
the IOASID FD one by one.

Thanks
Kevin


Re: [PATCH v15 0/3] iommu/arm-smmu-v3: Add stall support

2021-06-09 Thread Jean-Philippe Brucker
On Tue, Jun 08, 2021 at 12:42:34PM +0100, Will Deacon wrote:
> On Wed, 26 May 2021 18:19:25 +0200, Jean-Philippe Brucker wrote:
> > Add stall support for SMMUv3, enabling I/O page faults and SVA for
> > compatible devices. No change since last version [1], but I'd still like
> > this to be considered for upstream, because there exists hardware and
> > applications.
> > 
> > Stall is implemented by the Kunpeng 920 processor for its compression
> > and crypto accelerators, with which I tested the SVA infrastructure.
> > Using the userspace accelerator API [2], a program can obtain a queue
> > from one of these devices and submit compression or encryption work
> > within the program's address space. UADK [3] provides a library to do
> > this, and there is an openssl plugin [4] to use it.
> > 
> > [...]
> 
> Applied to will (for-joerg/arm-smmu/updates), thanks!
> 
> [1/3] dt-bindings: document stall property for IOMMU masters
>   https://git.kernel.org/will/c/ed1d08b9d0c9
> [2/3] ACPI/IORT: Enable stall support for platform devices
>   https://git.kernel.org/will/c/6522b1e0c78f
> [3/3] iommu/arm-smmu-v3: Add stall support for platform devices
>   https://git.kernel.org/will/c/395ad89d11fd
> 

Thanks!  That concludes most of the SVA work. For SMMUv3 we still need to
figure out DVM, there will be PRI at some point, and I'm sure some
bugfixes, but I don't plan to send any other major feature support for the
next few cycles.

Thanks,
Jean



Re: Plan for /dev/ioasid RFC v2

2021-06-09 Thread Eric Auger
Hi Kevin,

On 6/9/21 11:37 AM, Tian, Kevin wrote:
>> From: Eric Auger 
>> Sent: Wednesday, June 9, 2021 4:15 PM
>>
>> Hi Kevin,
>>
>> On 6/7/21 4:58 AM, Tian, Kevin wrote:
>>> Hi, all,
>>>
>>> We plan to work on v2 now, given many good comments already received
>>> and substantial changes envisioned. This is a very complex topic with
>>> many sub-threads being discussed. To ensure that I didn't miss valuable
>>> suggestions (and also keep everyone on the same page), here I'd like to
>>> provide a list of planned changes in my mind. Please let me know if
>>> anything important is lost.  :)
>>>
>>> --
>>>
>>> (Remaining opens in v1)
>>>
>>> -   Protocol between kvm/vfio/ioasid for wbinvd/no-snoop. I'll see how
>>> much can be refined based on discussion progress when v2 is out;
>>>
>>> -   Device-centric (Jason) vs. group-centric (David) uAPI. David is not 
>>> fully
>>> convinced yet. Based on discussion v2 will continue to have ioasid uAPI
>>> being device-centric (but it's fine for vfio to be group-centric). A new
>>> section will be added to elaborate this part;
>>>
>>> -   PASID virtualization (section 4) has not been thoroughly discussed yet.
>>> Jason gave some suggestion on how to categorize intended usages.
>>> I will rephrase this section and hope more discussions can be held for
>>> it in v2;
>>>
>>> (Adopted suggestions)
>>>
>>> -   (Jason) Rename /dev/ioasid to /dev/iommu (so does uAPI e.g. IOASID
>>> _XXX to IOMMU_XXX). One suggestion (Jason) was to also rename
>>> RID+PASID to SID+SSID. But given the familiarity of the former, I will
>>> still use RID+PASID in v2 to ease the discussion;
>>>
>>> -   (Jason) v1 prevents one device from binding to multiple ioasid_fd's. 
>>> This
>>> will be fixed in v2;
>>>
>>> -   (Jean/Jason) No need to track guest I/O page tables on ARM/AMD.
>> When
>>> a pasid table is bound, it becomes a container for all guest I/O page
>> tables;
>> while I am totally in line with that change, I guess we need to revisit
>> the invalidate ioctl
>> to support PASID table invalidation.
> Yes, this is planned when doing this change.
OK
>
>>> -   (Jean/Jason) Accordingly a device label is required so iotlb 
>>> invalidation
>>> and fault handling can both support per-device operation. Per Jean's
>>> suggestion, this label will come from userspace (when VFIO_BIND_
>>> IOASID_FD);
>> what is not totally clear to me is the correspondence between this label
>> and the SID/SSID tuple.
>> My understanding is that it rather maps to the SID, because you can attach
>> several ioasids to the device.
>> So it is not clear to me how you reconstruct the SSID info.
>>
> Yes, the device handle maps to the SID. The fault data reported to userspace
> will include {device_label, ioasid, vendor_fault_data}. In your case
> I believe the SSID will be included in vendor_fault_data, thus no
> reconstruction is required. For Intel, the user could figure out the vPASID
> according to device_label and ioasid, i.e. there is no need to include PASID
> info in vendor_fault_data.
OK that works.

Thanks

Eric
>
> Thanks
> Kevin



Re: [RFC] /dev/ioasid uAPI proposal

2021-06-09 Thread Jason Gunthorpe
On Wed, Jun 09, 2021 at 11:11:17AM +0200, Paolo Bonzini wrote:
> On 09/06/21 10:51, Enrico Weigelt, metux IT consult wrote:
> > On 08.06.21 21:00, Jason Gunthorpe wrote:
> > 
> > > Eg I can do open() on a file and I get to keep that FD. I get to keep
> > > that FD even if someone later does chmod() on that file so I can't
> > > open it again.
> > > 
> > > There are lots of examples where a one time access control check
> > > provides continuing access to a resource. I feel the ongoing proof is
> > > the rarity in Unix.. 'revoke' is an uncommon concept in Unix..
> > 
> > Yes, it's even possible that somebody w/ privileges opens an fd and
> > hands it over to somebody unprivileged (eg. via unix socket). This is
> > a very basic unix concept. If some (already opened) fd now suddenly
> > behaves differently based on the current caller, that would be a break
> > with traditional unix semantics.
> 
> That's already more or less meaningless for both KVM and VFIO, since they
> are tied to an mm.

vfio isn't supposed to be tied to a mm.

Jason


Re: [RFC] /dev/ioasid uAPI proposal

2021-06-09 Thread Jason Gunthorpe
On Wed, Jun 09, 2021 at 02:49:32AM +, Tian, Kevin wrote:

> Last unclosed open. Jason, you dislike symbol_get in this contract per
> earlier comment. As Alex explained, looks it's more about module
> dependency which is orthogonal to how this contract is designed. What
> is your opinion now?

Generally when you see symbol_get like this it suggests something is
wrong in the layering..

Why shouldn't kvm have a normal module dependency on drivers/iommu?

Jason


Re: Plan for /dev/ioasid RFC v2

2021-06-09 Thread Joerg Roedel
On Mon, Jun 07, 2021 at 02:58:18AM +, Tian, Kevin wrote:
> -   Device-centric (Jason) vs. group-centric (David) uAPI. David is not fully
> convinced yet. Based on discussion v2 will continue to have ioasid uAPI
> being device-centric (but it's fine for vfio to be group-centric). A new
> section will be added to elaborate this part;

I would vote for group-centric here. Or do the reasons for which VFIO is
group-centric not apply to IOASID? If so, why?

Regards,

Joerg


Re: [RFC PATCH V3 01/11] x86/HV: Initialize GHCB page in Isolation VM

2021-06-09 Thread Joerg Roedel
On Sun, May 30, 2021 at 11:06:18AM -0400, Tianyu Lan wrote:
> From: Tianyu Lan 
> 
> Hyper-V exposes the GHCB page via the SEV-ES GHCB MSR for SNP guests
> to communicate with the hypervisor. Map the GHCB page for all
> CPUs to read/write MSR registers and submit hvcall requests
> via GHCB.
> 
> Signed-off-by: Tianyu Lan 
> ---
>  arch/x86/hyperv/hv_init.c   | 60 ++---
>  arch/x86/include/asm/mshyperv.h |  2 ++
>  include/asm-generic/mshyperv.h  |  2 ++
>  3 files changed, 60 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index bb0ae4b5c00f..dc74d01cb859 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -60,6 +60,9 @@ static int hv_cpu_init(unsigned int cpu)
>   struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
>   void **input_arg;
>   struct page *pg;
> + u64 ghcb_gpa;
> + void *ghcb_va;
> + void **ghcb_base;

Any reason you can't reuse the SEV-ES support code in the Linux kernel?
It already has code to set up GHCBs for all vCPUs.

I see that you don't need #VC handling in your SNP VMs because of the
paravisor running underneath it, but just re-using the GHCB setup code
shouldn't be too hard.

Regards,

Joerg


Re: Plan for /dev/ioasid RFC v2

2021-06-09 Thread Jason Gunthorpe
On Wed, Jun 09, 2021 at 02:24:03PM +0200, Joerg Roedel wrote:
> On Mon, Jun 07, 2021 at 02:58:18AM +, Tian, Kevin wrote:
> > -   Device-centric (Jason) vs. group-centric (David) uAPI. David is not 
> > fully
> > convinced yet. Based on discussion v2 will continue to have ioasid uAPI
> > being device-centric (but it's fine for vfio to be group-centric). A new
> > section will be added to elaborate this part;
> 
> I would vote for group-centric here. Or do the reasons for which VFIO is
> group-centric not apply to IOASID? If so, why?

VFIO being group-centric has made it very ugly/difficult to inject
device-driver-specific knowledge into the scheme.

The device driver is the only thing that knows to ask:
 - I need a SW table for this ioasid because I am like a mdev
 - I will issue TLPs with PASID
 - I need a IOASID linked to a PASID
 - I am a device that uses ENQCMD and vPASID
 - etc in future

The current approach has the group try to guess the device driver's
intention in the vfio type 1 code.

I want to see this be clean and have the device driver directly tell
the iommu layer what kind of DMA it plans to do, and thus how it needs
the IOMMU and IOASID configured.

This is the source of the ugly symbol_get and the very, very hacky 'if
you are an mdev *and* an iommu then you must want a single PASID' stuff
in type1.

The group is causing all this mess because the group knows nothing
about what the device drivers contained in the group actually want.

Further being group centric eliminates the possibility of working in
cases like !ACS. How do I use PASID functionality of a device behind a
!ACS switch if the uAPI forces all IOASID's to be linked to a group,
not a device?

Device-centric with a report that "all devices in the group must use
the same IOASID" covers all the new functionality, keeps the old, and
has a better chance of continuing to work as a uAPI into the future.

Jason


Re: [RFC] /dev/ioasid uAPI proposal

2021-06-09 Thread Paolo Bonzini

On 09/06/21 13:57, Jason Gunthorpe wrote:

On Wed, Jun 09, 2021 at 02:49:32AM +, Tian, Kevin wrote:


Last unclosed open. Jason, you dislike symbol_get in this contract per
earlier comment. As Alex explained, looks it's more about module
dependency which is orthogonal to how this contract is designed. What
is your opinion now?


Generally when you see symbol_get like this it suggests something is
wrong in the layering..

Why shouldn't kvm have a normal module dependency on drivers/iommu?


It allows KVM to load even if there's an "install /bin/false" for vfio 
(typically used together with the blacklist directive) in modprobe.conf. 
 This rationale should apply to iommu as well.


Paolo



Re: [RFC PATCH V3 04/11] HV: Add Write/Read MSR registers via ghcb

2021-06-09 Thread Joerg Roedel
On Sun, May 30, 2021 at 11:06:21AM -0400, Tianyu Lan wrote:
> +void hv_ghcb_msr_write(u64 msr, u64 value)
> +{
> + union hv_ghcb *hv_ghcb;
> + void **ghcb_base;
> + unsigned long flags;
> +
> + if (!ms_hyperv.ghcb_base)
> + return;
> +
> + local_irq_save(flags);
> + ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
> + hv_ghcb = (union hv_ghcb *)*ghcb_base;
> + if (!hv_ghcb) {
> + local_irq_restore(flags);
> + return;
> + }
> +
> + memset(hv_ghcb, 0x00, HV_HYP_PAGE_SIZE);
> +
> + hv_ghcb->ghcb.protocol_version = 1;
> + hv_ghcb->ghcb.ghcb_usage = 0;
> +
> + ghcb_set_sw_exit_code(&hv_ghcb->ghcb, SVM_EXIT_MSR);
> + ghcb_set_rcx(&hv_ghcb->ghcb, msr);
> + ghcb_set_rax(&hv_ghcb->ghcb, lower_32_bits(value));
> + ghcb_set_rdx(&hv_ghcb->ghcb, value >> 32);
> + ghcb_set_sw_exit_info_1(&hv_ghcb->ghcb, 1);
> + ghcb_set_sw_exit_info_2(&hv_ghcb->ghcb, 0);
> +
> + VMGEXIT();

This is not safe to use from NMI context. You need at least some
checking or a WARN_ON/assertion/whatever to catch cases where this is
violated. Otherwise it will result in some hard-to-debug bug reports.
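
A minimal example of the kind of guard being asked for (illustrative
only; the exact check is the author's choice):

void hv_ghcb_msr_write(u64 msr, u64 value)
{
	/* The GHCB is per-CPU state, and an NMI could interrupt an
	 * in-progress GHCB call, so catch misuse loudly. */
	if (WARN_ON_ONCE(in_nmi()))
		return;

	/* ... existing body ... */
}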

Regards,

Joerg


Re: [RFC] /dev/ioasid uAPI proposal

2021-06-09 Thread Jason Gunthorpe
On Wed, Jun 09, 2021 at 02:46:05PM +0200, Paolo Bonzini wrote:
> On 09/06/21 13:57, Jason Gunthorpe wrote:
> > On Wed, Jun 09, 2021 at 02:49:32AM +, Tian, Kevin wrote:
> > 
> > > Last unclosed open. Jason, you dislike symbol_get in this contract per
> > > earlier comment. As Alex explained, looks it's more about module
> > > dependency which is orthogonal to how this contract is designed. What
> > > is your opinion now?
> > 
> > Generally when you see symbol_get like this it suggests something is
> > wrong in the layering..
> > 
> > Why shouldn't kvm have a normal module dependency on drivers/iommu?
> 
> It allows KVM to load even if there's an "install /bin/false" for vfio
> (typically used together with the blacklist directive) in modprobe.conf.
> This rationale should apply to iommu as well.

I can vaguely understand this rationale for vfio, but not at all for
the platform's iommu driver, sorry.

Jason


[PATCH 1/1] iommu/ipmmu-vmsa: remove unnecessary oom message

2021-06-09 Thread Zhen Lei
Fixes scripts/checkpatch.pl warning:
WARNING: Possible unnecessary 'out of memory' message

Removing it can help us save a bit of memory.

Signed-off-by: Zhen Lei 
---
 drivers/iommu/ipmmu-vmsa.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index 51ea6f00db2f..d31e3890f5e2 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -992,10 +992,8 @@ static int ipmmu_probe(struct platform_device *pdev)
int ret;
 
mmu = devm_kzalloc(&pdev->dev, sizeof(*mmu), GFP_KERNEL);
-   if (!mmu) {
-   dev_err(&pdev->dev, "cannot allocate device data\n");
+   if (!mmu)
return -ENOMEM;
-   }
 
mmu->dev = &pdev->dev;
spin_lock_init(&mmu->lock);
-- 
2.25.1




Re: [RFC PATCH V3 05/11] HV: Add ghcb hvcall support for SNP VM

2021-06-09 Thread Joerg Roedel
On Sun, May 30, 2021 at 11:06:22AM -0400, Tianyu Lan wrote:
> +u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size)
> +{
> + union hv_ghcb *hv_ghcb;
> + void **ghcb_base;
> + unsigned long flags;
> +
> + if (!ms_hyperv.ghcb_base)
> + return -EFAULT;
> +
> + local_irq_save(flags);
> + ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
> + hv_ghcb = (union hv_ghcb *)*ghcb_base;
> + if (!hv_ghcb) {
> + local_irq_restore(flags);
> + return -EFAULT;
> + }
> +
> + memset(hv_ghcb, 0x00, HV_HYP_PAGE_SIZE);
> + hv_ghcb->ghcb.protocol_version = 1;
> + hv_ghcb->ghcb.ghcb_usage = 1;
> +
> + hv_ghcb->hypercall.outputgpa = (u64)output;
> + hv_ghcb->hypercall.hypercallinput.asuint64 = 0;
> + hv_ghcb->hypercall.hypercallinput.callcode = control;
> +
> + if (input_size)
> + memcpy(hv_ghcb->hypercall.hypercalldata, input, input_size);
> +
> + VMGEXIT();

Also not NMI-safe. When you re-use the existing GHCB setup code from
from SEV-ES code, you can also use sev_es_get/put_ghcb() which takes
care of re-using a GHCB already in use.

Regards,

Joerg



[PATCH 1/1] iommu/vt-d: remove unnecessary oom message

2021-06-09 Thread Zhen Lei
Fixes scripts/checkpatch.pl warning:
WARNING: Possible unnecessary 'out of memory' message

Removing it can help us save a bit of memory.

Signed-off-by: Zhen Lei 
---
 drivers/iommu/intel/dmar.c  | 2 --
 drivers/iommu/intel/iommu.c | 6 +-
 2 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 84057cb9596c..ede7525b4ec3 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -148,8 +148,6 @@ dmar_alloc_pci_notify_info(struct pci_dev *dev, unsigned long event)
} else {
info = kzalloc(size, GFP_KERNEL);
if (!info) {
-   pr_warn("Out of memory when allocating notify_info "
-   "for %s.\n", pci_name(dev));
if (dmar_dev_scope_status == 0)
dmar_dev_scope_status = -ENOMEM;
return NULL;
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index be35284a2016..432f4019d1af 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1780,11 +1780,8 @@ static int iommu_init_domains(struct intel_iommu *iommu)
spin_lock_init(&iommu->lock);
 
iommu->domain_ids = kcalloc(nlongs, sizeof(unsigned long), GFP_KERNEL);
-   if (!iommu->domain_ids) {
-   pr_err("%s: Allocating domain id array failed\n",
-  iommu->name);
+   if (!iommu->domain_ids)
return -ENOMEM;
-   }
 
size = (ALIGN(ndomains, 256) >> 8) * sizeof(struct dmar_domain **);
iommu->domains = kzalloc(size, GFP_KERNEL);
@@ -3220,7 +3217,6 @@ static int __init init_dmars(void)
g_iommus = kcalloc(g_num_of_iommus, sizeof(struct intel_iommu *),
GFP_KERNEL);
if (!g_iommus) {
-   pr_err("Allocating global iommu array failed\n");
ret = -ENOMEM;
goto error;
}
-- 
2.25.1




[PATCH 1/1] iommu/arm-smmu: remove unnecessary oom message

2021-06-09 Thread Zhen Lei
Fixes scripts/checkpatch.pl warning:
WARNING: Possible unnecessary 'out of memory' message

Removing it can help us save a bit of memory.

Signed-off-by: Zhen Lei 
---
 drivers/iommu/arm/arm-smmu/arm-smmu.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index dba15f312cbd..db7b1e101bc5 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -2051,10 +2051,8 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
irqreturn_t (*global_fault)(int irq, void *dev);
 
smmu = devm_kzalloc(dev, sizeof(*smmu), GFP_KERNEL);
-   if (!smmu) {
-   dev_err(dev, "failed to allocate arm_smmu_device\n");
+   if (!smmu)
return -ENOMEM;
-   }
smmu->dev = dev;
 
if (dev->of_node)
@@ -2095,10 +2093,8 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
 
smmu->irqs = devm_kcalloc(dev, num_irqs, sizeof(*smmu->irqs),
  GFP_KERNEL);
-   if (!smmu->irqs) {
-   dev_err(dev, "failed to allocate %d irqs\n", num_irqs);
+   if (!smmu->irqs)
return -ENOMEM;
-   }
 
for (i = 0; i < num_irqs; ++i) {
int irq = platform_get_irq(pdev, i);
-- 
2.25.1




[PATCH 1/1] iommu/arm-smmu-v3: remove unnecessary oom message

2021-06-09 Thread Zhen Lei
Fixes scripts/checkpatch.pl warning:
WARNING: Possible unnecessary 'out of memory' message

Removing it can help us save a bit of memory.

Signed-off-by: Zhen Lei 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 2ddc3cd5a7d1..fd7c55b44881 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2787,10 +2787,8 @@ static int arm_smmu_init_l1_strtab(struct arm_smmu_device *smmu)
void *strtab = smmu->strtab_cfg.strtab;
 
cfg->l1_desc = devm_kzalloc(smmu->dev, size, GFP_KERNEL);
-   if (!cfg->l1_desc) {
-   dev_err(smmu->dev, "failed to allocate l1 stream table desc\n");
+   if (!cfg->l1_desc)
return -ENOMEM;
-   }
 
for (i = 0; i < cfg->num_l1_ents; ++i) {
arm_smmu_write_strtab_l1_desc(strtab, &cfg->l1_desc[i]);
@@ -3581,10 +3579,8 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
bool bypass;
 
smmu = devm_kzalloc(dev, sizeof(*smmu), GFP_KERNEL);
-   if (!smmu) {
-   dev_err(dev, "failed to allocate arm_smmu_device\n");
+   if (!smmu)
return -ENOMEM;
-   }
smmu->dev = dev;
 
if (dev->of_node) {
-- 
2.25.1




Re: [GIT PULL] memory: Tegra memory controller for v5.14

2021-06-09 Thread Krzysztof Kozlowski
On 07/06/2021 10:49, Krzysztof Kozlowski wrote:
> Hi Olof and Arnd,
> 
> Tegra memory controller driver changes with necessary dependency from Thierry
> (which you will also get from him):
> 1. Dmitry's power domain work on Tegra MC drivers,
> 2. Necessary clock and regulator dependencies for Dmitry's work.
> 

Hi Olof and Arnd,

Just FYI, the stable tag from Thierry which I pulled here (and you will
get from him as well) might cause COMPILE_TEST failures on specific
configurations. Regular defconfigs and allyes/allmod should not be affected.

I am giving a heads-up in case this lands in Linus' tree later...

There will be two fixes for this, already sent by Thierry:
https://lore.kernel.org/lkml/20210609112806.3565057-1-thierry.red...@gmail.com/

1. reset controller stubs: going via reset tree,
2. reserved memory section: probably going via my tree later.


Best regards,
Krzysztof


Re: [RFC] /dev/ioasid uAPI proposal

2021-06-09 Thread Paolo Bonzini

On 09/06/21 14:47, Jason Gunthorpe wrote:

On Wed, Jun 09, 2021 at 02:46:05PM +0200, Paolo Bonzini wrote:

On 09/06/21 13:57, Jason Gunthorpe wrote:

On Wed, Jun 09, 2021 at 02:49:32AM +, Tian, Kevin wrote:


Last unclosed open. Jason, you dislike symbol_get in this contract per
earlier comment. As Alex explained, looks it's more about module
dependency which is orthogonal to how this contract is designed. What
is your opinion now?


Generally when you see symbol_get like this it suggests something is
wrong in the layering..

Why shouldn't kvm have a normal module dependency on drivers/iommu?


It allows KVM to load even if there's an "install /bin/false" for vfio
(typically used together with the blacklist directive) in modprobe.conf.
This rationale should apply to iommu as well.


I can vaguely understand this rationale for vfio, but not at all for
the platform's iommu driver, sorry.


Sorry, should apply to ioasid, not iommu (assuming that /dev/ioasid 
support would be modular).


Paolo



Re: Plan for /dev/ioasid RFC v2

2021-06-09 Thread Joerg Roedel
On Wed, Jun 09, 2021 at 09:39:19AM -0300, Jason Gunthorpe wrote:
> VFIO being group centric has made it very ugly/difficult to inject
> device driver specific knowledge into the scheme.

This whole API will be complicated and difficult anyway, so no reason to
unnecessarily simplify things here.

VFIO is group-centric for security/isolation reasons, and since IOASID
is a uAPI it also needs to account for that. In the end the devices
which are going to use this will likely have their own group anyway, so
things will not get too complicated.

> The current approach has the group try to guess the device driver
> intention in the vfio type 1 code.
> 
> I want to see this be clean and have the device driver directly tell
> the iommu layer what kind of DMA it plans to do, and thus how it needs
> the IOMMU and IOASID configured.

I am in for the general idea, it simplifies the code. But the kernel
still needs to check whether the wishlist from user-space can be
fulfilled.

> The group is causing all this mess because the group knows nothing
> about what the device drivers contained in the group actually want.

There are devices in the group, not drivers.

> Further being group centric eliminates the possibility of working in
> cases like !ACS. How do I use PASID functionality of a device behind a
> !ACS switch if the uAPI forces all IOASID's to be linked to a group,
> not a device?

You don't use it, because it is not secure for devices which are not
behind an ACS bridge.

> Device centric with an report that "all devices in the group must use
> the same IOASID" covers all the new functionality, keep the old, and
> has a better chance to keep going as a uAPI into the future.

If all devices in the group have to use the same IOASID anyway, we can
just as well force it by making the interface group-centric.

Regards,

Joerg


Re: [PATCH v5 05/16] media: mtk-jpeg: Use pm_runtime_resume_and_get for PM get_sync

2021-06-09 Thread Matthias Brugger



On 10/04/2021 11:11, Yong Wu wrote:
> pm_runtime_get_sync() will increment the PM usage counter even if it fails.
> This patch uses pm_runtime_resume_and_get() instead of pm_runtime_get_sync()
> to keep the usage counter balanced.
> 
> CC: Xia Jiang 
> Signed-off-by: Yong Wu 

Reviewed-by: Matthias Brugger 

> ---
>  drivers/media/platform/mtk-jpeg/mtk_jpeg_core.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/media/platform/mtk-jpeg/mtk_jpeg_core.c b/drivers/media/platform/mtk-jpeg/mtk_jpeg_core.c
> index 88a23bce569d..a89c7b206eef 100644
> --- a/drivers/media/platform/mtk-jpeg/mtk_jpeg_core.c
> +++ b/drivers/media/platform/mtk-jpeg/mtk_jpeg_core.c
> @@ -920,7 +920,7 @@ static void mtk_jpeg_enc_device_run(void *priv)
>   src_buf = v4l2_m2m_next_src_buf(ctx->fh.m2m_ctx);
>   dst_buf = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
>  
> - ret = pm_runtime_get_sync(jpeg->dev);
> + ret = pm_runtime_resume_and_get(jpeg->dev);
>   if (ret < 0)
>   goto enc_end;
>  
> @@ -973,7 +973,7 @@ static void mtk_jpeg_dec_device_run(void *priv)
>   return;
>   }
>  
> - ret = pm_runtime_get_sync(jpeg->dev);
> + ret = pm_runtime_resume_and_get(jpeg->dev);
>   if (ret < 0)
>   goto dec_end;
>  
> 
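
As background for why the two calls differ on their error paths, a
generic sketch (not mtk-jpeg code; the function name is illustrative):

static int example_device_run(struct device *dev)
{
	int ret;

	/* pm_runtime_get_sync() bumps the usage count even when resume
	 * fails, so the error path must drop the reference by hand: */
	ret = pm_runtime_get_sync(dev);
	if (ret < 0) {
		pm_runtime_put_noidle(dev);	/* easy to forget, causing a leak */
		return ret;
	}
	pm_runtime_put(dev);

	/* pm_runtime_resume_and_get() drops the count itself on failure,
	 * so no cleanup is needed in the error path: */
	ret = pm_runtime_resume_and_get(dev);
	if (ret < 0)
		return ret;
	pm_runtime_put(dev);

	return 0;
}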


[PATCH] iommu/arm-smmu: Fix arm_smmu_device refcount leak in address translation

2021-06-09 Thread Xiyu Yang via iommu
arm_smmu_rpm_get() invokes pm_runtime_get_sync(), which increases the
refcount of the "smmu" even when the return value is less than 0.

The reference counting issue happens in several exception handling paths
of arm_smmu_iova_to_phys_hard(). When those error scenarios occur, the
function forgets to decrease the refcount of "smmu" increased by
arm_smmu_rpm_get(), causing a refcount leak.

Fix this issue by jumping to "out" label when those error scenarios
occur.

Signed-off-by: Xiyu Yang 
Signed-off-by: Xin Tan 
---
 drivers/iommu/arm/arm-smmu/arm-smmu.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 6f72c4d208ca..023dcd72ed35 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1271,10 +1271,11 @@ static phys_addr_t arm_smmu_iova_to_phys_hard(struct iommu_domain *domain,
u64 phys;
unsigned long va, flags;
int ret, idx = cfg->cbndx;
+   phys_addr_t addr = 0;
 
ret = arm_smmu_rpm_get(smmu);
if (ret < 0)
-   return 0;
+   goto out;
 
spin_lock_irqsave(&smmu_domain->cb_lock, flags);
va = iova & ~0xfffUL;
@@ -1290,6 +1291,7 @@ static phys_addr_t arm_smmu_iova_to_phys_hard(struct iommu_domain *domain,
dev_err(dev,
"iova to phys timed out on %pad. Falling back to 
software table walk.\n",
&iova);
+   arm_smmu_rpm_put(smmu);
return ops->iova_to_phys(ops, iova);
}
 
@@ -1298,12 +1300,14 @@ static phys_addr_t arm_smmu_iova_to_phys_hard(struct iommu_domain *domain,
if (phys & ARM_SMMU_CB_PAR_F) {
dev_err(dev, "translation fault!\n");
dev_err(dev, "PAR = 0x%llx\n", phys);
-   return 0;
+   goto out;
}
 
+   addr = (phys & GENMASK_ULL(39, 12)) | (iova & 0xfff);
+out:
arm_smmu_rpm_put(smmu);
 
-   return (phys & GENMASK_ULL(39, 12)) | (iova & 0xfff);
+   return addr;
 }
 
 static phys_addr_t arm_smmu_iova_to_phys(struct iommu_domain *domain,
-- 
2.7.4



[PATCH] iommu/arm-smmu: Fix arm_smmu_device refcount leak when arm_smmu_rpm_get fails

2021-06-09 Thread Xiyu Yang via iommu
arm_smmu_rpm_get() invokes pm_runtime_get_sync(), which increases the
refcount of the "smmu" even though the return value is less than 0.

The reference counting issue happens in some error handling paths of
arm_smmu_rpm_get() in its caller functions. When arm_smmu_rpm_get()
fails, the caller functions forget to decrease the refcount of "smmu"
increased by arm_smmu_rpm_get(), causing a refcount leak.

Fix this issue by calling arm_smmu_rpm_put() or jumping to the "rpm_put"
label when arm_smmu_rpm_get() fails.

Signed-off-by: Xiyu Yang 
Signed-off-by: Xin Tan 
---
 drivers/iommu/arm/arm-smmu/arm-smmu.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 6f72c4d208ca..177ee54c5534 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -840,7 +840,7 @@ static void arm_smmu_destroy_domain_context(struct iommu_domain *domain)
 
ret = arm_smmu_rpm_get(smmu);
if (ret < 0)
-   return;
+   goto rpm_put;
 
/*
 * Disable the context bank and free the page tables before freeing
@@ -857,6 +857,7 @@ static void arm_smmu_destroy_domain_context(struct iommu_domain *domain)
free_io_pgtable_ops(smmu_domain->pgtbl_ops);
__arm_smmu_free_bitmap(smmu->context_map, cfg->cbndx);
 
+rpm_put:
arm_smmu_rpm_put(smmu);
 }
 
@@ -1153,7 +1154,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 
ret = arm_smmu_rpm_get(smmu);
if (ret < 0)
-   return ret;
+   goto rpm_put;
 
/* Ensure that the domain is finalised */
ret = arm_smmu_init_domain_context(domain, smmu, dev);
@@ -1404,7 +1405,7 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
 
ret = arm_smmu_rpm_get(smmu);
if (ret < 0)
-   goto out_cfg_free;
+   goto rpm_put;
 
ret = arm_smmu_master_alloc_smes(dev);
arm_smmu_rpm_put(smmu);
@@ -1417,6 +1418,8 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
 
return &smmu->iommu;
 
+rpm_put:
+   arm_smmu_rpm_put(smmu);
 out_cfg_free:
kfree(cfg);
 out_free:
@@ -1438,8 +1441,10 @@ static void arm_smmu_release_device(struct device *dev)
smmu = cfg->smmu;
 
ret = arm_smmu_rpm_get(smmu);
-   if (ret < 0)
+   if (ret < 0) {
+   arm_smmu_rpm_put(smmu);
return;
+   }
 
arm_smmu_master_free_smes(cfg, fwspec);
 
-- 
2.7.4



Re: [PATCH v5 09/16] drm/mediatek: Use pm_runtime_resume_and_get for PM get_sync

2021-06-09 Thread Matthias Brugger



On 10/04/2021 11:11, Yong Wu wrote:
> pm_runtime_get_sync will increment the pm usage counter even if it fails.
> This patch uses pm_runtime_resume_and_get instead of pm_runtime_get
> to keep the usage counter balanced.
> 
> Signed-off-by: Yong Wu 

Reviewed-by: Matthias Brugger 

> ---
>  drivers/gpu/drm/mediatek/mtk_drm_crtc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c
> index 8b0de90156c6..69d23ce56d2c 100644
> --- a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c
> +++ b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c
> @@ -259,7 +259,7 @@ static int mtk_crtc_ddp_hw_init(struct mtk_drm_crtc *mtk_crtc)
>   drm_connector_list_iter_end(&conn_iter);
>   }
>  
> - ret = pm_runtime_get_sync(crtc->dev->dev);
> + ret = pm_runtime_resume_and_get(crtc->dev->dev);
>   if (ret < 0) {
>   DRM_ERROR("Failed to enable power domain: %d\n", ret);
>   return ret;
> 


Re: [PATCH] iommu/arm-smmu: Fix arm_smmu_device refcount leak when arm_smmu_rpm_get fails

2021-06-09 Thread Robin Murphy

On 2021-06-09 14:35, Xiyu Yang wrote:

arm_smmu_rpm_get() invokes pm_runtime_get_sync(), which increases the
refcount of the "smmu" even though the return value is less than 0.

The reference counting issue happens in some error handling paths of
arm_smmu_rpm_get() in its caller functions. When arm_smmu_rpm_get()
fails, the caller functions forget to decrease the refcount of "smmu"
increased by arm_smmu_rpm_get(), causing a refcount leak.

Fix this issue by calling arm_smmu_rpm_put() or jumping to the "rpm_put"
label when arm_smmu_rpm_get() fails.


If only there was some kind of helper function which could encapsulate 
the correct expected behaviour in a single place...


In fact with the new pm_runtime_resume_and_get() API I think these two 
patches boil down to a one-line change.
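
Something like the below, i.e. a minimal sketch of the existing helper
with the new API dropped in (illustrative only, not a tested patch):

	static inline int arm_smmu_rpm_get(struct arm_smmu_device *smmu)
	{
		if (pm_runtime_enabled(smmu->dev))
			return pm_runtime_resume_and_get(smmu->dev);

		return 0;
	}

pm_runtime_resume_and_get() already drops the usage count itself on
failure, so every caller that bails out on a negative return stops
leaking a reference without any further changes.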


Thanks,
Robin.


Signed-off-by: Xiyu Yang 
Signed-off-by: Xin Tan 
---
  drivers/iommu/arm/arm-smmu/arm-smmu.c | 13 +
  1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 6f72c4d208ca..177ee54c5534 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -840,7 +840,7 @@ static void arm_smmu_destroy_domain_context(struct iommu_domain *domain)
  
  	ret = arm_smmu_rpm_get(smmu);

if (ret < 0)
-   return;
+   goto rpm_put;
  
  	/*

 * Disable the context bank and free the page tables before freeing
@@ -857,6 +857,7 @@ static void arm_smmu_destroy_domain_context(struct iommu_domain *domain)
free_io_pgtable_ops(smmu_domain->pgtbl_ops);
__arm_smmu_free_bitmap(smmu->context_map, cfg->cbndx);
  
+rpm_put:

arm_smmu_rpm_put(smmu);
  }
  
@@ -1153,7 +1154,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
  
  	ret = arm_smmu_rpm_get(smmu);

if (ret < 0)
-   return ret;
+   goto rpm_put;
  
  	/* Ensure that the domain is finalised */

ret = arm_smmu_init_domain_context(domain, smmu, dev);
@@ -1404,7 +1405,7 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
  
  	ret = arm_smmu_rpm_get(smmu);

if (ret < 0)
-   goto out_cfg_free;
+   goto rpm_put;
  
  	ret = arm_smmu_master_alloc_smes(dev);

arm_smmu_rpm_put(smmu);
@@ -1417,6 +1418,8 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
  
  	return &smmu->iommu;
  
+rpm_put:

+   arm_smmu_rpm_put(smmu);
  out_cfg_free:
kfree(cfg);
  out_free:
@@ -1438,8 +1441,10 @@ static void arm_smmu_release_device(struct device *dev)
smmu = cfg->smmu;
  
  	ret = arm_smmu_rpm_get(smmu);

-   if (ret < 0)
+   if (ret < 0) {
+   arm_smmu_rpm_put(smmu);
return;
+   }
  
  	arm_smmu_master_free_smes(cfg, fwspec);
  




Re: [RFC] /dev/ioasid uAPI proposal

2021-06-09 Thread Alex Williamson
On Wed, 9 Jun 2021 08:54:45 -0300
Jason Gunthorpe  wrote:

> On Wed, Jun 09, 2021 at 11:11:17AM +0200, Paolo Bonzini wrote:
> > On 09/06/21 10:51, Enrico Weigelt, metux IT consult wrote:  
> > > On 08.06.21 21:00, Jason Gunthorpe wrote:
> > >   
> > > > Eg I can do open() on a file and I get to keep that FD. I get to keep
> > > > that FD even if someone later does chmod() on that file so I can't
> > > > open it again.
> > > > 
> > > > There are lots of examples where a one time access control check
> > > > provides continuing access to a resource. I feel the ongoing proof is
> > > > the rarity in Unix.. 'revoke' is an uncommon concept in Unix..  
> > > 
> > > Yes, it's even possible that somebody w/ privileges opens an fd and
> > > hands it over to somebody unprivileged (eg. via unix socket). This is
> > > a very basic unix concept. If some (already opened) fd now suddenly
> > > behaves differently based on the current caller, that would be a break
> > > with traditional unix semantics.  

That's not the scenario here; this would work as expected.  It's not a
matter of who uses the fd, it's that a property of the fd that provided
elevated access has been removed.  I only need to look as far as sudo
to find examples of protocols where having access at some point in time
does not guarantee ongoing access.

If we go back to the wbinvd ioctl mechanism, if I call that ioctl with
an ioasidfd that contains no devices, then I shouldn't be able to
generate a wbinvd on the processor, right?  If I add a device,
especially in a configuration that can generate non-coherent DMA, now
that ioctl should work.  If I then remove all devices from that ioasid,
what then is the difference from the initial state.  Should the ioctl
now work because it worked once in the past?

If we equate KVM to the process access to the ioctl, then a reference
or notification method should be used to retain or mark the end of that
access.  This is no different than starting a shell via sudo (ie. an
ongoing reference) or having the previous authentication time out, or
in our case be notified it has expired.

> > That's already more or less meaningless for both KVM and VFIO, since they
> > are tied to an mm.  
> 
> vfio isn't supposed to be tied to a mm.

vfio does accounting against an mm, why shouldn't it be tied to an mm?
Maybe you mean in the new model where vfio is just a device driver?
Thanks,

Alex



Re: [RFC] /dev/ioasid uAPI proposal

2021-06-09 Thread Jason Gunthorpe
On Wed, Jun 09, 2021 at 03:24:11PM +0200, Paolo Bonzini wrote:
> On 09/06/21 14:47, Jason Gunthorpe wrote:
> > On Wed, Jun 09, 2021 at 02:46:05PM +0200, Paolo Bonzini wrote:
> > > On 09/06/21 13:57, Jason Gunthorpe wrote:
> > > > On Wed, Jun 09, 2021 at 02:49:32AM +, Tian, Kevin wrote:
> > > > 
> > > > > Last unclosed open. Jason, you dislike symbol_get in this contract per
> > > > > earlier comment. As Alex explained, looks it's more about module
> > > > > dependency which is orthogonal to how this contract is designed. What
> > > > > is your opinion now?
> > > > 
> > > > Generally when you see symbol_get like this it suggests something is
> > > > wrong in the layering..
> > > > 
> > > > Why shouldn't kvm have a normal module dependency on drivers/iommu?
> > > 
> > > It allows KVM to load even if there's an "install /bin/false" for vfio
> > > (typically used together with the blacklist directive) in modprobe.conf.
> > > This rationale should apply to iommu as well.
> > 
> > I can vaugely understand this rational for vfio, but not at all for
> > the platform's iommu driver, sorry.
> 
> Sorry, should apply to ioasid, not iommu (assuming that /dev/ioasid support
> would be modular).

/dev/ioasid and drivers/iommu are tightly coupled things, I we can put
a key symbol or two in a place where it will not be a problem to
depend on.

Jason


Re: [PATCH v5 14/16] memory: mtk-smi: Get rid of mtk_smi_larb_get/put

2021-06-09 Thread Matthias Brugger



On 10/04/2021 11:11, Yong Wu wrote:
> After adding device_link between the iommu consumer and smi-larb,
> the pm_runtime_get(_sync) of smi-larb and smi-common will be called
> automatically. We can get rid of mtk_smi_larb_get/put.
> 
> CC: Matthias Brugger 
> Signed-off-by: Yong Wu 
> Reviewed-by: Evan Green 

Acked-by: Matthias Brugger 

> ---
>  drivers/memory/mtk-smi.c   | 14 --
>  include/soc/mediatek/smi.h | 20 
>  2 files changed, 34 deletions(-)
> 
> diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c
> index c5fb51f73b34..7c61c924e220 100644
> --- a/drivers/memory/mtk-smi.c
> +++ b/drivers/memory/mtk-smi.c
> @@ -134,20 +134,6 @@ static void mtk_smi_clk_disable(const struct mtk_smi *smi)
>   clk_disable_unprepare(smi->clk_apb);
>  }
>  
> -int mtk_smi_larb_get(struct device *larbdev)
> -{
> - int ret = pm_runtime_resume_and_get(larbdev);
> -
> - return (ret < 0) ? ret : 0;
> -}
> -EXPORT_SYMBOL_GPL(mtk_smi_larb_get);
> -
> -void mtk_smi_larb_put(struct device *larbdev)
> -{
> - pm_runtime_put_sync(larbdev);
> -}
> -EXPORT_SYMBOL_GPL(mtk_smi_larb_put);
> -
>  static int
>  mtk_smi_larb_bind(struct device *dev, struct device *master, void *data)
>  {
> diff --git a/include/soc/mediatek/smi.h b/include/soc/mediatek/smi.h
> index 15e3397cec58..11f7d6b59642 100644
> --- a/include/soc/mediatek/smi.h
> +++ b/include/soc/mediatek/smi.h
> @@ -19,26 +19,6 @@ struct mtk_smi_larb_iommu {
>   unsigned char  bank[32];
>  };
>  
> -/*
> - * mtk_smi_larb_get: Enable the power domain and clocks for this local 
> arbiter.
> - *   It also initialize some basic setting(like iommu).
> - * mtk_smi_larb_put: Disable the power domain and clocks for this local 
> arbiter.
> - * Both should be called in non-atomic context.
> - *
> - * Returns 0 if successful, negative on failure.
> - */
> -int mtk_smi_larb_get(struct device *larbdev);
> -void mtk_smi_larb_put(struct device *larbdev);
> -
> -#else
> -
> -static inline int mtk_smi_larb_get(struct device *larbdev)
> -{
> - return 0;
> -}
> -
> -static inline void mtk_smi_larb_put(struct device *larbdev) { }
> -
>  #endif
>  
>  #endif
> 


Re: [RFC] /dev/ioasid uAPI proposal

2021-06-09 Thread Jason Gunthorpe
On Wed, Jun 09, 2021 at 08:31:34AM -0600, Alex Williamson wrote:

> If we go back to the wbinvd ioctl mechanism, if I call that ioctl with
> an ioasidfd that contains no devices, then I shouldn't be able to
> generate a wbinvd on the processor, right?  If I add a device,
> especially in a configuration that can generate non-coherent DMA, now
> that ioctl should work.  If I then remove all devices from that ioasid,
> what then is the difference from the initial state.  Should the ioctl
> now work because it worked once in the past?

The ioctl is fine, but telling KVM to enable WBINVD is very similar to
open, and then reconfiguring the ioasid_fd is very similar to
chmod. From a security perspective, revoke is not strictly required,
IMHO.

> access.  This is no different than starting a shell via sudo (ie. an
> ongoing reference) or having the previous authentication time out, or
> in our case be notified it has expired.

Those are all authentication gates as well, yes sudo has a timer, but
once the timer expires it doesn't forcibly revoke & close all the
existing sudo sessions. It just means you can't create new ones
without authenticating.

> > > That's already more or less meaningless for both KVM and VFIO, since they
> > > are tied to an mm.  
> > 
> > vfio isn't supposed to be tied to a mm.
> 
> vfio does accounting against an mm, why shouldn't it be tied to an mm?

It looks like vfio type 1 is doing it properly, each range of user
VA is stuffed into a struct vfio_dma and that contains a struct task
(which can be a mm_struct these days) that refers to the owning mm.

Looks like a single fd can hold multiple vfio_dma's and I don't see an
enforcement that current is locked to any specific process.
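
For reference, the bookkeeping in vfio_iommu_type1.c looks roughly like
this (abridged sketch from memory, not the authoritative definition):

	struct vfio_dma {
		struct rb_node		node;
		dma_addr_t		iova;		/* Device address */
		unsigned long		vaddr;		/* Process virtual addr */
		size_t			size;		/* Map size (bytes) */
		int			prot;		/* IOMMU_READ/WRITE */
		struct task_struct	*task;		/* mm used for accounting */
		struct rb_root		pfn_list;	/* Ex-user pinned pfns */
	};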

When the accounting is done it is done via the mm obtained through the
vfio_dma struct, not a global FD wide mm.

This appears all fine for something using pin_user_pages(). We don't
expect FDs to become locked to a single process on the first call to
pin_user_pages() that is un-unixy.

kvm is special in this regard.

Jason


[PATCH] iommu/io-pgtable-arm: Optimize partial walk flush for large scatter-gather list

2021-06-09 Thread Sai Prakash Ranjan
Currently for iommu_unmap() of large scatter-gather list with page size
elements, the majority of time is spent in flushing of partial walks in
__arm_lpae_unmap() which is a VA based TLB invalidation (TLBIVA for
arm-smmu).

For example: to unmap a 32MB scatter-gather list with page size elements
(8192 entries), there are 16->2MB buffer unmaps based on the pgsize (2MB
for 4K granule) and each of 2MB will further result in 512 TLBIVAs (2MB/4K)
resulting in a total of 8192 TLBIVAs (512*16) for 16->2MB causing a huge
overhead.

So instead use io_pgtable_tlb_flush_all() to invalidate the entire context
if size (pgsize) is greater than the granule size (4K, 16K, 64K). For this
example of 32MB scatter-gather list unmap, this results in just 16 ASID
based TLB invalidations or tlb_flush_all() callback (TLBIASID in case of
arm-smmu) as opposed to 8192 TLBIVAs thereby increasing the performance of
unmaps drastically.

Condition (size > granule size) is chosen for io_pgtable_tlb_flush_all()
because for any granule with supported pgsizes, we will have at least 512
TLB invalidations for which tlb_flush_all() is already recommended. For
example, take 4K granule with 2MB pgsize, this will result in 512 TLBIVA
in partial walk flush.

Test on QTI SM8150 SoC for 10 iterations of iommu_{map_sg}/unmap:
(average over 10 iterations)

Before this optimization:

size    iommu_map_sg      iommu_unmap
  4K       2.067 us         1.854 us
 64K       9.598 us         8.802 us
  1M     148.890 us       130.718 us
  2M     305.864 us        67.291 us
 12M    1793.604 us       390.838 us
 16M    2386.848 us       518.187 us
 24M    3563.296 us       775.989 us
 32M    4747.171 us      1033.364 us

After this optimization:

size    iommu_map_sg      iommu_unmap
  4K       1.723 us         1.765 us
 64K       9.880 us         8.869 us
  1M     155.364 us       135.223 us
  2M     303.906 us         5.385 us
 12M    1786.557 us        21.250 us
 16M    2391.890 us        27.437 us
 24M    3570.895 us        39.937 us
 32M    4755.234 us        51.797 us

This is further reduced once the map/unmap_pages() support gets in which
will result in just 1 tlb_flush_all() as opposed to 16 tlb_flush_all().

Signed-off-by: Sai Prakash Ranjan 
---
 drivers/iommu/io-pgtable-arm.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 87def58e79b5..c3cb9add3179 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -589,8 +589,11 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
 
if (!iopte_leaf(pte, lvl, iop->fmt)) {
/* Also flush any partial walks */
-   io_pgtable_tlb_flush_walk(iop, iova, size,
- ARM_LPAE_GRANULE(data));
+   if (size > ARM_LPAE_GRANULE(data))
+   io_pgtable_tlb_flush_all(iop);
+   else
+   io_pgtable_tlb_flush_walk(iop, iova, size,
+ ARM_LPAE_GRANULE(data));
ptep = iopte_deref(pte, data);
__arm_lpae_free_pgtable(data, lvl + 1, ptep);
} else if (iop->cfg.quirks & IO_PGTABLE_QUIRK_NON_STRICT) {
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation



Re: Plan for /dev/ioasid RFC v2

2021-06-09 Thread Jason Gunthorpe
On Wed, Jun 09, 2021 at 03:32:34PM +0200, Joerg Roedel wrote:

> > The group is causing all this mess because the group knows nothing
> > about what the device drivers contained in the group actually want.
> 
> There are devices in the group, not drivers.

Well exactly, that is the whole problem.

Only *drivers* know what the actual device is going to do, devices do
not. Since the group doesn't have drivers it is the wrong layer to be
making choices about how to configure the IOMMU.

As I've said, trying to cram these necessary choices through the group
has made a mess. I think if people want to keep the group then they need
to come up with a reasonable in-kernel API that gets the driver
involved in the required decisions, ie. figure out how to do PASID
support on VFIO type1 that isn't grotesquely hardwired to mdev like
today.

The device centric approach is my attempt at this, and it is pretty
clean, I think.

> > Further being group centric eliminates the possibility of working in
> > cases like !ACS. How do I use PASID functionality of a device behind a
> > !ACS switch if the uAPI forces all IOASID's to be linked to a group,
> > not a device?
> 
> You don't use it, because it is not secure for devices which are not
> behind an ACS bridge.

All ACS does is prevent P2P operations, if you assign all the group
devices into the same /dev/iommu then you may not care about that
security isolation property. At the very least it is policy for user
to decide, not kernel.

> > Device centric with an report that "all devices in the group must use
> > the same IOASID" covers all the new functionality, keep the old, and
> > has a better chance to keep going as a uAPI into the future.
> 
> If all devices in the group have to use the same IOASID anyway, 

That isn't true! That is true *today* due to the API design but
nothing about the HW forces this, and with PASID it starts to become
problematic.

Groups should be primarily about isolation security, not about IOASID
matching.

Again, there is no reason to block PASID support in the vIOMMU if all
the devices in the group are assigned into the same VM, and the HW can
properly match the (RID,PASID). PASID can't transit a PCI-PCIe bridge,
PASID isn't supported by old IOMMUs that can't do RID matching, so
PASID scenarios should always be able to determine the source
regardless of what the group layout is.

Blocking this forever in the new uAPI just because group = IOASID is
some historical convenience makes no sense to me.

Jason


Re: [PATCH v11 1/3] iommu: Enhance IOMMU default DMA mode build options

2021-06-09 Thread Robin Murphy

On 2021-06-08 14:18, John Garry wrote:

From: Zhen Lei 

First, add build options IOMMU_DEFAULT_{LAZY|STRICT}, so that we have the
opportunity to set {lazy|strict} mode as default at build time. Then put
the two config options in a choice, as they are mutually exclusive.

[jpg: Make choice between strict and lazy only (and not passthrough)]
Signed-off-by: Zhen Lei 
Signed-off-by: John Garry 
---
  drivers/iommu/Kconfig | 35 +++
  drivers/iommu/iommu.c |  3 ++-
  2 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 1f111b399bca..369a3af9e5bf 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -90,6 +90,41 @@ config IOMMU_DEFAULT_PASSTHROUGH
  
  	  If unsure, say N here.
  
+choice

+   prompt "IOMMU default DMA mode"
+   depends on IOMMU_API
+   depends on X86 || IA64 || X86_64 || ARM || ARM64


Simply "depends on IOMMU_DMA" should suffice, since that's now the only 
place where flush queues matter.
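
i.e. something like this (sketch of the suggested simplification only;
the rest of the hunk stays as-is):

	choice
		prompt "IOMMU default DMA mode"
		depends on IOMMU_DMA
		default IOMMU_DEFAULT_STRICT
		...
	endchoice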



+
+   default IOMMU_DEFAULT_STRICT
+   help
+ This option allows an IOMMU DMA mode to be chosen at build time, to
+ override the default DMA mode of each ARCH, removing the need to
+ pass in kernel parameters through command line. It is still possible
+ to provide ARCH-specific or common boot options to override this
+ option.
+
+ If unsure, keep the default.
+
+config IOMMU_DEFAULT_LAZY
+   bool "lazy"
+   help
+ Support lazy mode, where for every IOMMU DMA unmap operation, the
+ flush operation of IOTLB and the free operation of IOVA are deferred.
+ They are only guaranteed to be done before the related IOVA will be
+ reused.
+
+config IOMMU_DEFAULT_STRICT
+   bool "strict"
+   help
+ For every IOMMU DMA unmap operation, the flush operation of IOTLB and
+ the free operation of IOVA are guaranteed to be done in the unmap
+ function.
+
+ This mode is safer than lazy mode, but it may be slower in some high
+ performance scenarios.


FWIW, as an end user who doesn't care much about the implementation 
details I'd probably appreciate the actual implications being clearer, 
i.e. what does "safer" mean in practice and what is it relative to?


Robin.


+
+endchoice
+
  config OF_IOMMU
def_bool y
depends on OF && IOMMU_API
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 966426a96520..177b0dafc535 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -29,7 +29,8 @@ static struct kset *iommu_group_kset;
  static DEFINE_IDA(iommu_group_ida);
  
  static unsigned int iommu_def_domain_type __read_mostly;

-static bool iommu_dma_strict __read_mostly = true;
+static bool iommu_dma_strict __read_mostly =
+   IS_ENABLED(CONFIG_IOMMU_DEFAULT_STRICT);
  static u32 iommu_cmd_line __read_mostly;
  
  struct iommu_group {





Re: [RFC] /dev/ioasid uAPI proposal

2021-06-09 Thread Paolo Bonzini

On 09/06/21 16:45, Jason Gunthorpe wrote:

On Wed, Jun 09, 2021 at 08:31:34AM -0600, Alex Williamson wrote:


If we go back to the wbinvd ioctl mechanism, if I call that ioctl with
an ioasidfd that contains no devices, then I shouldn't be able to
generate a wbinvd on the processor, right?  If I add a device,
especially in a configuration that can generate non-coherent DMA, now
that ioctl should work.  If I then remove all devices from that ioasid,
what then is the difference from the initial state.  Should the ioctl
now work because it worked once in the past?


The ioctl is fine, but telling KVM to enable WBINVD is very similar to
open and then reconfiguring the ioasid_fd is very similar to
chmod. From a security perspective revoke is not strictly required,
IMHO.


I absolutely do *not* want an API that tells KVM to enable WBINVD.  This 
is not up for discussion.


But really, let's stop calling the file descriptor a security proof or a 
capability.  It's overkill; all that we are doing here is kernel 
acceleration of the WBINVD ioctl.


As a thought experiment, let's consider what would happen if wbinvd 
caused an unconditional exit from guest to userspace.  Userspace would 
react by invoking the ioctl on the ioasid.  The proposed functionality 
is just an acceleration of this same thing, avoiding the 
guest->KVM->userspace->IOASID->wbinvd trip.


This is why the API that I want, and that already exists for VFIO 
group file descriptors, informs KVM of which "ioctls" the guest should 
be able to do via privileged instructions[1].  Then the kernel works out 
with KVM how to ensure a 1:1 correspondence between the operation of the 
ioctls and the privileged operations.


One way to do it would be to always trap WBINVD and invoke the same 
kernel function that implements the ioctl.  The function would do either 
a wbinvd or nothing, based on whether the ioasid has any device.  The 
next logical step is a notification mechanism that enables WBINVD (by 
disabling the WBINVD intercept) when there are devices in the ioasidfd, 
and disables WBINVD (by enabling a no-op intercept) when there are none.
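
A rough sketch of that last step, with invented names purely for
illustration (hypothetical; the existing x86 plumbing in this area is
kvm_arch_register_noncoherent_dma() and friends, used today by the
kvm-vfio device):

	/*
	 * Hypothetical callback: invoked by the ioasidfd whenever the set
	 * of attached devices that can generate non-coherent DMA becomes
	 * empty or non-empty, so KVM can toggle the WBINVD intercept.
	 */
	static void kvm_ioasid_noncoherent_dma_notify(struct kvm *kvm, bool any)
	{
		kvm->arch.honor_guest_wbinvd = any;
	}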


And in fact once all VFIO devices are gone, wbinvd is for all purposes a 
no-op as far as the guest kernel can tell.  So there's no reason to 
treat it as anything but a no-op.


Thanks,

Paolo

[1] As an aside, I must admit I didn't entirely understand the design of 
the KVM-VFIO device back when Alex added it.  But with this model it was 
absolutely the right thing to do, and it remains the right thing to do 
even if VFIO groups are replaced with IOASID file descriptors.




Re: [PATCH v11 1/3] iommu: Enhance IOMMU default DMA mode build options

2021-06-09 Thread John Garry

On 09/06/2021 16:03, Robin Murphy wrote:

On 2021-06-08 14:18, John Garry wrote:

From: Zhen Lei 

First, add build options IOMMU_DEFAULT_{LAZY|STRICT}, so that we have the
opportunity to set {lazy|strict} mode as default at build time. Then put
the two config options in a choice, as they are mutually exclusive.

[jpg: Make choice between strict and lazy only (and not passthrough)]
Signed-off-by: Zhen Lei 
Signed-off-by: John Garry 
---
  drivers/iommu/Kconfig | 35 +++
  drivers/iommu/iommu.c |  3 ++-
  2 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 1f111b399bca..369a3af9e5bf 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -90,6 +90,41 @@ config IOMMU_DEFAULT_PASSTHROUGH
    If unsure, say N here.
+choice
+    prompt "IOMMU default DMA mode"
+    depends on IOMMU_API
+    depends on X86 || IA64 || X86_64 || ARM || ARM64


Simply "depends on IOMMU_DMA" should suffice, since that's now the only 
place where flush queues matter.


I suppose so.

Configs ARM64, AMD_IOMMU, and INTEL_IOMMU all select this.




+
+    default IOMMU_DEFAULT_STRICT
+    help
+  This option allows an IOMMU DMA mode to be chosen at build time, to
+  override the default DMA mode of each ARCH, removing the need to
+  pass in kernel parameters through command line. It is still possible
+  to provide ARCH-specific or common boot options to override this
+  option.
+
+  If unsure, keep the default.
+
+config IOMMU_DEFAULT_LAZY
+    bool "lazy"
+    help
+  Support lazy mode, where for every IOMMU DMA unmap operation, the
+  flush operation of IOTLB and the free operation of IOVA are deferred.
+  They are only guaranteed to be done before the related IOVA will be
+  reused.
+
+config IOMMU_DEFAULT_STRICT
+    bool "strict"
+    help
+  For every IOMMU DMA unmap operation, the flush operation of IOTLB and
+  the free operation of IOVA are guaranteed to be done in the unmap
+  function.
+
+  This mode is safer than lazy mode, but it may be slower in some high
+  performance scenarios.


FWIW, as an end user who doesn't care much about the implementation 
details I'd probably appreciate the actual implications being clearer, 
i.e. what does "safer" mean in practice and what is it relative to?




Fine, I can mention that lazy mode means that we have reduced device 
isolation and a dangerous window can be created between device driver 
DMA unmap and zapping the mapping in the IOMMU; however still much safer 
than passthrough/no IOMMU, which means no isolation at all.
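
For reference, the boot-time override mentioned in the help text already
exists as the iommu.strict= parameter, e.g.:

	iommu.strict=1	# strict: IOTLB invalidated synchronously on unmap
	iommu.strict=0	# lazy: IOTLB invalidation and IOVA reuse deferred

so the new Kconfig choice only changes which of the two is the default.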


Thanks,
John

Re: Plan for /dev/ioasid RFC v2

2021-06-09 Thread Joerg Roedel
On Wed, Jun 09, 2021 at 12:00:09PM -0300, Jason Gunthorpe wrote:
> Only *drivers* know what the actual device is going to do, devices do
> not. Since the group doesn't have drivers it is the wrong layer to be
> making choices about how to configure the IOMMU.

Groups don't carry how to configure IOMMUs; that information is
mostly in the IOMMU domains. And those (or an abstraction of them) are
configured through /dev/ioasid. So I am not sure what you wanted to say
with the above.

All a group carries is information about which devices are not
sufficiently isolated from each other and thus need to always be in the
same domain.

> The device centric approach is my attempt at this, and it is pretty
> clean, I think.

Clean, but still insecure.

> All ACS does is prevent P2P operations, if you assign all the group
> devices into the same /dev/iommu then you may not care about that
> security isolation property. At the very least it is policy for user
> to decide, not kernel.

It is a kernel decision, because a fundamental task of the kernel is to
ensure isolation between user-space tasks as well as it can. And if a
device assigned to one task can interfere with a device of another task
(e.g. by sending P2P messages), then the promise of isolation is broken.

> Groups should be primarily about isolation security, not about IOASID
> matching.

That doesn't make any sense, what do you mean by 'IOASID matching'?

> Blocking this forever in the new uAPI just because group = IOASID is
> some historical convenience makes no sense to me.

I think it is safe to assume that devices supporting PASID will most
often be the only ones in their group. But for the non-PASID IOASID
use-cases like plain old device assignment to a VM it needs to be
group-centric.

Regards,

Joerg


Re: Plan for /dev/ioasid RFC v2

2021-06-09 Thread Alex Williamson
On Wed, 9 Jun 2021 17:51:26 +0200
Joerg Roedel  wrote:

> On Wed, Jun 09, 2021 at 12:00:09PM -0300, Jason Gunthorpe wrote:
> > Only *drivers* know what the actual device is going to do, devices do
> > not. Since the group doesn't have drivers it is the wrong layer to be
> > making choices about how to configure the IOMMU.  
> 
> Groups don't carry how to configure IOMMUs, that information is
> mostly in the IOMMU domains. And those (or an abstraction of them) is
> configured through /dev/ioasid. So not sure what you wanted to say with
> the above.
> 
> All a group carries is information about which devices are not
> sufficiently isolated from each other and thus need to always be in the
> same domain.
> 
> > The device centric approach is my attempt at this, and it is pretty
> > clean, I think.  
> 
> Clean, but still insecure.
> 
> > All ACS does is prevent P2P operations, if you assign all the group
> > devices into the same /dev/iommu then you may not care about that
> > security isolation property. At the very least it is policy for user
> > to decide, not kernel.  
> 
> It is a kernel decision, because a fundamental task of the kernel is to
> ensure isolation between user-space tasks as well as it can. And if a
> device assigned to one task can interfere with a device of another task
> (e.g. by sending P2P messages), then the promise of isolation is broken.

AIUI, the IOASID model will still enforce IOMMU groups, but it's not an
explicit part of the interface like it is for vfio.  For example the
IOASID model allows attaching individual devices such that we have
granularity to create per device IOASIDs, but all devices within an
IOMMU group are required to be attached to an IOASID before they can be
used.  It's not entirely clear to me yet how that last bit gets
implemented though, ie. what barrier is in place to prevent device
usage prior to reaching this viable state.

> > Groups should be primarily about isolation security, not about IOASID
> > matching.  
> 
> That doesn't make any sense, what do you mean by 'IOASID matching'?

One of the problems with the vfio interface use of groups is that we
conflate the IOMMU group for both isolation and granularity.  I think
what Jason is referring to here is that we still want groups to be the
basis of isolation, but we don't want a uAPI that presumes all devices
within the group must use the same IOASID.  For example, if a user owns
an IOMMU group consisting of non-isolated functions of a multi-function
device, they should be able to create a vIOMMU VM where each of those
functions has its own address space.  That can't be done today, the
entire group would need to be attached to the VM under a PCIe-to-PCI
bridge to reflect the address space limitation imposed by the vfio
group uAPI model.  Thanks,

Alex



Re: Plan for /dev/ioasid RFC v2

2021-06-09 Thread Alex Williamson
On Wed, 9 Jun 2021 10:15:32 -0600
Alex Williamson  wrote:

> On Wed, 9 Jun 2021 17:51:26 +0200
> Joerg Roedel  wrote:
> 
> > On Wed, Jun 09, 2021 at 12:00:09PM -0300, Jason Gunthorpe wrote:  
> > > Only *drivers* know what the actual device is going to do, devices do
> > > not. Since the group doesn't have drivers it is the wrong layer to be
> > > making choices about how to configure the IOMMU.
> > 
> > Groups don't carry how to configure IOMMUs, that information is
> > mostly in the IOMMU domains. And those (or an abstraction of them) is
> > configured through /dev/ioasid. So not sure what you wanted to say with
> > the above.
> > 
> > All a group carries is information about which devices are not
> > sufficiently isolated from each other and thus need to always be in the
> > same domain.
> >   
> > > The device centric approach is my attempt at this, and it is pretty
> > > clean, I think.
> > 
> > Clean, but still insecure.
> >   
> > > All ACS does is prevent P2P operations, if you assign all the group
> > > devices into the same /dev/iommu then you may not care about that
> > > security isolation property. At the very least it is policy for user
> > > to decide, not kernel.
> > 
> > It is a kernel decision, because a fundamental task of the kernel is to
> > ensure isolation between user-space tasks as well as it can. And if a
> > device assigned to one task can interfere with a device of another task
> > (e.g. by sending P2P messages), then the promise of isolation is broken.  
> 
> AIUI, the IOASID model will still enforce IOMMU groups, but it's not an
> explicit part of the interface like it is for vfio.  For example the
> IOASID model allows attaching individual devices such that we have
> granularity to create per device IOASIDs, but all devices within an
> IOMMU group are required to be attached to an IOASID before they can be
> used.  It's not entirely clear to me yet how that last bit gets
> implemented though, ie. what barrier is in place to prevent device
> usage prior to reaching this viable state.
> 
> > > Groups should be primarily about isolation security, not about IOASID
> > > matching.
> > 
> > That doesn't make any sense, what do you mean by 'IOASID matching'?  
> 
> One of the problems with the vfio interface use of groups is that we
> conflate the IOMMU group for both isolation and granularity.  I think
> what Jason is referring to here is that we still want groups to be the
> basis of isolation, but we don't want a uAPI that presumes all devices
> within the group must use the same IOASID.  For example, if a user owns
> an IOMMU group consisting of non-isolated functions of a multi-function
> device, they should be able to create a vIOMMU VM where each of those
> functions has its own address space.  That can't be done today, the
> entire group would need to be attached to the VM under a PCIe-to-PCI
> bridge to reflect the address space limitation imposed by the vfio
> group uAPI model.  Thanks,

Hmm, likely discussed previously in these threads, but I can't come up
with the argument that prevents us from making the BIND interface
at the group level but the ATTACH interface at the device level?  For
example:

 - VFIO_GROUP_BIND_IOASID_FD
 - VFIO_DEVICE_ATTACH_IOASID

AFAICT that makes the group ownership more explicit but still allows
the device level IOASID granularity.  Logically this is just an
internal iommu_group_for_each_dev() in the BIND ioctl.  Thanks,

Alex



RE: [PATCH] x86/cpufeatures: Force disable X86_FEATURE_ENQCMD and remove update_pasid()

2021-06-09 Thread Luck, Tony
>> I've told Fenghua to dig out the previous iteration of this patch where
>> the plan was to lazily fix the PASID_MSR in other threads in the #GP
>> handler.
>
> Blech.  Also this won't work for other PASID-like features.
>
> I have a half-written patch to fix this up for real.  Stay tuned.

Andy: Any update on this?

-Tony


Re: [RFC] /dev/ioasid uAPI proposal

2021-06-09 Thread Alex Williamson
On Wed, 9 Jun 2021 02:49:32 +
"Tian, Kevin"  wrote:

> > From: Alex Williamson 
> > Sent: Wednesday, June 9, 2021 2:47 AM
> > 
> > On Tue, 8 Jun 2021 15:44:26 +0200
> > Paolo Bonzini  wrote:
> >   
> > > On 08/06/21 15:15, Jason Gunthorpe wrote:  
> > > > On Tue, Jun 08, 2021 at 09:56:09AM +0200, Paolo Bonzini wrote:
> > > >  
> > >  Alternatively you can add a KVM_DEV_IOASID_{ADD,DEL} pair of  
> > ioctls. But it  
> > >  seems useless complication compared to just using what we have  
> > now, at least  
> > >  while VMs only use IOASIDs via VFIO.  
> > > >>>
> > > >>> The simplest is KVM_ENABLE_WBINVD() and be  
> > done  
> > > >>> with it.  
> > 
> > Even if we were to relax wbinvd access to any device (capable of
> > no-snoop or not) in any IOMMU configuration (blocking no-snoop or not),
> > I think as soon as we say "proof" is required to gain this access then
> > that proof should be ongoing for the life of the access.
> > 
> > That alone makes this more than a "I want this feature, here's my
> > proof", one-shot ioctl.  Like the groupfd enabling a path for KVM to
> > ask if that group is non-coherent and holding a group reference to
> > prevent the group from being used to authorize multiple KVM instances,
> > the ioasidfd proof would need to minimally validate that devices are
> > present and provide some reference to enforce that model as ongoing, or
> > notifier to indicate an end of that authorization.  I don't think we
> > can simplify that out of the equation or we've essentially invalidated
> > that the proof is really required.
> >   
> > > >>
> > > >> The simplest one is KVM_DEV_VFIO_GROUP_ADD/DEL, that already  
> > exists and also  
> > > >> covers hot-unplug.  The second simplest one is  
> > KVM_DEV_IOASID_ADD/DEL.  
> > > >
> > > > This isn't the same thing, this is back to trying to have the kernel
> > > > set policy for userspace.  
> > >
> > > If you want a userspace policy then there would be three states:
> > >
> > > * WBINVD enabled because a WBINVD-enabled VFIO device is attached.
> > >
> > > * WBINVD potentially enabled but no WBINVD-enabled VFIO device  
> > attached  
> > >
> > > * WBINVD forcefully disabled
> > >
> > > KVM_DEV_VFIO_GROUP_ADD/DEL can still be used to distinguish the first
> > > two.  Due to backwards compatibility, those two describe the default
> > > behavior; disabling wbinvd can be done easily with a new sub-ioctl of
> > > KVM_ENABLE_CAP and doesn't require any security proof.  
> > 
> > That seems like a good model, use the kvm-vfio device for the default
> > behavior and extend an existing KVM ioctl if QEMU still needs a way to
> > tell KVM to assume all DMA is coherent, regardless of what the kvm-vfio
> > device reports.
> > 
> > If feels like we should be able to support a backwards compatibility
> > mode using the vfio group, but I expect long term we'll want to
> > transition the kvm-vfio device from a groupfd to an ioasidfd.
> >   
> > > The meaning of WBINVD-enabled is "won't return -ENXIO for the wbinvd
> > > ioctl", nothing more nothing less.  If all VFIO devices are going to be
> > > WBINVD-enabled, then that will reflect on KVM as well, and I won't have
> > > anything to object if there's consensus on the device assignment side of
> > > things that the wbinvd ioctl won't ever fail.  
> > 
> > If we create the IOMMU vs device coherency matrix:
> > 
> >              \ Device supports
> > IOMMU blocks  \   no-snoop
> >   no-snoop     \  yes | no  |
> > ----------------+-----+-----+
> >             yes |  1  |  2  |
> > ----------------+-----+-----+
> >             no  |  3  |  4  |
> > ----------------+-----+-----+
> > 
> > DMA is always coherent in boxes {1,2,4} (wbinvd emulation is not
> > needed).  VFIO will currently always configure the IOMMU for {1,2} when
> > the feature is supported.  Boxes {3,4} are where we'll currently
> > emulate wbinvd.  The best we could do, not knowing the guest or
> > insights into the guest driver would be to only emulate wbinvd for {3}.
> > 
> > The majority of devices appear to report no-snoop support {1,3}, but
> > the claim is that it's mostly unused outside of GPUs, effectively {2,4}.
> > I'll speculate that most IOMMUs support enforcing coherency (amd, arm,
> > fsl unconditionally, intel conditionally) {1,2}.
> > 
> > I think that means we're currently operating primarily in Box {1},
> > which does not seem to lean towards unconditional wbinvd access with
> > device ownership.
> > 
> > I think we have a desire with IOASID to allow user policy to operate
> > certain devices in {3} and I think the argument there is that a
> > specific software enforced coherence sync is more efficient on the bus
> > than the constant coherence enforcement by the IOMMU.
> > 
> > I think that the disable mode Jason proposed is essentially just a way
> > to move a device from {3} to {4}, ie. the IOASID support or
> > configuration does not block no-snoop and the device claims to support
> > no-snoop, but doesn't use it.

Re: [PATCH] iommu/io-pgtable-arm: Optimize partial walk flush for large scatter-gather list

2021-06-09 Thread Robin Murphy

On 2021-06-09 15:53, Sai Prakash Ranjan wrote:

Currently for iommu_unmap() of large scatter-gather list with page size
elements, the majority of time is spent in flushing of partial walks in
__arm_lpae_unmap() which is a VA based TLB invalidation (TLBIVA for
arm-smmu).

For example: to unmap a 32MB scatter-gather list with page size elements
(8192 entries), there are 16->2MB buffer unmaps based on the pgsize (2MB
for 4K granule) and each of 2MB will further result in 512 TLBIVAs (2MB/4K)
resulting in a total of 8192 TLBIVAs (512*16) for 16->2MB causing a huge
overhead.

So instead use io_pgtable_tlb_flush_all() to invalidate the entire context
if size (pgsize) is greater than the granule size (4K, 16K, 64K). For this
example of 32MB scatter-gather list unmap, this results in just 16 ASID
based TLB invalidations or tlb_flush_all() callback (TLBIASID in case of
arm-smmu) as opposed to 8192 TLBIVAs thereby increasing the performance of
unmaps drastically.

Condition (size > granule size) is chosen for io_pgtable_tlb_flush_all()
because for any granule with supported pgsizes, we will have at least 512
TLB invalidations for which tlb_flush_all() is already recommended. For
example, take 4K granule with 2MB pgsize, this will result in 512 TLBIVA
in partial walk flush.

Test on QTI SM8150 SoC for 10 iterations of iommu_{map_sg}/unmap:
(average over 10 iterations)

Before this optimization:

 size    iommu_map_sg      iommu_unmap
   4K       2.067 us         1.854 us
  64K       9.598 us         8.802 us
   1M     148.890 us       130.718 us
   2M     305.864 us        67.291 us
  12M    1793.604 us       390.838 us
  16M    2386.848 us       518.187 us
  24M    3563.296 us       775.989 us
  32M    4747.171 us      1033.364 us

After this optimization:

 size    iommu_map_sg      iommu_unmap
   4K       1.723 us         1.765 us
  64K       9.880 us         8.869 us
   1M     155.364 us       135.223 us
   2M     303.906 us         5.385 us
  12M    1786.557 us        21.250 us
  16M    2391.890 us        27.437 us
  24M    3570.895 us        39.937 us
  32M    4755.234 us        51.797 us

This is further reduced once the map/unmap_pages() support gets in which
will result in just 1 tlb_flush_all() as opposed to 16 tlb_flush_all().

Signed-off-by: Sai Prakash Ranjan 
---
  drivers/iommu/io-pgtable-arm.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 87def58e79b5..c3cb9add3179 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -589,8 +589,11 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
  
  		if (!iopte_leaf(pte, lvl, iop->fmt)) {

/* Also flush any partial walks */
-   io_pgtable_tlb_flush_walk(iop, iova, size,
- ARM_LPAE_GRANULE(data));
+   if (size > ARM_LPAE_GRANULE(data))
+   io_pgtable_tlb_flush_all(iop);
+   else


Erm, when will the above condition ever not be true? ;)

Taking a step back, though, what about the impact to drivers other than 
SMMUv2? In particular I'm thinking of SMMUv3.2 where the whole range can 
be invalidated by VA in a single command anyway, so the additional 
penalties of TLBIALL are undesirable.


Robin.


+   io_pgtable_tlb_flush_walk(iop, iova, size,
+ ARM_LPAE_GRANULE(data));
ptep = iopte_deref(pte, data);
__arm_lpae_free_pgtable(data, lvl + 1, ptep);
} else if (iop->cfg.quirks & IO_PGTABLE_QUIRK_NON_STRICT) {




Re: Plan for /dev/ioasid RFC v2

2021-06-09 Thread Jason Gunthorpe
On Wed, Jun 09, 2021 at 10:27:22AM -0600, Alex Williamson wrote:

> > > It is a kernel decision, because a fundamental task of the kernel is to
> > > ensure isolation between user-space tasks as good as it can. And if a
> > > device assigned to one task can interfer with a device of another task
> > > (e.g. by sending P2P messages), then the promise of isolation is broken.  
> > 
> > AIUI, the IOASID model will still enforce IOMMU groups, but it's not an
> > explicit part of the interface like it is for vfio.  For example the
> > IOASID model allows attaching individual devices such that we have
> > granularity to create per device IOASIDs, but all devices within an
> > IOMMU group are required to be attached to an IOASID before they can be
> > used.  

Yes, thanks Alex

> > It's not entirely clear to me yet how that last bit gets
> > implemented though, ie. what barrier is in place to prevent device
> > usage prior to reaching this viable state.

The major security checkpoint for the group is on the VFIO side.  We
must require the group before userspace can be allowed access to any
device registers. Obtaining the device_fd from the group_fd does this
today as the group_fd is the security proof.

Actually, thinking about this some more.. If the only way to get a
working device_fd in the first place is to get it from the group_fd
and thus pass a group-based security check, why do we need to do
anything at the ioasid level?

The security concept of isolation was satisfied as soon as userspace
opened the group_fd. What do more checks in the kernel accomplish?

Yes, we have the issue where some groups require all devices to use
the same IOASID, but once someone has the group_fd that is no longer a
security issue. We can fail VFIO_DEVICE_ATTACH_IOASID calls that
don't make sense.

> > > > Groups should be primarily about isolation security, not about IOASID
> > > > matching.
> > > 
> > > That doesn't make any sense, what do you mean by 'IOASID matching'?  
> > 
> > One of the problems with the vfio interface use of groups is that we
> > conflate the IOMMU group for both isolation and granularity.  I think
> > what Jason is referring to here is that we still want groups to be the
> > basis of isolation, but we don't want a uAPI that presumes all devices
> > within the group must use the same IOASID.  

Yes, thanks again Alex

> > For example, if a user owns an IOMMU group consisting of
> > non-isolated functions of a multi-function device, they should be
> > able to create a vIOMMU VM where each of those functions has its
> > own address space.  That can't be done today, the entire group
> > would need to be attached to the VM under a PCIe-to-PCI bridge to
> > reflect the address space limitation imposed by the vfio group
> > uAPI model.  Thanks,
> 
> Hmm, likely discussed previously in these threads, but I can't come up
> with the argument that prevents us from making the BIND interface
> at the group level but the ATTACH interface at the device level?  For
> example:
> 
>  - VFIO_GROUP_BIND_IOASID_FD
>  - VFIO_DEVICE_ATTACH_IOASID
> 
> AFAICT that makes the group ownership more explicit but still allows
> the device level IOASID granularity.  Logically this is just an
> internal iommu_group_for_each_dev() in the BIND ioctl.  Thanks,

At a high level it sounds OK.

However I think your above question needs to be answered - what do we
want to enforce on the iommu_fd and why?

Also, this creates a problem with the device label idea: we still
need to associate each device_fd with a label, so your above sequence
is probably:

  VFIO_GROUP_BIND_IOASID_FD(group fd)
  VFIO_BIND_IOASID_FD(device fd 1, device_label)
  VFIO_BIND_IOASID_FD(device fd 2, device_label)
  VFIO_DEVICE_ATTACH_IOASID(..)

And then I think we are back to where I had started, we can trigger
whatever VFIO_GROUP_BIND_IOASID_FD does automatically as soon as all
of the devices in the group have been bound.

Jason


Re: [PATCH] x86/cpufeatures: Force disable X86_FEATURE_ENQCMD and remove update_pasid()

2021-06-09 Thread Andy Lutomirski
On 6/9/21 10:32 AM, Luck, Tony wrote:
>>> I've told Fenghua to dig out the previous iteration of this patch where
>>> the plan was to lazily fix the PASID_MSR in other threads in the #GP
>>> handler.
>>
>> Blech.  Also this won't work for other PASID-like features.
>>
>> I have a half-written patch to fix this up for real.  Stay tuned.
> 
> Andy: Any update on this?
> 
> -Tony
> 

Let me try to merge my pile with tglx's pile and come up with something
halfway sane.


Re: [RFC] /dev/ioasid uAPI proposal

2021-06-09 Thread Jason Wang


On 2021/6/8 9:20 PM, Jason Gunthorpe wrote:

On Tue, Jun 08, 2021 at 09:10:42AM +0800, Jason Wang wrote:


Well, this sounds like a re-invention of io_uring which has already worked
for multifds.

How so? io_uring is about sending work to the kernel, not getting
structured events back?



Actually it can. Userspace can poll multiple fds by preparing multiple
sqes with the IORING_OP_POLL_ADD opcode.
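
A minimal liburing sketch of that pattern (assumes liburing is
available; error handling trimmed):

	#include <liburing.h>
	#include <poll.h>
	#include <stdint.h>

	/* Wait until any one of nfds descriptors becomes readable. */
	static int wait_any_readable(const int *fds, unsigned int nfds)
	{
		struct io_uring ring;
		struct io_uring_cqe *cqe;
		int ready;

		io_uring_queue_init(nfds, &ring, 0);
		for (unsigned int i = 0; i < nfds; i++) {
			struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);

			/* queues an IORING_OP_POLL_ADD for this fd */
			io_uring_prep_poll_add(sqe, fds[i], POLLIN);
			io_uring_sqe_set_data(sqe, (void *)(uintptr_t)fds[i]);
		}
		io_uring_submit(&ring);
		io_uring_wait_cqe(&ring, &cqe);	/* first fd to become ready */
		ready = (int)(uintptr_t)io_uring_cqe_get_data(cqe);
		io_uring_cqe_seen(&ring, cqe);
		io_uring_queue_exit(&ring);
		return ready;
	}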





It is more like one of the perf rings



This means another ring, and we need to introduce ioctl()s to add or
remove ioasids from the poll. And it still needs a kind of fallback like
a list if the ring is full.


Thanks




Jason




[PATCH 00/23] [PULL REQUEST] Intel IOMMU Updates for Linux v5.14

2021-06-09 Thread Lu Baolu
Hi Joerg,

The patches queued in this series are for v5.14.
It includes:

 - Convert Intel IOMMU to use sva_lib helpers in iommu core
 - ftrace and debugfs supports for page fault handling
 - Support asynchronous nested capabilities
 - Various misc cleanups

Please pull.

Best regards,
Baolu

Aditya Srivastava (1):
  iommu/vt-d: Fix kernel-doc syntax in file header

Colin Ian King (1):
  iommu/vt-d: Remove redundant assignment to variable agaw

Gustavo A. R. Silva (1):
  iommu/vt-d: Fix out-bounds-warning in intel/svm.c

Lu Baolu (14):
  iommu/vt-d: Tweak the description of a DMA fault
  iommu/vt-d: Select PCI_ATS explicitly
  iommu/vt-d: Support asynchronous IOMMU nested capabilities
  iommu/vt-d: Add pasid private data helpers
  iommu/vt-d: Use iommu_sva_alloc(free)_pasid() helpers
  iommu/vt-d: Use common helper to lookup svm devices
  iommu/vt-d: Refactor prq_event_thread()
  iommu/vt-d: Allocate/register iopf queue for sva devices
  iommu/vt-d: Report prq to io-pgfault framework
  iommu/vt-d: Add prq_report trace event
  iommu/vt-d: Add common code for dmar latency performance monitors
  iommu/vt-d: Expose latency monitor data through debugfs
  iommu/vt-d: Add cache invalidation latency sampling
  iommu/vt-d: Add PRQ handling latency sampling

Parav Pandit (5):
  iommu/vt-d: Use bitfields for DMAR capabilities
  iommu/vt-d: Removed unused iommu_count in dmar domain
  iommu/vt-d: Remove unnecessary braces
  iommu/vt-d: Define counter explicitly as unsigned int
  iommu/vt-d: No need to typecast

YueHaibing (1):
  iommu/vt-d: Use DEVICE_ATTR_RO macro

 include/linux/intel-iommu.h        |  44 +-
 drivers/iommu/intel/perf.h         |  73 
 include/trace/events/intel_iommu.h |  37 ++
 drivers/iommu/intel/debugfs.c      | 111 +
 drivers/iommu/intel/dmar.c         |  54 ++-
 drivers/iommu/intel/iommu.c        | 163 +---
 drivers/iommu/intel/pasid.c        |   2 +-
 drivers/iommu/intel/perf.c         | 166 
 drivers/iommu/intel/svm.c          | 643 ++---
 drivers/iommu/intel/Kconfig        |   6 +
 drivers/iommu/intel/Makefile       |   1 +
 11 files changed, 894 insertions(+), 406 deletions(-)
 create mode 100644 drivers/iommu/intel/perf.h
 create mode 100644 drivers/iommu/intel/perf.c

-- 
2.25.1



[PATCH 01/23] iommu/vt-d: Remove redundant assignment to variable agaw

2021-06-09 Thread Lu Baolu
From: Colin Ian King 

The variable agaw is initialized with a value that is never read and it
is being updated later with a new value as a counter in a for-loop. The
initialization is redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King 
Signed-off-by: Lu Baolu 
Link: https://lore.kernel.org/r/20210416171826.64091-1-colin.k...@canonical.com
---
 drivers/iommu/intel/iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index be35284a2016..65c2ed760060 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -564,7 +564,7 @@ static inline int domain_pfn_supported(struct dmar_domain 
*domain,
 static int __iommu_calculate_agaw(struct intel_iommu *iommu, int max_gaw)
 {
unsigned long sagaw;
-   int agaw = -1;
+   int agaw;
 
sagaw = cap_sagaw(iommu->cap);
for (agaw = width_to_agaw(max_gaw);
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 02/23] iommu/vt-d: Fix kernel-doc syntax in file header

2021-06-09 Thread Lu Baolu
From: Aditya Srivastava 

The opening comment mark '/**' is used for highlighting the beginning of
kernel-doc comments.
The header for drivers/iommu/intel/pasid.c follows this syntax, but
the content inside does not comply with kernel-doc.

This line was probably not meant for kernel-doc parsing, but is parsed
due to the presence of kernel-doc like comment syntax (i.e., '/**'), which
causes unexpected warnings from kernel-doc:
warning: Function parameter or member 'fmt' not described in 'pr_fmt'

Provide a simple fix by replacing this occurrence with general comment
format, i.e. '/*', to prevent kernel-doc from parsing it.
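
[Editor's note: as a quick illustration, only the opening marker decides
whether kernel-doc looks at a comment at all:

	/**			<- parsed as kernel-doc; must follow its syntax
	 * pr_fmt - ...
	 */

	/*			<- ordinary comment; ignored by kernel-doc
	 * intel-pasid.c - PASID idr, table and entry manipulation
	 */
]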

Signed-off-by: Aditya Srivastava 
Acked-by: Randy Dunlap 
Signed-off-by: Lu Baolu 
Link: https://lore.kernel.org/r/20210523143245.19040-1-yashsri...@gmail.com
---
 drivers/iommu/intel/pasid.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 72dc84821dad..c6cf44a6c923 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -1,5 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0
-/**
+/*
  * intel-pasid.c - PASID idr, table and entry manipulation
  *
  * Copyright (C) 2018 Intel Corporation
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 03/23] iommu/vt-d: Tweak the description of a DMA fault

2021-06-09 Thread Lu Baolu
The Intel IOMMU driver reports the DMA fault reason as a decimal number
while the VT-d specification uses a hexadecimal one. It's inconvenient
that users need to convert it every time before consulting the spec.
Let's use a hexadecimal number for the DMA fault reason.

The fault message uses 0xffffffff as the PASID for DMA requests w/o PASID.
This is confusing. Tweak this by adding "NO_PASID" explicitly.
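
[Editor's note: an illustrative before/after pair, constructed from the
format strings in the diff below rather than captured from a real log:

	Before: DMAR: [DMA Read] Request device [00:0a.0] PASID ffffffff
	        fault addr ff7000 [fault reason 06] PTE Read access is not set
	After:  DMAR: [DMA Read NO_PASID] Request device [0x00:0x0a.0]
	        fault addr 0xff7000 [fault reason 0x06] PTE Read access is not set
]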

Reviewed-by: Ashok Raj 
Signed-off-by: Lu Baolu 
Link: https://lore.kernel.org/r/20210517065425.4953-1-baolu...@linux.intel.com
---
 drivers/iommu/intel/dmar.c | 23 +++
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 84057cb9596c..cae1078cbfec 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1913,16 +1913,23 @@ static int dmar_fault_do_one(struct intel_iommu *iommu, 
int type,
reason = dmar_get_fault_reason(fault_reason, &fault_type);
 
if (fault_type == INTR_REMAP)
-   pr_err("[INTR-REMAP] Request device [%02x:%02x.%d] fault index 
%llx [fault reason %02d] %s\n",
-   source_id >> 8, PCI_SLOT(source_id & 0xFF),
-   PCI_FUNC(source_id & 0xFF), addr >> 48,
-   fault_reason, reason);
-   else
-   pr_err("[%s] Request device [%02x:%02x.%d] PASID %x fault addr 
%llx [fault reason %02d] %s\n",
+   pr_err("[INTR-REMAP] Request device [0x%02x:0x%02x.%d] fault 
index 0x%llx [fault reason 0x%02x] %s\n",
+  source_id >> 8, PCI_SLOT(source_id & 0xFF),
+  PCI_FUNC(source_id & 0xFF), addr >> 48,
+  fault_reason, reason);
+   else if (pasid == INVALID_IOASID)
+   pr_err("[%s NO_PASID] Request device [0x%02x:0x%02x.%d] fault 
addr 0x%llx [fault reason 0x%02x] %s\n",
   type ? "DMA Read" : "DMA Write",
   source_id >> 8, PCI_SLOT(source_id & 0xFF),
-  PCI_FUNC(source_id & 0xFF), pasid, addr,
+  PCI_FUNC(source_id & 0xFF), addr,
   fault_reason, reason);
+   else
+   pr_err("[%s PASID 0x%x] Request device [0x%02x:0x%02x.%d] fault 
addr 0x%llx [fault reason 0x%02x] %s\n",
+  type ? "DMA Read" : "DMA Write", pasid,
+  source_id >> 8, PCI_SLOT(source_id & 0xFF),
+  PCI_FUNC(source_id & 0xFF), addr,
+  fault_reason, reason);
+
return 0;
 }
 
@@ -1989,7 +1996,7 @@ irqreturn_t dmar_fault(int irq, void *dev_id)
if (!ratelimited)
/* Using pasid -1 if pasid is not present */
dmar_fault_do_one(iommu, type, fault_reason,
- pasid_present ? pasid : -1,
+ pasid_present ? pasid : 
INVALID_IOASID,
  source_id, guest_addr);
 
fault_index++;
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 04/23] iommu/vt-d: Select PCI_ATS explicitly

2021-06-09 Thread Lu Baolu
The Intel VT-d implementation supports device TLB management. Select
PCI_ATS explicitly so that the pci_ats helpers are always available.

Signed-off-by: Lu Baolu 
Acked-by: Will Deacon 
Link: 
https://lore.kernel.org/r/20210512065313.3441309-1-baolu...@linux.intel.com
---
 drivers/iommu/intel/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/intel/Kconfig b/drivers/iommu/intel/Kconfig
index 28a3d1596c76..7e5b240b801d 100644
--- a/drivers/iommu/intel/Kconfig
+++ b/drivers/iommu/intel/Kconfig
@@ -14,6 +14,7 @@ config INTEL_IOMMU
select SWIOTLB
select IOASID
select IOMMU_DMA
+   select PCI_ATS
help
  DMA remapping (DMAR) devices support enables independent address
  translations for Direct Memory Access (DMA) from devices.
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 05/23] iommu/vt-d: Support asynchronous IOMMU nested capabilities

2021-06-09 Thread Lu Baolu
Current VT-d implementation supports nested translation only if all
underlying IOMMUs support the nested capability. This is unnecessary,
as the upper layer is allowed to create different containers and set
them up with different types of iommu backends. The IOMMU driver needs
to guarantee that devices attached to a nested-mode iommu_domain support
the nested capability.

Signed-off-by: Lu Baolu 
Link: https://lore.kernel.org/r/20210517065701.5078-1-baolu...@linux.intel.com
---
 drivers/iommu/intel/iommu.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 65c2ed760060..b0ba187cb7f8 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4757,6 +4757,13 @@ static int prepare_domain_attach_device(struct 
iommu_domain *domain,
if (!iommu)
return -ENODEV;
 
+   if ((dmar_domain->flags & DOMAIN_FLAG_NESTING_MODE) &&
+   !ecap_nest(iommu->ecap)) {
+   dev_err(dev, "%s: iommu not support nested translation\n",
+   iommu->name);
+   return -EINVAL;
+   }
+
/* check if this iommu agaw is sufficient for max mapped address */
addr_width = agaw_to_width(iommu->agaw);
if (addr_width > cap_mgaw(iommu->cap))
@@ -5457,7 +5464,7 @@ intel_iommu_enable_nesting(struct iommu_domain *domain)
int ret = -ENODEV;
 
spin_lock_irqsave(&device_domain_lock, flags);
-   if (nested_mode_support() && list_empty(&dmar_domain->devices)) {
+   if (list_empty(&dmar_domain->devices)) {
dmar_domain->flags |= DOMAIN_FLAG_NESTING_MODE;
dmar_domain->flags &= ~DOMAIN_FLAG_USE_FIRST_LEVEL;
ret = 0;
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 06/23] iommu/vt-d: Add pasid private data helpers

2021-06-09 Thread Lu Baolu
We are about to use the iommu_sva_alloc/free_pasid() helpers in the iommu
core. That means the pasid life cycle will be managed by the iommu core.
Use a local xarray to save the per-pasid private data instead of attaching
it to the real pasid.
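
[Editor's note: a minimal sketch of how the three helpers introduced below
pair up across a bind/unbind cycle, with error handling elided:

	svm = kzalloc(sizeof(*svm), GFP_KERNEL);
	ret = pasid_private_add(pasid, svm);	/* store at exactly 'pasid' */
	...
	svm = pasid_private_find(pasid);	/* e.g. when handling a prq */
	...
	pasid_private_remove(pasid);		/* on unbind */
	kfree(svm);

Because the xa_alloc() limit is XA_LIMIT(pasid, pasid), the add fails with
-EBUSY rather than silently picking another index if the pasid is already
present.]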

Signed-off-by: Lu Baolu 
Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu...@linux.intel.com
---
 drivers/iommu/intel/svm.c | 62 ++-
 1 file changed, 41 insertions(+), 21 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 5165cea90421..82b0627ad7e7 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -28,6 +29,23 @@ static void intel_svm_drain_prq(struct device *dev, u32 
pasid);
 
 #define PRQ_ORDER 0
 
+static DEFINE_XARRAY_ALLOC(pasid_private_array);
+static int pasid_private_add(ioasid_t pasid, void *priv)
+{
+   return xa_alloc(&pasid_private_array, &pasid, priv,
+   XA_LIMIT(pasid, pasid), GFP_ATOMIC);
+}
+
+static void pasid_private_remove(ioasid_t pasid)
+{
+   xa_erase(&pasid_private_array, pasid);
+}
+
+static void *pasid_private_find(ioasid_t pasid)
+{
+   return xa_load(&pasid_private_array, pasid);
+}
+
 int intel_svm_enable_prq(struct intel_iommu *iommu)
 {
struct page *pages;
@@ -224,7 +242,7 @@ static int pasid_to_svm_sdev(struct device *dev, unsigned 
int pasid,
if (pasid == INVALID_IOASID || pasid >= PASID_MAX)
return -EINVAL;
 
-   svm = ioasid_find(NULL, pasid, NULL);
+   svm = pasid_private_find(pasid);
if (IS_ERR(svm))
return PTR_ERR(svm);
 
@@ -334,7 +352,7 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, 
struct device *dev,
svm->gpasid = data->gpasid;
svm->flags |= SVM_FLAG_GUEST_PASID;
}
-   ioasid_set_data(data->hpasid, svm);
+   pasid_private_add(data->hpasid, svm);
INIT_LIST_HEAD_RCU(&svm->devs);
mmput(svm->mm);
}
@@ -388,7 +406,7 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, 
struct device *dev,
list_add_rcu(&sdev->list, &svm->devs);
  out:
if (!IS_ERR_OR_NULL(svm) && list_empty(&svm->devs)) {
-   ioasid_set_data(data->hpasid, NULL);
+   pasid_private_remove(data->hpasid);
kfree(svm);
}
 
@@ -431,7 +449,7 @@ int intel_svm_unbind_gpasid(struct device *dev, u32 pasid)
 * the unbind, IOMMU driver will get notified
 * and perform cleanup.
 */
-   ioasid_set_data(pasid, NULL);
+   pasid_private_remove(pasid);
kfree(svm);
}
}
@@ -547,8 +565,7 @@ intel_svm_bind_mm(struct device *dev, unsigned int flags,
svm = kzalloc(sizeof(*svm), GFP_KERNEL);
if (!svm) {
ret = -ENOMEM;
-   kfree(sdev);
-   goto out;
+   goto sdev_err;
}
 
if (pasid_max > intel_pasid_max_id)
@@ -556,13 +573,16 @@ intel_svm_bind_mm(struct device *dev, unsigned int flags,
 
/* Do not use PASID 0, reserved for RID to PASID */
svm->pasid = ioasid_alloc(NULL, PASID_MIN,
- pasid_max - 1, svm);
+ pasid_max - 1, NULL);
if (svm->pasid == INVALID_IOASID) {
-   kfree(svm);
-   kfree(sdev);
ret = -ENOSPC;
-   goto out;
+   goto svm_err;
}
+
+   ret = pasid_private_add(svm->pasid, svm);
+   if (ret)
+   goto pasid_err;
+
svm->notifier.ops = &intel_mmuops;
svm->mm = mm;
svm->flags = flags;
@@ -571,12 +591,8 @@ intel_svm_bind_mm(struct device *dev, unsigned int flags,
ret = -ENOMEM;
if (mm) {
ret = mmu_notifier_register(&svm->notifier, mm);
-   if (ret) {
-   ioasid_put(svm->pasid);
-   kfree(svm);
-   kfree(sdev);
-   goto out;
-   }
+   if (ret)
+   goto priv_err;
}
 
spin_lock_irqsave(&iommu->lock, iflags);
@@ -590,8 +606,13 @@ intel_svm_bind_mm(struct device *dev, unsigned int flags,
if (ret) {
if (mm)
mmu_notifier_unregister(&svm->notifier, mm);
+priv

[PATCH 07/23] iommu/vt-d: Use iommu_sva_alloc(free)_pasid() helpers

2021-06-09 Thread Lu Baolu
Align the pasid alloc/free code with the generic helpers defined in the
iommu core. This also refactors the SVA binding code to improve its
readability.
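
[Editor's note: the heart of the change is that PASID ownership moves to
the mm via the generic helper; roughly, as the new intel_svm_alloc_pasid()
in the diff does:

	/* One PASID per mm, shared by every subsequent bind of that mm. */
	ret = iommu_sva_alloc_pasid(mm, PASID_MIN, max_pasid - 1);
	if (ret)
		return ERR_PTR(ret);
	/* mm->pasid now holds the PASID until iommu_sva_free_pasid(mm). */
]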

Signed-off-by: Lu Baolu 
Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu...@linux.intel.com
---
 include/linux/intel-iommu.h |   1 -
 drivers/iommu/intel/iommu.c |   3 +
 drivers/iommu/intel/svm.c   | 278 +++-
 drivers/iommu/intel/Kconfig |   1 +
 4 files changed, 120 insertions(+), 163 deletions(-)

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 03faf20a6817..4e8bb186daa7 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -791,7 +791,6 @@ struct intel_svm {
u32 pasid;
int gpasid; /* In case that guest PASID is different from host PASID */
struct list_head devs;
-   struct list_head list;
 };
 #else
 static inline void intel_svm_check(struct intel_iommu *iommu) {}
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index b0ba187cb7f8..0ca7f8a2f38e 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5411,6 +5411,9 @@ intel_iommu_dev_enable_feat(struct device *dev, enum 
iommu_dev_features feat)
if (!info)
return -EINVAL;
 
+   if (intel_iommu_enable_pasid(info->iommu, dev))
+   return -ENODEV;
+
if (!info->pasid_enabled || !info->pri_enabled || 
!info->ats_enabled)
return -EINVAL;
 
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 82b0627ad7e7..da4310686ed3 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -23,9 +23,11 @@
 #include 
 
 #include "pasid.h"
+#include "../iommu-sva-lib.h"
 
 static irqreturn_t prq_event_thread(int irq, void *d);
 static void intel_svm_drain_prq(struct device *dev, u32 pasid);
+#define to_intel_svm_dev(handle) container_of(handle, struct intel_svm_dev, 
sva)
 
 #define PRQ_ORDER 0
 
@@ -222,7 +224,6 @@ static const struct mmu_notifier_ops intel_mmuops = {
 };
 
 static DEFINE_MUTEX(pasid_mutex);
-static LIST_HEAD(global_svm_list);
 
 #define for_each_svm_dev(sdev, svm, d) \
list_for_each_entry((sdev), &(svm)->devs, list) \
@@ -477,79 +478,80 @@ static void load_pasid(struct mm_struct *mm, u32 pasid)
mutex_unlock(&mm->context.lock);
 }
 
-/* Caller must hold pasid_mutex, mm reference */
-static int
-intel_svm_bind_mm(struct device *dev, unsigned int flags,
- struct mm_struct *mm, struct intel_svm_dev **sd)
+static int intel_svm_alloc_pasid(struct device *dev, struct mm_struct *mm,
+unsigned int flags)
 {
-   struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
-   struct intel_svm *svm = NULL, *t;
-   struct device_domain_info *info;
-   struct intel_svm_dev *sdev;
-   unsigned long iflags;
-   int pasid_max;
-   int ret;
+   ioasid_t max_pasid = dev_is_pci(dev) ?
+   pci_max_pasids(to_pci_dev(dev)) : intel_pasid_max_id;
 
-   if (!iommu || dmar_disabled)
-   return -EINVAL;
+   return iommu_sva_alloc_pasid(mm, PASID_MIN, max_pasid - 1);
+}
 
-   if (!intel_svm_capable(iommu))
-   return -ENOTSUPP;
+static void intel_svm_free_pasid(struct mm_struct *mm)
+{
+   iommu_sva_free_pasid(mm);
+}
 
-   if (dev_is_pci(dev)) {
-   pasid_max = pci_max_pasids(to_pci_dev(dev));
-   if (pasid_max < 0)
-   return -EINVAL;
-   } else
-   pasid_max = 1 << 20;
+static struct iommu_sva *intel_svm_bind_mm(struct intel_iommu *iommu,
+  struct device *dev,
+  struct mm_struct *mm,
+  unsigned int flags)
+{
+   struct device_domain_info *info = get_domain_info(dev);
+   unsigned long iflags, sflags;
+   struct intel_svm_dev *sdev;
+   struct intel_svm *svm;
+   int ret = 0;
 
-   /* Bind supervisor PASID shuld have mm = NULL */
-   if (flags & SVM_FLAG_SUPERVISOR_MODE) {
-   if (!ecap_srs(iommu->ecap) || mm) {
-   pr_err("Supervisor PASID with user provided mm.\n");
-   return -EINVAL;
-   }
-   }
+   svm = pasid_private_find(mm->pasid);
+   if (!svm) {
+   svm = kzalloc(sizeof(*svm), GFP_KERNEL);
+   if (!svm)
+   return ERR_PTR(-ENOMEM);
 
-   list_for_each_entry(t, &global_svm_list, list) {
-   if (t->mm != mm)
-   continue;
+   svm->pasid = mm->pasid;
+   svm->mm = mm;
+   svm->flags = flags;
+   INIT_LIST_HEAD_RCU(&svm->devs);
 
-   svm = t;
-   if (svm->pasid >= pasid_max) {
-   dev_wa

[PATCH 08/23] iommu/vt-d: Use common helper to lookup svm devices

2021-06-09 Thread Lu Baolu
It's common to iterate the svm device list to find a matching device. Add
common helpers to do this and consolidate the code.

Signed-off-by: Lu Baolu 
Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu...@linux.intel.com
---
 drivers/iommu/intel/svm.c | 68 +++
 1 file changed, 40 insertions(+), 28 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index da4310686ed3..57867ff97bc2 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -48,6 +48,40 @@ static void *pasid_private_find(ioasid_t pasid)
return xa_load(&pasid_private_array, pasid);
 }
 
+static struct intel_svm_dev *
+svm_lookup_device_by_sid(struct intel_svm *svm, u16 sid)
+{
+   struct intel_svm_dev *sdev = NULL, *t;
+
+   rcu_read_lock();
+   list_for_each_entry_rcu(t, &svm->devs, list) {
+   if (t->sid == sid) {
+   sdev = t;
+   break;
+   }
+   }
+   rcu_read_unlock();
+
+   return sdev;
+}
+
+static struct intel_svm_dev *
+svm_lookup_device_by_dev(struct intel_svm *svm, struct device *dev)
+{
+   struct intel_svm_dev *sdev = NULL, *t;
+
+   rcu_read_lock();
+   list_for_each_entry_rcu(t, &svm->devs, list) {
+   if (t->dev == dev) {
+   sdev = t;
+   break;
+   }
+   }
+   rcu_read_unlock();
+
+   return sdev;
+}
+
 int intel_svm_enable_prq(struct intel_iommu *iommu)
 {
struct page *pages;
@@ -225,15 +259,11 @@ static const struct mmu_notifier_ops intel_mmuops = {
 
 static DEFINE_MUTEX(pasid_mutex);
 
-#define for_each_svm_dev(sdev, svm, d) \
-   list_for_each_entry((sdev), &(svm)->devs, list) \
-   if ((d) != (sdev)->dev) {} else
-
 static int pasid_to_svm_sdev(struct device *dev, unsigned int pasid,
 struct intel_svm **rsvm,
 struct intel_svm_dev **rsdev)
 {
-   struct intel_svm_dev *d, *sdev = NULL;
+   struct intel_svm_dev *sdev = NULL;
struct intel_svm *svm;
 
/* The caller should hold the pasid_mutex lock */
@@ -256,15 +286,7 @@ static int pasid_to_svm_sdev(struct device *dev, unsigned 
int pasid,
 */
if (WARN_ON(list_empty(&svm->devs)))
return -EINVAL;
-
-   rcu_read_lock();
-   list_for_each_entry_rcu(d, &svm->devs, list) {
-   if (d->dev == dev) {
-   sdev = d;
-   break;
-   }
-   }
-   rcu_read_unlock();
+   sdev = svm_lookup_device_by_dev(svm, dev);
 
 out:
*rsvm = svm;
@@ -533,7 +555,8 @@ static struct iommu_sva *intel_svm_bind_mm(struct 
intel_iommu *iommu,
}
 
/* Find the matching device in svm list */
-   for_each_svm_dev(sdev, svm, dev) {
+   sdev = svm_lookup_device_by_dev(svm, dev);
+   if (sdev) {
sdev->users++;
goto success;
}
@@ -901,19 +924,8 @@ static irqreturn_t prq_event_thread(int irq, void *d)
}
}
 
-   if (!sdev || sdev->sid != req->rid) {
-   struct intel_svm_dev *t;
-
-   sdev = NULL;
-   rcu_read_lock();
-   list_for_each_entry_rcu(t, &svm->devs, list) {
-   if (t->sid == req->rid) {
-   sdev = t;
-   break;
-   }
-   }
-   rcu_read_unlock();
-   }
+   if (!sdev || sdev->sid != req->rid)
+   sdev = svm_lookup_device_by_sid(svm, req->rid);
 
/* Since we're using init_mm.pgd directly, we should never take
 * any faults on kernel addresses. */
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 09/23] iommu/vt-d: Refactor prq_event_thread()

2021-06-09 Thread Lu Baolu
Refactor prq_event_thread() by moving the handling of a single prq event
out of the main loop.

Signed-off-by: Lu Baolu 
Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu...@linux.intel.com
---
 drivers/iommu/intel/svm.c | 239 ++
 1 file changed, 136 insertions(+), 103 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 57867ff97bc2..d51ddece4259 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -866,141 +866,174 @@ intel_svm_prq_report(struct device *dev, struct 
page_req_dsc *desc)
return iommu_report_device_fault(dev, &event);
 }
 
+static void handle_bad_prq_event(struct intel_iommu *iommu,
+struct page_req_dsc *req, int result)
+{
+   struct qi_desc desc;
+
+   pr_err("%s: Invalid page request: %08llx %08llx\n",
+  iommu->name, ((unsigned long long *)req)[0],
+  ((unsigned long long *)req)[1]);
+
+   /*
+* Per VT-d spec. v3.0 ch7.7, system software must
+* respond with page group response if private data
+* is present (PDP) or last page in group (LPIG) bit
+* is set. This is an additional VT-d feature beyond
+* PCI ATS spec.
+*/
+   if (!req->lpig && !req->priv_data_present)
+   return;
+
+   desc.qw0 = QI_PGRP_PASID(req->pasid) |
+   QI_PGRP_DID(req->rid) |
+   QI_PGRP_PASID_P(req->pasid_present) |
+   QI_PGRP_PDP(req->priv_data_present) |
+   QI_PGRP_RESP_CODE(result) |
+   QI_PGRP_RESP_TYPE;
+   desc.qw1 = QI_PGRP_IDX(req->prg_index) |
+   QI_PGRP_LPIG(req->lpig);
+   desc.qw2 = 0;
+   desc.qw3 = 0;
+
+   if (req->priv_data_present)
+   memcpy(&desc.qw2, req->priv_data, sizeof(req->priv_data));
+   qi_submit_sync(iommu, &desc, 1, 0);
+}
+
+static void handle_single_prq_event(struct intel_iommu *iommu,
+   struct mm_struct *mm,
+   struct page_req_dsc *req)
+{
+   u64 address = (u64)req->addr << VTD_PAGE_SHIFT;
+   int result = QI_RESP_INVALID;
+   struct vm_area_struct *vma;
+   struct qi_desc desc;
+   unsigned int flags;
+   vm_fault_t ret;
+
+   /* If the mm is already defunct, don't handle faults. */
+   if (!mmget_not_zero(mm))
+   goto response;
+
+   mmap_read_lock(mm);
+   vma = find_extend_vma(mm, address);
+   if (!vma || address < vma->vm_start)
+   goto invalid;
+
+   if (access_error(vma, req))
+   goto invalid;
+
+   flags = FAULT_FLAG_USER | FAULT_FLAG_REMOTE;
+   if (req->wr_req)
+   flags |= FAULT_FLAG_WRITE;
+
+   ret = handle_mm_fault(vma, address, flags, NULL);
+   if (!(ret & VM_FAULT_ERROR))
+   result = QI_RESP_SUCCESS;
+invalid:
+   mmap_read_unlock(mm);
+   mmput(mm);
+
+response:
+   if (!(req->lpig || req->priv_data_present))
+   return;
+
+   desc.qw0 = QI_PGRP_PASID(req->pasid) |
+   QI_PGRP_DID(req->rid) |
+   QI_PGRP_PASID_P(req->pasid_present) |
+   QI_PGRP_PDP(req->priv_data_present) |
+   QI_PGRP_RESP_CODE(result) |
+   QI_PGRP_RESP_TYPE;
+   desc.qw1 = QI_PGRP_IDX(req->prg_index) |
+   QI_PGRP_LPIG(req->lpig);
+   desc.qw2 = 0;
+   desc.qw3 = 0;
+
+   if (req->priv_data_present)
+   memcpy(&desc.qw2, req->priv_data, sizeof(req->priv_data));
+
+   qi_submit_sync(iommu, &desc, 1, 0);
+}
+
 static irqreturn_t prq_event_thread(int irq, void *d)
 {
struct intel_svm_dev *sdev = NULL;
struct intel_iommu *iommu = d;
struct intel_svm *svm = NULL;
-   int head, tail, handled = 0;
-   unsigned int flags = 0;
+   struct page_req_dsc *req;
+   int head, tail, handled;
+   u64 address;
 
-   /* Clear PPR bit before reading head/tail registers, to
-* ensure that we get a new interrupt if needed. */
+   /*
+* Clear PPR bit before reading head/tail registers, to ensure that
+* we get a new interrupt if needed.
+*/
writel(DMA_PRS_PPR, iommu->reg + DMAR_PRS_REG);
 
tail = dmar_readq(iommu->reg + DMAR_PQT_REG) & PRQ_RING_MASK;
head = dmar_readq(iommu->reg + DMAR_PQH_REG) & PRQ_RING_MASK;
+   handled = (head != tail);
while (head != tail) {
-   struct vm_area_struct *vma;
-   struct page_req_dsc *req;
-   struct qi_desc resp;
-   int result;
-   vm_fault_t ret;
-   u64 address;
-
-   handled = 1;
req = &iommu->prq[head / sizeof(*req)];
-   result = QI_RESP_INVALID;
address = (u64)req->addr << V

[PATCH 10/23] iommu/vt-d: Allocate/register iopf queue for sva devices

2021-06-09 Thread Lu Baolu
This allocates and registers the iopf queue infrastructure for devices
that want to support I/O page faults for SVA.
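
[Editor's note: in outline, the plumbing added below is the following, one
queue per DMAR unit; a sketch, with the allocation call taken from the
generic io-pgfault API:

	/* While enabling the PRQ, allocate one queue per IOMMU: */
	iommu->iopf_queue = iopf_queue_alloc(iommu->iopfq_name);

	/* When SVA is enabled on a device: */
	ret = iopf_queue_add_device(iommu->iopf_queue, dev);

	/* And on the corresponding disable path: */
	ret = iopf_queue_remove_device(iommu->iopf_queue, dev);
]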

Signed-off-by: Lu Baolu 
Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu...@linux.intel.com
---
 include/linux/intel-iommu.h |  2 ++
 drivers/iommu/intel/iommu.c | 66 ++---
 drivers/iommu/intel/svm.c   | 37 +
 3 files changed, 79 insertions(+), 26 deletions(-)

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 4e8bb186daa7..222520d149c1 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -606,6 +606,8 @@ struct intel_iommu {
struct completion prq_complete;
struct ioasid_allocator_ops pasid_allocator; /* Custom allocator for 
PASIDs */
 #endif
+   struct iopf_queue *iopf_queue;
+   unsigned char iopfq_name[16];
struct q_inval  *qi;/* Queued invalidation info */
u32 *iommu_state; /* Store iommu states between suspend and resume.*/
 
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 0ca7f8a2f38e..e78773d46d7d 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -46,6 +46,7 @@
 #include 
 
 #include "../irq_remapping.h"
+#include "../iommu-sva-lib.h"
 #include "pasid.h"
 #include "cap_audit.h"
 
@@ -5338,6 +5339,34 @@ static int intel_iommu_disable_auxd(struct device *dev)
return 0;
 }
 
+static int intel_iommu_enable_sva(struct device *dev)
+{
+   struct device_domain_info *info = get_domain_info(dev);
+   struct intel_iommu *iommu = info->iommu;
+
+   if (!info || !iommu || dmar_disabled)
+   return -EINVAL;
+
+   if (!(iommu->flags & VTD_FLAG_SVM_CAPABLE))
+   return -ENODEV;
+
+   if (intel_iommu_enable_pasid(iommu, dev))
+   return -ENODEV;
+
+   if (!info->pasid_enabled || !info->pri_enabled || !info->ats_enabled)
+   return -EINVAL;
+
+   return iopf_queue_add_device(iommu->iopf_queue, dev);
+}
+
+static int intel_iommu_disable_sva(struct device *dev)
+{
+   struct device_domain_info *info = get_domain_info(dev);
+   struct intel_iommu *iommu = info->iommu;
+
+   return iopf_queue_remove_device(iommu->iopf_queue, dev);
+}
+
 /*
  * A PCI express designated vendor specific extended capability is defined
  * in the section 3.7 of Intel scalable I/O virtualization technical spec
@@ -5399,38 +5428,37 @@ intel_iommu_dev_has_feat(struct device *dev, enum 
iommu_dev_features feat)
 static int
 intel_iommu_dev_enable_feat(struct device *dev, enum iommu_dev_features feat)
 {
-   if (feat == IOMMU_DEV_FEAT_AUX)
+   switch (feat) {
+   case IOMMU_DEV_FEAT_AUX:
return intel_iommu_enable_auxd(dev);
 
-   if (feat == IOMMU_DEV_FEAT_IOPF)
+   case IOMMU_DEV_FEAT_IOPF:
return intel_iommu_dev_has_feat(dev, feat) ? 0 : -ENODEV;
 
-   if (feat == IOMMU_DEV_FEAT_SVA) {
-   struct device_domain_info *info = get_domain_info(dev);
-
-   if (!info)
-   return -EINVAL;
-
-   if (intel_iommu_enable_pasid(info->iommu, dev))
-   return -ENODEV;
+   case IOMMU_DEV_FEAT_SVA:
+   return intel_iommu_enable_sva(dev);
 
-   if (!info->pasid_enabled || !info->pri_enabled || 
!info->ats_enabled)
-   return -EINVAL;
-
-   if (info->iommu->flags & VTD_FLAG_SVM_CAPABLE)
-   return 0;
+   default:
+   return -ENODEV;
}
-
-   return -ENODEV;
 }
 
 static int
 intel_iommu_dev_disable_feat(struct device *dev, enum iommu_dev_features feat)
 {
-   if (feat == IOMMU_DEV_FEAT_AUX)
+   switch (feat) {
+   case IOMMU_DEV_FEAT_AUX:
return intel_iommu_disable_auxd(dev);
 
-   return -ENODEV;
+   case IOMMU_DEV_FEAT_IOPF:
+   return 0;
+
+   case IOMMU_DEV_FEAT_SVA:
+   return intel_iommu_disable_sva(dev);
+
+   default:
+   return -ENODEV;
+   }
 }
 
 static bool
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index d51ddece4259..4dc3ab36e9ae 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -84,6 +84,7 @@ svm_lookup_device_by_dev(struct intel_svm *svm, struct device 
*dev)
 
 int intel_svm_enable_prq(struct intel_iommu *iommu)
 {
+   struct iopf_queue *iopfq;
struct page *pages;
int irq, ret;
 
@@ -100,13 +101,20 @@ int intel_svm_enable_prq(struct intel_iommu *iommu)
pr_err("IOMMU: %s: Failed to create IRQ vector for page request 
queue\n",
   iommu->name);
ret = -EINVAL;
-   err:
-   free_pages((unsigned long)iommu->prq, PRQ_ORDER);
-   iommu->prq = NULL;
-   return ret;
+   goto free_prq;
}
iommu->pr_irq = irq;
 
+   snpr

[PATCH 11/23] iommu/vt-d: Report prq to io-pgfault framework

2021-06-09 Thread Lu Baolu
Let IO page fault requests be handled through the io-pgfault
framework.
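
[Editor's note: for context, the resulting path is roughly:
prq_event_thread() reports each descriptor via intel_svm_prq_report() ->
iommu_report_device_fault(); the iommu_queue_iopf() handler registered
below queues the fault to a workqueue, where the io-pgfault code calls
handle_mm_fault(); the response then travels back through
iommu_page_response() into intel_svm_page_response().]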

Signed-off-by: Lu Baolu 
Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu...@linux.intel.com
---
 drivers/iommu/intel/iommu.c | 14 ++-
 drivers/iommu/intel/svm.c   | 84 +++--
 2 files changed, 17 insertions(+), 81 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index e78773d46d7d..c45d4946f92d 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5343,6 +5343,7 @@ static int intel_iommu_enable_sva(struct device *dev)
 {
struct device_domain_info *info = get_domain_info(dev);
struct intel_iommu *iommu = info->iommu;
+   int ret;
 
if (!info || !iommu || dmar_disabled)
return -EINVAL;
@@ -5356,15 +5357,24 @@ static int intel_iommu_enable_sva(struct device *dev)
if (!info->pasid_enabled || !info->pri_enabled || !info->ats_enabled)
return -EINVAL;
 
-   return iopf_queue_add_device(iommu->iopf_queue, dev);
+   ret = iopf_queue_add_device(iommu->iopf_queue, dev);
+   if (!ret)
+   ret = iommu_register_device_fault_handler(dev, 
iommu_queue_iopf, dev);
+
+   return ret;
 }
 
 static int intel_iommu_disable_sva(struct device *dev)
 {
struct device_domain_info *info = get_domain_info(dev);
struct intel_iommu *iommu = info->iommu;
+   int ret;
+
+   ret = iommu_unregister_device_fault_handler(dev);
+   if (!ret)
+   ret = iopf_queue_remove_device(iommu->iopf_queue, dev);
 
-   return iopf_queue_remove_device(iommu->iopf_queue, dev);
+   return ret;
 }
 
 /*
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 4dc3ab36e9ae..ade157b64ce7 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -724,22 +724,6 @@ struct page_req_dsc {
 
 #define PRQ_RING_MASK  ((0x1000 << PRQ_ORDER) - 0x20)
 
-static bool access_error(struct vm_area_struct *vma, struct page_req_dsc *req)
-{
-   unsigned long requested = 0;
-
-   if (req->exe_req)
-   requested |= VM_EXEC;
-
-   if (req->rd_req)
-   requested |= VM_READ;
-
-   if (req->wr_req)
-   requested |= VM_WRITE;
-
-   return (requested & ~vma->vm_flags) != 0;
-}
-
 static bool is_canonical_address(u64 addr)
 {
int shift = 64 - (__VIRTUAL_MASK_SHIFT + 1);
@@ -809,6 +793,8 @@ static void intel_svm_drain_prq(struct device *dev, u32 
pasid)
goto prq_retry;
}
 
+   iopf_queue_flush_dev(dev);
+
/*
 * Perform steps described in VT-d spec CH7.10 to drain page
 * requests and responses in hardware.
@@ -924,61 +910,6 @@ static void handle_bad_prq_event(struct intel_iommu *iommu,
qi_submit_sync(iommu, &desc, 1, 0);
 }
 
-static void handle_single_prq_event(struct intel_iommu *iommu,
-   struct mm_struct *mm,
-   struct page_req_dsc *req)
-{
-   u64 address = (u64)req->addr << VTD_PAGE_SHIFT;
-   int result = QI_RESP_INVALID;
-   struct vm_area_struct *vma;
-   struct qi_desc desc;
-   unsigned int flags;
-   vm_fault_t ret;
-
-   /* If the mm is already defunct, don't handle faults. */
-   if (!mmget_not_zero(mm))
-   goto response;
-
-   mmap_read_lock(mm);
-   vma = find_extend_vma(mm, address);
-   if (!vma || address < vma->vm_start)
-   goto invalid;
-
-   if (access_error(vma, req))
-   goto invalid;
-
-   flags = FAULT_FLAG_USER | FAULT_FLAG_REMOTE;
-   if (req->wr_req)
-   flags |= FAULT_FLAG_WRITE;
-
-   ret = handle_mm_fault(vma, address, flags, NULL);
-   if (!(ret & VM_FAULT_ERROR))
-   result = QI_RESP_SUCCESS;
-invalid:
-   mmap_read_unlock(mm);
-   mmput(mm);
-
-response:
-   if (!(req->lpig || req->priv_data_present))
-   return;
-
-   desc.qw0 = QI_PGRP_PASID(req->pasid) |
-   QI_PGRP_DID(req->rid) |
-   QI_PGRP_PASID_P(req->pasid_present) |
-   QI_PGRP_PDP(req->priv_data_present) |
-   QI_PGRP_RESP_CODE(result) |
-   QI_PGRP_RESP_TYPE;
-   desc.qw1 = QI_PGRP_IDX(req->prg_index) |
-   QI_PGRP_LPIG(req->lpig);
-   desc.qw2 = 0;
-   desc.qw3 = 0;
-
-   if (req->priv_data_present)
-   memcpy(&desc.qw2, req->priv_data, sizeof(req->priv_data));
-
-   qi_submit_sync(iommu, &desc, 1, 0);
-}
-
 static irqreturn_t prq_event_thread(int irq, void *d)
 {
struct intel_svm_dev *sdev = NULL;
@@ -1049,14 +980,8 @@ static irqreturn_t prq_event_thread(int irq, void *d)
 * If prq is to be handled outside iommu driver via receiver of
 * the fault notifiers, we skip the page response here.
 */

[PATCH 12/23] iommu/vt-d: Add prq_report trace event

2021-06-09 Thread Lu Baolu
This adds a new trace event to track the page fault request report.
This event will provide almost all information defined in a page
request descriptor.

A sample output:
| prq_report: dmar0/0000:00:0a.0 seq# 1: rid=0x50 addr=0x559ef6f97 r---- pasid=0x2 index=0x1
| prq_report: dmar0/0000:00:0a.0 seq# 2: rid=0x50 addr=0x559ef6f9c rw--l pasid=0x2 index=0x1
| prq_report: dmar0/0000:00:0a.0 seq# 3: rid=0x50 addr=0x559ef6f98 r---- pasid=0x2 index=0x1
| prq_report: dmar0/0000:00:0a.0 seq# 4: rid=0x50 addr=0x559ef6f9d rw--l pasid=0x2 index=0x1
| prq_report: dmar0/0000:00:0a.0 seq# 5: rid=0x50 addr=0x559ef6f99 r---- pasid=0x2 index=0x1
| prq_report: dmar0/0000:00:0a.0 seq# 6: rid=0x50 addr=0x559ef6f9e rw--l pasid=0x2 index=0x1
| prq_report: dmar0/0000:00:0a.0 seq# 7: rid=0x50 addr=0x559ef6f9a r---- pasid=0x2 index=0x1
| prq_report: dmar0/0000:00:0a.0 seq# 8: rid=0x50 addr=0x559ef6f9f rw--l pasid=0x2 index=0x1

This will be helpful for I/O page fault related debugging.
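
[Editor's note: assuming the default tracefs mount, the event can be
enabled at run time with, e.g.:

	echo 1 > /sys/kernel/debug/tracing/events/intel_iommu/prq_report/enable
	cat /sys/kernel/debug/tracing/trace_pipe
]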

Signed-off-by: Lu Baolu 
Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu...@linux.intel.com
---
 include/linux/intel-iommu.h| 29 +++
 include/trace/events/intel_iommu.h | 37 ++
 drivers/iommu/intel/svm.c  |  7 ++
 3 files changed, 73 insertions(+)

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 222520d149c1..98b04fa9373e 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -778,6 +778,7 @@ struct intel_svm_dev {
struct device *dev;
struct intel_iommu *iommu;
struct iommu_sva sva;
+   unsigned long prq_seq_number;
u32 pasid;
int users;
u16 did;
@@ -828,4 +829,32 @@ static inline int iommu_calculate_max_sagaw(struct 
intel_iommu *iommu)
 #define intel_iommu_enabled (0)
 #endif
 
+static inline const char *decode_prq_descriptor(char *str, size_t size,
+   u64 dw0, u64 dw1, u64 dw2, u64 dw3)
+{
+   char *buf = str;
+   int bytes;
+
+   bytes = snprintf(buf, size,
+"rid=0x%llx addr=0x%llx %c%c%c%c%c pasid=0x%llx 
index=0x%llx",
+FIELD_GET(GENMASK_ULL(31, 16), dw0),
+FIELD_GET(GENMASK_ULL(63, 12), dw1),
+dw1 & BIT_ULL(0) ? 'r' : '-',
+dw1 & BIT_ULL(1) ? 'w' : '-',
+dw0 & BIT_ULL(52) ? 'x' : '-',
+dw0 & BIT_ULL(53) ? 'p' : '-',
+dw1 & BIT_ULL(2) ? 'l' : '-',
+FIELD_GET(GENMASK_ULL(51, 32), dw0),
+FIELD_GET(GENMASK_ULL(11, 3), dw1));
+
+   /* Private Data */
+   if (dw0 & BIT_ULL(9)) {
+   size -= bytes;
+   buf += bytes;
+   snprintf(buf, size, " private=0x%llx/0x%llx\n", dw2, dw3);
+   }
+
+   return str;
+}
+
 #endif
diff --git a/include/trace/events/intel_iommu.h 
b/include/trace/events/intel_iommu.h
index d233f2916584..e5c1ca6d16ee 100644
--- a/include/trace/events/intel_iommu.h
+++ b/include/trace/events/intel_iommu.h
@@ -15,6 +15,8 @@
 #include 
 #include 
 
+#define MSG_MAX		256
+
 TRACE_EVENT(qi_submit,
TP_PROTO(struct intel_iommu *iommu, u64 qw0, u64 qw1, u64 qw2, u64 qw3),
 
@@ -51,6 +53,41 @@ TRACE_EVENT(qi_submit,
__entry->qw0, __entry->qw1, __entry->qw2, __entry->qw3
)
 );
+
+TRACE_EVENT(prq_report,
+   TP_PROTO(struct intel_iommu *iommu, struct device *dev,
+u64 dw0, u64 dw1, u64 dw2, u64 dw3,
+unsigned long seq),
+
+   TP_ARGS(iommu, dev, dw0, dw1, dw2, dw3, seq),
+
+   TP_STRUCT__entry(
+   __field(u64, dw0)
+   __field(u64, dw1)
+   __field(u64, dw2)
+   __field(u64, dw3)
+   __field(unsigned long, seq)
+   __string(iommu, iommu->name)
+   __string(dev, dev_name(dev))
+   __dynamic_array(char, buff, MSG_MAX)
+   ),
+
+   TP_fast_assign(
+   __entry->dw0 = dw0;
+   __entry->dw1 = dw1;
+   __entry->dw2 = dw2;
+   __entry->dw3 = dw3;
+   __entry->seq = seq;
+   __assign_str(iommu, iommu->name);
+   __assign_str(dev, dev_name(dev));
+   ),
+
+   TP_printk("%s/%s seq# %ld: %s",
+   __get_str(iommu), __get_str(dev), __entry->seq,
+   decode_prq_descriptor(__get_str(buff), MSG_MAX, __entry->dw0,
+ __entry->dw1, __entry->dw2, __entry->dw3)
+   )
+);
 #endif /* _TRACE_INTEL_IOMMU_H */
 
 /* This part must be outside protection */
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index ade157b64ce7..d3d028c6a727 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "pasid.h"
 #include "../iommu-sv

[PATCH 13/23] iommu/vt-d: Add common code for dmar latency performance monitors

2021-06-09 Thread Lu Baolu
The execution time of some operations is very performance critical, such
as cache invalidation and PRQ processing. This adds some common code to
monitor the execution time range of those operations. The interfaces
include enabling/disabling the monitor, checking its status, updating the
sampling data, and providing a common string format for users.
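
[Editor's note: the intended call pattern around a monitored operation is
roughly the following, mirroring how a later patch in this series
instruments qi_submit_sync():

	s64 start = 0;

	if (dmar_latency_enabled(iommu, DMAR_LATENCY_INV_IOTLB))
		start = ktime_to_ns(ktime_get());

	/* ... the monitored operation ... */

	if (start)
		dmar_latency_update(iommu, DMAR_LATENCY_INV_IOTLB,
				    ktime_to_ns(ktime_get()) - start);
]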

Signed-off-by: Fenghua Yu 
Signed-off-by: Lu Baolu 
Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu...@linux.intel.com
---
 include/linux/intel-iommu.h  |   1 +
 drivers/iommu/intel/perf.h   |  73 +++
 drivers/iommu/intel/perf.c   | 166 +++
 drivers/iommu/intel/Kconfig  |   3 +
 drivers/iommu/intel/Makefile |   1 +
 5 files changed, 244 insertions(+)
 create mode 100644 drivers/iommu/intel/perf.h
 create mode 100644 drivers/iommu/intel/perf.c

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 98b04fa9373e..f5cf31dd7280 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -621,6 +621,7 @@ struct intel_iommu {
u32 flags;  /* Software defined flags */
 
struct dmar_drhd_unit *drhd;
+   void *perf_statistic;
 };
 
 /* Per subdevice private data */
diff --git a/drivers/iommu/intel/perf.h b/drivers/iommu/intel/perf.h
new file mode 100644
index ..fd6db8049d1a
--- /dev/null
+++ b/drivers/iommu/intel/perf.h
@@ -0,0 +1,73 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * perf.h - performance monitor header
+ *
+ * Copyright (C) 2021 Intel Corporation
+ *
+ * Author: Lu Baolu 
+ */
+
+enum latency_type {
+   DMAR_LATENCY_INV_IOTLB = 0,
+   DMAR_LATENCY_INV_DEVTLB,
+   DMAR_LATENCY_INV_IEC,
+   DMAR_LATENCY_PRQ,
+   DMAR_LATENCY_NUM
+};
+
+enum latency_count {
+   COUNTS_10e2 = 0,/* < 0.1us  */
+   COUNTS_10e3,/* 0.1us ~ 1us  */
+   COUNTS_10e4,/* 1us ~ 10us   */
+   COUNTS_10e5,/* 10us ~ 100us */
+   COUNTS_10e6,/* 100us ~ 1ms  */
+   COUNTS_10e7,/* 1ms ~ 10ms   */
+   COUNTS_10e8_plus,   /* 10ms and plus*/
+   COUNTS_MIN,
+   COUNTS_MAX,
+   COUNTS_SUM,
+   COUNTS_NUM
+};
+
+struct latency_statistic {
+   bool enabled;
+   u64 counter[COUNTS_NUM];
+   u64 samples;
+};
+
+#ifdef CONFIG_DMAR_PERF
+int dmar_latency_enable(struct intel_iommu *iommu, enum latency_type type);
+void dmar_latency_disable(struct intel_iommu *iommu, enum latency_type type);
+bool dmar_latency_enabled(struct intel_iommu *iommu, enum latency_type type);
+void dmar_latency_update(struct intel_iommu *iommu, enum latency_type type,
+u64 latency);
+int dmar_latency_snapshot(struct intel_iommu *iommu, char *str, size_t size);
+#else
+static inline int
+dmar_latency_enable(struct intel_iommu *iommu, enum latency_type type)
+{
+   return -EINVAL;
+}
+
+static inline void
+dmar_latency_disable(struct intel_iommu *iommu, enum latency_type type)
+{
+}
+
+static inline bool
+dmar_latency_enabled(struct intel_iommu *iommu, enum latency_type type)
+{
+   return false;
+}
+
+static inline void
+dmar_latency_update(struct intel_iommu *iommu, enum latency_type type, u64 
latency)
+{
+}
+
+static inline int
+dmar_latency_snapshot(struct intel_iommu *iommu, char *str, size_t size)
+{
+   return 0;
+}
+#endif /* CONFIG_DMAR_PERF */
diff --git a/drivers/iommu/intel/perf.c b/drivers/iommu/intel/perf.c
new file mode 100644
index ..faaa96dda437
--- /dev/null
+++ b/drivers/iommu/intel/perf.c
@@ -0,0 +1,166 @@
+// SPDX-License-Identifier: GPL-2.0
+/**
+ * perf.c - performance monitor
+ *
+ * Copyright (C) 2021 Intel Corporation
+ *
+ * Author: Lu Baolu 
+ * Fenghua Yu 
+ */
+
+#include 
+#include 
+
+#include "perf.h"
+
+static DEFINE_SPINLOCK(latency_lock);
+
+bool dmar_latency_enabled(struct intel_iommu *iommu, enum latency_type type)
+{
+   struct latency_statistic *lstat = iommu->perf_statistic;
+
+   return lstat && lstat[type].enabled;
+}
+
+int dmar_latency_enable(struct intel_iommu *iommu, enum latency_type type)
+{
+   struct latency_statistic *lstat;
+   unsigned long flags;
+   int ret = -EBUSY;
+
+   if (dmar_latency_enabled(iommu, type))
+   return 0;
+
+   spin_lock_irqsave(&latency_lock, flags);
+   if (!iommu->perf_statistic) {
+   iommu->perf_statistic = kzalloc(sizeof(*lstat) * 
DMAR_LATENCY_NUM,
+   GFP_ATOMIC);
+   if (!iommu->perf_statistic) {
+   ret = -ENOMEM;
+   goto unlock_out;
+   }
+   }
+
+   lstat = iommu->perf_statistic;
+
+   if (!lstat[type].enabled) {
+   lstat[type].enabled = true;
+   lstat[type].counter[COUNTS_MIN] = UINT_MAX;
+   ret = 0;
+   }
+unlock_out:
+   spin_unlock_irqrestore(&latency_lock, flags);
+
+   return ret

[PATCH 14/23] iommu/vt-d: Expose latency monitor data through debugfs

2021-06-09 Thread Lu Baolu
A debugfs interface /sys/kernel/debug/iommu/intel/dmar_perf_latency is
created to control and show counts of execution time ranges for various
types per DMAR. The interface may help debug any potential performance
issue.

By default, the interface is disabled.

Possible write value of /sys/kernel/debug/iommu/intel/dmar_perf_latency
  0 - disable sampling all latency data
  1 - enable sampling IOTLB invalidation latency data
  2 - enable sampling devTLB invalidation latency data
  3 - enable sampling intr entry cache invalidation latency data
  4 - enable sampling prq handling latency data

Reading /sys/kernel/debug/iommu/intel/dmar_perf_latency gives a snapshot
of the sampling results of all enabled monitors.
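
[Editor's note: for example (each write enables one monitor type; monitors
enabled by earlier writes stay enabled until a write of 0 disables all):

	# enable IOTLB invalidation latency sampling on all DMAR units
	echo 1 > /sys/kernel/debug/iommu/intel/dmar_perf_latency
	# ... run the workload ...
	cat /sys/kernel/debug/iommu/intel/dmar_perf_latency
	# disable all sampling
	echo 0 > /sys/kernel/debug/iommu/intel/dmar_perf_latency
]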

Signed-off-by: Fenghua Yu 
Signed-off-by: Lu Baolu 
Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu...@linux.intel.com
---
 drivers/iommu/intel/debugfs.c | 111 ++
 drivers/iommu/intel/Kconfig   |   1 +
 2 files changed, 112 insertions(+)

diff --git a/drivers/iommu/intel/debugfs.c b/drivers/iommu/intel/debugfs.c
index efea7f02abd9..62e23ff3c987 100644
--- a/drivers/iommu/intel/debugfs.c
+++ b/drivers/iommu/intel/debugfs.c
@@ -16,6 +16,7 @@
 #include 
 
 #include "pasid.h"
+#include "perf.h"
 
 struct tbl_walk {
u16 bus;
@@ -31,6 +32,9 @@ struct iommu_regset {
const char *regs;
 };
 
+#define DEBUG_BUFFER_SIZE  1024
+static char debug_buf[DEBUG_BUFFER_SIZE];
+
 #define IOMMU_REGSET_ENTRY(_reg_)  \
{ DMAR_##_reg_##_REG, __stringify(_reg_) }
 
@@ -538,6 +542,111 @@ static int ir_translation_struct_show(struct seq_file *m, 
void *unused)
 DEFINE_SHOW_ATTRIBUTE(ir_translation_struct);
 #endif
 
+static void latency_show_one(struct seq_file *m, struct intel_iommu *iommu,
+struct dmar_drhd_unit *drhd)
+{
+   int ret;
+
+   seq_printf(m, "IOMMU: %s Register Base Address: %llx\n",
+  iommu->name, drhd->reg_base_addr);
+
+   ret = dmar_latency_snapshot(iommu, debug_buf, DEBUG_BUFFER_SIZE);
+   if (ret < 0)
+   seq_puts(m, "Failed to get latency snapshot");
+   else
+   seq_puts(m, debug_buf);
+   seq_puts(m, "\n");
+}
+
+static int latency_show(struct seq_file *m, void *v)
+{
+   struct dmar_drhd_unit *drhd;
+   struct intel_iommu *iommu;
+
+   rcu_read_lock();
+   for_each_active_iommu(iommu, drhd)
+   latency_show_one(m, iommu, drhd);
+   rcu_read_unlock();
+
+   return 0;
+}
+
+static int dmar_perf_latency_open(struct inode *inode, struct file *filp)
+{
+   return single_open(filp, latency_show, NULL);
+}
+
+static ssize_t dmar_perf_latency_write(struct file *filp,
+  const char __user *ubuf,
+  size_t cnt, loff_t *ppos)
+{
+   struct dmar_drhd_unit *drhd;
+   struct intel_iommu *iommu;
+   int counting;
+   char buf[64];
+
+   if (cnt > 63)
+   cnt = 63;
+
+   if (copy_from_user(&buf, ubuf, cnt))
+   return -EFAULT;
+
+   buf[cnt] = 0;
+
+   if (kstrtoint(buf, 0, &counting))
+   return -EINVAL;
+
+   switch (counting) {
+   case 0:
+   rcu_read_lock();
+   for_each_active_iommu(iommu, drhd) {
+   dmar_latency_disable(iommu, DMAR_LATENCY_INV_IOTLB);
+   dmar_latency_disable(iommu, DMAR_LATENCY_INV_DEVTLB);
+   dmar_latency_disable(iommu, DMAR_LATENCY_INV_IEC);
+   dmar_latency_disable(iommu, DMAR_LATENCY_PRQ);
+   }
+   rcu_read_unlock();
+   break;
+   case 1:
+   rcu_read_lock();
+   for_each_active_iommu(iommu, drhd)
+   dmar_latency_enable(iommu, DMAR_LATENCY_INV_IOTLB);
+   rcu_read_unlock();
+   break;
+   case 2:
+   rcu_read_lock();
+   for_each_active_iommu(iommu, drhd)
+   dmar_latency_enable(iommu, DMAR_LATENCY_INV_DEVTLB);
+   rcu_read_unlock();
+   break;
+   case 3:
+   rcu_read_lock();
+   for_each_active_iommu(iommu, drhd)
+   dmar_latency_enable(iommu, DMAR_LATENCY_INV_IEC);
+   rcu_read_unlock();
+   break;
+   case 4:
+   rcu_read_lock();
+   for_each_active_iommu(iommu, drhd)
+   dmar_latency_enable(iommu, DMAR_LATENCY_PRQ);
+   rcu_read_unlock();
+   break;
+   default:
+   return -EINVAL;
+   }
+
+   *ppos += cnt;
+   return cnt;
+}
+
+static const struct file_operations dmar_perf_latency_fops = {
+   .open   = dmar_perf_latency_open,
+   .write  = dmar_perf_latency_write,
+   .read   = seq_read,
+   .llseek = seq_ls

[PATCH 15/23] iommu/vt-d: Add cache invalidation latency sampling

2021-06-09 Thread Lu Baolu
Queued invalidation execution time is performance critical and needs
to be monitored. This adds code to sample the execution time of IOTLB/
devTLB/IEC cache invalidation.

Signed-off-by: Lu Baolu 
Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu...@linux.intel.com
---
 drivers/iommu/intel/dmar.c | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index cae1078cbfec..d66f79acd14d 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -34,6 +34,7 @@
 #include 
 
 #include "../irq_remapping.h"
+#include "perf.h"
 
 typedef int (*dmar_res_handler_t)(struct acpi_dmar_header *, void *);
 struct dmar_res_callback {
@@ -1342,15 +1343,33 @@ int qi_submit_sync(struct intel_iommu *iommu, struct 
qi_desc *desc,
   unsigned int count, unsigned long options)
 {
struct q_inval *qi = iommu->qi;
+   s64 devtlb_start_ktime = 0;
+   s64 iotlb_start_ktime = 0;
+   s64 iec_start_ktime = 0;
struct qi_desc wait_desc;
int wait_index, index;
unsigned long flags;
int offset, shift;
int rc, i;
+   u64 type;
 
if (!qi)
return 0;
 
+   type = desc->qw0 & GENMASK_ULL(3, 0);
+
+   if ((type == QI_IOTLB_TYPE || type == QI_EIOTLB_TYPE) &&
+   dmar_latency_enabled(iommu, DMAR_LATENCY_INV_IOTLB))
+   iotlb_start_ktime = ktime_to_ns(ktime_get());
+
+   if ((type == QI_DIOTLB_TYPE || type == QI_DEIOTLB_TYPE) &&
+   dmar_latency_enabled(iommu, DMAR_LATENCY_INV_DEVTLB))
+   devtlb_start_ktime = ktime_to_ns(ktime_get());
+
+   if (type == QI_IEC_TYPE &&
+   dmar_latency_enabled(iommu, DMAR_LATENCY_INV_IEC))
+   iec_start_ktime = ktime_to_ns(ktime_get());
+
 restart:
rc = 0;
 
@@ -1425,6 +1444,18 @@ int qi_submit_sync(struct intel_iommu *iommu, struct 
qi_desc *desc,
if (rc == -EAGAIN)
goto restart;
 
+   if (iotlb_start_ktime)
+   dmar_latency_update(iommu, DMAR_LATENCY_INV_IOTLB,
+   ktime_to_ns(ktime_get()) - iotlb_start_ktime);
+
+   if (devtlb_start_ktime)
+   dmar_latency_update(iommu, DMAR_LATENCY_INV_DEVTLB,
+   ktime_to_ns(ktime_get()) - devtlb_start_ktime);
+
+   if (iec_start_ktime)
+   dmar_latency_update(iommu, DMAR_LATENCY_INV_IEC,
+   ktime_to_ns(ktime_get()) - iec_start_ktime);
+
return rc;
 }
 
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 16/23] iommu/vt-d: Add PRQ handling latency sampling

2021-06-09 Thread Lu Baolu
The execution time for page fault request handling is performance critical
and needs to be monitored. This adds code to sample the execution time of
page fault request handling.

Signed-off-by: Lu Baolu 
Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu...@linux.intel.com
---
 drivers/iommu/intel/svm.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index d3d028c6a727..6bff9a7f9133 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -24,6 +24,7 @@
 #include 
 
 #include "pasid.h"
+#include "perf.h"
 #include "../iommu-sva-lib.h"
 
 static irqreturn_t prq_event_thread(int irq, void *d);
@@ -838,8 +839,8 @@ static int prq_to_iommu_prot(struct page_req_dsc *req)
return prot;
 }
 
-static int
-intel_svm_prq_report(struct device *dev, struct page_req_dsc *desc)
+static int intel_svm_prq_report(struct intel_iommu *iommu, struct device *dev,
+   struct page_req_dsc *desc)
 {
struct iommu_fault_event event;
 
@@ -871,6 +872,12 @@ intel_svm_prq_report(struct device *dev, struct 
page_req_dsc *desc)
event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA;
memcpy(event.fault.prm.private_data, desc->priv_data,
   sizeof(desc->priv_data));
+   } else if (dmar_latency_enabled(iommu, DMAR_LATENCY_PRQ)) {
+   /*
+* If the private data fields are not used by hardware, use it
+* to monitor the prq handle latency.
+*/
+   event.fault.prm.private_data[0] = ktime_to_ns(ktime_get());
}
 
return iommu_report_device_fault(dev, &event);
@@ -983,7 +990,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
 * If prq is to be handled outside iommu driver via receiver of
 * the fault notifiers, we skip the page response here.
 */
-   if (intel_svm_prq_report(sdev->dev, req))
+   if (intel_svm_prq_report(iommu, sdev->dev, req))
handle_bad_prq_event(iommu, req, QI_RESP_INVALID);
 
trace_prq_report(iommu, sdev->dev, req->qw_0, req->qw_1,
@@ -1172,6 +1179,9 @@ int intel_svm_page_response(struct device *dev,
if (private_present)
memcpy(&desc.qw2, prm->private_data,
   sizeof(prm->private_data));
+   else if (prm->private_data[0])
+   dmar_latency_update(iommu, DMAR_LATENCY_PRQ,
+   ktime_to_ns(ktime_get()) - 
prm->private_data[0]);
 
qi_submit_sync(iommu, &desc, 1, 0);
}
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 17/23] iommu/vt-d: Fix out-bounds-warning in intel/svm.c

2021-06-09 Thread Lu Baolu
From: "Gustavo A. R. Silva" 

Replace a couple of calls to memcpy() with simple assignments in order
to fix the following out-of-bounds warning:

drivers/iommu/intel/svm.c:1198:4: warning: 'memcpy' offset [25, 32] from
the object at 'desc' is out of the bounds of referenced subobject
'qw2' with type 'long long unsigned int' at offset 16 [-Warray-bounds]

The problem is that the original code is trying to copy data into a
couple of struct members adjacent to each other in a single call to
memcpy(). This causes a legitimate compiler warning because memcpy()
overruns the length of &desc.qw2 and &resp.qw2, respectively.

This helps with the ongoing efforts to globally enable -Warray-bounds
and get us closer to being able to tighten the FORTIFY_SOURCE routines
on memcpy().
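
[Editor's note: a toy reduction of the flagged pattern; hypothetical types
for illustration only, not code from the driver:

	struct two_words {
		unsigned long long qw2;
		unsigned long long qw3;
	};

	void fill(struct two_words *d, const unsigned long long priv[2])
	{
		/* Flagged: the copy length spills from qw2 into qw3. */
		/* memcpy(&d->qw2, priv, 2 * sizeof(*priv)); */

		/* Warning-free equivalent, as used by the patch: */
		d->qw2 = priv[0];
		d->qw3 = priv[1];
	}
]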

Link: https://github.com/KSPP/linux/issues/109
Signed-off-by: Gustavo A. R. Silva 
Signed-off-by: Lu Baolu 
Link: https://lore.kernel.org/r/20210414201403.GA392764@embeddedor
---
 drivers/iommu/intel/svm.c | 26 --
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 6bff9a7f9133..9b0f22bc0514 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -870,8 +870,8 @@ static int intel_svm_prq_report(struct intel_iommu *iommu, 
struct device *dev,
 */
event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE;
event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA;
-   memcpy(event.fault.prm.private_data, desc->priv_data,
-  sizeof(desc->priv_data));
+   event.fault.prm.private_data[0] = desc->priv_data[0];
+   event.fault.prm.private_data[1] = desc->priv_data[1];
} else if (dmar_latency_enabled(iommu, DMAR_LATENCY_PRQ)) {
/*
 * If the private data fields are not used by hardware, use it
@@ -910,11 +910,15 @@ static void handle_bad_prq_event(struct intel_iommu 
*iommu,
QI_PGRP_RESP_TYPE;
desc.qw1 = QI_PGRP_IDX(req->prg_index) |
QI_PGRP_LPIG(req->lpig);
-   desc.qw2 = 0;
-   desc.qw3 = 0;
 
-   if (req->priv_data_present)
-   memcpy(&desc.qw2, req->priv_data, sizeof(req->priv_data));
+   if (req->priv_data_present) {
+   desc.qw2 = req->priv_data[0];
+   desc.qw3 = req->priv_data[1];
+   } else {
+   desc.qw2 = 0;
+   desc.qw3 = 0;
+   }
+
qi_submit_sync(iommu, &desc, 1, 0);
 }
 
@@ -1176,12 +1180,14 @@ int intel_svm_page_response(struct device *dev,
desc.qw1 = QI_PGRP_IDX(prm->grpid) | QI_PGRP_LPIG(last_page);
desc.qw2 = 0;
desc.qw3 = 0;
-   if (private_present)
-   memcpy(&desc.qw2, prm->private_data,
-  sizeof(prm->private_data));
-   else if (prm->private_data[0])
+
+   if (private_present) {
+   desc.qw2 = prm->private_data[0];
+   desc.qw3 = prm->private_data[1];
+   } else if (prm->private_data[0]) {
dmar_latency_update(iommu, DMAR_LATENCY_PRQ,
ktime_to_ns(ktime_get()) - 
prm->private_data[0]);
+   }
 
qi_submit_sync(iommu, &desc, 1, 0);
}
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 18/23] iommu/vt-d: Use DEVICE_ATTR_RO macro

2021-06-09 Thread Lu Baolu
From: YueHaibing 

Use DEVICE_ATTR_RO() helper instead of plain DEVICE_ATTR(),
which makes the code a bit shorter and easier to read.

Signed-off-by: YueHaibing 
Signed-off-by: Lu Baolu 
Link: https://lore.kernel.org/r/20210528130229.22108-1-yuehaib...@huawei.com
---
 drivers/iommu/intel/iommu.c | 42 -
 1 file changed, 18 insertions(+), 24 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index c45d4946f92d..65458aee0d95 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4139,62 +4139,56 @@ static inline struct intel_iommu 
*dev_to_intel_iommu(struct device *dev)
return container_of(iommu_dev, struct intel_iommu, iommu);
 }
 
-static ssize_t intel_iommu_show_version(struct device *dev,
-   struct device_attribute *attr,
-   char *buf)
+static ssize_t version_show(struct device *dev,
+   struct device_attribute *attr, char *buf)
 {
struct intel_iommu *iommu = dev_to_intel_iommu(dev);
u32 ver = readl(iommu->reg + DMAR_VER_REG);
return sprintf(buf, "%d:%d\n",
   DMAR_VER_MAJOR(ver), DMAR_VER_MINOR(ver));
 }
-static DEVICE_ATTR(version, S_IRUGO, intel_iommu_show_version, NULL);
+static DEVICE_ATTR_RO(version);
 
-static ssize_t intel_iommu_show_address(struct device *dev,
-   struct device_attribute *attr,
-   char *buf)
+static ssize_t address_show(struct device *dev,
+   struct device_attribute *attr, char *buf)
 {
struct intel_iommu *iommu = dev_to_intel_iommu(dev);
return sprintf(buf, "%llx\n", iommu->reg_phys);
 }
-static DEVICE_ATTR(address, S_IRUGO, intel_iommu_show_address, NULL);
+static DEVICE_ATTR_RO(address);
 
-static ssize_t intel_iommu_show_cap(struct device *dev,
-   struct device_attribute *attr,
-   char *buf)
+static ssize_t cap_show(struct device *dev,
+   struct device_attribute *attr, char *buf)
 {
struct intel_iommu *iommu = dev_to_intel_iommu(dev);
return sprintf(buf, "%llx\n", iommu->cap);
 }
-static DEVICE_ATTR(cap, S_IRUGO, intel_iommu_show_cap, NULL);
+static DEVICE_ATTR_RO(cap);
 
-static ssize_t intel_iommu_show_ecap(struct device *dev,
-   struct device_attribute *attr,
-   char *buf)
+static ssize_t ecap_show(struct device *dev,
+struct device_attribute *attr, char *buf)
 {
struct intel_iommu *iommu = dev_to_intel_iommu(dev);
return sprintf(buf, "%llx\n", iommu->ecap);
 }
-static DEVICE_ATTR(ecap, S_IRUGO, intel_iommu_show_ecap, NULL);
+static DEVICE_ATTR_RO(ecap);
 
-static ssize_t intel_iommu_show_ndoms(struct device *dev,
- struct device_attribute *attr,
- char *buf)
+static ssize_t domains_supported_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
 {
struct intel_iommu *iommu = dev_to_intel_iommu(dev);
return sprintf(buf, "%ld\n", cap_ndoms(iommu->cap));
 }
-static DEVICE_ATTR(domains_supported, S_IRUGO, intel_iommu_show_ndoms, NULL);
+static DEVICE_ATTR_RO(domains_supported);
 
-static ssize_t intel_iommu_show_ndoms_used(struct device *dev,
-  struct device_attribute *attr,
-  char *buf)
+static ssize_t domains_used_show(struct device *dev,
+struct device_attribute *attr, char *buf)
 {
struct intel_iommu *iommu = dev_to_intel_iommu(dev);
return sprintf(buf, "%d\n", bitmap_weight(iommu->domain_ids,
  cap_ndoms(iommu->cap)));
 }
-static DEVICE_ATTR(domains_used, S_IRUGO, intel_iommu_show_ndoms_used, NULL);
+static DEVICE_ATTR_RO(domains_used);
 
 static struct attribute *intel_iommu_attrs[] = {
&dev_attr_version.attr,
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 19/23] iommu/vt-d: Use bitfields for DMAR capabilities

2021-06-09 Thread Lu Baolu
From: Parav Pandit 

IOTLB device presence, iommu coherency and snooping are boolean
capabilities. Use them as bits and keep them adjacent.

Structure layout before the reorg.
$ pahole -C dmar_domain drivers/iommu/intel/dmar.o
struct dmar_domain {
intnid;  /* 0 4 */
unsigned int   iommu_refcnt[128];/* 4   512 */
/* --- cacheline 8 boundary (512 bytes) was 4 bytes ago --- */
u16iommu_did[128];   /*   516   256 */
/* --- cacheline 12 boundary (768 bytes) was 4 bytes ago --- */
bool   has_iotlb_device; /*   772 1 */

/* XXX 3 bytes hole, try to pack */

struct list_head   devices;  /*   77616 */
struct list_head   subdevices;   /*   79216 */
struct iova_domain iovad __attribute__((__aligned__(8)));
 /*   808  2320 */
/* --- cacheline 48 boundary (3072 bytes) was 56 bytes ago --- */
struct dma_pte *   pgd;  /*  3128 8 */
/* --- cacheline 49 boundary (3136 bytes) --- */
intgaw;  /*  3136 4 */
intagaw; /*  3140 4 */
intflags;/*  3144 4 */
intiommu_coherency;  /*  3148 4 */
intiommu_snooping;   /*  3152 4 */
intiommu_count;  /*  3156 4 */
intiommu_superpage;  /*  3160 4 */

/* XXX 4 bytes hole, try to pack */

u64max_addr; /*  3168 8 */
u32default_pasid;/*  3176 4 */

/* XXX 4 bytes hole, try to pack */

struct iommu_domaindomain;   /*  318472 */

/* size: 3256, cachelines: 51, members: 18 */
/* sum members: 3245, holes: 3, sum holes: 11 */
/* forced alignments: 1 */
/* last cacheline: 56 bytes */
} __attribute__((__aligned__(8)));

After rearranging the members for natural padding and turning the flags
into u8 bitfields, the struct shrinks by 8 bytes.

struct dmar_domain {
intnid;  /* 0 4 */
unsigned int   iommu_refcnt[128];/* 4   512 */
/* --- cacheline 8 boundary (512 bytes) was 4 bytes ago --- */
u16iommu_did[128];   /*   516   256 */
/* --- cacheline 12 boundary (768 bytes) was 4 bytes ago --- */
u8 has_iotlb_device:1;   /*   772: 0  1 */
u8 iommu_coherency:1;/*   772: 1  1 */
u8 iommu_snooping:1; /*   772: 2  1 */

/* XXX 5 bits hole, try to pack */
/* XXX 3 bytes hole, try to pack */

struct list_head   devices;  /*   77616 */
struct list_head   subdevices;   /*   79216 */
struct iova_domain iovad __attribute__((__aligned__(8)));
 /*   808  2320 */
/* --- cacheline 48 boundary (3072 bytes) was 56 bytes ago --- */
struct dma_pte *   pgd;  /*  3128 8 */
/* --- cacheline 49 boundary (3136 bytes) --- */
intgaw;  /*  3136 4 */
intagaw; /*  3140 4 */
intflags;/*  3144 4 */
intiommu_count;  /*  3148 4 */
intiommu_superpage;  /*  3152 4 */

/* XXX 4 bytes hole, try to pack */

u64max_addr; /*  3160 8 */
u32default_pasid;/*  3168 4 */

/* XXX 4 bytes hole, try to pack */

struct iommu_domaindomain;   /*  317672 */

/* size: 3248, cachelines: 51, members: 18 */
/* sum members: 3236, holes: 3, sum holes: 11 */
/* sum bitfield members: 3 bits, bit holes: 1, sum bit holes: 5 bits */
/* forced alignments: 1 */
/* last cacheline: 48 bytes */
} __attribute__((__aligned__(8)));

Signed-off-by: Parav Pandit 
Signed-off-by: Lu Baolu 
Link: https://lore.kernel.org/r/20210530075053.264218-1-pa...@nvidia.com
---
 include/linux/intel-iommu.h |  8 
 drivers/iommu/intel/iommu.c | 18 +-
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index f5cf31dd7280..2621eff04c82 100644
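
The space saving comes purely from C bitfield packing: three adjacent
1-bit fields share a single byte where three ints previously took 12
bytes. A standalone sketch (not the kernel struct) that can be checked
with sizeof or pahole:

#include <stdio.h>

typedef unsigned char u8;

struct caps_as_ints {		/* old layout: one int per boolean */
	int has_iotlb_device;
	int iommu_coherency;
	int iommu_snooping;
};

struct caps_as_bits {		/* new layout: three bits in one byte */
	u8 has_iotlb_device:1;
	u8 iommu_coherency:1;
	u8 iommu_snooping:1;
};

int main(void)
{
	/* typically prints "ints: 12 bytes, bits: 1 byte" */
	printf("ints: %zu bytes, bits: %zu byte\n",
	       sizeof(struct caps_as_ints), sizeof(struct caps_as_bits));
	return 0;
}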

[PATCH 21/23] iommu/vt-d: Remove unnecessary braces

2021-06-09 Thread Lu Baolu
From: Parav Pandit 

No need for braces around a single-line statement in an if() block.

Signed-off-by: Parav Pandit 
Signed-off-by: Lu Baolu 
Link: https://lore.kernel.org/r/20210530075053.264218-1-pa...@nvidia.com
---
 drivers/iommu/intel/iommu.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index dd8ecfbfdb23..d9a248ee5779 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -683,9 +683,8 @@ static int domain_update_iommu_superpage(struct dmar_domain 
*domain,
struct intel_iommu *iommu;
int mask = 0x3;
 
-   if (!intel_iommu_superpage) {
+   if (!intel_iommu_superpage)
return 0;
-   }
 
/* set iommu_superpage to the smallest common denominator */
rcu_read_lock();
-- 
2.25.1



[PATCH 20/23] iommu/vt-d: Removed unused iommu_count in dmar domain

2021-06-09 Thread Lu Baolu
From: Parav Pandit 

A DMAR domain keeps a per-IOMMU refcount, indexed by iommu seq_id.
The older iommu_count is only incremented and decremented; no decisions
are taken based on it, so it is of little use.

Hence, remove iommu_count and further simplify domain_detach_iommu()
by returning void.

Signed-off-by: Parav Pandit 
Signed-off-by: Lu Baolu 
Link: https://lore.kernel.org/r/20210530075053.264218-1-pa...@nvidia.com
---
 include/linux/intel-iommu.h |  1 -
 drivers/iommu/intel/iommu.c | 11 +++
 2 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 2621eff04c82..574b932dfe86 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -561,7 +561,6 @@ struct dmar_domain {
int agaw;
 
int flags;  /* flags to find out type of domain */
-   int iommu_count;/* reference count of iommu */
int iommu_superpage;/* Level of superpages supported:
   0 == 4KiB (no superpages), 1 == 2MiB,
   2 == 1GiB, 3 == 512GiB, 4 == 1TiB */
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 430ef2232d47..dd8ecfbfdb23 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1920,7 +1920,6 @@ static int domain_attach_iommu(struct dmar_domain *domain,
assert_spin_locked(&iommu->lock);
 
domain->iommu_refcnt[iommu->seq_id] += 1;
-   domain->iommu_count += 1;
if (domain->iommu_refcnt[iommu->seq_id] == 1) {
ndomains = cap_ndoms(iommu->cap);
num  = find_first_zero_bit(iommu->domain_ids, ndomains);
@@ -1928,7 +1927,6 @@ static int domain_attach_iommu(struct dmar_domain *domain,
if (num >= ndomains) {
pr_err("%s: No free domain ids\n", iommu->name);
domain->iommu_refcnt[iommu->seq_id] -= 1;
-   domain->iommu_count -= 1;
return -ENOSPC;
}
 
@@ -1944,16 +1942,15 @@ static int domain_attach_iommu(struct dmar_domain 
*domain,
return 0;
 }
 
-static int domain_detach_iommu(struct dmar_domain *domain,
-  struct intel_iommu *iommu)
+static void domain_detach_iommu(struct dmar_domain *domain,
+   struct intel_iommu *iommu)
 {
-   int num, count;
+   int num;
 
assert_spin_locked(&device_domain_lock);
assert_spin_locked(&iommu->lock);
 
domain->iommu_refcnt[iommu->seq_id] -= 1;
-   count = --domain->iommu_count;
if (domain->iommu_refcnt[iommu->seq_id] == 0) {
num = domain->iommu_did[iommu->seq_id];
clear_bit(num, iommu->domain_ids);
@@ -1962,8 +1959,6 @@ static int domain_detach_iommu(struct dmar_domain *domain,
domain_update_iommu_cap(domain);
domain->iommu_did[iommu->seq_id] = 0;
}
-
-   return count;
 }
 
 static inline int guestwidth_to_adjustwidth(int gaw)
-- 
2.25.1



[PATCH 22/23] iommu/vt-d: Define counter explicitly as unsigned int

2021-06-09 Thread Lu Baolu
From: Parav Pandit 

Avoid the below checkpatch warning.

WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
+   unsignediommu_refcnt[DMAR_UNITS_SUPPORTED];

Fixes: 29a27719abaa ("iommu/vt-d: Replace iommu_bmp with a refcount")
Signed-off-by: Parav Pandit 
Signed-off-by: Lu Baolu 
Link: https://lore.kernel.org/r/20210530075053.264218-1-pa...@nvidia.com
---
 include/linux/intel-iommu.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 574b932dfe86..d0fa0b31994d 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -537,7 +537,7 @@ struct context_entry {
 struct dmar_domain {
int nid;/* node id */
 
-   unsignediommu_refcnt[DMAR_UNITS_SUPPORTED];
+   unsigned int iommu_refcnt[DMAR_UNITS_SUPPORTED];
/* Refcount of devices per iommu */
 
 
-- 
2.25.1



[PATCH 23/23] iommu/vt-d: No need to typecast

2021-06-09 Thread Lu Baolu
From: Parav Pandit 

Page directory assignment from alloc_pgtable_page() or phys_to_virt()
doesn't need a typecast, as both routines return void *. Hence, remove
the typecasts from both calls.

Signed-off-by: Parav Pandit 
Signed-off-by: Lu Baolu 
Link: https://lore.kernel.org/r/20210530075053.264218-1-pa...@nvidia.com
---
 drivers/iommu/intel/iommu.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index d9a248ee5779..bd93c7ec879e 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4506,7 +4506,7 @@ static int md_domain_init(struct dmar_domain *domain, int 
guest_width)
domain->max_addr = 0;
 
/* always allocate the top pgd */
-   domain->pgd = (struct dma_pte *)alloc_pgtable_page(domain->nid);
+   domain->pgd = alloc_pgtable_page(domain->nid);
if (!domain->pgd)
return -ENOMEM;
domain_flush_cache(domain, domain->pgd, PAGE_SIZE);
@@ -4774,8 +4774,7 @@ static int prepare_domain_attach_device(struct 
iommu_domain *domain,
 
pte = dmar_domain->pgd;
if (dma_pte_present(pte)) {
-   dmar_domain->pgd = (struct dma_pte *)
-   phys_to_virt(dma_pte_addr(pte));
+   dmar_domain->pgd = phys_to_virt(dma_pte_addr(pte));
free_pgtable_page(pte);
}
dmar_domain->agaw--;
-- 
2.25.1
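
The underlying C rule: a void * converts implicitly to any object
pointer type, so the removed casts were pure noise. A two-line
illustration with a hypothetical stand-in allocator:

void *alloc_pgtable_page_sketch(void);	/* stand-in returning void * */

struct dma_pte *pgd = alloc_pgtable_page_sketch();	/* no cast needed in C */

Note this is C-specific; C++ would require an explicit cast, which is
one reason the cast style lingers in older code.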



Re: [RFC] /dev/ioasid uAPI proposal

2021-06-09 Thread Jason Wang


On 2021/6/8 6:45 PM, Enrico Weigelt, metux IT consult wrote:
> On 07.06.21 20:01, Jason Gunthorpe wrote:
>> it is what it is, select has a fixed size bitmap of FD #s and
>> a hard upper bound on that size as part of the glibc ABI - can't be
>> fixed.
>
> in glibc ABI ? Uuuuh!



Note that dealing with select() directly, or trying to overcome the
limitation via epoll() in the application, is not good practice (and is
not portable).

It's suggested to use the building blocks provided by glib, e.g. the
main event loop[1]. That is how Qemu solves the issue of dealing with a
lot of file descriptors.


Thanks

[1] https://developer.gnome.org/glib/stable/glib-The-Main-Event-Loop.html
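
A minimal sketch of the suggested approach, watching one descriptor
through the GLib main loop (which multiplexes via poll() internally,
so FD_SETSIZE never applies); the callback name and watched fd are
placeholders:

#include <glib.h>

static gboolean on_fd_ready(GIOChannel *src, GIOCondition cond, gpointer data)
{
	/* read and handle the event for this descriptor here */
	return TRUE;	/* keep the watch installed */
}

int main(void)
{
	GMainLoop *loop = g_main_loop_new(NULL, FALSE);
	GIOChannel *ch = g_io_channel_unix_new(0 /* placeholder fd */);

	g_io_add_watch(ch, G_IO_IN, on_fd_ready, NULL);
	g_main_loop_run(loop);
	return 0;
}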








[PATCH v2] iommu/arm-smmu: Fix arm_smmu_device refcount leak in address translation

2021-06-09 Thread Xiyu Yang via iommu
The reference counting issue happens in several exception handling paths
of arm_smmu_iova_to_phys_hard(). When those error scenarios occur, the
function forgets to decrease the refcount of "smmu" increased by
arm_smmu_rpm_get(), causing a refcount leak.

Fix this issue by jumping to "out" label when those error scenarios
occur.

Signed-off-by: Xiyu Yang 
Signed-off-by: Xin Tan 
---
 drivers/iommu/arm/arm-smmu/arm-smmu.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 6f72c4d208ca..3a3847277320 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1271,6 +1271,7 @@ static phys_addr_t arm_smmu_iova_to_phys_hard(struct 
iommu_domain *domain,
u64 phys;
unsigned long va, flags;
int ret, idx = cfg->cbndx;
+   phys_addr_t addr = 0;
 
ret = arm_smmu_rpm_get(smmu);
if (ret < 0)
@@ -1290,6 +1291,7 @@ static phys_addr_t arm_smmu_iova_to_phys_hard(struct 
iommu_domain *domain,
dev_err(dev,
"iova to phys timed out on %pad. Falling back to 
software table walk.\n",
&iova);
+   arm_smmu_rpm_put(smmu);
return ops->iova_to_phys(ops, iova);
}
 
@@ -1298,12 +1300,14 @@ static phys_addr_t arm_smmu_iova_to_phys_hard(struct 
iommu_domain *domain,
if (phys & ARM_SMMU_CB_PAR_F) {
dev_err(dev, "translation fault!\n");
dev_err(dev, "PAR = 0x%llx\n", phys);
-   return 0;
+   goto out;
}
 
+   addr = (phys & GENMASK_ULL(39, 12)) | (iova & 0xfff);
+out:
arm_smmu_rpm_put(smmu);
 
-   return (phys & GENMASK_ULL(39, 12)) | (iova & 0xfff);
+   return addr;
 }
 
 static phys_addr_t arm_smmu_iova_to_phys(struct iommu_domain *domain,
-- 
2.7.4
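
As an aside on the address composition that the fix now routes through
"addr": GENMASK_ULL(39, 12) selects the physical page frame bits
[39:12] reported by the PAR register, and the low 12 bits of the IOVA
supply the offset within the 4K page. Spelled out with the mask value
(a sketch, assuming the 40-bit output of this ATOS path):

/* GENMASK_ULL(39, 12) == 0x000000fffffff000ULL */
phys_addr_t pa = (phys & 0x000000fffffff000ULL)	/* page frame from PAR */
	       | (iova & 0xfffULL);		/* page offset from IOVA */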



[PATCH v2] iommu/arm-smmu: Fix arm_smmu_device refcount leak when arm_smmu_rpm_get fails

2021-06-09 Thread Xiyu Yang via iommu
arm_smmu_rpm_get() invokes pm_runtime_get_sync(), which increases the
refcount of the "smmu" even when the return value is less than 0.

The reference counting issue happens in some error handling paths of
arm_smmu_rpm_get() in its caller functions. When arm_smmu_rpm_get()
fails, the caller functions forget to decrease the refcount of "smmu"
increased by arm_smmu_rpm_get(), causing a refcount leak.

Fix this issue by calling pm_runtime_resume_and_get() instead of
pm_runtime_get_sync() in arm_smmu_rpm_get(), which can keep the refcount
balanced in case of failure.

Signed-off-by: Xiyu Yang 
Signed-off-by: Xin Tan 
---
 drivers/iommu/arm/arm-smmu/arm-smmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 3a3847277320..1a647e0ea3eb 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -74,7 +74,7 @@ static bool using_legacy_binding, using_generic_binding;
 static inline int arm_smmu_rpm_get(struct arm_smmu_device *smmu)
 {
if (pm_runtime_enabled(smmu->dev))
-   return pm_runtime_get_sync(smmu->dev);
+   return pm_runtime_resume_and_get(smmu->dev);
 
return 0;
 }
-- 
2.7.4
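
The reason the one-liner suffices: pm_runtime_resume_and_get() is
roughly pm_runtime_get_sync() plus the missing error-path put, so the
usage counter stays balanced without touching any caller. A sketch of
its semantics (the real helper in include/linux/pm_runtime.h goes
through __pm_runtime_resume()):

static inline int resume_and_get_sketch(struct device *dev)
{
	int ret = pm_runtime_get_sync(dev);	/* always bumps the usage count */

	if (ret < 0) {
		pm_runtime_put_noidle(dev);	/* drop it again on failure */
		return ret;
	}
	return 0;
}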



Re: [RFC] /dev/ioasid uAPI proposal

2021-06-09 Thread Jason Wang


On 2021/6/10 10:00 AM, Jason Wang wrote:
> On 2021/6/8 9:20 PM, Jason Gunthorpe wrote:
>> On Tue, Jun 08, 2021 at 09:10:42AM +0800, Jason Wang wrote:
>>> Well, this sounds like a re-invention of io_uring which has already
>>> worked for multifds.
>>
>> How so? io_uring is about sending work to the kernel, not getting
>> structured events back?
>
> Actually it can. Userspace can poll multiple fds by preparing
> multiple sqes with the IORING_OP_ADD flag.

IORING_OP_POLL_ADD actually.

Thanks

>> It is more like one of the perf rings
>
> This means another ring, and we need to introduce ioctl() to add or
> remove ioasids from the poll. And it still needs a kind of fallback,
> like a list, if the ring is full.
>
> Thanks
>
>> Jason
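
For concreteness, polling a descriptor through io_uring from userspace
looks like this with liburing (a sketch with error handling omitted;
the function name is a placeholder):

#include <liburing.h>
#include <poll.h>

/* Sketch: block until "fd" becomes readable, via io_uring. */
static int wait_readable(int fd)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	int ret;

	io_uring_queue_init(8, &ring, 0);

	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_poll_add(sqe, fd, POLLIN);	/* one sqe per watched fd */
	io_uring_submit(&ring);

	io_uring_wait_cqe(&ring, &cqe);
	ret = cqe->res;				/* revents, or -errno */
	io_uring_cqe_seen(&ring, cqe);
	io_uring_queue_exit(&ring);
	return ret;
}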




[PATCH 1/2] iommu: Fix race condition during default domain allocation

2021-06-09 Thread Ashish Mhetre
The domain is created more than once when multiple display heads
(devices) probe asynchronously. All the display heads share the same
SID and are expected to be in the same domain. As the
iommu_alloc_default_domain() call is not protected, group->default_domain
and group->domain end up with different domains, leading to subsequent
IOMMU faults.
Fix this by protecting the iommu_alloc_default_domain() call with
group->mutex.

Signed-off-by: Ashish Mhetre 
---
 drivers/iommu/iommu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 808ab70..2700500 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -273,7 +273,9 @@ int iommu_probe_device(struct device *dev)
 * support default domains, so the return value is not yet
 * checked.
 */
+   mutex_lock(&group->mutex);
iommu_alloc_default_domain(group, dev);
+   mutex_unlock(&group->mutex);
 
if (group->default_domain) {
ret = __iommu_attach_device(group->default_domain, dev);
-- 
2.7.4
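
The shape of the race, for clarity: two asynchronous probes of devices
in the same group can interleave the unprotected check-and-allocate,
each installing its own default domain. Holding group->mutex makes the
sequence effectively the following (a simplified sketch, not the exact
kernel flow):

mutex_lock(&group->mutex);
if (!group->default_domain)	/* only the first prober allocates */
	iommu_alloc_default_domain(group, dev);
mutex_unlock(&group->mutex);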



[PATCH 2/2] iommu/arm-smmu: Fix race condition during iommu_group creation

2021-06-09 Thread Ashish Mhetre
From: Krishna Reddy 

An iommu_group is created more than once when multiple display heads
(devices) probe asynchronously on the Tegra194 SoC. All the display
heads share the same SID and are expected to be in the same iommu_group.
As arm_smmu_device_group() does not protect group creation across
devices, multiple groups get created for devices with the same SID,
leading to subsequent IOMMU faults.
Fix this by protecting group creation with smmu->stream_map_mutex.

Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm/arm-smmu/arm-smmu.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 6f72c4d..21af179 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1458,6 +1458,7 @@ static struct iommu_group *arm_smmu_device_group(struct 
device *dev)
struct iommu_group *group = NULL;
int i, idx;
 
+   mutex_lock(&smmu->stream_map_mutex);
for_each_cfg_sme(cfg, fwspec, i, idx) {
if (group && smmu->s2crs[idx].group &&
group != smmu->s2crs[idx].group)
@@ -1466,8 +1467,10 @@ static struct iommu_group *arm_smmu_device_group(struct 
device *dev)
group = smmu->s2crs[idx].group;
}
 
-   if (group)
+   if (group) {
+   mutex_unlock(&smmu->stream_map_mutex);
return iommu_group_ref_get(group);
+   }
 
if (dev_is_pci(dev))
group = pci_device_group(dev);
@@ -1481,6 +1484,7 @@ static struct iommu_group *arm_smmu_device_group(struct 
device *dev)
for_each_cfg_sme(cfg, fwspec, i, idx)
smmu->s2crs[idx].group = group;
 
+   mutex_unlock(&smmu->stream_map_mutex);
return group;
 }
 
-- 
2.7.4



[PATCH 0/2] iommu/arm-smmu: Fix races in iommu domain/group creation

2021-06-09 Thread Ashish Mhetre
Fix races during iommu group/domain creation for devices sharing the same SID.

Ashish Mhetre (1):
  iommu: Fix race condition during default domain allocation

Krishna Reddy (1):
  iommu/arm-smmu: Fix race condition during iommu_group creation

 drivers/iommu/arm/arm-smmu/arm-smmu.c | 6 +-
 drivers/iommu/iommu.c | 2 ++
 2 files changed, 7 insertions(+), 1 deletion(-)

-- 
2.7.4



Re: Re: [PATCH] iommu/arm-smmu: Fix arm_smmu_device refcount leak when arm_smmu_rpm_get fails

2021-06-09 Thread Xiyu Yang via iommu


Thanks for your advice, I'll send a v2 patch soon.


> -Original Messages-
> From: "Robin Murphy" 
> Sent Time: 2021-06-09 22:12:11 (Wednesday)
> To: "Xiyu Yang" , "Will Deacon" , 
> "Joerg Roedel" , "Nicolin Chen" , 
> "Bjorn Andersson" , "Krishna Reddy" 
> , "Jordan Crouse" , "Sai Prakash 
> Ranjan" , 
> linux-arm-ker...@lists.infradead.org, iommu@lists.linux-foundation.org, 
> linux-ker...@vger.kernel.org
> Cc: yuanxzh...@fudan.edu.cn, "Xin Tan" 
> Subject: Re: [PATCH] iommu/arm-smmu: Fix arm_smmu_device refcount leak when 
> arm_smmu_rpm_get fails
> 
> On 2021-06-09 14:35, Xiyu Yang wrote:
> > arm_smmu_rpm_get() invokes pm_runtime_get_sync(), which increases the
> > refcount of the "smmu" even though the return value is less than 0.
> > 
> > The reference counting issue happens in some error handling paths of
> > arm_smmu_rpm_get() in its caller functions. When arm_smmu_rpm_get()
> > fails, the caller functions forget to decrease the refcount of "smmu"
> > increased by arm_smmu_rpm_get(), causing a refcount leak.
> > 
> > Fix this issue by calling arm_smmu_rpm_put() or jumping to the "rpm_put"
> > label when arm_smmu_rpm_get() fails.
> 
> If only there was some kind of helper function which could encapsulate 
> the correct expected behaviour in a single place...
> 
> In fact with the new pm_runtime_resume_and_get() API I think these two 
> patches boil down to a one-line change.
> 
> Thanks,
> Robin.
> 
> > Signed-off-by: Xiyu Yang 
> > Signed-off-by: Xin Tan 
> > ---
> >   drivers/iommu/arm/arm-smmu/arm-smmu.c | 13 +
> >   1 file changed, 9 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
> > b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > index 6f72c4d208ca..177ee54c5534 100644
> > --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > @@ -840,7 +840,7 @@ static void arm_smmu_destroy_domain_context(struct 
> > iommu_domain *domain)
> >   
> > ret = arm_smmu_rpm_get(smmu);
> > if (ret < 0)
> > -   return;
> > +   goto rpm_put;
> >   
> > /*
> >  * Disable the context bank and free the page tables before freeing
> > @@ -857,6 +857,7 @@ static void arm_smmu_destroy_domain_context(struct 
> > iommu_domain *domain)
> > free_io_pgtable_ops(smmu_domain->pgtbl_ops);
> > __arm_smmu_free_bitmap(smmu->context_map, cfg->cbndx);
> >   
> > +rpm_put:
> > arm_smmu_rpm_put(smmu);
> >   }
> >   
> > @@ -1153,7 +1154,7 @@ static int arm_smmu_attach_dev(struct iommu_domain 
> > *domain, struct device *dev)
> >   
> > ret = arm_smmu_rpm_get(smmu);
> > if (ret < 0)
> > -   return ret;
> > +   goto rpm_put;
> >   
> > /* Ensure that the domain is finalised */
> > ret = arm_smmu_init_domain_context(domain, smmu, dev);
> > @@ -1404,7 +1405,7 @@ static struct iommu_device 
> > *arm_smmu_probe_device(struct device *dev)
> >   
> > ret = arm_smmu_rpm_get(smmu);
> > if (ret < 0)
> > -   goto out_cfg_free;
> > +   goto rpm_put;
> >   
> > ret = arm_smmu_master_alloc_smes(dev);
> > arm_smmu_rpm_put(smmu);
> > @@ -1417,6 +1418,8 @@ static struct iommu_device 
> > *arm_smmu_probe_device(struct device *dev)
> >   
> > return &smmu->iommu;
> >   
> > +rpm_put:
> > +   arm_smmu_rpm_put(smmu);
> >   out_cfg_free:
> > kfree(cfg);
> >   out_free:
> > @@ -1438,8 +1441,10 @@ static void arm_smmu_release_device(struct device 
> > *dev)
> > smmu = cfg->smmu;
> >   
> > ret = arm_smmu_rpm_get(smmu);
> > -   if (ret < 0)
> > +   if (ret < 0) {
> > +   arm_smmu_rpm_put(smmu);
> > return;
> > +   }
> >   
> > arm_smmu_master_free_smes(cfg, fwspec);
> >   
> > 








Re: [PATCH] iommu/io-pgtable-arm: Optimize partial walk flush for large scatter-gather list

2021-06-09 Thread Sai Prakash Ranjan

Hi Robin,

On 2021-06-10 00:14, Robin Murphy wrote:
> On 2021-06-09 15:53, Sai Prakash Ranjan wrote:
>> Currently for iommu_unmap() of a large scatter-gather list with page
>> size elements, the majority of time is spent in flushing of partial
>> walks in __arm_lpae_unmap(), which is a VA based TLB invalidation
>> (TLBIVA for arm-smmu).
>>
>> For example: to unmap a 32MB scatter-gather list with page size
>> elements (8192 entries), there are 16 2MB buffer unmaps based on the
>> pgsize (2MB for 4K granule), and each 2MB unmap further results in
>> 512 TLBIVAs (2MB/4K), for a total of 8192 TLBIVAs (512*16) across
>> the 16 2MB unmaps, causing a huge overhead.
>>
>> So instead use io_pgtable_tlb_flush_all() to invalidate the entire
>> context if size (pgsize) is greater than the granule size (4K, 16K,
>> 64K). For this example of a 32MB scatter-gather list unmap, this
>> results in just 16 ASID based TLB invalidations, i.e. tlb_flush_all()
>> callbacks (TLBIASID in the case of arm-smmu), as opposed to 8192
>> TLBIVAs, thereby increasing the performance of unmaps drastically.
>>
>> The condition (size > granule size) is chosen for
>> io_pgtable_tlb_flush_all() because for any granule with supported
>> pgsizes, we will have at least 512 TLB invalidations, for which
>> tlb_flush_all() is already recommended. For example, take a 4K
>> granule with a 2MB pgsize: this results in 512 TLBIVAs in a partial
>> walk flush.
>>
>> Test on QTI SM8150 SoC for 10 iterations of iommu_{map_sg}/unmap:
>> (average over 10 iterations)
>>
>> Before this optimization:
>>
>>   size        iommu_map_sg      iommu_unmap
>>     4K            2.067 us         1.854 us
>>    64K            9.598 us         8.802 us
>>     1M          148.890 us       130.718 us
>>     2M          305.864 us        67.291 us
>>    12M         1793.604 us       390.838 us
>>    16M         2386.848 us       518.187 us
>>    24M         3563.296 us       775.989 us
>>    32M         4747.171 us      1033.364 us
>>
>> After this optimization:
>>
>>   size        iommu_map_sg      iommu_unmap
>>     4K            1.723 us         1.765 us
>>    64K            9.880 us         8.869 us
>>     1M          155.364 us       135.223 us
>>     2M          303.906 us         5.385 us
>>    12M         1786.557 us        21.250 us
>>    16M         2391.890 us        27.437 us
>>    24M         3570.895 us        39.937 us
>>    32M         4755.234 us        51.797 us
>>
>> This is further reduced once the map/unmap_pages() support gets in,
>> which will result in just 1 tlb_flush_all() as opposed to 16
>> tlb_flush_all() calls.
>>
>> Signed-off-by: Sai Prakash Ranjan 
>> ---
>>   drivers/iommu/io-pgtable-arm.c | 7 +--
>>   1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
>> index 87def58e79b5..c3cb9add3179 100644
>> --- a/drivers/iommu/io-pgtable-arm.c
>> +++ b/drivers/iommu/io-pgtable-arm.c
>> @@ -589,8 +589,11 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
>>
>>  	if (!iopte_leaf(pte, lvl, iop->fmt)) {
>>  		/* Also flush any partial walks */
>> -		io_pgtable_tlb_flush_walk(iop, iova, size,
>> -					  ARM_LPAE_GRANULE(data));
>> +		if (size > ARM_LPAE_GRANULE(data))
>> +			io_pgtable_tlb_flush_all(iop);
>> +		else
>
> Erm, when will the above condition ever not be true? ;)

Ah right, silly me :)

> Taking a step back, though, what about the impact to drivers other
> than SMMUv2?

Other drivers would be msm_iommu.c and qcom_iommu.c, which do the same
thing as arm-smmu-v2 (page based invalidations); then there is
ipmmu-vmsa.c, which does tlb_flush_all() for the flush walk.

> In particular I'm thinking of SMMUv3.2 where the whole range can be
> invalidated by VA in a single command anyway, so the additional
> penalties of TLBIALL are undesirable.

Right, so I am thinking we can have a new generic quirk
IO_PGTABLE_QUIRK_RANGE_INV to choose between range based invalidations
(tlb_flush_walk) and tlb_flush_all(). In the case of arm-smmu-v3.2, we
can tie ARM_SMMU_FEAT_RANGE_INV to this quirk and have something like
below, thoughts?

if (iop->cfg.quirks & IO_PGTABLE_QUIRK_RANGE_INV)
	io_pgtable_tlb_flush_walk(iop, iova, size,
				  ARM_LPAE_GRANULE(data));
else
	io_pgtable_tlb_flush_all(iop);

Thanks,
Sai

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


Re: Plan for /dev/ioasid RFC v2

2021-06-09 Thread Lu Baolu

On 6/9/21 8:39 PM, Jason Gunthorpe wrote:
> On Wed, Jun 09, 2021 at 02:24:03PM +0200, Joerg Roedel wrote:
>> On Mon, Jun 07, 2021 at 02:58:18AM +, Tian, Kevin wrote:
>>> -   Device-centric (Jason) vs. group-centric (David) uAPI. David is not
>>>     fully convinced yet. Based on discussion v2 will continue to have
>>>     ioasid uAPI being device-centric (but it's fine for vfio to be
>>>     group-centric). A new section will be added to elaborate this part;
>>
>> I would vote for group-centric here. Or do the reasons for which VFIO
>> is group-centric not apply to IOASID? If so, why?
>
> VFIO being group centric has made it very ugly/difficult to inject
> device driver specific knowledge into the scheme.
>
> The device driver is the only thing that knows to ask:
>  - I need a SW table for this ioasid because I am like a mdev
>  - I will issue TLPs with PASID
>  - I need an IOASID linked to a PASID
>  - I am a device that uses ENQCMD and vPASID
>  - etc in future
>
> The current approach has the group try to guess the device driver
> intention in the vfio type 1 code.
>
> I want to see this be clean and have the device driver directly tell
> the iommu layer what kind of DMA it plans to do, and thus how it needs
> the IOMMU and IOASID configured.
>
> This is the source of the ugly symbol_get and the very, very hacky 'if
> you are a mdev *and* an iommu then you must want a single PASID' stuff
> in type1.
>
> The group is causing all this mess because the group knows nothing
> about what the device drivers contained in the group actually want.
>
> Further, being group centric eliminates the possibility of working in
> cases like !ACS. How do I use the PASID functionality of a device
> behind a !ACS switch if the uAPI forces all IOASIDs to be linked to a
> group, not a device?
>
> Device centric with a report that "all devices in the group must use
> the same IOASID" covers all the new functionality, keeps the old, and
> has a better chance to keep going as a uAPI into the future.

The iommu_group can guarantee the isolation among different physical
devices (represented by RIDs). But when it comes to sub-devices (e.g.
mdev or vDPA devices represented by RID + SSID), we have to rely on the
device driver for isolation. Devices which are able to generate
sub-devices should either use their own on-device mechanisms or use
platform features like Intel Scalable IOV to isolate the sub-devices.

Under the above conditions, different sub-devices of the same RID
device could use different IOASIDs. This seems to mean that we can't
support a mixed mode where, for example, two RIDs share an iommu_group
and one (or both) of them has sub-devices.

AIUI, when we attach a "RID + SSID" to an IOASID, we should require that
the RID doesn't share the iommu_group with any other RID.
Best regards,
baolu