[PATCH v2 2/2] iommu: add Unisoc iommu basic driver

2021-02-01 Thread Chunyan Zhang
From: Chunyan Zhang 

This iommu module can be used by Unisoc's multimedia devices, such as
display, Image codec(jpeg) and a few signal processors, including
VSP(video), GSP(graphic), ISP(image), and CPP(camera pixel processor), etc.

Signed-off-by: Chunyan Zhang 
---
 drivers/iommu/Kconfig  |  12 +
 drivers/iommu/Makefile |   1 +
 drivers/iommu/sprd-iommu.c | 581 +
 3 files changed, 594 insertions(+)
 create mode 100644 drivers/iommu/sprd-iommu.c

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 192ef8f61310..79af62c519ae 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -408,4 +408,16 @@ config VIRTIO_IOMMU
 
  Say Y here if you intend to run this kernel as a guest.
 
+config SPRD_IOMMU
+   tristate "Unisoc IOMMU Support"
+   depends on ARCH_SPRD || COMPILE_TEST
+   select IOMMU_API
+   help
+ Support for IOMMU on Unisoc's SoCs on which multi-media subsystems
+ need IOMMU, such as DPU, Image codec(jpeg) processor, and a few
+ signal processors, including VSP(video), GSP(graphic), ISP(image), and
+ CPP, etc.
+
+ Say Y here if you want multi-media functions.
+
 endif # IOMMU_SUPPORT
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 61bd30cd8369..5925b6af2123 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -28,3 +28,4 @@ obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
 obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
 obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
 obj-$(CONFIG_IOMMU_SVA_LIB) += iommu-sva-lib.o
+obj-$(CONFIG_SPRD_IOMMU) += sprd-iommu.o
diff --git a/drivers/iommu/sprd-iommu.c b/drivers/iommu/sprd-iommu.c
new file mode 100644
index ..0e638d56b309
--- /dev/null
+++ b/drivers/iommu/sprd-iommu.c
@@ -0,0 +1,581 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Unisoc IOMMU driver
+ *
+ * Copyright (C) 2020 Unisoc, Inc.
+ * Author: Chunyan Zhang 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* SPRD IOMMU page is 4K size alignment */
+#define SPRD_IOMMU_PAGE_SHIFT  12
+#define SPRD_IOMMU_PAGE_SIZE   SZ_4K
+
+#define SPRD_EX_CFG0x0
+#define SPRD_IOMMU_VAOR_BYPASS BIT(4)
+#define SPRD_IOMMU_GATE_EN BIT(1)
+#define SPRD_IOMMU_EN  BIT(0)
+#define SPRD_EX_UPDATE 0x4
+#define SPRD_EX_FIRST_VPN  0x8
+#define SPRD_EX_VPN_RANGE  0xc
+#define SPRD_EX_FIRST_PPN  0x10
+#define SPRD_EX_DEFAULT_PPN0x14
+
+#define SPRD_IOMMU_VERSION 0x0
+#define SPRD_VERSION_MASK  GENMASK(15, 8)
+#define SPRD_VERSION_SHIFT 0x8
+#define SPRD_VAU_CFG   0x4
+#define SPRD_VAU_UPDATE0x8
+#define SPRD_VAU_AUTH_CFG  0xc
+#define SPRD_VAU_FIRST_PPN 0x10
+#define SPRD_VAU_DEFAULT_PPN_RD0x14
+#define SPRD_VAU_DEFAULT_PPN_WR0x18
+#define SPRD_VAU_FIRST_VPN 0x1c
+#define SPRD_VAU_VPN_RANGE 0x20
+
+enum sprd_iommu_version {
+   SPRD_IOMMU_EX,
+   SPRD_IOMMU_VAU,
+};
+
+/*
+ * struct sprd_iommu_device - high-level sprd iommu device representation,
+ * including hardware information and configuration, also driver data, etc
+ *
+ * @mdata: hardware configuration and information
+ * @ver:   sprd iommu device version
+ * @prot_page: protect page base address, data would be written to here
+ * while translation fault
+ * @base:  mapped base address for accessing registers
+ * @dev:   pointer to basic device structure
+ * @iommu: IOMMU core representation
+ * @group: IOMMU group
+ */
+struct sprd_iommu_device {
+   enum sprd_iommu_version ver;
+   u32 *prot_page_va;
+   dma_addr_t  prot_page_pa;
+   struct regmap   *base;
+   unsigned intreg_offset;
+   struct device   *dev;
+   struct iommu_device iommu;
+   struct iommu_group  *group;
+   struct clk  *eb;
+};
+
+struct sprd_iommu_domain {
+   spinlock_t  pgtlock; /* lock for page table */
+   struct iommu_domain domain;
+   u32 *pgt_va; /* page table virtual address base */
+   dma_addr_t  pgt_pa; /* page table physical address base */
+   struct sprd_iommu_device*sdev;
+};
+
+static const struct iommu_ops sprd_iommu_ops;
+
+static struct sprd_iommu_domain *to_sprd_domain(struct iommu_domain *dom)
+{
+   return container_of(dom, struct sprd_iommu_domain, domain);
+}
+
+static inline void
+sprd_iommu_writel(struct sprd_iommu_device *sdev, unsigned int reg, u32 val)
+{
+   regmap_write(sdev->base, sdev->reg_offset + reg, val);
+}
+
+static inline u32
+sprd_iommu_readl(struct sprd_iommu_device *sdev, unsigned int reg)
+{
+   u32 val;
+
+   regmap_read(sdev->base, sdev->reg_offset + reg, );
+   return val;
+}
+
+static inline void
+sprd_iommu_update_bits(struct 

[PATCH v2 1/2] dt-bindings: iommu: add bindings for sprd iommu

2021-02-01 Thread Chunyan Zhang
From: Chunyan Zhang 

This iommu module can be used by Unisoc's multimedia devices, such as
display, Image codec(jpeg) and a few signal processors, including
VSP(video), GSP(graphic), ISP(image), and CPP(camera pixel processor), etc.

Signed-off-by: Chunyan Zhang 
---
 .../devicetree/bindings/iommu/sprd,iommu.yaml | 72 +++
 1 file changed, 72 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/iommu/sprd,iommu.yaml

diff --git a/Documentation/devicetree/bindings/iommu/sprd,iommu.yaml 
b/Documentation/devicetree/bindings/iommu/sprd,iommu.yaml
new file mode 100644
index ..4fc99e81fa66
--- /dev/null
+++ b/Documentation/devicetree/bindings/iommu/sprd,iommu.yaml
@@ -0,0 +1,72 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+# Copyright 2020 Unisoc Inc.
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/iommu/sprd,iommu.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Unisoc IOMMU and Multi-media MMU
+
+maintainers:
+  - Chunyan Zhang 
+
+properties:
+  compatible:
+enum:
+  - sprd,iommu-v1
+
+  "#iommu-cells":
+const: 0
+description:
+  Unisoc IOMMUs are all single-master IOMMU devices, therefore no
+  additional information needs to associate with its master device.
+  Please refer to the generic bindings document for more details,
+  Documentation/devicetree/bindings/iommu/iommu.txt
+
+  reg:
+maxItems: 1
+description:
+  Not required if 'sprd,iommu-regs' is defined.
+
+  clocks:
+description:
+  Reference to a gate clock phandle, since access to some of IOMMUs are
+  controlled by gate clock, but this is not required.
+
+  sprd,iommu-regs:
+$ref: /schemas/types.yaml#/definitions/phandle-array
+description:
+  Reference to a syscon phandle plus 1 cell, the syscon defines the
+  register range used by the iommu and the media device, the cell
+  defines the offset for iommu registers. Since iommu module shares
+  the same register range with the media device which uses it.
+
+required:
+  - compatible
+  - "#iommu-cells"
+
+oneOf:
+  - required:
+  - reg
+  - required:
+  - sprd,iommu-regs
+
+additionalProperties: false
+
+examples:
+  - |
+iommu_disp: iommu-disp {
+  compatible = "sprd,iommu-v1";
+  sprd,iommu-regs = <_regs 0x800>;
+  #iommu-cells = <0>;
+};
+
+  - |
+iommu_jpg: iommu-jpg {
+  compatible = "sprd,iommu-v1";
+  sprd,iommu-regs = <_regs 0x300>;
+  #iommu-cells = <0>;
+  clocks = <_gate 1>;
+};
+
+...
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 0/2] Add Unisoc iommu basic driver

2021-02-01 Thread Chunyan Zhang
From: Chunyan Zhang 

Changes since v1:
* Fixed compile errors reported by kernel test robot .
* Changed to use syscon to get mapped registers for iommu and media devices to 
avoid double map issue.
* Addressed Robin's comments:
- Added including offset in the returned physical address if the input virtual 
address isn't page-aligned;
- Added platform_device_put() after calling of_find_device_by_node();
- Removed iommu register offset from driver, it will be defined as the cell of 
DT reference to syscon phandle;
- Removed multi compatible strings which are not needed;
- Added comments for the function sprd_iommu_clk_enable();
- Added clocks property in bindings;
- Set device_driver.suppress_bind_attrs to disable unbind the devices via sysfs;
- A few trivial fixes.

Changes since RFC v2:
* Addressed Robin's comments:
- Add COMPILE_TEST support;
- Use DMA allocator for PTE;
- Revised to avoid resource leak issue;
- Added ->iotlb_sync implemented;
- Moved iommu group allocation to probe;
- Changed some function names to make them sprd specific;
* Added support for more iommu instance;

Changes since RFC v1:
* Rebased on v5.11-rc1;
* Changed sprd-iommu to tristate;
* Removed check for args_count of iommu OF node, since there's no args
  for sprd-iommu device node;
* Added another IP version (i.e. vau);
* Removed unnecessary configs selection from CONFIG_SPRD_IOMMU;
* Changed to get zeroed pages.

Chunyan Zhang (2):
  dt-bindings: iommu: add bindings for sprd iommu
  iommu: add Unisoc iommu basic driver

 .../devicetree/bindings/iommu/sprd,iommu.yaml |  72 +++
 drivers/iommu/Kconfig |  12 +
 drivers/iommu/Makefile|   1 +
 drivers/iommu/sprd-iommu.c| 581 ++
 4 files changed, 666 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/iommu/sprd,iommu.yaml
 create mode 100644 drivers/iommu/sprd-iommu.c

-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v13 03/15] iommu/arm-smmu-v3: Maintain a SID->device structure

2021-02-01 Thread Keqian Zhu
Hi Eric,

On 2021/2/2 1:19, Auger Eric wrote:
> Hi Keqian,
> 
> On 2/1/21 1:26 PM, Keqian Zhu wrote:
>> Hi Eric,
>>
>> On 2020/11/18 19:21, Eric Auger wrote:
>>> From: Jean-Philippe Brucker 
>>>
>>> When handling faults from the event or PRI queue, we need to find the
>>> struct device associated to a SID. Add a rb_tree to keep track of SIDs.
>>>
>>> Signed-off-by: Jean-Philippe Brucker 
>> [...]
>>
>>>  }
>>>  
>>> +static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
>>> + struct arm_smmu_master *master)
[...]

>>> kfree(master);
>>
>> Thanks,
>> Keqian
>>
> Thank you for the review. Jean will address this issues in his own
> series and on my end I will rebase on this latter.
> 
> Best Regards
> 
> Eric
>

Yeah, and hope this series can be accepted earlier ;-)

Thanks,
Keqian
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v13 05/15] iommu/smmuv3: Get prepared for nested stage support

2021-02-01 Thread Keqian Zhu
Hi Eric,

On 2020/11/18 19:21, Eric Auger wrote:
> When nested stage translation is setup, both s1_cfg and
> s2_cfg are set.
> 
> We introduce a new smmu domain abort field that will be set
> upon guest stage1 configuration passing.
> 
> arm_smmu_write_strtab_ent() is modified to write both stage
> fields in the STE and deal with the abort field.
> 
> In nested mode, only stage 2 is "finalized" as the host does
> not own/configure the stage 1 context descriptor; guest does.
> 
> Signed-off-by: Eric Auger 
> 
> ---
> v10 -> v11:
> - Fix an issue reported by Shameer when switching from with vSMMU
>   to without vSMMU. Despite the spec does not seem to mention it
>   seems to be needed to reset the 2 high 64b when switching from
>   S1+S2 cfg to S1 only. Especially dst[3] needs to be reset (S2TTB).
>   On some implementations, if the S2TTB is not reset, this causes
>   a C_BAD_STE error
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 64 +
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  2 +
>  2 files changed, 56 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 18ac5af1b284..412ea1bafa50 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -1181,8 +1181,10 @@ static void arm_smmu_write_strtab_ent(struct 
> arm_smmu_master *master, u32 sid,
>* three cases at the moment:
>*
>* 1. Invalid (all zero) -> bypass/fault (init)
> -  * 2. Bypass/fault -> translation/bypass (attach)
> -  * 3. Translation/bypass -> bypass/fault (detach)
> +  * 2. Bypass/fault -> single stage translation/bypass (attach)
> +  * 3. Single or nested stage Translation/bypass -> bypass/fault (detach)
> +  * 4. S2 -> S1 + S2 (attach_pasid_table)
> +  * 5. S1 + S2 -> S2 (detach_pasid_table)

The following line "BUG_ON(ste_live && !nested);" forbids this transform.
And I have a look at the 6th patch, the transform seems S1 + S2 -> abort.
So after detach, the status is not the same as that before attach. Does it
match our expectation?

>*
>* Given that we can't update the STE atomically and the SMMU
>* doesn't read the thing in a defined order, that leaves us
> @@ -1193,7 +1195,8 @@ static void arm_smmu_write_strtab_ent(struct 
> arm_smmu_master *master, u32 sid,
>* 3. Update Config, sync
>*/
>   u64 val = le64_to_cpu(dst[0]);
> - bool ste_live = false;
> + bool s1_live = false, s2_live = false, ste_live;
> + bool abort, nested = false, translate = false;
>   struct arm_smmu_device *smmu = NULL;
>   struct arm_smmu_s1_cfg *s1_cfg;
>   struct arm_smmu_s2_cfg *s2_cfg;
> @@ -1233,6 +1236,8 @@ static void arm_smmu_write_strtab_ent(struct 
> arm_smmu_master *master, u32 sid,
>   default:
>   break;
>   }
> + nested = s1_cfg->set && s2_cfg->set;
> + translate = s1_cfg->set || s2_cfg->set;
>   }
>  
>   if (val & STRTAB_STE_0_V) {
> @@ -1240,23 +1245,36 @@ static void arm_smmu_write_strtab_ent(struct 
> arm_smmu_master *master, u32 sid,
>   case STRTAB_STE_0_CFG_BYPASS:
>   break;
>   case STRTAB_STE_0_CFG_S1_TRANS:
> + s1_live = true;
> + break;
>   case STRTAB_STE_0_CFG_S2_TRANS:
> - ste_live = true;
> + s2_live = true;
> + break;
> + case STRTAB_STE_0_CFG_NESTED:
> + s1_live = true;
> + s2_live = true;
>   break;
>   case STRTAB_STE_0_CFG_ABORT:
> - BUG_ON(!disable_bypass);
>   break;
>   default:
>   BUG(); /* STE corruption */
>   }
>   }
>  
> + ste_live = s1_live || s2_live;
> +
>   /* Nuke the existing STE_0 value, as we're going to rewrite it */
>   val = STRTAB_STE_0_V;
>  
>   /* Bypass/fault */
> - if (!smmu_domain || !(s1_cfg->set || s2_cfg->set)) {
> - if (!smmu_domain && disable_bypass)
> +
> + if (!smmu_domain)
> + abort = disable_bypass;
> + else
> + abort = smmu_domain->abort;
> +
> + if (abort || !translate) {
> + if (abort)
>   val |= FIELD_PREP(STRTAB_STE_0_CFG, 
> STRTAB_STE_0_CFG_ABORT);
>   else
>   val |= FIELD_PREP(STRTAB_STE_0_CFG, 
> STRTAB_STE_0_CFG_BYPASS);
> @@ -1274,8 +1292,16 @@ static void arm_smmu_write_strtab_ent(struct 
> arm_smmu_master *master, u32 sid,
>   return;
>   }
>  
> + BUG_ON(ste_live && !nested);
> +
> + if (ste_live) {
> + /* First invalidate the live STE */
> + dst[0] = cpu_to_le64(STRTAB_STE_0_CFG_ABORT);
> + 

Re: [PATCH v13 03/15] iommu/arm-smmu-v3: Maintain a SID->device structure

2021-02-01 Thread Keqian Zhu
Hi Jean,

On 2021/2/1 23:15, Jean-Philippe Brucker wrote:
> On Mon, Feb 01, 2021 at 08:26:41PM +0800, Keqian Zhu wrote:
>>> +static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
>>> + struct arm_smmu_master *master)
>>> +{
>>> +   int i;
>>> +   int ret = 0;
>>> +   struct arm_smmu_stream *new_stream, *cur_stream;
>>> +   struct rb_node **new_node, *parent_node = NULL;
>>> +   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev);
>>> +
>>> +   master->streams = kcalloc(fwspec->num_ids,
>>> + sizeof(struct arm_smmu_stream), GFP_KERNEL);
>>> +   if (!master->streams)
>>> +   return -ENOMEM;
>>> +   master->num_streams = fwspec->num_ids;
>> This is not roll-backed when fail.
> 
> No need, the caller frees master
OK.

> 
>>> +
>>> +   mutex_lock(>streams_mutex);
>>> +   for (i = 0; i < fwspec->num_ids && !ret; i++) {
>> Check ret at here, makes it hard to decide the start index of rollback.
>>
>> If we fail at here, then start index is (i-2).
>> If we fail in the loop, then start index is (i-1).
>>
> [...]
>>> +   if (ret) {
>>> +   for (; i > 0; i--)
>> should be (i >= 0)?
>> And the start index seems not correct.
> 
> Indeed, this whole bit is wrong. I'll fix it while resending the IOPF
> series.
> 
> Thanks,
> Jean
OK, I am glad it helps.

Thanks,
Keqian
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/3] iommu/io-pgtable-arm: Add IOMMU_LLC page protection flag

2021-02-01 Thread Sai Prakash Ranjan

On 2021-02-01 23:50, Jordan Crouse wrote:

On Mon, Feb 01, 2021 at 08:20:44AM -0800, Rob Clark wrote:

On Mon, Feb 1, 2021 at 3:16 AM Will Deacon  wrote:
>
> On Fri, Jan 29, 2021 at 03:12:59PM +0530, Sai Prakash Ranjan wrote:
> > On 2021-01-29 14:35, Will Deacon wrote:
> > > On Mon, Jan 11, 2021 at 07:45:04PM +0530, Sai Prakash Ranjan wrote:
> > > > Add a new page protection flag IOMMU_LLC which can be used
> > > > by non-coherent masters to set cacheable memory attributes
> > > > for an outer level of cache called as last-level cache or
> > > > system cache. Initial user of this page protection flag is
> > > > the adreno gpu and then can later be used by other clients
> > > > such as video where this can be used for per-buffer based
> > > > mapping.
> > > >
> > > > Signed-off-by: Sai Prakash Ranjan 
> > > > ---
> > > >  drivers/iommu/io-pgtable-arm.c | 3 +++
> > > >  include/linux/iommu.h  | 6 ++
> > > >  2 files changed, 9 insertions(+)
> > > >
> > > > diff --git a/drivers/iommu/io-pgtable-arm.c
> > > > b/drivers/iommu/io-pgtable-arm.c
> > > > index 7439ee7fdcdb..ebe653ef601b 100644
> > > > --- a/drivers/iommu/io-pgtable-arm.c
> > > > +++ b/drivers/iommu/io-pgtable-arm.c
> > > > @@ -415,6 +415,9 @@ static arm_lpae_iopte
> > > > arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
> > > >   else if (prot & IOMMU_CACHE)
> > > >   pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE
> > > >   << ARM_LPAE_PTE_ATTRINDX_SHIFT);
> > > > + else if (prot & IOMMU_LLC)
> > > > + pte |= (ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE
> > > > + << ARM_LPAE_PTE_ATTRINDX_SHIFT);
> > > >   }
> > > >
> > > >   if (prot & IOMMU_CACHE)
> > > > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > > > index ffaa389ea128..1f82057df531 100644
> > > > --- a/include/linux/iommu.h
> > > > +++ b/include/linux/iommu.h
> > > > @@ -31,6 +31,12 @@
> > > >   * if the IOMMU page table format is equivalent.
> > > >   */
> > > >  #define IOMMU_PRIV   (1 << 5)
> > > > +/*
> > > > + * Non-coherent masters can use this page protection flag to set
> > > > cacheable
> > > > + * memory attributes for only a transparent outer level of cache,
> > > > also known as
> > > > + * the last-level or system cache.
> > > > + */
> > > > +#define IOMMU_LLC(1 << 6)
> > >
> > > On reflection, I'm a bit worried about exposing this because I think it
> > > will
> > > introduce a mismatched virtual alias with the CPU (we don't even have a
> > > MAIR
> > > set up for this memory type). Now, we also have that issue for the PTW,
> > > but
> > > since we always use cache maintenance (i.e. the streaming API) for
> > > publishing the page-tables to a non-coheren walker, it works out.
> > > However,
> > > if somebody expects IOMMU_LLC to be coherent with a DMA API coherent
> > > allocation, then they're potentially in for a nasty surprise due to the
> > > mismatched outer-cacheability attributes.
> > >
> >
> > Can't we add the syscached memory type similar to what is done on android?
>
> Maybe. How does the GPU driver map these things on the CPU side?

Currently we use writecombine mappings for everything, although there
are some cases that we'd like to use cached (but have not merged
patches that would give userspace a way to flush/invalidate)

BR,
-R


LLC/system cache doesn't have a relationship with the CPU cache.  Its 
just a
little accelerator that sits on the connection from the GPU to DDR and 
caches
accesses. The hint that Sai is suggesting is used to mark the buffers 
as
'no-write-allocate' to prevent GPU write operations from being cached 
in the LLC
which a) isn't interesting and b) takes up cache space for read 
operations.


Its easiest to think of the LLC as a bonus accelerator that has no cost 
for

us to use outside of the unfortunate per buffer hint.

We do have to worry about the CPU cache w.r.t I/O coherency (which is a
different hint) and in that case we have all of concerns that Will 
identified.




For mismatched outer cacheability attributes which Will mentioned, I was
referring to [1] in android kernel.

[1] https://android-review.googlesource.com/c/kernel/common/+/1549097/3

Thanks,
Sai

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member

of Code Aurora Forum, hosted by The Linux Foundation
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/3] iommu/io-pgtable-arm: Add IOMMU_LLC page protection flag

2021-02-01 Thread Sai Prakash Ranjan

On 2021-02-01 23:50, Jordan Crouse wrote:

On Mon, Feb 01, 2021 at 08:20:44AM -0800, Rob Clark wrote:

On Mon, Feb 1, 2021 at 3:16 AM Will Deacon  wrote:
>
> On Fri, Jan 29, 2021 at 03:12:59PM +0530, Sai Prakash Ranjan wrote:
> > On 2021-01-29 14:35, Will Deacon wrote:
> > > On Mon, Jan 11, 2021 at 07:45:04PM +0530, Sai Prakash Ranjan wrote:
> > > > Add a new page protection flag IOMMU_LLC which can be used
> > > > by non-coherent masters to set cacheable memory attributes
> > > > for an outer level of cache called as last-level cache or
> > > > system cache. Initial user of this page protection flag is
> > > > the adreno gpu and then can later be used by other clients
> > > > such as video where this can be used for per-buffer based
> > > > mapping.
> > > >
> > > > Signed-off-by: Sai Prakash Ranjan 
> > > > ---
> > > >  drivers/iommu/io-pgtable-arm.c | 3 +++
> > > >  include/linux/iommu.h  | 6 ++
> > > >  2 files changed, 9 insertions(+)
> > > >
> > > > diff --git a/drivers/iommu/io-pgtable-arm.c
> > > > b/drivers/iommu/io-pgtable-arm.c
> > > > index 7439ee7fdcdb..ebe653ef601b 100644
> > > > --- a/drivers/iommu/io-pgtable-arm.c
> > > > +++ b/drivers/iommu/io-pgtable-arm.c
> > > > @@ -415,6 +415,9 @@ static arm_lpae_iopte
> > > > arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
> > > >   else if (prot & IOMMU_CACHE)
> > > >   pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE
> > > >   << ARM_LPAE_PTE_ATTRINDX_SHIFT);
> > > > + else if (prot & IOMMU_LLC)
> > > > + pte |= (ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE
> > > > + << ARM_LPAE_PTE_ATTRINDX_SHIFT);
> > > >   }
> > > >
> > > >   if (prot & IOMMU_CACHE)
> > > > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > > > index ffaa389ea128..1f82057df531 100644
> > > > --- a/include/linux/iommu.h
> > > > +++ b/include/linux/iommu.h
> > > > @@ -31,6 +31,12 @@
> > > >   * if the IOMMU page table format is equivalent.
> > > >   */
> > > >  #define IOMMU_PRIV   (1 << 5)
> > > > +/*
> > > > + * Non-coherent masters can use this page protection flag to set
> > > > cacheable
> > > > + * memory attributes for only a transparent outer level of cache,
> > > > also known as
> > > > + * the last-level or system cache.
> > > > + */
> > > > +#define IOMMU_LLC(1 << 6)
> > >
> > > On reflection, I'm a bit worried about exposing this because I think it
> > > will
> > > introduce a mismatched virtual alias with the CPU (we don't even have a
> > > MAIR
> > > set up for this memory type). Now, we also have that issue for the PTW,
> > > but
> > > since we always use cache maintenance (i.e. the streaming API) for
> > > publishing the page-tables to a non-coheren walker, it works out.
> > > However,
> > > if somebody expects IOMMU_LLC to be coherent with a DMA API coherent
> > > allocation, then they're potentially in for a nasty surprise due to the
> > > mismatched outer-cacheability attributes.
> > >
> >
> > Can't we add the syscached memory type similar to what is done on android?
>
> Maybe. How does the GPU driver map these things on the CPU side?

Currently we use writecombine mappings for everything, although there
are some cases that we'd like to use cached (but have not merged
patches that would give userspace a way to flush/invalidate)

BR,
-R


LLC/system cache doesn't have a relationship with the CPU cache.  Its 
just a
little accelerator that sits on the connection from the GPU to DDR and 
caches
accesses. The hint that Sai is suggesting is used to mark the buffers 
as
'no-write-allocate' to prevent GPU write operations from being cached 
in the LLC
which a) isn't interesting and b) takes up cache space for read 
operations.


Its easiest to think of the LLC as a bonus accelerator that has no cost 
for

us to use outside of the unfortunate per buffer hint.

We do have to worry about the CPU cache w.r.t I/O coherency (which is a
different hint) and in that case we have all of concerns that Will 
identified.




For mismatched outer cacheability attributes which Will mentioned, I was
referring to [1] in android kernel.

[1] https://android-review.googlesource.com/c/kernel/common/+/1549097/3

Thanks,
Sai

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member

of Code Aurora Forum, hosted by The Linux Foundation
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v12 06/10] iommu: Add a page fault handler

2021-02-01 Thread Shenming Lu
Hi Jean,

It seems that the preprocessing of the page faults(groups) here is relatively
generic, and if a device driver wants to reuse it while having its own 
iopf_handle_single(),
is there any chance for this? :-)

Thanks,
Shenming

On 2021/1/27 23:43, Jean-Philippe Brucker wrote:
> Some systems allow devices to handle I/O Page Faults in the core mm. For
> example systems implementing the PCIe PRI extension or Arm SMMU stall
> model. Infrastructure for reporting these recoverable page faults was
> added to the IOMMU core by commit 0c830e6b3282 ("iommu: Introduce device
> fault report API"). Add a page fault handler for host SVA.
> 
> IOMMU driver can now instantiate several fault workqueues and link them
> to IOPF-capable devices. Drivers can choose between a single global
> workqueue, one per IOMMU device, one per low-level fault queue, one per
> domain, etc.
> 
> When it receives a fault event, most commonly in an IRQ handler, the
> IOMMU driver reports the fault using iommu_report_device_fault(), which
> calls the registered handler. The page fault handler then calls the mm
> fault handler, and reports either success or failure with
> iommu_page_response(). After the handler succeeds, the hardware retries
> the access.
> 
> The iopf_param pointer could be embedded into iommu_fault_param. But
> putting iopf_param into the iommu_param structure allows us not to care
> about ordering between calls to iopf_queue_add_device() and
> iommu_register_device_fault_handler().
> 
> Reviewed-by: Jonathan Cameron 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/iommu/Makefile|   1 +
>  drivers/iommu/iommu-sva-lib.h |  53 
>  include/linux/iommu.h |   2 +
>  drivers/iommu/io-pgfault.c| 461 ++
>  4 files changed, 517 insertions(+)
>  create mode 100644 drivers/iommu/io-pgfault.c
> 
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 61bd30cd8369..60fafc23dee6 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -28,3 +28,4 @@ obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
>  obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
>  obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
>  obj-$(CONFIG_IOMMU_SVA_LIB) += iommu-sva-lib.o
> +obj-$(CONFIG_IOMMU_SVA_LIB) += io-pgfault.o
> diff --git a/drivers/iommu/iommu-sva-lib.h b/drivers/iommu/iommu-sva-lib.h
> index b40990aef3fd..031155010ca8 100644
> --- a/drivers/iommu/iommu-sva-lib.h
> +++ b/drivers/iommu/iommu-sva-lib.h
> @@ -12,4 +12,57 @@ int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t 
> min, ioasid_t max);
>  void iommu_sva_free_pasid(struct mm_struct *mm);
>  struct mm_struct *iommu_sva_find(ioasid_t pasid);
>  
> +/* I/O Page fault */
> +struct device;
> +struct iommu_fault;
> +struct iopf_queue;
> +
> +#ifdef CONFIG_IOMMU_SVA_LIB
> +int iommu_queue_iopf(struct iommu_fault *fault, void *cookie);
> +
> +int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev);
> +int iopf_queue_remove_device(struct iopf_queue *queue,
> +  struct device *dev);
> +int iopf_queue_flush_dev(struct device *dev);
> +struct iopf_queue *iopf_queue_alloc(const char *name);
> +void iopf_queue_free(struct iopf_queue *queue);
> +int iopf_queue_discard_partial(struct iopf_queue *queue);
> +
> +#else /* CONFIG_IOMMU_SVA_LIB */
> +static inline int iommu_queue_iopf(struct iommu_fault *fault, void *cookie)
> +{
> + return -ENODEV;
> +}
> +
> +static inline int iopf_queue_add_device(struct iopf_queue *queue,
> + struct device *dev)
> +{
> + return -ENODEV;
> +}
> +
> +static inline int iopf_queue_remove_device(struct iopf_queue *queue,
> +struct device *dev)
> +{
> + return -ENODEV;
> +}
> +
> +static inline int iopf_queue_flush_dev(struct device *dev)
> +{
> + return -ENODEV;
> +}
> +
> +static inline struct iopf_queue *iopf_queue_alloc(const char *name)
> +{
> + return NULL;
> +}
> +
> +static inline void iopf_queue_free(struct iopf_queue *queue)
> +{
> +}
> +
> +static inline int iopf_queue_discard_partial(struct iopf_queue *queue)
> +{
> + return -ENODEV;
> +}
> +#endif /* CONFIG_IOMMU_SVA_LIB */
>  #endif /* _IOMMU_SVA_LIB_H */
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 00348e4c3c26..edc9be443a74 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -366,6 +366,7 @@ struct iommu_fault_param {
>   * struct dev_iommu - Collection of per-device IOMMU data
>   *
>   * @fault_param: IOMMU detected device fault reporting data
> + * @iopf_param:   I/O Page Fault queue and data
>   * @fwspec:   IOMMU fwspec data
>   * @iommu_dev:IOMMU device this device is linked to
>   * @priv: IOMMU Driver private data
> @@ -376,6 +377,7 @@ struct iommu_fault_param {
>  struct dev_iommu {
>   struct mutex lock;
>   struct iommu_fault_param*fault_param;
> + struct iopf_device_param*iopf_param;
>   

Re: AMD-Vi: Unable to read/write to IOMMU perf counter

2021-02-01 Thread Suravee Suthikulpanit

Could you please try the attached patch to see if the problem still persist.

Thanks,
Suravee

On 1/25/21 4:24 PM, Tj (Elloe Linux) wrote:

Lenovo E495 reports:

pci :00:00.2: AMD-Vi: Unable to read/write to IOMMU perf counter.
pci :00:00.2: can't derive routing for PCI INT A
pci :00:00.2: PCI INT A: not connected

I found an existing identical bug report that doesn't seem to have
gained any attention:

https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D201753data=04%7C01%7Csuravee.suthikulpanit%40amd.com%7C7c56640fcf24465050f008d8c145eba4%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637471853347946970%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=uykr%2FZMpr%2BuLrw3k1bKVcwywfJB4CU0p2qJSZXgLNK8%3Dreserved=0

Linux version 5.11.0-rc4+ (tj@elloe000) (gcc (Ubuntu
9.3.0-17ubuntu1~20.04) 9.3.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #12
SMP PREEMPT Sun Jan 24 11:28:01 GMT 2021
Command line: BOOT_IMAGE=/vmlinuz-5.11.0-rc4+
root=/dev/mapper/ELLOE000-rootfs ro acpi_osi=! "acpi_osi=Windows 2016"
systemd.unified_cgroup_hierarchy=1 nosplash
...
DMI: LENOVO 20NECTO1WW/20NECTO1WW, BIOS R11ET32W (1.12 ) 12/23/2019
...
AMD-Vi: ivrs, add hid:PNPD0040, uid:, rdevid:152
...
smpboot: CPU0: AMD Ryzen 7 3700U with Radeon Vega Mobile Gfx (family:
0x17, model: 0x18, stepping: 0x1)
...
pci :00:00.2: AMD-Vi: Unable to read/write to IOMMU perf counter.
pci :00:00.2: can't derive routing for PCI INT A
pci :00:00.2: PCI INT A: not connected
pci :00:01.0: Adding to iommu group 0
pci :00:01.1: Adding to iommu group 1
pci :00:01.2: Adding to iommu group 2
pci :00:01.3: Adding to iommu group 3
pci :00:01.6: Adding to iommu group 4
pci :00:08.0: Adding to iommu group 5
pci :00:08.1: Adding to iommu group 6
pci :00:14.0: Adding to iommu group 7
pci :00:14.3: Adding to iommu group 7
pci :00:18.0: Adding to iommu group 8
pci :00:18.1: Adding to iommu group 8
pci :00:18.2: Adding to iommu group 8
pci :00:18.3: Adding to iommu group 8
pci :00:18.4: Adding to iommu group 8
pci :00:18.5: Adding to iommu group 8
pci :00:18.6: Adding to iommu group 8
pci :00:18.7: Adding to iommu group 8
pci :01:00.0: Adding to iommu group 9
pci :02:00.0: Adding to iommu group 10
pci :03:00.0: Adding to iommu group 11
pci :04:00.0: Adding to iommu group 12
pci :05:00.0: Adding to iommu group 13
pci :05:00.1: Adding to iommu group 14
pci :05:00.2: Adding to iommu group 14
pci :05:00.3: Adding to iommu group 14
pci :05:00.4: Adding to iommu group 14
pci :05:00.5: Adding to iommu group 14
pci :05:00.6: Adding to iommu group 14
pci :00:00.2: AMD-Vi: Found IOMMU cap 0x40
pci :00:00.2: AMD-Vi: Extended features (0x4f77ef22294ada):
  PPR NX GT IA GA PC GA_vAPIC
AMD-Vi: Interrupt remapping enabled
AMD-Vi: Virtual APIC enabled
AMD-Vi: Lazy IO/TLB flushing enabled
amd_uncore: 4  amd_df counters detected
___
iommu mailing list
iommu@lists.linux-foundation.org
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.linuxfoundation.org%2Fmailman%2Flistinfo%2Fiommudata=04%7C01%7Csuravee.suthikulpanit%40amd.com%7C7c56640fcf24465050f008d8c145eba4%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637471853347946970%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=5w2IiD7Cjsvk9qyiYC9eLmFaBIJLXdLQx4kg27LWycg%3Dreserved=0

From c103d631285cf376420e7f7869837302f2ac38c0 Mon Sep 17 00:00:00 2001
From: Suravee Suthikulpanit 
Date: Mon, 1 Feb 2021 18:38:26 -0600
Subject: [RFC PATCH] iommu/amd: Fix performance counter initialization

Certain AMD platforms enable power gating feature for IOMMU PMC,
which prevents the IOMMU driver from updating the counter while
trying to validate the PMC functionality in the init_iommu_perf_ctr().
This results in disabling PMC support and the following error message:

"AMD-Vi: Unable to write to IOMMU perf counter"

To workaround this issue, disable power gating temporarily by programming
the counter source to non-zero value while validating the counter,
and restore the prior state afterward.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201753
Signed-off-by: Suravee Suthikulpanit 
---
NOTE:
I have tested this patch only on certain platforms. It might need more testing
coverage on other mobile and desktop platforms.

Thank you,
Suravee

 drivers/iommu/amd/init.c | 33 -
 1 file changed, 24 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 83d8ab2aed9f..edb885625e47 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -254,6 +254,8 @@ static enum iommu_init_state init_state = IOMMU_START_STATE;
 static int amd_iommu_enable_interrupts(void);
 static int __init iommu_go_to_state(enum 

[PATCH 3/3] iommu/vt-d: Apply SATC policy

2021-02-01 Thread Lu Baolu
From: Yian Chen 

Starting from Intel VT-d v3.2, Intel platform BIOS can provide a new SATC
table structure. SATC table lists a set of SoC integrated devices that
require ATC to work (VT-d specification v3.2, section 8.8). Furthermore,
the new version of IOMMU supports SoC device ATS in both its Scalable mode
and legacy mode.

When IOMMU is working in scalable mode, software must enable device ATS
support. On the other hand, when IOMMU is in legacy mode for whatever
reason, the hardware managed ATS will automatically take effect and the
SATC required devices can work transparently to the software. As the
result, software shouldn't enable ATS on that device, otherwise duplicate
device TLB invalidations will occur.

Signed-off-by: Yian Chen 
---
 drivers/iommu/intel/iommu.c | 72 ++---
 1 file changed, 68 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index b20fd305fc60..131fac718a4b 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -872,6 +872,59 @@ static bool iommu_is_dummy(struct intel_iommu *iommu, 
struct device *dev)
return false;
 }
 
+static bool iommu_support_ats(struct intel_iommu *iommu)
+{
+   return ecap_dev_iotlb_support(iommu->ecap);
+}
+
+static bool device_support_ats(struct pci_dev *dev)
+{
+   return pci_ats_supported(dev) && dmar_find_matched_atsr_unit(dev);
+}
+
+static int segment_atc_required(u16 segment)
+{
+   struct acpi_dmar_satc *satc;
+   struct dmar_satc_unit *satcu;
+   int ret = 1;
+
+   rcu_read_lock();
+   list_for_each_entry_rcu(satcu, _satc_units, list) {
+   satc = container_of(satcu->hdr, struct acpi_dmar_satc, header);
+   if (satcu->atc_required &&
+   satcu->devices_cnt &&
+   satc->segment == segment)
+   goto out;
+   }
+   ret = 0;
+out:
+   rcu_read_unlock();
+   return ret;
+}
+
+static int device_atc_required(struct pci_dev *dev)
+{
+   struct dmar_satc_unit *satcu;
+   struct acpi_dmar_satc *satc;
+   struct device *tmp;
+   int i, ret = 1;
+
+   dev = pci_physfn(dev);
+   rcu_read_lock();
+   list_for_each_entry_rcu(satcu, _satc_units, list) {
+   satc = container_of(satcu->hdr, struct acpi_dmar_satc, header);
+   if (satc->segment == pci_domain_nr(dev->bus) && 
satcu->atc_required) {
+   for_each_dev_scope(satcu->devices, satcu->devices_cnt, 
i, tmp)
+   if (to_pci_dev(tmp) == dev)
+   goto out;
+   }
+   }
+   ret = 0;
+out:
+   rcu_read_unlock();
+   return ret;
+}
+
 struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn)
 {
struct dmar_drhd_unit *drhd = NULL;
@@ -2575,10 +2628,16 @@ static struct dmar_domain 
*dmar_insert_one_dev_info(struct intel_iommu *iommu,
if (dev && dev_is_pci(dev)) {
struct pci_dev *pdev = to_pci_dev(info->dev);
 
-   if (ecap_dev_iotlb_support(iommu->ecap) &&
-   pci_ats_supported(pdev) &&
-   dmar_find_matched_atsr_unit(pdev))
-   info->ats_supported = 1;
+   /*
+* Support ATS by default if it's supported by both IOMMU and
+* client sides, except that the device's ATS is required by
+* ACPI/SATC but the IOMMU scalable mode is disabled. In that
+* case the hardware managed ATS will be automatically used.
+*/
+   if (iommu_support_ats(iommu) && device_support_ats(pdev)) {
+   if (!device_atc_required(pdev) || sm_supported(iommu))
+   info->ats_supported = 1;
+   }
 
if (sm_supported(iommu)) {
if (pasid_supported(iommu)) {
@@ -3175,6 +3234,11 @@ static int __init init_dmars(void)
 * endfor
 */
for_each_drhd_unit(drhd) {
+   if (pci_ats_disabled() && segment_atc_required(drhd->segment)) {
+   pr_warn("Scalable mode disabled -- Use hardware managed 
ATS because PCIe ATS is disabled but some devices in PCIe segment 0x%x require 
it.",
+   drhd->segment);
+   intel_iommu_sm = 0;
+   }
/*
 * lock not needed as this is only incremented in the single
 * threaded kernel __init code path all other access are read
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 2/3] iommu/vt-d: Parse SATC reporting structure

2021-02-01 Thread Lu Baolu
From: Yian Chen 

Software should parse every SATC table and all devices in the tables
reported by the BIOS and keep the information in kernel list for further
SATC policy deployment.

Signed-off-by: Yian Chen 
---
 drivers/iommu/intel/dmar.c  |  9 
 drivers/iommu/intel/iommu.c | 89 +
 include/linux/dmar.h|  2 +
 3 files changed, 100 insertions(+)

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 980e8ae090af..6bd357966d4e 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -526,6 +526,7 @@ dmar_table_print_dmar_entry(struct acpi_dmar_header *header)
struct acpi_dmar_reserved_memory *rmrr;
struct acpi_dmar_atsr *atsr;
struct acpi_dmar_rhsa *rhsa;
+   struct acpi_dmar_satc *satc;
 
switch (header->type) {
case ACPI_DMAR_TYPE_HARDWARE_UNIT:
@@ -555,6 +556,11 @@ dmar_table_print_dmar_entry(struct acpi_dmar_header 
*header)
/* We don't print this here because we need to sanity-check
   it first. So print it in dmar_parse_one_andd() instead. */
break;
+   case ACPI_DMAR_TYPE_SATC:
+   satc = container_of(header, struct acpi_dmar_satc, header);
+   pr_info("SATC flags: 0x%x\n", satc->flags);
+   break;
+
}
 }
 
@@ -642,6 +648,7 @@ parse_dmar_table(void)
.cb[ACPI_DMAR_TYPE_ROOT_ATS] = _parse_one_atsr,
.cb[ACPI_DMAR_TYPE_HARDWARE_AFFINITY] = _parse_one_rhsa,
.cb[ACPI_DMAR_TYPE_NAMESPACE] = _parse_one_andd,
+   .cb[ACPI_DMAR_TYPE_SATC] = _parse_one_satc,
};
 
/*
@@ -2077,6 +2084,7 @@ static guid_t dmar_hp_guid =
 #defineDMAR_DSM_FUNC_DRHD  1
 #defineDMAR_DSM_FUNC_ATSR  2
 #defineDMAR_DSM_FUNC_RHSA  3
+#defineDMAR_DSM_FUNC_SATC  4
 
 static inline bool dmar_detect_dsm(acpi_handle handle, int func)
 {
@@ -2094,6 +2102,7 @@ static int dmar_walk_dsm_resource(acpi_handle handle, int 
func,
[DMAR_DSM_FUNC_DRHD] = ACPI_DMAR_TYPE_HARDWARE_UNIT,
[DMAR_DSM_FUNC_ATSR] = ACPI_DMAR_TYPE_ROOT_ATS,
[DMAR_DSM_FUNC_RHSA] = ACPI_DMAR_TYPE_HARDWARE_AFFINITY,
+   [DMAR_DSM_FUNC_SATC] = ACPI_DMAR_TYPE_SATC,
};
 
if (!dmar_detect_dsm(handle, func))
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index ecd36e456de3..b20fd305fc60 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -316,8 +316,18 @@ struct dmar_atsr_unit {
u8 include_all:1;   /* include all ports */
 };
 
+struct dmar_satc_unit {
+   struct list_head list;  /* list of SATC units */
+   struct acpi_dmar_header *hdr;   /* ACPI header */
+   struct dmar_dev_scope *devices; /* target devices */
+   struct intel_iommu *iommu;  /* the corresponding iommu */
+   int devices_cnt;/* target device count */
+   u8 atc_required:1;  /* ATS is required */
+};
+
 static LIST_HEAD(dmar_atsr_units);
 static LIST_HEAD(dmar_rmrr_units);
+static LIST_HEAD(dmar_satc_units);
 
 #define for_each_rmrr_units(rmrr) \
list_for_each_entry(rmrr, _rmrr_units, list)
@@ -3736,6 +3746,57 @@ int dmar_check_one_atsr(struct acpi_dmar_header *hdr, 
void *arg)
return 0;
 }
 
+static struct dmar_satc_unit *dmar_find_satc(struct acpi_dmar_satc *satc)
+{
+   struct dmar_satc_unit *satcu;
+   struct acpi_dmar_satc *tmp;
+
+   list_for_each_entry_rcu(satcu, _satc_units, list,
+   dmar_rcu_check()) {
+   tmp = (struct acpi_dmar_satc *)satcu->hdr;
+   if (satc->segment != tmp->segment)
+   continue;
+   if (satc->header.length != tmp->header.length)
+   continue;
+   if (memcmp(satc, tmp, satc->header.length) == 0)
+   return satcu;
+   }
+
+   return NULL;
+}
+
+int dmar_parse_one_satc(struct acpi_dmar_header *hdr, void *arg)
+{
+   struct acpi_dmar_satc *satc;
+   struct dmar_satc_unit *satcu;
+
+   if (system_state >= SYSTEM_RUNNING && !intel_iommu_enabled)
+   return 0;
+
+   satc = container_of(hdr, struct acpi_dmar_satc, header);
+   satcu = dmar_find_satc(satc);
+   if (satcu)
+   return 0;
+
+   satcu = kzalloc(sizeof(*satcu) + hdr->length, GFP_KERNEL);
+   if (!satcu)
+   return -ENOMEM;
+
+   satcu->hdr = (void *)(satcu + 1);
+   memcpy(satcu->hdr, hdr, hdr->length);
+   satcu->atc_required = satc->flags & 0x1;
+   satcu->devices = dmar_alloc_dev_scope((void *)(satc + 1),
+ (void *)satc + 
satc->header.length,
+ >devices_cnt);
+   if (satcu->devices_cnt && !satcu->devices) 

[PATCH 1/3] iommu/vt-d: Add new enum value and structure for SATC

2021-02-01 Thread Lu Baolu
From: Yian Chen 

Starting from Intel Platform VT-d v3.2, BIOS may provide new remapping
structure SATC for SOC integrated devices, according to section 8.8 of
Intel VT-d architecture specification v3.2. The SATC structure reports
a list of the devices that require SATC enabling via ATS capacity.

This patch introduces the new enum value and structure to represent the
remapping information. Kernel should parse the information from the
reporting structure and enable ATC for the devices as needed.

Signed-off-by: Yian Chen 
---
 include/acpi/actbl1.h | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/acpi/actbl1.h b/include/acpi/actbl1.h
index 43549547ed3e..b7ca802b66d2 100644
--- a/include/acpi/actbl1.h
+++ b/include/acpi/actbl1.h
@@ -514,7 +514,8 @@ enum acpi_dmar_type {
ACPI_DMAR_TYPE_ROOT_ATS = 2,
ACPI_DMAR_TYPE_HARDWARE_AFFINITY = 3,
ACPI_DMAR_TYPE_NAMESPACE = 4,
-   ACPI_DMAR_TYPE_RESERVED = 5 /* 5 and greater are reserved */
+   ACPI_DMAR_TYPE_SATC = 5,
+   ACPI_DMAR_TYPE_RESERVED = 6 /* 5 and greater are reserved */
 };
 
 /* DMAR Device Scope structure */
@@ -607,6 +608,14 @@ struct acpi_dmar_andd {
char device_name[1];
 };
 
+/* 5: SOC Integrated Address Translation Cache Reporting Structure */
+
+struct acpi_dmar_satc {
+   struct acpi_dmar_header header;
+   u8 flags;
+   u8 reserved;
+   u16 segment;
+};
 
/***
  *
  * DRTM - Dynamic Root of Trust for Measurement table
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 0/3] iommu/vt-d: Add support for ACPI/SATC table

2021-02-01 Thread Lu Baolu
Intel platform VT-d (v3.2) comes with a new type of DMAR subtable
SATC. The SATC table includes a list of SoC integrated devices
that require SATC. OS software can use this table to enable ATS
only for the devices in the list.

Yian Chen (3):
  iommu/vt-d: Add new enum value and structure for SATC
  iommu/vt-d: Parse SATC reporting structure
  iommu/vt-d: Apply SATC policy

 drivers/iommu/intel/dmar.c  |   9 ++
 drivers/iommu/intel/iommu.c | 161 +++-
 include/acpi/actbl1.h   |  11 ++-
 include/linux/dmar.h|   2 +
 4 files changed, 178 insertions(+), 5 deletions(-)

-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [RFC PATCH v2] uacce: Add uacce_ctrl misc device

2021-02-01 Thread Song Bao Hua (Barry Song)



> -Original Message-
> From: Tian, Kevin [mailto:kevin.t...@intel.com]
> Sent: Tuesday, February 2, 2021 3:52 PM
> To: Jason Gunthorpe 
> Cc: Song Bao Hua (Barry Song) ; chensihang (A)
> ; Arnd Bergmann ; Greg
> Kroah-Hartman ; linux-ker...@vger.kernel.org;
> iommu@lists.linux-foundation.org; linux...@kvack.org; Zhangfei Gao
> ; Liguozhu (Kenneth) ;
> linux-accelerat...@lists.ozlabs.org
> Subject: RE: [RFC PATCH v2] uacce: Add uacce_ctrl misc device
> 
> > From: Jason Gunthorpe 
> > Sent: Tuesday, February 2, 2021 7:44 AM
> >
> > On Fri, Jan 29, 2021 at 10:09:03AM +, Tian, Kevin wrote:
> > > > SVA is not doom to work with IO page fault only. If we have SVA+pin,
> > > > we would get both sharing address and stable I/O latency.
> > >
> > > Isn't it like a traditional MAP_DMA API (imply pinning) plus specifying
> > > cpu_va of the memory pool as the iova?
> >
> > I think their issue is the HW can't do the cpu_va trick without also
> > involving the system IOMMU in a SVA mode
> >
> 
> This is the part that I didn't understand. Using cpu_va in a MAP_DMA
> interface doesn't require device support. It's just an user-specified
> address to be mapped into the IOMMU page table. On the other hand,

The background is that uacce is based on SVA and we are building
applications on uacce:
https://www.kernel.org/doc/html/v5.10/misc-devices/uacce.html
so IOMMU simply uses the page table of MMU, and don't do any
special mapping to an user-specified address. We don't break
the basic assumption that uacce is using SVA, otherwise, we
need to re-build uacce and the whole base.

> sharing CPU page table through a SVA interface for an usage where I/O
> page faults must be completely avoided seems a misleading attempt.

That is not for completely avoiding IO page fault, that is just
an extension for high-performance I/O case, providing a way to
avoid IO latency jitter. Using it or not is totally up to users.

> Even if people do want this model (e.g. mix pinning+fault), it should be
> a mm syscall as Greg pointed out, not specific to sva.
> 

We are glad to make it a syscall if people are happy with
it. The simplest way would be a syscall similar with
userfaultfd  if we don't want to mess up mm_struct.

> Thanks
> Kevin

Thanks
Barry
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [RFC PATCH v2] uacce: Add uacce_ctrl misc device

2021-02-01 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Tuesday, February 2, 2021 7:44 AM
> 
> On Fri, Jan 29, 2021 at 10:09:03AM +, Tian, Kevin wrote:
> > > SVA is not doom to work with IO page fault only. If we have SVA+pin,
> > > we would get both sharing address and stable I/O latency.
> >
> > Isn't it like a traditional MAP_DMA API (imply pinning) plus specifying
> > cpu_va of the memory pool as the iova?
> 
> I think their issue is the HW can't do the cpu_va trick without also
> involving the system IOMMU in a SVA mode
> 

This is the part that I didn't understand. Using cpu_va in a MAP_DMA
interface doesn't require device support. It's just an user-specified
address to be mapped into the IOMMU page table. On the other hand,
sharing CPU page table through a SVA interface for an usage where I/O 
page faults must be completely avoided seems a misleading attempt. 
Even if people do want this model (e.g. mix pinning+fault), it should be
a mm syscall as Greg pointed out, not specific to sva.

Thanks
Kevin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v6 00/33] MT8192 IOMMU support

2021-02-01 Thread Yong Wu
On Mon, 2021-02-01 at 14:54 +, Will Deacon wrote:
> On Mon, Jan 11, 2021 at 07:18:41PM +0800, Yong Wu wrote:
> > This patch mainly adds support for mt8192 Multimedia IOMMU and SMI.
> > 
> > mt8192 also is MTK IOMMU gen2 which uses ARM Short-Descriptor translation
> > table format. The M4U-SMI HW diagram is as below:
> > 
> >   EMI
> >|
> >   M4U
> >|
> >   
> >SMI Common
> >   
> >|
> >   +---+--+--+--+---+
> >   |   |  |  |   .. |   |
> >   |   |  |  |  |   |
> > larb0   larb1  larb2  larb4 ..  larb19   larb20
> > disp0   disp1   mdpvdec   IPE  IPE
> > 
> > All the connections are HW fixed, SW can NOT adjust it.
> > 
> > Comparing with the preview SoC, this patchset mainly adds two new functions:
> > a) add iova 34 bits support.
> > b) add multi domains support since several HW has the special iova
> > region requirement.
> > 
> > change note:
> > v6:a) base on v5.11-rc1. and tlb v4:
> >   
> > https://lore.kernel.org/linux-mediatek/20210107122909.16317-1-yong...@mediatek.com/T/#t
> >  
> 
> I've queued this up apart from patches 6 and 7.

Thanks very much for the applying. I'd like to show there is a little
conflict with a smi change[1] in /include/soc/mediatek/smi.h.

This is the detailed conflict:

--- a/include/soc/mediatek/smi.h
+++ b/include/soc/mediatek/smi.h
@@ -9,7 +9,7 @@
 #include 
 #include 
 
-#ifdef CONFIG_MTK_SMI
+#if IS_ENABLED(CONFIG_MTK_SMI)   <---The smi patch change here.
 
 #define MTK_LARB_NR_MAX   16  <---This iommu patchset delete this line.


This code is simple. Please feel free to tell me how to do this if this
is not convenient to merge.

[1]
https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-mem-ctrl.git/commit/?h=for-next=50fc8d9232cdc64b9e9d1b9488452f153de52b69

> 
> Thanks,
> 
> Will

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 0/1] perf/smmuv3: Don't reserve the PMCG register spaces

2021-02-01 Thread Leizhen (ThunderTown)



On 2021/2/1 23:50, Will Deacon wrote:
> On Mon, Feb 01, 2021 at 09:27:49PM +0800, Zhen Lei wrote:
>> v4 --> v5:
>> 1. Give up doing the mapping for the entire SMMU register space.
>> 2. Fix some compile warnings. Sorry. So sorry.
> 
> That's alright, these things happen. However, this came in slightly too
> late for 5.12, so please resend at -rc1 and we'll aim for 5.13.

Okay, thanks for your tolerance.

> 
> Will
> 
> .
> 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 1/7] iommu: Move iotlb_sync_map out from __iommu_map

2021-02-01 Thread Doug Anderson
Hi,

On Thu, Jan 7, 2021 at 4:31 AM Yong Wu  wrote:
>
> @@ -2438,18 +2435,31 @@ static int __iommu_map(struct iommu_domain *domain, 
> unsigned long iova,
> return ret;
>  }
>
> +static int _iommu_map(struct iommu_domain *domain, unsigned long iova,
> + phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
> +{
> +   const struct iommu_ops *ops = domain->ops;
> +   int ret;
> +
> +   ret = __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL);

The above is broken.  Instead of GFP_KERNEL it should be passing "gfp".


> +   if (ret == 0 && ops->iotlb_sync_map)
> +   ops->iotlb_sync_map(domain);
> +
> +   return ret;
> +}
> +
>  int iommu_map(struct iommu_domain *domain, unsigned long iova,
>   phys_addr_t paddr, size_t size, int prot)
>  {
> might_sleep();
> -   return __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL);
> +   return _iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL);
>  }
>  EXPORT_SYMBOL_GPL(iommu_map);
>
>  int iommu_map_atomic(struct iommu_domain *domain, unsigned long iova,
>   phys_addr_t paddr, size_t size, int prot)
>  {
> -   return __iommu_map(domain, iova, paddr, size, prot, GFP_ATOMIC);
> +   return _iommu_map(domain, iova, paddr, size, prot, GFP_ATOMIC);

Specifically the above bug means we drop the "GFP_ATOMIC" here.

It means we trigger a warning, like this (on a downstream kernel with
the patch backported):

 BUG: sleeping function called from invalid context at mm/page_alloc.c:4726
 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 9, name: ksoftirqd/0
 CPU: 0 PID: 9 Comm: ksoftirqd/0 Not tainted 5.4.93-12508-gc10c93e28e39 #1
 Call trace:
  dump_backtrace+0x0/0x154
  show_stack+0x20/0x2c
  dump_stack+0xa0/0xfc
  ___might_sleep+0x11c/0x12c
  __might_sleep+0x50/0x84
  __alloc_pages_nodemask+0xf8/0x2bc
  __arm_lpae_alloc_pages+0x48/0x1b4
  __arm_lpae_map+0x124/0x274
  __arm_lpae_map+0x1cc/0x274
  arm_lpae_map+0x140/0x170
  arm_smmu_map+0x78/0xbc
  __iommu_map+0xd4/0x210
  _iommu_map+0x4c/0x84
  iommu_map_atomic+0x44/0x58
  __iommu_dma_map+0x8c/0xc4
  iommu_dma_map_page+0xac/0xf0

---

A quick (but not very tested) fix at:

https://lore.kernel.org/r/20210201170611.1.I64a7b62579287d668d7c89e105dcedf45d641063@changeid/


-Doug
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu: Properly pass gfp_t in _iommu_map() to avoid atomic sleeping

2021-02-01 Thread Yong Wu
On Mon, 2021-02-01 at 17:06 -0800, Douglas Anderson wrote:
> Sleeping while atomic = bad.  Let's fix an obvious typo to try to avoid it.
> 
> The warning that was seen (on a downstream kernel with the problematic
> patch backported):
> 
>  BUG: sleeping function called from invalid context at mm/page_alloc.c:4726
>  in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 9, name: ksoftirqd/0
>  CPU: 0 PID: 9 Comm: ksoftirqd/0 Not tainted 5.4.93-12508-gc10c93e28e39 #1
>  Call trace:
>   dump_backtrace+0x0/0x154
>   show_stack+0x20/0x2c
>   dump_stack+0xa0/0xfc
>   ___might_sleep+0x11c/0x12c
>   __might_sleep+0x50/0x84
>   __alloc_pages_nodemask+0xf8/0x2bc
>   __arm_lpae_alloc_pages+0x48/0x1b4
>   __arm_lpae_map+0x124/0x274
>   __arm_lpae_map+0x1cc/0x274
>   arm_lpae_map+0x140/0x170
>   arm_smmu_map+0x78/0xbc
>   __iommu_map+0xd4/0x210
>   _iommu_map+0x4c/0x84
>   iommu_map_atomic+0x44/0x58
>   __iommu_dma_map+0x8c/0xc4
>   iommu_dma_map_page+0xac/0xf0
> 
> Fixes: d8c1df02ac7f ("iommu: Move iotlb_sync_map out from __iommu_map")
> Signed-off-by: Douglas Anderson 

oh. This is my fault. Thanks for the fix.

Reviewed-by: Yong Wu 

> ---
> I haven't done any serious testing on this.  I saw a report of the
> warning and the fix seemed obvious so I'm shooting it out.
> 
>  drivers/iommu/iommu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 3d099a31ddca..2b06b01850d5 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2441,7 +2441,7 @@ static int _iommu_map(struct iommu_domain *domain, 
> unsigned long iova,
>   const struct iommu_ops *ops = domain->ops;
>   int ret;
>  
> - ret = __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL);
> + ret = __iommu_map(domain, iova, paddr, size, prot, gfp);
>   if (ret == 0 && ops->iotlb_sync_map)
>   ops->iotlb_sync_map(domain, iova, size);
>  

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] iommu: Properly pass gfp_t in _iommu_map() to avoid atomic sleeping

2021-02-01 Thread Douglas Anderson
Sleeping while atomic = bad.  Let's fix an obvious typo to try to avoid it.

The warning that was seen (on a downstream kernel with the problematic
patch backported):

 BUG: sleeping function called from invalid context at mm/page_alloc.c:4726
 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 9, name: ksoftirqd/0
 CPU: 0 PID: 9 Comm: ksoftirqd/0 Not tainted 5.4.93-12508-gc10c93e28e39 #1
 Call trace:
  dump_backtrace+0x0/0x154
  show_stack+0x20/0x2c
  dump_stack+0xa0/0xfc
  ___might_sleep+0x11c/0x12c
  __might_sleep+0x50/0x84
  __alloc_pages_nodemask+0xf8/0x2bc
  __arm_lpae_alloc_pages+0x48/0x1b4
  __arm_lpae_map+0x124/0x274
  __arm_lpae_map+0x1cc/0x274
  arm_lpae_map+0x140/0x170
  arm_smmu_map+0x78/0xbc
  __iommu_map+0xd4/0x210
  _iommu_map+0x4c/0x84
  iommu_map_atomic+0x44/0x58
  __iommu_dma_map+0x8c/0xc4
  iommu_dma_map_page+0xac/0xf0

Fixes: d8c1df02ac7f ("iommu: Move iotlb_sync_map out from __iommu_map")
Signed-off-by: Douglas Anderson 
---
I haven't done any serious testing on this.  I saw a report of the
warning and the fix seemed obvious so I'm shooting it out.

 drivers/iommu/iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3d099a31ddca..2b06b01850d5 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2441,7 +2441,7 @@ static int _iommu_map(struct iommu_domain *domain, 
unsigned long iova,
const struct iommu_ops *ops = domain->ops;
int ret;
 
-   ret = __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL);
+   ret = __iommu_map(domain, iova, paddr, size, prot, gfp);
if (ret == 0 && ops->iotlb_sync_map)
ops->iotlb_sync_map(domain, iova, size);
 
-- 
2.30.0.365.g02bc693789-goog

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V2 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver.

2021-02-01 Thread Chaitanya Kulkarni
On 2/1/21 13:27, Jianxiong Gao wrote:
>> Why is this setting being done and undone on each IO? Wouldn't it be
>> more efficient to set it once during device initialization?
>>
>> And more importantly, this isn't thread safe: one CPU may be setting the
>> device's dma alignment mask to 0 while another CPU is expecting it to be
>> NVME_CTRL_PAGE_SIZE - 1.
>  I was having trouble getting the OS booted when setting it once during
>  initialization. However when I rebased to the latest rc6 this morning it
>  seems to be working with setting the mask on probe. I am still testing out
>  and will appreciate any idea what may cause the nvme driver to fail
>  with a single mask.
Based on the Keith's comment it needs to be completely avoided in hot path.

Did you get a chance to bisect the problem in the rc6 ? We need to know the
root cause otherwise it might happen again after we add this patch.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V2 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver.

2021-02-01 Thread Jianxiong Gao via iommu
> Why is this setting being done and undone on each IO? Wouldn't it be
> more efficient to set it once during device initialization?

I agree that setting it once is the right way of doing it.

So I have changed the patch to enable the mask once in nvme_probe.

 drivers/nvme/host/pci.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 81e6389b2042..4ce78373f98d 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2630,6 +2630,9 @@ static void nvme_reset_work(struct work_struct *work)
 */
dma_set_max_seg_size(dev->dev, 0x);

+   if (dma_set_min_align_mask(dev->dev, NVME_CTRL_PAGE_SIZE - 1))
+   dev_warn(dev->dev, "dma_set_min_align_mask failed to
set offset\n");
+
mutex_unlock(>shutdown_lock);

/*

However on boot of the system, the following error happens occasionally.
The error seems related to Journal service. Whenever "Stopping Journal
Service..."
appears, the boot succeeds. Otherwise boot fails with the following message.

log start here--
[  OK  ] Started Journal Service.
[   10.774545] xfs filesystem being remounted at / supports timestamps
until 2038 (0x7fff)
[  OK  ] Started Remount Root and Kernel File Systems.
 Starting Flush Journal to Persistent Storage...
 Starting Load/Save Random Seed...
 Starting Create Static [   10.804340] systemd-journald[780]:
Received request to flush runtime journal from PID 1
Device Nodes in /dev...
[  OK  ] Started Load/Save Random Seed.
[  OK  ] Started Flush Journal to Persistent Storage.
[  OK  ] Started Create Static Device Nodes in /dev.
 Starting udev Kernel Device Manager...
[  OK  ] Reached target Local File Systems (Pre).
 Starting File System Check on /dev/disk/by-uuid/7281-17FC...
[  OK  ] Started File System Check on /dev/disk/by-uuid/7281-17FC.
 Mounting /boot/efi...
[  OK  ] Mounted /boo[   11.203461] systemd[1]: segfault at 2e0 ip
55b08607cc24 sp 7ffe13809090 error 4 in
systemd[55b08600+14]
t/efi.
[   11.216088] Code: 02 c7 44 24 10 fe ff ff ff 49 89 e4 89 06 48 8d
6c 24 08 48 8d 5c 24 10 48 c7 44 24 18 00 00 00 00 eb 10 0f 1f 00 48
8b 3c 24 <44> 39 b7 e0 02 00 00 74 3b 49 8b 7d 00 4c 89 e1 48 89 ea 48
89 de
---log ends here---

> Based on the Keith's comment it needs to be completely avoided in hot path.
>
> Did you get a chance to bisect the problem in the rc6 ? We need to know the
> root cause otherwise it might happen again after we add this patch.

I am now trying to bisect the problem.
I am not sure how the mapping offset could have caused the error.
Any suggestions are appreciated.

--
Jianxiong Gao
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [RFC PATCH v2] uacce: Add uacce_ctrl misc device

2021-02-01 Thread Song Bao Hua (Barry Song)



> -Original Message-
> From: Jason Gunthorpe [mailto:j...@ziepe.ca]
> Sent: Tuesday, February 2, 2021 12:44 PM
> To: Tian, Kevin 
> Cc: Song Bao Hua (Barry Song) ; chensihang (A)
> ; Arnd Bergmann ; Greg
> Kroah-Hartman ; linux-ker...@vger.kernel.org;
> iommu@lists.linux-foundation.org; linux...@kvack.org; Zhangfei Gao
> ; Liguozhu (Kenneth) ;
> linux-accelerat...@lists.ozlabs.org
> Subject: Re: [RFC PATCH v2] uacce: Add uacce_ctrl misc device
> 
> On Fri, Jan 29, 2021 at 10:09:03AM +, Tian, Kevin wrote:
> > > SVA is not doom to work with IO page fault only. If we have SVA+pin,
> > > we would get both sharing address and stable I/O latency.
> >
> > Isn't it like a traditional MAP_DMA API (imply pinning) plus specifying
> > cpu_va of the memory pool as the iova?
> 
> I think their issue is the HW can't do the cpu_va trick without also
> involving the system IOMMU in a SVA mode
> 
> It really is something that belongs under some general /dev/sva as we
> talked on the vfio thread

AFAIK, there is no this /dev/sva so /dev/uacce is an uAPI
which belongs to sva.

Another option is that we add a system call like
fs/userfaultfd.c, and move the file_operations and  ioctl
to the anon inode by creating fd via anon_inode_getfd().
Then nothing will be buried by uacce.

> 
> Jason

Thanks
Barry

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH v2] uacce: Add uacce_ctrl misc device

2021-02-01 Thread Jason Gunthorpe
On Fri, Jan 29, 2021 at 10:09:03AM +, Tian, Kevin wrote:
> > SVA is not doom to work with IO page fault only. If we have SVA+pin,
> > we would get both sharing address and stable I/O latency.
> 
> Isn't it like a traditional MAP_DMA API (imply pinning) plus specifying 
> cpu_va of the memory pool as the iova? 

I think their issue is the HW can't do the cpu_va trick without also
involving the system IOMMU in a SVA mode

It really is something that belongs under some general /dev/sva as we
talked on the vfio thread

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V2 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver.

2021-02-01 Thread Jianxiong Gao via iommu
> Why is this setting being done and undone on each IO? Wouldn't it be
> more efficient to set it once during device initialization?
>
> And more importantly, this isn't thread safe: one CPU may be setting the
> device's dma alignment mask to 0 while another CPU is expecting it to be
> NVME_CTRL_PAGE_SIZE - 1.

 I was having trouble getting the OS booted when setting it once during
 initialization. However when I rebased to the latest rc6 this morning it
 seems to be working with setting the mask on probe. I am still testing out
 and will appreciate any idea what may cause the nvme driver to fail
 with a single mask.
-- 
Jianxiong Gao
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V2 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver.

2021-02-01 Thread Jianxiong Gao via iommu
>
> On Mon, Feb 01, 2021 at 10:30:17AM -0800, Jianxiong Gao wrote:
> > @@ -868,12 +871,24 @@ static blk_status_t nvme_map_data(struct nvme_dev
> *dev, struct request *req,
> >   if (!iod->nents)
> >   goto out_free_sg;
> >
> > + offset_ret = dma_set_min_align_mask(dev->dev, NVME_CTRL_PAGE_SIZE
> - 1);
> > + if (offset_ret) {
> > + dev_warn(dev->dev, "dma_set_min_align_mask failed to set
> offset\n");
> > + goto out_free_sg;
> > + }
> > +
> >   if (is_pci_p2pdma_page(sg_page(iod->sg)))
> >   nr_mapped = pci_p2pdma_map_sg_attrs(dev->dev, iod->sg,
> >   iod->nents, rq_dma_dir(req),
> DMA_ATTR_NO_WARN);
> >   else
> >   nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
> >rq_dma_dir(req),
> DMA_ATTR_NO_WARN);
> > +
> > + offset_ret = dma_set_min_align_mask(dev->dev, 0);
> > + if (offset_ret) {
> > + dev_warn(dev->dev, "dma_set_min_align_mask failed to reset
> offset\n");
> > + goto out_free_sg;
> > + }
> >   if (!nr_mapped)
> >   goto out_free_sg;
>
> Why is this setting being done and undone on each IO? Wouldn't it be
> more efficient to set it once during device initialization?
>
> And more importantly, this isn't thread safe: one CPU may be setting the
> device's dma alignment mask to 0 while another CPU is expecting it to be
> NVME_CTRL_PAGE_SIZE - 1.
>

I was having trouble getting the OS booted when setting it once during
initialization. However when I rebased to the latest rc6 this morning it
seems to be working with setting the mask on probe. I am still testing out
and will appreciate any idea what may cause the nvme driver to fail
with a single mask.

-- 
Jianxiong Gao
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH V2 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver.

2021-02-01 Thread Keith Busch
On Mon, Feb 01, 2021 at 10:30:17AM -0800, Jianxiong Gao wrote:
> @@ -868,12 +871,24 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, 
> struct request *req,
>   if (!iod->nents)
>   goto out_free_sg;
>  
> + offset_ret = dma_set_min_align_mask(dev->dev, NVME_CTRL_PAGE_SIZE - 1);
> + if (offset_ret) {
> + dev_warn(dev->dev, "dma_set_min_align_mask failed to set 
> offset\n");
> + goto out_free_sg;
> + }
> +
>   if (is_pci_p2pdma_page(sg_page(iod->sg)))
>   nr_mapped = pci_p2pdma_map_sg_attrs(dev->dev, iod->sg,
>   iod->nents, rq_dma_dir(req), DMA_ATTR_NO_WARN);
>   else
>   nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
>rq_dma_dir(req), DMA_ATTR_NO_WARN);
> +
> + offset_ret = dma_set_min_align_mask(dev->dev, 0);
> + if (offset_ret) {
> + dev_warn(dev->dev, "dma_set_min_align_mask failed to reset 
> offset\n");
> + goto out_free_sg;
> + }
>   if (!nr_mapped)
>   goto out_free_sg;

Why is this setting being done and undone on each IO? Wouldn't it be
more efficient to set it once during device initialization?

And more importantly, this isn't thread safe: one CPU may be setting the
device's dma alignment mask to 0 while another CPU is expecting it to be
NVME_CTRL_PAGE_SIZE - 1.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V2 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver.

2021-02-01 Thread Jianxiong Gao via iommu
On Mon, Feb 1, 2021 at 10:56 AM Andy Shevchenko
 wrote:
>
> On Mon, Feb 01, 2021 at 10:30:17AM -0800, Jianxiong Gao wrote:
> > NVMe driver relies on the address offset to function properly.
> > This patch adds the offset preserve mask to NVMe driver when mapping
> > via dma_map_sg_attrs and unmapping via nvme_unmap_sg. The mask
> > depends on the page size defined by CC.MPS register of NVMe
> > controller.
>
> ...
>
> >   if (is_pci_p2pdma_page(sg_page(iod->sg)))
> >   nr_mapped = pci_p2pdma_map_sg_attrs(dev->dev, iod->sg,
> >   iod->nents, rq_dma_dir(req), 
> > DMA_ATTR_NO_WARN);
> >   else
> >   nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
> >rq_dma_dir(req), 
> > DMA_ATTR_NO_WARN);
> > +
> > + offset_ret = dma_set_min_align_mask(dev->dev, 0);
> > + if (offset_ret) {
> > + dev_warn(dev->dev, "dma_set_min_align_mask failed to reset 
> > offset\n");
> > + goto out_free_sg;
> > + }
>
> Seems like rebasing effect which makes empty line goes in the middle of some
> other group of code lines.
>
> >   if (!nr_mapped)
> >   goto out_free_sg;
>
> Perhaps it should be here?
Yes you are correct, it should be

 else
  nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
   rq_dma_dir(req), DMA_ATTR_NO_WARN);
  if (!nr_mapped)
  goto out_free_sg;
+
+ offset_ret = dma_set_min_align_mask(dev->dev, 0);
+ if (offset_ret) {
+ dev_warn(dev->dev, "dma_set_min_align_mask failed to
reset offset\n");
+ goto out_free_sg;
+ }

Thanks for pointing it out.
>
> --
> With Best Regards,
> Andy Shevchenko
>
>


-- 
Jianxiong Gao
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH V2 1/3] Adding page_offset_mask to device_dma_parameters

2021-02-01 Thread Jianxiong Gao via iommu
Some devices rely on the address offset in a page to function
correctly (NVMe driver as an example). These devices may use
a different page size than the Linux kernel. The address offset
has to be preserved upon mapping, and in order to do so, we
need to record the page_offset_mask first.

Signed-off-by: Jianxiong Gao 
---
 include/linux/device.h  |  1 +
 include/linux/dma-mapping.h | 17 +
 2 files changed, 18 insertions(+)

diff --git a/include/linux/device.h b/include/linux/device.h
index 1779f90eeb4c..7960bf516dd7 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -291,6 +291,7 @@ struct device_dma_parameters {
 * sg limitations.
 */
unsigned int max_segment_size;
+   unsigned int min_align_mask;
unsigned long segment_boundary_mask;
 };
 
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 2e49996a8f39..27ec3cab8cbd 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -500,6 +500,23 @@ static inline int dma_set_seg_boundary(struct device *dev, 
unsigned long mask)
return -EIO;
 }
 
+static inline unsigned int dma_get_min_align_mask(struct device *dev)
+{
+   if (dev->dma_parms)
+   return dev->dma_parms->min_align_mask;
+   return 0;
+}
+
+static inline int dma_set_min_align_mask(struct device *dev,
+   unsigned int min_align_mask)
+{
+   if (dev->dma_parms) {
+   dev->dma_parms->min_align_mask = min_align_mask;
+   return 0;
+   }
+   return -EIO;
+}
+
 static inline int dma_get_cache_alignment(void)
 {
 #ifdef ARCH_DMA_MINALIGN
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH V2 2/3] Add swiotlb offset preserving mapping when dma_dma_parameters->page_offset_mask is non zero.

2021-02-01 Thread Jianxiong Gao via iommu
For devices that need to preserve address offset on mapping through
swiotlb, this patch adds offset preserving based on page_offset_mask
and keeps the offset if the mask is non zero. This is needed for
device drivers like NVMe.

Signed-off-by: Jianxiong Gao 
---
 kernel/dma/swiotlb.c | 27 ---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 7c42df6e6100..eeb640df35f3 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -468,7 +468,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, 
phys_addr_t orig_addr,
dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(hwdev, io_tlb_start);
unsigned long flags;
phys_addr_t tlb_addr;
-   unsigned int nslots, stride, index, wrap;
+   unsigned int nslots, stride, index, wrap, min_align_mask, page_offset;
int i;
unsigned long mask;
unsigned long offset_slots;
@@ -500,12 +500,16 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, 
phys_addr_t orig_addr,
? ALIGN(mask + 1, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT
: 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
 
+   min_align_mask = dma_get_min_align_mask(hwdev);
+   page_offset = orig_addr & min_align_mask;
+   alloc_size += page_offset;
+
/*
 * For mappings greater than or equal to a page, we limit the stride
 * (and hence alignment) to a page size.
 */
nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
-   if (alloc_size >= PAGE_SIZE)
+   if ((alloc_size >= PAGE_SIZE) || (min_align_mask > (1 << IO_TLB_SHIFT)))
stride = (1 << (PAGE_SHIFT - IO_TLB_SHIFT));
else
stride = 1;
@@ -583,6 +587,11 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, 
phys_addr_t orig_addr,
 */
for (i = 0; i < nslots; i++)
io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
+   /*
+* When keeping the offset of the original data, we need to advance
+* the tlb_addr by the offset of orig_addr.
+*/
+   tlb_addr += page_offset;
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
(dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
swiotlb_bounce(orig_addr, tlb_addr, mapping_size, 
DMA_TO_DEVICE);
@@ -598,7 +607,11 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, 
phys_addr_t tlb_addr,
  enum dma_data_direction dir, unsigned long attrs)
 {
unsigned long flags;
-   int i, count, nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> 
IO_TLB_SHIFT;
+   unsigned int num_page_offset_slabs;
+   unsigned int min_align_mask = dma_get_min_align_mask(hwdev);
+   int i, count;
+   int nslots = ALIGN(alloc_size + (tlb_addr & min_align_mask),
+   1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
int index = (tlb_addr - io_tlb_start) >> IO_TLB_SHIFT;
phys_addr_t orig_addr = io_tlb_orig_addr[index];
 
@@ -610,6 +623,14 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, 
phys_addr_t tlb_addr,
((dir == DMA_FROM_DEVICE) || (dir == DMA_BIDIRECTIONAL)))
swiotlb_bounce(orig_addr, tlb_addr, mapping_size, 
DMA_FROM_DEVICE);
 
+   /*
+* When dma_get_min_align_mask is used, we may have padded more slabs
+* when padding exceeds one slab. We need to move index back to the
+* beginning of the padding.
+*/
+   num_page_offset_slabs = (tlb_addr & min_align_mask) / (1 << 
IO_TLB_SHIFT);
+   index -= num_page_offset_slabs;
+
/*
 * Return the buffer to the free list by setting the corresponding
 * entries to indicate the number of contiguous entries available.
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH V2 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver.

2021-02-01 Thread Jianxiong Gao via iommu
NVMe driver relies on the address offset to function properly.
This patch adds the offset preserve mask to NVMe driver when mapping
via dma_map_sg_attrs and unmapping via nvme_unmap_sg. The mask
depends on the page size defined by CC.MPS register of NVMe
controller.

Signed-off-by: Jianxiong Gao 
---
 drivers/nvme/host/pci.c | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 81e6389b2042..30e45f7e0f75 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -580,12 +580,15 @@ static void nvme_free_sgls(struct nvme_dev *dev, struct 
request *req)
 static void nvme_unmap_sg(struct nvme_dev *dev, struct request *req)
 {
struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
-
+   if (dma_set_min_align_mask(dev->dev, NVME_CTRL_PAGE_SIZE - 1))
+   dev_warn(dev->dev, "dma_set_min_align_mask failed to set 
offset\n");
if (is_pci_p2pdma_page(sg_page(iod->sg)))
pci_p2pdma_unmap_sg(dev->dev, iod->sg, iod->nents,
rq_dma_dir(req));
else
dma_unmap_sg(dev->dev, iod->sg, iod->nents, rq_dma_dir(req));
+   if (dma_set_min_align_mask(dev->dev, 0))
+   dev_warn(dev->dev, "dma_set_min_align_mask failed to reset 
offset\n");
 }
 
 static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
@@ -842,7 +845,7 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, 
struct request *req,
 {
struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
blk_status_t ret = BLK_STS_RESOURCE;
-   int nr_mapped;
+   int nr_mapped, offset_ret;
 
if (blk_rq_nr_phys_segments(req) == 1) {
struct bio_vec bv = req_bvec(req);
@@ -868,12 +871,24 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, 
struct request *req,
if (!iod->nents)
goto out_free_sg;
 
+   offset_ret = dma_set_min_align_mask(dev->dev, NVME_CTRL_PAGE_SIZE - 1);
+   if (offset_ret) {
+   dev_warn(dev->dev, "dma_set_min_align_mask failed to set 
offset\n");
+   goto out_free_sg;
+   }
+
if (is_pci_p2pdma_page(sg_page(iod->sg)))
nr_mapped = pci_p2pdma_map_sg_attrs(dev->dev, iod->sg,
iod->nents, rq_dma_dir(req), DMA_ATTR_NO_WARN);
else
nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
 rq_dma_dir(req), DMA_ATTR_NO_WARN);
+
+   offset_ret = dma_set_min_align_mask(dev->dev, 0);
+   if (offset_ret) {
+   dev_warn(dev->dev, "dma_set_min_align_mask failed to reset 
offset\n");
+   goto out_free_sg;
+   }
if (!nr_mapped)
goto out_free_sg;
 
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V2 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver.

2021-02-01 Thread Andy Shevchenko
On Mon, Feb 01, 2021 at 10:30:17AM -0800, Jianxiong Gao wrote:
> NVMe driver relies on the address offset to function properly.
> This patch adds the offset preserve mask to NVMe driver when mapping
> via dma_map_sg_attrs and unmapping via nvme_unmap_sg. The mask
> depends on the page size defined by CC.MPS register of NVMe
> controller.

...

>   if (is_pci_p2pdma_page(sg_page(iod->sg)))
>   nr_mapped = pci_p2pdma_map_sg_attrs(dev->dev, iod->sg,
>   iod->nents, rq_dma_dir(req), DMA_ATTR_NO_WARN);
>   else
>   nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
>rq_dma_dir(req), DMA_ATTR_NO_WARN);
> +
> + offset_ret = dma_set_min_align_mask(dev->dev, 0);
> + if (offset_ret) {
> + dev_warn(dev->dev, "dma_set_min_align_mask failed to reset 
> offset\n");
> + goto out_free_sg;
> + }

Seems like rebasing effect which makes empty line goes in the middle of some
other group of code lines.

>   if (!nr_mapped)
>   goto out_free_sg;

Perhaps it should be here?

-- 
With Best Regards,
Andy Shevchenko


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH V2 0/3] SWIOTLB: Preserve swiotlb map offset when needed.

2021-02-01 Thread Jianxiong Gao via iommu
NVMe driver and other applications may depend on the data offset
to operate correctly. Currently when unaligned data is mapped via
SWIOTLB, the data is mapped as slab aligned with the SWIOTLB. This
patch adds an option to make sure the mapped data preserves its
offset of the orginal addrss.

Without the patch when creating xfs formatted disk on NVMe backends,
with swiotlb=force in kernel boot option, creates the following error:
meta-data=/dev/nvme2n1   isize=512agcount=4, agsize=131072 blks
 =   sectsz=512   attr=2, projid32bit=1
 =   crc=1finobt=1, sparse=0, rmapbt=0, refl
ink=0
data =   bsize=4096   blocks=524288, imaxpct=25
 =   sunit=0  swidth=0 blks
naming   =version 2  bsize=4096   ascii-ci=0 ftype=1
log  =internal log   bsize=4096   blocks=2560, version=2
 =   sectsz=512   sunit=0 blks, lazy-count=1
realtime =none   extsz=4096   blocks=0, rtextents=0
mkfs.xfs: pwrite failed: Input/output error

Jianxiong Gao (3):
  Adding page_offset_mask to device_dma_parameters
  Add swiotlb offset preserving mapping when
dma_dma_parameters->page_offset_mask is non zero.
  Adding device_dma_parameters->offset_preserve_mask to NVMe driver.

 drivers/nvme/host/pci.c |  4 
 include/linux/device.h  |  1 +
 include/linux/dma-mapping.h | 17 +
 kernel/dma/swiotlb.c| 16 +++-
 4 files changed, 37 insertions(+), 1 deletion(-)

-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/3] iommu/io-pgtable-arm: Add IOMMU_LLC page protection flag

2021-02-01 Thread Jordan Crouse
On Mon, Feb 01, 2021 at 08:20:44AM -0800, Rob Clark wrote:
> On Mon, Feb 1, 2021 at 3:16 AM Will Deacon  wrote:
> >
> > On Fri, Jan 29, 2021 at 03:12:59PM +0530, Sai Prakash Ranjan wrote:
> > > On 2021-01-29 14:35, Will Deacon wrote:
> > > > On Mon, Jan 11, 2021 at 07:45:04PM +0530, Sai Prakash Ranjan wrote:
> > > > > Add a new page protection flag IOMMU_LLC which can be used
> > > > > by non-coherent masters to set cacheable memory attributes
> > > > > for an outer level of cache called as last-level cache or
> > > > > system cache. Initial user of this page protection flag is
> > > > > the adreno gpu and then can later be used by other clients
> > > > > such as video where this can be used for per-buffer based
> > > > > mapping.
> > > > >
> > > > > Signed-off-by: Sai Prakash Ranjan 
> > > > > ---
> > > > >  drivers/iommu/io-pgtable-arm.c | 3 +++
> > > > >  include/linux/iommu.h  | 6 ++
> > > > >  2 files changed, 9 insertions(+)
> > > > >
> > > > > diff --git a/drivers/iommu/io-pgtable-arm.c
> > > > > b/drivers/iommu/io-pgtable-arm.c
> > > > > index 7439ee7fdcdb..ebe653ef601b 100644
> > > > > --- a/drivers/iommu/io-pgtable-arm.c
> > > > > +++ b/drivers/iommu/io-pgtable-arm.c
> > > > > @@ -415,6 +415,9 @@ static arm_lpae_iopte
> > > > > arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
> > > > >   else if (prot & IOMMU_CACHE)
> > > > >   pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE
> > > > >   << ARM_LPAE_PTE_ATTRINDX_SHIFT);
> > > > > + else if (prot & IOMMU_LLC)
> > > > > + pte |= (ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE
> > > > > + << ARM_LPAE_PTE_ATTRINDX_SHIFT);
> > > > >   }
> > > > >
> > > > >   if (prot & IOMMU_CACHE)
> > > > > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > > > > index ffaa389ea128..1f82057df531 100644
> > > > > --- a/include/linux/iommu.h
> > > > > +++ b/include/linux/iommu.h
> > > > > @@ -31,6 +31,12 @@
> > > > >   * if the IOMMU page table format is equivalent.
> > > > >   */
> > > > >  #define IOMMU_PRIV   (1 << 5)
> > > > > +/*
> > > > > + * Non-coherent masters can use this page protection flag to set
> > > > > cacheable
> > > > > + * memory attributes for only a transparent outer level of cache,
> > > > > also known as
> > > > > + * the last-level or system cache.
> > > > > + */
> > > > > +#define IOMMU_LLC(1 << 6)
> > > >
> > > > On reflection, I'm a bit worried about exposing this because I think it
> > > > will
> > > > introduce a mismatched virtual alias with the CPU (we don't even have a
> > > > MAIR
> > > > set up for this memory type). Now, we also have that issue for the PTW,
> > > > but
> > > > since we always use cache maintenance (i.e. the streaming API) for
> > > > publishing the page-tables to a non-coheren walker, it works out.
> > > > However,
> > > > if somebody expects IOMMU_LLC to be coherent with a DMA API coherent
> > > > allocation, then they're potentially in for a nasty surprise due to the
> > > > mismatched outer-cacheability attributes.
> > > >
> > >
> > > Can't we add the syscached memory type similar to what is done on android?
> >
> > Maybe. How does the GPU driver map these things on the CPU side?
> 
> Currently we use writecombine mappings for everything, although there
> are some cases that we'd like to use cached (but have not merged
> patches that would give userspace a way to flush/invalidate)
> 
> BR,
> -R

LLC/system cache doesn't have a relationship with the CPU cache.  Its just a
little accelerator that sits on the connection from the GPU to DDR and caches
accesses. The hint that Sai is suggesting is used to mark the buffers as
'no-write-allocate' to prevent GPU write operations from being cached in the LLC
which a) isn't interesting and b) takes up cache space for read operations.

Its easiest to think of the LLC as a bonus accelerator that has no cost for
us to use outside of the unfortunate per buffer hint.

We do have to worry about the CPU cache w.r.t I/O coherency (which is a
different hint) and in that case we have all of concerns that Will identified.

Jordan
-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH RFC 0/9] Possible set of VT-d optimizations

2021-02-01 Thread Chuck Lever



> On Jan 28, 2021, at 9:52 AM, Chuck Lever  wrote:
> 
> 
> 
>> On Jan 28, 2021, at 8:59 AM, Robin Murphy  wrote:
>> 
>> On 2021-01-27 20:00, Chuck Lever wrote:
>>> Hi-
>>> This collection of patches seems to get the best throughtput results
>>> so far. The NFS WRITE result is fully restored, and the NFS READ
>>> result is very close to fully restored.
>>> Children see throughput for 12 initial writers  = 5008474.03 kB/sec
>>> Parent sees throughput for 12 initial writers   = 4996927.80 kB/sec
>>> Min throughput per process  = 416956.88 kB/sec
>>> Max throughput per process  = 417910.22 kB/sec
>>> Avg throughput per process  = 417372.84 kB/sec
>>> Min xfer= 1046272.00 kB
>>> CPU Utilization: Wall time2.515CPU time1.996CPU 
>>> utilization  79.37 %
>>> Children see throughput for 12 rewriters= 5020584.59 kB/sec
>>> Parent sees throughput for 12 rewriters = 5012539.29 kB/sec
>>> Min throughput per process  = 417799.00 kB/sec
>>> Max throughput per process  = 419082.22 kB/sec
>>> Avg throughput per process  = 418382.05 kB/sec
>>> Min xfer= 1046528.00 kB
>>> CPU utilization: Wall time2.507CPU time2.024CPU 
>>> utilization  80.73 %
>>> Children see throughput for 12 readers  = 5805484.25 kB/sec
>>> Parent sees throughput for 12 readers   = 5799535.68 kB/sec
>>> Min throughput per process  = 482888.16 kB/sec
>>> Max throughput per process  = 48.16 kB/sec
>>> Avg throughput per process  = 483790.35 kB/sec
>>> Min xfer= 1045760.00 kB
>>> CPU utilization: Wall time2.167CPU time1.964CPU 
>>> utilization  90.63 %
>>> Children see throughput for 12 re-readers   = 5812227.16 kB/sec
>>> Parent sees throughput for 12 re-readers= 5803793.06 kB/sec
>>> Min throughput per process  = 483242.97 kB/sec
>>> Max throughput per process  = 485724.41 kB/sec
>>> Avg throughput per process  = 484352.26 kB/sec
>>> Min xfer= 1043456.00 kB
>>> CPU utilization: Wall time2.161CPU time1.976CPU 
>>> utilization  91.45 %
>>> I've included a simple-minded implementation of a map_sg op for
>>> the Intel IOMMU. This is nothing more than a copy of the loop in
>>> __iommu_map_sg() with the call to __iommu_map() replaced with a
>>> call to intel_iommu_map().
>> 
>> ...which is the main reason I continue to strongly dislike patches #4-#9 (#3 
>> definitely seems to makes sense either way, now that #1 and #2 are going to 
>> land). If a common operation is worth optimising anywhere, then it deserves 
>> optimising everywhere, so we end up with a dozen diverging copies of 
>> essentially the same code - particularly when the driver-specific 
>> functionality *is* already in the drivers, so what gets duplicated is solely 
>> the "generic" parts.
> 
> I don't disagree with that assessment, but I don't immediately see an
> alternative API arrangement that would be more successful in the short
> term. If 4/9 - 9/9 are not acceptable, then the responsible thing to
> do would be to revert:
> 
> - 58a8bb39490d ("iommu/vt-d: Cleanup after converting to dma-iommu ops")
> - c588072bba6b ("iommu/vt-d: Convert intel iommu driver to the iommu ops")
> 
> for v5.11, work out the proper API design, and then try the VT-d conversion
> again.
> 
> IMHO.

Are all y'all waiting for me to post such patches? ;-)


>> And if there's justification for pushing iommu_map_sg() entirely into 
>> drivers, then it's verging on self-contradictory not to do the same for 
>> iommu_map() and iommu_unmap(). Some IOMMU drivers - mainly intel-iommu, as 
>> it happens - are already implementing hacks around the "one call per page" 
>> interface being inherently inefficient, so the logical thing to do here is 
>> take a step back and reconsider the fundamental design of the whole 
>> map/unmap interface. Implementing hacks on top of hacks to make particular 
>> things faster on particular systems that particular people care about is not 
>> going to do us any favours in the long run.
>> 
>> As it stands, I can easily see a weird anti-pattern emerging where people 
>> start adding code to fake up scatterlists in random drivers because they see 
>> dma_map_sg() performing paradoxically better than dma_map_page().
>> 
>> Robin.
>> 
>>> ---
>>> Chuck Lever (1):
>>>  iommu/vt-d: Introduce map_sg() for Intel IOMMUs
>>> Isaac J. Manjarres (5):
>>>  iommu/io-pgtable: Introduce map_sg() as a page table op
>>>  iommu/io-pgtable-arm: Hook up map_sg()
>>>  

Re: [PATCH v13 03/15] iommu/arm-smmu-v3: Maintain a SID->device structure

2021-02-01 Thread Auger Eric
Hi Keqian,

On 2/1/21 1:26 PM, Keqian Zhu wrote:
> Hi Eric,
> 
> On 2020/11/18 19:21, Eric Auger wrote:
>> From: Jean-Philippe Brucker 
>>
>> When handling faults from the event or PRI queue, we need to find the
>> struct device associated to a SID. Add a rb_tree to keep track of SIDs.
>>
>> Signed-off-by: Jean-Philippe Brucker 
> [...]
> 
>>  }
>>  
>> +static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
>> +  struct arm_smmu_master *master)
>> +{
>> +int i;
>> +int ret = 0;
>> +struct arm_smmu_stream *new_stream, *cur_stream;
>> +struct rb_node **new_node, *parent_node = NULL;
>> +struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev);
>> +
>> +master->streams = kcalloc(fwspec->num_ids,
>> +  sizeof(struct arm_smmu_stream), GFP_KERNEL);
>> +if (!master->streams)
>> +return -ENOMEM;
>> +master->num_streams = fwspec->num_ids;
> This is not roll-backed when fail.
> 
>> +
>> +mutex_lock(>streams_mutex);
>> +for (i = 0; i < fwspec->num_ids && !ret; i++) {
> Check ret at here, makes it hard to decide the start index of rollback.
> 
> If we fail at here, then start index is (i-2).
> If we fail in the loop, then start index is (i-1).
> 
>> +u32 sid = fwspec->ids[i];
>> +
>> +new_stream = >streams[i];
>> +new_stream->id = sid;
>> +new_stream->master = master;
>> +
>> +/*
>> + * Check the SIDs are in range of the SMMU and our stream table
>> + */
>> +if (!arm_smmu_sid_in_range(smmu, sid)) {
>> +ret = -ERANGE;
>> +break;
>> +}
>> +
>> +/* Ensure l2 strtab is initialised */
>> +if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) {
>> +ret = arm_smmu_init_l2_strtab(smmu, sid);
>> +if (ret)
>> +break;
>> +}
>> +
>> +/* Insert into SID tree */
>> +new_node = &(smmu->streams.rb_node);
>> +while (*new_node) {
>> +cur_stream = rb_entry(*new_node, struct arm_smmu_stream,
>> +  node);
>> +parent_node = *new_node;
>> +if (cur_stream->id > new_stream->id) {
>> +new_node = &((*new_node)->rb_left);
>> +} else if (cur_stream->id < new_stream->id) {
>> +new_node = &((*new_node)->rb_right);
>> +} else {
>> +dev_warn(master->dev,
>> + "stream %u already in tree\n",
>> + cur_stream->id);
>> +ret = -EINVAL;
>> +break;
>> +}
>> +}
>> +
>> +if (!ret) {
>> +rb_link_node(_stream->node, parent_node, new_node);
>> +rb_insert_color(_stream->node, >streams);
>> +}
>> +}
>> +
>> +if (ret) {
>> +for (; i > 0; i--)
> should be (i >= 0)?
> And the start index seems not correct.
> 
>> +rb_erase(>streams[i].node, >streams);
>> +kfree(master->streams);
>> +}
>> +mutex_unlock(>streams_mutex);
>> +
>> +return ret;
>> +}
>> +
>> +static void arm_smmu_remove_master(struct arm_smmu_master *master)
>> +{
>> +int i;
>> +struct arm_smmu_device *smmu = master->smmu;
>> +struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev);
>> +
>> +if (!smmu || !master->streams)
>> +return;
>> +
>> +mutex_lock(>streams_mutex);
>> +for (i = 0; i < fwspec->num_ids; i++)
>> +rb_erase(>streams[i].node, >streams);
>> +mutex_unlock(>streams_mutex);
>> +
>> +kfree(master->streams);
>> +}
>> +
>>  static struct iommu_ops arm_smmu_ops;
>>  
>>  static struct iommu_device *arm_smmu_probe_device(struct device *dev)
>>  {
>> -int i, ret;
>> +int ret;
>>  struct arm_smmu_device *smmu;
>>  struct arm_smmu_master *master;
>>  struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
>> @@ -2331,27 +2447,12 @@ static struct iommu_device 
>> *arm_smmu_probe_device(struct device *dev)
>>  
>>  master->dev = dev;
>>  master->smmu = smmu;
>> -master->sids = fwspec->ids;
>> -master->num_sids = fwspec->num_ids;
>>  INIT_LIST_HEAD(>bonds);
>>  dev_iommu_priv_set(dev, master);
>>  
>> -/* Check the SIDs are in range of the SMMU and our stream table */
>> -for (i = 0; i < master->num_sids; i++) {
>> -u32 sid = master->sids[i];
>> -
>> -if (!arm_smmu_sid_in_range(smmu, sid)) {
>> -ret = -ERANGE;
>> -goto err_free_master;
>> -}
>> -
>> -/* Ensure l2 strtab is initialised */
>> -

Re: [PATCH v13 01/15] iommu: Introduce attach/detach_pasid_table API

2021-02-01 Thread Auger Eric
Hi Keqian,

On 2/1/21 12:27 PM, Keqian Zhu wrote:
> Hi Eric,
> 
> On 2020/11/18 19:21, Eric Auger wrote:
>> In virtualization use case, when a guest is assigned
>> a PCI host device, protected by a virtual IOMMU on the guest,
>> the physical IOMMU must be programmed to be consistent with
>> the guest mappings. If the physical IOMMU supports two
>> translation stages it makes sense to program guest mappings
>> onto the first stage/level (ARM/Intel terminology) while the host
>> owns the stage/level 2.
>>
>> In that case, it is mandated to trap on guest configuration
>> settings and pass those to the physical iommu driver.
>>
>> This patch adds a new API to the iommu subsystem that allows
>> to set/unset the pasid table information.
>>
>> A generic iommu_pasid_table_config struct is introduced in
>> a new iommu.h uapi header. This is going to be used by the VFIO
>> user API.
>>
>> Signed-off-by: Jean-Philippe Brucker 
>> Signed-off-by: Liu, Yi L 
>> Signed-off-by: Ashok Raj 
>> Signed-off-by: Jacob Pan 
>> Signed-off-by: Eric Auger 
>>
>> ---
>>
>> v12 -> v13:
>> - Fix config check
>>
>> v11 -> v12:
>> - add argsz, name the union
>> ---
>>  drivers/iommu/iommu.c  | 68 ++
>>  include/linux/iommu.h  | 21 
>>  include/uapi/linux/iommu.h | 54 ++
>>  3 files changed, 143 insertions(+)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index b53446bb8c6b..978fe34378fb 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -2171,6 +2171,74 @@ int iommu_uapi_sva_unbind_gpasid(struct iommu_domain 
>> *domain, struct device *dev
>>  }
>>  EXPORT_SYMBOL_GPL(iommu_uapi_sva_unbind_gpasid);
>>  
>> +int iommu_attach_pasid_table(struct iommu_domain *domain,
>> + struct iommu_pasid_table_config *cfg)
>> +{
>> +if (unlikely(!domain->ops->attach_pasid_table))
>> +return -ENODEV;
>> +
>> +return domain->ops->attach_pasid_table(domain, cfg);
>> +}
> miss export symbol?
yes we do
> 
>> +
>> +int iommu_uapi_attach_pasid_table(struct iommu_domain *domain,
>> +  void __user *uinfo)
>> +{
>> +struct iommu_pasid_table_config pasid_table_data = { 0 };
>> +u32 minsz;
>> +
>> +if (unlikely(!domain->ops->attach_pasid_table))
>> +return -ENODEV;
>> +
>> +/*
>> + * No new spaces can be added before the variable sized union, the
>> + * minimum size is the offset to the union.
>> + */
>> +minsz = offsetof(struct iommu_pasid_table_config, vendor_data);
>> +
>> +/* Copy minsz from user to get flags and argsz */
>> +if (copy_from_user(_table_data, uinfo, minsz))
>> +return -EFAULT;
>> +
>> +/* Fields before the variable size union are mandatory */
>> +if (pasid_table_data.argsz < minsz)
>> +return -EINVAL;
>> +
>> +/* PASID and address granu require additional info beyond minsz */
>> +if (pasid_table_data.version != PASID_TABLE_CFG_VERSION_1)
>> +return -EINVAL;
>> +if (pasid_table_data.format == IOMMU_PASID_FORMAT_SMMUV3 &&
>> +pasid_table_data.argsz <
>> +offsetofend(struct iommu_pasid_table_config, 
>> vendor_data.smmuv3))
>> +return -EINVAL;
>> +
>> +/*
>> + * User might be using a newer UAPI header which has a larger data
>> + * size, we shall support the existing flags within the current
>> + * size. Copy the remaining user data _after_ minsz but not more
>> + * than the current kernel supported size.
>> + */
>> +if (copy_from_user((void *)_table_data + minsz, uinfo + minsz,
>> +   min_t(u32, pasid_table_data.argsz, 
>> sizeof(pasid_table_data)) - minsz))
>> +return -EFAULT;
>> +
>> +/* Now the argsz is validated, check the content */
>> +if (pasid_table_data.config < IOMMU_PASID_CONFIG_TRANSLATE ||
>> +pasid_table_data.config > IOMMU_PASID_CONFIG_ABORT)
>> +return -EINVAL;
>> +
>> +return domain->ops->attach_pasid_table(domain, _table_data);
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_uapi_attach_pasid_table);
>> +
>> +void iommu_detach_pasid_table(struct iommu_domain *domain)
>> +{
>> +if (unlikely(!domain->ops->detach_pasid_table))
>> +return;
>> +
>> +domain->ops->detach_pasid_table(domain);
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_detach_pasid_table);
>> +
>>  static void __iommu_detach_device(struct iommu_domain *domain,
>>struct device *dev)
>>  {
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index b95a6f8db6ff..464fcbecf841 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -223,6 +223,8 @@ struct iommu_iotlb_gather {
>>   * @cache_invalidate: invalidate translation caches
>>   * @sva_bind_gpasid: bind guest pasid and mm
>>   * @sva_unbind_gpasid: unbind guest pasid and mm
>> + * @attach_pasid_table: attach a pasid table
>> + * 

Re: [PATCH 6/6] media: uvcvideo: Use dma_alloc_noncontiguos API

2021-02-01 Thread Christoph Hellwig
On Thu, Jan 28, 2021 at 06:00:57PM +0100, Ricardo Ribalda wrote:
> > Given that we vmap the addresses this also needs
> > flush_kernel_vmap_range / invalidate_kernel_vmap_range calls for
> > VIVT architectures.
> 
> We only read from the device to the cpu. Then can we run only
> invalidate_kernel_vmap_range() ?
> 
> something like ?
> else {
>   dma_sync_sgtable_for_cpu(dma_dev, uvc_urb->sgt, DMA_FROM_DEVICE);
>invalidate_kernel_vmap_range(uvc_urb->buffer,
> uvc_urb->stream->urb_size );
> }

Yes.  Right now we don't have a proper state machine for the
*_kernel_vmap_range, but we should probably add one once usage of this
grows.  Until then I need to respin my API patch to document how callers
need to use *_kernel_vmap_range, as well as adding the so far missing
dma-debug support.

As we're getting toward the end of the merge window I'll try to get this
done ASAP.

How should we plan to merge this code?  Do you have a tree you'd like
to pick up the whole thing for?  Or should I create a dma-mapping
tree branch that can be pulled in?
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/3] iommu/io-pgtable-arm: Add IOMMU_LLC page protection flag

2021-02-01 Thread Rob Clark
On Mon, Feb 1, 2021 at 3:16 AM Will Deacon  wrote:
>
> On Fri, Jan 29, 2021 at 03:12:59PM +0530, Sai Prakash Ranjan wrote:
> > On 2021-01-29 14:35, Will Deacon wrote:
> > > On Mon, Jan 11, 2021 at 07:45:04PM +0530, Sai Prakash Ranjan wrote:
> > > > Add a new page protection flag IOMMU_LLC which can be used
> > > > by non-coherent masters to set cacheable memory attributes
> > > > for an outer level of cache called as last-level cache or
> > > > system cache. Initial user of this page protection flag is
> > > > the adreno gpu and then can later be used by other clients
> > > > such as video where this can be used for per-buffer based
> > > > mapping.
> > > >
> > > > Signed-off-by: Sai Prakash Ranjan 
> > > > ---
> > > >  drivers/iommu/io-pgtable-arm.c | 3 +++
> > > >  include/linux/iommu.h  | 6 ++
> > > >  2 files changed, 9 insertions(+)
> > > >
> > > > diff --git a/drivers/iommu/io-pgtable-arm.c
> > > > b/drivers/iommu/io-pgtable-arm.c
> > > > index 7439ee7fdcdb..ebe653ef601b 100644
> > > > --- a/drivers/iommu/io-pgtable-arm.c
> > > > +++ b/drivers/iommu/io-pgtable-arm.c
> > > > @@ -415,6 +415,9 @@ static arm_lpae_iopte
> > > > arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
> > > >   else if (prot & IOMMU_CACHE)
> > > >   pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE
> > > >   << ARM_LPAE_PTE_ATTRINDX_SHIFT);
> > > > + else if (prot & IOMMU_LLC)
> > > > + pte |= (ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE
> > > > + << ARM_LPAE_PTE_ATTRINDX_SHIFT);
> > > >   }
> > > >
> > > >   if (prot & IOMMU_CACHE)
> > > > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > > > index ffaa389ea128..1f82057df531 100644
> > > > --- a/include/linux/iommu.h
> > > > +++ b/include/linux/iommu.h
> > > > @@ -31,6 +31,12 @@
> > > >   * if the IOMMU page table format is equivalent.
> > > >   */
> > > >  #define IOMMU_PRIV   (1 << 5)
> > > > +/*
> > > > + * Non-coherent masters can use this page protection flag to set
> > > > cacheable
> > > > + * memory attributes for only a transparent outer level of cache,
> > > > also known as
> > > > + * the last-level or system cache.
> > > > + */
> > > > +#define IOMMU_LLC(1 << 6)
> > >
> > > On reflection, I'm a bit worried about exposing this because I think it
> > > will
> > > introduce a mismatched virtual alias with the CPU (we don't even have a
> > > MAIR
> > > set up for this memory type). Now, we also have that issue for the PTW,
> > > but
> > > since we always use cache maintenance (i.e. the streaming API) for
> > > publishing the page-tables to a non-coheren walker, it works out.
> > > However,
> > > if somebody expects IOMMU_LLC to be coherent with a DMA API coherent
> > > allocation, then they're potentially in for a nasty surprise due to the
> > > mismatched outer-cacheability attributes.
> > >
> >
> > Can't we add the syscached memory type similar to what is done on android?
>
> Maybe. How does the GPU driver map these things on the CPU side?

Currently we use writecombine mappings for everything, although there
are some cases that we'd like to use cached (but have not merged
patches that would give userspace a way to flush/invalidate)

BR,
-R
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 0/1] perf/smmuv3: Don't reserve the PMCG register spaces

2021-02-01 Thread Will Deacon
On Mon, Feb 01, 2021 at 09:27:49PM +0800, Zhen Lei wrote:
> v4 --> v5:
> 1. Give up doing the mapping for the entire SMMU register space.
> 2. Fix some compile warnings. Sorry. So sorry.

That's alright, these things happen. However, this came in slightly too
late for 5.12, so please resend at -rc1 and we'll aim for 5.13.

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[GIT PULL] iommu/arm-smmu: Updates for 5.12

2021-02-01 Thread Will Deacon
Hi Joerg,

Please pull these Arm SMMU updates for 5.12. The biggest thing here isn't
actually Arm SMMU-related at all, but is the addition of a new driver for
the MT8192 IOMMU. I've included it here because it ended up touching
quite a bit of the io-pgtable code.

Please note that I've based this branch on the "iommu-fixes" tag I sent for
-rc4 to avoid conflicts with upstream.

Cheers,

Will

--->8

The following changes since commit 694a1c0adebee9152a9ba0320468f7921aca647d:

  iommu/vt-d: Fix duplicate included linux/dma-map-ops.h (2021-01-12 16:56:20 
+)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git 
tags/arm-smmu-updates

for you to fetch changes up to 7060377ce06f9cd3ed6274c0f2310463feb5baec:

  Merge branch 'for-joerg/mtk' into for-joerg/arm-smmu/updates (2021-02-01 
12:59:28 +)


Arm SMMU updates for 5.12

- Support for MT8192 IOMMU from Mediatek

- Arm v7s io-pgtable extensions for MT8192

- Removal of TLBI_ON_MAP quirk

- New Qualcomm compatible strings

- Allow SVA without hardware broadcast TLB maintenance on SMMUv3

- Virtualization Host Extension support for SMMUv3 (SVA)

- Allow SMMUv3 PMU (perf) driver to be built independently from IOMMU

- Misc cleanups


Bjorn Andersson (2):
  dt-bindings: arm-smmu-qcom: Add Qualcomm SC8180X compatible
  iommu/arm-smmu-qcom: Add Qualcomm SC8180X impl

Isaac J. Manjarres (1):
  iommu/arm-smmu-qcom: Fix mask extraction for bootloader programmed SMRs

Jean-Philippe Brucker (3):
  iommu/arm-smmu-v3: Split arm_smmu_tlb_inv_range()
  iommu/arm-smmu-v3: Make BTM optional for SVA
  iommu/arm-smmu-v3: Add support for VHE

John Garry (1):
  driver/perf: Remove ARM_SMMU_V3_PMU dependency on ARM_SMMU_V3

Robin Murphy (3):
  iommu/arm-smmu-v3: Remove the page 1 fixup
  iommu/msm: Hook up iotlb_sync_map
  iommu/io-pgtable: Remove TLBI_ON_MAP quirk

Vinod Koul (2):
  dt-bindings: arm-smmu: Add sm8350 compatible string
  iommu: arm-smmu-impl: Add SM8350 qcom iommu implementation

Will Deacon (1):
  Merge branch 'for-joerg/mtk' into for-joerg/arm-smmu/updates

Yong Wu (38):
  iommu: Move iotlb_sync_map out from __iommu_map
  iommu: Add iova and size as parameters in iotlb_sync_map
  iommu/mediatek: Add iotlb_sync_map to sync whole the iova range
  iommu: Switch gather->end to the inclusive end
  iommu/io-pgtable: Allow io_pgtable_tlb ops optional
  iommu/mediatek: Gather iova in iommu_unmap to achieve tlb sync once
  iommu/mediatek: Remove the tlb-ops for v7s
  dt-bindings: iommu: mediatek: Convert IOMMU to DT schema
  dt-bindings: memory: mediatek: Add a common memory header file
  dt-bindings: memory: mediatek: Extend LARB_NR_MAX to 32
  dt-bindings: memory: mediatek: Rename header guard for SMI header file
  dt-bindings: mediatek: Add binding for mt8192 IOMMU
  iommu/mediatek: Use the common mtk-memory-port.h
  iommu/io-pgtable-arm-v7s: Use ias to check the valid iova in unmap
  iommu/io-pgtable-arm-v7s: Extend PA34 for MediaTek
  iommu/io-pgtable-arm-v7s: Clarify LVL_SHIFT/BITS macro
  iommu/io-pgtable-arm-v7s: Add cfg as a param in some macros
  iommu/io-pgtable-arm-v7s: Quad lvl1 pgtable for MediaTek
  iommu/mediatek: Add a flag for iova 34bits case
  iommu/mediatek: Update oas for v7s
  iommu/mediatek: Move hw_init into attach_device
  iommu/mediatek: Add error handle for mtk_iommu_probe
  iommu/mediatek: Add device link for smi-common and m4u
  iommu/mediatek: Add pm runtime callback
  iommu/mediatek: Add power-domain operation
  iommu/mediatek: Support up to 34bit iova in tlb flush
  iommu/mediatek: Support report iova 34bit translation fault in ISR
  iommu/mediatek: Adjust the structure
  iommu/mediatek: Move domain_finalise into attach_device
  iommu/mediatek: Move geometry.aperture updating into domain_finalise
  iommu/mediatek: Add iova_region structure
  iommu/mediatek: Add get_domain_id from dev->dma_range_map
  iommu/mediatek: Support for multi domains
  iommu/mediatek: Add iova reserved function
  iommu/mediatek: Support master use iova over 32bit
  iommu/mediatek: Remove unnecessary check in attach_device
  iommu/mediatek: Add mt8192 support
  MAINTAINERS: Add entry for MediaTek IOMMU

Zhen Lei (1):
  iommu/arm-smmu-v3: Use DEFINE_RES_MEM() to simplify code

 .../devicetree/bindings/iommu/arm,smmu.yaml|   2 +
 .../devicetree/bindings/iommu/mediatek,iommu.txt   | 105 --
 .../devicetree/bindings/iommu/mediatek,iommu.yaml  | 183 +
 MAINTAINERS|   9 +
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c|  10 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c| 154 
 

Re: [PATCH v12 10/10] iommu/arm-smmu-v3: Add stall support for platform devices

2021-02-01 Thread Jean-Philippe Brucker
On Mon, Feb 01, 2021 at 02:16:16PM +0100, Auger Eric wrote:
> >>> + flt->type = IOMMU_FAULT_DMA_UNRECOV;
> >>> + flt->event = (struct iommu_fault_unrecoverable) {
> >>> + .reason = reason,
> >>> + .flags = IOMMU_FAULT_UNRECOV_ADDR_VALID |
> >>> +  IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID,
> >> nit: shall IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID be set here? Supported
> >> unrecoverable faults feature the IPA field which is UNKNOWN for S1
> >> translations. fetch_addr rather was
> >> corresponding to WALK_EABT.Fetch_addr to me.
> > 
> > Right I should drop the IPA part entirely, since we don't report S2 faults
> > in this patch.
> OK
> 
> But as I mentioned this can be fixed separately if you don't have other
> comments on this version.

Thanks, I need to resend anyway to fix patch 7.

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v13 03/15] iommu/arm-smmu-v3: Maintain a SID->device structure

2021-02-01 Thread Jean-Philippe Brucker
On Mon, Feb 01, 2021 at 08:26:41PM +0800, Keqian Zhu wrote:
> > +static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
> > + struct arm_smmu_master *master)
> > +{
> > +   int i;
> > +   int ret = 0;
> > +   struct arm_smmu_stream *new_stream, *cur_stream;
> > +   struct rb_node **new_node, *parent_node = NULL;
> > +   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev);
> > +
> > +   master->streams = kcalloc(fwspec->num_ids,
> > + sizeof(struct arm_smmu_stream), GFP_KERNEL);
> > +   if (!master->streams)
> > +   return -ENOMEM;
> > +   master->num_streams = fwspec->num_ids;
> This is not roll-backed when fail.

No need, the caller frees master

> > +
> > +   mutex_lock(>streams_mutex);
> > +   for (i = 0; i < fwspec->num_ids && !ret; i++) {
> Check ret at here, makes it hard to decide the start index of rollback.
> 
> If we fail at here, then start index is (i-2).
> If we fail in the loop, then start index is (i-1).
> 
[...]
> > +   if (ret) {
> > +   for (; i > 0; i--)
> should be (i >= 0)?
> And the start index seems not correct.

Indeed, this whole bit is wrong. I'll fix it while resending the IOPF
series.

Thanks,
Jean

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v6 00/33] MT8192 IOMMU support

2021-02-01 Thread Will Deacon
On Mon, Jan 11, 2021 at 07:18:41PM +0800, Yong Wu wrote:
> This patch mainly adds support for mt8192 Multimedia IOMMU and SMI.
> 
> mt8192 also is MTK IOMMU gen2 which uses ARM Short-Descriptor translation
> table format. The M4U-SMI HW diagram is as below:
> 
>   EMI
>|
>   M4U
>|
>   
>SMI Common
>   
>|
>   +---+--+--+--+---+
>   |   |  |  |   .. |   |
>   |   |  |  |  |   |
> larb0   larb1  larb2  larb4 ..  larb19   larb20
> disp0   disp1   mdpvdec   IPE  IPE
> 
> All the connections are HW fixed, SW can NOT adjust it.
> 
> Comparing with the preview SoC, this patchset mainly adds two new functions:
> a) add iova 34 bits support.
> b) add multi domains support since several HW has the special iova
> region requirement.
> 
> change note:
> v6:a) base on v5.11-rc1. and tlb v4:
>   
> https://lore.kernel.org/linux-mediatek/20210107122909.16317-1-yong...@mediatek.com/T/#t
>  

I've queued this up apart from patches 6 and 7.

Thanks,

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 0/1] perf/smmuv3: Don't reserve the PMCG register spaces

2021-02-01 Thread Zhen Lei
v4 --> v5:
1. Give up doing the mapping for the entire SMMU register space.
2. Fix some compile warnings. Sorry. So sorry.

v3 --> v4:
1. Delete the unnecessary encapsulation function 
smmu_pmu_get_and_ioremap_resource().
2. Discard adding MODULE_SOFTDEP.

v2 --> v3:
Patch 3 is updated because https://lkml.org/lkml/2021/1/22/532 has been queued 
in advance.

v1 --> v2:
According to Robin Murphy's suggestion: https://lkml.org/lkml/2021/1/20/470
Don't reserve the PMCG register spaces, and reserve the entire SMMU register 
space.

v1:
Since the PMCG may implement its resigters space(4KB Page0 and 4KB Page1)
within the SMMUv3 64KB Page0. In this case, when the SMMUv3 driver reserves the
64KB Page0 resource in advance, the PMCG driver try to reserve its Page0 and
Page1 resources, a resource conflict occurs.

commit 52f3fab0067d6fa ("iommu/arm-smmu-v3: Don't reserve implementation
defined register space") reduce the resource reservation range of the SMMUv3
driver, it only reserves the first 0xe00 bytes in the 64KB Page0, to avoid
the above-mentioned resource conflicts.

But the SMMUv3.3 add support for ECMDQ, its registers space is also implemented
in the SMMUv3 64KB Page0. This means we need to build two separate mappings.
New features may be added in the future, and more independent mappings may be
required. The simple problem is complicated because the user expects to map the
entire SMMUv3 64KB Page0.

Therefore, the proper solution is: If the PMCG register resources are located in
the 64KB Page0 of the SMMU, the PMCG driver does not reserve the conflict 
resources
when the SMMUv3 driver has reserved the conflict resources before. Instead, the 
PMCG
driver only performs devm_ioremap() to ensure that it can work properly.

Zhen Lei (1):
  perf/smmuv3: Don't reserve the PMCG register spaces

 drivers/perf/arm_smmuv3_pmu.c | 25 +++--
 1 file changed, 19 insertions(+), 6 deletions(-)

-- 
2.26.0.106.g9fadedd


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 1/1] perf/smmuv3: Don't reserve the PMCG register spaces

2021-02-01 Thread Zhen Lei
According to the SMMUv3 specification:
Each PMCG counter group is represented by one 4KB page (Page 0) with one
optional additional 4KB page (Page 1), both of which are at IMPLEMENTATION
DEFINED base addresses.

This means that the PMCG register spaces may be within the 64KB pages of
the SMMUv3 register space. When both the SMMU and PMCG drivers reserve
their own resources, a resource conflict occurs.

To avoid this conflict, don't reserve the PMCG regions.

Suggested-by: Robin Murphy 
Signed-off-by: Zhen Lei 
Reviewed-by: Robin Murphy 
---
 drivers/perf/arm_smmuv3_pmu.c | 25 +++--
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
index 74474bb322c3f26..8f0b71b5d08a815 100644
--- a/drivers/perf/arm_smmuv3_pmu.c
+++ b/drivers/perf/arm_smmuv3_pmu.c
@@ -793,17 +793,30 @@ static int smmu_pmu_probe(struct platform_device *pdev)
.capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
};
 
-   smmu_pmu->reg_base = devm_platform_get_and_ioremap_resource(pdev, 0, 
_0);
-   if (IS_ERR(smmu_pmu->reg_base))
-   return PTR_ERR(smmu_pmu->reg_base);
+   /*
+* The register spaces of the PMCG may be in the register space of
+* other devices. For example, SMMU. Therefore, the PMCG resources are
+* not reserved to avoid resource conflicts with other drivers.
+*/
+   res_0 = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+   if (!res_0)
+   return -EINVAL;
+   smmu_pmu->reg_base = devm_ioremap(dev, res_0->start, 
resource_size(res_0));
+   if (!smmu_pmu->reg_base)
+   return -ENOMEM;
 
cfgr = readl_relaxed(smmu_pmu->reg_base + SMMU_PMCG_CFGR);
 
/* Determine if page 1 is present */
if (cfgr & SMMU_PMCG_CFGR_RELOC_CTRS) {
-   smmu_pmu->reloc_base = devm_platform_ioremap_resource(pdev, 1);
-   if (IS_ERR(smmu_pmu->reloc_base))
-   return PTR_ERR(smmu_pmu->reloc_base);
+   struct resource *res_1;
+
+   res_1 = platform_get_resource(pdev, IORESOURCE_MEM, 1);
+   if (!res_1)
+   return -EINVAL;
+   smmu_pmu->reloc_base = devm_ioremap(dev, res_1->start, 
resource_size(res_1));
+   if (!smmu_pmu->reloc_base)
+   return -ENOMEM;
} else {
smmu_pmu->reloc_base = smmu_pmu->reg_base;
}
-- 
2.26.0.106.g9fadedd


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v12 10/10] iommu/arm-smmu-v3: Add stall support for platform devices

2021-02-01 Thread Auger Eric
Hi Jean,

On 2/1/21 12:12 PM, Jean-Philippe Brucker wrote:
> On Sun, Jan 31, 2021 at 07:29:09PM +0100, Auger Eric wrote:
>> Hi Jean,
>>
>> Some rather minor comments§questions below that may not justify a respin.
>>
>> On 1/27/21 4:43 PM, Jean-Philippe Brucker wrote:
>>> -static bool arm_smmu_iopf_supported(struct arm_smmu_master *master)
>>> +bool arm_smmu_master_iopf_supported(struct arm_smmu_master *master)
>>>  {
>>> -   return false;
>>> +   /* We're not keeping track of SIDs in fault events */
>> shall we? [*] below
> 
> That would require storing the incoming SID into the iommu_fault_event
> struct, and retrieve it in arm_smmu_page_response(). Easy enough, but I
> don't think it's needed for existing devices.
OK
> 
>>> +   if (master->num_streams != 1)
>>> +   return false;
> [...]
>>> +static int arm_smmu_page_response(struct device *dev,
>>> + struct iommu_fault_event *unused,
>>> + struct iommu_page_response *resp)
>>> +{
>>> +   struct arm_smmu_cmdq_ent cmd = {0};
>>> +   struct arm_smmu_master *master = dev_iommu_priv_get(dev);
>>> +   int sid = master->streams[0].id;
>> [*]
>>> +
>>> +   if (master->stall_enabled) {
>>> +   cmd.opcode  = CMDQ_OP_RESUME;
>>> +   cmd.resume.sid  = sid;
>>> +   cmd.resume.stag = resp->grpid;
>>> +   switch (resp->code) {
>>> +   case IOMMU_PAGE_RESP_INVALID:
>> add fallthrough?
> 
> I think fallthrough is mainly useful to tell reader and compiler that a
> break was omitted on purpose. When two cases are stuck together the intent
> to merge the flow is clear enough in my opinion. GCC's
> -Wimplicit-fallthrough doesn't warn in this case.
OK
> 
>>> +   case IOMMU_PAGE_RESP_FAILURE:
>>> +   cmd.resume.resp = CMDQ_RESUME_0_RESP_ABORT;
>>> +   break;
> [...]
>>> +static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt)
>>> +{
>>> +   int ret;
>>> +   u32 reason;
>>> +   u32 perm = 0;
>>> +   struct arm_smmu_master *master;
>>> +   bool ssid_valid = evt[0] & EVTQ_0_SSV;
>>> +   u32 sid = FIELD_GET(EVTQ_0_SID, evt[0]);
>>> +   struct iommu_fault_event fault_evt = { };
>>> +   struct iommu_fault *flt = _evt.fault;
>>> +
>>> +   /* Stage-2 is always pinned at the moment */
>>> +   if (evt[1] & EVTQ_1_S2)
>>> +   return -EFAULT;
>>> +
>>> +   master = arm_smmu_find_master(smmu, sid);
>>> +   if (!master)
>>> +   return -EINVAL;
>>> +
>>> +   if (evt[1] & EVTQ_1_RnW)
>>> +   perm |= IOMMU_FAULT_PERM_READ;
>>> +   else
>>> +   perm |= IOMMU_FAULT_PERM_WRITE;
>>> +
>>> +   if (evt[1] & EVTQ_1_InD)
>>> +   perm |= IOMMU_FAULT_PERM_EXEC;
>>> +
>>> +   if (evt[1] & EVTQ_1_PnU)
>>> +   perm |= IOMMU_FAULT_PERM_PRIV;
>>> +
>>> +   switch (FIELD_GET(EVTQ_0_ID, evt[0])) {
>>> +   case EVT_ID_TRANSLATION_FAULT:
>>> +   case EVT_ID_ADDR_SIZE_FAULT:
>>> +   case EVT_ID_ACCESS_FAULT:
>>> +   reason = IOMMU_FAULT_REASON_PTE_FETCH;
>> Doesn't it rather map to IOMMU_FAULT_REASON_ACCESS?
>> /* access flag check failed */"
> 
> Good point, I guess it didn't exist when I wrote this. And ADDR_SIZE_FAULT
> corresponds to IOMMU_FAULT_REASON_OOR_ADDRESS now, right?
yes it dies
> 
> By the way the wording on those two fault reasons, "access flag" and
> "stage", seems arch-specific - x86 names are "accessed flag" and "level".
> 
>>> +   break;
>>> +   case EVT_ID_PERMISSION_FAULT:
>>> +   reason = IOMMU_FAULT_REASON_PERMISSION;
>>> +   break;
>>> +   default:
>>> +   return -EOPNOTSUPP;
>>> +   }
>>> +
>>> +   if (evt[1] & EVTQ_1_STALL) {
>>> +   flt->type = IOMMU_FAULT_PAGE_REQ;
>>> +   flt->prm = (struct iommu_fault_page_request) {
>>> +   .flags = IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE,
>>> +   .grpid = FIELD_GET(EVTQ_1_STAG, evt[1]),
>>> +   .perm = perm,
>>> +   .addr = FIELD_GET(EVTQ_2_ADDR, evt[2]),
>>> +   };
>>> +
>>> +   if (ssid_valid) {
>>> +   flt->prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
>>> +   flt->prm.pasid = FIELD_GET(EVTQ_0_SSID, evt[0]);
>>> +   }
>>> +   } else {
>>> +   flt->type = IOMMU_FAULT_DMA_UNRECOV;
>>> +   flt->event = (struct iommu_fault_unrecoverable) {
>>> +   .reason = reason,
>>> +   .flags = IOMMU_FAULT_UNRECOV_ADDR_VALID |
>>> +IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID,
>> nit: shall IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID be set here? Supported
>> unrecoverable faults feature the IPA field which is UNKNOWN for S1
>> translations. fetch_addr rather was
>> corresponding to WALK_EABT.Fetch_addr to me.
> 
> Right I should drop the IPA part entirely, since we don't report S2 faults
> in this patch.
OK

But as I mentioned this can be fixed separately if you don't have other
comments on this 

Re: [PATCH v4 1/2] perf/smmuv3: Don't reserve the PMCG register spaces

2021-02-01 Thread Leizhen (ThunderTown)


On 2021/2/1 20:54, Will Deacon wrote:
> On Sat, Jan 30, 2021 at 03:14:13PM +0800, Zhen Lei wrote:
>> According to the SMMUv3 specification:
>> Each PMCG counter group is represented by one 4KB page (Page 0) with one
>> optional additional 4KB page (Page 1), both of which are at IMPLEMENTATION
>> DEFINED base addresses.
>>
>> This means that the PMCG register spaces may be within the 64KB pages of
>> the SMMUv3 register space. When both the SMMU and PMCG drivers reserve
>> their own resources, a resource conflict occurs.
>>
>> To avoid this conflict, don't reserve the PMCG regions.
>>
>> Suggested-by: Robin Murphy 
>> Signed-off-by: Zhen Lei 
>> ---
>>  drivers/perf/arm_smmuv3_pmu.c | 25 +++--
>>  1 file changed, 19 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
>> index 74474bb322c3f26..5e894f957c7b935 100644
>> --- a/drivers/perf/arm_smmuv3_pmu.c
>> +++ b/drivers/perf/arm_smmuv3_pmu.c
>> @@ -793,17 +793,30 @@ static int smmu_pmu_probe(struct platform_device *pdev)
>>  .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
>>  };
>>  
>> -smmu_pmu->reg_base = devm_platform_get_and_ioremap_resource(pdev, 0, 
>> _0);
>> -if (IS_ERR(smmu_pmu->reg_base))
>> -return PTR_ERR(smmu_pmu->reg_base);
>> +/*
>> + * The register spaces of the PMCG may be in the register space of
>> + * other devices. For example, SMMU. Therefore, the PMCG resources are
>> + * not reserved to avoid resource conflicts with other drivers.
>> + */
>> +res_0 = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>> +if (!res_0)
>> +return ERR_PTR(-EINVAL);
> 
> I tried to apply this, but you've got your return type in a muddle:

I'm dizzy. I don't know how this bug patch came out. I just pinched my leg, 
like I'm still in the real world.

The "ERR_PTR()" of the four ERR_PTR(xxx) should be removed. Can you help me? Or 
I send a new one.

> 
> @@ @@
> +drivers/perf/arm_smmuv3_pmu.c: In function ‘smmu_pmu_probe’:
> +drivers/perf/arm_smmuv3_pmu.c:803:10: warning: returning ‘void *’ from a 
> function with return type ‘int’ makes integer from pointer without a cast 
> [-Wint-conversion]
> +  803 |   return ERR_PTR(-EINVAL);
> +  |  ^~~~
> +drivers/perf/arm_smmuv3_pmu.c:803:31: warning: incorrect type in return 
> expression (different base types) [sparse]
> +drivers/perf/arm_smmuv3_pmu.c:803:31:expected int [sparse]
> +drivers/perf/arm_smmuv3_pmu.c:803:31:got void * [sparse]
> +drivers/perf/arm_smmuv3_pmu.c:806:10: warning: returning ‘void *’ from a 
> function with return type ‘int’ makes integer from pointer without a cast 
> [-Wint-conversion]
> +  806 |   return ERR_PTR(-ENOMEM);
> +  |  ^~~~
> +drivers/perf/arm_smmuv3_pmu.c:806:31: warning: incorrect type in return 
> expression (different base types) [sparse]
> +drivers/perf/arm_smmuv3_pmu.c:806:31:expected int [sparse]
> +drivers/perf/arm_smmuv3_pmu.c:806:31:got void * [sparse]
> +drivers/perf/arm_smmuv3_pmu.c:816:11: warning: returning ‘void *’ from a 
> function with return type ‘int’ makes integer from pointer without a cast 
> [-Wint-conversion]
> +  816 |return ERR_PTR(-EINVAL);
> +  |   ^~~~
> +drivers/perf/arm_smmuv3_pmu.c:816:39: warning: incorrect type in return 
> expression (different base types) [sparse]
> +drivers/perf/arm_smmuv3_pmu.c:816:39:expected int [sparse]
> +drivers/perf/arm_smmuv3_pmu.c:816:39:got void * [sparse]
> +drivers/perf/arm_smmuv3_pmu.c:819:11: warning: returning ‘void *’ from a 
> function with return type ‘int’ makes integer from pointer without a cast 
> [-Wint-conversion]
> +  819 |return ERR_PTR(-ENOMEM);
> +  |   ^~~~
> +drivers/perf/arm_smmuv3_pmu.c:819:39: warning: incorrect type in return 
> expression (different base types) [sparse]
> +drivers/perf/arm_smmuv3_pmu.c:819:39:expected int [sparse]
> +drivers/perf/arm_smmuv3_pmu.c:819:39:got void * [sparse]
> 
> Will
> 
> .
> 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v4 1/2] perf/smmuv3: Don't reserve the PMCG register spaces

2021-02-01 Thread Will Deacon
On Sat, Jan 30, 2021 at 03:14:13PM +0800, Zhen Lei wrote:
> According to the SMMUv3 specification:
> Each PMCG counter group is represented by one 4KB page (Page 0) with one
> optional additional 4KB page (Page 1), both of which are at IMPLEMENTATION
> DEFINED base addresses.
> 
> This means that the PMCG register spaces may be within the 64KB pages of
> the SMMUv3 register space. When both the SMMU and PMCG drivers reserve
> their own resources, a resource conflict occurs.
> 
> To avoid this conflict, don't reserve the PMCG regions.
> 
> Suggested-by: Robin Murphy 
> Signed-off-by: Zhen Lei 
> ---
>  drivers/perf/arm_smmuv3_pmu.c | 25 +++--
>  1 file changed, 19 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
> index 74474bb322c3f26..5e894f957c7b935 100644
> --- a/drivers/perf/arm_smmuv3_pmu.c
> +++ b/drivers/perf/arm_smmuv3_pmu.c
> @@ -793,17 +793,30 @@ static int smmu_pmu_probe(struct platform_device *pdev)
>   .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
>   };
>  
> - smmu_pmu->reg_base = devm_platform_get_and_ioremap_resource(pdev, 0, 
> _0);
> - if (IS_ERR(smmu_pmu->reg_base))
> - return PTR_ERR(smmu_pmu->reg_base);
> + /*
> +  * The register spaces of the PMCG may be in the register space of
> +  * other devices. For example, SMMU. Therefore, the PMCG resources are
> +  * not reserved to avoid resource conflicts with other drivers.
> +  */
> + res_0 = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> + if (!res_0)
> + return ERR_PTR(-EINVAL);

I tried to apply this, but you've got your return type in a muddle:

@@ @@
+drivers/perf/arm_smmuv3_pmu.c: In function ‘smmu_pmu_probe’:
+drivers/perf/arm_smmuv3_pmu.c:803:10: warning: returning ‘void *’ from a 
function with return type ‘int’ makes integer from pointer without a cast 
[-Wint-conversion]
+  803 |   return ERR_PTR(-EINVAL);
+  |  ^~~~
+drivers/perf/arm_smmuv3_pmu.c:803:31: warning: incorrect type in return 
expression (different base types) [sparse]
+drivers/perf/arm_smmuv3_pmu.c:803:31:expected int [sparse]
+drivers/perf/arm_smmuv3_pmu.c:803:31:got void * [sparse]
+drivers/perf/arm_smmuv3_pmu.c:806:10: warning: returning ‘void *’ from a 
function with return type ‘int’ makes integer from pointer without a cast 
[-Wint-conversion]
+  806 |   return ERR_PTR(-ENOMEM);
+  |  ^~~~
+drivers/perf/arm_smmuv3_pmu.c:806:31: warning: incorrect type in return 
expression (different base types) [sparse]
+drivers/perf/arm_smmuv3_pmu.c:806:31:expected int [sparse]
+drivers/perf/arm_smmuv3_pmu.c:806:31:got void * [sparse]
+drivers/perf/arm_smmuv3_pmu.c:816:11: warning: returning ‘void *’ from a 
function with return type ‘int’ makes integer from pointer without a cast 
[-Wint-conversion]
+  816 |return ERR_PTR(-EINVAL);
+  |   ^~~~
+drivers/perf/arm_smmuv3_pmu.c:816:39: warning: incorrect type in return 
expression (different base types) [sparse]
+drivers/perf/arm_smmuv3_pmu.c:816:39:expected int [sparse]
+drivers/perf/arm_smmuv3_pmu.c:816:39:got void * [sparse]
+drivers/perf/arm_smmuv3_pmu.c:819:11: warning: returning ‘void *’ from a 
function with return type ‘int’ makes integer from pointer without a cast 
[-Wint-conversion]
+  819 |return ERR_PTR(-ENOMEM);
+  |   ^~~~
+drivers/perf/arm_smmuv3_pmu.c:819:39: warning: incorrect type in return 
expression (different base types) [sparse]
+drivers/perf/arm_smmuv3_pmu.c:819:39:expected int [sparse]
+drivers/perf/arm_smmuv3_pmu.c:819:39:got void * [sparse]

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v12 10/10] iommu/arm-smmu-v3: Add stall support for platform devices

2021-02-01 Thread Zhou Wang
On 2021/2/1 19:14, Jean-Philippe Brucker wrote:
> Hi Zhou,
> 
> On Mon, Feb 01, 2021 at 09:18:42AM +0800, Zhou Wang wrote:
>>> @@ -1033,8 +1076,7 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_domain 
>>> *smmu_domain, int ssid,
>>> FIELD_PREP(CTXDESC_CD_0_ASID, cd->asid) |
>>> CTXDESC_CD_0_V;
>>>  
>>> -   /* STALL_MODEL==0b10 && CD.S==0 is ILLEGAL */
>>> -   if (smmu->features & ARM_SMMU_FEAT_STALL_FORCE)
>>> +   if (smmu_domain->stall_enabled)
>>
>> Could we add ssid checking here? like: if (smmu_domain->stall_enabled && 
>> ssid).
>> The reason is if not CD.S will also be set when ssid is 0, which is not 
>> needed.
> 
> Some drivers may want to get stall events on SSID 0:
> https://lore.kernel.org/kvm/20210125090402.1429-1-lushenm...@huawei.com/#t

Hi Jean,

I did not notice this before. Yes, if we need to support IOPF in vfio, we
need enable stall events on SSID 0.

> 
> Are you seeing an issue with stall events on ssid 0?  Normally there
> shouldn't be any fault on this context, but if they happen and no handler

It happened that there is bug in the test code of HiSilicon HPRE crypto driver.
We used a wrong iova which triggered a SMMU event with stall bit :)

> is registered, the SMMU driver will just abort them and report them like a
> non-stall event.

It will report event 0x10 with stall bit.

Best,
Zhou

> 
> Thanks,
> Jean
> 
> .
> 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2] iommu: Check dev->iommu in iommu_dev_xxx functions

2021-02-01 Thread Shameer Kolothum
The device iommu probe/attach might have failed leaving dev->iommu
to NULL and device drivers may still invoke these functions resulting
in a crash in iommu vendor driver code. Hence make sure we check that.

Also added iommu_ops to the "struct dev_iommu" and set it if the dev
is successfully associated with an iommu. 

Fixes: a3a195929d40 ("iommu: Add APIs for multiple domains per device")
Signed-off-by: Shameer Kolothum 
---
v1 --> v2:
 -Added iommu_ops to struct dev_iommu based on the discussion with Robin.
 -Rebased against iommu-tree core branch.
---
 drivers/iommu/iommu.c | 19 +++
 include/linux/iommu.h |  2 ++
 2 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index fd76e2f579fe..6023d0b7c542 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -217,6 +217,7 @@ static int __iommu_probe_device(struct device *dev, struct 
list_head *group_list
}
 
dev->iommu->iommu_dev = iommu_dev;
+   dev->iommu->ops = iommu_dev->ops;
 
group = iommu_group_get_for_dev(dev);
if (IS_ERR(group)) {
@@ -2865,10 +2866,8 @@ EXPORT_SYMBOL_GPL(iommu_fwspec_add_ids);
  */
 int iommu_dev_enable_feature(struct device *dev, enum iommu_dev_features feat)
 {
-   const struct iommu_ops *ops = dev->bus->iommu_ops;
-
-   if (ops && ops->dev_enable_feat)
-   return ops->dev_enable_feat(dev, feat);
+   if (dev->iommu && dev->iommu->ops->dev_enable_feat)
+   return dev->iommu->ops->dev_enable_feat(dev, feat);
 
return -ENODEV;
 }
@@ -2881,10 +2880,8 @@ EXPORT_SYMBOL_GPL(iommu_dev_enable_feature);
  */
 int iommu_dev_disable_feature(struct device *dev, enum iommu_dev_features feat)
 {
-   const struct iommu_ops *ops = dev->bus->iommu_ops;
-
-   if (ops && ops->dev_disable_feat)
-   return ops->dev_disable_feat(dev, feat);
+   if (dev->iommu && dev->iommu->ops->dev_disable_feat)
+   return dev->iommu->ops->dev_disable_feat(dev, feat);
 
return -EBUSY;
 }
@@ -2892,10 +2889,8 @@ EXPORT_SYMBOL_GPL(iommu_dev_disable_feature);
 
 bool iommu_dev_feature_enabled(struct device *dev, enum iommu_dev_features 
feat)
 {
-   const struct iommu_ops *ops = dev->bus->iommu_ops;
-
-   if (ops && ops->dev_feat_enabled)
-   return ops->dev_feat_enabled(dev, feat);
+   if (dev->iommu && dev->iommu->ops->dev_feat_enabled)
+   return dev->iommu->ops->dev_feat_enabled(dev, feat);
 
return false;
 }
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 524ffc2ff64f..ff0c76bdfb67 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -354,6 +354,7 @@ struct iommu_fault_param {
  * @fault_param: IOMMU detected device fault reporting data
  * @fwspec: IOMMU fwspec data
  * @iommu_dev:  IOMMU device this device is linked to
+ * @ops:iommu-ops for talking to the iommu_dev
  * @priv:   IOMMU Driver private data
  *
  * TODO: migrate other per device data pointers under iommu_dev_data, e.g.
@@ -364,6 +365,7 @@ struct dev_iommu {
struct iommu_fault_param*fault_param;
struct iommu_fwspec *fwspec;
struct iommu_device *iommu_dev;
+   const struct iommu_ops  *ops;
void*priv;
 };
 
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v13 04/15] iommu/smmuv3: Allow s1 and s2 configs to coexist

2021-02-01 Thread Keqian Zhu
Hi Eric,

On 2020/11/18 19:21, Eric Auger wrote:
> In true nested mode, both s1_cfg and s2_cfg will coexist.
> Let's remove the union and add a "set" field in each
> config structure telling whether the config is set and needs
> to be applied when writing the STE. In legacy nested mode,
> only the 2d stage is used. In true nested mode, the "set" field
nit: s/2d/2nd

> will be set when the guest passes the pasid table.
nit: ... the "set" filed of s1_cfg and s2_cfg will be set ...

> 
> Signed-off-by: Eric Auger 
> 
> ---
> v12 -> v13:
> - does not dynamically allocate s1-cfg and s2_cfg anymore. Add
>   the set field
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 43 +
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  8 ++--
>  2 files changed, 31 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 1e4acc7f3d3c..18ac5af1b284 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -1195,8 +1195,8 @@ static void arm_smmu_write_strtab_ent(struct 
> arm_smmu_master *master, u32 sid,
>   u64 val = le64_to_cpu(dst[0]);
>   bool ste_live = false;
>   struct arm_smmu_device *smmu = NULL;
> - struct arm_smmu_s1_cfg *s1_cfg = NULL;
> - struct arm_smmu_s2_cfg *s2_cfg = NULL;
> + struct arm_smmu_s1_cfg *s1_cfg;
> + struct arm_smmu_s2_cfg *s2_cfg;
>   struct arm_smmu_domain *smmu_domain = NULL;
>   struct arm_smmu_cmdq_ent prefetch_cmd = {
>   .opcode = CMDQ_OP_PREFETCH_CFG,
> @@ -1211,13 +1211,24 @@ static void arm_smmu_write_strtab_ent(struct 
> arm_smmu_master *master, u32 sid,
>   }
>  
>   if (smmu_domain) {
> + s1_cfg = _domain->s1_cfg;
> + s2_cfg = _domain->s2_cfg;
> +
>   switch (smmu_domain->stage) {
>   case ARM_SMMU_DOMAIN_S1:
> - s1_cfg = _domain->s1_cfg;
> + s1_cfg->set = true;
> + s2_cfg->set = false;
>   break;
>   case ARM_SMMU_DOMAIN_S2:
> + s1_cfg->set = false;
> + s2_cfg->set = true;
> + break;
>   case ARM_SMMU_DOMAIN_NESTED:
> - s2_cfg = _domain->s2_cfg;
> + /*
> +  * Actual usage of stage 1 depends on nested mode:
> +  * legacy (2d stage only) or true nested mode
> +  */
> + s2_cfg->set = true;
>   break;
>   default:
>   break;
> @@ -1244,7 +1255,7 @@ static void arm_smmu_write_strtab_ent(struct 
> arm_smmu_master *master, u32 sid,
>   val = STRTAB_STE_0_V;
>  
>   /* Bypass/fault */
> - if (!smmu_domain || !(s1_cfg || s2_cfg)) {
> + if (!smmu_domain || !(s1_cfg->set || s2_cfg->set)) {
>   if (!smmu_domain && disable_bypass)
>   val |= FIELD_PREP(STRTAB_STE_0_CFG, 
> STRTAB_STE_0_CFG_ABORT);
>   else
> @@ -1263,7 +1274,7 @@ static void arm_smmu_write_strtab_ent(struct 
> arm_smmu_master *master, u32 sid,
>   return;
>   }
>  
> - if (s1_cfg) {
> + if (s1_cfg->set) {
>   BUG_ON(ste_live);
>   dst[1] = cpu_to_le64(
>FIELD_PREP(STRTAB_STE_1_S1DSS, 
> STRTAB_STE_1_S1DSS_SSID0) |
> @@ -1282,7 +1293,7 @@ static void arm_smmu_write_strtab_ent(struct 
> arm_smmu_master *master, u32 sid,
>   FIELD_PREP(STRTAB_STE_0_S1FMT, s1_cfg->s1fmt);
>   }
>  
> - if (s2_cfg) {
> + if (s2_cfg->set) {
>   BUG_ON(ste_live);
>   dst[2] = cpu_to_le64(
>FIELD_PREP(STRTAB_STE_2_S2VMID, s2_cfg->vmid) |
> @@ -1846,24 +1857,24 @@ static void arm_smmu_domain_free(struct iommu_domain 
> *domain)
>  {
>   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>   struct arm_smmu_device *smmu = smmu_domain->smmu;
> + struct arm_smmu_s1_cfg *s1_cfg = _domain->s1_cfg;
> + struct arm_smmu_s2_cfg *s2_cfg = _domain->s2_cfg;
>  
>   iommu_put_dma_cookie(domain);
>   free_io_pgtable_ops(smmu_domain->pgtbl_ops);
>  
>   /* Free the CD and ASID, if we allocated them */
> - if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
> - struct arm_smmu_s1_cfg *cfg = _domain->s1_cfg;
> -
> + if (s1_cfg->set) {
>   /* Prevent SVA from touching the CD while we're freeing it */
>   mutex_lock(_smmu_asid_lock);
> - if (cfg->cdcfg.cdtab)
> + if (s1_cfg->cdcfg.cdtab)
>   arm_smmu_free_cd_tables(smmu_domain);
> - arm_smmu_free_asid(>cd);
> + arm_smmu_free_asid(_cfg->cd);
>   mutex_unlock(_smmu_asid_lock);
> - } else {
> - struct arm_smmu_s2_cfg *cfg = 

Re: [PATCH v13 03/15] iommu/arm-smmu-v3: Maintain a SID->device structure

2021-02-01 Thread Keqian Zhu
Hi Eric,

On 2020/11/18 19:21, Eric Auger wrote:
> From: Jean-Philippe Brucker 
> 
> When handling faults from the event or PRI queue, we need to find the
> struct device associated to a SID. Add a rb_tree to keep track of SIDs.
> 
> Signed-off-by: Jean-Philippe Brucker 
[...]

>  }
>  
> +static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
> +   struct arm_smmu_master *master)
> +{
> + int i;
> + int ret = 0;
> + struct arm_smmu_stream *new_stream, *cur_stream;
> + struct rb_node **new_node, *parent_node = NULL;
> + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev);
> +
> + master->streams = kcalloc(fwspec->num_ids,
> +   sizeof(struct arm_smmu_stream), GFP_KERNEL);
> + if (!master->streams)
> + return -ENOMEM;
> + master->num_streams = fwspec->num_ids;
This is not roll-backed when fail.

> +
> + mutex_lock(>streams_mutex);
> + for (i = 0; i < fwspec->num_ids && !ret; i++) {
Check ret at here, makes it hard to decide the start index of rollback.

If we fail at here, then start index is (i-2).
If we fail in the loop, then start index is (i-1).

> + u32 sid = fwspec->ids[i];
> +
> + new_stream = >streams[i];
> + new_stream->id = sid;
> + new_stream->master = master;
> +
> + /*
> +  * Check the SIDs are in range of the SMMU and our stream table
> +  */
> + if (!arm_smmu_sid_in_range(smmu, sid)) {
> + ret = -ERANGE;
> + break;
> + }
> +
> + /* Ensure l2 strtab is initialised */
> + if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) {
> + ret = arm_smmu_init_l2_strtab(smmu, sid);
> + if (ret)
> + break;
> + }
> +
> + /* Insert into SID tree */
> + new_node = &(smmu->streams.rb_node);
> + while (*new_node) {
> + cur_stream = rb_entry(*new_node, struct arm_smmu_stream,
> +   node);
> + parent_node = *new_node;
> + if (cur_stream->id > new_stream->id) {
> + new_node = &((*new_node)->rb_left);
> + } else if (cur_stream->id < new_stream->id) {
> + new_node = &((*new_node)->rb_right);
> + } else {
> + dev_warn(master->dev,
> +  "stream %u already in tree\n",
> +  cur_stream->id);
> + ret = -EINVAL;
> + break;
> + }
> + }
> +
> + if (!ret) {
> + rb_link_node(_stream->node, parent_node, new_node);
> + rb_insert_color(_stream->node, >streams);
> + }
> + }
> +
> + if (ret) {
> + for (; i > 0; i--)
should be (i >= 0)?
And the start index seems not correct.

> + rb_erase(>streams[i].node, >streams);
> + kfree(master->streams);
> + }
> + mutex_unlock(>streams_mutex);
> +
> + return ret;
> +}
> +
> +static void arm_smmu_remove_master(struct arm_smmu_master *master)
> +{
> + int i;
> + struct arm_smmu_device *smmu = master->smmu;
> + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev);
> +
> + if (!smmu || !master->streams)
> + return;
> +
> + mutex_lock(>streams_mutex);
> + for (i = 0; i < fwspec->num_ids; i++)
> + rb_erase(>streams[i].node, >streams);
> + mutex_unlock(>streams_mutex);
> +
> + kfree(master->streams);
> +}
> +
>  static struct iommu_ops arm_smmu_ops;
>  
>  static struct iommu_device *arm_smmu_probe_device(struct device *dev)
>  {
> - int i, ret;
> + int ret;
>   struct arm_smmu_device *smmu;
>   struct arm_smmu_master *master;
>   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> @@ -2331,27 +2447,12 @@ static struct iommu_device 
> *arm_smmu_probe_device(struct device *dev)
>  
>   master->dev = dev;
>   master->smmu = smmu;
> - master->sids = fwspec->ids;
> - master->num_sids = fwspec->num_ids;
>   INIT_LIST_HEAD(>bonds);
>   dev_iommu_priv_set(dev, master);
>  
> - /* Check the SIDs are in range of the SMMU and our stream table */
> - for (i = 0; i < master->num_sids; i++) {
> - u32 sid = master->sids[i];
> -
> - if (!arm_smmu_sid_in_range(smmu, sid)) {
> - ret = -ERANGE;
> - goto err_free_master;
> - }
> -
> - /* Ensure l2 strtab is initialised */
> - if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) {
> - ret = arm_smmu_init_l2_strtab(smmu, 

Re: [PATCH v4 1/2] perf/smmuv3: Don't reserve the PMCG register spaces

2021-02-01 Thread Robin Murphy

On 2021-01-30 07:14, Zhen Lei wrote:

According to the SMMUv3 specification:
Each PMCG counter group is represented by one 4KB page (Page 0) with one
optional additional 4KB page (Page 1), both of which are at IMPLEMENTATION
DEFINED base addresses.

This means that the PMCG register spaces may be within the 64KB pages of
the SMMUv3 register space. When both the SMMU and PMCG drivers reserve
their own resources, a resource conflict occurs.

To avoid this conflict, don't reserve the PMCG regions.


I said my review on v3 stood either way, but for the avoidance of doubt,

Reviewed-by: Robin Murphy 

I hadn't considered that a comment is a very good idea, in case the 
cleanup-script crew find this in future and try to "simplify" it :)


Thanks,
Robin.


Suggested-by: Robin Murphy 
Signed-off-by: Zhen Lei 
---
  drivers/perf/arm_smmuv3_pmu.c | 25 +++--
  1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
index 74474bb322c3f26..5e894f957c7b935 100644
--- a/drivers/perf/arm_smmuv3_pmu.c
+++ b/drivers/perf/arm_smmuv3_pmu.c
@@ -793,17 +793,30 @@ static int smmu_pmu_probe(struct platform_device *pdev)
.capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
};
  
-	smmu_pmu->reg_base = devm_platform_get_and_ioremap_resource(pdev, 0, _0);

-   if (IS_ERR(smmu_pmu->reg_base))
-   return PTR_ERR(smmu_pmu->reg_base);
+   /*
+* The register spaces of the PMCG may be in the register space of
+* other devices. For example, SMMU. Therefore, the PMCG resources are
+* not reserved to avoid resource conflicts with other drivers.
+*/
+   res_0 = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+   if (!res_0)
+   return ERR_PTR(-EINVAL);
+   smmu_pmu->reg_base = devm_ioremap(dev, res_0->start, 
resource_size(res_0));
+   if (!smmu_pmu->reg_base)
+   return ERR_PTR(-ENOMEM);
  
  	cfgr = readl_relaxed(smmu_pmu->reg_base + SMMU_PMCG_CFGR);
  
  	/* Determine if page 1 is present */

if (cfgr & SMMU_PMCG_CFGR_RELOC_CTRS) {
-   smmu_pmu->reloc_base = devm_platform_ioremap_resource(pdev, 1);
-   if (IS_ERR(smmu_pmu->reloc_base))
-   return PTR_ERR(smmu_pmu->reloc_base);
+   struct resource *res_1;
+
+   res_1 = platform_get_resource(pdev, IORESOURCE_MEM, 1);
+   if (!res_1)
+   return ERR_PTR(-EINVAL);
+   smmu_pmu->reloc_base = devm_ioremap(dev, res_1->start, 
resource_size(res_1));
+   if (!smmu_pmu->reloc_base)
+   return ERR_PTR(-ENOMEM);
} else {
smmu_pmu->reloc_base = smmu_pmu->reg_base;
}


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 3/3] iommu/arm-smmu-v3: Reserving the entire SMMU register space

2021-02-01 Thread Leizhen (ThunderTown)


On 2021/2/1 19:44, Robin Murphy wrote:
> On 2021-01-30 01:54, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2021/1/29 23:27, Robin Murphy wrote:
>>> On 2021-01-27 11:32, Zhen Lei wrote:
 commit 52f3fab0067d ("iommu/arm-smmu-v3: Don't reserve implementation
 defined register space") only reserves the basic SMMU register space. So
 the ECMDQ register space is not covered, it should be mapped again. Due
 to the size of this ECMDQ resource is not fixed, depending on
 SMMU_IDR6.CMDQ_CONTROL_PAGE_LOG2NUMQ. Processing its resource reservation
 to avoid resource conflict with PMCG is a bit more complicated.

 Therefore, the resources of the PMCG are not reserved, and the entire SMMU
 resources are reserved.
>>>
>>> This is the opposite of what I suggested. My point was that it will make 
>>> the most sense to map the ECMDQ pages as a separate request anyway, 
>>> therefore there is no reason to stop mapping page 0 and page 1 separately 
>>> either.
>>
>> I don't understand why the ECMDQ mapping must be performed separately. If 
>> the conflict with PMCG resources is eliminated. ECMDQ cannot be a separate 
>> driver like PMCG.
> 
> I mean in terms of the basic practice of not mapping megabytes worth of 
> IMP-DEF crap that this driver doesn't need or even understand. If we don't 
> have ECMDQ, we definitely don't need anything beyond page 1, so there's no 
> point mapping it all, and indeed it's safest not to anyway. Even if we do 
> have ECMDQ, it's still safer not to map all the unknown stuff that may be in 
> between, and until we've mapped page 0 we don't know whether we have ECMDQ or 
> not.
> 
> Therefore the most sensible thing to do either way is to map the basic 
> page(s) first, then map the ECMDQ pages specifically if we determine that we 
> need to. And either way we don't even need to think about this until adding 
> ECMDQ support.

Okay, I got it. We really don't have to write code ahead of time for what might 
happen in the future.

> 
> Robin.
> 
>>> If we need to expand the page 0 mapping to cover more of page 0 to reach 
>>> the SMMU_CMDQ_CONTROL_PAGE_* registers, we can do that when we actually add 
>>> ECMDQ support.
>>>
>>> Robin.
>>>
 Suggested-by: Robin Murphy 
 Signed-off-by: Zhen Lei 
 ---
    drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 24 
 
    drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  2 --
    2 files changed, 4 insertions(+), 22 deletions(-)

 diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
 b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
 index f04c55a7503c790..fde24403b06a9e3 100644
 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
 +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
 @@ -3476,14 +3476,6 @@ static int arm_smmu_set_bus_ops(struct iommu_ops 
 *ops)
    return err;
    }
    -static void __iomem *arm_smmu_ioremap(struct device *dev, 
 resource_size_t start,
 -  resource_size_t size)
 -{
 -    struct resource res = DEFINE_RES_MEM(start, size);
 -
 -    return devm_ioremap_resource(dev, );
 -}
 -
    static int arm_smmu_device_probe(struct platform_device *pdev)
    {
    int irq, ret;
 @@ -3519,22 +3511,14 @@ static int arm_smmu_device_probe(struct 
 platform_device *pdev)
    }
    ioaddr = res->start;
    -    /*
 - * Don't map the IMPLEMENTATION DEFINED regions, since they may 
 contain
 - * the PMCG registers which are reserved by the PMU driver.
 - */
 -    smmu->base = arm_smmu_ioremap(dev, ioaddr, ARM_SMMU_REG_SZ);
 +    smmu->base = devm_ioremap_resource(dev, res);
    if (IS_ERR(smmu->base))
    return PTR_ERR(smmu->base);
    -    if (arm_smmu_resource_size(smmu) > SZ_64K) {
 -    smmu->page1 = arm_smmu_ioremap(dev, ioaddr + SZ_64K,
 -   ARM_SMMU_REG_SZ);
 -    if (IS_ERR(smmu->page1))
 -    return PTR_ERR(smmu->page1);
 -    } else {
 +    if (arm_smmu_resource_size(smmu) > SZ_64K)
 +    smmu->page1 = smmu->base + SZ_64K;
 +    else
    smmu->page1 = smmu->base;
 -    }
      /* Interrupt lines */
    diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
 b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
 index da525f46dab4fc1..63f2b476987d6ae 100644
 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
 +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
 @@ -152,8 +152,6 @@
    #define ARM_SMMU_PRIQ_IRQ_CFG1    0xd8
    #define ARM_SMMU_PRIQ_IRQ_CFG2    0xdc
    -#define ARM_SMMU_REG_SZ    0xe00
 -
    /* Common MSI config fields */
    #define MSI_CFG0_ADDR_MASK    GENMASK_ULL(51, 2)
    #define MSI_CFG2_SH    GENMASK(5, 4)

>>>
>>> .
>>>
>>
> 
> .
> 


Re: [PATCH v13 02/15] iommu: Introduce bind/unbind_guest_msi

2021-02-01 Thread Keqian Zhu
Hi Eric,

On 2020/11/18 19:21, Eric Auger wrote:
> On ARM, MSI are translated by the SMMU. An IOVA is allocated
> for each MSI doorbell. If both the host and the guest are exposed
> with SMMUs, we end up with 2 different IOVAs allocated by each.
> guest allocates an IOVA (gIOVA) to map onto the guest MSI
> doorbell (gDB). The Host allocates another IOVA (hIOVA) to map
> onto the physical doorbell (hDB).
> 
> So we end up with 2 untied mappings:
>  S1S2
> gIOVA->gDB
>   hIOVA->hDB
> 
> Currently the PCI device is programmed by the host with hIOVA
> as MSI doorbell. So this does not work.
> 
> This patch introduces an API to pass gIOVA/gDB to the host so
> that gIOVA can be reused by the host instead of re-allocating
> a new IOVA. So the goal is to create the following nested mapping:
Does the gDB can be reused under non-nested mode?

> 
>  S1S2
> gIOVA->gDB ->hDB
> 
> and program the PCI device with gIOVA MSI doorbell.
> 
> In case we have several devices attached to this nested domain
> (devices belonging to the same group), they cannot be isolated
> on guest side either. So they should also end up in the same domain
> on guest side. We will enforce that all the devices attached to
> the host iommu domain use the same physical doorbell and similarly
> a single virtual doorbell mapping gets registered (1 single
> virtual doorbell is used on guest as well).
> 
[...]

> + *
> + * The associated IOVA can be reused by the host to create a nested
> + * stage2 binding mapping translating into the physical doorbell used
> + * by the devices attached to the domain.
> + *
> + * All devices within the domain must share the same physical doorbell.
> + * A single MSI GIOVA/GPA mapping can be attached to an iommu_domain.
> + */
> +
> +int iommu_bind_guest_msi(struct iommu_domain *domain,
> +  dma_addr_t giova, phys_addr_t gpa, size_t size)
> +{
> + if (unlikely(!domain->ops->bind_guest_msi))
> + return -ENODEV;
> +
> + return domain->ops->bind_guest_msi(domain, giova, gpa, size);
> +}
> +EXPORT_SYMBOL_GPL(iommu_bind_guest_msi);
> +
> +void iommu_unbind_guest_msi(struct iommu_domain *domain,
> + dma_addr_t iova)
nit: s/iova/giova

> +{
> + if (unlikely(!domain->ops->unbind_guest_msi))
> + return;
> +
> + domain->ops->unbind_guest_msi(domain, iova);
> +}
> +EXPORT_SYMBOL_GPL(iommu_unbind_guest_msi);
> +
[...]

Thanks,
Keqian

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 3/3] iommu/arm-smmu-v3: Reserving the entire SMMU register space

2021-02-01 Thread Robin Murphy

On 2021-01-30 01:54, Leizhen (ThunderTown) wrote:



On 2021/1/29 23:27, Robin Murphy wrote:

On 2021-01-27 11:32, Zhen Lei wrote:

commit 52f3fab0067d ("iommu/arm-smmu-v3: Don't reserve implementation
defined register space") only reserves the basic SMMU register space. So
the ECMDQ register space is not covered, it should be mapped again. Due
to the size of this ECMDQ resource is not fixed, depending on
SMMU_IDR6.CMDQ_CONTROL_PAGE_LOG2NUMQ. Processing its resource reservation
to avoid resource conflict with PMCG is a bit more complicated.

Therefore, the resources of the PMCG are not reserved, and the entire SMMU
resources are reserved.


This is the opposite of what I suggested. My point was that it will make the 
most sense to map the ECMDQ pages as a separate request anyway, therefore there 
is no reason to stop mapping page 0 and page 1 separately either.


I don't understand why the ECMDQ mapping must be performed separately. If the 
conflict with PMCG resources is eliminated. ECMDQ cannot be a separate driver 
like PMCG.


I mean in terms of the basic practice of not mapping megabytes worth of 
IMP-DEF crap that this driver doesn't need or even understand. If we 
don't have ECMDQ, we definitely don't need anything beyond page 1, so 
there's no point mapping it all, and indeed it's safest not to anyway. 
Even if we do have ECMDQ, it's still safer not to map all the unknown 
stuff that may be in between, and until we've mapped page 0 we don't 
know whether we have ECMDQ or not.


Therefore the most sensible thing to do either way is to map the basic 
page(s) first, then map the ECMDQ pages specifically if we determine 
that we need to. And either way we don't even need to think about this 
until adding ECMDQ support.


Robin.


If we need to expand the page 0 mapping to cover more of page 0 to reach the 
SMMU_CMDQ_CONTROL_PAGE_* registers, we can do that when we actually add ECMDQ 
support.

Robin.


Suggested-by: Robin Murphy 
Signed-off-by: Zhen Lei 
---
   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 24 
   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  2 --
   2 files changed, 4 insertions(+), 22 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index f04c55a7503c790..fde24403b06a9e3 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3476,14 +3476,6 @@ static int arm_smmu_set_bus_ops(struct iommu_ops *ops)
   return err;
   }
   -static void __iomem *arm_smmu_ioremap(struct device *dev, resource_size_t 
start,
-  resource_size_t size)
-{
-    struct resource res = DEFINE_RES_MEM(start, size);
-
-    return devm_ioremap_resource(dev, );
-}
-
   static int arm_smmu_device_probe(struct platform_device *pdev)
   {
   int irq, ret;
@@ -3519,22 +3511,14 @@ static int arm_smmu_device_probe(struct platform_device 
*pdev)
   }
   ioaddr = res->start;
   -    /*
- * Don't map the IMPLEMENTATION DEFINED regions, since they may contain
- * the PMCG registers which are reserved by the PMU driver.
- */
-    smmu->base = arm_smmu_ioremap(dev, ioaddr, ARM_SMMU_REG_SZ);
+    smmu->base = devm_ioremap_resource(dev, res);
   if (IS_ERR(smmu->base))
   return PTR_ERR(smmu->base);
   -    if (arm_smmu_resource_size(smmu) > SZ_64K) {
-    smmu->page1 = arm_smmu_ioremap(dev, ioaddr + SZ_64K,
-   ARM_SMMU_REG_SZ);
-    if (IS_ERR(smmu->page1))
-    return PTR_ERR(smmu->page1);
-    } else {
+    if (arm_smmu_resource_size(smmu) > SZ_64K)
+    smmu->page1 = smmu->base + SZ_64K;
+    else
   smmu->page1 = smmu->base;
-    }
     /* Interrupt lines */
   diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index da525f46dab4fc1..63f2b476987d6ae 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -152,8 +152,6 @@
   #define ARM_SMMU_PRIQ_IRQ_CFG1    0xd8
   #define ARM_SMMU_PRIQ_IRQ_CFG2    0xdc
   -#define ARM_SMMU_REG_SZ    0xe00
-
   /* Common MSI config fields */
   #define MSI_CFG0_ADDR_MASK    GENMASK_ULL(51, 2)
   #define MSI_CFG2_SH    GENMASK(5, 4)



.




___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v13 01/15] iommu: Introduce attach/detach_pasid_table API

2021-02-01 Thread Keqian Zhu
Hi Eric,

On 2020/11/18 19:21, Eric Auger wrote:
> In virtualization use case, when a guest is assigned
> a PCI host device, protected by a virtual IOMMU on the guest,
> the physical IOMMU must be programmed to be consistent with
> the guest mappings. If the physical IOMMU supports two
> translation stages it makes sense to program guest mappings
> onto the first stage/level (ARM/Intel terminology) while the host
> owns the stage/level 2.
> 
> In that case, it is mandated to trap on guest configuration
> settings and pass those to the physical iommu driver.
> 
> This patch adds a new API to the iommu subsystem that allows
> to set/unset the pasid table information.
> 
> A generic iommu_pasid_table_config struct is introduced in
> a new iommu.h uapi header. This is going to be used by the VFIO
> user API.
> 
> Signed-off-by: Jean-Philippe Brucker 
> Signed-off-by: Liu, Yi L 
> Signed-off-by: Ashok Raj 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Eric Auger 
> 
> ---
> 
> v12 -> v13:
> - Fix config check
> 
> v11 -> v12:
> - add argsz, name the union
> ---
>  drivers/iommu/iommu.c  | 68 ++
>  include/linux/iommu.h  | 21 
>  include/uapi/linux/iommu.h | 54 ++
>  3 files changed, 143 insertions(+)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index b53446bb8c6b..978fe34378fb 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2171,6 +2171,74 @@ int iommu_uapi_sva_unbind_gpasid(struct iommu_domain 
> *domain, struct device *dev
>  }
>  EXPORT_SYMBOL_GPL(iommu_uapi_sva_unbind_gpasid);
>  
> +int iommu_attach_pasid_table(struct iommu_domain *domain,
> +  struct iommu_pasid_table_config *cfg)
> +{
> + if (unlikely(!domain->ops->attach_pasid_table))
> + return -ENODEV;
> +
> + return domain->ops->attach_pasid_table(domain, cfg);
> +}
miss export symbol?

> +
> +int iommu_uapi_attach_pasid_table(struct iommu_domain *domain,
> +   void __user *uinfo)
> +{
> + struct iommu_pasid_table_config pasid_table_data = { 0 };
> + u32 minsz;
> +
> + if (unlikely(!domain->ops->attach_pasid_table))
> + return -ENODEV;
> +
> + /*
> +  * No new spaces can be added before the variable sized union, the
> +  * minimum size is the offset to the union.
> +  */
> + minsz = offsetof(struct iommu_pasid_table_config, vendor_data);
> +
> + /* Copy minsz from user to get flags and argsz */
> + if (copy_from_user(_table_data, uinfo, minsz))
> + return -EFAULT;
> +
> + /* Fields before the variable size union are mandatory */
> + if (pasid_table_data.argsz < minsz)
> + return -EINVAL;
> +
> + /* PASID and address granu require additional info beyond minsz */
> + if (pasid_table_data.version != PASID_TABLE_CFG_VERSION_1)
> + return -EINVAL;
> + if (pasid_table_data.format == IOMMU_PASID_FORMAT_SMMUV3 &&
> + pasid_table_data.argsz <
> + offsetofend(struct iommu_pasid_table_config, 
> vendor_data.smmuv3))
> + return -EINVAL;
> +
> + /*
> +  * User might be using a newer UAPI header which has a larger data
> +  * size, we shall support the existing flags within the current
> +  * size. Copy the remaining user data _after_ minsz but not more
> +  * than the current kernel supported size.
> +  */
> + if (copy_from_user((void *)_table_data + minsz, uinfo + minsz,
> +min_t(u32, pasid_table_data.argsz, 
> sizeof(pasid_table_data)) - minsz))
> + return -EFAULT;
> +
> + /* Now the argsz is validated, check the content */
> + if (pasid_table_data.config < IOMMU_PASID_CONFIG_TRANSLATE ||
> + pasid_table_data.config > IOMMU_PASID_CONFIG_ABORT)
> + return -EINVAL;
> +
> + return domain->ops->attach_pasid_table(domain, _table_data);
> +}
> +EXPORT_SYMBOL_GPL(iommu_uapi_attach_pasid_table);
> +
> +void iommu_detach_pasid_table(struct iommu_domain *domain)
> +{
> + if (unlikely(!domain->ops->detach_pasid_table))
> + return;
> +
> + domain->ops->detach_pasid_table(domain);
> +}
> +EXPORT_SYMBOL_GPL(iommu_detach_pasid_table);
> +
>  static void __iommu_detach_device(struct iommu_domain *domain,
> struct device *dev)
>  {
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index b95a6f8db6ff..464fcbecf841 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -223,6 +223,8 @@ struct iommu_iotlb_gather {
>   * @cache_invalidate: invalidate translation caches
>   * @sva_bind_gpasid: bind guest pasid and mm
>   * @sva_unbind_gpasid: unbind guest pasid and mm
> + * @attach_pasid_table: attach a pasid table
> + * @detach_pasid_table: detach the pasid table
>   * @def_domain_type: device default domain type, return value:
>   *   - IOMMU_DOMAIN_IDENTITY: 

Re: [PATCH 2/3] iommu/io-pgtable-arm: Add IOMMU_LLC page protection flag

2021-02-01 Thread Will Deacon
On Fri, Jan 29, 2021 at 03:12:59PM +0530, Sai Prakash Ranjan wrote:
> On 2021-01-29 14:35, Will Deacon wrote:
> > On Mon, Jan 11, 2021 at 07:45:04PM +0530, Sai Prakash Ranjan wrote:
> > > Add a new page protection flag IOMMU_LLC which can be used
> > > by non-coherent masters to set cacheable memory attributes
> > > for an outer level of cache called as last-level cache or
> > > system cache. Initial user of this page protection flag is
> > > the adreno gpu and then can later be used by other clients
> > > such as video where this can be used for per-buffer based
> > > mapping.
> > > 
> > > Signed-off-by: Sai Prakash Ranjan 
> > > ---
> > >  drivers/iommu/io-pgtable-arm.c | 3 +++
> > >  include/linux/iommu.h  | 6 ++
> > >  2 files changed, 9 insertions(+)
> > > 
> > > diff --git a/drivers/iommu/io-pgtable-arm.c
> > > b/drivers/iommu/io-pgtable-arm.c
> > > index 7439ee7fdcdb..ebe653ef601b 100644
> > > --- a/drivers/iommu/io-pgtable-arm.c
> > > +++ b/drivers/iommu/io-pgtable-arm.c
> > > @@ -415,6 +415,9 @@ static arm_lpae_iopte
> > > arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
> > >   else if (prot & IOMMU_CACHE)
> > >   pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE
> > >   << ARM_LPAE_PTE_ATTRINDX_SHIFT);
> > > + else if (prot & IOMMU_LLC)
> > > + pte |= (ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE
> > > + << ARM_LPAE_PTE_ATTRINDX_SHIFT);
> > >   }
> > > 
> > >   if (prot & IOMMU_CACHE)
> > > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > > index ffaa389ea128..1f82057df531 100644
> > > --- a/include/linux/iommu.h
> > > +++ b/include/linux/iommu.h
> > > @@ -31,6 +31,12 @@
> > >   * if the IOMMU page table format is equivalent.
> > >   */
> > >  #define IOMMU_PRIV   (1 << 5)
> > > +/*
> > > + * Non-coherent masters can use this page protection flag to set
> > > cacheable
> > > + * memory attributes for only a transparent outer level of cache,
> > > also known as
> > > + * the last-level or system cache.
> > > + */
> > > +#define IOMMU_LLC(1 << 6)
> > 
> > On reflection, I'm a bit worried about exposing this because I think it
> > will
> > introduce a mismatched virtual alias with the CPU (we don't even have a
> > MAIR
> > set up for this memory type). Now, we also have that issue for the PTW,
> > but
> > since we always use cache maintenance (i.e. the streaming API) for
> > publishing the page-tables to a non-coheren walker, it works out.
> > However,
> > if somebody expects IOMMU_LLC to be coherent with a DMA API coherent
> > allocation, then they're potentially in for a nasty surprise due to the
> > mismatched outer-cacheability attributes.
> > 
> 
> Can't we add the syscached memory type similar to what is done on android?

Maybe. How does the GPU driver map these things on the CPU side?

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v12 10/10] iommu/arm-smmu-v3: Add stall support for platform devices

2021-02-01 Thread Jean-Philippe Brucker
Hi Zhou,

On Mon, Feb 01, 2021 at 09:18:42AM +0800, Zhou Wang wrote:
> > @@ -1033,8 +1076,7 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_domain 
> > *smmu_domain, int ssid,
> > FIELD_PREP(CTXDESC_CD_0_ASID, cd->asid) |
> > CTXDESC_CD_0_V;
> >  
> > -   /* STALL_MODEL==0b10 && CD.S==0 is ILLEGAL */
> > -   if (smmu->features & ARM_SMMU_FEAT_STALL_FORCE)
> > +   if (smmu_domain->stall_enabled)
> 
> Could we add ssid checking here? like: if (smmu_domain->stall_enabled && 
> ssid).
> The reason is if not CD.S will also be set when ssid is 0, which is not 
> needed.

Some drivers may want to get stall events on SSID 0:
https://lore.kernel.org/kvm/20210125090402.1429-1-lushenm...@huawei.com/#t

Are you seeing an issue with stall events on ssid 0?  Normally there
shouldn't be any fault on this context, but if they happen and no handler
is registered, the SMMU driver will just abort them and report them like a
non-stall event.

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v12 10/10] iommu/arm-smmu-v3: Add stall support for platform devices

2021-02-01 Thread Jean-Philippe Brucker
On Sun, Jan 31, 2021 at 07:29:09PM +0100, Auger Eric wrote:
> Hi Jean,
> 
> Some rather minor comments§questions below that may not justify a respin.
> 
> On 1/27/21 4:43 PM, Jean-Philippe Brucker wrote:
> > -static bool arm_smmu_iopf_supported(struct arm_smmu_master *master)
> > +bool arm_smmu_master_iopf_supported(struct arm_smmu_master *master)
> >  {
> > -   return false;
> > +   /* We're not keeping track of SIDs in fault events */
> shall we? [*] below

That would require storing the incoming SID into the iommu_fault_event
struct, and retrieve it in arm_smmu_page_response(). Easy enough, but I
don't think it's needed for existing devices.

> > +   if (master->num_streams != 1)
> > +   return false;
[...]
> > +static int arm_smmu_page_response(struct device *dev,
> > + struct iommu_fault_event *unused,
> > + struct iommu_page_response *resp)
> > +{
> > +   struct arm_smmu_cmdq_ent cmd = {0};
> > +   struct arm_smmu_master *master = dev_iommu_priv_get(dev);
> > +   int sid = master->streams[0].id;
> [*]
> > +
> > +   if (master->stall_enabled) {
> > +   cmd.opcode  = CMDQ_OP_RESUME;
> > +   cmd.resume.sid  = sid;
> > +   cmd.resume.stag = resp->grpid;
> > +   switch (resp->code) {
> > +   case IOMMU_PAGE_RESP_INVALID:
> add fallthrough?

I think fallthrough is mainly useful to tell reader and compiler that a
break was omitted on purpose. When two cases are stuck together the intent
to merge the flow is clear enough in my opinion. GCC's
-Wimplicit-fallthrough doesn't warn in this case.

> > +   case IOMMU_PAGE_RESP_FAILURE:
> > +   cmd.resume.resp = CMDQ_RESUME_0_RESP_ABORT;
> > +   break;
[...]
> > +static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt)
> > +{
> > +   int ret;
> > +   u32 reason;
> > +   u32 perm = 0;
> > +   struct arm_smmu_master *master;
> > +   bool ssid_valid = evt[0] & EVTQ_0_SSV;
> > +   u32 sid = FIELD_GET(EVTQ_0_SID, evt[0]);
> > +   struct iommu_fault_event fault_evt = { };
> > +   struct iommu_fault *flt = _evt.fault;
> > +
> > +   /* Stage-2 is always pinned at the moment */
> > +   if (evt[1] & EVTQ_1_S2)
> > +   return -EFAULT;
> > +
> > +   master = arm_smmu_find_master(smmu, sid);
> > +   if (!master)
> > +   return -EINVAL;
> > +
> > +   if (evt[1] & EVTQ_1_RnW)
> > +   perm |= IOMMU_FAULT_PERM_READ;
> > +   else
> > +   perm |= IOMMU_FAULT_PERM_WRITE;
> > +
> > +   if (evt[1] & EVTQ_1_InD)
> > +   perm |= IOMMU_FAULT_PERM_EXEC;
> > +
> > +   if (evt[1] & EVTQ_1_PnU)
> > +   perm |= IOMMU_FAULT_PERM_PRIV;
> > +
> > +   switch (FIELD_GET(EVTQ_0_ID, evt[0])) {
> > +   case EVT_ID_TRANSLATION_FAULT:
> > +   case EVT_ID_ADDR_SIZE_FAULT:
> > +   case EVT_ID_ACCESS_FAULT:
> > +   reason = IOMMU_FAULT_REASON_PTE_FETCH;
> Doesn't it rather map to IOMMU_FAULT_REASON_ACCESS?
> /* access flag check failed */"

Good point, I guess it didn't exist when I wrote this. And ADDR_SIZE_FAULT
corresponds to IOMMU_FAULT_REASON_OOR_ADDRESS now, right?

By the way the wording on those two fault reasons, "access flag" and
"stage", seems arch-specific - x86 names are "accessed flag" and "level".

> > +   break;
> > +   case EVT_ID_PERMISSION_FAULT:
> > +   reason = IOMMU_FAULT_REASON_PERMISSION;
> > +   break;
> > +   default:
> > +   return -EOPNOTSUPP;
> > +   }
> > +
> > +   if (evt[1] & EVTQ_1_STALL) {
> > +   flt->type = IOMMU_FAULT_PAGE_REQ;
> > +   flt->prm = (struct iommu_fault_page_request) {
> > +   .flags = IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE,
> > +   .grpid = FIELD_GET(EVTQ_1_STAG, evt[1]),
> > +   .perm = perm,
> > +   .addr = FIELD_GET(EVTQ_2_ADDR, evt[2]),
> > +   };
> > +
> > +   if (ssid_valid) {
> > +   flt->prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
> > +   flt->prm.pasid = FIELD_GET(EVTQ_0_SSID, evt[0]);
> > +   }
> > +   } else {
> > +   flt->type = IOMMU_FAULT_DMA_UNRECOV;
> > +   flt->event = (struct iommu_fault_unrecoverable) {
> > +   .reason = reason,
> > +   .flags = IOMMU_FAULT_UNRECOV_ADDR_VALID |
> > +IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID,
> nit: shall IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID be set here? Supported
> unrecoverable faults feature the IPA field which is UNKNOWN for S1
> translations. fetch_addr rather was
> corresponding to WALK_EABT.Fetch_addr to me.

Right I should drop the IPA part entirely, since we don't report S2 faults
in this patch.

Thanks,
Jean

> > +   .perm = perm,
> > +   .addr = FIELD_GET(EVTQ_2_ADDR, evt[2]),
> > +   .fetch_addr = FIELD_GET(EVTQ_3_IPA, evt[3]),
> > +   };


Re: [PATCH v5 06/27] dt-bindings: mediatek: Add binding for mt8192 IOMMU

2021-02-01 Thread Robin Murphy

On 2021-01-29 11:45, Tomasz Figa wrote:

On Mon, Jan 25, 2021 at 4:34 PM Yong Wu  wrote:


On Mon, 2021-01-25 at 13:18 +0900, Tomasz Figa wrote:

On Wed, Jan 20, 2021 at 4:08 PM Yong Wu  wrote:


On Wed, 2021-01-20 at 13:15 +0900, Tomasz Figa wrote:

On Wed, Jan 13, 2021 at 3:45 PM Yong Wu  wrote:


On Wed, 2021-01-13 at 14:30 +0900, Tomasz Figa wrote:

On Thu, Dec 24, 2020 at 8:35 PM Yong Wu  wrote:


On Wed, 2020-12-23 at 17:18 +0900, Tomasz Figa wrote:

On Wed, Dec 09, 2020 at 04:00:41PM +0800, Yong Wu wrote:

This patch adds decriptions for mt8192 IOMMU and SMI.

mt8192 also is MTK IOMMU gen2 which uses ARM Short-Descriptor translation
table format. The M4U-SMI HW diagram is as below:

   EMI
|
   M4U
|
   
SMI Common
   
|
   +---+--+--+--+---+
   |   |  |  |   .. |   |
   |   |  |  |  |   |
larb0   larb1  larb2  larb4 ..  larb19   larb20
disp0   disp1   mdpvdec   IPE  IPE

All the connections are HW fixed, SW can NOT adjust it.

mt8192 M4U support 0~16GB iova range. we preassign different engines
into different iova ranges:

domain-id  module iova-range  larbs
0   disp0 ~ 4G  larb0/1
1   vcodec  4G ~ 8G larb4/5/7
2   cam/mdp 8G ~ 12G larb2/9/11/13/14/16/17/18/19/20


Why do we preassign these addresses in DT? Shouldn't it be a user's or
integrator's decision to split the 16 GB address range into sub-ranges
and define which larbs those sub-ranges are shared with?


The problem is that we can't split the 16GB range with the larb as unit.
The example is the below ccu0(larb13 port9/10) is a independent
range(domain), the others ports in larb13 is in another domain.

disp/vcodec/cam/mdp don't have special iova requirement, they could
access any range. vcodec also can locate 8G~12G. it don't care about
where its iova locate. here I preassign like this following with our
internal project setting.


Let me try to understand this a bit more. Given the split you're
proposing, is there actually any isolation enforced between particular
domains? For example, if I program vcodec to with a DMA address from
the 0-4G range, would the IOMMU actually generate a fault, even if
disp had some memory mapped at that address?


In this case. we will get fault in current SW setting.



Okay, thanks.





Why set this in DT?, this is only for simplifying the code. Assume we
put it in the platform data. We have up to 32 larbs, each larb has up to
32 ports, each port may be in different iommu domains. we should have a
big array for this..however we only use a macro to get the domain in the
DT method.

When replying this mail, I happen to see there is a "dev->dev_range_map"
which has "dma-range" information, I think I could use this value to get
which domain the device belong to. then no need put domid in DT. I will
test this.


My feeling is that the only part that needs to be enforced statically
is the reserved IOVA range for CCUs. The other ranges should be
determined dynamically, although I think I need to understand better
how the hardware and your proposed design work to tell what would be
likely the best choice here.


I have removed the domid patch in v6. and get the domain id in [27/33]
in v6..

About the other ranges should be dynamical, the commit message [30/33]
of v6 should be helpful. the problem is that we have a bank_sel setting
for the iova[32:33]. currently we preassign this value. thus, all the
ranges are fixed. If you adjust this setting, you can let vcodec access
0~4G.


Okay, so it sounds like we effectively have four 4G address spaces and
we can assign the master devices to them. I guess each of these
address spaces makes for an IOMMU group.


Yes. Each a address spaces is an IOMMU group.



It's fine to pre-assign the devices to those groups for now, but it
definitely shouldn't be hardcoded in DT, because it depends on the use
case of the device. I'll take a look at v6, but it sounds like it
should be fine if it doesn't take the address space assignment from DT
anymore.


Thanks very much for your review.



Hmm, I had a look at v6 and it still has the address spaces hardcoded
in the DTS.


sorry. I didn't get here. where do you mean. or help reply in v6.

I only added the preassign list as comment in the file
(dt-binding/memory/mt8192-larb-port.h). I thought our iommu consumer may
need it when they use these ports. they need add dma-ranges property if
its iova is over 4GB.


That's exactly the problem. v6 simply replaced one way to describe the
policy (domain ID) with another (dma-ranges). However, DT is not the
right place to describe