[PATCH v10 4/5] dt-bindings: arm-smmu: add binding for Tegra194 SMMU

2020-07-07 Thread Krishna Reddy
Add binding for NVIDIA's Tegra194 SoC SMMU.

Signed-off-by: Krishna Reddy 
---
 .../devicetree/bindings/iommu/arm,smmu.yaml | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml 
b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index d7ceb4c34423..ac1f526c3424 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -38,6 +38,11 @@ properties:
               - qcom,sc7180-smmu-500
               - qcom,sdm845-smmu-500
           - const: arm,mmu-500
+      - description: NVIDIA SoCs that program two ARM MMU-500s identically
+        items:
+          - enum:
+              - nvidia,tegra194-smmu
+          - const: nvidia,smmu-500
       - items:
           - const: arm,mmu-500
           - const: arm,smmu-v2
@@ -138,6 +143,19 @@ required:
 
 additionalProperties: false
 
+allOf:
+  - if:
+      properties:
+        compatible:
+          contains:
+            enum:
+              - nvidia,tegra194-smmu
+    then:
+      properties:
+        reg:
+          minItems: 2
+          maxItems: 2
+
 examples:
   - |+
 /* SMMU with stream matching or stream indexing */
-- 
2.26.2



[PATCH v10 1/5] iommu/arm-smmu: move TLB timeout and spin count macros

2020-07-07 Thread Krishna Reddy
Move the TLB timeout and spin count macros to the header file so that
vendor-specific implementations can use them.

Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu.c | 3 ---
 drivers/iommu/arm-smmu.h | 2 ++
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 243bc4cb2705..d2054178df35 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -52,9 +52,6 @@
  */
 #define QCOM_DUMMY_VAL -1
 
-#define TLB_LOOP_TIMEOUT   1000000 /* 1s! */
-#define TLB_SPIN_COUNT 10
-
 #define MSI_IOVA_BASE  0x8000000
 #define MSI_IOVA_LENGTH 0x100000
 
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index d172c024be61..c7d0122a7c6c 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -236,6 +236,8 @@ enum arm_smmu_cbar_type {
 /* Maximum number of context banks per SMMU */
 #define ARM_SMMU_MAX_CBS   128
 
+#define TLB_LOOP_TIMEOUT   1000000 /* 1s! */
+#define TLB_SPIN_COUNT 10
 
 /* Shared driver definitions */
 enum arm_smmu_arch_version {
-- 
2.26.2
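
For reference, the poll loop that consumes these two macros is the default
sync path in arm-smmu.c; it looks essentially like the sketch below (a
paraphrase for context, not a hunk of this patch), and a vendor tlb_sync
hook would reuse the same constants:

static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu, int page,
				int sync, int status)
{
	unsigned int spin_cnt, delay;
	u32 reg;

	arm_smmu_writel(smmu, page, sync, QCOM_DUMMY_VAL);
	for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
		/* Spin first, then back off with increasing udelay(). */
		for (spin_cnt = TLB_SPIN_COUNT; spin_cnt; spin_cnt--) {
			reg = arm_smmu_readl(smmu, page, status);
			if (!(reg & ARM_SMMU_sTLBGSTATUS_GSACTIVE))
				return;
			cpu_relax();
		}
		udelay(delay);
	}
	dev_err_ratelimited(smmu->dev,
			    "TLB sync timed out -- SMMU may be deadlocked\n");
}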



[PATCH v10 2/5] iommu/arm-smmu: ioremap smmu mmio region before implementation init

2020-07-07 Thread Krishna Reddy
ioremap the SMMU MMIO region before calling into implementation init.
This is necessary to make the mapped address available during
vendor-specific implementation init.

Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index d2054178df35..e03e873d3bca 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -2120,10 +2120,6 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
if (err)
return err;
 
-   smmu = arm_smmu_impl_init(smmu);
-   if (IS_ERR(smmu))
-   return PTR_ERR(smmu);
-
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
ioaddr = res->start;
smmu->base = devm_ioremap_resource(dev, res);
@@ -2135,6 +2131,10 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
 */
smmu->numpage = resource_size(res);
 
+   smmu = arm_smmu_impl_init(smmu);
+   if (IS_ERR(smmu))
+   return PTR_ERR(smmu);
+
num_irqs = 0;
while ((res = platform_get_resource(pdev, IORESOURCE_IRQ, num_irqs))) {
num_irqs++;
-- 
2.26.2
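
With this reordering, an implementation init can reuse the already-mapped
smmu->base for its first instance and map only the additional instances
itself. A minimal sketch of that shape, modeled on the NVIDIA init in
patch 3/5 (whose body is truncated in this digest):

struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu)
{
	struct resource *res;
	struct device *dev = smmu->dev;
	struct nvidia_smmu *nvidia_smmu;
	struct platform_device *pdev = to_platform_device(dev);

	nvidia_smmu = devm_kzalloc(dev, sizeof(*nvidia_smmu), GFP_KERNEL);
	if (!nvidia_smmu)
		return ERR_PTR(-ENOMEM);

	nvidia_smmu->smmu = *smmu;
	/* Instance 0 reuses the base mapped by arm_smmu_device_probe(). */
	nvidia_smmu->bases[0] = smmu->base;

	res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
	if (!res)
		return ERR_PTR(-ENODEV);

	nvidia_smmu->bases[1] = devm_ioremap_resource(dev, res);
	if (IS_ERR(nvidia_smmu->bases[1]))
		return ERR_CAST(nvidia_smmu->bases[1]);

	nvidia_smmu->smmu.impl = &nvidia_smmu_impl;

	return &nvidia_smmu->smmu;
}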



[PATCH v10 0/5] NVIDIA ARM SMMU Implementation

2020-07-07 Thread Krishna Reddy
Changes in v10:
Perform SMMU base ioremap before calling implementation init.
Check for Global faults across both ARM MMU-500s during global interrupt.
Check for context faults across all contexts of both ARM MMU-500s during 
context fault interrupt.
Add new DT binding nvidia,smmu-500 for NVIDIA implementation.

v9 - https://lkml.org/lkml/2020/6/30/1282
v8 - https://lkml.org/lkml/2020/6/29/2385
v7 - https://lkml.org/lkml/2020/6/28/347
v6 - https://lkml.org/lkml/2020/6/4/1018
v5 - https://lkml.org/lkml/2020/5/21/1114
v4 - https://lkml.org/lkml/2019/10/30/1054
v3 - https://lkml.org/lkml/2019/10/18/1601
v2 - https://lkml.org/lkml/2019/9/2/980
v1 - https://lkml.org/lkml/2019/8/29/1588


Krishna Reddy (5):
  iommu/arm-smmu: move TLB timeout and spin count macros
  iommu/arm-smmu: ioremap smmu mmio region before implementation init
  iommu/arm-smmu: add NVIDIA implementation for ARM MMU-500 usage
  dt-bindings: arm-smmu: add binding for Tegra194 SMMU
  iommu/arm-smmu: Add global/context fault implementation hooks

 .../devicetree/bindings/iommu/arm,smmu.yaml   |  18 ++
 MAINTAINERS   |   2 +
 drivers/iommu/Makefile|   2 +-
 drivers/iommu/arm-smmu-impl.c |   3 +
 drivers/iommu/arm-smmu-nvidia.c   | 278 ++
 drivers/iommu/arm-smmu.c  |  29 +-
 drivers/iommu/arm-smmu.h  |   6 +
 7 files changed, 328 insertions(+), 10 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c


base-commit: e5640f21b63d2a5d3e4e0c4111b2b38e99fe5164
-- 
2.26.2



[PATCH v10 5/5] iommu/arm-smmu: Add global/context fault implementation hooks

2020-07-07 Thread Krishna Reddy
Add global/context fault hooks to allow vendor-specific implementations
to override the default fault interrupt handlers.

Update NVIDIA implementation to override the default global/context fault
interrupt handlers and handle interrupts across the two ARM MMU-500s that
are programmed identically.

Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu-nvidia.c | 99 +
 drivers/iommu/arm-smmu.c| 17 +-
 drivers/iommu/arm-smmu.h|  3 +
 3 files changed, 117 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
index 2f55e5793d34..31368057e9be 100644
--- a/drivers/iommu/arm-smmu-nvidia.c
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -127,6 +127,103 @@ static int nvidia_smmu_reset(struct arm_smmu_device *smmu)
return 0;
 }
 
+static irqreturn_t nvidia_smmu_global_fault_inst(int irq,
+struct arm_smmu_device *smmu,
+int inst)
+{
+   u32 gfsr, gfsynr0, gfsynr1, gfsynr2;
+   void __iomem *gr0_base = nvidia_smmu_page(smmu, inst, 0);
+
+   gfsr = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSR);
+   if (!gfsr)
+   return IRQ_NONE;
+
+   gfsynr0 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR0);
+   gfsynr1 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR1);
+   gfsynr2 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR2);
+
+   dev_err_ratelimited(smmu->dev,
+   "Unexpected global fault, this could be serious\n");
+   dev_err_ratelimited(smmu->dev,
+   "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, 
GFSYNR2 0x%08x\n",
+   gfsr, gfsynr0, gfsynr1, gfsynr2);
+
+   writel_relaxed(gfsr, gr0_base + ARM_SMMU_GR0_sGFSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nvidia_smmu_global_fault(int irq, void *dev)
+{
+   unsigned int inst;
+   irqreturn_t ret = IRQ_NONE;
+   struct arm_smmu_device *smmu = dev;
+
+   for (inst = 0; inst < NUM_SMMU_INSTANCES; inst++) {
+   irqreturn_t irq_ret;
+
+   irq_ret = nvidia_smmu_global_fault_inst(irq, smmu, inst);
+   if (irq_ret == IRQ_HANDLED)
+   ret = IRQ_HANDLED;
+   }
+
+   return ret;
+}
+
+static irqreturn_t nvidia_smmu_context_fault_bank(int irq,
+ struct arm_smmu_device *smmu,
+ int idx, int inst)
+{
+   u32 fsr, fsynr, cbfrsynra;
+   unsigned long iova;
+   void __iomem *gr1_base = nvidia_smmu_page(smmu, inst, 1);
+   void __iomem *cb_base = nvidia_smmu_page(smmu, inst, smmu->numpage + idx);
+
+   fsr = readl_relaxed(cb_base + ARM_SMMU_CB_FSR);
+   if (!(fsr & ARM_SMMU_FSR_FAULT))
+   return IRQ_NONE;
+
+   fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0);
+   iova = readq_relaxed(cb_base + ARM_SMMU_CB_FAR);
+   cbfrsynra = readl_relaxed(gr1_base + ARM_SMMU_GR1_CBFRSYNRA(idx));
+
+   dev_err_ratelimited(smmu->dev,
+   "Unhandled context fault: fsr=0x%x, iova=0x%08lx, 
fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
+   fsr, iova, fsynr, cbfrsynra, idx);
+
+   writel_relaxed(fsr, cb_base + ARM_SMMU_CB_FSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nvidia_smmu_context_fault(int irq, void *dev)
+{
+   int idx;
+   unsigned int inst;
+   irqreturn_t ret = IRQ_NONE;
+   struct arm_smmu_device *smmu;
+   struct iommu_domain *domain = dev;
+   struct arm_smmu_domain *smmu_domain;
+
+   smmu_domain = container_of(domain, struct arm_smmu_domain, domain);
+   smmu = smmu_domain->smmu;
+
+   for (inst = 0; inst < NUM_SMMU_INSTANCES; inst++) {
+   irqreturn_t irq_ret;
+
+   /*
+* Interrupt line is shared between all contexts.
+* Check for faults across all contexts.
+*/
+   for (idx = 0; idx < smmu->num_context_banks; idx++) {
+   irq_ret = nvidia_smmu_context_fault_bank(irq, smmu,
+idx, inst);
+   if (irq_ret == IRQ_HANDLED)
+   ret = IRQ_HANDLED;
+   }
+   }
+
+   return ret;
+}
+
 static const struct arm_smmu_impl nvidia_smmu_impl = {
.read_reg = nvidia_smmu_read_reg,
.write_reg = nvidia_smmu_write_reg,
@@ -134,6 +231,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = {
.write_reg64 = nvidia_smmu_write_reg64,
.reset = nvidia_smmu_reset,
.tlb_sync = nvidia_smmu_tlb_sync,
+   .global_fault = nvidia_smmu_global_fault,
+   .context_fault = nvidia_smmu_context_fault,
 };
 
 struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu)
diff --git 
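
The arm-smmu.c and arm-smmu.h hunks of this patch are cut off above. The
wiring presumably follows the existing impl-hook pattern in the driver;
a minimal sketch (an assumption, not the literal hunk):

	irq_handler_t global_fault;

	/* Probe time: let the implementation override the global handler. */
	if (smmu->impl && smmu->impl->global_fault)
		global_fault = smmu->impl->global_fault;
	else
		global_fault = arm_smmu_global_fault;

	err = devm_request_irq(dev, irq, global_fault, IRQF_SHARED,
			       "arm-smmu global fault", smmu);

	irq_handler_t context_fault;

	/* Context-bank IRQ: same selection, with the domain as cookie. */
	if (smmu->impl && smmu->impl->context_fault)
		context_fault = smmu->impl->context_fault;
	else
		context_fault = arm_smmu_context_fault;

	ret = devm_request_irq(smmu->dev, irq, context_fault, IRQF_SHARED,
			       "arm-smmu-context-fault", domain);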

[PATCH v10 3/5] iommu/arm-smmu: add NVIDIA implementation for ARM MMU-500 usage

2020-07-07 Thread Krishna Reddy
NVIDIA's Tegra194 SoC has three ARM MMU-500 instances.
Two of them are used together to interleave IOVA accesses across them
and must be programmed identically. This implementation supports
programming those two ARM MMU-500s as a pair.

The third ARM MMU-500 instance is handled by the standard
arm-smmu.c driver itself.

Signed-off-by: Krishna Reddy 
---
 MAINTAINERS |   2 +
 drivers/iommu/Makefile  |   2 +-
 drivers/iommu/arm-smmu-impl.c   |   3 +
 drivers/iommu/arm-smmu-nvidia.c | 179 
 drivers/iommu/arm-smmu.c|   1 +
 drivers/iommu/arm-smmu.h|   1 +
 6 files changed, 187 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

diff --git a/MAINTAINERS b/MAINTAINERS
index c23352059a6b..534cedaf8e55 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16811,8 +16811,10 @@ F: drivers/i2c/busses/i2c-tegra.c
 
 TEGRA IOMMU DRIVERS
 M: Thierry Reding 
+R: Krishna Reddy 
 L: linux-te...@vger.kernel.org
 S: Supported
+F: drivers/iommu/arm-smmu-nvidia.c
 F: drivers/iommu/tegra*
 
 TEGRA KBC DRIVER
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 342190196dfb..2b8203db73ec 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd/iommu.o amd/init.o amd/quirks.o
 obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd/debugfs.o
 obj-$(CONFIG_AMD_IOMMU_V2) += amd/iommu_v2.o
 obj-$(CONFIG_ARM_SMMU) += arm_smmu.o
-arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o
+arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o arm-smmu-qcom.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
 obj-$(CONFIG_DMAR_TABLE) += intel/dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += intel/iommu.o intel/pasid.o
diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index c75b9d957b70..f15571d05474 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -171,6 +171,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu)
if (of_property_read_bool(np, "calxeda,smmu-secure-config-access"))
smmu->impl = &calxeda_impl;
 
+   if (of_device_is_compatible(np, "nvidia,tegra194-smmu"))
+   return nvidia_smmu_impl_init(smmu);
+
if (of_device_is_compatible(np, "qcom,sdm845-smmu-500") ||
of_device_is_compatible(np, "qcom,sc7180-smmu-500"))
return qcom_smmu_impl_init(smmu);
diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
new file mode 100644
index ..2f55e5793d34
--- /dev/null
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -0,0 +1,179 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (C) 2019-2020 NVIDIA CORPORATION.  All rights reserved.
+
+#include <linux/bitfield.h>
+#include <linux/delay.h>
+#include <linux/of.h>
+#include <linux/platform_device.h>
+#include <linux/slab.h>
+
+#include "arm-smmu.h"
+
+/*
+ * Tegra194 has three ARM MMU-500 instances.
+ * Two of them are used together and must be programmed identically to
+ * interleave IOVA accesses across them; they translate accesses from
+ * non-isochronous HW devices.
+ * The third one is used to translate accesses from isochronous HW devices.
+ * This implementation supports programming of the two instances that must
+ * be programmed identically.
+ * Use of the third instance is through the standard arm-smmu driver itself
+ * and is out of scope of this implementation.
+ */
+#define NUM_SMMU_INSTANCES 2
+
+struct nvidia_smmu {
+   struct arm_smmu_device  smmu;
+   void __iomem*bases[NUM_SMMU_INSTANCES];
+};
+
+static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu,
+unsigned int inst, int page)
+{
+   struct nvidia_smmu *nvidia_smmu;
+
+   nvidia_smmu = container_of(smmu, struct nvidia_smmu, smmu);
+   return nvidia_smmu->bases[inst] + (page << smmu->pgshift);
+}
+
+static u32 nvidia_smmu_read_reg(struct arm_smmu_device *smmu,
+   int page, int offset)
+{
+   void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset;
+
+   return readl_relaxed(reg);
+}
+
+static void nvidia_smmu_write_reg(struct arm_smmu_device *smmu,
+ int page, int offset, u32 val)
+{
+   unsigned int i;
+
+   for (i = 0; i < NUM_SMMU_INSTANCES; i++) {
+   void __iomem *reg = nvidia_smmu_page(smmu, i, page) + offset;
+
+   writel_relaxed(val, reg);
+   }
+}
+
+static u64 nvidia_smmu_read_reg64(struct arm_smmu_device *smmu,
+ int page, int offset)
+{
+   void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset;
+
+   return readq_relaxed(reg);
+}
+
+static void nvidia_smmu_write_reg64(struct arm_smmu_device *smmu,
+   int page, int offset, u64 val)
+{
+   unsigned int i;
+
+   for (i = 0; i < NUM_SMMU_INSTANCES; i++) {
+   void 

Re: [PATCH v2 1/2] iommu: iommu_aux_at(de)tach_device() extension

2020-07-07 Thread Lu Baolu

Hi Alex,

Thanks a lot for your comments. Please check my reply inline.

On 7/8/20 5:04 AM, Alex Williamson wrote:

On Tue,  7 Jul 2020 09:39:56 +0800
Lu Baolu  wrote:


The hardware-assisted vfio mediated device is a use case of the iommu
aux-domain. The interactions between vfio/mdev and iommu during mdev
creation and passthrough are:

- Create a group for mdev with iommu_group_alloc();
- Add the device to the group with
 group = iommu_group_alloc();
 if (IS_ERR(group))
         return PTR_ERR(group);

 ret = iommu_group_add_device(group, &mdev->dev);
 if (!ret)
         dev_info(&mdev->dev, "MDEV: group_id = %d\n",
                  iommu_group_id(group));
- Allocate an aux-domain
 iommu_domain_alloc()
- Attach the aux-domain to the physical device from which the mdev is
   created.
 iommu_aux_attach_device()

In the whole process, an iommu group was allocated for the mdev and an
iommu domain was attached to the group, but group->domain remains
NULL. As a result, iommu_get_domain_for_dev() doesn't work anymore.

The iommu_get_domain_for_dev() is a necessary interface for device
drivers that want to support aux-domain. For example,

 struct iommu_domain *domain;
 struct device *dev = mdev_dev(mdev);
 unsigned long pasid;

 domain = iommu_get_domain_for_dev(dev);
 if (!domain)
 return -ENODEV;

 pasid = iommu_aux_get_pasid(domain, dev->parent);

How did we know this was an aux domain? ie. How did we know we could
use it with iommu_aux_get_pasid()?


Yes. It's a bit confusing if iommu_get_domain_for_dev() is reused here
for aux-domain.



Why did we assume the parent device is the iommu device for the aux
domain?  Should that level of detail be already known by the aux domain?

Nits - The iomu device of an mdev device is found via
mdev_get_iommu_device(dev), it should not be assumed to be the parent.
The parent of an mdev device is found via mdev_parent_dev(mdev).


My bad. The driver should use mdev_get_iommu_device() instead.



The leaps in logic here make me wonder if we should instead be exposing
more of an aux domain API rather than blurring the differences between
these domains.  Thanks,


How about adding the below API?

/**
 * iommu_aux_get_domain_for_dev - get aux domain for a device
 * @dev: the accessory device
 *
 * The caller should pass a valid @dev to iommu_aux_attach_device() before
 * calling this API. Returns the attached aux-domain, or NULL otherwise.
 */
struct iommu_domain *iommu_aux_get_domain_for_dev(struct device *dev)
{
struct iommu_domain *domain = NULL;
struct iommu_group *group;

group = iommu_group_get(dev);
if (!group)
return NULL;

if (group->aux_domain_attached)
domain = group->domain;

iommu_group_put(group);

return domain;
}
EXPORT_SYMBOL_GPL(iommu_aux_get_domain_for_dev);
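
With such an API, the driver snippet from the commit message would become
something like the sketch below (also switching to mdev_get_iommu_device()
per the nit above):

	struct iommu_domain *domain;
	struct device *dev = mdev_dev(mdev);
	unsigned long pasid;

	domain = iommu_aux_get_domain_for_dev(dev);
	if (!domain)
		return -ENODEV;

	/* The iommu device of the mdev, not its parent. */
	pasid = iommu_aux_get_pasid(domain, mdev_get_iommu_device(dev));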

Best regards,
baolu


Re: [PATCH v4 3/5] iommu/uapi: Use named union for user data

2020-07-07 Thread Lu Baolu

Hi Jacob,

On 7/8/20 7:43 AM, Jacob Pan wrote:

IOMMU UAPI data size is filled by the user space which must be validated
by the kernel. To ensure backward compatibility, user data can only be
extended by either re-purposing padding bytes or extending the variable
sized union at the end. No size change is allowed before the union.
Therefore, the minimum size is the offset of the union.

To use offsetof() on the union, we must make it named.

Link: https://lkml.org/lkml/2020/6/11/834


Please use lore.kernel.org links.

Best regards,
baolu


Re: [PATCH v2 3/4] iommu/vt-d: Report page request faults for guest SVA

2020-07-07 Thread Lu Baolu

Hi Jean,

On 7/7/20 7:23 PM, Jean-Philippe Brucker wrote:

On Mon, Jul 06, 2020 at 08:25:34AM +0800, Lu Baolu wrote:

A pasid might be bound to a page table from a VM guest via the iommu
ops.sva_bind_gpasid. In this case, when a DMA page fault is detected
on the physical IOMMU, we need to inject the page fault request into
the guest. After the guest completes handling the page fault, a page
response needs to be sent back via the iommu ops.page_response().

This adds support to report a page request fault. Any external module
which is interested in handling this fault should register a notifier
callback.

Co-developed-by: Jacob Pan 
Signed-off-by: Jacob Pan 
Co-developed-by: Liu Yi L 
Signed-off-by: Liu Yi L 
Signed-off-by: Lu Baolu 
---

[...]

+static int
+intel_svm_prq_report(struct device *dev, struct page_req_dsc *desc)
+{
+   struct iommu_fault_event event;
+   u8 bus, devfn;
+
+   memset(&event, 0, sizeof(struct iommu_fault_event));
+   bus = PCI_BUS_NUM(desc->rid);
+   devfn = desc->rid & 0xff;
+
+   /* Fill in event data for device specific processing */
+   event.fault.type = IOMMU_FAULT_PAGE_REQ;
+   event.fault.prm.addr = desc->addr;
+   event.fault.prm.pasid = desc->pasid;
+   event.fault.prm.grpid = desc->prg_index;
+   event.fault.prm.perm = prq_to_iommu_prot(desc);
+
+   /*
+* Set the last page in group bit if private data is present;
+* a page response is then required, as it is for LPIG.
+*/
+   if (desc->lpig)
+   event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE;
+   if (desc->pasid_present)
+   event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;


Do you also need to set IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID?  I added
the flag to deal with devices that do not want a PASID value in their PRI
response (bit 15 in the PCIe Page Request Status Register):
https://lore.kernel.org/linux-iommu/20200616144712.748818-1-jean-phili...@linaro.org/
(applied by Joerg for v5.9)

Grepping for pci_prg_resp_pasid_required() in intel/iommu.c it seems to
currently reject devices that do not want a PASID in a PRI response, so I
think you can set this flag unconditionally for now.


Yes. You are right. I will set this flag in the next version.

Best regards,
baolu



Thanks,
Jean


+   if (desc->priv_data_present) {
+   event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE;
+   event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA;
+   memcpy(event.fault.prm.private_data, desc->priv_data,
+  sizeof(desc->priv_data));
+   }
+
+   return iommu_report_device_fault(dev, &event);
+}
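
For clarity, the agreed follow-up would presumably change the
pasid_present branch along these lines (a sketch of the planned v3
change, not posted code):

	if (desc->pasid_present) {
		event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
		/*
		 * VT-d currently rejects devices that cannot take a PASID
		 * in the PRI response, so this can be set unconditionally
		 * for now, per Jean's note above.
		 */
		event.fault.prm.flags |= IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID;
	}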



Re: [PATCH v4 1/5] docs: IOMMU user API

2020-07-07 Thread Lu Baolu

Hi,

On 7/8/20 7:43 AM, Jacob Pan wrote:

+For UAPIs that are shared with in-kernel users, a wrapper function
+is provided to distinguish the callers. For example,
+
+Userspace caller ::
+
+  int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
+  void __user *udata)
+
+In-kernel caller ::
+
+  int __iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device 
*dev,
+  struct iommu_gpasid_bind_data *data)


iommu_page_response() may have the same needs. Can we reach an agreement
on the naming rules?
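
For concreteness, extending that convention to the page response path
would give signatures like these (hypothetical, following the
unbind_gpasid pattern quoted above):

	/* Userspace caller: takes the untrusted user pointer. */
	int iommu_page_response(struct device *dev, void __user *uinfo);

	/* In-kernel caller: takes an already-validated structure. */
	int __iommu_page_response(struct device *dev,
				  struct iommu_page_response *msg);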

Best regards,
baolu


[PATCH v4 1/5] docs: IOMMU user API

2020-07-07 Thread Jacob Pan
IOMMU UAPI is newly introduced to support communications between guest
virtual IOMMU and host IOMMU. There has been lots of discussions on how
it should work with VFIO UAPI and userspace in general.

This document is intended to clarify the UAPI design and usage. The
mechanics of how future extensions should be achieved are also covered
in this documentation.

Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 
---
 Documentation/userspace-api/iommu.rst | 312 ++
 1 file changed, 312 insertions(+)
 create mode 100644 Documentation/userspace-api/iommu.rst

diff --git a/Documentation/userspace-api/iommu.rst 
b/Documentation/userspace-api/iommu.rst
new file mode 100644
index ..581b462c2cec
--- /dev/null
+++ b/Documentation/userspace-api/iommu.rst
@@ -0,0 +1,312 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. iommu:
+
+=
+IOMMU Userspace API
+=
+
+IOMMU UAPI is used for virtualization cases where communications are
+needed between physical and virtual IOMMU drivers. For native
+usage, IOMMU is a system device which does not need to communicate
+with user space directly.
+
+The primary use cases are guest Shared Virtual Address (SVA) and
+guest IO virtual address (IOVA), wherein a virtual IOMMU (vIOMMU) is
+required to communicate with the physical IOMMU in the host.
+
+.. contents:: :local:
+
+Functionalities
+===
+Communication between user and kernel involves both directions. The
+supported user-kernel APIs are as follows:
+
+1. Alloc/Free PASID
+2. Bind/unbind guest PASID (e.g. Intel VT-d)
+3. Bind/unbind guest PASID table (e.g. ARM sMMU)
+4. Invalidate IOMMU caches
+5. Service page requests
+
+Requirements
+
+The IOMMU UAPIs are generic and extensible to meet the following
+requirements:
+
+1. Emulated and para-virtualised vIOMMUs
+2. Multiple vendors (Intel VT-d, ARM sMMU, etc.)
+3. Extensions to the UAPI shall not break existing user space
+
+Interfaces
+==
+Although the data structures defined in IOMMU UAPI are self-contained,
+no new user API functions are introduced. Instead, IOMMU UAPI is
+designed to work with existing user driver frameworks such as VFIO.
+
+Extension Rules & Precautions
+-
+When IOMMU UAPI gets extended, the data structures can *only* be
+modified in two ways:
+
+1. Adding new fields by re-purposing the padding[] field. No size change.
+2. Adding new union members at the end. May increase in size.
+
+No new fields can be added *after* the variable sized union, since that
+would break backward compatibility when the offset moves. In both cases,
+a new field must be accompanied by a new flag such that the IOMMU
+driver can process the data based on the new flag. The version field is
+only reserved for the unlikely event of a UAPI upgrade in its entirety.
+
+It is *always* the caller's responsibility to indicate the size of the
+structure passed by setting argsz appropriately.
+At the same time, argsz is user-provided data which is not trusted.
+While the argsz field allows the user to indicate how much data they're
+providing, it is still the kernel's responsibility to validate whether
+it is correct and sufficient for the requested operation.
+
+Compatibility Checking
+--
+When an IOMMU UAPI extension results in a size increase, a user such as
+VFIO has to handle the following cases:
+
+1. User and kernel have an exact size match
+2. An older user with an older kernel header (smaller UAPI size) running
+   on a newer kernel (larger UAPI size)
+3. A newer user with a newer kernel header (larger UAPI size) running
+   on an older kernel
+4. A malicious/misbehaving user passing an illegal/invalid size that is
+   within range. The data may contain garbage.
+
+Feature Checking
+
+While launching a guest with vIOMMU, it is important to ensure that the
+host can support the UAPI data structures to be used for vIOMMU-pIOMMU
+communications. Without upfront compatibility checking, future faults
+are difficult to report even in normal conditions. For example, TLB
+invalidations should always succeed. There is no architectural way to
+report back to the vIOMMU if the UAPI data is incompatible. If that
+happens, in order to protect the IOMMU isolation guarantee, we have to
+resort to not giving completion status in the vIOMMU. This may result
+in a VM hang.
+
+For this reason the following IOMMU UAPIs cannot fail:
+
+1. Free PASID
+2. Unbind guest PASID
+3. Unbind guest PASID table (SMMU)
+4. Cache invalidate
+
+User applications such as QEMU are expected to import kernel UAPI
+headers. Backward compatibility is supported per feature flags.
+For example, an older QEMU (with an older kernel header) can run on a
+newer kernel. A newer QEMU (with a new kernel header) may refuse to
+initialize on an older kernel if new feature flags are not supported by
+the older kernel. Simply recompiling existing code with a newer kernel
+header should not be an issue in that 
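
As an illustration of the two extension rules above, a structure evolving
under them would look like this (hypothetical types; none of these names
exist in the UAPI):

	struct iommu_example_variant_a { __u64 x; };
	struct iommu_example_variant_b { __u64 x; __u64 y; };

	struct iommu_example_uapi_data {
		__u32	argsz;		/* always filled by the caller */
		__u32	version;
	#define IOMMU_EXAMPLE_FLAG_NEW	(1 << 0) /* new flag gates new_field */
		__u32	flags;
		__u8	new_field;	/* rule 1: carved out of padding */
		__u8	padding[3];
		union {			/* rule 2: only this union may grow */
			struct iommu_example_variant_a a;
			struct iommu_example_variant_b b; /* appended member */
		} vendor;
	};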

[PATCH v4 0/5] IOMMU user API enhancement

2020-07-07 Thread Jacob Pan
IOMMU user API header was introduced to support nested DMA translation and
related fault handling. The current UAPI data structures consist of three
areas that cover the interactions between host kernel and guest:
 - fault handling
 - cache invalidation
 - bind guest page tables, i.e. guest PASID

Future extensions are likely to support more architectures and vIOMMU features.

In the previous discussion, using a user-filled data size and feature
flags was preferred over a unified version number.
https://lkml.org/lkml/2020/1/29/45

In addition to introducing an argsz field to the data structures, this
patchset also documents the UAPI design, usage, and extension rules. VT-d
driver changes to utilize the new argsz field are included; VFIO usage is
to follow.

Thanks,

Jacob


Changelog:
v4
- Added checks of UAPI data for reserved fields, version, and flags.
- Removed version check from vendor driver (vt-d)
- Relaxed argsz check to match the UAPI struct size instead of variable
  union size
- Updated documentation

v3:
- Rewrote backward compatibility rule to support existing code
  re-compiled with newer kernel UAPI header that runs on older
  kernel. Based on review comment from Alex W.
  https://lore.kernel.org/linux-iommu/20200611094741.6d118...@w520.home/
- Take user pointer directly in UAPI functions. Perform argsz check
  and copy_from_user() in IOMMU driver. Eliminate the need for
  VFIO or other upper layer to parse IOMMU data.
- Create wrapper function for in-kernel users of UAPI functions
v2:
- Removed unified API version and helper
- Introduced argsz for each UAPI data
- Introduced UAPI doc

Jacob Pan (5):
  docs: IOMMU user API
  iommu/uapi: Add argsz for user filled data
  iommu/uapi: Use named union for user data
  iommu/uapi: Handle data and argsz filled by users
  iommu/vt-d: Remove UAPI version check

 Documentation/userspace-api/iommu.rst | 312 ++
 drivers/iommu/intel/iommu.c   |  27 ++-
 drivers/iommu/intel/svm.c |   5 +-
 drivers/iommu/iommu.c | 208 ++-
 include/linux/iommu.h |   9 +-
 include/uapi/linux/iommu.h|  10 +-
 6 files changed, 541 insertions(+), 30 deletions(-)
 create mode 100644 Documentation/userspace-api/iommu.rst

-- 
2.7.4



[PATCH v4 3/5] iommu/uapi: Use named union for user data

2020-07-07 Thread Jacob Pan
IOMMU UAPI data size is filled by the user space which must be validated
by the kernel. To ensure backward compatibility, user data can only be
extended by either re-purposing padding bytes or extending the variable
sized union at the end. No size change is allowed before the union.
Therefore, the minimum size is the offset of the union.

To use offsetof() on the union, we must make it named.

Link: https://lkml.org/lkml/2020/6/11/834
Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel/iommu.c | 24 
 drivers/iommu/intel/svm.c   |  2 +-
 include/uapi/linux/iommu.h  |  4 ++--
 3 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 50fc62413a35..abd70b618a3f 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5409,8 +5409,8 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
 
/* Size is only valid in address selective invalidation */
if (inv_info->granularity == IOMMU_INV_GRANU_ADDR)
-   size = to_vtd_size(inv_info->addr_info.granule_size,
-  inv_info->addr_info.nb_granules);
+   size = to_vtd_size(inv_info->granu.addr_info.granule_size,
+  inv_info->granu.addr_info.nb_granules);
 
for_each_set_bit(cache_type,
 (unsigned long *)_info->cache,
@@ -5431,20 +5431,20 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
 * granularity.
 */
if (inv_info->granularity == IOMMU_INV_GRANU_PASID &&
-   (inv_info->pasid_info.flags & IOMMU_INV_PASID_FLAGS_PASID))
-   pasid = inv_info->pasid_info.pasid;
+   (inv_info->granu.pasid_info.flags & IOMMU_INV_PASID_FLAGS_PASID))
+   pasid = inv_info->granu.pasid_info.pasid;
else if (inv_info->granularity == IOMMU_INV_GRANU_ADDR &&
-(inv_info->addr_info.flags & IOMMU_INV_ADDR_FLAGS_PASID))
-   pasid = inv_info->addr_info.pasid;
+(inv_info->granu.addr_info.flags & IOMMU_INV_ADDR_FLAGS_PASID))
+   pasid = inv_info->granu.addr_info.pasid;
 
switch (BIT(cache_type)) {
case IOMMU_CACHE_INV_TYPE_IOTLB:
/* HW will ignore LSB bits based on address mask */
if (inv_info->granularity == IOMMU_INV_GRANU_ADDR &&
size &&
-   (inv_info->addr_info.addr & ((BIT(VTD_PAGE_SHIFT + size)) - 1))) {
+   (inv_info->granu.addr_info.addr & ((BIT(VTD_PAGE_SHIFT + size)) - 1))) {
WARN_ONCE(1, "Address out of range, 0x%llx, size order %llu\n",
- inv_info->addr_info.addr, size);
+ inv_info->granu.addr_info.addr, size);
}
 
/*
@@ -5452,9 +5452,9 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
 * We use npages = -1 to indicate that.
 */
qi_flush_piotlb(iommu, did, pasid,
-   mm_to_dma_pfn(inv_info->addr_info.addr),
+   mm_to_dma_pfn(inv_info->granu.addr_info.addr),
(granu == QI_GRAN_NONG_PASID) ? -1 : 1 << size,
-   inv_info->addr_info.flags & IOMMU_INV_ADDR_FLAGS_LEAF);
+   inv_info->granu.addr_info.flags & IOMMU_INV_ADDR_FLAGS_LEAF);
 
if (!info->ats_enabled)
break;
@@ -5475,13 +5475,13 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
size = 64 - VTD_PAGE_SHIFT;
addr = 0;
} else if (inv_info->granularity == IOMMU_INV_GRANU_ADDR)
-   addr = inv_info->addr_info.addr;
+   addr = inv_info->granu.addr_info.addr;
 
if (info->ats_enabled)
qi_flush_dev_iotlb_pasid(iommu, sid,
info->pfsid, pasid,
info->ats_qdep,
-   inv_info->addr_info.addr,
+   inv_info->granu.addr_info.addr,
size);
else
pr_warn_ratelimited("Passdown device IOTLB 
flush w/o ATS!\n");
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 

[PATCH v4 5/5] iommu/vt-d: Remove UAPI version check

2020-07-07 Thread Jacob Pan
The generic IOMMU layer already performs sanity checks on UAPI data,
including a version check. Remove the version check from the vendor
driver.

Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel/iommu.c | 3 +--
 drivers/iommu/intel/svm.c   | 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index abd70b618a3f..63fcca4645c9 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5383,8 +5383,7 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev,
int ret = 0;
u64 size = 0;
 
-   if (!inv_info || !dmar_domain ||
-   inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
+   if (!inv_info || !dmar_domain)
return -EINVAL;
 
if (!dev || !dev_is_pci(dev))
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 713b3a218483..15b36fa0147a 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -240,8 +240,7 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
if (WARN_ON(!iommu) || !data)
return -EINVAL;
 
-   if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
-   data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
+   if (data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
return -EINVAL;
 
if (!dev_is_pci(dev))
-- 
2.7.4



[PATCH v4 2/5] iommu/uapi: Add argsz for user filled data

2020-07-07 Thread Jacob Pan
As the IOMMU UAPI gets extended, user data size may increase. To support
backward compatibility, this patch introduces a size field in each UAPI
data structure. It is *always* the responsibility of the user to fill in
the correct size.

Specific scenarios for user data handling are documented in:
Documentation/userspace-api/iommu.rst

Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 
---
 include/uapi/linux/iommu.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index e907b7091a46..303f148a5cd7 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -135,6 +135,7 @@ enum iommu_page_response_code {
 
 /**
  * struct iommu_page_response - Generic page response information
+ * @argsz: User filled size of this data
  * @version: API version of this structure
  * @flags: encodes whether the corresponding fields are valid
  * (IOMMU_FAULT_PAGE_RESPONSE_* values)
@@ -143,6 +144,7 @@ enum iommu_page_response_code {
  * @code: response code from  iommu_page_response_code
  */
 struct iommu_page_response {
+   __u32   argsz;
 #define IOMMU_PAGE_RESP_VERSION_1  1
__u32   version;
 #define IOMMU_PAGE_RESP_PASID_VALID(1 << 0)
@@ -218,6 +220,7 @@ struct iommu_inv_pasid_info {
 /**
  * struct iommu_cache_invalidate_info - First level/stage invalidation
  * information
+ * @argsz: User filled size of this data
  * @version: API version of this structure
  * @cache: bitfield that allows to select which caches to invalidate
  * @granularity: defines the lowest granularity used for the invalidation:
@@ -246,6 +249,7 @@ struct iommu_inv_pasid_info {
  * must support the used granularity.
  */
 struct iommu_cache_invalidate_info {
+   __u32   argsz;
 #define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
__u32   version;
 /* IOMMU paging structure cache */
@@ -292,6 +296,7 @@ struct iommu_gpasid_bind_data_vtd {
 
 /**
  * struct iommu_gpasid_bind_data - Information about device and guest PASID 
binding
+ * @argsz: User filled size of this data
  * @version:   Version of this data structure
  * @format:PASID table entry format
  * @flags: Additional information on guest bind request
@@ -309,6 +314,7 @@ struct iommu_gpasid_bind_data_vtd {
  * PASID to host PASID based on this bind data.
  */
 struct iommu_gpasid_bind_data {
+   __u32 argsz;
 #define IOMMU_GPASID_BIND_VERSION_1    1
__u32 version;
 #define IOMMU_PASID_FORMAT_INTEL_VTD   1
-- 
2.7.4



[PATCH v4 4/5] iommu/uapi: Handle data and argsz filled by users

2020-07-07 Thread Jacob Pan
IOMMU UAPI data has a user-filled argsz field which indicates the data
length that comes with the API call. User data is not trusted; argsz must
be validated based on the current kernel data size, mandatory data size,
and feature flags.

User data may also be extended, resulting in a possible argsz increase.
Backward compatibility is ensured based on size and flags checking.

This patch adds sanity checks in the IOMMU layer. In addition to argsz,
reserved/unused fields in padding, flags, and version are also checked.
Details are documented in Documentation/userspace-api/iommu.rst

Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 
---
 drivers/iommu/iommu.c | 208 --
 include/linux/iommu.h |   9 ++-
 2 files changed, 206 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index d43120eb1dc5..7910249f5dd7 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1950,33 +1950,225 @@ int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
 }
 EXPORT_SYMBOL_GPL(iommu_attach_device);
 
+/*
+ * Check flags and other user provided data for valid combinations. We also
+ * make sure that no reserved fields or unused flags are set. This is to
+ * ensure we are not breaking userspace in the future when these fields or
+ * flags are used.
+ */
+static int iommu_check_cache_invl_data(struct iommu_cache_invalidate_info *info)
+{
+   int ret = 0;
+   u32 mask;
+
+   if (info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
+   return -EINVAL;
+
+   mask =  IOMMU_CACHE_INV_TYPE_IOTLB |
+   IOMMU_CACHE_INV_TYPE_DEV_IOTLB |
+   IOMMU_CACHE_INV_TYPE_PASID;
+   if (info->cache & ~mask) {
+   pr_warn_ratelimited("Invalid cache types %x\n", info->cache);
+   return -EINVAL;
+   }
+
+   if (info->granularity >= IOMMU_INV_GRANU_NR) {
+   pr_warn_ratelimited("Invalid cache invalidation granu %x\n",
+   info->granularity);
+   return -EINVAL;
+   }
+
+   switch (info->granularity) {
+   case IOMMU_INV_GRANU_ADDR:
+   mask = IOMMU_INV_ADDR_FLAGS_PASID |
+   IOMMU_INV_ADDR_FLAGS_ARCHID |
+   IOMMU_INV_ADDR_FLAGS_LEAF;
+
+   if (info->granu.addr_info.flags & ~mask) {
+   pr_warn_ratelimited("Unsupported invalidation addr 
flags %x\n",
+   info->granu.addr_info.flags);
+   ret = -EINVAL;
+   }
+   break;
+   case IOMMU_INV_GRANU_PASID:
+   mask = IOMMU_INV_PASID_FLAGS_PASID |
+   IOMMU_INV_PASID_FLAGS_ARCHID;
+   if (info->granu.pasid_info.flags & ~mask) {
+   pr_warn_ratelimited("Unsupported invalidation PASID 
flags%x\n",
+   info->granu.pasid_info.flags);
+   ret = -EINVAL;
+   }
+   break;
+   }
+
+   if (info->padding[0] || info->padding[1]) {
+   pr_warn_ratelimited("Non-zero reserved fields\n");
+   ret = -EINVAL;
+   }
+
+   return ret;
+}
+
 int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
-  struct iommu_cache_invalidate_info *inv_info)
+  void __user *uinfo)
 {
+   struct iommu_cache_invalidate_info inv_info;
+   unsigned long minsz, maxsz;
+   int ret = 0;
+
if (unlikely(!domain->ops->cache_invalidate))
return -ENODEV;
 
-   return domain->ops->cache_invalidate(domain, dev, inv_info);
+   /* Current kernel data size is the max to be copied from user */
+   maxsz = sizeof(struct iommu_cache_invalidate_info);
+   memset((void *)&inv_info, 0, maxsz);
+
+   /*
+* No new spaces can be added before the variable sized union, the
+* minimum size is the offset to the union.
+*/
+   minsz = offsetof(struct iommu_cache_invalidate_info, granu);
+
+   /* Copy minsz from user to get flags and argsz */
+   if (copy_from_user(&inv_info, uinfo, minsz))
+   return -EFAULT;
+
+   /* Fields before the variable-sized union are mandatory */
+   if (inv_info.argsz < minsz)
+   return -EINVAL;
+
+   /*
+* User might be using a newer UAPI header which has a larger data
+* size, we shall support the existing flags within the current
+* size.
+*/
+   if (inv_info.argsz > maxsz)
+   inv_info.argsz = maxsz;
+
+   /* Copy the remaining user data _after_ minsz */
+   if (copy_from_user((void *)&inv_info + minsz, uinfo + minsz,
+   inv_info.argsz - minsz))
+   return -EFAULT;
+
+   /* Now the argsz is validated, check the content */
+   ret = iommu_check_cache_invl_data(&inv_info);
+
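
Under these checks, the userspace side of a cache invalidation would be
filled in roughly as below (a sketch; how the structure reaches the
kernel, e.g. through a VFIO ioctl, is outside this series, and pasid is
assumed to have been allocated earlier):

	struct iommu_cache_invalidate_info inv = {
		.argsz		= sizeof(inv),	/* always the caller's job */
		.version	= IOMMU_CACHE_INVALIDATE_INFO_VERSION_1,
		.cache		= IOMMU_CACHE_INV_TYPE_IOTLB,
		.granularity	= IOMMU_INV_GRANU_PASID,
	};

	inv.granu.pasid_info.flags = IOMMU_INV_PASID_FLAGS_PASID;
	inv.granu.pasid_info.pasid = pasid;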

[PATCH v4 2/4] PCI: Keep the ACS capability offset in device

2020-07-07 Thread Rajat Jain via iommu
Currently the ACS capability is looked up in a number of places. Read and
store it once at boot so that it can be used by all later.

Signed-off-by: Rajat Jain 
---
v4: No change
v3: fix commit log, remove forward declaration of static function
v2: Commit log cosmetic changes

 drivers/pci/p2pdma.c |  2 +-
 drivers/pci/pci.c| 20 
 drivers/pci/pci.h|  2 +-
 drivers/pci/probe.c  |  2 +-
 drivers/pci/quirks.c |  8 
 include/linux/pci.h  |  1 +
 6 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index e8e444eeb1cd2..f29a48f8fa594 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -253,7 +253,7 @@ static int pci_bridge_has_acs_redir(struct pci_dev *pdev)
int pos;
u16 ctrl;
 
-   pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ACS);
+   pos = pdev->acs_cap;
if (!pos)
return 0;
 
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index eec625f0e594e..73a8627822140 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -831,7 +831,7 @@ static void pci_disable_acs_redir(struct pci_dev *dev)
if (!pci_dev_specific_disable_acs_redir(dev))
return;
 
-   pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS);
+   pos = dev->acs_cap;
if (!pos) {
pci_warn(dev, "cannot disable ACS redirect for this hardware as 
it does not have ACS capabilities\n");
return;
@@ -857,7 +857,7 @@ static void pci_std_enable_acs(struct pci_dev *dev)
u16 cap;
u16 ctrl;
 
-   pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS);
+   pos = dev->acs_cap;
if (!pos)
return;
 
@@ -883,7 +883,7 @@ static void pci_std_enable_acs(struct pci_dev *dev)
  * pci_enable_acs - enable ACS if hardware support it
  * @dev: the PCI device
  */
-void pci_enable_acs(struct pci_dev *dev)
+static void pci_enable_acs(struct pci_dev *dev)
 {
if (!pci_acs_enable)
goto disable_acs_redir;
@@ -3362,7 +3362,7 @@ static bool pci_acs_flags_enabled(struct pci_dev *pdev, u16 acs_flags)
int pos;
u16 cap, ctrl;
 
-   pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ACS);
+   pos = pdev->acs_cap;
if (!pos)
return false;
 
@@ -3487,6 +3487,18 @@ bool pci_acs_path_enabled(struct pci_dev *start,
return true;
 }
 
+/**
+ * pci_acs_init - Initialize ACS if hardware supports it
+ * @dev: the PCI device
+ */
+void pci_acs_init(struct pci_dev *dev)
+{
+   dev->acs_cap = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS);
+
+   if (dev->acs_cap)
+   pci_enable_acs(dev);
+}
+
 /**
  * pci_rebar_find_pos - find position of resize ctrl reg for BAR
  * @pdev: PCI device
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 6d3f758671064..12fb79fbe29d3 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -532,7 +532,7 @@ static inline resource_size_t pci_resource_alignment(struct pci_dev *dev,
return resource_alignment(res);
 }
 
-void pci_enable_acs(struct pci_dev *dev);
+void pci_acs_init(struct pci_dev *dev);
 #ifdef CONFIG_PCI_QUIRKS
 int pci_dev_specific_acs_enabled(struct pci_dev *dev, u16 acs_flags);
 int pci_dev_specific_enable_acs(struct pci_dev *dev);
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 2f66988cea257..6d87066a5ecc5 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2390,7 +2390,7 @@ static void pci_init_capabilities(struct pci_dev *dev)
pci_ats_init(dev);  /* Address Translation Services */
pci_pri_init(dev);  /* Page Request Interface */
pci_pasid_init(dev);/* Process Address Space ID */
-   pci_enable_acs(dev);/* Enable ACS P2P upstream forwarding */
+   pci_acs_init(dev);  /* Access Control Services */
pci_ptm_init(dev);  /* Precision Time Measurement */
pci_aer_init(dev);  /* Advanced Error Reporting */
pci_dpc_init(dev);  /* Downstream Port Containment */
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 812bfc32ecb82..b341628e47527 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -4653,7 +4653,7 @@ static int pci_quirk_intel_spt_pch_acs(struct pci_dev *dev, u16 acs_flags)
if (!pci_quirk_intel_spt_pch_acs_match(dev))
return -ENOTTY;
 
-   pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS);
+   pos = dev->acs_cap;
if (!pos)
return -ENOTTY;
 
@@ -4961,7 +4961,7 @@ static int pci_quirk_enable_intel_spt_pch_acs(struct pci_dev *dev)
if (!pci_quirk_intel_spt_pch_acs_match(dev))
return -ENOTTY;
 
-   pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS);
+   pos = dev->acs_cap;
if (!pos)
return -ENOTTY;
 
@@ -4988,7 +4988,7 @@ static int 

[PATCH v4 1/4] PCI: Move pci_enable_acs() and its dependencies up in pci.c

2020-07-07 Thread Rajat Jain via iommu
Move pci_enable_acs() and the functions it depends on further up in the
source code to avoid having to forward-declare it when we make it static
in the near future (next patch).

No functional changes intended.

Signed-off-by: Rajat Jain 
---
v4: Same as v3
v3: Initial version of the patch, created per Bjorn's suggestion

 drivers/pci/pci.c | 254 +++---
 1 file changed, 127 insertions(+), 127 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index ce096272f52b1..eec625f0e594e 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -777,6 +777,133 @@ int pci_wait_for_pending(struct pci_dev *dev, int pos, u16 mask)
return 0;
 }
 
+static int pci_acs_enable;
+
+/**
+ * pci_request_acs - ask for ACS to be enabled if supported
+ */
+void pci_request_acs(void)
+{
+   pci_acs_enable = 1;
+}
+
+static const char *disable_acs_redir_param;
+
+/**
+ * pci_disable_acs_redir - disable ACS redirect capabilities
+ * @dev: the PCI device
+ *
+ * For only devices specified in the disable_acs_redir parameter.
+ */
+static void pci_disable_acs_redir(struct pci_dev *dev)
+{
+   int ret = 0;
+   const char *p;
+   int pos;
+   u16 ctrl;
+
+   if (!disable_acs_redir_param)
+   return;
+
+   p = disable_acs_redir_param;
+   while (*p) {
+   ret = pci_dev_str_match(dev, p, &p);
+   if (ret < 0) {
+   pr_info_once("PCI: Can't parse disable_acs_redir 
parameter: %s\n",
+disable_acs_redir_param);
+
+   break;
+   } else if (ret == 1) {
+   /* Found a match */
+   break;
+   }
+
+   if (*p != ';' && *p != ',') {
+   /* End of param or invalid format */
+   break;
+   }
+   p++;
+   }
+
+   if (ret != 1)
+   return;
+
+   if (!pci_dev_specific_disable_acs_redir(dev))
+   return;
+
+   pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS);
+   if (!pos) {
+   pci_warn(dev, "cannot disable ACS redirect for this hardware as 
it does not have ACS capabilities\n");
+   return;
+   }
+
+   pci_read_config_word(dev, pos + PCI_ACS_CTRL, &ctrl);
+
+   /* P2P Request & Completion Redirect */
+   ctrl &= ~(PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC);
+
+   pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
+
+   pci_info(dev, "disabled ACS redirect\n");
+}
+
+/**
+ * pci_std_enable_acs - enable ACS on devices using standard ACS capabilities
+ * @dev: the PCI device
+ */
+static void pci_std_enable_acs(struct pci_dev *dev)
+{
+   int pos;
+   u16 cap;
+   u16 ctrl;
+
+   pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS);
+   if (!pos)
+   return;
+
+   pci_read_config_word(dev, pos + PCI_ACS_CAP, &cap);
+   pci_read_config_word(dev, pos + PCI_ACS_CTRL, &ctrl);
+
+   /* Source Validation */
+   ctrl |= (cap & PCI_ACS_SV);
+
+   /* P2P Request Redirect */
+   ctrl |= (cap & PCI_ACS_RR);
+
+   /* P2P Completion Redirect */
+   ctrl |= (cap & PCI_ACS_CR);
+
+   /* Upstream Forwarding */
+   ctrl |= (cap & PCI_ACS_UF);
+
+   pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
+}
+
+/**
+ * pci_enable_acs - enable ACS if hardware support it
+ * @dev: the PCI device
+ */
+void pci_enable_acs(struct pci_dev *dev)
+{
+   if (!pci_acs_enable)
+   goto disable_acs_redir;
+
+   if (!pci_dev_specific_enable_acs(dev))
+   goto disable_acs_redir;
+
+   pci_std_enable_acs(dev);
+
+disable_acs_redir:
+   /*
+* Note: pci_disable_acs_redir() must be called even if ACS was not
+* enabled by the kernel because it may have been enabled by
+* platform firmware.  So if we are told to disable it, we should
+* always disable it after setting the kernel's default
+* preferences.
+*/
+   pci_disable_acs_redir(dev);
+}
+
 /**
  * pci_restore_bars - restore a device's BAR values (e.g. after wake-up)
  * @dev: PCI device to have its BARs restored
@@ -3230,133 +3357,6 @@ void pci_configure_ari(struct pci_dev *dev)
}
 }
 
-static int pci_acs_enable;
-
-/**
- * pci_request_acs - ask for ACS to be enabled if supported
- */
-void pci_request_acs(void)
-{
-   pci_acs_enable = 1;
-}
-
-static const char *disable_acs_redir_param;
-
-/**
- * pci_disable_acs_redir - disable ACS redirect capabilities
- * @dev: the PCI device
- *
- * For only devices specified in the disable_acs_redir parameter.
- */
-static void pci_disable_acs_redir(struct pci_dev *dev)
-{
-   int ret = 0;
-   const char *p;
-   int pos;
-   u16 ctrl;
-
-   if (!disable_acs_redir_param)
-   return;
-
-   p = disable_acs_redir_param;
-   while (*p) {
-   ret = 

[PATCH v4 4/4] PCI/ACS: Enable PCI_ACS_TB for untrusted/external-facing devices

2020-07-07 Thread Rajat Jain via iommu
When enabling ACS, enable translation blocking for external facing ports
and untrusted devices.

Signed-off-by: Rajat Jain 
---
v4: Add braces to avoid warning from kernel robot.
    Print warning only for external-facing devices.
v3: print warning if ACS_TB not supported on external-facing/untrusted ports.
Minor code comments fixes.
v2: Commit log change

 drivers/pci/pci.c|  8 
 drivers/pci/quirks.c | 15 +++
 2 files changed, 23 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 73a8627822140..a5a6bea7af7ce 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -876,6 +876,14 @@ static void pci_std_enable_acs(struct pci_dev *dev)
/* Upstream Forwarding */
ctrl |= (cap & PCI_ACS_UF);
 
+   /* Enable Translation Blocking for external devices */
+   if (dev->external_facing || dev->untrusted) {
+   if (cap & PCI_ACS_TB)
+   ctrl |= PCI_ACS_TB;
+   else if (dev->external_facing)
+   pci_warn(dev, "ACS: No Translation Blocking on 
external-facing dev\n");
+   }
+
pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
 }
 
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index b341628e47527..bb22b46c1d719 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -4934,6 +4934,13 @@ static void pci_quirk_enable_intel_rp_mpc_acs(struct pci_dev *dev)
}
 }
 
+/*
+ * Currently this quirk does the equivalent of
+ * PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF
+ *
+ * TODO: This quirk also needs to do equivalent of PCI_ACS_TB,
+ * if dev->external_facing || dev->untrusted
+ */
 static int pci_quirk_enable_intel_pch_acs(struct pci_dev *dev)
 {
if (!pci_quirk_intel_pch_acs_match(dev))
@@ -4973,6 +4980,14 @@ static int pci_quirk_enable_intel_spt_pch_acs(struct pci_dev *dev)
ctrl |= (cap & PCI_ACS_CR);
ctrl |= (cap & PCI_ACS_UF);
 
+   /* Enable Translation Blocking for external devices */
+   if (dev->external_facing || dev->untrusted) {
+   if (cap & PCI_ACS_TB)
+   ctrl |= PCI_ACS_TB;
+   else if (dev->external_facing)
+   pci_warn(dev, "ACS: No Translation Blocking on 
external-facing dev\n");
+   }
+
pci_write_config_dword(dev, pos + INTEL_SPT_ACS_CTRL, ctrl);
 
pci_info(dev, "Intel SPT PCH root port ACS workaround enabled\n");
-- 
2.27.0.212.ge8ba1cc988-goog



[PATCH v4 3/4] PCI: Treat "external-facing" devices as internal

2020-07-07 Thread Rajat Jain via iommu
The "ExternalFacingPort" devices (root ports) are internal devices that
sit on the internal system fabric. Ref:
https://docs.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports

Currently they were treated (marked as untrusted) on par with other
external devices downstream of those external-facing root ports.

Use the platform flag to identify external-facing devices and then
treat them on par with internal devices (don't mark them untrusted).
Any devices downstream continue to be marked as "untrusted". This was
discussed here:
https://lore.kernel.org/linux-pci/20200610230906.GA1528594@bjorn-Precision-5520/

Signed-off-by: Rajat Jain 
---
v4: No change
v3: * fix commit log and minor code comment
* Don't check for "ExternalFacingPort" on PCI_EXP_TYPE_DOWNSTREAM
* Check only for pdev->external_facing in iommu.c
v2: cosmetic changes in commit log

 drivers/iommu/intel/iommu.c |  6 +++---
 drivers/pci/of.c|  2 +-
 drivers/pci/pci-acpi.c  | 10 +-
 drivers/pci/probe.c |  2 +-
 include/linux/pci.h |  8 
 5 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index d759e7234e982..4f0f6ee2d4aaa 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4738,12 +4738,12 @@ const struct attribute_group *intel_iommu_groups[] = {
NULL,
 };
 
-static inline bool has_untrusted_dev(void)
+static inline bool has_external_pci(void)
 {
struct pci_dev *pdev = NULL;
 
for_each_pci_dev(pdev)
-   if (pdev->untrusted)
+   if (pdev->external_facing)
return true;
 
return false;
@@ -4751,7 +4751,7 @@ static inline bool has_untrusted_dev(void)
 
 static int __init platform_optin_force_iommu(void)
 {
-   if (!dmar_platform_optin() || no_platform_optin || !has_untrusted_dev())
+   if (!dmar_platform_optin() || no_platform_optin || !has_external_pci())
return 0;
 
if (no_iommu || dmar_disabled)
diff --git a/drivers/pci/of.c b/drivers/pci/of.c
index 27839cd2459f6..22727fc9558df 100644
--- a/drivers/pci/of.c
+++ b/drivers/pci/of.c
@@ -42,7 +42,7 @@ void pci_set_bus_of_node(struct pci_bus *bus)
} else {
node = of_node_get(bus->self->dev.of_node);
if (node && of_property_read_bool(node, "external-facing"))
-   bus->self->untrusted = true;
+   bus->self->external_facing = true;
}
 
bus->dev.of_node = node;
diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
index 7224b1e5f2a83..43a5158b2b662 100644
--- a/drivers/pci/pci-acpi.c
+++ b/drivers/pci/pci-acpi.c
@@ -1213,7 +1213,7 @@ static void pci_acpi_optimize_delay(struct pci_dev *pdev,
ACPI_FREE(obj);
 }
 
-static void pci_acpi_set_untrusted(struct pci_dev *dev)
+static void pci_acpi_set_external_facing(struct pci_dev *dev)
 {
u8 val;
 
@@ -1223,12 +1223,12 @@ static void pci_acpi_set_untrusted(struct pci_dev *dev)
return;
 
/*
-* These root ports expose PCIe (including DMA) outside of the
-* system so make sure we treat them and everything behind as
+* These root/down ports expose PCIe (including DMA) outside of the
+* system so make sure we treat everything behind them as
 * untrusted.
 */
if (val)
-   dev->untrusted = 1;
+   dev->external_facing = 1;
 }
 
 static void pci_acpi_setup(struct device *dev)
@@ -1240,7 +1240,7 @@ static void pci_acpi_setup(struct device *dev)
return;
 
pci_acpi_optimize_delay(pci_dev, adev->handle);
-   pci_acpi_set_untrusted(pci_dev);
+   pci_acpi_set_external_facing(pci_dev);
pci_acpi_add_edr_notifier(pci_dev);
 
pci_acpi_add_pm_notifier(adev, pci_dev);
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 6d87066a5ecc5..8c40c00413e74 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1552,7 +1552,7 @@ static void set_pcie_untrusted(struct pci_dev *dev)
 * untrusted as well.
 */
parent = pci_upstream_bridge(dev);
-   if (parent && parent->untrusted)
+   if (parent && (parent->untrusted || parent->external_facing))
dev->untrusted = true;
 }
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 0ca39042507ce..281be857d2430 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -432,6 +432,14 @@ struct pci_dev {
 * mappings to make sure they cannot access arbitrary memory.
 */
unsigned intuntrusted:1;
+   /*
+* Devices are marked as external-facing using info from platform
+* (ACPI / devicetree). An external-facing device is still an internal
+* trusted device, but it faces external untrusted devices. Thus any
+* device enumerated downstream of an external-facing device is marked

Re: [PATCH RESEND v2] PCI: Add device even if driver attach failed

2020-07-07 Thread Bjorn Helgaas
On Mon, Jul 06, 2020 at 04:32:40PM -0700, Rajat Jain wrote:
> device_attach() returning failure indicates a driver error while trying to
> probe the device. In such a scenario, the PCI device should still be added
> in the system and be visible to the user.
> 
> This patch partially reverts:
> commit ab1a187bba5c ("PCI: Check device_attach() return value always")
> 
> Signed-off-by: Rajat Jain 
> Reviewed-by: Greg Kroah-Hartman 
> ---
> Resending to stable, independent from other patches per Greg's suggestion
> v2: Add Greg's reviewed by, fix commit log

Applied to pci/enumeration for v5.8 with stable tag, thanks!

>  drivers/pci/bus.c | 6 +-
>  1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> index 8e40b3e6da77d..3cef835b375fd 100644
> --- a/drivers/pci/bus.c
> +++ b/drivers/pci/bus.c
> @@ -322,12 +322,8 @@ void pci_bus_add_device(struct pci_dev *dev)
>  
>   dev->match_driver = true;
>   retval = device_attach(&dev->dev);
> - if (retval < 0 && retval != -EPROBE_DEFER) {
> + if (retval < 0 && retval != -EPROBE_DEFER)
>   pci_warn(dev, "device attach failed (%d)\n", retval);
> - pci_proc_detach_device(dev);
> - pci_remove_sysfs_dev_files(dev);
> - return;
> - }
>  
>   pci_dev_assign_added(dev, true);
>  }
> -- 
> 2.27.0.212.ge8ba1cc988-goog
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 1/5] docs: IOMMU user API

2020-07-07 Thread Alex Williamson
On Mon, 29 Jun 2020 16:05:18 -0700
Jacob Pan  wrote:

> On Fri, 26 Jun 2020 16:19:23 -0600
> Alex Williamson  wrote:
> 
> > On Tue, 23 Jun 2020 10:03:53 -0700
> > Jacob Pan  wrote:
> >   
> > > IOMMU UAPI is newly introduced to support communications between
> > > guest virtual IOMMU and host IOMMU. There have been lots of
> > > discussions on how it should work with VFIO UAPI and userspace in
> > > general.
> > > 
> > > This document is intended to clarify the UAPI design and usage. The
> > > mechanics of how future extensions should be achieved are also
> > > covered in this documentation.
> > > 
> > > Signed-off-by: Liu Yi L 
> > > Signed-off-by: Jacob Pan 
> > > ---
> > >  Documentation/userspace-api/iommu.rst | 244 ++++++++++++++++++++++++++++++
> > >  1 file changed, 244 insertions(+)
> > >  create mode 100644 Documentation/userspace-api/iommu.rst
> > > 
> > > diff --git a/Documentation/userspace-api/iommu.rst
> > > b/Documentation/userspace-api/iommu.rst new file mode 100644
> > > index ..f9e4ed90a413
> > > --- /dev/null
> > > +++ b/Documentation/userspace-api/iommu.rst
> > > @@ -0,0 +1,244 @@
> > > +.. SPDX-License-Identifier: GPL-2.0
> > > +.. iommu:
> > > +
> > > +===================
> > > +IOMMU Userspace API
> > > +===================
> > > +
> > > +IOMMU UAPI is used for virtualization cases where communications
> > > are +needed between physical and virtual IOMMU drivers. For native
> > > +usage, IOMMU is a system device which does not need to communicate
> > > +with user space directly.
> > > +
> > > +The primary use cases are guest Shared Virtual Address (SVA) and
> > > +guest IO virtual address (IOVA), wherein a virtual IOMMU (vIOMMU)
> > > is +required to communicate with the physical IOMMU in the host.
> > > +
> > > +.. contents:: :local:
> > > +
> > > +Functionalities
> > > +===============
> > > +Communication between user and kernel involves both directions. The
> > > +supported user-kernel APIs are as follows:
> > > +
> > > +1. Alloc/Free PASID
> > > +2. Bind/unbind guest PASID (e.g. Intel VT-d)
> > > +3. Bind/unbind guest PASID table (e.g. ARM sMMU)
> > > +4. Invalidate IOMMU caches
> > > +5. Service page requests
> > > +
> > > +Requirements
> > > +============
> > > +The IOMMU UAPIs are generic and extensible to meet the following
> > > +requirements:
> > > +
> > > +1. Emulated and para-virtualised vIOMMUs
> > > +2. Multiple vendors (Intel VT-d, ARM sMMU, etc.)
> > > +3. Extensions to the UAPI shall not break existing user space
> > > +
> > > +Interfaces
> > > +==========
> > > +Although the data structures defined in IOMMU UAPI are self-contained,
> > > +there are no user API functions introduced. Instead, IOMMU UAPI is
> > > +designed to work with existing user driver frameworks such as VFIO.
> > > +
> > > +Extension Rules & Precautions
> > > +-----------------------------
> > > +When IOMMU UAPI gets extended, the data structures can *only* be
> > > +modified in two ways:
> > > +
> > > +1. Adding new fields by re-purposing the padding[] field. No size
> > > +   change.
> > > +2. Adding new union members at the end. May increase in size.
> > > +
> > > +No new fields can be added *after* the variable sized union, since
> > > +that would break backward compatibility when the offset moves. In
> > > +both cases, a new field must be accompanied by a new flag so that the
> > > +IOMMU driver can process the data based on the new flag. The version
> > > +field is reserved only for the unlikely event of upgrading the UAPI
> > > +in its entirety.
> > > +
> > > +It's *always* the caller's responsibility to indicate the size of the
> > > +structure passed by setting argsz appropriately.
> > > +Though at the same time, argsz is user-provided data which is not
> > > +trusted. The argsz field allows the user to indicate how much data
> > > +they're providing; it's still the kernel's responsibility to validate
> > > +whether it's correct and sufficient for the requested operation.
> > > +
> > > +Compatibility Checking
> > > +----------------------
> > > +When an IOMMU UAPI extension results in a size increase, a user such
> > > +as VFIO has to handle the following cases:
> > > +
> > > +1. User and kernel have an exact size match
> > > +2. An older user with an older kernel header (smaller UAPI size)
> > > +   running on a newer kernel (larger UAPI size)
> > > +3. A newer user with a newer kernel header (larger UAPI size) running
> > > +   on an older kernel.
> > > +4. A malicious/misbehaving user passes an illegal/invalid size that is
> > > +   within range. The data may contain garbage.
> > 
> > What exactly does vfio need to do to handle these?
> >   
> VFIO does nothing other than returning the status from IOMMU driver.
> Based on the return status, users such as QEMU can cause fault
> conditions within the vIOMMU.
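
To make the layout rules above concrete, here is a minimal sketch of a
structure that follows them; the struct and member names are purely
hypothetical, not part of the proposed API:

	struct iommu_example_op {
		__u32	argsz;		/* user-filled total size */
		__u32	flags;		/* a new flag bit per new field */
		__u8	padding[8];	/* re-purposed for new fixed-size
					 * fields, no size change */
		union {			/* new members appended at the end
					 * only; may grow the size */
			struct iommu_example_v1 v1;
		};
	};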
> 
> > > +
> > > +Feature Checking
> > > +================
> > > +While launching a guest with vIOMMU, it is important to ensure that
> > > +the host can support the UAPI data 

Re: [PATCH v5 00/12] x86: tag application address space for devices

2020-07-07 Thread Fenghua Yu
Hi, Thomas, Joerg, Boris, Ingo, Baolu, and x86/iommu maintainers,

On Tue, Jun 30, 2020 at 04:44:30PM -0700, Fenghua Yu wrote:
> Typical hardware devices require a driver stack to translate application
> buffers to hardware addresses, and a kernel-user transition to notify the
> hardware of new work. What if both the translation and transition overhead
> could be eliminated? This is what Shared Virtual Address (SVA) and ENQCMD
> enabled hardware like Data Streaming Accelerator (DSA) aims to achieve.
> Applications map portals in their local-address-space and directly submit
> work to them using a new instruction.
> 

Any comment on this series? Updated patch 1 with minor changes was sent out
on the same patch 1 thread and was Acked-by Felix Kuehling. If you want me
to send the whole series with the minor changes or any other changes, please
let me know.

Any plan to push the patches into your tree and upstream?

Thanks.

-Fenghua
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 1/2] iommu: iommu_aux_at(de)tach_device() extension

2020-07-07 Thread Alex Williamson
On Tue,  7 Jul 2020 09:39:56 +0800
Lu Baolu  wrote:

> The hardware-assisted vfio mediated device is a use case of the iommu
> aux-domain. The interactions between vfio/mdev and iommu during mdev
> creation and passthrough are:
> 
> - Create a group for mdev with iommu_group_alloc();
> - Add the device to the group with
> group = iommu_group_alloc();
> if (IS_ERR(group))
> return PTR_ERR(group);
> 
> ret = iommu_group_add_device(group, &mdev->dev);
> if (!ret)
> dev_info(&mdev->dev, "MDEV: group_id = %d\n",
>  iommu_group_id(group));
> - Allocate an aux-domain
> iommu_domain_alloc()
> - Attach the aux-domain to the physical device from which the mdev is
>   created.
> iommu_aux_attach_device()
> 
> In the whole process, an iommu group was allocated for the mdev and an
> iommu domain was attached to the group, but group->domain is left
> NULL. As a result, iommu_get_domain_for_dev() doesn't work anymore.
> 
> The iommu_get_domain_for_dev() is a necessary interface for device
> drivers that want to support aux-domain. For example,
> 
> struct iommu_domain *domain;
> struct device *dev = mdev_dev(mdev);
> unsigned long pasid;
> 
> domain = iommu_get_domain_for_dev(dev);
> if (!domain)
> return -ENODEV;
> 
> pasid = iommu_aux_get_pasid(domain, dev->parent);

How did we know this was an aux domain? ie. How did we know we could
use it with iommu_aux_get_pasid()?

Why did we assume the parent device is the iommu device for the aux
domain?  Should that level of detail be already known by the aux domain?

Nits - The iommu device of an mdev device is found via
mdev_get_iommu_device(dev); it should not be assumed to be the parent.
The parent of an mdev device is found via mdev_parent_dev(mdev).
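
For illustration, a rough sketch of that explicit lookup (assuming the
parent driver registered its iommu device with mdev_set_iommu_device()
at mdev creation time):

	struct device *iommu_device = mdev_get_iommu_device(mdev_dev(mdev));

	if (!iommu_device)
		return -EINVAL;

	pasid = iommu_aux_get_pasid(domain, iommu_device);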

The leaps in logic here make me wonder if we should instead be exposing
more of an aux domain API rather than blurring the differences between
these domains.  Thanks,

Alex

> if (pasid == IOASID_INVALID)
> return -EINVAL;
> 
>  /* Program the device context with the PASID value */
>  
> 
> This extends iommu_aux_at(de)tach_device() so that the users could pass
> in an optional device pointer (struct device for vfio/mdev for example),
> and the necessary check and data link could be done.
> 
> Fixes: a3a195929d40b ("iommu: Add APIs for multiple domains per device")
> Cc: Robin Murphy 
> Cc: Alex Williamson 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/iommu.c   | 86 ++++++++++++++++++++++++++++++++++++++-------
>  drivers/vfio/vfio_iommu_type1.c |  5 +-
>  include/linux/iommu.h   | 12 +++--
>  3 files changed, 87 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 1ed1e14a1f0c..435835058209 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2723,26 +2723,92 @@ EXPORT_SYMBOL_GPL(iommu_dev_feature_enabled);
>   * This should make us safe against a device being attached to a guest as a
>   * whole while there are still pasid users on it (aux and sva).
>   */
> -int iommu_aux_attach_device(struct iommu_domain *domain, struct device *dev)
> +int iommu_aux_attach_device(struct iommu_domain *domain,
> + struct device *phys_dev, struct device *dev)
>  {
> - int ret = -ENODEV;
> + struct iommu_group *group;
> + int ret;
>  
> - if (domain->ops->aux_attach_dev)
> - ret = domain->ops->aux_attach_dev(domain, dev);
> + if (!domain->ops->aux_attach_dev ||
> + !iommu_dev_feature_enabled(phys_dev, IOMMU_DEV_FEAT_AUX))
> + return -ENODEV;
>  
> - if (!ret)
> - trace_attach_device_to_domain(dev);
> + /* Bare use only. */
> + if (!dev) {
> + ret = domain->ops->aux_attach_dev(domain, phys_dev);
> + if (!ret)
> + trace_attach_device_to_domain(phys_dev);
> +
> + return ret;
> + }
> +
> + /*
> +  * The caller has created a made-up device (for example, vfio/mdev)
> +  * and allocated an iommu_group for user level direct assignment.
> +  * Make sure that the group has only single device and hasn't been
> +  * attached by any other domain.
> +  */
> + group = iommu_group_get(dev);
> + if (!group)
> + return -ENODEV;
> +
> + /*
> +  * Lock the group to make sure the device-count doesn't change while
> +  * we are attaching.
> +  */
> + mutex_lock(&group->mutex);
> + ret = -EINVAL;
> + if ((iommu_group_device_count(group) != 1) || group->domain)
> + goto out_unlock;
> +
> + ret = -EBUSY;
> + if (group->default_domain && group->domain != group->default_domain)
> + goto out_unlock;
> +
> + ret = domain->ops->aux_attach_dev(domain, phys_dev);
> + if (!ret) {
> + trace_attach_device_to_domain(phys_dev);
> + 

Re: the XSK buffer pool needs be to reverted

2020-07-07 Thread Robin Murphy

On 2020-07-06 20:59, Jonathan Lemon wrote:

On Wed, Jul 01, 2020 at 10:46:40AM +0100, Robin Murphy wrote:

On 2020-06-30 20:08, Jonathan Lemon wrote:

On Mon, Jun 29, 2020 at 02:15:16PM +0100, Robin Murphy wrote:

On 2020-06-27 08:02, Christoph Hellwig wrote:

On Fri, Jun 26, 2020 at 01:54:12PM -0700, Jonathan Lemon wrote:

On Fri, Jun 26, 2020 at 09:47:25AM +0200, Christoph Hellwig wrote:


Note that this is somewhat urgent, as various of the APIs that the code
is abusing are slated to go away for Linux 5.9, so this addition comes
at a really bad time.


Could you elaborate on what is upcoming here?


Moving all these calls out of line, and adding a bypass flag to avoid
the indirect function call for IOMMUs in direct mapped mode.


Also, on a semi-related note, are there limitations on how many pages
can be left mapped by the iommu?  Some of the page pool work involves
leaving the pages mapped instead of constantly mapping/unmapping them.


There are, but I think for all modern IOMMUs they are so big that they
don't matter.  Maintainers of the individual IOMMU drivers might know
more.


Right - I don't know too much about older and more esoteric stuff like POWER
TCE, but for modern pagetable-based stuff like Intel VT-d, AMD-Vi, and Arm
SMMU, the only "limits" are such that legitimate DMA API use should never
get anywhere near them (you'd run out of RAM for actual buffers long
beforehand). The most vaguely-realistic concern might be a pathological
system topology where some old 32-bit PCI device doesn't have ACS isolation
from your high-performance NIC such that they have to share an address
space, where the NIC might happen to steal all the low addresses and prevent
the soundcard or whatever from being able to map a usable buffer.

With an IOMMU, you typically really *want* to keep a full working set's
worth of pages mapped, since dma_map/unmap are expensive while dma_sync is
somewhere between relatively cheap and free. With no IOMMU it makes no real
difference from the DMA API perspective since map/unmap are effectively no
more than the equivalent sync operations anyway (I'm assuming we're not
talking about the kind of constrained hardware that might need SWIOTLB).
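
A minimal sketch of that recycling pattern, with a hypothetical buffer
structure:

	/* map once, when the page first enters the pool */
	buf->dma = dma_map_page(dev, buf->page, 0, PAGE_SIZE,
				DMA_FROM_DEVICE);

	/* per packet, transfer ownership with sync only */
	dma_sync_single_for_cpu(dev, buf->dma, len, DMA_FROM_DEVICE);
	/* ... CPU reads the received data ... */
	dma_sync_single_for_device(dev, buf->dma, PAGE_SIZE,
				   DMA_FROM_DEVICE);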


On a heavily loaded box with iommu enabled, it seems that quite often
there is contention on the iova_lock.  Are there known issues in this
area?


I'll have to defer to the IOMMU maintainers, and for that you'll need
to say what code you are using.  Current mainline doesn't even have
an iova_lock anywhere.


Again I can't speak for non-mainstream stuff outside drivers/iommu, but it's
been over 4 years now since merging the initial scalability work for the
generic IOVA allocator there that focused on minimising lock contention, and
it's had considerable evaluation and tweaking since. But if we can achieve
the goal of efficiently recycling mapped buffers then we shouldn't need to
go anywhere near IOVA allocation either way except when expanding the pool.



I'm running a set of patches which uses the page pool to try and keep
all the RX buffers mapped as the skb goes up the stack, returning the
pages to the pool when the skb is freed.

On a dual-socket 12-core Intel machine (48 processors), and 256G of
memory, when iommu is enabled, I see the following from 'perf top -U',
as the hottest function being run:

-   43.42%  worker  [k] queued_spin_lock_slowpath
 - 43.42% queued_spin_lock_slowpath
- 41.69% _raw_spin_lock_irqsave
   + 41.39% alloc_iova
   + 0.28% iova_magazine_free_pfns
+ 1.07% lock_sock_nested

Which likely is heavy contention on the iovad->iova_rbtree_lock.
(This is on a 5.6 based system, BTW).  More scripts and data are below.
Is there a way to reduce the contention here?


Hmm, how big are your DMA mappings? If you're still hitting the rbtree a
lot, that most likely implies that either you're making giant IOVA
allocations that are too big to be cached, or you're allocating/freeing
IOVAs in a pathological pattern that defeats the whole magazine cache
mechanism (It's optimised for relatively-balanced allocation and freeing of
sizes up order 6). On a further hunch, does the "intel_iommu=forcedac"
option make any difference at all?


The allocations are only 4K in size (packet memory) but there are a lot
of them.  I tried running with "forcedac" over the weekend, and that
seems to have made a huge difference.  Why the name of 'forcedac'?  It
would seem this should be the default for a system with a sane 64-bit PCI
device.


OK, I think I can imagine what's happening there - I could elaborate if 
you're particularly interested, but suffice to say that despite all the 
tricks we've put in to avoid spending unnecessary time down that path 
wherever possible, there are still ways the allocator could behave 
pathologically once the low end below 32 bits gets (nearly) full. As far 
as that command-line option goes, I just happened to notice that it 
already existed and has the same effect as the 

Re: IOASID set token

2020-07-07 Thread Jacob Pan
Hi Jean,

All other points agreed.

On Tue, 7 Jul 2020 12:07:18 +0200
Jean-Philippe Brucker  wrote:

> > > For the moment, though, we could actually specialize the IOASID
> > > API to only take an mm_struct as token.  
> > That would be fine with VT-d. We can use init_mm for host PASID set,
> > process mm for VM set.  
> 
> I'm not fond of using init_mm for the host PASID set. Does it need a
> token at all?
No need to use init_mm at all, probably can do without a token as well.
Just need to allocate a set for the native usage. I will give it a try.

Thanks,

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/4] dma-mapping: add a new dma_need_sync API

2020-07-07 Thread Christoph Hellwig
On Tue, Jul 07, 2020 at 08:11:09AM -0700, Jonathan Lemon wrote:
> > You need to check every mapping.  E.g. this API pairs with a
> > dma_map_single/page call.  For S/G mappings you'd need to call it for
> > each entry, although if you have a use case for that we really should
> > add a dma_sg_need_sync helper instead of open coding the scatterlist walk.
> 
> My use case is setting up a pinned memory area, and caching the dma
> mappings.  I'd like to bypass storing the DMA addresses if they aren't
> needed.  For example:
> 
> setup()
> {
> if (dma_need_sync(dev, addr, len)) {
> kvmalloc_array(...);
> cache_dma_mappings(...);
> }
> }
> 
> dev_get_dma(page)
> {
> if (!cache)
> return page_to_phys(page);
> 
> return dma_cache_lookup(...);
> }
> 
> 
> The reason for doing it this way is that the page in question may be
> backed by either system memory, or device memory such as a GPU.  For the
> latter, the GPU provides a table of DMA addresses where data may be
> accessed, so I'm unable to use the dma_map_page() API.

dma_need_sync doesn't tell you if the unmap needs the dma_addr_t.
I've been thinking about replacing CONFIG_NEED_DMA_MAP_STATE with a runtime
check for a while, which would give you exactly what you need.  For now it
isn't very useful as there are very few configs left that do not have
CONFIG_NEED_DMA_MAP_STATE set.
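
For reference, a rough sketch of the per-entry check described above,
as a plain scatterlist walk (since no dma_sg_need_sync() helper exists
yet):

	static bool sgt_needs_sync(struct device *dev, struct sg_table *sgt)
	{
		struct scatterlist *sg;
		int i;

		/* check each DMA-mapped entry of the table */
		for_each_sg(sgt->sgl, sg, sgt->nents, i)
			if (dma_need_sync(dev, sg_dma_address(sg)))
				return true;

		return false;
	}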
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/4] dma-mapping: add a new dma_need_sync API

2020-07-07 Thread Jonathan Lemon
On Tue, Jul 07, 2020 at 08:47:30AM +0200, Christoph Hellwig wrote:
> On Mon, Jul 06, 2020 at 12:42:27PM -0700, Jonathan Lemon wrote:
> > On Mon, Jun 29, 2020 at 03:03:56PM +0200, Christoph Hellwig wrote:
> > > Add a new API to check if calls to dma_sync_single_for_{device,cpu} are
> > > required for a given DMA streaming mapping.
> > > 
> > > +::
> > > +
> > > + bool
> > > + dma_need_sync(struct device *dev, dma_addr_t dma_addr);
> > > +
> > > +Returns %true if dma_sync_single_for_{device,cpu} calls are required to
> > > +transfer memory ownership.  Returns %false if those calls can be skipped.
> > 
> > Hi Christoph -
> > 
> > Thie call above is for a specific dma_addr.  For correctness, would I
> > need to check every addr, or can I assume that for a specific memory
> > type (pages returned from malloc), that the answer would be identical?
> 
> You need to check every mapping.  E.g. this API pairs with a
> dma_map_single/page call.  For S/G mappings you'd need to call it for
> each entry, although if you have a use case for that we really should
> add a dma_sg_need_sync helper instead of open coding the scatterlist walk.

My use case is setting up a pinned memory area, and caching the dma
mappings.  I'd like to bypass storing the DMA addresses if they aren't
needed.  For example:

setup()
{
if (dma_need_sync(dev, addr, len)) {
kvmalloc_array(...);
cache_dma_mappings(...);
}
}

dev_get_dma(page)
{
if (!cache)
return page_to_phys(page);

return dma_cache_lookup(...);
}


The reason for doing it this way is that the page in question may be
backed by either system memory, or device memory such as a GPU.  For the
latter, the GPU provides a table of DMA addresses where data may be
accessed, so I'm unable to use the dma_map_page() API.
-- 
Jonathan
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [Freedreno] [PATCH v2 1/6] iommu/arm-smmu: Add auxiliary domain support for arm-smmuv2

2020-07-07 Thread Rob Clark
On Tue, Jul 7, 2020 at 5:34 AM Robin Murphy  wrote:
>
> On 2020-06-26 21:04, Jordan Crouse wrote:
> > Support auxiliary domains for arm-smmu-v2 to initialize and support
> > multiple pagetables for a single SMMU context bank. Since the smmu-v2
> > hardware doesn't have any built in support for switching the pagetable
> > base it is left as an exercise to the caller to actually use the pagetable.
>
> Hmm, I've still been thinking that we could model this as supporting
> exactly 1 aux domain iff the device is currently attached to a primary
> domain with TTBR1 enabled. Then supporting multiple aux domains with
> magic TTBR0 switching is the Adreno-specific extension on top of that.
>
> And if we don't want to go to that length, then - as I think Will was
> getting at - I'm not sure it's worth bothering at all. There doesn't
> seem to be any point in half-implementing a pretend aux domain interface
> while still driving a bus through the rest of the abstractions - it's
> really the worst of both worlds. If we're going to hand over the guts of
> io-pgtable to the GPU driver then couldn't it just use
> DOMAIN_ATTR_PGTABLE_CFG bidirectionally to inject a TTBR0 table straight
> into the TTBR1-ified domain?

So, something along the lines of:

1) qcom_adreno_smmu_impl somehow tells core arm-smmu that we want
   to use TTBR1 instead of TTBR0

2) gpu driver uses iommu_domain_get_attr(PGTABLE_CFG) to snapshot
   the initial pgtable cfg.  (Btw, I kinda feel like we should add
   io_pgtable_fmt to io_pgtable_cfg to make it self contained.)

3) gpu driver constructs pgtable_ops for TTBR0, and then kicks
   arm-smmu to do the initial setup to enable TTBR0 with
   iommu_domain_set_attr(PGTABLE_CFG, _pgtable_cfg)

if I understood you properly, that sounds simpler.
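
Roughly like this, as a sketch (DOMAIN_ATTR_PGTABLE_CFG is the name
floated in this thread, not a merged API, and error handling is
omitted):

	struct io_pgtable_cfg cfg;
	struct io_pgtable_ops *pgtbl_ops;

	/* 2) snapshot the TTBR1 config that arm-smmu set up */
	iommu_domain_get_attr(domain, DOMAIN_ATTR_PGTABLE_CFG, &cfg);

	/* 3) build our own TTBR0 pagetable from that config ... */
	pgtbl_ops = alloc_io_pgtable_ops(ARM_64_LPAE_S1, &cfg, priv);

	/* ... and kick arm-smmu to enable the TTBR0 region with it */
	iommu_domain_set_attr(domain, DOMAIN_ATTR_PGTABLE_CFG, &cfg);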

> Much as I like the idea of the aux domain abstraction and making this
> fit semi-transparently into the IOMMU API, if an almost entirely private
> interface will be the simplest and cleanest way to get it done then at
> this point also I'm starting to lean towards just getting it done. But
> if some other mediated-device type case then turns up that doesn't quite
> fit that private interface, we revisit the proper abstraction again and
> I reserve the right to say "I told you so" ;)

I'm on board with not trying to design this too generically until
there is a second user

BR,
-R


>
> Robin.
>
> > Aux domains are supported if split pagetable (TTBR1) support has been
> > enabled on the master domain.  Each auxiliary domain will reuse the
> > configuration of the master domain. By default a domain with TTBR1
> > support will have the TTBR0 region disabled, so the first attached aux
> > domain will enable the TTBR0 region in the hardware and conversely the
> > last domain to be detached will disable TTBR0 translations.  All subsequent
> > auxiliary domains create a pagetable but do not touch the hardware.
> >
> > The leaf driver will be able to query the physical address of the
> > pagetable with the DOMAIN_ATTR_PTBASE attribute so that it can use the
> > address with whatever means it has to switch the pagetable base.
> >
> > Following is a pseudo code example of how a domain can be created
> >
> >   /* Check to see if aux domains are supported */
> >   if (iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)) {
> >domain = iommu_domain_alloc(...);
> >
> >if (iommu_aux_attach_device(domain, dev))
> >return FAIL;
> >
> >   /* Save the base address of the pagetable for use by the driver */
> >   iommu_domain_get_attr(domain, DOMAIN_ATTR_PTBASE, &ptbase);
> >   }
> >
> > Then 'domain' can be used like any other iommu domain to map and
> > unmap iova addresses in the pagetable.
> >
> > Signed-off-by: Jordan Crouse 
> > ---
> >
> >   drivers/iommu/arm-smmu.c | 219 ---
> >   drivers/iommu/arm-smmu.h |   1 +
> >   2 files changed, 204 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> > index 060139452c54..ce6d654301bf 100644
> > --- a/drivers/iommu/arm-smmu.c
> > +++ b/drivers/iommu/arm-smmu.c
> > @@ -91,6 +91,7 @@ struct arm_smmu_cb {
> >   u32 tcr[2];
> >   u32 mair[2];
> >   struct arm_smmu_cfg *cfg;
> > + atomic_t aux;
> >   };
> >
> >   struct arm_smmu_master_cfg {
> > @@ -667,6 +668,86 @@ static void arm_smmu_write_context_bank(struct 
> > arm_smmu_device *smmu, int idx)
> >   arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, reg);
> >   }
> >
> > +/*
> > + * Update the context context bank to enable TTBR0. Assumes AARCH64 S1
> > + * configuration.
> > + */
> > +static void arm_smmu_context_set_ttbr0(struct arm_smmu_cb *cb,
> > + struct io_pgtable_cfg *pgtbl_cfg)
> > +{
> > + u32 tcr = cb->tcr[0];
> > +
> > + /* Add the TCR configuration from the new pagetable config */
> > + tcr |= arm_smmu_lpae_tcr(pgtbl_cfg);
> > +

Re: [PATCH v7 08/36] drm: exynos: fix common struct sg_table related issues

2020-07-07 Thread Andrzej Hajda


On 07.07.2020 11:40, Andrzej Hajda wrote:
> On 19.06.2020 12:36, Marek Szyprowski wrote:
>> The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
>> returns the number of the created entries in the DMA address space.
>> However the subsequent calls to the dma_sync_sg_for_{device,cpu}() and
>> dma_unmap_sg must be called with the original number of the entries
>> passed to the dma_map_sg().
>>
>> struct sg_table is a common structure used for describing a non-contiguous
>> memory buffer, used commonly in the DRM and graphics subsystems. It
>> consists of a scatterlist with memory pages and DMA addresses (sgl entry),
>> as well as the number of scatterlist entries: CPU pages (orig_nents entry)
>> and DMA mapped pages (nents entry).
>>
>> It turned out that it was a common mistake to misuse nents and orig_nents
>> entries, calling DMA-mapping functions with a wrong number of entries or
>> ignoring the number of mapped entries returned by the dma_map_sg()
>> function.
>>
>> To avoid such issues, lets use a common dma-mapping wrappers operating
>> directly on the struct sg_table objects and use scatterlist page
>> iterators where possible. This, almost always, hides references to the
>> nents and orig_nents entries, making the code robust, easier to follow
>> and copy/paste safe.
>>
>> Signed-off-by: Marek Szyprowski 
>

Just fixing my signature :)

Reviewed-by: Andrzej Hajda 

Regards
Andrzej

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v7 07/36] drm: exynos: use common helper for a scatterlist contiguity check

2020-07-07 Thread Andrzej Hajda


On 07.07.2020 11:35, Andrzej Hajda wrote:
> Hi,
>
> On 19.06.2020 12:36, Marek Szyprowski wrote:
>> Use common helper for checking the contiguity of the imported dma-buf.
>>
>> Signed-off-by: Marek Szyprowski 

Just fixing my signature :)

Reviewed-by: Andrzej Hajda 

Regards
Andrzej

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v7 01/36] drm: prime: add common helper to check scatterlist contiguity

2020-07-07 Thread Andrzej Hajda


On 07.07.2020 16:30, Andrzej Hajda wrote:
> On 19.06.2020 12:36, Marek Szyprowski wrote:
>> It is a common operation done by DRM drivers to check the contiguity
>> of the DMA-mapped buffer described by a scatterlist in the
>> sg_table object. Let's add a common helper for this operation.
>>
>> Signed-off-by: Marek Szyprowski 
>> ---

Just fixing my signature :)

Reviewed-by: Andrzej Hajda 

Regards
Andrzej

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v7 03/36] drm: core: fix common struct sg_table related issues

2020-07-07 Thread Andrzej Hajda


On 19.06.2020 12:36, Marek Szyprowski wrote:
> The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
> returns the number of the created entries in the DMA address space.
> However the subsequent calls to the dma_sync_sg_for_{device,cpu}() and
> dma_unmap_sg must be called with the original number of the entries
> passed to the dma_map_sg().
>
> struct sg_table is a common structure used for describing a non-contiguous
> memory buffer, used commonly in the DRM and graphics subsystems. It
> consists of a scatterlist with memory pages and DMA addresses (sgl entry),
> as well as the number of scatterlist entries: CPU pages (orig_nents entry)
> and DMA mapped pages (nents entry).
>
> It turned out that it was a common mistake to misuse nents and orig_nents
> entries, calling DMA-mapping functions with a wrong number of entries or
> ignoring the number of mapped entries returned by the dma_map_sg()
> function.
>
> To avoid such issues, lets use a common dma-mapping wrappers operating
> directly on the struct sg_table objects and use scatterlist page
> iterators where possible. This, almost always, hides references to the
> nents and orig_nents entries, making the code robust, easier to follow
> and copy/paste safe.
>
> Signed-off-by: Marek Szyprowski 


I guess the whole patchset can go via drm-misc, after r-b/a-b.


Reviewed-by: Andrzej Hajda 


Regards
Andrzej
> ---
>   drivers/gpu/drm/drm_cache.c|  2 +-
>   drivers/gpu/drm/drm_gem_shmem_helper.c | 14 +-
>   drivers/gpu/drm/drm_prime.c| 11 ++-
>   3 files changed, 16 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
> index 03e01b000f7a..0fe3c496002a 100644
> --- a/drivers/gpu/drm/drm_cache.c
> +++ b/drivers/gpu/drm/drm_cache.c
> @@ -127,7 +127,7 @@ drm_clflush_sg(struct sg_table *st)
>   struct sg_page_iter sg_iter;
>   
>   mb(); /*CLFLUSH is ordered only by using memory barriers*/
> - for_each_sg_page(st->sgl, _iter, st->nents, 0)
> + for_each_sgtable_page(st, _iter, 0)
>   drm_clflush_page(sg_page_iter_page(_iter));
>   mb(); /*Make sure that all cache line entry is flushed*/
>   
> diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c 
> b/drivers/gpu/drm/drm_gem_shmem_helper.c
> index 4b7cfbac4daa..47d8211221f2 100644
> --- a/drivers/gpu/drm/drm_gem_shmem_helper.c
> +++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
> @@ -126,8 +126,8 @@ void drm_gem_shmem_free_object(struct drm_gem_object *obj)
>   drm_prime_gem_destroy(obj, shmem->sgt);
>   } else {
>   if (shmem->sgt) {
> - dma_unmap_sg(obj->dev->dev, shmem->sgt->sgl,
> -  shmem->sgt->nents, DMA_BIDIRECTIONAL);
> + dma_unmap_sgtable(obj->dev->dev, shmem->sgt,
> +   DMA_BIDIRECTIONAL, 0);
>   sg_free_table(shmem->sgt);
>   kfree(shmem->sgt);
>   }
> @@ -424,8 +424,7 @@ void drm_gem_shmem_purge_locked(struct drm_gem_object 
> *obj)
>   
>   WARN_ON(!drm_gem_shmem_is_purgeable(shmem));
>   
> - dma_unmap_sg(obj->dev->dev, shmem->sgt->sgl,
> -  shmem->sgt->nents, DMA_BIDIRECTIONAL);
> + dma_unmap_sgtable(obj->dev->dev, shmem->sgt, DMA_BIDIRECTIONAL, 0);
>   sg_free_table(shmem->sgt);
>   kfree(shmem->sgt);
>   shmem->sgt = NULL;
> @@ -697,12 +696,17 @@ struct sg_table *drm_gem_shmem_get_pages_sgt(struct 
> drm_gem_object *obj)
>   goto err_put_pages;
>   }
>   /* Map the pages for use by the h/w. */
> - dma_map_sg(obj->dev->dev, sgt->sgl, sgt->nents, DMA_BIDIRECTIONAL);
> + ret = dma_map_sgtable(obj->dev->dev, sgt, DMA_BIDIRECTIONAL, 0);
> + if (ret)
> + goto err_free_sgt;
>   
>   shmem->sgt = sgt;
>   
>   return sgt;
>   
> +err_free_sgt:
> + sg_free_table(sgt);
> + kfree(sgt);
>   err_put_pages:
>   drm_gem_shmem_put_pages(shmem);
>   return ERR_PTR(ret);
> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
> index b717e52e909e..d583d6545666 100644
> --- a/drivers/gpu/drm/drm_prime.c
> +++ b/drivers/gpu/drm/drm_prime.c
> @@ -617,6 +617,7 @@ struct sg_table *drm_gem_map_dma_buf(struct 
> dma_buf_attachment *attach,
>   {
>   struct drm_gem_object *obj = attach->dmabuf->priv;
>   struct sg_table *sgt;
> + int ret;
>   
>   if (WARN_ON(dir == DMA_NONE))
>   return ERR_PTR(-EINVAL);
> @@ -626,11 +627,12 @@ struct sg_table *drm_gem_map_dma_buf(struct 
> dma_buf_attachment *attach,
>   else
>   sgt = obj->dev->driver->gem_prime_get_sg_table(obj);
>   
> - if (!dma_map_sg_attrs(attach->dev, sgt->sgl, sgt->nents, dir,
> -   DMA_ATTR_SKIP_CPU_SYNC)) {
> + ret = dma_map_sgtable(attach->dev, sgt, dir,
> +   

Re: [Freedreno] [PATCH v2 2/6] iommu/io-pgtable: Allow a pgtable implementation to skip TLB operations

2020-07-07 Thread Rob Clark
On Tue, Jul 7, 2020 at 7:25 AM Rob Clark  wrote:
>
> On Tue, Jul 7, 2020 at 4:34 AM Robin Murphy  wrote:
> >
> > On 2020-06-26 21:04, Jordan Crouse wrote:
> > > Allow an io-pgtable implementation to skip TLB operations by checking for
> > > NULL pointers in the helper functions. It will be up to the owner
> > > of the io-pgtable instance to make sure that they independently handle
> > > the TLB correctly.
> >
> > I don't really understand what this is for - tricking the IOMMU driver
> > into not performing its TLB maintenance at points when that maintenance
> > has been deemed necessary doesn't seem like the appropriate way to
> > achieve anything good :/
>
> No, for triggering the io-pgtable helpers into not performing TLB
> maintenance.  But seriously, since we are creating pgtables ourselves,
> and we don't want to be ioremap'ing the GPU's SMMU instance, the
> alternative is plugging in no-op helpers.  Which amounts to the same
> thing.

Hmm, that said, since we are just memcpy'ing the io_pgtable_cfg from
arm-smmu, it will already be populated with arm-smmu's fxn ptrs.  I
guess we could maybe make it work without no-op helpers, although in
that case it looks like we need to fix something about aux-domain vs
tlb helpers:

[  +0.004373] Unable to handle kernel NULL pointer dereference at
virtual address 0019
[  +0.004086] Mem abort info:
[  +0.004319]   ESR = 0x9604
[  +0.003462]   EC = 0x25: DABT (current EL), IL = 32 bits
[  +0.003494]   SET = 0, FnV = 0
[  +0.002812]   EA = 0, S1PTW = 0
[  +0.002873] Data abort info:
[  +0.003031]   ISV = 0, ISS = 0x0004
[  +0.003785]   CM = 0, WnR = 0
[  +0.003641] user pgtable: 4k pages, 48-bit VAs, pgdp=000261d65000
[  +0.003383] [0019] pgd=, p4d=
[  +0.003715] Internal error: Oops: 9604 [#1] PREEMPT SMP
[  +0.002744] Modules linked in: xt_CHECKSUM xt_MASQUERADE
xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle
ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack
nf_defrag_ipv4 libcrc32c bridge stp llc ip6table_filter ip6_tables
iptable_filter ax88179_178a usbnet uvcvideo videobuf2_vmalloc
videobuf2_memops videobuf2_v4l2 videobuf2_common videodev mc
hid_multitouch i2c_hid some_battery ti_sn65dsi86 hci_uart btqca btbcm
qcom_spmi_adc5 bluetooth qcom_spmi_temp_alarm qcom_vadc_common
ecdh_generic ecc snd_soc_sdm845 snd_soc_rt5663 snd_soc_qcom_common
ath10k_snoc ath10k_core crct10dif_ce ath mac80211 snd_soc_rl6231
soundwire_bus i2c_qcom_geni libarc4 qcom_rng msm phy_qcom_qusb2
reset_qcom_pdc drm_kms_helper cfg80211 rfkill qcom_q6v5_mss
qcom_q6v5_ipa_notify socinfo qrtr ns panel_simple qcom_q6v5_pas
qcom_common qcom_glink_smem slim_qcom_ngd_ctrl qcom_sysmon drm
qcom_q6v5 slimbus qmi_helpers qcom_wdt mdt_loader rmtfs_mem be2iscsi
bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio
[  +0.000139]  libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp
libiscsi_tcp libiscsi scsi_transport_iscsi fuse ip_tables x_tables
ipv6 nf_defrag_ipv6
[  +0.020933] CPU: 3 PID: 168 Comm: kworker/u16:7 Not tainted
5.8.0-rc1-c630+ #31
[  +0.003828] Hardware name: LENOVO 81JL/LNVNB161216, BIOS
9UCN33WW(V2.06) 06/ 4/2019
[  +0.004039] Workqueue: msm msm_gem_free_work [msm]
[  +0.003885] pstate: 60c5 (nZCv daif +PAN +UAO BTYPE=--)
[  +0.003859] pc : arm_smmu_tlb_inv_range_s1+0x30/0x148
[  +0.003742] lr : arm_smmu_tlb_add_page_s1+0x1c/0x28
[  +0.003887] sp : 800011cdb970
[  +0.003868] x29: 800011cdb970 x28: 0003
[  +0.003930] x27: 0001f1882f80 x26: 0001
[  +0.003886] x25: 0003 x24: 0620
[  +0.003932] x23:  x22: 1000
[  +0.003886] x21: 1000 x20: 0001cf857300
[  +0.003916] x19: 0001 x18: 
[  +0.003921] x17: d9e6a24ae0e8 x16: 00012577
[  +0.003843] x15: 00012578 x14: 
[  +0.003884] x13: 00012574 x12: d9e6a2550180
[  +0.003834] x11: 00083f80 x10: 
[  +0.003889] x9 :  x8 : 0001f1882f80
[  +0.003812] x7 : 0001 x6 : 0048
[  +0.003807] x5 : 0001c86e1000 x4 : 0620
[  +0.003802] x3 : 0001ddb57700 x2 : 1000
[  +0.003809] x1 : 1000 x0 : 000101048000
[  +0.003768] Call trace:
[  +0.003665]  arm_smmu_tlb_inv_range_s1+0x30/0x148
[  +0.003769]  arm_smmu_tlb_add_page_s1+0x1c/0x28
[  +0.003760]  __arm_lpae_unmap+0x3c4/0x498
[  +0.003821]  __arm_lpae_unmap+0xfc/0x498
[  +0.003693]  __arm_lpae_unmap+0xfc/0x498
[  +0.003704]  __arm_lpae_unmap+0xfc/0x498
[  +0.003608]  arm_lpae_unmap+0x60/0x78
[  +0.003653]  msm_iommu_pagetable_unmap+0x5c/0xa0 [msm]
[  +0.003711]  msm_gem_purge_vma+0x48/0x70 [msm]
[  +0.003716]  put_iova+0x68/0xc8 [msm]
[  +0.003792]  msm_gem_free_work+0x118/0x190 [msm]
[  +0.003739]  process_one_work+0x28c/0x6e8
[  +0.003595]  worker_thread+0x4c/0x420
[  +0.003546]  kthread+0x148/0x168
[  +0.003675]  

Re: [PATCH 1/4] dma-mapping: move the remaining DMA API calls out of line

2020-07-07 Thread Christoph Hellwig
On Tue, Jul 07, 2020 at 10:43:10AM +1000, Alexey Kardashevskiy wrote:
> Any luck there? I'd really like to cross this off my todo list :) Thanks,

We had another incident with new net code poking into dma internals
blocking this series.  That is now sorted out, so the series is back
on track.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v7 02/36] drm: prime: use sgtable iterators in drm_prime_sg_to_page_addr_arrays()

2020-07-07 Thread Andrzej Hajda


On 19.06.2020 12:36, Marek Szyprowski wrote:
> Replace the current hand-crafted code for extracting pages and DMA
> addresses from the given scatterlist by the much more robust
> code based on the generic scatterlist iterators and recently
> introduced sg_table-based wrappers. The resulting code is simple and
> easy to understand, so the comment describing the old code is no
> longer needed.
>
> Signed-off-by: Marek Szyprowski 


Nice simplification.

Reviewed-by: Andrzej Hajda 

Btw, I've wrongly re-configured my e-mail client, so my R-Bs for other 
patches are a little bit broken; I will resend them :)


Regards
Andrzej


> ---
>   drivers/gpu/drm/drm_prime.c | 49 -
>   1 file changed, 15 insertions(+), 34 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
> index 226cd6ad3985..b717e52e909e 100644
> --- a/drivers/gpu/drm/drm_prime.c
> +++ b/drivers/gpu/drm/drm_prime.c
> @@ -990,45 +990,26 @@ EXPORT_SYMBOL(drm_gem_prime_import);
>   int drm_prime_sg_to_page_addr_arrays(struct sg_table *sgt, struct page 
> **pages,
>dma_addr_t *addrs, int max_entries)
>   {
> - unsigned count;
> - struct scatterlist *sg;
> - struct page *page;
> - u32 page_len, page_index;
> - dma_addr_t addr;
> - u32 dma_len, dma_index;
> -
> - /*
> -  * Scatterlist elements contains both pages and DMA addresses, but
> -  * one shoud not assume 1:1 relation between them. The sg->length is
> -  * the size of the physical memory chunk described by the sg->page,
> -  * while sg_dma_len(sg) is the size of the DMA (IO virtual) chunk
> -  * described by the sg_dma_address(sg).
> -  */
> - page_index = 0;
> - dma_index = 0;
> - for_each_sg(sgt->sgl, sg, sgt->nents, count) {
> - page_len = sg->length;
> - page = sg_page(sg);
> - dma_len = sg_dma_len(sg);
> - addr = sg_dma_address(sg);
> -
> - while (pages && page_len > 0) {
> - if (WARN_ON(page_index >= max_entries))
> + struct sg_dma_page_iter dma_iter;
> + struct sg_page_iter page_iter;
> + struct page **p = pages;
> + dma_addr_t *a = addrs;
> +
> + if (pages) {
> + for_each_sgtable_page(sgt, _iter, 0) {
> + if (p - pages >= max_entries)
>   return -1;
> - pages[page_index] = page;
> - page++;
> - page_len -= PAGE_SIZE;
> - page_index++;
> + *p++ = sg_page_iter_page(_iter);
>   }
> - while (addrs && dma_len > 0) {
> - if (WARN_ON(dma_index >= max_entries))
> + }
> + if (addrs) {
> + for_each_sgtable_dma_page(sgt, _iter, 0) {
> + if (a - addrs >= max_entries)
>   return -1;
> - addrs[dma_index] = addr;
> - addr += PAGE_SIZE;
> - dma_len -= PAGE_SIZE;
> - dma_index++;
> + *a++ = sg_page_iter_dma_address(_iter);
>   }
>   }
> +
>   return 0;
>   }
>   EXPORT_SYMBOL(drm_prime_sg_to_page_addr_arrays);
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [Freedreno] [PATCH v2 4/6] drm/msm: Add support to create a local pagetable

2020-07-07 Thread Rob Clark
On Tue, Jul 7, 2020 at 4:36 AM Robin Murphy  wrote:
>
> On 2020-06-26 21:04, Jordan Crouse wrote:
> > Add support to create a io-pgtable for use by targets that support
> > per-instance pagetables.  In order to support per-instance pagetables the
> > GPU SMMU device needs to have the qcom,adreno-smmu compatible string and
> > split pagetables and auxiliary domains need to be supported and enabled.
> >
> > Signed-off-by: Jordan Crouse 
> > ---
> >
> >   drivers/gpu/drm/msm/msm_gpummu.c |   2 +-
> >   drivers/gpu/drm/msm/msm_iommu.c  | 180 ++-
> >   drivers/gpu/drm/msm/msm_mmu.h|  16 ++-
> >   3 files changed, 195 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/msm/msm_gpummu.c 
> > b/drivers/gpu/drm/msm/msm_gpummu.c
> > index 310a31b05faa..aab121f4beb7 100644
> > --- a/drivers/gpu/drm/msm/msm_gpummu.c
> > +++ b/drivers/gpu/drm/msm/msm_gpummu.c
> > @@ -102,7 +102,7 @@ struct msm_mmu *msm_gpummu_new(struct device *dev, 
> > struct msm_gpu *gpu)
> >   }
> >
> >   gpummu->gpu = gpu;
> > - msm_mmu_init(>base, dev, );
> > + msm_mmu_init(>base, dev, , MSM_MMU_GPUMMU);
> >
> >   return >base;
> >   }
> > diff --git a/drivers/gpu/drm/msm/msm_iommu.c 
> > b/drivers/gpu/drm/msm/msm_iommu.c
> > index 1b6635504069..f455c597f76d 100644
> > --- a/drivers/gpu/drm/msm/msm_iommu.c
> > +++ b/drivers/gpu/drm/msm/msm_iommu.c
> > @@ -4,15 +4,192 @@
> >* Author: Rob Clark 
> >*/
> >
> > +#include 
> >   #include "msm_drv.h"
> >   #include "msm_mmu.h"
> >
> >   struct msm_iommu {
> >   struct msm_mmu base;
> >   struct iommu_domain *domain;
> > + struct iommu_domain *aux_domain;
> >   };
> > +
> >   #define to_msm_iommu(x) container_of(x, struct msm_iommu, base)
> >
> > +struct msm_iommu_pagetable {
> > + struct msm_mmu base;
> > + struct msm_mmu *parent;
> > + struct io_pgtable_ops *pgtbl_ops;
> > + phys_addr_t ttbr;
> > + u32 asid;
> > +};
> > +
> > +static struct msm_iommu_pagetable *to_pagetable(struct msm_mmu *mmu)
> > +{
> > + return container_of(mmu, struct msm_iommu_pagetable, base);
> > +}
> > +
> > +static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 iova,
> > + size_t size)
> > +{
> > + struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
> > + struct io_pgtable_ops *ops = pagetable->pgtbl_ops;
> > + size_t unmapped = 0;
> > +
> > + /* Unmap the block one page at a time */
> > + while (size) {
> > + unmapped += ops->unmap(ops, iova, 4096, NULL);
> > + iova += 4096;
> > + size -= 4096;
> > + }
> > +
> > + iommu_flush_tlb_all(to_msm_iommu(pagetable->parent)->domain);
> > +
> > + return (unmapped == size) ? 0 : -EINVAL;
> > +}
>
> Remember in patch #1 when you said "Then 'domain' can be used like any
> other iommu domain to map and unmap iova addresses in the pagetable."?
>
> This appears to be very much not that :/
>

I guess that comment is a bit stale.. the original plan was to create
an iommu_domain per set of pgtables, but at some point we realized
that by using the io-pgtable helpers directly, we would inflict a lot
less GPU-crazy on the iommu drivers

BR,
-R

> Robin.
>
> > +
> > +static int msm_iommu_pagetable_map(struct msm_mmu *mmu, u64 iova,
> > + struct sg_table *sgt, size_t len, int prot)
> > +{
> > + struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
> > + struct io_pgtable_ops *ops = pagetable->pgtbl_ops;
> > + struct scatterlist *sg;
> > + size_t mapped = 0;
> > + u64 addr = iova;
> > + unsigned int i;
> > +
> > + for_each_sg(sgt->sgl, sg, sgt->nents, i) {
> > + size_t size = sg->length;
> > + phys_addr_t phys = sg_phys(sg);
> > +
> > + /* Map the block one page at a time */
> > + while (size) {
> > + if (ops->map(ops, addr, phys, 4096, prot)) {
> > + msm_iommu_pagetable_unmap(mmu, iova, mapped);
> > + return -EINVAL;
> > + }
> > +
> > + phys += 4096;
> > + addr += 4096;
> > + size -= 4096;
> > + mapped += 4096;
> > + }
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static void msm_iommu_pagetable_destroy(struct msm_mmu *mmu)
> > +{
> > + struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
> > +
> > + free_io_pgtable_ops(pagetable->pgtbl_ops);
> > + kfree(pagetable);
> > +}
> > +
> > +/*
> > + * Given a parent device, create and return an aux domain. This will 
> > enable the
> > + * TTBR0 region
> > + */
> > +static struct iommu_domain *msm_iommu_get_aux_domain(struct msm_mmu 
> > *parent)
> > +{
> > + struct msm_iommu *iommu = to_msm_iommu(parent);
> > + struct iommu_domain *domain;
> > + int ret;
> > +
> > + if (iommu->aux_domain)
> > + return iommu->aux_domain;

Re: [PATCH v7 01/36] drm: prime: add common helper to check scatterlist contiguity

2020-07-07 Thread Andrzej Hajda


On 19.06.2020 12:36, Marek Szyprowski wrote:
> It is a common operation done by DRM drivers to check the contiguity
> of the DMA-mapped buffer described by a scatterlist in the
> sg_table object. Let's add a common helper for this operation.
>
> Signed-off-by: Marek Szyprowski 
> ---
>   drivers/gpu/drm/drm_gem_cma_helper.c | 23 +++--
>   drivers/gpu/drm/drm_prime.c  | 31 
>   include/drm/drm_prime.h  |  2 ++
>   3 files changed, 36 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c 
> b/drivers/gpu/drm/drm_gem_cma_helper.c
> index 06a5b9ee1fe0..41566a15dabd 100644
> --- a/drivers/gpu/drm/drm_gem_cma_helper.c
> +++ b/drivers/gpu/drm/drm_gem_cma_helper.c
> @@ -471,26 +471,9 @@ drm_gem_cma_prime_import_sg_table(struct drm_device *dev,
>   {
>   struct drm_gem_cma_object *cma_obj;
>   
> - if (sgt->nents != 1) {
> - /* check if the entries in the sg_table are contiguous */
> - dma_addr_t next_addr = sg_dma_address(sgt->sgl);
> - struct scatterlist *s;
> - unsigned int i;
> -
> - for_each_sg(sgt->sgl, s, sgt->nents, i) {
> - /*
> -  * sg_dma_address(s) is only valid for entries
> -  * that have sg_dma_len(s) != 0
> -  */
> - if (!sg_dma_len(s))
> - continue;
> -
> - if (sg_dma_address(s) != next_addr)
> - return ERR_PTR(-EINVAL);
> -
> - next_addr = sg_dma_address(s) + sg_dma_len(s);
> - }
> - }
> + /* check if the entries in the sg_table are contiguous */
> + if (drm_prime_get_contiguous_size(sgt) < attach->dmabuf->size)
> + return ERR_PTR(-EINVAL);
>   
>   /* Create a CMA GEM buffer. */
>   cma_obj = __drm_gem_cma_create(dev, attach->dmabuf->size);
> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
> index bbfc713bfdc3..226cd6ad3985 100644
> --- a/drivers/gpu/drm/drm_prime.c
> +++ b/drivers/gpu/drm/drm_prime.c
> @@ -825,6 +825,37 @@ struct sg_table *drm_prime_pages_to_sg(struct page 
> **pages, unsigned int nr_page
>   }
>   EXPORT_SYMBOL(drm_prime_pages_to_sg);
>   
> +/**
> + * drm_prime_get_contiguous_size - returns the contiguous size of the buffer
> + * @sgt: sg_table describing the buffer to check
> + *
> + * This helper calculates the contiguous size in the DMA address space
> + * of the the buffer described by the provided sg_table.
> + *
> + * This is useful for implementing
> + * _gem_object_funcs.gem_prime_import_sg_table.
> + */
> +unsigned long drm_prime_get_contiguous_size(struct sg_table *sgt)
> +{
> + dma_addr_t expected = sg_dma_address(sgt->sgl);
> + struct scatterlist *sg;
> + unsigned long size = 0;
> + int i;
> +
> + for_each_sgtable_dma_sg(sgt, sg, i) {
> + unsigned int len = sg_dma_len(sg);
> +
> + if (!len)
> + break;


I wonder if in some dark corners of the kernel 0-length buffers can be 
in use :)


> + if (sg_dma_address(sg) != expected)
> + break;
> + expected += len;
> + size += len;
> + }
> + return size;
> +}
> +EXPORT_SYMBOL(drm_prime_get_contiguous_size);
> +
>   /**
>* drm_gem_prime_export - helper library implementation of the export 
> callback
>* @obj: GEM object to export
> diff --git a/include/drm/drm_prime.h b/include/drm/drm_prime.h
> index 9af7422b44cf..47ef11614627 100644
> --- a/include/drm/drm_prime.h
> +++ b/include/drm/drm_prime.h
> @@ -92,6 +92,8 @@ struct sg_table *drm_prime_pages_to_sg(struct page **pages, 
> unsigned int nr_page
>   struct dma_buf *drm_gem_prime_export(struct drm_gem_object *obj,
>int flags);
>   
> +unsigned long drm_prime_get_contiguous_size(struct sg_table *sgt);
> +

Reviewed-by: Andrzej Hajda 

Regards
Andrzej


>   /* helper functions for importing */
>   struct drm_gem_object *drm_gem_prime_import_dev(struct drm_device *dev,
>   struct dma_buf *dma_buf,
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [Freedreno] [PATCH v2 2/6] iommu/io-pgtable: Allow a pgtable implementation to skip TLB operations

2020-07-07 Thread Rob Clark
On Tue, Jul 7, 2020 at 4:34 AM Robin Murphy  wrote:
>
> On 2020-06-26 21:04, Jordan Crouse wrote:
> > Allow an io-pgtable implementation to skip TLB operations by checking for
> > NULL pointers in the helper functions. It will be up to the owner
> > of the io-pgtable instance to make sure that they independently handle
> > the TLB correctly.
>
> I don't really understand what this is for - tricking the IOMMU driver
> into not performing its TLB maintenance at points when that maintenance
> has been deemed necessary doesn't seem like the appropriate way to
> achieve anything good :/

No, for triggering the io-pgtable helpers into not performing TLB
maintenance.  But seriously, since we are creating pgtables ourselves,
and we don't want to be ioremap'ing the GPU's SMMU instance, the
alternative is plugging in no-op helpers.  Which amounts to the same
thing.
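
Something like this, as a sketch of the no-op alternative (names
hypothetical):

	static void msm_tlb_nop_flush_all(void *cookie)
	{
	}

	static void msm_tlb_nop_flush_range(unsigned long iova, size_t size,
					    size_t granule, void *cookie)
	{
	}

	static const struct iommu_flush_ops msm_nop_tlb_ops = {
		.tlb_flush_all	= msm_tlb_nop_flush_all,
		.tlb_flush_walk	= msm_tlb_nop_flush_range,
		.tlb_flush_leaf	= msm_tlb_nop_flush_range,
	};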

Currently (in a later patch in the series) we are using
iommu_flush_tlb_all() when unmapping, which is a bit of a big hammer.
Although I think we could be a bit more clever and do the TLB ops on
the GPU (since the GPU knows if pagetables we are unmapping from are
in-use and could skip the TLB ops otherwise).

On the topic, if we are using unique ASID values per set of
pagetables, how expensive is tlb invalidate for an ASID that has no
entries in the TLB?

BR,
-R

>
> Robin.
>
> > Signed-off-by: Jordan Crouse 
> > ---
> >
> >   include/linux/io-pgtable.h | 11 +++
> >   1 file changed, 7 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
> > index 53d53c6c2be9..bbed1d3925ba 100644
> > --- a/include/linux/io-pgtable.h
> > +++ b/include/linux/io-pgtable.h
> > @@ -210,21 +210,24 @@ struct io_pgtable {
> >
> >   static inline void io_pgtable_tlb_flush_all(struct io_pgtable *iop)
> >   {
> > - iop->cfg.tlb->tlb_flush_all(iop->cookie);
> > + if (iop->cfg.tlb)
> > + iop->cfg.tlb->tlb_flush_all(iop->cookie);
> >   }
> >
> >   static inline void
> >   io_pgtable_tlb_flush_walk(struct io_pgtable *iop, unsigned long iova,
> > size_t size, size_t granule)
> >   {
> > - iop->cfg.tlb->tlb_flush_walk(iova, size, granule, iop->cookie);
> > + if (iop->cfg.tlb)
> > + iop->cfg.tlb->tlb_flush_walk(iova, size, granule, 
> > iop->cookie);
> >   }
> >
> >   static inline void
> >   io_pgtable_tlb_flush_leaf(struct io_pgtable *iop, unsigned long iova,
> > size_t size, size_t granule)
> >   {
> > - iop->cfg.tlb->tlb_flush_leaf(iova, size, granule, iop->cookie);
> > + if (iop->cfg.tlb)
> > + iop->cfg.tlb->tlb_flush_leaf(iova, size, granule, 
> > iop->cookie);
> >   }
> >
> >   static inline void
> > @@ -232,7 +235,7 @@ io_pgtable_tlb_add_page(struct io_pgtable *iop,
> >   struct iommu_iotlb_gather * gather, unsigned long 
> > iova,
> >   size_t granule)
> >   {
> > - if (iop->cfg.tlb->tlb_add_page)
> > + if (iop->cfg.tlb && iop->cfg.tlb->tlb_add_page)
> >   iop->cfg.tlb->tlb_add_page(gather, iova, granule, 
> > iop->cookie);
> >   }
> >
> >
> ___
> Freedreno mailing list
> freedr...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/freedreno
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 4/7] iommu/vt-d: Handle non-page aligned address

2020-07-07 Thread kernel test robot
Hi Jacob,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on iommu/next]
[also build test WARNING on linux/master linus/master v5.8-rc4 next-20200707]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Jacob-Pan/iommu-vt-d-Misc-tweaks-and-fixes-for-vSVA/20200707-081026
base:   https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next
config: x86_64-randconfig-m031-20200707 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-14) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

   In file included from include/linux/string.h:6,
from include/linux/uuid.h:12,
from include/linux/mod_devicetable.h:13,
from include/linux/pci.h:27,
from drivers/iommu/intel/dmar.c:19:
   drivers/iommu/intel/dmar.c: In function 'qi_flush_dev_iotlb_pasid':
   include/linux/bits.h:26:28: warning: comparison of unsigned expression < 0 
is always false [-Wtype-limits]
  26 |   __builtin_constant_p((l) > (h)), (l) > (h), 0)))
 |^
   include/linux/compiler.h:58:52: note: in definition of macro '__trace_if_var'
  58 | #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : 
__trace_if_value(cond))
 |^~~~
>> drivers/iommu/intel/dmar.c:1459:2: note: in expansion of macro 'if'
1459 |  if (addr & GENMASK_ULL(size_order + VTD_PAGE_SHIFT, 0))
 |  ^~
   include/linux/bits.h:25:3: note: in expansion of macro 'BUILD_BUG_ON_ZERO'
  25 |  (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
 |   ^
   include/linux/bits.h:45:3: note: in expansion of macro 'GENMASK_INPUT_CHECK'
  45 |  (GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
 |   ^~~
   drivers/iommu/intel/dmar.c:1459:13: note: in expansion of macro 'GENMASK_ULL'
1459 |  if (addr & GENMASK_ULL(size_order + VTD_PAGE_SHIFT, 0))
 | ^~~
   include/linux/bits.h:26:40: warning: comparison of unsigned expression < 0 
is always false [-Wtype-limits]
  26 |   __builtin_constant_p((l) > (h)), (l) > (h), 0)))
 |^
   include/linux/compiler.h:58:52: note: in definition of macro '__trace_if_var'
  58 | #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : 
__trace_if_value(cond))
 |^~~~
>> drivers/iommu/intel/dmar.c:1459:2: note: in expansion of macro 'if'
1459 |  if (addr & GENMASK_ULL(size_order + VTD_PAGE_SHIFT, 0))
 |  ^~
   include/linux/bits.h:25:3: note: in expansion of macro 'BUILD_BUG_ON_ZERO'
  25 |  (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
 |   ^
   include/linux/bits.h:45:3: note: in expansion of macro 'GENMASK_INPUT_CHECK'
  45 |  (GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
 |   ^~~
   drivers/iommu/intel/dmar.c:1459:13: note: in expansion of macro 'GENMASK_ULL'
1459 |  if (addr & GENMASK_ULL(size_order + VTD_PAGE_SHIFT, 0))
 | ^~~
   include/linux/bits.h:26:28: warning: comparison of unsigned expression < 0 
is always false [-Wtype-limits]
  26 |   __builtin_constant_p((l) > (h)), (l) > (h), 0)))
 |^
   include/linux/compiler.h:58:61: note: in definition of macro '__trace_if_var'
  58 | #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : 
__trace_if_value(cond))
 | ^~~~
>> drivers/iommu/intel/dmar.c:1459:2: note: in expansion of macro 'if'
1459 |  if (addr & GENMASK_ULL(size_order + VTD_PAGE_SHIFT, 0))
 |  ^~
   include/linux/bits.h:25:3: note: in expansion of macro 'BUILD_BUG_ON_ZERO'
  25 |  (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
 |   ^
   include/linux/bits.h:45:3: note: in expansion of macro 'GENMASK_INPUT_CHECK'
  45 |  (GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
 |   ^~~
   drivers/iommu/intel/dmar.c:1459:13: note: in expansion of macro 'GENMASK_ULL'
1459 |  if (addr & GENMASK_ULL(size_order + VTD_PAGE_SHIFT, 0))
 | ^~~
   include/linux/bits.h:26:40: warning: comparison of unsigned expression < 0 
is always false [-Wtype-limits]
  26 |   __builtin_constant_p((l) > (h)), (l) > (h), 0)))
 |^
   include/linux/compiler.h:58:61: note: in definition of macro '__trace_if_var'
  58 | #defin
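
For readers puzzling over the output: GENMASK_INPUT_CHECK() expands to a
comparison of the low bit against the high bit, and with l == 0 the compiler
sees a comparison that can never be true. A minimal standalone sketch of the
effect (simplified from include/linux/bits.h; not the kernel code itself):

	/* Simplified restatement of GENMASK's input check. */
	#define GENMASK_INPUT_CHECK(h, l)	((l) > (h))

	int main(void)
	{
		unsigned int size_order = 2;
		unsigned int h = size_order + 12;	/* VTD_PAGE_SHIFT is 12 */

		/* With l == 0, "(l) > (h)" is "0 > h", i.e. "h < 0": always
		 * false for an unsigned expression, hence -Wtype-limits. */
		if (GENMASK_INPUT_CHECK(h, 0))
			return 1;

		return 0;
	}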

Re: [PATCH v7 04/36] drm: amdgpu: fix common struct sg_table related issues

2020-07-07 Thread Marek Szyprowski
Hi Christian,

On 22.06.2020 15:27, Christian König wrote:
> Am 19.06.20 um 12:36 schrieb Marek Szyprowski:
>> The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() 
>> function
>> returns the number of the created entries in the DMA address space.
>> However the subsequent calls to the dma_sync_sg_for_{device,cpu}() and
>> dma_unmap_sg must be called with the original number of the entries
>> passed to the dma_map_sg().
>>
>> struct sg_table is a common structure used for describing a 
>> non-contiguous
>> memory buffer, used commonly in the DRM and graphics subsystems. It
>> consists of a scatterlist with memory pages and DMA addresses (sgl 
>> entry),
>> as well as the number of scatterlist entries: CPU pages (orig_nents 
>> entry)
>> and DMA mapped pages (nents entry).
>>
>> It turned out that it was a common mistake to misuse nents and 
>> orig_nents
>> entries, calling DMA-mapping functions with a wrong number of entries or
>> ignoring the number of mapped entries returned by the dma_map_sg()
>> function.
>>
>> To avoid such issues, lets use a common dma-mapping wrappers operating
>> directly on the struct sg_table objects and use scatterlist page
>> iterators where possible. This, almost always, hides references to the
>> nents and orig_nents entries, making the code robust, easier to follow
>> and copy/paste safe.
>>
>> Signed-off-by: Marek Szyprowski 
>> Reviewed-by: Christian König 
>
> Any objection that we pick this one and the radeon up into our 
> branches for upstreaming?
>
That should avoid clashes with other driver changes.

I'm fine with that. This one and radeon don't depend on the prime changes, so they 
should merge fine via your tree. I will try to ask for more review of 
the remaining patches and then try merging via drm-misc.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland
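
As background for the conversion, a minimal sketch of the calling convention
the series moves drivers to (illustrative driver code, not part of the patch;
dma_map_sgtable() and dma_unmap_sgtable() are the wrappers this series
introduces):

	#include <linux/dma-mapping.h>

	/* The legacy pattern is error-prone: dma_map_sg() returns the DMA
	 * entry count (nents), but the unmap/sync calls must be given the
	 * original CPU entry count (orig_nents). The wrappers take the
	 * whole sg_table and keep that bookkeeping internal. */
	static int example_map(struct device *dev, struct sg_table *sgt)
	{
		/* Returns 0 on success or a negative error code. */
		return dma_map_sgtable(dev, sgt, DMA_BIDIRECTIONAL, 0);
	}

	static void example_unmap(struct device *dev, struct sg_table *sgt)
	{
		dma_unmap_sgtable(dev, sgt, DMA_BIDIRECTIONAL, 0);
	}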


Re: [PATCH v2 1/6] iommu/arm-smmu: Add auxiliary domain support for arm-smmuv2

2020-07-07 Thread Robin Murphy

On 2020-06-26 21:04, Jordan Crouse wrote:

Support auxiliary domains for arm-smmu-v2 to initialize and support
multiple pagetables for a single SMMU context bank. Since the smmu-v2
hardware doesn't have any built in support for switching the pagetable
base it is left as an exercise to the caller to actually use the pagetable.


Hmm, I've still been thinking that we could model this as supporting 
exactly 1 aux domain iff the device is currently attached to a primary 
domain with TTBR1 enabled. Then supporting multiple aux domains with 
magic TTBR0 switching is the Adreno-specific extension on top of that.


And if we don't want to go to that length, then - as I think Will was 
getting at - I'm not sure it's worth bothering at all. There doesn't 
seem to be any point in half-implementing a pretend aux domain interface 
while still driving a bus through the rest of the abstractions - it's 
really the worst of both worlds. If we're going to hand over the guts of 
io-pgtable to the GPU driver then couldn't it just use 
DOMAIN_ATTR_PGTABLE_CFG bidirectionally to inject a TTBR0 table straight 
into the TTBR1-ified domain?


Much as I like the idea of the aux domain abstraction and making this 
fit semi-transparently into the IOMMU API, if an almost entirely private 
interface will be the simplest and cleanest way to get it done then at 
this point also I'm starting to lean towards just getting it done. But 
if some other mediated-device type case then turns up that doesn't quite 
fit that private interface, we revisit the proper abstraction again and 
I reserve the right to say "I told you so" ;)


Robin.


Aux domains are supported if split pagetable (TTBR1) support has been
enabled on the master domain.  Each auxiliary domain will reuse the
configuration of the master domain. By default a domain with TTBR1
support will have the TTBR0 region disabled so the first attached aux
domain will enable the TTBR0 region in the hardware and conversely the
last domain to be detached will disable TTBR0 translations.  All subsequent
auxiliary domains create a pagetable but do not touch the hardware.

The leaf driver will be able to query the physical address of the
pagetable with the DOMAIN_ATTR_PTBASE attribute so that it can use the
address with whatever means it has to switch the pagetable base.

Following is a pseudo code example of how a domain can be created

  /* Check to see if aux domains are supported */
  if (iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)) {
 iommu = iommu_domain_alloc(...);

 if (iommu_aux_attach_device(domain, dev))
 return FAIL;

/* Save the base address of the pagetable for use by the driver */
iommu_domain_get_attr(domain, DOMAIN_ATTR_PTBASE, &ptbase);
  }

Then 'domain' can be used like any other iommu domain to map and
unmap iova addresses in the pagetable.
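
For illustration, using it afterwards is plain IOMMU API usage (a sketch
assuming standard iommu_map()/iommu_unmap(); 'iova' and 'paddr' are
placeholders):

	ret = iommu_map(domain, iova, paddr, SZ_4K, IOMMU_READ | IOMMU_WRITE);
	if (ret)
		return ret;

	/* ... DMA runs with the hardware switched to this pagetable ... */

	iommu_unmap(domain, iova, SZ_4K);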

Signed-off-by: Jordan Crouse 
---

  drivers/iommu/arm-smmu.c | 219 ---
  drivers/iommu/arm-smmu.h |   1 +
  2 files changed, 204 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 060139452c54..ce6d654301bf 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -91,6 +91,7 @@ struct arm_smmu_cb {
u32 tcr[2];
u32 mair[2];
struct arm_smmu_cfg *cfg;
+   atomic_t aux;
  };
  
  struct arm_smmu_master_cfg {

@@ -667,6 +668,86 @@ static void arm_smmu_write_context_bank(struct 
arm_smmu_device *smmu, int idx)
arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, reg);
  }
  
+/*

+ * Update the context bank to enable TTBR0. Assumes AARCH64 S1
+ * configuration.
+ */
+static void arm_smmu_context_set_ttbr0(struct arm_smmu_cb *cb,
+   struct io_pgtable_cfg *pgtbl_cfg)
+{
+   u32 tcr = cb->tcr[0];
+
+   /* Add the TCR configuration from the new pagetable config */
+   tcr |= arm_smmu_lpae_tcr(pgtbl_cfg);
+
+   /* Make sure that both TTBR0 and TTBR1 are enabled */
+   tcr &= ~(ARM_SMMU_TCR_EPD0 | ARM_SMMU_TCR_EPD1);
+
+   /* Update the TCR register */
+   cb->tcr[0] = tcr;
+
+   /* Program the new TTBR0 */
+   cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
+   cb->ttbr[0] |= FIELD_PREP(ARM_SMMU_TTBRn_ASID, cb->cfg->asid);
+}
+
+/*
+ * This function assumes that the current model only allows aux domains for
+ * AARCH64 S1 configurations
+ */
+static int arm_smmu_aux_init_domain_context(struct iommu_domain *domain,
+   struct arm_smmu_device *smmu, struct arm_smmu_cfg *master)
+{
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct io_pgtable_ops *pgtbl_ops;
+   struct io_pgtable_cfg pgtbl_cfg;
+
+   mutex_lock(&smmu_domain->init_mutex);
+
+   /* Copy the configuration from the master */
+   memcpy(&smmu_domain->cfg, master, sizeof(smmu_domain->cfg));
+
+   

[PATCH] dma-pool: use single atomic pool for both DMA zones

2020-07-07 Thread Nicolas Saenz Julienne
When allocating atomic DMA memory for a device, the dma-pool core
queries __dma_direct_optimal_gfp_mask() to check which atomic pool to
use. It turns out the GFP flag returned is only an optimistic guess.
The pool selected might sometimes live in a zone higher than the
device's view of memory.

As there isn't a way to guarantee a mapping between a device's DMA
constraints and the correct GFP flags, this unifies both DMA atomic pools.
The resulting pool is allocated in the lowest DMA zone available, if any,
so that devices always get accessible memory, while the unconstrained ones
keep the flexibility of using dma_pool_kernel.

Fixes: c84dc6e68a1d ("dma-pool: add additional coherent pools to map to gfp 
mask")
Reported-by: Jeremy Linton 
Suggested-by: Robin Murphy 
Signed-off-by: Nicolas Saenz Julienne 
---
 kernel/dma/pool.c | 47 +++
 1 file changed, 19 insertions(+), 28 deletions(-)

diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 8cfa01243ed2..883f7a583969 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -13,10 +13,11 @@
 #include 
 #include 
 
+#define GFP_ATOMIC_POOL_DMA	(IS_ENABLED(CONFIG_ZONE_DMA) ? GFP_DMA : \
+				 IS_ENABLED(CONFIG_ZONE_DMA32) ? GFP_DMA32 : 0)
+
 static struct gen_pool *atomic_pool_dma __ro_after_init;
 static unsigned long pool_size_dma;
-static struct gen_pool *atomic_pool_dma32 __ro_after_init;
-static unsigned long pool_size_dma32;
 static struct gen_pool *atomic_pool_kernel __ro_after_init;
 static unsigned long pool_size_kernel;
 
@@ -42,16 +43,13 @@ static void __init dma_atomic_pool_debugfs_init(void)
return;
 
	debugfs_create_ulong("pool_size_dma", 0400, root, &pool_size_dma);
-   debugfs_create_ulong("pool_size_dma32", 0400, root, &pool_size_dma32);
	debugfs_create_ulong("pool_size_kernel", 0400, root, &pool_size_kernel);
 }
 
 static void dma_atomic_pool_size_add(gfp_t gfp, size_t size)
 {
-   if (gfp & __GFP_DMA)
+   if (gfp & GFP_ATOMIC_POOL_DMA)
pool_size_dma += size;
-   else if (gfp & __GFP_DMA32)
-   pool_size_dma32 += size;
else
pool_size_kernel += size;
 }
@@ -132,12 +130,11 @@ static void atomic_pool_resize(struct gen_pool *pool, 
gfp_t gfp)
 
 static void atomic_pool_work_fn(struct work_struct *work)
 {
-   if (IS_ENABLED(CONFIG_ZONE_DMA))
-   atomic_pool_resize(atomic_pool_dma,
-  GFP_KERNEL | GFP_DMA);
-   if (IS_ENABLED(CONFIG_ZONE_DMA32))
-   atomic_pool_resize(atomic_pool_dma32,
-  GFP_KERNEL | GFP_DMA32);
+   gfp_t dma_gfp = GFP_ATOMIC_POOL_DMA;
+
+   if (dma_gfp)
+   atomic_pool_resize(atomic_pool_dma, GFP_KERNEL | dma_gfp);
+
atomic_pool_resize(atomic_pool_kernel, GFP_KERNEL);
 }
 
@@ -168,6 +165,7 @@ static __init struct gen_pool 
*__dma_atomic_pool_init(size_t pool_size,
 
 static int __init dma_atomic_pool_init(void)
 {
+   gfp_t dma_gfp = GFP_ATOMIC_POOL_DMA;
int ret = 0;
 
/*
@@ -185,18 +183,13 @@ static int __init dma_atomic_pool_init(void)
GFP_KERNEL);
if (!atomic_pool_kernel)
ret = -ENOMEM;
-   if (IS_ENABLED(CONFIG_ZONE_DMA)) {
+
+   if (dma_gfp) {
atomic_pool_dma = __dma_atomic_pool_init(atomic_pool_size,
-   GFP_KERNEL | GFP_DMA);
+GFP_KERNEL | dma_gfp);
if (!atomic_pool_dma)
ret = -ENOMEM;
}
-   if (IS_ENABLED(CONFIG_ZONE_DMA32)) {
-   atomic_pool_dma32 = __dma_atomic_pool_init(atomic_pool_size,
-   GFP_KERNEL | GFP_DMA32);
-   if (!atomic_pool_dma32)
-   ret = -ENOMEM;
-   }
 
dma_atomic_pool_debugfs_init();
return ret;
@@ -206,14 +199,12 @@ postcore_initcall(dma_atomic_pool_init);
 static inline struct gen_pool *dev_to_pool(struct device *dev)
 {
u64 phys_mask;
-   gfp_t gfp;
-
-   gfp = dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
- &phys_mask);
-   if (IS_ENABLED(CONFIG_ZONE_DMA) && gfp == GFP_DMA)
-   return atomic_pool_dma;
-   if (IS_ENABLED(CONFIG_ZONE_DMA32) && gfp == GFP_DMA32)
-   return atomic_pool_dma32;
+
+   if (atomic_pool_dma &&
+   dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
+   &phys_mask))
+   return atomic_pool_dma;
+
return atomic_pool_kernel;
 }
 
-- 
2.27.0



Re: [PATCH v3 4/4] PCI/ACS: Enable PCI_ACS_TB for untrusted/external-facing devices

2020-07-07 Thread kernel test robot
Hi Rajat,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on pci/next]
[also build test WARNING on iommu/next pm/linux-next v5.8-rc4 next-20200707]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Rajat-Jain/PCI-Move-pci_enable_acs-and-its-dependencies-up-in-pci-c/20200707-125604
base:   https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git next
config: arm-randconfig-r012-20200707 (attached as .config)
compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project 
02946de3802d3bc65bc9f2eb9b8d4969b5a7add8)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# install arm cross compiling tool for clang build
# apt-get install binutils-arm-linux-gnueabi
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=arm 

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

>> drivers/pci/pci.c:883:3: warning: add explicit braces to avoid dangling else 
>> [-Wdangling-else]
   else
   ^
   1 warning generated.
--
>> drivers/pci/quirks.c:4987:3: warning: add explicit braces to avoid dangling 
>> else [-Wdangling-else]
   else
   ^
   1 warning generated.

vim +883 drivers/pci/pci.c

   849  
   850  /**
   851   * pci_std_enable_acs - enable ACS on devices using standard ACS 
capabilities
   852   * @dev: the PCI device
   853   */
   854  static void pci_std_enable_acs(struct pci_dev *dev)
   855  {
   856  int pos;
   857  u16 cap;
   858  u16 ctrl;
   859  
   860  pos = dev->acs_cap;
   861  if (!pos)
   862  return;
   863  
   864  pci_read_config_word(dev, pos + PCI_ACS_CAP, );
   865  pci_read_config_word(dev, pos + PCI_ACS_CTRL, );
   866  
   867  /* Source Validation */
   868  ctrl |= (cap & PCI_ACS_SV);
   869  
   870  /* P2P Request Redirect */
   871  ctrl |= (cap & PCI_ACS_RR);
   872  
   873  /* P2P Completion Redirect */
   874  ctrl |= (cap & PCI_ACS_CR);
   875  
   876  /* Upstream Forwarding */
   877  ctrl |= (cap & PCI_ACS_UF);
   878  
   879  /* Enable Translation Blocking for external devices */
   880  if (dev->external_facing || dev->untrusted)
   881  if (cap & PCI_ACS_TB)
   882  ctrl |= PCI_ACS_TB;
 > 883  else
   884  pci_warn(dev, "ACS: No Trans Blocking on ext dev\n");
   885  
   886  pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
   887  }
   888  
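
The fix the robot asks for is mechanical; a sketch of the braced form that
silences -Wdangling-else, assuming no behavioural change is intended:

	/* Braces make the else binding explicit. */
	if (dev->external_facing || dev->untrusted) {
		if (cap & PCI_ACS_TB)
			ctrl |= PCI_ACS_TB;
		else
			pci_warn(dev, "ACS: No Trans Blocking on ext dev\n");
	}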

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org



Re: [PATCH v2 4/6] drm/msm: Add support to create a local pagetable

2020-07-07 Thread Robin Murphy

On 2020-06-26 21:04, Jordan Crouse wrote:

Add support to create a io-pgtable for use by targets that support
per-instance pagetables.  In order to support per-instance pagetables the
GPU SMMU device needs to have the qcom,adreno-smmu compatible string and
split pagetables and auxiliary domains need to be supported and enabled.

Signed-off-by: Jordan Crouse 
---

  drivers/gpu/drm/msm/msm_gpummu.c |   2 +-
  drivers/gpu/drm/msm/msm_iommu.c  | 180 ++-
  drivers/gpu/drm/msm/msm_mmu.h|  16 ++-
  3 files changed, 195 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gpummu.c b/drivers/gpu/drm/msm/msm_gpummu.c
index 310a31b05faa..aab121f4beb7 100644
--- a/drivers/gpu/drm/msm/msm_gpummu.c
+++ b/drivers/gpu/drm/msm/msm_gpummu.c
@@ -102,7 +102,7 @@ struct msm_mmu *msm_gpummu_new(struct device *dev, struct 
msm_gpu *gpu)
}
  
  	gpummu->gpu = gpu;

-   msm_mmu_init(&gpummu->base, dev, &funcs);
+   msm_mmu_init(&gpummu->base, dev, &funcs, MSM_MMU_GPUMMU);
  
	return &gpummu->base;

  }
diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index 1b6635504069..f455c597f76d 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -4,15 +4,192 @@
   * Author: Rob Clark 
   */
  
+#include 

  #include "msm_drv.h"
  #include "msm_mmu.h"
  
  struct msm_iommu {

struct msm_mmu base;
struct iommu_domain *domain;
+   struct iommu_domain *aux_domain;
  };
+
  #define to_msm_iommu(x) container_of(x, struct msm_iommu, base)
  
+struct msm_iommu_pagetable {

+   struct msm_mmu base;
+   struct msm_mmu *parent;
+   struct io_pgtable_ops *pgtbl_ops;
+   phys_addr_t ttbr;
+   u32 asid;
+};
+
+static struct msm_iommu_pagetable *to_pagetable(struct msm_mmu *mmu)
+{
+   return container_of(mmu, struct msm_iommu_pagetable, base);
+}
+
+static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 iova,
+   size_t size)
+{
+   struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
+   struct io_pgtable_ops *ops = pagetable->pgtbl_ops;
+   size_t unmapped = 0;
+
+   /* Unmap the block one page at a time */
+   while (size) {
+   unmapped += ops->unmap(ops, iova, 4096, NULL);
+   iova += 4096;
+   size -= 4096;
+   }
+
+   iommu_flush_tlb_all(to_msm_iommu(pagetable->parent)->domain);
+
+   return (unmapped == size) ? 0 : -EINVAL;
+}


Remember in patch #1 when you said "Then 'domain' can be used like any 
other iommu domain to map and unmap iova addresses in the pagetable."?


This appears to be very much not that :/

Robin.


+
+static int msm_iommu_pagetable_map(struct msm_mmu *mmu, u64 iova,
+   struct sg_table *sgt, size_t len, int prot)
+{
+   struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
+   struct io_pgtable_ops *ops = pagetable->pgtbl_ops;
+   struct scatterlist *sg;
+   size_t mapped = 0;
+   u64 addr = iova;
+   unsigned int i;
+
+   for_each_sg(sgt->sgl, sg, sgt->nents, i) {
+   size_t size = sg->length;
+   phys_addr_t phys = sg_phys(sg);
+
+   /* Map the block one page at a time */
+   while (size) {
+   if (ops->map(ops, addr, phys, 4096, prot)) {
+   msm_iommu_pagetable_unmap(mmu, iova, mapped);
+   return -EINVAL;
+   }
+
+   phys += 4096;
+   addr += 4096;
+   size -= 4096;
+   mapped += 4096;
+   }
+   }
+
+   return 0;
+}
+
+static void msm_iommu_pagetable_destroy(struct msm_mmu *mmu)
+{
+   struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
+
+   free_io_pgtable_ops(pagetable->pgtbl_ops);
+   kfree(pagetable);
+}
+
+/*
+ * Given a parent device, create and return an aux domain. This will enable the
+ * TTBR0 region
+ */
+static struct iommu_domain *msm_iommu_get_aux_domain(struct msm_mmu *parent)
+{
+   struct msm_iommu *iommu = to_msm_iommu(parent);
+   struct iommu_domain *domain;
+   int ret;
+
+   if (iommu->aux_domain)
+   return iommu->aux_domain;
+
+   if (!iommu_dev_has_feature(parent->dev, IOMMU_DEV_FEAT_AUX))
+   return ERR_PTR(-ENODEV);
+
+   domain = iommu_domain_alloc(&platform_bus_type);
+   if (!domain)
+   return ERR_PTR(-ENODEV);
+
+   ret = iommu_aux_attach_device(domain, parent->dev);
+   if (ret) {
+   iommu_domain_free(domain);
+   return ERR_PTR(ret);
+   }
+
+   iommu->aux_domain = domain;
+   return domain;
+}
+
+int msm_iommu_pagetable_params(struct msm_mmu *mmu,
+   phys_addr_t *ttbr, int *asid)
+{
+   struct msm_iommu_pagetable *pagetable;
+
+   if (mmu->type != MSM_MMU_IOMMU_PAGETABLE)
+   return -EINVAL;
+
+   pagetable = 

Re: [PATCH v2 2/6] iommu/io-pgtable: Allow a pgtable implementation to skip TLB operations

2020-07-07 Thread Robin Murphy

On 2020-06-26 21:04, Jordan Crouse wrote:

Allow an io-pgtable implementation to skip TLB operations by checking for
NULL pointers in the helper functions. It will be up to the owner
of the io-pgtable instance to make sure that they independently handle
the TLB correctly.


I don't really understand what this is for - tricking the IOMMU driver 
into not performing its TLB maintenance at points when that maintenance 
has been deemed necessary doesn't seem like the appropriate way to 
achieve anything good :/


Robin.


Signed-off-by: Jordan Crouse 
---

  include/linux/io-pgtable.h | 11 +++
  1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 53d53c6c2be9..bbed1d3925ba 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -210,21 +210,24 @@ struct io_pgtable {
  
  static inline void io_pgtable_tlb_flush_all(struct io_pgtable *iop)

  {
-   iop->cfg.tlb->tlb_flush_all(iop->cookie);
+   if (iop->cfg.tlb)
+   iop->cfg.tlb->tlb_flush_all(iop->cookie);
  }
  
  static inline void

  io_pgtable_tlb_flush_walk(struct io_pgtable *iop, unsigned long iova,
  size_t size, size_t granule)
  {
-   iop->cfg.tlb->tlb_flush_walk(iova, size, granule, iop->cookie);
+   if (iop->cfg.tlb)
+   iop->cfg.tlb->tlb_flush_walk(iova, size, granule, iop->cookie);
  }
  
  static inline void

  io_pgtable_tlb_flush_leaf(struct io_pgtable *iop, unsigned long iova,
  size_t size, size_t granule)
  {
-   iop->cfg.tlb->tlb_flush_leaf(iova, size, granule, iop->cookie);
+   if (iop->cfg.tlb)
+   iop->cfg.tlb->tlb_flush_leaf(iova, size, granule, iop->cookie);
  }
  
  static inline void

@@ -232,7 +235,7 @@ io_pgtable_tlb_add_page(struct io_pgtable *iop,
struct iommu_iotlb_gather * gather, unsigned long iova,
size_t granule)
  {
-   if (iop->cfg.tlb->tlb_add_page)
+   if (iop->cfg.tlb && iop->cfg.tlb->tlb_add_page)
iop->cfg.tlb->tlb_add_page(gather, iova, granule, iop->cookie);
  }
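
For context, what the NULL checks permit is an io-pgtable instance whose
owner does its own TLB maintenance. A hedged sketch of such an allocation
(field values are illustrative, not taken from this series):

	struct io_pgtable_cfg cfg = {
		.pgsize_bitmap	= SZ_4K | SZ_2M | SZ_1G,
		.ias		= 48,
		.oas		= 48,
		.tlb		= NULL,	/* owner flushes TLBs itself */
		.iommu_dev	= dev,
	};
	struct io_pgtable_ops *ops;

	ops = alloc_io_pgtable_ops(ARM_64_LPAE_S1, &cfg, cookie);
	if (!ops)
		return -ENOMEM;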
  




Re: [PATCH v2 3/4] iommu/vt-d: Report page request faults for guest SVA

2020-07-07 Thread Jean-Philippe Brucker
On Mon, Jul 06, 2020 at 08:25:34AM +0800, Lu Baolu wrote:
> A pasid might be bound to a page table from a VM guest via the iommu
> ops.sva_bind_gpasid. In this case, when a DMA page fault is detected
> on the physical IOMMU, we need to inject the page fault request into
> the guest. After the guest completes handling the page fault, a page
> response needs to be sent back via the iommu ops.page_response().
> 
> This adds support to report a page request fault. Any external module
> which is interested in handling this fault should register a notifier
> callback.
> 
> Co-developed-by: Jacob Pan 
> Signed-off-by: Jacob Pan 
> Co-developed-by: Liu Yi L 
> Signed-off-by: Liu Yi L 
> Signed-off-by: Lu Baolu 
> ---
[...]
> +static int
> +intel_svm_prq_report(struct device *dev, struct page_req_dsc *desc)
> +{
> + struct iommu_fault_event event;
> + u8 bus, devfn;
> +
> + memset(, 0, sizeof(struct iommu_fault_event));
> + bus = PCI_BUS_NUM(desc->rid);
> + devfn = desc->rid & 0xff;
> +
> + /* Fill in event data for device specific processing */
> + event.fault.type = IOMMU_FAULT_PAGE_REQ;
> + event.fault.prm.addr = desc->addr;
> + event.fault.prm.pasid = desc->pasid;
> + event.fault.prm.grpid = desc->prg_index;
> + event.fault.prm.perm = prq_to_iommu_prot(desc);
> +
> + /*
> +  * Set the last page in group bit if private data is present;
> +  * a page response is then required, as it is for LPIG.
> +  */
> + if (desc->lpig)
> + event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE;
> + if (desc->pasid_present)
> + event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;

Do you also need to set IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID?  I added
the flag to deal with devices that do not want a PASID value in their PRI
response (bit 15 in the PCIe Page Request Status Register):
https://lore.kernel.org/linux-iommu/20200616144712.748818-1-jean-phili...@linaro.org/
(applied by Joerg for v5.9)

Grepping for pci_prg_resp_pasid_required() in intel/iommu.c it seems to
currently reject devices that do not want a PASID in a PRI response, so I
think you can set this flag unconditionally for now.

Thanks,
Jean

> + if (desc->priv_data_present) {
> + event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE;
> + event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA;
> + memcpy(event.fault.prm.private_data, desc->priv_data,
> +sizeof(desc->priv_data));
> + }
> +
> + return iommu_report_device_fault(dev, );
> +}
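
Concretely, the suggestion would amount to something like this in the snippet
above (IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID comes from the series linked
above; setting it unconditionally is the stated assumption):

	if (desc->pasid_present) {
		event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
		/* VT-d currently rejects devices that can't take a PASID in
		 * the PRI response, so this can be set unconditionally. */
		event.fault.prm.flags |= IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID;
	}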


Re: [PATCH v2 1/6] iommu/arm-smmu: Add auxiliary domain support for arm-smmuv2

2020-07-07 Thread Jean-Philippe Brucker
Hi Jordan,

On Fri, Jun 26, 2020 at 02:04:09PM -0600, Jordan Crouse wrote:
> Support auxiliary domains for arm-smmu-v2 to initialize and support
> multiple pagetables for a single SMMU context bank. Since the smmu-v2
> hardware doesn't have any built in support for switching the pagetable
> base it is left as an exercise to the caller to actually use the pagetable.
> 
> Aux domains are supported if split pagetable (TTBR1) support has been
> enabled on the master domain.  Each auxiliary domain will reuse the
> > configuration of the master domain. By default a domain with TTBR1
> support will have the TTBR0 region disabled so the first attached aux
> domain will enable the TTBR0 region in the hardware and conversely the
> last domain to be detached will disable TTBR0 translations.  All subsequent
> > auxiliary domains create a pagetable but do not touch the hardware.
> 
> The leaf driver will be able to query the physical address of the
> pagetable with the DOMAIN_ATTR_PTBASE attribute so that it can use the
> address with whatever means it has to switch the pagetable base.
> 
> Following is a pseudo code example of how a domain can be created
> 
>  /* Check to see if aux domains are supported */
>  if (iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)) {
>iommu = iommu_domain_alloc(...);
> 

The device driver should also call iommu_dev_enable_feature() before using
the AUX feature. I see that you implement them as NOPs and in this case
the GPU is tightly coupled with the SMMU so interoperability between
different IOMMU and device drivers doesn't matter much, but I think it's
still a good idea to follow the same patterns in all drivers to make
future work on the core IOMMU easier.

>if (iommu_aux_attach_device(domain, dev))
>return FAIL;
> 
>   /* Save the base address of the pagetable for use by the driver */
>   iommu_domain_get_attr(domain, DOMAIN_ATTR_PTBASE, &ptbase);
>  }
> 
> Then 'domain' can be used like any other iommu domain to map and
> unmap iova addresses in the pagetable.
> 
> Signed-off-by: Jordan Crouse 
> ---
> 
>  drivers/iommu/arm-smmu.c | 219 ---
>  drivers/iommu/arm-smmu.h |   1 +
>  2 files changed, 204 insertions(+), 16 deletions(-)
[...]
> @@ -1653,6 +1836,10 @@ static struct iommu_ops arm_smmu_ops = {
>   .get_resv_regions   = arm_smmu_get_resv_regions,
>   .put_resv_regions   = generic_iommu_put_resv_regions,
>   .def_domain_type= arm_smmu_def_domain_type,
> + .dev_has_feat   = arm_smmu_dev_has_feat,
> + .dev_enable_feat= arm_smmu_dev_enable_feat,
> + .dev_disable_feat   = arm_smmu_dev_disable_feat,
> + .aux_attach_dev = arm_smmu_aux_attach_dev,

To be complete this also needs dev_feat_enabled() and aux_detach_dev() ops

Thanks,
Jean
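
Folded into the commit-message pseudo code, the requested step would look
roughly like this (a sketch assuming the standard iommu_dev_enable_feature()
API):

 /* Explicitly enable the AUX feature before first use. */
 if (iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)) {
   if (iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_AUX))
   return FAIL;

   iommu = iommu_domain_alloc(...);
   ...
 }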


Re: IOASID set token

2020-07-07 Thread Jean-Philippe Brucker
On Mon, Jul 06, 2020 at 01:51:37PM -0700, Jacob Pan wrote:
> Hi Jean,
> Thanks for the feedback, please see replies inline.
> 
> On Mon, 6 Jul 2020 12:30:41 +0200
> Jean-Philippe Brucker  wrote:
> 
> > Hi Jacob,
> > 
> > On Thu, Jul 02, 2020 at 06:48:25AM -0700, Jacob Pan wrote:
> > > Hi Jean,
> > > 
> > > Just realized I should send this to your Linaro account instead of
> > > ARM. So Hi again :)
> > > 
> > > On Wed, 1 Jul 2020 23:29:16 -0700
> > > Jacob Pan  wrote:
> > >   
> > > > Hi Jean,
> > > > 
> > > > Have a question for you on whether we can have a fixed token type
> > > > for ioasid_set.
> > > > 
> > > > Currently, ioasid_set has an arbitrary token. For VT-d vSVA
> > > > usage, we choose mm as ioasid_set token to identify PASIDs within
> > > > a guest. We have multiple in-kernel users of PASIDs such as VFIO,
> > > > KVM, and VDCM. When an IOASID set is created, there is not a good
> > > > way to communicate about the token choices. So we have to let
> > > > VDCM and KVM *assume* mm is used as token, then retrieve
> > > > ioasid_set based on the token.
> > > > 
> > > > This assumption of "mm as token" is not a reliable SW
> > > > architecture.  
> > 
> > I don't see this as a problem. The token type is tied to the IOASID
> > set, so users that pass those IOASID sets to ioasid_find() can safely
> > assume that the returned pointer is an mm_struct. That said I'm not
> > opposed to consolidating the API with explicit types, it could
> > definitely be more elegant.
> > 
> In our use case, we need a token-to-ioasid_set lookup, e.g. VFIO creates
> an ioasid_set with a token. KVM instance needs to know the
> ioasid_set based on its mm. So ioasid_find() does not know the set
> until it finds it based on the token and token type.

Right, I mixed up "token" with the private data associated with each
IOASID. I had mostly forgotten about the principles you introduced in the
"IOASID extensions for guest SVA" series, sorry for the confusion.

> > > > So
> > > > we are thinking if we can have an explicit ioasid_set token type
> > > > where mm is used. After all, PASID and mm are closely related.
> > > > 
> > > > The code change might be the following:
> > > > 1. add a flag to indicate token type when ioasid_set is allocated,
> > > > e.g. IOASID_SET_TYPE_MM
> > > > IOASID_SET_TYPE_ANY
> > > > 2. other users of the ioasid_set can query if an mm token exists
> > > > based on the flag IOASID_SET_TYPE_MM, then retrieve the
> > > > ioasid_set.
> > > > 
> > > > Existing ioasid_set user can still use arbitrary token under the
> > > > flag IOASID_SET_TYPE_ANY
> > > > 
> > > > Would this be an issue for ARM usage?  
> > 
> > In my current implementation of auxiliary domains for Arm SMMU (which
> > might never be useful enough to go upstream) I don't even use a token
> > for the private IOASID set. However I still think we should leave the
> > option to use a type different than mm_struct as token for some
> > IOASID sets because device drivers (e.g. AMD kfd) may also want to
> > dip into the IOASID space and use their own token type.
> > 
> > For the moment, though, we could actually specialize the IOASID API to
> > only take an mm_struct as token.
> That would be fine with VT-d. We can use init_mm for host PASID set,
> process mm for VM set.

I'm not fond of using init_mm for the host PASID set. Does it need a token
at all?

> 
> > For example the functions exported
> > by the IOASID lib would be:
> > 
> >   ioasid_t ioasid_alloc_mm(set, min, max, struct mm_struct *mm)
> what is the set argument used for? In my view, ioasid_set and mm are
> 1:1 mapped.

Please disregard this, it was replacing the void* argument of
ioasid_alloc() with an mm_struct.

> struct ioasid_set *ioasid_alloc_set(void *token, ioasid_t quota)
> 
> 
> I was thinking we still have APIs for IOASID set alloc/free since we
> want to embed ioasid_set info in driver data structures for ownership
> check.
> 
> >   struct mm_struct *ioasid_find_mm(set, ioasid)
> I don't get why we need ioasid to find the set token. If we put
> mm_struct* inside ioasid_set, then we can get the token from the set
> directly.

Same here, this was replacing the void* returned by ioasid_find() with an
mm_struct.

> 
> >   ...
> > 
> > And ioasid_alloc(), ioasid_find(), etc would be internal to ioasid.c
> > and deal with IOASID_SET_TYPE_MM (or even be removed entirely for
> > now). New users that need different token types could then introduce
> > their own IOASID_SET_TYPE_* and use the lower-level functions.
> > 
> I will keep that in mind in my next set. I think it would be much
> easier to explain with code.
> 
> My takeaway is that we have a high-level agreement to have explicit mm
> as token in IOASID APIs. I think we can work out the details with
> patches.

Yes I think it would be easier to discuss with code.

Thanks,
Jean
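
A rough sketch of the typed-token idea under discussion (illustrative only;
none of these names are merged code):

	enum ioasid_set_type {
		IOASID_SET_TYPE_ANY,	/* arbitrary void * token */
		IOASID_SET_TYPE_MM,	/* token is a struct mm_struct * */
	};

	struct ioasid_set {
		void			*token;
		enum ioasid_set_type	type;
		ioasid_t		quota;
	};

	/* Specialized constructor: the token type is fixed at creation. */
	struct ioasid_set *ioasid_set_alloc_mm(struct mm_struct *mm,
					       ioasid_t quota);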


RE: [PATCH v4 07/15] vfio/type1: Add VFIO_IOMMU_PASID_REQUEST (alloc/free)

2020-07-07 Thread Liu, Yi L
Hi Eric,

> From: Auger Eric 
> Sent: Monday, July 6, 2020 11:18 PM
> 
> Hi Yi,
> 
> On 7/4/20 1:26 PM, Liu Yi L wrote:
> > This patch allows user space to request PASID allocation/free, e.g. when
> > serving the request from the guest.
> >
> > PASIDs that are not freed by userspace are automatically freed when the
> > IOASID set is destroyed when the process exits.
> >
> > Cc: Kevin Tian 
> > CC: Jacob Pan 
> > Cc: Alex Williamson 
> > Cc: Eric Auger 
> > Cc: Jean-Philippe Brucker 
> > Cc: Joerg Roedel 
> > Cc: Lu Baolu 
> > Signed-off-by: Liu Yi L 
> > Signed-off-by: Yi Sun 
> > Signed-off-by: Jacob Pan 
> > ---
> > v3 -> v4:
> > *) address comments from v3, except the below comment against the range
> >of PASID_FREE request. needs more help on it.
> > "> +if (req.range.min > req.range.max)
> >
> > Is it exploitable that a user can spin the kernel for a long time in
> > the case of a free by calling this with [0, MAX_UINT] regardless of
> > their actual allocations?"
> >
> > v1 -> v2:
> > *) move the vfio_mm related code to be a seprate module
> > *) use a single structure for alloc/free, could support a range of PASIDs
> > *) fetch vfio_mm at group_attach time instead of at iommu driver open time
> > ---
> >  drivers/vfio/Kconfig|  1 +
> >  drivers/vfio/vfio_iommu_type1.c | 84
> +
> >  drivers/vfio/vfio_pasid.c   | 10 +
> >  include/linux/vfio.h|  6 +++
> >  include/uapi/linux/vfio.h   | 36 ++
> >  5 files changed, 137 insertions(+)
> >
> > diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> > index 3d8a108..95d90c6 100644
> > --- a/drivers/vfio/Kconfig
> > +++ b/drivers/vfio/Kconfig
> > @@ -2,6 +2,7 @@
> >  config VFIO_IOMMU_TYPE1
> > tristate
> > depends on VFIO
> > +   select VFIO_PASID if (X86)
> > default n
> >
> >  config VFIO_IOMMU_SPAPR_TCE
> > diff --git a/drivers/vfio/vfio_iommu_type1.c 
> > b/drivers/vfio/vfio_iommu_type1.c
> > index 80623b8..29726ca 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -76,6 +76,7 @@ struct vfio_iommu {
> > booldirty_page_tracking;
> > boolpinned_page_dirty_scope;
> > struct iommu_nesting_info   *nesting_info;
> > +   struct vfio_mm  *vmm;
> >  };
> >
> >  struct vfio_domain {
> > @@ -1937,6 +1938,11 @@ static void vfio_iommu_iova_insert_copy(struct
> vfio_iommu *iommu,
> >
> >  static void vfio_iommu_release_nesting_info(struct vfio_iommu *iommu)
> >  {
> > +   if (iommu->vmm) {
> > +   vfio_mm_put(iommu->vmm);
> > +   iommu->vmm = NULL;
> > +   }
> > +
> > kfree(iommu->nesting_info);
> > iommu->nesting_info = NULL;
> >  }
> > @@ -2075,6 +2081,25 @@ static int vfio_iommu_type1_attach_group(void
> *iommu_data,
> > goto out_detach;
> > }
> > iommu->nesting_info = info;
> > +
> > +   if (info->features & IOMMU_NESTING_FEAT_SYSWIDE_PASID) {
> > +   struct vfio_mm *vmm;
> > +   int sid;
> > +
> > +   vmm = vfio_mm_get_from_task(current);
> > +   if (IS_ERR(vmm)) {
> > +   ret = PTR_ERR(vmm);
> > +   goto out_detach;
> > +   }
> > +   iommu->vmm = vmm;
> > +
> > +   sid = vfio_mm_ioasid_sid(vmm);
> > +   ret = iommu_domain_set_attr(domain->domain,
> > +   DOMAIN_ATTR_IOASID_SID,
> > +   &sid);
> > +   if (ret)
> > +   goto out_detach;
> > +   }
> > }
> >
> > /* Get aperture info */
> > @@ -2860,6 +2885,63 @@ static int vfio_iommu_type1_dirty_pages(struct
> vfio_iommu *iommu,
> > return -EINVAL;
> >  }
> >
> > +static int vfio_iommu_type1_pasid_alloc(struct vfio_iommu *iommu,
> > +   unsigned int min,
> > +   unsigned int max)
> > +{
> > +   int ret = -EOPNOTSUPP;
> > +
> > +   mutex_lock(&iommu->lock);
> > +   if (iommu->vmm)
> > +   ret = vfio_pasid_alloc(iommu->vmm, min, max);
> > +   mutex_unlock(&iommu->lock);
> > +   return ret;
> > +}
> > +
> > +static int vfio_iommu_type1_pasid_free(struct vfio_iommu *iommu,
> > +   unsigned int min,
> > +   unsigned int max)
> > +{
> > +   int ret = -EOPNOTSUPP;
> > +
> > +   mutex_lock(&iommu->lock);
> > +   if (iommu->vmm) {
> > +   vfio_pasid_free_range(iommu->vmm, min, max);
> > +   ret = 0;
> > +   }
> > +   mutex_unlock(&iommu->lock);
> > +   return ret;
> > +}
> > +
> > +static int vfio_iommu_type1_pasid_request(struct vfio_iommu *iommu,
> > + unsigned long arg)
> > +{
> > +   struct vfio_iommu_type1_pasid_request 
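
For orientation, a userspace sketch of the proposed uAPI (the archive
truncates the uapi hunk here, so the flag name and the return convention
below are assumptions for illustration only):

	struct vfio_iommu_type1_pasid_request req = {
		.argsz = sizeof(req),
		.flags = VFIO_IOMMU_FLAG_ALLOC_PASID,	/* assumed name */
		.range = { .min = 1, .max = 1 },
	};
	int pasid;

	/* Assumed convention: allocated PASID on success, -1 on error. */
	pasid = ioctl(container_fd, VFIO_IOMMU_PASID_REQUEST, &req);
	if (pasid < 0)
		perror("VFIO_IOMMU_PASID_REQUEST");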

Re: [PATCH v7 08/36] drm: exynos: fix common struct sg_table related issues

2020-07-07 Thread Andrzej Hajda


On 19.06.2020 12:36, Marek Szyprowski wrote:
> The Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function
> returns the number of the created entries in the DMA address space.
> However the subsequent calls to the dma_sync_sg_for_{device,cpu}() and
> dma_unmap_sg must be called with the original number of the entries
> passed to the dma_map_sg().
>
> struct sg_table is a common structure used for describing a non-contiguous
> memory buffer, used commonly in the DRM and graphics subsystems. It
> consists of a scatterlist with memory pages and DMA addresses (sgl entry),
> as well as the number of scatterlist entries: CPU pages (orig_nents entry)
> and DMA mapped pages (nents entry).
>
> It turned out that it was a common mistake to misuse nents and orig_nents
> entries, calling DMA-mapping functions with a wrong number of entries or
> ignoring the number of mapped entries returned by the dma_map_sg()
> function.
>
> To avoid such issues, lets use a common dma-mapping wrappers operating
> directly on the struct sg_table objects and use scatterlist page
> iterators where possible. This, almost always, hides references to the
> nents and orig_nents entries, making the code robust, easier to follow
> and copy/paste safe.
>
> Signed-off-by: Marek Szyprowski 

Reviewed-by: Andrzej Hajda 

Regards
Andrzej


> ---
>   drivers/gpu/drm/exynos/exynos_drm_g2d.c | 10 +-
>   1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/exynos/exynos_drm_g2d.c 
> b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
> index fcee33a43aca..7014a8cd971a 100644
> --- a/drivers/gpu/drm/exynos/exynos_drm_g2d.c
> +++ b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
> @@ -395,8 +395,8 @@ static void g2d_userptr_put_dma_addr(struct g2d_data *g2d,
>   return;
>   
>   out:
> - dma_unmap_sg(to_dma_dev(g2d->drm_dev), g2d_userptr->sgt->sgl,
> - g2d_userptr->sgt->nents, DMA_BIDIRECTIONAL);
> + dma_unmap_sgtable(to_dma_dev(g2d->drm_dev), g2d_userptr->sgt,
> +   DMA_BIDIRECTIONAL, 0);
>   
>   pages = frame_vector_pages(g2d_userptr->vec);
>   if (!IS_ERR(pages)) {
> @@ -511,10 +511,10 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct 
> g2d_data *g2d,
>   
>   g2d_userptr->sgt = sgt;
>   
> - if (!dma_map_sg(to_dma_dev(g2d->drm_dev), sgt->sgl, sgt->nents,
> - DMA_BIDIRECTIONAL)) {
> + ret = dma_map_sgtable(to_dma_dev(g2d->drm_dev), sgt,
> +   DMA_BIDIRECTIONAL, 0);
> + if (ret) {
>   DRM_DEV_ERROR(g2d->dev, "failed to map sgt with dma region.\n");
> - ret = -ENOMEM;
>   goto err_sg_free_table;
>   }
>   


RE: [PATCH v4 05/15] vfio: Add PASID allocation/free support

2020-07-07 Thread Liu, Yi L
Hi Eric,

> From: Auger Eric 
> Sent: Monday, July 6, 2020 10:52 PM
> 
> Hi Yi,
> 
> On 7/4/20 1:26 PM, Liu Yi L wrote:
> > Shared Virtual Addressing (a.k.a Shared Virtual Memory) allows sharing
> > multiple process virtual address spaces with the device for simplified
> > programming model. PASID is used to tag an virtual address space in DMA
> > requests and to identify the related translation structure in IOMMU. When
> > a PASID-capable device is assigned to a VM, we want the same capability
> > of using PASID to tag guest process virtual address spaces to achieve
> > virtual SVA (vSVA).
> >
> > PASID management for guest is vendor specific. Some vendors (e.g. Intel
> > VT-d) requires system-wide managed PASIDs cross all devices, regardless
> > of whether a device is used by host or assigned to guest. Other vendors
> > (e.g. ARM SMMU) may allow PASIDs managed per-device thus could be fully
> > delegated to the guest for assigned devices.
> >
> > For system-wide managed PASIDs, this patch introduces a vfio module to
> > handle explicit PASID alloc/free requests from guest. Allocated PASIDs
> > are associated to a process (or, mm_struct) in IOASID core. A vfio_mm
> > object is introduced to track mm_struct. Multiple VFIO containers within
> > a process share the same vfio_mm object.
> >
> > A quota mechanism is provided to prevent malicious user from exhausting
> > available PASIDs. Currently the quota is a global parameter applied to
> > all VFIO devices. In the future per-device quota might be supported too.
> >
> > Cc: Kevin Tian 
> > CC: Jacob Pan 
> > Cc: Eric Auger 
> > Cc: Jean-Philippe Brucker 
> > Cc: Joerg Roedel 
> > Cc: Lu Baolu 
> > Suggested-by: Alex Williamson 
> > Signed-off-by: Liu Yi L 
> > ---
> > v3 -> v4:
> > *) fix lock leak in vfio_mm_get_from_task()
> > *) drop pasid_quota field in struct vfio_mm
> > *) vfio_mm_get_from_task() returns ERR_PTR(-ENOTTY)
> when !CONFIG_VFIO_PASID
> >
> > v1 -> v2:
> > *) added in v2, split from the pasid alloc/free support of v1
> > ---
> >  drivers/vfio/Kconfig  |   5 ++
> >  drivers/vfio/Makefile |   1 +
> >  drivers/vfio/vfio_pasid.c | 152
> ++
> >  include/linux/vfio.h  |  28 +
> >  4 files changed, 186 insertions(+)
> >  create mode 100644 drivers/vfio/vfio_pasid.c
> >
> > diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> > index fd17db9..3d8a108 100644
> > --- a/drivers/vfio/Kconfig
> > +++ b/drivers/vfio/Kconfig
> > @@ -19,6 +19,11 @@ config VFIO_VIRQFD
> > depends on VFIO && EVENTFD
> > default n
> >
> > +config VFIO_PASID
> > +   tristate
> > +   depends on IOASID && VFIO
> > +   default n
> > +
> >  menuconfig VFIO
> > tristate "VFIO Non-Privileged userspace driver framework"
> > depends on IOMMU_API
> > diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
> > index de67c47..bb836a3 100644
> > --- a/drivers/vfio/Makefile
> > +++ b/drivers/vfio/Makefile
> > @@ -3,6 +3,7 @@ vfio_virqfd-y := virqfd.o
> >
> >  obj-$(CONFIG_VFIO) += vfio.o
> >  obj-$(CONFIG_VFIO_VIRQFD) += vfio_virqfd.o
> > +obj-$(CONFIG_VFIO_PASID) += vfio_pasid.o
> >  obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
> >  obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
> >  obj-$(CONFIG_VFIO_SPAPR_EEH) += vfio_spapr_eeh.o
> > diff --git a/drivers/vfio/vfio_pasid.c b/drivers/vfio/vfio_pasid.c
> > new file mode 100644
> > index 000..c46b870
> > --- /dev/null
> > +++ b/drivers/vfio/vfio_pasid.c
> > @@ -0,0 +1,152 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * Copyright (C) 2020 Intel Corporation.
> > + * Author: Liu Yi L 
> > + *
> > + */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#define DRIVER_VERSION  "0.1"
> > +#define DRIVER_AUTHOR   "Liu Yi L "
> > +#define DRIVER_DESC "PASID management for VFIO bus drivers"
> > +
> > +#define VFIO_DEFAULT_PASID_QUOTA   1000
> > +static int pasid_quota = VFIO_DEFAULT_PASID_QUOTA;
> > +module_param_named(pasid_quota, pasid_quota, uint, 0444);
> > +MODULE_PARM_DESC(pasid_quota,
> > +" Set the quota for max number of PASIDs that an application is
> allowed to request (default 1000)");
> > +
> > +struct vfio_mm_token {
> > +   unsigned long long val;
> > +};
> > +
> > +struct vfio_mm {
> > +   struct kref kref;
> > +   int ioasid_sid;
> > +   struct list_headnext;
> > +   struct vfio_mm_tokentoken;
> > +};
> > +
> > +static struct vfio_pasid {
> > +   struct mutexvfio_mm_lock;
> > +   struct list_headvfio_mm_list;
> > +} vfio_pasid;
> > +
> > +/* called with vfio.vfio_mm_lock held */
> > +static void vfio_mm_release(struct kref *kref)
> > +{
> > +   struct vfio_mm *vmm = container_of(kref, struct vfio_mm, kref);
> > +
> > +   list_del(&vmm->next);
> > +   mutex_unlock(&vfio_pasid.vfio_mm_lock);
> > +   ioasid_free_set(vmm->ioasid_sid, true);
> > +   kfree(vmm);
> > +}
> > +
> > 
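
The matching put path pairs the kref with the list lock; a sketch of the
usual pattern (assuming it mirrors vfio_mm_release() quoted above):

	static void vfio_mm_put(struct vfio_mm *vmm)
	{
		/* On the last put this calls vfio_mm_release() with
		 * vfio_mm_lock held, so the list removal is atomic. */
		kref_put_mutex(&vmm->kref, vfio_mm_release,
			       &vfio_pasid.vfio_mm_lock);
	}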

Re: [PATCH v7 07/36] drm: exynos: use common helper for a scatterlist contiguity check

2020-07-07 Thread Andrzej Hajda
Hi,

On 19.06.2020 12:36, Marek Szyprowski wrote:
> Use common helper for checking the contiguity of the imported dma-buf.
>
> Signed-off-by: Marek Szyprowski 
> ---
>   drivers/gpu/drm/exynos/exynos_drm_gem.c | 23 +++
>   1 file changed, 3 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/gpu/drm/exynos/exynos_drm_gem.c 
> b/drivers/gpu/drm/exynos/exynos_drm_gem.c
> index efa476858db5..1716a023bca0 100644
> --- a/drivers/gpu/drm/exynos/exynos_drm_gem.c
> +++ b/drivers/gpu/drm/exynos/exynos_drm_gem.c
> @@ -431,27 +431,10 @@ exynos_drm_gem_prime_import_sg_table(struct drm_device 
> *dev,
>   {
>   struct exynos_drm_gem *exynos_gem;
>   
> - if (sgt->nents < 1)
> + /* check if the entries in the sg_table are contiguous */
> + if (drm_prime_get_contiguous_size(sgt) < attach->dmabuf->size) {
> + DRM_ERROR("buffer chunks must be mapped contiguously");
>   return ERR_PTR(-EINVAL);
> -
> - /*
> -  * Check if the provided buffer has been mapped as contiguous
> -  * into DMA address space.
> -  */
> - if (sgt->nents > 1) {
> - dma_addr_t next_addr = sg_dma_address(sgt->sgl);
> - struct scatterlist *s;
> - unsigned int i;
> -
> - for_each_sg(sgt->sgl, s, sgt->nents, i) {
> - if (!sg_dma_len(s))
> - break;
> - if (sg_dma_address(s) != next_addr) {
> - DRM_ERROR("buffer chunks must be mapped 
> contiguously");
> - return ERR_PTR(-EINVAL);
> - }
> - next_addr = sg_dma_address(s) + sg_dma_len(s);
> - }
>   }


Reviewed-by: Andrzej Hajda 


Regards
Andrzej
>   
>   exynos_gem = exynos_drm_gem_init(dev, attach->dmabuf->size);


RE: [PATCH v4 06/15] iommu/vt-d: Support setting ioasid set to domain

2020-07-07 Thread Liu, Yi L
Hi Eric,

> From: Auger Eric 
> Sent: Monday, July 6, 2020 10:52 PM
> 
> Hi Yi,
> 
> On 7/4/20 1:26 PM, Liu Yi L wrote:
> > From IOMMU p.o.v., PASIDs allocated and managed by external components
> > (e.g. VFIO) will be passed in for gpasid_bind/unbind operation. IOMMU
> > needs some knowledge to check the PASID ownership, hence add an interface
> > for those components to tell the PASID owner.
> >
> > In latest kernel design, PASID ownership is managed by IOASID set where
> > the PASID is allocated from. This patch adds support for setting ioasid
> > set ID to the domains used for nesting/vSVA. Subsequent SVA operations
> > on the PASID will be checked against its IOASID set for proper ownership.
> >
> > Cc: Kevin Tian 
> > CC: Jacob Pan 
> > Cc: Alex Williamson 
> > Cc: Eric Auger 
> > Cc: Jean-Philippe Brucker 
> > Cc: Joerg Roedel 
> > Cc: Lu Baolu 
> > Signed-off-by: Liu Yi L 
> > Signed-off-by: Jacob Pan 
> > ---
> >  drivers/iommu/intel/iommu.c | 16 
> >  include/linux/intel-iommu.h |  4 
> >  include/linux/iommu.h   |  1 +
> >  3 files changed, 21 insertions(+)
> >
> > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> > index 62ebe01..89d708d 100644
> > --- a/drivers/iommu/intel/iommu.c
> > +++ b/drivers/iommu/intel/iommu.c
> > @@ -1793,6 +1793,7 @@ static struct dmar_domain *alloc_domain(int flags)
> > if (first_level_by_default())
> > domain->flags |= DOMAIN_FLAG_USE_FIRST_LEVEL;
> > domain->has_iotlb_device = false;
> > +   domain->ioasid_sid = INVALID_IOASID_SET;
> > INIT_LIST_HEAD(&domain->devices);
> >
> > return domain;
> > @@ -6039,6 +6040,21 @@ intel_iommu_domain_set_attr(struct iommu_domain
> *domain,
> > }
> > spin_unlock_irqrestore(&device_domain_lock, flags);
> > break;
> > +   case DOMAIN_ATTR_IOASID_SID:
> no need to take the device_domain_lock?

oh, yes. thanks for spotting it.

> > +   if (!(dmar_domain->flags & DOMAIN_FLAG_NESTING_MODE)) {
> > +   ret = -ENODEV;
> > +   break;
> > +   }
> > +   if ((dmar_domain->ioasid_sid != INVALID_IOASID_SET) &&
> > +   (dmar_domain->ioasid_sid != (*(int *) data))) {
> storing *(int *) data) in a local variable would increase the
> readability of the code I think.

will do it. :-)

Regards,
Yi Liu

> > +   pr_warn_ratelimited("multi ioasid_set (%d:%d) setting",
> > +   dmar_domain->ioasid_sid,
> > +   (*(int *) data));
> > +   ret = -EBUSY;
> > +   break;
> > +   }
> > +   dmar_domain->ioasid_sid = *(int *) data;
> > +   break;
> > default:
> > ret = -EINVAL;
> > break;
> > diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> > index 3f23c26..0d0ab32 100644
> > --- a/include/linux/intel-iommu.h
> > +++ b/include/linux/intel-iommu.h
> > @@ -549,6 +549,10 @@ struct dmar_domain {
> >2 == 1GiB, 3 == 512GiB, 4 == 1TiB */
> > u64 max_addr;   /* maximum mapped address */
> >
> > +   int ioasid_sid; /*
> > +* the ioasid set which tracks all
> > +* PASIDs used by the domain.
> > +*/
> > int default_pasid;  /*
> >  * The default pasid used for non-SVM
> >  * traffic on mediated devices.
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > index 2567c33..21d32be 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -124,6 +124,7 @@ enum iommu_attr {
> > DOMAIN_ATTR_FSL_PAMUV1,
> > DOMAIN_ATTR_NESTING,/* two stages of translation */
> > DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
> > +   DOMAIN_ATTR_IOASID_SID,
> > DOMAIN_ATTR_MAX,
> >  };
> >
> >
> Thanks
> 
> Eric



RE: [PATCH v4 04/15] vfio/type1: Report iommu nesting info to userspace

2020-07-07 Thread Liu, Yi L
Hi Eric,

> From: Auger Eric 
> Sent: Monday, July 6, 2020 10:07 PM
> 
> Hi Yi,
> On 7/4/20 1:26 PM, Liu Yi L wrote:
> > This patch exports iommu nesting capability info to user space through
> > VFIO. User space is expected to check this info for supported uAPIs (e.g.
> > PASID alloc/free, bind page table, and cache invalidation) and the vendor
> > specific format information for first level/stage page table that will be
> > bound to.
> >
> > The nesting info is available only after the nesting iommu type is set
> > for a container. Current implementation imposes one limitation - one
> > nesting container should include at most one group. The philosophy of
> > vfio container is having all groups/devices within the container share
> > the same IOMMU context. When vSVA is enabled, one IOMMU context could
> > include one 2nd-level address space and multiple 1st-level address spaces.
> > While the 2nd-leve address space is reasonably sharable by multiple groups
> > , blindly sharing 1st-level address spaces across all groups within the
> > container might instead break the guest expectation. In the future sub/
> > super container concept might be introduced to allow partial address space
> > sharing within an IOMMU context. But for now let's go with this restriction
> > by requiring singleton container for using nesting iommu features. Below
> > link has the related discussion about this decision.
> >
> > https://lkml.org/lkml/2020/5/15/1028
> >
> > Cc: Kevin Tian 
> > CC: Jacob Pan 
> > Cc: Alex Williamson 
> > Cc: Eric Auger 
> > Cc: Jean-Philippe Brucker 
> > Cc: Joerg Roedel 
> > Cc: Lu Baolu 
> > Signed-off-by: Liu Yi L 
> > ---
> > v3 -> v4:
> > *) address comments against v3.
> >
> > v1 -> v2:
> > *) added in v2
> > ---
> >
> >  drivers/vfio/vfio_iommu_type1.c | 105
> +++-
> >  include/uapi/linux/vfio.h   |  16 ++
> >  2 files changed, 109 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/vfio/vfio_iommu_type1.c 
> > b/drivers/vfio/vfio_iommu_type1.c
> > index 7accb59..80623b8 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -62,18 +62,20 @@ MODULE_PARM_DESC(dma_entry_limit,
> >  "Maximum number of user DMA mappings per container (65535).");
> >
> >  struct vfio_iommu {
> > -   struct list_headdomain_list;
> > -   struct list_headiova_list;
> > -   struct vfio_domain  *external_domain; /* domain for external user */
> > -   struct mutexlock;
> > -   struct rb_root  dma_list;
> > -   struct blocking_notifier_head notifier;
> > -   unsigned intdma_avail;
> > -   uint64_tpgsize_bitmap;
> > -   boolv2;
> > -   boolnesting;
> > -   booldirty_page_tracking;
> > -   boolpinned_page_dirty_scope;
> > +   struct list_headdomain_list;
> > +   struct list_headiova_list;
> > +   struct vfio_domain  *external_domain; /* domain for
> > +external user */
> > +   struct mutexlock;
> > +   struct rb_root  dma_list;
> > +   struct blocking_notifier_head   notifier;
> > +   unsigned intdma_avail;
> > +   uint64_tpgsize_bitmap;
> > +   boolv2;
> > +   boolnesting;
> > +   booldirty_page_tracking;
> > +   boolpinned_page_dirty_scope;
> > +   struct iommu_nesting_info   *nesting_info;
> >  };
> >
> >  struct vfio_domain {
> > @@ -130,6 +132,9 @@ struct vfio_regions {
> >  #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)\
> > (!list_empty(&iommu->domain_list))
> >
> > +#define IS_DOMAIN_IN_CONTAINER(iommu)  ((iommu->external_domain) || \
> > +(!list_empty(&iommu->domain_list)))
> > +
> >  #define DIRTY_BITMAP_BYTES(n)  (ALIGN(n, BITS_PER_TYPE(u64)) /
> BITS_PER_BYTE)
> >
> >  /*
> > @@ -1929,6 +1934,13 @@ static void vfio_iommu_iova_insert_copy(struct
> vfio_iommu *iommu,
> >
> > list_splice_tail(iova_copy, iova);
> >  }
> > +
> > +static void vfio_iommu_release_nesting_info(struct vfio_iommu *iommu)
> > +{
> > +   kfree(iommu->nesting_info);
> > +   iommu->nesting_info = NULL;
> > +}
> > +
> >  static int vfio_iommu_type1_attach_group(void *iommu_data,
> >  struct iommu_group *iommu_group)
> >  {
> > @@ -1959,6 +1971,12 @@ static int vfio_iommu_type1_attach_group(void
> *iommu_data,
> > }
> > }
> >
> > +   /* Nesting type container can include only one group */
> > +   if (iommu->nesting && IS_DOMAIN_IN_CONTAINER(iommu)) {
> > +   mutex_unlock(>lock);
> > +   return -EINVAL;
> > +   }
> > +
> > group = kzalloc(sizeof(*group), 

RE: [PATCH v4 04/15] vfio/type1: Report iommu nesting info to userspace

2020-07-07 Thread Liu, Yi L
Hi Eric,

> From: Auger Eric < eric.au...@redhat.com >
> Sent: Monday, July 6, 2020 9:45 PM
> 
> Hi Yi,
> 
> On 7/6/20 3:10 PM, Liu, Yi L wrote:
> > Hi Eric,
> >
> >> From: Auger Eric 
> >> Sent: Monday, July 6, 2020 6:37 PM
> >>
> >> Yi,
> >>
> >> On 7/4/20 1:26 PM, Liu Yi L wrote:
> >>> This patch exports iommu nesting capability info to user space through
> >>> VFIO. User space is expected to check this info for supported uAPIs (e.g.
> >>> PASID alloc/free, bind page table, and cache invalidation) and the vendor
> >>> specific format information for first level/stage page table that will be
> >>> bound to.
> >>>
> >>> The nesting info is available only after the nesting iommu type is set
> >>> for a container. Current implementation imposes one limitation - one
> >>> nesting container should include at most one group. The philosophy of
> >>> vfio container is having all groups/devices within the container share
> >>> the same IOMMU context. When vSVA is enabled, one IOMMU context could
> >>> include one 2nd-level address space and multiple 1st-level address spaces.
> >>> While the 2nd-leve address space is reasonably sharable by multiple groups
> >> level
> >
> > oh, yes.
> >
> >>> , blindly sharing 1st-level address spaces across all groups within the
> >>> container might instead break the guest expectation. In the future sub/
> >>> super container concept might be introduced to allow partial address space
> >>> sharing within an IOMMU context. But for now let's go with this 
> >>> restriction
> >>> by requiring singleton container for using nesting iommu features. Below
> >>> link has the related discussion about this decision.
> >>>
> >>> https://lkml.org/lkml/2020/5/15/1028
> >>>
> >>> Cc: Kevin Tian 
> >>> CC: Jacob Pan 
> >>> Cc: Alex Williamson 
> >>> Cc: Eric Auger 
> >>> Cc: Jean-Philippe Brucker 
> >>> Cc: Joerg Roedel 
> >>> Cc: Lu Baolu 
> >>> Signed-off-by: Liu Yi L 
> >>> ---
> >>> v3 -> v4:
> >>> *) address comments against v3.
> >>>
> >>> v1 -> v2:
> >>> *) added in v2
> >>> ---
> >>>
> >>>  drivers/vfio/vfio_iommu_type1.c | 105
> >> +++-
> >>>  include/uapi/linux/vfio.h   |  16 ++
> >>>  2 files changed, 109 insertions(+), 12 deletions(-)
> >>>
> >>> diff --git a/drivers/vfio/vfio_iommu_type1.c 
> >>> b/drivers/vfio/vfio_iommu_type1.c
> >>> index 7accb59..80623b8 100644
> >>> --- a/drivers/vfio/vfio_iommu_type1.c
> >>> +++ b/drivers/vfio/vfio_iommu_type1.c
> >>> @@ -62,18 +62,20 @@ MODULE_PARM_DESC(dma_entry_limit,
> >>>"Maximum number of user DMA mappings per container (65535).");
> >>>
> >>>  struct vfio_iommu {
> >>> - struct list_head              domain_list;
> >>> - struct list_head              iova_list;
> >>> - struct vfio_domain            *external_domain; /* domain for external user */
> >>> - struct mutex                  lock;
> >>> - struct rb_root                dma_list;
> >>> - struct blocking_notifier_head notifier;
> >>> - unsigned int                  dma_avail;
> >>> - uint64_t                      pgsize_bitmap;
> >>> - bool                          v2;
> >>> - bool                          nesting;
> >>> - bool                          dirty_page_tracking;
> >>> - bool                          pinned_page_dirty_scope;
> >>> + struct list_head              domain_list;
> >>> + struct list_head              iova_list;
> >>> + struct vfio_domain            *external_domain; /* domain for
> >>> +                                                    external user */
> >> nit: put the comment before the field?
> >
> > do you mean below?
> >
> > +   /* domain for external user */
> > +   struct vfio_domain  *external_domain;
> yes that's what I meant

got you. :-)

> >
> >>> + struct mutex                  lock;
> >>> + struct rb_root                dma_list;
> >>> + struct blocking_notifier_head notifier;
> >>> + unsigned int                  dma_avail;
> >>> + uint64_t                      pgsize_bitmap;
> >>> + bool                          v2;
> >>> + bool                          nesting;
> >>> + bool                          dirty_page_tracking;
> >>> + bool                          pinned_page_dirty_scope;
> >>> + struct iommu_nesting_info     *nesting_info;
> >>>  };
> >>>
> >>>  struct vfio_domain {
> >>> @@ -130,6 +132,9 @@ struct vfio_regions {
> >>>  #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu) \
> >>>   (!list_empty(&iommu->domain_list))
> >>>
> >>> +#define IS_DOMAIN_IN_CONTAINER(iommu) ((iommu->external_domain) || \
> >>> +                                       (!list_empty(&iommu->domain_list)))
> >> rename into something like CONTAINER_HAS_DOMAIN()?
> >
> > got it.
> >
> >>> +
> >>>  #define DIRTY_BITMAP_BYTES(n) (ALIGN(n, BITS_PER_TYPE(u64)) / BITS_PER_BYTE)
> >>>
> >>>  /*
> >>> @@ -1929,6 +1934,13 @@ static void vfio_iommu_iova_insert_copy(struct vfio_iommu *iommu,
> >>>
> >>>   list_splice_tail(iova_copy, iova);
> >>>  }
> >>> +
> >>> +static void vfio_iommu_release_nesting_info(struct vfio_iommu *iommu)

Re: [Issue]platform/x86: iommu: System can't shutdown because iommu driver keeps checking the status of DMA_GSTS_TES

2020-07-07 Thread Lu Baolu

Hi Koba KO,

On 2020/7/7 11:27, Koba Ko wrote:

Dear Baolu,
On Tue, Jun 30, 2020 at 3:52 PM Lu Baolu  wrote:


Hi Koba,

On 2020/6/30 15:31, Koba Ko wrote:

On Mon, Jun 15, 2020 at 3:20 PM Lu Baolu  wrote:


Hi Koba Ko,

On 2020/6/15 11:19, Koba Ko wrote:

Hi all,
I have a machine with only an Intel GPU.
Secure Boot and VT-d are enabled in the BIOS.
On the Ubuntu desktop, I do s2idle first and then restart the machine.
The machine can't restart successfully, so I need to press the
power button to shut it down.
I tried each of the following, and with any one of them the issue is
not triggered:
1. disable Secure Boot in the BIOS.
2. intel_iommu=off.
3. intel_iommu=igfx_off.
4. nomodeset.
5. i915.modeset=0.

After investigating further, I found that intel_iommu keeps checking
the status of DMA_GSTS_TES.
During restart, the driver disables IOMMU translation and then polls
the DMA_GSTS_TES status until it reads 0.

If you need more information, I can provide it.
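
For reference, here is a minimal sketch of the polling loop being
described, modeled on intel-iommu's iommu_disable_translation(); the
register and helper names follow include/linux/intel-iommu.h, and this
is only an illustration of where the restart path can spin forever, not
the exact upstream code:

static void disable_translation_sketch(struct intel_iommu *iommu)
{
	u32 sts;
	unsigned long flags;

	raw_spin_lock_irqsave(&iommu->register_lock, flags);
	/* Clear the translation-enable bit in the global command register */
	iommu->gcmd &= ~DMA_GCMD_TE;
	writel(iommu->gcmd, iommu->reg + DMAR_GCMD_REG);

	/* Spins until hardware reports TES == 0; hangs if it never does */
	IOMMU_WAIT_OP(iommu, DMAR_GSTS_REG, readl,
		      !(sts & DMA_GSTS_TES), sts);
	raw_spin_unlock_irqrestore(&iommu->register_lock, flags);
}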


Do you mind telling me which platform it is?

Best regards,
baolu

Hi Baolu,
Sorry, I missed your email.
I'm running on TGL.
My colleague has reported this on Bugzilla.
I have tested your patch and it works fine.
https://bugzilla.kernel.org/show_bug.cgi?id=208363

Could you explain more about these patches? Why do we need a flag
for the active IOMMU?



We're still investigating this issue. If we find out more, I'll let you
know. Thanks a lot for reporting it.

Best regards,
baolu


Sorry to disturb you.
Do you have any updates on this issue?
https://bugzilla.kernel.org/show_bug.cgi?id=208363


We are in the process of determining which platforms are affected. I
will push the workaround patch to the community as soon as we are done.

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH RESEND v2] PCI: Add device even if driver attach failed

2020-07-07 Thread Greg Kroah-Hartman
On Mon, Jul 06, 2020 at 04:32:40PM -0700, Rajat Jain wrote:
> device_attach() returning failure indicates a driver error while trying to
> probe the device. In such a scenario, the PCI device should still be added
> to the system and be visible to the user.
> 
> This patch partially reverts:
> commit ab1a187bba5c ("PCI: Check device_attach() return value always")
> 
> Signed-off-by: Rajat Jain 
> Reviewed-by: Greg Kroah-Hartman 
> ---
> Resending to stable, independent from other patches per Greg's suggestion
> v2: Add Greg's reviewed by, fix commit log



This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read:
https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/4] dma-mapping: add a new dma_need_sync API

2020-07-07 Thread Christoph Hellwig
On Mon, Jul 06, 2020 at 12:42:27PM -0700, Jonathan Lemon wrote:
> On Mon, Jun 29, 2020 at 03:03:56PM +0200, Christoph Hellwig wrote:
> > Add a new API to check if calls to dma_sync_single_for_{device,cpu} are
> > required for a given DMA streaming mapping.
> > 
> > +::
> > +
> > +   bool
> > +   dma_need_sync(struct device *dev, dma_addr_t dma_addr);
> > +
> > +Returns %true if dma_sync_single_for_{device,cpu} calls are required to
> > +transfer memory ownership.  Returns %false if those calls can be skipped.
> 
> Hi Christoph -
> 
> The call above is for a specific dma_addr.  For correctness, would I
> need to check every addr, or can I assume that for a specific memory
> type (pages returned from malloc) the answer would be identical?

You need to check every mapping.  E.g. this API pairs with a
dma_map_single/page call.  For S/G mappings you'd need to call it for
each entry, although if you have a use case for that we really should
add a dma_sg_need_sync helper instead of open-coding the scatterlist walk.
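
For illustration, a minimal sketch of that open-coded walk, assuming the
mapping came from dma_map_sg(); the dma_sg_need_sync() helper named above
does not exist yet, so this is only what it might look like:

static bool sg_need_sync_sketch(struct device *dev,
				struct scatterlist *sgl, int nents)
{
	struct scatterlist *sg;
	int i;

	/* The whole mapping needs syncing if any single entry does */
	for_each_sg(sgl, sg, nents, i)
		if (dma_need_sync(dev, sg_dma_address(sg)))
			return true;

	return false;
}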
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 4/4] PCI/ACS: Enable PCI_ACS_TB for untrusted/external-facing devices

2020-07-07 Thread Rajat Jain via iommu
When enabling ACS, enable translation blocking for external facing ports
and untrusted devices.

Signed-off-by: Rajat Jain 
---
v3: Print a warning if ACS_TB is not supported on external-facing/untrusted ports.
Minor code comment fixes.
v2: Commit log change

 drivers/pci/pci.c|  7 +++
 drivers/pci/quirks.c | 14 ++
 2 files changed, 21 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 73a8627822140..497ac05bf36e8 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -876,6 +876,13 @@ static void pci_std_enable_acs(struct pci_dev *dev)
/* Upstream Forwarding */
ctrl |= (cap & PCI_ACS_UF);
 
+       /* Enable Translation Blocking for external devices */
+       if (dev->external_facing || dev->untrusted)
+               if (cap & PCI_ACS_TB)
+                       ctrl |= PCI_ACS_TB;
+               else
+                       pci_warn(dev, "ACS: No Trans Blocking on ext dev\n");
+
pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
 }
 
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index b341628e47527..9cc8c1dc215ee 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -4934,6 +4934,13 @@ static void pci_quirk_enable_intel_rp_mpc_acs(struct 
pci_dev *dev)
}
 }
 
+/*
+ * Currently this quirk does the equivalent of
+ * PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF
+ *
+ * TODO: This quirk also needs to do the equivalent of PCI_ACS_TB,
+ * if dev->external_facing || dev->untrusted
+ */
 static int pci_quirk_enable_intel_pch_acs(struct pci_dev *dev)
 {
if (!pci_quirk_intel_pch_acs_match(dev))
@@ -4973,6 +4980,13 @@ static int pci_quirk_enable_intel_spt_pch_acs(struct 
pci_dev *dev)
ctrl |= (cap & PCI_ACS_CR);
ctrl |= (cap & PCI_ACS_UF);
 
+       /* Enable Translation Blocking for external devices */
+       if (dev->external_facing || dev->untrusted)
+               if (cap & PCI_ACS_TB)
+                       ctrl |= PCI_ACS_TB;
+               else
+                       pci_warn(dev, "ACS: No Trans Blocking on ext dev\n");
+
pci_write_config_dword(dev, pos + INTEL_SPT_ACS_CTRL, ctrl);
 
pci_info(dev, "Intel SPT PCH root port ACS workaround enabled\n");
-- 
2.27.0.212.ge8ba1cc988-goog

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag

2020-07-07 Thread Christoph Hellwig
On Fri, Jul 03, 2020 at 05:25:48PM +0100, Will Deacon wrote:
> The IOMMU_SYS_CACHE_ONLY flag was never exposed via the DMA API and
> has no in-tree users. Remove it.

Looks good,

Reviewed-by: Christoph Hellwig 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH] vfio: type1: fix kthread use case

2020-07-07 Thread Markus Elfring
…
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -2798,7 +2798,7 @@ static int vfio_iommu_type1_dma_rw_chunk
…
> - bool kthread = current->mm == NULL;
> + bool kthread_load_mm;
>   size_t offset;

What do you think about reducing the scope of such variables?
https://refactoring.com/catalog/reduceScopeOfVariable.html


…
> @@ -2812,11 +2812,12 @@ static int vfio_iommu_type1_dma_rw_chunk
…
>   if (!mm)
>   return -EPERM;
…
> + kthread_load_mm = current->flags & PF_KTHREAD &&
> + current->mm == NULL;
…

Would you like to apply a more succinct code variant?

+   kthread_load_mm = current->flags & PF_KTHREAD && !current->mm;
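
Putting both suggestions together, a sketch of how the relevant part of
vfio_iommu_type1_dma_rw_chunk() might read; the surrounding context is
paraphrased from the quoted diff, and kthread_use_mm()/kthread_unuse_mm()
are assumed from the v5.8 rename of use_mm()/unuse_mm():

static int dma_rw_chunk_sketch(struct mm_struct *mm)
{
	bool kthread_load_mm;

	if (!mm)
		return -EPERM;

	/* True only for a kernel thread that has no mm loaded yet */
	kthread_load_mm = current->flags & PF_KTHREAD && !current->mm;

	if (kthread_load_mm)
		kthread_use_mm(mm);

	/* ... copy to/from the user address range here ... */

	if (kthread_load_mm)
		kthread_unuse_mm(mm);

	return 0;
}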


Regards,
Markus
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v2 5/7] driver core: Add device location to "struct device" and expose it in sysfs

2020-07-07 Thread Rajat Jain via iommu
On Wed, Jul 1, 2020 at 10:23 PM Oliver O'Halloran  wrote:
>
> On Thu, Jul 2, 2020 at 4:07 AM Rajat Jain  wrote:
> >
> > *snip*
> >
> > > > I guess it would make sense to have an attribute for user space to
> > > > write to in order to make the kernel reject device plug-in events
> > > > coming from a given port or connector, but the kernel has no reliable
> > > > means to determine *which* ports or connectors are "safe", and even if
> > > > there was a way for it to do that, it still may not agree with user
> > > > space on which ports or connectors should be regarded as "safe".
> > >
> > > Again, we have been doing this for USB devices for a very long time, PCI
> > > shouldn't be any different.  Why people keep ignoring working solutions
> > > is beyond me, there's nothing "special" about PCI devices here for this
> > > type of "worry" or reasoning to try to create new solutions.
> > >
> > > So, again, I ask, go do what USB does, and to do that, take the logic
> > > out of the USB core, make it bus-agnositic, and _THEN_ add it to the PCI
> > > code. Why the original submitter keeps ignoring my request to do this
> > > is beyond me, I guess they like making patches that will get rejected :(
> >
> > IMHO I'm actually trying to precisely do what I think was the
> > conclusion of our discussion, and then some changes because of the
> > further feedback I received on those patches. Let's take a step back
> > and please allow me to explain how I got here (my apologies but this
> > spans a couple of threads, and I'm trying to tie them all together
> > here):
>
> The previous thread had some suggestions, but no real conclusions.
> That's probably why we're still arguing about it...
>
> > GOAL: To allow user space to control which (PCI) drivers it wants to
> > allow on external (Thunderbolt) ports. There was a lot of debate about
> > the need for such a policy at
> > https://lore.kernel.org/linux-pci/CACK8Z6GR7-wseug=ttvyrarvzx_ao2geoldnbwjtb+5y7vw...@mail.gmail.com/
> > with the final conclusion that it should be OK to implement such a
> > policy in userspace, as long as the policy is not implemented in the
> > kernel. The kernel only needs to expose bits & info that is needed by
> > the userspace to implement such a policy, and it can be used in
> > conjunction with "drivers_autoprobe" to implement this policy:
> > 
> > 
> > That's an odd thing, but sure, if you want to write up such a policy for
> > your systems, great.  But that policy does not belong in the kernel, it
> > belongs in userspace.
> > 
> > 
> > 1) The post 
> > https://lore.kernel.org/linux-pci/20200609210400.GA1461839@bjorn-Precision-5520/
> > lists out the approach that was agreed on. Replicating it here:
> > ---
> >   - Expose the PCI pdev->untrusted bit in sysfs.  We don't expose this
> > today, but doing so would be trivial.  I think I would prefer a
> > sysfs name like "external" so it's more descriptive and less of a
> > judgment.
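> > (A minimal sketch of this attribute follows the list below.)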
> >
> > This comes from either the DT "external-facing" property or the
> > ACPI "ExternalFacingPort" property.
> >
> >   - All devices present at boot are enumerated.  Any statically built
> > drivers will bind to them before any userspace code runs.
> >
> > If you want to keep statically built drivers from binding, you'd
> > need to invent some mechanism so pci_driver_init() could clear
> > drivers_autoprobe after registering pci_bus_type.
> >
> >   - Early userspace code prevents modular drivers from automatically
> > binding to PCI devices:
> >
> >   echo 0 > /sys/bus/pci/drivers_autoprobe
> >
> > This prevents modular drivers from binding to all devices, whether
> > present at boot or hot-added.
> >
> >   - Userspace code uses the sysfs "bind" file to control which drivers
> > are loaded and can bind to each device, e.g.,
> >
> >   echo 0000:02:00.0 > /sys/bus/pci/drivers/nvme/bind
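
As referenced in the first bullet above, a minimal sketch of exposing
pdev->untrusted as a read-only "external" sysfs attribute; the attribute
name and its wiring into pci-sysfs.c are assumptions for illustration,
not what the series finally does:

static ssize_t external_show(struct device *dev,
			     struct device_attribute *attr, char *buf)
{
	struct pci_dev *pdev = to_pci_dev(dev);

	/* 1 if the device sits behind an external-facing port */
	return sprintf(buf, "%u\n", pdev->untrusted);
}
static DEVICE_ATTR_RO(external);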
>
> I think this is a reasonable suggestion. However, as Greg pointed out
> it's gratuitously different to what USB does for no real reason.
>
> > ---
> > 2) As part of implementing the approach agreed on above, when I exposed
> > the PCI "untrusted" attribute to userspace, the discussion concluded
> > that instead of this, the device core should be enhanced
> > with a location attribute.
> > https://lore.kernel.org/linux-pci/20200618184621.ga446...@kroah.com/
> > ---
> > ...
> > The attribute should be called something like "location" or something
> > like that (naming is hard), as you don't always know if something is
> > external or not (it could be internal, it could be unknown, it could be
> > internal to an external