[PATCH 4/5] iommu/arm-smmu-qcom: Consistently initialize stream mappings

2020-07-08 Thread Bjorn Andersson
Firmware that traps writes to S2CR to translate BYPASS into FAULT also
ignores writes of type FAULT. As such, booting with "disable_bypass" set
will result in all S2CR registers being left as configured by the bootloader.

This has been seen to cause non-deterministic behavior, as these
mappings might linger and reference context banks that Linux is
reconfiguring.

Use the fact that BYPASS writes result in FAULT type to force all stream
mappings to FAULT.

Signed-off-by: Bjorn Andersson 
---
 drivers/iommu/arm-smmu-qcom.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu-qcom.c b/drivers/iommu/arm-smmu-qcom.c
index e8a36054e912..86b1917459a4 100644
--- a/drivers/iommu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm-smmu-qcom.c
@@ -27,6 +27,7 @@ static int qcom_smmu_cfg_probe(struct arm_smmu_device *smmu)
 {
unsigned int last_s2cr = ARM_SMMU_GR0_S2CR(smmu->num_mapping_groups - 1);
u32 reg;
+   int i;
 
/*
 * With some firmware writes to S2CR of type FAULT are ignored, and
@@ -37,9 +38,24 @@ static int qcom_smmu_cfg_probe(struct arm_smmu_device *smmu)
FIELD_PREP(ARM_SMMU_S2CR_CBNDX, 0xff) |
FIELD_PREP(ARM_SMMU_S2CR_PRIVCFG, S2CR_PRIVCFG_DEFAULT));
reg = arm_smmu_gr0_read(smmu, last_s2cr);
-   if (FIELD_GET(ARM_SMMU_S2CR_TYPE, reg) != S2CR_TYPE_BYPASS)
+   if (FIELD_GET(ARM_SMMU_S2CR_TYPE, reg) != S2CR_TYPE_BYPASS) {
smmu->qcom_bypass_quirk = true;
 
+   /*
+* With firmware ignoring writes of type FAULT, booting the
+* Linux kernel with disable_bypass disabled (i.e. "enable
+* bypass") the initialization during probe will leave mappings
+* in an inconsistent state. Avoid this by configuring all
+* S2CRs to BYPASS.
+*/
+   for (i = 0; i < smmu->num_mapping_groups; i++) {
+   smmu->s2crs[i].type = S2CR_TYPE_BYPASS;
+   smmu->s2crs[i].privcfg = S2CR_PRIVCFG_DEFAULT;
+   smmu->s2crs[i].cbndx = 0xff;
+   smmu->s2crs[i].count = 0;
+   }
+   }
+
return 0;
 }
 
-- 
2.26.2



[PATCH 3/5] iommu/arm-smmu: Move SMR and S2CR definitions to header file

2020-07-08 Thread Bjorn Andersson
Expose the SMR and S2CR structs in the header file, to allow platform
specific implementations to populate/initialize the smrs and s2cr
arrays.

Signed-off-by: Bjorn Andersson 
---
 drivers/iommu/arm-smmu.c | 14 --
 drivers/iommu/arm-smmu.h | 15 +++
 2 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index f33eda3117fa..e2d6c0aaf1ea 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -68,24 +68,10 @@ module_param(disable_bypass, bool, S_IRUGO);
 MODULE_PARM_DESC(disable_bypass,
"Disable bypass streams such that incoming transactions from devices 
that are not attached to an iommu domain will report an abort back to the 
device and will not be allowed to pass through the SMMU.");
 
-struct arm_smmu_s2cr {
-   struct iommu_group  *group;
-   int count;
-   enum arm_smmu_s2cr_type type;
-   enum arm_smmu_s2cr_privcfg  privcfg;
-   u8  cbndx;
-};
-
 #define s2cr_init_val (struct arm_smmu_s2cr){  \
.type = disable_bypass ? S2CR_TYPE_FAULT : S2CR_TYPE_BYPASS,\
 }
 
-struct arm_smmu_smr {
-   u16 mask;
-   u16 id;
-   boolvalid;
-};
-
 struct arm_smmu_cb {
u64 ttbr[2];
u32 tcr[2];
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index a71d193073e4..bcd160d01c53 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -251,6 +251,21 @@ enum arm_smmu_implementation {
QCOM_SMMUV2,
 };
 
+struct arm_smmu_s2cr {
+   struct iommu_group  *group;
+   int count;
+   enum arm_smmu_s2cr_type type;
+   enum arm_smmu_s2cr_privcfg  privcfg;
+   u8  cbndx;
+};
+
+struct arm_smmu_smr {
+   u16 mask;
+   u16 id;
+   boolvalid;
+   boolpinned;
+};
+
 struct arm_smmu_device {
struct device   *dev;
 
-- 
2.26.2



[PATCH 1/5] iommu/arm-smmu: Make all valid stream mappings BYPASS

2020-07-08 Thread Bjorn Andersson
Turn all stream mappings marked as valid into BYPASS. This allows the
platform-specific implementation to configure stream mappings to match
the bootloader's configuration, so that e.g. the display continues to
function through the reset of the SMMU.

Suggested-by: Robin Murphy 
Signed-off-by: Bjorn Andersson 
---
 drivers/iommu/arm-smmu.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 243bc4cb2705..2e27cf9815ab 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1924,6 +1924,22 @@ static int arm_smmu_device_cfg_probe(struct 
arm_smmu_device *smmu)
return 0;
 }
 
+int arm_smmu_setup_identity(struct arm_smmu_device *smmu)
+{
+   int i;
+
+   for (i = 0; i < smmu->num_mapping_groups; i++) {
+   if (smmu->smrs[i].valid) {
+   smmu->s2crs[i].type = S2CR_TYPE_BYPASS;
+   smmu->s2crs[i].privcfg = S2CR_PRIVCFG_DEFAULT;
+   smmu->s2crs[i].cbndx = 0xff;
+   smmu->s2crs[i].count++;
+   }
+   }
+
+   return 0;
+}
+
 struct arm_smmu_match_data {
enum arm_smmu_arch_version version;
enum arm_smmu_implementation model;
@@ -2181,6 +2197,10 @@ static int arm_smmu_device_probe(struct platform_device 
*pdev)
if (err)
return err;
 
+   err = arm_smmu_setup_identity(smmu);
+   if (err)
+   return err;
+
if (smmu->version == ARM_SMMU_V2) {
if (smmu->num_context_banks > smmu->num_context_irqs) {
dev_err(dev,
-- 
2.26.2



[PATCH 2/5] iommu/arm-smmu: Emulate bypass by using context banks

2020-07-08 Thread Bjorn Andersson
Some firmware found on various Qualcomm platforms traps writes to S2CR
of type BYPASS and writes FAULT into the register. This prevents us from
marking the streams for the display controller as BYPASS to allow
continued scanout of the screen through the initialization of the ARM
SMMU.

This adds a Qualcomm-specific cfg_probe function, which probes the
behavior of the S2CR registers and, if they are found to be faulty,
enables the related quirk. Based on this quirk, context banks are
allocated for IDENTITY domains as well, but with ARM_SMMU_SCTLR_M omitted.

The result is valid stream mappings, without translation.

Signed-off-by: Bjorn Andersson 
---
 drivers/iommu/arm-smmu-qcom.c | 21 +
 drivers/iommu/arm-smmu.c  | 14 --
 drivers/iommu/arm-smmu.h  |  3 +++
 3 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-qcom.c b/drivers/iommu/arm-smmu-qcom.c
index cf01d0215a39..e8a36054e912 100644
--- a/drivers/iommu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm-smmu-qcom.c
@@ -23,6 +23,26 @@ static const struct of_device_id qcom_smmu_client_of_match[] 
= {
{ }
 };
 
+static int qcom_smmu_cfg_probe(struct arm_smmu_device *smmu)
+{
+   unsigned int last_s2cr = ARM_SMMU_GR0_S2CR(smmu->num_mapping_groups - 1);
+   u32 reg;
+
+   /*
+* With some firmware writes to S2CR of type FAULT are ignored, and
+* writing BYPASS will end up as FAULT in the register. Perform a write
+* to S2CR to detect if this is the case with the current firmware.
+*/
+   arm_smmu_gr0_write(smmu, last_s2cr, FIELD_PREP(ARM_SMMU_S2CR_TYPE, S2CR_TYPE_BYPASS) |
+   FIELD_PREP(ARM_SMMU_S2CR_CBNDX, 0xff) |
+   FIELD_PREP(ARM_SMMU_S2CR_PRIVCFG, S2CR_PRIVCFG_DEFAULT));
+   reg = arm_smmu_gr0_read(smmu, last_s2cr);
+   if (FIELD_GET(ARM_SMMU_S2CR_TYPE, reg) != S2CR_TYPE_BYPASS)
+   smmu->qcom_bypass_quirk = true;
+
+   return 0;
+}
+
 static int qcom_smmu_def_domain_type(struct device *dev)
 {
const struct of_device_id *match =
@@ -61,6 +81,7 @@ static int qcom_smmu500_reset(struct arm_smmu_device *smmu)
 }
 
 static const struct arm_smmu_impl qcom_smmu_impl = {
+   .cfg_probe = qcom_smmu_cfg_probe,
.def_domain_type = qcom_smmu_def_domain_type,
.reset = qcom_smmu500_reset,
 };
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 2e27cf9815ab..f33eda3117fa 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -654,7 +654,9 @@ static void arm_smmu_write_context_bank(struct 
arm_smmu_device *smmu, int idx)
 
/* SCTLR */
reg = ARM_SMMU_SCTLR_CFIE | ARM_SMMU_SCTLR_CFRE | ARM_SMMU_SCTLR_AFE |
- ARM_SMMU_SCTLR_TRE | ARM_SMMU_SCTLR_M;
+ ARM_SMMU_SCTLR_TRE;
+   if (cfg->m)
+   reg |= ARM_SMMU_SCTLR_M;
if (stage1)
reg |= ARM_SMMU_SCTLR_S1_ASIDPNE;
if (IS_ENABLED(CONFIG_CPU_BIG_ENDIAN))
@@ -678,7 +680,11 @@ static int arm_smmu_init_domain_context(struct 
iommu_domain *domain,
if (smmu_domain->smmu)
goto out_unlock;
 
-   if (domain->type == IOMMU_DOMAIN_IDENTITY) {
+   /*
+* Nothing to do for IDENTITY domains, unless disabled context banks are
+* used to emulate bypass mappings on Qualcomm platforms.
+*/
+   if (domain->type == IOMMU_DOMAIN_IDENTITY && !smmu->qcom_bypass_quirk) {
smmu_domain->stage = ARM_SMMU_DOMAIN_BYPASS;
smmu_domain->smmu = smmu;
goto out_unlock;
@@ -826,6 +832,10 @@ static int arm_smmu_init_domain_context(struct 
iommu_domain *domain,
domain->geometry.aperture_end = (1UL << ias) - 1;
domain->geometry.force_aperture = true;
 
+   /* Enable translation for non-identity context banks */
+   if (domain->type != IOMMU_DOMAIN_IDENTITY)
+   cfg->m = true;
+
/* Initialise the context bank with our page table cfg */
arm_smmu_init_context_bank(smmu_domain, &pgtbl_cfg);
arm_smmu_write_context_bank(smmu, cfg->cbndx);
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index d172c024be61..a71d193073e4 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -305,6 +305,8 @@ struct arm_smmu_device {
 
/* IOMMU core code handle */
struct iommu_device iommu;
+
+   boolqcom_bypass_quirk;
 };
 
 enum arm_smmu_context_fmt {
@@ -323,6 +325,7 @@ struct arm_smmu_cfg {
};
enum arm_smmu_cbar_type cbar;
enum arm_smmu_context_fmt   fmt;
+   boolm;
 };
 #define ARM_SMMU_INVALID_IRPTNDX   0xff
 
-- 
2.26.2



[PATCH 0/5] iommu/arm-smmu: Support maintaining bootloader mappings

2020-07-08 Thread Bjorn Andersson
Based on previous attempts and discussions this is the latest attempt at
inheriting stream mappings set up by the bootloader, for e.g. boot splash or
efifb.

The first patch is an implementation of Robin's suggestion that we should just
mark the relevant stream mappings as BYPASS, relying on something else to set
up the wanted stream mappings - e.g. by reading them back in platform-specific
implementation code.

The series then tackles the problem seen in most versions of Qualcomm firmware,
that the hypervisor intercepts BYPASS writes and turns them into FAULTs. It does
this by allocating context banks for identity domains as well, with translation
disabled.

Lastly it amends the stream mapping initialization code to allocate a specific
identity domain that is used for any mappings inherited from the bootloader, if
the above Qualcomm quirk is required.


The series has been tested and shown to allow booting SDM845, SDM850, SM8150,
SM8250 with boot splash screen setup by the bootloader. Specifically it also
allows the Lenovo Yoga C630 to boot with SMMU and efifb enabled.

Bjorn Andersson (5):
  iommu/arm-smmu: Make all valid stream mappings BYPASS
  iommu/arm-smmu: Emulate bypass by using context banks
  iommu/arm-smmu: Move SMR and S2CR definitions to header file
  iommu/arm-smmu-qcom: Consistently initialize stream mappings
  iommu/arm-smmu: Setup identity domain for boot mappings

 drivers/iommu/arm-smmu-qcom.c |  48 +
 drivers/iommu/arm-smmu.c  | 124 +-
 drivers/iommu/arm-smmu.h  |  21 ++
 3 files changed, 175 insertions(+), 18 deletions(-)

-- 
2.26.2



[PATCH 5/5] iommu/arm-smmu: Setup identity domain for boot mappings

2020-07-08 Thread Bjorn Andersson
With many Qualcomm platforms not having functional S2CR BYPASS, a
temporary IOMMU domain without translation needs to be allocated in
order to allow these memory transactions.

Unfortunately the bootloader uses the first few context banks, so
rather than overwriting an active bank, the last context bank is used
and streams are diverted to it during initialization.

This also performs the readback of SMR registers for the Qualcomm
platform, to trigger the mechanism.

This is based on prior work by Thierry Reding and Laurentiu Tudor.

Signed-off-by: Bjorn Andersson 
---
 drivers/iommu/arm-smmu-qcom.c | 11 +
 drivers/iommu/arm-smmu.c  | 80 +--
 drivers/iommu/arm-smmu.h  |  3 ++
 3 files changed, 90 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/arm-smmu-qcom.c b/drivers/iommu/arm-smmu-qcom.c
index 86b1917459a4..397df27c1d69 100644
--- a/drivers/iommu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm-smmu-qcom.c
@@ -26,6 +26,7 @@ static const struct of_device_id qcom_smmu_client_of_match[] 
= {
 static int qcom_smmu_cfg_probe(struct arm_smmu_device *smmu)
 {
unsigned int last_s2cr = ARM_SMMU_GR0_S2CR(smmu->num_mapping_groups - 1);
+   u32 smr;
u32 reg;
int i;
 
@@ -56,6 +57,16 @@ static int qcom_smmu_cfg_probe(struct arm_smmu_device *smmu)
}
}
 
+   for (i = 0; i < smmu->num_mapping_groups; i++) {
+   smr = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_SMR(i));
+
+   if (FIELD_GET(ARM_SMMU_SMR_VALID, smr)) {
+   smmu->smrs[i].id = FIELD_GET(ARM_SMMU_SMR_ID, smr);
+   smmu->smrs[i].mask = FIELD_GET(ARM_SMMU_SMR_MASK, smr);
+   smmu->smrs[i].valid = true;
+   }
+   }
+
return 0;
 }
 
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index e2d6c0aaf1ea..a7cb27c1a49e 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -652,7 +652,8 @@ static void arm_smmu_write_context_bank(struct 
arm_smmu_device *smmu, int idx)
 }
 
 static int arm_smmu_init_domain_context(struct iommu_domain *domain,
-   struct arm_smmu_device *smmu)
+   struct arm_smmu_device *smmu,
+   bool boot_domain)
 {
int irq, start, ret = 0;
unsigned long ias, oas;
@@ -770,6 +771,15 @@ static int arm_smmu_init_domain_context(struct 
iommu_domain *domain,
ret = -EINVAL;
goto out_unlock;
}
+
+   /*
+* Use the last context bank for identity mappings during boot, to
+* avoid overwriting in-use bank configuration while we're setting up
+* the new mappings.
+*/
+   if (boot_domain)
+   start = smmu->num_context_banks - 1;
+
ret = __arm_smmu_alloc_bitmap(smmu->context_map, start,
  smmu->num_context_banks);
if (ret < 0)
@@ -1149,7 +1159,10 @@ static int arm_smmu_attach_dev(struct iommu_domain 
*domain, struct device *dev)
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
struct arm_smmu_master_cfg *cfg;
struct arm_smmu_device *smmu;
+   bool free_identity_domain = false;
+   int idx;
int ret;
+   int i;
 
if (!fwspec || fwspec->ops != _smmu_ops) {
dev_err(dev, "cannot attach to SMMU, is it on the same bus?\n");
@@ -1174,7 +1187,7 @@ static int arm_smmu_attach_dev(struct iommu_domain 
*domain, struct device *dev)
return ret;
 
/* Ensure that the domain is finalised */
-   ret = arm_smmu_init_domain_context(domain, smmu);
+   ret = arm_smmu_init_domain_context(domain, smmu, false);
if (ret < 0)
goto rpm_put;
 
@@ -1190,9 +1203,34 @@ static int arm_smmu_attach_dev(struct iommu_domain 
*domain, struct device *dev)
goto rpm_put;
}
 
+   /* Decrement use counter for any references to the identity domain */
+   mutex_lock(&smmu->stream_map_mutex);
+   if (smmu->identity) {
+   struct arm_smmu_domain *identity = 
to_smmu_domain(smmu->identity);
+
+   for_each_cfg_sme(cfg, fwspec, i, idx) {
+   dev_err(smmu->dev, "%s() %#x\n", __func__, 
smmu->smrs[idx].id);
+   if (smmu->s2crs[idx].cbndx == identity->cfg.cbndx) {
+   smmu->num_identity_masters--;
+   if (smmu->num_identity_masters == 0)
+   free_identity_domain = true;
+   }
+   }
+   }
+   mutex_unlock(&smmu->stream_map_mutex);
+
/* Looks ok, so add the device to the domain */
ret = arm_smmu_domain_add_master(smmu_domain, cfg, fwspec);
 
+   /*
+* The last stream map to reference the identity domain has been
+* overwritten, so it's now okay to 

Re: [PATCH v2 0/8] arm64: dts: qcom: smmu/USB nodes and HDK855/HDK865 dts

2020-07-08 Thread Bjorn Andersson
On Fri 03 Jul 05:31 PDT 2020, Will Deacon wrote:

> On Tue, Jun 09, 2020 at 03:40:18PM -0400, Jonathan Marek wrote:
> > Add dts nodes for apps_smmu and USB for both sm8150 and sm8250.
> > 
> > Also add initial dts files for HDK855 and HDK865, based on mtp dts, with a
> > few changes. Notably, the HDK865 dts has regulator config changed a bit 
> > based
> > on downstream (I think sm8250-mtp.dts is wrong and copied too much from 
> > sm8150).
> > 
> > V2 changes:
> > * Added two patches for sm8150 and sm8250 iommu compatibles
> > * Changed apps_smmu node patches to use new compatibles
> > * Updated commit messages for apps_smmu patches to be more correct
> > * Updated HDK dts patches based on Bjorn's comments
> > 
> > Jonathan Marek (8):
> >   dt-bindings: arm-smmu: Add sm8150 and sm8250 compatible strings
> >   iommu: arm-smmu-impl: Use qcom impl for sm8150 and sm8250 compatibles
> >   arm64: dts: qcom: sm8150: add apps_smmu node
> >   arm64: dts: qcom: sm8250: add apps_smmu node
> >   arm64: dts: qcom: sm8150: Add secondary USB and PHY nodes
> >   arm64: dts: qcom: sm8250: Add USB and PHY device nodes
> >   arm64: dts: qcom: add sm8150 hdk dts
> >   arm64: dts: qcom: add sm8250 hdk dts
> 
> What's your plan for merging this? I can take the first two patches
> via arm-smmu, if you like. Please just let me know.
> 

Please pick up the binding and driver patch through your tree.

Regards,
Bjorn


Re: [PATCH v2 2/8] iommu: arm-smmu-impl: Use qcom impl for sm8150 and sm8250 compatibles

2020-07-08 Thread Bjorn Andersson
On Tue 09 Jun 12:40 PDT 2020, Jonathan Marek wrote:

> Use the qcom implementation for IOMMU hardware on sm8150 and sm8250 SoCs.
> 
> Signed-off-by: Jonathan Marek 

Reviewed-by: Bjorn Andersson 

Regards,
Bjorn

> ---
>  drivers/iommu/arm-smmu-impl.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
> index c75b9d957b70..f5f6cab626be 100644
> --- a/drivers/iommu/arm-smmu-impl.c
> +++ b/drivers/iommu/arm-smmu-impl.c
> @@ -172,7 +172,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct 
> arm_smmu_device *smmu)
>   smmu->impl = _impl;
>  
>   if (of_device_is_compatible(np, "qcom,sdm845-smmu-500") ||
> - of_device_is_compatible(np, "qcom,sc7180-smmu-500"))
> + of_device_is_compatible(np, "qcom,sc7180-smmu-500") ||
> + of_device_is_compatible(np, "qcom,sm8150-smmu-500") ||
> + of_device_is_compatible(np, "qcom,sm8250-smmu-500"))
>   return qcom_smmu_impl_init(smmu);
>  
>   return smmu;
> -- 
> 2.26.1
> 
> 


RE: [PATCH v3 06/14] vfio/type1: Add VFIO_IOMMU_PASID_REQUEST (alloc/free)

2020-07-08 Thread Liu, Yi L
Hi Kevin,

> From: Tian, Kevin 
> Sent: Thursday, July 9, 2020 10:18 AM
> 
> > From: Liu, Yi L 
> > Sent: Thursday, July 9, 2020 10:08 AM
> >
> > Hi Kevin,
> >
> > > From: Tian, Kevin 
> > > Sent: Thursday, July 9, 2020 9:57 AM
> > >
> > > > From: Liu, Yi L 
> > > > Sent: Thursday, July 9, 2020 8:32 AM
> > > >
> > > > Hi Alex,
> > > >
> > > > > Alex Williamson 
> > > > > Sent: Thursday, July 9, 2020 3:55 AM
> > > > >
> > > > > On Wed, 8 Jul 2020 08:16:16 +
> > > > > "Liu, Yi L"  wrote:
> > > > >
> > > > > > Hi Alex,
> > > > > >
> > > > > > > From: Liu, Yi L < yi.l@intel.com>
> > > > > > > Sent: Friday, July 3, 2020 2:28 PM
> > > > > > >
> > > > > > > Hi Alex,
> > > > > > >
> > > > > > > > From: Alex Williamson 
> > > > > > > > Sent: Friday, July 3, 2020 5:19 AM
> > > > > > > >
> > > > > > > > On Wed, 24 Jun 2020 01:55:19 -0700 Liu Yi L
> > > > > > > >  wrote:
> > > > > > > >
> > > > > > > > > This patch allows user space to request PASID allocation/free,
> > e.g.
> > > > > > > > > when serving the request from the guest.
> > > > > > > > >
> > > > > > > > > PASIDs that are not freed by userspace are automatically
> > > > > > > > > freed
> > > > when
> > > > > > > > > the IOASID set is destroyed when process exits.
> > > > > > [...]
> > > > > > > > > +static int vfio_iommu_type1_pasid_request(struct vfio_iommu
> > > > *iommu,
> > > > > > > > > +   unsigned long arg)
> > > > > > > > > +{
> > > > > > > > > + struct vfio_iommu_type1_pasid_request req;
> > > > > > > > > + unsigned long minsz;
> > > > > > > > > +
> > > > > > > > > + minsz = offsetofend(struct
> vfio_iommu_type1_pasid_request,
> > > > > range);
> > > > > > > > > +
> > > > > > > > > + if (copy_from_user(&req, (void __user *)arg, minsz))
> > > > > > > > > + return -EFAULT;
> > > > > > > > > +
> > > > > > > > > + if (req.argsz < minsz || (req.flags &
> > > > > ~VFIO_PASID_REQUEST_MASK))
> > > > > > > > > + return -EINVAL;
> > > > > > > > > +
> > > > > > > > > + if (req.range.min > req.range.max)
> > > > > > > >
> > > > > > > > Is it exploitable that a user can spin the kernel for a long
> > > > > > > > time in the case of a free by calling this with [0, MAX_UINT]
> > > > > > > > regardless of their
> > > > > actual
> > > > > > > allocations?
> > > > > > >
> > > > > > > IOASID can ensure that user can only free the PASIDs allocated
> > > > > > > to the
> > > > user.
> > > > > but
> > > > > > > it's true, kernel needs to loop all the PASIDs within the range
> > > > > > > provided by user.
> > > > > it
> > > > > > > may take a long time. is there anything we can do? one thing may
> > > > > > > limit
> > > > the
> > > > > range
> > > > > > > provided by user?
> > > > > >
> > > > > > thought about it more, we have per-VM pasid quota (say 1000), so
> > > > > > even if user passed down [0, MAX_UNIT], kernel will only loop the
> > > > > > 1000 pasids at most. do you think we still need to do something on 
> > > > > > it?
> > > > >
> > > > > How do you figure that?  vfio_iommu_type1_pasid_request() accepts
> > > > > the user's min/max so long as (max > min) and passes that to
> > > > > vfio_iommu_type1_pasid_free(), then to vfio_pasid_free_range()
> > > > > which loops as:
> > > > >
> > > > >   ioasid_t pasid = min;
> > > > >   for (; pasid <= max; pasid++)
> > > > >   ioasid_free(pasid);
> > > > >
> > > > > A user might only be able to allocate 1000 pasids, but apparently
> > > > > they can ask to free all they want.
> > > > >
> > > > > It's also not obvious to me that calling ioasid_free() is only
> > > > > allowing the user to free their own passid.  Does it?  It would be a
> > > > > pretty
> > >
> > > Agree. I thought ioasid_free should at least carry a token since the user
> > space is
> > > only allowed to manage PASIDs in its own set...
> > >
> > > > > gaping hole if a user could free arbitrary pasids.  A r-b tree of
> > > > > passids might help both for security and to bound spinning in a loop.
> > > >
> > > > oh, yes. BTW. instead of r-b tree in VFIO, maybe we can add an
> > > > ioasid_set parameter for ioasid_free(), thus to prevent the user from
> > > > freeing PASIDs that doesn't belong to it. I remember Jacob mentioned it
> > before.
> > > >
> > >
> > > check current ioasid_free:
> > >
> > > spin_lock(&ioasid_allocator_lock);
> > > ioasid_data = xa_load(&active_allocator->xa, ioasid);
> > > if (!ioasid_data) {
> > > pr_err("Trying to free unknown IOASID %u\n", ioasid);
> > > goto exit_unlock;
> > > }
> > >
> > > Allow an user to trigger above lock paths with MAX_UINT times might still
> > be bad.
> >
> > yeah, how about the below two options:
> >
> > - comparing the max - min with the quota before calling ioasid_free().
> >   If max - min > current quota of the user, then should fail it. If
> >   max - min < quota, then call ioasid_free() one by one. still trigger
> >   the above lock path 

RE: [PATCH v3 06/14] vfio/type1: Add VFIO_IOMMU_PASID_REQUEST (alloc/free)

2020-07-08 Thread Tian, Kevin
> From: Liu, Yi L 
> Sent: Thursday, July 9, 2020 10:08 AM
> 
> Hi Kevin,
> 
> > From: Tian, Kevin 
> > Sent: Thursday, July 9, 2020 9:57 AM
> >
> > > From: Liu, Yi L 
> > > Sent: Thursday, July 9, 2020 8:32 AM
> > >
> > > Hi Alex,
> > >
> > > > Alex Williamson 
> > > > Sent: Thursday, July 9, 2020 3:55 AM
> > > >
> > > > On Wed, 8 Jul 2020 08:16:16 +
> > > > "Liu, Yi L"  wrote:
> > > >
> > > > > Hi Alex,
> > > > >
> > > > > > From: Liu, Yi L < yi.l@intel.com>
> > > > > > Sent: Friday, July 3, 2020 2:28 PM
> > > > > >
> > > > > > Hi Alex,
> > > > > >
> > > > > > > From: Alex Williamson 
> > > > > > > Sent: Friday, July 3, 2020 5:19 AM
> > > > > > >
> > > > > > > On Wed, 24 Jun 2020 01:55:19 -0700 Liu Yi L
> > > > > > >  wrote:
> > > > > > >
> > > > > > > > This patch allows user space to request PASID allocation/free,
> e.g.
> > > > > > > > when serving the request from the guest.
> > > > > > > >
> > > > > > > > PASIDs that are not freed by userspace are automatically
> > > > > > > > freed
> > > when
> > > > > > > > the IOASID set is destroyed when process exits.
> > > > > [...]
> > > > > > > > +static int vfio_iommu_type1_pasid_request(struct vfio_iommu
> > > *iommu,
> > > > > > > > + unsigned long arg)
> > > > > > > > +{
> > > > > > > > +   struct vfio_iommu_type1_pasid_request req;
> > > > > > > > +   unsigned long minsz;
> > > > > > > > +
> > > > > > > > +   minsz = offsetofend(struct 
> > > > > > > > vfio_iommu_type1_pasid_request,
> > > > range);
> > > > > > > > +
> > > > > > > > +   if (copy_from_user(&req, (void __user *)arg, minsz))
> > > > > > > > +   return -EFAULT;
> > > > > > > > +
> > > > > > > > +   if (req.argsz < minsz || (req.flags &
> > > > ~VFIO_PASID_REQUEST_MASK))
> > > > > > > > +   return -EINVAL;
> > > > > > > > +
> > > > > > > > +   if (req.range.min > req.range.max)
> > > > > > >
> > > > > > > Is it exploitable that a user can spin the kernel for a long
> > > > > > > time in the case of a free by calling this with [0, MAX_UINT]
> > > > > > > regardless of their
> > > > actual
> > > > > > allocations?
> > > > > >
> > > > > > IOASID can ensure that user can only free the PASIDs allocated
> > > > > > to the
> > > user.
> > > > but
> > > > > > it's true, kernel needs to loop all the PASIDs within the range
> > > > > > provided by user.
> > > > it
> > > > > > may take a long time. is there anything we can do? one thing may
> > > > > > limit
> > > the
> > > > range
> > > > > > provided by user?
> > > > >
> > > > > thought about it more, we have per-VM pasid quota (say 1000), so
> > > > > even if user passed down [0, MAX_UNIT], kernel will only loop the
> > > > > 1000 pasids at most. do you think we still need to do something on it?
> > > >
> > > > How do you figure that?  vfio_iommu_type1_pasid_request() accepts
> > > > the user's min/max so long as (max > min) and passes that to
> > > > vfio_iommu_type1_pasid_free(), then to vfio_pasid_free_range()
> > > > which loops as:
> > > >
> > > > ioasid_t pasid = min;
> > > > for (; pasid <= max; pasid++)
> > > > ioasid_free(pasid);
> > > >
> > > > A user might only be able to allocate 1000 pasids, but apparently
> > > > they can ask to free all they want.
> > > >
> > > > It's also not obvious to me that calling ioasid_free() is only
> > > > allowing the user to free their own passid.  Does it?  It would be a
> > > > pretty
> >
> > Agree. I thought ioasid_free should at least carry a token since the user
> space is
> > only allowed to manage PASIDs in its own set...
> >
> > > > gaping hole if a user could free arbitrary pasids.  A r-b tree of
> > > > passids might help both for security and to bound spinning in a loop.
> > >
> > > oh, yes. BTW. instead of r-b tree in VFIO, maybe we can add an
> > > ioasid_set parameter for ioasid_free(), thus to prevent the user from
> > > freeing PASIDs that doesn't belong to it. I remember Jacob mentioned it
> before.
> > >
> >
> > check current ioasid_free:
> >
> > spin_lock(&ioasid_allocator_lock);
> > ioasid_data = xa_load(&active_allocator->xa, ioasid);
> > if (!ioasid_data) {
> > pr_err("Trying to free unknown IOASID %u\n", ioasid);
> > goto exit_unlock;
> > }
> >
> > Allow an user to trigger above lock paths with MAX_UINT times might still
> be bad.
> 
> yeah, how about the below two options:
> 
> - comparing the max - min with the quota before calling ioasid_free().
>   If max - min > current quota of the user, then should fail it. If
>   max - min < quota, then call ioasid_free() one by one. still trigger
>   the above lock path with quota times.

This is definitely wrong. [min, max] is about the range of the PASID value,
while quota is about the number of allocated PASIDs. It's a bit weird to
mix two together. btw what is the main purpose of allowing batch PASID
free requests? Can we just simplify to allow one PASID in each 

RE: [PATCH v3 06/14] vfio/type1: Add VFIO_IOMMU_PASID_REQUEST (alloc/free)

2020-07-08 Thread Liu, Yi L
Hi Kevin,

> From: Tian, Kevin 
> Sent: Thursday, July 9, 2020 9:57 AM
> 
> > From: Liu, Yi L 
> > Sent: Thursday, July 9, 2020 8:32 AM
> >
> > Hi Alex,
> >
> > > Alex Williamson 
> > > Sent: Thursday, July 9, 2020 3:55 AM
> > >
> > > On Wed, 8 Jul 2020 08:16:16 +
> > > "Liu, Yi L"  wrote:
> > >
> > > > Hi Alex,
> > > >
> > > > > From: Liu, Yi L < yi.l@intel.com>
> > > > > Sent: Friday, July 3, 2020 2:28 PM
> > > > >
> > > > > Hi Alex,
> > > > >
> > > > > > From: Alex Williamson 
> > > > > > Sent: Friday, July 3, 2020 5:19 AM
> > > > > >
> > > > > > On Wed, 24 Jun 2020 01:55:19 -0700 Liu Yi L
> > > > > >  wrote:
> > > > > >
> > > > > > > This patch allows user space to request PASID allocation/free, 
> > > > > > > e.g.
> > > > > > > when serving the request from the guest.
> > > > > > >
> > > > > > > PASIDs that are not freed by userspace are automatically
> > > > > > > freed
> > when
> > > > > > > the IOASID set is destroyed when process exits.
> > > > [...]
> > > > > > > +static int vfio_iommu_type1_pasid_request(struct vfio_iommu
> > *iommu,
> > > > > > > +   unsigned long arg)
> > > > > > > +{
> > > > > > > + struct vfio_iommu_type1_pasid_request req;
> > > > > > > + unsigned long minsz;
> > > > > > > +
> > > > > > > + minsz = offsetofend(struct vfio_iommu_type1_pasid_request,
> > > range);
> > > > > > > +
> > > > > > > + if (copy_from_user(&req, (void __user *)arg, minsz))
> > > > > > > + return -EFAULT;
> > > > > > > +
> > > > > > > + if (req.argsz < minsz || (req.flags &
> > > ~VFIO_PASID_REQUEST_MASK))
> > > > > > > + return -EINVAL;
> > > > > > > +
> > > > > > > + if (req.range.min > req.range.max)
> > > > > >
> > > > > > Is it exploitable that a user can spin the kernel for a long
> > > > > > time in the case of a free by calling this with [0, MAX_UINT]
> > > > > > regardless of their
> > > actual
> > > > > allocations?
> > > > >
> > > > > IOASID can ensure that user can only free the PASIDs allocated
> > > > > to the
> > user.
> > > but
> > > > > it's true, kernel needs to loop all the PASIDs within the range
> > > > > provided by user.
> > > it
> > > > > may take a long time. is there anything we can do? one thing may
> > > > > limit
> > the
> > > range
> > > > > provided by user?
> > > >
> > > > thought about it more, we have per-VM pasid quota (say 1000), so
> > > > even if user passed down [0, MAX_UNIT], kernel will only loop the
> > > > 1000 pasids at most. do you think we still need to do something on it?
> > >
> > > How do you figure that?  vfio_iommu_type1_pasid_request() accepts
> > > the user's min/max so long as (max > min) and passes that to
> > > vfio_iommu_type1_pasid_free(), then to vfio_pasid_free_range()
> > > which loops as:
> > >
> > >   ioasid_t pasid = min;
> > >   for (; pasid <= max; pasid++)
> > >   ioasid_free(pasid);
> > >
> > > A user might only be able to allocate 1000 pasids, but apparently
> > > they can ask to free all they want.
> > >
> > > It's also not obvious to me that calling ioasid_free() is only
> > > allowing the user to free their own passid.  Does it?  It would be a
> > > pretty
> 
> Agree. I thought ioasid_free should at least carry a token since the user 
> space is
> only allowed to manage PASIDs in its own set...
> 
> > > gaping hole if a user could free arbitrary pasids.  A r-b tree of
> > > passids might help both for security and to bound spinning in a loop.
> >
> > oh, yes. BTW. instead of r-b tree in VFIO, maybe we can add an
> > ioasid_set parameter for ioasid_free(), thus to prevent the user from
> > freeing PASIDs that doesn't belong to it. I remember Jacob mentioned it 
> > before.
> >
> 
> check current ioasid_free:
> 
> spin_lock(&ioasid_allocator_lock);
> ioasid_data = xa_load(&active_allocator->xa, ioasid);
> if (!ioasid_data) {
> pr_err("Trying to free unknown IOASID %u\n", ioasid);
> goto exit_unlock;
> }
> 
> Allow an user to trigger above lock paths with MAX_UINT times might still be 
> bad.

yeah, how about the below two options:

- comparing the max - min with the quota before calling ioasid_free().
  If max - min > current quota of the user, then should fail it. If
  max - min < quota, then call ioasid_free() one by one. still trigger
  the above lock path with quota times.

- pass the max and min to ioasid_free(), let ioasid_free() decide. should
  be able to avoid trigger the lock multiple times, and ioasid has have a
  track on how may PASIDs have been allocated, if max - min is larger than
  the allocated number, should fail anyway.
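
For the second option, a rough sketch of what a set-scoped range free could
look like (all names here are hypothetical; __ioasid_free_locked() stands in
for whatever teardown today's ioasid_free() performs under the lock):

	/*
	 * Sketch only: visit allocated entries in [min, max] and skip
	 * PASIDs owned by other sets, so a huge range from userspace
	 * cannot make the kernel spin over unrelated IOASIDs.
	 */
	void ioasid_free_set_range(struct ioasid_set *set, ioasid_t min, ioasid_t max)
	{
		struct ioasid_data *data;
		unsigned long pasid;

		spin_lock(&ioasid_allocator_lock);
		xa_for_each_range(&active_allocator->xa, pasid, data, min, max) {
			if (data->set != set)
				continue;	/* not owned by this caller */
			__ioasid_free_locked(pasid);
		}
		spin_unlock(&ioasid_allocator_lock);
	}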
 
thoughts on the above reply?

Regards,
Yi Liu

> Thanks
> Kevin


RE: [PATCH v3 06/14] vfio/type1: Add VFIO_IOMMU_PASID_REQUEST (alloc/free)

2020-07-08 Thread Tian, Kevin
> From: Liu, Yi L 
> Sent: Thursday, July 9, 2020 8:32 AM
> 
> Hi Alex,
> 
> > Alex Williamson 
> > Sent: Thursday, July 9, 2020 3:55 AM
> >
> > On Wed, 8 Jul 2020 08:16:16 +
> > "Liu, Yi L"  wrote:
> >
> > > Hi Alex,
> > >
> > > > From: Liu, Yi L < yi.l@intel.com>
> > > > Sent: Friday, July 3, 2020 2:28 PM
> > > >
> > > > Hi Alex,
> > > >
> > > > > From: Alex Williamson 
> > > > > Sent: Friday, July 3, 2020 5:19 AM
> > > > >
> > > > > On Wed, 24 Jun 2020 01:55:19 -0700
> > > > > Liu Yi L  wrote:
> > > > >
> > > > > > This patch allows user space to request PASID allocation/free, e.g.
> > > > > > when serving the request from the guest.
> > > > > >
> > > > > > PASIDs that are not freed by userspace are automatically freed
> when
> > > > > > the IOASID set is destroyed when process exits.
> > > [...]
> > > > > > +static int vfio_iommu_type1_pasid_request(struct vfio_iommu
> *iommu,
> > > > > > + unsigned long arg)
> > > > > > +{
> > > > > > +   struct vfio_iommu_type1_pasid_request req;
> > > > > > +   unsigned long minsz;
> > > > > > +
> > > > > > +   minsz = offsetofend(struct vfio_iommu_type1_pasid_request,
> > range);
> > > > > > +
> > > > > > +   if (copy_from_user(&req, (void __user *)arg, minsz))
> > > > > > +   return -EFAULT;
> > > > > > +
> > > > > > +   if (req.argsz < minsz || (req.flags &
> > ~VFIO_PASID_REQUEST_MASK))
> > > > > > +   return -EINVAL;
> > > > > > +
> > > > > > +   if (req.range.min > req.range.max)
> > > > >
> > > > > Is it exploitable that a user can spin the kernel for a long time in
> > > > > the case of a free by calling this with [0, MAX_UINT] regardless of 
> > > > > their
> > actual
> > > > allocations?
> > > >
> > > > IOASID can ensure that user can only free the PASIDs allocated to the
> user.
> > but
> > > > it's true, kernel needs to loop all the PASIDs within the range provided
> > > > by user.
> > it
> > > > may take a long time. is there anything we can do? one thing may limit
> the
> > range
> > > > provided by user?
> > >
> > > thought about it more, we have per-VM pasid quota (say 1000), so even if
> > > user passed down [0, MAX_UNIT], kernel will only loop the 1000 pasids at
> > > most. do you think we still need to do something on it?
> >
> > How do you figure that?  vfio_iommu_type1_pasid_request() accepts the
> > user's min/max so long as (max > min) and passes that to
> > vfio_iommu_type1_pasid_free(), then to vfio_pasid_free_range()  which
> > loops as:
> >
> > ioasid_t pasid = min;
> > for (; pasid <= max; pasid++)
> > ioasid_free(pasid);
> >
> > A user might only be able to allocate 1000 pasids, but apparently they
> > can ask to free all they want.
> >
> > It's also not obvious to me that calling ioasid_free() is only allowing
> > the user to free their own passid.  Does it?  It would be a pretty

Agree. I thought ioasid_free should at least carry a token since the
user space is only allowed to manage PASIDs in its own set...

> > gaping hole if a user could free arbitrary pasids.  A r-b tree of
> > passids might help both for security and to bound spinning in a loop.
> 
> oh, yes. BTW. instead of r-b tree in VFIO, maybe we can add an ioasid_set
> parameter for ioasid_free(), thus to prevent the user from freeing PASIDs
> that doesn't belong to it. I remember Jacob mentioned it before.
> 

check current ioasid_free:

spin_lock(&ioasid_allocator_lock);
ioasid_data = xa_load(&active_allocator->xa, ioasid);
if (!ioasid_data) {
pr_err("Trying to free unknown IOASID %u\n", ioasid);
goto exit_unlock;
}

Allow an user to trigger above lock paths with MAX_UINT times might still
be bad. 

Thanks
Kevin


RE: [kbuild-all] Re: [PATCH v4 4/7] iommu/vt-d: Handle non-page aligned address

2020-07-08 Thread Li, Philip
> Subject: [kbuild-all] Re: [PATCH v4 4/7] iommu/vt-d: Handle non-page aligned
> address
> 
> Hi Jacob,
> 
> Thank you for the patch! Perhaps something to improve:
> 
> [auto build test WARNING on iommu/next]
> [also build test WARNING on linux/master linus/master v5.8-rc4 next-20200707]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
> 
> url:https://github.com/0day-ci/linux/commits/Jacob-Pan/iommu-vt-d-Misc-
> tweaks-and-fixes-for-vSVA/20200707-081026
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next
> config: x86_64-randconfig-m031-20200707 (attached as .config)
> compiler: gcc-9 (Debian 9.3.0-14) 9.3.0
> 
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot 
> 
> All warnings (new ones prefixed by >>):
Kindly ignore this; it is related to [-Wtype-limits], and there is an
ongoing discussion about moving such warnings from W=1 to W=2, as they
may not be suitable to fix without hurting code readability.

Refer to 
https://lore.kernel.org/lkml/CAHk-=wiKCXEWKJ9dWUimGbrVRo_N2RosESUw8E7m9AEtyZcu=w...@mail.gmail.com/
for the discussion.


> 
>In file included from include/linux/string.h:6,
> from include/linux/uuid.h:12,
> from include/linux/mod_devicetable.h:13,
> from include/linux/pci.h:27,
> from drivers/iommu/intel/dmar.c:19:
>drivers/iommu/intel/dmar.c: In function 'qi_flush_dev_iotlb_pasid':
>include/linux/bits.h:26:28: warning: comparison of unsigned expression < 0 
> is
> always false [-Wtype-limits]
>   26 |   __builtin_constant_p((l) > (h)), (l) > (h), 0)))
>  |^
>include/linux/compiler.h:58:52: note: in definition of macro 
> '__trace_if_var'
>   58 | #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) :
> __trace_if_value(cond))
>  |^~~~
> >> drivers/iommu/intel/dmar.c:1459:2: note: in expansion of macro 'if'
> 1459 |  if (addr & GENMASK_ULL(size_order + VTD_PAGE_SHIFT, 0))
>  |  ^~
>include/linux/bits.h:25:3: note: in expansion of macro 'BUILD_BUG_ON_ZERO'
>   25 |  (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
>  |   ^
>include/linux/bits.h:45:3: note: in expansion of macro
> 'GENMASK_INPUT_CHECK'
>   45 |  (GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
>  |   ^~~
>drivers/iommu/intel/dmar.c:1459:13: note: in expansion of macro
> 'GENMASK_ULL'
> 1459 |  if (addr & GENMASK_ULL(size_order + VTD_PAGE_SHIFT, 0))
>  | ^~~
>include/linux/bits.h:26:40: warning: comparison of unsigned expression < 0 
> is
> always false [-Wtype-limits]
>   26 |   __builtin_constant_p((l) > (h)), (l) > (h), 0)))
>  |^
>include/linux/compiler.h:58:52: note: in definition of macro 
> '__trace_if_var'
>   58 | #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) :
> __trace_if_value(cond))
>  |^~~~
> >> drivers/iommu/intel/dmar.c:1459:2: note: in expansion of macro 'if'
> 1459 |  if (addr & GENMASK_ULL(size_order + VTD_PAGE_SHIFT, 0))
>  |  ^~
>include/linux/bits.h:25:3: note: in expansion of macro 'BUILD_BUG_ON_ZERO'
>   25 |  (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
>  |   ^
>include/linux/bits.h:45:3: note: in expansion of macro
> 'GENMASK_INPUT_CHECK'
>   45 |  (GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
>  |   ^~~
>drivers/iommu/intel/dmar.c:1459:13: note: in expansion of macro
> 'GENMASK_ULL'
> 1459 |  if (addr & GENMASK_ULL(size_order + VTD_PAGE_SHIFT, 0))
>  | ^~~
>include/linux/bits.h:26:28: warning: comparison of unsigned expression < 0 
> is
> always false [-Wtype-limits]
>   26 |   __builtin_constant_p((l) > (h)), (l) > (h), 0)))
>  |^
>include/linux/compiler.h:58:61: note: in definition of macro 
> '__trace_if_var'
>   58 | #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) :
> __trace_if_value(cond))
>  | ^~~~
> >> drivers/iommu/intel/dmar.c:1459:2: note: in expansion of macro 'if'
> 1459 |  if (addr & GENMASK_ULL(size_order + VTD_PAGE_SHIFT, 0))
>  |  ^~
>include/linux/bits.h:25:3: note: in expansion of macro 'BUILD_BUG_ON_ZERO'
>   25 |  (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
>  |   ^
>include/linux/bits.h:45:3: note: in expansion of macro
> 'GENMASK_INPUT_CHECK'
>   45 |  (GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
>  |   ^~~
> 

Re: [PATCH v4 1/5] docs: IOMMU user API

2020-07-08 Thread Lu Baolu

Hi Jacob,

On 7/8/20 11:29 PM, Jacob Pan wrote:

On Wed, 8 Jul 2020 10:07:13 +0800
Lu Baolu  wrote:


Hi,

On 7/8/20 7:43 AM, Jacob Pan wrote:

+For UAPIs that are shared with in-kernel users, a wrapper function
+is provided to distinguish the callers. For example,
+
+Userspace caller ::
+
+  int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct
device *dev,
+  void __user *udata)
+
+In-kernel caller ::
+
+  int __iommu_sva_unbind_gpasid(struct iommu_domain *domain,
struct device *dev,
+  struct iommu_gpasid_bind_data *data)


iommu_page_response() may have the same needs. Can we reach an
agreement on the naming rules?


Yes, iommu_page_response() also has to deal with in-kernel and UAPI
callers. I left it out because I thought the current VFIO and SVA common
code is not ready for PRQ yet - I might be reading old news :) argsz
needs to be handled as well.

Perhaps we can wait until this set is settled? Do you have a suggestion
on the naming rules?


I have no suggestion on the naming rules, just wanted to check whether
others have any preference. I agree that we can wait until this series
is settled.
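
For illustration, applying the same rule to iommu_page_response() would give
something along these lines (hypothetical signatures, only meant to frame the
naming question):

	/* userspace caller, takes the raw uAPI data */
	int iommu_page_response(struct device *dev, void __user *uinfo);

	/* in-kernel caller, takes the kernel structure directly */
	int __iommu_page_response(struct device *dev,
				  struct iommu_page_response *msg);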

Best regards,
baolu


Re: [PATCH v2 1/2] iommu: iommu_aux_at(de)tach_device() extension

2020-07-08 Thread Lu Baolu

Hi Alex,

On 7/9/20 3:07 AM, Alex Williamson wrote:

On Wed, 8 Jul 2020 10:53:12 +0800
Lu Baolu  wrote:


Hi Alex,

Thanks a lot for your comments. Please check my reply inline.

On 7/8/20 5:04 AM, Alex Williamson wrote:

On Tue,  7 Jul 2020 09:39:56 +0800
Lu Baolu  wrote:
   

The hardware-assisted vfio mediated device is a use case of the iommu
aux-domain. The interactions between vfio/mdev and iommu during mdev
creation and passthrough are:

- Create a group for mdev with iommu_group_alloc();
- Add the device to the group with
  group = iommu_group_alloc();
  if (IS_ERR(group))
  return PTR_ERR(group);

  ret = iommu_group_add_device(group, &mdev->dev);
  if (!ret)
  dev_info(&mdev->dev, "MDEV: group_id = %d\n",
   iommu_group_id(group));
- Allocate an aux-domain
  iommu_domain_alloc()
- Attach the aux-domain to the physical device from which the mdev is
created.
  iommu_aux_attach_device()

In the whole process, an iommu group was allocated for the mdev and an
iommu domain was attached to the group, but group->domain remains
NULL. As a result, iommu_get_domain_for_dev() doesn't work anymore.

The iommu_get_domain_for_dev() is a necessary interface for device
drivers that want to support aux-domain. For example,

  struct iommu_domain *domain;
  struct device *dev = mdev_dev(mdev);
  unsigned long pasid;

  domain = iommu_get_domain_for_dev(dev);
  if (!domain)
  return -ENODEV;

  pasid = iommu_aux_get_pasid(domain, dev->parent);

How did we know this was an aux domain? ie. How did we know we could
use it with iommu_aux_get_pasid()?


Yes. It's a bit confusing if iommu_get_domain_for_dev() is reused here
for aux-domain.



Why did we assume the parent device is the iommu device for the aux
domain?  Should that level of detail be already known by the aux domain?

Nits - The iomu device of an mdev device is found via
mdev_get_iommu_device(dev), it should not be assumed to be the parent.
The parent of an mdev device is found via mdev_parent_dev(mdev).


My bad. The driver should use mdev_get_iommu_device() instead.



The leaps in logic here make me wonder if we should instead be exposing
more of an aux domain API rather than blurring the differences between
these domains.  Thanks,


How about add below API?

/**
   * iommu_aux_get_domain_for_dev - get aux domain for a device
   * @dev: the accessory device
   *
   * The caller should pass a valid @dev to iommu_aux_attach_device() before
   * calling this api. Return an attached aux-domain, or NULL otherwise.


That's not necessarily the caller's responsibility, that might happen
elsewhere, this function simply returns an aux domain for the device if
it's attached to one.


Yes. Fair enough. This piece of comments will be removed.




   */
struct iommu_domain *iommu_aux_get_domain_for_dev(struct device *dev)
{
  struct iommu_domain *domain = NULL;
  struct iommu_group *group;

  group = iommu_group_get(dev);
  if (!group)
  return NULL;

  if (group->aux_domain_attached)
  domain = group->domain;

  iommu_group_put(group);

  return domain;
}
EXPORT_SYMBOL_GPL(iommu_aux_get_domain_for_dev);
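
For the example use case above, a caller in an mdev driver might then look
roughly like this (sketch only; it assumes the helper above and uses
mdev_get_iommu_device() instead of dev->parent, as discussed earlier):

	/* hypothetical mdev driver helper */
	static int my_mdev_setup_aux(struct mdev_device *mdev)
	{
		struct device *dev = mdev_dev(mdev);
		struct device *iommu_device = mdev_get_iommu_device(dev);
		struct iommu_domain *domain;
		int pasid;

		domain = iommu_aux_get_domain_for_dev(dev);
		if (!domain)
			return -ENODEV;

		pasid = iommu_aux_get_pasid(domain, iommu_device);
		if (pasid < 0)
			return pasid;

		/* program @pasid into the parent device's hardware context */
		return 0;
	}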


For your example use case, this seems more clear to me.  Thanks,



Okay, thank you!


Alex


Best regards,
baolu



Re: [PATCH v2 4/4] iommu/vt-d: Add page response ops support

2020-07-08 Thread Lu Baolu

Hi Kevin,

On 7/6/20 9:47 AM, Tian, Kevin wrote:

From: Lu Baolu
Sent: Monday, July 6, 2020 8:26 AM

After a page request is handled, software must response the device which
raised the page request with the handling result. This is done through


'response' is a noun.


Yes.




the iommu ops.page_response if the request was reported to outside of
vendor iommu driver through iommu_report_device_fault(). This adds the
VT-d implementation of page_response ops.

Co-developed-by: Jacob Pan 
Signed-off-by: Jacob Pan 
Co-developed-by: Liu Yi L 
Signed-off-by: Liu Yi L 
Signed-off-by: Lu Baolu 
---
  drivers/iommu/intel/iommu.c |  1 +
  drivers/iommu/intel/svm.c   | 74
+
  include/linux/intel-iommu.h |  3 ++
  3 files changed, 78 insertions(+)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index de17952ed133..7eb29167e8f9 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -6057,6 +6057,7 @@ const struct iommu_ops intel_iommu_ops = {
.sva_bind   = intel_svm_bind,
.sva_unbind = intel_svm_unbind,
.sva_get_pasid  = intel_svm_get_pasid,
+   .page_response  = intel_svm_page_response,
  #endif
  };

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 08c58c2b1a06..1c7d8a9ea124 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -1078,3 +1078,77 @@ int intel_svm_get_pasid(struct iommu_sva *sva)

return pasid;
  }
+
+int intel_svm_page_response(struct device *dev,
+   struct iommu_fault_event *evt,
+   struct iommu_page_response *msg)
+{
+   struct iommu_fault_page_request *prm;
+   struct intel_svm_dev *sdev;
+   struct intel_iommu *iommu;
+   struct intel_svm *svm;
+   bool private_present;
+   bool pasid_present;
+   bool last_page;
+   u8 bus, devfn;
+   int ret = 0;
+   u16 sid;
+
+   if (!dev || !dev_is_pci(dev))
+   return -ENODEV;


but we didn't do same check when reporting fault?


For now, we only support PCI devices, so I will add this check in report
as well.




+
+   iommu = device_to_iommu(dev, &bus, &devfn);
+   if (!iommu)
+   return -ENODEV;
+
+   if (!msg || !evt)
+   return -EINVAL;
+
+   mutex_lock(&pasid_mutex);
+
+   prm = &evt->fault.prm;
+   sid = PCI_DEVID(bus, devfn);
+   pasid_present = prm->flags &
IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
+   private_present = prm->flags &
IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA;
+   last_page = prm->flags &
IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE;
+
+   if (pasid_present) {
+   if (prm->pasid == 0 || prm->pasid >= PASID_MAX) {
+   ret = -EINVAL;
+   goto out;
+   }
+
+   ret = pasid_to_svm_sdev(dev, prm->pasid, &svm, &sdev);
+   if (ret || !sdev) {
+   ret = -ENODEV;
+   goto out;
+   }
+   }


what about pasid_present==0? Do we support recoverable fault now
with this patch?


For now, we don't support reporting a prq without pasid to outside.
prq_event_thread() handles such requests explicitly. I will add a
check in response ops.



and who guarantees that the external fault handler (e.g. guest)
cannot do bad thing with this interface, e.g. by specifying a PASID
belonging to other guests (when Scalable IOV is enabled)?


I will check below if the response is from user space.

(svm->mm ==  get_task_mm(current))
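
Roughly like the following (sketch only; note that get_task_mm() takes a
reference that must be dropped, and may return NULL for kernel threads):

	struct mm_struct *mm = get_task_mm(current);
	bool from_owner = (mm == svm->mm);

	if (mm)
		mmput(mm);
	if (!from_owner) {
		ret = -ENODEV;
		goto out;
	}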



Thanks
Kevin


Best regards,
baolu


+
+   /*
+* Per VT-d spec. v3.0 ch7.7, system software must respond
+* with page group response if private data is present (PDP)
+* or last page in group (LPIG) bit is set. This is an
+* additional VT-d requirement beyond PCI ATS spec.
+*/
+   if (last_page || private_present) {
+   struct qi_desc desc;
+
+   desc.qw0 = QI_PGRP_PASID(prm->pasid) | QI_PGRP_DID(sid)
|
+   QI_PGRP_PASID_P(pasid_present) |
+   QI_PGRP_PDP(private_present) |
+   QI_PGRP_RESP_CODE(msg->code) |
+   QI_PGRP_RESP_TYPE;
+   desc.qw1 = QI_PGRP_IDX(prm->grpid) |
QI_PGRP_LPIG(last_page);
+   desc.qw2 = 0;
+   desc.qw3 = 0;
+   if (private_present)
+   memcpy(&desc.qw2, prm->private_data,
+  sizeof(prm->private_data));
+
+   qi_submit_sync(iommu, &desc, 1, 0);
+   }
+out:
+   mutex_unlock(&pasid_mutex);
+   return ret;
+}
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index fc2cfc3db6e1..bf6009a344f5 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -741,6 +741,9 @@ struct iommu_sva *intel_svm_bind(struct device
*dev, struct mm_struct *mm,
 

RE: [PATCH v3 06/14] vfio/type1: Add VFIO_IOMMU_PASID_REQUEST (alloc/free)

2020-07-08 Thread Liu, Yi L
Hi Alex,

> Alex Williamson 
> Sent: Thursday, July 9, 2020 3:55 AM
> 
> On Wed, 8 Jul 2020 08:16:16 +
> "Liu, Yi L"  wrote:
> 
> > Hi Alex,
> >
> > > From: Liu, Yi L < yi.l@intel.com>
> > > Sent: Friday, July 3, 2020 2:28 PM
> > >
> > > Hi Alex,
> > >
> > > > From: Alex Williamson 
> > > > Sent: Friday, July 3, 2020 5:19 AM
> > > >
> > > > On Wed, 24 Jun 2020 01:55:19 -0700
> > > > Liu Yi L  wrote:
> > > >
> > > > > This patch allows user space to request PASID allocation/free, e.g.
> > > > > when serving the request from the guest.
> > > > >
> > > > > PASIDs that are not freed by userspace are automatically freed when
> > > > > the IOASID set is destroyed when process exits.
> > [...]
> > > > > +static int vfio_iommu_type1_pasid_request(struct vfio_iommu *iommu,
> > > > > +   unsigned long arg)
> > > > > +{
> > > > > + struct vfio_iommu_type1_pasid_request req;
> > > > > + unsigned long minsz;
> > > > > +
> > > > > + minsz = offsetofend(struct vfio_iommu_type1_pasid_request,
> range);
> > > > > +
> > > > > + if (copy_from_user(, (void __user *)arg, minsz))
> > > > > + return -EFAULT;
> > > > > +
> > > > > + if (req.argsz < minsz || (req.flags &
> ~VFIO_PASID_REQUEST_MASK))
> > > > > + return -EINVAL;
> > > > > +
> > > > > + if (req.range.min > req.range.max)
> > > >
> > > > Is it exploitable that a user can spin the kernel for a long time in
> > > > the case of a free by calling this with [0, MAX_UINT] regardless of 
> > > > their
> actual
> > > allocations?
> > >
> > > IOASID can ensure that user can only free the PASIDs allocated to the 
> > > user.
> but
> > > it's true, kernel needs to loop all the PASIDs within the range provided
> > > by user.
> it
> > > may take a long time. is there anything we can do? one thing may limit the
> range
> > > provided by user?
> >
> > thought about it more, we have per-VM pasid quota (say 1000), so even if
> > user passed down [0, MAX_UNIT], kernel will only loop the 1000 pasids at
> > most. do you think we still need to do something on it?
> 
> How do you figure that?  vfio_iommu_type1_pasid_request() accepts the
> user's min/max so long as (max > min) and passes that to
> vfio_iommu_type1_pasid_free(), then to vfio_pasid_free_range()  which
> loops as:
> 
>   ioasid_t pasid = min;
>   for (; pasid <= max; pasid++)
>   ioasid_free(pasid);
> 
> A user might only be able to allocate 1000 pasids, but apparently they
> can ask to free all they want.
> 
> It's also not obvious to me that calling ioasid_free() is only allowing
> the user to free their own passid.  Does it?  It would be a pretty
> gaping hole if a user could free arbitrary pasids.  A r-b tree of
> passids might help both for security and to bound spinning in a loop.

oh, yes. BTW. instead of r-b tree in VFIO, maybe we can add an ioasid_set
parameter for ioasid_free(), thus to prevent the user from freeing PASIDs
that doesn't belong to it. I remember Jacob mentioned it before.

@Jacob, is it still in your plan?

Regards,
Yi Liu

> Thanks,
> 
> Alex



RE: [PATCH v4 04/15] vfio/type1: Report iommu nesting info to userspace

2020-07-08 Thread Liu, Yi L
Hi Alex,

> From: Alex Williamson 
> Sent: Thursday, July 9, 2020 3:30 AM
> 
> On Wed, 8 Jul 2020 08:08:40 +
> "Liu, Yi L"  wrote:
> 
> > Hi Alex,
> >
> > Eric asked if we will have a data struct other than struct iommu_nesting_info
> > in the struct vfio_iommu_type1_info_cap_nesting @info[] field. I'm not
> > quite sure about it. I guess the answer may be no, as VFIO's nesting support
> > should be based on the IOMMU UAPI. What is your opinion?
> >
> > +#define VFIO_IOMMU_TYPE1_INFO_CAP_NESTING  3
> > +
> > +/*
> > + * Reporting nesting info to user space.
> > + *
> > + * @info:  the nesting info provided by IOMMU driver. Today
> > + * it is expected to be a struct iommu_nesting_info
> > + * data.
> > + */
> > +struct vfio_iommu_type1_info_cap_nesting {
> > +   struct  vfio_info_cap_header header;
> > +   __u32   flags;
> > +   __u32   padding;
> > +   __u8info[];
> > +};
> 
> It's not a very useful uAPI if the user can't be sure what they're
> getting out of it.  Info capabilities are "cheap", they don't need to
> be as extensible as an ioctl.  It's not clear that we really even need
> the flags (and therefore the padding), just define it to return the
> IOMMU uAPI structure with no extensibility.  If we need to expose
> something else, create a new capability.  Thanks,

thanks for the guidance, then I may embed the struct iommu_nesting_info
here. :-)
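
i.e. something like the below (sketch, assuming struct iommu_nesting_info as
defined by the IOMMU uAPI series):

	#define VFIO_IOMMU_TYPE1_INFO_CAP_NESTING	3

	struct vfio_iommu_type1_info_cap_nesting {
		struct	vfio_info_cap_header header;
		struct	iommu_nesting_info info;
	};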

Regards,
Yi Liu

> Alex
> 
> >
> > https://lore.kernel.org/linux-
> iommu/dm5pr11mb1435290b6cd561ec61027892c3...@dm5pr11mb1435.nam
> prd11.prod.outlook.com/
> >
> > Regards,
> > Yi Liu
> >
> > > From: Liu, Yi L
> > > Sent: Tuesday, July 7, 2020 5:32 PM
> > >
> > [...]
> > > > >
> > > > >>> +
> > > > >>> +/*
> > > > >>> + * Reporting nesting info to user space.
> > > > >>> + *
> > > > >>> + * @info:  the nesting info provided by IOMMU driver. Today
> > > > >>> + * it is expected to be a struct iommu_nesting_info
> > > > >>> + * data.
> > > > >> Is it expected to change?
> > > > >
> > > > > honestly, I'm not quite sure on it. I did considered to embed struct
> > > > > iommu_nesting_info here instead of using info[]. but I hesitated as
> > > > > using info[] may leave more flexibility on this struct. how about
> > > > > your opinion? perhaps it's fine to embed the struct
> > > > > iommu_nesting_info here as long as VFIO is setup nesting based on
> > > > > IOMMU UAPI.
> > > > >
> > > > >>> + */
> > > > >>> +struct vfio_iommu_type1_info_cap_nesting {
> > > > >>> +   struct  vfio_info_cap_header header;
> > > > >>> +   __u32   flags;
> > > > >> You may document flags.
> > > > >
> > > > > sure. it's reserved for future.
> > > > >
> > > > > Regards,
> > > > > Yi Liu
> > > > >
> > > > >>> +   __u32   padding;
> > > > >>> +   __u8info[];
> > > > >>> +};
> > > > >>> +
> > > > >>>  #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
> > > > >>>
> > > > >>>  /**
> > > > >>>
> > > > >> Thanks
> > > > >>
> > > > >> Eric
> > > > >
> >

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] dma-pool: use single atomic pool for both DMA zones

2020-07-08 Thread Jeremy Linton

Hi,

On 7/7/20 7:28 AM, Nicolas Saenz Julienne wrote:

When allocating atomic DMA memory for a device, the dma-pool core
queries __dma_direct_optimal_gfp_mask() to check which atomic pool to
use. It turns out the GFP flag returned is only an optimistic guess.
The pool selected might sometimes live in a zone higher than the
device's view of memory.

> As there isn't a way to guarantee a mapping between a device's DMA
constraints and correct GFP flags this unifies both DMA atomic pools.
The resulting pool is allocated in the lower DMA zone available, if any,
so as for devices to always get accessible memory while having the
flexibility of using dma_pool_kernel for the non constrained ones.


With the follow-on patch "dma-pool: Do not allocate pool memory from 
CMA" everything seems to be working fine now.


tested-by: Jeremy Linton 

Thanks for fixing this!




Fixes: c84dc6e68a1d ("dma-pool: add additional coherent pools to map to gfp 
mask")
Reported-by: Jeremy Linton 
Suggested-by: Robin Murphy 
Signed-off-by: Nicolas Saenz Julienne 
---
  kernel/dma/pool.c | 47 +++
  1 file changed, 19 insertions(+), 28 deletions(-)

diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 8cfa01243ed2..883f7a583969 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -13,10 +13,11 @@
  #include 
  #include 
  
+#define GFP_ATOMIC_POOL_DMA	(IS_ENABLED(CONFIG_ZONE_DMA) ? GFP_DMA : \
+				 IS_ENABLED(CONFIG_ZONE_DMA32) ? GFP_DMA32 : 0)
+
  static struct gen_pool *atomic_pool_dma __ro_after_init;
  static unsigned long pool_size_dma;
-static struct gen_pool *atomic_pool_dma32 __ro_after_init;
-static unsigned long pool_size_dma32;
  static struct gen_pool *atomic_pool_kernel __ro_after_init;
  static unsigned long pool_size_kernel;
  
@@ -42,16 +43,13 @@ static void __init dma_atomic_pool_debugfs_init(void)

return;
  
  	debugfs_create_ulong("pool_size_dma", 0400, root, _size_dma);

-   debugfs_create_ulong("pool_size_dma32", 0400, root, _size_dma32);
debugfs_create_ulong("pool_size_kernel", 0400, root, _size_kernel);
  }
  
  static void dma_atomic_pool_size_add(gfp_t gfp, size_t size)

  {
-   if (gfp & __GFP_DMA)
+   if (gfp & GFP_ATOMIC_POOL_DMA)
pool_size_dma += size;
-   else if (gfp & __GFP_DMA32)
-   pool_size_dma32 += size;
else
pool_size_kernel += size;
  }
@@ -132,12 +130,11 @@ static void atomic_pool_resize(struct gen_pool *pool, 
gfp_t gfp)
  
  static void atomic_pool_work_fn(struct work_struct *work)

  {
-   if (IS_ENABLED(CONFIG_ZONE_DMA))
-   atomic_pool_resize(atomic_pool_dma,
-  GFP_KERNEL | GFP_DMA);
-   if (IS_ENABLED(CONFIG_ZONE_DMA32))
-   atomic_pool_resize(atomic_pool_dma32,
-  GFP_KERNEL | GFP_DMA32);
+   gfp_t dma_gfp = GFP_ATOMIC_POOL_DMA;
+
+   if (dma_gfp)
+   atomic_pool_resize(atomic_pool_dma, GFP_KERNEL | dma_gfp);
+
atomic_pool_resize(atomic_pool_kernel, GFP_KERNEL);
  }
  
@@ -168,6 +165,7 @@ static __init struct gen_pool *__dma_atomic_pool_init(size_t pool_size,
  
  static int __init dma_atomic_pool_init(void)

  {
+   gfp_t dma_gfp = GFP_ATOMIC_POOL_DMA;
int ret = 0;
  
  	/*

@@ -185,18 +183,13 @@ static int __init dma_atomic_pool_init(void)
GFP_KERNEL);
if (!atomic_pool_kernel)
ret = -ENOMEM;
-   if (IS_ENABLED(CONFIG_ZONE_DMA)) {
+
+   if (dma_gfp) {
atomic_pool_dma = __dma_atomic_pool_init(atomic_pool_size,
-   GFP_KERNEL | GFP_DMA);
+GFP_KERNEL | dma_gfp);
if (!atomic_pool_dma)
ret = -ENOMEM;
}
-   if (IS_ENABLED(CONFIG_ZONE_DMA32)) {
-   atomic_pool_dma32 = __dma_atomic_pool_init(atomic_pool_size,
-   GFP_KERNEL | GFP_DMA32);
-   if (!atomic_pool_dma32)
-   ret = -ENOMEM;
-   }
  
  	dma_atomic_pool_debugfs_init();

return ret;
@@ -206,14 +199,12 @@ postcore_initcall(dma_atomic_pool_init);
  static inline struct gen_pool *dev_to_pool(struct device *dev)
  {
u64 phys_mask;
-   gfp_t gfp;
-
-   gfp = dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
-   &phys_mask);
-   if (IS_ENABLED(CONFIG_ZONE_DMA) && gfp == GFP_DMA)
-   return atomic_pool_dma;
-   if (IS_ENABLED(CONFIG_ZONE_DMA32) && gfp == GFP_DMA32)
-   return atomic_pool_dma32;
+
+   if (atomic_pool_dma &&
+   dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
+   &phys_mask))
+   return atomic_pool_dma;
+
return 

Re: [PATCH] dma-pool: Do not allocate pool memory from CMA

2020-07-08 Thread Jeremy Linton

Hi,

On 7/8/20 11:49 AM, Nicolas Saenz Julienne wrote:

There is no guarantee to CMA's placement, so allocating a zone specific
atomic pool from CMA might return memory from a completely different
memory zone. So stop using it.

Fixes: c84dc6e68a1d ("dma-pool: add additional coherent pools to map to gfp 
mask")
Reported-by: Jeremy Linton 
Signed-off-by: Nicolas Saenz Julienne 
---


With this patch and 
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg2226971.html 
the machine appears to be working fine with/without CMA and in both 
DT/ACPI mode.


The JBOD/etc I was having problems with are working fine, and the rtlsdr 
seems to be working better now too (it's still not perfect, but that is 
likely another issue).


so:

tested-by: Jeremy Linton 

thanks!




A more costly alternative would be adding an option to
dma_alloc_from_contiguous() so it fails when the allocation doesn't fall
in a specific zone.

  kernel/dma/pool.c | 11 ++-
  1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 8cfa01243ed2..4bc1c46ae6ef 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -6,7 +6,6 @@
  #include 
  #include 
  #include 
-#include 
  #include 
  #include 
  #include 
@@ -69,12 +68,7 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t 
pool_size,
  
  	do {

pool_size = 1 << (PAGE_SHIFT + order);
-
-   if (dev_get_cma_area(NULL))
-   page = dma_alloc_from_contiguous(NULL, 1 << order,
-order, false);
-   else
-   page = alloc_pages(gfp, order);
+   page = alloc_pages(gfp, order);
} while (!page && order-- > 0);
if (!page)
goto out;
@@ -118,8 +112,7 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t 
pool_size,
dma_common_free_remap(addr, pool_size);
  #endif
  free_page: __maybe_unused
-   if (!dma_release_from_contiguous(NULL, page, 1 << order))
-   __free_pages(page, order);
+   __free_pages(page, order);
  out:
return ret;
  }



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/arm-smmu: Update impl quirks comment

2020-07-08 Thread Will Deacon
On Wed, 24 Jun 2020 11:24:51 +0100, Robin Murphy wrote:
> The comment about implementation and integration quirks being
> mutually-exclusive is out of date, and in fact the code is already
> structured for the case it anticipates, so document that properly.

Applied to will (for-joerg/arm-smmu/updates), thanks!

[1/1] iommu/arm-smmu: Update impl quirks comment
  https://git.kernel.org/will/c/976492922a6a

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] IOMMU DRIVERS: Replace HTTP links with HTTPS ones

2020-07-08 Thread Alexander A. Klimov
Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.

Deterministic algorithm:
For each file:
  If not .svg:
For each line:
  If doesn't contain `\bxmlns\b`:
For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
  If neither `\bgnu\.org/license`, nor `\bmozilla\.org/MPL\b`:
If both the HTTP and HTTPS versions
return 200 OK and serve the same content:
  Replace HTTP with HTTPS.

Signed-off-by: Alexander A. Klimov 
---
 Continuing my work started at 93431e0607e5.
 See also: git log --oneline '--author=Alexander A. Klimov 
' v5.7..master
 (Actually letting a shell for loop submit all this stuff for me.)

 If there are any URLs to be removed completely or at least not HTTPSified:
 Just clearly say so and I'll *undo my change*.
 See also: https://lkml.org/lkml/2020/6/27/64

 If there are any valid, but yet not changed URLs:
 See: https://lkml.org/lkml/2020/6/26/837

 If you apply the patch, please let me know.


 drivers/iommu/omap-iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c
index c8282cc212cb..ec1c8cd0e419 100644
--- a/drivers/iommu/omap-iommu.c
+++ b/drivers/iommu/omap-iommu.c
@@ -3,7 +3,7 @@
  * omap iommu: tlb and pagetable primitives
  *
  * Copyright (C) 2008-2010 Nokia Corporation
- * Copyright (C) 2013-2017 Texas Instruments Incorporated - http://www.ti.com/
+ * Copyright (C) 2013-2017 Texas Instruments Incorporated - https://www.ti.com/
  *
  * Written by Hiroshi DOYU ,
  * Paul Mundt and Toshihiro Kobayashi
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v10 2/5] iommu/arm-smmu: ioremap smmu mmio region before implementation init

2020-07-08 Thread Nicolin Chen
On Tue, Jul 07, 2020 at 10:00:14PM -0700, Krishna Reddy wrote:
> ioremap smmu mmio region before calling into implementation init.
> This is necessary to allow mapped address available during vendor
> specific implementation init.
> 
> Signed-off-by: Krishna Reddy 

Reviewed-by: Nicolin Chen 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v10 3/5] iommu/arm-smmu: add NVIDIA implementation for ARM MMU-500 usage

2020-07-08 Thread Nicolin Chen
On Tue, Jul 07, 2020 at 10:00:15PM -0700, Krishna Reddy wrote:
> NVIDIA's Tegra194 SoC has three ARM MMU-500 instances.
> It uses two of the ARM MMU-500s together to interleave IOVA
> accesses across them and must be programmed identically.
> This implementation supports programming the two ARM MMU-500s
> that must be programmed identically.
> 
> The third ARM MMU-500 instance is supported by standard
> arm-smmu.c driver itself.
> 
> Signed-off-by: Krishna Reddy 

Reviewed-by: Nicolin Chen 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v10 1/5] iommu/arm-smmu: move TLB timeout and spin count macros

2020-07-08 Thread Nicolin Chen
On Tue, Jul 07, 2020 at 10:00:13PM -0700, Krishna Reddy wrote:
> Move TLB timeout and spin count macros to header file to
> allow using the same from vendor specific implementations.
> 
> Signed-off-by: Krishna Reddy 

Reviewed-by: Nicolin Chen 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v10 5/5] iommu/arm-smmu: Add global/context fault implementation hooks

2020-07-08 Thread Nicolin Chen
On Tue, Jul 07, 2020 at 10:00:17PM -0700, Krishna Reddy wrote:
> Add global/context fault hooks to allow vendor specific implementations to
> override the default fault interrupt handlers.
> 
> Update NVIDIA implementation to override the default global/context fault
> interrupt handlers and handle interrupts across the two ARM MMU-500s that
> are programmed identically.
> 
> Signed-off-by: Krishna Reddy 

Reviewed-by: Nicolin Chen 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 06/14] vfio/type1: Add VFIO_IOMMU_PASID_REQUEST (alloc/free)

2020-07-08 Thread Alex Williamson
On Wed, 8 Jul 2020 08:16:16 +
"Liu, Yi L"  wrote:

> Hi Alex,
> 
> > From: Liu, Yi L < yi.l@intel.com>
> > Sent: Friday, July 3, 2020 2:28 PM
> > 
> > Hi Alex,
> >   
> > > From: Alex Williamson 
> > > Sent: Friday, July 3, 2020 5:19 AM
> > >
> > > On Wed, 24 Jun 2020 01:55:19 -0700
> > > Liu Yi L  wrote:
> > >  
> > > > This patch allows user space to request PASID allocation/free, e.g.
> > > > when serving the request from the guest.
> > > >
> > > > PASIDs that are not freed by userspace are automatically freed when
> > > > the IOASID set is destroyed when process exits.  
> [...]
> > > > +static int vfio_iommu_type1_pasid_request(struct vfio_iommu *iommu,
> > > > + unsigned long arg)
> > > > +{
> > > > +   struct vfio_iommu_type1_pasid_request req;
> > > > +   unsigned long minsz;
> > > > +
> > > > +   minsz = offsetofend(struct vfio_iommu_type1_pasid_request, 
> > > > range);
> > > > +
> > > > +   if (copy_from_user(&req, (void __user *)arg, minsz))
> > > > +   return -EFAULT;
> > > > +
> > > > +   if (req.argsz < minsz || (req.flags & ~VFIO_PASID_REQUEST_MASK))
> > > > +   return -EINVAL;
> > > > +
> > > > +   if (req.range.min > req.range.max)  
> > >
> > > Is it exploitable that a user can spin the kernel for a long time in
> > > the case of a free by calling this with [0, MAX_UINT] regardless of their 
> > > actual  
> > allocations?
> > 
> > IOASID can ensure that the user can only free the PASIDs allocated to the
> > user. but it's true, the kernel needs to loop over all the PASIDs within the
> > range provided by the user. it may take a long time. is there anything we
> > can do? one option may be to limit the range provided by the user?
> 
> thought about it more, we have a per-VM pasid quota (say 1000), so even if
> the user passed down [0, MAX_UINT], the kernel will only loop over the 1000
> pasids at most. do you think we still need to do something about it?

How do you figure that?  vfio_iommu_type1_pasid_request() accepts the
user's min/max so long as (max > min) and passes that to
vfio_iommu_type1_pasid_free(), then to vfio_pasid_free_range()  which
loops as:

ioasid_t pasid = min;
for (; pasid <= max; pasid++)
ioasid_free(pasid);

A user might only be able to allocate 1000 pasids, but apparently they
can ask to free all they want.

It's also not obvious to me that calling ioasid_free() is only allowing
the user to free their own pasid.  Does it?  It would be a pretty
gaping hole if a user could free arbitrary pasids.  An r-b tree of
pasids might help both for security and to bound spinning in a loop.
Thanks,

Alex

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 4/6] drm/msm: Add support to create a local pagetable

2020-07-08 Thread Jordan Crouse
On Tue, Jul 07, 2020 at 12:36:42PM +0100, Robin Murphy wrote:
> On 2020-06-26 21:04, Jordan Crouse wrote:
> >Add support to create a io-pgtable for use by targets that support
> >per-instance pagetables.  In order to support per-instance pagetables the
> >GPU SMMU device needs to have the qcom,adreno-smmu compatible string and
> >split pagetables and auxiliary domains need to be supported and enabled.
> >
> >Signed-off-by: Jordan Crouse 
> >---
> >
> >  drivers/gpu/drm/msm/msm_gpummu.c |   2 +-
> >  drivers/gpu/drm/msm/msm_iommu.c  | 180 ++-
> >  drivers/gpu/drm/msm/msm_mmu.h|  16 ++-
> >  3 files changed, 195 insertions(+), 3 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/msm/msm_gpummu.c 
> >b/drivers/gpu/drm/msm/msm_gpummu.c
> >index 310a31b05faa..aab121f4beb7 100644
> >--- a/drivers/gpu/drm/msm/msm_gpummu.c
> >+++ b/drivers/gpu/drm/msm/msm_gpummu.c
> >@@ -102,7 +102,7 @@ struct msm_mmu *msm_gpummu_new(struct device *dev, 
> >struct msm_gpu *gpu)
> > }
> > gpummu->gpu = gpu;
> >-msm_mmu_init(&gpummu->base, dev, &funcs);
> >+msm_mmu_init(&gpummu->base, dev, &funcs, MSM_MMU_GPUMMU);
> > return &gpummu->base;
> >  }
> >diff --git a/drivers/gpu/drm/msm/msm_iommu.c 
> >b/drivers/gpu/drm/msm/msm_iommu.c
> >index 1b6635504069..f455c597f76d 100644
> >--- a/drivers/gpu/drm/msm/msm_iommu.c
> >+++ b/drivers/gpu/drm/msm/msm_iommu.c
> >@@ -4,15 +4,192 @@
> >   * Author: Rob Clark 
> >   */
> >+#include 
> >  #include "msm_drv.h"
> >  #include "msm_mmu.h"
> >  struct msm_iommu {
> > struct msm_mmu base;
> > struct iommu_domain *domain;
> >+struct iommu_domain *aux_domain;
> >  };
> >+
> >  #define to_msm_iommu(x) container_of(x, struct msm_iommu, base)
> >+struct msm_iommu_pagetable {
> >+struct msm_mmu base;
> >+struct msm_mmu *parent;
> >+struct io_pgtable_ops *pgtbl_ops;
> >+phys_addr_t ttbr;
> >+u32 asid;
> >+};
> >+
> >+static struct msm_iommu_pagetable *to_pagetable(struct msm_mmu *mmu)
> >+{
> >+return container_of(mmu, struct msm_iommu_pagetable, base);
> >+}
> >+
> >+static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 iova,
> >+size_t size)
> >+{
> >+struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
> >+struct io_pgtable_ops *ops = pagetable->pgtbl_ops;
> >+size_t unmapped = 0;
> >+
> >+/* Unmap the block one page at a time */
> >+while (size) {
> >+unmapped += ops->unmap(ops, iova, 4096, NULL);
> >+iova += 4096;
> >+size -= 4096;
> >+}
> >+
> >+iommu_flush_tlb_all(to_msm_iommu(pagetable->parent)->domain);
> >+
> >+return (unmapped == size) ? 0 : -EINVAL;
> >+}
> 
> Remember in patch #1 when you said "Then 'domain' can be used like any other
> iommu domain to map and unmap iova addresses in the pagetable."?
> 
> This appears to be very much not that :/
 
The code changed but the commit log stayed the same.  I'll reword.

Jordan

> Robin.
> 
> >+
> >+static int msm_iommu_pagetable_map(struct msm_mmu *mmu, u64 iova,
> >+struct sg_table *sgt, size_t len, int prot)
> >+{
> >+struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
> >+struct io_pgtable_ops *ops = pagetable->pgtbl_ops;
> >+struct scatterlist *sg;
> >+size_t mapped = 0;
> >+u64 addr = iova;
> >+unsigned int i;
> >+
> >+for_each_sg(sgt->sgl, sg, sgt->nents, i) {
> >+size_t size = sg->length;
> >+phys_addr_t phys = sg_phys(sg);
> >+
> >+/* Map the block one page at a time */
> >+while (size) {
> >+if (ops->map(ops, addr, phys, 4096, prot)) {
> >+msm_iommu_pagetable_unmap(mmu, iova, mapped);
> >+return -EINVAL;
> >+}
> >+
> >+phys += 4096;
> >+addr += 4096;
> >+size -= 4096;
> >+mapped += 4096;
> >+}
> >+}
> >+
> >+return 0;
> >+}
> >+
> >+static void msm_iommu_pagetable_destroy(struct msm_mmu *mmu)
> >+{
> >+struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
> >+
> >+free_io_pgtable_ops(pagetable->pgtbl_ops);
> >+kfree(pagetable);
> >+}
> >+
> >+/*
> >+ * Given a parent device, create and return an aux domain. This will enable 
> >the
> >+ * TTBR0 region
> >+ */
> >+static struct iommu_domain *msm_iommu_get_aux_domain(struct msm_mmu *parent)
> >+{
> >+struct msm_iommu *iommu = to_msm_iommu(parent);
> >+struct iommu_domain *domain;
> >+int ret;
> >+
> >+if (iommu->aux_domain)
> >+return iommu->aux_domain;
> >+
> >+if (!iommu_dev_has_feature(parent->dev, IOMMU_DEV_FEAT_AUX))
> >+return ERR_PTR(-ENODEV);
> >+
> >+domain = iommu_domain_alloc(&platform_bus_type);
> >+if (!domain)
> >+return ERR_PTR(-ENODEV);
> >+
> >+ret = iommu_aux_attach_device(domain, parent->dev);
> >+if (ret) {
> >+iommu_domain_free(domain);
> >+

[PATCH v7 00/12] PCI: brcmstb: enable PCIe for STB chips

2020-07-08 Thread Jim Quinlan via iommu


Patchset Summary:
  Enhance a PCIe host controller driver.  Because of its unusual design
  we are forced to change dev->dma_pfn_offset into a more general role
  allowing multiple offsets.  See the 'v1' notes below for more info.

v7:
  Commit: "device core: Introduce DMA range map, supplanting ..."
  -- remove second kcalloc/copy in device.c (AndyS)
  -- use PTR_ERR_OR_ZERO() and PHYS_PFN() (AndyS)
  -- indentation, sizeof(struct ...) => sizeof(*r) (AndyS)
  -- add pfn.h definitions: PFN_DMA_ADDR(), DMA_ADDR_PFN() (AndyS)
  -- Fixed compile error in "sun6i_csi.c" (kernel test robot)
  Commit "ata: ahci_brcm: Fix use of BCM7216 reset controller"
  -- correct name of function in the commit msg (SergeiS)
  
v6:
  Commit "device core: Introduce DMA range map":
  -- of_dma_get_range() now takes a single argument and returns either
 NULL, a valid map, or an ERR_PTR. (Robin)
  -- offsets are no longer a PFN value but an actual address. (Robin)
  -- the bus_dma_region struct stores the range size instead of
 the cpu_end and pci_end values. (Robin)
  -- devices that were setting a single offset with no boundaries
 have been modified to have boundaries; in a few places
 where this information was unavailable a /* FIXME: ... */
 comment was added. (Robin)
  -- dma_attach_offset_range() can be called when an offset
 map already exists; if its range is already present
 nothing is done and success is returned. (Robin)
  All commits:
  -- Many name/style/corrections/etc changed (Bjorn)
  -- rebase to Torvalds master

v5:
  Commit "device core: Introduce multiple dma pfn offsets"
  -- in of/address.c: "map_size = 0" => "*map_size = 0"
  -- use kcalloc instead of kzalloc (AndyS)
  -- use PHYS_ADDR_MAX instead of "~(phys_addr_t)0"
  Commit "PCI: brcmstb: Set internal memory viewport sizes"
  -- now gives error on missing dma-ranges property.
  Commit "dt-bindings: PCI: Add bindings for more Brcmstb chips"
  -- removed "Allof:" from brcm,scb-sizes definition (RobH)
  All Commits:
  -- indentation style, use max chars 100 (AndyS)
  -- rebased to torvalds master

v4:
  Commit "device core: Introduce multiple dma pfn offsets"
  -- of_dma_get_range() does not take a dev param but instead
 takes two "out" params: map and map_size.  We do this so
 that the code that parses dma-ranges is separate from
 the code that modifies 'dev'.   (Nicolas)
  -- the separate case of having a single pfn offset has
 been removed and is now processed by going through the
 map array. (Nicolas)
  -- move attach_uniform_dma_pfn_offset() from of/address.c to
 dma/mapping.c so that it does not depend on CONFIG_OF. (Nicolas)
  -- devm_kcalloc => devm_kzalloc (DanC)
  -- add/fix assignment to dev->dma_pfn_offset_map for func
 attach_uniform_dma_pfn_offset() (DanC, Nicolas)
  -- s/struct dma_pfn_offset_region/struct bus_dma_region/ (Nicolas)
  -- s/attach_uniform_dma_pfn_offset/dma_attach_uniform_pfn_offset/
  -- s/attach_dma_pfn_offset_map/dma_attach_pfn_offset_map/
  -- More use of PFN_{PHYS,DOWN,UP}. (AndyS)
  Commit "of: Include a dev param in of_dma_get_range()"
  -- this commit was squashed with "device core: Introduce ..."

v3:
  Commit "device core: Introduce multiple dma pfn offsets"
  Commit "arm: dma-mapping: Invoke dma offset func if needed"
  -- The above two commits have been squashed.  More importantly,
 the code has been modified so that the functionality for
 multiple pfn offsets subsumes the use of dev->dma_pfn_offset.
 In fact, dma_pfn_offset is removed and supplanted by
 dma_pfn_offset_map, which is a pointer to an array.  The
 more common case of a uniform offset is now handled as
 a map with a single entry, while cases requiring multiple
 pfn offsets use a map with multiple entries.  Code paths
 that used to do this:

 dev->dma_pfn_offset = mydrivers_pfn_offset;

 have been changed to do this:

 attach_uniform_dma_pfn_offset(dev, pfn_offset);

  Commit "dt-bindings: PCI: Add bindings for more Brcmstb chips"
  -- Add if/then clause for required props: resets, reset-names (RobH)
  -- Change compatible list from const to enum (RobH)
  -- Change list of u32-tuples to u64 (RobH)

  Commit "of: Include a dev param in of_dma_get_range()"
  -- modify of/unittests.c to add NULL param in of_dma_get_range() call.

  Commit "device core: Add ability to handle multiple dma offsets"
  -- align comment in device.h (AndyS).
  -- s/cpu_beg/cpu_start/ and s/dma_beg/dma_start/ in struct
 dma_pfn_offset_region (AndyS).

v2:
Commit: "device core: Add ability to handle multiple dma offsets"
  o Added helper func attach_dma_pfn_offset_map() in address.c (Christoph)
  o Helpers funcs added to __phys_to_dma() & __dma_to_phys() (Christoph)
  o Added warning when multiple offsets are needed and !DMA_PFN_OFFSET_MAP
  o dev->dma_pfn_map => dev->dma_pfn_offset_map
  o s/frm/from/ for dma_pfn_offset_frm_{phys,dma}_addr() (Christoph)
  o In device.h: 

[PATCH v7 08/12] device core: Introduce DMA range map, supplanting dma_pfn_offset

2020-07-08 Thread Jim Quinlan via iommu
The new field 'dma_range_map' in struct device is used to facilitate the
use of single or multiple offsets between mapping regions of cpu addrs and
dma addrs.  It subsumes the role of "dev->dma_pfn_offset" which was only
capable of holding a single uniform offset and had no region bounds
checking.

The function of_dma_get_range() has been modified so that it takes a single
argument -- the device node -- and returns a map, NULL, or an error code.
The map is an array that holds the information regarding the DMA regions.
Each range entry contains the address offset, the cpu_start address, the
dma_start address, and the size of the region.
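
For reference, each such range entry boils down to roughly the following
(field names follow the description above and are illustrative, not a
verbatim copy of the patch):

/* One CPU-physical <-> DMA address window, as described above (sketch). */
struct bus_dma_region {
	phys_addr_t	cpu_start;	/* start of the CPU physical region */
	dma_addr_t	dma_start;	/* corresponding DMA/bus address */
	u64		size;		/* size of the window */
	u64		offset;		/* cached cpu_start - dma_start */
};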

of_dma_configure() is the typical manner to set range offsets but there are
a number of ad hoc assignments to "dev->dma_pfn_offset" in the kernel
driver code.  These cases now invoke the function
dma_attach_offset_range(dev, cpu_addr, dma_addr, size).

Signed-off-by: Jim Quinlan 
---
 arch/arm/include/asm/dma-mapping.h|  9 +-
 arch/arm/mach-keystone/keystone.c | 17 ++--
 arch/sh/drivers/pci/pcie-sh7786.c |  9 +-
 arch/sh/kernel/dma-coherent.c | 16 ++--
 arch/x86/pci/sta2x11-fixup.c  |  7 +-
 drivers/acpi/arm64/iort.c |  5 +-
 drivers/gpu/drm/sun4i/sun4i_backend.c |  7 +-
 drivers/iommu/io-pgtable-arm.c|  2 +-
 .../platform/sunxi/sun4i-csi/sun4i_csi.c  |  6 +-
 .../platform/sunxi/sun6i-csi/sun6i_csi.c  |  5 +-
 drivers/of/address.c  | 95 ++-
 drivers/of/device.c   | 47 +
 drivers/of/of_private.h   |  9 +-
 drivers/of/unittest.c | 35 +--
 drivers/remoteproc/remoteproc_core.c  |  2 +-
 .../staging/media/sunxi/cedrus/cedrus_hw.c|  8 +-
 drivers/usb/core/message.c|  4 +-
 drivers/usb/core/usb.c|  2 +-
 include/linux/device.h|  4 +-
 include/linux/dma-direct.h| 10 +-
 include/linux/dma-mapping.h   | 37 
 include/linux/pfn.h   |  2 +
 kernel/dma/coherent.c | 10 +-
 kernel/dma/mapping.c  | 53 +++
 24 files changed, 277 insertions(+), 124 deletions(-)

diff --git a/arch/arm/include/asm/dma-mapping.h 
b/arch/arm/include/asm/dma-mapping.h
index bdd80ddbca34..b7cdde9fb83d 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -35,8 +35,9 @@ static inline const struct dma_map_ops 
*get_arch_dma_ops(struct bus_type *bus)
 #ifndef __arch_pfn_to_dma
 static inline dma_addr_t pfn_to_dma(struct device *dev, unsigned long pfn)
 {
-   if (dev)
-   pfn -= dev->dma_pfn_offset;
+   if (dev && dev->dma_range_map)
+   pfn -= DMA_ADDR_PFN(dma_offset_from_phys_addr(dev, 
PFN_PHYS(pfn)));
+
return (dma_addr_t)__pfn_to_bus(pfn);
 }
 
@@ -44,8 +45,8 @@ static inline unsigned long dma_to_pfn(struct device *dev, 
dma_addr_t addr)
 {
unsigned long pfn = __bus_to_pfn(addr);
 
-   if (dev)
-   pfn += dev->dma_pfn_offset;
+   if (dev && dev->dma_range_map)
+   pfn += DMA_ADDR_PFN(dma_offset_from_dma_addr(dev, addr));
 
return pfn;
 }
diff --git a/arch/arm/mach-keystone/keystone.c 
b/arch/arm/mach-keystone/keystone.c
index 638808c4e122..a1a19781983b 100644
--- a/arch/arm/mach-keystone/keystone.c
+++ b/arch/arm/mach-keystone/keystone.c
@@ -8,6 +8,7 @@
  */
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -24,8 +25,6 @@
 
 #include "keystone.h"
 
-static unsigned long keystone_dma_pfn_offset __read_mostly;
-
 static int keystone_platform_notifier(struct notifier_block *nb,
  unsigned long event, void *data)
 {
@@ -38,9 +37,12 @@ static int keystone_platform_notifier(struct notifier_block 
*nb,
return NOTIFY_BAD;
 
if (!dev->of_node) {
-   dev->dma_pfn_offset = keystone_dma_pfn_offset;
-   dev_err(dev, "set dma_pfn_offset%08lx\n",
-   dev->dma_pfn_offset);
+   int ret = dma_attach_offset_range(dev, KEYSTONE_HIGH_PHYS_START,
+ KEYSTONE_LOW_PHYS_START,
+ KEYSTONE_HIGH_PHYS_SIZE);
+   dev_err(dev, "set dma_offset%08llx%s\n",
+   KEYSTONE_HIGH_PHYS_START - KEYSTONE_LOW_PHYS_START,
+   ret ? " failed" : "");
}
return NOTIFY_OK;
 }
@@ -51,11 +53,8 @@ static struct notifier_block platform_nb = {
 
 static void __init keystone_init(void)
 {
-   if (PHYS_OFFSET >= KEYSTONE_HIGH_PHYS_START) {
-   keystone_dma_pfn_offset = PFN_DOWN(KEYSTONE_HIGH_PHYS_START -
-  KEYSTONE_LOW_PHYS_START);
+   if 

Re: [PATCH v4 04/15] vfio/type1: Report iommu nesting info to userspace

2020-07-08 Thread Alex Williamson
On Wed, 8 Jul 2020 08:08:40 +
"Liu, Yi L"  wrote:

> Hi Alex,
> 
> Eric asked if we will have a data struct type other than struct
> iommu_nesting_info in the struct vfio_iommu_type1_info_cap_nesting @info[]
> field. I'm not quite sure about it. I guess the answer may be no, as VFIO's
> nesting support should be based on the IOMMU UAPI. how about your opinion?
> 
> +#define VFIO_IOMMU_TYPE1_INFO_CAP_NESTING  3
> +
> +/*
> + * Reporting nesting info to user space.
> + *
> + * @info:the nesting info provided by IOMMU driver. Today
> + *   it is expected to be a struct iommu_nesting_info
> + *   data.
> + */
> +struct vfio_iommu_type1_info_cap_nesting {
> + struct  vfio_info_cap_header header;
> + __u32   flags;
> + __u32   padding;
> + __u8info[];
> +};

It's not a very useful uAPI if the user can't be sure what they're
getting out of it.  Info capabilities are "cheap", they don't need to
be as extensible as an ioctl.  It's not clear that we really even need
the flags (and therefore the padding), just define it to return the
IOMMU uAPI structure with no extensibility.  If we need to expose
something else, create a new capability.  Thanks,

Alex

> 
> https://lore.kernel.org/linux-iommu/dm5pr11mb1435290b6cd561ec61027892c3...@dm5pr11mb1435.namprd11.prod.outlook.com/
> 
> Regards,
> Yi Liu
> 
> > From: Liu, Yi L
> > Sent: Tuesday, July 7, 2020 5:32 PM
> >   
> [...]
> > > >  
> > > >>> +
> > > >>> +/*
> > > >>> + * Reporting nesting info to user space.
> > > >>> + *
> > > >>> + * @info:the nesting info provided by IOMMU driver. Today
> > > >>> + *   it is expected to be a struct iommu_nesting_info
> > > >>> + *   data.  
> > > >> Is it expected to change?  
> > > >
> > > > honestly, I'm not quite sure on it. I did considered to embed struct
> > > > iommu_nesting_info here instead of using info[]. but I hesitated as
> > > > using info[] may leave more flexibility on this struct. how about
> > > > your opinion? perhaps it's fine to embed the struct
> > > > iommu_nesting_info here as long as VFIO is setup nesting based on
> > > > IOMMU UAPI.
> > > >  
> > > >>> + */
> > > >>> +struct vfio_iommu_type1_info_cap_nesting {
> > > >>> + struct  vfio_info_cap_header header;
> > > >>> + __u32   flags;  
> > > >> You may document flags.  
> > > >
> > > > sure. it's reserved for future.
> > > >
> > > > Regards,
> > > > Yi Liu
> > > >  
> > > >>> + __u32   padding;
> > > >>> + __u8info[];
> > > >>> +};
> > > >>> +
> > > >>>  #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
> > > >>>
> > > >>>  /**
> > > >>>  
> > > >> Thanks
> > > >>
> > > >> Eric  
> > > >  
> 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [Freedreno] [PATCH v2 2/6] iommu/io-pgtable: Allow a pgtable implementation to skip TLB operations

2020-07-08 Thread Jordan Crouse
On Tue, Jul 07, 2020 at 07:58:18AM -0700, Rob Clark wrote:
> On Tue, Jul 7, 2020 at 7:25 AM Rob Clark  wrote:
> >
> > On Tue, Jul 7, 2020 at 4:34 AM Robin Murphy  wrote:
> > >
> > > On 2020-06-26 21:04, Jordan Crouse wrote:
> > > > Allow an io-pgtable implementation to skip TLB operations by checking for
> > > > NULL pointers in the helper functions. It will be up to the owner
> > > > of the io-pgtable instance to make sure that they independently handle
> > > > the TLB correctly.
> > >
> > > I don't really understand what this is for - tricking the IOMMU driver
> > > into not performing its TLB maintenance at points when that maintenance
> > > has been deemed necessary doesn't seem like the appropriate way to
> > > achieve anything good :/
> >
> > No, for triggering the io-pgtable helpers into not performing TLB
> > maintenance.  But seriously, since we are creating pgtables ourselves,
> > and we don't want to be ioremap'ing the GPU's SMMU instance, the
> > alternative is plugging in no-op helpers.  Which amounts to the same
> > thing.
> 
> Hmm, that said, since we are just memcpy'ing the io_pgtable_cfg from
> arm-smmu, it will already be populated with arm-smmu's fxn ptrs.  I
> guess we could maybe make it work without no-op helpers, although in
> that case it looks like we need to fix something about aux-domain vs
> tlb helpers:

I had a change that handled these correctly but I abandoned it because the
TLB functions didn't kick the power and I didn't think that would be desirable
at the generic level for performance reasons. Since the GPU SMMU is on the same
power domain as the GMU we could enable it in the GPU driver before calling
the TLB operations but we would need to be clever about it to prevent bringing
up the GMU just to unmap memory.
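
For context, the "no-op helpers" alternative mentioned above would amount to
something like the sketch below (illustrative only; whether these live in the
driver or the io-pgtable core is exactly what is being debated here):

static void null_tlb_flush_all(void *cookie)
{
	/* Intentionally empty: the owner flushes the TLB itself. */
}

static void null_tlb_flush_walk(unsigned long iova, size_t size,
				size_t granule, void *cookie)
{
}

static void null_tlb_add_page(struct iommu_iotlb_gather *gather,
			      unsigned long iova, size_t granule, void *cookie)
{
}

static const struct iommu_flush_ops null_tlb_ops = {
	.tlb_flush_all	= null_tlb_flush_all,
	.tlb_flush_walk	= null_tlb_flush_walk,
	/* .tlb_flush_leaf would get the same treatment where it exists */
	.tlb_add_page	= null_tlb_add_page,
};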

Jordan

> [  +0.004373] Unable to handle kernel NULL pointer dereference at
> virtual address 0019
> [  +0.004086] Mem abort info:
> [  +0.004319]   ESR = 0x9604
> [  +0.003462]   EC = 0x25: DABT (current EL), IL = 32 bits
> [  +0.003494]   SET = 0, FnV = 0
> [  +0.002812]   EA = 0, S1PTW = 0
> [  +0.002873] Data abort info:
> [  +0.003031]   ISV = 0, ISS = 0x0004
> [  +0.003785]   CM = 0, WnR = 0
> [  +0.003641] user pgtable: 4k pages, 48-bit VAs, pgdp=000261d65000
> [  +0.003383] [0019] pgd=, p4d=
> [  +0.003715] Internal error: Oops: 9604 [#1] PREEMPT SMP
> [  +0.002744] Modules linked in: xt_CHECKSUM xt_MASQUERADE
> xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle
> ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack
> nf_defrag_ipv4 libcrc32c bridge stp llc ip6table_filter ip6_tables
> iptable_filter ax88179_178a usbnet uvcvideo videobuf2_vmalloc
> videobuf2_memops videobuf2_v4l2 videobuf2_common videodev mc
> hid_multitouch i2c_hid some_battery ti_sn65dsi86 hci_uart btqca btbcm
> qcom_spmi_adc5 bluetooth qcom_spmi_temp_alarm qcom_vadc_common
> ecdh_generic ecc snd_soc_sdm845 snd_soc_rt5663 snd_soc_qcom_common
> ath10k_snoc ath10k_core crct10dif_ce ath mac80211 snd_soc_rl6231
> soundwire_bus i2c_qcom_geni libarc4 qcom_rng msm phy_qcom_qusb2
> reset_qcom_pdc drm_kms_helper cfg80211 rfkill qcom_q6v5_mss
> qcom_q6v5_ipa_notify socinfo qrtr ns panel_simple qcom_q6v5_pas
> qcom_common qcom_glink_smem slim_qcom_ngd_ctrl qcom_sysmon drm
> qcom_q6v5 slimbus qmi_helpers qcom_wdt mdt_loader rmtfs_mem be2iscsi
> bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio
> [  +0.000139]  libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp
> libiscsi_tcp libiscsi scsi_transport_iscsi fuse ip_tables x_tables
> ipv6 nf_defrag_ipv6
> [  +0.020933] CPU: 3 PID: 168 Comm: kworker/u16:7 Not tainted
> 5.8.0-rc1-c630+ #31
> [  +0.003828] Hardware name: LENOVO 81JL/LNVNB161216, BIOS
> 9UCN33WW(V2.06) 06/ 4/2019
> [  +0.004039] Workqueue: msm msm_gem_free_work [msm]
> [  +0.003885] pstate: 60c5 (nZCv daif +PAN +UAO BTYPE=--)
> [  +0.003859] pc : arm_smmu_tlb_inv_range_s1+0x30/0x148
> [  +0.003742] lr : arm_smmu_tlb_add_page_s1+0x1c/0x28
> [  +0.003887] sp : 800011cdb970
> [  +0.003868] x29: 800011cdb970 x28: 0003
> [  +0.003930] x27: 0001f1882f80 x26: 0001
> [  +0.003886] x25: 0003 x24: 0620
> [  +0.003932] x23:  x22: 1000
> [  +0.003886] x21: 1000 x20: 0001cf857300
> [  +0.003916] x19: 0001 x18: 
> [  +0.003921] x17: d9e6a24ae0e8 x16: 00012577
> [  +0.003843] x15: 00012578 x14: 
> [  +0.003884] x13: 00012574 x12: d9e6a2550180
> [  +0.003834] x11: 00083f80 x10: 
> [  +0.003889] x9 :  x8 : 0001f1882f80
> [  +0.003812] x7 : 0001 x6 : 0048
> [  +0.003807] x5 : 0001c86e1000 x4 : 0620
> [  +0.003802] x3 : 0001ddb57700 x2 : 1000
> [  +0.003809] x1 : 1000 x0 : 000101048000
> [  +0.003768] Call trace:
> [  

Re: [PATCH v2 1/2] iommu: iommu_aux_at(de)tach_device() extension

2020-07-08 Thread Alex Williamson
On Wed, 8 Jul 2020 10:53:12 +0800
Lu Baolu  wrote:

> Hi Alex,
> 
> Thanks a lot for your comments. Please check my reply inline.
> 
> On 7/8/20 5:04 AM, Alex Williamson wrote:
> > On Tue,  7 Jul 2020 09:39:56 +0800
> > Lu Baolu  wrote:
> >   
> >> The hardware-assisted vfio mediated device is a use case of iommu
> >> aux-domain. The interactions between vfio/mdev and iommu during mdev
> >> creation and passthrough are:
> >>
> >> - Create a group for mdev with iommu_group_alloc();
> >> - Add the device to the group with
> >>  group = iommu_group_alloc();
> >>  if (IS_ERR(group))
> >>  return PTR_ERR(group);
> >>
> >>  ret = iommu_group_add_device(group, &mdev->dev);
> >>  if (!ret)
> >>  dev_info(&mdev->dev, "MDEV: group_id = %d\n",
> >>   iommu_group_id(group));
> >> - Allocate an aux-domain
> >>  iommu_domain_alloc()
> >> - Attach the aux-domain to the physical device from which the mdev is
> >>created.
> >>  iommu_aux_attach_device()
> >>
> >> In the whole process, an iommu group was allocated for the mdev and an
> >> iommu domain was attached to the group, but the group->domain is left
> >> NULL. As a result, iommu_get_domain_for_dev() doesn't work anymore.
> >>
> >> The iommu_get_domain_for_dev() is a necessary interface for device
> >> drivers that want to support aux-domain. For example,
> >>
> >>  struct iommu_domain *domain;
> >>  struct device *dev = mdev_dev(mdev);
> >>  unsigned long pasid;
> >>
> >>  domain = iommu_get_domain_for_dev(dev);
> >>  if (!domain)
> >>  return -ENODEV;
> >>
> >>  pasid = iommu_aux_get_pasid(domain, dev->parent);  
> > How did we know this was an aux domain? ie. How did we know we could
> > use it with iommu_aux_get_pasid()?  
> 
> Yes. It's a bit confusing if iommu_get_domain_for_dev() is reused here
> for aux-domain.
> 
> > 
> > Why did we assume the parent device is the iommu device for the aux
> > domain?  Should that level of detail be already known by the aux domain?
> > 
> > Nits - The iomu device of an mdev device is found via
> > mdev_get_iommu_device(dev), it should not be assumed to be the parent.
> > The parent of an mdev device is found via mdev_parent_dev(mdev).  
> 
> My bad. The driver should use mdev_get_iommu_device() instead.
> 
> > 
> > The leaps in logic here make me wonder if we should instead be exposing
> > more of an aux domain API rather than blurring the differences between
> > these domains.  Thanks,  
> 
> How about adding the below API?
> 
> /**
>   * iommu_aux_get_domain_for_dev - get aux domain for a device
>   * @dev: the accessory device
>   *
>   * The caller should pass a valid @dev to iommu_aux_attach_device() before
>   * calling this api. Return an attached aux-domain, or NULL otherwise.

That's not necessarily the caller's responsibility, that might happen
elsewhere, this function simply returns an aux domain for the device if
it's attached to one.

>   */
> struct iommu_domain *iommu_aux_get_domain_for_dev(struct device *dev)
> {
>  struct iommu_domain *domain = NULL;
>  struct iommu_group *group;
> 
>  group = iommu_group_get(dev);
>  if (!group)
>  return NULL;
> 
>  if (group->aux_domain_attached)
>  domain = group->domain;
> 
>  iommu_group_put(group);
> 
>  return domain;
> }
> EXPORT_SYMBOL_GPL(iommu_aux_get_domain_for_dev);

For your example use case, this seems more clear to me.  Thanks,

Alex

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] dma-pool: Do not allocate pool memory from CMA

2020-07-08 Thread Christoph Hellwig
On Wed, Jul 08, 2020 at 06:49:39PM +0200, Nicolas Saenz Julienne wrote:
> There is no guarantee to CMA's placement, so allocating a zone specific
> atomic pool from CMA might return memory from a completely different
> memory zone. So stop using it.
> 
> Fixes: c84dc6e68a1d ("dma-pool: add additional coherent pools to map to gfp 
> mask")
> Reported-by: Jeremy Linton 
> Signed-off-by: Nicolas Saenz Julienne 
> ---
> 
> A more costly alternative would be adding an option to
> dma_alloc_from_contiguous() so it fails when the allocation doesn't fall
> in a specific zone.

Which seems like the right thing to do.  But then again for 5.8 I think
something less invasive like your patch might be better.  Waiting for
a few more opinions for now.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] dma-pool: Do not allocate pool memory from CMA

2020-07-08 Thread Nicolas Saenz Julienne
There is no guarantee to CMA's placement, so allocating a zone specific
atomic pool from CMA might return memory from a completely different
memory zone. So stop using it.

Fixes: c84dc6e68a1d ("dma-pool: add additional coherent pools to map to gfp 
mask")
Reported-by: Jeremy Linton 
Signed-off-by: Nicolas Saenz Julienne 
---

A more costly alternative would be adding an option to
dma_alloc_from_contiguous() so it fails when the allocation doesn't fall
in a specific zone.

 kernel/dma/pool.c | 11 ++-
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 8cfa01243ed2..4bc1c46ae6ef 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -6,7 +6,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -69,12 +68,7 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t 
pool_size,
 
do {
pool_size = 1 << (PAGE_SHIFT + order);
-
-   if (dev_get_cma_area(NULL))
-   page = dma_alloc_from_contiguous(NULL, 1 << order,
-order, false);
-   else
-   page = alloc_pages(gfp, order);
+   page = alloc_pages(gfp, order);
} while (!page && order-- > 0);
if (!page)
goto out;
@@ -118,8 +112,7 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t 
pool_size,
dma_common_free_remap(addr, pool_size);
 #endif
 free_page: __maybe_unused
-   if (!dma_release_from_contiguous(NULL, page, 1 << order))
-   __free_pages(page, order);
+   __free_pages(page, order);
 out:
return ret;
 }
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] dma-pool: use single atomic pool for both DMA zones

2020-07-08 Thread Robin Murphy

On 2020-07-08 16:36, Christoph Hellwig wrote:

On Wed, Jul 08, 2020 at 12:35:34PM +0200, Nicolas Saenz Julienne wrote:

Which allows me to switch between ACPI/DT on the machine. In DT mode it
works fine now,


Nice, would that count as a Tested-by from you?


but with ACPI I continue to have failures unless I
disable CMA via cma=0 on the kernel command line.


Yes, I see why, in atomic_pool_expand() memory is allocated from CMA without
checking its correctness. That calls for a separate fix. I'll try to think of
something.


I think we need a dma_coherent_ok for the allocations from the
pool and then fall back to the next better one to get started.  And
yes, CMA is a bit of a mess, that generally needs better checks.


Yeah, another thought that came to mind later is that iommu-dma can use 
pages from any pool regardless of the device's DMA mask, so we could 
stand to be a lot less restrictive in that case too.


Perhaps it is better to just bite the bullet, keep the straightforward 
one-pool-per-zone setup, and implement the dma_coherent_ok() type 
fallback logic. More complexity for dma_alloc_from_pool(), but 
everything else stays nice and simple - lose the assumption that 
dev_to_pool() can work for this and just let callers pass an 
allocation mask directly, and have dma_free_from_pool() simply check all 
available pools.
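
As a rough illustration of that fallback logic (the function name and exact
checks below are mine, not the eventual implementation):

static void *pool_alloc_with_fallback(struct device *dev, size_t size,
				      dma_addr_t *dma_handle)
{
	struct gen_pool *pools[] = { atomic_pool_dma, atomic_pool_dma32,
				     atomic_pool_kernel };
	u64 limit = min_not_zero(dev->coherent_dma_mask, dev->bus_dma_limit);
	int i;

	for (i = 0; i < ARRAY_SIZE(pools); i++) {
		unsigned long vaddr;
		phys_addr_t phys;

		if (!pools[i])
			continue;
		vaddr = gen_pool_alloc(pools[i], size);
		if (!vaddr)
			continue;
		phys = gen_pool_virt_to_phys(pools[i], vaddr);
		/* dma_coherent_ok()-style check: can the device reach this? */
		if (phys_to_dma(dev, phys) + size - 1 <= limit) {
			*dma_handle = phys_to_dma(dev, phys);
			return (void *)vaddr;
		}
		gen_pool_free(pools[i], vaddr, size);
	}
	return NULL;
}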


Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] dma-pool: use single atomic pool for both DMA zones

2020-07-08 Thread Christoph Hellwig
On Wed, Jul 08, 2020 at 06:00:35PM +0200, Nicolas Saenz Julienne wrote:
> On Wed, 2020-07-08 at 17:35 +0200, Christoph Hellwig wrote:
> > On Tue, Jul 07, 2020 at 02:28:04PM +0200, Nicolas Saenz Julienne wrote:
> > > When allocating atomic DMA memory for a device, the dma-pool core
> > > queries __dma_direct_optimal_gfp_mask() to check which atomic pool to
> > > use. It turns out the GFP flag returned is only an optimistic guess.
> > > The pool selected might sometimes live in a zone higher than the
> > > device's view of memory.
> > > 
> > > As there isn't a way to guarantee a mapping between a device's DMA
> > > constraints and correct GFP flags this unifies both DMA atomic pools.
> > > The resulting pool is allocated in the lower DMA zone available, if any,
> > > so as for devices to always get accessible memory while having the
> > > flexibility of using dma_pool_kernel for the non constrained ones.
> > > 
> > > Fixes: c84dc6e68a1d ("dma-pool: add additional coherent pools to map to 
> > > gfp
> > > mask")
> > > Reported-by: Jeremy Linton 
> > > Suggested-by: Robin Murphy 
> > > Signed-off-by: Nicolas Saenz Julienne 
> > 
> > Hmm, this is not what I expected from the previous thread.  I thought
> > we'd just use one dma pool based on runtime availability of the zones..
> 
> I may be misunderstanding you, but isn't that going back to how things used to
> be before pulling in David Rientjes' work? The benefit of having a GFP_KERNEL
> pool is that non-address-constrained devices can get their atomic memory 
> there,
> instead of consuming somewhat scarcer low memory.

Yes, I think we are misunderstanding each other.  I don't want to remove
any pool, just make better runtime decisions when to use them.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] dma-pool: use single atomic pool for both DMA zones

2020-07-08 Thread Nicolas Saenz Julienne
On Wed, 2020-07-08 at 17:35 +0200, Christoph Hellwig wrote:
> On Tue, Jul 07, 2020 at 02:28:04PM +0200, Nicolas Saenz Julienne wrote:
> > When allocating atomic DMA memory for a device, the dma-pool core
> > queries __dma_direct_optimal_gfp_mask() to check which atomic pool to
> > use. It turns out the GFP flag returned is only an optimistic guess.
> > The pool selected might sometimes live in a zone higher than the
> > device's view of memory.
> > 
> > As there isn't a way to guarantee a mapping between a device's DMA
> > constraints and correct GFP flags this unifies both DMA atomic pools.
> > The resulting pool is allocated in the lower DMA zone available, if any,
> > so as for devices to always get accessible memory while having the
> > flexibility of using dma_pool_kernel for the non constrained ones.
> > 
> > Fixes: c84dc6e68a1d ("dma-pool: add additional coherent pools to map to gfp
> > mask")
> > Reported-by: Jeremy Linton 
> > Suggested-by: Robin Murphy 
> > Signed-off-by: Nicolas Saenz Julienne 
> 
> Hmm, this is not what I expected from the previous thread.  I thought
> we'd just use one dma pool based on runtime availability of the zones..

I may be misunderstanding you, but isn't that going back to how things used to
be before pulling in David Rientjes' work? The benefit of having a GFP_KERNEL
pool is that non-address-constrained devices can get their atomic memory there,
instead of consuming somewhat scarcer low memory.



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH 5/5] powerpc: use the generic dma_ops_bypass mode

2020-07-08 Thread Christoph Hellwig
Use the DMA API bypass mechanism for direct window mappings.  This uses
common code and speed up the direct mapping case by avoiding indirect
calls just when not using dma ops at all.  It also fixes a problem where
the sync_* methods were using the bypass check for DMA allocations, but
those are part of the streaming ops.

Note that this patch loses the DMA_ATTR_WEAK_ORDERING override, which
has never been well defined and is only used by a few drivers, which
IIRC never showed up in the typical Cell blade setups that are affected
by the ordering workaround.

Fixes: efd176a04bef ("powerpc/pseries/dma: Allow SWIOTLB")
Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/Kconfig  |  1 +
 arch/powerpc/include/asm/device.h |  5 --
 arch/powerpc/kernel/dma-iommu.c   | 90 ---
 3 files changed, 10 insertions(+), 86 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index e9b091d3587222..be868bfbe76ecf 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -152,6 +152,7 @@ config PPC
select CLONE_BACKWARDS
select DCACHE_WORD_ACCESS   if PPC64 && CPU_LITTLE_ENDIAN
select DMA_OPS  if PPC64
+   select DMA_OPS_BYPASS   if PPC64
select DYNAMIC_FTRACE   if FUNCTION_TRACER
select EDAC_ATOMIC_SCRUB
select EDAC_SUPPORT
diff --git a/arch/powerpc/include/asm/device.h 
b/arch/powerpc/include/asm/device.h
index 266542769e4bd1..452402215e1210 100644
--- a/arch/powerpc/include/asm/device.h
+++ b/arch/powerpc/include/asm/device.h
@@ -18,11 +18,6 @@ struct iommu_table;
  * drivers/macintosh/macio_asic.c
  */
 struct dev_archdata {
-   /*
-* Set to %true if the dma_iommu_ops are requested to use a direct
-* window instead of dynamically mapping memory.
-*/
-   booliommu_bypass : 1;
/*
 * These two used to be a union. However, with the hybrid ops we need
 * both so here we store both a DMA offset for direct mappings and
diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index e486d1d78de288..569fecd7b5b234 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -14,23 +14,6 @@
  * Generic iommu implementation
  */
 
-/*
- * The coherent mask may be smaller than the real mask, check if we can
- * really use a direct window.
- */
-static inline bool dma_iommu_alloc_bypass(struct device *dev)
-{
-   return dev->archdata.iommu_bypass && !iommu_fixed_is_weak &&
-   dma_direct_supported(dev, dev->coherent_dma_mask);
-}
-
-static inline bool dma_iommu_map_bypass(struct device *dev,
-   unsigned long attrs)
-{
-   return dev->archdata.iommu_bypass &&
-   (!iommu_fixed_is_weak || (attrs & DMA_ATTR_WEAK_ORDERING));
-}
-
 /* Allocates a contiguous real buffer and creates mappings over it.
  * Returns the virtual address of the buffer and sets dma_handle
  * to the dma address (mapping) of the first page.
@@ -39,8 +22,6 @@ static void *dma_iommu_alloc_coherent(struct device *dev, 
size_t size,
  dma_addr_t *dma_handle, gfp_t flag,
  unsigned long attrs)
 {
-   if (dma_iommu_alloc_bypass(dev))
-   return dma_direct_alloc(dev, size, dma_handle, flag, attrs);
return iommu_alloc_coherent(dev, get_iommu_table_base(dev), size,
dma_handle, dev->coherent_dma_mask, flag,
dev_to_node(dev));
@@ -50,11 +31,7 @@ static void dma_iommu_free_coherent(struct device *dev, 
size_t size,
void *vaddr, dma_addr_t dma_handle,
unsigned long attrs)
 {
-   if (dma_iommu_alloc_bypass(dev))
-   dma_direct_free(dev, size, vaddr, dma_handle, attrs);
-   else
-   iommu_free_coherent(get_iommu_table_base(dev), size, vaddr,
-   dma_handle);
+   iommu_free_coherent(get_iommu_table_base(dev), size, vaddr, dma_handle);
 }
 
 /* Creates TCEs for a user provided buffer.  The user buffer must be
@@ -67,9 +44,6 @@ static dma_addr_t dma_iommu_map_page(struct device *dev, 
struct page *page,
 enum dma_data_direction direction,
 unsigned long attrs)
 {
-   if (dma_iommu_map_bypass(dev, attrs))
-   return dma_direct_map_page(dev, page, offset, size, direction,
-   attrs);
return iommu_map_page(dev, get_iommu_table_base(dev), page, offset,
  size, dma_get_mask(dev), direction, attrs);
 }
@@ -79,11 +53,8 @@ static void dma_iommu_unmap_page(struct device *dev, 
dma_addr_t dma_handle,
 size_t size, enum dma_data_direction direction,
  

[PATCH 4/5] dma-mapping: add a dma_ops_bypass flag to struct device

2020-07-08 Thread Christoph Hellwig
Several IOMMU drivers have a bypass mode where they can use a direct
mapping if the device's DMA mask is large enough.  Add generic support
to the core dma-mapping code to do that to switch those drivers to
a common solution.
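
As a rough illustration of the intended use (the driver name and the exact
policy below are made up for this sketch), an IOMMU driver's ->dma_supported()
could manage the flag like this:

static int foo_iommu_dma_supported(struct device *dev, u64 mask)
{
	/*
	 * Bypass the IOMMU for streaming DMA whenever the device's mask
	 * covers everything the direct mapping would need.
	 */
	dev->dma_ops_bypass = mask >= dma_direct_get_required_mask(dev);
	return 1;
}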

Signed-off-by: Christoph Hellwig 
---
 include/linux/device.h |  8 +
 kernel/dma/Kconfig |  8 +
 kernel/dma/mapping.c   | 74 +-
 3 files changed, 68 insertions(+), 22 deletions(-)

diff --git a/include/linux/device.h b/include/linux/device.h
index 4c4af98321ebd6..1f71acf37f78d7 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -523,6 +523,11 @@ struct dev_links_info {
  *   sync_state() callback.
  * @dma_coherent: this particular device is dma coherent, even if the
  * architecture supports non-coherent devices.
+ * @dma_ops_bypass: If set to %true then the dma_ops are bypassed for the
+ * streaming DMA operations (->map_* / ->unmap_* / ->sync_*),
+ * and optionally (if the coherent mask is large enough) also
+ * for dma allocations.  This flag is managed by the dma ops
+ * instance from ->dma_supported.
  *
  * At the lowest level, every device in a Linux system is represented by an
  * instance of struct device. The device structure contains the information
@@ -623,6 +628,9 @@ struct device {
 defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
booldma_coherent:1;
 #endif
+#ifdef CONFIG_DMA_OPS_BYPASS
+   booldma_ops_bypass : 1;
+#endif
 };
 
 static inline struct device *kobj_to_dev(struct kobject *kobj)
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 5cfb2428593ac7..f4770fcfa62bb3 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -8,6 +8,14 @@ config HAS_DMA
 config DMA_OPS
bool
 
+#
+# IOMMU drivers that can bypass the IOMMU code and optionally use the direct
+# mapping fast path should select this option and set the dma_ops_bypass
+# flag in struct device where applicable
+#
+config DMA_OPS_BYPASS
+   bool
+
 config NEED_SG_DMA_LENGTH
bool
 
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index b53953024512fe..0d129421e75fc8 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -105,9 +105,35 @@ void *dmam_alloc_attrs(struct device *dev, size_t size, 
dma_addr_t *dma_handle,
 }
 EXPORT_SYMBOL(dmam_alloc_attrs);
 
-static inline bool dma_is_direct(const struct dma_map_ops *ops)
+static bool dma_go_direct(struct device *dev, dma_addr_t mask,
+   const struct dma_map_ops *ops)
 {
-   return likely(!ops);
+   if (likely(!ops))
+   return true;
+#ifdef CONFIG_DMA_OPS_BYPASS
+   if (dev->dma_ops_bypass)
+   return min_not_zero(mask, dev->bus_dma_limit) >=
+   dma_direct_get_required_mask(dev);
+#endif
+   return false;
+}
+
+
+/*
+ * Check if the device uses a direct mapping for streaming DMA operations.
+ * This allows IOMMU drivers to set a bypass mode if the DMA mask is large
+ * enough.
+ */
+static inline bool dma_alloc_direct(struct device *dev,
+   const struct dma_map_ops *ops)
+{
+   return dma_go_direct(dev, dev->coherent_dma_mask, ops);
+}
+
+static inline bool dma_map_direct(struct device *dev,
+   const struct dma_map_ops *ops)
+{
+   return dma_go_direct(dev, *dev->dma_mask, ops);
 }
 
 dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
@@ -118,7 +144,7 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct 
page *page,
dma_addr_t addr;
 
BUG_ON(!valid_dma_direction(dir));
-   if (dma_is_direct(ops))
+   if (dma_map_direct(dev, ops))
addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
else
addr = ops->map_page(dev, page, offset, size, dir, attrs);
@@ -134,7 +160,7 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t 
addr, size_t size,
const struct dma_map_ops *ops = get_dma_ops(dev);
 
BUG_ON(!valid_dma_direction(dir));
-   if (dma_is_direct(ops))
+   if (dma_map_direct(dev, ops))
dma_direct_unmap_page(dev, addr, size, dir, attrs);
else if (ops->unmap_page)
ops->unmap_page(dev, addr, size, dir, attrs);
@@ -153,7 +179,7 @@ int dma_map_sg_attrs(struct device *dev, struct scatterlist 
*sg, int nents,
int ents;
 
BUG_ON(!valid_dma_direction(dir));
-   if (dma_is_direct(ops))
+   if (dma_map_direct(dev, ops))
ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
else
ents = ops->map_sg(dev, sg, nents, dir, attrs);
@@ -172,7 +198,7 @@ void dma_unmap_sg_attrs(struct device *dev, struct 
scatterlist *sg,
 
BUG_ON(!valid_dma_direction(dir));
debug_dma_unmap_sg(dev, sg, nents, dir);
-   if (dma_is_direct(ops))
+   if (dma_map_direct(dev, ops))

Re: [PATCH] dma-pool: use single atomic pool for both DMA zones

2020-07-08 Thread Christoph Hellwig
On Wed, Jul 08, 2020 at 12:35:34PM +0200, Nicolas Saenz Julienne wrote:
> > Which allows me to switch between ACPI/DT on the machine. In DT mode it 
> > works fine now, 
> 
> Nice, would that count as a Tested-by from you?
> 
> > but with ACPI I continue to have failures unless I 
> > disable CMA via cma=0 on the kernel command line. 
> 
> Yes, I see why, in atomic_pool_expand() memory is allocated from CMA without
> checking its correctness. That calls for a separate fix. I'll try to think of
> something.

I think we need a dma_coherent_ok for the allocations from the
pool and then fall back to the next better one to get started.  And
yes, CMA is a bit of a mess, that generally needs better checks.
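
A rough sketch of that idea (assuming a dma_coherent_ok(dev, phys, size) helper
along the lines of the one in kernel/dma/direct.c; illustrative only, not the
actual fix):

static void *pool_alloc_checked(struct device *dev, struct gen_pool *pool,
				size_t size, dma_addr_t *dma_handle)
{
	unsigned long vaddr = gen_pool_alloc(pool, size);
	phys_addr_t phys;

	if (!vaddr)
		return NULL;

	phys = gen_pool_virt_to_phys(pool, vaddr);
	if (!dma_coherent_ok(dev, phys, size)) {
		/* Wrong zone for this device: return the memory and let
		 * the caller fall back to a more restrictive pool. */
		gen_pool_free(pool, vaddr, size);
		return NULL;
	}

	*dma_handle = phys_to_dma(dev, phys);
	return (void *)vaddr;
}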
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 3/5] dma-mapping: make support for dma ops optional

2020-07-08 Thread Christoph Hellwig
Avoid the overhead of the dma ops support for tiny builds that only
use the direct mapping.

Signed-off-by: Christoph Hellwig 
---
 arch/alpha/Kconfig  |  1 +
 arch/arm/Kconfig|  1 +
 arch/ia64/Kconfig   |  1 +
 arch/mips/Kconfig   |  1 +
 arch/parisc/Kconfig |  1 +
 arch/powerpc/Kconfig|  1 +
 arch/s390/Kconfig   |  1 +
 arch/sparc/Kconfig  |  1 +
 arch/x86/Kconfig|  1 +
 drivers/iommu/Kconfig   |  2 ++
 drivers/misc/mic/Kconfig|  1 +
 drivers/vdpa/Kconfig|  1 +
 drivers/xen/Kconfig |  1 +
 include/linux/device.h  |  3 ++-
 include/linux/dma-mapping.h | 12 +++-
 kernel/dma/Kconfig  |  4 
 kernel/dma/Makefile |  3 ++-
 17 files changed, 33 insertions(+), 3 deletions(-)
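
The key idea in the include/linux/dma-mapping.h change, roughly sketched
(simplified, not the verbatim hunk), is that without CONFIG_DMA_OPS the ops
pointer is compiled away and get_dma_ops() collapses to NULL, so every call
falls through to dma-direct:

#ifdef CONFIG_DMA_OPS
static inline const struct dma_map_ops *get_dma_ops(struct device *dev)
{
	if (dev->dma_ops)
		return dev->dma_ops;
	return get_arch_dma_ops(dev->bus);
}
#else /* !CONFIG_DMA_OPS */
static inline const struct dma_map_ops *get_dma_ops(struct device *dev)
{
	return NULL;	/* the direct-mapping fast path is always taken */
}
#endif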

diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
index 10862c5a8c7682..9c5f06e8eb9bc0 100644
--- a/arch/alpha/Kconfig
+++ b/arch/alpha/Kconfig
@@ -7,6 +7,7 @@ config ALPHA
select ARCH_NO_PREEMPT
select ARCH_NO_SG_CHAIN
select ARCH_USE_CMPXCHG_LOCKREF
+   select DMA_OPS if PCI
select FORCE_PCI if !ALPHA_JENSEN
select PCI_DOMAINS if PCI
select PCI_SYSCALL if PCI
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 2ac74904a3ce58..bee35b0187e452 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -41,6 +41,7 @@ config ARM
select CPU_PM if SUSPEND || CPU_IDLE
select DCACHE_WORD_ACCESS if HAVE_EFFICIENT_UNALIGNED_ACCESS
select DMA_DECLARE_COHERENT
+   select DMA_OPS
select DMA_REMAP if MMU
select EDAC_SUPPORT
select EDAC_ATOMIC_SCRUB
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 1fa2fe2ef053f8..5b4ec80bf5863a 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -192,6 +192,7 @@ config IA64_SGI_UV
 
 config IA64_HP_SBA_IOMMU
bool "HP SBA IOMMU support"
+   select DMA_OPS
default y
help
  Say Y here to add support for the SBA IOMMU found on HP zx1 and
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 6fee1a133e9d6a..8a458105e445b6 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -367,6 +367,7 @@ config MACH_JAZZ
select ARC_PROMLIB
select ARCH_MIGHT_HAVE_PC_PARPORT
select ARCH_MIGHT_HAVE_PC_SERIO
+   select DMA_OPS
select FW_ARC
select FW_ARC32
select ARCH_MAY_HAVE_PC_FDC
diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index 8e4c3708773d08..38c1eafc1f1ae9 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -14,6 +14,7 @@ config PARISC
select ARCH_HAS_UBSAN_SANITIZE_ALL
select ARCH_NO_SG_CHAIN
select ARCH_SUPPORTS_MEMORY_FAILURE
+   select DMA_OPS
select RTC_CLASS
select RTC_DRV_GENERIC
select INIT_ALL_POSSIBLE
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 9fa23eb320ff5a..e9b091d3587222 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -151,6 +151,7 @@ config PPC
select BUILDTIME_TABLE_SORT
select CLONE_BACKWARDS
select DCACHE_WORD_ACCESS   if PPC64 && CPU_LITTLE_ENDIAN
+   select DMA_OPS  if PPC64
select DYNAMIC_FTRACE   if FUNCTION_TRACER
select EDAC_ATOMIC_SCRUB
select EDAC_SUPPORT
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index c7d7ede6300c59..687fe23f61cc8d 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -113,6 +113,7 @@ config S390
select ARCH_WANT_IPC_PARSE_VERSION
select BUILDTIME_TABLE_SORT
select CLONE_BACKWARDS2
+   select DMA_OPS if PCI
select DYNAMIC_FTRACE if FUNCTION_TRACER
select GENERIC_CLOCKEVENTS
select GENERIC_CPU_AUTOPROBE
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index 5bf2dc163540fc..5db1faaaee31c8 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -15,6 +15,7 @@ config SPARC
default y
select ARCH_MIGHT_HAVE_PC_PARPORT if SPARC64 && PCI
select ARCH_MIGHT_HAVE_PC_SERIO
+   select DMA_OPS
select OF
select OF_PROMTREE
select HAVE_ASM_MODVERSIONS
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 883da0abf7790c..96ab92754158dd 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -909,6 +909,7 @@ config DMI
 
 config GART_IOMMU
bool "Old AMD GART IOMMU support"
+   select DMA_OPS
select IOMMU_HELPER
select SWIOTLB
depends on X86_64 && PCI && AMD_NB
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 6dc49ed8377a5c..d6ce878a7e8684 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -97,6 +97,7 @@ config OF_IOMMU
 # IOMMU-agnostic DMA-mapping layer
 config IOMMU_DMA
bool
+   select DMA_OPS
select IOMMU_API
select IOMMU_IOVA
select IRQ_MSI_IOMMU
@@ -183,6 +184,7 @@ config DMAR_TABLE
 config INTEL_IOMMU
bool "Support 

Re: [PATCH] dma-pool: use single atomic pool for both DMA zones

2020-07-08 Thread Christoph Hellwig
On Tue, Jul 07, 2020 at 02:28:04PM +0200, Nicolas Saenz Julienne wrote:
> When allocating atomic DMA memory for a device, the dma-pool core
> queries __dma_direct_optimal_gfp_mask() to check which atomic pool to
> use. It turns out the GFP flag returned is only an optimistic guess.
> The pool selected might sometimes live in a zone higher than the
> device's view of memory.
> 
> As there isn't a way to guarantee a mapping between a device's DMA
> constraints and correct GFP flags this unifies both DMA atomic pools.
> The resulting pool is allocated in the lower DMA zone available, if any,
> so as for devices to always get accessible memory while having the
> flexibility of using dma_pool_kernel for the non constrained ones.
> 
> Fixes: c84dc6e68a1d ("dma-pool: add additional coherent pools to map to gfp 
> mask")
> Reported-by: Jeremy Linton 
> Suggested-by: Robin Murphy 
> Signed-off-by: Nicolas Saenz Julienne 

Hmm, this is not what I expected from the previous thread.  I thought
we'd just use one dma pool based on the runtime availability of the zones...
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 2/5] dma-mapping: inline the fast path dma-direct calls

2020-07-08 Thread Christoph Hellwig
Inline the single page map/unmap/sync dma-direct calls into the now
out of line generic wrappers.  This restores the behavior of a single
function call that we had before moving the generic calls out of line.
Besides the dma-mapping callers there are just a few callers in IOMMU
drivers that have a bypass mode, and more of those are going to be
switched to the generic bypass soon.

Signed-off-by: Christoph Hellwig 
---
 include/linux/dma-direct.h | 92 --
 kernel/dma/direct.c| 65 ---
 2 files changed, 69 insertions(+), 88 deletions(-)

diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 78dc3524adf880..dbb19dd9869054 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -1,10 +1,16 @@
 /* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Internals of the DMA direct mapping implementation.  Only for use by the
+ * DMA mapping code and IOMMU drivers.
+ */
 #ifndef _LINUX_DMA_DIRECT_H
 #define _LINUX_DMA_DIRECT_H 1
 
 #include 
+#include 
 #include  /* for min_low_pfn */
 #include 
+#include 
 
 extern unsigned int zone_dma_bits;
 
@@ -86,25 +92,17 @@ int dma_direct_mmap(struct device *dev, struct 
vm_area_struct *vma,
unsigned long attrs);
 int dma_direct_supported(struct device *dev, u64 mask);
 bool dma_direct_need_sync(struct device *dev, dma_addr_t dma_addr);
-dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
-   unsigned long offset, size_t size, enum dma_data_direction dir,
-   unsigned long attrs);
 int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
enum dma_data_direction dir, unsigned long attrs);
 dma_addr_t dma_direct_map_resource(struct device *dev, phys_addr_t paddr,
size_t size, enum dma_data_direction dir, unsigned long attrs);
+size_t dma_direct_max_mapping_size(struct device *dev);
 
 #if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
 defined(CONFIG_SWIOTLB)
-void dma_direct_sync_single_for_device(struct device *dev,
-   dma_addr_t addr, size_t size, enum dma_data_direction dir);
-void dma_direct_sync_sg_for_device(struct device *dev,
-   struct scatterlist *sgl, int nents, enum dma_data_direction 
dir);
+void dma_direct_sync_sg_for_device(struct device *dev, struct scatterlist *sgl,
+   int nents, enum dma_data_direction dir);
 #else
-static inline void dma_direct_sync_single_for_device(struct device *dev,
-   dma_addr_t addr, size_t size, enum dma_data_direction dir)
-{
-}
 static inline void dma_direct_sync_sg_for_device(struct device *dev,
struct scatterlist *sgl, int nents, enum dma_data_direction dir)
 {
@@ -114,34 +112,82 @@ static inline void dma_direct_sync_sg_for_device(struct 
device *dev,
 #if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \
 defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL) || \
 defined(CONFIG_SWIOTLB)
-void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
-   size_t size, enum dma_data_direction dir, unsigned long attrs);
 void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
int nents, enum dma_data_direction dir, unsigned long attrs);
-void dma_direct_sync_single_for_cpu(struct device *dev,
-   dma_addr_t addr, size_t size, enum dma_data_direction dir);
 void dma_direct_sync_sg_for_cpu(struct device *dev,
struct scatterlist *sgl, int nents, enum dma_data_direction 
dir);
 #else
-static inline void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
-   size_t size, enum dma_data_direction dir, unsigned long attrs)
-{
-}
 static inline void dma_direct_unmap_sg(struct device *dev,
struct scatterlist *sgl, int nents, enum dma_data_direction dir,
unsigned long attrs)
 {
 }
+static inline void dma_direct_sync_sg_for_cpu(struct device *dev,
+   struct scatterlist *sgl, int nents, enum dma_data_direction dir)
+{
+}
+#endif
+
+static inline void dma_direct_sync_single_for_device(struct device *dev,
+   dma_addr_t addr, size_t size, enum dma_data_direction dir)
+{
+   phys_addr_t paddr = dma_to_phys(dev, addr);
+
+   if (unlikely(is_swiotlb_buffer(paddr)))
+   swiotlb_tbl_sync_single(dev, paddr, size, dir, SYNC_FOR_DEVICE);
+
+   if (!dev_is_dma_coherent(dev))
+   arch_sync_dma_for_device(paddr, size, dir);
+}
+
 static inline void dma_direct_sync_single_for_cpu(struct device *dev,
dma_addr_t addr, size_t size, enum dma_data_direction dir)
 {
+   phys_addr_t paddr = dma_to_phys(dev, addr);
+
+   if (!dev_is_dma_coherent(dev)) {
+   arch_sync_dma_for_cpu(paddr, size, dir);
+   arch_sync_dma_for_cpu_all();
+   }
+
+   if (unlikely(is_swiotlb_buffer(paddr)))
+   swiotlb_tbl_sync_single(dev, paddr, size, dir, 

[PATCH 1/5] dma-mapping: move the remaining DMA API calls out of line

2020-07-08 Thread Christoph Hellwig
For a long time the DMA API has been implemented inline in dma-mapping.h,
but the function bodies can be quite large.  Move them all out of line.

This also removes all the dma_direct_* exports as those are just
implementation details and should never be used by drivers directly.

Signed-off-by: Christoph Hellwig 
---
 include/linux/dma-direct.h  |  58 +
 include/linux/dma-mapping.h | 247 
 kernel/dma/direct.c |   9 --
 kernel/dma/mapping.c| 164 
 4 files changed, 244 insertions(+), 234 deletions(-)

diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 5184735a0fe8eb..78dc3524adf880 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -86,4 +86,62 @@ int dma_direct_mmap(struct device *dev, struct 
vm_area_struct *vma,
unsigned long attrs);
 int dma_direct_supported(struct device *dev, u64 mask);
 bool dma_direct_need_sync(struct device *dev, dma_addr_t dma_addr);
+dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
+   unsigned long offset, size_t size, enum dma_data_direction dir,
+   unsigned long attrs);
+int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
+   enum dma_data_direction dir, unsigned long attrs);
+dma_addr_t dma_direct_map_resource(struct device *dev, phys_addr_t paddr,
+   size_t size, enum dma_data_direction dir, unsigned long attrs);
+
+#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
+defined(CONFIG_SWIOTLB)
+void dma_direct_sync_single_for_device(struct device *dev,
+   dma_addr_t addr, size_t size, enum dma_data_direction dir);
+void dma_direct_sync_sg_for_device(struct device *dev,
+   struct scatterlist *sgl, int nents, enum dma_data_direction 
dir);
+#else
+static inline void dma_direct_sync_single_for_device(struct device *dev,
+   dma_addr_t addr, size_t size, enum dma_data_direction dir)
+{
+}
+static inline void dma_direct_sync_sg_for_device(struct device *dev,
+   struct scatterlist *sgl, int nents, enum dma_data_direction dir)
+{
+}
+#endif
+
+#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \
+defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL) || \
+defined(CONFIG_SWIOTLB)
+void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
+   size_t size, enum dma_data_direction dir, unsigned long attrs);
+void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
+   int nents, enum dma_data_direction dir, unsigned long attrs);
+void dma_direct_sync_single_for_cpu(struct device *dev,
+   dma_addr_t addr, size_t size, enum dma_data_direction dir);
+void dma_direct_sync_sg_for_cpu(struct device *dev,
+   struct scatterlist *sgl, int nents, enum dma_data_direction 
dir);
+#else
+static inline void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
+   size_t size, enum dma_data_direction dir, unsigned long attrs)
+{
+}
+static inline void dma_direct_unmap_sg(struct device *dev,
+   struct scatterlist *sgl, int nents, enum dma_data_direction dir,
+   unsigned long attrs)
+{
+}
+static inline void dma_direct_sync_single_for_cpu(struct device *dev,
+   dma_addr_t addr, size_t size, enum dma_data_direction dir)
+{
+}
+static inline void dma_direct_sync_sg_for_cpu(struct device *dev,
+   struct scatterlist *sgl, int nents, enum dma_data_direction dir)
+{
+}
+#endif
+
+size_t dma_direct_max_mapping_size(struct device *dev);
+
 #endif /* _LINUX_DMA_DIRECT_H */
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index a33ed3954ed465..bd0a6f5ee44581 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -188,73 +188,6 @@ static inline int dma_mmap_from_global_coherent(struct 
vm_area_struct *vma,
 }
 #endif /* CONFIG_DMA_DECLARE_COHERENT */
 
-static inline bool dma_is_direct(const struct dma_map_ops *ops)
-{
-   return likely(!ops);
-}
-
-/*
- * All the dma_direct_* declarations are here just for the indirect call 
bypass,
- * and must not be used directly drivers!
- */
-dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
-   unsigned long offset, size_t size, enum dma_data_direction dir,
-   unsigned long attrs);
-int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
-   enum dma_data_direction dir, unsigned long attrs);
-dma_addr_t dma_direct_map_resource(struct device *dev, phys_addr_t paddr,
-   size_t size, enum dma_data_direction dir, unsigned long attrs);
-
-#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
-defined(CONFIG_SWIOTLB)
-void dma_direct_sync_single_for_device(struct device *dev,
-   dma_addr_t addr, size_t size, enum dma_data_direction dir);
-void dma_direct_sync_sg_for_device(struct device *dev,

generic DMA bypass flag v4

2020-07-08 Thread Christoph Hellwig
Hi all,

I've recently been chatting with Lu about using dma-iommu and
per-device DMA ops in the intel IOMMU driver, and one missing feature
in dma-iommu is a bypass mode where the direct mapping is used even
when an iommu is attached to improve performance.  The powerpc
code already has a similar mode, so I'd like to move it to the core
DMA mapping code.  As part of that I noticed that the current
powerpc code has a little bug in that it used the wrong check in the
dma_sync_* routines to see if the direct mapping code is used.
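
As a rough illustration of what this enables on the driver side (a sketch
under the assumption that the driver's Kconfig entry selects the new
DMA_OPS_BYPASS symbol; the function name below is made up):

static void example_configure_bypass(struct device *dev, bool all_ram_addressable)
{
	/*
	 * Keep the per-device dma_ops installed for the corner cases, but
	 * let the core take the dma-direct fast path whenever the device's
	 * DMA mask covers all of memory; the mask check itself happens per
	 * call in the core mapping helpers.
	 */
	dev->dma_ops_bypass = all_ram_addressable;
}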

These two patches just add the generic code and move powerpc over,
the intel IOMMU bits will require a separate discussion.

The x86 AMD Gart code also has a bypass mode, but it is a lot
stranger, so I'm not going to touch it for now.

Note that as-is this breaks the XSK buffer pool, which unfortunately
poked directly into DMA internals.  A fix for that is already queued
up in the netdev tree.

Jesper and XDP gang: this should not regress any performance as
the dma-direct calls are now inlined into the out of line DMA mapping
calls.  But if you can verify the performance numbers that would be
greatly appreciated.

A git tree is available here:

git://git.infradead.org/users/hch/misc.git dma-bypass.4

Gitweb:

git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-bypass.4


Changes since v3:
 - add config options for the dma ops bypass and dma ops themselves
   to not increase the size of tinyconfig builds

Changes since v2:
 - move the dma mapping helpers out of line
 - check for possible direct mappings using the dma mask

Changes since v1:
 - rebased to the current dma-mapping-for-next tree


Diffstat:
 arch/alpha/Kconfig|1 
 arch/arm/Kconfig  |1 
 arch/ia64/Kconfig |1 
 arch/mips/Kconfig |1 
 arch/parisc/Kconfig   |1 
 arch/powerpc/Kconfig  |2 
 arch/powerpc/include/asm/device.h |5 
 arch/powerpc/kernel/dma-iommu.c   |   90 +
 arch/s390/Kconfig |1 
 arch/sparc/Kconfig|1 
 arch/x86/Kconfig  |1 
 drivers/iommu/Kconfig |2 
 drivers/misc/mic/Kconfig  |1 
 drivers/vdpa/Kconfig  |1 
 drivers/xen/Kconfig   |1 
 include/linux/device.h|   11 +
 include/linux/dma-direct.h|  104 +++
 include/linux/dma-mapping.h   |  251 --
 kernel/dma/Kconfig|   12 +
 kernel/dma/Makefile   |3 
 kernel/dma/direct.c   |   74 ---
 kernel/dma/mapping.c  |  214 ++--
 22 files changed, 385 insertions(+), 394 deletions(-)
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 1/5] docs: IOMMU user API

2020-07-08 Thread Jacob Pan
On Wed, 8 Jul 2020 10:07:13 +0800
Lu Baolu  wrote:

> Hi,
> 
> On 7/8/20 7:43 AM, Jacob Pan wrote:
> > +For UAPIs that are shared with in-kernel users, a wrapper function
> > +is provided to distinguish the callers. For example,
> > +
> > +Userspace caller ::
> > +
> > +  int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct
> > device *dev,
> > +  void __user *udata)
> > +
> > +In-kernel caller ::
> > +
> > +  int __iommu_sva_unbind_gpasid(struct iommu_domain *domain,
> > struct device *dev,
> > +  struct iommu_gpasid_bind_data *data)  
> 
> iommu_page_response() may have the same needs. Can we reach an
> agreement on the naming rules?
> 
Yes iommu_page_response() also has to deal with in-kernel and UAPI
callers. I left it out because I thought the current VFIO and SVA common
code is not ready for PRQ yet, though I might be reading old news :) argsz
needs to be handled as well.

Perhaps we can wait until this set is settled? Do you have a suggestion
on the naming rules?
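
Purely as an illustration of how that convention could carry over to the page
response path (hypothetical prototypes, nothing here is settled):

/* Userspace (UAPI) caller: copies from user space and validates argsz */
int iommu_page_response(struct device *dev, void __user *uinfo);

/* In-kernel caller: passes an already-validated structure */
int __iommu_page_response(struct device *dev,
			  struct iommu_page_response *info);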

> Best regards,
> baolu

[Jacob Pan]
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 1/5] docs: IOMMU user API

2020-07-08 Thread Jacob Pan
Hi Alex,

All points below are taken. I have sent out v4 that addresses these
feedbacks. Thanks for the review.

Jacob

On Tue, 7 Jul 2020 15:40:54 -0600
Alex Williamson  wrote:

> On Mon, 29 Jun 2020 16:05:18 -0700
> Jacob Pan  wrote:
> 
> > On Fri, 26 Jun 2020 16:19:23 -0600
> > Alex Williamson  wrote:
> >   
> > > On Tue, 23 Jun 2020 10:03:53 -0700
> > > Jacob Pan  wrote:
> > > 
> > > > IOMMU UAPI is newly introduced to support communications between
> > > > guest virtual IOMMU and host IOMMU. There has been lots of
> > > > discussions on how it should work with VFIO UAPI and userspace
> > > > in general.
> > > > 
> > > > This document is intended to clarify the UAPI design and usage.
> > > > The mechanics of how future extensions should be achieved are
> > > > also covered in this documentation.
> > > > 
> > > > Signed-off-by: Liu Yi L 
> > > > Signed-off-by: Jacob Pan 
> > > > ---
> > > >  Documentation/userspace-api/iommu.rst | 244
> > > > ++ 1 file changed, 244
> > > > insertions(+) create mode 100644
> > > > Documentation/userspace-api/iommu.rst
> > > > 
> > > > diff --git a/Documentation/userspace-api/iommu.rst
> > > > b/Documentation/userspace-api/iommu.rst new file mode 100644
> > > > index ..f9e4ed90a413
> > > > --- /dev/null
> > > > +++ b/Documentation/userspace-api/iommu.rst
> > > > @@ -0,0 +1,244 @@
> > > > +.. SPDX-License-Identifier: GPL-2.0
> > > > +.. iommu:
> > > > +
> > > > +===================
> > > > +IOMMU Userspace API
> > > > +===================
> > > > +
> > > > +IOMMU UAPI is used for virtualization cases where
> > > > communications are +needed between physical and virtual IOMMU
> > > > drivers. For native +usage, IOMMU is a system device which does
> > > > not need to communicate +with user space directly.
> > > > +
> > > > +The primary use cases are guest Shared Virtual Address (SVA)
> > > > and +guest IO virtual address (IOVA), wherein a virtual IOMMU
> > > > (vIOMMU) is +required to communicate with the physical IOMMU in
> > > > the host. +
> > > > +.. contents:: :local:
> > > > +
> > > > +Functionalities
> > > > +===
> > > > +Communications of user and kernel involve both directions. The
> > > > +supported user-kernel APIs are as follows:
> > > > +
> > > > +1. Alloc/Free PASID
> > > > +2. Bind/unbind guest PASID (e.g. Intel VT-d)
> > > > +3. Bind/unbind guest PASID table (e.g. ARM sMMU)
> > > > +4. Invalidate IOMMU caches
> > > > +5. Service page requests
> > > > +
> > > > +Requirements
> > > > +
> > > > +The IOMMU UAPIs are generic and extensible to meet the
> > > > following +requirements:
> > > > +
> > > > +1. Emulated and para-virtualised vIOMMUs
> > > > +2. Multiple vendors (Intel VT-d, ARM sMMU, etc.)
> > > > +3. Extensions to the UAPI shall not break existing user space
> > > > +
> > > > +Interfaces
> > > > +==
> > > > +Although the data structures defined in IOMMU UAPI are
> > > > self-contained, +there is no user API functions introduced.
> > > > Instead, IOMMU UAPI is +designed to work with existing user
> > > > driver frameworks such as VFIO. +
> > > > +Extension Rules & Precautions
> > > > +-
> > > > +When IOMMU UAPI gets extended, the data structures can *only*
> > > > be +modified in two ways:
> > > > +
> > > > +1. Adding new fields by re-purposing the padding[] field. No
> > > > size change. +2. Adding new union members at the end. May
> > > > increase in size. +
> > > > +No new fields can be added *after* the variable sized union in
> > > > that it +will break backward compatibility when offset moves. In
> > > > both cases, a +new flag must be accompanied with a new field
> > > > such that the IOMMU +driver can process the data based on the
> > > > new flag. Version field is +only reserved for the unlikely
> > > > event of UAPI upgrade at its entirety. +
> > > > +It's *always* the caller's responsibility to indicate the size
> > > > of the +structure passed by setting argsz appropriately.
> > > > +Though at the same time, argsz is user provided data which is
> > > > not +trusted. The argsz field allows the user to indicate how
> > > > much data +they're providing, it's still the kernel's
> > > > responsibility to validate +whether it's correct and sufficient
> > > > for the requested operation. +
> > > > +Compatibility Checking
> > > > +--
> > > > +When IOMMU UAPI extension results in size increase, user such
> > > > as VFIO +has to handle the following cases:
> > > > +
> > > > +1. User and kernel has exact size match
> > > > +2. An older user with older kernel header (smaller UAPI size)
> > > > running on a
> > > > +   newer kernel (larger UAPI size)
> > > > +3. A newer user with newer kernel header (larger UAPI size)
> > > > running
> > > > +   on an older kernel.
> > > > +4. A malicious/misbehaving user pass illegal/invalid size but
> > > > within
> > > > +   range. The data may contain 

Re: [PATCH v4 3/5] iommu/uapi: Use named union for user data

2020-07-08 Thread Jacob Pan
On Wed, 8 Jul 2020 10:17:57 +0800
Lu Baolu  wrote:

> Hi Jacob,
> 
> On 7/8/20 7:43 AM, Jacob Pan wrote:
> > IOMMU UAPI data size is filled by the user space which must be
> > validated by the kernel. To ensure backward compatibility, user
> > data can only be extended by either re-purpose padding bytes or
> > extend the variable sized union at the end. No size change is
> > allowed before the union. Therefore, the minimum size is the offset
> > of the union.
> > 
> > To use offsetof() on the union, we must make it named.
> > 
> > Link:https://lkml.org/lkml/2020/6/11/834  
> 
> Please use lore.kernel.org links.
> 
OK. will do.
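
A minimal sketch of the pattern from the quoted commit message (struct and
field names here are invented for illustration):

struct example_uapi_data {
	__u32	argsz;
	__u32	flags;
	__u8	padding[8];
	union {			/* must be *named* so offsetof() can reach it */
		struct example_format_a a;
		struct example_format_b b;
	} vendor;
};

/* The minimum acceptable argsz is everything before the variable-sized union. */
#define EXAMPLE_UAPI_DATA_MIN_SZ	offsetof(struct example_uapi_data, vendor)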

> Best regards,
> baolu

[Jacob Pan]
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] dma-pool: use single atomic pool for both DMA zones

2020-07-08 Thread Jeremy Linton

Hi,

On 7/8/20 5:35 AM, Nicolas Saenz Julienne wrote:
> Hi Jim,
> 
> On Tue, 2020-07-07 at 17:08 -0500, Jeremy Linton wrote:
>> Hi,
>>
>> I spun this up on my 8G model using the PFTF firmware from:
>>
>> https://github.com/pftf/RPi4/releases
>>
>> Which allows me to switch between ACPI/DT on the machine. In DT mode it
>> works fine now,
> 
> Nice, would that count as a Tested-by from you?

If it worked... :)

>> but with ACPI I continue to have failures unless I
>> disable CMA via cma=0 on the kernel command line.
> 
> Yes, I see why, in atomic_pool_expand() memory is allocated from CMA without
> checking its correctness. That calls for a separate fix. I'll try to think of
> something.
> 
>> I think that is because
>>
>> using DT:
>>
>> [0.00] Reserved memory: created CMA memory pool at
>> 0x3740, size 64 MiB
>>
>> using ACPI:
>> [0.00] cma: Reserved 64 MiB at 0xf800
>>
>> Which is AFAIK because the default arm64 CMA allocation is just below
>> the arm64_dma32_phys_limit.
> 
> As I'm sure you know, we fix the CMA address through DT, isn't that possible
> through ACPI?

Well there isn't a linux specific cma location property in ACPI. There
are various ways to infer the information, like looking for the lowest
_DMA() range and using that to lower the arm64_dma32_phys_limit. OTOH,
as it stands I don't think that information is available early enough to
setup the cma pool.

But as you mention the atomic pool code is allocating from CMA under the
assumption that it's going to be below the GFP_DMA range, which might not
be generally true (due to lack of DT cma properties too?).

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH net] xsk: remove cheap_dma optimization

2020-07-08 Thread Robin Murphy

On 2020-07-08 07:50, Christoph Hellwig wrote:

On Mon, Jun 29, 2020 at 04:41:16PM +0100, Robin Murphy wrote:

On 2020-06-28 18:16, Björn Töpel wrote:


On 2020-06-27 09:04, Christoph Hellwig wrote:

On Sat, Jun 27, 2020 at 01:00:19AM +0200, Daniel Borkmann wrote:

Given there is roughly a ~5 weeks window at max where this removal could
still be applied in the worst case, could we come up with a fix /
proposal
first that moves this into the DMA mapping core? If there is something
that
can be agreed upon by all parties, then we could avoid re-adding the 9%
slowdown. :/


I'd rather turn it upside down - this abuse of the internals blocks work
that has basically just missed the previous window and I'm not going
to wait weeks to sort out the API misuse.  But we can add optimizations
back later if we find a sane way.



I'm not super excited about the performance loss, but I do get
Christoph's frustration about gutting the DMA API making it harder for
DMA people to get work done. Lets try to solve this properly using
proper DMA APIs.



That being said I really can't see how this would make so much of a
difference.  What architecture and what dma_ops are you using for
those measurements?  What is the workload?



The 9% is for an AF_XDP (Fast raw Ethernet socket. Think AF_PACKET, but
faster.) benchmark: receive the packet from the NIC, and drop it. The DMA
syncs stand out in the perf top:

  28.63%  [kernel]                   [k] i40e_clean_rx_irq_zc
  17.12%  [kernel]                   [k] xp_alloc
   8.80%  [kernel]                   [k] __xsk_rcv_zc
   7.69%  [kernel]                   [k] xdp_do_redirect
   5.35%  bpf_prog_992d9ddc835e5629  [k] bpf_prog_992d9ddc835e5629
   4.77%  [kernel]                   [k] xsk_rcv.part.0
   4.07%  [kernel]                   [k] __xsk_map_redirect
   3.80%  [kernel]                   [k] dma_direct_sync_single_for_cpu
   3.03%  [kernel]                   [k] dma_direct_sync_single_for_device
   2.76%  [kernel]                   [k] i40e_alloc_rx_buffers_zc
   1.83%  [kernel]                   [k] xsk_flush
...

For this benchmark the dma_ops are NULL (dma_is_direct() == true), and
the main issue is that SWIOTLB is now unconditionally enabled [1] for
x86, and for each sync we have to check that if is_swiotlb_buffer()
which involves a some costly indirection.

That was pretty much what my hack avoided. Instead we did all the checks
upfront, since AF_XDP has long-term DMA mappings, and just set a flag
for that.

Avoiding the whole "is this address swiotlb" in
dma_direct_sync_single_for_{cpu, device]() per-packet
would help a lot.


I'm pretty sure that's one of the things we hope to achieve with the
generic bypass flag :)


Somewhat related to the DMA API; It would have performance benefits for
AF_XDP if the DMA range of the mapped memory was linear, i.e. by IOMMU
utilization. I've started hacking a thing a little bit, but it would be
nice if such API was part of the mapping core.

Input: array of pages Output: array of dma addrs (and obviously dev,
flags and such)

For non-IOMMU len(array of pages) == len(array of dma addrs)
For best-case IOMMU len(array of dma addrs) == 1 (large linear space)

But that's for later. :-)


FWIW you will typically get that behaviour from IOMMU-based implementations
of dma_map_sg() right now, although it's not strictly guaranteed. If you
can weather some additional setup cost of calling
sg_alloc_table_from_pages() plus walking the list after mapping to test
whether you did get a contiguous result, you could start taking advantage
of it as some of the dma-buf code in DRM and v4l2 does already (although
those cases actually treat it as a strict dependency rather than an
optimisation).


Yikes.


Heh, consider it as iommu_dma_alloc_remap() and 
vb2_dc_get_contiguous_size() having a beautiful baby ;)
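
For reference, a rough sketch of the dma_map_sg()-based pattern described
above, i.e. map an array of pages and then check whether the IOMMU handed
back a single contiguous DMA range (names and error handling are illustrative
only):

static int example_map_maybe_contig(struct device *dev, struct page **pages,
				    unsigned int n_pages, struct sg_table *sgt,
				    dma_addr_t *dma_base)
{
	struct scatterlist *sg;
	dma_addr_t expected;
	int i, nents, ret;

	ret = sg_alloc_table_from_pages(sgt, pages, n_pages, 0,
					(unsigned long)n_pages * PAGE_SIZE,
					GFP_KERNEL);
	if (ret)
		return ret;

	nents = dma_map_sg(dev, sgt->sgl, sgt->orig_nents, DMA_FROM_DEVICE);
	if (!nents) {
		sg_free_table(sgt);
		return -ENOMEM;
	}

	/* Contiguous iff every mapped entry starts where the previous ended. */
	expected = sg_dma_address(sgt->sgl);
	*dma_base = expected;
	for_each_sg(sgt->sgl, sg, nents, i) {
		if (sg_dma_address(sg) != expected)
			return 1;	/* mapped, but not one linear range */
		expected += sg_dma_len(sg);
	}
	return 0;			/* one linear DMA range */
}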



I'm inclined to agree that if we're going to see more of these cases, a new
API call that did formally guarantee a DMA-contiguous mapping (either via
IOMMU or bounce buffering) or failure might indeed be handy.


I was planning on adding a dma-level API to add more pages to an
IOMMU batch, but was waiting for at least the intel IOMMU driver to be
converted to the dma-iommu code (and preferably arm32 and s390 as well).


FWIW I did finally get round to having an initial crack at arm32 
recently[1] - of course it needs significant rework already for all the 
IOMMU API motion, and I still need to attempt to test any of it (at 
least I do have a couple of 32-bit boards here), but with 

Re: [PATCH 0/4] iommu/arm-smmu-v3: Improve cmdq lock efficiency

2020-07-08 Thread John Garry

On 22/06/2020 18:28, John Garry wrote:

Hi, Can you guys let me know if this is on the radar at all?

I have been talking about this performance issue since Jan, and not 
getting anything really.


thanks


As mentioned in [0], the CPU may consume many cycles processing
arm_smmu_cmdq_issue_cmdlist(). One issue we find is the cmpxchg() loop to
get space on the queue takes approx 25% of the cycles for this function.

This series removes that cmpxchg().

For my NVMe test with 3x NVMe SSDs, I'm getting a ~24% throughput
increase:
Before: 1310 IOPs
After: 1630 IOPs

I also have a test harness to check the rate of DMA map+unmaps we can
achieve:

CPU count   32  64  128
Before: 63187   19418   10169
After:  93287   44789   15862

(unit is map+unmaps per CPU per second)

[0] 
https://lore.kernel.org/linux-iommu/b926444035e5e2439431908e3842afd24b8...@dggemi525-mbs.china.huawei.com/T/#ma02e301c38c3e94b7725e685757c27e39c7cbde3

John Garry (4):
   iommu/arm-smmu-v3: Fix trivial typo
   iommu/arm-smmu-v3: Calculate bits for prod and owner
   iommu/arm-smmu-v3: Always issue a CMD_SYNC per batch
   iommu/arm-smmu-v3: Remove cmpxchg() in arm_smmu_cmdq_issue_cmdlist()

  drivers/iommu/arm-smmu-v3.c | 233 +++-
  1 file changed, 151 insertions(+), 82 deletions(-)



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v10 5/5] iommu/arm-smmu: Add global/context fault implementation hooks

2020-07-08 Thread Jon Hunter


On 08/07/2020 06:00, Krishna Reddy wrote:
> Add global/context fault hooks to allow vendor specific implementations
> to override the default fault interrupt handlers.
> 
> Update NVIDIA implementation to override the default global/context fault
> interrupt handlers and handle interrupts across the two ARM MMU-500s that
> are programmed identically.
> 
> Signed-off-by: Krishna Reddy 
> ---
>  drivers/iommu/arm-smmu-nvidia.c | 99 +
>  drivers/iommu/arm-smmu.c| 17 +-
>  drivers/iommu/arm-smmu.h|  3 +
>  3 files changed, 117 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
> index 2f55e5793d34..31368057e9be 100644
> --- a/drivers/iommu/arm-smmu-nvidia.c
> +++ b/drivers/iommu/arm-smmu-nvidia.c
> @@ -127,6 +127,103 @@ static int nvidia_smmu_reset(struct arm_smmu_device 
> *smmu)
>   return 0;
>  }
>  
> +static irqreturn_t nvidia_smmu_global_fault_inst(int irq,
> +  struct arm_smmu_device *smmu,
> +  int inst)
> +{
> + u32 gfsr, gfsynr0, gfsynr1, gfsynr2;
> + void __iomem *gr0_base = nvidia_smmu_page(smmu, inst, 0);
> +
> + gfsr = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSR);
> + if (!gfsr)
> + return IRQ_NONE;
> +
> + gfsynr0 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR0);
> + gfsynr1 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR1);
> + gfsynr2 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR2);
> +
> + dev_err_ratelimited(smmu->dev,
> + "Unexpected global fault, this could be serious\n");
> + dev_err_ratelimited(smmu->dev,
> + "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, 
> GFSYNR2 0x%08x\n",
> + gfsr, gfsynr0, gfsynr1, gfsynr2);
> +
> + writel_relaxed(gfsr, gr0_base + ARM_SMMU_GR0_sGFSR);
> + return IRQ_HANDLED;
> +}
> +
> +static irqreturn_t nvidia_smmu_global_fault(int irq, void *dev)
> +{
> + unsigned int inst;
> + irqreturn_t ret = IRQ_NONE;
> + struct arm_smmu_device *smmu = dev;
> +
> + for (inst = 0; inst < NUM_SMMU_INSTANCES; inst++) {
> + irqreturn_t irq_ret;
> +
> + irq_ret = nvidia_smmu_global_fault_inst(irq, smmu, inst);
> + if (irq_ret == IRQ_HANDLED)
> + ret = IRQ_HANDLED;
> + }
> +
> + return ret;
> +}
> +
> +static irqreturn_t nvidia_smmu_context_fault_bank(int irq,
> +   struct arm_smmu_device *smmu,
> +   int idx, int inst)
> +{
> + u32 fsr, fsynr, cbfrsynra;
> + unsigned long iova;
> + void __iomem *gr1_base = nvidia_smmu_page(smmu, inst, 1);
> + void __iomem *cb_base = nvidia_smmu_page(smmu, inst, smmu->numpage + 
> idx);
> +
> + fsr = readl_relaxed(cb_base + ARM_SMMU_CB_FSR);
> + if (!(fsr & ARM_SMMU_FSR_FAULT))
> + return IRQ_NONE;
> +
> + fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0);
> + iova = readq_relaxed(cb_base + ARM_SMMU_CB_FAR);
> + cbfrsynra = readl_relaxed(gr1_base + ARM_SMMU_GR1_CBFRSYNRA(idx));
> +
> + dev_err_ratelimited(smmu->dev,
> + "Unhandled context fault: fsr=0x%x, iova=0x%08lx, 
> fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
> + fsr, iova, fsynr, cbfrsynra, idx);
> +
> + writel_relaxed(fsr, cb_base + ARM_SMMU_CB_FSR);
> + return IRQ_HANDLED;
> +}
> +
> +static irqreturn_t nvidia_smmu_context_fault(int irq, void *dev)
> +{
> + int idx;
> + unsigned int inst;
> + irqreturn_t ret = IRQ_NONE;
> + struct arm_smmu_device *smmu;
> + struct iommu_domain *domain = dev;
> + struct arm_smmu_domain *smmu_domain;
> +
> + smmu_domain = container_of(domain, struct arm_smmu_domain, domain);
> + smmu = smmu_domain->smmu;
> +
> + for (inst = 0; inst < NUM_SMMU_INSTANCES; inst++) {
> + irqreturn_t irq_ret;
> +
> + /*
> +  * Interrupt line is shared between all contexts.
> +  * Check for faults across all contexts.
> +  */
> + for (idx = 0; idx < smmu->num_context_banks; idx++) {
> + irq_ret = nvidia_smmu_context_fault_bank(irq, smmu,
> +  idx, inst);
> + if (irq_ret == IRQ_HANDLED)
> + ret = IRQ_HANDLED;
> + }
> + }
> +
> + return ret;
> +}
> +
>  static const struct arm_smmu_impl nvidia_smmu_impl = {
>   .read_reg = nvidia_smmu_read_reg,
>   .write_reg = nvidia_smmu_write_reg,
> @@ -134,6 +231,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = {
>   .write_reg64 = nvidia_smmu_write_reg64,
>   .reset = nvidia_smmu_reset,
>   .tlb_sync = nvidia_smmu_tlb_sync,
> + .global_fault = 

Re: [PATCH v10 4/5] dt-bindings: arm-smmu: add binding for Tegra194 SMMU

2020-07-08 Thread Jon Hunter


On 08/07/2020 06:00, Krishna Reddy wrote:
> Add binding for NVIDIA's Tegra194 SoC SMMU.
> 
> Signed-off-by: Krishna Reddy 
> ---
>  .../devicetree/bindings/iommu/arm,smmu.yaml| 18 ++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml 
> b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
> index d7ceb4c34423..ac1f526c3424 100644
> --- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
> +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
> @@ -38,6 +38,11 @@ properties:
>- qcom,sc7180-smmu-500
>- qcom,sdm845-smmu-500
>- const: arm,mmu-500
> +  - description: NVIDIA SoCs that program two ARM MMU-500s identically
> +items:
> +  - enum:
> +  - nvidia,tegra194-smmu
> +  - const: nvidia,smmu-500
>- items:
>- const: arm,mmu-500
>- const: arm,smmu-v2
> @@ -138,6 +143,19 @@ required:
>  
>  additionalProperties: false
>  
> +allOf:
> +  - if:
> +  properties:
> +compatible:
> +  contains:
> +enum:
> +  - nvidia,tegra194-smmu
> +then:
> +  properties:
> +reg:
> +  minItems: 2
> +  maxItems: 2
> +
>  examples:
>- |+
>  /* SMMU with stream matching or stream indexing */
> 

Reviewed-by: Jon Hunter 

Thanks
Jon

-- 
nvpublic
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/2] iommu/intel: Avoid SAC address trick for PCIe devices

2020-07-08 Thread Christoph Hellwig
Looks pretty sensible.

> - if (!dmar_forcedac && dma_mask > DMA_BIT_MASK(32)) {
> + if (!dmar_forcedac && dma_mask > DMA_BIT_MASK(32) &&
> + dev_is_pci(dev) && !pci_is_pcie(to_pci_dev(dev))) {

The only thing I wonder is whether it is worth adding a little helper
for this check, but with everything moving to dma-iommu that might
not be needed.
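
Such a helper might look roughly like this (name invented here, not part of
the posted patch):

static inline bool dev_is_legacy_pci(struct device *dev)
{
	/* Conventional PCI only; PCIe devices skip the SAC preference. */
	return dev_is_pci(dev) && !pci_is_pcie(to_pci_dev(dev));
}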

Btw, does anyone know what the status of the intel-iommu conversion
to dma-iommu is?
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v10 3/5] iommu/arm-smmu: add NVIDIA implementation for ARM MMU-500 usage

2020-07-08 Thread Jon Hunter


On 08/07/2020 06:00, Krishna Reddy wrote:
> NVIDIA's Tegra194 SoC has three ARM MMU-500 instances.
> It uses two of the ARM MMU-500s together to interleave IOVA
> accesses across them and must be programmed identically.
> This implementation supports programming the two ARM MMU-500s
> that must be programmed identically.
> 
> The third ARM MMU-500 instance is supported by standard
> arm-smmu.c driver itself.
> 
> Signed-off-by: Krishna Reddy 
> ---
>  MAINTAINERS |   2 +
>  drivers/iommu/Makefile  |   2 +-
>  drivers/iommu/arm-smmu-impl.c   |   3 +
>  drivers/iommu/arm-smmu-nvidia.c | 179 
>  drivers/iommu/arm-smmu.c|   1 +
>  drivers/iommu/arm-smmu.h|   1 +
>  6 files changed, 187 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/iommu/arm-smmu-nvidia.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index c23352059a6b..534cedaf8e55 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -16811,8 +16811,10 @@ F:   drivers/i2c/busses/i2c-tegra.c
>  
>  TEGRA IOMMU DRIVERS
>  M:   Thierry Reding 
> +R:   Krishna Reddy 
>  L:   linux-te...@vger.kernel.org
>  S:   Supported
> +F:   drivers/iommu/arm-smmu-nvidia.c
>  F:   drivers/iommu/tegra*
>  
>  TEGRA KBC DRIVER
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 342190196dfb..2b8203db73ec 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd/iommu.o amd/init.o 
> amd/quirks.o
>  obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd/debugfs.o
>  obj-$(CONFIG_AMD_IOMMU_V2) += amd/iommu_v2.o
>  obj-$(CONFIG_ARM_SMMU) += arm_smmu.o
> -arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o
> +arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o arm-smmu-qcom.o
>  obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
>  obj-$(CONFIG_DMAR_TABLE) += intel/dmar.o
>  obj-$(CONFIG_INTEL_IOMMU) += intel/iommu.o intel/pasid.o
> diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
> index c75b9d957b70..f15571d05474 100644
> --- a/drivers/iommu/arm-smmu-impl.c
> +++ b/drivers/iommu/arm-smmu-impl.c
> @@ -171,6 +171,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct 
> arm_smmu_device *smmu)
>   if (of_property_read_bool(np, "calxeda,smmu-secure-config-access"))
>   smmu->impl = _impl;
>  
> + if (of_device_is_compatible(np, "nvidia,tegra194-smmu"))
> + return nvidia_smmu_impl_init(smmu);
> +

I wonder if we should be matching nvidia,smmu-500 here as well because
any device that has that we want to call nvidia_smmu_impl_init(). I
understand that there is only tegra194 today, but seems funny to match
nvidia,smmu-500 below and then nvidia,tegra194-smmu here. That said ...

>   if (of_device_is_compatible(np, "qcom,sdm845-smmu-500") ||
>   of_device_is_compatible(np, "qcom,sc7180-smmu-500"))
>   return qcom_smmu_impl_init(smmu);
> diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
> new file mode 100644
> index ..2f55e5793d34
> --- /dev/null
> +++ b/drivers/iommu/arm-smmu-nvidia.c
> @@ -0,0 +1,179 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +// Copyright (C) 2019-2020 NVIDIA CORPORATION.  All rights reserved.
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "arm-smmu.h"
> +
> +/*
> + * Tegra194 has three ARM MMU-500 Instances.
> + * Two of them are used together and must be programmed identically for
> + * interleaved IOVA accesses across them and translates accesses from
> + * non-isochronous HW devices.
> + * Third one is used for translating accesses from isochronous HW devices.
> + * This implementation supports programming of the two instances that must
> + * be programmed identically.
> + * The third instance usage is through standard arm-smmu driver itself and
> + * is out of scope of this implementation.
> + */
> +#define NUM_SMMU_INSTANCES 2
> +
> +struct nvidia_smmu {
> + struct arm_smmu_device  smmu;
> + void __iomem*bases[NUM_SMMU_INSTANCES];
> +};
> +
> +static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu,
> +  unsigned int inst, int page)
> +{
> + struct nvidia_smmu *nvidia_smmu;
> +
> + nvidia_smmu = container_of(smmu, struct nvidia_smmu, smmu);
> + return nvidia_smmu->bases[inst] + (page << smmu->pgshift);
> +}
> +
> +static u32 nvidia_smmu_read_reg(struct arm_smmu_device *smmu,
> + int page, int offset)
> +{
> + void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset;
> +
> + return readl_relaxed(reg);
> +}
> +
> +static void nvidia_smmu_write_reg(struct arm_smmu_device *smmu,
> +   int page, int offset, u32 val)
> +{
> + unsigned int i;
> +
> + for (i = 0; i < NUM_SMMU_INSTANCES; i++) {
> + void __iomem *reg = nvidia_smmu_page(smmu, i, page) + offset;
> +
> +

Re: [PATCH net] xsk: remove cheap_dma optimization

2020-07-08 Thread Christoph Hellwig
On Wed, Jul 08, 2020 at 07:57:23AM +, Song Bao Hua (Barry Song) wrote:
> > int dma_map_batch_start(struct device *dev, size_t rounded_len,
> > enum dma_data_direction dir, unsigned long attrs, dma_addr_t *addr);
> > int dma_map_batch_add(struct device *dev, dma_addr_t *addr, struct page
> > *page,
> > unsigned long offset, size_t size);
> > int dma_map_batch_end(struct device *dev, int ret, dma_addr_t start_addr);
> > 
> 
> Hello Christoph,
> 
> What is the difference between dma_map_batch_add() and adding the buffer to sg 
> of dma_map_sg()?

There is no struct scatterlist involved in this API, avoiding the
overhead to allocate it (which is kinda the point).
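
A usage sketch of the proposed (not yet existing) batch API quoted above,
showing how the scatterlist disappears: reserve the IOVA range once, then
append pages one by one:

static int example_map_pages(struct device *dev, struct page **pages,
			     unsigned int n, dma_addr_t *dma_base)
{
	dma_addr_t cur;
	unsigned int i;
	int ret;

	ret = dma_map_batch_start(dev, (size_t)n * PAGE_SIZE,
				  DMA_BIDIRECTIONAL, 0, &cur);
	if (ret)
		return ret;
	*dma_base = cur;

	for (i = 0; i < n; i++) {
		ret = dma_map_batch_add(dev, &cur, pages[i], 0, PAGE_SIZE);
		if (ret)
			break;
	}

	/* A non-zero ret here would ask _end() to tear down the partial mapping. */
	return dma_map_batch_end(dev, ret, *dma_base);
}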
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v10 2/5] iommu/arm-smmu: ioremap smmu mmio region before implementation init

2020-07-08 Thread Jon Hunter


On 08/07/2020 06:00, Krishna Reddy wrote:
> ioremap smmu mmio region before calling into implementation init.
> This is necessary to make the mapped address available during vendor
> specific implementation init.
> 
> Signed-off-by: Krishna Reddy 
> ---
>  drivers/iommu/arm-smmu.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index d2054178df35..e03e873d3bca 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -2120,10 +2120,6 @@ static int arm_smmu_device_probe(struct 
> platform_device *pdev)
>   if (err)
>   return err;
>  
> - smmu = arm_smmu_impl_init(smmu);
> - if (IS_ERR(smmu))
> - return PTR_ERR(smmu);
> -
>   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>   ioaddr = res->start;
>   smmu->base = devm_ioremap_resource(dev, res);
> @@ -2135,6 +2131,10 @@ static int arm_smmu_device_probe(struct 
> platform_device *pdev)
>*/
>   smmu->numpage = resource_size(res);
>  
> + smmu = arm_smmu_impl_init(smmu);
> + if (IS_ERR(smmu))
> + return PTR_ERR(smmu);
> +
>   num_irqs = 0;
>   while ((res = platform_get_resource(pdev, IORESOURCE_IRQ, num_irqs))) {
>   num_irqs++;
> 


Reviewed-by: Jon Hunter 

Thanks!
Jon

-- 
nvpublic
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v10 1/5] iommu/arm-smmu: move TLB timeout and spin count macros

2020-07-08 Thread Jon Hunter


On 08/07/2020 06:00, Krishna Reddy wrote:
> Move TLB timeout and spin count macros to header file to
> allow using the same from vendor specific implementations.
> 
> Signed-off-by: Krishna Reddy 
> ---
>  drivers/iommu/arm-smmu.c | 3 ---
>  drivers/iommu/arm-smmu.h | 2 ++
>  2 files changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 243bc4cb2705..d2054178df35 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -52,9 +52,6 @@
>   */
>  #define QCOM_DUMMY_VAL -1
>  
> -#define TLB_LOOP_TIMEOUT 100 /* 1s! */
> -#define TLB_SPIN_COUNT   10
> -
>  #define MSI_IOVA_BASE0x800
>  #define MSI_IOVA_LENGTH  0x10
>  
> diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
> index d172c024be61..c7d0122a7c6c 100644
> --- a/drivers/iommu/arm-smmu.h
> +++ b/drivers/iommu/arm-smmu.h
> @@ -236,6 +236,8 @@ enum arm_smmu_cbar_type {
>  /* Maximum number of context banks per SMMU */
>  #define ARM_SMMU_MAX_CBS 128
>  
> +#define TLB_LOOP_TIMEOUT 100 /* 1s! */
> +#define TLB_SPIN_COUNT   10
>  
>  /* Shared driver definitions */
>  enum arm_smmu_arch_version {
> 


Reviewed-by: Jon Hunter 

Thanks!
Jon

-- 
nvpublic
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 1/2] iommu/intel: Avoid SAC address trick for PCIe devices

2020-07-08 Thread Robin Murphy
For devices stuck behind a conventional PCI bus, saving extra cycles at
33MHz is probably fairly significant. However since native PCI Express
is now the norm for high-performance devices, the optimisation to always
prefer 32-bit addresses for the sake of avoiding DAC is starting to look
rather anachronistic. Technically 32-bit addresses do have shorter TLPs
on PCIe, but unless the device is saturating its link bandwidth with
small transfers it seems unlikely that the difference is appreciable.

What definitely is appreciable, however, is that the IOVA allocator
doesn't behave all that well once the 32-bit space starts getting full.
As DMA working sets get bigger, this optimisation increasingly backfires
and adds considerable overhead to the dma_map path for use-cases like
high-bandwidth networking.

As such, let's simply take it out of consideration for PCIe devices.
Technically this might work out suboptimal for a PCIe device stuck
behind a conventional PCI bridge, or for PCI-X devices that also have
native 64-bit addressing, but neither of those are likely to be found
in performance-critical parts of modern systems.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/intel/iommu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 9129663a7406..21b11de23a53 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3348,7 +3348,8 @@ static unsigned long intel_alloc_iova(struct device *dev,
/* Ensure we reserve the whole size-aligned region */
nrpages = __roundup_pow_of_two(nrpages);
 
-   if (!dmar_forcedac && dma_mask > DMA_BIT_MASK(32)) {
+   if (!dmar_forcedac && dma_mask > DMA_BIT_MASK(32) &&
+   dev_is_pci(dev) && !pci_is_pcie(to_pci_dev(dev))) {
/*
 * First try to allocate an io virtual address in
 * DMA_BIT_MASK(32) and if that fails then try allocating
-- 
2.23.0.dirty

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 2/2] iommu/dma: Avoid SAC address trick for PCIe devices

2020-07-08 Thread Robin Murphy
As for the intel-iommu implementation, relegate the opportunistic
attempt to allocate a SAC address to the domain of conventional PCI
devices only, to avoid it increasingly causing far more performance
issues than possible benefits on modern PCI Express systems.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/dma-iommu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 4959f5df21bd..0ff124f16ad4 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -426,7 +426,8 @@ static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain 
*domain,
dma_limit = min(dma_limit, (u64)domain->geometry.aperture_end);
 
/* Try to get PCI devices a SAC address */
-   if (dma_limit > DMA_BIT_MASK(32) && dev_is_pci(dev))
+   if (dma_limit > DMA_BIT_MASK(32) &&
+   dev_is_pci(dev) && !pci_is_pcie(to_pci_dev(dev)))
iova = alloc_iova_fast(iovad, iova_len,
   DMA_BIT_MASK(32) >> shift, false);
 
-- 
2.23.0.dirty

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v6 03/10] iommu/mediatek: Use a u32 flags to describe different HW features

2020-07-08 Thread chao hao
Hi Matthias and Yingjoe,
Thanks for your comments!

On Mon, 2020-07-06 at 17:17 +0200, Matthias Brugger wrote:
> 
> On 04/07/2020 03:16, Yingjoe Chen wrote:
> > On Fri, 2020-07-03 at 12:41 +0800, Chao Hao wrote:
> >> Given the fact that we are adding more and more plat_data bool values,
> >> it would make sense to use a u32 flags register and add the appropriate
> >> macro definitions to set and check for a flag present.
> >> No functional change.
> >>
> >> Cc: Yong Wu 
> >> Suggested-by: Matthias Brugger 
> >> Signed-off-by: Chao Hao 
> >> Reviewed-by: Matthias Brugger 
> >> ---
> >>  drivers/iommu/mtk_iommu.c | 28 +---
> >>  drivers/iommu/mtk_iommu.h |  7 +--
> >>  2 files changed, 18 insertions(+), 17 deletions(-)
> >>
> >> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> >> index 88d3df5b91c2..40ca564d97af 100644
> >> --- a/drivers/iommu/mtk_iommu.c
> >> +++ b/drivers/iommu/mtk_iommu.c
> >> @@ -100,6 +100,15 @@
> >>  #define MTK_M4U_TO_LARB(id)   (((id) >> 5) & 0xf)
> >>  #define MTK_M4U_TO_PORT(id)   ((id) & 0x1f)
> >>  
> >> +#define HAS_4GB_MODE  BIT(0)
> >> +/* HW will use the EMI clock if there isn't the "bclk". */
> >> +#define HAS_BCLK  BIT(1)
> >> +#define HAS_VLD_PA_RNGBIT(2)
> >> +#define RESET_AXI BIT(3)
> >> +
> >> +#define MTK_IOMMU_HAS_FLAG(pdata, _x) \
> >> +  ((((pdata)->flags) & (_x)) == (_x))
> >> +
> >>  struct mtk_iommu_domain {
> >>struct io_pgtable_cfg   cfg;
> >>struct io_pgtable_ops   *iop;
> >> @@ -563,7 +572,8 @@ static int mtk_iommu_hw_init(const struct 
> >> mtk_iommu_data *data)
> >> upper_32_bits(data->protect_base);
> >>writel_relaxed(regval, data->base + REG_MMU_IVRP_PADDR);
> >>  
> >> -  if (data->enable_4GB && data->plat_data->has_vld_pa_rng) {
> >> +  if (data->enable_4GB &&
> >> +  MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_VLD_PA_RNG)) {
> >>/*
> >> * If 4GB mode is enabled, the validate PA range is from
> >> * 0x1__ to 0x1__. here record bit[32:30].
> >> @@ -573,7 +583,7 @@ static int mtk_iommu_hw_init(const struct 
> >> mtk_iommu_data *data)
> >>}
> >>writel_relaxed(0, data->base + REG_MMU_DCM_DIS);
> >>  
> >> -  if (data->plat_data->reset_axi) {
> >> +  if (MTK_IOMMU_HAS_FLAG(data->plat_data, RESET_AXI)) {
> >>/* The register is called STANDARD_AXI_MODE in this case */
> >>writel_relaxed(0, data->base + REG_MMU_MISC_CTRL);
> >>}
> >> @@ -618,7 +628,7 @@ static int mtk_iommu_probe(struct platform_device 
> >> *pdev)
> >>  
> >>/* Whether the current dram is over 4GB */
> >>data->enable_4GB = !!(max_pfn > (BIT_ULL(32) >> PAGE_SHIFT));
> >> -  if (!data->plat_data->has_4gb_mode)
> >> +  if (!MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_4GB_MODE))
> >>data->enable_4GB = false;
> >>  
> >>res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> >> @@ -631,7 +641,7 @@ static int mtk_iommu_probe(struct platform_device 
> >> *pdev)
> >>if (data->irq < 0)
> >>return data->irq;
> >>  
> >> -  if (data->plat_data->has_bclk) {
> >> +  if (MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_BCLK)) {
> >>data->bclk = devm_clk_get(dev, "bclk");
> >>if (IS_ERR(data->bclk))
> >>return PTR_ERR(data->bclk);
> >> @@ -763,23 +773,19 @@ static const struct dev_pm_ops mtk_iommu_pm_ops = {
> >>  
> >>  static const struct mtk_iommu_plat_data mt2712_data = {
> >>.m4u_plat = M4U_MT2712,
> >> -  .has_4gb_mode = true,
> >> -  .has_bclk = true,
> >> -  .has_vld_pa_rng   = true,
> >> +  .flags= HAS_4GB_MODE | HAS_BCLK | HAS_VLD_PA_RNG,
> >>.larbid_remap = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9},
> >>  };
> >>  
> >>  static const struct mtk_iommu_plat_data mt8173_data = {
> >>.m4u_plat = M4U_MT8173,
> >> -  .has_4gb_mode = true,
> >> -  .has_bclk = true,
> >> -  .reset_axi= true,
> >> +  .flags= HAS_4GB_MODE | HAS_BCLK | RESET_AXI,
> >>.larbid_remap = {0, 1, 2, 3, 4, 5}, /* Linear mapping. */
> >>  };
> >>  
> >>  static const struct mtk_iommu_plat_data mt8183_data = {
> >>.m4u_plat = M4U_MT8183,
> >> -  .reset_axi= true,
> >> +  .flags= RESET_AXI,
> >>.larbid_remap = {0, 4, 5, 6, 7, 2, 3, 1},
> >>  };
> >>  
> >> diff --git a/drivers/iommu/mtk_iommu.h b/drivers/iommu/mtk_iommu.h
> >> index 7212e6fcf982..5225a9170aaa 100644
> >> --- a/drivers/iommu/mtk_iommu.h
> >> +++ b/drivers/iommu/mtk_iommu.h
> >> @@ -39,12 +39,7 @@ enum mtk_iommu_plat {
> >>  
> >>  struct mtk_iommu_plat_data {
> >>enum mtk_iommu_plat m4u_plat;
> >> -  boolhas_4gb_mode;
> >> -
> >> -  /* HW will use the EMI clock if there isn't the "bclk". */
> >> -  boolhas_bclk;
> >> -  boolhas_vld_pa_rng;
> >> -  boolreset_axi;
> >> +  u32 

Re: [PATCH] dma-pool: use single atomic pool for both DMA zones

2020-07-08 Thread Nicolas Saenz Julienne
Hi Jim,

On Tue, 2020-07-07 at 17:08 -0500, Jeremy Linton wrote:
> Hi,
> 
> I spun this up on my 8G model using the PFTF firmware from:
> 
> https://github.com/pftf/RPi4/releases
> 
> Which allows me to switch between ACPI/DT on the machine. In DT mode it 
> works fine now, 

Nice, would that count as a Tested-by from you?

> but with ACPI I continue to have failures unless I 
> disable CMA via cma=0 on the kernel command line. 

Yes, I see why, in atomic_pool_expand() memory is allocated from CMA without
checking its correctness. That calls for a separate fix. I'll try to think of
something.

> I think that is because
> 
> using DT:
> 
> [0.00] Reserved memory: created CMA memory pool at
> 0x3740, size 64 MiB
> 
> 
> using ACPI:
> [0.00] cma: Reserved 64 MiB at 0xf800
> 
> Which is AFAIK because the default arm64 CMA allocation is just below 
> the arm64_dma32_phys_limit.

As I'm sure you know, we fix the CMA address through DT, isn't that possible
through ACPI?

Regards,
Nicolas



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: add an API to check if a streaming mapping needs sync calls

2020-07-08 Thread Daniel Borkmann

On 7/8/20 9:44 AM, Christoph Hellwig wrote:

On Mon, Jun 29, 2020 at 03:39:01PM +0200, Björn Töpel wrote:

On 2020-06-29 15:03, Christoph Hellwig wrote:

Hi all,

this series lifts the somewhat hacky checks in the XSK code if a DMA
streaming mapping needs dma_sync_single_for_{device,cpu} calls to the
DMA API.



Thanks a lot for working on, and fixing this, Christoph!

I took the series for a spin, and there are (obviously) no performance
regressions.

Would the patches go through the net/bpf trees or somewhere else?


Where did this end up?  I still don't see it in Linus' tree and this
is getting urgent now.


It was merged into the bpf tree and we sent the PR to DaveM, which was merged
into the net tree around a week ago [0]; I assume the PR for net might go to
Linus soon this week.

  [0] 
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=e708e2bd55c921f5bb554fa5837d132a878951cf
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH] iommu/arm-smmu-v3: expose numa_node attribute to users in sysfs

2020-07-08 Thread Brice Goglin

On 06/07/2020 at 10:26, Jonathan Cameron wrote:
> On Sun, 5 Jul 2020 09:53:58 +
> "Song Bao Hua (Barry Song)"  wrote:
>
>>> -Original Message-
>>> From: Will Deacon [mailto:w...@kernel.org]
>>> Sent: Saturday, July 4, 2020 4:22 AM
>>> To: Song Bao Hua (Barry Song) 
>>> Cc: robin.mur...@arm.com; h...@lst.de; m.szyprow...@samsung.com;
>>> iommu@lists.linux-foundation.org; linux-arm-ker...@lists.infradead.org;
>>> Linuxarm 
>>> Subject: Re: [PATCH] iommu/arm-smmu-v3: expose numa_node attribute to
>>> users in sysfs
>>>
>>> On Sat, May 30, 2020 at 09:15:05PM +1200, Barry Song wrote:  
 As tests show, the latency of dma_unmap can increase dramatically when
 calling it across NUMA nodes, especially across CPU packages, e.g.
 300ns vs 800ns while waiting for the completion of CMD_SYNC in an
 empty command queue. The large latency caused by a remote node will
 in turn make contention on the command queue more serious, and enlarge
 the latency of DMA users within local NUMA nodes.

 Users might intend to enforce NUMA locality with the position of the
 SMMU in mind. The patch provides a minor benefit by presenting this
 information to users directly, as they might want to know it without
 checking the hardware spec at all.  
>>> I don't think that's a very good reason to expose things to userspace.
>>> I know sysfs shouldn't be treated as ABI, but the grim reality is that
>>> once somebody relies on this stuff then we can't change it, so I'd
>>> rather avoid exposing it unless it's absolutely necessary.  
>> Will, thanks for taking a look!
>>
>> I am not sure if it is absolutely necessary, but it is useful to users. The
>> whole story started from some users who wanted to know the hardware topology
>> very clearly by reading some sysfs node, just like they are able to do for
>> PCI devices. The intention is that users can learn the hardware topology of
>> various devices easily from Linux, since they may not know all the hardware
>> details.
>>
>> For PCI devices, the kernel has done that. And there are some other drivers
>> outside PCI exposing numa_node as well. It seems hard to say it is absolutely
>> necessary for them either, since sysfs shouldn't be treated as ABI.
> Brice,
>
> Given hwloc is probably the most demanding user of topology information
> currently...
>
> How useful would this info be for hwloc and hwloc users?
> Sort of feels like it might be useful in some cases.
>
> The very brief description of what we have here is exposing the numa node
> of an IOMMU.  The discussion also diverted into whether it just makes sense
> to expose this for all platform devices or even do it at the device level.


Hello

We don't have anything about IOMMU in hwloc so far, likely because its
locality never mattered in the past? I guess we'll get some user
requests for it once more platforms show this issue and some
performance-critical applications are not happy with it.

Can you clarify what the whole machine topology looks like? Are we
talking about some PCI devices being attached to one socket but talking
to the IOMMU of the other socket?

Brice
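
For reference, the mechanism being debated amounts to a per-device sysfs
attribute; a minimal sketch assuming the SMMU's node is available through
dev_to_node(), not Barry's actual patch:

#include <linux/device.h>
#include <linux/sysfs.h>

/* Sketch only: expose the device's NUMA node, mirroring what PCI devices
 * already report in sysfs. */
static ssize_t numa_node_show(struct device *dev,
			      struct device_attribute *attr, char *buf)
{
	return sprintf(buf, "%d\n", dev_to_node(dev));
}
static DEVICE_ATTR_RO(numa_node);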

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH] dma-pool: use single atomic pool for both DMA zones

2020-07-08 Thread Jeremy Linton

Hi,

I spun this up on my 8G model using the PFTF firmware from:

https://github.com/pftf/RPi4/releases

Which allows me to switch between ACPI/DT on the machine. In DT mode it 
works fine now, but with ACPI I continue to have failures unless I 
disable CMA via cma=0 on the kernel command line. I think that is because


using DT:

[0.00] Reserved memory: created CMA memory pool at
0x3740, size 64 MiB


using ACPI:
[0.00] cma: Reserved 64 MiB at 0xf800

Which is AFAIK because the default arm64 CMA allocation is just below 
the arm64_dma32_phys_limit.



Thanks,



On 7/7/20 7:28 AM, Nicolas Saenz Julienne wrote:

When allocating atomic DMA memory for a device, the dma-pool core
queries __dma_direct_optimal_gfp_mask() to check which atomic pool to
use. It turns out the GFP flag returned is only an optimistic guess.
The pool selected might sometimes live in a zone higher than the
device's view of memory.

As there isn't a way to guarantee a mapping between a device's DMA
constraints and the correct GFP flags, this unifies both DMA atomic pools.
The resulting pool is allocated in the lowest DMA zone available, if any,
so that devices always get accessible memory, while keeping the
flexibility of using atomic_pool_kernel for the non-constrained ones.

Fixes: c84dc6e68a1d ("dma-pool: add additional coherent pools to map to gfp 
mask")
Reported-by: Jeremy Linton 
Suggested-by: Robin Murphy 
Signed-off-by: Nicolas Saenz Julienne 
---
  kernel/dma/pool.c | 47 +++
  1 file changed, 19 insertions(+), 28 deletions(-)

diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 8cfa01243ed2..883f7a583969 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -13,10 +13,11 @@
  #include 
  #include 
  
+#define GFP_ATOMIC_POOL_DMA	(IS_ENABLED(CONFIG_ZONE_DMA) ? GFP_DMA : \
+				 IS_ENABLED(CONFIG_ZONE_DMA32) ? GFP_DMA32 : 0)
+
  static struct gen_pool *atomic_pool_dma __ro_after_init;
  static unsigned long pool_size_dma;
-static struct gen_pool *atomic_pool_dma32 __ro_after_init;
-static unsigned long pool_size_dma32;
  static struct gen_pool *atomic_pool_kernel __ro_after_init;
  static unsigned long pool_size_kernel;
  
@@ -42,16 +43,13 @@ static void __init dma_atomic_pool_debugfs_init(void)

return;
  
  	debugfs_create_ulong("pool_size_dma", 0400, root, _size_dma);

-   debugfs_create_ulong("pool_size_dma32", 0400, root, _size_dma32);
debugfs_create_ulong("pool_size_kernel", 0400, root, _size_kernel);
  }
  
  static void dma_atomic_pool_size_add(gfp_t gfp, size_t size)

  {
-   if (gfp & __GFP_DMA)
+   if (gfp & GFP_ATOMIC_POOL_DMA)
pool_size_dma += size;
-   else if (gfp & __GFP_DMA32)
-   pool_size_dma32 += size;
else
pool_size_kernel += size;
  }
@@ -132,12 +130,11 @@ static void atomic_pool_resize(struct gen_pool *pool, 
gfp_t gfp)
  
  static void atomic_pool_work_fn(struct work_struct *work)

  {
-   if (IS_ENABLED(CONFIG_ZONE_DMA))
-   atomic_pool_resize(atomic_pool_dma,
-  GFP_KERNEL | GFP_DMA);
-   if (IS_ENABLED(CONFIG_ZONE_DMA32))
-   atomic_pool_resize(atomic_pool_dma32,
-  GFP_KERNEL | GFP_DMA32);
+   gfp_t dma_gfp = GFP_ATOMIC_POOL_DMA;
+
+   if (dma_gfp)
+   atomic_pool_resize(atomic_pool_dma, GFP_KERNEL | dma_gfp);
+
atomic_pool_resize(atomic_pool_kernel, GFP_KERNEL);
  }
  
@@ -168,6 +165,7 @@ static __init struct gen_pool *__dma_atomic_pool_init(size_t pool_size,
  
  static int __init dma_atomic_pool_init(void)

  {
+   gfp_t dma_gfp = GFP_ATOMIC_POOL_DMA;
int ret = 0;
  
  	/*

@@ -185,18 +183,13 @@ static int __init dma_atomic_pool_init(void)
GFP_KERNEL);
if (!atomic_pool_kernel)
ret = -ENOMEM;
-   if (IS_ENABLED(CONFIG_ZONE_DMA)) {
+
+   if (dma_gfp) {
atomic_pool_dma = __dma_atomic_pool_init(atomic_pool_size,
-   GFP_KERNEL | GFP_DMA);
+GFP_KERNEL | dma_gfp);
if (!atomic_pool_dma)
ret = -ENOMEM;
}
-   if (IS_ENABLED(CONFIG_ZONE_DMA32)) {
-   atomic_pool_dma32 = __dma_atomic_pool_init(atomic_pool_size,
-   GFP_KERNEL | GFP_DMA32);
-   if (!atomic_pool_dma32)
-   ret = -ENOMEM;
-   }
  
  	dma_atomic_pool_debugfs_init();

return ret;
@@ -206,14 +199,12 @@ postcore_initcall(dma_atomic_pool_init);
  static inline struct gen_pool *dev_to_pool(struct device *dev)
  {
u64 phys_mask;
-   gfp_t gfp;
-
-   gfp = dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
- 

RE: [PATCH v3 06/14] vfio/type1: Add VFIO_IOMMU_PASID_REQUEST (alloc/free)

2020-07-08 Thread Liu, Yi L
Hi Alex,

> From: Liu, Yi L < yi.l@intel.com>
> Sent: Friday, July 3, 2020 2:28 PM
> 
> Hi Alex,
> 
> > From: Alex Williamson 
> > Sent: Friday, July 3, 2020 5:19 AM
> >
> > On Wed, 24 Jun 2020 01:55:19 -0700
> > Liu Yi L  wrote:
> >
> > > This patch allows user space to request PASID allocation/free, e.g.
> > > when serving the request from the guest.
> > >
> > > PASIDs that are not freed by userspace are automatically freed when
> > > the IOASID set is destroyed when process exits.
[...]
> > > +static int vfio_iommu_type1_pasid_request(struct vfio_iommu *iommu,
> > > +   unsigned long arg)
> > > +{
> > > + struct vfio_iommu_type1_pasid_request req;
> > > + unsigned long minsz;
> > > +
> > > + minsz = offsetofend(struct vfio_iommu_type1_pasid_request, range);
> > > +
> > > + if (copy_from_user(, (void __user *)arg, minsz))
> > > + return -EFAULT;
> > > +
> > > + if (req.argsz < minsz || (req.flags & ~VFIO_PASID_REQUEST_MASK))
> > > + return -EINVAL;
> > > +
> > > + if (req.range.min > req.range.max)
> >
> > Is it exploitable that a user can spin the kernel for a long time in
> > the case of a free by calling this with [0, MAX_UINT] regardless of their 
> > actual
> allocations?
> 
> IOASID can ensure that a user can only free the PASIDs allocated to that
> user. But it's true, the kernel needs to loop over all the PASIDs within the
> range provided by the user. It may take a long time. Is there anything we
> can do? One option may be to limit the range provided by the user?

Thought about it more: we have a per-VM PASID quota (say 1000), so even if the
user passed down [0, MAX_UINT], the kernel will only loop over the 1000 PASIDs
at most. Do you think we still need to do something about it?
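
For illustration, a sketch of a bounded, preemptible walk over a user-supplied
range; the helper name is made up and this is not the actual series, which may
track allocated PASIDs differently:

#include <linux/ioasid.h>
#include <linux/sched.h>

/* Sketch only: the ioasid layer is assumed to reject PASIDs that were not
 * allocated to this set, so the useful work stays bounded by the per-VM
 * quota even for a huge [min, max] request. */
static void vfio_pasid_free_range_sketch(ioasid_t min, ioasid_t max)
{
	ioasid_t pasid = min;

	for (;;) {
		ioasid_free(pasid);
		cond_resched();	/* don't monopolise the CPU on [0, MAX_UINT] */
		if (pasid == max)
			break;	/* avoids u32 wrap-around when max == UINT_MAX */
		pasid++;
	}
}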

Regards,
Yi Liu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v4 04/15] vfio/type1: Report iommu nesting info to userspace

2020-07-08 Thread Liu, Yi L
Hi Alex,

Eric asked if we will have a data struct other than the struct iommu_nesting_info
type in the struct vfio_iommu_type1_info_cap_nesting @info[] field. I'm not
quite sure about it. I guess the answer may be no, as VFIO's nesting support
should be based on the IOMMU UAPI. What is your opinion?

+#define VFIO_IOMMU_TYPE1_INFO_CAP_NESTING  3
+
+/*
+ * Reporting nesting info to user space.
+ *
+ * @info:  the nesting info provided by IOMMU driver. Today
+ * it is expected to be a struct iommu_nesting_info
+ * data.
+ */
+struct vfio_iommu_type1_info_cap_nesting {
+   struct  vfio_info_cap_header header;
+   __u32   flags;
+   __u32   padding;
+   __u8info[];
+};

https://lore.kernel.org/linux-iommu/dm5pr11mb1435290b6cd561ec61027892c3...@dm5pr11mb1435.namprd11.prod.outlook.com/
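
For what it's worth, a userspace-side sketch of walking the info capability
chain to locate this cap while treating @info[] as opaque bytes, since its
layout comes from the IOMMU UAPI series; the function name is illustrative:

#include <linux/vfio.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch only: offsets are relative to the start of the info buffer and a
 * next offset of 0 terminates the chain. */
static struct vfio_info_cap_header *
find_iommu_info_cap(void *info_buf, uint32_t cap_offset, uint16_t id)
{
	while (cap_offset) {
		struct vfio_info_cap_header *hdr =
			(void *)((char *)info_buf + cap_offset);

		if (hdr->id == id)
			return hdr;
		cap_offset = hdr->next;
	}
	return NULL;
}

A caller would pass VFIO_IOMMU_TYPE1_INFO_CAP_NESTING as @id and then cast the
result to the struct above, interpreting @info[] according to the IOMMU UAPI.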

Regards,
Yi Liu

> From: Liu, Yi L
> Sent: Tuesday, July 7, 2020 5:32 PM
> 
[...]
> > >
> > >>> +
> > >>> +/*
> > >>> + * Reporting nesting info to user space.
> > >>> + *
> > >>> + * @info:  the nesting info provided by IOMMU driver. Today
> > >>> + * it is expected to be a struct iommu_nesting_info
> > >>> + * data.
> > >> Is it expected to change?
> > >
> > > Honestly, I'm not quite sure about it. I did consider embedding struct
> > > iommu_nesting_info here instead of using info[], but I hesitated as
> > > using info[] may leave more flexibility in this struct. What is your
> > > opinion? Perhaps it's fine to embed struct iommu_nesting_info here as
> > > long as VFIO sets up nesting based on the IOMMU UAPI.
> > >
> > >>> + */
> > >>> +struct vfio_iommu_type1_info_cap_nesting {
> > >>> +   struct  vfio_info_cap_header header;
> > >>> +   __u32   flags;
> > >> You may document flags.
> > >
> > > sure. it's reserved for future.
> > >
> > > Regards,
> > > Yi Liu
> > >
> > >>> +   __u32   padding;
> > >>> +   __u8info[];
> > >>> +};
> > >>> +
> > >>>  #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
> > >>>
> > >>>  /**
> > >>>
> > >> Thanks
> > >>
> > >> Eric
> > >

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH net] xsk: remove cheap_dma optimization

2020-07-08 Thread Song Bao Hua (Barry Song)



> -Original Message-
> From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org]
> On Behalf Of Christoph Hellwig
> Sent: Wednesday, July 8, 2020 6:50 PM
> To: Robin Murphy 
> Cc: Björn Töpel ; Christoph Hellwig ;
> Daniel Borkmann ; maxi...@mellanox.com;
> konrad.w...@oracle.com; jonathan.le...@gmail.com;
> linux-ker...@vger.kernel.org; iommu@lists.linux-foundation.org;
> net...@vger.kernel.org; b...@vger.kernel.org; da...@davemloft.net;
> magnus.karls...@intel.com
> Subject: Re: [PATCH net] xsk: remove cheap_dma optimization
> 
> On Mon, Jun 29, 2020 at 04:41:16PM +0100, Robin Murphy wrote:
> > On 2020-06-28 18:16, Björn Töpel wrote:
> >>
> >> On 2020-06-27 09:04, Christoph Hellwig wrote:
> >>> On Sat, Jun 27, 2020 at 01:00:19AM +0200, Daniel Borkmann wrote:
>  Given there is roughly a ~5 weeks window at max where this removal
> could
>  still be applied in the worst case, could we come up with a fix /
>  proposal
>  first that moves this into the DMA mapping core? If there is something
>  that
>  can be agreed upon by all parties, then we could avoid re-adding the 9%
>  slowdown. :/
> >>>
> >>> I'd rather turn it upside down - this abuse of the internals blocks work
> >>> that has basically just missed the previous window and I'm not going
> >>> to wait weeks to sort out the API misuse.  But we can add optimizations
> >>> back later if we find a sane way.
> >>>
> >>
> >> I'm not super excited about the performance loss, but I do get
> >> Christoph's frustration about gutting the DMA API making it harder for
> >> DMA people to get work done. Let's try to solve this properly using
> >> proper DMA APIs.
> >>
> >>
> >>> That being said I really can't see how this would make so much of a
> >>> difference.  What architecture and what dma_ops are you using for
> >>> those measurements?  What is the workload?
> >>>
> >>
> >> The 9% is for an AF_XDP (Fast raw Ethernet socket. Think AF_PACKET, but
> >> faster.) benchmark: receive the packet from the NIC, and drop it. The DMA
> >> syncs stand out in the perf top:
> >>
> >>    28.63%  [kernel]   [k] i40e_clean_rx_irq_zc
> >>    17.12%  [kernel]   [k] xp_alloc
> >>     8.80%  [kernel]   [k] __xsk_rcv_zc
> >>     7.69%  [kernel]   [k] xdp_do_redirect
> >>     5.35%  bpf_prog_992d9ddc835e5629  [k] bpf_prog_992d9ddc835e5629
> >>     4.77%  [kernel]   [k] xsk_rcv.part.0
> >>     4.07%  [kernel]   [k] __xsk_map_redirect
> >>     3.80%  [kernel]   [k] dma_direct_sync_single_for_cpu
> >>     3.03%  [kernel]   [k] dma_direct_sync_single_for_device
> >>     2.76%  [kernel]   [k] i40e_alloc_rx_buffers_zc
> >>     1.83%  [kernel]   [k] xsk_flush
> >> ...
> >>
> >> For this benchmark the dma_ops are NULL (dma_is_direct() == true), and
> >> the main issue is that SWIOTLB is now unconditionally enabled [1] for
> >> x86, and for each sync we have to check if is_swiotlb_buffer(),
> >> which involves some costly indirection.
> >>
> >> That was pretty much what my hack avoided. Instead we did all the checks
> >> upfront, since AF_XDP has long-term DMA mappings, and just set a flag
> >> for that.
> >>
> >> Avoiding the whole "is this address swiotlb" in
> >> dma_direct_sync_single_for_{cpu, device]() per-packet
> >> would help a lot.
> >
> > I'm pretty sure that's one of the things we hope to achieve with the
> > generic bypass flag :)
> >
> >> Somewhat related to the DMA API; It would have performance benefits for
> >> AF_XDP if the DMA range of the mapped memory was linear, i.e. by IOMMU
> >> utilization. I've started hacking a thing a little bit, but it would be
> >> nice if such API was part of the mapping core.
> >>
> >> Input: array of pages Output: array of dma addrs (and obviously dev,
> >> flags and such)
> >>
> >> For non-IOMMU len(array of pages) == len(array of dma addrs)
> >> For best-case IOMMU len(array of dma addrs) == 1 (large linear space)
> >>
> >> But that's for later. :-)
> >
> > FWIW you will typically get that behaviour from IOMMU-based
> implementations
> > of dma_map_sg() right now, although it's not strictly guaranteed. If you
> > can weather some additional setup cost of calling
> > sg_alloc_table_from_pages() plus walking the list after mapping to test
> > whether you did get a contiguous result, you could start taking advantage
> > of it as some of the dma-buf code in DRM and v4l2 does already (although
> > those cases actually treat it as a strict dependency rather than an
> > optimisation).
> 
> Yikes.
> 
> > I'm inclined to agree that if we're going to see more of these cases, a new
> > API call that did formally guarantee a DMA-contiguous mapping (either via
> > IOMMU or bounce buffering) or failure might indeed be handy.
> 
> I was planning on adding a dma-level API to add more pages to an
> IOMMU batch, but was waiting for at 

Re: add an API to check if a streaming mapping needs sync calls

2020-07-08 Thread Christoph Hellwig
On Mon, Jun 29, 2020 at 03:39:01PM +0200, Björn Töpel wrote:
> On 2020-06-29 15:03, Christoph Hellwig wrote:
>> Hi all,
>>
>> this series lifts the somewhat hacky checks in the XSK code if a DMA
>> streaming mapping needs dma_sync_single_for_{device,cpu} calls to the
>> DMA API.
>>
>
> Thanks a lot for working on, and fixing this, Christoph!
>
> I took the series for a spin, and there are (obviously) no performance
> regressions.
>
> Would the patches go through the net/bpf trees or somewhere else?

Where did this end up?  I still don't see it in Linus' tree and this
is getting urgent now.
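
For reference, a minimal sketch of the consumer side such a series enables,
assuming a helper along the lines of dma_need_sync(); the pool structure here
is an illustrative stand-in rather than the real XSK one:

#include <linux/dma-mapping.h>

/* Illustrative stand-in for the XSK pool state; not the real struct. */
struct pool_state {
	bool dma_need_sync;
};

/* Decide once, at mapping time, whether per-packet syncs are needed. */
static void pool_check_sync_needed(struct pool_state *pool,
				   struct device *dev, dma_addr_t dma)
{
	pool->dma_need_sync = dma_need_sync(dev, dma);
}

/* Per-packet fast path: skip the sync entirely when it is not needed. */
static void pool_sync_for_cpu(struct pool_state *pool, struct device *dev,
			      dma_addr_t dma, size_t len)
{
	if (pool->dma_need_sync)
		dma_sync_single_for_cpu(dev, dma, len, DMA_FROM_DEVICE);
}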
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH net] xsk: remove cheap_dma optimization

2020-07-08 Thread Christoph Hellwig
On Mon, Jun 29, 2020 at 04:41:16PM +0100, Robin Murphy wrote:
> On 2020-06-28 18:16, Björn Töpel wrote:
>>
>> On 2020-06-27 09:04, Christoph Hellwig wrote:
>>> On Sat, Jun 27, 2020 at 01:00:19AM +0200, Daniel Borkmann wrote:
 Given there is roughly a ~5 weeks window at max where this removal could
 still be applied in the worst case, could we come up with a fix / 
 proposal
 first that moves this into the DMA mapping core? If there is something 
 that
 can be agreed upon by all parties, then we could avoid re-adding the 9%
 slowdown. :/
>>>
>>> I'd rather turn it upside down - this abuse of the internals blocks work
>>> that has basically just missed the previous window and I'm not going
>>> to wait weeks to sort out the API misuse.  But we can add optimizations
>>> back later if we find a sane way.
>>>
>>
>> I'm not super excited about the performance loss, but I do get
>> Christoph's frustration about gutting the DMA API making it harder for
>> DMA people to get work done. Let's try to solve this properly using
>> proper DMA APIs.
>>
>>
>>> That being said I really can't see how this would make so much of a
>>> difference.  What architecture and what dma_ops are you using for
>>> those measurements?  What is the workload?
>>>
>>
>> The 9% is for an AF_XDP (Fast raw Ethernet socket. Think AF_PACKET, but 
>> faster.) benchmark: receive the packet from the NIC, and drop it. The DMA 
>> syncs stand out in the perf top:
>>
>>    28.63%  [kernel]   [k] i40e_clean_rx_irq_zc
>>    17.12%  [kernel]   [k] xp_alloc
>>     8.80%  [kernel]   [k] __xsk_rcv_zc
>>     7.69%  [kernel]   [k] xdp_do_redirect
>>     5.35%  bpf_prog_992d9ddc835e5629  [k] bpf_prog_992d9ddc835e5629
>>     4.77%  [kernel]   [k] xsk_rcv.part.0
>>     4.07%  [kernel]   [k] __xsk_map_redirect
>>     3.80%  [kernel]   [k] dma_direct_sync_single_for_cpu
>>     3.03%  [kernel]   [k] dma_direct_sync_single_for_device
>>     2.76%  [kernel]   [k] i40e_alloc_rx_buffers_zc
>>     1.83%  [kernel]   [k] xsk_flush
>> ...
>>
>> For this benchmark the dma_ops are NULL (dma_is_direct() == true), and
>> the main issue is that SWIOTLB is now unconditionally enabled [1] for
>> x86, and for each sync we have to check if is_swiotlb_buffer(),
>> which involves some costly indirection.
>>
>> That was pretty much what my hack avoided. Instead we did all the checks
>> upfront, since AF_XDP has long-term DMA mappings, and just set a flag
>> for that.
>>
>> Avoiding the whole "is this address swiotlb" in
>> dma_direct_sync_single_for_{cpu, device]() per-packet
>> would help a lot.
>
> I'm pretty sure that's one of the things we hope to achieve with the 
> generic bypass flag :)
>
>> Somewhat related to the DMA API; It would have performance benefits for
>> AF_XDP if the DMA range of the mapped memory was linear, i.e. by IOMMU
>> utilization. I've started hacking a thing a little bit, but it would be
>> nice if such API was part of the mapping core.
>>
>> Input: array of pages Output: array of dma addrs (and obviously dev,
>> flags and such)
>>
>> For non-IOMMU len(array of pages) == len(array of dma addrs)
>> For best-case IOMMU len(array of dma addrs) == 1 (large linear space)
>>
>> But that's for later. :-)
>
> FWIW you will typically get that behaviour from IOMMU-based implementations 
> of dma_map_sg() right now, although it's not strictly guaranteed. If you 
> can weather some additional setup cost of calling 
> sg_alloc_table_from_pages() plus walking the list after mapping to test 
> whether you did get a contiguous result, you could start taking advantage 
> of it as some of the dma-buf code in DRM and v4l2 does already (although 
> those cases actually treat it as a strict dependency rather than an 
> optimisation).

Yikes.

> I'm inclined to agree that if we're going to see more of these cases, a new 
> API call that did formally guarantee a DMA-contiguous mapping (either via 
> IOMMU or bounce buffering) or failure might indeed be handy.

I was planning on adding a dma-level API to add more pages to an
IOMMU batch, but was waiting for at least the intel IOMMU driver to be
converted to the dma-iommu code (and preferably arm32 and s390 as well).

Here is my old pseudo-code sketch for what I was aiming for from the
block/nvme perspective.  I haven't even implemented it yet, so there might
be some holes in the design:


/*
 * Returns 0 if batching is possible, positive number of segments required
 * if batching is not possible, or negative values on error.
 */
int dma_map_batch_start(struct device *dev, size_t rounded_len,
enum dma_data_direction dir, unsigned long attrs, dma_addr_t *addr);
int dma_map_batch_add(struct device *dev, dma_addr_t *addr, struct page *page,
unsigned long offset, size_t size);
int dma_map_batch_end(struct device *dev, int 
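
As a footnote to Robin's suggestion further up, a sketch of the
sg_alloc_table_from_pages()-plus-walk approach for getting (or detecting the
absence of) one contiguous DMA range today; a sketch under those assumptions,
not a new API proposal:

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

/* Sketch only: map an array of pages and report -EAGAIN if the result is not
 * one linear DMA range, so the caller can fall back to per-page handling. */
static int map_pages_contig(struct device *dev, struct page **pages,
			    unsigned int n_pages, size_t size,
			    struct sg_table *sgt, dma_addr_t *dma)
{
	struct scatterlist *sg;
	dma_addr_t expected;
	int nents, i, ret;

	ret = sg_alloc_table_from_pages(sgt, pages, n_pages, 0, size, GFP_KERNEL);
	if (ret)
		return ret;

	nents = dma_map_sg(dev, sgt->sgl, sgt->orig_nents, DMA_FROM_DEVICE);
	if (!nents) {
		sg_free_table(sgt);
		return -ENOMEM;
	}

	expected = sg_dma_address(sgt->sgl);
	for_each_sg(sgt->sgl, sg, nents, i) {
		if (sg_dma_address(sg) != expected) {
			/* Mapped fine, just not as one linear range. */
			dma_unmap_sg(dev, sgt->sgl, sgt->orig_nents,
				     DMA_FROM_DEVICE);
			sg_free_table(sgt);
			return -EAGAIN;
		}
		expected += sg_dma_len(sg);
	}

	*dma = sg_dma_address(sgt->sgl);
	return 0;
}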

Re: [PATCH] scatterlist: protect parameters of the sg_table related macros

2020-07-08 Thread Christoph Hellwig
On Tue, Jun 30, 2020 at 10:16:02AM +0200, Marek Szyprowski wrote:
> Add brackets to protect parameters of the recently added sg_table related
> macros from side-effects.

Applied to the dma-mapping tree, thanks.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu