[PATCH v14 0/4] iommu/arm-smmu: Add runtime pm/sleep support

2018-07-27 Thread Vivek Gautam
This series adds support for turning on the arm-smmu's
clocks/power domains using runtime pm. This is done using
device links between the smmu and its client devices. The
device link framework keeps the two devices in the correct
order for power-cycling across runtime PM and system-wide PM.

With addition of a new device link flag DL_FLAG_AUTOREMOVE_SUPPLIER [8]
(available in linux-next of Rafael's linux-pm tree [9]), the device links
created between arm-smmu and its clients will be automatically purged
when arm-smmu driver unbinds from its device.

As not all implementations support clock/power gating, we check
for a valid pm_domain on 'smmu->dev' and conditionally enable
runtime power management only for those smmu implementations that
can support it. Otherwise, the clocks are simply kept enabled from
.probe until .remove.
With conditional runtime pm we avoid touching dev->power.lock
in fastpaths for smmu implementations that don't need to do
anything useful with pm_runtime.
This lets us use the much-argued pm_runtime_get_sync()/put_sync()
calls in the map/unmap callbacks so that clients do not have to
worry about handling any of the arm-smmu's power.

This series also adds support for Qcom's arm-smmu-v2 variant that
has different clocks and power requirements.

Previous version of this patch series is @ [2].

Tested this series on msm8996, and sdm845 after pulling in Rafael's linux-pm
linux-next[9] and Joerg's iommu next[10] branches, and related changes for
device tree, etc.

Hi Robin, Will,
I have addressed the comments for v13. If there's still a chance,
can you please consider pulling this in for v4.19?
Thanks.

[v14]
   * Moved arm_smmu_device_reset() from arm_smmu_pm_resume() to
 arm_smmu_runtime_resume() so that the pm_resume callback calls
 only runtime_resume to resume the device.
 This should take care of restoring the state of the smmu in systems
 in which the smmu loses register state on power-domain collapse.

[v13]
   Addressing Rafael's comments:
   * Added .suspend pm callback to disable the clocks in system wide suspend.
   * Added corresponding clock enable in .resume pm callback.
   * Explicitly enabling/disabling the clocks now when runtime PM is disabled.
   * device_link_add() doesn't depend on pm_runtime_enabled() as we can
 use device links across system suspend/resume too.

   Addressing Robin's comments:
   * Making device_link_add() failures non-fatal.

   * Removed IOMMU_OF_DECLARE() declaration as we don't need this after Rob's
 patch that removed all of these declarations.

[v12]
   * Use new device link's flag introduced in [8] -
 DL_FLAG_AUTOREMOVE_SUPPLIER. With this devices links are automatically
 purged when arm-smmu driver unbinds.
   * Using pm_runtime_force_suspend() instead of pm_runtime_disable() to
 avoid following warning from arm_smmu_device_remove()

 [295711.537507] [ cut here ]
 [295711.544226] Unpreparing enabled smmu_mdp_ahb_clk
 [295711.549099] WARNING: CPU: 0 PID: 1 at ../drivers/clk/clk.c:697
 clk_core_unprepare+0xd8/0xe0
 ...
 [295711.674073] Call trace:
 [295711.679454]  clk_core_unprepare+0xd8/0xe0
 [295711.682059]  clk_unprepare+0x28/0x40
 [295711.685964]  clk_bulk_unprepare+0x28/0x40
 [295711.689701]  arm_smmu_device_remove+0x88/0xd8
 [295711.693692]  arm_smmu_device_shutdown+0xc/0x18
 [295711.698120]  platform_drv_shutdown+0x20/0x30

[v11]
   * Some more cleanups for device link. We don't need an explicit
 delete of the device link from the driver; just set the flag
 DL_FLAG_AUTOREMOVE.
 device_link_add() API description says -
 "If the DL_FLAG_AUTOREMOVE is set, the link will be removed
 automatically when the consumer device driver unbinds."
   * Addressed the comments for 'smmu' in arm_smmu_map/unmap().
   * Dropped the patch [7] that introduced device_link_del_dev() API. 

[v10]
   * Introduce device_link_del_dev() API to delete the link between
 given consumer and supplier devices. The users of device link
 do not need to store link pointer to delete the link later.
 They can straightaway use this API by passing consumer and
 supplier devices.
   * Made corresponding changes to arm-smmu driver patch handling the
 device links.
   * Dropped the patch [6] that was adding device_link_find() API to
 device core layer. device_link_del_dev() serves the purpose to
 directly delete the link between two given devices.

[v9]
   * Removed 'rpm_supported' flag, instead checking on pm_domain
 to enable runtime pm.
   * Creating the device link only when runtime pm is enabled, as we
 don't need a device link beyond managing the power dependency
 between supplier and consumer devices.
   * Introducing a patch to add device_link_find() API that finds
 an existing link between supplier and consumer devices.
 Also, made necessary change to device_link_add() to use 

[PATCH v14 1/4] iommu/arm-smmu: Add pm_runtime/sleep ops

2018-07-27 Thread Vivek Gautam
From: Sricharan R 

The smmu needs to be functional only when the respective
masters using it are active. The device_link feature
helps to track such functional dependencies, so that the
iommu gets powered when the master device enables itself
using pm_runtime. Adapting the smmu driver for runtime pm
thus addresses this dependency.

This patch adds the pm runtime/sleep callbacks to the
driver and also the functions to parse the smmu clocks
from DT and enable them in resume/suspend.

Also, while enabling runtime pm, add a pm sleep suspend
callback that pushes the device to a low power state by
turning the clocks off in system sleep, and add the
corresponding clock enable path in the resume callback.

Signed-off-by: Sricharan R 
Signed-off-by: Archit Taneja 
[vivek: rework for clock and pm ops]
Signed-off-by: Vivek Gautam 
Reviewed-by: Tomasz Figa 
---

Changes since v13:
 - Moved arm_smmu_device_reset() from arm_smmu_pm_resume() to
   arm_smmu_runtime_resume(). arm_smmu_pm_resume() calls just
   runtime_resume() now.

 drivers/iommu/arm-smmu.c | 77 ++--
 1 file changed, 74 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index c73cfce1ccc0..5f6a9e3c0079 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -48,6 +48,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -205,6 +206,8 @@ struct arm_smmu_device {
u32 num_global_irqs;
u32 num_context_irqs;
unsigned int*irqs;
+   struct clk_bulk_data*clks;
+   int num_clks;
 
u32 cavium_id_base; /* Specific to Cavium */
 
@@ -1897,10 +1900,12 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu)
 struct arm_smmu_match_data {
enum arm_smmu_arch_version version;
enum arm_smmu_implementation model;
+   const char * const *clks;
+   int num_clks;
 };
 
 #define ARM_SMMU_MATCH_DATA(name, ver, imp)\
-static struct arm_smmu_match_data name = { .version = ver, .model = imp }
+static const struct arm_smmu_match_data name = { .version = ver, .model = imp }
 
 ARM_SMMU_MATCH_DATA(smmu_generic_v1, ARM_SMMU_V1, GENERIC_SMMU);
 ARM_SMMU_MATCH_DATA(smmu_generic_v2, ARM_SMMU_V2, GENERIC_SMMU);
@@ -1919,6 +1924,23 @@ static const struct of_device_id arm_smmu_of_match[] = {
 };
 MODULE_DEVICE_TABLE(of, arm_smmu_of_match);
 
+static void arm_smmu_fill_clk_data(struct arm_smmu_device *smmu,
+  const char * const *clks)
+{
+   int i;
+
+   if (smmu->num_clks < 1)
+   return;
+
+   smmu->clks = devm_kcalloc(smmu->dev, smmu->num_clks,
+ sizeof(*smmu->clks), GFP_KERNEL);
+   if (!smmu->clks)
+   return;
+
+   for (i = 0; i < smmu->num_clks; i++)
+   smmu->clks[i].id = clks[i];
+}
+
 #ifdef CONFIG_ACPI
 static int acpi_smmu_get_data(u32 model, struct arm_smmu_device *smmu)
 {
@@ -2001,6 +2023,9 @@ static int arm_smmu_device_dt_probe(struct platform_device *pdev,
data = of_device_get_match_data(dev);
smmu->version = data->version;
smmu->model = data->model;
+   smmu->num_clks = data->num_clks;
+
+   arm_smmu_fill_clk_data(smmu, data->clks);
 
parse_driver_options(smmu);
 
@@ -2099,6 +2124,14 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
smmu->irqs[i] = irq;
}
 
+   err = devm_clk_bulk_get(smmu->dev, smmu->num_clks, smmu->clks);
+   if (err)
+   return err;
+
+   err = clk_bulk_prepare(smmu->num_clks, smmu->clks);
+   if (err)
+   return err;
+
err = arm_smmu_device_cfg_probe(smmu);
if (err)
return err;
@@ -2181,6 +2214,9 @@ static int arm_smmu_device_remove(struct platform_device *pdev)
 
/* Turn the thing off */
writel(sCR0_CLIENTPD, ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
+
+   clk_bulk_unprepare(smmu->num_clks, smmu->clks);
+
return 0;
 }
 
@@ -2189,15 +2225,50 @@ static void arm_smmu_device_shutdown(struct platform_device *pdev)
arm_smmu_device_remove(pdev);
 }
 
-static int __maybe_unused arm_smmu_pm_resume(struct device *dev)
+static int __maybe_unused arm_smmu_runtime_resume(struct device *dev)
 {
struct arm_smmu_device *smmu = dev_get_drvdata(dev);
+   int ret;
+
+   ret = clk_bulk_enable(smmu->num_clks, smmu->clks);
+   if (ret)
+   return ret;
 
arm_smmu_device_reset(smmu);
+
return 0;
 }
 
-static SIMPLE_DEV_PM_OPS(arm_smmu_pm_ops, NULL, arm_smmu_pm_resume);
+static int __maybe_unused arm_smmu_runtime_suspend(struct device *dev)
+{
+   struct arm_smmu_device *smmu = dev_get_drvdata(dev);
+
+   clk_bulk_disable(smmu->num_clks, smmu->clks);
+
+   

[PATCH v14 2/4] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device

2018-07-27 Thread Vivek Gautam
From: Sricharan R 

The smmu device probe/remove and the add/remove master device
callbacks get called when the smmu is not linked to its master,
that is, without the context of the master device. So call the
runtime PM APIs in those places separately.

Signed-off-by: Sricharan R 
[vivek: Cleanup pm runtime calls]
Signed-off-by: Vivek Gautam 
Reviewed-by: Tomasz Figa 
---

Change since v13:
 - No change.

 drivers/iommu/arm-smmu.c | 101 +++
 1 file changed, 93 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 5f6a9e3c0079..1efa5681b905 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -268,6 +268,20 @@ static struct arm_smmu_option_prop arm_smmu_options[] = {
{ 0, NULL},
 };
 
+static inline int arm_smmu_rpm_get(struct arm_smmu_device *smmu)
+{
+   if (pm_runtime_enabled(smmu->dev))
+   return pm_runtime_get_sync(smmu->dev);
+
+   return 0;
+}
+
+static inline void arm_smmu_rpm_put(struct arm_smmu_device *smmu)
+{
+   if (pm_runtime_enabled(smmu->dev))
+   pm_runtime_put(smmu->dev);
+}
+
 static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
 {
return container_of(dom, struct arm_smmu_domain, domain);
@@ -913,11 +927,15 @@ static void arm_smmu_destroy_domain_context(struct iommu_domain *domain)
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
struct arm_smmu_device *smmu = smmu_domain->smmu;
	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
-   int irq;
+   int ret, irq;
 
if (!smmu || domain->type == IOMMU_DOMAIN_IDENTITY)
return;
 
+   ret = arm_smmu_rpm_get(smmu);
+   if (ret < 0)
+   return;
+
/*
 * Disable the context bank and free the page tables before freeing
 * it.
@@ -932,6 +950,8 @@ static void arm_smmu_destroy_domain_context(struct iommu_domain *domain)
 
free_io_pgtable_ops(smmu_domain->pgtbl_ops);
__arm_smmu_free_bitmap(smmu->context_map, cfg->cbndx);
+
+   arm_smmu_rpm_put(smmu);
 }
 
 static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
@@ -1213,10 +1233,15 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
return -ENODEV;
 
smmu = fwspec_smmu(fwspec);
+
+   ret = arm_smmu_rpm_get(smmu);
+   if (ret < 0)
+   return ret;
+
/* Ensure that the domain is finalised */
ret = arm_smmu_init_domain_context(domain, smmu);
if (ret < 0)
-   return ret;
+   goto rpm_put;
 
/*
 * Sanity check the domain. We don't support domains across
@@ -1226,33 +1251,50 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
dev_err(dev,
	"cannot attach to SMMU %s whilst already attached to domain on SMMU %s\n",
dev_name(smmu_domain->smmu->dev), dev_name(smmu->dev));
-   return -EINVAL;
+   ret = -EINVAL;
+   goto rpm_put;
}
 
/* Looks ok, so add the device to the domain */
-   return arm_smmu_domain_add_master(smmu_domain, fwspec);
+   ret = arm_smmu_domain_add_master(smmu_domain, fwspec);
+
+rpm_put:
+   arm_smmu_rpm_put(smmu);
+   return ret;
 }
 
 static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
phys_addr_t paddr, size_t size, int prot)
 {
struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
+   struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
+   int ret;
 
if (!ops)
return -ENODEV;
 
-   return ops->map(ops, iova, paddr, size, prot);
+   arm_smmu_rpm_get(smmu);
+   ret = ops->map(ops, iova, paddr, size, prot);
+   arm_smmu_rpm_put(smmu);
+
+   return ret;
 }
 
 static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
 size_t size)
 {
struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
+   struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
+   size_t ret;
 
if (!ops)
return 0;
 
-   return ops->unmap(ops, iova, size);
+   arm_smmu_rpm_get(smmu);
+   ret = ops->unmap(ops, iova, size);
+   arm_smmu_rpm_put(smmu);
+
+   return ret;
 }
 
 static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
@@ -1407,7 +1449,13 @@ static int arm_smmu_add_device(struct device *dev)
while (i--)
cfg->smendx[i] = INVALID_SMENDX;
 
+   ret = arm_smmu_rpm_get(smmu);
+   if (ret < 0)
+   goto out_cfg_free;
+
ret = arm_smmu_master_alloc_smes(dev);
+   arm_smmu_rpm_put(smmu);
+
if (ret)
goto out_cfg_free;
 
@@ -1427,7 +1475,7 @@ static void arm_smmu_remove_device(struct device *dev)

[PATCH v14 3/4] iommu/arm-smmu: Add the device_link between masters and smmu

2018-07-27 Thread Vivek Gautam
From: Sricharan R 

Finally, add the device link between the master device and the
smmu, so that the smmu gets runtime enabled/disabled only when the
master needs it. This is done from the add_device callback, which
gets called once when the master is added to the smmu.

Signed-off-by: Sricharan R 
Signed-off-by: Vivek Gautam 
Reviewed-by: Tomasz Figa 
---

Change since v13:
 - No change.

 drivers/iommu/arm-smmu.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 1efa5681b905..e558abf1ecfc 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1461,6 +1461,9 @@ static int arm_smmu_add_device(struct device *dev)
 
	iommu_device_link(&smmu->iommu, dev);
 
+   device_link_add(dev, smmu->dev,
+   DL_FLAG_PM_RUNTIME | DL_FLAG_AUTOREMOVE_SUPPLIER);
+
return 0;
 
 out_cfg_free:
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/ipmmu-vmsa: Fix allocation in atomic context

2018-07-27 Thread Joerg Roedel
On Fri, Jul 20, 2018 at 06:16:59PM +0200, Geert Uytterhoeven wrote:
> When attaching a device to an IOMMU group with
> CONFIG_DEBUG_ATOMIC_SLEEP=y:
> 
> BUG: sleeping function called from invalid context at mm/slab.h:421
> in_atomic(): 1, irqs_disabled(): 128, pid: 61, name: kworker/1:1
> ...
> Call trace:
>  ...
>  arm_lpae_alloc_pgtable+0x114/0x184
>  arm_64_lpae_alloc_pgtable_s1+0x2c/0x128
>  arm_32_lpae_alloc_pgtable_s1+0x40/0x6c
>  alloc_io_pgtable_ops+0x60/0x88
>  ipmmu_attach_device+0x140/0x334
> 
> ipmmu_attach_device() takes a spinlock, while arm_lpae_alloc_pgtable()
> allocates memory using GFP_KERNEL.  Originally, the ipmmu-vmsa driver
> had its own custom page table allocation implementation using
> GFP_ATOMIC, hence the spinlock was fine.
> 
> Fix this by replacing the spinlock by a mutex, like the arm-smmu driver
> does.
> 
> Fixes: f20ed39f53145e45 ("iommu/ipmmu-vmsa: Use the ARM LPAE page table allocator")
> Signed-off-by: Geert Uytterhoeven 
> ---
>  drivers/iommu/ipmmu-vmsa.c | 9 -
>  1 file changed, 4 insertions(+), 5 deletions(-)

Applied, thanks.



Re: [RFC PATCH v1 0/6] Resolve unwanted DMA backing with IOMMU

2018-07-27 Thread Joerg Roedel
On Fri, Jul 27, 2018 at 02:16:18AM +0300, Dmitry Osipenko wrote:
> The proposed solution adds a new option to the base device driver
> structure that allows device drivers to explicitly convey to the drivers
> core that the implicit IOMMU backing for devices must not happen.

Why is IOMMU mapping a problem for the Tegra GPU driver?

If we add something like this then it should not be the choice of the
device driver, but of the user and/or the firmware.


Joerg



[PATCH v14 4/4] iommu/arm-smmu: Add support for qcom,smmu-v2 variant

2018-07-27 Thread Vivek Gautam
qcom,smmu-v2 is an arm,smmu-v2 implementation with specific
clock and power requirements. This smmu core is used with
multiple masters on msm8996, viz. mdss, video, etc.
Add bindings for the same.

Signed-off-by: Vivek Gautam 
Reviewed-by: Rob Herring 
Reviewed-by: Tomasz Figa 
---

Change since v13:
 - No change.

 .../devicetree/bindings/iommu/arm,smmu.txt | 42 ++
 drivers/iommu/arm-smmu.c   | 13 +++
 2 files changed, 55 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
index 8a6ffce12af5..7c71a6ed465a 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
@@ -17,10 +17,19 @@ conditions.
 "arm,mmu-401"
 "arm,mmu-500"
 "cavium,smmu-v2"
"qcom,<soc>-smmu-v2", "qcom,smmu-v2"
 
   depending on the particular implementation and/or the
   version of the architecture implemented.
 
  A number of Qcom SoCs use the qcom,smmu-v2 version of the IP.
  "qcom,<soc>-smmu-v2" represents an SoC-specific compatible
  string that should be present along with "qcom,smmu-v2"
  to facilitate SoC-specific clocks/power connections and to
  address specific bug fixes.
  An example string would be -
  "qcom,msm8996-smmu-v2", "qcom,smmu-v2".
+
 - reg   : Base address and size of the SMMU.
 
 - #global-interrupts : The number of global interrupts exposed by the
@@ -71,6 +80,22 @@ conditions.
   or using stream matching with #iommu-cells = <2>, and
   may be ignored if present in such cases.
 
+- clock-names:List of the names of clocks input to the device. The
+  required list depends on particular implementation and
+  is as follows:
+  - for "qcom,smmu-v2":
+- "bus": clock required for downstream bus access and
+ for the smmu ptw,
+- "iface": clock required to access smmu's registers
+   through the TCU's programming interface.
+  - unspecified for other implementations.
+
+- clocks: Specifiers for all clocks listed in the clock-names property,
+  as per generic clock bindings.
+
+- power-domains:  Specifiers for power domains required to be powered on for
+  the SMMU to operate, as per generic power domain bindings.
+
 ** Deprecated properties:
 
 - mmu-masters (deprecated in favour of the generic "iommus" binding) :
@@ -137,3 +162,20 @@ conditions.
 iommu-map = <0  0 0x400>;
 ...
 };
+
+   /* Qcom's arm,smmu-v2 implementation */
+   smmu4: iommu {
+   compatible = "qcom,msm8996-smmu-v2", "qcom,smmu-v2";
+   reg = <0xd0 0x1>;
+
+   #global-interrupts = <1>;
+   interrupts = ,
+,
+;
+   #iommu-cells = <1>;
+   power-domains = <&mmcc MDSS_GDSC>;
+
+   clocks = <&mmcc SMMU_MDP_AXI_CLK>,
+<&mmcc SMMU_MDP_AHB_CLK>;
+   clock-names = "bus", "iface";
+   };
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index e558abf1ecfc..2b4edba188a5 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -119,6 +119,7 @@ enum arm_smmu_implementation {
GENERIC_SMMU,
ARM_MMU500,
CAVIUM_SMMUV2,
+   QCOM_SMMUV2,
 };
 
 struct arm_smmu_s2cr {
@@ -1971,6 +1972,17 @@ ARM_SMMU_MATCH_DATA(arm_mmu500, ARM_SMMU_V2, ARM_MMU500);
 ARM_SMMU_MATCH_DATA(arm_mmu500, ARM_SMMU_V2, ARM_MMU500);
 ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2);
 
+static const char * const qcom_smmuv2_clks[] = {
+   "bus", "iface",
+};
+
+static const struct arm_smmu_match_data qcom_smmuv2 = {
+   .version = ARM_SMMU_V2,
+   .model = QCOM_SMMUV2,
+   .clks = qcom_smmuv2_clks,
+   .num_clks = ARRAY_SIZE(qcom_smmuv2_clks),
+};
+
 static const struct of_device_id arm_smmu_of_match[] = {
	{ .compatible = "arm,smmu-v1", .data = &smmu_generic_v1 },
	{ .compatible = "arm,smmu-v2", .data = &smmu_generic_v2 },
@@ -1978,6 +1990,7 @@ static const struct of_device_id arm_smmu_of_match[] = {
	{ .compatible = "arm,mmu-401", .data = &arm_mmu401 },
	{ .compatible = "arm,mmu-500", .data = &arm_mmu500 },
	{ .compatible = "cavium,smmu-v2", .data = &cavium_smmuv2 },
+	{ .compatible = "qcom,smmu-v2", .data = &qcom_smmuv2 },
{ },
 };
 MODULE_DEVICE_TABLE(of, arm_smmu_of_match);

Re: [PATCH v1] iommu/ipmmu-vmsa: Don't register as BUS IOMMU if machine doesn't have IPMMU-VMSA

2018-07-27 Thread Joerg Roedel
On Fri, Jul 27, 2018 at 12:19:16AM +0300, Dmitry Osipenko wrote:
> This fixes kernel crashing on NVIDIA Tegra if kernel is compiled in
> a multiplatform configuration and IPMMU-VMSA driver is enabled.
> 
> Cc:  # v3.20+
> Signed-off-by: Dmitry Osipenko 
> ---
>  drivers/iommu/ipmmu-vmsa.c | 7 +++
>  1 file changed, 7 insertions(+)

Applied, thanks.


Re: [PATCH] iommu/ipmmu-vmsa: Clarify supported platforms

2018-07-27 Thread Joerg Roedel
On Wed, Jul 25, 2018 at 03:10:29PM +0200, Geert Uytterhoeven wrote:
> The Renesas IPMMU-VMSA driver supports not just R-Car H2 and M2 SoCs,
> but also other R-Car Gen2 and R-Car Gen3 SoCs.
> 
> Drop a superfluous "Renesas" while at it.
> 
> Signed-off-by: Geert Uytterhoeven 
> ---
>  drivers/iommu/Kconfig | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Applied, thanks.



Re: MSI B350M MORTAR: `AMD-Vi: Unable to write to IOMMU perf counter.` and `pci 0000:00:00.2: can't derive routing for PCI INT A`

2018-07-27 Thread Jörg Rödel
Hi Paul,

On Mon, Jul 23, 2018 at 12:09:37PM +0200, Paul Menzel wrote:
> > or the BIOS did not enable the performance counter feature in the IOMMU
> > hardware.
> 
> Is it possible to check that from the OS side?

It would be if we had the NB documentation, but I guess those details
are not publicly available.

> > Are you running on the latest BIOS?
> 
> Yes, I am even using a “beta” one from [1].
> 
> DMI: MSI MS-7A37/B350M MORTAR (MS-7A37), BIOS 1.G1 05/17/2018

I think the best you can do is to report this as a bug to your BIOS
vendor and hope they'll fix it.


Joerg

Re: MSI B350M MORTAR: `AMD-Vi: Unable to write to IOMMU perf counter.` and `pci 0000:00:00.2: can't derive routing for PCI INT A`

2018-07-27 Thread Jörg Rödel
On Fri, Jul 27, 2018 at 11:18:56AM +0200, Paul Menzel wrote:
> On the AMD site [1] I see the three family 0x17 documents below.
> 
> 1.  Open-Source Register Reference for AMD Family 17h Processors (PUB)
> 2.  Processor Programming Reference (PPR) for AMD Family 17h Models 00h-0Fh 
> Processors (PUB)
> 3.  Software Optimization Guide for AMD Family 17h Processors (PUB)
> 
> Which document is required? The BKDG?

IIRC the PPR is the new BKDG, but the public version does not seem to
contain any references about how the IOMMU needs to be set up by
firmware.


Thanks,

Joerg


Re: [PATCH] iommu/arm-smmu-v3: sync the OVACKFLG to PRIQ consumer register

2018-07-27 Thread Zhangshaokun
Hi Will,

I saw it in your trees, is it necessary to Cc stable version?

Cheers,
Shaokun

On 2018/7/23 20:56, Shaokun Zhang wrote:
> From: Miao Zhong 
> 
> When PRI queue occurs overflow, driver should update the OVACKFLG to
> the PRIQ consumer register, otherwise subsequent PRI requests will not
> be processed.
> 
> Cc: Will Deacon 
> Cc: Robin Murphy  
> Signed-off-by: Miao Zhong 
> ---
>  drivers/iommu/arm-smmu-v3.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 1d64710..deacc15 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -1301,6 +1301,7 @@ static irqreturn_t arm_smmu_priq_thread(int irq, void *dev)
>  
>   /* Sync our overflow flag, as we believe we're up to speed */
>   q->cons = Q_OVF(q, q->prod) | Q_WRP(q, q->cons) | Q_IDX(q, q->cons);
> + writel(q->cons, q->cons_reg);
>   return IRQ_HANDLED;
>  }
>  
> 



Re: [PATCH 3/3] iommu/arm-smmu: Error out only if not enough context interrupts

2018-07-27 Thread Vivek Gautam
On Wed, Jul 25, 2018 at 5:27 PM, Will Deacon  wrote:
> On Tue, Jul 24, 2018 at 03:09:41PM +0530, Vivek Gautam wrote:
>> On 7/24/2018 2:06 PM, Will Deacon wrote:
>> >On Thu, Jul 19, 2018 at 11:23:56PM +0530, Vivek Gautam wrote:
>> >>diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> >>index 7c69736a30f8..4cb53bf4f423 100644
>> >>--- a/drivers/iommu/arm-smmu.c
>> >>+++ b/drivers/iommu/arm-smmu.c
>> >>@@ -2229,12 +2229,19 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
>> >>if (err)
>> >>return err;
>> >>-   if (smmu->version == ARM_SMMU_V2 &&
>> >>-   smmu->num_context_banks != smmu->num_context_irqs) {
>> >>-   dev_err(dev,
>> >>-   "found only %d context interrupt(s) but %d required\n",
>> >>-   smmu->num_context_irqs, smmu->num_context_banks);
>> >>-   return -ENODEV;
>> >>+   if (smmu->version == ARM_SMMU_V2) {
>> >>+   if (smmu->num_context_banks > smmu->num_context_irqs) {
>> >>+   dev_err(dev,
>> >>+ "found only %d context irq(s) but %d required\n",
>> >>+ smmu->num_context_irqs, smmu->num_context_banks);
>> >>+   return -ENODEV;
>> >>+   } else if (smmu->num_context_banks < smmu->num_context_irqs) {
>> >>+   /* lose extra context interrupts */
>> >>+   dev_notice(dev,
>> >>+ "found %d context irq(s) but only %d required\n",
>> >>+ smmu->num_context_irqs, smmu->num_context_banks);
>> >>+   smmu->num_context_irqs = smmu->num_context_banks;
>> >>+   }
>> >I don't see the utility in the new message. Can you simplify with the patch
>> >below on top? It's a bit weird that we only decide to ignore the extra irqs
>> >after calling platform_get_irq() on them, but that seems to be harmless.
>>
>> Thanks. I will modify as suggested below and respin.
>
> It's ok, I can make the change locally.

Thanks Will for making the changes, and picking this.

Best regards
Vivek



Re: [PATCH v1] iommu/ipmmu-vmsa: Don't register as BUS IOMMU if machine doesn't have IPMMU-VMSA

2018-07-27 Thread Laurent Pinchart
Hi Dmitry,

(CC'ing Geert and Magnus)

Thank you for the patch.

On Friday, 27 July 2018 00:19:16 EEST Dmitry Osipenko wrote:
> This fixes kernel crashing on NVIDIA Tegra if kernel is compiled in
> a multiplatform configuration and IPMMU-VMSA driver is enabled.
> 
> Cc:  # v3.20+
> Signed-off-by: Dmitry Osipenko 
> ---
>  drivers/iommu/ipmmu-vmsa.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
> index 9e8495762bc8..78c50db9cd71 100644
> --- a/drivers/iommu/ipmmu-vmsa.c
> +++ b/drivers/iommu/ipmmu-vmsa.c
> @@ -1109,12 +1109,19 @@ static struct platform_driver ipmmu_driver = {
> 
>  static int __init ipmmu_init(void)
>  {
> + struct device_node *np;
>   static bool setup_done;
>   int ret;
> 
>   if (setup_done)
>   return 0;
> 
> + np = of_find_matching_node(NULL, ipmmu_of_ids);
> + if (!np)
> + return 0;
> +
> + of_node_put(np);
> +

While functionally correct, this will add some unnecessary overhead when 
iommu_init() is called from IOMMU_OF_DECLARE(). I'm OK with this fix as a 
temporary measure to solve your problem, but we need to address the underlying 
issue properly.

Geert, Magnus, the ipmmu-vmsa driver is a bit of a mess. We should brush it up 
and start using IOMMU_OF_DECLARE() on all platforms (and eventually get rid of 
bus_set_iommu() completely...). Do you have plans to address this ? If not, 
could you please add it to your to-do list ?

>   ret = platform_driver_register(&ipmmu_driver);
>   if (ret < 0)
>   return ret;

-- 
Regards,

Laurent Pinchart





Re: MSI B350M MORTAR: `AMD-Vi: Unable to write to IOMMU perf counter.` and `pci 0000:00:00.2: can't derive routing for PCI INT A`

2018-07-27 Thread Paul Menzel
Dear Jörg,


On 07/27/18 10:27, Jörg Rödel wrote:

> On Mon, Jul 23, 2018 at 12:09:37PM +0200, Paul Menzel wrote:
>>> or the BIOS did not enable the performance counter feature in the IOMMU
>>> hardware.
>>
>> Is it possible to check that from the OS side?
> 
> It would be if we had the NB documentation, but I guess those details
> are not publicly available.

On the AMD site [1] I see the three family 0x17 documents below.

1.  Open-Source Register Reference for AMD Family 17h Processors (PUB)
2.  Processor Programming Reference (PPR) for AMD Family 17h Models 00h-0Fh 
Processors (PUB)
3.  Software Optimization Guide for AMD Family 17h Processors (PUB)

Which document is required? The BKDG?

>>> Are you running on the latest BIOS?
>>
>> Yes, I am even using a “beta” one from [1].
>>
>> DMI: MSI MS-7A37/B350M MORTAR (MS-7A37), BIOS 1.G1 05/17/2018
> 
> I think the best you can do is to report this as a bug to your BIOS
> vendor and hope they'll fix it.

I contacted them with the problem description. Let’s see what the
result will be.


Kind regards,

Paul


[1]: https://developer.amd.com/resources/developer-guides-manuals/




Re: [PATCH] iommu/arm-smmu-v3: sync the OVACKFLG to PRIQ consumer register

2018-07-27 Thread Will Deacon
On Fri, Jul 27, 2018 at 05:41:46PM +0800, Zhangshaokun wrote:
> I saw it in your trees, is it necessary to Cc stable version?

I don't think so, given that we don't actually support PRI upstream.

Will

> On 2018/7/23 20:56, Shaokun Zhang wrote:
> > From: Miao Zhong 
> > 
> > When PRI queue occurs overflow, driver should update the OVACKFLG to
> > the PRIQ consumer register, otherwise subsequent PRI requests will not
> > be processed.
> > 
> > Cc: Will Deacon 
> > Cc: Robin Murphy  
> > Signed-off-by: Miao Zhong 
> > ---
> >  drivers/iommu/arm-smmu-v3.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> > index 1d64710..deacc15 100644
> > --- a/drivers/iommu/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm-smmu-v3.c
> > @@ -1301,6 +1301,7 @@ static irqreturn_t arm_smmu_priq_thread(int irq, void *dev)
> >  
> > /* Sync our overflow flag, as we believe we're up to speed */
> > q->cons = Q_OVF(q, q->prod) | Q_WRP(q, q->cons) | Q_IDX(q, q->cons);
> > +   writel(q->cons, q->cons_reg);
> > return IRQ_HANDLED;
> >  }
> >  
> > 
> 


Re: [PATCH v2 7/7] OF: Don't set default coherent DMA mask

2018-07-27 Thread Robin Murphy

On 27/07/18 01:22, Grygorii Strashko wrote:
[...]

the result of this change looks pretty strange to me :(
Resulting code:

/*
 * If @dev is expected to be DMA-capable then the bus code that created
 * it should have initialised its dma_mask pointer by this point. For
 * now, we'll continue the legacy behaviour of coercing it to the
 * coherent mask if not, but we'll no longer do so quietly.
 */
if (!dev->dma_mask) {
dev_warn(dev, "DMA mask not set\n");
dev->dma_mask = &dev->coherent_dma_mask;
^ this will always produce a warning in the case of platform-bus, or if there
is no bus driver, even if the DT contains a correct dma-ranges definition
}

if (!size && dev->coherent_dma_mask)

^ coherent_dma_mask is zero, so size will not be calculated

please ignore this particular comment


size = max(dev->coherent_dma_mask, dev->coherent_dma_mask + 1);
else if (!size)
size = 1ULL << 32;

dev->dma_pfn_offset = offset;

/*
 * Limit coherent and dma mask based on size and default mask
 * set by the driver.
 */
mask = DMA_BIT_MASK(ilog2(dma_addr + size - 1) + 1);
dev->bus_dma_mask = mask;
dev->coherent_dma_mask &= mask;

^^ if nobody set coherent_dma_mask beforehand, it will stay zero forever
unless the drivers overwrite it, again even if the DT has a correct
dma-ranges definition.


That is intentional. Consider a 32-bit device on a bus with an upstream 
DMA range of 40 bits (which could easily happen with e.g. PCI). If the 
firmware code gives that device 40-bit DMA masks and the driver doesn't 
change them, then DMA is not going to work correctly. Therefore the 
firmware code cannot safely make {coherent_}dma_mask larger than the 
default value passed in, and if the default is 0 then that should be 
fixed in whatever code created the device, not worked around later.


Robin.



*dev->dma_mask &= mask;

coherent = of_dma_is_coherent(np);
dev_dbg(dev, "device is%sdma coherent\n",
coherent ? " " : " not ");




FYI, with below diff i can boot at least:

diff --git a/drivers/of/device.c b/drivers/of/device.c
index 5957cd4..f7dc121 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -150,8 +150,8 @@ int of_dma_configure(struct device *dev, struct device_node 
*np, bool force_dma)
  */
 mask = DMA_BIT_MASK(ilog2(dma_addr + size - 1) + 1);
 dev->bus_dma_mask = mask;
-   dev->coherent_dma_mask &= mask;
-   *dev->dma_mask &= mask;
+   dev->coherent_dma_mask = mask;
+   *dev->dma_mask = mask;
  
 coherent = of_dma_is_coherent(np);

 dev_dbg(dev, "device is%sdma coherent\n",






[GIT PULL] iommu/arm-smmu: Updates for 4.19

2018-07-27 Thread Will Deacon
Hi Joerg,

Please pull these ARM SMMU updates for 4.19. Most of these are non-critical
fixes, but the main change is switching our default behaviour so that we
now abort transactions originating from unknown devices (i.e. those which
are not attached to an iommu domain) rather than silently let them bypass
the SMMU.

Cheers,

Will

--->8

The following changes since commit ce397d215ccd07b8ae3f71db689aedb85d56ab40:

  Linux 4.18-rc1 (2018-06-17 08:04:49 +0900)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git 
for-joerg/arm-smmu/updates

for you to fetch changes up to b63b3439b85609338e4faabd5d2588dbda137e5c:

  iommu/arm-smmu-v3: Abort all transactions if SMMU is enabled in kdump kernel 
(2018-07-27 11:12:37 +0100)


Jean-Philippe Brucker (2):
  iommu/io-pgtable-arm: Fix pgtable allocation in selftest
  iommu/io-pgtable-arm-v7s: Abort allocation when table address overflows 
the PTE

Miao Zhong (1):
  iommu/arm-smmu-v3: sync the OVACKFLG to PRIQ consumer register

Vivek Gautam (1):
  iommu/arm-smmu: Error out only if not enough context interrupts

Will Deacon (1):
  iommu/arm-smmu-v3: Abort all transactions if SMMU is enabled in kdump 
kernel

Zhen Lei (1):
  iommu/arm-smmu-v3: Prevent any devices access to memory without 
registration

 drivers/iommu/arm-smmu-v3.c| 25 ++---
 drivers/iommu/arm-smmu.c   | 16 ++--
 drivers/iommu/io-pgtable-arm-v7s.c |  7 ++-
 drivers/iommu/io-pgtable-arm.c |  3 ++-
 4 files changed, 36 insertions(+), 15 deletions(-)


Re: [PATCH v3 0/6] add non-strict mode support for arm-smmu-v3

2018-07-27 Thread Will Deacon
On Fri, Jul 27, 2018 at 10:49:59AM +0800, Leizhen (ThunderTown) wrote:
> On 2018/7/26 22:16, Robin Murphy wrote:
> > On 2018-07-26 4:44 AM, Leizhen (ThunderTown) wrote:
> >> Passthrough is not enough to support VFIO, and virtualization needs
> >> the latter.
> > 
> > Huh? The whole point of passthrough mode is that the IOMMU can still be
> > used for VFIO, but without imposing the overhead of dynamic mapping on
> > host DMA.
> 
> I said that from my experience. Userspace does not know the PA, so I think
> the user cannot fill dma_map.iova correctly.
> 
>   /* Allocate some space and setup a DMA mapping */
>   dma_map.vaddr = (__u64)mmap(0, 0x1000, PROT_READ | PROT_WRITE,
>MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
>   dma_map.size = 0x1000;
>   dma_map.iova = 0x2fUL;  
> /* user specified */
>   dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
> 
>   ioctl(container, VFIO_IOMMU_MAP_DMA, &dma_map);

Hmm, I'm pretty sure that's not the case. When passthrough is configured
via iommu.passthrough, it only applies to the default domain and therefore
won't affect the unmanaged domain that VFIO manages explicitly. So VFIO
will continue to use translation and userspace can't pass whatever it likes
for the iova.

Will


Re: [PATCH] iommu/arm-smmu-v3: sync the OVACKFLG to PRIQ consumer register

2018-07-27 Thread Zhangshaokun


On 2018/7/27 17:48, Will Deacon wrote:
> On Fri, Jul 27, 2018 at 05:41:46PM +0800, Zhangshaokun wrote:
>> I saw it in your trees; is it necessary to Cc the stable tree?
> 
> I don't think so, given that we don't actually support PRI upstream.
> 

Got it, thanks your reply.

Shaokun

> Will
> 
>> On 2018/7/23 20:56, Shaokun Zhang wrote:
>>> From: Miao Zhong 
>>>
>>> When the PRI queue overflows, the driver should update the OVACKFLG in
>>> the PRIQ consumer register, otherwise subsequent PRI requests will not
>>> be processed.
>>>
>>> Cc: Will Deacon 
>>> Cc: Robin Murphy  
>>> Signed-off-by: Miao Zhong 
>>> ---
>>>  drivers/iommu/arm-smmu-v3.c | 1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>>> index 1d64710..deacc15 100644
>>> --- a/drivers/iommu/arm-smmu-v3.c
>>> +++ b/drivers/iommu/arm-smmu-v3.c
>>> @@ -1301,6 +1301,7 @@ static irqreturn_t arm_smmu_priq_thread(int irq, void 
>>> *dev)
>>>  
>>> /* Sync our overflow flag, as we believe we're up to speed */
>>> q->cons = Q_OVF(q, q->prod) | Q_WRP(q, q->cons) | Q_IDX(q, q->cons);
>>> +   writel(q->cons, q->cons_reg);
>>> return IRQ_HANDLED;
>>>  }
>>>  
>>>
>>
> 
> .
> 



Re: [PATCH v2 0/7] Stop losing firmware-set DMA masks

2018-07-27 Thread Robin Murphy

Hi Grygorii,

Thanks for the report.

On 27/07/18 00:45, Grygorii Strashko wrote:
[...]

With this series applied I can't boot TI ARM32 am574x-idk any more.

And log output is full of "DMA mask not set" -


That's somewhat expected - as the relevant commit message mentions, 
there will be bugs flushed out by this change, but the point is to fix 
them :)



nobody sets dma_mask for platform bus in case of OF boot :(


Right, after a brief investigation, that one turns out to be nice and 
straightforward to explain and fix; I'll write up the patch shortly.


[...]

[3.793493] ti-qspi 4b30.qspi: dma_alloc_coherent failed, using PIO mode
[3.801088] Unable to handle kernel NULL pointer dereference at virtual 
address 0080

[...]

[4.602777] [] (omap_dma_write) from [] 
(omap_dma_start_desc+0x78/0x150)
[4.611258] [] (omap_dma_start_desc) from [] 
(omap_dma_issue_pending+0x90/0x98)
[4.620347]  r9:ed002780 r8: r7:0010 r6:ee2bdd00 r5:ee2bdd4c 
r4:ee2bdd00
[4.628130] [] (omap_dma_issue_pending) from [] 
(ti_qspi_dma_xfer+0x8c/0x13c)
[4.637039]  r5: r4:ee7e1b28
[4.640636] [] (ti_qspi_dma_xfer) from [] 
(ti_qspi_exec_mem_op+0x2e4/0x32c)


Hooray, a real bug! Since dma_alloc_coherent() could have failed due to 
lack of memory or any other reason, that's not even directly related to 
this change, we've simply helped uncover it. I guess that PIO fallback 
path hasn't been tested recently, because it clearly doesn't work. It 
would be good if someone who knows the ti-qspi driver and has the means 
to test it could figure that one out before also adding the missing 
dma_set_coherent_mask() call.


Robin.


Re: [RFC PATCH v1 0/6] Resolve unwanted DMA backing with IOMMU

2018-07-27 Thread Will Deacon
On Fri, Jul 27, 2018 at 10:25:13AM +0200, Joerg Roedel wrote:
> On Fri, Jul 27, 2018 at 02:16:18AM +0300, Dmitry Osipenko wrote:
> > The proposed solution adds a new option to the base device driver
> > structure that allows device drivers to explicitly convey to the drivers
> > core that the implicit IOMMU backing for devices must not happen.
> 
> Why is IOMMU mapping a problem for the Tegra GPU driver?
> 
> If we add something like this then it should not be the choice of the
> device driver, but of the user and/or the firmware.

Agreed, and it would still need somebody to configure an identity domain so
that transactions aren't aborted immediately. We currently allow the
identity domain to be used by default via a command-line option, so I guess
we'd need a way for firmware to request that on a per-device basis.

Will


Re: [PATCH] iommu/iova: Update cached node pointer when current node fails to get any free IOVA

2018-07-27 Thread Ganapatrao Kulkarni
Hi Robin,

On Wed, Jul 25, 2018 at 7:50 PM, Robin Murphy  wrote:
> On 12/07/18 08:45, Ganapatrao Kulkarni wrote:
>>
>> Hi Robin,
>>
>>
>> On Mon, Jun 4, 2018 at 9:36 AM, Ganapatrao Kulkarni 
>> wrote:
>>>
>>> ping??
>>>
>>> On Mon, May 21, 2018 at 6:45 AM, Ganapatrao Kulkarni 
>>> wrote:

 On Thu, Apr 26, 2018 at 3:15 PM, Ganapatrao Kulkarni
  wrote:
>
> Hi Robin,
>
> On Mon, Apr 23, 2018 at 11:11 PM, Ganapatrao Kulkarni
>  wrote:
>>
>> On Mon, Apr 23, 2018 at 10:07 PM, Robin Murphy 
>> wrote:
>>>
>>> On 19/04/18 18:12, Ganapatrao Kulkarni wrote:


 The performance drop is observed with long hours iperf testing using
 40G
 cards. This is mainly due to long iterations in finding the free
 iova
 range in 32bit address space.

 In the current implementation for 64-bit PCI devices, there is always
 a first attempt to allocate an iova from the 32-bit (SAC preferred
 over DAC) address range. Once we run out of the 32-bit range,
 allocation falls back to the higher range; however, thanks to the
 cached32_node optimization, this is not supposed to be painful.
 cached32_node always points to the most recently allocated 32-bit
 node. When the address range is full, it will point to the last
 allocated node (the leaf node), so walking the rbtree to find the
 available range is not an expensive affair. However, this optimization
 does not behave well when one of the middle nodes is freed. In that
 case cached32_node is updated to point to the next iova range. The
 next iova allocation will consume that free range and again update
 cached32_node to itself. From then on, walking over the 32-bit range
 is more expensive.

 This patch adds a fix to update the cached node to the leaf node when
 there is no free iova range left, which avoids unnecessarily long
 iterations.
>>>
>>>
>>>
>>> The only trouble with this is that "allocation failed" doesn't
>>> uniquely mean
>>> "space full". Say that after some time the 32-bit space ends up empty
>>> except
>>> for one page at 0x1000 and one at 0x8000, then somebody tries to
>>> allocate 2GB. If we move the cached node down to the leftmost entry
>>> when
>>> that fails, all subsequent allocation attempts are now going to fail
>>> despite
>>> the space being 99.% free!
>>>
>>> I can see a couple of ways to solve that general problem of free
>>> space above
>>> the cached node getting lost, but neither of them helps with the case
>>> where
>>> there is genuinely insufficient space (and if anything would make it
>>> even
>>> slower). In terms of the optimisation you want here, i.e. fail fast
>>> when an
>>> allocation cannot possibly succeed, the only reliable idea which
>>> comes to
>>> mind is free-PFN accounting. I might give that a go myself to see how
>>> ugly
>>> it looks.
>>
>>
>> did you get any chance to look in to this issue?
>> i am waiting for your suggestion/patch for this issue!
>
>
> I got as far as [1], but I wasn't sure how much I liked it, since it still
> seems a little invasive for such a specific case (plus I can't remember if
> it's actually been debugged or not). I think in the end I started wondering
> whether it's even worth bothering with the 32-bit optimisation for PCIe
> devices - 4 extra bytes worth of TLP is surely a lot less significant than
> every transaction taking up to 50% more bus cycles was for legacy PCI.

How about tracking the previous attempt to get a 32-bit range iova and
avoiding further attempts if it failed? Attempts would resume once 32-bit
iovas are replenished.
Created patch for the same [2]

[2] 
https://github.com/gpkulkarni/linux/commit/e2343a3e1f55cdeb5694103dd354bcb881dc65c3
note, the testing of this patch is in progress.

>
> Robin.
>
> [1]
> http://www.linux-arm.org/git?p=linux-rm.git;a=commitdiff;h=a8e0e4af10ebebb3669750e05bf0028e5bd6afe8

thanks
Ganapat


Re: [PATCH] of/platform: Initialise default DMA masks

2018-07-27 Thread Christoph Hellwig
On Fri, Jul 27, 2018 at 03:14:15PM +0100, Robin Murphy wrote:
> This shouldn't strictly depend on the changes currently queued in the
> dma-mapping tree, so should be OK to go through the DT tree in parallel.
> Ideally we'd fix all DMA-capable drivers to set their masks correctly,
> but in the short term everyone's going to get cross about -rc1 not
> booting...

I'd prefer to pick this up through the dma-mapping tree, and in fact
move it before the bus_dma_mask changes just to keep a bisectable
tree.


Re: [PATCH 1/3] dmapool: improve scalability of dma_pool_alloc

2018-07-27 Thread Tony Battersby

> That would be true if the test were "if
> (list_empty(&pool->avail_page_list))".  But it is testing the list
> pointers in the item rather than the list pointers in the pool.  It may
> be a bit confusing if you have never seen that usage before, which is
> why I added a comment.  Basically, if you use list_del_init() instead of
> list_del(), then you can use list_empty() on the item itself to test if
> the item is present in a list or not.  For example, the comments in
> list.h warn not to use list_empty() on the entry after just list_del():
>
> /**
>  * list_del - deletes entry from list.
>  * @entry: the element to delete from the list.
>  * Note: list_empty() on entry does not return true after this, the entry is
>  * in an undefined state.
>  */

Sorry for the crappy line length (fixed above).  Should have just used
Preformat in Thunderbird like always rather than following
Documentation/process/email-clients.rst and changing mailnews.wraplength
from 72 to 0.

Anyway, I have been using list_del_init() for a long time in various
places, but now I can't find where any of its useful features are
documented.  I will have to submit a separate patch to add more
documentation for it.  I find it useful for two things.

1) If you use list_del_init(), you can delete an item from a list
multiple times without checking if it has already been deleted.  So the
following is safe:

list_add(entry, list);
list_del_init(entry);
list_del_init(entry);

That would not be safe if just using list_del().

2) If you use list_del_init(), you can test if an item is present in a
list using list_empty() on the entry.  So:

list_add(entry, list);
/* list_empty(entry) is false */
list_del_init(entry);
/* list_empty(entry) is true */

That would not work if using just list_del().

Since the list_empty() name is unintuitive for this purpose, it might be
useful to add a new macro for this use case.


Re: [RFC PATCH v1 0/6] Resolve unwanted DMA backing with IOMMU

2018-07-27 Thread Dmitry Osipenko
On Friday, 27 July 2018 12:03:28 MSK Will Deacon wrote:
> On Fri, Jul 27, 2018 at 10:25:13AM +0200, Joerg Roedel wrote:
> > On Fri, Jul 27, 2018 at 02:16:18AM +0300, Dmitry Osipenko wrote:
> > > The proposed solution adds a new option to the base device driver
> > > structure that allows device drivers to explicitly convey to the drivers
> > > core that the implicit IOMMU backing for devices must not happen.
> > 
> > Why is IOMMU mapping a problem for the Tegra GPU driver?
> > 
> > If we add something like this then it should not be the choice of the
> > device driver, but of the user and/or the firmware.
> 
> Agreed, and it would still need somebody to configure an identity domain so
> that transactions aren't aborted immediately. We currently allow the
> identity domain to be used by default via a command-line option, so I guess
> we'd need a way for firmware to request that on a per-device basis.

The IOMMU mapping itself is not a problem; the problem is the management of 
the IOMMU. For Tegra we don't want anything to intrude into the IOMMU 
activities because:

1) The GPU HW requires additional configuration for IOMMU usage, and dumb 
mapping of the allocations simply doesn't work.

2) Older Tegra generations have limited resources and capabilities with 
regard to IOMMU usage; allocating an IOMMU domain per device is simply 
impossible, for example.

3) The HW performs context switches, and so particular allocations have to be 
assigned to a particular context's IOMMU domain.

Some of the above is due to the SW driver model (and its work-in-progress 
status), the rest is due to the HW model. So essentially we need a way for a 
driver to tell the core not to mess with the IOMMU setup of the driver's 
device behind the driver's back.

I'm not sure what you guys mean by the "firmware"; could you elaborate, 
please? Do you mean the Open Firmware, and hence the devicetree, or what?





[PATCH] of/platform: Initialise default DMA masks

2018-07-27 Thread Robin Murphy
When of_dma_configure() was first born in 591c1ee465ce ("of: configure
the platform device dma parameters"), everything DMA-related was
factored out of of_platform_device_create_pdata() as seemed appropriate
at the time. However, now that of_dma_configure() has grown into the
generic handler for processing DMA-related properties from DT for all
kinds of devices, it is no longer an appropriate place to be doing
OF-platform-specific business. Since there are still plenty of platform
drivers not setting their own masks and depending on the bus default,
let's reinstate that initialisation in the OF-platform code itself, and
restore the long-standing status quo from 0589342c2794 ("of: set
dma_mask to point to coherent_dma_mask").

CC: Rob Herring 
CC: Frank Rowand 
Signed-off-by: Robin Murphy 
---

This shouldn't strictly depend on the changes currently queued in the
dma-mapping tree, so should be OK to go through the DT tree in parallel.
Ideally we'd fix all DMA-capable drivers to set their masks correctly,
but in the short term everyone's going to get cross about -rc1 not
booting...

Robin.

 drivers/of/platform.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index 6925d993e1f0..7ba90c290a42 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -185,6 +185,9 @@ static struct platform_device 
*of_platform_device_create_pdata(
if (!dev)
goto err_clear_flag;
 
+   dev->dev.coherent_dma_mask = DMA_BIT_MASK(32);
+   if (!dev->dev.dma_mask)
+   dev->dev.dma_mask = &dev->dev.coherent_dma_mask;
dev->dev.bus = &platform_bus_type;
dev->dev.platform_data = platform_data;
of_msi_configure(&dev->dev, dev->dev.of_node);
-- 
2.17.1.dirty



Re: [PATCH] of/platform: Initialise default DMA masks

2018-07-27 Thread Robin Murphy

On 27/07/18 15:20, Christoph Hellwig wrote:

On Fri, Jul 27, 2018 at 03:14:15PM +0100, Robin Murphy wrote:

This shouldn't strictly depend on the changes currently queued in the
dma-mapping tree, so should be OK to go through the DT tree in parallel.
Ideally we'd fix all DMA-capable drivers to set their masks correctly,
but in the short term everyone's going to get cross about -rc1 not
booting...


I'd prefer to pick this up through the dma-mapping tree, and in fact
move it before the bus_dma_mask changes just to keep a bisectable
tree.


Sure, if Rob and Frank are OK with that I don't mind either way.

Thanks,
Robin.


Re: [PATCH 2/3] dmapool: improve scalability of dma_pool_free

2018-07-27 Thread Tony Battersby
On 07/26/2018 08:07 PM, Matthew Wilcox wrote:
> On Thu, Jul 26, 2018 at 04:06:05PM -0400, Tony Battersby wrote:
>> On 07/26/2018 03:42 PM, Matthew Wilcox wrote:
>>> On Thu, Jul 26, 2018 at 02:54:56PM -0400, Tony Battersby wrote:
 dma_pool_free() scales poorly when the pool contains many pages because
 pool_find_page() does a linear scan of all allocated pages.  Improve its
 scalability by replacing the linear scan with a red-black tree lookup. 
 In big O notation, this improves the algorithm from O(n^2) to O(n * log n).
>>> This is a lot of code to get us to O(n * log(n)) when we can use less
>>> code to go to O(n).  dma_pool_free() is passed the virtual address.
>>> We can go from virtual address to struct page with virt_to_page().
>>> In struct page, we have 5 words available (20/40 bytes), and it's trivial
>>> to use one of them to point to the struct dma_page.
>>>
>> Thanks for the tip.  I will give that a try.
> If you're up for more major surgery, then I think we can put all the
> information currently stored in dma_page into struct page.  Something
> like this:
>
> +++ b/include/linux/mm_types.h
> @@ -152,6 +152,12 @@ struct page {
> unsigned long hmm_data;
> unsigned long _zd_pad_1;/* uses mapping */
> };
> +   struct {/* dma_pool pages */
> +   struct list_head dma_list;
> +   unsigned short in_use;
> +   unsigned short offset;
> +   dma_addr_t dma;
> +   };
>  
> /** @rcu_head: You can use this to free a page by RCU. */
> struct rcu_head rcu_head;
>
> page_list -> dma_list
> vaddr goes away (page_to_virt() exists)
> dma -> dma
> in_use and offset shrink from 4 bytes to 2.
>
> Some 32-bit systems have a 64-bit dma_addr_t, and on those systems,
> this will be 8 + 2 + 2 + 8 = 20 bytes.  On 64-bit systems, it'll be
> 16 + 2 + 2 + 4 bytes of padding + 8 = 32 bytes (we have 40 available).
>
>
offset at least needs more bits, since allocations can be multi-page. 
See the following from mpt3sas:

cat /sys/devices/pci:80/:80:07.0/:85:00.0/pools
(manually cleaned up column alignment)
poolinfo - 0.1
reply_post_free_array pool  1  21 192 1
reply_free pool 1  1  41728   1
reply pool  1  1  1335296 1
sense pool  1  1  970272  1
chain pool  373959 386048 128 12064
reply_post_free pool12 12 166528  12
  ^size^

In my initial implementation I also added a pointer from struct page to
the dma_pool so that dma_pool_free() could sanity-check that the page
really belongs to the pool, as in:

pg = virt_to_page(vaddr);
if (unlikely(pg->dma_pool != pool)) {
handle error...
}
page = pg->dma_page;

The linked-list search previously implemented that check as a
by-product, and I didn't want to lose it.  It all seems to be working so
far.


Re: [GIT PULL] iommu/arm-smmu: Updates for 4.19

2018-07-27 Thread Joerg Roedel
On Fri, Jul 27, 2018 at 12:51:42PM +0100, Will Deacon wrote:
> The following changes since commit ce397d215ccd07b8ae3f71db689aedb85d56ab40:
> 
>   Linux 4.18-rc1 (2018-06-17 08:04:49 +0900)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git 
> for-joerg/arm-smmu/updates
> 
> for you to fetch changes up to b63b3439b85609338e4faabd5d2588dbda137e5c:
> 
>   iommu/arm-smmu-v3: Abort all transactions if SMMU is enabled in kdump 
> kernel (2018-07-27 11:12:37 +0100)

Pulled, thanks Will.



Re: [PATCH] iommu/iova: Update cached node pointer when current node fails to get any free IOVA

2018-07-27 Thread Robin Murphy

On 27/07/18 13:56, Ganapatrao Kulkarni wrote:
[...]

did you get any chance to look in to this issue?
i am waiting for your suggestion/patch for this issue!



I got as far as [1], but I wasn't sure how much I liked it, since it still
seems a little invasive for such a specific case (plus I can't remember if
it's actually been debugged or not). I think in the end I started wondering
whether it's even worth bothering with the 32-bit optimisation for PCIe
devices - 4 extra bytes worth of TLP is surely a lot less significant than
every transaction taking up to 50% more bus cycles was for legacy PCI.


How about tracking the previous attempt to get a 32-bit range iova and
avoiding further attempts if it failed? Attempts would resume once 32-bit
iovas are replenished.
Created patch for the same [2]


Ooh, that's a much neater implementation of essentially the same concept 
- now why couldn't I think of that? :)


Looks like it should be possible to make it entirely self-contained too, 
since alloc_iova() is in a position to both test and update the flag 
based on the limit_pfn passed in.


Robin.



[2] 
https://github.com/gpkulkarni/linux/commit/e2343a3e1f55cdeb5694103dd354bcb881dc65c3
note, the testing of this patch is in progress.



Robin.

[1]
http://www.linux-arm.org/git?p=linux-rm.git;a=commitdiff;h=a8e0e4af10ebebb3669750e05bf0028e5bd6afe8


thanks
Ganapat




Re: [RFC PATCH v1 0/6] Resolve unwanted DMA backing with IOMMU

2018-07-27 Thread Rob Herring
On Fri, Jul 27, 2018 at 8:20 AM Joerg Roedel  wrote:
>
> On Fri, Jul 27, 2018 at 05:10:22PM +0300, Dmitry Osipenko wrote:
> > I'm not sure what you guys are meaning by the "firmware", could you 
> > elaborate
> > please? Do you mean the Open Firmware and hence the devicetree or what?
>
> Yes, I think the best way to request this is using a device-tree
> property. Letting the device driver request it is definitely a bad idea.

I don't follow why we need a property rather than being implied by the
device's (the GPU) compatible string.

Rob


Re: [PATCH 2/3] dmapool: improve scalability of dma_pool_free

2018-07-27 Thread Matthew Wilcox
On Fri, Jul 27, 2018 at 09:23:30AM -0400, Tony Battersby wrote:
> On 07/26/2018 08:07 PM, Matthew Wilcox wrote:
> > If you're up for more major surgery, then I think we can put all the
> > information currently stored in dma_page into struct page.  Something
> > like this:
> >
> > +++ b/include/linux/mm_types.h
> > @@ -152,6 +152,12 @@ struct page {
> > unsigned long hmm_data;
> > unsigned long _zd_pad_1;/* uses mapping */
> > };
> > +   struct {/* dma_pool pages */
> > +   struct list_head dma_list;
> > +   unsigned short in_use;
> > +   unsigned short offset;
> > +   dma_addr_t dma;
> > +   };
> >  
> > /** @rcu_head: You can use this to free a page by RCU. */
> > struct rcu_head rcu_head;
> >
> > page_list -> dma_list
> > vaddr goes away (page_to_virt() exists)
> > dma -> dma
> > in_use and offset shrink from 4 bytes to 2.
> >
> > Some 32-bit systems have a 64-bit dma_addr_t, and on those systems,
> > this will be 8 + 2 + 2 + 8 = 20 bytes.  On 64-bit systems, it'll be
> > 16 + 2 + 2 + 4 bytes of padding + 8 = 32 bytes (we have 40 available).
> >
> >
> offset at least needs more bits, since allocations can be multi-page. 

Ah, rats.  That means we have to use the mapcount union too:

+++ b/include/linux/mm_types.h
@@ -152,6 +152,11 @@ struct page {
unsigned long hmm_data;
unsigned long _zd_pad_1;/* uses mapping */
};
+   struct {/* dma_pool pages */
+   struct list_head dma_list;
+   unsigned int dma_in_use;
+   dma_addr_t dma;
+   };
 
/** @rcu_head: You can use this to free a page by RCU. */
struct rcu_head rcu_head;
@@ -174,6 +179,7 @@ struct page {
 
unsigned int active;/* SLAB */
int units;  /* SLOB */
+   unsigned int dma_offset;/* dma_pool */
};
 
/* Usage count. *DO NOT USE DIRECTLY*. See page_ref.h */


> See the following from mpt3sas:
> 
> cat /sys/devices/pci:80/:80:07.0/:85:00.0/pools
> (manually cleaned up column alignment)
> poolinfo - 0.1
> reply_post_free_array pool  1  21 192 1
> reply_free pool 1  1  41728   1
> reply pool  1  1  1335296 1
> sense pool  1  1  970272  1
> chain pool  373959 386048 128 12064
> reply_post_free pool12 12 166528  12
>   ^size^

Wow, that's a pretty weird way to use the dmapool.  It'd be more efficient
to just call dma_alloc_coherent() directly.



[PATCH] sparc: use generic dma_noncoherent_ops

2018-07-27 Thread Christoph Hellwig
Switch to the generic noncoherent direct mapping implementation.

This removes the previous sync_single_for_device implementation, which
looks bogus given that no syncing is happening in the similar but more
important map_single case.

Signed-off-by: Christoph Hellwig 
---
 arch/sparc/Kconfig   |   2 +
 arch/sparc/include/asm/dma-mapping.h |   5 +-
 arch/sparc/kernel/ioport.c   | 151 ++-
 3 files changed, 14 insertions(+), 144 deletions(-)

diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index 0f535debf802..79f29c67291a 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -48,6 +48,8 @@ config SPARC
 
 config SPARC32
def_bool !64BIT
+   select ARCH_HAS_SYNC_DMA_FOR_CPU
+   select DMA_NONCOHERENT_OPS
select GENERIC_ATOMIC64
select CLZ_TAB
select HAVE_UID16
diff --git a/arch/sparc/include/asm/dma-mapping.h 
b/arch/sparc/include/asm/dma-mapping.h
index 12ae33daf52f..e17566376934 100644
--- a/arch/sparc/include/asm/dma-mapping.h
+++ b/arch/sparc/include/asm/dma-mapping.h
@@ -7,7 +7,6 @@
 #include 
 
 extern const struct dma_map_ops *dma_ops;
-extern const struct dma_map_ops pci32_dma_ops;
 
 extern struct bus_type pci_bus_type;
 
@@ -15,11 +14,11 @@ static inline const struct dma_map_ops 
*get_arch_dma_ops(struct bus_type *bus)
 {
 #ifdef CONFIG_SPARC_LEON
if (sparc_cpu_model == sparc_leon)
-   return &pci32_dma_ops;
+   return &dma_noncoherent_ops;
#endif
#if defined(CONFIG_SPARC32) && defined(CONFIG_PCI)
if (bus == &pci_bus_type)
-   return &pci32_dma_ops;
+   return &dma_noncoherent_ops;
 #endif
return dma_ops;
 }
diff --git a/arch/sparc/kernel/ioport.c b/arch/sparc/kernel/ioport.c
index cca9134cfa7d..960c1bc22c2e 100644
--- a/arch/sparc/kernel/ioport.c
+++ b/arch/sparc/kernel/ioport.c
@@ -38,6 +38,7 @@
 #include 
 #include 
 #include 
+#include <linux/dma-noncoherent.h>
 #include 
 
 #include 
@@ -434,9 +435,8 @@ arch_initcall(sparc_register_ioport);
 /* Allocate and map kernel buffer using consistent mode DMA for a device.
  * hwdev should be valid struct pci_dev pointer for PCI devices.
  */
-static void *pci32_alloc_coherent(struct device *dev, size_t len,
- dma_addr_t *pba, gfp_t gfp,
- unsigned long attrs)
+void *arch_dma_alloc(struct device *dev, size_t len, dma_addr_t *pba,
+		gfp_t gfp, unsigned long attrs)
 {
unsigned long len_total = PAGE_ALIGN(len);
void *va;
@@ -488,8 +488,8 @@ static void *pci32_alloc_coherent(struct device *dev, size_t len,
  * References to the memory and mappings associated with cpu_addr/dma_addr
  * past this call are illegal.
  */
-static void pci32_free_coherent(struct device *dev, size_t n, void *p,
-   dma_addr_t ba, unsigned long attrs)
+void arch_dma_free(struct device *dev, size_t n, void *p, dma_addr_t ba,
+   unsigned long attrs)
 {
struct resource *res;
 
@@ -519,146 +519,15 @@ static void pci32_free_coherent(struct device *dev, size_t n, void *p,
free_pages((unsigned long)phys_to_virt(ba), get_order(n));
 }
 
-/*
- * Same as pci_map_single, but with pages.
- */
-static dma_addr_t pci32_map_page(struct device *dev, struct page *page,
-unsigned long offset, size_t size,
-enum dma_data_direction dir,
-unsigned long attrs)
-{
-   /* IIep is write-through, not flushing. */
-   return page_to_phys(page) + offset;
-}
-
-static void pci32_unmap_page(struct device *dev, dma_addr_t ba, size_t size,
-enum dma_data_direction dir, unsigned long attrs)
-{
-   if (dir != PCI_DMA_TODEVICE && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
-   dma_make_coherent(ba, PAGE_ALIGN(size));
-}
-
-/* Map a set of buffers described by scatterlist in streaming
- * mode for DMA.  This is the scatter-gather version of the
- * above pci_map_single interface.  Here the scatter gather list
- * elements are each tagged with the appropriate dma address
- * and length.  They are obtained via sg_dma_{address,length}(SG).
- *
- * NOTE: An implementation may be able to use a smaller number of
- *   DMA address/length pairs than there are SG table elements.
- *   (for example via virtual mapping capabilities)
- *   The routine returns the number of addr/length pairs actually
- *   used, at most nents.
- *
- * Device ownership issues as mentioned above for pci_map_single are
- * the same here.
- */
-static int pci32_map_sg(struct device *device, struct scatterlist *sgl,
-   int nents, enum dma_data_direction dir,
-   unsigned long attrs)
-{
-   struct scatterlist *sg;
-   int n;
-
-   /* IIep is write-through, not flushing. */
-   for_each_sg(sgl, sg, nents, n) {
-   sg->dma_address = sg_phys(sg);
-   

generic noncoherent dma ops for sparc32, 2nd resend

2018-07-27 Thread Christoph Hellwig
Dave, can you review this and apply if OK?  I've sent this
twice already but never got an answer.  Out of my batch of
conversions, sparc is one of only two that haven't been
merged yet.




Re: [PATCH] of/platform: Initialise default DMA masks

2018-07-27 Thread Christoph Hellwig
On Fri, Jul 27, 2018 at 10:50:34AM -0600, Rob Herring wrote:
> On Fri, Jul 27, 2018 at 8:18 AM Robin Murphy  wrote:
> >
> > On 27/07/18 15:20, Christoph Hellwig wrote:
> > > On Fri, Jul 27, 2018 at 03:14:15PM +0100, Robin Murphy wrote:
> > >> This shouldn't strictly depend on the changes currently queued in the
> > >> dma-mapping tree, so should be OK to go through the DT tree in parallel.
> > >> Ideally we'd fix all DMA-capable drivers to set their masks correctly,
> > >> but in the short term everyone's going to get cross about -rc1 not
> > >> booting...
> > >
> > > I'd prefer to pick this up through the dma-mapping tree, and in fact
> > > move it before the bus_dma_mask changes just to keep a bisectable
> > > tree.
> >
> > Sure, if Rob and Frank are OK with that I don't mind either way.
> 
> NP.
> 
> Acked-by: Rob Herring 

Ok, applied so that it gets into the next linux-next.


Re: [RFC PATCH v1 0/6] Resolve unwanted DMA backing with IOMMU

2018-07-27 Thread Jordan Crouse
On Fri, Jul 27, 2018 at 05:02:37PM +0100, Robin Murphy wrote:
> On 27/07/18 15:10, Dmitry Osipenko wrote:
> >On Friday, 27 July 2018 12:03:28 MSK Will Deacon wrote:
> >>On Fri, Jul 27, 2018 at 10:25:13AM +0200, Joerg Roedel wrote:
> >>>On Fri, Jul 27, 2018 at 02:16:18AM +0300, Dmitry Osipenko wrote:
> The proposed solution adds a new option to the base device driver
> structure that allows device drivers to explicitly convey to the drivers
> core that the implicit IOMMU backing for devices must not happen.
> >>>
> >>>Why is IOMMU mapping a problem for the Tegra GPU driver?
> >>>
> >>>If we add something like this then it should not be the choice of the
> >>>device driver, but of the user and/or the firmware.
> >>
> >>Agreed, and it would still need somebody to configure an identity domain so
> >>that transactions aren't aborted immediately. We currently allow the
> >>identity domain to be used by default via a command-line option, so I guess
> >>we'd need a way for firmware to request that on a per-device basis.
> >
> >The IOMMU mapping itself is not a problem, the problem is the management of
> >the IOMMU. For Tegra we don't want anything to intrude into the IOMMU
> >activities because:
> >
> >1) GPU HW require additional configuration for the IOMMU usage and dumb
> >mapping of the allocations simply doesn't work.
> 
> Generally, that's already handled by the DRM drivers allocating
> their own unmanaged domains. The only problem we really need to
> solve in that regard is that currently the device DMA ops don't get
> updated when moving away from the managed domain. That's been OK for
> the VFIO case where the device is bound to a different driver which
> we know won't make any explicit DMA API calls, but for the more
> general case of IOMMU-aware drivers we could certainly do with a bit
> of cooperation between the IOMMU API, DMA API, and arch code to
> update the DMA ops dynamically to cope with intermediate subsystems
> making DMA API calls on behalf of devices they don't know the
> intimate details of.
> 
> >2) Older Tegra generations have a limited resource and capabilities in 
> >regards
> >to IOMMU usage, allocating IOMMU domain per-device is just impossible for
> >example.
> >
> >3) HW performs context switches and so particular allocations have to be
> >assigned to a particular contexts IOMMU domain.
> 
> I understand Qualcomm SoCs have a similar thing too, and AFAICS that
> case just doesn't fit into the current API model at all. We need the
> IOMMU driver to somehow know about the specific details of which
> devices have magic associations with specific contexts, and we
> almost certainly need a more expressive interface than
> iommu_domain_alloc() to have any hope of reliable results.

This is correct for Qualcomm GPUs - The GPU hardware context switching 
requires a specific context and there are some restrictions around
secure contexts as well.

We don't really care if the DMA attaches to a context just as long as it
doesn't attach to the one(s) we care about. Perhaps a "valid context" mask would
work in from the DT or the device struct to give the subsystems a clue as to
which domains they were allowed to use. I recognize that there isn't a
one-size-fits-all solution to this problem so I'm open to different ideas.

Jordan

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH v2 0/7] Stop losing firmware-set DMA masks

2018-07-27 Thread Grygorii Strashko via iommu


On 07/27/2018 05:55 AM, Robin Murphy wrote:
> Hi Grygorii,
> 
> Thanks for the report.
> 
> On 27/07/18 00:45, Grygorii Strashko wrote:
> [...]
>> With this series applied I can't boot TI ARM32 am574x-idk any more.
>>
>> And log output is full of "DMA mask not set" -
> 
> That's somewhat expected - as the relevant commit message mentions, 
> there will be bugs flushed out by this change, but the point is to fix 
> them :)
> 
>> nobody sets dma_mask for platform bus in case of OF boot :(
> 
> Right, after a brief investigation, that one turns out to be nice and 
> straightforward to explain and fix; I'll write up the patch shortly.
> 
> [...]
>> [    3.793493] ti-qspi 4b30.qspi: dma_alloc_coherent failed, using 
>> PIO mode
>> [    3.801088] Unable to handle kernel NULL pointer dereference at 
>> virtual address 0080
> [...]
>> [    4.602777] [] (omap_dma_write) from [] 
>> (omap_dma_start_desc+0x78/0x150)
>> [    4.611258] [] (omap_dma_start_desc) from [] 
>> (omap_dma_issue_pending+0x90/0x98)
>> [    4.620347]  r9:ed002780 r8: r7:0010 r6:ee2bdd00 
>> r5:ee2bdd4c r4:ee2bdd00
>> [    4.628130] [] (omap_dma_issue_pending) from [] 
>> (ti_qspi_dma_xfer+0x8c/0x13c)
>> [    4.637039]  r5: r4:ee7e1b28
>> [    4.640636] [] (ti_qspi_dma_xfer) from [] 
>> (ti_qspi_exec_mem_op+0x2e4/0x32c)
> 
> Hooray, a real bug! Since dma_alloc_coherent() could have failed due to 
> lack of memory or any other reason, that's not even directly related to 
> this change, we've simply helped uncover it. I guess that PIO fallback 
> path hasn't been tested recently, because it clearly doesn't work. It 
> would be good if someone who knows the ti-qspi driver and has the means 
> to test it could figure that one out before also adding the missing 
> dma_set_coherent_mask() call.

Yes, ti-qspi has an issue with PIO mode.

But the reason for the failure is this warning:

[3.482502] WARNING: CPU: 0 PID: 1 at ./include/linux/dma-mapping.h:516 
ti_qspi_probe+0x4a4/0x50c

which is caused by this series: it makes coherent_dma_mask 0 by default,
so dma_alloc_coherent() fails.


> Robin.

-- 
regards,
-grygorii

Re: [RFC PATCH v1 0/6] Resolve unwanted DMA backing with IOMMU

2018-07-27 Thread Dmitry Osipenko
On Friday, 27 July 2018 20:03:26 MSK Jordan Crouse wrote:
> On Fri, Jul 27, 2018 at 05:02:37PM +0100, Robin Murphy wrote:
> > On 27/07/18 15:10, Dmitry Osipenko wrote:
> > >On Friday, 27 July 2018 12:03:28 MSK Will Deacon wrote:
> > >>On Fri, Jul 27, 2018 at 10:25:13AM +0200, Joerg Roedel wrote:
> > >>>On Fri, Jul 27, 2018 at 02:16:18AM +0300, Dmitry Osipenko wrote:
> > The proposed solution adds a new option to the base device driver
> > structure that allows device drivers to explicitly convey to the
> > drivers
> > core that the implicit IOMMU backing for devices must not happen.
> > >>>
> > >>>Why is IOMMU mapping a problem for the Tegra GPU driver?
> > >>>
> > >>>If we add something like this then it should not be the choice of the
> > >>>device driver, but of the user and/or the firmware.
> > >>
> > >>Agreed, and it would still need somebody to configure an identity domain
> > >>so
> > >>that transactions aren't aborted immediately. We currently allow the
> > >>identity domain to be used by default via a command-line option, so I
> > >>guess
> > >>we'd need a way for firmware to request that on a per-device basis.
> > >
> > >The IOMMU mapping itself is not a problem, the problem is the management
> > >of
> > >the IOMMU. For Tegra we don't want anything to intrude into the IOMMU
> > >activities because:
> > >
> > >1) GPU HW require additional configuration for the IOMMU usage and dumb
> > >mapping of the allocations simply doesn't work.
> > 
> > Generally, that's already handled by the DRM drivers allocating
> > their own unmanaged domains. The only problem we really need to
> > solve in that regard is that currently the device DMA ops don't get
> > updated when moving away from the managed domain. That's been OK for
> > the VFIO case where the device is bound to a different driver which
> > we know won't make any explicit DMA API calls, but for the more
> > general case of IOMMU-aware drivers we could certainly do with a bit
> > of cooperation between the IOMMU API, DMA API, and arch code to
> > update the DMA ops dynamically to cope with intermediate subsystems
> > making DMA API calls on behalf of devices they don't know the
> > intimate details of.
> > 
> > >2) Older Tegra generations have a limited resource and capabilities in
> > >regards to IOMMU usage, allocating IOMMU domain per-device is just
> > >impossible for example.
> > >
> > >3) HW performs context switches and so particular allocations have to be
> > >assigned to a particular contexts IOMMU domain.
> > 
> > I understand Qualcomm SoCs have a similar thing too, and AFAICS that
> > case just doesn't fit into the current API model at all. We need the
> > IOMMU driver to somehow know about the specific details of which
> > devices have magic associations with specific contexts, and we
> > almost certainly need a more expressive interface than
> > iommu_domain_alloc() to have any hope of reliable results.
> 
> This is correct for Qualcomm GPUs - The GPU hardware context switching
> requires a specific context and there are some restrictions around
> secure contexts as well.
> 
> We don't really care if the DMA attaches to a context just as long as it
> doesn't attach to the one(s) we care about. Perhaps a "valid context" mask
> would work in from the DT or the device struct to give the subsystems a
> clue as to which domains they were allowed to use. I recognize that there
> isn't a one-size-fits-all solution to this problem so I'm open to different
> ideas.

Designating whether implicit IOMMU backing is appropriate for a device via a 
device-tree property sounds a bit awkward, because that would be a software 
description (of a custom Linux driver model), while the device tree is 
supposed to describe hardware.

What about granting IOMMU drivers the ability to decide whether the implicit 
backing for a device is appropriate? Like this:

bool implicit_iommu_for_dma_is_allowed(struct device *dev)
{
	const struct iommu_ops *ops = dev->bus->iommu_ops;
	struct iommu_group *group;

	group = iommu_group_get(dev);
	if (!group)
		return false;

	iommu_group_put(group);

	if (!ops || !ops->implicit_iommu_for_dma_is_allowed)
		return true;

	return ops->implicit_iommu_for_dma_is_allowed(dev);
}

Then arch_setup_dma_ops() would have a clue about whether implicit IOMMU 
backing for a device is appropriate.





[PATCH] MIPS: remove mips_swiotlb_ops

2018-07-27 Thread Christoph Hellwig
mips_swiotlb_ops differs from the generic swiotlb_dma_ops only in that
it contains an mb() barrier after each operation that maps or syncs
dma memory to the device.

The dma operations are defined to not be memory barriers; instead, the
write* operations that kick off the DMA are supposed to contain them.

For mips this is handled by war_io_reorder_wmb(), which evaluates to the
stronger wmb() instead of the pure compiler barrier barrier() for
just those platforms that use swiotlb, so I think we are covered
properly.
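As a hedged illustration of that contract (the register names are hypothetical; this is a sketch of the pattern, not code from any real driver), a driver-side kick looks like:

```c
/* Sketch: the streaming DMA ops give no ordering guarantee against MMIO;
 * the barrier that orders the descriptor writes before the doorbell lives
 * inside writel() (via war_io_reorder_wmb() on MIPS). */
dma_addr_t d = dma_map_single(dev, buf, len, DMA_TO_DEVICE);

if (dma_mapping_error(dev, d))
	return -ENOMEM;
writel(lower_32_bits(d), regs + MY_DMA_ADDR);	/* MY_DMA_* are made up */
writel(MY_DMA_GO, regs + MY_DMA_CTRL);		/* doorbell: starts the DMA */
```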

Signed-off-by: Christoph Hellwig 
---
 arch/mips/include/asm/dma-mapping.h |  3 +-
 arch/mips/mm/Makefile   |  1 -
 arch/mips/mm/dma-swiotlb.c  | 61 -
 3 files changed, 1 insertion(+), 64 deletions(-)
 delete mode 100644 arch/mips/mm/dma-swiotlb.c

diff --git a/arch/mips/include/asm/dma-mapping.h b/arch/mips/include/asm/dma-mapping.h
index 1c6e0c8ef4830..8ae7b20b68627 100644
--- a/arch/mips/include/asm/dma-mapping.h
+++ b/arch/mips/include/asm/dma-mapping.h
@@ -3,14 +3,13 @@
 #define _ASM_DMA_MAPPING_H
 
 extern const struct dma_map_ops jazz_dma_ops;
-extern const struct dma_map_ops mips_swiotlb_ops;
 
 static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
 {
 #if defined(CONFIG_MACH_JAZZ)
	return &jazz_dma_ops;
 #elif defined(CONFIG_SWIOTLB)
-	return &mips_swiotlb_ops;
+	return &swiotlb_dma_ops;
 #elif defined(CONFIG_DMA_NONCOHERENT_OPS)
	return &dma_noncoherent_ops;
 #else
diff --git a/arch/mips/mm/Makefile b/arch/mips/mm/Makefile
index 6922f393af196..3e5bb203c95ac 100644
--- a/arch/mips/mm/Makefile
+++ b/arch/mips/mm/Makefile
@@ -18,7 +18,6 @@ obj-$(CONFIG_64BIT)   += pgtable-64.o
 obj-$(CONFIG_HIGHMEM)  += highmem.o
 obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
 obj-$(CONFIG_DMA_NONCOHERENT)  += dma-noncoherent.o
-obj-$(CONFIG_SWIOTLB)  += dma-swiotlb.o
 
 obj-$(CONFIG_CPU_R4K_CACHE_TLB) += c-r4k.o cex-gen.o tlb-r4k.o
 obj-$(CONFIG_CPU_R3000)+= c-r3k.o tlb-r3k.o
diff --git a/arch/mips/mm/dma-swiotlb.c b/arch/mips/mm/dma-swiotlb.c
deleted file mode 100644
index 6014ed3479fd7..0
--- a/arch/mips/mm/dma-swiotlb.c
+++ /dev/null
@@ -1,61 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-#include 
-#include 
-
-static void *mips_swiotlb_alloc(struct device *dev, size_t size,
-   dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
-{
-   void *ret = swiotlb_alloc(dev, size, dma_handle, gfp, attrs);
-
-   mb();
-   return ret;
-}
-
-static dma_addr_t mips_swiotlb_map_page(struct device *dev,
-   struct page *page, unsigned long offset, size_t size,
-   enum dma_data_direction dir, unsigned long attrs)
-{
-   dma_addr_t daddr = swiotlb_map_page(dev, page, offset, size,
-   dir, attrs);
-   mb();
-   return daddr;
-}
-
-static int mips_swiotlb_map_sg(struct device *dev, struct scatterlist *sg,
-   int nents, enum dma_data_direction dir, unsigned long attrs)
-{
-   int r = swiotlb_map_sg_attrs(dev, sg, nents, dir, attrs);
-   mb();
-
-   return r;
-}
-
-static void mips_swiotlb_sync_single_for_device(struct device *dev,
-   dma_addr_t dma_handle, size_t size, enum dma_data_direction dir)
-{
-   swiotlb_sync_single_for_device(dev, dma_handle, size, dir);
-   mb();
-}
-
-static void mips_swiotlb_sync_sg_for_device(struct device *dev,
-   struct scatterlist *sg, int nents, enum dma_data_direction dir)
-{
-   swiotlb_sync_sg_for_device(dev, sg, nents, dir);
-   mb();
-}
-
-const struct dma_map_ops mips_swiotlb_ops = {
-   .alloc  = mips_swiotlb_alloc,
-   .free   = swiotlb_free,
-   .map_page   = mips_swiotlb_map_page,
-   .unmap_page = swiotlb_unmap_page,
-   .map_sg = mips_swiotlb_map_sg,
-   .unmap_sg   = swiotlb_unmap_sg_attrs,
-   .sync_single_for_cpu= swiotlb_sync_single_for_cpu,
-   .sync_single_for_device = mips_swiotlb_sync_single_for_device,
-   .sync_sg_for_cpu= swiotlb_sync_sg_for_cpu,
-   .sync_sg_for_device = mips_swiotlb_sync_sg_for_device,
-   .mapping_error  = swiotlb_dma_mapping_error,
-   .dma_supported  = swiotlb_dma_supported,
-};
-EXPORT_SYMBOL(mips_swiotlb_ops);
-- 
2.18.0



Re: [RFC PATCH v1 0/6] Resolve unwanted DMA backing with IOMMU

2018-07-27 Thread Robin Murphy

On 27/07/18 15:10, Dmitry Osipenko wrote:

On Friday, 27 July 2018 12:03:28 MSK Will Deacon wrote:

On Fri, Jul 27, 2018 at 10:25:13AM +0200, Joerg Roedel wrote:

On Fri, Jul 27, 2018 at 02:16:18AM +0300, Dmitry Osipenko wrote:

The proposed solution adds a new option to the base device driver
structure that allows device drivers to explicitly convey to the drivers
core that the implicit IOMMU backing for devices must not happen.


Why is IOMMU mapping a problem for the Tegra GPU driver?

If we add something like this then it should not be the choice of the
device driver, but of the user and/or the firmware.


Agreed, and it would still need somebody to configure an identity domain so
that transactions aren't aborted immediately. We currently allow the
identity domain to be used by default via a command-line option, so I guess
we'd need a way for firmware to request that on a per-device basis.


The IOMMU mapping itself is not a problem, the problem is the management of
the IOMMU. For Tegra we don't want anything to intrude into the IOMMU
activities because:

1) GPU HW require additional configuration for the IOMMU usage and dumb
mapping of the allocations simply doesn't work.


Generally, that's already handled by the DRM drivers allocating their 
own unmanaged domains. The only problem we really need to solve in that 
regard is that currently the device DMA ops don't get updated when 
moving away from the managed domain. That's been OK for the VFIO case 
where the device is bound to a different driver which we know won't make 
any explicit DMA API calls, but for the more general case of IOMMU-aware 
drivers we could certainly do with a bit of cooperation between the 
IOMMU API, DMA API, and arch code to update the DMA ops dynamically to 
cope with intermediate subsystems making DMA API calls on behalf of 
devices they don't know the intimate details of.



2) Older Tegra generations have a limited resource and capabilities in regards
to IOMMU usage, allocating IOMMU domain per-device is just impossible for
example.

3) HW performs context switches and so particular allocations have to be
assigned to a particular contexts IOMMU domain.


I understand Qualcomm SoCs have a similar thing too, and AFAICS that 
case just doesn't fit into the current API model at all. We need the 
IOMMU driver to somehow know about the specific details of which devices 
have magic associations with specific contexts, and we almost certainly 
need a more expressive interface than iommu_domain_alloc() to have any 
hope of reliable results.


Robin.


Some of the above is due to the SW driver model (and its work-in-progress
status), the rest is due to the HW model. So essentially we need a way for a
driver to tell the core not to mess with the IOMMU setup of that driver's
device behind the driver's back.

I'm not sure what you guys mean by the "firmware", could you elaborate
please? Do you mean Open Firmware and hence the devicetree, or what?





Re: [PATCH] of/platform: Initialise default DMA masks

2018-07-27 Thread Rob Herring
On Fri, Jul 27, 2018 at 8:18 AM Robin Murphy  wrote:
>
> On 27/07/18 15:20, Christoph Hellwig wrote:
> > On Fri, Jul 27, 2018 at 03:14:15PM +0100, Robin Murphy wrote:
> >> This shouldn't strictly depend on the changes currently queued in the
> >> dma-mapping tree, so should be OK to go through the DT tree in parallel.
> >> Ideally we'd fix all DMA-capable drivers to set their masks correctly,
> >> but in the short term everyone's going to get cross about -rc1 not
> >> booting...
> >
> > I'd prefer to pick this up through the dma-mapping tree, and in fact
> > move it before the bus_dma_mask changes just to keep a bisectable
> > tree.
>
> Sure, if Rob and Frank are OK with that I don't mind either way.

NP.

Acked-by: Rob Herring 


Re: [PATCH v2 7/7] OF: Don't set default coherent DMA mask

2018-07-27 Thread Grygorii Strashko via iommu



On 07/27/2018 06:36 AM, Robin Murphy wrote:

On 27/07/18 01:22, Grygorii Strashko wrote:
[...]

the result of this change looks pretty strange to me :(
Resulting code:

/*
 * If @dev is expected to be DMA-capable then the bus code that created
 * it should have initialised its dma_mask pointer by this point.  For
 * now, we'll continue the legacy behaviour of coercing it to the
 * coherent mask if not, but we'll no longer do so quietly.
 */
if (!dev->dma_mask) {
	dev_warn(dev, "DMA mask not set\n");
	dev->dma_mask = &dev->coherent_dma_mask;
^this will always produce a warning in case of platform-bus or if there
is no bus driver,

even if DT contains a correct definition of dma-ranges
}

if (!size && dev->coherent_dma_mask)

^ coherent_dma_mask is zero, so size will not be calculated

Please ignore this particular comment.


    size = max(dev->coherent_dma_mask, dev->coherent_dma_mask + 1);
else if (!size)
    size = 1ULL << 32;

dev->dma_pfn_offset = offset;

/*
 * Limit coherent and dma mask based on size and default mask
 * set by the driver.
 */
mask = DMA_BIT_MASK(ilog2(dma_addr + size - 1) + 1);
dev->bus_dma_mask = mask;
dev->coherent_dma_mask &= mask;

^^ if nobody set coherent_dma_mask before, it will stay zero forever
unless the driver overwrites it. Again, even if DT has a correct
definition for dma-ranges.


That is intentional. Consider a 32-bit device on a bus with an upstream 
DMA range of 40 bits (which could easily happen with e.g. PCI). If the 
firmware code gives that device 40-bit DMA masks and the driver doesn't 
change them, then DMA is not going to work correctly. Therefore the 
firmware code cannot safely make {coherent_}dma_mask larger than the 
default value passed in, and if the default is 0 then that should be 
fixed in whatever code created the device, not worked around later.


Could you clarify what you mean by "firmware" here?

On most ARM32 platforms there is only DT + kernel, and DT is usually the
only source of information about DMA configuration.
Sorry, but it seems this change makes it impossible to use DT for DMA
parameter configuration any more.




Robin.



*dev->dma_mask &= mask;

coherent = of_dma_is_coherent(np);
dev_dbg(dev, "device is%sdma coherent\n",
    coherent ? " " : " not ");




FYI, with below diff i can boot at least:

diff --git a/drivers/of/device.c b/drivers/of/device.c
index 5957cd4..f7dc121 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -150,8 +150,8 @@ int of_dma_configure(struct device *dev, struct device_node *np, bool force_dma)

  */
 mask = DMA_BIT_MASK(ilog2(dma_addr + size - 1) + 1);
 dev->bus_dma_mask = mask;
-   dev->coherent_dma_mask &= mask;
-   *dev->dma_mask &= mask;
+   dev->coherent_dma_mask = mask;
+   *dev->dma_mask = mask;
 coherent = of_dma_is_coherent(np);
 dev_dbg(dev, "device is%sdma coherent\n",





--
regards,
-grygorii

Re: use the generic dma-noncoherent code for sh V2

2018-07-27 Thread Rob Landley
On 07/24/2018 03:21 PM, Christoph Hellwig wrote:
> On Tue, Jul 24, 2018 at 02:01:42PM +0200, Christoph Hellwig wrote:
>> Hi all,
>>
>> can you review these patches to switch sh to use the generic
>> dma-noncoherent code?  All the requirements are in mainline already
>> and we've switched various architectures over to it already.
>
> Ok, there is one more issue with this version.   Wait for a new one
> tomorrow.

Speaking of DMA:

I'm trying to wire up DMAEngine to an sh7760 board that uses platform data (and
fix the smc91x.c driver to use DMAEngine without #ifdef arm), so I've been
reading through all that stuff, but the docs seem kinda... thin?

Is there something I should have read other than
Documentation/driver-model/platform.txt and
Documentation/dmaengine/{provider,client}.txt, before trying to pick through the
source code and the sh7760 hardware pdf? (And watching the youtube video of
Laurent Pinchart's 2014 ELC talk on DMA, Maxime Ripard's 2015 ELC overview of
DMAEngine, the Xilinx video on DMAEngine...)

At first I thought the SH_DMAE could initialize itself, but the probe function
needs platform data, and although arch/sh/kernel/cpu/sh4a/setup-sh7722.c looks
_kind_ of like a model I can crib from:

A) "make ARCH=sh se7722_defconfig" results in a config with SH_DMA disabled??!?
(This is why I use miniconfig instead of defconfig format, I'm assuming that's
bit rot?)

B) That platform data is supplying sh_dmae_slave_config preallocating slave
channels to devices? (Does it have to? The docs gave me the impression the
driver would dynamically request them and devices could even share. Wasn't that
sort of the point of DMAEngine? Can my new board data _not_ do that? What's the
minimum amount of micromanaging I have to do?)

C) It's full of stuff like setting ts_low_shift to CHCR_TS_LOW_SHIFT, where both
grepping Documentation and googling "dmaengine ts_low_shift" are unhelpful.

What I'd really like is a "hello world" version of DMAEngine somewhere I can
build and run on a supported qemu target, to set up _one_ channel with a block
device or something using it. I can't tell what's optional, or what the minimal
version of this looks like.
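Not an authoritative answer, but a minimal slave-DMA client along those lines might look like the sketch below. The channel name "rx", the direction, and the omitted dmaengine_slave_config() details are all assumptions, not taken from any real board file:

```c
#include <linux/completion.h>
#include <linux/dmaengine.h>
#include <linux/dma-mapping.h>

/* Hedged "hello world" dmaengine client: one device-to-memory transfer
 * on a named channel.  A real driver must also call
 * dmaengine_slave_config() with the peripheral's FIFO address and width. */
static void demo_xfer_done(void *arg)
{
	complete(arg);
}

static int demo_one_rx(struct device *dev, void *buf, size_t len)
{
	DECLARE_COMPLETION_ONSTACK(done);
	struct dma_async_tx_descriptor *desc;
	struct dma_chan *chan;
	dma_addr_t dma;
	int ret = 0;

	chan = dma_request_chan(dev, "rx");	/* "rx" is a placeholder name */
	if (IS_ERR(chan))
		return PTR_ERR(chan);

	dma = dma_map_single(chan->device->dev, buf, len, DMA_FROM_DEVICE);
	if (dma_mapping_error(chan->device->dev, dma)) {
		ret = -ENOMEM;
		goto release;
	}

	desc = dmaengine_prep_slave_single(chan, dma, len, DMA_DEV_TO_MEM,
					   DMA_PREP_INTERRUPT);
	if (!desc) {
		ret = -EIO;
		goto unmap;
	}
	desc->callback = demo_xfer_done;
	desc->callback_param = &done;

	dmaengine_submit(desc);
	dma_async_issue_pending(chan);	/* nothing moves until this */
	wait_for_completion(&done);
unmap:
	dma_unmap_single(chan->device->dev, dma, len, DMA_FROM_DEVICE);
release:
	dma_release_channel(chan);
	return ret;
}
```

One common first-time stumbling block: dmaengine_submit() only queues the descriptor; the transfer does not start until dma_async_issue_pending().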

(Currently I've only managed to update this kernel to 4.14 because 4.15
introduced an intermittent data corruption bug in the flash, which takes long
enough to reproduce bisecting it is fiddly and ship deadlines are all blinky and
red. But next release should be current, _and_ with at least the 4.14 source
published so I can point people at it. Heck, maybe I can convince them to let me
port it to device tree next cycle, but I need to get it to _work_ first. And
doing PIO on a 100baseT controller, I.E. a ~200 mhz embedded CPU trying to copy
11 megabytes/second across a 16 bit bus with a for(;;) loop... bit of a
performance bottleneck even before you add https.)

Thanks,

Rob

>>
>> Changes since V1:
>>  - fixed two stupid compile errors and verified them using a local
>>cross toolchain instead of the 0day buildbot
> ---end quoted text---


Re: [PATCH v2 2/7] dma-mapping: Generalise dma_32bit_limit flag

2018-07-27 Thread Grygorii Strashko via iommu




On 07/23/2018 05:16 PM, Robin Murphy wrote:

Whilst the notion of an upstream DMA restriction is most commonly seen
in PCI host bridges saddled with a 32-bit native interface, a more
general version of the same issue can exist on complex SoCs where a bus
or point-to-point interconnect link from a device's DMA master interface
to another component along the path to memory (often an IOMMU) may carry
fewer address bits than the interfaces at both ends nominally support.
In order to properly deal with this, the first step is to expand the
dma_32bit_limit flag into an arbitrary mask.

To minimise the impact on existing code, we'll make sure to only
consider this new mask valid if set. That makes sense anyway, since a
mask of zero would represent DMA not being wired up at all, and that
would be better handled by not providing valid ops in the first place.

Signed-off-by: Robin Murphy 


I'd like to note a possible issue related to this change.

There are some places in kernel where parent DMA configuration is copied 
to the manually created child devices, like:

mfd-core.c
mfd_add_device()
pdev->dev.parent = parent;
pdev->dev.type = &mfd_dev_type;
pdev->dev.dma_mask = parent->dma_mask;
pdev->dev.dma_parms = parent->dma_parms;
pdev->dev.coherent_dma_mask = parent->coherent_dma_mask;

Adding or changing generic DMA device properties might affect such
subsystems/drivers. Have you considered such cases?

--
regards,
-grygorii


Re: [RFC PATCH v1 0/6] Resolve unwanted DMA backing with IOMMU

2018-07-27 Thread Joerg Roedel
On Fri, Jul 27, 2018 at 11:13:31AM -0600, Rob Herring wrote:
> I don't follow why we need a property rather than being implied by the
> device's (the GPU) compatible string.

There might be devices where either setup works, with or without IOMMU
translation, and the firmware can set the property depending on whether
the user wants more performance or more security.

If we have a whitelist in the kernel this gets more complicated; we
would probably need additional kernel parameters to override those
whitelist entries. Having a property in the device-tree seems to be a
better way here, imho.


Joerg



Re: [PATCH v2 7/7] OF: Don't set default coherent DMA mask

2018-07-27 Thread Russell King - ARM Linux
On Fri, Jul 27, 2018 at 12:36:00PM +0100, Robin Murphy wrote:
> That is intentional. Consider a 32-bit device on a bus with an upstream DMA
> range of 40 bits (which could easily happen with e.g. PCI). If the firmware
> code gives that device 40-bit DMA masks and the driver doesn't change them,
> then DMA is not going to work correctly. Therefore the firmware code cannot
> safely make {coherent_}dma_mask larger than the default value passed in, and
> if the default is 0 then that should be fixed in whatever code created the
> device, not worked around later.

I think you're missing the point.

When OF creates platform devices, they are created without any DMA masks
whatsoever.  It is only during device probe that dma_configure() gets
called, which then calls of_dma_configure(), where the DMA mask for any
DT declared device is set up.

What this means is that your change has broken all existing DT devices
that are trying to use DMA, because they now end up with a zero coherent
DMA mask and/or streaming DMA mask.

Your original idea behind the patch may be a sound one, but since the
implementation has caused a regression, the implementation is obviously
broken, and that needs fixing or reworking in some way.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 13.8Mbps down 630kbps up
According to speedtest.net: 13Mbps down 490kbps up


Re: [PATCH v2 7/7] OF: Don't set default coherent DMA mask

2018-07-27 Thread Grygorii Strashko via iommu



On 07/27/2018 01:13 PM, Russell King - ARM Linux wrote:
> On Fri, Jul 27, 2018 at 12:36:00PM +0100, Robin Murphy wrote:
>> That is intentional. Consider a 32-bit device on a bus with an upstream DMA
>> range of 40 bits (which could easily happen with e.g. PCI). If the firmware
>> code gives that device 40-bit DMA masks and the driver doesn't change them,
>> then DMA is not going to work correctly. Therefore the firmware code cannot
>> safely make {coherent_}dma_mask larger than the default value passed in, and
>> if the default is 0 then that should be fixed in whatever code created the
>> device, not worked around later.
> 
> I think you're missing the point.
> 
> When OF creates platform devices, they are created without any DMA masks
> whatsoever.  It is only during device probe that dma_configure() gets
> called, which then calls of_dma_configure(), where the DMA mask for any
> DT declared device is set up.
> 
> What this means is that your change has broken all existing DT devices
> that are trying to use DMA, because they now end up with a zero coherent
> DMA mask and/or streaming DMA mask.
> 
> Your original idea behind the patch may be a sound one, but since the
> implementation has caused a regression, the implementation is obviously
> broken, and that needs fixing or reworking in some way.
> 

There was an additional patch [1] which fixes this issue.

But I have a question to all:
- The patch [1] sets default DMA mask to DMA_BIT_MASK(32)
+   dev->dev.coherent_dma_mask = DMA_BIT_MASK(32);
+   if (!dev->dev.dma_mask)
+   dev->dev.dma_mask = &dev->dev.coherent_dma_mask;

this will also work the same way for ARM64

- But of_dma_configure() still has code which does:
dev->coherent_dma_mask &= mask;
*dev->dma_mask &= mask;

As a result, the DMA mask supplied from DT will always be truncated
 to 32 bits for platform devices. For example:

soc0: soc0 {
compatible = "simple-bus";
#address-cells = <2>;
#size-cells = <2>;
ranges;
+dma-ranges = <0 0 0 0 0x1 0>; 

^ 48 bit DMA mask expected to be generated and assigned.

But the real mask will be DMA_BIT_MASK(32). As a result, any
DMA-capable driver will have to be modified to call dma_set_xxx_mask(),
which disregards the DT DMA configuration (especially for HW modules
which are used on both ARM32 and ARM64).
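The truncation being described can be demonstrated in a few lines. This is a userspace sketch of the of_dma_configure() flow under discussion (configure_coherent_mask is an illustrative name, not a kernel function): the platform device gets a 32-bit default at creation, and the DT-derived mask is applied with `&=`, so it can only ever shrink the default.

```c
#include <assert.h>
#include <stdint.h>

#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Model of: dev->coherent_dma_mask &= mask; in of_dma_configure(). */
static uint64_t configure_coherent_mask(uint64_t default_mask,
					uint64_t dt_mask)
{
	uint64_t coherent = default_mask;

	coherent &= dt_mask;	/* can only narrow, never widen */
	return coherent;
}
```

So a 48-bit dma-ranges mask cannot widen the 32-bit default, while a narrower range still narrows it.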

Could one of the changes below be considered?

1) assign mask in case of dt
diff --git a/drivers/of/device.c b/drivers/of/device.c
index 5957cd4..f7dc121 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -150,8 +150,8 @@ int of_dma_configure(struct device *dev, struct device_node 
*np, bool force_dma)
 */
mask = DMA_BIT_MASK(ilog2(dma_addr + size - 1) + 1);
dev->bus_dma_mask = mask;
-   dev->coherent_dma_mask &= mask;
-   *dev->dma_mask &= mask;
+   dev->coherent_dma_mask = mask;
+   *dev->dma_mask = mask;
 
coherent = of_dma_is_coherent(np);

2) use BITS_PER_LONG for default DMA mask for of_platform devices

diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index 7ba90c2..3f326e2 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -185,7 +185,7 @@ static struct platform_device 
*of_platform_device_create_pdata(
if (!dev)
goto err_clear_flag;
 
-   dev->dev.coherent_dma_mask = DMA_BIT_MASK(32);
+   dev->dev.coherent_dma_mask = DMA_BIT_MASK(BITS_PER_LONG);
if (!dev->dev.dma_mask)
dev->dev.dma_mask = &dev->dev.coherent_dma_mask;

3) ...

[1] https://www.spinics.net/lists/arm-kernel/msg668934.html
-- 
regards,
-grygorii


Re: [PATCH 2/3] dmapool: improve scalability of dma_pool_free

2018-07-27 Thread Tony Battersby
On 07/27/2018 05:35 PM, Andy Shevchenko wrote:
> On Sat, Jul 28, 2018 at 12:27 AM, Tony Battersby wrote:
>> On 07/27/2018 03:38 PM, Tony Battersby wrote:
>>> But the bigger problem is that my first patch adds another list_head to
>>> the dma_page for the avail_page_link to make allocations faster.  I
>>> suppose we could make the lists singly-linked instead of doubly-linked
>>> to save space.
>>>
>> I managed to redo my dma_pool_alloc() patch to make avail_page_list
>> singly-linked instead of doubly-linked.
> Are you relying on llist.h implementation?
>
> Btw, did you see quicklist.h?
>
>
I looked over include/linux/*list* to see if there was a suitable
implementation I could use.  llist.h makes a big deal about having a
lock-less implementation with atomic instructions, which seemed like
overkill.  I didn't see anything else suitable, so I just went with my
own implementation.  Singly-linked lists are simple enough.  And a quick
"grep -i singly include/linux/*" shows precedent in bi_next, fl_next,
fa_next, etc.

Thanks for pointing out quicklist.h.  At first I thought you were
confused since you were talking about linked list implementations and
quicklist.h sounds like a linked list implementation but isn't.  But now
I see that it is doing simple memory allocation/free, so that is the
relevance to dmapool.  Incidentally it looks like it is also using a
singly-linked list to store the list of free pages, but it is much
simpler because it doesn't try to sub-divide the pages into smaller
allocations.
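The singly-linked approach being discussed can be sketched as follows. This is a minimal model in the style of the bi_next/fl_next precedents mentioned above, not the actual dmapool patch: push and pop are O(1), which is all the allocation fast path needs, while removing an arbitrary page would require a scan.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal model of a singly-linked available-pages list. */
struct dma_page {
	struct dma_page *next_avail;
	int id;			/* placeholder for real bookkeeping */
};

static void push_avail(struct dma_page **head, struct dma_page *page)
{
	page->next_avail = *head;
	*head = page;
}

static struct dma_page *pop_avail(struct dma_page **head)
{
	struct dma_page *page = *head;

	if (page)
		*head = page->next_avail;
	return page;
}
```

Allocation pops the head; freeing a block on a previously-full page pushes that page back, both in constant time.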

Re: [PATCH] sparc: use generic dma_noncoherent_ops

2018-07-27 Thread Sam Ravnborg
Hi Christoph.

Some observations below - nitpick and bikeshedding only.

The parameter of phys_addr_t is sometimes renamed
to use the same name as in the original prototype (good),
and sometimes uses the old name (bad).
This makes it inconsistent, as the local name changes in the
different functions even though they represent the same thing.

I really like how you managed to kill a lot of code
with this patch.

You can add my:
Acked-by: Sam Ravnborg 

But that does not imply that this is OK to apply; you
must wait for davem.

Sam

On Fri, Jul 27, 2018 at 06:44:09PM +0200, Christoph Hellwig wrote:
> Switch to the generic noncoherent direct mapping implementation.
> 
> This removes the previous sync_single_for_device implementation, which
> looks bogus given that no syncing is happening in the similar but more
> important map_single case.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/sparc/Kconfig   |   2 +
>  arch/sparc/include/asm/dma-mapping.h |   5 +-
>  arch/sparc/kernel/ioport.c   | 151 ++-
>  3 files changed, 14 insertions(+), 144 deletions(-)
> 
> diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
> index 0f535debf802..79f29c67291a 100644
> --- a/arch/sparc/Kconfig
> +++ b/arch/sparc/Kconfig
> @@ -48,6 +48,8 @@ config SPARC
>  
>  config SPARC32
>   def_bool !64BIT
> + select ARCH_HAS_SYNC_DMA_FOR_CPU
> + select DMA_NONCOHERENT_OPS

The current implementation has both
pci32_sync_single_for_cpu() and
pci32_sync_single_for_device().

Yet the new implementation only includes arch_sync_dma_for_cpu().
You addressed this in the changelog - so I guess it is OK.


> --- a/arch/sparc/include/asm/dma-mapping.h
> +++ b/arch/sparc/include/asm/dma-mapping.h
> @@ -7,7 +7,6 @@
>  #include 
>  
>  extern const struct dma_map_ops *dma_ops;
> -extern const struct dma_map_ops pci32_dma_ops;
>  
>  extern struct bus_type pci_bus_type;
>  
> @@ -15,11 +14,11 @@ static inline const struct dma_map_ops 
> *get_arch_dma_ops(struct bus_type *bus)
>  {
>  #ifdef CONFIG_SPARC_LEON
>   if (sparc_cpu_model == sparc_leon)
> - return &pci32_dma_ops;
> + return &dma_noncoherent_ops;
>  #endif
>  #if defined(CONFIG_SPARC32) && defined(CONFIG_PCI)
>   if (bus == &pci_bus_type)
> - return &pci32_dma_ops;
> + return &dma_noncoherent_ops;
>  #endif
>   return dma_ops;
>  }
> diff --git a/arch/sparc/kernel/ioport.c b/arch/sparc/kernel/ioport.c
> index cca9134cfa7d..960c1bc22c2e 100644
> --- a/arch/sparc/kernel/ioport.c
> +++ b/arch/sparc/kernel/ioport.c
> @@ -38,6 +38,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #include 
> @@ -434,9 +435,8 @@ arch_initcall(sparc_register_ioport);
>  /* Allocate and map kernel buffer using consistent mode DMA for a device.
>   * hwdev should be valid struct pci_dev pointer for PCI devices.
>   */
> -static void *pci32_alloc_coherent(struct device *dev, size_t len,
> -   dma_addr_t *pba, gfp_t gfp,
> -   unsigned long attrs)
> +void *arch_dma_alloc(struct device *dev, size_t len, dma_addr_t *pba, gfp_t 
> gfp,
> + unsigned long attrs)

This function was renamed in ee664a9252d24 and is now renamed again.
The printk statements should be updated to use arch_dma_alloc.


>  {
>   unsigned long len_total = PAGE_ALIGN(len);
>   void *va;
> @@ -488,8 +488,8 @@ static void *pci32_alloc_coherent(struct device *dev, 
> size_t len,
>   * References to the memory and mappings associated with cpu_addr/dma_addr
>   * past this call are illegal.
>   */
> -static void pci32_free_coherent(struct device *dev, size_t n, void *p,
> - dma_addr_t ba, unsigned long attrs)
> +void arch_dma_free(struct device *dev, size_t n, void *p, dma_addr_t ba,
> + unsigned long attrs)
Likewise, use arch_dma_free in printk statements



Re: [PATCH 2/3] dmapool: improve scalability of dma_pool_free

2018-07-27 Thread Tony Battersby
On 07/27/2018 03:38 PM, Tony Battersby wrote:
> But the bigger problem is that my first patch adds another list_head to
> the dma_page for the avail_page_link to make allocations faster.  I
> suppose we could make the lists singly-linked instead of doubly-linked
> to save space.
>

I managed to redo my dma_pool_alloc() patch to make avail_page_list
singly-linked instead of doubly-linked.  But the problem with making
either list singly-linked is that it would no longer be possible to call
pool_free_page() any time other than dma_pool_destroy() without scanning
the lists to remove the page from them, which would make pruning
arbitrary free pages slower (adding back an O(n^2) cost).  But the current
code doesn't do that anyway, and in fact it has a comment in
dma_pool_free() to "resist the temptation" to prune free pages.  And yet
it seems like it might be reasonable for someone to add such code in the
future if there are a whole lot of free pages, so I am hesitant to make
it more difficult.

So my question is: when I post v2 of the patchset, should I send the
doubly-linked version or the singly-linked version, in anticipation that
someone else might want to take it further and move everything into
struct page as you suggest?


Re: [PATCH 2/3] dmapool: improve scalability of dma_pool_free

2018-07-27 Thread Andy Shevchenko
On Sat, Jul 28, 2018 at 12:27 AM, Tony Battersby  wrote:
> On 07/27/2018 03:38 PM, Tony Battersby wrote:
>> But the bigger problem is that my first patch adds another list_head to
>> the dma_page for the avail_page_link to make allocations faster.  I
>> suppose we could make the lists singly-linked instead of doubly-linked
>> to save space.
>>
>
> I managed to redo my dma_pool_alloc() patch to make avail_page_list
> singly-linked instead of doubly-linked.

Are you relying on llist.h implementation?

Btw, did you see quicklist.h?


-- 
With Best Regards,
Andy Shevchenko


Re: [PATCH v2 7/7] OF: Don't set default coherent DMA mask

2018-07-27 Thread Robin Murphy

On 2018-07-27 7:13 PM, Russell King - ARM Linux wrote:

On Fri, Jul 27, 2018 at 12:36:00PM +0100, Robin Murphy wrote:

That is intentional. Consider a 32-bit device on a bus with an upstream DMA
range of 40 bits (which could easily happen with e.g. PCI). If the firmware
code gives that device 40-bit DMA masks and the driver doesn't change them,
then DMA is not going to work correctly. Therefore the firmware code cannot
safely make {coherent_}dma_mask larger than the default value passed in, and
if the default is 0 then that should be fixed in whatever code created the
device, not worked around later.


I think you're missing the point.


No, I'm just failing to make it clearly enough. Sorry about that.


When OF creates platform devices, they are created without any DMA masks
whatsoever.  It is only during device probe that dma_configure() gets
called, which then calls of_dma_configure(), where the DMA mask for any
DT declared device is set up.


Indeed, because the DMA mask initialisation which was originally part of 
OF platform device creation got factored out into of_dma_configure(), 
then of_dma_configure() got repurposed into a 
non-platform-device-specific helper, then DMA configuration got 
generalised more and moved from device creation to probe time, and it 
now transpires that what looked like an unnecessary vestigial leftover 
was in fact some OF-platform-specific code all along. Which, having now 
made that realisation, I've put it back where it originally was and 
conceptually should be.



What this means is that your change has broken all existing DT devices
that are trying to use DMA, because they now end up with a zero coherent
DMA mask and/or streaming DMA mask.


All existing DT devices that are trying to use DMA *without first 
calling dma_set_mask() etc. as the API expects* - I could point you at 
lines 166-171 of DMA-API-HOWTO.txt, but given that the last person to 
touch half this stuff was you, that feels almost impolite. It appears 
all the drivers on the machines I boot-tested are well-behaved and do do 
that. In a few hours we'll have a useful datapoint from the kernelci.org 
boot logs about how many aren't and don't. Happy accidents 'n'all...
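The "well-behaved driver" pattern from DMA-API-HOWTO that this hinges on can be modelled in userspace. Here dma_set_mask_and_coherent() is replaced by a stub (set_mask_stub, an illustrative name) emulating a platform that can only address 40 bits, to show the try-wide-then-fall-back negotiation a driver's probe is expected to do:

```c
#include <assert.h>
#include <stdint.h>

#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))
#define EIO 5

/* Emulated platform limit: requests wider than 40 bits fail. */
static const uint64_t platform_limit = DMA_BIT_MASK(40);

/* Stub standing in for dma_set_mask_and_coherent(). */
static int set_mask_stub(uint64_t *dev_mask, uint64_t requested)
{
	if (requested > platform_limit)
		return -EIO;
	*dev_mask = requested;
	return 0;
}

/*
 * The negotiation DMA-API-HOWTO describes: ask for the widest mask the
 * hardware supports, then fall back to 32-bit before giving up.
 */
static int probe_set_masks(uint64_t *dev_mask)
{
	if (set_mask_stub(dev_mask, DMA_BIT_MASK(64)) == 0)
		return 0;
	return set_mask_stub(dev_mask, DMA_BIT_MASK(32));
}
```

A driver that skips this negotiation and simply trusts whatever mask it finds at probe time is exactly the kind that breaks when the default changes.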



Your original idea behind the patch may be a sound one, but since the
implementation has caused a regression, the implementation is obviously
broken, and that needs fixing or reworking in some way.


Already done: 
http://git.infradead.org/users/hch/dma-mapping.git/commitdiff/a5516219b10218a87abb3352c82248ce3088e94a


Robin.


Re: [PATCH 2/3] dmapool: improve scalability of dma_pool_free

2018-07-27 Thread Tony Battersby
On 07/27/2018 11:23 AM, Matthew Wilcox wrote:
> On Fri, Jul 27, 2018 at 09:23:30AM -0400, Tony Battersby wrote:
>> On 07/26/2018 08:07 PM, Matthew Wilcox wrote:
>>> If you're up for more major surgery, then I think we can put all the
>>> information currently stored in dma_page into struct page.  Something
>>> like this:
>>>
>>> +++ b/include/linux/mm_types.h
>>> @@ -152,6 +152,12 @@ struct page {
>>> unsigned long hmm_data;
>>> unsigned long _zd_pad_1;/* uses mapping */
>>> };
>>> +   struct {/* dma_pool pages */
>>> +   struct list_head dma_list;
>>> +   unsigned short in_use;
>>> +   unsigned short offset;
>>> +   dma_addr_t dma;
>>> +   };
>>>  
>>> /** @rcu_head: You can use this to free a page by RCU. */
>>> struct rcu_head rcu_head;
>>>
>>> page_list -> dma_list
>>> vaddr goes away (page_to_virt() exists)
>>> dma -> dma
>>> in_use and offset shrink from 4 bytes to 2.
>>>
>>> Some 32-bit systems have a 64-bit dma_addr_t, and on those systems,
>>> this will be 8 + 2 + 2 + 8 = 20 bytes.  On 64-bit systems, it'll be
>>> 16 + 2 + 2 + 4 bytes of padding + 8 = 32 bytes (we have 40 available).
>>>
>>>
>> offset at least needs more bits, since allocations can be multi-page. 
> Ah, rats.  That means we have to use the mapcount union too:
>

Actually, on second thought, if I understand it correctly, a multi-page
allocation will never be split up and returned as multiple
sub-allocations, so the offset shouldn't be needed for that case.  The
offset should only be needed when splitting a PAGE_SIZE-allocation into
smaller sub-allocations.  The current code uses the offset
unconditionally though, so it would need major changes to remove the
dependence.  So a 16-bit offset might work.
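The sufficiency of a 16-bit offset can be checked with a small model of dmapool's offset stepping (max_block_offset is an illustrative name; the real code carves a pool page of size max(size, PAGE_SIZE) into blocks at multiples of the block size):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/*
 * Largest block offset within one pool page: sub-page blocks stay
 * below PAGE_SIZE, and a multi-page "page" holds exactly one block
 * at offset 0 (never split), so every offset fits in 16 bits as long
 * as PAGE_SIZE <= 64K.
 */
static unsigned int max_block_offset(unsigned int block_size)
{
	unsigned int allocation =
		block_size > PAGE_SIZE ? block_size : PAGE_SIZE;

	/* last offset at which a whole block still fits */
	return ((allocation / block_size) - 1) * block_size;
}
```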

As for sanity checking, I suppose having the dma address in the page
could provide something for dma_pool_free() to check against (in fact it
is already there under DMAPOOL_DEBUG).

But the bigger problem is that my first patch adds another list_head to
the dma_page for the avail_page_link to make allocations faster.  I
suppose we could make the lists singly-linked instead of doubly-linked
to save space.

Wouldn't using the mapcount union make it problematic for userspace to
mmap() the returned DMA buffers?  I am not sure if any drivers allow
that to be done or not.  I have heard of drivers in userspace, drivers
with DMA ring buffers, etc.  I don't want to audit the whole kernel tree
to know if it would be safe.  As you have seen, at least mpt3sas is
doing unexpected things with dma pools.

So maybe it could be done, but you are right, it would involve major
surgery.  My current in-development patch to implement your intial
suggestion is pretty small and it works.  So I'm not sure if I want to
take it further or not.  Lots of other things to do...

Re: [RFC PATCH v1 0/6] Resolve unwanted DMA backing with IOMMU

2018-07-27 Thread Dmitry Osipenko
On Friday, 27 July 2018 21:31:34 MSK Joerg Roedel wrote:
> On Fri, Jul 27, 2018 at 11:13:31AM -0600, Rob Herring wrote:
> > I don't follow why we need a property rather than being implied by the
> > device's (the GPU) compatible string.
> 
> There might be devices where either setup works, with or without IOMMU
> translation, and the firmware can set the property depending on whether
> the user wants more performance or more security.
> 
> If we have a whitelist in the kernel this gets more complicated, we
> probably need additional kernel-parameters to overwrite those whitelist
> entries. Having a property in the device-tree seems to be a better way
> here, imho.

IIUC, device-tree should be considered "written in stone" for a consumer 
device, and hence a firmware property isn't something that can be easily 
changed. A kernel parameter would be much more universal. Anyway, global 
whitelisting should be a different topic for discussion; right now we need a 
kind of private whitelisting that is internal to the kernel.





Re: [PATCH v2 2/7] dma-mapping: Generalise dma_32bit_limit flag

2018-07-27 Thread Grygorii Strashko via iommu


On 07/27/2018 03:11 PM, Robin Murphy wrote:
> On 2018-07-27 6:45 PM, Grygorii Strashko wrote:
>> On 07/23/2018 05:16 PM, Robin Murphy wrote:
>>> Whilst the notion of an upstream DMA restriction is most commonly seen
>>> in PCI host bridges saddled with a 32-bit native interface, a more
>>> general version of the same issue can exist on complex SoCs where a bus
>>> or point-to-point interconnect link from a device's DMA master interface
>>> to another component along the path to memory (often an IOMMU) may carry
>>> fewer address bits than the interfaces at both ends nominally support.
>>> In order to properly deal with this, the first step is to expand the
>>> dma_32bit_limit flag into an arbitrary mask.
>>>
>>> To minimise the impact on existing code, we'll make sure to only
>>> consider this new mask valid if set. That makes sense anyway, since a
>>> mask of zero would represent DMA not being wired up at all, and that
>>> would be better handled by not providing valid ops in the first place.
>>>
>>> Signed-off-by: Robin Murphy 
>>
>> I'd like to note about some possible issue related to this change.
>>
>> There are some places in kernel where parent DMA configuration is 
>> copied to the manually created child devices, like:
>> mfd-core.c
>> mfd_add_device()
>>  pdev->dev.parent = parent;
>>  pdev->dev.type = &mfd_dev_type;
>>  pdev->dev.dma_mask = parent->dma_mask;
>>  pdev->dev.dma_parms = parent->dma_parms;
>>  pdev->dev.coherent_dma_mask = parent->coherent_dma_mask;
>>
>> Adding or changing generic DMA device properties might affect such
>> subsystems/drivers. Have you considered such cases?
> 
> Yes, that's a lovely example of what I class as "bus code" creating a 
> child device and initialising its DMA parameters appropriately. The 
> subdevice goes on to get associated with an OF node or ACPI companion, 
> so when the subdriver for that function binds it should go through its 
> own dma_configure() process and pick up any further properties accordingly.

Ideally ;), but in reality dev->of_node is not always initialized for child 
devices :(

> 
> Code which just tries to copy the DMA configuration from an existing 
> device to a new one has never worked properly, because there is often 
> additional DMA configuration in archdata and other places it cannot 
> possibly know about. Last time I looked there were still some specific 
> hacks in the USB layer in order to interact correctly with the block 
> layer bounce limit, but I think anything truly wrong has been more or 
> less flushed out by now (the DMA ops changes for arm64 ACPI support 
> caught a fair few IIRC).

Yep. For USB I wouldn't call it a hack (a DMA controller device was introduced
to avoid copying DMA properties).

Thanks for your comments.

-- 
regards,
-grygorii

Re: [PATCH v2 7/7] OF: Don't set default coherent DMA mask

2018-07-27 Thread Robin Murphy

On 2018-07-27 7:45 PM, Grygorii Strashko wrote:
[...]

But I have a question to all:
- The patch [1] sets default DMA mask to DMA_BIT_MASK(32)
+   dev->dev.coherent_dma_mask = DMA_BIT_MASK(32);
+   if (!dev->dev.dma_mask)
+   dev->dev.dma_mask = &dev->dev.coherent_dma_mask;

this will also work the same way for ARM64

- But of_dma_configure() still has code which does:
dev->coherent_dma_mask &= mask;
*dev->dma_mask &= mask;

As a result, the DMA mask supplied from DT will always be truncated
  to 32 bits for platform devices. For example:

soc0: soc0 {
 compatible = "simple-bus";
 #address-cells = <2>;
 #size-cells = <2>;
 ranges;
+dma-ranges = <0 0 0 0 0x1 0>;

^ 48 bit DMA mask expected to be generated and assigned.

But the real mask will be DMA_BIT_MASK(32). As a result, any
DMA-capable driver will have to be modified to call dma_set_xxx_mask(),
which disregards the DT DMA configuration (especially for HW modules
which are used on both ARM32 and ARM64).


That has always been the case. Drivers which want to use 
larger-than-32-bit masks *must* explicitly say so. The issue that the DT 
dma-ranges (and ACPI equivalents) cannot be preserved in the device DMA 
masks is the entire purpose of this series. At the moment there's only a 
couple of point fixes for specific places with known problems, but this 
is just the start of some ongoing work.



Could one of the changes below be considered?

1) assign mask in case of dt
diff --git a/drivers/of/device.c b/drivers/of/device.c
index 5957cd4..f7dc121 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -150,8 +150,8 @@ int of_dma_configure(struct device *dev, struct device_node 
*np, bool force_dma)
  */
 mask = DMA_BIT_MASK(ilog2(dma_addr + size - 1) + 1);
 dev->bus_dma_mask = mask;
-   dev->coherent_dma_mask &= mask;
-   *dev->dma_mask &= mask;
+   dev->coherent_dma_mask = mask;
+   *dev->dma_mask = mask;
  
 coherent = of_dma_is_coherent(np);


No, because that leads to a risk of DMA address truncation in hardware 
(and thus at worst random memory corruption) when drivers expect the 
default mask to be 32-bit and fail to explicitly set it as such.



2) use BITS_PER_LONG for default DMA mask for of_platform devices

diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index 7ba90c2..3f326e2 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -185,7 +185,7 @@ static struct platform_device 
*of_platform_device_create_pdata(
 if (!dev)
 goto err_clear_flag;
  
-   dev->dev.coherent_dma_mask = DMA_BIT_MASK(32);

+   dev->dev.coherent_dma_mask = DMA_BIT_MASK(BITS_PER_LONG);
 if (!dev->dev.dma_mask)
    dev->dev.dma_mask = &dev->dev.coherent_dma_mask;


No, because that leads to a risk of DMA address truncation in hardware 
(and thus at worst random memory corruption) when drivers expect the 
default mask to be 32-bit and fail to explicitly set it as such.



3) ...


Remember when we found out how many drivers expect the default mask to 
be 32-bit and fail to explicitly set it as such, because they all broke 
when some chump set it to 0 in linux-next for a day? ;)


Robin.


Re: [PATCH v2 7/7] OF: Don't set default coherent DMA mask

2018-07-27 Thread Robin Murphy

On 2018-07-27 6:34 PM, Grygorii Strashko wrote:

On 07/27/2018 06:36 AM, Robin Murphy wrote:

On 27/07/18 01:22, Grygorii Strashko wrote:
[...]

the result of this change is pretty strange as for me :(
Resulted code:

/*
 * If @dev is expected to be DMA-capable then the bus code that created
 * it should have initialised its dma_mask pointer by this point. For
 * now, we'll continue the legacy behaviour of coercing it to the
 * coherent mask if not, but we'll no longer do so quietly.
 */
if (!dev->dma_mask) {
    dev_warn(dev, "DMA mask not set\n");
    dev->dma_mask = &dev->coherent_dma_mask;
^ this will always produce a warning in case of platform-bus or if
there is no bus driver, even if DT contains a correct definition of
dma-ranges.
}

if (!size && dev->coherent_dma_mask)

^ coherent_dma_mask is zero, so size will not be calculated

pls, ignore this particular comment


    size = max(dev->coherent_dma_mask, dev->coherent_dma_mask + 1);
else if (!size)
    size = 1ULL << 32;

dev->dma_pfn_offset = offset;

/*
 * Limit coherent and dma mask based on size and default mask
 * set by the driver.
 */
mask = DMA_BIT_MASK(ilog2(dma_addr + size - 1) + 1);
dev->bus_dma_mask = mask;
dev->coherent_dma_mask &= mask;

^^ if nobody set coherent_dma_mask before, it will stay zero forever
unless the driver overwrites it. Again, even if DT has a correct
definition for dma-ranges.


That is intentional. Consider a 32-bit device on a bus with an 
upstream DMA range of 40 bits (which could easily happen with e.g. 
PCI). If the firmware code gives that device 40-bit DMA masks and the 
driver doesn't change them, then DMA is not going to work correctly. 
Therefore the firmware code cannot safely make {coherent_}dma_mask 
larger than the default value passed in, and if the default is 0 then 
that should be fixed in whatever code created the device, not worked 
around later.


Could you clarify what do you mean "firmware" here?


By "firmware code" in this context I mean of_dma_configure(), 
acpi_dma_configure() and anything else which may also do the equivalent 
in future, i.e. the code which processes dma coherency attributes and 
addressing restrictions from the firmware-provided machine description. 
DT is conceptually firmware, regardless of how embedded ARM bootloaders 
might provide it.



On most ARM32 platforms there is only DT + kernel,
and DT is usually the only source of information about DMA configuration.
Sorry, but it seems this change makes it impossible to use DT for DMA
parameter configuration any more.


Well, it was also impossible in general before. of_dma_configure() has 
in practice never been able to set device masks to wider than 32 bits. 
Even if it did, that would just lead to breakage elsewhere, as above. 
There'll doubtless be a few more rounds of whack-a-mole until *all* the 
brokenness has been flushed out.


Robin.

Re: [PATCH v2 2/7] dma-mapping: Generalise dma_32bit_limit flag

2018-07-27 Thread Robin Murphy

On 2018-07-27 6:45 PM, Grygorii Strashko wrote:

On 07/23/2018 05:16 PM, Robin Murphy wrote:

Whilst the notion of an upstream DMA restriction is most commonly seen
in PCI host bridges saddled with a 32-bit native interface, a more
general version of the same issue can exist on complex SoCs where a bus
or point-to-point interconnect link from a device's DMA master interface
to another component along the path to memory (often an IOMMU) may carry
fewer address bits than the interfaces at both ends nominally support.
In order to properly deal with this, the first step is to expand the
dma_32bit_limit flag into an arbitrary mask.

To minimise the impact on existing code, we'll make sure to only
consider this new mask valid if set. That makes sense anyway, since a
mask of zero would represent DMA not being wired up at all, and that
would be better handled by not providing valid ops in the first place.

Signed-off-by: Robin Murphy 


I'd like to note about some possible issue related to this change.

There are some places in kernel where parent DMA configuration is copied 
to the manually created child devices, like:

mfd-core.c
mfd_add_device()
 pdev->dev.parent = parent;
 pdev->dev.type = &mfd_dev_type;
 pdev->dev.dma_mask = parent->dma_mask;
 pdev->dev.dma_parms = parent->dma_parms;
 pdev->dev.coherent_dma_mask = parent->coherent_dma_mask;

Adding or changing generic DMA device properties might affect such
subsystems/drivers. Have you considered such cases?


Yes, that's a lovely example of what I class as "bus code" creating a 
child device and initialising its DMA parameters appropriately. The 
subdevice goes on to get associated with an OF node or ACPI companion, 
so when the subdriver for that function binds it should go through its 
own dma_configure() process and pick up any further properties accordingly.


Code which just tries to copy the DMA configuration from an existing 
device to a new one has never worked properly, because there is often 
additional DMA configuration in archdata and other places it cannot 
possibly know about. Last time I looked there were still some specific 
hacks in the USB layer in order to interact correctly with the block 
layer bounce limit, but I think anything truly wrong has been more or 
less flushed out by now (the DMA ops changes for arm64 ACPI support 
caught a fair few IIRC).


Robin.