Re: [PATCH 01/11] iommu/vt-d: Add pasid private data helpers

2021-05-21 Thread Jacob Pan
Hi BaoLu,

On Thu, 20 May 2021 11:15:21 +0800, Lu Baolu 
wrote:

> We are about to use iommu_sva_alloc/free_pasid() helpers in iommu core.
> That means the pasid life cycle will be managed by iommu core. Use a
> local array to save the per pasid private data instead of attaching it
> to the real pasid.
> 
I feel a little awkward about having a separate xarray for storing per-IOASID
data. It seems duplicated.
Jason suggested in another thread that we can make ioasid_data public
and embedded in struct intel_svm, then we can get rid of the private data
pointer. ioasid_find() will return the ioasid_data, then we can retrieve the
private data with container_of().

roughly,

struct intel_svm {
...
struct ioasid_data;
};

struct ioasid_data {
ioasid_t id;
refcount_t refs;
struct mm_struct *mm;
};

This can be a separate patch/effort if it makes sense to you.

> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/intel/svm.c | 62 ++-
>  1 file changed, 41 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> index 5165cea90421..82b0627ad7e7 100644
> --- a/drivers/iommu/intel/svm.c
> +++ b/drivers/iommu/intel/svm.c
> @@ -17,6 +17,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -28,6 +29,23 @@ static void intel_svm_drain_prq(struct device *dev,
> u32 pasid); 
>  #define PRQ_ORDER 0
>  
> +static DEFINE_XARRAY_ALLOC(pasid_private_array);
> +static int pasid_private_add(ioasid_t pasid, void *priv)
> +{
> + return xa_alloc(&pasid_private_array, &pasid, priv,
> + XA_LIMIT(pasid, pasid), GFP_ATOMIC);
> +}
> +
> +static void pasid_private_remove(ioasid_t pasid)
> +{
> + xa_erase(&pasid_private_array, pasid);
> +}
> +
> +static void *pasid_private_find(ioasid_t pasid)
> +{
> + return xa_load(&pasid_private_array, pasid);
> +}
> +
>  int intel_svm_enable_prq(struct intel_iommu *iommu)
>  {
>   struct page *pages;
> @@ -224,7 +242,7 @@ static int pasid_to_svm_sdev(struct device *dev,
> unsigned int pasid, if (pasid == INVALID_IOASID || pasid >= PASID_MAX)
>   return -EINVAL;
>  
> - svm = ioasid_find(NULL, pasid, NULL);
> + svm = pasid_private_find(pasid);
>   if (IS_ERR(svm))
>   return PTR_ERR(svm);
>  
> @@ -334,7 +352,7 @@ int intel_svm_bind_gpasid(struct iommu_domain
> *domain, struct device *dev, svm->gpasid = data->gpasid;
>   svm->flags |= SVM_FLAG_GUEST_PASID;
>   }
> - ioasid_set_data(data->hpasid, svm);
> + pasid_private_add(data->hpasid, svm);
> INIT_LIST_HEAD_RCU(&svm->devs);
>   mmput(svm->mm);
>   }
> @@ -388,7 +406,7 @@ int intel_svm_bind_gpasid(struct iommu_domain
> *domain, struct device *dev, list_add_rcu(&sdev->list, &svm->devs);
>   out:
> if (!IS_ERR_OR_NULL(svm) && list_empty(&svm->devs)) {
> - ioasid_set_data(data->hpasid, NULL);
> + pasid_private_remove(data->hpasid);
>   kfree(svm);
>   }
>  
> @@ -431,7 +449,7 @@ int intel_svm_unbind_gpasid(struct device *dev, u32
> pasid)
>* the unbind, IOMMU driver will get
> notified
>* and perform cleanup.
>*/
> - ioasid_set_data(pasid, NULL);
> + pasid_private_remove(pasid);
>   kfree(svm);
>   }
>   }
> @@ -547,8 +565,7 @@ intel_svm_bind_mm(struct device *dev, unsigned int
> flags, svm = kzalloc(sizeof(*svm), GFP_KERNEL);
>   if (!svm) {
>   ret = -ENOMEM;
> - kfree(sdev);
> - goto out;
> + goto sdev_err;
>   }
>  
>   if (pasid_max > intel_pasid_max_id)
> @@ -556,13 +573,16 @@ intel_svm_bind_mm(struct device *dev, unsigned int
> flags, 
>   /* Do not use PASID 0, reserved for RID to PASID */
>   svm->pasid = ioasid_alloc(NULL, PASID_MIN,
> -   pasid_max - 1, svm);
> +   pasid_max - 1, NULL);
>   if (svm->pasid == INVALID_IOASID) {
> - kfree(svm);
> - kfree(sdev);
>   ret = -ENOSPC;
> - goto out;
> + goto svm_err;
>   }
> +
> + ret = pasid_private_add(svm->pasid, svm);
> + if (ret)
> + goto pasid_err;
> +
> + svm->notifier.ops = &intel_mmuops;
>   svm->mm = mm;
>   svm->flags = flags;
> @@ -571,12 +591,8 @@ intel_svm_bind_mm(struct device *dev, unsigned int
> flags, ret = -ENOMEM;
>   if (mm) {
> ret = mmu_notifier_register(&svm->notifier, mm);
> - if (ret) {
> -   

Re: [PATCH 2/2] firmware: QCOM_SCM: Allow qcom_scm driver to be loadable as a permanent module

2021-05-21 Thread Greg Kroah-Hartman
On Tue, May 18, 2021 at 09:19:22PM +, John Stultz wrote:
> Allow the qcom_scm driver to be loadable as a permanent module.
> 
> This still uses the "depends on QCOM_SCM || !QCOM_SCM" bit to
> ensure that drivers that call into the qcom_scm driver are
> also built as modules. While not ideal in some cases, it's the
> only safe way I can find to avoid build errors without having
> those drivers select QCOM_SCM and have to force it on (as
> QCOM_SCM=n can be valid for those drivers).
> 
> Reviving this now that Saravana's fw_devlink defaults to on,
> which should avoid loading troubles seen before.
> 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: Andy Gross 
> Cc: Bjorn Andersson 
> Cc: Joerg Roedel 
> Cc: Thomas Gleixner 
> Cc: Jason Cooper 
> Cc: Marc Zyngier 
> Cc: Linus Walleij 
> Cc: Vinod Koul 
> Cc: Kalle Valo 
> Cc: Maulik Shah 
> Cc: Lina Iyer 
> Cc: Saravana Kannan 
> Cc: Todd Kjos 
> Cc: Greg Kroah-Hartman 
> Cc: linux-arm-...@vger.kernel.org
> Cc: iommu@lists.linux-foundation.org
> Cc: linux-g...@vger.kernel.org
> Acked-by: Kalle Valo 
> Acked-by: Greg Kroah-Hartman 
> Reviewed-by: Bjorn Andersson 
> Signed-off-by: John Stultz 

Acked-by: Greg Kroah-Hartman 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/2] irqchip/qcom-pdc: Switch to IRQCHIP_PLATFORM_DRIVER and allow as a module

2021-05-21 Thread Greg Kroah-Hartman
On Tue, May 18, 2021 at 09:19:21PM +, John Stultz wrote:
> From: Saravana Kannan 
> 
> This patch revives changes from Saravana Kannan to switch the
> qcom-pdc driver to use IRQCHIP_PLATFORM_DRIVER helper macros,
> and allows qcom-pdc driver to be loaded as a permanent module.
> 
> Earlier attempts at this ran into trouble with loading
> dependencies, but with Saravana's fw_devlink=on set by default
> now we should avoid those.
> 
> Cc: Andy Gross 
> Cc: Bjorn Andersson 
> Cc: Joerg Roedel 
> Cc: Thomas Gleixner 
> Cc: Jason Cooper 
> Cc: Marc Zyngier 
> Cc: Linus Walleij 
> Cc: Maulik Shah 
> Cc: Lina Iyer 
> Cc: Saravana Kannan 
> Cc: Todd Kjos 
> Cc: Greg Kroah-Hartman 
> Cc: linux-arm-...@vger.kernel.org
> Cc: iommu@lists.linux-foundation.org
> Cc: linux-g...@vger.kernel.org
> Signed-off-by: Saravana Kannan 
> [jstultz: Folded in with my changes to allow the driver to be
>  loadable as a permanent module]
> Signed-off-by: John Stultz 

Acked-by: Greg Kroah-Hartman 


Re: [PATCH v4] iommu/of: Fix pci_request_acs() before enumerating PCI devices

2021-05-21 Thread Rob Herring
On Thu, May 20, 2021 at 10:03 PM Wang Xingang  wrote:
>
> From: Xingang Wang 
>
> When booting with devicetree, pci_request_acs() is called after the
> enumeration and initialization of PCI devices, thus ACS is not
> enabled. ACS should be enabled when an IOMMU is detected for the
> PCI host bridge, so add a check for an IOMMU before the probe of the
> PCI host and call pci_request_acs() to make sure ACS will be enabled
> when enumerating PCI devices.
>
> Fixes: 6bf6c24720d3 ("iommu/of: Request ACS from the PCI core when
> configuring IOMMU linkage")
> Signed-off-by: Xingang Wang 
> ---
>  drivers/iommu/of_iommu.c | 1 -
>  drivers/pci/of.c | 8 +++-
>  2 files changed, 7 insertions(+), 2 deletions(-)

Reviewed-by: Rob Herring 

> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
> index a9d2df001149..54a14da242cc 100644
> --- a/drivers/iommu/of_iommu.c
> +++ b/drivers/iommu/of_iommu.c
> @@ -205,7 +205,6 @@ const struct iommu_ops *of_iommu_configure(struct device 
> *dev,
> .np = master_np,
> };
>
> -   pci_request_acs();
> err = pci_for_each_dma_alias(to_pci_dev(dev),
>  of_pci_iommu_init, &info);
> } else {
> diff --git a/drivers/pci/of.c b/drivers/pci/of.c
> index da5b414d585a..2313c3f848b0 100644
> --- a/drivers/pci/of.c
> +++ b/drivers/pci/of.c
> @@ -581,9 +581,15 @@ static int pci_parse_request_of_pci_ranges(struct device 
> *dev,
>
>  int devm_of_pci_bridge_init(struct device *dev, struct pci_host_bridge 
> *bridge)
>  {
> -   if (!dev->of_node)
> +   struct device_node *node = dev->of_node;
> +
> +   if (!node)
> return 0;
>
> +   /* Detect IOMMU and make sure ACS will be enabled */
> +   if (of_property_read_bool(node, "iommu-map"))
> +   pci_request_acs();
> +
> bridge->swizzle_irq = pci_common_swizzle;
> bridge->map_irq = of_irq_parse_and_map_pci;
>
> --
> 2.19.1
>


Re: [PATCH v2 2/6] ACPI: Move IOMMU setup code out of IORT

2021-05-21 Thread Rafael J. Wysocki
On Fri, Apr 23, 2021 at 1:57 PM Jean-Philippe Brucker
 wrote:
>
> Some of the IOMMU setup code in IORT is fairly generic and can be reused
> by VIOT. Extract it from IORT.

Except that iort_iommu_configure_id() is not really generic AFAICS.


Re: [PATCH v2 3/6] ACPI: Add driver for the VIOT table

2021-05-21 Thread Rafael J. Wysocki
On Fri, Apr 23, 2021 at 1:57 PM Jean-Philippe Brucker
 wrote:
>
> The ACPI Virtual I/O Translation Table describes topology of
> para-virtual platforms, similarly to vendor tables DMAR, IVRS and IORT.
> For now it describes the relation between virtio-iommu and the endpoints
> it manages.
>
> Three steps are needed to configure DMA of endpoints:
>
> (1) acpi_viot_init(): parse the VIOT table, find or create the fwnode
> associated to each vIOMMU device.
>
> (2) When probing the vIOMMU device, the driver registers its IOMMU ops
> within the IOMMU subsystem. This step doesn't require any
> intervention from the VIOT driver.
>
> (3) viot_iommu_configure(): before binding the endpoint to a driver,
> find the associated IOMMU ops. Register them, along with the
> endpoint ID, into the device's iommu_fwspec.
>
> If step (3) happens before step (2), it is deferred until the IOMMU is
> initialized, then retried.
>
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/acpi/Kconfig  |   3 +
>  drivers/iommu/Kconfig |   1 +
>  drivers/acpi/Makefile |   2 +
>  include/linux/acpi_viot.h |  19 +++
>  drivers/acpi/bus.c|   2 +
>  drivers/acpi/scan.c   |   3 +
>  drivers/acpi/viot.c   | 350 ++
>  MAINTAINERS   |   8 +
>  8 files changed, 388 insertions(+)
>  create mode 100644 include/linux/acpi_viot.h
>  create mode 100644 drivers/acpi/viot.c
>
> diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
> index eedec61e3476..3758c6940ed7 100644
> --- a/drivers/acpi/Kconfig
> +++ b/drivers/acpi/Kconfig
> @@ -526,6 +526,9 @@ endif
>
>  source "drivers/acpi/pmic/Kconfig"
>
> +config ACPI_VIOT
> +   bool
> +
>  endif  # ACPI
>
>  config X86_PM_TIMER
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 1f111b399bca..aff8a4830dd1 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -403,6 +403,7 @@ config VIRTIO_IOMMU
> depends on ARM64
> select IOMMU_API
> select INTERVAL_TREE
> +   select ACPI_VIOT if ACPI
> help
>   Para-virtualised IOMMU driver with virtio.
>
> diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
> index 700b41adf2db..a6e644c48987 100644
> --- a/drivers/acpi/Makefile
> +++ b/drivers/acpi/Makefile
> @@ -118,3 +118,5 @@ video-objs  += acpi_video.o video_detect.o
>  obj-y  += dptf/
>
>  obj-$(CONFIG_ARM64)+= arm64/
> +
> +obj-$(CONFIG_ACPI_VIOT)+= viot.o
> diff --git a/include/linux/acpi_viot.h b/include/linux/acpi_viot.h
> new file mode 100644
> index ..1eb8ee5b0e5f
> --- /dev/null
> +++ b/include/linux/acpi_viot.h
> @@ -0,0 +1,19 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +
> +#ifndef __ACPI_VIOT_H__
> +#define __ACPI_VIOT_H__
> +
> +#include 
> +
> +#ifdef CONFIG_ACPI_VIOT
> +void __init acpi_viot_init(void);
> +int viot_iommu_configure(struct device *dev);
> +#else
> +static inline void acpi_viot_init(void) {}
> +static inline int viot_iommu_configure(struct device *dev)
> +{
> +   return -ENODEV;
> +}
> +#endif
> +
> +#endif /* __ACPI_VIOT_H__ */
> diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
> index be7da23fad76..b835ca702ff0 100644
> --- a/drivers/acpi/bus.c
> +++ b/drivers/acpi/bus.c
> @@ -27,6 +27,7 @@
>  #include 
>  #endif
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -1339,6 +1340,7 @@ static int __init acpi_init(void)
> pci_mmcfg_late_init();
> acpi_iort_init();
> acpi_scan_init();
> +   acpi_viot_init();
> acpi_ec_init();
> acpi_debugfs_init();
> acpi_sleep_proc_init();
> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> index 5924421075f6..4db43c822ee7 100644
> --- a/drivers/acpi/scan.c
> +++ b/drivers/acpi/scan.c
> @@ -9,6 +9,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -1554,6 +1555,8 @@ static const struct iommu_ops 
> *acpi_iommu_configure_id(struct device *dev,
> return ops;
>
> err = iort_iommu_configure_id(dev, id_in);
> +   if (err && err != -EPROBE_DEFER)
> +   err = viot_iommu_configure(dev);
>
> /*
>  * If we have reason to believe the IOMMU driver missed the initial
> diff --git a/drivers/acpi/viot.c b/drivers/acpi/viot.c
> new file mode 100644
> index ..710e5a5eac70
> --- /dev/null
> +++ b/drivers/acpi/viot.c
> @@ -0,0 +1,350 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Virtual I/O topology

In the first place, more information on what this is all about, please.

What it does and how it is used.


Re: [PATCH v5 3/4] iommu: rockchip: Add internal ops to handle variants

2021-05-21 Thread Robin Murphy

On 2021-05-21 14:38, Benjamin Gaignard wrote:


On 21/05/2021 at 14:58, Robin Murphy wrote:

On 2021-05-21 09:36, Benjamin Gaignard wrote:

Add internal ops to be able to handle the incoming variant v2.
The goal is to keep the overall structure of the framework but
to allow adding support for the evolution of this hardware block.

The ops are global for a SoC because iommu domains are not
attached to a specific device if they are for a virtual device like
drm. Using a global variable shouldn't be a problem since a SoC usually
doesn't embed different versions of the iommu hardware block.
If that happens one day, a WARN_ON will be displayed at probe time.


IMO it would be a grievous error if such a "virtual device" ever gets 
near the IOMMU API, so personally I wouldn't use that as a 
justification for anything :)


FWIW you should be OK to handle things on a per-instance basis, it 
just means you have to defer some of the domain setup to .attach_dev 
time, like various other drivers do. That said, there's nothing wrong 
with the global if we do expect instances to be consistent across any 
given Rockchip SoC (and my gut feeling is that we probably should).


I tried that solution first, but the drm device appears to be such a
"virtual device", so I had to use the global.


Hmm, the "rockchip,display-subsystem" node is not associated with an 
IOMMU, and shouldn't even be passed to the DMA API either, because it's 
not a real piece of DMA capable hardware. Whatever the DRM stack is 
doing above, it should only be the actual VOP devices that we see down 
here. If not, that's indicative of something being wrong elsewhere. Like 
I say though, I think it's fine to use global ops simply on the 
expectation that that's how the new SOCs are going to be.


In fact this reminds me, I think I started writing a patch somewhere to 
clean up the virtual device mess for rockchip-drm (IIRC I could see no 
reason why we can't just allocate the DRM device from the VOP driver, 
similar to what exynos-drm does). Maybe I should dig that up again...



I sent a v6 to fix your other remarks.


I guess I'll wait for v7 now then, since I got sidetracked before 
sending my review of patch #4 (heck, I've just spent the last half hour 
doing something else in the middle of writing this!) ;)


Cheers,
Robin.



Thanks for your advice.

Benjamin




Signed-off-by: Benjamin Gaignard 
---
version 5:
  - Use of_device_get_match_data()
  - Add internal ops inside the driver

  drivers/iommu/rockchip-iommu.c | 69 --
  1 file changed, 50 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/rockchip-iommu.c 
b/drivers/iommu/rockchip-iommu.c

index 7a2932772fdf..e7b9bcf174b1 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -19,6 +19,7 @@
  #include 
  #include 
  #include 
+#include 


This seems to be an unrelated and unnecessary change.


  #include 
  #include 
  #include 
@@ -96,6 +97,14 @@ static const char * const rk_iommu_clocks[] = {
  "aclk", "iface",
  };
  +struct rk_iommu_ops {
+    phys_addr_t (*pt_address)(u32 dte);
+    u32 (*mk_dtentries)(dma_addr_t pt_dma);
+    u32 (*mk_ptentries)(phys_addr_t page, int prot);
+    phys_addr_t (*dte_addr_phys)(phys_addr_t addr);
+    u32 pt_address_mask;
+};
+
  struct rk_iommu {
  struct device *dev;
  void __iomem **bases;
@@ -116,6 +125,7 @@ struct rk_iommudata {
  };
    static struct device *dma_dev;
+static const struct rk_iommu_ops *rk_ops;
    static inline void rk_table_flush(struct rk_iommu_domain *dom, 
dma_addr_t dma,

    unsigned int count)
@@ -215,11 +225,6 @@ static inline u32 rk_mk_dte(dma_addr_t pt_dma)
  #define RK_PTE_PAGE_READABLE  BIT(1)
  #define RK_PTE_PAGE_VALID BIT(0)
  -static inline phys_addr_t rk_pte_page_address(u32 pte)
-{
-    return (phys_addr_t)pte & RK_PTE_PAGE_ADDRESS_MASK;
-}
-
  static inline bool rk_pte_is_page_valid(u32 pte)
  {
  return pte & RK_PTE_PAGE_VALID;
@@ -451,7 +456,7 @@ static int rk_iommu_force_reset(struct rk_iommu 
*iommu)
  rk_iommu_write(iommu->bases[i], RK_MMU_DTE_ADDR, 
DTE_ADDR_DUMMY);

    dte_addr = rk_iommu_read(iommu->bases[i], RK_MMU_DTE_ADDR);
-    if (dte_addr != (DTE_ADDR_DUMMY & RK_DTE_PT_ADDRESS_MASK)) {
+    if (dte_addr != (DTE_ADDR_DUMMY & rk_ops->pt_address_mask)) {


Nit: might it make more sense to do something like:

    dte_addr = rk_ops->pt_address(... DTE_ADDR_DUMMY);
    rk_iommu_write(... dte_addr)
    if (rk_iommu_read(...) != dte_addr)

so that you don't need to bother defining ->pt_address_mask for just 
this one sanity-check?


  dev_err(iommu->dev, "Error during raw reset. 
MMU_DTE_ADDR is not functioning\n");

  return -EFAULT;
  }
@@ -470,6 +475,11 @@ static int rk_iommu_force_reset(struct rk_iommu 
*iommu)

  return 0;
  }
  +static inline phys_addr_t rk_dte_addr_phys(phys_addr_t addr)


The argument type here should be u32, since it's a DTE, 

[PATCH] iommu/rockchip: Remove redundant DMA syncs

2021-05-21 Thread Robin Murphy
When we allocate new pagetable pages, we don't modify them between the
initial dma_map_single() call and the dma_sync_single_for_device() call
via rk_iommu_flush(), so the latter is entirely pointless.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/rockchip-iommu.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index 7a2932772fdf..e5116d6a6260 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -682,7 +682,6 @@ static u32 *rk_dte_get_page_table(struct rk_iommu_domain 
*rk_domain,
dte = rk_mk_dte(pt_dma);
*dte_addr = dte;
 
-   rk_table_flush(rk_domain, pt_dma, NUM_PT_ENTRIES);
rk_table_flush(rk_domain,
   rk_domain->dt_dma + dte_index * sizeof(u32), 1);
 done:
@@ -1004,8 +1003,6 @@ static struct iommu_domain 
*rk_iommu_domain_alloc(unsigned type)
goto err_free_dt;
}
 
-   rk_table_flush(rk_domain, rk_domain->dt_dma, NUM_DT_ENTRIES);
-
spin_lock_init(&rk_domain->iommus_lock);
spin_lock_init(&rk_domain->dt_lock);
INIT_LIST_HEAD(&rk_domain->iommus);
-- 
2.21.0.dirty



Re: [PATCH v5 4/4] iommu: rockchip: Add support for iommu v2

2021-05-21 Thread Robin Murphy

On 2021-05-21 09:36, Benjamin Gaignard wrote:

This second version of the hardware block has a different bits
mapping for page table entries.
Add the ops matching to this new mapping.
Define a new compatible to distinguish it from the first version.

Signed-off-by: Benjamin Gaignard 
---
version 5:
  - Use internal ops to support v2 hardware block
  - Use GENMASK macro.
  - Keep rk_dte_pt_address() and rk_dte_pt_address_v2() separated
because I believe that is more readable like this.
  - Do not duplicate code.

  drivers/iommu/rockchip-iommu.c | 78 ++
  1 file changed, 78 insertions(+)

diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index e7b9bcf174b1..23253a2f269e 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -189,6 +189,33 @@ static inline phys_addr_t rk_dte_pt_address(u32 dte)
return (phys_addr_t)dte & RK_DTE_PT_ADDRESS_MASK;
  }
  
+/*

+ * In v2:
+ * 31:12 - PT address bit 31:0
+ * 11: 8 - PT address bit 35:32
+ *  7: 4 - PT address bit 39:36
+ *  3: 1 - Reserved
+ * 0 - 1 if PT @ PT address is valid
+ */
+#define RK_DTE_PT_ADDRESS_MASK_V2 GENMASK_ULL(31, 4)
+#define DTE_HI_MASK1   GENMASK(11, 8)
+#define DTE_HI_MASK2   GENMASK(7, 4)
+#define DTE_HI_SHIFT1  24 /* shift bit 8 to bit 32 */
+#define DTE_HI_SHIFT2  32 /* shift bit 4 to bit 36 */


Nit: no harm in doing "#define DTE_HI_SHIFT1 (32 - 8)" etc. for maximum 
clarity if you want.



+#define PAGE_DESC_HI_MASK1 GENMASK_ULL(39, 36)
+#define PAGE_DESC_HI_MASK2 GENMASK_ULL(35, 32)
+
+static inline phys_addr_t rk_dte_pt_address_v2(u32 dte)
+{
+   u64 dte_v2 = dte;
+
+   dte_v2 = ((dte_v2 & DTE_HI_MASK2) << DTE_HI_SHIFT2) |
+((dte_v2 & DTE_HI_MASK1) << DTE_HI_SHIFT1) |
+(dte_v2 & RK_DTE_PT_ADDRESS_MASK);
+
+   return (phys_addr_t)dte_v2;
+}
+
  static inline bool rk_dte_is_pt_valid(u32 dte)
  {
return dte & RK_DTE_PT_VALID;
@@ -199,6 +226,15 @@ static inline u32 rk_mk_dte(dma_addr_t pt_dma)
return (pt_dma & RK_DTE_PT_ADDRESS_MASK) | RK_DTE_PT_VALID;
  }
  
+static inline u32 rk_mk_dte_v2(dma_addr_t pt_dma)

+{
+   pt_dma = (pt_dma & RK_DTE_PT_ADDRESS_MASK) |
+((pt_dma & PAGE_DESC_HI_MASK1) >> DTE_HI_SHIFT1) |
+(pt_dma & PAGE_DESC_HI_MASK2) >> DTE_HI_SHIFT2;
+
+   return (pt_dma & RK_DTE_PT_ADDRESS_MASK_V2) | RK_DTE_PT_VALID;
+}
+
  /*
   * Each PTE has a Page address, some flags and a valid bit:
   * +-+---+---+-+
@@ -240,6 +276,29 @@ static u32 rk_mk_pte(phys_addr_t page, int prot)
return page | flags | RK_PTE_PAGE_VALID;
  }
  
+/*

+ * In v2:
+ * 31:12 - Page address bit 31:0
+ *  11:9 - Page address bit 34:32
+ *   8:4 - Page address bit 39:35
+ * 3 - Security
+ * 2 - Readable
+ * 1 - Writable
+ * 0 - 1 if Page @ Page address is valid
+ */
+#define RK_PTE_PAGE_READABLE_V2  BIT(2)
+#define RK_PTE_PAGE_WRITABLE_V2  BIT(1)
+
+static u32 rk_mk_pte_v2(phys_addr_t page, int prot)
+{
+   u32 flags = 0;
+
+   flags |= (prot & IOMMU_READ) ? RK_PTE_PAGE_READABLE_V2 : 0;
+   flags |= (prot & IOMMU_WRITE) ? RK_PTE_PAGE_WRITABLE_V2 : 0;
+
+   return rk_mk_dte_v2(page) | flags;
+}
+
  static u32 rk_mk_pte_invalid(u32 pte)
  {
return pte & ~RK_PTE_PAGE_VALID;
@@ -480,6 +539,14 @@ static inline phys_addr_t rk_dte_addr_phys(phys_addr_t 
addr)
return addr;
  }
  
+#define DT_HI_MASK GENMASK_ULL(39, 32)

+#define DT_SHIFT   28
+
+static inline phys_addr_t rk_dte_addr_phys_v2(phys_addr_t addr)
+{
+   return (addr & RK_DTE_PT_ADDRESS_MASK) | ((addr & DT_HI_MASK) << 
DT_SHIFT);
+}


Are we missing something overall? AFAICS the DT_HI_MASK bits of 
RK_MMU_DTE_ADDR will never actually be used, since rk_iommu_enable() 
just writes the value of dt_dma directly...



+
  static void log_iova(struct rk_iommu *iommu, int index, dma_addr_t iova)
  {
void __iomem *base = iommu->bases[index];
@@ -1305,10 +1372,21 @@ static struct rk_iommu_ops iommu_data_ops_v1 = {
.pt_address_mask = RK_DTE_PT_ADDRESS_MASK,
  };
  
+static struct rk_iommu_ops iommu_data_ops_v2 = {

+   .pt_address = &rk_dte_pt_address_v2,
+   .mk_dtentries = &rk_mk_dte_v2,
+   .mk_ptentries = &rk_mk_pte_v2,
+   .dte_addr_phys = &rk_dte_addr_phys_v2,
+   .pt_address_mask = RK_DTE_PT_ADDRESS_MASK_V2,
+};
+
  static const struct of_device_id rk_iommu_dt_ids[] = {
{   .compatible = "rockchip,iommu",
.data = &iommu_data_ops_v1,
},
+   {   .compatible = "rockchip,rk3568-iommu",
+   .data = &iommu_data_ops_v2,
+   },
{ /* sentinel */ }
  };
  MODULE_DEVICE_TABLE(of, rk_iommu_dt_ids);



...and I'll bet the reason it appears to work is that we also never 
actually set the IOMMU device's DMA masks anywhere, so what happens in 
practice is that even if pagetable pages are allocated above 32 bits 
they'll just get bounced by the DMA 

[PATCH v6 4/4] iommu: rockchip: Add support for iommu v2

2021-05-21 Thread Benjamin Gaignard
This second version of the hardware block has a different bits
mapping for page table entries.
Add the ops matching to this new mapping.
Define a new compatible to distinguish it from the first version.

Signed-off-by: Benjamin Gaignard 
Reviewed-by: Heiko Stuebner 
---
version 5:
 - Use internal ops to support v2 hardware block
 - Use GENMASK macro.
 - Keep rk_dte_pt_address() and rk_dte_pt_address_v2() separated
   because I believe that is more readable like this.
 - Do not duplicate code.

 drivers/iommu/rockchip-iommu.c | 78 ++
 1 file changed, 78 insertions(+)

diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index 7b537dd168ca..d943b9d8bdb2 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -187,6 +187,33 @@ static inline phys_addr_t rk_dte_pt_address(u32 dte)
return (phys_addr_t)dte & RK_DTE_PT_ADDRESS_MASK;
 }
 
+/*
+ * In v2:
+ * 31:12 - PT address bit 31:0
+ * 11: 8 - PT address bit 35:32
+ *  7: 4 - PT address bit 39:36
+ *  3: 1 - Reserved
+ * 0 - 1 if PT @ PT address is valid
+ */
+#define RK_DTE_PT_ADDRESS_MASK_V2 GENMASK_ULL(31, 4)
+#define DTE_HI_MASK1   GENMASK(11, 8)
+#define DTE_HI_MASK2   GENMASK(7, 4)
+#define DTE_HI_SHIFT1  24 /* shift bit 8 to bit 32 */
+#define DTE_HI_SHIFT2  32 /* shift bit 4 to bit 36 */
+#define PAGE_DESC_HI_MASK1 GENMASK_ULL(39, 36)
+#define PAGE_DESC_HI_MASK2 GENMASK_ULL(35, 32)
+
+static inline phys_addr_t rk_dte_pt_address_v2(u32 dte)
+{
+   u64 dte_v2 = dte;
+
+   dte_v2 = ((dte_v2 & DTE_HI_MASK2) << DTE_HI_SHIFT2) |
+((dte_v2 & DTE_HI_MASK1) << DTE_HI_SHIFT1) |
+(dte_v2 & RK_DTE_PT_ADDRESS_MASK);
+
+   return (phys_addr_t)dte_v2;
+}
+
 static inline bool rk_dte_is_pt_valid(u32 dte)
 {
return dte & RK_DTE_PT_VALID;
@@ -197,6 +224,15 @@ static inline u32 rk_mk_dte(dma_addr_t pt_dma)
return (pt_dma & RK_DTE_PT_ADDRESS_MASK) | RK_DTE_PT_VALID;
 }
 
+static inline u32 rk_mk_dte_v2(dma_addr_t pt_dma)
+{
+   pt_dma = (pt_dma & RK_DTE_PT_ADDRESS_MASK) |
+((pt_dma & PAGE_DESC_HI_MASK1) >> DTE_HI_SHIFT1) |
+(pt_dma & PAGE_DESC_HI_MASK2) >> DTE_HI_SHIFT2;
+
+   return (pt_dma & RK_DTE_PT_ADDRESS_MASK_V2) | RK_DTE_PT_VALID;
+}
+
 /*
  * Each PTE has a Page address, some flags and a valid bit:
  * +-+---+---+-+
@@ -238,6 +274,29 @@ static u32 rk_mk_pte(phys_addr_t page, int prot)
return page | flags | RK_PTE_PAGE_VALID;
 }
 
+/*
+ * In v2:
+ * 31:12 - Page address bit 31:0
+ *  11:9 - Page address bit 34:32
+ *   8:4 - Page address bit 39:35
+ * 3 - Security
+ * 2 - Readable
+ * 1 - Writable
+ * 0 - 1 if Page @ Page address is valid
+ */
+#define RK_PTE_PAGE_READABLE_V2  BIT(2)
+#define RK_PTE_PAGE_WRITABLE_V2  BIT(1)
+
+static u32 rk_mk_pte_v2(phys_addr_t page, int prot)
+{
+   u32 flags = 0;
+
+   flags |= (prot & IOMMU_READ) ? RK_PTE_PAGE_READABLE_V2 : 0;
+   flags |= (prot & IOMMU_WRITE) ? RK_PTE_PAGE_WRITABLE_V2 : 0;
+
+   return rk_mk_dte_v2(page) | flags;
+}
+
 static u32 rk_mk_pte_invalid(u32 pte)
 {
return pte & ~RK_PTE_PAGE_VALID;
@@ -478,6 +537,15 @@ static inline phys_addr_t rk_dte_addr_phys(u32 addr)
return (phys_addr_t)addr;
 }
 
+#define DT_HI_MASK GENMASK_ULL(39, 32)
+#define DT_SHIFT   28
+
+static inline phys_addr_t rk_dte_addr_phys_v2(u32 addr)
+{
+   return (phys_addr_t)(addr & RK_DTE_PT_ADDRESS_MASK) |
+  ((addr & DT_HI_MASK) << DT_SHIFT);
+}
+
 static void log_iova(struct rk_iommu *iommu, int index, dma_addr_t iova)
 {
void __iomem *base = iommu->bases[index];
@@ -1305,10 +1373,20 @@ static struct rk_iommu_ops iommu_data_ops_v1 = {
.dte_addr_phys = &rk_dte_addr_phys,
 };
 
+static struct rk_iommu_ops iommu_data_ops_v2 = {
+   .pt_address = &rk_dte_pt_address_v2,
+   .mk_dtentries = &rk_mk_dte_v2,
+   .mk_ptentries = &rk_mk_pte_v2,
+   .dte_addr_phys = &rk_dte_addr_phys_v2,
+};
+
 static const struct of_device_id rk_iommu_dt_ids[] = {
{   .compatible = "rockchip,iommu",
.data = &iommu_data_ops_v1,
},
+   {   .compatible = "rockchip,rk3568-iommu",
+   .data = &iommu_data_ops_v2,
+   },
{ /* sentinel */ }
 };
 
-- 
2.25.1



[PATCH v6 1/4] dt-bindings: iommu: rockchip: Convert IOMMU to DT schema

2021-05-21 Thread Benjamin Gaignard
Convert Rockchip IOMMU to DT schema

Signed-off-by: Benjamin Gaignard 
Reviewed-by: Rob Herring 
Reviewed-by: Heiko Stuebner 
---
 .../bindings/iommu/rockchip,iommu.txt | 38 -
 .../bindings/iommu/rockchip,iommu.yaml| 80 +++
 2 files changed, 80 insertions(+), 38 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/iommu/rockchip,iommu.txt
 create mode 100644 Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml

diff --git a/Documentation/devicetree/bindings/iommu/rockchip,iommu.txt 
b/Documentation/devicetree/bindings/iommu/rockchip,iommu.txt
deleted file mode 100644
index 6ecefea1c6f9..
--- a/Documentation/devicetree/bindings/iommu/rockchip,iommu.txt
+++ /dev/null
@@ -1,38 +0,0 @@
-Rockchip IOMMU
-==
-
-A Rockchip DRM iommu translates io virtual addresses to physical addresses for
-its master device.  Each slave device is bound to a single master device, and
-shares its clocks, power domain and irq.
-
-Required properties:
-- compatible  : Should be "rockchip,iommu"
-- reg : Address space for the configuration registers
-- interrupts  : Interrupt specifier for the IOMMU instance
-- interrupt-names : Interrupt name for the IOMMU instance
-- #iommu-cells: Should be <0>.  This indicates the iommu is a
-"single-master" device, and needs no additional information
-to associate with its master device.  See:
-Documentation/devicetree/bindings/iommu/iommu.txt
-- clocks  : A list of clocks required for the IOMMU to be accessible by
-the host CPU.
-- clock-names : Should contain the following:
-   "iface" - Main peripheral bus clock (PCLK/HCL) (required)
-   "aclk"  - AXI bus clock (required)
-
-Optional properties:
-- rockchip,disable-mmu-reset : Don't use the mmu reset operation.
-  Some mmu instances may produce unexpected results
-  when the reset operation is used.
-
-Example:
-
-   vopl_mmu: iommu@ff940300 {
-   compatible = "rockchip,iommu";
-   reg = <0xff940300 0x100>;
-   interrupts = ;
-   interrupt-names = "vopl_mmu";
-   clocks = <&cru ACLK_VOP1>, <&cru HCLK_VOP1>;
-   clock-names = "aclk", "iface";
-   #iommu-cells = <0>;
-   };
diff --git a/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml 
b/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml
new file mode 100644
index ..099fc2578b54
--- /dev/null
+++ b/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml
@@ -0,0 +1,80 @@
+# SPDX-License-Identifier: GPL-2.0-only
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/iommu/rockchip,iommu.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Rockchip IOMMU
+
+maintainers:
+  - Heiko Stuebner 
+
+description: |+
+  A Rockchip DRM iommu translates io virtual addresses to physical addresses 
for
+  its master device. Each slave device is bound to a single master device and
+  shares its clocks, power domain and irq.
+
+  For information on assigning IOMMU controller to its peripheral devices,
+  see generic IOMMU bindings.
+
+properties:
+  compatible:
+const: rockchip,iommu
+
+  reg:
+items:
+  - description: configuration registers for MMU instance 0
+  - description: configuration registers for MMU instance 1
+minItems: 1
+maxItems: 2
+
+  interrupts:
+items:
+  - description: interruption for MMU instance 0
+  - description: interruption for MMU instance 1
+minItems: 1
+maxItems: 2
+
+  clocks:
+items:
+  - description: Core clock
+  - description: Interface clock
+
+  clock-names:
+items:
+  - const: aclk
+  - const: iface
+
+  "#iommu-cells":
+const: 0
+
+  rockchip,disable-mmu-reset:
+$ref: /schemas/types.yaml#/definitions/flag
+description: |
+  Do not use the mmu reset operation.
+  Some mmu instances may produce unexpected results
+  when the reset operation is used.
+
+required:
+  - compatible
+  - reg
+  - interrupts
+  - clocks
+  - clock-names
+  - "#iommu-cells"
+
+additionalProperties: false
+
+examples:
+  - |
+#include 
+#include 
+
+vopl_mmu: iommu@ff940300 {
+  compatible = "rockchip,iommu";
+  reg = <0xff940300 0x100>;
+  interrupts = ;
+  clocks = < ACLK_VOP1>, < HCLK_VOP1>;
+  clock-names = "aclk", "iface";
+  #iommu-cells = <0>;
+};
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 2/4] dt-bindings: iommu: rockchip: Add compatible for v2

2021-05-21 Thread Benjamin Gaignard
Add compatible for the second version of the IOMMU hardware block.
The RK356x IOMMU can also be linked to a power domain.

Signed-off-by: Benjamin Gaignard 
Reviewed-by: Rob Herring 
Reviewed-by: Heiko Stuebner 
---
 .../devicetree/bindings/iommu/rockchip,iommu.yaml  | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml 
b/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml
index 099fc2578b54..d2e28a9e3545 100644
--- a/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml
+++ b/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml
@@ -19,7 +19,9 @@ description: |+
 
 properties:
   compatible:
-const: rockchip,iommu
+enum:
+  - rockchip,iommu
+  - rockchip,rk3568-iommu
 
   reg:
 items:
@@ -48,6 +50,9 @@ properties:
   "#iommu-cells":
 const: 0
 
+  power-domains:
+maxItems: 1
+
   rockchip,disable-mmu-reset:
 $ref: /schemas/types.yaml#/definitions/flag
 description: |
-- 
2.25.1



[PATCH v6 3/4] iommu: rockchip: Add internal ops to handle variants

2021-05-21 Thread Benjamin Gaignard
Add internal ops to be able to handle the incoming variant v2.
The goal is to keep the overall structure of the framework while
allowing for the evolution of this hardware block.

The ops are global for a SoC because iommu domains are not
attached to a specific device if they are for a virtual device like
drm. Using a global variable shouldn't be a problem, since a SoC
usually doesn't embed different versions of the iommu hardware block.
If that ever happens, a WARN_ON will be displayed at probe time.

Signed-off-by: Benjamin Gaignard 
Reviewed-by: Heiko Stuebner 
---
version 6:
 - Remove #include 
 - Remove pt_address_mask field
 - Only use once of_device_get_match_data
 - Return an error if ops don't match

version 5:
 - Use of_device_get_match_data()
 - Add internal ops inside the driver

 drivers/iommu/rockchip-iommu.c | 72 --
 1 file changed, 51 insertions(+), 21 deletions(-)

diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index 7a2932772fdf..7b537dd168ca 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -96,6 +96,13 @@ static const char * const rk_iommu_clocks[] = {
"aclk", "iface",
 };
 
+struct rk_iommu_ops {
+   phys_addr_t (*pt_address)(u32 dte);
+   u32 (*mk_dtentries)(dma_addr_t pt_dma);
+   u32 (*mk_ptentries)(phys_addr_t page, int prot);
+   phys_addr_t (*dte_addr_phys)(u32 addr);
+};
+
 struct rk_iommu {
struct device *dev;
void __iomem **bases;
@@ -116,6 +123,7 @@ struct rk_iommudata {
 };
 
 static struct device *dma_dev;
+static const struct rk_iommu_ops *rk_ops;
 
 static inline void rk_table_flush(struct rk_iommu_domain *dom, dma_addr_t dma,
  unsigned int count)
@@ -215,11 +223,6 @@ static inline u32 rk_mk_dte(dma_addr_t pt_dma)
 #define RK_PTE_PAGE_READABLE  BIT(1)
 #define RK_PTE_PAGE_VALID BIT(0)
 
-static inline phys_addr_t rk_pte_page_address(u32 pte)
-{
-   return (phys_addr_t)pte & RK_PTE_PAGE_ADDRESS_MASK;
-}
-
 static inline bool rk_pte_is_page_valid(u32 pte)
 {
return pte & RK_PTE_PAGE_VALID;
@@ -448,10 +451,10 @@ static int rk_iommu_force_reset(struct rk_iommu *iommu)
 * and verifying that upper 5 nybbles are read back.
 */
for (i = 0; i < iommu->num_mmu; i++) {
-   rk_iommu_write(iommu->bases[i], RK_MMU_DTE_ADDR, 
DTE_ADDR_DUMMY);
+   dte_addr = rk_ops->pt_address(DTE_ADDR_DUMMY);
+   rk_iommu_write(iommu->bases[i], RK_MMU_DTE_ADDR, dte_addr);
 
-   dte_addr = rk_iommu_read(iommu->bases[i], RK_MMU_DTE_ADDR);
-   if (dte_addr != (DTE_ADDR_DUMMY & RK_DTE_PT_ADDRESS_MASK)) {
+   if (dte_addr != rk_iommu_read(iommu->bases[i], 
RK_MMU_DTE_ADDR)) {
dev_err(iommu->dev, "Error during raw reset. 
MMU_DTE_ADDR is not functioning\n");
return -EFAULT;
}
@@ -470,6 +473,11 @@ static int rk_iommu_force_reset(struct rk_iommu *iommu)
return 0;
 }
 
+static inline phys_addr_t rk_dte_addr_phys(u32 addr)
+{
+   return (phys_addr_t)addr;
+}
+
 static void log_iova(struct rk_iommu *iommu, int index, dma_addr_t iova)
 {
void __iomem *base = iommu->bases[index];
@@ -489,7 +497,7 @@ static void log_iova(struct rk_iommu *iommu, int index, 
dma_addr_t iova)
page_offset = rk_iova_page_offset(iova);
 
mmu_dte_addr = rk_iommu_read(base, RK_MMU_DTE_ADDR);
-   mmu_dte_addr_phys = (phys_addr_t)mmu_dte_addr;
+   mmu_dte_addr_phys = rk_ops->dte_addr_phys(mmu_dte_addr);
 
dte_addr_phys = mmu_dte_addr_phys + (4 * dte_index);
dte_addr = phys_to_virt(dte_addr_phys);
@@ -498,14 +506,14 @@ static void log_iova(struct rk_iommu *iommu, int index, 
dma_addr_t iova)
if (!rk_dte_is_pt_valid(dte))
goto print_it;
 
-   pte_addr_phys = rk_dte_pt_address(dte) + (pte_index * 4);
+   pte_addr_phys = rk_ops->pt_address(dte) + (pte_index * 4);
pte_addr = phys_to_virt(pte_addr_phys);
pte = *pte_addr;
 
if (!rk_pte_is_page_valid(pte))
goto print_it;
 
-   page_addr_phys = rk_pte_page_address(pte) + page_offset;
+   page_addr_phys = rk_ops->pt_address(pte) + page_offset;
page_flags = pte & RK_PTE_PAGE_FLAGS_MASK;
 
 print_it:
@@ -601,13 +609,13 @@ static phys_addr_t rk_iommu_iova_to_phys(struct 
iommu_domain *domain,
if (!rk_dte_is_pt_valid(dte))
goto out;
 
-   pt_phys = rk_dte_pt_address(dte);
+   pt_phys = rk_ops->pt_address(dte);
page_table = (u32 *)phys_to_virt(pt_phys);
pte = page_table[rk_iova_pte_index(iova)];
if (!rk_pte_is_page_valid(pte))
goto out;
 
-   phys = rk_pte_page_address(pte) + rk_iova_page_offset(iova);
+   phys = rk_ops->pt_address(pte) + rk_iova_page_offset(iova);
 out:
spin_unlock_irqrestore(_domain->dt_lock, flags);
 
@@ -679,14 +687,14 @@ 

[PATCH v6 0/4] Add IOMMU driver for rk356x

2021-05-21 Thread Benjamin Gaignard
This series adds the IOMMU driver for the rk356x SoC.
Since a new compatible is needed to distinguish this second version of
the IOMMU hardware block from the first one, it is an opportunity to convert
the binding to DT schema.

version 6:
 - Remove #include 
 - Remove pt_address_mask field
 - Only use once of_device_get_match_data
 - Return an error if ops don't match

version 5:
 - Add internal ops inside the driver to be able to add variants.
 - Add support of v2 variant.
 - Stop using 'version' field
 - Use GENMASK macro

version 4:
 - Add description for reg items
 - Remove useless interrupt-names properties
 - Add description for interrupts items
 - Remove interrupt-names properties from DST files

version 3:
 - Rename compatible with soc prefix
 - Rebase on v5.12 tag

version 2:
 - Fix iommu-cells typo in rk322x.dtsi
 - Change maintainer
 - Change reg maxItems
 - Add power-domains property

Benjamin Gaignard (4):
  dt-bindings: iommu: rockchip: Convert IOMMU to DT schema
  dt-bindings: iommu: rockchip: Add compatible for v2
  iommu: rockchip: Add internal ops to handle variants
  iommu: rockchip: Add support for iommu v2

 .../bindings/iommu/rockchip,iommu.txt |  38 -
 .../bindings/iommu/rockchip,iommu.yaml|  85 ++
 drivers/iommu/rockchip-iommu.c| 150 +++---
 3 files changed, 214 insertions(+), 59 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/iommu/rockchip,iommu.txt
 create mode 100644 Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml

-- 
2.25.1



Re: [PATCH v5 3/4] iommu: rockchip: Add internal ops to handle variants

2021-05-21 Thread Benjamin Gaignard


On 21/05/2021 at 14:58, Robin Murphy wrote:

On 2021-05-21 09:36, Benjamin Gaignard wrote:

Add internal ops to be able to handle the incoming variant v2.
The goal is to keep the overall structure of the framework while
allowing for the evolution of this hardware block.

The ops are global for a SoC because iommu domains are not
attached to a specific device if they are for a virtual device like
drm. Using a global variable shouldn't be a problem, since a SoC
usually doesn't embed different versions of the iommu hardware block.
If that ever happens, a WARN_ON will be displayed at probe time.


IMO it would be a grievous error if such a "virtual device" ever gets 
near the IOMMU API, so personally I wouldn't use that as a 
justification for anything :)


FWIW you should be OK to handle things on a per-instance basis, it 
just means you have to defer some of the domain setup to .attach_dev 
time, like various other drivers do. That said, there's nothing wrong 
with the global if we do expect instances to be consistent across any 
given Rockchip SoC (and my gut feeling is that we probably should).


I tried that solution first, but the drm device appears to be such a "virtual
device", so I had to use the global.

I sent a v6 to fix your other remarks.

Thanks for your advice.

Benjamin




Signed-off-by: Benjamin Gaignard 
---
version 5:
  - Use of_device_get_match_data()
  - Add internal ops inside the driver

  drivers/iommu/rockchip-iommu.c | 69 --
  1 file changed, 50 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/rockchip-iommu.c 
b/drivers/iommu/rockchip-iommu.c

index 7a2932772fdf..e7b9bcf174b1 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -19,6 +19,7 @@
  #include 
  #include 
  #include 
+#include 


This seems to be an unrelated and unnecessary change.


  #include 
  #include 
  #include 
@@ -96,6 +97,14 @@ static const char * const rk_iommu_clocks[] = {
  "aclk", "iface",
  };
  +struct rk_iommu_ops {
+    phys_addr_t (*pt_address)(u32 dte);
+    u32 (*mk_dtentries)(dma_addr_t pt_dma);
+    u32 (*mk_ptentries)(phys_addr_t page, int prot);
+    phys_addr_t (*dte_addr_phys)(phys_addr_t addr);
+    u32 pt_address_mask;
+};
+
  struct rk_iommu {
  struct device *dev;
  void __iomem **bases;
@@ -116,6 +125,7 @@ struct rk_iommudata {
  };
    static struct device *dma_dev;
+static const struct rk_iommu_ops *rk_ops;
    static inline void rk_table_flush(struct rk_iommu_domain *dom, 
dma_addr_t dma,

    unsigned int count)
@@ -215,11 +225,6 @@ static inline u32 rk_mk_dte(dma_addr_t pt_dma)
  #define RK_PTE_PAGE_READABLE  BIT(1)
  #define RK_PTE_PAGE_VALID BIT(0)
  -static inline phys_addr_t rk_pte_page_address(u32 pte)
-{
-    return (phys_addr_t)pte & RK_PTE_PAGE_ADDRESS_MASK;
-}
-
  static inline bool rk_pte_is_page_valid(u32 pte)
  {
  return pte & RK_PTE_PAGE_VALID;
@@ -451,7 +456,7 @@ static int rk_iommu_force_reset(struct rk_iommu 
*iommu)
  rk_iommu_write(iommu->bases[i], RK_MMU_DTE_ADDR, 
DTE_ADDR_DUMMY);

    dte_addr = rk_iommu_read(iommu->bases[i], RK_MMU_DTE_ADDR);
-    if (dte_addr != (DTE_ADDR_DUMMY & RK_DTE_PT_ADDRESS_MASK)) {
+    if (dte_addr != (DTE_ADDR_DUMMY & rk_ops->pt_address_mask)) {


Nit: might it make more sense to do something like:

    dte_addr = rk_ops->pt_address(... DTE_ADDR_DUMMY);
    rk_iommu_write(... dte_addr)
    if (rk_iommu_read(...) != dte_addr)

so that you don't need to bother defining ->pt_address_mask for just 
this one sanity-check?


  dev_err(iommu->dev, "Error during raw reset. 
MMU_DTE_ADDR is not functioning\n");

  return -EFAULT;
  }
@@ -470,6 +475,11 @@ static int rk_iommu_force_reset(struct rk_iommu 
*iommu)

  return 0;
  }
  +static inline phys_addr_t rk_dte_addr_phys(phys_addr_t addr)


The argument type here should be u32, since it's a DTE, not a physical 
address...



+{
+    return addr;
+}
+
  static void log_iova(struct rk_iommu *iommu, int index, dma_addr_t 
iova)

  {
  void __iomem *base = iommu->bases[index];
@@ -489,7 +499,7 @@ static void log_iova(struct rk_iommu *iommu, int 
index, dma_addr_t iova)

  page_offset = rk_iova_page_offset(iova);
    mmu_dte_addr = rk_iommu_read(base, RK_MMU_DTE_ADDR);
-    mmu_dte_addr_phys = (phys_addr_t)mmu_dte_addr;
+    mmu_dte_addr_phys = 
rk_ops->dte_addr_phys((phys_addr_t)mmu_dte_addr);


...and the cast here should not be here, since it *is* the conversion 
that the called function is supposed to be performing.



  dte_addr_phys = mmu_dte_addr_phys + (4 * dte_index);
  dte_addr = phys_to_virt(dte_addr_phys);
@@ -498,14 +508,14 @@ static void log_iova(struct rk_iommu *iommu, 
int index, dma_addr_t iova)

  if (!rk_dte_is_pt_valid(dte))
  goto print_it;
  -    pte_addr_phys = rk_dte_pt_address(dte) + (pte_index * 4);
+    pte_addr_phys = rk_ops->pt_address(dte) + (pte_index * 4);
 

RE: [PATCH v4 0/8] ACPI/IORT: Support for IORT RMR node

2021-05-21 Thread Shameerali Kolothum Thodi


> -Original Message-
> From: Steven Price [mailto:steven.pr...@arm.com]
> Sent: 21 May 2021 13:55
> To: Shameerali Kolothum Thodi ;
> linux-arm-ker...@lists.infradead.org; linux-a...@vger.kernel.org;
> iommu@lists.linux-foundation.org
> Cc: Linuxarm ; lorenzo.pieral...@arm.com;
> j...@8bytes.org; robin.mur...@arm.com; wanghuiqiang
> ; Guohanjun (Hanjun Guo)
> ; sami.muja...@arm.com; j...@solid-run.com;
> eric.au...@redhat.com; yangyicong 
> Subject: Re: [PATCH v4 0/8] ACPI/IORT: Support for IORT RMR node
> 
> On 13/05/2021 14:45, Shameer Kolothum wrote:
> > Hi,
> >
> > v3 -->v4
> > -Included the SMMUv2 SMR bypass install changes suggested by
> >  Steve(patch #7)
> > -As per Robin's comments, RMR reserve implementation is now
> >  more generic  (patch #8) and dropped v3 patches 8 and 10.
> > -Rebase to 5.13-rc1
> >
> > The whole series is available here,
> > https://github.com/hisilicon/kernel-dev/tree/private-v5.13-rc1-rmr-v4-ext
> >
> > RFC v2 --> v3
> >  -Dropped RFC tag as the ACPICA header changes are now ready to be
> >   part of 5.13[0]. But this series still has a dependency on that patch.
> >  -Added IORT E.b related changes(node flags, _DSM function 5 checks for
> >   PCIe).
> >  -Changed RMR to stream id mapping from M:N to M:1 as per the spec and
> >   discussion here[1].
> >  -Last two patches add support for SMMUv2(Thanks to Jon Nettleton!)
> >
> > Sanity tested on a HiSilicon D06. Further testing and feedback is greatly
> > appreciated.
> 
> With the updated SMMUv2 support this works fine on my Juno with EFIFB
> (and corresponding patches to the firmware to expose the ACPI tables).
> Feel free to add
> 
> Tested-by: Steven Price 

Thanks Steve. I am in the process of incorporating the comments from Joerg/Robin
to reuse the struct iommu_resv_region. I will post a v5 soon with that and a
couple of other minor fixes.

Thanks,
Shameer


> Thanks,
> 
> Steve
> 
> > https://github.com/hisilicon/kernel-dev/tree/private-v5.12-rc8-rmr-v3
> >
> > Thanks,
> > Shameer
> >
> > [0]
> https://lore.kernel.org/linux-acpi/20210406213028.718796-22-erik.kaneda@i
> ntel.com/
> > [1]
> https://op-lists.linaro.org/pipermail/linaro-open-discussions/2021-April/00015
> 0.html
> >
> > RFC v1 --> v2:
> >  - Added a generic interface for IOMMU drivers to retrieve all the
> >    RMR info associated with a given IOMMU.
> >  - SMMUv3 driver gets the RMR list during probe() and installs
> >    bypass STEs for all the SIDs in the RMR list. This is to keep
> >    the ongoing traffic alive(if any) during SMMUv3 reset. This is
> >based on the suggestions received for v1 to take care of the
> >EFI framebuffer use case. Only sanity tested for now.
> >  - During the probe/attach device, SMMUv3 driver reserves any
> >    RMR region associated with the device such that there is a unity
> >    mapping for them in SMMU.
> > ---
> >
> > From RFC v1:
> > -
> > The series adds support to IORT RMR nodes specified in IORT
> > Revision E -ARM DEN 0049E[0]. RMR nodes are used to describe memory
> > ranges that are used by endpoints and require a unity mapping
> > in SMMU.
> >
> > We have faced issues with 3408iMR RAID controller cards which
> > fail to boot when SMMU is enabled. This is because these controllers
> > make use of host memory for various caching related purposes and when
> > SMMU is enabled the iMR firmware fails to access these memory regions
> > as there is no mapping for them. IORT RMR provides a way for UEFI to
> > describe and report these memory regions so that the kernel can make
> > a unity mapping for these in SMMU.
> >
> > Tests:
> >
> > With a UEFI, that reports the RMR for the dev,
> > 
> > [16F0h 5872   1] Type : 06
> > [16F1h 5873   2]   Length : 007C
> > [16F3h 5875   1] Revision : 00
> > [1038h 0056   2] Reserved : 
> > [1038h 0056   2]   Identifier : 
> > [16F8h 5880   4]Mapping Count : 0001
> > [16FCh 5884   4]   Mapping Offset : 0040
> >
> > [1700h 5888   4]Number of RMR Descriptors : 0002
> > [1704h 5892   4]RMR Descriptor Offset : 0018
> >
> > [1708h 5896   8]  Base Address of RMR : E640
> > [1710h 5904   8]Length of RMR : 0010
> > [1718h 5912   4] Reserved : 
> >
> > [171Ch 5916   8]  Base Address of RMR : 27B0
> > [1724h 5924   8]Length of RMR : 00C0
> > [172Ch 5932   4] Reserved : 
> >
> > [1730h 5936   4]   Input base : 
> > [1734h 5940   4] ID Count : 0001
> > [1738h 5944   4]  Output Base : 0003
> > [173Ch 5948   4] Output Reference : 0064
> > [1740h 5952   4]Flags (decoded below) : 0001
> >Single Mapping : 1
> > ...
> >
> > 

Re: [PATCH v5 3/4] iommu: rockchip: Add internal ops to handle variants

2021-05-21 Thread Robin Murphy

On 2021-05-21 09:36, Benjamin Gaignard wrote:

Add internal ops to be able to handle the incoming variant v2.
The goal is to keep the overall structure of the framework while
allowing for the evolution of this hardware block.

The ops are global for a SoC because iommu domains are not
attached to a specific device if they are for a virtual device like
drm. Using a global variable shouldn't be a problem, since a SoC
usually doesn't embed different versions of the iommu hardware block.
If that ever happens, a WARN_ON will be displayed at probe time.


IMO it would be a grievous error if such a "virtual device" ever gets 
near the IOMMU API, so personally I wouldn't use that as a justification 
for anything :)


FWIW you should be OK to handle things on a per-instance basis, it just 
means you have to defer some of the domain setup to .attach_dev time, 
like various other drivers do. That said, there's nothing wrong with the 
global if we do expect instances to be consistent across any given 
Rockchip SoC (and my gut feeling is that we probably should).



Signed-off-by: Benjamin Gaignard 
---
version 5:
  - Use of_device_get_match_data()
  - Add internal ops inside the driver

  drivers/iommu/rockchip-iommu.c | 69 --
  1 file changed, 50 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index 7a2932772fdf..e7b9bcf174b1 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -19,6 +19,7 @@
  #include 
  #include 
  #include 
+#include 


This seems to be an unrelated and unnecessary change.


  #include 
  #include 
  #include 
@@ -96,6 +97,14 @@ static const char * const rk_iommu_clocks[] = {
"aclk", "iface",
  };
  
+struct rk_iommu_ops {

+   phys_addr_t (*pt_address)(u32 dte);
+   u32 (*mk_dtentries)(dma_addr_t pt_dma);
+   u32 (*mk_ptentries)(phys_addr_t page, int prot);
+   phys_addr_t (*dte_addr_phys)(phys_addr_t addr);
+   u32 pt_address_mask;
+};
+
  struct rk_iommu {
struct device *dev;
void __iomem **bases;
@@ -116,6 +125,7 @@ struct rk_iommudata {
  };
  
  static struct device *dma_dev;

+static const struct rk_iommu_ops *rk_ops;
  
  static inline void rk_table_flush(struct rk_iommu_domain *dom, dma_addr_t dma,

  unsigned int count)
@@ -215,11 +225,6 @@ static inline u32 rk_mk_dte(dma_addr_t pt_dma)
  #define RK_PTE_PAGE_READABLE  BIT(1)
  #define RK_PTE_PAGE_VALID BIT(0)
  
-static inline phys_addr_t rk_pte_page_address(u32 pte)

-{
-   return (phys_addr_t)pte & RK_PTE_PAGE_ADDRESS_MASK;
-}
-
  static inline bool rk_pte_is_page_valid(u32 pte)
  {
return pte & RK_PTE_PAGE_VALID;
@@ -451,7 +456,7 @@ static int rk_iommu_force_reset(struct rk_iommu *iommu)
rk_iommu_write(iommu->bases[i], RK_MMU_DTE_ADDR, 
DTE_ADDR_DUMMY);
  
  		dte_addr = rk_iommu_read(iommu->bases[i], RK_MMU_DTE_ADDR);

-   if (dte_addr != (DTE_ADDR_DUMMY & RK_DTE_PT_ADDRESS_MASK)) {
+   if (dte_addr != (DTE_ADDR_DUMMY & rk_ops->pt_address_mask)) {


Nit: might it make more sense to do something like:

dte_addr = rk_ops->pt_address(... DTE_ADDR_DUMMY);
rk_iommu_write(... dte_addr)
if (rk_iommu_read(...) != dte_addr)

so that you don't need to bother defining ->pt_address_mask for just 
this one sanity-check?



dev_err(iommu->dev, "Error during raw reset. MMU_DTE_ADDR is 
not functioning\n");
return -EFAULT;
}
@@ -470,6 +475,11 @@ static int rk_iommu_force_reset(struct rk_iommu *iommu)
return 0;
  }
  
+static inline phys_addr_t rk_dte_addr_phys(phys_addr_t addr)


The argument type here should be u32, since it's a DTE, not a physical 
address...



+{
+   return addr;
+}
+
  static void log_iova(struct rk_iommu *iommu, int index, dma_addr_t iova)
  {
void __iomem *base = iommu->bases[index];
@@ -489,7 +499,7 @@ static void log_iova(struct rk_iommu *iommu, int index, 
dma_addr_t iova)
page_offset = rk_iova_page_offset(iova);
  
  	mmu_dte_addr = rk_iommu_read(base, RK_MMU_DTE_ADDR);

-   mmu_dte_addr_phys = (phys_addr_t)mmu_dte_addr;
+   mmu_dte_addr_phys = rk_ops->dte_addr_phys((phys_addr_t)mmu_dte_addr);


...and the cast here should not be here, since it *is* the conversion 
that the called function is supposed to be performing.



dte_addr_phys = mmu_dte_addr_phys + (4 * dte_index);
dte_addr = phys_to_virt(dte_addr_phys);
@@ -498,14 +508,14 @@ static void log_iova(struct rk_iommu *iommu, int index, 
dma_addr_t iova)
if (!rk_dte_is_pt_valid(dte))
goto print_it;
  
-	pte_addr_phys = rk_dte_pt_address(dte) + (pte_index * 4);

+   pte_addr_phys = rk_ops->pt_address(dte) + (pte_index * 4);
pte_addr = phys_to_virt(pte_addr_phys);
pte = *pte_addr;
  
  	if (!rk_pte_is_page_valid(pte))

Re: [PATCH v4 0/8] ACPI/IORT: Support for IORT RMR node

2021-05-21 Thread Steven Price
On 13/05/2021 14:45, Shameer Kolothum wrote:
> Hi,
> 
> v3 -->v4
> -Included the SMMUv2 SMR bypass install changes suggested by
>  Steve(patch #7)
> -As per Robin's comments, RMR reserve implementation is now
>  more generic  (patch #8) and dropped v3 patches 8 and 10.
> -Rebase to 5.13-rc1 
> 
> The whole series is available here,
> https://github.com/hisilicon/kernel-dev/tree/private-v5.13-rc1-rmr-v4-ext
> 
> RFC v2 --> v3
>  -Dropped RFC tag as the ACPICA header changes are now ready to be
>   part of 5.13[0]. But this series still has a dependency on that patch.
>  -Added IORT E.b related changes(node flags, _DSM function 5 checks for
>   PCIe).
>  -Changed RMR to stream id mapping from M:N to M:1 as per the spec and
>   discussion here[1].
>  -Last two patches add support for SMMUv2(Thanks to Jon Nettleton!) 
> 
> Sanity tested on a HiSilicon D06. Further testing and feedback is greatly
> appreciated.

With the updated SMMUv2 support this works fine on my Juno with EFIFB
(and corresponding patches to the firmware to expose the ACPI tables).
Feel free to add

Tested-by: Steven Price 

Thanks,

Steve

> https://github.com/hisilicon/kernel-dev/tree/private-v5.12-rc8-rmr-v3
> 
> Thanks,
> Shameer
> 
> [0] 
> https://lore.kernel.org/linux-acpi/20210406213028.718796-22-erik.kan...@intel.com/
> [1] 
> https://op-lists.linaro.org/pipermail/linaro-open-discussions/2021-April/000150.html
> 
> RFC v1 --> v2:
>  - Added a generic interface for IOMMU drivers to retrieve all the 
>    RMR info associated with a given IOMMU.
>  - SMMUv3 driver gets the RMR list during probe() and installs
>    bypass STEs for all the SIDs in the RMR list. This is to keep
>    the ongoing traffic alive(if any) during SMMUv3 reset. This is
>based on the suggestions received for v1 to take care of the
>EFI framebuffer use case. Only sanity tested for now.
>  - During the probe/attach device, SMMUv3 driver reserves any
>    RMR region associated with the device such that there is a unity
>    mapping for them in SMMU.
> ---    
> 
> From RFC v1:
> -
> The series adds support to IORT RMR nodes specified in IORT
> Revision E -ARM DEN 0049E[0]. RMR nodes are used to describe memory
> ranges that are used by endpoints and require a unity mapping
> in SMMU.
> 
> We have faced issues with 3408iMR RAID controller cards which
> fail to boot when SMMU is enabled. This is because these controllers
> make use of host memory for various caching related purposes and when
> SMMU is enabled the iMR firmware fails to access these memory regions
> as there is no mapping for them. IORT RMR provides a way for UEFI to
> describe and report these memory regions so that the kernel can make
> a unity mapping for these in SMMU.
> 
> Tests:
> 
> With a UEFI, that reports the RMR for the dev,
> 
> [16F0h 5872   1] Type : 06
> [16F1h 5873   2]   Length : 007C
> [16F3h 5875   1] Revision : 00
> [1038h 0056   2] Reserved : 
> [1038h 0056   2]   Identifier : 
> [16F8h 5880   4]Mapping Count : 0001
> [16FCh 5884   4]   Mapping Offset : 0040
> 
> [1700h 5888   4]Number of RMR Descriptors : 0002
> [1704h 5892   4]RMR Descriptor Offset : 0018
> 
> [1708h 5896   8]  Base Address of RMR : E640
> [1710h 5904   8]Length of RMR : 0010
> [1718h 5912   4] Reserved : 
> 
> [171Ch 5916   8]  Base Address of RMR : 27B0
> [1724h 5924   8]Length of RMR : 00C0
> [172Ch 5932   4] Reserved : 
> 
> [1730h 5936   4]   Input base : 
> [1734h 5940   4] ID Count : 0001
> [1738h 5944   4]  Output Base : 0003
> [173Ch 5948   4] Output Reference : 0064
> [1740h 5952   4]Flags (decoded below) : 0001
>Single Mapping : 1
> ...
> 
> Without the series the RAID controller initialization fails as
> below,
> 
> ...
> [   12.631117] megaraid_sas :03:00.0: FW supports sync cache: Yes 
>   
> [   12.637360] megaraid_sas :03:00.0: megasas_disable_intr_fusion is 
> called outbound_intr_mask:0x4009  
>  
> [   18.776377] megaraid_sas :03:00.0: Init cmd return status FAILED for 
> SCSI host 0   
>   
> [   23.019383] megaraid_sas :03:00.0: Waiting for FW to come to ready 
> state 
> [  106.684281] megaraid_sas :03:00.0: FW in FAULT state, Fault 
> code:0x1 subcode:0x0 func:megasas_transition_to_ready 
>
> [  106.695186] megaraid_sas :03:00.0: System Register set:
>   
> [  106.889787] megaraid_sas :03:00.0: Failed to transition controller 

Re: [PATCH] dma-iommu: Add a check to avoid dereference null pointer in function iommu_dma_map_sg()

2021-05-21 Thread chenxiang (M)



On 2021/5/21 18:36, Robin Murphy wrote:

On 2021-05-21 04:05, chenxiang wrote:

From: Xiang Chen 

The issue is reported by the tool TscanCode, and it is possible to dereference
a null pointer when prev is NULL, which is its initial value.


No it isn't. This is literally explained in the comment visible in the 
diff context below...


Robin.


ok, thanks




Signed-off-by: Xiang Chen 
---
  drivers/iommu/dma-iommu.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 4cb63b2..88a4f34 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1042,7 +1042,7 @@ static int iommu_dma_map_sg(struct device *dev, 
struct scatterlist *sg,

   *   iova_len == 0, thus we cannot dereference prev the first
   *   time through here (i.e. before it has a meaningful 
value).

   */
-if (pad_len && pad_len < s_length - 1) {
+if (prev && pad_len && pad_len < s_length - 1) {
  prev->length += pad_len;
  iova_len += pad_len;
  }



.





Re: [PATCH v5 2/4] dt-bindings: iommu: rockchip: Add compatible for v2

2021-05-21 Thread Robin Murphy

On 2021-05-21 09:36, Benjamin Gaignard wrote:

Add compatible for the second version of the IOMMU hardware block.
The RK356x IOMMU can also be linked to a power domain.

Signed-off-by: Benjamin Gaignard 
Reviewed-by: Rob Herring 
---
  .../devicetree/bindings/iommu/rockchip,iommu.yaml  | 7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml 
b/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml
index 099fc2578b54..d2e28a9e3545 100644
--- a/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml
+++ b/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml
@@ -19,7 +19,9 @@ description: |+
  
  properties:

compatible:
-const: rockchip,iommu
+enum:
+  - rockchip,iommu
+  - rockchip,rk3568-iommu
  
reg:

  items:
@@ -48,6 +50,9 @@ properties:
"#iommu-cells":
  const: 0
  
+  power-domains:

+maxItems: 1
+


Nit: power domains are already present on various IOMMU nodes since 
RK3288 - it feels like strictly this should be in patch #1 to fix the 
existing binding as part of the conversion, but on the other hand I 
can't really imagine anyone caring *that* much about dtscheck bisecting 
cleanly :)


Robin.


rockchip,disable-mmu-reset:
  $ref: /schemas/types.yaml#/definitions/flag
  description: |




Re: [PATCH] dma-iommu: Add a check to avoid dereference null pointer in function iommu_dma_map_sg()

2021-05-21 Thread Robin Murphy

On 2021-05-21 04:05, chenxiang wrote:

From: Xiang Chen 

The issue is reported by the tool TscanCode, and it is possible to dereference
a null pointer when prev is NULL, which is its initial value.


No it isn't. This is literally explained in the comment visible in the 
diff context below...


Robin.


Signed-off-by: Xiang Chen 
---
  drivers/iommu/dma-iommu.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 4cb63b2..88a4f34 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1042,7 +1042,7 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
 *   iova_len == 0, thus we cannot dereference prev the first
 *   time through here (i.e. before it has a meaningful value).
 */
-   if (pad_len && pad_len < s_length - 1) {
+   if (prev && pad_len && pad_len < s_length - 1) {
prev->length += pad_len;
iova_len += pad_len;
}




Re: [PATCH v5 1/4] dt-bindings: iommu: rockchip: Convert IOMMU to DT schema

2021-05-21 Thread Heiko Stübner
On Friday, 21 May 2021 at 10:36:34 CEST, Benjamin Gaignard wrote:
> Convert Rockchip IOMMU to DT schema
> 
> Signed-off-by: Benjamin Gaignard 
> Reviewed-by: Rob Herring 

Reviewed-by: Heiko Stuebner 





Re: [PATCH v5 2/4] dt-bindings: iommu: rockchip: Add compatible for v2

2021-05-21 Thread Heiko Stübner
On Friday, 21 May 2021 at 10:36:35 CEST, Benjamin Gaignard wrote:
> Add compatible for the second version of IOMMU hardware block.
> RK356x IOMMU can also be linked to a power domain.
> 
> Signed-off-by: Benjamin Gaignard 
> Reviewed-by: Rob Herring 

Reviewed-by: Heiko Stuebner 

> ---
>  .../devicetree/bindings/iommu/rockchip,iommu.yaml  | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml 
> b/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml
> index 099fc2578b54..d2e28a9e3545 100644
> --- a/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml
> +++ b/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml
> @@ -19,7 +19,9 @@ description: |+
>  
>  properties:
>compatible:
> -const: rockchip,iommu
> +enum:
> +  - rockchip,iommu
> +  - rockchip,rk3568-iommu
>  
>reg:
>  items:
> @@ -48,6 +50,9 @@ properties:
>"#iommu-cells":
>  const: 0
>  
> +  power-domains:
> +maxItems: 1
> +
>rockchip,disable-mmu-reset:
>  $ref: /schemas/types.yaml#/definitions/flag
>  description: |
> 






Re: [PATCH v5 4/4] iommu: rockchip: Add support for iommu v2

2021-05-21 Thread Heiko Stübner
On Friday, 21 May 2021 at 10:36:37 CEST, Benjamin Gaignard wrote:
> This second version of the hardware block has a different bit
> mapping for page table entries.
> Add the ops matching this new mapping.
> Define a new compatible to distinguish it from the first version.
> 
> Signed-off-by: Benjamin Gaignard 

Reviewed-by: Heiko Stuebner 

> ---
> version 5:
>  - Use internal ops to support v2 hardware block
>  - Use GENMASK macro.
>  - Keep rk_dte_pt_address() and rk_dte_pt_address_v2() separated
>because I believe that is more readable like this.
>  - Do not duplicate code.
> 
>  drivers/iommu/rockchip-iommu.c | 78 ++
>  1 file changed, 78 insertions(+)
> 
> diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
> index e7b9bcf174b1..23253a2f269e 100644
> --- a/drivers/iommu/rockchip-iommu.c
> +++ b/drivers/iommu/rockchip-iommu.c
> @@ -189,6 +189,33 @@ static inline phys_addr_t rk_dte_pt_address(u32 dte)
>   return (phys_addr_t)dte & RK_DTE_PT_ADDRESS_MASK;
>  }
>  
> +/*
> + * In v2:
> + * 31:12 - PT address bit 31:0
> + * 11: 8 - PT address bit 35:32
> + *  7: 4 - PT address bit 39:36
> + *  3: 1 - Reserved
> + * 0 - 1 if PT @ PT address is valid
> + */
> +#define RK_DTE_PT_ADDRESS_MASK_V2 GENMASK_ULL(31, 4)
> +#define DTE_HI_MASK1 GENMASK(11, 8)
> +#define DTE_HI_MASK2 GENMASK(7, 4)
> +#define DTE_HI_SHIFT1 24 /* shift bit 8 to bit 32 */
> +#define DTE_HI_SHIFT2 32 /* shift bit 4 to bit 36 */
> +#define PAGE_DESC_HI_MASK1   GENMASK_ULL(35, 32)
> +#define PAGE_DESC_HI_MASK2   GENMASK_ULL(39, 36)
> +
> +static inline phys_addr_t rk_dte_pt_address_v2(u32 dte)
> +{
> + u64 dte_v2 = dte;
> +
> + dte_v2 = ((dte_v2 & DTE_HI_MASK2) << DTE_HI_SHIFT2) |
> +  ((dte_v2 & DTE_HI_MASK1) << DTE_HI_SHIFT1) |
> +  (dte_v2 & RK_DTE_PT_ADDRESS_MASK);
> +
> + return (phys_addr_t)dte_v2;
> +}
> +
>  static inline bool rk_dte_is_pt_valid(u32 dte)
>  {
>   return dte & RK_DTE_PT_VALID;
> @@ -199,6 +226,15 @@ static inline u32 rk_mk_dte(dma_addr_t pt_dma)
>   return (pt_dma & RK_DTE_PT_ADDRESS_MASK) | RK_DTE_PT_VALID;
>  }
>  
> +static inline u32 rk_mk_dte_v2(dma_addr_t pt_dma)
> +{
> + pt_dma = (pt_dma & RK_DTE_PT_ADDRESS_MASK) |
> +  ((pt_dma & PAGE_DESC_HI_MASK1) >> DTE_HI_SHIFT1) |
> +  (pt_dma & PAGE_DESC_HI_MASK2) >> DTE_HI_SHIFT2;
> +
> + return (pt_dma & RK_DTE_PT_ADDRESS_MASK_V2) | RK_DTE_PT_VALID;
> +}
> +
>  /*
>   * Each PTE has a Page address, some flags and a valid bit:
>   * +-+---+---+-+
> @@ -240,6 +276,29 @@ static u32 rk_mk_pte(phys_addr_t page, int prot)
>   return page | flags | RK_PTE_PAGE_VALID;
>  }
>  
> +/*
> + * In v2:
> + * 31:12 - Page address bit 31:0
> + *  11:9 - Page address bit 34:32
> + *   8:4 - Page address bit 39:35
> + * 3 - Security
> + * 2 - Readable
> + * 1 - Writable
> + * 0 - 1 if Page @ Page address is valid
> + */
> +#define RK_PTE_PAGE_READABLE_V2  BIT(2)
> +#define RK_PTE_PAGE_WRITABLE_V2  BIT(1)
> +
> +static u32 rk_mk_pte_v2(phys_addr_t page, int prot)
> +{
> + u32 flags = 0;
> +
> + flags |= (prot & IOMMU_READ) ? RK_PTE_PAGE_READABLE_V2 : 0;
> + flags |= (prot & IOMMU_WRITE) ? RK_PTE_PAGE_WRITABLE_V2 : 0;
> +
> + return rk_mk_dte_v2(page) | flags;
> +}
> +
>  static u32 rk_mk_pte_invalid(u32 pte)
>  {
>   return pte & ~RK_PTE_PAGE_VALID;
> @@ -480,6 +539,14 @@ static inline phys_addr_t rk_dte_addr_phys(phys_addr_t 
> addr)
>   return addr;
>  }
>  
> +#define DT_HI_MASK GENMASK_ULL(39, 32)
> +#define DT_SHIFT   28
> +
> +static inline phys_addr_t rk_dte_addr_phys_v2(phys_addr_t addr)
> +{
> + return (addr & RK_DTE_PT_ADDRESS_MASK) | ((addr & DT_HI_MASK) << 
> DT_SHIFT);
> +}
> +
>  static void log_iova(struct rk_iommu *iommu, int index, dma_addr_t iova)
>  {
>   void __iomem *base = iommu->bases[index];
> @@ -1305,10 +1372,21 @@ static struct rk_iommu_ops iommu_data_ops_v1 = {
>   .pt_address_mask = RK_DTE_PT_ADDRESS_MASK,
>  };
>  
> +static struct rk_iommu_ops iommu_data_ops_v2 = {
> + .pt_address = &rk_dte_pt_address_v2,
> + .mk_dtentries = &rk_mk_dte_v2,
> + .mk_ptentries = &rk_mk_pte_v2,
> + .dte_addr_phys = &rk_dte_addr_phys_v2,
> + .pt_address_mask = RK_DTE_PT_ADDRESS_MASK_V2,
> +};
> +
>  static const struct of_device_id rk_iommu_dt_ids[] = {
>   {   .compatible = "rockchip,iommu",
>   .data = &iommu_data_ops_v1,
>   },
> + {   .compatible = "rockchip,rk3568-iommu",
> + .data = &iommu_data_ops_v2,
> + },
>   { /* sentinel */ }
>  };
>  MODULE_DEVICE_TABLE(of, rk_iommu_dt_ids);
> 






Re: [PATCH v5 3/4] iommu: rockchip: Add internal ops to handle variants

2021-05-21 Thread Heiko Stübner
On Friday, 21 May 2021 at 10:36:36 CEST, Benjamin Gaignard wrote:
> Add internal ops to be able to handle the incoming variant v2.
> The goal is to keep the overall structure of the framework while
> allowing for the evolutions of this hardware block.
> 
> The ops are global for a SoC because iommu domains are not
> attached to a specific device if they are for a virtual device like
> drm. Using a global variable shouldn't be a problem since a SoC
> usually doesn't embed different versions of the iommu hardware block.
> If that happens one day, a WARN_ON will be displayed at probe time.
> 
> Signed-off-by: Benjamin Gaignard 

Reviewed-by: Heiko Stuebner 





[PATCH v5 3/4] iommu: rockchip: Add internal ops to handle variants

2021-05-21 Thread Benjamin Gaignard
Add internal ops to be able to handle the incoming variant v2.
The goal is to keep the overall structure of the framework while
allowing for the evolutions of this hardware block.

The ops are global for a SoC because iommu domains are not
attached to a specific device if they are for a virtual device like
drm. Using a global variable shouldn't be a problem since a SoC
usually doesn't embed different versions of the iommu hardware block.
If that happens one day, a WARN_ON will be displayed at probe time.

Signed-off-by: Benjamin Gaignard 
---
version 5:
 - Use of_device_get_match_data()
 - Add internal ops inside the driver

 drivers/iommu/rockchip-iommu.c | 69 --
 1 file changed, 50 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index 7a2932772fdf..e7b9bcf174b1 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -96,6 +97,14 @@ static const char * const rk_iommu_clocks[] = {
"aclk", "iface",
 };
 
+struct rk_iommu_ops {
+   phys_addr_t (*pt_address)(u32 dte);
+   u32 (*mk_dtentries)(dma_addr_t pt_dma);
+   u32 (*mk_ptentries)(phys_addr_t page, int prot);
+   phys_addr_t (*dte_addr_phys)(phys_addr_t addr);
+   u32 pt_address_mask;
+};
+
 struct rk_iommu {
struct device *dev;
void __iomem **bases;
@@ -116,6 +125,7 @@ struct rk_iommudata {
 };
 
 static struct device *dma_dev;
+static const struct rk_iommu_ops *rk_ops;
 
 static inline void rk_table_flush(struct rk_iommu_domain *dom, dma_addr_t dma,
  unsigned int count)
@@ -215,11 +225,6 @@ static inline u32 rk_mk_dte(dma_addr_t pt_dma)
 #define RK_PTE_PAGE_READABLE  BIT(1)
 #define RK_PTE_PAGE_VALID BIT(0)
 
-static inline phys_addr_t rk_pte_page_address(u32 pte)
-{
-   return (phys_addr_t)pte & RK_PTE_PAGE_ADDRESS_MASK;
-}
-
 static inline bool rk_pte_is_page_valid(u32 pte)
 {
return pte & RK_PTE_PAGE_VALID;
@@ -451,7 +456,7 @@ static int rk_iommu_force_reset(struct rk_iommu *iommu)
rk_iommu_write(iommu->bases[i], RK_MMU_DTE_ADDR, 
DTE_ADDR_DUMMY);
 
dte_addr = rk_iommu_read(iommu->bases[i], RK_MMU_DTE_ADDR);
-   if (dte_addr != (DTE_ADDR_DUMMY & RK_DTE_PT_ADDRESS_MASK)) {
+   if (dte_addr != (DTE_ADDR_DUMMY & rk_ops->pt_address_mask)) {
dev_err(iommu->dev, "Error during raw reset. 
MMU_DTE_ADDR is not functioning\n");
return -EFAULT;
}
@@ -470,6 +475,11 @@ static int rk_iommu_force_reset(struct rk_iommu *iommu)
return 0;
 }
 
+static inline phys_addr_t rk_dte_addr_phys(phys_addr_t addr)
+{
+   return addr;
+}
+
 static void log_iova(struct rk_iommu *iommu, int index, dma_addr_t iova)
 {
void __iomem *base = iommu->bases[index];
@@ -489,7 +499,7 @@ static void log_iova(struct rk_iommu *iommu, int index, 
dma_addr_t iova)
page_offset = rk_iova_page_offset(iova);
 
mmu_dte_addr = rk_iommu_read(base, RK_MMU_DTE_ADDR);
-   mmu_dte_addr_phys = (phys_addr_t)mmu_dte_addr;
+   mmu_dte_addr_phys = rk_ops->dte_addr_phys((phys_addr_t)mmu_dte_addr);
 
dte_addr_phys = mmu_dte_addr_phys + (4 * dte_index);
dte_addr = phys_to_virt(dte_addr_phys);
@@ -498,14 +508,14 @@ static void log_iova(struct rk_iommu *iommu, int index, 
dma_addr_t iova)
if (!rk_dte_is_pt_valid(dte))
goto print_it;
 
-   pte_addr_phys = rk_dte_pt_address(dte) + (pte_index * 4);
+   pte_addr_phys = rk_ops->pt_address(dte) + (pte_index * 4);
pte_addr = phys_to_virt(pte_addr_phys);
pte = *pte_addr;
 
if (!rk_pte_is_page_valid(pte))
goto print_it;
 
-   page_addr_phys = rk_pte_page_address(pte) + page_offset;
+   page_addr_phys = rk_ops->pt_address(pte) + page_offset;
page_flags = pte & RK_PTE_PAGE_FLAGS_MASK;
 
 print_it:
@@ -601,13 +611,13 @@ static phys_addr_t rk_iommu_iova_to_phys(struct 
iommu_domain *domain,
if (!rk_dte_is_pt_valid(dte))
goto out;
 
-   pt_phys = rk_dte_pt_address(dte);
+   pt_phys = rk_ops->pt_address(dte);
page_table = (u32 *)phys_to_virt(pt_phys);
pte = page_table[rk_iova_pte_index(iova)];
if (!rk_pte_is_page_valid(pte))
goto out;
 
-   phys = rk_pte_page_address(pte) + rk_iova_page_offset(iova);
+   phys = rk_ops->pt_address(pte) + rk_iova_page_offset(iova);
 out:
spin_unlock_irqrestore(_domain->dt_lock, flags);
 
@@ -679,14 +689,14 @@ static u32 *rk_dte_get_page_table(struct rk_iommu_domain 
*rk_domain,
return ERR_PTR(-ENOMEM);
}
 
-   dte = rk_mk_dte(pt_dma);
+   dte = rk_ops->mk_dtentries(pt_dma);
*dte_addr = dte;
 
rk_table_flush(rk_domain, pt_dma, NUM_PT_ENTRIES);
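
The shape of the change above — a per-variant const ops table selected once
at probe time from the match data — can be sketched in isolation. All names
below are illustrative, not the driver's actual symbols:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Minimal sketch of the variant-ops pattern: each compatible string
 * carries a const ops table, and probe selects it exactly once (the
 * real driver uses of_device_get_match_data()). Names are made up.
 */
struct demo_iommu_ops {
	uint64_t (*pt_address)(uint32_t dte);
	uint32_t pt_address_mask;
};

static uint64_t demo_pt_address_v1(uint32_t dte)
{
	return dte & 0xfffff000u;	/* v1: PT address is bits 31:12 */
}

static const struct demo_iommu_ops demo_ops_v1 = {
	.pt_address = demo_pt_address_v1,
	.pt_address_mask = 0xfffff000u,
};

/* Stand-in for of_device_get_match_data() returning the v1 table. */
static const struct demo_iommu_ops *demo_probe(void)
{
	return &demo_ops_v1;
}
```

Every call site then goes through the single selected table, which is what
lets a second variant be added without touching the shared framework code.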

[PATCH v5 4/4] iommu: rockchip: Add support for iommu v2

2021-05-21 Thread Benjamin Gaignard
This second version of the hardware block has a different bit
mapping for page table entries.
Add the ops matching this new mapping.
Define a new compatible to distinguish it from the first version.

Signed-off-by: Benjamin Gaignard 
---
version 5:
 - Use internal ops to support v2 hardware block
 - Use GENMASK macro.
 - Keep rk_dte_pt_address() and rk_dte_pt_address_v2() separated
   because I believe that is more readable like this.
 - Do not duplicate code.

 drivers/iommu/rockchip-iommu.c | 78 ++
 1 file changed, 78 insertions(+)

diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index e7b9bcf174b1..23253a2f269e 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -189,6 +189,33 @@ static inline phys_addr_t rk_dte_pt_address(u32 dte)
return (phys_addr_t)dte & RK_DTE_PT_ADDRESS_MASK;
 }
 
+/*
+ * In v2:
+ * 31:12 - PT address bit 31:0
+ * 11: 8 - PT address bit 35:32
+ *  7: 4 - PT address bit 39:36
+ *  3: 1 - Reserved
+ * 0 - 1 if PT @ PT address is valid
+ */
+#define RK_DTE_PT_ADDRESS_MASK_V2 GENMASK_ULL(31, 4)
+#define DTE_HI_MASK1   GENMASK(11, 8)
+#define DTE_HI_MASK2   GENMASK(7, 4)
+#define DTE_HI_SHIFT1  24 /* shift bit 8 to bit 32 */
+#define DTE_HI_SHIFT2  32 /* shift bit 4 to bit 36 */
+#define PAGE_DESC_HI_MASK1 GENMASK_ULL(35, 32)
+#define PAGE_DESC_HI_MASK2 GENMASK_ULL(39, 36)
+
+static inline phys_addr_t rk_dte_pt_address_v2(u32 dte)
+{
+   u64 dte_v2 = dte;
+
+   dte_v2 = ((dte_v2 & DTE_HI_MASK2) << DTE_HI_SHIFT2) |
+((dte_v2 & DTE_HI_MASK1) << DTE_HI_SHIFT1) |
+(dte_v2 & RK_DTE_PT_ADDRESS_MASK);
+
+   return (phys_addr_t)dte_v2;
+}
+
 static inline bool rk_dte_is_pt_valid(u32 dte)
 {
return dte & RK_DTE_PT_VALID;
@@ -199,6 +226,15 @@ static inline u32 rk_mk_dte(dma_addr_t pt_dma)
return (pt_dma & RK_DTE_PT_ADDRESS_MASK) | RK_DTE_PT_VALID;
 }
 
+static inline u32 rk_mk_dte_v2(dma_addr_t pt_dma)
+{
+   pt_dma = (pt_dma & RK_DTE_PT_ADDRESS_MASK) |
+((pt_dma & PAGE_DESC_HI_MASK1) >> DTE_HI_SHIFT1) |
+(pt_dma & PAGE_DESC_HI_MASK2) >> DTE_HI_SHIFT2;
+
+   return (pt_dma & RK_DTE_PT_ADDRESS_MASK_V2) | RK_DTE_PT_VALID;
+}
+
 /*
  * Each PTE has a Page address, some flags and a valid bit:
  * +-+---+---+-+
@@ -240,6 +276,29 @@ static u32 rk_mk_pte(phys_addr_t page, int prot)
return page | flags | RK_PTE_PAGE_VALID;
 }
 
+/*
+ * In v2:
+ * 31:12 - Page address bit 31:0
+ *  11:9 - Page address bit 34:32
+ *   8:4 - Page address bit 39:35
+ * 3 - Security
+ * 2 - Readable
+ * 1 - Writable
+ * 0 - 1 if Page @ Page address is valid
+ */
+#define RK_PTE_PAGE_READABLE_V2  BIT(2)
+#define RK_PTE_PAGE_WRITABLE_V2  BIT(1)
+
+static u32 rk_mk_pte_v2(phys_addr_t page, int prot)
+{
+   u32 flags = 0;
+
+   flags |= (prot & IOMMU_READ) ? RK_PTE_PAGE_READABLE_V2 : 0;
+   flags |= (prot & IOMMU_WRITE) ? RK_PTE_PAGE_WRITABLE_V2 : 0;
+
+   return rk_mk_dte_v2(page) | flags;
+}
+
 static u32 rk_mk_pte_invalid(u32 pte)
 {
return pte & ~RK_PTE_PAGE_VALID;
@@ -480,6 +539,14 @@ static inline phys_addr_t rk_dte_addr_phys(phys_addr_t 
addr)
return addr;
 }
 
+#define DT_HI_MASK GENMASK_ULL(39, 32)
+#define DT_SHIFT   28
+
+static inline phys_addr_t rk_dte_addr_phys_v2(phys_addr_t addr)
+{
+   return (addr & RK_DTE_PT_ADDRESS_MASK) | ((addr & DT_HI_MASK) << 
DT_SHIFT);
+}
+
 static void log_iova(struct rk_iommu *iommu, int index, dma_addr_t iova)
 {
void __iomem *base = iommu->bases[index];
@@ -1305,10 +1372,21 @@ static struct rk_iommu_ops iommu_data_ops_v1 = {
.pt_address_mask = RK_DTE_PT_ADDRESS_MASK,
 };
 
+static struct rk_iommu_ops iommu_data_ops_v2 = {
+   .pt_address = &rk_dte_pt_address_v2,
+   .mk_dtentries = &rk_mk_dte_v2,
+   .mk_ptentries = &rk_mk_pte_v2,
+   .dte_addr_phys = &rk_dte_addr_phys_v2,
+   .pt_address_mask = RK_DTE_PT_ADDRESS_MASK_V2,
+};
+
 static const struct of_device_id rk_iommu_dt_ids[] = {
	{   .compatible = "rockchip,iommu",
	.data = &iommu_data_ops_v1,
	},
+   {   .compatible = "rockchip,rk3568-iommu",
+   .data = &iommu_data_ops_v2,
+   },
{ /* sentinel */ }
 };
 MODULE_DEVICE_TABLE(of, rk_iommu_dt_ids);
-- 
2.25.1
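
For reference, the v2 descriptor packing described in this patch can be
exercised outside the driver. The sketch below mirrors the documented layout
(address bits 31:12 stay in place, bits 35:32 fold into descriptor bits 11:8,
bits 39:36 into bits 7:4); the mask names and the sample address are made up
for illustration:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Standalone sketch of the v2 DTE packing: a 40-bit, 4 KiB-aligned
 * page-table address is folded into a 32-bit descriptor and unfolded
 * again, following the layout documented in the patch.
 */
#define DEMO_PT_ADDRESS_MASK	0xfffff000u	/* bits 31:12 */
#define DEMO_PT_ADDRESS_MASK_V2	0xfffffff0u	/* bits 31:4 */
#define DEMO_PT_VALID		0x1u

static uint32_t demo_mk_dte_v2(uint64_t pt_dma)
{
	pt_dma = (pt_dma & DEMO_PT_ADDRESS_MASK) |
		 ((pt_dma & 0xf00000000ull) >> 24) |	/* 35:32 -> 11:8 */
		 ((pt_dma & 0xf000000000ull) >> 32);	/* 39:36 -> 7:4 */

	return (uint32_t)(pt_dma & DEMO_PT_ADDRESS_MASK_V2) | DEMO_PT_VALID;
}

static uint64_t demo_dte_pt_address_v2(uint32_t dte)
{
	uint64_t d = dte;

	return ((d & 0x0f0u) << 32) |		/* 7:4  -> 39:36 */
	       ((d & 0xf00u) << 24) |		/* 11:8 -> 35:32 */
	       (d & DEMO_PT_ADDRESS_MASK);
}
```

A round trip of e.g. 0xab12345000 reproduces the input exactly, which makes
the encode/decode pairing of masks and shifts easy to sanity-check.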



[PATCH v5 1/4] dt-bindings: iommu: rockchip: Convert IOMMU to DT schema

2021-05-21 Thread Benjamin Gaignard
Convert Rockchip IOMMU to DT schema

Signed-off-by: Benjamin Gaignard 
Reviewed-by: Rob Herring 
---
 .../bindings/iommu/rockchip,iommu.txt | 38 -
 .../bindings/iommu/rockchip,iommu.yaml| 80 +++
 2 files changed, 80 insertions(+), 38 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/iommu/rockchip,iommu.txt
 create mode 100644 Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml

diff --git a/Documentation/devicetree/bindings/iommu/rockchip,iommu.txt 
b/Documentation/devicetree/bindings/iommu/rockchip,iommu.txt
deleted file mode 100644
index 6ecefea1c6f9..
--- a/Documentation/devicetree/bindings/iommu/rockchip,iommu.txt
+++ /dev/null
@@ -1,38 +0,0 @@
-Rockchip IOMMU
-==
-
-A Rockchip DRM iommu translates io virtual addresses to physical addresses for
-its master device.  Each slave device is bound to a single master device, and
-shares its clocks, power domain and irq.
-
-Required properties:
-- compatible  : Should be "rockchip,iommu"
-- reg : Address space for the configuration registers
-- interrupts  : Interrupt specifier for the IOMMU instance
-- interrupt-names : Interrupt name for the IOMMU instance
-- #iommu-cells: Should be <0>.  This indicates the iommu is a
-"single-master" device, and needs no additional information
-to associate with its master device.  See:
-Documentation/devicetree/bindings/iommu/iommu.txt
-- clocks  : A list of clocks required for the IOMMU to be accessible by
-the host CPU.
-- clock-names : Should contain the following:
-   "iface" - Main peripheral bus clock (PCLK/HCL) (required)
-   "aclk"  - AXI bus clock (required)
-
-Optional properties:
-- rockchip,disable-mmu-reset : Don't use the mmu reset operation.
-  Some mmu instances may produce unexpected results
-  when the reset operation is used.
-
-Example:
-
-   vopl_mmu: iommu@ff940300 {
-   compatible = "rockchip,iommu";
-   reg = <0xff940300 0x100>;
-   interrupts = ;
-   interrupt-names = "vopl_mmu";
-   clocks = < ACLK_VOP1>, < HCLK_VOP1>;
-   clock-names = "aclk", "iface";
-   #iommu-cells = <0>;
-   };
diff --git a/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml 
b/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml
new file mode 100644
index ..099fc2578b54
--- /dev/null
+++ b/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml
@@ -0,0 +1,80 @@
+# SPDX-License-Identifier: GPL-2.0-only
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/iommu/rockchip,iommu.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Rockchip IOMMU
+
+maintainers:
+  - Heiko Stuebner 
+
+description: |+
+  A Rockchip DRM iommu translates io virtual addresses to physical addresses 
for
+  its master device. Each slave device is bound to a single master device and
+  shares its clocks, power domain and irq.
+
+  For information on assigning IOMMU controller to its peripheral devices,
+  see generic IOMMU bindings.
+
+properties:
+  compatible:
+const: rockchip,iommu
+
+  reg:
+items:
+  - description: configuration registers for MMU instance 0
+  - description: configuration registers for MMU instance 1
+minItems: 1
+maxItems: 2
+
+  interrupts:
+items:
+  - description: interruption for MMU instance 0
+  - description: interruption for MMU instance 1
+minItems: 1
+maxItems: 2
+
+  clocks:
+items:
+  - description: Core clock
+  - description: Interface clock
+
+  clock-names:
+items:
+  - const: aclk
+  - const: iface
+
+  "#iommu-cells":
+const: 0
+
+  rockchip,disable-mmu-reset:
+$ref: /schemas/types.yaml#/definitions/flag
+description: |
+  Do not use the mmu reset operation.
+  Some mmu instances may produce unexpected results
+  when the reset operation is used.
+
+required:
+  - compatible
+  - reg
+  - interrupts
+  - clocks
+  - clock-names
+  - "#iommu-cells"
+
+additionalProperties: false
+
+examples:
+  - |
+#include 
+#include 
+
+vopl_mmu: iommu@ff940300 {
+  compatible = "rockchip,iommu";
+  reg = <0xff940300 0x100>;
+  interrupts = ;
+  clocks = < ACLK_VOP1>, < HCLK_VOP1>;
+  clock-names = "aclk", "iface";
+  #iommu-cells = <0>;
+};
-- 
2.25.1



[PATCH v5 2/4] dt-bindings: iommu: rockchip: Add compatible for v2

2021-05-21 Thread Benjamin Gaignard
Add compatible for the second version of IOMMU hardware block.
RK356x IOMMU can also be link to a power domain.

Signed-off-by: Benjamin Gaignard 
Reviewed-by: Rob Herring 
---
 .../devicetree/bindings/iommu/rockchip,iommu.yaml  | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml 
b/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml
index 099fc2578b54..d2e28a9e3545 100644
--- a/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml
+++ b/Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml
@@ -19,7 +19,9 @@ description: |+
 
 properties:
   compatible:
-const: rockchip,iommu
+enum:
+  - rockchip,iommu
+  - rockchip,rk3568-iommu
 
   reg:
 items:
@@ -48,6 +50,9 @@ properties:
   "#iommu-cells":
 const: 0
 
+  power-domains:
+maxItems: 1
+
   rockchip,disable-mmu-reset:
 $ref: /schemas/types.yaml#/definitions/flag
 description: |
-- 
2.25.1



[PATCH v5 0/4] Add IOMMU driver for rk356x

2021-05-21 Thread Benjamin Gaignard
This series adds the IOMMU driver for rk356x SoC.
Since a new compatible is needed to distinguish this second version of 
IOMMU hardware block from the first one, it is an opportunity to convert
the binding to DT schema.

version 5:
 - Add internal ops inside the driver to be able to add variants.
 - Add support of v2 variant.
 - Stop using 'version' field
 - Use GENMASK macro

version 4:
 - Add description for reg items
 - Remove useless interrupt-names properties
 - Add description for interrupts items
 - Remove interrupt-names properties from DTS files

version 3:
 - Rename compatible with soc prefix
 - Rebase on v5.12 tag

version 2:
 - Fix iommu-cells typo in rk322x.dtsi
 - Change maintainer
 - Change reg maxItems
 - Add power-domains property
 
Benjamin Gaignard (4):
  dt-bindings: iommu: rockchip: Convert IOMMU to DT schema
  dt-bindings: iommu: rockchip: Add compatible for v2
  iommu: rockchip: Add internal ops to handle variants
  iommu: rockchip: Add support for iommu v2

 .../bindings/iommu/rockchip,iommu.txt |  38 -
 .../bindings/iommu/rockchip,iommu.yaml|  85 ++
 drivers/iommu/rockchip-iommu.c| 147 +++---
 3 files changed, 213 insertions(+), 57 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/iommu/rockchip,iommu.txt
 create mode 100644 Documentation/devicetree/bindings/iommu/rockchip,iommu.yaml

-- 
2.25.1



Re: [RFC PATCH v3 8/8] vfio: Add nested IOPF support

2021-05-21 Thread Shenming Lu
On 2021/5/19 2:58, Alex Williamson wrote:
> On Fri, 9 Apr 2021 11:44:20 +0800
> Shenming Lu  wrote:
> 
>> To set up nested mode, drivers such as vfio_pci need to register a
>> handler to receive stage/level 1 faults from the IOMMU, but since
>> currently each device can only have one iommu dev fault handler,
>> and if stage 2 IOPF is already enabled (VFIO_IOMMU_ENABLE_IOPF),
>> we choose to update the registered handler (a consolidated one) via
>> flags (set FAULT_REPORT_NESTED_L1), and further deliver the received
>> stage 1 faults in the handler to the guest through a newly added
>> vfio_device_ops callback.
> 
> Are there proposed in-kernel drivers that would use any of these
> symbols?

I hope that work such as Eric's SMMUv3 Nested Stage Setup series [1] can
use these symbols to consolidate the two page fault handlers into one.

[1] 
https://patchwork.kernel.org/project/kvm/cover/2021044659.15051-1-eric.au...@redhat.com/

> 
>> Signed-off-by: Shenming Lu 
>> ---
>>  drivers/vfio/vfio.c | 81 +
>>  drivers/vfio/vfio_iommu_type1.c | 49 +++-
>>  include/linux/vfio.h| 12 +
>>  3 files changed, 141 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
>> index 44c8dfabf7de..4245f15914bf 100644
>> --- a/drivers/vfio/vfio.c
>> +++ b/drivers/vfio/vfio.c
>> @@ -2356,6 +2356,87 @@ struct iommu_domain *vfio_group_iommu_domain(struct 
>> vfio_group *group)
>>  }
>>  EXPORT_SYMBOL_GPL(vfio_group_iommu_domain);
>>  
>> +/*
>> + * Register/Update the VFIO IOPF handler to receive
>> + * nested stage/level 1 faults.
>> + */
>> +int vfio_iommu_dev_fault_handler_register_nested(struct device *dev)
>> +{
>> +struct vfio_container *container;
>> +struct vfio_group *group;
>> +struct vfio_iommu_driver *driver;
>> +int ret;
>> +
>> +if (!dev)
>> +return -EINVAL;
>> +
>> +group = vfio_group_get_from_dev(dev);
>> +if (!group)
>> +return -ENODEV;
>> +
>> +ret = vfio_group_add_container_user(group);
>> +if (ret)
>> +goto out;
>> +
>> +container = group->container;
>> +driver = container->iommu_driver;
>> +if (likely(driver && driver->ops->register_handler))
>> +ret = driver->ops->register_handler(container->iommu_data, dev);
>> +else
>> +ret = -ENOTTY;
>> +
>> +vfio_group_try_dissolve_container(group);
>> +
>> +out:
>> +vfio_group_put(group);
>> +return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(vfio_iommu_dev_fault_handler_register_nested);
>> +
>> +int vfio_iommu_dev_fault_handler_unregister_nested(struct device *dev)
>> +{
>> +struct vfio_container *container;
>> +struct vfio_group *group;
>> +struct vfio_iommu_driver *driver;
>> +int ret;
>> +
>> +if (!dev)
>> +return -EINVAL;
>> +
>> +group = vfio_group_get_from_dev(dev);
>> +if (!group)
>> +return -ENODEV;
>> +
>> +ret = vfio_group_add_container_user(group);
>> +if (ret)
>> +goto out;
>> +
>> +container = group->container;
>> +driver = container->iommu_driver;
>> +if (likely(driver && driver->ops->unregister_handler))
>> +ret = driver->ops->unregister_handler(container->iommu_data, 
>> dev);
>> +else
>> +ret = -ENOTTY;
>> +
>> +vfio_group_try_dissolve_container(group);
>> +
>> +out:
>> +vfio_group_put(group);
>> +return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(vfio_iommu_dev_fault_handler_unregister_nested);
>> +
>> +int vfio_transfer_iommu_fault(struct device *dev, struct iommu_fault *fault)
>> +{
>> +struct vfio_device *device = dev_get_drvdata(dev);
>> +
>> +if (unlikely(!device->ops->transfer))
>> +return -EOPNOTSUPP;
>> +
>> +return device->ops->transfer(device->device_data, fault);
>> +}
>> +EXPORT_SYMBOL_GPL(vfio_transfer_iommu_fault);
>> +
>>  /**
>>   * Module/class support
>>   */
>> diff --git a/drivers/vfio/vfio_iommu_type1.c 
>> b/drivers/vfio/vfio_iommu_type1.c
>> index ba2b5a1cf6e9..9d1adeddb303 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -3821,13 +3821,32 @@ static int vfio_iommu_type1_dma_map_iopf(struct 
>> iommu_fault *fault, void *data)
>>  struct vfio_batch batch;
>>  struct vfio_range *range;
>>  dma_addr_t iova = ALIGN_DOWN(fault->prm.addr, PAGE_SIZE);
>> -int access_flags = 0;
>> +int access_flags = 0, nested;
>>  size_t premap_len, map_len, mapped_len = 0;
>>  unsigned long bit_offset, vaddr, pfn, i, npages;
>>  int ret;
>>  enum iommu_page_response_code status = IOMMU_PAGE_RESP_INVALID;
>>  struct iommu_page_response resp = {0};
>>  
>> +if (vfio_dev_domain_nested(dev, &nested))
>> +return -ENODEV;
>> +
>> +/*
>> + * When configured in nested mode, further deliver the
>> + * stage/level 1 faults to the guest.
>> + */
>> +if (nested) {
>> +bool l2;
>> +
>> +

Re: [PATCH] iommu/vt-d: Fix sysfs leak in alloc_domain()

2021-05-21 Thread Lu Baolu

Hi Joerg,

On 5/21/21 2:53 PM, Rolf Eike Beer wrote:

On Thursday, 22 April 2021 at 07:34:17 CEST, Lu Baolu wrote:

Hi Rolf,

On 4/22/21 1:39 PM, Rolf Eike Beer wrote:

iommu_device_sysfs_add() is called before, so it has to be cleaned up
on subsequent errors.

Fixes: 39ab9555c2411 ("iommu: Add sysfs bindings for struct iommu_device")
Cc: sta...@vger.kernel.org # 4.11.x
Signed-off-by: Rolf Eike Beer 


Acked-by: Lu Baolu 


Ping. Who is going to pick this up, please?


I forgot to put this fix in my last pull request. Could you please pick
it up? Or, I can send another pull request the next week.

Best regards,
baolu


Re: [PATCH] iommu/vt-d: Fix sysfs leak in alloc_domain()

2021-05-21 Thread Rolf Eike Beer
On Thursday, 22 April 2021 at 07:34:17 CEST, Lu Baolu wrote:
> Hi Rolf,
> 
> On 4/22/21 1:39 PM, Rolf Eike Beer wrote:
> > iommu_device_sysfs_add() is called before, so it has to be cleaned up
> > on subsequent errors.
> > 
> > Fixes: 39ab9555c2411 ("iommu: Add sysfs bindings for struct iommu_device")
> > Cc: sta...@vger.kernel.org # 4.11.x
> > Signed-off-by: Rolf Eike Beer 
> 
> Acked-by: Lu Baolu 

Ping. Who is going to pick this up, please?

Eike
-- 
Rolf Eike Beer, emlix GmbH, http://www.emlix.com
Fon +49 551 30664-0, Fax +49 551 30664-11
Gothaer Platz 3, 37083 Göttingen, Germany
Sitz der Gesellschaft: Göttingen, Amtsgericht Göttingen HR B 3160
Geschäftsführung: Heike Jordan, Dr. Uwe Kracke – Ust-IdNr.: DE 205 198 055

emlix - smart embedded open source



Re: [RFC PATCH v1 0/2] iommu/arm-smmu-v3: Add some parameter check in __arm_smmu_tlb_inv_range()

2021-05-21 Thread Kunkun Jiang

Hi Robin,

On 2021/5/19 18:01, Robin Murphy wrote:

On 2021-05-19 10:43, Kunkun Jiang wrote:

Hi all,

This set of patches fixes some errors I hit when testing the SMMU
nested mode.


Test scenario description:
guest kernel: 4KB translation granule
host kernel: 16KB translation granule

errors:
1. encountered an endless loop in __arm_smmu_tlb_inv_range because
num_pages is 0
2. encountered CERROR_ILL because the fields of the TLB invalidation
command are as follows: TG = 2, NUM = 0, SCALE = 0, TTL = 0. This
combination is exactly the kind of reserved combination pointed
out in the SMMUv3 spec (pages 143-144, version D.a)

In my opinion, it is more appropriate to add parameter checks in
__arm_smmu_tlb_inv_range(), although these problems only appeared
when I tested the SMMU nested mode. What do you think?


FWIW I think it would be better to fix the caller to not issue broken 
commands in the first place. The kernel shouldn't do so for itself 
(and definitely needs fixing if it ever does), so it sounds like the 
nesting implementation needs to do a bit more validation of what it's 
passing through.

Thanks for your reply.
I will report these errors to Eric and discuss how to fix them.

Thanks,
Kunkun Jiang


Robin.


This series include patches as below:
Patch 1:
- align the invalid range with leaf page size upwards when smmu
supports RIL

Patch 2:
- add a check to standardize granule size when smmu supports RIL

Kunkun Jiang (2):
   iommu/arm-smmu-v3: Align invalid range with leaf page size upwards
 when support RIL
   iommu/arm-smmu-v3: Standardize granule size when support RIL

  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 9 +
  1 file changed, 9 insertions(+)


.




Re: [PATCH v15 07/12] iommu/smmuv3: Implement cache_invalidate

2021-05-21 Thread Kunkun Jiang

Hi Eric,

On 2021/4/11 19:12, Eric Auger wrote:

Implement domain-selective, pasid selective and page-selective
IOTLB invalidations.

Signed-off-by: Eric Auger 

---
v4 -> v15:
- remove the redundant arm_smmu_cmdq_issue_sync(smmu)
   in IOMMU_INV_GRANU_ADDR case (Zenghui)
- if RIL is not supported by the host, make sure the granule_size
   that is passed by the userspace is supported or fix it
   (Chenxiang)

v13 -> v14:
- Add domain invalidation
- do global inval when asid is not provided with addr
   granularity

v7 -> v8:
- ASID based invalidation using iommu_inv_pasid_info
- check ARCHID/PASID flags in addr based invalidation
- use __arm_smmu_tlb_inv_context and __arm_smmu_tlb_inv_range_nosync

v6 -> v7
- check the uapi version

v3 -> v4:
- adapt to changes in the uapi
- add support for leaf parameter
- do not use arm_smmu_tlb_inv_range_nosync or arm_smmu_tlb_inv_context
   anymore

v2 -> v3:
- replace __arm_smmu_tlb_sync by arm_smmu_cmdq_issue_sync

v1 -> v2:
- properly pass the asid
---
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 89 +
  1 file changed, 89 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 56a301fbe75a..bfc112cc0d38 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2961,6 +2961,94 @@ static void arm_smmu_detach_pasid_table(struct 
iommu_domain *domain)
mutex_unlock(&smmu_domain->init_mutex);
  }
  
+static int
+arm_smmu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
+ struct iommu_cache_invalidate_info *inv_info)
+{
+   struct arm_smmu_cmdq_ent cmd = {.opcode = CMDQ_OP_TLBI_NSNH_ALL};
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   if (smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
+   return -EINVAL;
+
+   if (!smmu)
+   return -EINVAL;
+
+   if (inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
+   return -EINVAL;
+
+   if (inv_info->cache & IOMMU_CACHE_INV_TYPE_PASID ||
+   inv_info->cache & IOMMU_CACHE_INV_TYPE_DEV_IOTLB) {
+   return -ENOENT;
+   }
+
+   if (!(inv_info->cache & IOMMU_CACHE_INV_TYPE_IOTLB))
+   return -EINVAL;
+
+   /* IOTLB invalidation */
+
+   switch (inv_info->granularity) {
+   case IOMMU_INV_GRANU_PASID:
+   {
+   struct iommu_inv_pasid_info *info =
+   &inv_info->granu.pasid_info;
+
+   if (info->flags & IOMMU_INV_ADDR_FLAGS_PASID)
+   return -ENOENT;
+   if (!(info->flags & IOMMU_INV_PASID_FLAGS_ARCHID))
+   return -EINVAL;
+
+   __arm_smmu_tlb_inv_context(smmu_domain, info->archid);
+   return 0;
+   }
+   case IOMMU_INV_GRANU_ADDR:
+   {
+   struct iommu_inv_addr_info *info = &inv_info->granu.addr_info;
+   size_t granule_size  = info->granule_size;
+   size_t size = info->nb_granules * info->granule_size;
+   bool leaf = info->flags & IOMMU_INV_ADDR_FLAGS_LEAF;
+   int tg;
+
+   if (info->flags & IOMMU_INV_ADDR_FLAGS_PASID)
+   return -ENOENT;
+
+   if (!(info->flags & IOMMU_INV_ADDR_FLAGS_ARCHID))
+   break;
+
+   tg = __ffs(granule_size);
+   if (granule_size & ~(1 << tg))
+   return -EINVAL;
+   /*
+* When RIL is not supported, make sure the granule size that is
+* passed is supported. In RIL mode, this is enforced in
+* __arm_smmu_tlb_inv_range()
+*/
+   if (!(smmu->features & ARM_SMMU_FEAT_RANGE_INV) &&
+   !(granule_size & smmu_domain->domain.pgsize_bitmap)) {
+   tg = __ffs(smmu_domain->domain.pgsize_bitmap);
+   granule_size = 1 << tg;
+   size = size >> tg;
+   }
+
+   arm_smmu_tlb_inv_range_domain(info->addr, size,
+ granule_size, leaf,
+ info->archid, smmu_domain);

I encountered some errors when I tested the SMMU nested mode.

Test scenario description:
guest kernel: 4KB translation granule
host kernel: 16KB translation granule

errors:
1. encountered an endless loop in __arm_smmu_tlb_inv_range because
num_pages is 0
2. encountered CERROR_ILL because the fields of TLB invalidation
command are as follow: TG = 2, NUM = 0, SCALE = 0, TTL = 0. The
combination is exactly the kind of reserved combination pointed
out in the SMMUv3 spec (pages 143-144, version D.a)

According to my analysis, we should do a bit more validation on the
'size' and 'granule_size' when SMMU supports RIL:
1. 

Re: [RFC PATCH v3 7/8] vfio/type1: Add selective DMA faulting support

2021-05-21 Thread Shenming Lu
On 2021/5/19 2:58, Alex Williamson wrote:
> On Fri, 9 Apr 2021 11:44:19 +0800
> Shenming Lu  wrote:
> 
>> Some devices only allow selective DMA faulting. Similar to the selective
>> dirty page tracking, the vendor driver can call vfio_pin_pages() to
>> indicate the non-faultable scope, we add a new struct vfio_range to
>> record it, then when the IOPF handler receives any page request out
>> of the scope, we can directly return with an invalid response.
> 
> Seems like this highlights a deficiency in the design, that the user
> can't specify mappings as iopf enabled or disabled.  Also, if the
> vendor driver has pinned pages within the range, shouldn't that prevent
> them from faulting in the first place?  Why do we need yet more
> tracking structures?  Pages pinned by the vendor driver need to count
> against the user's locked memory limits regardless of iopf.  Thanks,

Currently we only have a vfio_pfn struct to track externally pinned pages
(single-page granularity), so I added a vfio_range struct for efficient lookup.

Yeah, with this patch we can directly return INVALID for the non-pinned scope,
but for the pinned (non-faultable) scope, tracking the pinned range doesn't
seem to help much more...

Thanks,
Shenming

> 
> Alex
> 
> .
> 


Re: [RFC PATCH v3 6/8] vfio/type1: No need to statically pin and map if IOPF enabled

2021-05-21 Thread Shenming Lu
On 2021/5/19 2:58, Alex Williamson wrote:
> On Fri, 9 Apr 2021 11:44:18 +0800
> Shenming Lu  wrote:
> 
>> If IOPF enabled for the VFIO container, there is no need to statically
>> pin and map the entire DMA range, we can do it on demand. And unmap
>> according to the IOPF mapped bitmap when removing vfio_dma.
>>
>> Note that we still mark all pages dirty even if IOPF enabled, we may
>> add IOPF-based fine grained dirty tracking support in the future.
>>
>> Signed-off-by: Shenming Lu 
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 38 +++--
>>  1 file changed, 32 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c 
>> b/drivers/vfio/vfio_iommu_type1.c
>> index 7df5711e743a..dcc93c3b258c 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -175,6 +175,7 @@ struct vfio_iopf_group {
>>  #define IOPF_MAPPED_BITMAP_GET(dma, i)  \
>>  ((dma->iopf_mapped_bitmap[(i) / BITS_PER_LONG] \
>>   >> ((i) % BITS_PER_LONG)) & 0x1)
>> +#define IOPF_MAPPED_BITMAP_BYTES(n) DIRTY_BITMAP_BYTES(n)
>>  
>>  #define WAITED 1
>>  
>> @@ -959,7 +960,8 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
>>   * already pinned and accounted. Accouting should be done if there is no
>>   * iommu capable domain in the container.
>>   */
>> -do_accounting = !IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu);
>> +do_accounting = !IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu) ||
>> +iommu->iopf_enabled;
>>  
>>  for (i = 0; i < npage; i++) {
>>  struct vfio_pfn *vpfn;
>> @@ -1048,7 +1050,8 @@ static int vfio_iommu_type1_unpin_pages(void 
>> *iommu_data,
>>  
>>  mutex_lock(&iommu->lock);
>>  
>> -do_accounting = !IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu);
>> +do_accounting = !IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu) ||
>> +iommu->iopf_enabled;
> 
> pin/unpin are actually still pinning pages, why does iopf exempt them
> from accounting?

If iopf_enabled is true, do_accounting will also be true, so we do account
the external pinned pages?

> 
> 
>>  for (i = 0; i < npage; i++) {
>>  struct vfio_dma *dma;
>>  dma_addr_t iova;
>> @@ -1169,7 +1172,7 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, 
>> struct vfio_dma *dma,
>>  if (!dma->size)
>>  return 0;
>>  
>> -if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu))
>> +if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu) || iommu->iopf_enabled)
>>  return 0;
>>  
>>  /*
>> @@ -1306,11 +1309,20 @@ static void vfio_unmap_partial_iopf(struct 
>> vfio_iommu *iommu,
>>  }
>>  }
>>  
>> +static void vfio_dma_clean_iopf(struct vfio_iommu *iommu, struct vfio_dma 
>> *dma)
>> +{
>> +vfio_unmap_partial_iopf(iommu, dma, dma->iova, dma->iova + dma->size);
>> +
>> +kfree(dma->iopf_mapped_bitmap);
>> +}
>> +
>>  static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>>  {
>>  WARN_ON(!RB_EMPTY_ROOT(&dma->pfn_list));
>>  vfio_unmap_unpin(iommu, dma, true);
>>  vfio_unlink_dma(iommu, dma);
>> +if (iommu->iopf_enabled)
>> +vfio_dma_clean_iopf(iommu, dma);
>>  put_task_struct(dma->task);
>>  vfio_dma_bitmap_free(dma);
>>  if (dma->vaddr_invalid) {
>> @@ -1359,7 +1371,8 @@ static int update_user_bitmap(u64 __user *bitmap, 
>> struct vfio_iommu *iommu,
>>   * mark all pages dirty if any IOMMU capable device is not able
>>   * to report dirty pages and all pages are pinned and mapped.
>>   */
>> -if (iommu->num_non_pinned_groups && dma->iommu_mapped)
>> +if (iommu->num_non_pinned_groups &&
>> +(dma->iommu_mapped || iommu->iopf_enabled))
>>  bitmap_set(dma->bitmap, 0, nbits);
> 
> This seems like really poor integration of iopf into dirty page
> tracking.  I'd expect dirty logging to flush the mapped pages and
> write faults to mark pages dirty.  Shouldn't the fault handler also
> provide only the access faulted, so for example a read fault wouldn't
> mark the page dirty?
I just wanted to keep the behavior here as before: if IOPF is enabled, we
will still mark all pages dirty.

We can distinguish between write and read faults in the fault handler,
so there is a way to add IOPF-based fine-grained dirty tracking support...
But I am not sure whether there is a need to implement this; could we
consider it in the future?

> 
>>  
>>  if (shift) {
>> @@ -1772,6 +1785,16 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>  goto out_unlock;
>>  }
>>  
>> +if (iommu->iopf_enabled) {
>> +dma->iopf_mapped_bitmap = kvzalloc(IOPF_MAPPED_BITMAP_BYTES(
>> +size >> PAGE_SHIFT), GFP_KERNEL);
>> +if (!dma->iopf_mapped_bitmap) {
>> +ret = -ENOMEM;
>> +kfree(dma);
>> +goto out_unlock;
>> +  

Re: [RFC PATCH v3 2/8] vfio/type1: Add a page fault handler

2021-05-21 Thread Shenming Lu
On 2021/5/19 2:58, Alex Williamson wrote:
> On Fri, 9 Apr 2021 11:44:14 +0800
> Shenming Lu  wrote:
> 
>> VFIO manages the DMA mapping itself. To support IOPF (on-demand paging)
>> for VFIO (IOMMU capable) devices, we add a VFIO page fault handler to
>> serve the reported page faults from the IOMMU driver.
>>
>> Signed-off-by: Shenming Lu 
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 114 
>>  1 file changed, 114 insertions(+)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c 
>> b/drivers/vfio/vfio_iommu_type1.c
>> index 45cbfd4879a5..ab0ff60ee207 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -101,6 +101,7 @@ struct vfio_dma {
>>  struct task_struct  *task;
>>  struct rb_root  pfn_list;   /* Ex-user pinned pfn list */
>>  unsigned long   *bitmap;
>> +unsigned long   *iopf_mapped_bitmap;
>>  };
>>  
>>  struct vfio_batch {
>> @@ -141,6 +142,16 @@ struct vfio_regions {
>>  size_t len;
>>  };
>>  
>> +/* A global IOPF enabled group list */
>> +static struct rb_root iopf_group_list = RB_ROOT;
>> +static DEFINE_MUTEX(iopf_group_list_lock);
>> +
>> +struct vfio_iopf_group {
>> +struct rb_node  node;
>> +struct iommu_group  *iommu_group;
>> +struct vfio_iommu   *iommu;
>> +};
>> +
>>  #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu) \
>>  (!list_empty(>domain_list))
>>  
>> @@ -157,6 +168,10 @@ struct vfio_regions {
>>  #define DIRTY_BITMAP_PAGES_MAX   ((u64)INT_MAX)
>>  #define DIRTY_BITMAP_SIZE_MAX
>> DIRTY_BITMAP_BYTES(DIRTY_BITMAP_PAGES_MAX)
>>  
>> +#define IOPF_MAPPED_BITMAP_GET(dma, i)  \
>> +  ((dma->iopf_mapped_bitmap[(i) / BITS_PER_LONG] \
>> +   >> ((i) % BITS_PER_LONG)) & 0x1)
> 
> 
> Can't we just use test_bit()?

Yeah, we can use it.

> 
> 
>> +
>>  #define WAITED 1
>>  
>>  static int put_pfn(unsigned long pfn, int prot);
>> @@ -416,6 +431,34 @@ static int vfio_iova_put_vfio_pfn(struct vfio_dma *dma, 
>> struct vfio_pfn *vpfn)
>>  return ret;
>>  }
>>  
>> +/*
>> + * Helper functions for iopf_group_list
>> + */
>> +static struct vfio_iopf_group *
>> +vfio_find_iopf_group(struct iommu_group *iommu_group)
>> +{
>> +struct vfio_iopf_group *iopf_group;
>> +struct rb_node *node;
>> +
>> +mutex_lock(&iopf_group_list_lock);
>> +
>> +node = iopf_group_list.rb_node;
>> +
>> +while (node) {
>> +iopf_group = rb_entry(node, struct vfio_iopf_group, node);
>> +
>> +if (iommu_group < iopf_group->iommu_group)
>> +node = node->rb_left;
>> +else if (iommu_group > iopf_group->iommu_group)
>> +node = node->rb_right;
>> +else
>> +break;
>> +}
>> +
>> +mutex_unlock(&iopf_group_list_lock);
>> +return node ? iopf_group : NULL;
>> +}
> 
> This looks like a pretty heavy weight operation per DMA fault.
> 
> I'm also suspicious of this validity of this iopf_group after we've
> dropped the locking, the ordering of patches makes this very confusing.

My thought was to handle DMA faults entirely within the type1 backend by
introducing the vfio_iopf_group struct. But it seems that introducing a struct
with an unknown lifecycle causes more problems...
I will use the path from vfio-core as in v2 for simplicity and validity.

Sorry for the confusion; I will restructure the series later. :-)

> 
>> +
>>  static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
>>  {
>>  struct mm_struct *mm;
>> @@ -3106,6 +3149,77 @@ static int vfio_iommu_type1_dirty_pages(struct 
>> vfio_iommu *iommu,
>>  return -EINVAL;
>>  }
>>  
>> +/* VFIO I/O Page Fault handler */
>> +static int vfio_iommu_type1_dma_map_iopf(struct iommu_fault *fault, void 
>> *data)
> 
> From the comment, this seems like the IOMMU fault handler (the
> construction of this series makes this difficult to follow) and
> eventually it handles more than DMA mapping, for example transferring
> faults to the device driver.  "dma_map_iopf" seems like a poorly scoped
> name.

Maybe just call it dev_fault_handler?

> 
>> +{
>> +struct device *dev = (struct device *)data;
>> +struct iommu_group *iommu_group;
>> +struct vfio_iopf_group *iopf_group;
>> +struct vfio_iommu *iommu;
>> +struct vfio_dma *dma;
>> +dma_addr_t iova = ALIGN_DOWN(fault->prm.addr, PAGE_SIZE);
>> +int access_flags = 0;
>> +unsigned long bit_offset, vaddr, pfn;
>> +int ret;
>> +enum iommu_page_response_code status = IOMMU_PAGE_RESP_INVALID;
>> +struct iommu_page_response resp = {0};
>> +
>> +if (fault->type != IOMMU_FAULT_PAGE_REQ)
>> +return -EOPNOTSUPP;
>> +
>> +iommu_group = iommu_group_get(dev);
>> +if (!iommu_group)
>> +return -ENODEV;
>> +
>> +iopf_group = vfio_find_iopf_group(iommu_group);

Re: [RFC PATCH v3 5/8] vfio/type1: VFIO_IOMMU_ENABLE_IOPF

2021-05-21 Thread Shenming Lu
On 2021/5/19 2:58, Alex Williamson wrote:
> On Fri, 9 Apr 2021 11:44:17 +0800
> Shenming Lu  wrote:
> 
>> Since enabling IOPF for devices may lead to a slow ramp up of performance,
>> we add an ioctl VFIO_IOMMU_ENABLE_IOPF to make it configurable. And the
>> IOPF enabling of a VFIO device includes setting IOMMU_DEV_FEAT_IOPF and
>> registering the VFIO IOPF handler.
>>
>> Note that VFIO_IOMMU_DISABLE_IOPF is not supported since there may be
>> inflight page faults when disabling.
>>
>> Signed-off-by: Shenming Lu 
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 223 +++-
>>  include/uapi/linux/vfio.h   |   6 +
>>  2 files changed, 226 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c 
>> b/drivers/vfio/vfio_iommu_type1.c
>> index 01e296c6dc9e..7df5711e743a 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -71,6 +71,7 @@ struct vfio_iommu {
>>  struct rb_root  dma_list;
>>  struct blocking_notifier_head notifier;
>>  struct mmu_notifier mn;
>> +struct mm_struct*mm;
> 
> We currently have no requirement that a single mm is used for all
> container mappings.  Does enabling IOPF impose such a requirement?
> Shouldn't MAP/UNMAP enforce that?

Did you mean that each vfio_dma in a container could have a different mm?
If so, could we register one MMU notifier for each vfio_dma in the MAP
ioctl?

> 
>>  unsigned intdma_avail;
>>  unsigned intvaddr_invalid_count;
>>  uint64_tpgsize_bitmap;
>> @@ -81,6 +82,7 @@ struct vfio_iommu {
>>  booldirty_page_tracking;
>>  boolpinned_page_dirty_scope;
>>  boolcontainer_open;
>> +booliopf_enabled;
>>  };
>>  
>>  struct vfio_domain {
>> @@ -461,6 +463,38 @@ vfio_find_iopf_group(struct iommu_group *iommu_group)
>>  return node ? iopf_group : NULL;
>>  }
>>  
>> +static void vfio_link_iopf_group(struct vfio_iopf_group *new)
>> +{
>> +struct rb_node **link, *parent = NULL;
>> +struct vfio_iopf_group *iopf_group;
>> +
>> +mutex_lock(&iopf_group_list_lock);
>> +
>> +link = &iopf_group_list.rb_node;
>> +
>> +while (*link) {
>> +parent = *link;
>> +iopf_group = rb_entry(parent, struct vfio_iopf_group, node);
>> +
>> +if (new->iommu_group < iopf_group->iommu_group)
>> +link = &(*link)->rb_left;
>> +else
>> +link = &(*link)->rb_right;
>> +}
>> +
>> +rb_link_node(&new->node, parent, link);
>> +rb_insert_color(&new->node, &iopf_group_list);
>> +
>> +mutex_unlock(&iopf_group_list_lock);
>> +}
>> +
>> +static void vfio_unlink_iopf_group(struct vfio_iopf_group *old)
>> +{
>> +mutex_lock(&iopf_group_list_lock);
>> +rb_erase(&old->node, &iopf_group_list);
>> +mutex_unlock(&iopf_group_list_lock);
>> +}
>> +
>>  static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
>>  {
>>  struct mm_struct *mm;
>> @@ -2363,6 +2397,68 @@ static void vfio_iommu_iova_insert_copy(struct 
>> vfio_iommu *iommu,
>>  list_splice_tail(iova_copy, iova);
>>  }
>>  
>> +static int vfio_dev_domian_nested(struct device *dev, int *nested)
> 
> 
> s/domian/domain/
> 
> 
>> +{
>> +struct iommu_domain *domain;
>> +
>> +domain = iommu_get_domain_for_dev(dev);
>> +if (!domain)
>> +return -ENODEV;
>> +
>> +return iommu_domain_get_attr(domain, DOMAIN_ATTR_NESTING, nested);
> 
> 
> Wouldn't this be easier to use if it returned bool, false on -errno?

Makes sense.

> 
> 
>> +}
>> +
>> +static int vfio_iommu_type1_dma_map_iopf(struct iommu_fault *fault, void 
>> *data);
>> +
>> +static int dev_enable_iopf(struct device *dev, void *data)
>> +{
>> +int *enabled_dev_cnt = data;
>> +int nested;
>> +u32 flags;
>> +int ret;
>> +
>> +ret = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_IOPF);
>> +if (ret)
>> +return ret;
>> +
>> +ret = vfio_dev_domian_nested(dev, );
>> +if (ret)
>> +goto out_disable;
>> +
>> +if (nested)
>> +flags = FAULT_REPORT_NESTED_L2;
>> +else
>> +flags = FAULT_REPORT_FLAT;
>> +
>> +ret = iommu_register_device_fault_handler(dev,
>> +vfio_iommu_type1_dma_map_iopf, flags, dev);
>> +if (ret)
>> +goto out_disable;
>> +
>> +(*enabled_dev_cnt)++;
>> +return 0;
>> +
>> +out_disable:
>> +iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_IOPF);
>> +return ret;
>> +}
>> +
>> +static int dev_disable_iopf(struct device *dev, void *data)
>> +{
>> +int *enabled_dev_cnt = data;
>> +
>> +if (enabled_dev_cnt && *enabled_dev_cnt <= 0)
>> +return -1;
> 
> 
> Use an errno.>
> 
>> +
>> +WARN_ON(iommu_unregister_device_fault_handler(dev));
>> +WARN_ON(iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_IOPF));
>> +
>> +if 

Re: [RFC PATCH v3 4/8] vfio/type1: Pre-map more pages than requested in the IOPF handling

2021-05-21 Thread Shenming Lu
On 2021/5/19 2:58, Alex Williamson wrote:
> On Fri, 9 Apr 2021 11:44:16 +0800
> Shenming Lu  wrote:
> 
>> To optimize for fewer page fault handlings, we can pre-map more pages
>> than requested at once.
>>
>> Note that IOPF_PREMAP_LEN is just an arbitrary value for now, which we
>> could try further tuning.
> 
> I'd prefer that the series introduced full end-to-end functionality
> before trying to improve performance.  The pre-map value seems
> arbitrary here and as noted in the previous patch, the IOMMU API does
> not guarantee unmaps of ranges smaller than the original mapping.  This
> would need to map with single page granularity in order to guarantee
> page granularity at the mmu notifier when the IOMMU supports
> superpages.  Thanks,

Yeah, I will drop this patch in the current stage.

Thanks,
Shenming

> 
> Alex
> 
> .
> 


Re: [RFC PATCH v3 3/8] vfio/type1: Add an MMU notifier to avoid pinning

2021-05-21 Thread Shenming Lu
On 2021/5/19 2:58, Alex Williamson wrote:
> On Fri, 9 Apr 2021 11:44:15 +0800
> Shenming Lu  wrote:
> 
>> To avoid pinning pages when they are mapped in IOMMU page tables, we
>> add an MMU notifier to tell the addresses which are no longer valid
>> and try to unmap them.
>>
>> Signed-off-by: Shenming Lu 
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 112 +++-
>>  1 file changed, 109 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c 
>> b/drivers/vfio/vfio_iommu_type1.c
>> index ab0ff60ee207..1cb9d1f2717b 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -40,6 +40,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  
>>  #define DRIVER_VERSION  "0.2"
>>  #define DRIVER_AUTHOR   "Alex Williamson "
>> @@ -69,6 +70,7 @@ struct vfio_iommu {
>>  struct mutexlock;
>>  struct rb_root  dma_list;
>>  struct blocking_notifier_head notifier;
>> +struct mmu_notifier mn;
>>  unsigned intdma_avail;
>>  unsigned intvaddr_invalid_count;
>>  uint64_tpgsize_bitmap;
>> @@ -1204,6 +1206,72 @@ static long vfio_unmap_unpin(struct vfio_iommu 
>> *iommu, struct vfio_dma *dma,
>>  return unlocked;
>>  }
>>  
>> +/* Unmap the IOPF mapped pages in the specified range. */
>> +static void vfio_unmap_partial_iopf(struct vfio_iommu *iommu,
>> +struct vfio_dma *dma,
>> +dma_addr_t start, dma_addr_t end)
>> +{
>> +struct iommu_iotlb_gather *gathers;
>> +struct vfio_domain *d;
>> +int i, num_domains = 0;
>> +
>> +list_for_each_entry(d, &iommu->domain_list, next)
>> +num_domains++;
>> +
>> +gathers = kzalloc(sizeof(*gathers) * num_domains, GFP_KERNEL);
>> +if (gathers) {
>> +for (i = 0; i < num_domains; i++)
>> +iommu_iotlb_gather_init(&gathers[i]);
>> +}
> 
> 
> If we're always serialized on iommu->lock, would it make sense to have
> gathers pre-allocated on the vfio_iommu object?

Yeah, we can do it.

> 
>> +
>> +while (start < end) {
>> +unsigned long bit_offset;
>> +size_t len;
>> +
>> +bit_offset = (start - dma->iova) >> PAGE_SHIFT;
>> +
>> +for (len = 0; start + len < end; len += PAGE_SIZE) {
>> +if (!IOPF_MAPPED_BITMAP_GET(dma,
>> +bit_offset + (len >> PAGE_SHIFT)))
>> +break;
> 
> 
> There are bitmap helpers for this, find_first_bit(),
> find_next_zero_bit().

Thanks for the hint. :-)

> 
> 
>> +}
>> +
>> +if (len) {
>> +i = 0;
>> +list_for_each_entry(d, &iommu->domain_list, next) {
>> +size_t unmapped;
>> +
>> +if (gathers)
>> +unmapped = iommu_unmap_fast(d->domain,
>> +start, len,
>> +&gathers[i++]);
>> +else
>> +unmapped = iommu_unmap(d->domain,
>> +   start, len);
>> +
>> +if (WARN_ON(unmapped != len))
> 
> The IOMMU API does not guarantee arbitrary unmap sizes, this will
> trigger and this exit path is wrong.  If we've already unmapped the
> IOMMU, shouldn't we proceed with @unmapped rather than @len so the
> device can re-fault the extra mappings?  Otherwise the IOMMU state
> doesn't match the iopf bitmap state.

OK, I will correct it.

And can we assume that the @unmapped values (returned from iommu_unmap)
of all domains are the same (since all domains share the same 
iopf_mapped_bitmap)?

> 
>> +goto out;
>> +}
>> +
>> +bitmap_clear(dma->iopf_mapped_bitmap,
>> + bit_offset, len >> PAGE_SHIFT);
>> +
>> +cond_resched();
>> +}
>> +
>> +start += (len + PAGE_SIZE);
>> +}
>> +
>> +out:
>> +if (gathers) {
>> +i = 0;
>> +list_for_each_entry(d, &iommu->domain_list, next)
>> +iommu_iotlb_sync(d->domain, &gathers[i++]);
>> +
>> +kfree(gathers);
>> +}
>> +}
>> +
>>  static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>>  {
>>  WARN_ON(!RB_EMPTY_ROOT(&dma->pfn_list));
>> @@ -3197,17 +3265,18 @@ static int vfio_iommu_type1_dma_map_iopf(struct 
>> iommu_fault *fault, void *data)
>>  
>>  vaddr = iova - dma->iova + dma->vaddr;
>>  
>> -if (vfio_pin_page_external(dma, vaddr, &pfn, true))
>> +if (vfio_pin_page_external(dma, vaddr, &pfn, false))
>>  goto out_invalid;
>>  
>>  if (vfio_iommu_map(iommu, iova, pfn, 1, dma->prot)) {

Re: [RFC PATCH v3 1/8] iommu: Evolve the device fault reporting framework

2021-05-21 Thread Shenming Lu
On 2021/5/19 2:58, Alex Williamson wrote:
> On Fri, 9 Apr 2021 11:44:13 +0800
> Shenming Lu  wrote:
> 
>> This patch follows the discussion here:
>>
>> https://lore.kernel.org/linux-acpi/YAaxjmJW+ZMvrhac@myrica/
>>
>> Besides SVA/vSVA, such as VFIO may also enable (2nd level) IOPF to remove
>> pinning restriction. In order to better support more scenarios of using
>> device faults, we extend iommu_register_fault_handler() with flags and
>> introduce FAULT_REPORT_ to describe the device fault reporting capability
>> under a specific configuration.
>>
>> Note that we don't further distinguish recoverable and unrecoverable faults
>> by flags in the fault reporting cap, having PAGE_FAULT_REPORT_ +
>> UNRECOV_FAULT_REPORT_ seems not a clean way.
>>
>> In addition, still take VFIO as an example, in nested mode, the 1st level
>> and 2nd level fault reporting may be configured separately and currently
>> each device can only register one iommu dev fault handler, so we add a
>> handler update interface for this.
> 
> 
> IIUC, you're introducing flags for the fault handler callout, which
> essentially allows the IOMMU layer to filter which types of faults the
> handler can receive.  You then need an update function to modify those
> flags.  Why can't the handler itself perform this filtering?  For
> instance in your vfio example, the handler registered by the type1
> backend could itself return fault until the fault transfer path to the
> device driver is established.  Thanks,

As discussed in [1]:

In nested IOPF, we have to figure out whether a fault comes from L1 or L2.
A SMMU stall event comes with this information, but a PRI page request doesn't.
The IOMMU driver can walk the page tables to find out the level information.
If the walk terminates at L1, the fault is further injected into the guest;
otherwise it is fixed at L2 in VFIO. This is not efficient compared to
hardware-provided level information.

And in the pinned case, if VFIO can tell the IOMMU driver that no L2 fault is
expected, there is no need to walk the page tables in the IOMMU driver?

[1] 
https://patchwork.kernel.org/project/linux-arm-kernel/patch/20210108145217.2254447-4-jean-phili...@linaro.org/

Thanks,
Shenming

> 
> Alex
> 
> .
> 


Re: [RFC PATCH v3 0/8] Add IOPF support for VFIO passthrough

2021-05-21 Thread Shenming Lu
Hi Alex,

On 2021/5/19 2:57, Alex Williamson wrote:
> On Fri, 9 Apr 2021 11:44:12 +0800
> Shenming Lu  wrote:
> 
>> Hi,
>>
>> Requesting for your comments and suggestions. :-)
>>
>> The static pinning and mapping problem in VFIO and possible solutions
>> have been discussed a lot [1, 2]. One of the solutions is to add I/O
>> Page Fault support for VFIO devices. Different from those relatively
>> complicated software approaches such as presenting a vIOMMU that provides
>> the DMA buffer information (might include para-virtualized optimizations),
>> IOPF mainly depends on the hardware faulting capability, such as the PCIe
>> PRI extension or Arm SMMU stall model. What's more, the IOPF support in
>> the IOMMU driver has already been implemented in SVA [3]. So we add IOPF
>> support for VFIO passthrough based on the IOPF part of SVA in this series.
> 
> The SVA proposals are being reworked to make use of a new IOASID
> object, it's not clear to me that this shouldn't also make use of that
> work as it does a significant expansion of the type1 IOMMU with fault
> handlers that would duplicate new work using that new model.

It seems that the IOASID extension for guest SVA would not affect this series;
would we do any host-guest IOASID translation in the VFIO fault handler?

> 
>> We have measured its performance with UADK [4] (passthrough an accelerator
>> to a VM(1U16G)) on Hisilicon Kunpeng920 board (and compared with host SVA):
>>
>> Run hisi_sec_test...
>>  - with varying sending times and message lengths
>>  - with/without IOPF enabled (speed slowdown)
>>
>> when msg_len = 1MB (and PREMAP_LEN (in Patch 4) = 1):
>> slowdown (num of faults)
>>  times  VFIO IOPF  host SVA
>>  1  63.4% (518)82.8% (512)
>>  10022.9% (1058)   47.9% (1024)
>>  1000   2.6% (1071)8.5% (1024)
>>
>> when msg_len = 10MB (and PREMAP_LEN = 512):
>> slowdown (num of faults)
>>  times  VFIO IOPF
>>  1  32.6% (13)
>>  1003.5% (26)
>>  1000   1.6% (26)
> 
> It seems like this is only an example that you can make a benchmark
> show anything you want.  The best results would be to pre-map
> everything, which is what we have without this series.  What is an
> acceptable overhead to incur to avoid page pinning?  What if userspace
> had more fine grained control over which mappings were available for
> faulting and which were statically mapped?  I don't really see what
> sense the pre-mapping range makes.  If I assume the user is QEMU in a
> non-vIOMMU configuration, pre-mapping the beginning of each RAM section
> doesn't make any logical sense relative to device DMA.

As you said in Patch 4, we can introduce full end-to-end functionality
before trying to improve performance, and I will drop the pre-mapping patch
in the current stage...

Is there a need for userspace to have more fine-grained control over which
mappings are available for faulting? If so, we could evolve the MAP ioctl
to support specifying the faulting range.

As for the overhead of IOPF, it is unavoidable when enabling on-demand paging
(page faults occur almost exclusively on first access), and there seems to be
substantial room for optimization compared to CPU page faulting.

Thanks,
Shenming

> 
> Comments per patch to follow.  Thanks,
> 
> Alex
> 
> 
>> History:
>>
>> v2 -> v3
>>  - Nit fixes.
>>  - No reason to disable reporting the unrecoverable faults. (baolu)
>>  - Maintain a global IOPF enabled group list.
>>  - Split the pre-mapping optimization to be a separate patch.
>>  - Add selective faulting support (use vfio_pin_pages to indicate the
>>non-faultable scope and add a new struct vfio_range to record it,
>>untested). (Kevin)
>>
>> v1 -> v2
>>  - Numerous improvements following the suggestions. Thanks a lot to all
>>of you.
>>
>> Note that PRI is not supported at the moment since there is no hardware.
>>
>> Links:
>> [1] Lesokhin I, et al. Page Fault Support for Network Controllers. In ASPLOS,
>> 2016.
>> [2] Tian K, et al. coIOMMU: A Virtual IOMMU with Cooperative DMA Buffer 
>> Tracking
>> for Efficient Memory Management in Direct I/O. In USENIX ATC, 2020.
>> [3] 
>> https://patchwork.kernel.org/project/linux-arm-kernel/cover/20210401154718.307519-1-jean-phili...@linaro.org/
>> [4] https://github.com/Linaro/uadk
>>
>> Thanks,
>> Shenming
>>
>>
>> Shenming Lu (8):
>>   iommu: Evolve the device fault reporting framework
>>   vfio/type1: Add a page fault handler
>>   vfio/type1: Add an MMU notifier to avoid pinning
>>   vfio/type1: Pre-map more pages than requested in the IOPF handling
>>   vfio/type1: VFIO_IOMMU_ENABLE_IOPF
>>   vfio/type1: No need to statically pin and map if IOPF enabled
>>   vfio/type1: Add selective DMA faulting support
>>   vfio: Add nested IOPF support
>>
>>  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |3 +-
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |   18 +-
>>  drivers/iommu/iommu.c |   56 +-
>>