[PATCH] iommu/amd: Remove redundant assignment of err

2021-05-18 Thread Shaokun Zhang
'err' will be initialized later, so remove the redundant initialization.

Cc: Joerg Roedel  
Signed-off-by: Shaokun Zhang 
---
 drivers/iommu/amd/iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 80e8e1916dd1..bea91cc73153 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1773,7 +1773,7 @@ void amd_iommu_domain_update(struct protection_domain *domain)
 
 int __init amd_iommu_init_api(void)
 {
-   int ret, err = 0;
+   int ret, err;
 
ret = iova_cache_get();
if (ret)
-- 
2.7.4



[PATCH 1/2] iommu/vt-d: Check for allocation failure in aux_detach_device()

2021-05-18 Thread Lu Baolu
From: Dan Carpenter 

In current kernels small allocations never fail, but checking for
allocation failure is the correct thing to do.

Fixes: 18abda7a2d55 ("iommu/vt-d: Fix general protection fault in aux_detach_device()")
Signed-off-by: Dan Carpenter 
Acked-by: Lu Baolu 
Link: https://lore.kernel.org/r/YJuobKuSn81dOPLd@mwanda
---
 drivers/iommu/intel/iommu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 708f430af1c4..9a7b79b5af18 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4606,6 +4606,8 @@ static int auxiliary_link_device(struct dmar_domain *domain,
 
if (!sinfo) {
sinfo = kzalloc(sizeof(*sinfo), GFP_ATOMIC);
+   if (!sinfo)
+   return -ENOMEM;
sinfo->domain = domain;
sinfo->pdev = dev;
	list_add(&sinfo->link_phys, &domain->subdevices);
-- 
2.25.1



[PATCH 2/2] iommu/vt-d: Use user privilege for RID2PASID translation

2021-05-18 Thread Lu Baolu
When first-level page tables are used for IOVA translation, we use user
privilege by setting U/S bit in the page table entry. This is to make it
consistent with the second level translation, where the U/S enforcement
is not available. Clear the SRE (Supervisor Request Enable) field in the
pasid table entry of RID2PASID so that requests with supervisor
privilege are blocked and treated as DMA remapping faults.

Fixes: b802d070a52a1 ("iommu/vt-d: Use iova over first level")
Suggested-by: Jacob Pan 
Signed-off-by: Lu Baolu 
Link: https://lore.kernel.org/r/20210512064426.3440915-1-baolu...@linux.intel.com
---
 drivers/iommu/intel/iommu.c | 7 +--
 drivers/iommu/intel/pasid.c | 3 ++-
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 9a7b79b5af18..be35284a2016 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2525,9 +2525,9 @@ static int domain_setup_first_level(struct intel_iommu *iommu,
struct device *dev,
u32 pasid)
 {
-   int flags = PASID_FLAG_SUPERVISOR_MODE;
struct dma_pte *pgd = domain->pgd;
int agaw, level;
+   int flags = 0;
 
/*
 * Skip top levels of page tables for iommu which has
@@ -2543,7 +2543,10 @@ static int domain_setup_first_level(struct intel_iommu *iommu,
if (level != 4 && level != 5)
return -EINVAL;
 
-   flags |= (level == 5) ? PASID_FLAG_FL5LP : 0;
+   if (pasid != PASID_RID2PASID)
+   flags |= PASID_FLAG_SUPERVISOR_MODE;
+   if (level == 5)
+   flags |= PASID_FLAG_FL5LP;
 
if (domain->domain.type == IOMMU_DOMAIN_UNMANAGED)
flags |= PASID_FLAG_PAGE_SNOOP;
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 72646bafc52f..72dc84821dad 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -699,7 +699,8 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
 * Since it is a second level only translation setup, we should
 * set SRE bit as well (addresses are expected to be GPAs).
 */
-   pasid_set_sre(pte);
+   if (pasid != PASID_RID2PASID)
+   pasid_set_sre(pte);
pasid_set_present(pte);
pasid_flush_caches(iommu, pte, pasid, did);
 
-- 
2.25.1



[PATCH 0/2] [PULL REQUEST] iommu/vt-d: Fixes for v5.13-rc3

2021-05-18 Thread Lu Baolu
Hi Joerg,

Two small fixes are queued in this series. It includes:

 - Use user privilege for RID2PASID translation
 - Check memory allocation return value

Please consider them for v5.13.

Best regards,
Lu Baolu

Dan Carpenter (1):
  iommu/vt-d: Check for allocation failure in aux_detach_device()

Lu Baolu (1):
  iommu/vt-d: Use user privilege for RID2PASID translation

 drivers/iommu/intel/iommu.c | 9 +++--
 drivers/iommu/intel/pasid.c | 3 ++-
 2 files changed, 9 insertions(+), 3 deletions(-)

-- 
2.25.1



[PATCH 1/2] irqchip/qcom-pdc: Switch to IRQCHIP_PLATFORM_DRIVER and allow as a module

2021-05-18 Thread John Stultz
From: Saravana Kannan 

This patch revives changes from Saravana Kannan to switch the
qcom-pdc driver to use the IRQCHIP_PLATFORM_DRIVER helper macros,
and allows the qcom-pdc driver to be loaded as a permanent module.

Earlier attempts at this ran into trouble with loading
dependencies, but now that Saravana's fw_devlink=on is set by
default, we should avoid those problems.

Cc: Andy Gross 
Cc: Bjorn Andersson 
Cc: Joerg Roedel 
Cc: Thomas Gleixner 
Cc: Jason Cooper 
Cc: Marc Zyngier 
Cc: Linus Walleij 
Cc: Maulik Shah 
Cc: Lina Iyer 
Cc: Saravana Kannan 
Cc: Todd Kjos 
Cc: Greg Kroah-Hartman 
Cc: linux-arm-...@vger.kernel.org
Cc: iommu@lists.linux-foundation.org
Cc: linux-g...@vger.kernel.org
Signed-off-by: Saravana Kannan 
[jstultz: Folded in with my changes to allow the driver to be
 loadable as a permanent module]
Signed-off-by: John Stultz 
---
 drivers/irqchip/Kconfig| 2 +-
 drivers/irqchip/qcom-pdc.c | 8 +++-
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig
index b90e825df7e14..d4a0b4964ccc5 100644
--- a/drivers/irqchip/Kconfig
+++ b/drivers/irqchip/Kconfig
@@ -415,7 +415,7 @@ config GOLDFISH_PIC
  for Goldfish based virtual platforms.
 
 config QCOM_PDC
-   bool "QCOM PDC"
+   tristate "QCOM PDC"
depends on ARCH_QCOM
select IRQ_DOMAIN_HIERARCHY
help
diff --git a/drivers/irqchip/qcom-pdc.c b/drivers/irqchip/qcom-pdc.c
index 5dc63c20b67ea..32d59202d408d 100644
--- a/drivers/irqchip/qcom-pdc.c
+++ b/drivers/irqchip/qcom-pdc.c
@@ -11,9 +11,11 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -459,4 +461,8 @@ static int qcom_pdc_init(struct device_node *node, struct device_node *parent)
return ret;
 }
 
-IRQCHIP_DECLARE(qcom_pdc, "qcom,pdc", qcom_pdc_init);
+IRQCHIP_PLATFORM_DRIVER_BEGIN(qcom_pdc)
+IRQCHIP_MATCH("qcom,pdc", qcom_pdc_init)
+IRQCHIP_PLATFORM_DRIVER_END(qcom_pdc)
+MODULE_DESCRIPTION("Qualcomm Technologies, Inc. Power Domain Controller");
+MODULE_LICENSE("GPL v2");
-- 
2.25.1



[PATCH 2/2] firmware: QCOM_SCM: Allow qcom_scm driver to be loadable as a permanent module

2021-05-18 Thread John Stultz
Allow the qcom_scm driver to be loadable as a permanent module.

This still uses the "depends on QCOM_SCM || !QCOM_SCM" trick to
ensure that drivers that call into the qcom_scm driver are
also built as modules; in Kconfig, that expression evaluates to m
when QCOM_SCM=m, capping the dependent drivers at m, and to y
otherwise. While not ideal in some cases, it's the only safe way
I can find to avoid build errors without having those drivers
select QCOM_SCM and force it on (as QCOM_SCM=n can be valid for
those drivers).

Reviving this now that Saravana's fw_devlink defaults to on,
which should avoid loading troubles seen before.

Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Andy Gross 
Cc: Bjorn Andersson 
Cc: Joerg Roedel 
Cc: Thomas Gleixner 
Cc: Jason Cooper 
Cc: Marc Zyngier 
Cc: Linus Walleij 
Cc: Vinod Koul 
Cc: Kalle Valo 
Cc: Maulik Shah 
Cc: Lina Iyer 
Cc: Saravana Kannan 
Cc: Todd Kjos 
Cc: Greg Kroah-Hartman 
Cc: linux-arm-...@vger.kernel.org
Cc: iommu@lists.linux-foundation.org
Cc: linux-g...@vger.kernel.org
Acked-by: Kalle Valo 
Acked-by: Greg Kroah-Hartman 
Reviewed-by: Bjorn Andersson 
Signed-off-by: John Stultz 
---
v3:
* Fix __arm_smccc_smc build issue reported by
  kernel test robot 
v4:
* Add "depends on QCOM_SCM || !QCOM_SCM" bit to ath10k
  config that requires it.
v5:
* Fix QCOM_QCM typo in Kconfig, it should be QCOM_SCM
---
 drivers/firmware/Kconfig| 2 +-
 drivers/firmware/Makefile   | 3 ++-
 drivers/firmware/qcom_scm.c | 4 
 drivers/iommu/Kconfig   | 2 ++
 drivers/net/wireless/ath/ath10k/Kconfig | 1 +
 5 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig
index db0ea2d2d75a2..af53778edc7e2 100644
--- a/drivers/firmware/Kconfig
+++ b/drivers/firmware/Kconfig
@@ -235,7 +235,7 @@ config INTEL_STRATIX10_RSU
  Say Y here if you want Intel RSU support.
 
 config QCOM_SCM
-   bool
+   tristate "Qcom SCM driver"
depends on ARM || ARM64
depends on HAVE_ARM_SMCCC
select RESET_CONTROLLER
diff --git a/drivers/firmware/Makefile b/drivers/firmware/Makefile
index 5e013b6a3692e..523173cbff335 100644
--- a/drivers/firmware/Makefile
+++ b/drivers/firmware/Makefile
@@ -17,7 +17,8 @@ obj-$(CONFIG_ISCSI_IBFT)  += iscsi_ibft.o
 obj-$(CONFIG_FIRMWARE_MEMMAP)  += memmap.o
 obj-$(CONFIG_RASPBERRYPI_FIRMWARE) += raspberrypi.o
 obj-$(CONFIG_FW_CFG_SYSFS) += qemu_fw_cfg.o
-obj-$(CONFIG_QCOM_SCM) += qcom_scm.o qcom_scm-smc.o qcom_scm-legacy.o
+obj-$(CONFIG_QCOM_SCM) += qcom-scm.o
+qcom-scm-objs += qcom_scm.o qcom_scm-smc.o qcom_scm-legacy.o
 obj-$(CONFIG_TI_SCI_PROTOCOL)  += ti_sci.o
 obj-$(CONFIG_TRUSTED_FOUNDATIONS) += trusted_foundations.o
 obj-$(CONFIG_TURRIS_MOX_RWTM)  += turris-mox-rwtm.o
diff --git a/drivers/firmware/qcom_scm.c b/drivers/firmware/qcom_scm.c
index ee9cb545e73b9..bb9ce3f92931a 100644
--- a/drivers/firmware/qcom_scm.c
+++ b/drivers/firmware/qcom_scm.c
@@ -1296,6 +1296,7 @@ static const struct of_device_id qcom_scm_dt_match[] = {
{ .compatible = "qcom,scm" },
{}
 };
+MODULE_DEVICE_TABLE(of, qcom_scm_dt_match);
 
 static struct platform_driver qcom_scm_driver = {
.driver = {
@@ -1312,3 +1313,6 @@ static int __init qcom_scm_init(void)
	return platform_driver_register(&qcom_scm_driver);
 }
 subsys_initcall(qcom_scm_init);
+
+MODULE_DESCRIPTION("Qualcomm Technologies, Inc. SCM driver");
+MODULE_LICENSE("GPL v2");
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 1f111b399bcab..38f7b7a8e2843 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -253,6 +253,7 @@ config SPAPR_TCE_IOMMU
 config ARM_SMMU
tristate "ARM Ltd. System MMU (SMMU) Support"
depends on ARM64 || ARM || (COMPILE_TEST && !GENERIC_ATOMIC64)
+   depends on QCOM_SCM || !QCOM_SCM #if QCOM_SCM=m this can't be =y
select IOMMU_API
select IOMMU_IO_PGTABLE_LPAE
select ARM_DMA_USE_IOMMU if ARM
@@ -382,6 +383,7 @@ config QCOM_IOMMU
# Note: iommu drivers cannot (yet?) be built as modules
bool "Qualcomm IOMMU Support"
depends on ARCH_QCOM || (COMPILE_TEST && !GENERIC_ATOMIC64)
+   depends on QCOM_SCM=y
select IOMMU_API
select IOMMU_IO_PGTABLE_LPAE
select ARM_DMA_USE_IOMMU
diff --git a/drivers/net/wireless/ath/ath10k/Kconfig b/drivers/net/wireless/ath/ath10k/Kconfig
index 40f91bc8514d8..741289e385d59 100644
--- a/drivers/net/wireless/ath/ath10k/Kconfig
+++ b/drivers/net/wireless/ath/ath10k/Kconfig
@@ -44,6 +44,7 @@ config ATH10K_SNOC
tristate "Qualcomm ath10k SNOC support"
depends on ATH10K
depends on ARCH_QCOM || COMPILE_TEST
+   depends on QCOM_SCM || !QCOM_SCM #if QCOM_SCM=m this can't be =y
select QCOM_QMI_HELPERS
help
  This module adds support for integrated WCN3990 chip connected
-- 
2.25.1



Re: [RFC PATCH v3 5/8] vfio/type1: VFIO_IOMMU_ENABLE_IOPF

2021-05-18 Thread Alex Williamson
On Fri, 9 Apr 2021 11:44:17 +0800
Shenming Lu  wrote:

> Since enabling IOPF for devices may lead to a slow ramp-up of performance,
> we add an ioctl VFIO_IOMMU_ENABLE_IOPF to make it configurable. Enabling
> IOPF for a VFIO device includes setting IOMMU_DEV_FEAT_IOPF and
> registering the VFIO IOPF handler.
> 
> Note that VFIO_IOMMU_DISABLE_IOPF is not supported since there may be
> inflight page faults when disabling.
> 
> Signed-off-by: Shenming Lu 
> ---
>  drivers/vfio/vfio_iommu_type1.c | 223 +++-
>  include/uapi/linux/vfio.h   |   6 +
>  2 files changed, 226 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 01e296c6dc9e..7df5711e743a 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -71,6 +71,7 @@ struct vfio_iommu {
>   struct rb_root  dma_list;
>   struct blocking_notifier_head notifier;
>   struct mmu_notifier mn;
> + struct mm_struct*mm;

We currently have no requirement that a single mm is used for all
container mappings.  Does enabling IOPF impose such a requirement?
Shouldn't MAP/UNMAP enforce that?
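
For instance, MAP could pin the container to the first caller's mm,
roughly like this (a sketch only; the placement and error codes are
assumptions, the iommu->mm field is from the quoted hunk):

	if (iommu->iopf_enabled) {
		struct mm_struct *mm = get_task_mm(current);

		if (!mm)
			return -ESRCH;
		if (!iommu->mm) {
			iommu->mm = mm;		/* keep the reference */
		} else {
			bool match = (iommu->mm == mm);

			mmput(mm);
			if (!match)
				return -EINVAL;	/* mappings must share one mm */
		}
	}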

>   unsigned intdma_avail;
>   unsigned intvaddr_invalid_count;
>   uint64_tpgsize_bitmap;
> @@ -81,6 +82,7 @@ struct vfio_iommu {
>   booldirty_page_tracking;
>   boolpinned_page_dirty_scope;
>   boolcontainer_open;
> + booliopf_enabled;
>  };
>  
>  struct vfio_domain {
> @@ -461,6 +463,38 @@ vfio_find_iopf_group(struct iommu_group *iommu_group)
>   return node ? iopf_group : NULL;
>  }
>  
> +static void vfio_link_iopf_group(struct vfio_iopf_group *new)
> +{
> + struct rb_node **link, *parent = NULL;
> + struct vfio_iopf_group *iopf_group;
> +
> + mutex_lock(&iopf_group_list_lock);
> +
> + link = &iopf_group_list.rb_node;
> +
> + while (*link) {
> + parent = *link;
> + iopf_group = rb_entry(parent, struct vfio_iopf_group, node);
> +
> + if (new->iommu_group < iopf_group->iommu_group)
> + link = &(*link)->rb_left;
> + else
> + link = &(*link)->rb_right;
> + }
> +
> + rb_link_node(&new->node, parent, link);
> + rb_insert_color(&new->node, &iopf_group_list);
> +
> + mutex_unlock(&iopf_group_list_lock);
> +}
> +
> +static void vfio_unlink_iopf_group(struct vfio_iopf_group *old)
> +{
> + mutex_lock(&iopf_group_list_lock);
> + rb_erase(&old->node, &iopf_group_list);
> + mutex_unlock(&iopf_group_list_lock);
> +}
> +
>  static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
>  {
>   struct mm_struct *mm;
> @@ -2363,6 +2397,68 @@ static void vfio_iommu_iova_insert_copy(struct vfio_iommu *iommu,
>   list_splice_tail(iova_copy, iova);
>  }
>  
> +static int vfio_dev_domian_nested(struct device *dev, int *nested)


s/domian/domain/


> +{
> + struct iommu_domain *domain;
> +
> + domain = iommu_get_domain_for_dev(dev);
> + if (!domain)
> + return -ENODEV;
> +
> + return iommu_domain_get_attr(domain, DOMAIN_ATTR_NESTING, nested);


Wouldn't this be easier to use if it returned bool, false on -errno?
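
For illustration, the bool-returning variant could look like this (a
sketch; the spelling-corrected name is assumed):

static bool vfio_dev_domain_nested(struct device *dev)
{
	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
	int nested = 0;

	if (!domain ||
	    iommu_domain_get_attr(domain, DOMAIN_ATTR_NESTING, &nested))
		return false;

	return nested;
}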


> +}
> +
> +static int vfio_iommu_type1_dma_map_iopf(struct iommu_fault *fault, void *data);
> +
> +static int dev_enable_iopf(struct device *dev, void *data)
> +{
> + int *enabled_dev_cnt = data;
> + int nested;
> + u32 flags;
> + int ret;
> +
> + ret = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_IOPF);
> + if (ret)
> + return ret;
> +
> + ret = vfio_dev_domian_nested(dev, &nested);
> + if (ret)
> + goto out_disable;
> +
> + if (nested)
> + flags = FAULT_REPORT_NESTED_L2;
> + else
> + flags = FAULT_REPORT_FLAT;
> +
> + ret = iommu_register_device_fault_handler(dev,
> + vfio_iommu_type1_dma_map_iopf, flags, dev);
> + if (ret)
> + goto out_disable;
> +
> + (*enabled_dev_cnt)++;
> + return 0;
> +
> +out_disable:
> + iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_IOPF);
> + return ret;
> +}
> +
> +static int dev_disable_iopf(struct device *dev, void *data)
> +{
> + int *enabled_dev_cnt = data;
> +
> + if (enabled_dev_cnt && *enabled_dev_cnt <= 0)
> + return -1;


Use an errno.


> +
> + WARN_ON(iommu_unregister_device_fault_handler(dev));
> + WARN_ON(iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_IOPF));
> +
> + if (enabled_dev_cnt)
> + (*enabled_dev_cnt)--;


Why do we need these counters?

> +
> + return 0;
> +}
> +
>  static int vfio_iommu_type1_attach_group(void *iommu_data,
>struct iommu_group *iommu_group)
>  {
> @@ -2376,6 +2472,8 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>  

Re: [RFC PATCH v3 0/8] Add IOPF support for VFIO passthrough

2021-05-18 Thread Alex Williamson
On Fri, 9 Apr 2021 11:44:12 +0800
Shenming Lu  wrote:

> Hi,
> 
> Requesting for your comments and suggestions. :-)
> 
> The static pinning and mapping problem in VFIO and possible solutions
> have been discussed a lot [1, 2]. One of the solutions is to add I/O
> Page Fault support for VFIO devices. Different from those relatively
> complicated software approaches such as presenting a vIOMMU that provides
> the DMA buffer information (might include para-virtualized optimizations),
> IOPF mainly depends on the hardware faulting capability, such as the PCIe
> PRI extension or Arm SMMU stall model. What's more, the IOPF support in
> the IOMMU driver has already been implemented in SVA [3]. So we add IOPF
> support for VFIO passthrough based on the IOPF part of SVA in this series.

The SVA proposals are being reworked to make use of a new IOASID
object, it's not clear to me that this shouldn't also make use of that
work as it does a significant expansion of the type1 IOMMU with fault
handlers that would duplicate new work using that new model.

> We have measured its performance with UADK [4] (passthrough an accelerator
> to a VM(1U16G)) on Hisilicon Kunpeng920 board (and compared with host SVA):
> 
> Run hisi_sec_test...
>  - with varying sending times and message lengths
>  - with/without IOPF enabled (speed slowdown)
> 
> when msg_len = 1MB (and PREMAP_LEN (in Patch 4) = 1):
> slowdown (num of faults)
>  times   VFIO IOPF      host SVA
>  1       63.4% (518)    82.8% (512)
>  100     22.9% (1058)   47.9% (1024)
>  1000    2.6% (1071)    8.5% (1024)
> 
> when msg_len = 10MB (and PREMAP_LEN = 512):
> slowdown (num of faults)
>  times   VFIO IOPF
>  1       32.6% (13)
>  100     3.5% (26)
>  1000    1.6% (26)

It seems like this is only an example that you can make a benchmark
show anything you want.  The best results would be to pre-map
everything, which is what we have without this series.  What is an
acceptable overhead to incur to avoid page pinning?  What if userspace
had more fine grained control over which mappings were available for
faulting and which were statically mapped?  I don't really see what
sense the pre-mapping range makes.  If I assume the user is QEMU in a
non-vIOMMU configuration, pre-mapping the beginning of each RAM section
doesn't make any logical sense relative to device DMA.

Comments per patch to follow.  Thanks,

Alex


> History:
> 
> v2 -> v3
>  - Nit fixes.
>  - No reason to disable reporting the unrecoverable faults. (baolu)
>  - Maintain a global IOPF enabled group list.
>  - Split the pre-mapping optimization to be a separate patch.
>  - Add selective faulting support (use vfio_pin_pages to indicate the
>non-faultable scope and add a new struct vfio_range to record it,
>untested). (Kevin)
> 
> v1 -> v2
>  - Numerous improvements following the suggestions. Thanks a lot to all
>of you.
> 
> Note that PRI is not supported at the moment since there is no hardware.
> 
> Links:
> [1] Lesokhin I, et al. Page Fault Support for Network Controllers. In ASPLOS,
> 2016.
> [2] Tian K, et al. coIOMMU: A Virtual IOMMU with Cooperative DMA Buffer 
> Tracking
> for Efficient Memory Management in Direct I/O. In USENIX ATC, 2020.
> [3] 
> https://patchwork.kernel.org/project/linux-arm-kernel/cover/20210401154718.307519-1-jean-phili...@linaro.org/
> [4] https://github.com/Linaro/uadk
> 
> Thanks,
> Shenming
> 
> 
> Shenming Lu (8):
>   iommu: Evolve the device fault reporting framework
>   vfio/type1: Add a page fault handler
>   vfio/type1: Add an MMU notifier to avoid pinning
>   vfio/type1: Pre-map more pages than requested in the IOPF handling
>   vfio/type1: VFIO_IOMMU_ENABLE_IOPF
>   vfio/type1: No need to statically pin and map if IOPF enabled
>   vfio/type1: Add selective DMA faulting support
>   vfio: Add nested IOPF support
> 
>  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |3 +-
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |   18 +-
>  drivers/iommu/iommu.c |   56 +-
>  drivers/vfio/vfio.c   |   85 +-
>  drivers/vfio/vfio_iommu_type1.c   | 1000 -
>  include/linux/iommu.h |   19 +-
>  include/linux/vfio.h  |   13 +
>  include/uapi/linux/iommu.h|4 +
>  include/uapi/linux/vfio.h |6 +
>  9 files changed, 1181 insertions(+), 23 deletions(-)
> 



Re: [RFC PATCH v3 6/8] vfio/type1: No need to statically pin and map if IOPF enabled

2021-05-18 Thread Alex Williamson
On Fri, 9 Apr 2021 11:44:18 +0800
Shenming Lu  wrote:

> If IOPF is enabled for the VFIO container, there is no need to statically
> pin and map the entire DMA range; we can do it on demand, and unmap
> according to the IOPF mapped bitmap when removing vfio_dma.
> 
> Note that we still mark all pages dirty even if IOPF enabled, we may
> add IOPF-based fine grained dirty tracking support in the future.
> 
> Signed-off-by: Shenming Lu 
> ---
>  drivers/vfio/vfio_iommu_type1.c | 38 +++--
>  1 file changed, 32 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 7df5711e743a..dcc93c3b258c 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -175,6 +175,7 @@ struct vfio_iopf_group {
>  #define IOPF_MAPPED_BITMAP_GET(dma, i)   \
> 	((dma->iopf_mapped_bitmap[(i) / BITS_PER_LONG]	\
> 	 >> ((i) % BITS_PER_LONG)) & 0x1)
> +#define IOPF_MAPPED_BITMAP_BYTES(n)  DIRTY_BITMAP_BYTES(n)
>  
>  #define WAITED 1
>  
> @@ -959,7 +960,8 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
>* already pinned and accounted. Accouting should be done if there is no
>* iommu capable domain in the container.
>*/
> - do_accounting = !IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu);
> + do_accounting = !IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu) ||
> + iommu->iopf_enabled;
>  
>   for (i = 0; i < npage; i++) {
>   struct vfio_pfn *vpfn;
> @@ -1048,7 +1050,8 @@ static int vfio_iommu_type1_unpin_pages(void *iommu_data,
>  
>   mutex_lock(>lock);
>  
> - do_accounting = !IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu);
> + do_accounting = !IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu) ||
> + iommu->iopf_enabled;

pin/unpin are actually still pinning pages, why does iopf exempt them
from accounting?


>   for (i = 0; i < npage; i++) {
>   struct vfio_dma *dma;
>   dma_addr_t iova;
> @@ -1169,7 +1172,7 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
>   if (!dma->size)
>   return 0;
>  
> - if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu))
> + if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu) || iommu->iopf_enabled)
>   return 0;
>  
>   /*
> @@ -1306,11 +1309,20 @@ static void vfio_unmap_partial_iopf(struct vfio_iommu *iommu,
>   }
>  }
>  
> +static void vfio_dma_clean_iopf(struct vfio_iommu *iommu, struct vfio_dma *dma)
> +{
> + vfio_unmap_partial_iopf(iommu, dma, dma->iova, dma->iova + dma->size);
> +
> + kfree(dma->iopf_mapped_bitmap);
> +}
> +
>  static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>  {
>   WARN_ON(!RB_EMPTY_ROOT(&dma->pfn_list));
>   vfio_unmap_unpin(iommu, dma, true);
>   vfio_unlink_dma(iommu, dma);
> + if (iommu->iopf_enabled)
> + vfio_dma_clean_iopf(iommu, dma);
>   put_task_struct(dma->task);
>   vfio_dma_bitmap_free(dma);
>   if (dma->vaddr_invalid) {
> @@ -1359,7 +1371,8 @@ static int update_user_bitmap(u64 __user *bitmap, struct vfio_iommu *iommu,
>* mark all pages dirty if any IOMMU capable device is not able
>* to report dirty pages and all pages are pinned and mapped.
>*/
> - if (iommu->num_non_pinned_groups && dma->iommu_mapped)
> + if (iommu->num_non_pinned_groups &&
> + (dma->iommu_mapped || iommu->iopf_enabled))
>   bitmap_set(dma->bitmap, 0, nbits);

This seems like really poor integration of iopf into dirty page
tracking.  I'd expect dirty logging to flush the mapped pages and
write faults to mark pages dirty.  Shouldn't the fault handler also
provide only the access faulted, so for example a read fault wouldn't
mark the page dirty?

>  
>   if (shift) {
> @@ -1772,6 +1785,16 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>   goto out_unlock;
>   }
>  
> + if (iommu->iopf_enabled) {
> + dma->iopf_mapped_bitmap = kvzalloc(IOPF_MAPPED_BITMAP_BYTES(
> + size >> PAGE_SHIFT), GFP_KERNEL);
> + if (!dma->iopf_mapped_bitmap) {
> + ret = -ENOMEM;
> + kfree(dma);
> + goto out_unlock;
> + }


So we're assuming nothing can fault and therefore nothing can reference
the iopf_mapped_bitmap until this point in the series?


> + }
> +
>   iommu->dma_avail--;
>   dma->iova = iova;
>   dma->vaddr = vaddr;
> @@ -1811,8 +1834,11 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>   /* Insert zero-sized and grow as we map chunks of it */
>   vfio_link_dma(iommu, dma);
>  
> - /* Don't pin and map if container doesn't contain IOMMU capable domain*/
> - if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu))
> + /*
> +  

Re: [RFC PATCH v3 3/8] vfio/type1: Add an MMU notifier to avoid pinning

2021-05-18 Thread Alex Williamson
On Fri, 9 Apr 2021 11:44:15 +0800
Shenming Lu  wrote:

> To avoid pinning pages when they are mapped in IOMMU page tables, we
> add an MMU notifier that tells us which addresses are no longer valid,
> so that we can try to unmap them.
> 
> Signed-off-by: Shenming Lu 
> ---
>  drivers/vfio/vfio_iommu_type1.c | 112 +++-
>  1 file changed, 109 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index ab0ff60ee207..1cb9d1f2717b 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -40,6 +40,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #define DRIVER_VERSION  "0.2"
>  #define DRIVER_AUTHOR   "Alex Williamson "
> @@ -69,6 +70,7 @@ struct vfio_iommu {
>   struct mutexlock;
>   struct rb_root  dma_list;
>   struct blocking_notifier_head notifier;
> + struct mmu_notifier mn;
>   unsigned intdma_avail;
>   unsigned intvaddr_invalid_count;
>   uint64_tpgsize_bitmap;
> @@ -1204,6 +1206,72 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
>   return unlocked;
>  }
>  
> +/* Unmap the IOPF mapped pages in the specified range. */
> +static void vfio_unmap_partial_iopf(struct vfio_iommu *iommu,
> + struct vfio_dma *dma,
> + dma_addr_t start, dma_addr_t end)
> +{
> + struct iommu_iotlb_gather *gathers;
> + struct vfio_domain *d;
> + int i, num_domains = 0;
> +
> + list_for_each_entry(d, &iommu->domain_list, next)
> + num_domains++;
> +
> + gathers = kzalloc(sizeof(*gathers) * num_domains, GFP_KERNEL);
> + if (gathers) {
> + for (i = 0; i < num_domains; i++)
> + iommu_iotlb_gather_init(&gathers[i]);
> + }


If we're always serialized on iommu->lock, would it make sense to have
gathers pre-allocated on the vfio_iommu object?
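
A minimal sketch of that suggestion, assuming a new iommu->gathers
field sized when domains are attached:

static int vfio_iommu_resize_gathers(struct vfio_iommu *iommu, int num_domains)
{
	struct iommu_iotlb_gather *g;

	/* Reused under iommu->lock by every unmap instead of kzalloc'ing. */
	g = krealloc_array(iommu->gathers, num_domains, sizeof(*g), GFP_KERNEL);
	if (!g)
		return -ENOMEM;
	iommu->gathers = g;
	return 0;
}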

> +
> + while (start < end) {
> + unsigned long bit_offset;
> + size_t len;
> +
> + bit_offset = (start - dma->iova) >> PAGE_SHIFT;
> +
> + for (len = 0; start + len < end; len += PAGE_SIZE) {
> + if (!IOPF_MAPPED_BITMAP_GET(dma,
> + bit_offset + (len >> PAGE_SHIFT)))
> + break;


There are bitmap helpers for this, find_first_bit(),
find_next_zero_bit().
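
E.g., the run of mapped pages could be measured in one call (sketch; the
bit bound is assumed from the loop above):

	unsigned long last = bit_offset + ((end - start) >> PAGE_SHIFT);
	unsigned long first_clear = find_next_zero_bit(dma->iopf_mapped_bitmap,
						       last, bit_offset);

	len = (size_t)(first_clear - bit_offset) << PAGE_SHIFT;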


> + }
> +
> + if (len) {
> + i = 0;
> + list_for_each_entry(d, &iommu->domain_list, next) {
> + size_t unmapped;
> +
> + if (gathers)
> + unmapped = iommu_unmap_fast(d->domain,
> + start, len,
> + &gathers[i++]);
> + else
> + unmapped = iommu_unmap(d->domain,
> +start, len);
> +
> + if (WARN_ON(unmapped != len))

The IOMMU API does not guarantee arbitrary unmap sizes, this will
trigger and this exit path is wrong.  If we've already unmapped the
IOMMU, shouldn't we proceed with @unmapped rather than @len so the
device can re-fault the extra mappings?  Otherwise the IOMMU state
doesn't match the iopf bitmap state.
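
I.e., something along these lines (sketch; per-domain bookkeeping left
aside):

	unmapped = iommu_unmap(d->domain, start, len);
	/* Keep the bitmap in sync with what actually went away. */
	bitmap_clear(dma->iopf_mapped_bitmap, bit_offset,
		     unmapped >> PAGE_SHIFT);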

> + goto out;
> + }
> +
> + bitmap_clear(dma->iopf_mapped_bitmap,
> +  bit_offset, len >> PAGE_SHIFT);
> +
> + cond_resched();
> + }
> +
> + start += (len + PAGE_SIZE);
> + }
> +
> +out:
> + if (gathers) {
> + i = 0;
> + list_for_each_entry(d, &iommu->domain_list, next)
> + iommu_iotlb_sync(d->domain, &gathers[i++]);
> +
> + kfree(gathers);
> + }
> +}
> +
>  static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>  {
>   WARN_ON(!RB_EMPTY_ROOT(&dma->pfn_list));
> @@ -3197,17 +3265,18 @@ static int vfio_iommu_type1_dma_map_iopf(struct iommu_fault *fault, void *data)
>  
>   vaddr = iova - dma->iova + dma->vaddr;
>  
> - if (vfio_pin_page_external(dma, vaddr, &pfn, true))
> + if (vfio_pin_page_external(dma, vaddr, , false))
>   goto out_invalid;
>  
>   if (vfio_iommu_map(iommu, iova, pfn, 1, dma->prot)) {
> - if (put_pfn(pfn, dma->prot))
> - vfio_lock_acct(dma, -1, true);
> + put_pfn(pfn, dma->prot);
>   goto out_invalid;
>   }
>  
>   bitmap_set(dma->iopf_mapped_bitmap, bit_offset, 1);
>  
> + put_pfn(pfn, dma->prot);
> +
>  out_success:
>   status = IOMMU_PAGE_RESP_SUCCESS;
>  
> @@ -3220,6 

Re: [RFC PATCH v3 4/8] vfio/type1: Pre-map more pages than requested in the IOPF handling

2021-05-18 Thread Alex Williamson
On Fri, 9 Apr 2021 11:44:16 +0800
Shenming Lu  wrote:

> To reduce the number of page faults to be handled, we can pre-map more
> pages than requested at once.
> 
> Note that IOPF_PREMAP_LEN is just an arbitrary value for now, which we
> could try further tuning.

I'd prefer that the series introduced full end-to-end functionality
before trying to improve performance.  The pre-map value seems
arbitrary here and as noted in the previous patch, the IOMMU API does
not guarantee unmaps of ranges smaller than the original mapping.  This
would need to map with single page granularity in order to guarantee
page granularity at the mmu notifier when the IOMMU supports
superpages.  Thanks,

Alex
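
For reference, guaranteeing page granularity would mean mapping page by
page, roughly as below (a sketch; names follow the series, the loop
placement is an assumption):

	/* Map one page at a time so the mmu notifier can later unmap
	 * with page granularity even if the IOMMU supports superpages. */
	for (i = 0; i < npages; i++) {
		ret = iommu_map(domain, iova + i * PAGE_SIZE,
				PFN_PHYS(pfn + i), PAGE_SIZE, prot);
		if (ret)
			break;	/* real code must unwind what was mapped */
	}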



Re: [RFC PATCH v3 8/8] vfio: Add nested IOPF support

2021-05-18 Thread Alex Williamson
On Fri, 9 Apr 2021 11:44:20 +0800
Shenming Lu  wrote:

> To set up nested mode, drivers such as vfio_pci need to register a
> handler to receive stage/level 1 faults from the IOMMU, but since
> currently each device can only have one iommu dev fault handler,
> and if stage 2 IOPF is already enabled (VFIO_IOMMU_ENABLE_IOPF),
> we choose to update the registered handler (a consolidated one) via
> flags (set FAULT_REPORT_NESTED_L1), and further deliver the received
> stage 1 faults in the handler to the guest through a newly added
> vfio_device_ops callback.

Are there proposed in-kernel drivers that would use any of these
symbols?

> Signed-off-by: Shenming Lu 
> ---
>  drivers/vfio/vfio.c | 81 +
>  drivers/vfio/vfio_iommu_type1.c | 49 +++-
>  include/linux/vfio.h| 12 +
>  3 files changed, 141 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 44c8dfabf7de..4245f15914bf 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -2356,6 +2356,87 @@ struct iommu_domain *vfio_group_iommu_domain(struct vfio_group *group)
>  }
>  EXPORT_SYMBOL_GPL(vfio_group_iommu_domain);
>  
> +/*
> + * Register/Update the VFIO IOPF handler to receive
> + * nested stage/level 1 faults.
> + */
> +int vfio_iommu_dev_fault_handler_register_nested(struct device *dev)
> +{
> + struct vfio_container *container;
> + struct vfio_group *group;
> + struct vfio_iommu_driver *driver;
> + int ret;
> +
> + if (!dev)
> + return -EINVAL;
> +
> + group = vfio_group_get_from_dev(dev);
> + if (!group)
> + return -ENODEV;
> +
> + ret = vfio_group_add_container_user(group);
> + if (ret)
> + goto out;
> +
> + container = group->container;
> + driver = container->iommu_driver;
> + if (likely(driver && driver->ops->register_handler))
> + ret = driver->ops->register_handler(container->iommu_data, dev);
> + else
> + ret = -ENOTTY;
> +
> + vfio_group_try_dissolve_container(group);
> +
> +out:
> + vfio_group_put(group);
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(vfio_iommu_dev_fault_handler_register_nested);
> +
> +int vfio_iommu_dev_fault_handler_unregister_nested(struct device *dev)
> +{
> + struct vfio_container *container;
> + struct vfio_group *group;
> + struct vfio_iommu_driver *driver;
> + int ret;
> +
> + if (!dev)
> + return -EINVAL;
> +
> + group = vfio_group_get_from_dev(dev);
> + if (!group)
> + return -ENODEV;
> +
> + ret = vfio_group_add_container_user(group);
> + if (ret)
> + goto out;
> +
> + container = group->container;
> + driver = container->iommu_driver;
> + if (likely(driver && driver->ops->unregister_handler))
> + ret = driver->ops->unregister_handler(container->iommu_data, dev);
> + else
> + ret = -ENOTTY;
> +
> + vfio_group_try_dissolve_container(group);
> +
> +out:
> + vfio_group_put(group);
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(vfio_iommu_dev_fault_handler_unregister_nested);
> +
> +int vfio_transfer_iommu_fault(struct device *dev, struct iommu_fault *fault)
> +{
> + struct vfio_device *device = dev_get_drvdata(dev);
> +
> + if (unlikely(!device->ops->transfer))
> + return -EOPNOTSUPP;
> +
> + return device->ops->transfer(device->device_data, fault);
> +}
> +EXPORT_SYMBOL_GPL(vfio_transfer_iommu_fault);
> +
>  /**
>   * Module/class support
>   */
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index ba2b5a1cf6e9..9d1adeddb303 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -3821,13 +3821,32 @@ static int vfio_iommu_type1_dma_map_iopf(struct iommu_fault *fault, void *data)
>   struct vfio_batch batch;
>   struct vfio_range *range;
>   dma_addr_t iova = ALIGN_DOWN(fault->prm.addr, PAGE_SIZE);
> - int access_flags = 0;
> + int access_flags = 0, nested;
>   size_t premap_len, map_len, mapped_len = 0;
>   unsigned long bit_offset, vaddr, pfn, i, npages;
>   int ret;
>   enum iommu_page_response_code status = IOMMU_PAGE_RESP_INVALID;
>   struct iommu_page_response resp = {0};
>  
> + if (vfio_dev_domian_nested(dev, &nested))
> + return -ENODEV;
> +
> + /*
> +  * When configured in nested mode, further deliver the
> +  * stage/level 1 faults to the guest.
> +  */
> + if (nested) {
> + bool l2;
> +
> + if (fault->type == IOMMU_FAULT_PAGE_REQ)
> + l2 = fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_L2;
> + if (fault->type == IOMMU_FAULT_DMA_UNRECOV)
> + l2 = fault->event.flags & IOMMU_FAULT_UNRECOV_L2;
> +
> + if (!l2)
> + return vfio_transfer_iommu_fault(dev, fault);
> + }
> 

Re: [RFC PATCH v3 1/8] iommu: Evolve the device fault reporting framework

2021-05-18 Thread Alex Williamson
On Fri, 9 Apr 2021 11:44:13 +0800
Shenming Lu  wrote:

> This patch follows the discussion here:
> 
> https://lore.kernel.org/linux-acpi/YAaxjmJW+ZMvrhac@myrica/
> 
> Besides SVA/vSVA, users such as VFIO may also enable (2nd-level) IOPF to
> remove the pinning restriction. In order to better support more scenarios of using
> device faults, we extend iommu_register_fault_handler() with flags and
> introduce FAULT_REPORT_ to describe the device fault reporting capability
> under a specific configuration.
> 
> Note that we don't further distinguish recoverable and unrecoverable faults
> by flags in the fault reporting cap, having PAGE_FAULT_REPORT_ +
> UNRECOV_FAULT_REPORT_ seems not a clean way.
> 
> In addition, still take VFIO as an example, in nested mode, the 1st level
> and 2nd level fault reporting may be configured separately and currently
> each device can only register one iommu dev fault handler, so we add a
> handler update interface for this.


IIUC, you're introducing flags for the fault handler callout, which
essentially allows the IOMMU layer to filter which types of faults the
handler can receive.  You then need an update function to modify those
flags.  Why can't the handler itself perform this filtering?  For
instance in your vfio example, the handler registered by the type1
backend could itself return fault until the fault transfer path to the
device driver is established.  Thanks,

Alex
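
In other words, the registered handler could gate faults itself,
roughly as follows (a sketch; the ready flag and helper name are
assumptions, not part of the posted series):

static int vfio_iommu_fault_handler(struct iommu_fault *fault, void *data)
{
	struct vfio_iommu *iommu = data;

	/* Fail faults until the transfer path to the driver is set up. */
	if (!READ_ONCE(iommu->fault_transfer_ready))
		return -ENODEV;

	return vfio_iommu_handle_fault(iommu, fault);
}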



Re: [RFC PATCH v3 2/8] vfio/type1: Add a page fault handler

2021-05-18 Thread Alex Williamson
On Fri, 9 Apr 2021 11:44:14 +0800
Shenming Lu  wrote:

> VFIO manages the DMA mapping itself. To support IOPF (on-demand paging)
> for VFIO (IOMMU capable) devices, we add a VFIO page fault handler to
> serve the reported page faults from the IOMMU driver.
> 
> Signed-off-by: Shenming Lu 
> ---
>  drivers/vfio/vfio_iommu_type1.c | 114 
>  1 file changed, 114 insertions(+)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 45cbfd4879a5..ab0ff60ee207 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -101,6 +101,7 @@ struct vfio_dma {
>   struct task_struct  *task;
>   struct rb_root  pfn_list;   /* Ex-user pinned pfn list */
>   unsigned long   *bitmap;
> + unsigned long   *iopf_mapped_bitmap;
>  };
>  
>  struct vfio_batch {
> @@ -141,6 +142,16 @@ struct vfio_regions {
>   size_t len;
>  };
>  
> +/* A global IOPF enabled group list */
> +static struct rb_root iopf_group_list = RB_ROOT;
> +static DEFINE_MUTEX(iopf_group_list_lock);
> +
> +struct vfio_iopf_group {
> + struct rb_node  node;
> + struct iommu_group  *iommu_group;
> + struct vfio_iommu   *iommu;
> +};
> +
>  #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)  \
>   (!list_empty(&iommu->domain_list))
>  
> @@ -157,6 +168,10 @@ struct vfio_regions {
>  #define DIRTY_BITMAP_PAGES_MAX((u64)INT_MAX)
>  #define DIRTY_BITMAP_SIZE_MAX	DIRTY_BITMAP_BYTES(DIRTY_BITMAP_PAGES_MAX)
>  
> +#define IOPF_MAPPED_BITMAP_GET(dma, i)   \
> +	((dma->iopf_mapped_bitmap[(i) / BITS_PER_LONG]	\
> +	 >> ((i) % BITS_PER_LONG)) & 0x1)


Can't we just use test_bit()?
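
E.g. (sketch):

#define IOPF_MAPPED_BITMAP_GET(dma, i) \
	test_bit((i), (dma)->iopf_mapped_bitmap)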


> +
>  #define WAITED 1
>  
>  static int put_pfn(unsigned long pfn, int prot);
> @@ -416,6 +431,34 @@ static int vfio_iova_put_vfio_pfn(struct vfio_dma *dma, struct vfio_pfn *vpfn)
>   return ret;
>  }
>  
> +/*
> + * Helper functions for iopf_group_list
> + */
> +static struct vfio_iopf_group *
> +vfio_find_iopf_group(struct iommu_group *iommu_group)
> +{
> + struct vfio_iopf_group *iopf_group;
> + struct rb_node *node;
> +
> + mutex_lock(&iopf_group_list_lock);
> +
> + node = iopf_group_list.rb_node;
> +
> + while (node) {
> + iopf_group = rb_entry(node, struct vfio_iopf_group, node);
> +
> + if (iommu_group < iopf_group->iommu_group)
> + node = node->rb_left;
> + else if (iommu_group > iopf_group->iommu_group)
> + node = node->rb_right;
> + else
> + break;
> + }
> +
> + mutex_unlock(&iopf_group_list_lock);
> + return node ? iopf_group : NULL;
> +}

This looks like a pretty heavy weight operation per DMA fault.

I'm also suspicious of this validity of this iopf_group after we've
dropped the locking, the ordering of patches makes this very confusing.

> +
>  static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
>  {
>   struct mm_struct *mm;
> @@ -3106,6 +3149,77 @@ static int vfio_iommu_type1_dirty_pages(struct vfio_iommu *iommu,
>   return -EINVAL;
>  }
>  
> +/* VFIO I/O Page Fault handler */
> +static int vfio_iommu_type1_dma_map_iopf(struct iommu_fault *fault, void *data)

From the comment, this seems like the IOMMU fault handler (the
construction of this series makes this difficult to follow) and
eventually it handles more than DMA mapping, for example transferring
faults to the device driver.  "dma_map_iopf" seems like a poorly scoped
name.

> +{
> + struct device *dev = (struct device *)data;
> + struct iommu_group *iommu_group;
> + struct vfio_iopf_group *iopf_group;
> + struct vfio_iommu *iommu;
> + struct vfio_dma *dma;
> + dma_addr_t iova = ALIGN_DOWN(fault->prm.addr, PAGE_SIZE);
> + int access_flags = 0;
> + unsigned long bit_offset, vaddr, pfn;
> + int ret;
> + enum iommu_page_response_code status = IOMMU_PAGE_RESP_INVALID;
> + struct iommu_page_response resp = {0};
> +
> + if (fault->type != IOMMU_FAULT_PAGE_REQ)
> + return -EOPNOTSUPP;
> +
> + iommu_group = iommu_group_get(dev);
> + if (!iommu_group)
> + return -ENODEV;
> +
> + iopf_group = vfio_find_iopf_group(iommu_group);
> + iommu_group_put(iommu_group);
> + if (!iopf_group)
> + return -ENODEV;
> +
> + iommu = iopf_group->iommu;
> +
> + mutex_lock(&iommu->lock);

Again, I'm dubious of our ability to grab this lock from an object with
an unknown lifecycle and races we might have with that group being
detached or DMA unmapped.  Also, how effective is enabling IOMMU page
faulting if we're serializing all faults within a container context?

> +
> + ret = vfio_find_dma_valid(iommu, iova, PAGE_SIZE, &dma);
> + if (ret < 0)
> + goto out_invalid;
> +
> + 

Re: [RFC PATCH v3 7/8] vfio/type1: Add selective DMA faulting support

2021-05-18 Thread Alex Williamson
On Fri, 9 Apr 2021 11:44:19 +0800
Shenming Lu  wrote:

> Some devices only allow selective DMA faulting. Similar to the selective
> dirty page tracking, the vendor driver can call vfio_pin_pages() to
> indicate the non-faultable scope; we add a new struct vfio_range to
> record it. Then, when the IOPF handler receives any page request outside
> the scope, we can directly return an invalid response.

Seems like this highlights a deficiency in the design, that the user
can't specify mappings as iopf enabled or disabled.  Also, if the
vendor driver has pinned pages within the range, shouldn't that prevent
them from faulting in the first place?  Why do we need yet more
tracking structures?  Pages pinned by the vendor driver need to count
against the user's locked memory limits regardless of iopf.  Thanks,

Alex



Re: [PATCH 1/1] dma-contiguous: return early for dt case in dma_contiguous_reserve

2021-05-18 Thread Robin Murphy

On 2021-05-18 12:28, Dong Aisheng wrote:

dma_contiguous_reserve() aims to support the cmdline case for CMA memory
reservation. But if users define reserved memory in DT,
'dma_contiguous_default_area' will not be 0, and it is then meaningless
to continue running dma_contiguous_reserve(). So return early if we
detect that 'dma_contiguous_default_area' is nonzero.


But dma_contiguous_default_area *shouldn't* be set if the command-line 
argument is present - see the "if (size_cmdline != -1 && default_cma)" 
part of rmem_cma_setup(). Are you seeing something different in practice?
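
The check in question looks roughly like this (paraphrased from
rmem_cma_setup(); the exact wording may differ):

	if (size_cmdline != -1 && default_cma) {
		pr_info("Reserved memory: bypass %s node, using cmdline CMA params instead\n",
			rmem->name);
		return -EBUSY;
	}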



Cc: Christoph Hellwig 
Cc: Marek Szyprowski 
Cc: Robin Murphy 
Signed-off-by: Dong Aisheng 
---
  kernel/dma/contiguous.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
index 3d63d91cba5c..ebade9f43eff 100644
--- a/kernel/dma/contiguous.c
+++ b/kernel/dma/contiguous.c
@@ -171,6 +171,9 @@ void __init dma_contiguous_reserve(phys_addr_t limit)
phys_addr_t selected_limit = limit;
bool fixed = false;
  
+	if (dma_contiguous_default_area)
+		return;
+
	pr_debug("%s(limit %08lx)\n", __func__, (unsigned long)limit);

	if (size_cmdline != -1) {
@@ -191,7 +194,7 @@ void __init dma_contiguous_reserve(phys_addr_t limit)
 #endif
	}

-	if (selected_size && !dma_contiguous_default_area) {
+	if (selected_size) {


Either way, does skipping a handful of trivial calculations and a 
debugging message really matter even when it is redundant? I can't 
imagine it has any measurable effect on boot times...


Robin.


pr_debug("%s: reserving %ld MiB for global area\n", __func__,
 (unsigned long)selected_size / SZ_1M);
  




[PATCH] dma debug: report -EEXIST errors in add_dma_entry

2021-05-18 Thread Hamza Mahfooz
Since overlapping mappings are not supported by the DMA API, we should
report an error if active_cacheline_insert() returns -EEXIST.

Suggested-by: Dan Williams 
Signed-off-by: Hamza Mahfooz 
---
 kernel/dma/debug.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index 14de1271463f..dadae6255d05 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -566,11 +566,9 @@ static void add_dma_entry(struct dma_debug_entry *entry)
if (rc == -ENOMEM) {
pr_err("cacheline tracking ENOMEM, dma-debug disabled\n");
global_disable = true;
+   } else if (rc == -EEXIST) {
+   pr_err("cacheline tracking EEXIST, overlapping mappings aren't supported\n");
}
-
-   /* TODO: report -EEXIST errors here as overlapping mappings are
-* not supported by the DMA API
-*/
 }
 
 static int dma_debug_create_entries(gfp_t gfp)
-- 
2.31.1



[PATCH 1/1] dma-contiguous: return early for dt case in dma_contiguous_reserve

2021-05-18 Thread Dong Aisheng
dma_contiguous_reserve() aims to support the cmdline case for CMA memory
reservation. But if users define reserved memory in DT,
'dma_contiguous_default_area' will not be 0, and it is then meaningless
to continue running dma_contiguous_reserve(). So return early if we
detect that 'dma_contiguous_default_area' is nonzero.

Cc: Christoph Hellwig 
Cc: Marek Szyprowski 
Cc: Robin Murphy 
Signed-off-by: Dong Aisheng 
---
 kernel/dma/contiguous.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
index 3d63d91cba5c..ebade9f43eff 100644
--- a/kernel/dma/contiguous.c
+++ b/kernel/dma/contiguous.c
@@ -171,6 +171,9 @@ void __init dma_contiguous_reserve(phys_addr_t limit)
phys_addr_t selected_limit = limit;
bool fixed = false;
 
+   if (dma_contiguous_default_area)
+   return;
+
pr_debug("%s(limit %08lx)\n", __func__, (unsigned long)limit);
 
if (size_cmdline != -1) {
@@ -191,7 +194,7 @@ void __init dma_contiguous_reserve(phys_addr_t limit)
 #endif
}
 
-   if (selected_size && !dma_contiguous_default_area) {
+   if (selected_size) {
pr_debug("%s: reserving %ld MiB for global area\n", __func__,
 (unsigned long)selected_size / SZ_1M);
 
-- 
2.25.1



Re: [PATCH v3] iommu/iova: put free_iova_mem() outside of spinlock iova_rbtree_lock

2021-05-18 Thread Joerg Roedel
On Tue, May 18, 2021 at 11:08:47AM +0200, Joerg Roedel wrote:
> Looks good to me, but I'll wait for Robin's opinion before applying.

Nevermind, just found the new version which Robin already acked.


Re: [PATCH v3] iommu/iova: put free_iova_mem() outside of spinlock iova_rbtree_lock

2021-05-18 Thread Robin Murphy

On 2021-05-18 10:08, Joerg Roedel wrote:

On Mon, Apr 19, 2021 at 03:13:35PM +0800, chenxiang wrote:

From: Xiang Chen 

It is not necessary to put free_iova_mem() inside of the spinlock/unlock
of iova_rbtree_lock, which only leads to more contention on the spinlock.
The change gives a small improvement in performance. Also rename
private_free_iova() to remove_iova(), because the function will no longer
free the iova after this change.

Signed-off-by: Xiang Chen 
---
  drivers/iommu/iova.c | 18 +++---
  1 file changed, 11 insertions(+), 7 deletions(-)


Looks good to me, but I'll wait for Robin's opinion before applying.


Go ahead - I acked the resend[1] :)

Thanks,
Robin.

[1] 
https://lore.kernel.org/linux-iommu/9edc6849-f9ff-f6fc-30a0-8bb2e8d3b...@arm.com/



Re: [PATCH v3] iommu/iova: put free_iova_mem() outside of spinlock iova_rbtree_lock

2021-05-18 Thread chenxiang (M)

Hi Joerg,

On 2021/5/18 17:08, Joerg Roedel wrote:

On Mon, Apr 19, 2021 at 03:13:35PM +0800, chenxiang wrote:

From: Xiang Chen 

It is not necessary to put free_iova_mem() inside of the spinlock/unlock
of iova_rbtree_lock, which only leads to more contention on the spinlock.
The change gives a small improvement in performance. Also rename
private_free_iova() to remove_iova(), because the function will no longer
free the iova after this change.

Signed-off-by: Xiang Chen 
---
  drivers/iommu/iova.c | 18 +++---
  1 file changed, 11 insertions(+), 7 deletions(-)

Looks good to me, but I'll wait for Robin's opinion before applying.


I have re-sent v3, and Robin has given his acked-by:
https://www.mail-archive.com/iommu@lists.linux-foundation.org/msg50950.html




.




___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH -next] iommu/virtio: Add missing MODULE_DEVICE_TABLE

2021-05-18 Thread Joerg Roedel
On Sat, May 08, 2021 at 11:14:51AM +0800, Bixuan Cui wrote:
> This patch adds the missing MODULE_DEVICE_TABLE definition, which generates
> the correct modalias for automatic loading of this driver when it is built
> as an external module.
> 
> Reported-by: Hulk Robot 
> Signed-off-by: Bixuan Cui 
> ---
>  drivers/iommu/virtio-iommu.c | 1 +
>  1 file changed, 1 insertion(+)
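
For reference, the added definition is of this form (the id_table
identifier is assumed from the driver):

MODULE_DEVICE_TABLE(virtio, id_table);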

Applied for v5.13, thanks.


Re: [PATCH 1/4] iommu/amd: Fix wrong parentheses on page-specific invalidations

2021-05-18 Thread Joerg Roedel
On Sat, May 01, 2021 at 11:59:56PM -0700, Nadav Amit wrote:
> From: Nadav Amit 
> 
> The logic to determine the mask of page-specific invalidations was
> tested in userspace. As the code was copied into the kernel, the
> parentheses were mistakenly set in the wrong place, resulting in the
> wrong mask.
> 
> Fix it.
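
As a generic illustration of this failure mode (not the actual hunk):

	unsigned long a = 0x1000, b = 0x3000;
	unsigned long ok  = (a ^ b) >> 12;	/* 0x2: page bits that differ */
	unsigned long bad = a ^ (b >> 12);	/* 0x1003: a meaningless mask */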
> 
> Cc: Joerg Roedel 
> Cc: Will Deacon 
> Cc: Jiajun Cao 
> Cc: iommu@lists.linux-foundation.org
> Cc: linux-ker...@vger.kernel.org
> Fixes: 268aa4548277 ("iommu/amd: Page-specific invalidations for more than one page")
> Signed-off-by: Nadav Amit 
> ---
>  drivers/iommu/amd/iommu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Applied this one for v5.13, thanks Nadav.

Somehow the rest of the patch-set got screwed up during sending or so,
at least I see some patches twice in my inbox and with different
subjects.

Can you please re-send patches 2-4 when -rc3 is out?

Thanks,

Joerg


Re: [PATCH] iommu/amd: Clear DMA ops when switching domain

2021-05-18 Thread Joerg Roedel
On Thu, Apr 22, 2021 at 11:42:19AM +0200, Jean-Philippe Brucker wrote:
> Since commit 08a27c1c3ecf ("iommu: Add support to change default domain
> of an iommu group") a user can switch a device between IOMMU and direct
> DMA through sysfs. This doesn't work for AMD IOMMU at the moment because
> dev->dma_ops is not cleared when switching from a DMA to an identity
> IOMMU domain. The DMA layer thus attempts to use the dma-iommu ops on an
> identity domain, causing an oops:
> 
>   # echo :00:05.0 > /sys/sys/bus/pci/drivers/e1000e/unbind
>   # echo identity > /sys/bus/pci/devices/:00:05.0/iommu_group/type
>   # echo :00:05.0 > /sys/sys/bus/pci/drivers/e1000e/bind
>...
>   BUG: kernel NULL pointer dereference, address: 0028
>...
>Call Trace:
> iommu_dma_alloc
> e1000e_setup_tx_resources
> e1000e_open
> 
> Since iommu_change_dev_def_domain() calls probe_finalize() again, clear
> the dma_ops there like Vt-d does.
> 
> Fixes: 08a27c1c3ecf ("iommu: Add support to change default domain of an iommu group")
> Signed-off-by: Jean-Philippe Brucker 

Applied for v5.13, thanks Jean-Philippe.


Re: [PATCH v3] iommu/iova: put free_iova_mem() outside of spinlock iova_rbtree_lock

2021-05-18 Thread Joerg Roedel
On Mon, Apr 19, 2021 at 03:13:35PM +0800, chenxiang wrote:
> From: Xiang Chen 
> 
> It is not necessary to put free_iova_mem() inside of the spinlock/unlock
> of iova_rbtree_lock, which only leads to more contention on the spinlock.
> The change gives a small improvement in performance. Also rename
> private_free_iova() to remove_iova(), because the function will no longer
> free the iova after this change.
> 
> Signed-off-by: Xiang Chen 
> ---
>  drivers/iommu/iova.c | 18 +++---
>  1 file changed, 11 insertions(+), 7 deletions(-)

Looks good to me, but I'll wait for Robin's opinion before applying.
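
For reference, the reshuffle amounts to roughly the following (a sketch
based on the description; the exact code may differ):

	spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
	iova = private_find_iova(iovad, pfn);
	if (iova)
		remove_iova(iovad, iova);	/* detach only; no longer frees */
	spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);

	if (iova)
		free_iova_mem(iova);		/* now outside the spinlock */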



Re: [PATCH v4 2/8] iommu/dma: Introduce generic helper to retrieve RMR info

2021-05-18 Thread Joerg Roedel
On Thu, May 13, 2021 at 02:45:44PM +0100, Shameer Kolothum wrote:
> +/**
> + * struct iommu_rmr - Reserved Memory Region details per IOMMU
> + * @list: Linked list pointers to hold RMR region info
> + * @base_address: base address of Reserved Memory Region
> + * @length: length of memory region
> + * @sid: associated stream id
> + * @flags: flags that apply to the RMR node
> + */
> +struct iommu_rmr {
> + struct list_head	list;
> + phys_addr_t base_address;
> + u64 length;
> + u32 sid;
> + u32 flags;
> +};
> +
> +/* RMR Remap permitted */
> +#define IOMMU_RMR_REMAP_PERMITTED	(1 << 0)
> +

This struct has lots of overlap with 'struct iommu_resv_region'. Any
reason the existing struct can't be used here?

Regards,

Joerg



Re: [PATCH v6 00/15] Restricted DMA

2021-05-18 Thread Claire Chang
v7: https://lore.kernel.org/patchwork/cover/1431031/

On Mon, May 10, 2021 at 5:50 PM Claire Chang  wrote:
>
> From: Claire Chang 
>
> This series implements mitigations for lack of DMA access control on
> systems without an IOMMU, which could result in the DMA accessing the
> system memory at unexpected times and/or unexpected addresses, possibly
> leading to data leakage or corruption.
>
> For example, we plan to use the PCI-e bus for Wi-Fi and that PCI-e bus is
> not behind an IOMMU. As PCI-e, by design, gives the device full access to
> system memory, a vulnerability in the Wi-Fi firmware could easily escalate
> to a full system exploit (remote wifi exploits: [1a], [1b] that shows a
> full chain of exploits; [2], [3]).
>
> To mitigate the security concerns, we introduce restricted DMA. Restricted
> DMA utilizes the existing swiotlb to bounce streaming DMA in and out of a
> specially allocated region and does memory allocation from the same region.
> The feature on its own provides a basic level of protection against the DMA
> overwriting buffer contents at unexpected times. However, to protect
> against general data leakage and system memory corruption, the system needs
> to provide a way to restrict the DMA to a predefined memory region (this is
> usually done at firmware level, e.g. MPU in ATF on some ARM platforms [4]).
>
> [1a] 
> https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_4.html
> [1b] 
> https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_11.html
> [2] https://blade.tencent.com/en/advisories/qualpwn/
> [3] 
> https://www.bleepingcomputer.com/news/security/vulnerabilities-found-in-highly-popular-firmware-for-wifi-chips/
> [4] 
> https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132
>
> v6:
> Address the comments in v5
>
> v5:
> Rebase on latest linux-next
> https://lore.kernel.org/patchwork/cover/1416899/
>
> v4:
> - Fix spinlock bad magic
> - Use rmem->name for debugfs entry
> - Address the comments in v3
> https://lore.kernel.org/patchwork/cover/1378113/
>
> v3:
> Using only one reserved memory region for both streaming DMA and memory
> allocation.
> https://lore.kernel.org/patchwork/cover/1360992/
>
> v2:
> Building on top of swiotlb.
> https://lore.kernel.org/patchwork/cover/1280705/
>
> v1:
> Using dma_map_ops.
> https://lore.kernel.org/patchwork/cover/1271660/
> *** BLURB HERE ***
>
> Claire Chang (15):
>   swiotlb: Refactor swiotlb init functions
>   swiotlb: Refactor swiotlb_create_debugfs
>   swiotlb: Add DMA_RESTRICTED_POOL
>   swiotlb: Add restricted DMA pool initialization
>   swiotlb: Add a new get_io_tlb_mem getter
>   swiotlb: Update is_swiotlb_buffer to add a struct device argument
>   swiotlb: Update is_swiotlb_active to add a struct device argument
>   swiotlb: Bounce data from/to restricted DMA pool if available
>   swiotlb: Move alloc_size to find_slots
>   swiotlb: Refactor swiotlb_tbl_unmap_single
>   dma-direct: Add a new wrapper __dma_direct_free_pages()
>   swiotlb: Add restricted DMA alloc/free support.
>   dma-direct: Allocate memory from restricted DMA pool if available
>   dt-bindings: of: Add restricted DMA pool
>   of: Add plumbing for restricted DMA pool
>
>  .../reserved-memory/reserved-memory.txt   |  27 ++
>  drivers/gpu/drm/i915/gem/i915_gem_internal.c  |   2 +-
>  drivers/gpu/drm/nouveau/nouveau_ttm.c |   2 +-
>  drivers/iommu/dma-iommu.c |  12 +-
>  drivers/of/address.c  |  25 ++
>  drivers/of/device.c   |   3 +
>  drivers/of/of_private.h   |   5 +
>  drivers/pci/xen-pcifront.c|   2 +-
>  drivers/xen/swiotlb-xen.c |   2 +-
>  include/linux/device.h|   4 +
>  include/linux/swiotlb.h   |  41 ++-
>  kernel/dma/Kconfig|  14 +
>  kernel/dma/direct.c   |  63 +++--
>  kernel/dma/direct.h   |   9 +-
>  kernel/dma/swiotlb.c  | 242 +-
>  15 files changed, 356 insertions(+), 97 deletions(-)
>
> --
> 2.31.1.607.g51e8a6a459-goog
>


Re: [PATCH v7 04/15] swiotlb: Add restricted DMA pool initialization

2021-05-18 Thread Claire Chang
I didn't move this to a separate file because I feel it might be
confusing for swiotlb_alloc/free (and it would require more functions
to be non-static).
Maybe instead of moving it to a separate file, we can try to come up
with better naming?


Re: [PATCH v7 05/15] swiotlb: Add a new get_io_tlb_mem getter

2021-05-18 Thread Claire Chang
I still keep this function because directly using dev->dma_io_tlb_mem
would cause issues for memory allocation on existing devices. The pool
can't support atomic coherent allocation, so we need to distinguish the
per-device pool from the default pool in swiotlb_alloc.
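
A minimal sketch of the distinction, using the names from this series
(dev->dma_io_tlb_mem and io_tlb_default_mem):

/*
 * swiotlb_alloc() must use the per-device restricted pool directly and
 * bail out when none is attached; falling back to the default pool via
 * get_io_tlb_mem() would hand out slots that cannot back atomic
 * coherent allocations.
 */
struct io_tlb_mem *mem = dev->dma_io_tlb_mem;	/* not get_io_tlb_mem(dev) */

if (!mem)
	return NULL;	/* caller falls back to dma_alloc_contiguous() */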


[PATCH v7 15/15] of: Add plumbing for restricted DMA pool

2021-05-18 Thread Claire Chang
If a device is not behind an IOMMU, we look up the device node and set
up the restricted DMA pool when a restricted-dma-pool region is present.

Signed-off-by: Claire Chang 
---
 drivers/of/address.c| 25 +
 drivers/of/device.c |  3 +++
 drivers/of/of_private.h |  5 +
 3 files changed, 33 insertions(+)

diff --git a/drivers/of/address.c b/drivers/of/address.c
index aca94c348bd4..c562a9ff5f0b 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -8,6 +8,7 @@
 #include <linux/logic_pio.h>
 #include <linux/module.h>
 #include <linux/of_address.h>
+#include <linux/of_reserved_mem.h>
 #include <linux/pci.h>
 #include <linux/pci_regs.h>
 #include <linux/sizes.h>
@@ -1112,6 +1113,30 @@ bool of_dma_is_coherent(struct device_node *np)
 }
 EXPORT_SYMBOL_GPL(of_dma_is_coherent);
 
+int of_dma_set_restricted_buffer(struct device *dev)
+{
+   struct device_node *node;
+   int count, i;
+
+   if (!dev->of_node)
+   return 0;
+
+   count = of_property_count_elems_of_size(dev->of_node, "memory-region",
+   sizeof(phandle));
+   for (i = 0; i < count; i++) {
+   node = of_parse_phandle(dev->of_node, "memory-region", i);
+   /* There might be multiple memory regions, but only one
+* restricted-dma-pool region is allowed.
+*/
+   if (of_device_is_compatible(node, "restricted-dma-pool") &&
+   of_device_is_available(node))
+   return of_reserved_mem_device_init_by_idx(
+   dev, dev->of_node, i);
+   }
+
+   return 0;
+}
+
 /**
  * of_mmio_is_nonposted - Check if device uses non-posted MMIO
 * @np:	device node
diff --git a/drivers/of/device.c b/drivers/of/device.c
index c5a9473a5fb1..d8d865223e51 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -165,6 +165,9 @@ int of_dma_configure_id(struct device *dev, struct device_node *np,
 
arch_setup_dma_ops(dev, dma_start, size, iommu, coherent);
 
+   if (!iommu)
+   return of_dma_set_restricted_buffer(dev);
+
return 0;
 }
 EXPORT_SYMBOL_GPL(of_dma_configure_id);
diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h
index d717efbd637d..9fc874548528 100644
--- a/drivers/of/of_private.h
+++ b/drivers/of/of_private.h
@@ -163,12 +163,17 @@ struct bus_dma_region;
 #if defined(CONFIG_OF_ADDRESS) && defined(CONFIG_HAS_DMA)
 int of_dma_get_range(struct device_node *np,
const struct bus_dma_region **map);
+int of_dma_set_restricted_buffer(struct device *dev);
 #else
 static inline int of_dma_get_range(struct device_node *np,
const struct bus_dma_region **map)
 {
return -ENODEV;
 }
+static inline int of_dma_set_restricted_buffer(struct device *dev)
+{
+   return -ENODEV;
+}
 #endif
 
 #endif /* _LINUX_OF_PRIVATE_H */
-- 
2.31.1.751.gd2f1c929bd-goog



[PATCH v7 14/15] dt-bindings: of: Add restricted DMA pool

2021-05-18 Thread Claire Chang
Introduce the new compatible string, restricted-dma-pool, for restricted
DMA. One can specify the address and length of the restricted DMA memory
region by restricted-dma-pool in the reserved-memory node.

Signed-off-by: Claire Chang 
---
 .../reserved-memory/reserved-memory.txt   | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
index e8d3096d922c..284aea659015 100644
--- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
+++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
@@ -51,6 +51,23 @@ compatible (optional) - standard definition
   used as a shared pool of DMA buffers for a set of devices. It can
   be used by an operating system to instantiate the necessary pool
   management subsystem if necessary.
+- restricted-dma-pool: This indicates a region of memory meant to be
+  used as a pool of restricted DMA buffers for a set of devices. The
+  memory region would be the only region accessible to those devices.
+  When using this, the no-map and reusable properties must not be set,
+  so the operating system can create a virtual mapping that will be used
+  for synchronization. The main purpose for restricted DMA is to
+  mitigate the lack of DMA access control on systems without an IOMMU,
+  which could result in the DMA accessing the system memory at
+  unexpected times and/or unexpected addresses, possibly leading to data
+  leakage or corruption. The feature on its own provides a basic level
+  of protection against the DMA overwriting buffer contents at
+  unexpected times. However, to protect against general data leakage and
+  system memory corruption, the system needs to provide a way to lock
+  down the memory access, e.g., MPU. Note that since coherent allocation
+  needs remapping, one must set up another device coherent pool by
+  shared-dma-pool and use dma_alloc_from_dev_coherent instead for atomic
+  coherent allocation.
 - vendor specific string in the form ,[-]
 no-map (optional) - empty property
 - Indicates the operating system must not create a virtual mapping
@@ -120,6 +137,11 @@ one for multimedia processing (named multimedia-memory@77000000, 64MiB).
 		compatible = "acme,multimedia-memory";
 		reg = <0x77000000 0x4000000>;
};
+
+	restricted_dma_mem_reserved: restricted_dma_mem_reserved {
+		compatible = "restricted-dma-pool";
+		reg = <0x50000000 0x400000>;
+	};
};
 
/* ... */
@@ -138,4 +160,9 @@ one for multimedia processing (named multimedia-memory@77000000, 64MiB).
 		memory-region = <&multimedia_reserved>;
/* ... */
};
+
+	pcie_device: pcie_device@0,0 {
+		memory-region = <&restricted_dma_mem_reserved>;
+		/* ... */
+	};
 };
-- 
2.31.1.751.gd2f1c929bd-goog



[PATCH v7 13/15] dma-direct: Allocate memory from restricted DMA pool if available

2021-05-18 Thread Claire Chang
The restricted DMA pool is preferred if available.

The restricted DMA pools provide a basic level of protection against the
DMA overwriting buffer contents at unexpected times. However, to protect
against general data leakage and system memory corruption, the system
needs to provide a way to lock down the memory access, e.g., MPU.

Note that since coherent allocation needs remapping, one must set up
another device coherent pool by shared-dma-pool and use
dma_alloc_from_dev_coherent instead for atomic coherent allocation.
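
As a sketch of the intended pairing (not part of this patch): once a
second region with compatible = "shared-dma-pool" is attached to the
device, the atomic coherent case is served from that pool before
dma-direct is entered at all, via the helper named above:

	void *vaddr;
	dma_addr_t dma_handle;

	/* Tried first by dma_alloc_attrs(); returns nonzero when the
	 * request was satisfied from the device coherent pool. */
	if (dma_alloc_from_dev_coherent(dev, size, &dma_handle, &vaddr))
		return vaddr;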

Signed-off-by: Claire Chang 
---
 kernel/dma/direct.c | 38 +-
 1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index eb4098323bbc..0d521f78c7b9 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -78,6 +78,10 @@ static bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size)
 static void __dma_direct_free_pages(struct device *dev, struct page *page,
size_t size)
 {
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+   if (swiotlb_free(dev, page, size))
+   return;
+#endif
dma_free_contiguous(dev, page, size);
 }
 
@@ -92,7 +96,17 @@ static struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
 
gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
					   &phys_limit);
-   page = dma_alloc_contiguous(dev, size, gfp);
+
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+   page = swiotlb_alloc(dev, size);
+   if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
+   __dma_direct_free_pages(dev, page, size);
+   page = NULL;
+   }
+#endif
+
+   if (!page)
+   page = dma_alloc_contiguous(dev, size, gfp);
if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
dma_free_contiguous(dev, page, size);
page = NULL;
@@ -148,7 +162,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
gfp |= __GFP_NOWARN;
 
if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&
-   !force_dma_unencrypted(dev)) {
+   !force_dma_unencrypted(dev) && !is_dev_swiotlb_force(dev)) {
page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO);
if (!page)
return NULL;
@@ -161,18 +175,23 @@ void *dma_direct_alloc(struct device *dev, size_t size,
}
 
if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) &&
-   !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
-   !dev_is_dma_coherent(dev))
+   !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && !dev_is_dma_coherent(dev) &&
+   !is_dev_swiotlb_force(dev))
return arch_dma_alloc(dev, size, dma_handle, gfp, attrs);
 
/*
 * Remapping or decrypting memory may block. If either is required and
 * we can't block, allocate the memory from the atomic pools.
+* If restricted DMA (i.e., is_dev_swiotlb_force) is required, one must
+* set up another device coherent pool by shared-dma-pool and use
+* dma_alloc_from_dev_coherent instead.
 */
if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) &&
!gfpflags_allow_blocking(gfp) &&
(force_dma_unencrypted(dev) ||
-	     (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && !dev_is_dma_coherent(dev))))
+(IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
+ !dev_is_dma_coherent(dev))) &&
+   !is_dev_swiotlb_force(dev))
return dma_direct_alloc_from_pool(dev, size, dma_handle, gfp);
 
/* we always manually zero the memory once we are done */
@@ -253,15 +272,15 @@ void dma_direct_free(struct device *dev, size_t size,
unsigned int page_order = get_order(size);
 
if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&
-   !force_dma_unencrypted(dev)) {
+   !force_dma_unencrypted(dev) && !is_dev_swiotlb_force(dev)) {
/* cpu_addr is a struct page cookie, not a kernel address */
dma_free_contiguous(dev, cpu_addr, size);
return;
}
 
if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) &&
-   !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
-   !dev_is_dma_coherent(dev)) {
+   !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && !dev_is_dma_coherent(dev) &&
+   !is_dev_swiotlb_force(dev)) {
arch_dma_free(dev, size, cpu_addr, dma_addr, attrs);
return;
}
@@ -289,7 +308,8 @@ struct page *dma_direct_alloc_pages(struct device *dev, size_t size,
void *ret;
 
if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) &&
-   force_dma_unencrypted(dev) && !gfpflags_allow_blocking(gfp))
+   force_dma_unencrypted(dev) && !gfpflags_allow_blocking(gfp) &&
+   !is_dev_swiotlb_force(dev))
	return dma_direct_alloc_from_pool(dev, size, dma_handle, gfp);

[PATCH v7 12/15] swiotlb: Add restricted DMA alloc/free support.

2021-05-18 Thread Claire Chang
Add the functions, swiotlb_{alloc,free} to support the memory allocation
from restricted DMA pool.
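
For context, the intended consumer is dma-direct (see the next patch in
the series); roughly:

	struct page *page = swiotlb_alloc(dev, size);	/* NULL if no pool */
	...
	if (!swiotlb_free(dev, page, size))	/* false if not from the pool */
		dma_free_contiguous(dev, page, size);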

Signed-off-by: Claire Chang 
---
 include/linux/swiotlb.h |  4 
 kernel/dma/swiotlb.c| 35 +--
 2 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 0c5a18d9cf89..e8cf49bd90c5 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -134,6 +134,10 @@ unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
 bool is_swiotlb_active(struct device *dev);
 void __init swiotlb_adjust_size(unsigned long size);
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+struct page *swiotlb_alloc(struct device *dev, size_t size);
+bool swiotlb_free(struct device *dev, struct page *page, size_t size);
+#endif /* CONFIG_DMA_RESTRICTED_POOL */
 #else
 #define swiotlb_force SWIOTLB_NO_FORCE
 static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index cef856d23194..d3fa4669229b 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -457,8 +457,9 @@ static int find_slots(struct device *dev, phys_addr_t orig_addr,
 
index = wrap = wrap_index(mem, ALIGN(mem->index, stride));
do {
-   if ((slot_addr(tbl_dma_addr, index) & iotlb_align_mask) !=
-   (orig_addr & iotlb_align_mask)) {
+   if (orig_addr &&
+   (slot_addr(tbl_dma_addr, index) & iotlb_align_mask) !=
+   (orig_addr & iotlb_align_mask)) {
index = wrap_index(mem, index + 1);
continue;
}
@@ -704,6 +705,36 @@ late_initcall(swiotlb_create_default_debugfs);
 #endif
 
 #ifdef CONFIG_DMA_RESTRICTED_POOL
+struct page *swiotlb_alloc(struct device *dev, size_t size)
+{
+   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
+   phys_addr_t tlb_addr;
+   int index;
+
+   if (!mem)
+   return NULL;
+
+   index = find_slots(dev, 0, size);
+   if (index == -1)
+   return NULL;
+
+   tlb_addr = slot_addr(mem->start, index);
+
+   return pfn_to_page(PFN_DOWN(tlb_addr));
+}
+
+bool swiotlb_free(struct device *dev, struct page *page, size_t size)
+{
+   phys_addr_t tlb_addr = page_to_phys(page);
+
+   if (!is_swiotlb_buffer(dev, tlb_addr))
+   return false;
+
+   release_slots(dev, tlb_addr);
+
+   return true;
+}
+
 static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
struct device *dev)
 {
-- 
2.31.1.751.gd2f1c929bd-goog



[PATCH v7 11/15] dma-direct: Add a new wrapper __dma_direct_free_pages()

2021-05-18 Thread Claire Chang
Add a new wrapper __dma_direct_free_pages() that will be useful later
for swiotlb_free().

Signed-off-by: Claire Chang 
---
 kernel/dma/direct.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 078f7087e466..eb4098323bbc 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -75,6 +75,12 @@ static bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size)
min_not_zero(dev->coherent_dma_mask, dev->bus_dma_limit);
 }
 
+static void __dma_direct_free_pages(struct device *dev, struct page *page,
+   size_t size)
+{
+   dma_free_contiguous(dev, page, size);
+}
+
 static struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
gfp_t gfp)
 {
@@ -237,7 +243,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
return NULL;
}
 out_free_pages:
-   dma_free_contiguous(dev, page, size);
+   __dma_direct_free_pages(dev, page, size);
return NULL;
 }
 
@@ -273,7 +279,7 @@ void dma_direct_free(struct device *dev, size_t size,
else if (IS_ENABLED(CONFIG_ARCH_HAS_DMA_CLEAR_UNCACHED))
arch_dma_clear_uncached(cpu_addr, size);
 
-   dma_free_contiguous(dev, dma_direct_to_page(dev, dma_addr), size);
+   __dma_direct_free_pages(dev, dma_direct_to_page(dev, dma_addr), size);
 }
 
 struct page *dma_direct_alloc_pages(struct device *dev, size_t size,
@@ -310,7 +316,7 @@ struct page *dma_direct_alloc_pages(struct device *dev, size_t size,
*dma_handle = phys_to_dma_direct(dev, page_to_phys(page));
return page;
 out_free_pages:
-   dma_free_contiguous(dev, page, size);
+   __dma_direct_free_pages(dev, page, size);
return NULL;
 }
 
@@ -329,7 +335,7 @@ void dma_direct_free_pages(struct device *dev, size_t size,
if (force_dma_unencrypted(dev))
set_memory_encrypted((unsigned long)vaddr, 1 << page_order);
 
-   dma_free_contiguous(dev, page, size);
+   __dma_direct_free_pages(dev, page, size);
 }
 
 #if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
-- 
2.31.1.751.gd2f1c929bd-goog



[PATCH v7 10/15] swiotlb: Refactor swiotlb_tbl_unmap_single

2021-05-18 Thread Claire Chang
Add a new function, release_slots, to make the code reusable for supporting
different bounce buffer pools, e.g. restricted DMA pool.
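
The split, in sketch form: the unmap path keeps its sync and then
returns the slots, while a free path (added later in the series) can
return slots without any sync:

	/* swiotlb_tbl_unmap_single() */
	swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_FROM_DEVICE);
	release_slots(dev, tlb_addr);

	/* swiotlb_free(), later in the series */
	release_slots(dev, tlb_addr);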

Signed-off-by: Claire Chang 
---
 kernel/dma/swiotlb.c | 35 ---
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 2ec6711071de..cef856d23194 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -550,27 +550,15 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
return tlb_addr;
 }
 
-/*
- * tlb_addr is the physical address of the bounce buffer to unmap.
- */
-void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
- size_t mapping_size, enum dma_data_direction dir,
- unsigned long attrs)
+static void release_slots(struct device *dev, phys_addr_t tlb_addr)
 {
-   struct io_tlb_mem *mem = get_io_tlb_mem(hwdev);
+   struct io_tlb_mem *mem = get_io_tlb_mem(dev);
unsigned long flags;
-   unsigned int offset = swiotlb_align_offset(hwdev, tlb_addr);
+   unsigned int offset = swiotlb_align_offset(dev, tlb_addr);
int index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
int nslots = nr_slots(mem->slots[index].alloc_size + offset);
int count, i;
 
-   /*
-* First, sync the memory before unmapping the entry
-*/
-   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
-   (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL))
-   swiotlb_bounce(hwdev, tlb_addr, mapping_size, DMA_FROM_DEVICE);
-
/*
 * Return the buffer to the free list by setting the corresponding
 * entries to indicate the number of contiguous entries available.
@@ -605,6 +593,23 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
 	spin_unlock_irqrestore(&mem->lock, flags);
 }
 
+/*
+ * tlb_addr is the physical address of the bounce buffer to unmap.
+ */
+void swiotlb_tbl_unmap_single(struct device *dev, phys_addr_t tlb_addr,
+ size_t mapping_size, enum dma_data_direction dir,
+ unsigned long attrs)
+{
+   /*
+* First, sync the memory before unmapping the entry
+*/
+   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
+   (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL))
+   swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_FROM_DEVICE);
+
+   release_slots(dev, tlb_addr);
+}
+
 void swiotlb_sync_single_for_device(struct device *dev, phys_addr_t tlb_addr,
size_t size, enum dma_data_direction dir)
 {
-- 
2.31.1.751.gd2f1c929bd-goog



[PATCH v7 09/15] swiotlb: Move alloc_size to find_slots

2021-05-18 Thread Claire Chang
Move the maintenance of alloc_size to find_slots for better code
reusability later.
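
A worked example, assuming IO_TLB_SHIFT == 11 (2 KiB slots): for
alloc_size = 5000 bytes, nr_slots() yields 3, and find_slots() now
records

	mem->slots[index + 0].alloc_size = 5000 - (0 << 11) = 5000
	mem->slots[index + 1].alloc_size = 5000 - (1 << 11) = 2952
	mem->slots[index + 2].alloc_size = 5000 - (2 << 11) = 904

so each slot of a mapping knows how many bytes remain from its own
offset, regardless of whether the caller is swiotlb_tbl_map_single() or
the upcoming swiotlb_alloc().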

Signed-off-by: Claire Chang 
---
 kernel/dma/swiotlb.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 95f482c4408c..2ec6711071de 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -482,8 +482,11 @@ static int find_slots(struct device *dev, phys_addr_t orig_addr,
return -1;
 
 found:
-   for (i = index; i < index + nslots; i++)
+   for (i = index; i < index + nslots; i++) {
mem->slots[i].list = 0;
+   mem->slots[i].alloc_size =
+   alloc_size - ((i - index) << IO_TLB_SHIFT);
+   }
for (i = index - 1;
 io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 &&
 mem->slots[i].list; i--)
@@ -538,11 +541,8 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
 * This is needed when we sync the memory.  Then we sync the buffer if
 * needed.
 */
-   for (i = 0; i < nr_slots(alloc_size + offset); i++) {
+   for (i = 0; i < nr_slots(alloc_size + offset); i++)
mem->slots[index + i].orig_addr = slot_addr(orig_addr, i);
-   mem->slots[index + i].alloc_size =
-   alloc_size - (i << IO_TLB_SHIFT);
-   }
tlb_addr = slot_addr(mem->start, index) + offset;
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
(dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
-- 
2.31.1.751.gd2f1c929bd-goog



[PATCH v7 08/15] swiotlb: Bounce data from/to restricted DMA pool if available

2021-05-18 Thread Claire Chang
Regardless of swiotlb setting, the restricted DMA pool is preferred if
available.

The restricted DMA pools provide a basic level of protection against the
DMA overwriting buffer contents at unexpected times. However, to protect
against general data leakage and system memory corruption, the system
needs to provide a way to lock down the memory access, e.g., MPU.

Note that is_dev_swiotlb_force doesn't check if
swiotlb_force == SWIOTLB_FORCE. Otherwise the memory allocation behavior
with default swiotlb will be changed by the following patch
("dma-direct: Allocate memory from restricted DMA pool if available").

Signed-off-by: Claire Chang 
---
 include/linux/swiotlb.h | 13 +
 kernel/dma/direct.c |  3 ++-
 kernel/dma/direct.h |  3 ++-
 kernel/dma/swiotlb.c|  8 
 4 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index c530c976d18b..0c5a18d9cf89 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -120,6 +120,15 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
return mem && paddr >= mem->start && paddr < mem->end;
 }
 
+static inline bool is_dev_swiotlb_force(struct device *dev)
+{
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+   if (dev->dma_io_tlb_mem)
+   return true;
+#endif /* CONFIG_DMA_RESTRICTED_POOL */
+   return false;
+}
+
 void __init swiotlb_exit(void);
 unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
@@ -131,6 +140,10 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
return false;
 }
+static inline bool is_dev_swiotlb_force(struct device *dev)
+{
+   return false;
+}
 static inline void swiotlb_exit(void)
 {
 }
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 7a88c34d0867..078f7087e466 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -496,7 +496,8 @@ size_t dma_direct_max_mapping_size(struct device *dev)
 {
/* If SWIOTLB is active, use its maximum mapping size */
if (is_swiotlb_active(dev) &&
-   (dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE))
+   (dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE ||
+is_dev_swiotlb_force(dev)))
return swiotlb_max_mapping_size(dev);
return SIZE_MAX;
 }
diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
index 13e9e7158d94..f94813674e23 100644
--- a/kernel/dma/direct.h
+++ b/kernel/dma/direct.h
@@ -87,7 +87,8 @@ static inline dma_addr_t dma_direct_map_page(struct device *dev,
phys_addr_t phys = page_to_phys(page) + offset;
dma_addr_t dma_addr = phys_to_dma(dev, phys);
 
-   if (unlikely(swiotlb_force == SWIOTLB_FORCE))
+   if (unlikely(swiotlb_force == SWIOTLB_FORCE) ||
+   is_dev_swiotlb_force(dev))
return swiotlb_map(dev, phys, size, dir, attrs);
 
if (unlikely(!dma_capable(dev, dma_addr, size, true))) {
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 68e7633f11fe..95f482c4408c 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -347,7 +347,7 @@ void __init swiotlb_exit(void)
 static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size,
 			   enum dma_data_direction dir)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = get_io_tlb_mem(dev);
int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;
phys_addr_t orig_addr = mem->slots[index].orig_addr;
size_t alloc_size = mem->slots[index].alloc_size;
@@ -429,7 +429,7 @@ static unsigned int wrap_index(struct io_tlb_mem *mem, unsigned int index)
 static int find_slots(struct device *dev, phys_addr_t orig_addr,
size_t alloc_size)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = get_io_tlb_mem(dev);
unsigned long boundary_mask = dma_get_seg_boundary(dev);
dma_addr_t tbl_dma_addr =
phys_to_dma_unencrypted(dev, mem->start) & boundary_mask;
@@ -506,7 +506,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
size_t mapping_size, size_t alloc_size,
enum dma_data_direction dir, unsigned long attrs)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = get_io_tlb_mem(dev);
unsigned int offset = swiotlb_align_offset(dev, orig_addr);
unsigned int i;
int index;
@@ -557,7 +557,7 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
  size_t mapping_size, enum dma_data_direction dir,
  unsigned long attrs)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = get_io_tlb_mem(hwdev);
unsigned long flags;
	unsigned int offset = swiotlb_align_offset(hwdev, tlb_addr);

[PATCH v7 07/15] swiotlb: Update is_swiotlb_active to add a struct device argument

2021-05-18 Thread Claire Chang
Update is_swiotlb_active to add a struct device argument. This will be
useful later to allow for restricted DMA pool.

Signed-off-by: Claire Chang 
---
 drivers/gpu/drm/i915/gem/i915_gem_internal.c | 2 +-
 drivers/gpu/drm/nouveau/nouveau_ttm.c| 2 +-
 drivers/pci/xen-pcifront.c   | 2 +-
 include/linux/swiotlb.h  | 4 ++--
 kernel/dma/direct.c  | 2 +-
 kernel/dma/swiotlb.c | 4 ++--
 6 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
index ce6b664b10aa..7d48c433446b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
@@ -42,7 +42,7 @@ static int i915_gem_object_get_pages_internal(struct drm_i915_gem_object *obj)
 
max_order = MAX_ORDER;
 #ifdef CONFIG_SWIOTLB
-   if (is_swiotlb_active()) {
+   if (is_swiotlb_active(NULL)) {
unsigned int max_segment;
 
max_segment = swiotlb_max_segment();
diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c b/drivers/gpu/drm/nouveau/nouveau_ttm.c
index e8b506a6685b..2a2ae6d6cf6d 100644
--- a/drivers/gpu/drm/nouveau/nouveau_ttm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_ttm.c
@@ -321,7 +321,7 @@ nouveau_ttm_init(struct nouveau_drm *drm)
}
 
 #if IS_ENABLED(CONFIG_SWIOTLB) && IS_ENABLED(CONFIG_X86)
-   need_swiotlb = is_swiotlb_active();
+   need_swiotlb = is_swiotlb_active(NULL);
 #endif
 
	ret = ttm_device_init(&drm->ttm.bdev, &nouveau_bo_driver, drm->dev->dev,
diff --git a/drivers/pci/xen-pcifront.c b/drivers/pci/xen-pcifront.c
index b7a8f3a1921f..6d548ce53ce7 100644
--- a/drivers/pci/xen-pcifront.c
+++ b/drivers/pci/xen-pcifront.c
@@ -693,7 +693,7 @@ static int pcifront_connect_and_init_dma(struct pcifront_device *pdev)
 
	spin_unlock(&pcifront_dev_lock);
 
-   if (!err && !is_swiotlb_active()) {
+   if (!err && !is_swiotlb_active(NULL)) {
err = pci_xen_swiotlb_init_late();
if (err)
			dev_err(&pdev->xdev->dev, "Could not setup SWIOTLB!\n");
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 2a6cca07540b..c530c976d18b 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -123,7 +123,7 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 void __init swiotlb_exit(void);
 unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
-bool is_swiotlb_active(void);
+bool is_swiotlb_active(struct device *dev);
 void __init swiotlb_adjust_size(unsigned long size);
 #else
 #define swiotlb_force SWIOTLB_NO_FORCE
@@ -143,7 +143,7 @@ static inline size_t swiotlb_max_mapping_size(struct device *dev)
return SIZE_MAX;
 }
 
-static inline bool is_swiotlb_active(void)
+static inline bool is_swiotlb_active(struct device *dev)
 {
return false;
 }
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 84c9feb5474a..7a88c34d0867 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -495,7 +495,7 @@ int dma_direct_supported(struct device *dev, u64 mask)
 size_t dma_direct_max_mapping_size(struct device *dev)
 {
/* If SWIOTLB is active, use its maximum mapping size */
-   if (is_swiotlb_active() &&
+   if (is_swiotlb_active(dev) &&
(dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE))
return swiotlb_max_mapping_size(dev);
return SIZE_MAX;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 1d8eb4de0d01..68e7633f11fe 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -662,9 +662,9 @@ size_t swiotlb_max_mapping_size(struct device *dev)
return ((size_t)IO_TLB_SIZE) * IO_TLB_SEGSIZE;
 }
 
-bool is_swiotlb_active(void)
+bool is_swiotlb_active(struct device *dev)
 {
-   return io_tlb_default_mem != NULL;
+   return get_io_tlb_mem(dev) != NULL;
 }
 EXPORT_SYMBOL_GPL(is_swiotlb_active);
 
-- 
2.31.1.751.gd2f1c929bd-goog



[PATCH v7 05/15] swiotlb: Add a new get_io_tlb_mem getter

2021-05-18 Thread Claire Chang
Add a new getter, get_io_tlb_mem, to help select the io_tlb_mem struct.
The restricted DMA pool is preferred if available.

Signed-off-by: Claire Chang 
---
 include/linux/swiotlb.h | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 03ad6e3b4056..b469f04cca26 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -2,6 +2,7 @@
 #ifndef __LINUX_SWIOTLB_H
 #define __LINUX_SWIOTLB_H
 
+#include <linux/device.h>
 #include <linux/dma-direction.h>
 #include <linux/init.h>
 #include <linux/types.h>
@@ -102,6 +103,16 @@ struct io_tlb_mem {
 };
 extern struct io_tlb_mem *io_tlb_default_mem;
 
+static inline struct io_tlb_mem *get_io_tlb_mem(struct device *dev)
+{
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+   if (dev && dev->dma_io_tlb_mem)
+   return dev->dma_io_tlb_mem;
+#endif /* CONFIG_DMA_RESTRICTED_POOL */
+
+   return io_tlb_default_mem;
+}
+
 static inline bool is_swiotlb_buffer(phys_addr_t paddr)
 {
struct io_tlb_mem *mem = io_tlb_default_mem;
-- 
2.31.1.751.gd2f1c929bd-goog



[PATCH v7 06/15] swiotlb: Update is_swiotlb_buffer to add a struct device argument

2021-05-18 Thread Claire Chang
Update is_swiotlb_buffer to add a struct device argument. This will be
useful later to allow for restricted DMA pool.

Signed-off-by: Claire Chang 
---
 drivers/iommu/dma-iommu.c | 12 ++--
 drivers/xen/swiotlb-xen.c |  2 +-
 include/linux/swiotlb.h   |  6 +++---
 kernel/dma/direct.c   |  6 +++---
 kernel/dma/direct.h   |  6 +++---
 5 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 7bcdd1205535..a5df35bfd150 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -504,7 +504,7 @@ static void __iommu_dma_unmap_swiotlb(struct device *dev, dma_addr_t dma_addr,
 
__iommu_dma_unmap(dev, dma_addr, size);
 
-   if (unlikely(is_swiotlb_buffer(phys)))
+   if (unlikely(is_swiotlb_buffer(dev, phys)))
swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
 }
 
@@ -575,7 +575,7 @@ static dma_addr_t __iommu_dma_map_swiotlb(struct device *dev, phys_addr_t phys,
}
 
iova = __iommu_dma_map(dev, phys, aligned_size, prot, dma_mask);
-   if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(phys))
+   if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(dev, phys))
swiotlb_tbl_unmap_single(dev, phys, org_size, dir, attrs);
return iova;
 }
@@ -781,7 +781,7 @@ static void iommu_dma_sync_single_for_cpu(struct device *dev,
if (!dev_is_dma_coherent(dev))
arch_sync_dma_for_cpu(phys, size, dir);
 
-   if (is_swiotlb_buffer(phys))
+   if (is_swiotlb_buffer(dev, phys))
swiotlb_sync_single_for_cpu(dev, phys, size, dir);
 }
 
@@ -794,7 +794,7 @@ static void iommu_dma_sync_single_for_device(struct device *dev,
return;
 
phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
-   if (is_swiotlb_buffer(phys))
+   if (is_swiotlb_buffer(dev, phys))
swiotlb_sync_single_for_device(dev, phys, size, dir);
 
if (!dev_is_dma_coherent(dev))
@@ -815,7 +815,7 @@ static void iommu_dma_sync_sg_for_cpu(struct device *dev,
if (!dev_is_dma_coherent(dev))
arch_sync_dma_for_cpu(sg_phys(sg), sg->length, dir);
 
-   if (is_swiotlb_buffer(sg_phys(sg)))
+   if (is_swiotlb_buffer(dev, sg_phys(sg)))
swiotlb_sync_single_for_cpu(dev, sg_phys(sg),
sg->length, dir);
}
@@ -832,7 +832,7 @@ static void iommu_dma_sync_sg_for_device(struct device *dev,
return;
 
for_each_sg(sgl, sg, nelems, i) {
-   if (is_swiotlb_buffer(sg_phys(sg)))
+   if (is_swiotlb_buffer(dev, sg_phys(sg)))
swiotlb_sync_single_for_device(dev, sg_phys(sg),
   sg->length, dir);
 
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 4c89afc0df62..0c6ed09f8513 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -100,7 +100,7 @@ static int is_xen_swiotlb_buffer(struct device *dev, dma_addr_t dma_addr)
 * in our domain. Therefore _only_ check address within our domain.
 */
if (pfn_valid(PFN_DOWN(paddr)))
-   return is_swiotlb_buffer(paddr);
+   return is_swiotlb_buffer(dev, paddr);
return 0;
 }
 
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index b469f04cca26..2a6cca07540b 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -113,9 +113,9 @@ static inline struct io_tlb_mem *get_io_tlb_mem(struct device *dev)
return io_tlb_default_mem;
 }
 
-static inline bool is_swiotlb_buffer(phys_addr_t paddr)
+static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = get_io_tlb_mem(dev);
 
return mem && paddr >= mem->start && paddr < mem->end;
 }
@@ -127,7 +127,7 @@ bool is_swiotlb_active(void);
 void __init swiotlb_adjust_size(unsigned long size);
 #else
 #define swiotlb_force SWIOTLB_NO_FORCE
-static inline bool is_swiotlb_buffer(phys_addr_t paddr)
+static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
return false;
 }
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index f737e3347059..84c9feb5474a 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -343,7 +343,7 @@ void dma_direct_sync_sg_for_device(struct device *dev,
for_each_sg(sgl, sg, nents, i) {
phys_addr_t paddr = dma_to_phys(dev, sg_dma_address(sg));
 
-   if (unlikely(is_swiotlb_buffer(paddr)))
+   if (unlikely(is_swiotlb_buffer(dev, paddr)))
swiotlb_sync_single_for_device(dev, paddr, sg->length,
   dir);
 
@@ -369,7 +369,7 @@ void dma_direct_sync_sg_for_cpu(struct device *dev,

[PATCH v7 04/15] swiotlb: Add restricted DMA pool initialization

2021-05-18 Thread Claire Chang
Add the initialization function to create restricted DMA pools from
matching reserved-memory nodes.

Signed-off-by: Claire Chang 
---
 include/linux/device.h  |  4 +++
 include/linux/swiotlb.h |  3 +-
 kernel/dma/swiotlb.c| 76 +
 3 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/include/linux/device.h b/include/linux/device.h
index 38a2071cf776..4987608ea4ff 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -416,6 +416,7 @@ struct dev_links_info {
  * @dma_pools: Dma pools (if dma'ble device).
  * @dma_mem:   Internal for coherent mem override.
  * @cma_area:  Contiguous memory area for dma allocations
+ * @dma_io_tlb_mem: Internal for swiotlb io_tlb_mem override.
  * @archdata:  For arch-specific additions.
  * @of_node:   Associated device tree node.
  * @fwnode:Associated device node supplied by platform firmware.
@@ -521,6 +522,9 @@ struct device {
 #ifdef CONFIG_DMA_CMA
struct cma *cma_area;   /* contiguous memory area for dma
   allocations */
+#endif
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+   struct io_tlb_mem *dma_io_tlb_mem;
 #endif
/* arch specific additions */
struct dev_archdata archdata;
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 216854a5e513..03ad6e3b4056 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -72,7 +72,8 @@ extern enum swiotlb_force swiotlb_force;
  * range check to see if the memory was in fact allocated by this
  * API.
  * @nslabs:	The number of IO TLB blocks (in groups of 64) between @start and
- * @end. This is command line adjustable via setup_io_tlb_npages.
+ * @end. For default swiotlb, this is command line adjustable via
+ * setup_io_tlb_npages.
  * @used:  The number of used IO TLB block.
  * @list:  The free list describing the number of free entries available
  * from each index.
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index b849b01a446f..1d8eb4de0d01 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -39,6 +39,13 @@
 #ifdef CONFIG_DEBUG_FS
 #include <linux/debugfs.h>
 #endif
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+#include <linux/io.h>
+#include <linux/of.h>
+#include <linux/of_fdt.h>
+#include <linux/of_reserved_mem.h>
+#include <linux/slab.h>
+#endif
 
 #include <asm/io.h>
 #include <asm/dma.h>
@@ -690,3 +697,72 @@ static int __init swiotlb_create_default_debugfs(void)
 late_initcall(swiotlb_create_default_debugfs);
 
 #endif
+
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
+   struct device *dev)
+{
+   struct io_tlb_mem *mem = rmem->priv;
+   unsigned long nslabs = rmem->size >> IO_TLB_SHIFT;
+
+   if (dev->dma_io_tlb_mem)
+   return 0;
+
+   /*
+* Since multiple devices can share the same pool, the private data,
+* io_tlb_mem struct, will be initialized by the first device attached
+* to it.
+*/
+   if (!mem) {
+   mem = kzalloc(struct_size(mem, slots, nslabs), GFP_KERNEL);
+   if (!mem)
+   return -ENOMEM;
+
+		if (PageHighMem(pfn_to_page(PHYS_PFN(rmem->base)))) {
+   kfree(mem);
+   return -EINVAL;
+   }
+
+   swiotlb_init_io_tlb_mem(mem, rmem->base, nslabs, false);
+
+   rmem->priv = mem;
+
+   if (IS_ENABLED(CONFIG_DEBUG_FS))
+   swiotlb_create_debugfs(mem, rmem->name);
+   }
+
+   dev->dma_io_tlb_mem = mem;
+
+   return 0;
+}
+
+static void rmem_swiotlb_device_release(struct reserved_mem *rmem,
+   struct device *dev)
+{
+   if (dev)
+   dev->dma_io_tlb_mem = NULL;
+}
+
+static const struct reserved_mem_ops rmem_swiotlb_ops = {
+   .device_init = rmem_swiotlb_device_init,
+   .device_release = rmem_swiotlb_device_release,
+};
+
+static int __init rmem_swiotlb_setup(struct reserved_mem *rmem)
+{
+   unsigned long node = rmem->fdt_node;
+
+   if (of_get_flat_dt_prop(node, "reusable", NULL) ||
+   of_get_flat_dt_prop(node, "linux,cma-default", NULL) ||
+   of_get_flat_dt_prop(node, "linux,dma-default", NULL) ||
+   of_get_flat_dt_prop(node, "no-map", NULL))
+   return -EINVAL;
+
+	rmem->ops = &rmem_swiotlb_ops;
+	pr_info("Reserved memory: created device swiotlb memory pool at %pa, size %ld MiB\n",
+		&rmem->base, (unsigned long)rmem->size / SZ_1M);
+   return 0;
+}
+
+RESERVEDMEM_OF_DECLARE(dma, "restricted-dma-pool", rmem_swiotlb_setup);
+#endif /* CONFIG_DMA_RESTRICTED_POOL */
-- 
2.31.1.751.gd2f1c929bd-goog



[PATCH v7 03/15] swiotlb: Add DMA_RESTRICTED_POOL

2021-05-18 Thread Claire Chang
Add a new kconfig symbol, DMA_RESTRICTED_POOL, for restricted DMA pool.

Signed-off-by: Claire Chang 
---
 kernel/dma/Kconfig | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 77b405508743..3e961dc39634 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -80,6 +80,20 @@ config SWIOTLB
bool
select NEED_DMA_MAP_STATE
 
+config DMA_RESTRICTED_POOL
+   bool "DMA Restricted Pool"
+   depends on OF && OF_RESERVED_MEM
+   select SWIOTLB
+   help
+ This enables support for restricted DMA pools which provide a level of
+ DMA memory protection on systems with limited hardware protection
+ capabilities, such as those lacking an IOMMU.
+
+ For more information see
+ <Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt>
+ and <kernel/dma/swiotlb.c>.
+ If unsure, say "n".
+
 #
 # Should be selected if we can mmap non-coherent mappings to userspace.
 # The only thing that is really required is a way to set an uncached bit
-- 
2.31.1.751.gd2f1c929bd-goog



[PATCH v7 02/15] swiotlb: Refactor swiotlb_create_debugfs

2021-05-18 Thread Claire Chang
Split the debugfs creation to make the code reusable for supporting
different bounce buffer pools, e.g. restricted DMA pool.
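
The resulting debugfs layout, as a sketch (the per-device pools arrive
later in the series and register themselves with
swiotlb_create_debugfs(mem, rmem->name)):

	/sys/kernel/debug/swiotlb/io_tlb_nslabs
	/sys/kernel/debug/swiotlb/io_tlb_used
	/sys/kernel/debug/swiotlb/<rmem name>/io_tlb_nslabs
	/sys/kernel/debug/swiotlb/<rmem name>/io_tlb_used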

Signed-off-by: Claire Chang 
---
 kernel/dma/swiotlb.c | 25 +++--
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index d3232fc19385..b849b01a446f 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -64,6 +64,7 @@
 enum swiotlb_force swiotlb_force;
 
 struct io_tlb_mem *io_tlb_default_mem;
+static struct dentry *debugfs_dir;
 
 /*
  * Max segment that we can provide which (if pages are contingous) will
@@ -662,18 +663,30 @@ EXPORT_SYMBOL_GPL(is_swiotlb_active);
 
 #ifdef CONFIG_DEBUG_FS
 
-static int __init swiotlb_create_debugfs(void)
+static void swiotlb_create_debugfs(struct io_tlb_mem *mem, const char *name)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
-
if (!mem)
-   return 0;
-   mem->debugfs = debugfs_create_dir("swiotlb", NULL);
+   return;
+
+   mem->debugfs = debugfs_create_dir(name, debugfs_dir);
debugfs_create_ulong("io_tlb_nslabs", 0400, mem->debugfs, >nslabs);
debugfs_create_ulong("io_tlb_used", 0400, mem->debugfs, >used);
+}
+
+static int __init swiotlb_create_default_debugfs(void)
+{
+   struct io_tlb_mem *mem = io_tlb_default_mem;
+
+   if (mem) {
+   swiotlb_create_debugfs(mem, "swiotlb");
+   debugfs_dir = mem->debugfs;
+   } else {
+   debugfs_dir = debugfs_create_dir("swiotlb", NULL);
+   }
+
return 0;
 }
 
-late_initcall(swiotlb_create_debugfs);
+late_initcall(swiotlb_create_default_debugfs);
 
 #endif
-- 
2.31.1.751.gd2f1c929bd-goog



[PATCH v7 01/15] swiotlb: Refactor swiotlb init functions

2021-05-18 Thread Claire Chang
Add a new function, swiotlb_init_io_tlb_mem, for the io_tlb_mem struct
initialization to make the code reusable.

Note that we now also call set_memory_decrypted in swiotlb_init_with_tbl.
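
Concretely, both init paths now funnel into the helper (sketch):

	/* early (memblock) path */
	swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);

	/* late (page allocator) path */
	swiotlb_init_io_tlb_mem(mem, virt_to_phys(tlb), nslabs, true);

and since the helper unconditionally calls set_memory_decrypted(), the
early path now decrypts the bounce buffer as well, which it previously
did not.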

Signed-off-by: Claire Chang 
---
 kernel/dma/swiotlb.c | 51 ++--
 1 file changed, 25 insertions(+), 26 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 8ca7d505d61c..d3232fc19385 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -168,9 +168,30 @@ void __init swiotlb_update_mem_attributes(void)
memset(vaddr, 0, bytes);
 }
 
-int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
+static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
+   unsigned long nslabs, bool late_alloc)
 {
+   void *vaddr = phys_to_virt(start);
unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
+
+   mem->nslabs = nslabs;
+   mem->start = start;
+   mem->end = mem->start + bytes;
+   mem->index = 0;
+   mem->late_alloc = late_alloc;
+	spin_lock_init(&mem->lock);
+   for (i = 0; i < mem->nslabs; i++) {
+   mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
+   mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
+   mem->slots[i].alloc_size = 0;
+   }
+
+   set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
+   memset(vaddr, 0, bytes);
+}
+
+int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
+{
struct io_tlb_mem *mem;
size_t alloc_size;
 
@@ -186,16 +207,8 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
if (!mem)
panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
  __func__, alloc_size, PAGE_SIZE);
-   mem->nslabs = nslabs;
-   mem->start = __pa(tlb);
-   mem->end = mem->start + bytes;
-   mem->index = 0;
-	spin_lock_init(&mem->lock);
-   for (i = 0; i < mem->nslabs; i++) {
-   mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
-   mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
-   mem->slots[i].alloc_size = 0;
-   }
+
+   swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
 
io_tlb_default_mem = mem;
if (verbose)
@@ -282,7 +295,6 @@ swiotlb_late_init_with_default_size(size_t default_size)
 int
 swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
 {
-   unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
struct io_tlb_mem *mem;
 
if (swiotlb_force == SWIOTLB_NO_FORCE)
@@ -297,20 +309,7 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
if (!mem)
return -ENOMEM;
 
-   mem->nslabs = nslabs;
-   mem->start = virt_to_phys(tlb);
-   mem->end = mem->start + bytes;
-   mem->index = 0;
-   mem->late_alloc = 1;
-	spin_lock_init(&mem->lock);
-   for (i = 0; i < mem->nslabs; i++) {
-   mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
-   mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
-   mem->slots[i].alloc_size = 0;
-   }
-
-   set_memory_decrypted((unsigned long)tlb, bytes >> PAGE_SHIFT);
-   memset(tlb, 0, bytes);
+   swiotlb_init_io_tlb_mem(mem, virt_to_phys(tlb), nslabs, true);
 
io_tlb_default_mem = mem;
swiotlb_print_info();
-- 
2.31.1.751.gd2f1c929bd-goog



[PATCH v7 00/15] Restricted DMA

2021-05-18 Thread Claire Chang
This series implements mitigations for lack of DMA access control on
systems without an IOMMU, which could result in the DMA accessing the
system memory at unexpected times and/or unexpected addresses, possibly
leading to data leakage or corruption.

For example, we plan to use the PCI-e bus for Wi-Fi and that PCI-e bus is
not behind an IOMMU. As PCI-e, by design, gives the device full access to
system memory, a vulnerability in the Wi-Fi firmware could easily escalate
to a full system exploit (remote wifi exploits: [1a], [1b] that shows a
full chain of exploits; [2], [3]).

To mitigate the security concerns, we introduce restricted DMA. Restricted
DMA utilizes the existing swiotlb to bounce streaming DMA in and out of a
specially allocated region and does memory allocation from the same region.
The feature on its own provides a basic level of protection against the DMA
overwriting buffer contents at unexpected times. However, to protect
against general data leakage and system memory corruption, the system needs
to provide a way to restrict the DMA to a predefined memory region (this is
usually done at firmware level, e.g. MPU in ATF on some ARM platforms [4]).
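
From a driver's point of view nothing changes; a sketch, assuming the
device's device-tree node points at a restricted-dma-pool region:

	dma_addr_t addr = dma_map_single(dev, buf, len, DMA_TO_DEVICE);

	if (dma_mapping_error(dev, addr))
		return -ENOMEM;
	/* the device only ever sees bus addresses inside the pool */
	dma_unmap_single(dev, addr, len, DMA_TO_DEVICE);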

[1a] 
https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_4.html
[1b] 
https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_11.html
[2] https://blade.tencent.com/en/advisories/qualpwn/
[3] 
https://www.bleepingcomputer.com/news/security/vulnerabilities-found-in-highly-popular-firmware-for-wifi-chips/
[4] 
https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132

v7:
Fix debugfs, PageHighMem and comment style in rmem_swiotlb_device_init

v6:
Address the comments in v5
https://lore.kernel.org/patchwork/cover/1423201/

v5:
Rebase on latest linux-next
https://lore.kernel.org/patchwork/cover/1416899/

v4:
- Fix spinlock bad magic
- Use rmem->name for debugfs entry
- Address the comments in v3
https://lore.kernel.org/patchwork/cover/1378113/

v3:
Using only one reserved memory region for both streaming DMA and memory
allocation.
https://lore.kernel.org/patchwork/cover/1360992/

v2:
Building on top of swiotlb.
https://lore.kernel.org/patchwork/cover/1280705/

v1:
Using dma_map_ops.
https://lore.kernel.org/patchwork/cover/1271660/

Claire Chang (15):
  swiotlb: Refactor swiotlb init functions
  swiotlb: Refactor swiotlb_create_debugfs
  swiotlb: Add DMA_RESTRICTED_POOL
  swiotlb: Add restricted DMA pool initialization
  swiotlb: Add a new get_io_tlb_mem getter
  swiotlb: Update is_swiotlb_buffer to add a struct device argument
  swiotlb: Update is_swiotlb_active to add a struct device argument
  swiotlb: Bounce data from/to restricted DMA pool if available
  swiotlb: Move alloc_size to find_slots
  swiotlb: Refactor swiotlb_tbl_unmap_single
  dma-direct: Add a new wrapper __dma_direct_free_pages()
  swiotlb: Add restricted DMA alloc/free support.
  dma-direct: Allocate memory from restricted DMA pool if available
  dt-bindings: of: Add restricted DMA pool
  of: Add plumbing for restricted DMA pool

 .../reserved-memory/reserved-memory.txt   |  27 ++
 drivers/gpu/drm/i915/gem/i915_gem_internal.c  |   2 +-
 drivers/gpu/drm/nouveau/nouveau_ttm.c |   2 +-
 drivers/iommu/dma-iommu.c |  12 +-
 drivers/of/address.c  |  25 ++
 drivers/of/device.c   |   3 +
 drivers/of/of_private.h   |   5 +
 drivers/pci/xen-pcifront.c|   2 +-
 drivers/xen/swiotlb-xen.c |   2 +-
 include/linux/device.h|   4 +
 include/linux/swiotlb.h   |  41 ++-
 kernel/dma/Kconfig|  14 +
 kernel/dma/direct.c   |  63 +++--
 kernel/dma/direct.h   |   9 +-
 kernel/dma/swiotlb.c  | 242 +-
 15 files changed, 356 insertions(+), 97 deletions(-)

-- 
2.31.1.751.gd2f1c929bd-goog
