Re: [V2, 2/2] media: i2c: Add DW9768 VCM driver

2019-09-05 Thread Tomasz Figa
Hi Dongchun,

On Thu, Sep 5, 2019 at 4:22 PM  wrote:
>
> From: Dongchun Zhu 
>
> This patch adds a V4L2 sub-device driver for the DW9768 lens voice coil,
> and provides a control to set the desired focus.
>
> The DW9768 is a 10 bit DAC with 100mA output current sink capability
> from Dongwoon, designed for linear control of voice coil motors,
> and controlled via an I2C serial interface.
>
> Signed-off-by: Dongchun Zhu 
> ---
>  MAINTAINERS|   1 +
>  drivers/media/i2c/Kconfig  |  10 ++
>  drivers/media/i2c/Makefile |   1 +
>  drivers/media/i2c/dw9768.c | 349 +
>  4 files changed, 361 insertions(+)
>  create mode 100644 drivers/media/i2c/dw9768.c
>

Thanks for v2! Please see my comments inline.

> diff --git a/MAINTAINERS b/MAINTAINERS
> index 192a671..c5c9a0e 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -4976,6 +4976,7 @@ M:Dongchun Zhu 
>  L: linux-me...@vger.kernel.org
>  T: git git://linuxtv.org/media_tree.git
>  S: Maintained
> +F: drivers/media/i2c/dw9768.c
>  F: Documentation/devicetree/bindings/media/i2c/dongwoon,dw9768.txt
>
>  DONGWOON DW9807 LENS VOICE COIL DRIVER
> diff --git a/drivers/media/i2c/Kconfig b/drivers/media/i2c/Kconfig
> index 79ce9ec..dfb665c 100644
> --- a/drivers/media/i2c/Kconfig
> +++ b/drivers/media/i2c/Kconfig
> @@ -1016,6 +1016,16 @@ config VIDEO_DW9714
>   capability. This is designed for linear control of
>   voice coil motors, controlled via I2C serial interface.
>
> +config VIDEO_DW9768
> +   tristate "DW9768 lens voice coil support"
> +   depends on I2C && VIDEO_V4L2 && MEDIA_CONTROLLER
> +   depends on VIDEO_V4L2_SUBDEV_API
> +   help
> + This is a driver for the DW9768 camera lens voice coil.
> + DW9768 is a 10 bit DAC with 100mA output current sink
> + capability. This is designed for linear control of
> + voice coil motors, controlled via I2C serial interface.
> +
>  config VIDEO_DW9807_VCM
> tristate "DW9807 lens voice coil support"
> depends on I2C && VIDEO_V4L2 && MEDIA_CONTROLLER
> diff --git a/drivers/media/i2c/Makefile b/drivers/media/i2c/Makefile
> index fd4ea86..2561239 100644
> --- a/drivers/media/i2c/Makefile
> +++ b/drivers/media/i2c/Makefile
> @@ -24,6 +24,7 @@ obj-$(CONFIG_VIDEO_SAA6752HS) += saa6752hs.o
>  obj-$(CONFIG_VIDEO_AD5820)  += ad5820.o
>  obj-$(CONFIG_VIDEO_AK7375)  += ak7375.o
>  obj-$(CONFIG_VIDEO_DW9714)  += dw9714.o
> +obj-$(CONFIG_VIDEO_DW9768)  += dw9768.o
>  obj-$(CONFIG_VIDEO_DW9807_VCM)  += dw9807-vcm.o
>  obj-$(CONFIG_VIDEO_ADV7170) += adv7170.o
>  obj-$(CONFIG_VIDEO_ADV7175) += adv7175.o
> diff --git a/drivers/media/i2c/dw9768.c b/drivers/media/i2c/dw9768.c
> new file mode 100644
> index 000..66d1712
> --- /dev/null
> +++ b/drivers/media/i2c/dw9768.c
> @@ -0,0 +1,349 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (c) 2019 MediaTek Inc.
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define DW9768_NAME            "dw9768"
> +#define DW9768_MAX_FOCUS_POS   1023
> +/*
> + * This sets the minimum granularity for the focus positions.
> + * A value of 1 gives maximum accuracy for a desired focus position
> + */
> +#define DW9768_FOCUS_STEPS 1
> +/*
> + * The DW9768 uses two separate registers to control the VCM position:
> + * one for the MSB value and one for the LSB value.
> + */
> +#define DW9768_REG_MASK_MSB0x03
> +#define DW9768_REG_MASK_LSB0xff
> +#define DW9768_SET_POSITION_ADDR0x03
> +
> +#define DW9768_CMD_DELAY   0xff
> +#define DW9768_CTRL_DELAY_US   5000
> +
> +#define DW9768_DAC_SHIFT   8
> +
> +/* dw9768 device structure */
> +struct dw9768 {
> +   struct v4l2_ctrl_handler ctrls;
> +   struct v4l2_subdev sd;
> +   struct regulator *vin;
> +   struct regulator *vdd;
> +};
> +
> +static inline struct dw9768 *to_dw9768_vcm(struct v4l2_ctrl *ctrl)
> +{
> +   return container_of(ctrl->handler, struct dw9768, ctrls);
> +}
> +
> +static inline struct dw9768 *sd_to_dw9768_vcm(struct v4l2_subdev *subdev)
> +{
> +   return container_of(subdev, struct dw9768, sd);
> +}
> +
> +struct regval_list {
> +   unsigned char reg_num;
> +   unsigned char value;
> +};
> +
> +static struct regval_list dw9768_init_regs[] = {
> +   {0x02, 0x02},
> +   {DW9768_CMD_DELAY, DW9768_CMD_DELAY},
> +   {0x06, 0x41},
> +   {0x07, 0x39},
> +   {DW9768_CMD_DELAY, DW9768_CMD_DELAY},
> +};
> +
> +static struct regval_list dw9768_release_regs[] = {
> +   {0x02, 0x00},
> +   {DW9768_CMD_DELAY, DW9768_CMD_DELAY},
> +   {0x01, 0x00},
> +   {DW9768_CMD_DELAY, DW9768_CMD_DELAY},
> +};
> +
> +static int dw9768_write_smbus(struct dw9768 *dw9768, unsigned char reg,
> + unsigned char va
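
For reference, the defines above fix how a 10-bit focus position is split
across the register pair. Below is a minimal sketch of a position write; it
assumes the LSB register immediately follows DW9768_SET_POSITION_ADDR and
reuses the dw9768_write_smbus() helper whose definition is cut off above:

static int dw9768_set_position(struct dw9768 *dw9768, u16 position)
{
	int ret;

	/* The upper two bits of the 10-bit value go into the MSB register. */
	ret = dw9768_write_smbus(dw9768, DW9768_SET_POSITION_ADDR,
				 (position >> DW9768_DAC_SHIFT) &
				 DW9768_REG_MASK_MSB);
	if (ret < 0)
		return ret;

	/* The lower eight bits go into the (assumed adjacent) LSB register. */
	return dw9768_write_smbus(dw9768, DW9768_SET_POSITION_ADDR + 1,
				  position & DW9768_REG_MASK_LSB);
}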

Re: [PATCH 2/2] iommu/ipmmu-vmsa: Disable cache snoop transactions on R-Car Gen3

2019-09-05 Thread Simon Horman
On Wed, Sep 04, 2019 at 02:08:02PM +0200, Geert Uytterhoeven wrote:
> From: Hai Nguyen Pham 
> 
> According to the Hardware Manual Errata for Rev. 1.50 of April 10, 2019,
> cache snoop transactions for page table walk requests are not supported
> on R-Car Gen3.
> 
> Hence, this patch removes setting these fields in the IMTTBCR register,
> since it will have no effect, and adds comments to the register bit
> definitions, to make it clear they apply to R-Car Gen2 only.
> 
> Signed-off-by: Hai Nguyen Pham 
> [geert: Reword, add comments]
> Signed-off-by: Geert Uytterhoeven 

Reviewed-by: Simon Horman 



Re: [PATCH 1/2] iommu/ipmmu-vmsa: Move IMTTBCR_SL0_TWOBIT_* to restore sort order

2019-09-05 Thread Simon Horman
On Wed, Sep 04, 2019 at 02:08:01PM +0200, Geert Uytterhoeven wrote:
> Move the recently added IMTTBCR_SL0_TWOBIT_* definitions up, to make
> sure all IMTTBCR register bit definitions are sorted by decreasing bit
> index.  Add comments to make it clear that they exist on R-Car Gen3
> only.
> 
> Fixes: c295f504fb5a38ab ("iommu/ipmmu-vmsa: Allow two bit SL0")
> Signed-off-by: Geert Uytterhoeven 

Reviewed-by: Simon Horman 



Re: [V3, 2/2] media: i2c: Add Omnivision OV02A10 camera sensor driver

2019-09-05 Thread Tomasz Figa
On Thu, Sep 5, 2019 at 7:45 PM Sakari Ailus
 wrote:
>
> Hi Dongchun,
>
> On Thu, Sep 05, 2019 at 05:41:05PM +0800, Dongchun Zhu wrote:
>
> ...
>
> > > > + ret = regulator_bulk_enable(OV02A10_NUM_SUPPLIES, ov02a10->supplies);
> > > > + if (ret < 0) {
> > > > + dev_err(dev, "Failed to enable regulators\n");
> > > > + goto disable_clk;
> > > > + }
> > > > + msleep_range(7);
> > >
> > > This has some potential of clashing with more generic functions in the
> > > future. Please use usleep_range directly, or msleep.
> > >
> >
> > Did you mean using usleep_range(7*1000, 8*1000), as used in patch v1?
> > https://patchwork.kernel.org/patch/10957225/
>
> Yes, please.

Why not just msleep()?


swiotlb-xen cleanups v4

2019-09-05 Thread Christoph Hellwig
Hi Xen maintainers and friends,

please take a look at this series that cleans up the parts of swiotlb-xen
that deal with non-coherent caches.


Changes since v3:
 - don't use dma_direct_alloc on x86

Changes since v2:
 - further dma_cache_maint improvements
 - split the previous patch 1 into 3 patches

Changes since v1:
 - rewrite dma_cache_maint to be much simpler
 - improve various comments and commit logs
 - remove page-coherent.h entirely


[PATCH 01/11] xen/arm: use dma-noncoherent.h calls for xen-swiotlb cache maintenance

2019-09-05 Thread Christoph Hellwig
Copy the arm64 code that uses the dma-direct/swiotlb helpers for DMA
on non-coherent devices.

Signed-off-by: Christoph Hellwig 
---
 arch/arm/include/asm/device.h|  3 -
 arch/arm/include/asm/xen/page-coherent.h | 72 +---
 arch/arm/mm/dma-mapping.c|  8 +--
 drivers/xen/swiotlb-xen.c| 20 ---
 4 files changed, 28 insertions(+), 75 deletions(-)

diff --git a/arch/arm/include/asm/device.h b/arch/arm/include/asm/device.h
index f6955b55c544..c675bc0d5aa8 100644
--- a/arch/arm/include/asm/device.h
+++ b/arch/arm/include/asm/device.h
@@ -14,9 +14,6 @@ struct dev_archdata {
 #endif
 #ifdef CONFIG_ARM_DMA_USE_IOMMU
struct dma_iommu_mapping*mapping;
-#endif
-#ifdef CONFIG_XEN
-   const struct dma_map_ops *dev_dma_ops;
 #endif
unsigned int dma_coherent:1;
unsigned int dma_ops_setup:1;
diff --git a/arch/arm/include/asm/xen/page-coherent.h 
b/arch/arm/include/asm/xen/page-coherent.h
index 2c403e7c782d..602ac02f154c 100644
--- a/arch/arm/include/asm/xen/page-coherent.h
+++ b/arch/arm/include/asm/xen/page-coherent.h
@@ -6,23 +6,37 @@
 #include 
 #include 
 
-static inline const struct dma_map_ops *xen_get_dma_ops(struct device *dev)
-{
-   if (dev && dev->archdata.dev_dma_ops)
-   return dev->archdata.dev_dma_ops;
-   return get_arch_dma_ops(NULL);
-}
-
 static inline void *xen_alloc_coherent_pages(struct device *hwdev, size_t size,
dma_addr_t *dma_handle, gfp_t flags, unsigned long attrs)
 {
-   return xen_get_dma_ops(hwdev)->alloc(hwdev, size, dma_handle, flags, 
attrs);
+   return dma_direct_alloc(hwdev, size, dma_handle, flags, attrs);
 }
 
 static inline void xen_free_coherent_pages(struct device *hwdev, size_t size,
void *cpu_addr, dma_addr_t dma_handle, unsigned long attrs)
 {
-   xen_get_dma_ops(hwdev)->free(hwdev, size, cpu_addr, dma_handle, attrs);
+   dma_direct_free(hwdev, size, cpu_addr, dma_handle, attrs);
+}
+
+static inline void xen_dma_sync_single_for_cpu(struct device *hwdev,
+   dma_addr_t handle, size_t size, enum dma_data_direction dir)
+{
+   unsigned long pfn = PFN_DOWN(handle);
+
+   if (pfn_valid(pfn))
+   dma_direct_sync_single_for_cpu(hwdev, handle, size, dir);
+   else
+   __xen_dma_sync_single_for_cpu(hwdev, handle, size, dir);
+}
+
+static inline void xen_dma_sync_single_for_device(struct device *hwdev,
+   dma_addr_t handle, size_t size, enum dma_data_direction dir)
+{
+   unsigned long pfn = PFN_DOWN(handle);
+   if (pfn_valid(pfn))
+   dma_direct_sync_single_for_device(hwdev, handle, size, dir);
+   else
+   __xen_dma_sync_single_for_device(hwdev, handle, size, dir);
 }
 
 static inline void xen_dma_map_page(struct device *hwdev, struct page *page,
@@ -36,17 +50,8 @@ static inline void xen_dma_map_page(struct device *hwdev, 
struct page *page,
bool local = (page_pfn <= dev_pfn) &&
(dev_pfn - page_pfn < compound_pages);
 
-   /*
-* Dom0 is mapped 1:1, while the Linux page can span across
-* multiple Xen pages, it's not possible for it to contain a
-* mix of local and foreign Xen pages. So if the first xen_pfn
-* == mfn the page is local otherwise it's a foreign page
-* grant-mapped in dom0. If the page is local we can safely
-* call the native dma_ops function, otherwise we call the xen
-* specific function.
-*/
if (local)
-   xen_get_dma_ops(hwdev)->map_page(hwdev, page, offset, size, 
dir, attrs);
+   dma_direct_map_page(hwdev, page, offset, size, dir, attrs);
else
__xen_dma_map_page(hwdev, page, dev_addr, offset, size, dir, 
attrs);
 }
@@ -63,33 +68,10 @@ static inline void xen_dma_unmap_page(struct device *hwdev, 
dma_addr_t handle,
 * safely call the native dma_ops function, otherwise we call the xen
 * specific function.
 */
-   if (pfn_valid(pfn)) {
-   if (xen_get_dma_ops(hwdev)->unmap_page)
-   xen_get_dma_ops(hwdev)->unmap_page(hwdev, handle, size, 
dir, attrs);
-   } else
+   if (pfn_valid(pfn))
+   dma_direct_unmap_page(hwdev, handle, size, dir, attrs);
+   else
__xen_dma_unmap_page(hwdev, handle, size, dir, attrs);
 }
 
-static inline void xen_dma_sync_single_for_cpu(struct device *hwdev,
-   dma_addr_t handle, size_t size, enum dma_data_direction dir)
-{
-   unsigned long pfn = PFN_DOWN(handle);
-   if (pfn_valid(pfn)) {
-   if (xen_get_dma_ops(hwdev)->sync_single_for_cpu)
-   xen_get_dma_ops(hwdev)->sync_single_for_cpu(hwdev, 
handle, size, dir);
-   } else
-   __xen_dma_sync_single_for_cpu(hwdev, handle, size, dir);
-}
-
-static inline void xen_dma_sync_single_for_device(struct device *hwdev,
-

[PATCH 02/11] xen/arm: consolidate page-coherent.h

2019-09-05 Thread Christoph Hellwig
Share the duplicate arm/arm64 code in include/xen/arm/page-coherent.h.

Signed-off-by: Christoph Hellwig 
---
 arch/arm/include/asm/xen/page-coherent.h   | 75 
 arch/arm64/include/asm/xen/page-coherent.h | 75 
 include/xen/arm/page-coherent.h| 80 ++
 3 files changed, 80 insertions(+), 150 deletions(-)

diff --git a/arch/arm/include/asm/xen/page-coherent.h 
b/arch/arm/include/asm/xen/page-coherent.h
index 602ac02f154c..27e984977402 100644
--- a/arch/arm/include/asm/xen/page-coherent.h
+++ b/arch/arm/include/asm/xen/page-coherent.h
@@ -1,77 +1,2 @@
 /* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _ASM_ARM_XEN_PAGE_COHERENT_H
-#define _ASM_ARM_XEN_PAGE_COHERENT_H
-
-#include 
-#include 
 #include 
-
-static inline void *xen_alloc_coherent_pages(struct device *hwdev, size_t size,
-   dma_addr_t *dma_handle, gfp_t flags, unsigned long attrs)
-{
-   return dma_direct_alloc(hwdev, size, dma_handle, flags, attrs);
-}
-
-static inline void xen_free_coherent_pages(struct device *hwdev, size_t size,
-   void *cpu_addr, dma_addr_t dma_handle, unsigned long attrs)
-{
-   dma_direct_free(hwdev, size, cpu_addr, dma_handle, attrs);
-}
-
-static inline void xen_dma_sync_single_for_cpu(struct device *hwdev,
-   dma_addr_t handle, size_t size, enum dma_data_direction dir)
-{
-   unsigned long pfn = PFN_DOWN(handle);
-
-   if (pfn_valid(pfn))
-   dma_direct_sync_single_for_cpu(hwdev, handle, size, dir);
-   else
-   __xen_dma_sync_single_for_cpu(hwdev, handle, size, dir);
-}
-
-static inline void xen_dma_sync_single_for_device(struct device *hwdev,
-   dma_addr_t handle, size_t size, enum dma_data_direction dir)
-{
-   unsigned long pfn = PFN_DOWN(handle);
-   if (pfn_valid(pfn))
-   dma_direct_sync_single_for_device(hwdev, handle, size, dir);
-   else
-   __xen_dma_sync_single_for_device(hwdev, handle, size, dir);
-}
-
-static inline void xen_dma_map_page(struct device *hwdev, struct page *page,
-dma_addr_t dev_addr, unsigned long offset, size_t size,
-enum dma_data_direction dir, unsigned long attrs)
-{
-   unsigned long page_pfn = page_to_xen_pfn(page);
-   unsigned long dev_pfn = XEN_PFN_DOWN(dev_addr);
-   unsigned long compound_pages =
-   (1<
-#include 
 #include 
-
-static inline void *xen_alloc_coherent_pages(struct device *hwdev, size_t size,
-   dma_addr_t *dma_handle, gfp_t flags, unsigned long attrs)
-{
-   return dma_direct_alloc(hwdev, size, dma_handle, flags, attrs);
-}
-
-static inline void xen_free_coherent_pages(struct device *hwdev, size_t size,
-   void *cpu_addr, dma_addr_t dma_handle, unsigned long attrs)
-{
-   dma_direct_free(hwdev, size, cpu_addr, dma_handle, attrs);
-}
-
-static inline void xen_dma_sync_single_for_cpu(struct device *hwdev,
-   dma_addr_t handle, size_t size, enum dma_data_direction dir)
-{
-   unsigned long pfn = PFN_DOWN(handle);
-
-   if (pfn_valid(pfn))
-   dma_direct_sync_single_for_cpu(hwdev, handle, size, dir);
-   else
-   __xen_dma_sync_single_for_cpu(hwdev, handle, size, dir);
-}
-
-static inline void xen_dma_sync_single_for_device(struct device *hwdev,
-   dma_addr_t handle, size_t size, enum dma_data_direction dir)
-{
-   unsigned long pfn = PFN_DOWN(handle);
-   if (pfn_valid(pfn))
-   dma_direct_sync_single_for_device(hwdev, handle, size, dir);
-   else
-   __xen_dma_sync_single_for_device(hwdev, handle, size, dir);
-}
-
-static inline void xen_dma_map_page(struct device *hwdev, struct page *page,
-dma_addr_t dev_addr, unsigned long offset, size_t size,
-enum dma_data_direction dir, unsigned long attrs)
-{
-   unsigned long page_pfn = page_to_xen_pfn(page);
-   unsigned long dev_pfn = XEN_PFN_DOWN(dev_addr);
-   unsigned long compound_pages =
-   (1<
+#include 
+
 void __xen_dma_map_page(struct device *hwdev, struct page *page,
 dma_addr_t dev_addr, unsigned long offset, size_t size,
 enum dma_data_direction dir, unsigned long attrs);
@@ -13,4 +16,81 @@ void __xen_dma_sync_single_for_cpu(struct device *hwdev,
 void __xen_dma_sync_single_for_device(struct device *hwdev,
dma_addr_t handle, size_t size, enum dma_data_direction dir);
 
+static inline void *xen_alloc_coherent_pages(struct device *hwdev, size_t size,
+   dma_addr_t *dma_handle, gfp_t flags, unsigned long attrs)
+{
+   return dma_direct_alloc(hwdev, size, dma_handle, flags, attrs);
+}
+
+static inline void xen_free_coherent_pages(struct device *hwdev, size_t size,
+   void *cpu_addr, dma_addr_t dma_handle, unsigned long attrs)
+{
+   dma_direct_free(hwdev, size, cpu_addr, dma_handle, attrs);
+}
+
+static inline void xen_dm

[PATCH 03/11] xen/arm: use dev_is_dma_coherent

2019-09-05 Thread Christoph Hellwig
Use the dma-noncoherent dev_is_dma_coherent helper instead of the
home-grown variant.  Note that both are always initialized to the same
value in arch_setup_dma_ops.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Julien Grall 
Reviewed-by: Stefano Stabellini 
---
 arch/arm/include/asm/dma-mapping.h   |  6 --
 arch/arm/xen/mm.c| 12 ++--
 arch/arm64/include/asm/dma-mapping.h |  9 -
 3 files changed, 6 insertions(+), 21 deletions(-)

diff --git a/arch/arm/include/asm/dma-mapping.h 
b/arch/arm/include/asm/dma-mapping.h
index dba9355e2484..bdd80ddbca34 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -91,12 +91,6 @@ static inline dma_addr_t virt_to_dma(struct device *dev, 
void *addr)
 }
 #endif
 
-/* do not use this function in a driver */
-static inline bool is_device_dma_coherent(struct device *dev)
-{
-   return dev->archdata.dma_coherent;
-}
-
 /**
  * arm_dma_alloc - allocate consistent memory for DMA
  * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
index d33b77e9add3..90574d89d0d4 100644
--- a/arch/arm/xen/mm.c
+++ b/arch/arm/xen/mm.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0-only
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
@@ -99,7 +99,7 @@ void __xen_dma_map_page(struct device *hwdev, struct page 
*page,
 dma_addr_t dev_addr, unsigned long offset, size_t size,
 enum dma_data_direction dir, unsigned long attrs)
 {
-   if (is_device_dma_coherent(hwdev))
+   if (dev_is_dma_coherent(hwdev))
return;
if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
return;
@@ -112,7 +112,7 @@ void __xen_dma_unmap_page(struct device *hwdev, dma_addr_t 
handle,
unsigned long attrs)
 
 {
-   if (is_device_dma_coherent(hwdev))
+   if (dev_is_dma_coherent(hwdev))
return;
if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
return;
@@ -123,7 +123,7 @@ void __xen_dma_unmap_page(struct device *hwdev, dma_addr_t 
handle,
 void __xen_dma_sync_single_for_cpu(struct device *hwdev,
dma_addr_t handle, size_t size, enum dma_data_direction dir)
 {
-   if (is_device_dma_coherent(hwdev))
+   if (dev_is_dma_coherent(hwdev))
return;
__xen_dma_page_dev_to_cpu(hwdev, handle, size, dir);
 }
@@ -131,7 +131,7 @@ void __xen_dma_sync_single_for_cpu(struct device *hwdev,
 void __xen_dma_sync_single_for_device(struct device *hwdev,
dma_addr_t handle, size_t size, enum dma_data_direction dir)
 {
-   if (is_device_dma_coherent(hwdev))
+   if (dev_is_dma_coherent(hwdev))
return;
__xen_dma_page_cpu_to_dev(hwdev, handle, size, dir);
 }
@@ -159,7 +159,7 @@ bool xen_arch_need_swiotlb(struct device *dev,
 * memory and we are not able to flush the cache.
 */
return (!hypercall_cflush && (xen_pfn != bfn) &&
-   !is_device_dma_coherent(dev));
+   !dev_is_dma_coherent(dev));
 }
 
 int xen_create_contiguous_region(phys_addr_t pstart, unsigned int order,
diff --git a/arch/arm64/include/asm/dma-mapping.h 
b/arch/arm64/include/asm/dma-mapping.h
index bdcb0922a40c..67243255a858 100644
--- a/arch/arm64/include/asm/dma-mapping.h
+++ b/arch/arm64/include/asm/dma-mapping.h
@@ -18,14 +18,5 @@ static inline const struct dma_map_ops 
*get_arch_dma_ops(struct bus_type *bus)
return NULL;
 }
 
-/*
- * Do not use this function in a driver, it is only provided for
- * arch/arm/mm/xen.c, which is used by arm64 as well.
- */
-static inline bool is_device_dma_coherent(struct device *dev)
-{
-   return dev->dma_coherent;
-}
-
 #endif /* __KERNEL__ */
 #endif /* __ASM_DMA_MAPPING_H */
-- 
2.20.1



[PATCH 04/11] xen/arm: simplify dma_cache_maint

2019-09-05 Thread Christoph Hellwig
Calculate the required operation in the caller, and pass it directly
instead of recalculating it for each page, and use simple arithmetic
to get from the physical address to Xen-page-size-aligned chunks.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Stefano Stabellini 
---
 arch/arm/xen/mm.c | 61 ---
 1 file changed, 21 insertions(+), 40 deletions(-)

diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
index 90574d89d0d4..2fde161733b0 100644
--- a/arch/arm/xen/mm.c
+++ b/arch/arm/xen/mm.c
@@ -35,64 +35,45 @@ unsigned long xen_get_swiotlb_free_pages(unsigned int order)
return __get_free_pages(flags, order);
 }
 
-enum dma_cache_op {
-   DMA_UNMAP,
-   DMA_MAP,
-};
 static bool hypercall_cflush = false;
 
-/* functions called by SWIOTLB */
-
-static void dma_cache_maint(dma_addr_t handle, unsigned long offset,
-   size_t size, enum dma_data_direction dir, enum dma_cache_op op)
+/* buffers in highmem or foreign pages cannot cross page boundaries */
+static void dma_cache_maint(dma_addr_t handle, size_t size, u32 op)
 {
struct gnttab_cache_flush cflush;
-   unsigned long xen_pfn;
-   size_t left = size;
 
-   xen_pfn = (handle >> XEN_PAGE_SHIFT) + offset / XEN_PAGE_SIZE;
-   offset %= XEN_PAGE_SIZE;
+   cflush.a.dev_bus_addr = handle & XEN_PAGE_MASK;
+   cflush.offset = xen_offset_in_page(handle);
+   cflush.op = op;
 
do {
-   size_t len = left;
-   
-   /* buffers in highmem or foreign pages cannot cross page
-* boundaries */
-   if (len + offset > XEN_PAGE_SIZE)
-   len = XEN_PAGE_SIZE - offset;
-
-   cflush.op = 0;
-   cflush.a.dev_bus_addr = xen_pfn << XEN_PAGE_SHIFT;
-   cflush.offset = offset;
-   cflush.length = len;
-
-   if (op == DMA_UNMAP && dir != DMA_TO_DEVICE)
-   cflush.op = GNTTAB_CACHE_INVAL;
-   if (op == DMA_MAP) {
-   if (dir == DMA_FROM_DEVICE)
-   cflush.op = GNTTAB_CACHE_INVAL;
-   else
-   cflush.op = GNTTAB_CACHE_CLEAN;
-   }
-   if (cflush.op)
-   HYPERVISOR_grant_table_op(GNTTABOP_cache_flush, 
&cflush, 1);
+   if (size + cflush.offset > XEN_PAGE_SIZE)
+   cflush.length = XEN_PAGE_SIZE - cflush.offset;
+   else
+   cflush.length = size;
+
+   HYPERVISOR_grant_table_op(GNTTABOP_cache_flush, &cflush, 1);
 
-   offset = 0;
-   xen_pfn++;
-   left -= len;
-   } while (left);
+   cflush.offset = 0;
+   cflush.a.dev_bus_addr += cflush.length;
+   size -= cflush.length;
+   } while (size);
 }
 
 static void __xen_dma_page_dev_to_cpu(struct device *hwdev, dma_addr_t handle,
size_t size, enum dma_data_direction dir)
 {
-   dma_cache_maint(handle & PAGE_MASK, handle & ~PAGE_MASK, size, dir, 
DMA_UNMAP);
+   if (dir != DMA_TO_DEVICE)
+   dma_cache_maint(handle, size, GNTTAB_CACHE_INVAL);
 }
 
 static void __xen_dma_page_cpu_to_dev(struct device *hwdev, dma_addr_t handle,
size_t size, enum dma_data_direction dir)
 {
-   dma_cache_maint(handle & PAGE_MASK, handle & ~PAGE_MASK, size, dir, 
DMA_MAP);
+   if (dir == DMA_FROM_DEVICE)
+   dma_cache_maint(handle, size, GNTTAB_CACHE_INVAL);
+   else
+   dma_cache_maint(handle, size, GNTTAB_CACHE_CLEAN);
 }
 
 void __xen_dma_map_page(struct device *hwdev, struct page *page,
-- 
2.20.1
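
As a worked example of the new loop, assuming 4 KiB Xen pages: a 6 KiB buffer
at bus address 0x10000800 starts at offset 0x800 into its Xen page, so the
first GNTTABOP_cache_flush call covers the remaining 2 KiB of that page, a
second call covers the full 4 KiB page at 0x10001000, and the loop then exits
with size at zero.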



[PATCH 07/11] swiotlb-xen: remove xen_swiotlb_dma_mmap and xen_swiotlb_dma_get_sgtable

2019-09-05 Thread Christoph Hellwig
There is no need to wrap the common versions; just wire them up directly.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Stefano Stabellini 
---
 drivers/xen/swiotlb-xen.c | 29 ++---
 1 file changed, 2 insertions(+), 27 deletions(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index eee86cc7046b..b8808677ae1d 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -547,31 +547,6 @@ xen_swiotlb_dma_supported(struct device *hwdev, u64 mask)
return xen_virt_to_bus(xen_io_tlb_end - 1) <= mask;
 }
 
-/*
- * Create userspace mapping for the DMA-coherent memory.
- * This function should be called with the pages from the current domain only,
- * passing pages mapped from other domains would lead to memory corruption.
- */
-static int
-xen_swiotlb_dma_mmap(struct device *dev, struct vm_area_struct *vma,
-void *cpu_addr, dma_addr_t dma_addr, size_t size,
-unsigned long attrs)
-{
-   return dma_common_mmap(dev, vma, cpu_addr, dma_addr, size, attrs);
-}
-
-/*
- * This function should be called with the pages from the current domain only,
- * passing pages mapped from other domains would lead to memory corruption.
- */
-static int
-xen_swiotlb_get_sgtable(struct device *dev, struct sg_table *sgt,
-   void *cpu_addr, dma_addr_t handle, size_t size,
-   unsigned long attrs)
-{
-   return dma_common_get_sgtable(dev, sgt, cpu_addr, handle, size, attrs);
-}
-
 const struct dma_map_ops xen_swiotlb_dma_ops = {
.alloc = xen_swiotlb_alloc_coherent,
.free = xen_swiotlb_free_coherent,
@@ -584,6 +559,6 @@ const struct dma_map_ops xen_swiotlb_dma_ops = {
.map_page = xen_swiotlb_map_page,
.unmap_page = xen_swiotlb_unmap_page,
.dma_supported = xen_swiotlb_dma_supported,
-   .mmap = xen_swiotlb_dma_mmap,
-   .get_sgtable = xen_swiotlb_get_sgtable,
+   .mmap = dma_common_mmap,
+   .get_sgtable = dma_common_get_sgtable,
 };
-- 
2.20.1



[PATCH 06/11] xen: remove the exports for xen_{create,destroy}_contiguous_region

2019-09-05 Thread Christoph Hellwig
These routines are only used by swiotlb-xen, which cannot be modular.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Stefano Stabellini 
---
 arch/arm/xen/mm.c | 2 --
 arch/x86/xen/mmu_pv.c | 2 --
 2 files changed, 4 deletions(-)

diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
index 11d5ad26fcfe..9d73fa4a5991 100644
--- a/arch/arm/xen/mm.c
+++ b/arch/arm/xen/mm.c
@@ -154,13 +154,11 @@ int xen_create_contiguous_region(phys_addr_t pstart, 
unsigned int order,
*dma_handle = pstart;
return 0;
 }
-EXPORT_SYMBOL_GPL(xen_create_contiguous_region);
 
 void xen_destroy_contiguous_region(phys_addr_t pstart, unsigned int order)
 {
return;
 }
-EXPORT_SYMBOL_GPL(xen_destroy_contiguous_region);
 
 int __init xen_mm_init(void)
 {
diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index 26e8b326966d..c8dbee62ec2a 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -2625,7 +2625,6 @@ int xen_create_contiguous_region(phys_addr_t pstart, 
unsigned int order,
*dma_handle = virt_to_machine(vstart).maddr;
return success ? 0 : -ENOMEM;
 }
-EXPORT_SYMBOL_GPL(xen_create_contiguous_region);
 
 void xen_destroy_contiguous_region(phys_addr_t pstart, unsigned int order)
 {
@@ -2660,7 +2659,6 @@ void xen_destroy_contiguous_region(phys_addr_t pstart, 
unsigned int order)
 
spin_unlock_irqrestore(&xen_reservation_lock, flags);
 }
-EXPORT_SYMBOL_GPL(xen_destroy_contiguous_region);
 
 static noinline void xen_flush_tlb_all(void)
 {
-- 
2.20.1



[PATCH 05/11] xen/arm: remove xen_dma_ops

2019-09-05 Thread Christoph Hellwig
arm and arm64 can just use xen_swiotlb_dma_ops directly like x86, no
need for a pointer indirection.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Julien Grall 
Reviewed-by: Stefano Stabellini 
---
 arch/arm/mm/dma-mapping.c| 3 ++-
 arch/arm/xen/mm.c| 4 
 arch/arm64/mm/dma-mapping.c  | 3 ++-
 include/xen/arm/hypervisor.h | 2 --
 4 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 738097396445..2661cad36359 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "dma.h"
 #include "mm.h"
@@ -2360,7 +2361,7 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, 
u64 size,
 
 #ifdef CONFIG_XEN
if (xen_initial_domain())
-   dev->dma_ops = xen_dma_ops;
+   dev->dma_ops = &xen_swiotlb_dma_ops;
 #endif
dev->archdata.dma_ops_setup = true;
 }
diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
index 2fde161733b0..11d5ad26fcfe 100644
--- a/arch/arm/xen/mm.c
+++ b/arch/arm/xen/mm.c
@@ -162,16 +162,12 @@ void xen_destroy_contiguous_region(phys_addr_t pstart, 
unsigned int order)
 }
 EXPORT_SYMBOL_GPL(xen_destroy_contiguous_region);
 
-const struct dma_map_ops *xen_dma_ops;
-EXPORT_SYMBOL(xen_dma_ops);
-
 int __init xen_mm_init(void)
 {
struct gnttab_cache_flush cflush;
if (!xen_initial_domain())
return 0;
xen_swiotlb_init(1, false);
-   xen_dma_ops = &xen_swiotlb_dma_ops;
 
cflush.op = 0;
cflush.a.dev_bus_addr = 0;
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index bd2b039f43a6..4b244a037349 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -64,6 +65,6 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 
size,
 
 #ifdef CONFIG_XEN
if (xen_initial_domain())
-   dev->dma_ops = xen_dma_ops;
+   dev->dma_ops = &xen_swiotlb_dma_ops;
 #endif
 }
diff --git a/include/xen/arm/hypervisor.h b/include/xen/arm/hypervisor.h
index 2982571f7cc1..43ef24dd030e 100644
--- a/include/xen/arm/hypervisor.h
+++ b/include/xen/arm/hypervisor.h
@@ -19,8 +19,6 @@ static inline enum paravirt_lazy_mode 
paravirt_get_lazy_mode(void)
return PARAVIRT_LAZY_NONE;
 }
 
-extern const struct dma_map_ops *xen_dma_ops;
-
 #ifdef CONFIG_XEN
 void __init xen_early_init(void);
 #else
-- 
2.20.1



[PATCH 09/11] swiotlb-xen: simplify cache maintenance

2019-09-05 Thread Christoph Hellwig
Now that we know we always have the dma-noncoherent.h helpers available
if we are on an architecture with support for non-coherent devices,
we can just call them directly, and remove the calls to the dma-direct
routines, including the case where we called the dma_direct_map_page
routines but ignored their return value.  Instead we now have
Xen wrappers for the arch_sync_dma_for_{device,cpu} helpers that call
the special Xen versions of those routines for foreign pages.

Note that the new helpers get the physical address passed in addition
to the dma address, to avoid another translation for the local cache
maintenance.  The pfn_valid checks remain on the dma address as in
the old code, even if that looks a little funny.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Stefano Stabellini 
---
 arch/arm/xen/mm.c| 64 +++-
 arch/x86/include/asm/xen/page-coherent.h | 14 --
 drivers/xen/swiotlb-xen.c| 20 
 include/xen/arm/page-coherent.h  | 63 ---
 include/xen/swiotlb-xen.h|  5 ++
 5 files changed, 32 insertions(+), 134 deletions(-)

diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
index 9d73fa4a5991..2b2c208408bb 100644
--- a/arch/arm/xen/mm.c
+++ b/arch/arm/xen/mm.c
@@ -60,63 +60,33 @@ static void dma_cache_maint(dma_addr_t handle, size_t size, 
u32 op)
} while (size);
 }
 
-static void __xen_dma_page_dev_to_cpu(struct device *hwdev, dma_addr_t handle,
-   size_t size, enum dma_data_direction dir)
+/*
+ * Dom0 is mapped 1:1, and while the Linux page can span across multiple Xen
+ * pages, it is not possible for it to contain a mix of local and foreign Xen
+ * pages.  Calling pfn_valid on a foreign mfn will always return false, so if
+ * pfn_valid returns true the page is local and we can use the native
+ * dma-direct functions, otherwise we call the Xen specific version.
+ */
+void xen_dma_sync_for_cpu(struct device *dev, dma_addr_t handle,
+   phys_addr_t paddr, size_t size, enum dma_data_direction dir)
 {
-   if (dir != DMA_TO_DEVICE)
+   if (pfn_valid(PFN_DOWN(handle)))
+   arch_sync_dma_for_cpu(dev, paddr, size, dir);
+   else if (dir != DMA_TO_DEVICE)
dma_cache_maint(handle, size, GNTTAB_CACHE_INVAL);
 }
 
-static void __xen_dma_page_cpu_to_dev(struct device *hwdev, dma_addr_t handle,
-   size_t size, enum dma_data_direction dir)
+void xen_dma_sync_for_device(struct device *dev, dma_addr_t handle,
+   phys_addr_t paddr, size_t size, enum dma_data_direction dir)
 {
-   if (dir == DMA_FROM_DEVICE)
+   if (pfn_valid(PFN_DOWN(handle)))
+   arch_sync_dma_for_device(dev, paddr, size, dir);
+   else if (dir == DMA_FROM_DEVICE)
dma_cache_maint(handle, size, GNTTAB_CACHE_INVAL);
else
dma_cache_maint(handle, size, GNTTAB_CACHE_CLEAN);
 }
 
-void __xen_dma_map_page(struct device *hwdev, struct page *page,
-dma_addr_t dev_addr, unsigned long offset, size_t size,
-enum dma_data_direction dir, unsigned long attrs)
-{
-   if (dev_is_dma_coherent(hwdev))
-   return;
-   if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
-   return;
-
-   __xen_dma_page_cpu_to_dev(hwdev, dev_addr, size, dir);
-}
-
-void __xen_dma_unmap_page(struct device *hwdev, dma_addr_t handle,
-   size_t size, enum dma_data_direction dir,
-   unsigned long attrs)
-
-{
-   if (dev_is_dma_coherent(hwdev))
-   return;
-   if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
-   return;
-
-   __xen_dma_page_dev_to_cpu(hwdev, handle, size, dir);
-}
-
-void __xen_dma_sync_single_for_cpu(struct device *hwdev,
-   dma_addr_t handle, size_t size, enum dma_data_direction dir)
-{
-   if (dev_is_dma_coherent(hwdev))
-   return;
-   __xen_dma_page_dev_to_cpu(hwdev, handle, size, dir);
-}
-
-void __xen_dma_sync_single_for_device(struct device *hwdev,
-   dma_addr_t handle, size_t size, enum dma_data_direction dir)
-{
-   if (dev_is_dma_coherent(hwdev))
-   return;
-   __xen_dma_page_cpu_to_dev(hwdev, handle, size, dir);
-}
-
 bool xen_arch_need_swiotlb(struct device *dev,
   phys_addr_t phys,
   dma_addr_t dev_addr)
diff --git a/arch/x86/include/asm/xen/page-coherent.h 
b/arch/x86/include/asm/xen/page-coherent.h
index 116777e7f387..63cd41b2e17a 100644
--- a/arch/x86/include/asm/xen/page-coherent.h
+++ b/arch/x86/include/asm/xen/page-coherent.h
@@ -21,18 +21,4 @@ static inline void xen_free_coherent_pages(struct device 
*hwdev, size_t size,
free_pages((unsigned long) cpu_addr, get_order(size));
 }
 
-static inline void xen_dma_map_page(struct device *hwdev, struct page *page,
-dma_addr_t dev_addr, unsigned long offset, size_t size,
-enum dma_data_direction dir, 

[PATCH 11/11] arm64: use asm-generic/dma-mapping.h

2019-09-05 Thread Christoph Hellwig
Now that the Xen special cases are gone, nothing worth mentioning is
left in the arm64 <asm/dma-mapping.h> file, so switch to use the
asm-generic version instead.

Signed-off-by: Christoph Hellwig 
Acked-by: Will Deacon 
Reviewed-by: Stefano Stabellini 
---
 arch/arm64/include/asm/Kbuild|  1 +
 arch/arm64/include/asm/dma-mapping.h | 22 --
 arch/arm64/mm/dma-mapping.c  |  1 +
 3 files changed, 2 insertions(+), 22 deletions(-)
 delete mode 100644 arch/arm64/include/asm/dma-mapping.h

diff --git a/arch/arm64/include/asm/Kbuild b/arch/arm64/include/asm/Kbuild
index c52e151afab0..98a5405c8558 100644
--- a/arch/arm64/include/asm/Kbuild
+++ b/arch/arm64/include/asm/Kbuild
@@ -4,6 +4,7 @@ generic-y += delay.h
 generic-y += div64.h
 generic-y += dma.h
 generic-y += dma-contiguous.h
+generic-y += dma-mapping.h
 generic-y += early_ioremap.h
 generic-y += emergency-restart.h
 generic-y += hw_irq.h
diff --git a/arch/arm64/include/asm/dma-mapping.h 
b/arch/arm64/include/asm/dma-mapping.h
deleted file mode 100644
index 67243255a858..
--- a/arch/arm64/include/asm/dma-mapping.h
+++ /dev/null
@@ -1,22 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-/*
- * Copyright (C) 2012 ARM Ltd.
- */
-#ifndef __ASM_DMA_MAPPING_H
-#define __ASM_DMA_MAPPING_H
-
-#ifdef __KERNEL__
-
-#include 
-#include 
-
-#include 
-#include 
-
-static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
-{
-   return NULL;
-}
-
-#endif /* __KERNEL__ */
-#endif /* __ASM_DMA_MAPPING_H */
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 4b244a037349..6578abcfbbc7 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
-- 
2.20.1



[PATCH 10/11] swiotlb-xen: merge xen_unmap_single into xen_swiotlb_unmap_page

2019-09-05 Thread Christoph Hellwig
No need for a no-op wrapper.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Stefano Stabellini 
---
 drivers/xen/swiotlb-xen.c | 15 ---
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index f81031f0c1c7..1190934098eb 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -418,9 +418,8 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, 
struct page *page,
  * After this call, reads by the cpu to the buffer are guaranteed to see
  * whatever the device wrote there.
  */
-static void xen_unmap_single(struct device *hwdev, dma_addr_t dev_addr,
-size_t size, enum dma_data_direction dir,
-unsigned long attrs)
+static void xen_swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
+   size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
phys_addr_t paddr = xen_bus_to_phys(dev_addr);
 
@@ -434,13 +433,6 @@ static void xen_unmap_single(struct device *hwdev, 
dma_addr_t dev_addr,
swiotlb_tbl_unmap_single(hwdev, paddr, size, dir, attrs);
 }
 
-static void xen_swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
-   size_t size, enum dma_data_direction dir,
-   unsigned long attrs)
-{
-   xen_unmap_single(hwdev, dev_addr, size, dir, attrs);
-}
-
 static void
 xen_swiotlb_sync_single_for_cpu(struct device *dev, dma_addr_t dma_addr,
size_t size, enum dma_data_direction dir)
@@ -481,7 +473,8 @@ xen_swiotlb_unmap_sg(struct device *hwdev, struct 
scatterlist *sgl, int nelems,
BUG_ON(dir == DMA_NONE);
 
for_each_sg(sgl, sg, nelems, i)
-   xen_unmap_single(hwdev, sg->dma_address, sg_dma_len(sg), dir, 
attrs);
+   xen_swiotlb_unmap_page(hwdev, sg->dma_address, sg_dma_len(sg),
+   dir, attrs);
 
 }
 
-- 
2.20.1



[PATCH 08/11] swiotlb-xen: use the same foreign page check everywhere

2019-09-05 Thread Christoph Hellwig
xen_dma_map_page uses a different and more complicated check for foreign
pages than the other three cache maintenance helpers.  Switch it to the
simpler pfn_valid method as well, and document the scheme with a single
improved comment in xen_dma_map_page.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Stefano Stabellini 
---
 include/xen/arm/page-coherent.h | 31 +--
 1 file changed, 9 insertions(+), 22 deletions(-)

diff --git a/include/xen/arm/page-coherent.h b/include/xen/arm/page-coherent.h
index a840d6949a87..a8d9c0678c27 100644
--- a/include/xen/arm/page-coherent.h
+++ b/include/xen/arm/page-coherent.h
@@ -53,23 +53,17 @@ static inline void xen_dma_map_page(struct device *hwdev, 
struct page *page,
 dma_addr_t dev_addr, unsigned long offset, size_t size,
 enum dma_data_direction dir, unsigned long attrs)
 {
-   unsigned long page_pfn = page_to_xen_pfn(page);
-   unsigned long dev_pfn = XEN_PFN_DOWN(dev_addr);
-   unsigned long compound_pages =
-   (1

Re: [RFC PATCH] iommu/amd: fix a race in increase_address_space()

2019-09-05 Thread Joerg Roedel
Hi Qian,

On Wed, Sep 04, 2019 at 05:24:22PM -0400, Qian Cai wrote:
>   if (domain->mode == PAGE_MODE_6_LEVEL)
>   /* address space already 64 bit large */
>   return false;
> 
> This gives a clue that there must be a race between multiple concurrent
> threads in increase_address_space().

Thanks for tracking this down, there is a race indeed.

> + mutex_lock(&domain->api_lock);
>   *dma_addr = __map_single(dev, dma_dom, page_to_phys(page),
>size, DMA_BIDIRECTIONAL, dma_mask);
> + mutex_unlock(&domain->api_lock);
>  
>   if (*dma_addr == DMA_MAPPING_ERROR)
>   goto out_free;
> @@ -2696,7 +2698,9 @@ static void free_coherent(struct device *dev, size_t 
> size,
>  
>   dma_dom = to_dma_ops_domain(domain);
>  
> + mutex_lock(&domain->api_lock);
>   __unmap_single(dma_dom, dma_addr, size, DMA_BIDIRECTIONAL);
> + mutex_unlock(&domain->api_lock);

But I think the right fix is to lock the operation in
increase_address_space() directly, and not the calls around it, like in
the diff below. It is untested, so can you please try it and report back
if it fixes your issue?

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index b607a92791d3..1ff705f16239 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -1424,18 +1424,21 @@ static void free_pagetable(struct protection_domain 
*domain)
  * another level increases the size of the address space by 9 bits to a size up
  * to 64 bits.
  */
-static bool increase_address_space(struct protection_domain *domain,
+static void increase_address_space(struct protection_domain *domain,
   gfp_t gfp)
 {
+   unsigned long flags;
u64 *pte;
 
-   if (domain->mode == PAGE_MODE_6_LEVEL)
+   spin_lock_irqsave(&domain->lock, flags);
+
+   if (WARN_ON_ONCE(domain->mode == PAGE_MODE_6_LEVEL))
/* address space already 64 bit large */
-   return false;
+   goto out;
 
pte = (void *)get_zeroed_page(gfp);
if (!pte)
-   return false;
+   goto out;
 
*pte = PM_LEVEL_PDE(domain->mode,
iommu_virt_to_phys(domain->pt_root));
@@ -1443,7 +1446,10 @@ static bool increase_address_space(struct 
protection_domain *domain,
domain->mode+= 1;
domain->updated  = true;
 
-   return true;
+out:
+   spin_unlock_irqrestore(&domain->lock, flags);
+
+   return;
 }
 
 static u64 *alloc_pte(struct protection_domain *domain,


Re: [V3, 2/2] media: i2c: Add Omnivision OV02A10 camera sensor driver

2019-09-05 Thread Sakari Ailus
On Thu, Sep 05, 2019 at 07:53:37PM +0900, Tomasz Figa wrote:
> On Thu, Sep 5, 2019 at 7:45 PM Sakari Ailus
>  wrote:
> >
> > Hi Dongchun,
> >
> > On Thu, Sep 05, 2019 at 05:41:05PM +0800, Dongchun Zhu wrote:
> >
> > ...
> >
> > > > > + ret = regulator_bulk_enable(OV02A10_NUM_SUPPLIES, 
> > > > > ov02a10->supplies);
> > > > > + if (ret < 0) {
> > > > > + dev_err(dev, "Failed to enable regulators\n");
> > > > > + goto disable_clk;
> > > > > + }
> > > > > + msleep_range(7);
> > > >
> > > > This has some potential of clashing with more generic functions in the
> > > > future. Please use usleep_range directly, or msleep.
> > > >
> > >
> > > Did you mean using usleep_range(7*1000, 8*1000), as used in patch v1?
> > > https://patchwork.kernel.org/patch/10957225/
> >
> > Yes, please.
> 
> Why not just msleep()?

msleep() is usually less accurate. I'm not sure it makes a big difference in
this case. Perhaps it matters if someone wants the sensor to be powered on and
streaming as soon as possible.

-- 
Sakari Ailus
sakari.ai...@linux.intel.com


[PATCH] amd/iommu: flush old domains in kdump kernel

2019-09-05 Thread Stuart Hayes
When devices are attached to the amd_iommu in a kdump kernel, the old device
table entries (DTEs), which were copied from the crashed kernel, will be
overwritten with a new domain number.  When the new DTE is written, the IOMMU
is told to flush the DTE from its internal cache--but it is not told to flush
the translation cache entries for the old domain number.

Without this patch, AMD systems using the tg3 network driver fail when kdump
tries to save the vmcore to a network system, showing network timeouts and
(sometimes) IOMMU errors in the kernel log.

This patch will flush IOMMU translation cache entries for the old domain when
a DTE gets overwritten with a new domain number.


Signed-off-by: Stuart Hayes 

---

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index b607a92..f853b96 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -1143,6 +1143,17 @@ static void amd_iommu_flush_tlb_all(struct amd_iommu 
*iommu)
iommu_completion_wait(iommu);
 }
 
+static void amd_iommu_flush_tlb_domid(struct amd_iommu *iommu, u32 dom_id)
+{
+   struct iommu_cmd cmd;
+
+   build_inv_iommu_pages(&cmd, 0, CMD_INV_IOMMU_ALL_PAGES_ADDRESS,
+ dom_id, 1);
+   iommu_queue_command(iommu, &cmd);
+
+   iommu_completion_wait(iommu);
+}
+
 static void amd_iommu_flush_all(struct amd_iommu *iommu)
 {
struct iommu_cmd cmd;
@@ -1873,6 +1884,7 @@ static void set_dte_entry(u16 devid, struct 
protection_domain *domain,
 {
u64 pte_root = 0;
u64 flags = 0;
+   u32 old_domid;
 
if (domain->mode != PAGE_MODE_NONE)
pte_root = iommu_virt_to_phys(domain->pt_root);
@@ -1922,8 +1934,20 @@ static void set_dte_entry(u16 devid, struct 
protection_domain *domain,
flags &= ~DEV_DOMID_MASK;
flags |= domain->id;
 
+   old_domid = amd_iommu_dev_table[devid].data[1] & DEV_DOMID_MASK;
amd_iommu_dev_table[devid].data[1]  = flags;
amd_iommu_dev_table[devid].data[0]  = pte_root;
+
+   /*
+* A kdump kernel might be replacing a domain ID that was copied from
+* the previous kernel--if so, it needs to flush the translation cache
+* entries for the old domain ID that is being overwritten
+*/
+   if (old_domid) {
+   struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
+
+   amd_iommu_flush_tlb_domid(iommu, old_domid);
+   }
 }
 
 static void clear_dte_entry(u16 devid)




Re: [RFC PATCH] iommu/amd: fix a race in increase_address_space()

2019-09-05 Thread Qian Cai
On Thu, 2019-09-05 at 13:43 +0200, Joerg Roedel wrote:
> Hi Qian,
> 
> On Wed, Sep 04, 2019 at 05:24:22PM -0400, Qian Cai wrote:
> > if (domain->mode == PAGE_MODE_6_LEVEL)
> > /* address space already 64 bit large */
> > return false;
> > 
> > This gives a clue that there must be a race between multiple concurrent
> > threads in increase_address_space().
> 
> Thanks for tracking this down, there is a race indeed.
> 
> > +   mutex_lock(&domain->api_lock);
> > *dma_addr = __map_single(dev, dma_dom, page_to_phys(page),
> >  size, DMA_BIDIRECTIONAL, dma_mask);
> > +   mutex_unlock(&domain->api_lock);
> >  
> > if (*dma_addr == DMA_MAPPING_ERROR)
> > goto out_free;
> > @@ -2696,7 +2698,9 @@ static void free_coherent(struct device *dev, size_t 
> > size,
> >  
> > dma_dom = to_dma_ops_domain(domain);
> >  
> > +   mutex_lock(&domain->api_lock);
> > __unmap_single(dma_dom, dma_addr, size, DMA_BIDIRECTIONAL);
> > +   mutex_unlock(&domain->api_lock);
> 
> But I think the right fix is to lock the operation in
> increase_address_space() directly, and not the calls around it, like in
> the diff below. It is untested, so can you please try it and report back
> if it fixes your issue?

Yes, it works great so far.

> 
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index b607a92791d3..1ff705f16239 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -1424,18 +1424,21 @@ static void free_pagetable(struct protection_domain 
> *domain)
>   * another level increases the size of the address space by 9 bits to a size 
> up
>   * to 64 bits.
>   */
> -static bool increase_address_space(struct protection_domain *domain,
> +static void increase_address_space(struct protection_domain *domain,
>  gfp_t gfp)
>  {
> + unsigned long flags;
>   u64 *pte;
>  
> - if (domain->mode == PAGE_MODE_6_LEVEL)
> + spin_lock_irqsave(&domain->lock, flags);
> +
> + if (WARN_ON_ONCE(domain->mode == PAGE_MODE_6_LEVEL))
>   /* address space already 64 bit large */
> - return false;
> + goto out;
>  
>   pte = (void *)get_zeroed_page(gfp);
>   if (!pte)
> - return false;
> + goto out;
>  
>   *pte = PM_LEVEL_PDE(domain->mode,
>   iommu_virt_to_phys(domain->pt_root));
> @@ -1443,7 +1446,10 @@ static bool increase_address_space(struct 
> protection_domain *domain,
>   domain->mode+= 1;
>   domain->updated  = true;
>  
> - return true;
> +out:
> + spin_unlock_irqrestore(&domain->lock, flags);
> +
> + return;
>  }
>  
>  static u64 *alloc_pte(struct protection_domain *domain,


Re: [bug] __blk_mq_run_hw_queue suspicious rcu usage

2019-09-05 Thread David Rientjes via iommu
On Thu, 5 Sep 2019, Christoph Hellwig wrote:

> > Hi Christoph, Jens, and Ming,
> > 
> > While booting a 5.2 SEV-enabled guest we have encountered the following 
> > WARNING that is followed up by a BUG because we are in atomic context 
> > while trying to call set_memory_decrypted:
> 
> Well, this really is an x86 / DMA API issue unfortunately.  Drivers
> are allowed to do GFP_ATOMIC dma allocation under locks / rcu critical
> sections and from interrupts.  And it seems like the SEV case can't
> handle that.  We have some semi-generic code to have a fixed sized
> pool in kernel/dma for non-coherent platforms that have similar issues
> that we could try to wire up, but I wonder if there is a better way
> to handle the issue, so I've added Tom and the x86 maintainers.
> 
> Now independent of that issue using DMA coherent memory for the nvme
> PRPs/SGLs doesn't actually feel very optimal.  We could do with
> normal kmalloc allocations and just sync it to the device and back.
> I wonder if we should create some general mempool-like helpers for that.
> 

Thanks for looking into this.  I assume it's a non-starter to try to 
address this in _vm_unmap_aliases() itself, i.e. rely on a purge spinlock 
to do all synchronization (or trylock if not forced) for 
purge_vmap_area_lazy() rather than only the vmap_area_lock within it.  In 
other words, no mutex.

If that's the case, and set_memory_encrypted() can't be fixed to not need 
to sleep by changing _vm_unmap_aliases() locking, then I assume dmapool is 
our only alternative?  I have no idea how large this should be.


Re: [V3, 2/2] media: i2c: Add Omnivision OV02A10 camera sensor driver

2019-09-05 Thread Nicolas Boichat
On Fri, Sep 6, 2019 at 12:05 AM Sakari Ailus
 wrote:
>
> On Thu, Sep 05, 2019 at 07:53:37PM +0900, Tomasz Figa wrote:
> > On Thu, Sep 5, 2019 at 7:45 PM Sakari Ailus
> >  wrote:
> > >
> > > Hi Dongchun,
> > >
> > > On Thu, Sep 05, 2019 at 05:41:05PM +0800, Dongchun Zhu wrote:
> > >
> > > ...
> > >
> > > > > > + ret = regulator_bulk_enable(OV02A10_NUM_SUPPLIES, 
> > > > > > ov02a10->supplies);
> > > > > > + if (ret < 0) {
> > > > > > + dev_err(dev, "Failed to enable regulators\n");
> > > > > > + goto disable_clk;
> > > > > > + }
> > > > > > + msleep_range(7);
> > > > >
> > > > > This has some potential of clashing with more generic functions in the
> > > > > future. Please use usleep_range directly, or msleep.
> > > > >
> > > >
> > > > Did you mean using usleep_range(7*1000, 8*1000), as used in patch v1?
> > > > https://patchwork.kernel.org/patch/10957225/
> > >
> > > Yes, please.
> >
> > Why not just msleep()?
>
> msleep() is usually less accurate. I'm not sure it makes a big difference in
> this case. Perhaps it matters if someone wants the sensor to be powered on and
> streaming as soon as possible.

https://elixir.bootlin.com/linux/latest/source/Documentation/timers/timers-howto.txt#L70

Use usleep_range for delays up to 20ms (at least that's what the
documentation (still) says?)

> --
> Sakari Ailus
> sakari.ai...@linux.intel.com


Re: [V3, 2/2] media: i2c: Add Omnivision OV02A10 camera sensor driver

2019-09-05 Thread Dongchun Zhu
On Fri, 2019-09-06 at 06:58 +0800, Nicolas Boichat wrote:
> On Fri, Sep 6, 2019 at 12:05 AM Sakari Ailus
>  wrote:
> >
> > On Thu, Sep 05, 2019 at 07:53:37PM +0900, Tomasz Figa wrote:
> > > On Thu, Sep 5, 2019 at 7:45 PM Sakari Ailus
> > >  wrote:
> > > >
> > > > Hi Dongchun,
> > > >
> > > > On Thu, Sep 05, 2019 at 05:41:05PM +0800, Dongchun Zhu wrote:
> > > >
> > > > ...
> > > >
> > > > > > > + ret = regulator_bulk_enable(OV02A10_NUM_SUPPLIES, 
> > > > > > > ov02a10->supplies);
> > > > > > > + if (ret < 0) {
> > > > > > > + dev_err(dev, "Failed to enable regulators\n");
> > > > > > > + goto disable_clk;
> > > > > > > + }
> > > > > > > + msleep_range(7);
> > > > > >
> > > > > > This has some potential of clashing with more generic functions in 
> > > > > > the
> > > > > > future. Please use usleep_range directly, or msleep.
> > > > > >
> > > > >
> > > > > Did you mean using usleep_range(7*1000, 8*1000), as used in patch v1?
> > > > > https://patchwork.kernel.org/patch/10957225/
> > > >
> > > > Yes, please.
> > >
> > > Why not just msleep()?
> >
> > msleep() is usually less accurate. I'm not sure it makes a big difference in
> > this case. Perhaps it matters if someone wants the sensor to be powered on and
> > streaming as soon as possible.
> 
> https://elixir.bootlin.com/linux/latest/source/Documentation/timers/timers-howto.txt#L70
> 
> Use usleep_range for delays up to 20ms (at least that's what the
> documentation (still) says?)
> 

Thank you for your clarifications.
From the doc,
"msleep(1~20) may not do what the caller intends, and
will often sleep longer (~20 ms actual sleep for any
value given in the 1~20ms range). In many cases this
is not the desired behavior."

So usleep_range is supposed to be used in shorter sleep cases,
such as 5 ms.

> > --
> > Sakari Ailus
> > sakari.ai...@linux.intel.com
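
Concretely, the suggestion amounts to the following power-on snippet (a
sketch based on the code quoted above, not the final driver):

	ret = regulator_bulk_enable(OV02A10_NUM_SUPPLIES, ov02a10->supplies);
	if (ret < 0) {
		dev_err(dev, "Failed to enable regulators\n");
		goto disable_clk;
	}
	/* 7-8 ms settle time; usleep_range() is preferred for delays < 20 ms */
	usleep_range(7 * 1000, 8 * 1000);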




Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-09-05 Thread Michael S. Tsirkin
On Mon, Aug 12, 2019 at 02:15:32PM +0200, Christoph Hellwig wrote:
> On Sun, Aug 11, 2019 at 04:55:27AM -0400, Michael S. Tsirkin wrote:
> > On Sun, Aug 11, 2019 at 07:56:07AM +0200, Christoph Hellwig wrote:
> > > So we need a flag on the virtio device, exposed by the
> > > hypervisor (or hardware for hw virtio devices) that says:  hey, I'm real,
> > > don't take a shortcut.
> > 
> > The point here is that it's actually still not real. So we would still
> > use a physical address. However Linux decides that it wants extra
> > security by moving all data through the bounce buffer.  The distinction
> > made is that one can actually give device a physical address of the
> > bounce buffer.
> 
> Sure.  The problem is just that you keep piling hacks on top of hacks.
> We need the per-device flag anyway to properly support hardware virtio
> device in all circumstances.  Instead of coming up with another ad-hoc
> hack to force DMA uses implement that one proper bit and reuse it here.

The flag that you mention literally means "I am a real device" so for
example, you can use VFIO with it. And this device isn't a real one,
and you can't use VFIO with it, even though it's part of a POWER
system which always has an IOMMU.



Or here's another way to put it: we have a broken device that can only
access physical addresses, not DMA addresses. But to enable SEV, Linux
requires the DMA API.  So we can still make it work if the DMA address happens
to be a physical address (not necessarily of the same page).


This is where dma_addr_is_a_phys_addr() is coming from: it tells us this
weird configuration can still work.  What are we going to do for SEV if
dma_addr_is_a_phys_addr does not apply? Fail probe I guess.


So the proposal is really to make things safe and to this end,
to add this in probe:

if (sev_active() &&
!dma_addr_is_a_phys_addr(dev) &&
!virtio_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM))
return -EINVAL;


the point being to prevent loading the driver where it would
corrupt guest memory. Put this way, any objections to adding
dma_addr_is_a_phys_addr to the DMA API?
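
For illustration, a hypothetical sketch of such a helper (no such API exists
in the tree today, and the dma_ops test below is only one plausible
implementation):

/* Hypothetical: true when dma_map_*() hands the device a physical address. */
static inline bool dma_addr_is_a_phys_addr(struct device *dev)
{
	/*
	 * With no dma_map_ops installed the dma-direct path is used, and the
	 * returned DMA address is the physical address of the (possibly
	 * bounced) buffer.
	 */
	return dev->dma_ops == NULL;
}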





-- 
MST


[PATCH v9 0/5] iommu: Bounce page for untrusted devices

2019-09-05 Thread Lu Baolu
The Thunderbolt vulnerabilities are public and nowadays have a nice
name, Thunderclap [1] [3]. This patch series aims
to mitigate those concerns.

An external PCI device is a PCI peripheral device connected
to the system through an external bus, such as Thunderbolt.
What makes it different is that it can't be trusted to the
same degree as the devices built into the system. Generally,
a trusted PCIe device will DMA into the designated buffers
and not overrun or otherwise write outside the specified
bounds. But it's different for an external device.

The minimum IOMMU mapping granularity is one page (4k), so
for DMA transfers smaller than that a malicious PCIe device
can access the whole page of memory even if it does not
belong to the driver in question. This opens a possibility
for DMA attack. For more information about DMA attacks
imposed by an untrusted PCI/PCIe device, please refer to [2].

This implements bounce buffers for untrusted external
devices. The transfers should be limited to isolated pages
so the IOMMU window does not cover memory outside of what
the driver expects. Previously (v3 and before), we proposed
an optimisation to only copy the head and tail of the buffer
if it spans multiple pages, and directly map the ones in the
middle. Figure 1 gives a big picture of this solution.

swiotlb System
IOVA  bounce page   Memory
 .-.  .-..-.
 | |  | || |
 | |  | || |
buffer_start .-.  .-..-.
 | |->| |***>| |
 | |  | | swiotlb| |
 | |  | | mapping| |
 IOMMU Page  '-'  '-''-'
  Boundary   | | | |
 | | | |
 | | | |
 | |>| |
 | |IOMMU mapping| |
 | | | |
 IOMMU Page  .-. .-.
  Boundary   | | | |
 | | | |
 | |>| |
 | | IOMMU mapping   | |
 | | | |
 | | | |
 IOMMU Page  .-.  .-..-.
  Boundary   | |  | || |
 | |  | || |
 | |->| |***>| |
  buffer_end '-'  '-' swiotlb'-'
 | |  | | mapping| |
 | |  | || |
 '-'  '-''-'
  Figure 1: A big view of the IOMMU bounce page

As Robin Murphy pointed out, this ties us to using strict mode for
TLB maintenance, which may not be an overall win depending on the
balance between invalidation bandwidth vs. memcpy bandwidth. If we
use standard SWIOTLB logic to always copy the whole thing, we should
be able to release the bounce pages via the flush queue to allow
'safe' lazy unmaps. So since v4 we have used the standard swiotlb
logic.

swiotlb System
IOVA  bounce page   Memory
buffer_start .-.  .-..-.
 | |  | || |
 | |  | || |
 | |  | |.-.physical
 | |->| | -->| |_start  
 | |iommu | | swiotlb| |
 | | map  | |   map  | |
 IOMMU Page  .-.  .-.'-'
  Boundary   | |  | || |
 | |  | || |
 | |->| || |
 | |iommu | || |
 | | map  | || |
 | |  | || |
 IOMMU Page  .-.  .-..-.
  Boundary   | |  | || |
 | |->| || |
 | |iommu | || |
 | | map  | || |
 | |  

[PATCH v9 2/5] iommu/vt-d: Check whether device requires bounce buffer

2019-09-05 Thread Lu Baolu
This adds a helper to check whether a device needs to
use a bounce buffer. It also provides a boot-time option
(intel_iommu=nobounce) to disable the bounce buffer. Users
can use this to prevent the IOMMU driver from using the
bounce buffer when they prefer performance over the added
protection.

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Signed-off-by: Lu Baolu 
Tested-by: Xu Pengfei 
Tested-by: Mika Westerberg 
---
 Documentation/admin-guide/kernel-parameters.txt | 5 +
 drivers/iommu/intel-iommu.c | 7 +++
 2 files changed, 12 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 4c1971960afa..f441a9cea5bb 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1732,6 +1732,11 @@
Note that using this option lowers the security
provided by tboot because it makes the system
vulnerable to DMA attacks.
+   nobounce [Default off]
+   Disable bounce buffer for untrusted devices such as
+   Thunderbolt devices. This will treat untrusted devices
+   as trusted ones, and hence might expose security
+   risks of DMA attacks.
 
intel_idle.max_cstate=  [KNL,HW,ACPI,X86]
0   disables intel_idle and fall back on acpi_idle.
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index c4e0e4a9ee9e..e12aa73008df 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -362,6 +362,7 @@ static int dmar_forcedac;
 static int intel_iommu_strict;
 static int intel_iommu_superpage = 1;
 static int iommu_identity_mapping;
+static int intel_no_bounce;
 
 #define IDENTMAP_ALL   1
 #define IDENTMAP_GFX   2
@@ -375,6 +376,9 @@ EXPORT_SYMBOL_GPL(intel_iommu_gfx_mapped);
 static DEFINE_SPINLOCK(device_domain_lock);
 static LIST_HEAD(device_domain_list);
 
+#define device_needs_bounce(d) (!intel_no_bounce && dev_is_pci(d) &&   \
+   to_pci_dev(d)->untrusted)
+
 /*
  * Iterate over elements in device_domain_list and call the specified
  * callback @fn against each element.
@@ -457,6 +461,9 @@ static int __init intel_iommu_setup(char *str)
printk(KERN_INFO
"Intel-IOMMU: not forcing on after tboot. This 
could expose security risk for tboot\n");
intel_iommu_tboot_noforce = 1;
+   } else if (!strncmp(str, "nobounce", 8)) {
+   pr_info("Intel-IOMMU: No bounce buffer. This could expose security risks of DMA attacks\n");
+   intel_no_bounce = 1;
}
 
str += strcspn(str, ",");
-- 
2.17.1



[PATCH v9 3/5] iommu/vt-d: Don't switch off swiotlb if bounce page is used

2019-09-05 Thread Lu Baolu
The bounce page implementation depends on swiotlb. Hence, don't
switch off swiotlb if the system has untrusted devices, or if
untrusted devices could potentially be hot-added later.

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Signed-off-by: Lu Baolu 
Reviewed-by: Christoph Hellwig 
---
 drivers/iommu/Kconfig   |  1 +
 drivers/iommu/intel-iommu.c | 32 +---
 2 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index e15cdcd8cb3c..a4ddeade8ac4 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -182,6 +182,7 @@ config INTEL_IOMMU
select IOMMU_IOVA
select NEED_DMA_MAP_STATE
select DMAR_TABLE
+   select SWIOTLB
help
  DMA remapping (DMAR) devices support enables independent address
  translations for Direct Memory Access (DMA) from devices.
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index e12aa73008df..34e1265bb2ad 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4576,22 +4576,20 @@ const struct attribute_group *intel_iommu_groups[] = {
NULL,
 };
 
-static int __init platform_optin_force_iommu(void)
+static inline bool has_untrusted_dev(void)
 {
struct pci_dev *pdev = NULL;
-   bool has_untrusted_dev = false;
 
-   if (!dmar_platform_optin() || no_platform_optin)
-   return 0;
+   for_each_pci_dev(pdev)
+   if (pdev->untrusted)
+   return true;
 
-   for_each_pci_dev(pdev) {
-   if (pdev->untrusted) {
-   has_untrusted_dev = true;
-   break;
-   }
-   }
+   return false;
+}
 
-   if (!has_untrusted_dev)
+static int __init platform_optin_force_iommu(void)
+{
+   if (!dmar_platform_optin() || no_platform_optin || !has_untrusted_dev())
return 0;
 
if (no_iommu || dmar_disabled)
@@ -4605,9 +4603,6 @@ static int __init platform_optin_force_iommu(void)
iommu_identity_mapping |= IDENTMAP_ALL;
 
dmar_disabled = 0;
-#if defined(CONFIG_X86) && defined(CONFIG_SWIOTLB)
-   swiotlb = 0;
-#endif
no_iommu = 0;
 
return 1;
@@ -4747,7 +4742,14 @@ int __init intel_iommu_init(void)
up_write(&dmar_global_lock);
 
 #if defined(CONFIG_X86) && defined(CONFIG_SWIOTLB)
-   swiotlb = 0;
+   /*
+* If the system has no untrusted device or the user has decided
+* to disable the bounce page mechanisms, we don't need swiotlb.
+* Mark this and the pre-allocated bounce pages will be released
+* later.
+*/
+   if (!has_untrusted_dev() || intel_no_bounce)
+   swiotlb = 0;
 #endif
dma_ops = &intel_dma_ops;
 
-- 
2.17.1



[PATCH v9 1/5] swiotlb: Split size parameter to map/unmap APIs

2019-09-05 Thread Lu Baolu
This splits the size parameter to swiotlb_tbl_map_single() and
swiotlb_tbl_unmap_single() into a mapping_size and an alloc_size
parameter, where the latter is rounded up to the IOMMU page
size.
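
As a sketch of the intended calling convention (modelled on the
bounce-page user added in patch 5/5; paddr, size, dir and attrs stand
in for the caller's own values):

	/*
	 * mapping_size is what the caller asked to map; alloc_size is
	 * rounded up -- here to the VT-d page size -- so the allocated
	 * bounce slot backs the whole IOMMU page.
	 */
	size_t mapping_size = size;
	size_t alloc_size = ALIGN(size, VTD_PAGE_SIZE);

	tlb_addr = swiotlb_tbl_map_single(dev,
			__phys_to_dma(dev, io_tlb_start),
			paddr, mapping_size, alloc_size, dir, attrs);
	if (tlb_addr == (phys_addr_t)DMA_MAPPING_ERROR)
		return DMA_MAPPING_ERROR;
	...
	swiotlb_tbl_unmap_single(dev, tlb_addr, mapping_size, alloc_size,
				 dir, attrs);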

Suggested-by: Christoph Hellwig 
Signed-off-by: Lu Baolu 
Reviewed-by: Christoph Hellwig 
---
 drivers/xen/swiotlb-xen.c |  8 
 include/linux/swiotlb.h   |  8 ++--
 kernel/dma/direct.c   |  2 +-
 kernel/dma/swiotlb.c  | 30 +++---
 4 files changed, 30 insertions(+), 18 deletions(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index ae1df496bf38..adcabd9473eb 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -386,8 +386,8 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
 */
trace_swiotlb_bounced(dev, dev_addr, size, swiotlb_force);
 
-   map = swiotlb_tbl_map_single(dev, start_dma_addr, phys, size, dir,
-attrs);
+   map = swiotlb_tbl_map_single(dev, start_dma_addr, phys,
+size, size, dir, attrs);
if (map == (phys_addr_t)DMA_MAPPING_ERROR)
return DMA_MAPPING_ERROR;
 
@@ -397,7 +397,7 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
 * Ensure that the address returned is DMA'ble
 */
if (unlikely(!dma_capable(dev, dev_addr, size))) {
-   swiotlb_tbl_unmap_single(dev, map, size, dir,
+   swiotlb_tbl_unmap_single(dev, map, size, size, dir,
attrs | DMA_ATTR_SKIP_CPU_SYNC);
return DMA_MAPPING_ERROR;
}
@@ -433,7 +433,7 @@ static void xen_unmap_single(struct device *hwdev, dma_addr_t dev_addr,
 
/* NOTE: We use dev_addr here, not paddr! */
if (is_xen_swiotlb_buffer(dev_addr))
-   swiotlb_tbl_unmap_single(hwdev, paddr, size, dir, attrs);
+   swiotlb_tbl_unmap_single(hwdev, paddr, size, size, dir, attrs);
 }
 
 static void xen_swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 361f62bb4a8e..cde3dc18e21a 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -46,13 +46,17 @@ enum dma_sync_target {
 
 extern phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
  dma_addr_t tbl_dma_addr,
- phys_addr_t phys, size_t size,
+ phys_addr_t phys,
+ size_t mapping_size,
+ size_t alloc_size,
  enum dma_data_direction dir,
  unsigned long attrs);
 
 extern void swiotlb_tbl_unmap_single(struct device *hwdev,
 phys_addr_t tlb_addr,
-size_t size, enum dma_data_direction dir,
+size_t mapping_size,
+size_t alloc_size,
+enum dma_data_direction dir,
 unsigned long attrs);
 
 extern void swiotlb_tbl_sync_single(struct device *hwdev,
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 706113c6bebc..8402b29c280f 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -305,7 +305,7 @@ void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
dma_direct_sync_single_for_cpu(dev, addr, size, dir);
 
if (unlikely(is_swiotlb_buffer(phys)))
-   swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
+   swiotlb_tbl_unmap_single(dev, phys, size, size, dir, attrs);
 }
 EXPORT_SYMBOL(dma_direct_unmap_page);
 
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 9de232229063..89066efa3840 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -444,7 +444,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr,
 
 phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
   dma_addr_t tbl_dma_addr,
-  phys_addr_t orig_addr, size_t size,
+  phys_addr_t orig_addr,
+  size_t mapping_size,
+  size_t alloc_size,
   enum dma_data_direction dir,
   unsigned long attrs)
 {
@@ -464,6 +466,9 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
pr_warn_once("%s is active and system is using DMA bounce buffers\n",
 sme_active() ? "SME" : "SEV");
 
+   if (WARN_ON(mapping_size > alloc_size))
+   return (phys_addr_t)DMA_MAPPING_ERROR;
+
mask = dma_get_seg_boundary(hwdev);
 

[PATCH v9 4/5] iommu/vt-d: Add trace events for device dma map/unmap

2019-09-05 Thread Lu Baolu
This adds trace support for the Intel IOMMU driver. It
also declares some events which can be used to trace
when an IOVA is being mapped or unmapped in a domain.
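
The individual events are presumably derived from the dma_map class
declared in the patch in the usual DEFINE_EVENT fashion, e.g. for the
map_single event used in __intel_map_single():

	DEFINE_EVENT(dma_map, map_single,
		TP_PROTO(struct device *dev, dma_addr_t dev_addr,
			 phys_addr_t phys_addr, size_t size),
		TP_ARGS(dev, dev_addr, phys_addr, size)
	);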

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Signed-off-by: Mika Westerberg 
Signed-off-by: Lu Baolu 
Reviewed-by: Steven Rostedt (VMware) 
---
 drivers/iommu/Makefile |   1 +
 drivers/iommu/intel-iommu.c|  13 +++-
 drivers/iommu/intel-trace.c|  14 
 include/trace/events/intel_iommu.h | 106 +
 4 files changed, 131 insertions(+), 3 deletions(-)
 create mode 100644 drivers/iommu/intel-trace.c
 create mode 100644 include/trace/events/intel_iommu.h

diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index f13f36ae1af6..bfe27b2755bd 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -17,6 +17,7 @@ obj-$(CONFIG_ARM_SMMU) += arm-smmu.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
 obj-$(CONFIG_DMAR_TABLE) += dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o intel-pasid.o
+obj-$(CONFIG_INTEL_IOMMU) += intel-trace.o
 obj-$(CONFIG_INTEL_IOMMU_DEBUGFS) += intel-iommu-debugfs.o
 obj-$(CONFIG_INTEL_IOMMU_SVM) += intel-svm.o
 obj-$(CONFIG_IPMMU_VMSA) += ipmmu-vmsa.o
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 34e1265bb2ad..0720fd7b262b 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3541,6 +3541,9 @@ static dma_addr_t __intel_map_single(struct device *dev, phys_addr_t paddr,
 
start_paddr = (phys_addr_t)iova_pfn << PAGE_SHIFT;
start_paddr += paddr & ~PAGE_MASK;
+
+   trace_map_single(dev, start_paddr, paddr, size << VTD_PAGE_SHIFT);
+
return start_paddr;
 
 error:
@@ -3596,10 +3599,7 @@ static void intel_unmap(struct device *dev, dma_addr_t dev_addr, size_t size)
if (dev_is_pci(dev))
pdev = to_pci_dev(dev);
 
-   dev_dbg(dev, "Device unmapping: pfn %lx-%lx\n", start_pfn, last_pfn);
-
freelist = domain_unmap(domain, start_pfn, last_pfn);
-
if (intel_iommu_strict || (pdev && pdev->untrusted) ||
!has_iova_flush_queue(&domain->iovad)) {
iommu_flush_iotlb_psi(iommu, domain, start_pfn,
@@ -3615,6 +3615,8 @@ static void intel_unmap(struct device *dev, dma_addr_t dev_addr, size_t size)
 * cpu used up by the iotlb flush operation...
 */
}
+
+   trace_unmap_single(dev, dev_addr, size);
 }
 
 static void intel_unmap_page(struct device *dev, dma_addr_t dev_addr,
@@ -3705,6 +3707,8 @@ static void intel_unmap_sg(struct device *dev, struct scatterlist *sglist,
}
 
intel_unmap(dev, startaddr, nrpages << VTD_PAGE_SHIFT);
+
+   trace_unmap_sg(dev, startaddr, nrpages << VTD_PAGE_SHIFT);
 }
 
static int intel_map_sg(struct device *dev, struct scatterlist *sglist, int nelems,
@@ -3761,6 +3765,9 @@ static int intel_map_sg(struct device *dev, struct scatterlist *sglist, int nele
return 0;
}
 
+   trace_map_sg(dev, iova_pfn << PAGE_SHIFT,
+sg_phys(sglist), size << VTD_PAGE_SHIFT);
+
return nelems;
 }
 
diff --git a/drivers/iommu/intel-trace.c b/drivers/iommu/intel-trace.c
new file mode 100644
index ..bfb6a6e37a88
--- /dev/null
+++ b/drivers/iommu/intel-trace.c
@@ -0,0 +1,14 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Intel IOMMU trace support
+ *
+ * Copyright (C) 2019 Intel Corporation
+ *
+ * Author: Lu Baolu 
+ */
+
+#include 
+#include 
+
+#define CREATE_TRACE_POINTS
+#include 
diff --git a/include/trace/events/intel_iommu.h b/include/trace/events/intel_iommu.h
new file mode 100644
index ..54e61d456cdf
--- /dev/null
+++ b/include/trace/events/intel_iommu.h
@@ -0,0 +1,106 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Intel IOMMU trace support
+ *
+ * Copyright (C) 2019 Intel Corporation
+ *
+ * Author: Lu Baolu 
+ */
+#ifdef CONFIG_INTEL_IOMMU
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM intel_iommu
+
+#if !defined(_TRACE_INTEL_IOMMU_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_INTEL_IOMMU_H
+
+#include 
+#include 
+
+DECLARE_EVENT_CLASS(dma_map,
+   TP_PROTO(struct device *dev, dma_addr_t dev_addr, phys_addr_t phys_addr,
+size_t size),
+
+   TP_ARGS(dev, dev_addr, phys_addr, size),
+
+   TP_STRUCT__entry(
+   __string(dev_name, dev_name(dev))
+   __field(dma_addr_t, dev_addr)
+   __field(phys_addr_t, phys_addr)
+   __field(size_t, size)
+   ),
+
+   TP_fast_assign(
+   __assign_str(dev_name, dev_name(dev));
+   __entry->dev_addr = dev_addr;
+   __entry->phys_addr = phys_addr;
+   __entry->size = size;
+   ),
+
+   TP_printk("dev=%s dev_addr=0x%llx phys_addr=0x%llx size=%zu",
+ __get_str(dev_name),
+ (unsigned long long)__entry->dev_addr,
+ (unsigned long lon

[PATCH v9 5/5] iommu/vt-d: Use bounce buffer for untrusted devices

2019-09-05 Thread Lu Baolu
The Intel VT-d hardware uses paging for DMA remapping.
The minimum mapped window is one page. Device drivers
may map buffers that do not fill the whole IOMMU
window. This allows the device to access possibly
unrelated memory, and a malicious device could exploit
this to perform DMA attacks. To address this, the
Intel IOMMU driver will use bounce pages for those
buffers which don't fill a whole IOMMU page.

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Signed-off-by: Lu Baolu 
Tested-by: Xu Pengfei 
Tested-by: Mika Westerberg 
---
 drivers/iommu/intel-iommu.c | 258 
 1 file changed, 258 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 0720fd7b262b..c6935172df81 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -41,9 +41,11 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
+#include 
 
 #include "irq_remapping.h"
 #include "intel-pasid.h"
@@ -346,6 +348,8 @@ static int domain_detach_iommu(struct dmar_domain *domain,
 static bool device_is_rmrr_locked(struct device *dev);
 static int intel_iommu_attach_device(struct iommu_domain *domain,
 struct device *dev);
+static phys_addr_t intel_iommu_iova_to_phys(struct iommu_domain *domain,
+   dma_addr_t iova);
 
 #ifdef CONFIG_INTEL_IOMMU_DEFAULT_ON
 int dmar_disabled = 0;
@@ -3783,6 +3787,252 @@ static const struct dma_map_ops intel_dma_ops = {
.dma_supported = dma_direct_supported,
 };
 
+static void
+bounce_sync_single(struct device *dev, dma_addr_t addr, size_t size,
+  enum dma_data_direction dir, enum dma_sync_target target)
+{
+   struct dmar_domain *domain;
+   phys_addr_t tlb_addr;
+
+   domain = find_domain(dev);
+   if (WARN_ON(!domain))
+   return;
+
+   tlb_addr = intel_iommu_iova_to_phys(&domain->domain, addr);
+   if (is_swiotlb_buffer(tlb_addr))
+   swiotlb_tbl_sync_single(dev, tlb_addr, size, dir, target);
+}
+
+static dma_addr_t
+bounce_map_single(struct device *dev, phys_addr_t paddr, size_t size,
+ enum dma_data_direction dir, unsigned long attrs,
+ u64 dma_mask)
+{
+   size_t aligned_size = ALIGN(size, VTD_PAGE_SIZE);
+   struct dmar_domain *domain;
+   struct intel_iommu *iommu;
+   unsigned long iova_pfn;
+   unsigned long nrpages;
+   phys_addr_t tlb_addr;
+   int prot = 0;
+   int ret;
+
+   domain = find_domain(dev);
+   if (WARN_ON(dir == DMA_NONE || !domain))
+   return DMA_MAPPING_ERROR;
+
+   iommu = domain_get_iommu(domain);
+   if (WARN_ON(!iommu))
+   return DMA_MAPPING_ERROR;
+
+   nrpages = aligned_nrpages(0, size);
+   iova_pfn = intel_alloc_iova(dev, domain,
+   dma_to_mm_pfn(nrpages), dma_mask);
+   if (!iova_pfn)
+   return DMA_MAPPING_ERROR;
+
+   /*
+* Check if DMAR supports zero-length reads on write only
+* mappings..
+*/
+   if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL ||
+   !cap_zlr(iommu->cap))
+   prot |= DMA_PTE_READ;
+   if (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL)
+   prot |= DMA_PTE_WRITE;
+
+   /*
+* If both the physical buffer start address and size are
+* page aligned, we don't need to use a bounce page.
+*/
+   if (!IS_ALIGNED(paddr | size, VTD_PAGE_SIZE)) {
+   tlb_addr = swiotlb_tbl_map_single(dev,
+   __phys_to_dma(dev, io_tlb_start),
+   paddr, size, aligned_size, dir, attrs);
+   if (tlb_addr == DMA_MAPPING_ERROR) {
+   goto swiotlb_error;
+   } else {
+   /* Cleanup the padding area. */
+   void *padding_start = phys_to_virt(tlb_addr);
+   size_t padding_size = aligned_size;
+
+   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
+   (dir == DMA_TO_DEVICE ||
+dir == DMA_BIDIRECTIONAL)) {
+   padding_start += size;
+   padding_size -= size;
+   }
+
+   memset(padding_start, 0, padding_size);
+   }
+   } else {
+   tlb_addr = paddr;
+   }
+
+   ret = domain_pfn_mapping(domain, mm_to_dma_pfn(iova_pfn),
+tlb_addr >> VTD_PAGE_SHIFT, nrpages, prot);
+   if (ret)
+   goto mapping_error;
+
+   trace_bounce_map_single(dev, iova_pfn << PAGE_SHIFT, paddr, size);
+
+   return (phys_addr_t)iova_pfn << PAGE_SHIFT;
+
+mapping_error:
+   if (is_swiotlb_buffer(tlb_addr))
+   swiotlb_tbl_unmap_single(dev, tlb_addr, size,
+  

Re: [PATCH v4 3/3] iommu: arm-smmu-impl: Add sdm845 implementation hook

2019-09-05 Thread Vivek Gautam
On Fri, Aug 23, 2019 at 12:03 PM Vivek Gautam
 wrote:
>
> Add reset hook for sdm845 based platforms to turn off
> the wait-for-safe sequence.
>
> Understanding how wait-for-safe logic affects USB and UFS performance
> on MTP845 and DB845 boards:
>
> Qcom's implementation of arm,mmu-500 adds WAIT-FOR-SAFE logic
> to address under-performance issues in real-time clients, such as
> display and camera.
> On receiving an invalidation request, the SMMU forwards a SAFE request
> to these clients and waits for a SAFE ack signal from them.
> The SAFE signal from such clients is used to qualify the start of
> invalidation.
> This logic is controlled by chicken bits, one for each - MDP (display),
> IFE0, and IFE1 (camera), that can be accessed only from secure software
> on sdm845.
>
> This configuration, however, degrades the performance of non-real-time
> clients, such as USB and UFS. This happens because, with wait-for-safe
> logic enabled, the hardware throttles non-real-time clients while
> waiting for SAFE ack signals from real-time clients.
>
> On MTP845 and DB845 devices, with wait-for-safe logic enabled by the
> bootloaders, we see degraded performance of USB and UFS when the kernel
> enables SMMU stage-1 translations for these clients.
> Turning off this wait-for-safe logic from the kernel gets us back the
> performance of USB and UFS devices, until we revisit this when we start
> seeing performance issues on display/camera on upstream-supported
> SDM845 platforms.
> The bootloaders on these boards implement secure monitor callbacks to
> handle a specific command - QCOM_SCM_SVC_SMMU_PROGRAM with which the
> logic can be toggled.
>
> There are other boards such as cheza whose bootloaders don't enable this
> logic. Such boards don't implement callbacks to handle the specific SCM
> call so disabling this logic for such boards will be a no-op.
>
> This change is inspired by the downstream change from Patrick Daly
> to address performance issues with display and camera by handling
> this wait-for-safe within separate io-pagetable ops to do TLB
> maintenance. So a big thanks to him for the change and for all the
> offline discussions.
>
> Without this change the UFS reads are pretty slow:
> $ time dd if=/dev/sda of=/dev/zero bs=1048576 count=10 conv=sync
> 10+0 records in
> 10+0 records out
> 10485760 bytes (10.0MB) copied, 22.394903 seconds, 457.2KB/s
> real0m 22.39s
> user0m 0.00s
> sys 0m 0.01s
>
> With this change they are back to rock!
> $ time dd if=/dev/sda of=/dev/zero bs=1048576 count=300 conv=sync
> 300+0 records in
> 300+0 records out
> 314572800 bytes (300.0MB) copied, 1.030541 seconds, 291.1MB/s
> real0m 1.03s
> user0m 0.00s
> sys 0m 0.54s
>
> Signed-off-by: Vivek Gautam 
> ---
>  drivers/iommu/arm-smmu-impl.c | 27 ++-
>  1 file changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
> index 3f88cd078dd5..0aef87c41f9c 100644
> --- a/drivers/iommu/arm-smmu-impl.c
> +++ b/drivers/iommu/arm-smmu-impl.c
> @@ -6,6 +6,7 @@
>
>  #include 
>  #include 
> +#include 
>
>  #include "arm-smmu.h"
>
> @@ -102,7 +103,6 @@ static struct arm_smmu_device *cavium_smmu_impl_init(struct arm_smmu_device *smm
> return &cs->smmu;
>  }
>
> -
>  #define ARM_MMU500_ACTLR_CPRE  (1 << 1)
>
>  #define ARM_MMU500_ACR_CACHE_LOCK  (1 << 26)
> @@ -147,6 +147,28 @@ static const struct arm_smmu_impl arm_mmu500_impl = {
> .reset = arm_mmu500_reset,
>  };
>
> +static int qcom_sdm845_smmu500_reset(struct arm_smmu_device *smmu)
> +{
> +   int ret;
> +
> +   arm_mmu500_reset(smmu);
> +
> +   /*
> +* To address performance degradation in non-real time clients,
> +* such as USB and UFS, turn off wait-for-safe on sdm845 based boards,
> +* such as MTP and db845, whose firmwares implement secure monitor
> +* call handlers to turn on/off the wait-for-safe logic.
> +*/
> +   ret = qcom_scm_qsmmu500_wait_safe_toggle(0);
> +   if (ret)
> +   dev_warn(smmu->dev, "Failed to turn off SAFE logic\n");
> +
> +   return 0;
> +}
> +
> +const struct arm_smmu_impl qcom_sdm845_smmu500_impl = {
> +   .reset = qcom_sdm845_smmu500_reset,
> +};
>
>  struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu)
>  {
> @@ -170,5 +192,8 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu)
>   "calxeda,smmu-secure-config-access"))
> smmu->impl = &calxeda_impl;
>
> +   if (of_device_is_compatible(smmu->dev->of_node, "qcom,sdm845-smmu-500"))
> +   smmu->impl = &qcom_sdm845_smmu500_impl;
> +
> return smmu;
>  }

Hi Robin,

How does this change look now? Let me know if there are any concerns.
I will be happy to address them.

Thanks & Regards
Vivek
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation