Re: [PATCH 01/12] iommu/vt-d: Use iommu_get_domain_for_dev() in debugfs

2022-05-30 Thread Baolu Lu

On 2022/5/30 20:14, Jason Gunthorpe wrote:

On Sun, May 29, 2022 at 01:14:46PM +0800, Baolu Lu wrote:


 From 1e87b5df40c6ce9414cdd03988c3b52bfb17af5f Mon Sep 17 00:00:00 2001
From: Lu Baolu 
Date: Sun, 29 May 2022 10:18:56 +0800
Subject: [PATCH 1/1] iommu/vt-d: debugfs: Remove device_domain_lock usage

The domain_translation_struct debugfs node is used to dump static
mappings of PCI devices. It potentially races with setting new
domains to devices and the iommu_map/unmap() interfaces. The existing
code tries to use the global spinlock device_domain_lock to avoid the
races, but this is problematical as this lock is only used to protect
the device tracking lists of the domains.

Instead of using an immature lock to cover up the problem, it's better
to explicitly restrict the use of this debugfs node. This also makes
device_domain_lock static.


What does "explicitly restrict" mean?


I originally thought about adding restrictions on this interface to a
document. But obviously, this is a naive idea. :-) I went over the code
again. The races exist in two paths:

1. Dump the page table in use while setting a new page table to the
   device.
2. A high-level page table entry has been marked as non-present, but the
   dumping code has walked down to the low-level tables.

For case 1, we can try to solve it by dumping tables while holding the
group->mutex.

For case 2, it is a bit weird. I tried adding a rwsem to make
iommu_unmap() and the debugfs table dump mutually exclusive. This does
not work because debugfs may itself depend on device DMA to work. It
seems that all we can do is allow this race, but check the validity of
the physical address retrieved from each page table entry while
traversing the page table in debugfs. Then the worst case is that some
useless information gets printed.
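
To illustrate the idea for case 2, here is a minimal user-space sketch, not the kernel code. It assumes a toy page table in which each present entry stores a frame number, and every toy_* name is hypothetical. toy_frame_valid() stands in for pfn_valid(): the walker checks the frame taken from an entry before descending, and stops at that level if the frame is not backed by memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ENTRIES   4      /* entries per table (hypothetical, kept tiny) */
#define TOY_NR_FRAMES 8      /* frames of simulated "physical memory" */
#define TOY_PRESENT   0x1ULL /* bit 0: entry is present */

struct toy_table {
	uint64_t entry[TOY_ENTRIES];
};

/* Simulated physical memory: frame N is toy_frames[N]. */
static struct toy_table toy_frames[TOY_NR_FRAMES];

/* Stand-in for pfn_valid(): is this frame number backed by memory? */
static int toy_frame_valid(uint64_t frame)
{
	return frame < TOY_NR_FRAMES;
}

/* Build a present entry that points at the given frame. */
static uint64_t toy_mk_entry(uint64_t frame)
{
	return (frame << 1) | TOY_PRESENT;
}

/*
 * Walk a table at the given level (1 is the leaf level) and return the
 * number of present leaf entries reached. A racing unmap may leave a
 * stale or garbage entry behind, so the frame is validated before the
 * walker descends into it.
 */
static unsigned int toy_walk(const struct toy_table *table, int level)
{
	unsigned int leaves = 0;

	assert(level >= 1);

	for (size_t i = 0; i < TOY_ENTRIES; i++) {
		uint64_t e = table->entry[i];
		uint64_t frame;

		if (!(e & TOY_PRESENT))
			continue;

		if (level == 1) {
			leaves++;
			continue;
		}

		frame = e >> 1;
		if (!toy_frame_valid(frame))
			break;	/* do not dereference an invalid frame */

		leaves += toy_walk(&toy_frames[frame], level - 1);
	}

	return leaves;
}
```

With an invalid entry in the middle of a table, the walk simply ends early at that level, so the dump may be incomplete but never follows a bad pointer.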

The real code looks like this:

From 3feb0727f9d7095729ef75ab1967270045b3a38c Mon Sep 17 00:00:00 2001
From: Lu Baolu 
Date: Sun, 29 May 2022 10:18:56 +0800
Subject: [PATCH 1/1] iommu/vt-d: debugfs: Remove device_domain_lock usage

The domain_translation_struct debugfs node is used to dump the DMAR page
tables for the PCI devices. It potentially races with setting domains to
devices and the iommu_unmap() interface. The existing code uses a global
spinlock device_domain_lock to avoid the races, but this is problematical
as this lock is only used to protect the device tracking lists of each
domain.

This replaces device_domain_lock with group->mutex to protect the
traversal of the page tables against a new domain being set, and always
checks the physical address retrieved from a page table entry before
descending to the next-level page table.

As a cleanup, this also makes device_domain_lock static.

Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel/debugfs.c | 42 ++-
 drivers/iommu/intel/iommu.c   |  2 +-
 drivers/iommu/intel/iommu.h   |  1 -
 3 files changed, 27 insertions(+), 18 deletions(-)

diff --git a/drivers/iommu/intel/debugfs.c b/drivers/iommu/intel/debugfs.c
index d927ef10641b..e6f4835b8d9f 100644
--- a/drivers/iommu/intel/debugfs.c
+++ b/drivers/iommu/intel/debugfs.c
@@ -333,25 +333,28 @@ static void pgtable_walk_level(struct seq_file *m, struct dma_pte *pde,
 			continue;

 		path[level] = pde->val;
-		if (dma_pte_superpage(pde) || level == 1)
+		if (dma_pte_superpage(pde) || level == 1) {
 			dump_page_info(m, start, path);
-		else
-			pgtable_walk_level(m, phys_to_virt(dma_pte_addr(pde)),
+		} else {
+			unsigned long phys_addr;
+
+			phys_addr = (unsigned long)dma_pte_addr(pde);
+			if (!pfn_valid(__phys_to_pfn(phys_addr)))
+				break;
+			pgtable_walk_level(m, phys_to_virt(phys_addr),
 					   level - 1, start, path);
+		}
 		path[level] = 0;
 	}
 }

-static int show_device_domain_translation(struct device *dev, void *data)
+static int __show_device_domain_translation(struct device *dev, void *data)
 {
 	struct device_domain_info *info = dev_iommu_priv_get(dev);
 	struct dmar_domain *domain = info->domain;
 	struct seq_file *m = data;
 	u64 path[6] = { 0 };

-	if (!domain)
-		return 0;
-
 	seq_printf(m, "Device %s @0x%llx\n", dev_name(dev),
 		   (u64)virt_to_phys(domain->pgd));
 	seq_puts(m, "IOVA_PFN\t\tPML5E\t\t\tPML4E\t\t\tPDPE\t\t\tPDE\t\t\tPTE\n");
@@ -359,20 +362,27 @@ static int show_device_domain_translation(struct device *dev, void *data)
 	pgtable_walk_level(m, domain->pgd, domain->agaw + 2, 0, path);
 	seq_putc(m, '\n');

-	return 0;
+	return 1;
 }

-static int domain_translation_struct_show(struct seq_file *m, void *unused)
+static int show_device_domain_translation(struct 

[PATCH V3 5/8] dt-bindings: Add xen, grant-dma IOMMU description for xen-grant DMA ops

2022-05-30 Thread Oleksandr Tyshchenko
From: Oleksandr Tyshchenko 

The main purpose of this binding is to communicate Xen specific
information using the generic IOMMU device tree bindings (which are
a good fit here) rather than introducing a custom property.

Introduce a Xen specific IOMMU for virtualized devices (e.g. virtio),
to be used by the Xen grant DMA-mapping layer in the subsequent commit.

A reference to the Xen specific IOMMU node through the "iommus" property
indicates that Xen grant mappings need to be enabled for the device,
and it specifies the ID of the domain where the corresponding backend
resides. The domid (domain ID) is used as an argument to the Xen grant
mapping APIs.

This is needed for the option to restrict memory access using Xen grant
mappings to work. The primary goal of that option is to enable the use
of virtio devices in Xen guests.

Signed-off-by: Oleksandr Tyshchenko 
---
Changes RFC -> V1:
   - update commit subject/description and text in description
   - move to devicetree/bindings/arm/

Changes V1 -> V2:
   - update text in description
   - change the maintainer of the binding
   - fix validation issue
   - reference xen,dev-domid.yaml schema from virtio/mmio.yaml

Changes V2 -> V3:
   - Stefano already gave his Reviewed-by, I dropped it due to the
     (significant) changes
   - use generic IOMMU device tree bindings instead of custom property
     "xen,dev-domid"
   - change commit subject and description, was
     "dt-bindings: Add xen,dev-domid property description for xen-grant DMA ops"
---
 .../devicetree/bindings/iommu/xen,grant-dma.yaml   | 49 ++
 1 file changed, 49 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/iommu/xen,grant-dma.yaml

diff --git a/Documentation/devicetree/bindings/iommu/xen,grant-dma.yaml b/Documentation/devicetree/bindings/iommu/xen,grant-dma.yaml
new file mode 100644
index ..ab5765c
--- /dev/null
+++ b/Documentation/devicetree/bindings/iommu/xen,grant-dma.yaml
@@ -0,0 +1,49 @@
+# SPDX-License-Identifier: (GPL-2.0-only or BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/iommu/xen,grant-dma.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Xen specific IOMMU for virtualized devices (e.g. virtio)
+
+maintainers:
+  - Stefano Stabellini 
+
+description:
+  The reference to Xen specific IOMMU node using "iommus" property indicates
+  that Xen grant mappings need to be enabled for the device, and it specifies
+  the ID of the domain where the corresponding backend resides.
+  The binding is required to restrict memory access using Xen grant mappings.
+
+properties:
+  compatible:
+    const: xen,grant-dma
+
+  '#iommu-cells':
+    const: 1
+    description:
+      Xen specific IOMMU is multiple-master IOMMU device.
+      The single cell describes the domid (domain ID) of the domain where
+      the backend is running.
+
+required:
+  - compatible
+  - "#iommu-cells"
+
+additionalProperties: false
+
+examples:
+  - |
+    xen_iommu {
+        compatible = "xen,grant-dma";
+        #iommu-cells = <1>;
+    };
+
+    virtio@3000 {
+        compatible = "virtio,mmio";
+        reg = <0x3000 0x100>;
+        interrupts = <41>;
+
+        /* The backend is located in Xen domain with ID 1 */
+        iommus = <&xen_iommu 1>;
+    };
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 2/3] iommu: mtk_iommu: add support for 6-bit encoded port IDs

2022-05-30 Thread Fabien Parent
Until now the port ID has always been encoded in 5 bits. On MT8365,
the port ID is encoded in 6 bits. This requires reworking the macros
F_MMU_INT_ID_LARB_ID and F_MMU_INT_ID_PORT_ID so that they support
both 5-bit and 6-bit encoded port IDs.

Signed-off-by: Fabien Parent 
---
 drivers/iommu/mtk_iommu.c | 17 +
 drivers/iommu/mtk_iommu.h |  1 +
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 6fd75a60abd6..b692347d8d56 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -103,8 +103,10 @@
 #define REG_MMU1_INT_ID				0x154
 #define F_MMU_INT_ID_COMM_ID(a)			(((a) >> 9) & 0x7)
 #define F_MMU_INT_ID_SUB_COMM_ID(a)		(((a) >> 7) & 0x3)
-#define F_MMU_INT_ID_LARB_ID(a)			(((a) >> 7) & 0x7)
-#define F_MMU_INT_ID_PORT_ID(a)			(((a) >> 2) & 0x1f)
+#define F_MMU_INT_ID_LARB_ID(a, port_width)	\
+	(((a) >> ((port_width) + 2)) & 0x7)
+#define F_MMU_INT_ID_PORT_ID(a, port_width)	\
+	(((a) >> 2) & GENMASK((port_width) - 1, 0))
 
 #define MTK_PROTECT_PA_ALIGN   256
 
@@ -291,12 +293,13 @@ static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
fault_pa |= (u64)pa34_32 << 32;
}
 
-   fault_port = F_MMU_INT_ID_PORT_ID(regval);
+   fault_port = F_MMU_INT_ID_PORT_ID(regval, data->plat_data->port_width);
if (MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_SUB_COMM)) {
fault_larb = F_MMU_INT_ID_COMM_ID(regval);
sub_comm = F_MMU_INT_ID_SUB_COMM_ID(regval);
} else {
-   fault_larb = F_MMU_INT_ID_LARB_ID(regval);
+   fault_larb = F_MMU_INT_ID_LARB_ID(regval,
+ data->plat_data->port_width);
}
fault_larb = data->plat_data->larbid_remap[fault_larb][sub_comm];
 
@@ -1034,6 +1037,7 @@ static const struct mtk_iommu_plat_data mt2712_data = {
.iova_region  = single_domain,
.iova_region_nr = ARRAY_SIZE(single_domain),
.larbid_remap = {{0}, {1}, {2}, {3}, {4}, {5}, {6}, {7}},
+   .port_width   = 5,
 };
 
 static const struct mtk_iommu_plat_data mt6779_data = {
@@ -1043,6 +1047,7 @@ static const struct mtk_iommu_plat_data mt6779_data = {
.iova_region   = single_domain,
.iova_region_nr = ARRAY_SIZE(single_domain),
.larbid_remap  = {{0}, {1}, {2}, {3}, {5}, {7, 8}, {10}, {9}},
+   .port_width= 5,
 };
 
 static const struct mtk_iommu_plat_data mt8167_data = {
@@ -1052,6 +1057,7 @@ static const struct mtk_iommu_plat_data mt8167_data = {
.iova_region  = single_domain,
.iova_region_nr = ARRAY_SIZE(single_domain),
.larbid_remap = {{0}, {1}, {2}}, /* Linear mapping. */
+   .port_width   = 5,
 };
 
 static const struct mtk_iommu_plat_data mt8173_data = {
@@ -1062,6 +1068,7 @@ static const struct mtk_iommu_plat_data mt8173_data = {
.iova_region  = single_domain,
.iova_region_nr = ARRAY_SIZE(single_domain),
.larbid_remap = {{0}, {1}, {2}, {3}, {4}, {5}}, /* Linear mapping. */
+   .port_width   = 5,
 };
 
 static const struct mtk_iommu_plat_data mt8183_data = {
@@ -1071,6 +1078,7 @@ static const struct mtk_iommu_plat_data mt8183_data = {
.iova_region  = single_domain,
.iova_region_nr = ARRAY_SIZE(single_domain),
.larbid_remap = {{0}, {4}, {5}, {6}, {7}, {2}, {3}, {1}},
+   .port_width   = 5,
 };
 
 static const struct mtk_iommu_plat_data mt8192_data = {
@@ -1082,6 +1090,7 @@ static const struct mtk_iommu_plat_data mt8192_data = {
.iova_region_nr = ARRAY_SIZE(mt8192_multi_dom),
.larbid_remap   = {{0}, {1}, {4, 5}, {7}, {2}, {9, 11, 19, 20},
   {0, 14, 16}, {0, 13, 18, 17}},
+   .port_width = 5,
 };
 
 static const struct of_device_id mtk_iommu_of_ids[] = {
diff --git a/drivers/iommu/mtk_iommu.h b/drivers/iommu/mtk_iommu.h
index b742432220c5..84cecaf6d61c 100644
--- a/drivers/iommu/mtk_iommu.h
+++ b/drivers/iommu/mtk_iommu.h
@@ -54,6 +54,7 @@ struct mtk_iommu_plat_data {
enum mtk_iommu_plat m4u_plat;
u32 flags;
u32 inv_sel_reg;
+   u8  port_width;
 
 	unsigned int			iova_region_nr;
const struct mtk_iommu_iova_region  *iova_region;
-- 
2.36.1
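
For readers following the encoding change, here is a minimal user-space sketch of the fault ID layout described in the commit message. It assumes that bits [1:0] are skipped, the port ID occupies the next port_width bits, and a 3-bit larb ID sits directly above it. The helper names are hypothetical and are not part of the driver. Note that the larb mask has to be applied after the shift.

```c
#include <assert.h>
#include <stdint.h>

/* Extract the port ID: port_width bits starting at bit 2. */
static inline uint32_t toy_int_id_port(uint32_t regval, unsigned int port_width)
{
	assert(port_width >= 1 && port_width <= 8);
	return (regval >> 2) & ((1u << port_width) - 1);
}

/* Extract the 3-bit larb ID that sits directly above the port ID field. */
static inline uint32_t toy_int_id_larb(uint32_t regval, unsigned int port_width)
{
	assert(port_width >= 1 && port_width <= 8);
	/* Shift first, then mask the shifted value. */
	return (regval >> (port_width + 2)) & 0x7;
}
```

For example, with a 6-bit port ID the value 0x3a8 decodes to larb 3 and port 0x2a, while with a 5-bit port ID the value 0x2c4 decodes to larb 5 and port 0x11.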


[PATCH 3/3] iommu: mtk_iommu: add support for MT8365 SoC

2022-05-30 Thread Fabien Parent
Add IOMMU support for MT8365 SoC.

Signed-off-by: Fabien Parent 
---
 drivers/iommu/mtk_iommu.c | 11 +++
 drivers/iommu/mtk_iommu.h |  1 +
 2 files changed, 12 insertions(+)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index b692347d8d56..039b8f9d5022 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -1093,6 +1093,16 @@ static const struct mtk_iommu_plat_data mt8192_data = {
.port_width = 5,
 };
 
+static const struct mtk_iommu_plat_data mt8365_data = {
+   .m4u_plat = M4U_MT8365,
+   .flags= RESET_AXI,
+   .inv_sel_reg  = REG_MMU_INV_SEL_GEN1,
+   .iova_region  = single_domain,
+   .iova_region_nr = ARRAY_SIZE(single_domain),
+   .larbid_remap = {{0}, {1}, {2}, {3}, {4}, {5}}, /* Linear mapping. */
+   .port_width   = 6,
+};
+
 static const struct of_device_id mtk_iommu_of_ids[] = {
 	{ .compatible = "mediatek,mt2712-m4u", .data = &mt2712_data},
 	{ .compatible = "mediatek,mt6779-m4u", .data = &mt6779_data},
@@ -1100,6 +1110,7 @@ static const struct of_device_id mtk_iommu_of_ids[] = {
 	{ .compatible = "mediatek,mt8173-m4u", .data = &mt8173_data},
 	{ .compatible = "mediatek,mt8183-m4u", .data = &mt8183_data},
 	{ .compatible = "mediatek,mt8192-m4u", .data = &mt8192_data},
+	{ .compatible = "mediatek,mt8365-m4u", .data = &mt8365_data},
{}
 };
 
diff --git a/drivers/iommu/mtk_iommu.h b/drivers/iommu/mtk_iommu.h
index 84cecaf6d61c..cb174fa6f2ab 100644
--- a/drivers/iommu/mtk_iommu.h
+++ b/drivers/iommu/mtk_iommu.h
@@ -46,6 +46,7 @@ enum mtk_iommu_plat {
M4U_MT8173,
M4U_MT8183,
M4U_MT8192,
+   M4U_MT8365,
 };
 
 struct mtk_iommu_iova_region;
-- 
2.36.1


[PATCH 1/3] dt-bindings: iommu: mediatek: add binding documentation for MT8365 SoC

2022-05-30 Thread Fabien Parent
Add IOMMU binding documentation for the MT8365 SoC.

Signed-off-by: Fabien Parent 
---
 .../bindings/iommu/mediatek,iommu.yaml|  2 +
 include/dt-bindings/memory/mt8365-larb-port.h | 96 +++
 2 files changed, 98 insertions(+)
 create mode 100644 include/dt-bindings/memory/mt8365-larb-port.h

diff --git a/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml b/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
index 97e8c471a5e8..5ba688365da5 100644
--- a/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
+++ b/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
@@ -77,6 +77,7 @@ properties:
   - mediatek,mt8173-m4u  # generation two
   - mediatek,mt8183-m4u  # generation two
   - mediatek,mt8192-m4u  # generation two
+  - mediatek,mt8365-m4u  # generation two
 
   - description: mt7623 generation one
 items:
@@ -120,6 +121,7 @@ properties:
   dt-binding/memory/mt8173-larb-port.h for mt8173,
   dt-binding/memory/mt8183-larb-port.h for mt8183,
   dt-binding/memory/mt8192-larb-port.h for mt8192.
+  dt-binding/memory/mt8365-larb-port.h for mt8365.
 
   power-domains:
 maxItems: 1
diff --git a/include/dt-bindings/memory/mt8365-larb-port.h b/include/dt-bindings/memory/mt8365-larb-port.h
new file mode 100644
index ..e7d5637aa38e
--- /dev/null
+++ b/include/dt-bindings/memory/mt8365-larb-port.h
@@ -0,0 +1,96 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2022 MediaTek Inc.
+ * Author: Yong Wu 
+ */
+#ifndef _DT_BINDINGS_MEMORY_MT8365_LARB_PORT_H_
+#define _DT_BINDINGS_MEMORY_MT8365_LARB_PORT_H_
+
+#include 
+
+#define M4U_LARB0_ID   0
+#define M4U_LARB1_ID   1
+#define M4U_LARB2_ID   2
+#define M4U_LARB3_ID   3
+#define M4U_LARB4_ID   4
+#define M4U_LARB5_ID   5
+#define M4U_LARB6_ID   6
+#define M4U_LARB7_ID   7
+
+/* larb0 */
+#define M4U_PORT_DISP_OVL0 MTK_M4U_ID(0, 0)
+#define M4U_PORT_DISP_OVL0_2L MTK_M4U_ID(0, 1)
+#define M4U_PORT_DISP_RDMA0 MTK_M4U_ID(0, 2)
+#define M4U_PORT_DISP_WDMA0 MTK_M4U_ID(0, 3)
+#define M4U_PORT_DISP_RDMA1 MTK_M4U_ID(0, 4)
+#define M4U_PORT_MDP_RDMA0 MTK_M4U_ID(0, 5)
+#define M4U_PORT_MDP_WROT1 MTK_M4U_ID(0, 6)
+#define M4U_PORT_MDP_WROT0 MTK_M4U_ID(0, 7)
+#define M4U_PORT_MDP_RDMA1 MTK_M4U_ID(0, 8)
+#define M4U_PORT_DISP_FAKE0 MTK_M4U_ID(0, 9)
+
+/* larb1 */
+#define M4U_PORT_VENC_RCPU MTK_M4U_ID(1, 0)
+#define M4U_PORT_VENC_REC MTK_M4U_ID(1, 1)
+#define M4U_PORT_VENC_BSDMA MTK_M4U_ID(1, 2)
+#define M4U_PORT_VENC_SV_COMV MTK_M4U_ID(1, 3)
+#define M4U_PORT_VENC_RD_COMV MTK_M4U_ID(1, 4)
+#define M4U_PORT_VENC_NBM_RDMA MTK_M4U_ID(1, 5)
+#define M4U_PORT_VENC_NBM_RDMA_LITE MTK_M4U_ID(1, 6)
+#define M4U_PORT_JPGENC_Y_RDMA MTK_M4U_ID(1, 7)
+#define M4U_PORT_JPGENC_C_RDMA MTK_M4U_ID(1, 8)
+#define M4U_PORT_JPGENC_Q_TABLE MTK_M4U_ID(1, 9)
+#define M4U_PORT_JPGENC_BSDMA MTK_M4U_ID(1, 10)
+#define M4U_PORT_JPGDEC_WDMA MTK_M4U_ID(1, 11)
+#define M4U_PORT_JPGDEC_BSDMA MTK_M4U_ID(1, 12)
+#define M4U_PORT_VENC_NBM_WDMA MTK_M4U_ID(1, 13)
+#define M4U_PORT_VENC_NBM_WDMA_LITE MTK_M4U_ID(1, 14)
+#define M4U_PORT_VENC_CUR_LUMA MTK_M4U_ID(1, 15)
+#define M4U_PORT_VENC_CUR_CHROMA MTK_M4U_ID(1, 16)
+#define M4U_PORT_VENC_REF_LUMA MTK_M4U_ID(1, 17)
+#define M4U_PORT_VENC_REF_CHROMA MTK_M4U_ID(1, 18)
+
+/* larb2 */
+#define M4U_PORT_CAM_IMGO MTK_M4U_ID(2, 0)
+#define M4U_PORT_CAM_RRZO MTK_M4U_ID(2, 1)
+#define M4U_PORT_CAM_AAO MTK_M4U_ID(2, 2)
+#define M4U_PORT_CAM_LCS MTK_M4U_ID(2, 3)
+#define M4U_PORT_CAM_ESFKO MTK_M4U_ID(2, 4)
+#define M4U_PORT_CAM_CAM_SV0 MTK_M4U_ID(2, 5)
+#define M4U_PORT_CAM_CAM_SV1 MTK_M4U_ID(2, 6)
+#define M4U_PORT_CAM_LSCI MTK_M4U_ID(2, 7)
+#define M4U_PORT_CAM_LSCI_D MTK_M4U_ID(2, 8)
+#define M4U_PORT_CAM_AFO MTK_M4U_ID(2, 9)
+#define M4U_PORT_CAM_SPARE MTK_M4U_ID(2, 10)
+#define M4U_PORT_CAM_BPCI MTK_M4U_ID(2, 11)
+#define M4U_PORT_CAM_BPCI_D MTK_M4U_ID(2, 12)
+#define M4U_PORT_CAM_UFDI MTK_M4U_ID(2, 13)
+#define M4U_PORT_CAM_IMGI MTK_M4U_ID(2, 14)
+#define M4U_PORT_CAM_IMG2O MTK_M4U_ID(2, 15)
+#define M4U_PORT_CAM_IMG3O MTK_M4U_ID(2, 16)
+#define M4U_PORT_CAM_WPE0_I MTK_M4U_ID(2, 17)
+#define M4U_PORT_CAM_WPE1_I MTK_M4U_ID(2, 18)
+#define M4U_PORT_CAM_WPE_O MTK_M4U_ID(2, 19)
+#define M4U_PORT_CAM_FD0_I MTK_M4U_ID(2, 20)
+#define M4U_PORT_CAM_FD1_I MTK_M4U_ID(2, 21)

[syzbot] WARNING in dma_map_sgtable (2)

2022-05-30 Thread syzbot
Hello,

syzbot found the following issue on:

HEAD commit:    7e062cda7d90 Merge tag 'net-next-5.19' of git://git.kernel..
git tree:       upstream
console+strace: https://syzkaller.appspot.com/x/log.txt?x=172151d3f0
kernel config:  https://syzkaller.appspot.com/x/.config?x=e9d71d3c07c36588
dashboard link: https://syzkaller.appspot.com/bug?extid=3ba551855046ba3b3806
compiler:       Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=12918503f0
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1386fa39f0

Bisection is inconclusive: the issue happens on the oldest tested release.

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=14107ee5f0
final oops: https://syzkaller.appspot.com/x/report.txt?x=16107ee5f0
console output: https://syzkaller.appspot.com/x/log.txt?x=12107ee5f0

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+3ba551855046ba3b3...@syzkaller.appspotmail.com

------------[ cut here ]------------
WARNING: CPU: 0 PID: 3610 at kernel/dma/mapping.c:188 dma_map_sgtable+0x203/0x260 kernel/dma/mapping.c:264
Modules linked in:
CPU: 0 PID: 3610 Comm: syz-executor162 Not tainted 5.18.0-syzkaller-04943-g7e062cda7d90 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:__dma_map_sg_attrs kernel/dma/mapping.c:188 [inline]
RIP: 0010:dma_map_sgtable+0x203/0x260 kernel/dma/mapping.c:264
Code: 75 15 e8 50 5f 14 00 eb cb e8 49 5f 14 00 eb c4 e8 42 5f 14 00 eb bd e8 3b 5f 14 00 0f 0b bd fb ff ff ff eb af e8 2d 5f 14 00 <0f> 0b 31 ed 48 bb 00 00 00 00 00 fc ff df e9 7b ff ff ff 89 e9 80
RSP: 0018:c9000305fd40 EFLAGS: 00010293
RAX: 81723873 RBX: dc00 RCX: 88801fbb8000
RDX:  RSI: 0001 RDI: 0002
RBP: 8881487e5408 R08: 81723743 R09: ed1003592c9e
R10: ed1003592c9e R11: 111003592c9c R12: 8881487e5000
R13: 88801ac964e0 R14:  R15: 0001
FS:  56c2a300() GS:8880b9a0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 005d84c8 CR3: 1f1ef000 CR4: 003506f0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 <TASK>
 get_sg_table+0xf9/0x150 drivers/dma-buf/udmabuf.c:72
 begin_cpu_udmabuf+0xf5/0x160 drivers/dma-buf/udmabuf.c:126
 dma_buf_begin_cpu_access+0xd8/0x170 drivers/dma-buf/dma-buf.c:1172
 dma_buf_ioctl+0x2a0/0x2f0 drivers/dma-buf/dma-buf.c:363
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:870 [inline]
 __se_sys_ioctl+0xfb/0x170 fs/ioctl.c:856
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7f8bf9c6dc19
Code: 28 c3 e8 2a 14 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:7ffd7cfae1d8 EFLAGS: 0246 ORIG_RAX: 0010
RAX: ffda RBX:  RCX: 7f8bf9c6dc19
RDX: 2100 RSI: 40086200 RDI: 0006
RBP: 7f8bf9c31dc0 R08:  R09: 
R10:  R11: 0246 R12: 7f8bf9c31e50
R13:  R14:  R15: 
 </TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkal...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches


[PATCH AUTOSEL 4.19 26/38] dma-debug: change allocation mode from GFP_NOWAIT to GFP_ATIOMIC

2022-05-30 Thread Sasha Levin
From: Mikulas Patocka 

[ Upstream commit 84bc4f1d5f8aa68706a96711dccb28b518e5 ]

We observed the error "cacheline tracking ENOMEM, dma-debug disabled"
during a light system load (copying some files). The reason for this error
is that the dma_active_cacheline radix tree uses GFP_NOWAIT allocation -
so it can't access the emergency memory reserves and it fails as soon as
anybody reaches the watermark.

This patch changes GFP_NOWAIT to GFP_ATOMIC, so that it can access the
emergency memory reserves.

Signed-off-by: Mikulas Patocka 
Signed-off-by: Christoph Hellwig 
Signed-off-by: Sasha Levin 
---
 kernel/dma/debug.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index 9c9a5b12f92f..7c6cd00d0fca 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -469,7 +469,7 @@ void debug_dma_dump_mappings(struct device *dev)
  * At any time debug_dma_assert_idle() can be called to trigger a
  * warning if any cachelines in the given page are in the active set.
  */
-static RADIX_TREE(dma_active_cacheline, GFP_NOWAIT);
+static RADIX_TREE(dma_active_cacheline, GFP_ATOMIC);
 static DEFINE_SPINLOCK(radix_lock);
 #define ACTIVE_CACHELINE_MAX_OVERLAP ((1 << RADIX_TREE_MAX_TAGS) - 1)
 #define CACHELINE_PER_PAGE_SHIFT (PAGE_SHIFT - L1_CACHE_SHIFT)
-- 
2.35.1
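
The reasoning in the commit message can be pictured with a toy model. This is a sketch only, not the kernel's allocator logic: the enum, the function name and the "half of the reserve" fraction are all assumptions made for illustration. The point is that a request that may dip below the minimum watermark still succeeds in a range where a request without that permission already fails.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the two allocation modes named in the commit message. */
enum toy_gfp {
	TOY_GFP_NOWAIT,	/* may not dip into the reserves */
	TOY_GFP_ATOMIC,	/* may dip into part of the reserves */
};

/* Would a request for 'request' pages succeed in this toy model? */
static bool toy_alloc_ok(long free_pages, long min_watermark, long request,
			 enum toy_gfp mode)
{
	long floor = min_watermark;

	assert(min_watermark >= 0 && request > 0);

	/* Assumption for illustration: atomic requests may use half the reserve. */
	if (mode == TOY_GFP_ATOMIC)
		floor -= floor / 2;

	return free_pages - request >= floor;
}
```

With 100 free pages and a minimum watermark of 100, a one-page TOY_GFP_NOWAIT request fails in this model while the same TOY_GFP_ATOMIC request succeeds, which mirrors the ENOMEM seen under light load.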


[PATCH AUTOSEL 5.4 37/55] dma-debug: change allocation mode from GFP_NOWAIT to GFP_ATIOMIC

2022-05-30 Thread Sasha Levin
From: Mikulas Patocka 

[ Upstream commit 84bc4f1d5f8aa68706a96711dccb28b518e5 ]

We observed the error "cacheline tracking ENOMEM, dma-debug disabled"
during a light system load (copying some files). The reason for this error
is that the dma_active_cacheline radix tree uses GFP_NOWAIT allocation -
so it can't access the emergency memory reserves and it fails as soon as
anybody reaches the watermark.

This patch changes GFP_NOWAIT to GFP_ATOMIC, so that it can access the
emergency memory reserves.

Signed-off-by: Mikulas Patocka 
Signed-off-by: Christoph Hellwig 
Signed-off-by: Sasha Levin 
---
 kernel/dma/debug.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index 4dc3bbfd3e3f..1c133f610f59 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -450,7 +450,7 @@ void debug_dma_dump_mappings(struct device *dev)
  * At any time debug_dma_assert_idle() can be called to trigger a
  * warning if any cachelines in the given page are in the active set.
  */
-static RADIX_TREE(dma_active_cacheline, GFP_NOWAIT);
+static RADIX_TREE(dma_active_cacheline, GFP_ATOMIC);
 static DEFINE_SPINLOCK(radix_lock);
 #define ACTIVE_CACHELINE_MAX_OVERLAP ((1 << RADIX_TREE_MAX_TAGS) - 1)
 #define CACHELINE_PER_PAGE_SHIFT (PAGE_SHIFT - L1_CACHE_SHIFT)
-- 
2.35.1


[PATCH AUTOSEL 5.10 50/76] dma-debug: change allocation mode from GFP_NOWAIT to GFP_ATIOMIC

2022-05-30 Thread Sasha Levin
From: Mikulas Patocka 

[ Upstream commit 84bc4f1d5f8aa68706a96711dccb28b518e5 ]

We observed the error "cacheline tracking ENOMEM, dma-debug disabled"
during a light system load (copying some files). The reason for this error
is that the dma_active_cacheline radix tree uses GFP_NOWAIT allocation -
so it can't access the emergency memory reserves and it fails as soon as
anybody reaches the watermark.

This patch changes GFP_NOWAIT to GFP_ATOMIC, so that it can access the
emergency memory reserves.

Signed-off-by: Mikulas Patocka 
Signed-off-by: Christoph Hellwig 
Signed-off-by: Sasha Levin 
---
 kernel/dma/debug.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index f8ae54679865..ee7da1f2462f 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -448,7 +448,7 @@ void debug_dma_dump_mappings(struct device *dev)
  * other hand, consumes a single dma_debug_entry, but inserts 'nents'
  * entries into the tree.
  */
-static RADIX_TREE(dma_active_cacheline, GFP_NOWAIT);
+static RADIX_TREE(dma_active_cacheline, GFP_ATOMIC);
 static DEFINE_SPINLOCK(radix_lock);
 #define ACTIVE_CACHELINE_MAX_OVERLAP ((1 << RADIX_TREE_MAX_TAGS) - 1)
 #define CACHELINE_PER_PAGE_SHIFT (PAGE_SHIFT - L1_CACHE_SHIFT)
-- 
2.35.1


[PATCH AUTOSEL 5.10 01/76] iommu/vt-d: Add RPLS to quirk list to skip TE disabling

2022-05-30 Thread Sasha Levin
From: Tejas Upadhyay 

[ Upstream commit 0a967f5bfd9134b89681cae58deb222e20840e76 ]

The VT-d spec requires (10.4.4 Global Command Register, TE
field) that:

Hardware implementations supporting DMA draining must drain
any in-flight DMA read/write requests queued within the
Root-Complex before completing the translation enable
command and reflecting the status of the command through
the TES field in the Global Status register.

Unfortunately, some integrated graphics devices fail to do
so after some kind of power state transition. As a result,
the system might get stuck in iommu_disable_translation(),
waiting for the completion of the TE transition.

This adds RPLS to the quirk list for those devices and skips
TE disabling if the quirk hits.

Link: https://gitlab.freedesktop.org/drm/intel/-/issues/4898
Tested-by: Raviteja Goud Talla 
Cc: Rodrigo Vivi 
Acked-by: Lu Baolu 
Signed-off-by: Tejas Upadhyay 
Reviewed-by: Rodrigo Vivi 
Signed-off-by: Rodrigo Vivi 
Link: https://patchwork.freedesktop.org/patch/msgid/20220302043256.191529-1-tejaskumarx.surendrakumar.upadh...@intel.com
Signed-off-by: Sasha Levin 
---
 drivers/iommu/intel/iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 21749859ad45..477dde39823c 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -6296,7 +6296,7 @@ static void quirk_igfx_skip_te_disable(struct pci_dev *dev)
ver = (dev->device >> 8) & 0xff;
if (ver != 0x45 && ver != 0x46 && ver != 0x4c &&
ver != 0x4e && ver != 0x8a && ver != 0x98 &&
-   ver != 0x9a)
+   ver != 0x9a && ver != 0xa7)
return;
 
if (risky_device(dev))
-- 
2.35.1
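
The quirk check in the diff keys on the high byte of the PCI device ID. A minimal user-space sketch of that matching step follows. The helper names are hypothetical, and the real function also performs checks that are not modelled here (for example risky_device()).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* High bytes of the device IDs on the quirk list, as listed in the diff. */
static const uint8_t toy_te_quirk_vers[] = {
	0x45, 0x46, 0x4c, 0x4e, 0x8a, 0x98, 0x9a, 0xa7,
};

/* Linear search for 'ver' in a small table. */
static bool toy_ver_in_list(uint8_t ver, const uint8_t *list, size_t n)
{
	assert(list != NULL);

	for (size_t i = 0; i < n; i++)
		if (list[i] == ver)
			return true;
	return false;
}

/* Does this device ID fall into the "skip TE disabling" quirk list? */
static bool toy_skip_te_disable(uint16_t device_id)
{
	uint8_t ver = (device_id >> 8) & 0xff;

	return toy_ver_in_list(ver, toy_te_quirk_vers,
			       sizeof(toy_te_quirk_vers) /
			       sizeof(toy_te_quirk_vers[0]));
}
```

Any device ID whose high byte is 0xa7 now matches, in addition to the previously listed values. A table lookup is used here instead of the chained comparisons only to keep the sketch short.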


[PATCH AUTOSEL 5.15 069/109] dma-debug: change allocation mode from GFP_NOWAIT to GFP_ATIOMIC

2022-05-30 Thread Sasha Levin
From: Mikulas Patocka 

[ Upstream commit 84bc4f1d5f8aa68706a96711dccb28b518e5 ]

We observed the error "cacheline tracking ENOMEM, dma-debug disabled"
during a light system load (copying some files). The reason for this error
is that the dma_active_cacheline radix tree uses GFP_NOWAIT allocation -
so it can't access the emergency memory reserves and it fails as soon as
anybody reaches the watermark.

This patch changes GFP_NOWAIT to GFP_ATOMIC, so that it can access the
emergency memory reserves.

Signed-off-by: Mikulas Patocka 
Signed-off-by: Christoph Hellwig 
Signed-off-by: Sasha Levin 
---
 kernel/dma/debug.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index f8ff598596b8..ac740630c79c 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -448,7 +448,7 @@ void debug_dma_dump_mappings(struct device *dev)
  * other hand, consumes a single dma_debug_entry, but inserts 'nents'
  * entries into the tree.
  */
-static RADIX_TREE(dma_active_cacheline, GFP_NOWAIT);
+static RADIX_TREE(dma_active_cacheline, GFP_ATOMIC);
 static DEFINE_SPINLOCK(radix_lock);
 #define ACTIVE_CACHELINE_MAX_OVERLAP ((1 << RADIX_TREE_MAX_TAGS) - 1)
 #define CACHELINE_PER_PAGE_SHIFT (PAGE_SHIFT - L1_CACHE_SHIFT)
-- 
2.35.1


[PATCH AUTOSEL 5.15 001/109] iommu/vt-d: Add RPLS to quirk list to skip TE disabling

2022-05-30 Thread Sasha Levin
From: Tejas Upadhyay 

[ Upstream commit 0a967f5bfd9134b89681cae58deb222e20840e76 ]

The VT-d spec requires (10.4.4 Global Command Register, TE
field) that:

Hardware implementations supporting DMA draining must drain
any in-flight DMA read/write requests queued within the
Root-Complex before completing the translation enable
command and reflecting the status of the command through
the TES field in the Global Status register.

Unfortunately, some integrated graphics devices fail to do
so after some kind of power state transition. As a result,
the system might get stuck in iommu_disable_translation(),
waiting for the completion of the TE transition.

This adds RPLS to the quirk list for those devices and skips
TE disabling if the quirk hits.

Link: https://gitlab.freedesktop.org/drm/intel/-/issues/4898
Tested-by: Raviteja Goud Talla 
Cc: Rodrigo Vivi 
Acked-by: Lu Baolu 
Signed-off-by: Tejas Upadhyay 
Reviewed-by: Rodrigo Vivi 
Signed-off-by: Rodrigo Vivi 
Link: https://patchwork.freedesktop.org/patch/msgid/20220302043256.191529-1-tejaskumarx.surendrakumar.upadh...@intel.com
Signed-off-by: Sasha Levin 
---
 drivers/iommu/intel/iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 91a5c75966f3..a1ffb3d6d901 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5728,7 +5728,7 @@ static void quirk_igfx_skip_te_disable(struct pci_dev *dev)
ver = (dev->device >> 8) & 0xff;
if (ver != 0x45 && ver != 0x46 && ver != 0x4c &&
ver != 0x4e && ver != 0x8a && ver != 0x98 &&
-   ver != 0x9a)
+   ver != 0x9a && ver != 0xa7)
return;
 
if (risky_device(dev))
-- 
2.35.1


[PATCH AUTOSEL 5.17 083/135] dma-debug: change allocation mode from GFP_NOWAIT to GFP_ATIOMIC

2022-05-30 Thread Sasha Levin
From: Mikulas Patocka 

[ Upstream commit 84bc4f1d5f8aa68706a96711dccb28b518e5 ]

We observed the error "cacheline tracking ENOMEM, dma-debug disabled"
during a light system load (copying some files). The reason for this error
is that the dma_active_cacheline radix tree uses GFP_NOWAIT allocation -
so it can't access the emergency memory reserves and it fails as soon as
anybody reaches the watermark.

This patch changes GFP_NOWAIT to GFP_ATOMIC, so that it can access the
emergency memory reserves.

Signed-off-by: Mikulas Patocka 
Signed-off-by: Christoph Hellwig 
Signed-off-by: Sasha Levin 
---
 kernel/dma/debug.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index f8ff598596b8..ac740630c79c 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -448,7 +448,7 @@ void debug_dma_dump_mappings(struct device *dev)
  * other hand, consumes a single dma_debug_entry, but inserts 'nents'
  * entries into the tree.
  */
-static RADIX_TREE(dma_active_cacheline, GFP_NOWAIT);
+static RADIX_TREE(dma_active_cacheline, GFP_ATOMIC);
 static DEFINE_SPINLOCK(radix_lock);
 #define ACTIVE_CACHELINE_MAX_OVERLAP ((1 << RADIX_TREE_MAX_TAGS) - 1)
 #define CACHELINE_PER_PAGE_SHIFT (PAGE_SHIFT - L1_CACHE_SHIFT)
-- 
2.35.1
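
The difference the commit message relies on can be pictured with a toy model. The sketch below is not the kernel page allocator; toy_zone, toy_gfp and toy_alloc_page are hypothetical names, and letting the "atomic" mode dip to half of the watermark is an illustrative assumption rather than the exact kernel rule. It only shows why a no-reserve allocation fails as soon as free memory reaches the watermark, while a reserve-enabled one can still succeed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical allocation modes standing in for GFP_NOWAIT / GFP_ATOMIC. */
enum toy_gfp { TOY_GFP_NOWAIT, TOY_GFP_ATOMIC };

/* Toy zone: a free page counter and a minimum watermark. */
struct toy_zone {
	unsigned long free_pages;
	unsigned long min_watermark;
};

/*
 * Returns true and consumes one page if the request is allowed.
 * Assumption for illustration: the atomic mode may use the part of the
 * reserve above half of the watermark; the no-wait mode may not go
 * below the watermark at all.
 */
static bool toy_alloc_page(struct toy_zone *z, enum toy_gfp gfp)
{
	unsigned long floor = z->min_watermark;

	if (gfp == TOY_GFP_ATOMIC)
		floor /= 2;

	if (z->free_pages <= floor)
		return false;

	z->free_pages--;
	return true;
}

void toy_gfp_example(void)
{
	struct toy_zone z = { .free_pages = 100, .min_watermark = 100 };

	assert(!toy_alloc_page(&z, TOY_GFP_NOWAIT));	/* at the watermark: fails */
	assert(toy_alloc_page(&z, TOY_GFP_ATOMIC));	/* may use the reserve */
	assert(z.free_pages == 99);
}
```

In this model the radix tree node allocation from the commit message corresponds to the second call: it succeeds under light pressure instead of disabling dma-debug.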



[PATCH AUTOSEL 5.17 001/135] iommu/vt-d: Add RPLS to quirk list to skip TE disabling

2022-05-30 Thread Sasha Levin
From: Tejas Upadhyay 

[ Upstream commit 0a967f5bfd9134b89681cae58deb222e20840e76 ]

The VT-d spec requires (10.4.4 Global Command Register, TE
field) that:

Hardware implementations supporting DMA draining must drain
any in-flight DMA read/write requests queued within the
Root-Complex before completing the translation enable
command and reflecting the status of the command through
the TES field in the Global Status register.

Unfortunately, some integrated graphic devices fail to do
so after some kind of power state transition. As a
result, the system might get stuck in
iommu_disable_translation(), waiting for the completion of
the TE transition.

This adds RPLS to a quirk list for those devices and skips
TE disabling if the quirk hits.

Link: https://gitlab.freedesktop.org/drm/intel/-/issues/4898
Tested-by: Raviteja Goud Talla 
Cc: Rodrigo Vivi 
Acked-by: Lu Baolu 
Signed-off-by: Tejas Upadhyay 
Reviewed-by: Rodrigo Vivi 
Signed-off-by: Rodrigo Vivi 
Link: https://patchwork.freedesktop.org/patch/msgid/20220302043256.191529-1-tejaskumarx.surendrakumar.upadh...@intel.com
Signed-off-by: Sasha Levin 
---
 drivers/iommu/intel/iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index ab2273300346..e3f15e0cae34 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5764,7 +5764,7 @@ static void quirk_igfx_skip_te_disable(struct pci_dev *dev)
ver = (dev->device >> 8) & 0xff;
if (ver != 0x45 && ver != 0x46 && ver != 0x4c &&
ver != 0x4e && ver != 0x8a && ver != 0x98 &&
-   ver != 0x9a)
+   ver != 0x9a && ver != 0xa7)
return;
 
if (risky_device(dev))
-- 
2.35.1



[PATCH AUTOSEL 5.18 100/159] dma-debug: change allocation mode from GFP_NOWAIT to GFP_ATOMIC

2022-05-30 Thread Sasha Levin
From: Mikulas Patocka 

[ Upstream commit 84bc4f1d5f8aa68706a96711dccb28b518e5 ]

We observed the error "cacheline tracking ENOMEM, dma-debug disabled"
during a light system load (copying some files). The reason for this error
is that the dma_active_cacheline radix tree uses GFP_NOWAIT allocation -
so it can't access the emergency memory reserves and it fails as soon as
anybody reaches the watermark.

This patch changes GFP_NOWAIT to GFP_ATOMIC, so that it can access the
emergency memory reserves.

Signed-off-by: Mikulas Patocka 
Signed-off-by: Christoph Hellwig 
Signed-off-by: Sasha Levin 
---
 kernel/dma/debug.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index f8ff598596b8..ac740630c79c 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -448,7 +448,7 @@ void debug_dma_dump_mappings(struct device *dev)
  * other hand, consumes a single dma_debug_entry, but inserts 'nents'
  * entries into the tree.
  */
-static RADIX_TREE(dma_active_cacheline, GFP_NOWAIT);
+static RADIX_TREE(dma_active_cacheline, GFP_ATOMIC);
 static DEFINE_SPINLOCK(radix_lock);
 #define ACTIVE_CACHELINE_MAX_OVERLAP ((1 << RADIX_TREE_MAX_TAGS) - 1)
 #define CACHELINE_PER_PAGE_SHIFT (PAGE_SHIFT - L1_CACHE_SHIFT)
-- 
2.35.1



Re: [PATCH V2 2/6] iommu: iova: properly handle 0 as a valid IOVA address

2022-05-30 Thread Ajay Kumar
Hi Robin

On Mon, May 23, 2022 at 11:00 PM Robin Murphy  wrote:
>
> On 2022-05-11 13:15, Ajay Kumar wrote:
> > From: Marek Szyprowski 
> >
> > Zero is a valid DMA and IOVA address on many architectures, so adjust the
> > IOVA management code to properly handle it. A new value IOVA_BAD_ADDR
> > (~0UL) is introduced as a generic value for the error case. Adjust all
> > callers of the alloc_iova_fast() function for the new return value.
>
> And when does anything actually need this? In fact if you were to stop
> iommu-dma from reserving IOVA 0 - which you don't - it would only show
> how patch #3 is broken.
Right! Since IOVA allocation proceeds from higher addresses to lower ones,
hitting the IOVA == 0 case means the IOVA space is exhausted, which is highly
unlikely.

> Also note that it's really nothing to do with architectures either way;
> iommu-dma simply chooses to reserve IOVA 0 for its own convenience,
> mostly because it can. Much the same way that 0 is typically a valid CPU
> VA, but mapping something meaningful there is just asking for a world of
> pain debugging NULL-dereference bugs.
>
> Robin.
This makes sense. Let me think about managing the PFN at the lowest address
in some other way.

Thanks,
Ajay Kumar

> > Signed-off-by: Marek Szyprowski 
> > Signed-off-by: Ajay Kumar 
> > ---
> >   drivers/iommu/dma-iommu.c | 16 +---
> >   drivers/iommu/iova.c  | 13 +
> >   include/linux/iova.h  |  1 +
> >   3 files changed, 19 insertions(+), 11 deletions(-)
> >
> > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > index 1ca85d37eeab..16218d6a0703 100644
> > --- a/drivers/iommu/dma-iommu.c
> > +++ b/drivers/iommu/dma-iommu.c
> > @@ -605,7 +605,7 @@ static dma_addr_t iommu_dma_alloc_iova(struct 
> > iommu_domain *domain,
> >   {
> >   struct iommu_dma_cookie *cookie = domain->iova_cookie;
> >   struct iova_domain *iovad = &cookie->iovad;
> > - unsigned long shift, iova_len, iova = 0;
> > + unsigned long shift, iova_len, iova = IOVA_BAD_ADDR;
> >
> >   if (cookie->type == IOMMU_DMA_MSI_COOKIE) {
> >   cookie->msi_iova += size;
> > @@ -625,11 +625,13 @@ static dma_addr_t iommu_dma_alloc_iova(struct 
> > iommu_domain *domain,
> >   iova = alloc_iova_fast(iovad, iova_len,
> >  DMA_BIT_MASK(32) >> shift, false);
> >
> > - if (!iova)
> > + if (iova == IOVA_BAD_ADDR)
> >   iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift,
> >  true);
> >
> > - return (dma_addr_t)iova << shift;
> > + if (iova != IOVA_BAD_ADDR)
> > + return (dma_addr_t)iova << shift;
> > + return DMA_MAPPING_ERROR;
> >   }
> >
> >   static void iommu_dma_free_iova(struct iommu_dma_cookie *cookie,
> > @@ -688,7 +690,7 @@ static dma_addr_t __iommu_dma_map(struct device *dev, 
> > phys_addr_t phys,
> >   size = iova_align(iovad, size + iova_off);
> >
> >   iova = iommu_dma_alloc_iova(domain, size, dma_mask, dev);
> > - if (!iova)
> > + if (iova == DMA_MAPPING_ERROR)
> >   return DMA_MAPPING_ERROR;
> >
> >   if (iommu_map_atomic(domain, iova, phys - iova_off, size, prot)) {
> > @@ -799,7 +801,7 @@ static struct page 
> > **__iommu_dma_alloc_noncontiguous(struct device *dev,
> >
> >   size = iova_align(iovad, size);
> >   iova = iommu_dma_alloc_iova(domain, size, dev->coherent_dma_mask, 
> > dev);
> > - if (!iova)
> > + if (iova == DMA_MAPPING_ERROR)
> >   goto out_free_pages;
> >
> >   if (sg_alloc_table_from_pages(sgt, pages, count, 0, size, GFP_KERNEL))
> > @@ -1204,7 +1206,7 @@ static int iommu_dma_map_sg(struct device *dev, 
> > struct scatterlist *sg,
> >   }
> >
> >   iova = iommu_dma_alloc_iova(domain, iova_len, dma_get_mask(dev), dev);
> > - if (!iova) {
> > + if (iova == DMA_MAPPING_ERROR) {
> >   ret = -ENOMEM;
> >   goto out_restore_sg;
> >   }
> > @@ -1516,7 +1518,7 @@ static struct iommu_dma_msi_page 
> > *iommu_dma_get_msi_page(struct device *dev,
> >   return NULL;
> >
> >   iova = iommu_dma_alloc_iova(domain, size, dma_get_mask(dev), dev);
> > - if (!iova)
> > + if (iova == DMA_MAPPING_ERROR)
> >   goto out_free_page;
> >
> >   if (iommu_map(domain, iova, msi_addr, size, prot))
> > diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> > index db77aa675145..ae0fe0a6714e 100644
> > --- a/drivers/iommu/iova.c
> > +++ b/drivers/iommu/iova.c
> > @@ -429,6 +429,8 @@ EXPORT_SYMBOL_GPL(free_iova);
> >* This function tries to satisfy an iova allocation from the rcache,
> >* and falls back to regular allocation on failure. If regular allocation
> >* fails too and the flush_rcache flag is set then the rcache will be 
> > flushed.
> > + * Returns a pfn the allocated iova starts at or IOVA_BAD_ADDR in the case
> > + * of a failure.
> >   */
> >   unsigned long
> >   alloc_iova_fast(struct 
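
For readers following the quoted diff, the sentinel convention it proposes (and which Robin questions above) can be sketched in userspace C. This is not the iova.c allocator: toy_iova_space and toy_alloc_iova are hypothetical, the allocator is a trivial top-down bump allocator, and TOY_IOVA_BAD_ADDR merely mirrors the ~0UL value described in the commit message. The point is only that a dedicated error value lets PFN 0 be returned as a valid result.

```c
#include <assert.h>

/* Mirrors the ~0UL error value described in the quoted commit message. */
#define TOY_IOVA_BAD_ADDR (~0UL)

/* Hypothetical top-down PFN space: [start_pfn, top) is still free. */
struct toy_iova_space {
	unsigned long start_pfn;	/* lowest usable PFN, may be 0 */
	unsigned long top;		/* one past the highest free PFN */
};

/*
 * Allocates len PFNs from the top of the free range. Returns the first
 * PFN of the allocation, or TOY_IOVA_BAD_ADDR when the space is
 * exhausted. A return value of 0 is a valid PFN, not an error.
 */
static unsigned long toy_alloc_iova(struct toy_iova_space *s, unsigned long len)
{
	if (len == 0 || s->top - s->start_pfn < len)
		return TOY_IOVA_BAD_ADDR;

	s->top -= len;
	return s->top;
}

void toy_iova_example(void)
{
	struct toy_iova_space s = { .start_pfn = 0, .top = 4 };

	assert(toy_alloc_iova(&s, 2) == 2);
	assert(toy_alloc_iova(&s, 2) == 0);	/* valid, would look like failure with "!iova" */
	assert(toy_alloc_iova(&s, 1) == TOY_IOVA_BAD_ADDR);
}
```

As the thread notes, iommu-dma reserves IOVA 0 anyway, so whether this convention is needed there is exactly what is under discussion.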

[PATCH AUTOSEL 5.18 001/159] iommu/vt-d: Add RPLS to quirk list to skip TE disabling

2022-05-30 Thread Sasha Levin
From: Tejas Upadhyay 

[ Upstream commit 0a967f5bfd9134b89681cae58deb222e20840e76 ]

The VT-d spec requires (10.4.4 Global Command Register, TE
field) that:

Hardware implementations supporting DMA draining must drain
any in-flight DMA read/write requests queued within the
Root-Complex before completing the translation enable
command and reflecting the status of the command through
the TES field in the Global Status register.

Unfortunately, some integrated graphic devices fail to do
so after some kind of power state transition. As a
result, the system might get stuck in
iommu_disable_translation(), waiting for the completion of
the TE transition.

This adds RPLS to a quirk list for those devices and skips
TE disabling if the quirk hits.

Link: https://gitlab.freedesktop.org/drm/intel/-/issues/4898
Tested-by: Raviteja Goud Talla 
Cc: Rodrigo Vivi 
Acked-by: Lu Baolu 
Signed-off-by: Tejas Upadhyay 
Reviewed-by: Rodrigo Vivi 
Signed-off-by: Rodrigo Vivi 
Link: https://patchwork.freedesktop.org/patch/msgid/20220302043256.191529-1-tejaskumarx.surendrakumar.upadh...@intel.com
Signed-off-by: Sasha Levin 
---
 drivers/iommu/intel/iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 0ea47e17b379..ba9a63cac47c 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5031,7 +5031,7 @@ static void quirk_igfx_skip_te_disable(struct pci_dev *dev)
ver = (dev->device >> 8) & 0xff;
if (ver != 0x45 && ver != 0x46 && ver != 0x4c &&
ver != 0x4e && ver != 0x8a && ver != 0x98 &&
-   ver != 0x9a)
+   ver != 0x9a && ver != 0xa7)
return;
 
if (risky_device(dev))
-- 
2.35.1



Re: [PATCH v4 1/6] iommu: Add a per domain PASID for DMA API

2022-05-30 Thread Jason Gunthorpe via iommu
On Tue, May 24, 2022 at 08:17:27AM -0700, Jacob Pan wrote:
> Hi Jason,
> 
> On Tue, 24 May 2022 10:50:34 -0300, Jason Gunthorpe  wrote:
> 
> > On Wed, May 18, 2022 at 11:21:15AM -0700, Jacob Pan wrote:
> > > DMA requests tagged with PASID can target individual IOMMU domains.
> > > Introduce a domain-wide PASID for DMA API, it will be used on the same
> > > mapping as legacy DMA without PASID. Let it be IOVA or PA in case of
> > > identity domain.  
> > 
> > Huh? I can't understand what this is trying to say or why this patch
> > makes sense.
> > 
> > We really should not have pasid's like this attached to the domains..
> > 
> This is the same "DMA API global PASID" you reviewed in v3, I just
> singled it out as a standalone patch and renamed it. Here is your previous
> review comment.
> 
> > +++ b/include/linux/iommu.h
> > @@ -105,6 +105,8 @@ struct iommu_domain {
> > enum iommu_page_response_code (*iopf_handler)(struct iommu_fault *fault,
> >   void *data);
> > void *fault_data;
> > +   ioasid_t pasid; /* Used for DMA requests with PASID */
> > +   atomic_t pasid_users;  
> 
> These are poorly named, this is really the DMA API global PASID and
> shouldn't be used for other things.
> 
> 
> 
> Perhaps I misunderstood, do you mind explaining more?

You still haven't really explained what this is for in this patch,
maybe it just needs a better commit message, or maybe something is
wrong.

I keep saying the DMA API usage is not special, so why do we need to
create a new global pasid and refcount? Realistically this is only
going to be used by IDXD, why can't we just allocate a PASID and
return it to the driver every time a driver asks for DMA API on PASID
mode? Why does the core need to do anything special?

Jason


[PATCH] iommu/dma: Fix race condition during iova_domain initialization

2022-05-30 Thread yf.wang--- via iommu
From: Yunfei Wang 

When many devices share the same iova domain, iommu_dma_init_domain()
may be called concurrently for it. Each caller can then see
iovad->start_pfn as zero in iommu_dma_init_domain(), so more than one
of them enters init_iova_domain() and initializes the same iovad.

Fix this by protecting init_iova_domain() with iommu_dma_cookie->mutex.

Exception backtrace:
rb_insert_color(param1=0xFF80CD2BDB40, param3=1) + 64
init_iova_domain() + 180
iommu_setup_dma_ops() + 260
arch_setup_dma_ops() + 132
of_dma_configure_id() + 468
platform_dma_configure() + 32
really_probe() + 1168
driver_probe_device() + 268
__device_attach_driver() + 524
__device_attach() + 524
bus_probe_device() + 64
deferred_probe_work_func() + 260
process_one_work() + 580
worker_thread() + 1076
kthread() + 332
ret_from_fork() + 16

Signed-off-by: Ning Li 
Signed-off-by: Yunfei Wang 
---
 drivers/iommu/dma-iommu.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 09f6e1c0f9c0..b38c5041eeab 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -63,6 +63,7 @@ struct iommu_dma_cookie {
 
/* Domain for flush queue callback; NULL if flush queue not in use */
struct iommu_domain *fq_domain;
+   struct mutex mutex;
 };
 
 static DEFINE_STATIC_KEY_FALSE(iommu_deferred_attach_enabled);
@@ -309,6 +310,7 @@ int iommu_get_dma_cookie(struct iommu_domain *domain)
if (!domain->iova_cookie)
return -ENOMEM;
 
+   mutex_init(&domain->iova_cookie->mutex);
return 0;
 }
 
@@ -549,26 +551,33 @@ static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
}
 
/* start_pfn is always nonzero for an already-initialised domain */
+   mutex_lock(&cookie->mutex);
if (iovad->start_pfn) {
if (1UL << order != iovad->granule ||
base_pfn != iovad->start_pfn) {
pr_warn("Incompatible range for DMA domain\n");
-   return -EFAULT;
+   ret = -EFAULT;
+   goto done_unlock;
}
 
-   return 0;
+   ret = 0;
+   goto done_unlock;
}
 
init_iova_domain(iovad, 1UL << order, base_pfn);
ret = iova_domain_init_rcaches(iovad);
if (ret)
-   return ret;
+   goto done_unlock;
 
/* If the FQ fails we can simply fall back to strict mode */
if (domain->type == IOMMU_DOMAIN_DMA_FQ && iommu_dma_init_fq(domain))
domain->type = IOMMU_DOMAIN_DMA;
 
-   return iova_reserve_iommu_regions(dev, domain);
+   ret = iova_reserve_iommu_regions(dev, domain);
+
+done_unlock:
+   mutex_unlock(&cookie->mutex);
+   return ret;
 }
 
 /**
-- 
2.18.0
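
The locking pattern in the patch above, taking a mutex around the "already initialised?" check and the initialisation itself, can be sketched in userspace C with pthreads. This is a minimal illustration under assumptions, not the dma-iommu code: toy_domain and toy_init_domain are hypothetical, init_count exists only to make double initialisation visible, and a plain -1 stands in for the -EFAULT returned by the patch.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the shared cookie/iovad state. */
struct toy_domain {
	pthread_mutex_t lock;
	unsigned long start_pfn;	/* nonzero once initialised */
	int init_count;			/* how often the init path ran */
};

/*
 * Check-then-init under one lock, with a single unlock path as in the
 * patch. Returns 0 on success, -1 if the domain was already
 * initialised with an incompatible base.
 */
static int toy_init_domain(struct toy_domain *d, unsigned long base_pfn)
{
	int ret = 0;

	pthread_mutex_lock(&d->lock);
	if (d->start_pfn) {
		if (d->start_pfn != base_pfn)
			ret = -1;
		goto done_unlock;
	}

	/* Only the first caller gets here; later callers see start_pfn set. */
	d->start_pfn = base_pfn;
	d->init_count++;

done_unlock:
	pthread_mutex_unlock(&d->lock);
	return ret;
}

void toy_init_example(void)
{
	struct toy_domain d = { .lock = PTHREAD_MUTEX_INITIALIZER };

	assert(toy_init_domain(&d, 1) == 0);
	assert(toy_init_domain(&d, 1) == 0);	/* second caller: no re-init */
	assert(d.init_count == 1);
	assert(toy_init_domain(&d, 2) == -1);	/* incompatible range */
}
```

Without the lock, two callers could both observe start_pfn == 0 and both run the initialisation, which matches the rb_insert_color() backtrace reported in the commit message.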



Re: [PATCH 01/12] iommu/vt-d: Use iommu_get_domain_for_dev() in debugfs

2022-05-30 Thread Jason Gunthorpe via iommu
On Sun, May 29, 2022 at 01:14:46PM +0800, Baolu Lu wrote:

> From 1e87b5df40c6ce9414cdd03988c3b52bfb17af5f Mon Sep 17 00:00:00 2001
> From: Lu Baolu 
> Date: Sun, 29 May 2022 10:18:56 +0800
> Subject: [PATCH 1/1] iommu/vt-d: debugfs: Remove device_domain_lock usage
> 
> The domain_translation_struct debugfs node is used to dump static
> mappings of PCI devices. It potentially races with setting new
> domains to devices and the iommu_map/unmap() interfaces. The existing
> code tries to use the global spinlock device_domain_lock to avoid the
> races, but this is problematical as this lock is only used to protect
> the device tracking lists of the domains.
> 
> Instead of using an immature lock to cover up the problem, it's better
> to explicitly restrict the use of this debugfs node. This also makes
> device_domain_lock static.

What does "explicitly restrict" mean?

Jason


Re: [RFC PATCH v1 7/9] driver core: Add fw_devlink_unblock_may_probe() helper function

2022-05-30 Thread Andy Shevchenko
On Thu, May 26, 2022 at 1:22 PM Saravana Kannan  wrote:
>
> This function can be used during the kernel boot sequence to forcefully
> override fw_devlink=on and unblock the probing of all devices that have
> a driver.
>
> It's mainly meant to be called from late_initcall() or
> late_initcall_sync() where a device needs to probe before the kernel can
> mount rootfs.

...

> diff --git a/include/linux/fwnode.h b/include/linux/fwnode.h
> index 9a81c4410b9f..0770edda7068 100644
> --- a/include/linux/fwnode.h
> +++ b/include/linux/fwnode.h
> @@ -13,6 +13,7 @@
>  #include 
>  #include 
>  #include 
> +#include <linux/init.h>
>
>  struct fwnode_operations;
>  struct device;
> @@ -199,5 +200,6 @@ extern bool fw_devlink_is_strict(void);
>  int fwnode_link_add(struct fwnode_handle *con, struct fwnode_handle *sup);
>  void fwnode_links_purge(struct fwnode_handle *fwnode);
>  void fw_devlink_purge_absent_suppliers(struct fwnode_handle *fwnode);
> +void __init fw_devlink_unblock_may_probe(void);

I don't think you need init.h and __init here. What matters is that you
have it in the C file. Am I wrong?

-- 
With Best Regards,
Andy Shevchenko


Re: [RFC PATCH v1 0/9] deferred_probe_timeout logic clean up

2022-05-30 Thread Geert Uytterhoeven
Hi Saravana,

On Thu, May 26, 2022 at 10:15 AM Saravana Kannan  wrote:
> This series is based on linux-next + these 2 small patches applied on top:
> https://lore.kernel.org/lkml/20220526034609.480766-1-sarava...@google.com/
>
> A lot of the deferred_probe_timeout logic is redundant with
> fw_devlink=on.  Also, enabling deferred_probe_timeout by default breaks
> a few cases.
>
> This series tries to delete the redundant logic, simplify the frameworks
> that use driver_deferred_probe_check_state(), enable
> deferred_probe_timeout=10 by default, and fixes the nfsroot failure
> case.
>
> Patches 1 to 3 are fairly straightforward and can probably be applied
> right away.
>
> Patches 4 to 9 are related and are the complicated bits of this series.
>
> Patch 8 is where someone with more knowledge of the IP auto config code
> can help rewrite the patch to limit the scope of the workaround by
> running the work around only if IP auto config fails the first time
> around. But it's also something that can be optimized in the future
> because it's already limited to the case where IP auto config is enabled
> using the kernel commandline.

Thanks for your series!

> Yoshihiro/Geert,
>
> If you can test this patch series and confirm that the NFS root case
> works, I'd really appreciate that.

On Salvator-XS, Micrel KSZ9031 Gigabit PHY probe is no longer delayed
by 9s after applying the two earlier patches, and the same is true
after applying this series on top.
Tested-by: Geert Uytterhoeven 

I will do testing on more boards, but that may take a while, as we're
in the middle of the merge window.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [RFC PATCH v1 2/9] pinctrl: devicetree: Delete usage of driver_deferred_probe_check_state()

2022-05-30 Thread Geert Uytterhoeven
Hi Saravana,

Thanks for your patch!

On Thu, May 26, 2022 at 10:16 AM Saravana Kannan  wrote:
> Now that fw_devlink=on by default and fw_devlink supports
> "pinctrl-[0-8]" property, the execution will never get to the point

0-9?

oh, it's really 0-8:

drivers/of/property.c:DEFINE_SIMPLE_PROP(pinctrl0, "pinctrl-0", NULL)
drivers/of/property.c:DEFINE_SIMPLE_PROP(pinctrl1, "pinctrl-1", NULL)
drivers/of/property.c:DEFINE_SIMPLE_PROP(pinctrl2, "pinctrl-2", NULL)
drivers/of/property.c:DEFINE_SIMPLE_PROP(pinctrl3, "pinctrl-3", NULL)
drivers/of/property.c:DEFINE_SIMPLE_PROP(pinctrl4, "pinctrl-4", NULL)
drivers/of/property.c:DEFINE_SIMPLE_PROP(pinctrl5, "pinctrl-5", NULL)
drivers/of/property.c:DEFINE_SIMPLE_PROP(pinctrl6, "pinctrl-6", NULL)
drivers/of/property.c:DEFINE_SIMPLE_PROP(pinctrl7, "pinctrl-7", NULL)
drivers/of/property.c:DEFINE_SIMPLE_PROP(pinctrl8, "pinctrl-8", NULL)

Looks fragile, especially since we now have:

arch/arm64/boot/dts/microchip/sparx5_pcb134_board.dtsi: pinctrl-9 = <_9>;
arch/arm64/boot/dts/microchip/sparx5_pcb134_board.dtsi: pinctrl-10 = <_10>;
arch/arm64/boot/dts/microchip/sparx5_pcb134_board.dtsi: pinctrl-11 = <_11>;
arch/arm64/boot/dts/microchip/sparx5_pcb134_board.dtsi: pinctrl-12 = <_pins_i>;

> where driver_deferred_probe_check_state() is called before the supplier
> has probed successfully or before deferred probe timeout has expired.
>
> So, delete the call and replace it with -ENODEV.
>
> Signed-off-by: Saravana Kannan 

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
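
The fragility pointed out above comes from listing "pinctrl-0" to "pinctrl-8" one by one. Purely as an illustration of the alternative, a name check that accepts any numeric suffix could look like the sketch below. This is not how drivers/of/property.c works and is not a proposed patch; toy_is_pinctrl_state_prop is a hypothetical userspace helper.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns true for "pinctrl-<N>" with one or more
 * decimal digits, so "pinctrl-9" and "pinctrl-12" match while
 * "pinctrl-names" does not.
 */
static bool toy_is_pinctrl_state_prop(const char *name)
{
	static const char prefix[] = "pinctrl-";
	const char *p;

	if (strncmp(name, prefix, sizeof(prefix) - 1) != 0)
		return false;

	p = name + sizeof(prefix) - 1;
	if (*p == '\0')
		return false;

	for (; *p; p++)
		if (!isdigit((unsigned char)*p))
			return false;
	return true;
}

void toy_pinctrl_example(void)
{
	assert(toy_is_pinctrl_state_prop("pinctrl-0"));
	assert(toy_is_pinctrl_state_prop("pinctrl-12"));
	assert(!toy_is_pinctrl_state_prop("pinctrl-names"));
}
```

With a check of this shape the sparx5 properties quoted above would be covered without adding one definition per index.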


Re: [RFC PATCH v1 4/9] Revert "driver core: Set default deferred_probe_timeout back to 0."

2022-05-30 Thread Geert Uytterhoeven
Hi Saravana,

On Thu, May 26, 2022 at 10:16 AM Saravana Kannan  wrote:
> This reverts commit 11f7e7ef553b6b93ac1aa74a3c2011b9cc8aeb61.

scripts/checkpatch.pl says:

WARNING: Unknown commit id
'11f7e7ef553b6b93ac1aa74a3c2011b9cc8aeb61', maybe rebased or not
pulled?

I assume this is your local copy of
https://lore.kernel.org/r/20220526034609.480766-3-sarava...@google.com?

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [RFC PATCH v1 0/9] deferred_probe_timeout logic clean up

2022-05-30 Thread Sebastian Andrzej Siewior
On 2022-05-26 01:15:39 [-0700], Saravana Kannan wrote:
> Yoshihiro/Geert,
Hi Saravana,

> If you can test this patch series and confirm that the NFS root case
> works, I'd really appreciate that.

The two patches you sent earlier, plus this series, plus

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 7ff7fbb006431..829d9b1f7403f 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1697,8 +1697,6 @@ static int fw_devlink_may_probe(struct device *dev, void *data)
  */
 void __init fw_devlink_unblock_may_probe(void)
 {
-   struct device_link *link, *ln;
-
if (!fw_devlink_flags || fw_devlink_is_permissive())
return;
 
and it compiles + boots without a delay.

Sebastian


[PATCH v7 1/2] iommu/io-pgtable-arm-v7s: Add a quirk to allow pgtable PA up to 35bit

2022-05-30 Thread yf.wang--- via iommu
From: Yunfei Wang 

The single memory zone feature removes ZONE_DMA32 and ZONE_DMA, which
can push the pgtable PA beyond 32 bits.

Mediatek IOMMU hardware supports at most a 35-bit PA for the pgtable,
so add a quirk that allows pgtable PAs of up to 35 bits.

Signed-off-by: Ning Li 
Signed-off-by: Yunfei Wang 
---
 drivers/iommu/io-pgtable-arm-v7s.c | 48 +-
 include/linux/io-pgtable.h | 17 +++
 2 files changed, 45 insertions(+), 20 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
index be066c1503d3..9a7671a89fd7 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -182,14 +182,8 @@ static bool arm_v7s_is_mtk_enabled(struct io_pgtable_cfg *cfg)
(cfg->quirks & IO_PGTABLE_QUIRK_ARM_MTK_EXT);
 }
 
-static arm_v7s_iopte paddr_to_iopte(phys_addr_t paddr, int lvl,
-   struct io_pgtable_cfg *cfg)
+static arm_v7s_iopte to_iopte_mtk(phys_addr_t paddr, arm_v7s_iopte pte)
 {
-   arm_v7s_iopte pte = paddr & ARM_V7S_LVL_MASK(lvl);
-
-   if (!arm_v7s_is_mtk_enabled(cfg))
-   return pte;
-
if (paddr & BIT_ULL(32))
pte |= ARM_V7S_ATTR_MTK_PA_BIT32;
if (paddr & BIT_ULL(33))
@@ -199,6 +193,17 @@ static arm_v7s_iopte paddr_to_iopte(phys_addr_t paddr, int lvl,
return pte;
 }
 
+static arm_v7s_iopte paddr_to_iopte(phys_addr_t paddr, int lvl,
+   struct io_pgtable_cfg *cfg)
+{
+   arm_v7s_iopte pte = paddr & ARM_V7S_LVL_MASK(lvl);
+
+   if (!arm_v7s_is_mtk_enabled(cfg))
+   return pte;
+
+   return to_iopte_mtk(paddr, pte);
+}
+
 static phys_addr_t iopte_to_paddr(arm_v7s_iopte pte, int lvl,
  struct io_pgtable_cfg *cfg)
 {
@@ -234,6 +239,7 @@ static arm_v7s_iopte *iopte_deref(arm_v7s_iopte pte, int lvl,
 static void *__arm_v7s_alloc_table(int lvl, gfp_t gfp,
   struct arm_v7s_io_pgtable *data)
 {
+   gfp_t gfp_l1 = __GFP_ZERO | ARM_V7S_TABLE_GFP_DMA;
struct io_pgtable_cfg *cfg = &data->iop.cfg;
struct device *dev = cfg->iommu_dev;
phys_addr_t phys;
@@ -241,9 +247,11 @@ static void *__arm_v7s_alloc_table(int lvl, gfp_t gfp,
size_t size = ARM_V7S_TABLE_SIZE(lvl, cfg);
void *table = NULL;
 
+   if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_MTK_TTBR_EXT)
+   gfp_l1 = GFP_KERNEL | __GFP_ZERO;
+
if (lvl == 1)
-   table = (void *)__get_free_pages(
-   __GFP_ZERO | ARM_V7S_TABLE_GFP_DMA, get_order(size));
+   table = (void *)__get_free_pages(gfp_l1, get_order(size));
else if (lvl == 2)
table = kmem_cache_zalloc(data->l2_tables, gfp);
 
@@ -251,7 +259,8 @@ static void *__arm_v7s_alloc_table(int lvl, gfp_t gfp,
return NULL;
 
phys = virt_to_phys(table);
-   if (phys != (arm_v7s_iopte)phys) {
+   if (phys != (arm_v7s_iopte)phys &&
+   !(cfg->quirks & IO_PGTABLE_QUIRK_ARM_MTK_TTBR_EXT)) {
/* Doesn't fit in PTE */
dev_err(dev, "Page table does not fit in PTE: %pa", &phys);
goto out_free;
@@ -457,9 +466,14 @@ static arm_v7s_iopte arm_v7s_install_table(arm_v7s_iopte *table,
   arm_v7s_iopte curr,
   struct io_pgtable_cfg *cfg)
 {
+   phys_addr_t phys = virt_to_phys(table);
arm_v7s_iopte old, new;
 
-   new = virt_to_phys(table) | ARM_V7S_PTE_TYPE_TABLE;
+   new = phys | ARM_V7S_PTE_TYPE_TABLE;
+
+   if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_MTK_TTBR_EXT)
+   new = to_iopte_mtk(phys, new);
+
if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_NS)
new |= ARM_V7S_ATTR_NS_TABLE;
 
@@ -778,7 +792,9 @@ static phys_addr_t arm_v7s_iova_to_phys(struct io_pgtable_ops *ops,
 static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg,
void *cookie)
 {
+   slab_flags_t slab_flag = ARM_V7S_TABLE_SLAB_FLAGS;
struct arm_v7s_io_pgtable *data;
+   phys_addr_t paddr;
 
if (cfg->ias > (arm_v7s_is_mtk_enabled(cfg) ? 34 : ARM_V7S_ADDR_BITS))
return NULL;
@@ -788,7 +804,8 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg,
 
if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS |
IO_PGTABLE_QUIRK_NO_PERMS |
-   IO_PGTABLE_QUIRK_ARM_MTK_EXT))
+   IO_PGTABLE_QUIRK_ARM_MTK_EXT |
+   IO_PGTABLE_QUIRK_ARM_MTK_TTBR_EXT))
return NULL;
 
/* If ARM_MTK_4GB is enabled, the NO_PERMS is also expected. */
@@ -801,10 +818,12 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg,
return NULL;
 

[PATCH v7 2/2] iommu/mediatek: Allow page table PA up to 35bit

2022-05-30 Thread yf.wang--- via iommu
From: Yunfei Wang 

The single memory zone feature removes ZONE_DMA32 and ZONE_DMA, so add
the quirk IO_PGTABLE_QUIRK_ARM_MTK_TTBR_EXT to let the level 1 and
level 2 pgtables use PAs of up to 35 bits.

Signed-off-by: Ning Li 
Signed-off-by: Yunfei Wang 
---
 drivers/iommu/mtk_iommu.c | 29 +
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 6fd75a60abd6..dd9661690ca6 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -33,6 +33,9 @@
 
 #define REG_MMU_PT_BASE_ADDR   0x000
 #define MMU_PT_ADDR_MASK   GENMASK(31, 7)
+/* Mediatek extend ttbr bits[2:0] for PA bits[34:32] */
+#define MMU_PT_35BIT_PA(pa)\
+   ((pa & GENMASK_ULL(31, 7)) | ((pa & GENMASK_ULL(34, 32)) >> 32))
 
 #define REG_MMU_INVALIDATE 0x020
 #define F_ALL_INVLD 0x2
@@ -118,6 +121,7 @@
 #define WR_THROT_EN BIT(6)
 #define HAS_LEGACY_IVRP_PADDR  BIT(7)
 #define IOVA_34_EN BIT(8)
+#define PGTABLE_PA_35_EN   BIT(9)
 
 #define MTK_IOMMU_HAS_FLAG(pdata, _x) \
((((pdata)->flags) & (_x)) == (_x))
@@ -401,6 +405,9 @@ static int mtk_iommu_domain_finalise(struct mtk_iommu_domain *dom,
.iommu_dev = data->dev,
};
 
+   if (MTK_IOMMU_HAS_FLAG(data->plat_data, PGTABLE_PA_35_EN))
+   dom->cfg.quirks |= IO_PGTABLE_QUIRK_ARM_MTK_TTBR_EXT;
+
if (MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_4GB_MODE))
dom->cfg.oas = data->enable_4GB ? 33 : 32;
else
@@ -450,6 +457,7 @@ static int mtk_iommu_attach_device(struct iommu_domain *domain,
struct mtk_iommu_domain *dom = to_mtk_domain(domain);
struct device *m4udev = data->dev;
int ret, domid;
+   u32 regval;
 
domid = mtk_iommu_get_domain_id(dev, data->plat_data);
if (domid < 0)
@@ -472,8 +480,13 @@ static int mtk_iommu_attach_device(struct iommu_domain *domain,
return ret;
}
data->m4u_dom = dom;
-   writel(dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK,
-  data->base + REG_MMU_PT_BASE_ADDR);
+
+   /* Bits[6:3] are invalid for mediatek platform */
+   if (MTK_IOMMU_HAS_FLAG(data->plat_data, PGTABLE_PA_35_EN))
+   regval = MMU_PT_35BIT_PA(dom->cfg.arm_v7s_cfg.ttbr);
+   else
+   regval = dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK;
+   writel(regval, data->base + REG_MMU_PT_BASE_ADDR);
 
pm_runtime_put(m4udev);
}
@@ -987,6 +1000,7 @@ static int __maybe_unused mtk_iommu_runtime_resume(struct device *dev)
struct mtk_iommu_suspend_reg *reg = &data->reg;
struct mtk_iommu_domain *m4u_dom = data->m4u_dom;
void __iomem *base = data->base;
+   u32 regval;
int ret;
 
ret = clk_prepare_enable(data->bclk);
@@ -1010,7 +1024,13 @@ static int __maybe_unused mtk_iommu_runtime_resume(struct device *dev)
writel_relaxed(reg->int_main_control, base + REG_MMU_INT_MAIN_CONTROL);
writel_relaxed(reg->ivrp_paddr, base + REG_MMU_IVRP_PADDR);
writel_relaxed(reg->vld_pa_rng, base + REG_MMU_VLD_PA_RNG);
-   writel(m4u_dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK, base + REG_MMU_PT_BASE_ADDR);
+
+   /* Bits[6:3] are invalid for mediatek platform */
+   if (MTK_IOMMU_HAS_FLAG(data->plat_data, PGTABLE_PA_35_EN))
+   regval = MMU_PT_35BIT_PA(m4u_dom->cfg.arm_v7s_cfg.ttbr);
+   else
+   regval = m4u_dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK;
+   writel(regval, base + REG_MMU_PT_BASE_ADDR);
 
/*
 * Users may allocate dma buffer before they call pm_runtime_get,
@@ -1038,7 +1058,8 @@ static const struct mtk_iommu_plat_data mt2712_data = {
 
 static const struct mtk_iommu_plat_data mt6779_data = {
 	.m4u_plat      = M4U_MT6779,
-	.flags         = HAS_SUB_COMM | OUT_ORDER_WR_EN | WR_THROT_EN,
+	.flags         = HAS_SUB_COMM | OUT_ORDER_WR_EN | WR_THROT_EN |
+			 PGTABLE_PA_35_EN,
 	.inv_sel_reg   = REG_MMU_INV_SEL_GEN2,
 	.iova_region   = single_domain,
 	.iova_region_nr = ARRAY_SIZE(single_domain),
-- 
2.18.0
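
For readers following the register write above, here is a minimal user-space sketch of how a 35-bit page table address could end up in a 32-bit register value. It is not taken from the patch: the bit layout (PA[34:32] carried in register bits [2:0], PA[31:7] in bits [31:7], bits [6:3] left zero) is an assumption based on the "Bits[6:3] are invalid" comment, and the names fold_pa_35bit(), pt_base_regval(), PT_ADDR_LOW_MASK and PT_ADDR_HIGH_MASK are hypothetical. The real MMU_PT_35BIT_PA() and MMU_PT_ADDR_MASK definitions may differ.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed layout (not confirmed by the patch text shown here):
 *   register bits [31:7] carry PA[31:7]
 *   register bits [2:0]  carry PA[34:32]
 *   register bits [6:3]  are unused and must be zero
 */
#define PT_ADDR_LOW_MASK   UINT32_C(0xFFFFFF80)  /* bits [31:7] */
#define PT_ADDR_HIGH_MASK  UINT32_C(0x00000007)  /* bits [2:0]  */

/* Hypothetical helper: fold a 35-bit physical address into a 32-bit TTBR. */
static uint32_t fold_pa_35bit(uint64_t pa)
{
	assert((pa >> 35) == 0);    /* address must fit in 35 bits */
	assert((pa & 0x7F) == 0);   /* low 7 bits must be free (128-byte aligned) */

	return (uint32_t)pa | (uint32_t)(pa >> 32);
}

/*
 * Hypothetical helper mirroring the if/else in the patch: with the 35-bit
 * mode enabled keep the high-address bits [2:0] as well, otherwise keep
 * only bits [31:7]. Bits [6:3] are cleared in both cases.
 */
static uint32_t pt_base_regval(uint32_t ttbr, int pa_35_en)
{
	if (pa_35_en)
		return ttbr & (PT_ADDR_LOW_MASK | PT_ADDR_HIGH_MASK);

	return ttbr & PT_ADDR_LOW_MASK;
}
```

Under these assumptions, a page table at physical address 0x412345680 folds to 0x12345684; the 35-bit path keeps that value intact, while the legacy path would drop the low three bits and lose PA[34:32].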



Re: [PATCH v1] driver core: Extend deferred probe timeout on driver registration

2022-05-30 Thread Niklas Cassel via iommu
On Wed, May 25, 2022 at 12:49:00PM -0700, Saravana Kannan wrote:
> On Wed, May 25, 2022 at 12:12 AM Sebastian Andrzej Siewior wrote:
> >
> > On 2022-05-24 10:46:49 [-0700], Saravana Kannan wrote:
> > > > Removing probe_timeout_waitqueue (as suggested) or setting the timeout
> > > > to 0 avoids the delay.
> > >
> > > In your case, I think it might be working as intended? Curious, what
> > > was the call stack in your case where it was blocked?
> >
> > Then why is there a 10sec delay during boot? The backtrace is
> > |------------[ cut here ]------------
> > |WARNING: CPU: 4 PID: 1 at drivers/base/dd.c:742 wait_for_device_probe+0x30/0x110
> > |Modules linked in:
> > |CPU: 4 PID: 1 Comm: swapper/0 Not tainted 5.18.0-rc5+ #154
> > |RIP: 0010:wait_for_device_probe+0x30/0x110
> > |Call Trace:
> > | <TASK>
> > | prepare_namespace+0x2b/0x160
> > | kernel_init_freeable+0x2b3/0x2dd
> > | kernel_init+0x11/0x110
> > | ret_from_fork+0x22/0x30
> > | </TASK>
> >
> > Looking closer, it can't access init. This particular box boots the
> > kernel directly without an initramfs, so the kernel later mounts
> > /dev/sda1 and everything is good.  So that seems to be the reason…
>

Hello there,

My (QEMU) boot times were recently extended by 10 seconds.
Looking at the timestamps, it looks like nothing is being done for 10 whole
seconds.

A git bisect landed me at the patch in $subject:
2b28a1a84a0e ("driver core: Extend deferred probe timeout on driver registration")

Adding a WARN_ON(1) in wait_for_device_probe(), as the patch author asked of
the others who saw a regression with this patch, gives two different
stack traces during boot:

[0.459633] printk: console [netcon0] enabled
[0.459636] printk: console [netcon0] printing thread started
[0.459637] netconsole: network logging started
[0.459896] cfg80211: Loading compiled-in X.509 certificates for regulatory database
[0.460230] kworker/u8:6 (105) used greatest stack depth: 14744 bytes left
[0.461031] cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
[0.461077] platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
[0.461085] cfg80211: failed to load regulatory.db
[0.461113] ALSA device list:
[0.461116]   No soundcards found.
[0.461614] ------------[ cut here ]------------
[0.461615] WARNING: CPU: 2 PID: 1 at drivers/base/dd.c:741 wait_for_device_probe+0x1a/0x160
[0.485809] Modules linked in:
[0.486089] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.18.0-next-20220526-4-g74f936013b08-dirty #20
[0.486842] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[0.487707] RIP: 0010:wait_for_device_probe+0x1a/0x160
[0.488103] Code: 00 e8 fa e4 b5 ff 8b 44 24 04 48 83 c4 08 5b c3 0f 1f 44 00 00 53 48 83 ec 30 65 48 8b 04 25 28 00 00 00 48 89 44 24 28 31 c0 <0f> 0b e8 1f ac 57 00 8b 15 f1 b3 24 01 85 d2 75 3d 48 c7 c7 60 2f
[0.489539] RSP: :9c7900013ed8 EFLAGS: 00010246
[0.489965] RAX:  RBX: 0008 RCX: 0d02
[0.490597] RDX: 0cc2 RSI:  RDI: 0002e990
[0.491181] RBP: 0214 R08: 000f R09: 0064
[0.491788] R10: 9c7900013c6c R11:  R12: 8964c0343640
[0.492384] R13: 9e51791c R14:  R15: 
[0.492960] FS:  () GS:896637d0() knlGS:
[0.493658] CS:  0010 DS:  ES:  CR0: 80050033
[0.494501] CR2:  CR3: 0001ed20c001 CR4: 00370ee0
[0.495621] Call Trace:
[0.496059]  <TASK>
[0.496266]  ? init_eaccess+0x3b/0x76
[0.496657]  prepare_namespace+0x30/0x16a
[0.497016]  kernel_init_freeable+0x207/0x212
[0.497407]  ? rest_init+0xc0/0xc0
[0.497714]  kernel_init+0x16/0x120
[0.498250]  ret_from_fork+0x1f/0x30
[0.498898]  </TASK>
[0.499307] ---[ end trace  ]---
[0.748413] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[0.749053] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[0.749461] ata2.00: ATA-7: QEMU HARDDISK, version, max UDMA/100
[0.749470] ata2.00: 732 sectors, multi 16: LBA48 NCQ (depth 32)
[0.749479] ata2.00: applying bridge limits
[0.750915] ata4: SATA link down (SStatus 0 SControl 300)
[0.752110] ata5: SATA link down (SStatus 0 SControl 300)
[0.753424] ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[0.754877] ata3: SATA link down (SStatus 0 SControl 300)
[0.755342] ata1.00: ATA-7: QEMU HARDDISK, version, max UDMA/100
[0.755377] ata1.00: 268435456 sectors, multi 16: LBA48 NCQ (depth 32)
[0.755387] ata1.00: applying bridge limits
[0.755486] ata6.00: ATA-7: QEMU HARDDISK, version, max UDMA/100
[0.755492] ata6.00: 8388608 sectors, multi 16: LBA48 NCQ (depth 32)
[0.755500] ata6.00: applying bridge limits
[