[PATCH v3 4/5] iommu/arm-smmu-v3: Add host support for NVIDIA Grace CMDQ-V
From: Nate Watterson

NVIDIA's Grace SoC has CMDQ-Virtualization (CMDQV) hardware, which extends the standard ARM SMMUv3 IP to support multiple VCMDQs with virtualization capabilities. In the host kernel, they are used to reduce contention on a single queue. As command queues they are very similar to the standard CMDQ/ECMDQs, but they only support CS_NONE in the CS field of the CMD_SYNC command.

This patch adds a new nvidia-grace-cmdqv file, inserts its structure pointer into the existing arm_smmu_device, and then adds the related function calls in the arm-smmu-v3 driver. In the CMDQV driver itself, this patch adds only the minimal part needed for host kernel support: upon probe(), VINTF0 is reserved for in-kernel use, and some of the VCMDQs are assigned to VINTF0. The driver then selects one of the VCMDQs in VINTF0, based on the CPU currently executing, to issue commands.

Note that for now the CMDQV driver only supports ACPI configuration.

Signed-off-by: Nate Watterson
Signed-off-by: Nicolin Chen
---
Changelog:
v2->v3:
 * Replaced impl design with simpler "nvidia_grace_cmdqv" pointer
 * Aligned all the namings to "nvidia_grace_cmdqv" or "cmdqv"
 * Changed VINTF_ENABLED check in get_cmdq() to VINTF_STATUS
 * Dropped overrides at smmu->features and smmu->options
 * Inlined hw_probe() to acpi_probe() for simplification
 * Added a new CMDQV CONFIG depending on CONFIG_ACPI
 * Removed additional platform_device involvement
 * Switched krealloc to kzalloc for cmdqv pointer
 * Moved devm_request_irq() out of device_reset()
 * Dropped IRQF_SHARED flag at devm_request_irq()
 * Reused acpi_iort_node pointer from SMMU driver
 * Reused existing smmu functions to init vcmdqs
 * Changed writel_relaxed to writel to be safe
 * Removed pointless comments and prints
 * Updated Copyright lines

 MAINTAINERS                                 |   1 +
 drivers/iommu/Kconfig                       |  12 +
 drivers/iommu/arm/arm-smmu-v3/Makefile      |   1 +
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  21 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  41 ++
 .../arm/arm-smmu-v3/nvidia-grace-cmdqv.c    | 418 ++
 6 files changed, 488 insertions(+), 6 deletions(-)
 create mode 100644 drivers/iommu/arm/arm-smmu-v3/nvidia-grace-cmdqv.c

diff --git a/MAINTAINERS b/MAINTAINERS
index f32c7d733255..0314ee1edf62 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -18726,6 +18726,7 @@
 M:	Thierry Reding
 R:	Krishna Reddy
 L:	linux-te...@vger.kernel.org
 S:	Supported
+F:	drivers/iommu/arm/arm-smmu-v3/nvidia-grace-cmdqv.c
 F:	drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c
 F:	drivers/iommu/tegra*

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 3eb68fa1b8cc..290af9c7b2a5 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -388,6 +388,18 @@ config ARM_SMMU_V3_SVA
	  Say Y here if your system supports SVA extensions such as PCIe PASID
	  and PRI.

+config NVIDIA_GRACE_CMDQV
+	bool "NVIDIA Grace CMDQ-V extension support for ARM SMMUv3"
+	depends on ARM_SMMU_V3
+	depends on ACPI
+	help
+	  Support for the NVIDIA Grace CMDQ-Virtualization extension for ARM
+	  SMMUv3. The CMDQ-V extension is similar to the v3.3 ECMDQ for
+	  multiple command queue support, except with virtualization
+	  capabilities.
+
+	  Say Y here if your system is NVIDIA Grace or has the same CMDQ-V
+	  extension.
+
 config S390_IOMMU
	def_bool y if S390 && PCI
	depends on S390 && PCI

diff --git a/drivers/iommu/arm/arm-smmu-v3/Makefile b/drivers/iommu/arm/arm-smmu-v3/Makefile
index 54feb1ecccad..a083019de68a 100644
--- a/drivers/iommu/arm/arm-smmu-v3/Makefile
+++ b/drivers/iommu/arm/arm-smmu-v3/Makefile
@@ -2,4 +2,5 @@
 obj-$(CONFIG_ARM_SMMU_V3) += arm_smmu_v3.o
 arm_smmu_v3-objs-y += arm-smmu-v3.o
 arm_smmu_v3-objs-$(CONFIG_ARM_SMMU_V3_SVA) += arm-smmu-v3-sva.o
+arm_smmu_v3-objs-$(CONFIG_NVIDIA_GRACE_CMDQV) += nvidia-grace-cmdqv.o
 arm_smmu_v3-objs := $(arm_smmu_v3-objs-y)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 188865ec9a33..b1182dd825fd 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -339,6 +339,9 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)

 static struct arm_smmu_cmdq *arm_smmu_get_cmdq(struct arm_smmu_device *smmu)
 {
+	if (smmu->nvidia_grace_cmdqv)
+		return nvidia_grace_cmdqv_get_cmdq(smmu);
+
	return &smmu->cmdq;
 }

@@ -2874,12 +2877,10 @@ static struct iommu_ops arm_smmu_ops = {
 };

 /* Probing and initialisation functions */
-static int arm_smmu_init_one_queue(struct arm_smmu_device *smmu,
-				   struct arm_smmu_queue *q,
-				   void __iomem *page,
-				   unsigned long prod_off,
-				   unsigned long
[PATCH v3 2/5] iommu/arm-smmu-v3: Make arm_smmu_cmdq_init reusable
The CMDQV extension in the NVIDIA Grace SoC reuses the arm_smmu_cmdq structure, but the queue location isn't the same as smmu->cmdq. So, this patch adds a cmdq argument to the arm_smmu_cmdq_init() function and exposes its declaration in the header for the CMDQV driver to use.

Signed-off-by: Nicolin Chen
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 6 +++---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 ++
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index e6fee69dd79c..6be20e926f63 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2922,10 +2922,10 @@ static void arm_smmu_cmdq_free_bitmap(void *data)
	bitmap_free(bitmap);
 }

-static int arm_smmu_cmdq_init(struct arm_smmu_device *smmu)
+int arm_smmu_cmdq_init(struct arm_smmu_device *smmu,
+		       struct arm_smmu_cmdq *cmdq)
 {
	int ret = 0;
-	struct arm_smmu_cmdq *cmdq = &smmu->cmdq;
	unsigned int nents = 1 << cmdq->q.llq.max_n_shift;
	atomic_long_t *bitmap;

@@ -2955,7 +2955,7 @@ static int arm_smmu_init_queues(struct arm_smmu_device *smmu)
	if (ret)
		return ret;

-	ret = arm_smmu_cmdq_init(smmu);
+	ret = arm_smmu_cmdq_init(smmu, &smmu->cmdq);
	if (ret)
		return ret;

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 7a6a6045700d..475f004ccbe4 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -751,6 +751,8 @@ void arm_smmu_tlb_inv_range_asid(unsigned long iova, size_t size, int asid,
 bool arm_smmu_free_asid(struct arm_smmu_ctx_desc *cd);
 int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid,
			    unsigned long iova, size_t size);
+int arm_smmu_cmdq_init(struct arm_smmu_device *smmu,
+		       struct arm_smmu_cmdq *cmdq);

 #ifdef CONFIG_ARM_SMMU_V3_SVA
 bool arm_smmu_sva_supported(struct arm_smmu_device *smmu);
--
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 5/5] iommu/nvidia-grace-cmdqv: Limit CMDs for guest owned VINTF
When VCMDQs are assigned to a VINTF that is owned by a guest rather than the hypervisor (i.e. the HYP_OWN bit is unset), only TLB invalidation commands are supported. This requires the get_cmdq() function to scan the input cmd before selecting between smmu->cmdq and vintf->vcmdq, so that unsupported commands can still go through the emulated smmu->cmdq.

Also, the guest should never see the HYP_OWN bit set, regardless of whether the guest kernel driver writes it, i.e. the user space driver running in the host OS should wire this bit to zero when trapping a write access to this VINTF_CONFIG register from a guest kernel. So instead of using the existing regval, this patch explicitly reads the register value back to cache in vintf->cfg.

Signed-off-by: Nicolin Chen
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  6 ++--
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  5 +--
 .../arm/arm-smmu-v3/nvidia-grace-cmdqv.c    | 32 +--
 3 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index b1182dd825fd..73941ccc1a3e 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -337,10 +337,10 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
	return 0;
 }

-static struct arm_smmu_cmdq *arm_smmu_get_cmdq(struct arm_smmu_device *smmu)
+static struct arm_smmu_cmdq *arm_smmu_get_cmdq(struct arm_smmu_device *smmu, u64 *cmds, int n)
 {
	if (smmu->nvidia_grace_cmdqv)
-		return nvidia_grace_cmdqv_get_cmdq(smmu);
+		return nvidia_grace_cmdqv_get_cmdq(smmu, cmds, n);

	return &smmu->cmdq;
 }
@@ -747,7 +747,7 @@ static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
	u32 prod;
	unsigned long flags;
	bool owner;
-	struct arm_smmu_cmdq *cmdq = arm_smmu_get_cmdq(smmu);
+	struct arm_smmu_cmdq *cmdq = arm_smmu_get_cmdq(smmu, cmds, n);
	struct arm_smmu_ll_queue llq, head;
	int ret = 0;

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 24f93444aeeb..085c775c2eea 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -832,7 +832,8 @@ struct nvidia_grace_cmdqv *
 nvidia_grace_cmdqv_acpi_probe(struct arm_smmu_device *smmu,
			      struct acpi_iort_node *node);
 int nvidia_grace_cmdqv_device_reset(struct arm_smmu_device *smmu);
-struct arm_smmu_cmdq *nvidia_grace_cmdqv_get_cmdq(struct arm_smmu_device *smmu);
+struct arm_smmu_cmdq *nvidia_grace_cmdqv_get_cmdq(struct arm_smmu_device *smmu,
+						  u64 *cmds, int n);
 #else /* CONFIG_NVIDIA_GRACE_CMDQV */
 static inline struct nvidia_grace_cmdqv *
 nvidia_grace_cmdqv_acpi_probe(struct arm_smmu_device *smmu,
@@ -847,7 +848,7 @@ static inline int nvidia_grace_cmdqv_device_reset(struct arm_smmu_device *smmu)
 }

 static inline struct arm_smmu_cmdq *
-nvidia_grace_cmdqv_get_cmdq(struct arm_smmu_device *smmu)
+nvidia_grace_cmdqv_get_cmdq(struct arm_smmu_device *smmu, u64 *cmds, int n)
 {
	return NULL;
 }

diff --git a/drivers/iommu/arm/arm-smmu-v3/nvidia-grace-cmdqv.c b/drivers/iommu/arm/arm-smmu-v3/nvidia-grace-cmdqv.c
index c0d7351f13e2..71f6bc684e64 100644
--- a/drivers/iommu/arm/arm-smmu-v3/nvidia-grace-cmdqv.c
+++ b/drivers/iommu/arm/arm-smmu-v3/nvidia-grace-cmdqv.c
@@ -166,7 +166,8 @@ static int nvidia_grace_cmdqv_init_one_vcmdq(struct nvidia_grace_cmdqv *cmdqv,
	return arm_smmu_cmdq_init(cmdqv->smmu, cmdq);
 }

-struct arm_smmu_cmdq *nvidia_grace_cmdqv_get_cmdq(struct arm_smmu_device *smmu)
+struct arm_smmu_cmdq *
+nvidia_grace_cmdqv_get_cmdq(struct arm_smmu_device *smmu, u64 *cmds, int n)
 {
	struct nvidia_grace_cmdqv *cmdqv = smmu->nvidia_grace_cmdqv;
	struct nvidia_grace_cmdqv_vintf *vintf0 = &cmdqv->vintf0;
@@ -176,6 +177,24 @@ struct arm_smmu_cmdq *nvidia_grace_cmdqv_get_cmdq(struct arm_smmu_device *smmu)
	if (!FIELD_GET(VINTF_STATUS, vintf0->status))
		return &smmu->cmdq;

+	/* Check for supported CMDs if VINTF is owned by guest (not hypervisor) */
+	if (!FIELD_GET(VINTF_HYP_OWN, vintf0->cfg)) {
+		u64 opcode = (n) ? FIELD_GET(CMDQ_0_OP, cmds[0]) : CMDQ_OP_CMD_SYNC;
+
+		/* List all supported CMDs for vintf->cmdq pathway */
+		switch (opcode) {
+		case CMDQ_OP_TLBI_NH_ASID:
+		case CMDQ_OP_TLBI_NH_VA:
+		case CMDQ_OP_TLBI_S12_VMALL:
+		case CMDQ_OP_TLBI_S2_IPA:
+		case CMDQ_OP_ATC_INV:
+			break;
+		default:
+			/* Unsupported CMDs go for smmu->cmdq pathway */
+			return &smmu->cmdq;
+		}
+	}
+
	/*
	 * Select a vcmdq to use. Here we use a temporary solution to
	 *
[PATCH v3 3/5] iommu/arm-smmu-v3: Pass cmdq pointer in arm_smmu_cmdq_issue_cmdlist()
The driver currently calls the arm_smmu_get_cmdq() helper internally in different places, though they are all actually reached from the same source: the arm_smmu_cmdq_issue_cmdlist() function. This patch changes this to pass the cmdq pointer to these functions instead of calling arm_smmu_get_cmdq() every time.

This also helps the CMDQV extension in the NVIDIA Grace SoC, whose driver will maintain its own cmdq pointers and needs to redirect arm_smmu->cmdq to them upon seeing a supported command, by checking its opcode.

Signed-off-by: Nicolin Chen
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 6be20e926f63..188865ec9a33 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -586,11 +586,11 @@ static void arm_smmu_cmdq_poll_valid_map(struct arm_smmu_cmdq *cmdq,

 /* Wait for the command queue to become non-full */
 static int arm_smmu_cmdq_poll_until_not_full(struct arm_smmu_device *smmu,
+					     struct arm_smmu_cmdq *cmdq,
					     struct arm_smmu_ll_queue *llq)
 {
	unsigned long flags;
	struct arm_smmu_queue_poll qp;
-	struct arm_smmu_cmdq *cmdq = arm_smmu_get_cmdq(smmu);
	int ret = 0;

	/*
@@ -621,11 +621,11 @@ static int arm_smmu_cmdq_poll_until_not_full(struct arm_smmu_device *smmu,
  * Must be called with the cmdq lock held in some capacity.
  */
 static int __arm_smmu_cmdq_poll_until_msi(struct arm_smmu_device *smmu,
+					  struct arm_smmu_cmdq *cmdq,
					  struct arm_smmu_ll_queue *llq)
 {
	int ret = 0;
	struct arm_smmu_queue_poll qp;
-	struct arm_smmu_cmdq *cmdq = arm_smmu_get_cmdq(smmu);
	u32 *cmd = (u32 *)(Q_ENT(&cmdq->q, llq->prod));

	queue_poll_init(smmu, &qp);
@@ -645,10 +645,10 @@ static int __arm_smmu_cmdq_poll_until_msi(struct arm_smmu_device *smmu,
  * Must be called with the cmdq lock held in some capacity.
  */
 static int __arm_smmu_cmdq_poll_until_consumed(struct arm_smmu_device *smmu,
+					       struct arm_smmu_cmdq *cmdq,
					       struct arm_smmu_ll_queue *llq)
 {
	struct arm_smmu_queue_poll qp;
-	struct arm_smmu_cmdq *cmdq = arm_smmu_get_cmdq(smmu);
	u32 prod = llq->prod;
	int ret = 0;

@@ -695,12 +695,13 @@ static int __arm_smmu_cmdq_poll_until_consumed(struct arm_smmu_device *smmu,
 }

 static int arm_smmu_cmdq_poll_until_sync(struct arm_smmu_device *smmu,
+					 struct arm_smmu_cmdq *cmdq,
					 struct arm_smmu_ll_queue *llq)
 {
	if (smmu->options & ARM_SMMU_OPT_MSIPOLL)
-		return __arm_smmu_cmdq_poll_until_msi(smmu, llq);
+		return __arm_smmu_cmdq_poll_until_msi(smmu, cmdq, llq);

-	return __arm_smmu_cmdq_poll_until_consumed(smmu, llq);
+	return __arm_smmu_cmdq_poll_until_consumed(smmu, cmdq, llq);
 }

 static void arm_smmu_cmdq_write_entries(struct arm_smmu_cmdq *cmdq, u64 *cmds,
@@ -757,7 +758,7 @@ static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,

		while (!queue_has_space(&llq, n + sync)) {
			local_irq_restore(flags);
-			if (arm_smmu_cmdq_poll_until_not_full(smmu, &llq))
+			if (arm_smmu_cmdq_poll_until_not_full(smmu, cmdq, &llq))
				dev_err_ratelimited(smmu->dev, "CMDQ timeout\n");
			local_irq_save(flags);
		}
@@ -833,7 +834,7 @@ static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
	/* 5. If we are inserting a CMD_SYNC, we must wait for it to complete */
	if (sync) {
		llq.prod = queue_inc_prod_n(&llq, n);
-		ret = arm_smmu_cmdq_poll_until_sync(smmu, &llq);
+		ret = arm_smmu_cmdq_poll_until_sync(smmu, cmdq, &llq);
		if (ret) {
			dev_err_ratelimited(smmu->dev,
					    "CMD_SYNC timeout at 0x%08x [hwprod 0x%08x, hwcons 0x%08x]\n",
--
2.17.1
[PATCH v3 1/5] iommu/arm-smmu-v3: Add CS_NONE quirk
The CMDQV extension in the NVIDIA Grace SoC only supports CS_NONE in the CS field of CMD_SYNC, so this patch adds a quirk flag to accommodate that.

Signed-off-by: Nicolin Chen
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 7 ++-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 4
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index f5848b351b19..e6fee69dd79c 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -319,7 +319,9 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
		cmd[1] |= FIELD_PREP(CMDQ_RESUME_1_STAG, ent->resume.stag);
		break;
	case CMDQ_OP_CMD_SYNC:
-		if (ent->sync.msiaddr) {
+		if (ent->sync.cs_none) {
+			cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_CS, CMDQ_SYNC_0_CS_NONE);
+		} else if (ent->sync.msiaddr) {
			cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_CS, CMDQ_SYNC_0_CS_IRQ);
			cmd[1] |= ent->sync.msiaddr & CMDQ_SYNC_1_MSIADDR_MASK;
		} else {
@@ -356,6 +358,9 @@ static void arm_smmu_cmdq_build_sync_cmd(u64 *cmd, struct arm_smmu_device *smmu,
			   q->ent_dwords * 8;
	}

+	if (q->quirks & CMDQ_QUIRK_SYNC_CS_NONE_ONLY)
+		ent.sync.cs_none = true;
+
	arm_smmu_cmdq_build_cmd(cmd, &ent);
 }

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 4cb136f07914..7a6a6045700d 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -499,6 +499,7 @@ struct arm_smmu_cmdq_ent {
 #define CMDQ_OP_CMD_SYNC	0x46
		struct {
			u64	msiaddr;
+			bool	cs_none;
		} sync;
	};
 };
@@ -531,6 +532,9 @@ struct arm_smmu_queue {

	u32 __iomem		*prod_reg;
	u32 __iomem		*cons_reg;
+
+#define CMDQ_QUIRK_SYNC_CS_NONE_ONLY	BIT(0)	/* CMD_SYNC CS field supports CS_NONE only */
+	u32			quirks;
 };

 struct arm_smmu_queue_poll {
--
2.17.1
[PATCH v3 0/5] iommu/arm-smmu-v3: Add NVIDIA Grace CMDQ-V Support
From: Nicolin Chen

NVIDIA's Grace SoC has CMDQ-Virtualization (CMDQV) hardware that extends the standard ARM SMMUv3 to support multiple command queues with virtualization capabilities. Though this is similar to the ECMDQ in SMMUv3.3, CMDQV provides additional V-Interfaces that allow VMs to have their own interfaces and command queues. Compared to the standard SMMUv3 CMDQ, these queues can execute only a limited set of commands, mainly TLB invalidation commands, when running in guest mode.

This patch series extends the SMMUv3 driver to support NVIDIA CMDQV and implements it first for in-kernel use. Upon kernel boot, some of the VCMDQs will be set up for the kernel driver to use, selecting one of the command queues based on the CPU currently executing to avoid lock contention hot spots with a single queue.

Although the HW is able to securely expose the additional V-Interfaces and command queues to guest VMs for fast TLB invalidations without a hypervisor trap, due to the ongoing IOMMUFD proposal [0] we have to postpone the virtualization support that was available in v2, as suggested by Alex and Jason [1]. We envision that it will be added back via IOMMUFD in the months ahead.

Thank you!

[0] https://lore.kernel.org/lkml/20210919063848.1476776-1-yi.l@intel.com/
[1] https://lore.kernel.org/kvm/20210831101549.237151fa.alex.william...@redhat.com/T/#ma07dcfce69fa3f9d59e8b16579f694a0e10798d9

Changelog (details available in PATCH)
v2->v3:
 * Dropped VMID and mdev patches to redesign later based on IOMMUFD.
 * Separated HYP_OWN part for guest support into a new patch
 * Added new preparational changes
v1->v2:
 * Added mdev interface support for hypervisor and VMs.
 * Added preparational changes for mdev interface implementation.
 * PATCH-12 Changed ->issue_cmdlist() to ->get_cmdq() for a better
   integration with recently merged ECMDQ-related changes.
Nate Watterson (1):
  iommu/arm-smmu-v3: Add host support for NVIDIA Grace CMDQ-V

Nicolin Chen (4):
  iommu/arm-smmu-v3: Add CS_NONE quirk
  iommu/arm-smmu-v3: Make arm_smmu_cmdq_init reusable
  iommu/arm-smmu-v3: Pass cmdq pointer in arm_smmu_cmdq_issue_cmdlist()
  iommu/nvidia-grace-cmdqv: Limit CMDs for guest owned VINTF

 MAINTAINERS                                 |   1 +
 drivers/iommu/Kconfig                       |  12 +
 drivers/iommu/arm/arm-smmu-v3/Makefile      |   1 +
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  53 ++-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  48 ++
 .../arm/arm-smmu-v3/nvidia-grace-cmdqv.c    | 446 ++
 6 files changed, 542 insertions(+), 19 deletions(-)
 create mode 100644 drivers/iommu/arm/arm-smmu-v3/nvidia-grace-cmdqv.c

--
2.17.1
RE: [PATCH 01/11] iommu: Add device dma ownership set/release interfaces
> From: Jason Gunthorpe
> Sent: Thursday, November 18, 2021 9:33 PM
>
> > In concept a singleton group is different from a
> > multi-devices group which has only one device bound to driver...
>
> Really? Why? I don't see it that way..
>
> A singleton group is just a multi-device group that hasn't been
> hotplugged yet.
>
> We don't seem to have the concept of a "true" singleton group which is
> permanently single due to HW features.
>
> > This series aims to avoid conflict having both user and kernel drivers
> > mixed in a multi-devices group.

Well, the difference is just literal. I don't know the background of why the existing iommu_attach_device() users want to do it this way. But given the condition in iommu_attach_device(), it could in theory imply some unknown hardware-level side effect which may break the desired functionality once the group size grows beyond singleton. Is it a real case? I don't know...

You are now redefining that condition from a singleton group to a multi-devices group with a single driver bound. As long as there is no objection from existing driver users, I'm fine with it. But I still want to raise awareness, as it does change the existing semantics (though that might be considered an imperfect way).

Thanks
Kevin
Re: [PATCH] iommu/vt-d: Fix unmap_pages support
Hi Jerry, On 11/19/21 3:48 AM, Jerry Snitselaar wrote: On Fri, 2021-11-12 at 10:59 +0800, Lu Baolu wrote: Hi Alex, On 11/11/21 8:32 AM, Alex Williamson wrote:

When supporting only the .map and .unmap callbacks of iommu_ops, the IOMMU driver can make assumptions about the size and alignment used for mappings based on the driver provided pgsize_bitmap. VT-d previously used essentially PAGE_MASK for this bitmap as any power of two mapping was acceptably filled by native page sizes.

However, with the .map_pages and .unmap_pages interface we're now getting page-size and count arguments. If we simply combine these as (page-size * count) and make use of the previous map/unmap functions internally, any size and alignment assumptions are very different.

As an example, a given vfio device assignment VM will often create a 4MB mapping at IOVA pfn [0x3fe00 - 0x401ff]. On a system that does not support IOMMU super pages, the unmap_pages interface will ask to unmap 1024 4KB pages at the base IOVA. dma_pte_clear_level() will recurse down to level 2 of the page table where the first half of the pfn range exactly matches the entire pte level. We clear the pte, increment the pfn by the level size, but (oops) the next pte is on a new page, so we exit the loop and pop back up a level. When we then update the pfn based on that higher level, we seem to assume that the previous pfn value was at the start of the level. In this case the level size is 256K pfns, which we add to the base pfn and get a result of 0x7fe00, which is clearly greater than 0x401ff, so we're done. Meanwhile we never cleared the ptes for the remainder of the range. When the VM remaps this range, we're overwriting valid ptes and the VT-d driver complains loudly, as reported by the user report linked below.
The fix for this seems relatively simple, if each iteration of the loop in dma_pte_clear_level() is assumed to clear to the end of the level pte page, then our next pfn should be calculated from level_pfn rather than our working pfn. Fixes: 3f34f1259776 ("iommu/vt-d: Implement map/unmap_pages() iommu_ops callback") Reported-by: Ajay Garg Link: https://lore.kernel.org/all/20211002124012.18186-1-ajaygargn...@gmail.com/ Signed-off-by: Alex Williamson Thank you for fixing this! I will queue it for v5.16. Best regards, baolu Hi Baolu, Do you have an estimate of when this will be submitted? I will submit all fix patches in my queue to Joerg early the next week. Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/vt-d: Fix unmap_pages support
On Fri, 2021-11-12 at 10:59 +0800, Lu Baolu wrote: > Hi Alex, > > On 11/11/21 8:32 AM, Alex Williamson wrote: > > When supporting only the .map and .unmap callbacks of iommu_ops, > > the IOMMU driver can make assumptions about the size and alignment > > used for mappings based on the driver provided pgsize_bitmap. VT-d > > previously used essentially PAGE_MASK for this bitmap as any power > > of two mapping was acceptably filled by native page sizes. > > > > However, with the .map_pages and .unmap_pages interface we're now > > getting page-size and count arguments. If we simply combine these > > as (page-size * count) and make use of the previous map/unmap > > functions internally, any size and alignment assumptions are very > > different. > > > > As an example, a given vfio device assignment VM will often create > > a 4MB mapping at IOVA pfn [0x3fe00 - 0x401ff]. On a system that > > does not support IOMMU super pages, the unmap_pages interface will > > ask to unmap 1024 4KB pages at the base IOVA. > > dma_pte_clear_level() > > will recurse down to level 2 of the page table where the first half > > of the pfn range exactly matches the entire pte level. We clear > > the > > pte, increment the pfn by the level size, but (oops) the next pte > > is > > on a new page, so we exit the loop an pop back up a level. When we > > then update the pfn based on that higher level, we seem to assume > > that the previous pfn value was at the start of the level. In this > > case the level size is 256K pfns, which we add to the base pfn and > > get a results of 0x7fe00, which is clearly greater than 0x401ff, > > so we're done. Meanwhile we never cleared the ptes for the > > remainder > > of the range. When the VM remaps this range, we're overwriting > > valid > > ptes and the VT-d driver complains loudly, as reported by the user > > report linked below. 
> > > > The fix for this seems relatively simple, if each iteration of the > > loop in dma_pte_clear_level() is assumed to clear to the end of the > > level pte page, then our next pfn should be calculated from > > level_pfn > > rather than our working pfn. > > > > Fixes: 3f34f1259776 ("iommu/vt-d: Implement map/unmap_pages() > > iommu_ops callback") > > Reported-by: Ajay Garg > > Link: > > https://lore.kernel.org/all/20211002124012.18186-1-ajaygargn...@gmail.com/ > > Signed-off-by: Alex Williamson > > Thank you for fixing this! I will queue it for v5.16. > > Best regards, > baolu > Hi Baolu, Do you have an estimate of when this will be submitted? Regards, Jerry > > --- > > drivers/iommu/intel/iommu.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/drivers/iommu/intel/iommu.c > > b/drivers/iommu/intel/iommu.c > > index d75f59ae28e6..f6395f5425f0 100644 > > --- a/drivers/iommu/intel/iommu.c > > +++ b/drivers/iommu/intel/iommu.c > > @@ -1249,7 +1249,7 @@ static struct page > > *dma_pte_clear_level(struct dmar_domain *domain, int level, > > freelist); > > } > > next: > > - pfn += level_size(level); > > + pfn = level_pfn + level_size(level); > > } while (!first_pte_in_page(++pte) && pfn <= last_pfn); > > > > if (first_pte) > > > > > ___ > iommu mailing list > iommu@lists.linux-foundation.org > https://lists.linuxfoundation.org/mailman/listinfo/iommu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/2] dt-bindings: Add Arm SMMUv3 PMCG binding
On 2021-11-17 23:19, Rob Herring wrote: On Tue, Nov 16, 2021 at 5:52 AM Jean-Philippe Brucker wrote: Add binding for the Arm SMMUv3 PMU. Each node represents a PMCG, and is placed as a sibling node of the SMMU. Although the PMCGs registers may be within the SMMU MMIO region, they are separate devices, and there can be multiple PMCG devices for each SMMU (for example one for the TCU and one for each TBU). Signed-off-by: Jean-Philippe Brucker --- .../bindings/iommu/arm,smmu-v3-pmcg.yaml | 67 +++ 1 file changed, 67 insertions(+) create mode 100644 Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.yaml diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.yaml new file mode 100644 index ..a893e071fdb4 --- /dev/null +++ b/Documentation/devicetree/bindings/iommu/arm,smmu-v3-pmcg.yaml @@ -0,0 +1,67 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/iommu/arm,smmu-v3-pmcg.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Arm SMMUv3 Performance Monitor Counter Group + +maintainers: + - Will Deacon + - Robin Murphy + +description: |+ Don't need '|+' if no formatting to preserve. + An SMMUv3 may have several Performance Monitor Counter Group (PMCG). + They are standalone performance monitoring units that support both + architected and IMPLEMENTATION DEFINED event counters. Humm, I don't know that I agree they are standalone. They could be I guess, but looking at the MMU-600 spec the PMCG looks like it's just a subset of registers in a larger block. This seems similar to MPAM (which I'm working on a binding for) where it's just a register map and interrupts, but every other possible resource is unspecified by the architecture. They're "standalone" in the sense that they don't have to be part of an SMMU, they could be part of a PCIe root complex or other SoC device that couples to an SMMU (e.g. 
anything that can speak AMBA DTI, in the case of our SMMU implementations). In fact our SMMU TBUs are pretty much separate devices themselves, they just *only* speak DTI, so access to their registers is proxied through the TCU programming interface. The simplest change from this would be just specifying that the PMCG is child node(s) of whatever it is part of. The extreme would be this is all part of the SMMU binding (i.e. reg entry X is PMCG registers, interrupts entry Y is pmu irq). Being a child of its associated device doesn't seem too bad semantically, however how would we describe a PMCG as a child of a PCIe node when its "reg" property still exists in the parent address space and not PCI config/memory space like any of its siblings? Also in practical terms, consuming that binding in Linux and getting the things to probe when it may want to be independent of whether we even understand the parent node at all could be... unpleasant. Robin. + +properties: + $nodename: +pattern: "^pmu@[0-9a-f]*" s/*/+/ Need at least 1 digit. + compatible: +oneOf: + - items: +- enum: + - hisilicon,smmu-v3-pmcg-hip08 +- const: arm,smmu-v3-pmcg + - const: arm,smmu-v3-pmcg + + reg: +description: | + Base addresses of the PMCG registers. Either a single address for Page 0 + or an additional address for Page 1, where some registers can be + relocated with SMMU_PMCG_CFGR.RELOC_CTRS. +minItems: 1 +maxItems: 2 + + interrupts: +maxItems: 1 + + msi-parent: true + +required: + - compatible + - reg + +additionalProperties: false + +examples: + - |+ +#include +#include + +pmu@2b42 { +compatible = "arm,smmu-v3-pmcg"; +reg = <0 0x2b42 0 0x1000>, + <0 0x2b43 0 0x1000>; +interrupts = ; +msi-parent = < 0xff>; +}; + +pmu@2b44 { +compatible = "arm,smmu-v3-pmcg"; +reg = <0 0x2b44 0 0x1000>, + <0 0x2b45 0 0x1000>; +interrupts = ; +msi-parent = < 0xff>; +}; -- 2.33.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 01/11] iommu: Add device dma ownership set/release interfaces
On Thu, Nov 18, 2021 at 09:12:41AM +0800, Lu Baolu wrote:
> The existing iommu_attach_device() allows only for singleton group. As
> we have added group ownership attribute, we can enforce this interface
> only for kernel domain usage.

Below is what I came up with.

- Replace the file * with a simple void *

- Use owner_count == 0 <-> dma_owner == DMA_OWNER to simplify the logic
  and remove levels of indent

- Add a kernel state DMA_OWNER_PRIVATE_DOMAIN

- Rename the user state to DMA_OWNER_PRIVATE_DOMAIN_USER. It differs
  from the above because it does extra work to keep the group isolated
  that kernel users do not need to do.

- Rename the kernel state to DMA_OWNER_DMA_API to better reflect its
  purpose. Inspired by Robin's point that a lot of this is indirectly
  coupled to the domain pointer.

- Have iommu_attach_device() atomically swap from DMA_OWNER_DMA_API to
  DMA_OWNER_PRIVATE_DOMAIN - replaces the group size check.

When we figure out tegra we can add a WARN_ON to iommu_attach_group() that dma_owner != DMA_OWNER_NONE || DMA_OWNER_DMA_API

Then the whole thing makes some general sense..
Jason

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 064d0679906afd..4cafe074775e30 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -49,7 +49,7 @@ struct iommu_group {
 	struct list_head entry;
 	enum iommu_dma_owner dma_owner;
 	refcount_t owner_cnt;
-	struct file *owner_user_file;
+	void *owner_cookie;
 };
 
 struct group_device {
@@ -1937,12 +1937,18 @@ int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
 	 * change while we are attaching
 	 */
 	mutex_lock(&group->mutex);
-	ret = -EINVAL;
-	if (iommu_group_device_count(group) != 1)
+	if (group->dma_owner != DMA_OWNER_DMA_API ||
+	    refcount_read(&group->owner_cnt) != 1) {
+		ret = -EBUSY;
 		goto out_unlock;
+	}
 
 	ret = __iommu_attach_group(domain, group);
+	if (ret)
+		goto out_unlock;
+
+	group->dma_owner = DMA_OWNER_PRIVATE_DOMAIN;
+	group->owner_cookie = domain;
 
 out_unlock:
 	mutex_unlock(&group->mutex);
 	iommu_group_put(group);
@@ -2193,14 +2199,11 @@ void iommu_detach_device(struct iommu_domain *domain, struct device *dev)
 		return;
 
 	mutex_lock(&group->mutex);
-	if (iommu_group_device_count(group) != 1) {
-		WARN_ON(1);
-		goto out_unlock;
-	}
-
+	WARN_ON(group->dma_owner != DMA_OWNER_PRIVATE_DOMAIN ||
+		refcount_read(&group->owner_cnt) != 1 ||
+		group->owner_cookie != domain);
+	group->dma_owner = DMA_OWNER_DMA_API;
 	__iommu_detach_group(domain, group);
-
-out_unlock:
 	mutex_unlock(&group->mutex);
 	iommu_group_put(group);
 }
@@ -3292,44 +3295,33 @@ static ssize_t iommu_group_store_type(struct iommu_group *group,
 
 static int __iommu_group_set_dma_owner(struct iommu_group *group,
 				       enum iommu_dma_owner owner,
-				       struct file *user_file)
+				       void *owner_cookie)
 {
-	if (group->dma_owner != DMA_OWNER_NONE && group->dma_owner != owner)
-		return -EBUSY;
-
-	if (owner == DMA_OWNER_USER) {
-		if (!user_file)
-			return -EINVAL;
-
-		if (group->owner_user_file && group->owner_user_file != user_file)
-			return -EPERM;
+	if (refcount_inc_not_zero(&group->owner_cnt)) {
+		if (group->dma_owner != owner ||
+		    group->owner_cookie != owner_cookie) {
+			refcount_dec(&group->owner_cnt);
+			return -EBUSY;
+		}
+		return 0;
 	}
 
-	if (!refcount_inc_not_zero(&group->owner_cnt)) {
-		group->dma_owner = owner;
-		refcount_set(&group->owner_cnt, 1);
-
-		if (owner == DMA_OWNER_USER) {
-			/*
-			 * The UNMANAGED domain shouldn't be attached before
-			 * claiming the USER ownership for the first time.
-			 */
-			if (group->domain) {
-				if (group->domain != group->default_domain) {
-					group->dma_owner = DMA_OWNER_NONE;
-					refcount_set(&group->owner_cnt, 0);
-
-					return -EBUSY;
-				}
-
-				__iommu_detach_group(group->domain, group);
-			}
-
-			get_file(user_file);
-			group->owner_user_file = user_file;
+	/*
+	 * We must ensure that any device DMAs issued after this call
+	 * are discarded. DMAs can only reach real memory once someone
+	 * has attached a real domain.
Re: [PATCH 01/11] iommu: Add device dma ownership set/release interfaces
On Thu, Nov 18, 2021 at 02:39:45AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe
> > Sent: Tuesday, November 16, 2021 9:46 PM
> >
> > On Tue, Nov 16, 2021 at 09:57:30AM +0800, Lu Baolu wrote:
> > > Hi Christoph,
> > >
> > > On 11/15/21 9:14 PM, Christoph Hellwig wrote:
> > > > On Mon, Nov 15, 2021 at 10:05:42AM +0800, Lu Baolu wrote:
> > > > > +enum iommu_dma_owner {
> > > > > +	DMA_OWNER_NONE,
> > > > > +	DMA_OWNER_KERNEL,
> > > > > +	DMA_OWNER_USER,
> > > > > +};
> > > > > +
> > > > >
> > > > > +	enum iommu_dma_owner dma_owner;
> > > > > +	refcount_t owner_cnt;
> > > > > +	struct file *owner_user_file;
> > > >
> > > > I'd just overload the ownership into owner_user_file,
> > > >
> > > >  NULL -> no owner
> > > >  (struct file *)1UL) -> kernel
> > > >  real pointer -> user
> > > >
> > > > Which could simplify a lot of the code dealing with the owner.
> > >
> > > Yeah! Sounds reasonable. I will make this in the next version.
> >
> > It would be good to figure out how to make iommu_attach_device()
> > enforce no other driver binding as a kernel user without a file *, as
> > Robin pointed to, before optimizing this.
> >
> > This fixes an existing bug where iommu_attach_device() only checks the
> > group size and is vulnerable to a hot plug increasing the group size
> > after it returns. That check should be replaced by this series's logic
> > instead.
>
> I think this existing bug in iommu_attach_device() is different from
> what this series is attempting to solve. To avoid breaking singleton
> group assumption there the ideal band-aid is to fail device hotplug.
> Otherwise some IOVA ranges which are supposed to go upstream
> to IOMMU may be considered as p2p and routed to the hotplugged
> device instead.

Yes, but the instability of the reserved regions during hotplug with
!ACS seems like an entirely different problem. It affects everything,
including VFIO, and multi-device groups.

Certainly it is nothing to do with this series.
> In concept a singleton group is different from a
> multi-devices group which has only one device bound to driver...

Really? Why? I don't see it that way..

A singleton group is just a multi-device group that hasn't been
hotplugged yet. We don't seem to have the concept of a "true" singleton
group which is permanently single due to HW features.

> This series aims to avoid conflict having both user and kernel drivers
> mixed in a multi-devices group.

I see this series about bringing order to all the places that want to
use a non-default domain - in-kernel or user doesn't really matter.

ie why shouldn't iommu_attach_device() work in a group that has a PCI
bridge, just like VFIO does? The only thing that is special about VFIO
vs a kernel driver is we want a little help to track userspace
ownership and VFIO opens userspace to do the P2P attack.

The way I see it the num device == 1 test in iommu_attach_device() is
an imperfect way of controlling driver binding, and we can do better
by using the mechanism in this series.

Jason
Re: [PATCH v2 2/6] hwtracing: Add trace function support for HiSilicon PCIe Tune and Trace device
Hi Robin,

On 2021/11/16 19:37, Yicong Yang wrote:
> On 2021/11/16 18:56, Robin Murphy wrote:
>> On 2021-11-16 09:06, Yicong Yang via iommu wrote:
>> [...]
>>> +/*
>>> + * Get RMR address if provided by the firmware.
>>> + * Return 0 if the IOMMU doesn't present or the policy of the
>>> + * IOMMU domain is passthrough or we get a usable RMR region.
>>> + * Otherwise a negative value is returned.
>>> + */
>>> +static int hisi_ptt_get_rmr(struct hisi_ptt *hisi_ptt)
>>> +{
>>> +	struct pci_dev *pdev = hisi_ptt->pdev;
>>> +	struct iommu_domain *iommu_domain;
>>> +	struct iommu_resv_region *region;
>>> +	LIST_HEAD(list);
>>> +
>>> +	/*
>>> +	 * Use direct DMA if IOMMU does not present or the policy of the
>>> +	 * IOMMU domain is passthrough.
>>> +	 */
>>> +	iommu_domain = iommu_get_domain_for_dev(&pdev->dev);
>>> +	if (!iommu_domain || iommu_domain->type == IOMMU_DOMAIN_IDENTITY)
>>> +		return 0;
>>> +
>>> +	iommu_get_resv_regions(&pdev->dev, &list);
>>> +	list_for_each_entry(region, &list, list)
>>> +		if (region->type == IOMMU_RESV_DIRECT &&
>>> +		    region->length >= HISI_PTT_TRACE_BUFFER_SIZE) {
>>> +			hisi_ptt->trace_ctrl.has_rmr = true;
>>> +			hisi_ptt->trace_ctrl.rmr_addr = region->start;
>>> +			hisi_ptt->trace_ctrl.rmr_length = region->length;
>>> +			break;
>>> +		}
>>> +
>>> +	iommu_put_resv_regions(&pdev->dev, &list);
>>> +	return hisi_ptt->trace_ctrl.has_rmr ? 0 : -ENOMEM;
>>> +}
>>
>> No.
>>
>> The whole point of RMRs is for devices that are already configured to access
>> the given address range in a manner beyond the kernel's control. If you can
>> do this, it proves that you should not have an RMR in the first place.
>>
>> The notion of a kernel driver explicitly configuring its device to DMA into
>> any random RMR that looks big enough is so egregiously wrong that I'm almost
>> lost for words...
>
> our bios will reserve such a region and reported it through iort. the device
> will write to the region and in the driver we need to access the region
> to get the traced data.
> the region is reserved exclusively and will not be
> accessed by kernel or other devices.
>
> is it ok to let bios configure the address to the device and from CPU side we
> just read it?
> Any suggestion?

Is this still a concern if we move the configuration of the device
address to BIOS and just read it from the CPU side?