Re: [git pull] iommu: Move Intel and AMD drivers to a subdirectory
On Fri, Jun 12, 2020 at 12:23:49PM -0700, Linus Torvalds wrote:
> Looks good to me. Any time a directory starts to have a lot of
> filenames with a particular prefix, moving them deeper like this seems
> to make sense. And doing it just before the -rc1 release and avoiding
> unnecessary conflicts seems like the right time too.
>
> So pulled. Thanks!
>
> Looking at it, it might even be worth moving the Kconfig and Makefile
> details down to the intel/amd subdirectories, and have them be
> included from the main iommu ones? But that's up to you.

Yeah, right. It's cleaner to move the Kconfig and Makefile stuff a
level deeper too; I'll take care of that for v5.9.

Joerg

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
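For reference, the follow-up Joerg describes would amount to something like the fragment below. This is a sketch only, assuming per-vendor Kconfig and Makefile files under drivers/iommu/amd/ and drivers/iommu/intel/; the exact symbol placement is up to the v5.9 patch.

```
# drivers/iommu/Kconfig: include the per-vendor symbols from the
# subdirectories instead of defining them all in one file
source "drivers/iommu/amd/Kconfig"
source "drivers/iommu/intel/Kconfig"

# drivers/iommu/Makefile: always descend; the subdirectory Makefiles
# gate their objects on the corresponding CONFIG_* symbols
obj-y += amd/ intel/
```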
Re: [PATCH 0/2] Introduce PCI_FIXUP_IOMMU
On 2020/6/11 9:44 PM, Bjorn Helgaas wrote:

> +++ b/drivers/iommu/iommu.c
> @@ -2418,6 +2418,10 @@ int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
> 	fwspec->iommu_fwnode = iommu_fwnode;
> 	fwspec->ops = ops;
> 	dev_iommu_fwspec_set(dev, fwspec);
> +
> +	if (dev_is_pci(dev))
> +		pci_fixup_device(pci_fixup_final, to_pci_dev(dev));
> +

Then pci_fixup_final will be called twice: the first time in
pci_bus_add_device, and here in iommu_fwspec_init the second time,
specifically for iommu_fwspec. Will send this when 5.8-rc1 is open.

Wait, this whole fixup approach seems wrong to me. No matter how you do
the fixup, it's still a fixup, which means it requires ongoing
maintenance. Surely we don't want to have to add the Vendor/Device ID
for every new AMBA device that comes along, do we?

Here the fake PCI device has standard PCI config space, but the physical
implementation is based on AMBA. The devices can provide the PASID
feature; however:

1. They do not support TLPs, since they are not real PCI devices.
2. They do not support PRI; instead they support stall (provided by the
   SMMU).

And stall is not a PCI feature, so it is not described in struct
pci_dev, but in struct iommu_fwspec. So we use this fixup to tell the
PCI subsystem that the devices can support stall, and thereby support
PASID.

This did not answer my question. Are you proposing that we update a
quirk every time a new AMBA device is released? I don't think that would
be a good model.

Yes, you are right, but we do not have any better idea yet. Currently we
have three fake PCI devices which support stall and PASID. We have to
let the PCI subsystem know that the devices can support PASID by way of
the stall feature, even though they do not support PRI. Do you have any
other ideas?

It sounds like the best way would be to allocate a PCI capability for
it, so detection can be done through config space, at least in future
devices, or possibly after a firmware update if the config space in your
system is controlled by firmware somewhere.
Once there is a proper mechanism to do this, using fixups to detect the
early devices that don't use it should be uncontroversial. I have no
idea what the process or timeline is to add new capabilities into the
PCIe specification, or if this one would be acceptable to the PCI SIG at
all.

That sounds like a possibility. The spec already defines a
Vendor-Specific Extended Capability (PCIe r5.0, sec 7.9.5) that might be
a candidate.

Will investigate this, thanks Bjorn.

FWIW, there's also a Vendor-Specific Capability that can appear in the
first 256 bytes of config space (the Vendor-Specific Extended Capability
must appear in the "Extended Configuration Space" from 0x100-0xfff).

Unfortunately our silicon has neither the Vendor-Specific Capability nor
the Vendor-Specific Extended Capability. I studied commit
8531e283bee66050734fb0e89d53e85fd5ce24a4; it looks like that method
requires adding a member (like can_stall) to struct pci_dev, which looks
difficult.

If detection cannot be done through PCI config space, the next best
alternative is to pass auxiliary data through firmware. On DT-based
machines, you can list non-hotpluggable PCIe devices and add custom
properties that could be read during device enumeration. I assume ACPI
has something similar, but I have not done that.

Yes, thanks Arnd.

ACPI has _DSM (ACPI v6.3, sec 9.1.1), which might be a candidate. I like
this better than a PCI capability because the property you need to
expose is not a PCI property.

_DSM may not be workable, since it operates at runtime. We need the
stall information in the init stage: neither too early (after allocation
of iommu_fwspec) nor too late (before arm_smmu_add_device).

I'm not aware of a restriction on when _DSM can be evaluated. I'm
looking at ACPI v6.3, sec 9.1.1. Are you seeing something different?

The _DSM method seems to require a vendor-specific GUID, and the code
would be vendor specific, unless a UUID is added to some spec like
pci_acpi_dsm_guid:
  obj = acpi_evaluate_dsm(ACPI_HANDLE(bus->bridge), &pci_acpi_dsm_guid, 1,
                          IGNORE_PCI_BOOT_CONFIG_DSM, NULL);

By the way, it would take a long time if we needed to modify either the
PCIe spec or the ACPI spec. Can we use pci_fixup_device in
iommu_fwspec_init first? It is relatively simple, it meets the
requirement of platform devices using PASID, and those devices are
already in production.

Neither the PCI Vendor-Specific Capability nor the ACPI _DSM requires a
spec change. Both can be completely vendor-defined.

Adding vendor-specific code to common files looks a bit ugly.

Thanks
Re: [PATCH v2 11/12] x86/mmu: Allocate/free PASID
Hi Fenghua,

On 2020/6/13 8:41, Fenghua Yu wrote:
> A PASID is allocated for an "mm" the first time any thread attaches to
> an SVM capable device. Later device attachments (whether to the same
> device or another SVM device) will re-use the same PASID. The PASID is
> freed when the process exits (so no need to keep reference counts on
> how many SVM devices are sharing the PASID).

FYI, Jean-Philippe Brucker has a patch for mm->pasid management in a
vendor-agnostic manner.

https://www.spinics.net/lists/iommu/msg44459.html

Best regards,
baolu

> Signed-off-by: Fenghua Yu
> Reviewed-by: Tony Luck
> ---
> v2:
> - Define a helper free_bind() to simplify error exit code in bind_mm()
>   (Thomas)
> - Fix a ret error code in bind_mm() (Thomas)
> - Change pasid's type from "int" to "unsigned int" to have consistent
>   pasid type in iommu (Thomas)
> - Simplify alloc_pasid() a bit.
>
>  arch/x86/include/asm/iommu.h       |   2 +
>  arch/x86/include/asm/mmu_context.h |  14
>  drivers/iommu/intel/svm.c          | 101 +
>  3 files changed, 105 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h
> index bf1ed2ddc74b..ed41259fe7ac 100644
> --- a/arch/x86/include/asm/iommu.h
> +++ b/arch/x86/include/asm/iommu.h
> @@ -26,4 +26,6 @@ arch_rmrr_sanity_check(struct acpi_dmar_reserved_memory *rmrr)
>  	return -EINVAL;
>  }
>
> +void __free_pasid(struct mm_struct *mm);
> +
>  #endif /* _ASM_X86_IOMMU_H */
> diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
> index 47562147e70b..f8c91ce8c451 100644
> --- a/arch/x86/include/asm/mmu_context.h
> +++ b/arch/x86/include/asm/mmu_context.h
> @@ -13,6 +13,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  extern atomic64_t last_mm_ctx_id;
>
> @@ -117,9 +118,22 @@ static inline int init_new_context(struct task_struct *tsk,
>  	init_new_context_ldt(mm);
>  	return 0;
>  }
> +
> +static inline void free_pasid(struct mm_struct *mm)
> +{
> +	if (!IS_ENABLED(CONFIG_INTEL_IOMMU_SVM))
> +		return;
> +
> +	if (!cpu_feature_enabled(X86_FEATURE_ENQCMD))
> +		return;
> +
> +	__free_pasid(mm);
> +}
> +
>  static inline
> void destroy_context(struct mm_struct *mm)
>  {
>  	destroy_context_ldt(mm);
> +	free_pasid(mm);
>  }
>
>  extern void switch_mm(struct mm_struct *prev, struct mm_struct *next,
> diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> index 4e775e12ae52..27dc866b8461 100644
> --- a/drivers/iommu/intel/svm.c
> +++ b/drivers/iommu/intel/svm.c
> @@ -425,6 +425,53 @@ int intel_svm_unbind_gpasid(struct device *dev, unsigned int pasid)
>  	return ret;
>  }
>
> +static void free_bind(struct intel_svm *svm, struct intel_svm_dev *sdev,
> +		      bool new_pasid)
> +{
> +	if (new_pasid)
> +		ioasid_free(svm->pasid);
> +	kfree(svm);
> +	kfree(sdev);
> +}
> +
> +/*
> + * If this mm already has a PASID, use it. Otherwise allocate a new one.
> + * Let the caller know if a new PASID is allocated via 'new_pasid'.
> + */
> +static int alloc_pasid(struct intel_svm *svm, struct mm_struct *mm,
> +		       unsigned int pasid_max, bool *new_pasid,
> +		       unsigned int flags)
> +{
> +	unsigned int pasid;
> +
> +	*new_pasid = false;
> +
> +	/*
> +	 * Reuse the PASID if the mm already has a PASID and not a private
> +	 * PASID is requested.
> +	 */
> +	if (mm && mm->pasid && !(flags & SVM_FLAG_PRIVATE_PASID)) {
> +		/*
> +		 * Once a PASID is allocated for this mm, the PASID
> +		 * stays with the mm until the mm is dropped. Reuse
> +		 * the PASID which has been already allocated for the
> +		 * mm instead of allocating a new one.
> +		 */
> +		ioasid_set_data(mm->pasid, svm);
> +
> +		return mm->pasid;
> +	}
> +
> +	/* Allocate a new pasid. Do not use PASID 0, reserved for init PASID. */
> +	pasid = ioasid_alloc(NULL, PASID_MIN, pasid_max - 1, svm);
> +	if (pasid != INVALID_IOASID) {
> +		/* A new pasid is allocated.
> +		 */
> +		*new_pasid = true;
> +	}
> +
> +	return pasid;
> +}
> +
>  /* Caller must hold pasid_mutex, mm reference */
>  static int
>  intel_svm_bind_mm(struct device *dev, unsigned int flags,
> @@ -518,6 +565,8 @@ intel_svm_bind_mm(struct device *dev, unsigned int flags,
>  	init_rcu_head(&sdev->rcu);
>
>  	if (!svm) {
> +		bool new_pasid;
> +
>  		svm = kzalloc(sizeof(*svm), GFP_KERNEL);
>  		if (!svm) {
>  			ret = -ENOMEM;
> @@ -529,12 +578,9 @@ intel_svm_bind_mm(struct device *dev, unsigned int flags,
>  		if (pasid_max > intel_pasid_max_id)
>  			pasid_max = intel_pasid_max_id;
>
> -		/* Do not use PASID 0, reserved for RID to PASID */
> -		svm->pasid = ioasid_alloc(NULL, PASID_MIN,
> -					  pasid_max - 1, svm);
> +		svm->pasid = alloc_pasid(svm,
Re: [PATCH v2 04/12] docs: x86: Add documentation for SVA (Shared Virtual Addressing)
Hi Fenghua,

On 2020/6/13 8:41, Fenghua Yu wrote:
> From: Ashok Raj
>
> ENQCMD and Data Streaming Accelerator (DSA) and all of their associated
> features are a complicated stack with lots of interconnected pieces.
> This documentation provides a big picture overview for all of the
> features.
>
> Signed-off-by: Ashok Raj
> Co-developed-by: Fenghua Yu
> Signed-off-by: Fenghua Yu
> Reviewed-by: Tony Luck
> ---
> v2:
> - Fix the doc format and add the doc in toctree (Thomas)
> - Modify the doc for better description (Thomas, Tony, Dave)
>
>  Documentation/x86/index.rst |   1 +
>  Documentation/x86/sva.rst   | 287
>  2 files changed, 288 insertions(+)
>  create mode 100644 Documentation/x86/sva.rst
>
> diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
> index 265d9e9a093b..e5d5ff096685 100644
> --- a/Documentation/x86/index.rst
> +++ b/Documentation/x86/index.rst
> @@ -30,3 +30,4 @@ x86-specific Documentation
>     usb-legacy-support
>     i386/index
>     x86_64/index
> +   sva
> diff --git a/Documentation/x86/sva.rst b/Documentation/x86/sva.rst
> new file mode 100644
> index ..1e52208c7dda
> --- /dev/null
> +++ b/Documentation/x86/sva.rst
> @@ -0,0 +1,287 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===========================================
> +Shared Virtual Addressing (SVA) with ENQCMD
> +===========================================
> +
> +Background
> +==========
> +
> +Shared Virtual Addressing (SVA) allows the processor and device to use the
> +same virtual addresses avoiding the need for software to translate virtual
> +addresses to physical addresses. SVA is what PCIe calls Shared Virtual
> +Memory (SVM)
> +
> +In addition to the convenience of using application virtual addresses
> +by the device, it also doesn't require pinning pages for DMA.
> +PCIe Address Translation Services (ATS) along with Page Request Interface
> +(PRI) allow devices to function much the same way as the CPU handling
> +application page-faults. For more information please refer to PCIe
> +specification Chapter 10: ATS Specification.
> +
> +Use of SVA requires IOMMU support in the platform. IOMMU also is required
> +to support PCIe features ATS and PRI.
> ATS allows devices to cache
> +translations for the virtual address. IOMMU driver uses the mmu_notifier()
> +support to keep the device tlb cache and the CPU cache in sync. PRI allows
> +the device to request paging the virtual address before using if they are
> +not paged in the CPU page tables.
> +
> +
> +Shared Hardware Workqueues
> +==========================
> +
> +Unlike Single Root I/O Virtualization (SRIOV), Scalable IOV (SIOV) permits
> +the use of Shared Work Queues (SWQ) by both applications and Virtual
> +Machines (VM's). This allows better hardware utilization vs. hard
> +partitioning resources that could result in under utilization. In order to
> +allow the hardware to distinguish the context for which work is being
> +executed in the hardware by SWQ interface, SIOV uses Process Address Space
> +ID (PASID), which is a 20bit number defined by the PCIe SIG.
> +
> +PASID value is encoded in all transactions from the device. This allows the
> +IOMMU to track I/O on a per-PASID granularity in addition to using the PCIe
> +Resource Identifier (RID) which is the Bus/Device/Function.
> +
> +
> +ENQCMD
> +======
> +
> +ENQCMD is a new instruction on Intel platforms that atomically submits a
> +work descriptor to a device. The descriptor includes the operation to be
> +performed, virtual addresses of all parameters, virtual address of a completion
> +record, and the PASID (process address space ID) of the current process.
> +
> +ENQCMD works with non-posted semantics and carries a status back if the
> +command was accepted by hardware. This allows the submitter to know if the
> +submission needs to be retried or other device specific mechanisms to
> +implement implement fairness or ensure forward progress can be made.

Repeated "implement".

> +ENQCMD is the glue that ensures applications can directly submit commands
> +to the hardware and also permit hardware to be aware of application context
> +to perform I/O operations via use of PASID.
> +
> +Process Address Space Tagging
> +=============================
> +
> +A new thread scoped MSR (IA32_PASID) provides the connection between
> +user processes and the rest of the hardware. When an application first
> +accesses an SVA capable device this MSR is initialized with a newly
> +allocated PASID. The driver for the device calls an IOMMU specific api
> +that sets up the routing for DMA and page-requests.
> +
> +For example, the Intel Data Streaming Accelerator (DSA) uses
> +intel_svm_bind_mm(), which will do the following.

The Intel SVM APIs have been deprecated. Drivers should use
iommu_sva_bind_device() instead. Please also update other places in
this document.

> +- Allocate the PASID, and program the process page-table (cr3) in the PASID
> +  context entries.
> +- Register for mmu_notifier() to track any page-table invalidations to keep
> +  the device tlb in sync. For example, when a page-table entry is invalidated,
> +  IOMMU propagates the in