Re: [PATCH v7 3/9] docs: x86: Add documentation for SVA (Shared Virtual Addressing)
Hi, Randy,

On Sat, Sep 05, 2020 at 10:54:59AM -0700, Randy Dunlap wrote:
> Hi,
>
> I'll add a few edits other than those that Borislav made.
> (nice review job, BP)
>
>
> On 8/27/20 8:06 AM, Fenghua Yu wrote:
> > From: Ashok Raj
> >
> > ENQCMD and Data Streaming Accelerator (DSA) and all of their associated
> > features are a complicated stack with lots of interconnected pieces.
> > This documentation provides a big picture overview for all of the
> > features.
> >
> > Signed-off-by: Ashok Raj
> > Co-developed-by: Fenghua Yu
> > Signed-off-by: Fenghua Yu
> > Reviewed-by: Tony Luck
> > ---
> > diff --git a/Documentation/x86/sva.rst b/Documentation/x86/sva.rst
> > new file mode 100644
> > index ..6e7ac565e127
> > --- /dev/null
> > +++ b/Documentation/x86/sva.rst
> > @@ -0,0 +1,254 @@
> > +MMIO. This doesn't scale as the number of threads becomes quite large. The
> > +hardware also manages the queue depth for Shared Work Queues (SWQ), and
> > +consumers don't need to track queue depth. If there is no space to accept
> > +a command, the device will return an error indicating retry. Also
> > +submitting a command to an MMIO address that can't accept ENQCMD will
> > +return retry in response. In the new DMWr PCIe terminology, devices need to
>
> so how does a submitter know whether a return of "retry" means no_space or
> invalid_for_this_device?

I will add "A user should check Deferrable Memory Write (DMWr) capability on
the device and only submit ENQCMD when the device supports it." So the user
doesn't need to distinguish between "no space" and "invalid for this device"
errors.

All of your other comments will be addressed in the next version.

Thank you very much for your comments!

-Fenghua
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
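The retry semantics discussed above can be illustrated with a small C sketch. It assumes, as proposed in the reply, that the caller has already confirmed the device supports ENQCMD, so a retry status can only mean the shared work queue is currently full. The names `submit_fn`, `submit_with_retry()`, `busy_remaining` and `mock_submit()` are hypothetical and exist only for this illustration; on real hardware the callback would wrap the `_enqcmd()` intrinsic from `<immintrin.h>` (GCC/Clang with `-menqcmd`), which returns nonzero when the device answers with retry.

```c
#include <stddef.h> /* NULL */

/*
 * Hypothetical stand-in for one ENQCMD submission: returns 0 when the
 * device accepted the descriptor and nonzero when it answered "retry".
 * On real hardware this would wrap _enqcmd(portal, desc) from
 * <immintrin.h>, built with -menqcmd.
 */
typedef int (*submit_fn)(void *portal, const void *desc);

enum submit_result { SUBMIT_OK, SUBMIT_GAVE_UP };

/*
 * Try to submit a descriptor up to max_tries times. Assumes the device
 * was already checked for ENQCMD support, so "retry" only means the
 * shared work queue is full right now. *tries_used reports how many
 * submissions were attempted.
 */
static enum submit_result submit_with_retry(submit_fn submit, void *portal,
					    const void *desc, int max_tries,
					    int *tries_used)
{
	int i;

	for (i = 1; i <= max_tries; i++) {
		if (submit(portal, desc) == 0) {
			*tries_used = i;
			return SUBMIT_OK;
		}
		/* A real caller might pause or back off before retrying. */
	}
	*tries_used = max_tries > 0 ? max_tries : 0;
	return SUBMIT_GAVE_UP;
}

/* Mock shared work queue: answers "retry" until busy_remaining hits 0. */
static int busy_remaining;

static int mock_submit(void *portal, const void *desc)
{
	(void)portal;
	(void)desc;
	if (busy_remaining > 0) {
		busy_remaining--;
		return 1; /* retry: queue full */
	}
	return 0; /* accepted */
}
```

With `busy_remaining = 2` the mock rejects the first two submissions, so `submit_with_retry()` succeeds on the third attempt. Bounding the number of attempts keeps a persistently full queue from spinning the submitting thread forever; what to do after giving up is left to the caller.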
Re: [PATCH v7 3/9] docs: x86: Add documentation for SVA (Shared Virtual Addressing)
Hi,

I'll add a few edits other than those that Borislav made.
(nice review job, BP)

On 8/27/20 8:06 AM, Fenghua Yu wrote:
> From: Ashok Raj
>
> ENQCMD and Data Streaming Accelerator (DSA) and all of their associated
> features are a complicated stack with lots of interconnected pieces.
> This documentation provides a big picture overview for all of the
> features.
>
> Signed-off-by: Ashok Raj
> Co-developed-by: Fenghua Yu
> Signed-off-by: Fenghua Yu
> Reviewed-by: Tony Luck
> ---
> v7:
> - Change the doc for updating PASID by IPI and context switch (Andy).
>
> v3:
> - Replace deprecated intel_svm_bind_mm() by iommu_sva_bind_mm() (Baolu)
> - Fix a couple of typos (Baolu)
>
> v2:
> - Fix the doc format and add the doc in toctree (Thomas)
> - Modify the doc for better description (Thomas, Tony, Dave)
>
>  Documentation/x86/index.rst | 1 +
>  Documentation/x86/sva.rst | 254
>  2 files changed, 255 insertions(+)
>  create mode 100644 Documentation/x86/sva.rst
> diff --git a/Documentation/x86/sva.rst b/Documentation/x86/sva.rst
> new file mode 100644
> index ..6e7ac565e127
> --- /dev/null
> +++ b/Documentation/x86/sva.rst
> @@ -0,0 +1,254 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===========================================
> +Shared Virtual Addressing (SVA) with ENQCMD
> +===========================================
> +
> +Background
> +==========
> + ...
> +
> +Shared Hardware Workqueues
> +==========================
> +
> +Unlike Single Root I/O Virtualization (SRIOV), Scalable IOV (SIOV) permits
> +the use of Shared Work Queues (SWQ) by both applications and Virtual
> +Machines (VM's). This allows better hardware utilization vs. hard
> +partitioning resources that could result in under utilization. In order to
> +allow the hardware to distinguish the context for which work is being
> +executed in the hardware by SWQ interface, SIOV uses Process Address Space
> +ID (PASID), which is a 20bit number defined by the PCIe SIG.
                          20-bit
> +
> +PASID value is encoded in all transactions from the device. This allows the
> +IOMMU to track I/O on a per-PASID granularity in addition to using the PCIe
> +Resource Identifier (RID) which is the Bus/Device/Function.
> +
> +
> +ENQCMD
> +======
> + ...
> +
> +Process Address Space Tagging
> +=============================
> + ...
> +
> +PASID Management
> +================
> + ...
> +
> +Relationships
> +=============
> +
> + * Each process has many threads, but only one PASID
                                       (end with) PASID.
> + * Devices have a limited number (~10's to 1000's) of hardware
> +   workqueues and each portal maps down to a single workqueue.
> +   The device driver manages allocating hardware workqueues.
> + * A single mmap() maps a single hardware workqueue as a "portal"
                                                         (end with) .
> + * For each device with which a process interacts, there must be
> +   one or more mmap()'d portals.
> + * Many threads within a process can share a single portal to access
> +   a single device.
> + * Multiple processes can separately mmap() the same portal, in
> +   which case they still share one device hardware workqueue.
> + * The single process-wide PASID is used by all threads to interact
> +   with all devices. There is not, for instance, a PASID for each
> +   thread or each thread<->device pair.
> +
> +FAQ
> +===
> +
> +* What is SVA/SVM?
> +
> +Shared Virtual Addressing (SVA) permits I/O hardware and the processor to
> +work in the same address space. In short, sharing the address space. Some
> +call it Shared Virtual Memory (SVM), but Linux community wanted to avoid
                                                            wanted to avoid confusing
> +it with Posix Shared Memory and Secure Virtual Machines which were terms
           POSIX
> +already in circulation.
> +
> +* What is a PASID?
> +
> +A Process Address Space ID (PASID) is a PCIe-defined TLP Prefix. A PASID is

ah, BP already commented about using acronyms to define acronyms. :)

> +a 20 bit number allocated and managed by the OS. PASID is included in all
     20-bit
> +transactions between the platform and the device.
> +
> +* How are shared work queues different?
>
> +Traditionally to allow user space applications interact with hardware,
> +there is a separate instance required per process. For example, consider
> +doorbells as a mechanism of informing hardware about work to process. Each
> +doorbell is required to be spaced 4k (or page-size) apart for process
> +isolation. This requires hardware to provision that space and reserve in
                                                                 reserve it in
> +MMIO. This doesn't scale as the number of threads becomes quite large. The
> +hardware also manages the queue depth for Shared Work Queues (SWQ), and
> +consumers don't need to track queue depth. If there
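The scaling argument in the quoted "How are shared work queues different?" answer comes down to simple arithmetic, sketched below in C. It assumes 4 KiB pages and one page per doorbell or portal (both assumptions for illustration), and the function names are hypothetical: with dedicated doorbells the MMIO footprint grows with the number of clients, while a single shared work queue portal that many threads and processes mmap() stays at one page.

```c
#include <stdint.h>

/* Assumption for illustration: 4 KiB pages, one page per doorbell/portal. */
#define PAGE_BYTES 4096ULL

/*
 * Dedicated doorbells: each client (process or thread) needs its own
 * page-aligned doorbell so it can be mapped in isolation, so MMIO grows
 * linearly with the number of clients.
 */
static uint64_t dedicated_doorbell_mmio(uint64_t clients)
{
	return clients * PAGE_BYTES;
}

/*
 * Shared work queue: every client maps the same portal and submits with
 * ENQCMD, so the MMIO needed for one work queue does not depend on the
 * number of clients.
 */
static uint64_t shared_portal_mmio(uint64_t clients)
{
	(void)clients; /* independent of the client count */
	return PAGE_BYTES;
}
```

For 1024 clients, dedicated doorbells need 1024 * 4096 = 4,194,304 bytes (4 MiB) of MMIO, against 4096 bytes for one shared portal.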
Re: [PATCH v7 3/9] docs: x86: Add documentation for SVA (Shared Virtual Addressing)
> Subject: Re: [PATCH v7 3/9] docs: x86: Add documentation for SVA (Shared
> Virtual Addressing)

Fix prefix: Documentation/x86: ...

On Thu, Aug 27, 2020 at 08:06:28AM -0700, Fenghua Yu wrote:
> From: Ashok Raj
>
> ENQCMD and Data Streaming Accelerator (DSA) and all of their associated
> features are a complicated stack with lots of interconnected pieces.
> This documentation provides a big picture overview for all of the
> features.
>
> Signed-off-by: Ashok Raj
> Co-developed-by: Fenghua Yu
> Signed-off-by: Fenghua Yu
> Reviewed-by: Tony Luck
> ---
> v7:
> - Change the doc for updating PASID by IPI and context switch (Andy).
>
> v3:
> - Replace deprecated intel_svm_bind_mm() by iommu_sva_bind_mm() (Baolu)
> - Fix a couple of typos (Baolu)
>
> v2:
> - Fix the doc format and add the doc in toctree (Thomas)
> - Modify the doc for better description (Thomas, Tony, Dave)
>
>  Documentation/x86/index.rst | 1 +
>  Documentation/x86/sva.rst | 254
>  2 files changed, 255 insertions(+)
>  create mode 100644 Documentation/x86/sva.rst
>
> diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
> index 265d9e9a093b..e5d5ff096685 100644
> --- a/Documentation/x86/index.rst
> +++ b/Documentation/x86/index.rst
> @@ -30,3 +30,4 @@ x86-specific Documentation
>     usb-legacy-support
>     i386/index
>     x86_64/index
> +   sva
> diff --git a/Documentation/x86/sva.rst b/Documentation/x86/sva.rst
> new file mode 100644
> index ..6e7ac565e127
> --- /dev/null
> +++ b/Documentation/x86/sva.rst
> @@ -0,0 +1,254 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===========================================
> +Shared Virtual Addressing (SVA) with ENQCMD
> +===========================================
> +
> +Background
> +==========
> +
> +Shared Virtual Addressing (SVA) allows the processor and device to use the
> +same virtual addresses avoiding the need for software to translate virtual
> +addresses to physical addresses. SVA is what PCIe calls Shared Virtual
> +Memory (SVM)
               ^
               . <-- Fullstop

> +
> +In addition to the convenience of using application virtual addresses
> +by the device, it also doesn't require pinning pages for DMA.
> +PCIe Address Translation Services (ATS) along with Page Request Interface
> +(PRI) allow devices to function much the same way as the CPU handling
> +application page-faults. For more information please refer to PCIe
                                                                 ^
                                                                 the
> +specification Chapter 10: ATS Specification.
> +
> +Use of SVA requires IOMMU support in the platform. IOMMU also is required
> +to support PCIe features ATS and PRI. ATS allows devices to cache
> +translations for the virtual address. IOMMU driver uses the mmu_notifier()

... for virtual addresses. The IOMMU driver...

> +support to keep the device tlb cache and the CPU cache in sync. PRI allows
                              TLB
> +the device to request paging the virtual address before using if they are
                                                                    ^
> +not paged in the CPU page tables.

That sentence reads strangely and needs fixing.

> +
> +
> +Shared Hardware Workqueues
> +==========================
> +
> +Unlike Single Root I/O Virtualization (SRIOV), Scalable IOV (SIOV) permits
> +the use of Shared Work Queues (SWQ) by both applications and Virtual
> +Machines (VM's). This allows better hardware utilization vs. hard
> +partitioning resources that could result in under utilization. In order to
> +allow the hardware to distinguish the context for which work is being
> +executed in the hardware by SWQ interface, SIOV uses Process Address Space
> +ID (PASID), which is a 20bit number defined by the PCIe SIG.
> +
> +PASID value is encoded in all transactions from the device. This allows the
> +IOMMU to track I/O on a per-PASID granularity in addition to using the PCIe
> +Resource Identifier (RID) which is the Bus/Device/Function.
> +
> +
> +ENQCMD
> +======
> +
> +ENQCMD is a new instruction on Intel platforms that atomically submits a
> +work descriptor to a device. The descriptor includes the operation to be
> +performed, virtual addresses of all parameters, virtual address of a completion
> +record, and the PASID (process address space ID) of the current process.
> +
> +ENQCMD works with non-posted semantics and carries a status back if the
> +command was accepted by hardware. This allows the submitter to know if the
> +submission needs to be retried or other device specific mechanisms to
> +implement fair
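To make the RID/PASID pairing in the quoted "Shared Hardware Workqueues" text concrete, here is a C sketch of how a 16-bit RID splits into Bus/Device/Function (8/5/3 bits; the PCIe specification expands RID as Requester ID) and how a 20-bit PASID can be range-checked before being paired with it. The helpers `rid_to_bdf()` and `make_tag()` and the combined 64-bit tag are hypothetical, meant only to show that the RID selects the device and the PASID selects the address space; this is not the IOMMU's actual table format.

```c
#include <stdbool.h>
#include <stdint.h>

#define PASID_BITS 20
#define PASID_MAX  ((1u << PASID_BITS) - 1) /* 0xFFFFF */

/* Bus/Device/Function fields packed into a 16-bit PCIe RID. */
struct bdf {
	uint8_t bus;      /* RID bits 15:8 */
	uint8_t device;   /* RID bits 7:3 */
	uint8_t function; /* RID bits 2:0 */
};

static struct bdf rid_to_bdf(uint16_t rid)
{
	struct bdf b = {
		.bus = (uint8_t)(rid >> 8),
		.device = (uint8_t)((rid >> 3) & 0x1f),
		.function = (uint8_t)(rid & 0x7),
	};
	return b;
}

/*
 * Hypothetical lookup key: the RID identifies the device, the PASID
 * identifies the address space used by that device's transaction.
 * Returns false if the PASID does not fit in 20 bits.
 */
static bool make_tag(uint16_t rid, uint32_t pasid, uint64_t *tag)
{
	if (pasid > PASID_MAX)
		return false;
	*tag = ((uint64_t)rid << PASID_BITS) | pasid;
	return true;
}
```

For example, RID 0x3a11 decodes to bus 0x3a, device 2, function 1, and pairing it with PASID 0x12345 gives the tag 0x3a1112345.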
[PATCH v7 3/9] docs: x86: Add documentation for SVA (Shared Virtual Addressing)
From: Ashok Raj

ENQCMD and Data Streaming Accelerator (DSA) and all of their associated
features are a complicated stack with lots of interconnected pieces.
This documentation provides a big picture overview for all of the
features.

Signed-off-by: Ashok Raj
Co-developed-by: Fenghua Yu
Signed-off-by: Fenghua Yu
Reviewed-by: Tony Luck
---
v7:
- Change the doc for updating PASID by IPI and context switch (Andy).

v3:
- Replace deprecated intel_svm_bind_mm() by iommu_sva_bind_mm() (Baolu)
- Fix a couple of typos (Baolu)

v2:
- Fix the doc format and add the doc in toctree (Thomas)
- Modify the doc for better description (Thomas, Tony, Dave)

 Documentation/x86/index.rst | 1 +
 Documentation/x86/sva.rst | 254
 2 files changed, 255 insertions(+)
 create mode 100644 Documentation/x86/sva.rst

diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index 265d9e9a093b..e5d5ff096685 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -30,3 +30,4 @@ x86-specific Documentation
    usb-legacy-support
    i386/index
    x86_64/index
+   sva
diff --git a/Documentation/x86/sva.rst b/Documentation/x86/sva.rst
new file mode 100644
index ..6e7ac565e127
--- /dev/null
+++ b/Documentation/x86/sva.rst
@@ -0,0 +1,254 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===========================================
+Shared Virtual Addressing (SVA) with ENQCMD
+===========================================
+
+Background
+==========
+
+Shared Virtual Addressing (SVA) allows the processor and device to use the
+same virtual addresses avoiding the need for software to translate virtual
+addresses to physical addresses. SVA is what PCIe calls Shared Virtual
+Memory (SVM)
+
+In addition to the convenience of using application virtual addresses
+by the device, it also doesn't require pinning pages for DMA.
+PCIe Address Translation Services (ATS) along with Page Request Interface
+(PRI) allow devices to function much the same way as the CPU handling
+application page-faults. For more information please refer to PCIe
+specification Chapter 10: ATS Specification.
+
+Use of SVA requires IOMMU support in the platform. IOMMU also is required
+to support PCIe features ATS and PRI. ATS allows devices to cache
+translations for the virtual address. IOMMU driver uses the mmu_notifier()
+support to keep the device tlb cache and the CPU cache in sync. PRI allows
+the device to request paging the virtual address before using if they are
+not paged in the CPU page tables.
+
+
+Shared Hardware Workqueues
+==========================
+
+Unlike Single Root I/O Virtualization (SRIOV), Scalable IOV (SIOV) permits
+the use of Shared Work Queues (SWQ) by both applications and Virtual
+Machines (VM's). This allows better hardware utilization vs. hard
+partitioning resources that could result in under utilization. In order to
+allow the hardware to distinguish the context for which work is being
+executed in the hardware by SWQ interface, SIOV uses Process Address Space
+ID (PASID), which is a 20bit number defined by the PCIe SIG.
+
+PASID value is encoded in all transactions from the device. This allows the
+IOMMU to track I/O on a per-PASID granularity in addition to using the PCIe
+Resource Identifier (RID) which is the Bus/Device/Function.
+
+
+ENQCMD
+======
+
+ENQCMD is a new instruction on Intel platforms that atomically submits a
+work descriptor to a device. The descriptor includes the operation to be
+performed, virtual addresses of all parameters, virtual address of a completion
+record, and the PASID (process address space ID) of the current process.
+
+ENQCMD works with non-posted semantics and carries a status back if the
+command was accepted by hardware. This allows the submitter to know if the
+submission needs to be retried or other device specific mechanisms to
+implement fairness or ensure forward progress can be made.
+
+ENQCMD is the glue that ensures applications can directly submit commands
+to the hardware and also permit hardware to be aware of application context
+to perform I/O operations via use of PASID.
+
+Process Address Space Tagging
+=============================
+
+A new thread scoped MSR (IA32_PASID) provides the connection between
+user processes and the rest of the hardware. When an application first
+accesses an SVA capable device this MSR is initialized with a newly
+allocated PASID. The driver for the device calls an IOMMU specific API
+that sets up the routing for DMA and page-requests.
+
+For example, the Intel Data Streaming Accelerator (DSA) uses
+iommu_sva_bind_device(), which will do the following.
+
+- Allocate the PASID, and program the process page-table (cr3) in the PASID
+  context entries.
+- Register for mmu_notifier() to track any page-table invalidations to keep
+  the device tlb in sync. For example, when a page-table entry is invalidated,
+  IOMMU propagates the invalidation to device tlb. This will force any
+  futur
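The IA32_PASID MSR described in the patch's "Process Address Space Tagging" section can be sketched in C as well. This assumes the layout used by the Intel SDM and the kernel's `arch/x86/include/asm/msr-index.h` (MSR address 0xd93, PASID in bits 19:0, valid flag in bit 31); the helpers `ia32_pasid_value()` and `ia32_pasid_decode()` are hypothetical, and the sketch only composes and decodes the value, since only the kernel can actually write the MSR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the kernel's msr-index.h definitions. */
#define MSR_IA32_PASID       0x00000d93
#define MSR_IA32_PASID_VALID (1ULL << 31)
#define PASID_MASK           0xFFFFFULL /* PASID occupies bits 19:0 */

/* Compose the IA32_PASID value a thread would carry for a given PASID. */
static bool ia32_pasid_value(uint32_t pasid, uint64_t *msr_val)
{
	if (pasid > PASID_MASK)
		return false; /* does not fit in the 20-bit field */
	*msr_val = (uint64_t)pasid | MSR_IA32_PASID_VALID;
	return true;
}

/* Extract the PASID, or report that the MSR holds no valid PASID. */
static bool ia32_pasid_decode(uint64_t msr_val, uint32_t *pasid)
{
	if (!(msr_val & MSR_IA32_PASID_VALID))
		return false;
	*pasid = (uint32_t)(msr_val & PASID_MASK);
	return true;
}
```

For PASID 0x1234 the composed value is 0x80001234; a value without bit 31 set decodes as "no valid PASID".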