Re: [PATCH v6 03/12] docs: x86: Add documentation for SVA (Shared Virtual Addressing)

2020-07-15 Thread Fenghua Yu
Hi, Yi,

On Mon, Jul 13, 2020 at 08:25:20PM -0700, Liu, Yi L wrote:
> > From: Fenghua Yu 
> > Sent: Tuesday, July 14, 2020 7:48 AM
> > From: Ashok Raj 

Thank you for your comments!

But I think we don't need to update this patch because the current text
is better than suggested changes. I would rather to keep this patch
unchanged. Please see my following explanations for details.

> > +(PRI) allow devices to function much the same way as the CPU handling
> > +application page-faults. For more information please refer to PCIe
> > +specification Chapter 10: ATS Specification.
> > +
> 
> nit: may be helpful to mention Chapter 10 of PCIe spec since 4.0. before 
> that, ATS has its
> own specification.

This doc only refers to the latest specs. Older specs are not useful
for this spec.

> > +ENQCMD is the glue that ensures applications can directly submit
> > +commands to the hardware and also permit hardware to be aware of
> > +application context to perform I/O operations via use of PASID.
> > +
> 
> maybe a reader will ask about ENQCMDs after reading ENQCMD/S spec. :-)

This doc only covers ENQCMD. ENQCMDS is out of scope of this doc.

> > +- Allocate the PASID, and program the process page-table (cr3) in the
> > +PASID
> > +  context entries.
> 
> nit: s/PASID context entries/PASID table entries/

Kernel IOMMU does use PASID context. So we would keep the current text.

> 
> > +- Register for mmu_notifier() to track any page-table invalidations to
> > +keep
> > +  the device tlb in sync. For example, when a page-table entry is
> 
> not only device tlb. I guess iotlb is also included.

I think they are same/similar concepts. No need to duplicate here.

> > +it into the new MSR to communicate the process identity to platform 
> > hardware.
> > +ENQCMD uses the PASID stored in this MSR to tag requests from this process.
> > +When a user submits a work descriptor to a device using the ENQCMD
> > +instruction, the PASID field in the descriptor is auto-filled with the
> > +value from MSR_IA32_PASID. Requests for DMA from the device are also
> > +tagged with the same PASID. The platform IOMMU uses the PASID in the
> 
> not quite get " Requests for DMA from the device are also tagged with the
> same PASID"

The DMA access from device to the process address space uses the same PASID
set up in the MSR.
 
> 
> > +transaction to perform address translation. The IOMMU api's setup the
> 
> s/api's/apis/ ?


> 
> > +corresponding PASID entry in IOMMU with the process address used by the CPU
> > (for e.g cr3 in x86).
> 
> with the process page tables used by the CPU (e.g. the page tables pointed by 
> cr3 in x86).

We use address to match the "address space" specified in the term of PASID.
page table is implementation details. So we would keep the current text.

> > +The MSR must be configured on each logical CPU before any application
> 
> s/MSR/MSR_IA32_PASID/

The MSR refers to MSR_IA32_PASID. We don't need to repeat long
"MSR_IA32_PASID" everywhere while a short "the MSR" is better and clear
in the doc.

> > +thread can interact with a device. Threads that belong to the same
> > +process share the same page tables, thus the same MSR value.
> 
> s/MSR/PASID/

The "MSR" is better because we focus on the MSR value here that is set up for
each thread.

> 
> > +
> > +PASID is cleared when a process is created. The PASID allocation and
> 
> s/PASID/MSR_IA32_PASID/

No. It is PASID that is cleared per process creation. We are not talking about
the MSR here although the MSR will cleared as part of process FPU init.

> 
> > +MSR programming may occur long after a process and its threads have been
> > created.
> > +One thread must call bind() to allocate the PASID for the process. If a
> 
> s/bind()/iommu_sva_bind_device()/ or say "call iommu api to bind a process 
> with
> a device." :-)

bind() is better because the binding function may be changed (e.g. it was
changed in 5.8). People know bind() means binding. Even in the future the
binding function is changed again, we don't need to change the text here
if using bind().

> 
> > +thread uses ENQCMD without the MSR first being populated, it will cause 
> > #GP.
> > +The kernel will fix up the #GP by writing the process-wide PASID into
> > +the thread that took the #GP. A single process PASID can be used
> > +simultaneously with multiple devices since they all share the same address 
> > space.
> 
> simultaneously with multiple devices if they all share the process address
> space.

Using "since" is better. It explains "why".

> 
> > +
> > +New threads could inherit the MSR value from the parent. But this would
> 
> s/MSR/MSR_IA32_PASID/

Ditto. It's unnecessary to use long "MSR_IA32_PASID" everywhere.
"The MSR" is concise and clear.

Thanks.

-Fenghua


RE: [PATCH v6 03/12] docs: x86: Add documentation for SVA (Shared Virtual Addressing)

2020-07-13 Thread Liu, Yi L
> From: Fenghua Yu 
> Sent: Tuesday, July 14, 2020 7:48 AM
> 
> From: Ashok Raj 
> 
> ENQCMD and Data Streaming Accelerator (DSA) and all of their associated 
> features
> are a complicated stack with lots of interconnected pieces.
> This documentation provides a big picture overview for all of the features.
> 
> Signed-off-by: Ashok Raj 
> Co-developed-by: Fenghua Yu 
> Signed-off-by: Fenghua Yu 
> Reviewed-by: Tony Luck 
> ---
> v3:
> - Replace deprecated intel_svm_bind_mm() by iommu_sva_bind_mm() (Baolu)
> - Fix a couple of typos (Baolu)
> 
> v2:
> - Fix the doc format and add the doc in toctree (Thomas)
> - Modify the doc for better description (Thomas, Tony, Dave)
> 
>  Documentation/x86/index.rst |   1 +
>  Documentation/x86/sva.rst   | 287 
>  2 files changed, 288 insertions(+)
>  create mode 100644 Documentation/x86/sva.rst
> 
> diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst index
> 265d9e9a093b..e5d5ff096685 100644
> --- a/Documentation/x86/index.rst
> +++ b/Documentation/x86/index.rst
> @@ -30,3 +30,4 @@ x86-specific Documentation
> usb-legacy-support
> i386/index
> x86_64/index
> +   sva
> diff --git a/Documentation/x86/sva.rst b/Documentation/x86/sva.rst new file 
> mode
> 100644 index ..7242a84169ef
> --- /dev/null
> +++ b/Documentation/x86/sva.rst
> @@ -0,0 +1,287 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===
> +Shared Virtual Addressing (SVA) with ENQCMD
> +===
> +
> +Background
> +==
> +
> +Shared Virtual Addressing (SVA) allows the processor and device to use
> +the same virtual addresses avoiding the need for software to translate
> +virtual addresses to physical addresses. SVA is what PCIe calls Shared
> +Virtual Memory (SVM)
> +
> +In addition to the convenience of using application virtual addresses
> +by the device, it also doesn't require pinning pages for DMA.
> +PCIe Address Translation Services (ATS) along with Page Request
> +Interface
> +(PRI) allow devices to function much the same way as the CPU handling
> +application page-faults. For more information please refer to PCIe
> +specification Chapter 10: ATS Specification.
> +

nit: may be helpful to mention Chapter 10 of PCIe spec since 4.0. before that, 
ATS has its
own specification.

> +Use of SVA requires IOMMU support in the platform. IOMMU also is
> +required to support PCIe features ATS and PRI. ATS allows devices to
> +cache translations for the virtual address. IOMMU driver uses the
> +mmu_notifier() support to keep the device tlb cache and the CPU cache
> +in sync. PRI allows the device to request paging the virtual address
> +before using if they are not paged in the CPU page tables.
> +
> +
> +Shared Hardware Workqueues
> +==
> +
> +Unlike Single Root I/O Virtualization (SRIOV), Scalable IOV (SIOV)
> +permits the use of Shared Work Queues (SWQ) by both applications and
> +Virtual Machines (VM's). This allows better hardware utilization vs.
> +hard partitioning resources that could result in under utilization. In
> +order to allow the hardware to distinguish the context for which work
> +is being executed in the hardware by SWQ interface, SIOV uses Process
> +Address Space ID (PASID), which is a 20bit number defined by the PCIe SIG.
> +
> +PASID value is encoded in all transactions from the device. This allows
> +the IOMMU to track I/O on a per-PASID granularity in addition to using
> +the PCIe Resource Identifier (RID) which is the Bus/Device/Function.
> +
> +
> +ENQCMD
> +==
> +
> +ENQCMD is a new instruction on Intel platforms that atomically submits
> +a work descriptor to a device. The descriptor includes the operation to
> +be performed, virtual addresses of all parameters, virtual address of a
> +completion record, and the PASID (process address space ID) of the current 
> process.
> +
> +ENQCMD works with non-posted semantics and carries a status back if the
> +command was accepted by hardware. This allows the submitter to know if
> +the submission needs to be retried or other device specific mechanisms
> +to implement fairness or ensure forward progress can be made.
> +
> +ENQCMD is the glue that ensures applications can directly submit
> +commands to the hardware and also permit hardware to be aware of
> +application context to perform I/O operations via use of PASID.
> +

maybe a reader will ask about ENQCMDs after reading ENQCMD/S spec. :-)

> +Process Address Space Tagging
> +=
> +
> +A new thread scoped MSR (IA32_PASID) provides the connection between
> +user processes and the rest of the hardware. When an application first
> +accesses an SVA capable device this MSR is initialized with a newly
> +allocated PASID. The driver for the device calls an IOMMU specific api
> +that sets up the routing for DMA and page-requests.
> +
> +For example, the Intel Data Streaming