RE: [PATCH v15 00/12] SMMUv3 Nested Stage Setup (IOMMU part)
Hi Eric,

> This is based on Jean-Philippe's
> [PATCH v14 00/10] iommu: I/O page faults for SMMUv3
> https://www.spinics.net/lists/arm-kernel/msg886518.html
> (including the patches that were not pulled for 5.13)

Jean's patches have been merged in v5.14. Do you anticipate the IOMMU/VFIO part patches getting into the upstream kernel soon?

-KR

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v15 00/12] SMMUv3 Nested Stage Setup (IOMMU part)
Hi Krishna,

On 9/27/21 11:17 PM, Krishna Reddy wrote:
> Hi Eric,
>> This is based on Jean-Philippe's
>> [PATCH v14 00/10] iommu: I/O page faults for SMMUv3
>> https://www.spinics.net/lists/arm-kernel/msg886518.html
>> (including the patches that were not pulled for 5.13)
>
> Jean's patches have been merged in v5.14.
> Do you anticipate the IOMMU/VFIO part patches getting into the upstream kernel soon?

I am going to respin the SMMU part rebased on v5.15. As for the VFIO part, it needs to be completely redesigned around /dev/iommu (see "[RFC 00/20] Introduce /dev/iommu for userspace I/O address space management"). I will provide updated kernel and QEMU branches for testing purposes only.

Thanks

Eric

> -KR
Re: [PATCH v15 00/12] SMMUv3 Nested Stage Setup (IOMMU part)
Hi Eric,

I have validated v15 of the patch series from your branch "v5.12-rc6-jean-iopf-14-2stage-v15", applied on top of Jean's current SVA patches, with kernel 5.12.0-rc8. Verified nested translations with an NVMe PCI device assigned to a guest VM.

Tested-by: Sumit Gupta
Re: [PATCH v15 00/12] SMMUv3 Nested Stage Setup (IOMMU part)
Hi Eric,

On 4/11/21 4:42 PM, Eric Auger wrote:
> SMMUv3 Nested Stage Setup (IOMMU part)
[snip]
> Eric Auger (12):
>   iommu: Introduce attach/detach_pasid_table API
>   iommu: Introduce bind/unbind_guest_msi
>   iommu/smmuv3: Allow s1 and s2 configs to coexist
>   iommu/smmuv3: Get prepared for nested stage support
>   iommu/smmuv3: Implement attach/detach_pasid_table
>   iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs
>   iommu/smmuv3: Implement cache_invalidate
>   dma-iommu: Implement NESTED_MSI cookie
>   iommu/smmuv3: Nested mode single MSI doorbell per domain enforcement
>   iommu/smmuv3: Enforce incompatibility between nested mode and HW MSI regions
>   iommu/smmuv3: Implement bind/unbind_guest_msi
>   iommu/smmuv3: report additional recoverable faults
[snip]

I noticed that the patch [1]:
[PATCH v13 15/15] iommu/smmuv3: Add PASID cache invalidation per PASID
has been dropped from v14 and v15 of this series. Is it planned to be part of a future series, or did I miss a discussion about dropping the patch? :-)

[1] https://patchwork.kernel.org/project/kvm/patch/20201118112151.25412-16-eric.au...@redhat.com/

Best regards
Vivek
Re: [PATCH v15 00/12] SMMUv3 Nested Stage Setup (IOMMU part)
Hi Shameer,

On 2021/4/14 14:56, Shameerali Kolothum Thodi wrote:
>> -----Original Message-----
>> From: wangxingang
>> Sent: 14 April 2021 03:36
>> To: Eric Auger ; eric.auger@gmail.com; jean-phili...@linaro.org;
>> io...@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
>> k...@vger.kernel.org; kvmarm@lists.cs.columbia.edu; w...@kernel.org;
>> m...@kernel.org; robin.mur...@arm.com; j...@8bytes.org;
>> alex.william...@redhat.com; t...@semihalf.com; zhukeqian
>> Cc: jacob.jun@linux.intel.com; yi.l@intel.com; zhangfei@linaro.org;
>> zhangfei@gmail.com; vivek.gau...@arm.com; Shameerali Kolothum Thodi;
>> yuzenghui; nicoleots...@gmail.com; lushenming; vse...@nvidia.com;
>> chenxiang (M); vdu...@nvidia.com; jiangkunkun
>> Subject: Re: [PATCH v15 00/12] SMMUv3 Nested Stage Setup (IOMMU part)
>>
>> Hi Eric, Jean-Philippe
>>
>> On 2021/4/11 19:12, Eric Auger wrote:
>>> SMMUv3 Nested Stage Setup (IOMMU part)
>>>
>>> This series brings the IOMMU part of HW nested paging support
>>> in the SMMUv3. The VFIO part is submitted separately.
>>>
>>> This is based on Jean-Philippe's
>>> [PATCH v14 00/10] iommu: I/O page faults for SMMUv3
>>> https://www.spinics.net/lists/arm-kernel/msg886518.html
>>> (including the patches that were not pulled for 5.13)
>>>
>>> The IOMMU API is extended to support 2 new functionalities:
>>> 1) pass the guest stage 1 configuration
>>> 2) pass stage 1 MSI bindings
>>>
>>> Those capabilities then get implemented in the SMMUv3 driver.
>>>
>>> The virtualizer passes information through the VFIO user API,
>>> which cascades it to the iommu subsystem. This allows the guest
>>> to own stage 1 tables and context descriptors (the so-called PASID
>>> table) while the host owns stage 2 tables and the main configuration
>>> structures (STE).
>>>
>>> Best Regards
>>>
>>> Eric
>>>
>>> This series can be found at:
>>> v5.12-rc6-jean-iopf-14-2stage-v15
>>> (including the VFIO part in its last version: v13)
>>
>> I am testing the performance of an accelerator with/without SVA/vSVA,
>> and found a potential performance loss risk for SVA/vSVA.
>>
>> I use a network and computing encryption device (SEC), and send 1MB
>> requests. I trigger the mm faults before sending the requests, so there
>> should be no IOPF.
>>
>> Here's what I got:
>>
>> Physical scenario:
>> performance:             SVA: 9MB/s        NOSVA: 9MB/s
>> tlb_miss:                SVA: 302,651      NOSVA: 1,223
>> trans_table_walk_access: SVA: 302,276      NOSVA: 1,237
>>
>> VM scenario:
>> performance:             vSVA: 9MB/s       NOvSVA: 6MB/s (about 30~40% loss)
>> tlb_miss:                vSVA: 4,423,897   NOvSVA: 1,907
>> trans_table_walk_access: vSVA: 61,928,430  NOvSVA: 21,948
>>
>> In the physical scenario there is almost no performance loss, but the
>> stage 1 tlb_miss and trans_table_walk_access counts for SVA are quite
>> high compared to NOSVA.
>>
>> In the VM scenario there is about 30~40% performance loss, because the
>> two-stage tlb_miss and trans_table_walk_access counts are even higher,
>> which impacts the performance.
>>
>> I compared the page-table-building procedures of SVA and NOSVA, and
>> found that NOSVA uses 2MB mappings as far as possible, while SVA uses
>> only 4KB pages.
>>
>> I retested with huge pages, which solve this problem: the performance
>> of SVA/vSVA is almost the same as NOSVA.
>>
>> I am wondering whether you have any other solution for the performance
>> loss of vSVA, or any other method to reduce the tlb_miss/trans_table_walk
>> counts.
>
> Hi Xingang,
>
> Just curious, do you have DVM enabled on this board or does it use
> explicit SMMU TLB invalidations?
>
> Thanks,
> Shameer

For now, DVM is enabled and explicit TLBI is not used. And by the way, the performance data above should read vSVA: 9GB/s (not 9MB/s) and NOvSVA: 6GB/s (not 6MB/s).

Thanks
Xingang
Re: [PATCH v15 00/12] SMMUv3 Nested Stage Setup (IOMMU part)
Hi Eric, Jean-Philippe

On 2021/4/11 19:12, Eric Auger wrote:
> SMMUv3 Nested Stage Setup (IOMMU part)
>
> This series brings the IOMMU part of HW nested paging support
> in the SMMUv3. The VFIO part is submitted separately.
>
> This is based on Jean-Philippe's
> [PATCH v14 00/10] iommu: I/O page faults for SMMUv3
> https://www.spinics.net/lists/arm-kernel/msg886518.html
> (including the patches that were not pulled for 5.13)
>
> The IOMMU API is extended to support 2 new functionalities:
> 1) pass the guest stage 1 configuration
> 2) pass stage 1 MSI bindings
>
> Those capabilities then get implemented in the SMMUv3 driver.
>
> The virtualizer passes information through the VFIO user API,
> which cascades it to the iommu subsystem. This allows the guest
> to own stage 1 tables and context descriptors (the so-called PASID
> table) while the host owns stage 2 tables and the main configuration
> structures (STE).
>
> Best Regards
>
> Eric
>
> This series can be found at:
> v5.12-rc6-jean-iopf-14-2stage-v15
> (including the VFIO part in its last version: v13)

I am testing the performance of an accelerator with/without SVA/vSVA, and found a potential performance loss risk for SVA/vSVA.

I use a network and computing encryption device (SEC), and send 1MB requests. I trigger the mm faults before sending the requests, so there should be no IOPF.

Here's what I got:

Physical scenario:
performance:             SVA: 9MB/s        NOSVA: 9MB/s
tlb_miss:                SVA: 302,651      NOSVA: 1,223
trans_table_walk_access: SVA: 302,276      NOSVA: 1,237

VM scenario:
performance:             vSVA: 9MB/s       NOvSVA: 6MB/s (about 30~40% loss)
tlb_miss:                vSVA: 4,423,897   NOvSVA: 1,907
trans_table_walk_access: vSVA: 61,928,430  NOvSVA: 21,948

In the physical scenario there is almost no performance loss, but the stage 1 tlb_miss and trans_table_walk_access counts for SVA are quite high compared to NOSVA.

In the VM scenario there is about 30~40% performance loss, because the two-stage tlb_miss and trans_table_walk_access counts are even higher, which impacts the performance.

I compared the page-table-building procedures of SVA and NOSVA, and found that NOSVA uses 2MB mappings as far as possible, while SVA uses only 4KB pages.

I retested with huge pages, which solve this problem: the performance of SVA/vSVA is almost the same as NOSVA.

I am wondering whether you have any other solution for the performance loss of vSVA, or any other method to reduce the tlb_miss/trans_table_walk counts.

Thanks
Xingang
RE: [PATCH v15 00/12] SMMUv3 Nested Stage Setup (IOMMU part)
> -----Original Message-----
> From: wangxingang
> Sent: 14 April 2021 03:36
> To: Eric Auger ; eric.auger@gmail.com; jean-phili...@linaro.org;
> io...@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
> k...@vger.kernel.org; kvmarm@lists.cs.columbia.edu; w...@kernel.org;
> m...@kernel.org; robin.mur...@arm.com; j...@8bytes.org;
> alex.william...@redhat.com; t...@semihalf.com; zhukeqian
> Cc: jacob.jun@linux.intel.com; yi.l@intel.com; zhangfei@linaro.org;
> zhangfei@gmail.com; vivek.gau...@arm.com; Shameerali Kolothum Thodi;
> yuzenghui; nicoleots...@gmail.com; lushenming; vse...@nvidia.com;
> chenxiang (M); vdu...@nvidia.com; jiangkunkun
> Subject: Re: [PATCH v15 00/12] SMMUv3 Nested Stage Setup (IOMMU part)
>
> Hi Eric, Jean-Philippe
>
> On 2021/4/11 19:12, Eric Auger wrote:
>> SMMUv3 Nested Stage Setup (IOMMU part)
>>
>> This series brings the IOMMU part of HW nested paging support
>> in the SMMUv3. The VFIO part is submitted separately.
>>
>> This is based on Jean-Philippe's
>> [PATCH v14 00/10] iommu: I/O page faults for SMMUv3
>> https://www.spinics.net/lists/arm-kernel/msg886518.html
>> (including the patches that were not pulled for 5.13)
>>
>> The IOMMU API is extended to support 2 new functionalities:
>> 1) pass the guest stage 1 configuration
>> 2) pass stage 1 MSI bindings
>>
>> Those capabilities then get implemented in the SMMUv3 driver.
>>
>> The virtualizer passes information through the VFIO user API,
>> which cascades it to the iommu subsystem. This allows the guest
>> to own stage 1 tables and context descriptors (the so-called PASID
>> table) while the host owns stage 2 tables and the main configuration
>> structures (STE).
>>
>> Best Regards
>>
>> Eric
>>
>> This series can be found at:
>> v5.12-rc6-jean-iopf-14-2stage-v15
>> (including the VFIO part in its last version: v13)
>
> I am testing the performance of an accelerator with/without SVA/vSVA,
> and found a potential performance loss risk for SVA/vSVA.
>
> I use a network and computing encryption device (SEC), and send 1MB
> requests. I trigger the mm faults before sending the requests, so there
> should be no IOPF.
>
> Here's what I got:
>
> Physical scenario:
> performance:             SVA: 9MB/s        NOSVA: 9MB/s
> tlb_miss:                SVA: 302,651      NOSVA: 1,223
> trans_table_walk_access: SVA: 302,276      NOSVA: 1,237
>
> VM scenario:
> performance:             vSVA: 9MB/s       NOvSVA: 6MB/s (about 30~40% loss)
> tlb_miss:                vSVA: 4,423,897   NOvSVA: 1,907
> trans_table_walk_access: vSVA: 61,928,430  NOvSVA: 21,948
>
> In the physical scenario there is almost no performance loss, but the
> stage 1 tlb_miss and trans_table_walk_access counts for SVA are quite
> high compared to NOSVA.
>
> In the VM scenario there is about 30~40% performance loss, because the
> two-stage tlb_miss and trans_table_walk_access counts are even higher,
> which impacts the performance.
>
> I compared the page-table-building procedures of SVA and NOSVA, and
> found that NOSVA uses 2MB mappings as far as possible, while SVA uses
> only 4KB pages.
>
> I retested with huge pages, which solve this problem: the performance
> of SVA/vSVA is almost the same as NOSVA.
>
> I am wondering whether you have any other solution for the performance
> loss of vSVA, or any other method to reduce the tlb_miss/trans_table_walk
> counts.

Hi Xingang,

Just curious, do you have DVM enabled on this board or does it use explicit SMMU TLB invalidations?

Thanks,
Shameer