From: Haseeb Ashraf <[email protected]>

This patch series addresses a major issue when running Xen on KVM, i.e.
the costly emulation of VMALLS12E1IS, which becomes worse when this TLBI
is invoked too many times. There are mainly two places where this is
problematic:
(a) When vCPUs switch on a pCPU or pCPUs.
(b) When domU pages mapped into dom0 are to be unmapped, each page
    removed by XENMEM_remove_from_physmap has its TLB entries
    invalidated by the TLBI variant that flushes the whole range.

This patch series prefers IPA-based TLBIs wherever possible instead of
flushing the TLBs completely every time. It consists of three patches.
The first patch addresses the issue described above for Arm64. The
second patch further optimizes the combined stage-1,2 TLB flushes by
leveraging FEAT_nTLBPA. The third patch introduces IPA-based TLBI for
Arm32 in the presence of FEAT_nTLBPA.

Haseeb Ashraf (3):
  xen/arm/p2m: perform IPA-based TLBI when IPA is known
  xen/arm: optimize stage-1,2 combined TLBI in presence of FEAT_nTLBPA
  xen/arm32: add CPU capability for IPA-based TLBI

Changes in v3:
- Mainly the handling of the repeat TLBI workaround with IPA-based TLBI,
  so that the extra TLBI and DSB are repeated only for the final TLBI
  and DSB of the whole sequence.
- Updated code comments as per feedback. Further details are available
  in each commit's changelog.
- Minor updates to code as per feedback. Further details are available
  in each commit's changelog.

Changes in v2:
- Split the commit into 3 commits. The first commit provides the
  baseline implementation without adding any new CPU capabilities.
  Implemented the new CPU caps as separate features to emphasize how
  each of them optimizes the TLB invalidation.
- Moved the ARM32- and ARM64-specific implementations of TLBIs to the
  architecture-specific flushtlb.h.
- Added references to the Arm ARM in code comments.
- Evaluated and added a threshold to select between IPA-based TLB
  invalidation and a fallback to full stage TLB invalidation above the
  threshold.
- Introduced the ARM_HAS_NTLBPA CPU capability, which leverages
  FEAT_nTLBPA for arm32 as well as arm64.
- Introduced the ARM_HAS_TLB_IPA CPU capability for IPA-based TLBI on
  arm32.
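
For readers skimming the series, the threshold idea from the v2
changelog can be sketched roughly as below. This is only an
illustration and not the code in the patches: every sketch_* name, the
SKETCH_* constants and the threshold value are hypothetical, and the
TLBI instructions are replaced by counters so the selection logic can
be compiled and checked on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values for illustration; not taken from the series. */
#define SKETCH_PAGE_SHIFT          12
#define SKETCH_IPA_FLUSH_THRESHOLD 512UL

/* Counters standing in for real TLB maintenance instructions. */
struct flush_stats {
    unsigned long per_page; /* simulated IPA-based invalidations */
    unsigned long full;     /* simulated full stage-1,2 invalidations */
};

/* Stand-in for one IPA-based TLBI; it only counts the call. */
static void sketch_tlbi_ipa(struct flush_stats *s, uint64_t ipa)
{
    (void)ipa; /* a real implementation would pass the IPA to the TLBI */
    s->per_page++;
}

/* Stand-in for a full stage-1,2 invalidation; it only counts the call. */
static void sketch_tlbi_all(struct flush_stats *s)
{
    s->full++;
}

/*
 * Invalidate nr_pages pages starting at start_ipa. Ranges larger than
 * the threshold fall back to a single full invalidation; smaller ranges
 * are invalidated page by page using the IPA.
 */
void sketch_flush_range(struct flush_stats *s, uint64_t start_ipa,
                        unsigned long nr_pages)
{
    assert(s != NULL);

    if ( nr_pages > SKETCH_IPA_FLUSH_THRESHOLD )
    {
        sketch_tlbi_all(s);
        return;
    }

    for ( unsigned long i = 0; i < nr_pages; i++ )
        sketch_tlbi_ipa(s, start_ipa + ((uint64_t)i << SKETCH_PAGE_SHIFT));
}
```

With these assumed values, a 4-page unmap issues four per-page
invalidations, while a range of more than 512 pages issues a single full
invalidation. The barriers and the repeat TLBI workaround mentioned in
the v3 changelog are deliberately left out of the sketch.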

 xen/arch/arm/cpufeature.c                 | 31 ++++++++
 xen/arch/arm/include/asm/arm32/flushtlb.h | 87 +++++++++++++++++++++
 xen/arch/arm/include/asm/arm64/flushtlb.h | 77 +++++++++++++++++++
 xen/arch/arm/include/asm/cpregs.h         |  4 +
 xen/arch/arm/include/asm/cpufeature.h     | 27 ++++++-
 xen/arch/arm/include/asm/mmu/p2m.h        |  2 +
 xen/arch/arm/include/asm/processor.h      | 10 +++
 xen/arch/arm/mmu/p2m.c                    | 92 +++++++++++++++++------
 8 files changed, 302 insertions(+), 28 deletions(-)

-- 
2.43.0
