Re: [PATCH v3] arm64: Add support for new control bits CTR_EL0.DIC and CTR_EL0.IDC
On Wed, Feb 21, 2018 at 04:51:40PM +, Robin Murphy wrote: > On 21/02/18 16:14, Shanker Donthineni wrote: > [...] > > > > @@ -1100,6 +1114,20 @@ static int cpu_copy_el2regs(void *__unused) > > > > .enable = cpu_clear_disr, > > > > }, > > > > #endif /* CONFIG_ARM64_RAS_EXTN */ > > > > +#ifdef CONFIG_ARM64_SKIP_CACHE_POU > > > > + { > > > > + .desc = "DCache clean to POU", > > > > > > This description is confusing, and sounds like it's describing DC CVAU, > > > rather > > > than the ability to ellide it. How about: > > > > Sure, I'll take your suggestion. > > Can we at least spell "elision" correctly please? ;) Argh. Yes. > Personally I read DIC and IDC as "D-cache to I-cache coherency" and "I-cache > to D-cache coherency" respectively (just my interpretation, I've not looked > into the spec work for any hints of rationale), but out loud those do sound > so poorly-defined that keeping things in terms of the required maintenance > probably is better. So long as we have (IDC) and (DIC) in the text to avoid ambiguity, I'm not that worried either way. Thanks, Mark. ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v3] arm64: Add support for new control bits CTR_EL0.DIC and CTR_EL0.IDC
On 21/02/18 16:14, Shanker Donthineni wrote: [...] @@ -1100,6 +1114,20 @@ static int cpu_copy_el2regs(void *__unused) .enable = cpu_clear_disr, }, #endif /* CONFIG_ARM64_RAS_EXTN */ +#ifdef CONFIG_ARM64_SKIP_CACHE_POU + { + .desc = "DCache clean to POU", This description is confusing, and sounds like it's describing DC CVAU, rather than the ability to ellide it. How about: Sure, I'll take your suggestion. Can we at least spell "elision" correctly please? ;) Personally I read DIC and IDC as "D-cache to I-cache coherency" and "I-cache to D-cache coherency" respectively (just my interpretation, I've not looked into the spec work for any hints of rationale), but out loud those do sound so poorly-defined that keeping things in terms of the required maintenance probably is better. .desc = "D-cache maintenance ellision (IDC)" + .capability = ARM64_HAS_CACHE_IDC, + .def_scope = SCOPE_SYSTEM, + .matches = has_cache_idc, + }, + { + .desc = "ICache invalidation to POU", ... and correspondingly: .desc = "I-cache maintenance ellision (DIC)" + .capability = ARM64_HAS_CACHE_DIC, + .def_scope = SCOPE_SYSTEM, + .matches = has_cache_dic, + }, +#endif /* CONFIG_ARM64_CACHE_DIC */ {}, }; [...] +alternative_if ARM64_HAS_CACHE_DIC + isb Why have we gained an ISB here if DIC is set? I believe synchronization barrier (ISB) is required here to support self-modifying/jump-labels code. This is for a user address, and I can't see why DIC would imply we need an extra ISB kernel-side. This is for user and kernel addresses, alternatives and jumplabel patching logic calls flush_icache_range(). There's an ISB hidden in invalidate_icache_by_line(), so it probably would be unsafe to start implicitly skipping that. + b 8f +alternative_else_nop_endif invalidate_icache_by_line x0, x1, x2, x3, 9f - mov x0, #0 +8: mov x0, #0 1: uaccess_ttbr0_disable x1, x2 ret @@ -80,6 +87,12 @@ ENDPROC(__flush_cache_user_range) *- end - virtual end address of region */ ENTRY(invalidate_icache_range) +alternative_if ARM64_HAS_CACHE_DIC + mov x0, xzr + dsb ish Do we actually need a DSB in this case? I'll remove if everyone agree. Will, Can you comment on this? As-is, this function *only* invalidates the I-cache, so we already assume that the data is visible at the PoU at this point. I don't see what extra gaurantee we'd need the DSB for. If so, then ditto for the existing invalidate_icache_by_line() code presumably. Robin. ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v3] arm64: Add support for new control bits CTR_EL0.DIC and CTR_EL0.IDC
Hi Mark, On 02/21/2018 09:09 AM, Mark Rutland wrote: > On Wed, Feb 21, 2018 at 07:49:06AM -0600, Shanker Donthineni wrote: >> The DCache clean & ICache invalidation requirements for instructions >> to be data coherence are discoverable through new fields in CTR_EL0. >> The following two control bits DIC and IDC were defined for this >> purpose. No need to perform point of unification cache maintenance >> operations from software on systems where CPU caches are transparent. >> >> This patch optimize the three functions __flush_cache_user_range(), >> clean_dcache_area_pou() and invalidate_icache_range() if the hardware >> reports CTR_EL0.IDC and/or CTR_EL0.IDC. Basically it skips the two >> instructions 'DC CVAU' and 'IC IVAU', and the associated loop logic >> in order to avoid the unnecessary overhead. >> >> CTR_EL0.DIC: Instruction cache invalidation requirements for >> instruction to data coherence. The meaning of this bit[29]. >> 0: Instruction cache invalidation to the point of unification >> is required for instruction to data coherence. >> 1: Instruction cache cleaning to the point of unification is >> not required for instruction to data coherence. >> >> CTR_EL0.IDC: Data cache clean requirements for instruction to data >> coherence. The meaning of this bit[28]. >> 0: Data cache clean to the point of unification is required for >> instruction to data coherence, unless CLIDR_EL1.LoC == 0b000 >> or (CLIDR_EL1.LoUIS == 0b000 && CLIDR_EL1.LoUU == 0b000). >> 1: Data cache clean to the point of unification is not required >> for instruction to data coherence. >> >> Signed-off-by: Philip Elcan>> Signed-off-by: Shanker Donthineni >> --- >> Changes since v2: >> -Included barriers, DSB/ISB with DIC set, and DSB with IDC set. >> -Single Kconfig option. >> >> Changes since v1: >> -Reworded commit text. >> -Used the alternatives framework as Catalin suggested. >> -Rebased on top of https://patchwork.kernel.org/patch/10227927/ >> >> arch/arm64/Kconfig | 12 >> arch/arm64/include/asm/cache.h | 5 + >> arch/arm64/include/asm/cpucaps.h | 4 +++- >> arch/arm64/kernel/cpufeature.c | 40 >> ++-- >> arch/arm64/mm/cache.S| 21 +++-- >> 5 files changed, 73 insertions(+), 9 deletions(-) >> >> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig >> index f55fe5b..82b8053 100644 >> --- a/arch/arm64/Kconfig >> +++ b/arch/arm64/Kconfig >> @@ -1095,6 +1095,18 @@ config ARM64_RAS_EXTN >>and access the new registers if the system supports the extension. >>Platform RAS features may additionally depend on firmware support. >> >> +config ARM64_SKIP_CACHE_POU >> +bool "Enable support to skip cache POU operations" >> +default y >> +help >> + Explicit point of unification cache operations can be eliminated >> + in software if the hardware handles transparently. The new bits in >> + CTR_EL0, CTR_EL0.DIC and CTR_EL0.IDC indicates the hardware >> + capabilities of ICache and DCache POU requirements. >> + >> + Selecting this feature will allow the kernel to optimize the POU >> + cache maintaince operations where it requires 'D{I}C C{I}VAU' >> + >> endmenu > > Is it worth having a config option for this at all? The savings from turning > this off seem trivial. > >> >> config ARM64_SVE >> diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h >> index ea9bb4e..e22178b 100644 >> --- a/arch/arm64/include/asm/cache.h >> +++ b/arch/arm64/include/asm/cache.h >> @@ -20,8 +20,13 @@ >> >> #define CTR_L1IP_SHIFT 14 >> #define CTR_L1IP_MASK 3 >> +#define CTR_DMLINE_SHIFT16 >> +#define CTR_ERG_SHIFT 20 >> #define CTR_CWG_SHIFT 24 >> #define CTR_CWG_MASK15 >> +#define CTR_IDC_SHIFT 28 >> +#define CTR_DIC_SHIFT 29 >> +#define CTR_B31_SHIFT 31 >> >> #define CTR_L1IP(ctr) (((ctr) >> CTR_L1IP_SHIFT) & >> CTR_L1IP_MASK) >> >> diff --git a/arch/arm64/include/asm/cpucaps.h >> b/arch/arm64/include/asm/cpucaps.h >> index bb26382..8dd42ae 100644 >> --- a/arch/arm64/include/asm/cpucaps.h >> +++ b/arch/arm64/include/asm/cpucaps.h >> @@ -45,7 +45,9 @@ >> #define ARM64_HARDEN_BRANCH_PREDICTOR 24 >> #define ARM64_HARDEN_BP_POST_GUEST_EXIT 25 >> #define ARM64_HAS_RAS_EXTN 26 >> +#define ARM64_HAS_CACHE_IDC 27 >> +#define ARM64_HAS_CACHE_DIC 28 >> >> -#define ARM64_NCAPS 27 >> +#define ARM64_NCAPS 29 >> >> #endif /* __ASM_CPUCAPS_H */ >> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c >> index ff8a6e9..12e100a 100644 >> --- a/arch/arm64/kernel/cpufeature.c >> +++
Re: [PATCH v3] arm64: Add support for new control bits CTR_EL0.DIC and CTR_EL0.IDC
On Wed, Feb 21, 2018 at 07:49:06AM -0600, Shanker Donthineni wrote: > The DCache clean & ICache invalidation requirements for instructions > to be data coherence are discoverable through new fields in CTR_EL0. > The following two control bits DIC and IDC were defined for this > purpose. No need to perform point of unification cache maintenance > operations from software on systems where CPU caches are transparent. > > This patch optimize the three functions __flush_cache_user_range(), > clean_dcache_area_pou() and invalidate_icache_range() if the hardware > reports CTR_EL0.IDC and/or CTR_EL0.IDC. Basically it skips the two > instructions 'DC CVAU' and 'IC IVAU', and the associated loop logic > in order to avoid the unnecessary overhead. > > CTR_EL0.DIC: Instruction cache invalidation requirements for > instruction to data coherence. The meaning of this bit[29]. > 0: Instruction cache invalidation to the point of unification > is required for instruction to data coherence. > 1: Instruction cache cleaning to the point of unification is > not required for instruction to data coherence. > > CTR_EL0.IDC: Data cache clean requirements for instruction to data > coherence. The meaning of this bit[28]. > 0: Data cache clean to the point of unification is required for > instruction to data coherence, unless CLIDR_EL1.LoC == 0b000 > or (CLIDR_EL1.LoUIS == 0b000 && CLIDR_EL1.LoUU == 0b000). > 1: Data cache clean to the point of unification is not required > for instruction to data coherence. > > Signed-off-by: Philip Elcan> Signed-off-by: Shanker Donthineni > --- > Changes since v2: > -Included barriers, DSB/ISB with DIC set, and DSB with IDC set. > -Single Kconfig option. > > Changes since v1: > -Reworded commit text. > -Used the alternatives framework as Catalin suggested. > -Rebased on top of https://patchwork.kernel.org/patch/10227927/ > > arch/arm64/Kconfig | 12 > arch/arm64/include/asm/cache.h | 5 + > arch/arm64/include/asm/cpucaps.h | 4 +++- > arch/arm64/kernel/cpufeature.c | 40 > ++-- > arch/arm64/mm/cache.S| 21 +++-- > 5 files changed, 73 insertions(+), 9 deletions(-) > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig > index f55fe5b..82b8053 100644 > --- a/arch/arm64/Kconfig > +++ b/arch/arm64/Kconfig > @@ -1095,6 +1095,18 @@ config ARM64_RAS_EXTN > and access the new registers if the system supports the extension. > Platform RAS features may additionally depend on firmware support. > > +config ARM64_SKIP_CACHE_POU > + bool "Enable support to skip cache POU operations" > + default y > + help > + Explicit point of unification cache operations can be eliminated > + in software if the hardware handles transparently. The new bits in > + CTR_EL0, CTR_EL0.DIC and CTR_EL0.IDC indicates the hardware > + capabilities of ICache and DCache POU requirements. > + > + Selecting this feature will allow the kernel to optimize the POU > + cache maintaince operations where it requires 'D{I}C C{I}VAU' > + > endmenu Is it worth having a config option for this at all? The savings from turning this off seem trivial. > > config ARM64_SVE > diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h > index ea9bb4e..e22178b 100644 > --- a/arch/arm64/include/asm/cache.h > +++ b/arch/arm64/include/asm/cache.h > @@ -20,8 +20,13 @@ > > #define CTR_L1IP_SHIFT 14 > #define CTR_L1IP_MASK3 > +#define CTR_DMLINE_SHIFT 16 > +#define CTR_ERG_SHIFT20 > #define CTR_CWG_SHIFT24 > #define CTR_CWG_MASK 15 > +#define CTR_IDC_SHIFT28 > +#define CTR_DIC_SHIFT29 > +#define CTR_B31_SHIFT31 > > #define CTR_L1IP(ctr)(((ctr) >> CTR_L1IP_SHIFT) & > CTR_L1IP_MASK) > > diff --git a/arch/arm64/include/asm/cpucaps.h > b/arch/arm64/include/asm/cpucaps.h > index bb26382..8dd42ae 100644 > --- a/arch/arm64/include/asm/cpucaps.h > +++ b/arch/arm64/include/asm/cpucaps.h > @@ -45,7 +45,9 @@ > #define ARM64_HARDEN_BRANCH_PREDICTOR24 > #define ARM64_HARDEN_BP_POST_GUEST_EXIT 25 > #define ARM64_HAS_RAS_EXTN 26 > +#define ARM64_HAS_CACHE_IDC 27 > +#define ARM64_HAS_CACHE_DIC 28 > > -#define ARM64_NCAPS 27 > +#define ARM64_NCAPS 29 > > #endif /* __ASM_CPUCAPS_H */ > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c > index ff8a6e9..12e100a 100644 > --- a/arch/arm64/kernel/cpufeature.c > +++ b/arch/arm64/kernel/cpufeature.c > @@ -199,12 +199,12 @@ static int __init register_cpu_hwcaps_dumper(void) > }; > > static const struct arm64_ftr_bits ftr_ctr[] = { >