Re: [PATCH v3] arm64: Add support for new control bits CTR_EL0.DIC and CTR_EL0.IDC

2018-02-22 Thread Mark Rutland
On Wed, Feb 21, 2018 at 04:51:40PM +, Robin Murphy wrote:
> On 21/02/18 16:14, Shanker Donthineni wrote:
> [...]
> > > > @@ -1100,6 +1114,20 @@ static int cpu_copy_el2regs(void *__unused)
> > > > .enable = cpu_clear_disr,
> > > > },
> > > >   #endif /* CONFIG_ARM64_RAS_EXTN */
> > > > +#ifdef CONFIG_ARM64_SKIP_CACHE_POU
> > > > +   {
> > > > +   .desc = "DCache clean to POU",
> > > 
> > > This description is confusing, and sounds like it's describing DC CVAU, 
> > > rather
> > > than the ability to ellide it. How about:
> > 
> > Sure, I'll take your suggestion.
> 
> Can we at least spell "elision" correctly please? ;)

Argh. Yes.

> Personally I read DIC and IDC as "D-cache to I-cache coherency" and "I-cache
> to D-cache coherency" respectively (just my interpretation, I've not looked
> into the spec work for any hints of rationale), but out loud those do sound
> so poorly-defined that keeping things in terms of the required maintenance
> probably is better.

So long as we have (IDC) and (DIC) in the text to avoid ambiguity, I'm
not that worried either way.

Thanks,
Mark.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v3] arm64: Add support for new control bits CTR_EL0.DIC and CTR_EL0.IDC

2018-02-21 Thread Robin Murphy

On 21/02/18 16:14, Shanker Donthineni wrote:
[...]

>>> @@ -1100,6 +1114,20 @@ static int cpu_copy_el2regs(void *__unused)
>>> .enable = cpu_clear_disr,
>>> },
>>>   #endif /* CONFIG_ARM64_RAS_EXTN */
>>> +#ifdef CONFIG_ARM64_SKIP_CACHE_POU
>>> +   {
>>> +   .desc = "DCache clean to POU",
>>
>> This description is confusing, and sounds like it's describing DC CVAU, rather
>> than the ability to ellide it. How about:
>>
>> .desc = "D-cache maintenance ellision (IDC)"
>
> Sure, I'll take your suggestion.

Can we at least spell "elision" correctly please? ;)

Personally I read DIC and IDC as "D-cache to I-cache coherency" and
"I-cache to D-cache coherency" respectively (just my interpretation,
I've not looked into the spec work for any hints of rationale), but out
loud those do sound so poorly-defined that keeping things in terms of
the required maintenance probably is better.


>>> +   .capability = ARM64_HAS_CACHE_IDC,
>>> +   .def_scope = SCOPE_SYSTEM,
>>> +   .matches = has_cache_idc,
>>> +   },
>>> +   {
>>> +   .desc = "ICache invalidation to POU",
>>
>> ... and correspondingly:
>>
>> .desc = "I-cache maintenance ellision (DIC)"
>>
>>> +   .capability = ARM64_HAS_CACHE_DIC,
>>> +   .def_scope = SCOPE_SYSTEM,
>>> +   .matches = has_cache_dic,
>>> +   },
>>> +#endif /* CONFIG_ARM64_CACHE_DIC */
>>> {},
>>>   };
>>>
>>> [...]
>>>
>>> +alternative_if ARM64_HAS_CACHE_DIC
>>> +   isb
>>
>> Why have we gained an ISB here if DIC is set?
>
> I believe synchronization barrier (ISB) is required here to support
> self-modifying/jump-labels code.

>> This is for a user address, and I can't see why DIC would imply we need an
>> extra ISB kernel-side.
>
> This is for user and kernel addresses, alternatives and jumplabel
> patching logic calls flush_icache_range().

There's an ISB hidden in invalidate_icache_by_line(), so it probably
would be unsafe to start implicitly skipping that.

>>> +   b   8f
>>> +alternative_else_nop_endif
>>> invalidate_icache_by_line x0, x1, x2, x3, 9f
>>> -   mov x0, #0
>>> +8: mov x0, #0
>>>   1:
>>> uaccess_ttbr0_disable x1, x2
>>> ret
>>> @@ -80,6 +87,12 @@ ENDPROC(__flush_cache_user_range)
>>>    *- end - virtual end address of region
>>>    */
>>>   ENTRY(invalidate_icache_range)
>>> +alternative_if ARM64_HAS_CACHE_DIC
>>> +   mov x0, xzr
>>> +   dsb ish
>>
>> Do we actually need a DSB in this case?
>
> I'll remove if everyone agree.
>
> Will, Can you comment on this?

>> As-is, this function *only* invalidates the I-cache, so we already assume that
>> the data is visible at the PoU at this point. I don't see what extra gaurantee
>> we'd need the DSB for.

If so, then ditto for the existing invalidate_icache_by_line() code
presumably.


Robin.


Re: [PATCH v3] arm64: Add support for new control bits CTR_EL0.DIC and CTR_EL0.IDC

2018-02-21 Thread Shanker Donthineni
Hi Mark,

On 02/21/2018 09:09 AM, Mark Rutland wrote:
> On Wed, Feb 21, 2018 at 07:49:06AM -0600, Shanker Donthineni wrote:
>> The DCache clean & ICache invalidation requirements for instructions
>> to be data coherence are discoverable through new fields in CTR_EL0.
>> The following two control bits DIC and IDC were defined for this
>> purpose. No need to perform point of unification cache maintenance
>> operations from software on systems where CPU caches are transparent.
>>
>> This patch optimize the three functions __flush_cache_user_range(),
>> clean_dcache_area_pou() and invalidate_icache_range() if the hardware
>> reports CTR_EL0.IDC and/or CTR_EL0.DIC. Basically it skips the two
>> instructions 'DC CVAU' and 'IC IVAU', and the associated loop logic
>> in order to avoid the unnecessary overhead.
>>
>> CTR_EL0.DIC: Instruction cache invalidation requirements for
>>  instruction to data coherence. The meaning of this bit[29].
>>   0: Instruction cache invalidation to the point of unification
>>  is required for instruction to data coherence.
>>   1: Instruction cache cleaning to the point of unification is
>>   not required for instruction to data coherence.
>>
>> CTR_EL0.IDC: Data cache clean requirements for instruction to data
>>  coherence. The meaning of this bit[28].
>>   0: Data cache clean to the point of unification is required for
>>  instruction to data coherence, unless CLIDR_EL1.LoC == 0b000
>>  or (CLIDR_EL1.LoUIS == 0b000 && CLIDR_EL1.LoUU == 0b000).
>>   1: Data cache clean to the point of unification is not required
>>  for instruction to data coherence.
>>
>> Signed-off-by: Philip Elcan 
>> Signed-off-by: Shanker Donthineni 
>> ---
>> Changes since v2:
>>   -Included barriers, DSB/ISB with DIC set, and DSB with IDC set.
>>   -Single Kconfig option.
>>
>> Changes since v1:
>>   -Reworded commit text.
>>   -Used the alternatives framework as Catalin suggested.
>>   -Rebased on top of https://patchwork.kernel.org/patch/10227927/
>>
>>  arch/arm64/Kconfig   | 12 
>>  arch/arm64/include/asm/cache.h   |  5 +
>>  arch/arm64/include/asm/cpucaps.h |  4 +++-
>>  arch/arm64/kernel/cpufeature.c   | 40 
>> ++--
>>  arch/arm64/mm/cache.S| 21 +++--
>>  5 files changed, 73 insertions(+), 9 deletions(-)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index f55fe5b..82b8053 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -1095,6 +1095,18 @@ config ARM64_RAS_EXTN
>>and access the new registers if the system supports the extension.
>>Platform RAS features may additionally depend on firmware support.
>>  
>> +config ARM64_SKIP_CACHE_POU
>> +bool "Enable support to skip cache POU operations"
>> +default y
>> +help
>> +  Explicit point of unification cache operations can be eliminated
>> +  in software if the hardware handles transparently. The new bits in
>> +  CTR_EL0, CTR_EL0.DIC and CTR_EL0.IDC indicates the hardware
>> +  capabilities of ICache and DCache POU requirements.
>> +
>> +  Selecting this feature will allow the kernel to optimize the POU
>> +  cache maintaince operations where it requires 'D{I}C C{I}VAU'
>> +
>>  endmenu
> 
> Is it worth having a config option for this at all? The savings from turning
> this off seem trivial.
> 
>>  
>>  config ARM64_SVE
>> diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
>> index ea9bb4e..e22178b 100644
>> --- a/arch/arm64/include/asm/cache.h
>> +++ b/arch/arm64/include/asm/cache.h
>> @@ -20,8 +20,13 @@
>>  
>>  #define CTR_L1IP_SHIFT  14
>>  #define CTR_L1IP_MASK   3
>> +#define CTR_DMLINE_SHIFT16
>> +#define CTR_ERG_SHIFT   20
>>  #define CTR_CWG_SHIFT   24
>>  #define CTR_CWG_MASK15
>> +#define CTR_IDC_SHIFT   28
>> +#define CTR_DIC_SHIFT   29
>> +#define CTR_B31_SHIFT   31
>>  
>>  #define CTR_L1IP(ctr)   (((ctr) >> CTR_L1IP_SHIFT) & 
>> CTR_L1IP_MASK)
>>  
>> diff --git a/arch/arm64/include/asm/cpucaps.h 
>> b/arch/arm64/include/asm/cpucaps.h
>> index bb26382..8dd42ae 100644
>> --- a/arch/arm64/include/asm/cpucaps.h
>> +++ b/arch/arm64/include/asm/cpucaps.h
>> @@ -45,7 +45,9 @@
>>  #define ARM64_HARDEN_BRANCH_PREDICTOR   24
>>  #define ARM64_HARDEN_BP_POST_GUEST_EXIT 25
>>  #define ARM64_HAS_RAS_EXTN  26
>> +#define ARM64_HAS_CACHE_IDC 27
>> +#define ARM64_HAS_CACHE_DIC 28
>>  
>> -#define ARM64_NCAPS 27
>> +#define ARM64_NCAPS 29
>>  
>>  #endif /* __ASM_CPUCAPS_H */
>> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
>> index ff8a6e9..12e100a 100644
>> --- a/arch/arm64/kernel/cpufeature.c
>> +++ 

Re: [PATCH v3] arm64: Add support for new control bits CTR_EL0.DIC and CTR_EL0.IDC

2018-02-21 Thread Mark Rutland
On Wed, Feb 21, 2018 at 07:49:06AM -0600, Shanker Donthineni wrote:
> The DCache clean & ICache invalidation requirements for instructions
> to be data coherence are discoverable through new fields in CTR_EL0.
> The following two control bits DIC and IDC were defined for this
> purpose. No need to perform point of unification cache maintenance
> operations from software on systems where CPU caches are transparent.
> 
> This patch optimize the three functions __flush_cache_user_range(),
> clean_dcache_area_pou() and invalidate_icache_range() if the hardware
> reports CTR_EL0.IDC and/or CTR_EL0.DIC. Basically it skips the two
> instructions 'DC CVAU' and 'IC IVAU', and the associated loop logic
> in order to avoid the unnecessary overhead.
> 
> CTR_EL0.DIC: Instruction cache invalidation requirements for
>  instruction to data coherence. The meaning of this bit[29].
>   0: Instruction cache invalidation to the point of unification
>  is required for instruction to data coherence.
>   1: Instruction cache cleaning to the point of unification is
>   not required for instruction to data coherence.
> 
> CTR_EL0.IDC: Data cache clean requirements for instruction to data
>  coherence. The meaning of this bit[28].
>   0: Data cache clean to the point of unification is required for
>  instruction to data coherence, unless CLIDR_EL1.LoC == 0b000
>  or (CLIDR_EL1.LoUIS == 0b000 && CLIDR_EL1.LoUU == 0b000).
>   1: Data cache clean to the point of unification is not required
>  for instruction to data coherence.
> 
> Signed-off-by: Philip Elcan 
> Signed-off-by: Shanker Donthineni 
> ---
> Changes since v2:
>   -Included barriers, DSB/ISB with DIC set, and DSB with IDC set.
>   -Single Kconfig option.
> 
> Changes since v1:
>   -Reworded commit text.
>   -Used the alternatives framework as Catalin suggested.
>   -Rebased on top of https://patchwork.kernel.org/patch/10227927/
> 
>  arch/arm64/Kconfig   | 12 
>  arch/arm64/include/asm/cache.h   |  5 +
>  arch/arm64/include/asm/cpucaps.h |  4 +++-
>  arch/arm64/kernel/cpufeature.c   | 40 
> ++--
>  arch/arm64/mm/cache.S| 21 +++--
>  5 files changed, 73 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index f55fe5b..82b8053 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1095,6 +1095,18 @@ config ARM64_RAS_EXTN
> and access the new registers if the system supports the extension.
> Platform RAS features may additionally depend on firmware support.
>  
> +config ARM64_SKIP_CACHE_POU
> + bool "Enable support to skip cache POU operations"
> + default y
> + help
> +   Explicit point of unification cache operations can be eliminated
> +   in software if the hardware handles transparently. The new bits in
> +   CTR_EL0, CTR_EL0.DIC and CTR_EL0.IDC indicates the hardware
> +   capabilities of ICache and DCache POU requirements.
> +
> +   Selecting this feature will allow the kernel to optimize the POU
> +   cache maintaince operations where it requires 'D{I}C C{I}VAU'
> +
>  endmenu

Is it worth having a config option for this at all? The savings from turning
this off seem trivial.

>  
>  config ARM64_SVE
> diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
> index ea9bb4e..e22178b 100644
> --- a/arch/arm64/include/asm/cache.h
> +++ b/arch/arm64/include/asm/cache.h
> @@ -20,8 +20,13 @@
>  
>  #define CTR_L1IP_SHIFT   14
>  #define CTR_L1IP_MASK3
> +#define CTR_DMLINE_SHIFT 16
> +#define CTR_ERG_SHIFT20
>  #define CTR_CWG_SHIFT24
>  #define CTR_CWG_MASK 15
> +#define CTR_IDC_SHIFT28
> +#define CTR_DIC_SHIFT29
> +#define CTR_B31_SHIFT31
>  
>  #define CTR_L1IP(ctr)(((ctr) >> CTR_L1IP_SHIFT) & 
> CTR_L1IP_MASK)
>  
> diff --git a/arch/arm64/include/asm/cpucaps.h 
> b/arch/arm64/include/asm/cpucaps.h
> index bb26382..8dd42ae 100644
> --- a/arch/arm64/include/asm/cpucaps.h
> +++ b/arch/arm64/include/asm/cpucaps.h
> @@ -45,7 +45,9 @@
>  #define ARM64_HARDEN_BRANCH_PREDICTOR24
>  #define ARM64_HARDEN_BP_POST_GUEST_EXIT  25
>  #define ARM64_HAS_RAS_EXTN   26
> +#define ARM64_HAS_CACHE_IDC  27
> +#define ARM64_HAS_CACHE_DIC  28
>  
> -#define ARM64_NCAPS  27
> +#define ARM64_NCAPS  29
>  
>  #endif /* __ASM_CPUCAPS_H */
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index ff8a6e9..12e100a 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -199,12 +199,12 @@ static int __init register_cpu_hwcaps_dumper(void)
>  };
>  
>  static const struct arm64_ftr_bits ftr_ctr[] = {
>