Re: [PATCH] arm64: Add support for new control bits CTR_EL0.IDC and CTR_EL0.IDC

2018-02-19 Thread kbuild test robot
Hi Shanker,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.16-rc2 next-20180219]
[cannot apply to arm64/for-next/core]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Shanker-Donthineni/arm64-Add-support-for-new-control-bits-CTR_EL0-IDC-and-CTR_EL0-IDC/20180219-031155
config: arm64-defconfig (attached as .config)
compiler: aarch64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=arm64 

All errors (new ones prefixed by >>):

   arch/arm64/kernel/hibernate-asm.S: Assembler messages:
>> arch/arm64/kernel/hibernate-asm.S:101: Error: unexpected comma after the 
>> mnemonic name `mrs' -- `mrs ,ctr_el0'
>> arch/arm64/kernel/hibernate-asm.S:101: Error: operand 2 must be an integer 
>> register -- `ubfm x3,,#16,#19'
--
   arch/arm64/kernel/relocate_kernel.S: Assembler messages:
>> arch/arm64/kernel/relocate_kernel.S:37: Error: unexpected comma after the 
>> mnemonic name `mrs' -- `mrs ,ctr_el0'
>> arch/arm64/kernel/relocate_kernel.S:37: Error: operand 2 must be an integer 
>> register -- `ubfm x0,,#16,#19'

vim +101 arch/arm64/kernel/hibernate-asm.S

82869ac57 James Morse2016-04-27   46  
82869ac57 James Morse2016-04-27   47  
82869ac57 James Morse2016-04-27   48  /*
82869ac57 James Morse2016-04-27   49   * Resume from hibernate
82869ac57 James Morse2016-04-27   50   *
82869ac57 James Morse2016-04-27   51   * Loads temporary page tables 
then restores the memory image.
82869ac57 James Morse2016-04-27   52   * Finally branches to 
cpu_resume() to restore the state saved by
82869ac57 James Morse2016-04-27   53   * swsusp_arch_suspend().
82869ac57 James Morse2016-04-27   54   *
82869ac57 James Morse2016-04-27   55   * Because this code has to be 
copied to a 'safe' page, it can't call out to
82869ac57 James Morse2016-04-27   56   * other functions by PC-relative 
address. Also remember that it may be
82869ac57 James Morse2016-04-27   57   * mid-way through over-writing 
other functions. For this reason it contains
82869ac57 James Morse2016-04-27   58   * code from flush_icache_range() 
and uses the copy_page() macro.
82869ac57 James Morse2016-04-27   59   *
82869ac57 James Morse2016-04-27   60   * This 'safe' page is mapped via 
ttbr0, and executed from there. This function
82869ac57 James Morse2016-04-27   61   * switches to a copy of the 
linear map in ttbr1, performs the restore, then
82869ac57 James Morse2016-04-27   62   * switches ttbr1 to the original 
kernel's swapper_pg_dir.
82869ac57 James Morse2016-04-27   63   *
82869ac57 James Morse2016-04-27   64   * All of memory gets written to, 
including code. We need to clean the kernel
82869ac57 James Morse2016-04-27   65   * text to the Point of Coherence 
(PoC) before secondary cores can be booted.
82869ac57 James Morse2016-04-27   66   * Because the kernel modules and 
executable pages mapped to user space are
82869ac57 James Morse2016-04-27   67   * also written as data, we clean 
all pages we touch to the Point of
82869ac57 James Morse2016-04-27   68   * Unification (PoU).
82869ac57 James Morse2016-04-27   69   *
82869ac57 James Morse2016-04-27   70   * x0: physical address of 
temporary page tables
82869ac57 James Morse2016-04-27   71   * x1: physical address of 
swapper page tables
82869ac57 James Morse2016-04-27   72   * x2: address of cpu_resume
82869ac57 James Morse2016-04-27   73   * x3: linear map address of 
restore_pblist in the current kernel
82869ac57 James Morse2016-04-27   74   * x4: physical address of 
__hyp_stub_vectors, or 0
82869ac57 James Morse2016-04-27   75   * x5: physical address of a  
zero page that remains zero after resume
82869ac57 James Morse2016-04-27   76   */
82869ac57 James Morse2016-04-27   77  .pushsection
".hibernate_exit.text", "ax"
82869ac57 James Morse2016-04-27   78  ENTRY(swsusp_arch_suspend_exit)
82869ac57 James Morse2016-04-27   79/*
82869ac57 James Morse2016-04-27   80 * We execute from ttbr0, 
change ttbr1 to our copied linear map tables
82869ac57 James Morse2016-04-27   81 * with a break-before-make via 
the zero page
82869ac57 James Morse2016-04-27   82 */
529c4b05a Kristina Martsenko 2017-12-13   83break_before_make_ttbr_switch   
x5, x0, x6
82869ac57 James Morse2016-04-27   84  
82869ac57 James Morse2016-04-27   85mov x21, x1
82869ac57 James Morse2016-04-27   86mov 

Re: [PATCH] arm64: Add support for new control bits CTR_EL0.IDC and CTR_EL0.IDC

2018-02-19 Thread Shanker Donthineni
Thanks Catalin for your comments.

On 02/19/2018 11:18 AM, Catalin Marinas wrote:
> On Mon, Feb 19, 2018 at 10:35:30AM -0600, Shanker Donthineni wrote:
>> On 02/19/2018 08:38 AM, Catalin Marinas wrote:
>>> On the patch, I'd rather have an alternative framework entry for no VAU
>>> cache maint required and some ret instruction at the beginning of the
>>> cache maint function rather than jumping out of the loop somewhere
>>> inside the cache maintenance code, penalising the CPUs that do require
>>> it.
>>
>> Alternative framework might break things in case of CPU hotplug. I need one
>> more confirmation from you on incorporating alternative framework. 
> 
> CPU hotplug can be an issue but it should be handled like other similar
> cases: if a CPU comes online late and its features are incompatible, it
> should not be brought online. The cpufeature code handles this.
> 
> With Will's patch for CTR_EL0, we handle different CPU features during
> boot, defaulting to the lowest value for the IDC/DIC bits.
> 
> I suggest you add new ARM64_HAS_* feature bits and enable them based on
> CTR_EL0.IDC and DIC. You could check for both being 1 with a single
> feature bit but I guess an implementation is allowed to have these
> different (e.g. DIC == 0 and IDC == 1).
> 

I'll add two new features ARM64_HAS_DIC and ARM64_HAS_IDC to support
all implementations. Unfortunately QCOM server chips supports IDC not DIC.
 

-- 
Shanker Donthineni
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] arm64: Add support for new control bits CTR_EL0.IDC and CTR_EL0.IDC

2018-02-19 Thread Catalin Marinas
On Mon, Feb 19, 2018 at 10:35:30AM -0600, Shanker Donthineni wrote:
> On 02/19/2018 08:38 AM, Catalin Marinas wrote:
> > On the patch, I'd rather have an alternative framework entry for no VAU
> > cache maint required and some ret instruction at the beginning of the
> > cache maint function rather than jumping out of the loop somewhere
> > inside the cache maintenance code, penalising the CPUs that do require
> > it.
> 
> Alternative framework might break things in case of CPU hotplug. I need one
> more confirmation from you on incorporating alternative framework. 

CPU hotplug can be an issue but it should be handled like other similar
cases: if a CPU comes online late and its features are incompatible, it
should not be brought online. The cpufeature code handles this.

With Will's patch for CTR_EL0, we handle different CPU features during
boot, defaulting to the lowest value for the IDC/DIC bits.

I suggest you add new ARM64_HAS_* feature bits and enable them based on
CTR_EL0.IDC and DIC. You could check for both being 1 with a single
feature bit but I guess an implementation is allowed to have these
different (e.g. DIC == 0 and IDC == 1).

-- 
Catalin
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] arm64: Add support for new control bits CTR_EL0.IDC and CTR_EL0.IDC

2018-02-19 Thread Shanker Donthineni
Hi Will,

On 02/19/2018 08:43 AM, Will Deacon wrote:
> Hi Shanker,
> 
> On Fri, Feb 16, 2018 at 06:57:46PM -0600, Shanker Donthineni wrote:
>> Two point of unification cache maintenance operations 'DC CVAU' and
>> 'IC IVAU' are optional for implementors as per ARMv8 specification.
>> This patch parses the updated CTR_EL0 register definition and adds
>> the required changes to skip POU operations if the hardware reports
>> CTR_EL0.IDC and/or CTR_EL0.IDC.
>>
>> CTR_EL0.DIC: Instruction cache invalidation requirements for
>>  instruction to data coherence. The meaning of this bit[29].
>>   0: Instruction cache invalidation to the point of unification
>>  is required for instruction to data coherence.
>>   1: Instruction cache cleaning to the point of unification is
>>   not required for instruction to data coherence.
>>
>> CTR_EL0.IDC: Data cache clean requirements for instruction to data
>>  coherence. The meaning of this bit[28].
>>   0: Data cache clean to the point of unification is required for
>>  instruction to data coherence, unless CLIDR_EL1.LoC == 0b000
>>  or (CLIDR_EL1.LoUIS == 0b000 && CLIDR_EL1.LoUU == 0b000).
>>   1: Data cache clean to the point of unification is not required
>>  for instruction to data coherence.
>>
>> Signed-off-by: Philip Elcan 
>> Signed-off-by: Shanker Donthineni 
>> ---
>>  arch/arm64/include/asm/assembler.h | 48 
>> --
>>  arch/arm64/include/asm/cache.h |  2 ++
>>  arch/arm64/kernel/cpufeature.c |  2 ++
>>  arch/arm64/mm/cache.S  | 26 ++---
>>  4 files changed, 51 insertions(+), 27 deletions(-)
> 
> I was looking at our CTR_EL0 code last week but forgot to post the patch I
> wrote fixing up some of the fields. I just send it now, so please can
> you rebase on top of:
> 
> http://lists.infradead.org/pipermail/linux-arm-kernel/2018-February/560488.html
> 
> Also:
> 
>> diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
>> index ea9bb4e..aea533b 100644
>> --- a/arch/arm64/include/asm/cache.h
>> +++ b/arch/arm64/include/asm/cache.h
>> @@ -22,6 +22,8 @@
>>  #define CTR_L1IP_MASK   3
>>  #define CTR_CWG_SHIFT   24
>>  #define CTR_CWG_MASK15
>> +#define CTR_IDC_SHIFT   28
>> +#define CTR_DIC_SHIFT   29
>>  
>>  #define CTR_L1IP(ctr)   (((ctr) >> CTR_L1IP_SHIFT) & 
>> CTR_L1IP_MASK)
>>  
>> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
>> index 29b1f87..f42bb5a 100644
>> --- a/arch/arm64/kernel/cpufeature.c
>> +++ b/arch/arm64/kernel/cpufeature.c
>> @@ -200,6 +200,8 @@ static int __init register_cpu_hwcaps_dumper(void)
>>  
>>  static const struct arm64_ftr_bits ftr_ctr[] = {
>>  ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_EXACT, 31, 1, 1),   /* RAO 
>> */
>> +ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, CTR_DIC_SHIFT, 
>> 1, 0),   /* DIC */
>> +ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, CTR_IDC_SHIFT, 
>> 1, 0),   /* IDC */
>>  ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_HIGHER_SAFE, 24, 4, 0), 
>> /* CWG */
>>  ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, 20, 4, 0),  
>> /* ERG */
>>  ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, 16, 4, 1),  
>> /* DminLine */
> 
> Could you update the other table entries here to use the CTR_*_SHIFT values
> as well?
> 

I'll do.

> Thanks,
> 
> Will
> 
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 

-- 
Shanker Donthineni
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] arm64: Add support for new control bits CTR_EL0.IDC and CTR_EL0.IDC

2018-02-19 Thread Shanker Donthineni
Hi Catalin,

On 02/19/2018 08:38 AM, Catalin Marinas wrote:
> On Fri, Feb 16, 2018 at 06:57:46PM -0600, Shanker Donthineni wrote:
>> Two point of unification cache maintenance operations 'DC CVAU' and
>> 'IC IVAU' are optional for implementors as per ARMv8 specification.
>> This patch parses the updated CTR_EL0 register definition and adds
>> the required changes to skip POU operations if the hardware reports
>> CTR_EL0.IDC and/or CTR_EL0.IDC.
>>
>> CTR_EL0.DIC: Instruction cache invalidation requirements for
>>  instruction to data coherence. The meaning of this bit[29].
>>   0: Instruction cache invalidation to the point of unification
>>  is required for instruction to data coherence.
>>   1: Instruction cache cleaning to the point of unification is
>>   not required for instruction to data coherence.
>>
>> CTR_EL0.IDC: Data cache clean requirements for instruction to data
>>  coherence. The meaning of this bit[28].
>>   0: Data cache clean to the point of unification is required for
>>  instruction to data coherence, unless CLIDR_EL1.LoC == 0b000
>>  or (CLIDR_EL1.LoUIS == 0b000 && CLIDR_EL1.LoUU == 0b000).
>>   1: Data cache clean to the point of unification is not required
>>  for instruction to data coherence.
> 
> There is a difference between cache maintenance to PoU "is not required"
> and the actual instructions being optional (i.e. undef when executed).
> If your caches are transparent and DC CVAU/IC IVAU is not required,
> these instructions should behave as NOPs. So, are you trying to improve
> the performance of the cache maintenance routines in the kernel? If yes,
> please show some (relative) numbers and a better description in the
> commit log.
> 

Yes, I agree with you, POU instructions are NOPs if the caches are transparent.
There was no issue as per correctness point of view. But causing the unnecessary
overhead in ASM routines where code goes thorough VA range incremented
by cache line size. This overhead is noticeable with 64K PAGE, especially with 
sections mappings. I'll reword the commit text to reflect your comments in v2 
patch.

e.g. 512M section with 64K PAGE_SIZE kernel, assume 64Bytes cache size.
 flush_icache_range() consumes around 256M cpu cycles
 
Icache loop overhead: 512Mbytes / 64Bytes * 4 instructions per loop
Dcache loop overhead: 512Mbytes / 64Bytes * 4 instructions per loop


With this patch it takes less than ~1K cycles.

 
> On the patch, I'd rather have an alternative framework entry for no VAU
> cache maint required and some ret instruction at the beginning of the
> cache maint function rather than jumping out of the loop somewhere
> inside the cache maintenance code, penalising the CPUs that do require
> it.
> 

Alternative framework might break things in case of CPU hotplug. I need one
more confirmation from you on incorporating alternative framework. 

-- 
Shanker Donthineni
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] arm64: Add support for new control bits CTR_EL0.IDC and CTR_EL0.IDC

2018-02-19 Thread Will Deacon
Hi Shanker,

On Fri, Feb 16, 2018 at 06:57:46PM -0600, Shanker Donthineni wrote:
> Two point of unification cache maintenance operations 'DC CVAU' and
> 'IC IVAU' are optional for implementors as per ARMv8 specification.
> This patch parses the updated CTR_EL0 register definition and adds
> the required changes to skip POU operations if the hardware reports
> CTR_EL0.IDC and/or CTR_EL0.IDC.
> 
> CTR_EL0.DIC: Instruction cache invalidation requirements for
>  instruction to data coherence. The meaning of this bit[29].
>   0: Instruction cache invalidation to the point of unification
>  is required for instruction to data coherence.
>   1: Instruction cache cleaning to the point of unification is
>   not required for instruction to data coherence.
> 
> CTR_EL0.IDC: Data cache clean requirements for instruction to data
>  coherence. The meaning of this bit[28].
>   0: Data cache clean to the point of unification is required for
>  instruction to data coherence, unless CLIDR_EL1.LoC == 0b000
>  or (CLIDR_EL1.LoUIS == 0b000 && CLIDR_EL1.LoUU == 0b000).
>   1: Data cache clean to the point of unification is not required
>  for instruction to data coherence.
> 
> Signed-off-by: Philip Elcan 
> Signed-off-by: Shanker Donthineni 
> ---
>  arch/arm64/include/asm/assembler.h | 48 
> --
>  arch/arm64/include/asm/cache.h |  2 ++
>  arch/arm64/kernel/cpufeature.c |  2 ++
>  arch/arm64/mm/cache.S  | 26 ++---
>  4 files changed, 51 insertions(+), 27 deletions(-)

I was looking at our CTR_EL0 code last week but forgot to post the patch I
wrote fixing up some of the fields. I just send it now, so please can
you rebase on top of:

http://lists.infradead.org/pipermail/linux-arm-kernel/2018-February/560488.html

Also:

> diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
> index ea9bb4e..aea533b 100644
> --- a/arch/arm64/include/asm/cache.h
> +++ b/arch/arm64/include/asm/cache.h
> @@ -22,6 +22,8 @@
>  #define CTR_L1IP_MASK3
>  #define CTR_CWG_SHIFT24
>  #define CTR_CWG_MASK 15
> +#define CTR_IDC_SHIFT28
> +#define CTR_DIC_SHIFT29
>  
>  #define CTR_L1IP(ctr)(((ctr) >> CTR_L1IP_SHIFT) & 
> CTR_L1IP_MASK)
>  
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 29b1f87..f42bb5a 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -200,6 +200,8 @@ static int __init register_cpu_hwcaps_dumper(void)
>  
>  static const struct arm64_ftr_bits ftr_ctr[] = {
>   ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_EXACT, 31, 1, 1),   /* RAO 
> */
> + ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, CTR_DIC_SHIFT, 
> 1, 0),   /* DIC */
> + ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, CTR_IDC_SHIFT, 
> 1, 0),   /* IDC */
>   ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_HIGHER_SAFE, 24, 4, 0), 
> /* CWG */
>   ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, 20, 4, 0),  
> /* ERG */
>   ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, 16, 4, 1),  
> /* DminLine */

Could you update the other table entries here to use the CTR_*_SHIFT values
as well?

Thanks,

Will
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] arm64: Add support for new control bits CTR_EL0.IDC and CTR_EL0.IDC

2018-02-19 Thread Catalin Marinas
On Fri, Feb 16, 2018 at 06:57:46PM -0600, Shanker Donthineni wrote:
> Two point of unification cache maintenance operations 'DC CVAU' and
> 'IC IVAU' are optional for implementors as per ARMv8 specification.
> This patch parses the updated CTR_EL0 register definition and adds
> the required changes to skip POU operations if the hardware reports
> CTR_EL0.IDC and/or CTR_EL0.IDC.
> 
> CTR_EL0.DIC: Instruction cache invalidation requirements for
>  instruction to data coherence. The meaning of this bit[29].
>   0: Instruction cache invalidation to the point of unification
>  is required for instruction to data coherence.
>   1: Instruction cache cleaning to the point of unification is
>   not required for instruction to data coherence.
> 
> CTR_EL0.IDC: Data cache clean requirements for instruction to data
>  coherence. The meaning of this bit[28].
>   0: Data cache clean to the point of unification is required for
>  instruction to data coherence, unless CLIDR_EL1.LoC == 0b000
>  or (CLIDR_EL1.LoUIS == 0b000 && CLIDR_EL1.LoUU == 0b000).
>   1: Data cache clean to the point of unification is not required
>  for instruction to data coherence.

There is a difference between cache maintenance to PoU "is not required"
and the actual instructions being optional (i.e. undef when executed).
If your caches are transparent and DC CVAU/IC IVAU is not required,
these instructions should behave as NOPs. So, are you trying to improve
the performance of the cache maintenance routines in the kernel? If yes,
please show some (relative) numbers and a better description in the
commit log.

On the patch, I'd rather have an alternative framework entry for no VAU
cache maint required and some ret instruction at the beginning of the
cache maint function rather than jumping out of the loop somewhere
inside the cache maintenance code, penalising the CPUs that do require
it.

-- 
Catalin
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm