Re: [PATCH v3] powerpc/pseries: detect secure and trusted boot state of the system.

2020-07-16 Thread Daniel Axtens
Michal Suchánek  writes:

> On Wed, Jul 15, 2020 at 07:52:01AM -0400, Nayna Jain wrote:
>> The device-tree property to check secure and trusted boot state is
>> different for guests(pseries) compared to baremetal(powernv).
>> 
>> This patch updates the existing is_ppc_secureboot_enabled() and
>> is_ppc_trustedboot_enabled() functions to add support for pseries.
>> 
>> The secureboot and trustedboot state are exposed via device-tree property:
>> /proc/device-tree/ibm,secure-boot and /proc/device-tree/ibm,trusted-boot
>> 
>> The values of ibm,secure-boot under pseries are interpreted as:
>   ^^^
>> 
>> 0 - Disabled
>> 1 - Enabled in Log-only mode. This patch interprets this value as
>> disabled, since audit mode is currently not supported for Linux.
>> 2 - Enabled and enforced.
>> 3-9 - Enabled and enforcing; requirements are at the discretion of the
>> operating system.
>> 
>> The values of ibm,trusted-boot under pseries are interpreted as:
>^^^
> These two should be different I suppose?

I'm not quite sure what you mean? They'll be documented in a future
revision of the PAPR, once I get my act together and submit the
relevant internal paperwork.

Daniel
>
> Thanks
>
> Michal
>> 0 - Disabled
>> 1 - Enabled
>> 
>> Signed-off-by: Nayna Jain 
>> Reviewed-by: Daniel Axtens 
>> ---
>> v3:
>> * fixed double check. Thanks Daniel for noticing it.
>> * updated patch description.
>> 
>> v2:
>> * included Michael Ellerman's feedback.
>> * added Daniel Axtens's Reviewed-by.
>> 
>>  arch/powerpc/kernel/secure_boot.c | 19 +--
>>  1 file changed, 17 insertions(+), 2 deletions(-)
>> 
>> diff --git a/arch/powerpc/kernel/secure_boot.c 
>> b/arch/powerpc/kernel/secure_boot.c
>> index 4b982324d368..118bcb5f79c4 100644
>> --- a/arch/powerpc/kernel/secure_boot.c
>> +++ b/arch/powerpc/kernel/secure_boot.c
>> @@ -6,6 +6,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  
>>  static struct device_node *get_ppc_fw_sb_node(void)
>>  {
>> @@ -23,12 +24,19 @@ bool is_ppc_secureboot_enabled(void)
>>  {
>>  struct device_node *node;
>>  bool enabled = false;
>> +u32 secureboot;
>>  
>>  node = get_ppc_fw_sb_node();
>>  enabled = of_property_read_bool(node, "os-secureboot-enforcing");
>> -
>>  of_node_put(node);
>>  
>> +if (enabled)
>> +goto out;
>> +
>> +if (!of_property_read_u32(of_root, "ibm,secure-boot", &secureboot))
>> +enabled = (secureboot > 1);
>> +
>> +out:
>>  pr_info("Secure boot mode %s\n", enabled ? "enabled" : "disabled");
>>  
>>  return enabled;
>> @@ -38,12 +46,19 @@ bool is_ppc_trustedboot_enabled(void)
>>  {
>>  struct device_node *node;
>>  bool enabled = false;
>> +u32 trustedboot;
>>  
>>  node = get_ppc_fw_sb_node();
>>  enabled = of_property_read_bool(node, "trusted-enabled");
>> -
>>  of_node_put(node);
>>  
>> +if (enabled)
>> +goto out;
>> +
>> +if (!of_property_read_u32(of_root, "ibm,trusted-boot", &trustedboot))
>> +enabled = (trustedboot > 0);
>> +
>> +out:
>>  pr_info("Trusted boot mode %s\n", enabled ? "enabled" : "disabled");
>>  
>>  return enabled;
>> -- 
>> 2.26.2
>> 


Re: [PATCH 04/11] powerpc/smp: Enable small core scheduling sooner

2020-07-16 Thread Gautham R Shenoy
On Tue, Jul 14, 2020 at 10:06:17AM +0530, Srikar Dronamraju wrote:
> Enable small core scheduling as soon as we detect that we are in a
> system that supports thread groups. Doing so would avoid a redundant
> check.
> 
> Cc: linuxppc-dev 
> Cc: Michael Ellerman 
> Cc: Nick Piggin 
> Cc: Oliver OHalloran 
> Cc: Nathan Lynch 
> Cc: Michael Neuling 
> Cc: Anton Blanchard 
> Cc: Gautham R Shenoy 
> Cc: Vaidyanathan Srinivasan 
> Signed-off-by: Srikar Dronamraju 

I don't see a problem with this.

However, since we are now going to be maintaining a single topology
structure, wouldn't it be better to collate all the changes being made
to the mask functions/flags/names of this structure within a single
function, so that it becomes easier to keep track of what changes are
going into the topology and why? A rough sketch of that idea follows.
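
For illustration only, here is a minimal sketch of what such a
consolidation helper might look like. The name fixup_topology() and its
placement are assumptions made for the example, not something taken
from this series:

/* Hypothetical helper (name assumed): gather every tweak applied to
 * powerpc_topology[] in one place before it is registered. */
static void __init fixup_topology(void)
{
#ifdef CONFIG_SCHED_SMT
	if (has_big_cores) {
		pr_info("Big cores detected but using small core scheduling\n");
		powerpc_topology[0].mask = smallcore_smt_mask;
	}
#endif
	/* future mask/flags/name adjustments would also go here */
}

/* ... and in smp_cpus_done(), just before registration: */
	fixup_topology();
	set_sched_topology(powerpc_topology);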


> ---
>  arch/powerpc/kernel/smp.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 24529f6134aa..7d430fc536cc 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -892,6 +892,12 @@ static int init_big_cores(void)
>   }
> 
>   has_big_cores = true;
> +
> +#ifdef CONFIG_SCHED_SMT
> + pr_info("Big cores detected. Using small core scheduling\n");
> + powerpc_topology[0].mask = smallcore_smt_mask;
> +#endif
> +
>   return 0;
>  }
> 
> @@ -1383,12 +1389,6 @@ void __init smp_cpus_done(unsigned int max_cpus)
> 
>   dump_numa_cpu_topology();
> 
> -#ifdef CONFIG_SCHED_SMT
> - if (has_big_cores) {
> - pr_info("Big cores detected but using small core scheduling\n");
> - powerpc_topology[0].mask = smallcore_smt_mask;
> - }
> -#endif
>   set_sched_topology(powerpc_topology);
>  }
> 
> -- 
> 2.17.1
> 


Re: [PATCH 03/11] powerpc/smp: Move powerpc_topology above

2020-07-16 Thread Gautham R Shenoy
On Tue, Jul 14, 2020 at 10:06:16AM +0530, Srikar Dronamraju wrote:
> Just moving the powerpc_topology description above.
> This will help in using functions in this file and avoid forward declarations.
> 
> No other functional changes.
> 
> Cc: linuxppc-dev 
> Cc: Michael Ellerman 
> Cc: Nick Piggin 
> Cc: Oliver OHalloran 
> Cc: Nathan Lynch 
> Cc: Michael Neuling 
> Cc: Anton Blanchard 
> Cc: Gautham R Shenoy 
> Cc: Vaidyanathan Srinivasan 
> Signed-off-by: Srikar Dronamraju 

Reviewed-by: Gautham R. Shenoy 

> ---
>  arch/powerpc/kernel/smp.c | 116 +++---
>  1 file changed, 58 insertions(+), 58 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 069ea4b21c6d..24529f6134aa 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -818,6 +818,64 @@ static int init_cpu_l1_cache_map(int cpu)
>   return err;
>  }
> 
> +static bool shared_caches;
> +
> +#ifdef CONFIG_SCHED_SMT
> +/* cpumask of CPUs with asymmetric SMT dependency */
> +static int powerpc_smt_flags(void)
> +{
> + int flags = SD_SHARE_CPUCAPACITY | SD_SHARE_PKG_RESOURCES;
> +
> + if (cpu_has_feature(CPU_FTR_ASYM_SMT)) {
> + printk_once(KERN_INFO "Enabling Asymmetric SMT scheduling\n");
> + flags |= SD_ASYM_PACKING;
> + }
> + return flags;
> +}
> +#endif
> +
> +/*
> + * P9 has a slightly odd architecture where pairs of cores share an L2 cache.
> + * This topology makes it *much* cheaper to migrate tasks between adjacent 
> cores
> + * since the migrated task remains cache hot. We want to take advantage of 
> this
> + * at the scheduler level so an extra topology level is required.
> + */
> +static int powerpc_shared_cache_flags(void)
> +{
> + return SD_SHARE_PKG_RESOURCES;
> +}
> +
> +/*
> + * We can't just pass cpu_l2_cache_mask() directly because
> + * returns a non-const pointer and the compiler barfs on that.
> + */
> +static const struct cpumask *shared_cache_mask(int cpu)
> +{
> + if (shared_caches)
> + return cpu_l2_cache_mask(cpu);
> +
> + if (has_big_cores)
> + return cpu_smallcore_mask(cpu);
> +
> + return cpu_smt_mask(cpu);
> +}
> +
> +#ifdef CONFIG_SCHED_SMT
> +static const struct cpumask *smallcore_smt_mask(int cpu)
> +{
> + return cpu_smallcore_mask(cpu);
> +}
> +#endif
> +
> +static struct sched_domain_topology_level powerpc_topology[] = {
> +#ifdef CONFIG_SCHED_SMT
> + { cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
> +#endif
> + { shared_cache_mask, powerpc_shared_cache_flags, SD_INIT_NAME(CACHE) },
> + { cpu_cpu_mask, SD_INIT_NAME(DIE) },
> + { NULL, },
> +};
> +
>  static int init_big_cores(void)
>  {
>   int cpu;
> @@ -1249,8 +1307,6 @@ static void add_cpu_to_masks(int cpu)
>   set_cpus_related(cpu, i, cpu_core_mask);
>  }
> 
> -static bool shared_caches;
> -
>  /* Activate a secondary processor. */
>  void start_secondary(void *unused)
>  {
> @@ -1314,62 +1370,6 @@ int setup_profiling_timer(unsigned int multiplier)
>   return 0;
>  }
> 
> -#ifdef CONFIG_SCHED_SMT
> -/* cpumask of CPUs with asymmetric SMT dependency */
> -static int powerpc_smt_flags(void)
> -{
> - int flags = SD_SHARE_CPUCAPACITY | SD_SHARE_PKG_RESOURCES;
> -
> - if (cpu_has_feature(CPU_FTR_ASYM_SMT)) {
> - printk_once(KERN_INFO "Enabling Asymmetric SMT scheduling\n");
> - flags |= SD_ASYM_PACKING;
> - }
> - return flags;
> -}
> -#endif
> -
> -/*
> - * P9 has a slightly odd architecture where pairs of cores share an L2 cache.
> - * This topology makes it *much* cheaper to migrate tasks between adjacent 
> cores
> - * since the migrated task remains cache hot. We want to take advantage of 
> this
> - * at the scheduler level so an extra topology level is required.
> - */
> -static int powerpc_shared_cache_flags(void)
> -{
> - return SD_SHARE_PKG_RESOURCES;
> -}
> -
> -/*
> - * We can't just pass cpu_l2_cache_mask() directly because
> - * returns a non-const pointer and the compiler barfs on that.
> - */
> -static const struct cpumask *shared_cache_mask(int cpu)
> -{
> - if (shared_caches)
> - return cpu_l2_cache_mask(cpu);
> -
> - if (has_big_cores)
> - return cpu_smallcore_mask(cpu);
> -
> - return cpu_smt_mask(cpu);
> -}
> -
> -#ifdef CONFIG_SCHED_SMT
> -static const struct cpumask *smallcore_smt_mask(int cpu)
> -{
> - return cpu_smallcore_mask(cpu);
> -}
> -#endif
> -
> -static struct sched_domain_topology_level powerpc_topology[] = {
> -#ifdef CONFIG_SCHED_SMT
> - { cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
> -#endif
> - { shared_cache_mask, powerpc_shared_cache_flags, SD_INIT_NAME(CACHE) },
> - { cpu_cpu_mask, SD_INIT_NAME(DIE) },
> - { NULL, },
> -};
> -
>  void __init smp_cpus_done(unsigned int max_cpus)
>  {
>   /*
> -- 
> 2.17.1
> 


Re: [PATCH v4 05/10] powerpc/dt_cpu_ftrs: Add feature for 2nd DAWR

2020-07-16 Thread Jordan Niethe
On Fri, Jul 17, 2020 at 2:10 PM Ravi Bangoria
 wrote:
>
> Add new device-tree feature for 2nd DAWR. If this feature is present,
> 2nd DAWR is supported, otherwise not.
>
> Signed-off-by: Ravi Bangoria 
> ---
>  arch/powerpc/include/asm/cputable.h | 7 +--
>  arch/powerpc/kernel/dt_cpu_ftrs.c   | 7 +++
>  2 files changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/cputable.h 
> b/arch/powerpc/include/asm/cputable.h
> index e506d429b1af..3445c86e1f6f 100644
> --- a/arch/powerpc/include/asm/cputable.h
> +++ b/arch/powerpc/include/asm/cputable.h
> @@ -214,6 +214,7 @@ static inline void cpu_feature_keys_init(void) { }
>  #define CPU_FTR_P9_TLBIE_ERAT_BUG  LONG_ASM_CONST(0x0001)
>  #define CPU_FTR_P9_RADIX_PREFETCH_BUG  LONG_ASM_CONST(0x0002)
>  #define CPU_FTR_ARCH_31
> LONG_ASM_CONST(0x0004)
> +#define CPU_FTR_DAWR1  LONG_ASM_CONST(0x0008)
>
>  #ifndef __ASSEMBLY__
>
> @@ -497,14 +498,16 @@ static inline void cpu_feature_keys_init(void) { }
>  #define CPU_FTRS_POSSIBLE  \
> (CPU_FTRS_POWER7 | CPU_FTRS_POWER8E | CPU_FTRS_POWER8 | \
>  CPU_FTR_ALTIVEC_COMP | CPU_FTR_VSX_COMP | CPU_FTRS_POWER9 | \
> -CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10)
> +CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10 
> | \
> +CPU_FTR_DAWR1)
>  #else
>  #define CPU_FTRS_POSSIBLE  \
> (CPU_FTRS_PPC970 | CPU_FTRS_POWER5 | \
>  CPU_FTRS_POWER6 | CPU_FTRS_POWER7 | CPU_FTRS_POWER8E | \
>  CPU_FTRS_POWER8 | CPU_FTRS_CELL | CPU_FTRS_PA6T | \
>  CPU_FTR_VSX_COMP | CPU_FTR_ALTIVEC_COMP | CPU_FTRS_POWER9 | \
> -CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10)
> +CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10 
> | \
> +CPU_FTR_DAWR1)
Instead of putting CPU_FTR_DAWR1 into CPU_FTRS_POSSIBLE, should it go
into CPU_FTRS_POWER10?
Then it will be picked up by CPU_FTRS_POSSIBLE.
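
Purely to make that suggestion concrete (the macro name below is
hypothetical and not part of cputable.h), the idea is to fold the bit
into the POWER10 feature mask, which CPU_FTRS_POSSIBLE already ORs in:

/* Hypothetical sketch only: extend the POWER10 mask instead of
 * CPU_FTRS_POSSIBLE, so anything built from CPU_FTRS_POWER10
 * (including CPU_FTRS_POSSIBLE) picks up the new bit. */
#define CPU_FTRS_POWER10_WITH_DAWR1	(CPU_FTRS_POWER10 | CPU_FTR_DAWR1)
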
>  #endif /* CONFIG_CPU_LITTLE_ENDIAN */
>  #endif
>  #else
> diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c 
> b/arch/powerpc/kernel/dt_cpu_ftrs.c
> index ac650c233cd9..c78cd3596ec4 100644
> --- a/arch/powerpc/kernel/dt_cpu_ftrs.c
> +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
> @@ -574,6 +574,12 @@ static int __init feat_enable_mma(struct dt_cpu_feature 
> *f)
> return 1;
>  }
>
> +static int __init feat_enable_debug_facilities_v31(struct dt_cpu_feature *f)
> +{
> +   cur_cpu_spec->cpu_features |= CPU_FTR_DAWR1;
> +   return 1;
> +}
> +
>  struct dt_cpu_feature_match {
> const char *name;
> int (*enable)(struct dt_cpu_feature *f);
> @@ -649,6 +655,7 @@ static struct dt_cpu_feature_match __initdata
> {"wait-v3", feat_enable, 0},
> {"prefix-instructions", feat_enable, 0},
> {"matrix-multiply-assist", feat_enable_mma, 0},
> +   {"debug-facilities-v31", feat_enable_debug_facilities_v31, 0},
Since all feat_enable_debug_facilities_v31() does is set
CPU_FTR_DAWR1, if you just have:
{"debug-facilities-v31", feat_enable, CPU_FTR_DAWR1},
I think cpufeatures_process_feature() should set it in for you at this point:
if (m->enable(f)) {
cur_cpu_spec->cpu_features |= m->cpu_ftr_bit_mask;
break;
}

>  };
>
>  static bool __initdata using_dt_cpu_ftrs;
> --
> 2.26.2
>


Re: [PATCH 02/11] powerpc/smp: Merge Power9 topology with Power topology

2020-07-16 Thread Gautham R Shenoy
Hi Srikar,

On Tue, Jul 14, 2020 at 10:06:15AM +0530, Srikar Dronamraju wrote:
> A new sched_domain_topology_level was added just for Power9. However, the
> same can be achieved by merging powerpc_topology with power9_topology,
> which makes the code simpler, especially when adding a new sched
> domain.
> 
> Cc: linuxppc-dev 
> Cc: Michael Ellerman 
> Cc: Nick Piggin 
> Cc: Oliver OHalloran 
> Cc: Nathan Lynch 
> Cc: Michael Neuling 
> Cc: Anton Blanchard 
> Cc: Gautham R Shenoy 
> Cc: Vaidyanathan Srinivasan 
> Signed-off-by: Srikar Dronamraju 
> ---
>  arch/powerpc/kernel/smp.c | 33 ++---
>  1 file changed, 10 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 680c0edcc59d..069ea4b21c6d 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -1315,7 +1315,7 @@ int setup_profiling_timer(unsigned int multiplier)
>  }
> 
>  #ifdef CONFIG_SCHED_SMT
> -/* cpumask of CPUs with asymetric SMT dependancy */
> +/* cpumask of CPUs with asymmetric SMT dependency */
>  static int powerpc_smt_flags(void)
>  {
>   int flags = SD_SHARE_CPUCAPACITY | SD_SHARE_PKG_RESOURCES;
> @@ -1328,14 +1328,6 @@ static int powerpc_smt_flags(void)
>  }
>  #endif
> 
> -static struct sched_domain_topology_level powerpc_topology[] = {
> -#ifdef CONFIG_SCHED_SMT
> - { cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
> -#endif
> - { cpu_cpu_mask, SD_INIT_NAME(DIE) },
> - { NULL, },
> -};
> -
>  /*
>   * P9 has a slightly odd architecture where pairs of cores share an L2 cache.
>   * This topology makes it *much* cheaper to migrate tasks between adjacent 
> cores
> @@ -1353,7 +1345,13 @@ static int powerpc_shared_cache_flags(void)
>   */
>  static const struct cpumask *shared_cache_mask(int cpu)
>  {
> - return cpu_l2_cache_mask(cpu);
> + if (shared_caches)
> + return cpu_l2_cache_mask(cpu);
> +
> + if (has_big_cores)
> + return cpu_smallcore_mask(cpu);
> +
> + return cpu_smt_mask(cpu);
>  }
> 
>  #ifdef CONFIG_SCHED_SMT
> @@ -1363,7 +1361,7 @@ static const struct cpumask *smallcore_smt_mask(int cpu)
>  }
>  #endif
> 
> -static struct sched_domain_topology_level power9_topology[] = {
> +static struct sched_domain_topology_level powerpc_topology[] = {


>  #ifdef CONFIG_SCHED_SMT
>   { cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
>  #endif
> @@ -1388,21 +1386,10 @@ void __init smp_cpus_done(unsigned int max_cpus)
>  #ifdef CONFIG_SCHED_SMT
>   if (has_big_cores) {
>   pr_info("Big cores detected but using small core scheduling\n");
> -   power9_topology[0].mask = smallcore_smt_mask;
> +   powerpc_topology[0].mask = smallcore_smt_mask;
>   }
>  #endif
> - /*
> -  * If any CPU detects that it's sharing a cache with another CPU then
> -  * use the deeper topology that is aware of this sharing.
> -  */
> - if (shared_caches) {
> - pr_info("Using shared cache scheduler topology\n");
> - set_sched_topology(power9_topology);
> - } else {
> - pr_info("Using standard scheduler topology\n");
> - set_sched_topology(powerpc_topology);


Ok, so we will go with the three level topology by default (SMT,
CACHE, DIE) and will rely on the sched-domain creation code to
degenerate CACHE domain in case SMT and CACHE have the same set of
CPUs (POWER8 for eg).

From a cleanup perspective this is better, since we won't have to
worry about defining multiple topology structures. But from a
performance point of view, wouldn't we now pay an extra penalty of
degenerating the CACHE domains on POWER8-like systems each time
a CPU comes online?

Do we know how bad it is? If the degeneration takes a few extra
microseconds, that should be ok, I suppose.

--
Thanks and Regards
gautham.


Re: [PATCH 01/11] powerpc/smp: Cache node for reuse

2020-07-16 Thread Gautham R Shenoy
On Tue, Jul 14, 2020 at 10:06:14AM +0530, Srikar Dronamraju wrote:
> cpu_to_node() is an inline function with access to a per_cpu variable.
> However, when it is used repeatedly, it may be cleaner to cache it in a
> local variable.
> 
> Also fix a build error in some weird configs.
> "error: _numa_cpu_lookup_table_ undeclared"
> 
> No functional change
> 
> Cc: linuxppc-dev 
> Cc: Michael Ellerman 
> Cc: Nick Piggin 
> Cc: Oliver OHalloran 
> Cc: Nathan Lynch 
> Cc: Michael Neuling 
> Cc: Anton Blanchard 
> Cc: Gautham R Shenoy 
> Cc: Vaidyanathan Srinivasan 
> Signed-off-by: Srikar Dronamraju 


LGTM.

Reviewed-by: Gautham R. Shenoy 

> ---
>  arch/powerpc/kernel/smp.c | 18 +++---
>  1 file changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 73199470c265..680c0edcc59d 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -843,7 +843,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
> 
>   DBG("smp_prepare_cpus\n");
> 
> - /* 
> + /*
>* setup_cpu may need to be called on the boot cpu. We havent
>* spun any cpus up but lets be paranoid.
>*/
> @@ -854,20 +854,24 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
>   cpu_callin_map[boot_cpuid] = 1;
> 
>   for_each_possible_cpu(cpu) {
> + int node = cpu_to_node(cpu);
> +
>   zalloc_cpumask_var_node(&per_cpu(cpu_sibling_map, cpu),
> - GFP_KERNEL, cpu_to_node(cpu));
> + GFP_KERNEL, node);
>   zalloc_cpumask_var_node(&per_cpu(cpu_l2_cache_map, cpu),
> - GFP_KERNEL, cpu_to_node(cpu));
> + GFP_KERNEL, node);
>   zalloc_cpumask_var_node(&per_cpu(cpu_core_map, cpu),
> - GFP_KERNEL, cpu_to_node(cpu));
> + GFP_KERNEL, node);
> +#ifdef CONFIG_NEED_MULTIPLE_NODES
>   /*
>* numa_node_id() works after this.
>*/
>   if (cpu_present(cpu)) {
> - set_cpu_numa_node(cpu, numa_cpu_lookup_table[cpu]);
> - set_cpu_numa_mem(cpu,
> - local_memory_node(numa_cpu_lookup_table[cpu]));
> + node = numa_cpu_lookup_table[cpu];
> + set_cpu_numa_node(cpu, node);
> + set_cpu_numa_mem(cpu, local_memory_node(node));
>   }
> +#endif
>   }
> 
>   /* Init the cpumasks so the boot CPU is related to itself */
> -- 
> 2.17.1
> 


Re: [PATCH v3 02/12] powerpc/kexec_file: mark PPC64 specific code

2020-07-16 Thread Hari Bathini



On 16/07/20 7:19 am, Thiago Jung Bauermann wrote:
> 
> I didn't forget about this patch. I just wanted to see more of the
> changes before commenting on it.
> 
> Hari Bathini  writes:
> 
>> Some of the kexec_file_load code isn't PPC64 specific. Move PPC64
>> specific code from kexec/file_load.c to kexec/file_load_64.c. Also,
>> rename purgatory/trampoline.S to purgatory/trampoline_64.S in the
>> same spirit.
> 
> There's only a 64 bit implementation of kexec_file_load() so this is a
> somewhat theoretical exercise, but there's no harm in getting the code
> organized, so:
> 
> Reviewed-by: Thiago Jung Bauermann 
> 
> I have just one question below.



>> +/**
>> + * setup_new_fdt_ppc64 - Update the flattened device-tree of the kernel
>> + *   being loaded.
>> + * @image:   kexec image being loaded.
>> + * @fdt: Flattened device tree for the next kernel.
>> + * @initrd_load_addr:Address where the next initrd will be loaded.
>> + * @initrd_len:  Size of the next initrd, or 0 if there will be 
>> none.
>> + * @cmdline: Command line for the next kernel, or NULL if there 
>> will
>> + *   be none.
>> + *
>> + * Returns 0 on success, negative errno on error.
>> + */
>> +int setup_new_fdt_ppc64(const struct kimage *image, void *fdt,
>> +unsigned long initrd_load_addr,
>> +unsigned long initrd_len, const char *cmdline)
>> +{
>> +int chosen_node, ret;
>> +
>> +/* Remove memory reservation for the current device tree. */
>> +ret = delete_fdt_mem_rsv(fdt, __pa(initial_boot_params),
>> + fdt_totalsize(initial_boot_params));
>> +if (ret == 0)
>> +pr_debug("Removed old device tree reservation.\n");
>> +else if (ret != -ENOENT) {
>> +pr_err("Failed to remove old device-tree reservation.\n");
>> +return ret;
>> +}
>> +
>> +ret = setup_new_fdt(image, fdt, initrd_load_addr, initrd_len,
>> +cmdline, &chosen_node);
>> +if (ret)
>> +return ret;
>> +
>> +ret = fdt_setprop(fdt, chosen_node, "linux,booted-from-kexec", NULL, 0);
>> +if (ret)
>> +pr_err("Failed to update device-tree with 
>> linux,booted-from-kexec\n");
>> +
>> +return ret;
>> +}
> 
> For setup_purgatory_ppc64() you start with an empty function and build
> from there, but for setup_new_fdt_ppc64() you moved some code here. Is
> the code above 64 bit specific?

Actually, I was not quite sure if fdt updates like those in patch 6 & patch 9
can be done after the setup_ima_buffer() call. If you can confirm, I will move
them back to setup_purgatory().

Thanks
Hari


Re: [PATCH -next] cpuidle/pseries: Make symbol 'pseries_idle_driver' static

2020-07-16 Thread Daniel Lezcano
On 16/07/2020 14:56, Michael Ellerman wrote:
> On Tue, 14 Jul 2020 22:24:24 +0800, Wei Yongjun wrote:
>> The sparse tool complains as follows:
>>
>> drivers/cpuidle/cpuidle-pseries.c:25:23: warning:
>>  symbol 'pseries_idle_driver' was not declared. Should it be static?
>>
>> 'pseries_idle_driver' is not used outside of this file, so mark
>> it static.
> 
> Applied to powerpc/next.
> 
> [1/1] cpuidle/pseries: Make symbol 'pseries_idle_driver' static
>   
> https://git.kernel.org/powerpc/c/92fe8483b1660feaa602d8be6ca7efe95ae4789b

Rafael already picked the patch.




Re: [PATCH v3 03/12] powerpc/kexec_file: add helper functions for getting memory ranges

2020-07-16 Thread Hari Bathini



On 15/07/20 5:19 am, Thiago Jung Bauermann wrote:
> 
> Hello Hari,
> 
> Hari Bathini  writes:
> 
>> In kexec case, the kernel to be loaded uses the same memory layout as
>> the running kernel. So, passing on the DT of the running kernel would
>> be good enough.
>>
>> But in case of kdump, different memory ranges are needed to manage
>> loading the kdump kernel, booting into it and exporting the elfcore
>> of the crashing kernel. The ranges are exlude memory ranges, usable
> 
> s/exlude/exclude/
> 
>> memory ranges, reserved memory ranges and crash memory ranges.
>>
>> Exclude memory ranges specify the list of memory ranges to avoid while
>> loading kdump segments. Usable memory ranges list the memory ranges
>> that could be used for booting kdump kernel. Reserved memory ranges
>> list the memory regions for the loading kernel's reserve map. Crash
>> memory ranges list the memory ranges to be exported as the crashing
>> kernel's elfcore.
>>
>> Add helper functions for setting up the above mentioned memory ranges.
>> These helpers facilitate understanding the subsequent changes better
>> and make it easy to set up the different memory ranges listed above, as
>> and when appropriate.
>>
>> Signed-off-by: Hari Bathini 
>> Tested-by: Pingfan Liu 
> 



>> +/**
>> + * add_reserved_ranges - Adds "/reserved-ranges" regions exported by f/w
>> + *   to the given memory ranges list.
>> + * @mem_ranges:  Range list to add the memory ranges to.
>> + *
>> + * Returns 0 on success, negative errno on error.
>> + */
>> +int add_reserved_ranges(struct crash_mem **mem_ranges)
>> +{
>> +int i, len, ret = 0;
>> +const __be32 *prop;
>> +
>> +prop = of_get_property(of_root, "reserved-ranges", &len);
>> +if (!prop)
>> +return 0;
>> +
>> +/*
>> + * Each reserved range is an (address,size) pair, 2 cells each,
>> + * totalling 4 cells per range.
> 
> Can you assume that, or do you need to check the #address-cells and
> #size-cells properties of the root node?

Taken from early_reserve_mem_dt(), which did not seem to care.
Should we be doing anything different here?
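
For reference, a minimal sketch of honouring the root node's cell counts
rather than assuming 2+2; the helper name below is made up purely for
illustration:

/* Illustrative only: read #address-cells/#size-cells from the root node,
 * falling back to the 2/2 layout assumed above if the properties are
 * absent. of_property_read_u32() leaves the output untouched on failure,
 * so the defaults survive. */
static void get_root_cell_counts(u32 *addr_cells, u32 *size_cells)
{
	*addr_cells = 2;
	*size_cells = 2;
	of_property_read_u32(of_root, "#address-cells", addr_cells);
	of_property_read_u32(of_root, "#size-cells", size_cells);
}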

Thanks
Hari


Re: [PATCH v3 0/3] Off-load TLB invalidations to host for !GTSE

2020-07-16 Thread Bharata B Rao
On Fri, Jul 17, 2020 at 12:44:00PM +1000, Nicholas Piggin wrote:
> Excerpts from Nicholas Piggin's message of July 17, 2020 12:08 pm:
> > Excerpts from Qian Cai's message of July 17, 2020 3:27 am:
> >> On Fri, Jul 03, 2020 at 11:06:05AM +0530, Bharata B Rao wrote:
> >>> Hypervisor may choose not to enable Guest Translation Shootdown Enable
> >>> (GTSE) option for the guest. When GTSE isn't ON, the guest OS isn't
> >>> permitted to use instructions like tlbie and tlbsync directly, but is
> >>> expected to make hypervisor calls to get the TLB flushed.
> >>> 
> >>> This series enables the TLB flush routines in the radix code to
> >>> off-load TLB flushing to hypervisor via the newly proposed hcall
> >>> H_RPT_INVALIDATE. 
> >>> 
> >>> To easily check the availability of GTSE, it is made an MMU feature.
> >>> The OV5 handling and H_REGISTER_PROC_TBL hcall are changed to
> >>> handle GTSE as an optionally available feature and to not assume GTSE
> >>> when radix support is available.
> >>> 
> >>> The actual hcall implementation for KVM isn't included in this
> >>> patchset and will be posted separately.
> >>> 
> >>> Changes in v3
> >>> =
> >>> - Fixed a bug in the hcall wrapper code where we were missing setting
> >>>   H_RPTI_TYPE_NESTED while retrying the failed flush request with
> >>>   a full flush for the nested case.
> >>> - s/psize_to_h_rpti/psize_to_rpti_pgsize
> >>> 
> >>> v2: 
> >>> https://lore.kernel.org/linuxppc-dev/20200626131000.5207-1-bhar...@linux.ibm.com/T/#t
> >>> 
> >>> Bharata B Rao (2):
> >>>   powerpc/mm: Enable radix GTSE only if supported.
> >>>   powerpc/pseries: H_REGISTER_PROC_TBL should ask for GTSE only if
> >>> enabled
> >>> 
> >>> Nicholas Piggin (1):
> >>>   powerpc/mm/book3s64/radix: Off-load TLB invalidations to host when
> >>> !GTSE
> >> 
> >> Reverting the whole series fixed random memory corruptions during boot on
> >> POWER9 PowerNV systems below.
> > 
> > If I s/mmu_has_feature(MMU_FTR_GTSE)/(1)/g in radix_tlb.c, then the .o
> > disasm is the same as reverting my patch.
> > 
> > Feature bits not being set right? PowerNV should be pretty simple, seems
> > to do the same as FTR_TYPE_RADIX.
> 
> Might need this fix
> 
> ---
> 
> diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
> index 9cc49f265c86..54c9bcea9d4e 100644
> --- a/arch/powerpc/kernel/prom.c
> +++ b/arch/powerpc/kernel/prom.c
> @@ -163,7 +163,7 @@ static struct ibm_pa_feature {
>   { .pabyte = 0,  .pabit = 6, .cpu_features  = CPU_FTR_NOEXECUTE },
>   { .pabyte = 1,  .pabit = 2, .mmu_features  = MMU_FTR_CI_LARGE_PAGE },
>  #ifdef CONFIG_PPC_RADIX_MMU
> - { .pabyte = 40, .pabit = 0, .mmu_features  = MMU_FTR_TYPE_RADIX },
> + { .pabyte = 40, .pabit = 0, .mmu_features  = (MMU_FTR_TYPE_RADIX | 
> MMU_FTR_GTSE) },
>  #endif
>   { .pabyte = 1,  .pabit = 1, .invert = 1, .cpu_features = 
> CPU_FTR_NODSISRALIGN },
>   { .pabyte = 5,  .pabit = 0, .cpu_features  = CPU_FTR_REAL_LE,

Michael - Let me know if this should be folded into 1/3 and the complete
series resent.

Regards,
Bharata.


Re: [PATCH v4 04/10] powerpc/watchpoint: Enable watchpoint functionality on power10 guest

2020-07-16 Thread Jordan Niethe
On Fri, Jul 17, 2020 at 2:10 PM Ravi Bangoria
 wrote:
>
> CPU_FTR_DAWR is by default enabled for host via CPU_FTRS_DT_CPU_BASE
> (controlled by CONFIG_PPC_DT_CPU_FTRS). But cpu-features device-tree
> node is not PAPR compatible and thus not yet used by kvm or pHyp
> guests. Enable watchpoint functionality on power10 guest (both kvm
> and powervm) by adding CPU_FTR_DAWR to CPU_FTRS_POWER10. Note that
> this change does not enable 2nd DAWR support.
>
> Signed-off-by: Ravi Bangoria 
I ran the ptrace-hwbreak selftest successfully within a power10 kvm guest.
Tested-by: Jordan Niethe 
> ---
>  arch/powerpc/include/asm/cputable.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/include/asm/cputable.h 
> b/arch/powerpc/include/asm/cputable.h
> index bac2252c839e..e506d429b1af 100644
> --- a/arch/powerpc/include/asm/cputable.h
> +++ b/arch/powerpc/include/asm/cputable.h
> @@ -478,7 +478,7 @@ static inline void cpu_feature_keys_init(void) { }
> CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
> CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S | \
> CPU_FTR_TM_COMP | CPU_FTR_ARCH_300 | CPU_FTR_PKEY | \
> -   CPU_FTR_ARCH_31)
> +   CPU_FTR_ARCH_31 | CPU_FTR_DAWR)
>  #define CPU_FTRS_CELL  (CPU_FTR_LWSYNC | \
> CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
> CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \
> --
> 2.26.2
>


Re: [PATCH v3 06/12] ppc64/kexec_file: restrict memory usage of kdump kernel

2020-07-16 Thread Hari Bathini



On 17/07/20 3:33 am, Thiago Jung Bauermann wrote:
> 
> Hari Bathini  writes:
> 
>> On 16/07/20 4:22 am, Thiago Jung Bauermann wrote:
>>>
>>> Hari Bathini  writes:
>>>


 
 +   * each representing a memory range.
 +   */
 +  ranges = (len >> 2) / (n_mem_addr_cells + n_mem_size_cells);
 +
 +  for (i = 0; i < ranges; i++) {
 +  base = of_read_number(prop, n_mem_addr_cells);
 +  prop += n_mem_addr_cells;
 +  end = base + of_read_number(prop, n_mem_size_cells) - 1;
>>
>> prop is not used after the above.
>>
>>> You need to `prop += n_mem_size_cells` here.
>>
>> But yeah, adding it would make it look complete in some sense..
> 
> Isn't it used in the next iteration of the loop?

Memory@XXX/reg typically has only one range. I was looking at it
from that perspective which is not right. Will update.

Thanks
Hari


Re: [powerpc:next-test 125/127] arch/powerpc/mm/book3s64/pkeys.c:392:7: error: implicit declaration of function 'is_pkey_enabled'; did you mean

2020-07-16 Thread Aneesh Kumar K.V

On 7/17/20 7:29 AM, kernel test robot wrote:

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
next-test
head:   0fbd1eb4df96e1cbd039e0b95fdf62cf65a7faf9
commit: ed411c66eea2ccf93a634ae661a1f79c2bc63d88 [125/127] 
powerpc/book3s64/pkeys: Remove is_pkey_enabled()
config: powerpc-allmodconfig (attached as .config)
compiler: powerpc64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
 wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
 chmod +x ~/bin/make.cross
 git checkout ed411c66eea2ccf93a634ae661a1f79c2bc63d88
 # save the attached .config to linux build tree
 COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross 
ARCH=powerpc

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

arch/powerpc/mm/book3s64/pkeys.c: In function 'pkey_access_permitted':

arch/powerpc/mm/book3s64/pkeys.c:392:7: error: implicit declaration of function 
'is_pkey_enabled'; did you mean 'arch_pkeys_enabled'? 
[-Werror=implicit-function-declaration]

  392 |  if (!is_pkey_enabled(pkey))
  |   ^~~
  |   arch_pkeys_enabled
cc1: some warnings being treated as errors

vim +392 arch/powerpc/mm/book3s64/pkeys.c




We removed that upstream in

19ab500edb5d6020010caba48ce3b4ce4182ab63 powerpc/mm/pkeys: Make pkey 
access check work on execute_only_key



next-test needs to be rebased?

-aneesh


ASMedia USB 3.x host controllers triggering EEH on POWER9

2020-07-16 Thread Forest Crossman
Hi, all,

I have several ASMedia USB 3.x host controllers (ASM2142 and ASM3142,
both share the same Vendor ID/Device ID pair) that I'd like to use
with a POWER9 system (a Raptor Computing Systems Talos II).
Unfortunately, while the kernel recognizes the controllers just fine,
as soon as I plug in a device, an EEH error occurs and the host
controller gets repeatedly reset until it eventually gets disabled. An
example of one of these errors can be seen here:
https://paste.debian.net/hidden/e39698eb

Based on the "PHB4 Diag-data" reported by the kernel, it seems that
LEM_WOF_R bit 35, PHB_FESR bit 20, and RXE_ARB_FESR bit 28 have been
set. According to the PHB4 specification
(https://ibm.ent.box.com/s/jftnfhceul07qjh9jtn91xwjmclabc71), they
respectively mean the following:
 - ARB: IODA TVT Errors - "TCE Validation Table error occurred. The
entry is invalid, or the PCI Address was out of range as defined by
the TTA bounds in the TVE entry."
 - RXE_ARB OR Error Status - "RXE_ARB error bits, ... OR of all error
status bits."
 - IODA TVT Address Range Error - "IODA Error: The PCI Address was out
of range as defined by the TTA bounds in the TVE entry."

In other words, the ASMedia USB controllers seem to be trying to write
to addresses they're not supposed to, and thankfully the PHB4 is
catching these bad writes before they can cause any corruption of my
system's memory. Of course, this has the unfortunate side-effect that
these devices are completely unable to operate with my computer, and
since it seems to be possible to use these controllers on x86 systems
(presumably because of the less-strict/disabled-by-default IOMMU), I
wonder if maybe it would be possible to work around these errors in
either the kernel or the OPAL firmware? My thinking is that instead of
disconnecting the misbehaving devices, maybe the errors could be
"forgiven" (but still blocked) and the device permitted to continue
operating, possibly with some USB data loss from "writes to nowhere"
or retries that may reduce performance. Or maybe if the issue is
caused by some high address bits being set to random values, those
bits could be masked-off so as to not trigger the errors and even
avoid data loss.

So, my question is, is any of this possible? I know the simple
solution for me is to just RMA the cards and avoid purchasing
ASMedia-based USB host controllers in the future, but the fact that
they still seem to work "mostly ok" on x86 systems (with the
occasional kernel panics and BSODs reported by users) piques my
curiosity and makes me wonder if maybe there's a way for me to have my
cheap, buggy hardware cake and eat it, too.

Now, I'm a novice at kernel hacking, so I don't really know what I'm
doing, but just for fun I did try to paper over the issue by adding an
EEH handler to the xhci driver
(https://paste.debian.net/hidden/16081515), but as you might expect,
that didn't do anything but prevent further communication with the
device. I also read a bunch of the PHB4 and IODA2 specs to see if
maybe there'd be a way to implement that bit-masking thing I
mentioned, but both of those documents are, uh, rather dry reading, so
I haven't read them in their entirety, and I don't know enough about
how this all works to try to search the text for what I need.

All that said, if anyone has any suggestions or comments, I'd be
really interested to hear them, even if it's just to question why I'd
go to such ridiculous lengths to try to get software to account for
buggy hardware.


All the best,

Forest


[PATCH v4 10/10] powerpc/watchpoint: Remove 512 byte boundary

2020-07-16 Thread Ravi Bangoria
Power10 has removed the 512-byte boundary from the match criteria, i.e. the
watch range can cross a 512-byte boundary.

Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/kernel/hw_breakpoint.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/hw_breakpoint.c 
b/arch/powerpc/kernel/hw_breakpoint.c
index c55e67bab271..1f4a1efa0074 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -418,8 +418,9 @@ static int hw_breakpoint_validate_len(struct 
arch_hw_breakpoint *hw)
 
if (dawr_enabled()) {
max_len = DAWR_MAX_LEN;
-   /* DAWR region can't cross 512 bytes boundary */
-   if (ALIGN_DOWN(start_addr, SZ_512) != ALIGN_DOWN(end_addr - 1, 
SZ_512))
+   /* DAWR region can't cross 512 bytes boundary on p10 
predecessors */
+   if (!cpu_has_feature(CPU_FTR_ARCH_31) &&
+   (ALIGN_DOWN(start_addr, SZ_512) != ALIGN_DOWN(end_addr - 1, 
SZ_512)))
return -EINVAL;
} else if (IS_ENABLED(CONFIG_PPC_8xx)) {
/* 8xx can setup a range without limitation */
-- 
2.26.2



[PATCH v4 09/10] powerpc/watchpoint: Return available watchpoints dynamically

2020-07-16 Thread Ravi Bangoria
So far Book3S Powerpc supported only one watchpoint. Power10 is
introducing 2nd DAWR. Enable 2nd DAWR support for Power10.
Availability of 2nd DAWR will depend on CPU_FTR_DAWR1.

Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/include/asm/cputable.h  | 4 +++-
 arch/powerpc/include/asm/hw_breakpoint.h | 5 +++--
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/cputable.h 
b/arch/powerpc/include/asm/cputable.h
index 3445c86e1f6f..36a0851a7a9b 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -633,7 +633,9 @@ enum {
  * Maximum number of hw breakpoint supported on powerpc. Number of
  * breakpoints supported by actual hw might be less than this.
  */
-#define HBP_NUM_MAX	1
+#define HBP_NUM_MAX	2
+#define HBP_NUM_ONE	1
+#define HBP_NUM_TWO	2
 
 #endif /* !__ASSEMBLY__ */
 
diff --git a/arch/powerpc/include/asm/hw_breakpoint.h 
b/arch/powerpc/include/asm/hw_breakpoint.h
index cb424799da0d..d4eab1694bcd 100644
--- a/arch/powerpc/include/asm/hw_breakpoint.h
+++ b/arch/powerpc/include/asm/hw_breakpoint.h
@@ -5,10 +5,11 @@
  * Copyright 2010, IBM Corporation.
  * Author: K.Prasad 
  */
-
 #ifndef _PPC_BOOK3S_64_HW_BREAKPOINT_H
 #define _PPC_BOOK3S_64_HW_BREAKPOINT_H
 
+#include 
+
 #ifdef __KERNEL__
 struct arch_hw_breakpoint {
unsigned long   address;
@@ -46,7 +47,7 @@ struct arch_hw_breakpoint {
 
 static inline int nr_wp_slots(void)
 {
-   return HBP_NUM_MAX;
+   return cpu_has_feature(CPU_FTR_DAWR1) ? HBP_NUM_TWO : HBP_NUM_ONE;
 }
 
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
-- 
2.26.2



[PATCH v4 08/10] powerpc/watchpoint: Guest support for 2nd DAWR hcall

2020-07-16 Thread Ravi Bangoria
2nd DAWR can be set/unset using the H_SET_MODE hcall with resource value 5.
Enable powervm guest support with that. This has no effect on kvm guests
because kvm will return an error if a guest makes the hcall with resource value 5.

Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/include/asm/hvcall.h | 1 +
 arch/powerpc/include/asm/machdep.h| 2 +-
 arch/powerpc/include/asm/plpar_wrappers.h | 5 +
 arch/powerpc/kernel/dawr.c| 2 +-
 arch/powerpc/platforms/pseries/setup.c| 7 +--
 5 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/hvcall.h 
b/arch/powerpc/include/asm/hvcall.h
index b785e9f0071c..33793444144c 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -358,6 +358,7 @@
 #define H_SET_MODE_RESOURCE_SET_DAWR0  2
 #define H_SET_MODE_RESOURCE_ADDR_TRANS_MODE3
 #define H_SET_MODE_RESOURCE_LE 4
+#define H_SET_MODE_RESOURCE_SET_DAWR1  5
 
 /* Values for argument to H_SIGNAL_SYS_RESET */
 #define H_SIGNAL_SYS_RESET_ALL -1
diff --git a/arch/powerpc/include/asm/machdep.h 
b/arch/powerpc/include/asm/machdep.h
index 7bcb6a39..a90b892f0bfe 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -131,7 +131,7 @@ struct machdep_calls {
unsigned long dabrx);
 
/* Set DAWR for this platform, leave empty for default implementation */
-   int (*set_dawr)(unsigned long dawr,
+   int (*set_dawr)(int nr, unsigned long dawr,
unsigned long dawrx);
 
 #ifdef CONFIG_PPC32/* XXX for now */
diff --git a/arch/powerpc/include/asm/plpar_wrappers.h 
b/arch/powerpc/include/asm/plpar_wrappers.h
index d12c3680d946..ece84a430701 100644
--- a/arch/powerpc/include/asm/plpar_wrappers.h
+++ b/arch/powerpc/include/asm/plpar_wrappers.h
@@ -315,6 +315,11 @@ static inline long plpar_set_watchpoint0(unsigned long 
dawr0, unsigned long dawr
return plpar_set_mode(0, H_SET_MODE_RESOURCE_SET_DAWR0, dawr0, dawrx0);
 }
 
+static inline long plpar_set_watchpoint1(unsigned long dawr1, unsigned long 
dawrx1)
+{
+   return plpar_set_mode(0, H_SET_MODE_RESOURCE_SET_DAWR1, dawr1, dawrx1);
+}
+
 static inline long plpar_signal_sys_reset(long cpu)
 {
return plpar_hcall_norets(H_SIGNAL_SYS_RESET, cpu);
diff --git a/arch/powerpc/kernel/dawr.c b/arch/powerpc/kernel/dawr.c
index 500f52fa4711..cdc2dccb987d 100644
--- a/arch/powerpc/kernel/dawr.c
+++ b/arch/powerpc/kernel/dawr.c
@@ -37,7 +37,7 @@ int set_dawr(int nr, struct arch_hw_breakpoint *brk)
dawrx |= (mrd & 0x3f) << (63 - 53);
 
if (ppc_md.set_dawr)
-   return ppc_md.set_dawr(dawr, dawrx);
+   return ppc_md.set_dawr(nr, dawr, dawrx);
 
if (nr == 0) {
mtspr(SPRN_DAWR0, dawr);
diff --git a/arch/powerpc/platforms/pseries/setup.c 
b/arch/powerpc/platforms/pseries/setup.c
index 2db8469e475f..d516ee8eb7fc 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -831,12 +831,15 @@ static int pseries_set_xdabr(unsigned long dabr, unsigned 
long dabrx)
return plpar_hcall_norets(H_SET_XDABR, dabr, dabrx);
 }
 
-static int pseries_set_dawr(unsigned long dawr, unsigned long dawrx)
+static int pseries_set_dawr(int nr, unsigned long dawr, unsigned long dawrx)
 {
/* PAPR says we can't set HYP */
dawrx &= ~DAWRX_HYP;
 
-   return  plpar_set_watchpoint0(dawr, dawrx);
+   if (nr == 0)
+   return plpar_set_watchpoint0(dawr, dawrx);
+   else
+   return plpar_set_watchpoint1(dawr, dawrx);
 }
 
 #define CMO_CHARACTERISTICS_TOKEN 44
-- 
2.26.2



[PATCH v4 07/10] powerpc/watchpoint: Rename current H_SET_MODE DAWR macro

2020-07-16 Thread Ravi Bangoria
Current H_SET_MODE hcall macro name for setting/resetting DAWR0 is
H_SET_MODE_RESOURCE_SET_DAWR. Add suffix 0 to macro name as well.

Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/include/asm/hvcall.h | 2 +-
 arch/powerpc/include/asm/plpar_wrappers.h | 2 +-
 arch/powerpc/kvm/book3s_hv.c  | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/hvcall.h 
b/arch/powerpc/include/asm/hvcall.h
index 43486e773bd6..b785e9f0071c 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -355,7 +355,7 @@
 
 /* Values for 2nd argument to H_SET_MODE */
 #define H_SET_MODE_RESOURCE_SET_CIABR  1
-#define H_SET_MODE_RESOURCE_SET_DAWR   2
+#define H_SET_MODE_RESOURCE_SET_DAWR0  2
 #define H_SET_MODE_RESOURCE_ADDR_TRANS_MODE3
 #define H_SET_MODE_RESOURCE_LE 4
 
diff --git a/arch/powerpc/include/asm/plpar_wrappers.h 
b/arch/powerpc/include/asm/plpar_wrappers.h
index 4293c5d2ddf4..d12c3680d946 100644
--- a/arch/powerpc/include/asm/plpar_wrappers.h
+++ b/arch/powerpc/include/asm/plpar_wrappers.h
@@ -312,7 +312,7 @@ static inline long plpar_set_ciabr(unsigned long ciabr)
 
 static inline long plpar_set_watchpoint0(unsigned long dawr0, unsigned long 
dawrx0)
 {
-   return plpar_set_mode(0, H_SET_MODE_RESOURCE_SET_DAWR, dawr0, dawrx0);
+   return plpar_set_mode(0, H_SET_MODE_RESOURCE_SET_DAWR0, dawr0, dawrx0);
 }
 
 static inline long plpar_signal_sys_reset(long cpu)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 6bf66649ab92..7ad692c2d7c7 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -764,7 +764,7 @@ static int kvmppc_h_set_mode(struct kvm_vcpu *vcpu, 
unsigned long mflags,
return H_P3;
vcpu->arch.ciabr  = value1;
return H_SUCCESS;
-   case H_SET_MODE_RESOURCE_SET_DAWR:
+   case H_SET_MODE_RESOURCE_SET_DAWR0:
if (!kvmppc_power8_compatible(vcpu))
return H_P2;
if (!ppc_breakpoint_available())
-- 
2.26.2



[PATCH v4 06/10] powerpc/watchpoint: Set CPU_FTR_DAWR1 based on pa-features bit

2020-07-16 Thread Ravi Bangoria
As per the PAPR, bit 0 of byte 64 in pa-features property indicates
availability of 2nd DAWR registers. i.e. If this bit is set, 2nd
DAWR is present, otherwise not. Host generally uses "cpu-features",
which masks "pa-features". But "cpu-features" are still not used for
guests and thus this change is mostly applicable for guests only.

Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/kernel/prom.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 9cc49f265c86..c76c09b97bc8 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -175,6 +175,8 @@ static struct ibm_pa_feature {
 */
{ .pabyte = 22, .pabit = 0, .cpu_features = CPU_FTR_TM_COMP,
  .cpu_user_ftrs2 = PPC_FEATURE2_HTM_COMP | PPC_FEATURE2_HTM_NOSC_COMP 
},
+
+   { .pabyte = 64, .pabit = 0, .cpu_features = CPU_FTR_DAWR1 },
 };
 
 static void __init scan_features(unsigned long node, const unsigned char *ftrs,
-- 
2.26.2



[PATCH v4 05/10] powerpc/dt_cpu_ftrs: Add feature for 2nd DAWR

2020-07-16 Thread Ravi Bangoria
Add new device-tree feature for 2nd DAWR. If this feature is present,
2nd DAWR is supported, otherwise not.

Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/include/asm/cputable.h | 7 +--
 arch/powerpc/kernel/dt_cpu_ftrs.c   | 7 +++
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/cputable.h 
b/arch/powerpc/include/asm/cputable.h
index e506d429b1af..3445c86e1f6f 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -214,6 +214,7 @@ static inline void cpu_feature_keys_init(void) { }
 #define CPU_FTR_P9_TLBIE_ERAT_BUG  LONG_ASM_CONST(0x0001)
 #define CPU_FTR_P9_RADIX_PREFETCH_BUG  LONG_ASM_CONST(0x0002)
 #define CPU_FTR_ARCH_31
LONG_ASM_CONST(0x0004)
+#define CPU_FTR_DAWR1  LONG_ASM_CONST(0x0008)
 
 #ifndef __ASSEMBLY__
 
@@ -497,14 +498,16 @@ static inline void cpu_feature_keys_init(void) { }
 #define CPU_FTRS_POSSIBLE  \
(CPU_FTRS_POWER7 | CPU_FTRS_POWER8E | CPU_FTRS_POWER8 | \
 CPU_FTR_ALTIVEC_COMP | CPU_FTR_VSX_COMP | CPU_FTRS_POWER9 | \
-CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10)
+CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10 | 
\
+CPU_FTR_DAWR1)
 #else
 #define CPU_FTRS_POSSIBLE  \
(CPU_FTRS_PPC970 | CPU_FTRS_POWER5 | \
 CPU_FTRS_POWER6 | CPU_FTRS_POWER7 | CPU_FTRS_POWER8E | \
 CPU_FTRS_POWER8 | CPU_FTRS_CELL | CPU_FTRS_PA6T | \
 CPU_FTR_VSX_COMP | CPU_FTR_ALTIVEC_COMP | CPU_FTRS_POWER9 | \
-CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10)
+CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10 | 
\
+CPU_FTR_DAWR1)
 #endif /* CONFIG_CPU_LITTLE_ENDIAN */
 #endif
 #else
diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c 
b/arch/powerpc/kernel/dt_cpu_ftrs.c
index ac650c233cd9..c78cd3596ec4 100644
--- a/arch/powerpc/kernel/dt_cpu_ftrs.c
+++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
@@ -574,6 +574,12 @@ static int __init feat_enable_mma(struct dt_cpu_feature *f)
return 1;
 }
 
+static int __init feat_enable_debug_facilities_v31(struct dt_cpu_feature *f)
+{
+   cur_cpu_spec->cpu_features |= CPU_FTR_DAWR1;
+   return 1;
+}
+
 struct dt_cpu_feature_match {
const char *name;
int (*enable)(struct dt_cpu_feature *f);
@@ -649,6 +655,7 @@ static struct dt_cpu_feature_match __initdata
{"wait-v3", feat_enable, 0},
{"prefix-instructions", feat_enable, 0},
{"matrix-multiply-assist", feat_enable_mma, 0},
+   {"debug-facilities-v31", feat_enable_debug_facilities_v31, 0},
 };
 
 static bool __initdata using_dt_cpu_ftrs;
-- 
2.26.2



[PATCH v4 04/10] powerpc/watchpoint: Enable watchpoint functionality on power10 guest

2020-07-16 Thread Ravi Bangoria
CPU_FTR_DAWR is by default enabled for host via CPU_FTRS_DT_CPU_BASE
(controlled by CONFIG_PPC_DT_CPU_FTRS). But cpu-features device-tree
node is not PAPR compatible and thus not yet used by kvm or pHyp
guests. Enable watchpoint functionality on power10 guest (both kvm
and powervm) by adding CPU_FTR_DAWR to CPU_FTRS_POWER10. Note that
this change does not enable 2nd DAWR support.

Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/include/asm/cputable.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/cputable.h 
b/arch/powerpc/include/asm/cputable.h
index bac2252c839e..e506d429b1af 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -478,7 +478,7 @@ static inline void cpu_feature_keys_init(void) { }
CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S | \
CPU_FTR_TM_COMP | CPU_FTR_ARCH_300 | CPU_FTR_PKEY | \
-   CPU_FTR_ARCH_31)
+   CPU_FTR_ARCH_31 | CPU_FTR_DAWR)
 #define CPU_FTRS_CELL  (CPU_FTR_LWSYNC | \
CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \
-- 
2.26.2



[PATCH v4 03/10] powerpc/watchpoint: Fix DAWR exception for CACHEOP

2020-07-16 Thread Ravi Bangoria
'ea' returned by analyse_instr() needs to be aligned down to cache
block size for CACHEOP instructions. analyse_instr() does not set
size for CACHEOP, thus size also needs to be calculated manually.

Fixes: 27985b2a640e ("powerpc/watchpoint: Don't ignore extraneous exceptions 
blindly")
Fixes: 74c6881019b7 ("powerpc/watchpoint: Prepare handler to handle more than 
one watchpoint")
Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/kernel/hw_breakpoint.c | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/hw_breakpoint.c 
b/arch/powerpc/kernel/hw_breakpoint.c
index a971e22aea81..c55e67bab271 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -538,7 +538,12 @@ static bool check_dawrx_constraints(struct pt_regs *regs, 
int type,
if (OP_IS_LOAD(type) && !(info->type & HW_BRK_TYPE_READ))
return false;
 
-   if (OP_IS_STORE(type) && !(info->type & HW_BRK_TYPE_WRITE))
+   /*
+* The Cache Management instructions other than dcbz never
+* cause a match. i.e. if type is CACHEOP, the instruction
+* is dcbz, and dcbz is treated as Store.
+*/
+   if ((OP_IS_STORE(type) || type == CACHEOP) && !(info->type & 
HW_BRK_TYPE_WRITE))
return false;
 
if (is_kernel_addr(regs->nip) && !(info->type & HW_BRK_TYPE_KERNEL))
@@ -601,6 +606,15 @@ static bool check_constraints(struct pt_regs *regs, struct 
ppc_inst instr,
return false;
 }
 
+static int cache_op_size(void)
+{
+#ifdef __powerpc64__
+   return ppc64_caches.l1d.block_size;
+#else
+   return L1_CACHE_BYTES;
+#endif
+}
+
 static void get_instr_detail(struct pt_regs *regs, struct ppc_inst *instr,
 int *type, int *size, unsigned long *ea)
 {
@@ -616,7 +630,12 @@ static void get_instr_detail(struct pt_regs *regs, struct 
ppc_inst *instr,
if (!(regs->msr & MSR_64BIT))
*ea &= 0xffffffffUL;
 #endif
+
*size = GETSIZE(op.type);
+   if (*type == CACHEOP) {
+   *size = cache_op_size();
+   *ea &= ~(*size - 1);
+   }
 }
 
 static bool is_larx_stcx_instr(int type)
-- 
2.26.2



[PATCH v4 02/10] powerpc/watchpoint: Fix DAWR exception constraint

2020-07-16 Thread Ravi Bangoria
Pedro Miraglia Franco de Carvalho noticed that on p8/p9, the DAR value is
inconsistent across different types of load/store. For byte, word,
etc. load/stores, DAR is set to the address of the first byte of
overlap between the watch range and the real access. But for quadword load/
stores it is sometimes set to the address of the first byte of the real
access and sometimes to the address of the first byte of the
overlap. This issue has been fixed in p10. In p10 (ISA 3.1), DAR is
always set to the address of the first byte of overlap. Commit 27985b2a640e
("powerpc/watchpoint: Don't ignore extraneous exceptions blindly")
wrongly assumes that DAR is set to the address of the first byte of
overlap for all load/stores on p8/p9 as well. Fix that. With the fix,
we now rely on the 'ea' provided by analyse_instr(). If analyse_instr()
fails, generate the event unconditionally on p8/p9, and on p10 generate
the event only if DAR is within a DAWR range.

Note: 8xx is not affected.

Fixes: 27985b2a640e ("powerpc/watchpoint: Don't ignore extraneous exceptions 
blindly")
Fixes: 74c6881019b7 ("powerpc/watchpoint: Prepare handler to handle more than 
one watchpoint")
Reported-by: Pedro Miraglia Franco de Carvalho 
Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/kernel/hw_breakpoint.c | 72 -
 1 file changed, 41 insertions(+), 31 deletions(-)

diff --git a/arch/powerpc/kernel/hw_breakpoint.c 
b/arch/powerpc/kernel/hw_breakpoint.c
index 031e6defc08e..a971e22aea81 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -498,11 +498,11 @@ static bool dar_in_user_range(unsigned long dar, struct 
arch_hw_breakpoint *info
return ((info->address <= dar) && (dar - info->address < info->len));
 }
 
-static bool dar_user_range_overlaps(unsigned long dar, int size,
-   struct arch_hw_breakpoint *info)
+static bool ea_user_range_overlaps(unsigned long ea, int size,
+  struct arch_hw_breakpoint *info)
 {
-   return ((dar < info->address + info->len) &&
-   (dar + size > info->address));
+   return ((ea < info->address + info->len) &&
+   (ea + size > info->address));
 }
 
 static bool dar_in_hw_range(unsigned long dar, struct arch_hw_breakpoint *info)
@@ -515,20 +515,22 @@ static bool dar_in_hw_range(unsigned long dar, struct 
arch_hw_breakpoint *info)
return ((hw_start_addr <= dar) && (hw_end_addr > dar));
 }
 
-static bool dar_hw_range_overlaps(unsigned long dar, int size,
- struct arch_hw_breakpoint *info)
+static bool ea_hw_range_overlaps(unsigned long ea, int size,
+struct arch_hw_breakpoint *info)
 {
unsigned long hw_start_addr, hw_end_addr;
 
hw_start_addr = ALIGN_DOWN(info->address, HW_BREAKPOINT_SIZE);
hw_end_addr = ALIGN(info->address + info->len, HW_BREAKPOINT_SIZE);
 
-   return ((dar < hw_end_addr) && (dar + size > hw_start_addr));
+   return ((ea < hw_end_addr) && (ea + size > hw_start_addr));
 }
 
 /*
  * If hw has multiple DAWR registers, we also need to check all
  * dawrx constraint bits to confirm this is _really_ a valid event.
+ * If type is UNKNOWN, but privilege level matches, consider it as
+ * a positive match.
  */
 static bool check_dawrx_constraints(struct pt_regs *regs, int type,
struct arch_hw_breakpoint *info)
@@ -553,7 +555,8 @@ static bool check_dawrx_constraints(struct pt_regs *regs, 
int type,
  * including extraneous exception. Otherwise return false.
  */
 static bool check_constraints(struct pt_regs *regs, struct ppc_inst instr,
- int type, int size, struct arch_hw_breakpoint 
*info)
+ unsigned long ea, int type, int size,
+ struct arch_hw_breakpoint *info)
 {
bool in_user_range = dar_in_user_range(regs->dar, info);
bool dawrx_constraints;
@@ -569,22 +572,27 @@ static bool check_constraints(struct pt_regs *regs, 
struct ppc_inst instr,
}
 
if (unlikely(ppc_inst_equal(instr, ppc_inst(0)))) {
-   if (in_user_range)
-   return true;
+   if (cpu_has_feature(CPU_FTR_ARCH_31) &&
+   !dar_in_hw_range(regs->dar, info))
+   return false;
 
-   if (dar_in_hw_range(regs->dar, info)) {
-   info->type |= HW_BRK_TYPE_EXTRANEOUS_IRQ;
-   return true;
-   }
-   return false;
+   return true;
}
 
dawrx_constraints = check_dawrx_constraints(regs, type, info);
 
-   if (dar_user_range_overlaps(regs->dar, size, info))
+   if (type == UNKNOWN) {
+   if (cpu_has_feature(CPU_FTR_ARCH_31) &&
+   !dar_in_hw_range(regs->dar, info))
+   return false;
+
return dawrx_constraints;
+   }

[PATCH v4 01/10] powerpc/watchpoint: Fix 512 byte boundary limit

2020-07-16 Thread Ravi Bangoria
Milton Miller reported that we are aligning the start and end address to
the wrong size, SZ_512M. It should be SZ_512. Fix that.

While doing this change I also found a case where the ALIGN() comparison
fails. Within a given aligned range, ALIGN() of two addresses does not
match when the start address points to the first byte and the end address
points to any other byte except the first one. But that's not true
for ALIGN_DOWN(): ALIGN_DOWN() of any two addresses within that range
will always point to the first byte. So use ALIGN_DOWN() instead of
ALIGN().
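
A quick worked example (numbers picked only for illustration, not taken
from the report): consider a watch range that sits entirely inside the
512-byte block 0x200-0x3ff.

/*
 * start_addr = 0x200, end_addr - 1 = 0x2fe
 *
 *   ALIGN(0x200, SZ_512)      == 0x200
 *   ALIGN(0x2fe, SZ_512)      == 0x400   -> mismatch, range wrongly rejected
 *
 *   ALIGN_DOWN(0x200, SZ_512) == 0x200
 *   ALIGN_DOWN(0x2fe, SZ_512) == 0x200   -> match, range correctly accepted
 */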

Fixes: e68ef121c1f4 ("powerpc/watchpoint: Use builtin ALIGN*() macros")
Reported-by: Milton Miller 
Signed-off-by: Ravi Bangoria 
Tested-by: Jordan Niethe 
---
 arch/powerpc/kernel/hw_breakpoint.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/hw_breakpoint.c 
b/arch/powerpc/kernel/hw_breakpoint.c
index daf0e1da..031e6defc08e 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -419,7 +419,7 @@ static int hw_breakpoint_validate_len(struct 
arch_hw_breakpoint *hw)
if (dawr_enabled()) {
max_len = DAWR_MAX_LEN;
/* DAWR region can't cross 512 bytes boundary */
-   if (ALIGN(start_addr, SZ_512M) != ALIGN(end_addr - 1, SZ_512M))
+   if (ALIGN_DOWN(start_addr, SZ_512) != ALIGN_DOWN(end_addr - 1, 
SZ_512))
return -EINVAL;
} else if (IS_ENABLED(CONFIG_PPC_8xx)) {
/* 8xx can setup a range without limitation */
-- 
2.26.2



[PATCH v4 00/10] powerpc/watchpoint: Enable 2nd DAWR on baremetal and powervm

2020-07-16 Thread Ravi Bangoria
Last series[1] was to add basic infrastructure support for more than
one watchpoint on Book3S powerpc. This series actually enables the 2nd 
DAWR for baremetal and powervm. Kvm guest is still not supported.

v3: 
https://lore.kernel.org/lkml/20200708045046.135702-1-ravi.bango...@linux.ibm.com

v3->v4:
 - v3 patch #2 is split into two v4 patches: #2 and #3
 - A few other minor nits suggested by Jordan Niethe
 - Rebased to powerpc/next

[1]: 
https://lore.kernel.org/linuxppc-dev/20200514111741.97993-1-ravi.bango...@linux.ibm.com/

Ravi Bangoria (10):
  powerpc/watchpoint: Fix 512 byte boundary limit
  powerpc/watchpoint: Fix DAWR exception constraint
  powerpc/watchpoint: Fix DAWR exception for CACHEOP
  powerpc/watchpoint: Enable watchpoint functionality on power10 guest
  powerpc/dt_cpu_ftrs: Add feature for 2nd DAWR
  powerpc/watchpoint: Set CPU_FTR_DAWR1 based on pa-features bit
  powerpc/watchpoint: Rename current H_SET_MODE DAWR macro
  powerpc/watchpoint: Guest support for 2nd DAWR hcall
  powerpc/watchpoint: Return available watchpoints dynamically
  powerpc/watchpoint: Remove 512 byte boundary

 arch/powerpc/include/asm/cputable.h   | 13 ++-
 arch/powerpc/include/asm/hvcall.h |  3 +-
 arch/powerpc/include/asm/hw_breakpoint.h  |  5 +-
 arch/powerpc/include/asm/machdep.h|  2 +-
 arch/powerpc/include/asm/plpar_wrappers.h |  7 +-
 arch/powerpc/kernel/dawr.c|  2 +-
 arch/powerpc/kernel/dt_cpu_ftrs.c |  7 ++
 arch/powerpc/kernel/hw_breakpoint.c   | 98 +++
 arch/powerpc/kernel/prom.c|  2 +
 arch/powerpc/kvm/book3s_hv.c  |  2 +-
 arch/powerpc/platforms/pseries/setup.c|  7 +-
 11 files changed, 101 insertions(+), 47 deletions(-)

-- 
2.26.2



Re: [PATCH V5 1/4] mm/debug_vm_pgtable: Add tests validating arch helpers for core MM features

2020-07-16 Thread Anshuman Khandual



On 07/16/2020 07:44 PM, Steven Price wrote:
> On 13/07/2020 04:23, Anshuman Khandual wrote:
>> This adds new tests validating arch page table helpers for these following
>> core memory features. These tests create and test specific mapping types at
>> various page table levels.
>>
>> 1. SPECIAL mapping
>> 2. PROTNONE mapping
>> 3. DEVMAP mapping
>> 4. SOFTDIRTY mapping
>> 5. SWAP mapping
>> 6. MIGRATION mapping
>> 7. HUGETLB mapping
>> 8. THP mapping
>>
>> Cc: Andrew Morton 
>> Cc: Gerald Schaefer 
>> Cc: Christophe Leroy 
>> Cc: Mike Rapoport 
>> Cc: Vineet Gupta 
>> Cc: Catalin Marinas 
>> Cc: Will Deacon 
>> Cc: Benjamin Herrenschmidt 
>> Cc: Paul Mackerras 
>> Cc: Michael Ellerman 
>> Cc: Heiko Carstens 
>> Cc: Vasily Gorbik 
>> Cc: Christian Borntraeger 
>> Cc: Thomas Gleixner 
>> Cc: Ingo Molnar 
>> Cc: Borislav Petkov 
>> Cc: "H. Peter Anvin" 
>> Cc: Kirill A. Shutemov 
>> Cc: Paul Walmsley 
>> Cc: Palmer Dabbelt 
>> Cc: linux-snps-...@lists.infradead.org
>> Cc: linux-arm-ker...@lists.infradead.org
>> Cc: linuxppc-dev@lists.ozlabs.org
>> Cc: linux-s...@vger.kernel.org
>> Cc: linux-ri...@lists.infradead.org
>> Cc: x...@kernel.org
>> Cc: linux...@kvack.org
>> Cc: linux-a...@vger.kernel.org
>> Cc: linux-ker...@vger.kernel.org
>> Tested-by: Vineet Gupta     #arc
>> Reviewed-by: Zi Yan 
>> Suggested-by: Catalin Marinas 
>> Signed-off-by: Anshuman Khandual 
>> ---
>>   mm/debug_vm_pgtable.c | 302 +-
>>   1 file changed, 301 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
>> index 61ab16fb2e36..2fac47db3eb7 100644
>> --- a/mm/debug_vm_pgtable.c
>> +++ b/mm/debug_vm_pgtable.c
> [...]
>> +
>> +static void __init pte_swap_tests(unsigned long pfn, pgprot_t prot)
>> +{
>> +    swp_entry_t swp;
>> +    pte_t pte;
>> +
>> +    pte = pfn_pte(pfn, prot);
>> +    swp = __pte_to_swp_entry(pte);
> 
> Minor issue: this doesn't look necessarily valid - there's no reason a normal 
> PTE can be turned into a swp_entry. In practise this is likely to work on all 
> architectures because there's no reason not to use (at least) all the PFN 
> bits for the swap entry, but it doesn't exactly seem correct.

Agreed that it is a simple test, but it is nonetheless a valid one which
makes sure that the PFN value remains unchanged during the pte <---> swp
conversion.

> 
> Can we start with a swp_entry_t (from __swp_entry()) and check the round trip 
> of that?
> 
> It would also seem sensible to have a check that 
> is_swap_pte(__swp_entry_to_pte(__swp_entry(x,y))) is true.

From past experience, getting any of these new tests involving platform
helpers working on all existing enabled archs is neither trivial nor quick.
The existing tests here are known to succeed on the enabled platforms.
Nonetheless, the tests proposed in the above suggestions do make sense; I
will try to accommodate them in a later patch.
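
For reference, a rough sketch of the round-trip check being suggested, in the
style of the existing helpers in mm/debug_vm_pgtable.c (illustrative only,
untested):

static void __init pte_swap_roundtrip_tests(void)
{
        /* Arbitrary arch-format type/offset pair, as in the example above. */
        swp_entry_t swp = __swp_entry(1, 100);
        pte_t pte = __swp_entry_to_pte(swp);

        /* The constructed swap PTE must be recognised as one ... */
        WARN_ON(!is_swap_pte(pte));

        /* ... and converting back must return the identical entry. */
        WARN_ON(__pte_to_swp_entry(pte).val != swp.val);
}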


Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode

2020-07-16 Thread Alan Stern
On Thu, Jul 16, 2020 at 02:58:41PM -0400, Mathieu Desnoyers wrote:
> - On Jul 16, 2020, at 12:03 PM, Mathieu Desnoyers 
> mathieu.desnoy...@efficios.com wrote:
> 
> > - On Jul 16, 2020, at 11:46 AM, Mathieu Desnoyers
> > mathieu.desnoy...@efficios.com wrote:
> > 
> >> - On Jul 16, 2020, at 12:42 AM, Nicholas Piggin npig...@gmail.com 
> >> wrote:
> >>> I should be more complete here, especially since I was complaining
> >>> about unclear barrier comment :)
> >>> 
> >>> 
> >>> CPU0                     CPU1
> >>> a. user stuff            1. user stuff
> >>> b. membarrier()          2. enter kernel
> >>> c. smp_mb()              3. smp_mb__after_spinlock(); // in __schedule
> >>> d. read rq->curr         4. rq->curr switched to kthread
> >>> e. is kthread, skip IPI  5. switch_to kthread
> >>> f. return to user        6. rq->curr switched to user thread
> >>> g. user stuff            7. switch_to user thread
> >>>                          8. exit kernel
> >>>                          9. more user stuff
> >>> 
> >>> What you're really ordering is a, g vs 1, 9 right?
> >>> 
> >>> In other words, 9 must see a if it sees g, g must see 1 if it saw 9,
> >>> etc.
> >>> 
> >>> Userspace does not care where the barriers are exactly or what kernel
> >>> memory accesses might be being ordered by them, so long as there is a
> >>> mb somewhere between a and g, and 1 and 9. Right?
> >> 
> >> This is correct.
> > 
> > Actually, sorry, the above is not quite right. It's been a while
> > since I looked into the details of membarrier.
> > 
> > The smp_mb() at the beginning of membarrier() needs to be paired with a
> > smp_mb() _after_ rq->curr is switched back to the user thread, so the
> > memory barrier is between store to rq->curr and following user-space
> > accesses.
> > 
> > The smp_mb() at the end of membarrier() needs to be paired with the
> > smp_mb__after_spinlock() at the beginning of schedule, which is
> > between accesses to userspace memory and switching rq->curr to kthread.
> > 
> > As to *why* this ordering is needed, I'd have to dig through additional
> > scenarios from https://lwn.net/Articles/573436/. Or maybe Paul remembers ?
> 
> Thinking further about this, I'm beginning to consider that maybe we have been
> overly cautious by requiring memory barriers before and after store to 
> rq->curr.
> 
> If CPU0 observes a CPU1's rq->curr->mm which differs from its own process 
> (current)
> while running the membarrier system call, it necessarily means that CPU1 had
> to issue smp_mb__after_spinlock when entering the scheduler, between any 
> user-space
> loads/stores and update of rq->curr.
> 
> Requiring a memory barrier between update of rq->curr (back to current 
> process's
> thread) and following user-space memory accesses does not seem to guarantee
> anything more than what the initial barrier at the beginning of __schedule 
> already
> provides, because the guarantees are only about accesses to user-space memory.
> 
> Therefore, with the memory barrier at the beginning of __schedule, just 
> observing that
> CPU1's rq->curr differs from current should guarantee that a memory barrier 
> was issued
> between any sequentially consistent instructions belonging to the current 
> process on
> CPU1.
> 
> Or am I missing/misremembering an important point here ?

Is it correct to say that the switch_to operations in 5 and 7 include 
memory barriers?  If they do, then skipping the IPI should be okay.

The reason is as follows: The guarantee you need to enforce is that 
anything written by CPU0 before the membarrier() will be visible to CPU1 
after it returns to user mode.  Let's say that a writes to X and 9 
reads from X.

Then we have an instance of the Store Buffer pattern:

CPU0                    CPU1
a. Write X              6. Write rq->curr for user thread
c. smp_mb()             7. switch_to memory barrier
d. Read rq->curr        9. Read X

In this pattern, the memory barriers make it impossible for both reads 
to miss their corresponding writes.  Since d does fail to read 6 (it 
sees the earlier value stored by 4), 9 must read a.

The other guarantee you need is that g on CPU0 will observe anything 
written by CPU1 in 1.  This is easier to see, using the fact that 3 is a 
memory barrier and d reads from 4.

Alan Stern
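
As an aside, the store-buffer pattern above is easy to play with from
userspace using C11 atomics.  In the sketch below x and y stand in for X and
rq->curr, and atomic_thread_fence(memory_order_seq_cst) stands in for the
barriers; it only illustrates the ordering argument and is not kernel code:

/* sb_demo.c - userspace analogue of the store-buffer pattern.
 * Build: cc -O2 -pthread sb_demo.c -o sb_demo
 * With both fences present, the r1 == 0 && r2 == 0 outcome is forbidden;
 * drop either fence and it will eventually be observed on most machines.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int x, y, r1, r2;

static void *cpu0(void *arg)
{
    atomic_store_explicit(&x, 1, memory_order_relaxed);  /* a. write X        */
    atomic_thread_fence(memory_order_seq_cst);           /* c. "smp_mb()"     */
    r1 = atomic_load_explicit(&y, memory_order_relaxed); /* d. read rq->curr  */
    return NULL;
}

static void *cpu1(void *arg)
{
    atomic_store_explicit(&y, 1, memory_order_relaxed);  /* 6. write rq->curr */
    atomic_thread_fence(memory_order_seq_cst);           /* 7. switch_to mb   */
    r2 = atomic_load_explicit(&x, memory_order_relaxed); /* 9. read X         */
    return NULL;
}

int main(void)
{
    for (int i = 0; i < 100000; i++) {
        pthread_t t0, t1;

        atomic_store(&x, 0);
        atomic_store(&y, 0);
        pthread_create(&t0, NULL, cpu0, NULL);
        pthread_create(&t1, NULL, cpu1, NULL);
        pthread_join(t0, NULL);
        pthread_join(t1, NULL);
        if (r1 == 0 && r2 == 0) {
            printf("forbidden outcome at iteration %d\n", i);
            return 1;
        }
    }
    printf("forbidden outcome never observed\n");
    return 0;
}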


Re: [PATCH v3 0/3] Off-load TLB invalidations to host for !GTSE

2020-07-16 Thread Nicholas Piggin
Excerpts from Nicholas Piggin's message of July 17, 2020 12:08 pm:
> Excerpts from Qian Cai's message of July 17, 2020 3:27 am:
>> On Fri, Jul 03, 2020 at 11:06:05AM +0530, Bharata B Rao wrote:
>>> Hypervisor may choose not to enable Guest Translation Shootdown Enable
>>> (GTSE) option for the guest. When GTSE isn't ON, the guest OS isn't
>>> permitted to use instructions like tblie and tlbsync directly, but is
>>> expected to make hypervisor calls to get the TLB flushed.
>>> 
>>> This series enables the TLB flush routines in the radix code to
>>> off-load TLB flushing to hypervisor via the newly proposed hcall
>>> H_RPT_INVALIDATE. 
>>> 
>>> To easily check the availability of GTSE, it is made an MMU feature.
>>> The OV5 handling and H_REGISTER_PROC_TBL hcall are changed to
>>> handle GTSE as an optionally available feature and to not assume GTSE
>>> when radix support is available.
>>> 
>>> The actual hcall implementation for KVM isn't included in this
>>> patchset and will be posted separately.
>>> 
>>> Changes in v3
>>> =
>>> - Fixed a bug in the hcall wrapper code where we were missing setting
>>>   H_RPTI_TYPE_NESTED while retrying the failed flush request with
>>>   a full flush for the nested case.
>>> - s/psize_to_h_rpti/psize_to_rpti_pgsize
>>> 
>>> v2: 
>>> https://lore.kernel.org/linuxppc-dev/20200626131000.5207-1-bhar...@linux.ibm.com/T/#t
>>> 
>>> Bharata B Rao (2):
>>>   powerpc/mm: Enable radix GTSE only if supported.
>>>   powerpc/pseries: H_REGISTER_PROC_TBL should ask for GTSE only if
>>> enabled
>>> 
>>> Nicholas Piggin (1):
>>>   powerpc/mm/book3s64/radix: Off-load TLB invalidations to host when
>>> !GTSE
>> 
>> Reverting the whole series fixed random memory corruptions during boot on
>> POWER9 PowerNV systems below.
> 
> If I s/mmu_has_feature(MMU_FTR_GTSE)/(1)/g in radix_tlb.c, then the .o
> disasm is the same as reverting my patch.
> 
> Feature bits not being set right? PowerNV should be pretty simple, seems
> to do the same as FTR_TYPE_RADIX.

Might need this fix

---

diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 9cc49f265c86..54c9bcea9d4e 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -163,7 +163,7 @@ static struct ibm_pa_feature {
{ .pabyte = 0,  .pabit = 6, .cpu_features  = CPU_FTR_NOEXECUTE },
{ .pabyte = 1,  .pabit = 2, .mmu_features  = MMU_FTR_CI_LARGE_PAGE },
 #ifdef CONFIG_PPC_RADIX_MMU
-   { .pabyte = 40, .pabit = 0, .mmu_features  = MMU_FTR_TYPE_RADIX },
+   { .pabyte = 40, .pabit = 0, .mmu_features  = (MMU_FTR_TYPE_RADIX | 
MMU_FTR_GTSE) },
 #endif
{ .pabyte = 1,  .pabit = 1, .invert = 1, .cpu_features = 
CPU_FTR_NODSISRALIGN },
{ .pabyte = 5,  .pabit = 0, .cpu_features  = CPU_FTR_REAL_LE,


Re: [PATCH v3 0/3] Off-load TLB invalidations to host for !GTSE

2020-07-16 Thread Nicholas Piggin
Excerpts from Qian Cai's message of July 17, 2020 3:27 am:
> On Fri, Jul 03, 2020 at 11:06:05AM +0530, Bharata B Rao wrote:
>> Hypervisor may choose not to enable Guest Translation Shootdown Enable
>> (GTSE) option for the guest. When GTSE isn't ON, the guest OS isn't
>> permitted to use instructions like tblie and tlbsync directly, but is
>> expected to make hypervisor calls to get the TLB flushed.
>> 
>> This series enables the TLB flush routines in the radix code to
>> off-load TLB flushing to hypervisor via the newly proposed hcall
>> H_RPT_INVALIDATE. 
>> 
>> To easily check the availability of GTSE, it is made an MMU feature.
>> The OV5 handling and H_REGISTER_PROC_TBL hcall are changed to
>> handle GTSE as an optionally available feature and to not assume GTSE
>> when radix support is available.
>> 
>> The actual hcall implementation for KVM isn't included in this
>> patchset and will be posted separately.
>> 
>> Changes in v3
>> =
>> - Fixed a bug in the hcall wrapper code where we were missing setting
>>   H_RPTI_TYPE_NESTED while retrying the failed flush request with
>>   a full flush for the nested case.
>> - s/psize_to_h_rpti/psize_to_rpti_pgsize
>> 
>> v2: 
>> https://lore.kernel.org/linuxppc-dev/20200626131000.5207-1-bhar...@linux.ibm.com/T/#t
>> 
>> Bharata B Rao (2):
>>   powerpc/mm: Enable radix GTSE only if supported.
>>   powerpc/pseries: H_REGISTER_PROC_TBL should ask for GTSE only if
>> enabled
>> 
>> Nicholas Piggin (1):
>>   powerpc/mm/book3s64/radix: Off-load TLB invalidations to host when
>> !GTSE
> 
> Reverting the whole series fixed random memory corruptions during boot on
> POWER9 PowerNV systems below.

If I s/mmu_has_feature(MMU_FTR_GTSE)/(1)/g in radix_tlb.c, then the .o
disasm is the same as reverting my patch.

Feature bits not being set right? PowerNV should be pretty simple, seems
to do the same as FTR_TYPE_RADIX.

So... test being done before static keys are set up? Shouldn't be. Must
be something obvious I just can't see it.

Thanks,
Nick



[powerpc:next-test 125/127] arch/powerpc/mm/book3s64/pkeys.c:392:7: error: implicit declaration of function 'is_pkey_enabled'; did you mean 'arch_pkeys_enabled'?

2020-07-16 Thread kernel test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
next-test
head:   0fbd1eb4df96e1cbd039e0b95fdf62cf65a7faf9
commit: ed411c66eea2ccf93a634ae661a1f79c2bc63d88 [125/127] 
powerpc/book3s64/pkeys: Remove is_pkey_enabled()
config: powerpc-allmodconfig (attached as .config)
compiler: powerpc64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
git checkout ed411c66eea2ccf93a634ae661a1f79c2bc63d88
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross 
ARCH=powerpc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   arch/powerpc/mm/book3s64/pkeys.c: In function 'pkey_access_permitted':
>> arch/powerpc/mm/book3s64/pkeys.c:392:7: error: implicit declaration of 
>> function 'is_pkey_enabled'; did you mean 'arch_pkeys_enabled'? 
>> [-Werror=implicit-function-declaration]
 392 |  if (!is_pkey_enabled(pkey))
 |   ^~~
 |   arch_pkeys_enabled
   cc1: some warnings being treated as errors

vim +392 arch/powerpc/mm/book3s64/pkeys.c

f2407ef3ba2256 arch/powerpc/mm/pkeys.c Ram Pai 2018-01-18  386  
f2407ef3ba2256 arch/powerpc/mm/pkeys.c Ram Pai 2018-01-18  387  static bool 
pkey_access_permitted(int pkey, bool write, bool execute)
f2407ef3ba2256 arch/powerpc/mm/pkeys.c Ram Pai 2018-01-18  388  {
f2407ef3ba2256 arch/powerpc/mm/pkeys.c Ram Pai 2018-01-18  389  int 
pkey_shift;
f2407ef3ba2256 arch/powerpc/mm/pkeys.c Ram Pai 2018-01-18  390  u64 amr;
f2407ef3ba2256 arch/powerpc/mm/pkeys.c Ram Pai 2018-01-18  391  
f2407ef3ba2256 arch/powerpc/mm/pkeys.c Ram Pai 2018-01-18 @392  if 
(!is_pkey_enabled(pkey))
f2407ef3ba2256 arch/powerpc/mm/pkeys.c Ram Pai 2018-01-18  393  
return true;
f2407ef3ba2256 arch/powerpc/mm/pkeys.c Ram Pai 2018-01-18  394  
f2407ef3ba2256 arch/powerpc/mm/pkeys.c Ram Pai 2018-01-18  395  
pkey_shift = pkeyshift(pkey);
f2407ef3ba2256 arch/powerpc/mm/pkeys.c Ram Pai 2018-01-18  396  if 
(execute && !(read_iamr() & (IAMR_EX_BIT << pkey_shift)))
f2407ef3ba2256 arch/powerpc/mm/pkeys.c Ram Pai 2018-01-18  397  
return true;
f2407ef3ba2256 arch/powerpc/mm/pkeys.c Ram Pai 2018-01-18  398  
f2407ef3ba2256 arch/powerpc/mm/pkeys.c Ram Pai 2018-01-18  399  amr = 
read_amr(); /* Delay reading amr until absolutely needed */
f2407ef3ba2256 arch/powerpc/mm/pkeys.c Ram Pai 2018-01-18  400  return 
((!write && !(amr & (AMR_RD_BIT << pkey_shift))) ||
f2407ef3ba2256 arch/powerpc/mm/pkeys.c Ram Pai 2018-01-18  401  
(write &&  !(amr & (AMR_WR_BIT << pkey_shift;
f2407ef3ba2256 arch/powerpc/mm/pkeys.c Ram Pai 2018-01-18  402  }
f2407ef3ba2256 arch/powerpc/mm/pkeys.c Ram Pai 2018-01-18  403  

:: The code at line 392 was first introduced by commit
:: f2407ef3ba225665ee24965f69bc84435fb590cf powerpc: helper to validate 
key-access permissions of a pte

:: TO: Ram Pai 
:: CC: Michael Ellerman 

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org




[powerpc:next-test] BUILD SUCCESS 0fbd1eb4df96e1cbd039e0b95fdf62cf65a7faf9

2020-07-16 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git  
next-test
branch HEAD: 0fbd1eb4df96e1cbd039e0b95fdf62cf65a7faf9  papr/scm: Add bad memory 
ranges to nvdimm bad ranges

elapsed time: 789m

configs tested: 74
configs skipped: 1

The following configs have been built successfully.
More configs may be tested in the coming days.

arm defconfig
arm  allyesconfig
arm  allmodconfig
arm   allnoconfig
arm64allyesconfig
arm64   defconfig
arm64allmodconfig
arm64 allnoconfig
i386 allyesconfig
i386defconfig
i386  debian-10.3
i386  allnoconfig
ia64 allmodconfig
ia64defconfig
ia64  allnoconfig
ia64 allyesconfig
m68k allmodconfig
m68k  allnoconfig
m68k   sun3_defconfig
m68kdefconfig
m68k allyesconfig
nds32   defconfig
nds32 allnoconfig
csky allyesconfig
cskydefconfig
alpha   defconfig
alphaallyesconfig
nios2   defconfig
nios2allyesconfig
openriscdefconfig
c6x  allyesconfig
c6x   allnoconfig
openrisc allyesconfig
xtensa   allyesconfig
h8300allyesconfig
h8300allmodconfig
xtensa  defconfig
arc defconfig
arc  allyesconfig
sh   allmodconfig
shallnoconfig
microblazeallnoconfig
mips allyesconfig
mips  allnoconfig
mips allmodconfig
pariscallnoconfig
parisc  defconfig
parisc   allyesconfig
parisc   allmodconfig
powerpc defconfig
powerpc  allyesconfig
powerpc  rhel-kconfig
powerpc  allmodconfig
powerpc   allnoconfig
riscvallyesconfig
riscv allnoconfig
riscv   defconfig
riscvallmodconfig
s390 allyesconfig
s390  allnoconfig
s390 allmodconfig
s390defconfig
sparcallyesconfig
sparc   defconfig
sparc64 defconfig
sparc64   allnoconfig
sparc64  allyesconfig
sparc64  allmodconfig
x86_64   rhel
x86_64lkp
x86_64  fedora-25
x86_64rhel-7.6-kselftests
x86_64   rhel-8.3
x86_64  kexec

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


[powerpc:merge] BUILD SUCCESS 3a60e5fbdc3520d429d7cd6affed5a8daf120c6b

2020-07-16 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git  
merge
branch HEAD: 3a60e5fbdc3520d429d7cd6affed5a8daf120c6b  Automatic merge of 
'master', 'next' and 'fixes' (2020-07-16 22:34)

elapsed time: 791m

configs tested: 80
configs skipped: 1

The following configs have been built successfully.
More configs may be tested in the coming days.

arm defconfig
arm  allyesconfig
arm  allmodconfig
arm   allnoconfig
arm64allyesconfig
arm64   defconfig
arm64allmodconfig
arm64 allnoconfig
i386  allnoconfig
i386 allyesconfig
i386defconfig
i386  debian-10.3
ia64 allmodconfig
ia64defconfig
ia64  allnoconfig
ia64 allyesconfig
m68k allmodconfig
m68k  allnoconfig
m68k   sun3_defconfig
m68kdefconfig
m68k allyesconfig
nds32   defconfig
nds32 allnoconfig
csky allyesconfig
cskydefconfig
alpha   defconfig
alphaallyesconfig
nios2   defconfig
nios2allyesconfig
openriscdefconfig
c6x  allyesconfig
c6x   allnoconfig
openrisc allyesconfig
xtensa   allyesconfig
h8300allyesconfig
h8300allmodconfig
xtensa  defconfig
arc defconfig
arc  allyesconfig
sh   allmodconfig
shallnoconfig
microblazeallnoconfig
mips allyesconfig
mips  allnoconfig
mips allmodconfig
pariscallnoconfig
parisc  defconfig
parisc   allyesconfig
parisc   allmodconfig
powerpc defconfig
powerpc  allyesconfig
powerpc  rhel-kconfig
powerpc  allmodconfig
powerpc   allnoconfig
i386 randconfig-a016-20200716
i386 randconfig-a011-20200716
i386 randconfig-a015-20200716
i386 randconfig-a012-20200716
i386 randconfig-a013-20200716
i386 randconfig-a014-20200716
riscvallyesconfig
riscv allnoconfig
riscv   defconfig
riscvallmodconfig
s390 allyesconfig
s390  allnoconfig
s390 allmodconfig
s390defconfig
sparcallyesconfig
sparc   defconfig
sparc64 defconfig
sparc64   allnoconfig
sparc64  allyesconfig
sparc64  allmodconfig
x86_64rhel-7.6-kselftests
x86_64   rhel-8.3
x86_64  kexec
x86_64   rhel
x86_64lkp
x86_64  fedora-25

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


Re: [PATCH net-next] ibmvnic: Increase driver logging

2020-07-16 Thread Stephen Hemminger
On Thu, 16 Jul 2020 13:22:00 -0700
Jakub Kicinski  wrote:

> On Thu, 16 Jul 2020 18:07:37 +0200 Michal Suchánek wrote:
> > On Thu, Jul 16, 2020 at 10:59:58AM -0500, Thomas Falcon wrote:  
> > > On 7/15/20 8:29 PM, David Miller wrote:
> > > > From: Jakub Kicinski 
> > > > Date: Wed, 15 Jul 2020 17:06:32 -0700
> > > > 
> > > > > On Wed, 15 Jul 2020 18:51:55 -0500 Thomas Falcon wrote:
> > > > > > free_netdev(netdev);
> > > > > > dev_set_drvdata(>dev, NULL);
> > > > > > +   netdev_info(netdev, "VNIC client device has been successfully 
> > > > > > removed.\n");
> > > > > A step too far, perhaps.
> > > > > 
> > > > > In general this patch looks a little questionable IMHO, this amount of
> > > > > logging output is not commonly seen in drivers. All the info
> > > > > messages are just static text, not even carrying any extra 
> > > > > information.
> > > > > In an era of ftrace, and bpftrace, do we really need this?
> > > > Agreed, this is too much.  This is debugging, and thus suitable for 
> > > > tracing
> > > > facilities, at best.
> > > 
> > > Thanks for your feedback. I see now that I was overly aggressive with this
> > > patch to be sure, but it would help with narrowing down problems at a 
> > > first
> > > glance, should they arise. The driver in its current state logs very 
> > > little
> > > of what is it doing without the use of additional debugging or tracing
> > > facilities. Would it be worth it to pursue a less aggressive version or
> > > would that be dead on arrival? What are acceptable driver operations to 
> > > log
> > > at this level?
> 
> Sadly it's much more of an art than hard science. Most networking
> drivers will print identifying information when they probe the device
> and then only about major config changes or when link comes up or goes
> down. And obviously when anything unexpected, like an error happens,
> that's key.
> 
> You seem to be adding start / end information for each driver init /
> deinit stage. I'd say try to focus on the actual errors you're trying
> to catch.
> 
> > Also would it be advisable to add the messages as pr_dbg to be enabled on 
> > demand?  
> 
> I personally have had a pretty poor experience with pr_debug() because
> CONFIG_DYNAMIC_DEBUG is not always enabled. Since you're just printing
> static text there shouldn't be much difference between pr_debug and
> ftrace and/or bpftrace, honestly.
> 
> Again, slightly hard to advise not knowing what you're trying to catch.

Linux drivers in general are far too noisy.
In production it is not uncommon to set kernel to suppress all info messages.


Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode

2020-07-16 Thread Nicholas Piggin
Excerpts from Mathieu Desnoyers's message of July 17, 2020 4:58 am:
> - On Jul 16, 2020, at 12:03 PM, Mathieu Desnoyers 
> mathieu.desnoy...@efficios.com wrote:
> 
>> - On Jul 16, 2020, at 11:46 AM, Mathieu Desnoyers
>> mathieu.desnoy...@efficios.com wrote:
>> 
>>> - On Jul 16, 2020, at 12:42 AM, Nicholas Piggin npig...@gmail.com wrote:
>>>> I should be more complete here, especially since I was complaining
>>>> about unclear barrier comment :)
>>>> 
>>>> 
>>>> CPU0                     CPU1
>>>> a. user stuff            1. user stuff
>>>> b. membarrier()          2. enter kernel
>>>> c. smp_mb()              3. smp_mb__after_spinlock(); // in __schedule
>>>> d. read rq->curr         4. rq->curr switched to kthread
>>>> e. is kthread, skip IPI  5. switch_to kthread
>>>> f. return to user        6. rq->curr switched to user thread
>>>> g. user stuff            7. switch_to user thread
>>>>                          8. exit kernel
>>>>                          9. more user stuff
>>>> 
>>>> What you're really ordering is a, g vs 1, 9 right?
>>>> 
>>>> In other words, 9 must see a if it sees g, g must see 1 if it saw 9,
>>>> etc.
>>>> 
>>>> Userspace does not care where the barriers are exactly or what kernel
>>>> memory accesses might be being ordered by them, so long as there is a
>>>> mb somewhere between a and g, and 1 and 9. Right?
>>> 
>>> This is correct.
>> 
>> Actually, sorry, the above is not quite right. It's been a while
>> since I looked into the details of membarrier.
>> 
>> The smp_mb() at the beginning of membarrier() needs to be paired with a
>> smp_mb() _after_ rq->curr is switched back to the user thread, so the
>> memory barrier is between store to rq->curr and following user-space
>> accesses.
>> 
>> The smp_mb() at the end of membarrier() needs to be paired with the
>> smp_mb__after_spinlock() at the beginning of schedule, which is
>> between accesses to userspace memory and switching rq->curr to kthread.
>> 
>> As to *why* this ordering is needed, I'd have to dig through additional
>> scenarios from https://lwn.net/Articles/573436/. Or maybe Paul remembers ?
> 
> Thinking further about this, I'm beginning to consider that maybe we have been
> overly cautious by requiring memory barriers before and after store to 
> rq->curr.
> 
> If CPU0 observes a CPU1's rq->curr->mm which differs from its own process 
> (current)
> while running the membarrier system call, it necessarily means that CPU1 had
> to issue smp_mb__after_spinlock when entering the scheduler, between any 
> user-space
> loads/stores and update of rq->curr.
> 
> Requiring a memory barrier between update of rq->curr (back to current 
> process's
> thread) and following user-space memory accesses does not seem to guarantee
> anything more than what the initial barrier at the beginning of __schedule 
> already
> provides, because the guarantees are only about accesses to user-space memory.
> 
> Therefore, with the memory barrier at the beginning of __schedule, just 
> observing that
> CPU1's rq->curr differs from current should guarantee that a memory barrier 
> was issued
> between any sequentially consistent instructions belonging to the current 
> process on
> CPU1.
> 
> Or am I missing/misremembering an important point here ?

I might have mislead you.

 CPU0            CPU1
 r1=y            x=1
 membarrier()    y=1
 r2=x

membarrier provides if r1==1 then r2==1 (right?)

 CPU0
 r1=y
 membarrier()
   smp_mb();
   t = cpu_rq(1)->curr;
   if (t->mm == mm)
 IPI(CPU1);
   smp_mb()
 r2=x

 vs

 CPU1
   ...
   __schedule()
 smp_mb__after_spinlock()
 rq->curr = kthread
   ...
   __schedule()
 smp_mb__after_spinlock()
 rq->curr = user thread
 exit kernel
 x=1
 y=1

Now these last 3 stores are not ordered, so CPU0 might see y==1 but
rq->curr == kthread, right? Then it will skip the IPI and stores to x 
and y will not be ordered.

So we do need a mb after rq->curr store when mm is switching.

I believe for the global membarrier PF_KTHREAD optimisation, we also 
need a barrier when switching from a kernel thread to user, for the
same reason.

So I think I was wrong to say the barrier is not necessary.

I haven't quite worked out why two mb()s are required in membarrier(),
but at least that's less of a performance concern.

Thanks,
Nick


Re: [PATCH v8 5/8] powerpc/vdso: Prepare for switching VDSO to generic C implementation.

2020-07-16 Thread Tulio Magno Quites Machado Filho
Christophe Leroy  writes:

> Michael Ellerman  wrote:
>
>> Christophe Leroy  writes:
>>> Prepare for switching VDSO to generic C implementation in following
>>> patch. Here, we:
>>> - Modify __get_datapage() to take an offset
>>> - Prepare the helpers to call the C VDSO functions
>>> - Prepare the required callbacks for the C VDSO functions
>>> - Prepare the clocksource.h files to define VDSO_ARCH_CLOCKMODES
>>> - Add the C trampolines to the generic C VDSO functions
>>>
>>> powerpc is a bit special for VDSO as well as system calls in the
>>> way that it requires setting CR SO bit which cannot be done in C.
>>> Therefore, entry/exit needs to be performed in ASM.
>>>
>>> Implementing __arch_get_vdso_data() would clobber the link register,
>>> requiring the caller to save it. As the ASM calling function already
>>> has to set a stack frame and saves the link register before calling
>>> the C vdso function, retrieving the vdso data pointer there is lighter.
>> ...
>>
>>> diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h  
>>> b/arch/powerpc/include/asm/vdso/gettimeofday.h
>>> new file mode 100644
>>> index ..4452897f9bd8
>>> --- /dev/null
>>> +++ b/arch/powerpc/include/asm/vdso/gettimeofday.h
>>> @@ -0,0 +1,175 @@
>>> +/* SPDX-License-Identifier: GPL-2.0 */
>>> +#ifndef __ASM_VDSO_GETTIMEOFDAY_H
>>> +#define __ASM_VDSO_GETTIMEOFDAY_H
>>> +
>>> +#include 
>>> +
>>> +#ifdef __ASSEMBLY__
>>> +
>>> +.macro cvdso_call funct
>>> +  .cfi_startproc
>>> +   PPC_STLUr1, -STACK_FRAME_OVERHEAD(r1)
>>> +   mflrr0
>>> +  .cfi_register lr, r0
>>> +   PPC_STL r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)
>>
>> This doesn't work for me on ppc64(le) with glibc.
>>
>> glibc doesn't create a stack frame before making the VDSO call, so the
>> store of r0 (LR) goes into the caller's frame, corrupting the saved LR,
>> leading to an infinite loop.
>
> Where should it be saved if it can't be saved in the standard location ?

As Michael pointed out, userspace doesn't treat the VDSO as a normal function
call.  In order to keep compatibility with existing software, LR would need to
be saved on another stack frame.

-- 
Tulio Magno


Re: [PATCH 1/1] ASoC: fsl: fsl-asoc-card: Trivial: Fix misspelling of 'exists'

2020-07-16 Thread Mark Brown
On Wed, 15 Jul 2020 10:44:47 +0100, Lee Jones wrote:
> 


Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next

Thanks!

[1/1] ASoC: fsl: fsl-asoc-card: Trivial: Fix misspelling of 'exists'
  commit: 1b58214113481616b74ee4d196e5b1cb683758ee

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark


Re: [PATCH v2 1/1] ASoC: fsl: fsl-asoc-card: Trivial: Fix misspelling of 'exists'

2020-07-16 Thread Mark Brown
On Wed, 15 Jul 2020 16:00:09 +0100, Lee Jones wrote:
> 


Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next

Thanks!

[1/1] ASoC: fsl: fsl-asoc-card: Trivial: Fix misspelling of 'exists'
  commit: 1b58214113481616b74ee4d196e5b1cb683758ee

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark


Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode

2020-07-16 Thread Nicholas Piggin
Excerpts from pet...@infradead.org's message of July 16, 2020 9:00 pm:
> On Thu, Jul 16, 2020 at 08:03:36PM +1000, Nicholas Piggin wrote:
>> Excerpts from Peter Zijlstra's message of July 16, 2020 6:50 pm:
>> > On Wed, Jul 15, 2020 at 10:18:20PM -0700, Andy Lutomirski wrote:
>> >> > On Jul 15, 2020, at 9:15 PM, Nicholas Piggin  wrote:
> 
>> >> But I’m wondering if all this deferred sync stuff is wrong. In the
>> >> brave new world of io_uring and such, perhaps kernel access matter
>> >> too.  Heck, even:
>> > 
>> > IIRC the membarrier SYNC_CORE use-case is about user-space
>> > self-modifying code.
>> > 
>> > Userspace re-uses a text address and needs to SYNC_CORE before it can be
>> > sure the old text is forgotten. Nothing the kernel does matters there.
>> > 
>> > I suppose the manpage could be more clear there.
>> 
>> True, but memory ordering of kernel stores from kernel threads for
>> regular mem barrier is the concern here.
>> 
>> Does io_uring update completion queue from kernel thread or interrupt,
>> for example? If it does, then membarrier will not order such stores
>> with user memory accesses.
> 
> So we're talking about regular membarrier() then? Not the SYNC_CORE
> variant per-se.

Well, both but Andy in this case was wondering about kernel writes
vs user.

> 
> Even there, I'll argue we don't care, but perhaps Mathieu has a
> different opinion. All we care about is that all other threads (or CPUs
> for GLOBAL) observe an smp_mb() before it returns.
> 
> Any serialization against whatever those other threads/CPUs are running
> at the instant of the syscall is external to the syscall, we make no
> guarantees about that. That is, we can fundamentally not say what
> another CPU is executing concurrently. Nor should we want to.
> 
> So if you feel that your membarrier() ought to serialize against remote
> execution, you need to arrange a quiescent state on the remote side
> yourself.
> 
> Now, normally membarrier() is used to implement userspace RCU like
> things, and there all that matters is that the remote CPUs observe the
> beginning of the new grace-period, ie counter flip, and we observe their
> read-side critical sections, or something like that, it's been a while
> since I looked at all that.
> 
> It's always been the case that concurrent syscalls could change user
> memory, io_uring doesn't change that, it just makes it even less well
> defined when that would happen. If you want to serialize against that,
> you need to arrange that externally.

membarrier does replace barrier instructions on remote CPUs, which do
order accesses performed by the kernel on the user address space. So
membarrier should too I guess.

Normal process context accesses like read(2) will do so because they
don't get filtered out from IPIs, but kernel threads using the mm may
not.

Thanks,
Nick


Question about NUMA distance calculation in powerpc/mm/numa.c

2020-07-16 Thread Daniel Henrique Barboza

Hello,


I didn't find an explanation about the 'double the distance' logic in
'git log' or anywhere in the kernel docs:


(arch/powerpc/mm/numa.c, __node_distance()):

for (i = 0; i < distance_ref_points_depth; i++) {
if (distance_lookup_table[a][i] == distance_lookup_table[b][i])
break;

/* Double the distance for each NUMA level */
distance *= 2;
}
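
Concretely, __node_distance() starts from LOCAL_DISTANCE (10), so the loop
above can only produce 10, 20, 40, 80, and so on: one doubling for every
leading reference point on which the two nodes differ.  A small stand-alone
mock of just that arithmetic:

/* toy_distance.c - mock of the doubling loop, nothing PAPR-specific.
 * Build: cc -o toy_distance toy_distance.c
 */
#include <stdio.h>

#define LOCAL_DISTANCE  10              /* same starting value the kernel uses */

static int node_distance(const int *a, const int *b, int depth)
{
    int distance = LOCAL_DISTANCE;

    for (int i = 0; i < depth; i++) {
        if (a[i] == b[i])
            break;
        distance *= 2;                  /* double per differing NUMA level */
    }
    return distance;
}

int main(void)
{
    int n0[] = { 1, 7 };                /* per-node reference-point values */
    int n1[] = { 2, 7 };                /* differs at [0], matches at [1]  */
    int n2[] = { 2, 8 };                /* differs at both levels          */

    printf("identical nodes:    %d\n", node_distance(n0, n0, 2));   /* 10 */
    printf("one level differs:  %d\n", node_distance(n0, n1, 2));   /* 20 */
    printf("both levels differ: %d\n", node_distance(n0, n2, 2));   /* 40 */
    return 0;
}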

For reference, the commit that added it:


commit 41eab6f88f24124df89e38067b3766b7bef06ddb
Author: Anton Blanchard 
Date:   Sun May 16 20:22:31 2010 +

powerpc/numa: Use form 1 affinity to setup node distance
 


Is there a technical reason for the distance being doubled for each
NUMA level?

The reason I'm asking is because of the QEMU/Libvirt capability to define NUMA
node distances in the VMs. For x86, a user is capable of setting any distance
values to the NUMA topology due to how ACPI SLIT works.

The user, of course, wants the pseries guest to behave the same way. The best
we can do for now is document why this will not happen. I'll document the
limitations imposed by the design itself (how ibm,associativity-reference-points
is capped to MAX_DISTANCE_REF_POINTS and so on). I also would like to document
that the pseries kernel will double the distance for each NUMA level, and for
that it would be nice to provide an actual reason for that to happen, if
there is any.


Thanks,


Daniel



Re: [PATCH v3 0/3] Off-load TLB invalidations to host for !GTSE

2020-07-16 Thread Stephen Rothwell
Hi all,

On Thu, 16 Jul 2020 13:27:14 -0400 Qian Cai  wrote:
>
> Reverting the whole series fixed random memory corruptions during boot on
> POWER9 PowerNV systems below.

I will revert those commits from linux-next today as well (they revert
cleanly).

-- 
Cheers,
Stephen Rothwell




Re: [PATCH] powerpc/64: Fix an out of date comment about MMIO ordering

2020-07-16 Thread Benjamin Herrenschmidt
On Thu, 2020-07-16 at 12:38 -0700, Palmer Dabbelt wrote:
> From: Palmer Dabbelt 
> 
> This primitive has been renamed, but because it was spelled incorrectly in the
> first place it must have escaped the fixup patch.  As far as I can tell this
> logic is still correct: smp_mb__after_spinlock() uses the default smp_mb()
> implementation, which is "sync" rather than "hwsync" but those are the same
> (though I'm not that familiar with PowerPC).

Typo ? That must be me ... :)

Looks fine. Yes, sync and hwsync are the same (as opposed to lwsync,
which is lighter weight and doesn't order cache-inhibited accesses).

Cheers,
Ben.

> Signed-off-by: Palmer Dabbelt 
> ---
>  arch/powerpc/kernel/entry_64.S | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> index b3c9f15089b6..7b38b4daca93 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -357,7 +357,7 @@ _GLOBAL(_switch)
>* kernel/sched/core.c).
>*
>* Uncacheable stores in the case of involuntary preemption must
> -  * be taken care of. The smp_mb__before_spin_lock() in __schedule()
> +  * be taken care of. The smp_mb__after_spinlock() in __schedule()
>* is implemented as hwsync on powerpc, which orders MMIO too. So
>* long as there is an hwsync in the context switch path, it will
>* be executed on the source CPU after the task has performed



Re: [PATCH v2 1/3] module: Rename module_alloc() to text_alloc() and move to kernel proper

2020-07-16 Thread Christophe Leroy

Jarkko Sakkinen  wrote:


Rename module_alloc() to text_alloc() and module_memfree() to
text_memfree(), and move them to kernel/text.c, which is unconditionally
compiled to the kernel proper. This allows kprobes, ftrace and bpf to
allocate space for executable code without requiring to compile the modules
support (CONFIG_MODULES=y) in.


You are not changing enough in powerpc to make this work.
On 32-bit powerpc (6xx), when STRICT_KERNEL_RWX is selected, the
vmalloc space is set to NX (no-exec) at the segment level (i.e. per
256 MB zone) unless CONFIG_MODULES is selected.


Christophe




Cc: Andi Kleen 
Suggested-by: Peter Zijlstra 
Signed-off-by: Jarkko Sakkinen 
---
 arch/arm/kernel/Makefile |  3 +-
 arch/arm/kernel/module.c | 21 ---
 arch/arm/kernel/text.c   | 33 ++
 arch/arm64/kernel/Makefile   |  2 +-
 arch/arm64/kernel/module.c   | 42 --
 arch/arm64/kernel/text.c | 54 
 arch/mips/kernel/Makefile|  2 +-
 arch/mips/kernel/module.c|  9 -
 arch/mips/kernel/text.c  | 19 ++
 arch/mips/net/bpf_jit.c  |  4 +--
 arch/nds32/kernel/Makefile   |  2 +-
 arch/nds32/kernel/module.c   |  7 
 arch/nds32/kernel/text.c | 12 +++
 arch/nios2/kernel/Makefile   |  1 +
 arch/nios2/kernel/module.c   | 19 --
 arch/nios2/kernel/text.c | 34 ++
 arch/parisc/kernel/Makefile  |  2 +-
 arch/parisc/kernel/module.c  | 11 --
 arch/parisc/kernel/text.c| 22 
 arch/powerpc/net/bpf_jit_comp.c  |  4 +--
 arch/riscv/kernel/Makefile   |  1 +
 arch/riscv/kernel/module.c   | 12 ---
 arch/riscv/kernel/text.c | 20 +++
 arch/s390/kernel/Makefile|  2 +-
 arch/s390/kernel/ftrace.c|  2 +-
 arch/s390/kernel/module.c| 16 -
 arch/s390/kernel/text.c  | 23 
 arch/sparc/kernel/Makefile   |  1 +
 arch/sparc/kernel/module.c   | 30 
 arch/sparc/kernel/text.c | 39 +
 arch/sparc/net/bpf_jit_comp_32.c |  6 ++--
 arch/unicore32/kernel/Makefile   |  1 +
 arch/unicore32/kernel/module.c   |  7 
 arch/unicore32/kernel/text.c | 18 ++
 arch/x86/kernel/Makefile |  1 +
 arch/x86/kernel/ftrace.c |  4 +--
 arch/x86/kernel/kprobes/core.c   |  4 +--
 arch/x86/kernel/module.c | 49 --
 arch/x86/kernel/text.c   | 60 
 include/linux/moduleloader.h |  4 +--
 kernel/Makefile  |  2 +-
 kernel/bpf/core.c|  4 +--
 kernel/kprobes.c |  4 +--
 kernel/module.c  | 37 ++--
 kernel/text.c| 25 +
 45 files changed, 400 insertions(+), 275 deletions(-)
 create mode 100644 arch/arm/kernel/text.c
 create mode 100644 arch/arm64/kernel/text.c
 create mode 100644 arch/mips/kernel/text.c
 create mode 100644 arch/nds32/kernel/text.c
 create mode 100644 arch/nios2/kernel/text.c
 create mode 100644 arch/parisc/kernel/text.c
 create mode 100644 arch/riscv/kernel/text.c
 create mode 100644 arch/s390/kernel/text.c
 create mode 100644 arch/sparc/kernel/text.c
 create mode 100644 arch/unicore32/kernel/text.c
 create mode 100644 arch/x86/kernel/text.c
 create mode 100644 kernel/text.c

diff --git a/arch/arm/kernel/Makefile b/arch/arm/kernel/Makefile
index 89e5d864e923..69bfacfd60ef 100644
--- a/arch/arm/kernel/Makefile
+++ b/arch/arm/kernel/Makefile
@@ -19,7 +19,8 @@ CFLAGS_REMOVE_return_address.o = -pg
 obj-y  := elf.o entry-common.o irq.o opcodes.o \
   process.o ptrace.o reboot.o \
   setup.o signal.o sigreturn_codes.o \
-  stacktrace.o sys_arm.o time.o traps.o
+  stacktrace.o sys_arm.o time.o traps.o \
+  text.o

 ifneq ($(CONFIG_ARM_UNWIND),y)
 obj-$(CONFIG_FRAME_POINTER)+= return_address.o
diff --git a/arch/arm/kernel/module.c b/arch/arm/kernel/module.c
index e15444b25ca0..13e3442a6b9f 100644
--- a/arch/arm/kernel/module.c
+++ b/arch/arm/kernel/module.c
@@ -33,27 +33,6 @@
 #define MODULES_VADDR  (((unsigned long)_exiprom + ~PMD_MASK) & PMD_MASK)
 #endif

-#ifdef CONFIG_MMU
-void *module_alloc(unsigned long size)
-{
-   gfp_t gfp_mask = GFP_KERNEL;
-   void *p;
-
-   /* Silence the initial allocation */
-   if (IS_ENABLED(CONFIG_ARM_MODULE_PLTS))
-   gfp_mask |= __GFP_NOWARN;
-
-   p = __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
-   gfp_mask, PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
-   __builtin_return_address(0));
-   if (!IS_ENABLED(CONFIG_ARM_MODULE_PLTS) || p)
-   return p;
-   return __vmalloc_node_range(size, 1,  VMALLOC_START, VMALLOC_END,
-   GFP_KERNEL, 

Re: [PATCH v3 07/12] ppc64/kexec_file: add support to relocate purgatory

2020-07-16 Thread Thiago Jung Bauermann


Hari Bathini  writes:

> On 16/07/20 5:50 am, Thiago Jung Bauermann wrote:
>> 
>> Hari Bathini  writes:
>> 
>>> So, add support to relocate purgatory in kexec_file_load system call
>>> by setting up TOC pointer and applying RELA relocations as needed.
>> 
>> If we do want to use a C purgatory, Michael Ellerman had suggested
>> building it as a Position Independent Executable, which greatly reduces
>> the number and types of relocations that are needed. See patches 4 and 9
>> here:
>> 
>> https://lore.kernel.org/linuxppc-dev/1478748449-3894-1-git-send-email-bauer...@linux.vnet.ibm.com/
>> 
>> In the series above I hadn't converted x86 to PIE. If I had done that,
>> possibly Dave Young's opinion would have been different. :-)
>> 
>> If that's still not desirable, he suggested in that discussion lifting
>> some code from x86 to generic code, which I implemented and would
>> simplify this patch as well:
>> 
>> https://lore.kernel.org/linuxppc-dev/5009580.5GxAkTrMYA@morokweng/
>> 
>
> Agreed. But I prefer to work on PIE and/or moving common relocation_add code
> for x86 & s390 to generic code later when I try to build on these purgatory
> changes. So, a separate series later to rework purgatory with the things you
> mentioned above sounds ok?

Sounds ok to me. Let's see what the maintainers think, then.

-- 
Thiago Jung Bauermann
IBM Linux Technology Center


Re: [PATCH v3 09/12] ppc64/kexec_file: setup backup region for kdump kernel

2020-07-16 Thread Thiago Jung Bauermann


Hari Bathini  writes:

> On 16/07/20 7:08 am, Thiago Jung Bauermann wrote:
>>
>> Hari Bathini  writes:
>>
>>> @@ -968,7 +1040,7 @@ int setup_new_fdt_ppc64(const struct kimage *image, 
>>> void *fdt,
>>>
>>> /*
>>>  * Restrict memory usage for kdump kernel by setting up
>>> -* usable memory ranges.
>>> +* usable memory ranges and memory reserve map.
>>>  */
>>> if (image->type == KEXEC_TYPE_CRASH) {
>>> ret = get_usable_memory_ranges();
>>> @@ -980,6 +1052,24 @@ int setup_new_fdt_ppc64(const struct kimage *image, 
>>> void *fdt,
>>> pr_err("Error setting up usable-memory property for 
>>> kdump kernel\n");
>>> goto out;
>>> }
>>> +
>>> +   ret = fdt_add_mem_rsv(fdt, BACKUP_SRC_START + BACKUP_SRC_SIZE,
>>> + crashk_res.start - BACKUP_SRC_SIZE);
>>
>> I believe this answers my question from the other email about how the
>> crashkernel is prevented from stomping in the crashed kernel's memory,
>> right? I needed to think for a bit to understand what the above
>> reservation was protecting. I think it's worth adding a comment.
>
> Right. The reason to add it in the first place is, prom presses the panic 
> button if
> it can't find low memory. Marking it reserved seems to keep it quiet though. 
> so..
>
> Will add comment mentioning that..

Ah, makes sense. Thanks for the explanation.

>>> +void purgatory(void)
>>> +{
>>> +   void *dest, *src;
>>> +
>>> +   src = (void *)BACKUP_SRC_START;
>>> +   if (backup_start) {
>>> +   dest = (void *)backup_start;
>>> +   __memcpy(dest, src, BACKUP_SRC_SIZE);
>>> +   }
>>> +}
>>
>> In general I'm in favor of using C code over assembly, but having to
>> bring in that relocation support just for the above makes me wonder if
>> it's worth it in this case.
>
> I am planning to build on purgatory later with "I'm in purgatory" print 
> support
> for pseries at least and also, sha256 digest check.

Ok. In that case, my preference would be to convert both the powerpc and
x86 purgatories to PIE since this greatly reduces the types of
relocations that are emitted, but better ask Dave Young what he thinks
before going down that route.

--
Thiago Jung Bauermann
IBM Linux Technology Center


Re: [PATCH v3 06/12] ppc64/kexec_file: restrict memory usage of kdump kernel

2020-07-16 Thread Thiago Jung Bauermann


Hari Bathini  writes:

> On 16/07/20 4:22 am, Thiago Jung Bauermann wrote:
>> 
>> Hari Bathini  writes:
>> 
>
> 
>
>>> +/**
>>> + * get_node_path - Get the full path of the given node.
>>> + * @dn:Node.
>>> + * @path:  Updated with the full path of the node.
>>> + *
>>> + * Returns nothing.
>>> + */
>>> +static void get_node_path(struct device_node *dn, char *path)
>>> +{
>>> +   if (!dn)
>>> +   return;
>>> +
>>> +   get_node_path(dn->parent, path);
>> 
>> Is it ok to do recursion in the kernel? In this case I believe it's not
>> problematic since the maximum call depth will be the maximum depth of a
>> device tree node which shouldn't be too much. Also, there are no local
>> variables in this function. But I thought it was worth mentioning.
>
> You are right. We are better off avoiding the recursion here. Will
> change it to an iterative version instead.

Ok.

>>> +* each representing a memory range.
>>> +*/
>>> +   ranges = (len >> 2) / (n_mem_addr_cells + n_mem_size_cells);
>>> +
>>> +   for (i = 0; i < ranges; i++) {
>>> +   base = of_read_number(prop, n_mem_addr_cells);
>>> +   prop += n_mem_addr_cells;
>>> +   end = base + of_read_number(prop, n_mem_size_cells) - 1;
>
> prop is not used after the above.
>
>> You need to `prop += n_mem_size_cells` here.
>
> But yeah, adding it would make it look complete in some sense..

Isn't it used in the next iteration of the loop?

-- 
Thiago Jung Bauermann
IBM Linux Technology Center


Re: [PATCH v3 05/12] powerpc/drmem: make lmb walk a bit more flexible

2020-07-16 Thread Thiago Jung Bauermann


Hari Bathini  writes:

> On 15/07/20 9:20 am, Thiago Jung Bauermann wrote:
>> 
>> Hari Bathini  writes:
>> 
>>> @@ -534,7 +537,7 @@ static int __init 
>>> early_init_dt_scan_memory_ppc(unsigned long node,
>>>  #ifdef CONFIG_PPC_PSERIES
>>> if (depth == 1 &&
>>> strcmp(uname, "ibm,dynamic-reconfiguration-memory") == 0) {
>>> -   walk_drmem_lmbs_early(node, early_init_drmem_lmb);
>>> +   walk_drmem_lmbs_early(node, NULL, early_init_drmem_lmb);
>> 
>> walk_drmem_lmbs_early() can now fail. Should this failure be propagated
>> as a return value of early_init_dt_scan_memory_ppc()?
>   
>> 
>>> return 0;
>>> }
>>>  #endif
>> 
>> 
>>> @@ -787,7 +790,7 @@ static int __init parse_numa_properties(void)
>>>  */
>>> memory = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
>>> if (memory) {
>>> -   walk_drmem_lmbs(memory, numa_setup_drmem_lmb);
>>> +   walk_drmem_lmbs(memory, NULL, numa_setup_drmem_lmb);
>> 
>> Similarly here. Now that this call can fail, should
>> parse_numa_properties() handle or propagate the failure?
>
> They would still not fail unless the callbacks early_init_drmem_lmb() & 
> numa_setup_drmem_lmb()
> are updated to have failure scenarios. Also, these call sites always ignored 
> failure scenarios
> even before walk_drmem_lmbs() was introduced. So, I prefer to keep them the 
> way they are?

Ok, makes sense. In this case:

Reviewed-by: Thiago Jung Bauermann 

-- 
Thiago Jung Bauermann
IBM Linux Technology Center


Re: [PATCH v3 04/12] ppc64/kexec_file: avoid stomping memory used by special regions

2020-07-16 Thread Thiago Jung Bauermann


Hari Bathini  writes:

> On 15/07/20 8:09 am, Thiago Jung Bauermann wrote:
>> 
>> Hari Bathini  writes:
>> 
>
> 
>  
>>> +/**
>>> + * __locate_mem_hole_top_down - Looks top down for a large enough memory 
>>> hole
>>> + *  in the memory regions between buf_min & 
>>> buf_max
>>> + *  for the buffer. If found, sets kbuf->mem.
>>> + * @kbuf:   Buffer contents and memory parameters.
>>> + * @buf_min:Minimum address for the buffer.
>>> + * @buf_max:Maximum address for the buffer.
>>> + *
>>> + * Returns 0 on success, negative errno on error.
>>> + */
>>> +static int __locate_mem_hole_top_down(struct kexec_buf *kbuf,
>>> + u64 buf_min, u64 buf_max)
>>> +{
>>> +   int ret = -EADDRNOTAVAIL;
>>> +   phys_addr_t start, end;
>>> +   u64 i;
>>> +
>>> +   for_each_mem_range_rev(i, , NULL, NUMA_NO_NODE,
>>> +  MEMBLOCK_NONE, , , NULL) {
>>> +   if (start > buf_max)
>>> +   continue;
>>> +
>>> +   /* Memory hole not found */
>>> +   if (end < buf_min)
>>> +   break;
>>> +
>>> +   /* Adjust memory region based on the given range */
>>> +   if (start < buf_min)
>>> +   start = buf_min;
>>> +   if (end > buf_max)
>>> +   end = buf_max;
>>> +
>>> +   start = ALIGN(start, kbuf->buf_align);
>>> +   if (start < end && (end - start + 1) >= kbuf->memsz) {
>> 
>> This is why I dislike using start and end to express address ranges:
>> 
>> While struct resource seems to use the [address, end] convention, my
>
> struct crash_mem also uses [address, end] convention.
> This off-by-one error did not cause any issues as the hole start and size we 
> try to find
> are at least page aligned.
>
> Nonetheless, I think fixing 'end' early in the loop with "end -= 1" would 
> ensure
> correctness while continuing to use the same convention for structs crash_mem 
> & resource.

Sounds good.

-- 
Thiago Jung Bauermann
IBM Linux Technology Center


Re: [PATCH v3 10/12] ppc64/kexec_file: prepare elfcore header for crashing kernel

2020-07-16 Thread Thiago Jung Bauermann


Hari Bathini  writes:

> On 16/07/20 7:52 am, Thiago Jung Bauermann wrote:
>> 
>> Hari Bathini  writes:
>> 
>>>  /**
>>> + * get_crash_memory_ranges - Get crash memory ranges. This list includes
>>> + *   first/crashing kernel's memory regions that
>>> + *   would be exported via an elfcore.
>>> + * @mem_ranges:  Range list to add the memory ranges to.
>>> + *
>>> + * Returns 0 on success, negative errno on error.
>>> + */
>>> +static int get_crash_memory_ranges(struct crash_mem **mem_ranges)
>>> +{
>>> +   struct memblock_region *reg;
>>> +   struct crash_mem *tmem;
>>> +   int ret;
>>> +
>>> +   for_each_memblock(memory, reg) {
>>> +   u64 base, size;
>>> +
>>> +   base = (u64)reg->base;
>>> +   size = (u64)reg->size;
>>> +
>>> +   /* Skip backup memory region, which needs a separate entry */
>>> +   if (base == BACKUP_SRC_START) {
>>> +   if (size > BACKUP_SRC_SIZE) {
>>> +   base = BACKUP_SRC_END + 1;
>>> +   size -= BACKUP_SRC_SIZE;
>>> +   } else
>>> +   continue;
>>> +   }
>>> +
>>> +   ret = add_mem_range(mem_ranges, base, size);
>>> +   if (ret)
>>> +   goto out;
>>> +
>>> +   /* Try merging adjacent ranges before reallocation attempt */
>>> +   if ((*mem_ranges)->nr_ranges == (*mem_ranges)->max_nr_ranges)
>>> +   sort_memory_ranges(*mem_ranges, true);
>>> +   }
>>> +
>>> +   /* Reallocate memory ranges if there is no space to split ranges */
>>> +   tmem = *mem_ranges;
>>> +   if (tmem && (tmem->nr_ranges == tmem->max_nr_ranges)) {
>>> +   tmem = realloc_mem_ranges(mem_ranges);
>>> +   if (!tmem)
>>> +   goto out;
>>> +   }
>>> +
>>> +   /* Exclude crashkernel region */
>>> +   ret = crash_exclude_mem_range(tmem, crashk_res.start, crashk_res.end);
>>> +   if (ret)
>>> +   goto out;
>>> +
>>> +   ret = add_rtas_mem_range(mem_ranges);
>>> +   if (ret)
>>> +   goto out;
>>> +
>>> +   ret = add_opal_mem_range(mem_ranges);
>>> +   if (ret)
>>> +   goto out;
>> 
>> Maybe I'm confused, but don't you add the RTAS and OPAL regions as
>> usable memory for the crashkernel? In that case they shouldn't show up
>> in the core file.
>
> kexec-tools does the same thing. I am not endorsing it but I was trying to 
> stay
> in parity to avoid breaking any userspace tools/commands. But as you rightly
> pointed out, this is NOT right. The right thing to do, to get the rtas/opal data
> at
> the time of crash, is to have a backup region for them just like we have for
> the first 64K memory. I was hoping to do that later.
>
> Will check how userspace tools respond to dropping these regions. If that 
> makes
> the tools unhappy, will retain the regions with a FIXME. Sorry about the 
> confusion.

No problem, thanks for the clarification.

-- 
Thiago Jung Bauermann
IBM Linux Technology Center


Re: [PATCH v3 07/12] ppc64/kexec_file: add support to relocate purgatory

2020-07-16 Thread Hari Bathini



On 16/07/20 5:50 am, Thiago Jung Bauermann wrote:
> 
> Hari Bathini  writes:
> 
>> Right now purgatory implementation is only minimal. But if purgatory
>> code is to be enhanced to copy memory to the backup region and verify
> 
> Can't the memcpy be done in asm? We have arch/powerpc/lib/memcpy_64.S
> for example, perhaps it could be linked in with the purgatory?

I wanted to avoid touching common code to make it work for purgatory
for now.

> 
>> sha256 digest, relocations may have to be applied to the purgatory.
> 
> Do we want to do the sha256 verification? My original patch series for
> kexec_file_load() had a purgatory in C from kexec-tools which did the
> sha256 verification but Michael Ellerman thought it was unnecessary and
> decided to use the simpler purgatory in asm from kexec-lite.

kexec_file_load could also be used without IMA or secure boot. With the
sha256 digest calculated anyway, verifying it would make sense to
accommodate that case as well.

> 
>> So, add support to relocate purgatory in kexec_file_load system call
>> by setting up TOC pointer and applying RELA relocations as needed.
> 
> If we do want to use a C purgatory, Michael Ellerman had suggested
> building it as a Position Independent Executable, which greatly reduces
> the number and types of relocations that are needed. See patches 4 and 9
> here:
> 
> https://lore.kernel.org/linuxppc-dev/1478748449-3894-1-git-send-email-bauer...@linux.vnet.ibm.com/
> 
> In the series above I hadn't converted x86 to PIE. If I had done that,
> possibly Dave Young's opinion would have been different. :-)
> 
> If that's still not desirable, he suggested in that discussion lifting
> some code from x86 to generic code, which I implemented and would
> simplify this patch as well:
> 
> https://lore.kernel.org/linuxppc-dev/5009580.5GxAkTrMYA@morokweng/
> 

Agreed. But I prefer to work on PIE and/or moving common relocation_add code
for x86 & s390 to generic code later when I try to build on these purgatory
changes. So, a separate series later to rework purgatory with the things you
mentioned above sounds ok?

Thanks
Hari



Re: [PATCH v3 09/12] ppc64/kexec_file: setup backup region for kdump kernel

2020-07-16 Thread Hari Bathini



On 16/07/20 7:08 am, Thiago Jung Bauermann wrote:
> 
> Hari Bathini  writes:
> 
>> @@ -968,7 +1040,7 @@ int setup_new_fdt_ppc64(const struct kimage *image, 
>> void *fdt,
>>
>>  /*
>>   * Restrict memory usage for kdump kernel by setting up
>> - * usable memory ranges.
>> + * usable memory ranges and memory reserve map.
>>   */
>>  if (image->type == KEXEC_TYPE_CRASH) {
>>  ret = get_usable_memory_ranges();
>> @@ -980,6 +1052,24 @@ int setup_new_fdt_ppc64(const struct kimage *image, 
>> void *fdt,
>>  pr_err("Error setting up usable-memory property for 
>> kdump kernel\n");
>>  goto out;
>>  }
>> +
>> +ret = fdt_add_mem_rsv(fdt, BACKUP_SRC_START + BACKUP_SRC_SIZE,
>> +  crashk_res.start - BACKUP_SRC_SIZE);
> 
> I believe this answers my question from the other email about how the
> crashkernel is prevented from stomping in the crashed kernel's memory,
> right? I needed to think for a bit to understand what the above
> reservation was protecting. I think it's worth adding a comment.

Right. The reason to add it in the first place is that prom presses the panic
button if it can't find low memory. Marking the region reserved seems to keep
it quiet, though.

Will add a comment mentioning that.

>> +void purgatory(void)
>> +{
>> +void *dest, *src;
>> +
>> +src = (void *)BACKUP_SRC_START;
>> +if (backup_start) {
>> +dest = (void *)backup_start;
>> +__memcpy(dest, src, BACKUP_SRC_SIZE);
>> +}
>> +}
> 
> In general I'm in favor of using C code over assembly, but having to
> bring in that relocation support just for the above makes me wonder if
> it's worth it in this case.

I am planning to build on the purgatory later with "I'm in purgatory" print
support, for pseries at least, and also a sha256 digest check.

Thanks
Hari


Re: [PATCH v3 06/12] ppc64/kexec_file: restrict memory usage of kdump kernel

2020-07-16 Thread Hari Bathini



On 16/07/20 4:22 am, Thiago Jung Bauermann wrote:
> 
> Hari Bathini  writes:
> 



>> +/**
>> + * get_node_path - Get the full path of the given node.
>> + * @dn:Node.
>> + * @path:  Updated with the full path of the node.
>> + *
>> + * Returns nothing.
>> + */
>> +static void get_node_path(struct device_node *dn, char *path)
>> +{
>> +if (!dn)
>> +return;
>> +
>> +get_node_path(dn->parent, path);
> 
> Is it ok to do recursion in the kernel? In this case I believe it's not
> problematic since the maximum call depth will be the maximum depth of a
> device tree node which shouldn't be too much. Also, there are no local
> variables in this function. But I thought it was worth mentioning.

You are right. We are better off avoiding the recursion here. Will
change it to an iterative version instead.
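
For what it's worth, an iterative version could look roughly like the sketch
below (it assumes, as the patch does, that the caller's buffer is large
enough, and that dn->full_name holds just the node's own name component):

        static void get_node_path(struct device_node *dn, char *path)
        {
                struct device_node *p;
                int len = 0;

                /* First pass: work out the total length of the path. */
                for (p = dn; p && p->parent; p = p->parent)
                        len += strlen(p->full_name) + 1;        /* +1 for '/' */

                /* Second pass: fill the buffer backwards, deepest node last. */
                path[len] = '\0';
                for (p = dn; p && p->parent; p = p->parent) {
                        int l = strlen(p->full_name);

                        len -= l;
                        memcpy(path + len, p->full_name, l);
                        path[--len] = '/';
                }
        }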
 
>> + * each representing a memory range.
>> + */
>> +ranges = (len >> 2) / (n_mem_addr_cells + n_mem_size_cells);
>> +
>> +for (i = 0; i < ranges; i++) {
>> +base = of_read_number(prop, n_mem_addr_cells);
>> +prop += n_mem_addr_cells;
>> +end = base + of_read_number(prop, n_mem_size_cells) - 1;

prop is not used after the above.

> You need to `prop += n_mem_size_cells` here.

But yeah, adding it would make it look complete in some sense..
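
i.e. something along these lines, purely for symmetry (a sketch of the loop
body only):

        for (i = 0; i < ranges; i++) {
                base = of_read_number(prop, n_mem_addr_cells);
                prop += n_mem_addr_cells;
                end = base + of_read_number(prop, n_mem_size_cells) - 1;
                prop += n_mem_size_cells;       /* step past the size cells as well */
                ...
        }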

Thanks
Hari


Re: [PATCH v3 05/12] powerpc/drmem: make lmb walk a bit more flexible

2020-07-16 Thread Hari Bathini



On 15/07/20 9:20 am, Thiago Jung Bauermann wrote:
> 
> Hari Bathini  writes:
> 
>> @@ -534,7 +537,7 @@ static int __init early_init_dt_scan_memory_ppc(unsigned 
>> long node,
>>  #ifdef CONFIG_PPC_PSERIES
>>  if (depth == 1 &&
>>  strcmp(uname, "ibm,dynamic-reconfiguration-memory") == 0) {
>> -walk_drmem_lmbs_early(node, early_init_drmem_lmb);
>> +walk_drmem_lmbs_early(node, NULL, early_init_drmem_lmb);
> 
> walk_drmem_lmbs_early() can now fail. Should this failure be propagated
> as a return value of early_init_dt_scan_memory_ppc()?
  
> 
>>  return 0;
>>  }
>>  #endif
> 
> 
>> @@ -787,7 +790,7 @@ static int __init parse_numa_properties(void)
>>   */
>>  memory = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
>>  if (memory) {
>> -walk_drmem_lmbs(memory, numa_setup_drmem_lmb);
>> +walk_drmem_lmbs(memory, NULL, numa_setup_drmem_lmb);
> 
> Similarly here. Now that this call can fail, should
> parse_numa_properties() handle or propagate the failure?

They would still not fail unless the callbacks early_init_drmem_lmb() &
numa_setup_drmem_lmb() are updated to have failure scenarios. Also, these call
sites always ignored failure scenarios even before walk_drmem_lmbs() was
introduced. So, I prefer to keep them the way they are?

Thanks
Hari


Re: [PATCH v3 04/12] ppc64/kexec_file: avoid stomping memory used by special regions

2020-07-16 Thread Hari Bathini



On 15/07/20 8:09 am, Thiago Jung Bauermann wrote:
> 
> Hari Bathini  writes:
> 


 
>> +/**
>> + * __locate_mem_hole_top_down - Looks top down for a large enough memory 
>> hole
>> + *  in the memory regions between buf_min & 
>> buf_max
>> + *  for the buffer. If found, sets kbuf->mem.
>> + * @kbuf:   Buffer contents and memory parameters.
>> + * @buf_min:Minimum address for the buffer.
>> + * @buf_max:Maximum address for the buffer.
>> + *
>> + * Returns 0 on success, negative errno on error.
>> + */
>> +static int __locate_mem_hole_top_down(struct kexec_buf *kbuf,
>> +  u64 buf_min, u64 buf_max)
>> +{
>> +int ret = -EADDRNOTAVAIL;
>> +phys_addr_t start, end;
>> +u64 i;
>> +
>> +for_each_mem_range_rev(i, &memblock.memory, NULL, NUMA_NO_NODE,
>> +   MEMBLOCK_NONE, &start, &end, NULL) {
>> +if (start > buf_max)
>> +continue;
>> +
>> +/* Memory hole not found */
>> +if (end < buf_min)
>> +break;
>> +
>> +/* Adjust memory region based on the given range */
>> +if (start < buf_min)
>> +start = buf_min;
>> +if (end > buf_max)
>> +end = buf_max;
>> +
>> +start = ALIGN(start, kbuf->buf_align);
>> +if (start < end && (end - start + 1) >= kbuf->memsz) {
> 
> This is why I dislike using start and end to express address ranges:
> 
> While struct resource seems to use the [address, end] convention, my

struct crash_mem also uses the [address, end] convention. This off-by-one
error did not cause any issues, as the hole start and size we try to find are
at least page aligned.

Nonetheless, I think fixing 'end' early in the loop with "end -= 1" would
ensure correctness while continuing to use the same convention for structs
crash_mem & resource.
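
i.e. roughly (sketch only):

        for_each_mem_range_rev(i, &memblock.memory, NULL, NUMA_NO_NODE,
                               MEMBLOCK_NONE, &start, &end, NULL) {
                /* memblock hands back an exclusive end; make it inclusive. */
                end -= 1;

                if (start > buf_max)
                        continue;
                ...
        }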

Thanks
Hari


Re: [PATCH v3 03/12] powerpc/kexec_file: add helper functions for getting memory ranges

2020-07-16 Thread Hari Bathini



On 15/07/20 5:19 am, Thiago Jung Bauermann wrote:
> 



> 
> 
>> +/**
>> + * get_mem_rngs_size - Get the allocated size of mrngs based on
>> + * max_nr_ranges and chunk size.
>> + * @mrngs: Memory ranges.
>> + *
>> + * Returns the maximum no. of ranges.
> 
> This isn't correct. It returns the maximum size of @mrngs.

True. Will update..

> 
> 
>> +/**
>> + * add_tce_mem_ranges - Adds tce-table range to the given memory ranges 
>> list.
>> + * @mem_ranges: Range list to add the memory range(s) to.
>> + *
>> + * Returns 0 on success, negative errno on error.
>> + */
>> +int add_tce_mem_ranges(struct crash_mem **mem_ranges)
>> +{
>> +struct device_node *dn;
>> +int ret;
>> +
>> +for_each_node_by_type(dn, "pci") {
>> +u64 base;
>> +u32 size;
>> +
>> +ret = of_property_read_u64(dn, "linux,tce-base", &base);
>> +ret |= of_property_read_u32(dn, "linux,tce-size", &size);
>> +if (!ret)
> 
> Shouldn't the condition be `ret` instead of `!ret`?

Oops! Will fix it.

>> +/**
>> + * sort_memory_ranges - Sorts the given memory ranges list.
>> + * @mem_ranges: Range list to sort.
>> + * @merge:  If true, merge the list after sorting.
>> + *
>> + * Returns nothing.
>> + */
>> +void sort_memory_ranges(struct crash_mem *mrngs, bool merge)
>> +{
>> +struct crash_mem_range *rngs;
>> +struct crash_mem_range rng;
>> +int i, j, idx;
>> +
>> +if (!mrngs)
>> +return;
>> +
>> +/* Sort the ranges in-place */
>> +rngs = &mrngs->ranges[0];
>> +for (i = 0; i < mrngs->nr_ranges; i++) {
>> +idx = i;
>> +for (j = (i + 1); j < mrngs->nr_ranges; j++) {
>> +if (rngs[idx].start > rngs[j].start)
>> +idx = j;
>> +}
>> +if (idx != i) {
>> +rng = rngs[idx];
>> +rngs[idx] = rngs[i];
>> +rngs[i] = rng;
>> +}
>> +}
> 
> Would it work using sort() from lib/sort.c here?

Yeah. I think we could reuse it with a simple compare callback. Will do that.
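
Something like the following should be all that is needed (a sketch, reusing
the field names from the patch):

        #include <linux/sort.h>

        static int rngcmp(const void *x1, const void *x2)
        {
                const struct crash_mem_range *r1 = x1, *r2 = x2;

                if (r1->start == r2->start)
                        return 0;
                return (r1->start < r2->start) ? -1 : 1;
        }

        /* ... and in sort_memory_ranges(), instead of the open-coded selection sort: */
        sort(mrngs->ranges, mrngs->nr_ranges, sizeof(mrngs->ranges[0]),
             rngcmp, NULL);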

Thanks
Hari


Re: [PATCH v3 10/12] ppc64/kexec_file: prepare elfcore header for crashing kernel

2020-07-16 Thread Hari Bathini



On 16/07/20 7:52 am, Thiago Jung Bauermann wrote:
> 
> Hari Bathini  writes:
> 
>>  /**
>> + * get_crash_memory_ranges - Get crash memory ranges. This list includes
>> + *   first/crashing kernel's memory regions that
>> + *   would be exported via an elfcore.
>> + * @mem_ranges:  Range list to add the memory ranges to.
>> + *
>> + * Returns 0 on success, negative errno on error.
>> + */
>> +static int get_crash_memory_ranges(struct crash_mem **mem_ranges)
>> +{
>> +struct memblock_region *reg;
>> +struct crash_mem *tmem;
>> +int ret;
>> +
>> +for_each_memblock(memory, reg) {
>> +u64 base, size;
>> +
>> +base = (u64)reg->base;
>> +size = (u64)reg->size;
>> +
>> +/* Skip backup memory region, which needs a separate entry */
>> +if (base == BACKUP_SRC_START) {
>> +if (size > BACKUP_SRC_SIZE) {
>> +base = BACKUP_SRC_END + 1;
>> +size -= BACKUP_SRC_SIZE;
>> +} else
>> +continue;
>> +}
>> +
>> +ret = add_mem_range(mem_ranges, base, size);
>> +if (ret)
>> +goto out;
>> +
>> +/* Try merging adjacent ranges before reallocation attempt */
>> +if ((*mem_ranges)->nr_ranges == (*mem_ranges)->max_nr_ranges)
>> +sort_memory_ranges(*mem_ranges, true);
>> +}
>> +
>> +/* Reallocate memory ranges if there is no space to split ranges */
>> +tmem = *mem_ranges;
>> +if (tmem && (tmem->nr_ranges == tmem->max_nr_ranges)) {
>> +tmem = realloc_mem_ranges(mem_ranges);
>> +if (!tmem)
>> +goto out;
>> +}
>> +
>> +/* Exclude crashkernel region */
>> +ret = crash_exclude_mem_range(tmem, crashk_res.start, crashk_res.end);
>> +if (ret)
>> +goto out;
>> +
>> +ret = add_rtas_mem_range(mem_ranges);
>> +if (ret)
>> +goto out;
>> +
>> +ret = add_opal_mem_range(mem_ranges);
>> +if (ret)
>> +goto out;
> 
> Maybe I'm confused, but don't you add the RTAS and OPAL regions as
> usable memory for the crashkernel? In that case they shouldn't show up
> in the core file.

kexec-tools does the same thing. I am not endorsing it but I was trying to stay
in parity with it to avoid breaking any userspace tools/commands. But as you
rightly pointed out, this is NOT right. The right thing to do, to get the
rtas/opal data at the time of crash, is to have a backup region for them just
like we have for the first 64K of memory. I was hoping to do that later.

Will check how userspace tools respond to dropping these regions. If that makes
the tools unhappy, will retain the regions with a FIXME. Sorry about the
confusion.

Thanks
Hari


Re: [PATCH net-next] ibmvnic: Increase driver logging

2020-07-16 Thread Jakub Kicinski
On Thu, 16 Jul 2020 18:07:37 +0200 Michal Suchánek wrote:
> On Thu, Jul 16, 2020 at 10:59:58AM -0500, Thomas Falcon wrote:
> > On 7/15/20 8:29 PM, David Miller wrote:  
> > > From: Jakub Kicinski 
> > > Date: Wed, 15 Jul 2020 17:06:32 -0700
> > >   
> > > > On Wed, 15 Jul 2020 18:51:55 -0500 Thomas Falcon wrote:  
> > > > >   free_netdev(netdev);
> > > > >   dev_set_drvdata(&dev->dev, NULL);
> > > > > + netdev_info(netdev, "VNIC client device has been successfully 
> > > > > removed.\n");  
> > > > A step too far, perhaps.
> > > > 
> > > > In general this patch looks a little questionable IMHO, this amount of
> > > > logging output is not commonly seen in drivers. All the info
> > > > messages are just static text, not even carrying any extra information.
> > > > In an era of ftrace, and bpftrace, do we really need this?  
> > > Agreed, this is too much.  This is debugging, and thus suitable for 
> > > tracing
> > > facilities, at best.  
> > 
> > Thanks for your feedback. I see now that I was overly aggressive with this
> > patch to be sure, but it would help with narrowing down problems at a first
> > glance, should they arise. The driver in its current state logs very little
> > of what it is doing without the use of additional debugging or tracing
> > facilities. Would it be worth it to pursue a less aggressive version or
> > would that be dead on arrival? What are acceptable driver operations to log
> > at this level?  

Sadly it's much more of an art than hard science. Most networking
drivers will print identifying information when they probe the device
and then only about major config changes or when link comes up or goes
down. And obviously when anything unexpected, like an error happens,
that's key.

You seem to be adding start / end information for each driver init /
deinit stage. I'd say try to focus on the actual errors you're trying
to catch.
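
In concrete terms that usually boils down to a handful of messages, e.g.
(illustrative only, not taken from the driver):

        /* at probe time */
        netdev_info(netdev, "ibmvnic registered, MAC %pM\n", netdev->dev_addr);

        /* on carrier changes */
        netdev_info(netdev, "link is %s\n",
                    netif_carrier_ok(netdev) ? "up" : "down");

        /* and when something actually goes wrong */
        netdev_err(netdev, "CRQ initialization failed, rc=%d\n", rc);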

> Also, would it be advisable to add the messages as pr_debug() to be enabled on
> demand?

I personally have had a pretty poor experience with pr_debug() because
CONFIG_DYNAMIC_DEBUG is not always enabled. Since you're just printing
static text there shouldn't be much difference between pr_debug and
ftrace and/or bpftrace, honestly.

Again, slightly hard to advise not knowing what you're trying to catch.


[PATCH] powerpc/64: Fix an out of date comment about MMIO ordering

2020-07-16 Thread Palmer Dabbelt
From: Palmer Dabbelt 

This primitive has been renamed, but because it was spelled incorrectly in the
first place it must have escaped the fixup patch.  As far as I can tell this
logic is still correct: smp_mb__after_spinlock() uses the default smp_mb()
implementation, which is "sync" rather than "hwsync" but those are the same
(though I'm not that familiar with PowerPC).

Signed-off-by: Palmer Dabbelt 
---
 arch/powerpc/kernel/entry_64.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index b3c9f15089b6..7b38b4daca93 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -357,7 +357,7 @@ _GLOBAL(_switch)
 * kernel/sched/core.c).
 *
 * Uncacheable stores in the case of involuntary preemption must
-* be taken care of. The smp_mb__before_spin_lock() in __schedule()
+* be taken care of. The smp_mb__after_spinlock() in __schedule()
 * is implemented as hwsync on powerpc, which orders MMIO too. So
 * long as there is an hwsync in the context switch path, it will
 * be executed on the source CPU after the task has performed
-- 
2.28.0.rc0.105.gf9edc3c819-goog



Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode

2020-07-16 Thread Mathieu Desnoyers
- On Jul 16, 2020, at 12:03 PM, Mathieu Desnoyers 
mathieu.desnoy...@efficios.com wrote:

> - On Jul 16, 2020, at 11:46 AM, Mathieu Desnoyers
> mathieu.desnoy...@efficios.com wrote:
> 
>> - On Jul 16, 2020, at 12:42 AM, Nicholas Piggin npig...@gmail.com wrote:
>>> I should be more complete here, especially since I was complaining
>>> about unclear barrier comment :)
>>> 
>>> 
>>> CPU0                     CPU1
>>> a. user stuff            1. user stuff
>>> b. membarrier()          2. enter kernel
>>> c. smp_mb()              3. smp_mb__after_spinlock(); // in __schedule
>>> d. read rq->curr         4. rq->curr switched to kthread
>>> e. is kthread, skip IPI  5. switch_to kthread
>>> f. return to user        6. rq->curr switched to user thread
>>> g. user stuff            7. switch_to user thread
>>>                          8. exit kernel
>>>                          9. more user stuff
>>> 
>>> What you're really ordering is a, g vs 1, 9 right?
>>> 
>>> In other words, 9 must see a if it sees g, g must see 1 if it saw 9,
>>> etc.
>>> 
>>> Userspace does not care where the barriers are exactly or what kernel
>>> memory accesses might be being ordered by them, so long as there is a
>>> mb somewhere between a and g, and 1 and 9. Right?
>> 
>> This is correct.
> 
> Actually, sorry, the above is not quite right. It's been a while
> since I looked into the details of membarrier.
> 
> The smp_mb() at the beginning of membarrier() needs to be paired with a
> smp_mb() _after_ rq->curr is switched back to the user thread, so the
> memory barrier is between store to rq->curr and following user-space
> accesses.
> 
> The smp_mb() at the end of membarrier() needs to be paired with the
> smp_mb__after_spinlock() at the beginning of schedule, which is
> between accesses to userspace memory and switching rq->curr to kthread.
> 
> As to *why* this ordering is needed, I'd have to dig through additional
> scenarios from https://lwn.net/Articles/573436/. Or maybe Paul remembers ?

Thinking further about this, I'm beginning to consider that maybe we have been
overly cautious by requiring memory barriers before and after store to rq->curr.

If CPU0 observes a CPU1's rq->curr->mm which differs from its own process 
(current)
while running the membarrier system call, it necessarily means that CPU1 had
to issue smp_mb__after_spinlock when entering the scheduler, between any 
user-space
loads/stores and update of rq->curr.

Requiring a memory barrier between update of rq->curr (back to current process's
thread) and following user-space memory accesses does not seem to guarantee
anything more than what the initial barrier at the beginning of __schedule 
already
provides, because the guarantees are only about accesses to user-space memory.

Therefore, with the memory barrier at the beginning of __schedule, just 
observing that
CPU1's rq->curr differs from current should guarantee that a memory barrier was 
issued
between any sequentially consistent instructions belonging to the current 
process on
CPU1.

Or am I missing/misremembering an important point here ?

Thanks,

Mathieu

> 
> Thanks,
> 
> Mathieu
> 
> 
>> Note that the accesses to user-space memory can be
>> done either by user-space code or kernel code, it doesn't matter.
>> However, in order to be considered as happening before/after
>> either membarrier or the matching compiler barrier, kernel code
>> needs to have causality relationship with user-space execution,
>> e.g. user-space does a system call, or returns from a system call.
>> 
>> In the case of io_uring, submitting a request or returning from waiting
>> on request completion appear to provide this causality relationship.
>> 
>> Thanks,
>> 
>> Mathieu
>> 
>> 
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> http://www.efficios.com
> 
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [V2 PATCH 1/3] Refactoring powerpc code for carrying over IMA measurement logs, to move non architecture specific code to security/ima.

2020-07-16 Thread Thiago Jung Bauermann


Hello Prakhar,

Prakhar Srivastava  writes:

> On 6/19/20 5:19 PM, Thiago Jung Bauermann wrote:
>>
>> Prakhar Srivastava  writes:
>>
>>> Powerpc has support to carry over the IMA measurement logs. Refatoring the
>>> non-architecture specific code out of arch/powerpc and into security/ima.
>>>
>>> The code adds support for reserving and freeing up of memory for IMA 
>>> measurement
>>> logs.
>>
>> Last week, Mimi provided this feedback:
>>
>> "From your patch description, this patch should be broken up.  Moving
>> the non-architecture specific code out of powerpc should be one patch.
>>   Additional support should be in another patch.  After each patch, the
>> code should work properly."
>>
>> That's not what you do here. You move the code, but you also make other
>> changes at the same time. This has two problems:
>>
>> 1. It makes the patch harder to review, because it's very easy to miss a
>> change.
>>
>> 2. If in the future a git bisect later points to this patch, it's not
>> clear whether the problem is because of the code movement, or because
>> of the other changes.
>>
>> When you move code, ideally the patch should only make the changes
>> necessary to make the code work at its new location. The patch which
>> does code movement should not cause any change in behavior.
>>
>> Other changes should go in separate patches, either before or after the
>> one moving the code.
>>
>> More comments below.
>>
> Hi Thiago,
>
> Apologies for the delayed response i was away for a few days.
> I am working on breaking up the changes so that it's easier to review and
> update as well.

No problem.

>
> Thanks,
> Prakhar Srivastava
>
>>>
>>> ---
>>>   arch/powerpc/include/asm/ima.h |  10 ---
>>>   arch/powerpc/kexec/ima.c   | 126 ++---
>>>   security/integrity/ima/ima_kexec.c | 116 ++
>>>   3 files changed, 124 insertions(+), 128 deletions(-)
>>>
>>> diff --git a/arch/powerpc/include/asm/ima.h b/arch/powerpc/include/asm/ima.h
>>> index ead488cf3981..c29ec86498f8 100644
>>> --- a/arch/powerpc/include/asm/ima.h
>>> +++ b/arch/powerpc/include/asm/ima.h
>>> @@ -4,15 +4,6 @@
>>>
>>>   struct kimage;
>>>
>>> -int ima_get_kexec_buffer(void **addr, size_t *size);
>>> -int ima_free_kexec_buffer(void);
>>> -
>>> -#ifdef CONFIG_IMA
>>> -void remove_ima_buffer(void *fdt, int chosen_node);
>>> -#else
>>> -static inline void remove_ima_buffer(void *fdt, int chosen_node) {}
>>> -#endif
>>> -
>>>   #ifdef CONFIG_IMA_KEXEC
>>>   int arch_ima_add_kexec_buffer(struct kimage *image, unsigned long 
>>> load_addr,
>>>   size_t size);
>>> @@ -22,7 +13,6 @@ int setup_ima_buffer(const struct kimage *image, void 
>>> *fdt, int chosen_node);
>>>   static inline int setup_ima_buffer(const struct kimage *image, void *fdt,
>>>int chosen_node)
>>>   {
>>> -   remove_ima_buffer(fdt, chosen_node);
>>> return 0;
>>>   }
>>
>> This is wrong. Even if the currently running kernel doesn't have
>> CONFIG_IMA_KEXEC, it should remove the IMA buffer property and memory
>> reservation from the FDT that is being prepared for the next kernel.
>>
>> This is because the IMA kexec buffer is useless for the next kernel,
>> regardless of whether the current kernel supports CONFIG_IMA_KEXEC or
>> not. Keeping it around would be a waste of memory.
>>
> I will keep it in my next revision.
> My understanding was the reserved memory is freed and property removed when
> IMA loads the logs on init.

If CONFIG_IMA_KEXEC is set, then yes. If it isn't then that needs to
happen in the function above.

> During setup_fdt in kexec, a duplicate copy of the dt is used, but memory
> still needs to be allocated, thus the property itself indicates the
> presence of reserved memory.
> 
>>> @@ -179,13 +64,18 @@ int setup_ima_buffer(const struct kimage *image, void 
>>> *fdt, int chosen_node)
>>> int ret, addr_cells, size_cells, entry_size;
>>> u8 value[16];
>>>
>>> -   remove_ima_buffer(fdt, chosen_node);
>>
>> This is wrong, for the same reason stated above.
>>
>>> if (!image->arch.ima_buffer_size)
>>> return 0;
>>>
>>> -   ret = get_addr_size_cells(&addr_cells, &size_cells);
>>> -   if (ret)
>>> +   ret = fdt_address_cells(fdt, chosen_node);
>>> +   if (ret < 0)
>>> +   return ret;
>>> +   addr_cells = ret;
>>> +
>>> +   ret = fdt_size_cells(fdt, chosen_node);
>>> +   if (ret < 0)
>>> return ret;
>>> +   size_cells = ret;
>>>
>>> entry_size = 4 * (addr_cells + size_cells);
>>>
>>
>> I liked this change. Thanks! I agree it's better to use
>> fdt_address_cells() and fdt_size_cells() here.
>>
>> But it should be in a separate patch. Either before or after the one
>> moving the code.
>>
>>> diff --git a/security/integrity/ima/ima_kexec.c 
>>> b/security/integrity/ima/ima_kexec.c
>>> index 121de3e04af2..e1e6d6154015 100644
>>> --- a/security/integrity/ima/ima_kexec.c
>>> +++ b/security/integrity/ima/ima_kexec.c
>>> 

Re: [PATCH v3 0/3] Off-load TLB invalidations to host for !GTSE

2020-07-16 Thread Qian Cai
On Fri, Jul 03, 2020 at 11:06:05AM +0530, Bharata B Rao wrote:
> Hypervisor may choose not to enable Guest Translation Shootdown Enable
> (GTSE) option for the guest. When GTSE isn't ON, the guest OS isn't
> permitted to use instructions like tblie and tlbsync directly, but is
> expected to make hypervisor calls to get the TLB flushed.
> 
> This series enables the TLB flush routines in the radix code to
> off-load TLB flushing to hypervisor via the newly proposed hcall
> H_RPT_INVALIDATE. 
> 
> To easily check the availability of GTSE, it is made an MMU feature.
> The OV5 handling and H_REGISTER_PROC_TBL hcall are changed to
> handle GTSE as an optionally available feature and to not assume GTSE
> when radix support is available.
> 
> The actual hcall implementation for KVM isn't included in this
> patchset and will be posted separately.
> 
> Changes in v3
> =
> - Fixed a bug in the hcall wrapper code where we were missing setting
>   H_RPTI_TYPE_NESTED while retrying the failed flush request with
>   a full flush for the nested case.
> - s/psize_to_h_rpti/psize_to_rpti_pgsize
> 
> v2: 
> https://lore.kernel.org/linuxppc-dev/20200626131000.5207-1-bhar...@linux.ibm.com/T/#t
> 
> Bharata B Rao (2):
>   powerpc/mm: Enable radix GTSE only if supported.
>   powerpc/pseries: H_REGISTER_PROC_TBL should ask for GTSE only if
> enabled
> 
> Nicholas Piggin (1):
>   powerpc/mm/book3s64/radix: Off-load TLB invalidations to host when
> !GTSE

Reverting the whole series fixed random memory corruptions during boot on
POWER9 PowerNV systems below.

IBM 8335-GTH (ibm,witherspoon)
POWER9, altivec supported
262144 MB memory, 2000 GB disk space

.config:
https://gitlab.com/cailca/linux-mm/-/blob/master/powerpc.config

[9.338996][  T925] BUG: Unable to handle kernel instruction fetch (NULL 
pointer?)
[9.339026][  T925] Faulting instruction address: 0x
[9.339051][  T925] Oops: Kernel access of bad area, sig: 11 [#1]
[9.339064][  T925] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=256 NUMA PowerNV
[9.339098][  T925] Modules linked in: dm_mirror dm_region_hash dm_log dm_mod
[9.339150][  T925] CPU: 92 PID: 925 Comm: (md-udevd) Not tainted 
5.8.0-rc5-next-20200716 #3
[9.339186][  T925] NIP:   LR: c021f2cc CTR: 
0000
[9.339210][  T925] REGS: c000201cb52d79b0 TRAP: 0400   Not tainted  
(5.8.0-rc5-next-20200716)
[9.339244][  T925] MSR:  900040009033   CR: 
2492  XER: 
[9.339278][  T925] CFAR: c021f2c8 IRQMASK: 0 
[9.339278][  T925] GPR00: c021f2cc c000201cb52d7c40 
c5901000 c000201cb52d7ca8 
[9.339278][  T925] GPR04: c0080ea60038  
7fff 7fff 
[9.339278][  T925] GPR08:   
c000201cb50bd500 0003 
[9.339278][  T925] GPR12:  c000201fff694500 
7fffa4a8a940 7fffa4a8a6c8 
[9.339278][  T925] GPR16: 7fffa4a8a8f8 7fffa4a8a650 
7fffa4a8a488  
[9.339278][  T925] GPR20: 00050001 7fffa4a8a984 
7fff ca4545cc 
[9.339278][  T925] GPR24: c0affe28  
 0166 
[9.339278][  T925] GPR28: c000201cb52d7ca8 c0080ea6 
c000201cc3b72600 7fff 
[9.339493][  T925] NIP [] 0x0
[9.339516][  T925] LR [c021f2cc] __seccomp_filter+0xec/0x530
bpf_dispatcher_nop_func at include/linux/bpf.h:567
(inlined by) bpf_prog_run_pin_on_cpu at include/linux/filter.h:597
(inlined by) seccomp_run_filters at kernel/seccomp.c:324
(inlined by) __seccomp_filter at kernel/seccomp.c:937
[9.339538][  T925] Call Trace:
[9.339548][  T925] [c000201cb52d7c40] [c021f2cc] 
__seccomp_filter+0xec/0x530 (unreliable)
[9.339566][  T925] [c000201cb52d7d50] [c0025af8] 
do_syscall_trace_enter+0xb8/0x470
do_seccomp at arch/powerpc/kernel/ptrace/ptrace.c:252
(inlined by) do_syscall_trace_enter at arch/powerpc/kernel/ptrace/ptrace.c:327
[9.339600][  T925] [c000201cb52d7dc0] [c002c8f8] 
system_call_exception+0x138/0x180
[9.339625][  T925] [c000201cb52d7e20] [c000c9e8] 
system_call_common+0xe8/0x214
[9.339648][  T925] Instruction dump:
[9.339667][  T925]       
  
[9.339706][  T925]       
  
[9.339748][  T925] ---[ end trace d89eb80f9a6bc141 ]---
[  OK  ] Started Journal Service.
[   10.452364][  T925] Kernel panic - not syncing: Fatal exception
[   11.876655][  T925] ---[ end Kernel panic - not syncing: Fatal exception ]---

There could also be lots of random userspace segfault like,

[   16.463545][  T771] rngd[771]: segfault (11) at 0 nip 0 lr 0 code 1 in 
rngd[106d6+2

Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode

2020-07-16 Thread Mathieu Desnoyers
- On Jul 16, 2020, at 11:46 AM, Mathieu Desnoyers 
mathieu.desnoy...@efficios.com wrote:

> - On Jul 16, 2020, at 12:42 AM, Nicholas Piggin npig...@gmail.com wrote:
>> I should be more complete here, especially since I was complaining
>> about unclear barrier comment :)
>> 
>> 
>> CPU0                     CPU1
>> a. user stuff            1. user stuff
>> b. membarrier()          2. enter kernel
>> c. smp_mb()              3. smp_mb__after_spinlock(); // in __schedule
>> d. read rq->curr         4. rq->curr switched to kthread
>> e. is kthread, skip IPI  5. switch_to kthread
>> f. return to user        6. rq->curr switched to user thread
>> g. user stuff            7. switch_to user thread
>>                          8. exit kernel
>>                          9. more user stuff
>> 
>> What you're really ordering is a, g vs 1, 9 right?
>> 
>> In other words, 9 must see a if it sees g, g must see 1 if it saw 9,
>> etc.
>> 
>> Userspace does not care where the barriers are exactly or what kernel
>> memory accesses might be being ordered by them, so long as there is a
>> mb somewhere between a and g, and 1 and 9. Right?
> 
> This is correct.

Actually, sorry, the above is not quite right. It's been a while
since I looked into the details of membarrier.

The smp_mb() at the beginning of membarrier() needs to be paired with a
smp_mb() _after_ rq->curr is switched back to the user thread, so the
memory barrier is between store to rq->curr and following user-space
accesses.

The smp_mb() at the end of membarrier() needs to be paired with the
smp_mb__after_spinlock() at the beginning of schedule, which is
between accesses to userspace memory and switching rq->curr to kthread.

As to *why* this ordering is needed, I'd have to dig through additional
scenarios from https://lwn.net/Articles/573436/. Or maybe Paul remembers ?

Thanks,

Mathieu


> Note that the accesses to user-space memory can be
> done either by user-space code or kernel code, it doesn't matter.
> However, in order to be considered as happening before/after
> either membarrier or the matching compiler barrier, kernel code
> needs to have causality relationship with user-space execution,
> e.g. user-space does a system call, or returns from a system call.
> 
> In the case of io_uring, submitting a request or returning from waiting
> on request completion appear to provide this causality relationship.
> 
> Thanks,
> 
> Mathieu
> 
> 
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH net-next] ibmvnic: Increase driver logging

2020-07-16 Thread Michal Suchánek
On Thu, Jul 16, 2020 at 10:59:58AM -0500, Thomas Falcon wrote:
> 
> On 7/15/20 8:29 PM, David Miller wrote:
> > From: Jakub Kicinski 
> > Date: Wed, 15 Jul 2020 17:06:32 -0700
> > 
> > > On Wed, 15 Jul 2020 18:51:55 -0500 Thomas Falcon wrote:
> > > > free_netdev(netdev);
> > > > dev_set_drvdata(&dev->dev, NULL);
> > > > +   netdev_info(netdev, "VNIC client device has been successfully 
> > > > removed.\n");
> > > A step too far, perhaps.
> > > 
> > > In general this patch looks a little questionable IMHO, this amount of
> > > logging output is not commonly seen in drivers. All the info
> > > messages are just static text, not even carrying any extra information.
> > > In an era of ftrace, and bpftrace, do we really need this?
> > Agreed, this is too much.  This is debugging, and thus suitable for tracing
> > facilities, at best.
> 
> Thanks for your feedback. I see now that I was overly aggressive with this
> patch to be sure, but it would help with narrowing down problems at a first
> glance, should they arise. The driver in its current state logs very little
> of what it is doing without the use of additional debugging or tracing
> facilities. Would it be worth it to pursue a less aggressive version or
> would that be dead on arrival? What are acceptable driver operations to log
> at this level?

Also, would it be advisable to add the messages as pr_debug() to be enabled on
demand?

Thanks

Michal


Re: [PATCH net-next] ibmvnic: Increase driver logging

2020-07-16 Thread Thomas Falcon



On 7/15/20 8:29 PM, David Miller wrote:

From: Jakub Kicinski 
Date: Wed, 15 Jul 2020 17:06:32 -0700


On Wed, 15 Jul 2020 18:51:55 -0500 Thomas Falcon wrote:

free_netdev(netdev);
dev_set_drvdata(&dev->dev, NULL);
+   netdev_info(netdev, "VNIC client device has been successfully 
removed.\n");

A step too far, perhaps.

In general this patch looks a little questionable IMHO, this amount of
logging output is not commonly seen in drivers. All the info
messages are just static text, not even carrying any extra information.
In an era of ftrace, and bpftrace, do we really need this?

Agreed, this is too much.  This is debugging, and thus suitable for tracing
facilities, at best.


Thanks for your feedback. I see now that I was overly aggressive with 
this patch to be sure, but it would help with narrowing down problems at 
a first glance, should they arise. The driver in its current state logs 
very little of what it is doing without the use of additional debugging 
or tracing facilities. Would it be worth it to pursue a less aggressive 
version or would that be dead on arrival? What are acceptable driver 
operations to log at this level?


Thanks,

Tom



Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode

2020-07-16 Thread Mathieu Desnoyers



- On Jul 16, 2020, at 12:42 AM, Nicholas Piggin npig...@gmail.com wrote:
> I should be more complete here, especially since I was complaining
> about unclear barrier comment :)
> 
> 
> CPU0                     CPU1
> a. user stuff            1. user stuff
> b. membarrier()          2. enter kernel
> c. smp_mb()              3. smp_mb__after_spinlock(); // in __schedule
> d. read rq->curr         4. rq->curr switched to kthread
> e. is kthread, skip IPI  5. switch_to kthread
> f. return to user        6. rq->curr switched to user thread
> g. user stuff            7. switch_to user thread
>                          8. exit kernel
>                          9. more user stuff
> 
> What you're really ordering is a, g vs 1, 9 right?
> 
> In other words, 9 must see a if it sees g, g must see 1 if it saw 9,
> etc.
> 
> Userspace does not care where the barriers are exactly or what kernel
> memory accesses might be being ordered by them, so long as there is a
> mb somewhere between a and g, and 1 and 9. Right?

This is correct. Note that the accesses to user-space memory can be
done either by user-space code or kernel code, it doesn't matter.
However, in order to be considered as happening before/after
either membarrier or the matching compiler barrier, kernel code
needs to have causality relationship with user-space execution,
e.g. user-space does a system call, or returns from a system call.

In the case of io_uring, submitting a request or returning from waiting
on request completion appear to provide this causality relationship.

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode

2020-07-16 Thread Mathieu Desnoyers
- On Jul 16, 2020, at 7:00 AM, Peter Zijlstra pet...@infradead.org wrote:

> On Thu, Jul 16, 2020 at 08:03:36PM +1000, Nicholas Piggin wrote:
>> Excerpts from Peter Zijlstra's message of July 16, 2020 6:50 pm:
>> > On Wed, Jul 15, 2020 at 10:18:20PM -0700, Andy Lutomirski wrote:
>> >> > On Jul 15, 2020, at 9:15 PM, Nicholas Piggin  wrote:
> 
>> >> But I’m wondering if all this deferred sync stuff is wrong. In the
>> >> brave new world of io_uring and such, perhaps kernel access matter
>> >> too.  Heck, even:
>> > 
>> > IIRC the membarrier SYNC_CORE use-case is about user-space
>> > self-modifying code.
>> > 
>> > Userspace re-uses a text address and needs to SYNC_CORE before it can be
>> > sure the old text is forgotten. Nothing the kernel does matters there.
>> > 
>> > I suppose the manpage could be more clear there.
>> 
>> True, but memory ordering of kernel stores from kernel threads for
>> regular mem barrier is the concern here.
>> 
>> Does io_uring update completion queue from kernel thread or interrupt,
>> for example? If it does, then membarrier will not order such stores
>> with user memory accesses.
> 
> So we're talking about regular membarrier() then? Not the SYNC_CORE
> variant per-se.
> 
> Even there, I'll argue we don't care, but perhaps Mathieu has a
> different opinion.

I agree with Peter that we don't care about accesses to user-space
memory performed concurrently with membarrier.

What we'd care about in terms of accesses to user-space memory from the
kernel is something that would be clearly ordered as happening before
or after the membarrier call, for instance a read(2) followed by
membarrier(2) after the read returns, or a read(2) issued after return
from membarrier(2). The other scenario we'd care about is with the compiler
barrier paired with membarrier: e.g. read(2) returns, compiler barrier,
followed by a store. Or load, compiler barrier, followed by write(2).

All those scenarios imply before/after ordering wrt either membarrier or
the compiler barrier. I notice that io_uring has a "completion" queue.
Let's try to come up with realistic usage scenarios.

So the dependency chain would be provided by e.g.:

* Infrequent read / Frequent write, communicating read completion through 
variable X

wait for io_uring read request completion -> membarrier -> store X=1

with matching

load from X (waiting for X==1) -> asm volatile (::: "memory") -> submit 
io_uring write request

or this other scenario:

* Frequent read / Infrequent write, communicating read completion through 
variable X

load from X (waiting for X==1) -> membarrier -> submit io_uring write request

with matching

wait for io_uring read request completion -> asm volatile (::: "memory") -> 
store X=1
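
Spelled out as a rough userspace sketch for the first scenario (liburing and
membarrier(2) names are used only for illustration; headers and registration
of the expedited command are omitted):

        volatile int x;                         /* the shared flag "X" */

        /* Thread A: infrequent reader */
        io_uring_wait_cqe(&ring, &cqe);         /* io_uring read request completed */
        syscall(__NR_membarrier, MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0, 0);
        x = 1;                                  /* store X=1 */

        /* Thread B: frequent writer */
        while (!x)
                ;                               /* load from X, waiting for X==1 */
        asm volatile("" ::: "memory");          /* compiler barrier */
        io_uring_submit(&ring);                 /* submit io_uring write request */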

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH V5 1/4] mm/debug_vm_pgtable: Add tests validating arch helpers for core MM features

2020-07-16 Thread Steven Price

On 13/07/2020 04:23, Anshuman Khandual wrote:

This adds new tests validating arch page table helpers for these following
core memory features. These tests create and test specific mapping types at
various page table levels.

1. SPECIAL mapping
2. PROTNONE mapping
3. DEVMAP mapping
4. SOFTDIRTY mapping
5. SWAP mapping
6. MIGRATION mapping
7. HUGETLB mapping
8. THP mapping

Cc: Andrew Morton 
Cc: Gerald Schaefer 
Cc: Christophe Leroy 
Cc: Mike Rapoport 
Cc: Vineet Gupta 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Heiko Carstens 
Cc: Vasily Gorbik 
Cc: Christian Borntraeger 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: "H. Peter Anvin" 
Cc: Kirill A. Shutemov 
Cc: Paul Walmsley 
Cc: Palmer Dabbelt 
Cc: linux-snps-...@lists.infradead.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s...@vger.kernel.org
Cc: linux-ri...@lists.infradead.org
Cc: x...@kernel.org
Cc: linux...@kvack.org
Cc: linux-a...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Tested-by: Vineet Gupta  #arc
Reviewed-by: Zi Yan 
Suggested-by: Catalin Marinas 
Signed-off-by: Anshuman Khandual 
---
  mm/debug_vm_pgtable.c | 302 +-
  1 file changed, 301 insertions(+), 1 deletion(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 61ab16fb2e36..2fac47db3eb7 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c

[...]

+
+static void __init pte_swap_tests(unsigned long pfn, pgprot_t prot)
+{
+   swp_entry_t swp;
+   pte_t pte;
+
+   pte = pfn_pte(pfn, prot);
+   swp = __pte_to_swp_entry(pte);


Minor issue: this doesn't necessarily look valid - there's no guarantee that a
normal PTE can be turned into a swp_entry. In practice this is likely to work
on all architectures because there's no reason not to use (at least) all the
PFN bits for the swap entry, but it doesn't exactly seem correct.


Can we start with a swp_entry_t (from __swp_entry()) and check the round 
trip of that?


It would also seem sensible to have a check that 
is_swap_pte(__swp_entry_to_pte(__swp_entry(x,y))) is true.
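
Something along these lines, perhaps (only a sketch):

        static void __init pte_swap_tests(unsigned long pfn, pgprot_t prot)
        {
                swp_entry_t swp = __swp_entry(0, pfn);
                pte_t pte = __swp_entry_to_pte(swp);

                WARN_ON(!is_swap_pte(pte));
                WARN_ON(swp.val != __pte_to_swp_entry(pte).val);
        }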



+   pte = __swp_entry_to_pte(swp);
+   WARN_ON(pfn != pte_pfn(pte));
+}
+
+#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
+static void __init pmd_swap_tests(unsigned long pfn, pgprot_t prot)
+{
+   swp_entry_t swp;
+   pmd_t pmd;
+
+   pmd = pfn_pmd(pfn, prot);
+   swp = __pmd_to_swp_entry(pmd);
+   pmd = __swp_entry_to_pmd(swp);
+   WARN_ON(pfn != pmd_pfn(pmd));
+}
+#else  /* !CONFIG_ARCH_ENABLE_THP_MIGRATION */
+static void __init pmd_swap_tests(unsigned long pfn, pgprot_t prot) { }
+#endif /* CONFIG_ARCH_ENABLE_THP_MIGRATION */
+
+static void __init swap_migration_tests(void)
+{
+   struct page *page;
+   swp_entry_t swp;
+
+   if (!IS_ENABLED(CONFIG_MIGRATION))
+   return;
+   /*
+* swap_migration_tests() requires a dedicated page as it needs to
+* be locked before creating a migration entry from it. Locking the
+* page that actually maps kernel text ('start_kernel') can be real
+* problematic. Lets allocate a dedicated page explicitly for this


NIT: s/Lets/Let's

Otherwise looks good to me.

Steve


Re: [PATCH] pseries: Fix 64 bit logical memory block panic

2020-07-16 Thread Aneesh Kumar K.V

On 7/16/20 7:00 AM, Paul Mackerras wrote:

On Wed, Jul 15, 2020 at 06:12:25PM +0530, Aneesh Kumar K.V wrote:

Anton Blanchard  writes:


Booting with a 4GB LMB size causes us to panic:

   qemu-system-ppc64: OS terminated: OS panic:
   Memory block size not suitable: 0x0

Fix pseries_memory_block_size() to handle 64 bit LMBs.

Cc: sta...@vger.kernel.org
Signed-off-by: Anton Blanchard 
---
  arch/powerpc/platforms/pseries/hotplug-memory.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 5ace2f9a277e..6574ac33e887 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -27,7 +27,7 @@ static bool rtas_hp_event;
  unsigned long pseries_memory_block_size(void)
  {
struct device_node *np;
-   unsigned int memblock_size = MIN_MEMORY_BLOCK_SIZE;
+   uint64_t memblock_size = MIN_MEMORY_BLOCK_SIZE;
struct resource r;
  
  	np = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");


We need similar changes at more places?

modified   arch/powerpc/include/asm/book3s/64/mmu.h
@@ -85,7 +85,7 @@ extern unsigned int mmu_base_pid;
  /*
   * memory block size used with radix translation.
   */
-extern unsigned int __ro_after_init radix_mem_block_size;
+extern unsigned long __ro_after_init radix_mem_block_size;
  
  #define PRTB_SIZE_SHIFT	(mmu_pid_bits + 4)

  #define PRTB_ENTRIES  (1ul << mmu_pid_bits)
modified   arch/powerpc/include/asm/drmem.h
@@ -21,7 +21,7 @@ struct drmem_lmb {
  struct drmem_lmb_info {
struct drmem_lmb*lmbs;
int n_lmbs;
-   u32 lmb_size;
+   u64 lmb_size;
  };
  
  extern struct drmem_lmb_info *drmem_info;

modified   arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -34,7 +34,7 @@
  
  unsigned int mmu_pid_bits;

  unsigned int mmu_base_pid;
-unsigned int radix_mem_block_size __ro_after_init;
+unsigned long radix_mem_block_size __ro_after_init;


These changes look fine.


  static __ref void *early_alloc_pgtable(unsigned long size, int nid,
unsigned long region_start, unsigned long region_end)
modified   arch/powerpc/mm/drmem.c
@@ -268,14 +268,15 @@ static void __init __walk_drmem_v2_lmbs(const __be32 
*prop, const __be32 *usm,
  void __init walk_drmem_lmbs_early(unsigned long node,
void (*func)(struct drmem_lmb *, const __be32 **))
  {
+   const __be64 *lmb_prop;
const __be32 *prop, *usm;
int len;
  
-	prop = of_get_flat_dt_prop(node, "ibm,lmb-size", &len);

-   if (!prop || len < dt_root_size_cells * sizeof(__be32))
+   lmb_prop = of_get_flat_dt_prop(node, "ibm,lmb-size", &len);
+   if (!lmb_prop || len < sizeof(__be64))
return;
  
-	drmem_info->lmb_size = dt_mem_next_cell(dt_root_size_cells, &prop);

+   drmem_info->lmb_size = be64_to_cpup(lmb_prop);


This particular change shouldn't be necessary.  We already have
dt_mem_next_cell() returning u64, and it knows how to combine two
cells to give a u64 (for dt_root_size_cells == 2).



Agreed. I added it here because in another patch I was confused about the
usage of dt_root_size_cells. We don't generally use that in other device tree
parsing code. I will move that to a separate patch as a cleanup.





	usm = of_get_flat_dt_prop(node, "linux,drconf-usable-memory", &len);
  
@@ -296,19 +297,19 @@ void __init walk_drmem_lmbs_early(unsigned long node,
  
  static int __init init_drmem_lmb_size(struct device_node *dn)

  {
-   const __be32 *prop;
+   const __be64 *prop;
int len;
  
  	if (drmem_info->lmb_size)

return 0;
  
  	prop = of_get_property(dn, "ibm,lmb-size", &len);

-   if (!prop || len < dt_root_size_cells * sizeof(__be32)) {
+   if (!prop || len < sizeof(__be64)) {
pr_info("Could not determine LMB size\n");
return -1;
}
  
-	drmem_info->lmb_size = dt_mem_next_cell(dt_root_size_cells, &prop);

+   drmem_info->lmb_size = be64_to_cpup(prop);


Same comment here.



-aneesh


Re: [PATCH v3 0/4] powerpc/mm/radix: Memory unplug fixes

2020-07-16 Thread Nathan Lynch
"Aneesh Kumar K.V"  writes:
> This is the next version of the fixes for memory unplug on radix.
> The issues and the fix are described in the actual patches.

I guess this isn't actually causing problems at runtime right now, but I
notice calls to resize_hpt_for_hotplug() from arch_add_memory() and
arch_remove_memory(), which ought to be mmu-agnostic:

int __ref arch_add_memory(int nid, u64 start, u64 size,
  struct mhp_params *params)
{
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
int rc;

resize_hpt_for_hotplug(memblock_phys_mem_size());

start = (unsigned long)__va(start);
rc = create_section_mapping(start, start + size, nid,
params->pgprot);
...
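
One way to keep these generic paths MMU-agnostic might be to make the HPT
resize conditional on the hash MMU actually being in use, e.g. (a sketch only,
not something this series does):

        if (!radix_enabled())
                resize_hpt_for_hotplug(memblock_phys_mem_size());

or, alternatively, to push the call down into the hash-specific
create/remove_section_mapping() implementations.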



Re: [PATCH 0/3] Implement shared_cpu_list for powerpc

2020-07-16 Thread Michael Ellerman
On Mon, 29 Jun 2020 16:07:00 +0530, Srikar Dronamraju wrote:
> shared_cpu_list sysfs file is missing in powerpc and shared_cpu_map gives an
> extra newline character.
> 
> Before this patchset
> # ls /sys/devices/system/cpu0/cache/index1
> coherency_line_size  number_of_sets  size  ways_of_associativity
> levelshared_cpu_map  type
> # cat /sys/devices/system/cpu0/cache/index1/shared_cpu_map
> 00ff
> 
> [...]

Applied to powerpc/next.

[1/3] powerpc/cacheinfo: Use cpumap_print to print cpumap
  https://git.kernel.org/powerpc/c/5658cf085ba3c3f3c24ac0f7210f0473794df506
[2/3] powerpc/cacheinfo: Make cpumap_show code reusable
  https://git.kernel.org/powerpc/c/74b7492e417812ea0f5002e210e2ac07a5728d17
[3/3] powerpc/cacheinfo: Add per cpu per index shared_cpu_list
  https://git.kernel.org/powerpc/c/a87a77cb947cc9fc89f0dad51aeee66a61cc7fc4

cheers


Re: [PATCH -next] cpuidle/pseries: Make symbol 'pseries_idle_driver' static

2020-07-16 Thread Michael Ellerman
On Tue, 14 Jul 2020 22:24:24 +0800, Wei Yongjun wrote:
> The sparse tool complains as follows:
> 
> drivers/cpuidle/cpuidle-pseries.c:25:23: warning:
>  symbol 'pseries_idle_driver' was not declared. Should it be static?
> 
> 'pseries_idle_driver' is not used outside of this file, so marks
> it static.

Applied to powerpc/next.

[1/1] cpuidle/pseries: Make symbol 'pseries_idle_driver' static
  https://git.kernel.org/powerpc/c/92fe8483b1660feaa602d8be6ca7efe95ae4789b

cheers


Re: [PATCH -next] powerpc/xive: Remove unused inline function xive_kexec_teardown_cpu()

2020-07-16 Thread Michael Ellerman
On Wed, 15 Jul 2020 10:50:40 +0800, YueHaibing wrote:
> commit e27e0a94651e ("powerpc/xive: Remove xive_kexec_teardown_cpu()")
> left behind this, remove it.

Applied to powerpc/next.

[1/1] powerpc/xive: Remove unused inline function xive_kexec_teardown_cpu()
  https://git.kernel.org/powerpc/c/29d9407e1037868b59d12948d42ad3ef58fc3a5a

cheers


Re: [PATCH v6] powerpc/fadump: fix race between pstore write and fadump crash trigger

2020-07-16 Thread Michael Ellerman
On Mon, 13 Jul 2020 10:54:35 +0530, Sourabh Jain wrote:
> When we enter the fadump crash path via system reset, we fail to update
> the pstore.
> 
> On the system reset path we first update the pstore, then we go for the
> fadump crash. But the problem here is that when all the CPUs try to get
> the pstore lock to initiate the pstore write, only one CPU will acquire
> the lock and proceed with the pstore write. Since it is in NMI context,
> CPUs that fail to get the lock do not wait for their turn to write to
> the pstore and simply proceed with the next operation, which is the
> fadump crash. One of the CPUs that proceeded with the fadump crash path
> triggers the crash and does not wait for the CPU that holds the pstore
> lock to complete the pstore update.
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc/fadump: fix race between pstore write and fadump crash trigger
  https://git.kernel.org/powerpc/c/ba608c4fa12cfd0cab0e153249c29441f4dd3312

cheers


Re: [PATCH 1/1] MAINTAINERS: Remove self

2020-07-16 Thread Michael Ellerman
On Tue, 30 Jun 2020 08:50:44 +1000, Sam Bobroff wrote:
> I'm sorry to say I can no longer maintain this position.

Applied to powerpc/next.

[1/1] MAINTAINERS: Remove self from powerpc EEH
  https://git.kernel.org/powerpc/c/a984c1f2e49225b40f1d0d20d383ec27d4d0

cheers


Re: [PATCH 1/3] powerpc/64s: restore_math remove TM test

2020-07-16 Thread Michael Ellerman
On Wed, 24 Jun 2020 09:41:37 +1000, Nicholas Piggin wrote:
> The TM test in restore_math added by commit dc16b553c949e ("powerpc:
> Always restore FPU/VEC/VSX if hardware transactional memory in use") is
> no longer necessary after commit a8318c13e79ba ("powerpc/tm: Fix
> restoring FP/VMX facility incorrectly on interrupts"), which removed
> the cases where restore_math has to restore if TM is active.

Applied to powerpc/next.

[1/3] powerpc/64s: restore_math remove TM test
  https://git.kernel.org/powerpc/c/891b4fe8fe3d09f20948b391f24c9fc5b7580a2b
[2/3] powerpc/64s: Fix restore_math unnecessarily changing MSR
  https://git.kernel.org/powerpc/c/01eb01877f3386d4bd5de75909abdd0af45a5fa2
[3/3] powerpc: re-initialise lazy FPU/VEC counters on every fault
  https://git.kernel.org/powerpc/c/b2b46304e9360f3dda49c9d8ba4a1478b9eecf1d

cheers


Re: [PATCH 1/2] powerpc/powernv: Make pnv_pci_sriov_enable() and friends static

2020-07-16 Thread Michael Ellerman
On Sun, 5 Jul 2020 23:35:56 +1000, Oliver O'Halloran wrote:
> The kernel test robot noticed these are non-static which causes Clang to
> print some warnings. These are called via ppc_md function pointers so
> there's no need for them to be non-static.

Applied to powerpc/next.

[1/2] powerpc/powernv: Make pnv_pci_sriov_enable() and friends static
  https://git.kernel.org/powerpc/c/93eacd94e09db2b1bb0343f8115385e5c34abf0a
[2/2] powerpc/powernv: Move pnv_ioda_setup_bus_dma under CONFIG_IOMMU_API
  https://git.kernel.org/powerpc/c/e3417faec526cbf97773dca691dcd743f5bfeb64

cheers


Re: [PATCH v3] powerpc/pseries: detect secure and trusted boot state of the system.

2020-07-16 Thread Michael Ellerman
On Wed, 15 Jul 2020 07:52:01 -0400, Nayna Jain wrote:
> The device-tree property to check secure and trusted boot state is
> different for guests(pseries) compared to baremetal(powernv).
> 
> This patch updates the existing is_ppc_secureboot_enabled() and
> is_ppc_trustedboot_enabled() functions to add support for pseries.
> 
> The secureboot and trustedboot state are exposed via device-tree property:
> /proc/device-tree/ibm,secure-boot and /proc/device-tree/ibm,trusted-boot
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc/pseries: Detect secure and trusted boot state of the system.
  https://git.kernel.org/powerpc/c/61f879d97ce4510dd29d676a20d67692e3b34806

cheers


Re: [PATCH v3 0/3] selftests: powerpc: Fixes and execute-disable test for pkeys

2020-07-16 Thread Michael Ellerman
On Thu, 4 Jun 2020 18:26:07 +0530, Sandipan Das wrote:
> This fixes the way the Authority Mask Register (AMR) is updated
> by the existing pkey tests and adds a new test to verify the
> functionality of execute-disabled pkeys.
> 
> Previous versions can be found at:
> v2: 
> https://lore.kernel.org/linuxppc-dev/20200527030342.13712-1-sandi...@linux.ibm.com/
> v1: 
> https://lore.kernel.org/linuxppc-dev/20200508162332.65316-1-sandi...@linux.ibm.com/
> 
> [...]

Applied to powerpc/next.

[1/3] selftests/powerpc: Fix pkey access right updates
  https://git.kernel.org/powerpc/c/828ca4320d130bbe1d12866152600c49ff6a9f79
[2/3] selftests/powerpc: Move Hash MMU check to utilities
  https://git.kernel.org/powerpc/c/c405b738daf9d8e8a5aedfeb6be851681e65e54b
[3/3] selftests/powerpc: Add test for execute-disabled pkeys
  https://git.kernel.org/powerpc/c/1addb6444791f9e87fce0eb9882ec96a4a76e615

cheers


Re: [PATCH 0/7] powerpc: branch cache flush changes

2020-07-16 Thread Michael Ellerman
On Tue, 9 Jun 2020 17:06:03 +1000, Nicholas Piggin wrote:
> This series allows the link stack to be flushed with the speical
> bcctr 2,0,0 flush instruction that also flushes the count cache if
> the processor supports it.
> 
> Firmware does not support this at the moment, but I've tested it in
> simulator with a patched firmware to advertise support.
> 
> [...]

Patches 1-6 applied to powerpc/next.

[1/7] powerpc/security: re-name count cache flush to branch cache flush
  https://git.kernel.org/powerpc/c/1026798c644bfd3115fc4e32fd5e767cfc30ccf1
[2/7] powerpc/security: change link stack flush state to the flush type enum
  https://git.kernel.org/powerpc/c/c06ac2771070f465076e87bba262c64fb0b3aca3
[3/7] powerpc/security: make display of branch cache flush more consistent
  https://git.kernel.org/powerpc/c/1afe00c74ffe6d502bffa81c7d849cb4640d7ae5
[4/7] powerpc/security: split branch cache flush toggle from code patching
  https://git.kernel.org/powerpc/c/c0036549a9d9a060fa8bc24e31f85503ce08ad5e
[5/7] powerpc/64s: Move branch cache flushing bcctr variant to ppc-ops.h
  https://git.kernel.org/powerpc/c/70d7cdaf0548ec95fa7204dcdd39cd8e63cee24d
[6/7] powerpc/security: Allow for processors that flush the link stack using 
the special bcctr
  https://git.kernel.org/powerpc/c/4d24e21cc694e7253a532fe5a9bde12b284f1317

cheers


Re: [PATCH v2] powerpc/64/signal: balance return predictor stack in signal trampoline

2020-07-16 Thread Michael Ellerman
On Mon, 11 May 2020 20:19:52 +1000, Nicholas Piggin wrote:
> Returning from an interrupt or syscall to a signal handler currently
> begins execution directly at the handler's entry point, with LR set to
> the address of the sigreturn trampoline. When the signal handler
> function returns, it runs the trampoline. It looks like this:
> 
> # interrupt at user address xyz
> # kernel stuff... signal is raised
> rfid
> # void handler(int sig)
> addis 2,12,.TOC.-.LCF0@ha
> addi 2,2,.TOC.-.LCF0@l
> mflr 0
> std 0,16(1)
> stdu 1,-96(1)
> # handler stuff
> ld 0,16(1)
> mtlr 0
> blr
> # __kernel_sigtramp_rt64
> addi    r1,r1,__SIGNAL_FRAMESIZE
> li  r0,__NR_rt_sigreturn
> sc
> # kernel executes rt_sigreturn
> rfid
> # back to user address xyz
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc/64/signal: Balance return predictor stack in signal trampoline
  https://git.kernel.org/powerpc/c/0138ba5783ae0dcc799ad401a1e8ac8333790df9

cheers


Re: [PATCH 00/18] remove extended cede offline mode and bogus topology update code

2020-07-16 Thread Michael Ellerman
On Fri, 12 Jun 2020 00:12:20 -0500, Nathan Lynch wrote:
> Two major parts to this series:
> 
> 1. Removal of the extended cede offline mode for CPUs as well as the
>partition suspend code which accommodates it by temporarily
>onlining all CPUs prior to suspending the LPAR. This solves some
>accounting problems, simplifies the pseries CPU hotplug code, and
>greatly uncomplicates the existing partition suspend code, easing
>a much-needed transition to the Linux suspend framework. The two
>patches which make up this part have been posted before:
> 
> [...]

Applied to powerpc/next.

[01/18] powerpc/pseries: remove cede offline state for CPUs
  https://git.kernel.org/powerpc/c/48f6e7f6d948b56489da027bc3284c709b939d28
[02/18] powerpc/rtas: don't online CPUs for partition suspend
  https://git.kernel.org/powerpc/c/ec2fc2a9e9bbad9023aab65bc472ce7a3ca8608f
[03/18] powerpc/numa: remove ability to enable topology updates
  https://git.kernel.org/powerpc/c/c30f931e891eb0a32885ecd79984e1e7366fceda
[04/18] powerpc/numa: remove unreachable topology update code
  https://git.kernel.org/powerpc/c/7d35bef96a46f7e9e167bb25258c0bd389aeab1b
[05/18] powerpc/numa: make vphn_enabled, prrn_enabled flags const
  https://git.kernel.org/powerpc/c/e6eacf8eb4dee7bc7021c837666e3ebf1b0ec3b5
[06/18] powerpc/numa: remove unreachable topology timer code
  https://git.kernel.org/powerpc/c/50e0cf3742a01e72f4ea4a8fe9221b152e22871b
[07/18] powerpc/numa: remove unreachable topology workqueue code
  https://git.kernel.org/powerpc/c/6325cb4a4ea8f4af8515b923650dd8f709694b44
[08/18] powerpc/numa: remove vphn_enabled and prrn_enabled internal flags
  https://git.kernel.org/powerpc/c/9fb8b5fd1bf782a8257506ad5198237f4124d556
[09/18] powerpc/numa: stub out numa_update_cpu_topology()
  https://git.kernel.org/powerpc/c/893ec6461f46c91487d914e6d467d2e804b9a883
[10/18] powerpc/numa: remove timed_topology_update()
  https://git.kernel.org/powerpc/c/b1815aeac7fde2dc3412daf2efaededd21cd58e0
[11/18] powerpc/numa: remove start/stop_topology_update()
  https://git.kernel.org/powerpc/c/1835303e5690cbeef2c07a9a5416045475ddaa13
[12/18] powerpc/rtasd: simplify handle_rtas_event(), emit message on events
  https://git.kernel.org/powerpc/c/91713ac377859893a7798999cb2e3a388d8ae710
[13/18] powerpc/numa: remove prrn_is_enabled()
  https://git.kernel.org/powerpc/c/042ef7cc43f4571d8cbe44a7c735ab6622809142
[14/18] powerpc/numa: remove arch_update_cpu_topology
  https://git.kernel.org/powerpc/c/cdf082c4570f186d608aca688f2cc872b014558a
[15/18] powerpc/pseries: remove prrn special case from DT update path
  https://git.kernel.org/powerpc/c/bb7c3d36e3b18aa02d34358ae75e1b91f69a968b
[16/18] powerpc/pseries: remove memory "re-add" implementation
  https://git.kernel.org/powerpc/c/4abe60c6448bf1dba48689450ad1348e5fc6f7b7
[17/18] powerpc/pseries: remove dlpar_cpu_readd()
  https://git.kernel.org/powerpc/c/38c392cef19019457ddcfb197ff3d9c5267698e6
[18/18] powerpc/pseries: remove obsolete memory hotplug DT notifier code
  https://git.kernel.org/powerpc/c/e978a3ccaa714b5ff125857d2cbecbb6fdf6c094

cheers


Re: [PATCH] powerpc/boot/dts: Fix dtc "pciex" warnings

2020-07-16 Thread Michael Ellerman
On Tue, 23 Jun 2020 23:03:20 +1000, Michael Ellerman wrote:
> With CONFIG_OF_ALL_DTBS=y, as set by eg. allmodconfig, we see lots of
> warnings about our dts files, such as:
> 
>   arch/powerpc/boot/dts/glacier.dts:492.26-532.5:
>   Warning (pci_bridge): /plb/pciex@d: node name is not "pci"
>   or "pcie"
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc/boot/dts: Fix dtc "pciex" warnings
  https://git.kernel.org/powerpc/c/86bc917d2ac117ec922dbf8ed92ca989bf333281

cheers


Re: [PATCH v2] powerpc/perf: Add kernel support for new MSR[HV PR] bits in trace-imc

2020-07-16 Thread Michael Ellerman
On Mon, 13 Jul 2020 20:16:23 +0530, Madhavan Srinivasan wrote:
> IMC trace-mode record has MSR[HV PR] bits added in the third DW.
> These bits can be used to set the cpumode for the instruction pointer
> captured in each sample.
> 
> Add support in kernel to use these bits to set the cpumode for
> each sample.
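
For reference, a minimal sketch of the resulting mapping, assuming the HV and PR
bits have already been pulled out of the record's third doubleword (the helper
name is illustrative, not the driver's actual code):

    #include <linux/perf_event.h>

    /* Map MSR[HV]/MSR[PR] from a trace-imc record to a perf cpumode. */
    static unsigned int trace_imc_cpumode(bool hv, bool pr)
    {
            if (pr)
                    return PERF_RECORD_MISC_USER;

            return hv ? PERF_RECORD_MISC_HYPERVISOR : PERF_RECORD_MISC_KERNEL;
    }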

Applied to powerpc/next.

[1/1] powerpc/perf: Add kernel support for new MSR[HV PR] bits in trace-imc
  https://git.kernel.org/powerpc/c/77ca3951cc37727ae8361d583a30da7a1b84e427

cheers


Re: [PATCH] powerpc/boot: Use address-of operator on section symbols

2020-07-16 Thread Michael Ellerman
On Tue, 23 Jun 2020 20:59:20 -0700, Nathan Chancellor wrote:
> Clang warns:
> 
> arch/powerpc/boot/main.c:107:18: warning: array comparison always
> evaluates to a constant [-Wtautological-compare]
> if (_initrd_end > _initrd_start) {
> ^
> arch/powerpc/boot/main.c:155:20: warning: array comparison always
> evaluates to a constant [-Wtautological-compare]
> if (_esm_blob_end <= _esm_blob_start)
>   ^
> 2 warnings generated.
> 
> [...]
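
A minimal sketch of the pattern and of the fix, with illustrative declarations
(see the patch itself for the exact sites changed): the symbols are
linker-provided arrays, so comparing the array names directly is what clang
flags; comparing the addresses of their elements keeps the same semantics
without the warning.

    extern char _initrd_start[], _initrd_end[];

    static int have_initrd(void)
    {
            /* was: return _initrd_end > _initrd_start;  (tautological-compare) */
            return &_initrd_end[0] > &_initrd_start[0];
    }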

Applied to powerpc/next.

[1/1] powerpc/boot: Use address-of operator on section symbols
  https://git.kernel.org/powerpc/c/df4232d96e724d09e54a623362f9f610727f059f

cheers


Re: [PATCH v5 0/2] Add cpu hotplug support for powerpc/perf/hv-24x7

2020-07-16 Thread Michael Ellerman
On Thu, 9 Jul 2020 10:48:34 +0530, Kajol Jain wrote:
> This patchset adds cpu hotplug support for the hv_24x7 driver by adding
> online/offline cpu hotplug functions. It also adds a sysfs file,
> "cpumask", to expose the current online cpu that can be used for
> hv_24x7 event counting.
> 
> Changelog:
> v4 -> v5
> - Since we make the PMU registration fail in case hotplug init failed,
>   directly add the cpumask attr inside if_attrs rather than creating a
>   new attribute_group, as suggested by Madhavan Srinivasan.
> 
> [...]
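
A sketch of the usual registration pattern for this kind of change; the hotplug
state constant is the one this series is expected to add, and the callback
bodies here are placeholders:

    #include <linux/cpuhotplug.h>

    static int hv_24x7_cpu_online(unsigned int cpu)
    {
            /* e.g. nominate this CPU for 24x7 counter reads if none is set */
            return 0;
    }

    static int hv_24x7_cpu_offline(unsigned int cpu)
    {
            /* e.g. migrate the collecting role to another online CPU */
            return 0;
    }

    static int hv_24x7_hotplug_init(void)
    {
            return cpuhp_setup_state(CPUHP_AP_PERF_POWERPC_HV_24x7_ONLINE,
                                     "perf/powerpc/hv_24x7:online",
                                     hv_24x7_cpu_online, hv_24x7_cpu_offline);
    }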

Applied to powerpc/next.

[1/2] powerpc/perf/hv-24x7: Add cpu hotplug support
  https://git.kernel.org/powerpc/c/1a8f0886a6008c98a926bdeca49f2ef33015a491
[2/2] powerpc/hv-24x7: Add sysfs files inside hv-24x7 device to show cpumask
  https://git.kernel.org/powerpc/c/792f73f747b82f6cb191a323e1f5755d33149b50

cheers


Re: [PATCH] cpuidle/powernv : Remove dead code block

2020-07-16 Thread Michael Ellerman
On Mon, 6 Jul 2020 00:32:58 -0500, Abhishek Goel wrote:
> Commit 1961acad2f88559c2cdd2ef67c58c3627f1f6e54 removes usage of
> function "validate_dt_prop_sizes". This patch removes this unused
> function.

Applied to powerpc/next.

[1/1] cpuidle/powernv : Remove dead code block
  https://git.kernel.org/powerpc/c/c339f9be304c21da1c42899a824f84a2cc9ced30

cheers


Re: [PATCH] powerpc/Kconfig: Replace HTTP links with HTTPS ones

2020-07-16 Thread Michael Ellerman
On Mon, 13 Jul 2020 21:26:56 +0200, Alexander A. Klimov wrote:
> Rationale:
> Reduces attack surface on kernel devs opening the links for MITM
> as HTTPS traffic is much harder to manipulate.
> 
> Deterministic algorithm:
> For each file:
>   If not .svg:
> For each line:
>   If doesn't contain `\bxmlns\b`:
> For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
> If neither `\bgnu\.org/license`, nor `\bmozilla\.org/MPL\b`:
> If both the HTTP and HTTPS versions
> return 200 OK and serve the same content:
>   Replace HTTP with HTTPS.

Applied to powerpc/next.

[1/1] powerpc/Kconfig: Replace HTTP links with HTTPS ones
  https://git.kernel.org/powerpc/c/9a3e3dccbf4317d02d28f8f99a5d1ccce42f9922

cheers


Re: [PATCH] ocxl: Replace HTTP links with HTTPS ones

2020-07-16 Thread Michael Ellerman
On Mon, 13 Jul 2020 19:55:06 +0200, Alexander A. Klimov wrote:
> Rationale:
> Reduces attack surface on kernel devs opening the links for MITM
> as HTTPS traffic is much harder to manipulate.
> 
> Deterministic algorithm:
> For each file:
>   If not .svg:
> For each line:
>   If doesn't contain `\bxmlns\b`:
> For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
> If neither `\bgnu\.org/license`, nor `\bmozilla\.org/MPL\b`:
> If both the HTTP and HTTPS versions
> return 200 OK and serve the same content:
>   Replace HTTP with HTTPS.

Applied to powerpc/next.

[1/1] ocxl: Replace HTTP links with HTTPS ones
  https://git.kernel.org/powerpc/c/07497137a5efa9b2628c18083e8b07b33160153d

cheers


Re: [PATCH v5] ocxl: control via sysfs whether the FPGA is reloaded on a link reset

2020-07-16 Thread Michael Ellerman
On Fri, 19 Jun 2020 16:04:39 +0200, Frederic Barrat wrote:
> Some opencapi FPGA images allow controlling whether the FPGA should be reloaded
> on the next adapter reset. If it is supported, the image specifies it
> through a Vendor Specific DVSEC in the config space of function 0.

Applied to powerpc/next.

[1/1] ocxl: control via sysfs whether the FPGA is reloaded on a link reset
  https://git.kernel.org/powerpc/c/87db7579ebd5ded337056eb765542eb2608f16e3

cheers


Re: [PATCH] powerpc/signal64: Don't opencode page prefaulting

2020-07-16 Thread Michael Ellerman
On Tue, 7 Jul 2020 18:32:25 +0000 (UTC), Christophe Leroy wrote:
> Instead of doing a __get_user() from the first and last location
> into a tmp var which won't be used, use fault_in_pages_readable()
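
In other words, something along these lines (structure and variable names are
illustrative):

    #include <linux/pagemap.h>

    /* Prefault the whole user context in one call rather than reading the
     * first and last words into a throwaway variable. */
    static int prefault_user_ctx(void __user *uctx, size_t len)
    {
            return fault_in_pages_readable((const char __user *)uctx, len) ?
                    -EFAULT : 0;
    }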

Applied to powerpc/next.

[1/1] powerpc/signal64: Don't opencode page prefaulting
  https://git.kernel.org/powerpc/c/96032f983ca32ad1d43c73da922dbc7022754c3c

cheers


Re: [PATCH] selftests/powerpc: Purge extra count_pmc() calls of ebb selftests

2020-07-16 Thread Michael Ellerman
On Fri, 26 Jun 2020 13:47:37 -0300, Desnes A. Nunes do Rosario wrote:
> An extra count on ebb_state.stats.pmc_count[PMC_INDEX(pmc)] is being
> performed when count_pmc() is used to reset PMCs on a few selftests. This
> extra pmc_count can occasionally invalidate results, such as the ones from
> cycles_test shown hereafter. The ebb_check_count() failed with an
> above-the-upper-limit error due to the extra value on ebb_state.stats.pmc_count.
> 
> Furthermore, this extra count is also indicated by extra PMC1 trace_log on
> the output of the cycle test (as well as on pmc56_overflow_test):
> 
> [...]

Applied to powerpc/next.

[1/1] selftests/powerpc: Purge extra count_pmc() calls of ebb selftests
  https://git.kernel.org/powerpc/c/3337bf41e0dd70b4064cdf60acdfcdc2d050066c

cheers


Re: [PATCH 1/2] Revert "powerpc/kasan: Fix shadow pages allocation failure"

2020-07-16 Thread Michael Ellerman
On Thu, 2 Jul 2020 11:52:02 +0000 (UTC), Christophe Leroy wrote:
> This reverts commit d2a91cef9bbdeb87b7449fdab1a6be6000930210.
> 
> This commit moved too much work into kasan_init(). The allocation
> of shadow pages has to be moved for the reason explained in that
> patch, but the allocation of page tables still needs to be done
> before switching to the final hash table.
> 
> [...]

Applied to powerpc/next.

[1/2] Revert "powerpc/kasan: Fix shadow pages allocation failure"
  https://git.kernel.org/powerpc/c/b506923ee44ae87fc9f4de16b53feb313623e146
[2/2] powerpc/kasan: Fix shadow pages allocation failure
  https://git.kernel.org/powerpc/c/41ea93cf7ba4e0f0cc46ebfdda8b6ff27c67bc91

cheers


Re: [PATCH] docs: powerpc: Clarify book3s/32 MMU families

2020-07-16 Thread Michael Ellerman
On Thu, 2 Jul 2020 14:09:21 +0000 (UTC), Christophe Leroy wrote:
> The documentation wrongly states that book3s/32 CPUs have a hash MMU.
> 
> The 603 and e300 cores only have a software loaded TLB.
> 
> The 755, the 7450 family and the e600 core have both a hash MMU and a software
> loaded TLB. This can be selected by setting a bit in HID2 (755) or
> HID0 (others). At the time being this is not supported by the kernel.
> 
> [...]

Applied to powerpc/next.

[1/1] docs: powerpc: Clarify book3s/32 MMU families
  https://git.kernel.org/powerpc/c/7d38f089731fe129a49e254028caec6f05420f18

cheers


Re: [PATCH v8 0/8] powerpc: switch VDSO to C implementation

2020-07-16 Thread Michael Ellerman
On Tue, 28 Apr 2020 13:16:46 +0000 (UTC), Christophe Leroy wrote:
> This is the seventh version of a series to switch powerpc VDSO to
> generic C implementation.
> 
> Main changes since v7 are:
> - Added gettime64 on PPC32
> 
> This series applies on today's powerpc/merge branch.
> 
> [...]

Patch 1 applied to powerpc/next.

[1/8] powerpc/vdso64: Switch from __get_datapage() to get_datapage inline macro
  https://git.kernel.org/powerpc/c/793d74a8c78e05d6833bfcf582e24e40bd92518f

cheers


Re: [PATCH 1/2] powerpc/signal_32: Remove !FULL_REGS() special handling in PPC64 save_general_regs()

2020-07-16 Thread Michael Ellerman
On Tue, 7 Jul 2020 12:33:35 +0000 (UTC), Christophe Leroy wrote:
> Since commit 1bd79336a426 ("powerpc: Fix various
> syscall/signal/swapcontext bugs"), getting save_general_regs() called
> without FULL_REGS() is very unlikely and generates a warning.
> 
> The 32-bit version of save_general_regs() doesn't take care of it
> at all and copies all registers anyway since that commit.
> 
> [...]

Applied to powerpc/next.

[1/2] powerpc/signal_32: Remove !FULL_REGS() special handling in PPC64 save_general_regs()
  https://git.kernel.org/powerpc/c/667e3c413ecf20371692fd2dc37e06dc14d0b140
[2/2] powerpc/signal_32: Simplify loop in PPC64 save_general_regs()
  https://git.kernel.org/powerpc/c/020c4831e01264f8b62af6ca9e669b7c51881a56

cheers


Re: [PATCH v2] powerpc: Drop CONFIG_MTD_M25P80 in 85xx-hw.config

2020-07-16 Thread Michael Ellerman
On Fri, 1 May 2020 21:44:54 -0700, Bin Meng wrote:
> Drop CONFIG_MTD_M25P80 that was removed in
> commit b35b9a10362d ("mtd: spi-nor: Move m25p80 code in spi-nor.c")

Applied to powerpc/next.

[1/1] powerpc: Drop CONFIG_MTD_M25P80 in 85xx-hw.config
  https://git.kernel.org/powerpc/c/76f09371bc05d6eb8d5a01823c9eaab768d6e934

cheers


Re: [PATCH v3 0/3] Off-load TLB invalidations to host for !GTSE

2020-07-16 Thread Michael Ellerman
On Fri, 3 Jul 2020 11:06:05 +0530, Bharata B Rao wrote:
> The hypervisor may choose not to enable the Guest Translation Shootdown Enable
> (GTSE) option for the guest. When GTSE isn't on, the guest OS isn't
> permitted to use instructions like tlbie and tlbsync directly, but is
> expected to make hypervisor calls to get the TLB flushed.
> 
> This series enables the TLB flush routines in the radix code to
> off-load TLB flushing to hypervisor via the newly proposed hcall
> H_RPT_INVALIDATE.
> 
> [...]

Applied to powerpc/next.

[1/3] powerpc/mm: Enable radix GTSE only if supported.
  https://git.kernel.org/powerpc/c/029ab30b4c0a7ec587eece1ec07c3981fdff2bed
[2/3] powerpc/pseries: H_REGISTER_PROC_TBL should ask for GTSE only if enabled
  https://git.kernel.org/powerpc/c/b6c84175078ff022b343b7b0737aeb33001ca90c
[3/3] powerpc/mm/book3s64/radix: Off-load TLB invalidations to host when !GTSE
  https://git.kernel.org/powerpc/c/dd3d9aa5589c52efaec12ffeb84f0f5f8208fbc3

cheers


Re: [PATCH] powerpc/spufs: add CONFIG_COREDUMP dependency

2020-07-16 Thread Michael Ellerman
On Mon, 6 Jul 2020 15:22:46 +0200, Arnd Bergmann wrote:
> The kernel test robot pointed out a slightly different error message
> after recent commit 5456ffdee666 ("powerpc/spufs: simplify spufs core
> dumping") to spufs for a configuration that never worked:
> 
>powerpc64-linux-ld: arch/powerpc/platforms/cell/spufs/file.o: in function 
> `.spufs_proxydma_info_dump':
> >> file.c:(.text+0x4c68): undefined reference to `.dump_emit'
>powerpc64-linux-ld: arch/powerpc/platforms/cell/spufs/file.o: in function 
> `.spufs_dma_info_dump':
>file.c:(.text+0x4d70): undefined reference to `.dump_emit'
>powerpc64-linux-ld: arch/powerpc/platforms/cell/spufs/file.o: in function 
> `.spufs_wbox_info_dump':
>file.c:(.text+0x4df4): undefined reference to `.dump_emit'
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc/spufs: add CONFIG_COREDUMP dependency
  https://git.kernel.org/powerpc/c/b648a5132ca3237a0f1ce5d871fff342b0efcf8a

cheers


Re: [PATCH v2 0/6] consolidate PowerPC instruction encoding macros

2020-07-16 Thread Michael Ellerman
On Wed, 24 Jun 2020 17:00:32 +0530, Balamuruhan S wrote:
> ppc-opcode.h has base instruction encodings wrapped with stringify_in_c()
> so the raw encodings stay compatible. But there are redundant macros for
> base instruction encodings in bpf, the instruction emulation test infrastructure
> and the powerpc selftests.
> 
> Currently PPC_INST_* macros are used for encoding instruction opcodes and PPC_*
> for raw instruction encoding. This patchset introduces PPC_RAW_* macros for
> base instruction encoding and reuses them from elsewhere. With this change we can
> avoid redundant macro definitions in multiple files and start adding new
> instructions to ppc-opcode.h in future.
> 
> [...]
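
A minimal sketch of the idea: a single integer encoding, built from the
existing field helpers in ppc-opcode.h, that every user (emulation tests, BPF
JIT, selftests) can share and that can still be stringified for inline asm.
The "add" opcode is shown as an example.

    /* Register field helpers as in ppc-opcode.h */
    #define ___PPC_RT(t)    (((t) & 0x1f) << 21)
    #define ___PPC_RA(a)    (((a) & 0x1f) << 16)
    #define ___PPC_RB(b)    (((b) & 0x1f) << 11)

    /* add rt,ra,rb : primary opcode 31, extended opcode 266 */
    #define PPC_RAW_ADD(t, a, b)    (0x7c000214 | ___PPC_RT(t) | ___PPC_RA(a) | ___PPC_RB(b))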

Applied to powerpc/next.

[1/6] powerpc/ppc-opcode: Introduce PPC_RAW_* macros for base instruction encoding
  https://git.kernel.org/powerpc/c/db551f8cc6a33f79cd2d2a6cfd1903f044e828a8
[2/6] powerpc/ppc-opcode: Move ppc instruction encoding from test_emulate_step
  https://git.kernel.org/powerpc/c/1d33dd84080f4a430bde2fc363d9b70f0a010c19
[3/6] powerpc/bpf_jit: Reuse instruction macros from ppc-opcode.h
  https://git.kernel.org/powerpc/c/0654186510a40e7e1fa788cb941d1a156ba2dcb2
[4/6] powerpc/ppc-opcode: Consolidate powerpc instructions from bpf_jit.h
  https://git.kernel.org/powerpc/c/3a181237916310b2bbbad158d97933bb2b4e7552
[5/6] powerpc/ppc-opcode: Reuse raw instruction macros to stringify
  https://git.kernel.org/powerpc/c/357c572948310c88868cee00e64ca3f7fc933a74
[6/6] powerpc/ppc-opcode: Fold PPC_INST_* macros into PPC_RAW_* macros
  https://git.kernel.org/powerpc/c/e4208f1399b1bf7ed84ba359a6ba0979d1df4029

cheers


Re: [PATCH] pseries: Fix 64 bit logical memory block panic

2020-07-16 Thread Michael Ellerman
On Wed, 15 Jul 2020 10:08:20 +1000, Anton Blanchard wrote:
> Booting with a 4GB LMB size causes us to panic:
> 
>   qemu-system-ppc64: OS terminated: OS panic:
>   Memory block size not suitable: 0x0
> 
> Fix pseries_memory_block_size() to handle 64 bit LMBs.
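
The underlying issue is that the LMB size is a 64-bit device-tree quantity, so
a 4GB value truncates to 0 if it is read into 32 bits. A hedged sketch of
reading it in full (the helper name is illustrative, not the kernel's):

    #include <linux/of.h>

    static u64 read_lmb_size(struct device_node *np)
    {
            const __be64 *prop = of_get_property(np, "ibm,lmb-size", NULL);

            return prop ? be64_to_cpup(prop) : 0;
    }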

Applied to powerpc/next.

[1/1] pseries: Fix 64 bit logical memory block panic
  https://git.kernel.org/powerpc/c/89c140bbaeee7a55ed0360a88f294ead2b95201b

cheers


Re: [PATCH] xmon: Reset RCU and soft lockup watchdogs

2020-07-16 Thread Michael Ellerman
On Tue, 30 Jun 2020 10:02:18 +1000, Anton Blanchard wrote:
> I'm seeing RCU warnings when exiting xmon. xmon resets the NMI watchdog,
> but does nothing with the RCU stall or soft lockup watchdogs. Add a
> helper function that handles all three.
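
A sketch of such a helper; these are the generic kernel interfaces for the
three watchdogs, though the exact helper name used in the patch may differ:

    #include <linux/nmi.h>
    #include <linux/rcupdate.h>

    static void xmon_touch_watchdogs(void)
    {
            touch_softlockup_watchdog_sync();
            rcu_cpu_stall_reset();
            touch_nmi_watchdog();
    }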

Applied to powerpc/next.

[1/1] powerpc/xmon: Reset RCU and soft lockup watchdogs
  https://git.kernel.org/powerpc/c/5c699396f5f6cf6d67055af7b82c270d31fd831a

cheers

