Re: [PATCH 3/3] powerpc/smp: Use existing L2 cache_map cpumask to find L3 cache siblings

2021-07-19 Thread Gautham R Shenoy
Hi Parth,

Sorry for the late review.

On Tue, Jun 15, 2021 at 12:38:04PM +0530, Parth Shah wrote:
> On POWER10 systems, the "ibm,thread-groups" property "2" indicates the cpus
> in thread-group share both L2 and L3 caches. Hence, use cache_property = 2
> itself to find both the L2 and L3 cache siblings.
> Hence, rename existing macros to detect if the cache property is for L2 or
> L3 and use the L2 cache map itself to find the presence of L3 siblings.
> 
> Signed-off-by: Parth Shah 
> ---
>  arch/powerpc/include/asm/smp.h  |  2 ++
>  arch/powerpc/kernel/cacheinfo.c |  3 +++
>  arch/powerpc/kernel/smp.c   | 20 +++-
>  3 files changed, 20 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
> index 1259040cc3a4..55082d343bd2 100644
> --- a/arch/powerpc/include/asm/smp.h
> +++ b/arch/powerpc/include/asm/smp.h
> @@ -144,6 +144,7 @@ extern int cpu_to_core_id(int cpu);
> 
>  extern bool has_big_cores;
>  extern bool thread_group_shares_l2;
> +extern bool thread_group_shares_l3;
> 
>  #define cpu_smt_mask cpu_smt_mask
>  #ifdef CONFIG_SCHED_SMT
> @@ -198,6 +199,7 @@ extern void __cpu_die(unsigned int cpu);
>  #define hard_smp_processor_id()  get_hard_smp_processor_id(0)
>  #define smp_setup_cpu_maps()
>  #define thread_group_shares_l2  0
> +#define thread_group_shares_l3   0
>  static inline void inhibit_secondary_onlining(void) {}
>  static inline void uninhibit_secondary_onlining(void) {}
>  static inline const struct cpumask *cpu_sibling_mask(int cpu)
> diff --git a/arch/powerpc/kernel/cacheinfo.c b/arch/powerpc/kernel/cacheinfo.c
> index 20d91693eac1..378ae20d05a9 100644
> --- a/arch/powerpc/kernel/cacheinfo.c
> +++ b/arch/powerpc/kernel/cacheinfo.c
> @@ -469,6 +469,9 @@ static int get_group_id(unsigned int cpu_id, int level)
>   else if (thread_group_shares_l2 && level == 2)
>   return cpumask_first(per_cpu(thread_group_l2_cache_map,
>cpu_id));
> + else if (thread_group_shares_l3 && level == 3)
> + return cpumask_first(per_cpu(thread_group_l2_cache_map,
> +  cpu_id));

We should either rename thread_group_l2_cache_map as
thread_group_l2_l3_cache_map or we should create a separate
thread_group_l3_cache_map. I prefer the latter approach since it makes
the code consistent. 

Otherwise, the patch looks good to me.

--
Thanks and Regards
gautham.

>   return -1;
>  }
> 
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index a34877257f2d..d0c70fcd0068 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -78,6 +78,7 @@ struct task_struct *secondary_current;
>  bool has_big_cores;
>  bool coregroup_enabled;
>  bool thread_group_shares_l2;
> +bool thread_group_shares_l3;
> 
>  DEFINE_PER_CPU(cpumask_var_t, cpu_sibling_map);
>  DEFINE_PER_CPU(cpumask_var_t, cpu_smallcore_map);
> @@ -101,7 +102,7 @@ enum {
> 
>  #define MAX_THREAD_LIST_SIZE 8
>  #define THREAD_GROUP_SHARE_L1   1
> -#define THREAD_GROUP_SHARE_L2   2
> +#define THREAD_GROUP_SHARE_L2_L3 2
>  struct thread_groups {
>   unsigned int property;
>   unsigned int nr_groups;
> @@ -887,9 +888,16 @@ static int __init init_thread_group_cache_map(int cpu, 
> int cache_property)
>   cpumask_var_t *mask = NULL;
> 
>   if (cache_property != THREAD_GROUP_SHARE_L1 &&
> - cache_property != THREAD_GROUP_SHARE_L2)
> + cache_property != THREAD_GROUP_SHARE_L2_L3)
>   return -EINVAL;
> 
> + /*
> +  * On P10 fused-core system, the L3 cache is shared between threads of a
> +  * small core only, but the "ibm,thread-groups" property is indicated as
> +  * "2" only which is interpreted as the thread-groups sharing both L2
> +  * and L3 caches. Hence cache_property of THREAD_GROUP_SHARE_L2_L3 is
> +  * used for both L2 and L3 cache sibling detection.
> +  */
>   tg = get_thread_groups(cpu, cache_property, &err);
>   if (!tg)
>   return err;
> @@ -903,7 +911,7 @@ static int __init init_thread_group_cache_map(int cpu, 
> int cache_property)
> 
>   if (cache_property == THREAD_GROUP_SHARE_L1)
>   mask = &per_cpu(thread_group_l1_cache_map, cpu);
> - else if (cache_property == THREAD_GROUP_SHARE_L2)
> + else if (cache_property == THREAD_GROUP_SHARE_L2_L3)
>   mask = &per_cpu(thread_group_l2_cache_map, cpu);
> 
>   zalloc_cpumask_var_node(mask, GFP_KERNEL, cpu_to_node(cpu));
> @@ -1009,14 +1017,16 @@ static int __init init_big_cores(void)
>   has_big_cores = true;
> 
>   for_each_possible_cpu(cpu) {
> - int err = init_thread_group_cache_map(cpu, 
> THREAD_GROUP_SHARE_L2);
> + int err = init_thread_group_cache_map(cpu, 
> THREAD_GROUP_SHARE_L2_L3);
> 
>   if (err)
>   return err;
>   }
> 
>   thread_group_shares_l2 = true;
> -   

Re: [RFC 0/2] Add generic FPU api similar to x86

2021-07-19 Thread Christian König
While you already CCed a bunch of people stuff like that needs to go to 
the appropriate mailing list and not just amd-gfx.


Especially LKML so that other core devs can take a look as well.

Regards,
Christian.

Am 19.07.21 um 21:52 schrieb Anson Jacob:

This is an attempt to have generic FPU enable/disable
calls similar to x86.
So that we can simplify gpu/drm/amd/display/dc/os_types.h

Also adds FPU correctness logic seen in x86.

Anson Jacob (2):
   ppc/fpu: Add generic FPU api similar to x86
   drm/amd/display: Use PPC FPU functions

  arch/powerpc/include/asm/switch_to.h  |  29 ++---
  arch/powerpc/kernel/process.c | 130 ++
  drivers/gpu/drm/amd/display/dc/os_types.h |  28 +
  3 files changed, 139 insertions(+), 48 deletions(-)





Re: [RFC 2/2] drm/amd/display: Use PPC FPU functions

2021-07-19 Thread Christoph Hellwig
>  #define DC_FP_END() kernel_fpu_end()
>  #elif defined(CONFIG_PPC64)
>  #include 
> +#define DC_FP_START() kernel_fpu_begin()
> +#define DC_FP_END() kernel_fpu_end()
>  #endif

Please use the same header as x86 in your first patch and then kill
this ifdefered and the DC_FP_START/DC_FP_END definitions entirely.


Re: [RFC 1/2] ppc/fpu: Add generic FPU api similar to x86

2021-07-19 Thread Christoph Hellwig
On Mon, Jul 19, 2021 at 03:52:10PM -0400, Anson Jacob wrote:
> - Add kernel_fpu_begin & kernel_fpu_end API as x86
> - Add logic similar to x86 to ensure fpu
>   begin/end call correctness
> - Add kernel_fpu_enabled to know if FPU is enabled
> 
> Signed-off-by: Anson Jacob 

All the x86 FPU support is EXPORT_SYMBOL_GPL for a good reason, so
please stick to that.


Re: [PATCH v5 02/11] powerpc/kernel/iommu: Add new iommu_table_in_use() helper

2021-07-19 Thread Leonardo Brás
Hello Fred, thanks for this feedback!

Sorry if I miss anything, this snippet was written for v1 over an year
ago, and I have not taken a look at it ever since.

On Mon, 2021-07-19 at 15:53 +0200, Frederic Barrat wrote:
> 
> 
> On 16/07/2021 10:27, Leonardo Bras wrote:
> > @@ -1099,18 +1105,13 @@ int iommu_take_ownership(struct iommu_table
> > *tbl)
> > for (i = 0; i < tbl->nr_pools; i++)
> > spin_lock_nest_lock(&tbl->pools[i].lock, &tbl-
> > >large_pool.lock);
> >   
> > -   iommu_table_release_pages(tbl);
> > -
> > -   if (!bitmap_empty(tbl->it_map, tbl->it_size)) {
> > +   if (iommu_table_in_use(tbl)) {
> > pr_err("iommu_tce: it_map is not empty");
> > ret = -EBUSY;
> > -   /* Undo iommu_table_release_pages, i.e. restore
> > bit#0, etc */
> > -   iommu_table_reserve_pages(tbl, tbl-
> > >it_reserved_start,
> > -   tbl->it_reserved_end);
> > -   } else {
> > -   memset(tbl->it_map, 0xff, sz);
> > }
> >   
> > +   memset(tbl->it_map, 0xff, sz);
> > +
> 
> 
> So if the table is not empty, we fail (EBUSY) but we now also
> completely 
> overwrite the bitmap. It was in an unexpected state, but we're making
> it 
> worse. Or am I missing something?

IIRC there was a reason to do that at the time, but TBH I don't really
remember it, and by looking at the code right now you seem to be
correct about this causing trouble.

I will send a v6 fixing it soon.
Please review the remaining patches for some issue I may be missing.

Alexey, any comments on that?

> 
>    Fred
> 

Again, thank you for reviewing Fred! 
Best regards,
Leonardo Bras







[PATCH v6 1/1] powerpc/pseries: Interface to represent PAPR firmware attributes

2021-07-19 Thread Pratik R. Sampat
Adds a generic interface to represent the energy and frequency related
PAPR attributes on the system using the new H_CALL
"H_GET_ENERGY_SCALE_INFO".

H_GET_EM_PARMS H_CALL was previously responsible for exporting this
information in the lparcfg, however the H_GET_EM_PARMS H_CALL
will be deprecated P10 onwards.

The H_GET_ENERGY_SCALE_INFO H_CALL is of the following call format:
hcall(
  uint64 H_GET_ENERGY_SCALE_INFO,  // Get energy scale info
  uint64 flags,   // Per the flag request
  uint64 firstAttributeId,// The attribute id
  uint64 bufferAddress,   // Guest physical address of the output buffer
  uint64 bufferSize   // The size in bytes of the output buffer
);

This H_CALL can query either all the attributes at once with
firstAttributeId = 0, flags = 0 as well as query only one attribute
at a time with firstAttributeId = id, flags = 1.

The output buffer consists of the following
1. number of attributes  - 8 bytes
2. array offset to the data location - 8 bytes
3. version info  - 1 byte
4. A data array of size num attributes, which contains the following:
  a. attribute ID  - 8 bytes
  b. attribute value in number - 8 bytes
  c. attribute name in string  - 64 bytes
  d. attribute value in string - 64 bytes

The new H_CALL exports information in direct string value format, hence
a new interface has been introduced in
/sys/firmware/papr/energy_scale_info to export this information to
userspace in an extensible pass-through format.

The H_CALL returns the name, numeric value and string value (if exists)

The format of exposing the sysfs information is as follows:
/sys/firmware/papr/energy_scale_info/
   |-- /
 |-- desc
 |-- value
 |-- value_desc (if exists)
   |-- /
 |-- desc
 |-- value
 |-- value_desc (if exists)
...

The energy information that is exported is useful for userspace tools
such as powerpc-utils. Currently these tools infer the
"power_mode_data" value in the lparcfg, which in turn is obtained from
the to be deprecated H_GET_EM_PARMS H_CALL.
On future platforms, such userspace utilities will have to look at the
data returned from the new H_CALL being populated in this new sysfs
interface and report this information directly without the need of
interpretation.

Signed-off-by: Pratik R. Sampat 
Reviewed-by: Gautham R. Shenoy 
---
 .../sysfs-firmware-papr-energy-scale-info |  26 ++
 arch/powerpc/include/asm/hvcall.h |  24 +-
 arch/powerpc/kvm/trace_hv.h   |   1 +
 arch/powerpc/platforms/pseries/Makefile   |   3 +-
 .../pseries/papr_platform_attributes.c| 312 ++
 5 files changed, 364 insertions(+), 2 deletions(-)
 create mode 100644 
Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info
 create mode 100644 arch/powerpc/platforms/pseries/papr_platform_attributes.c

diff --git a/Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info 
b/Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info
new file mode 100644
index ..139a576c7c9d
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info
@@ -0,0 +1,26 @@
+What:  /sys/firmware/papr/energy_scale_info
+Date:  June 2021
+Contact:   Linux for PowerPC mailing list 
+Description:   Directory hosting a set of platform attributes like
+   energy/frequency on Linux running as a PAPR guest.
+
+   Each file in a directory contains a platform
+   attribute hierarchy pertaining to performance/
+   energy-savings mode and processor frequency.
+
+What:  /sys/firmware/papr/energy_scale_info/
+   /sys/firmware/papr/energy_scale_info//desc
+   /sys/firmware/papr/energy_scale_info//value
+   /sys/firmware/papr/energy_scale_info//value_desc
+Date:  June 2021
+Contact:   Linux for PowerPC mailing list 
+Description:   Energy, frequency attributes directory for POWERVM servers
+
+   This directory provides energy, frequency, folding information. 
It
+   contains below sysfs attributes:
+
+   - desc: String description of the attribute 
+
+   - value: Numeric value of attribute 
+
+   - value_desc: String value of attribute 
diff --git a/arch/powerpc/include/asm/hvcall.h 
b/arch/powerpc/include/asm/hvcall.h
index e3b29eda8074..c91714ea6719 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -316,7 +316,8 @@
 #define H_SCM_PERFORMANCE_STATS 0x418
 #define H_RPT_INVALIDATE   0x448
 #define H_SCM_FLUSH0x44C
-#define MAX_HCALL_OPCODE   H_SCM_FLUSH
+#define H_GET_ENERGY_SCALE_INFO0x450
+#define MAX_HCALL_OPCODE   H_GET_ENERGY_SCALE_INFO
 
 /* Scope args for H_SCM_UNBIND_ALL */
 #define H_UNBIND_SCOPE_ALL (0x1)
@@ -631,6 +632,27 @@ struct hv_gpci_request_buffer {
uint8_t bytes[HGPCI_MAX_DATA_BYTES];
 } __packed;
 
+#

[PATCH v6 0/1] Interface to represent PAPR firmware attributes

2021-07-19 Thread Pratik R. Sampat
RFC: https://lkml.org/lkml/2021/6/4/791
PATCH v1: https://lkml.org/lkml/2021/6/16/805
PATCH v2: https://lkml.org/lkml/2021/7/6/138
PATCH v3: https://lkml.org/lkml/2021/7/12/2799
PATCH v4: https://lkml.org/lkml/2021/7/16/532
PATCH v5: https://lkml.org/lkml/2021/7/19/247

Changelog v5 --> v6
1. On allocation faliure of "pgs[idx].pg attributes", redirect the free
   using "goto" to a common cleanups section for consistency

Also, have implemented a POC using this interface for the powerpc-utils'
ppc64_cpu --frequency command-line tool to utilize this information
in userspace.
The POC for the new interface has been hosted here:
https://github.com/pratiksampat/powerpc-utils/tree/H_GET_ENERGY_SCALE_INFO_v2

Sample output from the powerpc-utils tool is as follows:

# ppc64_cpu --frequency
Power and Performance Mode: 
Idle Power Saver Status   : 
Processor Folding Status  :  --> Printed if Idle power save status is 
supported

Platform reported frequencies --> Frequencies reported from the platform's 
H_CALL i.e PAPR interface
min: GHz
max: GHz
static : GHz

Tool Computed frequencies
min: GHz (cpu XX)
max: GHz (cpu XX)
avg: GHz

Pratik R. Sampat (1):
  powerpc/pseries: Interface to represent PAPR firmware attributes

 .../sysfs-firmware-papr-energy-scale-info |  26 ++
 arch/powerpc/include/asm/hvcall.h |  24 +-
 arch/powerpc/kvm/trace_hv.h   |   1 +
 arch/powerpc/platforms/pseries/Makefile   |   3 +-
 .../pseries/papr_platform_attributes.c| 312 ++
 5 files changed, 364 insertions(+), 2 deletions(-)
 create mode 100644 
Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info
 create mode 100644 arch/powerpc/platforms/pseries/papr_platform_attributes.c

-- 
2.31.1



Re: [PATCH v5 1/1] powerpc/pseries: Interface to represent PAPR firmware attributes

2021-07-19 Thread Pratik Sampat




On 20/07/21 12:39 am, Fabiano Rosas wrote:

"Pratik R. Sampat"  writes:


+   pgs = kcalloc(num_attrs, sizeof(*pgs), GFP_KERNEL);
+   if (!pgs)
+   goto out;
+
+   papr_kobj = kobject_create_and_add("papr", firmware_kobj);
+   if (!papr_kobj) {
+   pr_warn("kobject_create_and_add papr failed\n");
+   goto out_pgs;
+   }
+
+   esi_kobj = kobject_create_and_add("energy_scale_info", papr_kobj);
+   if (!esi_kobj) {
+   pr_warn("kobject_create_and_add energy_scale_info failed\n");
+   goto out_kobj;
+   }
+
+   for (idx = 0; idx < num_attrs; idx++) {
+   bool show_val_desc = true;
+
+   pgs[idx].pg.attrs = kcalloc(MAX_ATTRS + 1,
+   sizeof(*pgs[idx].pg.attrs),
+   GFP_KERNEL);
+   if (!pgs[idx].pg.attrs) {
+   for (i = idx - 1; i >= 0; i--)
+   kfree(pgs[i].pg.attrs);

What about the pg.name from the previous iterations?


+   goto out_ekobj;
+   }
+
+   pgs[idx].pg.name = kasprintf(GFP_KERNEL, "%lld",
+be64_to_cpu(esi_attrs[idx].id));
+   if (pgs[idx].pg.name == NULL) {
+   for (i = idx; i >= 0; i--)
+   kfree(pgs[i].pg.attrs);

Here too.

You could just 'goto out_pgattrs' in both cases.


Yeah, you're right. I may have over-complicated the free, in case of
failure in both the cases above I could just free from "out_pgattrs"
with no issues


+   goto out_ekobj;
+   }
+   /* Do not add the value description if it does not exist */
+   if (strnlen(esi_attrs[idx].value_desc,
+   sizeof(esi_attrs[idx].value_desc)) == 0)
+   show_val_desc = false;
+
+   if (add_attr_group(be64_to_cpu(esi_attrs[idx].id), &pgs[idx],
+  show_val_desc)) {
+   pr_warn("Failed to create papr attribute group %s\n",
+   pgs[idx].pg.name);
+   goto out_pgattrs;
+   }
+   }
+
+   kfree(esi_buf);
+   return 0;
+
+out_pgattrs:
+   for (i = 0; i < num_attrs ; i++) {
+   kfree(pgs[i].pg.attrs);
+   kfree(pgs[i].pg.name);
+   }
+out_ekobj:
+   kobject_put(esi_kobj);
+out_kobj:
+   kobject_put(papr_kobj);
+out_pgs:
+   kfree(pgs);
+out:
+   kfree(esi_buf);
+
+   return -ENOMEM;
+}
+
+machine_device_initcall(pseries, papr_init);




[RFC 1/2] ppc/fpu: Add generic FPU api similar to x86

2021-07-19 Thread Anson Jacob
- Add kernel_fpu_begin & kernel_fpu_end API as x86
- Add logic similar to x86 to ensure fpu
  begin/end call correctness
- Add kernel_fpu_enabled to know if FPU is enabled

Signed-off-by: Anson Jacob 
---
 arch/powerpc/include/asm/switch_to.h |  29 ++
 arch/powerpc/kernel/process.c| 130 +++
 2 files changed, 137 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/include/asm/switch_to.h 
b/arch/powerpc/include/asm/switch_to.h
index 9d1fbd8be1c7..aded7aa661c0 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -41,10 +41,7 @@ extern void enable_kernel_fp(void);
 extern void flush_fp_to_thread(struct task_struct *);
 extern void giveup_fpu(struct task_struct *);
 extern void save_fpu(struct task_struct *);
-static inline void disable_kernel_fp(void)
-{
-   msr_check_and_clear(MSR_FP);
-}
+extern void disable_kernel_fp(void);
 #else
 static inline void save_fpu(struct task_struct *t) { }
 static inline void flush_fp_to_thread(struct task_struct *t) { }
@@ -55,10 +52,7 @@ extern void enable_kernel_altivec(void);
 extern void flush_altivec_to_thread(struct task_struct *);
 extern void giveup_altivec(struct task_struct *);
 extern void save_altivec(struct task_struct *);
-static inline void disable_kernel_altivec(void)
-{
-   msr_check_and_clear(MSR_VEC);
-}
+extern void disable_kernel_altivec(void);
 #else
 static inline void save_altivec(struct task_struct *t) { }
 static inline void __giveup_altivec(struct task_struct *t) { }
@@ -67,20 +61,7 @@ static inline void __giveup_altivec(struct task_struct *t) { 
}
 #ifdef CONFIG_VSX
 extern void enable_kernel_vsx(void);
 extern void flush_vsx_to_thread(struct task_struct *);
-static inline void disable_kernel_vsx(void)
-{
-   msr_check_and_clear(MSR_FP|MSR_VEC|MSR_VSX);
-}
-#else
-static inline void enable_kernel_vsx(void)
-{
-   BUILD_BUG();
-}
-
-static inline void disable_kernel_vsx(void)
-{
-   BUILD_BUG();
-}
+extern void disable_kernel_vsx(void);
 #endif
 
 #ifdef CONFIG_SPE
@@ -114,4 +95,8 @@ static inline void clear_task_ebb(struct task_struct *t)
 
 extern int set_thread_tidr(struct task_struct *t);
 
+bool kernel_fpu_enabled(void);
+void kernel_fpu_begin(void);
+void kernel_fpu_end(void);
+
 #endif /* _ASM_POWERPC_SWITCH_TO_H */
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 185beb290580..2ced8c6a3fab 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -75,6 +75,17 @@
 #define TM_DEBUG(x...) do { } while(0)
 #endif
 
+/*
+ * Track whether the kernel is using the FPU state
+ * currently.
+ *
+ * This flag is used:
+ *
+ *   - kernel_fpu_begin()/end() correctness
+ *   - kernel_fpu_enabled info
+ */
+static DEFINE_PER_CPU(bool, in_kernel_fpu);
+
 extern unsigned long _get_SP(void);
 
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
@@ -212,6 +223,9 @@ void enable_kernel_fp(void)
unsigned long cpumsr;
 
WARN_ON(preemptible());
+   WARN_ON_ONCE(this_cpu_read(in_kernel_fpu));
+
+   this_cpu_write(in_kernel_fpu, true);
 
cpumsr = msr_check_and_set(MSR_FP);
 
@@ -231,6 +245,15 @@ void enable_kernel_fp(void)
}
 }
 EXPORT_SYMBOL(enable_kernel_fp);
+
+void disable_kernel_fp(void)
+{
+   WARN_ON_ONCE(!this_cpu_read(in_kernel_fpu));
+
+   this_cpu_write(in_kernel_fpu, false);
+   msr_check_and_clear(MSR_FP);
+}
+EXPORT_SYMBOL(disable_kernel_fp);
 #else
 static inline void __giveup_fpu(struct task_struct *tsk) { }
 #endif /* CONFIG_PPC_FPU */
@@ -263,6 +286,9 @@ void enable_kernel_altivec(void)
unsigned long cpumsr;
 
WARN_ON(preemptible());
+   WARN_ON_ONCE(this_cpu_read(in_kernel_fpu));
+
+   this_cpu_write(in_kernel_fpu, true);
 
cpumsr = msr_check_and_set(MSR_VEC);
 
@@ -283,6 +309,14 @@ void enable_kernel_altivec(void)
 }
 EXPORT_SYMBOL(enable_kernel_altivec);
 
+void disable_kernel_altivec(void)
+{
+   WARN_ON_ONCE(!this_cpu_read(in_kernel_fpu));
+
+   this_cpu_write(in_kernel_fpu, false);
+   msr_check_and_clear(MSR_VEC);
+}
+EXPORT_SYMBOL(disable_kernel_altivec);
 /*
  * Make sure the VMX/Altivec register state in the
  * the thread_struct is up to date for task tsk.
@@ -333,6 +367,9 @@ void enable_kernel_vsx(void)
unsigned long cpumsr;
 
WARN_ON(preemptible());
+   WARN_ON_ONCE(this_cpu_read(in_kernel_fpu));
+
+   this_cpu_write(in_kernel_fpu, true);
 
cpumsr = msr_check_and_set(MSR_FP|MSR_VEC|MSR_VSX);
 
@@ -354,6 +391,15 @@ void enable_kernel_vsx(void)
 }
 EXPORT_SYMBOL(enable_kernel_vsx);
 
+void disable_kernel_vsx(void)
+{
+   WARN_ON_ONCE(!this_cpu_read(in_kernel_fpu));
+
+   this_cpu_write(in_kernel_fpu, false);
+   msr_check_and_clear(MSR_FP|MSR_VEC|MSR_VSX);
+}
+EXPORT_SYMBOL(disable_kernel_vsx);
+
 void flush_vsx_to_thread(struct task_struct *tsk)
 {
if (tsk->thread.regs) {
@@ -406,6 +452,90 @@ void flush_spe_to_thread(struct task_struct *tsk)
 }
 #en

[RFC 2/2] drm/amd/display: Use PPC FPU functions

2021-07-19 Thread Anson Jacob
Use kernel_fpu_begin & kernel_fpu_end for PPC

Depends on "ppc/fpu: Add generic FPU api similar to x86"

Signed-off-by: Anson Jacob 
---
 drivers/gpu/drm/amd/display/dc/os_types.h | 28 ++-
 1 file changed, 2 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/os_types.h 
b/drivers/gpu/drm/amd/display/dc/os_types.h
index 126c2f3a4dd3..999c5103357e 100644
--- a/drivers/gpu/drm/amd/display/dc/os_types.h
+++ b/drivers/gpu/drm/amd/display/dc/os_types.h
@@ -57,32 +57,8 @@
 #define DC_FP_END() kernel_fpu_end()
 #elif defined(CONFIG_PPC64)
 #include 
-#include 
-#define DC_FP_START() { \
-   if (cpu_has_feature(CPU_FTR_VSX_COMP)) { \
-   preempt_disable(); \
-   enable_kernel_vsx(); \
-   } else if (cpu_has_feature(CPU_FTR_ALTIVEC_COMP)) { \
-   preempt_disable(); \
-   enable_kernel_altivec(); \
-   } else if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE)) { \
-   preempt_disable(); \
-   enable_kernel_fp(); \
-   } \
-}
-#define DC_FP_END() { \
-   if (cpu_has_feature(CPU_FTR_VSX_COMP)) { \
-   disable_kernel_vsx(); \
-   preempt_enable(); \
-   } else if (cpu_has_feature(CPU_FTR_ALTIVEC_COMP)) { \
-   disable_kernel_altivec(); \
-   preempt_enable(); \
-   } else if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE)) { \
-   disable_kernel_fp(); \
-   preempt_enable(); \
-   } \
-}
-#endif
+#define DC_FP_START() kernel_fpu_begin()
+#define DC_FP_END() kernel_fpu_end()
 #endif
 
 /*
-- 
2.25.1



[RFC 0/2] Add generic FPU api similar to x86

2021-07-19 Thread Anson Jacob
This is an attempt to have generic FPU enable/disable
calls similar to x86. 
So that we can simplify gpu/drm/amd/display/dc/os_types.h

Also adds FPU correctness logic seen in x86.

Anson Jacob (2):
  ppc/fpu: Add generic FPU api similar to x86
  drm/amd/display: Use PPC FPU functions

 arch/powerpc/include/asm/switch_to.h  |  29 ++---
 arch/powerpc/kernel/process.c | 130 ++
 drivers/gpu/drm/amd/display/dc/os_types.h |  28 +
 3 files changed, 139 insertions(+), 48 deletions(-)

-- 
2.25.1



Re: [PATCH v1 13/16] xen: swiotlb: return error code from xen_swiotlb_map_sg()

2021-07-19 Thread Boris Ostrovsky


On 7/15/21 12:45 PM, Logan Gunthorpe wrote:
> From: Martin Oliveira 
>
> The .map_sg() op now expects an error code instead of zero on failure.
>
> xen_swiotlb_map_sg() may only fail if xen_swiotlb_map_page() fails, but
> xen_swiotlb_map_page() only supports returning errors as
> DMA_MAPPING_ERROR. So coalesce all errors into EINVAL.


Reviewed-by: Boris Ostrovsky 





Re: [PATCH v5 1/1] powerpc/pseries: Interface to represent PAPR firmware attributes

2021-07-19 Thread Fabiano Rosas
"Pratik R. Sampat"  writes:

> + pgs = kcalloc(num_attrs, sizeof(*pgs), GFP_KERNEL);
> + if (!pgs)
> + goto out;
> +
> + papr_kobj = kobject_create_and_add("papr", firmware_kobj);
> + if (!papr_kobj) {
> + pr_warn("kobject_create_and_add papr failed\n");
> + goto out_pgs;
> + }
> +
> + esi_kobj = kobject_create_and_add("energy_scale_info", papr_kobj);
> + if (!esi_kobj) {
> + pr_warn("kobject_create_and_add energy_scale_info failed\n");
> + goto out_kobj;
> + }
> +
> + for (idx = 0; idx < num_attrs; idx++) {
> + bool show_val_desc = true;
> +
> + pgs[idx].pg.attrs = kcalloc(MAX_ATTRS + 1,
> + sizeof(*pgs[idx].pg.attrs),
> + GFP_KERNEL);
> + if (!pgs[idx].pg.attrs) {
> + for (i = idx - 1; i >= 0; i--)
> + kfree(pgs[i].pg.attrs);

What about the pg.name from the previous iterations?

> + goto out_ekobj;
> + }
> +
> + pgs[idx].pg.name = kasprintf(GFP_KERNEL, "%lld",
> +  be64_to_cpu(esi_attrs[idx].id));
> + if (pgs[idx].pg.name == NULL) {
> + for (i = idx; i >= 0; i--)
> + kfree(pgs[i].pg.attrs);

Here too.

You could just 'goto out_pgattrs' in both cases.

> + goto out_ekobj;
> + }
> + /* Do not add the value description if it does not exist */
> + if (strnlen(esi_attrs[idx].value_desc,
> + sizeof(esi_attrs[idx].value_desc)) == 0)
> + show_val_desc = false;
> +
> + if (add_attr_group(be64_to_cpu(esi_attrs[idx].id), &pgs[idx],
> +show_val_desc)) {
> + pr_warn("Failed to create papr attribute group %s\n",
> + pgs[idx].pg.name);
> + goto out_pgattrs;
> + }
> + }
> +
> + kfree(esi_buf);
> + return 0;
> +
> +out_pgattrs:
> + for (i = 0; i < num_attrs ; i++) {
> + kfree(pgs[i].pg.attrs);
> + kfree(pgs[i].pg.name);
> + }
> +out_ekobj:
> + kobject_put(esi_kobj);
> +out_kobj:
> + kobject_put(papr_kobj);
> +out_pgs:
> + kfree(pgs);
> +out:
> + kfree(esi_buf);
> +
> + return -ENOMEM;
> +}
> +
> +machine_device_initcall(pseries, papr_init);


Re: [PATCH v5 04/11] powerpc/pseries/iommu: Add ddw_list_new_entry() helper

2021-07-19 Thread Leonardo Brás
On Mon, 2021-07-19 at 16:14 +0200, Frederic Barrat wrote:
> 
> 
> On 16/07/2021 10:27, Leonardo Bras wrote:
> > There are two functions creating direct_window_list entries in a
> > similar way, so create a ddw_list_new_entry() to avoid duplicity
> > and
> > simplify those functions.
> > 
> > Signed-off-by: Leonardo Bras 
> > Reviewed-by: Alexey Kardashevskiy 
> > ---
> 
> LGTM
> Reviewed-by: Frederic Barrat 
> 


Thanks!



Re: [PATCH v5 03/11] powerpc/pseries/iommu: Add iommu_pseries_alloc_table() helper

2021-07-19 Thread Leonardo Brás
On Mon, 2021-07-19 at 16:04 +0200, Frederic Barrat wrote:
> 
> 
> On 16/07/2021 10:27, Leonardo Bras wrote:
> > Creates a helper to allow allocating a new iommu_table without the
> > need
> > to reallocate the iommu_group.
> > 
> > This will be helpful for replacing the iommu_table for the new DMA
> > window,
> > after we remove the old one with iommu_tce_table_put().
> > 
> > Signed-off-by: Leonardo Bras 
> > Reviewed-by: Alexey Kardashevskiy 
> > ---
> >   arch/powerpc/platforms/pseries/iommu.c | 25 ++---
> > 
> >   1 file changed, 14 insertions(+), 11 deletions(-)
> > 
> > diff --git a/arch/powerpc/platforms/pseries/iommu.c
> > b/arch/powerpc/platforms/pseries/iommu.c
> > index b1b8d12bab39..33d82865d6e6 100644
> > --- a/arch/powerpc/platforms/pseries/iommu.c
> > +++ b/arch/powerpc/platforms/pseries/iommu.c
> > @@ -53,28 +53,31 @@ enum {
> > DDW_EXT_QUERY_OUT_SIZE = 2
> >   };
> >   
> > -static struct iommu_table_group *iommu_pseries_alloc_group(int
> > node)
> > +static struct iommu_table *iommu_pseries_alloc_table(int node)
> >   {
> > -   struct iommu_table_group *table_group;
> > struct iommu_table *tbl;
> >   
> > -   table_group = kzalloc_node(sizeof(struct
> > iommu_table_group), GFP_KERNEL,
> > -  node);
> > -   if (!table_group)
> > -   return NULL;
> > -
> > tbl = kzalloc_node(sizeof(struct iommu_table), GFP_KERNEL,
> > node);
> > if (!tbl)
> > -   goto free_group;
> > +   return NULL;
> >   
> > INIT_LIST_HEAD_RCU(&tbl->it_group_list);
> > kref_init(&tbl->it_kref);
> > +   return tbl;
> > +}
> >   
> > -   table_group->tables[0] = tbl;
> > +static struct iommu_table_group *iommu_pseries_alloc_group(int
> > node)
> > +{
> > +   struct iommu_table_group *table_group;
> > +
> > +   table_group = kzalloc_node(sizeof(*table_group),
> > GFP_KERNEL, node);
> > +   if (!table_group)
> > +   return NULL;
> >   
> > -   return table_group;
> > +   table_group->tables[0] = iommu_pseries_alloc_table(node);
> > +   if (table_group->tables[0])
> > +   return table_group;
> 
> 
> Nitpick: for readability, we'd usually expect the error path to be 
> detected with the if statement and keep going on the good path, and
> here 
> the code does the opposite. No big deal though, so
> 
> Reviewed-by: Frederic Barrat 
> 
> 

Thanks for the tip and review!

Best regards,
Leonardo Bras


> >   
> > -free_group:
> > kfree(table_group);
> > return NULL;
> >   }
> > 




Re: [PATCH v5 01/11] powerpc/pseries/iommu: Replace hard-coded page shift

2021-07-19 Thread Leonardo Brás
On Mon, 2021-07-19 at 15:48 +0200, Frederic Barrat wrote:
> 
> 
> On 16/07/2021 10:27, Leonardo Bras wrote:
> > Some functions assume IOMMU page size can only be 4K (pageshift ==
> > 12).
> > Update them to accept any page size passed, so we can use 64K
> > pages.
> > 
> > In the process, some defines like TCE_SHIFT were made obsolete, and
> > then
> > removed.
> > 
> > IODA3 Revision 3.0_prd1 (OpenPowerFoundation), Figures 3.4 and 3.5
> > show
> > a RPN of 52-bit, and considers a 12-bit pageshift, so there should
> > be
> > no need of using TCE_RPN_MASK, which masks out any bit after 40 in
> > rpn.
> > It's usage removed from tce_build_pSeries(), tce_build_pSeriesLP(),
> > and
> > tce_buildmulti_pSeriesLP().
> > 
> > Most places had a tbl struct, so using tbl->it_page_shift was
> > simple.
> > tce_free_pSeriesLP() was a special case, since callers not always
> > have a
> > tbl struct, so adding a tceshift parameter seems the right thing to
> > do.
> > 
> > Signed-off-by: Leonardo Bras 
> > Reviewed-by: Alexey Kardashevskiy 
> > ---
> 
> FWIW,
> Reviewed-by: Frederic Barrat 
> 

Thanks!



Re: [PATCH] replace if with min

2021-07-19 Thread Segher Boessenkool
On Mon, Jul 19, 2021 at 06:12:05PM +0200, Christophe Leroy wrote:
> Salah Triki  a écrit :
> >Replace if with min in order to make code more clean.

> >--- a/drivers/crypto/nx/nx-842.c
> >+++ b/drivers/crypto/nx/nx-842.c
> >@@ -134,8 +134,7 @@ EXPORT_SYMBOL_GPL(nx842_crypto_exit);
> > static void check_constraints(struct nx842_constraints *c)
> > {
> > /* limit maximum, to always have enough bounce buffer to decompress 
> > */
> >-if (c->maximum > BOUNCE_BUFFER_SIZE)
> >-c->maximum = BOUNCE_BUFFER_SIZE;
> >+c->maximum = min(c->maximum, BOUNCE_BUFFER_SIZE);
> 
> For me the code is less clear with this change, and in addition it  
> slightly changes the behaviour. Before, the write was done only when  
> the value was changing. Now you rewrite the value always, even when it  
> doesn't change.

In both cases the compiler can decide to either write it more often than
strictly needed, depending on what it thinks best (and it usually has
better estimates than the programmer).  The behaviour is identical (and
the generated machine code is as well, in my testing).

The field name "maximum" is not the best choice, which makes the code
read a bit funny ("the min of max"), but the comment makes things pretty
clear.


Segher


[PATCH 5.12 286/292] perf script python: Fix buffer size to report iregs in perf script

2021-07-19 Thread Greg Kroah-Hartman
From: Kajol Jain 

[ Upstream commit dea8cfcc33695f70f56023b416cf88ae44c8a45a ]

Commit 48a1f565261d2ab1 ("perf script python: Add more PMU fields to
event handler dict") added functionality to report fields like weight,
iregs, uregs etc via perf report.  That commit predefined buffer size to
512 bytes to print those fields.

But in PowerPC, since we added extended regs support in:

  068aeea3773a6f4c ("perf powerpc: Support exposing Performance Monitor Counter 
SPRs as part of extended regs")
  d735599a069f6936 ("powerpc/perf: Add extended regs support for power10 
platform")

Now iregs can carry more bytes of data and this predefined buffer size
can result to data loss in perf script output.

This patch resolves this issue by making the buffer size dynamic, based
on the number of registers needed to print. It also changes the
regs_map() return type from int to void, as it is not being used by the
set_regs_in_dict(), its only caller.

Fixes: 068aeea3773a6f4c ("perf powerpc: Support exposing Performance Monitor 
Counter SPRs as part of extended regs")
Signed-off-by: Kajol Jain 
Tested-by: Nageswara R Sastry 
Cc: Athira Jajeev 
Cc: Jiri Olsa 
Cc: Madhavan Srinivasan 
Cc: Paul Clarke 
Cc: Ravi Bangoria 
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lore.kernel.org/lkml/20210628062341.155839-1-kj...@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo 
Signed-off-by: Sasha Levin 
---
 .../util/scripting-engines/trace-event-python.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/scripting-engines/trace-event-python.c 
b/tools/perf/util/scripting-engines/trace-event-python.c
index 23dc5014e711..a61be9c07565 100644
--- a/tools/perf/util/scripting-engines/trace-event-python.c
+++ b/tools/perf/util/scripting-engines/trace-event-python.c
@@ -687,7 +687,7 @@ static void set_sample_datasrc_in_dict(PyObject *dict,
_PyUnicode_FromString(decode));
 }
 
-static int regs_map(struct regs_dump *regs, uint64_t mask, char *bf, int size)
+static void regs_map(struct regs_dump *regs, uint64_t mask, char *bf, int size)
 {
unsigned int i = 0, r;
int printed = 0;
@@ -695,7 +695,7 @@ static int regs_map(struct regs_dump *regs, uint64_t mask, 
char *bf, int size)
bf[0] = 0;
 
if (!regs || !regs->regs)
-   return 0;
+   return;
 
for_each_set_bit(r, (unsigned long *) &mask, sizeof(mask) * 8) {
u64 val = regs->regs[i++];
@@ -704,8 +704,6 @@ static int regs_map(struct regs_dump *regs, uint64_t mask, 
char *bf, int size)
 "%5s:0x%" PRIx64 " ",
 perf_reg_name(r), val);
}
-
-   return printed;
 }
 
 static void set_regs_in_dict(PyObject *dict,
@@ -713,7 +711,16 @@ static void set_regs_in_dict(PyObject *dict,
 struct evsel *evsel)
 {
struct perf_event_attr *attr = &evsel->core.attr;
-   char bf[512];
+
+   /*
+* Here value 28 is a constant size which can be used to print
+* one register value and its corresponds to:
+* 16 chars is to specify 64 bit register in hexadecimal.
+* 2 chars is for appending "0x" to the hexadecimal value and
+* 10 chars is for register name.
+*/
+   int size = __sw_hweight64(attr->sample_regs_intr) * 28;
+   char bf[size];
 
regs_map(&sample->intr_regs, attr->sample_regs_intr, bf, sizeof(bf));
 
-- 
2.30.2





[PATCH 5.13 345/351] perf script python: Fix buffer size to report iregs in perf script

2021-07-19 Thread Greg Kroah-Hartman
From: Kajol Jain 

[ Upstream commit dea8cfcc33695f70f56023b416cf88ae44c8a45a ]

Commit 48a1f565261d2ab1 ("perf script python: Add more PMU fields to
event handler dict") added functionality to report fields like weight,
iregs, uregs etc via perf report.  That commit predefined buffer size to
512 bytes to print those fields.

But in PowerPC, since we added extended regs support in:

  068aeea3773a6f4c ("perf powerpc: Support exposing Performance Monitor Counter 
SPRs as part of extended regs")
  d735599a069f6936 ("powerpc/perf: Add extended regs support for power10 
platform")

Now iregs can carry more bytes of data and this predefined buffer size
can result to data loss in perf script output.

This patch resolves this issue by making the buffer size dynamic, based
on the number of registers needed to print. It also changes the
regs_map() return type from int to void, as it is not being used by the
set_regs_in_dict(), its only caller.

Fixes: 068aeea3773a6f4c ("perf powerpc: Support exposing Performance Monitor 
Counter SPRs as part of extended regs")
Signed-off-by: Kajol Jain 
Tested-by: Nageswara R Sastry 
Cc: Athira Jajeev 
Cc: Jiri Olsa 
Cc: Madhavan Srinivasan 
Cc: Paul Clarke 
Cc: Ravi Bangoria 
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lore.kernel.org/lkml/20210628062341.155839-1-kj...@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo 
Signed-off-by: Sasha Levin 
---
 .../util/scripting-engines/trace-event-python.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/scripting-engines/trace-event-python.c 
b/tools/perf/util/scripting-engines/trace-event-python.c
index 3dfc543327af..18dbd9cddda8 100644
--- a/tools/perf/util/scripting-engines/trace-event-python.c
+++ b/tools/perf/util/scripting-engines/trace-event-python.c
@@ -687,7 +687,7 @@ static void set_sample_datasrc_in_dict(PyObject *dict,
_PyUnicode_FromString(decode));
 }
 
-static int regs_map(struct regs_dump *regs, uint64_t mask, char *bf, int size)
+static void regs_map(struct regs_dump *regs, uint64_t mask, char *bf, int size)
 {
unsigned int i = 0, r;
int printed = 0;
@@ -695,7 +695,7 @@ static int regs_map(struct regs_dump *regs, uint64_t mask, 
char *bf, int size)
bf[0] = 0;
 
if (!regs || !regs->regs)
-   return 0;
+   return;
 
for_each_set_bit(r, (unsigned long *) &mask, sizeof(mask) * 8) {
u64 val = regs->regs[i++];
@@ -704,8 +704,6 @@ static int regs_map(struct regs_dump *regs, uint64_t mask, 
char *bf, int size)
 "%5s:0x%" PRIx64 " ",
 perf_reg_name(r), val);
}
-
-   return printed;
 }
 
 static void set_regs_in_dict(PyObject *dict,
@@ -713,7 +711,16 @@ static void set_regs_in_dict(PyObject *dict,
 struct evsel *evsel)
 {
struct perf_event_attr *attr = &evsel->core.attr;
-   char bf[512];
+
+   /*
+* Here value 28 is a constant size which can be used to print
+* one register value and its corresponds to:
+* 16 chars is to specify 64 bit register in hexadecimal.
+* 2 chars is for appending "0x" to the hexadecimal value and
+* 10 chars is for register name.
+*/
+   int size = __sw_hweight64(attr->sample_regs_intr) * 28;
+   char bf[size];
 
regs_map(&sample->intr_regs, attr->sample_regs_intr, bf, sizeof(bf));
 
-- 
2.30.2





Re: [PATCH] replace if with min

2021-07-19 Thread Christophe Leroy

Salah Triki  a écrit :


Replace if with min in order to make code more clean.

Signed-off-by: Salah Triki 
---
 drivers/crypto/nx/nx-842.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/crypto/nx/nx-842.c b/drivers/crypto/nx/nx-842.c
index 2ab90ec10e61..0d1d5a463899 100644
--- a/drivers/crypto/nx/nx-842.c
+++ b/drivers/crypto/nx/nx-842.c
@@ -134,8 +134,7 @@ EXPORT_SYMBOL_GPL(nx842_crypto_exit);
 static void check_constraints(struct nx842_constraints *c)
 {
/* limit maximum, to always have enough bounce buffer to decompress */
-   if (c->maximum > BOUNCE_BUFFER_SIZE)
-   c->maximum = BOUNCE_BUFFER_SIZE;
+   c->maximum = min(c->maximum, BOUNCE_BUFFER_SIZE);


For me the code is less clear with this change, and in addition it  
slightly changes the behaviour. Before, the write was done only when  
the value was changing. Now you rewrite the value always, even when it  
doesn't change.



 }

 static int nx842_crypto_add_header(struct nx842_crypto_header *hdr, u8 *buf)
--
2.25.1





Re: FAILED: patch "[PATCH] drm/radeon/ni_dpm: Fix booting bug" failed to apply to 5.13-stable tree

2021-07-19 Thread Christian Zigotzky

On 19 July 2021 04:58 pm, Christian Zigotzky wrote:

On 19 July 2021 at 04:32 pm, Stan Johnson wrote:

On 7/18/21 10:23 PM, Christian Zigotzky wrote:

Hello Stan,

We had the same issue during the 5.14 merge window. Please look in the
following thread:

https://forum.hyperion-entertainment.com/viewtopic.php?p=53511#p53511

There is a patch available. Please try it.

Thanks,
Christian
...

Hello Christian,

Thanks. There were some errors applying the patch, so it wasn't fully
applied (see below). Of course, I'm using 5.13.2, not 5.14, so maybe
that's expected.

The patched 5.13.2 kernel still results in a blank screen while trying
to run wdm. On this attempt, wdm has died (oddly the screen remains
blank; it should display a text login after X dies). The Xorg.0.log
looks reasonable enough.

I tried disabling wdm, then rebooted, logged in at the console and ran
"startx". The screen goes blank, X is running, startx is running:

johnson   1392  0.0  0.2   2572  1452 tty1 S+   08:06   0:00 /bin/sh
/usr/bin/startx
johnson   1414  0.0  0.4   4904  2096 tty1 S+   08:06   0:00 xinit
/etc/X11/xinit/xinitrc -- /etc/X11/xinit/xserverrc :0 vt1 -keeptty -auth
/tmp/serverauth.dJ7lSnzjjo
johnson   1415  1.0  8.2 128436 41924 tty1 Sl   08:06   0:04
/usr/lib/xorg/Xorg -nolisten tcp :0 vt1 -keeptty -auth
/tmp/serverauth.dJ7lSnzjjo

I had to use "kill -KILL" to kill the startx, xinit and Xorg processes.
After those were killed, the screen was still blank, and even though
nothing was running, the load average was still around 1.00 several
minutes later, so something is still taking CPU time:

$ uptime
  08:25:15 up 20 min,  2 users,  load average: 1.00, 1.00, 0.84

I can attempt a git bisect, though that will take some time.

-Stan

--
$ patch -p1
<../v3-drm-radeon-Fix-NULL-dereference-when-updating-memory-stats.patch
patching file drivers/gpu/drm/radeon/radeon_object.c
Hunk #2 FAILED at 76.
Hunk #3 FAILED at 727.
2 out of 3 hunks FAILED -- saving rejects to file
drivers/gpu/drm/radeon/radeon_object.c.rej
patching file drivers/gpu/drm/radeon/radeon_object.h
patching file drivers/gpu/drm/radeon/radeon_ttm.c
Hunk #1 FAILED at 199.
Hunk #2 succeeded at 227 (offset 11 lines).
Hunk #3 succeeded at 275 (offset 11 lines).
Hunk #4 succeeded at 697 (offset 12 lines).
1 out of 4 hunks FAILED -- saving rejects to file
drivers/gpu/drm/radeon/radeon_ttm.c.rej
johnson@mac-server:/data/software/working/linux-5.13.2$ cat
drivers/gpu/drm/radeon/radeon_ttm.c.rej
--- drivers/gpu/drm/radeon/radeon_ttm.c
+++ drivers/gpu/drm/radeon/radeon_ttm.c
@@ -199,7 +199,7 @@ static int radeon_bo_move(struct ttm_buffer_object
*bo, bool evict,
  struct ttm_resource *old_mem = bo->resource;
  struct radeon_device *rdev;
  struct radeon_bo *rbo;
-    int r;
+    int r, old_type;

  if (new_mem->mem_type == TTM_PL_TT) {
  r = radeon_ttm_tt_bind(bo->bdev, bo->ttm, new_mem);

-

Hello Stan,

Greg has the same issue with patching the kernel 5.13 [1]. We have to 
wait for a solution.


- Christian

Stan,

Forget it. It's another issue below.

Sorry,
Christian


[1]

The patch below does not apply to the 5.13-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to .

thanks,

greg k-h

-- original commit in Linus's tree --

>From 293774413a3f519c826d4eb5313ef02e20515700 Mon Sep 17 00:00:00 2001
From: "Gustavo A. R. Silva" 
Date: Sun, 9 May 2021 17:49:26 -0500
Subject: [PATCH] drm/radeon/ni_dpm: Fix booting bug

Create new structure NISLANDS_SMC_SWSTATE_SINGLE, as initialState.levels
and ACPIState.levels are never actually used as flexible arrays. Those
arrays can be used as simple objects of type
NISLANDS_SMC_HW_PERFORMANCE_LEVEL, instead.

Currently, the code fails because flexible array _levels_ in
struct NISLANDS_SMC_SWSTATE doesn't allow for code that access
the first element of initialState.levels and ACPIState.levels
arrays:

drivers/gpu/drm/radeon/ni_dpm.c:
1690 table->initialState.levels[0].mclk.vMPLL_AD_FUNC_CNTL =
1691 cpu_to_be32(ni_pi->clock_registers.mpll_ad_func_cntl);
...
1903:   table->ACPIState.levels[0].mclk.vMPLL_AD_FUNC_CNTL = 
cpu_to_be32(mpll_ad_func_cntl);
1904:   table->ACPIState.levels[0].mclk.vMPLL_AD_FUNC_CNTL_2 = 
cpu_to_be32(mpll_ad_func_cntl_2);


because such element cannot exist without previously allocating
any dynamic memory for it (which never actually happens).

That's why struct NISLANDS_SMC_SWSTATE should only be used as type
for object driverState and new struct SISLANDS_SMC_SWSTATE_SINGLE is
created as type for objects initialState, ACPIState and ULVState.

Also, with the change from one-element array to flexible-array member
in commit 434fb1e7444a ("drm/radeon/nislands_smc.h: Replace one-element
array with flexible-array member in struct NISLANDS_SMC_SWSTATE"), the
size of dpmLevels in struct NISLANDS_SMC_STATETABLE should be fixed to
be NISLA

FAILED: patch "[PATCH] drm/radeon/ni_dpm: Fix booting bug" failed to apply to 5.13-stable tree

2021-07-19 Thread Christian Zigotzky

On 19 July 2021 at 04:32 pm, Stan Johnson wrote:

On 7/18/21 10:23 PM, Christian Zigotzky wrote:

Hello Stan,

We had the same issue during the 5.14 merge window. Please look in the
following thread:

https://forum.hyperion-entertainment.com/viewtopic.php?p=53511#p53511

There is a patch available. Please try it.

Thanks,
Christian
...

Hello Christian,

Thanks. There were some errors applying the patch, so it wasn't fully
applied (see below). Of course, I'm using 5.13.2, not 5.14, so maybe
that's expected.

The patched 5.13.2 kernel still results in a blank screen while trying
to run wdm. On this attempt, wdm has died (oddly the screen remains
blank; it should display a text login after X dies). The Xorg.0.log
looks reasonable enough.

I tried disabling wdm, then rebooted, logged in at the console and ran
"startx". The screen goes blank, X is running, startx is running:

johnson   1392  0.0  0.2   2572  1452 tty1 S+   08:06   0:00 /bin/sh
/usr/bin/startx
johnson   1414  0.0  0.4   4904  2096 tty1 S+   08:06   0:00 xinit
/etc/X11/xinit/xinitrc -- /etc/X11/xinit/xserverrc :0 vt1 -keeptty -auth
/tmp/serverauth.dJ7lSnzjjo
johnson   1415  1.0  8.2 128436 41924 tty1 Sl   08:06   0:04
/usr/lib/xorg/Xorg -nolisten tcp :0 vt1 -keeptty -auth
/tmp/serverauth.dJ7lSnzjjo

I had to use "kill -KILL" to kill the startx, xinit and Xorg processes.
After those were killed, the screen was still blank, and even though
nothing was running, the load average was still around 1.00 several
minutes later, so something is still taking CPU time:

$ uptime
  08:25:15 up 20 min,  2 users,  load average: 1.00, 1.00, 0.84

I can attempt a git bisect, though that will take some time.

-Stan

--
$ patch -p1
<../v3-drm-radeon-Fix-NULL-dereference-when-updating-memory-stats.patch
patching file drivers/gpu/drm/radeon/radeon_object.c
Hunk #2 FAILED at 76.
Hunk #3 FAILED at 727.
2 out of 3 hunks FAILED -- saving rejects to file
drivers/gpu/drm/radeon/radeon_object.c.rej
patching file drivers/gpu/drm/radeon/radeon_object.h
patching file drivers/gpu/drm/radeon/radeon_ttm.c
Hunk #1 FAILED at 199.
Hunk #2 succeeded at 227 (offset 11 lines).
Hunk #3 succeeded at 275 (offset 11 lines).
Hunk #4 succeeded at 697 (offset 12 lines).
1 out of 4 hunks FAILED -- saving rejects to file
drivers/gpu/drm/radeon/radeon_ttm.c.rej
johnson@mac-server:/data/software/working/linux-5.13.2$ cat
drivers/gpu/drm/radeon/radeon_ttm.c.rej
--- drivers/gpu/drm/radeon/radeon_ttm.c
+++ drivers/gpu/drm/radeon/radeon_ttm.c
@@ -199,7 +199,7 @@ static int radeon_bo_move(struct ttm_buffer_object
*bo, bool evict,
struct ttm_resource *old_mem = bo->resource;
struct radeon_device *rdev;
struct radeon_bo *rbo;
-   int r;
+   int r, old_type;

if (new_mem->mem_type == TTM_PL_TT) {
r = radeon_ttm_tt_bind(bo->bdev, bo->ttm, new_mem);

-

Hello Stan,

Greg has the same issue with patching the kernel 5.13 [1]. We have to 
wait for a solution.


- Christian

[1]

The patch below does not apply to the 5.13-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to .

thanks,

greg k-h

-- original commit in Linus's tree --

>From 293774413a3f519c826d4eb5313ef02e20515700 Mon Sep 17 00:00:00 2001
From: "Gustavo A. R. Silva" 
Date: Sun, 9 May 2021 17:49:26 -0500
Subject: [PATCH] drm/radeon/ni_dpm: Fix booting bug

Create new structure NISLANDS_SMC_SWSTATE_SINGLE, as initialState.levels
and ACPIState.levels are never actually used as flexible arrays. Those
arrays can be used as simple objects of type
NISLANDS_SMC_HW_PERFORMANCE_LEVEL, instead.

Currently, the code fails because flexible array _levels_ in
struct NISLANDS_SMC_SWSTATE doesn't allow for code that access
the first element of initialState.levels and ACPIState.levels
arrays:

drivers/gpu/drm/radeon/ni_dpm.c:
1690 table->initialState.levels[0].mclk.vMPLL_AD_FUNC_CNTL =
1691 cpu_to_be32(ni_pi->clock_registers.mpll_ad_func_cntl);
...
1903:   table->ACPIState.levels[0].mclk.vMPLL_AD_FUNC_CNTL = 
cpu_to_be32(mpll_ad_func_cntl);
1904:   table->ACPIState.levels[0].mclk.vMPLL_AD_FUNC_CNTL_2 = 
cpu_to_be32(mpll_ad_func_cntl_2);


because such element cannot exist without previously allocating
any dynamic memory for it (which never actually happens).

That's why struct NISLANDS_SMC_SWSTATE should only be used as type
for object driverState and new struct SISLANDS_SMC_SWSTATE_SINGLE is
created as type for objects initialState, ACPIState and ULVState.

Also, with the change from one-element array to flexible-array member
in commit 434fb1e7444a ("drm/radeon/nislands_smc.h: Replace one-element
array with flexible-array member in struct NISLANDS_SMC_SWSTATE"), the
size of dpmLevels in struct NISLANDS_SMC_STATETABLE should be fixed to
be NISLANDS_MAX_SMC_PERFORMANCE_LEVELS_PER_SWSTATE instead of
NISLANDS_MAX_SMC_PERFORMANCE_LEVELS_PER_SWS

[PATCH v2] soc: fsl: qe: convert QE interrupt controller to platform_device

2021-07-19 Thread Maxim Kochetkov
Since 5.13 QE's ucc nodes can't get interrupts from devicetree:

ucc@2000 {
cell-index = <1>;
reg = <0x2000 0x200>;
interrupts = <32>;
interrupt-parent = <&qeic>;
};

Now fw_devlink expects driver to create and probe a struct device
for interrupt controller.

So lets convert this driver to simple platform_device with probe().
Also use platform_get_ and devm_ family function to get/allocate
resources and drop unused .compatible = "qeic".

[1] - 
https://lore.kernel.org/lkml/CAGETcx9PiX==mlxb9po8myyk6u2vhpvwtmsa5nkd-ywh5xh...@mail.gmail.com
Fixes: e590474768f1 ("driver core: Set fw_devlink=on by default")
Fixes: ea718c699055 ("Revert "Revert "driver core: Set fw_devlink=on by 
default""")
Signed-off-by: Maxim Kochetkov 
Reported-by: kernel test robot 
Reported-by: Dan Carpenter 
---
Changes in v2:
 - use devm_ family functions to allocate mem/resources
 - use platform_get_ family functions to get resources/irqs
 - drop unused .compatible = "qeic"
 
 drivers/soc/fsl/qe/qe_ic.c | 74 ++
 1 file changed, 43 insertions(+), 31 deletions(-)

diff --git a/drivers/soc/fsl/qe/qe_ic.c b/drivers/soc/fsl/qe/qe_ic.c
index 3f711c1a0996..07db977df349 100644
--- a/drivers/soc/fsl/qe/qe_ic.c
+++ b/drivers/soc/fsl/qe/qe_ic.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -404,41 +405,40 @@ static void qe_ic_cascade_muxed_mpic(struct irq_desc 
*desc)
chip->irq_eoi(&desc->irq_data);
 }
 
-static void __init qe_ic_init(struct device_node *node)
+static int qe_ic_init(struct platform_device *pdev)
 {
+   struct device *dev = &pdev->dev;
void (*low_handler)(struct irq_desc *desc);
void (*high_handler)(struct irq_desc *desc);
struct qe_ic *qe_ic;
-   struct resource res;
-   u32 ret;
+   struct resource *res;
+   struct device_node *node = pdev->dev.of_node;
 
-   ret = of_address_to_resource(node, 0, &res);
-   if (ret)
-   return;
+   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+   if (res == NULL) {
+   dev_err(dev, "no memory resource defined\n");
+   return -ENODEV;
+   }
 
-   qe_ic = kzalloc(sizeof(*qe_ic), GFP_KERNEL);
+   qe_ic = devm_kzalloc(dev, sizeof(*qe_ic), GFP_KERNEL);
if (qe_ic == NULL)
-   return;
+   return -ENOMEM;
 
-   qe_ic->irqhost = irq_domain_add_linear(node, NR_QE_IC_INTS,
-  &qe_ic_host_ops, qe_ic);
-   if (qe_ic->irqhost == NULL) {
-   kfree(qe_ic);
-   return;
+   qe_ic->regs = devm_ioremap(dev, res->start, resource_size(res));
+   if (qe_ic->regs == NULL) {
+   dev_err(dev, "failed to ioremap() registers\n");
+   return -ENODEV;
}
 
-   qe_ic->regs = ioremap(res.start, resource_size(&res));
-
qe_ic->hc_irq = qe_ic_irq_chip;
 
-   qe_ic->virq_high = irq_of_parse_and_map(node, 0);
-   qe_ic->virq_low = irq_of_parse_and_map(node, 1);
+   qe_ic->virq_high = platform_get_irq(pdev, 0);
+   qe_ic->virq_low = platform_get_irq(pdev, 1);
 
-   if (!qe_ic->virq_low) {
-   printk(KERN_ERR "Failed to map QE_IC low IRQ\n");
-   kfree(qe_ic);
-   return;
+   if (qe_ic->virq_low < 0) {
+   return -ENODEV;
}
+
if (qe_ic->virq_high != qe_ic->virq_low) {
low_handler = qe_ic_cascade_low;
high_handler = qe_ic_cascade_high;
@@ -447,6 +447,13 @@ static void __init qe_ic_init(struct device_node *node)
high_handler = NULL;
}
 
+   qe_ic->irqhost = irq_domain_add_linear(node, NR_QE_IC_INTS,
+  &qe_ic_host_ops, qe_ic);
+   if (qe_ic->irqhost == NULL) {
+   dev_err(dev, "failed to add irq domain\n");
+   return -ENODEV;
+   }
+
qe_ic_write(qe_ic->regs, QEIC_CICR, 0);
 
irq_set_handler_data(qe_ic->virq_low, qe_ic);
@@ -456,20 +463,25 @@ static void __init qe_ic_init(struct device_node *node)
irq_set_handler_data(qe_ic->virq_high, qe_ic);
irq_set_chained_handler(qe_ic->virq_high, high_handler);
}
+   return 0;
 }
+static const struct of_device_id qe_ic_ids[] = {
+   { .compatible = "fsl,qe-ic"},
+   {},
+};
 
-static int __init qe_ic_of_init(void)
+static struct platform_driver qe_ic_driver =
 {
-   struct device_node *np;
+   .driver = {
+   .name   = "qe-ic",
+   .of_match_table = qe_ic_ids,
+   },
+   .probe  = qe_ic_init,
+};
 
-   np = of_find_compatible_node(NULL, NULL, "fsl,qe-ic");
-   if (!np) {
-   np = of_find_node_by_type(NULL, "qeic");
-   if (!np)
-   return -ENODEV;
-   }
-   qe_i

Re: [PATCH v5 04/11] powerpc/pseries/iommu: Add ddw_list_new_entry() helper

2021-07-19 Thread Frederic Barrat




On 16/07/2021 10:27, Leonardo Bras wrote:

There are two functions creating direct_window_list entries in a
similar way, so create a ddw_list_new_entry() to avoid duplicity and
simplify those functions.

Signed-off-by: Leonardo Bras 
Reviewed-by: Alexey Kardashevskiy 
---


LGTM
Reviewed-by: Frederic Barrat 



  arch/powerpc/platforms/pseries/iommu.c | 32 +-
  1 file changed, 21 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c 
b/arch/powerpc/platforms/pseries/iommu.c
index 33d82865d6e6..712d1667144a 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -874,6 +874,21 @@ static u64 find_existing_ddw(struct device_node *pdn, int 
*window_shift)
return dma_addr;
  }
  
+static struct direct_window *ddw_list_new_entry(struct device_node *pdn,

+   const struct 
dynamic_dma_window_prop *dma64)
+{
+   struct direct_window *window;
+
+   window = kzalloc(sizeof(*window), GFP_KERNEL);
+   if (!window)
+   return NULL;
+
+   window->device = pdn;
+   window->prop = dma64;
+
+   return window;
+}
+
  static int find_existing_ddw_windows(void)
  {
int len;
@@ -886,18 +901,15 @@ static int find_existing_ddw_windows(void)
  
  	for_each_node_with_property(pdn, DIRECT64_PROPNAME) {

direct64 = of_get_property(pdn, DIRECT64_PROPNAME, &len);
-   if (!direct64)
-   continue;
-
-   window = kzalloc(sizeof(*window), GFP_KERNEL);
-   if (!window || len < sizeof(struct dynamic_dma_window_prop)) {
-   kfree(window);
+   if (!direct64 || len < sizeof(*direct64)) {
remove_ddw(pdn, true);
continue;
}
  
-		window->device = pdn;

-   window->prop = direct64;
+   window = ddw_list_new_entry(pdn, direct64);
+   if (!window)
+   break;
+
spin_lock(&direct_window_list_lock);
list_add(&window->list, &direct_window_list);
spin_unlock(&direct_window_list_lock);
@@ -1307,7 +1319,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct 
device_node *pdn)
dev_dbg(&dev->dev, "created tce table LIOBN 0x%x for %pOF\n",
  create.liobn, dn);
  
-	window = kzalloc(sizeof(*window), GFP_KERNEL);

+   window = ddw_list_new_entry(pdn, ddwprop);
if (!window)
goto out_clear_window;
  
@@ -1326,8 +1338,6 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)

goto out_free_window;
}
  
-	window->device = pdn;

-   window->prop = ddwprop;
spin_lock(&direct_window_list_lock);
list_add(&window->list, &direct_window_list);
spin_unlock(&direct_window_list_lock);



Re: [PATCH v5 03/11] powerpc/pseries/iommu: Add iommu_pseries_alloc_table() helper

2021-07-19 Thread Frederic Barrat




On 16/07/2021 10:27, Leonardo Bras wrote:

Creates a helper to allow allocating a new iommu_table without the need
to reallocate the iommu_group.

This will be helpful for replacing the iommu_table for the new DMA window,
after we remove the old one with iommu_tce_table_put().

Signed-off-by: Leonardo Bras 
Reviewed-by: Alexey Kardashevskiy 
---
  arch/powerpc/platforms/pseries/iommu.c | 25 ++---
  1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c 
b/arch/powerpc/platforms/pseries/iommu.c
index b1b8d12bab39..33d82865d6e6 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -53,28 +53,31 @@ enum {
DDW_EXT_QUERY_OUT_SIZE = 2
  };
  
-static struct iommu_table_group *iommu_pseries_alloc_group(int node)

+static struct iommu_table *iommu_pseries_alloc_table(int node)
  {
-   struct iommu_table_group *table_group;
struct iommu_table *tbl;
  
-	table_group = kzalloc_node(sizeof(struct iommu_table_group), GFP_KERNEL,

-  node);
-   if (!table_group)
-   return NULL;
-
tbl = kzalloc_node(sizeof(struct iommu_table), GFP_KERNEL, node);
if (!tbl)
-   goto free_group;
+   return NULL;
  
  	INIT_LIST_HEAD_RCU(&tbl->it_group_list);

kref_init(&tbl->it_kref);
+   return tbl;
+}
  
-	table_group->tables[0] = tbl;

+static struct iommu_table_group *iommu_pseries_alloc_group(int node)
+{
+   struct iommu_table_group *table_group;
+
+   table_group = kzalloc_node(sizeof(*table_group), GFP_KERNEL, node);
+   if (!table_group)
+   return NULL;
  
-	return table_group;

+   table_group->tables[0] = iommu_pseries_alloc_table(node);
+   if (table_group->tables[0])
+   return table_group;



Nitpick: for readability, we'd usually expect the error path to be 
detected with the if statement and keep going on the good path, and here 
the code does the opposite. No big deal though, so


Reviewed-by: Frederic Barrat 


  
-free_group:

kfree(table_group);
return NULL;
  }



Re: [PATCH v5 02/11] powerpc/kernel/iommu: Add new iommu_table_in_use() helper

2021-07-19 Thread Frederic Barrat




On 16/07/2021 10:27, Leonardo Bras wrote:

@@ -1099,18 +1105,13 @@ int iommu_take_ownership(struct iommu_table *tbl)
for (i = 0; i < tbl->nr_pools; i++)
spin_lock_nest_lock(&tbl->pools[i].lock, &tbl->large_pool.lock);
  
-	iommu_table_release_pages(tbl);

-
-   if (!bitmap_empty(tbl->it_map, tbl->it_size)) {
+   if (iommu_table_in_use(tbl)) {
pr_err("iommu_tce: it_map is not empty");
ret = -EBUSY;
-   /* Undo iommu_table_release_pages, i.e. restore bit#0, etc */
-   iommu_table_reserve_pages(tbl, tbl->it_reserved_start,
-   tbl->it_reserved_end);
-   } else {
-   memset(tbl->it_map, 0xff, sz);
}
  
+	memset(tbl->it_map, 0xff, sz);

+



So if the table is not empty, we fail (EBUSY) but we now also completely 
overwrite the bitmap. It was in an unexpected state, but we're making it 
worse. Or am I missing something?


  Fred



for (i = 0; i < tbl->nr_pools; i++)
spin_unlock(&tbl->pools[i].lock);
spin_unlock_irqrestore(&tbl->large_pool.lock, flags);


Re: [PATCH v5 01/11] powerpc/pseries/iommu: Replace hard-coded page shift

2021-07-19 Thread Frederic Barrat




On 16/07/2021 10:27, Leonardo Bras wrote:

Some functions assume IOMMU page size can only be 4K (pageshift == 12).
Update them to accept any page size passed, so we can use 64K pages.

In the process, some defines like TCE_SHIFT were made obsolete, and then
removed.

IODA3 Revision 3.0_prd1 (OpenPowerFoundation), Figures 3.4 and 3.5 show
a RPN of 52-bit, and considers a 12-bit pageshift, so there should be
no need of using TCE_RPN_MASK, which masks out any bit after 40 in rpn.
It's usage removed from tce_build_pSeries(), tce_build_pSeriesLP(), and
tce_buildmulti_pSeriesLP().

Most places had a tbl struct, so using tbl->it_page_shift was simple.
tce_free_pSeriesLP() was a special case, since callers not always have a
tbl struct, so adding a tceshift parameter seems the right thing to do.

Signed-off-by: Leonardo Bras 
Reviewed-by: Alexey Kardashevskiy 
---


FWIW,
Reviewed-by: Frederic Barrat 




  arch/powerpc/include/asm/tce.h |  8 --
  arch/powerpc/platforms/pseries/iommu.c | 39 +++---
  2 files changed, 23 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/include/asm/tce.h b/arch/powerpc/include/asm/tce.h
index db5fc2f2262d..0c34d2756d92 100644
--- a/arch/powerpc/include/asm/tce.h
+++ b/arch/powerpc/include/asm/tce.h
@@ -19,15 +19,7 @@
  #define TCE_VB0
  #define TCE_PCI   1
  
-/* TCE page size is 4096 bytes (1 << 12) */

-
-#define TCE_SHIFT  12
-#define TCE_PAGE_SIZE  (1 << TCE_SHIFT)
-
  #define TCE_ENTRY_SIZE8   /* each TCE is 64 bits 
*/
-
-#define TCE_RPN_MASK   0xfful  /* 40-bit RPN (4K pages) */
-#define TCE_RPN_SHIFT  12
  #define TCE_VALID 0x800   /* TCE valid */
  #define TCE_ALLIO 0x400   /* TCE valid for all lpars */
  #define TCE_PCI_WRITE 0x2 /* write from PCI allowed */
diff --git a/arch/powerpc/platforms/pseries/iommu.c 
b/arch/powerpc/platforms/pseries/iommu.c
index 0c55b991f665..b1b8d12bab39 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -107,6 +107,8 @@ static int tce_build_pSeries(struct iommu_table *tbl, long 
index,
u64 proto_tce;
__be64 *tcep;
u64 rpn;
+   const unsigned long tceshift = tbl->it_page_shift;
+   const unsigned long pagesize = IOMMU_PAGE_SIZE(tbl);
  
  	proto_tce = TCE_PCI_READ; // Read allowed
  
@@ -117,10 +119,10 @@ static int tce_build_pSeries(struct iommu_table *tbl, long index,
  
  	while (npages--) {

/* can't move this out since we might cross MEMBLOCK boundary */
-   rpn = __pa(uaddr) >> TCE_SHIFT;
-   *tcep = cpu_to_be64(proto_tce | (rpn & TCE_RPN_MASK) << 
TCE_RPN_SHIFT);
+   rpn = __pa(uaddr) >> tceshift;
+   *tcep = cpu_to_be64(proto_tce | rpn << tceshift);
  
-		uaddr += TCE_PAGE_SIZE;

+   uaddr += pagesize;
tcep++;
}
return 0;
@@ -146,7 +148,7 @@ static unsigned long tce_get_pseries(struct iommu_table 
*tbl, long index)
return be64_to_cpu(*tcep);
  }
  
-static void tce_free_pSeriesLP(unsigned long liobn, long, long);

+static void tce_free_pSeriesLP(unsigned long liobn, long, long, long);
  static void tce_freemulti_pSeriesLP(struct iommu_table*, long, long);
  
  static int tce_build_pSeriesLP(unsigned long liobn, long tcenum, long tceshift,

@@ -166,12 +168,12 @@ static int tce_build_pSeriesLP(unsigned long liobn, long 
tcenum, long tceshift,
proto_tce |= TCE_PCI_WRITE;
  
  	while (npages--) {

-   tce = proto_tce | (rpn & TCE_RPN_MASK) << tceshift;
+   tce = proto_tce | rpn << tceshift;
rc = plpar_tce_put((u64)liobn, (u64)tcenum << tceshift, tce);
  
  		if (unlikely(rc == H_NOT_ENOUGH_RESOURCES)) {

ret = (int)rc;
-   tce_free_pSeriesLP(liobn, tcenum_start,
+   tce_free_pSeriesLP(liobn, tcenum_start, tceshift,
   (npages_start - (npages + 1)));
break;
}
@@ -205,10 +207,11 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table 
*tbl, long tcenum,
long tcenum_start = tcenum, npages_start = npages;
int ret = 0;
unsigned long flags;
+   const unsigned long tceshift = tbl->it_page_shift;
  
  	if ((npages == 1) || !firmware_has_feature(FW_FEATURE_PUT_TCE_IND)) {

return tce_build_pSeriesLP(tbl->it_index, tcenum,
-  tbl->it_page_shift, npages, uaddr,
+  tceshift, npages, uaddr,
   direction, attrs);
}
  
@@ -225,13 +228,13 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,

if (!tcep) {
local_irq_restore(flags);

[PATCH] powerpc: use IRQF_NO_DEBUG for IPIs

2021-07-19 Thread Cédric Le Goater
There is no need to use the lockup detector ("noirqdebug") for IPIs.
The ipistorm benchmark measures a ~10% improvement on high systems
when this flag is set.

Cc: Thomas Gleixner 
Signed-off-by: Cédric Le Goater 
---
 arch/powerpc/sysdev/xics/xics-common.c | 2 +-
 arch/powerpc/sysdev/xive/common.c  | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/sysdev/xics/xics-common.c 
b/arch/powerpc/sysdev/xics/xics-common.c
index b14c502e56a8..18174ccefbc0 100644
--- a/arch/powerpc/sysdev/xics/xics-common.c
+++ b/arch/powerpc/sysdev/xics/xics-common.c
@@ -133,7 +133,7 @@ static void xics_request_ipi(void)
 * IPIs are marked IRQF_PERCPU. The handler was set in map.
 */
BUG_ON(request_irq(ipi, icp_ops->ipi_action,
-  IRQF_PERCPU | IRQF_NO_THREAD, "IPI", NULL));
+  IRQF_NO_DEBUG | IRQF_PERCPU | IRQF_NO_THREAD, "IPI", 
NULL));
 }
 
 void __init xics_smp_probe(void)
diff --git a/arch/powerpc/sysdev/xive/common.c 
b/arch/powerpc/sysdev/xive/common.c
index dbdbbc2f1dc5..9ab44d069704 100644
--- a/arch/powerpc/sysdev/xive/common.c
+++ b/arch/powerpc/sysdev/xive/common.c
@@ -1161,7 +1161,8 @@ static int __init xive_request_ipi(void)
snprintf(xid->name, sizeof(xid->name), "IPI-%d", node);
 
ret = request_irq(xid->irq, xive_muxed_ipi_action,
- IRQF_PERCPU | IRQF_NO_THREAD, xid->name, 
NULL);
+ IRQF_NO_DEBUG | IRQF_PERCPU | IRQF_NO_THREAD,
+ xid->name, NULL);
 
WARN(ret < 0, "Failed to request IPI %d: %d\n", xid->irq, ret);
}
-- 
2.31.1



[PATCH] PCI: Move pci_dev_is/assign_added() to pci.h

2021-07-19 Thread Niklas Schnelle
The helper function pci_dev_is_added() from drivers/pci/pci.h is used in
PCI arch code of both s390 and powerpc leading to awkward relative
includes. Move it to the global include/linux/pci.h and get rid of these
includes just for that one function.

Signed-off-by: Niklas Schnelle 
---
 arch/powerpc/platforms/powernv/pci-sriov.c |  3 ---
 arch/powerpc/platforms/pseries/setup.c |  1 -
 arch/s390/pci/pci_sysfs.c  |  2 --
 drivers/pci/hotplug/acpiphp_glue.c |  1 -
 drivers/pci/pci.h  | 15 ---
 include/linux/pci.h| 13 +
 6 files changed, 13 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-sriov.c 
b/arch/powerpc/platforms/powernv/pci-sriov.c
index 28aac933a439..2e0ca5451e85 100644
--- a/arch/powerpc/platforms/powernv/pci-sriov.c
+++ b/arch/powerpc/platforms/powernv/pci-sriov.c
@@ -9,9 +9,6 @@
 
 #include "pci.h"
 
-/* for pci_dev_is_added() */
-#include "../../../../drivers/pci/pci.h"
-
 /*
  * The majority of the complexity in supporting SR-IOV on PowerNV comes from
  * the need to put the MMIO space for each VF into a separate PE. Internally
diff --git a/arch/powerpc/platforms/pseries/setup.c 
b/arch/powerpc/platforms/pseries/setup.c
index 631a0d57b6cd..17585ec9f955 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -74,7 +74,6 @@
 #include 
 
 #include "pseries.h"
-#include "../../../../drivers/pci/pci.h"
 
 DEFINE_STATIC_KEY_FALSE(shared_processor);
 EXPORT_SYMBOL_GPL(shared_processor);
diff --git a/arch/s390/pci/pci_sysfs.c b/arch/s390/pci/pci_sysfs.c
index 6e2450c2b9c1..8dbe54ef8f8e 100644
--- a/arch/s390/pci/pci_sysfs.c
+++ b/arch/s390/pci/pci_sysfs.c
@@ -13,8 +13,6 @@
 #include 
 #include 
 
-#include "../../../drivers/pci/pci.h"
-
 #include 
 
 #define zpci_attr(name, fmt, member)   \
diff --git a/drivers/pci/hotplug/acpiphp_glue.c 
b/drivers/pci/hotplug/acpiphp_glue.c
index f031302ad401..4cb963f88183 100644
--- a/drivers/pci/hotplug/acpiphp_glue.c
+++ b/drivers/pci/hotplug/acpiphp_glue.c
@@ -38,7 +38,6 @@
 #include 
 #include 
 
-#include "../pci.h"
 #include "acpiphp.h"
 
 static LIST_HEAD(bridge_list);
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 93dcdd431072..a159cd0f6f05 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -383,21 +383,6 @@ static inline bool pci_dev_is_disconnected(const struct 
pci_dev *dev)
return dev->error_state == pci_channel_io_perm_failure;
 }
 
-/* pci_dev priv_flags */
-#define PCI_DEV_ADDED 0
-#define PCI_DPC_RECOVERED 1
-#define PCI_DPC_RECOVERING 2
-
-static inline void pci_dev_assign_added(struct pci_dev *dev, bool added)
-{
-   assign_bit(PCI_DEV_ADDED, &dev->priv_flags, added);
-}
-
-static inline bool pci_dev_is_added(const struct pci_dev *dev)
-{
-   return test_bit(PCI_DEV_ADDED, &dev->priv_flags);
-}
-
 #ifdef CONFIG_PCIEAER
 #include 
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 540b377ca8f6..b3b7bafa17e5 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -507,6 +507,19 @@ struct pci_dev {
unsigned long   priv_flags; /* Private flags for the PCI driver */
 };
 
+/* pci_dev priv_flags */
+#define PCI_DEV_ADDED 0
+
+static inline void pci_dev_assign_added(struct pci_dev *dev, bool added)
+{
+   assign_bit(PCI_DEV_ADDED, &dev->priv_flags, added);
+}
+
+static inline bool pci_dev_is_added(const struct pci_dev *dev)
+{
+   return test_bit(PCI_DEV_ADDED, &dev->priv_flags);
+}
+
 static inline struct pci_dev *pci_physfn(struct pci_dev *dev)
 {
 #ifdef CONFIG_PCI_IOV
-- 
2.25.1



Re: [PATCH] soc: fsl: qe: convert QE interrupt controller to platform_device

2021-07-19 Thread Maxim Kochetkov

15.07.2021 01:29, Li Yang wrote:

 From the original code, this should be type = "qeic".  It is not
defined in current binding but probably needed for backward
compatibility.


I took these strings from this part:

   np = of_find_compatible_node(NULL, NULL, "fsl,qe-ic");

   if (!np) {

   np = of_find_node_by_type(NULL, "qeic");

   if (!np)

   return -ENODEV;

   }

However I can't find usage of "qeic" in any dts, so I will drop this in V2


[PATCH v5 0/1] Interface to represent PAPR firmware attributes

2021-07-19 Thread Pratik R. Sampat
RFC: https://lkml.org/lkml/2021/6/4/791
PATCH v1: https://lkml.org/lkml/2021/6/16/805
PATCH v2: https://lkml.org/lkml/2021/7/6/138
PATCH v3: https://lkml.org/lkml/2021/7/12/2799
PATCH v4: https://lkml.org/lkml/2021/7/16/532

Changelog v4 --> v5
Based on comments from Fabiano
1. Cleaned up unused/redundant headers
2. Make "add_attr_group" use MAX_ATTRS instead of paramterizing "len"
3. Cleanup previous "pgs[idx].pg.attrs" allocations on failiure
4. Replaced strlen call with strnlen
5. Cleanup of pgs structures for "num_attrs" instead of "MAX_ATTRS"

Also, have implemented a POC using this interface for the powerpc-utils'
ppc64_cpu --frequency command-line tool to utilize this information
in userspace.
The POC for the new interface has been hosted here:
https://github.com/pratiksampat/powerpc-utils/tree/H_GET_ENERGY_SCALE_INFO_v2

Sample output from the powerpc-utils tool is as follows:

# ppc64_cpu --frequency
Power and Performance Mode: 
Idle Power Saver Status   : 
Processor Folding Status  :  --> Printed if Idle power save status is 
supported

Platform reported frequencies --> Frequencies reported from the platform's 
H_CALL i.e PAPR interface
min: GHz
max: GHz
static : GHz

Tool Computed frequencies
min: GHz (cpu XX)
max: GHz (cpu XX)
avg: GHz

Pratik R. Sampat (1):
  powerpc/pseries: Interface to represent PAPR firmware attributes

 .../sysfs-firmware-papr-energy-scale-info |  26 ++
 arch/powerpc/include/asm/hvcall.h |  24 +-
 arch/powerpc/kvm/trace_hv.h   |   1 +
 arch/powerpc/platforms/pseries/Makefile   |   3 +-
 .../pseries/papr_platform_attributes.c| 317 ++
 5 files changed, 369 insertions(+), 2 deletions(-)
 create mode 100644 
Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info
 create mode 100644 arch/powerpc/platforms/pseries/papr_platform_attributes.c

-- 
2.31.1



[PATCH v5 1/1] powerpc/pseries: Interface to represent PAPR firmware attributes

2021-07-19 Thread Pratik R. Sampat
Adds a generic interface to represent the energy and frequency related
PAPR attributes on the system using the new H_CALL
"H_GET_ENERGY_SCALE_INFO".

H_GET_EM_PARMS H_CALL was previously responsible for exporting this
information in the lparcfg, however the H_GET_EM_PARMS H_CALL
will be deprecated P10 onwards.

The H_GET_ENERGY_SCALE_INFO H_CALL is of the following call format:
hcall(
  uint64 H_GET_ENERGY_SCALE_INFO,  // Get energy scale info
  uint64 flags,   // Per the flag request
  uint64 firstAttributeId,// The attribute id
  uint64 bufferAddress,   // Guest physical address of the output buffer
  uint64 bufferSize   // The size in bytes of the output buffer
);

This H_CALL can query either all the attributes at once with
firstAttributeId = 0, flags = 0 as well as query only one attribute
at a time with firstAttributeId = id, flags = 1.

The output buffer consists of the following
1. number of attributes  - 8 bytes
2. array offset to the data location - 8 bytes
3. version info  - 1 byte
4. A data array of size num attributes, which contains the following:
  a. attribute ID  - 8 bytes
  b. attribute value in number - 8 bytes
  c. attribute name in string  - 64 bytes
  d. attribute value in string - 64 bytes

The new H_CALL exports information in direct string value format, hence
a new interface has been introduced in
/sys/firmware/papr/energy_scale_info to export this information to
userspace in an extensible pass-through format.

The H_CALL returns the name, numeric value and string value (if exists)

The format of exposing the sysfs information is as follows:
/sys/firmware/papr/energy_scale_info/
   |-- /
 |-- desc
 |-- value
 |-- value_desc (if exists)
   |-- /
 |-- desc
 |-- value
 |-- value_desc (if exists)
...

The energy information that is exported is useful for userspace tools
such as powerpc-utils. Currently these tools infer the
"power_mode_data" value in the lparcfg, which in turn is obtained from
the to be deprecated H_GET_EM_PARMS H_CALL.
On future platforms, such userspace utilities will have to look at the
data returned from the new H_CALL being populated in this new sysfs
interface and report this information directly without the need of
interpretation.

Signed-off-by: Pratik R. Sampat 
Reviewed-by: Gautham R. Shenoy 
---
 .../sysfs-firmware-papr-energy-scale-info |  26 ++
 arch/powerpc/include/asm/hvcall.h |  24 +-
 arch/powerpc/kvm/trace_hv.h   |   1 +
 arch/powerpc/platforms/pseries/Makefile   |   3 +-
 .../pseries/papr_platform_attributes.c| 317 ++
 5 files changed, 369 insertions(+), 2 deletions(-)
 create mode 100644 
Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info
 create mode 100644 arch/powerpc/platforms/pseries/papr_platform_attributes.c

diff --git a/Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info 
b/Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info
new file mode 100644
index ..139a576c7c9d
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info
@@ -0,0 +1,26 @@
+What:  /sys/firmware/papr/energy_scale_info
+Date:  June 2021
+Contact:   Linux for PowerPC mailing list 
+Description:   Directory hosting a set of platform attributes like
+   energy/frequency on Linux running as a PAPR guest.
+
+   Each file in a directory contains a platform
+   attribute hierarchy pertaining to performance/
+   energy-savings mode and processor frequency.
+
+What:  /sys/firmware/papr/energy_scale_info/
+   /sys/firmware/papr/energy_scale_info//desc
+   /sys/firmware/papr/energy_scale_info//value
+   /sys/firmware/papr/energy_scale_info//value_desc
+Date:  June 2021
+Contact:   Linux for PowerPC mailing list 
+Description:   Energy, frequency attributes directory for POWERVM servers
+
+   This directory provides energy, frequency, folding information. 
It
+   contains below sysfs attributes:
+
+   - desc: String description of the attribute 
+
+   - value: Numeric value of attribute 
+
+   - value_desc: String value of attribute 
diff --git a/arch/powerpc/include/asm/hvcall.h 
b/arch/powerpc/include/asm/hvcall.h
index e3b29eda8074..c91714ea6719 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -316,7 +316,8 @@
 #define H_SCM_PERFORMANCE_STATS 0x418
 #define H_RPT_INVALIDATE   0x448
 #define H_SCM_FLUSH0x44C
-#define MAX_HCALL_OPCODE   H_SCM_FLUSH
+#define H_GET_ENERGY_SCALE_INFO0x450
+#define MAX_HCALL_OPCODE   H_GET_ENERGY_SCALE_INFO
 
 /* Scope args for H_SCM_UNBIND_ALL */
 #define H_UNBIND_SCOPE_ALL (0x1)
@@ -631,6 +632,27 @@ struct hv_gpci_request_buffer {
uint8_t bytes[HGPCI_MAX_DATA_BYTES];
 } __packed;
 
+#

Re: [PATCH v5] pseries: prevent free CPU ids to be reused on another node

2021-07-19 Thread Laurent Dufour

Hi Michael,

Is there a way to get that patch in 5.14?

Thanks,
Laurent.

Le 29/04/2021 à 19:49, Laurent Dufour a écrit :

When a CPU is hot added, the CPU ids are taken from the available mask from
the lower possible set. If that set of values was previously used for CPU
attached to a different node, this seems to application like if these CPUs
have migrated from a node to another one which is not expected in real
life.

To prevent this, it is needed to record the CPU ids used for each node and
to not reuse them on another node. However, to prevent CPU hot plug to
fail, in the case the CPU ids is starved on a node, the capability to reuse
other nodes’ free CPU ids is kept. A warning is displayed in such a case
to warn the user.

A new CPU bit mask (node_recorded_ids_map) is introduced for each possible
node. It is populated with the CPU onlined at boot time, and then when a
CPU is hot plug to a node. The bits in that mask remain when the CPU is hot
unplugged, to remind this CPU ids have been used for this node.

If no id set was found, a retry is made without removing the ids used on
the other nodes to try reusing them. This is the way ids have been
allocated prior to this patch.

The effect of this patch can be seen by removing and adding CPUs using the
Qemu monitor. In the following case, the first CPU from the node 2 is
removed, then the first one from the node 1 is removed too. Later, the
first CPU of the node 2 is added back. Without that patch, the kernel will
numbered these CPUs using the first CPU ids available which are the ones
freed when removing the second CPU of the node 0. This leads to the CPU ids
16-23 to move from the node 1 to the node 2. With the patch applied, the
CPU ids 32-39 are used since they are the lowest free ones which have not
been used on another node.

At boot time:
[root@vm40 ~]# numactl -H | grep cpus
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 1 cpus: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 2 cpus: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47

Vanilla kernel, after the CPU hot unplug/plug operations:
[root@vm40 ~]# numactl -H | grep cpus
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 1 cpus: 24 25 26 27 28 29 30 31
node 2 cpus: 16 17 18 19 20 21 22 23 40 41 42 43 44 45 46 47

Patched kernel, after the CPU hot unplug/plug operations:
[root@vm40 ~]# numactl -H | grep cpus
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 1 cpus: 24 25 26 27 28 29 30 31
node 2 cpus: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47

Signed-off-by: Laurent Dufour 
---
V5:
  - Rework code structure
  - Reintroduce the capability to reuse other node's ids.
V4: addressing Nathan's comment
  - Rename the local variable named 'nid' into 'assigned_node'
V3: addressing Nathan's comments
  - Remove the retry feature
  - Reduce the number of local variables (removing 'i')
  - Add comment about the cpu_add_remove_lock protecting the added CPU mask.
  V2: (no functional changes)
  - update the test's output in the commit's description
  - node_recorded_ids_map should be static
---
  arch/powerpc/platforms/pseries/hotplug-cpu.c | 171 ++-
  1 file changed, 132 insertions(+), 39 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 7e970f81d8ff..e1f224320102 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -39,6 +39,12 @@
  /* This version can't take the spinlock, because it never returns */
  static int rtas_stop_self_token = RTAS_UNKNOWN_SERVICE;
  
+/*

+ * Record the CPU ids used on each nodes.
+ * Protected by cpu_add_remove_lock.
+ */
+static cpumask_var_t node_recorded_ids_map[MAX_NUMNODES];
+
  static void rtas_stop_self(void)
  {
static struct rtas_args args;
@@ -139,72 +145,148 @@ static void pseries_cpu_die(unsigned int cpu)
paca_ptrs[cpu]->cpu_start = 0;
  }
  
+/**

+ * find_cpu_id_range - found a linear ranger of @nthreads free CPU ids.
+ * @nthreads : the number of threads (cpu ids)
+ * @assigned_node : the node it belongs to or NUMA_NO_NODE if free ids from any
+ *  node can be peek.
+ * @cpu_mask: the returned CPU mask.
+ *
+ * Returns 0 on success.
+ */
+static int find_cpu_id_range(unsigned int nthreads, int assigned_node,
+cpumask_var_t *cpu_mask)
+{
+   cpumask_var_t candidate_mask;
+   unsigned int cpu, node;
+   int rc = -ENOSPC;
+
+   if (!zalloc_cpumask_var(&candidate_mask, GFP_KERNEL))
+   return -ENOMEM;
+
+   cpumask_clear(*cpu_mask);
+   for (cpu = 0; cpu < nthreads; cpu++)
+   cpumask_set_cpu(cpu, *cpu_mask);
+
+   BUG_ON(!cpumask_subset(cpu_present_mask, cpu_possible_mask));
+
+   /* Get a bitmap of unoccupied slots. */
+   cpumask_xor(candidate_mask, cpu_possible_mask, cpu_present_mask);
+
+   if (assigned_node != NUMA_NO_NODE) {
+   /*
+ 

Re: [PATCH v2] ppc64/numa: consider the max numa node for migratable LPAR

2021-07-19 Thread Laurent Dufour

Hi Michael,

Is there a way to get that patch in 5.14?

Thanks,
Laurent.

Le 11/05/2021 à 09:31, Laurent Dufour a écrit :

When a LPAR is migratable, we should consider the maximum possible NUMA
node instead the number of NUMA node from the actual system.

The DT property 'ibm,current-associativity-domains' is defining the maximum
number of nodes the LPAR can see when running on that box. But if the LPAR
is being migrated on another box, it may seen up to the nodes defined by
'ibm,max-associativity-domains'. So if a LPAR is migratable, that value
should be used.

Unfortunately, there is no easy way to know if a LPAR is migratable or
not. The hypervisor is exporting the property 'ibm,migratable-partition' in
the case it set to migrate partition, but that would not mean that the
current partition is migratable.

Without this patch, when a LPAR is started on a 2 nodes box and then
migrated to a 3 nodes box, the hypervisor may spread the LPAR's CPUs on the
3rd node. In that case if a CPU from that 3rd node is added to the LPAR, it
will be wrongly assigned to the node because the kernel has been set to use
up to 2 nodes (the configuration of the departure node). With this patch
applies, the CPU is correctly added to the 3rd node.

Fixes: f9f130ff2ec9 ("powerpc/numa: Detect support for coregroup")
Reviewed-by: Srikar Dronamraju 
Signed-off-by: Laurent Dufour 
---
V2: Address Srikar's comments
  - Fix the commit message
  - Use pr_info instead printk(KERN_INFO..)
---
  arch/powerpc/mm/numa.c | 13 ++---
  1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index f2bf98bdcea2..094a1076fd1f 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -893,7 +893,7 @@ static void __init setup_node_data(int nid, u64 start_pfn, 
u64 end_pfn)
  static void __init find_possible_nodes(void)
  {
struct device_node *rtas;
-   const __be32 *domains;
+   const __be32 *domains = NULL;
int prop_length, max_nodes;
u32 i;
  
@@ -909,9 +909,14 @@ static void __init find_possible_nodes(void)

 * it doesn't exist, then fallback on ibm,max-associativity-domains.
 * Current denotes what the platform can support compared to max
 * which denotes what the Hypervisor can support.
+*
+* If the LPAR is migratable, new nodes might be activated after a LPM,
+* so we should consider the max number in that case.
 */
-   domains = of_get_property(rtas, "ibm,current-associativity-domains",
-   &prop_length);
+   if (!of_get_property(of_root, "ibm,migratable-partition", NULL))
+   domains = of_get_property(rtas,
+ "ibm,current-associativity-domains",
+ &prop_length);
if (!domains) {
domains = of_get_property(rtas, "ibm,max-associativity-domains",
&prop_length);
@@ -920,6 +925,8 @@ static void __init find_possible_nodes(void)
}
  
  	max_nodes = of_read_number(&domains[min_common_depth], 1);

+   pr_info("Partition configured for %d NUMA nodes.\n", max_nodes);
+
for (i = 0; i < max_nodes; i++) {
if (!node_possible(i))
node_set(i, node_possible_map);





Re: [PATCH v5] pseries/drmem: update LMBs after LPM

2021-07-19 Thread Laurent Dufour

Hi Michael,

Is there a way to get that patch in 5.14?

Thanks,
Laurent.

Le 17/05/2021 à 11:06, Laurent Dufour a écrit :

After a LPM, the device tree node ibm,dynamic-reconfiguration-memory may be
updated by the hypervisor in the case the NUMA topology of the LPAR's
memory is updated.

This is handled by the kernel, but the memory's node is not updated because
there is no way to move a memory block between nodes from the Linux kernel
point of view.

If later a memory block is added or removed, drmem_update_dt() is called
and it is overwriting the DT node ibm,dynamic-reconfiguration-memory to
match the added or removed LMB. But the LMB's associativity node has not
been updated after the DT node update and thus the node is overwritten by
the Linux's topology instead of the hypervisor one.

Introduce a hook called when the ibm,dynamic-reconfiguration-memory node is
updated to force an update of the LMB's associativity. However, ignore the
call to that hook when the update has been triggered by drmem_update_dt().
Because, in that case, the LMB tree has been used to set the DT property
and thus it doesn't need to be updated back. Since drmem_update_dt() is
called under the protection of the device_hotplug_lock and the hook is
called in the same context, use a simple boolean variable to detect that
call.

Cc: Nathan Lynch 
Cc: Aneesh Kumar K.V 
Cc: Tyrel Datwyler 
Signed-off-by: Laurent Dufour 
---

V5:
  - Reword the commit's description to address Nathan's comments.
V4:
  - Prevent the LMB to be updated back in the case the request came from the
  LMB tree's update.
V3:
  - Check rd->dn->name instead of rd->dn->full_name
V2:
  - Take Tyrel's idea to rely on OF_RECONFIG_UPDATE_PROPERTY instead of
  introducing a new hook mechanism.
---
  arch/powerpc/include/asm/drmem.h  |  1 +
  arch/powerpc/mm/drmem.c   | 46 +++
  .../platforms/pseries/hotplug-memory.c|  4 ++
  3 files changed, 51 insertions(+)

diff --git a/arch/powerpc/include/asm/drmem.h b/arch/powerpc/include/asm/drmem.h
index bf2402fed3e0..4265d5e95c2c 100644
--- a/arch/powerpc/include/asm/drmem.h
+++ b/arch/powerpc/include/asm/drmem.h
@@ -111,6 +111,7 @@ int drmem_update_dt(void);
  int __init
  walk_drmem_lmbs_early(unsigned long node, void *data,
  int (*func)(struct drmem_lmb *, const __be32 **, void *));
+void drmem_update_lmbs(struct property *prop);
  #endif
  
  static inline void invalidate_lmb_associativity_index(struct drmem_lmb *lmb)

diff --git a/arch/powerpc/mm/drmem.c b/arch/powerpc/mm/drmem.c
index 9af3832c9d8d..22197b18d85e 100644
--- a/arch/powerpc/mm/drmem.c
+++ b/arch/powerpc/mm/drmem.c
@@ -18,6 +18,7 @@ static int n_root_addr_cells, n_root_size_cells;
  
  static struct drmem_lmb_info __drmem_info;

  struct drmem_lmb_info *drmem_info = &__drmem_info;
+static bool in_drmem_update;
  
  u64 drmem_lmb_memory_max(void)

  {
@@ -178,6 +179,11 @@ int drmem_update_dt(void)
if (!memory)
return -1;
  
+	/*

+* Set in_drmem_update to prevent the notifier callback to process the
+* DT property back since the change is coming from the LMB tree.
+*/
+   in_drmem_update = true;
prop = of_find_property(memory, "ibm,dynamic-memory", NULL);
if (prop) {
rc = drmem_update_dt_v1(memory, prop);
@@ -186,6 +192,7 @@ int drmem_update_dt(void)
if (prop)
rc = drmem_update_dt_v2(memory, prop);
}
+   in_drmem_update = false;
  
  	of_node_put(memory);

return rc;
@@ -307,6 +314,45 @@ int __init walk_drmem_lmbs_early(unsigned long node, void 
*data,
return ret;
  }
  
+/*

+ * Update the LMB associativity index.
+ */
+static int update_lmb(struct drmem_lmb *updated_lmb,
+ __maybe_unused const __be32 **usm,
+ __maybe_unused void *data)
+{
+   struct drmem_lmb *lmb;
+
+   for_each_drmem_lmb(lmb) {
+   if (lmb->drc_index != updated_lmb->drc_index)
+   continue;
+
+   lmb->aa_index = updated_lmb->aa_index;
+   break;
+   }
+   return 0;
+}
+
+/*
+ * Update the LMB associativity index.
+ *
+ * This needs to be called when the hypervisor is updating the
+ * dynamic-reconfiguration-memory node property.
+ */
+void drmem_update_lmbs(struct property *prop)
+{
+   /*
+* Don't update the LMBs if triggered by the update done in
+* drmem_update_dt(), the LMB values have been used to the update the DT
+* property in that case.
+*/
+   if (in_drmem_update)
+   return;
+   if (!strcmp(prop->name, "ibm,dynamic-memory"))
+   __walk_drmem_v1_lmbs(prop->value, NULL, NULL, update_lmb);
+   else if (!strcmp(prop->name, "ibm,dynamic-memory-v2"))
+   __walk_drmem_v2_lmbs(prop->value, NULL, NULL, update_lmb);
+}
  #endif
  
  static int init_drmem_lmb_size(struct dev

Re: [PATCH] powerpc/xive: Do not skip CPU-less nodes when creating the IPIs

2021-07-19 Thread Laurent Vivier
On 29/06/2021 15:15, Cédric Le Goater wrote:
> On PowerVM, CPU-less nodes can be populated with hot-plugged CPUs at
> runtime. Today, the IPI is not created for such nodes, and hot-plugged
> CPUs use a bogus IPI, which leads to soft lockups.
> 
> We could create the node IPI on demand but it is a bit complex because
> this code would be called under bringup_up() and some IRQ locking is
> being done. The simplest solution is to create the IPIs for all nodes
> at startup.
> 
> Fixes: 7dcc37b3eff9 ("powerpc/xive: Map one IPI interrupt per node")
> Cc: sta...@vger.kernel.org # v5.13
> Reported-by: Geetika Moolchandani 
> Cc: Srikar Dronamraju 
> Signed-off-by: Cédric Le Goater 
> ---
> 
> This patch breaks old versions of irqbalance (<= v1.4). Possible nodes
> are collected from /sys/devices/system/node/ but CPU-less nodes are
> not listed there. When interrupts are scanned, the link representing
> the node structure is NULL and segfault occurs.
> 
> Version 1.7 seems immune. 
> 
> ---
>  arch/powerpc/sysdev/xive/common.c | 4 
>  1 file changed, 4 deletions(-)
> 
> diff --git a/arch/powerpc/sysdev/xive/common.c 
> b/arch/powerpc/sysdev/xive/common.c
> index f3b16ed48b05..5d2c58dba57e 100644
> --- a/arch/powerpc/sysdev/xive/common.c
> +++ b/arch/powerpc/sysdev/xive/common.c
> @@ -1143,10 +1143,6 @@ static int __init xive_request_ipi(void)
>   struct xive_ipi_desc *xid = &xive_ipis[node];
>   struct xive_ipi_alloc_info info = { node };
>  
> - /* Skip nodes without CPUs */
> - if (cpumask_empty(cpumask_of_node(node)))
> - continue;
> -
>   /*
>* Map one IPI interrupt per node for all cpus of that node.
>* Since the HW interrupt number doesn't have any meaning,
> 

What happened to this fix? Will it be merged?

Thanks,
Laurent