Re: [PATCH v3 0/6] powerpc: queued spinlocks and rwlocks

2020-07-06 Thread Nicholas Piggin
Excerpts from Waiman Long's message of July 7, 2020 4:39 am:
> On 7/6/20 12:35 AM, Nicholas Piggin wrote:
>> v3 is updated to use __pv_queued_spin_unlock, noticed by Waiman (thank you).
>>
>> Thanks,
>> Nick
>>
>> Nicholas Piggin (6):
>>powerpc/powernv: must include hvcall.h to get PAPR defines
>>powerpc/pseries: move some PAPR paravirt functions to their own file
>>powerpc: move spinlock implementation to simple_spinlock
>>powerpc/64s: implement queued spinlocks and rwlocks
>>powerpc/pseries: implement paravirt qspinlocks for SPLPAR
>>powerpc/qspinlock: optimised atomic_try_cmpxchg_lock that adds the
>>  lock hint
>>
>>   arch/powerpc/Kconfig  |  13 +
>>   arch/powerpc/include/asm/Kbuild   |   2 +
>>   arch/powerpc/include/asm/atomic.h |  28 ++
>>   arch/powerpc/include/asm/paravirt.h   |  89 +
>>   arch/powerpc/include/asm/qspinlock.h  |  91 ++
>>   arch/powerpc/include/asm/qspinlock_paravirt.h |   7 +
>>   arch/powerpc/include/asm/simple_spinlock.h| 292 +
>>   .../include/asm/simple_spinlock_types.h   |  21 ++
>>   arch/powerpc/include/asm/spinlock.h   | 308 +-
>>   arch/powerpc/include/asm/spinlock_types.h |  17 +-
>>   arch/powerpc/lib/Makefile |   3 +
>>   arch/powerpc/lib/locks.c  |  12 +-
>>   arch/powerpc/platforms/powernv/pci-ioda-tce.c |   1 +
>>   arch/powerpc/platforms/pseries/Kconfig|   5 +
>>   arch/powerpc/platforms/pseries/setup.c|   6 +-
>>   include/asm-generic/qspinlock.h   |   4 +
>>   16 files changed, 577 insertions(+), 322 deletions(-)
>>   create mode 100644 arch/powerpc/include/asm/paravirt.h
>>   create mode 100644 arch/powerpc/include/asm/qspinlock.h
>>   create mode 100644 arch/powerpc/include/asm/qspinlock_paravirt.h
>>   create mode 100644 arch/powerpc/include/asm/simple_spinlock.h
>>   create mode 100644 arch/powerpc/include/asm/simple_spinlock_types.h
>>
> This patch looks OK to me.

Thanks for reviewing and testing.

> I had run some microbenchmark on powerpc system with or w/o the patch.
> 
> On a 2-socket 160-thread SMT4 POWER9 system (not virtualized):
> 
> 5.8.0-rc4
> =
> 
> Running locktest with spinlock [runtime = 10s, load = 1]
> Threads = 160, Min/Mean/Max = 77,665/90,153/106,895
> Threads = 160, Total Rate = 1,441,759 op/s; Percpu Rate = 9,011 op/s
> 
> Running locktest with rwlock [runtime = 10s, r% = 50%, load = 1]
> Threads = 160, Min/Mean/Max = 47,879/53,807/63,689
> Threads = 160, Total Rate = 860,192 op/s; Percpu Rate = 5,376 op/s
> 
> Running locktest with spinlock [runtime = 10s, load = 1]
> Threads = 80, Min/Mean/Max = 242,907/319,514/463,161
> Threads = 80, Total Rate = 2,555 kop/s; Percpu Rate = 32 kop/s
> 
> Running locktest with rwlock [runtime = 10s, r% = 50%, load = 1]
> Threads = 80, Min/Mean/Max = 146,161/187,474/259,270
> Threads = 80, Total Rate = 1,498 kop/s; Percpu Rate = 19 kop/s
> 
> Running locktest with spinlock [runtime = 10s, load = 1]
> Threads = 40, Min/Mean/Max = 646,639/1,000,817/1,455,205
> Threads = 40, Total Rate = 4,001 kop/s; Percpu Rate = 100 kop/s
> 
> Running locktest with rwlock [runtime = 10s, r% = 50%, load = 1]
> Threads = 40, Min/Mean/Max = 402,165/597,132/814,555
> Threads = 40, Total Rate = 2,388 kop/s; Percpu Rate = 60 kop/s
> 
> 5.8.0-rc4-qlock+
> 
> 
> Running locktest with spinlock [runtime = 10s, load = 1]
> Threads = 160, Min/Mean/Max = 123,835/124,580/124,587
> Threads = 160, Total Rate = 1,992 kop/s; Percpu Rate = 12 kop/s
> 
> Running locktest with rwlock [runtime = 10s, r% = 50%, load = 1]
> Threads = 160, Min/Mean/Max = 254,210/264,714/276,784
> Threads = 160, Total Rate = 4,231 kop/s; Percpu Rate = 26 kop/s
> 
> Running locktest with spinlock [runtime = 10s, load = 1]
> Threads = 80, Min/Mean/Max = 599,715/603,397/603,450
> Threads = 80, Total Rate = 4,825 kop/s; Percpu Rate = 60 kop/s
> 
> Running locktest with rwlock [runtime = 10s, r% = 50%, load = 1]
> Threads = 80, Min/Mean/Max = 492,687/525,224/567,456
> Threads = 80, Total Rate = 4,199 kop/s; Percpu Rate = 52 kop/s
> 
> Running locktest with spinlock [runtime = 10s, load = 1]
> Threads = 40, Min/Mean/Max = 1,325,623/1,325,628/1,325,636
> Threads = 40, Total Rate = 5,299 kop/s; Percpu Rate = 132 kop/s
> 
> Running locktest with rwlock [runtime = 10s, r% = 50%, load = 1]
> Threads = 40, Min/Mean/Max = 1,249,731/1,292,977/1,342,815
> Threads = 40, Total Rate = 5,168 kop/s; Percpu Rate = 129 kop/s
> 
> On systems on large number of cpus, qspinlock lock is faster and more fair.
> 
> With some tuning, we may be able to squeeze out more performance.

Yes, powerpc could certainly get more performance out of the slow
paths, and then there are a few parameters to tune.

We don't have a good alternate patching for function calls yet, but
that would be something to do for native vs pv.

And then there seem to be one or 

Re: [PATCH] powerpc/pseries: detect secure and trusted boot state of the system.

2020-07-06 Thread Michael Ellerman
Nayna Jain  writes:
> The device-tree property to check secure and trusted boot state is
> different for guests(pseries) compared to baremetal(powernv).
>
> This patch updates the existing is_ppc_secureboot_enabled() and
> is_ppc_trustedboot_enabled() function to add support for pseries.
>
> Signed-off-by: Nayna Jain 
> ---
>  arch/powerpc/kernel/secure_boot.c | 31 +--
>  1 file changed, 25 insertions(+), 6 deletions(-)
>
> diff --git a/arch/powerpc/kernel/secure_boot.c 
> b/arch/powerpc/kernel/secure_boot.c
> index 4b982324d368..43fc6607c7a5 100644
> --- a/arch/powerpc/kernel/secure_boot.c
> +++ b/arch/powerpc/kernel/secure_boot.c
> @@ -6,6 +6,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  static struct device_node *get_ppc_fw_sb_node(void)
>  {
> @@ -23,11 +24,20 @@ bool is_ppc_secureboot_enabled(void)
>  {
>   struct device_node *node;
>   bool enabled = false;
> + const u32 *secureboot;
>  
> - node = get_ppc_fw_sb_node();
> - enabled = of_property_read_bool(node, "os-secureboot-enforcing");
> + if (machine_is(powernv)) {
> + node = get_ppc_fw_sb_node();
> + enabled =
> + of_property_read_bool(node, "os-secureboot-enforcing");
> + of_node_put(node);
> + }

We generally try to avoid adding new machine_is() checks if we can.

In a case like this I think you can just check for both properties
regardless of what platform you're on.
  
> - of_node_put(node);
> + if (machine_is(pseries)) {
> + secureboot = of_get_property(of_root, "ibm,secure-boot", NULL);
> + if (secureboot)
> + enabled = (*secureboot > 1) ? true : false;
> + }

Please don't use of_get_property() in new code. Use one of the properly
typed accessors that handles endian conversion for you.

cheers


Re: [PATCH] powerpc: select ARCH_HAS_MEMBARRIER_SYNC_CORE

2020-07-06 Thread Nicholas Piggin
Excerpts from Christophe Leroy's message of July 6, 2020 7:53 pm:
> 
> 
> Le 06/07/2020 à 04:18, Nicholas Piggin a écrit :
>> diff --git a/arch/powerpc/include/asm/exception-64s.h 
>> b/arch/powerpc/include/asm/exception-64s.h
>> index 47bd4ea0837d..b88cb3a989b6 100644
>> --- a/arch/powerpc/include/asm/exception-64s.h
>> +++ b/arch/powerpc/include/asm/exception-64s.h
>> @@ -68,6 +68,10 @@
>>*
>>* The nop instructions allow us to insert one or more instructions to 
>> flush the
>>* L1-D cache when returning to userspace or a guest.
>> + *
>> + * powerpc relies on return from interrupt/syscall being context 
>> synchronising
>> + * (which hrfid, rfid, and rfscv are) to support 
>> ARCH_HAS_MEMBARRIER_SYNC_CORE
>> + * without additional additional synchronisation instructions.
> 
> This file is dedicated to BOOK3S/64. What about other ones ?
> 
> On 32 bits, this is also valid as 'rfi' is also context synchronising, 
> but then why just add some comment in exception-64s.h and only there ?

Yeah you're right, I basically wanted to keep a note there just in case,
because it's possible we would get a less synchronising return (maybe
unlikely with meltdown) or even return from a kernel interrupt using a
something faster (e.g., bctar if we don't use tar register in the kernel
anywhere).

So I wonder where to add the note, entry_32.S and 64e.h as well?

I should actually change the comment for 64-bit because soft masked 
interrupt replay is an interesting case. I thought it was okay (because 
the IPI would cause a hard interrupt which does do the rfi) but that 
should at least be written. The context synchronisation happens before
the Linux IPI function is called, but for the purpose of membarrier I 
think that is okay (the membarrier just needs to have caused a memory
barrier + context synchronistaion by the time it has done).

Thanks,
Nick


Re: [PATCH v3 1/2] powerpc/perf/hv-24x7: Add cpu hotplug support

2020-07-06 Thread Madhavan Srinivasan




On 7/7/20 10:26 AM, Michael Ellerman wrote:

Madhavan Srinivasan  writes:

On 7/6/20 8:43 AM, Michael Ellerman wrote:

Kajol Jain  writes:

Patch here adds cpu hotplug functions to hv_24x7 pmu.
A new cpuhp_state "CPUHP_AP_PERF_POWERPC_HV_24x7_ONLINE" enum
is added.

The online callback function updates the cpumask only if its
empty. As the primary intention of adding hotplug support
is to designate a CPU to make HCALL to collect the
counter data.

The offline function test and clear corresponding cpu in a cpumask
and update cpumask to any other active cpu.

Signed-off-by: Kajol Jain 
Reviewed-by: Gautham R. Shenoy 
---
   arch/powerpc/perf/hv-24x7.c | 45 +
   include/linux/cpuhotplug.h  |  1 +
   2 files changed, 46 insertions(+)

diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index db213eb7cb02..ce4739e2b407 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -31,6 +31,8 @@ static int interface_version;
   /* Whether we have to aggregate result data for some domains. */
   static bool aggregate_result_elements;
   
+static cpumask_t hv_24x7_cpumask;

+
   static bool domain_is_valid(unsigned domain)
   {
switch (domain) {
@@ -1641,6 +1643,44 @@ static struct pmu h_24x7_pmu = {
.capabilities = PERF_PMU_CAP_NO_EXCLUDE,
   };
   
+static int ppc_hv_24x7_cpu_online(unsigned int cpu)

+{
+   /* Make this CPU the designated target for counter collection */

The comment implies every newly onlined CPU will become the target, but
actually it's only the first onlined CPU.

So I think the comment needs updating, or you could just drop the
comment, I think the code is fairly clear by itself.


+   if (cpumask_empty(_24x7_cpumask))
+   cpumask_set_cpu(cpu, _24x7_cpumask);
+
+   return 0;
+}
+
+static int ppc_hv_24x7_cpu_offline(unsigned int cpu)
+{
+   int target = -1;

No need to initialise target, you assign to it unconditionally below.


+   /* Check if exiting cpu is used for collecting 24x7 events */
+   if (!cpumask_test_and_clear_cpu(cpu, _24x7_cpumask))
+   return 0;
+
+   /* Find a new cpu to collect 24x7 events */
+   target = cpumask_last(cpu_active_mask);

Any reason to use cpumask_last() vs cpumask_first(), or a randomly
chosen CPU?


+   if (target < 0 || target >= nr_cpu_ids)
+   return -1;
+
+   /* Migrate 24x7 events to the new target */
+   cpumask_set_cpu(target, _24x7_cpumask);
+   perf_pmu_migrate_context(_24x7_pmu, cpu, target);
+
+   return 0;
+}
+
+static int hv_24x7_cpu_hotplug_init(void)
+{
+   return cpuhp_setup_state(CPUHP_AP_PERF_POWERPC_HV_24x7_ONLINE,
+ "perf/powerpc/hv_24x7:online",
+ ppc_hv_24x7_cpu_online,
+ ppc_hv_24x7_cpu_offline);
+}
+
   static int hv_24x7_init(void)
   {
int r;
@@ -1685,6 +1725,11 @@ static int hv_24x7_init(void)
if (r)
return r;
   
+	/* init cpuhotplug */

+   r = hv_24x7_cpu_hotplug_init();
+   if (r)
+   pr_err("hv_24x7: CPU hotplug init failed\n");
+

The hotplug initialisation shouldn't fail unless something is badly
wrong. I think you should just fail initialisation of the entire PMU if
that happens, which will make the error handling in the next patch much
simpler.

We  did fail the PMU registration on failure of the hotplug
code (and yes error handling is much simpler), but on internal
review/discussion,
what came up was that, hv_24x7 PMU will still be usable without
the hotplug code (with "-C" option to perf tool command line).

In theory yes.

But in reality no one will ever test that case, so the code will easily
bit rot.

Even if it doesn't bit rot, you've now created another state the system
can legally be in (hotplug init failed but PMU still probed), which you
have to test, document & support.

If the hotplug init fails then something is badly wrong, the best thing
we can do is bail on the PMU initialisation and hope the rest of the
system boots OK.


Yep agreed. Thanks for the comments mpe

Maddy



cheers




Re: [PATCH v12 00/31] Speculative page faults

2020-07-06 Thread Chinwen Chang
On Mon, 2020-07-06 at 14:27 +0200, Laurent Dufour wrote:
> Le 06/07/2020 à 11:25, Chinwen Chang a écrit :
> > On Thu, 2019-06-20 at 16:19 +0800, Haiyan Song wrote:
> >> Hi Laurent,
> >>
> >> I downloaded your script and run it on Intel 2s skylake platform with 
> >> spf-v12 patch
> >> serials.
> >>
> >> Here attached the output results of this script.
> >>
> >> The following comparison result is statistics from the script outputs.
> >>
> >> a). Enable THP
> >>  SPF_0  change   
> >> SPF_1
> >> will-it-scale.page_fault2.per_thread_ops2664190.8  -11.7%   
> >> 2353637.6
> >> will-it-scale.page_fault3.per_thread_ops4480027.2  -14.7%   
> >> 3819331.9
> >>
> >>
> >> b). Disable THP
> >>  SPF_0   change  
> >> SPF_1
> >> will-it-scale.page_fault2.per_thread_ops2653260.7   -10%
> >> 2385165.8
> >> will-it-scale.page_fault3.per_thread_ops4436330.1   -12.4%  
> >> 3886734.2
> >>
> >>
> >> Thanks,
> >> Haiyan Song
> >>
> >>
> >> On Fri, Jun 14, 2019 at 10:44:47AM +0200, Laurent Dufour wrote:
> >>> Le 14/06/2019 à 10:37, Laurent Dufour a écrit :
>  Please find attached the script I run to get these numbers.
>  This would be nice if you could give it a try on your victim node and 
>  share the result.
> >>>
> >>> Sounds that the Intel mail fitering system doesn't like the attached 
> >>> shell script.
> >>> Please find it there: 
> >>> https://urldefense.com/v3/__https://gist.github.com/ldu4/a5cc1a93f293108ea387d43d5d5e7f44__;!!CTRNKA9wMg0ARbw!0lux2FMCbIFxFEl824CdSuSQqT0IVWsvyUqfDVJNEVb9gTWyRltm7cpPZg70N_XhXmMZ$
> >>>  
> >>>
> >>> Thanks,
> >>> Laurent.
> >>>
> > 
> > Hi Laurent,
> > 
> > We merged SPF v11 and some patches from v12 into our platforms. After
> > several experiments, we observed SPF has obvious improvements on the
> > launch time of applications, especially for those high-TLP ones,
> > 
> > # launch time of applications(s):
> > 
> > package   version  w/ SPF  w/o SPF  improve(%)
> > --
> > Baidu maps10.13.3  0.887   0.98 9.49
> > Taobao8.4.0.35 1.227   1.2935.10
> > Meituan   9.12.401 1.107   1.54328.26
> > WeChat7.0.32.353   2.68 12.20
> > Honor of Kings1.43.1.6 6.636.7131.24
> 
> That's great news, thanks for reporting this!
> 
> > 
> > By the way, we have verified our platforms with those patches and
> > achieved the goal of mass production.
> 
> Another good news!
> For my information, what is your targeted hardware?
> 
> Cheers,
> Laurent.

Hi Laurent,

Our targeted hardware belongs to ARM64 multi-core series.

Thanks.
Chinwen
> 



Re: [PATCH] powerpc/numa: Restrict possible nodes based on platform

2020-07-06 Thread Michael Ellerman
Srikar Dronamraju  writes:
> As per PAPR, there are 2 device tree property
> ibm,max-associativity-domains (which defines the maximum number of
> domains that the firmware i.e PowerVM can support) and
> ibm,current-associativity-domains (which defines the maximum number of
> domains that the platform can support). Value of
> ibm,max-associativity-domains property is always greater than or equal
> to ibm,current-associativity-domains property.

Where is it documented?

It's definitely not in LoPAPR.

> Powerpc currently uses ibm,max-associativity-domains  property while
> setting the possible number of nodes. This is currently set at 32.
> However the possible number of nodes for a platform may be significantly
> less. Hence set the possible number of nodes based on
> ibm,current-associativity-domains property.
>
> $ lsprop /proc/device-tree/rtas/ibm,*associ*-domains
> /proc/device-tree/rtas/ibm,current-associativity-domains
>0005 0001 0002 0002 0002 0010
> /proc/device-tree/rtas/ibm,max-associativity-domains
>0005 0001 0008 0020 0020 0100
>
> $ cat /sys/devices/system/node/possible ##Before patch
> 0-31
>
> $ cat /sys/devices/system/node/possible ##After patch
> 0-1
>
> Note the maximum nodes this platform can support is only 2 but the
> possible nodes is set to 32.

But what about LPM to a system with more nodes?

cheers


[PATCH v2] MAINTAINERS: Add Shengjiu to reviewer list of sound/soc/fsl

2020-07-06 Thread Nicolin Chen
Add Shengjiu who's actively working on the latest fsl/nxp audio drivers.

Signed-off-by: Nicolin Chen 
Acked-by: Shengjiu Wang 
Reviewed-by: Fabio Estevam 
---
Changelog
v1->v2:
 * Replaced Shengjiu's emaill address with his gmail one
 * Added Acked-by from Shengjiu and Reviewed-by from Fabio

 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 496fd4eafb68..ff97b8cefaea 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6956,6 +6956,7 @@ M:Timur Tabi 
 M: Nicolin Chen 
 M: Xiubo Li 
 R: Fabio Estevam 
+R: Shengjiu Wang 
 L: alsa-de...@alsa-project.org (moderated for non-subscribers)
 L: linuxppc-dev@lists.ozlabs.org
 S: Maintained
-- 
2.17.1



Re: [PATCH v3 1/2] powerpc/perf/hv-24x7: Add cpu hotplug support

2020-07-06 Thread Michael Ellerman
Madhavan Srinivasan  writes:
> On 7/6/20 8:43 AM, Michael Ellerman wrote:
>> Kajol Jain  writes:
>>> Patch here adds cpu hotplug functions to hv_24x7 pmu.
>>> A new cpuhp_state "CPUHP_AP_PERF_POWERPC_HV_24x7_ONLINE" enum
>>> is added.
>>>
>>> The online callback function updates the cpumask only if its
>>> empty. As the primary intention of adding hotplug support
>>> is to designate a CPU to make HCALL to collect the
>>> counter data.
>>>
>>> The offline function test and clear corresponding cpu in a cpumask
>>> and update cpumask to any other active cpu.
>>>
>>> Signed-off-by: Kajol Jain 
>>> Reviewed-by: Gautham R. Shenoy 
>>> ---
>>>   arch/powerpc/perf/hv-24x7.c | 45 +
>>>   include/linux/cpuhotplug.h  |  1 +
>>>   2 files changed, 46 insertions(+)
>>>
>>> diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
>>> index db213eb7cb02..ce4739e2b407 100644
>>> --- a/arch/powerpc/perf/hv-24x7.c
>>> +++ b/arch/powerpc/perf/hv-24x7.c
>>> @@ -31,6 +31,8 @@ static int interface_version;
>>>   /* Whether we have to aggregate result data for some domains. */
>>>   static bool aggregate_result_elements;
>>>   
>>> +static cpumask_t hv_24x7_cpumask;
>>> +
>>>   static bool domain_is_valid(unsigned domain)
>>>   {
>>> switch (domain) {
>>> @@ -1641,6 +1643,44 @@ static struct pmu h_24x7_pmu = {
>>> .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
>>>   };
>>>   
>>> +static int ppc_hv_24x7_cpu_online(unsigned int cpu)
>>> +{
>>> +   /* Make this CPU the designated target for counter collection */
>> The comment implies every newly onlined CPU will become the target, but
>> actually it's only the first onlined CPU.
>>
>> So I think the comment needs updating, or you could just drop the
>> comment, I think the code is fairly clear by itself.
>>
>>> +   if (cpumask_empty(_24x7_cpumask))
>>> +   cpumask_set_cpu(cpu, _24x7_cpumask);
>>> +
>>> +   return 0;
>>> +}
>>> +
>>> +static int ppc_hv_24x7_cpu_offline(unsigned int cpu)
>>> +{
>>> +   int target = -1;
>> No need to initialise target, you assign to it unconditionally below.
>>
>>> +   /* Check if exiting cpu is used for collecting 24x7 events */
>>> +   if (!cpumask_test_and_clear_cpu(cpu, _24x7_cpumask))
>>> +   return 0;
>>> +
>>> +   /* Find a new cpu to collect 24x7 events */
>>> +   target = cpumask_last(cpu_active_mask);
>> Any reason to use cpumask_last() vs cpumask_first(), or a randomly
>> chosen CPU?
>>
>>> +   if (target < 0 || target >= nr_cpu_ids)
>>> +   return -1;
>>> +
>>> +   /* Migrate 24x7 events to the new target */
>>> +   cpumask_set_cpu(target, _24x7_cpumask);
>>> +   perf_pmu_migrate_context(_24x7_pmu, cpu, target);
>>> +
>>> +   return 0;
>>> +}
>>> +
>>> +static int hv_24x7_cpu_hotplug_init(void)
>>> +{
>>> +   return cpuhp_setup_state(CPUHP_AP_PERF_POWERPC_HV_24x7_ONLINE,
>>> + "perf/powerpc/hv_24x7:online",
>>> + ppc_hv_24x7_cpu_online,
>>> + ppc_hv_24x7_cpu_offline);
>>> +}
>>> +
>>>   static int hv_24x7_init(void)
>>>   {
>>> int r;
>>> @@ -1685,6 +1725,11 @@ static int hv_24x7_init(void)
>>> if (r)
>>> return r;
>>>   
>>> +   /* init cpuhotplug */
>>> +   r = hv_24x7_cpu_hotplug_init();
>>> +   if (r)
>>> +   pr_err("hv_24x7: CPU hotplug init failed\n");
>>> +
>> The hotplug initialisation shouldn't fail unless something is badly
>> wrong. I think you should just fail initialisation of the entire PMU if
>> that happens, which will make the error handling in the next patch much
>> simpler.
>
> We  did fail the PMU registration on failure of the hotplug
> code (and yes error handling is much simpler), but on internal 
> review/discussion,
> what came up was that, hv_24x7 PMU will still be usable without
> the hotplug code (with "-C" option to perf tool command line).

In theory yes.

But in reality no one will ever test that case, so the code will easily
bit rot.

Even if it doesn't bit rot, you've now created another state the system
can legally be in (hotplug init failed but PMU still probed), which you
have to test, document & support.

If the hotplug init fails then something is badly wrong, the best thing
we can do is bail on the PMU initialisation and hope the rest of the
system boots OK.

cheers


Re: [PATCH v3 1/2] powerpc/perf/hv-24x7: Add cpu hotplug support

2020-07-06 Thread Madhavan Srinivasan




On 7/6/20 8:43 AM, Michael Ellerman wrote:

Kajol Jain  writes:

Patch here adds cpu hotplug functions to hv_24x7 pmu.
A new cpuhp_state "CPUHP_AP_PERF_POWERPC_HV_24x7_ONLINE" enum
is added.

The online callback function updates the cpumask only if its
empty. As the primary intention of adding hotplug support
is to designate a CPU to make HCALL to collect the
counter data.

The offline function test and clear corresponding cpu in a cpumask
and update cpumask to any other active cpu.

Signed-off-by: Kajol Jain 
Reviewed-by: Gautham R. Shenoy 
---
  arch/powerpc/perf/hv-24x7.c | 45 +
  include/linux/cpuhotplug.h  |  1 +
  2 files changed, 46 insertions(+)

diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index db213eb7cb02..ce4739e2b407 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -31,6 +31,8 @@ static int interface_version;
  /* Whether we have to aggregate result data for some domains. */
  static bool aggregate_result_elements;
  
+static cpumask_t hv_24x7_cpumask;

+
  static bool domain_is_valid(unsigned domain)
  {
switch (domain) {
@@ -1641,6 +1643,44 @@ static struct pmu h_24x7_pmu = {
.capabilities = PERF_PMU_CAP_NO_EXCLUDE,
  };
  
+static int ppc_hv_24x7_cpu_online(unsigned int cpu)

+{
+   /* Make this CPU the designated target for counter collection */

The comment implies every newly onlined CPU will become the target, but
actually it's only the first onlined CPU.

So I think the comment needs updating, or you could just drop the
comment, I think the code is fairly clear by itself.


+   if (cpumask_empty(_24x7_cpumask))
+   cpumask_set_cpu(cpu, _24x7_cpumask);
+
+   return 0;
+}
+
+static int ppc_hv_24x7_cpu_offline(unsigned int cpu)
+{
+   int target = -1;

No need to initialise target, you assign to it unconditionally below.


+   /* Check if exiting cpu is used for collecting 24x7 events */
+   if (!cpumask_test_and_clear_cpu(cpu, _24x7_cpumask))
+   return 0;
+
+   /* Find a new cpu to collect 24x7 events */
+   target = cpumask_last(cpu_active_mask);

Any reason to use cpumask_last() vs cpumask_first(), or a randomly
chosen CPU?


+   if (target < 0 || target >= nr_cpu_ids)
+   return -1;
+
+   /* Migrate 24x7 events to the new target */
+   cpumask_set_cpu(target, _24x7_cpumask);
+   perf_pmu_migrate_context(_24x7_pmu, cpu, target);
+
+   return 0;
+}
+
+static int hv_24x7_cpu_hotplug_init(void)
+{
+   return cpuhp_setup_state(CPUHP_AP_PERF_POWERPC_HV_24x7_ONLINE,
+ "perf/powerpc/hv_24x7:online",
+ ppc_hv_24x7_cpu_online,
+ ppc_hv_24x7_cpu_offline);
+}
+
  static int hv_24x7_init(void)
  {
int r;
@@ -1685,6 +1725,11 @@ static int hv_24x7_init(void)
if (r)
return r;
  
+	/* init cpuhotplug */

+   r = hv_24x7_cpu_hotplug_init();
+   if (r)
+   pr_err("hv_24x7: CPU hotplug init failed\n");
+

The hotplug initialisation shouldn't fail unless something is badly
wrong. I think you should just fail initialisation of the entire PMU if
that happens, which will make the error handling in the next patch much
simpler.


We  did fail the PMU registration on failure of the hotplug
code (and yes error handling is much simpler), but on internal 
review/discussion,

what came up was that, hv_24x7 PMU will still be usable without
the hotplug code (with "-C" option to perf tool command line).

Maddy



cheers


r = perf_pmu_register(_24x7_pmu, h_24x7_pmu.name, -1);
if (r)
return r;




Re: [PATCH] powerpc/numa: Restrict possible nodes based on platform

2020-07-06 Thread Srikar Dronamraju
* Nathan Lynch  [2020-07-06 19:44:31]:

> Tyrel Datwyler  writes:
> >> --- a/arch/powerpc/mm/numa.c
> >> +++ b/arch/powerpc/mm/numa.c
> >> @@ -897,7 +897,7 @@ static void __init find_possible_nodes(void)
> >>return;
> >> 
> >>if (of_property_read_u32_index(rtas,
> >> -  "ibm,max-associativity-domains",
> >> +  "ibm,current-associativity-domains",
> >
> > I'm not sure ibm,current-associativity-domains is guaranteed to exist on 
> > older
> > firmware. You may need check that it exists and fall back to
> > ibm,max-associativity-domains in the event it doesn't
> 
> Yes. Looks like it's a PowerVM-specific property.

Right, this is just a PowerVM specific property and doesn't affect PowerNV.
On PowerNV thought we have sparse nodes, the max possible nodes is set
correctly.

-- 
Thanks and Regards
Srikar Dronamraju


Re: [PATCH] powerpc/pseries: detect secure and trusted boot state of the system.

2020-07-06 Thread Daniel Axtens
As a final extra note,

  https://github.com/daxtens/qemu/tree/pseries-secboot

is a qemu repository that fakes out the relevant properties if anyone
else wants to test this.

Also,

>  3-9 - enabled, OS-defined behaviour. In this patch we map all these
>values to enabled and enforcing. Again I think this is the
>appropriate thing to do.

this should read "enabled and enforcing, requirements are at the
discretion of the operating system". Apologies.

Regards,
Daniel

> ibm,trusted-boot isn't published by a current PowerVM LPAR but will be
> published in future. (Currently, trusted boot state is inferred by the
> presence or absense of a vTPM.) It's simply 1 = enabled, 0 = disabled.
>
> As for this patch specifically, with the very small nits below,
>
> Reviewed-by: Daniel Axtens 
>
>> -node = get_ppc_fw_sb_node();
>> -enabled = of_property_read_bool(node, "os-secureboot-enforcing");
>> +if (machine_is(powernv)) {
>> +node = get_ppc_fw_sb_node();
>> +enabled =
>> +of_property_read_bool(node, "os-secureboot-enforcing");
>> +of_node_put(node);
>> +}
>>  
>> -of_node_put(node);
>> +if (machine_is(pseries)) {
> Maybe this should be an else if?
>
>> +secureboot = of_get_property(of_root, "ibm,secure-boot", NULL);
>> +if (secureboot)
>> +enabled = (*secureboot > 1) ? true : false;
>> +}
>>  
>>  pr_info("Secure boot mode %s\n", enabled ? "enabled" : "disabled");
>>  
>> @@ -38,11 +48,20 @@ bool is_ppc_trustedboot_enabled(void)
>>  {
>>  struct device_node *node;
>>  bool enabled = false;
>> +const u32 *trustedboot;
>>  
>> -node = get_ppc_fw_sb_node();
>> -enabled = of_property_read_bool(node, "trusted-enabled");
>> +if (machine_is(powernv)) {
>> +node = get_ppc_fw_sb_node();
>> +enabled = of_property_read_bool(node, "trusted-enabled");
>> +of_node_put(node);
>> +}
>>  
>> -of_node_put(node);
>> +if (machine_is(pseries)) {
> Likewise.
>> +trustedboot =
>> +of_get_property(of_root, "ibm,trusted-boot", NULL);
>> +if (trustedboot)
>> +enabled = (*trustedboot > 0) ? true : false;
>
> Regards,
> Daniel


Re: [PATCH] powerpc/numa: Restrict possible nodes based on platform

2020-07-06 Thread Srikar Dronamraju
* Tyrel Datwyler  [2020-07-06 13:58:42]:

> On 7/5/20 11:40 PM, Srikar Dronamraju wrote:
> > diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> > index 9fcf2d195830..3d55cef1a2dc 100644
> > --- a/arch/powerpc/mm/numa.c
> > +++ b/arch/powerpc/mm/numa.c
> > @@ -897,7 +897,7 @@ static void __init find_possible_nodes(void)
> > return;
> > 
> > if (of_property_read_u32_index(rtas,
> > -   "ibm,max-associativity-domains",
> > +   "ibm,current-associativity-domains",
> 
> I'm not sure ibm,current-associativity-domains is guaranteed to exist on older
> firmware. You may need check that it exists and fall back to
> ibm,max-associativity-domains in the event it doesn't
> 
> -Tyrel
> 

Oh, 
thanks Tyrel thats a good observation.
Will fallback on max-associativity.

-- 
Thanks and Regards
Srikar Dronamraju


RE: [PATCH v6 00/11] Add the multiple PF support for DWC and Layerscape

2020-07-06 Thread Z.q. Hou
Hi Lorenzo,

> -Original Message-
> From: Lorenzo Pieralisi [mailto:lorenzo.pieral...@arm.com]
> Sent: 2020年7月6日 18:47
> To: Xiaowei Bao 
> Cc: Z.q. Hou ; M.h. Lian ;
> Mingkai Hu ; bhelg...@google.com;
> robh...@kernel.org; shawn...@kernel.org; Leo Li ;
> kis...@ti.com; Roy Zang ;
> amur...@thegoodpenguin.co.uk; jingooh...@gmail.com;
> gustavo.pimen...@synopsys.com; andrew.mur...@arm.com;
> linux-...@vger.kernel.org; devicet...@vger.kernel.org;
> linux-ker...@vger.kernel.org; linux-arm-ker...@lists.infradead.org;
> linuxppc-dev@lists.ozlabs.org
> Subject: Re: [PATCH v6 00/11] Add the multiple PF support for DWC and
> Layerscape
> 
> On Sat, Mar 14, 2020 at 11:30:27AM +0800, Xiaowei Bao wrote:
> > Add the PCIe EP multiple PF support for DWC and Layerscape, add the
> > doorbell MSIX function for DWC, use list to manage the PF of one PCIe
> > controller, and refactor the Layerscape EP driver due to some
> > platforms difference.
> >
> > Xiaowei Bao (11):
> >   PCI: designware-ep: Add multiple PFs support for DWC
> >   PCI: designware-ep: Add the doorbell mode of MSI-X in EP mode
> >   PCI: designware-ep: Move the function of getting MSI capability
> > forward
> >   PCI: designware-ep: Modify MSI and MSIX CAP way of finding
> >   dt-bindings: pci: layerscape-pci: Add compatible strings for ls1088a
> > and ls2088a
> >   PCI: layerscape: Fix some format issue of the code
> >   PCI: layerscape: Modify the way of getting capability with different
> > PEX
> >   PCI: layerscape: Modify the MSIX to the doorbell mode
> >   PCI: layerscape: Add EP mode support for ls1088a and ls2088a
> >   arm64: dts: layerscape: Add PCIe EP node for ls1088a
> >   misc: pci_endpoint_test: Add LS1088a in pci_device_id table
> >
> >  .../devicetree/bindings/pci/layerscape-pci.txt |   2 +
> >  arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi |  31 +++
> >  drivers/misc/pci_endpoint_test.c   |   2 +
> >  drivers/pci/controller/dwc/pci-layerscape-ep.c | 100 ++--
> >  drivers/pci/controller/dwc/pcie-designware-ep.c| 255
> +
> >  drivers/pci/controller/dwc/pcie-designware.c   |  59 +++--
> >  drivers/pci/controller/dwc/pcie-designware.h   |  48 +++-
> >  7 files changed, 404 insertions(+), 93 deletions(-)
> 
> Can you rebase it against v5.8-rc1 please ?

Yes, I will help to rebase.

Thanks,
Zhiqiang

> 
> Thanks,
> Lorenzo


Re: [PATCH] powerpc/pseries: detect secure and trusted boot state of the system.

2020-07-06 Thread Daniel Axtens
Thanks Nayna!

I'm hoping to get better public documentation for this soon as it's not
documented in a public PAPR yet.

Until then:

The values of ibm,secure-boot under PowerVM are:

 0 - disabled
 
 1 - audit mode only. This patch ignores this value for Linux, which I
 think is the appropriate thing to do.

 2 - enabled and enforcing

 3-9 - enabled, OS-defined behaviour. In this patch we map all these
   values to enabled and enforcing. Again I think this is the
   appropriate thing to do.

ibm,trusted-boot isn't published by a current PowerVM LPAR but will be
published in future. (Currently, trusted boot state is inferred by the
presence or absense of a vTPM.) It's simply 1 = enabled, 0 = disabled.

As for this patch specifically, with the very small nits below,

Reviewed-by: Daniel Axtens 

> - node = get_ppc_fw_sb_node();
> - enabled = of_property_read_bool(node, "os-secureboot-enforcing");
> + if (machine_is(powernv)) {
> + node = get_ppc_fw_sb_node();
> + enabled =
> + of_property_read_bool(node, "os-secureboot-enforcing");
> + of_node_put(node);
> + }
>  
> - of_node_put(node);
> + if (machine_is(pseries)) {
Maybe this should be an else if?

> + secureboot = of_get_property(of_root, "ibm,secure-boot", NULL);
> + if (secureboot)
> + enabled = (*secureboot > 1) ? true : false;
> + }
>  
>   pr_info("Secure boot mode %s\n", enabled ? "enabled" : "disabled");
>  
> @@ -38,11 +48,20 @@ bool is_ppc_trustedboot_enabled(void)
>  {
>   struct device_node *node;
>   bool enabled = false;
> + const u32 *trustedboot;
>  
> - node = get_ppc_fw_sb_node();
> - enabled = of_property_read_bool(node, "trusted-enabled");
> + if (machine_is(powernv)) {
> + node = get_ppc_fw_sb_node();
> + enabled = of_property_read_bool(node, "trusted-enabled");
> + of_node_put(node);
> + }
>  
> - of_node_put(node);
> + if (machine_is(pseries)) {
Likewise.
> + trustedboot =
> + of_get_property(of_root, "ibm,trusted-boot", NULL);
> + if (trustedboot)
> + enabled = (*trustedboot > 0) ? true : false;

Regards,
Daniel



Re: [PATCH] MAINTAINERS: Add Shengjiu to reviewer list of sound/soc/fsl

2020-07-06 Thread Shengjiu Wang
On Fri, Jul 3, 2020 at 3:33 AM Nicolin Chen  wrote:
>
> Add Shengjiu who's actively working on the latest fsl/nxp audio drivers.
>
> Signed-off-by: Nicolin Chen 
> Cc: Shengjiu Wang 
> ---
> To Shengjiu, please ack if you feel okay with this (your email address too).

Thanks Nicolin for nominating me as a reviewer.

I'd like to use my gmail address "shengjiu.w...@gmail.com".
with this then

Acked-by: Shengjiu Wang 

best regards
wang shengjiu

>
>  MAINTAINERS | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 496fd4eafb68..54aab083bb88 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -6956,6 +6956,7 @@ M:Timur Tabi 
>  M: Nicolin Chen 
>  M: Xiubo Li 
>  R: Fabio Estevam 
> +R: Shengjiu Wang 
>  L: alsa-de...@alsa-project.org (moderated for non-subscribers)
>  L: linuxppc-dev@lists.ozlabs.org
>  S: Maintained
> --
> 2.17.1
>


Re: [PATCH v5 10/26] powerpc/book3s64/pkeys: Convert pkey_total to max_pkey

2020-07-06 Thread Michael Ellerman
"Aneesh Kumar K.V"  writes:
> On 7/6/20 12:34 PM, Michael Ellerman wrote:
>> "Aneesh Kumar K.V"  writes:
>>> max_pkey now represents max key value that userspace can allocate.
>>>
>
> I guess commit message is confusing.

And the name.

If it's called max_pkey then it should be the maximum allowed pkey
value.

cheers


>>> diff --git a/arch/powerpc/include/asm/pkeys.h 
>>> b/arch/powerpc/include/asm/pkeys.h
>>> index 75d2a2c19c04..652bad7334f3 100644
>>> --- a/arch/powerpc/include/asm/pkeys.h
>>> +++ b/arch/powerpc/include/asm/pkeys.h
>>> @@ -12,7 +12,7 @@
>>>   #include 
>>>   
>>>   DECLARE_STATIC_KEY_FALSE(pkey_disabled);
>>> -extern int pkeys_total; /* total pkeys as per device tree */
>>> +extern int max_pkey;
>>>   extern u32 initial_allocation_mask; /*  bits set for the initially 
>>> allocated keys */
>>>   extern u32 reserved_allocation_mask; /* bits set for reserved keys */
>>>   
>>> @@ -44,7 +44,10 @@ static inline int vma_pkey(struct vm_area_struct *vma)
>>> return (vma->vm_flags & ARCH_VM_PKEY_FLAGS) >> VM_PKEY_SHIFT;
>>>   }
>>>   
>>> -#define arch_max_pkey() pkeys_total
>>> +static inline int arch_max_pkey(void)
>>> +{
>>> +   return max_pkey;
>>> +}
>> 
>> If pkeys_total = 32 then max_pkey = 31.
>
> we have
>
> #ifdef CONFIG_PPC_4K_PAGES
>   /*
>* The OS can manage only 8 pkeys due to its inability to represent them
>* in the Linux 4K PTE. Mark all other keys reserved.
>*/
>   max_pkey = min(8, pkeys_total);
> #else
>   max_pkey = pkeys_total;
> #endif
>
> so it is 32.
>
>> 
>> So we can't just substitute one for the other. ie. arch_max_pkey() must
>> have been wrong, or it is wrong now.
>> 
>>> diff --git a/arch/powerpc/mm/book3s64/pkeys.c 
>>> b/arch/powerpc/mm/book3s64/pkeys.c
>>> index 87d882a9aaf2..a4d7287082a8 100644
>>> --- a/arch/powerpc/mm/book3s64/pkeys.c
>>> +++ b/arch/powerpc/mm/book3s64/pkeys.c
>>> @@ -14,7 +14,7 @@
>>>   
>>>   DEFINE_STATIC_KEY_FALSE(pkey_disabled);
>>>   DEFINE_STATIC_KEY_FALSE(execute_pkey_disabled);
>>> -int  pkeys_total;  /* Total pkeys as per device tree */
>>> +int  max_pkey; /* Maximum key value supported */
>>>   u32  initial_allocation_mask;   /* Bits set for the initially allocated 
>>> keys */
>>>   /*
>>>*  Keys marked in the reservation list cannot be allocated by  userspace
>>> @@ -84,7 +84,7 @@ static int scan_pkey_feature(void)
>>>   
>>>   static int pkey_initialize(void)
>>>   {
>>> -   int os_reserved, i;
>>> +   int pkeys_total, i;
>>>   
>>> /*
>>>  * We define PKEY_DISABLE_EXECUTE in addition to the arch-neutral
>>> @@ -122,12 +122,12 @@ static int pkey_initialize(void)
>>>  * The OS can manage only 8 pkeys due to its inability to represent them
>>>  * in the Linux 4K PTE. Mark all other keys reserved.
>>>  */
>>> -   os_reserved = pkeys_total - 8;
>>> +   max_pkey = min(8, pkeys_total);
>> 
>> Shouldn't it be 7 ?
>> 
>>>   #else
>>> -   os_reserved = 0;
>>> +   max_pkey = pkeys_total;
>>>   #endif
>>>   
>>> -   if (unlikely((pkeys_total - os_reserved) <= execute_only_key)) {
>>> +   if (unlikely(max_pkey <= execute_only_key)) {
>> 
>> Isn't that an off-by-one now?
>> 
>> This is one-off boot time code, there's no need to clutter it with
>> unlikely.
>> 
>>> /*
>>>  * Insufficient number of keys to support
>>>  * execute only key. Mark it unavailable.
>>> @@ -174,10 +174,10 @@ static int pkey_initialize(void)
>>> default_uamor &= ~(0x3ul << pkeyshift(1));
>>>   
>>> /*
>>> -* Prevent the usage of OS reserved the keys. Update UAMOR
>>> +* Prevent the usage of OS reserved keys. Update UAMOR
>>>  * for those keys.
>>>  */
>>> -   for (i = (pkeys_total - os_reserved); i < pkeys_total; i++) {
>>> +   for (i = max_pkey; i < pkeys_total; i++) {
>> 
>> Another off-by-one? Shouldn't we start from max_pkey + 1 ?
>> 
>>> reserved_allocation_mask |= (0x1 << i);
>>> default_uamor &= ~(0x3ul << pkeyshift(i));
>>> }
>> 
>
> It is the commit message. It is max pkey such that userspace can 
> allocate 0 - maxp_pkey -1.
>
> -aneesh


Re: [PATCH v5 19/26] powerpc/book3s64/kuap: Move KUAP related function outside radix

2020-07-06 Thread Michael Ellerman
"Aneesh Kumar K.V"  writes:
> On 7/6/20 6:11 PM, Michael Ellerman wrote:
>> "Aneesh Kumar K.V"  writes:
>>> The next set of patches adds support for kuap with hash translation.
>> 
>> That's no longer true of this series.
>> 
>>> diff --git a/arch/powerpc/mm/book3s64/pkeys.c 
>>> b/arch/powerpc/mm/book3s64/pkeys.c
>>> index 0d72c0246052..e93b65a0e6e7 100644
>>> --- a/arch/powerpc/mm/book3s64/pkeys.c
>>> +++ b/arch/powerpc/mm/book3s64/pkeys.c
>>> @@ -199,6 +200,24 @@ void __init pkey_early_init_devtree(void)
>>> return;
>>>   }
>>>   
>>> +#ifdef CONFIG_PPC_KUAP
>>> +void __init setup_kuap(bool disabled)
>>> +{
>>> +   if (disabled || !early_radix_enabled())
>>> +   return;
>>> +
>>> +   if (smp_processor_id() == boot_cpuid) {
>>> +   pr_info("Activating Kernel Userspace Access Prevention\n");
>>> +   cur_cpu_spec->mmu_features |= MMU_FTR_RADIX_KUAP;
>>> +   }
>>> +
>>> +   /* Make sure userspace can't change the AMR */
>>> +   mtspr(SPRN_UAMOR, 0);
>>> +   mtspr(SPRN_AMR, AMR_KUAP_BLOCKED);
>>> +   isync();
>>> +}
>>> +#endif
>> 
>> This makes this code depend on CONFIG_PPC_MEM_KEYS=y, which it didn't
>> used to.
>> 
>> That risks breaking people's existing .configs, if they have
>> PPC_MEM_KEYS=n they will now lose KUAP.
>> 
>> And I'm not convinced the two features should be tied together, at least
>> at the user-visible Kconfig level.
>
> That simplifies the addition of hash kuap a lot. Especially in the 
> exception entry and return paths.  I did try to consider them as 
> independent options. But then the feature fixup in asm code gets 
> unnecessarily complicated. Also the UAMOR handling also get complicated.

Yep. I'm OK if most of the code is enabled for either/both options, but
I think the user-visible options should not depend on each other.

So something like:

config PPC_PKEY
def_bool y
depends on PPC_MEM_KEYS || PPC_KUAP

And then the low-level code is guarded by PPC_PKEY.

Or we just say that making MEM_KEYS configurable is not worth the
added complexity and turn it on always.

cheers


Re: [PATCH v5 18/26] powerpc/book3s64/keys/kuap: Reset AMR/IAMR values on kexec

2020-07-06 Thread Michael Ellerman
"Aneesh Kumar K.V"  writes:
> On 7/6/20 5:59 PM, Michael Ellerman wrote:
>> "Aneesh Kumar K.V"  writes:
>>> As we kexec across kernels that use AMR/IAMR for different purposes
>>> we need to ensure that new kernels get kexec'd with a reset value
>>> of AMR/IAMR. For ex: the new kernel can use key 0 for kernel mapping and 
>>> the old
>>> AMR value prevents access to key 0.
>>>
>>> This patch also removes reset if IAMR and AMOR in kexec_sequence. Reset of 
>>> AMOR
>>> is not needed and the IAMR reset is partial (it doesn't do the reset
>>> on secondary cpus) and is redundant with this patch.
>> 
>> I like the idea of cleaning this stuff up.
>> 
>> But I think tying it into kup/kuep/kuap and the MMU features and ifdefs
>> and so on is overly complicated.
>> 
>> We should just have eg:
>> 
>> void reset_sprs(void)
>> {
>>  if (early_cpu_has_feature(CPU_FTR_ARCH_206)) {
>>  mtspr(SPRN_AMR, 0);
>>  mtspr(SPRN_UAMOR, 0);
>>  }
>> 
>>  if (early_cpu_has_feature(CPU_FTR_ARCH_207S)) {
>>  mtspr(SPRN_IAMR, 0);
>>  }
>> }
>> 
>> And call that from the kexec paths.
>
> Will fix. Should that be early_cpu_has_feature()? cpu_has_feature() 
> should work there right?

Yeah I guess. I was thinking if we crashed before code patching was
done, but if that happens we can't kdump anyway. So I'm probably being
over paranoid.

cheers


Re: [PATCH v5 13/26] powerpc/book3s64/pkeys: Enable MMU_FTR_PKEY

2020-07-06 Thread Michael Ellerman
"Aneesh Kumar K.V"  writes:
>>>
   /*
    * Let's assume 32 pkeys on P8 bare metal, if its not 
 defined by device
    * tree. We make this exception since skiboot forgot to 
 expose this
    * property on power8.
    */
   if (!firmware_has_feature(FW_FEATURE_LPAR) &&
 -    cpu_has_feature(CPU_FTRS_POWER8))
 +    early_cpu_has_feature(CPU_FTRS_POWER8))
   pkeys_total = 32;
>>>
>>> That's not how cpu_has_feature() works, we'll need to fix that.
>>>
>>> cheers
>>>
>> 
>> I did a separate patch to handle that which switch the above to
>> 
>>      /*
>>   * Let's assume 32 pkeys on P8/P9 bare metal, if its not 
>> defined by device
>>   * tree. We make this exception since skiboot forgot to expose 
>> this
>>   * property on power8/9.
>>   */
>>      if (!firmware_has_feature(FW_FEATURE_LPAR) &&
>>      (early_cpu_has_feature(CPU_FTR_ARCH_207S) ||
>>   early_cpu_has_feature(CPU_FTR_ARCH_300)))
>>      pkeys_total = 32;
>> 
>
> We should do a PVR check here i guess.

Yes, the ARCH features don't work because P10 will have both of those
enabled.

>   ret = of_scan_flat_dt(dt_scan_storage_keys, _total);
>   if (ret == 0) {
>
>   /*
>* Let's assume 32 pkeys on P8/P9 bare metal, if its not 
> defined by device
>* tree. We make this exception since skiboot forgot to expose 
> this
>* property on power8/9.

Well, it does expose it on Power9 after v6.6, but most P9 systems have
an older firmware than that.

And also the kernel has been enabling that on Power9 because of the
CPU_FTRS_POWER8 bug, so this is not actually a behaviour change.

>*/
>   if (!firmware_has_feature(FW_FEATURE_LPAR) &&
>   (pvr_version_is(PVR_POWER8) || pvr_version_is(PVR_POWER9)))
>   pkeys_total = 32;
>   }

You need PVR_POWER8E and PVR_POWER8NVL as well.

cheers


[PATCH 1/1] KVM/PPC: Fix typo on H_DISABLE_AND_GET hcall

2020-07-06 Thread Leonardo Bras
On PAPR+ the hcall() on 0x1B0 is called H_DISABLE_AND_GET, but got
defined as H_DISABLE_AND_GETC instead.

This define was introduced with a typo in commit 
("[PATCH] powerpc: Extends HCALL interface for InfiniBand usage"), and was
later used without having the typo noticed.

Signed-off-by: Leonardo Bras 
---
 arch/powerpc/include/asm/hvcall.h| 2 +-
 arch/powerpc/kvm/trace_hv.h  | 2 +-
 tools/perf/arch/powerpc/util/book3s_hcalls.h | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/hvcall.h 
b/arch/powerpc/include/asm/hvcall.h
index e90c073e437e..d8ada9c7ec78 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -237,7 +237,7 @@
 #define H_CREATE_RPT0x1A4
 #define H_REMOVE_RPT0x1A8
 #define H_REGISTER_RPAGES   0x1AC
-#define H_DISABLE_AND_GETC  0x1B0
+#define H_DISABLE_AND_GET   0x1B0
 #define H_ERROR_DATA0x1B4
 #define H_GET_HCA_INFO  0x1B8
 #define H_GET_PERF_COUNT0x1BC
diff --git a/arch/powerpc/kvm/trace_hv.h b/arch/powerpc/kvm/trace_hv.h
index 4a61a971c34e..830a126e095d 100644
--- a/arch/powerpc/kvm/trace_hv.h
+++ b/arch/powerpc/kvm/trace_hv.h
@@ -89,7 +89,7 @@
{H_CREATE_RPT,  "H_CREATE_RPT"}, \
{H_REMOVE_RPT,  "H_REMOVE_RPT"}, \
{H_REGISTER_RPAGES, "H_REGISTER_RPAGES"}, \
-   {H_DISABLE_AND_GETC,"H_DISABLE_AND_GETC"}, \
+   {H_DISABLE_AND_GET, "H_DISABLE_AND_GET"}, \
{H_ERROR_DATA,  "H_ERROR_DATA"}, \
{H_GET_HCA_INFO,"H_GET_HCA_INFO"}, \
{H_GET_PERF_COUNT,  "H_GET_PERF_COUNT"}, \
diff --git a/tools/perf/arch/powerpc/util/book3s_hcalls.h 
b/tools/perf/arch/powerpc/util/book3s_hcalls.h
index 54cfa0530e86..488f4339b83c 100644
--- a/tools/perf/arch/powerpc/util/book3s_hcalls.h
+++ b/tools/perf/arch/powerpc/util/book3s_hcalls.h
@@ -84,7 +84,7 @@
{0x1a4, "H_CREATE_RPT"},\
{0x1a8, "H_REMOVE_RPT"},\
{0x1ac, "H_REGISTER_RPAGES"},   \
-   {0x1b0, "H_DISABLE_AND_GETC"},  \
+   {0x1b0, "H_DISABLE_AND_GET"},   \
{0x1b4, "H_ERROR_DATA"},\
{0x1b8, "H_GET_HCA_INFO"},  \
{0x1bc, "H_GET_PERF_COUNT"},\
-- 
2.25.4



Re: [PATCH] powerpc/numa: Restrict possible nodes based on platform

2020-07-06 Thread Nathan Lynch
Tyrel Datwyler  writes:
>> --- a/arch/powerpc/mm/numa.c
>> +++ b/arch/powerpc/mm/numa.c
>> @@ -897,7 +897,7 @@ static void __init find_possible_nodes(void)
>>  return;
>> 
>>  if (of_property_read_u32_index(rtas,
>> -"ibm,max-associativity-domains",
>> +"ibm,current-associativity-domains",
>
> I'm not sure ibm,current-associativity-domains is guaranteed to exist on older
> firmware. You may need check that it exists and fall back to
> ibm,max-associativity-domains in the event it doesn't

Yes. Looks like it's a PowerVM-specific property.


Re: [PATCH v2 00/12] ppc64: enable kdump support for kexec_file_load syscall

2020-07-06 Thread piliu



On 07/03/2020 03:53 AM, Hari Bathini wrote:
> This patch series enables kdump support for kexec_file_load system
> call (kexec -s -p) on PPC64. The changes are inspired from kexec-tools
> code but heavily modified for kernel consumption. There is scope to
> expand purgatory to verify sha256 digest along with other improvements
> in purgatory code. Will deal with those changes in a separate patch
> series later.
> 
> The first patch adds a weak arch_kexec_locate_mem_hole() function to
> override locate memory hole logic suiting arch needs. There are some
> special regions in ppc64 which should be avoided while loading buffer
> & there are multiple callers to kexec_add_buffer making it complicated
> to maintain range sanity and using generic lookup at the same time.
> 
> The second patch marks ppc64 specific code within arch/powerpc/kexec
> and arch/powerpc/purgatory to make the subsequent code changes easy
> to understand.
> 
> The next patch adds helper function to setup different memory ranges
> needed for loading kdump kernel, booting into it and exporting the
> crashing kernel's elfcore.
> 
> The fourth patch overrides arch_kexec_locate_mem_hole() function to
> locate memory hole for kdump segments by accounting for the special
> memory regions, referred to as excluded memory ranges, and sets
> kbuf->mem when a suitable memory region is found.
> 
> The fifth patch moves walk_drmem_lmbs() out of .init section with
> a few changes to reuse it for setting up kdump kernel's usable memory
> ranges. The next patch uses walk_drmem_lmbs() to look up the LMBs
> and set linux,drconf-usable-memory & linux,usable-memory properties
> in order to restrict kdump kernel's memory usage.
> 
> The seventh patch adds relocation support for the purgatory. Patch 8
> helps setup the stack for the purgatory. The next patch setups up
> backup region as a segment while loading kdump kernel and teaches
> purgatory to copy it from source to destination.
> 
> Patch 10 builds the elfcore header for the running kernel & passes
> the info to kdump kernel via "elfcorehdr=" parameter to export as
> /proc/vmcore file. The next patch sets up the memory reserve map
> for the kexec kernel and also claims kdump support for kdump as
> all the necessary changes are added.
> 
> The last patch fixes a lookup issue for `kexec -l -s` case when
> memory is reserved for crashkernel.
> 
> Tested the changes successfully on P8, P9 lpars, couple of OpenPOWER
> boxes and a simulator.
> 
> Changes in v2:
> * Introduced arch_kexec_locate_mem_hole() for override and dropped
>   weak arch_kexec_add_buffer().
> * Addressed warnings reported by lkp.
> * Added patch to address kexec load issue when memory is reserved
>   for crashkernel.
> * Used the appropriate license header for the new files added.
> * Added an option to merge ranges to minimize reallocations while
>   adding memory ranges.
> * Dropped within_crashkernel parameter for add_opal_mem_range() &
>   add_rtas_mem_range() functions as it is not really needed.
> 
> ---
> 
> Hari Bathini (12):
>   kexec_file: allow archs to handle special regions while locating memory 
> hole
>   powerpc/kexec_file: mark PPC64 specific code
>   powerpc/kexec_file: add helper functions for getting memory ranges
>   ppc64/kexec_file: avoid stomping memory used by special regions
>   powerpc/drmem: make lmb walk a bit more flexible
>   ppc64/kexec_file: restrict memory usage of kdump kernel
>   ppc64/kexec_file: add support to relocate purgatory
>   ppc64/kexec_file: setup the stack for purgatory
>   ppc64/kexec_file: setup backup region for kdump kernel
>   ppc64/kexec_file: prepare elfcore header for crashing kernel
>   ppc64/kexec_file: add appropriate regions for memory reserve map
>   ppc64/kexec_file: fix kexec load failure with lack of memory hole
> 
> 
>  arch/powerpc/include/asm/crashdump-ppc64.h |   15 
>  arch/powerpc/include/asm/drmem.h   |9 
>  arch/powerpc/include/asm/kexec.h   |   35 +
>  arch/powerpc/include/asm/kexec_ranges.h|   18 
>  arch/powerpc/include/asm/purgatory.h   |   11 
>  arch/powerpc/kernel/prom.c |   13 
>  arch/powerpc/kexec/Makefile|2 
>  arch/powerpc/kexec/elf_64.c|   35 +
>  arch/powerpc/kexec/file_load.c |   78 +
>  arch/powerpc/kexec/file_load_64.c  | 1509 
> 
>  arch/powerpc/kexec/ranges.c|  397 +++
>  arch/powerpc/mm/drmem.c|   87 +-
>  arch/powerpc/mm/numa.c |   13 
>  arch/powerpc/purgatory/Makefile|   28 -
>  arch/powerpc/purgatory/purgatory_64.c  |   36 +
>  arch/powerpc/purgatory/trampoline.S|  117 --
>  arch/powerpc/purgatory/trampoline_64.S |  175 +++
>  include/linux/kexec.h  |   29 -
>  kernel/kexec_file.c|   16 
>  19 files changed, 2413 insertions(+), 210 

Re: [PATCH 1/4] dma-mapping: move the remaining DMA API calls out of line

2020-07-06 Thread Alexey Kardashevskiy



On 03/06/2020 14:13, Alexey Kardashevskiy wrote:
> 
> 
> On 09/05/2020 18:19, Christoph Hellwig wrote:
>> On Tue, May 05, 2020 at 02:18:37PM +1000, Alexey Kardashevskiy wrote:
>>>
>>>
>>> On 17/04/2020 17:58, Christoph Hellwig wrote:
 On Wed, Apr 15, 2020 at 09:21:37PM +1000, Alexey Kardashevskiy wrote:
> And the fact they were exported leaves possibility that there is a
> driver somewhere relying on these symbols or distro kernel won't build
> because the symbol disappeared from exports (I do not know what KABI
> guarantees or if mainline kernel cares).

 We absolutely do not care.  In fact for abuses of APIs that drivers
 should not use we almost care to make them private and break people
 abusing them.
>>>
>>> ok :)
>>>
> I do not care in particular but
> some might, a line separated with empty lines in the commit log would do.

 I'll add a blurb for the next version.
>>>
>>>
>>> Has it gone anywhere? Thanks,
>>
>> I've been hoping for the sg_buf helpers to land first, as they need
>> backporting and would conflict.  Do you urgently need the series?
> 
> Any progress with sg_buf helpers stuff? Thanks,


Any luck there? I'd really like to cross this off my todo list :) Thanks,



-- 
Alexey


Re: [PATCH] powerpc/spufs: add CONFIG_COREDUMP dependency

2020-07-06 Thread Jeremy Kerr
Hi Arnd,

> The kernel test robot pointed out a slightly different error message
> after recent commit 5456ffdee666 ("powerpc/spufs: simplify spufs core
> dumping") to spufs for a configuration that never worked:
> 
>powerpc64-linux-ld: arch/powerpc/platforms/cell/spufs/file.o: in
> function `.spufs_proxydma_info_dump':
> > > file.c:(.text+0x4c68): undefined reference to `.dump_emit'
>powerpc64-linux-ld: arch/powerpc/platforms/cell/spufs/file.o: in
> function `.spufs_dma_info_dump':
>file.c:(.text+0x4d70): undefined reference to `.dump_emit'
>powerpc64-linux-ld: arch/powerpc/platforms/cell/spufs/file.o: in
> function `.spufs_wbox_info_dump':
>file.c:(.text+0x4df4): undefined reference to `.dump_emit'
> 
> Add a Kconfig dependency to prevent this from happening again.
> 
> Reported-by: kernel test robot 
> Signed-off-by: Arnd Bergmann 

Looks good to me, thanks.

Acked-by: Jeremy Kerr 

Cheers,


Jeremy



Re: [PATCH] powerpc/numa: Restrict possible nodes based on platform

2020-07-06 Thread Nathan Lynch
Srikar Dronamraju  writes:
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 9fcf2d195830..3d55cef1a2dc 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -897,7 +897,7 @@ static void __init find_possible_nodes(void)
>   return;
>  
>   if (of_property_read_u32_index(rtas,
> - "ibm,max-associativity-domains",
> + "ibm,current-associativity-domains",
>   min_common_depth, ))

Looks good if ibm,current-associativity-domains[min_common_depth]
actually denotes the range of possible values, i.e. a value of 2 implies
node numbers 0 and 1. PAPR+ says it's the "number of unique values",
which isn't how I would specify the property if it's supposed to express
a range. But it's probably OK... 


Re: [PATCH] powerpc/numa: Restrict possible nodes based on platform

2020-07-06 Thread Tyrel Datwyler
On 7/5/20 11:40 PM, Srikar Dronamraju wrote:
> As per PAPR, there are 2 device tree property
> ibm,max-associativity-domains (which defines the maximum number of
> domains that the firmware i.e PowerVM can support) and
> ibm,current-associativity-domains (which defines the maximum number of
> domains that the platform can support). Value of
> ibm,max-associativity-domains property is always greater than or equal
> to ibm,current-associativity-domains property.
> 
> Powerpc currently uses ibm,max-associativity-domains  property while
> setting the possible number of nodes. This is currently set at 32.
> However the possible number of nodes for a platform may be significantly
> less. Hence set the possible number of nodes based on
> ibm,current-associativity-domains property.
> 
> $ lsprop /proc/device-tree/rtas/ibm,*associ*-domains
> /proc/device-tree/rtas/ibm,current-associativity-domains
>0005 0001 0002 0002 0002 0010
> /proc/device-tree/rtas/ibm,max-associativity-domains
>0005 0001 0008 0020 0020 0100
> 
> $ cat /sys/devices/system/node/possible ##Before patch
> 0-31
> 
> $ cat /sys/devices/system/node/possible ##After patch
> 0-1
> 
> Note the maximum nodes this platform can support is only 2 but the
> possible nodes is set to 32.
> 
> This is important because lot of kernel and user space code allocate
> structures for all possible nodes leading to a lot of memory that is
> allocated but not used.
> 
> I ran a simple experiment to create and destroy 100 memory cgroups on
> boot on a 8 node machine (Power8 Alpine).
> 
> Before patch
> free -k at boot
>   totalusedfree  shared  buff/cache   
> available
> Mem:  523498176 4106816   518820608   22272  570752   
> 516606720
> Swap:   4194240   0 4194240
> 
> free -k after creating 100 memory cgroups
>   totalusedfree  shared  buff/cache   
> available
> Mem:  523498176 4628416   518246464   22336  623296   
> 516058688
> Swap:   4194240   0 4194240
> 
> free -k after destroying 100 memory cgroups
>   totalusedfree  shared  buff/cache   
> available
> Mem:  523498176 4697408   518173760   22400  627008   
> 515987904
> Swap:   4194240   0 4194240
> 
> After patch
> free -k at boot
>   totalusedfree  shared  buff/cache   
> available
> Mem:  523498176 3969472   518933888   22272  594816   
> 516731776
> Swap:   4194240   0 4194240
> 
> free -k after creating 100 memory cgroups
>   totalusedfree  shared  buff/cache   
> available
> Mem:  523498176 4181888   518676096   22208  640192   
> 516496448
> Swap:   4194240   0 4194240
> 
> free -k after destroying 100 memory cgroups
>   totalusedfree  shared  buff/cache   
> available
> Mem:  523498176 4232320   518619904   22272  645952   
> 516443264
> Swap:   4194240   0 4194240
> 
> Observations:
> Fixed kernel takes 137344 kb (4106816-3969472) less to boot.
> Fixed kernel takes 309184 kb (4628416-4181888-137344) less to create 100 
> memcgs.
> 
> Cc: Nathan Lynch 
> Cc: Michael Ellerman 
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: Anton Blanchard 
> Cc: Bharata B Rao 
> Signed-off-by: Srikar Dronamraju 
> ---
>  arch/powerpc/mm/numa.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 9fcf2d195830..3d55cef1a2dc 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -897,7 +897,7 @@ static void __init find_possible_nodes(void)
>   return;
> 
>   if (of_property_read_u32_index(rtas,
> - "ibm,max-associativity-domains",
> + "ibm,current-associativity-domains",

I'm not sure ibm,current-associativity-domains is guaranteed to exist on older
firmware. You may need check that it exists and fall back to
ibm,max-associativity-domains in the event it doesn't

-Tyrel

>   min_common_depth, ))
>   goto out;
> 



Re: [PATCH v3 0/6] powerpc: queued spinlocks and rwlocks

2020-07-06 Thread Waiman Long

On 7/6/20 12:35 AM, Nicholas Piggin wrote:

v3 is updated to use __pv_queued_spin_unlock, noticed by Waiman (thank you).

Thanks,
Nick

Nicholas Piggin (6):
   powerpc/powernv: must include hvcall.h to get PAPR defines
   powerpc/pseries: move some PAPR paravirt functions to their own file
   powerpc: move spinlock implementation to simple_spinlock
   powerpc/64s: implement queued spinlocks and rwlocks
   powerpc/pseries: implement paravirt qspinlocks for SPLPAR
   powerpc/qspinlock: optimised atomic_try_cmpxchg_lock that adds the
 lock hint

  arch/powerpc/Kconfig  |  13 +
  arch/powerpc/include/asm/Kbuild   |   2 +
  arch/powerpc/include/asm/atomic.h |  28 ++
  arch/powerpc/include/asm/paravirt.h   |  89 +
  arch/powerpc/include/asm/qspinlock.h  |  91 ++
  arch/powerpc/include/asm/qspinlock_paravirt.h |   7 +
  arch/powerpc/include/asm/simple_spinlock.h| 292 +
  .../include/asm/simple_spinlock_types.h   |  21 ++
  arch/powerpc/include/asm/spinlock.h   | 308 +-
  arch/powerpc/include/asm/spinlock_types.h |  17 +-
  arch/powerpc/lib/Makefile |   3 +
  arch/powerpc/lib/locks.c  |  12 +-
  arch/powerpc/platforms/powernv/pci-ioda-tce.c |   1 +
  arch/powerpc/platforms/pseries/Kconfig|   5 +
  arch/powerpc/platforms/pseries/setup.c|   6 +-
  include/asm-generic/qspinlock.h   |   4 +
  16 files changed, 577 insertions(+), 322 deletions(-)
  create mode 100644 arch/powerpc/include/asm/paravirt.h
  create mode 100644 arch/powerpc/include/asm/qspinlock.h
  create mode 100644 arch/powerpc/include/asm/qspinlock_paravirt.h
  create mode 100644 arch/powerpc/include/asm/simple_spinlock.h
  create mode 100644 arch/powerpc/include/asm/simple_spinlock_types.h


This patch looks OK to me.

I had run some microbenchmark on powerpc system with or w/o the patch.

On a 2-socket 160-thread SMT4 POWER9 system (not virtualized):

5.8.0-rc4
=

Running locktest with spinlock [runtime = 10s, load = 1]
Threads = 160, Min/Mean/Max = 77,665/90,153/106,895
Threads = 160, Total Rate = 1,441,759 op/s; Percpu Rate = 9,011 op/s

Running locktest with rwlock [runtime = 10s, r% = 50%, load = 1]
Threads = 160, Min/Mean/Max = 47,879/53,807/63,689
Threads = 160, Total Rate = 860,192 op/s; Percpu Rate = 5,376 op/s

Running locktest with spinlock [runtime = 10s, load = 1]
Threads = 80, Min/Mean/Max = 242,907/319,514/463,161
Threads = 80, Total Rate = 2,555 kop/s; Percpu Rate = 32 kop/s

Running locktest with rwlock [runtime = 10s, r% = 50%, load = 1]
Threads = 80, Min/Mean/Max = 146,161/187,474/259,270
Threads = 80, Total Rate = 1,498 kop/s; Percpu Rate = 19 kop/s

Running locktest with spinlock [runtime = 10s, load = 1]
Threads = 40, Min/Mean/Max = 646,639/1,000,817/1,455,205
Threads = 40, Total Rate = 4,001 kop/s; Percpu Rate = 100 kop/s

Running locktest with rwlock [runtime = 10s, r% = 50%, load = 1]
Threads = 40, Min/Mean/Max = 402,165/597,132/814,555
Threads = 40, Total Rate = 2,388 kop/s; Percpu Rate = 60 kop/s

5.8.0-rc4-qlock+


Running locktest with spinlock [runtime = 10s, load = 1]
Threads = 160, Min/Mean/Max = 123,835/124,580/124,587
Threads = 160, Total Rate = 1,992 kop/s; Percpu Rate = 12 kop/s

Running locktest with rwlock [runtime = 10s, r% = 50%, load = 1]
Threads = 160, Min/Mean/Max = 254,210/264,714/276,784
Threads = 160, Total Rate = 4,231 kop/s; Percpu Rate = 26 kop/s

Running locktest with spinlock [runtime = 10s, load = 1]
Threads = 80, Min/Mean/Max = 599,715/603,397/603,450
Threads = 80, Total Rate = 4,825 kop/s; Percpu Rate = 60 kop/s

Running locktest with rwlock [runtime = 10s, r% = 50%, load = 1]
Threads = 80, Min/Mean/Max = 492,687/525,224/567,456
Threads = 80, Total Rate = 4,199 kop/s; Percpu Rate = 52 kop/s

Running locktest with spinlock [runtime = 10s, load = 1]
Threads = 40, Min/Mean/Max = 1,325,623/1,325,628/1,325,636
Threads = 40, Total Rate = 5,299 kop/s; Percpu Rate = 132 kop/s

Running locktest with rwlock [runtime = 10s, r% = 50%, load = 1]
Threads = 40, Min/Mean/Max = 1,249,731/1,292,977/1,342,815
Threads = 40, Total Rate = 5,168 kop/s; Percpu Rate = 129 kop/s

On systems on large number of cpus, qspinlock lock is faster and more fair.

With some tuning, we may be able to squeeze out more performance.

Cheers,
Longman



Re: [PATCH] kbuild: introduce ccflags-remove-y and asflags-remove-y

2020-07-06 Thread Masahiro Yamada
Hi Anders,

On Mon, Jul 6, 2020 at 8:24 PM Anders Roxell  wrote:
>

> The full log can be found here [1].
>
> Without this patch for  'trace_selftest_dynamic' for instance, CC_FLAGS_FTRACE
> was removed from kernel/trace/*, and then added back to
> kernel/trace/trace_selftest_dynamic.
> While with this patch it looks like we add the flag (even though it is
> already there), and then
> removes the flag for all files in kernel/trace/* .


You are right.

I will drop this patch,
and send v2.


Thank you.




-- 
Best Regards
Masahiro Yamada


Re: [PATCH v5 13/26] powerpc/book3s64/pkeys: Enable MMU_FTR_PKEY

2020-07-06 Thread Aneesh Kumar K.V






  /*
   * Let's assume 32 pkeys on P8 bare metal, if its not 
defined by device
   * tree. We make this exception since skiboot forgot to 
expose this

   * property on power8.
   */
  if (!firmware_has_feature(FW_FEATURE_LPAR) &&
-    cpu_has_feature(CPU_FTRS_POWER8))
+    early_cpu_has_feature(CPU_FTRS_POWER8))
  pkeys_total = 32;


That's not how cpu_has_feature() works, we'll need to fix that.

cheers



I did a separate patch to handle that which switch the above to

     /*
  * Let's assume 32 pkeys on P8/P9 bare metal, if its not 
defined by device
  * tree. We make this exception since skiboot forgot to expose 
this

  * property on power8/9.
  */
     if (!firmware_has_feature(FW_FEATURE_LPAR) &&
     (early_cpu_has_feature(CPU_FTR_ARCH_207S) ||
  early_cpu_has_feature(CPU_FTR_ARCH_300)))
     pkeys_total = 32;




We should do a PVR check here i guess.


ret = of_scan_flat_dt(dt_scan_storage_keys, _total);
if (ret == 0) {

/*
 * Let's assume 32 pkeys on P8/P9 bare metal, if its not 
defined by device
 * tree. We make this exception since skiboot forgot to expose 
this
 * property on power8/9.
 */
if (!firmware_has_feature(FW_FEATURE_LPAR) &&
(pvr_version_is(PVR_POWER8) || pvr_version_is(PVR_POWER9)))
pkeys_total = 32;
}


-aneesh


Re: [PATCH v5 3/3] mm/page_alloc: Keep memoryless cpuless node 0 offline

2020-07-06 Thread Andi Kleen
> > What's the point of this indirection other than another way of avoiding
> > empty node 0?
> 
> Honestly, I do not have any idea. I've traced it down to
> Author: Andi Kleen 
> Date:   Tue Jan 11 15:35:48 2005 -0800

I don't remember all the details, and I can't even find the commit
(is it in linux-historic?).

But AFAIK there's no guarantee PXMs are small and continuous, so it
seemed better to have a clean zero based space.

Back then we had a lot of problems with buggy SRAT tables in BIOS,
so we really tried to avoid trusting the BIOS as much as possible.

-Andi



Re: [PATCH v5 19/26] powerpc/book3s64/kuap: Move KUAP related function outside radix

2020-07-06 Thread Aneesh Kumar K.V

On 7/6/20 6:11 PM, Michael Ellerman wrote:

"Aneesh Kumar K.V"  writes:

The next set of patches adds support for kuap with hash translation.


That's no longer true of this series.


diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index 0d72c0246052..e93b65a0e6e7 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -199,6 +200,24 @@ void __init pkey_early_init_devtree(void)
return;
  }
  
+#ifdef CONFIG_PPC_KUAP

+void __init setup_kuap(bool disabled)
+{
+   if (disabled || !early_radix_enabled())
+   return;
+
+   if (smp_processor_id() == boot_cpuid) {
+   pr_info("Activating Kernel Userspace Access Prevention\n");
+   cur_cpu_spec->mmu_features |= MMU_FTR_RADIX_KUAP;
+   }
+
+   /* Make sure userspace can't change the AMR */
+   mtspr(SPRN_UAMOR, 0);
+   mtspr(SPRN_AMR, AMR_KUAP_BLOCKED);
+   isync();
+}
+#endif


This makes this code depend on CONFIG_PPC_MEM_KEYS=y, which it didn't
used to.

That risks breaking people's existing .configs, if they have
PPC_MEM_KEYS=n they will now lose KUAP.

And I'm not convinced the two features should be tied together, at least
at the user-visible Kconfig level.



That simplifies the addition of hash kuap a lot. Especially in the 
exception entry and return paths.  I did try to consider them as 
independent options. But then the feature fixup in asm code gets 
unnecessarily complicated. Also the UAMOR handling also get complicated.


-aneesh


Re: [PATCH v5 18/26] powerpc/book3s64/keys/kuap: Reset AMR/IAMR values on kexec

2020-07-06 Thread Aneesh Kumar K.V

On 7/6/20 5:59 PM, Michael Ellerman wrote:

"Aneesh Kumar K.V"  writes:

As we kexec across kernels that use AMR/IAMR for different purposes
we need to ensure that new kernels get kexec'd with a reset value
of AMR/IAMR. For ex: the new kernel can use key 0 for kernel mapping and the old
AMR value prevents access to key 0.

This patch also removes reset if IAMR and AMOR in kexec_sequence. Reset of AMOR
is not needed and the IAMR reset is partial (it doesn't do the reset
on secondary cpus) and is redundant with this patch.


I like the idea of cleaning this stuff up.

But I think tying it into kup/kuep/kuap and the MMU features and ifdefs
and so on is overly complicated.

We should just have eg:

void reset_sprs(void)
{
if (early_cpu_has_feature(CPU_FTR_ARCH_206)) {
mtspr(SPRN_AMR, 0);
mtspr(SPRN_UAMOR, 0);
 }

if (early_cpu_has_feature(CPU_FTR_ARCH_207S)) {
mtspr(SPRN_IAMR, 0);
 }
}

And call that from the kexec paths.



Will fix. Should that be early_cpu_has_feature()? cpu_has_feature() 
should work there right?


-aneesh


Re: [PATCH v5 13/26] powerpc/book3s64/pkeys: Enable MMU_FTR_PKEY

2020-07-06 Thread Aneesh Kumar K.V

On 7/6/20 6:40 PM, Michael Ellerman wrote:

"Aneesh Kumar K.V"  writes:

Parse storage keys related device tree entry in early_init_devtree
and enable MMU feature MMU_FTR_PKEY if pkeys are supported.

MMU feature is used instead of CPU feature because this enables us
to group MMU_FTR_KUAP and MMU_FTR_PKEY in asm feature fixup code.


Subject should probably be "Add MMU_FTR_PKEY".


diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index f4ac25d4df05..72966d3d8f64 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -23,6 +23,7 @@
  
  /* Radix page table supported and enabled */

  #define MMU_FTR_TYPE_RADIXASM_CONST(0x0040)
+#define MMU_FTR_PKEY   ASM_CONST(0x0080)


It's not a type, so it should be with the individual feature bits below:




We don't have free bit in the other group. For now i will move this to

modified   arch/powerpc/include/asm/mmu.h
@@ -23,12 +23,15 @@

 /* Radix page table supported and enabled */
 #define MMU_FTR_TYPE_RADIX ASM_CONST(0x0040)
-#define MMU_FTR_PKEY   ASM_CONST(0x0080)

 /*
  * Individual features below.
  */

+/*
+ * Support for memory protection keys.
+ */
+#define MMU_FTR_PKEY   ASM_CONST(0x1000)
 /*
  * Support for 68 bit VA space. We added that from ISA 2.05
  */




  /*
   * Individual features below.



diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 6a3bac357e24..6d70797352d8 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -815,6 +815,11 @@ void __init early_init_devtree(void *params)
/* Now try to figure out if we are running on LPAR and so on */
pseries_probe_fw_features();
  
+	/*

+* Initialize pkey features and default AMR/IAMR values
+*/
+   pkey_early_init_devtree();


Not your fault, but the fact that we're having to do more and more
initialisation this early, based on the flat device tree, makes me
wonder if we need to rethink how we're doing the CPU/MMU feature setup.


diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index 0ff59acdbb84..bbba9c601e14 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -38,38 +39,45 @@ static int execute_only_key = 2;
  #define PKEY_REG_BITS (sizeof(u64) * 8)
  #define pkeyshift(pkey) (PKEY_REG_BITS - ((pkey+1) * AMR_BITS_PER_PKEY))
  
+static int __init dt_scan_storage_keys(unsigned long node,

+  const char *uname, int depth,
+  void *data)
+{
+   const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
+   const __be32 *prop;
+   int pkeys_total;
+
+   /* We are scanning "cpu" nodes only */
+   if (type == NULL || strcmp(type, "cpu") != 0)
+   return 0;
+
+   prop = of_get_flat_dt_prop(node, "ibm,processor-storage-keys", NULL);
+   if (!prop)
+   return 0;
+   pkeys_total = be32_to_cpu(prop[0]);
+   return pkeys_total;


That's not really the way the return value is meant to be used for these
scanning functions.

The usual return values are 0 to keep scanning and !0 means we've found
the node we're looking for and we should stop scanning.

Doing it this way means if you find 0 pkeys it will keep scanning.

Instead you should pass _total as the data pointer and set that.


+}
+



done

modified   arch/powerpc/mm/book3s64/pkeys.c
@@ -52,7 +52,7 @@ static int __init dt_scan_storage_keys(unsigned long node,
 {
const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
const __be32 *prop;
-   int pkeys_total;
+   int *pkeys_total = (int *) data;

/* We are scanning "cpu" nodes only */
if (type == NULL || strcmp(type, "cpu") != 0)
@@ -61,12 +61,13 @@ static int __init dt_scan_storage_keys(unsigned long 
node,

prop = of_get_flat_dt_prop(node, "ibm,processor-storage-keys", NULL);
if (!prop)
return 0;
-   pkeys_total = be32_to_cpu(prop[0]);
-   return pkeys_total;
+   *pkeys_total = be32_to_cpu(prop[0]);
+   return 1;
 }

 static int scan_pkey_feature(void)
 {
+   int ret;
int pkeys_total;

/*
@@ -75,8 +76,8 @@ static int scan_pkey_feature(void)
if (early_radix_enabled())
return 0;

-   pkeys_total = of_scan_flat_dt(dt_scan_storage_keys, NULL);
-   if (pkeys_total == 0) {
+   ret = of_scan_flat_dt(dt_scan_storage_keys, _total);
+   if (ret == 0) {

/*
 		 * Let's assume 32 pkeys on P8/P9 bare metal, if its not defined by 
device

@@ -87,7 +88,6 @@ static int scan_pkey_feature(void)
(early_cpu_has_feature(CPU_FTR_ARCH_207S) ||
 early_cpu_has_feature(CPU_FTR_ARCH_300)))
pkeys_total = 32;
-
}

/*



  static int 

[PATCH] powerpc/spufs: add CONFIG_COREDUMP dependency

2020-07-06 Thread Arnd Bergmann
The kernel test robot pointed out a slightly different error message
after recent commit 5456ffdee666 ("powerpc/spufs: simplify spufs core
dumping") to spufs for a configuration that never worked:

   powerpc64-linux-ld: arch/powerpc/platforms/cell/spufs/file.o: in function 
`.spufs_proxydma_info_dump':
>> file.c:(.text+0x4c68): undefined reference to `.dump_emit'
   powerpc64-linux-ld: arch/powerpc/platforms/cell/spufs/file.o: in function 
`.spufs_dma_info_dump':
   file.c:(.text+0x4d70): undefined reference to `.dump_emit'
   powerpc64-linux-ld: arch/powerpc/platforms/cell/spufs/file.o: in function 
`.spufs_wbox_info_dump':
   file.c:(.text+0x4df4): undefined reference to `.dump_emit'

Add a Kconfig dependency to prevent this from happening again.

Reported-by: kernel test robot 
Signed-off-by: Arnd Bergmann 
---
 arch/powerpc/platforms/cell/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/platforms/cell/Kconfig 
b/arch/powerpc/platforms/cell/Kconfig
index 0f7c8241912b..f2ff359041ee 100644
--- a/arch/powerpc/platforms/cell/Kconfig
+++ b/arch/powerpc/platforms/cell/Kconfig
@@ -44,6 +44,7 @@ config SPU_FS
tristate "SPU file system"
default m
depends on PPC_CELL
+   depends on COREDUMP
select SPU_BASE
help
  The SPU file system is used to access Synergistic Processing
-- 
2.27.0



Re: [PATCH v5 13/26] powerpc/book3s64/pkeys: Enable MMU_FTR_PKEY

2020-07-06 Thread Michael Ellerman
"Aneesh Kumar K.V"  writes:
> Parse storage keys related device tree entry in early_init_devtree
> and enable MMU feature MMU_FTR_PKEY if pkeys are supported.
>
> MMU feature is used instead of CPU feature because this enables us
> to group MMU_FTR_KUAP and MMU_FTR_PKEY in asm feature fixup code.

Subject should probably be "Add MMU_FTR_PKEY".

> diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
> index f4ac25d4df05..72966d3d8f64 100644
> --- a/arch/powerpc/include/asm/mmu.h
> +++ b/arch/powerpc/include/asm/mmu.h
> @@ -23,6 +23,7 @@
>  
>  /* Radix page table supported and enabled */
>  #define MMU_FTR_TYPE_RADIX   ASM_CONST(0x0040)
> +#define MMU_FTR_PKEY ASM_CONST(0x0080)

It's not a type, so it should be with the individual feature bits below:

>  /*
>   * Individual features below.

> diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
> index 6a3bac357e24..6d70797352d8 100644
> --- a/arch/powerpc/kernel/prom.c
> +++ b/arch/powerpc/kernel/prom.c
> @@ -815,6 +815,11 @@ void __init early_init_devtree(void *params)
>   /* Now try to figure out if we are running on LPAR and so on */
>   pseries_probe_fw_features();
>  
> + /*
> +  * Initialize pkey features and default AMR/IAMR values
> +  */
> + pkey_early_init_devtree();

Not your fault, but the fact that we're having to do more and more
initialisation this early, based on the flat device tree, makes me
wonder if we need to rethink how we're doing the CPU/MMU feature setup.

> diff --git a/arch/powerpc/mm/book3s64/pkeys.c 
> b/arch/powerpc/mm/book3s64/pkeys.c
> index 0ff59acdbb84..bbba9c601e14 100644
> --- a/arch/powerpc/mm/book3s64/pkeys.c
> +++ b/arch/powerpc/mm/book3s64/pkeys.c
> @@ -38,38 +39,45 @@ static int execute_only_key = 2;
>  #define PKEY_REG_BITS (sizeof(u64) * 8)
>  #define pkeyshift(pkey) (PKEY_REG_BITS - ((pkey+1) * AMR_BITS_PER_PKEY))
>  
> +static int __init dt_scan_storage_keys(unsigned long node,
> +const char *uname, int depth,
> +void *data)
> +{
> + const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
> + const __be32 *prop;
> + int pkeys_total;
> +
> + /* We are scanning "cpu" nodes only */
> + if (type == NULL || strcmp(type, "cpu") != 0)
> + return 0;
> +
> + prop = of_get_flat_dt_prop(node, "ibm,processor-storage-keys", NULL);
> + if (!prop)
> + return 0;
> + pkeys_total = be32_to_cpu(prop[0]);
> + return pkeys_total;

That's not really the way the return value is meant to be used for these
scanning functions.

The usual return values are 0 to keep scanning and !0 means we've found
the node we're looking for and we should stop scanning.

Doing it this way means if you find 0 pkeys it will keep scanning.

Instead you should pass _total as the data pointer and set that.

> +}
> +
>  static int scan_pkey_feature(void)
>  {
> - u32 vals[2];
> - int pkeys_total = 0;
> - struct device_node *cpu;
> + int pkeys_total;
>  
>   /*
>* Pkey is not supported with Radix translation.
>*/
> - if (radix_enabled())
> + if (early_radix_enabled())
>   return 0;
>  
> - cpu = of_find_node_by_type(NULL, "cpu");
> - if (!cpu)
> - return 0;
> + pkeys_total = of_scan_flat_dt(dt_scan_storage_keys, NULL);
> + if (pkeys_total == 0) {
>  
> - if (of_property_read_u32_array(cpu,
> -"ibm,processor-storage-keys", vals, 2) 
> == 0) {
> - /*
> -  * Since any pkey can be used for data or execute, we will
> -  * just treat all keys as equal and track them as one entity.
> -  */
> - pkeys_total = vals[0];
> - /*  Should we check for IAMR support FIXME!! */

???

> - } else {

This loses the ability to differentiate between no storage-keys property
at all vs a property that specifies zero keys, which I don't think is a
good change.

>   /*
>* Let's assume 32 pkeys on P8 bare metal, if its not defined 
> by device
>* tree. We make this exception since skiboot forgot to expose 
> this
>* property on power8.
>*/
>   if (!firmware_has_feature(FW_FEATURE_LPAR) &&
> - cpu_has_feature(CPU_FTRS_POWER8))
> + early_cpu_has_feature(CPU_FTRS_POWER8))
>   pkeys_total = 32;

That's not how cpu_has_feature() works, we'll need to fix that.

cheers


Re: [PATCH v5 19/26] powerpc/book3s64/kuap: Move KUAP related function outside radix

2020-07-06 Thread Michael Ellerman
"Aneesh Kumar K.V"  writes:
> The next set of patches adds support for kuap with hash translation.

That's no longer true of this series.

> diff --git a/arch/powerpc/mm/book3s64/pkeys.c 
> b/arch/powerpc/mm/book3s64/pkeys.c
> index 0d72c0246052..e93b65a0e6e7 100644
> --- a/arch/powerpc/mm/book3s64/pkeys.c
> +++ b/arch/powerpc/mm/book3s64/pkeys.c
> @@ -199,6 +200,24 @@ void __init pkey_early_init_devtree(void)
>   return;
>  }
>  
> +#ifdef CONFIG_PPC_KUAP
> +void __init setup_kuap(bool disabled)
> +{
> + if (disabled || !early_radix_enabled())
> + return;
> +
> + if (smp_processor_id() == boot_cpuid) {
> + pr_info("Activating Kernel Userspace Access Prevention\n");
> + cur_cpu_spec->mmu_features |= MMU_FTR_RADIX_KUAP;
> + }
> +
> + /* Make sure userspace can't change the AMR */
> + mtspr(SPRN_UAMOR, 0);
> + mtspr(SPRN_AMR, AMR_KUAP_BLOCKED);
> + isync();
> +}
> +#endif

This makes this code depend on CONFIG_PPC_MEM_KEYS=y, which it didn't
used to.

That risks breaking people's existing .configs, if they have
PPC_MEM_KEYS=n they will now lose KUAP.

And I'm not convinced the two features should be tied together, at least
at the user-visible Kconfig level.

cheers


Re: [PATCH v12 00/31] Speculative page faults

2020-07-06 Thread Laurent Dufour

Le 06/07/2020 à 11:25, Chinwen Chang a écrit :

On Thu, 2019-06-20 at 16:19 +0800, Haiyan Song wrote:

Hi Laurent,

I downloaded your script and run it on Intel 2s skylake platform with spf-v12 
patch
serials.

Here attached the output results of this script.

The following comparison result is statistics from the script outputs.

a). Enable THP
 SPF_0  change   SPF_1
will-it-scale.page_fault2.per_thread_ops2664190.8  -11.7%   
2353637.6
will-it-scale.page_fault3.per_thread_ops4480027.2  -14.7%   
3819331.9


b). Disable THP
 SPF_0   change  SPF_1
will-it-scale.page_fault2.per_thread_ops2653260.7   -10%
2385165.8
will-it-scale.page_fault3.per_thread_ops4436330.1   -12.4%  
3886734.2


Thanks,
Haiyan Song


On Fri, Jun 14, 2019 at 10:44:47AM +0200, Laurent Dufour wrote:

Le 14/06/2019 à 10:37, Laurent Dufour a écrit :

Please find attached the script I run to get these numbers.
This would be nice if you could give it a try on your victim node and share the 
result.


Sounds that the Intel mail fitering system doesn't like the attached shell 
script.
Please find it there: 
https://gist.github.com/ldu4/a5cc1a93f293108ea387d43d5d5e7f44

Thanks,
Laurent.



Hi Laurent,

We merged SPF v11 and some patches from v12 into our platforms. After
several experiments, we observed SPF has obvious improvements on the
launch time of applications, especially for those high-TLP ones,

# launch time of applications(s):

package   version  w/ SPF  w/o SPF  improve(%)
--
Baidu maps10.13.3  0.887   0.98 9.49
Taobao8.4.0.35 1.227   1.2935.10
Meituan   9.12.401 1.107   1.54328.26
WeChat7.0.32.353   2.68 12.20
Honor of Kings1.43.1.6 6.636.7131.24


That's great news, thanks for reporting this!



By the way, we have verified our platforms with those patches and
achieved the goal of mass production.


Another good news!
For my information, what is your targeted hardware?

Cheers,
Laurent.



Re: [PATCH v5 18/26] powerpc/book3s64/keys/kuap: Reset AMR/IAMR values on kexec

2020-07-06 Thread Michael Ellerman
"Aneesh Kumar K.V"  writes:
> As we kexec across kernels that use AMR/IAMR for different purposes
> we need to ensure that new kernels get kexec'd with a reset value
> of AMR/IAMR. For ex: the new kernel can use key 0 for kernel mapping and the 
> old
> AMR value prevents access to key 0.
>
> This patch also removes reset if IAMR and AMOR in kexec_sequence. Reset of 
> AMOR
> is not needed and the IAMR reset is partial (it doesn't do the reset
> on secondary cpus) and is redundant with this patch.

I like the idea of cleaning this stuff up.

But I think tying it into kup/kuep/kuap and the MMU features and ifdefs
and so on is overly complicated.

We should just have eg:

void reset_sprs(void)
{
if (early_cpu_has_feature(CPU_FTR_ARCH_206)) {
mtspr(SPRN_AMR, 0);
mtspr(SPRN_UAMOR, 0);
}

if (early_cpu_has_feature(CPU_FTR_ARCH_207S)) {
mtspr(SPRN_IAMR, 0);
}
}

And call that from the kexec paths.

cheers

> diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h 
> b/arch/powerpc/include/asm/book3s/64/kup-radix.h
> index 3ee1ec60be84..c57063c35833 100644
> --- a/arch/powerpc/include/asm/book3s/64/kup-radix.h
> +++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h
> @@ -180,6 +180,26 @@ static inline unsigned long kuap_get_and_check_amr(void)
>  }
>  #endif /* CONFIG_PPC_KUAP */
>  
> +#define reset_kuap reset_kuap
> +static inline void reset_kuap(void)
> +{
> + if (mmu_has_feature(MMU_FTR_RADIX_KUAP)) {
> + mtspr(SPRN_AMR, 0);
> + /*  Do we need isync()? We are going via a kexec reset */
> + isync();
> + }
> +}
> +
> +#define reset_kuep reset_kuep
> +static inline void reset_kuep(void)
> +{
> + if (mmu_has_feature(MMU_FTR_KUEP)) {
> + mtspr(SPRN_IAMR, 0);
> + /*  Do we need isync()? We are going via a kexec reset */
> + isync();
> + }
> +}
> +
>  #endif /* __ASSEMBLY__ */
>  
>  #endif /* _ASM_POWERPC_BOOK3S_64_KUP_RADIX_H */
> diff --git a/arch/powerpc/include/asm/kup.h b/arch/powerpc/include/asm/kup.h
> index c745ee41ad66..4dc23a706910 100644
> --- a/arch/powerpc/include/asm/kup.h
> +++ b/arch/powerpc/include/asm/kup.h
> @@ -113,6 +113,20 @@ static inline void prevent_current_write_to_user(void)
>   prevent_user_access(NULL, NULL, ~0UL, KUAP_CURRENT_WRITE);
>  }
>  
> +#ifndef reset_kuap
> +#define reset_kuap reset_kuap
> +static inline void reset_kuap(void)
> +{
> +}
> +#endif
> +
> +#ifndef reset_kuep
> +#define reset_kuep reset_kuep
> +static inline void reset_kuep(void)
> +{
> +}
> +#endif
> +
>  #endif /* !__ASSEMBLY__ */
>  
>  #endif /* _ASM_POWERPC_KUAP_H_ */
> diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
> index 1864605eca29..7bb46ad98207 100644
> --- a/arch/powerpc/kernel/misc_64.S
> +++ b/arch/powerpc/kernel/misc_64.S
> @@ -413,20 +413,6 @@ _GLOBAL(kexec_sequence)
>   li  r0,0
>   std r0,16(r1)
>  
> -BEGIN_FTR_SECTION
> - /*
> -  * This is the best time to turn AMR/IAMR off.
> -  * key 0 is used in radix for supervisor<->user
> -  * protection, but on hash key 0 is reserved
> -  * ideally we want to enter with a clean state.
> -  * NOTE, we rely on r0 being 0 from above.
> -  */
> - mtspr   SPRN_IAMR,r0
> -BEGIN_FTR_SECTION_NESTED(42)
> - mtspr   SPRN_AMOR,r0
> -END_FTR_SECTION_NESTED_IFSET(CPU_FTR_HVMODE, 42)
> -END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
> -
>   /* save regs for local vars on new stack.
>* yes, we won't go back, but ...
>*/
> diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
> index b4184092172a..a124715f33ea 100644
> --- a/arch/powerpc/kexec/core_64.c
> +++ b/arch/powerpc/kexec/core_64.c
> @@ -152,6 +152,9 @@ static void kexec_smp_down(void *arg)
>   if (ppc_md.kexec_cpu_down)
>   ppc_md.kexec_cpu_down(0, 1);
>  
> + reset_kuap();
> + reset_kuep();
> +
>   kexec_smp_wait();
>   /* NOTREACHED */
>  }
> diff --git a/arch/powerpc/mm/book3s64/pgtable.c 
> b/arch/powerpc/mm/book3s64/pgtable.c
> index c58ad1049909..9673f4b74c9a 100644
> --- a/arch/powerpc/mm/book3s64/pgtable.c
> +++ b/arch/powerpc/mm/book3s64/pgtable.c
> @@ -165,6 +165,9 @@ void mmu_cleanup_all(void)
>   radix__mmu_cleanup_all();
>   else if (mmu_hash_ops.hpte_clear_all)
>   mmu_hash_ops.hpte_clear_all();
> +
> + reset_kuap();
> + reset_kuep();
>  }
>  
>  #ifdef CONFIG_MEMORY_HOTPLUG
> -- 
> 2.26.2


Re: [PATCH V4 0/4] mm/debug_vm_pgtable: Add some more tests

2020-07-06 Thread Alexander Gordeev
On Mon, Jul 06, 2020 at 06:18:32AM +0530, Anshuman Khandual wrote:

[...]

> Tested on arm64, x86 platforms but only build tested on all other enabled
> platforms through ARCH_HAS_DEBUG_VM_PGTABLE i.e powerpc, arc, s390. The

Sorry for missing to test earlier. Works for me on s390. Also, tried with
few relevant config options to set/unset.

Thanks!


Re: [PATCH] kbuild: introduce ccflags-remove-y and asflags-remove-y

2020-07-06 Thread Anders Roxell
On Mon, 6 Jul 2020 at 13:24, Anders Roxell  wrote:
>
> Hi,
>
> When I built an allmodconfig kernel for arm64, and boot that in qemu
> guest I see the following issues:
>
> [...]
> [   10.451561][T1] Testing tracer function: PASSED
> [   33.087895][T1] Testing dynamic ftrace: .. filter did not
> filter .. FAILED!
> [   51.127094][T1] [ cut here ]
> [   51.132387][T1] WARNING: CPU: 0 PID: 1 at
> kernel/trace/trace.c:1814 run_tracer_selftest+0x314/0x40c
> [   51.140805][T1] Modules linked in:
> [   51.145398][T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
> 5.8.0-rc3-next-20200703-4-g8cd4bc531754 #1
> [   51.154146][T1] Hardware name: linux,dummy-virt (DT)
> [   51.159536][T1] pstate: 8045 (Nzcv daif +PAN -UAO BTYPE=--)
> [   51.165711][T1] pc : run_tracer_selftest+0x314/0x40c
> [   51.171167][T1] lr : run_tracer_selftest+0x308/0x40c
> [   51.176475][T1] sp : 69c07c40
> [   51.180855][T1] x29: 69c07c40 x28: a00015e80320
> [   51.187187][T1] x27: a00013e074a0 x26: a000150f22ee
> [   51.193520][T1] x25: a000125b35a0 x24: a00013f6a000
> [   51.199868][T1] x23: a00013f6adc0 x22: 
> [   51.206225][T1] x21: a000150f2250 x20: a00015e801c0
> [   51.212582][T1] x19: a00015e7fa80 x18: 2750
> [   51.218887][T1] x17: 14c0 x16: 16f0
> [   51.225229][T1] x15: 14f8 x14: a0001000a3c8
> [   51.231584][T1] x13: a000124a5fb8 x12: 8d494822
> [   51.237935][T1] x11: 1fffed494821 x10: 8d494821
> [   51.244293][T1] x9 : a000101763b0 x8 : 69c077d7
> [   51.250704][T1] x7 : 0001 x6 : 69c077d0
> [   51.257060][T1] x5 : 6a7fc040 x4 : 
> [   51.263423][T1] x3 : a0001028 x2 : 8af050918b00
> [   51.269757][T1] x1 :  x0 : 0001
> [   51.276084][T1] Call trace:
> [   51.279934][T1]  run_tracer_selftest+0x314/0x40c
> [   51.285174][T1]  init_trace_selftests+0x110/0x31c
> [   51.290391][T1]  do_one_initcall+0x410/0x960
> [   51.295315][T1]  kernel_init_freeable+0x430/0x4f0
> [   51.300595][T1]  kernel_init+0x20/0x1d8
> [   51.305180][T1]  ret_from_fork+0x10/0x18
> [   51.309741][T1] irq event stamp: 1401832
> [   51.314399][T1] hardirqs last  enabled at (1401831):
> [] console_unlock+0xc04/0xd10
> [   51.322973][T1] hardirqs last disabled at (1401832):
> [] debug_exception_enter+0xbc/0x200
> [   51.331993][T1] softirqs last  enabled at (1401828):
> [] __do_softirq+0x95c/0x9f8
> [   51.340490][T1] softirqs last disabled at (1401821):
> [] irq_exit+0x118/0x198
> [   51.348717][T1] _warn_unseeded_randomness: 5 callbacks suppressed
> [   51.349263][T1] random: get_random_bytes called from
> print_oops_end_marker+0x48/0x80 with crng_init=0
> [   51.350502][T1] ---[ end trace c566e8a71c933d3c ]---
> [...]
> [40709.672335][C0] pool 2: cpus=0 flags=0x5 nice=0 hung=3s
> workers=3 manager: 1455
> [40739.960593][   T26] INFO: lockdep is turned off.
> [40775.312499][   T26] Kernel panic - not syncing: hung_task: blocked tasks
> [40775.341521][   T26] CPU: 0 PID: 26 Comm: khungtaskd Tainted: G
>   W 5.8.0-rc3-next-20200703-4-g8cd4bc531754 #1
> [40775.352848][   T26] Hardware name: linux,dummy-virt (DT)
> [40775.359304][   T26] Call trace:
> [40775.364148][   T26]  dump_backtrace+0x0/0x418
> [40775.369918][   T26]  show_stack+0x34/0x48
> [40775.375468][   T26]  dump_stack+0x1f4/0x2b0
> [40775.381136][   T26]  panic+0x2dc/0x6ec
> [40775.386430][   T26]  watchdog+0x1400/0x1460
> [40775.392103][   T26]  kthread+0x23c/0x250
> [40775.397548][   T26]  ret_from_fork+0x10/0x18
> [40775.407039][   T26] Kernel Offset: disabled
> [40775.412634][   T26] CPU features: 0x240002,20002004
> [40775.418751][   T26] Memory Limit: none
> [40775.425823][   T26] ---[ end Kernel panic - not syncing: hung_task:
> blocked tasks ]---
>
> The full log can be found here [1].
>
> Without this patch for  'trace_selftest_dynamic' for instance, CC_FLAGS_FTRACE
> was removed from kernel/trace/*, and then added back to
> kernel/trace/trace_selftest_dynamic.
> While with this patch it looks like we add the flag (even though it is
> already there), and then
> removes the flag for all files in kernel/trace/* .

Hi again,

I think the patch below solved the issue for trace_selftest_dynamic. However,
I think there is still differences in lib/* .

diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index cc5e3c6aaa20..5632a711921f 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -111,12 +111,12 @@ basename_flags = -DKBUILD_BASENAME=$(call
name-fix,$(basetarget))
 modname_flags  = -DKBUILD_MODNAME=$(call name-fix,$(modname))
 modfile_flags  = -DKBUILD_MODFILE=$(call stringify,$(modfile))

-orig_c_flags   = $(KBUILD_CPPFLAGS) 

Re: [PATCH] kbuild: introduce ccflags-remove-y and asflags-remove-y

2020-07-06 Thread Anders Roxell
Hi,

When I built an allmodconfig kernel for arm64, and boot that in qemu
guest I see the following issues:

[...]
[   10.451561][T1] Testing tracer function: PASSED
[   33.087895][T1] Testing dynamic ftrace: .. filter did not
filter .. FAILED!
[   51.127094][T1] [ cut here ]
[   51.132387][T1] WARNING: CPU: 0 PID: 1 at
kernel/trace/trace.c:1814 run_tracer_selftest+0x314/0x40c
[   51.140805][T1] Modules linked in:
[   51.145398][T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
5.8.0-rc3-next-20200703-4-g8cd4bc531754 #1
[   51.154146][T1] Hardware name: linux,dummy-virt (DT)
[   51.159536][T1] pstate: 8045 (Nzcv daif +PAN -UAO BTYPE=--)
[   51.165711][T1] pc : run_tracer_selftest+0x314/0x40c
[   51.171167][T1] lr : run_tracer_selftest+0x308/0x40c
[   51.176475][T1] sp : 69c07c40
[   51.180855][T1] x29: 69c07c40 x28: a00015e80320
[   51.187187][T1] x27: a00013e074a0 x26: a000150f22ee
[   51.193520][T1] x25: a000125b35a0 x24: a00013f6a000
[   51.199868][T1] x23: a00013f6adc0 x22: 
[   51.206225][T1] x21: a000150f2250 x20: a00015e801c0
[   51.212582][T1] x19: a00015e7fa80 x18: 2750
[   51.218887][T1] x17: 14c0 x16: 16f0
[   51.225229][T1] x15: 14f8 x14: a0001000a3c8
[   51.231584][T1] x13: a000124a5fb8 x12: 8d494822
[   51.237935][T1] x11: 1fffed494821 x10: 8d494821
[   51.244293][T1] x9 : a000101763b0 x8 : 69c077d7
[   51.250704][T1] x7 : 0001 x6 : 69c077d0
[   51.257060][T1] x5 : 6a7fc040 x4 : 
[   51.263423][T1] x3 : a0001028 x2 : 8af050918b00
[   51.269757][T1] x1 :  x0 : 0001
[   51.276084][T1] Call trace:
[   51.279934][T1]  run_tracer_selftest+0x314/0x40c
[   51.285174][T1]  init_trace_selftests+0x110/0x31c
[   51.290391][T1]  do_one_initcall+0x410/0x960
[   51.295315][T1]  kernel_init_freeable+0x430/0x4f0
[   51.300595][T1]  kernel_init+0x20/0x1d8
[   51.305180][T1]  ret_from_fork+0x10/0x18
[   51.309741][T1] irq event stamp: 1401832
[   51.314399][T1] hardirqs last  enabled at (1401831):
[] console_unlock+0xc04/0xd10
[   51.322973][T1] hardirqs last disabled at (1401832):
[] debug_exception_enter+0xbc/0x200
[   51.331993][T1] softirqs last  enabled at (1401828):
[] __do_softirq+0x95c/0x9f8
[   51.340490][T1] softirqs last disabled at (1401821):
[] irq_exit+0x118/0x198
[   51.348717][T1] _warn_unseeded_randomness: 5 callbacks suppressed
[   51.349263][T1] random: get_random_bytes called from
print_oops_end_marker+0x48/0x80 with crng_init=0
[   51.350502][T1] ---[ end trace c566e8a71c933d3c ]---
[...]
[40709.672335][C0] pool 2: cpus=0 flags=0x5 nice=0 hung=3s
workers=3 manager: 1455
[40739.960593][   T26] INFO: lockdep is turned off.
[40775.312499][   T26] Kernel panic - not syncing: hung_task: blocked tasks
[40775.341521][   T26] CPU: 0 PID: 26 Comm: khungtaskd Tainted: G
  W 5.8.0-rc3-next-20200703-4-g8cd4bc531754 #1
[40775.352848][   T26] Hardware name: linux,dummy-virt (DT)
[40775.359304][   T26] Call trace:
[40775.364148][   T26]  dump_backtrace+0x0/0x418
[40775.369918][   T26]  show_stack+0x34/0x48
[40775.375468][   T26]  dump_stack+0x1f4/0x2b0
[40775.381136][   T26]  panic+0x2dc/0x6ec
[40775.386430][   T26]  watchdog+0x1400/0x1460
[40775.392103][   T26]  kthread+0x23c/0x250
[40775.397548][   T26]  ret_from_fork+0x10/0x18
[40775.407039][   T26] Kernel Offset: disabled
[40775.412634][   T26] CPU features: 0x240002,20002004
[40775.418751][   T26] Memory Limit: none
[40775.425823][   T26] ---[ end Kernel panic - not syncing: hung_task:
blocked tasks ]---

The full log can be found here [1].

Without this patch for  'trace_selftest_dynamic' for instance, CC_FLAGS_FTRACE
was removed from kernel/trace/*, and then added back to
kernel/trace/trace_selftest_dynamic.
While with this patch it looks like we add the flag (even though it is
already there), and then
removes the flag for all files in kernel/trace/* .

Cheers,
Anders
[1] https://people.linaro.org/~anders.roxell/output-next-20200703.log

On Tue, 30 Jun 2020 at 04:09, Masahiro Yamada  wrote:
>
> On Mon, Jun 29, 2020 at 2:55 PM Michael Ellerman  wrote:
> >
> > Masahiro Yamada  writes:
> > > CFLAGS_REMOVE_.o works per object, that is, there is no
> > > convenient way to filter out flags for every object in a directory.
> > >
> > > Add ccflags-remove-y and asflags-remove-y to make it easily.
> > >
> > > Use ccflags-remove-y to clean up some Makefiles.
> > >
> > > Suggested-by: Sami Tolvanen 
> > > Signed-off-by: Masahiro Yamada 
> > > ---
> > >
> > >  arch/arm/boot/compressed/Makefile | 6 +-
> > >  arch/powerpc/xmon/Makefile| 3 +--
> > >  arch/sh/boot/compressed/Makefile  | 5 +
> > >  

Re: [PATCH v6 00/11] Add the multiple PF support for DWC and Layerscape

2020-07-06 Thread Lorenzo Pieralisi
On Sat, Mar 14, 2020 at 11:30:27AM +0800, Xiaowei Bao wrote:
> Add the PCIe EP multiple PF support for DWC and Layerscape, add
> the doorbell MSIX function for DWC, use list to manage the PF of
> one PCIe controller, and refactor the Layerscape EP driver due to
> some platforms difference.
> 
> Xiaowei Bao (11):
>   PCI: designware-ep: Add multiple PFs support for DWC
>   PCI: designware-ep: Add the doorbell mode of MSI-X in EP mode
>   PCI: designware-ep: Move the function of getting MSI capability
> forward
>   PCI: designware-ep: Modify MSI and MSIX CAP way of finding
>   dt-bindings: pci: layerscape-pci: Add compatible strings for ls1088a
> and ls2088a
>   PCI: layerscape: Fix some format issue of the code
>   PCI: layerscape: Modify the way of getting capability with different
> PEX
>   PCI: layerscape: Modify the MSIX to the doorbell mode
>   PCI: layerscape: Add EP mode support for ls1088a and ls2088a
>   arm64: dts: layerscape: Add PCIe EP node for ls1088a
>   misc: pci_endpoint_test: Add LS1088a in pci_device_id table
> 
>  .../devicetree/bindings/pci/layerscape-pci.txt |   2 +
>  arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi |  31 +++
>  drivers/misc/pci_endpoint_test.c   |   2 +
>  drivers/pci/controller/dwc/pci-layerscape-ep.c | 100 ++--
>  drivers/pci/controller/dwc/pcie-designware-ep.c| 255 
> +
>  drivers/pci/controller/dwc/pcie-designware.c   |  59 +++--
>  drivers/pci/controller/dwc/pcie-designware.h   |  48 +++-
>  7 files changed, 404 insertions(+), 93 deletions(-)

Can you rebase it against v5.8-rc1 please ?

Thanks,
Lorenzo


[PATCH 11/11 RFC] PCI: Remove "*val = 0" from pcie_capability_read_*()

2020-07-06 Thread Saheed Olayemi Bolarinwa
From: Bolarinwa Olayemi Saheed 

 **TODO**

Suggested-by: Bjorn Helgaas 
Signed-off-by: Bolarinwa Olayemi Saheed 
---
This patch  depends on all of the preceeding patches in this series,
otherwise it will introduce bugs as pointed out in the commit message
of each.
 drivers/pci/access.c | 14 --
 1 file changed, 14 deletions(-)

diff --git a/drivers/pci/access.c b/drivers/pci/access.c
index 79c4a2ef269a..ec95edbb1ac8 100644
--- a/drivers/pci/access.c
+++ b/drivers/pci/access.c
@@ -413,13 +413,6 @@ int pcie_capability_read_word(struct pci_dev *dev, int 
pos, u16 *val)
 
if (pcie_capability_reg_implemented(dev, pos)) {
ret = pci_read_config_word(dev, pci_pcie_cap(dev) + pos, val);
-   /*
-* Reset *val to 0 if pci_read_config_word() fails, it may
-* have been written as 0x if hardware error happens
-* during pci_read_config_word().
-*/
-   if (ret)
-   *val = 0;
return ret;
}
 
@@ -448,13 +441,6 @@ int pcie_capability_read_dword(struct pci_dev *dev, int 
pos, u32 *val)
 
if (pcie_capability_reg_implemented(dev, pos)) {
ret = pci_read_config_dword(dev, pci_pcie_cap(dev) + pos, val);
-   /*
-* Reset *val to 0 if pci_read_config_dword() fails, it may
-* have been written as 0x if hardware error happens
-* during pci_read_config_dword().
-*/
-   if (ret)
-   *val = 0;
return ret;
}
 
-- 
2.18.2



[PATCH 0/11 RFC] PCI: Remove "*val = 0" from pcie_capability_read_*()

2020-07-06 Thread Saheed Olayemi Bolarinwa
From: Bolarinwa Olayemi Saheed 

*** BLURB HERE ***

Bolarinwa Olayemi Saheed (9):
  IB/hfi1: Confirm that pcie_capability_read_dword() is successful
  misc: rtsx: Confirm that pcie_capability_read_word() is successful
  PCI/AER: Use error return value from pcie_capability_read_*()
  PCI/ASPM: Use error return value from pcie_capability_read_*()
  PCI: pciehp: Fix wrong failure check on pcie_capability_read_*()
  PCI: pciehp: Prevent wrong failure check on pcie_capability_read_*()
  PCI: pciehp: Make "Power On" the default in pciehp_get_power_status()
  PCI/ACPI: Prevent wrong failure check on pcie_capability_read_*()
  PCI: Prevent wrong failure check on pcie_capability_read_*()
  PCI: Remove "*val = 0" fom pcie_capability_read_*()

 
 drivers/infiniband/hw/hfi1/aspm.c | 7 ---
 drivers/misc/cardreader/rts5227.c | 5 +++--
 drivers/misc/cardreader/rts5249.c | 5 +++--
 drivers/misc/cardreader/rts5260.c | 5 +++--
 drivers/misc/cardreader/rts5261.c | 5 +++--
 drivers/pci/pcie/aer.c  |  5 +++--
 drivers/pci/pcie/aspm.c | 33 +
 drivers/pci/hotplug/pciehp_hpc.c | 47 
 drivers/pci/pci-acpi.c   | 10 ---
 drivers/pci/probe.c  | 29 
 drivers/pci/access.c | 14 --
 11 files changed, 82 insertions(+), 83 deletions(-)

-- 
2.18.2




Re: [PATCH v12 00/31] Speculative page faults

2020-07-06 Thread Chinwen Chang
On Thu, 2019-06-20 at 16:19 +0800, Haiyan Song wrote:
> Hi Laurent,
> 
> I downloaded your script and run it on Intel 2s skylake platform with spf-v12 
> patch
> serials.
> 
> Here attached the output results of this script.
> 
> The following comparison result is statistics from the script outputs.
> 
> a). Enable THP
> SPF_0  change   SPF_1
> will-it-scale.page_fault2.per_thread_ops2664190.8  -11.7%   
> 2353637.6  
> will-it-scale.page_fault3.per_thread_ops4480027.2  -14.7%   
> 3819331.9 
> 
> 
> b). Disable THP
> SPF_0   change  SPF_1
> will-it-scale.page_fault2.per_thread_ops2653260.7   -10%
> 2385165.8
> will-it-scale.page_fault3.per_thread_ops4436330.1   -12.4%  
> 3886734.2 
> 
> 
> Thanks,
> Haiyan Song
> 
> 
> On Fri, Jun 14, 2019 at 10:44:47AM +0200, Laurent Dufour wrote:
> > Le 14/06/2019 à 10:37, Laurent Dufour a écrit :
> > > Please find attached the script I run to get these numbers.
> > > This would be nice if you could give it a try on your victim node and 
> > > share the result.
> > 
> > Sounds that the Intel mail fitering system doesn't like the attached shell 
> > script.
> > Please find it there: 
> > https://gist.github.com/ldu4/a5cc1a93f293108ea387d43d5d5e7f44
> > 
> > Thanks,
> > Laurent.
> > 

Hi Laurent,

We merged SPF v11 and some patches from v12 into our platforms. After
several experiments, we observed SPF has obvious improvements on the
launch time of applications, especially for those high-TLP ones,

# launch time of applications(s):

package   version  w/ SPF  w/o SPF  improve(%)
--  

Baidu maps10.13.3  0.887   0.98 9.49
Taobao8.4.0.35 1.227   1.2935.10
Meituan   9.12.401 1.107   1.54328.26
WeChat7.0.32.353   2.68 12.20
Honor of Kings1.43.1.6 6.636.7131.24


By the way, we have verified our platforms with those patches and
achieved the goal of mass production.

Thanks.
Chinwen Chang


Re: [PATCH] powerpc: select ARCH_HAS_MEMBARRIER_SYNC_CORE

2020-07-06 Thread Christophe Leroy




Le 06/07/2020 à 04:18, Nicholas Piggin a écrit :

powerpc return from interrupt and return from system call sequences are
context synchronising.

Signed-off-by: Nicholas Piggin 
---
  .../features/sched/membarrier-sync-core/arch-support.txt  | 4 ++--
  arch/powerpc/Kconfig  | 1 +
  arch/powerpc/include/asm/exception-64s.h  | 4 
  3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/Documentation/features/sched/membarrier-sync-core/arch-support.txt 
b/Documentation/features/sched/membarrier-sync-core/arch-support.txt
index 8a521a622966..52ad74a25f54 100644
--- a/Documentation/features/sched/membarrier-sync-core/arch-support.txt
+++ b/Documentation/features/sched/membarrier-sync-core/arch-support.txt
@@ -5,7 +5,7 @@
  #
  # Architecture requirements
  #
-# * arm/arm64
+# * arm/arm64/powerpc
  #
  # Rely on implicit context synchronization as a result of exception return
  # when returning from IPI handler, and when returning to user-space.
@@ -45,7 +45,7 @@
  |   nios2: | TODO |
  |openrisc: | TODO |
  |  parisc: | TODO |
-| powerpc: | TODO |
+| powerpc: |  ok  |
  |   riscv: | TODO |
  |s390: | TODO |
  |  sh: | TODO |
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 9fa23eb320ff..920c4e3ca4ef 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -131,6 +131,7 @@ config PPC
select ARCH_HAS_PTE_DEVMAP  if PPC_BOOK3S_64
select ARCH_HAS_PTE_SPECIAL
select ARCH_HAS_MEMBARRIER_CALLBACKS
+   select ARCH_HAS_MEMBARRIER_SYNC_CORE
select ARCH_HAS_SCALED_CPUTIME  if VIRT_CPU_ACCOUNTING_NATIVE 
&& PPC_BOOK3S_64
select ARCH_HAS_STRICT_KERNEL_RWX   if (PPC32 && !HIBERNATION)
select ARCH_HAS_TICK_BROADCAST  if GENERIC_CLOCKEVENTS_BROADCAST
diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index 47bd4ea0837d..b88cb3a989b6 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -68,6 +68,10 @@
   *
   * The nop instructions allow us to insert one or more instructions to flush 
the
   * L1-D cache when returning to userspace or a guest.
+ *
+ * powerpc relies on return from interrupt/syscall being context synchronising
+ * (which hrfid, rfid, and rfscv are) to support ARCH_HAS_MEMBARRIER_SYNC_CORE
+ * without additional additional synchronisation instructions.


This file is dedicated to BOOK3S/64. What about other ones ?

On 32 bits, this is also valid as 'rfi' is also context synchronising, 
but then why just add some comment in exception-64s.h and only there ?



   */
  #define RFI_FLUSH_SLOT
\
RFI_FLUSH_FIXUP_SECTION;\




Christophe


Re: [PATCH 01/14] powerpc/eeh: Remove eeh_dev_phb_init_dynamic()

2020-07-06 Thread kernel test robot
Hi Oliver,

I love your patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on v5.8-rc4 next-20200706]
[cannot apply to scottwood/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use  as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Oliver-O-Halloran/powerpc-eeh-Remove-eeh_dev_phb_init_dynamic/20200706-103630
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc64-randconfig-r006-20200706 (attached as .config)
compiler: powerpc64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross 
ARCH=powerpc64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   arch/powerpc/kernel/of_platform.c: In function 'of_pci_phb_probe':
>> arch/powerpc/kernel/of_platform.c:66:2: error: implicit declaration of 
>> function 'eeh_phb_pe_create' [-Werror=implicit-function-declaration]
  66 |  eeh_phb_pe_create(phb);
 |  ^
   cc1: all warnings being treated as errors

vim +/eeh_phb_pe_create +66 arch/powerpc/kernel/of_platform.c

28  
29  /* The probing of PCI controllers from of_platform is currently
30   * 64 bits only, mostly due to gratuitous differences between
31   * the 32 and 64 bits PCI code on PowerPC and the 32 bits one
32   * lacking some bits needed here.
33   */
34  
35  static int of_pci_phb_probe(struct platform_device *dev)
36  {
37  struct pci_controller *phb;
38  
39  /* Check if we can do that ... */
40  if (ppc_md.pci_setup_phb == NULL)
41  return -ENODEV;
42  
43  pr_info("Setting up PCI bus %pOF\n", dev->dev.of_node);
44  
45  /* Alloc and setup PHB data structure */
46  phb = pcibios_alloc_controller(dev->dev.of_node);
47  if (!phb)
48  return -ENODEV;
49  
50  /* Setup parent in sysfs */
51  phb->parent = >dev;
52  
53  /* Setup the PHB using arch provided callback */
54  if (ppc_md.pci_setup_phb(phb)) {
55  pcibios_free_controller(phb);
56  return -ENODEV;
57  }
58  
59  /* Process "ranges" property */
60  pci_process_bridge_OF_ranges(phb, dev->dev.of_node, 0);
61  
62  /* Init pci_dn data structures */
63  pci_devs_phb_init_dynamic(phb);
64  
65  /* Create EEH PE for the PHB */
  > 66  eeh_phb_pe_create(phb);
67  
68  /* Scan the bus */
69  pcibios_scan_phb(phb);
70  if (phb->bus == NULL)
71  return -ENXIO;
72  
73  /* Claim resources. This might need some rework as well 
depending
74   * whether we are doing probe-only or not, like assigning 
unassigned
75   * resources etc...
76   */
77  pcibios_claim_one_bus(phb->bus);
78  
79  /* Add probed PCI devices to the device model */
80  pci_bus_add_devices(phb->bus);
81  
82  return 0;
83  }
84  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip


Re: [PATCH v2 01/12] kexec_file: allow archs to handle special regions while locating memory hole

2020-07-06 Thread Dave Young
On 07/03/20 at 01:24am, Hari Bathini wrote:
> Some architectures may have special memory regions, within the given
> memory range, which can't be used for the buffer in a kexec segment.
> Implement weak arch_kexec_locate_mem_hole() definition which arch code
> may override, to take care of special regions, while trying to locate
> a memory hole.
> 
> Also, add the missing declarations for arch overridable functions and
> and drop the __weak descriptors in the declarations to avoid non-weak
> definitions from becoming weak.
> 
> Reported-by: kernel test robot 
> [lkp: In v1, arch_kimage_file_post_load_cleanup() declaration was missing]
> Signed-off-by: Hari Bathini 
> ---
> 
> Changes in v2:
> * Introduced arch_kexec_locate_mem_hole() for override and dropped
>   weak arch_kexec_add_buffer().
> * Dropped __weak identifier for arch overridable functions.
> * Fixed the missing declaration for arch_kimage_file_post_load_cleanup()
>   reported by lkp. lkp report for reference:
> - https://lore.kernel.org/patchwork/patch/1264418/
> 
> 
>  include/linux/kexec.h |   29 ++---
>  kernel/kexec_file.c   |   16 ++--
>  2 files changed, 32 insertions(+), 13 deletions(-)
> 
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index ea67910..9e93bef 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -183,17 +183,24 @@ int kexec_purgatory_get_set_symbol(struct kimage 
> *image, const char *name,
>  bool get_value);
>  void *kexec_purgatory_get_symbol_addr(struct kimage *image, const char 
> *name);
>  
> -int __weak arch_kexec_kernel_image_probe(struct kimage *image, void *buf,
> -  unsigned long buf_len);
> -void * __weak arch_kexec_kernel_image_load(struct kimage *image);
> -int __weak arch_kexec_apply_relocations_add(struct purgatory_info *pi,
> - Elf_Shdr *section,
> - const Elf_Shdr *relsec,
> - const Elf_Shdr *symtab);
> -int __weak arch_kexec_apply_relocations(struct purgatory_info *pi,
> - Elf_Shdr *section,
> - const Elf_Shdr *relsec,
> - const Elf_Shdr *symtab);
> +/* Architectures may override the below functions */
> +int arch_kexec_kernel_image_probe(struct kimage *image, void *buf,
> +   unsigned long buf_len);
> +void *arch_kexec_kernel_image_load(struct kimage *image);
> +int arch_kexec_apply_relocations_add(struct purgatory_info *pi,
> +  Elf_Shdr *section,
> +  const Elf_Shdr *relsec,
> +  const Elf_Shdr *symtab);
> +int arch_kexec_apply_relocations(struct purgatory_info *pi,
> +  Elf_Shdr *section,
> +  const Elf_Shdr *relsec,
> +  const Elf_Shdr *symtab);
> +int arch_kimage_file_post_load_cleanup(struct kimage *image);
> +#ifdef CONFIG_KEXEC_SIG
> +int arch_kexec_kernel_verify_sig(struct kimage *image, void *buf,
> +  unsigned long buf_len);
> +#endif
> +int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf);
>  
>  extern int kexec_add_buffer(struct kexec_buf *kbuf);
>  int kexec_locate_mem_hole(struct kexec_buf *kbuf);
> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
> index 09cc78d..e89912d 100644
> --- a/kernel/kexec_file.c
> +++ b/kernel/kexec_file.c
> @@ -636,6 +636,19 @@ int kexec_locate_mem_hole(struct kexec_buf *kbuf)
>  }
>  
>  /**
> + * arch_kexec_locate_mem_hole - Find free memory to place the segments.
> + * @kbuf:   Parameters for the memory search.
> + *
> + * On success, kbuf->mem will have the start address of the memory region 
> found.
> + *
> + * Return: 0 on success, negative errno on error.
> + */
> +int __weak arch_kexec_locate_mem_hole(struct kexec_buf *kbuf)
> +{
> + return kexec_locate_mem_hole(kbuf);
> +}
> +
> +/**
>   * kexec_add_buffer - place a buffer in a kexec segment
>   * @kbuf:Buffer contents and memory parameters.
>   *
> @@ -647,7 +660,6 @@ int kexec_locate_mem_hole(struct kexec_buf *kbuf)
>   */
>  int kexec_add_buffer(struct kexec_buf *kbuf)
>  {
> -
>   struct kexec_segment *ksegment;
>   int ret;
>  
> @@ -675,7 +687,7 @@ int kexec_add_buffer(struct kexec_buf *kbuf)
>   kbuf->buf_align = max(kbuf->buf_align, PAGE_SIZE);
>  
>   /* Walk the RAM ranges and allocate a suitable range for the buffer */
> - ret = kexec_locate_mem_hole(kbuf);
> + ret = arch_kexec_locate_mem_hole(kbuf);
>   if (ret)
>   return ret;
>  
> 

Acked-by: Dave Young 

Thanks
Dave



Re: [PATCH v5 15/26] powerpc/book3s64/pkeys: Use execute_pkey_disable static key

2020-07-06 Thread Aneesh Kumar K.V

On 7/6/20 12:50 PM, Michael Ellerman wrote:

"Aneesh Kumar K.V"  writes:

Use execute_pkey_disabled static key to check for execute key support instead
of pkey_disabled.

Signed-off-by: Aneesh Kumar K.V 
---
  arch/powerpc/include/asm/pkeys.h | 10 +-
  arch/powerpc/mm/book3s64/pkeys.c |  5 -
  2 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index 47c81d41ea9a..09fbaa409ac4 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -126,15 +126,7 @@ static inline int mm_pkey_free(struct mm_struct *mm, int 
pkey)
   * Try to dedicate one of the protection keys to be used as an
   * execute-only protection key.
   */
-extern int __execute_only_pkey(struct mm_struct *mm);
-static inline int execute_only_pkey(struct mm_struct *mm)
-{
-   if (static_branch_likely(_disabled))
-   return -1;
-
-   return __execute_only_pkey(mm);
-}
-
+extern int execute_only_pkey(struct mm_struct *mm);
  extern int __arch_override_mprotect_pkey(struct vm_area_struct *vma,
 int prot, int pkey);
  static inline int arch_override_mprotect_pkey(struct vm_area_struct *vma,
diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index bbba9c601e14..fed4f159011b 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -345,8 +345,11 @@ void thread_pkey_regs_init(struct thread_struct *thread)
write_uamor(default_uamor);
  }
  
-int __execute_only_pkey(struct mm_struct *mm)

+int execute_only_pkey(struct mm_struct *mm)
  {
+   if (static_branch_likely(_pkey_disabled))
+   return -1;
+
return mm->context.execute_only_pkey;
  }


That adds the overhead of a function call, but then uses a static_key to
avoid an easy to predict branch, which seems like a bad tradeoff. And
it's not a performance critical path AFAICS.

Anyway this seems unnecessary:

pkey_early_init_devtree()
{
...
if (unlikely(max_pkey <= execute_only_key)) {
/*
 * Insufficient number of keys to support
 * execute only key. Mark it unavailable.
 */
execute_only_key = -1;

void pkey_mm_init(struct mm_struct *mm)
{
...
mm->context.execute_only_pkey = execute_only_key;
}


ie. Can't it just be:

static inline int execute_only_pkey(struct mm_struct *mm)
{
return mm->context.execute_only_pkey;
}



ok updated with

modified   arch/powerpc/mm/book3s64/pkeys.c
@@ -151,7 +151,7 @@ void __init pkey_early_init_devtree(void)
max_pkey = pkeys_total;
 #endif

-   if (unlikely(max_pkey <= execute_only_key)) {
+	if (unlikely(max_pkey <= execute_only_key) || 
!pkey_execute_disable_supported) {

/*
 * Insufficient number of keys to support
 * execute only key. Mark it unavailable.
@@ -368,9 +368,6 @@ int __arch_set_user_pkey_access(struct task_struct 
*tsk, int pkey,


 int execute_only_pkey(struct mm_struct *mm)
 {
-   if (static_branch_likely(_pkey_disabled))
-   return -1;
-
return mm->context.execute_only_pkey;
 }

-aneesh


Re: [PATCH v5 11/26] powerpc/book3s64/pkeys: Make initial_allocation_mask static

2020-07-06 Thread Aneesh Kumar K.V

On 7/6/20 12:34 PM, Michael Ellerman wrote:

"Aneesh Kumar K.V"  writes:

initial_allocation_mask is not used outside this file.


And never modified after init, so make it __ro_after_init as well?




ok, will update reserved_allocation_maask too.


cheers


diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index 652bad7334f3..47c81d41ea9a 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -13,7 +13,6 @@
  
  DECLARE_STATIC_KEY_FALSE(pkey_disabled);

  extern int max_pkey;
-extern u32 initial_allocation_mask; /*  bits set for the initially allocated 
keys */
  extern u32 reserved_allocation_mask; /* bits set for reserved keys */
  
  #define ARCH_VM_PKEY_FLAGS (VM_PKEY_BIT0 | VM_PKEY_BIT1 | VM_PKEY_BIT2 | \

diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index a4d7287082a8..73b5ef1490c8 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -15,11 +15,11 @@
  DEFINE_STATIC_KEY_FALSE(pkey_disabled);
  DEFINE_STATIC_KEY_FALSE(execute_pkey_disabled);
  int  max_pkey;/* Maximum key value supported */
-u32  initial_allocation_mask;   /* Bits set for the initially allocated keys */
  /*
   *  Keys marked in the reservation list cannot be allocated by  userspace
   */
  u32  reserved_allocation_mask;
+static u32  initial_allocation_mask;   /* Bits set for the initially allocated 
keys */
  static u64 default_amr;
  static u64 default_iamr;
  /* Allow all keys to be modified by default */
--
2.26.2



-aneesh


Re: [PATCH v5 08/26] powerpc/book3s64/pkeys: Convert execute key support to static key

2020-07-06 Thread Aneesh Kumar K.V

On 7/6/20 12:49 PM, Michael Ellerman wrote:

"Aneesh Kumar K.V"  writes:

Convert the bool to a static key like pkey_disabled.


I'm not convinced this is worth the added complexity of a static key.

It's not used in any performance critical paths, except for context
switch, but that's already guarded by:

if (old_thread->iamr != new_thread->iamr)

Which should always be false on machines which don't have the execute
key enabled.



Ok will drop the patch.

-aneesh



Re: [PATCH v8 4/4] powerpc: Book3S 64-bit "heavyweight" KASAN support

2020-07-06 Thread kernel test robot
Hi Daniel,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on powerpc/next]
[also build test WARNING on linux/master linus/master v5.8-rc4 next-20200703]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use  as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Daniel-Axtens/KASAN-for-powerpc64-radix/20200702-105552
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc64-randconfig-s031-20200706 (attached as .config)
compiler: powerpc64le-linux-gcc (GCC) 9.3.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# apt-get install sparse
# sparse version: v0.6.2-14-g8fce3d7a-dirty
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross C=1 
CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=powerpc64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 


sparse warnings: (new ones prefixed by >>)

   drivers/char/tpm/tpm_ibmvtpm.c:130:9: sparse: sparse: cast removes address 
space '__iomem' of expression
>> drivers/char/tpm/tpm_ibmvtpm.c:131:9: sparse: sparse: incorrect type in 
>> argument 1 (different address spaces) @@ expected void *s @@ got 
>> void [noderef] __iomem *rtce_buf @@
>> drivers/char/tpm/tpm_ibmvtpm.c:131:9: sparse: expected void *s
   drivers/char/tpm/tpm_ibmvtpm.c:131:9: sparse: got void [noderef] __iomem 
*rtce_buf
   drivers/char/tpm/tpm_ibmvtpm.c:234:9: sparse: sparse: cast removes address 
space '__iomem' of expression
   drivers/char/tpm/tpm_ibmvtpm.c:369:30: sparse: sparse: incorrect type in 
argument 1 (different address spaces) @@ expected void const * @@ got 
void [noderef] __iomem *rtce_buf @@
   drivers/char/tpm/tpm_ibmvtpm.c:369:30: sparse: expected void const *
   drivers/char/tpm/tpm_ibmvtpm.c:369:30: sparse: got void [noderef] 
__iomem *rtce_buf
   drivers/char/tpm/tpm_ibmvtpm.c:530:43: sparse: sparse: incorrect type in 
assignment (different address spaces) @@ expected void [noderef] __iomem 
*rtce_buf @@ got void * @@
   drivers/char/tpm/tpm_ibmvtpm.c:530:43: sparse: expected void [noderef] 
__iomem *rtce_buf
   drivers/char/tpm/tpm_ibmvtpm.c:530:43: sparse: got void *
   drivers/char/tpm/tpm_ibmvtpm.c:537:52: sparse: sparse: incorrect type in 
argument 2 (different address spaces) @@ expected void *ptr @@ got void 
[noderef] __iomem *rtce_buf @@
   drivers/char/tpm/tpm_ibmvtpm.c:537:52: sparse: expected void *ptr
   drivers/char/tpm/tpm_ibmvtpm.c:537:52: sparse: got void [noderef] 
__iomem *rtce_buf
   drivers/char/tpm/tpm_ibmvtpm.c:543:46: sparse: sparse: incorrect type in 
argument 1 (different address spaces) @@ expected void const * @@ got 
void [noderef] __iomem *rtce_buf @@
   drivers/char/tpm/tpm_ibmvtpm.c:543:46: sparse: expected void const *
   drivers/char/tpm/tpm_ibmvtpm.c:543:46: sparse: got void [noderef] 
__iomem *rtce_buf
--
   drivers/tty/vt/vt.c:233:5: sparse: sparse: symbol 'console_blank_hook' was 
not declared. Should it be static?
   drivers/tty/vt/vt.c:2901:19: sparse: sparse: symbol 'console_driver' was not 
declared. Should it be static?
   arch/powerpc/include/asm/vga.h:34:16: sparse: sparse: cast to restricted 
__le16
   arch/powerpc/include/asm/vga.h:34:16: sparse: sparse: cast to restricted 
__le16
   arch/powerpc/include/asm/vga.h:34:16: sparse: sparse: cast to restricted 
__le16
>> arch/powerpc/include/asm/vga.h:40:21: sparse: sparse: incorrect type in 
>> argument 2 (different base types) @@ expected unsigned short [usertype] 
>> @@ got restricted __le16 [usertype] @@
>> arch/powerpc/include/asm/vga.h:40:21: sparse: expected unsigned short 
>> [usertype]
   arch/powerpc/include/asm/vga.h:40:21: sparse: got restricted __le16 
[usertype]
   arch/powerpc/include/asm/vga.h:34:16: sparse: sparse: cast to restricted 
__le16
   arch/powerpc/include/asm/vga.h:34:16: sparse: sparse: cast to restricted 
__le16
   arch/powerpc/include/asm/vga.h:34:16: sparse: sparse: cast to restricted 
__le16
   arch/powerpc/include/asm/vga.h:34:16: sparse: sparse: cast to restricted 
__le16
   arch/powerpc/include/asm/vga.h:29:15: sparse: sparse: incorrect type in 
assignment (different base types) @@ expected unsigned short volatile 
[usertype] @@ got restricted __le16 [usertype] @@
   arch/powerpc/include/asm/vga.h:29:15: sparse: expected unsigned short 
volatile [usertype]
   arch/powerpc/include/asm/vga.h:29:15: sparse: got restricted __le16 
[usertype]
   arch/powerpc/include/asm/vga.h:34:16: sparse: sparse: cast to restricted 
__le16
   arch/powerpc/include/asm/vga.h:29:

Re: [PATCH v5 17/26] powerpc/book3s64/keys: Print information during boot.

2020-07-06 Thread Michael Ellerman
"Aneesh Kumar K.V"  writes:

> Signed-off-by: Aneesh Kumar K.V 
> ---
>  arch/powerpc/mm/book3s64/pkeys.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/arch/powerpc/mm/book3s64/pkeys.c 
> b/arch/powerpc/mm/book3s64/pkeys.c
> index 810118123e70..0d72c0246052 100644
> --- a/arch/powerpc/mm/book3s64/pkeys.c
> +++ b/arch/powerpc/mm/book3s64/pkeys.c
> @@ -195,6 +195,7 @@ void __init pkey_early_init_devtree(void)
>*/
>   initial_allocation_mask |= reserved_allocation_mask;
>  
> + pr_info("Enabling Memory keys with max key count %d", max_pkey);
   ^
   missing newline

>   return;
>  }

The syscall is called pkey_mprotect() and the manpage is "pkeys", so I
think it would make sense to use "pkey" in the message.

cheers


[RFC][PATCH] avoid refcounting the lazy tlb mm struct

2020-07-06 Thread Nicholas Piggin
On big systems, the mm refcount can become highly contented when doing
a lot of context switching with threaded applications (particularly
switching between the idle thread and an application thread).

Not doing lazy tlb at all slows switching down quite a bit, so I wonder
if we can avoid the refcount for the lazy tlb, but have __mmdrop() IPI
all CPUs that might be using this mm lazily.

This patch has only had light testing so far, but seems to work okay.

Thanks,
Nick

--

diff --git a/arch/Kconfig b/arch/Kconfig
index 8cc35dc556c7..69ea7172db3d 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -411,6 +411,16 @@ config MMU_GATHER_NO_GATHER
bool
depends on MMU_GATHER_TABLE_FREE
 
+config MMU_LAZY_TLB_SHOOTDOWN
+   bool
+   help
+ Instead of refcounting the "lazy tlb" mm struct, which can cause
+ contention with multi-threaded apps on large multiprocessor systems,
+ this option causes __mmdrop to IPI all CPUs in the mm_cpumask and
+ switch to init_mm if they were using the to-be-freed mm as the lazy
+ tlb. Architectures which do not track all possible lazy tlb CPUs in
+ mm_cpumask can not use this (without modification).
+
 config ARCH_HAVE_NMI_SAFE_CMPXCHG
bool
 
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 920c4e3ca4ef..24ac85c868db 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -225,6 +225,7 @@ config PPC
select HAVE_PERF_USER_STACK_DUMP
select MMU_GATHER_RCU_TABLE_FREE
select MMU_GATHER_PAGE_SIZE
+   select MMU_LAZY_TLB_SHOOTDOWN
select HAVE_REGS_AND_STACK_ACCESS_API
select HAVE_RELIABLE_STACKTRACE if PPC_BOOK3S_64 && 
CPU_LITTLE_ENDIAN
select HAVE_SYSCALL_TRACEPOINTS
diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c 
b/arch/powerpc/mm/book3s64/radix_tlb.c
index b5cc9b23cf02..52730629b3eb 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -652,10 +652,10 @@ static void do_exit_flush_lazy_tlb(void *arg)
 * Must be a kernel thread because sender is single-threaded.
 */
BUG_ON(current->mm);
-   mmgrab(_mm);
+   mmgrab_lazy_tlb(_mm);
switch_mm(mm, _mm, current);
current->active_mm = _mm;
-   mmdrop(mm);
+   mmdrop_lazy_tlb(mm);
}
_tlbiel_pid(pid, RIC_FLUSH_ALL);
 }
diff --git a/fs/exec.c b/fs/exec.c
index e6e8a9a70327..6c96c8feba1f 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1119,7 +1119,7 @@ static int exec_mmap(struct mm_struct *mm)
mmput(old_mm);
return 0;
}
-   mmdrop(active_mm);
+   mmdrop_lazy_tlb(active_mm);
return 0;
 }
 
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 480a4d1b7dd8..ef28059086a1 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -51,6 +51,25 @@ static inline void mmdrop(struct mm_struct *mm)
 
 void mmdrop(struct mm_struct *mm);
 
+static inline void mmgrab_lazy_tlb(struct mm_struct *mm)
+{
+   if (!IS_ENABLED(CONFIG_MMU_LAZY_TLB_SHOOTDOWN))
+   mmgrab(mm);
+}
+
+static inline void mmdrop_lazy_tlb(struct mm_struct *mm)
+{
+   if (!IS_ENABLED(CONFIG_MMU_LAZY_TLB_SHOOTDOWN))
+   mmdrop(mm);
+}
+
+static inline void mmdrop_lazy_tlb_smp_mb(struct mm_struct *mm)
+{
+   mmdrop_lazy_tlb(mm);
+   if (IS_ENABLED(CONFIG_MMU_LAZY_TLB_SHOOTDOWN))
+   smp_mb();
+}
+
 /*
  * This has to be called after a get_task_mm()/mmget_not_zero()
  * followed by taking the mmap_lock for writing before modifying the
diff --git a/kernel/fork.c b/kernel/fork.c
index 142b23645d82..e3f1039cee9f 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -685,6 +685,34 @@ static void check_mm(struct mm_struct *mm)
 #define allocate_mm()  (kmem_cache_alloc(mm_cachep, GFP_KERNEL))
 #define free_mm(mm)(kmem_cache_free(mm_cachep, (mm)))
 
+static void do_shoot_lazy_tlb(void *arg)
+{
+   struct mm_struct *mm = arg;
+
+   if (current->active_mm == mm) {
+   BUG_ON(current->mm);
+   switch_mm(mm, _mm, current);
+   current->active_mm = _mm;
+   }
+}
+
+static void do_check_lazy_tlb(void *arg)
+{
+   struct mm_struct *mm = arg;
+
+   BUG_ON(current->active_mm == mm);
+}
+
+void shoot_lazy_tlbs(struct mm_struct *mm)
+{
+   if (IS_ENABLED(CONFIG_MMU_LAZY_TLB_SHOOTDOWN)) {
+   smp_call_function_many(mm_cpumask(mm), do_shoot_lazy_tlb, (void 
*)mm, 1);
+   do_shoot_lazy_tlb(mm);
+   }
+   smp_call_function(do_check_lazy_tlb, (void *)mm, 1);
+   do_check_lazy_tlb(mm);
+}
+
 /*
  * Called when the last reference to the mm
  * is dropped: either by a lazy thread or by
@@ -692,6 +720,7 @@ static void check_mm(struct mm_struct *mm)
  */
 void __mmdrop(struct mm_struct *mm)
 {
+   shoot_lazy_tlbs(mm);
BUG_ON(mm == _mm);

Re: [PATCH v5 10/26] powerpc/book3s64/pkeys: Convert pkey_total to max_pkey

2020-07-06 Thread Aneesh Kumar K.V

On 7/6/20 12:34 PM, Michael Ellerman wrote:

"Aneesh Kumar K.V"  writes:

max_pkey now represents max key value that userspace can allocate.



I guess commit message is confusing.


Signed-off-by: Aneesh Kumar K.V 
---
  arch/powerpc/include/asm/pkeys.h |  7 +--
  arch/powerpc/mm/book3s64/pkeys.c | 14 +++---
  2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index 75d2a2c19c04..652bad7334f3 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -12,7 +12,7 @@
  #include 
  
  DECLARE_STATIC_KEY_FALSE(pkey_disabled);

-extern int pkeys_total; /* total pkeys as per device tree */
+extern int max_pkey;
  extern u32 initial_allocation_mask; /*  bits set for the initially allocated 
keys */
  extern u32 reserved_allocation_mask; /* bits set for reserved keys */
  
@@ -44,7 +44,10 @@ static inline int vma_pkey(struct vm_area_struct *vma)

return (vma->vm_flags & ARCH_VM_PKEY_FLAGS) >> VM_PKEY_SHIFT;
  }
  
-#define arch_max_pkey() pkeys_total

+static inline int arch_max_pkey(void)
+{
+   return max_pkey;
+}


If pkeys_total = 32 then max_pkey = 31.


we have

#ifdef CONFIG_PPC_4K_PAGES
/*
 * The OS can manage only 8 pkeys due to its inability to represent them
 * in the Linux 4K PTE. Mark all other keys reserved.
 */
max_pkey = min(8, pkeys_total);
#else
max_pkey = pkeys_total;
#endif

so it is 32.



So we can't just substitute one for the other. ie. arch_max_pkey() must
have been wrong, or it is wrong now.


diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index 87d882a9aaf2..a4d7287082a8 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -14,7 +14,7 @@
  
  DEFINE_STATIC_KEY_FALSE(pkey_disabled);

  DEFINE_STATIC_KEY_FALSE(execute_pkey_disabled);
-int  pkeys_total;  /* Total pkeys as per device tree */
+int  max_pkey; /* Maximum key value supported */
  u32  initial_allocation_mask;   /* Bits set for the initially allocated keys 
*/
  /*
   *  Keys marked in the reservation list cannot be allocated by  userspace
@@ -84,7 +84,7 @@ static int scan_pkey_feature(void)
  
  static int pkey_initialize(void)

  {
-   int os_reserved, i;
+   int pkeys_total, i;
  
  	/*

 * We define PKEY_DISABLE_EXECUTE in addition to the arch-neutral
@@ -122,12 +122,12 @@ static int pkey_initialize(void)
 * The OS can manage only 8 pkeys due to its inability to represent them
 * in the Linux 4K PTE. Mark all other keys reserved.
 */
-   os_reserved = pkeys_total - 8;
+   max_pkey = min(8, pkeys_total);


Shouldn't it be 7 ?


  #else
-   os_reserved = 0;
+   max_pkey = pkeys_total;
  #endif
  
-	if (unlikely((pkeys_total - os_reserved) <= execute_only_key)) {

+   if (unlikely(max_pkey <= execute_only_key)) {


Isn't that an off-by-one now?

This is one-off boot time code, there's no need to clutter it with
unlikely.


/*
 * Insufficient number of keys to support
 * execute only key. Mark it unavailable.
@@ -174,10 +174,10 @@ static int pkey_initialize(void)
default_uamor &= ~(0x3ul << pkeyshift(1));
  
  	/*

-* Prevent the usage of OS reserved the keys. Update UAMOR
+* Prevent the usage of OS reserved keys. Update UAMOR
 * for those keys.
 */
-   for (i = (pkeys_total - os_reserved); i < pkeys_total; i++) {
+   for (i = max_pkey; i < pkeys_total; i++) {


Another off-by-one? Shouldn't we start from max_pkey + 1 ?


reserved_allocation_mask |= (0x1 << i);
default_uamor &= ~(0x3ul << pkeyshift(i));
}




It is the commit message. It is max pkey such that userspace can 
allocate 0 - maxp_pkey -1.


-aneesh


Re: [PATCH v5 15/26] powerpc/book3s64/pkeys: Use execute_pkey_disable static key

2020-07-06 Thread Michael Ellerman
"Aneesh Kumar K.V"  writes:
> Use execute_pkey_disabled static key to check for execute key support instead
> of pkey_disabled.
>
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  arch/powerpc/include/asm/pkeys.h | 10 +-
>  arch/powerpc/mm/book3s64/pkeys.c |  5 -
>  2 files changed, 5 insertions(+), 10 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/pkeys.h 
> b/arch/powerpc/include/asm/pkeys.h
> index 47c81d41ea9a..09fbaa409ac4 100644
> --- a/arch/powerpc/include/asm/pkeys.h
> +++ b/arch/powerpc/include/asm/pkeys.h
> @@ -126,15 +126,7 @@ static inline int mm_pkey_free(struct mm_struct *mm, int 
> pkey)
>   * Try to dedicate one of the protection keys to be used as an
>   * execute-only protection key.
>   */
> -extern int __execute_only_pkey(struct mm_struct *mm);
> -static inline int execute_only_pkey(struct mm_struct *mm)
> -{
> - if (static_branch_likely(_disabled))
> - return -1;
> -
> - return __execute_only_pkey(mm);
> -}
> -
> +extern int execute_only_pkey(struct mm_struct *mm);
>  extern int __arch_override_mprotect_pkey(struct vm_area_struct *vma,
>int prot, int pkey);
>  static inline int arch_override_mprotect_pkey(struct vm_area_struct *vma,
> diff --git a/arch/powerpc/mm/book3s64/pkeys.c 
> b/arch/powerpc/mm/book3s64/pkeys.c
> index bbba9c601e14..fed4f159011b 100644
> --- a/arch/powerpc/mm/book3s64/pkeys.c
> +++ b/arch/powerpc/mm/book3s64/pkeys.c
> @@ -345,8 +345,11 @@ void thread_pkey_regs_init(struct thread_struct *thread)
>   write_uamor(default_uamor);
>  }
>  
> -int __execute_only_pkey(struct mm_struct *mm)
> +int execute_only_pkey(struct mm_struct *mm)
>  {
> + if (static_branch_likely(_pkey_disabled))
> + return -1;
> +
>   return mm->context.execute_only_pkey;
>  }

That adds the overhead of a function call, but then uses a static_key to
avoid an easy to predict branch, which seems like a bad tradeoff. And
it's not a performance critical path AFAICS.

Anyway this seems unnecessary:

pkey_early_init_devtree()
{
...
if (unlikely(max_pkey <= execute_only_key)) {
/*
 * Insufficient number of keys to support
 * execute only key. Mark it unavailable.
 */
execute_only_key = -1;

void pkey_mm_init(struct mm_struct *mm)
{
...
mm->context.execute_only_pkey = execute_only_key;
}


ie. Can't it just be:

static inline int execute_only_pkey(struct mm_struct *mm)
{
return mm->context.execute_only_pkey;
}

cheers


Re: [PATCH v5 08/26] powerpc/book3s64/pkeys: Convert execute key support to static key

2020-07-06 Thread Michael Ellerman
"Aneesh Kumar K.V"  writes:
> Convert the bool to a static key like pkey_disabled.

I'm not convinced this is worth the added complexity of a static key.

It's not used in any performance critical paths, except for context
switch, but that's already guarded by:

if (old_thread->iamr != new_thread->iamr)

Which should always be false on machines which don't have the execute
key enabled.

cheers

> diff --git a/arch/powerpc/mm/book3s64/pkeys.c 
> b/arch/powerpc/mm/book3s64/pkeys.c
> index 9e68a08799ee..7d400d5a4076 100644
> --- a/arch/powerpc/mm/book3s64/pkeys.c
> +++ b/arch/powerpc/mm/book3s64/pkeys.c
> @@ -13,13 +13,13 @@
>  #include 
>  
>  DEFINE_STATIC_KEY_TRUE(pkey_disabled);
> +DEFINE_STATIC_KEY_FALSE(execute_pkey_disabled);
>  int  pkeys_total;/* Total pkeys as per device tree */
>  u32  initial_allocation_mask;   /* Bits set for the initially allocated keys 
> */
>  /*
>   *  Keys marked in the reservation list cannot be allocated by  userspace
>   */
>  u32  reserved_allocation_mask;
> -static bool pkey_execute_disable_supported;
>  static u64 default_amr;
>  static u64 default_iamr;
>  /* Allow all keys to be modified by default */
> @@ -116,9 +116,7 @@ static int pkey_initialize(void)
>* execute_disable support. Instead we use a PVR check.
>*/
>   if (pvr_version_is(PVR_POWER7) || pvr_version_is(PVR_POWER7p))
> - pkey_execute_disable_supported = false;
> - else
> - pkey_execute_disable_supported = true;
> + static_branch_enable(_pkey_disabled);
>  
>  #ifdef CONFIG_PPC_4K_PAGES
>   /*
> @@ -214,7 +212,7 @@ static inline void write_amr(u64 value)
>  
>  static inline u64 read_iamr(void)
>  {
> - if (!likely(pkey_execute_disable_supported))
> + if (static_branch_unlikely(_pkey_disabled))
>   return 0x0UL;
>  
>   return mfspr(SPRN_IAMR);
> @@ -222,7 +220,7 @@ static inline u64 read_iamr(void)
>  
>  static inline void write_iamr(u64 value)
>  {
> - if (!likely(pkey_execute_disable_supported))
> + if (static_branch_unlikely(_pkey_disabled))
>   return;
>  
>   mtspr(SPRN_IAMR, value);
> @@ -282,7 +280,7 @@ int __arch_set_user_pkey_access(struct task_struct *tsk, 
> int pkey,
>   return -EINVAL;
>  
>   if (init_val & PKEY_DISABLE_EXECUTE) {
> - if (!pkey_execute_disable_supported)
> + if (static_branch_unlikely(_pkey_disabled))
>   return -EINVAL;
>   new_iamr_bits |= IAMR_EX_BIT;
>   }
> -- 
> 2.26.2


Re: [PATCH v5 11/26] powerpc/book3s64/pkeys: Make initial_allocation_mask static

2020-07-06 Thread Michael Ellerman
"Aneesh Kumar K.V"  writes:
> initial_allocation_mask is not used outside this file.

And never modified after init, so make it __ro_after_init as well?

cheers

> diff --git a/arch/powerpc/include/asm/pkeys.h 
> b/arch/powerpc/include/asm/pkeys.h
> index 652bad7334f3..47c81d41ea9a 100644
> --- a/arch/powerpc/include/asm/pkeys.h
> +++ b/arch/powerpc/include/asm/pkeys.h
> @@ -13,7 +13,6 @@
>  
>  DECLARE_STATIC_KEY_FALSE(pkey_disabled);
>  extern int max_pkey;
> -extern u32 initial_allocation_mask; /*  bits set for the initially allocated 
> keys */
>  extern u32 reserved_allocation_mask; /* bits set for reserved keys */
>  
>  #define ARCH_VM_PKEY_FLAGS (VM_PKEY_BIT0 | VM_PKEY_BIT1 | VM_PKEY_BIT2 | \
> diff --git a/arch/powerpc/mm/book3s64/pkeys.c 
> b/arch/powerpc/mm/book3s64/pkeys.c
> index a4d7287082a8..73b5ef1490c8 100644
> --- a/arch/powerpc/mm/book3s64/pkeys.c
> +++ b/arch/powerpc/mm/book3s64/pkeys.c
> @@ -15,11 +15,11 @@
>  DEFINE_STATIC_KEY_FALSE(pkey_disabled);
>  DEFINE_STATIC_KEY_FALSE(execute_pkey_disabled);
>  int  max_pkey;   /* Maximum key value supported */
> -u32  initial_allocation_mask;   /* Bits set for the initially allocated keys 
> */
>  /*
>   *  Keys marked in the reservation list cannot be allocated by  userspace
>   */
>  u32  reserved_allocation_mask;
> +static u32  initial_allocation_mask;   /* Bits set for the initially 
> allocated keys */
>  static u64 default_amr;
>  static u64 default_iamr;
>  /* Allow all keys to be modified by default */
> -- 
> 2.26.2


Re: [PATCH v5 10/26] powerpc/book3s64/pkeys: Convert pkey_total to max_pkey

2020-07-06 Thread Michael Ellerman
"Aneesh Kumar K.V"  writes:
> max_pkey now represents max key value that userspace can allocate.
>
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  arch/powerpc/include/asm/pkeys.h |  7 +--
>  arch/powerpc/mm/book3s64/pkeys.c | 14 +++---
>  2 files changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/pkeys.h 
> b/arch/powerpc/include/asm/pkeys.h
> index 75d2a2c19c04..652bad7334f3 100644
> --- a/arch/powerpc/include/asm/pkeys.h
> +++ b/arch/powerpc/include/asm/pkeys.h
> @@ -12,7 +12,7 @@
>  #include 
>  
>  DECLARE_STATIC_KEY_FALSE(pkey_disabled);
> -extern int pkeys_total; /* total pkeys as per device tree */
> +extern int max_pkey;
>  extern u32 initial_allocation_mask; /*  bits set for the initially allocated 
> keys */
>  extern u32 reserved_allocation_mask; /* bits set for reserved keys */
>  
> @@ -44,7 +44,10 @@ static inline int vma_pkey(struct vm_area_struct *vma)
>   return (vma->vm_flags & ARCH_VM_PKEY_FLAGS) >> VM_PKEY_SHIFT;
>  }
>  
> -#define arch_max_pkey() pkeys_total
> +static inline int arch_max_pkey(void)
> +{
> + return max_pkey;
> +}

If pkeys_total = 32 then max_pkey = 31.

So we can't just substitute one for the other. ie. arch_max_pkey() must
have been wrong, or it is wrong now.

> diff --git a/arch/powerpc/mm/book3s64/pkeys.c 
> b/arch/powerpc/mm/book3s64/pkeys.c
> index 87d882a9aaf2..a4d7287082a8 100644
> --- a/arch/powerpc/mm/book3s64/pkeys.c
> +++ b/arch/powerpc/mm/book3s64/pkeys.c
> @@ -14,7 +14,7 @@
>  
>  DEFINE_STATIC_KEY_FALSE(pkey_disabled);
>  DEFINE_STATIC_KEY_FALSE(execute_pkey_disabled);
> -int  pkeys_total;/* Total pkeys as per device tree */
> +int  max_pkey;   /* Maximum key value supported */
>  u32  initial_allocation_mask;   /* Bits set for the initially allocated keys 
> */
>  /*
>   *  Keys marked in the reservation list cannot be allocated by  userspace
> @@ -84,7 +84,7 @@ static int scan_pkey_feature(void)
>  
>  static int pkey_initialize(void)
>  {
> - int os_reserved, i;
> + int pkeys_total, i;
>  
>   /*
>* We define PKEY_DISABLE_EXECUTE in addition to the arch-neutral
> @@ -122,12 +122,12 @@ static int pkey_initialize(void)
>* The OS can manage only 8 pkeys due to its inability to represent them
>* in the Linux 4K PTE. Mark all other keys reserved.
>*/
> - os_reserved = pkeys_total - 8;
> + max_pkey = min(8, pkeys_total);

Shouldn't it be 7 ?

>  #else
> - os_reserved = 0;
> + max_pkey = pkeys_total;
>  #endif
>  
> - if (unlikely((pkeys_total - os_reserved) <= execute_only_key)) {
> + if (unlikely(max_pkey <= execute_only_key)) {

Isn't that an off-by-one now?

This is one-off boot time code, there's no need to clutter it with
unlikely.

>   /*
>* Insufficient number of keys to support
>* execute only key. Mark it unavailable.
> @@ -174,10 +174,10 @@ static int pkey_initialize(void)
>   default_uamor &= ~(0x3ul << pkeyshift(1));
>  
>   /*
> -  * Prevent the usage of OS reserved the keys. Update UAMOR
> +  * Prevent the usage of OS reserved keys. Update UAMOR
>* for those keys.
>*/
> - for (i = (pkeys_total - os_reserved); i < pkeys_total; i++) {
> + for (i = max_pkey; i < pkeys_total; i++) {

Another off-by-one? Shouldn't we start from max_pkey + 1 ?

>   reserved_allocation_mask |= (0x1 << i);
>   default_uamor &= ~(0x3ul << pkeyshift(i));
>   }

cheers


[PATCH] powerpc/numa: Restrict possible nodes based on platform

2020-07-06 Thread Srikar Dronamraju
As per PAPR, there are 2 device tree property
ibm,max-associativity-domains (which defines the maximum number of
domains that the firmware i.e PowerVM can support) and
ibm,current-associativity-domains (which defines the maximum number of
domains that the platform can support). Value of
ibm,max-associativity-domains property is always greater than or equal
to ibm,current-associativity-domains property.

Powerpc currently uses ibm,max-associativity-domains  property while
setting the possible number of nodes. This is currently set at 32.
However the possible number of nodes for a platform may be significantly
less. Hence set the possible number of nodes based on
ibm,current-associativity-domains property.

$ lsprop /proc/device-tree/rtas/ibm,*associ*-domains
/proc/device-tree/rtas/ibm,current-associativity-domains
 0005 0001 0002 0002 0002 0010
/proc/device-tree/rtas/ibm,max-associativity-domains
 0005 0001 0008 0020 0020 0100

$ cat /sys/devices/system/node/possible ##Before patch
0-31

$ cat /sys/devices/system/node/possible ##After patch
0-1

Note the maximum nodes this platform can support is only 2 but the
possible nodes is set to 32.

This is important because lot of kernel and user space code allocate
structures for all possible nodes leading to a lot of memory that is
allocated but not used.

I ran a simple experiment to create and destroy 100 memory cgroups on
boot on a 8 node machine (Power8 Alpine).

Before patch
free -k at boot
  totalusedfree  shared  buff/cache   available
Mem:  523498176 4106816   518820608   22272  570752   516606720
Swap:   4194240   0 4194240

free -k after creating 100 memory cgroups
  totalusedfree  shared  buff/cache   available
Mem:  523498176 4628416   518246464   22336  623296   516058688
Swap:   4194240   0 4194240

free -k after destroying 100 memory cgroups
  totalusedfree  shared  buff/cache   available
Mem:  523498176 4697408   518173760   22400  627008   515987904
Swap:   4194240   0 4194240

After patch
free -k at boot
  totalusedfree  shared  buff/cache   available
Mem:  523498176 3969472   518933888   22272  594816   516731776
Swap:   4194240   0 4194240

free -k after creating 100 memory cgroups
  totalusedfree  shared  buff/cache   available
Mem:  523498176 4181888   518676096   22208  640192   516496448
Swap:   4194240   0 4194240

free -k after destroying 100 memory cgroups
  totalusedfree  shared  buff/cache   available
Mem:  523498176 4232320   518619904   22272  645952   516443264
Swap:   4194240   0 4194240

Observations:
Fixed kernel takes 137344 kb (4106816-3969472) less to boot.
Fixed kernel takes 309184 kb (4628416-4181888-137344) less to create 100 memcgs.

Cc: Nathan Lynch 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Anton Blanchard 
Cc: Bharata B Rao 
Signed-off-by: Srikar Dronamraju 
---
 arch/powerpc/mm/numa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 9fcf2d195830..3d55cef1a2dc 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -897,7 +897,7 @@ static void __init find_possible_nodes(void)
return;
 
if (of_property_read_u32_index(rtas,
-   "ibm,max-associativity-domains",
+   "ibm,current-associativity-domains",
min_common_depth, ))
goto out;
 
-- 
2.18.2



[RFC v2 2/2] powerpc/powernv : Introduce capability for firmware-enabled-stop

2020-07-06 Thread Abhishek Goel
This patch introduces the capability for firmware to handle the stop
states instead. A bit is set based on the discovery of the feature
and correspondingly also the responsibility to handle the stop states.

If Kernel does not know about stop version, it can fallback to opal for
idle stop support if firmware-stop-supported property is present.

Earlier part of this patch was posted in this series :
https://lkml.org/lkml/2020/3/4/589

Signed-off-by: Abhishek Goel 
Signed-off-by: Pratik Rajesh Sampat 
---
v1->v2 : Combined patch 2 and 3 from previous iteration and rebased it.

 arch/powerpc/include/asm/processor.h  | 18 ++
 arch/powerpc/kernel/dt_cpu_ftrs.c | 13 +
 arch/powerpc/platforms/powernv/idle.c | 20 
 drivers/cpuidle/cpuidle-powernv.c |  3 ++-
 4 files changed, 49 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index bfa336fbcfeb..b8de7146387c 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -428,6 +428,24 @@ extern void power4_idle_nap(void);
 extern unsigned long cpuidle_disable;
 enum idle_boot_override {IDLE_NO_OVERRIDE = 0, IDLE_POWERSAVE_OFF};
 
+#define STOP_ENABLE0x0001
+#define FIRMWARE_STOP_ENABLE   0x0010
+
+#define STOP_VERSION_P9   0x1
+
+/*
+ * Classify the dependencies of the stop states
+ * @idle_stop: function handler to handle the quirk stop version
+ * @cpuidle_prop: Signify support for stop states through kernel and/or 
firmware
+ * @stop_version: Classify quirk versions for stop states
+ */
+typedef struct {
+   unsigned long (*idle_stop)(unsigned long psscr, bool mmu_on);
+   uint8_t cpuidle_prop;
+   uint8_t stop_version;
+} stop_deps_t;
+extern stop_deps_t stop_dep;
+
 extern int powersave_nap;  /* set if nap mode can be used in idle loop */
 
 extern void power7_idle_type(unsigned long type);
diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c 
b/arch/powerpc/kernel/dt_cpu_ftrs.c
index 36bc0d5c4f3a..737686fae3c7 100644
--- a/arch/powerpc/kernel/dt_cpu_ftrs.c
+++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
@@ -291,6 +291,15 @@ static int __init feat_enable_idle_stop(struct 
dt_cpu_feature *f)
lpcr |=  LPCR_PECE1;
lpcr |=  LPCR_PECE2;
mtspr(SPRN_LPCR, lpcr);
+   stop_dep.cpuidle_prop |= STOP_ENABLE;
+   stop_dep.stop_version = STOP_VERSION_P9;
+
+   return 1;
+}
+
+static int __init feat_enable_firmware_stop(struct dt_cpu_feature *f)
+{
+   stop_dep.cpuidle_prop |= FIRMWARE_STOP_ENABLE;
 
return 1;
 }
@@ -589,6 +598,7 @@ static struct dt_cpu_feature_match __initdata
{"idle-nap", feat_enable_idle_nap, 0},
{"alignment-interrupt-dsisr", feat_enable_align_dsisr, 0},
{"idle-stop", feat_enable_idle_stop, 0},
+   {"firmware-stop-supported", feat_enable_firmware_stop, 0},
{"machine-check-power8", feat_enable_mce_power8, 0},
{"performance-monitor-power8", feat_enable_pmu_power8, 0},
{"data-stream-control-register", feat_enable_dscr, CPU_FTR_DSCR},
@@ -656,6 +666,9 @@ static void __init cpufeatures_setup_start(u32 isa)
}
 }
 
+stop_deps_t stop_dep = {NULL, 0x0, 0x0};
+EXPORT_SYMBOL(stop_dep);
+
 static bool __init cpufeatures_process_feature(struct dt_cpu_feature *f)
 {
const struct dt_cpu_feature_match *m;
diff --git a/arch/powerpc/platforms/powernv/idle.c 
b/arch/powerpc/platforms/powernv/idle.c
index 3afd4293f729..3602950f6c08 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -824,7 +824,7 @@ static unsigned long power9_offline_stop(unsigned long 
psscr)
 
 #ifndef CONFIG_KVM_BOOK3S_HV_POSSIBLE
__ppc64_runlatch_off();
-   srr1 = power9_idle_stop(psscr, true);
+   srr1 = stop_dep.idle_stop(psscr, true);
__ppc64_runlatch_on();
 #else
/*
@@ -840,7 +840,7 @@ static unsigned long power9_offline_stop(unsigned long 
psscr)
local_paca->kvm_hstate.hwthread_state = KVM_HWTHREAD_IN_IDLE;
 
__ppc64_runlatch_off();
-   srr1 = power9_idle_stop(psscr, false);
+   srr1 = stop_dep.idle_stop(psscr, true);
__ppc64_runlatch_on();
 
local_paca->kvm_hstate.hwthread_state = KVM_HWTHREAD_IN_KERNEL;
@@ -868,7 +868,7 @@ void power9_idle_type(unsigned long stop_psscr_val,
psscr = (psscr & ~stop_psscr_mask) | stop_psscr_val;
 
__ppc64_runlatch_off();
-   srr1 = power9_idle_stop(psscr, true);
+   srr1 = stop_dep.idle_stop(psscr, true);
__ppc64_runlatch_on();
 
fini_irq_for_idle_irqsoff();
@@ -1365,8 +1365,20 @@ static int __init pnv_init_idle_states(void)
nr_pnv_idle_states = 0;
supported_cpuidle_states = 0;
 
-   if (cpuidle_disable != IDLE_NO_OVERRIDE)
+   if (cpuidle_disable != IDLE_NO_OVERRIDE ||
+   !(stop_dep.cpuidle_prop & STOP_ENABLE))
goto out;
+
+   /* Check for supported version