Re: [PATCH v3 0/6] powerpc: queued spinlocks and rwlocks
Excerpts from Waiman Long's message of July 7, 2020 4:39 am:
> On 7/6/20 12:35 AM, Nicholas Piggin wrote:
>> v3 is updated to use __pv_queued_spin_unlock, noticed by Waiman (thank you).
>>
>> Thanks,
>> Nick
>>
>> Nicholas Piggin (6):
>>   powerpc/powernv: must include hvcall.h to get PAPR defines
>>   powerpc/pseries: move some PAPR paravirt functions to their own file
>>   powerpc: move spinlock implementation to simple_spinlock
>>   powerpc/64s: implement queued spinlocks and rwlocks
>>   powerpc/pseries: implement paravirt qspinlocks for SPLPAR
>>   powerpc/qspinlock: optimised atomic_try_cmpxchg_lock that adds the
>>     lock hint
>>
>>  arch/powerpc/Kconfig                              |  13 +
>>  arch/powerpc/include/asm/Kbuild                   |   2 +
>>  arch/powerpc/include/asm/atomic.h                 |  28 ++
>>  arch/powerpc/include/asm/paravirt.h               |  89 +
>>  arch/powerpc/include/asm/qspinlock.h              |  91 ++
>>  arch/powerpc/include/asm/qspinlock_paravirt.h     |   7 +
>>  arch/powerpc/include/asm/simple_spinlock.h        | 292 +
>>  .../include/asm/simple_spinlock_types.h           |  21 ++
>>  arch/powerpc/include/asm/spinlock.h               | 308 +-
>>  arch/powerpc/include/asm/spinlock_types.h         |  17 +-
>>  arch/powerpc/lib/Makefile                         |   3 +
>>  arch/powerpc/lib/locks.c                          |  12 +-
>>  arch/powerpc/platforms/powernv/pci-ioda-tce.c     |   1 +
>>  arch/powerpc/platforms/pseries/Kconfig            |   5 +
>>  arch/powerpc/platforms/pseries/setup.c            |   6 +-
>>  include/asm-generic/qspinlock.h                   |   4 +
>>  16 files changed, 577 insertions(+), 322 deletions(-)
>>  create mode 100644 arch/powerpc/include/asm/paravirt.h
>>  create mode 100644 arch/powerpc/include/asm/qspinlock.h
>>  create mode 100644 arch/powerpc/include/asm/qspinlock_paravirt.h
>>  create mode 100644 arch/powerpc/include/asm/simple_spinlock.h
>>  create mode 100644 arch/powerpc/include/asm/simple_spinlock_types.h
>>
> This patch looks OK to me.

Thanks for reviewing and testing.

> I had run some microbenchmarks on a powerpc system with and without the
> patch.
>
> On a 2-socket 160-thread SMT4 POWER9 system (not virtualized):
>
> 5.8.0-rc4
> =========
>
> Running locktest with spinlock [runtime = 10s, load = 1]
> Threads = 160, Min/Mean/Max = 77,665/90,153/106,895
> Threads = 160, Total Rate = 1,441,759 op/s; Percpu Rate = 9,011 op/s
>
> Running locktest with rwlock [runtime = 10s, r% = 50%, load = 1]
> Threads = 160, Min/Mean/Max = 47,879/53,807/63,689
> Threads = 160, Total Rate = 860,192 op/s; Percpu Rate = 5,376 op/s
>
> Running locktest with spinlock [runtime = 10s, load = 1]
> Threads = 80, Min/Mean/Max = 242,907/319,514/463,161
> Threads = 80, Total Rate = 2,555 kop/s; Percpu Rate = 32 kop/s
>
> Running locktest with rwlock [runtime = 10s, r% = 50%, load = 1]
> Threads = 80, Min/Mean/Max = 146,161/187,474/259,270
> Threads = 80, Total Rate = 1,498 kop/s; Percpu Rate = 19 kop/s
>
> Running locktest with spinlock [runtime = 10s, load = 1]
> Threads = 40, Min/Mean/Max = 646,639/1,000,817/1,455,205
> Threads = 40, Total Rate = 4,001 kop/s; Percpu Rate = 100 kop/s
>
> Running locktest with rwlock [runtime = 10s, r% = 50%, load = 1]
> Threads = 40, Min/Mean/Max = 402,165/597,132/814,555
> Threads = 40, Total Rate = 2,388 kop/s; Percpu Rate = 60 kop/s
>
> 5.8.0-rc4-qlock+
> ================
>
> Running locktest with spinlock [runtime = 10s, load = 1]
> Threads = 160, Min/Mean/Max = 123,835/124,580/124,587
> Threads = 160, Total Rate = 1,992 kop/s; Percpu Rate = 12 kop/s
>
> Running locktest with rwlock [runtime = 10s, r% = 50%, load = 1]
> Threads = 160, Min/Mean/Max = 254,210/264,714/276,784
> Threads = 160, Total Rate = 4,231 kop/s; Percpu Rate = 26 kop/s
>
> Running locktest with spinlock [runtime = 10s, load = 1]
> Threads = 80, Min/Mean/Max = 599,715/603,397/603,450
> Threads = 80, Total Rate = 4,825 kop/s; Percpu Rate = 60 kop/s
>
> Running locktest with rwlock [runtime = 10s, r% = 50%, load = 1]
> Threads = 80, Min/Mean/Max = 492,687/525,224/567,456
> Threads = 80, Total Rate = 4,199 kop/s; Percpu Rate = 52 kop/s
>
> Running locktest with spinlock [runtime = 10s, load = 1]
> Threads = 40, Min/Mean/Max = 1,325,623/1,325,628/1,325,636
> Threads = 40, Total Rate = 5,299 kop/s; Percpu Rate = 132 kop/s
>
> Running locktest with rwlock [runtime = 10s, r% = 50%, load = 1]
> Threads = 40, Min/Mean/Max = 1,249,731/1,292,977/1,342,815
> Threads = 40, Total Rate = 5,168 kop/s; Percpu Rate = 129 kop/s
>
> On systems with a large number of cpus, qspinlock is faster and more fair.
>
> With some tuning, we may be able to squeeze out more performance.

Yes, powerpc could certainly get more performance out of the slow paths, and
then there are a few parameters to tune. We don't have a good alternate
patching for function calls yet, but that would be something to do for
native vs pv. And then there seem to be one or
Re: [PATCH] powerpc/pseries: detect secure and trusted boot state of the system.
Nayna Jain writes:
> The device-tree property to check secure and trusted boot state is
> different for guests (pseries) compared to baremetal (powernv).
>
> This patch updates the existing is_ppc_secureboot_enabled() and
> is_ppc_trustedboot_enabled() functions to add support for pseries.
>
> Signed-off-by: Nayna Jain
> ---
>  arch/powerpc/kernel/secure_boot.c | 31 +--
>  1 file changed, 25 insertions(+), 6 deletions(-)
>
> diff --git a/arch/powerpc/kernel/secure_boot.c b/arch/powerpc/kernel/secure_boot.c
> index 4b982324d368..43fc6607c7a5 100644
> --- a/arch/powerpc/kernel/secure_boot.c
> +++ b/arch/powerpc/kernel/secure_boot.c
> @@ -6,6 +6,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  static struct device_node *get_ppc_fw_sb_node(void)
>  {
> @@ -23,11 +24,20 @@ bool is_ppc_secureboot_enabled(void)
>  {
>  	struct device_node *node;
>  	bool enabled = false;
> +	const u32 *secureboot;
>
> -	node = get_ppc_fw_sb_node();
> -	enabled = of_property_read_bool(node, "os-secureboot-enforcing");
> +	if (machine_is(powernv)) {
> +		node = get_ppc_fw_sb_node();
> +		enabled =
> +			of_property_read_bool(node, "os-secureboot-enforcing");
> +		of_node_put(node);
> +	}

We generally try to avoid adding new machine_is() checks if we can.

In a case like this I think you can just check for both properties
regardless of what platform you're on.

> -	of_node_put(node);
> +	if (machine_is(pseries)) {
> +		secureboot = of_get_property(of_root, "ibm,secure-boot", NULL);
> +		if (secureboot)
> +			enabled = (*secureboot > 1) ? true : false;
> +	}

Please don't use of_get_property() in new code. Use one of the properly
typed accessors that handles endian conversion for you.

cheers
Re: [PATCH] powerpc: select ARCH_HAS_MEMBARRIER_SYNC_CORE
Excerpts from Christophe Leroy's message of July 6, 2020 7:53 pm:
>
> Le 06/07/2020 à 04:18, Nicholas Piggin a écrit :
>> diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
>> index 47bd4ea0837d..b88cb3a989b6 100644
>> --- a/arch/powerpc/include/asm/exception-64s.h
>> +++ b/arch/powerpc/include/asm/exception-64s.h
>> @@ -68,6 +68,10 @@
>>   *
>>   * The nop instructions allow us to insert one or more instructions to flush the
>>   * L1-D cache when returning to userspace or a guest.
>> + *
>> + * powerpc relies on return from interrupt/syscall being context synchronising
>> + * (which hrfid, rfid, and rfscv are) to support ARCH_HAS_MEMBARRIER_SYNC_CORE
>> + * without additional synchronisation instructions.
>
> This file is dedicated to BOOK3S/64. What about other ones?
>
> On 32 bits, this is also valid as 'rfi' is also context synchronising,
> but then why just add some comment in exception-64s.h and only there?

Yeah you're right. I basically wanted to keep a note there just in case,
because it's possible we would get a less synchronising return (maybe
unlikely with meltdown) or even return from a kernel interrupt using
something faster (e.g., bctar if we don't use the tar register in the
kernel anywhere).

So I wonder where to add the note: entry_32.S and 64e.h as well?

I should actually change the comment for 64-bit, because soft-masked
interrupt replay is an interesting case. I thought it was okay (because
the IPI would cause a hard interrupt which does do the rfi) but that
should at least be written down. The context synchronisation happens
before the Linux IPI function is called, but for the purpose of
membarrier I think that is okay (the membarrier just needs to have
caused a memory barrier + context synchronisation by the time it
returns).

Thanks,
Nick
Re: [PATCH v3 1/2] powerpc/perf/hv-24x7: Add cpu hotplug support
On 7/7/20 10:26 AM, Michael Ellerman wrote:
> Madhavan Srinivasan writes:
>> On 7/6/20 8:43 AM, Michael Ellerman wrote:
>>> Kajol Jain writes:
>>>> Patch here adds cpu hotplug functions to hv_24x7 pmu.
>>>> A new cpuhp_state "CPUHP_AP_PERF_POWERPC_HV_24x7_ONLINE" enum
>>>> is added.
>>>>
>>>> The online callback function updates the cpumask only if its
>>>> empty. As the primary intention of adding hotplug support
>>>> is to designate a CPU to make HCALL to collect the
>>>> counter data.
>>>>
>>>> The offline function test and clear corresponding cpu in a cpumask
>>>> and update cpumask to any other active cpu.
>>>>
>>>> Signed-off-by: Kajol Jain
>>>> Reviewed-by: Gautham R. Shenoy
>>>> ---
>>>>  arch/powerpc/perf/hv-24x7.c | 45 +
>>>>  include/linux/cpuhotplug.h  |  1 +
>>>>  2 files changed, 46 insertions(+)
>>>>
>>>> diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
>>>> index db213eb7cb02..ce4739e2b407 100644
>>>> --- a/arch/powerpc/perf/hv-24x7.c
>>>> +++ b/arch/powerpc/perf/hv-24x7.c
>>>> @@ -31,6 +31,8 @@ static int interface_version;
>>>>  /* Whether we have to aggregate result data for some domains. */
>>>>  static bool aggregate_result_elements;
>>>>
>>>> +static cpumask_t hv_24x7_cpumask;
>>>> +
>>>>  static bool domain_is_valid(unsigned domain)
>>>>  {
>>>>  	switch (domain) {
>>>> @@ -1641,6 +1643,44 @@ static struct pmu h_24x7_pmu = {
>>>>  	.capabilities = PERF_PMU_CAP_NO_EXCLUDE,
>>>>  };
>>>>
>>>> +static int ppc_hv_24x7_cpu_online(unsigned int cpu)
>>>> +{
>>>> +	/* Make this CPU the designated target for counter collection */
>>>
>>> The comment implies every newly onlined CPU will become the target, but
>>> actually it's only the first onlined CPU.
>>>
>>> So I think the comment needs updating, or you could just drop the
>>> comment, I think the code is fairly clear by itself.
>>>
>>>> +	if (cpumask_empty(&hv_24x7_cpumask))
>>>> +		cpumask_set_cpu(cpu, &hv_24x7_cpumask);
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static int ppc_hv_24x7_cpu_offline(unsigned int cpu)
>>>> +{
>>>> +	int target = -1;
>>>
>>> No need to initialise target, you assign to it unconditionally below.
>>>
>>>> +	/* Check if exiting cpu is used for collecting 24x7 events */
>>>> +	if (!cpumask_test_and_clear_cpu(cpu, &hv_24x7_cpumask))
>>>> +		return 0;
>>>> +
>>>> +	/* Find a new cpu to collect 24x7 events */
>>>> +	target = cpumask_last(cpu_active_mask);
>>>
>>> Any reason to use cpumask_last() vs cpumask_first(), or a randomly
>>> chosen CPU?
>>>
>>>> +	if (target < 0 || target >= nr_cpu_ids)
>>>> +		return -1;
>>>> +
>>>> +	/* Migrate 24x7 events to the new target */
>>>> +	cpumask_set_cpu(target, &hv_24x7_cpumask);
>>>> +	perf_pmu_migrate_context(&h_24x7_pmu, cpu, target);
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static int hv_24x7_cpu_hotplug_init(void)
>>>> +{
>>>> +	return cpuhp_setup_state(CPUHP_AP_PERF_POWERPC_HV_24x7_ONLINE,
>>>> +				 "perf/powerpc/hv_24x7:online",
>>>> +				 ppc_hv_24x7_cpu_online,
>>>> +				 ppc_hv_24x7_cpu_offline);
>>>> +}
>>>> +
>>>>  static int hv_24x7_init(void)
>>>>  {
>>>>  	int r;
>>>> @@ -1685,6 +1725,11 @@ static int hv_24x7_init(void)
>>>>  	if (r)
>>>>  		return r;
>>>>
>>>> +	/* init cpuhotplug */
>>>> +	r = hv_24x7_cpu_hotplug_init();
>>>> +	if (r)
>>>> +		pr_err("hv_24x7: CPU hotplug init failed\n");
>>>> +
>>>
>>> The hotplug initialisation shouldn't fail unless something is badly
>>> wrong. I think you should just fail initialisation of the entire PMU if
>>> that happens, which will make the error handling in the next patch much
>>> simpler.
>>
>> We did fail the PMU registration on failure of the hotplug
>> code (and yes error handling is much simpler), but on internal
>> review/discussion, what came up was that, hv_24x7 PMU will still
>> be usable without the hotplug code (with "-C" option to perf tool
>> command line).
>
> In theory yes. But in reality no one will ever test that case, so the
> code will easily bit rot.
>
> Even if it doesn't bit rot, you've now created another state the system
> can legally be in (hotplug init failed but PMU still probed), which you
> have to test, document & support.
>
> If the hotplug init fails then something is badly wrong, the best thing
> we can do is bail on the PMU initialisation and hope the rest of the
> system boots OK.

Yep agreed. Thanks for the comments mpe.

Maddy

> cheers
Re: [PATCH v12 00/31] Speculative page faults
On Mon, 2020-07-06 at 14:27 +0200, Laurent Dufour wrote:
> Le 06/07/2020 à 11:25, Chinwen Chang a écrit :
>> On Thu, 2019-06-20 at 16:19 +0800, Haiyan Song wrote:
>>> Hi Laurent,
>>>
>>> I downloaded your script and ran it on an Intel 2-socket Skylake
>>> platform with the spf-v12 patch series.
>>>
>>> Here attached the output results of this script.
>>>
>>> The following comparison result is statistics from the script outputs.
>>>
>>> a) THP enabled:
>>>                                               SPF_0      change  SPF_1
>>> will-it-scale.page_fault2.per_thread_ops  2664190.8  -11.7%  2353637.6
>>> will-it-scale.page_fault3.per_thread_ops  4480027.2  -14.7%  3819331.9
>>>
>>> b) THP disabled:
>>>                                               SPF_0      change  SPF_1
>>> will-it-scale.page_fault2.per_thread_ops  2653260.7  -10%    2385165.8
>>> will-it-scale.page_fault3.per_thread_ops  4436330.1  -12.4%  3886734.2
>>>
>>> Thanks,
>>> Haiyan Song
>>>
>>> On Fri, Jun 14, 2019 at 10:44:47AM +0200, Laurent Dufour wrote:
>>>> Le 14/06/2019 à 10:37, Laurent Dufour a écrit :
>>>>> Please find attached the script I run to get these numbers.
>>>>> This would be nice if you could give it a try on your victim node
>>>>> and share the result.
>>>>
>>>> Sounds like the Intel mail filtering system doesn't like the attached
>>>> shell script. Please find it there:
>>>> https://gist.github.com/ldu4/a5cc1a93f293108ea387d43d5d5e7f44
>>>>
>>>> Thanks,
>>>> Laurent.
>>
>> Hi Laurent,
>>
>> We merged SPF v11 and some patches from v12 into our platforms. After
>> several experiments, we observed SPF has obvious improvements on the
>> launch time of applications, especially for high-TLP ones.
>>
>> # launch time of applications (s):
>>
>> package           version    w/ SPF  w/o SPF  improve(%)
>> ------------------------------------------------------
>> Baidu maps        10.13.3    0.887   0.98     9.49
>> Taobao            8.4.0.35   1.227   1.293    5.10
>> Meituan           9.12.401   1.107   1.543    28.26
>> WeChat            7.0.3      2.353   2.68     12.20
>> Honor of Kings    1.43.1.6   6.63    6.713    1.24
>
> That's great news, thanks for reporting this!
>
>> By the way, we have verified our platforms with those patches and
>> achieved the goal of mass production.
>
> Another good news!
> For my information, what is your targeted hardware?
>
> Cheers,
> Laurent.

Hi Laurent,

Our targeted hardware belongs to an ARM64 multi-core series.

Thanks.
Chinwen
Re: [PATCH] powerpc/numa: Restrict possible nodes based on platform
Srikar Dronamraju writes:
> As per PAPR, there are two device-tree properties:
> ibm,max-associativity-domains (which defines the maximum number of
> domains that the firmware, i.e. PowerVM, can support) and
> ibm,current-associativity-domains (which defines the maximum number of
> domains that the platform can support). The value of the
> ibm,max-associativity-domains property is always greater than or equal
> to that of the ibm,current-associativity-domains property.

Where is it documented? It's definitely not in LoPAPR.

> Powerpc currently uses the ibm,max-associativity-domains property while
> setting the possible number of nodes. This is currently set at 32.
> However the possible number of nodes for a platform may be significantly
> less. Hence set the possible number of nodes based on the
> ibm,current-associativity-domains property.
>
> $ lsprop /proc/device-tree/rtas/ibm,*associ*-domains
> /proc/device-tree/rtas/ibm,current-associativity-domains
> 		 00000005 00000001 00000002 00000002 00000002 00000010
> /proc/device-tree/rtas/ibm,max-associativity-domains
> 		 00000005 00000001 00000008 00000020 00000020 00000100
>
> $ cat /sys/devices/system/node/possible   ## Before patch
> 0-31
>
> $ cat /sys/devices/system/node/possible   ## After patch
> 0-1
>
> Note the maximum nodes this platform can support is only 2 but the
> possible nodes is set to 32.

But what about LPM to a system with more nodes?

cheers
[PATCH v2] MAINTAINERS: Add Shengjiu to reviewer list of sound/soc/fsl
Add Shengjiu who's actively working on the latest fsl/nxp audio drivers.

Signed-off-by: Nicolin Chen
Acked-by: Shengjiu Wang
Reviewed-by: Fabio Estevam
---
Changelog v1->v2:
 * Replaced Shengjiu's email address with his gmail one
 * Added Acked-by from Shengjiu and Reviewed-by from Fabio

 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 496fd4eafb68..ff97b8cefaea 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6956,6 +6956,7 @@ M:	Timur Tabi
 M:	Nicolin Chen
 M:	Xiubo Li
 R:	Fabio Estevam
+R:	Shengjiu Wang
 L:	alsa-de...@alsa-project.org (moderated for non-subscribers)
 L:	linuxppc-dev@lists.ozlabs.org
 S:	Maintained
-- 
2.17.1
Re: [PATCH v3 1/2] powerpc/perf/hv-24x7: Add cpu hotplug support
Madhavan Srinivasan writes:
> On 7/6/20 8:43 AM, Michael Ellerman wrote:
>> Kajol Jain writes:
>>> Patch here adds cpu hotplug functions to hv_24x7 pmu.
>>> A new cpuhp_state "CPUHP_AP_PERF_POWERPC_HV_24x7_ONLINE" enum
>>> is added.
>>>
>>> The online callback function updates the cpumask only if its
>>> empty. As the primary intention of adding hotplug support
>>> is to designate a CPU to make HCALL to collect the
>>> counter data.
>>>
>>> The offline function test and clear corresponding cpu in a cpumask
>>> and update cpumask to any other active cpu.
>>>
>>> Signed-off-by: Kajol Jain
>>> Reviewed-by: Gautham R. Shenoy
>>> ---
>>>  arch/powerpc/perf/hv-24x7.c | 45 +
>>>  include/linux/cpuhotplug.h  |  1 +
>>>  2 files changed, 46 insertions(+)
>>>
>>> diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
>>> index db213eb7cb02..ce4739e2b407 100644
>>> --- a/arch/powerpc/perf/hv-24x7.c
>>> +++ b/arch/powerpc/perf/hv-24x7.c
>>> @@ -31,6 +31,8 @@ static int interface_version;
>>>  /* Whether we have to aggregate result data for some domains. */
>>>  static bool aggregate_result_elements;
>>>
>>> +static cpumask_t hv_24x7_cpumask;
>>> +
>>>  static bool domain_is_valid(unsigned domain)
>>>  {
>>>  	switch (domain) {
>>> @@ -1641,6 +1643,44 @@ static struct pmu h_24x7_pmu = {
>>>  	.capabilities = PERF_PMU_CAP_NO_EXCLUDE,
>>>  };
>>>
>>> +static int ppc_hv_24x7_cpu_online(unsigned int cpu)
>>> +{
>>> +	/* Make this CPU the designated target for counter collection */
>>
>> The comment implies every newly onlined CPU will become the target, but
>> actually it's only the first onlined CPU.
>>
>> So I think the comment needs updating, or you could just drop the
>> comment, I think the code is fairly clear by itself.
>>
>>> +	if (cpumask_empty(&hv_24x7_cpumask))
>>> +		cpumask_set_cpu(cpu, &hv_24x7_cpumask);
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +static int ppc_hv_24x7_cpu_offline(unsigned int cpu)
>>> +{
>>> +	int target = -1;
>>
>> No need to initialise target, you assign to it unconditionally below.
>>
>>> +	/* Check if exiting cpu is used for collecting 24x7 events */
>>> +	if (!cpumask_test_and_clear_cpu(cpu, &hv_24x7_cpumask))
>>> +		return 0;
>>> +
>>> +	/* Find a new cpu to collect 24x7 events */
>>> +	target = cpumask_last(cpu_active_mask);
>>
>> Any reason to use cpumask_last() vs cpumask_first(), or a randomly
>> chosen CPU?
>>
>>> +	if (target < 0 || target >= nr_cpu_ids)
>>> +		return -1;
>>> +
>>> +	/* Migrate 24x7 events to the new target */
>>> +	cpumask_set_cpu(target, &hv_24x7_cpumask);
>>> +	perf_pmu_migrate_context(&h_24x7_pmu, cpu, target);
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +static int hv_24x7_cpu_hotplug_init(void)
>>> +{
>>> +	return cpuhp_setup_state(CPUHP_AP_PERF_POWERPC_HV_24x7_ONLINE,
>>> +				 "perf/powerpc/hv_24x7:online",
>>> +				 ppc_hv_24x7_cpu_online,
>>> +				 ppc_hv_24x7_cpu_offline);
>>> +}
>>> +
>>>  static int hv_24x7_init(void)
>>>  {
>>>  	int r;
>>> @@ -1685,6 +1725,11 @@ static int hv_24x7_init(void)
>>>  	if (r)
>>>  		return r;
>>>
>>> +	/* init cpuhotplug */
>>> +	r = hv_24x7_cpu_hotplug_init();
>>> +	if (r)
>>> +		pr_err("hv_24x7: CPU hotplug init failed\n");
>>> +
>>
>> The hotplug initialisation shouldn't fail unless something is badly
>> wrong. I think you should just fail initialisation of the entire PMU if
>> that happens, which will make the error handling in the next patch much
>> simpler.
>
> We did fail the PMU registration on failure of the hotplug
> code (and yes error handling is much simpler), but on internal
> review/discussion, what came up was that, hv_24x7 PMU will still
> be usable without the hotplug code (with "-C" option to perf tool
> command line).

In theory yes. But in reality no one will ever test that case, so the
code will easily bit rot.

Even if it doesn't bit rot, you've now created another state the system
can legally be in (hotplug init failed but PMU still probed), which you
have to test, document & support.

If the hotplug init fails then something is badly wrong, the best thing
we can do is bail on the PMU initialisation and hope the rest of the
system boots OK.

cheers
Re: [PATCH v3 1/2] powerpc/perf/hv-24x7: Add cpu hotplug support
On 7/6/20 8:43 AM, Michael Ellerman wrote:
> Kajol Jain writes:
>> Patch here adds cpu hotplug functions to hv_24x7 pmu.
>> A new cpuhp_state "CPUHP_AP_PERF_POWERPC_HV_24x7_ONLINE" enum
>> is added.
>>
>> The online callback function updates the cpumask only if its
>> empty. As the primary intention of adding hotplug support
>> is to designate a CPU to make HCALL to collect the
>> counter data.
>>
>> The offline function test and clear corresponding cpu in a cpumask
>> and update cpumask to any other active cpu.
>>
>> Signed-off-by: Kajol Jain
>> Reviewed-by: Gautham R. Shenoy
>> ---
>>  arch/powerpc/perf/hv-24x7.c | 45 +
>>  include/linux/cpuhotplug.h  |  1 +
>>  2 files changed, 46 insertions(+)
>>
>> diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
>> index db213eb7cb02..ce4739e2b407 100644
>> --- a/arch/powerpc/perf/hv-24x7.c
>> +++ b/arch/powerpc/perf/hv-24x7.c
>> @@ -31,6 +31,8 @@ static int interface_version;
>>  /* Whether we have to aggregate result data for some domains. */
>>  static bool aggregate_result_elements;
>>
>> +static cpumask_t hv_24x7_cpumask;
>> +
>>  static bool domain_is_valid(unsigned domain)
>>  {
>>  	switch (domain) {
>> @@ -1641,6 +1643,44 @@ static struct pmu h_24x7_pmu = {
>>  	.capabilities = PERF_PMU_CAP_NO_EXCLUDE,
>>  };
>>
>> +static int ppc_hv_24x7_cpu_online(unsigned int cpu)
>> +{
>> +	/* Make this CPU the designated target for counter collection */
>
> The comment implies every newly onlined CPU will become the target, but
> actually it's only the first onlined CPU.
>
> So I think the comment needs updating, or you could just drop the
> comment, I think the code is fairly clear by itself.
>
>> +	if (cpumask_empty(&hv_24x7_cpumask))
>> +		cpumask_set_cpu(cpu, &hv_24x7_cpumask);
>> +
>> +	return 0;
>> +}
>> +
>> +static int ppc_hv_24x7_cpu_offline(unsigned int cpu)
>> +{
>> +	int target = -1;
>
> No need to initialise target, you assign to it unconditionally below.
>
>> +	/* Check if exiting cpu is used for collecting 24x7 events */
>> +	if (!cpumask_test_and_clear_cpu(cpu, &hv_24x7_cpumask))
>> +		return 0;
>> +
>> +	/* Find a new cpu to collect 24x7 events */
>> +	target = cpumask_last(cpu_active_mask);
>
> Any reason to use cpumask_last() vs cpumask_first(), or a randomly
> chosen CPU?
>
>> +	if (target < 0 || target >= nr_cpu_ids)
>> +		return -1;
>> +
>> +	/* Migrate 24x7 events to the new target */
>> +	cpumask_set_cpu(target, &hv_24x7_cpumask);
>> +	perf_pmu_migrate_context(&h_24x7_pmu, cpu, target);
>> +
>> +	return 0;
>> +}
>> +
>> +static int hv_24x7_cpu_hotplug_init(void)
>> +{
>> +	return cpuhp_setup_state(CPUHP_AP_PERF_POWERPC_HV_24x7_ONLINE,
>> +				 "perf/powerpc/hv_24x7:online",
>> +				 ppc_hv_24x7_cpu_online,
>> +				 ppc_hv_24x7_cpu_offline);
>> +}
>> +
>>  static int hv_24x7_init(void)
>>  {
>>  	int r;
>> @@ -1685,6 +1725,11 @@ static int hv_24x7_init(void)
>>  	if (r)
>>  		return r;
>>
>> +	/* init cpuhotplug */
>> +	r = hv_24x7_cpu_hotplug_init();
>> +	if (r)
>> +		pr_err("hv_24x7: CPU hotplug init failed\n");
>> +
>
> The hotplug initialisation shouldn't fail unless something is badly
> wrong. I think you should just fail initialisation of the entire PMU if
> that happens, which will make the error handling in the next patch much
> simpler.

We did fail the PMU registration on failure of the hotplug
code (and yes error handling is much simpler), but on internal
review/discussion, what came up was that, hv_24x7 PMU will still
be usable without the hotplug code (with "-C" option to perf tool
command line).

Maddy

> cheers
>
>>  	r = perf_pmu_register(&h_24x7_pmu, h_24x7_pmu.name, -1);
>>  	if (r)
>>  		return r;
Re: [PATCH] powerpc/numa: Restrict possible nodes based on platform
* Nathan Lynch [2020-07-06 19:44:31]:
> Tyrel Datwyler writes:
>>> --- a/arch/powerpc/mm/numa.c
>>> +++ b/arch/powerpc/mm/numa.c
>>> @@ -897,7 +897,7 @@ static void __init find_possible_nodes(void)
>>> 		return;
>>>
>>> 	if (of_property_read_u32_index(rtas,
>>> -				"ibm,max-associativity-domains",
>>> +				"ibm,current-associativity-domains",
>>
>> I'm not sure ibm,current-associativity-domains is guaranteed to exist
>> on older firmware. You may need to check that it exists and fall back
>> to ibm,max-associativity-domains in the event it doesn't.
>
> Yes. Looks like it's a PowerVM-specific property.

Right, this is just a PowerVM-specific property and doesn't affect
PowerNV. On PowerNV, though we have sparse nodes, the max possible nodes
is set correctly.

-- 
Thanks and Regards
Srikar Dronamraju
Re: [PATCH] powerpc/pseries: detect secure and trusted boot state of the system.
As a final extra note, https://github.com/daxtens/qemu/tree/pseries-secboot
is a qemu repository that fakes out the relevant properties if anyone else
wants to test this.

Also,

> 3-9 - enabled, OS-defined behaviour. In this patch we map all these
>       values to enabled and enforcing. Again I think this is the
>       appropriate thing to do.

this should read "enabled and enforcing, requirements are at the
discretion of the operating system". Apologies.

Regards,
Daniel

> ibm,trusted-boot isn't published by a current PowerVM LPAR but will be
> published in future. (Currently, trusted boot state is inferred by the
> presence or absence of a vTPM.) It's simply 1 = enabled, 0 = disabled.
>
> As for this patch specifically, with the very small nits below,
>
> Reviewed-by: Daniel Axtens
>
>> -	node = get_ppc_fw_sb_node();
>> -	enabled = of_property_read_bool(node, "os-secureboot-enforcing");
>> +	if (machine_is(powernv)) {
>> +		node = get_ppc_fw_sb_node();
>> +		enabled =
>> +			of_property_read_bool(node, "os-secureboot-enforcing");
>> +		of_node_put(node);
>> +	}
>>
>> -	of_node_put(node);
>> +	if (machine_is(pseries)) {
>
> Maybe this should be an else if?
>
>> +		secureboot = of_get_property(of_root, "ibm,secure-boot", NULL);
>> +		if (secureboot)
>> +			enabled = (*secureboot > 1) ? true : false;
>> +	}
>>
>> 	pr_info("Secure boot mode %s\n", enabled ? "enabled" : "disabled");
>>
>> @@ -38,11 +48,20 @@ bool is_ppc_trustedboot_enabled(void)
>> {
>> 	struct device_node *node;
>> 	bool enabled = false;
>> +	const u32 *trustedboot;
>>
>> -	node = get_ppc_fw_sb_node();
>> -	enabled = of_property_read_bool(node, "trusted-enabled");
>> +	if (machine_is(powernv)) {
>> +		node = get_ppc_fw_sb_node();
>> +		enabled = of_property_read_bool(node, "trusted-enabled");
>> +		of_node_put(node);
>> +	}
>>
>> -	of_node_put(node);
>> +	if (machine_is(pseries)) {
>
> Likewise.
>
>> +		trustedboot =
>> +			of_get_property(of_root, "ibm,trusted-boot", NULL);
>> +		if (trustedboot)
>> +			enabled = (*trustedboot > 0) ? true : false;
>
> Regards,
> Daniel
Re: [PATCH] powerpc/numa: Restrict possible nodes based on platform
* Tyrel Datwyler [2020-07-06 13:58:42]:
> On 7/5/20 11:40 PM, Srikar Dronamraju wrote:
>> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
>> index 9fcf2d195830..3d55cef1a2dc 100644
>> --- a/arch/powerpc/mm/numa.c
>> +++ b/arch/powerpc/mm/numa.c
>> @@ -897,7 +897,7 @@ static void __init find_possible_nodes(void)
>> 		return;
>>
>> 	if (of_property_read_u32_index(rtas,
>> -				"ibm,max-associativity-domains",
>> +				"ibm,current-associativity-domains",
>
> I'm not sure ibm,current-associativity-domains is guaranteed to exist on
> older firmware. You may need to check that it exists and fall back to
> ibm,max-associativity-domains in the event it doesn't.
>
> -Tyrel

Oh, thanks Tyrel, that's a good observation. Will fall back on
max-associativity.

-- 
Thanks and Regards
Srikar Dronamraju
RE: [PATCH v6 00/11] Add the multiple PF support for DWC and Layerscape
Hi Lorenzo,

> -----Original Message-----
> From: Lorenzo Pieralisi [mailto:lorenzo.pieral...@arm.com]
> Sent: July 6, 2020 18:47
> To: Xiaowei Bao
> Cc: Z.q. Hou; M.h. Lian; Mingkai Hu; bhelg...@google.com;
>     robh...@kernel.org; shawn...@kernel.org; Leo Li; kis...@ti.com;
>     Roy Zang; amur...@thegoodpenguin.co.uk; jingooh...@gmail.com;
>     gustavo.pimen...@synopsys.com; andrew.mur...@arm.com;
>     linux-...@vger.kernel.org; devicet...@vger.kernel.org;
>     linux-ker...@vger.kernel.org; linux-arm-ker...@lists.infradead.org;
>     linuxppc-dev@lists.ozlabs.org
> Subject: Re: [PATCH v6 00/11] Add the multiple PF support for DWC and
>     Layerscape
>
> On Sat, Mar 14, 2020 at 11:30:27AM +0800, Xiaowei Bao wrote:
>> Add the PCIe EP multiple PF support for DWC and Layerscape, add the
>> doorbell MSI-X function for DWC, use a list to manage the PFs of one
>> PCIe controller, and refactor the Layerscape EP driver due to some
>> platform differences.
>>
>> Xiaowei Bao (11):
>>   PCI: designware-ep: Add multiple PFs support for DWC
>>   PCI: designware-ep: Add the doorbell mode of MSI-X in EP mode
>>   PCI: designware-ep: Move the function of getting MSI capability
>>     forward
>>   PCI: designware-ep: Modify MSI and MSIX CAP way of finding
>>   dt-bindings: pci: layerscape-pci: Add compatible strings for ls1088a
>>     and ls2088a
>>   PCI: layerscape: Fix some format issue of the code
>>   PCI: layerscape: Modify the way of getting capability with different
>>     PEX
>>   PCI: layerscape: Modify the MSIX to the doorbell mode
>>   PCI: layerscape: Add EP mode support for ls1088a and ls2088a
>>   arm64: dts: layerscape: Add PCIe EP node for ls1088a
>>   misc: pci_endpoint_test: Add LS1088a in pci_device_id table
>>
>>  .../devicetree/bindings/pci/layerscape-pci.txt     |   2 +
>>  arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi     |  31 +++
>>  drivers/misc/pci_endpoint_test.c                   |   2 +
>>  drivers/pci/controller/dwc/pci-layerscape-ep.c     | 100 ++--
>>  drivers/pci/controller/dwc/pcie-designware-ep.c    | 255 +
>>  drivers/pci/controller/dwc/pcie-designware.c       |  59 +++--
>>  drivers/pci/controller/dwc/pcie-designware.h       |  48 +++-
>>  7 files changed, 404 insertions(+), 93 deletions(-)
>
> Can you rebase it against v5.8-rc1 please?

Yes, I will help to rebase.

Thanks,
Zhiqiang
Re: [PATCH] powerpc/pseries: detect secure and trusted boot state of the system.
Thanks Nayna!

I'm hoping to get better public documentation for this soon as it's not
documented in a public PAPR yet. Until then:

The values of ibm,secure-boot under PowerVM are:

 0 - disabled
 1 - audit mode only. This patch ignores this value for Linux, which I
     think is the appropriate thing to do.
 2 - enabled and enforcing
 3-9 - enabled, OS-defined behaviour. In this patch we map all these
     values to enabled and enforcing. Again I think this is the
     appropriate thing to do.

ibm,trusted-boot isn't published by a current PowerVM LPAR but will be
published in future. (Currently, trusted boot state is inferred by the
presence or absence of a vTPM.) It's simply 1 = enabled, 0 = disabled.

As for this patch specifically, with the very small nits below,

Reviewed-by: Daniel Axtens

> -	node = get_ppc_fw_sb_node();
> -	enabled = of_property_read_bool(node, "os-secureboot-enforcing");
> +	if (machine_is(powernv)) {
> +		node = get_ppc_fw_sb_node();
> +		enabled =
> +			of_property_read_bool(node, "os-secureboot-enforcing");
> +		of_node_put(node);
> +	}
>
> -	of_node_put(node);
> +	if (machine_is(pseries)) {

Maybe this should be an else if?

> +		secureboot = of_get_property(of_root, "ibm,secure-boot", NULL);
> +		if (secureboot)
> +			enabled = (*secureboot > 1) ? true : false;
> +	}
>
> 	pr_info("Secure boot mode %s\n", enabled ? "enabled" : "disabled");
>
> @@ -38,11 +48,20 @@ bool is_ppc_trustedboot_enabled(void)
> {
> 	struct device_node *node;
> 	bool enabled = false;
> +	const u32 *trustedboot;
>
> -	node = get_ppc_fw_sb_node();
> -	enabled = of_property_read_bool(node, "trusted-enabled");
> +	if (machine_is(powernv)) {
> +		node = get_ppc_fw_sb_node();
> +		enabled = of_property_read_bool(node, "trusted-enabled");
> +		of_node_put(node);
> +	}
>
> -	of_node_put(node);
> +	if (machine_is(pseries)) {

Likewise.

> +		trustedboot =
> +			of_get_property(of_root, "ibm,trusted-boot", NULL);
> +		if (trustedboot)
> +			enabled = (*trustedboot > 0) ? true : false;

Regards,
Daniel
Re: [PATCH] MAINTAINERS: Add Shengjiu to reviewer list of sound/soc/fsl
On Fri, Jul 3, 2020 at 3:33 AM Nicolin Chen wrote: > > Add Shengjiu who's actively working on the latest fsl/nxp audio drivers. > > Signed-off-by: Nicolin Chen > Cc: Shengjiu Wang > --- > To Shengjiu, please ack if you feel okay with this (your email address too). Thanks Nicolin for nominating me as a reviewer. I'd like to use my gmail address "shengjiu.w...@gmail.com". with this then Acked-by: Shengjiu Wang best regards wang shengjiu > > MAINTAINERS | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/MAINTAINERS b/MAINTAINERS > index 496fd4eafb68..54aab083bb88 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -6956,6 +6956,7 @@ M:Timur Tabi > M: Nicolin Chen > M: Xiubo Li > R: Fabio Estevam > +R: Shengjiu Wang > L: alsa-de...@alsa-project.org (moderated for non-subscribers) > L: linuxppc-dev@lists.ozlabs.org > S: Maintained > -- > 2.17.1 >
Re: [PATCH v5 10/26] powerpc/book3s64/pkeys: Convert pkey_total to max_pkey
"Aneesh Kumar K.V" writes: > On 7/6/20 12:34 PM, Michael Ellerman wrote: >> "Aneesh Kumar K.V" writes: >>> max_pkey now represents max key value that userspace can allocate. >>> > > I guess commit message is confusing. And the name. If it's called max_pkey then it should be the maximum allowed pkey value. cheers >>> diff --git a/arch/powerpc/include/asm/pkeys.h >>> b/arch/powerpc/include/asm/pkeys.h >>> index 75d2a2c19c04..652bad7334f3 100644 >>> --- a/arch/powerpc/include/asm/pkeys.h >>> +++ b/arch/powerpc/include/asm/pkeys.h >>> @@ -12,7 +12,7 @@ >>> #include >>> >>> DECLARE_STATIC_KEY_FALSE(pkey_disabled); >>> -extern int pkeys_total; /* total pkeys as per device tree */ >>> +extern int max_pkey; >>> extern u32 initial_allocation_mask; /* bits set for the initially >>> allocated keys */ >>> extern u32 reserved_allocation_mask; /* bits set for reserved keys */ >>> >>> @@ -44,7 +44,10 @@ static inline int vma_pkey(struct vm_area_struct *vma) >>> return (vma->vm_flags & ARCH_VM_PKEY_FLAGS) >> VM_PKEY_SHIFT; >>> } >>> >>> -#define arch_max_pkey() pkeys_total >>> +static inline int arch_max_pkey(void) >>> +{ >>> + return max_pkey; >>> +} >> >> If pkeys_total = 32 then max_pkey = 31. > > we have > > #ifdef CONFIG_PPC_4K_PAGES > /* >* The OS can manage only 8 pkeys due to its inability to represent them >* in the Linux 4K PTE. Mark all other keys reserved. >*/ > max_pkey = min(8, pkeys_total); > #else > max_pkey = pkeys_total; > #endif > > so it is 32. > >> >> So we can't just substitute one for the other. ie. arch_max_pkey() must >> have been wrong, or it is wrong now. 
>> >>> diff --git a/arch/powerpc/mm/book3s64/pkeys.c >>> b/arch/powerpc/mm/book3s64/pkeys.c >>> index 87d882a9aaf2..a4d7287082a8 100644 >>> --- a/arch/powerpc/mm/book3s64/pkeys.c >>> +++ b/arch/powerpc/mm/book3s64/pkeys.c >>> @@ -14,7 +14,7 @@ >>> >>> DEFINE_STATIC_KEY_FALSE(pkey_disabled); >>> DEFINE_STATIC_KEY_FALSE(execute_pkey_disabled); >>> -int pkeys_total; /* Total pkeys as per device tree */ >>> +int max_pkey; /* Maximum key value supported */ >>> u32 initial_allocation_mask; /* Bits set for the initially allocated >>> keys */ >>> /* >>>* Keys marked in the reservation list cannot be allocated by userspace >>> @@ -84,7 +84,7 @@ static int scan_pkey_feature(void) >>> >>> static int pkey_initialize(void) >>> { >>> - int os_reserved, i; >>> + int pkeys_total, i; >>> >>> /* >>> * We define PKEY_DISABLE_EXECUTE in addition to the arch-neutral >>> @@ -122,12 +122,12 @@ static int pkey_initialize(void) >>> * The OS can manage only 8 pkeys due to its inability to represent them >>> * in the Linux 4K PTE. Mark all other keys reserved. >>> */ >>> - os_reserved = pkeys_total - 8; >>> + max_pkey = min(8, pkeys_total); >> >> Shouldn't it be 7 ? >> >>> #else >>> - os_reserved = 0; >>> + max_pkey = pkeys_total; >>> #endif >>> >>> - if (unlikely((pkeys_total - os_reserved) <= execute_only_key)) { >>> + if (unlikely(max_pkey <= execute_only_key)) { >> >> Isn't that an off-by-one now? >> >> This is one-off boot time code, there's no need to clutter it with >> unlikely. >> >>> /* >>> * Insufficient number of keys to support >>> * execute only key. Mark it unavailable. >>> @@ -174,10 +174,10 @@ static int pkey_initialize(void) >>> default_uamor &= ~(0x3ul << pkeyshift(1)); >>> >>> /* >>> -* Prevent the usage of OS reserved the keys. Update UAMOR >>> +* Prevent the usage of OS reserved keys. Update UAMOR >>> * for those keys. >>> */ >>> - for (i = (pkeys_total - os_reserved); i < pkeys_total; i++) { >>> + for (i = max_pkey; i < pkeys_total; i++) { >> >> Another off-by-one? 
Shouldn't we start from max_pkey + 1 ? >> >>> reserved_allocation_mask |= (0x1 << i); >>> default_uamor &= ~(0x3ul << pkeyshift(i)); >>> } >> > > It is the commit message. It is the max pkey such that userspace can > allocate keys 0 to max_pkey - 1. > > -aneesh
Re: [PATCH v5 19/26] powerpc/book3s64/kuap: Move KUAP related function outside radix
"Aneesh Kumar K.V" writes: > On 7/6/20 6:11 PM, Michael Ellerman wrote: >> "Aneesh Kumar K.V" writes: >>> The next set of patches adds support for kuap with hash translation. >> >> That's no longer true of this series. >> >>> diff --git a/arch/powerpc/mm/book3s64/pkeys.c >>> b/arch/powerpc/mm/book3s64/pkeys.c >>> index 0d72c0246052..e93b65a0e6e7 100644 >>> --- a/arch/powerpc/mm/book3s64/pkeys.c >>> +++ b/arch/powerpc/mm/book3s64/pkeys.c >>> @@ -199,6 +200,24 @@ void __init pkey_early_init_devtree(void) >>> return; >>> } >>> >>> +#ifdef CONFIG_PPC_KUAP >>> +void __init setup_kuap(bool disabled) >>> +{ >>> + if (disabled || !early_radix_enabled()) >>> + return; >>> + >>> + if (smp_processor_id() == boot_cpuid) { >>> + pr_info("Activating Kernel Userspace Access Prevention\n"); >>> + cur_cpu_spec->mmu_features |= MMU_FTR_RADIX_KUAP; >>> + } >>> + >>> + /* Make sure userspace can't change the AMR */ >>> + mtspr(SPRN_UAMOR, 0); >>> + mtspr(SPRN_AMR, AMR_KUAP_BLOCKED); >>> + isync(); >>> +} >>> +#endif >> >> This makes this code depend on CONFIG_PPC_MEM_KEYS=y, which it didn't >> used to. >> >> That risks breaking people's existing .configs, if they have >> PPC_MEM_KEYS=n they will now lose KUAP. >> >> And I'm not convinced the two features should be tied together, at least >> at the user-visible Kconfig level. > That simplifies the addition of hash kuap a lot. Especially in the > exception entry and return paths. I did try to consider them as > independent options. But then the feature fixup in asm code gets > unnecessarily complicated. The UAMOR handling also gets complicated. Yep. I'm OK if most of the code is enabled for either/both options, but I think the user-visible options should not depend on each other. So something like:

config PPC_PKEY
	def_bool y
	depends on PPC_MEM_KEYS || PPC_KUAP

And then the low-level code is guarded by PPC_PKEY. Or we just say that making MEM_KEYS configurable is not worth the added complexity and turn it on always. cheers
Re: [PATCH v5 18/26] powerpc/book3s64/keys/kuap: Reset AMR/IAMR values on kexec
"Aneesh Kumar K.V" writes: > On 7/6/20 5:59 PM, Michael Ellerman wrote: >> "Aneesh Kumar K.V" writes: >>> As we kexec across kernels that use AMR/IAMR for different purposes >>> we need to ensure that new kernels get kexec'd with a reset value >>> of AMR/IAMR. For ex: the new kernel can use key 0 for kernel mapping and >>> the old >>> AMR value prevents access to key 0. >>> >>> This patch also removes reset if IAMR and AMOR in kexec_sequence. Reset of >>> AMOR >>> is not needed and the IAMR reset is partial (it doesn't do the reset >>> on secondary cpus) and is redundant with this patch. >> >> I like the idea of cleaning this stuff up. >> >> But I think tying it into kup/kuep/kuap and the MMU features and ifdefs >> and so on is overly complicated. >> >> We should just have eg: >> >> void reset_sprs(void) >> { >> if (early_cpu_has_feature(CPU_FTR_ARCH_206)) { >> mtspr(SPRN_AMR, 0); >> mtspr(SPRN_UAMOR, 0); >> } >> >> if (early_cpu_has_feature(CPU_FTR_ARCH_207S)) { >> mtspr(SPRN_IAMR, 0); >> } >> } >> >> And call that from the kexec paths. > > Will fix. Should that be early_cpu_has_feature()? cpu_has_feature() > should work there right? Yeah I guess. I was thinking if we crashed before code patching was done, but if that happens we can't kdump anyway. So I'm probably being over paranoid. cheers
Re: [PATCH v5 13/26] powerpc/book3s64/pkeys: Enable MMU_FTR_PKEY
"Aneesh Kumar K.V" writes: >>> /* * Let's assume 32 pkeys on P8 bare metal, if its not defined by device * tree. We make this exception since skiboot forgot to expose this * property on power8. */ if (!firmware_has_feature(FW_FEATURE_LPAR) && - cpu_has_feature(CPU_FTRS_POWER8)) + early_cpu_has_feature(CPU_FTRS_POWER8)) pkeys_total = 32; >>> >>> That's not how cpu_has_feature() works, we'll need to fix that. >>> >>> cheers >>> >> >> I did a separate patch to handle that which switches the above to >> >> /* >> * Let's assume 32 pkeys on P8/P9 bare metal, if its not >> defined by device >> * tree. We make this exception since skiboot forgot to expose >> this >> * property on power8/9. >> */ >> if (!firmware_has_feature(FW_FEATURE_LPAR) && >> (early_cpu_has_feature(CPU_FTR_ARCH_207S) || >> early_cpu_has_feature(CPU_FTR_ARCH_300))) >> pkeys_total = 32; >> > > We should do a PVR check here i guess. Yes, the ARCH features don't work because P10 will have both of those enabled. > ret = of_scan_flat_dt(dt_scan_storage_keys, &pkeys_total); > if (ret == 0) { > > /* >* Let's assume 32 pkeys on P8/P9 bare metal, if its not > defined by device >* tree. We make this exception since skiboot forgot to expose > this >* property on power8/9. Well, it does expose it on Power9 after v6.6, but most P9 systems have an older firmware than that. And also the kernel has been enabling that on Power9 because of the CPU_FTRS_POWER8 bug, so this is not actually a behaviour change. >*/ > if (!firmware_has_feature(FW_FEATURE_LPAR) && > (pvr_version_is(PVR_POWER8) || pvr_version_is(PVR_POWER9))) > pkeys_total = 32; > } You need PVR_POWER8E and PVR_POWER8NVL as well. cheers
[PATCH 1/1] KVM/PPC: Fix typo on H_DISABLE_AND_GET hcall
On PAPR+ the hcall() on 0x1B0 is called H_DISABLE_AND_GET, but got defined as H_DISABLE_AND_GETC instead. This define was introduced with a typo in commit ("[PATCH] powerpc: Extends HCALL interface for InfiniBand usage"), and was later used without having the typo noticed.

Signed-off-by: Leonardo Bras
---
 arch/powerpc/include/asm/hvcall.h            | 2 +-
 arch/powerpc/kvm/trace_hv.h                  | 2 +-
 tools/perf/arch/powerpc/util/book3s_hcalls.h | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index e90c073e437e..d8ada9c7ec78 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -237,7 +237,7 @@
 #define H_CREATE_RPT		0x1A4
 #define H_REMOVE_RPT		0x1A8
 #define H_REGISTER_RPAGES	0x1AC
-#define H_DISABLE_AND_GETC	0x1B0
+#define H_DISABLE_AND_GET	0x1B0
 #define H_ERROR_DATA		0x1B4
 #define H_GET_HCA_INFO		0x1B8
 #define H_GET_PERF_COUNT	0x1BC
diff --git a/arch/powerpc/kvm/trace_hv.h b/arch/powerpc/kvm/trace_hv.h
index 4a61a971c34e..830a126e095d 100644
--- a/arch/powerpc/kvm/trace_hv.h
+++ b/arch/powerpc/kvm/trace_hv.h
@@ -89,7 +89,7 @@
 	{H_CREATE_RPT,		"H_CREATE_RPT"}, \
 	{H_REMOVE_RPT,		"H_REMOVE_RPT"}, \
 	{H_REGISTER_RPAGES,	"H_REGISTER_RPAGES"}, \
-	{H_DISABLE_AND_GETC,	"H_DISABLE_AND_GETC"}, \
+	{H_DISABLE_AND_GET,	"H_DISABLE_AND_GET"}, \
 	{H_ERROR_DATA,		"H_ERROR_DATA"}, \
 	{H_GET_HCA_INFO,	"H_GET_HCA_INFO"}, \
 	{H_GET_PERF_COUNT,	"H_GET_PERF_COUNT"}, \
diff --git a/tools/perf/arch/powerpc/util/book3s_hcalls.h b/tools/perf/arch/powerpc/util/book3s_hcalls.h
index 54cfa0530e86..488f4339b83c 100644
--- a/tools/perf/arch/powerpc/util/book3s_hcalls.h
+++ b/tools/perf/arch/powerpc/util/book3s_hcalls.h
@@ -84,7 +84,7 @@
 	{0x1a4, "H_CREATE_RPT"}, \
 	{0x1a8, "H_REMOVE_RPT"}, \
 	{0x1ac, "H_REGISTER_RPAGES"}, \
-	{0x1b0, "H_DISABLE_AND_GETC"}, \
+	{0x1b0, "H_DISABLE_AND_GET"}, \
 	{0x1b4, "H_ERROR_DATA"}, \
 	{0x1b8, "H_GET_HCA_INFO"}, \
 	{0x1bc, "H_GET_PERF_COUNT"}, \
-- 
2.25.4
Re: [PATCH] powerpc/numa: Restrict possible nodes based on platform
Tyrel Datwyler writes: >> --- a/arch/powerpc/mm/numa.c >> +++ b/arch/powerpc/mm/numa.c >> @@ -897,7 +897,7 @@ static void __init find_possible_nodes(void) >> return; >> >> if (of_property_read_u32_index(rtas, >> -"ibm,max-associativity-domains", >> +"ibm,current-associativity-domains", > > I'm not sure ibm,current-associativity-domains is guaranteed to exist on older > firmware. You may need to check that it exists and fall back to > ibm,max-associativity-domains in the event it doesn't Yes. Looks like it's a PowerVM-specific property.
Re: [PATCH v2 00/12] ppc64: enable kdump support for kexec_file_load syscall
On 07/03/2020 03:53 AM, Hari Bathini wrote: > This patch series enables kdump support for kexec_file_load system > call (kexec -s -p) on PPC64. The changes are inspired from kexec-tools > code but heavily modified for kernel consumption. There is scope to > expand purgatory to verify sha256 digest along with other improvements > in purgatory code. Will deal with those changes in a separate patch > series later. > > The first patch adds a weak arch_kexec_locate_mem_hole() function to > override locate memory hole logic suiting arch needs. There are some > special regions in ppc64 which should be avoided while loading buffer > & there are multiple callers to kexec_add_buffer making it complicated > to maintain range sanity and using generic lookup at the same time. > > The second patch marks ppc64 specific code within arch/powerpc/kexec > and arch/powerpc/purgatory to make the subsequent code changes easy > to understand. > > The next patch adds helper function to setup different memory ranges > needed for loading kdump kernel, booting into it and exporting the > crashing kernel's elfcore. > > The fourth patch overrides arch_kexec_locate_mem_hole() function to > locate memory hole for kdump segments by accounting for the special > memory regions, referred to as excluded memory ranges, and sets > kbuf->mem when a suitable memory region is found. > > The fifth patch moves walk_drmem_lmbs() out of .init section with > a few changes to reuse it for setting up kdump kernel's usable memory > ranges. The next patch uses walk_drmem_lmbs() to look up the LMBs > and set linux,drconf-usable-memory & linux,usable-memory properties > in order to restrict kdump kernel's memory usage. > > The seventh patch adds relocation support for the purgatory. Patch 8 > helps setup the stack for the purgatory. The next patch setups up > backup region as a segment while loading kdump kernel and teaches > purgatory to copy it from source to destination. 
> > Patch 10 builds the elfcore header for the running kernel & passes > the info to kdump kernel via "elfcorehdr=" parameter to export as > /proc/vmcore file. The next patch sets up the memory reserve map > for the kexec kernel and also claims kdump support, as > all the necessary changes are added. > > The last patch fixes a lookup issue for `kexec -l -s` case when > memory is reserved for crashkernel. > > Tested the changes successfully on P8, P9 lpars, a couple of OpenPOWER > boxes and a simulator. > > Changes in v2: > * Introduced arch_kexec_locate_mem_hole() for override and dropped > weak arch_kexec_add_buffer(). > * Addressed warnings reported by lkp. > * Added patch to address kexec load issue when memory is reserved > for crashkernel. > * Used the appropriate license header for the new files added. > * Added an option to merge ranges to minimize reallocations while > adding memory ranges. > * Dropped within_crashkernel parameter for add_opal_mem_range() & > add_rtas_mem_range() functions as it is not really needed.
> > --- > > Hari Bathini (12): > kexec_file: allow archs to handle special regions while locating memory > hole > powerpc/kexec_file: mark PPC64 specific code > powerpc/kexec_file: add helper functions for getting memory ranges > ppc64/kexec_file: avoid stomping memory used by special regions > powerpc/drmem: make lmb walk a bit more flexible > ppc64/kexec_file: restrict memory usage of kdump kernel > ppc64/kexec_file: add support to relocate purgatory > ppc64/kexec_file: setup the stack for purgatory > ppc64/kexec_file: setup backup region for kdump kernel > ppc64/kexec_file: prepare elfcore header for crashing kernel > ppc64/kexec_file: add appropriate regions for memory reserve map > ppc64/kexec_file: fix kexec load failure with lack of memory hole > > > arch/powerpc/include/asm/crashdump-ppc64.h | 15 > arch/powerpc/include/asm/drmem.h |9 > arch/powerpc/include/asm/kexec.h | 35 + > arch/powerpc/include/asm/kexec_ranges.h| 18 > arch/powerpc/include/asm/purgatory.h | 11 > arch/powerpc/kernel/prom.c | 13 > arch/powerpc/kexec/Makefile|2 > arch/powerpc/kexec/elf_64.c| 35 + > arch/powerpc/kexec/file_load.c | 78 + > arch/powerpc/kexec/file_load_64.c | 1509 > > arch/powerpc/kexec/ranges.c| 397 +++ > arch/powerpc/mm/drmem.c| 87 +- > arch/powerpc/mm/numa.c | 13 > arch/powerpc/purgatory/Makefile| 28 - > arch/powerpc/purgatory/purgatory_64.c | 36 + > arch/powerpc/purgatory/trampoline.S| 117 -- > arch/powerpc/purgatory/trampoline_64.S | 175 +++ > include/linux/kexec.h | 29 - > kernel/kexec_file.c| 16 > 19 files changed, 2413 insertions(+), 210
Re: [PATCH 1/4] dma-mapping: move the remaining DMA API calls out of line
On 03/06/2020 14:13, Alexey Kardashevskiy wrote: > > > On 09/05/2020 18:19, Christoph Hellwig wrote: >> On Tue, May 05, 2020 at 02:18:37PM +1000, Alexey Kardashevskiy wrote: >>> >>> >>> On 17/04/2020 17:58, Christoph Hellwig wrote: On Wed, Apr 15, 2020 at 09:21:37PM +1000, Alexey Kardashevskiy wrote: > And the fact they were exported leaves possibility that there is a > driver somewhere relying on these symbols or distro kernel won't build > because the symbol disappeared from exports (I do not know what KABI > guarantees or if mainline kernel cares). We absolutely do not care. In fact for abuses of APIs that drivers should not use we almost care to make them private and break people abusing them. >>> >>> ok :) >>> > I do not care in particular but > some might, a line separated with empty lines in the commit log would do. I'll add a blurb for the next version. >>> >>> >>> Has it gone anywhere? Thanks, >> >> I've been hoping for the sg_buf helpers to land first, as they need >> backporting and would conflict. Do you urgently need the series? > > Any progress with sg_buf helpers stuff? Thanks, Any luck there? I'd really like to cross this off my todo list :) Thanks, -- Alexey
Re: [PATCH] powerpc/spufs: add CONFIG_COREDUMP dependency
Hi Arnd, > The kernel test robot pointed out a slightly different error message > after recent commit 5456ffdee666 ("powerpc/spufs: simplify spufs core > dumping") to spufs for a configuration that never worked: > >powerpc64-linux-ld: arch/powerpc/platforms/cell/spufs/file.o: in > function `.spufs_proxydma_info_dump': > > > file.c:(.text+0x4c68): undefined reference to `.dump_emit' >powerpc64-linux-ld: arch/powerpc/platforms/cell/spufs/file.o: in > function `.spufs_dma_info_dump': >file.c:(.text+0x4d70): undefined reference to `.dump_emit' >powerpc64-linux-ld: arch/powerpc/platforms/cell/spufs/file.o: in > function `.spufs_wbox_info_dump': >file.c:(.text+0x4df4): undefined reference to `.dump_emit' > > Add a Kconfig dependency to prevent this from happening again. > > Reported-by: kernel test robot > Signed-off-by: Arnd Bergmann Looks good to me, thanks. Acked-by: Jeremy Kerr Cheers, Jeremy
Re: [PATCH] powerpc/numa: Restrict possible nodes based on platform
Srikar Dronamraju writes: > diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c > index 9fcf2d195830..3d55cef1a2dc 100644 > --- a/arch/powerpc/mm/numa.c > +++ b/arch/powerpc/mm/numa.c > @@ -897,7 +897,7 @@ static void __init find_possible_nodes(void) > return; > > if (of_property_read_u32_index(rtas, > - "ibm,max-associativity-domains", > + "ibm,current-associativity-domains", > min_common_depth, )) Looks good if ibm,current-associativity-domains[min_common_depth] actually denotes the range of possible values, i.e. a value of 2 implies node numbers 0 and 1. PAPR+ says it's the "number of unique values", which isn't how I would specify the property if it's supposed to express a range. But it's probably OK...
Re: [PATCH] powerpc/numa: Restrict possible nodes based on platform
On 7/5/20 11:40 PM, Srikar Dronamraju wrote: > As per PAPR, there are two device tree properties: > ibm,max-associativity-domains (which defines the maximum number of > domains that the firmware, i.e. PowerVM, can support) and > ibm,current-associativity-domains (which defines the maximum number of > domains that the platform can support). The value of the > ibm,max-associativity-domains property is always greater than or equal > to that of the ibm,current-associativity-domains property. > > Powerpc currently uses the ibm,max-associativity-domains property while > setting the possible number of nodes. This is currently set at 32. > However the possible number of nodes for a platform may be significantly > less. Hence set the possible number of nodes based on the > ibm,current-associativity-domains property. > > $ lsprop /proc/device-tree/rtas/ibm,*associ*-domains > /proc/device-tree/rtas/ibm,current-associativity-domains > 0005 0001 0002 0002 0002 0010 > /proc/device-tree/rtas/ibm,max-associativity-domains > 0005 0001 0008 0020 0020 0100 > > $ cat /sys/devices/system/node/possible ##Before patch > 0-31 > > $ cat /sys/devices/system/node/possible ##After patch > 0-1 > > Note the maximum number of nodes this platform can support is only 2, but the > possible number of nodes is set to 32. > > This is important because a lot of kernel and user space code allocates > structures for all possible nodes, leading to a lot of memory that is > allocated but not used. > > I ran a simple experiment to create and destroy 100 memory cgroups on > boot on an 8-node machine (Power8 Alpine).
> Before patch
> free -k at boot
>               total        used        free      shared  buff/cache   available
> Mem:      523498176     4106816   518820608       22272      570752   516606720
> Swap:       4194240           0     4194240
>
> free -k after creating 100 memory cgroups
>               total        used        free      shared  buff/cache   available
> Mem:      523498176     4628416   518246464       22336      623296   516058688
> Swap:       4194240           0     4194240
>
> free -k after destroying 100 memory cgroups
>               total        used        free      shared  buff/cache   available
> Mem:      523498176     4697408   518173760       22400      627008   515987904
> Swap:       4194240           0     4194240
>
> After patch
> free -k at boot
>               total        used        free      shared  buff/cache   available
> Mem:      523498176     3969472   518933888       22272      594816   516731776
> Swap:       4194240           0     4194240
>
> free -k after creating 100 memory cgroups
>               total        used        free      shared  buff/cache   available
> Mem:      523498176     4181888   518676096       22208      640192   516496448
> Swap:       4194240           0     4194240
>
> free -k after destroying 100 memory cgroups
>               total        used        free      shared  buff/cache   available
> Mem:      523498176     4232320   518619904       22272      645952   516443264
> Swap:       4194240           0     4194240
>
> Observations:
> Fixed kernel takes 137344 kb (4106816-3969472) less to boot.
> Fixed kernel takes 309184 kb (4628416-4181888-137344) less to create 100 memcgs.
>
> Cc: Nathan Lynch
> Cc: Michael Ellerman
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: Anton Blanchard
> Cc: Bharata B Rao
> Signed-off-by: Srikar Dronamraju
> ---
>  arch/powerpc/mm/numa.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 9fcf2d195830..3d55cef1a2dc 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -897,7 +897,7 @@ static void __init find_possible_nodes(void)
>  		return;
>
>  	if (of_property_read_u32_index(rtas,
> -				"ibm,max-associativity-domains",
> +				"ibm,current-associativity-domains",

I'm not sure ibm,current-associativity-domains is guaranteed to exist on older firmware.
You may need to check that it exists and fall back to ibm,max-associativity-domains in the event it doesn't. -Tyrel > min_common_depth, )) > goto out; >
Re: [PATCH v3 0/6] powerpc: queued spinlocks and rwlocks
On 7/6/20 12:35 AM, Nicholas Piggin wrote: v3 is updated to use __pv_queued_spin_unlock, noticed by Waiman (thank you). Thanks, Nick Nicholas Piggin (6): powerpc/powernv: must include hvcall.h to get PAPR defines powerpc/pseries: move some PAPR paravirt functions to their own file powerpc: move spinlock implementation to simple_spinlock powerpc/64s: implement queued spinlocks and rwlocks powerpc/pseries: implement paravirt qspinlocks for SPLPAR powerpc/qspinlock: optimised atomic_try_cmpxchg_lock that adds the lock hint arch/powerpc/Kconfig | 13 + arch/powerpc/include/asm/Kbuild | 2 + arch/powerpc/include/asm/atomic.h | 28 ++ arch/powerpc/include/asm/paravirt.h | 89 + arch/powerpc/include/asm/qspinlock.h | 91 ++ arch/powerpc/include/asm/qspinlock_paravirt.h | 7 + arch/powerpc/include/asm/simple_spinlock.h| 292 + .../include/asm/simple_spinlock_types.h | 21 ++ arch/powerpc/include/asm/spinlock.h | 308 +- arch/powerpc/include/asm/spinlock_types.h | 17 +- arch/powerpc/lib/Makefile | 3 + arch/powerpc/lib/locks.c | 12 +- arch/powerpc/platforms/powernv/pci-ioda-tce.c | 1 + arch/powerpc/platforms/pseries/Kconfig| 5 + arch/powerpc/platforms/pseries/setup.c| 6 +- include/asm-generic/qspinlock.h | 4 + 16 files changed, 577 insertions(+), 322 deletions(-) create mode 100644 arch/powerpc/include/asm/paravirt.h create mode 100644 arch/powerpc/include/asm/qspinlock.h create mode 100644 arch/powerpc/include/asm/qspinlock_paravirt.h create mode 100644 arch/powerpc/include/asm/simple_spinlock.h create mode 100644 arch/powerpc/include/asm/simple_spinlock_types.h This patch looks OK to me. I had run some microbenchmark on powerpc system with or w/o the patch. 
On a 2-socket 160-thread SMT4 POWER9 system (not virtualized):

5.8.0-rc4
=========

Running locktest with spinlock [runtime = 10s, load = 1]
Threads = 160, Min/Mean/Max = 77,665/90,153/106,895
Threads = 160, Total Rate = 1,441,759 op/s; Percpu Rate = 9,011 op/s

Running locktest with rwlock [runtime = 10s, r% = 50%, load = 1]
Threads = 160, Min/Mean/Max = 47,879/53,807/63,689
Threads = 160, Total Rate = 860,192 op/s; Percpu Rate = 5,376 op/s

Running locktest with spinlock [runtime = 10s, load = 1]
Threads = 80, Min/Mean/Max = 242,907/319,514/463,161
Threads = 80, Total Rate = 2,555 kop/s; Percpu Rate = 32 kop/s

Running locktest with rwlock [runtime = 10s, r% = 50%, load = 1]
Threads = 80, Min/Mean/Max = 146,161/187,474/259,270
Threads = 80, Total Rate = 1,498 kop/s; Percpu Rate = 19 kop/s

Running locktest with spinlock [runtime = 10s, load = 1]
Threads = 40, Min/Mean/Max = 646,639/1,000,817/1,455,205
Threads = 40, Total Rate = 4,001 kop/s; Percpu Rate = 100 kop/s

Running locktest with rwlock [runtime = 10s, r% = 50%, load = 1]
Threads = 40, Min/Mean/Max = 402,165/597,132/814,555
Threads = 40, Total Rate = 2,388 kop/s; Percpu Rate = 60 kop/s

5.8.0-rc4-qlock+

Running locktest with spinlock [runtime = 10s, load = 1]
Threads = 160, Min/Mean/Max = 123,835/124,580/124,587
Threads = 160, Total Rate = 1,992 kop/s; Percpu Rate = 12 kop/s

Running locktest with rwlock [runtime = 10s, r% = 50%, load = 1]
Threads = 160, Min/Mean/Max = 254,210/264,714/276,784
Threads = 160, Total Rate = 4,231 kop/s; Percpu Rate = 26 kop/s

Running locktest with spinlock [runtime = 10s, load = 1]
Threads = 80, Min/Mean/Max = 599,715/603,397/603,450
Threads = 80, Total Rate = 4,825 kop/s; Percpu Rate = 60 kop/s

Running locktest with rwlock [runtime = 10s, r% = 50%, load = 1]
Threads = 80, Min/Mean/Max = 492,687/525,224/567,456
Threads = 80, Total Rate = 4,199 kop/s; Percpu Rate = 52 kop/s

Running locktest with spinlock [runtime = 10s, load = 1]
Threads = 40, Min/Mean/Max = 1,325,623/1,325,628/1,325,636
Threads = 40, Total Rate = 5,299 kop/s; Percpu Rate = 132 kop/s

Running locktest with rwlock [runtime = 10s, r% = 50%, load = 1]
Threads = 40, Min/Mean/Max = 1,249,731/1,292,977/1,342,815
Threads = 40, Total Rate = 5,168 kop/s; Percpu Rate = 129 kop/s

On systems with a large number of cpus, qspinlock is faster and more fair. With some tuning, we may be able to squeeze out more performance.

Cheers,
Longman
Re: [PATCH] kbuild: introduce ccflags-remove-y and asflags-remove-y
Hi Anders, On Mon, Jul 6, 2020 at 8:24 PM Anders Roxell wrote: > > The full log can be found here [1]. > > Without this patch for 'trace_selftest_dynamic' for instance, CC_FLAGS_FTRACE > was removed from kernel/trace/*, and then added back to > kernel/trace/trace_selftest_dynamic. > While with this patch it looks like we add the flag (even though it is > already there), and then > remove the flag for all files in kernel/trace/* . You are right. I will drop this patch, and send v2. Thank you. -- Best Regards Masahiro Yamada
Re: [PATCH v5 13/26] powerpc/book3s64/pkeys: Enable MMU_FTR_PKEY
/* * Let's assume 32 pkeys on P8 bare metal, if its not defined by device * tree. We make this exception since skiboot forgot to expose this * property on power8. */ if (!firmware_has_feature(FW_FEATURE_LPAR) && - cpu_has_feature(CPU_FTRS_POWER8)) + early_cpu_has_feature(CPU_FTRS_POWER8)) pkeys_total = 32; That's not how cpu_has_feature() works, we'll need to fix that. cheers I did a separate patch to handle that which switches the above to /* * Let's assume 32 pkeys on P8/P9 bare metal, if its not defined by device * tree. We make this exception since skiboot forgot to expose this * property on power8/9. */ if (!firmware_has_feature(FW_FEATURE_LPAR) && (early_cpu_has_feature(CPU_FTR_ARCH_207S) || early_cpu_has_feature(CPU_FTR_ARCH_300))) pkeys_total = 32; We should do a PVR check here i guess. ret = of_scan_flat_dt(dt_scan_storage_keys, &pkeys_total); if (ret == 0) { /* * Let's assume 32 pkeys on P8/P9 bare metal, if its not defined by device * tree. We make this exception since skiboot forgot to expose this * property on power8/9. */ if (!firmware_has_feature(FW_FEATURE_LPAR) && (pvr_version_is(PVR_POWER8) || pvr_version_is(PVR_POWER9))) pkeys_total = 32; } -aneesh
Re: [PATCH v5 3/3] mm/page_alloc: Keep memoryless cpuless node 0 offline
> > What's the point of this indirection other than another way of avoiding
> > empty node 0?
>
> Honestly, I do not have any idea. I've traced it down to
> Author: Andi Kleen
> Date:   Tue Jan 11 15:35:48 2005 -0800

I don't remember all the details, and I can't even find the commit (is it in linux-historic?). But AFAIK there's no guarantee PXMs are small and contiguous, so it seemed better to have a clean zero-based space. Back then we had a lot of problems with buggy SRAT tables in the BIOS, so we really tried to avoid trusting the BIOS as much as possible.

-Andi
Re: [PATCH v5 19/26] powerpc/book3s64/kuap: Move KUAP related function outside radix
On 7/6/20 6:11 PM, Michael Ellerman wrote: "Aneesh Kumar K.V" writes: The next set of patches adds support for kuap with hash translation. That's no longer true of this series. diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c index 0d72c0246052..e93b65a0e6e7 100644 --- a/arch/powerpc/mm/book3s64/pkeys.c +++ b/arch/powerpc/mm/book3s64/pkeys.c @@ -199,6 +200,24 @@ void __init pkey_early_init_devtree(void) return; } +#ifdef CONFIG_PPC_KUAP +void __init setup_kuap(bool disabled) +{ + if (disabled || !early_radix_enabled()) + return; + + if (smp_processor_id() == boot_cpuid) { + pr_info("Activating Kernel Userspace Access Prevention\n"); + cur_cpu_spec->mmu_features |= MMU_FTR_RADIX_KUAP; + } + + /* Make sure userspace can't change the AMR */ + mtspr(SPRN_UAMOR, 0); + mtspr(SPRN_AMR, AMR_KUAP_BLOCKED); + isync(); +} +#endif This makes this code depend on CONFIG_PPC_MEM_KEYS=y, which it didn't used to. That risks breaking people's existing .configs, if they have PPC_MEM_KEYS=n they will now lose KUAP. And I'm not convinced the two features should be tied together, at least at the user-visible Kconfig level. That simplifies the addition of hash kuap a lot. Especially in the exception entry and return paths. I did try to consider them as independent options. But then the feature fixup in asm code gets unnecessarily complicated. Also the UAMOR handling also get complicated. -aneesh
Re: [PATCH v5 18/26] powerpc/book3s64/keys/kuap: Reset AMR/IAMR values on kexec
On 7/6/20 5:59 PM, Michael Ellerman wrote:
> "Aneesh Kumar K.V" writes:
>> As we kexec across kernels that use AMR/IAMR for different purposes
>> we need to ensure that new kernels get kexec'd with a reset value of
>> AMR/IAMR. For example, the new kernel can use key 0 for the kernel
>> mapping and the old AMR value prevents access to key 0.
>>
>> This patch also removes the reset of IAMR and AMOR in kexec_sequence.
>> Reset of AMOR is not needed, and the IAMR reset is partial (it doesn't
>> do the reset on secondary cpus) and is redundant with this patch.
>
> I like the idea of cleaning this stuff up. But I think tying it into
> kup/kuep/kuap and the MMU features and ifdefs and so on is overly
> complicated. We should just have eg:
>
> void reset_sprs(void)
> {
> 	if (early_cpu_has_feature(CPU_FTR_ARCH_206)) {
> 		mtspr(SPRN_AMR, 0);
> 		mtspr(SPRN_UAMOR, 0);
> 	}
>
> 	if (early_cpu_has_feature(CPU_FTR_ARCH_207S)) {
> 		mtspr(SPRN_IAMR, 0);
> 	}
> }
>
> And call that from the kexec paths.

Will fix. Should that be early_cpu_has_feature()? cpu_has_feature() should work there, right?

-aneesh
Re: [PATCH v5 13/26] powerpc/book3s64/pkeys: Enable MMU_FTR_PKEY
On 7/6/20 6:40 PM, Michael Ellerman wrote: "Aneesh Kumar K.V" writes: Parse storage keys related device tree entry in early_init_devtree and enable MMU feature MMU_FTR_PKEY if pkeys are supported. MMU feature is used instead of CPU feature because this enables us to group MMU_FTR_KUAP and MMU_FTR_PKEY in asm feature fixup code. Subject should probably be "Add MMU_FTR_PKEY". diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h index f4ac25d4df05..72966d3d8f64 100644 --- a/arch/powerpc/include/asm/mmu.h +++ b/arch/powerpc/include/asm/mmu.h @@ -23,6 +23,7 @@ /* Radix page table supported and enabled */ #define MMU_FTR_TYPE_RADIXASM_CONST(0x0040) +#define MMU_FTR_PKEY ASM_CONST(0x0080) It's not a type, so it should be with the individual feature bits below: We don't have free bit in the other group. For now i will move this to modified arch/powerpc/include/asm/mmu.h @@ -23,12 +23,15 @@ /* Radix page table supported and enabled */ #define MMU_FTR_TYPE_RADIX ASM_CONST(0x0040) -#define MMU_FTR_PKEY ASM_CONST(0x0080) /* * Individual features below. */ +/* + * Support for memory protection keys. + */ +#define MMU_FTR_PKEY ASM_CONST(0x1000) /* * Support for 68 bit VA space. We added that from ISA 2.05 */ /* * Individual features below. diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c index 6a3bac357e24..6d70797352d8 100644 --- a/arch/powerpc/kernel/prom.c +++ b/arch/powerpc/kernel/prom.c @@ -815,6 +815,11 @@ void __init early_init_devtree(void *params) /* Now try to figure out if we are running on LPAR and so on */ pseries_probe_fw_features(); + /* +* Initialize pkey features and default AMR/IAMR values +*/ + pkey_early_init_devtree(); Not your fault, but the fact that we're having to do more and more initialisation this early, based on the flat device tree, makes me wonder if we need to rethink how we're doing the CPU/MMU feature setup. 
diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c index 0ff59acdbb84..bbba9c601e14 100644 --- a/arch/powerpc/mm/book3s64/pkeys.c +++ b/arch/powerpc/mm/book3s64/pkeys.c @@ -38,38 +39,45 @@ static int execute_only_key = 2; #define PKEY_REG_BITS (sizeof(u64) * 8) #define pkeyshift(pkey) (PKEY_REG_BITS - ((pkey+1) * AMR_BITS_PER_PKEY)) +static int __init dt_scan_storage_keys(unsigned long node, + const char *uname, int depth, + void *data) +{ + const char *type = of_get_flat_dt_prop(node, "device_type", NULL); + const __be32 *prop; + int pkeys_total; + + /* We are scanning "cpu" nodes only */ + if (type == NULL || strcmp(type, "cpu") != 0) + return 0; + + prop = of_get_flat_dt_prop(node, "ibm,processor-storage-keys", NULL); + if (!prop) + return 0; + pkeys_total = be32_to_cpu(prop[0]); + return pkeys_total; That's not really the way the return value is meant to be used for these scanning functions. The usual return values are 0 to keep scanning and !0 means we've found the node we're looking for and we should stop scanning. Doing it this way means if you find 0 pkeys it will keep scanning. Instead you should pass _total as the data pointer and set that. 
+} + done modified arch/powerpc/mm/book3s64/pkeys.c @@ -52,7 +52,7 @@ static int __init dt_scan_storage_keys(unsigned long node, { const char *type = of_get_flat_dt_prop(node, "device_type", NULL); const __be32 *prop; - int pkeys_total; + int *pkeys_total = (int *) data; /* We are scanning "cpu" nodes only */ if (type == NULL || strcmp(type, "cpu") != 0) @@ -61,12 +61,13 @@ static int __init dt_scan_storage_keys(unsigned long node, prop = of_get_flat_dt_prop(node, "ibm,processor-storage-keys", NULL); if (!prop) return 0; - pkeys_total = be32_to_cpu(prop[0]); - return pkeys_total; + *pkeys_total = be32_to_cpu(prop[0]); + return 1; } static int scan_pkey_feature(void) { + int ret; int pkeys_total; /* @@ -75,8 +76,8 @@ static int scan_pkey_feature(void) if (early_radix_enabled()) return 0; - pkeys_total = of_scan_flat_dt(dt_scan_storage_keys, NULL); - if (pkeys_total == 0) { + ret = of_scan_flat_dt(dt_scan_storage_keys, _total); + if (ret == 0) { /* * Let's assume 32 pkeys on P8/P9 bare metal, if its not defined by device @@ -87,7 +88,6 @@ static int scan_pkey_feature(void) (early_cpu_has_feature(CPU_FTR_ARCH_207S) || early_cpu_has_feature(CPU_FTR_ARCH_300))) pkeys_total = 32; - } /* static int
[PATCH] powerpc/spufs: add CONFIG_COREDUMP dependency
The kernel test robot pointed out a slightly different error message after recent commit 5456ffdee666 ("powerpc/spufs: simplify spufs core dumping") to spufs for a configuration that never worked: powerpc64-linux-ld: arch/powerpc/platforms/cell/spufs/file.o: in function `.spufs_proxydma_info_dump': >> file.c:(.text+0x4c68): undefined reference to `.dump_emit' powerpc64-linux-ld: arch/powerpc/platforms/cell/spufs/file.o: in function `.spufs_dma_info_dump': file.c:(.text+0x4d70): undefined reference to `.dump_emit' powerpc64-linux-ld: arch/powerpc/platforms/cell/spufs/file.o: in function `.spufs_wbox_info_dump': file.c:(.text+0x4df4): undefined reference to `.dump_emit' Add a Kconfig dependency to prevent this from happening again. Reported-by: kernel test robot Signed-off-by: Arnd Bergmann --- arch/powerpc/platforms/cell/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/powerpc/platforms/cell/Kconfig b/arch/powerpc/platforms/cell/Kconfig index 0f7c8241912b..f2ff359041ee 100644 --- a/arch/powerpc/platforms/cell/Kconfig +++ b/arch/powerpc/platforms/cell/Kconfig @@ -44,6 +44,7 @@ config SPU_FS tristate "SPU file system" default m depends on PPC_CELL + depends on COREDUMP select SPU_BASE help The SPU file system is used to access Synergistic Processing -- 2.27.0
Re: [PATCH v5 13/26] powerpc/book3s64/pkeys: Enable MMU_FTR_PKEY
"Aneesh Kumar K.V" writes: > Parse storage keys related device tree entry in early_init_devtree > and enable MMU feature MMU_FTR_PKEY if pkeys are supported. > > MMU feature is used instead of CPU feature because this enables us > to group MMU_FTR_KUAP and MMU_FTR_PKEY in asm feature fixup code. Subject should probably be "Add MMU_FTR_PKEY". > diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h > index f4ac25d4df05..72966d3d8f64 100644 > --- a/arch/powerpc/include/asm/mmu.h > +++ b/arch/powerpc/include/asm/mmu.h > @@ -23,6 +23,7 @@ > > /* Radix page table supported and enabled */ > #define MMU_FTR_TYPE_RADIX ASM_CONST(0x0040) > +#define MMU_FTR_PKEY ASM_CONST(0x0080) It's not a type, so it should be with the individual feature bits below: > /* > * Individual features below. > diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c > index 6a3bac357e24..6d70797352d8 100644 > --- a/arch/powerpc/kernel/prom.c > +++ b/arch/powerpc/kernel/prom.c > @@ -815,6 +815,11 @@ void __init early_init_devtree(void *params) > /* Now try to figure out if we are running on LPAR and so on */ > pseries_probe_fw_features(); > > + /* > + * Initialize pkey features and default AMR/IAMR values > + */ > + pkey_early_init_devtree(); Not your fault, but the fact that we're having to do more and more initialisation this early, based on the flat device tree, makes me wonder if we need to rethink how we're doing the CPU/MMU feature setup. 
> diff --git a/arch/powerpc/mm/book3s64/pkeys.c > b/arch/powerpc/mm/book3s64/pkeys.c > index 0ff59acdbb84..bbba9c601e14 100644 > --- a/arch/powerpc/mm/book3s64/pkeys.c > +++ b/arch/powerpc/mm/book3s64/pkeys.c > @@ -38,38 +39,45 @@ static int execute_only_key = 2; > #define PKEY_REG_BITS (sizeof(u64) * 8) > #define pkeyshift(pkey) (PKEY_REG_BITS - ((pkey+1) * AMR_BITS_PER_PKEY)) > > +static int __init dt_scan_storage_keys(unsigned long node, > +const char *uname, int depth, > +void *data) > +{ > + const char *type = of_get_flat_dt_prop(node, "device_type", NULL); > + const __be32 *prop; > + int pkeys_total; > + > + /* We are scanning "cpu" nodes only */ > + if (type == NULL || strcmp(type, "cpu") != 0) > + return 0; > + > + prop = of_get_flat_dt_prop(node, "ibm,processor-storage-keys", NULL); > + if (!prop) > + return 0; > + pkeys_total = be32_to_cpu(prop[0]); > + return pkeys_total; That's not really the way the return value is meant to be used for these scanning functions. The usual return values are 0 to keep scanning and !0 means we've found the node we're looking for and we should stop scanning. Doing it this way means if you find 0 pkeys it will keep scanning. Instead you should pass _total as the data pointer and set that. > +} > + > static int scan_pkey_feature(void) > { > - u32 vals[2]; > - int pkeys_total = 0; > - struct device_node *cpu; > + int pkeys_total; > > /* >* Pkey is not supported with Radix translation. >*/ > - if (radix_enabled()) > + if (early_radix_enabled()) > return 0; > > - cpu = of_find_node_by_type(NULL, "cpu"); > - if (!cpu) > - return 0; > + pkeys_total = of_scan_flat_dt(dt_scan_storage_keys, NULL); > + if (pkeys_total == 0) { > > - if (of_property_read_u32_array(cpu, > -"ibm,processor-storage-keys", vals, 2) > == 0) { > - /* > - * Since any pkey can be used for data or execute, we will > - * just treat all keys as equal and track them as one entity. > - */ > - pkeys_total = vals[0]; > - /* Should we check for IAMR support FIXME!! 
*/ ??? > - } else { This loses the ability to differentiate between no storage-keys property at all vs a property that specifies zero keys, which I don't think is a good change. > /* >* Let's assume 32 pkeys on P8 bare metal, if its not defined > by device >* tree. We make this exception since skiboot forgot to expose > this >* property on power8. >*/ > if (!firmware_has_feature(FW_FEATURE_LPAR) && > - cpu_has_feature(CPU_FTRS_POWER8)) > + early_cpu_has_feature(CPU_FTRS_POWER8)) > pkeys_total = 32; That's not how cpu_has_feature() works, we'll need to fix that. cheers
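The return-value convention Michael describes (return 0 to keep scanning, non-zero to stop, and pass the result out through the opaque `data` pointer) can be modelled outside the kernel. The sketch below is a toy simulation: `fake_node` and `scan_flat_dt()` are made-up stand-ins for the flat-device-tree blob and `of_scan_flat_dt()`, not the real API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a flat-device-tree node; the real kernel
 * walks the FDT blob via of_scan_flat_dt()/of_get_flat_dt_prop(). */
struct fake_node {
	const char *type;	/* "device_type" property, or NULL */
	int storage_keys;	/* ibm,processor-storage-keys, <0 = absent */
};

/* Callback following the of_scan_flat_dt() convention: return 0 to
 * keep scanning, non-zero to stop; results go out through 'data'. */
static int dt_scan_storage_keys(const struct fake_node *node, void *data)
{
	int *pkeys_total = data;

	if (node->type == NULL || strcmp(node->type, "cpu") != 0)
		return 0;		/* not a cpu node, keep scanning */
	if (node->storage_keys < 0)
		return 0;		/* property absent, keep scanning */

	*pkeys_total = node->storage_keys;
	return 1;			/* found it, stop scanning */
}

/* Minimal model of of_scan_flat_dt(): returns the callback's non-zero
 * value once it stops, or 0 if the scan ran to the end. */
static int scan_flat_dt(const struct fake_node *nodes, size_t n,
			int (*fn)(const struct fake_node *, void *),
			void *data)
{
	size_t i;
	int rc;

	for (i = 0; i < n; i++) {
		rc = fn(&nodes[i], data);
		if (rc)
			return rc;
	}
	return 0;
}

/* Caller-side pattern from the review: distinguish "scan stopped"
 * (ret != 0) from the value found, so 0 keys is still a valid answer. */
static int count_pkeys(const struct fake_node *nodes, size_t n)
{
	int pkeys_total = 0;
	int ret = scan_flat_dt(nodes, n, dt_scan_storage_keys, &pkeys_total);

	if (ret == 0)
		return -1;	/* property absent: caller applies fallback */
	return pkeys_total;
}
```

Note how returning 1 once the property is found lets a property value of 0 still terminate the scan, which is exactly what the original return-the-count approach got wrong.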
Re: [PATCH v5 19/26] powerpc/book3s64/kuap: Move KUAP related function outside radix
"Aneesh Kumar K.V" writes: > The next set of patches adds support for kuap with hash translation. That's no longer true of this series. > diff --git a/arch/powerpc/mm/book3s64/pkeys.c > b/arch/powerpc/mm/book3s64/pkeys.c > index 0d72c0246052..e93b65a0e6e7 100644 > --- a/arch/powerpc/mm/book3s64/pkeys.c > +++ b/arch/powerpc/mm/book3s64/pkeys.c > @@ -199,6 +200,24 @@ void __init pkey_early_init_devtree(void) > return; > } > > +#ifdef CONFIG_PPC_KUAP > +void __init setup_kuap(bool disabled) > +{ > + if (disabled || !early_radix_enabled()) > + return; > + > + if (smp_processor_id() == boot_cpuid) { > + pr_info("Activating Kernel Userspace Access Prevention\n"); > + cur_cpu_spec->mmu_features |= MMU_FTR_RADIX_KUAP; > + } > + > + /* Make sure userspace can't change the AMR */ > + mtspr(SPRN_UAMOR, 0); > + mtspr(SPRN_AMR, AMR_KUAP_BLOCKED); > + isync(); > +} > +#endif This makes this code depend on CONFIG_PPC_MEM_KEYS=y, which it didn't used to. That risks breaking people's existing .configs, if they have PPC_MEM_KEYS=n they will now lose KUAP. And I'm not convinced the two features should be tied together, at least at the user-visible Kconfig level. cheers
Re: [PATCH v12 00/31] Speculative page faults
Le 06/07/2020 à 11:25, Chinwen Chang a écrit :
> On Thu, 2019-06-20 at 16:19 +0800, Haiyan Song wrote:
>> Hi Laurent,
>>
>> I downloaded your script and ran it on an Intel 2-socket Skylake
>> platform with the spf-v12 patch series. Here attached the output
>> results of this script. The following comparison result is
>> statistics from the script outputs.
>>
>> a). Enable THP
>>                                           SPF_0      change  SPF_1
>> will-it-scale.page_fault2.per_thread_ops  2664190.8  -11.7%  2353637.6
>> will-it-scale.page_fault3.per_thread_ops  4480027.2  -14.7%  3819331.9
>>
>> b). Disable THP
>>                                           SPF_0      change  SPF_1
>> will-it-scale.page_fault2.per_thread_ops  2653260.7  -10%    2385165.8
>> will-it-scale.page_fault3.per_thread_ops  4436330.1  -12.4%  3886734.2
>>
>> Thanks,
>> Haiyan Song
>>
>> On Fri, Jun 14, 2019 at 10:44:47AM +0200, Laurent Dufour wrote:
>>> Le 14/06/2019 à 10:37, Laurent Dufour a écrit :
>>>> Please find attached the script I run to get these numbers.
>>>> This would be nice if you could give it a try on your victim node
>>>> and share the result.
>>>
>>> Sounds like the Intel mail filtering system doesn't like the
>>> attached shell script. Please find it there:
>>> https://gist.github.com/ldu4/a5cc1a93f293108ea387d43d5d5e7f44
>>>
>>> Thanks,
>>> Laurent.
>
> Hi Laurent,
>
> We merged SPF v11 and some patches from v12 into our platforms. After
> several experiments, we observed SPF has obvious improvements on the
> launch time of applications, especially for those high-TLP ones.
>
> # launch time of applications (s):
> package         version   w/ SPF  w/o SPF  improve(%)
> -----------------------------------------------------
> Baidu maps      10.13.3   0.887   0.98     9.49
> Taobao          8.4.0.35  1.227   1.293    5.10
> Meituan         9.12.401  1.107   1.543    28.26
> WeChat          7.0.3     2.353   2.68     12.20
> Honor of Kings  1.43.1.6  6.63    6.713    1.24

That's great news, thanks for reporting this!

> By the way, we have verified our platforms with those patches and
> achieved the goal of mass production.

Another good news! For my information, what is your targeted hardware?

Cheers,
Laurent.
Re: [PATCH v5 18/26] powerpc/book3s64/keys/kuap: Reset AMR/IAMR values on kexec
"Aneesh Kumar K.V" writes: > As we kexec across kernels that use AMR/IAMR for different purposes > we need to ensure that new kernels get kexec'd with a reset value > of AMR/IAMR. For ex: the new kernel can use key 0 for kernel mapping and the > old > AMR value prevents access to key 0. > > This patch also removes reset if IAMR and AMOR in kexec_sequence. Reset of > AMOR > is not needed and the IAMR reset is partial (it doesn't do the reset > on secondary cpus) and is redundant with this patch. I like the idea of cleaning this stuff up. But I think tying it into kup/kuep/kuap and the MMU features and ifdefs and so on is overly complicated. We should just have eg: void reset_sprs(void) { if (early_cpu_has_feature(CPU_FTR_ARCH_206)) { mtspr(SPRN_AMR, 0); mtspr(SPRN_UAMOR, 0); } if (early_cpu_has_feature(CPU_FTR_ARCH_207S)) { mtspr(SPRN_IAMR, 0); } } And call that from the kexec paths. cheers > diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h > b/arch/powerpc/include/asm/book3s/64/kup-radix.h > index 3ee1ec60be84..c57063c35833 100644 > --- a/arch/powerpc/include/asm/book3s/64/kup-radix.h > +++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h > @@ -180,6 +180,26 @@ static inline unsigned long kuap_get_and_check_amr(void) > } > #endif /* CONFIG_PPC_KUAP */ > > +#define reset_kuap reset_kuap > +static inline void reset_kuap(void) > +{ > + if (mmu_has_feature(MMU_FTR_RADIX_KUAP)) { > + mtspr(SPRN_AMR, 0); > + /* Do we need isync()? We are going via a kexec reset */ > + isync(); > + } > +} > + > +#define reset_kuep reset_kuep > +static inline void reset_kuep(void) > +{ > + if (mmu_has_feature(MMU_FTR_KUEP)) { > + mtspr(SPRN_IAMR, 0); > + /* Do we need isync()? 
We are going via a kexec reset */ > + isync(); > + } > +} > + > #endif /* __ASSEMBLY__ */ > > #endif /* _ASM_POWERPC_BOOK3S_64_KUP_RADIX_H */ > diff --git a/arch/powerpc/include/asm/kup.h b/arch/powerpc/include/asm/kup.h > index c745ee41ad66..4dc23a706910 100644 > --- a/arch/powerpc/include/asm/kup.h > +++ b/arch/powerpc/include/asm/kup.h > @@ -113,6 +113,20 @@ static inline void prevent_current_write_to_user(void) > prevent_user_access(NULL, NULL, ~0UL, KUAP_CURRENT_WRITE); > } > > +#ifndef reset_kuap > +#define reset_kuap reset_kuap > +static inline void reset_kuap(void) > +{ > +} > +#endif > + > +#ifndef reset_kuep > +#define reset_kuep reset_kuep > +static inline void reset_kuep(void) > +{ > +} > +#endif > + > #endif /* !__ASSEMBLY__ */ > > #endif /* _ASM_POWERPC_KUAP_H_ */ > diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S > index 1864605eca29..7bb46ad98207 100644 > --- a/arch/powerpc/kernel/misc_64.S > +++ b/arch/powerpc/kernel/misc_64.S > @@ -413,20 +413,6 @@ _GLOBAL(kexec_sequence) > li r0,0 > std r0,16(r1) > > -BEGIN_FTR_SECTION > - /* > - * This is the best time to turn AMR/IAMR off. > - * key 0 is used in radix for supervisor<->user > - * protection, but on hash key 0 is reserved > - * ideally we want to enter with a clean state. > - * NOTE, we rely on r0 being 0 from above. > - */ > - mtspr SPRN_IAMR,r0 > -BEGIN_FTR_SECTION_NESTED(42) > - mtspr SPRN_AMOR,r0 > -END_FTR_SECTION_NESTED_IFSET(CPU_FTR_HVMODE, 42) > -END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300) > - > /* save regs for local vars on new stack. >* yes, we won't go back, but ... 
>*/ > diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c > index b4184092172a..a124715f33ea 100644 > --- a/arch/powerpc/kexec/core_64.c > +++ b/arch/powerpc/kexec/core_64.c > @@ -152,6 +152,9 @@ static void kexec_smp_down(void *arg) > if (ppc_md.kexec_cpu_down) > ppc_md.kexec_cpu_down(0, 1); > > + reset_kuap(); > + reset_kuep(); > + > kexec_smp_wait(); > /* NOTREACHED */ > } > diff --git a/arch/powerpc/mm/book3s64/pgtable.c > b/arch/powerpc/mm/book3s64/pgtable.c > index c58ad1049909..9673f4b74c9a 100644 > --- a/arch/powerpc/mm/book3s64/pgtable.c > +++ b/arch/powerpc/mm/book3s64/pgtable.c > @@ -165,6 +165,9 @@ void mmu_cleanup_all(void) > radix__mmu_cleanup_all(); > else if (mmu_hash_ops.hpte_clear_all) > mmu_hash_ops.hpte_clear_all(); > + > + reset_kuap(); > + reset_kuep(); > } > > #ifdef CONFIG_MEMORY_HOTPLUG > -- > 2.26.2
Re: [PATCH V4 0/4] mm/debug_vm_pgtable: Add some more tests
On Mon, Jul 06, 2020 at 06:18:32AM +0530, Anshuman Khandual wrote: [...] > Tested on arm64, x86 platforms but only build tested on all other enabled > platforms through ARCH_HAS_DEBUG_VM_PGTABLE i.e powerpc, arc, s390. Sorry for not testing this earlier. Works for me on s390. I also tried with a few relevant config options set/unset. Thanks!
Re: [PATCH] kbuild: introduce ccflags-remove-y and asflags-remove-y
On Mon, 6 Jul 2020 at 13:24, Anders Roxell wrote: > > Hi, > > When I built an allmodconfig kernel for arm64, and boot that in qemu > guest I see the following issues: > > [...] > [ 10.451561][T1] Testing tracer function: PASSED > [ 33.087895][T1] Testing dynamic ftrace: .. filter did not > filter .. FAILED! > [ 51.127094][T1] [ cut here ] > [ 51.132387][T1] WARNING: CPU: 0 PID: 1 at > kernel/trace/trace.c:1814 run_tracer_selftest+0x314/0x40c > [ 51.140805][T1] Modules linked in: > [ 51.145398][T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted > 5.8.0-rc3-next-20200703-4-g8cd4bc531754 #1 > [ 51.154146][T1] Hardware name: linux,dummy-virt (DT) > [ 51.159536][T1] pstate: 8045 (Nzcv daif +PAN -UAO BTYPE=--) > [ 51.165711][T1] pc : run_tracer_selftest+0x314/0x40c > [ 51.171167][T1] lr : run_tracer_selftest+0x308/0x40c > [ 51.176475][T1] sp : 69c07c40 > [ 51.180855][T1] x29: 69c07c40 x28: a00015e80320 > [ 51.187187][T1] x27: a00013e074a0 x26: a000150f22ee > [ 51.193520][T1] x25: a000125b35a0 x24: a00013f6a000 > [ 51.199868][T1] x23: a00013f6adc0 x22: > [ 51.206225][T1] x21: a000150f2250 x20: a00015e801c0 > [ 51.212582][T1] x19: a00015e7fa80 x18: 2750 > [ 51.218887][T1] x17: 14c0 x16: 16f0 > [ 51.225229][T1] x15: 14f8 x14: a0001000a3c8 > [ 51.231584][T1] x13: a000124a5fb8 x12: 8d494822 > [ 51.237935][T1] x11: 1fffed494821 x10: 8d494821 > [ 51.244293][T1] x9 : a000101763b0 x8 : 69c077d7 > [ 51.250704][T1] x7 : 0001 x6 : 69c077d0 > [ 51.257060][T1] x5 : 6a7fc040 x4 : > [ 51.263423][T1] x3 : a0001028 x2 : 8af050918b00 > [ 51.269757][T1] x1 : x0 : 0001 > [ 51.276084][T1] Call trace: > [ 51.279934][T1] run_tracer_selftest+0x314/0x40c > [ 51.285174][T1] init_trace_selftests+0x110/0x31c > [ 51.290391][T1] do_one_initcall+0x410/0x960 > [ 51.295315][T1] kernel_init_freeable+0x430/0x4f0 > [ 51.300595][T1] kernel_init+0x20/0x1d8 > [ 51.305180][T1] ret_from_fork+0x10/0x18 > [ 51.309741][T1] irq event stamp: 1401832 > [ 51.314399][T1] hardirqs last enabled at (1401831): > [] 
console_unlock+0xc04/0xd10 > [ 51.322973][T1] hardirqs last disabled at (1401832): > [] debug_exception_enter+0xbc/0x200 > [ 51.331993][T1] softirqs last enabled at (1401828): > [] __do_softirq+0x95c/0x9f8 > [ 51.340490][T1] softirqs last disabled at (1401821): > [] irq_exit+0x118/0x198 > [ 51.348717][T1] _warn_unseeded_randomness: 5 callbacks suppressed > [ 51.349263][T1] random: get_random_bytes called from > print_oops_end_marker+0x48/0x80 with crng_init=0 > [ 51.350502][T1] ---[ end trace c566e8a71c933d3c ]--- > [...] > [40709.672335][C0] pool 2: cpus=0 flags=0x5 nice=0 hung=3s > workers=3 manager: 1455 > [40739.960593][ T26] INFO: lockdep is turned off. > [40775.312499][ T26] Kernel panic - not syncing: hung_task: blocked tasks > [40775.341521][ T26] CPU: 0 PID: 26 Comm: khungtaskd Tainted: G > W 5.8.0-rc3-next-20200703-4-g8cd4bc531754 #1 > [40775.352848][ T26] Hardware name: linux,dummy-virt (DT) > [40775.359304][ T26] Call trace: > [40775.364148][ T26] dump_backtrace+0x0/0x418 > [40775.369918][ T26] show_stack+0x34/0x48 > [40775.375468][ T26] dump_stack+0x1f4/0x2b0 > [40775.381136][ T26] panic+0x2dc/0x6ec > [40775.386430][ T26] watchdog+0x1400/0x1460 > [40775.392103][ T26] kthread+0x23c/0x250 > [40775.397548][ T26] ret_from_fork+0x10/0x18 > [40775.407039][ T26] Kernel Offset: disabled > [40775.412634][ T26] CPU features: 0x240002,20002004 > [40775.418751][ T26] Memory Limit: none > [40775.425823][ T26] ---[ end Kernel panic - not syncing: hung_task: > blocked tasks ]--- > > The full log can be found here [1]. > > Without this patch for 'trace_selftest_dynamic' for instance, CC_FLAGS_FTRACE > was removed from kernel/trace/*, and then added back to > kernel/trace/trace_selftest_dynamic. > While with this patch it looks like we add the flag (even though it is > already there), and then > removes the flag for all files in kernel/trace/* . Hi again, I think the patch below solved the issue for trace_selftest_dynamic. 
However, I think there are still differences in lib/*.

diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index cc5e3c6aaa20..5632a711921f 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -111,12 +111,12 @@ basename_flags = -DKBUILD_BASENAME=$(call name-fix,$(basetarget))
 modname_flags  = -DKBUILD_MODNAME=$(call name-fix,$(modname))
 modfile_flags  = -DKBUILD_MODFILE=$(call stringify,$(modfile))

-orig_c_flags   = $(KBUILD_CPPFLAGS)
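The directory-wide remove/re-add ordering being debated can be sketched as a hypothetical kbuild fragment. This is an illustration of the intended usage, not the actual kernel/trace/Makefile; the object names are examples only.

```make
# Sketch of the intended kbuild usage (hypothetical Makefile fragment).

# Old style: strip the ftrace flags object by object.
#   CFLAGS_REMOVE_trace.o       = $(CC_FLAGS_FTRACE)
#   CFLAGS_REMOVE_trace_clock.o = $(CC_FLAGS_FTRACE)
#   ... one line per object ...

# New style: strip them for every object in the directory at once ...
ccflags-remove-$(CONFIG_FUNCTION_TRACER) += $(CC_FLAGS_FTRACE)

# ... and add them back for the one file that must remain traceable.
# This only behaves as intended if per-file CFLAGS_* additions are
# applied after the directory-wide removal - the ordering Anders's
# report suggests went wrong.
CFLAGS_trace_selftest_dynamic.o += $(CC_FLAGS_FTRACE)
```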
Re: [PATCH] kbuild: introduce ccflags-remove-y and asflags-remove-y
Hi, When I built an allmodconfig kernel for arm64, and boot that in qemu guest I see the following issues: [...] [ 10.451561][T1] Testing tracer function: PASSED [ 33.087895][T1] Testing dynamic ftrace: .. filter did not filter .. FAILED! [ 51.127094][T1] [ cut here ] [ 51.132387][T1] WARNING: CPU: 0 PID: 1 at kernel/trace/trace.c:1814 run_tracer_selftest+0x314/0x40c [ 51.140805][T1] Modules linked in: [ 51.145398][T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.8.0-rc3-next-20200703-4-g8cd4bc531754 #1 [ 51.154146][T1] Hardware name: linux,dummy-virt (DT) [ 51.159536][T1] pstate: 8045 (Nzcv daif +PAN -UAO BTYPE=--) [ 51.165711][T1] pc : run_tracer_selftest+0x314/0x40c [ 51.171167][T1] lr : run_tracer_selftest+0x308/0x40c [ 51.176475][T1] sp : 69c07c40 [ 51.180855][T1] x29: 69c07c40 x28: a00015e80320 [ 51.187187][T1] x27: a00013e074a0 x26: a000150f22ee [ 51.193520][T1] x25: a000125b35a0 x24: a00013f6a000 [ 51.199868][T1] x23: a00013f6adc0 x22: [ 51.206225][T1] x21: a000150f2250 x20: a00015e801c0 [ 51.212582][T1] x19: a00015e7fa80 x18: 2750 [ 51.218887][T1] x17: 14c0 x16: 16f0 [ 51.225229][T1] x15: 14f8 x14: a0001000a3c8 [ 51.231584][T1] x13: a000124a5fb8 x12: 8d494822 [ 51.237935][T1] x11: 1fffed494821 x10: 8d494821 [ 51.244293][T1] x9 : a000101763b0 x8 : 69c077d7 [ 51.250704][T1] x7 : 0001 x6 : 69c077d0 [ 51.257060][T1] x5 : 6a7fc040 x4 : [ 51.263423][T1] x3 : a0001028 x2 : 8af050918b00 [ 51.269757][T1] x1 : x0 : 0001 [ 51.276084][T1] Call trace: [ 51.279934][T1] run_tracer_selftest+0x314/0x40c [ 51.285174][T1] init_trace_selftests+0x110/0x31c [ 51.290391][T1] do_one_initcall+0x410/0x960 [ 51.295315][T1] kernel_init_freeable+0x430/0x4f0 [ 51.300595][T1] kernel_init+0x20/0x1d8 [ 51.305180][T1] ret_from_fork+0x10/0x18 [ 51.309741][T1] irq event stamp: 1401832 [ 51.314399][T1] hardirqs last enabled at (1401831): [] console_unlock+0xc04/0xd10 [ 51.322973][T1] hardirqs last disabled at (1401832): [] debug_exception_enter+0xbc/0x200 [ 51.331993][T1] softirqs last enabled 
at (1401828): [] __do_softirq+0x95c/0x9f8 [ 51.340490][T1] softirqs last disabled at (1401821): [] irq_exit+0x118/0x198 [ 51.348717][T1] _warn_unseeded_randomness: 5 callbacks suppressed [ 51.349263][T1] random: get_random_bytes called from print_oops_end_marker+0x48/0x80 with crng_init=0 [ 51.350502][T1] ---[ end trace c566e8a71c933d3c ]--- [...] [40709.672335][C0] pool 2: cpus=0 flags=0x5 nice=0 hung=3s workers=3 manager: 1455 [40739.960593][ T26] INFO: lockdep is turned off. [40775.312499][ T26] Kernel panic - not syncing: hung_task: blocked tasks [40775.341521][ T26] CPU: 0 PID: 26 Comm: khungtaskd Tainted: G W 5.8.0-rc3-next-20200703-4-g8cd4bc531754 #1 [40775.352848][ T26] Hardware name: linux,dummy-virt (DT) [40775.359304][ T26] Call trace: [40775.364148][ T26] dump_backtrace+0x0/0x418 [40775.369918][ T26] show_stack+0x34/0x48 [40775.375468][ T26] dump_stack+0x1f4/0x2b0 [40775.381136][ T26] panic+0x2dc/0x6ec [40775.386430][ T26] watchdog+0x1400/0x1460 [40775.392103][ T26] kthread+0x23c/0x250 [40775.397548][ T26] ret_from_fork+0x10/0x18 [40775.407039][ T26] Kernel Offset: disabled [40775.412634][ T26] CPU features: 0x240002,20002004 [40775.418751][ T26] Memory Limit: none [40775.425823][ T26] ---[ end Kernel panic - not syncing: hung_task: blocked tasks ]--- The full log can be found here [1]. Without this patch for 'trace_selftest_dynamic' for instance, CC_FLAGS_FTRACE was removed from kernel/trace/*, and then added back to kernel/trace/trace_selftest_dynamic. While with this patch it looks like we add the flag (even though it is already there), and then removes the flag for all files in kernel/trace/* . 
Cheers, Anders [1] https://people.linaro.org/~anders.roxell/output-next-20200703.log On Tue, 30 Jun 2020 at 04:09, Masahiro Yamada wrote: > > On Mon, Jun 29, 2020 at 2:55 PM Michael Ellerman wrote: > > > > Masahiro Yamada writes: > > > CFLAGS_REMOVE_.o works per object, that is, there is no > > > convenient way to filter out flags for every object in a directory. > > > > > > Add ccflags-remove-y and asflags-remove-y to make it easily. > > > > > > Use ccflags-remove-y to clean up some Makefiles. > > > > > > Suggested-by: Sami Tolvanen > > > Signed-off-by: Masahiro Yamada > > > --- > > > > > > arch/arm/boot/compressed/Makefile | 6 +- > > > arch/powerpc/xmon/Makefile| 3 +-- > > > arch/sh/boot/compressed/Makefile | 5 + > > >
Re: [PATCH v6 00/11] Add the multiple PF support for DWC and Layerscape
On Sat, Mar 14, 2020 at 11:30:27AM +0800, Xiaowei Bao wrote: > Add the PCIe EP multiple PF support for DWC and Layerscape, add > the doorbell MSIX function for DWC, use list to manage the PF of > one PCIe controller, and refactor the Layerscape EP driver due to > some platforms difference. > > Xiaowei Bao (11): > PCI: designware-ep: Add multiple PFs support for DWC > PCI: designware-ep: Add the doorbell mode of MSI-X in EP mode > PCI: designware-ep: Move the function of getting MSI capability > forward > PCI: designware-ep: Modify MSI and MSIX CAP way of finding > dt-bindings: pci: layerscape-pci: Add compatible strings for ls1088a > and ls2088a > PCI: layerscape: Fix some format issue of the code > PCI: layerscape: Modify the way of getting capability with different > PEX > PCI: layerscape: Modify the MSIX to the doorbell mode > PCI: layerscape: Add EP mode support for ls1088a and ls2088a > arm64: dts: layerscape: Add PCIe EP node for ls1088a > misc: pci_endpoint_test: Add LS1088a in pci_device_id table > > .../devicetree/bindings/pci/layerscape-pci.txt | 2 + > arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi | 31 +++ > drivers/misc/pci_endpoint_test.c | 2 + > drivers/pci/controller/dwc/pci-layerscape-ep.c | 100 ++-- > drivers/pci/controller/dwc/pcie-designware-ep.c| 255 > + > drivers/pci/controller/dwc/pcie-designware.c | 59 +++-- > drivers/pci/controller/dwc/pcie-designware.h | 48 +++- > 7 files changed, 404 insertions(+), 93 deletions(-) Can you rebase it against v5.8-rc1 please ? Thanks, Lorenzo
[PATCH 11/11 RFC] PCI: Remove "*val = 0" from pcie_capability_read_*()
From: Bolarinwa Olayemi Saheed **TODO** Suggested-by: Bjorn Helgaas Signed-off-by: Bolarinwa Olayemi Saheed --- This patch depends on all of the preceeding patches in this series, otherwise it will introduce bugs as pointed out in the commit message of each. drivers/pci/access.c | 14 -- 1 file changed, 14 deletions(-) diff --git a/drivers/pci/access.c b/drivers/pci/access.c index 79c4a2ef269a..ec95edbb1ac8 100644 --- a/drivers/pci/access.c +++ b/drivers/pci/access.c @@ -413,13 +413,6 @@ int pcie_capability_read_word(struct pci_dev *dev, int pos, u16 *val) if (pcie_capability_reg_implemented(dev, pos)) { ret = pci_read_config_word(dev, pci_pcie_cap(dev) + pos, val); - /* -* Reset *val to 0 if pci_read_config_word() fails, it may -* have been written as 0x if hardware error happens -* during pci_read_config_word(). -*/ - if (ret) - *val = 0; return ret; } @@ -448,13 +441,6 @@ int pcie_capability_read_dword(struct pci_dev *dev, int pos, u32 *val) if (pcie_capability_reg_implemented(dev, pos)) { ret = pci_read_config_dword(dev, pci_pcie_cap(dev) + pos, val); - /* -* Reset *val to 0 if pci_read_config_dword() fails, it may -* have been written as 0x if hardware error happens -* during pci_read_config_dword(). -*/ - if (ret) - *val = 0; return ret; } -- 2.18.2
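The caller-side discipline this series depends on, once pcie_capability_read_*() no longer zeroes *val on failure, is to check the return code before trusting the buffer. The sketch below is a toy model: fake_cap_read_word() is a made-up stand-in for pcie_capability_read_word(), used only to illustrate the pattern, not the real kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for pcie_capability_read_word(): on failure it may
 * leave *val holding an all-ones junk pattern, mirroring what a
 * failed config read can leave behind once the core stops resetting
 * *val to 0 on error. */
static int fake_cap_read_word(int fail, uint16_t *val)
{
	if (fail) {
		*val = 0xffff;	/* hardware-error pattern left in place */
		return -1;	/* non-zero error code */
	}
	*val = 0x0042;
	return 0;
}

/* Caller-side pattern the series relies on: consult the return code
 * before using *val, instead of depending on a zeroed-out value. */
static uint16_t read_or_default(int fail, uint16_t def)
{
	uint16_t v;

	if (fake_cap_read_word(fail, &v))
		return def;	/* error: fall back to an explicit default */
	return v;
}
```

The point of the pattern is that a caller which checks the return value never observes the 0xffff junk, so the defensive `*val = 0` in the accessors becomes dead weight.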
[PATCH 0/11 RFC] PCI: Remove "*val = 0" from pcie_capability_read_*()
From: Bolarinwa Olayemi Saheed

*** BLURB HERE ***

Bolarinwa Olayemi Saheed (9):
  IB/hfi1: Confirm that pcie_capability_read_dword() is successful
  misc: rtsx: Confirm that pcie_capability_read_word() is successful
  PCI/AER: Use error return value from pcie_capability_read_*()
  PCI/ASPM: Use error return value from pcie_capability_read_*()
  PCI: pciehp: Fix wrong failure check on pcie_capability_read_*()
  PCI: pciehp: Prevent wrong failure check on pcie_capability_read_*()
  PCI: pciehp: Make "Power On" the default in pciehp_get_power_status()
  PCI/ACPI: Prevent wrong failure check on pcie_capability_read_*()
  PCI: Prevent wrong failure check on pcie_capability_read_*()
  PCI: Remove "*val = 0" from pcie_capability_read_*()

 drivers/infiniband/hw/hfi1/aspm.c | 7 ---
 drivers/misc/cardreader/rts5227.c | 5 +++--
 drivers/misc/cardreader/rts5249.c | 5 +++--
 drivers/misc/cardreader/rts5260.c | 5 +++--
 drivers/misc/cardreader/rts5261.c | 5 +++--
 drivers/pci/pcie/aer.c | 5 +++--
 drivers/pci/pcie/aspm.c | 33 +
 drivers/pci/hotplug/pciehp_hpc.c | 47
 drivers/pci/pci-acpi.c | 10 ---
 drivers/pci/probe.c | 29
 drivers/pci/access.c | 14 --
 11 files changed, 82 insertions(+), 83 deletions(-)

-- 
2.18.2
Re: [PATCH v12 00/31] Speculative page faults
On Thu, 2019-06-20 at 16:19 +0800, Haiyan Song wrote:
> Hi Laurent,
>
> I downloaded your script and ran it on an Intel 2-socket Skylake
> platform with the spf-v12 patch series.
>
> Here attached the output results of this script.
>
> The following comparison result is statistics from the script outputs.
>
> a). Enable THP
>                                           SPF_0      change  SPF_1
> will-it-scale.page_fault2.per_thread_ops  2664190.8  -11.7%  2353637.6
> will-it-scale.page_fault3.per_thread_ops  4480027.2  -14.7%  3819331.9
>
> b). Disable THP
>                                           SPF_0      change  SPF_1
> will-it-scale.page_fault2.per_thread_ops  2653260.7  -10%    2385165.8
> will-it-scale.page_fault3.per_thread_ops  4436330.1  -12.4%  3886734.2
>
> Thanks,
> Haiyan Song
>
> On Fri, Jun 14, 2019 at 10:44:47AM +0200, Laurent Dufour wrote:
>> Le 14/06/2019 à 10:37, Laurent Dufour a écrit :
>>> Please find attached the script I run to get these numbers.
>>> This would be nice if you could give it a try on your victim node
>>> and share the result.
>>
>> Sounds like the Intel mail filtering system doesn't like the
>> attached shell script. Please find it there:
>> https://gist.github.com/ldu4/a5cc1a93f293108ea387d43d5d5e7f44
>>
>> Thanks,
>> Laurent.

Hi Laurent,

We merged SPF v11 and some patches from v12 into our platforms. After several experiments, we observed SPF has obvious improvements on the launch time of applications, especially for those high-TLP ones.

# launch time of applications (s):
package         version   w/ SPF  w/o SPF  improve(%)
-----------------------------------------------------
Baidu maps      10.13.3   0.887   0.98     9.49
Taobao          8.4.0.35  1.227   1.293    5.10
Meituan         9.12.401  1.107   1.543    28.26
WeChat          7.0.3     2.353   2.68     12.20
Honor of Kings  1.43.1.6  6.63    6.713    1.24

By the way, we have verified our platforms with those patches and achieved the goal of mass production.

Thanks.
Chinwen Chang
Re: [PATCH] powerpc: select ARCH_HAS_MEMBARRIER_SYNC_CORE
Le 06/07/2020 à 04:18, Nicholas Piggin a écrit :

powerpc return from interrupt and return from system call sequences are
context synchronising.

Signed-off-by: Nicholas Piggin
---
 .../features/sched/membarrier-sync-core/arch-support.txt | 4 ++--
 arch/powerpc/Kconfig                                     | 1 +
 arch/powerpc/include/asm/exception-64s.h                 | 4
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/Documentation/features/sched/membarrier-sync-core/arch-support.txt b/Documentation/features/sched/membarrier-sync-core/arch-support.txt
index 8a521a622966..52ad74a25f54 100644
--- a/Documentation/features/sched/membarrier-sync-core/arch-support.txt
+++ b/Documentation/features/sched/membarrier-sync-core/arch-support.txt
@@ -5,7 +5,7 @@
 #
 # Architecture requirements
 #
-# * arm/arm64
+# * arm/arm64/powerpc
 #
 # Rely on implicit context synchronization as a result of exception return
 # when returning from IPI handler, and when returning to user-space.
@@ -45,7 +45,7 @@
 |       nios2: | TODO |
 |    openrisc: | TODO |
 |      parisc: | TODO |
-|     powerpc: | TODO |
+|     powerpc: |  ok  |
 |       riscv: | TODO |
 |        s390: | TODO |
 |          sh: | TODO |
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 9fa23eb320ff..920c4e3ca4ef 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -131,6 +131,7 @@ config PPC
 	select ARCH_HAS_PTE_DEVMAP		if PPC_BOOK3S_64
 	select ARCH_HAS_PTE_SPECIAL
 	select ARCH_HAS_MEMBARRIER_CALLBACKS
+	select ARCH_HAS_MEMBARRIER_SYNC_CORE
 	select ARCH_HAS_SCALED_CPUTIME		if VIRT_CPU_ACCOUNTING_NATIVE && PPC_BOOK3S_64
 	select ARCH_HAS_STRICT_KERNEL_RWX	if (PPC32 && !HIBERNATION)
 	select ARCH_HAS_TICK_BROADCAST		if GENERIC_CLOCKEVENTS_BROADCAST
diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
index 47bd4ea0837d..b88cb3a989b6 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -68,6 +68,10 @@
  *
  * The nop instructions allow us to insert one or more instructions to flush the
  * L1-D cache when returning to userspace or a guest.
+ *
+ * powerpc relies on return from interrupt/syscall being context synchronising
+ * (which hrfid, rfid, and rfscv are) to support ARCH_HAS_MEMBARRIER_SYNC_CORE
+ * without additional synchronisation instructions.

This file is dedicated to BOOK3S/64. What about other ones ?

On 32 bits, this is also valid as 'rfi' is also context synchronising, but
then why just add some comment in exception-64s.h and only there ?

  */
 #define RFI_FLUSH_SLOT							\
 	RFI_FLUSH_FIXUP_SECTION;				\

Christophe
Re: [PATCH 01/14] powerpc/eeh: Remove eeh_dev_phb_init_dynamic()
Hi Oliver,

I love your patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on v5.8-rc4 next-20200706]
[cannot apply to scottwood/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Oliver-O-Halloran/powerpc-eeh-Remove-eeh_dev_phb_init_dynamic/20200706-103630
base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc64-randconfig-r006-20200706 (attached as .config)
compiler: powerpc64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=powerpc64

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot

All errors (new ones prefixed by >>):

   arch/powerpc/kernel/of_platform.c: In function 'of_pci_phb_probe':
>> arch/powerpc/kernel/of_platform.c:66:2: error: implicit declaration of function 'eeh_phb_pe_create' [-Werror=implicit-function-declaration]
      66 |  eeh_phb_pe_create(phb);
         |  ^
   cc1: all warnings being treated as errors

vim +/eeh_phb_pe_create +66 arch/powerpc/kernel/of_platform.c

    28	
    29	/* The probing of PCI controllers from of_platform is currently
    30	 * 64 bits only, mostly due to gratuitous differences between
    31	 * the 32 and 64 bits PCI code on PowerPC and the 32 bits one
    32	 * lacking some bits needed here.
    33	 */
    34	
    35	static int of_pci_phb_probe(struct platform_device *dev)
    36	{
    37		struct pci_controller *phb;
    38	
    39		/* Check if we can do that ... */
    40		if (ppc_md.pci_setup_phb == NULL)
    41			return -ENODEV;
    42	
    43		pr_info("Setting up PCI bus %pOF\n", dev->dev.of_node);
    44	
    45		/* Alloc and setup PHB data structure */
    46		phb = pcibios_alloc_controller(dev->dev.of_node);
    47		if (!phb)
    48			return -ENODEV;
    49	
    50		/* Setup parent in sysfs */
    51		phb->parent = &dev->dev;
    52	
    53		/* Setup the PHB using arch provided callback */
    54		if (ppc_md.pci_setup_phb(phb)) {
    55			pcibios_free_controller(phb);
    56			return -ENODEV;
    57		}
    58	
    59		/* Process "ranges" property */
    60		pci_process_bridge_OF_ranges(phb, dev->dev.of_node, 0);
    61	
    62		/* Init pci_dn data structures */
    63		pci_devs_phb_init_dynamic(phb);
    64	
    65		/* Create EEH PE for the PHB */
  > 66		eeh_phb_pe_create(phb);
    67	
    68		/* Scan the bus */
    69		pcibios_scan_phb(phb);
    70		if (phb->bus == NULL)
    71			return -ENXIO;
    72	
    73		/* Claim resources. This might need some rework as well depending
    74		 * whether we are doing probe-only or not, like assigning unassigned
    75		 * resources etc...
    76		 */
    77		pcibios_claim_one_bus(phb->bus);
    78	
    79		/* Add probed PCI devices to the device model */
    80		pci_bus_add_devices(phb->bus);
    81	
    82		return 0;
    83	}
    84	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org

.config.gz
Description: application/gzip
Re: [PATCH v2 01/12] kexec_file: allow archs to handle special regions while locating memory hole
On 07/03/20 at 01:24am, Hari Bathini wrote:
> Some architectures may have special memory regions, within the given
> memory range, which can't be used for the buffer in a kexec segment.
> Implement weak arch_kexec_locate_mem_hole() definition which arch code
> may override, to take care of special regions, while trying to locate
> a memory hole.
>
> Also, add the missing declarations for arch overridable functions and
> drop the __weak descriptors in the declarations to avoid non-weak
> definitions from becoming weak.
>
> Reported-by: kernel test robot
> [lkp: In v1, arch_kimage_file_post_load_cleanup() declaration was missing]
> Signed-off-by: Hari Bathini
> ---
>
> Changes in v2:
> * Introduced arch_kexec_locate_mem_hole() for override and dropped
>   weak arch_kexec_add_buffer().
> * Dropped __weak identifier for arch overridable functions.
> * Fixed the missing declaration for arch_kimage_file_post_load_cleanup()
>   reported by lkp. lkp report for reference:
>   - https://lore.kernel.org/patchwork/patch/1264418/
>
>
>  include/linux/kexec.h | 29 ++---
>  kernel/kexec_file.c   | 16 ++--
>  2 files changed, 32 insertions(+), 13 deletions(-)
>
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index ea67910..9e93bef 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -183,17 +183,24 @@ int kexec_purgatory_get_set_symbol(struct kimage *image, const char *name,
>  			bool get_value);
>  void *kexec_purgatory_get_symbol_addr(struct kimage *image, const char *name);
>
> -int __weak arch_kexec_kernel_image_probe(struct kimage *image, void *buf,
> -					 unsigned long buf_len);
> -void * __weak arch_kexec_kernel_image_load(struct kimage *image);
> -int __weak arch_kexec_apply_relocations_add(struct purgatory_info *pi,
> -					    Elf_Shdr *section,
> -					    const Elf_Shdr *relsec,
> -					    const Elf_Shdr *symtab);
> -int __weak arch_kexec_apply_relocations(struct purgatory_info *pi,
> -					Elf_Shdr *section,
> -					const Elf_Shdr *relsec,
> -					const Elf_Shdr *symtab);
> +/* Architectures may override the below functions */
> +int arch_kexec_kernel_image_probe(struct kimage *image, void *buf,
> +				  unsigned long buf_len);
> +void *arch_kexec_kernel_image_load(struct kimage *image);
> +int arch_kexec_apply_relocations_add(struct purgatory_info *pi,
> +				     Elf_Shdr *section,
> +				     const Elf_Shdr *relsec,
> +				     const Elf_Shdr *symtab);
> +int arch_kexec_apply_relocations(struct purgatory_info *pi,
> +				 Elf_Shdr *section,
> +				 const Elf_Shdr *relsec,
> +				 const Elf_Shdr *symtab);
> +int arch_kimage_file_post_load_cleanup(struct kimage *image);
> +#ifdef CONFIG_KEXEC_SIG
> +int arch_kexec_kernel_verify_sig(struct kimage *image, void *buf,
> +				 unsigned long buf_len);
> +#endif
> +int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf);
>
>  extern int kexec_add_buffer(struct kexec_buf *kbuf);
>  int kexec_locate_mem_hole(struct kexec_buf *kbuf);
> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
> index 09cc78d..e89912d 100644
> --- a/kernel/kexec_file.c
> +++ b/kernel/kexec_file.c
> @@ -636,6 +636,19 @@ int kexec_locate_mem_hole(struct kexec_buf *kbuf)
>  }
>
>  /**
> + * arch_kexec_locate_mem_hole - Find free memory to place the segments.
> + * @kbuf: Parameters for the memory search.
> + *
> + * On success, kbuf->mem will have the start address of the memory region found.
> + *
> + * Return: 0 on success, negative errno on error.
> + */
> +int __weak arch_kexec_locate_mem_hole(struct kexec_buf *kbuf)
> +{
> +	return kexec_locate_mem_hole(kbuf);
> +}
> +
> +/**
>   * kexec_add_buffer - place a buffer in a kexec segment
>   * @kbuf: Buffer contents and memory parameters.
>   *
> @@ -647,7 +660,6 @@ int kexec_locate_mem_hole(struct kexec_buf *kbuf)
>   */
>  int kexec_add_buffer(struct kexec_buf *kbuf)
>  {
> -
>  	struct kexec_segment *ksegment;
>  	int ret;
>
> @@ -675,7 +687,7 @@ int kexec_add_buffer(struct kexec_buf *kbuf)
>  	kbuf->buf_align = max(kbuf->buf_align, PAGE_SIZE);
>
>  	/* Walk the RAM ranges and allocate a suitable range for the buffer */
> -	ret = kexec_locate_mem_hole(kbuf);
> +	ret = arch_kexec_locate_mem_hole(kbuf);
>  	if (ret)
>  		return ret;
>

Acked-by: Dave Young

Thanks
Dave
Re: [PATCH v5 15/26] powerpc/book3s64/pkeys: Use execute_pkey_disable static key
On 7/6/20 12:50 PM, Michael Ellerman wrote:

"Aneesh Kumar K.V" writes:

Use execute_pkey_disabled static key to check for execute key support
instead of pkey_disabled.

Signed-off-by: Aneesh Kumar K.V
---
 arch/powerpc/include/asm/pkeys.h | 10 +-
 arch/powerpc/mm/book3s64/pkeys.c |  5 -
 2 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index 47c81d41ea9a..09fbaa409ac4 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -126,15 +126,7 @@ static inline int mm_pkey_free(struct mm_struct *mm, int pkey)
  * Try to dedicate one of the protection keys to be used as an
  * execute-only protection key.
  */
-extern int __execute_only_pkey(struct mm_struct *mm);
-static inline int execute_only_pkey(struct mm_struct *mm)
-{
-	if (static_branch_likely(&pkey_disabled))
-		return -1;
-
-	return __execute_only_pkey(mm);
-}
-
+extern int execute_only_pkey(struct mm_struct *mm);
 extern int __arch_override_mprotect_pkey(struct vm_area_struct *vma,
 					 int prot, int pkey);
 static inline int arch_override_mprotect_pkey(struct vm_area_struct *vma,
diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index bbba9c601e14..fed4f159011b 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -345,8 +345,11 @@ void thread_pkey_regs_init(struct thread_struct *thread)
 	write_uamor(default_uamor);
 }

-int __execute_only_pkey(struct mm_struct *mm)
+int execute_only_pkey(struct mm_struct *mm)
 {
+	if (static_branch_likely(&execute_pkey_disabled))
+		return -1;
+
 	return mm->context.execute_only_pkey;
 }

That adds the overhead of a function call, but then uses a static_key to
avoid an easy to predict branch, which seems like a bad tradeoff. And it's
not a performance critical path AFAICS.

Anyway this seems unnecessary:

pkey_early_init_devtree()
{
	...
	if (unlikely(max_pkey <= execute_only_key)) {
		/*
		 * Insufficient number of keys to support
		 * execute only key. Mark it unavailable.
		 */
		execute_only_key = -1;

void pkey_mm_init(struct mm_struct *mm)
{
	...
	mm->context.execute_only_pkey = execute_only_key;
}

ie. Can't it just be:

static inline int execute_only_pkey(struct mm_struct *mm)
{
	return mm->context.execute_only_pkey;
}

ok updated with modified arch/powerpc/mm/book3s64/pkeys.c

@@ -151,7 +151,7 @@ void __init pkey_early_init_devtree(void)
 	max_pkey = pkeys_total;
 #endif

-	if (unlikely(max_pkey <= execute_only_key)) {
+	if (unlikely(max_pkey <= execute_only_key) || !pkey_execute_disable_supported) {
 		/*
 		 * Insufficient number of keys to support
 		 * execute only key. Mark it unavailable.
@@ -368,9 +368,6 @@ int __arch_set_user_pkey_access(struct task_struct *tsk, int pkey,

 int execute_only_pkey(struct mm_struct *mm)
 {
-	if (static_branch_likely(&execute_pkey_disabled))
-		return -1;
-
 	return mm->context.execute_only_pkey;
 }

-aneesh
Re: [PATCH v5 11/26] powerpc/book3s64/pkeys: Make initial_allocation_mask static
On 7/6/20 12:34 PM, Michael Ellerman wrote:

"Aneesh Kumar K.V" writes:

initial_allocation_mask is not used outside this file.

And never modified after init, so make it __ro_after_init as well?

ok, will update reserved_allocation_mask too.

cheers

diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index 652bad7334f3..47c81d41ea9a 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -13,7 +13,6 @@

 DECLARE_STATIC_KEY_FALSE(pkey_disabled);
 extern int max_pkey;
-extern u32 initial_allocation_mask; /* bits set for the initially allocated keys */
 extern u32 reserved_allocation_mask; /* bits set for reserved keys */

 #define ARCH_VM_PKEY_FLAGS (VM_PKEY_BIT0 | VM_PKEY_BIT1 | VM_PKEY_BIT2 | \
diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index a4d7287082a8..73b5ef1490c8 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -15,11 +15,11 @@
 DEFINE_STATIC_KEY_FALSE(pkey_disabled);
 DEFINE_STATIC_KEY_FALSE(execute_pkey_disabled);
 int max_pkey;	/* Maximum key value supported */
-u32 initial_allocation_mask;   /* Bits set for the initially allocated keys */
 /*
  * Keys marked in the reservation list cannot be allocated by userspace
  */
 u32 reserved_allocation_mask;
+static u32 initial_allocation_mask;   /* Bits set for the initially allocated keys */
 static u64 default_amr;
 static u64 default_iamr;
 /* Allow all keys to be modified by default */
--
2.26.2

-aneesh
Re: [PATCH v5 08/26] powerpc/book3s64/pkeys: Convert execute key support to static key
On 7/6/20 12:49 PM, Michael Ellerman wrote: "Aneesh Kumar K.V" writes: Convert the bool to a static key like pkey_disabled. I'm not convinced this is worth the added complexity of a static key. It's not used in any performance critical paths, except for context switch, but that's already guarded by: if (old_thread->iamr != new_thread->iamr) Which should always be false on machines which don't have the execute key enabled. Ok will drop the patch. -aneesh
Re: [PATCH v8 4/4] powerpc: Book3S 64-bit "heavyweight" KASAN support
Hi Daniel, Thank you for the patch! Perhaps something to improve: [auto build test WARNING on powerpc/next] [also build test WARNING on linux/master linus/master v5.8-rc4 next-20200703] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/Daniel-Axtens/KASAN-for-powerpc64-radix/20200702-105552 base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next config: powerpc64-randconfig-s031-20200706 (attached as .config) compiler: powerpc64le-linux-gcc (GCC) 9.3.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # apt-get install sparse # sparse version: v0.6.2-14-g8fce3d7a-dirty # save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=powerpc64 If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot sparse warnings: (new ones prefixed by >>) drivers/char/tpm/tpm_ibmvtpm.c:130:9: sparse: sparse: cast removes address space '__iomem' of expression >> drivers/char/tpm/tpm_ibmvtpm.c:131:9: sparse: sparse: incorrect type in >> argument 1 (different address spaces) @@ expected void *s @@ got >> void [noderef] __iomem *rtce_buf @@ >> drivers/char/tpm/tpm_ibmvtpm.c:131:9: sparse: expected void *s drivers/char/tpm/tpm_ibmvtpm.c:131:9: sparse: got void [noderef] __iomem *rtce_buf drivers/char/tpm/tpm_ibmvtpm.c:234:9: sparse: sparse: cast removes address space '__iomem' of expression drivers/char/tpm/tpm_ibmvtpm.c:369:30: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected void const * @@ got void [noderef] __iomem *rtce_buf @@ drivers/char/tpm/tpm_ibmvtpm.c:369:30: sparse: expected void const * drivers/char/tpm/tpm_ibmvtpm.c:369:30: 
sparse: got void [noderef] __iomem *rtce_buf drivers/char/tpm/tpm_ibmvtpm.c:530:43: sparse: sparse: incorrect type in assignment (different address spaces) @@ expected void [noderef] __iomem *rtce_buf @@ got void * @@ drivers/char/tpm/tpm_ibmvtpm.c:530:43: sparse: expected void [noderef] __iomem *rtce_buf drivers/char/tpm/tpm_ibmvtpm.c:530:43: sparse: got void * drivers/char/tpm/tpm_ibmvtpm.c:537:52: sparse: sparse: incorrect type in argument 2 (different address spaces) @@ expected void *ptr @@ got void [noderef] __iomem *rtce_buf @@ drivers/char/tpm/tpm_ibmvtpm.c:537:52: sparse: expected void *ptr drivers/char/tpm/tpm_ibmvtpm.c:537:52: sparse: got void [noderef] __iomem *rtce_buf drivers/char/tpm/tpm_ibmvtpm.c:543:46: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected void const * @@ got void [noderef] __iomem *rtce_buf @@ drivers/char/tpm/tpm_ibmvtpm.c:543:46: sparse: expected void const * drivers/char/tpm/tpm_ibmvtpm.c:543:46: sparse: got void [noderef] __iomem *rtce_buf -- drivers/tty/vt/vt.c:233:5: sparse: sparse: symbol 'console_blank_hook' was not declared. Should it be static? drivers/tty/vt/vt.c:2901:19: sparse: sparse: symbol 'console_driver' was not declared. Should it be static? 
arch/powerpc/include/asm/vga.h:34:16: sparse: sparse: cast to restricted __le16 arch/powerpc/include/asm/vga.h:34:16: sparse: sparse: cast to restricted __le16 arch/powerpc/include/asm/vga.h:34:16: sparse: sparse: cast to restricted __le16 >> arch/powerpc/include/asm/vga.h:40:21: sparse: sparse: incorrect type in >> argument 2 (different base types) @@ expected unsigned short [usertype] >> @@ got restricted __le16 [usertype] @@ >> arch/powerpc/include/asm/vga.h:40:21: sparse: expected unsigned short >> [usertype] arch/powerpc/include/asm/vga.h:40:21: sparse: got restricted __le16 [usertype] arch/powerpc/include/asm/vga.h:34:16: sparse: sparse: cast to restricted __le16 arch/powerpc/include/asm/vga.h:34:16: sparse: sparse: cast to restricted __le16 arch/powerpc/include/asm/vga.h:34:16: sparse: sparse: cast to restricted __le16 arch/powerpc/include/asm/vga.h:34:16: sparse: sparse: cast to restricted __le16 arch/powerpc/include/asm/vga.h:29:15: sparse: sparse: incorrect type in assignment (different base types) @@ expected unsigned short volatile [usertype] @@ got restricted __le16 [usertype] @@ arch/powerpc/include/asm/vga.h:29:15: sparse: expected unsigned short volatile [usertype] arch/powerpc/include/asm/vga.h:29:15: sparse: got restricted __le16 [usertype] arch/powerpc/include/asm/vga.h:34:16: sparse: sparse: cast to restricted __le16 arch/powerpc/include/asm/vga.h:29:
Re: [PATCH v5 17/26] powerpc/book3s64/keys: Print information during boot.
"Aneesh Kumar K.V" writes: > Signed-off-by: Aneesh Kumar K.V > --- > arch/powerpc/mm/book3s64/pkeys.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/arch/powerpc/mm/book3s64/pkeys.c > b/arch/powerpc/mm/book3s64/pkeys.c > index 810118123e70..0d72c0246052 100644 > --- a/arch/powerpc/mm/book3s64/pkeys.c > +++ b/arch/powerpc/mm/book3s64/pkeys.c > @@ -195,6 +195,7 @@ void __init pkey_early_init_devtree(void) >*/ > initial_allocation_mask |= reserved_allocation_mask; > > + pr_info("Enabling Memory keys with max key count %d", max_pkey); ^ missing newline > return; > } The syscall is called pkey_mprotect() and the manpage is "pkeys", so I think it would make sense to use "pkey" in the message. cheers
[RFC][PATCH] avoid refcounting the lazy tlb mm struct
On big systems, the mm refcount can become highly contended when doing a lot
of context switching with threaded applications (particularly switching
between the idle thread and an application thread).

Not doing lazy tlb at all slows switching down quite a bit, so I wonder if
we can avoid the refcount for the lazy tlb, but have __mmdrop() IPI all
CPUs that might be using this mm lazily.

This patch has only had light testing so far, but seems to work okay.

Thanks,
Nick

--
diff --git a/arch/Kconfig b/arch/Kconfig
index 8cc35dc556c7..69ea7172db3d 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -411,6 +411,16 @@ config MMU_GATHER_NO_GATHER
 	bool
 	depends on MMU_GATHER_TABLE_FREE

+config MMU_LAZY_TLB_SHOOTDOWN
+	bool
+	help
+	  Instead of refcounting the "lazy tlb" mm struct, which can cause
+	  contention with multi-threaded apps on large multiprocessor systems,
+	  this option causes __mmdrop to IPI all CPUs in the mm_cpumask and
+	  switch to init_mm if they were using the to-be-freed mm as the lazy
+	  tlb. Architectures which do not track all possible lazy tlb CPUs in
+	  mm_cpumask can not use this (without modification).
+
 config ARCH_HAVE_NMI_SAFE_CMPXCHG
 	bool

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 920c4e3ca4ef..24ac85c868db 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -225,6 +225,7 @@ config PPC
 	select HAVE_PERF_USER_STACK_DUMP
 	select MMU_GATHER_RCU_TABLE_FREE
 	select MMU_GATHER_PAGE_SIZE
+	select MMU_LAZY_TLB_SHOOTDOWN
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_RELIABLE_STACKTRACE		if PPC_BOOK3S_64 && CPU_LITTLE_ENDIAN
 	select HAVE_SYSCALL_TRACEPOINTS
diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
index b5cc9b23cf02..52730629b3eb 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -652,10 +652,10 @@ static void do_exit_flush_lazy_tlb(void *arg)
 		 * Must be a kernel thread because sender is single-threaded.
 		 */
 		BUG_ON(current->mm);
-		mmgrab(&init_mm);
+		mmgrab_lazy_tlb(&init_mm);
 		switch_mm(mm, &init_mm, current);
 		current->active_mm = &init_mm;
-		mmdrop(mm);
+		mmdrop_lazy_tlb(mm);
 	}
 	_tlbiel_pid(pid, RIC_FLUSH_ALL);
 }
diff --git a/fs/exec.c b/fs/exec.c
index e6e8a9a70327..6c96c8feba1f 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1119,7 +1119,7 @@ static int exec_mmap(struct mm_struct *mm)
 		mmput(old_mm);
 		return 0;
 	}
-	mmdrop(active_mm);
+	mmdrop_lazy_tlb(active_mm);
 	return 0;
 }

diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 480a4d1b7dd8..ef28059086a1 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -51,6 +51,25 @@ static inline void mmdrop(struct mm_struct *mm)

 void mmdrop(struct mm_struct *mm);

+static inline void mmgrab_lazy_tlb(struct mm_struct *mm)
+{
+	if (!IS_ENABLED(CONFIG_MMU_LAZY_TLB_SHOOTDOWN))
+		mmgrab(mm);
+}
+
+static inline void mmdrop_lazy_tlb(struct mm_struct *mm)
+{
+	if (!IS_ENABLED(CONFIG_MMU_LAZY_TLB_SHOOTDOWN))
+		mmdrop(mm);
+}
+
+static inline void mmdrop_lazy_tlb_smp_mb(struct mm_struct *mm)
+{
+	mmdrop_lazy_tlb(mm);
+	if (IS_ENABLED(CONFIG_MMU_LAZY_TLB_SHOOTDOWN))
+		smp_mb();
+}
+
 /*
  * This has to be called after a get_task_mm()/mmget_not_zero()
  * followed by taking the mmap_lock for writing before modifying the
diff --git a/kernel/fork.c b/kernel/fork.c
index 142b23645d82..e3f1039cee9f 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -685,6 +685,34 @@ static void check_mm(struct mm_struct *mm)
 #define allocate_mm()	(kmem_cache_alloc(mm_cachep, GFP_KERNEL))
 #define free_mm(mm)	(kmem_cache_free(mm_cachep, (mm)))

+static void do_shoot_lazy_tlb(void *arg)
+{
+	struct mm_struct *mm = arg;
+
+	if (current->active_mm == mm) {
+		BUG_ON(current->mm);
+		switch_mm(mm, &init_mm, current);
+		current->active_mm = &init_mm;
+	}
+}
+
+static void do_check_lazy_tlb(void *arg)
+{
+	struct mm_struct *mm = arg;
+
+	BUG_ON(current->active_mm == mm);
+}
+
+void shoot_lazy_tlbs(struct mm_struct *mm)
+{
+	if (IS_ENABLED(CONFIG_MMU_LAZY_TLB_SHOOTDOWN)) {
+		smp_call_function_many(mm_cpumask(mm), do_shoot_lazy_tlb, (void *)mm, 1);
+		do_shoot_lazy_tlb(mm);
+	}
+	smp_call_function(do_check_lazy_tlb, (void *)mm, 1);
+	do_check_lazy_tlb(mm);
+}
+
 /*
  * Called when the last reference to the mm
  * is dropped: either by a lazy thread or by
@@ -692,6 +720,7 @@ static void check_mm(struct mm_struct *mm)
  */
 void __mmdrop(struct mm_struct *mm)
 {
+	shoot_lazy_tlbs(mm);
 	BUG_ON(mm == &init_mm);
Re: [PATCH v5 10/26] powerpc/book3s64/pkeys: Convert pkey_total to max_pkey
On 7/6/20 12:34 PM, Michael Ellerman wrote:

"Aneesh Kumar K.V" writes:

max_pkey now represents max key value that userspace can allocate.

I guess commit message is confusing.

Signed-off-by: Aneesh Kumar K.V
---
 arch/powerpc/include/asm/pkeys.h |  7 +--
 arch/powerpc/mm/book3s64/pkeys.c | 14 +++---
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index 75d2a2c19c04..652bad7334f3 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -12,7 +12,7 @@
 #include

 DECLARE_STATIC_KEY_FALSE(pkey_disabled);
-extern int pkeys_total; /* total pkeys as per device tree */
+extern int max_pkey;
 extern u32 initial_allocation_mask; /* bits set for the initially allocated keys */
 extern u32 reserved_allocation_mask; /* bits set for reserved keys */

@@ -44,7 +44,10 @@ static inline int vma_pkey(struct vm_area_struct *vma)
 	return (vma->vm_flags & ARCH_VM_PKEY_FLAGS) >> VM_PKEY_SHIFT;
 }

-#define arch_max_pkey() pkeys_total
+static inline int arch_max_pkey(void)
+{
+	return max_pkey;
+}

If pkeys_total = 32 then max_pkey = 31.

we have

#ifdef CONFIG_PPC_4K_PAGES
	/*
	 * The OS can manage only 8 pkeys due to its inability to represent them
	 * in the Linux 4K PTE. Mark all other keys reserved.
	 */
	max_pkey = min(8, pkeys_total);
#else
	max_pkey = pkeys_total;
#endif

so it is 32.

So we can't just substitute one for the other. ie. arch_max_pkey() must
have been wrong, or it is wrong now.

diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index 87d882a9aaf2..a4d7287082a8 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -14,7 +14,7 @@

 DEFINE_STATIC_KEY_FALSE(pkey_disabled);
 DEFINE_STATIC_KEY_FALSE(execute_pkey_disabled);
-int pkeys_total;	/* Total pkeys as per device tree */
+int max_pkey;		/* Maximum key value supported */
 u32 initial_allocation_mask;   /* Bits set for the initially allocated keys */
 /*
  * Keys marked in the reservation list cannot be allocated by userspace
  */
@@ -84,7 +84,7 @@ static int scan_pkey_feature(void)

 static int pkey_initialize(void)
 {
-	int os_reserved, i;
+	int pkeys_total, i;

 	/*
 	 * We define PKEY_DISABLE_EXECUTE in addition to the arch-neutral
@@ -122,12 +122,12 @@ static int pkey_initialize(void)
 	 * The OS can manage only 8 pkeys due to its inability to represent them
 	 * in the Linux 4K PTE. Mark all other keys reserved.
 	 */
-	os_reserved = pkeys_total - 8;
+	max_pkey = min(8, pkeys_total);

Shouldn't it be 7 ?

 #else
-	os_reserved = 0;
+	max_pkey = pkeys_total;
 #endif

-	if (unlikely((pkeys_total - os_reserved) <= execute_only_key)) {
+	if (unlikely(max_pkey <= execute_only_key)) {

Isn't that an off-by-one now?

This is one-off boot time code, there's no need to clutter it with unlikely.

 		/*
 		 * Insufficient number of keys to support
 		 * execute only key. Mark it unavailable.
@@ -174,10 +174,10 @@ static int pkey_initialize(void)
 	default_uamor &= ~(0x3ul << pkeyshift(1));

 	/*
-	 * Prevent the usage of OS reserved the keys. Update UAMOR
+	 * Prevent the usage of OS reserved keys. Update UAMOR
 	 * for those keys.
 	 */
-	for (i = (pkeys_total - os_reserved); i < pkeys_total; i++) {
+	for (i = max_pkey; i < pkeys_total; i++) {

Another off-by-one? Shouldn't we start from max_pkey + 1 ?

 		reserved_allocation_mask |= (0x1 << i);
 		default_uamor &= ~(0x3ul << pkeyshift(i));
 	}

It is the commit message. It is max pkey such that userspace can allocate
keys 0 to max_pkey - 1.

-aneesh
Re: [PATCH v5 15/26] powerpc/book3s64/pkeys: Use execute_pkey_disable static key
"Aneesh Kumar K.V" writes: > Use execute_pkey_disabled static key to check for execute key support instead > of pkey_disabled. > > Signed-off-by: Aneesh Kumar K.V > --- > arch/powerpc/include/asm/pkeys.h | 10 +- > arch/powerpc/mm/book3s64/pkeys.c | 5 - > 2 files changed, 5 insertions(+), 10 deletions(-) > > diff --git a/arch/powerpc/include/asm/pkeys.h > b/arch/powerpc/include/asm/pkeys.h > index 47c81d41ea9a..09fbaa409ac4 100644 > --- a/arch/powerpc/include/asm/pkeys.h > +++ b/arch/powerpc/include/asm/pkeys.h > @@ -126,15 +126,7 @@ static inline int mm_pkey_free(struct mm_struct *mm, int > pkey) > * Try to dedicate one of the protection keys to be used as an > * execute-only protection key. > */ > -extern int __execute_only_pkey(struct mm_struct *mm); > -static inline int execute_only_pkey(struct mm_struct *mm) > -{ > - if (static_branch_likely(_disabled)) > - return -1; > - > - return __execute_only_pkey(mm); > -} > - > +extern int execute_only_pkey(struct mm_struct *mm); > extern int __arch_override_mprotect_pkey(struct vm_area_struct *vma, >int prot, int pkey); > static inline int arch_override_mprotect_pkey(struct vm_area_struct *vma, > diff --git a/arch/powerpc/mm/book3s64/pkeys.c > b/arch/powerpc/mm/book3s64/pkeys.c > index bbba9c601e14..fed4f159011b 100644 > --- a/arch/powerpc/mm/book3s64/pkeys.c > +++ b/arch/powerpc/mm/book3s64/pkeys.c > @@ -345,8 +345,11 @@ void thread_pkey_regs_init(struct thread_struct *thread) > write_uamor(default_uamor); > } > > -int __execute_only_pkey(struct mm_struct *mm) > +int execute_only_pkey(struct mm_struct *mm) > { > + if (static_branch_likely(_pkey_disabled)) > + return -1; > + > return mm->context.execute_only_pkey; > } That adds the overhead of a function call, but then uses a static_key to avoid an easy to predict branch, which seems like a bad tradeoff. And it's not a performance critical path AFAICS. Anyway this seems unnecessary: pkey_early_init_devtree() { ... 
if (unlikely(max_pkey <= execute_only_key)) { /* * Insufficient number of keys to support * execute only key. Mark it unavailable. */ execute_only_key = -1; void pkey_mm_init(struct mm_struct *mm) { ... mm->context.execute_only_pkey = execute_only_key; } ie. Can't it just be: static inline int execute_only_pkey(struct mm_struct *mm) { return mm->context.execute_only_pkey; } cheers
Re: [PATCH v5 08/26] powerpc/book3s64/pkeys: Convert execute key support to static key
"Aneesh Kumar K.V" writes:
> Convert the bool to a static key like pkey_disabled.

I'm not convinced this is worth the added complexity of a static key.

It's not used in any performance-critical paths, except for context
switch, but that's already guarded by:

	if (old_thread->iamr != new_thread->iamr)

Which should always be false on machines which don't have the execute
key enabled.

cheers

> diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
> index 9e68a08799ee..7d400d5a4076 100644
> --- a/arch/powerpc/mm/book3s64/pkeys.c
> +++ b/arch/powerpc/mm/book3s64/pkeys.c
> @@ -13,13 +13,13 @@
>  #include
>
>  DEFINE_STATIC_KEY_TRUE(pkey_disabled);
> +DEFINE_STATIC_KEY_FALSE(execute_pkey_disabled);
>  int pkeys_total;	/* Total pkeys as per device tree */
>  u32 initial_allocation_mask;   /* Bits set for the initially allocated keys */
>  /*
>   * Keys marked in the reservation list cannot be allocated by userspace
>   */
>  u32 reserved_allocation_mask;
> -static bool pkey_execute_disable_supported;
>  static u64 default_amr;
>  static u64 default_iamr;
>  /* Allow all keys to be modified by default */
> @@ -116,9 +116,7 @@ static int pkey_initialize(void)
>  	 * execute_disable support. Instead we use a PVR check.
>  	 */
>  	if (pvr_version_is(PVR_POWER7) || pvr_version_is(PVR_POWER7p))
> -		pkey_execute_disable_supported = false;
> -	else
> -		pkey_execute_disable_supported = true;
> +		static_branch_enable(&execute_pkey_disabled);
>
>  #ifdef CONFIG_PPC_4K_PAGES
>  	/*
> @@ -214,7 +212,7 @@ static inline void write_amr(u64 value)
>
>  static inline u64 read_iamr(void)
>  {
> -	if (!likely(pkey_execute_disable_supported))
> +	if (static_branch_unlikely(&execute_pkey_disabled))
>  		return 0x0UL;
>
>  	return mfspr(SPRN_IAMR);
> @@ -222,7 +220,7 @@ static inline u64 read_iamr(void)
>
>  static inline void write_iamr(u64 value)
>  {
> -	if (!likely(pkey_execute_disable_supported))
> +	if (static_branch_unlikely(&execute_pkey_disabled))
>  		return;
>
>  	mtspr(SPRN_IAMR, value);
> @@ -282,7 +280,7 @@ int __arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
>  		return -EINVAL;
>
>  	if (init_val & PKEY_DISABLE_EXECUTE) {
> -		if (!pkey_execute_disable_supported)
> +		if (static_branch_unlikely(&execute_pkey_disabled))
>  			return -EINVAL;
>  		new_iamr_bits |= IAMR_EX_BIT;
>  	}
> --
> 2.26.2
Re: [PATCH v5 11/26] powerpc/book3s64/pkeys: Make initial_allocation_mask static
"Aneesh Kumar K.V" writes:
> initial_allocation_mask is not used outside this file.

And never modified after init, so make it __ro_after_init as well?

cheers

> diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
> index 652bad7334f3..47c81d41ea9a 100644
> --- a/arch/powerpc/include/asm/pkeys.h
> +++ b/arch/powerpc/include/asm/pkeys.h
> @@ -13,7 +13,6 @@
>
>  DECLARE_STATIC_KEY_FALSE(pkey_disabled);
>  extern int max_pkey;
> -extern u32 initial_allocation_mask; /* bits set for the initially allocated keys */
>  extern u32 reserved_allocation_mask; /* bits set for reserved keys */
>
>  #define ARCH_VM_PKEY_FLAGS (VM_PKEY_BIT0 | VM_PKEY_BIT1 | VM_PKEY_BIT2 | \
> diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
> index a4d7287082a8..73b5ef1490c8 100644
> --- a/arch/powerpc/mm/book3s64/pkeys.c
> +++ b/arch/powerpc/mm/book3s64/pkeys.c
> @@ -15,11 +15,11 @@
>  DEFINE_STATIC_KEY_FALSE(pkey_disabled);
>  DEFINE_STATIC_KEY_FALSE(execute_pkey_disabled);
>  int max_pkey;	/* Maximum key value supported */
> -u32 initial_allocation_mask;   /* Bits set for the initially allocated keys */
>  /*
>   * Keys marked in the reservation list cannot be allocated by userspace
>   */
>  u32 reserved_allocation_mask;
> +static u32 initial_allocation_mask;   /* Bits set for the initially allocated keys */
>  static u64 default_amr;
>  static u64 default_iamr;
>  /* Allow all keys to be modified by default */
> --
> 2.26.2
Re: [PATCH v5 10/26] powerpc/book3s64/pkeys: Convert pkey_total to max_pkey
"Aneesh Kumar K.V" writes:
> max_pkey now represents max key value that userspace can allocate.
>
> Signed-off-by: Aneesh Kumar K.V
> ---
>  arch/powerpc/include/asm/pkeys.h |  7 +--
>  arch/powerpc/mm/book3s64/pkeys.c | 14 +++---
>  2 files changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
> index 75d2a2c19c04..652bad7334f3 100644
> --- a/arch/powerpc/include/asm/pkeys.h
> +++ b/arch/powerpc/include/asm/pkeys.h
> @@ -12,7 +12,7 @@
>  #include
>
>  DECLARE_STATIC_KEY_FALSE(pkey_disabled);
> -extern int pkeys_total; /* total pkeys as per device tree */
> +extern int max_pkey;
>  extern u32 initial_allocation_mask; /* bits set for the initially allocated keys */
>  extern u32 reserved_allocation_mask; /* bits set for reserved keys */
>
> @@ -44,7 +44,10 @@ static inline int vma_pkey(struct vm_area_struct *vma)
>  	return (vma->vm_flags & ARCH_VM_PKEY_FLAGS) >> VM_PKEY_SHIFT;
>  }
>
> -#define arch_max_pkey() pkeys_total
> +static inline int arch_max_pkey(void)
> +{
> +	return max_pkey;
> +}

If pkeys_total = 32 then max_pkey = 31. So we can't just substitute one
for the other, ie. arch_max_pkey() must have been wrong, or it is wrong
now.

> diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
> index 87d882a9aaf2..a4d7287082a8 100644
> --- a/arch/powerpc/mm/book3s64/pkeys.c
> +++ b/arch/powerpc/mm/book3s64/pkeys.c
> @@ -14,7 +14,7 @@
>
>  DEFINE_STATIC_KEY_FALSE(pkey_disabled);
>  DEFINE_STATIC_KEY_FALSE(execute_pkey_disabled);
> -int pkeys_total;	/* Total pkeys as per device tree */
> +int max_pkey;	/* Maximum key value supported */
>  u32 initial_allocation_mask;   /* Bits set for the initially allocated keys */
>  /*
>   * Keys marked in the reservation list cannot be allocated by userspace
> @@ -84,7 +84,7 @@ static int scan_pkey_feature(void)
>
>  static int pkey_initialize(void)
>  {
> -	int os_reserved, i;
> +	int pkeys_total, i;
>
>  	/*
>  	 * We define PKEY_DISABLE_EXECUTE in addition to the arch-neutral
> @@ -122,12 +122,12 @@ static int pkey_initialize(void)
>  	 * The OS can manage only 8 pkeys due to its inability to represent them
>  	 * in the Linux 4K PTE. Mark all other keys reserved.
>  	 */
> -	os_reserved = pkeys_total - 8;
> +	max_pkey = min(8, pkeys_total);

Shouldn't it be 7?

> #else
> -	os_reserved = 0;
> +	max_pkey = pkeys_total;
> #endif
>
> -	if (unlikely((pkeys_total - os_reserved) <= execute_only_key)) {
> +	if (unlikely(max_pkey <= execute_only_key)) {

Isn't that an off-by-one now?

This is one-off boot-time code, there's no need to clutter it with
unlikely.

>  		/*
>  		 * Insufficient number of keys to support
>  		 * execute only key. Mark it unavailable.
> @@ -174,10 +174,10 @@ static int pkey_initialize(void)
>  	default_uamor &= ~(0x3ul << pkeyshift(1));
>
>  	/*
> -	 * Prevent the usage of OS reserved the keys. Update UAMOR
> +	 * Prevent the usage of OS reserved keys. Update UAMOR
>  	 * for those keys.
>  	 */
> -	for (i = (pkeys_total - os_reserved); i < pkeys_total; i++) {
> +	for (i = max_pkey; i < pkeys_total; i++) {

Another off-by-one? Shouldn't we start from max_pkey + 1?

>  		reserved_allocation_mask |= (0x1 << i);
>  		default_uamor &= ~(0x3ul << pkeyshift(i));
>  	}

cheers
[PATCH] powerpc/numa: Restrict possible nodes based on platform
As per PAPR, there are two device tree properties:
ibm,max-associativity-domains (which defines the maximum number of
domains that the firmware, i.e. PowerVM, can support) and
ibm,current-associativity-domains (which defines the maximum number of
domains that the current platform can support). The value of the
ibm,max-associativity-domains property is always greater than or equal
to that of the ibm,current-associativity-domains property.

powerpc currently uses the ibm,max-associativity-domains property while
setting the possible number of nodes. This is currently set at 32.
However the possible number of nodes for a platform may be
significantly less. Hence set the possible number of nodes based on the
ibm,current-associativity-domains property.

$ lsprop /proc/device-tree/rtas/ibm,*associ*-domains
/proc/device-tree/rtas/ibm,current-associativity-domains
	0005 0001 0002 0002 0002 0010
/proc/device-tree/rtas/ibm,max-associativity-domains
	0005 0001 0008 0020 0020 0100

$ cat /sys/devices/system/node/possible		## Before patch
0-31

$ cat /sys/devices/system/node/possible		## After patch
0-1

Note the maximum number of nodes this platform can support is only 2,
but the possible nodes is set to 32. This is important because a lot of
kernel and user space code allocates structures for all possible nodes,
leading to a lot of memory that is allocated but not used.

I ran a simple experiment to create and destroy 100 memory cgroups on
boot on an 8-node machine (Power8 Alpine).
Before patch

free -k at boot
               total        used        free      shared  buff/cache   available
Mem:       523498176     4106816   518820608       22272      570752   516606720
Swap:        4194240           0     4194240

free -k after creating 100 memory cgroups
               total        used        free      shared  buff/cache   available
Mem:       523498176     4628416   518246464       22336      623296   516058688
Swap:        4194240           0     4194240

free -k after destroying 100 memory cgroups
               total        used        free      shared  buff/cache   available
Mem:       523498176     4697408   518173760       22400      627008   515987904
Swap:        4194240           0     4194240

After patch

free -k at boot
               total        used        free      shared  buff/cache   available
Mem:       523498176     3969472   518933888       22272      594816   516731776
Swap:        4194240           0     4194240

free -k after creating 100 memory cgroups
               total        used        free      shared  buff/cache   available
Mem:       523498176     4181888   518676096       22208      640192   516496448
Swap:        4194240           0     4194240

free -k after destroying 100 memory cgroups
               total        used        free      shared  buff/cache   available
Mem:       523498176     4232320   518619904       22272      645952   516443264
Swap:        4194240           0     4194240

Observations:
Fixed kernel takes 137344 kb (4106816-3969472) less to boot.
Fixed kernel takes 309184 kb (4628416-4181888-137344) less to create 100 memcgs.

Cc: Nathan Lynch
Cc: Michael Ellerman
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Anton Blanchard
Cc: Bharata B Rao
Signed-off-by: Srikar Dronamraju
---
 arch/powerpc/mm/numa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 9fcf2d195830..3d55cef1a2dc 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -897,7 +897,7 @@ static void __init find_possible_nodes(void)
 		return;
 
 	if (of_property_read_u32_index(rtas,
-				"ibm,max-associativity-domains",
+				"ibm,current-associativity-domains",
 				min_common_depth, &max_nodes))
 		goto out;
-- 
2.18.2
[RFC v2 2/2] powerpc/powernv : Introduce capability for firmware-enabled-stop
This patch introduces the capability for firmware to handle the stop
states instead. A bit is set based on the discovery of the feature and
correspondingly also the responsibility to handle the stop states.

If the kernel does not know about the stop version, it can fall back to
OPAL for idle stop support if the firmware-stop-supported property is
present.

An earlier part of this patch was posted in this series:
https://lkml.org/lkml/2020/3/4/589

Signed-off-by: Abhishek Goel
Signed-off-by: Pratik Rajesh Sampat
---
v1->v2: Combined patch 2 and 3 from previous iteration and rebased it.

 arch/powerpc/include/asm/processor.h  | 18 ++
 arch/powerpc/kernel/dt_cpu_ftrs.c     | 13 +
 arch/powerpc/platforms/powernv/idle.c | 20 
 drivers/cpuidle/cpuidle-powernv.c     |  3 ++-
 4 files changed, 49 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index bfa336fbcfeb..b8de7146387c 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -428,6 +428,24 @@ extern void power4_idle_nap(void);
 extern unsigned long cpuidle_disable;
 enum idle_boot_override {IDLE_NO_OVERRIDE = 0, IDLE_POWERSAVE_OFF};
 
+#define STOP_ENABLE		0x0001
+#define FIRMWARE_STOP_ENABLE	0x0010
+
+#define STOP_VERSION_P9	0x1
+
+/*
+ * Classify the dependencies of the stop states
+ * @idle_stop: function handler to handle the quirk stop version
+ * @cpuidle_prop: Signify support for stop states through kernel and/or firmware
+ * @stop_version: Classify quirk versions for stop states
+ */
+typedef struct {
+	unsigned long (*idle_stop)(unsigned long psscr, bool mmu_on);
+	uint8_t cpuidle_prop;
+	uint8_t stop_version;
+} stop_deps_t;
+extern stop_deps_t stop_dep;
+
 extern int powersave_nap;	/* set if nap mode can be used in idle loop */
 
 extern void power7_idle_type(unsigned long type);
diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c
index 36bc0d5c4f3a..737686fae3c7 100644
--- a/arch/powerpc/kernel/dt_cpu_ftrs.c
+++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
@@ -291,6 +291,15 @@ static int __init feat_enable_idle_stop(struct dt_cpu_feature *f)
 	lpcr |= LPCR_PECE1;
 	lpcr |= LPCR_PECE2;
 	mtspr(SPRN_LPCR, lpcr);
+	stop_dep.cpuidle_prop |= STOP_ENABLE;
+	stop_dep.stop_version = STOP_VERSION_P9;
+
+	return 1;
+}
+
+static int __init feat_enable_firmware_stop(struct dt_cpu_feature *f)
+{
+	stop_dep.cpuidle_prop |= FIRMWARE_STOP_ENABLE;
 
 	return 1;
 }
@@ -589,6 +598,7 @@ static struct dt_cpu_feature_match __initdata
 	{"idle-nap", feat_enable_idle_nap, 0},
 	{"alignment-interrupt-dsisr", feat_enable_align_dsisr, 0},
 	{"idle-stop", feat_enable_idle_stop, 0},
+	{"firmware-stop-supported", feat_enable_firmware_stop, 0},
 	{"machine-check-power8", feat_enable_mce_power8, 0},
 	{"performance-monitor-power8", feat_enable_pmu_power8, 0},
 	{"data-stream-control-register", feat_enable_dscr, CPU_FTR_DSCR},
@@ -656,6 +666,9 @@ static void __init cpufeatures_setup_start(u32 isa)
 	}
 }
 
+stop_deps_t stop_dep = {NULL, 0x0, 0x0};
+EXPORT_SYMBOL(stop_dep);
+
 static bool __init cpufeatures_process_feature(struct dt_cpu_feature *f)
 {
 	const struct dt_cpu_feature_match *m;
diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index 3afd4293f729..3602950f6c08 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -824,7 +824,7 @@ static unsigned long power9_offline_stop(unsigned long psscr)
 
 #ifndef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 	__ppc64_runlatch_off();
-	srr1 = power9_idle_stop(psscr, true);
+	srr1 = stop_dep.idle_stop(psscr, true);
 	__ppc64_runlatch_on();
 #else
 	/*
@@ -840,7 +840,7 @@ static unsigned long power9_offline_stop(unsigned long psscr)
 	local_paca->kvm_hstate.hwthread_state = KVM_HWTHREAD_IN_IDLE;
 
 	__ppc64_runlatch_off();
-	srr1 = power9_idle_stop(psscr, false);
+	srr1 = stop_dep.idle_stop(psscr, true);
 	__ppc64_runlatch_on();
 
 	local_paca->kvm_hstate.hwthread_state = KVM_HWTHREAD_IN_KERNEL;
@@ -868,7 +868,7 @@ void power9_idle_type(unsigned long stop_psscr_val,
 	psscr = (psscr & ~stop_psscr_mask) | stop_psscr_val;
 
 	__ppc64_runlatch_off();
-	srr1 = power9_idle_stop(psscr, true);
+	srr1 = stop_dep.idle_stop(psscr, true);
 	__ppc64_runlatch_on();
 
 	fini_irq_for_idle_irqsoff();
@@ -1365,8 +1365,20 @@ static int __init pnv_init_idle_states(void)
 	nr_pnv_idle_states = 0;
 	supported_cpuidle_states = 0;
 
-	if (cpuidle_disable != IDLE_NO_OVERRIDE)
+	if (cpuidle_disable != IDLE_NO_OVERRIDE ||
+	    !(stop_dep.cpuidle_prop & STOP_ENABLE))
 		goto out;
+
+	/* Check for supported version