Re: [PATCH linux-next] powerpc: init jump label early in ppc 64

2022-07-25 Thread Zhouyi Zhou
On Mon, Jul 25, 2022 at 3:55 PM Michael Ellerman  wrote:
>
> zhouzho...@gmail.com writes:
> > From: Zhouyi Zhou 
> >
> > In ppc 64, invoke jump_label_init in setup_feature_keys is too late
> > because static key will be used in subroutine of early_init_devtree.
> >
> > So we can invoke jump_label_init earlier in early_setup.
> > We can not move setup_feature_keys backward because its subroutine
> > cpu_feature_keys_init depend on data structures initialized in
> > early_init_devtree.
> >
> > Signed-off-by: Zhouyi Zhou 
> > ---
> > Dear PPC developers
> >
> > I found this bug when trying to do rcutorture tests in ppc VM of
> > Open Source Lab of Oregon State University.
> >
> > qemu-system-ppc64 -nographic -smp cores=8,threads=1 -net none -M pseries 
> > -nodefaults -device spapr-vscsi -serial 
> > file:/home/ubuntu/linux-next/tools/testing/selftests/rcutorture/res/2022.07.19-01.18.42-torture/results-rcutorture/TREE03/console.log
> >  -m 512 -kernel 
> > /home/ubuntu/linux-next/tools/testing/selftests/rcutorture/res/2022.07.19-01.18.42-torture/results-rcutorture/TREE03/vmlinux
> >  -append "debug_boot_weak_hash panic=-1 console=ttyS0 
> > rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot 
> > rcupdate.rcu_task_stall_timeout=3 rcutorture.onoff_interval=200 
> > rcutorture.onoff_holdoff=30 rcutree.gp_preinit_delay=12 
> > rcutree.gp_init_delay=3 rcutree.gp_cleanup_delay=3 rcutree.kthread_prio=2 
> > threadirqs tree.use_softirq=0 rcutorture.n_barrier_cbs=4 
> > rcutorture.stat_interval=15 rcutorture.shutdown_secs=420 
> > rcutorture.test_no_idle_hz=1 rcutorture.verbose=1"
> >
> > console.log report following WARN:
> > [0.00][T0] static_key_enable_cpuslocked(): static key 
> > '0xc2953260' used before call to jump_label_init()^M
> > [0.00][T0] WARNING: CPU: 0 PID: 0 at kernel/jump_label.c:166 
> > static_key_enable_cpuslocked+0xfc/0x120^M
> > [0.00][T0] Modules linked in:^M
> > [0.00][T0] CPU: 0 PID: 0 Comm: swapper Not tainted 
> > 5.19.0-rc5-next-20220708-dirty #131^M
> > [0.00][T0] NIP:  c038068c LR: c0380688 CTR: 
> > c0186ac0^M
> > [0.00][T0] REGS: c2867930 TRAP: 0700   Not tainted  
> > (5.19.0-rc5-next-20220708-dirty)^M
> > [0.00][T0] MSR:  80022003   CR: 24282224  
> > XER: 2004^M
> > [0.00][T0] CFAR: 0730 IRQMASK: 1 ^M
> > [0.00][T0] GPR00: c0380688 c2867bd0 
> > c2868d00 0065 ^M
> > [0.00][T0] GPR04: 0001  
> > 0080 000d ^M
> > [0.00][T0] GPR08:   
> > c27fd000 000f ^M
> > [0.00][T0] GPR12: c0186ac0 c2082280 
> > 0003 000d ^M
> > [0.00][T0] GPR16: 02cc00d0  
> > c2082280 0001 ^M
> > [0.00][T0] GPR20: c2080942  
> >   ^M
> > [0.00][T0] GPR24:  c10d6168 
> >  c20034c8 ^M
> > [0.00][T0] GPR28: 0028  
> > c2080942 c2953260 ^M
> > [0.00][T0] NIP [c038068c] 
> > static_key_enable_cpuslocked+0xfc/0x120^M
> > [0.00][T0] LR [c0380688] 
> > static_key_enable_cpuslocked+0xf8/0x120^M
> > [0.00][T0] Call Trace:^M
> > [0.00][T0] [c2867bd0] [c0380688] 
> > static_key_enable_cpuslocked+0xf8/0x120 (unreliable)^M
> > [0.00][T0] [c2867c40] [c0380810] 
> > static_key_enable+0x30/0x50^M
> > [0.00][T0] [c2867c70] [c2030314] 
> > setup_forced_irqthreads+0x28/0x40^M
> > [0.00][T0] [c2867c90] [c2003568] 
> > do_early_param+0xa0/0x108^M
> > [0.00][T0] [c2867d10] [c0175340] 
> > parse_args+0x290/0x4e0^M
> > [0.00][T0] [c2867e10] [c2003c74] 
> > parse_early_options+0x48/0x5c^M
> > [0.00][T0] [c2867e30] [c2003ce0] 
> > parse_early_param+0x58/0x84^M
> > [0.00][T0] [c2867e60] [c2009878] 
> > early_init_devtree+0xd4/0x518^M
> > [0.00][T0] [c2867f10] [c200aee0] 
> > early_setup+0xb4/0x214^M
> >
> > After this fix, the WARN does not show again.
>
> Hi Zhouyi,
Thank Michael for your guidance.
>
> We have hit something like this previously, see the stack trace in
> commit e7eb919057c3 ("powerpc/64s: Handle program checks in wrong endian
> during early boot").
I am learning the fantastic work by you (git log -p e7eb919057c3),
e7eb919057c3 provides
a trampoline to detect and correct the wrong endian when handling the
exception caused by the WARN
(static key used before call to jump_label_init)
>
> T

Re: [PATCH linux-next] powerpc: init jump label early in ppc 64

2022-07-25 Thread Michael Ellerman
zhouzho...@gmail.com writes:
> From: Zhouyi Zhou 
>
> In ppc 64, invoke jump_label_init in setup_feature_keys is too late
> because static key will be used in subroutine of early_init_devtree.
>
> So we can invoke jump_label_init earlier in early_setup.
> We can not move setup_feature_keys backward because its subroutine
> cpu_feature_keys_init depend on data structures initialized in
> early_init_devtree.
>
> Signed-off-by: Zhouyi Zhou 
> ---
> Dear PPC developers
>
> I found this bug when trying to do rcutorture tests in ppc VM of
> Open Source Lab of Oregon State University.
>
> qemu-system-ppc64 -nographic -smp cores=8,threads=1 -net none -M pseries 
> -nodefaults -device spapr-vscsi -serial 
> file:/home/ubuntu/linux-next/tools/testing/selftests/rcutorture/res/2022.07.19-01.18.42-torture/results-rcutorture/TREE03/console.log
>  -m 512 -kernel 
> /home/ubuntu/linux-next/tools/testing/selftests/rcutorture/res/2022.07.19-01.18.42-torture/results-rcutorture/TREE03/vmlinux
>  -append "debug_boot_weak_hash panic=-1 console=ttyS0 
> rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot 
> rcupdate.rcu_task_stall_timeout=3 rcutorture.onoff_interval=200 
> rcutorture.onoff_holdoff=30 rcutree.gp_preinit_delay=12 
> rcutree.gp_init_delay=3 rcutree.gp_cleanup_delay=3 rcutree.kthread_prio=2 
> threadirqs tree.use_softirq=0 rcutorture.n_barrier_cbs=4 
> rcutorture.stat_interval=15 rcutorture.shutdown_secs=420 
> rcutorture.test_no_idle_hz=1 rcutorture.verbose=1"
>
> console.log report following WARN:
> [0.00][T0] static_key_enable_cpuslocked(): static key 
> '0xc2953260' used before call to jump_label_init()^M
> [0.00][T0] WARNING: CPU: 0 PID: 0 at kernel/jump_label.c:166 
> static_key_enable_cpuslocked+0xfc/0x120^M
> [0.00][T0] Modules linked in:^M
> [0.00][T0] CPU: 0 PID: 0 Comm: swapper Not tainted 
> 5.19.0-rc5-next-20220708-dirty #131^M
> [0.00][T0] NIP:  c038068c LR: c0380688 CTR: 
> c0186ac0^M
> [0.00][T0] REGS: c2867930 TRAP: 0700   Not tainted  
> (5.19.0-rc5-next-20220708-dirty)^M
> [0.00][T0] MSR:  80022003   CR: 24282224  
> XER: 2004^M
> [0.00][T0] CFAR: 0730 IRQMASK: 1 ^M
> [0.00][T0] GPR00: c0380688 c2867bd0 
> c2868d00 0065 ^M
> [0.00][T0] GPR04: 0001  
> 0080 000d ^M
> [0.00][T0] GPR08:   
> c27fd000 000f ^M
> [0.00][T0] GPR12: c0186ac0 c2082280 
> 0003 000d ^M
> [0.00][T0] GPR16: 02cc00d0  
> c2082280 0001 ^M
> [0.00][T0] GPR20: c2080942  
>   ^M
> [0.00][T0] GPR24:  c10d6168 
>  c20034c8 ^M
> [0.00][T0] GPR28: 0028  
> c2080942 c2953260 ^M
> [0.00][T0] NIP [c038068c] 
> static_key_enable_cpuslocked+0xfc/0x120^M
> [0.00][T0] LR [c0380688] 
> static_key_enable_cpuslocked+0xf8/0x120^M
> [0.00][T0] Call Trace:^M
> [0.00][T0] [c2867bd0] [c0380688] 
> static_key_enable_cpuslocked+0xf8/0x120 (unreliable)^M
> [0.00][T0] [c2867c40] [c0380810] 
> static_key_enable+0x30/0x50^M
> [0.00][T0] [c2867c70] [c2030314] 
> setup_forced_irqthreads+0x28/0x40^M
> [0.00][T0] [c2867c90] [c2003568] 
> do_early_param+0xa0/0x108^M
> [0.00][T0] [c2867d10] [c0175340] 
> parse_args+0x290/0x4e0^M
> [0.00][T0] [c2867e10] [c2003c74] 
> parse_early_options+0x48/0x5c^M
> [0.00][T0] [c2867e30] [c2003ce0] 
> parse_early_param+0x58/0x84^M
> [0.00][T0] [c2867e60] [c2009878] 
> early_init_devtree+0xd4/0x518^M
> [0.00][T0] [c2867f10] [c200aee0] 
> early_setup+0xb4/0x214^M
>
> After this fix, the WARN does not show again.

Hi Zhouyi,

We have hit something like this previously, see the stack trace in
commit e7eb919057c3 ("powerpc/64s: Handle program checks in wrong endian
during early boot").

That was fixed incidentally/accidentally by the page_poison code
changing to not use static keys so early.

There was a similar case recently in the random code too, see
60e5b2886b92 ("random: do not use jump labels before they are
initialized").

But I guess this will keep happening, as generic code authors expect to
be able to use static keys in early_param() handlers.

I think the ideal solution would be to move most early param parsing
later. There's only a few parameters that need to be parsed that early
in e

[PATCH linux-next] powerpc: init jump label early in ppc 64

2022-07-23 Thread zhouzhouyi
From: Zhouyi Zhou 

In ppc 64, invoke jump_label_init in setup_feature_keys is too late
because static key will be used in subroutine of early_init_devtree.

So we can invoke jump_label_init earlier in early_setup.
We can not move setup_feature_keys backward because its subroutine
cpu_feature_keys_init depend on data structures initialized in
early_init_devtree.

Signed-off-by: Zhouyi Zhou 
---
Dear PPC developers

I found this bug when trying to do rcutorture tests in ppc VM of
Open Source Lab of Oregon State University.

qemu-system-ppc64 -nographic -smp cores=8,threads=1 -net none -M pseries 
-nodefaults -device spapr-vscsi -serial 
file:/home/ubuntu/linux-next/tools/testing/selftests/rcutorture/res/2022.07.19-01.18.42-torture/results-rcutorture/TREE03/console.log
 -m 512 -kernel 
/home/ubuntu/linux-next/tools/testing/selftests/rcutorture/res/2022.07.19-01.18.42-torture/results-rcutorture/TREE03/vmlinux
 -append "debug_boot_weak_hash panic=-1 console=ttyS0 
rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot 
rcupdate.rcu_task_stall_timeout=3 rcutorture.onoff_interval=200 
rcutorture.onoff_holdoff=30 rcutree.gp_preinit_delay=12 rcutree.gp_init_delay=3 
rcutree.gp_cleanup_delay=3 rcutree.kthread_prio=2 threadirqs tree.use_softirq=0 
rcutorture.n_barrier_cbs=4 rcutorture.stat_interval=15 
rcutorture.shutdown_secs=420 rcutorture.test_no_idle_hz=1 rcutorture.verbose=1"

console.log report following WARN:
[0.00][T0] static_key_enable_cpuslocked(): static key 
'0xc2953260' used before call to jump_label_init()^M
[0.00][T0] WARNING: CPU: 0 PID: 0 at kernel/jump_label.c:166 
static_key_enable_cpuslocked+0xfc/0x120^M
[0.00][T0] Modules linked in:^M
[0.00][T0] CPU: 0 PID: 0 Comm: swapper Not tainted 
5.19.0-rc5-next-20220708-dirty #131^M
[0.00][T0] NIP:  c038068c LR: c0380688 CTR: 
c0186ac0^M
[0.00][T0] REGS: c2867930 TRAP: 0700   Not tainted  
(5.19.0-rc5-next-20220708-dirty)^M
[0.00][T0] MSR:  80022003   CR: 24282224  XER: 
2004^M
[0.00][T0] CFAR: 0730 IRQMASK: 1 ^M
[0.00][T0] GPR00: c0380688 c2867bd0 
c2868d00 0065 ^M
[0.00][T0] GPR04: 0001  
0080 000d ^M
[0.00][T0] GPR08:   
c27fd000 000f ^M
[0.00][T0] GPR12: c0186ac0 c2082280 
0003 000d ^M
[0.00][T0] GPR16: 02cc00d0  
c2082280 0001 ^M
[0.00][T0] GPR20: c2080942  
  ^M
[0.00][T0] GPR24:  c10d6168 
 c20034c8 ^M
[0.00][T0] GPR28: 0028  
c2080942 c2953260 ^M
[0.00][T0] NIP [c038068c] 
static_key_enable_cpuslocked+0xfc/0x120^M
[0.00][T0] LR [c0380688] 
static_key_enable_cpuslocked+0xf8/0x120^M
[0.00][T0] Call Trace:^M
[0.00][T0] [c2867bd0] [c0380688] 
static_key_enable_cpuslocked+0xf8/0x120 (unreliable)^M
[0.00][T0] [c2867c40] [c0380810] 
static_key_enable+0x30/0x50^M
[0.00][T0] [c2867c70] [c2030314] 
setup_forced_irqthreads+0x28/0x40^M
[0.00][T0] [c2867c90] [c2003568] 
do_early_param+0xa0/0x108^M
[0.00][T0] [c2867d10] [c0175340] 
parse_args+0x290/0x4e0^M
[0.00][T0] [c2867e10] [c2003c74] 
parse_early_options+0x48/0x5c^M
[0.00][T0] [c2867e30] [c2003ce0] 
parse_early_param+0x58/0x84^M
[0.00][T0] [c2867e60] [c2009878] 
early_init_devtree+0xd4/0x518^M
[0.00][T0] [c2867f10] [c200aee0] 
early_setup+0xb4/0x214^M

After this fix, the WARN does not show again.

Kind Regards
Zhouyi
--
 arch/powerpc/kernel/setup_64.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 2b2d0b0fbb30..bf2fb76221da 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -365,6 +365,9 @@ void __init early_setup(unsigned long dt_ptr)
 
udbg_printf(" -> %s(), dt_ptr: 0x%lx\n", __func__, dt_ptr);
 
+   /* Initialise jump label because subsequent calls need it */
+   jump_label_init();
+
/*
 * Do early initialization using the flattened device
 * tree, such as retrieving the physical memory map or
@@ -394,8 +397,15 @@ void __init early_setup(unsigned long dt_ptr)
 
/* Apply all the dynamic patching */
apply_feature_fixups();
-   setup_feature_keys();
+
+   /*
+* All