Re: [GIT pull] locking/urgent for v5.10-rc6

2020-12-02 Thread Heiko Carstens
On Wed, Dec 02, 2020 at 11:16:05AM +, Mark Rutland wrote:
> On Wed, Dec 02, 2020 at 11:56:49AM +0100, Heiko Carstens wrote:
> > From 7bd86fb3eb039a4163281472ca79b9158e726526 Mon Sep 17 00:00:00 2001
> > From: Heiko Carstens 
> > Date: Wed, 2 Dec 2020 11:46:01 +0100
> > Subject: [PATCH] s390: fix irq state tracing
> > 
> > With commit 58c644ba512c ("sched/idle: Fix arch_cpu_idle() vs
> > tracing") common code calls arch_cpu_idle() with a lockdep state that
> > says irqs are on.
> > 
> > This doesn't work very well for s390: psw_idle() will enable interrupts
> > to wait for an interrupt. As soon as an interrupt occurs the interrupt
> > handler will verify if the old context was psw_idle(). If that is the
> > case the interrupt enablement bits in the old program status word will
> > be cleared.
> > 
> > A subsequent test in both the external as well as the io interrupt
> > handler checks if in the old context interrupts were enabled. Due to
> > the above patching of the old program status word it is assumed the
> > old context had interrupts disabled, and therefore a call to
> > TRACE_IRQS_OFF (aka trace_hardirqs_off_caller) is skipped. Which in
> > turn makes lockdep incorrectly "think" that interrupts are enabled
> > within the interrupt handler.
> > 
> > Fix this by unconditionally calling TRACE_IRQS_OFF when entering
> > interrupt handlers. Also unconditionally call TRACE_IRQS_ON when
> > leaving interrupt handlers.
> > 
> > This leaves the special psw_idle() case, which now returns with
> > interrupts disabled, but has an "irqs on" lockdep state. So callers of
> > psw_idle() must adjust the state on their own, if required. This is
> > currently only __udelay_disabled().
> > 
> > Fixes: 58c644ba512c ("sched/idle: Fix arch_cpu_idle() vs tracing")
> > Signed-off-by: Heiko Carstens 
> 
> FWIW, this makes sense to me from what I had to chase on the arm64 side,
> and this seems happy atop v5.10-rc6 with all the lockdep and RCU debug
> options enabled when booting to userspace under QEMU.
> 
> Thanks,
> Mark.

Thanks a lot for having a look and testing this!


Re: [GIT pull] locking/urgent for v5.10-rc6

2020-12-02 Thread Peter Zijlstra
On Wed, Dec 02, 2020 at 11:56:49AM +0100, Heiko Carstens wrote:
> From 7bd86fb3eb039a4163281472ca79b9158e726526 Mon Sep 17 00:00:00 2001
> From: Heiko Carstens 
> Date: Wed, 2 Dec 2020 11:46:01 +0100
> Subject: [PATCH] s390: fix irq state tracing
> 
> With commit 58c644ba512c ("sched/idle: Fix arch_cpu_idle() vs
> tracing") common code calls arch_cpu_idle() with a lockdep state that
> says irqs are on.
> 
> This doesn't work very well for s390: psw_idle() will enable interrupts
> to wait for an interrupt. As soon as an interrupt occurs the interrupt
> handler will verify if the old context was psw_idle(). If that is the
> case the interrupt enablement bits in the old program status word will
> be cleared.
> 
> A subsequent test in both the external as well as the io interrupt
> handler checks if in the old context interrupts were enabled. Due to
> the above patching of the old program status word it is assumed the
> old context had interrupts disabled, and therefore a call to
> TRACE_IRQS_OFF (aka trace_hardirqs_off_caller) is skipped. Which in
> turn makes lockdep incorrectly "think" that interrupts are enabled
> within the interrupt handler.
> 
> Fix this by unconditionally calling TRACE_IRQS_OFF when entering
> interrupt handlers. Also unconditionally call TRACE_IRQS_ON when
> leaving interrupt handlers.
> 
> This leaves the special psw_idle() case, which now returns with
> interrupts disabled, but has an "irqs on" lockdep state. So callers of
> psw_idle() must adjust the state on their own, if required. This is
> currently only __udelay_disabled().
> 
> Fixes: 58c644ba512c ("sched/idle: Fix arch_cpu_idle() vs tracing")
> Signed-off-by: Heiko Carstens 

Thanks for sorting this Heiko!

Acked-by: Peter Zijlstra (Intel) 


Re: [GIT pull] locking/urgent for v5.10-rc6

2020-12-02 Thread Mark Rutland
On Wed, Dec 02, 2020 at 11:56:49AM +0100, Heiko Carstens wrote:
> From 7bd86fb3eb039a4163281472ca79b9158e726526 Mon Sep 17 00:00:00 2001
> From: Heiko Carstens 
> Date: Wed, 2 Dec 2020 11:46:01 +0100
> Subject: [PATCH] s390: fix irq state tracing
> 
> With commit 58c644ba512c ("sched/idle: Fix arch_cpu_idle() vs
> tracing") common code calls arch_cpu_idle() with a lockdep state that
> says irqs are on.
> 
> This doesn't work very well for s390: psw_idle() will enable interrupts
> to wait for an interrupt. As soon as an interrupt occurs the interrupt
> handler will verify if the old context was psw_idle(). If that is the
> case the interrupt enablement bits in the old program status word will
> be cleared.
> 
> A subsequent test in both the external as well as the io interrupt
> handler checks if in the old context interrupts were enabled. Due to
> the above patching of the old program status word it is assumed the
> old context had interrupts disabled, and therefore a call to
> TRACE_IRQS_OFF (aka trace_hardirqs_off_caller) is skipped. Which in
> turn makes lockdep incorrectly "think" that interrupts are enabled
> within the interrupt handler.
> 
> Fix this by unconditionally calling TRACE_IRQS_OFF when entering
> interrupt handlers. Also unconditionally call TRACE_IRQS_ON when
> leaving interrupt handlers.
> 
> This leaves the special psw_idle() case, which now returns with
> interrupts disabled, but has an "irqs on" lockdep state. So callers of
> psw_idle() must adjust the state on their own, if required. This is
> currently only __udelay_disabled().
> 
> Fixes: 58c644ba512c ("sched/idle: Fix arch_cpu_idle() vs tracing")
> Signed-off-by: Heiko Carstens 

FWIW, this makes sense to me from what I had to chase on the arm64 side,
and this seems happy atop v5.10-rc6 with all the lockdep and RCU debug
options enabled when booting to userspace under QEMU.

Thanks,
Mark.

> ---
>  arch/s390/kernel/entry.S | 15 ---
>  arch/s390/lib/delay.c|  5 ++---
>  2 files changed, 2 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S
> index 26bb0603c5a1..92beb1444644 100644
> --- a/arch/s390/kernel/entry.S
> +++ b/arch/s390/kernel/entry.S
> @@ -763,12 +763,7 @@ ENTRY(io_int_handler)
>   xc  __PT_FLAGS(8,%r11),__PT_FLAGS(%r11)
>   TSTMSK  __LC_CPU_FLAGS,_CIF_IGNORE_IRQ
>   jo  .Lio_restore
> -#if IS_ENABLED(CONFIG_TRACE_IRQFLAGS)
> - tmhh%r8,0x300
> - jz  1f
>   TRACE_IRQS_OFF
> -1:
> -#endif
>   xc  __SF_BACKCHAIN(8,%r15),__SF_BACKCHAIN(%r15)
>  .Lio_loop:
>   lgr %r2,%r11# pass pointer to pt_regs
> @@ -791,12 +786,7 @@ ENTRY(io_int_handler)
>   TSTMSK  __LC_CPU_FLAGS,_CIF_WORK
>   jnz .Lio_work
>  .Lio_restore:
> -#if IS_ENABLED(CONFIG_TRACE_IRQFLAGS)
> - tm  __PT_PSW(%r11),3
> - jno 0f
>   TRACE_IRQS_ON
> -0:
> -#endif
>   mvc __LC_RETURN_PSW(16),__PT_PSW(%r11)
>   tm  __PT_PSW+1(%r11),0x01   # returning to user ?
>   jno .Lio_exit_kernel
> @@ -976,12 +966,7 @@ ENTRY(ext_int_handler)
>   xc  __PT_FLAGS(8,%r11),__PT_FLAGS(%r11)
>   TSTMSK  __LC_CPU_FLAGS,_CIF_IGNORE_IRQ
>   jo  .Lio_restore
> -#if IS_ENABLED(CONFIG_TRACE_IRQFLAGS)
> - tmhh%r8,0x300
> - jz  1f
>   TRACE_IRQS_OFF
> -1:
> -#endif
>   xc  __SF_BACKCHAIN(8,%r15),__SF_BACKCHAIN(%r15)
>   lgr %r2,%r11# pass pointer to pt_regs
>   lghi%r3,EXT_INTERRUPT
> diff --git a/arch/s390/lib/delay.c b/arch/s390/lib/delay.c
> index daca7bad66de..8c0c68e7770e 100644
> --- a/arch/s390/lib/delay.c
> +++ b/arch/s390/lib/delay.c
> @@ -33,7 +33,7 @@ EXPORT_SYMBOL(__delay);
>  
>  static void __udelay_disabled(unsigned long long usecs)
>  {
> - unsigned long cr0, cr0_new, psw_mask, flags;
> + unsigned long cr0, cr0_new, psw_mask;
>   struct s390_idle_data idle;
>   u64 end;
>  
> @@ -45,9 +45,8 @@ static void __udelay_disabled(unsigned long long usecs)
>   psw_mask = __extract_psw() | PSW_MASK_EXT | PSW_MASK_WAIT;
>   set_clock_comparator(end);
>   set_cpu_flag(CIF_IGNORE_IRQ);
> - local_irq_save(flags);
>   psw_idle(&idle, psw_mask);
> - local_irq_restore(flags);
> + trace_hardirqs_off();
>   clear_cpu_flag(CIF_IGNORE_IRQ);
>   set_clock_comparator(S390_lowcore.clock_comparator);
>   __ctl_load(cr0, 0, 0);
> -- 
> 2.17.1
> 


Re: [GIT pull] locking/urgent for v5.10-rc6

2020-12-02 Thread Heiko Carstens
On Wed, Dec 02, 2020 at 10:21:16AM +0100, Peter Zijlstra wrote:
> On Tue, Dec 01, 2020 at 08:18:56PM +0100, Heiko Carstens wrote:
> OK, so with a little help from s390/PoO and Sven, the code removed skips
> the TRACE_IRQS_OFF when IRQs were enabled in the old PSW (the previous
> context).
> 
> That sounds entirely the right thing. Irrespective of what the previous
> IRQ state was, the current state is off.
> 
> > diff --git a/arch/s390/kernel/idle.c b/arch/s390/kernel/idle.c
> > index 2b85096964f8..5bd8c1044d09 100644
> > --- a/arch/s390/kernel/idle.c
> > +++ b/arch/s390/kernel/idle.c
> > @@ -123,7 +123,6 @@ void arch_cpu_idle_enter(void)
> >  void arch_cpu_idle(void)
> >  {
> > enabled_wait();
> > -   raw_local_irq_enable();
> >  }
> 
> Currently arch_cpu_idle() is defined as to return with IRQs enabled,
> however, the very first thing we do when we return is
> raw_local_irq_disable(), so this change is harmless.
> 
> It is also the direction I've been arguing for elsewhere in this thread.
> So I'm certainly not complaining.

So I left that raw_local_irq_enable() in to be consistent with other
architectures. enabled_wait() now returns with irqs disabled, but with
a lockdep state that says irqs are on... See patch below.
Works and hopefully makes sense ;)

In addition (but not for rc7) I want to get rid of our complex udelay
implementation. I think we don't need that anymore, so there would be
only the idle code left where we have to play tricks.

>From 7bd86fb3eb039a4163281472ca79b9158e726526 Mon Sep 17 00:00:00 2001
From: Heiko Carstens 
Date: Wed, 2 Dec 2020 11:46:01 +0100
Subject: [PATCH] s390: fix irq state tracing

With commit 58c644ba512c ("sched/idle: Fix arch_cpu_idle() vs
tracing") common code calls arch_cpu_idle() with a lockdep state that
says irqs are on.

This doesn't work very well for s390: psw_idle() will enable interrupts
to wait for an interrupt. As soon as an interrupt occurs the interrupt
handler will verify if the old context was psw_idle(). If that is the
case the interrupt enablement bits in the old program status word will
be cleared.

A subsequent test in both the external as well as the io interrupt
handler checks if in the old context interrupts were enabled. Due to
the above patching of the old program status word it is assumed the
old context had interrupts disabled, and therefore a call to
TRACE_IRQS_OFF (aka trace_hardirqs_off_caller) is skipped. Which in
turn makes lockdep incorrectly "think" that interrupts are enabled
within the interrupt handler.

Fix this by unconditionally calling TRACE_IRQS_OFF when entering
interrupt handlers. Also unconditionally call TRACE_IRQS_ON when
leaving interrupt handlers.

This leaves the special psw_idle() case, which now returns with
interrupts disabled, but has an "irqs on" lockdep state. So callers of
psw_idle() must adjust the state on their own, if required. This is
currently only __udelay_disabled().

Fixes: 58c644ba512c ("sched/idle: Fix arch_cpu_idle() vs tracing")
Signed-off-by: Heiko Carstens 
---
 arch/s390/kernel/entry.S | 15 ---
 arch/s390/lib/delay.c|  5 ++---
 2 files changed, 2 insertions(+), 18 deletions(-)

diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S
index 26bb0603c5a1..92beb1444644 100644
--- a/arch/s390/kernel/entry.S
+++ b/arch/s390/kernel/entry.S
@@ -763,12 +763,7 @@ ENTRY(io_int_handler)
xc  __PT_FLAGS(8,%r11),__PT_FLAGS(%r11)
TSTMSK  __LC_CPU_FLAGS,_CIF_IGNORE_IRQ
jo  .Lio_restore
-#if IS_ENABLED(CONFIG_TRACE_IRQFLAGS)
-   tmhh%r8,0x300
-   jz  1f
TRACE_IRQS_OFF
-1:
-#endif
xc  __SF_BACKCHAIN(8,%r15),__SF_BACKCHAIN(%r15)
 .Lio_loop:
lgr %r2,%r11# pass pointer to pt_regs
@@ -791,12 +786,7 @@ ENTRY(io_int_handler)
TSTMSK  __LC_CPU_FLAGS,_CIF_WORK
jnz .Lio_work
 .Lio_restore:
-#if IS_ENABLED(CONFIG_TRACE_IRQFLAGS)
-   tm  __PT_PSW(%r11),3
-   jno 0f
TRACE_IRQS_ON
-0:
-#endif
mvc __LC_RETURN_PSW(16),__PT_PSW(%r11)
tm  __PT_PSW+1(%r11),0x01   # returning to user ?
jno .Lio_exit_kernel
@@ -976,12 +966,7 @@ ENTRY(ext_int_handler)
xc  __PT_FLAGS(8,%r11),__PT_FLAGS(%r11)
TSTMSK  __LC_CPU_FLAGS,_CIF_IGNORE_IRQ
jo  .Lio_restore
-#if IS_ENABLED(CONFIG_TRACE_IRQFLAGS)
-   tmhh%r8,0x300
-   jz  1f
TRACE_IRQS_OFF
-1:
-#endif
xc  __SF_BACKCHAIN(8,%r15),__SF_BACKCHAIN(%r15)
lgr %r2,%r11# pass pointer to pt_regs
lghi%r3,EXT_INTERRUPT
diff --git a/arch/s390/lib/delay.c b/arch/s390/lib/delay.c
index daca7bad66de..8c0c68e7770e 100644
--- a/arch/s390/lib/delay.c
+++ b/arch/s390/lib/delay.c
@@ -33,7 +33,7 @@ EXPORT_SYMBOL(__delay);
 
 static void __udelay_disabled(unsigned long long usecs)
 {
-   unsigned long cr0, cr0_new, psw_mask, flags;
+   unsigned long cr0, cr0_new, psw_mask;
struct 

Re: [GIT pull] locking/urgent for v5.10-rc6

2020-12-02 Thread Heiko Carstens
On Wed, Dec 02, 2020 at 10:38:05AM +0100, Peter Zijlstra wrote:
> On Wed, Dec 02, 2020 at 08:54:27AM +0100, Heiko Carstens wrote:
> > > > But but but...
> > > > 
> > > >   do_idle() # IRQs on
> > > > local_irq_disable();# IRQs off
> > > > default_idle_call()  # IRQs off
> > >   lockdep_hardirqs_on();  # IRQs off, but lockdep thinks they're on
> > > >   arch_cpu_idle()   # IRQs off
> > > > enabled_wait()  # IRQs off
> > > >   raw_local_save()  # still off
> > > >   psw_idle()# very much off
> > > > ext_int_handler # get an interrupt ?!?!
> > > rcu_irq_enter()   # lockdep thinks IRQs are on <- FAIL
> > > 
> > > I can't much read s390 assembler, but ext_int_handler() has a
> > > TRACE_IRQS_OFF, which would be sufficient to re-align the lockdep state
> > > with the actual state, but there's some condition before it, what's that
> > > test and is that right?
> > 
> > After digging a bit into our asm code: no, it is not right, and only
> > for psw_idle() it is wrong.
> > 
> > What happens with the current code:
> > 
> > - default_idle_call() calls lockdep_hardirqs_on() before calling into
> >   arch_cpu_idle()
> 
> Correct.
> 
> > - our arch_cpu_idle() calls psw_idle() which enables irqs. the irq
> >   handler will call/use the SWITCH_ASYNC macro which clears the
> >   interrupt enabled bits in the old program status word (_only_ for
> >   psw_idle)
> 
> This is the condition at 0: that compares r13 to psw_idle_exit?

Yes, exactly.

> > So I guess my patch which I sent yesterday evening should fix all that
> > mess
> 
> Yes, afaict it does the right thing. Exceptions should call
> TRACE_IRQS_OFF unconditionally, since the hardware will disable
> interrupts upon taking an exception, irrespective of what the previous
> context had.
> 
> On exception return the previous IRQ state is inspected and lockdep is
> changed to match (except for NMIs, which either are ignored by lockdep
> or need a little bit of extra care).

Yes, we do all that, except that it seems odd to test the previous
state for interrupts (not exceptions), since for interrupts the
previous context obviously must have had interrupts enabled.

Except if you play tricks with the old PSW, like we do :/


Re: [GIT pull] locking/urgent for v5.10-rc6

2020-12-02 Thread Peter Zijlstra
On Wed, Dec 02, 2020 at 08:54:27AM +0100, Heiko Carstens wrote:
> > > But but but...
> > > 
> > >   do_idle()   # IRQs on
> > > local_irq_disable();  # IRQs off
> > > default_idle_call()# IRQs off
> > lockdep_hardirqs_on();  # IRQs off, but lockdep thinks they're on
> > >   arch_cpu_idle() # IRQs off
> > > enabled_wait()# IRQs off
> > > raw_local_save()  # still off
> > > psw_idle()# very much off
> > >   ext_int_handler # get an interrupt ?!?!
> >   rcu_irq_enter()   # lockdep thinks IRQs are on <- FAIL
> > 
> > I can't much read s390 assembler, but ext_int_handler() has a
> > TRACE_IRQS_OFF, which would be sufficient to re-align the lockdep state
> > with the actual state, but there's some condition before it, what's that
> > test and is that right?
> 
> After digging a bit into our asm code: no, it is not right, and only
> for psw_idle() it is wrong.
> 
> What happens with the current code:
> 
> - default_idle_call() calls lockdep_hardirqs_on() before calling into
>   arch_cpu_idle()

Correct.

> - our arch_cpu_idle() calls psw_idle() which enables irqs. the irq
>   handler will call/use the SWITCH_ASYNC macro which clears the
>   interrupt enabled bits in the old program status word (_only_ for
>   psw_idle)

This is the condition at 0: that compares r13 to psw_idle_exit?

> - this again causes the interrupt handler to _not_ call TRACE_IRQS_OFF
>   and therefore lockdep thinks interrupts are enabled within the
>   interrupt handler
> 
> So I guess my patch which I sent yesterday evening should fix all that
> mess

Yes, afaict it does the right thing. Exceptions should call
TRACE_IRQS_OFF unconditionally, since the hardware will disable
interrupts upon taking an exception, irrespective of what the previous
context had.

On exception return the previous IRQ state is inspected and lockdep is
changed to match (except for NMIs, which either are ignored by lockdep
or need a little bit of extra care).
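
As a rough C sketch of that pattern (helper and predicate names here are
invented for illustration; this is not the actual s390 or generic entry code):

/*
 * Illustration only: exception entry marks IRQs off unconditionally,
 * exception exit re-inspects the interrupted context's state.
 */
static void on_irq_entry(struct pt_regs *regs)
{
	/* hardware disabled IRQs when the exception was taken */
	trace_hardirqs_off();
}

static void on_irq_exit(struct pt_regs *regs)
{
	/* re-align lockdep with whatever state the old context resumes with */
	if (irqs_enabled_in_regs(regs))		/* hypothetical predicate */
		trace_hardirqs_on();
}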


Re: [GIT pull] locking/urgent for v5.10-rc6

2020-12-02 Thread Peter Zijlstra
On Tue, Dec 01, 2020 at 08:18:56PM +0100, Heiko Carstens wrote:

> Is there a reason why this should be considered broken?

AFAICT this is good.

> diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S
> index 26bb0603c5a1..92beb1444644 100644
> --- a/arch/s390/kernel/entry.S
> +++ b/arch/s390/kernel/entry.S
> @@ -763,12 +763,7 @@ ENTRY(io_int_handler)
>   xc  __PT_FLAGS(8,%r11),__PT_FLAGS(%r11)
>   TSTMSK  __LC_CPU_FLAGS,_CIF_IGNORE_IRQ
>   jo  .Lio_restore
> -#if IS_ENABLED(CONFIG_TRACE_IRQFLAGS)
> - tmhh%r8,0x300
> - jz  1f
>   TRACE_IRQS_OFF
> -1:
> -#endif
>   xc  __SF_BACKCHAIN(8,%r15),__SF_BACKCHAIN(%r15)
>  .Lio_loop:
>   lgr %r2,%r11# pass pointer to pt_regs
> @@ -791,12 +786,7 @@ ENTRY(io_int_handler)
>   TSTMSK  __LC_CPU_FLAGS,_CIF_WORK
>   jnz .Lio_work
>  .Lio_restore:
> -#if IS_ENABLED(CONFIG_TRACE_IRQFLAGS)
> - tm  __PT_PSW(%r11),3
> - jno 0f
>   TRACE_IRQS_ON
> -0:
> -#endif
>   mvc __LC_RETURN_PSW(16),__PT_PSW(%r11)
>   tm  __PT_PSW+1(%r11),0x01   # returning to user ?
>   jno .Lio_exit_kernel
> @@ -976,12 +966,7 @@ ENTRY(ext_int_handler)
>   xc  __PT_FLAGS(8,%r11),__PT_FLAGS(%r11)
>   TSTMSK  __LC_CPU_FLAGS,_CIF_IGNORE_IRQ
>   jo  .Lio_restore
> -#if IS_ENABLED(CONFIG_TRACE_IRQFLAGS)
> - tmhh%r8,0x300
> - jz  1f
>   TRACE_IRQS_OFF
> -1:
> -#endif
>   xc  __SF_BACKCHAIN(8,%r15),__SF_BACKCHAIN(%r15)
>   lgr %r2,%r11# pass pointer to pt_regs
>   lghi%r3,EXT_INTERRUPT

OK, so with a little help from s390/PoO and Sven, the code removed skips
the TRACE_IRQS_OFF when IRQs were enabled in the old PSW (the previous
context).

That sounds entirely the right thing. Irrespective of what the previous
IRQ state was, the current state is off.

> diff --git a/arch/s390/kernel/idle.c b/arch/s390/kernel/idle.c
> index 2b85096964f8..5bd8c1044d09 100644
> --- a/arch/s390/kernel/idle.c
> +++ b/arch/s390/kernel/idle.c
> @@ -123,7 +123,6 @@ void arch_cpu_idle_enter(void)
>  void arch_cpu_idle(void)
>  {
>   enabled_wait();
> - raw_local_irq_enable();
>  }

Currently arch_cpu_idle() is defined as to return with IRQs enabled,
however, the very first thing we do when we return is
raw_local_irq_disable(), so this change is harmless.

It is also the direction I've been arguing for elsewhere in this thread.
So I'm certainly not complaining.



Re: [GIT pull] locking/urgent for v5.10-rc6

2020-12-01 Thread Heiko Carstens
> > But but but...
> > 
> >   do_idle() # IRQs on
> > local_irq_disable();# IRQs off
> > default_idle_call()  # IRQs off
>   lockdep_hardirqs_on();  # IRQs off, but lockdep thinks they're on
> >   arch_cpu_idle()   # IRQs off
> > enabled_wait()  # IRQs off
> >   raw_local_save()  # still off
> >   psw_idle()# very much off
> > ext_int_handler # get an interrupt ?!?!
> rcu_irq_enter()   # lockdep thinks IRQs are on <- FAIL
> 
> I can't much read s390 assembler, but ext_int_handler() has a
> TRACE_IRQS_OFF, which would be sufficient to re-align the lockdep state
> with the actual state, but there's some condition before it, what's that
> test and is that right?

After digging a bit into our asm code: no, it is not right, and only
for psw_idle() it is wrong.

What happens with the current code:

- default_idle_call() calls lockdep_hardirqs_on() before calling into
  arch_cpu_idle()

- our arch_cpu_idle() calls psw_idle() which enables irqs. the irq
  handler will call/use the SWITCH_ASYNC macro which clears the
  interrupt enabled bits in the old program status word (_only_ for
  psw_idle)

- this again causes the interrupt handler to _not_ call TRACE_IRQS_OFF
  and therefore lockdep thinks interrupts are enabled within the
  interrupt handler

So I guess my patch which I sent yesterday evening should fix all that
mess - plus an explicit trace_hardirqs_off() call in our udelay
implementation is required now.

I'll send a proper patch later.
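
For the udelay part, the relevant lines of __udelay_disabled() would then
look roughly like this (a sketch only; it mirrors the delay.c hunk of the
patch posted elsewhere in this thread):

	set_cpu_flag(CIF_IGNORE_IRQ);
	/* psw_idle() briefly enables IRQs and returns with them off again,
	 * so no local_irq_save()/restore() around it is needed... */
	psw_idle(&idle, psw_mask);
	/* ...but lockdep still believes IRQs are on, so re-align it */
	trace_hardirqs_off();
	clear_cpu_flag(CIF_IGNORE_IRQ);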


Re: [GIT pull] locking/urgent for v5.10-rc6

2020-12-01 Thread Peter Zijlstra
On Tue, Dec 01, 2020 at 11:45:25AM -0800, Linus Torvalds wrote:
> On Mon, Nov 30, 2020 at 11:56 PM Peter Zijlstra  wrote:
> >
> > So even if an architecture needs to enable interrupts on idle, we need
> > it disabled again when coming out. So we might as well have the arch
> > idle routine then be: STI; HLT; CLI; because then architectures that can
> > idle with interrupts disabled can avoid mucking about with the interrupt
> > state entirely.
> 
> But that's not what the code is doing.
> 
> Go look at it.
> 
> It does sti;hlt;cli;pushf;cli;sti.
> 
> All for no good reason - because the code is structured so that even
> if all the tracking and lockdep is disabled, the pointless "let's
> protect the tracking from interrupts" is still there.
> 
> See what I am complaining about?

Absolutely.

  default_idle()
arch_cpu_idle()
  sti; hlt;
cli;
rcu_idle_exit()
  pushf;
  cli;
  rcu_eqs_exit(false);
  popf;
sti;

is what it currently looks like, and that's completely insane, no
argument.

What I would like to end up with is:

  default_idle()
arch_cpu_idle()
  sti; hlt; cli
rcu_idle_exit()
  rcu_eqs_exit(false);
sti;

Which would allow architectures that can idle with IRQs disabled to do
so. But that needs a little more work:

 - make arch_cpu_idle() IRQ invariant (we enter and exit with IRQs off)
 - make cpuidle drivers do similar
 - audit all rcu_idle_exit() callers and remove save/restore
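
Restated as C, the target shape of default_idle_call() would be roughly the
following (a sketch of the flow above, not a patch):

static void default_idle_call(void)
{
	/*
	 * arch_cpu_idle() is IRQ-invariant: entered and exited with IRQs
	 * disabled, so rcu_idle_exit() needs no pushf/cli/popf around it.
	 */
	arch_cpu_idle();	/* e.g. x86: sti; hlt; cli */
	rcu_idle_exit();	/* just rcu_eqs_exit(false) */
	raw_local_irq_enable();	/* the final sti */
}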



Re: [GIT pull] locking/urgent for v5.10-rc6

2020-12-01 Thread Linus Torvalds
On Mon, Nov 30, 2020 at 11:56 PM Peter Zijlstra  wrote:
>
> So even if an architecture needs to enable interrupts on idle, we need
> it disabled again when coming out. So we might as well have the arch
> idle routine then be: STI; HLT; CLI; because then architectures that can
> idle with interrupts disabled can avoid mucking about with the interrupt
> state entirely.

But that's not what the code is doing.

Go look at it.

It does sti;hlt;cli;pushf;cli;sti.

All for no good reason - because the code is structured so that even
if all the tracking and lockdep is disabled, the pointless "let's
protect the tracking from interrupts" is still there.

See what I am complaining about?

Linus


Re: [GIT pull] locking/urgent for v5.10-rc6

2020-12-01 Thread Heiko Carstens
On Tue, Dec 01, 2020 at 08:14:41PM +0100, Peter Zijlstra wrote:
> On Tue, Dec 01, 2020 at 06:57:37PM +, Mark Rutland wrote:
> > On Tue, Dec 01, 2020 at 07:15:06PM +0100, Peter Zijlstra wrote:
> > > On Tue, Dec 01, 2020 at 03:55:19PM +0100, Peter Zijlstra wrote:
> > > > On Tue, Dec 01, 2020 at 06:46:44AM -0800, Paul E. McKenney wrote:
> > > > 
> > > > > > So after having talked to Sven a bit, the thing that is happening, is
> > > > > > that this is the one place where we take interrupts with RCU being
> > > > > > disabled. Normally RCU is watching and all is well, except during idle.
> > > > > 
> > > > > Isn't interrupt entry supposed to invoke rcu_irq_enter() at some point?
> > > > > Or did this fall victim to recent optimizations?
> > > > 
> > > > It does, but the problem is that s390 is still using
> > > 
> > > I might've been too quick there, I can't actually seem to find where
> > > s390 does rcu_irq_enter()/exit().
> > > 
> > > Also, I'm thinking the below might just about solve the current problem.
> > > The next problem would then be it calling TRACE_IRQS_ON after it did
> > > rcu_irq_exit()... :/
> > 
> > I gave this patch a go under QEMU TCG atop v5.10-rc6 s390 defconfig with
> > PROVE_LOCKING and DEBUG_ATOMIC_SLEEP. It significantly reduces the
> > number of lockdep splats, but IIUC we need to handle the io_int_handler
> > path in addition to the ext_int_handler path, and there's a remaining
> > lockdep splat (below).
> 
> I'm amazed it didn't actually make things worse, given how I failed to
> spot do_IRQ() was arch code etc..
> 
> > If this ends up looking like we'll need more point-fixes, I wonder if we
> > should conditionalise the new behaviour of the core idle code under a
> > new CONFIG symbol for now, and opt-in x86 and arm64, then transition the
> > rest once they've had a chance to test. They'll still be broken in the
> > mean time, but no more so than they previously were.
> 
> We can do that I suppose... :/

Well, the following small patch works for me (plus an additional call to
trace_hardirqs_on() in our udelay implementation - but that's probably
independent).
Is there a reason why this should be considered broken?

diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S
index 26bb0603c5a1..92beb1444644 100644
--- a/arch/s390/kernel/entry.S
+++ b/arch/s390/kernel/entry.S
@@ -763,12 +763,7 @@ ENTRY(io_int_handler)
xc  __PT_FLAGS(8,%r11),__PT_FLAGS(%r11)
TSTMSK  __LC_CPU_FLAGS,_CIF_IGNORE_IRQ
jo  .Lio_restore
-#if IS_ENABLED(CONFIG_TRACE_IRQFLAGS)
-   tmhh%r8,0x300
-   jz  1f
TRACE_IRQS_OFF
-1:
-#endif
xc  __SF_BACKCHAIN(8,%r15),__SF_BACKCHAIN(%r15)
 .Lio_loop:
lgr %r2,%r11# pass pointer to pt_regs
@@ -791,12 +786,7 @@ ENTRY(io_int_handler)
TSTMSK  __LC_CPU_FLAGS,_CIF_WORK
jnz .Lio_work
 .Lio_restore:
-#if IS_ENABLED(CONFIG_TRACE_IRQFLAGS)
-   tm  __PT_PSW(%r11),3
-   jno 0f
TRACE_IRQS_ON
-0:
-#endif
mvc __LC_RETURN_PSW(16),__PT_PSW(%r11)
tm  __PT_PSW+1(%r11),0x01   # returning to user ?
jno .Lio_exit_kernel
@@ -976,12 +966,7 @@ ENTRY(ext_int_handler)
xc  __PT_FLAGS(8,%r11),__PT_FLAGS(%r11)
TSTMSK  __LC_CPU_FLAGS,_CIF_IGNORE_IRQ
jo  .Lio_restore
-#if IS_ENABLED(CONFIG_TRACE_IRQFLAGS)
-   tmhh%r8,0x300
-   jz  1f
TRACE_IRQS_OFF
-1:
-#endif
xc  __SF_BACKCHAIN(8,%r15),__SF_BACKCHAIN(%r15)
lgr %r2,%r11# pass pointer to pt_regs
lghi%r3,EXT_INTERRUPT
diff --git a/arch/s390/kernel/idle.c b/arch/s390/kernel/idle.c
index 2b85096964f8..5bd8c1044d09 100644
--- a/arch/s390/kernel/idle.c
+++ b/arch/s390/kernel/idle.c
@@ -123,7 +123,6 @@ void arch_cpu_idle_enter(void)
 void arch_cpu_idle(void)
 {
enabled_wait();
-   raw_local_irq_enable();
 }
 
 void arch_cpu_idle_exit(void)


Re: [GIT pull] locking/urgent for v5.10-rc6

2020-12-01 Thread Peter Zijlstra
On Tue, Dec 01, 2020 at 06:57:37PM +, Mark Rutland wrote:
> On Tue, Dec 01, 2020 at 07:15:06PM +0100, Peter Zijlstra wrote:
> > On Tue, Dec 01, 2020 at 03:55:19PM +0100, Peter Zijlstra wrote:
> > > On Tue, Dec 01, 2020 at 06:46:44AM -0800, Paul E. McKenney wrote:
> > > 
> > > > > So after having talked to Sven a bit, the thing that is happening, is
> > > > > that this is the one place where we take interrupts with RCU being
> > > > > disabled. Normally RCU is watching and all is well, except during idle.
> > > > 
> > > > Isn't interrupt entry supposed to invoke rcu_irq_enter() at some point?
> > > > Or did this fall victim to recent optimizations?
> > > 
> > > It does, but the problem is that s390 is still using
> > 
> > I might've been too quick there, I can't actually seem to find where
> > s390 does rcu_irq_enter()/exit().
> > 
> > Also, I'm thinking the below might just about solve the current problem.
> > The next problem would then be it calling TRACE_IRQS_ON after it did
> > rcu_irq_exit()... :/
> 
> I gave this patch a go under QEMU TCG atop v5.10-rc6 s390 defconfig with
> PROVE_LOCKING and DEBUG_ATOMIC_SLEEP. It significantly reduces the
> number of lockdep splats, but IIUC we need to handle the io_int_handler
> path in addition to the ext_int_handler path, and there's a remaining
> lockdep splat (below).

I'm amazed it didn't actually make things worse, given how I failed to
spot do_IRQ() was arch code etc..

> If this ends up looking like we'll need more point-fixes, I wonder if we
> should conditionalise the new behaviour of the core idle code under a
> new CONFIG symbol for now, and opt-in x86 and arm64, then transition the
> rest once they've had a chance to test. They'll still be broken in the
> mean time, but no more so than they previously were.

We can do that I suppose... :/


Re: [GIT pull] locking/urgent for v5.10-rc6

2020-12-01 Thread Mark Rutland
On Tue, Dec 01, 2020 at 07:15:06PM +0100, Peter Zijlstra wrote:
> On Tue, Dec 01, 2020 at 03:55:19PM +0100, Peter Zijlstra wrote:
> > On Tue, Dec 01, 2020 at 06:46:44AM -0800, Paul E. McKenney wrote:
> > 
> > > > So after having talked to Sven a bit, the thing that is happening, is
> > > > that this is the one place where we take interrupts with RCU being
> > > > disabled. Normally RCU is watching and all is well, except during idle.
> > > 
> > > Isn't interrupt entry supposed to invoke rcu_irq_enter() at some point?
> > > Or did this fall victim to recent optimizations?
> > 
> > It does, but the problem is that s390 is still using
> 
> I might've been too quick there, I can't actually seem to find where
> s390 does rcu_irq_enter()/exit().
> 
> Also, I'm thinking the below might just about solve the current problem.
> The next problem would then be it calling TRACE_IRQS_ON after it did
> rcu_irq_exit()... :/

I gave this patch a go under QEMU TCG atop v5.10-rc6 s390 defconfig with
PROVE_LOCKING and DEBUG_ATOMIC_SLEEP. It significantly reduces the
number of lockdep splats, but IIUC we need to handle the io_int_handler
path in addition to the ext_int_handler path, and there's a remaining
lockdep splat (below).

If this ends up looking like we'll need more point-fixes, I wonder if we
should conditionalise the new behaviour of the core idle code under a
new CONFIG symbol for now, and opt-in x86 and arm64, then transition the
rest once they've had a chance to test. They'll still be broken in the
mean time, but no more so than they previously were.

Thanks,
Mark.

[3.870912] [ cut here ]
[3.871225] DEBUG_LOCKS_WARN_ON(current->hardirq_chain_key != 
current->curr_chain_key)
[3.872025] WARNING: CPU: 1 PID: 0 at kernel/locking/lockdep.c:4155 
lockdep_hardirqs_on+0x150/0x158
[3.872249] Modules linked in:
[3.873096] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.10.0-rc6-dirty #9
[3.873262] Hardware name: QEMU 2964 QEMU (KVM/Linux)
[3.873584] Krnl PSW : 0404e0018000 00ce51ac 
(lockdep_hardirqs_on+0x154/0x158)
[3.873967]R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 
RI:0 EA:3
[3.874214] Krnl GPRS: c000efff 011e5218 004a 
001d988a
[3.874363]ffea 03e1  

[3.874511] 0141e5f0 03278000 
00cf354a
[3.874658]  00ce51a8 
03e73db0
[3.876258] Krnl Code: 00ce519c: c0200012432clarl
%r2,00f2d7f4
[3.876258]00ce51a2: c0e583dbbrasl   
%r14,00cd5958
[3.876258]   #00ce51a8: af00mc  0,0
[3.876258]   >00ce51ac: a7f4ffb3brc 
15,00ce5112
[3.876258]00ce51b0: ebeff0880024stmg
%r14,%r15,136(%r15)
[3.876258]00ce51b6: b90400eflgr 
%r14,%r15
[3.876258]00ce51ba: e3f0ffe0ff71lay 
%r15,-32(%r15)
[3.876258]00ce51c0: e3e0f0980024stg 
%r14,152(%r15)
[3.877823] Call Trace:
[3.878073]  [<00ce51ac>] lockdep_hardirqs_on+0x154/0x158 
[3.878260] ([<00ce51a8>] lockdep_hardirqs_on+0x150/0x158)
[3.878406]  [<00cf3596>] default_idle_call+0xb6/0x2a8 
[3.878549]  [<00194f60>] do_idle+0xe0/0x190 
[3.878682]  [<001952ae>] cpu_startup_entry+0x36/0x40 
[3.878816]  [<00118152>] smp_start_secondary+0x82/0x88 
[3.878975] INFO: lockdep is turned off.
[3.879119] Last Breaking-Event-Address:
[3.879273]  [<00cf76d0>] __s390_indirect_jump_r14+0x0/0xc
[3.879456] irq event stamp: 104
[3.879619] hardirqs last  enabled at (102): [<001a540c>] 
load_balance+0x28c/0x990
[3.879783] hardirqs last disabled at (103): [<00cf7626>] 
__do_softirq+0x536/0x5c8
[3.879945] softirqs last  enabled at (104): [<00cf7584>] 
__do_softirq+0x494/0x5c8
[3.880114] softirqs last disabled at (95): [<0010df60>] 
do_softirq_own_stack+0x70/0x80
[3.880724] random: get_random_bytes called from __warn+0x12a/0x160 with 
crng_init=0
[3.882060] ---[ end trace ffae6482e242107b ]---

> ---
> diff --git a/arch/s390/include/asm/irq.h b/arch/s390/include/asm/irq.h
> index 9f75d67b8c20..24d3dd482df7 100644
> --- a/arch/s390/include/asm/irq.h
> +++ b/arch/s390/include/asm/irq.h
> @@ -113,6 +113,8 @@ void irq_subclass_unregister(enum irq_subclass subclass);
>  
>  #define irq_canonicalize(irq)  (irq)
>  
> +extern __visible void ext_do_IRQ(struct pt_regs *regs, unsigned long vector);
> +
>  #endif /* __ASSEMBLY__ */
>  
>  #endif /* _ASM_IRQ_H */
> diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S
> index 26bb0603c5a1..b8e89b685038 100644
> 

Re: [GIT pull] locking/urgent for v5.10-rc6

2020-12-01 Thread Peter Zijlstra
On Tue, Dec 01, 2020 at 07:15:06PM +0100, Peter Zijlstra wrote:
> On Tue, Dec 01, 2020 at 03:55:19PM +0100, Peter Zijlstra wrote:
> > On Tue, Dec 01, 2020 at 06:46:44AM -0800, Paul E. McKenney wrote:
> > 
> > > > So after having talked to Sven a bit, the thing that is happening, is
> > > > that this is the one place where we take interrupts with RCU being
> > > > disabled. Normally RCU is watching and all is well, except during idle.
> > > 
> > > Isn't interrupt entry supposed to invoke rcu_irq_enter() at some point?
> > > Or did this fall victim to recent optimizations?
> > 
> > It does, but the problem is that s390 is still using
> 
> I might've been too quick there, I can't actually seem to find where
> s390 does rcu_irq_enter()/exit().

Argh, do_IRQ is per arch.. and that does irq_enter() which then does the
deed (thanks Sven!).


Re: [GIT pull] locking/urgent for v5.10-rc6

2020-12-01 Thread Peter Zijlstra
On Tue, Dec 01, 2020 at 03:55:19PM +0100, Peter Zijlstra wrote:
> On Tue, Dec 01, 2020 at 06:46:44AM -0800, Paul E. McKenney wrote:
> 
> > > So after having talked to Sven a bit, the thing that is happening, is
> > > that this is the one place where we take interrupts with RCU being
> > > disabled. Normally RCU is watching and all is well, except during idle.
> > 
> > Isn't interrupt entry supposed to invoke rcu_irq_enter() at some point?
> > Or did this fall victim to recent optimizations?
> 
> It does, but the problem is that s390 is still using

I might've been too quick there, I can't actually seem to find where
s390 does rcu_irq_enter()/exit().

Also, I'm thinking the below might just about solve the current problem.
The next problem would then be it calling TRACE_IRQS_ON after it did
rcu_irq_exit()... :/


---
diff --git a/arch/s390/include/asm/irq.h b/arch/s390/include/asm/irq.h
index 9f75d67b8c20..24d3dd482df7 100644
--- a/arch/s390/include/asm/irq.h
+++ b/arch/s390/include/asm/irq.h
@@ -113,6 +113,8 @@ void irq_subclass_unregister(enum irq_subclass subclass);
 
 #define irq_canonicalize(irq)  (irq)
 
+extern __visible void ext_do_IRQ(struct pt_regs *regs, unsigned long vector);
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_IRQ_H */
diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S
index 26bb0603c5a1..b8e89b685038 100644
--- a/arch/s390/kernel/entry.S
+++ b/arch/s390/kernel/entry.S
@@ -976,16 +976,10 @@ ENTRY(ext_int_handler)
xc  __PT_FLAGS(8,%r11),__PT_FLAGS(%r11)
TSTMSK  __LC_CPU_FLAGS,_CIF_IGNORE_IRQ
jo  .Lio_restore
-#if IS_ENABLED(CONFIG_TRACE_IRQFLAGS)
-   tmhh%r8,0x300
-   jz  1f
-   TRACE_IRQS_OFF
-1:
-#endif
xc  __SF_BACKCHAIN(8,%r15),__SF_BACKCHAIN(%r15)
lgr %r2,%r11# pass pointer to pt_regs
lghi%r3,EXT_INTERRUPT
-   brasl   %r14,do_IRQ
+   brasl   %r14,ext_do_IRQ
j   .Lio_return
 ENDPROC(ext_int_handler)
 
diff --git a/arch/s390/kernel/irq.c b/arch/s390/kernel/irq.c
index 3514420f0259..f4a29114e9fd 100644
--- a/arch/s390/kernel/irq.c
+++ b/arch/s390/kernel/irq.c
@@ -329,3 +329,23 @@ void irq_subclass_unregister(enum irq_subclass subclass)
	spin_unlock(&irq_subclass_lock);
 }
 EXPORT_SYMBOL(irq_subclass_unregister);
+
+noinstr void ext_do_IRQ(struct pt_regs *regs, unsigned long vector)
+{
+   bool rcu = false;
+
+   lockdep_hardirqs_off(CALLER_ADDR0);
+   if (!IS_ENABLED(CONFIG_TINY_RCU) && is_idle_task(current)) {
+   rcu_irq_enter();
+   rcu = true;
+   }
+   /* instrumentation_begin(); */
+
+   trace_hardirqs_off_finish();
+
+   do_IRQ(regs, vector);
+
+   /* instrumentation_end(); */
+   if (rcu)
+   rcu_irq_exit();
+}


Re: [GIT pull] locking/urgent for v5.10-rc6

2020-12-01 Thread Paul E. McKenney
On Tue, Dec 01, 2020 at 03:55:19PM +0100, Peter Zijlstra wrote:
> On Tue, Dec 01, 2020 at 06:46:44AM -0800, Paul E. McKenney wrote:
> 
> > > So after having talked to Sven a bit, the thing that is happening, is
> > > that this is the one place where we take interrupts with RCU being
> > > disabled. Normally RCU is watching and all is well, except during idle.
> > 
> > Isn't interrupt entry supposed to invoke rcu_irq_enter() at some point?
> > Or did this fall victim to recent optimizations?
> 
> It does, but the problem is that s390 is still using
> trace_hardirqs_off(), which calls into tracing before RCU is enabled.
> 
> The entry order between lockdep, rcu and tracing is critical.
> 
> You can't call into tracing without RCU running,
> you can't call into RCU without lockdep setup,
> you can't call the (old) lockdep setup without landing in tracing.

Whew!  ;-)

Thanx, Paul


Re: [GIT pull] locking/urgent for v5.10-rc6

2020-12-01 Thread Peter Zijlstra
On Tue, Dec 01, 2020 at 06:46:44AM -0800, Paul E. McKenney wrote:

> > So after having talked to Sven a bit, the thing that is happening, is
> > that this is the one place where we take interrupts with RCU being
> > disabled. Normally RCU is watching and all is well, except during idle.
> 
> Isn't interrupt entry supposed to invoke rcu_irq_enter() at some point?
> Or did this fall victim to recent optimizations?

It does, but the problem is that s390 is still using
trace_hardirqs_off(), which calls into tracing before RCU is enabled.

The entry order between lockdep, rcu and tracing is critical.

You can't call into tracing without RCU running,
you can't call into RCU without lockdep setup,
you can't call the (old) lockdep setup without landing in tracing.
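
Sketched as C, an interrupt taken while RCU is not watching therefore has to
do its entry bookkeeping in exactly that order (illustrative only; the
concrete s390 proposal, ext_do_IRQ(), is elsewhere in this thread):

noinstr void irq_entry_from_idle(struct pt_regs *regs)	/* name is illustrative */
{
	lockdep_hardirqs_off(CALLER_ADDR0);	/* lockdep first: needs nothing */
	rcu_irq_enter();			/* RCU next: may rely on lockdep */
	trace_hardirqs_off_finish();		/* tracing last: needs RCU watching */

	/* ... handle the interrupt ... */

	rcu_irq_exit();				/* tear down in reverse order */
}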




Re: [GIT pull] locking/urgent for v5.10-rc6

2020-12-01 Thread Paul E. McKenney
On Tue, Dec 01, 2020 at 12:07:24PM +0100, Peter Zijlstra wrote:
> On Tue, Dec 01, 2020 at 09:07:34AM +0100, Peter Zijlstra wrote:
> > On Mon, Nov 30, 2020 at 08:31:32PM +0100, Christian Borntraeger wrote:
> > > On 30.11.20 19:04, Linus Torvalds wrote:
> > > > On Mon, Nov 30, 2020 at 5:03 AM Peter Zijlstra  
> > > > wrote:
> > > >>
> > > >>> But but but...
> > > >>>
> > > >>>   do_idle()   # IRQs on
> > > >>> local_irq_disable();  # IRQs off
> > > >>> default_idle_call()# IRQs off
> > > >> lockdep_hardirqs_on();  # IRQs off, but lockdep thinks they're on
> > > >>>   arch_cpu_idle() # IRQs off
> > > >>> enabled_wait()# IRQs off
> > > >>> raw_local_save()  # still off
> > > >>> psw_idle()# very much off
> > > >>>   ext_int_handler # get an interrupt ?!?!
> > > >>   rcu_irq_enter()   # lockdep thinks IRQs are on <- FAIL
> > > >>
> > > >> I can't much read s390 assembler, but ext_int_handler() has a
> > > >> TRACE_IRQS_OFF, which would be sufficient to re-align the lockdep state
> > > >> with the actual state, but there's some condition before it, what's that
> > > >> test and is that right?
> > > > 
> > > > I think that "psw_idle()" enables interrupts, exactly like x86 does.
> > 
> > (like ye olde x86, modern x86 idles with interrupts disabled)
> > 
> > > Yes, by definition.  Otherwise it would be a software error state.
> > > The interesting part is the lpswe instruction at the end (load PSW),
> > > which loads the full PSW, which contains interrupt enablement, wait bit,
> > > condition code, paging enablement, machine check enablement, the address
> > > and others. The idle psw is enabled for interrupts and has the wait bit
> > > set. If the wait bit is set and interrupts are off this is called "disabled
> > > wait" and is used for panic, shutdown etc.
> > 
> > OK, but at that point, hardware interrupt state is on, lockdep thinks
> > it's on. And we take an interrupt, just like any old regular interrupt
> > enabled region.
> > 
> > But then the exception handler (ext_int_handler), which I'm assuming is
> > ran by the hardware with hardware interrupts disabled again, should be
> > calling into lockdep to tell interrupts were disabled. IOW that
> > TRACE_IRQS_OFF bit in there.
> > 
> > But that doesn't seem to be working right. Why? Because afaict this is
> > then the exact normal flow of things, but it's only going sideways
> > during this idle thing.
> > 
> > What's going 'funny' ?
> 
> So after having talked to Sven a bit, the thing that is happening, is
> that this is the one place where we take interrupts with RCU being
> disabled. Normally RCU is watching and all is well, except during idle.

Isn't interrupt entry supposed to invoke rcu_irq_enter() at some point?
Or did this fall victim to recent optimizations?

Thanx, Paul


Re: [GIT pull] locking/urgent for v5.10-rc6

2020-12-01 Thread Peter Zijlstra
On Tue, Dec 01, 2020 at 09:07:34AM +0100, Peter Zijlstra wrote:
> On Mon, Nov 30, 2020 at 08:31:32PM +0100, Christian Borntraeger wrote:
> > On 30.11.20 19:04, Linus Torvalds wrote:
> > > On Mon, Nov 30, 2020 at 5:03 AM Peter Zijlstra  
> > > wrote:
> > >>
> > >>> But but but...
> > >>>
> > >>>   do_idle()   # IRQs on
> > >>> local_irq_disable();  # IRQs off
> > >>> default_idle_call()# IRQs off
> > >> lockdep_hardirqs_on();  # IRQs off, but lockdep thinks they're on
> > >>>   arch_cpu_idle() # IRQs off
> > >>> enabled_wait()# IRQs off
> > >>> raw_local_save()  # still off
> > >>> psw_idle()# very much off
> > >>>   ext_int_handler # get an interrupt ?!?!
> > >>   rcu_irq_enter()   # lockdep thinks IRQs are on <- FAIL
> > >>
> > >> I can't much read s390 assembler, but ext_int_handler() has a
> > >> TRACE_IRQS_OFF, which would be sufficient to re-align the lockdep state
> > >> with the actual state, but there's some condition before it, what's that
> > >> test and is that right?
> > > 
> > > I think that "psw_idle()" enables interrupts, exactly like x86 does.
> 
> (like ye olde x86, modern x86 idles with interrupts disabled)
> 
> > Yes, by definition.  Otherwise it would be a software error state.
> > The interesting part is the lpswe instruction at the end (load PSW),
> > which loads the full PSW, which contains interrupt enablement, wait bit,
> > condition code, paging enablement, machine check enablement, the address
> > and others. The idle psw is enabled for interrupts and has the wait bit
> > set. If the wait bit is set and interrupts are off this is called "disabled
> > wait" and is used for panic, shutdown etc. 
> 
> OK, but at that point, hardware interrupt state is on, lockdep thinks
> it's on. And we take an interrupt, just like any old regular interrupt
> enabled region.
> 
> But then the exception handler (ext_int_handler), which I'm assuming is
> ran by the hardware with hardware interrupts disabled again, should be
> calling into lockdep to tell interrupts were disabled. IOW that
> TRACE_IRQS_OFF bit in there.
> 
> But that doesn't seem to be working right. Why? Because afaict this is
> then the exact normal flow of things, but it's only going sideways
> during this idle thing.
> 
> What's going 'funny' ?

So after having talked to Sven a bit, the thing that is happening, is
that this is the one place where we take interrupts with RCU being
disabled. Normally RCU is watching and all is well, except during idle.



Re: [GIT pull] locking/urgent for v5.10-rc6

2020-12-01 Thread Peter Zijlstra
On Mon, Nov 30, 2020 at 08:31:32PM +0100, Christian Borntraeger wrote:
> On 30.11.20 19:04, Linus Torvalds wrote:
> > On Mon, Nov 30, 2020 at 5:03 AM Peter Zijlstra  wrote:
> >>
> >>> But but but...
> >>>
> >>>   do_idle()   # IRQs on
> >>> local_irq_disable();  # IRQs off
> >>> default_idle_call()# IRQs off
> >> lockdep_hardirqs_on();  # IRQs off, but lockdep thinks they're on
> >>>   arch_cpu_idle() # IRQs off
> >>> enabled_wait()# IRQs off
> >>> raw_local_save()  # still off
> >>> psw_idle()# very much off
> >>>   ext_int_handler # get an interrupt ?!?!
> >>   rcu_irq_enter()   # lockdep thinks IRQs are on <- FAIL
> >>
> >> I can't much read s390 assembler, but ext_int_handler() has a
> >> TRACE_IRQS_OFF, which would be sufficient to re-align the lockdep state
> >> with the actual state, but there's some condition before it, what's that
> >> test and is that right?
> > 
> > I think that "psw_idle()" enables interrupts, exactly like x86 does.

(like ye olde x86, modern x86 idles with interrupts disabled)

> Yes, by definition.  Otherwise it would be a software error state.
> The interesting part is the lpswe instruction at the end (load PSW),
> which loads the full PSW, which contains interrupt enablement, wait bit,
> condition code, paging enablement, machine check enablement, the address
> and others. The idle psw is enabled for interrupts and has the wait bit
> set. If the wait bit is set and interrupts are off this is called "disabled
> wait" and is used for panic, shutdown etc. 

OK, but at that point, hardware interrupt state is on, lockdep thinks
it's on. And we take an interrupt, just like any old regular interrupt
enabled region.

But then the exception handler (ext_int_handler), which I'm assuming is
ran by the hardware with hardware interrupts disabled again, should be
calling into lockdep to tell interrupts were disabled. IOW that
TRACE_IRQS_OFF bit in there.

But that doesn't seem to be working right. Why? Because afaict this is
then the exact normal flow of things, but it's only going sideways
during this idle thing.

What's going 'funny' ?


Re: [GIT pull] locking/urgent for v5.10-rc6

2020-11-30 Thread Peter Zijlstra
On Tue, Dec 01, 2020 at 08:39:06AM +0100, Peter Zijlstra wrote:
> On Mon, Nov 30, 2020 at 09:55:44AM -0800, Linus Torvalds wrote:
> 
> > Yes, yes, I can very well imagine some hardware doing a "idle until
> > you sense an interrupt, but don't actually take it". It's not
> > _impossible_. But it's certainly not normal.
> 
> A lot of hardware actually does that, including modern x86. But yes, not
> nearly everything.

So the thing is that (with exception of RCU-tiny), we need interrupts
disabled when coming out of idle to prod RCU back into action.

So even if an architecture needs to enable interrupts on idle, we need
it disabled again when coming out. So we might as well have the arch
idle routine then be: STI; HLT; CLI; because then architectures that can
idle with interrupts disabled can avoid mucking about with the interrupt
state entirely.

Currently they (and this includes much of ARM) are effectively doing:

wfi();
raw_local_irq_enable();
raw_local_irq_disable();

where a plain: wfi(), would be sufficient.
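
As a sketch (function names invented for the comparison, wfi() standing in
for the architecture's wait-for-interrupt primitive):

/* today: arch_cpu_idle() must return with IRQs enabled, so an
 * architecture that idles with IRQs masked has to toggle them */
static void arch_cpu_idle_today(void)
{
	wfi();
	raw_local_irq_enable();
	/* ...and the generic caller immediately disables them again */
}

/* proposed: arch_cpu_idle() enters and exits with IRQs disabled */
static void arch_cpu_idle_proposed(void)
{
	wfi();
}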



Re: [GIT pull] locking/urgent for v5.10-rc6

2020-11-30 Thread Sven Schnelle
Hi Peter,

Peter Zijlstra  writes:

> On Mon, Nov 30, 2020 at 01:52:11PM +0100, Peter Zijlstra wrote:
>> On Mon, Nov 30, 2020 at 01:31:33PM +0100, Sven Schnelle wrote:
>> > [0.670280] [ cut here ] 
>> > [0.670288] WARNING: CPU: 1 PID: 0 at kernel/rcu/tree.c:1054 
>> > rcu_irq_enter+0x7e/0xa8 
>> > [0.670293] Modules linked in: 
>> > [0.670299] CPU: 1 PID: 0 Comm: swapper/1 Tainted: GW 
>> > 5.10.0-rc6 #2263 
>> > [0.670304] Hardware name: IBM 2964 NC9 702 (z/VM 6.4.0) 
>> > [0.670309] Krnl PSW : 0404d0018000 00d8a8da 
>> > (rcu_irq_enter+0x82/0xa8) 
>> > [0.670318]R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 
>> > PM:0 RI:0 EA:3 
>> > [0.670325] Krnl GPRS:  8002 
>> > 0001 0101fcee 
>> > [0.670331]  
>> >   
>> > [0.670337]03e00029ff48  
>> > 017212d8 0001 
>> > [0.670343]05ba0100 000324bb 
>> > 03e00029fe40 03e00029fe10
>> >  
>> > [0.670358] Krnl Code: 00d8a8ca: ec180013017ecij 
>> > %r1,1,8,00d8a8f0 
>> > [0.670358]00d8a8d0: ecb80005007ecij 
>> > %r11,0,8,00d8a8da 
>> > [0.670358]   #00d8a8d6: af00mc  
>> > 0,0 
>> > [0.670358]   >00d8a8da: ebbff0a4lmg 
>> > %r11,%r15,160(%r15) 
>> > [0.670358]00d8a8e0: c0f4ff68brcl
>> > 15,00d8a7b0 
>> > [0.670358]00d8a8e6: c0e538c1brasl   
>> > %r14,00d91a68 
>> > [0.670358]00d8a8ec: a7f4ffdcbrc 
>> > 15,00d8a8a4 
>> > [0.670358]00d8a8f0: c0e538bcbrasl   
>> > %r14,00d91a68 
>> > [0.670392] Call Trace: 
>> > [0.670396]  [<00d8a8da>] rcu_irq_enter+0x82/0xa8  
>> > [0.670401]  [<00157f9a>] irq_enter+0x22/0x30  
>> > [0.670404]  [<0010e51c>] do_IRQ+0x64/0xd0  
>> > [0.670408]  [<00d9a65a>] ext_int_handler+0x18e/0x194  
>> > [0.670412]  [<00d9a6a0>] psw_idle+0x40/0x48  
>> > [0.670416] ([<00104202>] enabled_wait+0x22/0xf0) 
>> > [0.670419]  [<001046e2>] arch_cpu_idle+0x22/0x38  
>> > [0.670423]  [<00d986cc>] default_idle_call+0x74/0xd8  
>> > [0.670427]  [<0019a94a>] do_idle+0xf2/0x1b0  
>> > [0.670431]  [<0019ac7e>] cpu_startup_entry+0x36/0x40  
>> > [0.670435]  [<00118b9a>] smp_start_secondary+0x82/0x88  
>> 
>> But but but...
>> 
>>   do_idle()  # IRQs on
>> local_irq_disable(); # IRQs off
>> default_idle_call()   # IRQs off
>   lockdep_hardirqs_on();  # IRQs off, but lockdep thinks they're on
>>   arch_cpu_idle()# IRQs off
>> enabled_wait()   # IRQs off
>>raw_local_save()  # still off
>>psw_idle()# very much off
>>  ext_int_handler # get an interrupt ?!?!
> rcu_irq_enter()   # lockdep thinks IRQs are on <- FAIL
>
>
> I can't much read s390 assembler, but ext_int_handler() has a
> TRACE_IRQS_OFF, which would be sufficient to re-align the lockdep state
> with the actual state, but there's some condition before it, what's that
> test and is that right?

That test was introduced to only track changes in IRQ state because of
recursion problems in lockdep. This now seems to no longer work. We
probably could remove that as lockdep can now handle recursion much
better with all the recent changes, but given that we're already at
-rc6, I don't want to touch entry.S code because of a debugging feature.



Re: [GIT pull] locking/urgent for v5.10-rc6

2020-11-30 Thread Peter Zijlstra
On Mon, Nov 30, 2020 at 09:55:44AM -0800, Linus Torvalds wrote:

> Yes, yes, I can very well imagine some hardware doing a "idle until
> you sense an interrupt, but don't actually take it". It's not
> _impossible_. But it's certainly not normal.

A lot of hardware actually does that, including modern x86. But yes, not
nearly everything.



Re: [GIT pull] locking/urgent for v5.10-rc6

2020-11-30 Thread Christian Borntraeger



On 30.11.20 19:04, Linus Torvalds wrote:
> On Mon, Nov 30, 2020 at 5:03 AM Peter Zijlstra  wrote:
>>
>>> But but but...
>>>
>>>   do_idle()   # IRQs on
>>> local_irq_disable();  # IRQs off
>>> default_idle_call()# IRQs off
>> lockdep_hardirqs_on();  # IRQs off, but lockdep thinks they're on
>>>   arch_cpu_idle() # IRQs off
>>> enabled_wait()# IRQs off
>>> raw_local_save()  # still off
>>> psw_idle()# very much off
>>>   ext_int_handler # get an interrupt ?!?!
>>   rcu_irq_enter()   # lockdep thinks IRQs are on <- FAIL
>>
>> I can't much read s390 assembler, but ext_int_handler() has a
>> TRACE_IRQS_OFF, which would be sufficient to re-align the lockdep state
>> with the actual state, but there's some condition before it, what's that
>> test and is that right?
> 
> I think that "psw_idle()" enables interrupts, exactly like x86 does.

Yes, by definition.  Otherwise it would be a software error state.
The interesting part is the lpswe instruction at the end (load PSW),
which loads the full PSW, which contains interrupt enablement, wait bit,
condition code, paging enablement, machine check enablement, the address
and others. The idle psw is enabled for interrupts and has the wait bit
set. If the wait bit is set and interrupts are off this is called "disabled
wait" and is used for panic, shutdown etc. 

> See my previous email.
> 
> But no, I can't read s390 asm either. IBM is notorious for making up
> odd IBM-only incomprehensible names. When "oi" means "or immediate", I
> personally start suspecting that there were some "happy drugs"
> involved.
> 
> To make matters worse, some of the assembly code in psw_idle isn't
> even assembly code, it's machine code, with "BPON" being an
> alternative instruction definition with just the hex encoding for the
> machine code instruction rather than any actual human-legible
> instruction encoding.
> 
> Of course, when the "human-legible" instructions are "oi", I guess hex
> codes aren't all that much less legible..
> 
> s390 programmers must be some super-human breed. Or, alternatively,
> they are munching happy pills by the truck-load to get over the pain
> ;)


For something that was defined in 1964 it is certainly not too bad.
And for the oddities, you simply get used to it.

In the end the scheme is actually relatively straightforward. 
The first character defines what it is:
o: or
a: add
n: and
l: load 
i: insert (kind of a load to parts, e.g. only one byte)
st: store
b: branch
j: jump
and some more, but those should cover 80% of what you need.
The following characters then tell if it is 32 or 64 bit (g),
an immediate (i) value or memory, and the type of register (float, general purpose).
So lg is a 64-bit load, l a 32-bit load, and
ld a load into a floating point register (double).
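
Applied to instructions quoted elsewhere in this thread, for example:

  lgr  %r2,%r11           l (load), g (64 bit), r (register to register)
  lghi %r3,EXT_INTERRUPT  l (load), g (64 bit), hi (halfword immediate)
  stg  %r14,152(%r15)     st (store), g (64 bit)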


Re: [GIT pull] locking/urgent for v5.10-rc6

2020-11-30 Thread Linus Torvalds
On Mon, Nov 30, 2020 at 5:03 AM Peter Zijlstra  wrote:
>
> > But but but...
> >
> >   do_idle()   # IRQs on
> > local_irq_disable();  # IRQs off
> > default_idle_call()# IRQs off
> lockdep_hardirqs_on();  # IRQs off, but lockdep thinks they're on
> >   arch_cpu_idle() # IRQs off
> > enabled_wait()# IRQs off
> > raw_local_save()  # still off
> > psw_idle()# very much off
> >   ext_int_handler # get an interrupt ?!?!
>   rcu_irq_enter()   # lockdep thinks IRQs are on <- FAIL
>
> I can't much read s390 assembler, but ext_int_handler() has a
> TRACE_IRQS_OFF, which would be sufficient to re-align the lockdep state
> with the actual state, but there's some condition before it, what's that
> test and is that right?

I think that "psw_idle()" enables interrupts, exactly like x86 does.
See my previous email.

But no, I can't read s390 asm either. IBM is notorious for making up
odd IBM-only incomprehensible names. When "oi" means "or immediate", I
personally start suspecting that there were some "happy drugs"
involved.

To make matters worse, some of the assembly code in psw_idle isn't
even assembly code, it's machine code, with "BPON" being an
alternative instruction definition with just the hex encoding for the
machine code instruction rather than any actual human-legible
instruction encoding.

Of course, when the "human-legible" instructions are "oi", I guess hex
codes aren't all that much less legible..

s390 programmers must be some super-human breed. Or, alternatively,
they are munching happy pills by the truck-load to get over the pain
;)

   Linus


Re: [GIT pull] locking/urgent for v5.10-rc6

2020-11-30 Thread Linus Torvalds
On Sun, Nov 29, 2020 at 11:57 PM Peter Zijlstra  wrote:
>
> The idea was to flip all of arch_cpu_idle() to not enable interrupts.

I don't think that's realistic.

> This is suboptimal for things like x86 where arch_cpu_idle() is
> basically STI;HLT, but x86 isn't likely to actually use this code path
> anyway, given all the various cpuidle drivers it has.

Well, but the thing is, "enable interrupts" is pretty much fundamental
to any idle routine.

Idling with interrupts disabled is not a sensible operation. The fact
that on x86 the sequence is "sti;hlt" is not just some x86 oddity,
it's basically fundamental to the whole notion of idle.

Yes, yes, I can very well imagine some hardware doing a "idle until
you sense an interrupt, but don't actually take it". It's not
_impossible_. But it's certainly not normal.

So I think it's completely misguided to think that the default idle
routine should assume that arch_cpu_idle() wouldn't enable interrupts.

 Linus


Re: [GIT pull] locking/urgent for v5.10-rc6

2020-11-30 Thread Peter Zijlstra
On Mon, Nov 30, 2020 at 01:52:11PM +0100, Peter Zijlstra wrote:
> On Mon, Nov 30, 2020 at 01:31:33PM +0100, Sven Schnelle wrote:
> > [0.670280] [ cut here ] 
> > [0.670288] WARNING: CPU: 1 PID: 0 at kernel/rcu/tree.c:1054 
> > rcu_irq_enter+0x7e/0xa8 
> > [0.670293] Modules linked in: 
> > [0.670299] CPU: 1 PID: 0 Comm: swapper/1 Tainted: GW 
> > 5.10.0-rc6 #2263 
> > [0.670304] Hardware name: IBM 2964 NC9 702 (z/VM 6.4.0) 
> > [0.670309] Krnl PSW : 0404d0018000 00d8a8da 
> > (rcu_irq_enter+0x82/0xa8) 
> > [0.670318]R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 
> > PM:0 RI:0 EA:3 
> > [0.670325] Krnl GPRS:  8002 
> > 0001 0101fcee 
> > [0.670331]  
> >   
> > [0.670337]03e00029ff48  
> > 017212d8 0001 
> > [0.670343]05ba0100 000324bb 
> > 03e00029fe40 03e00029fe10
> >  
> > [0.670358] Krnl Code: 00d8a8ca: ec180013017ecij 
> > %r1,1,8,00d8a8f0 
> > [0.670358]00d8a8d0: ecb80005007ecij 
> > %r11,0,8,00d8a8da 
> > [0.670358]   #00d8a8d6: af00mc  0,0 
> > [0.670358]   >00d8a8da: ebbff0a4lmg 
> > %r11,%r15,160(%r15) 
> > [0.670358]00d8a8e0: c0f4ff68brcl
> > 15,00d8a7b0 
> > [0.670358]00d8a8e6: c0e538c1brasl   
> > %r14,00d91a68 
> > [0.670358]00d8a8ec: a7f4ffdcbrc 
> > 15,00d8a8a4 
> > [0.670358]00d8a8f0: c0e538bcbrasl   
> > %r14,00d91a68 
> > [0.670392] Call Trace: 
> > [0.670396]  [<00d8a8da>] rcu_irq_enter+0x82/0xa8  
> > [0.670401]  [<00157f9a>] irq_enter+0x22/0x30  
> > [0.670404]  [<0010e51c>] do_IRQ+0x64/0xd0  
> > [0.670408]  [<00d9a65a>] ext_int_handler+0x18e/0x194  
> > [0.670412]  [<00d9a6a0>] psw_idle+0x40/0x48  
> > [0.670416] ([<00104202>] enabled_wait+0x22/0xf0) 
> > [0.670419]  [<001046e2>] arch_cpu_idle+0x22/0x38  
> > [0.670423]  [<00d986cc>] default_idle_call+0x74/0xd8  
> > [0.670427]  [<0019a94a>] do_idle+0xf2/0x1b0  
> > [0.670431]  [<0019ac7e>] cpu_startup_entry+0x36/0x40  
> > [0.670435]  [<00118b9a>] smp_start_secondary+0x82/0x88  
> 
> But but but...
> 
>   do_idle()   # IRQs on
> local_irq_disable();  # IRQs off
> default_idle_call()# IRQs off
lockdep_hardirqs_on();  # IRQs off, but lockdep thinks they're on
>   arch_cpu_idle() # IRQs off
> enabled_wait()# IRQs off
> raw_local_save()  # still off
> psw_idle()# very much off
>   ext_int_handler # get an interrupt ?!?!
  rcu_irq_enter()   # lockdep thinks IRQs are on <- FAIL


I can't much read s390 assembler, but ext_int_handler() has a
TRACE_IRQS_OFF, which would be sufficient to re-align the lockdep state
with the actual state, but there's some condition before it, what's that
test and is that right?


Re: [GIT pull] locking/urgent for v5.10-rc6

2020-11-30 Thread Peter Zijlstra
On Mon, Nov 30, 2020 at 01:31:33PM +0100, Sven Schnelle wrote:
> [0.670280] [ cut here ] 
> [0.670288] WARNING: CPU: 1 PID: 0 at kernel/rcu/tree.c:1054 
> rcu_irq_enter+0x7e/0xa8 
> [0.670293] Modules linked in: 
> [0.670299] CPU: 1 PID: 0 Comm: swapper/1 Tainted: GW 
> 5.10.0-rc6 #2263 
> [0.670304] Hardware name: IBM 2964 NC9 702 (z/VM 6.4.0) 
> [0.670309] Krnl PSW : 0404d0018000 00d8a8da 
> (rcu_irq_enter+0x82/0xa8) 
> [0.670318]R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 
> RI:0 EA:3 
> [0.670325] Krnl GPRS:  8002 0001 
> 0101fcee 
> [0.670331]   
>  
> [0.670337]03e00029ff48  017212d8 
> 0001 
> [0.670343]05ba0100 000324bb 03e00029fe40 
> 03e00029fe10
>  
> [0.670358] Krnl Code: 00d8a8ca: ec180013017ecij 
> %r1,1,8,00d8a8f0 
> [0.670358]00d8a8d0: ecb80005007ecij 
> %r11,0,8,00d8a8da 
> [0.670358]   #00d8a8d6: af00mc  0,0 
> [0.670358]   >00d8a8da: ebbff0a4lmg 
> %r11,%r15,160(%r15) 
> [0.670358]00d8a8e0: c0f4ff68brcl
> 15,00d8a7b0 
> [0.670358]00d8a8e6: c0e538c1brasl   
> %r14,00d91a68 
> [0.670358]00d8a8ec: a7f4ffdcbrc 
> 15,00d8a8a4 
> [0.670358]00d8a8f0: c0e538bcbrasl   
> %r14,00d91a68 
> [0.670392] Call Trace: 
> [0.670396]  [<00d8a8da>] rcu_irq_enter+0x82/0xa8  
> [0.670401]  [<00157f9a>] irq_enter+0x22/0x30  
> [0.670404]  [<0010e51c>] do_IRQ+0x64/0xd0  
> [0.670408]  [<00d9a65a>] ext_int_handler+0x18e/0x194  
> [0.670412]  [<00d9a6a0>] psw_idle+0x40/0x48  
> [0.670416] ([<00104202>] enabled_wait+0x22/0xf0) 
> [0.670419]  [<001046e2>] arch_cpu_idle+0x22/0x38  
> [0.670423]  [<00d986cc>] default_idle_call+0x74/0xd8  
> [0.670427]  [<0019a94a>] do_idle+0xf2/0x1b0  
> [0.670431]  [<0019ac7e>] cpu_startup_entry+0x36/0x40  
> [0.670435]  [<00118b9a>] smp_start_secondary+0x82/0x88  

But but but...

  do_idle() # IRQs on
local_irq_disable();# IRQs off
default_idle_call()  # IRQs off
  arch_cpu_idle()   # IRQs off
enabled_wait()  # IRQs off
  raw_local_save()  # still off
  psw_idle()# very much off
ext_int_handler # get an interrupt ?!?!

Help?


Re: [GIT pull] locking/urgent for v5.10-rc6

2020-11-30 Thread Sven Schnelle
Hi,

Sven Schnelle  writes:

> Hi Peter,
>
> Peter Zijlstra  writes:
>
>> On Sun, Nov 29, 2020 at 11:31:41AM -0800, Linus Torvalds wrote:
>>> On Sun, Nov 29, 2020 at 5:38 AM Thomas Gleixner  wrote:
>>> >
>>> > Yet two more places which invoke tracing from RCU disabled regions in the
>>> > idle path. Similar to the entry path the low level idle functions have to
>>> > be non-instrumentable.
>>> 
>>> This really seems less than optimal.
>>> 
>>> In particular, lookie here:
>>> 
>>> > @@ -94,9 +94,35 @@ void __cpuidle default_idle_call(void)
>>> >
>>> > trace_cpu_idle(1, smp_processor_id());
>>> > stop_critical_timings();
>>> > +
>>> > +   /*
>>> > +* arch_cpu_idle() is supposed to enable IRQs, however
>>> > +* we can't do that because of RCU and tracing.
>>> > +*
>>> > +* Trace IRQs enable here, then switch off RCU, and have
>>> > +* arch_cpu_idle() use raw_local_irq_enable(). Note that
>>> > +* rcu_idle_enter() relies on lockdep IRQ state, so 
>>> > switch that
>>> > +* last -- this is very similar to the entry code.
>>> > +*/
>>> > +   trace_hardirqs_on_prepare();
>>> > +   lockdep_hardirqs_on_prepare(_THIS_IP_);
>>> > rcu_idle_enter();
>>> > +   lockdep_hardirqs_on(_THIS_IP_);
>>> > +
>>> > arch_cpu_idle();
>>> > +
>>> > +   /*
>>> > +* OK, so IRQs are enabled here, but RCU needs them 
>>> > disabled to
>>> > +* turn itself back on.. funny thing is that disabling 
>>> > IRQs
>>> > +* will cause tracing, which needs RCU. Jump through 
>>> > hoops to
>>> > +* make it 'work'.
>>> > +*/
>>> > +   raw_local_irq_disable();
>>> > +   lockdep_hardirqs_off(_THIS_IP_);
>>> > rcu_idle_exit();
>>> > +   lockdep_hardirqs_on(_THIS_IP_);
>>> > +   raw_local_irq_enable();
>>> > +
>>> > start_critical_timings();
>>> > trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
>>> > }
>>> 
>>> And look at what the code generation for the idle exit path is when
>>> lockdep isn't even on.
>>
>> Agreed.
>>
>> The idea was to flip all of arch_cpu_idle() to not enable interrupts.
>>
>> This is suboptimal for things like x86 where arch_cpu_idle() is
>> basically STI;HLT, but x86 isn't likely to actually use this code path
>> anyway, given all the various cpuidle drivers it has.
>>
>> Many of the other archs are now doing things like arm's:
>> wfi();raw_local_irq_enable().
>>
>> Doing that tree-wide interrupt-state flip was something I didn't want to
>> do at this late a stage, the chance of messing that up is just too high.
>>
>> After that I need to go look at flipping cpuidle, which is even more
>> 'interesting'. cpuidle_enter() has the exact same semantics, and this is
>> the code path that x86 actually uses, and here it's inconsistent at best.
>
> On s390 this introduces the following splat:
> [..]

I sent you the wrong backtrace. This is the correct one:

[0.667491] smp: Bringing up secondary CPUs ... 
[0.670262] random: get_random_bytes called from __warn+0x12a/0x160 with 
crng_init=1 
[0.670280] [ cut here ] 
[0.670288] WARNING: CPU: 1 PID: 0 at kernel/rcu/tree.c:1054 
rcu_irq_enter+0x7e/0xa8 
[0.670293] Modules linked in: 
[0.670299] CPU: 1 PID: 0 Comm: swapper/1 Tainted: GW 
5.10.0-rc6 #2263 
[0.670304] Hardware name: IBM 2964 NC9 702 (z/VM 6.4.0) 
[0.670309] Krnl PSW : 0404d0018000 00d8a8da 
(rcu_irq_enter+0x82/0xa8) 
[0.670318]R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 
RI:0 EA:3 
[0.670325] Krnl GPRS:  8002 0001 
0101fcee 
[0.670331]   
 
[0.670337]03e00029ff48  017212d8 
0001 
[0.670343]05ba0100 000324bb 03e00029fe40 
03e00029fe10
 
[0.670358] Krnl Code: 00d8a8ca: ec180013017ecij 
%r1,1,8,00d8a8f0 
[0.670358]00d8a8d0: ecb80005007ecij 
%r11,0,8,00d8a8da 
[0.670358]   #00d8a8d6: af00mc  0,0 
[0.670358]   >00d8a8da: ebbff0a4lmg 
%r11,%r15,160(%r15) 
[0.670358]00d8a8e0: c0f4ff68brcl
15,00d8a7b0 
[0.670358]00d8a8e6: c0e538c1brasl   
%r14,00d91a68 
[0.670358]00d8a8ec: a7f4ffdcbrc 
15,00d8a8a4 
[0.670358]00d8a8f0: c0e538bcbrasl   

Re: [GIT pull] locking/urgent for v5.10-rc6

2020-11-30 Thread Peter Zijlstra
On Mon, Nov 30, 2020 at 08:56:51AM +0100, Peter Zijlstra wrote:
> The idea was to flip all of arch_cpu_idle() to not enable interrupts.
> 
> This is suboptimal for things like x86 where arch_cpu_idle() is
> basically STI;HLT, but x86 isn't likely to actually use this code path
> anyway, given all the various cpuidle drivers it has.
> 
> Many of the other archs are now doing things like arm's:
> wfi();raw_local_irq_enable().
> 
> Doing that tree-wide interrupt-state flip was something I didn't want to
> do at this late a stage, the chance of messing that up is just too high.

It looks something like the below, but this needs to soak a bit because
who knows, I might've missed some obscure case ...

And then, after I've also flipped cpuidle, I need to go noinstr annotate
all the actual idle code :-/ (which is going to be a pain due to chasing
function pointers).

---
 arch/alpha/kernel/process.c  |  1 -
 arch/arc/kernel/process.c|  3 +++
 arch/arm/kernel/process.c|  1 -
 arch/arm/mach-gemini/board-dt.c  |  3 ++-
 arch/arm64/kernel/process.c  |  1 -
 arch/c6x/kernel/process.c|  1 +
 arch/csky/kernel/process.c   |  1 -
 arch/h8300/kernel/process.c  |  1 +
 arch/hexagon/kernel/process.c|  1 -
 arch/ia64/kernel/process.c   |  1 +
 arch/microblaze/kernel/process.c |  1 -
 arch/mips/kernel/idle.c  |  7 +--
 arch/nios2/kernel/process.c  |  1 -
 arch/openrisc/kernel/process.c   |  1 +
 arch/parisc/kernel/process.c |  2 --
 arch/powerpc/kernel/idle.c   |  5 ++---
 arch/riscv/kernel/process.c  |  1 -
 arch/s390/kernel/idle.c  |  1 -
 arch/sh/kernel/idle.c|  1 +
 arch/sparc/kernel/leon_pmc.c |  4 
 arch/sparc/kernel/process_32.c   |  1 -
 arch/sparc/kernel/process_64.c   |  3 ++-
 arch/um/kernel/process.c |  1 -
 arch/x86/kernel/process.c| 14 --
 arch/xtensa/kernel/process.c |  1 +
 kernel/sched/idle.c  | 10 +++---
 26 files changed, 27 insertions(+), 41 deletions(-)

diff --git a/arch/alpha/kernel/process.c b/arch/alpha/kernel/process.c
index 4c7b0414a3ff..d686e8faec6e 100644
--- a/arch/alpha/kernel/process.c
+++ b/arch/alpha/kernel/process.c
@@ -57,7 +57,6 @@ EXPORT_SYMBOL(pm_power_off);
 void arch_cpu_idle(void)
 {
wtint(0);
-   raw_local_irq_enable();
 }
 
 void arch_cpu_idle_dead(void)
diff --git a/arch/arc/kernel/process.c b/arch/arc/kernel/process.c
index 37f724ad5e39..1d0832b28d09 100644
--- a/arch/arc/kernel/process.c
+++ b/arch/arc/kernel/process.c
@@ -114,6 +114,8 @@ void arch_cpu_idle(void)
"sleep %0   \n"
:
:"I"(arg)); /* can't be "r" has to be embedded const */
+
+   raw_local_irq_disable();
 }
 
 #else  /* ARC700 */
@@ -122,6 +124,7 @@ void arch_cpu_idle(void)
 {
/* sleep, but enable both set E1/E2 (levels of interrutps) before 
committing */
__asm__ __volatile__("sleep 0x3 \n");
+   raw_local_irq_disable();
 }
 
 #endif
diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index 9f199b1e8383..f9c97caa52d1 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -71,7 +71,6 @@ void arch_cpu_idle(void)
arm_pm_idle();
else
cpu_do_idle();
-   raw_local_irq_enable();
 }
 
 void arch_cpu_idle_prepare(void)
diff --git a/arch/arm/mach-gemini/board-dt.c b/arch/arm/mach-gemini/board-dt.c
index de0afcc8d94a..fbafe7475c02 100644
--- a/arch/arm/mach-gemini/board-dt.c
+++ b/arch/arm/mach-gemini/board-dt.c
@@ -42,8 +42,9 @@ static void gemini_idle(void)
 */
 
/* FIXME: Enabling interrupts here is racy! */
-   local_irq_enable();
+   raw_local_irq_enable();
cpu_do_idle();
+   raw_local_irq_disable();
 }
 
 static void __init gemini_init_machine(void)
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 7697a4b48b7c..07f551879dc5 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -126,7 +126,6 @@ void arch_cpu_idle(void)
 * tricks
 */
cpu_do_idle();
-   raw_local_irq_enable();
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
diff --git a/arch/c6x/kernel/process.c b/arch/c6x/kernel/process.c
index 9f4fd6a40a10..9e638155ca76 100644
--- a/arch/c6x/kernel/process.c
+++ b/arch/c6x/kernel/process.c
@@ -45,6 +45,7 @@ void arch_cpu_idle(void)
  "   mvc .s2 %0,CSR\n"
  "|| idle\n"
  : "=b"(tmp));
+   raw_local_irq_disable();
 }
 
 static void halt_loop(void)
diff --git a/arch/csky/kernel/process.c b/arch/csky/kernel/process.c
index 69af6bc87e64..bae792dca7a3 100644
--- a/arch/csky/kernel/process.c
+++ b/arch/csky/kernel/process.c
@@ -102,6 +102,5 @@ void arch_cpu_idle(void)
 #ifdef CONFIG_CPU_PM_STOP
asm volatile("stop\n");
 #endif
-   raw_local_irq_enable();
 }
 #endif
diff --git a/arch/h8300/kernel/process.c b/arch/h8300/kernel/process.c
index 

Re: [GIT pull] locking/urgent for v5.10-rc6

2020-11-30 Thread Sven Schnelle
Hi Peter,

Peter Zijlstra  writes:

> On Sun, Nov 29, 2020 at 11:31:41AM -0800, Linus Torvalds wrote:
>> On Sun, Nov 29, 2020 at 5:38 AM Thomas Gleixner  wrote:
>> >
>> > Yet two more places which invoke tracing from RCU disabled regions in the
>> > idle path. Similar to the entry path the low level idle functions have to
>> > be non-instrumentable.
>> 
>> This really seems less than optimal.
>> 
>> In particular, lookie here:
>> 
>> > @@ -94,9 +94,35 @@ void __cpuidle default_idle_call(void)
>> >
>> > trace_cpu_idle(1, smp_processor_id());
>> > stop_critical_timings();
>> > +
>> > +   /*
>> > +* arch_cpu_idle() is supposed to enable IRQs, however
>> > +* we can't do that because of RCU and tracing.
>> > +*
>> > +* Trace IRQs enable here, then switch off RCU, and have
>> > +* arch_cpu_idle() use raw_local_irq_enable(). Note that
>> > +* rcu_idle_enter() relies on lockdep IRQ state, so switch 
>> > that
>> > +* last -- this is very similar to the entry code.
>> > +*/
>> > +   trace_hardirqs_on_prepare();
>> > +   lockdep_hardirqs_on_prepare(_THIS_IP_);
>> > rcu_idle_enter();
>> > +   lockdep_hardirqs_on(_THIS_IP_);
>> > +
>> > arch_cpu_idle();
>> > +
>> > +   /*
>> > +* OK, so IRQs are enabled here, but RCU needs them 
>> > disabled to
>> > +* turn itself back on.. funny thing is that disabling IRQs
>> > +* will cause tracing, which needs RCU. Jump through hoops 
>> > to
>> > +* make it 'work'.
>> > +*/
>> > +   raw_local_irq_disable();
>> > +   lockdep_hardirqs_off(_THIS_IP_);
>> > rcu_idle_exit();
>> > +   lockdep_hardirqs_on(_THIS_IP_);
>> > +   raw_local_irq_enable();
>> > +
>> > start_critical_timings();
>> > trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
>> > }
>> 
>> And look at what the code generation for the idle exit path is when
>> lockdep isn't even on.
>
> Agreed.
>
> The idea was to flip all of arch_cpu_idle() to not enable interrupts.
>
> This is suboptimal for things like x86 where arch_cpu_idle() is
> basically STI;HLT, but x86 isn't likely to actually use this code path
> anyway, given all the various cpuidle drivers it has.
>
> Many of the other archs are now doing things like arm's:
> wfi();raw_local_irq_enable().
>
> Doing that tree-wide interrupt-state flip was something I didn't want to
> do at this late a stage, the chance of messing that up is just too high.
>
> After that I need to go look at flipping cpuidle, which is even more
> 'interesting'. cpuidle_enter() has the exact same semantics, and this is
> the code path that x86 actually uses, and here it's inconsistent at best.

On s390 this introduces the following splat:

[2.962283] Testing tracer wakeup_rt:  
[3.017102] sched: DL replenish lagged too much 
[3.061777] PASSED 
[3.062076] Testing tracer wakeup_dl: PASSED 
[3.161296] [ cut here ] 
[3.161301] DEBUG_LOCKS_WARN_ON(current->hardirq_chain_key != 
current->curr_chain_key) 
[3.161310] WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:4155 
lockdep_hardirqs_on+0x1ea/0x1f8 
[3.161316] Modules linked in: 
[3.161323] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
5.10.0-rc6-07209-g9e30ba6d2d5c #2249 
[3.161327] Hardware name: IBM 2964 NC9 702 (z/VM 6.4.0) 
[3.161331] Krnl PSW : 0404d0018000 00d730fe 
(lockdep_hardirqs_on+0x1ee/0x1f8) 
[3.161340]R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 
RI:0 EA:3 
[3.161347] Krnl GPRS: c000bfff 8001 004a 
001e33b8 
[3.161352]03807b88 03807b80  
 
[3.161357] 0151a610 01361500 
00d81d40 
[3.161362]00010400 00d88010 00d730fa 
03807db8
 
[3.161461] Krnl Code: 00d730ee: c0200012d9adlarl
%r2,00fce448 
[3.161461]00d730f4: c0e5451abrasl   
%r14,00d5bb28 
[3.161461]   #00d730fa: af00mc  0,0 
[3.161461]   >00d730fe: a7f4ff70brc 
15,00d72fde 
[3.161461]00d73102: 0707bcr 0,%r7 
[3.161461]00d73104: 0707bcr 0,%r7 
[3.161461]00d73106: 0707bcr 0,%r7 
[3.161461]00d73108: c418002884eclgrl
%r1,01283ae0 
[3.161518] Call Trace: 
[3.161526]  [<00d730fe>] 

Re: [GIT pull] locking/urgent for v5.10-rc6

2020-11-29 Thread Peter Zijlstra
On Sun, Nov 29, 2020 at 11:31:41AM -0800, Linus Torvalds wrote:
> On Sun, Nov 29, 2020 at 5:38 AM Thomas Gleixner  wrote:
> >
> > Yet two more places which invoke tracing from RCU disabled regions in the
> > idle path. Similar to the entry path the low level idle functions have to
> > be non-instrumentable.
> 
> This really seems less than optimal.
> 
> In particular, lookie here:
> 
> > @@ -94,9 +94,35 @@ void __cpuidle default_idle_call(void)
> >
> > trace_cpu_idle(1, smp_processor_id());
> > stop_critical_timings();
> > +
> > +   /*
> > +* arch_cpu_idle() is supposed to enable IRQs, however
> > +* we can't do that because of RCU and tracing.
> > +*
> > +* Trace IRQs enable here, then switch off RCU, and have
> > +* arch_cpu_idle() use raw_local_irq_enable(). Note that
> > +* rcu_idle_enter() relies on lockdep IRQ state, so switch 
> > that
> > +* last -- this is very similar to the entry code.
> > +*/
> > +   trace_hardirqs_on_prepare();
> > +   lockdep_hardirqs_on_prepare(_THIS_IP_);
> > rcu_idle_enter();
> > +   lockdep_hardirqs_on(_THIS_IP_);
> > +
> > arch_cpu_idle();
> > +
> > +   /*
> > +* OK, so IRQs are enabled here, but RCU needs them 
> > disabled to
> > +* turn itself back on.. funny thing is that disabling IRQs
> > +* will cause tracing, which needs RCU. Jump through hoops 
> > to
> > +* make it 'work'.
> > +*/
> > +   raw_local_irq_disable();
> > +   lockdep_hardirqs_off(_THIS_IP_);
> > rcu_idle_exit();
> > +   lockdep_hardirqs_on(_THIS_IP_);
> > +   raw_local_irq_enable();
> > +
> > start_critical_timings();
> > trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
> > }
> 
> And look at what the code generation for the idle exit path is when
> lockdep isn't even on.

Agreed.

The idea was to flip all of arch_cpu_idle() to not enable interrupts.

This is suboptimal for things like x86 where arch_cpu_idle() is
basically STI;HLT, but x86 isn't likely to actually use this code path
anyway, given all the various cpuidle drivers it has.

Many of the other archs are now doing things like arm's:
wfi();raw_local_irq_enable().

Doing that tree-wide interrupt-state flip was something I didn't want to
do at this late a stage, the chance of messing that up is just too high.

After that I need to go look at flipping cpuidle, which is even more
'interesting'. cpuidle_enter() has the exact same semantics, and this is
the code path that x86 actually uses, and here it's inconsistent at best.


Re: [GIT pull] locking/urgent for v5.10-rc6

2020-11-29 Thread Paul E. McKenney
On Sun, Nov 29, 2020 at 11:31:41AM -0800, Linus Torvalds wrote:
> On Sun, Nov 29, 2020 at 5:38 AM Thomas Gleixner  wrote:
> >
> > Yet two more places which invoke tracing from RCU disabled regions in the
> > idle path. Similar to the entry path the low level idle functions have to
> > be non-instrumentable.
> 
> This really seems less than optimal.
> 
> In particular, lookie here:
> 
> > @@ -94,9 +94,35 @@ void __cpuidle default_idle_call(void)
> >
> > trace_cpu_idle(1, smp_processor_id());
> > stop_critical_timings();
> > +
> > +   /*
> > +* arch_cpu_idle() is supposed to enable IRQs, however
> > +* we can't do that because of RCU and tracing.
> > +*
> > +* Trace IRQs enable here, then switch off RCU, and have
> > +* arch_cpu_idle() use raw_local_irq_enable(). Note that
> > +* rcu_idle_enter() relies on lockdep IRQ state, so switch 
> > that
> > +* last -- this is very similar to the entry code.
> > +*/
> > +   trace_hardirqs_on_prepare();
> > +   lockdep_hardirqs_on_prepare(_THIS_IP_);
> > rcu_idle_enter();
> > +   lockdep_hardirqs_on(_THIS_IP_);
> > +
> > arch_cpu_idle();
> > +
> > +   /*
> > +* OK, so IRQs are enabled here, but RCU needs them 
> > disabled to
> > +* turn itself back on.. funny thing is that disabling IRQs
> > +* will cause tracing, which needs RCU. Jump through hoops 
> > to
> > +* make it 'work'.
> > +*/
> > +   raw_local_irq_disable();
> > +   lockdep_hardirqs_off(_THIS_IP_);
> > rcu_idle_exit();
> > +   lockdep_hardirqs_on(_THIS_IP_);
> > +   raw_local_irq_enable();
> > +
> > start_critical_timings();
> > trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
> > }
> 
> And look at what the code generation for the idle exit path is when
> lockdep isn't even on.
> 
> It's *literally*
> 
> cli
> call rcu_idle_exit
> sti
> 
> and guess what rcu_idle_exit does?
> 
> Yeah, that one does "pushf; cli; call rcu_eqs_exit; popf".
> 
> So here we are, in the somewhat critical "an interrupt woke us up"
> section, and we're doing just ridiculously stupid things.
> 
> I've pulled this, because it solves a problem, but there's a deeper
> problem here in how all this is done.
> 
> The idle path is actually quite important. I can point to real loads
> where this is a big part of the CPU profile, because you end up having
> lots of "go to sleep for very short times, because the thing we were
> waiting for takes almost no time at all".

This is because of the noinline implied by the noinstr on rcu_eqs_exit().
If I replace that with inline, it does get inlined.  Except that, if
I remember correctly, making that change messes up the tooling that
enforces the no-instrumentation regions.
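
For reference, noinstr is roughly the following (a from-memory sketch, not a
verbatim quote of compiler_types.h in this exact release); the noinline in
there is what keeps rcu_eqs_exit() out of line:

/* keep the function out of line, untraced, and in its own section */
#define noinstr						\
	noinline notrace __section(".noinstr.text")	\
	__no_kcsan __no_sanitize_address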

I -think- that a combination of instrumentation_end() and s/noinstr/inline/
might work, but I will need to consult with the experts on this.

Thanx, Paul


Re: [GIT pull] locking/urgent for v5.10-rc6

2020-11-29 Thread pr-tracker-bot
The pull request you sent on Sun, 29 Nov 2020 13:38:00 -:

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
> locking-urgent-2020-11-29

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/f91a3aa6bce480fe6e08df540129f4a923222419

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html


Re: [GIT pull] locking/urgent for v5.10-rc6

2020-11-29 Thread Linus Torvalds
On Sun, Nov 29, 2020 at 5:38 AM Thomas Gleixner  wrote:
>
> Yet two more places which invoke tracing from RCU disabled regions in the
> idle path. Similar to the entry path the low level idle functions have to
> be non-instrumentable.

This really seems less than optimal.

In particular, lookie here:

> @@ -94,9 +94,35 @@ void __cpuidle default_idle_call(void)
>
> trace_cpu_idle(1, smp_processor_id());
> stop_critical_timings();
> +
> +   /*
> +* arch_cpu_idle() is supposed to enable IRQs, however
> +* we can't do that because of RCU and tracing.
> +*
> +* Trace IRQs enable here, then switch off RCU, and have
> +* arch_cpu_idle() use raw_local_irq_enable(). Note that
> +* rcu_idle_enter() relies on lockdep IRQ state, so switch 
> that
> +* last -- this is very similar to the entry code.
> +*/
> +   trace_hardirqs_on_prepare();
> +   lockdep_hardirqs_on_prepare(_THIS_IP_);
> rcu_idle_enter();
> +   lockdep_hardirqs_on(_THIS_IP_);
> +
> arch_cpu_idle();
> +
> +   /*
> +* OK, so IRQs are enabled here, but RCU needs them disabled 
> to
> +* turn itself back on.. funny thing is that disabling IRQs
> +* will cause tracing, which needs RCU. Jump through hoops to
> +* make it 'work'.
> +*/
> +   raw_local_irq_disable();
> +   lockdep_hardirqs_off(_THIS_IP_);
> rcu_idle_exit();
> +   lockdep_hardirqs_on(_THIS_IP_);
> +   raw_local_irq_enable();
> +
> start_critical_timings();
> trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
> }

And look at what the code generation for the idle exit path is when
lockdep isn't even on.

It's *literally*

cli
call rcu_idle_exit
sti

and guess what rcu_idle_exit does?

Yeah, that one does "pushf; cli; call rcu_eqs_exit; popf".

So here we are, in the somewhat critical "an interrupt woke us up"
section, and we're doing just ridiculously stupid things.

I've pulled this, because it solves a problem, but there's a deeper
problem here in how all this is done.

The idle path is actually quite important. I can point to real loads
where this is a big part of the CPU profile, because you end up having
lots of "go to sleep for very short times, because the thing we were
waiting for takes almost no time at all".

 Linus


[GIT pull] locking/urgent for v5.10-rc6

2020-11-29 Thread Thomas Gleixner
Linus,

please pull the latest locking/urgent branch from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
locking-urgent-2020-11-29

up to:  6e1d2bc675bd: intel_idle: Fix intel_idle() vs tracing


Yet two more places which invoke tracing from RCU disabled regions in the
idle path. Similar to the entry path the low level idle functions have to
be non-instrumentable.

Thanks,

tglx

-->
Peter Zijlstra (2):
  sched/idle: Fix arch_cpu_idle() vs tracing
  intel_idle: Fix intel_idle() vs tracing


 arch/alpha/kernel/process.c  |  2 +-
 arch/arm/kernel/process.c|  2 +-
 arch/arm64/kernel/process.c  |  2 +-
 arch/csky/kernel/process.c   |  2 +-
 arch/h8300/kernel/process.c  |  2 +-
 arch/hexagon/kernel/process.c|  2 +-
 arch/ia64/kernel/process.c   |  2 +-
 arch/microblaze/kernel/process.c |  2 +-
 arch/mips/kernel/idle.c  | 12 ++--
 arch/nios2/kernel/process.c  |  2 +-
 arch/openrisc/kernel/process.c   |  2 +-
 arch/parisc/kernel/process.c |  2 +-
 arch/powerpc/kernel/idle.c   |  4 ++--
 arch/riscv/kernel/process.c  |  2 +-
 arch/s390/kernel/idle.c  |  6 +++---
 arch/sh/kernel/idle.c|  2 +-
 arch/sparc/kernel/leon_pmc.c |  4 ++--
 arch/sparc/kernel/process_32.c   |  2 +-
 arch/sparc/kernel/process_64.c   |  4 ++--
 arch/um/kernel/process.c |  2 +-
 arch/x86/include/asm/mwait.h |  2 --
 arch/x86/kernel/process.c| 12 +++-
 drivers/idle/intel_idle.c| 37 -
 kernel/sched/idle.c  | 28 +++-
 24 files changed, 84 insertions(+), 55 deletions(-)

diff --git a/arch/alpha/kernel/process.c b/arch/alpha/kernel/process.c
index 7462a7911002..4c7b0414a3ff 100644
--- a/arch/alpha/kernel/process.c
+++ b/arch/alpha/kernel/process.c
@@ -57,7 +57,7 @@ EXPORT_SYMBOL(pm_power_off);
 void arch_cpu_idle(void)
 {
wtint(0);
-   local_irq_enable();
+   raw_local_irq_enable();
 }
 
 void arch_cpu_idle_dead(void)
diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index 8e6ace03e960..9f199b1e8383 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -71,7 +71,7 @@ void arch_cpu_idle(void)
arm_pm_idle();
else
cpu_do_idle();
-   local_irq_enable();
+   raw_local_irq_enable();
 }
 
 void arch_cpu_idle_prepare(void)
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 4784011cecac..9ebe02574127 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -126,7 +126,7 @@ void arch_cpu_idle(void)
 * tricks
 */
cpu_do_idle();
-   local_irq_enable();
+   raw_local_irq_enable();
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
diff --git a/arch/csky/kernel/process.c b/arch/csky/kernel/process.c
index f730869e21ee..69af6bc87e64 100644
--- a/arch/csky/kernel/process.c
+++ b/arch/csky/kernel/process.c
@@ -102,6 +102,6 @@ void arch_cpu_idle(void)
 #ifdef CONFIG_CPU_PM_STOP
asm volatile("stop\n");
 #endif
-   local_irq_enable();
+   raw_local_irq_enable();
 }
 #endif
diff --git a/arch/h8300/kernel/process.c b/arch/h8300/kernel/process.c
index aea0a40b77a9..bc1364db58fe 100644
--- a/arch/h8300/kernel/process.c
+++ b/arch/h8300/kernel/process.c
@@ -57,7 +57,7 @@ asmlinkage void ret_from_kernel_thread(void);
  */
 void arch_cpu_idle(void)
 {
-   local_irq_enable();
+   raw_local_irq_enable();
__asm__("sleep");
 }
 
diff --git a/arch/hexagon/kernel/process.c b/arch/hexagon/kernel/process.c
index 5a0a95d93ddb..67767c5ed98c 100644
--- a/arch/hexagon/kernel/process.c
+++ b/arch/hexagon/kernel/process.c
@@ -44,7 +44,7 @@ void arch_cpu_idle(void)
 {
__vmwait();
/*  interrupts wake us up, but irqs are still disabled */
-   local_irq_enable();
+   raw_local_irq_enable();
 }
 
 /*
diff --git a/arch/ia64/kernel/process.c b/arch/ia64/kernel/process.c
index 6b61a703bcf5..c9ff8796b509 100644
--- a/arch/ia64/kernel/process.c
+++ b/arch/ia64/kernel/process.c
@@ -239,7 +239,7 @@ void arch_cpu_idle(void)
if (mark_idle)
(*mark_idle)(1);
 
-   safe_halt();
+   raw_safe_halt();
 
if (mark_idle)
(*mark_idle)(0);
diff --git a/arch/microblaze/kernel/process.c b/arch/microblaze/kernel/process.c
index a9e46e525cd0..f99860771ff4 100644
--- a/arch/microblaze/kernel/process.c
+++ b/arch/microblaze/kernel/process.c
@@ -149,5 +149,5 @@ int dump_fpu(struct pt_regs *regs, elf_fpregset_t *fpregs)
 
 void arch_cpu_idle(void)
 {
-   local_irq_enable();
+   raw_local_irq_enable();
 }
diff --git a/arch/mips/kernel/idle.c b/arch/mips/kernel/idle.c
index 5bc3b04693c7..18e69ebf5691 100644
--- a/arch/mips/kernel/idle.c
+++ b/arch/mips/kernel/idle.c
@@ -33,19 +33,19 @@ static void __cpuidle r3081_wait(void)
 {
unsigned long cfg = read_c0_conf();
write_c0_conf(cfg |