Re: [RFC][PATCH 4/7] smp: Optimize send_call_function_single_ipi()

2021-01-22 Thread Paul E. McKenney
On Fri, Jan 22, 2021 at 09:31:37AM +0100, Peter Zijlstra wrote:
> On Thu, Jan 21, 2021 at 04:20:12PM -0800, Paul E. McKenney wrote:
> 
> > > ---
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 368749008ae8..2c8d4c3e341e 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -445,7 +445,7 @@ static int rcu_is_cpu_rrupt_from_idle(void)
> > >   /*
> > >* Usually called from the tick; but also used from smp_call_function()
> > >* for expedited grace periods. This latter can result in running from
> > > -  * the idle task, instead of an actual IPI.
> > > +  * a (usually the idle) task, instead of an actual IPI.
> > 
> > The story is growing enough hair that we should tell it only once.
> > So here, just where it is called from:
> > 
> > /*
> >  * Usually called from the tick; but also used from smp_call_function()
> >  * for expedited grace periods.
> >  */
> > 
> > >   lockdep_assert_irqs_disabled();
> > >  
> > > @@ -461,9 +461,14 @@ static int rcu_is_cpu_rrupt_from_idle(void)
> > >   return false;
> > >  
> > >   /*
> > > -  * If we're not in an interrupt, we must be in the idle task!
> > > +  * If we're not in an interrupt, we must be in task context.
> > > +  *
> > > +  * This will typically be the idle task through:
> > > +  *   flush_smp_call_function_from_idle(),
> > > +  *
> > > +  * but can also be in CPU HotPlug through smpcfd_dying().
> > >*/
> > 
> > Good, but how about like this?
> > 
> > /*
> >  * If we are not in an interrupt handler, we must be in
> >  * smp_call_function() handler.
> >  *
> >  * Normally, smp_call_function() handlers are invoked from
> >  * the idle task via flush_smp_call_function_from_idle().
> >  * However, they can also be invoked from CPU hotplug
> >  * operations via smpcfd_dying().
> >  */
> > 
> > > - WARN_ON_ONCE(!nesting && !is_idle_task(current));
> > > + WARN_ON_ONCE(!nesting && !in_task(current));
> > 
> > This is used in time-critical contexts, so why not RCU_LOCKDEP_WARN()?
> > That should also allow checking more closely.  Would something like the
> > following work?
> > 
> > RCU_LOCKDEP_WARN(!nesting && !is_idle_task(current) && 
> > (!in_task(current) || !lockdep_cpus_write_held()));
> > 
> > Where lockdep_cpus_write_held is defined in kernel/cpu.c:
> 
> Works for me, except s/in_task(current)/in_task()/ compiles a lot
> better.

These compilers sure constrain my creativity!  ;-)

Might be a good thing, though...

Thanx, Paul
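
The lockdep_cpus_write_held() definition itself was cut from this
archive. A minimal sketch of such a helper, assuming it keys off
lockdep's tracking of the cpu_hotplug_lock percpu-rwsem (the helper
that eventually landed in mainline is spelled lockdep_is_cpus_held()):

#ifdef CONFIG_LOCKDEP
/*
 * Sketch only: returns true iff cpu_hotplug_lock is held for write,
 * i.e. we are inside a CPU-hotplug operation such as smpcfd_dying().
 */
int lockdep_cpus_write_held(void)
{
	return lockdep_is_held_type(&cpu_hotplug_lock, 0); /* 0 == write */
}
#endif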


Re: [RFC][PATCH 4/7] smp: Optimize send_call_function_single_ipi()

2021-01-22 Thread Peter Zijlstra
On Thu, Jan 21, 2021 at 04:20:12PM -0800, Paul E. McKenney wrote:

> > ---
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 368749008ae8..2c8d4c3e341e 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -445,7 +445,7 @@ static int rcu_is_cpu_rrupt_from_idle(void)
> > /*
> >  * Usually called from the tick; but also used from smp_call_function()
> >  * for expedited grace periods. This latter can result in running from
> > -* the idle task, instead of an actual IPI.
> > +* a (usually the idle) task, instead of an actual IPI.
> 
> The story is growing enough hair that we should tell it only once.
> So here, just where it is called from:
> 
>   /*
>* Usually called from the tick; but also used from smp_call_function()
>* for expedited grace periods.
>*/
> 
> > lockdep_assert_irqs_disabled();
> >  
> > @@ -461,9 +461,14 @@ static int rcu_is_cpu_rrupt_from_idle(void)
> > return false;
> >  
> > /*
> > -* If we're not in an interrupt, we must be in the idle task!
> > +* If we're not in an interrupt, we must be in task context.
> > +*
> > +* This will typically be the idle task through:
> > +*   flush_smp_call_function_from_idle(),
> > +*
> > +* but can also be in CPU HotPlug through smpcfd_dying().
> >  */
> 
> Good, but how about like this?
> 
>   /*
>* If we are not in an interrupt handler, we must be in
>* smp_call_function() handler.
>*
>* Normally, smp_call_function() handlers are invoked from
>* the idle task via flush_smp_call_function_from_idle().
>* However, they can also be invoked from CPU hotplug
>* operations via smpcfd_dying().
>*/
> 
> > -   WARN_ON_ONCE(!nesting && !is_idle_task(current));
> > +   WARN_ON_ONCE(!nesting && !in_task(current));
> 
> This is used in time-critical contexts, so why not RCU_LOCKDEP_WARN()?
> That should also allow checking more closely.  Would something like the
> following work?
> 
>   RCU_LOCKDEP_WARN(!nesting && !is_idle_task(current) && 
> (!in_task(current) || !lockdep_cpus_write_held()));
> 
> Where lockdep_cpus_write_held is defined in kernel/cpu.c:

Works for me, except s/in_task(current)/in_task()/ compiles a lot
better.
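
Combining Paul's RCU_LOCKDEP_WARN() suggestion with the in_task() fix
above, the check the thread converges on would read roughly as below.
This is a sketch; the diagnostic string is an assumption, added because
RCU_LOCKDEP_WARN() takes a condition plus a message:

	/*
	 * In task context (!nesting) we must either be the idle task,
	 * or be running the smp_call_function() handler from a CPU
	 * hotplug operation that write-holds cpu_hotplug_lock.
	 */
	RCU_LOCKDEP_WARN(!nesting && !is_idle_task(current) &&
			 (!in_task() || !lockdep_cpus_write_held()),
			 "RCU rrupt-from-idle in unexpected task context!");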


Re: [RFC][PATCH 4/7] smp: Optimize send_call_function_single_ipi()

2021-01-21 Thread Paul E. McKenney
On Thu, Jan 21, 2021 at 05:56:53PM +0100, Peter Zijlstra wrote:
> On Wed, May 27, 2020 at 07:12:36PM +0200, Peter Zijlstra wrote:
> > Subject: rcu: Allow for smp_call_function() running callbacks from idle
> > 
> > Current RCU hard relies on smp_call_function() callbacks running from
> > interrupt context. A pending optimization is going to break that: it
> > will allow idle CPUs to run the callbacks from the idle loop. This
> > avoids raising the IPI on the requesting CPU and avoids handling an
> > exception on the receiving CPU.
> > 
> > Change rcu_is_cpu_rrupt_from_idle() to also accept task context,
> > provided it is the idle task.
> > 
> > Signed-off-by: Peter Zijlstra (Intel) 
> > ---
> >  kernel/rcu/tree.c   | 25 +++--
> >  kernel/sched/idle.c |  4 
> >  2 files changed, 23 insertions(+), 6 deletions(-)
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index d8e9dbbefcfa..c716eadc7617 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -418,16 +418,23 @@ void rcu_momentary_dyntick_idle(void)
> >  EXPORT_SYMBOL_GPL(rcu_momentary_dyntick_idle);
> >  
> >  /**
> > - * rcu_is_cpu_rrupt_from_idle - see if interrupted from idle
> > + * rcu_is_cpu_rrupt_from_idle - see if 'interrupted' from idle
> >   *
> >   * If the current CPU is idle and running at a first-level (not nested)
> > - * interrupt from idle, return true.  The caller must have at least
> > - * disabled preemption.
> > + * interrupt, or directly, from idle, return true.
> > + *
> > + * The caller must have at least disabled IRQs.
> >   */
> >  static int rcu_is_cpu_rrupt_from_idle(void)
> >  {
> > -   /* Called only from within the scheduling-clock interrupt */
> > -   lockdep_assert_in_irq();
> > +   long nesting;
> > +
> > +   /*
> > +* Usually called from the tick; but also used from smp_call_function()
> > +* for expedited grace periods. This latter can result in running from
> > +* the idle task, instead of an actual IPI.
> > +*/
> > +   lockdep_assert_irqs_disabled();
> >  
> > /* Check for counter underflows */
> > RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nesting) < 0,
> > @@ -436,9 +443,15 @@ static int rcu_is_cpu_rrupt_from_idle(void)
> >  "RCU dynticks_nmi_nesting counter underflow/zero!");
> >  
> > /* Are we at first interrupt nesting level? */
> > -   if (__this_cpu_read(rcu_data.dynticks_nmi_nesting) != 1)
> > +   nesting = __this_cpu_read(rcu_data.dynticks_nmi_nesting);
> > +   if (nesting > 1)
> > return false;
> >  
> > +   /*
> > +* If we're not in an interrupt, we must be in the idle task!
> > +*/
> > +   WARN_ON_ONCE(!nesting && !is_idle_task(current));
> > +
> > /* Does CPU appear to be idle from an RCU standpoint? */
> > return __this_cpu_read(rcu_data.dynticks_nesting) == 0;
> >  }
> 
> Let me revive this thread after yesterday's IRC conversation.
> 
> As said, it might be _extremely_ unlikely, but somewhat possible for us
> to send the IPI concurrent with hot-unplug, not yet observing
> rcutree_offline_cpu() or thereabout.
> 
> Then have the IPI 'delayed' enough to not happen until smpcfd_dying()
> and get run there.
> 
> This would then run the function from the stopper thread instead of the
> idle thread and trigger the warning, even though we're not holding
> rcu_read_lock() (which, IIRC, was the only constraint).
> 
> So would something like the below be acceptable?
> 
> ---
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 368749008ae8..2c8d4c3e341e 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -445,7 +445,7 @@ static int rcu_is_cpu_rrupt_from_idle(void)
>   /*
>* Usually called from the tick; but also used from smp_call_function()
>* for expedited grace periods. This latter can result in running from
> -  * the idle task, instead of an actual IPI.
> +  * a (usually the idle) task, instead of an actual IPI.

The story is growing enough hair that we should tell it only once.
So here, just where it is called from:

/*
 * Usually called from the tick; but also used from smp_call_function()
 * for expedited grace periods.
 */

>   lockdep_assert_irqs_disabled();
>  
> @@ -461,9 +461,14 @@ static int rcu_is_cpu_rrupt_from_idle(void)
>   return false;
>  
>   /*
> -  * If we're not in an interrupt, we must be in the idle task!
> +  * If we're not in an interrupt, we must be in task context.
> +  *
> +  * This will typically be the idle task through:
> +  *   flush_smp_call_function_from_idle(),
> +  *
> +  * but can also be in CPU HotPlug through smpcfd_dying().
>*/

Good, but how about like this?

/*
 * If we are not in an interrupt handler, we must be in
 * smp_call_function() handler.
 *
 * Normally, smp_call_function() handlers are invoked from
 * the idle task via flush_smp_call_function_from_idle().
 * However, they can also be invoked from CPU hotplug
 * operations via smpcfd_dying().
 */

Re: [RFC][PATCH 4/7] smp: Optimize send_call_function_single_ipi()

2021-01-21 Thread Peter Zijlstra
On Wed, May 27, 2020 at 07:12:36PM +0200, Peter Zijlstra wrote:
> Subject: rcu: Allow for smp_call_function() running callbacks from idle
> 
> Current RCU hard relies on smp_call_function() callbacks running from
> interrupt context. A pending optimization is going to break that: it
> will allow idle CPUs to run the callbacks from the idle loop. This
> avoids raising the IPI on the requesting CPU and avoids handling an
> exception on the receiving CPU.
> 
> Change rcu_is_cpu_rrupt_from_idle() to also accept task context,
> provided it is the idle task.
> 
> Signed-off-by: Peter Zijlstra (Intel) 
> ---
>  kernel/rcu/tree.c   | 25 +++--
>  kernel/sched/idle.c |  4 
>  2 files changed, 23 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index d8e9dbbefcfa..c716eadc7617 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -418,16 +418,23 @@ void rcu_momentary_dyntick_idle(void)
>  EXPORT_SYMBOL_GPL(rcu_momentary_dyntick_idle);
>  
>  /**
> - * rcu_is_cpu_rrupt_from_idle - see if interrupted from idle
> + * rcu_is_cpu_rrupt_from_idle - see if 'interrupted' from idle
>   *
>   * If the current CPU is idle and running at a first-level (not nested)
> - * interrupt from idle, return true.  The caller must have at least
> - * disabled preemption.
> + * interrupt, or directly, from idle, return true.
> + *
> + * The caller must have at least disabled IRQs.
>   */
>  static int rcu_is_cpu_rrupt_from_idle(void)
>  {
> - /* Called only from within the scheduling-clock interrupt */
> - lockdep_assert_in_irq();
> + long nesting;
> +
> + /*
> +  * Usually called from the tick; but also used from smp_call_function()
> +  * for expedited grace periods. This latter can result in running from
> +  * the idle task, instead of an actual IPI.
> +  */
> + lockdep_assert_irqs_disabled();
>  
>   /* Check for counter underflows */
>   RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nesting) < 0,
> @@ -436,9 +443,15 @@ static int rcu_is_cpu_rrupt_from_idle(void)
>"RCU dynticks_nmi_nesting counter underflow/zero!");
>  
>   /* Are we at first interrupt nesting level? */
> - if (__this_cpu_read(rcu_data.dynticks_nmi_nesting) != 1)
> + nesting = __this_cpu_read(rcu_data.dynticks_nmi_nesting);
> + if (nesting > 1)
>   return false;
>  
> + /*
> +  * If we're not in an interrupt, we must be in the idle task!
> +  */
> + WARN_ON_ONCE(!nesting && !is_idle_task(current));
> +
>   /* Does CPU appear to be idle from an RCU standpoint? */
>   return __this_cpu_read(rcu_data.dynticks_nesting) == 0;
>  }

Let me revive this thread after yesterday's IRC conversation.

As said, it might be _extremely_ unlikely, but somewhat possible for us
to send the IPI concurrent with hot-unplug, not yet observing
rcutree_offline_cpu() or thereabout.

Then have the IPI 'delayed' enough to not happen until smpcfd_dying()
and get run there.

This would then run the function from the stopper thread instead of the
idle thread and trigger the warning, even though we're not holding
rcu_read_lock() (which, IIRC, was the only constraint).

So would something like the below be acceptable?

---
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 368749008ae8..2c8d4c3e341e 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -445,7 +445,7 @@ static int rcu_is_cpu_rrupt_from_idle(void)
/*
 * Usually called from the tick; but also used from smp_call_function()
 * for expedited grace periods. This latter can result in running from
-* the idle task, instead of an actual IPI.
+* a (usually the idle) task, instead of an actual IPI.
 */
lockdep_assert_irqs_disabled();
 
@@ -461,9 +461,14 @@ static int rcu_is_cpu_rrupt_from_idle(void)
return false;
 
/*
-* If we're not in an interrupt, we must be in the idle task!
+* If we're not in an interrupt, we must be in task context.
+*
+* This will typically be the idle task through:
+*   flush_smp_call_function_from_idle(),
+*
+* but can also be in CPU HotPlug through smpcfd_dying().
 */
-   WARN_ON_ONCE(!nesting && !is_idle_task(current));
+   WARN_ON_ONCE(!nesting && !in_task(current));
 
/* Does CPU appear to be idle from an RCU standpoint? */
return __this_cpu_read(rcu_data.dynticks_nesting) == 0;


Re: [RFC][PATCH 4/7] smp: Optimize send_call_function_single_ipi()

2020-05-29 Thread Frederic Weisbecker
On Tue, May 26, 2020 at 06:11:01PM +0200, Peter Zijlstra wrote:
> +void flush_smp_call_function_from_idle(void)
> +{
> + unsigned long flags;
> +
> + if (llist_empty(this_cpu_ptr(&call_single_queue)))
> + return;

Now it seems weird that sched_ttwu_pending() didn't have that
llist_empty() optimization. The ordering should allow it.

Anyway,

Reviewed-by: Frederic Weisbecker 


Re: [RFC][PATCH 4/7] smp: Optimize send_call_function_single_ipi()

2020-05-27 Thread Joel Fernandes
On Wed, May 27, 2020 at 12:39:14PM -0700, Paul E. McKenney wrote:
> On Wed, May 27, 2020 at 07:12:36PM +0200, Peter Zijlstra wrote:
> > On Wed, May 27, 2020 at 06:35:43PM +0200, Peter Zijlstra wrote:
> > > Right, I went through them, didn't find anything obvious amiss. OK, let
> > > me do a nicer patch.
> > 
> > something like so then?
> > 
> > ---
> > Subject: rcu: Allow for smp_call_function() running callbacks from idle
> > 
> > Current RCU hard relies on smp_call_function() callbacks running from
> > interrupt context. A pending optimization is going to break that: it
> > will allow idle CPUs to run the callbacks from the idle loop. This
> > avoids raising the IPI on the requesting CPU and avoids handling an
> > exception on the receiving CPU.
> > 
> > Change rcu_is_cpu_rrupt_from_idle() to also accept task context,
> > provided it is the idle task.
> > 
> > Signed-off-by: Peter Zijlstra (Intel) 
> 
> Looks good to me!
> 
> Reviewed-by: Paul E. McKenney 

Reviewed-by: Joel Fernandes (Google) 

thanks,

 - Joel


> 
> > ---
> >  kernel/rcu/tree.c   | 25 +++--
> >  kernel/sched/idle.c |  4 
> >  2 files changed, 23 insertions(+), 6 deletions(-)
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index d8e9dbbefcfa..c716eadc7617 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -418,16 +418,23 @@ void rcu_momentary_dyntick_idle(void)
> >  EXPORT_SYMBOL_GPL(rcu_momentary_dyntick_idle);
> >  
> >  /**
> > - * rcu_is_cpu_rrupt_from_idle - see if interrupted from idle
> > + * rcu_is_cpu_rrupt_from_idle - see if 'interrupted' from idle
> >   *
> >   * If the current CPU is idle and running at a first-level (not nested)
> > - * interrupt from idle, return true.  The caller must have at least
> > - * disabled preemption.
> > + * interrupt, or directly, from idle, return true.
> > + *
> > + * The caller must have at least disabled IRQs.
> >   */
> >  static int rcu_is_cpu_rrupt_from_idle(void)
> >  {
> > -   /* Called only from within the scheduling-clock interrupt */
> > -   lockdep_assert_in_irq();
> > +   long nesting;
> > +
> > +   /*
> > +* Usually called from the tick; but also used from smp_call_function()
> > +* for expedited grace periods. This latter can result in running from
> > +* the idle task, instead of an actual IPI.
> > +*/
> > +   lockdep_assert_irqs_disabled();
> >  
> > /* Check for counter underflows */
> > RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nesting) < 0,
> > @@ -436,9 +443,15 @@ static int rcu_is_cpu_rrupt_from_idle(void)
> >  "RCU dynticks_nmi_nesting counter underflow/zero!");
> >  
> > /* Are we at first interrupt nesting level? */
> > -   if (__this_cpu_read(rcu_data.dynticks_nmi_nesting) != 1)
> > +   nesting = __this_cpu_read(rcu_data.dynticks_nmi_nesting);
> > +   if (nesting > 1)
> > return false;
> >  
> > +   /*
> > +* If we're not in an interrupt, we must be in the idle task!
> > +*/
> > +   WARN_ON_ONCE(!nesting && !is_idle_task(current));
> > +
> > /* Does CPU appear to be idle from an RCU standpoint? */
> > return __this_cpu_read(rcu_data.dynticks_nesting) == 0;
> >  }
> > diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> > index e9cef84c2b70..05deb81bb3e3 100644
> > --- a/kernel/sched/idle.c
> > +++ b/kernel/sched/idle.c
> > @@ -289,6 +289,10 @@ static void do_idle(void)
> >  */
> > smp_mb__after_atomic();
> >  
> > +   /*
> > +* RCU relies on this call to be done outside of an RCU read-side
> > +* critical section.
> > +*/
> > flush_smp_call_function_from_idle();
> > schedule_idle();
> >  


Re: [RFC][PATCH 4/7] smp: Optimize send_call_function_single_ipi()

2020-05-27 Thread Paul E. McKenney
On Wed, May 27, 2020 at 07:12:36PM +0200, Peter Zijlstra wrote:
> On Wed, May 27, 2020 at 06:35:43PM +0200, Peter Zijlstra wrote:
> > Right, I went through them, didn't find anything obvious amiss. OK, let
> > me do a nicer patch.
> 
> something like so then?
> 
> ---
> Subject: rcu: Allow for smp_call_function() running callbacks from idle
> 
> Current RCU hard relies on smp_call_function() callbacks running from
> interrupt context. A pending optimization is going to break that: it
> will allow idle CPUs to run the callbacks from the idle loop. This
> avoids raising the IPI on the requesting CPU and avoids handling an
> exception on the receiving CPU.
> 
> Change rcu_is_cpu_rrupt_from_idle() to also accept task context,
> provided it is the idle task.
> 
> Signed-off-by: Peter Zijlstra (Intel) 

Looks good to me!

Reviewed-by: Paul E. McKenney 

> ---
>  kernel/rcu/tree.c   | 25 +++--
>  kernel/sched/idle.c |  4 
>  2 files changed, 23 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index d8e9dbbefcfa..c716eadc7617 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -418,16 +418,23 @@ void rcu_momentary_dyntick_idle(void)
>  EXPORT_SYMBOL_GPL(rcu_momentary_dyntick_idle);
>  
>  /**
> - * rcu_is_cpu_rrupt_from_idle - see if interrupted from idle
> + * rcu_is_cpu_rrupt_from_idle - see if 'interrupted' from idle
>   *
>   * If the current CPU is idle and running at a first-level (not nested)
> - * interrupt from idle, return true.  The caller must have at least
> - * disabled preemption.
> + * interrupt, or directly, from idle, return true.
> + *
> + * The caller must have at least disabled IRQs.
>   */
>  static int rcu_is_cpu_rrupt_from_idle(void)
>  {
> - /* Called only from within the scheduling-clock interrupt */
> - lockdep_assert_in_irq();
> + long nesting;
> +
> + /*
> +  * Usually called from the tick; but also used from smp_call_function()
> +  * for expedited grace periods. This latter can result in running from
> +  * the idle task, instead of an actual IPI.
> +  */
> + lockdep_assert_irqs_disabled();
>  
>   /* Check for counter underflows */
>   RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nesting) < 0,
> @@ -436,9 +443,15 @@ static int rcu_is_cpu_rrupt_from_idle(void)
>"RCU dynticks_nmi_nesting counter underflow/zero!");
>  
>   /* Are we at first interrupt nesting level? */
> - if (__this_cpu_read(rcu_data.dynticks_nmi_nesting) != 1)
> + nesting = __this_cpu_read(rcu_data.dynticks_nmi_nesting);
> + if (nesting > 1)
>   return false;
>  
> + /*
> +  * If we're not in an interrupt, we must be in the idle task!
> +  */
> + WARN_ON_ONCE(!nesting && !is_idle_task(current));
> +
>   /* Does CPU appear to be idle from an RCU standpoint? */
>   return __this_cpu_read(rcu_data.dynticks_nesting) == 0;
>  }
> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index e9cef84c2b70..05deb81bb3e3 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -289,6 +289,10 @@ static void do_idle(void)
>*/
>   smp_mb__after_atomic();
>  
> + /*
> +  * RCU relies on this call to be done outside of an RCU read-side
> +  * critical section.
> +  */
>   flush_smp_call_function_from_idle();
>   schedule_idle();
>  


Re: [RFC][PATCH 4/7] smp: Optimize send_call_function_single_ipi()

2020-05-27 Thread Peter Zijlstra
On Wed, May 27, 2020 at 06:35:43PM +0200, Peter Zijlstra wrote:
> Right, I went through them, didn't find anything obvious amiss. OK, let
> me do a nicer patch.

something like so then?

---
Subject: rcu: Allow for smp_call_function() running callbacks from idle

Current RCU hard relies on smp_call_function() callbacks running from
interrupt context. A pending optimization is going to break that: it
will allow idle CPUs to run the callbacks from the idle loop. This
avoids raising the IPI on the requesting CPU and avoids handling an
exception on the receiving CPU.

Change rcu_is_cpu_rrupt_from_idle() to also accept task context,
provided it is the idle task.

Signed-off-by: Peter Zijlstra (Intel) 
---
 kernel/rcu/tree.c   | 25 +++--
 kernel/sched/idle.c |  4 
 2 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index d8e9dbbefcfa..c716eadc7617 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -418,16 +418,23 @@ void rcu_momentary_dyntick_idle(void)
 EXPORT_SYMBOL_GPL(rcu_momentary_dyntick_idle);
 
 /**
- * rcu_is_cpu_rrupt_from_idle - see if interrupted from idle
+ * rcu_is_cpu_rrupt_from_idle - see if 'interrupted' from idle
  *
  * If the current CPU is idle and running at a first-level (not nested)
- * interrupt from idle, return true.  The caller must have at least
- * disabled preemption.
+ * interrupt, or directly, from idle, return true.
+ *
+ * The caller must have at least disabled IRQs.
  */
 static int rcu_is_cpu_rrupt_from_idle(void)
 {
-   /* Called only from within the scheduling-clock interrupt */
-   lockdep_assert_in_irq();
+   long nesting;
+
+   /*
+* Usually called from the tick; but also used from smp_call_function()
+* for expedited grace periods. This latter can result in running from
+* the idle task, instead of an actual IPI.
+*/
+   lockdep_assert_irqs_disabled();
 
/* Check for counter underflows */
RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nesting) < 0,
@@ -436,9 +443,15 @@ static int rcu_is_cpu_rrupt_from_idle(void)
 "RCU dynticks_nmi_nesting counter underflow/zero!");
 
/* Are we at first interrupt nesting level? */
-   if (__this_cpu_read(rcu_data.dynticks_nmi_nesting) != 1)
+   nesting = __this_cpu_read(rcu_data.dynticks_nmi_nesting);
+   if (nesting > 1)
return false;
 
+   /*
+* If we're not in an interrupt, we must be in the idle task!
+*/
+   WARN_ON_ONCE(!nesting && !is_idle_task(current));
+
/* Does CPU appear to be idle from an RCU standpoint? */
return __this_cpu_read(rcu_data.dynticks_nesting) == 0;
 }
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index e9cef84c2b70..05deb81bb3e3 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -289,6 +289,10 @@ static void do_idle(void)
 */
smp_mb__after_atomic();
 
+   /*
+* RCU relies on this call to be done outside of an RCU read-side
+* critical section.
+*/
flush_smp_call_function_from_idle();
schedule_idle();
 


Re: [RFC][PATCH 4/7] smp: Optimize send_call_function_single_ipi()

2020-05-27 Thread Peter Zijlstra
On Wed, May 27, 2020 at 08:56:56AM -0700, Paul E. McKenney wrote:
> On Wed, May 27, 2020 at 12:15:13PM +0200, Peter Zijlstra wrote:

> > At first glance, something like the below could work. But obviously I
> > might have overlooked something more subtle than a brick :-)
> 
> This can work, but only if the call from the idle loop is a place where
> either RCU isn't watching on the one hand or that cannot be in an RCU
> read-side critical section on the other. 

Guaranteed no RCU read side, although the call is in a place where RCU
is active again -- is that a problem? I think with a bit of work I can
move it to where RCU is still idle.

> Because rcu_exp_handler() assumes that if this function returns true,
> we are not in an RCU read-side critical section.  (I would expect this
> to be the case, but I figured that I should make it explicit.)

Indeed, I shall put a comment in the idle loop to make sure it stays that way.

> > ---
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 90c8be22d57a..0792c032a972 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -426,8 +426,11 @@ EXPORT_SYMBOL_GPL(rcu_momentary_dyntick_idle);
> >   */
> 
> Could we please have a comment noting the change in semantics and
> the reason?

A changelog, you mean? Sure, I can do that, but I wasn't anywhere near
confident enough in the change to even bother trying to write one.

> >  static int rcu_is_cpu_rrupt_from_idle(void)
> >  {
> > -   /* Called only from within the scheduling-clock interrupt */
> > -   lockdep_assert_in_irq();
> > +   /*
> > +* Usually called from the tick; but also used from smp_call_function()
> > +* for expedited grace periods.
> > +*/
> > +   lockdep_assert_irqs_disabled();
> >  
> > /* Check for counter underflows */
> > RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nesting) < 0,
> > @@ -435,8 +438,11 @@ static int rcu_is_cpu_rrupt_from_idle(void)
> > RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nmi_nesting) <= 0,
> >  "RCU dynticks_nmi_nesting counter underflow/zero!");
> >  
> > -   /* Are we at first interrupt nesting level? */
> > -   if (__this_cpu_read(rcu_data.dynticks_nmi_nesting) != 1)
> > +   /*
> > +* Are we at first interrupt nesting level? -- or below, when running
> > +* directly from the idle loop itself.
> > +*/
> > +   if (__this_cpu_read(rcu_data.dynticks_nmi_nesting) > 1)
> 
> Wouldn't it also be a good idea to check that we are in the context of
> an idle thread?  Just in case some idiot like me drops a call to this
> function in the wrong place, for example, if I were to mistakenly remember
> the old semantics where it would return false from process context?
> 
> Maybe something like this?
> 
>   nesting = __this_cpu_read(rcu_data.dynticks_nmi_nesting);
>   if (nesting > 1)
>   return false;
>   WARN_ON_ONCE(!nesting && !is_idle_task(current));

Yep, that should do.

> > return false;
> >  
> > /* Does CPU appear to be idle from an RCU standpoint? */
> 
> And let's check the other callers:
> 
> rcu_sched_clock_irq():  This will always be called from IRQ (right?), so
>   no problem.
> 
> rcu_pending():  Only called from rcu_sched_clock_irq(), so still no problem.
> 
> rcu_flavor_sched_clock_irq(): Ditto for both definitions.

Right, I went through them, didn't find anything obvious amiss. OK, let
me do a nicer patch.


Re: [RFC][PATCH 4/7] smp: Optimize send_call_function_single_ipi()

2020-05-27 Thread Paul E. McKenney
On Wed, May 27, 2020 at 12:15:13PM +0200, Peter Zijlstra wrote:
> On Wed, May 27, 2020 at 11:56:45AM +0200, Peter Zijlstra wrote:
> 
> > This is rcu_is_cpu_rrupt_from_idle()'s lockdep_assert_in_irq() tripping
> up (its comment is obviously a bit antiquated).
> > 
> > Now, if I read that code correctly, it actually relies on
> > rcu_irq_enter() and thus really wants to be in an interrupt. Is there
> > any way this code can be made to work from the new context too?
> > 
> > After all, all that really is different is not having gone through the
> > bother of setting up the IRQ context; nothing else changed -- it just so
> > happens you actually relied on that ;/
> 
> At first glance, something like the below could work. But obviously I
> might have overlooked something more subtle than a brick :-)

This can work, but only if the call from the idle loop is a place where
either RCU isn't watching on the one hand or that cannot be in an RCU
read-side critical section on the other.  Because rcu_exp_handler()
assumes that if this function returns true, we are not in an RCU read-side
critical section.  (I would expect this to be the case, but I figured
that I should make it explicit.)

> ---
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 90c8be22d57a..0792c032a972 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -426,8 +426,11 @@ EXPORT_SYMBOL_GPL(rcu_momentary_dyntick_idle);
>   */

Could we please have a comment noting the change in semantics and
the reason?

>  static int rcu_is_cpu_rrupt_from_idle(void)
>  {
> - /* Called only from within the scheduling-clock interrupt */
> - lockdep_assert_in_irq();
> + /*
> +  * Usually called from the tick; but also used from smp_call_function()
> +  * for expedited grace periods.
> +  */
> + lockdep_assert_irqs_disabled();
>  
>   /* Check for counter underflows */
>   RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nesting) < 0,
> @@ -435,8 +438,11 @@ static int rcu_is_cpu_rrupt_from_idle(void)
>   RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nmi_nesting) <= 0,
>"RCU dynticks_nmi_nesting counter underflow/zero!");
>  
> - /* Are we at first interrupt nesting level? */
> - if (__this_cpu_read(rcu_data.dynticks_nmi_nesting) != 1)
> + /*
> +  * Are we at first interrupt nesting level? -- or below, when running
> +  * directly from the idle loop itself.
> +  */
> + if (__this_cpu_read(rcu_data.dynticks_nmi_nesting) > 1)

Wouldn't it also be a good idea to check that we are in the context of
an idle thread?  Just in case some idiot like me drops a call to this
function in the wrong place, for example, if I were to mistakenly remember
the old semantics where it would return false from process context?

Maybe something like this?

	nesting = __this_cpu_read(rcu_data.dynticks_nmi_nesting);
if (nesting > 1)
return false;
WARN_ON_ONCE(!nesting && !is_idle_task(current));

>   return false;
>  
>   /* Does CPU appear to be idle from an RCU standpoint? */

And let's check the other callers:

rcu_sched_clock_irq():  This will always be called from IRQ (right?), so
no problem.

rcu_pending():  Only called from rcu_sched_clock_irq(), so still no problem.

rcu_flavor_sched_clock_irq(): Ditto for both definitions.

Thanx, Paul


Re: [RFC][PATCH 4/7] smp: Optimize send_call_function_single_ipi()

2020-05-27 Thread Peter Zijlstra
On Wed, May 27, 2020 at 11:56:45AM +0200, Peter Zijlstra wrote:

> This is rcu_is_cpu_rrupt_from_idle()'s lockdep_assert_in_irq() tripping
> up (its comment is obviously a bit antiquated).
> 
> Now, if I read that code correctly, it actually relies on
> rcu_irq_enter() and thus really wants to be in an interrupt. Is there
> any way this code can be made to work from the new context too?
> 
> After all, all that really is different is not having gone through the
> bother of setting up the IRQ context; nothing else changed -- it just so
> happens you actually relied on that ;/

At first glance, something like the below could work. But obviously I
might have overlooked something more subtle than a brick :-)

---

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 90c8be22d57a..0792c032a972 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -426,8 +426,11 @@ EXPORT_SYMBOL_GPL(rcu_momentary_dyntick_idle);
  */
 static int rcu_is_cpu_rrupt_from_idle(void)
 {
-   /* Called only from within the scheduling-clock interrupt */
-   lockdep_assert_in_irq();
+   /*
+* Usually called from the tick; but also used from smp_call_function()
+* for expedited grace periods.
+*/
+   lockdep_assert_irqs_disabled();
 
/* Check for counter underflows */
RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nesting) < 0,
@@ -435,8 +438,11 @@ static int rcu_is_cpu_rrupt_from_idle(void)
RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nmi_nesting) <= 0,
 "RCU dynticks_nmi_nesting counter underflow/zero!");
 
-   /* Are we at first interrupt nesting level? */
-   if (__this_cpu_read(rcu_data.dynticks_nmi_nesting) != 1)
+   /*
+* Are we at first interrupt nesting level? -- or below, when running
+* directly from the idle loop itself.
+*/
+   if (__this_cpu_read(rcu_data.dynticks_nmi_nesting) > 1)
return false;
 
/* Does CPU appear to be idle from an RCU standpoint? */


Re: [RFC][PATCH 4/7] smp: Optimize send_call_function_single_ipi()

2020-05-27 Thread Peter Zijlstra
On Tue, May 26, 2020 at 06:11:01PM +0200, Peter Zijlstra wrote:
> Just like the ttwu_queue_remote() IPI, make use of _TIF_POLLING_NRFLAG
> to avoid sending IPIs to idle CPUs.
> 
> Signed-off-by: Peter Zijlstra (Intel) 
> ---
>  kernel/sched/core.c  |   10 ++
>  kernel/sched/idle.c  |1 +
>  kernel/sched/sched.h |2 ++
>  kernel/smp.c |   16 +++-
>  4 files changed, 28 insertions(+), 1 deletion(-)
> 
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2296,6 +2296,16 @@ static void wake_csd_func(void *info)
>   sched_ttwu_pending();
>  }
>  
> +void send_call_function_single_ipi(int cpu)
> +{
> + struct rq *rq = cpu_rq(cpu);
> +
> + if (!set_nr_if_polling(rq->idle))
> + arch_send_call_function_single_ipi(cpu);
> + else
> + trace_sched_wake_idle_without_ipi(cpu);
> +}
> +
>  /*
>   * Queue a task on the target CPUs wake_list and wake the CPU via IPI if
>   * necessary. The wakee CPU on receipt of the IPI will queue the task
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -289,6 +289,7 @@ static void do_idle(void)
>*/
>   smp_mb__after_atomic();
>  
> + flush_smp_call_function_from_idle();
>   sched_ttwu_pending();
>   schedule_idle();
>  

Paul, the above patch basically allows smp_call_function_single() to run
from the idle context (with IRQs disabled, obviously) instead of from an
actual IRQ context.

This makes RCU unhappy (as reported by mingo):

 [ ] [ cut here ]
 [ ] Not in hardirq as expected
 [ ] WARNING: CPU: 4 PID: 0 at kernel/rcu/tree.c:430 rcu_is_cpu_rrupt_from_idle+0xed/0x110
 [ ] Modules linked in:
 [ ] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 5.7.0-rc7-00840-ga61d572-dirty #1
 [ ] Hardware name: Supermicro H8DG6/H8DGi/H8DG6/H8DGi, BIOS 2.0b 03/01/2012
 [ ] RIP: 0010:rcu_is_cpu_rrupt_from_idle+0xed/0x110
 [ ] Call Trace:
 [ ]  rcu_exp_handler+0x38/0x90
 [ ]  flush_smp_call_function_queue+0xce/0x230
 [ ]  flush_smp_call_function_from_idle+0x2f/0x60
 [ ]  do_idle+0x163/0x260
 [ ]  cpu_startup_entry+0x19/0x20
 [ ]  start_secondary+0x14f/0x1a0
 [ ] irq event stamp: 189300
 [ ] hardirqs last  enabled at (189299): [] tick_nohz_idle_exit+0x55/0xb0
 [ ] hardirqs last disabled at (189300): [] flush_smp_call_function_from_idle+0x25/0x60
 [ ] softirqs last  enabled at (189284): [] irq_enter_rcu+0x70/0x80
 [ ] softirqs last disabled at (189283): [] irq_enter_rcu+0x55/0x80

This is rcu_is_cpu_rrupt_from_idle()'s lockdep_assert_in_irq() tripping
up (its comment is obviously a bit antiquated).

Now, if I read that code correctly, it actually relies on
rcu_irq_enter() and thus really wants to be in an interrupt. Is there
any way this code can be made to work from the new context too?

After all, all that really is different is not having gone through the
bother of setting up the IRQ context; nothing else changed -- it just so
happens you actually relied on that ;/


[RFC][PATCH 4/7] smp: Optimize send_call_function_single_ipi()

2020-05-26 Thread Peter Zijlstra
Just like the ttwu_queue_remote() IPI, make use of _TIF_POLLING_NRFLAG
to avoid sending IPIs to idle CPUs.

Signed-off-by: Peter Zijlstra (Intel) 
---
 kernel/sched/core.c  |   10 ++
 kernel/sched/idle.c  |1 +
 kernel/sched/sched.h |2 ++
 kernel/smp.c |   16 +++-
 4 files changed, 28 insertions(+), 1 deletion(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2296,6 +2296,16 @@ static void wake_csd_func(void *info)
sched_ttwu_pending();
 }
 
+void send_call_function_single_ipi(int cpu)
+{
+   struct rq *rq = cpu_rq(cpu);
+
+   if (!set_nr_if_polling(rq->idle))
+   arch_send_call_function_single_ipi(cpu);
+   else
+   trace_sched_wake_idle_without_ipi(cpu);
+}
+
 /*
  * Queue a task on the target CPUs wake_list and wake the CPU via IPI if
  * necessary. The wakee CPU on receipt of the IPI will queue the task
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -289,6 +289,7 @@ static void do_idle(void)
 */
smp_mb__after_atomic();
 
+   flush_smp_call_function_from_idle();
sched_ttwu_pending();
schedule_idle();
 
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1506,6 +1506,8 @@ static inline void unregister_sched_doma
 }
 #endif
 
+extern void flush_smp_call_function_from_idle(void);
+
 #else
 
 static inline void sched_ttwu_pending(void) { }
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -135,6 +135,8 @@ static __always_inline void csd_unlock(c
 
 static DEFINE_PER_CPU_SHARED_ALIGNED(call_single_data_t, csd_data);
 
+extern void send_call_function_single_ipi(int cpu);
+
 /*
  * Insert a previously allocated call_single_data_t element
  * for execution on the given CPU. data must already have
@@ -178,7 +180,7 @@ static int generic_exec_single(int cpu,
 * equipped to do the right thing...
 */
	if (llist_add(&csd->llist, &per_cpu(call_single_queue, cpu)))
-   arch_send_call_function_single_ipi(cpu);
+   send_call_function_single_ipi(cpu);
 
return 0;
 }
@@ -278,6 +280,18 @@ static void flush_smp_call_function_queu
}
 }
 
+void flush_smp_call_function_from_idle(void)
+{
+   unsigned long flags;
+
+   if (llist_empty(this_cpu_ptr(&call_single_queue)))
+   return;
+
+   local_irq_save(flags);
+   flush_smp_call_function_queue(true);
+   local_irq_restore(flags);
+}
+
 /*
  * smp_call_function_single - Run a function on a specific CPU
  * @func: The function to run. This must be fast and non-blocking.
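
The _TIF_POLLING_NRFLAG trick above leans on set_nr_if_polling() from
kernel/sched/core.c. For reference, that existing helper looks roughly
like this (a sketch from memory, not part of the patch):

static bool set_nr_if_polling(struct task_struct *p)
{
	struct thread_info *ti = task_thread_info(p);
	typeof(ti->flags) old, val = READ_ONCE(ti->flags);

	for (;;) {
		/* Not polling: the caller must send a real IPI. */
		if (!(val & _TIF_POLLING_NRFLAG))
			return false;
		/* Resched already pending: no IPI needed either. */
		if (val & _TIF_NEED_RESCHED)
			return true;
		old = cmpxchg(&ti->flags, val, val | _TIF_NEED_RESCHED);
		if (old == val)
			break;
		val = old;
	}
	return true;
}

The idle loop polls its thread flags (for example via mwait), so setting
TIF_NEED_RESCHED this way wakes the CPU without an interrupt; the queued
work is then picked up by flush_smp_call_function_from_idle().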