Re: [PATCH RFC nohz_full 7/7] nohz_full: Force RCU's grace-period kthreads onto timekeeping CPU

2013-07-30 Thread Paul E. McKenney
On Tue, Jul 30, 2013 at 09:40:03AM +0800, Lai Jiangshan wrote:
> On 07/30/2013 12:52 AM, Paul E. McKenney wrote:
> > On Mon, Jul 29, 2013 at 11:36:05AM +0800, Lai Jiangshan wrote:
> >> On 07/27/2013 07:19 AM, Paul E. McKenney wrote:
> >>> From: "Paul E. McKenney" 
> >>>
> >>> Because RCU's quiescent-state-forcing mechanism is used to drive the
> >>> full-system-idle state machine, and because this mechanism is executed
> >>> by RCU's grace-period kthreads, this commit forces these kthreads to
> >>> run on the timekeeping CPU (tick_do_timer_cpu).  To do otherwise would
> >>> mean that the RCU grace-period kthreads would force the system into
> >>> non-idle state every time they drove the state machine, which would
> >>> be just a bit on the futile side.
> >>>
> >>> Signed-off-by: Paul E. McKenney 
> >>> Cc: Frederic Weisbecker 
> >>> Cc: Steven Rostedt 
> >>> ---
> >>>  kernel/rcutree.c|  1 +
> >>>  kernel/rcutree.h|  1 +
> >>>  kernel/rcutree_plugin.h | 20 +++-
> >>>  3 files changed, 21 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> >>> index aa6d96e..fe83085 100644
> >>> --- a/kernel/rcutree.c
> >>> +++ b/kernel/rcutree.c
> >>> @@ -1286,6 +1286,7 @@ static int rcu_gp_init(struct rcu_state *rsp)
> >>>   struct rcu_data *rdp;
> >>>   struct rcu_node *rnp = rcu_get_root(rsp);
> >>>  
> >>> + rcu_bind_gp_kthread();
> >>>   raw_spin_lock_irq(>lock);
> >>>   rsp->gp_flags = 0; /* Clear all flags: New grace period. */
> >>
> >> bind the gp thread when RCU_GP_FLAG_INIT ...
> >>
> >>>  
> >>> diff --git a/kernel/rcutree.h b/kernel/rcutree.h
> >>> index e0de5dc..49dac99 100644
> >>> --- a/kernel/rcutree.h
> >>> +++ b/kernel/rcutree.h
> >>> @@ -560,6 +560,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data 
> >>> *rdp, bool *isidle,
> >>>  static bool is_sysidle_rcu_state(struct rcu_state *rsp);
> >>>  static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle,
> >>> unsigned long maxj);
> >>> +static void rcu_bind_gp_kthread(void);
> >>>  static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp);
> >>>  
> >>>  #endif /* #ifndef RCU_TREE_NONCORE */
> >>> diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
> >>> index ff84bed..f65d9c2 100644
> >>> --- a/kernel/rcutree_plugin.h
> >>> +++ b/kernel/rcutree_plugin.h
> >>> @@ -2544,7 +2544,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data 
> >>> *rdp, bool *isidle,
> >>>   if (!*isidle || rdp->rsp != rcu_sysidle_state ||
> >>>   cpu_is_offline(rdp->cpu) || rdp->cpu == tick_do_timer_cpu)
> >>>   return;
> >>> - /* WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); */
> >>> + WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu);
> >>
> >>
> >> but call rcu_sysidle_check_cpu() when RCU_GP_FLAG_FQS.
> > 
> > Yep!  But we don't call rcu_gp_fqs() until the grace period is started,
> > by which time the kthread will be bound.  Any setting of RCU_GP_FLAG_FQS
> > while there is no grace period in progress is ignored.
> 
> tick_do_timer_cpu can be changed.
> when rcu_gp_fqs() is called, the tick_do_timer_cpu may be a different CPU.
> 
> xxx_thread()
> {
>   bind itself to tick_do_timer_cpu.
>   sleep(); /* tick_do_timer_cpu can be changed while this */
>   use wrong tick_do_timer_cpu.
> }

Yes, but Frederic's patches currently disable changing of tick_do_timer_cpu.
He said that he will re-enable this at some point, and he and I will need
to coordinate that change to allow RCU to tolerate tick_do_timer_cpu
migration.

Thanx, Paul

> >> In this time, the thread may not be bound to tick_do_timer_cpu,
> >> the WARN_ON_ONCE() may be wrong.
> >>
> >> Does any other code ensure the gp thread bound on tick_do_timer_cpu
> >> which I missed?
> > 
> > However, on small systems, rcu_sysidle_check_cpu() can be called from
> > the timekeeping CPU.  I suppose that this could potentially happen
> > before the first grace period starts, and in that case, we could
> > potentially see a spurious warning.  I could imagine a number of ways
> > to fix this:
> > 
> > 1.  Bind the kthread when it is created.
> > 
> > 2.  Bind the kthread when it first starts running, rather than just
> > after the grace period starts.
> > 
> > 3.  Suppress the warning when there is no grace period in progress.
> > 
> > 4.  Suppress the warning prior to the first grace period starting.
> > 
> > Seems like #3 is the most straightforward approach.  I just change it to:
> > 
> > if (rcu_gp_in_progress(rdp->rsp))
> > WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu);
> > 
> > This still gets a WARN_ON_ONCE() if someone moves the timekeeping CPU,
> > but Frederic tells me that it never moves.  My WARN_ON_ONCE() has some
> > probability of complaining should some bug creep in.
> > 
> > Sound reasonable?
> > 
> > Thanx, Paul
> > 

Re: [PATCH RFC nohz_full 7/7] nohz_full: Force RCU's grace-period kthreads onto timekeeping CPU

2013-07-30 Thread Paul E. McKenney
On Tue, Jul 30, 2013 at 09:40:03AM +0800, Lai Jiangshan wrote:
 On 07/30/2013 12:52 AM, Paul E. McKenney wrote:
  On Mon, Jul 29, 2013 at 11:36:05AM +0800, Lai Jiangshan wrote:
  On 07/27/2013 07:19 AM, Paul E. McKenney wrote:
  From: Paul E. McKenney paul...@linux.vnet.ibm.com
 
  Because RCU's quiescent-state-forcing mechanism is used to drive the
  full-system-idle state machine, and because this mechanism is executed
  by RCU's grace-period kthreads, this commit forces these kthreads to
  run on the timekeeping CPU (tick_do_timer_cpu).  To do otherwise would
  mean that the RCU grace-period kthreads would force the system into
  non-idle state every time they drove the state machine, which would
  be just a bit on the futile side.
 
  Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com
  Cc: Frederic Weisbecker fweis...@gmail.com
  Cc: Steven Rostedt rost...@goodmis.org
  ---
   kernel/rcutree.c|  1 +
   kernel/rcutree.h|  1 +
   kernel/rcutree_plugin.h | 20 +++-
   3 files changed, 21 insertions(+), 1 deletion(-)
 
  diff --git a/kernel/rcutree.c b/kernel/rcutree.c
  index aa6d96e..fe83085 100644
  --- a/kernel/rcutree.c
  +++ b/kernel/rcutree.c
  @@ -1286,6 +1286,7 @@ static int rcu_gp_init(struct rcu_state *rsp)
struct rcu_data *rdp;
struct rcu_node *rnp = rcu_get_root(rsp);
   
  + rcu_bind_gp_kthread();
raw_spin_lock_irq(rnp-lock);
rsp-gp_flags = 0; /* Clear all flags: New grace period. */
 
  bind the gp thread when RCU_GP_FLAG_INIT ...
 
   
  diff --git a/kernel/rcutree.h b/kernel/rcutree.h
  index e0de5dc..49dac99 100644
  --- a/kernel/rcutree.h
  +++ b/kernel/rcutree.h
  @@ -560,6 +560,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data 
  *rdp, bool *isidle,
   static bool is_sysidle_rcu_state(struct rcu_state *rsp);
   static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle,
  unsigned long maxj);
  +static void rcu_bind_gp_kthread(void);
   static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp);
   
   #endif /* #ifndef RCU_TREE_NONCORE */
  diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
  index ff84bed..f65d9c2 100644
  --- a/kernel/rcutree_plugin.h
  +++ b/kernel/rcutree_plugin.h
  @@ -2544,7 +2544,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data 
  *rdp, bool *isidle,
if (!*isidle || rdp-rsp != rcu_sysidle_state ||
cpu_is_offline(rdp-cpu) || rdp-cpu == tick_do_timer_cpu)
return;
  - /* WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); */
  + WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu);
 
 
  but call rcu_sysidle_check_cpu() when RCU_GP_FLAG_FQS.
  
  Yep!  But we don't call rcu_gp_fqs() until the grace period is started,
  by which time the kthread will be bound.  Any setting of RCU_GP_FLAG_FQS
  while there is no grace period in progress is ignored.
 
 tick_do_timer_cpu can be changed.
 when rcu_gp_fqs() is called, the tick_do_timer_cpu may be a different CPU.
 
 xxx_thread()
 {
   bind itself to tick_do_timer_cpu.
   sleep(); /* tick_do_timer_cpu can be changed while this */
   use wrong tick_do_timer_cpu.
 }

Yes, but Frederic's patches currently disable changing of tick_do_timer_cpu.
He said that he will re-enable this at some point, and he and I will need
to coordinate that change to allow RCU to tolerate tick_do_timer_cpu
migration.

Thanx, Paul

  In this time, the thread may not be bound to tick_do_timer_cpu,
  the WARN_ON_ONCE() may be wrong.
 
  Does any other code ensure the gp thread bound on tick_do_timer_cpu
  which I missed?
  
  However, on small systems, rcu_sysidle_check_cpu() can be called from
  the timekeeping CPU.  I suppose that this could potentially happen
  before the first grace period starts, and in that case, we could
  potentially see a spurious warning.  I could imagine a number of ways
  to fix this:
  
  1.  Bind the kthread when it is created.
  
  2.  Bind the kthread when it first starts running, rather than just
  after the grace period starts.
  
  3.  Suppress the warning when there is no grace period in progress.
  
  4.  Suppress the warning prior to the first grace period starting.
  
  Seems like #3 is the most straightforward approach.  I just change it to:
  
  if (rcu_gp_in_progress(rdp-rsp))
  WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu);
  
  This still gets a WARN_ON_ONCE() if someone moves the timekeeping CPU,
  but Frederic tells me that it never moves.  My WARN_ON_ONCE() has some
  probability of complaining should some bug creep in.
  
  Sound reasonable?
  
  Thanx, Paul
  
/* Pick up current idle and NMI-nesting counter and check. */
cur = atomic_read(rdtp-dynticks_idle);
  @@ -2570,6 +2570,20 @@ static bool is_sysidle_rcu_state(struct rcu_state 
  *rsp)
   }
   
   /*
  + * Bind the grace-period kthread for 

Re: [PATCH RFC nohz_full 7/7] nohz_full: Force RCU's grace-period kthreads onto timekeeping CPU

2013-07-29 Thread Lai Jiangshan
On 07/30/2013 12:52 AM, Paul E. McKenney wrote:
> On Mon, Jul 29, 2013 at 11:36:05AM +0800, Lai Jiangshan wrote:
>> On 07/27/2013 07:19 AM, Paul E. McKenney wrote:
>>> From: "Paul E. McKenney" 
>>>
>>> Because RCU's quiescent-state-forcing mechanism is used to drive the
>>> full-system-idle state machine, and because this mechanism is executed
>>> by RCU's grace-period kthreads, this commit forces these kthreads to
>>> run on the timekeeping CPU (tick_do_timer_cpu).  To do otherwise would
>>> mean that the RCU grace-period kthreads would force the system into
>>> non-idle state every time they drove the state machine, which would
>>> be just a bit on the futile side.
>>>
>>> Signed-off-by: Paul E. McKenney 
>>> Cc: Frederic Weisbecker 
>>> Cc: Steven Rostedt 
>>> ---
>>>  kernel/rcutree.c|  1 +
>>>  kernel/rcutree.h|  1 +
>>>  kernel/rcutree_plugin.h | 20 +++-
>>>  3 files changed, 21 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
>>> index aa6d96e..fe83085 100644
>>> --- a/kernel/rcutree.c
>>> +++ b/kernel/rcutree.c
>>> @@ -1286,6 +1286,7 @@ static int rcu_gp_init(struct rcu_state *rsp)
>>> struct rcu_data *rdp;
>>> struct rcu_node *rnp = rcu_get_root(rsp);
>>>  
>>> +   rcu_bind_gp_kthread();
>>> raw_spin_lock_irq(>lock);
>>> rsp->gp_flags = 0; /* Clear all flags: New grace period. */
>>
>> bind the gp thread when RCU_GP_FLAG_INIT ...
>>
>>>  
>>> diff --git a/kernel/rcutree.h b/kernel/rcutree.h
>>> index e0de5dc..49dac99 100644
>>> --- a/kernel/rcutree.h
>>> +++ b/kernel/rcutree.h
>>> @@ -560,6 +560,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, 
>>> bool *isidle,
>>>  static bool is_sysidle_rcu_state(struct rcu_state *rsp);
>>>  static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle,
>>>   unsigned long maxj);
>>> +static void rcu_bind_gp_kthread(void);
>>>  static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp);
>>>  
>>>  #endif /* #ifndef RCU_TREE_NONCORE */
>>> diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
>>> index ff84bed..f65d9c2 100644
>>> --- a/kernel/rcutree_plugin.h
>>> +++ b/kernel/rcutree_plugin.h
>>> @@ -2544,7 +2544,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data 
>>> *rdp, bool *isidle,
>>> if (!*isidle || rdp->rsp != rcu_sysidle_state ||
>>> cpu_is_offline(rdp->cpu) || rdp->cpu == tick_do_timer_cpu)
>>> return;
>>> -   /* WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); */
>>> +   WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu);
>>
>>
>> but call rcu_sysidle_check_cpu() when RCU_GP_FLAG_FQS.
> 
> Yep!  But we don't call rcu_gp_fqs() until the grace period is started,
> by which time the kthread will be bound.  Any setting of RCU_GP_FLAG_FQS
> while there is no grace period in progress is ignored.

tick_do_timer_cpu can be changed.
when rcu_gp_fqs() is called, the tick_do_timer_cpu may be a different CPU.

xxx_thread()
{
bind itself to tick_do_timer_cpu.
sleep(); /* tick_do_timer_cpu can be changed while this */
use wrong tick_do_timer_cpu.
}


> 
>> In this time, the thread may not be bound to tick_do_timer_cpu,
>> the WARN_ON_ONCE() may be wrong.
>>
>> Does any other code ensure the gp thread bound on tick_do_timer_cpu
>> which I missed?
> 
> However, on small systems, rcu_sysidle_check_cpu() can be called from
> the timekeeping CPU.  I suppose that this could potentially happen
> before the first grace period starts, and in that case, we could
> potentially see a spurious warning.  I could imagine a number of ways
> to fix this:
> 
> 1.Bind the kthread when it is created.
> 
> 2.Bind the kthread when it first starts running, rather than just
>   after the grace period starts.
> 
> 3.Suppress the warning when there is no grace period in progress.
> 
> 4.Suppress the warning prior to the first grace period starting.
> 
> Seems like #3 is the most straightforward approach.  I just change it to:
> 
>   if (rcu_gp_in_progress(rdp->rsp))
>   WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu);
> 
> This still gets a WARN_ON_ONCE() if someone moves the timekeeping CPU,
> but Frederic tells me that it never moves.  My WARN_ON_ONCE() has some
> probability of complaining should some bug creep in.
> 
> Sound reasonable?
> 
>   Thanx, Paul
> 
>>> /* Pick up current idle and NMI-nesting counter and check. */
>>> cur = atomic_read(>dynticks_idle);
>>> @@ -2570,6 +2570,20 @@ static bool is_sysidle_rcu_state(struct rcu_state 
>>> *rsp)
>>>  }
>>>  
>>>  /*
>>> + * Bind the grace-period kthread for the sysidle flavor of RCU to the
>>> + * timekeeping CPU.
>>> + */
>>> +static void rcu_bind_gp_kthread(void)
>>> +{
>>> +   int cpu = ACCESS_ONCE(tick_do_timer_cpu);
>>> +
>>> +   if (cpu < 0 || cpu >= nr_cpu_ids)
>>> +   return;
>>> +   if 

Re: [PATCH RFC nohz_full 7/7] nohz_full: Force RCU's grace-period kthreads onto timekeeping CPU

2013-07-29 Thread Paul E. McKenney
On Mon, Jul 29, 2013 at 06:59:46PM +0200, Frederic Weisbecker wrote:
> On Mon, Jul 29, 2013 at 09:52:53AM -0700, Paul E. McKenney wrote:
> > On Mon, Jul 29, 2013 at 11:36:05AM +0800, Lai Jiangshan wrote:
> > However, on small systems, rcu_sysidle_check_cpu() can be called from
> > the timekeeping CPU.  I suppose that this could potentially happen
> > before the first grace period starts, and in that case, we could
> > potentially see a spurious warning.  I could imagine a number of ways
> > to fix this:
> > 
> > 1.  Bind the kthread when it is created.
> > 
> > 2.  Bind the kthread when it first starts running, rather than just
> > after the grace period starts.
> > 
> > 3.  Suppress the warning when there is no grace period in progress.
> > 
> > 4.  Suppress the warning prior to the first grace period starting.
> > 
> > Seems like #3 is the most straightforward approach.  I just change it to:
> > 
> > if (rcu_gp_in_progress(rdp->rsp))
> > WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu);
> > 
> > This still gets a WARN_ON_ONCE() if someone moves the timekeeping CPU,
> > but Frederic tells me that it never moves.  My WARN_ON_ONCE() has some
> > probability of complaining should some bug creep in.
> 
> It doesn't move for now but keep in mind that it will probably be able
> to move in the future. If we have several non full-dynticks CPUs, balancing
> the timekeeping duty between them, depending which one runs at a given time,
> may improve power savings even better.
> 
> But you can ignore that for now. Your patchset is entertaining enough that
> we don't need to add more complications yet ;)

Yeah, we will need some sort of handshake for that.  Might be a simple
as setting a flag that suppresses the warning, which I clear the next
time I bind the kthread.  Well, it would need to deal with closely-spaced
moves of the timekeeping duty, wouldn't it?  Plus it would need to deal
with the fact that sampling the variable referencing the timekeeping CPU,
sampling the current CPU, and binding the kthread cannot be done as one
big atomic operation.  Which means that their would need to be two calls
in the handshake, one to prepare to move the timekeeping CPU and another
to announce that it had in fact been moved.

Which is not too hard -- I use an irq-disable lock to guard setting and
clearing an internal-to-RCU flag noting the upcoming change.  The
check and WARN_ON_ONCE() are done while holding this same lock.  The
flag is cleared only after the kthread-bind operation that follows the
last "it has in fact been moved" handshake.  So the flag has three states,
idle, ready to move, and moved.  The possibility of closely spaced moves
of the timekeeping kthread are dealt with by transitioning from "moved"
to "ready to move".  The state goes back to "idle" only after completion
of a kthread-bind operation in the "moved" state.

So agreed, let's defer this one.  ;-)

Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC nohz_full 7/7] nohz_full: Force RCU's grace-period kthreads onto timekeeping CPU

2013-07-29 Thread Frederic Weisbecker
On Mon, Jul 29, 2013 at 09:52:53AM -0700, Paul E. McKenney wrote:
> On Mon, Jul 29, 2013 at 11:36:05AM +0800, Lai Jiangshan wrote:
> However, on small systems, rcu_sysidle_check_cpu() can be called from
> the timekeeping CPU.  I suppose that this could potentially happen
> before the first grace period starts, and in that case, we could
> potentially see a spurious warning.  I could imagine a number of ways
> to fix this:
> 
> 1.Bind the kthread when it is created.
> 
> 2.Bind the kthread when it first starts running, rather than just
>   after the grace period starts.
> 
> 3.Suppress the warning when there is no grace period in progress.
> 
> 4.Suppress the warning prior to the first grace period starting.
> 
> Seems like #3 is the most straightforward approach.  I just change it to:
> 
>   if (rcu_gp_in_progress(rdp->rsp))
>   WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu);
> 
> This still gets a WARN_ON_ONCE() if someone moves the timekeeping CPU,
> but Frederic tells me that it never moves.  My WARN_ON_ONCE() has some
> probability of complaining should some bug creep in.

It doesn't move for now but keep in mind that it will probably be able
to move in the future. If we have several non full-dynticks CPUs, balancing
the timekeeping duty between them, depending which one runs at a given time,
may improve power savings even better.

But you can ignore that for now. Your patchset is entertaining enough that
we don't need to add more complications yet ;)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC nohz_full 7/7] nohz_full: Force RCU's grace-period kthreads onto timekeeping CPU

2013-07-29 Thread Paul E. McKenney
On Mon, Jul 29, 2013 at 11:36:05AM +0800, Lai Jiangshan wrote:
> On 07/27/2013 07:19 AM, Paul E. McKenney wrote:
> > From: "Paul E. McKenney" 
> > 
> > Because RCU's quiescent-state-forcing mechanism is used to drive the
> > full-system-idle state machine, and because this mechanism is executed
> > by RCU's grace-period kthreads, this commit forces these kthreads to
> > run on the timekeeping CPU (tick_do_timer_cpu).  To do otherwise would
> > mean that the RCU grace-period kthreads would force the system into
> > non-idle state every time they drove the state machine, which would
> > be just a bit on the futile side.
> > 
> > Signed-off-by: Paul E. McKenney 
> > Cc: Frederic Weisbecker 
> > Cc: Steven Rostedt 
> > ---
> >  kernel/rcutree.c|  1 +
> >  kernel/rcutree.h|  1 +
> >  kernel/rcutree_plugin.h | 20 +++-
> >  3 files changed, 21 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> > index aa6d96e..fe83085 100644
> > --- a/kernel/rcutree.c
> > +++ b/kernel/rcutree.c
> > @@ -1286,6 +1286,7 @@ static int rcu_gp_init(struct rcu_state *rsp)
> > struct rcu_data *rdp;
> > struct rcu_node *rnp = rcu_get_root(rsp);
> >  
> > +   rcu_bind_gp_kthread();
> > raw_spin_lock_irq(>lock);
> > rsp->gp_flags = 0; /* Clear all flags: New grace period. */
> 
> bind the gp thread when RCU_GP_FLAG_INIT ...
> 
> >  
> > diff --git a/kernel/rcutree.h b/kernel/rcutree.h
> > index e0de5dc..49dac99 100644
> > --- a/kernel/rcutree.h
> > +++ b/kernel/rcutree.h
> > @@ -560,6 +560,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, 
> > bool *isidle,
> >  static bool is_sysidle_rcu_state(struct rcu_state *rsp);
> >  static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle,
> >   unsigned long maxj);
> > +static void rcu_bind_gp_kthread(void);
> >  static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp);
> >  
> >  #endif /* #ifndef RCU_TREE_NONCORE */
> > diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
> > index ff84bed..f65d9c2 100644
> > --- a/kernel/rcutree_plugin.h
> > +++ b/kernel/rcutree_plugin.h
> > @@ -2544,7 +2544,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data 
> > *rdp, bool *isidle,
> > if (!*isidle || rdp->rsp != rcu_sysidle_state ||
> > cpu_is_offline(rdp->cpu) || rdp->cpu == tick_do_timer_cpu)
> > return;
> > -   /* WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); */
> > +   WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu);
> 
> 
> but call rcu_sysidle_check_cpu() when RCU_GP_FLAG_FQS.

Yep!  But we don't call rcu_gp_fqs() until the grace period is started,
by which time the kthread will be bound.  Any setting of RCU_GP_FLAG_FQS
while there is no grace period in progress is ignored.

> In this time, the thread may not be bound to tick_do_timer_cpu,
> the WARN_ON_ONCE() may be wrong.
> 
> Does any other code ensure the gp thread bound on tick_do_timer_cpu
> which I missed?

However, on small systems, rcu_sysidle_check_cpu() can be called from
the timekeeping CPU.  I suppose that this could potentially happen
before the first grace period starts, and in that case, we could
potentially see a spurious warning.  I could imagine a number of ways
to fix this:

1.  Bind the kthread when it is created.

2.  Bind the kthread when it first starts running, rather than just
after the grace period starts.

3.  Suppress the warning when there is no grace period in progress.

4.  Suppress the warning prior to the first grace period starting.

Seems like #3 is the most straightforward approach.  I just change it to:

if (rcu_gp_in_progress(rdp->rsp))
WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu);

This still gets a WARN_ON_ONCE() if someone moves the timekeeping CPU,
but Frederic tells me that it never moves.  My WARN_ON_ONCE() has some
probability of complaining should some bug creep in.

Sound reasonable?

Thanx, Paul

> > /* Pick up current idle and NMI-nesting counter and check. */
> > cur = atomic_read(>dynticks_idle);
> > @@ -2570,6 +2570,20 @@ static bool is_sysidle_rcu_state(struct rcu_state 
> > *rsp)
> >  }
> >  
> >  /*
> > + * Bind the grace-period kthread for the sysidle flavor of RCU to the
> > + * timekeeping CPU.
> > + */
> > +static void rcu_bind_gp_kthread(void)
> > +{
> > +   int cpu = ACCESS_ONCE(tick_do_timer_cpu);
> > +
> > +   if (cpu < 0 || cpu >= nr_cpu_ids)
> > +   return;
> > +   if (raw_smp_processor_id() != cpu)
> > +   set_cpus_allowed_ptr(current, cpumask_of(cpu));
> > +}
> > +
> > +/*
> >   * Return a delay in jiffies based on the number of CPUs, rcu_node
> >   * leaf fanout, and jiffies tick rate.  The idea is to allow larger
> >   * systems more time to transition to full-idle state in order to
> > @@ -2767,6 +2781,10 @@ static bool is_sysidle_rcu_state(struct 

Re: [PATCH RFC nohz_full 7/7] nohz_full: Force RCU's grace-period kthreads onto timekeeping CPU

2013-07-29 Thread Paul E. McKenney
On Mon, Jul 29, 2013 at 11:36:05AM +0800, Lai Jiangshan wrote:
 On 07/27/2013 07:19 AM, Paul E. McKenney wrote:
  From: Paul E. McKenney paul...@linux.vnet.ibm.com
  
  Because RCU's quiescent-state-forcing mechanism is used to drive the
  full-system-idle state machine, and because this mechanism is executed
  by RCU's grace-period kthreads, this commit forces these kthreads to
  run on the timekeeping CPU (tick_do_timer_cpu).  To do otherwise would
  mean that the RCU grace-period kthreads would force the system into
  non-idle state every time they drove the state machine, which would
  be just a bit on the futile side.
  
  Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com
  Cc: Frederic Weisbecker fweis...@gmail.com
  Cc: Steven Rostedt rost...@goodmis.org
  ---
   kernel/rcutree.c|  1 +
   kernel/rcutree.h|  1 +
   kernel/rcutree_plugin.h | 20 +++-
   3 files changed, 21 insertions(+), 1 deletion(-)
  
  diff --git a/kernel/rcutree.c b/kernel/rcutree.c
  index aa6d96e..fe83085 100644
  --- a/kernel/rcutree.c
  +++ b/kernel/rcutree.c
  @@ -1286,6 +1286,7 @@ static int rcu_gp_init(struct rcu_state *rsp)
  struct rcu_data *rdp;
  struct rcu_node *rnp = rcu_get_root(rsp);
   
  +   rcu_bind_gp_kthread();
  raw_spin_lock_irq(rnp-lock);
  rsp-gp_flags = 0; /* Clear all flags: New grace period. */
 
 bind the gp thread when RCU_GP_FLAG_INIT ...
 
   
  diff --git a/kernel/rcutree.h b/kernel/rcutree.h
  index e0de5dc..49dac99 100644
  --- a/kernel/rcutree.h
  +++ b/kernel/rcutree.h
  @@ -560,6 +560,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, 
  bool *isidle,
   static bool is_sysidle_rcu_state(struct rcu_state *rsp);
   static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle,
unsigned long maxj);
  +static void rcu_bind_gp_kthread(void);
   static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp);
   
   #endif /* #ifndef RCU_TREE_NONCORE */
  diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
  index ff84bed..f65d9c2 100644
  --- a/kernel/rcutree_plugin.h
  +++ b/kernel/rcutree_plugin.h
  @@ -2544,7 +2544,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data 
  *rdp, bool *isidle,
  if (!*isidle || rdp-rsp != rcu_sysidle_state ||
  cpu_is_offline(rdp-cpu) || rdp-cpu == tick_do_timer_cpu)
  return;
  -   /* WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); */
  +   WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu);
 
 
 but call rcu_sysidle_check_cpu() when RCU_GP_FLAG_FQS.

Yep!  But we don't call rcu_gp_fqs() until the grace period is started,
by which time the kthread will be bound.  Any setting of RCU_GP_FLAG_FQS
while there is no grace period in progress is ignored.

 In this time, the thread may not be bound to tick_do_timer_cpu,
 the WARN_ON_ONCE() may be wrong.
 
 Does any other code ensure the gp thread bound on tick_do_timer_cpu
 which I missed?

However, on small systems, rcu_sysidle_check_cpu() can be called from
the timekeeping CPU.  I suppose that this could potentially happen
before the first grace period starts, and in that case, we could
potentially see a spurious warning.  I could imagine a number of ways
to fix this:

1.  Bind the kthread when it is created.

2.  Bind the kthread when it first starts running, rather than just
after the grace period starts.

3.  Suppress the warning when there is no grace period in progress.

4.  Suppress the warning prior to the first grace period starting.

Seems like #3 is the most straightforward approach.  I just change it to:

if (rcu_gp_in_progress(rdp-rsp))
WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu);

This still gets a WARN_ON_ONCE() if someone moves the timekeeping CPU,
but Frederic tells me that it never moves.  My WARN_ON_ONCE() has some
probability of complaining should some bug creep in.

Sound reasonable?

Thanx, Paul

  /* Pick up current idle and NMI-nesting counter and check. */
  cur = atomic_read(rdtp-dynticks_idle);
  @@ -2570,6 +2570,20 @@ static bool is_sysidle_rcu_state(struct rcu_state 
  *rsp)
   }
   
   /*
  + * Bind the grace-period kthread for the sysidle flavor of RCU to the
  + * timekeeping CPU.
  + */
  +static void rcu_bind_gp_kthread(void)
  +{
  +   int cpu = ACCESS_ONCE(tick_do_timer_cpu);
  +
  +   if (cpu  0 || cpu = nr_cpu_ids)
  +   return;
  +   if (raw_smp_processor_id() != cpu)
  +   set_cpus_allowed_ptr(current, cpumask_of(cpu));
  +}
  +
  +/*
* Return a delay in jiffies based on the number of CPUs, rcu_node
* leaf fanout, and jiffies tick rate.  The idea is to allow larger
* systems more time to transition to full-idle state in order to
  @@ -2767,6 +2781,10 @@ static bool is_sysidle_rcu_state(struct rcu_state 
  *rsp)
  return false;
   }
   
  +static void rcu_bind_gp_kthread(void)
  

Re: [PATCH RFC nohz_full 7/7] nohz_full: Force RCU's grace-period kthreads onto timekeeping CPU

2013-07-29 Thread Frederic Weisbecker
On Mon, Jul 29, 2013 at 09:52:53AM -0700, Paul E. McKenney wrote:
 On Mon, Jul 29, 2013 at 11:36:05AM +0800, Lai Jiangshan wrote:
 However, on small systems, rcu_sysidle_check_cpu() can be called from
 the timekeeping CPU.  I suppose that this could potentially happen
 before the first grace period starts, and in that case, we could
 potentially see a spurious warning.  I could imagine a number of ways
 to fix this:
 
 1.Bind the kthread when it is created.
 
 2.Bind the kthread when it first starts running, rather than just
   after the grace period starts.
 
 3.Suppress the warning when there is no grace period in progress.
 
 4.Suppress the warning prior to the first grace period starting.
 
 Seems like #3 is the most straightforward approach.  I just change it to:
 
   if (rcu_gp_in_progress(rdp-rsp))
   WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu);
 
 This still gets a WARN_ON_ONCE() if someone moves the timekeeping CPU,
 but Frederic tells me that it never moves.  My WARN_ON_ONCE() has some
 probability of complaining should some bug creep in.

It doesn't move for now but keep in mind that it will probably be able
to move in the future. If we have several non full-dynticks CPUs, balancing
the timekeeping duty between them, depending which one runs at a given time,
may improve power savings even better.

But you can ignore that for now. Your patchset is entertaining enough that
we don't need to add more complications yet ;)
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC nohz_full 7/7] nohz_full: Force RCU's grace-period kthreads onto timekeeping CPU

2013-07-29 Thread Paul E. McKenney
On Mon, Jul 29, 2013 at 06:59:46PM +0200, Frederic Weisbecker wrote:
 On Mon, Jul 29, 2013 at 09:52:53AM -0700, Paul E. McKenney wrote:
  On Mon, Jul 29, 2013 at 11:36:05AM +0800, Lai Jiangshan wrote:
  However, on small systems, rcu_sysidle_check_cpu() can be called from
  the timekeeping CPU.  I suppose that this could potentially happen
  before the first grace period starts, and in that case, we could
  potentially see a spurious warning.  I could imagine a number of ways
  to fix this:
  
  1.  Bind the kthread when it is created.
  
  2.  Bind the kthread when it first starts running, rather than just
  after the grace period starts.
  
  3.  Suppress the warning when there is no grace period in progress.
  
  4.  Suppress the warning prior to the first grace period starting.
  
  Seems like #3 is the most straightforward approach.  I just change it to:
  
  if (rcu_gp_in_progress(rdp-rsp))
  WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu);
  
  This still gets a WARN_ON_ONCE() if someone moves the timekeeping CPU,
  but Frederic tells me that it never moves.  My WARN_ON_ONCE() has some
  probability of complaining should some bug creep in.
 
 It doesn't move for now but keep in mind that it will probably be able
 to move in the future. If we have several non full-dynticks CPUs, balancing
 the timekeeping duty between them, depending which one runs at a given time,
 may improve power savings even better.
 
 But you can ignore that for now. Your patchset is entertaining enough that
 we don't need to add more complications yet ;)

Yeah, we will need some sort of handshake for that.  Might be a simple
as setting a flag that suppresses the warning, which I clear the next
time I bind the kthread.  Well, it would need to deal with closely-spaced
moves of the timekeeping duty, wouldn't it?  Plus it would need to deal
with the fact that sampling the variable referencing the timekeeping CPU,
sampling the current CPU, and binding the kthread cannot be done as one
big atomic operation.  Which means that their would need to be two calls
in the handshake, one to prepare to move the timekeeping CPU and another
to announce that it had in fact been moved.

Which is not too hard -- I use an irq-disable lock to guard setting and
clearing an internal-to-RCU flag noting the upcoming change.  The
check and WARN_ON_ONCE() are done while holding this same lock.  The
flag is cleared only after the kthread-bind operation that follows the
last it has in fact been moved handshake.  So the flag has three states,
idle, ready to move, and moved.  The possibility of closely spaced moves
of the timekeeping kthread are dealt with by transitioning from moved
to ready to move.  The state goes back to idle only after completion
of a kthread-bind operation in the moved state.

So agreed, let's defer this one.  ;-)

Thanx, Paul

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC nohz_full 7/7] nohz_full: Force RCU's grace-period kthreads onto timekeeping CPU

2013-07-29 Thread Lai Jiangshan
On 07/30/2013 12:52 AM, Paul E. McKenney wrote:
 On Mon, Jul 29, 2013 at 11:36:05AM +0800, Lai Jiangshan wrote:
 On 07/27/2013 07:19 AM, Paul E. McKenney wrote:
 From: Paul E. McKenney paul...@linux.vnet.ibm.com

 Because RCU's quiescent-state-forcing mechanism is used to drive the
 full-system-idle state machine, and because this mechanism is executed
 by RCU's grace-period kthreads, this commit forces these kthreads to
 run on the timekeeping CPU (tick_do_timer_cpu).  To do otherwise would
 mean that the RCU grace-period kthreads would force the system into
 non-idle state every time they drove the state machine, which would
 be just a bit on the futile side.

 Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com
 Cc: Frederic Weisbecker fweis...@gmail.com
 Cc: Steven Rostedt rost...@goodmis.org
 ---
  kernel/rcutree.c|  1 +
  kernel/rcutree.h|  1 +
  kernel/rcutree_plugin.h | 20 +++-
  3 files changed, 21 insertions(+), 1 deletion(-)

 diff --git a/kernel/rcutree.c b/kernel/rcutree.c
 index aa6d96e..fe83085 100644
 --- a/kernel/rcutree.c
 +++ b/kernel/rcutree.c
 @@ -1286,6 +1286,7 @@ static int rcu_gp_init(struct rcu_state *rsp)
 struct rcu_data *rdp;
 struct rcu_node *rnp = rcu_get_root(rsp);
  
 +   rcu_bind_gp_kthread();
 raw_spin_lock_irq(rnp-lock);
 rsp-gp_flags = 0; /* Clear all flags: New grace period. */

 bind the gp thread when RCU_GP_FLAG_INIT ...

  
 diff --git a/kernel/rcutree.h b/kernel/rcutree.h
 index e0de5dc..49dac99 100644
 --- a/kernel/rcutree.h
 +++ b/kernel/rcutree.h
 @@ -560,6 +560,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, 
 bool *isidle,
  static bool is_sysidle_rcu_state(struct rcu_state *rsp);
  static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle,
   unsigned long maxj);
 +static void rcu_bind_gp_kthread(void);
  static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp);
  
  #endif /* #ifndef RCU_TREE_NONCORE */
 diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
 index ff84bed..f65d9c2 100644
 --- a/kernel/rcutree_plugin.h
 +++ b/kernel/rcutree_plugin.h
 @@ -2544,7 +2544,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data 
 *rdp, bool *isidle,
 if (!*isidle || rdp-rsp != rcu_sysidle_state ||
 cpu_is_offline(rdp-cpu) || rdp-cpu == tick_do_timer_cpu)
 return;
 -   /* WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); */
 +   WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu);


 but call rcu_sysidle_check_cpu() when RCU_GP_FLAG_FQS.
 
 Yep!  But we don't call rcu_gp_fqs() until the grace period is started,
 by which time the kthread will be bound.  Any setting of RCU_GP_FLAG_FQS
 while there is no grace period in progress is ignored.

tick_do_timer_cpu can be changed.
when rcu_gp_fqs() is called, the tick_do_timer_cpu may be a different CPU.

xxx_thread()
{
bind itself to tick_do_timer_cpu.
sleep(); /* tick_do_timer_cpu can be changed while this */
use wrong tick_do_timer_cpu.
}


 
 In this time, the thread may not be bound to tick_do_timer_cpu,
 the WARN_ON_ONCE() may be wrong.

 Does any other code ensure the gp thread bound on tick_do_timer_cpu
 which I missed?
 
 However, on small systems, rcu_sysidle_check_cpu() can be called from
 the timekeeping CPU.  I suppose that this could potentially happen
 before the first grace period starts, and in that case, we could
 potentially see a spurious warning.  I could imagine a number of ways
 to fix this:
 
 1.Bind the kthread when it is created.
 
 2.Bind the kthread when it first starts running, rather than just
   after the grace period starts.
 
 3.Suppress the warning when there is no grace period in progress.
 
 4.Suppress the warning prior to the first grace period starting.
 
 Seems like #3 is the most straightforward approach.  I just change it to:
 
   if (rcu_gp_in_progress(rdp-rsp))
   WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu);
 
 This still gets a WARN_ON_ONCE() if someone moves the timekeeping CPU,
 but Frederic tells me that it never moves.  My WARN_ON_ONCE() has some
 probability of complaining should some bug creep in.
 
 Sound reasonable?
 
   Thanx, Paul
 
 /* Pick up current idle and NMI-nesting counter and check. */
 cur = atomic_read(rdtp-dynticks_idle);
 @@ -2570,6 +2570,20 @@ static bool is_sysidle_rcu_state(struct rcu_state 
 *rsp)
  }
  
  /*
 + * Bind the grace-period kthread for the sysidle flavor of RCU to the
 + * timekeeping CPU.
 + */
 +static void rcu_bind_gp_kthread(void)
 +{
 +   int cpu = ACCESS_ONCE(tick_do_timer_cpu);
 +
 +   if (cpu  0 || cpu = nr_cpu_ids)
 +   return;
 +   if (raw_smp_processor_id() != cpu)
 +   set_cpus_allowed_ptr(current, cpumask_of(cpu));
 +}
 +
 +/*
   * Return a delay in jiffies based on the number of CPUs, rcu_node
   * leaf fanout, and jiffies 

Re: [PATCH RFC nohz_full 7/7] nohz_full: Force RCU's grace-period kthreads onto timekeeping CPU

2013-07-28 Thread Lai Jiangshan
On 07/27/2013 07:19 AM, Paul E. McKenney wrote:
> From: "Paul E. McKenney" 
> 
> Because RCU's quiescent-state-forcing mechanism is used to drive the
> full-system-idle state machine, and because this mechanism is executed
> by RCU's grace-period kthreads, this commit forces these kthreads to
> run on the timekeeping CPU (tick_do_timer_cpu).  To do otherwise would
> mean that the RCU grace-period kthreads would force the system into
> non-idle state every time they drove the state machine, which would
> be just a bit on the futile side.
> 
> Signed-off-by: Paul E. McKenney 
> Cc: Frederic Weisbecker 
> Cc: Steven Rostedt 
> ---
>  kernel/rcutree.c|  1 +
>  kernel/rcutree.h|  1 +
>  kernel/rcutree_plugin.h | 20 +++-
>  3 files changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index aa6d96e..fe83085 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -1286,6 +1286,7 @@ static int rcu_gp_init(struct rcu_state *rsp)
>   struct rcu_data *rdp;
>   struct rcu_node *rnp = rcu_get_root(rsp);
>  
> + rcu_bind_gp_kthread();
>   raw_spin_lock_irq(>lock);
>   rsp->gp_flags = 0; /* Clear all flags: New grace period. */

bind the gp thread when RCU_GP_FLAG_INIT ...

>  
> diff --git a/kernel/rcutree.h b/kernel/rcutree.h
> index e0de5dc..49dac99 100644
> --- a/kernel/rcutree.h
> +++ b/kernel/rcutree.h
> @@ -560,6 +560,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, 
> bool *isidle,
>  static bool is_sysidle_rcu_state(struct rcu_state *rsp);
>  static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle,
> unsigned long maxj);
> +static void rcu_bind_gp_kthread(void);
>  static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp);
>  
>  #endif /* #ifndef RCU_TREE_NONCORE */
> diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
> index ff84bed..f65d9c2 100644
> --- a/kernel/rcutree_plugin.h
> +++ b/kernel/rcutree_plugin.h
> @@ -2544,7 +2544,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, 
> bool *isidle,
>   if (!*isidle || rdp->rsp != rcu_sysidle_state ||
>   cpu_is_offline(rdp->cpu) || rdp->cpu == tick_do_timer_cpu)
>   return;
> - /* WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); */
> + WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu);


but call rcu_sysidle_check_cpu() when RCU_GP_FLAG_FQS.

In this time, the thread may not be bound to tick_do_timer_cpu,
the WARN_ON_ONCE() may be wrong.

Does any other code ensure the gp thread bound on tick_do_timer_cpu
which I missed?

>  
>   /* Pick up current idle and NMI-nesting counter and check. */
>   cur = atomic_read(>dynticks_idle);
> @@ -2570,6 +2570,20 @@ static bool is_sysidle_rcu_state(struct rcu_state *rsp)
>  }
>  
>  /*
> + * Bind the grace-period kthread for the sysidle flavor of RCU to the
> + * timekeeping CPU.
> + */
> +static void rcu_bind_gp_kthread(void)
> +{
> + int cpu = ACCESS_ONCE(tick_do_timer_cpu);
> +
> + if (cpu < 0 || cpu >= nr_cpu_ids)
> + return;
> + if (raw_smp_processor_id() != cpu)
> + set_cpus_allowed_ptr(current, cpumask_of(cpu));
> +}
> +
> +/*
>   * Return a delay in jiffies based on the number of CPUs, rcu_node
>   * leaf fanout, and jiffies tick rate.  The idea is to allow larger
>   * systems more time to transition to full-idle state in order to
> @@ -2767,6 +2781,10 @@ static bool is_sysidle_rcu_state(struct rcu_state *rsp)
>   return false;
>  }
>  
> +static void rcu_bind_gp_kthread(void)
> +{
> +}
> +
>  static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle,
> unsigned long maxj)
>  {

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC nohz_full 7/7] nohz_full: Force RCU's grace-period kthreads onto timekeeping CPU

2013-07-28 Thread Lai Jiangshan
On 07/27/2013 07:19 AM, Paul E. McKenney wrote:
 From: Paul E. McKenney paul...@linux.vnet.ibm.com
 
 Because RCU's quiescent-state-forcing mechanism is used to drive the
 full-system-idle state machine, and because this mechanism is executed
 by RCU's grace-period kthreads, this commit forces these kthreads to
 run on the timekeeping CPU (tick_do_timer_cpu).  To do otherwise would
 mean that the RCU grace-period kthreads would force the system into
 non-idle state every time they drove the state machine, which would
 be just a bit on the futile side.
 
 Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com
 Cc: Frederic Weisbecker fweis...@gmail.com
 Cc: Steven Rostedt rost...@goodmis.org
 ---
  kernel/rcutree.c|  1 +
  kernel/rcutree.h|  1 +
  kernel/rcutree_plugin.h | 20 +++-
  3 files changed, 21 insertions(+), 1 deletion(-)
 
 diff --git a/kernel/rcutree.c b/kernel/rcutree.c
 index aa6d96e..fe83085 100644
 --- a/kernel/rcutree.c
 +++ b/kernel/rcutree.c
 @@ -1286,6 +1286,7 @@ static int rcu_gp_init(struct rcu_state *rsp)
   struct rcu_data *rdp;
   struct rcu_node *rnp = rcu_get_root(rsp);
  
 + rcu_bind_gp_kthread();
   raw_spin_lock_irq(rnp-lock);
   rsp-gp_flags = 0; /* Clear all flags: New grace period. */

bind the gp thread when RCU_GP_FLAG_INIT ...

  
 diff --git a/kernel/rcutree.h b/kernel/rcutree.h
 index e0de5dc..49dac99 100644
 --- a/kernel/rcutree.h
 +++ b/kernel/rcutree.h
 @@ -560,6 +560,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, 
 bool *isidle,
  static bool is_sysidle_rcu_state(struct rcu_state *rsp);
  static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle,
 unsigned long maxj);
 +static void rcu_bind_gp_kthread(void);
  static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp);
  
  #endif /* #ifndef RCU_TREE_NONCORE */
 diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
 index ff84bed..f65d9c2 100644
 --- a/kernel/rcutree_plugin.h
 +++ b/kernel/rcutree_plugin.h
 @@ -2544,7 +2544,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, 
 bool *isidle,
   if (!*isidle || rdp-rsp != rcu_sysidle_state ||
   cpu_is_offline(rdp-cpu) || rdp-cpu == tick_do_timer_cpu)
   return;
 - /* WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); */
 + WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu);


but call rcu_sysidle_check_cpu() when RCU_GP_FLAG_FQS.

In this time, the thread may not be bound to tick_do_timer_cpu,
the WARN_ON_ONCE() may be wrong.

Does any other code ensure the gp thread bound on tick_do_timer_cpu
which I missed?

  
   /* Pick up current idle and NMI-nesting counter and check. */
   cur = atomic_read(rdtp-dynticks_idle);
 @@ -2570,6 +2570,20 @@ static bool is_sysidle_rcu_state(struct rcu_state *rsp)
  }
  
  /*
 + * Bind the grace-period kthread for the sysidle flavor of RCU to the
 + * timekeeping CPU.
 + */
 +static void rcu_bind_gp_kthread(void)
 +{
 + int cpu = ACCESS_ONCE(tick_do_timer_cpu);
 +
 + if (cpu  0 || cpu = nr_cpu_ids)
 + return;
 + if (raw_smp_processor_id() != cpu)
 + set_cpus_allowed_ptr(current, cpumask_of(cpu));
 +}
 +
 +/*
   * Return a delay in jiffies based on the number of CPUs, rcu_node
   * leaf fanout, and jiffies tick rate.  The idea is to allow larger
   * systems more time to transition to full-idle state in order to
 @@ -2767,6 +2781,10 @@ static bool is_sysidle_rcu_state(struct rcu_state *rsp)
   return false;
  }
  
 +static void rcu_bind_gp_kthread(void)
 +{
 +}
 +
  static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle,
 unsigned long maxj)
  {

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH RFC nohz_full 7/7] nohz_full: Force RCU's grace-period kthreads onto timekeeping CPU

2013-07-26 Thread Paul E. McKenney
From: "Paul E. McKenney" 

Because RCU's quiescent-state-forcing mechanism is used to drive the
full-system-idle state machine, and because this mechanism is executed
by RCU's grace-period kthreads, this commit forces these kthreads to
run on the timekeeping CPU (tick_do_timer_cpu).  To do otherwise would
mean that the RCU grace-period kthreads would force the system into
non-idle state every time they drove the state machine, which would
be just a bit on the futile side.

Signed-off-by: Paul E. McKenney 
Cc: Frederic Weisbecker 
Cc: Steven Rostedt 
---
 kernel/rcutree.c|  1 +
 kernel/rcutree.h|  1 +
 kernel/rcutree_plugin.h | 20 +++-
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index aa6d96e..fe83085 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -1286,6 +1286,7 @@ static int rcu_gp_init(struct rcu_state *rsp)
struct rcu_data *rdp;
struct rcu_node *rnp = rcu_get_root(rsp);
 
+   rcu_bind_gp_kthread();
raw_spin_lock_irq(>lock);
rsp->gp_flags = 0; /* Clear all flags: New grace period. */
 
diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index e0de5dc..49dac99 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -560,6 +560,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, 
bool *isidle,
 static bool is_sysidle_rcu_state(struct rcu_state *rsp);
 static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle,
  unsigned long maxj);
+static void rcu_bind_gp_kthread(void);
 static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp);
 
 #endif /* #ifndef RCU_TREE_NONCORE */
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index ff84bed..f65d9c2 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -2544,7 +2544,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, 
bool *isidle,
if (!*isidle || rdp->rsp != rcu_sysidle_state ||
cpu_is_offline(rdp->cpu) || rdp->cpu == tick_do_timer_cpu)
return;
-   /* WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); */
+   WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu);
 
/* Pick up current idle and NMI-nesting counter and check. */
cur = atomic_read(>dynticks_idle);
@@ -2570,6 +2570,20 @@ static bool is_sysidle_rcu_state(struct rcu_state *rsp)
 }
 
 /*
+ * Bind the grace-period kthread for the sysidle flavor of RCU to the
+ * timekeeping CPU.
+ */
+static void rcu_bind_gp_kthread(void)
+{
+   int cpu = ACCESS_ONCE(tick_do_timer_cpu);
+
+   if (cpu < 0 || cpu >= nr_cpu_ids)
+   return;
+   if (raw_smp_processor_id() != cpu)
+   set_cpus_allowed_ptr(current, cpumask_of(cpu));
+}
+
+/*
  * Return a delay in jiffies based on the number of CPUs, rcu_node
  * leaf fanout, and jiffies tick rate.  The idea is to allow larger
  * systems more time to transition to full-idle state in order to
@@ -2767,6 +2781,10 @@ static bool is_sysidle_rcu_state(struct rcu_state *rsp)
return false;
 }
 
+static void rcu_bind_gp_kthread(void)
+{
+}
+
 static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle,
  unsigned long maxj)
 {
-- 
1.8.1.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH RFC nohz_full 7/7] nohz_full: Force RCU's grace-period kthreads onto timekeeping CPU

2013-07-26 Thread Paul E. McKenney
From: Paul E. McKenney paul...@linux.vnet.ibm.com

Because RCU's quiescent-state-forcing mechanism is used to drive the
full-system-idle state machine, and because this mechanism is executed
by RCU's grace-period kthreads, this commit forces these kthreads to
run on the timekeeping CPU (tick_do_timer_cpu).  To do otherwise would
mean that the RCU grace-period kthreads would force the system into
non-idle state every time they drove the state machine, which would
be just a bit on the futile side.

Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Steven Rostedt rost...@goodmis.org
---
 kernel/rcutree.c|  1 +
 kernel/rcutree.h|  1 +
 kernel/rcutree_plugin.h | 20 +++-
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index aa6d96e..fe83085 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -1286,6 +1286,7 @@ static int rcu_gp_init(struct rcu_state *rsp)
struct rcu_data *rdp;
struct rcu_node *rnp = rcu_get_root(rsp);
 
+   rcu_bind_gp_kthread();
raw_spin_lock_irq(rnp-lock);
rsp-gp_flags = 0; /* Clear all flags: New grace period. */
 
diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index e0de5dc..49dac99 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -560,6 +560,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, 
bool *isidle,
 static bool is_sysidle_rcu_state(struct rcu_state *rsp);
 static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle,
  unsigned long maxj);
+static void rcu_bind_gp_kthread(void);
 static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp);
 
 #endif /* #ifndef RCU_TREE_NONCORE */
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index ff84bed..f65d9c2 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -2544,7 +2544,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, 
bool *isidle,
if (!*isidle || rdp-rsp != rcu_sysidle_state ||
cpu_is_offline(rdp-cpu) || rdp-cpu == tick_do_timer_cpu)
return;
-   /* WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); */
+   WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu);
 
/* Pick up current idle and NMI-nesting counter and check. */
cur = atomic_read(rdtp-dynticks_idle);
@@ -2570,6 +2570,20 @@ static bool is_sysidle_rcu_state(struct rcu_state *rsp)
 }
 
 /*
+ * Bind the grace-period kthread for the sysidle flavor of RCU to the
+ * timekeeping CPU.
+ */
+static void rcu_bind_gp_kthread(void)
+{
+   int cpu = ACCESS_ONCE(tick_do_timer_cpu);
+
+   if (cpu  0 || cpu = nr_cpu_ids)
+   return;
+   if (raw_smp_processor_id() != cpu)
+   set_cpus_allowed_ptr(current, cpumask_of(cpu));
+}
+
+/*
  * Return a delay in jiffies based on the number of CPUs, rcu_node
  * leaf fanout, and jiffies tick rate.  The idea is to allow larger
  * systems more time to transition to full-idle state in order to
@@ -2767,6 +2781,10 @@ static bool is_sysidle_rcu_state(struct rcu_state *rsp)
return false;
 }
 
+static void rcu_bind_gp_kthread(void)
+{
+}
+
 static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle,
  unsigned long maxj)
 {
-- 
1.8.1.5

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH RFC nohz_full 7/7] nohz_full: Force RCU's grace-period kthreads onto timekeeping CPU

2013-07-08 Thread Paul E. McKenney
From: "Paul E. McKenney" 

Because RCU's quiescent-state-forcing mechanism is used to drive the
full-system-idle state machine, and because this mechanism is executed
by RCU's grace-period kthreads, this commit forces these kthreads to
run on the timekeeping CPU (tick_do_timer_cpu).  To do otherwise would
mean that the RCU grace-period kthreads would force the system into
non-idle state every time they drove the state machine, which would
be just a bit on the futile side.

Signed-off-by: Paul E. McKenney 
Cc: Frederic Weisbecker 
Cc: Steven Rostedt 
---
 kernel/rcutree.c|  1 +
 kernel/rcutree.h|  1 +
 kernel/rcutree_plugin.h | 20 +++-
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 06cfd75..ad9a5ec 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -1286,6 +1286,7 @@ static int rcu_gp_init(struct rcu_state *rsp)
struct rcu_data *rdp;
struct rcu_node *rnp = rcu_get_root(rsp);
 
+   rcu_bind_gp_kthread();
raw_spin_lock_irq(>lock);
rsp->gp_flags = 0; /* Clear all flags: New grace period. */
 
diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index 7326a3c..1602c21 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -558,6 +558,7 @@ static void rcu_sysidle_exit(struct rcu_dynticks *rdtp, int 
irq);
 static void rcu_sysidle_check_cpu(struct rcu_data *rdp, bool *isidle,
  unsigned long *maxj);
 static bool is_sysidle_rcu_state(struct rcu_state *rsp);
+static void rcu_bind_gp_kthread(void);
 static void rcu_sysidle_report(struct rcu_state *rsp, int isidle,
   unsigned long maxj);
 static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp);
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index b47ffb0..a4d44c3 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -2544,7 +2544,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, 
bool *isidle,
if (!*isidle || rdp->rsp != rcu_sysidle_state ||
cpu_is_offline(rdp->cpu) || rdp->cpu == tick_do_timer_cpu)
return;
-   /* WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); */
+   WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu);
 
/* Pick up current idle and NMI-nesting counter and check. */
cur = atomic_read(>dynticks_idle);
@@ -2570,6 +2570,20 @@ static bool is_sysidle_rcu_state(struct rcu_state *rsp)
 }
 
 /*
+ * Bind the grace-period kthread for the sysidle flavor of RCU to the
+ * timekeeping CPU.
+ */
+static void rcu_bind_gp_kthread(void)
+{
+   int cpu = ACCESS_ONCE(tick_do_timer_cpu);
+
+   if (cpu < 0 || cpu >= nr_cpu_ids)
+   return;
+   if (raw_smp_processor_id() != cpu)
+   set_cpus_allowed_ptr(current, cpumask_of(cpu));
+}
+
+/*
  * Return a delay in jiffies based on the number of CPUs, rcu_node
  * leaf fanout, and jiffies tick rate.  The idea is to allow larger
  * systems more time to transition to full-idle state in order to
@@ -2758,6 +2772,10 @@ static bool is_sysidle_rcu_state(struct rcu_state *rsp)
return false;
 }
 
+static void rcu_bind_gp_kthread(void)
+{
+}
+
 static void rcu_sysidle_report(struct rcu_state *rsp, int isidle,
   unsigned long maxj)
 {
-- 
1.8.1.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH RFC nohz_full 7/7] nohz_full: Force RCU's grace-period kthreads onto timekeeping CPU

2013-07-08 Thread Paul E. McKenney
From: Paul E. McKenney paul...@linux.vnet.ibm.com

Because RCU's quiescent-state-forcing mechanism is used to drive the
full-system-idle state machine, and because this mechanism is executed
by RCU's grace-period kthreads, this commit forces these kthreads to
run on the timekeeping CPU (tick_do_timer_cpu).  To do otherwise would
mean that the RCU grace-period kthreads would force the system into
non-idle state every time they drove the state machine, which would
be just a bit on the futile side.

Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Steven Rostedt rost...@goodmis.org
---
 kernel/rcutree.c|  1 +
 kernel/rcutree.h|  1 +
 kernel/rcutree_plugin.h | 20 +++-
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 06cfd75..ad9a5ec 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -1286,6 +1286,7 @@ static int rcu_gp_init(struct rcu_state *rsp)
struct rcu_data *rdp;
struct rcu_node *rnp = rcu_get_root(rsp);
 
+   rcu_bind_gp_kthread();
raw_spin_lock_irq(rnp-lock);
rsp-gp_flags = 0; /* Clear all flags: New grace period. */
 
diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index 7326a3c..1602c21 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -558,6 +558,7 @@ static void rcu_sysidle_exit(struct rcu_dynticks *rdtp, int 
irq);
 static void rcu_sysidle_check_cpu(struct rcu_data *rdp, bool *isidle,
  unsigned long *maxj);
 static bool is_sysidle_rcu_state(struct rcu_state *rsp);
+static void rcu_bind_gp_kthread(void);
 static void rcu_sysidle_report(struct rcu_state *rsp, int isidle,
   unsigned long maxj);
 static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp);
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index b47ffb0..a4d44c3 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -2544,7 +2544,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, 
bool *isidle,
if (!*isidle || rdp-rsp != rcu_sysidle_state ||
cpu_is_offline(rdp-cpu) || rdp-cpu == tick_do_timer_cpu)
return;
-   /* WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); */
+   WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu);
 
/* Pick up current idle and NMI-nesting counter and check. */
cur = atomic_read(rdtp-dynticks_idle);
@@ -2570,6 +2570,20 @@ static bool is_sysidle_rcu_state(struct rcu_state *rsp)
 }
 
 /*
+ * Bind the grace-period kthread for the sysidle flavor of RCU to the
+ * timekeeping CPU.
+ */
+static void rcu_bind_gp_kthread(void)
+{
+   int cpu = ACCESS_ONCE(tick_do_timer_cpu);
+
+   if (cpu  0 || cpu = nr_cpu_ids)
+   return;
+   if (raw_smp_processor_id() != cpu)
+   set_cpus_allowed_ptr(current, cpumask_of(cpu));
+}
+
+/*
  * Return a delay in jiffies based on the number of CPUs, rcu_node
  * leaf fanout, and jiffies tick rate.  The idea is to allow larger
  * systems more time to transition to full-idle state in order to
@@ -2758,6 +2772,10 @@ static bool is_sysidle_rcu_state(struct rcu_state *rsp)
return false;
 }
 
+static void rcu_bind_gp_kthread(void)
+{
+}
+
 static void rcu_sysidle_report(struct rcu_state *rsp, int isidle,
   unsigned long maxj)
 {
-- 
1.8.1.5

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/