Re: [PATCH RFC nohz_full 7/7] nohz_full: Force RCU's grace-period kthreads onto timekeeping CPU
On Tue, Jul 30, 2013 at 09:40:03AM +0800, Lai Jiangshan wrote: > On 07/30/2013 12:52 AM, Paul E. McKenney wrote: > > On Mon, Jul 29, 2013 at 11:36:05AM +0800, Lai Jiangshan wrote: > >> On 07/27/2013 07:19 AM, Paul E. McKenney wrote: > >>> From: "Paul E. McKenney" > >>> > >>> Because RCU's quiescent-state-forcing mechanism is used to drive the > >>> full-system-idle state machine, and because this mechanism is executed > >>> by RCU's grace-period kthreads, this commit forces these kthreads to > >>> run on the timekeeping CPU (tick_do_timer_cpu). To do otherwise would > >>> mean that the RCU grace-period kthreads would force the system into > >>> non-idle state every time they drove the state machine, which would > >>> be just a bit on the futile side. > >>> > >>> Signed-off-by: Paul E. McKenney > >>> Cc: Frederic Weisbecker > >>> Cc: Steven Rostedt > >>> --- > >>> kernel/rcutree.c| 1 + > >>> kernel/rcutree.h| 1 + > >>> kernel/rcutree_plugin.h | 20 +++- > >>> 3 files changed, 21 insertions(+), 1 deletion(-) > >>> > >>> diff --git a/kernel/rcutree.c b/kernel/rcutree.c > >>> index aa6d96e..fe83085 100644 > >>> --- a/kernel/rcutree.c > >>> +++ b/kernel/rcutree.c > >>> @@ -1286,6 +1286,7 @@ static int rcu_gp_init(struct rcu_state *rsp) > >>> struct rcu_data *rdp; > >>> struct rcu_node *rnp = rcu_get_root(rsp); > >>> > >>> + rcu_bind_gp_kthread(); > >>> raw_spin_lock_irq(>lock); > >>> rsp->gp_flags = 0; /* Clear all flags: New grace period. */ > >> > >> bind the gp thread when RCU_GP_FLAG_INIT ... > >> > >>> > >>> diff --git a/kernel/rcutree.h b/kernel/rcutree.h > >>> index e0de5dc..49dac99 100644 > >>> --- a/kernel/rcutree.h > >>> +++ b/kernel/rcutree.h > >>> @@ -560,6 +560,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data > >>> *rdp, bool *isidle, > >>> static bool is_sysidle_rcu_state(struct rcu_state *rsp); > >>> static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle, > >>> unsigned long maxj); > >>> +static void rcu_bind_gp_kthread(void); > >>> static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp); > >>> > >>> #endif /* #ifndef RCU_TREE_NONCORE */ > >>> diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h > >>> index ff84bed..f65d9c2 100644 > >>> --- a/kernel/rcutree_plugin.h > >>> +++ b/kernel/rcutree_plugin.h > >>> @@ -2544,7 +2544,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data > >>> *rdp, bool *isidle, > >>> if (!*isidle || rdp->rsp != rcu_sysidle_state || > >>> cpu_is_offline(rdp->cpu) || rdp->cpu == tick_do_timer_cpu) > >>> return; > >>> - /* WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); */ > >>> + WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); > >> > >> > >> but call rcu_sysidle_check_cpu() when RCU_GP_FLAG_FQS. > > > > Yep! But we don't call rcu_gp_fqs() until the grace period is started, > > by which time the kthread will be bound. Any setting of RCU_GP_FLAG_FQS > > while there is no grace period in progress is ignored. > > tick_do_timer_cpu can be changed. > when rcu_gp_fqs() is called, the tick_do_timer_cpu may be a different CPU. > > xxx_thread() > { > bind itself to tick_do_timer_cpu. > sleep(); /* tick_do_timer_cpu can be changed while this */ > use wrong tick_do_timer_cpu. > } Yes, but Frederic's patches currently disable changing of tick_do_timer_cpu. He said that he will re-enable this at some point, and he and I will need to coordinate that change to allow RCU to tolerate tick_do_timer_cpu migration. Thanx, Paul > >> In this time, the thread may not be bound to tick_do_timer_cpu, > >> the WARN_ON_ONCE() may be wrong. > >> > >> Does any other code ensure the gp thread bound on tick_do_timer_cpu > >> which I missed? > > > > However, on small systems, rcu_sysidle_check_cpu() can be called from > > the timekeeping CPU. I suppose that this could potentially happen > > before the first grace period starts, and in that case, we could > > potentially see a spurious warning. I could imagine a number of ways > > to fix this: > > > > 1. Bind the kthread when it is created. > > > > 2. Bind the kthread when it first starts running, rather than just > > after the grace period starts. > > > > 3. Suppress the warning when there is no grace period in progress. > > > > 4. Suppress the warning prior to the first grace period starting. > > > > Seems like #3 is the most straightforward approach. I just change it to: > > > > if (rcu_gp_in_progress(rdp->rsp)) > > WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); > > > > This still gets a WARN_ON_ONCE() if someone moves the timekeeping CPU, > > but Frederic tells me that it never moves. My WARN_ON_ONCE() has some > > probability of complaining should some bug creep in. > > > > Sound reasonable? > > > > Thanx, Paul > >
Re: [PATCH RFC nohz_full 7/7] nohz_full: Force RCU's grace-period kthreads onto timekeeping CPU
On Tue, Jul 30, 2013 at 09:40:03AM +0800, Lai Jiangshan wrote: On 07/30/2013 12:52 AM, Paul E. McKenney wrote: On Mon, Jul 29, 2013 at 11:36:05AM +0800, Lai Jiangshan wrote: On 07/27/2013 07:19 AM, Paul E. McKenney wrote: From: Paul E. McKenney paul...@linux.vnet.ibm.com Because RCU's quiescent-state-forcing mechanism is used to drive the full-system-idle state machine, and because this mechanism is executed by RCU's grace-period kthreads, this commit forces these kthreads to run on the timekeeping CPU (tick_do_timer_cpu). To do otherwise would mean that the RCU grace-period kthreads would force the system into non-idle state every time they drove the state machine, which would be just a bit on the futile side. Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com Cc: Frederic Weisbecker fweis...@gmail.com Cc: Steven Rostedt rost...@goodmis.org --- kernel/rcutree.c| 1 + kernel/rcutree.h| 1 + kernel/rcutree_plugin.h | 20 +++- 3 files changed, 21 insertions(+), 1 deletion(-) diff --git a/kernel/rcutree.c b/kernel/rcutree.c index aa6d96e..fe83085 100644 --- a/kernel/rcutree.c +++ b/kernel/rcutree.c @@ -1286,6 +1286,7 @@ static int rcu_gp_init(struct rcu_state *rsp) struct rcu_data *rdp; struct rcu_node *rnp = rcu_get_root(rsp); + rcu_bind_gp_kthread(); raw_spin_lock_irq(rnp-lock); rsp-gp_flags = 0; /* Clear all flags: New grace period. */ bind the gp thread when RCU_GP_FLAG_INIT ... diff --git a/kernel/rcutree.h b/kernel/rcutree.h index e0de5dc..49dac99 100644 --- a/kernel/rcutree.h +++ b/kernel/rcutree.h @@ -560,6 +560,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, bool *isidle, static bool is_sysidle_rcu_state(struct rcu_state *rsp); static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle, unsigned long maxj); +static void rcu_bind_gp_kthread(void); static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp); #endif /* #ifndef RCU_TREE_NONCORE */ diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h index ff84bed..f65d9c2 100644 --- a/kernel/rcutree_plugin.h +++ b/kernel/rcutree_plugin.h @@ -2544,7 +2544,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, bool *isidle, if (!*isidle || rdp-rsp != rcu_sysidle_state || cpu_is_offline(rdp-cpu) || rdp-cpu == tick_do_timer_cpu) return; - /* WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); */ + WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); but call rcu_sysidle_check_cpu() when RCU_GP_FLAG_FQS. Yep! But we don't call rcu_gp_fqs() until the grace period is started, by which time the kthread will be bound. Any setting of RCU_GP_FLAG_FQS while there is no grace period in progress is ignored. tick_do_timer_cpu can be changed. when rcu_gp_fqs() is called, the tick_do_timer_cpu may be a different CPU. xxx_thread() { bind itself to tick_do_timer_cpu. sleep(); /* tick_do_timer_cpu can be changed while this */ use wrong tick_do_timer_cpu. } Yes, but Frederic's patches currently disable changing of tick_do_timer_cpu. He said that he will re-enable this at some point, and he and I will need to coordinate that change to allow RCU to tolerate tick_do_timer_cpu migration. Thanx, Paul In this time, the thread may not be bound to tick_do_timer_cpu, the WARN_ON_ONCE() may be wrong. Does any other code ensure the gp thread bound on tick_do_timer_cpu which I missed? However, on small systems, rcu_sysidle_check_cpu() can be called from the timekeeping CPU. I suppose that this could potentially happen before the first grace period starts, and in that case, we could potentially see a spurious warning. I could imagine a number of ways to fix this: 1. Bind the kthread when it is created. 2. Bind the kthread when it first starts running, rather than just after the grace period starts. 3. Suppress the warning when there is no grace period in progress. 4. Suppress the warning prior to the first grace period starting. Seems like #3 is the most straightforward approach. I just change it to: if (rcu_gp_in_progress(rdp-rsp)) WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); This still gets a WARN_ON_ONCE() if someone moves the timekeeping CPU, but Frederic tells me that it never moves. My WARN_ON_ONCE() has some probability of complaining should some bug creep in. Sound reasonable? Thanx, Paul /* Pick up current idle and NMI-nesting counter and check. */ cur = atomic_read(rdtp-dynticks_idle); @@ -2570,6 +2570,20 @@ static bool is_sysidle_rcu_state(struct rcu_state *rsp) } /* + * Bind the grace-period kthread for
Re: [PATCH RFC nohz_full 7/7] nohz_full: Force RCU's grace-period kthreads onto timekeeping CPU
On 07/30/2013 12:52 AM, Paul E. McKenney wrote: > On Mon, Jul 29, 2013 at 11:36:05AM +0800, Lai Jiangshan wrote: >> On 07/27/2013 07:19 AM, Paul E. McKenney wrote: >>> From: "Paul E. McKenney" >>> >>> Because RCU's quiescent-state-forcing mechanism is used to drive the >>> full-system-idle state machine, and because this mechanism is executed >>> by RCU's grace-period kthreads, this commit forces these kthreads to >>> run on the timekeeping CPU (tick_do_timer_cpu). To do otherwise would >>> mean that the RCU grace-period kthreads would force the system into >>> non-idle state every time they drove the state machine, which would >>> be just a bit on the futile side. >>> >>> Signed-off-by: Paul E. McKenney >>> Cc: Frederic Weisbecker >>> Cc: Steven Rostedt >>> --- >>> kernel/rcutree.c| 1 + >>> kernel/rcutree.h| 1 + >>> kernel/rcutree_plugin.h | 20 +++- >>> 3 files changed, 21 insertions(+), 1 deletion(-) >>> >>> diff --git a/kernel/rcutree.c b/kernel/rcutree.c >>> index aa6d96e..fe83085 100644 >>> --- a/kernel/rcutree.c >>> +++ b/kernel/rcutree.c >>> @@ -1286,6 +1286,7 @@ static int rcu_gp_init(struct rcu_state *rsp) >>> struct rcu_data *rdp; >>> struct rcu_node *rnp = rcu_get_root(rsp); >>> >>> + rcu_bind_gp_kthread(); >>> raw_spin_lock_irq(>lock); >>> rsp->gp_flags = 0; /* Clear all flags: New grace period. */ >> >> bind the gp thread when RCU_GP_FLAG_INIT ... >> >>> >>> diff --git a/kernel/rcutree.h b/kernel/rcutree.h >>> index e0de5dc..49dac99 100644 >>> --- a/kernel/rcutree.h >>> +++ b/kernel/rcutree.h >>> @@ -560,6 +560,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, >>> bool *isidle, >>> static bool is_sysidle_rcu_state(struct rcu_state *rsp); >>> static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle, >>> unsigned long maxj); >>> +static void rcu_bind_gp_kthread(void); >>> static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp); >>> >>> #endif /* #ifndef RCU_TREE_NONCORE */ >>> diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h >>> index ff84bed..f65d9c2 100644 >>> --- a/kernel/rcutree_plugin.h >>> +++ b/kernel/rcutree_plugin.h >>> @@ -2544,7 +2544,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data >>> *rdp, bool *isidle, >>> if (!*isidle || rdp->rsp != rcu_sysidle_state || >>> cpu_is_offline(rdp->cpu) || rdp->cpu == tick_do_timer_cpu) >>> return; >>> - /* WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); */ >>> + WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); >> >> >> but call rcu_sysidle_check_cpu() when RCU_GP_FLAG_FQS. > > Yep! But we don't call rcu_gp_fqs() until the grace period is started, > by which time the kthread will be bound. Any setting of RCU_GP_FLAG_FQS > while there is no grace period in progress is ignored. tick_do_timer_cpu can be changed. when rcu_gp_fqs() is called, the tick_do_timer_cpu may be a different CPU. xxx_thread() { bind itself to tick_do_timer_cpu. sleep(); /* tick_do_timer_cpu can be changed while this */ use wrong tick_do_timer_cpu. } > >> In this time, the thread may not be bound to tick_do_timer_cpu, >> the WARN_ON_ONCE() may be wrong. >> >> Does any other code ensure the gp thread bound on tick_do_timer_cpu >> which I missed? > > However, on small systems, rcu_sysidle_check_cpu() can be called from > the timekeeping CPU. I suppose that this could potentially happen > before the first grace period starts, and in that case, we could > potentially see a spurious warning. I could imagine a number of ways > to fix this: > > 1.Bind the kthread when it is created. > > 2.Bind the kthread when it first starts running, rather than just > after the grace period starts. > > 3.Suppress the warning when there is no grace period in progress. > > 4.Suppress the warning prior to the first grace period starting. > > Seems like #3 is the most straightforward approach. I just change it to: > > if (rcu_gp_in_progress(rdp->rsp)) > WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); > > This still gets a WARN_ON_ONCE() if someone moves the timekeeping CPU, > but Frederic tells me that it never moves. My WARN_ON_ONCE() has some > probability of complaining should some bug creep in. > > Sound reasonable? > > Thanx, Paul > >>> /* Pick up current idle and NMI-nesting counter and check. */ >>> cur = atomic_read(>dynticks_idle); >>> @@ -2570,6 +2570,20 @@ static bool is_sysidle_rcu_state(struct rcu_state >>> *rsp) >>> } >>> >>> /* >>> + * Bind the grace-period kthread for the sysidle flavor of RCU to the >>> + * timekeeping CPU. >>> + */ >>> +static void rcu_bind_gp_kthread(void) >>> +{ >>> + int cpu = ACCESS_ONCE(tick_do_timer_cpu); >>> + >>> + if (cpu < 0 || cpu >= nr_cpu_ids) >>> + return; >>> + if
Re: [PATCH RFC nohz_full 7/7] nohz_full: Force RCU's grace-period kthreads onto timekeeping CPU
On Mon, Jul 29, 2013 at 06:59:46PM +0200, Frederic Weisbecker wrote: > On Mon, Jul 29, 2013 at 09:52:53AM -0700, Paul E. McKenney wrote: > > On Mon, Jul 29, 2013 at 11:36:05AM +0800, Lai Jiangshan wrote: > > However, on small systems, rcu_sysidle_check_cpu() can be called from > > the timekeeping CPU. I suppose that this could potentially happen > > before the first grace period starts, and in that case, we could > > potentially see a spurious warning. I could imagine a number of ways > > to fix this: > > > > 1. Bind the kthread when it is created. > > > > 2. Bind the kthread when it first starts running, rather than just > > after the grace period starts. > > > > 3. Suppress the warning when there is no grace period in progress. > > > > 4. Suppress the warning prior to the first grace period starting. > > > > Seems like #3 is the most straightforward approach. I just change it to: > > > > if (rcu_gp_in_progress(rdp->rsp)) > > WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); > > > > This still gets a WARN_ON_ONCE() if someone moves the timekeeping CPU, > > but Frederic tells me that it never moves. My WARN_ON_ONCE() has some > > probability of complaining should some bug creep in. > > It doesn't move for now but keep in mind that it will probably be able > to move in the future. If we have several non full-dynticks CPUs, balancing > the timekeeping duty between them, depending which one runs at a given time, > may improve power savings even better. > > But you can ignore that for now. Your patchset is entertaining enough that > we don't need to add more complications yet ;) Yeah, we will need some sort of handshake for that. Might be a simple as setting a flag that suppresses the warning, which I clear the next time I bind the kthread. Well, it would need to deal with closely-spaced moves of the timekeeping duty, wouldn't it? Plus it would need to deal with the fact that sampling the variable referencing the timekeeping CPU, sampling the current CPU, and binding the kthread cannot be done as one big atomic operation. Which means that their would need to be two calls in the handshake, one to prepare to move the timekeeping CPU and another to announce that it had in fact been moved. Which is not too hard -- I use an irq-disable lock to guard setting and clearing an internal-to-RCU flag noting the upcoming change. The check and WARN_ON_ONCE() are done while holding this same lock. The flag is cleared only after the kthread-bind operation that follows the last "it has in fact been moved" handshake. So the flag has three states, idle, ready to move, and moved. The possibility of closely spaced moves of the timekeeping kthread are dealt with by transitioning from "moved" to "ready to move". The state goes back to "idle" only after completion of a kthread-bind operation in the "moved" state. So agreed, let's defer this one. ;-) Thanx, Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFC nohz_full 7/7] nohz_full: Force RCU's grace-period kthreads onto timekeeping CPU
On Mon, Jul 29, 2013 at 09:52:53AM -0700, Paul E. McKenney wrote: > On Mon, Jul 29, 2013 at 11:36:05AM +0800, Lai Jiangshan wrote: > However, on small systems, rcu_sysidle_check_cpu() can be called from > the timekeeping CPU. I suppose that this could potentially happen > before the first grace period starts, and in that case, we could > potentially see a spurious warning. I could imagine a number of ways > to fix this: > > 1.Bind the kthread when it is created. > > 2.Bind the kthread when it first starts running, rather than just > after the grace period starts. > > 3.Suppress the warning when there is no grace period in progress. > > 4.Suppress the warning prior to the first grace period starting. > > Seems like #3 is the most straightforward approach. I just change it to: > > if (rcu_gp_in_progress(rdp->rsp)) > WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); > > This still gets a WARN_ON_ONCE() if someone moves the timekeeping CPU, > but Frederic tells me that it never moves. My WARN_ON_ONCE() has some > probability of complaining should some bug creep in. It doesn't move for now but keep in mind that it will probably be able to move in the future. If we have several non full-dynticks CPUs, balancing the timekeeping duty between them, depending which one runs at a given time, may improve power savings even better. But you can ignore that for now. Your patchset is entertaining enough that we don't need to add more complications yet ;) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFC nohz_full 7/7] nohz_full: Force RCU's grace-period kthreads onto timekeeping CPU
On Mon, Jul 29, 2013 at 11:36:05AM +0800, Lai Jiangshan wrote: > On 07/27/2013 07:19 AM, Paul E. McKenney wrote: > > From: "Paul E. McKenney" > > > > Because RCU's quiescent-state-forcing mechanism is used to drive the > > full-system-idle state machine, and because this mechanism is executed > > by RCU's grace-period kthreads, this commit forces these kthreads to > > run on the timekeeping CPU (tick_do_timer_cpu). To do otherwise would > > mean that the RCU grace-period kthreads would force the system into > > non-idle state every time they drove the state machine, which would > > be just a bit on the futile side. > > > > Signed-off-by: Paul E. McKenney > > Cc: Frederic Weisbecker > > Cc: Steven Rostedt > > --- > > kernel/rcutree.c| 1 + > > kernel/rcutree.h| 1 + > > kernel/rcutree_plugin.h | 20 +++- > > 3 files changed, 21 insertions(+), 1 deletion(-) > > > > diff --git a/kernel/rcutree.c b/kernel/rcutree.c > > index aa6d96e..fe83085 100644 > > --- a/kernel/rcutree.c > > +++ b/kernel/rcutree.c > > @@ -1286,6 +1286,7 @@ static int rcu_gp_init(struct rcu_state *rsp) > > struct rcu_data *rdp; > > struct rcu_node *rnp = rcu_get_root(rsp); > > > > + rcu_bind_gp_kthread(); > > raw_spin_lock_irq(>lock); > > rsp->gp_flags = 0; /* Clear all flags: New grace period. */ > > bind the gp thread when RCU_GP_FLAG_INIT ... > > > > > diff --git a/kernel/rcutree.h b/kernel/rcutree.h > > index e0de5dc..49dac99 100644 > > --- a/kernel/rcutree.h > > +++ b/kernel/rcutree.h > > @@ -560,6 +560,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, > > bool *isidle, > > static bool is_sysidle_rcu_state(struct rcu_state *rsp); > > static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle, > > unsigned long maxj); > > +static void rcu_bind_gp_kthread(void); > > static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp); > > > > #endif /* #ifndef RCU_TREE_NONCORE */ > > diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h > > index ff84bed..f65d9c2 100644 > > --- a/kernel/rcutree_plugin.h > > +++ b/kernel/rcutree_plugin.h > > @@ -2544,7 +2544,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data > > *rdp, bool *isidle, > > if (!*isidle || rdp->rsp != rcu_sysidle_state || > > cpu_is_offline(rdp->cpu) || rdp->cpu == tick_do_timer_cpu) > > return; > > - /* WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); */ > > + WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); > > > but call rcu_sysidle_check_cpu() when RCU_GP_FLAG_FQS. Yep! But we don't call rcu_gp_fqs() until the grace period is started, by which time the kthread will be bound. Any setting of RCU_GP_FLAG_FQS while there is no grace period in progress is ignored. > In this time, the thread may not be bound to tick_do_timer_cpu, > the WARN_ON_ONCE() may be wrong. > > Does any other code ensure the gp thread bound on tick_do_timer_cpu > which I missed? However, on small systems, rcu_sysidle_check_cpu() can be called from the timekeeping CPU. I suppose that this could potentially happen before the first grace period starts, and in that case, we could potentially see a spurious warning. I could imagine a number of ways to fix this: 1. Bind the kthread when it is created. 2. Bind the kthread when it first starts running, rather than just after the grace period starts. 3. Suppress the warning when there is no grace period in progress. 4. Suppress the warning prior to the first grace period starting. Seems like #3 is the most straightforward approach. I just change it to: if (rcu_gp_in_progress(rdp->rsp)) WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); This still gets a WARN_ON_ONCE() if someone moves the timekeeping CPU, but Frederic tells me that it never moves. My WARN_ON_ONCE() has some probability of complaining should some bug creep in. Sound reasonable? Thanx, Paul > > /* Pick up current idle and NMI-nesting counter and check. */ > > cur = atomic_read(>dynticks_idle); > > @@ -2570,6 +2570,20 @@ static bool is_sysidle_rcu_state(struct rcu_state > > *rsp) > > } > > > > /* > > + * Bind the grace-period kthread for the sysidle flavor of RCU to the > > + * timekeeping CPU. > > + */ > > +static void rcu_bind_gp_kthread(void) > > +{ > > + int cpu = ACCESS_ONCE(tick_do_timer_cpu); > > + > > + if (cpu < 0 || cpu >= nr_cpu_ids) > > + return; > > + if (raw_smp_processor_id() != cpu) > > + set_cpus_allowed_ptr(current, cpumask_of(cpu)); > > +} > > + > > +/* > > * Return a delay in jiffies based on the number of CPUs, rcu_node > > * leaf fanout, and jiffies tick rate. The idea is to allow larger > > * systems more time to transition to full-idle state in order to > > @@ -2767,6 +2781,10 @@ static bool is_sysidle_rcu_state(struct
Re: [PATCH RFC nohz_full 7/7] nohz_full: Force RCU's grace-period kthreads onto timekeeping CPU
On Mon, Jul 29, 2013 at 11:36:05AM +0800, Lai Jiangshan wrote: On 07/27/2013 07:19 AM, Paul E. McKenney wrote: From: Paul E. McKenney paul...@linux.vnet.ibm.com Because RCU's quiescent-state-forcing mechanism is used to drive the full-system-idle state machine, and because this mechanism is executed by RCU's grace-period kthreads, this commit forces these kthreads to run on the timekeeping CPU (tick_do_timer_cpu). To do otherwise would mean that the RCU grace-period kthreads would force the system into non-idle state every time they drove the state machine, which would be just a bit on the futile side. Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com Cc: Frederic Weisbecker fweis...@gmail.com Cc: Steven Rostedt rost...@goodmis.org --- kernel/rcutree.c| 1 + kernel/rcutree.h| 1 + kernel/rcutree_plugin.h | 20 +++- 3 files changed, 21 insertions(+), 1 deletion(-) diff --git a/kernel/rcutree.c b/kernel/rcutree.c index aa6d96e..fe83085 100644 --- a/kernel/rcutree.c +++ b/kernel/rcutree.c @@ -1286,6 +1286,7 @@ static int rcu_gp_init(struct rcu_state *rsp) struct rcu_data *rdp; struct rcu_node *rnp = rcu_get_root(rsp); + rcu_bind_gp_kthread(); raw_spin_lock_irq(rnp-lock); rsp-gp_flags = 0; /* Clear all flags: New grace period. */ bind the gp thread when RCU_GP_FLAG_INIT ... diff --git a/kernel/rcutree.h b/kernel/rcutree.h index e0de5dc..49dac99 100644 --- a/kernel/rcutree.h +++ b/kernel/rcutree.h @@ -560,6 +560,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, bool *isidle, static bool is_sysidle_rcu_state(struct rcu_state *rsp); static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle, unsigned long maxj); +static void rcu_bind_gp_kthread(void); static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp); #endif /* #ifndef RCU_TREE_NONCORE */ diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h index ff84bed..f65d9c2 100644 --- a/kernel/rcutree_plugin.h +++ b/kernel/rcutree_plugin.h @@ -2544,7 +2544,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, bool *isidle, if (!*isidle || rdp-rsp != rcu_sysidle_state || cpu_is_offline(rdp-cpu) || rdp-cpu == tick_do_timer_cpu) return; - /* WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); */ + WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); but call rcu_sysidle_check_cpu() when RCU_GP_FLAG_FQS. Yep! But we don't call rcu_gp_fqs() until the grace period is started, by which time the kthread will be bound. Any setting of RCU_GP_FLAG_FQS while there is no grace period in progress is ignored. In this time, the thread may not be bound to tick_do_timer_cpu, the WARN_ON_ONCE() may be wrong. Does any other code ensure the gp thread bound on tick_do_timer_cpu which I missed? However, on small systems, rcu_sysidle_check_cpu() can be called from the timekeeping CPU. I suppose that this could potentially happen before the first grace period starts, and in that case, we could potentially see a spurious warning. I could imagine a number of ways to fix this: 1. Bind the kthread when it is created. 2. Bind the kthread when it first starts running, rather than just after the grace period starts. 3. Suppress the warning when there is no grace period in progress. 4. Suppress the warning prior to the first grace period starting. Seems like #3 is the most straightforward approach. I just change it to: if (rcu_gp_in_progress(rdp-rsp)) WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); This still gets a WARN_ON_ONCE() if someone moves the timekeeping CPU, but Frederic tells me that it never moves. My WARN_ON_ONCE() has some probability of complaining should some bug creep in. Sound reasonable? Thanx, Paul /* Pick up current idle and NMI-nesting counter and check. */ cur = atomic_read(rdtp-dynticks_idle); @@ -2570,6 +2570,20 @@ static bool is_sysidle_rcu_state(struct rcu_state *rsp) } /* + * Bind the grace-period kthread for the sysidle flavor of RCU to the + * timekeeping CPU. + */ +static void rcu_bind_gp_kthread(void) +{ + int cpu = ACCESS_ONCE(tick_do_timer_cpu); + + if (cpu 0 || cpu = nr_cpu_ids) + return; + if (raw_smp_processor_id() != cpu) + set_cpus_allowed_ptr(current, cpumask_of(cpu)); +} + +/* * Return a delay in jiffies based on the number of CPUs, rcu_node * leaf fanout, and jiffies tick rate. The idea is to allow larger * systems more time to transition to full-idle state in order to @@ -2767,6 +2781,10 @@ static bool is_sysidle_rcu_state(struct rcu_state *rsp) return false; } +static void rcu_bind_gp_kthread(void)
Re: [PATCH RFC nohz_full 7/7] nohz_full: Force RCU's grace-period kthreads onto timekeeping CPU
On Mon, Jul 29, 2013 at 09:52:53AM -0700, Paul E. McKenney wrote: On Mon, Jul 29, 2013 at 11:36:05AM +0800, Lai Jiangshan wrote: However, on small systems, rcu_sysidle_check_cpu() can be called from the timekeeping CPU. I suppose that this could potentially happen before the first grace period starts, and in that case, we could potentially see a spurious warning. I could imagine a number of ways to fix this: 1.Bind the kthread when it is created. 2.Bind the kthread when it first starts running, rather than just after the grace period starts. 3.Suppress the warning when there is no grace period in progress. 4.Suppress the warning prior to the first grace period starting. Seems like #3 is the most straightforward approach. I just change it to: if (rcu_gp_in_progress(rdp-rsp)) WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); This still gets a WARN_ON_ONCE() if someone moves the timekeeping CPU, but Frederic tells me that it never moves. My WARN_ON_ONCE() has some probability of complaining should some bug creep in. It doesn't move for now but keep in mind that it will probably be able to move in the future. If we have several non full-dynticks CPUs, balancing the timekeeping duty between them, depending which one runs at a given time, may improve power savings even better. But you can ignore that for now. Your patchset is entertaining enough that we don't need to add more complications yet ;) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFC nohz_full 7/7] nohz_full: Force RCU's grace-period kthreads onto timekeeping CPU
On Mon, Jul 29, 2013 at 06:59:46PM +0200, Frederic Weisbecker wrote: On Mon, Jul 29, 2013 at 09:52:53AM -0700, Paul E. McKenney wrote: On Mon, Jul 29, 2013 at 11:36:05AM +0800, Lai Jiangshan wrote: However, on small systems, rcu_sysidle_check_cpu() can be called from the timekeeping CPU. I suppose that this could potentially happen before the first grace period starts, and in that case, we could potentially see a spurious warning. I could imagine a number of ways to fix this: 1. Bind the kthread when it is created. 2. Bind the kthread when it first starts running, rather than just after the grace period starts. 3. Suppress the warning when there is no grace period in progress. 4. Suppress the warning prior to the first grace period starting. Seems like #3 is the most straightforward approach. I just change it to: if (rcu_gp_in_progress(rdp-rsp)) WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); This still gets a WARN_ON_ONCE() if someone moves the timekeeping CPU, but Frederic tells me that it never moves. My WARN_ON_ONCE() has some probability of complaining should some bug creep in. It doesn't move for now but keep in mind that it will probably be able to move in the future. If we have several non full-dynticks CPUs, balancing the timekeeping duty between them, depending which one runs at a given time, may improve power savings even better. But you can ignore that for now. Your patchset is entertaining enough that we don't need to add more complications yet ;) Yeah, we will need some sort of handshake for that. Might be a simple as setting a flag that suppresses the warning, which I clear the next time I bind the kthread. Well, it would need to deal with closely-spaced moves of the timekeeping duty, wouldn't it? Plus it would need to deal with the fact that sampling the variable referencing the timekeeping CPU, sampling the current CPU, and binding the kthread cannot be done as one big atomic operation. Which means that their would need to be two calls in the handshake, one to prepare to move the timekeeping CPU and another to announce that it had in fact been moved. Which is not too hard -- I use an irq-disable lock to guard setting and clearing an internal-to-RCU flag noting the upcoming change. The check and WARN_ON_ONCE() are done while holding this same lock. The flag is cleared only after the kthread-bind operation that follows the last it has in fact been moved handshake. So the flag has three states, idle, ready to move, and moved. The possibility of closely spaced moves of the timekeeping kthread are dealt with by transitioning from moved to ready to move. The state goes back to idle only after completion of a kthread-bind operation in the moved state. So agreed, let's defer this one. ;-) Thanx, Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFC nohz_full 7/7] nohz_full: Force RCU's grace-period kthreads onto timekeeping CPU
On 07/30/2013 12:52 AM, Paul E. McKenney wrote: On Mon, Jul 29, 2013 at 11:36:05AM +0800, Lai Jiangshan wrote: On 07/27/2013 07:19 AM, Paul E. McKenney wrote: From: Paul E. McKenney paul...@linux.vnet.ibm.com Because RCU's quiescent-state-forcing mechanism is used to drive the full-system-idle state machine, and because this mechanism is executed by RCU's grace-period kthreads, this commit forces these kthreads to run on the timekeeping CPU (tick_do_timer_cpu). To do otherwise would mean that the RCU grace-period kthreads would force the system into non-idle state every time they drove the state machine, which would be just a bit on the futile side. Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com Cc: Frederic Weisbecker fweis...@gmail.com Cc: Steven Rostedt rost...@goodmis.org --- kernel/rcutree.c| 1 + kernel/rcutree.h| 1 + kernel/rcutree_plugin.h | 20 +++- 3 files changed, 21 insertions(+), 1 deletion(-) diff --git a/kernel/rcutree.c b/kernel/rcutree.c index aa6d96e..fe83085 100644 --- a/kernel/rcutree.c +++ b/kernel/rcutree.c @@ -1286,6 +1286,7 @@ static int rcu_gp_init(struct rcu_state *rsp) struct rcu_data *rdp; struct rcu_node *rnp = rcu_get_root(rsp); + rcu_bind_gp_kthread(); raw_spin_lock_irq(rnp-lock); rsp-gp_flags = 0; /* Clear all flags: New grace period. */ bind the gp thread when RCU_GP_FLAG_INIT ... diff --git a/kernel/rcutree.h b/kernel/rcutree.h index e0de5dc..49dac99 100644 --- a/kernel/rcutree.h +++ b/kernel/rcutree.h @@ -560,6 +560,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, bool *isidle, static bool is_sysidle_rcu_state(struct rcu_state *rsp); static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle, unsigned long maxj); +static void rcu_bind_gp_kthread(void); static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp); #endif /* #ifndef RCU_TREE_NONCORE */ diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h index ff84bed..f65d9c2 100644 --- a/kernel/rcutree_plugin.h +++ b/kernel/rcutree_plugin.h @@ -2544,7 +2544,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, bool *isidle, if (!*isidle || rdp-rsp != rcu_sysidle_state || cpu_is_offline(rdp-cpu) || rdp-cpu == tick_do_timer_cpu) return; - /* WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); */ + WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); but call rcu_sysidle_check_cpu() when RCU_GP_FLAG_FQS. Yep! But we don't call rcu_gp_fqs() until the grace period is started, by which time the kthread will be bound. Any setting of RCU_GP_FLAG_FQS while there is no grace period in progress is ignored. tick_do_timer_cpu can be changed. when rcu_gp_fqs() is called, the tick_do_timer_cpu may be a different CPU. xxx_thread() { bind itself to tick_do_timer_cpu. sleep(); /* tick_do_timer_cpu can be changed while this */ use wrong tick_do_timer_cpu. } In this time, the thread may not be bound to tick_do_timer_cpu, the WARN_ON_ONCE() may be wrong. Does any other code ensure the gp thread bound on tick_do_timer_cpu which I missed? However, on small systems, rcu_sysidle_check_cpu() can be called from the timekeeping CPU. I suppose that this could potentially happen before the first grace period starts, and in that case, we could potentially see a spurious warning. I could imagine a number of ways to fix this: 1.Bind the kthread when it is created. 2.Bind the kthread when it first starts running, rather than just after the grace period starts. 3.Suppress the warning when there is no grace period in progress. 4.Suppress the warning prior to the first grace period starting. Seems like #3 is the most straightforward approach. I just change it to: if (rcu_gp_in_progress(rdp-rsp)) WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); This still gets a WARN_ON_ONCE() if someone moves the timekeeping CPU, but Frederic tells me that it never moves. My WARN_ON_ONCE() has some probability of complaining should some bug creep in. Sound reasonable? Thanx, Paul /* Pick up current idle and NMI-nesting counter and check. */ cur = atomic_read(rdtp-dynticks_idle); @@ -2570,6 +2570,20 @@ static bool is_sysidle_rcu_state(struct rcu_state *rsp) } /* + * Bind the grace-period kthread for the sysidle flavor of RCU to the + * timekeeping CPU. + */ +static void rcu_bind_gp_kthread(void) +{ + int cpu = ACCESS_ONCE(tick_do_timer_cpu); + + if (cpu 0 || cpu = nr_cpu_ids) + return; + if (raw_smp_processor_id() != cpu) + set_cpus_allowed_ptr(current, cpumask_of(cpu)); +} + +/* * Return a delay in jiffies based on the number of CPUs, rcu_node * leaf fanout, and jiffies
Re: [PATCH RFC nohz_full 7/7] nohz_full: Force RCU's grace-period kthreads onto timekeeping CPU
On 07/27/2013 07:19 AM, Paul E. McKenney wrote: > From: "Paul E. McKenney" > > Because RCU's quiescent-state-forcing mechanism is used to drive the > full-system-idle state machine, and because this mechanism is executed > by RCU's grace-period kthreads, this commit forces these kthreads to > run on the timekeeping CPU (tick_do_timer_cpu). To do otherwise would > mean that the RCU grace-period kthreads would force the system into > non-idle state every time they drove the state machine, which would > be just a bit on the futile side. > > Signed-off-by: Paul E. McKenney > Cc: Frederic Weisbecker > Cc: Steven Rostedt > --- > kernel/rcutree.c| 1 + > kernel/rcutree.h| 1 + > kernel/rcutree_plugin.h | 20 +++- > 3 files changed, 21 insertions(+), 1 deletion(-) > > diff --git a/kernel/rcutree.c b/kernel/rcutree.c > index aa6d96e..fe83085 100644 > --- a/kernel/rcutree.c > +++ b/kernel/rcutree.c > @@ -1286,6 +1286,7 @@ static int rcu_gp_init(struct rcu_state *rsp) > struct rcu_data *rdp; > struct rcu_node *rnp = rcu_get_root(rsp); > > + rcu_bind_gp_kthread(); > raw_spin_lock_irq(>lock); > rsp->gp_flags = 0; /* Clear all flags: New grace period. */ bind the gp thread when RCU_GP_FLAG_INIT ... > > diff --git a/kernel/rcutree.h b/kernel/rcutree.h > index e0de5dc..49dac99 100644 > --- a/kernel/rcutree.h > +++ b/kernel/rcutree.h > @@ -560,6 +560,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, > bool *isidle, > static bool is_sysidle_rcu_state(struct rcu_state *rsp); > static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle, > unsigned long maxj); > +static void rcu_bind_gp_kthread(void); > static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp); > > #endif /* #ifndef RCU_TREE_NONCORE */ > diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h > index ff84bed..f65d9c2 100644 > --- a/kernel/rcutree_plugin.h > +++ b/kernel/rcutree_plugin.h > @@ -2544,7 +2544,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, > bool *isidle, > if (!*isidle || rdp->rsp != rcu_sysidle_state || > cpu_is_offline(rdp->cpu) || rdp->cpu == tick_do_timer_cpu) > return; > - /* WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); */ > + WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); but call rcu_sysidle_check_cpu() when RCU_GP_FLAG_FQS. In this time, the thread may not be bound to tick_do_timer_cpu, the WARN_ON_ONCE() may be wrong. Does any other code ensure the gp thread bound on tick_do_timer_cpu which I missed? > > /* Pick up current idle and NMI-nesting counter and check. */ > cur = atomic_read(>dynticks_idle); > @@ -2570,6 +2570,20 @@ static bool is_sysidle_rcu_state(struct rcu_state *rsp) > } > > /* > + * Bind the grace-period kthread for the sysidle flavor of RCU to the > + * timekeeping CPU. > + */ > +static void rcu_bind_gp_kthread(void) > +{ > + int cpu = ACCESS_ONCE(tick_do_timer_cpu); > + > + if (cpu < 0 || cpu >= nr_cpu_ids) > + return; > + if (raw_smp_processor_id() != cpu) > + set_cpus_allowed_ptr(current, cpumask_of(cpu)); > +} > + > +/* > * Return a delay in jiffies based on the number of CPUs, rcu_node > * leaf fanout, and jiffies tick rate. The idea is to allow larger > * systems more time to transition to full-idle state in order to > @@ -2767,6 +2781,10 @@ static bool is_sysidle_rcu_state(struct rcu_state *rsp) > return false; > } > > +static void rcu_bind_gp_kthread(void) > +{ > +} > + > static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle, > unsigned long maxj) > { -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFC nohz_full 7/7] nohz_full: Force RCU's grace-period kthreads onto timekeeping CPU
On 07/27/2013 07:19 AM, Paul E. McKenney wrote: From: Paul E. McKenney paul...@linux.vnet.ibm.com Because RCU's quiescent-state-forcing mechanism is used to drive the full-system-idle state machine, and because this mechanism is executed by RCU's grace-period kthreads, this commit forces these kthreads to run on the timekeeping CPU (tick_do_timer_cpu). To do otherwise would mean that the RCU grace-period kthreads would force the system into non-idle state every time they drove the state machine, which would be just a bit on the futile side. Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com Cc: Frederic Weisbecker fweis...@gmail.com Cc: Steven Rostedt rost...@goodmis.org --- kernel/rcutree.c| 1 + kernel/rcutree.h| 1 + kernel/rcutree_plugin.h | 20 +++- 3 files changed, 21 insertions(+), 1 deletion(-) diff --git a/kernel/rcutree.c b/kernel/rcutree.c index aa6d96e..fe83085 100644 --- a/kernel/rcutree.c +++ b/kernel/rcutree.c @@ -1286,6 +1286,7 @@ static int rcu_gp_init(struct rcu_state *rsp) struct rcu_data *rdp; struct rcu_node *rnp = rcu_get_root(rsp); + rcu_bind_gp_kthread(); raw_spin_lock_irq(rnp-lock); rsp-gp_flags = 0; /* Clear all flags: New grace period. */ bind the gp thread when RCU_GP_FLAG_INIT ... diff --git a/kernel/rcutree.h b/kernel/rcutree.h index e0de5dc..49dac99 100644 --- a/kernel/rcutree.h +++ b/kernel/rcutree.h @@ -560,6 +560,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, bool *isidle, static bool is_sysidle_rcu_state(struct rcu_state *rsp); static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle, unsigned long maxj); +static void rcu_bind_gp_kthread(void); static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp); #endif /* #ifndef RCU_TREE_NONCORE */ diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h index ff84bed..f65d9c2 100644 --- a/kernel/rcutree_plugin.h +++ b/kernel/rcutree_plugin.h @@ -2544,7 +2544,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, bool *isidle, if (!*isidle || rdp-rsp != rcu_sysidle_state || cpu_is_offline(rdp-cpu) || rdp-cpu == tick_do_timer_cpu) return; - /* WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); */ + WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); but call rcu_sysidle_check_cpu() when RCU_GP_FLAG_FQS. In this time, the thread may not be bound to tick_do_timer_cpu, the WARN_ON_ONCE() may be wrong. Does any other code ensure the gp thread bound on tick_do_timer_cpu which I missed? /* Pick up current idle and NMI-nesting counter and check. */ cur = atomic_read(rdtp-dynticks_idle); @@ -2570,6 +2570,20 @@ static bool is_sysidle_rcu_state(struct rcu_state *rsp) } /* + * Bind the grace-period kthread for the sysidle flavor of RCU to the + * timekeeping CPU. + */ +static void rcu_bind_gp_kthread(void) +{ + int cpu = ACCESS_ONCE(tick_do_timer_cpu); + + if (cpu 0 || cpu = nr_cpu_ids) + return; + if (raw_smp_processor_id() != cpu) + set_cpus_allowed_ptr(current, cpumask_of(cpu)); +} + +/* * Return a delay in jiffies based on the number of CPUs, rcu_node * leaf fanout, and jiffies tick rate. The idea is to allow larger * systems more time to transition to full-idle state in order to @@ -2767,6 +2781,10 @@ static bool is_sysidle_rcu_state(struct rcu_state *rsp) return false; } +static void rcu_bind_gp_kthread(void) +{ +} + static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle, unsigned long maxj) { -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH RFC nohz_full 7/7] nohz_full: Force RCU's grace-period kthreads onto timekeeping CPU
From: "Paul E. McKenney" Because RCU's quiescent-state-forcing mechanism is used to drive the full-system-idle state machine, and because this mechanism is executed by RCU's grace-period kthreads, this commit forces these kthreads to run on the timekeeping CPU (tick_do_timer_cpu). To do otherwise would mean that the RCU grace-period kthreads would force the system into non-idle state every time they drove the state machine, which would be just a bit on the futile side. Signed-off-by: Paul E. McKenney Cc: Frederic Weisbecker Cc: Steven Rostedt --- kernel/rcutree.c| 1 + kernel/rcutree.h| 1 + kernel/rcutree_plugin.h | 20 +++- 3 files changed, 21 insertions(+), 1 deletion(-) diff --git a/kernel/rcutree.c b/kernel/rcutree.c index aa6d96e..fe83085 100644 --- a/kernel/rcutree.c +++ b/kernel/rcutree.c @@ -1286,6 +1286,7 @@ static int rcu_gp_init(struct rcu_state *rsp) struct rcu_data *rdp; struct rcu_node *rnp = rcu_get_root(rsp); + rcu_bind_gp_kthread(); raw_spin_lock_irq(>lock); rsp->gp_flags = 0; /* Clear all flags: New grace period. */ diff --git a/kernel/rcutree.h b/kernel/rcutree.h index e0de5dc..49dac99 100644 --- a/kernel/rcutree.h +++ b/kernel/rcutree.h @@ -560,6 +560,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, bool *isidle, static bool is_sysidle_rcu_state(struct rcu_state *rsp); static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle, unsigned long maxj); +static void rcu_bind_gp_kthread(void); static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp); #endif /* #ifndef RCU_TREE_NONCORE */ diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h index ff84bed..f65d9c2 100644 --- a/kernel/rcutree_plugin.h +++ b/kernel/rcutree_plugin.h @@ -2544,7 +2544,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, bool *isidle, if (!*isidle || rdp->rsp != rcu_sysidle_state || cpu_is_offline(rdp->cpu) || rdp->cpu == tick_do_timer_cpu) return; - /* WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); */ + WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); /* Pick up current idle and NMI-nesting counter and check. */ cur = atomic_read(>dynticks_idle); @@ -2570,6 +2570,20 @@ static bool is_sysidle_rcu_state(struct rcu_state *rsp) } /* + * Bind the grace-period kthread for the sysidle flavor of RCU to the + * timekeeping CPU. + */ +static void rcu_bind_gp_kthread(void) +{ + int cpu = ACCESS_ONCE(tick_do_timer_cpu); + + if (cpu < 0 || cpu >= nr_cpu_ids) + return; + if (raw_smp_processor_id() != cpu) + set_cpus_allowed_ptr(current, cpumask_of(cpu)); +} + +/* * Return a delay in jiffies based on the number of CPUs, rcu_node * leaf fanout, and jiffies tick rate. The idea is to allow larger * systems more time to transition to full-idle state in order to @@ -2767,6 +2781,10 @@ static bool is_sysidle_rcu_state(struct rcu_state *rsp) return false; } +static void rcu_bind_gp_kthread(void) +{ +} + static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle, unsigned long maxj) { -- 1.8.1.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH RFC nohz_full 7/7] nohz_full: Force RCU's grace-period kthreads onto timekeeping CPU
From: Paul E. McKenney paul...@linux.vnet.ibm.com Because RCU's quiescent-state-forcing mechanism is used to drive the full-system-idle state machine, and because this mechanism is executed by RCU's grace-period kthreads, this commit forces these kthreads to run on the timekeeping CPU (tick_do_timer_cpu). To do otherwise would mean that the RCU grace-period kthreads would force the system into non-idle state every time they drove the state machine, which would be just a bit on the futile side. Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com Cc: Frederic Weisbecker fweis...@gmail.com Cc: Steven Rostedt rost...@goodmis.org --- kernel/rcutree.c| 1 + kernel/rcutree.h| 1 + kernel/rcutree_plugin.h | 20 +++- 3 files changed, 21 insertions(+), 1 deletion(-) diff --git a/kernel/rcutree.c b/kernel/rcutree.c index aa6d96e..fe83085 100644 --- a/kernel/rcutree.c +++ b/kernel/rcutree.c @@ -1286,6 +1286,7 @@ static int rcu_gp_init(struct rcu_state *rsp) struct rcu_data *rdp; struct rcu_node *rnp = rcu_get_root(rsp); + rcu_bind_gp_kthread(); raw_spin_lock_irq(rnp-lock); rsp-gp_flags = 0; /* Clear all flags: New grace period. */ diff --git a/kernel/rcutree.h b/kernel/rcutree.h index e0de5dc..49dac99 100644 --- a/kernel/rcutree.h +++ b/kernel/rcutree.h @@ -560,6 +560,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, bool *isidle, static bool is_sysidle_rcu_state(struct rcu_state *rsp); static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle, unsigned long maxj); +static void rcu_bind_gp_kthread(void); static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp); #endif /* #ifndef RCU_TREE_NONCORE */ diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h index ff84bed..f65d9c2 100644 --- a/kernel/rcutree_plugin.h +++ b/kernel/rcutree_plugin.h @@ -2544,7 +2544,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, bool *isidle, if (!*isidle || rdp-rsp != rcu_sysidle_state || cpu_is_offline(rdp-cpu) || rdp-cpu == tick_do_timer_cpu) return; - /* WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); */ + WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); /* Pick up current idle and NMI-nesting counter and check. */ cur = atomic_read(rdtp-dynticks_idle); @@ -2570,6 +2570,20 @@ static bool is_sysidle_rcu_state(struct rcu_state *rsp) } /* + * Bind the grace-period kthread for the sysidle flavor of RCU to the + * timekeeping CPU. + */ +static void rcu_bind_gp_kthread(void) +{ + int cpu = ACCESS_ONCE(tick_do_timer_cpu); + + if (cpu 0 || cpu = nr_cpu_ids) + return; + if (raw_smp_processor_id() != cpu) + set_cpus_allowed_ptr(current, cpumask_of(cpu)); +} + +/* * Return a delay in jiffies based on the number of CPUs, rcu_node * leaf fanout, and jiffies tick rate. The idea is to allow larger * systems more time to transition to full-idle state in order to @@ -2767,6 +2781,10 @@ static bool is_sysidle_rcu_state(struct rcu_state *rsp) return false; } +static void rcu_bind_gp_kthread(void) +{ +} + static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle, unsigned long maxj) { -- 1.8.1.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH RFC nohz_full 7/7] nohz_full: Force RCU's grace-period kthreads onto timekeeping CPU
From: "Paul E. McKenney" Because RCU's quiescent-state-forcing mechanism is used to drive the full-system-idle state machine, and because this mechanism is executed by RCU's grace-period kthreads, this commit forces these kthreads to run on the timekeeping CPU (tick_do_timer_cpu). To do otherwise would mean that the RCU grace-period kthreads would force the system into non-idle state every time they drove the state machine, which would be just a bit on the futile side. Signed-off-by: Paul E. McKenney Cc: Frederic Weisbecker Cc: Steven Rostedt --- kernel/rcutree.c| 1 + kernel/rcutree.h| 1 + kernel/rcutree_plugin.h | 20 +++- 3 files changed, 21 insertions(+), 1 deletion(-) diff --git a/kernel/rcutree.c b/kernel/rcutree.c index 06cfd75..ad9a5ec 100644 --- a/kernel/rcutree.c +++ b/kernel/rcutree.c @@ -1286,6 +1286,7 @@ static int rcu_gp_init(struct rcu_state *rsp) struct rcu_data *rdp; struct rcu_node *rnp = rcu_get_root(rsp); + rcu_bind_gp_kthread(); raw_spin_lock_irq(>lock); rsp->gp_flags = 0; /* Clear all flags: New grace period. */ diff --git a/kernel/rcutree.h b/kernel/rcutree.h index 7326a3c..1602c21 100644 --- a/kernel/rcutree.h +++ b/kernel/rcutree.h @@ -558,6 +558,7 @@ static void rcu_sysidle_exit(struct rcu_dynticks *rdtp, int irq); static void rcu_sysidle_check_cpu(struct rcu_data *rdp, bool *isidle, unsigned long *maxj); static bool is_sysidle_rcu_state(struct rcu_state *rsp); +static void rcu_bind_gp_kthread(void); static void rcu_sysidle_report(struct rcu_state *rsp, int isidle, unsigned long maxj); static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp); diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h index b47ffb0..a4d44c3 100644 --- a/kernel/rcutree_plugin.h +++ b/kernel/rcutree_plugin.h @@ -2544,7 +2544,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, bool *isidle, if (!*isidle || rdp->rsp != rcu_sysidle_state || cpu_is_offline(rdp->cpu) || rdp->cpu == tick_do_timer_cpu) return; - /* WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); */ + WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); /* Pick up current idle and NMI-nesting counter and check. */ cur = atomic_read(>dynticks_idle); @@ -2570,6 +2570,20 @@ static bool is_sysidle_rcu_state(struct rcu_state *rsp) } /* + * Bind the grace-period kthread for the sysidle flavor of RCU to the + * timekeeping CPU. + */ +static void rcu_bind_gp_kthread(void) +{ + int cpu = ACCESS_ONCE(tick_do_timer_cpu); + + if (cpu < 0 || cpu >= nr_cpu_ids) + return; + if (raw_smp_processor_id() != cpu) + set_cpus_allowed_ptr(current, cpumask_of(cpu)); +} + +/* * Return a delay in jiffies based on the number of CPUs, rcu_node * leaf fanout, and jiffies tick rate. The idea is to allow larger * systems more time to transition to full-idle state in order to @@ -2758,6 +2772,10 @@ static bool is_sysidle_rcu_state(struct rcu_state *rsp) return false; } +static void rcu_bind_gp_kthread(void) +{ +} + static void rcu_sysidle_report(struct rcu_state *rsp, int isidle, unsigned long maxj) { -- 1.8.1.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH RFC nohz_full 7/7] nohz_full: Force RCU's grace-period kthreads onto timekeeping CPU
From: Paul E. McKenney paul...@linux.vnet.ibm.com Because RCU's quiescent-state-forcing mechanism is used to drive the full-system-idle state machine, and because this mechanism is executed by RCU's grace-period kthreads, this commit forces these kthreads to run on the timekeeping CPU (tick_do_timer_cpu). To do otherwise would mean that the RCU grace-period kthreads would force the system into non-idle state every time they drove the state machine, which would be just a bit on the futile side. Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com Cc: Frederic Weisbecker fweis...@gmail.com Cc: Steven Rostedt rost...@goodmis.org --- kernel/rcutree.c| 1 + kernel/rcutree.h| 1 + kernel/rcutree_plugin.h | 20 +++- 3 files changed, 21 insertions(+), 1 deletion(-) diff --git a/kernel/rcutree.c b/kernel/rcutree.c index 06cfd75..ad9a5ec 100644 --- a/kernel/rcutree.c +++ b/kernel/rcutree.c @@ -1286,6 +1286,7 @@ static int rcu_gp_init(struct rcu_state *rsp) struct rcu_data *rdp; struct rcu_node *rnp = rcu_get_root(rsp); + rcu_bind_gp_kthread(); raw_spin_lock_irq(rnp-lock); rsp-gp_flags = 0; /* Clear all flags: New grace period. */ diff --git a/kernel/rcutree.h b/kernel/rcutree.h index 7326a3c..1602c21 100644 --- a/kernel/rcutree.h +++ b/kernel/rcutree.h @@ -558,6 +558,7 @@ static void rcu_sysidle_exit(struct rcu_dynticks *rdtp, int irq); static void rcu_sysidle_check_cpu(struct rcu_data *rdp, bool *isidle, unsigned long *maxj); static bool is_sysidle_rcu_state(struct rcu_state *rsp); +static void rcu_bind_gp_kthread(void); static void rcu_sysidle_report(struct rcu_state *rsp, int isidle, unsigned long maxj); static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp); diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h index b47ffb0..a4d44c3 100644 --- a/kernel/rcutree_plugin.h +++ b/kernel/rcutree_plugin.h @@ -2544,7 +2544,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, bool *isidle, if (!*isidle || rdp-rsp != rcu_sysidle_state || cpu_is_offline(rdp-cpu) || rdp-cpu == tick_do_timer_cpu) return; - /* WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); */ + WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); /* Pick up current idle and NMI-nesting counter and check. */ cur = atomic_read(rdtp-dynticks_idle); @@ -2570,6 +2570,20 @@ static bool is_sysidle_rcu_state(struct rcu_state *rsp) } /* + * Bind the grace-period kthread for the sysidle flavor of RCU to the + * timekeeping CPU. + */ +static void rcu_bind_gp_kthread(void) +{ + int cpu = ACCESS_ONCE(tick_do_timer_cpu); + + if (cpu 0 || cpu = nr_cpu_ids) + return; + if (raw_smp_processor_id() != cpu) + set_cpus_allowed_ptr(current, cpumask_of(cpu)); +} + +/* * Return a delay in jiffies based on the number of CPUs, rcu_node * leaf fanout, and jiffies tick rate. The idea is to allow larger * systems more time to transition to full-idle state in order to @@ -2758,6 +2772,10 @@ static bool is_sysidle_rcu_state(struct rcu_state *rsp) return false; } +static void rcu_bind_gp_kthread(void) +{ +} + static void rcu_sysidle_report(struct rcu_state *rsp, int isidle, unsigned long maxj) { -- 1.8.1.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/