Re: nohz problem with idle time on old hardware
On Wed, 9 Apr 2014 21:26:51 +0530 Viresh Kumar wrote: > When HRES isn't enabled and NOHZ isn't enabled as well, in that > case we should stick to the periodic code from tick-common.c and > the oneshot options of tick_nohz_switch_to_nohz() or > hrtimer_switch_to_hres() shouldn't be used. And so, we still need > those checks, as per my understanding. :) The one thing we can agree on is that the current code is wrong :-) > > Lets see what others have for this discussion :) Yeah, those that actually wrote this code should have a say in this. -- Steve -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On 9 April 2014 21:09, Steven Rostedt wrote: > Reading even more of the code, now I'm totally confused :-) :) > When tick_setup_sched_timer() is called, if tick_nohz_enabled is set, > then we set tick_nohz_active. correct. > This gets called by hrtimer_switch_to_hres(), and before that is > called, the tick_check_oneshot_changed() will never get to the > tick_nohz_switch_to_nohz() call. If hrtimer_switch_to_hres() is called or HRES is enabled, we will never ever call tick_nohz_switch_to_nohz(). > Looks to me, the real answer is to nuke both the if statement *and* the > setting of the tick_nohz_active in that function. Both looks a bit > redundant to me. When HRES isn't enabled and NOHZ isn't enabled as well, in that case we should stick to the periodic code from tick-common.c and the oneshot options of tick_nohz_switch_to_nohz() or hrtimer_switch_to_hres() shouldn't be used. And so, we still need those checks, as per my understanding. :) Lets see what others have for this discussion :) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On Wed, 9 Apr 2014 11:31:43 -0400 Steven Rostedt wrote: > > Hmm, looking at the code, I see it probably should still do the check. > > OK, nevermind ;-) Reading even more of the code, now I'm totally confused :-) When tick_setup_sched_timer() is called, if tick_nohz_enabled is set, then we set tick_nohz_active. This gets called by hrtimer_switch_to_hres(), and before that is called, the tick_check_oneshot_changed() will never get to the tick_nohz_switch_to_nohz() call. Looks to me, the real answer is to nuke both the if statement *and* the setting of the tick_nohz_active in that function. Both looks a bit redundant to me. -- Steve -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On 9 April 2014 21:01, Steven Rostedt wrote: >> Do we? This is only called by tick_check_oneshot_change() which has the >> following: >> >> int tick_check_oneshot_change(int allow_nohz) >> { >> struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched); >> >> if (!test_and_clear_bit(0, >check_clocks)) >> return 0; >> >> if (ts->nohz_mode != NOHZ_MODE_INACTIVE) >> return 0; >> >> if (!timekeeping_valid_for_hres() || !tick_is_oneshot_available()) >> return 0; >> >> if (!allow_nohz) >> return 1; >> >> tick_nohz_switch_to_nohz(); >> return 0; >> } >> >> How often does it make it to that last check? Probably we will reach here only once per boot per cpu. > Hmm, looking at the code, I see it probably should still do the check. But still we need it for that one time. :) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On Wed, 9 Apr 2014 11:29:50 -0400 Steven Rostedt wrote: > On Wed, 9 Apr 2014 20:50:59 +0530 > Viresh Kumar wrote: > > > On 9 April 2014 20:01, Steven Rostedt wrote: > > > Ouch! You are correct, this part of the patch makes no sense. That's > > > what I get for reviewing a patch and not looking at all the code around > > > the changes. (another kernel developer hangs head in shame :-( ) > > > > > > I think that if statement should be nuked. > > > > Hmm, my opinion differs here :) > > > > If we completely remove this statement, we will run > > tick_nohz_switch_to_nohz() even if nohz is not enabled. And check for > > enabled must stay. > > Do we? This is only called by tick_check_oneshot_change() which has the > following: > > int tick_check_oneshot_change(int allow_nohz) > { > struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched); > > if (!test_and_clear_bit(0, >check_clocks)) > return 0; > > if (ts->nohz_mode != NOHZ_MODE_INACTIVE) > return 0; > > if (!timekeeping_valid_for_hres() || !tick_is_oneshot_available()) > return 0; > > if (!allow_nohz) > return 1; > > tick_nohz_switch_to_nohz(); > return 0; > } > > How often does it make it to that last check? Hmm, looking at the code, I see it probably should still do the check. OK, nevermind ;-) -- Steve -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On Wed, 9 Apr 2014 20:50:59 +0530 Viresh Kumar wrote: > On 9 April 2014 20:01, Steven Rostedt wrote: > > Ouch! You are correct, this part of the patch makes no sense. That's > > what I get for reviewing a patch and not looking at all the code around > > the changes. (another kernel developer hangs head in shame :-( ) > > > > I think that if statement should be nuked. > > Hmm, my opinion differs here :) > > If we completely remove this statement, we will run > tick_nohz_switch_to_nohz() even if nohz is not enabled. And check for > enabled must stay. Do we? This is only called by tick_check_oneshot_change() which has the following: int tick_check_oneshot_change(int allow_nohz) { struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched); if (!test_and_clear_bit(0, >check_clocks)) return 0; if (ts->nohz_mode != NOHZ_MODE_INACTIVE) return 0; if (!timekeeping_valid_for_hres() || !tick_is_oneshot_available()) return 0; if (!allow_nohz) return 1; tick_nohz_switch_to_nohz(); return 0; } How often does it make it to that last check? -- Steve -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On 9 April 2014 20:01, Steven Rostedt wrote: > Ouch! You are correct, this part of the patch makes no sense. That's > what I get for reviewing a patch and not looking at all the code around > the changes. (another kernel developer hangs head in shame :-( ) > > I think that if statement should be nuked. Hmm, my opinion differs here :) If we completely remove this statement, we will run tick_nohz_switch_to_nohz() even if nohz is not enabled. And check for enabled must stay. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On Wed, Apr 09, 2014 at 07:21:53PM +0530, Viresh Kumar wrote: > On Thu, Nov 14, 2013 at 1:31 AM, Thomas Gleixner wrote: > > Subject: NOHZ: Check for nohz active instead of nohz enabled > > > > RCU and the fine grained idle time accounting functions check > > tick_nohz_enabled. But that variable is merily telling that NOHZ has > > been enabled in the config and not been disabled on the command line. > > > > But it does not tell anything about nohz being active. That's what all > > this should check for. > > > > Matthew reported, that the idle accounting on his old P1 machine > > showed bogus values, when he enabled NOHZ in the config and did not > > disable it on the kernel command line. The reason is that his machine > > uses (refined) jiffies as a clocksource which explains why the "fine" > > grained accounting went into lala land, because it depends on when the > > system goes and leaves idle relative to the jiffies increment. > > > > Provide a tick_nohz_active indicator and let RCU and the accounting > > code use this instead of tick_nohz_enable. > > > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c > > @@ -973,7 +968,7 @@ static void tick_nohz_switch_to_nohz(void) > > struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched); > > ktime_t next; > > > > - if (!tick_nohz_enabled) > > + if (!tick_nohz_active) > > return; > > Considering the impressive list of Reviewed-by and people involved > in this patch, I am not sure I am reading the code well here. > > The above change isn't required as per my understanding. Otherwise > we will never pass that check. tick_nohz_active is initialized as zero > and so we will keep on returning for ever and wouldn't be able to set > it to 1 ever. > > I have a patch to fix it up, but wanted to know your opinion before > sending it. Ouch, bad thing. I'm also disovering this whole thread and patch only today, since nobody Cc'ed me :-( > > local_irq_disable(); > > @@ -981,7 +976,7 @@ static void tick_nohz_switch_to_nohz(void) > > local_irq_enable(); > > return; > > } > > - > > + tick_nohz_active = 1; > > ts->nohz_mode = NOHZ_MODE_LOWRES; > > > > /* > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On Wed, 9 Apr 2014 19:21:53 +0530 Viresh Kumar wrote: > On Thu, Nov 14, 2013 at 1:31 AM, Thomas Gleixner wrote: > > Subject: NOHZ: Check for nohz active instead of nohz enabled > > > > RCU and the fine grained idle time accounting functions check > > tick_nohz_enabled. But that variable is merily telling that NOHZ has > > been enabled in the config and not been disabled on the command line. > > > > But it does not tell anything about nohz being active. That's what all > > this should check for. > > > > Matthew reported, that the idle accounting on his old P1 machine > > showed bogus values, when he enabled NOHZ in the config and did not > > disable it on the kernel command line. The reason is that his machine > > uses (refined) jiffies as a clocksource which explains why the "fine" > > grained accounting went into lala land, because it depends on when the > > system goes and leaves idle relative to the jiffies increment. > > > > Provide a tick_nohz_active indicator and let RCU and the accounting > > code use this instead of tick_nohz_enable. > > > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c > > @@ -973,7 +968,7 @@ static void tick_nohz_switch_to_nohz(void) > > struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched); > > ktime_t next; > > > > - if (!tick_nohz_enabled) > > + if (!tick_nohz_active) > > return; > > Considering the impressive list of Reviewed-by and people involved > in this patch, I am not sure I am reading the code well here. > > The above change isn't required as per my understanding. Otherwise > we will never pass that check. tick_nohz_active is initialized as zero > and so we will keep on returning for ever and wouldn't be able to set > it to 1 ever. > > I have a patch to fix it up, but wanted to know your opinion before > sending it. Ouch! You are correct, this part of the patch makes no sense. That's what I get for reviewing a patch and not looking at all the code around the changes. (another kernel developer hangs head in shame :-( ) I think that if statement should be nuked. -- Steve > > > local_irq_disable(); > > @@ -981,7 +976,7 @@ static void tick_nohz_switch_to_nohz(void) > > local_irq_enable(); > > return; > > } > > - > > + tick_nohz_active = 1; > > ts->nohz_mode = NOHZ_MODE_LOWRES; > > > > /* -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On Thu, Nov 14, 2013 at 1:31 AM, Thomas Gleixner wrote: > Subject: NOHZ: Check for nohz active instead of nohz enabled > > RCU and the fine grained idle time accounting functions check > tick_nohz_enabled. But that variable is merily telling that NOHZ has > been enabled in the config and not been disabled on the command line. > > But it does not tell anything about nohz being active. That's what all > this should check for. > > Matthew reported, that the idle accounting on his old P1 machine > showed bogus values, when he enabled NOHZ in the config and did not > disable it on the kernel command line. The reason is that his machine > uses (refined) jiffies as a clocksource which explains why the "fine" > grained accounting went into lala land, because it depends on when the > system goes and leaves idle relative to the jiffies increment. > > Provide a tick_nohz_active indicator and let RCU and the accounting > code use this instead of tick_nohz_enable. > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c > @@ -973,7 +968,7 @@ static void tick_nohz_switch_to_nohz(void) > struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched); > ktime_t next; > > - if (!tick_nohz_enabled) > + if (!tick_nohz_active) > return; Considering the impressive list of Reviewed-by and people involved in this patch, I am not sure I am reading the code well here. The above change isn't required as per my understanding. Otherwise we will never pass that check. tick_nohz_active is initialized as zero and so we will keep on returning for ever and wouldn't be able to set it to 1 ever. I have a patch to fix it up, but wanted to know your opinion before sending it. > local_irq_disable(); > @@ -981,7 +976,7 @@ static void tick_nohz_switch_to_nohz(void) > local_irq_enable(); > return; > } > - > + tick_nohz_active = 1; > ts->nohz_mode = NOHZ_MODE_LOWRES; > > /* -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On Thu, Nov 14, 2013 at 1:31 AM, Thomas Gleixner t...@linutronix.de wrote: Subject: NOHZ: Check for nohz active instead of nohz enabled RCU and the fine grained idle time accounting functions check tick_nohz_enabled. But that variable is merily telling that NOHZ has been enabled in the config and not been disabled on the command line. But it does not tell anything about nohz being active. That's what all this should check for. Matthew reported, that the idle accounting on his old P1 machine showed bogus values, when he enabled NOHZ in the config and did not disable it on the kernel command line. The reason is that his machine uses (refined) jiffies as a clocksource which explains why the fine grained accounting went into lala land, because it depends on when the system goes and leaves idle relative to the jiffies increment. Provide a tick_nohz_active indicator and let RCU and the accounting code use this instead of tick_nohz_enable. diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c @@ -973,7 +968,7 @@ static void tick_nohz_switch_to_nohz(void) struct tick_sched *ts = __get_cpu_var(tick_cpu_sched); ktime_t next; - if (!tick_nohz_enabled) + if (!tick_nohz_active) return; Considering the impressive list of Reviewed-by and people involved in this patch, I am not sure I am reading the code well here. The above change isn't required as per my understanding. Otherwise we will never pass that check. tick_nohz_active is initialized as zero and so we will keep on returning for ever and wouldn't be able to set it to 1 ever. I have a patch to fix it up, but wanted to know your opinion before sending it. local_irq_disable(); @@ -981,7 +976,7 @@ static void tick_nohz_switch_to_nohz(void) local_irq_enable(); return; } - + tick_nohz_active = 1; ts-nohz_mode = NOHZ_MODE_LOWRES; /* -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On Wed, 9 Apr 2014 19:21:53 +0530 Viresh Kumar viresh.ku...@linaro.org wrote: On Thu, Nov 14, 2013 at 1:31 AM, Thomas Gleixner t...@linutronix.de wrote: Subject: NOHZ: Check for nohz active instead of nohz enabled RCU and the fine grained idle time accounting functions check tick_nohz_enabled. But that variable is merily telling that NOHZ has been enabled in the config and not been disabled on the command line. But it does not tell anything about nohz being active. That's what all this should check for. Matthew reported, that the idle accounting on his old P1 machine showed bogus values, when he enabled NOHZ in the config and did not disable it on the kernel command line. The reason is that his machine uses (refined) jiffies as a clocksource which explains why the fine grained accounting went into lala land, because it depends on when the system goes and leaves idle relative to the jiffies increment. Provide a tick_nohz_active indicator and let RCU and the accounting code use this instead of tick_nohz_enable. diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c @@ -973,7 +968,7 @@ static void tick_nohz_switch_to_nohz(void) struct tick_sched *ts = __get_cpu_var(tick_cpu_sched); ktime_t next; - if (!tick_nohz_enabled) + if (!tick_nohz_active) return; Considering the impressive list of Reviewed-by and people involved in this patch, I am not sure I am reading the code well here. The above change isn't required as per my understanding. Otherwise we will never pass that check. tick_nohz_active is initialized as zero and so we will keep on returning for ever and wouldn't be able to set it to 1 ever. I have a patch to fix it up, but wanted to know your opinion before sending it. Ouch! You are correct, this part of the patch makes no sense. That's what I get for reviewing a patch and not looking at all the code around the changes. (another kernel developer hangs head in shame :-( ) I think that if statement should be nuked. -- Steve local_irq_disable(); @@ -981,7 +976,7 @@ static void tick_nohz_switch_to_nohz(void) local_irq_enable(); return; } - + tick_nohz_active = 1; ts-nohz_mode = NOHZ_MODE_LOWRES; /* -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On Wed, Apr 09, 2014 at 07:21:53PM +0530, Viresh Kumar wrote: On Thu, Nov 14, 2013 at 1:31 AM, Thomas Gleixner t...@linutronix.de wrote: Subject: NOHZ: Check for nohz active instead of nohz enabled RCU and the fine grained idle time accounting functions check tick_nohz_enabled. But that variable is merily telling that NOHZ has been enabled in the config and not been disabled on the command line. But it does not tell anything about nohz being active. That's what all this should check for. Matthew reported, that the idle accounting on his old P1 machine showed bogus values, when he enabled NOHZ in the config and did not disable it on the kernel command line. The reason is that his machine uses (refined) jiffies as a clocksource which explains why the fine grained accounting went into lala land, because it depends on when the system goes and leaves idle relative to the jiffies increment. Provide a tick_nohz_active indicator and let RCU and the accounting code use this instead of tick_nohz_enable. diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c @@ -973,7 +968,7 @@ static void tick_nohz_switch_to_nohz(void) struct tick_sched *ts = __get_cpu_var(tick_cpu_sched); ktime_t next; - if (!tick_nohz_enabled) + if (!tick_nohz_active) return; Considering the impressive list of Reviewed-by and people involved in this patch, I am not sure I am reading the code well here. The above change isn't required as per my understanding. Otherwise we will never pass that check. tick_nohz_active is initialized as zero and so we will keep on returning for ever and wouldn't be able to set it to 1 ever. I have a patch to fix it up, but wanted to know your opinion before sending it. Ouch, bad thing. I'm also disovering this whole thread and patch only today, since nobody Cc'ed me :-( local_irq_disable(); @@ -981,7 +976,7 @@ static void tick_nohz_switch_to_nohz(void) local_irq_enable(); return; } - + tick_nohz_active = 1; ts-nohz_mode = NOHZ_MODE_LOWRES; /* -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On 9 April 2014 20:01, Steven Rostedt rost...@goodmis.org wrote: Ouch! You are correct, this part of the patch makes no sense. That's what I get for reviewing a patch and not looking at all the code around the changes. (another kernel developer hangs head in shame :-( ) I think that if statement should be nuked. Hmm, my opinion differs here :) If we completely remove this statement, we will run tick_nohz_switch_to_nohz() even if nohz is not enabled. And check for enabled must stay. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On Wed, 9 Apr 2014 20:50:59 +0530 Viresh Kumar viresh.ku...@linaro.org wrote: On 9 April 2014 20:01, Steven Rostedt rost...@goodmis.org wrote: Ouch! You are correct, this part of the patch makes no sense. That's what I get for reviewing a patch and not looking at all the code around the changes. (another kernel developer hangs head in shame :-( ) I think that if statement should be nuked. Hmm, my opinion differs here :) If we completely remove this statement, we will run tick_nohz_switch_to_nohz() even if nohz is not enabled. And check for enabled must stay. Do we? This is only called by tick_check_oneshot_change() which has the following: int tick_check_oneshot_change(int allow_nohz) { struct tick_sched *ts = __get_cpu_var(tick_cpu_sched); if (!test_and_clear_bit(0, ts-check_clocks)) return 0; if (ts-nohz_mode != NOHZ_MODE_INACTIVE) return 0; if (!timekeeping_valid_for_hres() || !tick_is_oneshot_available()) return 0; if (!allow_nohz) return 1; tick_nohz_switch_to_nohz(); return 0; } How often does it make it to that last check? -- Steve -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On Wed, 9 Apr 2014 11:29:50 -0400 Steven Rostedt rost...@goodmis.org wrote: On Wed, 9 Apr 2014 20:50:59 +0530 Viresh Kumar viresh.ku...@linaro.org wrote: On 9 April 2014 20:01, Steven Rostedt rost...@goodmis.org wrote: Ouch! You are correct, this part of the patch makes no sense. That's what I get for reviewing a patch and not looking at all the code around the changes. (another kernel developer hangs head in shame :-( ) I think that if statement should be nuked. Hmm, my opinion differs here :) If we completely remove this statement, we will run tick_nohz_switch_to_nohz() even if nohz is not enabled. And check for enabled must stay. Do we? This is only called by tick_check_oneshot_change() which has the following: int tick_check_oneshot_change(int allow_nohz) { struct tick_sched *ts = __get_cpu_var(tick_cpu_sched); if (!test_and_clear_bit(0, ts-check_clocks)) return 0; if (ts-nohz_mode != NOHZ_MODE_INACTIVE) return 0; if (!timekeeping_valid_for_hres() || !tick_is_oneshot_available()) return 0; if (!allow_nohz) return 1; tick_nohz_switch_to_nohz(); return 0; } How often does it make it to that last check? Hmm, looking at the code, I see it probably should still do the check. OK, nevermind ;-) -- Steve -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On 9 April 2014 21:01, Steven Rostedt rost...@goodmis.org wrote: Do we? This is only called by tick_check_oneshot_change() which has the following: int tick_check_oneshot_change(int allow_nohz) { struct tick_sched *ts = __get_cpu_var(tick_cpu_sched); if (!test_and_clear_bit(0, ts-check_clocks)) return 0; if (ts-nohz_mode != NOHZ_MODE_INACTIVE) return 0; if (!timekeeping_valid_for_hres() || !tick_is_oneshot_available()) return 0; if (!allow_nohz) return 1; tick_nohz_switch_to_nohz(); return 0; } How often does it make it to that last check? Probably we will reach here only once per boot per cpu. Hmm, looking at the code, I see it probably should still do the check. But still we need it for that one time. :) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On Wed, 9 Apr 2014 11:31:43 -0400 Steven Rostedt rost...@goodmis.org wrote: Hmm, looking at the code, I see it probably should still do the check. OK, nevermind ;-) Reading even more of the code, now I'm totally confused :-) When tick_setup_sched_timer() is called, if tick_nohz_enabled is set, then we set tick_nohz_active. This gets called by hrtimer_switch_to_hres(), and before that is called, the tick_check_oneshot_changed() will never get to the tick_nohz_switch_to_nohz() call. Looks to me, the real answer is to nuke both the if statement *and* the setting of the tick_nohz_active in that function. Both looks a bit redundant to me. -- Steve -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On 9 April 2014 21:09, Steven Rostedt rost...@goodmis.org wrote: Reading even more of the code, now I'm totally confused :-) :) When tick_setup_sched_timer() is called, if tick_nohz_enabled is set, then we set tick_nohz_active. correct. This gets called by hrtimer_switch_to_hres(), and before that is called, the tick_check_oneshot_changed() will never get to the tick_nohz_switch_to_nohz() call. If hrtimer_switch_to_hres() is called or HRES is enabled, we will never ever call tick_nohz_switch_to_nohz(). Looks to me, the real answer is to nuke both the if statement *and* the setting of the tick_nohz_active in that function. Both looks a bit redundant to me. When HRES isn't enabled and NOHZ isn't enabled as well, in that case we should stick to the periodic code from tick-common.c and the oneshot options of tick_nohz_switch_to_nohz() or hrtimer_switch_to_hres() shouldn't be used. And so, we still need those checks, as per my understanding. :) Lets see what others have for this discussion :) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On Wed, 9 Apr 2014 21:26:51 +0530 Viresh Kumar viresh.ku...@linaro.org wrote: When HRES isn't enabled and NOHZ isn't enabled as well, in that case we should stick to the periodic code from tick-common.c and the oneshot options of tick_nohz_switch_to_nohz() or hrtimer_switch_to_hres() shouldn't be used. And so, we still need those checks, as per my understanding. :) The one thing we can agree on is that the current code is wrong :-) Lets see what others have for this discussion :) Yeah, those that actually wrote this code should have a say in this. -- Steve -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On Wed, Nov 13, 2013 at 09:01:57PM +0100, Thomas Gleixner wrote: > On Wed, 13 Nov 2013, Paul E. McKenney wrote: > > On Wed, Nov 13, 2013 at 11:23:38AM -0500, Steven Rostedt wrote: > > > On Wed, 13 Nov 2013 08:18:29 -0800 > > > "Paul E. McKenney" wrote: > > > > > > > On Wed, Nov 13, 2013 at 11:12:57AM -0500, Steven Rostedt wrote: > > > > > On Wed, 13 Nov 2013 17:07:18 +0100 (CET) > > > > > Thomas Gleixner wrote: > > > > > > > > > > > > > > > > Right. It's telling you if NOHZ is enabled. It's not telling you > > > > > > that > > > > > > NOHZ is active. > > > > > > > > > > Yeah, which makes this code rather silly: > > > > > > > > > > in rcu_prepare_for_idle(): > > > > > > > > > > /* Handle nohz enablement switches conservatively. */ > > > > > tne = ACCESS_ONCE(tick_nohz_enabled); > > > > > if (tne != rdtp->tick_nohz_enabled_snap) { > > > > > if (rcu_cpu_has_callbacks(cpu, NULL)) > > > > > invoke_rcu_core(); /* force nohz to see update. > > > > > */ > > > > > rdtp->tick_nohz_enabled_snap = tne; > > > > > return; > > > > > } > > > > > > > > OK, what should I be checking instead? Not much point in trying to > > > > get RCU out of the way of disabling the scheduling-clock interrupt > > > > if NOHZ is disabled. ;-) > > > > > > I'll leave the answer to Thomas, but checking tick_nohz_enabled just > > > lets you know if someone booted with nohz=off or not (and has nohz > > > configured). But it doesn't tell you if nohz is actually being used. > > > > Based on Thomas's most recent response, it sounds like I need to check > > a frozen shark or something. ;-) > > Correct. Fix for the whole mess below. > > Thanks, > > tglx > > > > Subject: NOHZ: Check for nohz active instead of nohz enabled > > RCU and the fine grained idle time accounting functions check > tick_nohz_enabled. But that variable is merily telling that NOHZ has > been enabled in the config and not been disabled on the command line. > > But it does not tell anything about nohz being active. That's what all > this should check for. > > Matthew reported, that the idle accounting on his old P1 machine > showed bogus values, when he enabled NOHZ in the config and did not > disable it on the kernel command line. The reason is that his machine > uses (refined) jiffies as a clocksource which explains why the "fine" > grained accounting went into lala land, because it depends on when the > system goes and leaves idle relative to the jiffies increment. > > Provide a tick_nohz_active indicator and let RCU and the accounting > code use this instead of tick_nohz_enable. > > Reported-by: Matthew Whitehead > Signed-off-by: Thomas Gleixner Reviewed-by: Paul E. McKenney > --- > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h > index 3822ac0..da6e6de 100644 > --- a/kernel/rcu/tree_plugin.h > +++ b/kernel/rcu/tree_plugin.h > @@ -1632,7 +1632,7 @@ module_param(rcu_idle_gp_delay, int, 0644); > static int rcu_idle_lazy_gp_delay = RCU_IDLE_LAZY_GP_DELAY; > module_param(rcu_idle_lazy_gp_delay, int, 0644); > > -extern int tick_nohz_enabled; > +extern int tick_nohz_active; > > /* > * Try to advance callbacks for all flavors of RCU on the current CPU, but > @@ -1729,7 +1729,7 @@ static void rcu_prepare_for_idle(int cpu) > int tne; > > /* Handle nohz enablement switches conservatively. */ > - tne = ACCESS_ONCE(tick_nohz_enabled); > + tne = ACCESS_ONCE(tick_nohz_active); > if (tne != rdtp->tick_nohz_enabled_snap) { > if (rcu_cpu_has_callbacks(cpu, NULL)) > invoke_rcu_core(); /* force nohz to see update. */ > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c > index 3612fc7..a12df5a 100644 > --- a/kernel/time/tick-sched.c > +++ b/kernel/time/tick-sched.c > @@ -361,8 +361,8 @@ void __init tick_nohz_init(void) > /* > * NO HZ enabled ? > */ > -int tick_nohz_enabled __read_mostly = 1; > - > +static int tick_nohz_enabled __read_mostly = 1; > +int tick_nohz_active __read_mostly; > /* > * Enable / Disable tickless mode > */ > @@ -465,7 +465,7 @@ u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time) > struct tick_sched *ts = _cpu(tick_cpu_sched, cpu); > ktime_t now, idle; > > - if (!tick_nohz_enabled) > + if (!tick_nohz_active) > return -1; > > now = ktime_get(); > @@ -506,7 +506,7 @@ u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time) > struct tick_sched *ts = _cpu(tick_cpu_sched, cpu); > ktime_t now, iowait; > > - if (!tick_nohz_enabled) > + if (!tick_nohz_active) > return -1; > > now = ktime_get(); > @@ -799,11 +799,6 @@ void tick_nohz_idle_enter(void) > local_irq_disable(); > > ts = &__get_cpu_var(tick_cpu_sched); > - /* > - * set ts->inidle unconditionally. even if the system did not > - * switch to nohz mode the cpu
Re: nohz problem with idle time on old hardware
On Wed, Nov 13, 2013 at 09:01:57PM +0100, Thomas Gleixner wrote: On Wed, 13 Nov 2013, Paul E. McKenney wrote: On Wed, Nov 13, 2013 at 11:23:38AM -0500, Steven Rostedt wrote: On Wed, 13 Nov 2013 08:18:29 -0800 Paul E. McKenney paul...@linux.vnet.ibm.com wrote: On Wed, Nov 13, 2013 at 11:12:57AM -0500, Steven Rostedt wrote: On Wed, 13 Nov 2013 17:07:18 +0100 (CET) Thomas Gleixner t...@linutronix.de wrote: Right. It's telling you if NOHZ is enabled. It's not telling you that NOHZ is active. Yeah, which makes this code rather silly: in rcu_prepare_for_idle(): /* Handle nohz enablement switches conservatively. */ tne = ACCESS_ONCE(tick_nohz_enabled); if (tne != rdtp-tick_nohz_enabled_snap) { if (rcu_cpu_has_callbacks(cpu, NULL)) invoke_rcu_core(); /* force nohz to see update. */ rdtp-tick_nohz_enabled_snap = tne; return; } OK, what should I be checking instead? Not much point in trying to get RCU out of the way of disabling the scheduling-clock interrupt if NOHZ is disabled. ;-) I'll leave the answer to Thomas, but checking tick_nohz_enabled just lets you know if someone booted with nohz=off or not (and has nohz configured). But it doesn't tell you if nohz is actually being used. Based on Thomas's most recent response, it sounds like I need to check a frozen shark or something. ;-) Correct. Fix for the whole mess below. Thanks, tglx Subject: NOHZ: Check for nohz active instead of nohz enabled RCU and the fine grained idle time accounting functions check tick_nohz_enabled. But that variable is merily telling that NOHZ has been enabled in the config and not been disabled on the command line. But it does not tell anything about nohz being active. That's what all this should check for. Matthew reported, that the idle accounting on his old P1 machine showed bogus values, when he enabled NOHZ in the config and did not disable it on the kernel command line. The reason is that his machine uses (refined) jiffies as a clocksource which explains why the fine grained accounting went into lala land, because it depends on when the system goes and leaves idle relative to the jiffies increment. Provide a tick_nohz_active indicator and let RCU and the accounting code use this instead of tick_nohz_enable. Reported-by: Matthew Whitehead tedheads...@gmail.com Signed-off-by: Thomas Gleixner t...@linutronix.de Reviewed-by: Paul E. McKenney paul...@linux.vnet.ibm.com --- diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index 3822ac0..da6e6de 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -1632,7 +1632,7 @@ module_param(rcu_idle_gp_delay, int, 0644); static int rcu_idle_lazy_gp_delay = RCU_IDLE_LAZY_GP_DELAY; module_param(rcu_idle_lazy_gp_delay, int, 0644); -extern int tick_nohz_enabled; +extern int tick_nohz_active; /* * Try to advance callbacks for all flavors of RCU on the current CPU, but @@ -1729,7 +1729,7 @@ static void rcu_prepare_for_idle(int cpu) int tne; /* Handle nohz enablement switches conservatively. */ - tne = ACCESS_ONCE(tick_nohz_enabled); + tne = ACCESS_ONCE(tick_nohz_active); if (tne != rdtp-tick_nohz_enabled_snap) { if (rcu_cpu_has_callbacks(cpu, NULL)) invoke_rcu_core(); /* force nohz to see update. */ diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index 3612fc7..a12df5a 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -361,8 +361,8 @@ void __init tick_nohz_init(void) /* * NO HZ enabled ? */ -int tick_nohz_enabled __read_mostly = 1; - +static int tick_nohz_enabled __read_mostly = 1; +int tick_nohz_active __read_mostly; /* * Enable / Disable tickless mode */ @@ -465,7 +465,7 @@ u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time) struct tick_sched *ts = per_cpu(tick_cpu_sched, cpu); ktime_t now, idle; - if (!tick_nohz_enabled) + if (!tick_nohz_active) return -1; now = ktime_get(); @@ -506,7 +506,7 @@ u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time) struct tick_sched *ts = per_cpu(tick_cpu_sched, cpu); ktime_t now, iowait; - if (!tick_nohz_enabled) + if (!tick_nohz_active) return -1; now = ktime_get(); @@ -799,11 +799,6 @@ void tick_nohz_idle_enter(void) local_irq_disable(); ts = __get_cpu_var(tick_cpu_sched); - /* - * set ts-inidle unconditionally. even if the system did not - * switch to nohz mode the cpu frequency governers rely on the - * update of the idle time accounting in tick_nohz_start_idle(). - */ ts-inidle = 1;
Re: nohz problem with idle time on old hardware
On Wed, Nov 13, 2013 at 03:07:45PM -0500, Steven Rostedt wrote: > On Wed, 13 Nov 2013 21:01:57 +0100 (CET) > Thomas Gleixner wrote: > > > > > > > Subject: NOHZ: Check for nohz active instead of nohz enabled > > > > RCU and the fine grained idle time accounting functions check > > tick_nohz_enabled. But that variable is merily telling that NOHZ has > > been enabled in the config and not been disabled on the command line. > > > > But it does not tell anything about nohz being active. That's what all > > this should check for. > > > > Matthew reported, that the idle accounting on his old P1 machine > > showed bogus values, when he enabled NOHZ in the config and did not > > disable it on the kernel command line. The reason is that his machine > > uses (refined) jiffies as a clocksource which explains why the "fine" > > grained accounting went into lala land, because it depends on when the > > system goes and leaves idle relative to the jiffies increment. > > > > Provide a tick_nohz_active indicator and let RCU and the accounting > > code use this instead of tick_nohz_enable. > > > > Reported-by: Matthew Whitehead > > Matthew, can you test this patch to make sure it causes the idle issue > to go away (remember to remove nohz=off from your command line). > Early testing looks good: # Verify I haven't overridden nohz as a kernel parameter root@host:~# grep nohz /proc/cmdline # See how my kernel chose its clocksource root@host:~# dmesg | egrep "TSC|clocksource" [0.00] tsc: Fast TSC calibration using PIT [2.492468] Switched to clocksource refined-jiffies [ 24.197356] Switched to clocksource tsc [ 26.211058] Switched to clocksource refined-jiffies # Check idle time since the reboot root@host:~# sar -s 16:28:00 -P ALL Linux 3.12.0-rc6+ (host) 11/13/2013 _i586_ (2 CPU) 04:28:26 PM LINUX RESTART 04:32:02 PM CPU %user %nice %system %iowait%steal %idle 04:34:02 PM all 4.98 0.00 1.03 0.04 0.00 93.96 04:34:02 PM 0 4.07 0.00 0.69 0.03 0.00 95.21 04:34:02 PM 1 5.88 0.00 1.37 0.06 0.00 92.70 04:36:01 PM all 4.12 0.00 1.13 0.02 0.00 94.73 04:36:01 PM 0 1.98 0.00 1.34 0.02 0.00 96.66 04:36:01 PM 1 6.28 0.00 0.91 0.03 0.00 92.79 04:38:01 PM all 0.39 0.00 0.65 0.05 0.00 98.90 04:38:01 PM 0 0.33 0.00 0.54 0.03 0.00 99.09 04:38:01 PM 1 0.45 0.00 0.75 0.07 0.00 98.73 04:40:01 PM all 0.14 0.00 0.40 0.02 0.00 99.44 04:40:01 PM 0 0.12 0.00 0.26 0.01 0.00 99.62 04:40:01 PM 1 0.16 0.00 0.55 0.02 0.00 99.27 04:42:01 PM all 0.13 0.00 0.28 0.01 0.00 99.58 04:42:01 PM 0 0.09 0.00 0.14 0.01 0.00 99.76 04:42:01 PM 1 0.16 0.00 0.43 0.02 0.00 99.40 Average:all 1.95 0.00 0.70 0.03 0.00 97.33 Average: 0 1.32 0.00 0.59 0.02 0.00 98.07 Average: 1 2.58 0.00 0.80 0.04 0.00 96.58 I will continue to test but I think this is good. Tested-by: Matthew Whitehead - Matthew W > Reviewed-by: Steven Rostedt > > -- Steve > > > Signed-off-by: Thomas Gleixner > > --- > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h > > index 3822ac0..da6e6de 100644 > > --- a/kernel/rcu/tree_plugin.h > > +++ b/kernel/rcu/tree_plugin.h > > @@ -1632,7 +1632,7 @@ module_param(rcu_idle_gp_delay, int, 0644); > > static int rcu_idle_lazy_gp_delay = RCU_IDLE_LAZY_GP_DELAY; > > module_param(rcu_idle_lazy_gp_delay, int, 0644); > > > > -extern int tick_nohz_enabled; > > +extern int tick_nohz_active; > > > > /* > > * Try to advance callbacks for all flavors of RCU on the current CPU, but > > @@ -1729,7 +1729,7 @@ static void rcu_prepare_for_idle(int cpu) > > int tne; > > > > /* Handle nohz enablement switches conservatively. */ > > - tne = ACCESS_ONCE(tick_nohz_enabled); > > + tne = ACCESS_ONCE(tick_nohz_active); > > if (tne != rdtp->tick_nohz_enabled_snap) { > > if (rcu_cpu_has_callbacks(cpu, NULL)) > > invoke_rcu_core(); /* force nohz to see update. */ > > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c > > index 3612fc7..a12df5a 100644 > > --- a/kernel/time/tick-sched.c > > +++ b/kernel/time/tick-sched.c > > @@ -361,8 +361,8 @@ void __init tick_nohz_init(void) > > /* > > * NO HZ enabled ? > > */ > > -int tick_nohz_enabled __read_mostly = 1; > > - > > +static int tick_nohz_enabled __read_mostly = 1; > > +int tick_nohz_active __read_mostly; > > /* > > * Enable / Disable
Re: nohz problem with idle time on old hardware
On Wed, 13 Nov 2013 21:01:57 +0100 (CET) Thomas Gleixner wrote: > > > Subject: NOHZ: Check for nohz active instead of nohz enabled > > RCU and the fine grained idle time accounting functions check > tick_nohz_enabled. But that variable is merily telling that NOHZ has > been enabled in the config and not been disabled on the command line. > > But it does not tell anything about nohz being active. That's what all > this should check for. > > Matthew reported, that the idle accounting on his old P1 machine > showed bogus values, when he enabled NOHZ in the config and did not > disable it on the kernel command line. The reason is that his machine > uses (refined) jiffies as a clocksource which explains why the "fine" > grained accounting went into lala land, because it depends on when the > system goes and leaves idle relative to the jiffies increment. > > Provide a tick_nohz_active indicator and let RCU and the accounting > code use this instead of tick_nohz_enable. > > Reported-by: Matthew Whitehead Matthew, can you test this patch to make sure it causes the idle issue to go away (remember to remove nohz=off from your command line). Reviewed-by: Steven Rostedt -- Steve > Signed-off-by: Thomas Gleixner > --- > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h > index 3822ac0..da6e6de 100644 > --- a/kernel/rcu/tree_plugin.h > +++ b/kernel/rcu/tree_plugin.h > @@ -1632,7 +1632,7 @@ module_param(rcu_idle_gp_delay, int, 0644); > static int rcu_idle_lazy_gp_delay = RCU_IDLE_LAZY_GP_DELAY; > module_param(rcu_idle_lazy_gp_delay, int, 0644); > > -extern int tick_nohz_enabled; > +extern int tick_nohz_active; > > /* > * Try to advance callbacks for all flavors of RCU on the current CPU, but > @@ -1729,7 +1729,7 @@ static void rcu_prepare_for_idle(int cpu) > int tne; > > /* Handle nohz enablement switches conservatively. */ > - tne = ACCESS_ONCE(tick_nohz_enabled); > + tne = ACCESS_ONCE(tick_nohz_active); > if (tne != rdtp->tick_nohz_enabled_snap) { > if (rcu_cpu_has_callbacks(cpu, NULL)) > invoke_rcu_core(); /* force nohz to see update. */ > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c > index 3612fc7..a12df5a 100644 > --- a/kernel/time/tick-sched.c > +++ b/kernel/time/tick-sched.c > @@ -361,8 +361,8 @@ void __init tick_nohz_init(void) > /* > * NO HZ enabled ? > */ > -int tick_nohz_enabled __read_mostly = 1; > - > +static int tick_nohz_enabled __read_mostly = 1; > +int tick_nohz_active __read_mostly; > /* > * Enable / Disable tickless mode > */ > @@ -465,7 +465,7 @@ u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time) > struct tick_sched *ts = _cpu(tick_cpu_sched, cpu); > ktime_t now, idle; > > - if (!tick_nohz_enabled) > + if (!tick_nohz_active) > return -1; > > now = ktime_get(); > @@ -506,7 +506,7 @@ u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time) > struct tick_sched *ts = _cpu(tick_cpu_sched, cpu); > ktime_t now, iowait; > > - if (!tick_nohz_enabled) > + if (!tick_nohz_active) > return -1; > > now = ktime_get(); > @@ -799,11 +799,6 @@ void tick_nohz_idle_enter(void) > local_irq_disable(); > > ts = &__get_cpu_var(tick_cpu_sched); > - /* > - * set ts->inidle unconditionally. even if the system did not > - * switch to nohz mode the cpu frequency governers rely on the > - * update of the idle time accounting in tick_nohz_start_idle(). > - */ > ts->inidle = 1; > __tick_nohz_idle_enter(ts); > > @@ -973,7 +968,7 @@ static void tick_nohz_switch_to_nohz(void) > struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched); > ktime_t next; > > - if (!tick_nohz_enabled) > + if (!tick_nohz_active) > return; > > local_irq_disable(); > @@ -981,7 +976,7 @@ static void tick_nohz_switch_to_nohz(void) > local_irq_enable(); > return; > } > - > + tick_nohz_active = 1; > ts->nohz_mode = NOHZ_MODE_LOWRES; > > /* > @@ -1139,8 +1134,10 @@ void tick_setup_sched_timer(void) > } > > #ifdef CONFIG_NO_HZ_COMMON > - if (tick_nohz_enabled) > + if (tick_nohz_enabled) { > ts->nohz_mode = NOHZ_MODE_HIGHRES; > + tick_nohz_active = 1; > + } > #endif > } > #endif /* HIGH_RES_TIMERS */ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On Wed, 13 Nov 2013, Paul E. McKenney wrote: > On Wed, Nov 13, 2013 at 11:23:38AM -0500, Steven Rostedt wrote: > > On Wed, 13 Nov 2013 08:18:29 -0800 > > "Paul E. McKenney" wrote: > > > > > On Wed, Nov 13, 2013 at 11:12:57AM -0500, Steven Rostedt wrote: > > > > On Wed, 13 Nov 2013 17:07:18 +0100 (CET) > > > > Thomas Gleixner wrote: > > > > > > > > > > > > > Right. It's telling you if NOHZ is enabled. It's not telling you that > > > > > NOHZ is active. > > > > > > > > Yeah, which makes this code rather silly: > > > > > > > > in rcu_prepare_for_idle(): > > > > > > > > /* Handle nohz enablement switches conservatively. */ > > > > tne = ACCESS_ONCE(tick_nohz_enabled); > > > > if (tne != rdtp->tick_nohz_enabled_snap) { > > > > if (rcu_cpu_has_callbacks(cpu, NULL)) > > > > invoke_rcu_core(); /* force nohz to see update. > > > > */ > > > > rdtp->tick_nohz_enabled_snap = tne; > > > > return; > > > > } > > > > > > OK, what should I be checking instead? Not much point in trying to > > > get RCU out of the way of disabling the scheduling-clock interrupt > > > if NOHZ is disabled. ;-) > > > > I'll leave the answer to Thomas, but checking tick_nohz_enabled just > > lets you know if someone booted with nohz=off or not (and has nohz > > configured). But it doesn't tell you if nohz is actually being used. > > Based on Thomas's most recent response, it sounds like I need to check > a frozen shark or something. ;-) Correct. Fix for the whole mess below. Thanks, tglx > Subject: NOHZ: Check for nohz active instead of nohz enabled RCU and the fine grained idle time accounting functions check tick_nohz_enabled. But that variable is merily telling that NOHZ has been enabled in the config and not been disabled on the command line. But it does not tell anything about nohz being active. That's what all this should check for. Matthew reported, that the idle accounting on his old P1 machine showed bogus values, when he enabled NOHZ in the config and did not disable it on the kernel command line. The reason is that his machine uses (refined) jiffies as a clocksource which explains why the "fine" grained accounting went into lala land, because it depends on when the system goes and leaves idle relative to the jiffies increment. Provide a tick_nohz_active indicator and let RCU and the accounting code use this instead of tick_nohz_enable. Reported-by: Matthew Whitehead Signed-off-by: Thomas Gleixner --- diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index 3822ac0..da6e6de 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -1632,7 +1632,7 @@ module_param(rcu_idle_gp_delay, int, 0644); static int rcu_idle_lazy_gp_delay = RCU_IDLE_LAZY_GP_DELAY; module_param(rcu_idle_lazy_gp_delay, int, 0644); -extern int tick_nohz_enabled; +extern int tick_nohz_active; /* * Try to advance callbacks for all flavors of RCU on the current CPU, but @@ -1729,7 +1729,7 @@ static void rcu_prepare_for_idle(int cpu) int tne; /* Handle nohz enablement switches conservatively. */ - tne = ACCESS_ONCE(tick_nohz_enabled); + tne = ACCESS_ONCE(tick_nohz_active); if (tne != rdtp->tick_nohz_enabled_snap) { if (rcu_cpu_has_callbacks(cpu, NULL)) invoke_rcu_core(); /* force nohz to see update. */ diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index 3612fc7..a12df5a 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -361,8 +361,8 @@ void __init tick_nohz_init(void) /* * NO HZ enabled ? */ -int tick_nohz_enabled __read_mostly = 1; - +static int tick_nohz_enabled __read_mostly = 1; +int tick_nohz_active __read_mostly; /* * Enable / Disable tickless mode */ @@ -465,7 +465,7 @@ u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time) struct tick_sched *ts = _cpu(tick_cpu_sched, cpu); ktime_t now, idle; - if (!tick_nohz_enabled) + if (!tick_nohz_active) return -1; now = ktime_get(); @@ -506,7 +506,7 @@ u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time) struct tick_sched *ts = _cpu(tick_cpu_sched, cpu); ktime_t now, iowait; - if (!tick_nohz_enabled) + if (!tick_nohz_active) return -1; now = ktime_get(); @@ -799,11 +799,6 @@ void tick_nohz_idle_enter(void) local_irq_disable(); ts = &__get_cpu_var(tick_cpu_sched); - /* -* set ts->inidle unconditionally. even if the system did not -* switch to nohz mode the cpu frequency governers rely on the -* update of the idle time accounting in tick_nohz_start_idle(). -*/ ts->inidle = 1; __tick_nohz_idle_enter(ts); @@ -973,7 +968,7 @@ static void tick_nohz_switch_to_nohz(void) struct tick_sched *ts =
Re: nohz problem with idle time on old hardware
On Wed, Nov 13, 2013 at 11:23:38AM -0500, Steven Rostedt wrote: > On Wed, 13 Nov 2013 08:18:29 -0800 > "Paul E. McKenney" wrote: > > > On Wed, Nov 13, 2013 at 11:12:57AM -0500, Steven Rostedt wrote: > > > On Wed, 13 Nov 2013 17:07:18 +0100 (CET) > > > Thomas Gleixner wrote: > > > > > > > > > > Right. It's telling you if NOHZ is enabled. It's not telling you that > > > > NOHZ is active. > > > > > > Yeah, which makes this code rather silly: > > > > > > in rcu_prepare_for_idle(): > > > > > > /* Handle nohz enablement switches conservatively. */ > > > tne = ACCESS_ONCE(tick_nohz_enabled); > > > if (tne != rdtp->tick_nohz_enabled_snap) { > > > if (rcu_cpu_has_callbacks(cpu, NULL)) > > > invoke_rcu_core(); /* force nohz to see update. */ > > > rdtp->tick_nohz_enabled_snap = tne; > > > return; > > > } > > > > OK, what should I be checking instead? Not much point in trying to > > get RCU out of the way of disabling the scheduling-clock interrupt > > if NOHZ is disabled. ;-) > > I'll leave the answer to Thomas, but checking tick_nohz_enabled just > lets you know if someone booted with nohz=off or not (and has nohz > configured). But it doesn't tell you if nohz is actually being used. Based on Thomas's most recent response, it sounds like I need to check a frozen shark or something. ;-) Thanx, Paul > That is, tick_nohz_enabled is set at bootup and never changes. > > Perhaps this old hardware uncovered other bugs as well ;-) > > -- Steve > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On Wed, 13 Nov 2013 17:21:53 +0100 (CET) Thomas Gleixner wrote: > Frozen shark time As long as it's not aimed at me ;-) -- Steve -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On Wed, 13 Nov 2013 08:18:29 -0800 "Paul E. McKenney" wrote: > On Wed, Nov 13, 2013 at 11:12:57AM -0500, Steven Rostedt wrote: > > On Wed, 13 Nov 2013 17:07:18 +0100 (CET) > > Thomas Gleixner wrote: > > > > > > > Right. It's telling you if NOHZ is enabled. It's not telling you that > > > NOHZ is active. > > > > Yeah, which makes this code rather silly: > > > > in rcu_prepare_for_idle(): > > > > /* Handle nohz enablement switches conservatively. */ > > tne = ACCESS_ONCE(tick_nohz_enabled); > > if (tne != rdtp->tick_nohz_enabled_snap) { > > if (rcu_cpu_has_callbacks(cpu, NULL)) > > invoke_rcu_core(); /* force nohz to see update. */ > > rdtp->tick_nohz_enabled_snap = tne; > > return; > > } > > OK, what should I be checking instead? Not much point in trying to > get RCU out of the way of disabling the scheduling-clock interrupt > if NOHZ is disabled. ;-) > I'll leave the answer to Thomas, but checking tick_nohz_enabled just lets you know if someone booted with nohz=off or not (and has nohz configured). But it doesn't tell you if nohz is actually being used. That is, tick_nohz_enabled is set at bootup and never changes. Perhaps this old hardware uncovered other bugs as well ;-) -- Steve -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On Wed, 13 Nov 2013, Thomas Gleixner wrote: > On Wed, 13 Nov 2013, Steven Rostedt wrote: > > > On Wed, 13 Nov 2013 10:31:34 -0500 > > Steven Rostedt wrote: > > > > > The trace does indeed show that a tick is happening, as the config has > > > HZ=250 (4ms) and we see a tick happen every 4ms. But for some reason, > > > we don't update the the idle time correctly when nohz is enabled. > > > > > > When I say nohz is enabled, I mean that we don't have nohz=off in the > > > command line. There seems to be some difference between having nohz=off > > > and having nohz disabled at runtime. > > > > > > Looking at the differences between nohz=off from the command line, and > > disabled at run time seems to be the variable "tick_nohz_enabled". I > > don't see where it gets set to zero except for nohz=off. > > Right. It's telling you if NOHZ is enabled. It's not telling you that > NOHZ is active. And its even worse: tick_nohz_idle_enter() /* * set ts->inidle unconditionally. even if the system did not * switch to nohz mode the cpu frequency governers rely on the * update of the idle time accounting in tick_nohz_start_idle(). */ ts->inidle = 1; So there is code relying on that accounting stuff even if nohz is not active. Frozen shark time -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On Wed, Nov 13, 2013 at 11:12:57AM -0500, Steven Rostedt wrote: > On Wed, 13 Nov 2013 17:07:18 +0100 (CET) > Thomas Gleixner wrote: > > > > Right. It's telling you if NOHZ is enabled. It's not telling you that > > NOHZ is active. > > Yeah, which makes this code rather silly: > > in rcu_prepare_for_idle(): > > /* Handle nohz enablement switches conservatively. */ > tne = ACCESS_ONCE(tick_nohz_enabled); > if (tne != rdtp->tick_nohz_enabled_snap) { > if (rcu_cpu_has_callbacks(cpu, NULL)) > invoke_rcu_core(); /* force nohz to see update. */ > rdtp->tick_nohz_enabled_snap = tne; > return; > } OK, what should I be checking instead? Not much point in trying to get RCU out of the way of disabling the scheduling-clock interrupt if NOHZ is disabled. ;-) Thanx, Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On Wed, Nov 13, 2013 at 04:50:20PM +0100, Thomas Gleixner wrote: > On Wed, 13 Nov 2013, Steven Rostedt wrote: > > > I'm not saying that we are actually getting into nohz, but something > > > with the nohz code is messing with cpu accounting. > > > > The trace does indeed show that a tick is happening, as the config has > > HZ=250 (4ms) and we see a tick happen every 4ms. But for some reason, > > we don't update the the idle time correctly when nohz is enabled. > > > > When I say nohz is enabled, I mean that we don't have nohz=off in the > > command line. There seems to be some difference between having nohz=off > > and having nohz disabled at runtime. > > Right that affects tick_nohz_enabled > > Two files use this variable: > kernel/rcu/tree_plugin.h > kernel/time/tick-sched.c > > The only accounting related stuff is in tick-sched.c: > > get_cpu_idle_time_us() and get_cpu_iowait_time_us() > > Both functions bail out if (!tick_nohz_enabled). > > The users of get_cpu_idle_time_us() are cpufreq and fs/proc/stat.c! > > Now the simplest fix is to let those functions check whether we > actually switched into NOHZ mode. Should work for the RCU tree stuff > as well. RCU's use of tick_nohz_enabled is for the RCU_FAST_NO_HZ stuff. If it sees !tick_nohz_enabled, it skips trying to get RCU out of the way of disabling the scheduling-clock tick. If RCU detects a change in the value of tick_nohz_enabled, it does a raise_softirq() to force re-evaluation of the situation. Thanx, Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On Wed, 13 Nov 2013 17:07:18 +0100 (CET) Thomas Gleixner wrote: > Right. It's telling you if NOHZ is enabled. It's not telling you that > NOHZ is active. Yeah, which makes this code rather silly: in rcu_prepare_for_idle(): /* Handle nohz enablement switches conservatively. */ tne = ACCESS_ONCE(tick_nohz_enabled); if (tne != rdtp->tick_nohz_enabled_snap) { if (rcu_cpu_has_callbacks(cpu, NULL)) invoke_rcu_core(); /* force nohz to see update. */ rdtp->tick_nohz_enabled_snap = tne; return; } -- Steve -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On Wed, 13 Nov 2013, Steven Rostedt wrote: > On Wed, 13 Nov 2013 10:31:34 -0500 > Steven Rostedt wrote: > > > The trace does indeed show that a tick is happening, as the config has > > HZ=250 (4ms) and we see a tick happen every 4ms. But for some reason, > > we don't update the the idle time correctly when nohz is enabled. > > > > When I say nohz is enabled, I mean that we don't have nohz=off in the > > command line. There seems to be some difference between having nohz=off > > and having nohz disabled at runtime. > > > Looking at the differences between nohz=off from the command line, and > disabled at run time seems to be the variable "tick_nohz_enabled". I > don't see where it gets set to zero except for nohz=off. Right. It's telling you if NOHZ is enabled. It's not telling you that NOHZ is active. Thanks, tglx -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On Wed, 13 Nov 2013 10:31:34 -0500 Steven Rostedt wrote: > The trace does indeed show that a tick is happening, as the config has > HZ=250 (4ms) and we see a tick happen every 4ms. But for some reason, > we don't update the the idle time correctly when nohz is enabled. > > When I say nohz is enabled, I mean that we don't have nohz=off in the > command line. There seems to be some difference between having nohz=off > and having nohz disabled at runtime. Looking at the differences between nohz=off from the command line, and disabled at run time seems to be the variable "tick_nohz_enabled". I don't see where it gets set to zero except for nohz=off. $ git grep tick_nohz_enabled kernel/rcutree.h: int tick_nohz_enabled_snap; /* Previously seen value from sysfs. */ kernel/rcutree_plugin.h:extern int tick_nohz_enabled; kernel/rcutree_plugin.h:tne = ACCESS_ONCE(tick_nohz_enabled); kernel/rcutree_plugin.h:if (tne != rdtp->tick_nohz_enabled_snap) { kernel/rcutree_plugin.h:rdtp->tick_nohz_enabled_snap = tne; kernel/rcutree_plugin.h: rdtp->tick_nohz_enabled_snap ? '.' : 'D'); kernel/time/tick-sched.c:int tick_nohz_enabled __read_mostly = 1; kernel/time/tick-sched.c: tick_nohz_enabled = 0; kernel/time/tick-sched.c: tick_nohz_enabled = 1; kernel/time/tick-sched.c: if (!tick_nohz_enabled) kernel/time/tick-sched.c: if (!tick_nohz_enabled) kernel/time/tick-sched.c: if (!tick_nohz_enabled) kernel/time/tick-sched.c: if (tick_nohz_enabled) What's even stranger is that the RCU code in rcutree_plugin.h does an ACCESS_ONCE(tick_nohz_enabled) as if it can change. That said, looking at the fs/proc/stat.c get_idle_time() it does an idle_time = get_cpu_idle_time_us(cpu, NULL) which has: u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time) { struct tick_sched *ts = _cpu(tick_cpu_sched, cpu); ktime_t now, idle; if (!tick_nohz_enabled) return -1; now = ktime_get(); if (last_update_time) { update_ts_time_stats(cpu, ts, now, last_update_time); idle = ts->idle_sleeptime; } else { if (ts->idle_active && !nr_iowait_cpu(cpu)) { ktime_t delta = ktime_sub(now, ts->idle_entrytime); idle = ktime_add(ts->idle_sleeptime, delta); } else { idle = ts->idle_sleeptime; } } return ktime_to_us(idle); } This is one of the differences between nohz=off and nohz=on with jiffy accounting. When we have nohz=off, this returns -1 and the calling function calculates the idle time differently. -- Steve -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On Wed, 13 Nov 2013, Steven Rostedt wrote: > > I'm not saying that we are actually getting into nohz, but something > > with the nohz code is messing with cpu accounting. > > The trace does indeed show that a tick is happening, as the config has > HZ=250 (4ms) and we see a tick happen every 4ms. But for some reason, > we don't update the the idle time correctly when nohz is enabled. > > When I say nohz is enabled, I mean that we don't have nohz=off in the > command line. There seems to be some difference between having nohz=off > and having nohz disabled at runtime. Right that affects tick_nohz_enabled Two files use this variable: kernel/rcu/tree_plugin.h kernel/time/tick-sched.c The only accounting related stuff is in tick-sched.c: get_cpu_idle_time_us() and get_cpu_iowait_time_us() Both functions bail out if (!tick_nohz_enabled). The users of get_cpu_idle_time_us() are cpufreq and fs/proc/stat.c! Now the simplest fix is to let those functions check whether we actually switched into NOHZ mode. Should work for the RCU tree stuff as well. Thanks, tglx -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On Wed, 13 Nov 2013 10:21:53 -0500 Steven Rostedt wrote: trace-cmd record -p function -l '*nohz*' -l account_process_tick -e sched_switch > > rcu_sche-9 0d... 6858.618033: sched_switch: rcu_sched:9 [120] > S ==> swapper/0:0 [120] > -0 0 6858.618082: function: > tick_nohz_idle_enter <-- cpu_startup_entry > -0 0d... 6858.618101: function: > __tick_nohz_idle_enter <-- tick_nohz_idle_enter > -0 0d.s. 6858.621499: function: > tick_nohz_stop_idle <-- tick_check_idle > -0 0d.h. 6858.621550: function: > account_process_tick <-- update_process_times > -0 0d... 6858.621769: function: tick_nohz_irq_exit > <-- irq_exit > -0 0d... 6858.621787: function: > __tick_nohz_idle_enter <-- irq_exit > -0 0d.s. 6858.625500: function: > tick_nohz_stop_idle <-- tick_check_idle > -0 0d.h. 6858.625574: function: > account_process_tick <-- update_process_times > -0 0d... 6858.625818: function: tick_nohz_irq_exit > <-- irq_exit > -0 0d... 6858.625847: function: > __tick_nohz_idle_enter <-- irq_exit > -0 0d.s. 6858.629295: function: > tick_nohz_stop_idle <-- tick_check_idle > -0 0d... 6858.629351: function: tick_nohz_irq_exit > <-- irq_exit > -0 0d... 6858.629373: function: > __tick_nohz_idle_enter <-- irq_exit > -0 0d.s. 6858.629503: function: > tick_nohz_stop_idle <-- tick_check_idle > -0 0d.h. 6858.629569: function: > account_process_tick <-- update_process_times > -0 0d... 6858.629851: function: tick_nohz_irq_exit > <-- irq_exit > -0 0d... 6858.629881: function: > __tick_nohz_idle_enter <-- irq_exit > -0 0d.s. 6858.630412: function: > tick_nohz_stop_idle <-- tick_check_idle > -0 0.N.. 6858.630550: function: > tick_nohz_idle_exit <-- cpu_startup_entry > -0 0d... 6858.630605: sched_switch: swapper/0:0 [120] > R ==> rcu_sched:9 [120] > > I'm not saying that we are actually getting into nohz, but something > with the nohz code is messing with cpu accounting. The trace does indeed show that a tick is happening, as the config has HZ=250 (4ms) and we see a tick happen every 4ms. But for some reason, we don't update the the idle time correctly when nohz is enabled. When I say nohz is enabled, I mean that we don't have nohz=off in the command line. There seems to be some difference between having nohz=off and having nohz disabled at runtime. -- Steve -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On Wed, 13 Nov 2013, Matthew Whitehead wrote: > I was testing the 3.12 kernel on some _old_ hardware and I uncovered a bug. > It arises when nohz=on and goes away with nohz=off. On a crusty dual > Pentium-1 > system that is completely idle, the sar utility reports 0% idle time on cpu0 > and 100% idle on cpu1. Cpu0 _should_ also be reporting 100% idle, but instead > it reports around 75% system time and 25% user time. > > The problem was diagnosed by Steve Rostedt with help from John > Stultz. The old system declares the dual TSCs unstable, and backs > down to a timesource of refined-jiffies. Apparently refined-jiffies > and jiffies are not a usable timesourcefor nohz, but we don't check > for that case because most modern systems have several reliable > hardware timesources. Wrong. > John suggested that we turn off nohz unless a usable hardware timesource is > present. nohz already depends on two things: 1) A reliable clocksource which is valid for highres/nohz 2) A per cpu clockevent device which supports one shot mode. and those are evaluated at runtime before we switch into NOHZ mode. And neither jiffies nor refined-jiffies qualify as valid clocksource. So there is something else wrong. Thanks, tglx -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On Wed, 13 Nov 2013, Matthew Whitehead wrote: I was testing the 3.12 kernel on some _old_ hardware and I uncovered a bug. It arises when nohz=on and goes away with nohz=off. On a crusty dual Pentium-1 system that is completely idle, the sar utility reports 0% idle time on cpu0 and 100% idle on cpu1. Cpu0 _should_ also be reporting 100% idle, but instead it reports around 75% system time and 25% user time. The problem was diagnosed by Steve Rostedt with help from John Stultz. The old system declares the dual TSCs unstable, and backs down to a timesource of refined-jiffies. Apparently refined-jiffies and jiffies are not a usable timesourcefor nohz, but we don't check for that case because most modern systems have several reliable hardware timesources. Wrong. John suggested that we turn off nohz unless a usable hardware timesource is present. nohz already depends on two things: 1) A reliable clocksource which is valid for highres/nohz 2) A per cpu clockevent device which supports one shot mode. and those are evaluated at runtime before we switch into NOHZ mode. And neither jiffies nor refined-jiffies qualify as valid clocksource. So there is something else wrong. Thanks, tglx -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On Wed, 13 Nov 2013 10:21:53 -0500 Steven Rostedt rost...@goodmis.org wrote: trace-cmd record -p function -l '*nohz*' -l account_process_tick -e sched_switch rcu_sche-9 0d... 6858.618033: sched_switch: rcu_sched:9 [120] S == swapper/0:0 [120] idle-0 0 6858.618082: function: tick_nohz_idle_enter -- cpu_startup_entry idle-0 0d... 6858.618101: function: __tick_nohz_idle_enter -- tick_nohz_idle_enter idle-0 0d.s. 6858.621499: function: tick_nohz_stop_idle -- tick_check_idle idle-0 0d.h. 6858.621550: function: account_process_tick -- update_process_times idle-0 0d... 6858.621769: function: tick_nohz_irq_exit -- irq_exit idle-0 0d... 6858.621787: function: __tick_nohz_idle_enter -- irq_exit idle-0 0d.s. 6858.625500: function: tick_nohz_stop_idle -- tick_check_idle idle-0 0d.h. 6858.625574: function: account_process_tick -- update_process_times idle-0 0d... 6858.625818: function: tick_nohz_irq_exit -- irq_exit idle-0 0d... 6858.625847: function: __tick_nohz_idle_enter -- irq_exit idle-0 0d.s. 6858.629295: function: tick_nohz_stop_idle -- tick_check_idle idle-0 0d... 6858.629351: function: tick_nohz_irq_exit -- irq_exit idle-0 0d... 6858.629373: function: __tick_nohz_idle_enter -- irq_exit idle-0 0d.s. 6858.629503: function: tick_nohz_stop_idle -- tick_check_idle idle-0 0d.h. 6858.629569: function: account_process_tick -- update_process_times idle-0 0d... 6858.629851: function: tick_nohz_irq_exit -- irq_exit idle-0 0d... 6858.629881: function: __tick_nohz_idle_enter -- irq_exit idle-0 0d.s. 6858.630412: function: tick_nohz_stop_idle -- tick_check_idle idle-0 0.N.. 6858.630550: function: tick_nohz_idle_exit -- cpu_startup_entry idle-0 0d... 6858.630605: sched_switch: swapper/0:0 [120] R == rcu_sched:9 [120] I'm not saying that we are actually getting into nohz, but something with the nohz code is messing with cpu accounting. The trace does indeed show that a tick is happening, as the config has HZ=250 (4ms) and we see a tick happen every 4ms. But for some reason, we don't update the the idle time correctly when nohz is enabled. When I say nohz is enabled, I mean that we don't have nohz=off in the command line. There seems to be some difference between having nohz=off and having nohz disabled at runtime. -- Steve -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On Wed, 13 Nov 2013, Steven Rostedt wrote: I'm not saying that we are actually getting into nohz, but something with the nohz code is messing with cpu accounting. The trace does indeed show that a tick is happening, as the config has HZ=250 (4ms) and we see a tick happen every 4ms. But for some reason, we don't update the the idle time correctly when nohz is enabled. When I say nohz is enabled, I mean that we don't have nohz=off in the command line. There seems to be some difference between having nohz=off and having nohz disabled at runtime. Right that affects tick_nohz_enabled Two files use this variable: kernel/rcu/tree_plugin.h kernel/time/tick-sched.c The only accounting related stuff is in tick-sched.c: get_cpu_idle_time_us() and get_cpu_iowait_time_us() Both functions bail out if (!tick_nohz_enabled). The users of get_cpu_idle_time_us() are cpufreq and fs/proc/stat.c! Now the simplest fix is to let those functions check whether we actually switched into NOHZ mode. Should work for the RCU tree stuff as well. Thanks, tglx -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On Wed, 13 Nov 2013 10:31:34 -0500 Steven Rostedt rost...@goodmis.org wrote: The trace does indeed show that a tick is happening, as the config has HZ=250 (4ms) and we see a tick happen every 4ms. But for some reason, we don't update the the idle time correctly when nohz is enabled. When I say nohz is enabled, I mean that we don't have nohz=off in the command line. There seems to be some difference between having nohz=off and having nohz disabled at runtime. Looking at the differences between nohz=off from the command line, and disabled at run time seems to be the variable tick_nohz_enabled. I don't see where it gets set to zero except for nohz=off. $ git grep tick_nohz_enabled kernel/rcutree.h: int tick_nohz_enabled_snap; /* Previously seen value from sysfs. */ kernel/rcutree_plugin.h:extern int tick_nohz_enabled; kernel/rcutree_plugin.h:tne = ACCESS_ONCE(tick_nohz_enabled); kernel/rcutree_plugin.h:if (tne != rdtp-tick_nohz_enabled_snap) { kernel/rcutree_plugin.h:rdtp-tick_nohz_enabled_snap = tne; kernel/rcutree_plugin.h: rdtp-tick_nohz_enabled_snap ? '.' : 'D'); kernel/time/tick-sched.c:int tick_nohz_enabled __read_mostly = 1; kernel/time/tick-sched.c: tick_nohz_enabled = 0; kernel/time/tick-sched.c: tick_nohz_enabled = 1; kernel/time/tick-sched.c: if (!tick_nohz_enabled) kernel/time/tick-sched.c: if (!tick_nohz_enabled) kernel/time/tick-sched.c: if (!tick_nohz_enabled) kernel/time/tick-sched.c: if (tick_nohz_enabled) What's even stranger is that the RCU code in rcutree_plugin.h does an ACCESS_ONCE(tick_nohz_enabled) as if it can change. That said, looking at the fs/proc/stat.c get_idle_time() it does an idle_time = get_cpu_idle_time_us(cpu, NULL) which has: u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time) { struct tick_sched *ts = per_cpu(tick_cpu_sched, cpu); ktime_t now, idle; if (!tick_nohz_enabled) return -1; now = ktime_get(); if (last_update_time) { update_ts_time_stats(cpu, ts, now, last_update_time); idle = ts-idle_sleeptime; } else { if (ts-idle_active !nr_iowait_cpu(cpu)) { ktime_t delta = ktime_sub(now, ts-idle_entrytime); idle = ktime_add(ts-idle_sleeptime, delta); } else { idle = ts-idle_sleeptime; } } return ktime_to_us(idle); } This is one of the differences between nohz=off and nohz=on with jiffy accounting. When we have nohz=off, this returns -1 and the calling function calculates the idle time differently. -- Steve -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On Wed, 13 Nov 2013, Steven Rostedt wrote: On Wed, 13 Nov 2013 10:31:34 -0500 Steven Rostedt rost...@goodmis.org wrote: The trace does indeed show that a tick is happening, as the config has HZ=250 (4ms) and we see a tick happen every 4ms. But for some reason, we don't update the the idle time correctly when nohz is enabled. When I say nohz is enabled, I mean that we don't have nohz=off in the command line. There seems to be some difference between having nohz=off and having nohz disabled at runtime. Looking at the differences between nohz=off from the command line, and disabled at run time seems to be the variable tick_nohz_enabled. I don't see where it gets set to zero except for nohz=off. Right. It's telling you if NOHZ is enabled. It's not telling you that NOHZ is active. Thanks, tglx -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On Wed, 13 Nov 2013 17:07:18 +0100 (CET) Thomas Gleixner t...@linutronix.de wrote: Right. It's telling you if NOHZ is enabled. It's not telling you that NOHZ is active. Yeah, which makes this code rather silly: in rcu_prepare_for_idle(): /* Handle nohz enablement switches conservatively. */ tne = ACCESS_ONCE(tick_nohz_enabled); if (tne != rdtp-tick_nohz_enabled_snap) { if (rcu_cpu_has_callbacks(cpu, NULL)) invoke_rcu_core(); /* force nohz to see update. */ rdtp-tick_nohz_enabled_snap = tne; return; } -- Steve -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On Wed, Nov 13, 2013 at 04:50:20PM +0100, Thomas Gleixner wrote: On Wed, 13 Nov 2013, Steven Rostedt wrote: I'm not saying that we are actually getting into nohz, but something with the nohz code is messing with cpu accounting. The trace does indeed show that a tick is happening, as the config has HZ=250 (4ms) and we see a tick happen every 4ms. But for some reason, we don't update the the idle time correctly when nohz is enabled. When I say nohz is enabled, I mean that we don't have nohz=off in the command line. There seems to be some difference between having nohz=off and having nohz disabled at runtime. Right that affects tick_nohz_enabled Two files use this variable: kernel/rcu/tree_plugin.h kernel/time/tick-sched.c The only accounting related stuff is in tick-sched.c: get_cpu_idle_time_us() and get_cpu_iowait_time_us() Both functions bail out if (!tick_nohz_enabled). The users of get_cpu_idle_time_us() are cpufreq and fs/proc/stat.c! Now the simplest fix is to let those functions check whether we actually switched into NOHZ mode. Should work for the RCU tree stuff as well. RCU's use of tick_nohz_enabled is for the RCU_FAST_NO_HZ stuff. If it sees !tick_nohz_enabled, it skips trying to get RCU out of the way of disabling the scheduling-clock tick. If RCU detects a change in the value of tick_nohz_enabled, it does a raise_softirq() to force re-evaluation of the situation. Thanx, Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On Wed, Nov 13, 2013 at 11:12:57AM -0500, Steven Rostedt wrote: On Wed, 13 Nov 2013 17:07:18 +0100 (CET) Thomas Gleixner t...@linutronix.de wrote: Right. It's telling you if NOHZ is enabled. It's not telling you that NOHZ is active. Yeah, which makes this code rather silly: in rcu_prepare_for_idle(): /* Handle nohz enablement switches conservatively. */ tne = ACCESS_ONCE(tick_nohz_enabled); if (tne != rdtp-tick_nohz_enabled_snap) { if (rcu_cpu_has_callbacks(cpu, NULL)) invoke_rcu_core(); /* force nohz to see update. */ rdtp-tick_nohz_enabled_snap = tne; return; } OK, what should I be checking instead? Not much point in trying to get RCU out of the way of disabling the scheduling-clock interrupt if NOHZ is disabled. ;-) Thanx, Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On Wed, 13 Nov 2013, Thomas Gleixner wrote: On Wed, 13 Nov 2013, Steven Rostedt wrote: On Wed, 13 Nov 2013 10:31:34 -0500 Steven Rostedt rost...@goodmis.org wrote: The trace does indeed show that a tick is happening, as the config has HZ=250 (4ms) and we see a tick happen every 4ms. But for some reason, we don't update the the idle time correctly when nohz is enabled. When I say nohz is enabled, I mean that we don't have nohz=off in the command line. There seems to be some difference between having nohz=off and having nohz disabled at runtime. Looking at the differences between nohz=off from the command line, and disabled at run time seems to be the variable tick_nohz_enabled. I don't see where it gets set to zero except for nohz=off. Right. It's telling you if NOHZ is enabled. It's not telling you that NOHZ is active. And its even worse: tick_nohz_idle_enter() /* * set ts-inidle unconditionally. even if the system did not * switch to nohz mode the cpu frequency governers rely on the * update of the idle time accounting in tick_nohz_start_idle(). */ ts-inidle = 1; So there is code relying on that accounting stuff even if nohz is not active. Frozen shark time -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On Wed, 13 Nov 2013 08:18:29 -0800 Paul E. McKenney paul...@linux.vnet.ibm.com wrote: On Wed, Nov 13, 2013 at 11:12:57AM -0500, Steven Rostedt wrote: On Wed, 13 Nov 2013 17:07:18 +0100 (CET) Thomas Gleixner t...@linutronix.de wrote: Right. It's telling you if NOHZ is enabled. It's not telling you that NOHZ is active. Yeah, which makes this code rather silly: in rcu_prepare_for_idle(): /* Handle nohz enablement switches conservatively. */ tne = ACCESS_ONCE(tick_nohz_enabled); if (tne != rdtp-tick_nohz_enabled_snap) { if (rcu_cpu_has_callbacks(cpu, NULL)) invoke_rcu_core(); /* force nohz to see update. */ rdtp-tick_nohz_enabled_snap = tne; return; } OK, what should I be checking instead? Not much point in trying to get RCU out of the way of disabling the scheduling-clock interrupt if NOHZ is disabled. ;-) I'll leave the answer to Thomas, but checking tick_nohz_enabled just lets you know if someone booted with nohz=off or not (and has nohz configured). But it doesn't tell you if nohz is actually being used. That is, tick_nohz_enabled is set at bootup and never changes. Perhaps this old hardware uncovered other bugs as well ;-) -- Steve -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On Wed, 13 Nov 2013 17:21:53 +0100 (CET) Thomas Gleixner t...@linutronix.de wrote: Frozen shark time As long as it's not aimed at me ;-) -- Steve -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On Wed, Nov 13, 2013 at 11:23:38AM -0500, Steven Rostedt wrote: On Wed, 13 Nov 2013 08:18:29 -0800 Paul E. McKenney paul...@linux.vnet.ibm.com wrote: On Wed, Nov 13, 2013 at 11:12:57AM -0500, Steven Rostedt wrote: On Wed, 13 Nov 2013 17:07:18 +0100 (CET) Thomas Gleixner t...@linutronix.de wrote: Right. It's telling you if NOHZ is enabled. It's not telling you that NOHZ is active. Yeah, which makes this code rather silly: in rcu_prepare_for_idle(): /* Handle nohz enablement switches conservatively. */ tne = ACCESS_ONCE(tick_nohz_enabled); if (tne != rdtp-tick_nohz_enabled_snap) { if (rcu_cpu_has_callbacks(cpu, NULL)) invoke_rcu_core(); /* force nohz to see update. */ rdtp-tick_nohz_enabled_snap = tne; return; } OK, what should I be checking instead? Not much point in trying to get RCU out of the way of disabling the scheduling-clock interrupt if NOHZ is disabled. ;-) I'll leave the answer to Thomas, but checking tick_nohz_enabled just lets you know if someone booted with nohz=off or not (and has nohz configured). But it doesn't tell you if nohz is actually being used. Based on Thomas's most recent response, it sounds like I need to check a frozen shark or something. ;-) Thanx, Paul That is, tick_nohz_enabled is set at bootup and never changes. Perhaps this old hardware uncovered other bugs as well ;-) -- Steve -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On Wed, 13 Nov 2013, Paul E. McKenney wrote: On Wed, Nov 13, 2013 at 11:23:38AM -0500, Steven Rostedt wrote: On Wed, 13 Nov 2013 08:18:29 -0800 Paul E. McKenney paul...@linux.vnet.ibm.com wrote: On Wed, Nov 13, 2013 at 11:12:57AM -0500, Steven Rostedt wrote: On Wed, 13 Nov 2013 17:07:18 +0100 (CET) Thomas Gleixner t...@linutronix.de wrote: Right. It's telling you if NOHZ is enabled. It's not telling you that NOHZ is active. Yeah, which makes this code rather silly: in rcu_prepare_for_idle(): /* Handle nohz enablement switches conservatively. */ tne = ACCESS_ONCE(tick_nohz_enabled); if (tne != rdtp-tick_nohz_enabled_snap) { if (rcu_cpu_has_callbacks(cpu, NULL)) invoke_rcu_core(); /* force nohz to see update. */ rdtp-tick_nohz_enabled_snap = tne; return; } OK, what should I be checking instead? Not much point in trying to get RCU out of the way of disabling the scheduling-clock interrupt if NOHZ is disabled. ;-) I'll leave the answer to Thomas, but checking tick_nohz_enabled just lets you know if someone booted with nohz=off or not (and has nohz configured). But it doesn't tell you if nohz is actually being used. Based on Thomas's most recent response, it sounds like I need to check a frozen shark or something. ;-) Correct. Fix for the whole mess below. Thanks, tglx Subject: NOHZ: Check for nohz active instead of nohz enabled RCU and the fine grained idle time accounting functions check tick_nohz_enabled. But that variable is merily telling that NOHZ has been enabled in the config and not been disabled on the command line. But it does not tell anything about nohz being active. That's what all this should check for. Matthew reported, that the idle accounting on his old P1 machine showed bogus values, when he enabled NOHZ in the config and did not disable it on the kernel command line. The reason is that his machine uses (refined) jiffies as a clocksource which explains why the fine grained accounting went into lala land, because it depends on when the system goes and leaves idle relative to the jiffies increment. Provide a tick_nohz_active indicator and let RCU and the accounting code use this instead of tick_nohz_enable. Reported-by: Matthew Whitehead tedheads...@gmail.com Signed-off-by: Thomas Gleixner t...@linutronix.de --- diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index 3822ac0..da6e6de 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -1632,7 +1632,7 @@ module_param(rcu_idle_gp_delay, int, 0644); static int rcu_idle_lazy_gp_delay = RCU_IDLE_LAZY_GP_DELAY; module_param(rcu_idle_lazy_gp_delay, int, 0644); -extern int tick_nohz_enabled; +extern int tick_nohz_active; /* * Try to advance callbacks for all flavors of RCU on the current CPU, but @@ -1729,7 +1729,7 @@ static void rcu_prepare_for_idle(int cpu) int tne; /* Handle nohz enablement switches conservatively. */ - tne = ACCESS_ONCE(tick_nohz_enabled); + tne = ACCESS_ONCE(tick_nohz_active); if (tne != rdtp-tick_nohz_enabled_snap) { if (rcu_cpu_has_callbacks(cpu, NULL)) invoke_rcu_core(); /* force nohz to see update. */ diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index 3612fc7..a12df5a 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -361,8 +361,8 @@ void __init tick_nohz_init(void) /* * NO HZ enabled ? */ -int tick_nohz_enabled __read_mostly = 1; - +static int tick_nohz_enabled __read_mostly = 1; +int tick_nohz_active __read_mostly; /* * Enable / Disable tickless mode */ @@ -465,7 +465,7 @@ u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time) struct tick_sched *ts = per_cpu(tick_cpu_sched, cpu); ktime_t now, idle; - if (!tick_nohz_enabled) + if (!tick_nohz_active) return -1; now = ktime_get(); @@ -506,7 +506,7 @@ u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time) struct tick_sched *ts = per_cpu(tick_cpu_sched, cpu); ktime_t now, iowait; - if (!tick_nohz_enabled) + if (!tick_nohz_active) return -1; now = ktime_get(); @@ -799,11 +799,6 @@ void tick_nohz_idle_enter(void) local_irq_disable(); ts = __get_cpu_var(tick_cpu_sched); - /* -* set ts-inidle unconditionally. even if the system did not -* switch to nohz mode the cpu frequency governers rely on the -* update of the idle time accounting in tick_nohz_start_idle(). -*/ ts-inidle = 1; __tick_nohz_idle_enter(ts); @@ -973,7 +968,7 @@ static void tick_nohz_switch_to_nohz(void) struct tick_sched *ts = __get_cpu_var(tick_cpu_sched); ktime_t next; -
Re: nohz problem with idle time on old hardware
On Wed, 13 Nov 2013 21:01:57 +0100 (CET) Thomas Gleixner t...@linutronix.de wrote: Subject: NOHZ: Check for nohz active instead of nohz enabled RCU and the fine grained idle time accounting functions check tick_nohz_enabled. But that variable is merily telling that NOHZ has been enabled in the config and not been disabled on the command line. But it does not tell anything about nohz being active. That's what all this should check for. Matthew reported, that the idle accounting on his old P1 machine showed bogus values, when he enabled NOHZ in the config and did not disable it on the kernel command line. The reason is that his machine uses (refined) jiffies as a clocksource which explains why the fine grained accounting went into lala land, because it depends on when the system goes and leaves idle relative to the jiffies increment. Provide a tick_nohz_active indicator and let RCU and the accounting code use this instead of tick_nohz_enable. Reported-by: Matthew Whitehead tedheads...@gmail.com Matthew, can you test this patch to make sure it causes the idle issue to go away (remember to remove nohz=off from your command line). Reviewed-by: Steven Rostedt rost...@goodmis.org -- Steve Signed-off-by: Thomas Gleixner t...@linutronix.de --- diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index 3822ac0..da6e6de 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -1632,7 +1632,7 @@ module_param(rcu_idle_gp_delay, int, 0644); static int rcu_idle_lazy_gp_delay = RCU_IDLE_LAZY_GP_DELAY; module_param(rcu_idle_lazy_gp_delay, int, 0644); -extern int tick_nohz_enabled; +extern int tick_nohz_active; /* * Try to advance callbacks for all flavors of RCU on the current CPU, but @@ -1729,7 +1729,7 @@ static void rcu_prepare_for_idle(int cpu) int tne; /* Handle nohz enablement switches conservatively. */ - tne = ACCESS_ONCE(tick_nohz_enabled); + tne = ACCESS_ONCE(tick_nohz_active); if (tne != rdtp-tick_nohz_enabled_snap) { if (rcu_cpu_has_callbacks(cpu, NULL)) invoke_rcu_core(); /* force nohz to see update. */ diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index 3612fc7..a12df5a 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -361,8 +361,8 @@ void __init tick_nohz_init(void) /* * NO HZ enabled ? */ -int tick_nohz_enabled __read_mostly = 1; - +static int tick_nohz_enabled __read_mostly = 1; +int tick_nohz_active __read_mostly; /* * Enable / Disable tickless mode */ @@ -465,7 +465,7 @@ u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time) struct tick_sched *ts = per_cpu(tick_cpu_sched, cpu); ktime_t now, idle; - if (!tick_nohz_enabled) + if (!tick_nohz_active) return -1; now = ktime_get(); @@ -506,7 +506,7 @@ u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time) struct tick_sched *ts = per_cpu(tick_cpu_sched, cpu); ktime_t now, iowait; - if (!tick_nohz_enabled) + if (!tick_nohz_active) return -1; now = ktime_get(); @@ -799,11 +799,6 @@ void tick_nohz_idle_enter(void) local_irq_disable(); ts = __get_cpu_var(tick_cpu_sched); - /* - * set ts-inidle unconditionally. even if the system did not - * switch to nohz mode the cpu frequency governers rely on the - * update of the idle time accounting in tick_nohz_start_idle(). - */ ts-inidle = 1; __tick_nohz_idle_enter(ts); @@ -973,7 +968,7 @@ static void tick_nohz_switch_to_nohz(void) struct tick_sched *ts = __get_cpu_var(tick_cpu_sched); ktime_t next; - if (!tick_nohz_enabled) + if (!tick_nohz_active) return; local_irq_disable(); @@ -981,7 +976,7 @@ static void tick_nohz_switch_to_nohz(void) local_irq_enable(); return; } - + tick_nohz_active = 1; ts-nohz_mode = NOHZ_MODE_LOWRES; /* @@ -1139,8 +1134,10 @@ void tick_setup_sched_timer(void) } #ifdef CONFIG_NO_HZ_COMMON - if (tick_nohz_enabled) + if (tick_nohz_enabled) { ts-nohz_mode = NOHZ_MODE_HIGHRES; + tick_nohz_active = 1; + } #endif } #endif /* HIGH_RES_TIMERS */ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nohz problem with idle time on old hardware
On Wed, Nov 13, 2013 at 03:07:45PM -0500, Steven Rostedt wrote: On Wed, 13 Nov 2013 21:01:57 +0100 (CET) Thomas Gleixner t...@linutronix.de wrote: Subject: NOHZ: Check for nohz active instead of nohz enabled RCU and the fine grained idle time accounting functions check tick_nohz_enabled. But that variable is merily telling that NOHZ has been enabled in the config and not been disabled on the command line. But it does not tell anything about nohz being active. That's what all this should check for. Matthew reported, that the idle accounting on his old P1 machine showed bogus values, when he enabled NOHZ in the config and did not disable it on the kernel command line. The reason is that his machine uses (refined) jiffies as a clocksource which explains why the fine grained accounting went into lala land, because it depends on when the system goes and leaves idle relative to the jiffies increment. Provide a tick_nohz_active indicator and let RCU and the accounting code use this instead of tick_nohz_enable. Reported-by: Matthew Whitehead tedheads...@gmail.com Matthew, can you test this patch to make sure it causes the idle issue to go away (remember to remove nohz=off from your command line). Early testing looks good: # Verify I haven't overridden nohz as a kernel parameter root@host:~# grep nohz /proc/cmdline # See how my kernel chose its clocksource root@host:~# dmesg | egrep TSC|clocksource [0.00] tsc: Fast TSC calibration using PIT [2.492468] Switched to clocksource refined-jiffies [ 24.197356] Switched to clocksource tsc [ 26.211058] Switched to clocksource refined-jiffies # Check idle time since the reboot root@host:~# sar -s 16:28:00 -P ALL Linux 3.12.0-rc6+ (host) 11/13/2013 _i586_ (2 CPU) 04:28:26 PM LINUX RESTART 04:32:02 PM CPU %user %nice %system %iowait%steal %idle 04:34:02 PM all 4.98 0.00 1.03 0.04 0.00 93.96 04:34:02 PM 0 4.07 0.00 0.69 0.03 0.00 95.21 04:34:02 PM 1 5.88 0.00 1.37 0.06 0.00 92.70 04:36:01 PM all 4.12 0.00 1.13 0.02 0.00 94.73 04:36:01 PM 0 1.98 0.00 1.34 0.02 0.00 96.66 04:36:01 PM 1 6.28 0.00 0.91 0.03 0.00 92.79 04:38:01 PM all 0.39 0.00 0.65 0.05 0.00 98.90 04:38:01 PM 0 0.33 0.00 0.54 0.03 0.00 99.09 04:38:01 PM 1 0.45 0.00 0.75 0.07 0.00 98.73 04:40:01 PM all 0.14 0.00 0.40 0.02 0.00 99.44 04:40:01 PM 0 0.12 0.00 0.26 0.01 0.00 99.62 04:40:01 PM 1 0.16 0.00 0.55 0.02 0.00 99.27 04:42:01 PM all 0.13 0.00 0.28 0.01 0.00 99.58 04:42:01 PM 0 0.09 0.00 0.14 0.01 0.00 99.76 04:42:01 PM 1 0.16 0.00 0.43 0.02 0.00 99.40 Average:all 1.95 0.00 0.70 0.03 0.00 97.33 Average: 0 1.32 0.00 0.59 0.02 0.00 98.07 Average: 1 2.58 0.00 0.80 0.04 0.00 96.58 I will continue to test but I think this is good. Tested-by: Matthew Whitehead tedheads...@gmail.com - Matthew W Reviewed-by: Steven Rostedt rost...@goodmis.org -- Steve Signed-off-by: Thomas Gleixner t...@linutronix.de --- diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index 3822ac0..da6e6de 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -1632,7 +1632,7 @@ module_param(rcu_idle_gp_delay, int, 0644); static int rcu_idle_lazy_gp_delay = RCU_IDLE_LAZY_GP_DELAY; module_param(rcu_idle_lazy_gp_delay, int, 0644); -extern int tick_nohz_enabled; +extern int tick_nohz_active; /* * Try to advance callbacks for all flavors of RCU on the current CPU, but @@ -1729,7 +1729,7 @@ static void rcu_prepare_for_idle(int cpu) int tne; /* Handle nohz enablement switches conservatively. */ - tne = ACCESS_ONCE(tick_nohz_enabled); + tne = ACCESS_ONCE(tick_nohz_active); if (tne != rdtp-tick_nohz_enabled_snap) { if (rcu_cpu_has_callbacks(cpu, NULL)) invoke_rcu_core(); /* force nohz to see update. */ diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index 3612fc7..a12df5a 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -361,8 +361,8 @@ void __init tick_nohz_init(void) /* * NO HZ enabled ? */ -int tick_nohz_enabled __read_mostly = 1; - +static int tick_nohz_enabled __read_mostly = 1; +int tick_nohz_active __read_mostly; /* * Enable / Disable tickless mode */ @@ -465,7 +465,7