Re: nohz problem with idle time on old hardware

2014-04-09 Thread Steven Rostedt
On Wed, 9 Apr 2014 21:26:51 +0530
Viresh Kumar  wrote:


> When HRES isn't enabled and NOHZ isn't enabled as well, in that
> case we should stick to the periodic code from tick-common.c and
> the oneshot options of tick_nohz_switch_to_nohz() or
> hrtimer_switch_to_hres() shouldn't be used. And so, we still need
> those checks, as per my understanding. :)

The one thing we can agree on is that the current code is wrong :-)

> 
> Lets see what others have for this discussion :)

Yeah, those that actually wrote this code should have a say in this.

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2014-04-09 Thread Viresh Kumar
On 9 April 2014 21:09, Steven Rostedt  wrote:
> Reading even more of the code, now I'm totally confused :-)

:)

> When tick_setup_sched_timer() is called, if tick_nohz_enabled is set,
> then we set tick_nohz_active.

correct.

> This gets called by hrtimer_switch_to_hres(), and before that is
> called, the tick_check_oneshot_changed() will never get to the
> tick_nohz_switch_to_nohz() call.

If hrtimer_switch_to_hres() is called or HRES is enabled, we will
never ever call tick_nohz_switch_to_nohz().

> Looks to me, the real answer is to nuke both the if statement *and* the
> setting of the tick_nohz_active in that function. Both looks a bit
> redundant to me.

When HRES isn't enabled and NOHZ isn't enabled as well, in that
case we should stick to the periodic code from tick-common.c and
the oneshot options of tick_nohz_switch_to_nohz() or
hrtimer_switch_to_hres() shouldn't be used. And so, we still need
those checks, as per my understanding. :)

Lets see what others have for this discussion :)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2014-04-09 Thread Steven Rostedt
On Wed, 9 Apr 2014 11:31:43 -0400
Steven Rostedt  wrote:

> 
> Hmm, looking at the code, I see it probably should still do the check.
> 
> OK, nevermind ;-)

Reading even more of the code, now I'm totally confused :-)

When tick_setup_sched_timer() is called, if tick_nohz_enabled is set,
then we set tick_nohz_active.

This gets called by hrtimer_switch_to_hres(), and before that is
called, the tick_check_oneshot_changed() will never get to the
tick_nohz_switch_to_nohz() call.

Looks to me, the real answer is to nuke both the if statement *and* the
setting of the tick_nohz_active in that function. Both looks a bit
redundant to me.

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2014-04-09 Thread Viresh Kumar
On 9 April 2014 21:01, Steven Rostedt  wrote:
>> Do we? This is only called by tick_check_oneshot_change() which has the
>> following:
>>
>> int tick_check_oneshot_change(int allow_nohz)
>> {
>>   struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
>>
>>   if (!test_and_clear_bit(0, >check_clocks))
>>   return 0;
>>
>>   if (ts->nohz_mode != NOHZ_MODE_INACTIVE)
>>   return 0;
>>
>>   if (!timekeeping_valid_for_hres() || !tick_is_oneshot_available())
>>   return 0;
>>
>>   if (!allow_nohz)
>>   return 1;
>>
>>   tick_nohz_switch_to_nohz();
>>   return 0;
>> }
>>
>> How often does it make it to that last check?

Probably we will reach here only once per boot per cpu.

> Hmm, looking at the code, I see it probably should still do the check.

But still we need it for that one time. :)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2014-04-09 Thread Steven Rostedt
On Wed, 9 Apr 2014 11:29:50 -0400
Steven Rostedt  wrote:

> On Wed, 9 Apr 2014 20:50:59 +0530
> Viresh Kumar  wrote:
> 
> > On 9 April 2014 20:01, Steven Rostedt  wrote:
> > > Ouch! You are correct, this part of the patch makes no sense. That's
> > > what I get for reviewing a patch and not looking at all the code around
> > > the changes. (another kernel developer hangs head in shame :-( )
> > >
> > > I think that if statement should be nuked.
> > 
> > Hmm, my opinion differs here :)
> > 
> > If we completely remove this statement, we will run
> > tick_nohz_switch_to_nohz() even if nohz is not enabled. And check for
> > enabled must stay.
> 
> Do we? This is only called by tick_check_oneshot_change() which has the
> following:
> 
> int tick_check_oneshot_change(int allow_nohz)
> {
>   struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
> 
>   if (!test_and_clear_bit(0, >check_clocks))
>   return 0;
> 
>   if (ts->nohz_mode != NOHZ_MODE_INACTIVE)
>   return 0;
> 
>   if (!timekeeping_valid_for_hres() || !tick_is_oneshot_available())
>   return 0;
> 
>   if (!allow_nohz)
>   return 1;
> 
>   tick_nohz_switch_to_nohz();
>   return 0;
> }
> 
> How often does it make it to that last check?


Hmm, looking at the code, I see it probably should still do the check.

OK, nevermind ;-)

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2014-04-09 Thread Steven Rostedt
On Wed, 9 Apr 2014 20:50:59 +0530
Viresh Kumar  wrote:

> On 9 April 2014 20:01, Steven Rostedt  wrote:
> > Ouch! You are correct, this part of the patch makes no sense. That's
> > what I get for reviewing a patch and not looking at all the code around
> > the changes. (another kernel developer hangs head in shame :-( )
> >
> > I think that if statement should be nuked.
> 
> Hmm, my opinion differs here :)
> 
> If we completely remove this statement, we will run
> tick_nohz_switch_to_nohz() even if nohz is not enabled. And check for
> enabled must stay.

Do we? This is only called by tick_check_oneshot_change() which has the
following:

int tick_check_oneshot_change(int allow_nohz)
{
struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);

if (!test_and_clear_bit(0, >check_clocks))
return 0;

if (ts->nohz_mode != NOHZ_MODE_INACTIVE)
return 0;

if (!timekeeping_valid_for_hres() || !tick_is_oneshot_available())
return 0;

if (!allow_nohz)
return 1;

tick_nohz_switch_to_nohz();
return 0;
}

How often does it make it to that last check?

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2014-04-09 Thread Viresh Kumar
On 9 April 2014 20:01, Steven Rostedt  wrote:
> Ouch! You are correct, this part of the patch makes no sense. That's
> what I get for reviewing a patch and not looking at all the code around
> the changes. (another kernel developer hangs head in shame :-( )
>
> I think that if statement should be nuked.

Hmm, my opinion differs here :)

If we completely remove this statement, we will run
tick_nohz_switch_to_nohz() even if nohz is not enabled. And check for
enabled must stay.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2014-04-09 Thread Frederic Weisbecker
On Wed, Apr 09, 2014 at 07:21:53PM +0530, Viresh Kumar wrote:
> On Thu, Nov 14, 2013 at 1:31 AM, Thomas Gleixner  wrote:
> > Subject: NOHZ: Check for nohz active instead of nohz enabled
> >
> > RCU and the fine grained idle time accounting functions check
> > tick_nohz_enabled. But that variable is merily telling that NOHZ has
> > been enabled in the config and not been disabled on the command line.
> >
> > But it does not tell anything about nohz being active. That's what all
> > this should check for.
> >
> > Matthew reported, that the idle accounting on his old P1 machine
> > showed bogus values, when he enabled NOHZ in the config and did not
> > disable it on the kernel command line. The reason is that his machine
> > uses (refined) jiffies as a clocksource which explains why the "fine"
> > grained accounting went into lala land, because it depends on when the
> > system goes and leaves idle relative to the jiffies increment.
> >
> > Provide a tick_nohz_active indicator and let RCU and the accounting
> > code use this instead of tick_nohz_enable.
> 
> > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > @@ -973,7 +968,7 @@ static void tick_nohz_switch_to_nohz(void)
> > struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
> > ktime_t next;
> >
> > -   if (!tick_nohz_enabled)
> > +   if (!tick_nohz_active)
> > return;
> 
> Considering the impressive list of Reviewed-by and people involved
> in this patch, I am not sure I am reading the code well here.
> 
> The above change isn't required as per my understanding. Otherwise
> we will never pass that check. tick_nohz_active is initialized as zero
> and so we will keep on returning for ever and wouldn't be able to set
> it to 1 ever.
> 
> I have a patch to fix it up, but wanted to know your opinion before
> sending it.

Ouch, bad thing.

I'm also disovering this whole thread and patch only today, since nobody Cc'ed 
me :-(
 
> > local_irq_disable();
> > @@ -981,7 +976,7 @@ static void tick_nohz_switch_to_nohz(void)
> > local_irq_enable();
> > return;
> > }
> > -
> > +   tick_nohz_active = 1;
> > ts->nohz_mode = NOHZ_MODE_LOWRES;
> >
> > /*
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2014-04-09 Thread Steven Rostedt
On Wed, 9 Apr 2014 19:21:53 +0530
Viresh Kumar  wrote:

> On Thu, Nov 14, 2013 at 1:31 AM, Thomas Gleixner  wrote:
> > Subject: NOHZ: Check for nohz active instead of nohz enabled
> >
> > RCU and the fine grained idle time accounting functions check
> > tick_nohz_enabled. But that variable is merily telling that NOHZ has
> > been enabled in the config and not been disabled on the command line.
> >
> > But it does not tell anything about nohz being active. That's what all
> > this should check for.
> >
> > Matthew reported, that the idle accounting on his old P1 machine
> > showed bogus values, when he enabled NOHZ in the config and did not
> > disable it on the kernel command line. The reason is that his machine
> > uses (refined) jiffies as a clocksource which explains why the "fine"
> > grained accounting went into lala land, because it depends on when the
> > system goes and leaves idle relative to the jiffies increment.
> >
> > Provide a tick_nohz_active indicator and let RCU and the accounting
> > code use this instead of tick_nohz_enable.
> 
> > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > @@ -973,7 +968,7 @@ static void tick_nohz_switch_to_nohz(void)
> > struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
> > ktime_t next;
> >
> > -   if (!tick_nohz_enabled)
> > +   if (!tick_nohz_active)
> > return;
> 
> Considering the impressive list of Reviewed-by and people involved
> in this patch, I am not sure I am reading the code well here.
> 
> The above change isn't required as per my understanding. Otherwise
> we will never pass that check. tick_nohz_active is initialized as zero
> and so we will keep on returning for ever and wouldn't be able to set
> it to 1 ever.
> 
> I have a patch to fix it up, but wanted to know your opinion before
> sending it.

Ouch! You are correct, this part of the patch makes no sense. That's
what I get for reviewing a patch and not looking at all the code around
the changes. (another kernel developer hangs head in shame :-( )

I think that if statement should be nuked.

-- Steve

> 
> > local_irq_disable();
> > @@ -981,7 +976,7 @@ static void tick_nohz_switch_to_nohz(void)
> > local_irq_enable();
> > return;
> > }
> > -
> > +   tick_nohz_active = 1;
> > ts->nohz_mode = NOHZ_MODE_LOWRES;
> >
> > /*

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2014-04-09 Thread Viresh Kumar
On Thu, Nov 14, 2013 at 1:31 AM, Thomas Gleixner  wrote:
> Subject: NOHZ: Check for nohz active instead of nohz enabled
>
> RCU and the fine grained idle time accounting functions check
> tick_nohz_enabled. But that variable is merily telling that NOHZ has
> been enabled in the config and not been disabled on the command line.
>
> But it does not tell anything about nohz being active. That's what all
> this should check for.
>
> Matthew reported, that the idle accounting on his old P1 machine
> showed bogus values, when he enabled NOHZ in the config and did not
> disable it on the kernel command line. The reason is that his machine
> uses (refined) jiffies as a clocksource which explains why the "fine"
> grained accounting went into lala land, because it depends on when the
> system goes and leaves idle relative to the jiffies increment.
>
> Provide a tick_nohz_active indicator and let RCU and the accounting
> code use this instead of tick_nohz_enable.

> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> @@ -973,7 +968,7 @@ static void tick_nohz_switch_to_nohz(void)
> struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
> ktime_t next;
>
> -   if (!tick_nohz_enabled)
> +   if (!tick_nohz_active)
> return;

Considering the impressive list of Reviewed-by and people involved
in this patch, I am not sure I am reading the code well here.

The above change isn't required as per my understanding. Otherwise
we will never pass that check. tick_nohz_active is initialized as zero
and so we will keep on returning for ever and wouldn't be able to set
it to 1 ever.

I have a patch to fix it up, but wanted to know your opinion before
sending it.

> local_irq_disable();
> @@ -981,7 +976,7 @@ static void tick_nohz_switch_to_nohz(void)
> local_irq_enable();
> return;
> }
> -
> +   tick_nohz_active = 1;
> ts->nohz_mode = NOHZ_MODE_LOWRES;
>
> /*
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2014-04-09 Thread Viresh Kumar
On Thu, Nov 14, 2013 at 1:31 AM, Thomas Gleixner t...@linutronix.de wrote:
 Subject: NOHZ: Check for nohz active instead of nohz enabled

 RCU and the fine grained idle time accounting functions check
 tick_nohz_enabled. But that variable is merily telling that NOHZ has
 been enabled in the config and not been disabled on the command line.

 But it does not tell anything about nohz being active. That's what all
 this should check for.

 Matthew reported, that the idle accounting on his old P1 machine
 showed bogus values, when he enabled NOHZ in the config and did not
 disable it on the kernel command line. The reason is that his machine
 uses (refined) jiffies as a clocksource which explains why the fine
 grained accounting went into lala land, because it depends on when the
 system goes and leaves idle relative to the jiffies increment.

 Provide a tick_nohz_active indicator and let RCU and the accounting
 code use this instead of tick_nohz_enable.

 diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
 @@ -973,7 +968,7 @@ static void tick_nohz_switch_to_nohz(void)
 struct tick_sched *ts = __get_cpu_var(tick_cpu_sched);
 ktime_t next;

 -   if (!tick_nohz_enabled)
 +   if (!tick_nohz_active)
 return;

Considering the impressive list of Reviewed-by and people involved
in this patch, I am not sure I am reading the code well here.

The above change isn't required as per my understanding. Otherwise
we will never pass that check. tick_nohz_active is initialized as zero
and so we will keep on returning for ever and wouldn't be able to set
it to 1 ever.

I have a patch to fix it up, but wanted to know your opinion before
sending it.

 local_irq_disable();
 @@ -981,7 +976,7 @@ static void tick_nohz_switch_to_nohz(void)
 local_irq_enable();
 return;
 }
 -
 +   tick_nohz_active = 1;
 ts-nohz_mode = NOHZ_MODE_LOWRES;

 /*
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2014-04-09 Thread Steven Rostedt
On Wed, 9 Apr 2014 19:21:53 +0530
Viresh Kumar viresh.ku...@linaro.org wrote:

 On Thu, Nov 14, 2013 at 1:31 AM, Thomas Gleixner t...@linutronix.de wrote:
  Subject: NOHZ: Check for nohz active instead of nohz enabled
 
  RCU and the fine grained idle time accounting functions check
  tick_nohz_enabled. But that variable is merily telling that NOHZ has
  been enabled in the config and not been disabled on the command line.
 
  But it does not tell anything about nohz being active. That's what all
  this should check for.
 
  Matthew reported, that the idle accounting on his old P1 machine
  showed bogus values, when he enabled NOHZ in the config and did not
  disable it on the kernel command line. The reason is that his machine
  uses (refined) jiffies as a clocksource which explains why the fine
  grained accounting went into lala land, because it depends on when the
  system goes and leaves idle relative to the jiffies increment.
 
  Provide a tick_nohz_active indicator and let RCU and the accounting
  code use this instead of tick_nohz_enable.
 
  diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
  @@ -973,7 +968,7 @@ static void tick_nohz_switch_to_nohz(void)
  struct tick_sched *ts = __get_cpu_var(tick_cpu_sched);
  ktime_t next;
 
  -   if (!tick_nohz_enabled)
  +   if (!tick_nohz_active)
  return;
 
 Considering the impressive list of Reviewed-by and people involved
 in this patch, I am not sure I am reading the code well here.
 
 The above change isn't required as per my understanding. Otherwise
 we will never pass that check. tick_nohz_active is initialized as zero
 and so we will keep on returning for ever and wouldn't be able to set
 it to 1 ever.
 
 I have a patch to fix it up, but wanted to know your opinion before
 sending it.

Ouch! You are correct, this part of the patch makes no sense. That's
what I get for reviewing a patch and not looking at all the code around
the changes. (another kernel developer hangs head in shame :-( )

I think that if statement should be nuked.

-- Steve

 
  local_irq_disable();
  @@ -981,7 +976,7 @@ static void tick_nohz_switch_to_nohz(void)
  local_irq_enable();
  return;
  }
  -
  +   tick_nohz_active = 1;
  ts-nohz_mode = NOHZ_MODE_LOWRES;
 
  /*

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2014-04-09 Thread Frederic Weisbecker
On Wed, Apr 09, 2014 at 07:21:53PM +0530, Viresh Kumar wrote:
 On Thu, Nov 14, 2013 at 1:31 AM, Thomas Gleixner t...@linutronix.de wrote:
  Subject: NOHZ: Check for nohz active instead of nohz enabled
 
  RCU and the fine grained idle time accounting functions check
  tick_nohz_enabled. But that variable is merily telling that NOHZ has
  been enabled in the config and not been disabled on the command line.
 
  But it does not tell anything about nohz being active. That's what all
  this should check for.
 
  Matthew reported, that the idle accounting on his old P1 machine
  showed bogus values, when he enabled NOHZ in the config and did not
  disable it on the kernel command line. The reason is that his machine
  uses (refined) jiffies as a clocksource which explains why the fine
  grained accounting went into lala land, because it depends on when the
  system goes and leaves idle relative to the jiffies increment.
 
  Provide a tick_nohz_active indicator and let RCU and the accounting
  code use this instead of tick_nohz_enable.
 
  diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
  @@ -973,7 +968,7 @@ static void tick_nohz_switch_to_nohz(void)
  struct tick_sched *ts = __get_cpu_var(tick_cpu_sched);
  ktime_t next;
 
  -   if (!tick_nohz_enabled)
  +   if (!tick_nohz_active)
  return;
 
 Considering the impressive list of Reviewed-by and people involved
 in this patch, I am not sure I am reading the code well here.
 
 The above change isn't required as per my understanding. Otherwise
 we will never pass that check. tick_nohz_active is initialized as zero
 and so we will keep on returning for ever and wouldn't be able to set
 it to 1 ever.
 
 I have a patch to fix it up, but wanted to know your opinion before
 sending it.

Ouch, bad thing.

I'm also disovering this whole thread and patch only today, since nobody Cc'ed 
me :-(
 
  local_irq_disable();
  @@ -981,7 +976,7 @@ static void tick_nohz_switch_to_nohz(void)
  local_irq_enable();
  return;
  }
  -
  +   tick_nohz_active = 1;
  ts-nohz_mode = NOHZ_MODE_LOWRES;
 
  /*
 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2014-04-09 Thread Viresh Kumar
On 9 April 2014 20:01, Steven Rostedt rost...@goodmis.org wrote:
 Ouch! You are correct, this part of the patch makes no sense. That's
 what I get for reviewing a patch and not looking at all the code around
 the changes. (another kernel developer hangs head in shame :-( )

 I think that if statement should be nuked.

Hmm, my opinion differs here :)

If we completely remove this statement, we will run
tick_nohz_switch_to_nohz() even if nohz is not enabled. And check for
enabled must stay.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2014-04-09 Thread Steven Rostedt
On Wed, 9 Apr 2014 20:50:59 +0530
Viresh Kumar viresh.ku...@linaro.org wrote:

 On 9 April 2014 20:01, Steven Rostedt rost...@goodmis.org wrote:
  Ouch! You are correct, this part of the patch makes no sense. That's
  what I get for reviewing a patch and not looking at all the code around
  the changes. (another kernel developer hangs head in shame :-( )
 
  I think that if statement should be nuked.
 
 Hmm, my opinion differs here :)
 
 If we completely remove this statement, we will run
 tick_nohz_switch_to_nohz() even if nohz is not enabled. And check for
 enabled must stay.

Do we? This is only called by tick_check_oneshot_change() which has the
following:

int tick_check_oneshot_change(int allow_nohz)
{
struct tick_sched *ts = __get_cpu_var(tick_cpu_sched);

if (!test_and_clear_bit(0, ts-check_clocks))
return 0;

if (ts-nohz_mode != NOHZ_MODE_INACTIVE)
return 0;

if (!timekeeping_valid_for_hres() || !tick_is_oneshot_available())
return 0;

if (!allow_nohz)
return 1;

tick_nohz_switch_to_nohz();
return 0;
}

How often does it make it to that last check?

-- Steve
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2014-04-09 Thread Steven Rostedt
On Wed, 9 Apr 2014 11:29:50 -0400
Steven Rostedt rost...@goodmis.org wrote:

 On Wed, 9 Apr 2014 20:50:59 +0530
 Viresh Kumar viresh.ku...@linaro.org wrote:
 
  On 9 April 2014 20:01, Steven Rostedt rost...@goodmis.org wrote:
   Ouch! You are correct, this part of the patch makes no sense. That's
   what I get for reviewing a patch and not looking at all the code around
   the changes. (another kernel developer hangs head in shame :-( )
  
   I think that if statement should be nuked.
  
  Hmm, my opinion differs here :)
  
  If we completely remove this statement, we will run
  tick_nohz_switch_to_nohz() even if nohz is not enabled. And check for
  enabled must stay.
 
 Do we? This is only called by tick_check_oneshot_change() which has the
 following:
 
 int tick_check_oneshot_change(int allow_nohz)
 {
   struct tick_sched *ts = __get_cpu_var(tick_cpu_sched);
 
   if (!test_and_clear_bit(0, ts-check_clocks))
   return 0;
 
   if (ts-nohz_mode != NOHZ_MODE_INACTIVE)
   return 0;
 
   if (!timekeeping_valid_for_hres() || !tick_is_oneshot_available())
   return 0;
 
   if (!allow_nohz)
   return 1;
 
   tick_nohz_switch_to_nohz();
   return 0;
 }
 
 How often does it make it to that last check?


Hmm, looking at the code, I see it probably should still do the check.

OK, nevermind ;-)

-- Steve
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2014-04-09 Thread Viresh Kumar
On 9 April 2014 21:01, Steven Rostedt rost...@goodmis.org wrote:
 Do we? This is only called by tick_check_oneshot_change() which has the
 following:

 int tick_check_oneshot_change(int allow_nohz)
 {
   struct tick_sched *ts = __get_cpu_var(tick_cpu_sched);

   if (!test_and_clear_bit(0, ts-check_clocks))
   return 0;

   if (ts-nohz_mode != NOHZ_MODE_INACTIVE)
   return 0;

   if (!timekeeping_valid_for_hres() || !tick_is_oneshot_available())
   return 0;

   if (!allow_nohz)
   return 1;

   tick_nohz_switch_to_nohz();
   return 0;
 }

 How often does it make it to that last check?

Probably we will reach here only once per boot per cpu.

 Hmm, looking at the code, I see it probably should still do the check.

But still we need it for that one time. :)
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2014-04-09 Thread Steven Rostedt
On Wed, 9 Apr 2014 11:31:43 -0400
Steven Rostedt rost...@goodmis.org wrote:

 
 Hmm, looking at the code, I see it probably should still do the check.
 
 OK, nevermind ;-)

Reading even more of the code, now I'm totally confused :-)

When tick_setup_sched_timer() is called, if tick_nohz_enabled is set,
then we set tick_nohz_active.

This gets called by hrtimer_switch_to_hres(), and before that is
called, the tick_check_oneshot_changed() will never get to the
tick_nohz_switch_to_nohz() call.

Looks to me, the real answer is to nuke both the if statement *and* the
setting of the tick_nohz_active in that function. Both looks a bit
redundant to me.

-- Steve
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2014-04-09 Thread Viresh Kumar
On 9 April 2014 21:09, Steven Rostedt rost...@goodmis.org wrote:
 Reading even more of the code, now I'm totally confused :-)

:)

 When tick_setup_sched_timer() is called, if tick_nohz_enabled is set,
 then we set tick_nohz_active.

correct.

 This gets called by hrtimer_switch_to_hres(), and before that is
 called, the tick_check_oneshot_changed() will never get to the
 tick_nohz_switch_to_nohz() call.

If hrtimer_switch_to_hres() is called or HRES is enabled, we will
never ever call tick_nohz_switch_to_nohz().

 Looks to me, the real answer is to nuke both the if statement *and* the
 setting of the tick_nohz_active in that function. Both looks a bit
 redundant to me.

When HRES isn't enabled and NOHZ isn't enabled as well, in that
case we should stick to the periodic code from tick-common.c and
the oneshot options of tick_nohz_switch_to_nohz() or
hrtimer_switch_to_hres() shouldn't be used. And so, we still need
those checks, as per my understanding. :)

Lets see what others have for this discussion :)
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2014-04-09 Thread Steven Rostedt
On Wed, 9 Apr 2014 21:26:51 +0530
Viresh Kumar viresh.ku...@linaro.org wrote:


 When HRES isn't enabled and NOHZ isn't enabled as well, in that
 case we should stick to the periodic code from tick-common.c and
 the oneshot options of tick_nohz_switch_to_nohz() or
 hrtimer_switch_to_hres() shouldn't be used. And so, we still need
 those checks, as per my understanding. :)

The one thing we can agree on is that the current code is wrong :-)

 
 Lets see what others have for this discussion :)

Yeah, those that actually wrote this code should have a say in this.

-- Steve
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2013-11-18 Thread Paul E. McKenney
On Wed, Nov 13, 2013 at 09:01:57PM +0100, Thomas Gleixner wrote:
> On Wed, 13 Nov 2013, Paul E. McKenney wrote:
> > On Wed, Nov 13, 2013 at 11:23:38AM -0500, Steven Rostedt wrote:
> > > On Wed, 13 Nov 2013 08:18:29 -0800
> > > "Paul E. McKenney"  wrote:
> > > 
> > > > On Wed, Nov 13, 2013 at 11:12:57AM -0500, Steven Rostedt wrote:
> > > > > On Wed, 13 Nov 2013 17:07:18 +0100 (CET)
> > > > > Thomas Gleixner  wrote:
> > > > > 
> > > > > 
> > > > > > Right. It's telling you if NOHZ is enabled. It's not telling you 
> > > > > > that
> > > > > > NOHZ is active.
> > > > > 
> > > > > Yeah, which makes this code rather silly:
> > > > > 
> > > > > in rcu_prepare_for_idle():
> > > > > 
> > > > >   /* Handle nohz enablement switches conservatively. */
> > > > >   tne = ACCESS_ONCE(tick_nohz_enabled);
> > > > >   if (tne != rdtp->tick_nohz_enabled_snap) {
> > > > >   if (rcu_cpu_has_callbacks(cpu, NULL))
> > > > >   invoke_rcu_core(); /* force nohz to see update. 
> > > > > */
> > > > >   rdtp->tick_nohz_enabled_snap = tne;
> > > > >   return;
> > > > >   }
> > > > 
> > > > OK, what should I be checking instead?  Not much point in trying to
> > > > get RCU out of the way of disabling the scheduling-clock interrupt
> > > > if NOHZ is disabled.  ;-)
> > > 
> > > I'll leave the answer to Thomas, but checking tick_nohz_enabled just
> > > lets you know if someone booted with nohz=off or not (and has nohz
> > > configured). But it doesn't tell you if nohz is actually being used.
> > 
> > Based on Thomas's most recent response, it sounds like I need to check
> > a frozen shark or something.  ;-)
> 
> Correct. Fix for the whole mess below.
> 
> Thanks,
> 
>   tglx
> 
> >
> Subject: NOHZ: Check for nohz active instead of nohz enabled
> 
> RCU and the fine grained idle time accounting functions check
> tick_nohz_enabled. But that variable is merily telling that NOHZ has
> been enabled in the config and not been disabled on the command line.
> 
> But it does not tell anything about nohz being active. That's what all
> this should check for.
> 
> Matthew reported, that the idle accounting on his old P1 machine
> showed bogus values, when he enabled NOHZ in the config and did not
> disable it on the kernel command line. The reason is that his machine
> uses (refined) jiffies as a clocksource which explains why the "fine"
> grained accounting went into lala land, because it depends on when the
> system goes and leaves idle relative to the jiffies increment.
> 
> Provide a tick_nohz_active indicator and let RCU and the accounting
> code use this instead of tick_nohz_enable.
> 
> Reported-by: Matthew Whitehead 
> Signed-off-by: Thomas Gleixner 

Reviewed-by: Paul E. McKenney 

> ---
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 3822ac0..da6e6de 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -1632,7 +1632,7 @@ module_param(rcu_idle_gp_delay, int, 0644);
>  static int rcu_idle_lazy_gp_delay = RCU_IDLE_LAZY_GP_DELAY;
>  module_param(rcu_idle_lazy_gp_delay, int, 0644);
> 
> -extern int tick_nohz_enabled;
> +extern int tick_nohz_active;
> 
>  /*
>   * Try to advance callbacks for all flavors of RCU on the current CPU, but
> @@ -1729,7 +1729,7 @@ static void rcu_prepare_for_idle(int cpu)
>   int tne;
> 
>   /* Handle nohz enablement switches conservatively. */
> - tne = ACCESS_ONCE(tick_nohz_enabled);
> + tne = ACCESS_ONCE(tick_nohz_active);
>   if (tne != rdtp->tick_nohz_enabled_snap) {
>   if (rcu_cpu_has_callbacks(cpu, NULL))
>   invoke_rcu_core(); /* force nohz to see update. */
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 3612fc7..a12df5a 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -361,8 +361,8 @@ void __init tick_nohz_init(void)
>  /*
>   * NO HZ enabled ?
>   */
> -int tick_nohz_enabled __read_mostly  = 1;
> -
> +static int tick_nohz_enabled __read_mostly  = 1;
> +int tick_nohz_active  __read_mostly;
>  /*
>   * Enable / Disable tickless mode
>   */
> @@ -465,7 +465,7 @@ u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
>   struct tick_sched *ts = _cpu(tick_cpu_sched, cpu);
>   ktime_t now, idle;
> 
> - if (!tick_nohz_enabled)
> + if (!tick_nohz_active)
>   return -1;
> 
>   now = ktime_get();
> @@ -506,7 +506,7 @@ u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time)
>   struct tick_sched *ts = _cpu(tick_cpu_sched, cpu);
>   ktime_t now, iowait;
> 
> - if (!tick_nohz_enabled)
> + if (!tick_nohz_active)
>   return -1;
> 
>   now = ktime_get();
> @@ -799,11 +799,6 @@ void tick_nohz_idle_enter(void)
>   local_irq_disable();
> 
>   ts = &__get_cpu_var(tick_cpu_sched);
> - /*
> -  * set ts->inidle unconditionally. even if the system did not
> -  * switch to nohz mode the cpu 

Re: nohz problem with idle time on old hardware

2013-11-18 Thread Paul E. McKenney
On Wed, Nov 13, 2013 at 09:01:57PM +0100, Thomas Gleixner wrote:
 On Wed, 13 Nov 2013, Paul E. McKenney wrote:
  On Wed, Nov 13, 2013 at 11:23:38AM -0500, Steven Rostedt wrote:
   On Wed, 13 Nov 2013 08:18:29 -0800
   Paul E. McKenney paul...@linux.vnet.ibm.com wrote:
   
On Wed, Nov 13, 2013 at 11:12:57AM -0500, Steven Rostedt wrote:
 On Wed, 13 Nov 2013 17:07:18 +0100 (CET)
 Thomas Gleixner t...@linutronix.de wrote:
 
 
  Right. It's telling you if NOHZ is enabled. It's not telling you 
  that
  NOHZ is active.
 
 Yeah, which makes this code rather silly:
 
 in rcu_prepare_for_idle():
 
   /* Handle nohz enablement switches conservatively. */
   tne = ACCESS_ONCE(tick_nohz_enabled);
   if (tne != rdtp-tick_nohz_enabled_snap) {
   if (rcu_cpu_has_callbacks(cpu, NULL))
   invoke_rcu_core(); /* force nohz to see update. 
 */
   rdtp-tick_nohz_enabled_snap = tne;
   return;
   }

OK, what should I be checking instead?  Not much point in trying to
get RCU out of the way of disabling the scheduling-clock interrupt
if NOHZ is disabled.  ;-)
   
   I'll leave the answer to Thomas, but checking tick_nohz_enabled just
   lets you know if someone booted with nohz=off or not (and has nohz
   configured). But it doesn't tell you if nohz is actually being used.
  
  Based on Thomas's most recent response, it sounds like I need to check
  a frozen shark or something.  ;-)
 
 Correct. Fix for the whole mess below.
 
 Thanks,
 
   tglx
 
 
 Subject: NOHZ: Check for nohz active instead of nohz enabled
 
 RCU and the fine grained idle time accounting functions check
 tick_nohz_enabled. But that variable is merily telling that NOHZ has
 been enabled in the config and not been disabled on the command line.
 
 But it does not tell anything about nohz being active. That's what all
 this should check for.
 
 Matthew reported, that the idle accounting on his old P1 machine
 showed bogus values, when he enabled NOHZ in the config and did not
 disable it on the kernel command line. The reason is that his machine
 uses (refined) jiffies as a clocksource which explains why the fine
 grained accounting went into lala land, because it depends on when the
 system goes and leaves idle relative to the jiffies increment.
 
 Provide a tick_nohz_active indicator and let RCU and the accounting
 code use this instead of tick_nohz_enable.
 
 Reported-by: Matthew Whitehead tedheads...@gmail.com
 Signed-off-by: Thomas Gleixner t...@linutronix.de

Reviewed-by: Paul E. McKenney paul...@linux.vnet.ibm.com

 ---
 diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
 index 3822ac0..da6e6de 100644
 --- a/kernel/rcu/tree_plugin.h
 +++ b/kernel/rcu/tree_plugin.h
 @@ -1632,7 +1632,7 @@ module_param(rcu_idle_gp_delay, int, 0644);
  static int rcu_idle_lazy_gp_delay = RCU_IDLE_LAZY_GP_DELAY;
  module_param(rcu_idle_lazy_gp_delay, int, 0644);
 
 -extern int tick_nohz_enabled;
 +extern int tick_nohz_active;
 
  /*
   * Try to advance callbacks for all flavors of RCU on the current CPU, but
 @@ -1729,7 +1729,7 @@ static void rcu_prepare_for_idle(int cpu)
   int tne;
 
   /* Handle nohz enablement switches conservatively. */
 - tne = ACCESS_ONCE(tick_nohz_enabled);
 + tne = ACCESS_ONCE(tick_nohz_active);
   if (tne != rdtp-tick_nohz_enabled_snap) {
   if (rcu_cpu_has_callbacks(cpu, NULL))
   invoke_rcu_core(); /* force nohz to see update. */
 diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
 index 3612fc7..a12df5a 100644
 --- a/kernel/time/tick-sched.c
 +++ b/kernel/time/tick-sched.c
 @@ -361,8 +361,8 @@ void __init tick_nohz_init(void)
  /*
   * NO HZ enabled ?
   */
 -int tick_nohz_enabled __read_mostly  = 1;
 -
 +static int tick_nohz_enabled __read_mostly  = 1;
 +int tick_nohz_active  __read_mostly;
  /*
   * Enable / Disable tickless mode
   */
 @@ -465,7 +465,7 @@ u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
   struct tick_sched *ts = per_cpu(tick_cpu_sched, cpu);
   ktime_t now, idle;
 
 - if (!tick_nohz_enabled)
 + if (!tick_nohz_active)
   return -1;
 
   now = ktime_get();
 @@ -506,7 +506,7 @@ u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time)
   struct tick_sched *ts = per_cpu(tick_cpu_sched, cpu);
   ktime_t now, iowait;
 
 - if (!tick_nohz_enabled)
 + if (!tick_nohz_active)
   return -1;
 
   now = ktime_get();
 @@ -799,11 +799,6 @@ void tick_nohz_idle_enter(void)
   local_irq_disable();
 
   ts = __get_cpu_var(tick_cpu_sched);
 - /*
 -  * set ts-inidle unconditionally. even if the system did not
 -  * switch to nohz mode the cpu frequency governers rely on the
 -  * update of the idle time accounting in tick_nohz_start_idle().
 -  */
   ts-inidle = 1;
   

Re: nohz problem with idle time on old hardware

2013-11-13 Thread Matthew Whitehead
On Wed, Nov 13, 2013 at 03:07:45PM -0500, Steven Rostedt wrote:
> On Wed, 13 Nov 2013 21:01:57 +0100 (CET)
> Thomas Gleixner  wrote:
> 
> 
> > >
> > Subject: NOHZ: Check for nohz active instead of nohz enabled
> > 
> > RCU and the fine grained idle time accounting functions check
> > tick_nohz_enabled. But that variable is merily telling that NOHZ has
> > been enabled in the config and not been disabled on the command line.
> > 
> > But it does not tell anything about nohz being active. That's what all
> > this should check for.
> > 
> > Matthew reported, that the idle accounting on his old P1 machine
> > showed bogus values, when he enabled NOHZ in the config and did not
> > disable it on the kernel command line. The reason is that his machine
> > uses (refined) jiffies as a clocksource which explains why the "fine"
> > grained accounting went into lala land, because it depends on when the
> > system goes and leaves idle relative to the jiffies increment.
> > 
> > Provide a tick_nohz_active indicator and let RCU and the accounting
> > code use this instead of tick_nohz_enable.
> > 
> > Reported-by: Matthew Whitehead 
> 
> Matthew, can you test this patch to make sure it causes the idle issue
> to go away (remember to remove nohz=off from your command line).
> 

Early testing looks good:

# Verify I haven't overridden nohz as a kernel parameter
root@host:~# grep nohz /proc/cmdline 

# See how my kernel chose its clocksource
root@host:~# dmesg | egrep "TSC|clocksource"
[0.00] tsc: Fast TSC calibration using PIT
[2.492468] Switched to clocksource refined-jiffies
[   24.197356] Switched to clocksource tsc
[   26.211058] Switched to clocksource refined-jiffies

# Check idle time since the reboot
root@host:~# sar -s 16:28:00 -P ALL 
Linux 3.12.0-rc6+ (host)  11/13/2013  _i586_  (2 CPU)

04:28:26 PM   LINUX RESTART

04:32:02 PM CPU %user %nice   %system   %iowait%steal %idle
04:34:02 PM all  4.98  0.00  1.03  0.04  0.00 93.96
04:34:02 PM   0  4.07  0.00  0.69  0.03  0.00 95.21
04:34:02 PM   1  5.88  0.00  1.37  0.06  0.00 92.70
04:36:01 PM all  4.12  0.00  1.13  0.02  0.00 94.73
04:36:01 PM   0  1.98  0.00  1.34  0.02  0.00 96.66
04:36:01 PM   1  6.28  0.00  0.91  0.03  0.00 92.79
04:38:01 PM all  0.39  0.00  0.65  0.05  0.00 98.90
04:38:01 PM   0  0.33  0.00  0.54  0.03  0.00 99.09
04:38:01 PM   1  0.45  0.00  0.75  0.07  0.00 98.73
04:40:01 PM all  0.14  0.00  0.40  0.02  0.00 99.44
04:40:01 PM   0  0.12  0.00  0.26  0.01  0.00 99.62
04:40:01 PM   1  0.16  0.00  0.55  0.02  0.00 99.27
04:42:01 PM all  0.13  0.00  0.28  0.01  0.00 99.58
04:42:01 PM   0  0.09  0.00  0.14  0.01  0.00 99.76
04:42:01 PM   1  0.16  0.00  0.43  0.02  0.00 99.40
Average:all  1.95  0.00  0.70  0.03  0.00 97.33
Average:  0  1.32  0.00  0.59  0.02  0.00 98.07
Average:  1  2.58  0.00  0.80  0.04  0.00 96.58

I will continue to test but I think this is good.

Tested-by: Matthew Whitehead 

- Matthew W

> Reviewed-by: Steven Rostedt 
> 
> -- Steve
> 
> > Signed-off-by: Thomas Gleixner 
> > ---
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index 3822ac0..da6e6de 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -1632,7 +1632,7 @@ module_param(rcu_idle_gp_delay, int, 0644);
> >  static int rcu_idle_lazy_gp_delay = RCU_IDLE_LAZY_GP_DELAY;
> >  module_param(rcu_idle_lazy_gp_delay, int, 0644);
> >  
> > -extern int tick_nohz_enabled;
> > +extern int tick_nohz_active;
> >  
> >  /*
> >   * Try to advance callbacks for all flavors of RCU on the current CPU, but
> > @@ -1729,7 +1729,7 @@ static void rcu_prepare_for_idle(int cpu)
> > int tne;
> >  
> > /* Handle nohz enablement switches conservatively. */
> > -   tne = ACCESS_ONCE(tick_nohz_enabled);
> > +   tne = ACCESS_ONCE(tick_nohz_active);
> > if (tne != rdtp->tick_nohz_enabled_snap) {
> > if (rcu_cpu_has_callbacks(cpu, NULL))
> > invoke_rcu_core(); /* force nohz to see update. */
> > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > index 3612fc7..a12df5a 100644
> > --- a/kernel/time/tick-sched.c
> > +++ b/kernel/time/tick-sched.c
> > @@ -361,8 +361,8 @@ void __init tick_nohz_init(void)
> >  /*
> >   * NO HZ enabled ?
> >   */
> > -int tick_nohz_enabled __read_mostly  = 1;
> > -
> > +static int tick_nohz_enabled __read_mostly  = 1;
> > +int tick_nohz_active  __read_mostly;
> >  /*
> >   * Enable / Disable 

Re: nohz problem with idle time on old hardware

2013-11-13 Thread Steven Rostedt
On Wed, 13 Nov 2013 21:01:57 +0100 (CET)
Thomas Gleixner  wrote:


> >
> Subject: NOHZ: Check for nohz active instead of nohz enabled
> 
> RCU and the fine grained idle time accounting functions check
> tick_nohz_enabled. But that variable is merily telling that NOHZ has
> been enabled in the config and not been disabled on the command line.
> 
> But it does not tell anything about nohz being active. That's what all
> this should check for.
> 
> Matthew reported, that the idle accounting on his old P1 machine
> showed bogus values, when he enabled NOHZ in the config and did not
> disable it on the kernel command line. The reason is that his machine
> uses (refined) jiffies as a clocksource which explains why the "fine"
> grained accounting went into lala land, because it depends on when the
> system goes and leaves idle relative to the jiffies increment.
> 
> Provide a tick_nohz_active indicator and let RCU and the accounting
> code use this instead of tick_nohz_enable.
> 
> Reported-by: Matthew Whitehead 

Matthew, can you test this patch to make sure it causes the idle issue
to go away (remember to remove nohz=off from your command line).

Reviewed-by: Steven Rostedt 

-- Steve

> Signed-off-by: Thomas Gleixner 
> ---
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 3822ac0..da6e6de 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -1632,7 +1632,7 @@ module_param(rcu_idle_gp_delay, int, 0644);
>  static int rcu_idle_lazy_gp_delay = RCU_IDLE_LAZY_GP_DELAY;
>  module_param(rcu_idle_lazy_gp_delay, int, 0644);
>  
> -extern int tick_nohz_enabled;
> +extern int tick_nohz_active;
>  
>  /*
>   * Try to advance callbacks for all flavors of RCU on the current CPU, but
> @@ -1729,7 +1729,7 @@ static void rcu_prepare_for_idle(int cpu)
>   int tne;
>  
>   /* Handle nohz enablement switches conservatively. */
> - tne = ACCESS_ONCE(tick_nohz_enabled);
> + tne = ACCESS_ONCE(tick_nohz_active);
>   if (tne != rdtp->tick_nohz_enabled_snap) {
>   if (rcu_cpu_has_callbacks(cpu, NULL))
>   invoke_rcu_core(); /* force nohz to see update. */
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 3612fc7..a12df5a 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -361,8 +361,8 @@ void __init tick_nohz_init(void)
>  /*
>   * NO HZ enabled ?
>   */
> -int tick_nohz_enabled __read_mostly  = 1;
> -
> +static int tick_nohz_enabled __read_mostly  = 1;
> +int tick_nohz_active  __read_mostly;
>  /*
>   * Enable / Disable tickless mode
>   */
> @@ -465,7 +465,7 @@ u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
>   struct tick_sched *ts = _cpu(tick_cpu_sched, cpu);
>   ktime_t now, idle;
>  
> - if (!tick_nohz_enabled)
> + if (!tick_nohz_active)
>   return -1;
>  
>   now = ktime_get();
> @@ -506,7 +506,7 @@ u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time)
>   struct tick_sched *ts = _cpu(tick_cpu_sched, cpu);
>   ktime_t now, iowait;
>  
> - if (!tick_nohz_enabled)
> + if (!tick_nohz_active)
>   return -1;
>  
>   now = ktime_get();
> @@ -799,11 +799,6 @@ void tick_nohz_idle_enter(void)
>   local_irq_disable();
>  
>   ts = &__get_cpu_var(tick_cpu_sched);
> - /*
> -  * set ts->inidle unconditionally. even if the system did not
> -  * switch to nohz mode the cpu frequency governers rely on the
> -  * update of the idle time accounting in tick_nohz_start_idle().
> -  */
>   ts->inidle = 1;
>   __tick_nohz_idle_enter(ts);
>  
> @@ -973,7 +968,7 @@ static void tick_nohz_switch_to_nohz(void)
>   struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
>   ktime_t next;
>  
> - if (!tick_nohz_enabled)
> + if (!tick_nohz_active)
>   return;
>  
>   local_irq_disable();
> @@ -981,7 +976,7 @@ static void tick_nohz_switch_to_nohz(void)
>   local_irq_enable();
>   return;
>   }
> -
> + tick_nohz_active = 1;
>   ts->nohz_mode = NOHZ_MODE_LOWRES;
>  
>   /*
> @@ -1139,8 +1134,10 @@ void tick_setup_sched_timer(void)
>   }
>  
>  #ifdef CONFIG_NO_HZ_COMMON
> - if (tick_nohz_enabled)
> + if (tick_nohz_enabled) {
>   ts->nohz_mode = NOHZ_MODE_HIGHRES;
> + tick_nohz_active = 1;
> + }
>  #endif
>  }
>  #endif /* HIGH_RES_TIMERS */

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2013-11-13 Thread Thomas Gleixner
On Wed, 13 Nov 2013, Paul E. McKenney wrote:
> On Wed, Nov 13, 2013 at 11:23:38AM -0500, Steven Rostedt wrote:
> > On Wed, 13 Nov 2013 08:18:29 -0800
> > "Paul E. McKenney"  wrote:
> > 
> > > On Wed, Nov 13, 2013 at 11:12:57AM -0500, Steven Rostedt wrote:
> > > > On Wed, 13 Nov 2013 17:07:18 +0100 (CET)
> > > > Thomas Gleixner  wrote:
> > > > 
> > > > 
> > > > > Right. It's telling you if NOHZ is enabled. It's not telling you that
> > > > > NOHZ is active.
> > > > 
> > > > Yeah, which makes this code rather silly:
> > > > 
> > > > in rcu_prepare_for_idle():
> > > > 
> > > > /* Handle nohz enablement switches conservatively. */
> > > > tne = ACCESS_ONCE(tick_nohz_enabled);
> > > > if (tne != rdtp->tick_nohz_enabled_snap) {
> > > > if (rcu_cpu_has_callbacks(cpu, NULL))
> > > > invoke_rcu_core(); /* force nohz to see update. 
> > > > */
> > > > rdtp->tick_nohz_enabled_snap = tne;
> > > > return;
> > > > }
> > > 
> > > OK, what should I be checking instead?  Not much point in trying to
> > > get RCU out of the way of disabling the scheduling-clock interrupt
> > > if NOHZ is disabled.  ;-)
> > 
> > I'll leave the answer to Thomas, but checking tick_nohz_enabled just
> > lets you know if someone booted with nohz=off or not (and has nohz
> > configured). But it doesn't tell you if nohz is actually being used.
> 
> Based on Thomas's most recent response, it sounds like I need to check
> a frozen shark or something.  ;-)

Correct. Fix for the whole mess below.

Thanks,

tglx

>
Subject: NOHZ: Check for nohz active instead of nohz enabled

RCU and the fine grained idle time accounting functions check
tick_nohz_enabled. But that variable is merily telling that NOHZ has
been enabled in the config and not been disabled on the command line.

But it does not tell anything about nohz being active. That's what all
this should check for.

Matthew reported, that the idle accounting on his old P1 machine
showed bogus values, when he enabled NOHZ in the config and did not
disable it on the kernel command line. The reason is that his machine
uses (refined) jiffies as a clocksource which explains why the "fine"
grained accounting went into lala land, because it depends on when the
system goes and leaves idle relative to the jiffies increment.

Provide a tick_nohz_active indicator and let RCU and the accounting
code use this instead of tick_nohz_enable.

Reported-by: Matthew Whitehead 
Signed-off-by: Thomas Gleixner 
---
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 3822ac0..da6e6de 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1632,7 +1632,7 @@ module_param(rcu_idle_gp_delay, int, 0644);
 static int rcu_idle_lazy_gp_delay = RCU_IDLE_LAZY_GP_DELAY;
 module_param(rcu_idle_lazy_gp_delay, int, 0644);
 
-extern int tick_nohz_enabled;
+extern int tick_nohz_active;
 
 /*
  * Try to advance callbacks for all flavors of RCU on the current CPU, but
@@ -1729,7 +1729,7 @@ static void rcu_prepare_for_idle(int cpu)
int tne;
 
/* Handle nohz enablement switches conservatively. */
-   tne = ACCESS_ONCE(tick_nohz_enabled);
+   tne = ACCESS_ONCE(tick_nohz_active);
if (tne != rdtp->tick_nohz_enabled_snap) {
if (rcu_cpu_has_callbacks(cpu, NULL))
invoke_rcu_core(); /* force nohz to see update. */
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 3612fc7..a12df5a 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -361,8 +361,8 @@ void __init tick_nohz_init(void)
 /*
  * NO HZ enabled ?
  */
-int tick_nohz_enabled __read_mostly  = 1;
-
+static int tick_nohz_enabled __read_mostly  = 1;
+int tick_nohz_active  __read_mostly;
 /*
  * Enable / Disable tickless mode
  */
@@ -465,7 +465,7 @@ u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
struct tick_sched *ts = _cpu(tick_cpu_sched, cpu);
ktime_t now, idle;
 
-   if (!tick_nohz_enabled)
+   if (!tick_nohz_active)
return -1;
 
now = ktime_get();
@@ -506,7 +506,7 @@ u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time)
struct tick_sched *ts = _cpu(tick_cpu_sched, cpu);
ktime_t now, iowait;
 
-   if (!tick_nohz_enabled)
+   if (!tick_nohz_active)
return -1;
 
now = ktime_get();
@@ -799,11 +799,6 @@ void tick_nohz_idle_enter(void)
local_irq_disable();
 
ts = &__get_cpu_var(tick_cpu_sched);
-   /*
-* set ts->inidle unconditionally. even if the system did not
-* switch to nohz mode the cpu frequency governers rely on the
-* update of the idle time accounting in tick_nohz_start_idle().
-*/
ts->inidle = 1;
__tick_nohz_idle_enter(ts);
 
@@ -973,7 +968,7 @@ static void tick_nohz_switch_to_nohz(void)
struct tick_sched *ts = 

Re: nohz problem with idle time on old hardware

2013-11-13 Thread Paul E. McKenney
On Wed, Nov 13, 2013 at 11:23:38AM -0500, Steven Rostedt wrote:
> On Wed, 13 Nov 2013 08:18:29 -0800
> "Paul E. McKenney"  wrote:
> 
> > On Wed, Nov 13, 2013 at 11:12:57AM -0500, Steven Rostedt wrote:
> > > On Wed, 13 Nov 2013 17:07:18 +0100 (CET)
> > > Thomas Gleixner  wrote:
> > > 
> > > 
> > > > Right. It's telling you if NOHZ is enabled. It's not telling you that
> > > > NOHZ is active.
> > > 
> > > Yeah, which makes this code rather silly:
> > > 
> > > in rcu_prepare_for_idle():
> > > 
> > >   /* Handle nohz enablement switches conservatively. */
> > >   tne = ACCESS_ONCE(tick_nohz_enabled);
> > >   if (tne != rdtp->tick_nohz_enabled_snap) {
> > >   if (rcu_cpu_has_callbacks(cpu, NULL))
> > >   invoke_rcu_core(); /* force nohz to see update. */
> > >   rdtp->tick_nohz_enabled_snap = tne;
> > >   return;
> > >   }
> > 
> > OK, what should I be checking instead?  Not much point in trying to
> > get RCU out of the way of disabling the scheduling-clock interrupt
> > if NOHZ is disabled.  ;-)
> 
> I'll leave the answer to Thomas, but checking tick_nohz_enabled just
> lets you know if someone booted with nohz=off or not (and has nohz
> configured). But it doesn't tell you if nohz is actually being used.

Based on Thomas's most recent response, it sounds like I need to check
a frozen shark or something.  ;-)

Thanx, Paul

> That is, tick_nohz_enabled is set at bootup and never changes.
> 
> Perhaps this old hardware uncovered other bugs as well ;-)
> 
> -- Steve
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2013-11-13 Thread Steven Rostedt
On Wed, 13 Nov 2013 17:21:53 +0100 (CET)
Thomas Gleixner  wrote:

 
> Frozen shark time 

As long as it's not aimed at me ;-)

-- Steve

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2013-11-13 Thread Steven Rostedt
On Wed, 13 Nov 2013 08:18:29 -0800
"Paul E. McKenney"  wrote:

> On Wed, Nov 13, 2013 at 11:12:57AM -0500, Steven Rostedt wrote:
> > On Wed, 13 Nov 2013 17:07:18 +0100 (CET)
> > Thomas Gleixner  wrote:
> > 
> > 
> > > Right. It's telling you if NOHZ is enabled. It's not telling you that
> > > NOHZ is active.
> > 
> > Yeah, which makes this code rather silly:
> > 
> > in rcu_prepare_for_idle():
> > 
> > /* Handle nohz enablement switches conservatively. */
> > tne = ACCESS_ONCE(tick_nohz_enabled);
> > if (tne != rdtp->tick_nohz_enabled_snap) {
> > if (rcu_cpu_has_callbacks(cpu, NULL))
> > invoke_rcu_core(); /* force nohz to see update. */
> > rdtp->tick_nohz_enabled_snap = tne;
> > return;
> > }
> 
> OK, what should I be checking instead?  Not much point in trying to
> get RCU out of the way of disabling the scheduling-clock interrupt
> if NOHZ is disabled.  ;-)
> 

I'll leave the answer to Thomas, but checking tick_nohz_enabled just
lets you know if someone booted with nohz=off or not (and has nohz
configured). But it doesn't tell you if nohz is actually being used.

That is, tick_nohz_enabled is set at bootup and never changes.

Perhaps this old hardware uncovered other bugs as well ;-)

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2013-11-13 Thread Thomas Gleixner
On Wed, 13 Nov 2013, Thomas Gleixner wrote:

> On Wed, 13 Nov 2013, Steven Rostedt wrote:
> 
> > On Wed, 13 Nov 2013 10:31:34 -0500
> > Steven Rostedt  wrote:
> >  
> > > The trace does indeed show that a tick is happening, as the config has
> > > HZ=250 (4ms) and we see a tick happen every 4ms. But for some reason,
> > > we don't update the the idle time correctly when nohz is enabled.
> > > 
> > > When I say nohz is enabled, I mean that we don't have nohz=off in the
> > > command line. There seems to be some difference between having nohz=off
> > > and having nohz disabled at runtime.
> > 
> > 
> > Looking at the differences between nohz=off from the command line, and
> > disabled at run time seems to be the variable "tick_nohz_enabled". I
> > don't see where it gets set to zero except for nohz=off.
> 
> Right. It's telling you if NOHZ is enabled. It's not telling you that
> NOHZ is active.

And its even worse:

tick_nohz_idle_enter()

/*
 * set ts->inidle unconditionally. even if the system did not
 * switch to nohz mode the cpu frequency governers rely on the
 * update of the idle time accounting in tick_nohz_start_idle().
 */
ts->inidle = 1;

So there is code relying on that accounting stuff even if nohz is not
active.

Frozen shark time 






--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2013-11-13 Thread Paul E. McKenney
On Wed, Nov 13, 2013 at 11:12:57AM -0500, Steven Rostedt wrote:
> On Wed, 13 Nov 2013 17:07:18 +0100 (CET)
> Thomas Gleixner  wrote:
> 
> 
> > Right. It's telling you if NOHZ is enabled. It's not telling you that
> > NOHZ is active.
> 
> Yeah, which makes this code rather silly:
> 
> in rcu_prepare_for_idle():
> 
>   /* Handle nohz enablement switches conservatively. */
>   tne = ACCESS_ONCE(tick_nohz_enabled);
>   if (tne != rdtp->tick_nohz_enabled_snap) {
>   if (rcu_cpu_has_callbacks(cpu, NULL))
>   invoke_rcu_core(); /* force nohz to see update. */
>   rdtp->tick_nohz_enabled_snap = tne;
>   return;
>   }

OK, what should I be checking instead?  Not much point in trying to
get RCU out of the way of disabling the scheduling-clock interrupt
if NOHZ is disabled.  ;-)

Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2013-11-13 Thread Paul E. McKenney
On Wed, Nov 13, 2013 at 04:50:20PM +0100, Thomas Gleixner wrote:
> On Wed, 13 Nov 2013, Steven Rostedt wrote:
> > > I'm not saying that we are actually getting into nohz, but something
> > > with the nohz code is messing with cpu accounting.
> > 
> > The trace does indeed show that a tick is happening, as the config has
> > HZ=250 (4ms) and we see a tick happen every 4ms. But for some reason,
> > we don't update the the idle time correctly when nohz is enabled.
> > 
> > When I say nohz is enabled, I mean that we don't have nohz=off in the
> > command line. There seems to be some difference between having nohz=off
> > and having nohz disabled at runtime.
> 
> Right that affects tick_nohz_enabled
> 
> Two files use this variable:
> kernel/rcu/tree_plugin.h
> kernel/time/tick-sched.c
> 
> The only accounting related stuff is in tick-sched.c:
> 
> get_cpu_idle_time_us() and get_cpu_iowait_time_us()
> 
> Both functions bail out if (!tick_nohz_enabled).
> 
> The users of get_cpu_idle_time_us() are cpufreq and fs/proc/stat.c!
> 
> Now the simplest fix is to let those functions check whether we
> actually switched into NOHZ mode. Should work for the RCU tree stuff
> as well.

RCU's use of tick_nohz_enabled is for the RCU_FAST_NO_HZ stuff.  If
it sees !tick_nohz_enabled, it skips trying to get RCU out of the way
of disabling the scheduling-clock tick.  If RCU detects a change
in the value of tick_nohz_enabled, it does a raise_softirq() to
force re-evaluation of the situation.

Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2013-11-13 Thread Steven Rostedt
On Wed, 13 Nov 2013 17:07:18 +0100 (CET)
Thomas Gleixner  wrote:


> Right. It's telling you if NOHZ is enabled. It's not telling you that
> NOHZ is active.

Yeah, which makes this code rather silly:

in rcu_prepare_for_idle():

/* Handle nohz enablement switches conservatively. */
tne = ACCESS_ONCE(tick_nohz_enabled);
if (tne != rdtp->tick_nohz_enabled_snap) {
if (rcu_cpu_has_callbacks(cpu, NULL))
invoke_rcu_core(); /* force nohz to see update. */
rdtp->tick_nohz_enabled_snap = tne;
return;
}

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2013-11-13 Thread Thomas Gleixner
On Wed, 13 Nov 2013, Steven Rostedt wrote:

> On Wed, 13 Nov 2013 10:31:34 -0500
> Steven Rostedt  wrote:
>  
> > The trace does indeed show that a tick is happening, as the config has
> > HZ=250 (4ms) and we see a tick happen every 4ms. But for some reason,
> > we don't update the the idle time correctly when nohz is enabled.
> > 
> > When I say nohz is enabled, I mean that we don't have nohz=off in the
> > command line. There seems to be some difference between having nohz=off
> > and having nohz disabled at runtime.
> 
> 
> Looking at the differences between nohz=off from the command line, and
> disabled at run time seems to be the variable "tick_nohz_enabled". I
> don't see where it gets set to zero except for nohz=off.

Right. It's telling you if NOHZ is enabled. It's not telling you that
NOHZ is active.

Thanks,

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2013-11-13 Thread Steven Rostedt
On Wed, 13 Nov 2013 10:31:34 -0500
Steven Rostedt  wrote:
 
> The trace does indeed show that a tick is happening, as the config has
> HZ=250 (4ms) and we see a tick happen every 4ms. But for some reason,
> we don't update the the idle time correctly when nohz is enabled.
> 
> When I say nohz is enabled, I mean that we don't have nohz=off in the
> command line. There seems to be some difference between having nohz=off
> and having nohz disabled at runtime.


Looking at the differences between nohz=off from the command line, and
disabled at run time seems to be the variable "tick_nohz_enabled". I
don't see where it gets set to zero except for nohz=off.

$ git grep tick_nohz_enabled
kernel/rcutree.h:   int tick_nohz_enabled_snap; /* Previously seen
value from sysfs. */ kernel/rcutree_plugin.h:extern int
tick_nohz_enabled; kernel/rcutree_plugin.h:tne =
ACCESS_ONCE(tick_nohz_enabled); kernel/rcutree_plugin.h:if
(tne != rdtp->tick_nohz_enabled_snap)
{ kernel/rcutree_plugin.h:rdtp->tick_nohz_enabled_snap
= tne; kernel/rcutree_plugin.h:
rdtp->tick_nohz_enabled_snap ? '.' : 'D'); kernel/time/tick-sched.c:int
tick_nohz_enabled __read_mostly  = 1;
kernel/time/tick-sched.c:   tick_nohz_enabled = 0;
kernel/time/tick-sched.c:   tick_nohz_enabled = 1;
kernel/time/tick-sched.c:   if (!tick_nohz_enabled)
kernel/time/tick-sched.c:   if (!tick_nohz_enabled)
kernel/time/tick-sched.c:   if (!tick_nohz_enabled)
kernel/time/tick-sched.c:   if (tick_nohz_enabled)


What's even stranger is that the RCU code in rcutree_plugin.h does an
ACCESS_ONCE(tick_nohz_enabled) as if it can change.

That said, looking at the fs/proc/stat.c get_idle_time() it does an
idle_time = get_cpu_idle_time_us(cpu, NULL) which has:

u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
{
struct tick_sched *ts = _cpu(tick_cpu_sched, cpu);
ktime_t now, idle;

if (!tick_nohz_enabled)
return -1;

now = ktime_get();
if (last_update_time) {
update_ts_time_stats(cpu, ts, now, last_update_time);
idle = ts->idle_sleeptime;
} else {
if (ts->idle_active && !nr_iowait_cpu(cpu)) {
ktime_t delta = ktime_sub(now, ts->idle_entrytime);
idle = ktime_add(ts->idle_sleeptime, delta);
} else {
idle = ts->idle_sleeptime;
}
}

return ktime_to_us(idle);

}

This is one of the differences between nohz=off and nohz=on with jiffy
accounting. When we have nohz=off, this returns -1 and the calling
function calculates the idle time differently.

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2013-11-13 Thread Thomas Gleixner
On Wed, 13 Nov 2013, Steven Rostedt wrote:
> > I'm not saying that we are actually getting into nohz, but something
> > with the nohz code is messing with cpu accounting.
> 
> The trace does indeed show that a tick is happening, as the config has
> HZ=250 (4ms) and we see a tick happen every 4ms. But for some reason,
> we don't update the the idle time correctly when nohz is enabled.
> 
> When I say nohz is enabled, I mean that we don't have nohz=off in the
> command line. There seems to be some difference between having nohz=off
> and having nohz disabled at runtime.

Right that affects tick_nohz_enabled

Two files use this variable:
kernel/rcu/tree_plugin.h
kernel/time/tick-sched.c

The only accounting related stuff is in tick-sched.c:

get_cpu_idle_time_us() and get_cpu_iowait_time_us()

Both functions bail out if (!tick_nohz_enabled).

The users of get_cpu_idle_time_us() are cpufreq and fs/proc/stat.c!

Now the simplest fix is to let those functions check whether we
actually switched into NOHZ mode. Should work for the RCU tree stuff
as well.

Thanks,

tglx



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2013-11-13 Thread Steven Rostedt
On Wed, 13 Nov 2013 10:21:53 -0500
Steven Rostedt  wrote:
trace-cmd record -p function -l '*nohz*' -l account_process_tick -e sched_switch
> 
> rcu_sche-9   0d...  6858.618033: sched_switch: rcu_sched:9 [120] 
> S ==> swapper/0:0 [120]
>   -0   0  6858.618082: function: 
> tick_nohz_idle_enter <-- cpu_startup_entry
>   -0   0d...  6858.618101: function:
> __tick_nohz_idle_enter <-- tick_nohz_idle_enter
>   -0   0d.s.  6858.621499: function: 
> tick_nohz_stop_idle <-- tick_check_idle
>   -0   0d.h.  6858.621550: function: 
> account_process_tick <-- update_process_times
>   -0   0d...  6858.621769: function: tick_nohz_irq_exit 
> <-- irq_exit
>   -0   0d...  6858.621787: function: 
> __tick_nohz_idle_enter <-- irq_exit
>   -0   0d.s.  6858.625500: function: 
> tick_nohz_stop_idle <-- tick_check_idle
>   -0   0d.h.  6858.625574: function: 
> account_process_tick <-- update_process_times
>   -0   0d...  6858.625818: function: tick_nohz_irq_exit 
> <-- irq_exit
>   -0   0d...  6858.625847: function: 
> __tick_nohz_idle_enter <-- irq_exit
>   -0   0d.s.  6858.629295: function: 
> tick_nohz_stop_idle <-- tick_check_idle
>   -0   0d...  6858.629351: function: tick_nohz_irq_exit 
> <-- irq_exit
>   -0   0d...  6858.629373: function: 
> __tick_nohz_idle_enter <-- irq_exit
>   -0   0d.s.  6858.629503: function: 
> tick_nohz_stop_idle <-- tick_check_idle
>   -0   0d.h.  6858.629569: function: 
> account_process_tick <-- update_process_times
>   -0   0d...  6858.629851: function: tick_nohz_irq_exit 
> <-- irq_exit
>   -0   0d...  6858.629881: function: 
> __tick_nohz_idle_enter <-- irq_exit
>   -0   0d.s.  6858.630412: function: 
> tick_nohz_stop_idle <-- tick_check_idle
>   -0   0.N..  6858.630550: function: 
> tick_nohz_idle_exit <-- cpu_startup_entry
>   -0   0d...  6858.630605: sched_switch: swapper/0:0  [120] 
> R ==> rcu_sched:9 [120]
> 
> I'm not saying that we are actually getting into nohz, but something
> with the nohz code is messing with cpu accounting.

The trace does indeed show that a tick is happening, as the config has
HZ=250 (4ms) and we see a tick happen every 4ms. But for some reason,
we don't update the the idle time correctly when nohz is enabled.

When I say nohz is enabled, I mean that we don't have nohz=off in the
command line. There seems to be some difference between having nohz=off
and having nohz disabled at runtime.

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2013-11-13 Thread Thomas Gleixner
On Wed, 13 Nov 2013, Matthew Whitehead wrote:

> I was testing the 3.12 kernel on some _old_ hardware and I uncovered a bug.
> It arises when nohz=on and goes away with nohz=off. On a crusty dual 
> Pentium-1 
> system that is completely idle, the sar utility reports 0% idle time on cpu0 
> and 100% idle on cpu1. Cpu0 _should_ also be reporting 100% idle, but instead
> it reports around 75% system time and 25% user time.
> 
> The problem was diagnosed by Steve Rostedt with help from John
> Stultz. The old system declares the dual TSCs unstable, and backs
> down to a timesource of refined-jiffies. Apparently refined-jiffies
> and jiffies are not a usable timesourcefor nohz, but we don't check
> for that case because most modern systems have several reliable
> hardware timesources.

Wrong.
 
> John suggested that we turn off nohz unless a usable hardware timesource is
> present.

nohz already depends on two things:

  1) A reliable clocksource which is valid for highres/nohz

  2) A per cpu clockevent device which supports one shot mode.

and those are evaluated at runtime before we switch into NOHZ mode.

And neither jiffies nor refined-jiffies qualify as valid clocksource.

So there is something else wrong.

Thanks,

tglx
   
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2013-11-13 Thread Thomas Gleixner
On Wed, 13 Nov 2013, Matthew Whitehead wrote:

 I was testing the 3.12 kernel on some _old_ hardware and I uncovered a bug.
 It arises when nohz=on and goes away with nohz=off. On a crusty dual 
 Pentium-1 
 system that is completely idle, the sar utility reports 0% idle time on cpu0 
 and 100% idle on cpu1. Cpu0 _should_ also be reporting 100% idle, but instead
 it reports around 75% system time and 25% user time.
 
 The problem was diagnosed by Steve Rostedt with help from John
 Stultz. The old system declares the dual TSCs unstable, and backs
 down to a timesource of refined-jiffies. Apparently refined-jiffies
 and jiffies are not a usable timesourcefor nohz, but we don't check
 for that case because most modern systems have several reliable
 hardware timesources.

Wrong.
 
 John suggested that we turn off nohz unless a usable hardware timesource is
 present.

nohz already depends on two things:

  1) A reliable clocksource which is valid for highres/nohz

  2) A per cpu clockevent device which supports one shot mode.

and those are evaluated at runtime before we switch into NOHZ mode.

And neither jiffies nor refined-jiffies qualify as valid clocksource.

So there is something else wrong.

Thanks,

tglx
   
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2013-11-13 Thread Steven Rostedt
On Wed, 13 Nov 2013 10:21:53 -0500
Steven Rostedt rost...@goodmis.org wrote:
trace-cmd record -p function -l '*nohz*' -l account_process_tick -e sched_switch
 
 rcu_sche-9   0d...  6858.618033: sched_switch: rcu_sched:9 [120] 
 S == swapper/0:0 [120]
   idle-0   0  6858.618082: function: 
 tick_nohz_idle_enter -- cpu_startup_entry
   idle-0   0d...  6858.618101: function:
 __tick_nohz_idle_enter -- tick_nohz_idle_enter
   idle-0   0d.s.  6858.621499: function: 
 tick_nohz_stop_idle -- tick_check_idle
   idle-0   0d.h.  6858.621550: function: 
 account_process_tick -- update_process_times
   idle-0   0d...  6858.621769: function: tick_nohz_irq_exit 
 -- irq_exit
   idle-0   0d...  6858.621787: function: 
 __tick_nohz_idle_enter -- irq_exit
   idle-0   0d.s.  6858.625500: function: 
 tick_nohz_stop_idle -- tick_check_idle
   idle-0   0d.h.  6858.625574: function: 
 account_process_tick -- update_process_times
   idle-0   0d...  6858.625818: function: tick_nohz_irq_exit 
 -- irq_exit
   idle-0   0d...  6858.625847: function: 
 __tick_nohz_idle_enter -- irq_exit
   idle-0   0d.s.  6858.629295: function: 
 tick_nohz_stop_idle -- tick_check_idle
   idle-0   0d...  6858.629351: function: tick_nohz_irq_exit 
 -- irq_exit
   idle-0   0d...  6858.629373: function: 
 __tick_nohz_idle_enter -- irq_exit
   idle-0   0d.s.  6858.629503: function: 
 tick_nohz_stop_idle -- tick_check_idle
   idle-0   0d.h.  6858.629569: function: 
 account_process_tick -- update_process_times
   idle-0   0d...  6858.629851: function: tick_nohz_irq_exit 
 -- irq_exit
   idle-0   0d...  6858.629881: function: 
 __tick_nohz_idle_enter -- irq_exit
   idle-0   0d.s.  6858.630412: function: 
 tick_nohz_stop_idle -- tick_check_idle
   idle-0   0.N..  6858.630550: function: 
 tick_nohz_idle_exit -- cpu_startup_entry
   idle-0   0d...  6858.630605: sched_switch: swapper/0:0  [120] 
 R == rcu_sched:9 [120]
 
 I'm not saying that we are actually getting into nohz, but something
 with the nohz code is messing with cpu accounting.

The trace does indeed show that a tick is happening, as the config has
HZ=250 (4ms) and we see a tick happen every 4ms. But for some reason,
we don't update the the idle time correctly when nohz is enabled.

When I say nohz is enabled, I mean that we don't have nohz=off in the
command line. There seems to be some difference between having nohz=off
and having nohz disabled at runtime.

-- Steve
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2013-11-13 Thread Thomas Gleixner
On Wed, 13 Nov 2013, Steven Rostedt wrote:
  I'm not saying that we are actually getting into nohz, but something
  with the nohz code is messing with cpu accounting.
 
 The trace does indeed show that a tick is happening, as the config has
 HZ=250 (4ms) and we see a tick happen every 4ms. But for some reason,
 we don't update the the idle time correctly when nohz is enabled.
 
 When I say nohz is enabled, I mean that we don't have nohz=off in the
 command line. There seems to be some difference between having nohz=off
 and having nohz disabled at runtime.

Right that affects tick_nohz_enabled

Two files use this variable:
kernel/rcu/tree_plugin.h
kernel/time/tick-sched.c

The only accounting related stuff is in tick-sched.c:

get_cpu_idle_time_us() and get_cpu_iowait_time_us()

Both functions bail out if (!tick_nohz_enabled).

The users of get_cpu_idle_time_us() are cpufreq and fs/proc/stat.c!

Now the simplest fix is to let those functions check whether we
actually switched into NOHZ mode. Should work for the RCU tree stuff
as well.

Thanks,

tglx



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2013-11-13 Thread Steven Rostedt
On Wed, 13 Nov 2013 10:31:34 -0500
Steven Rostedt rost...@goodmis.org wrote:
 
 The trace does indeed show that a tick is happening, as the config has
 HZ=250 (4ms) and we see a tick happen every 4ms. But for some reason,
 we don't update the the idle time correctly when nohz is enabled.
 
 When I say nohz is enabled, I mean that we don't have nohz=off in the
 command line. There seems to be some difference between having nohz=off
 and having nohz disabled at runtime.


Looking at the differences between nohz=off from the command line, and
disabled at run time seems to be the variable tick_nohz_enabled. I
don't see where it gets set to zero except for nohz=off.

$ git grep tick_nohz_enabled
kernel/rcutree.h:   int tick_nohz_enabled_snap; /* Previously seen
value from sysfs. */ kernel/rcutree_plugin.h:extern int
tick_nohz_enabled; kernel/rcutree_plugin.h:tne =
ACCESS_ONCE(tick_nohz_enabled); kernel/rcutree_plugin.h:if
(tne != rdtp-tick_nohz_enabled_snap)
{ kernel/rcutree_plugin.h:rdtp-tick_nohz_enabled_snap
= tne; kernel/rcutree_plugin.h:
rdtp-tick_nohz_enabled_snap ? '.' : 'D'); kernel/time/tick-sched.c:int
tick_nohz_enabled __read_mostly  = 1;
kernel/time/tick-sched.c:   tick_nohz_enabled = 0;
kernel/time/tick-sched.c:   tick_nohz_enabled = 1;
kernel/time/tick-sched.c:   if (!tick_nohz_enabled)
kernel/time/tick-sched.c:   if (!tick_nohz_enabled)
kernel/time/tick-sched.c:   if (!tick_nohz_enabled)
kernel/time/tick-sched.c:   if (tick_nohz_enabled)


What's even stranger is that the RCU code in rcutree_plugin.h does an
ACCESS_ONCE(tick_nohz_enabled) as if it can change.

That said, looking at the fs/proc/stat.c get_idle_time() it does an
idle_time = get_cpu_idle_time_us(cpu, NULL) which has:

u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
{
struct tick_sched *ts = per_cpu(tick_cpu_sched, cpu);
ktime_t now, idle;

if (!tick_nohz_enabled)
return -1;

now = ktime_get();
if (last_update_time) {
update_ts_time_stats(cpu, ts, now, last_update_time);
idle = ts-idle_sleeptime;
} else {
if (ts-idle_active  !nr_iowait_cpu(cpu)) {
ktime_t delta = ktime_sub(now, ts-idle_entrytime);
idle = ktime_add(ts-idle_sleeptime, delta);
} else {
idle = ts-idle_sleeptime;
}
}

return ktime_to_us(idle);

}

This is one of the differences between nohz=off and nohz=on with jiffy
accounting. When we have nohz=off, this returns -1 and the calling
function calculates the idle time differently.

-- Steve
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2013-11-13 Thread Thomas Gleixner
On Wed, 13 Nov 2013, Steven Rostedt wrote:

 On Wed, 13 Nov 2013 10:31:34 -0500
 Steven Rostedt rost...@goodmis.org wrote:
  
  The trace does indeed show that a tick is happening, as the config has
  HZ=250 (4ms) and we see a tick happen every 4ms. But for some reason,
  we don't update the the idle time correctly when nohz is enabled.
  
  When I say nohz is enabled, I mean that we don't have nohz=off in the
  command line. There seems to be some difference between having nohz=off
  and having nohz disabled at runtime.
 
 
 Looking at the differences between nohz=off from the command line, and
 disabled at run time seems to be the variable tick_nohz_enabled. I
 don't see where it gets set to zero except for nohz=off.

Right. It's telling you if NOHZ is enabled. It's not telling you that
NOHZ is active.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2013-11-13 Thread Steven Rostedt
On Wed, 13 Nov 2013 17:07:18 +0100 (CET)
Thomas Gleixner t...@linutronix.de wrote:


 Right. It's telling you if NOHZ is enabled. It's not telling you that
 NOHZ is active.

Yeah, which makes this code rather silly:

in rcu_prepare_for_idle():

/* Handle nohz enablement switches conservatively. */
tne = ACCESS_ONCE(tick_nohz_enabled);
if (tne != rdtp-tick_nohz_enabled_snap) {
if (rcu_cpu_has_callbacks(cpu, NULL))
invoke_rcu_core(); /* force nohz to see update. */
rdtp-tick_nohz_enabled_snap = tne;
return;
}

-- Steve
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2013-11-13 Thread Paul E. McKenney
On Wed, Nov 13, 2013 at 04:50:20PM +0100, Thomas Gleixner wrote:
 On Wed, 13 Nov 2013, Steven Rostedt wrote:
   I'm not saying that we are actually getting into nohz, but something
   with the nohz code is messing with cpu accounting.
  
  The trace does indeed show that a tick is happening, as the config has
  HZ=250 (4ms) and we see a tick happen every 4ms. But for some reason,
  we don't update the the idle time correctly when nohz is enabled.
  
  When I say nohz is enabled, I mean that we don't have nohz=off in the
  command line. There seems to be some difference between having nohz=off
  and having nohz disabled at runtime.
 
 Right that affects tick_nohz_enabled
 
 Two files use this variable:
 kernel/rcu/tree_plugin.h
 kernel/time/tick-sched.c
 
 The only accounting related stuff is in tick-sched.c:
 
 get_cpu_idle_time_us() and get_cpu_iowait_time_us()
 
 Both functions bail out if (!tick_nohz_enabled).
 
 The users of get_cpu_idle_time_us() are cpufreq and fs/proc/stat.c!
 
 Now the simplest fix is to let those functions check whether we
 actually switched into NOHZ mode. Should work for the RCU tree stuff
 as well.

RCU's use of tick_nohz_enabled is for the RCU_FAST_NO_HZ stuff.  If
it sees !tick_nohz_enabled, it skips trying to get RCU out of the way
of disabling the scheduling-clock tick.  If RCU detects a change
in the value of tick_nohz_enabled, it does a raise_softirq() to
force re-evaluation of the situation.

Thanx, Paul

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2013-11-13 Thread Paul E. McKenney
On Wed, Nov 13, 2013 at 11:12:57AM -0500, Steven Rostedt wrote:
 On Wed, 13 Nov 2013 17:07:18 +0100 (CET)
 Thomas Gleixner t...@linutronix.de wrote:
 
 
  Right. It's telling you if NOHZ is enabled. It's not telling you that
  NOHZ is active.
 
 Yeah, which makes this code rather silly:
 
 in rcu_prepare_for_idle():
 
   /* Handle nohz enablement switches conservatively. */
   tne = ACCESS_ONCE(tick_nohz_enabled);
   if (tne != rdtp-tick_nohz_enabled_snap) {
   if (rcu_cpu_has_callbacks(cpu, NULL))
   invoke_rcu_core(); /* force nohz to see update. */
   rdtp-tick_nohz_enabled_snap = tne;
   return;
   }

OK, what should I be checking instead?  Not much point in trying to
get RCU out of the way of disabling the scheduling-clock interrupt
if NOHZ is disabled.  ;-)

Thanx, Paul

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2013-11-13 Thread Thomas Gleixner
On Wed, 13 Nov 2013, Thomas Gleixner wrote:

 On Wed, 13 Nov 2013, Steven Rostedt wrote:
 
  On Wed, 13 Nov 2013 10:31:34 -0500
  Steven Rostedt rost...@goodmis.org wrote:
   
   The trace does indeed show that a tick is happening, as the config has
   HZ=250 (4ms) and we see a tick happen every 4ms. But for some reason,
   we don't update the the idle time correctly when nohz is enabled.
   
   When I say nohz is enabled, I mean that we don't have nohz=off in the
   command line. There seems to be some difference between having nohz=off
   and having nohz disabled at runtime.
  
  
  Looking at the differences between nohz=off from the command line, and
  disabled at run time seems to be the variable tick_nohz_enabled. I
  don't see where it gets set to zero except for nohz=off.
 
 Right. It's telling you if NOHZ is enabled. It's not telling you that
 NOHZ is active.

And its even worse:

tick_nohz_idle_enter()

/*
 * set ts-inidle unconditionally. even if the system did not
 * switch to nohz mode the cpu frequency governers rely on the
 * update of the idle time accounting in tick_nohz_start_idle().
 */
ts-inidle = 1;

So there is code relying on that accounting stuff even if nohz is not
active.

Frozen shark time 






--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2013-11-13 Thread Steven Rostedt
On Wed, 13 Nov 2013 08:18:29 -0800
Paul E. McKenney paul...@linux.vnet.ibm.com wrote:

 On Wed, Nov 13, 2013 at 11:12:57AM -0500, Steven Rostedt wrote:
  On Wed, 13 Nov 2013 17:07:18 +0100 (CET)
  Thomas Gleixner t...@linutronix.de wrote:
  
  
   Right. It's telling you if NOHZ is enabled. It's not telling you that
   NOHZ is active.
  
  Yeah, which makes this code rather silly:
  
  in rcu_prepare_for_idle():
  
  /* Handle nohz enablement switches conservatively. */
  tne = ACCESS_ONCE(tick_nohz_enabled);
  if (tne != rdtp-tick_nohz_enabled_snap) {
  if (rcu_cpu_has_callbacks(cpu, NULL))
  invoke_rcu_core(); /* force nohz to see update. */
  rdtp-tick_nohz_enabled_snap = tne;
  return;
  }
 
 OK, what should I be checking instead?  Not much point in trying to
 get RCU out of the way of disabling the scheduling-clock interrupt
 if NOHZ is disabled.  ;-)
 

I'll leave the answer to Thomas, but checking tick_nohz_enabled just
lets you know if someone booted with nohz=off or not (and has nohz
configured). But it doesn't tell you if nohz is actually being used.

That is, tick_nohz_enabled is set at bootup and never changes.

Perhaps this old hardware uncovered other bugs as well ;-)

-- Steve
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2013-11-13 Thread Steven Rostedt
On Wed, 13 Nov 2013 17:21:53 +0100 (CET)
Thomas Gleixner t...@linutronix.de wrote:

 
 Frozen shark time 

As long as it's not aimed at me ;-)

-- Steve

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2013-11-13 Thread Paul E. McKenney
On Wed, Nov 13, 2013 at 11:23:38AM -0500, Steven Rostedt wrote:
 On Wed, 13 Nov 2013 08:18:29 -0800
 Paul E. McKenney paul...@linux.vnet.ibm.com wrote:
 
  On Wed, Nov 13, 2013 at 11:12:57AM -0500, Steven Rostedt wrote:
   On Wed, 13 Nov 2013 17:07:18 +0100 (CET)
   Thomas Gleixner t...@linutronix.de wrote:
   
   
Right. It's telling you if NOHZ is enabled. It's not telling you that
NOHZ is active.
   
   Yeah, which makes this code rather silly:
   
   in rcu_prepare_for_idle():
   
 /* Handle nohz enablement switches conservatively. */
 tne = ACCESS_ONCE(tick_nohz_enabled);
 if (tne != rdtp-tick_nohz_enabled_snap) {
 if (rcu_cpu_has_callbacks(cpu, NULL))
 invoke_rcu_core(); /* force nohz to see update. */
 rdtp-tick_nohz_enabled_snap = tne;
 return;
 }
  
  OK, what should I be checking instead?  Not much point in trying to
  get RCU out of the way of disabling the scheduling-clock interrupt
  if NOHZ is disabled.  ;-)
 
 I'll leave the answer to Thomas, but checking tick_nohz_enabled just
 lets you know if someone booted with nohz=off or not (and has nohz
 configured). But it doesn't tell you if nohz is actually being used.

Based on Thomas's most recent response, it sounds like I need to check
a frozen shark or something.  ;-)

Thanx, Paul

 That is, tick_nohz_enabled is set at bootup and never changes.
 
 Perhaps this old hardware uncovered other bugs as well ;-)
 
 -- Steve
 

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2013-11-13 Thread Thomas Gleixner
On Wed, 13 Nov 2013, Paul E. McKenney wrote:
 On Wed, Nov 13, 2013 at 11:23:38AM -0500, Steven Rostedt wrote:
  On Wed, 13 Nov 2013 08:18:29 -0800
  Paul E. McKenney paul...@linux.vnet.ibm.com wrote:
  
   On Wed, Nov 13, 2013 at 11:12:57AM -0500, Steven Rostedt wrote:
On Wed, 13 Nov 2013 17:07:18 +0100 (CET)
Thomas Gleixner t...@linutronix.de wrote:


 Right. It's telling you if NOHZ is enabled. It's not telling you that
 NOHZ is active.

Yeah, which makes this code rather silly:

in rcu_prepare_for_idle():

/* Handle nohz enablement switches conservatively. */
tne = ACCESS_ONCE(tick_nohz_enabled);
if (tne != rdtp-tick_nohz_enabled_snap) {
if (rcu_cpu_has_callbacks(cpu, NULL))
invoke_rcu_core(); /* force nohz to see update. 
*/
rdtp-tick_nohz_enabled_snap = tne;
return;
}
   
   OK, what should I be checking instead?  Not much point in trying to
   get RCU out of the way of disabling the scheduling-clock interrupt
   if NOHZ is disabled.  ;-)
  
  I'll leave the answer to Thomas, but checking tick_nohz_enabled just
  lets you know if someone booted with nohz=off or not (and has nohz
  configured). But it doesn't tell you if nohz is actually being used.
 
 Based on Thomas's most recent response, it sounds like I need to check
 a frozen shark or something.  ;-)

Correct. Fix for the whole mess below.

Thanks,

tglx


Subject: NOHZ: Check for nohz active instead of nohz enabled

RCU and the fine grained idle time accounting functions check
tick_nohz_enabled. But that variable is merily telling that NOHZ has
been enabled in the config and not been disabled on the command line.

But it does not tell anything about nohz being active. That's what all
this should check for.

Matthew reported, that the idle accounting on his old P1 machine
showed bogus values, when he enabled NOHZ in the config and did not
disable it on the kernel command line. The reason is that his machine
uses (refined) jiffies as a clocksource which explains why the fine
grained accounting went into lala land, because it depends on when the
system goes and leaves idle relative to the jiffies increment.

Provide a tick_nohz_active indicator and let RCU and the accounting
code use this instead of tick_nohz_enable.

Reported-by: Matthew Whitehead tedheads...@gmail.com
Signed-off-by: Thomas Gleixner t...@linutronix.de
---
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 3822ac0..da6e6de 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1632,7 +1632,7 @@ module_param(rcu_idle_gp_delay, int, 0644);
 static int rcu_idle_lazy_gp_delay = RCU_IDLE_LAZY_GP_DELAY;
 module_param(rcu_idle_lazy_gp_delay, int, 0644);
 
-extern int tick_nohz_enabled;
+extern int tick_nohz_active;
 
 /*
  * Try to advance callbacks for all flavors of RCU on the current CPU, but
@@ -1729,7 +1729,7 @@ static void rcu_prepare_for_idle(int cpu)
int tne;
 
/* Handle nohz enablement switches conservatively. */
-   tne = ACCESS_ONCE(tick_nohz_enabled);
+   tne = ACCESS_ONCE(tick_nohz_active);
if (tne != rdtp-tick_nohz_enabled_snap) {
if (rcu_cpu_has_callbacks(cpu, NULL))
invoke_rcu_core(); /* force nohz to see update. */
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 3612fc7..a12df5a 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -361,8 +361,8 @@ void __init tick_nohz_init(void)
 /*
  * NO HZ enabled ?
  */
-int tick_nohz_enabled __read_mostly  = 1;
-
+static int tick_nohz_enabled __read_mostly  = 1;
+int tick_nohz_active  __read_mostly;
 /*
  * Enable / Disable tickless mode
  */
@@ -465,7 +465,7 @@ u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
struct tick_sched *ts = per_cpu(tick_cpu_sched, cpu);
ktime_t now, idle;
 
-   if (!tick_nohz_enabled)
+   if (!tick_nohz_active)
return -1;
 
now = ktime_get();
@@ -506,7 +506,7 @@ u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time)
struct tick_sched *ts = per_cpu(tick_cpu_sched, cpu);
ktime_t now, iowait;
 
-   if (!tick_nohz_enabled)
+   if (!tick_nohz_active)
return -1;
 
now = ktime_get();
@@ -799,11 +799,6 @@ void tick_nohz_idle_enter(void)
local_irq_disable();
 
ts = __get_cpu_var(tick_cpu_sched);
-   /*
-* set ts-inidle unconditionally. even if the system did not
-* switch to nohz mode the cpu frequency governers rely on the
-* update of the idle time accounting in tick_nohz_start_idle().
-*/
ts-inidle = 1;
__tick_nohz_idle_enter(ts);
 
@@ -973,7 +968,7 @@ static void tick_nohz_switch_to_nohz(void)
struct tick_sched *ts = __get_cpu_var(tick_cpu_sched);
ktime_t next;
 
- 

Re: nohz problem with idle time on old hardware

2013-11-13 Thread Steven Rostedt
On Wed, 13 Nov 2013 21:01:57 +0100 (CET)
Thomas Gleixner t...@linutronix.de wrote:


 
 Subject: NOHZ: Check for nohz active instead of nohz enabled
 
 RCU and the fine grained idle time accounting functions check
 tick_nohz_enabled. But that variable is merily telling that NOHZ has
 been enabled in the config and not been disabled on the command line.
 
 But it does not tell anything about nohz being active. That's what all
 this should check for.
 
 Matthew reported, that the idle accounting on his old P1 machine
 showed bogus values, when he enabled NOHZ in the config and did not
 disable it on the kernel command line. The reason is that his machine
 uses (refined) jiffies as a clocksource which explains why the fine
 grained accounting went into lala land, because it depends on when the
 system goes and leaves idle relative to the jiffies increment.
 
 Provide a tick_nohz_active indicator and let RCU and the accounting
 code use this instead of tick_nohz_enable.
 
 Reported-by: Matthew Whitehead tedheads...@gmail.com

Matthew, can you test this patch to make sure it causes the idle issue
to go away (remember to remove nohz=off from your command line).

Reviewed-by: Steven Rostedt rost...@goodmis.org

-- Steve

 Signed-off-by: Thomas Gleixner t...@linutronix.de
 ---
 diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
 index 3822ac0..da6e6de 100644
 --- a/kernel/rcu/tree_plugin.h
 +++ b/kernel/rcu/tree_plugin.h
 @@ -1632,7 +1632,7 @@ module_param(rcu_idle_gp_delay, int, 0644);
  static int rcu_idle_lazy_gp_delay = RCU_IDLE_LAZY_GP_DELAY;
  module_param(rcu_idle_lazy_gp_delay, int, 0644);
  
 -extern int tick_nohz_enabled;
 +extern int tick_nohz_active;
  
  /*
   * Try to advance callbacks for all flavors of RCU on the current CPU, but
 @@ -1729,7 +1729,7 @@ static void rcu_prepare_for_idle(int cpu)
   int tne;
  
   /* Handle nohz enablement switches conservatively. */
 - tne = ACCESS_ONCE(tick_nohz_enabled);
 + tne = ACCESS_ONCE(tick_nohz_active);
   if (tne != rdtp-tick_nohz_enabled_snap) {
   if (rcu_cpu_has_callbacks(cpu, NULL))
   invoke_rcu_core(); /* force nohz to see update. */
 diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
 index 3612fc7..a12df5a 100644
 --- a/kernel/time/tick-sched.c
 +++ b/kernel/time/tick-sched.c
 @@ -361,8 +361,8 @@ void __init tick_nohz_init(void)
  /*
   * NO HZ enabled ?
   */
 -int tick_nohz_enabled __read_mostly  = 1;
 -
 +static int tick_nohz_enabled __read_mostly  = 1;
 +int tick_nohz_active  __read_mostly;
  /*
   * Enable / Disable tickless mode
   */
 @@ -465,7 +465,7 @@ u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
   struct tick_sched *ts = per_cpu(tick_cpu_sched, cpu);
   ktime_t now, idle;
  
 - if (!tick_nohz_enabled)
 + if (!tick_nohz_active)
   return -1;
  
   now = ktime_get();
 @@ -506,7 +506,7 @@ u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time)
   struct tick_sched *ts = per_cpu(tick_cpu_sched, cpu);
   ktime_t now, iowait;
  
 - if (!tick_nohz_enabled)
 + if (!tick_nohz_active)
   return -1;
  
   now = ktime_get();
 @@ -799,11 +799,6 @@ void tick_nohz_idle_enter(void)
   local_irq_disable();
  
   ts = __get_cpu_var(tick_cpu_sched);
 - /*
 -  * set ts-inidle unconditionally. even if the system did not
 -  * switch to nohz mode the cpu frequency governers rely on the
 -  * update of the idle time accounting in tick_nohz_start_idle().
 -  */
   ts-inidle = 1;
   __tick_nohz_idle_enter(ts);
  
 @@ -973,7 +968,7 @@ static void tick_nohz_switch_to_nohz(void)
   struct tick_sched *ts = __get_cpu_var(tick_cpu_sched);
   ktime_t next;
  
 - if (!tick_nohz_enabled)
 + if (!tick_nohz_active)
   return;
  
   local_irq_disable();
 @@ -981,7 +976,7 @@ static void tick_nohz_switch_to_nohz(void)
   local_irq_enable();
   return;
   }
 -
 + tick_nohz_active = 1;
   ts-nohz_mode = NOHZ_MODE_LOWRES;
  
   /*
 @@ -1139,8 +1134,10 @@ void tick_setup_sched_timer(void)
   }
  
  #ifdef CONFIG_NO_HZ_COMMON
 - if (tick_nohz_enabled)
 + if (tick_nohz_enabled) {
   ts-nohz_mode = NOHZ_MODE_HIGHRES;
 + tick_nohz_active = 1;
 + }
  #endif
  }
  #endif /* HIGH_RES_TIMERS */

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: nohz problem with idle time on old hardware

2013-11-13 Thread Matthew Whitehead
On Wed, Nov 13, 2013 at 03:07:45PM -0500, Steven Rostedt wrote:
 On Wed, 13 Nov 2013 21:01:57 +0100 (CET)
 Thomas Gleixner t...@linutronix.de wrote:
 
 
  
  Subject: NOHZ: Check for nohz active instead of nohz enabled
  
  RCU and the fine grained idle time accounting functions check
  tick_nohz_enabled. But that variable is merily telling that NOHZ has
  been enabled in the config and not been disabled on the command line.
  
  But it does not tell anything about nohz being active. That's what all
  this should check for.
  
  Matthew reported, that the idle accounting on his old P1 machine
  showed bogus values, when he enabled NOHZ in the config and did not
  disable it on the kernel command line. The reason is that his machine
  uses (refined) jiffies as a clocksource which explains why the fine
  grained accounting went into lala land, because it depends on when the
  system goes and leaves idle relative to the jiffies increment.
  
  Provide a tick_nohz_active indicator and let RCU and the accounting
  code use this instead of tick_nohz_enable.
  
  Reported-by: Matthew Whitehead tedheads...@gmail.com
 
 Matthew, can you test this patch to make sure it causes the idle issue
 to go away (remember to remove nohz=off from your command line).
 

Early testing looks good:

# Verify I haven't overridden nohz as a kernel parameter
root@host:~# grep nohz /proc/cmdline 

# See how my kernel chose its clocksource
root@host:~# dmesg | egrep TSC|clocksource
[0.00] tsc: Fast TSC calibration using PIT
[2.492468] Switched to clocksource refined-jiffies
[   24.197356] Switched to clocksource tsc
[   26.211058] Switched to clocksource refined-jiffies

# Check idle time since the reboot
root@host:~# sar -s 16:28:00 -P ALL 
Linux 3.12.0-rc6+ (host)  11/13/2013  _i586_  (2 CPU)

04:28:26 PM   LINUX RESTART

04:32:02 PM CPU %user %nice   %system   %iowait%steal %idle
04:34:02 PM all  4.98  0.00  1.03  0.04  0.00 93.96
04:34:02 PM   0  4.07  0.00  0.69  0.03  0.00 95.21
04:34:02 PM   1  5.88  0.00  1.37  0.06  0.00 92.70
04:36:01 PM all  4.12  0.00  1.13  0.02  0.00 94.73
04:36:01 PM   0  1.98  0.00  1.34  0.02  0.00 96.66
04:36:01 PM   1  6.28  0.00  0.91  0.03  0.00 92.79
04:38:01 PM all  0.39  0.00  0.65  0.05  0.00 98.90
04:38:01 PM   0  0.33  0.00  0.54  0.03  0.00 99.09
04:38:01 PM   1  0.45  0.00  0.75  0.07  0.00 98.73
04:40:01 PM all  0.14  0.00  0.40  0.02  0.00 99.44
04:40:01 PM   0  0.12  0.00  0.26  0.01  0.00 99.62
04:40:01 PM   1  0.16  0.00  0.55  0.02  0.00 99.27
04:42:01 PM all  0.13  0.00  0.28  0.01  0.00 99.58
04:42:01 PM   0  0.09  0.00  0.14  0.01  0.00 99.76
04:42:01 PM   1  0.16  0.00  0.43  0.02  0.00 99.40
Average:all  1.95  0.00  0.70  0.03  0.00 97.33
Average:  0  1.32  0.00  0.59  0.02  0.00 98.07
Average:  1  2.58  0.00  0.80  0.04  0.00 96.58

I will continue to test but I think this is good.

Tested-by: Matthew Whitehead tedheads...@gmail.com

- Matthew W

 Reviewed-by: Steven Rostedt rost...@goodmis.org
 
 -- Steve
 
  Signed-off-by: Thomas Gleixner t...@linutronix.de
  ---
  diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
  index 3822ac0..da6e6de 100644
  --- a/kernel/rcu/tree_plugin.h
  +++ b/kernel/rcu/tree_plugin.h
  @@ -1632,7 +1632,7 @@ module_param(rcu_idle_gp_delay, int, 0644);
   static int rcu_idle_lazy_gp_delay = RCU_IDLE_LAZY_GP_DELAY;
   module_param(rcu_idle_lazy_gp_delay, int, 0644);
   
  -extern int tick_nohz_enabled;
  +extern int tick_nohz_active;
   
   /*
* Try to advance callbacks for all flavors of RCU on the current CPU, but
  @@ -1729,7 +1729,7 @@ static void rcu_prepare_for_idle(int cpu)
  int tne;
   
  /* Handle nohz enablement switches conservatively. */
  -   tne = ACCESS_ONCE(tick_nohz_enabled);
  +   tne = ACCESS_ONCE(tick_nohz_active);
  if (tne != rdtp-tick_nohz_enabled_snap) {
  if (rcu_cpu_has_callbacks(cpu, NULL))
  invoke_rcu_core(); /* force nohz to see update. */
  diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
  index 3612fc7..a12df5a 100644
  --- a/kernel/time/tick-sched.c
  +++ b/kernel/time/tick-sched.c
  @@ -361,8 +361,8 @@ void __init tick_nohz_init(void)
   /*
* NO HZ enabled ?
*/
  -int tick_nohz_enabled __read_mostly  = 1;
  -
  +static int tick_nohz_enabled __read_mostly  = 1;
  +int tick_nohz_active  __read_mostly;
   /*
* Enable / Disable tickless mode
*/
  @@ -465,7 +465,7