Re: [PATCH tip/core/rcu] Do not keep timekeeping CPU tick running for non-nohz_full= CPUs

2014-07-23 Thread Mike Galbraith
On Wed, 2014-07-23 at 18:02 +0200, Frederic Weisbecker wrote:

> Yes. Distros seem to want to make full dynticks available to users, but they
> also want the off case (when nohz_full= isn't passed) to keep the overhead
> as low as possible.

Yup, zero being the _desired_ number.  The general case can't afford any
fastpath cycles being added by a fringe feature, we're too fat now.

Imagines marketeer highlighting shiny new fringe feature: "...and most
network benchmarks show that we only lost a few percent throughput, so
you won't have to increase the size of your server farm _too_ much".

Ok, so my imagination conjured up a _stupid_ marketeer, you get it ;-)

-Mike

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH tip/core/rcu] Do not keep timekeeping CPU tick running for non-nohz_full= CPUs

2014-07-23 Thread Frederic Weisbecker
On Wed, Jul 23, 2014 at 09:31:59AM -0700, Paul E. McKenney wrote:
> On Wed, Jul 23, 2014 at 06:23:48PM +0200, Frederic Weisbecker wrote:
> > On Mon, Jul 21, 2014 at 10:33:06AM -0700, Paul E. McKenney wrote:
> > > On Mon, Jul 21, 2014 at 07:04:59PM +0200, Peter Zijlstra wrote:
> > > > On Mon, Jul 21, 2014 at 08:57:41AM -0700, Paul E. McKenney wrote:
> > > > > On Sun, Jul 20, 2014 at 10:34:17PM +0200, Peter Zijlstra wrote:
> > > > > > On Sun, Jul 20, 2014 at 04:47:59AM -0700, Paul E. McKenney wrote:
> > > > > > > So we really have to have -all- the CPUs be idle to turn off the 
> > > > > > > timekeeper.
> > > > > > 
> > > > > > That seems to be pretty unavoidable any which way around.
> > > > > 
> > > > > Hmmm...  The exception would be the likely common case where none of
> > > > > the CPUs are flagged as nohz_full= CPUs.  If we handled that case as
> > > > > if CONFIG_NO_HZ_FULL=n, we would have handled almost all of
> > > > > the problem.
> > > > 
> > > > You mean that is not currently the case? Yes that seems like a fairly
> > > > sane thing to do.
> > > 
> > > Hard to say -- need to see where Frederic is putting the call to
> > > rcu_sys_is_idle().  On the RCU side, I could potentially lower overhead
> > > by checking tick_nohz_full_enabled() in a few functions.
> > 
> > Yeah, you definitely can.
> >
> > Just put this at the very beginning of rcu_sys_is_idle():
> >
> > 	if (tick_nohz_full_enabled())
> > 		return true;
> 
> That would be !tick_nohz_full_enabled(), right?  But please see below.

Right.

> 
> > That perhaps implies a more appropriate name, like rcu_sys_need_timekeeper(),
> > with the condition inverted.
> 
> Ah, I thought that you already avoided invoking rcu_sys_is_idle() when
> !tick_nohz_full_enabled(), so I didn't add a check to that function.
> Are you planning to change this?  Or am I having eyesight problems?

Ah right, I forgot that I already have that check from the caller.

Thanks.
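
For reference, the guard this exchange settles on has roughly the following
shape. This is a minimal user-space sketch, not kernel code, and it assumes
the check lives in the caller as discussed above. tick_nohz_full_enabled()
and rcu_sys_is_idle() are the names used in the thread, but their bodies here
are stand-ins driven by hypothetical globals, and the caller name
timekeeper_can_stop_tick() is invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for kernel state; hypothetical, for illustration only. */
static bool nohz_full_enabled_flag;	/* true if nohz_full= was passed at boot */
static bool sys_idle_flag;		/* what RCU's sysidle tracking would report */
static int rcu_sys_is_idle_calls;	/* counts how often the slow path runs */

static bool tick_nohz_full_enabled(void)
{
	return nohz_full_enabled_flag;
}

static bool rcu_sys_is_idle(void)
{
	rcu_sys_is_idle_calls++;
	return sys_idle_flag;
}

/*
 * Hypothetical caller: may the (already idle) timekeeping CPU stop its tick?
 * Without nohz_full= CPUs the sysidle machinery is skipped entirely, so the
 * off case behaves as if CONFIG_NO_HZ_FULL=n.
 */
static bool timekeeper_can_stop_tick(void)
{
	if (!tick_nohz_full_enabled())
		return true;
	return rcu_sys_is_idle();
}
```

With the flag clear, rcu_sys_is_idle() is never reached, which is the
zero-overhead behaviour wanted for the off case.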


Re: [PATCH tip/core/rcu] Do not keep timekeeping CPU tick running for non-nohz_full= CPUs

2014-07-23 Thread Paul E. McKenney
On Wed, Jul 23, 2014 at 06:23:48PM +0200, Frederic Weisbecker wrote:
> On Mon, Jul 21, 2014 at 10:33:06AM -0700, Paul E. McKenney wrote:
> > On Mon, Jul 21, 2014 at 07:04:59PM +0200, Peter Zijlstra wrote:
> > > On Mon, Jul 21, 2014 at 08:57:41AM -0700, Paul E. McKenney wrote:
> > > > On Sun, Jul 20, 2014 at 10:34:17PM +0200, Peter Zijlstra wrote:
> > > > > On Sun, Jul 20, 2014 at 04:47:59AM -0700, Paul E. McKenney wrote:
> > > > > > So we really have to have -all- the CPUs be idle to turn off the 
> > > > > > timekeeper.
> > > > > 
> > > > > That seems to be pretty unavoidable any which way around.
> > > > 
> > > > Hmmm...  The exception would be the likely common case where none of
> > > > the CPUs are flagged as nohz_full= CPUs.  If we handled that case as
> > > > if CONFIG_NO_HZ_FULL=n, we would have handled almost all of
> > > > the problem.
> > > 
> > > You mean that is not currently the case? Yes that seems like a fairly
> > > sane thing to do.
> > 
> > Hard to say -- need to see where Frederic is putting the call to
> > rcu_sys_is_idle().  On the RCU side, I could potentially lower overhead
> > by checking tick_nohz_full_enabled() in a few functions.
> 
> Yeah, you definitely can.
>
> Just put this at the very beginning of rcu_sys_is_idle():
>
> 	if (tick_nohz_full_enabled())
> 		return true;

That would be !tick_nohz_full_enabled(), right?  But please see below.

> That perhaps implies a more appropriate name, like rcu_sys_need_timekeeper(),
> with the condition inverted.

Ah, I thought that you already avoided invoking rcu_sys_is_idle() when
!tick_nohz_full_enabled(), so I didn't add a check to that function.
Are you planning to change this?  Or am I having eyesight problems?

Thanx, Paul



Re: [PATCH tip/core/rcu] Do not keep timekeeping CPU tick running for non-nohz_full= CPUs

2014-07-23 Thread Frederic Weisbecker
On Mon, Jul 21, 2014 at 10:33:06AM -0700, Paul E. McKenney wrote:
> On Mon, Jul 21, 2014 at 07:04:59PM +0200, Peter Zijlstra wrote:
> > On Mon, Jul 21, 2014 at 08:57:41AM -0700, Paul E. McKenney wrote:
> > > On Sun, Jul 20, 2014 at 10:34:17PM +0200, Peter Zijlstra wrote:
> > > > On Sun, Jul 20, 2014 at 04:47:59AM -0700, Paul E. McKenney wrote:
> > > > > So we really have to have -all- the CPUs be idle to turn off the 
> > > > > timekeeper.
> > > > 
> > > > That seems to be pretty unavoidable any which way around.
> > > 
> > > Hmmm...  The exception would be the likely common case where none of
> > > the CPUs are flagged as nohz_full= CPUs.  If we handled that case as
> > > if CONFIG_NO_HZ_FULL=n, we would have handled almost all of
> > > the problem.
> > 
> > You mean that is not currently the case? Yes that seems like a fairly
> > sane thing to do.
> 
> Hard to say -- need to see where Frederic is putting the call to
> rcu_sys_is_idle().  On the RCU side, I could potentially lower overhead
> by checking tick_nohz_full_enabled() in a few functions.

Yeah, you definitely can.

Just put this at the very beginning of rcu_sys_is_idle():

	if (tick_nohz_full_enabled())
		return true;

That perhaps implies a more appropriate name, like rcu_sys_need_timekeeper(),
with the condition inverted.


Re: [PATCH tip/core/rcu] Do not keep timekeeping CPU tick running for non-nohz_full= CPUs

2014-07-23 Thread Frederic Weisbecker
On Mon, Jul 21, 2014 at 08:59:22AM -0700, Paul E. McKenney wrote:
> On Mon, Jul 21, 2014 at 12:12:48AM +0200, Frederic Weisbecker wrote:
> > On Sun, Jul 20, 2014 at 04:47:59AM -0700, Paul E. McKenney wrote:
> > > On Sat, Jul 19, 2014 at 08:01:24PM +0200, Frederic Weisbecker wrote:
> > > > On Sat, Jul 19, 2014 at 09:53:50AM -0700, Paul E. McKenney wrote:
> > > > > If a non-nohz_full= CPU is non-idle, it will have a scheduling-clock
> > > > > interrupt, and therefore doesn't need the timekeeping CPU to keep
> > > > > its scheduling-clock interrupt going.  This commit therefore ignores
> > > > > the idle state of non-nohz_full CPUs when determining whether or not
> > > > > the timekeeping CPU can safely turn off its scheduling-clock 
> > > > > interrupt.
> > > > > 
> > > > > Signed-off-by: Paul E. McKenney 
> > > > 
> > > > Unfortunately that's not how things work. Running a CPU tick doesn't
> > > > necessarily imply running the timekeeping duty.
> > > >
> > > > Only the timekeeper can update the timekeeping. There is an exception
> > > > though: the timekeeping is also updated by dynticks-idle CPUs when they
> > > > wake up in an interrupt from idle.
> > > >
> > > > Here is why it doesn't work in practice:
> > > >
> > > > So let's say CPU 0 is the timekeeper, CPU 1 is a non-nohz_full CPU and
> > > > all others are nohz_full. CPU 0 is sleeping. CPU 1 wakes up from idle,
> > > > so it has up-to-date timekeeping, but then if it continues to execute
> > > > further without waking up CPU 0, it risks stale timestamps.
> > > >
> > > > This can be changed by allowing timekeeping duty from all non-nohz_full
> > > > CPUs. That's the initial direction I took, but it involved a lot of
> > > > complications and scalability issues.
> > > 
> > > So we really have to have -all- the CPUs be idle to turn off the 
> > > timekeeper.
> > > This won't make the battery-powered embedded guys happy...
> > 
> > I can imagine all sorts of solutions to solve this. None of them look simple
> > though. And I'm really convinced this isn't worth it until some user comes
> > along and reports to me that 1) he seriously uses full dynticks and 2) he
> > needs non-nohz_full CPUs other than CPU 0.
> > 
> > If 1 and 2 ever happen, I'll gladly work on this.
> 
> Does the thought of special-casing the situation where CONFIG_NO_HZ_FULL=y,
> CONFIG_NO_HZ_FULL_SYSIDLE=y, and there are no nohz_full= CPUs make sense?

Yes. Distros seem to want to make full dynticks available to users, but they
also want the off case (when nohz_full= isn't passed) to keep the overhead
as low as possible.

So CONFIG_NO_HZ_FULL_SYSIDLE=y should probably do the same, as it's expected
to be a default choice as well.

> 
> > > Other thoughts on this?  We really should not be setting
> > > CONFIG_NO_HZ_FULL_SYSIDLE by default until this is solved.
> > 
> > Well it's better to save energy when all CPUs are idle than never.
> 
> Fair point!
> 
>   Thanx, Paul
> 


Re: [PATCH tip/core/rcu] Do not keep timekeeping CPU tick running for non-nohz_full= CPUs

2014-07-23 Thread Frederic Weisbecker
On Mon, Jul 21, 2014 at 08:57:41AM -0700, Paul E. McKenney wrote:
> On Sun, Jul 20, 2014 at 10:34:17PM +0200, Peter Zijlstra wrote:
> > On Sun, Jul 20, 2014 at 04:47:59AM -0700, Paul E. McKenney wrote:
> > > So we really have to have -all- the CPUs be idle to turn off the 
> > > timekeeper.
> > 
> > That seems to be pretty unavoidable any which way around.
> 
> Hmmm...  The exception would be the likely common case where none of
> the CPUs are flagged as nohz_full= CPUs.  If we handled that case as
> if CONFIG_NO_HZ_FULL=n, we would have handled almost all of
> the problem.

Exactly. As you said in a later post, tick_nohz_full_enabled() is
the magic you need :)

> 
> > > This won't make the battery-powered embedded guys happy...
> > > 
> > > Other thoughts on this?  We really should not be setting
> > > CONFIG_NO_HZ_FULL_SYSIDLE by default until this is solved.
> > 
> > What are those same guys doing with nohz_full to begin with?
> 
> If CONFIG_NO_HZ_FULL_SYSIDLE=y is the default, my main concern is for
> people who didn't really want it, and who thus did not set the nohz_full=
> boot parameter.  Hence my suggestion above that we treat that case as
> if CONFIG_NO_HZ_FULL=n (and thus also as if CONFIG_NO_HZ_FULL_SYSIDLE=n).
> 
> There have been some people saying that they want only a subset of
> their CPUs in nohz_full= state, and these guys seem to want to run a
> mixed workload.  For example, they have HPC (or RT) workloads on the
> nohz_full= CPUs, and also want normal high-throughput processing on the
> remaining CPUs.  If software was trivial (and making other unlikely
> assumptions about the perfection of the world and the invalidity of
> Murphy's law), we would want the timekeeping CPU to be able to move
> among the non-nohz_full= CPUs.
> 
> However, this should be a small fraction of the users, and many of
> these guys would probably be open to making a few changes.  Thus, a
> less-proactive approach should allow us to solve their actual problems, as
> opposed to the problems that we speculate that they might encounter.  ;-)

Sounds like a pretty good way of doing things!

>   Thanx, Paul
> 


Re: [PATCH tip/core/rcu] Do not keep timekeeping CPU tick running for non-nohz_full= CPUs

2014-07-21 Thread Paul E. McKenney
On Mon, Jul 21, 2014 at 07:04:59PM +0200, Peter Zijlstra wrote:
> On Mon, Jul 21, 2014 at 08:57:41AM -0700, Paul E. McKenney wrote:
> > On Sun, Jul 20, 2014 at 10:34:17PM +0200, Peter Zijlstra wrote:
> > > On Sun, Jul 20, 2014 at 04:47:59AM -0700, Paul E. McKenney wrote:
> > > > So we really have to have -all- the CPUs be idle to turn off the 
> > > > timekeeper.
> > > 
> > > That seems to be pretty unavoidable any which way around.
> > 
> > Hmmm...  The exception would be the likely common case where none of
> > the CPUs are flagged as nohz_full= CPUs.  If we handled that case as
> > if CONFIG_NO_HZ_FULL=n, we would have handled almost all of
> > the problem.
> 
> You mean that is not currently the case? Yes that seems like a fairly
> sane thing to do.

Hard to say -- need to see where Frederic is putting the call to
rcu_sys_is_idle().  On the RCU side, I could potentially lower overhead
by checking tick_nohz_full_enabled() in a few functions.

> > > > This won't make the battery-powered embedded guys happy...
> > > > 
> > > > Other thoughts on this?  We really should not be setting
> > > > CONFIG_NO_HZ_FULL_SYSIDLE by default until this is solved.
> > > 
> > > What are those same guys doing with nohz_full to begin with?
> > 
> > If CONFIG_NO_HZ_FULL_SYSIDLE=y is the default, my main concern is for
> > people who didn't really want it, and who thus did not set the nohz_full=
> > boot parameter.  Hence my suggestion above that we treat that case as
> > if CONFIG_NO_HZ_FULL=n (and thus also as if CONFIG_NO_HZ_FULL_SYSIDLE=n).
> 
> ack
> 
> > There have been some people saying that they want only a subset of
> > their CPUs in nohz_full= state, and these guys seem to want to run a
> > mixed workload.  For example, they have HPC (or RT) workloads on the
> > nohz_full= CPUs, and also want normal high-throughput processing on the
> > remaining CPUs.  If software was trivial (and making other unlikely
> > assumptions about the perfection of the world and the invalidity of
> > Murphy's law), we would want the timekeeping CPU to be able to move
> > among the non-nohz_full= CPUs.
> 
> Yeah, I don't see a problem with that, but then I'm not entirely sure
> why we use RCU to track system idle state.

Because RCU needs to do very similar tracking to deal with dyntick-idle
CPUs and the various types of RCU grace periods.

> > However, this should be a small fraction of the users, and many of
> > these guys would probably be open to making a few changes.  Thus, a
> > less-proactive approach should allow us to solve their actual problems, as
> > opposed to the problems that we speculate that they might encounter.  ;-)
> 
> But you still haven't talked about the battery people... I don't think
> nohz_full is something they should care about / use.

For all I know, they might care, but it is all speculative at this point.
The possible use cases would be if they were needing some HPC-style
computations for some misbegotten mobile implementation of some
misbegotten game.

So as far as I know at this point, the common case for the battery-powered
guys is that they don't want unconditional scheduling-clock interrupts
on CPU 0 when CPU 0 is idle, and that case is covered by our discussion
above.

Thanx, Paul



Re: [PATCH tip/core/rcu] Do not keep timekeeping CPU tick running for non-nohz_full= CPUs

2014-07-21 Thread Peter Zijlstra
On Mon, Jul 21, 2014 at 08:57:41AM -0700, Paul E. McKenney wrote:
> On Sun, Jul 20, 2014 at 10:34:17PM +0200, Peter Zijlstra wrote:
> > On Sun, Jul 20, 2014 at 04:47:59AM -0700, Paul E. McKenney wrote:
> > > So we really have to have -all- the CPUs be idle to turn off the 
> > > timekeeper.
> > 
> > That seems to be pretty unavoidable any which way around.
> 
> Hmmm...  The exception would be the likely common case where none of
> the CPUs are flagged as nohz_full= CPUs.  If we handled that case as
> if CONFIG_NO_HZ_FULL=n, we would have handled almost all of
> the problem.

You mean that is not currently the case? Yes that seems like a fairly
sane thing to do.

> > > This won't make the battery-powered embedded guys happy...
> > > 
> > > Other thoughts on this?  We really should not be setting
> > > CONFIG_NO_HZ_FULL_SYSIDLE by default until this is solved.
> > 
> > What are those same guys doing with nohz_full to begin with?
> 
> If CONFIG_NO_HZ_FULL_SYSIDLE=y is the default, my main concern is for
> people who didn't really want it, and who thus did not set the nohz_full=
> boot parameter.  Hence my suggestion above that we treat that case as
> if CONFIG_NO_HZ_FULL=n (and thus also as if CONFIG_NO_HZ_FULL_SYSIDLE=n).

ack

> There have been some people saying that they want only a subset of
> their CPUs in nohz_full= state, and these guys seem to want to run a
> mixed workload.  For example, they have HPC (or RT) workloads on the
> nohz_full= CPUs, and also want normal high-throughput processing on the
> remaining CPUs.  If software was trivial (and making other unlikely
> assumptions about the perfection of the world and the invalidity of
> Murphy's law), we would want the timekeeping CPU to be able to move
> among the non-nohz_full= CPUs.

Yeah, I don't see a problem with that, but then I'm not entirely sure
why we use RCU to track system idle state.

> However, this should be a small fraction of the users, and many of
> these guys would probably be open to making a few changes.  Thus, a
> less-proactive approach should allow us to solve their actual problems, as
> opposed to the problems that we speculate that they might encounter.  ;-)

But you still haven't talked about the battery people... I don't think
nohz_full is something they should care about / use.


Re: [PATCH tip/core/rcu] Do not keep timekeeping CPU tick running for non-nohz_full= CPUs

2014-07-21 Thread Paul E. McKenney
On Mon, Jul 21, 2014 at 12:12:48AM +0200, Frederic Weisbecker wrote:
> On Sun, Jul 20, 2014 at 04:47:59AM -0700, Paul E. McKenney wrote:
> > On Sat, Jul 19, 2014 at 08:01:24PM +0200, Frederic Weisbecker wrote:
> > > On Sat, Jul 19, 2014 at 09:53:50AM -0700, Paul E. McKenney wrote:
> > > > If a non-nohz_full= CPU is non-idle, it will have a scheduling-clock
> > > > interrupt, and therefore doesn't need the timekeeping CPU to keep
> > > > its scheduling-clock interrupt going.  This commit therefore ignores
> > > > the idle state of non-nohz_full CPUs when determining whether or not
> > > > the timekeeping CPU can safely turn off its scheduling-clock interrupt.
> > > > 
> > > > Signed-off-by: Paul E. McKenney 
> > > 
> > > Unfortunately that's not how things work. Running a CPU tick doesn't
> > > necessarily imply running the timekeeping duty.
> > >
> > > Only the timekeeper can update the timekeeping. There is an exception
> > > though: the timekeeping is also updated by dynticks-idle CPUs when they
> > > wake up in an interrupt from idle.
> > >
> > > Here is why it doesn't work in practice:
> > >
> > > So let's say CPU 0 is the timekeeper, CPU 1 is a non-nohz_full CPU and
> > > all others are nohz_full. CPU 0 is sleeping. CPU 1 wakes up from idle,
> > > so it has up-to-date timekeeping, but then if it continues to execute
> > > further without waking up CPU 0, it risks stale timestamps.
> > >
> > > This can be changed by allowing timekeeping duty from all non-nohz_full
> > > CPUs. That's the initial direction I took, but it involved a lot of
> > > complications and scalability issues.
> > 
> > So we really have to have -all- the CPUs be idle to turn off the timekeeper.
> > This won't make the battery-powered embedded guys happy...
> 
> I can imagine all sorts of solutions to solve this. None of them look simple
> though. And I'm really convinced this isn't worth it until some user comes
> along and reports to me that 1) he seriously uses full dynticks and 2) he
> needs non-nohz_full CPUs other than CPU 0.
> 
> If 1 and 2 ever happen, I'll gladly work on this.

Does the thought of special-casing the situation where CONFIG_NO_HZ_FULL=y,
CONFIG_NO_HZ_FULL_SYSIDLE=y, and there are no nohz_full= CPUs make sense?

> > Other thoughts on this?  We really should not be setting
> > CONFIG_NO_HZ_FULL_SYSIDLE by default until this is solved.
> 
> Well it's better to save energy when all CPUs are idle than never.

Fair point!

Thanx, Paul



Re: [PATCH tip/core/rcu] Do not keep timekeeping CPU tick running for non-nohz_full= CPUs

2014-07-21 Thread Paul E. McKenney
On Sun, Jul 20, 2014 at 10:34:17PM +0200, Peter Zijlstra wrote:
> On Sun, Jul 20, 2014 at 04:47:59AM -0700, Paul E. McKenney wrote:
> > So we really have to have -all- the CPUs be idle to turn off the timekeeper.
> 
> That seems to be pretty unavoidable any which way around.

Hmmm...  The exception would be the likely common case where none of
the CPUs are flagged as nohz_full= CPUs.  If we handled that case as
if CONFIG_NO_HZ_FULL=n, we would have handled almost all of
the problem.

> > This won't make the battery-powered embedded guys happy...
> > 
> > Other thoughts on this?  We really should not be setting
> > CONFIG_NO_HZ_FULL_SYSIDLE by default until this is solved.
> 
> What are those same guys doing with nohz_full to begin with?

If CONFIG_NO_HZ_FULL_SYSIDLE=y is the default, my main concern is for
people who didn't really want it, and who thus did not set the nohz_full=
boot parameter.  Hence my suggestion above that we treat that case as
if CONFIG_NO_HZ_FULL=n (and thus also as if CONFIG_NO_HZ_FULL_SYSIDLE=n).

There have been some people saying that they want only a subset of
their CPUs in nohz_full= state, and these guys seem to want to run a
mixed workload.  For example, they have HPC (or RT) workloads on the
nohz_full= CPUs, and also want normal high-throughput processing on the
remaining CPUs.  If software was trivial (and making other unlikely
assumptions about the perfection of the world and the invalidity of
Murphy's law), we would want the timekeeping CPU to be able to move
among the non-nohz_full= CPUs.

However, this should be a small fraction of the users, and many of
these guys would probably be open to making a few changes.  Thus, a
less-proactive approach should allow us to solve their actual problems, as
opposed to the problems that we speculate that they might encounter.  ;-)

Thanx, Paul



Re: [PATCH tip/core/rcu] Do not keep timekeeping CPU tick running for non-nohz_full= CPUs

2014-07-20 Thread Frederic Weisbecker
On Sun, Jul 20, 2014 at 04:47:59AM -0700, Paul E. McKenney wrote:
> On Sat, Jul 19, 2014 at 08:01:24PM +0200, Frederic Weisbecker wrote:
> > On Sat, Jul 19, 2014 at 09:53:50AM -0700, Paul E. McKenney wrote:
> > > If a non-nohz_full= CPU is non-idle, it will have a scheduling-clock
> > > interrupt, and therefore doesn't need the timekeeping CPU to keep
> > > its scheduling-clock interrupt going.  This commit therefore ignores
> > > the idle state of non-nohz_full CPUs when determining whether or not
> > > the timekeeping CPU can safely turn off its scheduling-clock interrupt.
> > > 
> > > Signed-off-by: Paul E. McKenney 
> > 
> > Unfortunately that's not how things work. Running a CPU tick doesn't
> > necessarily imply running the timekeeping duty.
> >
> > Only the timekeeper can update the timekeeping. There is an exception
> > though: the timekeeping is also updated by dynticks-idle CPUs when they
> > wake up in an interrupt from idle.
> >
> > Here is why it doesn't work in practice:
> >
> > So let's say CPU 0 is the timekeeper, CPU 1 is a non-nohz_full CPU and
> > all others are nohz_full. CPU 0 is sleeping. CPU 1 wakes up from idle,
> > so it has up-to-date timekeeping, but then if it continues to execute
> > further without waking up CPU 0, it risks stale timestamps.
> >
> > This can be changed by allowing timekeeping duty from all non-nohz_full
> > CPUs. That's the initial direction I took, but it involved a lot of
> > complications and scalability issues.
> 
> So we really have to have -all- the CPUs be idle to turn off the timekeeper.
> This won't make the battery-powered embedded guys happy...

I can imagine all sorts of solutions to solve this. None of them look simple
though. And I'm really convinced this isn't worth it until some user comes
along and reports to me that 1) he seriously uses full dynticks and 2) he
needs non-nohz_full CPUs other than CPU 0.

If 1 and 2 ever happen, I'll gladly work on this.

> 
> Other thoughts on this?  We really should not be setting
> CONFIG_NO_HZ_FULL_SYSIDLE by default until this is solved.

Well it's better to save energy when all CPUs are idle than never.
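
The failure scenario quoted above (CPU 0 is the timekeeper and asleep, CPU 1
wakes from idle and then keeps running) can be modelled with a toy program.
This is only a sketch under simplifying assumptions: struct timekeeping and
the three helpers are hypothetical names, time is counted in whole tick
periods, and "jiffies" stands for whatever the timekeeping duty maintains.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: "now" is real time in tick periods, "jiffies" is the kernel's view. */
struct timekeeping {
	unsigned long now;
	unsigned long jiffies;
	bool timekeeper_tick_running;	/* is CPU 0's tick still armed? */
};

/* One tick period passes; only the timekeeper's tick updates jiffies. */
static void advance_tick(struct timekeeping *tk)
{
	assert(tk);
	tk->now++;
	if (tk->timekeeper_tick_running)
		tk->jiffies = tk->now;
}

/* The exception: a dynticks-idle CPU catches up once when it wakes from idle. */
static void wake_from_idle(struct timekeeping *tk)
{
	assert(tk);
	tk->jiffies = tk->now;
}

/* How far the kernel's view lags behind real time, in tick periods. */
static unsigned long staleness(const struct timekeeping *tk)
{
	assert(tk);
	return tk->now - tk->jiffies;
}
```

In this model, calling wake_from_idle() once and then advance_tick()
repeatedly with timekeeper_tick_running clear makes staleness() grow without
bound, even though the running CPU has its own tick. That is the stale
timestamp risk described in the quoted text.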


Re: [PATCH tip/core/rcu] Do not keep timekeeping CPU tick running for non-nohz_full= CPUs

2014-07-20 Thread Peter Zijlstra
On Sun, Jul 20, 2014 at 04:47:59AM -0700, Paul E. McKenney wrote:
> So we really have to have -all- the CPUs be idle to turn off the timekeeper.

That seems to be pretty unavoidable any which way around.

> This won't make the battery-powered embedded guys happy...
> 
> Other thoughts on this?  We really should not be setting
> CONFIG_NO_HZ_FULL_SYSIDLE by default until this is solved.

What are those same guys doing with nohz_full to begin with?




Re: [PATCH tip/core/rcu] Do not keep timekeeping CPU tick running for non-nohz_full= CPUs

2014-07-20 Thread Paul E. McKenney
On Sat, Jul 19, 2014 at 08:01:24PM +0200, Frederic Weisbecker wrote:
> On Sat, Jul 19, 2014 at 09:53:50AM -0700, Paul E. McKenney wrote:
> > If a non-nohz_full= CPU is non-idle, it will have a scheduling-clock
> > interrupt, and therefore doesn't need the timekeeping CPU to keep
> > its scheduling-clock interrupt going.  This commit therefore ignores
> > the idle state of non-nohz_full CPUs when determining whether or not
> > the timekeeping CPU can safely turn off its scheduling-clock interrupt.
> > 
> > Signed-off-by: Paul E. McKenney 
> 
> Unfortunately that's not how things work. Running a CPU tick doesn't
> necessarily imply running the timekeeping duty.
>
> Only the timekeeper can update the timekeeping. There is an exception
> though: the timekeeping is also updated by dynticks-idle CPUs when they
> wake up in an interrupt from idle.
>
> Here is why it doesn't work in practice:
>
> So let's say CPU 0 is the timekeeper, CPU 1 is a non-nohz_full CPU and
> all others are nohz_full. CPU 0 is sleeping. CPU 1 wakes up from idle,
> so it has up-to-date timekeeping, but then if it continues to execute
> further without waking up CPU 0, it risks stale timestamps.
>
> This can be changed by allowing timekeeping duty from all non-nohz_full
> CPUs. That's the initial direction I took, but it involved a lot of
> complications and scalability issues.

So we really have to have -all- the CPUs be idle to turn off the timekeeper.
This won't make the battery-powered embedded guys happy...

Other thoughts on this?  We really should not be setting
CONFIG_NO_HZ_FULL_SYSIDLE by default until this is solved.

Thanx, Paul

> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index ddad959a9132..eaa32e4c228d 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -2789,8 +2789,13 @@ static void rcu_sysidle_exit(struct rcu_dynticks *rdtp, int irq)
> >  * system-idle state.  This means that the timekeeping CPU must
> >  * invoke rcu_sysidle_force_exit() directly if it does anything
> >  * more than take a scheduling-clock interrupt.
> > +*
> > +* In addition if we are not a nohz_full= CPU, then when we are
> > +* non-idle we have our own tick, so we don't need the timekeeping
> > +* CPU to keep a tick on our behalf.  We assume that the timekeeping
> > +* CPU is also a nohz_full= CPU.
> >  */
> > -   if (smp_processor_id() == tick_do_timer_cpu)
> > +   if (!tick_nohz_full_cpu(smp_processor_id()))
> > return;
> >  
> > /* Update system-idle state: We are clearly no longer fully idle! */
> > @@ -2810,11 +2815,11 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, bool *isidle,
> >  
> > /*
> >  * If some other CPU has already reported non-idle, if this is
> > -* not the flavor of RCU that tracks sysidle state, or if this
> > -* is an offline or the timekeeping CPU, nothing to do.
> > +* not the flavor of RCU that tracks sysidle state, or if this is
> > +* an offline or !nohz_full= or the timekeeping CPU, nothing to do.
> >  */
> > if (!*isidle || rdp->rsp != rcu_sysidle_state ||
> > -   cpu_is_offline(rdp->cpu) || rdp->cpu == tick_do_timer_cpu)
> > +   cpu_is_offline(rdp->cpu) || !tick_nohz_full_cpu(rdp->cpu))
> > return;
> > if (rcu_gp_in_progress(rdp->rsp))
> > WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu);
> > 
> 
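
To make the difference between the two tests in the quoted diff concrete,
here is a small user-space sketch. It is not the kernel code: struct cpu_state
and both function names are hypothetical, and each function only mirrors the
skip condition of one side of the rcu_sysidle_check_cpu() hunk above (skip
offline CPUs and the timekeeper, versus skip offline CPUs and every
non-nohz_full= CPU).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU snapshot, for illustration only. */
struct cpu_state {
	bool online;
	bool nohz_full;	/* listed in nohz_full= */
	bool idle;
};

/* Pre-patch rule: every online CPU except the timekeeper must be idle. */
static bool sysidle_all_cpus(const struct cpu_state *cpus, size_t n,
			     size_t timekeeper)
{
	assert(cpus);
	for (size_t i = 0; i < n; i++) {
		if (i == timekeeper || !cpus[i].online)
			continue;
		if (!cpus[i].idle)
			return false;
	}
	return true;
}

/* Proposed rule: only online nohz_full= CPUs are considered. */
static bool sysidle_nohz_full_only(const struct cpu_state *cpus, size_t n)
{
	assert(cpus);
	for (size_t i = 0; i < n; i++) {
		if (!cpus[i].online || !cpus[i].nohz_full)
			continue;
		if (!cpus[i].idle)
			return false;
	}
	return true;
}
```

With CPU 0 as an idle timekeeper, CPU 1 a busy non-nohz_full= CPU and the
remaining CPUs idle nohz_full= CPUs, the first function reports "not idle"
and the second reports "idle". That second answer is the case the quoted
objection is about, since nothing then keeps the timekeeping up to date for
CPU 1.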



Re: [PATCH tip/core/rcu] Do not keep timekeeping CPU tick running for non-nohz_full= CPUs

2014-07-19 Thread Frederic Weisbecker
2014-07-19 20:28 GMT+02:00 Peter Zijlstra :
> On Sat, Jul 19, 2014 at 08:01:24PM +0200, Frederic Weisbecker wrote:
>> This can be changed by allowing timekeeping duty from all non-nohz_full
>> CPUs. That's the initial direction I took, but it involved a lot of
>> complications and scalability issues.
>
> How so, currently any CPU can be timekeeper, how is any !nohz_full cpu
> different?

If timekeeping becomes a movable target in nohz full then we need to
make rcu_sys_is_idle() callable concurrently and we must send the
timekeeping-wakeup IPI to a possibly moving target. All that is a
predictable nightmare both in terms of complexity and scalability.

That's the direction I took initially
(https://lkml.org/lkml/2013/12/17/708) but I quickly gave up. The
diffstat would have had to double to do it correctly. Moreover, having
non-nohz-full CPUs other than CPU 0 is expected to be a corner case. A
corner case for a barely used feature (nohz full) as of today.

Also you might want to read tglx's opinion on movable timekeepers in
nohz full:
http://marc.info/?i=alpine.DEB.2.02.1405092358390.6261%40ionos.tec.linutronix.de


Re: [PATCH tip/core/rcu] Do not keep timekeeping CPU tick running for non-nohz_full= CPUs

2014-07-19 Thread Peter Zijlstra
On Sat, Jul 19, 2014 at 08:01:24PM +0200, Frederic Weisbecker wrote:
> This can be changed by allowing timekeeping duty from all non-nohz_full CPUs,
> that's the initial direction I took, but it involved a lot of complications
> and scalability issues.

How so, currently any CPU can be timekeeper, how is any !nohz_full cpu
different?


Re: [PATCH tip/core/rcu] Do not keep timekeeping CPU tick running for non-nohz_full= CPUs

2014-07-19 Thread Frederic Weisbecker
On Sat, Jul 19, 2014 at 09:53:50AM -0700, Paul E. McKenney wrote:
> If a non-nohz_full= CPU is non-idle, it will have a scheduling-clock
> interrupt, and therefore doesn't need the timekeeping CPU to keep
> its scheduling-clock interrupt going.  This commit therefore ignores
> the idle state of non-nohz_full CPUs when determining whether or not
> the timekeeping CPU can safely turn off its scheduling-clock interrupt.
> 
> Signed-off-by: Paul E. McKenney 

Unfortunately that's not how things work. Running a CPU tick doesn't necessarily
imply running the timekeeping duty.

Only the timekeeper can update the timekeeping. There is an exception though:
the timekeeping is also updated by dynticks-idle CPUs when they wake up in an
interrupt from idle.

Here is why it doesn't work in practice:

So let's say CPU 0 is the timekeeper, CPU 1 is a non-nohz-full CPU, and all
others are full-nohz. CPU 0 is sleeping. CPU 1 wakes up from idle, so it has
up-to-date timekeeping, but if it then continues to execute without waking up
CPU 0, it risks stale timestamps.

This can be changed by allowing timekeeping duty from all non-nohz_full CPUs.
That's the initial direction I took, but it involved a lot of complications and
scalability issues.
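
The CPU 0 / CPU 1 scenario above can be sketched as a small user-space model.
This is only an illustration under stated assumptions: `toy_system`,
`toy_tick()`, `toy_idle_exit()`, `toy_staleness()` and `toy_scenario()` are
hypothetical names, not kernel APIs, and time is counted in abstract ticks.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 3

/* Hypothetical model, not kernel code: one published clock value that
 * only the timekeeper's tick (or an idle exit) refreshes. */
struct toy_system {
	unsigned long hw_time;          /* real elapsed time, in ticks */
	unsigned long timekeeping;      /* last value published to readers */
	int timekeeper;                 /* CPU holding the timekeeping duty */
	bool tick_running[TOY_NR_CPUS]; /* per-CPU tick state */
};

/* Advance real time by one tick. The published time only follows if
 * the timekeeper's own tick is running; other CPUs' ticks do not help. */
static void toy_tick(struct toy_system *s)
{
	s->hw_time++;
	if (s->tick_running[s->timekeeper])
		s->timekeeping = s->hw_time;
}

/* A CPU leaving dynticks-idle through an interrupt catches the
 * timekeeping up once, then restarts its own tick. */
static void toy_idle_exit(struct toy_system *s, int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	s->timekeeping = s->hw_time;
	s->tick_running[cpu] = true;
}

/* How far the published time lags behind real time. */
static unsigned long toy_staleness(const struct toy_system *s)
{
	return s->hw_time - s->timekeeping;
}

/* CPU 0 (timekeeper) stays asleep; CPU 1 wakes from idle and then runs
 * for busy_ticks ticks without waking CPU 0. Returns the resulting lag. */
static unsigned long toy_scenario(unsigned long busy_ticks)
{
	struct toy_system s = {
		.hw_time = 100, .timekeeping = 90, .timekeeper = 0,
	};
	unsigned long i;

	toy_idle_exit(&s, 1);   /* CPU 1 wakes: timekeeping is up to date */
	for (i = 0; i < busy_ticks; i++)
		toy_tick(&s);   /* CPU 1 ticks, CPU 0's tick is still off */
	return toy_staleness(&s);
}
```

In this model `toy_scenario(0)` reports no lag right after the wakeup, while
any further ticks on CPU 1 widen the gap one for one, which is the stale
timestamp risk described above.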

> 
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index ddad959a9132..eaa32e4c228d 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -2789,8 +2789,13 @@ static void rcu_sysidle_exit(struct rcu_dynticks *rdtp, int irq)
>* system-idle state.  This means that the timekeeping CPU must
>* invoke rcu_sysidle_force_exit() directly if it does anything
>* more than take a scheduling-clock interrupt.
> +  *
> +  * In addition if we are not a nohz_full= CPU, then when we are
> +  * non-idle we have our own tick, so we don't need the timekeeping
> +  * CPU to keep a tick on our behalf.  We assume that the timekeeping
> +  * CPU is also a nohz_full= CPU.
>*/
> - if (smp_processor_id() == tick_do_timer_cpu)
> + if (!tick_nohz_full_cpu(smp_processor_id()))
>   return;
>  
>   /* Update system-idle state: We are clearly no longer fully idle! */
> @@ -2810,11 +2815,11 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, bool *isidle,
>  
>   /*
>* If some other CPU has already reported non-idle, if this is
> -  * not the flavor of RCU that tracks sysidle state, or if this
> -  * is an offline or the timekeeping CPU, nothing to do.
> +  * not the flavor of RCU that tracks sysidle state, or if this is
> +  * an offline or !nohz_full= or the timekeeping CPU, nothing to do.
>*/
>   if (!*isidle || rdp->rsp != rcu_sysidle_state ||
> - cpu_is_offline(rdp->cpu) || rdp->cpu == tick_do_timer_cpu)
> + cpu_is_offline(rdp->cpu) || !tick_nohz_full_cpu(rdp->cpu))
>   return;
>   if (rcu_gp_in_progress(rdp->rsp))
>   WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu);
> 


Re: [PATCH tip/core/rcu] Do not keep timekeeping CPU tick running for non-nohz_full= CPUs

2014-07-19 Thread Josh Triplett
On Sat, Jul 19, 2014 at 09:53:50AM -0700, Paul E. McKenney wrote:
> If a non-nohz_full= CPU is non-idle, it will have a scheduling-clock
> interrupt, and therefore doesn't need the timekeeping CPU to keep
> its scheduling-clock interrupt going.  This commit therefore ignores
> the idle state of non-nohz_full CPUs when determining whether or not
> the timekeeping CPU can safely turn off its scheduling-clock interrupt.
> 
> Signed-off-by: Paul E. McKenney 

Reviewed-by: Josh Triplett 

> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index ddad959a9132..eaa32e4c228d 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -2789,8 +2789,13 @@ static void rcu_sysidle_exit(struct rcu_dynticks *rdtp, int irq)
>* system-idle state.  This means that the timekeeping CPU must
>* invoke rcu_sysidle_force_exit() directly if it does anything
>* more than take a scheduling-clock interrupt.
> +  *
> +  * In addition if we are not a nohz_full= CPU, then when we are
> +  * non-idle we have our own tick, so we don't need the timekeeping
> +  * CPU to keep a tick on our behalf.  We assume that the timekeeping
> +  * CPU is also a nohz_full= CPU.
>*/
> - if (smp_processor_id() == tick_do_timer_cpu)
> + if (!tick_nohz_full_cpu(smp_processor_id()))
>   return;
>  
>   /* Update system-idle state: We are clearly no longer fully idle! */
> @@ -2810,11 +2815,11 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, bool *isidle,
>  
>   /*
>* If some other CPU has already reported non-idle, if this is
> -  * not the flavor of RCU that tracks sysidle state, or if this
> -  * is an offline or the timekeeping CPU, nothing to do.
> +  * not the flavor of RCU that tracks sysidle state, or if this is
> +  * an offline or !nohz_full= or the timekeeping CPU, nothing to do.
>*/
>   if (!*isidle || rdp->rsp != rcu_sysidle_state ||
> - cpu_is_offline(rdp->cpu) || rdp->cpu == tick_do_timer_cpu)
> + cpu_is_offline(rdp->cpu) || !tick_nohz_full_cpu(rdp->cpu))
>   return;
>   if (rcu_gp_in_progress(rdp->rsp))
>   WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu);
> 


[PATCH tip/core/rcu] Do not keep timekeeping CPU tick running for non-nohz_full= CPUs

2014-07-19 Thread Paul E. McKenney
If a non-nohz_full= CPU is non-idle, it will have a scheduling-clock
interrupt, and therefore doesn't need the timekeeping CPU to keep
its scheduling-clock interrupt going.  This commit therefore ignores
the idle state of non-nohz_full CPUs when determining whether or not
the timekeeping CPU can safely turn off its scheduling-clock interrupt.

Signed-off-by: Paul E. McKenney 

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index ddad959a9132..eaa32e4c228d 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2789,8 +2789,13 @@ static void rcu_sysidle_exit(struct rcu_dynticks *rdtp, int irq)
 	 * system-idle state.  This means that the timekeeping CPU must
 	 * invoke rcu_sysidle_force_exit() directly if it does anything
 	 * more than take a scheduling-clock interrupt.
+	 *
+	 * In addition if we are not a nohz_full= CPU, then when we are
+	 * non-idle we have our own tick, so we don't need the timekeeping
+	 * CPU to keep a tick on our behalf.  We assume that the timekeeping
+	 * CPU is also a nohz_full= CPU.
 	 */
-	if (smp_processor_id() == tick_do_timer_cpu)
+	if (!tick_nohz_full_cpu(smp_processor_id()))
 		return;
 
 	/* Update system-idle state: We are clearly no longer fully idle! */
@@ -2810,11 +2815,11 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, bool *isidle,
 
 	/*
 	 * If some other CPU has already reported non-idle, if this is
-	 * not the flavor of RCU that tracks sysidle state, or if this
-	 * is an offline or the timekeeping CPU, nothing to do.
+	 * not the flavor of RCU that tracks sysidle state, or if this is
+	 * an offline or !nohz_full= or the timekeeping CPU, nothing to do.
 	 */
 	if (!*isidle || rdp->rsp != rcu_sysidle_state ||
-	    cpu_is_offline(rdp->cpu) || rdp->cpu == tick_do_timer_cpu)
+	    cpu_is_offline(rdp->cpu) || !tick_nohz_full_cpu(rdp->cpu))
 		return;
 	if (rcu_gp_in_progress(rdp->rsp))
 		WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu);
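
To make the effect of the second hunk easier to see, here is a rough
user-space sketch of just the per-CPU part of the skip condition, before and
after the change. It is an illustration only: `struct toy_cpu`,
`skip_before()`, `skip_after()` and `tracked_cpus()` are hypothetical
stand-ins rather than kernel definitions, and the `*isidle` and RCU-flavor
checks are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU facts the check consults. */
struct toy_cpu {
	bool offline;    /* models cpu_is_offline(rdp->cpu) */
	bool nohz_full;  /* models tick_nohz_full_cpu(rdp->cpu) */
	bool timekeeper; /* models rdp->cpu == tick_do_timer_cpu */
};

/* Before the patch: ignore offline CPUs and the timekeeping CPU. */
static bool skip_before(const struct toy_cpu *c)
{
	return c->offline || c->timekeeper;
}

/* After the patch: ignore offline CPUs and every !nohz_full= CPU. */
static bool skip_after(const struct toy_cpu *c)
{
	return c->offline || !c->nohz_full;
}

/* Count the CPUs whose idle state the scan would actually examine. */
static int tracked_cpus(const struct toy_cpu *cpus, int n, bool patched)
{
	int i, count = 0;

	assert(n >= 0);
	for (i = 0; i < n; i++)
		if (!(patched ? skip_after(&cpus[i]) : skip_before(&cpus[i])))
			count++;
	return count;
}
```

With one non-nohz_full timekeeper, one other non-nohz_full CPU, one online
nohz_full= CPU and one offline CPU, the old predicate examines two CPUs and
the new one examines only the online nohz_full= CPU. Note that the new
predicate only skips the timekeeper if the timekeeper is itself not a
nohz_full= CPU.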
