Re: [RFCv2 PATCH 00/23] sched: Energy cost model for energy-aware scheduling

2014-07-08 Thread Morten Rasmussen
On Tue, Jul 08, 2014 at 01:23:43AM +0100, Yuyang Du wrote:
> Hi Morten,
> 
> On Mon, Jul 07, 2014 at 03:16:27PM +0100, Morten Rasmussen wrote:
> 
> > Could you elaborate on what you mean by 'a general statement'?
> 
> The general statement is: higher freq, more cap, and more power. More specific
> numbers are not needed, as they are just instances of this general statement.

I think I understand now. While that statement might be true for SMP
systems, it doesn't tell you the cost of choosing a higher frequency. If
you are optimizing for energy, you really care about energy per work (~
energy/instruction). The additional cost of going to a higher capacity
state is very platform dependent. At least on typical modern ARM platforms,
the highest states are significantly more expensive to use, so you don't
want to use them unless you really have to.

If we don't have any energy cost information, we can't make an informed
decision whether it is worth running faster (race-to-idle or consolidating
tasks on fewer cpus) or using more cpus (if that is possible).
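
To make the energy-per-work point concrete, here is a minimal Python
sketch. The capacity states and power numbers are invented for
illustration; they are not taken from the patch set or from any real
platform. It only shows why a table of (capacity, power) pairs carries
more information than "higher freq, more cap, more power".

```python
# Hypothetical capacity states: (capacity in work units/s, busy power in mW).
# All numbers are made up for illustration.
CAP_STATES = [(150, 100), (300, 250), (600, 800), (1024, 2000)]


def energy_per_work(capacity, power_mw):
    """Energy in mJ spent per unit of work while busy in this state."""
    return power_mw / capacity


def cheapest_state_for(load, states=CAP_STATES):
    """Return the feasible state with the lowest energy per work, or None."""
    candidates = [s for s in states if s[0] >= load]
    if not candidates:
        return None
    return min(candidates, key=lambda s: energy_per_work(*s))


# Cost of each state per unit of work; with these numbers it rises steeply
# towards the highest states.
costs = [energy_per_work(c, p) for c, p in CAP_STATES]
```

Every state obeys the general statement, yet only the numbers show that
the top state costs roughly three times as much per unit of work as the
lowest one in this made-up table.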
 
> > cpu_power doesn't tell you anything about energy-efficiency. There is no
> > link with frequency scaling.
> 
> In general, more cpu_power, more freq, less energy-efficiency, as you said
> some time ago.

Not in general :) For big.LITTLE it may be more energy efficient to run
a little cpu at a high frequency instead of using a big cpu at a low
frequency. For multi-cluster/package SMP it is not straightforward
either as it is more expensive to run the first cpu in a large power
domain than the additional cpus.
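
A small Python sketch of the big.LITTLE case, using hypothetical
per-state tables (the capacities and power values are assumptions, not
measurements): a little cpu in its highest state and a big cpu in its
lowest state can offer the same capacity at different cost.

```python
# (capacity, busy power in mW) per frequency state; all values hypothetical.
LITTLE = [(150, 60), (300, 150), (430, 300)]
BIG = [(430, 450), (800, 1200), (1024, 2200)]


def energy_per_work(state):
    """Energy in mJ per unit of work for a (capacity, power_mw) state."""
    capacity, power_mw = state
    return power_mw / capacity


little_high = LITTLE[-1]  # little cpu at its highest frequency
big_low = BIG[0]          # big cpu at its lowest frequency

# Same capacity, but the little cpu is cheaper per unit of work here.
little_is_cheaper = energy_per_work(little_high) < energy_per_work(big_low)
```

With these assumed numbers the ordering "more cpu_power, less
energy-efficiency" holds within each cpu type but not across them.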

> 
> > No representation of power domains.
> 
> Represented by CPU topology?

Not really. The sched_domain hierarchy represents the cache hierarchy
(and nodes for NUMA), but you don't necessarily have power domains at
the same levels. But yes, the sched_domain hierarchy can be used for
this purpose as well if we attach the necessary power domain information
to it. That is basically what we do in this patch set.
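
As a rough illustration of "attaching power domain information to the
hierarchy", the following Python sketch models a tree of domain levels
where only some levels carry energy data. The class, its fields and the
numbers are hypothetical and do not mirror the kernel's struct
sched_domain or the data structures in the patch set.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Domain:
    name: str
    cpus: frozenset
    energy_mw: Optional[int] = None  # None: no power domain at this level
    children: List["Domain"] = field(default_factory=list)


def power_domains_for(cpu, root):
    """Names of the levels containing `cpu` that carry energy data."""
    found = []
    stack = [root]
    while stack:
        domain = stack.pop()
        if cpu in domain.cpus:
            if domain.energy_mw is not None:
                found.append(domain.name)
            stack.extend(domain.children)
    return found


# Hypothetical two-cluster system: the top level has no power domain,
# the cluster and cpu levels do.
root = Domain("system", frozenset({0, 1, 2, 3}), None, [
    Domain("cluster0", frozenset({0, 1}), 500, [
        Domain("cpu0", frozenset({0}), 200),
        Domain("cpu1", frozenset({1}), 200),
    ]),
    Domain("cluster1", frozenset({2, 3}), 300, [
        Domain("cpu2", frozenset({2}), 100),
        Domain("cpu3", frozenset({3}), 100),
    ]),
])
```

Calling power_domains_for(2, root) walks from the top level down to the
cpu and skips the level that has no energy data attached.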

Morten

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFCv2 PATCH 00/23] sched: Energy cost model for energy-aware scheduling

2014-07-08 Thread Yuyang Du
Hi Morten,

On Mon, Jul 07, 2014 at 03:16:27PM +0100, Morten Rasmussen wrote:

> Could you elaborate on what you mean by 'a general statement'?

The general statement is: higher freq, more cap, and more power. More specific
numbers are not needed, as they are just instances of this general statement.

> cpu_power doesn't tell you anything about energy-efficiency. There is no
> link with frequency scaling.

In general, more cpu_power, more freq, less energy-efficiency, as you said
some time ago.

> No representation of power domains.

Represented by CPU topology?

Thanks,
Yuyang


Re: [RFCv2 PATCH 00/23] sched: Energy cost model for energy-aware scheduling

2014-07-07 Thread Peter Zijlstra
On Mon, Jul 07, 2014 at 03:00:18PM +0100, Morten Rasmussen wrote:
> > Could this be addressed by making the scheduler more "proactive" and,
> > rather than just looking at the current energy diff, guesstimate what it
> > would be if not placing a task at all on the CPU? If for example there
> > is no other task running on that CPU, could energy_diff_task() take into
> > account the next deeper C-state rather than just the current one? This
> > way we may be able to achieve more packing even on fully symmetric
> > systems and allow CPUs to go into deeper sleep states.
> 
> I think it would be possible to bias the choice of cpu either by
> considering potential energy savings by letting some cpus get into a
> deeper C-state, or applying a static bias towards some cpus (lower cpuid
> for example). Since it is in the wakeup path it must not be too complex
> to figure out though.
> 
> I haven't seen the problem in reality yet. When I tried the short tasks
> test with all cpus using the same energy model I got tasks consolidated
> on either of the clusters. The consolidation cluster sometimes changed
> during the test.
> 
> There is a lot of tuning to be done, that is for sure. We will have to
> make similar decisions for the periodic/idle balance path as well.

So one of the things I mentioned previously (on IRC, to Morten) is that
we can use the energy numbers (P and C state) to precompute whether or
not race-to-idle makes sense for the platform, or whether it benefits
from packing, etc.

So at topology setup time we can statically determine some of these
policies (maybe with a few parameters) and take it from there.

So if the platform benefits from packing, we can set the appropriate
topology bits to do so. If it benefits from race-to-idle, it can select
that, etc.
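
A minimal Python sketch of such a precomputation, under simplifying
assumptions that are mine and not from the thread: one cpu, a fixed
amount of work per period, a single idle state, and no transition costs.
The P-state tables and function names are hypothetical.

```python
def energy_for_period(work, period_s, capacity, busy_mw, idle_mw):
    """Energy in mJ for doing `work` once per period, idling the rest."""
    busy_s = work / capacity
    if busy_s > period_s:
        return None  # this state cannot finish the work in time
    return busy_mw * busy_s + idle_mw * (period_s - busy_s)


def prefers_race_to_idle(p_states, idle_mw, work, period_s):
    """True if the highest P-state gives the lowest energy for this load."""
    feasible = []
    for index, (capacity, busy_mw) in enumerate(p_states):
        energy = energy_for_period(work, period_s, capacity, busy_mw, idle_mw)
        if energy is not None:
            feasible.append((energy, index))
    if not feasible:
        return False
    _, best_index = min(feasible)
    return best_index == len(p_states) - 1


# Hypothetical platforms: (capacity, busy power in mW) per P-state.
DYNAMIC_DOMINATED = [(500, 400), (1000, 1000)]  # power grows fast with freq
STATIC_DOMINATED = [(500, 900), (1000, 1100)]   # large fixed cost when awake
```

With a 10 mW idle state, 250 units of work and a 1 s period, the first
table favours the slow state (205 mJ vs 257.5 mJ) and the second favours
racing to idle (282.5 mJ vs 455 mJ), so the answer could be computed once
from the tables rather than at every wakeup.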





Re: [RFCv2 PATCH 00/23] sched: Energy cost model for energy-aware scheduling

2014-07-07 Thread Morten Rasmussen
On Sun, Jul 06, 2014 at 08:05:23PM +0100, Yuyang Du wrote:
> Hi Morten,
> 
> Thanks, got it. Then another question,
> 
> On Fri, Jul 04, 2014 at 12:06:13PM +0100, Morten Rasmussen wrote:
> > The patch set essentially puts tasks where it is most energy-efficient
> > guided by the platform energy model. That should benefit any platform,
> > SMP and big.LITTLE. That is at least the goal.
> > 
> 
> I understand energy_diff_* functions are based on the energy model (though I
> have not dived into the detail of how you change load balancing based on
> energy_diff_*).
> 
> Speaking of the energy model, I am not sure why elaborate "imprecise" energy
> numbers do a better job than only a general statement: higher freq, more cap,
> and more power.

The idea is that the energy model allows the scheduler to estimate the
energy efficiency of the cpus under any load scenario. That way, the
scheduler can estimate the energy implications of every choice it makes.
Whether it is cheaper (in terms of energy) to increase frequency on the
currently awake cpu instead of waking up more. Which cpu is the cheapest
to wake up if another one is needed. And so on.
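
The kind of comparison described above can be sketched in Python as
follows. This is a toy model with invented numbers, not the
energy_diff_*() functions from the patch set: each power domain pays a
shared cost while any of its cpus is busy, and each cpu runs in the
lowest capacity state that fits its utilization.

```python
CAP_STATES = [(256, 100), (512, 300), (1024, 1100)]  # (capacity, busy mW)
CLUSTER_MW = 400  # hypothetical shared cost while any cpu in a domain is busy


def cpu_power(util):
    """Average power of one cpu in the lowest state that fits `util`."""
    if util == 0:
        return 0.0
    for capacity, busy_mw in CAP_STATES:
        if util <= capacity:
            return busy_mw * util / capacity
    raise ValueError("utilization exceeds the highest capacity state")


def domain_power(utils):
    """Power of one domain given the utilization of each of its cpus."""
    shared = CLUSTER_MW if any(utils) else 0.0
    return shared + sum(cpu_power(u) for u in utils)


def energy_diff(before, after):
    """Change in total power between two placements (lists of domains)."""
    return sum(map(domain_power, after)) - sum(map(domain_power, before))


# One cpu already runs a task of utilization 200; where does a second go?
before = [[200, 0], [0, 0]]
same_cpu = energy_diff(before, [[400, 0], [0, 0]])      # raise frequency
sibling = energy_diff(before, [[200, 200], [0, 0]])     # wake cpu, same domain
other_domain = energy_diff(before, [[200, 0], [200, 0]])  # wake other domain
```

With these assumed numbers, waking the sibling cpu is cheapest, raising
the frequency comes second, and waking a cpu in the idle domain is the
most expensive choice because of the shared cost.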

> Even for big.LITTLE systems, big and little CPUs also follow that statement
> respectively. Then it is just a matter of where to place tasks between them.
> Under such, the energy model might be useful, but still probably
> cpu_power_orig (from Vincent) might be enough.

cpu_power doesn't tell you anything about energy-efficiency. There is no
link with frequency scaling. No representation of power domains. I don't
see how you can make energy aware decisions without having at least a vague
idea about the impact of decisions. You need to consider energy
efficiency to get the most out of big.LITTLE. I believe the same is true
to some extent for SMP systems with aggressive cpu power management.

Could you elaborate on what you mean by 'a general statement'?

Thanks,
Morten



Re: [RFCv2 PATCH 00/23] sched: Energy cost model for energy-aware scheduling

2014-07-07 Thread Morten Rasmussen
Hi Catalin,

On Fri, Jul 04, 2014 at 05:55:52PM +0100, Catalin Marinas wrote:
> Hi Morten,
> 
> On Thu, Jul 03, 2014 at 05:25:47PM +0100, Morten Rasmussen wrote:
> > This is an RFC and there are some loose ends that have not been
> > addressed here or in the code yet. The model and its infrastructure is
> > in place in the scheduler and it is being used for load-balancing
> > decisions. It is used for the select_task_rq_fair() path for
> > fork/exec/wake balancing and to guide the selection of the source cpu
> > for periodic or idle balance.
> 
> IMHO, the series is in the right direction for addressing the energy
> aware scheduling (very complex) problem. But I have some high level
> comments below.
> 
> > However, the main ideas and the primary focus of this RFC: The energy
> > model and energy_diff_{load, task, cpu}() are there.
> > 
> > Due to limitation 1, the ARM TC2 platform (2xA15+3xA7) was set up to
> > disable frequency scaling and set frequencies to eliminate the
> > big.LITTLE performance difference. That basically turns TC2 into an SMP
> > platform where a subset of the cpus are less energy-efficient.
> > 
> > Tests using a synthetic workload with seven short running periodic
> > tasks of different size and period, and the sysbench cpu benchmark with
> > five threads gave the following results:
> > 
> > cpu energy*   short tasks   sysbench
> > Mainline      100           100
> > EA             49            99
> > 
> > * Note that these energy savings are _not_ representative of what can be
> > achieved on a true SMP platform where all cpus are equally 
> > energy-efficient. There should be benefit for SMP platforms as well, 
> > however, it will be smaller.
> 
> My impression (and I may be wrong) is that you get bigger energy saving
> on a big.LITTLE vs SMP system exactly because of the asymmetry in power
> consumption.

That is correct. As said in the note above, the benefit will be smaller
on SMP systems.

> The algorithm proposed here ends up packing small tasks on
> the little CPUs as they are more energy efficient (which is the correct
> thing to do but I wonder what results you would get with 3xA7 vs
> 2xA7+1xA15).
> 
> For a symmetric system where all CPUs have the same energy model you
> could end up with several small threads balanced equally across the
> system. The only way the scheduler could avoid a CPU is if it somehow
> manages to get into a deeper idle state (and energy_diff_task() would
> show some asymmetry). But this wouldn't happen without the scheduler
> first deciding to leave that CPU idle for longer.

It is a scenario that could happen with the current use of
energy_diff_task() in the wakeup balancing path. Any 'imbalance' might
make some cpus cheaper and hence attract the other tasks, but it is not
guaranteed to happen.

> Could this be addressed by making the scheduler more "proactive" and,
> rather than just looking at the current energy diff, guesstimate what it
> would be if not placing a task at all on the CPU? If for example there
> is no other task running on that CPU, could energy_diff_task() take into
> account the next deeper C-state rather than just the current one? This
> way we may be able to achieve more packing even on fully symmetric
> systems and allow CPUs to go into deeper sleep states.

I think it would be possible to bias the choice of cpu either by
considering potential energy savings by letting some cpus get into a
deeper C-state, or applying a static bias towards some cpus (lower cpuid
for example). Since it is in the wakeup path it must not be too complex
to figure out though.
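
The static-bias variant is cheap to express. A minimal Python sketch,
assuming the per-cpu energy estimates are already available as a
mapping (the function name and the input format are hypothetical):

```python
def pick_cpu(energy_diffs):
    """Pick the cpu with the lowest estimated energy delta.

    `energy_diffs` maps cpu id -> estimated energy delta. Ties are broken
    in favour of the lowest cpu id, which acts as a static packing bias.
    """
    return min(energy_diffs, key=lambda cpu: (energy_diffs[cpu], cpu))
```

On a fully symmetric system where every idle cpu reports the same
delta, the tie-break always selects the lowest-numbered cpu, so small
tasks tend to pack rather than spread.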

I haven't seen the problem in reality yet. When I tried the short tasks
test with all cpus using the same energy model I got tasks consolidated
on either of the clusters. The consolidation cluster sometimes changed
during the test.

There is a lot of tuning to be done, that is for sure. We will have to
make similar decisions for the periodic/idle balance path as well.

Thanks,
Morten



Re: [RFCv2 PATCH 00/23] sched: Energy cost model for energy-aware scheduling

2014-07-06 Thread Yuyang Du
Hi Morten,

Thanks, got it. Then another question,

On Fri, Jul 04, 2014 at 12:06:13PM +0100, Morten Rasmussen wrote:
> The patch set essentially puts tasks where it is most energy-efficient
> guided by the platform energy model. That should benefit any platform,
> SMP and big.LITTLE. That is at least the goal.
> 

I understand energy_diff_* functions are based on the energy model (though I
have not dived into the detail of how you change load balancing based on
energy_diff_*).

Speaking of the energy model, I am not sure why elaborate "imprecise" energy
numbers do a better job than only a general statement: higher freq, more cap,
and more power.

Even for big.LITTLE systems, big and little CPUs also follow that statement
respectively. Then it is just a matter of where to place tasks between them.
Under such, the energy model might be useful, but still probably cpu_power_orig
(from Vincent) might be enough.

Thanks,
Yuyang


Re: [RFCv2 PATCH 00/23] sched: Energy cost model for energy-aware scheduling

2014-07-04 Thread Catalin Marinas
Hi Morten,

On Thu, Jul 03, 2014 at 05:25:47PM +0100, Morten Rasmussen wrote:
> This is an RFC and there are some loose ends that have not been
> addressed here or in the code yet. The model and its infrastructure is
> in place in the scheduler and it is being used for load-balancing
> decisions. It is used for the select_task_rq_fair() path for
> fork/exec/wake balancing and to guide the selection of the source cpu
> for periodic or idle balance.

IMHO, the series is in the right direction for addressing the energy
aware scheduling (very complex) problem. But I have some high level
comments below.

> However, the main ideas and the primary focus of this RFC: The energy
> model and energy_diff_{load, task, cpu}() are there.
> 
> Due to limitation 1, the ARM TC2 platform (2xA15+3xA7) was set up to
> disable frequency scaling and set frequencies to eliminate the
> big.LITTLE performance difference. That basically turns TC2 into an SMP
> platform where a subset of the cpus are less energy-efficient.
> 
> Tests using a synthetic workload with seven short running periodic
> tasks of different size and period, and the sysbench cpu benchmark with
> five threads gave the following results:
> 
> cpu energy*   short tasks sysbench
> Mainline  100 100
> EA 49  99
> 
> * Note that these energy savings are _not_ representative of what can be
> achieved on a true SMP platform where all cpus are equally 
> energy-efficient. There should be benefit for SMP platforms as well, 
> however, it will be smaller.

My impression (and I may be wrong) is that you get bigger energy saving
on a big.LITTLE vs SMP system exactly because of the asymmetry in power
consumption. The algorithm proposed here ends up packing small tasks on
the little CPUs as they are more energy efficient (which is the correct
thing to do but I wonder what results you would get with 3xA7 vs
2xA7+1xA15).

For a symmetric system where all CPUs have the same energy model you
could end up with several small threads balanced equally across the
system. The only way the scheduler could avoid a CPU is if it somehow
manages to get into a deeper idle state (and energy_diff_task() would
show some asymmetry). But this wouldn't happen without the scheduler
first deciding to leave that CPU idle for longer.

Could this be addressed by making the scheduler more "proactive" and,
rather than just looking at the current energy diff, guesstimate what it
would be if not placing a task at all on the CPU? If for example there
is no other task running on that CPU, could energy_diff_task() take into
account the next deeper C-state rather than just the current one? This
way we may be able to achieve more packing even on fully symmetric
systems and allow CPUs to go into deeper sleep states.

Thanks.

-- 
Catalin


Re: [RFCv2 PATCH 00/23] sched: Energy cost model for energy-aware scheduling

2014-07-04 Thread Anca Emanuel
This sounds like a math problem (for Donald Knuth :) ).
You need to think outside the box; presenting the problem correctly is just
the first step, and a big one.
Then you need to come up with a formal algorithm to solve it, then prove it.
The next step is to code that algorithm and verify that it works in the real world.

If you do not present the problem correctly (missing big.LITTLE,
over/down-clocking), you will not get the right algorithm.
With new algorithms, very few people get it right the first time.

On Fri, Jul 4, 2014 at 2:06 PM, Morten Rasmussen wrote:
> Hi Yuyang,
>
> On Fri, Jul 04, 2014 at 12:19:50AM +0100, Yuyang Du wrote:
>> Hi Morten,
>>
>> On Fri, Jul 04, 2014 at 12:25:47AM +0800, Morten Rasmussen wrote:
>> > * Note that these energy savings are _not_ representative of what can be
>> > achieved on a true SMP platform where all cpus are equally
>> > energy-efficient. There should be benefit for SMP platforms as well,
>> > however, it will be smaller.
>> >
>> > The energy model led to consolidation of the short tasks on the A7
>> > cluster (more energy-efficient), while sysbench made use of all cpus as
>> > the A7s didn't have sufficient compute capacity to handle the five
>> > tasks.
>>
>> Looks like this patchset is mainly for big.LITTLE?
>
> No, not at all. The only big.LITTLE in there is the test platform but
> that has been configured to be as close as possible to an SMP platform.
> That is, no performance difference between cpus. I would have preferred
> a true SMP platform for testing, but this is the only dual-cluster
> platform that I have access to with proper mainline kernel support.
>
> The patch set essentially puts tasks where it is most energy-efficient
> guided by the platform energy model. That should benefit any platform,
> SMP and big.LITTLE. That is at least the goal.
>
> On an SMP platform with two clusters/packages (whatever you call a group
> of cpus sharing the same power domain) you get task consolidation on a
> single cluster if the energy model says that it is beneficial. Very much
> like your previous proposals. It is also what I'm trying to show with
> the numbers I have included.
>
> That said, we are of course keeping in mind what would be required to
> make this work for big.LITTLE. However, there is nothing big.LITTLE
> specific in the patch set. Just the possibility of having different
> energy models for different cpus in the system. We will have to add some
> tweaks eventually to get the best out of big.LITTLE later. Somewhat
> similar to what exists today for better SMT support and other
> architecture specialities.
>
>> And can the patchset actually replace Global Task Scheduling?
>
> Global Task Scheduling is (ARM) marketing speak for letting the
> scheduler know about all cpus in a big.LITTLE system. It is not an
> actual implementation. There is an out-of-tree implementation of GTS
> available which is very big.LITTLE specific.
>
> The energy model driven scheduling proposed here is not big.LITTLE
> specific, but aims at introducing generic energy-awareness in the
> scheduler. Once energy-awareness is in place, most of the support needed
> for big.LITTLE will be there too. It is generic energy-aware code that
> is capable of making informed decisions based on the platform model,
> big.LITTLE or SMP.
>
> The short answer is: Not in its current state, but if we get the
> energy-awareness right it should be able to.
>
> Morten
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFCv2 PATCH 00/23] sched: Energy cost model for energy-aware scheduling

2014-07-04 Thread Morten Rasmussen
Hi Yuyang,

On Fri, Jul 04, 2014 at 12:19:50AM +0100, Yuyang Du wrote:
> Hi Morten,
> 
> On Fri, Jul 04, 2014 at 12:25:47AM +0800, Morten Rasmussen wrote:
> > * Note that these energy savings are _not_ representative of what can be
> > achieved on a true SMP platform where all cpus are equally 
> > energy-efficient. There should be benefit for SMP platforms as well, 
> > however, it will be smaller.
> > 
> > The energy model led to consolidation of the short tasks on the A7
> > cluster (more energy-efficient), while sysbench made use of all cpus as
> > the A7s didn't have sufficient compute capacity to handle the five
> > tasks.
> 
> Looks like this patchset is mainly for big.LITTLE?

No, not at all. The only big.LITTLE in there is the test platform but
that has been configured to be as close as possible to an SMP platform.
That is, no performance difference between cpus. I would have preferred
a true SMP platform for testing, but this is the only dual-cluster
platform that I have access to with proper mainline kernel support.

The patch set essentially puts tasks where it is most energy-efficient
guided by the platform energy model. That should benefit any platform,
SMP and big.LITTLE. That is at least the goal.

On an SMP platform with two clusters/packages (whatever you call a group
of cpus sharing the same power domain) you get task consolidation on a
single cluster if the energy model says that it is beneficial. Very much
like your previous proposals. It is also what I'm trying to show with
the numbers I have included.
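
To make the consolidation argument concrete, here is a minimal sketch; it is not code from the patch set. It assumes a hypothetical two-cluster platform where each cluster draws a fixed shared power while any of its cpus is busy, plus a per-cpu busy power. The names `CLUSTER_POWER`, `CPU_POWER` and `platform_power()` and all numbers are invented for illustration.

```python
# Hypothetical figures in arbitrary power units; not measured on any platform.
CLUSTER_POWER = 40   # shared cost paid while at least one cpu in the cluster is busy
CPU_POWER = 25       # cost of each busy cpu


def platform_power(busy_cpus_per_cluster):
    """Sum the power of a placement given the busy-cpu count of each cluster."""
    total = 0
    for busy in busy_cpus_per_cluster:
        if busy > 0:
            # The cluster power domain must stay up if any of its cpus is busy.
            total += CLUSTER_POWER
        total += busy * CPU_POWER
    return total


# Two tasks, one per cpu: spread over both clusters or packed on one cluster.
spread = platform_power([1, 1])   # both power domains stay up
packed = platform_power([2, 0])   # the second cluster can be powered down
print(spread, packed)
```

With these made-up numbers packing is cheaper (90 versus 130 units). With a small shared cost, or if packing forced a more expensive P-state, the same comparison could favour spreading, which is why the decision is left to the platform data.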

That said, we are of course keeping in mind what would be required to
make this work for big.LITTLE. However, there is nothing big.LITTLE
specific in the patch set. Just the possibility of having different
energy models for different cpus in the system. We will have to add some
tweaks eventually to get the best out of big.LITTLE later. Somewhat
similar to what exists today for better SMT support and other
architecture specialities.

> And can the patchset actually replace Global Task Scheduling?

Global Task Scheduling is (ARM) marketing speak for letting the
scheduler know about all cpus in a big.LITTLE system. It is not an
actual implementation. There is an out-of-tree implementation of GTS
available which is very big.LITTLE specific.

The energy model driven scheduling proposed here is not big.LITTLE
specific, but aims at introducing generic energy-awareness in the
scheduler. Once energy-awareness is in place, most of the support needed
for big.LITTLE will be there too. It is generic energy-aware code that
is capable of making informed decisions based on the platform model,
big.LITTLE or SMP.

The short answer is: Not in its current state, but if we get the
energy-awareness right it should be able to.

Morten


Re: [RFCv2 PATCH 00/23] sched: Energy cost model for energy-aware scheduling

2014-07-04 Thread Yuyang Du
Hi Morten,

On Fri, Jul 04, 2014 at 12:25:47AM +0800, Morten Rasmussen wrote:
> * Note that these energy savings are _not_ representative of what can be
> achieved on a true SMP platform where all cpus are equally 
> energy-efficient. There should be benefit for SMP platforms as well, 
> however, it will be smaller.
> 
> The energy model led to consolidation of the short tasks on the A7
> cluster (more energy-efficient), while sysbench made use of all cpus as
> the A7s didn't have sufficient compute capacity to handle the five
> tasks.

Looks like this patchset is mainly for big.LITTLE? And can the patchset
actually replace Global Task Scheduling?

Thanks,
Yuyang


Re: [RFCv2 PATCH 00/23] sched: Energy cost model for energy-aware scheduling

2014-07-04 Thread Anca Emanuel
This sounds like a math problem (for Donald Knuth :) )
You need to think outside the box; presenting the problem right is just
the first step, and a big one.
Then you need to come up with a formal algorithm to solve it, then prove it.
The next step is to code that algorithm and verify that it works in the real world.

If you do not present the problem right (missing big.LITTLE,
over/down-clocking) you will not get the right algorithm.
For new algorithms, very few people get it right the first time.

On Fri, Jul 4, 2014 at 2:06 PM, Morten Rasmussen
morten.rasmus...@arm.com wrote:
> Hi Yuyang,
>
> On Fri, Jul 04, 2014 at 12:19:50AM +0100, Yuyang Du wrote:
>> Hi Morten,
>>
>> On Fri, Jul 04, 2014 at 12:25:47AM +0800, Morten Rasmussen wrote:
>> > * Note that these energy savings are _not_ representative of what can be
>> > achieved on a true SMP platform where all cpus are equally
>> > energy-efficient. There should be benefit for SMP platforms as well,
>> > however, it will be smaller.
>> >
>> > The energy model led to consolidation of the short tasks on the A7
>> > cluster (more energy-efficient), while sysbench made use of all cpus as
>> > the A7s didn't have sufficient compute capacity to handle the five
>> > tasks.
>>
>> Looks like this patchset is mainly for big.LITTLE?
>
> No, not at all. The only big.LITTLE in there is the test platform but
> that has been configured to be as close as possible to an SMP platform.
> That is, no performance difference between cpus. I would have preferred
> a true SMP platform for testing, but this is the only dual-cluster
> platform that I have access to with proper mainline kernel support.
>
> The patch set essentially puts tasks where it is most energy-efficient
> guided by the platform energy model. That should benefit any platform,
> SMP and big.LITTLE. That is at least the goal.
>
> On an SMP platform with two clusters/packages (whatever you call a group
> of cpus sharing the same power domain) you get task consolidation on a
> single cluster if the energy model says that it is beneficial. Very much
> like your previous proposals. It is also what I'm trying to show with
> the numbers I have included.
>
> That said, we are of course keeping in mind what would be required to
> make this work for big.LITTLE. However, there is nothing big.LITTLE
> specific in the patch set. Just the possibility of having different
> energy models for different cpus in the system. We will have to add some
> tweaks eventually to get the best out of big.LITTLE later. Somewhat
> similar to what exists today for better SMT support and other
> architecture specialities.
>
>> And can the patchset actually replace Global Task Scheduling?
>
> Global Task Scheduling is (ARM) marketing speak for letting the
> scheduler know about all cpus in a big.LITTLE system. It is not an
> actual implementation. There is an out-of-tree implementation of GTS
> available which is very big.LITTLE specific.
>
> The energy model driven scheduling proposed here is not big.LITTLE
> specific, but aims at introducing generic energy-awareness in the
> scheduler. Once energy-awareness is in place, most of the support needed
> for big.LITTLE will be there too. It is generic energy-aware code that
> is capable of making informed decisions based on the platform model,
> big.LITTLE or SMP.
>
> The short answer is: Not in its current state, but if we get the
> energy-awareness right it should be able to.
>
> Morten


Re: [RFCv2 PATCH 00/23] sched: Energy cost model for energy-aware scheduling

2014-07-04 Thread Catalin Marinas
Hi Morten,

On Thu, Jul 03, 2014 at 05:25:47PM +0100, Morten Rasmussen wrote:
> This is an RFC and there are some loose ends that have not been
> addressed here or in the code yet. The model and its infrastructure is
> in place in the scheduler and it is being used for load-balancing
> decisions. It is used for the select_task_rq_fair() path for
> fork/exec/wake balancing and to guide the selection of the source cpu
> for periodic or idle balance.

IMHO, the series is heading in the right direction for addressing the
(very complex) energy-aware scheduling problem. But I have some
high-level comments below.

> However, the main ideas and the primary focus of this RFC: The energy
> model and energy_diff_{load, task, cpu}() are there.
>
> Due to limitation 1, the ARM TC2 platform (2xA15+3xA7) was setup to
> disable frequency scaling and set frequencies to eliminate the
> big.LITTLE performance difference. That basically turns TC2 into an SMP
> platform where a subset of the cpus are less energy-efficient.
>
> Tests using a synthetic workload with seven short running periodic
> tasks of different size and period, and the sysbench cpu benchmark with
> five threads gave the following results:
>
> cpu energy*     short tasks     sysbench
> Mainline        100             100
> EA               49              99
>
> * Note that these energy savings are _not_ representative of what can be
> achieved on a true SMP platform where all cpus are equally
> energy-efficient. There should be benefit for SMP platforms as well,
> however, it will be smaller.

My impression (and I may be wrong) is that you get bigger energy saving
on a big.LITTLE vs SMP system exactly because of the asymmetry in power
consumption. The algorithm proposed here ends up packing small tasks on
the little CPUs as they are more energy efficient (which is the correct
thing to do but I wonder what results you would get with 3xA7 vs
2xA7+1xA15).

For a symmetric system where all CPUs have the same energy model you
could end up with several small threads balanced equally across the
system. The only way the scheduler could avoid a CPU is if it somehow
manages to get into a deeper idle state (and energy_diff_task() would
show some asymmetry). But this wouldn't happen without the scheduler
first deciding to leave that CPU idle for longer.

Could this be addressed by making the scheduler more proactive and,
rather than just looking at the current energy diff, guesstimate what it
would be if not placing a task at all on the CPU? If for example there
is no other task running on that CPU, could energy_diff_task() take into
account the next deeper C-state rather than just the current one? This
way we may be able to achieve more packing even on fully symmetric
systems and allow CPUs to go into deeper sleep states.
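
The lookahead idea can be sketched as follows. This is only an illustration under stated assumptions, not the patch set's energy_diff_task(): the power figures, the `Cpu` type and `sketch_energy_diff()` are all hypothetical, and a cpu's average power is modelled simply as busy power for the utilized fraction plus idle-state power for the rest.

```python
from dataclasses import dataclass

# Hypothetical idle-state power table, shallowest state first (arbitrary units).
IDLE_POWER = [10, 4, 1]
BUSY_POWER = 50  # hypothetical power of a cpu that is busy all the time


@dataclass
class Cpu:
    util: float       # current utilization, 0.0 .. 1.0
    idle_state: int   # index into IDLE_POWER used while the cpu is idle


def cpu_power(util, idle_state):
    """Average power: busy for `util` of the time, idle for the rest."""
    return util * BUSY_POWER + (1.0 - util) * IDLE_POWER[idle_state]


def sketch_energy_diff(cpu, task_util, lookahead=False):
    """Estimated power change from adding a task to `cpu`.

    With lookahead, a completely idle cpu is costed as if it would
    otherwise have reached the next deeper idle state.
    """
    before_state = cpu.idle_state
    if lookahead and cpu.util == 0.0:
        before_state = min(cpu.idle_state + 1, len(IDLE_POWER) - 1)
    before = cpu_power(cpu.util, before_state)
    after = cpu_power(cpu.util + task_util, cpu.idle_state)
    return after - before


# Symmetric system: one partly loaded cpu and one idle cpu, both in the
# shallowest idle state.
busy_cpu = Cpu(util=0.3, idle_state=0)
idle_cpu = Cpu(util=0.0, idle_state=0)

for lookahead in (False, True):
    print(lookahead,
          sketch_energy_diff(busy_cpu, 0.2, lookahead),
          sketch_energy_diff(idle_cpu, 0.2, lookahead))
```

Without lookahead the two cpus look equally expensive (about 8 units each), so nothing steers the task away from the idle cpu. With lookahead the idle cpu costs about 14 units, so the task is packed onto the cpu that is already awake.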

Thanks.

-- 
Catalin


[RFCv2 PATCH 00/23] sched: Energy cost model for energy-aware scheduling

2014-07-03 Thread Morten Rasmussen
This is RFC v2 of this proposal (changelog at the end).

Several techniques for saving energy through various scheduler
modifications have been proposed in the past; however, most of the
techniques have not been universally beneficial for all use-cases and
platforms. For example, consolidating tasks on fewer cpus is an
effective way to save energy on some platforms, while it might make
things worse on others.

This proposal, which is inspired by the Ksummit workshop discussions
last year [1], takes a different approach by using a (relatively) simple
platform energy cost model to guide scheduling decisions. By providing
the model with platform-specific costing data, the model can provide an
estimate of the energy implications of scheduling decisions. So instead
of blindly applying scheduling techniques that may or may not work for
the current use-case, the scheduler can make informed energy-aware
decisions. We believe this approach provides a methodology that can be
adapted to any platform, including heterogeneous systems such as ARM
big.LITTLE. The model considers cpus only. Model data includes power
consumption at each P-state, C-state power consumption, and wake-up
energy costs. However, the energy model could potentially be extended to
be used to guide performance/energy decisions in other subsystems.

For example, the scheduler can use energy_diff_task(cpu, task) to
estimate the cost of placing a task on a specific cpu and compare energy
costs of different cpus.
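
As a rough illustration of that comparison, the sketch below picks the cpu with the smallest estimated energy increase. It is not the implementation in this series: the P-state tables, `busy_power()`, `energy_diff_task_sketch()` and `pick_cpu()` are hypothetical, idle and wake-up costs are ignored, and all numbers are invented.

```python
# Hypothetical P-state tables: (capacity, busy power) pairs, lowest state first.
LITTLE_PSTATES = [(200, 30), (400, 80)]
BIG_PSTATES = [(400, 150), (1000, 600)]


def busy_power(pstates, util):
    """Power at the lowest P-state whose capacity covers `util`, scaled by load."""
    for capacity, power in pstates:
        if util <= capacity:
            return power * util / capacity
    raise ValueError("utilization exceeds the cpu's highest capacity")


def energy_diff_task_sketch(pstates, cpu_util, task_util):
    """Estimated power increase from adding `task_util` to a cpu at `cpu_util`."""
    return busy_power(pstates, cpu_util + task_util) - busy_power(pstates, cpu_util)


def pick_cpu(cpus, task_util):
    """Return the name of the cpu with the smallest estimated energy increase."""
    best_name, best_diff = None, None
    for name, (pstates, cpu_util) in cpus.items():
        try:
            diff = energy_diff_task_sketch(pstates, cpu_util, task_util)
        except ValueError:
            continue  # the task does not fit on this cpu
        if best_diff is None or diff < best_diff:
            best_name, best_diff = name, diff
    return best_name


cpus = {
    "little0": (LITTLE_PSTATES, 100),
    "big0": (BIG_PSTATES, 100),
}
print(pick_cpu(cpus, 50))    # small task
print(pick_cpu(cpus, 350))   # task too large for the little cpu
```

In this toy setup the small task lands on the little cpu because its estimated increase is lower, while the large task only fits on the big cpu.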

This is an RFC and there are some loose ends that have not been
addressed here or in the code yet. The model and its infrastructure are
in place in the scheduler and are being used for load-balancing
decisions. It is used for the select_task_rq_fair() path for
fork/exec/wake balancing and to guide the selection of the source cpu
for periodic or idle balance. The latter is still very early days. There
are quite a few dirty hacks in there to tie things together. To mention
a few current limitations:

1. Due to the lack of scale invariant cpu and task utilization, it 
   doesn't work properly with frequency scaling or heterogeneous systems 
   (big.LITTLE).

2. Platform data for the test platform (ARM TC2) has been hardcoded in 
   arch/arm/ code.

3. The most likely idle state is currently hardcoded to be the
   shallowest one; cpuidle integration is missing.
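
Limitation 1 can be illustrated with a small sketch. It rests on an assumption about what "scale invariant" means here: a time-based busy ratio measured at a reduced frequency is scaled by current/maximum frequency (and, on heterogeneous systems, by the cpu's relative capacity) so that values are comparable across cpus and P-states. The function and parameter names, and the 1024 capacity scale, are invented for illustration and are not taken from this series.

```python
def scale_invariant_util(busy_ratio, cur_freq, max_freq,
                         cpu_capacity=1024, max_capacity=1024):
    """Scale a time-based busy ratio to the fastest cpu running at max frequency."""
    freq_scaled = busy_ratio * cur_freq / max_freq
    return freq_scaled * cpu_capacity / max_capacity


# The same task looks 80% busy at 500 MHz but 40% busy at 1000 MHz.
at_half_speed = scale_invariant_util(0.8, cur_freq=500, max_freq=1000)
at_full_speed = scale_invariant_util(0.4, cur_freq=1000, max_freq=1000)
print(at_half_speed, at_full_speed)
```

Without such scaling the raw busy ratios (0.8 versus 0.4) would suggest two different amounts of work, so an energy estimate built on them would be wrong whenever the frequency or cpu type changes.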

However, the main ideas and the primary focus of this RFC, the energy
model and energy_diff_{load, task, cpu}(), are there.

Due to limitation 1, the ARM TC2 platform (2xA15+3xA7) was set up to
disable frequency scaling and set frequencies to eliminate the
big.LITTLE performance difference. That basically turns TC2 into an SMP
platform where a subset of the cpus are less energy-efficient.

Tests using a synthetic workload with seven short running periodic
tasks of different size and period, and the sysbench cpu benchmark with
five threads gave the following results:

cpu energy*     short tasks     sysbench
Mainline        100             100
EA               49              99

* Note that these energy savings are _not_ representative of what can be
achieved on a true SMP platform where all cpus are equally 
energy-efficient. There should be a benefit for SMP platforms as well;
however, it will be smaller.

The energy model led to consolidation of the short tasks on the A7
cluster (more energy-efficient), while sysbench made use of all cpus as
the A7s didn't have sufficient compute capacity to handle the five
tasks.

To see how scheduling would happen if all cpus had been A7s, the
same tests were done with the A15s' energy model being the same as that
of the A7s (i.e. lying about the platform to the scheduler energy
model). The scheduling pattern for the short tasks changed to being
either consolidated on the A7 or the A15 cluster instead of just on the
A7, which was expected. Currently, there are no tools available to 
easily deduce energy for traces using a platform energy model, which 
could have estimated the energy benefit. Linaro is currently looking 
into extending the idle-stat tool [3] to do this.

Testing with more realistic (mobile) use-cases was done using two
previously described Android workloads [2]: Audio playback and Web
browsing. In addition, the combination of the two was measured.
Reported numbers are averages for 20 runs and have been normalized.
Browsing performance score is roughly rendering time (less is better).

                browsing        audio   browsing+audio
Mainline
 A15             51.5            17.7    40.5
 A7              48.5            82.3    59.5
 energy         100.0           100.0   100.0
 perf           100.0                   100.0

EA
 A15             16.3             2.2    13.4
 A7              60.2            80.7    61.1
 energy          76.6            82.9    74.6
 perf           108.9                   108.9

Diff
 energy         -23.4%          -17.1%
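
The Diff figures follow directly from the normalized energy rows. Here is a short sketch of that arithmetic, using only the numbers reported above; the variable names are mine. The check that the A15 and A7 rows add up to the energy row is my own observation from the data, not something stated in the text.

```python
# Normalized energy from the results above (Mainline = 100.0 in every column).
mainline = {"browsing": 100.0, "audio": 100.0, "browsing+audio": 100.0}
ea = {"browsing": 76.6, "audio": 82.9, "browsing+audio": 74.6}

# Per-cluster figures reported for EA, as (A15, A7).
ea_clusters = {
    "browsing": (16.3, 60.2),
    "audio": (2.2, 80.7),
    "browsing+audio": (13.4, 61.1),
}

for workload in mainline:
    diff = round(ea[workload] - mainline[workload], 1)
    print(f"{workload}: {diff:+.1f}%")
    # The A15 and A7 rows appear to add up to the energy row, within rounding.
    assert abs(sum(ea_clusters[workload]) - ea[workload]) <= 0.11
```

For browsing and audio this reproduces -23.4% and -17.1%; the same subtraction for the combined workload gives -25.4%.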
