Re: [RFC PATCH 2/6] sched: Introduce energy models of CPUs

2018-04-10 Thread Rafael J. Wysocki
On Tue, Apr 10, 2018 at 11:31 AM, Quentin Perret  wrote:
> On Tuesday 10 Apr 2018 at 08:55:14 (+0200), Rafael J. Wysocki wrote:
>> On Mon, Apr 9, 2018 at 6:42 PM, Quentin Perret  
>> wrote:
>> > On Monday 09 Apr 2018 at 17:32:33 (+0200), Peter Zijlstra wrote:
>> >> On Mon, Apr 09, 2018 at 02:45:11PM +0100, Quentin Perret wrote:
>
> [...]
>
>> > I quite like the first idea from a pure design standpoint, but I could
>> > also understand if maintainers of other archs were reluctant to
>> > have new dependencies on PM_OPP ...
>>
>> Not just reluctant, I would think.
>>
>> Depending on PM_OPP directly here is like depending on ACPI directly.
>> Would you agree with the latter?
>
> Right, I see your point. I was suggesting to use PM_OPP only to make the
> OPPs *visible*, nothing else. That doesn't mean all archs would have
> to use dev_pm_opp_set_rate() or anything, they could just keep on doing
> DVFS their own way. PM_OPP would just be a common way to make OPPs
> visible outside of their subsystem, which should be harmless. The point
> is to keep the energy model loading code common to all archs.
>
> Another solution would be to let the archs populate the energy model
> data-structures themselves, and turn the current energy.c file into
> arm/arm64-specific code, for example.
>
> Overall, I guess the question is whether or not PM_OPP is the right
> interface for EAS across multiple archs ... That sounds like an interesting
> discussion topic for OSPM next week,

I agree.

> so thanks a lot for raising this point !

Moreover, we already have cpufreq and cpuidle, which use their own
representations of the same information, generally coming from lower
layers.  They do that because they need to work with different
platforms that represent the low-level information differently.  I
don't see why that principle shouldn't apply to EAS as well.

Maybe there should be a common data structure to be used by them all,
but I'm quite confident that PM_OPP is not suitable for this purpose
in general.


Re: [RFC PATCH 2/6] sched: Introduce energy models of CPUs

2018-04-10 Thread Quentin Perret
On Tuesday 10 Apr 2018 at 08:55:14 (+0200), Rafael J. Wysocki wrote:
> On Mon, Apr 9, 2018 at 6:42 PM, Quentin Perret  wrote:
> > On Monday 09 Apr 2018 at 17:32:33 (+0200), Peter Zijlstra wrote:
> >> On Mon, Apr 09, 2018 at 02:45:11PM +0100, Quentin Perret wrote:

[...]

> > I quite like the first idea from a pure design standpoint, but I could
> > also understand if maintainers of other archs were reluctant to
> > have new dependencies on PM_OPP ...
> 
> Not just reluctant, I would think.
> 
> Depending on PM_OPP directly here is like depending on ACPI directly.
> Would you agree with the latter?

Right, I see your point. I was suggesting to use PM_OPP only to make the
OPPs *visible*, nothing else. That doesn't mean all archs would have
to use dev_pm_opp_set_rate() or anything, they could just keep on doing
DVFS their own way. PM_OPP would just be a common way to make OPPs
visible outside of their subsystem, which should be harmless. The point
is to keep the energy model loading code common to all archs.
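
To illustrate what "making the OPPs visible" could mean in practice, here is
a minimal, purely hypothetical sketch: the arch registers its OPPs with the
library but keeps its own DVFS mechanism (the frequency/voltage values below
are made up):

#include <linux/cpu.h>
#include <linux/pm_opp.h>

/* Hypothetical example: expose the OPPs of CPU0 to the rest of the kernel. */
static int __init my_arch_register_cpu_opps(void)
{
        struct device *cpu_dev = get_cpu_device(0);

        if (!cpu_dev)
                return -ENODEV;

        /* Frequency in Hz, voltage in uV -- illustrative values only. */
        dev_pm_opp_add(cpu_dev, 500000000, 800000);
        dev_pm_opp_add(cpu_dev, 1000000000, 900000);
        dev_pm_opp_add(cpu_dev, 1500000000, 1000000);

        /* Actual DVFS stays arch-specific; dev_pm_opp_set_rate() is not used. */
        return 0;
}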

Another solution would be to let the archs populate the energy model
data-structures themselves, and turn the current energy.c file into
arm/arm64-specific code, for example.

Overall, I guess the question is whether or not PM_OPP is the right
interface for EAS across multiple archs ... That sounds like an interesting
discussion topic for OSPM next week, so thanks a lot for raising this
point !

Regards,
Quentin


Re: [RFC PATCH 2/6] sched: Introduce energy models of CPUs

2018-04-09 Thread Rafael J. Wysocki
On Mon, Apr 9, 2018 at 6:42 PM, Quentin Perret  wrote:
> On Monday 09 Apr 2018 at 17:32:33 (+0200), Peter Zijlstra wrote:
>> On Mon, Apr 09, 2018 at 02:45:11PM +0100, Quentin Perret wrote:
>>
>> > In this specific patch, we are basically trying to figure out the
>> > boundaries of frequency domains, and the power consumed by each CPU
>> > at each OPP, to make them available to the scheduler. The important
>> > thing here is that, in both cases, we rely on the OPP library to
>> > keep the code as platform-agnostic as possible.
>>
>> AFAICT the only users of this PM_OPP stuff are a bunch of ARM platforms.
>
> That's correct.
>
>> Granted, nobody else has built a big.LITTLE-style system, so that might
>> all be fine, I suppose.
>>
>> It won't be until some !ARM chip comes along that we'll know how
>> generically usable any of this really is.
>>
>
> Right. There is already a lot of diversity in the Arm ecosystem that has
> to be managed. That's what I meant by platform-agnostic. Now, I agree
> that it should be discussed whether or not this is enough for other
> archs ...

Even for ARM64 w/ ACPI, mind you.

> It might be reasonable to expect archs that want to use EAS to expose
> their OPPs in the OPP lib. That should be harmless, and EAS needs to
> know about the OPPs, so they should be made visible, ideally somewhere
> generic. Otherwise, the interface with EAS has to be defined only by
> the energy model data structures, and the actual energy model loading
> procedure becomes free-form arch code.
>
> I quite like the first idea from a pure design standpoint, but I could
> also understand if maintainers of other archs were reluctant to
> have new dependencies on PM_OPP ...

Not just reluctant, I would think.

Depending on PM_OPP directly here is like depending on ACPI directly.
Would you agree with the latter?

>> > In the case of the frequency domains for example, the cpufreq driver is
>> > in charge of specifying the CPUs that are sharing frequencies. That
>> > information can come from DT, or SCPI, or SCMI, or whatever -- we
>> > probably shouldn't have to care about that from the scheduler's
>> > standpoint. That's why using dev_pm_opp_get_sharing_cpus() is handy:
>> > the OPP library gives us the digested information we need.
>>
>> So I kinda would've expected to just ask cpufreq, which after all already
>> knows these things. Why did we need to invent this pm_opp thing?
>
> Yes, we can definitely rely on cpufreq for this one. There is a "strong"
> dependency on PM_OPP to get power values, so I decided to use PM_OPP for
> the frequency domains as well, for consistency. But I can change that if
> needed.

Yes, please.

>>
>> Cpufreq has tons of supported architectures, pm_opp not so much.
>>
>> > The power values (dev_pm_opp_get_power) we use right now are those
>> > already used by the thermal subsystem (IPA), which means we don't have
>>
>> I love an IPA style beer, but I'm thinking that's not the same IPA,
>> right :-)
>
> Well, both can help you cool down, in a way ... :-)
>
> The IPA I'm talking about stands for Intelligent Power Allocator. It's a
> thermal governor that uses a power model of the platform to allocate
> power budgets to CPUs & GPUs using a control loop. The code is in
> drivers/thermal/power_allocator.c if this is of interest.
>
>>
>> > to introduce any new DT binding whatsoever. In the near future, the power
>> > values could also come from other sources (SCMI for example), and again
>> > it's probably not the scheduler's job to care about those things, so the
>> > OPP library helps us here again. As mentioned in the notes, as of today,
>> > this approach depends on other patches related to these topics, which are
>> > already on the list [1].
>>
>> Is there any !ARM thermal driver? (clearly I'm not up-to-date on things
>> thermal).
>
> I don't think so.

No, there isn't, AFAICS.

Thanks!


Re: [RFC PATCH 2/6] sched: Introduce energy models of CPUs

2018-04-09 Thread Quentin Perret
On Monday 09 Apr 2018 at 17:32:33 (+0200), Peter Zijlstra wrote:
> On Mon, Apr 09, 2018 at 02:45:11PM +0100, Quentin Perret wrote:
> 
> > In this specific patch, we are basically trying to figure out the
> > boundaries of frequency domains, and the power consumed by each CPU
> > at each OPP, to make them available to the scheduler. The important
> > thing here is that, in both cases, we rely on the OPP library to
> > keep the code as platform-agnostic as possible.
> 
> AFAICT the only users of this PM_OPP stuff are a bunch of ARM platforms.

That's correct.

> Granted, nobody else has built a big.LITTLE-style system, so that might
> all be fine, I suppose.
> 
> It won't be until some !ARM chip comes along that we'll know how
> generically usable any of this really is.
> 

Right. There is already a lot of diversity in the Arm ecosystem that has
to be managed. That's what I meant by platform-agnostic. Now, I agree
that it should be discussed whether or not this is enough for other
archs ...

It might be reasonable to expect archs that want to use EAS to expose
their OPPs in the OPP lib. That should be harmless, and EAS needs to
know about the OPPs, so they should be made visible, ideally somewhere
generic. Otherwise, the interface with EAS has to be defined only by the
energy model data structures, and the actual energy model loading
procedure becomes free-form arch code.

I quite like the first idea from a pure design standpoint, but I could
also understand if maintainers of other archs were reluctant to
have new dependencies on PM_OPP ...

> > In the case of the frequency domains for example, the cpufreq driver is
> > in charge of specifying the CPUs that are sharing frequencies. That
> > information can come from DT, or SCPI, or SCMI, or whatever -- we
> > probably shouldn't have to care about that from the scheduler's
> > standpoint. That's why using dev_pm_opp_get_sharing_cpus() is handy:
> > the OPP library gives us the digested information we need.
> 
> So I kinda would've expected to just ask cpufreq, which after all already
> knows these things. Why did we need to invent this pm_opp thing?

Yes, we can definitely rely on cpufreq for this one. There is a "strong"
dependency on PM_OPP to get power values, so I decided to use PM_OPP for
the frequency domains as well, for consistency. But I can change that if
needed.
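
For reference, a sketch of what the cpufreq-based alternative could look
like (illustrative only; it simply reuses the policy's related_cpus mask):

#include <linux/cpufreq.h>
#include <linux/cpumask.h>

/* Sketch: ask cpufreq which CPUs share their frequency with 'cpu'. */
static int freq_domain_span_from_cpufreq(int cpu, struct cpumask *span)
{
        struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);

        if (!policy)
                return -ENODEV;

        cpumask_copy(span, policy->related_cpus);
        cpufreq_cpu_put(policy);

        return 0;
}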

> 
> Cpufreq has tons of supported architectures, pm_opp not so much.
> 
> > The power values (dev_pm_opp_get_power) we use right now are those
> > already used by the thermal subsystem (IPA), which means we don't have
> 
> I love an IPA style beer, but I'm thinking that's not the same IPA,
> right :-)

Well, both can help you cool down, in a way ... :-)

The IPA I'm talking about stands for Intelligent Power Allocator. It's a
thermal governor that uses a power model of the platform to allocate
power budgets to CPUs & GPUs using a control loop. The code is in
drivers/thermal/power_allocator.c if this is of interest.

> 
> > to introduce any new DT binding whatsoever. In the near future, the power
> > values could also come from other sources (SCMI for example), and again
> > it's probably not the scheduler's job to care about those things, so the
> > OPP library helps us here again. As mentioned in the notes, as of today,
> > this approach depends on other patches related to these topics, which are
> > already on the list [1].
> 
> Is there any !ARM thermal driver? (clearly I'm not up-to-date on things
> thermal).

I don't think so.

Thanks,
Quentin



Re: [RFC PATCH 2/6] sched: Introduce energy models of CPUs

2018-04-09 Thread Peter Zijlstra
On Mon, Apr 09, 2018 at 02:45:11PM +0100, Quentin Perret wrote:

> In this specific patch, we are basically trying to figure out the
> boundaries of frequency domains, and the power consumed by each CPU
> at each OPP, to make them available to the scheduler. The important
> thing here is that, in both cases, we rely on the OPP library to
> keep the code as platform-agnostic as possible.

AFAICT the only users of this PM_OPP stuff are a bunch of ARM platforms.
Granted, nobody else has built a big.LITTLE-style system, so that might
all be fine, I suppose.

It won't be until some !ARM chip comes along that we'll know how
generically usable any of this really is.

> In the case of the frequency domains for example, the cpufreq driver is
> in charge of specifying the CPUs that are sharing frequencies. That
> information can come from DT, or SCPI, or SCMI, or whatever -- we
> probably shouldn't have to care about that from the scheduler's
> standpoint. That's why using dev_pm_opp_get_sharing_cpus() is handy:
> the OPP library gives us the digested information we need.

So I kinda would've expected to just ask cpufreq, which after all already
knows these things. Why did we need to invent this pm_opp thing?

Cpufreq has tons of supported architectures, pm_opp not so much.

> The power values (dev_pm_opp_get_power) we use right now are those
> already used by the thermal subsystem (IPA), which means we don't have

I love an IPA style beer, but I'm thinking that's not the same IPA,
right :-)

> to introduce any new DT binding whatsoever. In the near future, the power
> values could also come from other sources (SCMI for example), and again
> it's probably not the scheduler's job to care about those things, so the
> OPP library helps us here again. As mentioned in the notes, as of today,
> this approach depends on other patches related to these topics, which are
> already on the list [1].

Is there any !ARM thermal driver? (clearly I'm not up-to-date on things
thermal).


Re: [RFC PATCH 2/6] sched: Introduce energy models of CPUs

2018-04-09 Thread Quentin Perret
On Monday 09 Apr 2018 at 14:01:11 (+0200), Peter Zijlstra wrote:
> On Tue, Mar 20, 2018 at 09:43:08AM +, Dietmar Eggemann wrote:
> > From: Quentin Perret 
> > 
> > The energy consumption of each CPU in the system is modeled with a list
> > of values representing its dissipated power and compute capacity at each
> > available Operating Performance Point (OPP). These values are derived
> > from existing information in the kernel (currently used by the thermal
> > subsystem) and don't require the introduction of new platform-specific
> > tunables. The energy model is also provided with a simple representation
> > of all frequency domains as cpumasks, hence enabling the scheduler to be
> > aware of dependencies between CPUs. The data required to build the energy
> > model is provided by the OPP library, which gives the scheduler an
> > abstract view of the platform. The new data structures holding these
> > models and the routines to populate them are stored in
> > kernel/sched/energy.c.
> > 
> > For the sake of simplicity, it is assumed in the energy model that all
> > CPUs in a frequency domain share the same micro-architecture. As long as
> > this assumption is correct, the energy models of different CPUs belonging
> > to the same frequency domain are equal. Hence, this commit builds only one
> > energy model per frequency domain, and links all relevant CPUs to it in
> > order to save time and memory. If needed for future hardware platforms,
> > relaxing this assumption should imply relatively simple modifications in
> > the code but a significantly higher algorithmic complexity.
> 
> What this doesn't mention is why this isn't part of the regular topology
> bits. IIRC this is because the frequency domains don't necessarily need
> to align with the existing topology, but this completely fails to state
> any of that.

Yes, that's the main reason. Frequency domains and scheduling domains don't
necessarily align. They used to align on big.LITTLE platforms, but not
anymore with DynamIQ ...

> 
> Also, since I'm not at all familiar with DT and the OPP library stuff,
> this code is completely unreadable to me and there isn't a nice comment
> to help me along.

Right, so I can definitely fix that. Comments in the code and a better
commit message should hopefully help. Also, it has already been suggested
that a documentation file should be added alongside the code for this
patchset, so I'll make sure we add one for the next version. In the
meantime, here is a (hopefully) better explanation.

In this specific patch, we are basically trying to figure out the
boundaries of frequency domains, and the power consumed by each CPU
at each OPP, to make them available to the scheduler. The important
thing here is that, in both cases, we rely on the OPP library to
keep the code as platform-agnostic as possible.
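
To make this more concrete, the information handed to the scheduler roughly
boils down to a few small structures, shown here in simplified form (the
exact layout is in include/linux/sched/energy.h in the patch; the 'power'
and 'span' members are inferred from the commit message rather than copied
from the quoted diff):

/* One <capacity, power> entry per OPP of a CPU. */
struct capacity_state {
        unsigned long cap;      /* compute capacity at this OPP */
        unsigned long power;    /* power dissipated at this OPP */
};

/* Energy model of a CPU: its list of capacity states. */
struct sched_energy_model {
        int nr_cap_states;
        struct capacity_state *cap_states;
};

/* A frequency domain: the set of CPUs sharing a clock. */
struct freq_domain {
        struct list_head next;  /* linked into the global freq_domains list */
        cpumask_t span;         /* CPUs belonging to this frequency domain */
};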

In the case of the frequency domains for example, the cpufreq driver is
in charge of specifying the CPUs that are sharing frequencies. That
information can come from DT, or SCPI, or SCMI, or whatever -- we
probably shouldn't have to care about that from the scheduler's
standpoint. That's why using dev_pm_opp_get_sharing_cpus() is handy:
the OPP library gives us the digested information we need.
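
In other words, something along these lines (simplified sketch, error
handling mostly omitted):

#include <linux/cpu.h>
#include <linux/cpumask.h>
#include <linux/pm_opp.h>

/*
 * Sketch: let the OPP library report which CPUs share their frequency
 * with 'cpu', regardless of whether that information originally came
 * from DT, SCPI or SCMI.
 */
static int find_freq_domain_span(int cpu, struct cpumask *span)
{
        struct device *cpu_dev = get_cpu_device(cpu);

        if (!cpu_dev)
                return -ENODEV;

        return dev_pm_opp_get_sharing_cpus(cpu_dev, span);
}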

The power values (dev_pm_opp_get_power) we use right now are those
already used by the thermal subsystem (IPA), which means we don't have
to introduce any new DT binding whatsoever. In the near future, the power
values could also come from other sources (SCMI for example), and again
it's probably not the scheduler's job to care about those things, so the
OPP library helps us here again. As mentioned in the notes, as of today,
this approach depends on other patches related to these topics, which are
already on the list [1].
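
Building the <capacity, power> list of a CPU then roughly amounts to
walking its OPPs, as in the sketch below. Note that dev_pm_opp_get_power()
is the interface proposed in [1] rather than something already in mainline,
and cap_at_freq() is just a placeholder for the capacity scaling done in
the real code:

static void build_cap_states(int cpu, struct capacity_state *cap_states)
{
        struct device *cpu_dev = get_cpu_device(cpu);
        struct dev_pm_opp *opp;
        unsigned long freq = 0;
        int i, nr_opps;

        if (!cpu_dev)
                return;

        nr_opps = dev_pm_opp_get_opp_count(cpu_dev);
        if (nr_opps <= 0)
                return;

        for (i = 0; i < nr_opps; i++, freq++) {
                /* Next OPP at or above 'freq'. */
                opp = dev_pm_opp_find_freq_ceil(cpu_dev, &freq);
                if (IS_ERR(opp))
                        break;

                /* Power at this OPP, as exposed by the series in [1]. */
                cap_states[i].power = dev_pm_opp_get_power(opp);
                /* Capacity scaling left out; cap_at_freq() is a placeholder. */
                cap_states[i].cap = cap_at_freq(cpu, freq);

                dev_pm_opp_put(opp);
        }
}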

The rest of the code in this patch is just about iterating over the
CPUs/freq. domains/OPPs. The algorithm is more or less the following:

 1. find a frequency domain which hasn't been visited yet;
 2. estimate the power and capacity of a CPU in this freq domain at each
possible OPP;
 3. map all CPUs in the freq domain to this list of <capacity, power> tuples;
 4. go to 1.
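
Or, as a rough pseudo-C sketch (illustrative only; build_em_for_cpu() and
link_em() are placeholders for the real helpers, and find_freq_domain_span()
is the sketch shown above):

static void init_energy_models(void)
{
        struct cpumask visited, span;
        struct sched_energy_model *em;
        int cpu, i;

        cpumask_clear(&visited);

        for_each_possible_cpu(cpu) {
                /* 1. Skip frequency domains that have already been visited. */
                if (cpumask_test_cpu(cpu, &visited))
                        continue;

                /* Boundaries of this CPU's frequency domain. */
                find_freq_domain_span(cpu, &span);
                cpumask_or(&visited, &visited, &span);

                /* 2. Power/capacity of one CPU of the domain, at each OPP. */
                em = build_em_for_cpu(cpu);

                /* 3. Link every CPU of the domain to that single model. */
                for_each_cpu(i, &span)
                        link_em(i, em);
        }
}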

I hope that makes sense.

Thanks,
Quentin

[1] https://marc.info/?l=linux-pm&m=151635516419249&w=2


Re: [RFC PATCH 2/6] sched: Introduce energy models of CPUs

2018-04-09 Thread Peter Zijlstra
On Tue, Mar 20, 2018 at 09:43:08AM +, Dietmar Eggemann wrote:
> From: Quentin Perret 
> 
> The energy consumption of each CPU in the system is modeled with a list
> of values representing its dissipated power and compute capacity at each
> available Operating Performance Point (OPP). These values are derived
> from existing information in the kernel (currently used by the thermal
> subsystem) and don't require the introduction of new platform-specific
> tunables. The energy model is also provided with a simple representation
> of all frequency domains as cpumasks, hence enabling the scheduler to be
> aware of dependencies between CPUs. The data required to build the energy
> model is provided by the OPP library, which gives the scheduler an
> abstract view of the platform. The new data structures holding these
> models and the routines to populate them are stored in
> kernel/sched/energy.c.
> 
> For the sake of simplicity, it is assumed in the energy model that all
> CPUs in a frequency domain share the same micro-architecture. As long as
> this assumption is correct, the energy models of different CPUs belonging
> to the same frequency domain are equal. Hence, this commit builds only one
> energy model per frequency domain, and links all relevant CPUs to it in
> order to save time and memory. If needed for future hardware platforms,
> relaxing this assumption should imply relatively simple modifications in
> the code but a significantly higher algorithmic complexity.

What this doesn't mention is why this isn't part of the regular topology
bits. IIRC this is because the frequency domains don't necessarily need
to align with the existing topology, but this completely fails to state
any of that.

Also, since I'm not at all familiar with DT and the OPP library stuff,
this code is completely unreadable to me and there isn't a nice comment
to help me along.


Re: [RFC PATCH 2/6] sched: Introduce energy models of CPUs

2018-03-26 Thread Dietmar Eggemann
On 03/25/2018 03:48 PM, Quentin Perret wrote:
> On Tuesday 20 Mar 2018 at 10:52:15 (+0100), Greg Kroah-Hartman wrote:
>> On Tue, Mar 20, 2018 at 09:43:08AM +, Dietmar Eggemann wrote:
>>> From: Quentin Perret 
> 
> [...]
> 
>>> +#ifdef CONFIG_PM_OPP
>>
>> #ifdefs go in .h files, not .c files, right?
>>
> 
> So, after looking into this, my suggestion would be to: 1) remove the
> #ifdef CONFIG_PM_OPP from energy.c entirely; 2) make sure
> init_sched_energy() is stubbed properly for !CONFIG_SMP and
> !CONFIG_PM_OPP in include/linux/sched/energy.h; 3) relocate the global
> variables (energy_model, freq_domains, ...) to fair.c; and 4) modify
> kernel/sched/Makefile with something like:
> 
> ifeq ($(CONFIG_PM_OPP),y)
> obj-$(CONFIG_SMP) += energy.o
> endif
> 
> That way, energy.c is not compiled if not needed by the arch, and the
> ifdefs are kept within header files and Makefiles.
> 
> Would that work ?

Could we extend this idea a little further and leave the global
variables in energy.c? That way, all energy interfaces could be
declared or stubbed in energy.h.

We could hide the access to energy_model, sched_energy_present and
freq_domains behind functions.

It would also be nice to provide find_energy_efficient_cpu() in energy.c,
but it itself uses a lot of fair.c stuff, and we do want to avoid function
calls in the wakeup path, right?

Boot-tested on juno r0 (CONFIG_SMP=y and CONFIG_PM_OPP=y). Build-tested on
i386 (CONFIG_SMP and CONFIG_PM_OPP not set), arm multi_v5_defconfig (CONFIG_SMP
not set and CONFIG_PM_OPP=y).

in energy.h:

#ifdef CONFIG_SMP
#ifdef CONFIG_PM_OPP
static inline struct capacity_state *find_cap_state(int cpu, unsigned long util)
{
...
}
static inline bool sched_energy_enabled(void)
{
   ...
}
static inline struct list_head *get_freq_domains(void)
{
   ...
}
#endif
#endif

#if !defined(CONFIG_SMP) || !defined(CONFIG_PM_OPP)
static inline bool sched_energy_enabled(void) { return false; }
static inline struct list_head *get_freq_domains(void) { return NULL; }
static inline struct capacity_state *
find_cap_state(int cpu, unsigned long util) { return NULL; }
static inline void init_sched_energy(void) { }
#endif

#define for_each_freq_domain(fdom) \
list_for_each_entry(fdom, get_freq_domains(), next)
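
A caller on the fair.c side would then only ever see the accessors, for
example (purely illustrative, not the actual find_energy_efficient_cpu()
implementation):

static int select_cpu_energy_aware(int prev_cpu)
{
        struct freq_domain *fdom;

        if (!sched_energy_enabled())
                return prev_cpu;        /* fall back to the default path */

        for_each_freq_domain(fdom) {
                /* ... evaluate candidate CPUs in this frequency domain ... */
        }

        return prev_cpu;
}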

--->8---

diff --git a/include/linux/sched/energy.h b/include/linux/sched/energy.h
index b4f43564ffe4..5aad03ec5b30 100644
--- a/include/linux/sched/energy.h
+++ b/include/linux/sched/energy.h
@@ -20,12 +20,49 @@ struct freq_domain {
 extern struct sched_energy_model ** __percpu energy_model;
 extern struct static_key_false sched_energy_present;
 extern struct list_head freq_domains;
-#define for_each_freq_domain(fdom) \
-   list_for_each_entry(fdom, &freq_domains, next)
 
-void init_sched_energy(void);
-#else
-static inline void init_sched_energy(void) { }
+#ifdef CONFIG_PM_OPP
+static inline bool sched_energy_enabled(void)
+{
+   return static_branch_unlikely(&sched_energy_present);
+}
+
+static inline struct list_head *get_freq_domains(void)
+{
+   return &freq_domains;
+}
+
+static inline
+struct capacity_state *find_cap_state(int cpu, unsigned long util)
+{
+   struct sched_energy_model *em = *per_cpu_ptr(energy_model, cpu);
+   struct capacity_state *cs = NULL;
+   int i;
+
+   util += util >> 2;
+
+   for (i = 0; i < em->nr_cap_states; i++) {
+   cs = &em->cap_states[i];
+   if (cs->cap >= util)
+   break;
+   }
+
+   return cs;
+}
+
+extern void init_sched_energy(void);
 #endif
+#endif /* CONFIG_SMP */
 
+#if !defined(CONFIG_SMP) || !defined(CONFIG_PM_OPP)
+static inline bool sched_energy_enabled(void) { return false; }
+static inline struct list_head *get_freq_domains(void) { return NULL; }
+static inline struct capacity_state *
+find_cap_state(int cpu, unsigned long util) { return NULL; }
+static inline void init_sched_energy(void) { }
 #endif
+
+#define for_each_freq_domain(fdom) \
+   list_for_each_entry(fdom, get_freq_domains(), next)
+
+#endif /* _LINUX_SCHED_ENERGY_H */
diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
index 912972ad4dbc..e34bec3ae353 100644
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -20,7 +20,7 @@ obj-y += core.o loadavg.o clock.o cputime.o
 obj-y += idle.o fair.o rt.o deadline.o
 obj-y += wait.o wait_bit.o swait.o completion.o
 
-obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o topology.o stop_task.o energy.o
+obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o topology.o stop_task.o
 obj-$(CONFIG_SCHED_AUTOGROUP) += autogroup.o
 obj-$(CONFIG_SCHEDSTATS) += stats.o
 obj-$(CONFIG_SCHED_DEBUG) += debug.o
@@ -29,3 +29,6 @@ obj-$(CONFIG_CPU_FREQ) += cpufreq.o
 obj-$(CONFIG_CPU_FREQ_GOV_SCHEDUTIL) += cpufreq_schedutil.o
 obj-$(CONFIG_MEMBARRIER) += membarrier.o
 obj-$(CONFIG_CPU_ISOLATION) += isolation.o
+ifeq ($(CONFIG_PM_OPP),y)
+   obj-$(CONFIG_SMP) += energy.o
+endif

Re: [RFC PATCH 2/6] sched: Introduce energy models of CPUs

2018-03-25 Thread Quentin Perret
On Tuesday 20 Mar 2018 at 10:52:15 (+0100), Greg Kroah-Hartman wrote:
> On Tue, Mar 20, 2018 at 09:43:08AM +, Dietmar Eggemann wrote:
> > From: Quentin Perret 

[...]

> > +#ifdef CONFIG_PM_OPP
> 
> #ifdefs go in .h files, not .c files, right?
> 

So, after looking into this, my suggestion would be to: 1) remove the
#ifdef CONFIG_PM_OPP from energy.c entirely; 2) make sure
init_sched_energy() is stubbed properly for !CONFIG_SMP and
!CONFIG_PM_OPP in include/linux/sched/energy.h; 3) relocate the global
variables (energy_model, freq_domains, ...) to fair.c; and 4) modify
kernel/sched/Makefile with something like:

ifeq ($(CONFIG_PM_OPP),y)
obj-$(CONFIG_SMP) += energy.o
endif

That way, energy.c is not compiled if not needed by the arch, and the
ifdefs are kept within header files and Makefiles.

Would that work ?

Thanks,
Quentin


Re: [RFC PATCH 2/6] sched: Introduce energy models of CPUs

2018-03-20 Thread Quentin Perret
On Tuesday 20 Mar 2018 at 10:52:15 (+0100), Greg Kroah-Hartman wrote:
> On Tue, Mar 20, 2018 at 09:43:08AM +, Dietmar Eggemann wrote:
> > From: Quentin Perret 
> > 
> > The energy consumption of each CPU in the system is modeled with a list
> > of values representing its dissipated power and compute capacity at each
> > available Operating Performance Point (OPP). These values are derived
> > from existing information in the kernel (currently used by the thermal
> > subsystem) and don't require the introduction of new platform-specific
> > tunables. The energy model is also provided with a simple representation
> > of all frequency domains as cpumasks, hence enabling the scheduler to be
> > aware of dependencies between CPUs. The data required to build the energy
> > model is provided by the OPP library, which gives the scheduler an
> > abstract view of the platform. The new data structures holding these
> > models and the routines to populate them are stored in
> > kernel/sched/energy.c.
> > 
> > For the sake of simplicity, it is assumed in the energy model that all
> > CPUs in a frequency domain share the same micro-architecture. As long as
> > this assumption is correct, the energy models of different CPUs belonging
> > to the same frequency domain are equal. Hence, this commit builds only one
> > energy model per frequency domain, and links all relevant CPUs to it in
> > order to save time and memory. If needed for future hardware platforms,
> > relaxing this assumption should imply relatively simple modifications in
> > the code but a significantly higher algorithmic complexity.
> > 
> > As it appears that energy-aware scheduling really makes a difference on
> > heterogeneous systems (e.g. big.LITTLE platforms), it is restricted to
> > systems having:
> > 
> >1. SD_ASYM_CPUCAPACITY flag set
> >2. Dynamic Voltage and Frequency Scaling (DVFS) is enabled
> >3. Available power estimates for the OPPs of all possible CPUs
> > 
> > Moreover, the scheduler is notified of the energy model availability
> > using a static key in order to minimize the overhead on non-energy-aware
> > systems.
> > 
> > Cc: Ingo Molnar 
> > Cc: Peter Zijlstra 
> > Signed-off-by: Quentin Perret 
> > Signed-off-by: Dietmar Eggemann 
> > 
> > ---
> > This patch depends on additional infrastructure being merged in the OPP
> > core. As this infrastructure can also be useful for other clients, the
> > related patches have been posted separately [1].
> > 
> > [1] https://marc.info/?l=linux-pm&m=151635516419249&w=2
> > ---
> >  include/linux/sched/energy.h |  31 +++
> >  kernel/sched/Makefile|   2 +-
> >  kernel/sched/energy.c| 190 
> > +++
> >  3 files changed, 222 insertions(+), 1 deletion(-)
> >  create mode 100644 include/linux/sched/energy.h
> >  create mode 100644 kernel/sched/energy.c
> > 
> > diff --git a/include/linux/sched/energy.h b/include/linux/sched/energy.h
> > new file mode 100644
> > index ..b4f43564ffe4
> > --- /dev/null
> > +++ b/include/linux/sched/energy.h
> > @@ -0,0 +1,31 @@
> > +#ifndef _LINUX_SCHED_ENERGY_H
> > +#define _LINUX_SCHED_ENERGY_H
> 
> No copyright or license info?  Not good :(
> 
> > --- /dev/null
> > +++ b/kernel/sched/energy.c
> > @@ -0,0 +1,190 @@
> > +/*
> > + * Released under the GPLv2 only.
> > + * SPDX-License-Identifier: GPL-2.0
> 
> Please read the documentation for the SPDX lines on how to do them
> correctly.  Newer versions of checkpatch.pl will catch this, but that is
> in linux-next for the moment.
> 
> And once you have the SPDX line, the "Released under..." line is not
> needed.
> 
> 
> > + *
> > + * Energy-aware scheduling models
> > + *
> > + * Copyright (C) 2018, Arm Ltd.
> > + * Written by: Quentin Perret, Arm Ltd.
> > + *
> > + * This file is subject to the terms and conditions of the GNU General 
> > Public
> > + * License.  See the file "COPYING" in the main directory of this archive
> > + * for more details.
> 
> This paragraph is not needed at all.

Right, I will fix all the licence issues and add one to the new header
file. I used existing files as examples a while ago when I first wrote
the patches and forgot to update them later on. Sorry about that.

> 
> > + */
> > +
> > +#define pr_fmt(fmt) "sched-energy: " fmt
> > +
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include "sched.h"
> > +
> > +DEFINE_STATIC_KEY_FALSE(sched_energy_present);
> > +struct sched_energy_model ** __percpu energy_model;
> > +
> > +/*
> > + * A copy of the cpumasks representing the frequency domains is kept 
> > private
> > + * to the scheduler. They are stacked in a dynamically allocated linked 
> > list
> > + * as we don't know how many frequency domains the system has.
> > + */
> > +LIST_HEAD(freq_domains);
> 
> global variable?  If so, please prefix it with something more unique
> than "freq_".

Will do.

> 
> > +#ifdef CONFIG_PM_OPP
> 
> #ifdefs go in .h files, not .c files, right?

Yes, good point.

Re: [RFC PATCH 2/6] sched: Introduce energy models of CPUs

2018-03-20 Thread Greg Kroah-Hartman
On Tue, Mar 20, 2018 at 09:43:08AM +, Dietmar Eggemann wrote:
> From: Quentin Perret 
> 
> The energy consumption of each CPU in the system is modeled with a list
> of values representing its dissipated power and compute capacity at each
> available Operating Performance Point (OPP). These values are derived
> from existing information in the kernel (currently used by the thermal
> subsystem) and don't require the introduction of new platform-specific
> tunables. The energy model is also provided with a simple representation
> of all frequency domains as cpumasks, hence enabling the scheduler to be
> aware of dependencies between CPUs. The data required to build the energy
> model is provided by the OPP library, which gives the scheduler an
> abstract view of the platform. The new data structures holding these
> models and the routines to populate them are stored in
> kernel/sched/energy.c.
> 
> For the sake of simplicity, it is assumed in the energy model that all
> CPUs in a frequency domain share the same micro-architecture. As long as
> this assumption is correct, the energy models of different CPUs belonging
> to the same frequency domain are equal. Hence, this commit builds only one
> energy model per frequency domain, and links all relevant CPUs to it in
> order to save time and memory. If needed for future hardware platforms,
> relaxing this assumption should imply relatively simple modifications in
> the code but a significantly higher algorithmic complexity.
> 
> As it appears that energy-aware scheduling really makes a difference on
> heterogeneous systems (e.g. big.LITTLE platforms), it is restricted to
> systems having:
> 
>1. SD_ASYM_CPUCAPACITY flag set
>2. Dynamic Voltage and Frequency Scaling (DVFS) is enabled
>3. Available power estimates for the OPPs of all possible CPUs
> 
> Moreover, the scheduler is notified of the energy model availability
> using a static key in order to minimize the overhead on non-energy-aware
> systems.
> 
> Cc: Ingo Molnar 
> Cc: Peter Zijlstra 
> Signed-off-by: Quentin Perret 
> Signed-off-by: Dietmar Eggemann 
> 
> ---
> This patch depends on additional infrastructure being merged in the OPP
> core. As this infrastructure can also be useful for other clients, the
> related patches have been posted separately [1].
> 
> [1] https://marc.info/?l=linux-pm&m=151635516419249&w=2
> ---
>  include/linux/sched/energy.h |  31 +++
>  kernel/sched/Makefile|   2 +-
>  kernel/sched/energy.c| 190 
> +++
>  3 files changed, 222 insertions(+), 1 deletion(-)
>  create mode 100644 include/linux/sched/energy.h
>  create mode 100644 kernel/sched/energy.c
> 
> diff --git a/include/linux/sched/energy.h b/include/linux/sched/energy.h
> new file mode 100644
> index ..b4f43564ffe4
> --- /dev/null
> +++ b/include/linux/sched/energy.h
> @@ -0,0 +1,31 @@
> +#ifndef _LINUX_SCHED_ENERGY_H
> +#define _LINUX_SCHED_ENERGY_H

No copyright or license info?  Not good :(

> --- /dev/null
> +++ b/kernel/sched/energy.c
> @@ -0,0 +1,190 @@
> +/*
> + * Released under the GPLv2 only.
> + * SPDX-License-Identifier: GPL-2.0

Please read the documentation for the SPDX lines on how to do them
correctly.  Newer versions of checkpatch.pl will catch this, but that is
in linux-next for the moment.

And once you have the SPDX line, the "Released under..." line is not
needed.


> + *
> + * Energy-aware scheduling models
> + *
> + * Copyright (C) 2018, Arm Ltd.
> + * Written by: Quentin Perret, Arm Ltd.
> + *
> + * This file is subject to the terms and conditions of the GNU General Public
> + * License.  See the file "COPYING" in the main directory of this archive
> + * for more details.

This paragraph is not needed at all.

> + */
> +
> +#define pr_fmt(fmt) "sched-energy: " fmt
> +
> +#include 
> +#include 
> +#include 
> +
> +#include "sched.h"
> +
> +DEFINE_STATIC_KEY_FALSE(sched_energy_present);
> +struct sched_energy_model ** __percpu energy_model;
> +
> +/*
> + * A copy of the cpumasks representing the frequency domains is kept private
> + * to the scheduler. They are stacked in a dynamically allocated linked list
> + * as we don't know how many frequency domains the system has.
> + */
> +LIST_HEAD(freq_domains);

global variable?  If so, please prefix it with something more unique
than "freq_".

> +#ifdef CONFIG_PM_OPP

#ifdefs go in .h files, not .c files, right?

thanks,

greg k-h