Re: [PATCH v4 1/4] PM / EM: add devices to Energy Model

2020-03-15 Thread Lukasz Luba

Hi Quentin,

On 3/13/20 10:04 AM, Quentin Perret wrote:

Hi Lukasz,

On Monday 09 Mar 2020 at 13:41:14 (+), Lukasz Luba wrote:


diff --git a/drivers/opp/of.c b/drivers/opp/of.c
index 9cd8f0adacae..0efd6cf6d023 100644
--- a/drivers/opp/of.c
+++ b/drivers/opp/of.c
@@ -1047,9 +1047,8 @@ EXPORT_SYMBOL_GPL(dev_pm_opp_get_of_node);
   * calculation failed because of missing parameters, 0 otherwise.
   */
  static int __maybe_unused _get_cpu_power(unsigned long *mW, unsigned long 
*kHz,
-int cpu)
+struct device *cpu_dev)
  {
-   struct device *cpu_dev;
struct dev_pm_opp *opp;
struct device_node *np;
unsigned long mV, Hz;
@@ -1057,10 +1056,6 @@ static int __maybe_unused _get_cpu_power(unsigned long 
*mW, unsigned long *kHz,
u64 tmp;
int ret;
  
-	cpu_dev = get_cpu_device(cpu);

-   if (!cpu_dev)
-   return -ENODEV;
-
np = of_node_get(cpu_dev->of_node);
if (!np)
return -EINVAL;
@@ -1128,6 +1123,6 @@ void dev_pm_opp_of_register_em(struct cpumask *cpus)
if (ret || !cap)
return;
  
-	em_register_perf_domain(cpus, nr_opp, _cb);

+   em_register_perf_domain(cpu_dev, nr_opp, _cb, cpus);


Any reason for not checking the return value here ? You added a nice
check in scmi_get_cpu_power(), perhaps do the same thing here ?


I have tried to avoid changing the function to 'return int' in this
patch. It is changed in the 2/4 where it gets the proper return.

The 2/4 patch touches a few drivers which use the function
dev_pm_opp_of_register_em(), mainly the new arguments, but also the
return type in one patch (for consistency). It would need some ACKs
from maintainers making sure to sync their trees with that patch
(to avoid getting merge conflicts at some late stages).




  }
  EXPORT_SYMBOL_GPL(dev_pm_opp_of_register_em);
diff --git a/drivers/thermal/cpufreq_cooling.c 
b/drivers/thermal/cpufreq_cooling.c
index fe83d7a210d4..fcf2dab1b3b8 100644
--- a/drivers/thermal/cpufreq_cooling.c
+++ b/drivers/thermal/cpufreq_cooling.c
@@ -333,18 +333,18 @@ static inline bool em_is_sane(struct 
cpufreq_cooling_device *cpufreq_cdev,
return false;
  
  	policy = cpufreq_cdev->policy;

-   if (!cpumask_equal(policy->related_cpus, to_cpumask(em->cpus))) {
+   if (!cpumask_equal(policy->related_cpus, em_span_cpus(em))) {
pr_err("The span of pd %*pbl is misaligned with cpufreq policy 
%*pbl\n",
-   cpumask_pr_args(to_cpumask(em->cpus)),
+   cpumask_pr_args(em_span_cpus(em)),
cpumask_pr_args(policy->related_cpus));
return false;
}
  
  	nr_levels = cpufreq_cdev->max_level + 1;

-   if (em->nr_cap_states != nr_levels) {
+   if (em->nr_perf_states != nr_levels) {
pr_err("The number of cap states in pd %*pbl (%u) doesn't match the 
number of cooling levels (%u)\n",


s/cap states/performance states


missed it, thanks




-   cpumask_pr_args(to_cpumask(em->cpus)),
-   em->nr_cap_states, nr_levels);
+   cpumask_pr_args(em_span_cpus(em)),
+   em->nr_perf_states, nr_levels);
return false;
}




+static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
+   int nr_states, struct em_data_callback *cb)
  {
unsigned long opp_eff, prev_opp_eff = ULONG_MAX;
unsigned long power, freq, prev_freq = 0;
-   int i, ret, cpu = cpumask_first(span);
-   struct em_cap_state *table;
-   struct em_perf_domain *pd;
+   struct em_perf_state *table;
+   int i, ret;
u64 fmax;
  
-	if (!cb->active_power)

-   return NULL;
-
-   pd = kzalloc(sizeof(*pd) + cpumask_size(), GFP_KERNEL);
-   if (!pd)
-   return NULL;
-
table = kcalloc(nr_states, sizeof(*table), GFP_KERNEL);
if (!table)
-   goto free_pd;
+   return -ENOMEM;
  
-	/* Build the list of capacity states for this performance domain */

+   /* Build the list of performance states for this performance domain */
for (i = 0, freq = 0; i < nr_states; i++, freq++) {
/*
 * active_power() is a driver callback which ceils 'freq' to
-* lowest capacity state of 'cpu' above 'freq' and updates
+* lowest performance state of 'dev' above 'freq' and updates
 * 'power' and 'freq' accordingly.
 */
-   ret = cb->active_power(, , cpu);
+   ret = cb->active_power(, , dev);
if (ret) {
-   pr_err("pd%d: invalid cap. state: %d\n", cpu, ret);
+   dev_err(dev, "EM: invalid perf. state: %d\n",
+   ret);


Not easy to 

[PATCH v4 1/4] PM / EM: add devices to Energy Model

2020-03-10 Thread Lukasz Luba
Add support of other devices into the Energy Model framework not only the
CPUs. Change the interface to be more unified which can handle other
devices as well.

Signed-off-by: Lukasz Luba 
---
 Documentation/power/energy-model.rst | 133 ---
 Documentation/scheduler/sched-energy.rst |   2 +-
 drivers/cpufreq/scmi-cpufreq.c   |  13 +-
 drivers/opp/of.c |   9 +-
 drivers/thermal/cpufreq_cooling.c|  10 +-
 include/linux/energy_model.h | 111 --
 kernel/power/energy_model.c  | 463 ++-
 kernel/sched/fair.c  |   2 +-
 kernel/sched/topology.c  |   4 +-
 9 files changed, 538 insertions(+), 209 deletions(-)

diff --git a/Documentation/power/energy-model.rst 
b/Documentation/power/energy-model.rst
index 90a345d57ae9..c0a08ecd4e32 100644
--- a/Documentation/power/energy-model.rst
+++ b/Documentation/power/energy-model.rst
@@ -1,15 +1,17 @@
-
-Energy Model of CPUs
-
+.. SPDX-License-Identifier: GPL-2.0
+
+===
+Energy Model of devices
+===
 
 1. Overview
 ---
 
 The Energy Model (EM) framework serves as an interface between drivers knowing
-the power consumed by CPUs at various performance levels, and the kernel
+the power consumed by devices at various performance levels, and the kernel
 subsystems willing to use that information to make energy-aware decisions.
 
-The source of the information about the power consumed by CPUs can vary greatly
+The source of the information about the power consumed by devices can vary 
greatly
 from one platform to another. These power costs can be estimated using
 devicetree data in some cases. In others, the firmware will know better.
 Alternatively, userspace might be best positioned. And so on. In order to avoid
@@ -25,7 +27,7 @@ framework, and interested clients reading the data from it::
+---+  +-+  +---+
| Thermal (IPA) |  | Scheduler (EAS) |  | Other |
+---+  +-+  +---+
-   |   | em_pd_energy()|
+   |   | em_cpu_energy()   |
|   | em_cpu_get()  |
+-+ | +-+
  | | |
@@ -47,12 +49,12 @@ framework, and interested clients reading the data from it::
 | Device Tree  |   |   Firmware|  |  ?   |
 +--+   +---+  +--+
 
-The EM framework manages power cost tables per 'performance domain' in the
-system. A performance domain is a group of CPUs whose performance is scaled
-together. Performance domains generally have a 1-to-1 mapping with CPUFreq
-policies. All CPUs in a performance domain are required to have the same
-micro-architecture. CPUs in different performance domains can have different
-micro-architectures.
+In case of CPU devices the EM framework manages power cost tables per
+'performance domain' in the system. A performance domain is a group of CPUs
+whose performance is scaled together. Performance domains generally have a
+1-to-1 mapping with CPUFreq policies. All CPUs in a performance domain are
+required to have the same micro-architecture. CPUs in different performance
+domains can have different micro-architectures.
 
 
 2. Core APIs
@@ -70,14 +72,16 @@ CONFIG_ENERGY_MODEL must be enabled to use the EM framework.
 Drivers are expected to register performance domains into the EM framework by
 calling the following API::
 
-  int em_register_perf_domain(cpumask_t *span, unsigned int nr_states,
- struct em_data_callback *cb);
+  int em_register_perf_domain(struct device *dev, unsigned int nr_states,
+   struct em_data_callback *cb, cpumask_t *cpus);
 
-Drivers must specify the CPUs of the performance domains using the cpumask
-argument, and provide a callback function returning  tuples
-for each capacity state. The callback function provided by the driver is free
+Drivers must provide a callback function returning  tuples
+for each performance state. The callback function provided by the driver is 
free
 to fetch data from any relevant location (DT, firmware, ...), and by any mean
-deemed necessary. See Section 3. for an example of driver implementing this
+deemed necessary. Only for CPU devices, drivers must specify the CPUs of the
+performance domains using cpumask. For other devices than CPUs the last
+argument must be set to NULL.
+See Section 3. for an example of driver implementing this
 callback, and kernel/power/energy_model.c for further documentation on this
 API.
 
@@ -85,13 +89,20 @@ API.
 2.3 Accessing performance domains
 ^
 
+There are two API functions which provide the access to the energy model:
+em_cpu_get() which