Re: [PATCH v13 1/6] sched/core: uclamp: Extend CPU's cgroup controller

2019-08-08 Thread Michal Koutný
On Thu, Aug 08, 2019 at 04:10:21PM +0100, Patrick Bellasi wrote:
> Not sure to get what you mean here: I'm currently exposing uclamp to
> both v1 and v2 hierarchies.
The cpu controller has different APIs for the v1 and v2 hierarchies. My
question, reworded, is -- are the new knobs exposed in the legacy API
intentionally/for a reason?

Michal



Re: [PATCH v13 1/6] sched/core: uclamp: Extend CPU's cgroup controller

2019-08-08 Thread Patrick Bellasi


On Tue, Aug 06, 2019 at 17:11:34 +0100, Michal Koutný wrote...

> On Fri, Aug 02, 2019 at 10:08:48AM +0100, Patrick Bellasi wrote:
>> +static ssize_t cpu_uclamp_write(struct kernfs_open_file *of, char *buf,
>> +				size_t nbytes, loff_t off,
>> +				enum uclamp_id clamp_id)
>> +{
>> +	struct uclamp_request req;
>> +	struct task_group *tg;
>> +
>> +	req = capacity_from_percent(buf);
>> +	if (req.ret)
>> +		return req.ret;
>> +
>> +	rcu_read_lock();
> This should be the uclamp_mutex.
>
> (The compound result of the series is correct, as the lock is introduced
> in "sched/core: uclamp: Propagate parent clamps".
> This is just for the happiness of cherry-pickers/bisectors.)

Right, I will move the uclamp_mutex introduction into this patch instead of
the following one.
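
Something along these lines, just as a sketch of what I have in mind
(taking uclamp_mutex from the later patch; capacity_from_percent(),
uclamp_se_set() and the req.util field are assumed as posted in this
series):

static ssize_t cpu_uclamp_write(struct kernfs_open_file *of, char *buf,
				size_t nbytes, loff_t off,
				enum uclamp_id clamp_id)
{
	struct uclamp_request req;
	struct task_group *tg;

	req = capacity_from_percent(buf);
	if (req.ret)
		return req.ret;

	/* Serialize concurrent writers of the requested clamp value */
	mutex_lock(&uclamp_mutex);
	rcu_read_lock();

	tg = css_tg(of_css(of));
	if (tg->uclamp_req[clamp_id].value != req.util)
		uclamp_se_set(&tg->uclamp_req[clamp_id], req.util, false);

	rcu_read_unlock();
	mutex_unlock(&uclamp_mutex);

	return nbytes;
}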

>> +static inline void cpu_uclamp_print(struct seq_file *sf,
>> +				    enum uclamp_id clamp_id)
>> +{
>> [...]
>> +	rcu_read_lock();
>> +	tg = css_tg(seq_css(sf));
>> +	util_clamp = tg->uclamp_req[clamp_id].value;
>> +	rcu_read_unlock();
> Why is the rcu_read_lock() needed here? (I'm considering the comment in
> of_css() that should apply here (and it seems that similar uses in other
> seq_file handlers also skip this).)

So, it looks like, since we are in the context of a file operation,
all the cgroup's attribute read/write functions are implicitly safe.

IOW, we don't need an RCU lock since the TG data structures are guaranteed
to stay available until the end of the read/write operation.

That seems to make sense... I'm wondering whether keeping the RCU lock is
still a precaution for possible future code/assumption changes or just
unnecessary overhead?
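
For reference, this is the kind of simplification I understand you are
suggesting (just a sketch on my side, keeping the field names of the
posted patch and leaving out the percentage formatting):

static inline void cpu_uclamp_print(struct seq_file *sf,
				    enum uclamp_id clamp_id)
{
	/* The css is pinned for the whole read, see the of_css() comment */
	struct task_group *tg = css_tg(seq_css(sf));
	u64 util_clamp = tg->uclamp_req[clamp_id].value;

	if (util_clamp == SCHED_CAPACITY_SCALE) {
		seq_puts(sf, "max\n");
		return;
	}

	/* ... otherwise print the requested percentage ... */
}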

>> @@ -7369,6 +7506,20 @@ static struct cftype cpu_legacy_files[] = {
>> [...]
>> +.name = "uclamp.min",
>> [...]
>> +.name = "uclamp.max",
> I don't see technical reasons why uclamp couldn't work on the legacy
> hierarchy, and Tejun acked the series; despite that, I'll ask -- should
> the new attributes be exposed in the v1 controller hierarchy (taking into
> account that the legacy API is sort of frozen and the potential maintenance
> needs span both hierarchies)?

Not sure I get what you mean here: I'm currently exposing uclamp to
both the v1 and v2 hierarchies.

Best,
Patrick

--
#include 

Patrick Bellasi


Re: [PATCH v13 1/6] sched/core: uclamp: Extend CPU's cgroup controller

2019-08-06 Thread Michal Koutný
On Fri, Aug 02, 2019 at 10:08:48AM +0100, Patrick Bellasi wrote:
> +static ssize_t cpu_uclamp_write(struct kernfs_open_file *of, char *buf,
> +				size_t nbytes, loff_t off,
> +				enum uclamp_id clamp_id)
> +{
> +	struct uclamp_request req;
> +	struct task_group *tg;
> +
> +	req = capacity_from_percent(buf);
> +	if (req.ret)
> +		return req.ret;
> +
> +	rcu_read_lock();
This should be the uclamp_mutex.

(The compound result of the series is correct, as the lock is introduced
in "sched/core: uclamp: Propagate parent clamps".
This is just for the happiness of cherry-pickers/bisectors.)

> +static inline void cpu_uclamp_print(struct seq_file *sf,
> +				    enum uclamp_id clamp_id)
> +{
> [...]
> +	rcu_read_lock();
> +	tg = css_tg(seq_css(sf));
> +	util_clamp = tg->uclamp_req[clamp_id].value;
> +	rcu_read_unlock();
Why is the rcu_read_lock() needed here? (I'm considering the comment in
of_css() that should apply here (and it seems that similar uses in other
seq_file handlers also skip this).)

> @@ -7369,6 +7506,20 @@ static struct cftype cpu_legacy_files[] = {
> [...]
> + .name = "uclamp.min",
> [...]
> + .name = "uclamp.max",
I don't see technical reasons why uclamp couldn't work on the legacy
hierarchy, and Tejun acked the series; despite that, I'll ask -- should
the new attributes be exposed in the v1 controller hierarchy (taking into
account that the legacy API is sort of frozen and the potential maintenance
needs span both hierarchies)?



[PATCH v13 1/6] sched/core: uclamp: Extend CPU's cgroup controller

2019-08-02 Thread Patrick Bellasi
The cgroup CPU bandwidth controller allows assigning a specified
(maximum) bandwidth to the tasks of a group. However, this bandwidth is
defined and enforced only on a temporal basis, without considering the
actual frequency a CPU is running at. Thus, the amount of computation
completed by a task within an allocated bandwidth can be very different
depending on the actual frequency at which the CPU is running that task.
The amount of computation can also be affected by the specific CPU a
task is running on, especially when running on asymmetric capacity
systems like Arm's big.LITTLE.

With the availability of schedutil, the scheduler is now able
to drive frequency selections based on actual task utilization.
Moreover, the utilization clamping support provides a mechanism to
bias the frequency selection operated by schedutil depending on
constraints assigned to the tasks currently RUNNABLE on a CPU.

Given the mechanisms described above, it is now possible to extend the
cpu controller to specify the minimum (or maximum) utilization which
should be considered for the tasks RUNNABLE on a cpu.
This makes it possible to better define the actual computational power
assigned to task groups, thus improving the cgroup CPU bandwidth
controller, which is currently based just on time constraints.

Extend the CPU controller with a couple of new attributes, uclamp.{min,max},
which allow enforcing utilization boosting and capping for all the
tasks in a group.

Specifically:

- uclamp.min: defines the minimum utilization which should be considered,
              i.e. the RUNNABLE tasks of this group will run at least at
              a minimum frequency which corresponds to the uclamp.min
              utilization

- uclamp.max: defines the maximum utilization which should be considered,
              i.e. the RUNNABLE tasks of this group will run up to a
              maximum frequency which corresponds to the uclamp.max
              utilization

These attributes:

a) are available only for non-root nodes, both on default and legacy
   hierarchies, while system wide clamps are defined by a generic
   interface which does not depend on cgroups. This system wide
   interface enforces constraints on tasks in the root node.

b) enforce effective constraints at each level of the hierarchy, which
   are a restriction of the group's requests considering its parent's
   effective constraints. Root group effective constraints are defined
   by the system wide interface.
   This mechanism allows each (non-root) level of the hierarchy to:
   - request whatever clamp values it would like to get
   - effectively get only up to the maximum amount allowed by its parent
   (see the short illustration right after this list)

c) have higher priority than task-specific clamps, defined via
   sched_setattr(), thus allowing task-specific requests to be controlled
   and restricted.
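
As a worked illustration of b) -- not code from this patch, since the
"effective" value computation is only introduced later in the series --
the per-level restriction boils down to:

/* Hypothetical helper: a child's effective value never exceeds its parent's */
static unsigned int uclamp_effective_value(unsigned int request,
					   unsigned int parent_eff)
{
	return min(request, parent_eff);
}

/* e.g. parent effective uclamp.min = 60, child requests 80 -> child gets 60 */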

Add two new attributes to the cpu controller to collect "requested"
clamp values. Allow that at each non-root level of the hierarchy.
Validate local consistency by enforcing uclamp.min < uclamp.max.
Keep it simple by not yet handling "effective" values computation
and propagation along the hierarchy.
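
A minimal sketch of that local consistency check (hypothetical helper
name; uclamp_req[] and UCLAMP_{MIN,MAX} as used elsewhere in this patch;
"uclamp.min < uclamp.max" is read here as "uclamp.min must not exceed
uclamp.max"):

static inline int tg_uclamp_validate(struct task_group *tg,
				     enum uclamp_id clamp_id,
				     unsigned int value)
{
	unsigned int util_min = tg->uclamp_req[UCLAMP_MIN].value;
	unsigned int util_max = tg->uclamp_req[UCLAMP_MAX].value;

	if (clamp_id == UCLAMP_MIN)
		util_min = value;
	else
		util_max = value;

	/* Reject a configuration where uclamp.min would exceed uclamp.max */
	return util_min <= util_max ? 0 : -EINVAL;
}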

Signed-off-by: Patrick Bellasi 
Acked-by: Tejun Heo 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Tejun Heo 

---
Changes in v13:
 Message-ID: <20190725114104.ga32...@blackbody.suse.cz>
 - move common code from cpu_uclamp_{min,max}_write() into cpu_uclamp_write(clamp_id)
---
 Documentation/admin-guide/cgroup-v2.rst |  34 +
 init/Kconfig                            |  22 
 kernel/sched/core.c                     | 167 +++-
 kernel/sched/sched.h                    |   8 ++
 4 files changed, 230 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 3b29005aa981..5f1c266131b0 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -951,6 +951,13 @@ controller implements weight and absolute bandwidth limit models for
 normal scheduling policy and absolute bandwidth allocation model for
 realtime scheduling policy.
 
+In all the above models, cycles distribution is defined only on a temporal
+base and it does not account for the frequency at which tasks are executed.
+The (optional) utilization clamping support allows to hint the schedutil
+cpufreq governor about the minimum desired frequency which should always be
+provided by a CPU, as well as the maximum desired frequency, which should not
+be exceeded by a CPU.
+
 WARNING: cgroup2 doesn't yet support control of realtime processes and
 the cpu controller can only be enabled when all RT processes are in
 the root cgroup.  Be aware that system management software may already
@@ -1016,6 +1023,33 @@ All time durations are in microseconds.
 	Shows pressure stall information for CPU. See
 	Documentation/accounting/psi.rst for details.
 
+  cpu.uclamp.min
+	A read-write single value file which exists on non-root cgroups.
+	The