Re: [PATCH v2 08/12] sched/core: uclamp: extend cpu's cgroup controller

2018-07-27 Thread Quentin Perret
On Thursday 26 Jul 2018 at 17:39:19 (-0700), Joel Fernandes wrote:
> On Tue, Jul 24, 2018 at 06:29:02AM -0700, Tejun Heo wrote:
> > Hello, Patrick.
> > 
> > On Mon, Jul 23, 2018 at 06:22:15PM +0100, Patrick Bellasi wrote:
> > > However, the "best effort" bandwidth control we have for CFS and RT
> > > can be further improved if, instead of just looking at time spent on
> > > CPUs, we provide some more hints to the scheduler to know at which
> > > min/max "MIPS" we want to consume the (best effort) time we have been
> > > allocated on a CPU.
> > > 
> > > Such a simple extension is still quite useful to satisfy many use-cases
> > > we have, mainly on mobile systems, like the ones I've described in the
> > >    "Newcomer's Short Abstract (Updated)"
> > > section of the cover letter:
> > >
> > > https://lore.kernel.org/lkml/20180716082906.6061-1-patrick.bell...@arm.com/T/#u
> > 
> > So, that's all completely fine but then let's please not give it a
> > name which doesn't quite match what it does.  We can just call it
> > e.g. cpufreq range control.
> 
> But then what name can one give it if it does more than one thing, like
> task-placement and CPU frequency control?
> 
> It doesn't make sense to name it cpufreq IMHO. It's a clamp on the utilization
> of the task which can be used for many purposes.

Indeed, the scheduler could use clamped utilization values in several
places. The capacity-awareness bits (mostly useful for big.LITTLE
platforms) could already use that today I guess.

And in the longer term, depending on where the EAS patches [1] end up,
utilization clamping might actually become very useful to bias task
placement decisions. EAS basically decides where to place tasks based on
their utilization, so util_clamp would make a lot of sense there IMO.

Thanks,
Quentin

[1] https://lkml.org/lkml/2018/7/24/420


Re: [PATCH v2 08/12] sched/core: uclamp: extend cpu's cgroup controller

2018-07-26 Thread Joel Fernandes
On Tue, Jul 24, 2018 at 06:29:02AM -0700, Tejun Heo wrote:
> Hello, Patrick.
> 
> On Mon, Jul 23, 2018 at 06:22:15PM +0100, Patrick Bellasi wrote:
> > However, the "best effort" bandwidth control we have for CFS and RT
> > can be further improved if, instead of just looking at time spent on
> > CPUs, we provide some more hints to the scheduler to know at which
> > min/max "MIPS" we want to consume the (best effort) time we have been
> > allocated on a CPU.
> > 
> > Such a simple extension is still quite useful to satisfy many use-cases
> > we have, mainly on mobile systems, like the ones I've described in the
> >    "Newcomer's Short Abstract (Updated)"
> > section of the cover letter:
> >
> > https://lore.kernel.org/lkml/20180716082906.6061-1-patrick.bell...@arm.com/T/#u
> 
> So, that's all completely fine but then let's please not give it a
> name which doesn't quite match what it does.  We can just call it
> e.g. cpufreq range control.

But then what name can one give it if it does more than one thing, like
task-placement and CPU frequency control?

It doesn't make sense to name it cpufreq IMHO. It's a clamp on the utilization
of the task which can be used for many purposes.

thanks,

- Joel



Re: [PATCH v2 08/12] sched/core: uclamp: extend cpu's cgroup controller

2018-07-24 Thread Patrick Bellasi
Hi Tejun,

I apologize in advance for (yet another) long reply; I did my best
below to summarize all the controversial points discussed so far.

If you have (one more time) the patience to go through the
following text, you'll find a set of precise clarifications and
questions I have for you.

Thank you again for your time.

On 24-Jul 06:29, Tejun Heo wrote:

[...]

> > What I describe here is just an additional hint to the scheduler which
> > enriches the above described model. Provided A and B are already
> > satisfied, when a task gets a chance to run it will be executed at a
> > min/max configured frequency. That's really all... there is no
> > additional impact on "resources allocation".
> 
> So, if it's a cpufreq range controller.  It'd have sth like
> cpu.freq.min and cpu.freq.max, where min defines the maximum minimum
> cpufreq its descendants can get and max defines the maximum cpufreq
> allowed in the subtree.  For an example, please refer to how
> memory.min and memory.max are defined.

I think you are still looking at just one usage of this interface,
which is likely mainly my fault, also because of the long time between
postings. Sorry for that...

Let me re-propose here an abstract of the cover letter with some
additional notes inline.

--- Cover Letter Abstract START ---

> > [...] utilization is a task specific property which is used by the scheduler
> > to know how much CPU bandwidth a task requires (under certain conditions).
> > Thus, the utilization clamp values defined either per-task or via the
> > CPU controller, can be used to represent tasks to the scheduler as
> > being bigger (or smaller) than what they really are.
  ^^^

This is a fundamental feature added by utilization clamping: this is a
task property which can be useful in many different ways to the
scheduler and not "just" to bias frequency selection.
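
As a minimal illustration of what "clamping" means here, the sketch
below bounds a task's utilization between a minimum and a maximum
before anything else consumes it. It is a user-space toy, not code from
the series: struct util_clamp and clamped_util() are hypothetical
names, and only the 0..1024 utilization scale is taken from the kernel.

```c
#include <stdint.h>

/* Utilization and capacity use a 0..1024 scale in the scheduler. */
#define SCHED_CAPACITY_SCALE 1024u

/* Hypothetical per-task clamp pair; the names are illustrative only. */
struct util_clamp {
	uint32_t min;	/* lower bound: a small task looks at least this big */
	uint32_t max;	/* upper bound: a big task looks at most this big */
};

/* Return the utilization a consumer would see after clamping. */
static inline uint32_t clamped_util(uint32_t util, struct util_clamp uc)
{
	if (util < uc.min)
		return uc.min;
	if (util > uc.max)
		return uc.max;
	return util;
}
```

With a clamp of {300, 800}, a task measured at 100 is presented as 300
and a task measured at 900 is presented as 800. Any consumer of the
value (frequency selection, task placement) sees the same biased number.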

> > Utilization clamping thus ultimately enables interesting additional
> > optimizations, especially on asymmetric capacity systems like Arm
> > big.LITTLE and DynamIQ CPUs, where:
> > 
> >  - boosting: small tasks are preferably scheduled on higher-capacity CPUs
> >    where, despite being less energy efficient, they can complete faster
> > 
> >  - clamping: big/background tasks are preferably scheduled on low-capacity
> >    CPUs where, being more energy efficient, they can still run but save
> >    power and thermal headroom for more important tasks.

The two points above are examples of how we can use utilization
clamping for something other than frequency selection.
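
The placement bias described in the two bullets can be sketched as
follows. This is a toy model under stated assumptions, not the EAS or
SchedTune algorithm: struct cpu_desc and pick_cpu() are hypothetical,
and the policy (choose the lowest-capacity CPU that still fits the
clamped utilization) is a deliberate simplification.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU descriptor: capacity on the usual 0..1024 scale. */
struct cpu_desc {
	int id;
	uint32_t capacity;
};

/*
 * Pick the lowest-capacity CPU whose capacity still fits the task's
 * clamped utilization. Returns the CPU id, or -1 if no CPU fits.
 */
static inline int pick_cpu(const struct cpu_desc *cpus, size_t nr_cpus,
			   uint32_t util, uint32_t util_min, uint32_t util_max)
{
	uint32_t eff = util;
	uint32_t best_cap = UINT32_MAX;
	int best = -1;
	size_t i;

	/* Apply the clamp first: this is where the bias comes from. */
	if (eff < util_min)
		eff = util_min;
	if (eff > util_max)
		eff = util_max;

	for (i = 0; i < nr_cpus; i++) {
		if (cpus[i].capacity >= eff && cpus[i].capacity < best_cap) {
			best = cpus[i].id;
			best_cap = cpus[i].capacity;
		}
	}
	return best;
}
```

On a two-CPU system with capacities 446 (little) and 1024 (big), a
small task with utilization 50 boosted to a minimum of 600 lands on the
big CPU, while a big task with utilization 900 capped at 400 lands on
the little one. The capacity values are made up for the example.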

> > This additional usage of the utilization clamping is not presented in this
  

Is it acceptable to add a generic interface by properly and completely
describing, both in the cover letter and in the relevant changelogs,
what the future bits we can add will be?

> > series but it's an integral part of the Energy Aware Scheduler (EAS) feature
   ^

The EAS scheduler, without the utilization clamping bits, does a great
job in scheduling tasks while saving energy. However, on every system,
we are also interested in other metrics, for example completion
time and power dissipation.

Whether certain tasks should be scheduled to optimize energy
efficiency, completion time and/or power dissipation is something we
can achieve only by:

1. adopting a proper task classification scheme
   => that's why CGroups are of interest

2. using a generic enough mechanism to describe certain task
   properties which affect all the metrics above,
   i.e. energy, speed and power
   => that's why utilization and its clamping are of interest

> > set. A similar solution (SchedTune) is already used on Android kernels, which
   ^^^

This _complete support_ is already actively and successfully used on
many Android devices...

> > targets both frequency selection and task placement biasing.
 ^^

... to support _not only_ frequency selection.

> > This series provides the foundation bits to add similar features in mainline
 ^^^
> > and its first simple client with the schedutil integration.
^^^

The solution presented here shows only the integration with
cpufreq/schedutil. However, since we are adding a user-space
interface, we have to make this new interface generic from the
beginning, so that it also supports the complete implementation we
will have at the end.

--- Cover Letter Abstract END ---


From my comments above I hope it's now clearer that "utilization
clamping" is not just a "cpufreq range controller" and, since we
will extend the internal usage of such an interface, we cannot now add a
user-space interface which targets just frequency control.

To 

Re: [PATCH v2 08/12] sched/core: uclamp: extend cpu's cgroup controller

2018-07-24 Thread Tejun Heo
Hello, Patrick.

On Mon, Jul 23, 2018 at 06:22:15PM +0100, Patrick Bellasi wrote:
> However, the "best effort" bandwidth control we have for CFS and RT
> can be further improved if, instead of just looking at time spent on
> CPUs, we provide some more hints to the scheduler to know at which
> min/max "MIPS" we want to consume the (best effort) time we have been
> allocated on a CPU.
> 
> Such a simple extension is still quite useful to satisfy many use-cases
> we have, mainly on mobile systems, like the ones I've described in the
>    "Newcomer's Short Abstract (Updated)"
> section of the cover letter:
>
> https://lore.kernel.org/lkml/20180716082906.6061-1-patrick.bell...@arm.com/T/#u

So, that's all completely fine but then let's please not give it a
name which doesn't quite match what it does.  We can just call it
e.g. cpufreq range control.

> > So, there are fundamental discrepancies between
> > description+interface vs. what it actually does.
> 
> Perhaps then I should just change the description to make it less
> generic...

I think so, along with the interface itself.

> > I really don't think that's something we can fix up later.
> 
> ... since, really, I don't think we can get to the point to extend
> later this interface to provide the strict bandwidth enforcement you
> are thinking about.

That's completely fine.  The interface just has to match what's
implemented.

...
> > and what you're describing inherently breaks the delegation model.
> 
> What I describe here is just an additional hint to the scheduler which
> enriches the above described model. Provided A and B are already
> satisfied, when a task gets a chance to run it will be executed at a
> min/max configured frequency. That's really all... there is no
> additional impact on "resources allocation".

So, if it's a cpufreq range controller.  It'd have sth like
cpu.freq.min and cpu.freq.max, where min defines the maximum minimum
cpufreq its descendants can get and max defines the maximum cpufreq
allowed in the subtree.  For an example, please refer to how
memory.min and memory.max are defined.
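
One possible reading of these hierarchical semantics, as a user-space
sketch: every ancestor can only restrict the range a descendant ends up
with. The cpu.freq.min and cpu.freq.max names are the ones suggested
above and do not exist as an interface; struct freq_group and
effective_range() are hypothetical, and the exact rule is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cgroup node carrying a requested frequency range in kHz. */
struct freq_group {
	const struct freq_group *parent;	/* NULL for the root */
	uint32_t min_khz;			/* requested cpu.freq.min */
	uint32_t max_khz;			/* requested cpu.freq.max */
};

struct freq_range {
	uint32_t min_khz;
	uint32_t max_khz;
};

/* Walk up to the root, letting every ancestor restrict the range. */
static inline struct freq_range effective_range(const struct freq_group *g)
{
	struct freq_range r = { g->min_khz, g->max_khz };

	for (g = g->parent; g; g = g->parent) {
		if (r.min_khz > g->min_khz)
			r.min_khz = g->min_khz;	/* ancestor's min is the largest min allowed */
		if (r.max_khz > g->max_khz)
			r.max_khz = g->max_khz;	/* ancestor's max caps the whole subtree */
	}
	if (r.min_khz > r.max_khz)
		r.min_khz = r.max_khz;		/* keep the range well-formed */
	return r;
}
```

For a parent requesting [1000000, 2000000] kHz and a child requesting
[1500000, 2500000] kHz, the child's effective range under this reading
is [1000000, 2000000] kHz: the child cannot grant itself more than the
parent was given, which is the property delegation relies on.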

Thanks.

-- 
tejun


Re: [PATCH v2 08/12] sched/core: uclamp: extend cpu's cgroup controller

2018-07-23 Thread Patrick Bellasi
On 23-Jul 08:30, Tejun Heo wrote:
> Hello,

Hi Tejun!
 
> On Mon, Jul 16, 2018 at 09:29:02AM +0100, Patrick Bellasi wrote:
> > The cgroup's CPU controller allows assigning a specified (maximum)
> > bandwidth to the tasks of a group. However this bandwidth is defined and
> > enforced only on a temporal basis, without considering the actual
> > frequency a CPU is running on. Thus, the amount of computation completed
> > by a task within an allocated bandwidth can be very different depending
> > on the actual frequency the CPU is running that task.
> > The amount of computation can be affected also by the specific CPU a
> > task is running on, especially when running on asymmetric capacity
> > systems like Arm's big.LITTLE.
> 
> One basic problem I have with this patchset is that what's being
> described is way more generic than what actually got implemented.
> What's described is computation bandwidth control but what's
> implemented is just frequency clamping.

What I meant to describe is that we already have a computation
bandwidth control mechanism which is working quite fine for the
scheduling classes it applies to, i.e. CFS and RT.

For these classes we are usually happy with just a _best effort_
allocation of the bandwidth: nothing is enforced in strict terms. Indeed,
there is no tracking (at least not in kernel space) of the actual
available and allocated bandwidth. If we need strict enforcement, we
already have DL with its CBS servers.

However, the "best effort" bandwidth control we have for CFS and RT
can be further improved if, instead of just looking at time spent on
CPUs, we provide some more hints to the scheduler to know at which
min/max "MIPS" we want to consume the (best effort) time we have been
allocated on a CPU.
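
A small worked example of why time alone does not determine the amount
of computation: the same runtime quota yields a different number of CPU
cycles depending on the frequency in use. The function name is
hypothetical, and the model ignores differences in work per cycle
between CPU types.

```c
#include <stdint.h>

/*
 * CPU cycles available to a group per period, given a CFS-style quota
 * (runtime per period, in microseconds) and the frequency the CPU runs
 * at while the group's tasks execute.
 *
 * 1 MHz is one cycle per microsecond, so cycles = quota_us * freq_mhz.
 */
static inline uint64_t cycles_per_period(uint64_t quota_us, uint64_t freq_mhz)
{
	return quota_us * freq_mhz;
}
```

With a quota of 50000 us per period, the group gets 25,000,000 cycles
at 500 MHz and 100,000,000 cycles at 2000 MHz: four times the
computation for the same time-based bandwidth.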

Such a simple extension is still quite useful to satisfy many use-cases
we have, mainly on mobile systems, like the ones I've described in the
   "Newcomer's Short Abstract (Updated)"
section of the cover letter:
   
https://lore.kernel.org/lkml/20180716082906.6061-1-patrick.bell...@arm.com/T/#u

> So, there are fundamental discrepancies between
> description+interface vs. what it actually does.

Perhaps then I should just change the description to make it less
generic...

> I really don't think that's something we can fix up later.

... since, really, I don't think we can get to the point of later
extending this interface to provide the strict bandwidth enforcement
you are thinking about.

This would not be a fixup, but something really close to
re-implementing what we already have with the DL class.

> > These attributes:
> > 
> > a) are available only for non-root nodes, both on default and legacy
> >    hierarchies
> > b) do not enforce any constraints and/or dependency between the parent
> >    and its child nodes, thus relying on the delegation model and
> >    permission settings defined by the system management software
> 
> cgroup does host attributes which only concern the cgroup itself and
> thus don't need any hierarchical behaviors on their own, but what's
> being implemented does control resource allocation,

I'm not completely sure I get your point here.

Maybe it all depends on what we mean by "control resource allocation".

AFAIU, currently both the CFS and RT bandwidth controllers allow you
to define how much CPU time a group of tasks can use. They do that by
looking just within the group: there is no enforced/required relation
between the bandwidth assigned to a group and the bandwidth assigned
to its parent, siblings and/or children.

Resource allocation control is ultimately enforced "indirectly":
based on task priorities and cgroup shares, the scheduler will prefer
to pick and run certain tasks "more frequently" and "longer" than
others.

Thus I would say that the resource allocation control is already
performed by the combined action of:
A) priorities / shares to favor certain tasks over others
B) period & bandwidth to further bias the scheduler in _not_ selecting
  tasks which already executed for the configured amount of time.

> and what you're describing inherently breaks the delegation model.

What I describe here is just an additional hint to the scheduler which
enriches the above described model. Provided A and B are already
satisfied, when a task gets a chance to run it will be executed at a
min/max configured frequency. That's really all... there is no
additional impact on "resources allocation".
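
To illustrate the "min/max configured frequency" hint, the sketch below
clamps a utilization value and then maps it to a frequency request. The
mapping follows the shape of the schedutil governor's formula (1.25 *
max_freq * util / max); everything else, including the function name
and its parameters, is hypothetical and not taken from the series.

```c
#include <stdint.h>

#define CAPACITY_SCALE 1024u	/* utilization of a fully busy CPU */

/*
 * Map a clamped utilization to a frequency request in kHz.
 * util_min and util_max are the clamp values on the 0..1024 scale.
 */
static inline uint32_t next_freq_khz(uint32_t util, uint32_t util_min,
				     uint32_t util_max, uint32_t max_freq_khz)
{
	uint64_t freq;

	/* The clamp is applied before the frequency is computed. */
	if (util < util_min)
		util = util_min;
	if (util > util_max)
		util = util_max;

	/* 25% headroom: freq = 1.25 * max_freq * util / CAPACITY_SCALE */
	freq = (uint64_t)max_freq_khz + (max_freq_khz >> 2);
	freq = freq * util / CAPACITY_SCALE;

	/* Never request more than the hardware maximum. */
	return freq > max_freq_khz ? max_freq_khz : (uint32_t)freq;
}
```

With a 2000000 kHz maximum, a task at utilization 100 with a minimum
clamp of 512 requests 1250000 kHz, and a task at utilization 1000 with
a maximum clamp of 256 requests 625000 kHz. The time the task is
allowed to run is untouched; only the speed at which it runs changes.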

I don't see why you say that this breaks the delegation model?

Maybe an example can help to better explain what you mean?

Best,
Patrick

-- 
#include 

Patrick Bellasi


Re: [PATCH v2 08/12] sched/core: uclamp: extend cpu's cgroup controller

2018-07-23 Thread Tejun Heo
Hello,

On Mon, Jul 16, 2018 at 09:29:02AM +0100, Patrick Bellasi wrote:
> The cgroup's CPU controller allows assigning a specified (maximum)
> bandwidth to the tasks of a group. However this bandwidth is defined and
> enforced only on a temporal basis, without considering the actual
> frequency a CPU is running on. Thus, the amount of computation completed
> by a task within an allocated bandwidth can be very different depending
> on the actual frequency the CPU is running that task.
> The amount of computation can be affected also by the specific CPU a
> task is running on, especially when running on asymmetric capacity
> systems like Arm's big.LITTLE.

One basic problem I have with this patchset is that what's being
described is way more generic than what actually got implemented.
What's described is computation bandwidth control but what's
implemented is just frequency clamping.  So, there are fundamental
discrepancies between description+interface vs. what it actually does.

I really don't think that's something we can fix up later.

> These attributes:
> 
> a) are available only for non-root nodes, both on default and legacy
>    hierarchies
> b) do not enforce any constraints and/or dependency between the parent
>    and its child nodes, thus relying on the delegation model and
>    permission settings defined by the system management software

cgroup does host attributes which only concern the cgroup itself and
thus don't need any hierarchical behaviors on their own, but what's
being implemented does control resource allocation, and what you're
describing inherently breaks the delegation model.

> c) allow to (eventually) further restrict task-specific clamps defined
>    via sched_setattr(2)

Thanks.

-- 
tejun


Re: [PATCH v2 08/12] sched/core: uclamp: extend cpu's cgroup controller

2018-07-23 Thread Patrick Bellasi
On 20-Jul 19:37, Suren Baghdasaryan wrote:
> On Mon, Jul 16, 2018 at 1:29 AM, Patrick Bellasi wrote:

[...]

> > +#ifdef CONFIG_UCLAMP_TASK_GROUP
> > +static int cpu_util_min_write_u64(struct cgroup_subsys_state *css,
> > +				  struct cftype *cftype, u64 min_value)
> > +{
> > +	struct task_group *tg;
> > +	int ret = -EINVAL;
> > +
> > +	if (min_value > SCHED_CAPACITY_SCALE)
> > +		return -ERANGE;
> > +
> > +	mutex_lock(&uclamp_mutex);
> > +	rcu_read_lock();
> > +
> > +	tg = css_tg(css);
> > +	if (tg->uclamp[UCLAMP_MIN].value == min_value) {
> > +		ret = 0;
> > +		goto out;
> > +	}
> > +	if (tg->uclamp[UCLAMP_MAX].value < min_value)
> > +		goto out;
> > +
> 
> + tg->uclamp[UCLAMP_MIN].value = min_value;
> + ret = 0;
> 
> Are these assignments missing or am I missing something? Same for
> cpu_util_max_write_u64().

They are introduced in the following patch, to keep this one focused
just on CGroups integration.

I'm also returning -EINVAL at this stage since, with just this patch
in, we are not really providing any good service to user-space, i.e.
it's like clamp groups not being available...

Maybe I can call this out better in the change log ;)

> > +out:
> > +	rcu_read_unlock();
> > +	mutex_unlock(&uclamp_mutex);
> > +
> > +	return ret;
> > +}

[...]
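
To make the discussion above easier to follow, here is a minimal
user-space sketch of the checks in the quoted function, combined with the
assignment Suren asks about. It is only a model: locking and RCU are left
out, and the struct clamp_pair type and the set_util_min() name are
hypothetical, not taken from the series.

```c
#include <errno.h>
#include <stdint.h>

/* Top of the utilization range; 1024 in the kernel. */
#define SCHED_CAPACITY_SCALE 1024ULL

/* Hypothetical stand-in for a group's pair of clamp values. */
struct clamp_pair {
	uint64_t min;
	uint64_t max;
};

/*
 * Validate and store a new minimum clamp:
 *  - values above the capacity scale are out of range;
 *  - writing the current value is a no-op that succeeds;
 *  - a minimum above the current maximum is rejected.
 */
static int set_util_min(struct clamp_pair *c, uint64_t min_value)
{
	if (min_value > SCHED_CAPACITY_SCALE)
		return -ERANGE;
	if (c->min == min_value)
		return 0;
	if (c->max < min_value)
		return -EINVAL;

	c->min = min_value;	/* the store deferred to the next patch */
	return 0;
}
```

In this model, a group with max = 512 rejects a minimum of 600 with
-EINVAL and accepts a minimum of 256.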

-- 
#include <best/regards.h>

Patrick Bellasi


Re: [PATCH v2 08/12] sched/core: uclamp: extend cpu's cgroup controller

2018-07-20 Thread Suren Baghdasaryan
On Fri, Jul 20, 2018 at 7:37 PM, Suren Baghdasaryan wrote:
> On Mon, Jul 16, 2018 at 1:29 AM, Patrick Bellasi wrote:
>> The cgroup's CPU controller allows assigning a specified (maximum)
>> bandwidth to the tasks of a group. However, this bandwidth is defined and
>> enforced only on a temporal basis, without considering the actual
>> frequency a CPU is running at. Thus, the amount of computation completed
>> by a task within an allocated bandwidth can be very different depending
>> on the actual frequency at which the CPU is running that task.
>> The amount of computation can also be affected by the specific CPU a
>> task is running on, especially when running on asymmetric capacity
>> systems like Arm's big.LITTLE.
>>
>> With the availability of schedutil, the scheduler is now able
>> to drive frequency selections based on actual task utilization.
>> Moreover, the utilization clamping support provides a mechanism to
>> bias the frequency selection operated by schedutil depending on
>> constraints assigned to the tasks currently RUNNABLE on a CPU.
>>
>> Given the above mechanisms, it is now possible to extend the cpu
>> controller to specify what is the minimum (or maximum) utilization which
>> a task is expected (or allowed) to generate.
>> Constraints on minimum and maximum utilization allowed for tasks in a
>> CPU cgroup can improve the control on the actual amount of CPU bandwidth
>> consumed by tasks.
>>
>> Utilization clamping constraints are useful not only to bias frequency
>> selection, when a task is running, but also to better support certain
>> scheduler decisions regarding task placement. For example, on
>> asymmetric capacity systems, a utilization clamp value can be
>> conveniently used to enforce important interactive tasks on more capable
>> CPUs or to run low priority and background tasks on more energy
>> efficient CPUs.
>>
>> The ultimate goal of utilization clamping is thus to enable:
>>
>> - boosting: by selecting a higher capacity CPU and/or higher execution
>>             frequency for small tasks which are affecting the user
>>             interactive experience.
>>
>> - capping: by selecting more energy-efficient CPUs or lower execution
>>            frequency, for big tasks which are mainly related to
>>            background activities, and thus without a direct impact on
>>            the user experience.
>>
>> Thus, a proper extension of the cpu controller with utilization clamping
>> support will make this controller even more suitable for integration
>> with advanced system management software (e.g. Android).
>> Indeed, an informed user-space can provide rich information hints to the
>> scheduler regarding the tasks it's going to schedule.
>>
>> This patch extends the CPU controller by adding a couple of new
>> attributes, util_min and util_max, which can be used to enforce task's
>> utilization boosting and capping. Specifically:
>>
>> - util_min: defines the minimum utilization which should be considered,
>>             e.g. when schedutil selects the frequency for a CPU while a
>>             task in this group is RUNNABLE.
>>             i.e. the task will run at least at a minimum frequency which
>>             corresponds to the util_min utilization
>>
>> - util_max: defines the maximum utilization which should be considered,
>>             e.g. when schedutil selects the frequency for a CPU while a
>>             task in this group is RUNNABLE.
>>             i.e. the task will run up to a maximum frequency which
>>             corresponds to the util_max utilization
>>
>> These attributes:
>>
>> a) are available only for non-root nodes, both on default and legacy
>>    hierarchies
>> b) do not enforce any constraints and/or dependency between the parent
>>    and its child nodes, thus relying on the delegation model and
>>    permission settings defined by the system management software
>> c) allow (eventually) further restricting task-specific clamps defined
>>    via sched_setattr(2)
>>
>> This patch provides the basic support to expose the two new attributes
>> and to validate their run-time updates.
>>
>> Signed-off-by: Patrick Bellasi 
>> Cc: Ingo Molnar 
>> Cc: Peter Zijlstra 
>> Cc: Tejun Heo 
>> Cc: Rafael J. Wysocki 
>> Cc: Viresh Kumar 
>> Cc: Todd Kjos 
>> Cc: Joel Fernandes 
>> Cc: Juri Lelli 
>> Cc: linux-kernel@vger.kernel.org
>> Cc: linux...@vger.kernel.org
>> ---
>>  Documentation/admin-guide/cgroup-v2.rst |  25
>>  init/Kconfig                            |  22 +++
>>  kernel/sched/core.c                     | 186
>>  kernel/sched/sched.h                    |   5 +
>>  4 files changed, 238 insertions(+)
>>
>> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
>> index 8a2c52d5c53b..328c011cc105 100644
>> --- a/Documentation/admin-guide/cgroup-v2.rst
>> +++ b/Documentation/admin-guide/cgroup-v2.rst
>> @@ -904,6 +904,12 @@ controller implements weight and absolute bandwidth 

Re: [PATCH v2 08/12] sched/core: uclamp: extend cpu's cgroup controller

2018-07-20 Thread Suren Baghdasaryan
On Mon, Jul 16, 2018 at 1:29 AM, Patrick Bellasi wrote:
> The cgroup's CPU controller allows assigning a specified (maximum)
> bandwidth to the tasks of a group. However, this bandwidth is defined and
> enforced only on a temporal basis, without considering the actual
> frequency a CPU is running at. Thus, the amount of computation completed
> by a task within an allocated bandwidth can be very different depending
> on the actual frequency at which the CPU is running that task.
> The amount of computation can also be affected by the specific CPU a
> task is running on, especially when running on asymmetric capacity
> systems like Arm's big.LITTLE.
>
> With the availability of schedutil, the scheduler is now able
> to drive frequency selections based on actual task utilization.
> Moreover, the utilization clamping support provides a mechanism to
> bias the frequency selection operated by schedutil depending on
> constraints assigned to the tasks currently RUNNABLE on a CPU.
>
> Given the above mechanisms, it is now possible to extend the cpu
> controller to specify what is the minimum (or maximum) utilization which
> a task is expected (or allowed) to generate.
> Constraints on minimum and maximum utilization allowed for tasks in a
> CPU cgroup can improve the control on the actual amount of CPU bandwidth
> consumed by tasks.
>
> Utilization clamping constraints are useful not only to bias frequency
> selection, when a task is running, but also to better support certain
> scheduler decisions regarding task placement. For example, on
> asymmetric capacity systems, a utilization clamp value can be
> conveniently used to enforce important interactive tasks on more capable
> CPUs or to run low priority and background tasks on more energy
> efficient CPUs.
>
> The ultimate goal of utilization clamping is thus to enable:
>
> - boosting: by selecting a higher capacity CPU and/or higher execution
>             frequency for small tasks which are affecting the user
>             interactive experience.
>
> - capping: by selecting more energy-efficient CPUs or lower execution
>            frequency, for big tasks which are mainly related to
>            background activities, and thus without a direct impact on
>            the user experience.
>
> Thus, a proper extension of the cpu controller with utilization clamping
> support will make this controller even more suitable for integration
> with advanced system management software (e.g. Android).
> Indeed, an informed user-space can provide rich information hints to the
> scheduler regarding the tasks it's going to schedule.
>
> This patch extends the CPU controller by adding a couple of new
> attributes, util_min and util_max, which can be used to enforce task's
> utilization boosting and capping. Specifically:
>
> - util_min: defines the minimum utilization which should be considered,
>             e.g. when schedutil selects the frequency for a CPU while a
>             task in this group is RUNNABLE.
>             i.e. the task will run at least at a minimum frequency which
>             corresponds to the util_min utilization
>
> - util_max: defines the maximum utilization which should be considered,
>             e.g. when schedutil selects the frequency for a CPU while a
>             task in this group is RUNNABLE.
>             i.e. the task will run up to a maximum frequency which
>             corresponds to the util_max utilization
>
> These attributes:
>
> a) are available only for non-root nodes, both on default and legacy
>    hierarchies
> b) do not enforce any constraints and/or dependency between the parent
>    and its child nodes, thus relying on the delegation model and
>    permission settings defined by the system management software
> c) allow (eventually) further restricting task-specific clamps defined
>    via sched_setattr(2)
>
> This patch provides the basic support to expose the two new attributes
> and to validate their run-time updates.
>
> Signed-off-by: Patrick Bellasi 
> Cc: Ingo Molnar 
> Cc: Peter Zijlstra 
> Cc: Tejun Heo 
> Cc: Rafael J. Wysocki 
> Cc: Viresh Kumar 
> Cc: Todd Kjos 
> Cc: Joel Fernandes 
> Cc: Juri Lelli 
> Cc: linux-kernel@vger.kernel.org
> Cc: linux...@vger.kernel.org
> ---
>  Documentation/admin-guide/cgroup-v2.rst |  25
>  init/Kconfig                            |  22 +++
>  kernel/sched/core.c                     | 186
>  kernel/sched/sched.h                    |   5 +
>  4 files changed, 238 insertions(+)
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 8a2c52d5c53b..328c011cc105 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -904,6 +904,12 @@ controller implements weight and absolute bandwidth limit models for
>  normal scheduling policy and absolute bandwidth allocation model for
>  realtime scheduling policy.
>
> +Cycles distribution is based, by 

[PATCH v2 08/12] sched/core: uclamp: extend cpu's cgroup controller

2018-07-16 Thread Patrick Bellasi
The cgroup's CPU controller allows assigning a specified (maximum)
bandwidth to the tasks of a group. However, this bandwidth is defined and
enforced only on a temporal basis, without considering the actual
frequency a CPU is running at. Thus, the amount of computation completed
by a task within an allocated bandwidth can be very different depending
on the actual frequency at which the CPU is running that task.
The amount of computation can also be affected by the specific CPU a
task is running on, especially when running on asymmetric capacity
systems like Arm's big.LITTLE.
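
As a rough worked example of the point above (a sketch, not part of the
patch): assume the work completed scales linearly with both the CPU's
capacity and its current frequency. Under that assumption the same time
quota yields very different amounts of computation. The work_done() helper
and its abstract "work units" are hypothetical.

```c
#include <stdint.h>

/* Capacity of the biggest CPU at its maximum frequency; 1024 in the kernel. */
#define SCHED_CAPACITY_SCALE 1024ULL

/*
 * Abstract work units completed in runtime_us microseconds on a CPU whose
 * capacity at full speed is cpu_cap (0..1024), currently running at
 * cur_khz out of max_khz. Linear scaling is an assumption of this sketch.
 */
static uint64_t work_done(uint64_t runtime_us, uint64_t cpu_cap,
			  uint64_t cur_khz, uint64_t max_khz)
{
	if (max_khz == 0)
		return 0;

	return (runtime_us * cpu_cap * cur_khz) /
	       (SCHED_CAPACITY_SCALE * max_khz);
}
```

With a 50000 us quota, a capacity-1024 CPU at full frequency completes
50000 units in this model, while a capacity-512 CPU at half its maximum
frequency completes 12500, although both consumed the same bandwidth.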

With the availability of schedutil, the scheduler is now able
to drive frequency selections based on actual task utilization.
Moreover, the utilization clamping support provides a mechanism to
bias the frequency selection operated by schedutil depending on
constraints assigned to the tasks currently RUNNABLE on a CPU.

Given the above mechanisms, it is now possible to extend the cpu
controller to specify what is the minimum (or maximum) utilization which
a task is expected (or allowed) to generate.
Constraints on minimum and maximum utilization allowed for tasks in a
CPU cgroup can improve the control on the actual amount of CPU bandwidth
consumed by tasks.

Utilization clamping constraints are useful not only to bias frequency
selection, when a task is running, but also to better support certain
scheduler decisions regarding task placement. For example, on
asymmetric capacity systems, a utilization clamp value can be
conveniently used to enforce important interactive tasks on more capable
CPUs or to run low priority and background tasks on more energy
efficient CPUs.
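
A minimal sketch of how a clamped utilization could bias placement on an
asymmetric system, assuming a simple "smallest CPU that fits" policy. This
is not the scheduler's actual placement code, which considers much more
than capacity; pick_cpu() and its parameters are hypothetical.

```c
#include <stddef.h>

/*
 * Return the index of the lowest-capacity CPU whose capacity can hold the
 * task's utilization after clamping it into [util_min, util_max], or -1 if
 * no CPU fits. cpu_cap[] holds per-CPU capacities on the 0..1024 scale.
 */
static int pick_cpu(const unsigned int *cpu_cap, size_t nr_cpus,
		    unsigned int util, unsigned int util_min,
		    unsigned int util_max)
{
	unsigned int u = util < util_min ? util_min : util;
	int best = -1;

	if (u > util_max)
		u = util_max;

	for (size_t i = 0; i < nr_cpus; i++) {
		if (cpu_cap[i] < u)
			continue;	/* CPU too small for the clamped demand */
		if (best < 0 || cpu_cap[i] < cpu_cap[(size_t)best])
			best = (int)i;
	}
	return best;
}
```

With two little CPUs (capacity 446) and two big ones (capacity 1024), a
small task boosted with util_min = 600 lands on a big CPU, while a big
task capped with util_max = 400 fits on a little one.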

The ultimate goal of utilization clamping is thus to enable:

- boosting: by selecting a higher capacity CPU and/or higher execution
            frequency for small tasks which are affecting the user
            interactive experience.

- capping: by selecting more energy-efficient CPUs or lower execution
           frequency, for big tasks which are mainly related to
           background activities, and thus without a direct impact on
           the user experience.

Thus, a proper extension of the cpu controller with utilization clamping
support will make this controller even more suitable for integration
with advanced system management software (e.g. Android).
Indeed, an informed user-space can provide rich information hints to the
scheduler regarding the tasks it's going to schedule.

This patch extends the CPU controller by adding a couple of new
attributes, util_min and util_max, which can be used to enforce task's
utilization boosting and capping. Specifically:

- util_min: defines the minimum utilization which should be considered,
            e.g. when schedutil selects the frequency for a CPU while a
            task in this group is RUNNABLE.
            i.e. the task will run at least at a minimum frequency which
            corresponds to the util_min utilization

- util_max: defines the maximum utilization which should be considered,
            e.g. when schedutil selects the frequency for a CPU while a
            task in this group is RUNNABLE.
            i.e. the task will run up to a maximum frequency which
            corresponds to the util_max utilization
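
The effect of the two attributes on frequency selection can be sketched as
follows. The mapping assumes a schedutil-style formula with 25% headroom
(freq = 1.25 * max_freq * util / 1024); the real governor additionally
resolves the result to an available operating point, which is omitted
here. The helper names are hypothetical.

```c
#include <stdint.h>

/* Top of the utilization range; 1024 in the kernel. */
#define SCHED_CAPACITY_SCALE 1024U

/* Clamp a utilization signal into [util_min, util_max]. */
static unsigned int clamp_util(unsigned int util, unsigned int util_min,
			       unsigned int util_max)
{
	if (util < util_min)
		return util_min;
	if (util > util_max)
		return util_max;
	return util;
}

/* Map a utilization to a frequency request in kHz, capped at max_khz. */
static unsigned int util_to_khz(unsigned int util, unsigned int max_khz)
{
	uint64_t khz = ((uint64_t)max_khz + (max_khz >> 2)) * util /
		       SCHED_CAPACITY_SCALE;

	return khz > max_khz ? max_khz : (unsigned int)khz;
}
```

On a CPU with a 2000000 kHz maximum, a task with utilization 100 in a
group with util_min = 512 would request 1250000 kHz, and a task with
utilization 900 in a group with util_max = 256 would request 625000 kHz.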

These attributes:

a) are available only for non-root nodes, both on default and legacy
   hierarchies
b) do not enforce any constraints and/or dependency between the parent
   and its child nodes, thus relying on the delegation model and
   permission settings defined by the system management software
c) allow (eventually) further restricting task-specific clamps defined
   via sched_setattr(2)
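
One possible reading of point c), sketched under the assumption that a
group value acts as an upper bound on whatever the task requested for
itself. The exact combination rule is not spelled out in this message, so
treat the function below as hypothetical.

```c
/*
 * Combine a task-specific clamp value (as set via sched_setattr(2)) with
 * the value of the group the task belongs to. In this sketch the group can
 * only restrict: the effective value never exceeds the group's value.
 */
static unsigned int restrict_clamp(unsigned int task_value,
				   unsigned int group_value)
{
	return task_value < group_value ? task_value : group_value;
}
```

For example, a task asking for a clamp of 800 inside a group set to 512
would end up with 512, while a task asking for 300 would keep 300.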

This patch provides the basic support to expose the two new attributes
and to validate their run-time updates.

Signed-off-by: Patrick Bellasi 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Tejun Heo 
Cc: Rafael J. Wysocki 
Cc: Viresh Kumar 
Cc: Todd Kjos 
Cc: Joel Fernandes 
Cc: Juri Lelli 
Cc: linux-kernel@vger.kernel.org
Cc: linux...@vger.kernel.org
---
 Documentation/admin-guide/cgroup-v2.rst |  25
 init/Kconfig                            |  22 +++
 kernel/sched/core.c                     | 186
 kernel/sched/sched.h                    |   5 +
 4 files changed, 238 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 8a2c52d5c53b..328c011cc105 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -904,6 +904,12 @@ controller implements weight and absolute bandwidth limit models for
 normal scheduling policy and absolute bandwidth allocation model for
 realtime scheduling policy.
 
+Cycles distribution is based, by default, on a temporal base and it
+does not account for the frequency at which tasks are executed.
+The (optional) utilization clamping support allows to enforce a minimum
+bandwidth, which should always be provided by a CPU, and a maximum bandwidth,
