Re: [PATCH v2 08/12] sched/core: uclamp: extend cpu's cgroup controller
On Thursday 26 Jul 2018 at 17:39:19 (-0700), Joel Fernandes wrote:
> On Tue, Jul 24, 2018 at 06:29:02AM -0700, Tejun Heo wrote:
> > Hello, Patrick.
> >
> > On Mon, Jul 23, 2018 at 06:22:15PM +0100, Patrick Bellasi wrote:
> > > However, the "best effort" bandwidth control we have for CFS and RT
> > > can be further improved if, instead of just looking at time spent on
> > > CPUs, we provide some more hints to the scheduler to know at which
> > > min/max "MIPS" we want to consume the (best effort) time we have been
> > > allocated on a CPU.
> > >
> > > Such a simple extension is still quite useful to satisfy many use-cases
> > > we have, mainly on mobile systems, like the ones I've described in the
> > > "Newcomer's Short Abstract (Updated)" section of the cover letter:
> > >
> > > https://lore.kernel.org/lkml/20180716082906.6061-1-patrick.bell...@arm.com/T/#u
> >
> > So, that's all completely fine, but then let's please not give it a
> > name which doesn't quite match what it does. We can just call it
> > e.g. cpufreq range control.
>
> But then what name can one give it if it does more than one thing, like
> task placement and CPU frequency control?
>
> It doesn't make sense to name it cpufreq IMHO. It's a clamp on the
> utilization of the task, which can be used for many purposes.

Indeed, the scheduler could use clamped utilization values in several
places. The capacity-awareness bits (mostly useful for big.LITTLE
platforms) could already use that today, I guess. And in the longer
term, depending on where the EAS patches [1] end up, utilization
clamping might actually become very useful to bias task placement
decisions. EAS basically decides where to place tasks based on their
utilization, so util_clamp would make a lot of sense there IMO.

Thanks,
Quentin

[1] https://lkml.org/lkml/2018/7/24/420
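For readers joining the thread: the "clamp on the utilization" Joel
refers to boils down to restricting a task's utilization signal into a
[util_min, util_max] range before the scheduler consumes it. A minimal
userspace sketch of that idea (the helper name is illustrative, not the
kernel API):

```c
/* Clamp a task's utilization into [util_min, util_max] before the
 * scheduler consumes it (for frequency selection, placement, ...).
 * Illustrative sketch only, not the kernel implementation. */
static unsigned int clamp_util(unsigned int util,
			       unsigned int util_min,
			       unsigned int util_max)
{
	if (util < util_min)
		return util_min;	/* boosting: task looks bigger */
	if (util > util_max)
		return util_max;	/* capping: task looks smaller */
	return util;
}
```

The same clamped value can then feed any consumer, which is exactly why
naming the interface after just one consumer (cpufreq) is contentious.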
Re: [PATCH v2 08/12] sched/core: uclamp: extend cpu's cgroup controller
On Tue, Jul 24, 2018 at 06:29:02AM -0700, Tejun Heo wrote:
> Hello, Patrick.
>
> On Mon, Jul 23, 2018 at 06:22:15PM +0100, Patrick Bellasi wrote:
> > However, the "best effort" bandwidth control we have for CFS and RT
> > can be further improved if, instead of just looking at time spent on
> > CPUs, we provide some more hints to the scheduler to know at which
> > min/max "MIPS" we want to consume the (best effort) time we have been
> > allocated on a CPU.
> >
> > Such a simple extension is still quite useful to satisfy many use-cases
> > we have, mainly on mobile systems, like the ones I've described in the
> > "Newcomer's Short Abstract (Updated)" section of the cover letter:
> >
> > https://lore.kernel.org/lkml/20180716082906.6061-1-patrick.bell...@arm.com/T/#u
>
> So, that's all completely fine, but then let's please not give it a
> name which doesn't quite match what it does. We can just call it
> e.g. cpufreq range control.

But then what name can one give it if it does more than one thing, like
task placement and CPU frequency control?

It doesn't make sense to name it cpufreq IMHO. It's a clamp on the
utilization of the task, which can be used for many purposes.

thanks,

- Joel
Re: [PATCH v2 08/12] sched/core: uclamp: extend cpu's cgroup controller
Hi Tejun,

I apologize in advance for the (yet another) long reply; however, I did
my best hereafter to try to summarize all the controversial points
discussed so far. If you have (one more time) the patience to go through
the following text, you'll find a set of precise clarifications and
questions I have for you.

Thank you again for your time.

On 24-Jul 06:29, Tejun Heo wrote:

[...]

> > What I describe here is just an additional hint to the scheduler which
> > enriches the above described model. Provided A and B are already
> > satisfied, when a task gets a chance to run it will be executed at a
> > min/max configured frequency. That's really all... there is no
> > additional impact on "resource allocation".
>
> So, if it's a cpufreq range controller, it'd have sth like
> cpu.freq.min and cpu.freq.max, where min defines the maximum minimum
> cpufreq its descendants can get and max defines the maximum cpufreq
> allowed in the subtree. For an example, please refer to how
> memory.min and memory.max are defined.

I think you are still looking at just one usage of this interface,
which is likely mainly my fault, also because of the long time between
postings. Sorry for that...

Let me re-propose here an abstract of the cover letter, with some
additional notes inline.

--- Cover Letter Abstract START ---

> > [...] utilization is a task-specific property which is used by the
> > scheduler to know how much CPU bandwidth a task requires (under
> > certain conditions). Thus, the utilization clamp values, defined
> > either per-task or via the CPU controller, can be used to represent
> > tasks to the scheduler as being bigger (or smaller) than what they
> > really are.
                ^^^
This is a fundamental feature added by utilization clamping: it is a
task property which can be useful in many different ways to the
scheduler, and not "just" to bias frequency selection.

> > Utilization clamping thus ultimately enables interesting additional
> > optimizations, especially on asymmetric capacity systems like Arm
> > big.LITTLE and DynamIQ CPUs, where:
> >
> > - boosting: small tasks are preferably scheduled on higher-capacity
> >   CPUs where, despite being less energy efficient, they can complete
> >   faster
> >
> > - clamping: big/background tasks are preferably scheduled on
> >   low-capacity CPUs where, being more energy efficient, they can
> >   still run but save power and thermal headroom for more important
> >   tasks.

The two points above are two examples of how we can use utilization
clamping for something which is not frequency selection.

> > This additional usage of the utilization clamping is not presented
> > in this

Is it acceptable to add a generic interface by properly and completely
describing, both in the cover letter and in the relative changelogs,
what will be the future bits we can add?

> > series but it's an integral part of the Energy Aware Scheduler (EAS)
> > feature
                                         ^
The EAS scheduler, without the utilization clamping bits, does a great
job in scheduling tasks while saving energy. However, on every system,
we are interested also in other metrics, like for example: completion
time and power dissipation.

Whether certain tasks should be scheduled to optimize energy
efficiency, completion time and/or power dissipation is something we
can achieve only by:

 1. adopting a proper task classification schema
    => that's why cgroups are of interest

 2. using a generic enough mechanism to describe certain task
    properties which affect all the metrics above, i.e. energy, speed
    and power
    => that's why utilization and its clamping is of interest

> > set. A similar solution (SchedTune) is already used on Android
> > kernels, which
                   ^^^
This _complete support_ is already actively and successfully used on
many Android devices...

> > targets both frequency selection and task placement biasing.
                                        ^^
... to support _not only_ frequency selection.

> > This series provides the foundation bits to add similar features in
> > mainline
                                                  ^^^
> > and its first simple client with the schedutil integration.
                  ^^^
The solution presented here shows only the integration with
cpufreq/schedutil. However, since we are adding a user-space interface,
we have to add this new interface in a generic way from the beginning,
to support also the complete implementation we will have at the end.

--- Cover Letter Abstract END ---

From my comments above I hope it's now more clear that "utilization
clamping" is not just a "cpufreq range controller" and, since we will
extend the internal usage of such an interface, we cannot add now a
user-space interface which targets just frequency control.
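The boosting/clamping placement bias described in the abstract above can
be sketched in userspace terms: with a boosted util_min, a small task no
longer "fits" a little CPU and is steered towards a big one. The 80%
margin below mirrors the heuristic used by the kernel's fits_capacity()
macro; the helper itself and the capacity numbers are illustrative:

```c
/* Does a (clamped) utilization fit a CPU of the given capacity, with
 * a ~20% headroom margin? Mirrors the fits_capacity() heuristic:
 * util * 1280 < capacity * 1024  <=>  util < 0.8 * capacity.
 * Illustrative sketch, not the kernel implementation. */
static int fits_capacity(unsigned int clamped_util, unsigned int capacity)
{
	return clamped_util * 1280 < capacity * 1024;
}
```

For example, a task with utilization 100 fits a little CPU of capacity
512; boosted to a util_min of 450 it no longer fits, so a
capacity-aware scheduler would prefer a big CPU of capacity 1024 —
placement bias, with no cpufreq involvement at all.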
Re: [PATCH v2 08/12] sched/core: uclamp: extend cpu's cgroup controller
Hello, Patrick.

On Mon, Jul 23, 2018 at 06:22:15PM +0100, Patrick Bellasi wrote:
> However, the "best effort" bandwidth control we have for CFS and RT
> can be further improved if, instead of just looking at time spent on
> CPUs, we provide some more hints to the scheduler to know at which
> min/max "MIPS" we want to consume the (best effort) time we have been
> allocated on a CPU.
>
> Such a simple extension is still quite useful to satisfy many use-cases
> we have, mainly on mobile systems, like the ones I've described in the
> "Newcomer's Short Abstract (Updated)" section of the cover letter:
>
> https://lore.kernel.org/lkml/20180716082906.6061-1-patrick.bell...@arm.com/T/#u

So, that's all completely fine, but then let's please not give it a
name which doesn't quite match what it does. We can just call it
e.g. cpufreq range control.

> > So, there are fundamental discrepancies between
> > description+interface vs. what it actually does.
>
> Perhaps then I should just change the description to make it less
> generic...

I think so, along with the interface itself.

> > I really don't think that's something we can fix up later.
>
> ... since, really, I don't think we can get to the point to extend
> later this interface to provide the strict bandwidth enforcement you
> are thinking about.

That's completely fine. The interface just has to match what's
implemented.

...

> > and what you're describing inherently breaks the delegation model.
>
> What I describe here is just an additional hint to the scheduler which
> enriches the above described model. Provided A and B are already
> satisfied, when a task gets a chance to run it will be executed at a
> min/max configured frequency. That's really all... there is no
> additional impact on "resource allocation".

So, if it's a cpufreq range controller, it'd have sth like
cpu.freq.min and cpu.freq.max, where min defines the maximum minimum
cpufreq its descendants can get and max defines the maximum cpufreq
allowed in the subtree. For an example, please refer to how
memory.min and memory.max are defined.

Thanks.

--
tejun
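The hierarchical semantics Tejun sketches (modeled on memory.min and
memory.max) could look roughly like this: a child's effective range is
its own requested range clamped by the parent's effective range, so a
descendant can never ask for a higher minimum, or exceed the maximum,
allowed in its subtree. This is only an illustration of the proposed
semantics, not existing kernel code:

```c
/* A requested [min, max] frequency range for one cgroup node. */
struct freq_range {
	unsigned int min;
	unsigned int max;
};

/* Compute a child's effective range from its own request and its
 * parent's effective range: the parent's min caps the minimum a
 * descendant can get, the parent's max caps the subtree maximum.
 * Illustrative sketch of the memory.min/memory.max-style semantics. */
static struct freq_range effective_range(struct freq_range parent_eff,
					 struct freq_range own)
{
	struct freq_range eff;

	eff.min = own.min < parent_eff.min ? own.min : parent_eff.min;
	eff.max = own.max < parent_eff.max ? own.max : parent_eff.max;
	return eff;
}
```

Under this scheme a delegated subtree can only narrow what its parent
grants, which is what makes the behavior hierarchical rather than a
flat per-group attribute.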
Re: [PATCH v2 08/12] sched/core: uclamp: extend cpu's cgroup controller
On 23-Jul 08:30, Tejun Heo wrote:
> Hello,

Hi Tejun!

> On Mon, Jul 16, 2018 at 09:29:02AM +0100, Patrick Bellasi wrote:
> > The cgroup's CPU controller allows to assign a specified (maximum)
> > bandwidth to the tasks of a group. However, this bandwidth is defined
> > and enforced only on a temporal base, without considering the actual
> > frequency a CPU is running on. Thus, the amount of computation
> > completed by a task within an allocated bandwidth can be very
> > different depending on the actual frequency the CPU is running that
> > task at.
> > The amount of computation can be affected also by the specific CPU a
> > task is running on, especially when running on asymmetric capacity
> > systems like Arm's big.LITTLE.
>
> One basic problem I have with this patchset is that what's being
> described is way more generic than what actually got implemented.
> What's described is computation bandwidth control but what's
> implemented is just frequency clamping.

What I meant to describe is that we already have a computation
bandwidth control mechanism which is working quite fine for the
scheduling classes it applies to, i.e. CFS and RT.

For these classes we are usually happy with just a _best effort_
allocation of the bandwidth: nothing is enforced in strict terms.
Indeed, there is no tracking (at least not in kernel space) of the
actual available and allocated bandwidth. If we need strict
enforcement, we already have DL with its CBS servers.

However, the "best effort" bandwidth control we have for CFS and RT
can be further improved if, instead of just looking at time spent on
CPUs, we provide some more hints to the scheduler to know at which
min/max "MIPS" we want to consume the (best effort) time we have been
allocated on a CPU.

Such a simple extension is still quite useful to satisfy many use-cases
we have, mainly on mobile systems, like the ones I've described in the
"Newcomer's Short Abstract (Updated)" section of the cover letter:

   https://lore.kernel.org/lkml/20180716082906.6061-1-patrick.bell...@arm.com/T/#u

> So, there are fundamental discrepancies between
> description+interface vs. what it actually does.

Perhaps then I should just change the description to make it less
generic...

> I really don't think that's something we can fix up later.

... since, really, I don't think we can get to the point of extending
this interface later to provide the strict bandwidth enforcement you
are thinking about. This would not be a fixup, but something really
close to re-implementing what we already have with the DL class.

> > These attributes:
> >
> > a) are available only for non-root nodes, both on default and legacy
> >    hierarchies
> > b) do not enforce any constraints and/or dependency between the parent
> >    and its child nodes, thus relying on the delegation model and
> >    permission settings defined by the system management software
>
> cgroup does host attributes which only concern the cgroup itself and
> thus don't need any hierarchical behaviors on their own, but what's
> being implemented does control resource allocation,

I'm not completely sure I get your point here. Maybe it all depends on
what we mean by "control resource allocation".

AFAIU, currently both the CFS and RT bandwidth controllers allow you
to define how much CPU time a group of tasks can use. They do that by
looking just within the group: there is no enforced/required relation
between the bandwidth assigned to a group and the bandwidth assigned
to its parent, siblings and/or children.

The resource allocation control is eventually enforced "indirectly" by
means of the fact that, based on task priorities and cgroup shares,
the scheduler will prefer to pick and run certain tasks "more
frequently" and "longer" than others.

Thus I would say that the resource allocation control is already
performed by the combined action of:

 A) priorities / shares, to favor certain tasks over others

 B) period & bandwidth, to further bias the scheduler into _not_
    selecting tasks which have already executed for the configured
    amount of time.

> and what you're describing inherently breaks the delegation model.

What I describe here is just an additional hint to the scheduler which
enriches the above described model. Provided A and B are already
satisfied, when a task gets a chance to run it will be executed at a
min/max configured frequency. That's really all... there is no
additional impact on "resource allocation".

I don't see why you say that this breaks the delegation model. Maybe
an example can help to better explain what you mean?

Best,
Patrick

--
#include <best/regards.h>

Patrick Bellasi
Re: [PATCH v2 08/12] sched/core: uclamp: extend cpu's cgroup controller
Hello,

On Mon, Jul 16, 2018 at 09:29:02AM +0100, Patrick Bellasi wrote:
> The cgroup's CPU controller allows to assign a specified (maximum)
> bandwidth to the tasks of a group. However, this bandwidth is defined
> and enforced only on a temporal base, without considering the actual
> frequency a CPU is running on. Thus, the amount of computation
> completed by a task within an allocated bandwidth can be very
> different depending on the actual frequency the CPU is running that
> task at.
> The amount of computation can be affected also by the specific CPU a
> task is running on, especially when running on asymmetric capacity
> systems like Arm's big.LITTLE.

One basic problem I have with this patchset is that what's being
described is way more generic than what actually got implemented.
What's described is computation bandwidth control, but what's
implemented is just frequency clamping. So, there are fundamental
discrepancies between description+interface vs. what it actually does.
I really don't think that's something we can fix up later.

> These attributes:
>
> a) are available only for non-root nodes, both on default and legacy
>    hierarchies
> b) do not enforce any constraints and/or dependency between the parent
>    and its child nodes, thus relying on the delegation model and
>    permission settings defined by the system management software

cgroup does host attributes which only concern the cgroup itself and
thus don't need any hierarchical behaviors on their own, but what's
being implemented does control resource allocation, and what you're
describing inherently breaks the delegation model.

> c) allow to (eventually) further restrict task-specific clamps defined
>    via sched_setattr(2)

Thanks.

--
tejun
Re: [PATCH v2 08/12] sched/core: uclamp: extend cpu's cgroup controller
On 20-Jul 19:37, Suren Baghdasaryan wrote:
> On Mon, Jul 16, 2018 at 1:29 AM, Patrick Bellasi wrote:

[...]

> > +#ifdef CONFIG_UCLAMP_TASK_GROUP
> > +static int cpu_util_min_write_u64(struct cgroup_subsys_state *css,
> > +				  struct cftype *cftype, u64 min_value)
> > +{
> > +	struct task_group *tg;
> > +	int ret = -EINVAL;
> > +
> > +	if (min_value > SCHED_CAPACITY_SCALE)
> > +		return -ERANGE;
> > +
> > +	mutex_lock(&uclamp_mutex);
> > +	rcu_read_lock();
> > +
> > +	tg = css_tg(css);
> > +	if (tg->uclamp[UCLAMP_MIN].value == min_value) {
> > +		ret = 0;
> > +		goto out;
> > +	}
> > +	if (tg->uclamp[UCLAMP_MAX].value < min_value)
> > +		goto out;
> > +
> > +	tg->uclamp[UCLAMP_MIN].value = min_value;
> +	ret = 0;
>
> Are these assignments missing or am I missing something? Same for
> cpu_util_max_write_u64().

They are introduced in the following patch, to keep this one focused just
on the cgroups integration.

I'm also returning -EINVAL at this stage since, with just this patch in,
we are not really providing any good service to user-space, i.e. it's like
clamp groups not being available...

Maybe I can call this out better in the change log ;)

> > +out:
> > +	rcu_read_unlock();
> > +	mutex_unlock(&uclamp_mutex);
> > +
> > +	return ret;
> > +}

[...]

--
#include
Patrick Bellasi
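For reference, the validation logic being discussed can be modeled in a small user-space sketch. This is not the kernel code: `uclamp_mutex`, RCU, and `css_tg()` are omitted, the `task_group` clamp state is reduced to a two-field struct, and the assignments that the follow-up patch introduces are included so the intended end state is visible.

```c
#include <assert.h>
#include <errno.h>

#define SCHED_CAPACITY_SCALE 1024

/* Minimal stand-in for the task_group's uclamp state. */
struct tg_clamp {
	unsigned long min;
	unsigned long max;
};

/*
 * Mirrors the shape of cpu_util_min_write_u64(): reject values outside
 * the capacity scale, treat a no-op write as success, reject a min that
 * would exceed the current max, otherwise store the new value.
 */
static int util_min_write(struct tg_clamp *tg, unsigned long min_value)
{
	if (min_value > SCHED_CAPACITY_SCALE)
		return -ERANGE;
	if (tg->min == min_value)
		return 0;		/* nothing to update */
	if (tg->max < min_value)
		return -EINVAL;		/* would invert the clamp range */
	tg->min = min_value;		/* the assignment from the follow-up patch */
	return 0;
}
```

The `cpu_util_max_write_u64()` counterpart would be symmetric, rejecting a max below the current min.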
Re: [PATCH v2 08/12] sched/core: uclamp: extend cpu's cgroup controller
On Fri, Jul 20, 2018 at 7:37 PM, Suren Baghdasaryan wrote: > On Mon, Jul 16, 2018 at 1:29 AM, Patrick Bellasi > wrote: >> The cgroup's CPU controller allows to assign a specified (maximum) >> bandwidth to the tasks of a group. However this bandwidth is defined and >> enforced only on a temporal base, without considering the actual >> frequency a CPU is running on. Thus, the amount of computation completed >> by a task within an allocated bandwidth can be very different depending >> on the actual frequency the CPU is running that task. >> The amount of computation can be affected also by the specific CPU a >> task is running on, especially when running on asymmetric capacity >> systems like Arm's big.LITTLE. >> >> With the availability of schedutil, the scheduler is now able >> to drive frequency selections based on actual task utilization. >> Moreover, the utilization clamping support provides a mechanism to >> bias the frequency selection operated by schedutil depending on >> constraints assigned to the tasks currently RUNNABLE on a CPU. >> >> Give the above mechanisms, it is now possible to extend the cpu >> controller to specify what is the minimum (or maximum) utilization which >> a task is expected (or allowed) to generate. >> Constraints on minimum and maximum utilization allowed for tasks in a >> CPU cgroup can improve the control on the actual amount of CPU bandwidth >> consumed by tasks. >> >> Utilization clamping constraints are useful not only to bias frequency >> selection, when a task is running, but also to better support certain >> scheduler decisions regarding task placement. For example, on >> asymmetric capacity systems, a utilization clamp value can be >> conveniently used to enforce important interactive tasks on more capable >> CPUs or to run low priority and background tasks on more energy >> efficient CPUs. 
>> >> The ultimate goal of utilization clamping is thus to enable: >> >> - boosting: by selecting an higher capacity CPU and/or higher execution >> frequency for small tasks which are affecting the user >> interactive experience. >> >> - capping: by selecting more energy efficiency CPUs or lower execution >>frequency, for big tasks which are mainly related to >>background activities, and thus without a direct impact on >>the user experience. >> >> Thus, a proper extension of the cpu controller with utilization clamping >> support will make this controller even more suitable for integration >> with advanced system management software (e.g. Android). >> Indeed, an informed user-space can provide rich information hints to the >> scheduler regarding the tasks it's going to schedule. >> >> This patch extends the CPU controller by adding a couple of new >> attributes, util_min and util_max, which can be used to enforce task's >> utilization boosting and capping. Specifically: >> >> - util_min: defines the minimum utilization which should be considered, >> e.g. when schedutil selects the frequency for a CPU while a >> task in this group is RUNNABLE. >> i.e. the task will run at least at a minimum frequency which >> corresponds to the min_util utilization >> >> - util_max: defines the maximum utilization which should be considered, >> e.g. when schedutil selects the frequency for a CPU while a >> task in this group is RUNNABLE. >> i.e. 
the task will run up to a maximum frequency which >> corresponds to the max_util utilization >> >> These attributes: >> >> a) are available only for non-root nodes, both on default and legacy >>hierarchies >> b) do not enforce any constraints and/or dependency between the parent >>and its child nodes, thus relying on the delegation model and >>permission settings defined by the system management software >> c) allow to (eventually) further restrict task-specific clamps defined >>via sched_setattr(2) >> >> This patch provides the basic support to expose the two new attributes >> and to validate their run-time updates. >> >> Signed-off-by: Patrick Bellasi >> Cc: Ingo Molnar >> Cc: Peter Zijlstra >> Cc: Tejun Heo >> Cc: Rafael J. Wysocki >> Cc: Viresh Kumar >> Cc: Todd Kjos >> Cc: Joel Fernandes >> Cc: Juri Lelli >> Cc: linux-kernel@vger.kernel.org >> Cc: linux...@vger.kernel.org >> --- >> Documentation/admin-guide/cgroup-v2.rst | 25 >> init/Kconfig| 22 +++ >> kernel/sched/core.c | 186 >> kernel/sched/sched.h| 5 + >> 4 files changed, 238 insertions(+) >> >> diff --git a/Documentation/admin-guide/cgroup-v2.rst >> b/Documentation/admin-guide/cgroup-v2.rst >> index 8a2c52d5c53b..328c011cc105 100644 >> --- a/Documentation/admin-guide/cgroup-v2.rst >> +++ b/Documentation/admin-guide/cgroup-v2.rst >> @@ -904,6 +904,12 @@ controller implements weight and absolute bandwidth
Re: [PATCH v2 08/12] sched/core: uclamp: extend cpu's cgroup controller
On Mon, Jul 16, 2018 at 1:29 AM, Patrick Bellasi wrote: > The cgroup's CPU controller allows to assign a specified (maximum) > bandwidth to the tasks of a group. However this bandwidth is defined and > enforced only on a temporal base, without considering the actual > frequency a CPU is running on. Thus, the amount of computation completed > by a task within an allocated bandwidth can be very different depending > on the actual frequency the CPU is running that task. > The amount of computation can be affected also by the specific CPU a > task is running on, especially when running on asymmetric capacity > systems like Arm's big.LITTLE. > > With the availability of schedutil, the scheduler is now able > to drive frequency selections based on actual task utilization. > Moreover, the utilization clamping support provides a mechanism to > bias the frequency selection operated by schedutil depending on > constraints assigned to the tasks currently RUNNABLE on a CPU. > > Give the above mechanisms, it is now possible to extend the cpu > controller to specify what is the minimum (or maximum) utilization which > a task is expected (or allowed) to generate. > Constraints on minimum and maximum utilization allowed for tasks in a > CPU cgroup can improve the control on the actual amount of CPU bandwidth > consumed by tasks. > > Utilization clamping constraints are useful not only to bias frequency > selection, when a task is running, but also to better support certain > scheduler decisions regarding task placement. For example, on > asymmetric capacity systems, a utilization clamp value can be > conveniently used to enforce important interactive tasks on more capable > CPUs or to run low priority and background tasks on more energy > efficient CPUs. > > The ultimate goal of utilization clamping is thus to enable: > > - boosting: by selecting an higher capacity CPU and/or higher execution > frequency for small tasks which are affecting the user > interactive experience. 
> > - capping: by selecting more energy efficiency CPUs or lower execution >frequency, for big tasks which are mainly related to >background activities, and thus without a direct impact on >the user experience. > > Thus, a proper extension of the cpu controller with utilization clamping > support will make this controller even more suitable for integration > with advanced system management software (e.g. Android). > Indeed, an informed user-space can provide rich information hints to the > scheduler regarding the tasks it's going to schedule. > > This patch extends the CPU controller by adding a couple of new > attributes, util_min and util_max, which can be used to enforce task's > utilization boosting and capping. Specifically: > > - util_min: defines the minimum utilization which should be considered, > e.g. when schedutil selects the frequency for a CPU while a > task in this group is RUNNABLE. > i.e. the task will run at least at a minimum frequency which > corresponds to the min_util utilization > > - util_max: defines the maximum utilization which should be considered, > e.g. when schedutil selects the frequency for a CPU while a > task in this group is RUNNABLE. > i.e. the task will run up to a maximum frequency which > corresponds to the max_util utilization > > These attributes: > > a) are available only for non-root nodes, both on default and legacy >hierarchies > b) do not enforce any constraints and/or dependency between the parent >and its child nodes, thus relying on the delegation model and >permission settings defined by the system management software > c) allow to (eventually) further restrict task-specific clamps defined >via sched_setattr(2) > > This patch provides the basic support to expose the two new attributes > and to validate their run-time updates. > > Signed-off-by: Patrick Bellasi > Cc: Ingo Molnar > Cc: Peter Zijlstra > Cc: Tejun Heo > Cc: Rafael J. 
Wysocki > Cc: Viresh Kumar > Cc: Todd Kjos > Cc: Joel Fernandes > Cc: Juri Lelli > Cc: linux-kernel@vger.kernel.org > Cc: linux...@vger.kernel.org > --- > Documentation/admin-guide/cgroup-v2.rst | 25 > init/Kconfig| 22 +++ > kernel/sched/core.c | 186 > kernel/sched/sched.h| 5 + > 4 files changed, 238 insertions(+) > > diff --git a/Documentation/admin-guide/cgroup-v2.rst > b/Documentation/admin-guide/cgroup-v2.rst > index 8a2c52d5c53b..328c011cc105 100644 > --- a/Documentation/admin-guide/cgroup-v2.rst > +++ b/Documentation/admin-guide/cgroup-v2.rst > @@ -904,6 +904,12 @@ controller implements weight and absolute bandwidth > limit models for > normal scheduling policy and absolute bandwidth allocation model for > realtime scheduling policy. > > +Cycles distribution is based, by
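Point c) of the quoted changelog, group clamps that (eventually) further restrict task-specific clamps set via sched_setattr(2), implies a composition rule between the two. One plausible rule, shown here purely as an illustration (the actual composition is only defined in later patches of the series), is to let the group value act as an upper bound on whatever a task requests for itself:

```c
#include <assert.h>

/*
 * Hypothetical composition of a per-task clamp request with the
 * task_group's clamp: the group caps what a task may request, so the
 * effective value is the smaller of the two.
 */
static unsigned long effective_clamp(unsigned long task_value,
				     unsigned long tg_value)
{
	return task_value < tg_value ? task_value : tg_value;
}
```

Under this sketch a task asking for a boost beyond its group's allowance would be silently limited to the group value, while a more modest request would pass through unchanged.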
[PATCH v2 08/12] sched/core: uclamp: extend cpu's cgroup controller
The cgroup's CPU controller allows assigning a specified (maximum)
bandwidth to the tasks of a group. However, this bandwidth is defined and
enforced only on a temporal basis, without considering the actual
frequency a CPU is running at. Thus, the amount of computation completed
by a task within an allocated bandwidth can vary considerably depending
on the frequency the CPU is running that task at.
The amount of computation can also be affected by the specific CPU a
task is running on, especially on asymmetric capacity systems like
Arm's big.LITTLE.

With the availability of schedutil, the scheduler is now able to drive
frequency selection based on actual task utilization. Moreover, the
utilization clamping support provides a mechanism to bias the frequency
selection operated by schedutil depending on constraints assigned to the
tasks currently RUNNABLE on a CPU.

Given the above mechanisms, it is now possible to extend the cpu
controller to specify the minimum (or maximum) utilization which a task
is expected (or allowed) to generate.
Constraints on the minimum and maximum utilization allowed for tasks in
a CPU cgroup can improve control over the actual amount of CPU bandwidth
consumed by those tasks.

Utilization clamping constraints are useful not only to bias frequency
selection while a task is running, but also to better support certain
scheduler decisions regarding task placement. For example, on asymmetric
capacity systems, a utilization clamp value can conveniently be used to
keep important interactive tasks on more capable CPUs or to run low
priority and background tasks on more energy-efficient CPUs.

The ultimate goal of utilization clamping is thus to enable:

- boosting: by selecting a higher capacity CPU and/or a higher execution
            frequency for small tasks which affect the user's
            interactive experience.

- capping: by selecting more energy-efficient CPUs or a lower execution
           frequency, for big tasks which are mainly related to
           background activities, and thus have no direct impact on
           the user experience.

Thus, a proper extension of the cpu controller with utilization clamping
support will make this controller even more suitable for integration
with advanced system management software (e.g. Android). Indeed, an
informed user-space can provide rich hints to the scheduler regarding
the tasks it is going to schedule.

This patch extends the CPU controller by adding a couple of new
attributes, util_min and util_max, which can be used to enforce
utilization boosting and capping for a group's tasks. Specifically:

- util_min: defines the minimum utilization which should be considered,
            e.g. when schedutil selects the frequency for a CPU while a
            task in this group is RUNNABLE; i.e. the task will run at
            least at the minimum frequency which corresponds to the
            util_min utilization.

- util_max: defines the maximum utilization which should be considered,
            e.g. when schedutil selects the frequency for a CPU while a
            task in this group is RUNNABLE; i.e. the task will run up to
            the maximum frequency which corresponds to the util_max
            utilization.

These attributes:

a) are available only for non-root nodes, both on default and legacy
   hierarchies
b) do not enforce any constraint and/or dependency between the parent
   and its child nodes, thus relying on the delegation model and
   permission settings defined by the system management software
c) allow (eventually) further restricting task-specific clamps defined
   via sched_setattr(2)

This patch provides the basic support to expose the two new attributes
and to validate their run-time updates.

Signed-off-by: Patrick Bellasi
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Tejun Heo
Cc: Rafael J. Wysocki
Cc: Viresh Kumar
Cc: Todd Kjos
Cc: Joel Fernandes
Cc: Juri Lelli
Cc: linux-kernel@vger.kernel.org
Cc: linux...@vger.kernel.org
---
 Documentation/admin-guide/cgroup-v2.rst |  25
 init/Kconfig                            |  22
 kernel/sched/core.c                     | 186
 kernel/sched/sched.h                    |   5
 4 files changed, 238 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 8a2c52d5c53b..328c011cc105 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -904,6 +904,12 @@ controller implements weight and absolute bandwidth limit models for
 normal scheduling policy and absolute bandwidth allocation model for
 realtime scheduling policy.

+Cycles distribution is based, by default, on a temporal basis and it
+does not account for the frequency at which tasks are executed.
+The (optional) utilization clamping support allows enforcing a minimum
+bandwidth, which should always be provided by a CPU, and a maximum bandwidth,
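The util_min/util_max semantics described in the changelog amount to clamping the utilization signal before it feeds into frequency selection. The sketch below illustrates only that clamping step plus a simplified proportional frequency pick; the real schedutil governor's get_next_freq() additionally applies a headroom margin, which is omitted here.

```c
#include <assert.h>

#define SCHED_CAPACITY_SCALE 1024

/* Clamp an aggregated utilization into the [util_min, util_max] range:
 * small tasks get boosted up to util_min, big tasks capped at util_max. */
static unsigned long uclamp_apply(unsigned long util,
				  unsigned long util_min,
				  unsigned long util_max)
{
	if (util < util_min)
		return util_min;	/* boosting */
	if (util > util_max)
		return util_max;	/* capping */
	return util;
}

/* Simplified, illustrative frequency selection: scale the maximum
 * frequency by the clamped utilization over the capacity scale. */
static unsigned long pick_freq(unsigned long max_freq, unsigned long util,
			       unsigned long util_min, unsigned long util_max)
{
	return max_freq * uclamp_apply(util, util_min, util_max)
		/ SCHED_CAPACITY_SCALE;
}
```

With a util_min of 256 (a quarter of the capacity scale), even a nearly idle task in the group keeps the CPU at a quarter of its maximum frequency while the task is RUNNABLE, which is exactly the boosting behavior the changelog describes.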