Re: [PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping

2018-08-17 Thread Quentin Perret
On Friday 17 Aug 2018 at 11:57:31 (+0100), Patrick Bellasi wrote:
> On 17-Aug 11:34, Quentin Perret wrote:
> > Hi Patrick,
> > 
> > On Thursday 09 Aug 2018 at 16:23:13 (+0100), Patrick Bellasi wrote:
> > > On 09-Aug 11:50, Juri Lelli wrote:
> > > > On 09/08/18 10:14, Patrick Bellasi wrote:
> > > > > On 07-Aug 14:35, Juri Lelli wrote:
> > > > > > On 06/08/18 17:39, Patrick Bellasi wrote:
> > > 
> > > [...]
> > > 
> > > > > 1) make CAP_SYS_NICE protected the clamp groups, with an optional boot
> > > > >time parameter to relax this check
> > > > 
> > > > It seems to me that this might work well with that the intended usage of
> > > > the interface that you depict above. SMS only (or any privileged user)
> > > > will be in control of how groups are configured, so no problem for
> > > > normal users.
> > > 
> > > Yes, well... apart normal users still getting a -ENOSPC is they are
> > > requesting one of the not pre-configured clamp values. Which is why
> > > the following bits can be helpful.
> > 
> > So IIUC, normal users would still be free of choosing their clamp values
> > as long as they choose one in the list of pre-allocated ones ? Is that
> > correct ?
> 
> No, with the CAP_SYS_NICE/ADMIN guard in place, as discussed above in
> point 1, the syscall will just fail for normal users.

Right, I just misunderstood then :-)
Sorry for the noise ...

Thanks,
Quentin


Re: [PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping

2018-08-17 Thread Quentin Perret
On Friday 17 Aug 2018 at 11:57:31 (+0100), Patrick Bellasi wrote:
> On 17-Aug 11:34, Quentin Perret wrote:
> > Hi Patrick,
> > 
> > On Thursday 09 Aug 2018 at 16:23:13 (+0100), Patrick Bellasi wrote:
> > > On 09-Aug 11:50, Juri Lelli wrote:
> > > > On 09/08/18 10:14, Patrick Bellasi wrote:
> > > > > On 07-Aug 14:35, Juri Lelli wrote:
> > > > > > On 06/08/18 17:39, Patrick Bellasi wrote:
> > > 
> > > [...]
> > > 
> > > > > 1) make CAP_SYS_NICE protected the clamp groups, with an optional boot
> > > > >time parameter to relax this check
> > > > 
> > > > It seems to me that this might work well with that the intended usage of
> > > > the interface that you depict above. SMS only (or any privileged user)
> > > > will be in control of how groups are configured, so no problem for
> > > > normal users.
> > > 
> > > Yes, well... apart normal users still getting a -ENOSPC is they are
> > > requesting one of the not pre-configured clamp values. Which is why
> > > the following bits can be helpful.
> > 
> > So IIUC, normal users would still be free of choosing their clamp values
> > as long as they choose one in the list of pre-allocated ones ? Is that
> > correct ?
> 
> No, with the CAP_SYS_NICE/ADMIN guard in place, as discussed above in
> point 1, the syscall will just fail for normal users.

Right, I just misunderstood then :-)
Sorry for the noise ...

Thanks,
Quentin


Re: [PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping

2018-08-17 Thread Patrick Bellasi
On 17-Aug 11:34, Quentin Perret wrote:
> Hi Patrick,
> 
> On Thursday 09 Aug 2018 at 16:23:13 (+0100), Patrick Bellasi wrote:
> > On 09-Aug 11:50, Juri Lelli wrote:
> > > On 09/08/18 10:14, Patrick Bellasi wrote:
> > > > On 07-Aug 14:35, Juri Lelli wrote:
> > > > > On 06/08/18 17:39, Patrick Bellasi wrote:
> > 
> > [...]
> > 
> > > > 1) make CAP_SYS_NICE protected the clamp groups, with an optional boot
> > > >time parameter to relax this check
> > > 
> > > It seems to me that this might work well with that the intended usage of
> > > the interface that you depict above. SMS only (or any privileged user)
> > > will be in control of how groups are configured, so no problem for
> > > normal users.
> > 
> > Yes, well... apart normal users still getting a -ENOSPC is they are
> > requesting one of the not pre-configured clamp values. Which is why
> > the following bits can be helpful.
> 
> So IIUC, normal users would still be free of choosing their clamp values
> as long as they choose one in the list of pre-allocated ones ? Is that
> correct ?

No, with the CAP_SYS_NICE/ADMIN guard in place, as discussed above in
point 1, the syscall will just fail for normal users.

Only privileged tasks (i.e. SMS control threads) can change clamp values.

> If yes, that would still let normal users make they tasks look bigger no ?
> They could just choose the clamp group with the highest min_clamp or
> something. Isn't this a problem too ? I mean, if that can be abused easily,
> I'm pretty sure people _will_ abuse it ...

It should not be possible with 1) in place.

However, if the system is booted with that check disabled (e.g. via
kernel boot parameter) that probably means you trust/control your
userspace and don't want to impose restrictions on non privileged
tasks. In this case "abuses" are just "acceptable usages"...

-- 
#include 

Patrick Bellasi


Re: [PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping

2018-08-17 Thread Patrick Bellasi
On 17-Aug 11:34, Quentin Perret wrote:
> Hi Patrick,
> 
> On Thursday 09 Aug 2018 at 16:23:13 (+0100), Patrick Bellasi wrote:
> > On 09-Aug 11:50, Juri Lelli wrote:
> > > On 09/08/18 10:14, Patrick Bellasi wrote:
> > > > On 07-Aug 14:35, Juri Lelli wrote:
> > > > > On 06/08/18 17:39, Patrick Bellasi wrote:
> > 
> > [...]
> > 
> > > > 1) make CAP_SYS_NICE protected the clamp groups, with an optional boot
> > > >time parameter to relax this check
> > > 
> > > It seems to me that this might work well with that the intended usage of
> > > the interface that you depict above. SMS only (or any privileged user)
> > > will be in control of how groups are configured, so no problem for
> > > normal users.
> > 
> > Yes, well... apart normal users still getting a -ENOSPC is they are
> > requesting one of the not pre-configured clamp values. Which is why
> > the following bits can be helpful.
> 
> So IIUC, normal users would still be free of choosing their clamp values
> as long as they choose one in the list of pre-allocated ones ? Is that
> correct ?

No, with the CAP_SYS_NICE/ADMIN guard in place, as discussed above in
point 1, the syscall will just fail for normal users.

Only privileged tasks (i.e. SMS control threads) can change clamp values.

> If yes, that would still let normal users make they tasks look bigger no ?
> They could just choose the clamp group with the highest min_clamp or
> something. Isn't this a problem too ? I mean, if that can be abused easily,
> I'm pretty sure people _will_ abuse it ...

It should not be possible with 1) in place.

However, if the system is booted with that check disabled (e.g. via
kernel boot parameter) that probably means you trust/control your
userspace and don't want to impose restrictions on non privileged
tasks. In this case "abuses" are just "acceptable usages"...

-- 
#include 

Patrick Bellasi


Re: [PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping

2018-08-17 Thread Quentin Perret
Hi Patrick,

On Thursday 09 Aug 2018 at 16:23:13 (+0100), Patrick Bellasi wrote:
> On 09-Aug 11:50, Juri Lelli wrote:
> > On 09/08/18 10:14, Patrick Bellasi wrote:
> > > On 07-Aug 14:35, Juri Lelli wrote:
> > > > On 06/08/18 17:39, Patrick Bellasi wrote:
> 
> [...]
> 
> > > 1) make CAP_SYS_NICE protected the clamp groups, with an optional boot
> > >time parameter to relax this check
> > 
> > It seems to me that this might work well with that the intended usage of
> > the interface that you depict above. SMS only (or any privileged user)
> > will be in control of how groups are configured, so no problem for
> > normal users.
> 
> Yes, well... apart normal users still getting a -ENOSPC is they are
> requesting one of the not pre-configured clamp values. Which is why
> the following bits can be helpful.

So IIUC, normal users would still be free of choosing their clamp values
as long as they choose one in the list of pre-allocated ones ? Is that
correct ?

If yes, that would still let normal users make they tasks look bigger no ?
They could just choose the clamp group with the highest min_clamp or
something. Isn't this a problem too ? I mean, if that can be abused easily,
I'm pretty sure people _will_ abuse it ...

Or maybe I misunderstood something ?

Thanks,
Quentin


Re: [PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping

2018-08-17 Thread Quentin Perret
Hi Patrick,

On Thursday 09 Aug 2018 at 16:23:13 (+0100), Patrick Bellasi wrote:
> On 09-Aug 11:50, Juri Lelli wrote:
> > On 09/08/18 10:14, Patrick Bellasi wrote:
> > > On 07-Aug 14:35, Juri Lelli wrote:
> > > > On 06/08/18 17:39, Patrick Bellasi wrote:
> 
> [...]
> 
> > > 1) make CAP_SYS_NICE protected the clamp groups, with an optional boot
> > >time parameter to relax this check
> > 
> > It seems to me that this might work well with that the intended usage of
> > the interface that you depict above. SMS only (or any privileged user)
> > will be in control of how groups are configured, so no problem for
> > normal users.
> 
> Yes, well... apart normal users still getting a -ENOSPC is they are
> requesting one of the not pre-configured clamp values. Which is why
> the following bits can be helpful.

So IIUC, normal users would still be free of choosing their clamp values
as long as they choose one in the list of pre-allocated ones ? Is that
correct ?

If yes, that would still let normal users make they tasks look bigger no ?
They could just choose the clamp group with the highest min_clamp or
something. Isn't this a problem too ? I mean, if that can be abused easily,
I'm pretty sure people _will_ abuse it ...

Or maybe I misunderstood something ?

Thanks,
Quentin


Re: [PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping

2018-08-13 Thread Juri Lelli
On 13/08/18 13:14, Patrick Bellasi wrote:
> On 07-Aug 11:59, Juri Lelli wrote:
> > Hi,
> > 
> > Minor comments below.
> > 
> > On 06/08/18 17:39, Patrick Bellasi wrote:
> > 
> > [...]
> > 
> > > + *
> > > + * Task Utilization Attributes
> > > + * ===
> > > + *
> > > + * A subset of sched_attr attributes allows to specify the utilization 
> > > which
> > > + * should be expected by a task. These attributes allows to inform the
> >^
> >  allow
> > 
> > > + * scheduler about the utilization boundaries within which is safe to 
> > > schedule
> > 
> > Isn't all this more about providing hints than safety?
> 
> Yes, it's "just" hints... will rephrase to make it more clear.
> 
> > > + * the task. These utilization boundaries are valuable information to 
> > > support
> > > + * scheduler decisions on both task placement and frequencies selection.
> > > + *
> > > + *  @sched_util_min  represents the minimum utilization
> > > + *  @sched_util_max  represents the maximum utilization
> > > + *
> > > + * Utilization is a value in the range [0..SCHED_CAPACITY_SCALE] which
> > > + * represents the percentage of CPU time used by a task when running at 
> > > the
> > > + * maximum frequency on the highest capacity CPU of the system. Thus, for
> > > + * example, a 20% utilization task is a task running for 2ms every 10ms.
> > > + *
> > > + * A task with a min utilization value bigger then 0 is more likely to be
> > > + * scheduled on a CPU which can provide that bandwidth.
> > > + * A task with a max utilization value smaller then 1024 is more likely 
> > > to be
> > > + * scheduled on a CPU which do not provide more then the required 
> > > bandwidth.
> > 
> > Isn't s/bandwidth/capacity/ here, above, and in general where you use
> > the term "bandwidth" more appropriate? I wonder if overloading this term
> > (w.r.t. how is used with DEADLINE) might create confusion. In this case
> > we are not providing any sort of guarantees, it's a hint.
> 
> Yes, you right... here we are not really granting any bandwidth but
> just "improving" the bandwidth provisioning by hinting the scheduler
> about a certain min/max capacity required.
> 
> The problem related to using capacity is that, from kernel space,
> capacity is defined as a static quantity/property of CPUs. Still, I

Looks like it's also more inline with EAS terminology (i.e., capacity
states).


Re: [PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping

2018-08-13 Thread Juri Lelli
On 13/08/18 13:14, Patrick Bellasi wrote:
> On 07-Aug 11:59, Juri Lelli wrote:
> > Hi,
> > 
> > Minor comments below.
> > 
> > On 06/08/18 17:39, Patrick Bellasi wrote:
> > 
> > [...]
> > 
> > > + *
> > > + * Task Utilization Attributes
> > > + * ===
> > > + *
> > > + * A subset of sched_attr attributes allows to specify the utilization 
> > > which
> > > + * should be expected by a task. These attributes allows to inform the
> >^
> >  allow
> > 
> > > + * scheduler about the utilization boundaries within which is safe to 
> > > schedule
> > 
> > Isn't all this more about providing hints than safety?
> 
> Yes, it's "just" hints... will rephrase to make it more clear.
> 
> > > + * the task. These utilization boundaries are valuable information to 
> > > support
> > > + * scheduler decisions on both task placement and frequencies selection.
> > > + *
> > > + *  @sched_util_min  represents the minimum utilization
> > > + *  @sched_util_max  represents the maximum utilization
> > > + *
> > > + * Utilization is a value in the range [0..SCHED_CAPACITY_SCALE] which
> > > + * represents the percentage of CPU time used by a task when running at 
> > > the
> > > + * maximum frequency on the highest capacity CPU of the system. Thus, for
> > > + * example, a 20% utilization task is a task running for 2ms every 10ms.
> > > + *
> > > + * A task with a min utilization value bigger then 0 is more likely to be
> > > + * scheduled on a CPU which can provide that bandwidth.
> > > + * A task with a max utilization value smaller then 1024 is more likely 
> > > to be
> > > + * scheduled on a CPU which do not provide more then the required 
> > > bandwidth.
> > 
> > Isn't s/bandwidth/capacity/ here, above, and in general where you use
> > the term "bandwidth" more appropriate? I wonder if overloading this term
> > (w.r.t. how is used with DEADLINE) might create confusion. In this case
> > we are not providing any sort of guarantees, it's a hint.
> 
> Yes, you right... here we are not really granting any bandwidth but
> just "improving" the bandwidth provisioning by hinting the scheduler
> about a certain min/max capacity required.
> 
> The problem related to using capacity is that, from kernel space,
> capacity is defined as a static quantity/property of CPUs. Still, I

Looks like it's also more inline with EAS terminology (i.e., capacity
states).


Re: [PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping

2018-08-13 Thread Patrick Bellasi
On 07-Aug 11:59, Juri Lelli wrote:
> Hi,
> 
> Minor comments below.
> 
> On 06/08/18 17:39, Patrick Bellasi wrote:
> 
> [...]
> 
> > + *
> > + * Task Utilization Attributes
> > + * ===
> > + *
> > + * A subset of sched_attr attributes allows to specify the utilization 
> > which
> > + * should be expected by a task. These attributes allows to inform the
>^
>allow
> 
> > + * scheduler about the utilization boundaries within which is safe to 
> > schedule
> 
> Isn't all this more about providing hints than safety?

Yes, it's "just" hints... will rephrase to make it more clear.

> > + * the task. These utilization boundaries are valuable information to 
> > support
> > + * scheduler decisions on both task placement and frequencies selection.
> > + *
> > + *  @sched_util_minrepresents the minimum utilization
> > + *  @sched_util_maxrepresents the maximum utilization
> > + *
> > + * Utilization is a value in the range [0..SCHED_CAPACITY_SCALE] which
> > + * represents the percentage of CPU time used by a task when running at the
> > + * maximum frequency on the highest capacity CPU of the system. Thus, for
> > + * example, a 20% utilization task is a task running for 2ms every 10ms.
> > + *
> > + * A task with a min utilization value bigger then 0 is more likely to be
> > + * scheduled on a CPU which can provide that bandwidth.
> > + * A task with a max utilization value smaller then 1024 is more likely to 
> > be
> > + * scheduled on a CPU which do not provide more then the required 
> > bandwidth.
> 
> Isn't s/bandwidth/capacity/ here, above, and in general where you use
> the term "bandwidth" more appropriate? I wonder if overloading this term
> (w.r.t. how is used with DEADLINE) might create confusion. In this case
> we are not providing any sort of guarantees, it's a hint.

Yes, you right... here we are not really granting any bandwidth but
just "improving" the bandwidth provisioning by hinting the scheduler
about a certain min/max capacity required.

The problem related to using capacity is that, from kernel space,
capacity is defined as a static quantity/property of CPUs. Still, I
think it makes sense to argue that util_{min,max} are hints on the
min/max capacity required for a task.

I'll update comments and text to avoid using bandwidth in favour of
capacity.

Cheers Patrick

-- 
#include 

Patrick Bellasi


Re: [PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping

2018-08-13 Thread Patrick Bellasi
On 07-Aug 11:59, Juri Lelli wrote:
> Hi,
> 
> Minor comments below.
> 
> On 06/08/18 17:39, Patrick Bellasi wrote:
> 
> [...]
> 
> > + *
> > + * Task Utilization Attributes
> > + * ===
> > + *
> > + * A subset of sched_attr attributes allows to specify the utilization 
> > which
> > + * should be expected by a task. These attributes allows to inform the
>^
>allow
> 
> > + * scheduler about the utilization boundaries within which is safe to 
> > schedule
> 
> Isn't all this more about providing hints than safety?

Yes, it's "just" hints... will rephrase to make it more clear.

> > + * the task. These utilization boundaries are valuable information to 
> > support
> > + * scheduler decisions on both task placement and frequencies selection.
> > + *
> > + *  @sched_util_minrepresents the minimum utilization
> > + *  @sched_util_maxrepresents the maximum utilization
> > + *
> > + * Utilization is a value in the range [0..SCHED_CAPACITY_SCALE] which
> > + * represents the percentage of CPU time used by a task when running at the
> > + * maximum frequency on the highest capacity CPU of the system. Thus, for
> > + * example, a 20% utilization task is a task running for 2ms every 10ms.
> > + *
> > + * A task with a min utilization value bigger then 0 is more likely to be
> > + * scheduled on a CPU which can provide that bandwidth.
> > + * A task with a max utilization value smaller then 1024 is more likely to 
> > be
> > + * scheduled on a CPU which do not provide more then the required 
> > bandwidth.
> 
> Isn't s/bandwidth/capacity/ here, above, and in general where you use
> the term "bandwidth" more appropriate? I wonder if overloading this term
> (w.r.t. how is used with DEADLINE) might create confusion. In this case
> we are not providing any sort of guarantees, it's a hint.

Yes, you right... here we are not really granting any bandwidth but
just "improving" the bandwidth provisioning by hinting the scheduler
about a certain min/max capacity required.

The problem related to using capacity is that, from kernel space,
capacity is defined as a static quantity/property of CPUs. Still, I
think it makes sense to argue that util_{min,max} are hints on the
min/max capacity required for a task.

I'll update comments and text to avoid using bandwidth in favour of
capacity.

Cheers Patrick

-- 
#include 

Patrick Bellasi


Re: [PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping

2018-08-10 Thread Juri Lelli
On 09/08/18 16:23, Patrick Bellasi wrote:
> On 09-Aug 11:50, Juri Lelli wrote:
> > On 09/08/18 10:14, Patrick Bellasi wrote:
> > > On 07-Aug 14:35, Juri Lelli wrote:
> > > > On 06/08/18 17:39, Patrick Bellasi wrote:
> 
> [...]
> 
> > > 1) make CAP_SYS_NICE protected the clamp groups, with an optional boot
> > >time parameter to relax this check
> > 
> > It seems to me that this might work well with that the intended usage of
> > the interface that you depict above. SMS only (or any privileged user)
> > will be in control of how groups are configured, so no problem for
> > normal users.
> 
> Yes, well... apart normal users still getting a -ENOSPC is they are
> requesting one of the not pre-configured clamp values. Which is why
> the following bits can be helpful.
> 
> > > 2) add discretization support to clamp groups allocation
> > 
> > And this might also work well if we feel that we don't want to restrict
> > usage of the interface to admin only, however...
> > 
> > > This second feature specifically, will ensure that clamp values are
> > > always mapped into one of the available clamp groups. While the exact
> > > clamp value can always be used for tasks placement biasing, when it
> > > comes to frequency selection biasing, depending on concurrently
> > > running tasks, you can end up with an effective clamp value which is a
> > > rounded up.
> > 
> > what I'm not so sure about is that we might lose in flexibility if the
> > number of available discrete clamp groups is too small compared to the
> > number of available OPP on the platform.
> 
> Regarding this concern, I would say that we should consider that, for
> frequency biasing, we are in general not interested in nailing down
> the single 1% difference and/or exact OPP capacities

True.

> A certain coarse grained resolution is usually acceptable for many
> different reasons:
> a) schedutil already uses a 20% margin which can potentially eclipse
>few OPP when we scale up/down
> b) tasks/CPUs utilization are good enough but never exact and precise
>values
> c) reducing the number of OPP switches could have some benefits on
>stability/latencies
> d) clamping is actually defining minimum/maximum preferred values, is
>not to be considered a tool for "precise control"
> 
> All that considered, I would say that maybe a 5% resolution could
> still be considered an acceptable _worst case_ rounding since we don't
> have always to round up to the next 5%.
> 
> For example, if we have:
>  - TaskA: util_min=41%
>  - TaskB: util_nin=44%
> they will be both accounted in the 40-45% clamp group but the clamp
> group value can be modulated at run-time depending on RUNNABLE
> tasks. When TaskA is running alone, we can still set util_min to
> 41%, while we will use 44% (not 45%) when TaskB is (also) running.
> 
> It's worth to notice that we pre-allocated at compile time 20 clamp
> groups, but not necessarily all of them will be used at run-time.
> Indeed, we will still use a policy where only the actual required
> values are allocated at the beginning of the clamps map, thus
> optimizing max updates.

OK, so you'll only still iterate over the groups that are actually in
use, which is hopefully less than 20 and should keep overhead low. Makes
sense to me.


Re: [PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping

2018-08-10 Thread Juri Lelli
On 09/08/18 16:23, Patrick Bellasi wrote:
> On 09-Aug 11:50, Juri Lelli wrote:
> > On 09/08/18 10:14, Patrick Bellasi wrote:
> > > On 07-Aug 14:35, Juri Lelli wrote:
> > > > On 06/08/18 17:39, Patrick Bellasi wrote:
> 
> [...]
> 
> > > 1) make CAP_SYS_NICE protected the clamp groups, with an optional boot
> > >time parameter to relax this check
> > 
> > It seems to me that this might work well with that the intended usage of
> > the interface that you depict above. SMS only (or any privileged user)
> > will be in control of how groups are configured, so no problem for
> > normal users.
> 
> Yes, well... apart normal users still getting a -ENOSPC is they are
> requesting one of the not pre-configured clamp values. Which is why
> the following bits can be helpful.
> 
> > > 2) add discretization support to clamp groups allocation
> > 
> > And this might also work well if we feel that we don't want to restrict
> > usage of the interface to admin only, however...
> > 
> > > This second feature specifically, will ensure that clamp values are
> > > always mapped into one of the available clamp groups. While the exact
> > > clamp value can always be used for tasks placement biasing, when it
> > > comes to frequency selection biasing, depending on concurrently
> > > running tasks, you can end up with an effective clamp value which is a
> > > rounded up.
> > 
> > what I'm not so sure about is that we might lose in flexibility if the
> > number of available discrete clamp groups is too small compared to the
> > number of available OPP on the platform.
> 
> Regarding this concern, I would say that we should consider that, for
> frequency biasing, we are in general not interested in nailing down
> the single 1% difference and/or exact OPP capacities

True.

> A certain coarse grained resolution is usually acceptable for many
> different reasons:
> a) schedutil already uses a 20% margin which can potentially eclipse
>few OPP when we scale up/down
> b) tasks/CPUs utilization are good enough but never exact and precise
>values
> c) reducing the number of OPP switches could have some benefits on
>stability/latencies
> d) clamping is actually defining minimum/maximum preferred values, is
>not to be considered a tool for "precise control"
> 
> All that considered, I would say that maybe a 5% resolution could
> still be considered an acceptable _worst case_ rounding since we don't
> have always to round up to the next 5%.
> 
> For example, if we have:
>  - TaskA: util_min=41%
>  - TaskB: util_nin=44%
> they will be both accounted in the 40-45% clamp group but the clamp
> group value can be modulated at run-time depending on RUNNABLE
> tasks. When TaskA is running alone, we can still set util_min to
> 41%, while we will use 44% (not 45%) when TaskB is (also) running.
> 
> It's worth to notice that we pre-allocated at compile time 20 clamp
> groups, but not necessarily all of them will be used at run-time.
> Indeed, we will still use a policy where only the actual required
> values are allocated at the beginning of the clamps map, thus
> optimizing max updates.

OK, so you'll only still iterate over the groups that are actually in
use, which is hopefully less than 20 and should keep overhead low. Makes
sense to me.


Re: [PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping

2018-08-09 Thread Patrick Bellasi
On 09-Aug 11:50, Juri Lelli wrote:
> On 09/08/18 10:14, Patrick Bellasi wrote:
> > On 07-Aug 14:35, Juri Lelli wrote:
> > > On 06/08/18 17:39, Patrick Bellasi wrote:

[...]

> > 1) make CAP_SYS_NICE protected the clamp groups, with an optional boot
> >time parameter to relax this check
> 
> It seems to me that this might work well with that the intended usage of
> the interface that you depict above. SMS only (or any privileged user)
> will be in control of how groups are configured, so no problem for
> normal users.

Yes, well... apart normal users still getting a -ENOSPC is they are
requesting one of the not pre-configured clamp values. Which is why
the following bits can be helpful.

> > 2) add discretization support to clamp groups allocation
> 
> And this might also work well if we feel that we don't want to restrict
> usage of the interface to admin only, however...
> 
> > This second feature specifically, will ensure that clamp values are
> > always mapped into one of the available clamp groups. While the exact
> > clamp value can always be used for tasks placement biasing, when it
> > comes to frequency selection biasing, depending on concurrently
> > running tasks, you can end up with an effective clamp value which is a
> > rounded up.
> 
> what I'm not so sure about is that we might lose in flexibility if the
> number of available discrete clamp groups is too small compared to the
> number of available OPP on the platform.

Regarding this concern, I would say that we should consider that, for
frequency biasing, we are in general not interested in nailing down
the single 1% difference and/or exact OPP capacities

A certain coarse grained resolution is usually acceptable for many
different reasons:
a) schedutil already uses a 20% margin which can potentially eclipse
   few OPP when we scale up/down
b) tasks/CPUs utilization are good enough but never exact and precise
   values
c) reducing the number of OPP switches could have some benefits on
   stability/latencies
d) clamping is actually defining minimum/maximum preferred values, is
   not to be considered a tool for "precise control"

All that considered, I would say that maybe a 5% resolution could
still be considered an acceptable _worst case_ rounding since we don't
have always to round up to the next 5%.

For example, if we have:
 - TaskA: util_min=41%
 - TaskB: util_nin=44%
they will be both accounted in the 40-45% clamp group but the clamp
group value can be modulated at run-time depending on RUNNABLE
tasks. When TaskA is running alone, we can still set util_min to
41%, while we will use 44% (not 45%) when TaskB is (also) running.

It's worth to notice that we pre-allocated at compile time 20 clamp
groups, but not necessarily all of them will be used at run-time.
Indeed, we will still use a policy where only the actual required
values are allocated at the beginning of the clamps map, thus
optimizing max updates.

-- 
#include 

Patrick Bellasi


Re: [PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping

2018-08-09 Thread Patrick Bellasi
On 09-Aug 11:50, Juri Lelli wrote:
> On 09/08/18 10:14, Patrick Bellasi wrote:
> > On 07-Aug 14:35, Juri Lelli wrote:
> > > On 06/08/18 17:39, Patrick Bellasi wrote:

[...]

> > 1) make CAP_SYS_NICE protected the clamp groups, with an optional boot
> >time parameter to relax this check
> 
> It seems to me that this might work well with that the intended usage of
> the interface that you depict above. SMS only (or any privileged user)
> will be in control of how groups are configured, so no problem for
> normal users.

Yes, well... apart normal users still getting a -ENOSPC is they are
requesting one of the not pre-configured clamp values. Which is why
the following bits can be helpful.

> > 2) add discretization support to clamp groups allocation
> 
> And this might also work well if we feel that we don't want to restrict
> usage of the interface to admin only, however...
> 
> > This second feature specifically, will ensure that clamp values are
> > always mapped into one of the available clamp groups. While the exact
> > clamp value can always be used for tasks placement biasing, when it
> > comes to frequency selection biasing, depending on concurrently
> > running tasks, you can end up with an effective clamp value which is a
> > rounded up.
> 
> what I'm not so sure about is that we might lose in flexibility if the
> number of available discrete clamp groups is too small compared to the
> number of available OPP on the platform.

Regarding this concern, I would say that we should consider that, for
frequency biasing, we are in general not interested in nailing down
the single 1% difference and/or exact OPP capacities

A certain coarse grained resolution is usually acceptable for many
different reasons:
a) schedutil already uses a 20% margin which can potentially eclipse
   few OPP when we scale up/down
b) tasks/CPUs utilization are good enough but never exact and precise
   values
c) reducing the number of OPP switches could have some benefits on
   stability/latencies
d) clamping is actually defining minimum/maximum preferred values, is
   not to be considered a tool for "precise control"

All that considered, I would say that maybe a 5% resolution could
still be considered an acceptable _worst case_ rounding since we don't
have always to round up to the next 5%.

For example, if we have:
 - TaskA: util_min=41%
 - TaskB: util_nin=44%
they will be both accounted in the 40-45% clamp group but the clamp
group value can be modulated at run-time depending on RUNNABLE
tasks. When TaskA is running alone, we can still set util_min to
41%, while we will use 44% (not 45%) when TaskB is (also) running.

It's worth to notice that we pre-allocated at compile time 20 clamp
groups, but not necessarily all of them will be used at run-time.
Indeed, we will still use a policy where only the actual required
values are allocated at the beginning of the clamps map, thus
optimizing max updates.

-- 
#include 

Patrick Bellasi


Re: [PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping

2018-08-09 Thread Randy Dunlap
On 08/09/2018 01:39 AM, Patrick Bellasi wrote:
> On 06-Aug 09:50, Randy Dunlap wrote:
>> Hi,
> 
> Hi Randy,
> 
>> On 08/06/2018 09:39 AM, Patrick Bellasi wrote:
>>> diff --git a/init/Kconfig b/init/Kconfig
>>> index 041f3a022122..1d45a6877d6f 100644
>>> --- a/init/Kconfig
>>> +++ b/init/Kconfig
>>> @@ -583,6 +583,25 @@ config HAVE_UNSTABLE_SCHED_CLOCK
>>>  config GENERIC_SCHED_CLOCK
>>> bool
>>>  
>>> +menu "Scheduler features"
>>> +
>>> +config UCLAMP_TASK
>>> +   bool "Enable utilization clamping for RT/FAIR tasks"
>>> +   depends on CPU_FREQ_GOV_SCHEDUTIL
>>> +   default false
>>
>>  default n
>> but just omit the line completely since "n" is already the default.
> 
> 
> Right, will update for next posting!
> Is there a strict rule to omit this line when it's already the
> default?

It's not documented AFAIK, but it's commonly repeated on LKML.

>>> +   help
>>> + This feature enables the scheduler to track the clamped utilization
>>> + of each CPU based on RUNNABLE tasks currently scheduled on that CPU.
>>> +
>>> + When this option is enabled, the user can specify a min and max CPU
>>> +  bandwidth which is allowed for a task.
>>> + The max bandwidth allows to clamp the maximum frequency a task can
>>> + use, while the min bandwidth allows to define a minimum frequency a
>>> +  task will always use.
>>
>> Please clean up the indentation above to use one tab + 2 spaces on all lines.
> 
> Sure, my bad I did not notice it... although I'm quite sure the patch
> passed a checkpatch... will check better next time.

Thanks.

>>> +
>>> + If in doubt, say N.
>>> +
>>> +endmenu


-- 
~Randy


Re: [PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping

2018-08-09 Thread Randy Dunlap
On 08/09/2018 01:39 AM, Patrick Bellasi wrote:
> On 06-Aug 09:50, Randy Dunlap wrote:
>> Hi,
> 
> Hi Randy,
> 
>> On 08/06/2018 09:39 AM, Patrick Bellasi wrote:
>>> diff --git a/init/Kconfig b/init/Kconfig
>>> index 041f3a022122..1d45a6877d6f 100644
>>> --- a/init/Kconfig
>>> +++ b/init/Kconfig
>>> @@ -583,6 +583,25 @@ config HAVE_UNSTABLE_SCHED_CLOCK
>>>  config GENERIC_SCHED_CLOCK
>>> bool
>>>  
>>> +menu "Scheduler features"
>>> +
>>> +config UCLAMP_TASK
>>> +   bool "Enable utilization clamping for RT/FAIR tasks"
>>> +   depends on CPU_FREQ_GOV_SCHEDUTIL
>>> +   default false
>>
>>  default n
>> but just omit the line completely since "n" is already the default.
> 
> 
> Right, will update for next posting!
> Is there a strict rule to omit this line when it's already the
> default?

It's not documented AFAIK, but it's commonly repeated on LKML.

>>> +   help
>>> + This feature enables the scheduler to track the clamped utilization
>>> + of each CPU based on RUNNABLE tasks currently scheduled on that CPU.
>>> +
>>> + When this option is enabled, the user can specify a min and max CPU
>>> +  bandwidth which is allowed for a task.
>>> + The max bandwidth allows to clamp the maximum frequency a task can
>>> + use, while the min bandwidth allows to define a minimum frequency a
>>> +  task will always use.
>>
>> Please clean up the indentation above to use one tab + 2 spaces on all lines.
> 
> Sure, my bad I did not notice it... although I'm quite sure the patch
> passed a checkpatch... will check better next time.

Thanks.

>>> +
>>> + If in doubt, say N.
>>> +
>>> +endmenu


-- 
~Randy


Re: [PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping

2018-08-09 Thread Juri Lelli
On 09/08/18 10:14, Patrick Bellasi wrote:
> On 07-Aug 14:35, Juri Lelli wrote:
> > On 06/08/18 17:39, Patrick Bellasi wrote:
> > 
> > [...]
> > 
> > > @@ -4218,6 +4245,13 @@ static int __sched_setscheduler(struct task_struct 
> > > *p,
> > >   return retval;
> > >   }
> > >  
> > > + /* Configure utilization clamps for the task */
> > > + if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP) {
> > > + retval = __setscheduler_uclamp(p, attr);
> > > + if (retval)
> > > + return retval;
> > > + }
> > > +
> > 
> > IIUC, this is available to root and non-root users. In the latter case,
> > how do we cope with the fact that some user might occupy all the
> > available clamping groups configured for the system?
> 
> That's a very good point, glad you noticed it.
> 
> What concern me most is that we set constraints to the cgroups
> delegation model. If all clamp groups have been used it could be
> not possible for a parent group to shrink resources for its subgroups.

Right, when groups are in use the problem might actually be even more
serious.

> In both cases however, in principle, I think we can live with the idea
> that the "System Management Software" (SMS) can pre-allocate all the
> required boost groups at boot time;  malicious tasks and dependent
> groups will eventually get an -ENOSPC error.
> 
> These are the main reason why I did not posted a more "safe" solution:
> this series is already big enough, a properly (pre)configured system
> is still reasonably functional safe and  this feature can be added in
> a second step.

I see, but I also fear that there will be times and usages of this new
interface where no SMS is present.

> However, I already have a couple of possible extensions/fixes which I
> can add on top on the next respin. They are along these lines:

These are exactly what I was thinking about as well. :-)

> 1) make CAP_SYS_NICE protected the clamp groups, with an optional boot
>time parameter to relax this check

It seems to me that this might work well with that the intended usage of
the interface that you depict above. SMS only (or any privileged user)
will be in control of how groups are configured, so no problem for
normal users.

> 2) add discretization support to clamp groups allocation

And this might also work well if we feel that we don't want to restrict
usage of the interface to admin only, however...

> This second feature specifically, will ensure that clamp values are
> always mapped into one of the available clamp groups. While the exact
> clamp value can always be used for tasks placement biasing, when it
> comes to frequency selection biasing, depending on concurrently
> running tasks, you can end up with an effective clamp value which is a
> rounded up.

what I'm not so sure about is that we might lose in flexibility if the
number of available discrete clamp groups is too small compared to the
number of available OPP on the platform.

> Will likely add a couple of additional patches on v4 posting.
> Do you have any other possible idea?

As said, I though as well about the two options you mentioned.


Re: [PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping

2018-08-09 Thread Juri Lelli
On 09/08/18 10:14, Patrick Bellasi wrote:
> On 07-Aug 14:35, Juri Lelli wrote:
> > On 06/08/18 17:39, Patrick Bellasi wrote:
> > 
> > [...]
> > 
> > > @@ -4218,6 +4245,13 @@ static int __sched_setscheduler(struct task_struct 
> > > *p,
> > >   return retval;
> > >   }
> > >  
> > > + /* Configure utilization clamps for the task */
> > > + if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP) {
> > > + retval = __setscheduler_uclamp(p, attr);
> > > + if (retval)
> > > + return retval;
> > > + }
> > > +
> > 
> > IIUC, this is available to root and non-root users. In the latter case,
> > how do we cope with the fact that some user might occupy all the
> > available clamping groups configured for the system?
> 
> That's a very good point, glad you noticed it.
> 
> What concern me most is that we set constraints to the cgroups
> delegation model. If all clamp groups have been used it could be
> not possible for a parent group to shrink resources for its subgroups.

Right, when groups are in use the problem might actually be even more
serious.

> In both cases however, in principle, I think we can live with the idea
> that the "System Management Software" (SMS) can pre-allocate all the
> required boost groups at boot time;  malicious tasks and dependent
> groups will eventually get an -ENOSPC error.
> 
> These are the main reason why I did not posted a more "safe" solution:
> this series is already big enough, a properly (pre)configured system
> is still reasonably functional safe and  this feature can be added in
> a second step.

I see, but I also fear that there will be times and usages of this new
interface where no SMS is present.

> However, I already have a couple of possible extensions/fixes which I
> can add on top on the next respin. They are along these lines:

These are exactly what I was thinking about as well. :-)

> 1) make CAP_SYS_NICE protected the clamp groups, with an optional boot
>time parameter to relax this check

It seems to me that this might work well with that the intended usage of
the interface that you depict above. SMS only (or any privileged user)
will be in control of how groups are configured, so no problem for
normal users.

> 2) add discretization support to clamp groups allocation

And this might also work well if we feel that we don't want to restrict
usage of the interface to admin only, however...

> This second feature specifically, will ensure that clamp values are
> always mapped into one of the available clamp groups. While the exact
> clamp value can always be used for tasks placement biasing, when it
> comes to frequency selection biasing, depending on concurrently
> running tasks, you can end up with an effective clamp value which is a
> rounded up.

what I'm not so sure about is that we might lose in flexibility if the
number of available discrete clamp groups is too small compared to the
number of available OPP on the platform.

> Will likely add a couple of additional patches on v4 posting.
> Do you have any other possible idea?

As said, I though as well about the two options you mentioned.


Re: [PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping

2018-08-09 Thread Patrick Bellasi
On 07-Aug 14:35, Juri Lelli wrote:
> On 06/08/18 17:39, Patrick Bellasi wrote:
> 
> [...]
> 
> > @@ -4218,6 +4245,13 @@ static int __sched_setscheduler(struct task_struct 
> > *p,
> > return retval;
> > }
> >  
> > +   /* Configure utilization clamps for the task */
> > +   if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP) {
> > +   retval = __setscheduler_uclamp(p, attr);
> > +   if (retval)
> > +   return retval;
> > +   }
> > +
> 
> IIUC, this is available to root and non-root users. In the latter case,
> how do we cope with the fact that some user might occupy all the
> available clamping groups configured for the system?

That's a very good point, glad you noticed it.

What concern me most is that we set constraints to the cgroups
delegation model. If all clamp groups have been used it could be
not possible for a parent group to shrink resources for its subgroups.

In both cases however, in principle, I think we can live with the idea
that the "System Management Software" (SMS) can pre-allocate all the
required boost groups at boot time;  malicious tasks and dependent
groups will eventually get an -ENOSPC error.

These are the main reason why I did not posted a more "safe" solution:
this series is already big enough, a properly (pre)configured system
is still reasonably functional safe and  this feature can be added in
a second step.

However, I already have a couple of possible extensions/fixes which I
can add on top on the next respin. They are along these lines:
1) make CAP_SYS_NICE protected the clamp groups, with an optional boot
   time parameter to relax this check
2) add discretization support to clamp groups allocation

This second feature specifically, will ensure that clamp values are
always mapped into one of the available clamp groups. While the exact
clamp value can always be used for tasks placement biasing, when it
comes to frequency selection biasing, depending on concurrently
running tasks, you can end up with an effective clamp value which is a
rounded up.

Will likely add a couple of additional patches on v4 posting.
Do you have any other possible idea?

Cheers Patrick

-- 
#include 

Patrick Bellasi


Re: [PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping

2018-08-09 Thread Patrick Bellasi
On 07-Aug 14:35, Juri Lelli wrote:
> On 06/08/18 17:39, Patrick Bellasi wrote:
> 
> [...]
> 
> > @@ -4218,6 +4245,13 @@ static int __sched_setscheduler(struct task_struct 
> > *p,
> > return retval;
> > }
> >  
> > +   /* Configure utilization clamps for the task */
> > +   if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP) {
> > +   retval = __setscheduler_uclamp(p, attr);
> > +   if (retval)
> > +   return retval;
> > +   }
> > +
> 
> IIUC, this is available to root and non-root users. In the latter case,
> how do we cope with the fact that some user might occupy all the
> available clamping groups configured for the system?

That's a very good point, glad you noticed it.

What concern me most is that we set constraints to the cgroups
delegation model. If all clamp groups have been used it could be
not possible for a parent group to shrink resources for its subgroups.

In both cases however, in principle, I think we can live with the idea
that the "System Management Software" (SMS) can pre-allocate all the
required boost groups at boot time;  malicious tasks and dependent
groups will eventually get an -ENOSPC error.

These are the main reason why I did not posted a more "safe" solution:
this series is already big enough, a properly (pre)configured system
is still reasonably functional safe and  this feature can be added in
a second step.

However, I already have a couple of possible extensions/fixes which I
can add on top on the next respin. They are along these lines:
1) make CAP_SYS_NICE protected the clamp groups, with an optional boot
   time parameter to relax this check
2) add discretization support to clamp groups allocation

This second feature specifically, will ensure that clamp values are
always mapped into one of the available clamp groups. While the exact
clamp value can always be used for tasks placement biasing, when it
comes to frequency selection biasing, depending on concurrently
running tasks, you can end up with an effective clamp value which is a
rounded up.

Will likely add a couple of additional patches on v4 posting.
Do you have any other possible idea?

Cheers Patrick

-- 
#include 

Patrick Bellasi


Re: [PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping

2018-08-09 Thread Patrick Bellasi
On 06-Aug 09:50, Randy Dunlap wrote:
> Hi,

Hi Randy,

> On 08/06/2018 09:39 AM, Patrick Bellasi wrote:
> > diff --git a/init/Kconfig b/init/Kconfig
> > index 041f3a022122..1d45a6877d6f 100644
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -583,6 +583,25 @@ config HAVE_UNSTABLE_SCHED_CLOCK
> >  config GENERIC_SCHED_CLOCK
> > bool
> >  
> > +menu "Scheduler features"
> > +
> > +config UCLAMP_TASK
> > +   bool "Enable utilization clamping for RT/FAIR tasks"
> > +   depends on CPU_FREQ_GOV_SCHEDUTIL
> > +   default false
> 
>   default n
> but just omit the line completely since "n" is already the default.


Right, will update for next posting!
Is there a strict rule to omit this line when it's already the
default?
> 
> > +   help
> > + This feature enables the scheduler to track the clamped utilization
> > + of each CPU based on RUNNABLE tasks currently scheduled on that CPU.
> > +
> > + When this option is enabled, the user can specify a min and max CPU
> > +  bandwidth which is allowed for a task.
> > + The max bandwidth allows to clamp the maximum frequency a task can
> > + use, while the min bandwidth allows to define a minimum frequency a
> > +  task will always use.
> 
> Please clean up the indentation above to use one tab + 2 spaces on all lines.

Sure, my bad I did not notice it... although I'm quite sure the patch
passed a checkpatch... will check better next time.

> > +
> > + If in doubt, say N.
> > +
> > +endmenu

Cheers Patrick

-- 
#include 

Patrick Bellasi


Re: [PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping

2018-08-09 Thread Patrick Bellasi
On 06-Aug 09:50, Randy Dunlap wrote:
> Hi,

Hi Randy,

> On 08/06/2018 09:39 AM, Patrick Bellasi wrote:
> > diff --git a/init/Kconfig b/init/Kconfig
> > index 041f3a022122..1d45a6877d6f 100644
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -583,6 +583,25 @@ config HAVE_UNSTABLE_SCHED_CLOCK
> >  config GENERIC_SCHED_CLOCK
> > bool
> >  
> > +menu "Scheduler features"
> > +
> > +config UCLAMP_TASK
> > +   bool "Enable utilization clamping for RT/FAIR tasks"
> > +   depends on CPU_FREQ_GOV_SCHEDUTIL
> > +   default false
> 
>   default n
> but just omit the line completely since "n" is already the default.


Right, will update for next posting!
Is there a strict rule to omit this line when it's already the
default?
> 
> > +   help
> > + This feature enables the scheduler to track the clamped utilization
> > + of each CPU based on RUNNABLE tasks currently scheduled on that CPU.
> > +
> > + When this option is enabled, the user can specify a min and max CPU
> > +  bandwidth which is allowed for a task.
> > + The max bandwidth allows to clamp the maximum frequency a task can
> > + use, while the min bandwidth allows to define a minimum frequency a
> > +  task will always use.
> 
> Please clean up the indentation above to use one tab + 2 spaces on all lines.

Sure, my bad I did not notice it... although I'm quite sure the patch
passed a checkpatch... will check better next time.

> > +
> > + If in doubt, say N.
> > +
> > +endmenu

Cheers Patrick

-- 
#include 

Patrick Bellasi


Re: [PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping

2018-08-07 Thread Juri Lelli
On 06/08/18 17:39, Patrick Bellasi wrote:

[...]

> @@ -4218,6 +4245,13 @@ static int __sched_setscheduler(struct task_struct *p,
>   return retval;
>   }
>  
> + /* Configure utilization clamps for the task */
> + if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP) {
> + retval = __setscheduler_uclamp(p, attr);
> + if (retval)
> + return retval;
> + }
> +

IIUC, this is available to root and non-root users. In the latter case,
how do we cope with the fact that some user might occupy all the
available clamping groups configured for the system?

Best,

- Juri


Re: [PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping

2018-08-07 Thread Juri Lelli
On 06/08/18 17:39, Patrick Bellasi wrote:

[...]

> @@ -4218,6 +4245,13 @@ static int __sched_setscheduler(struct task_struct *p,
>   return retval;
>   }
>  
> + /* Configure utilization clamps for the task */
> + if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP) {
> + retval = __setscheduler_uclamp(p, attr);
> + if (retval)
> + return retval;
> + }
> +

IIUC, this is available to root and non-root users. In the latter case,
how do we cope with the fact that some user might occupy all the
available clamping groups configured for the system?

Best,

- Juri


Re: [PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping

2018-08-07 Thread Juri Lelli
Hi,

Minor comments below.

On 06/08/18 17:39, Patrick Bellasi wrote:

[...]

> + *
> + * Task Utilization Attributes
> + * ===
> + *
> + * A subset of sched_attr attributes allows to specify the utilization which
> + * should be expected by a task. These attributes allows to inform the
   ^
 allow

> + * scheduler about the utilization boundaries within which is safe to 
> schedule

Isn't all this more about providing hints than safety?

> + * the task. These utilization boundaries are valuable information to support
> + * scheduler decisions on both task placement and frequencies selection.
> + *
> + *  @sched_util_min  represents the minimum utilization
> + *  @sched_util_max  represents the maximum utilization
> + *
> + * Utilization is a value in the range [0..SCHED_CAPACITY_SCALE] which
> + * represents the percentage of CPU time used by a task when running at the
> + * maximum frequency on the highest capacity CPU of the system. Thus, for
> + * example, a 20% utilization task is a task running for 2ms every 10ms.
> + *
> + * A task with a min utilization value bigger then 0 is more likely to be
> + * scheduled on a CPU which can provide that bandwidth.
> + * A task with a max utilization value smaller then 1024 is more likely to be
> + * scheduled on a CPU which do not provide more then the required bandwidth.

Isn't s/bandwidth/capacity/ here, above, and in general where you use
the term "bandwidth" more appropriate? I wonder if overloading this term
(w.r.t. how is used with DEADLINE) might create confusion. In this case
we are not providing any sort of guarantees, it's a hint.

Best,

- Juri


Re: [PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping

2018-08-07 Thread Juri Lelli
Hi,

Minor comments below.

On 06/08/18 17:39, Patrick Bellasi wrote:

[...]

> + *
> + * Task Utilization Attributes
> + * ===
> + *
> + * A subset of sched_attr attributes allows to specify the utilization which
> + * should be expected by a task. These attributes allows to inform the
   ^
 allow

> + * scheduler about the utilization boundaries within which is safe to 
> schedule

Isn't all this more about providing hints than safety?

> + * the task. These utilization boundaries are valuable information to support
> + * scheduler decisions on both task placement and frequencies selection.
> + *
> + *  @sched_util_min  represents the minimum utilization
> + *  @sched_util_max  represents the maximum utilization
> + *
> + * Utilization is a value in the range [0..SCHED_CAPACITY_SCALE] which
> + * represents the percentage of CPU time used by a task when running at the
> + * maximum frequency on the highest capacity CPU of the system. Thus, for
> + * example, a 20% utilization task is a task running for 2ms every 10ms.
> + *
> + * A task with a min utilization value bigger then 0 is more likely to be
> + * scheduled on a CPU which can provide that bandwidth.
> + * A task with a max utilization value smaller then 1024 is more likely to be
> + * scheduled on a CPU which do not provide more then the required bandwidth.

Isn't s/bandwidth/capacity/ here, above, and in general where you use
the term "bandwidth" more appropriate? I wonder if overloading this term
(w.r.t. how is used with DEADLINE) might create confusion. In this case
we are not providing any sort of guarantees, it's a hint.

Best,

- Juri


Re: [PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping

2018-08-06 Thread Randy Dunlap
Hi,

On 08/06/2018 09:39 AM, Patrick Bellasi wrote:
> diff --git a/init/Kconfig b/init/Kconfig
> index 041f3a022122..1d45a6877d6f 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -583,6 +583,25 @@ config HAVE_UNSTABLE_SCHED_CLOCK
>  config GENERIC_SCHED_CLOCK
>   bool
>  
> +menu "Scheduler features"
> +
> +config UCLAMP_TASK
> + bool "Enable utilization clamping for RT/FAIR tasks"
> + depends on CPU_FREQ_GOV_SCHEDUTIL
> + default false

default n
but just omit the line completely since "n" is already the default.

> + help
> +   This feature enables the scheduler to track the clamped utilization
> +   of each CPU based on RUNNABLE tasks currently scheduled on that CPU.
> +
> +   When this option is enabled, the user can specify a min and max CPU
> +  bandwidth which is allowed for a task.
> +   The max bandwidth allows to clamp the maximum frequency a task can
> +   use, while the min bandwidth allows to define a minimum frequency a
> +  task will always use.

Please clean up the indentation above to use one tab + 2 spaces on all lines.

> +
> +   If in doubt, say N.
> +
> +endmenu


thanks,
-- 
~Randy


Re: [PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping

2018-08-06 Thread Randy Dunlap
Hi,

On 08/06/2018 09:39 AM, Patrick Bellasi wrote:
> diff --git a/init/Kconfig b/init/Kconfig
> index 041f3a022122..1d45a6877d6f 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -583,6 +583,25 @@ config HAVE_UNSTABLE_SCHED_CLOCK
>  config GENERIC_SCHED_CLOCK
>   bool
>  
> +menu "Scheduler features"
> +
> +config UCLAMP_TASK
> + bool "Enable utilization clamping for RT/FAIR tasks"
> + depends on CPU_FREQ_GOV_SCHEDUTIL
> + default false

default n
but just omit the line completely since "n" is already the default.

> + help
> +   This feature enables the scheduler to track the clamped utilization
> +   of each CPU based on RUNNABLE tasks currently scheduled on that CPU.
> +
> +   When this option is enabled, the user can specify a min and max CPU
> +  bandwidth which is allowed for a task.
> +   The max bandwidth allows to clamp the maximum frequency a task can
> +   use, while the min bandwidth allows to define a minimum frequency a
> +  task will always use.

Please clean up the indentation above to use one tab + 2 spaces on all lines.

> +
> +   If in doubt, say N.
> +
> +endmenu


thanks,
-- 
~Randy


[PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping

2018-08-06 Thread Patrick Bellasi
The SCHED_DEADLINE scheduling class provides an advanced and formal
model to define tasks requirements which can be translated into proper
decisions for both task placements and frequencies selections.
Other classes have a more simplified model which is essentially based on
the relatively simple concept of POSIX priorities.

Such a simple priority based model however does not allow to exploit
some of the most advanced features of the Linux scheduler like, for
example, driving frequencies selection via the schedutil cpufreq
governor. However, also for non SCHED_DEADLINE tasks, it's still
interesting to define tasks properties which can be used to better
support certain scheduler decisions.

Utilization clamping aims at exposing to user-space a new set of
per-task attributes which can be used to provide the scheduler with some
hints about the expected/required utilization for a task.
This will allow to implement a more advanced per-task frequency control
mechanism which is not based just on a "passive" measured task
utilization but on a more "active" approach. For example, it could be
possible to boost interactive tasks, thus getting better performance, or
cap background tasks, thus being more energy efficient.
Ultimately, such a mechanism can be considered similar to the cpufreq's
powersave, performance and userspace governor but with a much fine
grained and per-task control.

Let's introduce a new API to set utilization clamping values for a
specified task by extending sched_setattr, a syscall which already
allows to define task specific properties for different scheduling
classes.
Specifically, a new pair of attributes allows to specify a minimum and
maximum utilization which the scheduler should consider for a task.

Signed-off-by: Patrick Bellasi 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Tejun Heo 
Cc: Rafael J. Wysocki 
Cc: Vincent Guittot 
Cc: Viresh Kumar 
Cc: Paul Turner 
Cc: Suren Baghdasaryan 
Cc: Todd Kjos 
Cc: Joel Fernandes 
Cc: Steve Muckle 
Cc: Juri Lelli 
Cc: Dietmar Eggemann 
Cc: Morten Rasmussen 
Cc: linux-kernel@vger.kernel.org
Cc: linux...@vger.kernel.org

---
Changes in v3:
 Message-ID: 

 - removed UCLAMP_NONE not used by this patch
 Others:
 - rebased on tip/sched/core

Changes in v2:
 - rebased on v4.18-rc4
 - move at the head of the series

As discussed at OSPM, using a [0..SCHED_CAPACITY_SCALE] range seems to
be acceptable. However, an additional patch has been added at the end of
the series which introduces a simple abstraction to use a more
generic [0..100] range.

At OSPM we also discarded the idea to "recycle" the usage of
sched_runtime and sched_period which would have made the API too
much complex for limited benefits.
---
 include/linux/sched.h| 13 +++
 include/uapi/linux/sched.h   |  4 +-
 include/uapi/linux/sched/types.h | 64 +++-
 init/Kconfig | 19 ++
 kernel/sched/core.c  | 39 +++
 5 files changed, 129 insertions(+), 10 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index e0f4f56c9310..42f6439378e1 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -279,6 +279,14 @@ struct vtime {
u64 gtime;
 };
 
+enum uclamp_id {
+   UCLAMP_MIN = 0, /* Minimum utilization */
+   UCLAMP_MAX, /* Maximum utilization */
+
+   /* Utilization clamping constraints count */
+   UCLAMP_CNT
+};
+
 struct sched_info {
 #ifdef CONFIG_SCHED_INFO
/* Cumulative counters: */
@@ -649,6 +657,11 @@ struct task_struct {
 #endif
struct sched_dl_entity  dl;
 
+#ifdef CONFIG_UCLAMP_TASK
+   /* Utlization clamp values for this task */
+   int uclamp[UCLAMP_CNT];
+#endif
+
 #ifdef CONFIG_PREEMPT_NOTIFIERS
/* List of struct preempt_notifier: */
struct hlist_head   preempt_notifiers;
diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index 22627f80063e..c27d6e81517b 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -50,9 +50,11 @@
 #define SCHED_FLAG_RESET_ON_FORK   0x01
 #define SCHED_FLAG_RECLAIM 0x02
 #define SCHED_FLAG_DL_OVERRUN  0x04
+#define SCHED_FLAG_UTIL_CLAMP  0x08
 
 #define SCHED_FLAG_ALL (SCHED_FLAG_RESET_ON_FORK   | \
 SCHED_FLAG_RECLAIM | \
-SCHED_FLAG_DL_OVERRUN)
+SCHED_FLAG_DL_OVERRUN  | \
+SCHED_FLAG_UTIL_CLAMP)
 
 #endif /* _UAPI_LINUX_SCHED_H */
diff --git a/include/uapi/linux/sched/types.h b/include/uapi/linux/sched/types.h
index 10fbb8031930..7421cd25354d 100644
--- a/include/uapi/linux/sched/types.h
+++ b/include/uapi/linux/sched/types.h
@@ -21,8 +21,33 @@ struct sched_param {
  * the tasks may be useful for a wide variety of application fields, e.g.,
  * multimedia, streaming, automation and control, and many others.
  

[PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping

2018-08-06 Thread Patrick Bellasi
The SCHED_DEADLINE scheduling class provides an advanced and formal
model to define tasks requirements which can be translated into proper
decisions for both task placements and frequencies selections.
Other classes have a more simplified model which is essentially based on
the relatively simple concept of POSIX priorities.

Such a simple priority based model however does not allow to exploit
some of the most advanced features of the Linux scheduler like, for
example, driving frequencies selection via the schedutil cpufreq
governor. However, also for non SCHED_DEADLINE tasks, it's still
interesting to define tasks properties which can be used to better
support certain scheduler decisions.

Utilization clamping aims at exposing to user-space a new set of
per-task attributes which can be used to provide the scheduler with some
hints about the expected/required utilization for a task.
This will allow to implement a more advanced per-task frequency control
mechanism which is not based just on a "passive" measured task
utilization but on a more "active" approach. For example, it could be
possible to boost interactive tasks, thus getting better performance, or
cap background tasks, thus being more energy efficient.
Ultimately, such a mechanism can be considered similar to the cpufreq's
powersave, performance and userspace governor but with a much fine
grained and per-task control.

Let's introduce a new API to set utilization clamping values for a
specified task by extending sched_setattr, a syscall which already
allows to define task specific properties for different scheduling
classes.
Specifically, a new pair of attributes allows to specify a minimum and
maximum utilization which the scheduler should consider for a task.

Signed-off-by: Patrick Bellasi 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Tejun Heo 
Cc: Rafael J. Wysocki 
Cc: Vincent Guittot 
Cc: Viresh Kumar 
Cc: Paul Turner 
Cc: Suren Baghdasaryan 
Cc: Todd Kjos 
Cc: Joel Fernandes 
Cc: Steve Muckle 
Cc: Juri Lelli 
Cc: Dietmar Eggemann 
Cc: Morten Rasmussen 
Cc: linux-kernel@vger.kernel.org
Cc: linux...@vger.kernel.org

---
Changes in v3:
 Message-ID: 

 - removed UCLAMP_NONE not used by this patch
 Others:
 - rebased on tip/sched/core

Changes in v2:
 - rebased on v4.18-rc4
 - move at the head of the series

As discussed at OSPM, using a [0..SCHED_CAPACITY_SCALE] range seems to
be acceptable. However, an additional patch has been added at the end of
the series which introduces a simple abstraction to use a more
generic [0..100] range.

At OSPM we also discarded the idea to "recycle" the usage of
sched_runtime and sched_period which would have made the API too
much complex for limited benefits.
---
 include/linux/sched.h| 13 +++
 include/uapi/linux/sched.h   |  4 +-
 include/uapi/linux/sched/types.h | 64 +++-
 init/Kconfig | 19 ++
 kernel/sched/core.c  | 39 +++
 5 files changed, 129 insertions(+), 10 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index e0f4f56c9310..42f6439378e1 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -279,6 +279,14 @@ struct vtime {
u64 gtime;
 };
 
+enum uclamp_id {
+   UCLAMP_MIN = 0, /* Minimum utilization */
+   UCLAMP_MAX, /* Maximum utilization */
+
+   /* Utilization clamping constraints count */
+   UCLAMP_CNT
+};
+
 struct sched_info {
 #ifdef CONFIG_SCHED_INFO
/* Cumulative counters: */
@@ -649,6 +657,11 @@ struct task_struct {
 #endif
struct sched_dl_entity  dl;
 
+#ifdef CONFIG_UCLAMP_TASK
+   /* Utlization clamp values for this task */
+   int uclamp[UCLAMP_CNT];
+#endif
+
 #ifdef CONFIG_PREEMPT_NOTIFIERS
/* List of struct preempt_notifier: */
struct hlist_head   preempt_notifiers;
diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index 22627f80063e..c27d6e81517b 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -50,9 +50,11 @@
 #define SCHED_FLAG_RESET_ON_FORK   0x01
 #define SCHED_FLAG_RECLAIM 0x02
 #define SCHED_FLAG_DL_OVERRUN  0x04
+#define SCHED_FLAG_UTIL_CLAMP  0x08
 
 #define SCHED_FLAG_ALL (SCHED_FLAG_RESET_ON_FORK   | \
 SCHED_FLAG_RECLAIM | \
-SCHED_FLAG_DL_OVERRUN)
+SCHED_FLAG_DL_OVERRUN  | \
+SCHED_FLAG_UTIL_CLAMP)
 
 #endif /* _UAPI_LINUX_SCHED_H */
diff --git a/include/uapi/linux/sched/types.h b/include/uapi/linux/sched/types.h
index 10fbb8031930..7421cd25354d 100644
--- a/include/uapi/linux/sched/types.h
+++ b/include/uapi/linux/sched/types.h
@@ -21,8 +21,33 @@ struct sched_param {
  * the tasks may be useful for a wide variety of application fields, e.g.,
  * multimedia, streaming, automation and control, and many others.