Re: [PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping
On Friday 17 Aug 2018 at 11:57:31 (+0100), Patrick Bellasi wrote:
> On 17-Aug 11:34, Quentin Perret wrote:
> > Hi Patrick,
> >
> > On Thursday 09 Aug 2018 at 16:23:13 (+0100), Patrick Bellasi wrote:
> > > On 09-Aug 11:50, Juri Lelli wrote:
> > > > On 09/08/18 10:14, Patrick Bellasi wrote:
> > > > > On 07-Aug 14:35, Juri Lelli wrote:
> > > > > > On 06/08/18 17:39, Patrick Bellasi wrote:
> > >
> > > [...]
> > >
> > > > > 1) make the clamp groups CAP_SYS_NICE protected, with an optional
> > > > >    boot time parameter to relax this check
> > > >
> > > > It seems to me that this might work well with the intended usage of
> > > > the interface that you depict above. SMS only (or any privileged user)
> > > > will be in control of how groups are configured, so no problem for
> > > > normal users.
> > >
> > > Yes, well... apart from normal users still getting a -ENOSPC if they are
> > > requesting one of the not pre-configured clamp values. Which is why
> > > the following bits can be helpful.
> >
> > So IIUC, normal users would still be free to choose their clamp values
> > as long as they choose one in the list of pre-allocated ones? Is that
> > correct?
>
> No, with the CAP_SYS_NICE/ADMIN guard in place, as discussed above in
> point 1, the syscall will just fail for normal users.

Right, I just misunderstood then :-)
Sorry for the noise ...

Thanks,
Quentin
Re: [PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping
On 17-Aug 11:34, Quentin Perret wrote:
> Hi Patrick,
>
> On Thursday 09 Aug 2018 at 16:23:13 (+0100), Patrick Bellasi wrote:
> > On 09-Aug 11:50, Juri Lelli wrote:
> > > On 09/08/18 10:14, Patrick Bellasi wrote:
> > > > On 07-Aug 14:35, Juri Lelli wrote:
> > > > > On 06/08/18 17:39, Patrick Bellasi wrote:
> >
> > [...]
> >
> > > > 1) make the clamp groups CAP_SYS_NICE protected, with an optional
> > > >    boot time parameter to relax this check
> > >
> > > It seems to me that this might work well with the intended usage of
> > > the interface that you depict above. SMS only (or any privileged user)
> > > will be in control of how groups are configured, so no problem for
> > > normal users.
> >
> > Yes, well... apart from normal users still getting a -ENOSPC if they are
> > requesting one of the not pre-configured clamp values. Which is why
> > the following bits can be helpful.
>
> So IIUC, normal users would still be free to choose their clamp values
> as long as they choose one in the list of pre-allocated ones? Is that
> correct?

No, with the CAP_SYS_NICE/ADMIN guard in place, as discussed above in
point 1, the syscall will just fail for normal users. Only privileged
tasks (i.e. SMS control threads) can change clamp values.

> If yes, that would still let normal users make their tasks look bigger, no?
> They could just choose the clamp group with the highest min_clamp or
> something. Isn't this a problem too? I mean, if that can be abused easily,
> I'm pretty sure people _will_ abuse it ...

It should not be possible with 1) in place. However, if the system is
booted with that check disabled (e.g. via a kernel boot parameter) that
probably means you trust/control your userspace and don't want to impose
restrictions on non privileged tasks. In this case "abuses" are just
"acceptable usages"...

--
#include <best/regards.h>

Patrick Bellasi
Re: [PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping
Hi Patrick,

On Thursday 09 Aug 2018 at 16:23:13 (+0100), Patrick Bellasi wrote:
> On 09-Aug 11:50, Juri Lelli wrote:
> > On 09/08/18 10:14, Patrick Bellasi wrote:
> > > On 07-Aug 14:35, Juri Lelli wrote:
> > > > On 06/08/18 17:39, Patrick Bellasi wrote:
>
> [...]
>
> > > 1) make the clamp groups CAP_SYS_NICE protected, with an optional
> > >    boot time parameter to relax this check
> >
> > It seems to me that this might work well with the intended usage of
> > the interface that you depict above. SMS only (or any privileged user)
> > will be in control of how groups are configured, so no problem for
> > normal users.
>
> Yes, well... apart from normal users still getting a -ENOSPC if they are
> requesting one of the not pre-configured clamp values. Which is why
> the following bits can be helpful.

So IIUC, normal users would still be free to choose their clamp values
as long as they choose one in the list of pre-allocated ones? Is that
correct?

If yes, that would still let normal users make their tasks look bigger,
no? They could just choose the clamp group with the highest min_clamp or
something. Isn't this a problem too? I mean, if that can be abused
easily, I'm pretty sure people _will_ abuse it ...

Or maybe I misunderstood something?

Thanks,
Quentin
Re: [PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping
On 13/08/18 13:14, Patrick Bellasi wrote:
> On 07-Aug 11:59, Juri Lelli wrote:
> > Hi,
> >
> > Minor comments below.
> >
> > On 06/08/18 17:39, Patrick Bellasi wrote:
> >
> > [...]
> >
> > > + *
> > > + * Task Utilization Attributes
> > > + * ===========================
> > > + *
> > > + * A subset of sched_attr attributes allows to specify the utilization which
> > > + * should be expected by a task. These attributes allows to inform the
> >                                                      ^
> >                                                      allow
> >
> > > + * scheduler about the utilization boundaries within which is safe to schedule
> >
> > Isn't all this more about providing hints than safety?
>
> Yes, it's "just" hints... will rephrase to make it more clear.
>
> > > + * the task. These utilization boundaries are valuable information to support
> > > + * scheduler decisions on both task placement and frequencies selection.
> > > + *
> > > + * @sched_util_min	represents the minimum utilization
> > > + * @sched_util_max	represents the maximum utilization
> > > + *
> > > + * Utilization is a value in the range [0..SCHED_CAPACITY_SCALE] which
> > > + * represents the percentage of CPU time used by a task when running at the
> > > + * maximum frequency on the highest capacity CPU of the system. Thus, for
> > > + * example, a 20% utilization task is a task running for 2ms every 10ms.
> > > + *
> > > + * A task with a min utilization value bigger than 0 is more likely to be
> > > + * scheduled on a CPU which can provide that bandwidth.
> > > + * A task with a max utilization value smaller than 1024 is more likely to be
> > > + * scheduled on a CPU which does not provide more than the required bandwidth.
> >
> > Isn't s/bandwidth/capacity/ here, above, and in general where you use
> > the term "bandwidth" more appropriate? I wonder if overloading this term
> > (w.r.t. how it is used with DEADLINE) might create confusion. In this case
> > we are not providing any sort of guarantees, it's a hint.
>
> Yes, you're right... here we are not really granting any bandwidth but
> just "improving" the bandwidth provisioning by hinting the scheduler
> about a certain min/max capacity required.
>
> The problem related to using capacity is that, from kernel space,
> capacity is defined as a static quantity/property of CPUs. Still, I

Looks like it's also more in line with EAS terminology (i.e., capacity
states).
Re: [PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping
On 07-Aug 11:59, Juri Lelli wrote:
> Hi,
>
> Minor comments below.
>
> On 06/08/18 17:39, Patrick Bellasi wrote:
>
> [...]
>
> > + *
> > + * Task Utilization Attributes
> > + * ===========================
> > + *
> > + * A subset of sched_attr attributes allows to specify the utilization which
> > + * should be expected by a task. These attributes allows to inform the
>                                                      ^
>                                                      allow
>
> > + * scheduler about the utilization boundaries within which is safe to schedule
>
> Isn't all this more about providing hints than safety?

Yes, it's "just" hints... will rephrase to make it more clear.

> > + * the task. These utilization boundaries are valuable information to support
> > + * scheduler decisions on both task placement and frequencies selection.
> > + *
> > + * @sched_util_min	represents the minimum utilization
> > + * @sched_util_max	represents the maximum utilization
> > + *
> > + * Utilization is a value in the range [0..SCHED_CAPACITY_SCALE] which
> > + * represents the percentage of CPU time used by a task when running at the
> > + * maximum frequency on the highest capacity CPU of the system. Thus, for
> > + * example, a 20% utilization task is a task running for 2ms every 10ms.
> > + *
> > + * A task with a min utilization value bigger than 0 is more likely to be
> > + * scheduled on a CPU which can provide that bandwidth.
> > + * A task with a max utilization value smaller than 1024 is more likely to be
> > + * scheduled on a CPU which does not provide more than the required bandwidth.
>
> Isn't s/bandwidth/capacity/ here, above, and in general where you use
> the term "bandwidth" more appropriate? I wonder if overloading this term
> (w.r.t. how it is used with DEADLINE) might create confusion. In this case
> we are not providing any sort of guarantees, it's a hint.

Yes, you're right... here we are not really granting any bandwidth but
just "improving" the bandwidth provisioning by hinting the scheduler
about a certain min/max capacity required.

The problem related to using capacity is that, from kernel space,
capacity is defined as a static quantity/property of CPUs. Still, I
think it makes sense to argue that util_{min,max} are hints on the
min/max capacity required for a task.

I'll update comments and text to avoid using bandwidth in favour of
capacity.

Cheers,
Patrick

--
#include <best/regards.h>

Patrick Bellasi
Re: [PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping
On 09/08/18 16:23, Patrick Bellasi wrote:
> On 09-Aug 11:50, Juri Lelli wrote:
> > On 09/08/18 10:14, Patrick Bellasi wrote:
> > > On 07-Aug 14:35, Juri Lelli wrote:
> > > > On 06/08/18 17:39, Patrick Bellasi wrote:
> >
> > [...]
> >
> > > 1) make the clamp groups CAP_SYS_NICE protected, with an optional
> > >    boot time parameter to relax this check
> >
> > It seems to me that this might work well with the intended usage of
> > the interface that you depict above. SMS only (or any privileged user)
> > will be in control of how groups are configured, so no problem for
> > normal users.
>
> Yes, well... apart from normal users still getting a -ENOSPC if they are
> requesting one of the not pre-configured clamp values. Which is why
> the following bits can be helpful.
>
> > > 2) add discretization support to clamp groups allocation
> >
> > And this might also work well if we feel that we don't want to restrict
> > usage of the interface to admin only, however...
> >
> > > This second feature specifically, will ensure that clamp values are
> > > always mapped into one of the available clamp groups. While the exact
> > > clamp value can always be used for tasks placement biasing, when it
> > > comes to frequency selection biasing, depending on concurrently
> > > running tasks, you can end up with an effective clamp value which is
> > > rounded up.
> >
> > what I'm not so sure about is that we might lose in flexibility if the
> > number of available discrete clamp groups is too small compared to the
> > number of available OPPs on the platform.
>
> Regarding this concern, I would say that we should consider that, for
> frequency biasing, we are in general not interested in nailing down
> the single 1% difference and/or exact OPP capacities.

True.

> A certain coarse grained resolution is usually acceptable for many
> different reasons:
>  a) schedutil already uses a 20% margin which can potentially eclipse
>     a few OPPs when we scale up/down
>  b) tasks/CPUs utilization values are good enough but never exact and
>     precise
>  c) reducing the number of OPP switches could have some benefits on
>     stability/latencies
>  d) clamping is actually defining minimum/maximum preferred values, it
>     is not to be considered a tool for "precise control"
>
> All that considered, I would say that maybe a 5% resolution could
> still be considered an acceptable _worst case_ rounding since we don't
> always have to round up to the next 5%.
>
> For example, if we have:
>  - TaskA: util_min=41%
>  - TaskB: util_min=44%
> they will both be accounted in the 40-45% clamp group but the clamp
> group value can be modulated at run-time depending on RUNNABLE
> tasks. When TaskA is running alone, we can still set util_min to
> 41%, while we will use 44% (not 45%) when TaskB is (also) running.
>
> It's worth noticing that we pre-allocate at compile time 20 clamp
> groups, but not necessarily all of them will be used at run-time.
> Indeed, we will still use a policy where only the actually required
> values are allocated at the beginning of the clamps map, thus
> optimizing max updates.

OK, so you'll still only iterate over the groups that are actually in
use, which is hopefully less than 20 and should keep overhead low.
Makes sense to me.
Re: [PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping
On 09-Aug 11:50, Juri Lelli wrote:
> On 09/08/18 10:14, Patrick Bellasi wrote:
> > On 07-Aug 14:35, Juri Lelli wrote:
> > > On 06/08/18 17:39, Patrick Bellasi wrote:

[...]

> > 1) make the clamp groups CAP_SYS_NICE protected, with an optional
> >    boot time parameter to relax this check
>
> It seems to me that this might work well with the intended usage of
> the interface that you depict above. SMS only (or any privileged user)
> will be in control of how groups are configured, so no problem for
> normal users.

Yes, well... apart from normal users still getting a -ENOSPC if they are
requesting one of the not pre-configured clamp values. Which is why
the following bits can be helpful.

> > 2) add discretization support to clamp groups allocation
>
> And this might also work well if we feel that we don't want to restrict
> usage of the interface to admin only, however...
>
> > This second feature specifically, will ensure that clamp values are
> > always mapped into one of the available clamp groups. While the exact
> > clamp value can always be used for tasks placement biasing, when it
> > comes to frequency selection biasing, depending on concurrently
> > running tasks, you can end up with an effective clamp value which is
> > rounded up.
>
> what I'm not so sure about is that we might lose in flexibility if the
> number of available discrete clamp groups is too small compared to the
> number of available OPPs on the platform.

Regarding this concern, I would say that we should consider that, for
frequency biasing, we are in general not interested in nailing down
the single 1% difference and/or exact OPP capacities.

A certain coarse grained resolution is usually acceptable for many
different reasons:
 a) schedutil already uses a 20% margin which can potentially eclipse
    a few OPPs when we scale up/down
 b) tasks/CPUs utilization values are good enough but never exact and
    precise
 c) reducing the number of OPP switches could have some benefits on
    stability/latencies
 d) clamping is actually defining minimum/maximum preferred values, it
    is not to be considered a tool for "precise control"

All that considered, I would say that maybe a 5% resolution could
still be considered an acceptable _worst case_ rounding since we don't
always have to round up to the next 5%.

For example, if we have:
 - TaskA: util_min=41%
 - TaskB: util_min=44%
they will both be accounted in the 40-45% clamp group but the clamp
group value can be modulated at run-time depending on RUNNABLE
tasks. When TaskA is running alone, we can still set util_min to
41%, while we will use 44% (not 45%) when TaskB is (also) running.

It's worth noticing that we pre-allocate at compile time 20 clamp
groups, but not necessarily all of them will be used at run-time.
Indeed, we will still use a policy where only the actually required
values are allocated at the beginning of the clamps map, thus
optimizing max updates.

--
#include <best/regards.h>

Patrick Bellasi
Re: [PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping
On 08/09/2018 01:39 AM, Patrick Bellasi wrote:
> On 06-Aug 09:50, Randy Dunlap wrote:
>> Hi,
>
> Hi Randy,
>
>> On 08/06/2018 09:39 AM, Patrick Bellasi wrote:
>>> diff --git a/init/Kconfig b/init/Kconfig
>>> index 041f3a022122..1d45a6877d6f 100644
>>> --- a/init/Kconfig
>>> +++ b/init/Kconfig
>>> @@ -583,6 +583,25 @@ config HAVE_UNSTABLE_SCHED_CLOCK
>>> config GENERIC_SCHED_CLOCK
>>> bool
>>>
>>> +menu "Scheduler features"
>>> +
>>> +config UCLAMP_TASK
>>> + bool "Enable utilization clamping for RT/FAIR tasks"
>>> + depends on CPU_FREQ_GOV_SCHEDUTIL
>>> + default false
>>
>> default n
>> but just omit the line completely since "n" is already the default.
>
> Right, will update for next posting!
> Is there a strict rule to omit this line when it's already the
> default?

It's not documented AFAIK, but it's commonly repeated on LKML.

>>> + help
>>> + This feature enables the scheduler to track the clamped utilization
>>> + of each CPU based on RUNNABLE tasks currently scheduled on that CPU.
>>> +
>>> + When this option is enabled, the user can specify a min and max CPU
>>> + bandwidth which is allowed for a task.
>>> + The max bandwidth allows to clamp the maximum frequency a task can
>>> + use, while the min bandwidth allows to define a minimum frequency a
>>> + task will always use.
>>
>> Please clean up the indentation above to use one tab + 2 spaces on all
>> lines.
>
> Sure, my bad I did not notice it... although I'm quite sure the patch
> passed a checkpatch... will check better next time.

Thanks.

>>> +
>>> + If in doubt, say N.
>>> +
>>> +endmenu

--
~Randy
Re: [PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping
On 09/08/18 10:14, Patrick Bellasi wrote:
> On 07-Aug 14:35, Juri Lelli wrote:
> > On 06/08/18 17:39, Patrick Bellasi wrote:
> >
> > [...]
> >
> > > @@ -4218,6 +4245,13 @@ static int __sched_setscheduler(struct task_struct *p,
> > > 		return retval;
> > > 	}
> > >
> > > +	/* Configure utilization clamps for the task */
> > > +	if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP) {
> > > +		retval = __setscheduler_uclamp(p, attr);
> > > +		if (retval)
> > > +			return retval;
> > > +	}
> > > +
> >
> > IIUC, this is available to root and non-root users. In the latter
> > case, how do we cope with the fact that some user might occupy all
> > the available clamping groups configured for the system?
>
> That's a very good point, glad you noticed it.
>
> What concerns me most is that we set constraints on the cgroups
> delegation model. If all clamp groups have been used it might not be
> possible for a parent group to shrink resources for its subgroups.

Right, when groups are in use the problem might actually be even more
serious.

> In both cases however, in principle, I think we can live with the idea
> that the "System Management Software" (SMS) can pre-allocate all the
> required boost groups at boot time; malicious tasks and dependent
> groups will eventually get an -ENOSPC error.
>
> These are the main reasons why I did not post a more "safe" solution:
> this series is already big enough, a properly (pre)configured system
> is still reasonably functionally safe, and this feature can be added
> in a second step.

I see, but I also fear that there will be times and usages of this new
interface where no SMS is present.

> However, I already have a couple of possible extensions/fixes which I
> can add on top on the next respin. They are along these lines:

These are exactly what I was thinking about as well. :-)

> 1) make the clamp groups CAP_SYS_NICE protected, with an optional boot
>    time parameter to relax this check

It seems to me that this might work well with the intended usage of
the interface that you depict above. The SMS only (or any privileged
user) will be in control of how groups are configured, so no problem
for normal users.

> 2) add discretization support to clamp groups allocation

And this might also work well if we feel that we don't want to
restrict usage of the interface to admin only, however...

> This second feature specifically will ensure that clamp values are
> always mapped into one of the available clamp groups. While the exact
> clamp value can always be used for task placement biasing, when it
> comes to frequency selection biasing, depending on concurrently
> running tasks, you can end up with an effective clamp value which is
> rounded up.

what I'm not so sure about is that we might lose flexibility if the
number of available discrete clamp groups is too small compared to the
number of available OPPs on the platform.

> Will likely add a couple of additional patches on v4 posting.
> Do you have any other possible idea?

As said, I thought about the two options you mentioned as well.
Re: [PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping
On 07-Aug 14:35, Juri Lelli wrote:
> On 06/08/18 17:39, Patrick Bellasi wrote:
>
> [...]
>
> > @@ -4218,6 +4245,13 @@ static int __sched_setscheduler(struct task_struct *p,
> > 		return retval;
> > 	}
> >
> > +	/* Configure utilization clamps for the task */
> > +	if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP) {
> > +		retval = __setscheduler_uclamp(p, attr);
> > +		if (retval)
> > +			return retval;
> > +	}
> > +
>
> IIUC, this is available to root and non-root users. In the latter
> case, how do we cope with the fact that some user might occupy all
> the available clamping groups configured for the system?

That's a very good point, glad you noticed it.

What concerns me most is that we set constraints on the cgroups
delegation model. If all clamp groups have been used it might not be
possible for a parent group to shrink resources for its subgroups.

In both cases however, in principle, I think we can live with the idea
that the "System Management Software" (SMS) can pre-allocate all the
required boost groups at boot time; malicious tasks and dependent
groups will eventually get an -ENOSPC error.

These are the main reasons why I did not post a more "safe" solution:
this series is already big enough, a properly (pre)configured system
is still reasonably functionally safe, and this feature can be added
in a second step.

However, I already have a couple of possible extensions/fixes which I
can add on top on the next respin. They are along these lines:

1) make the clamp groups CAP_SYS_NICE protected, with an optional boot
   time parameter to relax this check

2) add discretization support to clamp groups allocation

This second feature specifically will ensure that clamp values are
always mapped into one of the available clamp groups. While the exact
clamp value can always be used for task placement biasing, when it
comes to frequency selection biasing, depending on concurrently
running tasks, you can end up with an effective clamp value which is
rounded up.

Will likely add a couple of additional patches on v4 posting.
Do you have any other possible idea?

Cheers,
Patrick

--
#include
Patrick Bellasi
Re: [PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping
On 06-Aug 09:50, Randy Dunlap wrote:
> Hi,

Hi Randy,

> On 08/06/2018 09:39 AM, Patrick Bellasi wrote:
> > diff --git a/init/Kconfig b/init/Kconfig
> > index 041f3a022122..1d45a6877d6f 100644
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -583,6 +583,25 @@ config HAVE_UNSTABLE_SCHED_CLOCK
> > config GENERIC_SCHED_CLOCK
> > bool
> >
> > +menu "Scheduler features"
> > +
> > +config UCLAMP_TASK
> > + bool "Enable utilization clamping for RT/FAIR tasks"
> > + depends on CPU_FREQ_GOV_SCHEDUTIL
> > + default false
>
> default n
> but just omit the line completely since "n" is already the default.

Right, will update for next posting!
Is there a strict rule to omit this line when it's already the
default?

> > + help
> > + This feature enables the scheduler to track the clamped utilization
> > + of each CPU based on RUNNABLE tasks currently scheduled on that CPU.
> > +
> > + When this option is enabled, the user can specify a min and max CPU
> > + bandwidth which is allowed for a task.
> > + The max bandwidth allows to clamp the maximum frequency a task can
> > + use, while the min bandwidth allows to define a minimum frequency a
> > + task will always use.
>
> Please clean up the indentation above to use one tab + 2 spaces on all
> lines.

Sure, my bad, I did not notice it... although I'm quite sure the patch
passed checkpatch... will check better next time.

> > +
> > + If in doubt, say N.
> > +
> > +endmenu

Cheers,
Patrick

--
#include
Patrick Bellasi
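For reference, the Kconfig entry with both of Randy's points applied (the `default` line dropped, help body indented one tab plus two spaces) would read roughly like this:

```kconfig
config UCLAMP_TASK
	bool "Enable utilization clamping for RT/FAIR tasks"
	depends on CPU_FREQ_GOV_SCHEDUTIL
	help
	  This feature enables the scheduler to track the clamped utilization
	  of each CPU based on RUNNABLE tasks currently scheduled on that CPU.

	  When this option is enabled, the user can specify a min and max CPU
	  bandwidth which is allowed for a task.
	  The max bandwidth allows to clamp the maximum frequency a task can
	  use, while the min bandwidth allows to define a minimum frequency a
	  task will always use.

	  If in doubt, say N.
```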
Re: [PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping
On 06/08/18 17:39, Patrick Bellasi wrote:

[...]

> @@ -4218,6 +4245,13 @@ static int __sched_setscheduler(struct task_struct *p,
> 		return retval;
> 	}
>
> +	/* Configure utilization clamps for the task */
> +	if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP) {
> +		retval = __setscheduler_uclamp(p, attr);
> +		if (retval)
> +			return retval;
> +	}
> +

IIUC, this is available to root and non-root users. In the latter case,
how do we cope with the fact that some user might occupy all the
available clamping groups configured for the system?

Best,

- Juri
Re: [PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping
Hi,

Minor comments below.

On 06/08/18 17:39, Patrick Bellasi wrote:

[...]

> + *
> + * Task Utilization Attributes
> + * ===
> + *
> + * A subset of sched_attr attributes allows to specify the utilization which
> + * should be expected by a task. These attributes allows to inform the
                                                      ^
                                                      allow
> + * scheduler about the utilization boundaries within which is safe to
> + * schedule the task.

Isn't all this more about providing hints than safety?

> + * These utilization boundaries are valuable information to support
> + * scheduler decisions on both task placement and frequencies selection.
> + *
> + * @sched_util_min represents the minimum utilization
> + * @sched_util_max represents the maximum utilization
> + *
> + * Utilization is a value in the range [0..SCHED_CAPACITY_SCALE] which
> + * represents the percentage of CPU time used by a task when running at the
> + * maximum frequency on the highest capacity CPU of the system. Thus, for
> + * example, a 20% utilization task is a task running for 2ms every 10ms.
> + *
> + * A task with a min utilization value bigger then 0 is more likely to be
> + * scheduled on a CPU which can provide that bandwidth.
> + * A task with a max utilization value smaller then 1024 is more likely to be
> + * scheduled on a CPU which do not provide more then the required bandwidth.

Isn't s/bandwidth/capacity/ here, above, and in general where you use
the term "bandwidth" more appropriate? I wonder if overloading this
term (w.r.t. how it is used with DEADLINE) might create confusion. In
this case we are not providing any sort of guarantees, it's a hint.

Best,

- Juri
Re: [PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping
Hi,

On 08/06/2018 09:39 AM, Patrick Bellasi wrote:
> diff --git a/init/Kconfig b/init/Kconfig
> index 041f3a022122..1d45a6877d6f 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -583,6 +583,25 @@ config HAVE_UNSTABLE_SCHED_CLOCK
> config GENERIC_SCHED_CLOCK
> bool
>
> +menu "Scheduler features"
> +
> +config UCLAMP_TASK
> + bool "Enable utilization clamping for RT/FAIR tasks"
> + depends on CPU_FREQ_GOV_SCHEDUTIL
> + default false

default n
but just omit the line completely since "n" is already the default.

> + help
> + This feature enables the scheduler to track the clamped utilization
> + of each CPU based on RUNNABLE tasks currently scheduled on that CPU.
> +
> + When this option is enabled, the user can specify a min and max CPU
> + bandwidth which is allowed for a task.
> + The max bandwidth allows to clamp the maximum frequency a task can
> + use, while the min bandwidth allows to define a minimum frequency a
> + task will always use.

Please clean up the indentation above to use one tab + 2 spaces on all
lines.

> +
> + If in doubt, say N.
> +
> +endmenu

thanks,
--
~Randy
[PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping
The SCHED_DEADLINE scheduling class provides an advanced and formal
model to define task requirements which can be translated into proper
decisions for both task placement and frequency selection. Other
classes have a much simpler model which is essentially based on the
relatively simple concept of POSIX priorities.

Such a simple priority-based model however does not allow exploiting
some of the most advanced features of the Linux scheduler, like, for
example, driving frequency selection via the schedutil cpufreq
governor. However, also for non-SCHED_DEADLINE tasks, it is still
interesting to define task properties which can be used to better
support certain scheduler decisions.

Utilization clamping aims at exposing to user-space a new set of
per-task attributes which can be used to provide the scheduler with
some hints about the expected/required utilization for a task. This
will allow implementing a more advanced per-task frequency control
mechanism which is not based just on a "passive" measured task
utilization but on a more "active" approach. For example, it could be
possible to boost interactive tasks, thus getting better performance,
or cap background tasks, thus being more energy efficient. Ultimately,
such a mechanism can be considered similar to the cpufreq's powersave,
performance and userspace governors, but with a much finer-grained and
per-task control.

Let's introduce a new API to set utilization clamping values for a
specified task by extending sched_setattr, a syscall which already
allows defining task-specific properties for different scheduling
classes. Specifically, a new pair of attributes allows specifying a
minimum and maximum utilization which the scheduler should consider
for a task.

Signed-off-by: Patrick Bellasi
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Tejun Heo
Cc: Rafael J. Wysocki
Cc: Vincent Guittot
Cc: Viresh Kumar
Cc: Paul Turner
Cc: Suren Baghdasaryan
Cc: Todd Kjos
Cc: Joel Fernandes
Cc: Steve Muckle
Cc: Juri Lelli
Cc: Dietmar Eggemann
Cc: Morten Rasmussen
Cc: linux-kernel@vger.kernel.org
Cc: linux...@vger.kernel.org

---
Changes in v3:
 Message-ID:
 - removed UCLAMP_NONE not used by this patch
 Others:
 - rebased on tip/sched/core
Changes in v2:
 - rebased on v4.18-rc4
 - move at the head of the series

As discussed at OSPM, using a [0..SCHED_CAPACITY_SCALE] range seems to
be acceptable. However, an additional patch has been added at the end
of the series which introduces a simple abstraction to use a more
generic [0..100] range.

At OSPM we also discarded the idea to "recycle" the usage of
sched_runtime and sched_period, which would have made the API too
complex for limited benefits.
---
 include/linux/sched.h            | 13 +++
 include/uapi/linux/sched.h       |  4 +-
 include/uapi/linux/sched/types.h | 64 +++-
 init/Kconfig                     | 19 ++
 kernel/sched/core.c              | 39 +++
 5 files changed, 129 insertions(+), 10 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index e0f4f56c9310..42f6439378e1 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -279,6 +279,14 @@ struct vtime {
 	u64			gtime;
 };

+enum uclamp_id {
+	UCLAMP_MIN = 0,	/* Minimum utilization */
+	UCLAMP_MAX,	/* Maximum utilization */
+
+	/* Utilization clamping constraints count */
+	UCLAMP_CNT
+};
+
 struct sched_info {
 #ifdef CONFIG_SCHED_INFO
 	/* Cumulative counters: */
@@ -649,6 +657,11 @@ struct task_struct {
 #endif
 	struct sched_dl_entity		dl;

+#ifdef CONFIG_UCLAMP_TASK
+	/* Utilization clamp values for this task */
+	int				uclamp[UCLAMP_CNT];
+#endif
+
 #ifdef CONFIG_PREEMPT_NOTIFIERS
 	/* List of struct preempt_notifier: */
 	struct hlist_head		preempt_notifiers;
diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index 22627f80063e..c27d6e81517b 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -50,9 +50,11 @@
 #define SCHED_FLAG_RESET_ON_FORK	0x01
 #define SCHED_FLAG_RECLAIM		0x02
 #define SCHED_FLAG_DL_OVERRUN		0x04
+#define SCHED_FLAG_UTIL_CLAMP		0x08

 #define SCHED_FLAG_ALL	(SCHED_FLAG_RESET_ON_FORK	| \
			 SCHED_FLAG_RECLAIM		| \
-			 SCHED_FLAG_DL_OVERRUN)
+			 SCHED_FLAG_DL_OVERRUN		| \
+			 SCHED_FLAG_UTIL_CLAMP)

 #endif /* _UAPI_LINUX_SCHED_H */
diff --git a/include/uapi/linux/sched/types.h b/include/uapi/linux/sched/types.h
index 10fbb8031930..7421cd25354d 100644
--- a/include/uapi/linux/sched/types.h
+++ b/include/uapi/linux/sched/types.h
@@ -21,8 +21,33 @@ struct sched_param {
  * the tasks may be useful for a wide variety of application fields, e.g.,
  * multimedia, streaming, automation and control, and many others.