Re: [PATCH v4 02/16] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups
On 13-Sep 21:14, Peter Zijlstra wrote:
> On Wed, Sep 12, 2018 at 06:52:02PM +0100, Patrick Bellasi wrote:
> > On 12-Sep 19:42, Peter Zijlstra wrote:
> > > On Wed, Sep 12, 2018 at 06:35:15PM +0100, Patrick Bellasi wrote:
> > > > On 12-Sep 18:12, Peter Zijlstra wrote:
> > > > >
> > > > > No idea; but if you want to go all fancy you can replace the whole
> > > > > uclamp_map thing with something like:
> > > > >
> > > > > struct uclamp_map {
> > > > >         union {
> > > > >                 struct {
> > > > >                         unsigned long v : 10;
> > > > >                         unsigned long c : BITS_PER_LONG - 10;
> > > > >                 };
> > > > >                 atomic_long_t s;
> > > > >         };
> > > > > };
> > > >
> > > > That sounds really cool and scary at the same time :)
> > > >
> > > > The v:10 requires that we never set SCHED_CAPACITY_SCALE > 1024,
> > > > or that we use it to track a percentage value (i.e. [0..100]).
> > >
> > > Or we pick 11 bits, it seems unlikely that capacity be larger than 2k.
> >
> > Just remembered a past experience where we had unaligned access traps
> > on some machine because... do you see any potential issue in using
> > bitfields like you suggest above?
> >
> > I'm thinking of:
> >
> >   commit 317d359df95d ("sched/core: Force proper alignment of 'struct
> >   util_est'")
>
> There should not be (and I'm still confused by that particular commit
> you reference). If we access everything through uclamp_map::s, and
> only use the bitfields to interpret the results, it all 'works'.

Yes, the problem above was different... still I was wondering if there
could be bitfield-related alignment issues lurking somewhere.

But, as you say, if we always R/W atomically via uclamp_map::s there
should be none.

> The tricky thing we did earlier was trying to use u64 accesses for 2
> u32 variables. And somehow ia64 didn't get the alignment right.

Right, np... sorry for the noise.

--
Patrick Bellasi
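The "always R/W atomically via uclamp_map::s" point can be illustrated in plain C11: because the bitfield view and the atomic word live in one union, they share storage and alignment, so as long as every load/store goes through ::s there is no independently-aligned bitfield access left to trap. This is only a userspace sketch, not the kernel code: `atomic_ulong` stands in for `atomic_long_t`, and the 11-bit value field follows the suggestion upthread.

```c
#include <stdatomic.h>
#include <limits.h>

#define BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))

struct uclamp_map {
	union {
		struct {
			unsigned long v : 11;               /* clamp value */
			unsigned long c : BITS_PER_LONG - 11; /* refcount */
		};
		atomic_ulong s; /* the only member we load/store through */
	};
};

/* The bitfields must not widen the struct beyond the atomic word:
 * the whole map is exactly one atomically-accessible long. */
_Static_assert(sizeof(struct uclamp_map) == sizeof(atomic_ulong),
	       "bitfield view and atomic word must coincide");
```

Readers snapshot the word once via `atomic_load(&map->s)` and then interpret the bits through the struct view; there is never a partial-field memory access.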
Re: [PATCH v4 02/16] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups
On 13-Sep 21:20, Peter Zijlstra wrote:
> On Wed, Sep 12, 2018 at 06:42:09PM +0100, Patrick Bellasi wrote:
> > On 12-Sep 18:24, Peter Zijlstra wrote:
> > > On Tue, Aug 28, 2018 at 02:53:10PM +0100, Patrick Bellasi wrote:
> > > > {
> > > > +	int group_id[UCLAMP_CNT] = { UCLAMP_NOT_VALID };
> > > > +	int lower_bound, upper_bound;
> > > > +	struct uclamp_se *uc_se;
> > > > +	int result = 0;
> > >
> > > I think the thing would become much more readable if you set
> > > lower/upper_bound right here.
> >
> > Actually it could also make sense to have them before the mutex ;)
>
> Indeed.
>
> +	upper_bound = (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MAX)
> +		      ? attr->sched_util_max
> +		      : p->uclamp[UCLAMP_MAX].value;
> +
> +	if (upper_bound == UCLAMP_NOT_VALID)
> +		upper_bound = SCHED_CAPACITY_SCALE;
> +	if (attr->sched_util_min > upper_bound) {
> +		result = -EINVAL;
> +		goto done;
> +	}
> +
> +	result = uclamp_group_find(UCLAMP_MIN, attr->sched_util_min);
> +	if (result == -ENOSPC) {
> +		pr_err(UCLAMP_ENOSPC_FMT, "MIN");
> +		goto done;
> +	}
> +	group_id[UCLAMP_MIN] = result;
> +	}
> +	if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MAX) {
> +		lower_bound = (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MIN)
> +			      ? attr->sched_util_min
> +			      : p->uclamp[UCLAMP_MIN].value;
> +
> +		if (lower_bound == UCLAMP_NOT_VALID)
> +			lower_bound = 0;
> +		if (attr->sched_util_max < lower_bound ||
> +		    attr->sched_util_max > SCHED_CAPACITY_SCALE) {
> +			result = -EINVAL;
> +			goto done;
> +		}
>
> That would end up something like:
>
> 	unsigned int lower_bound = p->uclamp[UCLAMP_MIN].value;
> 	unsigned int upper_bound = p->uclamp[UCLAMP_MAX].value;
>
> 	if (sched_flags & SCHED_FLAG_UTIL_CLAMP_MIN)
> 		lower_bound = attr->sched_util_min;
>
> 	if (sched_flags & SCHED_FLAG_UTIL_CLAMP_MAX)
> 		upper_bound = attr->sched_util_max;
>
> 	if (lower_bound > upper_bound ||
> 	    upper_bound > SCHED_CAPACITY_MAX)
> 		return -EINVAL;
>
> 	mutex_lock(...);

Yes... much cleaner, thanks.

--
Patrick Bellasi
Re: [PATCH v4 02/16] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups
On Wed, Sep 12, 2018 at 06:42:09PM +0100, Patrick Bellasi wrote:
> On 12-Sep 18:24, Peter Zijlstra wrote:
> > On Tue, Aug 28, 2018 at 02:53:10PM +0100, Patrick Bellasi wrote:
> > > {
> > > +	int group_id[UCLAMP_CNT] = { UCLAMP_NOT_VALID };
> > > +	int lower_bound, upper_bound;
> > > +	struct uclamp_se *uc_se;
> > > +	int result = 0;
> >
> > I think the thing would become much more readable if you set
> > lower/upper_bound right here.
>
> Actually it could also make sense to have them before the mutex ;)

Indeed.

+	upper_bound = (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MAX)
+		      ? attr->sched_util_max
+		      : p->uclamp[UCLAMP_MAX].value;
+
+	if (upper_bound == UCLAMP_NOT_VALID)
+		upper_bound = SCHED_CAPACITY_SCALE;
+	if (attr->sched_util_min > upper_bound) {
+		result = -EINVAL;
+		goto done;
+	}
+
+	result = uclamp_group_find(UCLAMP_MIN, attr->sched_util_min);
+	if (result == -ENOSPC) {
+		pr_err(UCLAMP_ENOSPC_FMT, "MIN");
+		goto done;
+	}
+	group_id[UCLAMP_MIN] = result;
+	}
+	if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MAX) {
+		lower_bound = (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MIN)
+			      ? attr->sched_util_min
+			      : p->uclamp[UCLAMP_MIN].value;
+
+		if (lower_bound == UCLAMP_NOT_VALID)
+			lower_bound = 0;
+		if (attr->sched_util_max < lower_bound ||
+		    attr->sched_util_max > SCHED_CAPACITY_SCALE) {
+			result = -EINVAL;
+			goto done;
+		}

That would end up something like:

	unsigned int lower_bound = p->uclamp[UCLAMP_MIN].value;
	unsigned int upper_bound = p->uclamp[UCLAMP_MAX].value;

	if (sched_flags & SCHED_FLAG_UTIL_CLAMP_MIN)
		lower_bound = attr->sched_util_min;

	if (sched_flags & SCHED_FLAG_UTIL_CLAMP_MAX)
		upper_bound = attr->sched_util_max;

	if (lower_bound > upper_bound ||
	    upper_bound > SCHED_CAPACITY_MAX)
		return -EINVAL;

	mutex_lock(...);
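The restructuring suggested above amounts to: normalize both bounds first (the task's current clamps as defaults, overridden by whichever attributes are present), then do one validity check before taking the mutex. A standalone userspace sketch of that check, with illustrative struct and flag definitions rather than the kernel's:

```c
#include <errno.h>

#define SCHED_CAPACITY_SCALE		1024
#define SCHED_FLAG_UTIL_CLAMP_MIN	0x01	/* illustrative flag values */
#define SCHED_FLAG_UTIL_CLAMP_MAX	0x02

/* Stand-in for the relevant sched_attr fields. */
struct clamp_attr {
	unsigned int sched_flags;
	unsigned int sched_util_min;
	unsigned int sched_util_max;
};

/*
 * Returns 0 when the requested clamps are mutually consistent,
 * -EINVAL otherwise. cur_min/cur_max are the task's current clamp
 * values, used as the default bound when that side is not being set.
 */
static int uclamp_check_bounds(const struct clamp_attr *attr,
			       unsigned int cur_min, unsigned int cur_max)
{
	unsigned int lower_bound = cur_min;
	unsigned int upper_bound = cur_max;

	if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MIN)
		lower_bound = attr->sched_util_min;
	if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MAX)
		upper_bound = attr->sched_util_max;

	if (lower_bound > upper_bound || upper_bound > SCHED_CAPACITY_SCALE)
		return -EINVAL;

	return 0;
}
```

For example, raising only the min to 600 on a task whose current max is 512 is rejected, while setting min=256 and max=512 together succeeds; no locking is needed for the rejection path.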
Re: [PATCH v4 02/16] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups
On Wed, Sep 12, 2018 at 06:52:02PM +0100, Patrick Bellasi wrote:
> On 12-Sep 19:42, Peter Zijlstra wrote:
> > On Wed, Sep 12, 2018 at 06:35:15PM +0100, Patrick Bellasi wrote:
> > > On 12-Sep 18:12, Peter Zijlstra wrote:
> > > >
> > > > No idea; but if you want to go all fancy you can replace the whole
> > > > uclamp_map thing with something like:
> > > >
> > > > struct uclamp_map {
> > > >         union {
> > > >                 struct {
> > > >                         unsigned long v : 10;
> > > >                         unsigned long c : BITS_PER_LONG - 10;
> > > >                 };
> > > >                 atomic_long_t s;
> > > >         };
> > > > };
> > >
> > > That sounds really cool and scary at the same time :)
> > >
> > > The v:10 requires that we never set SCHED_CAPACITY_SCALE > 1024,
> > > or that we use it to track a percentage value (i.e. [0..100]).
> >
> > Or we pick 11 bits, it seems unlikely that capacity be larger than 2k.
>
> Just remembered a past experience where we had unaligned access traps
> on some machine because... do you see any potential issue in using
> bitfields like you suggest above?
>
> I'm thinking of:
>
>   commit 317d359df95d ("sched/core: Force proper alignment of 'struct
>   util_est'")

There should not be (and I'm still confused by that particular commit
you reference). If we access everything through uclamp_map::s, and
only use the bitfields to interpret the results, it all 'works'.

The tricky thing we did earlier was trying to use u64 accesses for 2
u32 variables. And somehow ia64 didn't get the alignment right.
Re: [PATCH v4 02/16] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups
On 12-Sep 19:42, Peter Zijlstra wrote:
> On Wed, Sep 12, 2018 at 06:35:15PM +0100, Patrick Bellasi wrote:
> > On 12-Sep 18:12, Peter Zijlstra wrote:
> > >
> > > No idea; but if you want to go all fancy you can replace the whole
> > > uclamp_map thing with something like:
> > >
> > > struct uclamp_map {
> > >         union {
> > >                 struct {
> > >                         unsigned long v : 10;
> > >                         unsigned long c : BITS_PER_LONG - 10;
> > >                 };
> > >                 atomic_long_t s;
> > >         };
> > > };
> >
> > That sounds really cool and scary at the same time :)
> >
> > The v:10 requires that we never set SCHED_CAPACITY_SCALE > 1024,
> > or that we use it to track a percentage value (i.e. [0..100]).
>
> Or we pick 11 bits, it seems unlikely that capacity be larger than 2k.

Just remembered a past experience where we had unaligned access traps
on some machine because... do you see any potential issue in using
bitfields like you suggest above?

I'm thinking of:

  commit 317d359df95d ("sched/core: Force proper alignment of 'struct
  util_est'")

--
Patrick Bellasi
Re: [PATCH v4 02/16] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups
On Wed, Sep 12, 2018 at 06:35:15PM +0100, Patrick Bellasi wrote:
> On 12-Sep 18:12, Peter Zijlstra wrote:
> >
> > No idea; but if you want to go all fancy you can replace the whole
> > uclamp_map thing with something like:
> >
> > struct uclamp_map {
> >         union {
> >                 struct {
> >                         unsigned long v : 10;
> >                         unsigned long c : BITS_PER_LONG - 10;
> >                 };
> >                 atomic_long_t s;
> >         };
> > };
>
> That sounds really cool and scary at the same time :)
>
> The v:10 requires that we never set SCHED_CAPACITY_SCALE > 1024,
> or that we use it to track a percentage value (i.e. [0..100]).

Or we pick 11 bits, it seems unlikely that capacity be larger than 2k.

> One of the last patches introduces percentage values to userspace.
> But, I was considering that in kernel space we should always track
> full scale utilization values.
>
> The c:(BITS_PER_LONG-10) restricts the range of concurrently active
> SEs refcounting the same clamp value. Which, for some 32-bit systems,
> is only 4 million among tasks and cgroups... maybe still reasonable...

Yeah, on 32bit having 4M tasks seems 'unlikely'.
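Peter's follow-up in the thread (treat c == 0 as unused, and cmpxchg the whole word through uclamp_map::s) amounts to a lock-free "get" on the packed (value, refcount) pair. A hedged userspace sketch of that idea, using C11 atomics in place of the kernel's atomic_long_cmpxchg() and illustrative names throughout:

```c
#include <stdatomic.h>
#include <limits.h>

#define BITS_PER_LONG	(CHAR_BIT * sizeof(unsigned long))
#define VALUE_BITS	11

/* One word viewed either as (value, refcount) or as a plain long. */
union uclamp_word {
	struct {
		unsigned long v : VALUE_BITS;
		unsigned long c : BITS_PER_LONG - VALUE_BITS;
	};
	unsigned long word;
};

/*
 * Take a reference on slot *m for clamp 'value'. A slot with c == 0 is
 * unused and may be (re)claimed for any value; a busy slot only accepts
 * further references for its current value. Returns 0 on success, -1 if
 * the slot currently refcounts a different value.
 */
static int uclamp_map_get(_Atomic unsigned long *m, unsigned long value)
{
	unsigned long old = atomic_load(m);

	for (;;) {
		union uclamp_word w = { .word = old };

		if (w.c != 0 && w.v != value)
			return -1;	/* busy with another clamp value */
		w.v = value;
		w.c += 1;
		/* On failure, 'old' is refreshed and we retry. */
		if (atomic_compare_exchange_weak(m, &old, w.word))
			return 0;
	}
}
```

Because value and count travel in one word, a concurrent claimer cannot observe a half-initialized slot: either the compare-exchange installs (value, count+1) atomically or the loop retries with the fresh word.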
Re: [PATCH v4 02/16] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups
On 12-Sep 18:24, Peter Zijlstra wrote:
> On Tue, Aug 28, 2018 at 02:53:10PM +0100, Patrick Bellasi wrote:
> > static inline int __setscheduler_uclamp(struct task_struct *p,
> > 					const struct sched_attr *attr)
>
> But large for inline now.

Yes, Suren also already pointed that out... already gone in my v5 ;)

> > {
> > +	int group_id[UCLAMP_CNT] = { UCLAMP_NOT_VALID };
> > +	int lower_bound, upper_bound;
> > +	struct uclamp_se *uc_se;
> > +	int result = 0;
>
> I think the thing would become much more readable if you set
> lower/upper_bound right here.

Do you mean the bits I've ---8<---ed below ?

> >
> > +	mutex_lock(&uclamp_mutex);
> >
> > +	/* Find a valid group_id for each required clamp value */
> > +	if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MIN) {

---8<---
> > +		upper_bound = (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MAX)
> > +			      ? attr->sched_util_max
> > +			      : p->uclamp[UCLAMP_MAX].value;
> > +
> > +		if (upper_bound == UCLAMP_NOT_VALID)
> > +			upper_bound = SCHED_CAPACITY_SCALE;
> > +		if (attr->sched_util_min > upper_bound) {
> > +			result = -EINVAL;
> > +			goto done;
> > +		}
---8<---

Actually it could also make sense to have them before the mutex ;)

> > +
> > +		result = uclamp_group_find(UCLAMP_MIN, attr->sched_util_min);
> > +		if (result == -ENOSPC) {
> > +			pr_err(UCLAMP_ENOSPC_FMT, "MIN");
>
> AFAICT this is an unpriv part of the syscall; and you can spam the log
> without limits. Not good.

Good point, will check this better.

[...]

Cheers, Patrick

--
Patrick Bellasi
Re: [PATCH v4 02/16] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups
On 12-Sep 18:12, Peter Zijlstra wrote:
> On Wed, Sep 12, 2018 at 04:56:19PM +0100, Patrick Bellasi wrote:
> > On 12-Sep 15:49, Peter Zijlstra wrote:
> > > On Tue, Aug 28, 2018 at 02:53:10PM +0100, Patrick Bellasi wrote:
> > > >
> > > > +/**
> > > > + * uclamp_map: reference counts a utilization "clamp value"
> > > > + * @value:    the utilization "clamp value" required
> > > > + * @se_count: the number of scheduling entities requiring the "clamp value"
> > > > + * @se_lock:  serialize reference count updates by protecting se_count
> > >
> > > Why do you have a spinlock to serialize a single value? Don't we have
> > > atomics for that?
> >
> > There are some code paths where it's used to protect clamp groups
> > mapping and initialization, e.g.
> >
> >     uclamp_group_get()
> >         spin_lock()
> >             // initialize clamp group (if required) and then...
                   ^^^ This is actually a couple of function calls
> >             se_count += 1
> >         spin_unlock()
> >
> > Almost all these paths are triggered from user-space and protected
> > by a global uclamp_mutex, except the fork/exit paths.
> >
> > To serialize these paths I'm using the spinlock above, does it make
> > sense ? Can we use the global uclamp_mutex on forks/exit too ?
>
> OK, then your comment is misleading; it serializes both fields.

Yes... that definitely needs an update.

> > One additional observation is that, if in the future we want to add a
> > kernel-space API (e.g. a driver asking for a new clamp value), maybe
> > we will need to have a serialized non-sleeping uclamp_group_get() API ?
>
> No idea; but if you want to go all fancy you can replace the whole
> uclamp_map thing with something like:
>
> 	struct uclamp_map {
> 		union {
> 			struct {
> 				unsigned long v : 10;
> 				unsigned long c : BITS_PER_LONG - 10;
> 			};
> 			atomic_long_t s;
> 		};
> 	};

That sounds really cool and scary at the same time :)

The v:10 requires that we never set SCHED_CAPACITY_SCALE > 1024,
or that we use it to track a percentage value (i.e. [0..100]).

One of the last patches introduces percentage values to userspace.
But, I was considering that in kernel space we should always track
full scale utilization values.

The c:(BITS_PER_LONG-10) restricts the range of concurrently active
SEs refcounting the same clamp value. Which, for some 32-bit systems,
is only 4 million among tasks and cgroups... maybe still reasonable...

> And use uclamp_map::c == 0 as unused (as per normal refcount
> semantics) and atomic_long_cmpxchg() the whole thing using
> uclamp_map::s.

Yes... that could work for the uclamp_map updates but, as I noted
above, I think I have other calls serialized by that lock... will look
better into what you suggest, thanks!

[...]

> > > What's the purpose of that cacheline align statement?
> >
> > In uclamp_maps, we still need to scan the array when a clamp value is
> > changed from user-space, i.e. the cases reported above. Thus, that
> > alignment is just to ensure that we minimize the number of cache lines
> > used. Does that make sense ?
> >
> > Maybe that alignment is implicitly generated by the compiler ?
>
> It is not, but if it really is a slow path, we shouldn't care about
> alignment.

Ok, will remove it.

> > > Note that without that apparently superfluous lock, it would be 8*12 =
> > > 96 bytes, which is 1.5 lines and would indeed suggest you default to
> > > GROUP_COUNT=7 by default to fill 2 lines.
> >
> > Yes, will check better if we can count on just the uclamp_mutex.
>
> Well, if we don't care about performance (slow path) then keeping the
> lock is fine, just the comment and alignment are misleading.

Ok

[...]

Cheers, Patrick

--
Patrick Bellasi
Re: [PATCH v4 02/16] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups
On Tue, Aug 28, 2018 at 02:53:10PM +0100, Patrick Bellasi wrote:
> static inline int __setscheduler_uclamp(struct task_struct *p,
> 					const struct sched_attr *attr)

But large for inline now.

> {
> +	int group_id[UCLAMP_CNT] = { UCLAMP_NOT_VALID };
> +	int lower_bound, upper_bound;
> +	struct uclamp_se *uc_se;
> +	int result = 0;

I think the thing would become much more readable if you set
lower/upper_bound right here.

>
> +	mutex_lock(&uclamp_mutex);
>
> +	/* Find a valid group_id for each required clamp value */
> +	if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MIN) {
> +		upper_bound = (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MAX)
> +			      ? attr->sched_util_max
> +			      : p->uclamp[UCLAMP_MAX].value;
> +
> +		if (upper_bound == UCLAMP_NOT_VALID)
> +			upper_bound = SCHED_CAPACITY_SCALE;
> +		if (attr->sched_util_min > upper_bound) {
> +			result = -EINVAL;
> +			goto done;
> +		}
> +
> +		result = uclamp_group_find(UCLAMP_MIN, attr->sched_util_min);
> +		if (result == -ENOSPC) {
> +			pr_err(UCLAMP_ENOSPC_FMT, "MIN");

AFAICT this is an unpriv part of the syscall; and you can spam the log
without limits. Not good.

> +			goto done;
> +		}
> +		group_id[UCLAMP_MIN] = result;
> +	}
> +	if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MAX) {
> +		lower_bound = (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MIN)
> +			      ? attr->sched_util_min
> +			      : p->uclamp[UCLAMP_MIN].value;
> +
> +		if (lower_bound == UCLAMP_NOT_VALID)
> +			lower_bound = 0;
> +		if (attr->sched_util_max < lower_bound ||
> +		    attr->sched_util_max > SCHED_CAPACITY_SCALE) {
> +			result = -EINVAL;
> +			goto done;
> +		}
> +
> +		result = uclamp_group_find(UCLAMP_MAX, attr->sched_util_max);
> +		if (result == -ENOSPC) {
> +			pr_err(UCLAMP_ENOSPC_FMT, "MAX");
> +			goto done;
> +		}
> +		group_id[UCLAMP_MAX] = result;
> +	}
> +
> +	/* Update each required clamp group */
> +	if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MIN) {
> +		uc_se = &p->uclamp[UCLAMP_MIN];
> +		uclamp_group_get(UCLAMP_MIN, group_id[UCLAMP_MIN],
> +				 uc_se, attr->sched_util_min);
> +	}
> +	if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MAX) {
> +		uc_se = &p->uclamp[UCLAMP_MAX];
> +		uclamp_group_get(UCLAMP_MAX, group_id[UCLAMP_MAX],
> +				 uc_se, attr->sched_util_max);
> +	}
> +
> +done:
> +	mutex_unlock(&uclamp_mutex);
> +
> +	return result;
> +}
On Wed, Sep 12, 2018 at 04:56:19PM +0100, Patrick Bellasi wrote:
> On 12-Sep 15:49, Peter Zijlstra wrote:
> > On Tue, Aug 28, 2018 at 02:53:10PM +0100, Patrick Bellasi wrote:
> > > +/**
> > > + * uclamp_map: reference counts a utilization "clamp value"
> > > + * @value:    the utilization "clamp value" required
> > > + * @se_count: the number of scheduling entities requiring the "clamp value"
> > > + * @se_lock:  serialize reference count updates by protecting se_count
> >
> > Why do you have a spinlock to serialize a single value? Don't we have
> > atomics for that?
>
> There are some code paths where it's used to protect clamp groups
> mapping and initialization, e.g.
>
>   uclamp_group_get()
>     spin_lock()
>       // initialize clamp group (if required) and then...
>       se_count += 1
>     spin_unlock()
>
> Almost all these paths are triggered from user-space and protected
> by a global uclamp_mutex, but fork/exit paths.
>
> To serialize these paths I'm using the spinlock above, does it make
> sense? Can we use the global uclamp_mutex on forks/exit too?

OK, then your comment is misleading; it serializes both fields.

> One additional observation is that, if in the future we want to add a
> kernel space API (e.g. a driver asking for a new clamp value), maybe we
> will need to have a serialized non-sleeping uclamp_group_get() API?

No idea; but if you want to go all fancy you can replace the whole
uclamp_map thing with something like:

struct uclamp_map {
	union {
		struct {
			unsigned long v : 10;
			unsigned long c : BITS_PER_LONG - 10;
		};
		atomic_long_t s;
	};
};

And use uclamp_map::c == 0 as unused (as per normal refcount semantics)
and atomic_long_cmpxchg() the whole thing using uclamp_map::s.

> > > + *                          uclamp_maps is a matrix of
> > > + *          +------- UCLAMP_CNT by CONFIG_UCLAMP_GROUPS_COUNT+1 entries
> > > + *          |                                |
> > > + *          |                /---------------+---------------\
> > > + *          |               +------------+       +------------+
> > > + *          |  / UCLAMP_MIN | value      |       | value      |
> > > + *          |  |            | se_count   |.......| se_count   |
> > > + *          |  |            +------------+       +------------+
> > > + *          +--+            +------------+       +------------+
> > > + *             |            | value      |       | value      |
> > > + *             \ UCLAMP_MAX | se_count   |.......| se_count   |
> > > + *                          +-----^------+       +-----^------+
> > > + *                                |                    |
> > > + *                       uc_map = +                    |
> > > + *              &uclamp_maps[clamp_id][0]              +
> > > + *                                          clamp_value =
> > > + *                                            uc_map[group_id].value
> > > + */
> > > +static struct uclamp_map uclamp_maps[UCLAMP_CNT]
> > > +                                    [CONFIG_UCLAMP_GROUPS_COUNT + 1]
> > > +                                    cacheline_aligned_in_smp;
> > > +
> >
> > I'm still completely confused by all this.
> >
> > sizeof(uclamp_map) = 12
> >
> > that array is 2*6=12 of those, so the whole thing is 144 bytes. which is
> > more than 2 (64 byte) cachelines.
>
> This data structure is *not* used in the hot-path, that's why I did not
> care about fitting it exactly into few cache lines.
>
> It's used to map a user-space "clamp value" into a kernel-space "clamp
> group" when user-space:
>  - changes a task specific clamp value
>  - changes a cgroup clamp value
>  - a task forks/exits
>
> I assume we can consider all those as "slow" code paths, is that correct?

Yep.

> > What's the purpose of that cacheline align statement?
>
> In uclamp_maps, we still need to scan the array when a clamp value is
> changed from user-space, i.e. the cases reported above. Thus, that
> alignment is just to ensure that we minimize the number of cache lines
> used. Does that make sense?
>
> Maybe that alignment is implicitly generated by the compiler?

It is not, but if it really is a slow path, we shouldn't care about
alignment.

> > Note that without that apparently superfluous lock, it would be 8*12 =
> > 96 bytes, which is 1.5 lines and would indeed suggest you default to
> > GROUP_COUNT=7 by default to fill 2 lines.
>
> Yes, will check better if we can count on just the uclamp_mutex

Well, if we don't care about performance (slow path) then keeping the
lock is fine, just the comment and alignment are misleading.

> > Why are the min and max things torn up like that? I'm fairly sure I
> > asked some of that last time; but the above comments only try to explain
> > what, not why.
>
> We use that organization to speedup scanning for clamp values of the
> same clamp_id. That's more
On 12-Sep 15:49, Peter Zijlstra wrote:
> On Tue, Aug 28, 2018 at 02:53:10PM +0100, Patrick Bellasi wrote:
> > +/**
> > + * Utilization's clamp group
> > + *
> > + * A utilization clamp group maps a "clamp value" (value), i.e.
> > + * util_{min,max}, to a "clamp group index" (group_id).
> > + */
> > +struct uclamp_se {
> > +     unsigned int value;
> > +     unsigned int group_id;
> > +};
>
> > +/**
> > + * uclamp_map: reference counts a utilization "clamp value"
> > + * @value:    the utilization "clamp value" required
> > + * @se_count: the number of scheduling entities requiring the "clamp value"
> > + * @se_lock:  serialize reference count updates by protecting se_count
>
> Why do you have a spinlock to serialize a single value? Don't we have
> atomics for that?

There are some code paths where it's used to protect clamp groups
mapping and initialization, e.g.

  uclamp_group_get()
    spin_lock()
      // initialize clamp group (if required) and then...
      se_count += 1
    spin_unlock()

Almost all these paths are triggered from user-space and protected
by a global uclamp_mutex, but fork/exit paths.

To serialize these paths I'm using the spinlock above, does it make
sense? Can we use the global uclamp_mutex on forks/exit too?

One additional observation is that, if in the future we want to add a
kernel space API (e.g. a driver asking for a new clamp value), maybe we
will need to have a serialized non-sleeping uclamp_group_get() API?

> > + */
> > +struct uclamp_map {
> > +     int value;
> > +     int se_count;
> > +     raw_spinlock_t se_lock;
> > +};
> > +
> > +/**
> > + * uclamp_maps: maps each SE's "clamp value" into a CPU's "clamp group"
> > + *
> > + * Since only a limited number of different "clamp values" are supported, we
> > + * need to map each different clamp value into a "clamp group" (group_id) to
> > + * be used by the per-CPU accounting in the fast-path, when tasks are
> > + * enqueued and dequeued.
> > + * We also support different kinds of utilization clamping, min and max
> > + * utilization for example, each representing what we call a "clamp index"
> > + * (clamp_id).
> > + *
> > + * A matrix is thus required to map "clamp values" to "clamp groups"
> > + * (group_id), for each "clamp index" (clamp_id), where:
> > + * - rows are indexed by clamp_id and they collect the clamp groups for a
> > + *   given clamp index
> > + * - columns are indexed by group_id and they collect the clamp values which
> > + *   map to that clamp group
> > + *
> > + * Thus, the column index of a given (clamp_id, value) pair represents the
> > + * clamp group (group_id) used by the fast-path's per-CPU accounting.
> > + *
> > + * NOTE: first clamp group (group_id=0) is reserved for tracking of non
> > + * clamped tasks. Thus we allocate one more slot than the value of
> > + * CONFIG_UCLAMP_GROUPS_COUNT.
> > + *
> > + * Here is the map layout and, right below, how entries are accessed by the
> > + * following code.
> > + *
> > + *                          uclamp_maps is a matrix of
> > + *          +------- UCLAMP_CNT by CONFIG_UCLAMP_GROUPS_COUNT+1 entries
> > + *          |                                |
> > + *          |                /---------------+---------------\
> > + *          |               +------------+       +------------+
> > + *          |  / UCLAMP_MIN | value      |       | value      |
> > + *          |  |            | se_count   |.......| se_count   |
> > + *          |  |            +------------+       +------------+
> > + *          +--+            +------------+       +------------+
> > + *             |            | value      |       | value      |
> > + *             \ UCLAMP_MAX | se_count   |.......| se_count   |
> > + *                          +-----^------+       +-----^------+
> > + *                                |                    |
> > + *                       uc_map = +                    |
> > + *              &uclamp_maps[clamp_id][0]              +
> > + *                                          clamp_value =
> > + *                                            uc_map[group_id].value
> > + */
> > +static struct uclamp_map uclamp_maps[UCLAMP_CNT]
> > +                                    [CONFIG_UCLAMP_GROUPS_COUNT + 1]
> > +                                    cacheline_aligned_in_smp;
> > +
>
> I'm still completely confused by all this.
>
> sizeof(uclamp_map) = 12
>
> that array is 2*6=12 of those, so the whole thing is 144 bytes. which is
> more than 2 (64 byte) cachelines.

This data structure is *not* used in the hot-path, that's why I did not
care about fitting it exactly into few cache lines.

It's used to map a user-space "clamp value" into a kernel-space "clamp
group" when user-space:
 - changes a task specific clamp value
 - changes a cgroup clamp value
 - a task forks/exits

I assume we can consider all those as "slow" code paths, is that correct?

At enqueue/dequeue time we use instead struct uclamp_cpu, introduced
On Tue, Aug 28, 2018 at 02:53:10PM +0100, Patrick Bellasi wrote:
> +/**
> + * Utilization's clamp group
> + *
> + * A utilization clamp group maps a "clamp value" (value), i.e.
> + * util_{min,max}, to a "clamp group index" (group_id).
> + */
> +struct uclamp_se {
> +     unsigned int value;
> +     unsigned int group_id;
> +};

> +/**
> + * uclamp_map: reference counts a utilization "clamp value"
> + * @value:    the utilization "clamp value" required
> + * @se_count: the number of scheduling entities requiring the "clamp value"
> + * @se_lock:  serialize reference count updates by protecting se_count

Why do you have a spinlock to serialize a single value? Don't we have
atomics for that?

> + */
> +struct uclamp_map {
> +     int value;
> +     int se_count;
> +     raw_spinlock_t se_lock;
> +};
> +
> +/**
> + * uclamp_maps: maps each SE's "clamp value" into a CPU's "clamp group"
> + *
> + * Since only a limited number of different "clamp values" are supported, we
> + * need to map each different clamp value into a "clamp group" (group_id) to
> + * be used by the per-CPU accounting in the fast-path, when tasks are
> + * enqueued and dequeued.
> + * We also support different kinds of utilization clamping, min and max
> + * utilization for example, each representing what we call a "clamp index"
> + * (clamp_id).
> + *
> + * A matrix is thus required to map "clamp values" to "clamp groups"
> + * (group_id), for each "clamp index" (clamp_id), where:
> + * - rows are indexed by clamp_id and they collect the clamp groups for a
> + *   given clamp index
> + * - columns are indexed by group_id and they collect the clamp values which
> + *   map to that clamp group
> + *
> + * Thus, the column index of a given (clamp_id, value) pair represents the
> + * clamp group (group_id) used by the fast-path's per-CPU accounting.
> + *
> + * NOTE: first clamp group (group_id=0) is reserved for tracking of non
> + * clamped tasks. Thus we allocate one more slot than the value of
> + * CONFIG_UCLAMP_GROUPS_COUNT.
> + *
> + * Here is the map layout and, right below, how entries are accessed by the
> + * following code.
> + *
> + *                          uclamp_maps is a matrix of
> + *          +------- UCLAMP_CNT by CONFIG_UCLAMP_GROUPS_COUNT+1 entries
> + *          |                                |
> + *          |                /---------------+---------------\
> + *          |               +------------+       +------------+
> + *          |  / UCLAMP_MIN | value      |       | value      |
> + *          |  |            | se_count   |.......| se_count   |
> + *          |  |            +------------+       +------------+
> + *          +--+            +------------+       +------------+
> + *             |            | value      |       | value      |
> + *             \ UCLAMP_MAX | se_count   |.......| se_count   |
> + *                          +-----^------+       +-----^------+
> + *                                |                    |
> + *                       uc_map = +                    |
> + *              &uclamp_maps[clamp_id][0]              +
> + *                                          clamp_value =
> + *                                            uc_map[group_id].value
> + */
> +static struct uclamp_map uclamp_maps[UCLAMP_CNT]
> +                                    [CONFIG_UCLAMP_GROUPS_COUNT + 1]
> +                                    cacheline_aligned_in_smp;
> +

I'm still completely confused by all this.

sizeof(uclamp_map) = 12

that array is 2*6=12 of those, so the whole thing is 144 bytes. which is
more than 2 (64 byte) cachelines.

What's the purpose of that cacheline align statement?

Note that without that apparently superfluous lock, it would be 8*12 =
96 bytes, which is 1.5 lines and would indeed suggest you default to
GROUP_COUNT=7 by default to fill 2 lines.

Why are the min and max things torn up like that? I'm fairly sure I
asked some of that last time; but the above comments only try to explain
what, not why.
Hi Suren,

On 08-Sep 16:47, Suren Baghdasaryan wrote:

[...]

> > + * A clamp group is not free if there is at least one SE which is sing a clamp
>
> typo in the sentence

Right, s/is sing/is using/ +1

[...]

> > +static int
> > +uclamp_group_find(int clamp_id, unsigned int clamp_value)
> > +{
> > +     struct uclamp_map *uc_map = &uclamp_maps[clamp_id][0];
> > +     int free_group_id = UCLAMP_NOT_VALID;
> > +     unsigned int group_id = 0;
> > +
> > +     for ( ; group_id <= CONFIG_UCLAMP_GROUPS_COUNT; ++group_id) {
> > +             /* Keep track of first free clamp group */
> > +             if (uclamp_group_available(clamp_id, group_id)) {
> > +                     if (free_group_id == UCLAMP_NOT_VALID)
> > +                             free_group_id = group_id;
> > +                     continue;
> > +             }
>
> Not a big improvement but reordering the two conditions in this loop
> would avoid finding and recording free_group_id if the very first
> group is the one we are looking for.

Right, indeed with:

   uclamp_group_put()
   uclamp_group_reset()
   uclamp_group_init()

we always ensure that:

   uc_map[group_id].value == UCLAMP_NOT_VALID

for free groups. Thus, it should be safe to swap these two checks and we
likely save a few instructions for the common case of non clamped
tasks. +1

I'll also get the chance to remove the two simple comments. ;)

> > +             /* Return index of first group with same clamp value */
> > +             if (uc_map[group_id].value == clamp_value)
> > +                     return group_id;
> > +     }
> > +
> > +     if (likely(free_group_id != UCLAMP_NOT_VALID))
> > +             return free_group_id;
> > +
> > +     return -ENOSPC;
> > +}

[...]

> > +static inline void uclamp_group_put(int clamp_id, int group_id)
>
> Is the size and the number of invocations of this function small
> enough for inlining? Same goes for uclamp_group_get() and especially
> for __setscheduler_uclamp().

Right... yes, we could let the scheduler do its job and remove inline
from these functions... at least for those not in the critical path. +1

[...]

> > +     if (likely(uc_map[group_id].se_count))
> > +             uc_map[group_id].se_count -= 1;
> > +#ifdef SCHED_DEBUG
> > +     else {
>
> nit: no need for braces
>
> > +             WARN(1, "invalid SE clamp group [%d:%d] refcount\n",
> > +                  clamp_id, group_id);

Since the above statement is multi-line, we actually need the braces for
code-style requirements.

> > +     }
> > +#endif

[...]

> > +static void uclamp_fork(struct task_struct *p, bool reset)
> > +{
> > +     int clamp_id;
> > +
> > +     if (unlikely(!p->sched_class->uclamp_enabled))
> > +             return;
> > +
> > +     for (clamp_id = 0; clamp_id < UCLAMP_CNT; ++clamp_id) {
> > +             int next_group_id = p->uclamp[clamp_id].group_id;
> > +             struct uclamp_se *uc_se = &p->uclamp[clamp_id];
>
> Might be easier to read if after the above assignment you use
> uc_se->xxx instead of p->uclamp[clamp_id].xxx in the code below.

Yes, that's actually the intent of the above assignment... but I
forgot a couple of usages! +1

> > +
> > +             if (unlikely(reset)) {
> > +                     next_group_id = 0;
> > +                     p->uclamp[clamp_id].value = uclamp_none(clamp_id);
> > +             }
> > +
> > +             p->uclamp[clamp_id].group_id = UCLAMP_NOT_VALID;
> > +             uclamp_group_get(clamp_id, next_group_id, uc_se,
> > +                              p->uclamp[clamp_id].value);
> > +     }
> > +}

[...]

> Thanks,
> Suren.

Cheers, Patrick

--
#include

Patrick Bellasi
Re: [PATCH v4 02/16] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups
Hi Patrick! On Tue, Aug 28, 2018 at 6:53 AM, Patrick Bellasi wrote: > Utilization clamping requires each CPU to know which clamp values are > assigned to tasks that are currently RUNNABLE on that CPU. > Multiple tasks can be assigned the same clamp value and tasks with > different clamp values can be concurrently active on the same CPU. > Thus, a proper data structure is required to support a fast and > efficient aggregation of the clamp values required by the currently > RUNNABLE tasks. > > For this purpose we use a per-CPU array of reference counters, > where each slot is used to account how many tasks requiring a certain > clamp value are currently RUNNABLE on each CPU. > Each clamp value corresponds to a "clamp index" which identifies the > position within the array of reference counters. >
>                                     :
>      (user-space changes)           :      (kernel space / scheduler)
>                                     :
>             SLOW PATH               :             FAST PATH
>                                     :
>    task_struct::uclamp::value       :     sched/core::enqueue/dequeue
>                                     :         cpufreq_schedutil
>                                     :
>  +----------------+  +-----------------+  +-----------------+
>  |      TASK      |  |   CLAMP GROUP   |  |   CPU CLAMPS    |
>  +----------------+  +-----------------+  +-----------------+
>  |                |  | clamp_{min,max} |  | clamp_{min,max} |
>  | util_{min,max} |  |    se_count     |  |   tasks count   |
>  +----------------+  +-----------------+  +-----------------+
>                                     :
>           +-------------->          :     +-------------->
>  group_id = map(clamp_value)        :  ref_count(group_id)
>                                     :
>                                     :
> > Let's introduce the support to map tasks to "clamp groups". > Specifically we introduce the required functions to translate a > "clamp value" into a clamp's "group index" (group_id). > > Only a limited number of (different) clamp values are supported since: > 1. there are usually only a few classes of workloads for which it makes > sense to boost/limit to different frequencies, > e.g. background vs foreground, interactive vs low-priority > 2. it allows a simpler and more memory/time efficient tracking of > the per-CPU clamp values in the fast path. > > The number of possible different clamp values is currently defined at > compile time. 
Thus, setting a new clamp value for a task can result in > a -ENOSPC error in case this exceeds the maximum number of different > clamp values supported. > > Signed-off-by: Patrick Bellasi > Cc: Ingo Molnar > Cc: Peter Zijlstra > Cc: Paul Turner > Cc: Suren Baghdasaryan > Cc: Todd Kjos > Cc: Joel Fernandes > Cc: Juri Lelli > Cc: Quentin Perret > Cc: Dietmar Eggemann > Cc: Morten Rasmussen > Cc: linux-kernel@vger.kernel.org > Cc: linux...@vger.kernel.org > > --- > Changes in v4: > Message-ID: <20180814112509.gb2...@codeaurora.org> > - add uclamp_exit_task() to release clamp refcount from do_exit() > Message-ID: <20180816133249.GA2964@e110439-lin> > - keep the WARN but beautify that code a bit > Message-ID: <20180413082648.gp4...@hirez.programming.kicks-ass.net> > - move uclamp_enabled at the top of sched_class to keep it on the same > cache line of other main wakeup time callbacks > Others: > - init uclamp for the init_task and refcount its clamp groups > - add uclamp specific fork time code into uclamp_fork > - add support for SCHED_FLAG_RESET_ON_FORK > default clamps are now set for init_task and inherited/reset at > fork time (when the flag is set for the parent) > - enable uclamp only for FAIR tasks; the RT class will be enabled only > by a following patch which also integrates the class with schedutil > - define uclamp_maps cacheline_aligned_in_smp > - in uclamp_group_get() ensure to include uclamp_group_available() and > uclamp_group_init() into the atomic section defined by: > uc_map[next_group_id].se_lock > - do not use mutex_lock(&uclamp_mutex) in uclamp_exit_task, > which is also not needed since refcounting is already guarded by > the uc_map[group_id].se_lock spinlock > - rebased on v4.19-rc1 > > Changes in v3: > Message-ID: > > - rename UCLAMP_NONE into UCLAMP_NOT_VALID > - remove unnecessary checks in uclamp_group_find() > - add WARN on unlikely un-referenced decrement in uclamp_group_put() > - make __setscheduler_uclamp() able to set just one clamp value > - make 
__setscheduler_uclamp() fail if both clamps are required but > there are no clamp groups available for one of them > - remove uclamp_group_find() from uclamp_group_get() which now takes a > group_id as a parameter > Others: > - rebased on tip/sched/core > Changes in v2: > - rebased on v4.18-rc4 > - set UCLAMP_GROUPS_COUNT=2 by default > which allows to fit all
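The per-CPU bookkeeping the changelog describes can be modelled in userspace as follows. This is a hypothetical sketch, not the patch's code: value[] holds the clamp value each group tracks, tasks[] is the "tasks count" refcount bumped at enqueue/dequeue, and the max-aggregation rule in clamp_effective() is an assumption of this sketch — the quoted changelog only describes the refcounting.

```c
#include <assert.h>

#define UCLAMP_GROUPS 4

/* Hypothetical model of one CPU's clamp state from the changelog. */
struct cpu_clamp {
	int value[UCLAMP_GROUPS];   /* clamp value tracked by each group */
	int tasks[UCLAMP_GROUPS];   /* RUNNABLE tasks refcount per group */
};

/* Fast path: enqueue/dequeue just bump the per-group refcount. */
static void clamp_enqueue(struct cpu_clamp *cc, int group_id)
{
	cc->tasks[group_id]++;
}

static void clamp_dequeue(struct cpu_clamp *cc, int group_id)
{
	cc->tasks[group_id]--;
}

/*
 * One plausible aggregation: the effective clamp is the maximum value
 * among groups that currently have runnable tasks (assumption of this
 * sketch).
 */
static int clamp_effective(const struct cpu_clamp *cc)
{
	int max = 0;

	for (int g = 0; g < UCLAMP_GROUPS; g++)
		if (cc->tasks[g] > 0 && cc->value[g] > max)
			max = cc->value[g];
	return max;
}
```

The point of the group indirection is visible here: the fast path only increments or decrements one counter, while the (bounded, compile-time sized) array keeps aggregation a short fixed loop.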
Re: [PATCH v4 02/16] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups
On 06/09/18 14:48, Patrick Bellasi wrote: > Hi Juri! > > On 05-Sep 12:45, Juri Lelli wrote: > > Hi, > > > > On 28/08/18 14:53, Patrick Bellasi wrote: > > > > [...] > > > > > static inline int __setscheduler_uclamp(struct task_struct *p, > > > const struct sched_attr *attr) > > > { > > > - if (attr->sched_util_min > attr->sched_util_max) > > > - return -EINVAL; > > > - if (attr->sched_util_max > SCHED_CAPACITY_SCALE) > > > - return -EINVAL; > > > + int group_id[UCLAMP_CNT] = { UCLAMP_NOT_VALID }; > > > + int lower_bound, upper_bound; > > > + struct uclamp_se *uc_se; > > > + int result = 0; > > > > > > - p->uclamp[UCLAMP_MIN] = attr->sched_util_min; > > > - p->uclamp[UCLAMP_MAX] = attr->sched_util_max; > > > + mutex_lock(&uclamp_mutex); > > > > This is going to get called from an rcu_read_lock() section, which is a > > no-go for using mutexes: > > > > sys_sched_setattr -> > > rcu_read_lock() > > ... > > sched_setattr() -> > > __sched_setscheduler() -> > > ... > > __setscheduler_uclamp() -> > > ... > > mutex_lock() > > Right, great catch, thanks! > > > Guess you could fix the issue by getting the task struct after > > find_process_by_pid() in sys_sched_setattr() and then calling sched_setattr() > > after rcu_read_unlock() (putting the task struct at the end). Peter > > actually suggested this mod to solve a different issue. > > I guess you mean something like this? > > ---8<--- > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -5792,10 +5792,15 @@ SYSCALL_DEFINE3(sched_setattr, pid_t, pid, struct > sched_attr __user *, uattr, > rcu_read_lock(); > retval = -ESRCH; > p = find_process_by_pid(pid); > - if (p != NULL) > - retval = sched_setattr(p, &attr); > + if (likely(p)) > + get_task_struct(p); > rcu_read_unlock(); > > + if (likely(p)) { > + retval = sched_setattr(p, &attr); > + put_task_struct(p); > + } > + > return retval; > } > ---8<--- This should do the job, yes.
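The shape of the accepted fix — pin the task with a reference inside the RCU read section, drop the lock, then do the work that may sleep — can be modelled outside the kernel like this. The toy refcounted task, get_task()/put_task(), and the -3/-ESRCH stand-in are hypothetical; the RCU calls are shown only as comments.

```c
#include <assert.h>

/* Toy refcounted task, standing in for task_struct usage counting. */
struct task { int refcount; };

static void get_task(struct task *p) { p->refcount++; }
static void put_task(struct task *p) { p->refcount--; }

/*
 * Shape of the fix: take a reference while the (notional) RCU read
 * section guarantees the task is alive, unlock, and only then call
 * into the path that may sleep on a mutex.
 */
static int setattr_fixed(struct task *p /* result of a pid lookup */)
{
	int retval = -3; /* stand-in for -ESRCH */

	/* rcu_read_lock(); */
	if (p)
		get_task(p);		/* pin the task before unlocking */
	/* rcu_read_unlock(); */

	if (p) {
		retval = 0;		/* sleeping work happens safely here */
		put_task(p);		/* drop the pin when done */
	}
	return retval;
}
```

The reference keeps the task from being freed between the end of the RCU section and the sleeping call, which is exactly what get_task_struct()/put_task_struct() do in the quoted patch.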
Re: [PATCH v4 02/16] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups
On 06-Sep 10:17, Juri Lelli wrote: > On 28/08/18 14:53, Patrick Bellasi wrote: > > [...] > > > static inline int __setscheduler_uclamp(struct task_struct *p, > > const struct sched_attr *attr) > > { > > - if (attr->sched_util_min > attr->sched_util_max) > > - return -EINVAL; > > - if (attr->sched_util_max > SCHED_CAPACITY_SCALE) > > - return -EINVAL; > > + int group_id[UCLAMP_CNT] = { UCLAMP_NOT_VALID }; > > + int lower_bound, upper_bound; > > + struct uclamp_se *uc_se; > > + int result = 0; > > > > - p->uclamp[UCLAMP_MIN] = attr->sched_util_min; > > - p->uclamp[UCLAMP_MAX] = attr->sched_util_max; > > + mutex_lock(&uclamp_mutex); > > > > - return 0; > > + /* Find a valid group_id for each required clamp value */ > > + if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MIN) { > > + upper_bound = (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MAX) > > + ? attr->sched_util_max > > + : p->uclamp[UCLAMP_MAX].value; > > + > > + if (upper_bound == UCLAMP_NOT_VALID) > > + upper_bound = SCHED_CAPACITY_SCALE; > > + if (attr->sched_util_min > upper_bound) { > > + result = -EINVAL; > > + goto done; > > + } > > + > > + result = uclamp_group_find(UCLAMP_MIN, attr->sched_util_min); > > + if (result == -ENOSPC) { > > + pr_err(UCLAMP_ENOSPC_FMT, "MIN"); > > + goto done; > > + } > > + group_id[UCLAMP_MIN] = result; > > + } > > + if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MAX) { > > + lower_bound = (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MIN) > > + ? attr->sched_util_min > > + : p->uclamp[UCLAMP_MIN].value; > > + > > + if (lower_bound == UCLAMP_NOT_VALID) > > + lower_bound = 0; > > + if (attr->sched_util_max < lower_bound || > > + attr->sched_util_max > SCHED_CAPACITY_SCALE) { > > + result = -EINVAL; > > + goto done; > > + } > > + > > + result = uclamp_group_find(UCLAMP_MAX, attr->sched_util_max); > > + if (result == -ENOSPC) { > > + pr_err(UCLAMP_ENOSPC_FMT, "MAX"); > > + goto done; > > + } > > + group_id[UCLAMP_MAX] = result; > > + } > > Don't you have to reset result to 0 here (it seems what follows cannot > fail anymore)? Otherwise this function will return the latest > uclamp_group_find return value, which will be interpreted as error if > not 0. Yes, weird... it used to work in my tests just because I return from the caller on !0 and: - that's just good enough to set the uclamps - my current rt-app integration does not check the syscall return value - my tests never request the clamp_group mapped on group_id=0 :( Will fix this, the rt-app integration, and the tests! Cheers, Patrick -- #include <best/regards.h> Patrick Bellasi
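The bug Juri spotted has a simple shape, sketched below with hypothetical stand-ins (the stub lookup and the `fixed` switch exist only to demonstrate both behaviors): after the lookup phase, `result` still holds the last group id returned by the find step, so without a reset a fully successful call returns a nonzero group id as if it were an error code.

```c
#include <assert.h>

#define ENOSPC 28

/* Hypothetical stand-in for uclamp_group_find(): value -> group id. */
static int uclamp_group_find_stub(int clamp_value)
{
	return clamp_value / 512;	/* e.g. 1024 -> group 2, 100 -> group 0 */
}

/*
 * Tail of a __setscheduler_uclamp()-like flow; 'fixed' toggles the
 * missing reset so the broken and corrected behaviors can be compared.
 */
static int setscheduler_uclamp_tail(int util_min, int util_max, int fixed)
{
	int result;

	result = uclamp_group_find_stub(util_min);
	if (result == -ENOSPC)
		return result;
	result = uclamp_group_find_stub(util_max);
	if (result == -ENOSPC)
		return result;

	if (fixed)
		result = 0;	/* the missing reset Juri asks about */

	/* ... uclamp_group_get() calls, which cannot fail ... */
	return result;
}
```

Without the reset, a caller treating any nonzero return as failure rejects a perfectly valid request whenever the max clamp mapped to a group other than 0, which is exactly why Patrick's tests only passed while they never left group 0.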
Re: [PATCH v4 02/16] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups
Hi Juri! On 05-Sep 12:45, Juri Lelli wrote: > Hi, > > On 28/08/18 14:53, Patrick Bellasi wrote: > > [...] > > > static inline int __setscheduler_uclamp(struct task_struct *p, > > const struct sched_attr *attr) > > { > > - if (attr->sched_util_min > attr->sched_util_max) > > - return -EINVAL; > > - if (attr->sched_util_max > SCHED_CAPACITY_SCALE) > > - return -EINVAL; > > + int group_id[UCLAMP_CNT] = { UCLAMP_NOT_VALID }; > > + int lower_bound, upper_bound; > > + struct uclamp_se *uc_se; > > + int result = 0; > > > > - p->uclamp[UCLAMP_MIN] = attr->sched_util_min; > > - p->uclamp[UCLAMP_MAX] = attr->sched_util_max; > > + mutex_lock(&uclamp_mutex); > > This is going to get called from an rcu_read_lock() section, which is a > no-go for using mutexes: > > sys_sched_setattr -> > rcu_read_lock() > ... > sched_setattr() -> > __sched_setscheduler() -> > ... > __setscheduler_uclamp() -> > ... > mutex_lock() Right, great catch, thanks! > Guess you could fix the issue by getting the task struct after > find_process_by_pid() in sys_sched_setattr() and then calling sched_setattr() > after rcu_read_unlock() (putting the task struct at the end). Peter > actually suggested this mod to solve a different issue. I guess you mean something like this? ---8<--- --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5792,10 +5792,15 @@ SYSCALL_DEFINE3(sched_setattr, pid_t, pid, struct sched_attr __user *, uattr, rcu_read_lock(); retval = -ESRCH; p = find_process_by_pid(pid); - if (p != NULL) - retval = sched_setattr(p, &attr); + if (likely(p)) + get_task_struct(p); rcu_read_unlock(); + if (likely(p)) { + retval = sched_setattr(p, &attr); + put_task_struct(p); + } + return retval; } ---8<--- Cheers, Patrick -- #include <best/regards.h> Patrick Bellasi
Re: [PATCH v4 02/16] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups
On 28/08/18 14:53, Patrick Bellasi wrote: [...] > static inline int __setscheduler_uclamp(struct task_struct *p, > const struct sched_attr *attr) > { > - if (attr->sched_util_min > attr->sched_util_max) > - return -EINVAL; > - if (attr->sched_util_max > SCHED_CAPACITY_SCALE) > - return -EINVAL; > + int group_id[UCLAMP_CNT] = { UCLAMP_NOT_VALID }; > + int lower_bound, upper_bound; > + struct uclamp_se *uc_se; > + int result = 0; > > - p->uclamp[UCLAMP_MIN] = attr->sched_util_min; > - p->uclamp[UCLAMP_MAX] = attr->sched_util_max; > + mutex_lock(&uclamp_mutex); > > - return 0; > + /* Find a valid group_id for each required clamp value */ > + if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MIN) { > + upper_bound = (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MAX) > + ? attr->sched_util_max > + : p->uclamp[UCLAMP_MAX].value; > + > + if (upper_bound == UCLAMP_NOT_VALID) > + upper_bound = SCHED_CAPACITY_SCALE; > + if (attr->sched_util_min > upper_bound) { > + result = -EINVAL; > + goto done; > + } > + > + result = uclamp_group_find(UCLAMP_MIN, attr->sched_util_min); > + if (result == -ENOSPC) { > + pr_err(UCLAMP_ENOSPC_FMT, "MIN"); > + goto done; > + } > + group_id[UCLAMP_MIN] = result; > + } > + if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MAX) { > + lower_bound = (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MIN) > + ? attr->sched_util_min > + : p->uclamp[UCLAMP_MIN].value; > + > + if (lower_bound == UCLAMP_NOT_VALID) > + lower_bound = 0; > + if (attr->sched_util_max < lower_bound || > + attr->sched_util_max > SCHED_CAPACITY_SCALE) { > + result = -EINVAL; > + goto done; > + } > + > + result = uclamp_group_find(UCLAMP_MAX, attr->sched_util_max); > + if (result == -ENOSPC) { > + pr_err(UCLAMP_ENOSPC_FMT, "MAX"); > + goto done; > + } > + group_id[UCLAMP_MAX] = result; > + } Don't you have to reset result to 0 here (it seems what follows cannot fail anymore)? Otherwise this function will return the latest uclamp_group_find return value, which will be interpreted as error if not 0. 
> + > + /* Update each required clamp group */ > + if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MIN) { > + uc_se = &p->uclamp[UCLAMP_MIN]; > + uclamp_group_get(UCLAMP_MIN, group_id[UCLAMP_MIN], > + uc_se, attr->sched_util_min); > + } > + if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MAX) { > + uc_se = &p->uclamp[UCLAMP_MAX]; > + uclamp_group_get(UCLAMP_MAX, group_id[UCLAMP_MAX], > + uc_se, attr->sched_util_max); > + } > + > +done: > + mutex_unlock(&uclamp_mutex); > + > + return result; > +} Best, - Juri
Re: [PATCH v4 02/16] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups
Hi, On 28/08/18 14:53, Patrick Bellasi wrote: [...] > static inline int __setscheduler_uclamp(struct task_struct *p, > const struct sched_attr *attr) > { > - if (attr->sched_util_min > attr->sched_util_max) > - return -EINVAL; > - if (attr->sched_util_max > SCHED_CAPACITY_SCALE) > - return -EINVAL; > + int group_id[UCLAMP_CNT] = { UCLAMP_NOT_VALID }; > + int lower_bound, upper_bound; > + struct uclamp_se *uc_se; > + int result = 0; > > - p->uclamp[UCLAMP_MIN] = attr->sched_util_min; > - p->uclamp[UCLAMP_MAX] = attr->sched_util_max; > + mutex_lock(&uclamp_mutex); This is going to get called from an rcu_read_lock() section, which is a no-go for using mutexes: sys_sched_setattr -> rcu_read_lock() ... sched_setattr() -> __sched_setscheduler() -> ... __setscheduler_uclamp() -> ... mutex_lock() Guess you could fix the issue by getting the task struct after find_process_by_pid() in sys_sched_setattr() and then calling sched_setattr() after rcu_read_unlock() (putting the task struct at the end). Peter actually suggested this mod to solve a different issue. Best, - Juri