Re: [RFC v3 5/5] sched/{core,cpufreq_schedutil}: add capacity clamping for RT/DL tasks

2017-03-16 Thread Juri Lelli

On 16/03/17 09:58, Joel Fernandes wrote:
> On Thu, Mar 16, 2017 at 5:44 AM, Juri Lelli  wrote:
> > On 16/03/17 12:27, Patrick Bellasi wrote:
> >> On 16-Mar 11:16, Juri Lelli wrote:
> >> > On 15/03/17 16:40, Joel Fernandes wrote:
> >> > > On Wed, Mar 15, 2017 at 9:24 AM, Juri Lelli  wrote:
> >> > > [..]
> >> > > >
> >> > > >> > However, trying to quickly summarize how that would work (for who is
> >> > > >> > already somewhat familiar with reclaiming bits):
> >> > > >> >
> >> > > >> >  - a task utilization contribution is accounted for (at rq level) as
> >> > > >> >    soon as it wakes up for the first time in a new period
> >> > > >> >  - its contribution is then removed after the 0lag time (or when the
> >> > > >> >    task gets throttled)
> >> > > >> >  - frequency transitions are triggered accordingly
> >> > > >> >
> >> > > >> > So, I don't see why triggering a go down request after the 0lag time
> >> > > >> > expired and quickly reacting to tasks waking up would have create
> >> > > >> > problems in your case?
> >> > > >>
> >> > > >> In my experience, the 'reacting to tasks' bit doesn't work very well.
> >> > > >
> >> > > > Humm.. but in this case we won't be 'reacting', we will be
> >> > > > 'anticipating' tasks' needs, right?
> >> > >
> >> > > Are you saying we will start ramping frequency before the next
> >> > > activation so that we're ready for it?
> >> > >
> >> >
> >> > I'm saying that there is no need to ramp, simply select the frequency
> >> > that is needed for a task (or a set of them).
> >> >
> >> > > If not, it sounds like it will only make the frequency request on the
> >> > > next activation when the Active bandwidth increases due to the task
> >> > > waking up. By then task has already started to run, right?
> >> > >
> >> >
> >> > When the task is enqueued back we select the frequency considering its
> >> > bandwidth request (and the bandwidth/utilization of the others). So,
> >> > when it actually starts running it will already have enough capacity to
> >> > finish in time.
> >>
> >> Here we are factoring out the time required to actually switch to the
> >> required OPP. I think Joel was referring to this time.
> >>
> 
> Yes, that's what I meant.
> 
> >
> > Right. But, this is an HW limitation. It seems a problem that every
> > scheduler driven decision will have to take into account. So, doesn't
> > make more sense to let the driver (or the governor shim layer) introduce
> > some sort of hysteresis to frequency changes if needed?
> 
> The problem IMO which Hysterisis in the governor will not help is what
> if you had a DL task that is not waking up for several periods and
> then wakes up, then for that wake up, we would still be subject to the
> HW limitation of time taken to switch to needed OPP. Right?
> 

True, but in this case the problem is that you cannot really predict the
future anyway. So, if your HW is so slow to react that it always causes
latency problems, then I guess you'll be forced to statically raise your
min_freq value to cope with that HW limitation, independently of
scheduling policies/heuristics?

OTOH, hysteresis, when properly tuned, should cover the 'normal' cases.
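
As a rough illustration of the kind of hysteresis meant here, below is a
minimal sketch in C. It is not taken from schedutil or from any driver:
struct freq_hyst, hyst_update() and the down_hold parameter are hypothetical
names. The assumed policy is to honour increases immediately and to honour a
decrease only once requests have stayed below the current frequency for
down_hold nanoseconds.

```c
#include <stdint.h>

/* Hypothetical governor-side filter; times in ns, frequencies in kHz. */
struct freq_hyst {
	uint64_t cur_freq;   /* frequency currently requested from the HW */
	uint64_t last_high;  /* time of the last request at or above cur_freq */
	uint64_t down_hold;  /* how long lower requests must persist */
};

/* Raise immediately; lower only after requests stayed low for down_hold. */
static uint64_t hyst_update(struct freq_hyst *h, uint64_t req, uint64_t now)
{
	if (req >= h->cur_freq) {
		/* Going up (or staying): apply at once, restart the hold. */
		h->cur_freq = req;
		h->last_high = now;
	} else if (now - h->last_high >= h->down_hold) {
		/* Requests have been low for long enough: go down. */
		h->cur_freq = req;
		h->last_high = now;
	}
	return h->cur_freq;
}
```

With down_hold longer than the idle gap of a periodic task, the frequency
stays put across the gap; a task that sleeps for several periods still pays
the full switching latency on its next wakeup.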

> >> That time cannot really be eliminated but from having faster OOP
> >> swiching HW support. Still, jumping strating to the "optimal" OPP
> >> instead of rumping up is a big improvement.
> 
> Yes I think so.
> 
> Thanks,
> Joel

Re: [RFC v3 5/5] sched/{core,cpufreq_schedutil}: add capacity clamping for RT/DL tasks

2017-03-16 Thread Joel Fernandes

On Thu, Mar 16, 2017 at 5:44 AM, Juri Lelli  wrote:
> On 16/03/17 12:27, Patrick Bellasi wrote:
>> On 16-Mar 11:16, Juri Lelli wrote:
>> > On 15/03/17 16:40, Joel Fernandes wrote:
>> > > On Wed, Mar 15, 2017 at 9:24 AM, Juri Lelli  wrote:
>> > > [..]
>> > > >
>> > > >> > However, trying to quickly summarize how that would work (for who is
>> > > >> > already somewhat familiar with reclaiming bits):
>> > > >> >
>> > > >> >  - a task utilization contribution is accounted for (at rq level) as
>> > > >> >    soon as it wakes up for the first time in a new period
>> > > >> >  - its contribution is then removed after the 0lag time (or when the
>> > > >> >    task gets throttled)
>> > > >> >  - frequency transitions are triggered accordingly
>> > > >> >
>> > > >> > So, I don't see why triggering a go down request after the 0lag time
>> > > >> > expired and quickly reacting to tasks waking up would have create
>> > > >> > problems in your case?
>> > > >>
>> > > >> In my experience, the 'reacting to tasks' bit doesn't work very well.
>> > > >
>> > > > Humm.. but in this case we won't be 'reacting', we will be
>> > > > 'anticipating' tasks' needs, right?
>> > >
>> > > Are you saying we will start ramping frequency before the next
>> > > activation so that we're ready for it?
>> > >
>> >
>> > I'm saying that there is no need to ramp, simply select the frequency
>> > that is needed for a task (or a set of them).
>> >
>> > > If not, it sounds like it will only make the frequency request on the
>> > > next activation when the Active bandwidth increases due to the task
>> > > waking up. By then task has already started to run, right?
>> > >
>> >
>> > When the task is enqueued back we select the frequency considering its
>> > bandwidth request (and the bandwidth/utilization of the others). So,
>> > when it actually starts running it will already have enough capacity to
>> > finish in time.
>>
>> Here we are factoring out the time required to actually switch to the
>> required OPP. I think Joel was referring to this time.
>>

Yes, that's what I meant.

>
> Right. But, this is an HW limitation. It seems a problem that every
> scheduler driven decision will have to take into account. So, doesn't
> make more sense to let the driver (or the governor shim layer) introduce
> some sort of hysteresis to frequency changes if needed?

The problem IMO, which hysteresis in the governor will not help with, is:
what if you had a DL task that does not wake up for several periods and
then wakes up? For that wakeup, we would still be subject to the
HW limitation of the time taken to switch to the needed OPP. Right?

>> That time cannot really be eliminated but from having faster OOP
>> swiching HW support. Still, jumping strating to the "optimal" OPP
>> instead of rumping up is a big improvement.

Yes I think so.

Thanks,
Joel

Re: [RFC v3 5/5] sched/{core,cpufreq_schedutil}: add capacity clamping for RT/DL tasks

2017-03-16 Thread Juri Lelli

On 16/03/17 12:27, Patrick Bellasi wrote:
> On 16-Mar 11:16, Juri Lelli wrote:
> > On 15/03/17 16:40, Joel Fernandes wrote:
> > > On Wed, Mar 15, 2017 at 9:24 AM, Juri Lelli  wrote:
> > > [..]
> > > >
> > > >> > However, trying to quickly summarize how that would work (for who is
> > > >> > already somewhat familiar with reclaiming bits):
> > > >> >
> > > >> >  - a task utilization contribution is accounted for (at rq level) as
> > > >> >    soon as it wakes up for the first time in a new period
> > > >> >  - its contribution is then removed after the 0lag time (or when the
> > > >> >    task gets throttled)
> > > >> >  - frequency transitions are triggered accordingly
> > > >> >
> > > >> > So, I don't see why triggering a go down request after the 0lag time
> > > >> > expired and quickly reacting to tasks waking up would have create
> > > >> > problems in your case?
> > > >>
> > > >> In my experience, the 'reacting to tasks' bit doesn't work very well.
> > > >
> > > > Humm.. but in this case we won't be 'reacting', we will be
> > > > 'anticipating' tasks' needs, right?
> > > 
> > > Are you saying we will start ramping frequency before the next
> > > activation so that we're ready for it?
> > > 
> > 
> > I'm saying that there is no need to ramp, simply select the frequency
> > that is needed for a task (or a set of them).
> > 
> > > If not, it sounds like it will only make the frequency request on the
> > > next activation when the Active bandwidth increases due to the task
> > > waking up. By then task has already started to run, right?
> > > 
> > 
> > When the task is enqueued back we select the frequency considering its
> > bandwidth request (and the bandwidth/utilization of the others). So,
> > when it actually starts running it will already have enough capacity to
> > finish in time.
> 
> Here we are factoring out the time required to actually switch to the
> required OPP. I think Joel was referring to this time.
> 

Right. But this is a HW limitation. It seems to be a problem that every
scheduler-driven decision will have to take into account. So, doesn't it
make more sense to let the driver (or the governor shim layer) introduce
some sort of hysteresis to frequency changes if needed?

> That time cannot really be eliminated but from having faster OOP
> swiching HW support. Still, jumping strating to the "optimal" OPP
> instead of rumping up is a big improvement.
> 
> 
> -- 
> #include 
> 
> Patrick Bellasi

Re: [RFC v3 5/5] sched/{core,cpufreq_schedutil}: add capacity clamping for RT/DL tasks

2017-03-16 Thread Patrick Bellasi

On 16-Mar 11:16, Juri Lelli wrote:
> On 15/03/17 16:40, Joel Fernandes wrote:
> > On Wed, Mar 15, 2017 at 9:24 AM, Juri Lelli  wrote:
> > [..]
> > >
> > >> > However, trying to quickly summarize how that would work (for who is
> > >> > already somewhat familiar with reclaiming bits):
> > >> >
> > >> >  - a task utilization contribution is accounted for (at rq level) as
> > >> >    soon as it wakes up for the first time in a new period
> > >> >  - its contribution is then removed after the 0lag time (or when the
> > >> >    task gets throttled)
> > >> >  - frequency transitions are triggered accordingly
> > >> >
> > >> > So, I don't see why triggering a go down request after the 0lag time
> > >> > expired and quickly reacting to tasks waking up would have create
> > >> > problems in your case?
> > >>
> > >> In my experience, the 'reacting to tasks' bit doesn't work very well.
> > >
> > > Humm.. but in this case we won't be 'reacting', we will be
> > > 'anticipating' tasks' needs, right?
> > 
> > Are you saying we will start ramping frequency before the next
> > activation so that we're ready for it?
> > 
> 
> I'm saying that there is no need to ramp, simply select the frequency
> that is needed for a task (or a set of them).
> 
> > If not, it sounds like it will only make the frequency request on the
> > next activation when the Active bandwidth increases due to the task
> > waking up. By then task has already started to run, right?
> > 
> 
> When the task is enqueued back we select the frequency considering its
> bandwidth request (and the bandwidth/utilization of the others). So,
> when it actually starts running it will already have enough capacity to
> finish in time.

Here we are factoring out the time required to actually switch to the
required OPP. I think Joel was referring to this time.

That time cannot really be eliminated except by having HW support for
faster OPP switching. Still, jumping straight to the "optimal" OPP
instead of ramping up is a big improvement.


-- 
#include 

Patrick Bellasi

Re: [RFC v3 5/5] sched/{core,cpufreq_schedutil}: add capacity clamping for RT/DL tasks

2017-03-16 Thread Juri Lelli

On 15/03/17 16:40, Joel Fernandes wrote:
> On Wed, Mar 15, 2017 at 9:24 AM, Juri Lelli  wrote:
> [..]
> >
> >> > However, trying to quickly summarize how that would work (for who is
> >> > already somewhat familiar with reclaiming bits):
> >> >
> >> >  - a task utilization contribution is accounted for (at rq level) as
> >> >    soon as it wakes up for the first time in a new period
> >> >  - its contribution is then removed after the 0lag time (or when the
> >> >    task gets throttled)
> >> >  - frequency transitions are triggered accordingly
> >> >
> >> > So, I don't see why triggering a go down request after the 0lag time
> >> > expired and quickly reacting to tasks waking up would have create
> >> > problems in your case?
> >>
> >> In my experience, the 'reacting to tasks' bit doesn't work very well.
> >
> > Humm.. but in this case we won't be 'reacting', we will be
> > 'anticipating' tasks' needs, right?
> 
> Are you saying we will start ramping frequency before the next
> activation so that we're ready for it?
> 

I'm saying that there is no need to ramp, simply select the frequency
that is needed for a task (or a set of them).

> If not, it sounds like it will only make the frequency request on the
> next activation when the Active bandwidth increases due to the task
> waking up. By then task has already started to run, right?
> 

When the task is enqueued back we select the frequency considering its
bandwidth request (and the bandwidth/utilization of the others). So,
when it actually starts running it will already have enough capacity to
finish in time.
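
To make the "select, don't ramp" idea concrete, here is a minimal sketch in
C. It is an illustration under stated assumptions, not the code from the
(still unposted) patches: select_freq_khz() and CAPACITY_SCALE are
hypothetical names, utilization is expressed on a 0..1024 scale, and the
frequency is assumed to scale linearly with the requested utilization, with
no extra margin.

```c
#include <stdint.h>

#define CAPACITY_SCALE 1024ULL   /* full CPU capacity, illustrative scale */

/*
 * Pick a frequency proportional to the total requested utilization:
 * the DL bandwidth of the enqueued tasks plus the utilization of the
 * others. Both inputs are in [0, CAPACITY_SCALE]; the result is in kHz.
 */
static uint64_t select_freq_khz(uint64_t max_freq_khz, uint64_t dl_bw,
				uint64_t other_util)
{
	uint64_t util = dl_bw + other_util;

	/* Never request more than the maximum capacity. */
	if (util > CAPACITY_SCALE)
		util = CAPACITY_SCALE;

	return max_freq_khz * util / CAPACITY_SCALE;
}
```

For example, on a CPU with a 2 GHz maximum, a DL task reserving half of the
CPU (512) next to other load worth a quarter (256) would ask for 1.5 GHz in
one step at enqueue time.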

Re: [RFC v3 5/5] sched/{core,cpufreq_schedutil}: add capacity clamping for RT/DL tasks

2017-03-15 Thread Joel Fernandes

On Wed, Mar 15, 2017 at 9:24 AM, Juri Lelli  wrote:
[..]
>
>> > However, trying to quickly summarize how that would work (for who is
>> > already somewhat familiar with reclaiming bits):
>> >
>> >  - a task utilization contribution is accounted for (at rq level) as
>> >    soon as it wakes up for the first time in a new period
>> >  - its contribution is then removed after the 0lag time (or when the
>> >    task gets throttled)
>> >  - frequency transitions are triggered accordingly
>> >
>> > So, I don't see why triggering a go down request after the 0lag time
>> > expired and quickly reacting to tasks waking up would have create
>> > problems in your case?
>>
>> In my experience, the 'reacting to tasks' bit doesn't work very well.
>
> Humm.. but in this case we won't be 'reacting', we will be
> 'anticipating' tasks' needs, right?

Are you saying we will start ramping frequency before the next
activation so that we're ready for it?

If not, it sounds like it will only make the frequency request on the
next activation, when the active bandwidth increases due to the task
waking up. By then the task has already started to run, right?

Thanks,
Joel

Re: [RFC v3 5/5] sched/{core,cpufreq_schedutil}: add capacity clamping for RT/DL tasks

2017-03-15 Thread Juri Lelli

On 15/03/17 09:13, Joel Fernandes wrote:
> On Wed, Mar 15, 2017 at 7:44 AM, Juri Lelli  wrote:
> > Hi Joel,
> >
> > On 15/03/17 05:59, Joel Fernandes wrote:
> >> On Wed, Mar 15, 2017 at 4:40 AM, Patrick Bellasi
> >>  wrote:
> >> > On 13-Mar 03:08, Joel Fernandes (Google) wrote:
> >> >> Hi Patrick,
> >> >>
> >> >> On Tue, Feb 28, 2017 at 6:38 AM, Patrick Bellasi
> >> >>  wrote:
> >> >> > Currently schedutil enforce a maximum OPP when RT/DL tasks are RUNNABLE.
> >> >> > Such a mandatory policy can be made more tunable from userspace thus
> >> >> > allowing for example to define a reasonable max capacity (i.e.
> >> >> > frequency) which is required for the execution of a specific RT/DL
> >> >> > workload. This will contribute to make the RT class more "friendly" for
> >> >> > power/energy sensible applications.
> >> >> >
> >> >> > This patch extends the usage of capacity_{min,max} to the RT/DL classes.
> >> >> > Whenever a task in these classes is RUNNABLE, the capacity required is
> >> >> > defined by the constraints of the control group that task belongs to.
> >> >> >
> >> >>
> >> >> We briefly discussed this at Linaro Connect that this works well for
> >> >> sporadic RT tasks that run briefly and then sleep for long periods of
> >> >> time - so certainly this patch is good, but its only a partial
> >> >> solution to the problem of frequent and short-sleepers and something
> >> >> is required to keep the boost active for short non-RUNNABLE as well.
> >> >> The behavior with many periodic RT tasks is that they will sleep for
> >> >> short intervals and run for short intervals periodically. In this case
> >> >> removing the clamp (or the boost as in schedtune v2) on a dequeue will
> >> >> essentially mean during a narrow window cpufreq can drop the frequency
> >> >> and only to make it go back up again.
> >> >>
> >> >> Currently for schedtune v2, I am working on prototyping something like
> >> >> the following for Android:
> >> >> - if RT task is enqueue, introduce the boost.
> >> >> - When task is dequeued, start a timer for a  "minimum deboost delay
> >> >> time" before taking out the boost.
> >> >> - If task is enqueued again before the timer fires, then cancel the timer.
> >> >>
> >> >> I don't think any "fix" to this particular issue should be to the
> >> >> schedutil governor and should be sorted before going to cpufreq itself
> >> >> (that is before making the request). What do you think about this?
> >> >
> >> > My short observations are:
> >> >
> >> > 1) for certain RT tasks, which have a quite "predictable" activation
> >> >    pattern, we should definitively try to use DEADLINE... which will
> >> >    factor out all "boosting potential races" since the bandwidth
> >> >    requirements are well defined at task description time.
> >>
> >> I don't immediately see how deadline can fix this, when a task is
> >> dequeued after end of its current runtime, its bandwidth will be
> >> subtracted from the active running bandwidth. This is what drives the
> >> DL part of the capacity request. In this case, we run into the same
> >> issue as with the boost-removal on dequeue. Isn't it?
> >>
> >
> > Unfortunately, I still have to post the set of patches (based on Luca's
> > reclaiming set) that introduces driving of clock frequency from
> > DEADLINE, so I guess everything we can discuss about how DEADLINE might
> > help here might be difficult to understand. :(
> >
> > I should definitely fix that.
> 
> I fully understand, Sorry to be discussing this too soon here...
> 

No problem. I just thought I should clarify before people go WTH are
these guys talking about?! :)

> > However, trying to quickly summarize how that would work (for who is
> > already somewhat familiar with reclaiming bits):
> >
> >  - a task utilization contribution is accounted for (at rq level) as
> >    soon as it wakes up for the first time in a new period
> >  - its contribution is then removed after the 0lag time (or when the
> >    task gets throttled)
> >  - frequency transitions are triggered accordingly
> >
> > So, I don't see why triggering a go down request after the 0lag time
> > expired and quickly reacting to tasks waking up would have create
> > problems in your case?
> 
> In my experience, the 'reacting to tasks' bit doesn't work very well.

Humm.. but in this case we won't be 'reacting', we will be
'anticipating' tasks' needs, right?

> For short running period tasks, we need to set the frequency to
> something and not ramp it down too quickly (for ex, runtime 1.5ms and
> period 3ms). In this case the 0-lag time would be < 3ms. I guess if
> we're going to use 0-lag time, then we'd need to set it runtime and
> period to be higher than exactly matching the task's? So would we be
> assigning the same bandwidth but for R/T instead of r/t (Where r, R
> are the runtimes and t,T are periods, and R > r and T > t)?
> 

In general, I guess, you could let the Period be the task's period and
set

Re: [RFC v3 5/5] sched/{core,cpufreq_schedutil}: add capacity clamping for RT/DL tasks

2017-03-15 Thread Joel Fernandes

On Wed, Mar 15, 2017 at 7:44 AM, Juri Lelli  wrote:
> Hi Joel,
>
> On 15/03/17 05:59, Joel Fernandes wrote:
>> On Wed, Mar 15, 2017 at 4:40 AM, Patrick Bellasi
>>  wrote:
>> > On 13-Mar 03:08, Joel Fernandes (Google) wrote:
>> >> Hi Patrick,
>> >>
>> >> On Tue, Feb 28, 2017 at 6:38 AM, Patrick Bellasi
>> >>  wrote:
>> >> > Currently schedutil enforce a maximum OPP when RT/DL tasks are RUNNABLE.
>> >> > Such a mandatory policy can be made more tunable from userspace thus
>> >> > allowing for example to define a reasonable max capacity (i.e.
>> >> > frequency) which is required for the execution of a specific RT/DL
>> >> > workload. This will contribute to make the RT class more "friendly" for
>> >> > power/energy sensible applications.
>> >> >
>> >> > This patch extends the usage of capacity_{min,max} to the RT/DL classes.
>> >> > Whenever a task in these classes is RUNNABLE, the capacity required is
>> >> > defined by the constraints of the control group that task belongs to.
>> >> >
>> >>
>> >> We briefly discussed this at Linaro Connect that this works well for
>> >> sporadic RT tasks that run briefly and then sleep for long periods of
>> >> time - so certainly this patch is good, but its only a partial
>> >> solution to the problem of frequent and short-sleepers and something
>> >> is required to keep the boost active for short non-RUNNABLE as well.
>> >> The behavior with many periodic RT tasks is that they will sleep for
>> >> short intervals and run for short intervals periodically. In this case
>> >> removing the clamp (or the boost as in schedtune v2) on a dequeue will
>> >> essentially mean during a narrow window cpufreq can drop the frequency
>> >> and only to make it go back up again.
>> >>
>> >> Currently for schedtune v2, I am working on prototyping something like
>> >> the following for Android:
>> >> - if RT task is enqueue, introduce the boost.
>> >> - When task is dequeued, start a timer for a  "minimum deboost delay
>> >> time" before taking out the boost.
>> >> - If task is enqueued again before the timer fires, then cancel the timer.
>> >>
>> >> I don't think any "fix" to this particular issue should be to the
>> >> schedutil governor and should be sorted before going to cpufreq itself
>> >> (that is before making the request). What do you think about this?
>> >
>> > My short observations are:
>> >
>> > 1) for certain RT tasks, which have a quite "predictable" activation
>> >    pattern, we should definitively try to use DEADLINE... which will
>> >    factor out all "boosting potential races" since the bandwidth
>> >    requirements are well defined at task description time.
>>
>> I don't immediately see how deadline can fix this, when a task is
>> dequeued after end of its current runtime, its bandwidth will be
>> subtracted from the active running bandwidth. This is what drives the
>> DL part of the capacity request. In this case, we run into the same
>> issue as with the boost-removal on dequeue. Isn't it?
>>
>
> Unfortunately, I still have to post the set of patches (based on Luca's
> reclaiming set) that introduces driving of clock frequency from
> DEADLINE, so I guess everything we can discuss about how DEADLINE might
> help here might be difficult to understand. :(
>
> I should definitely fix that.

I fully understand, Sorry to be discussing this too soon here...

> However, trying to quickly summarize how that would work (for who is
> already somewhat familiar with reclaiming bits):
>
>  - a task utilization contribution is accounted for (at rq level) as
>    soon as it wakes up for the first time in a new period
>  - its contribution is then removed after the 0lag time (or when the
>    task gets throttled)
>  - frequency transitions are triggered accordingly
>
> So, I don't see why triggering a go down request after the 0lag time
> expired and quickly reacting to tasks waking up would have create
> problems in your case?

In my experience, the 'reacting to tasks' bit doesn't work very well.
For tasks with short periods, we need to set the frequency to
something and not ramp it down too quickly (e.g., runtime 1.5ms and
period 3ms). In this case the 0-lag time would be < 3ms. I guess if
we're going to use the 0-lag time, then we'd need to set the runtime and
period higher than the ones exactly matching the task's? So would we be
assigning the same bandwidth but as R/T instead of r/t (where r, R
are the runtimes and t, T are the periods, with R > r and T > t)?
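
For reference, the arithmetic behind the "< 3ms" figure can be sketched as
follows. This assumes the usual definition of the 0-lag time from the
reclaiming literature (the instant at which the remaining budget, consumed at
the reserved rate runtime/period, would reach the deadline); the helper name
zero_lag_time() and its parameters are illustrative only and are not taken
from any posted patch.

```c
#include <stdint.h>

/*
 * 0-lag time of a task that blocks with some budget left:
 *
 *   zero_lag = deadline - remaining_runtime * period / runtime
 *
 * All values are in nanoseconds on the same clock.
 */
static int64_t zero_lag_time(int64_t deadline, int64_t remaining_runtime,
			     int64_t period, int64_t runtime)
{
	return deadline - (remaining_runtime * period) / runtime;
}
```

With runtime 1.5ms, period 3ms, a deadline at 3ms and 0.5ms of budget left
when the task blocks, the 0-lag time falls at 3ms - 0.5ms * 3 / 1.5 = 2ms,
that is, before the end of the period.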

Thanks,
Joel

Re: [RFC v3 5/5] sched/{core,cpufreq_schedutil}: add capacity clamping for RT/DL tasks

2017-03-15 Thread Juri Lelli

Hi Joel,

On 15/03/17 05:59, Joel Fernandes wrote:
> On Wed, Mar 15, 2017 at 4:40 AM, Patrick Bellasi
>  wrote:
> > On 13-Mar 03:08, Joel Fernandes (Google) wrote:
> >> Hi Patrick,
> >>
> >> On Tue, Feb 28, 2017 at 6:38 AM, Patrick Bellasi
> >>  wrote:
> >> > Currently schedutil enforce a maximum OPP when RT/DL tasks are RUNNABLE.
> >> > Such a mandatory policy can be made more tunable from userspace thus
> >> > allowing for example to define a reasonable max capacity (i.e.
> >> > frequency) which is required for the execution of a specific RT/DL
> >> > workload. This will contribute to make the RT class more "friendly" for
> >> > power/energy sensible applications.
> >> >
> >> > This patch extends the usage of capacity_{min,max} to the RT/DL classes.
> >> > Whenever a task in these classes is RUNNABLE, the capacity required is
> >> > defined by the constraints of the control group that task belongs to.
> >> >
> >>
> >> We briefly discussed this at Linaro Connect that this works well for
> >> sporadic RT tasks that run briefly and then sleep for long periods of
> >> time - so certainly this patch is good, but its only a partial
> >> solution to the problem of frequent and short-sleepers and something
> >> is required to keep the boost active for short non-RUNNABLE as well.
> >> The behavior with many periodic RT tasks is that they will sleep for
> >> short intervals and run for short intervals periodically. In this case
> >> removing the clamp (or the boost as in schedtune v2) on a dequeue will
> >> essentially mean during a narrow window cpufreq can drop the frequency
> >> and only to make it go back up again.
> >>
> >> Currently for schedtune v2, I am working on prototyping something like
> >> the following for Android:
> >> - if RT task is enqueue, introduce the boost.
> >> - When task is dequeued, start a timer for a  "minimum deboost delay
> >> time" before taking out the boost.
> >> - If task is enqueued again before the timer fires, then cancel the timer.
> >>
> >> I don't think any "fix" to this particular issue should be to the
> >> schedutil governor and should be sorted before going to cpufreq itself
> >> (that is before making the request). What do you think about this?
> >
> > My short observations are:
> >
> > 1) for certain RT tasks, which have a quite "predictable" activation
> >    pattern, we should definitively try to use DEADLINE... which will
> >    factor out all "boosting potential races" since the bandwidth
> >    requirements are well defined at task description time.
> 
> I don't immediately see how deadline can fix this, when a task is
> dequeued after end of its current runtime, its bandwidth will be
> subtracted from the active running bandwidth. This is what drives the
> DL part of the capacity request. In this case, we run into the same
> issue as with the boost-removal on dequeue. Isn't it?
> 

Unfortunately, I still have to post the set of patches (based on Luca's
reclaiming set) that introduces driving of clock frequency from
DEADLINE, so I guess everything we can discuss about how DEADLINE might
help here might be difficult to understand. :(

I should definitely fix that.

However, trying to quickly summarize how that would work (for those who
are already somewhat familiar with the reclaiming bits):

 - a task's utilization contribution is accounted for (at rq level) as
   soon as it wakes up for the first time in a new period
 - its contribution is then removed after the 0lag time (or when the
   task gets throttled)
 - frequency transitions are triggered accordingly

So, I don't see why triggering a go down request after the 0lag time
has expired, and quickly reacting to tasks waking up, would create
problems in your case?
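
The accounting summarized above can be sketched roughly as below. This is a
simplified illustration, not the pending patch set: struct dl_rq_sketch and
the helpers are hypothetical names, and bandwidth is kept as a fixed-point
runtime/period ratio with 20 fractional bits.

```c
#include <stdint.h>

#define BW_SHIFT 20                  /* fixed point: 1.0 == 1 << 20 */
#define BW_UNIT  (1ULL << BW_SHIFT)

/* Hypothetical per-runqueue state: sum of the active DL bandwidths. */
struct dl_rq_sketch {
	uint64_t running_bw;
};

/* Bandwidth of a task as runtime/period in fixed point. */
static uint64_t to_ratio(uint64_t period_ns, uint64_t runtime_ns)
{
	return period_ns ? (runtime_ns << BW_SHIFT) / period_ns : 0;
}

/* First wakeup in a new period: account the task's contribution. */
static void add_running_bw(struct dl_rq_sketch *rq, uint64_t bw)
{
	rq->running_bw += bw;
}

/* 0-lag time reached (or task throttled): remove the contribution. */
static void sub_running_bw(struct dl_rq_sketch *rq, uint64_t bw)
{
	/* Guard against underflow if the accounting ever gets unbalanced. */
	rq->running_bw = rq->running_bw > bw ? rq->running_bw - bw : 0;
}
```

A frequency request would then be issued whenever running_bw changes, so the
"go down" request happens at the 0-lag time rather than at dequeue.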

Thanks,

- Juri

Re: [RFC v3 5/5] sched/{core,cpufreq_schedutil}: add capacity clamping for RT/DL tasks

2017-03-15 Thread Joel Fernandes

On Wed, Mar 15, 2017 at 4:40 AM, Patrick Bellasi
 wrote:
> On 13-Mar 03:08, Joel Fernandes (Google) wrote:
>> Hi Patrick,
>>
>> On Tue, Feb 28, 2017 at 6:38 AM, Patrick Bellasi
>>  wrote:
>> > Currently schedutil enforce a maximum OPP when RT/DL tasks are RUNNABLE.
>> > Such a mandatory policy can be made more tunable from userspace thus
>> > allowing for example to define a reasonable max capacity (i.e.
>> > frequency) which is required for the execution of a specific RT/DL
>> > workload. This will contribute to make the RT class more "friendly" for
>> > power/energy sensible applications.
>> >
>> > This patch extends the usage of capacity_{min,max} to the RT/DL classes.
>> > Whenever a task in these classes is RUNNABLE, the capacity required is
>> > defined by the constraints of the control group that task belongs to.
>> >
>>
>> We briefly discussed this at Linaro Connect that this works well for
>> sporadic RT tasks that run briefly and then sleep for long periods of
>> time - so certainly this patch is good, but its only a partial
>> solution to the problem of frequent and short-sleepers and something
>> is required to keep the boost active for short non-RUNNABLE as well.
>> The behavior with many periodic RT tasks is that they will sleep for
>> short intervals and run for short intervals periodically. In this case
>> removing the clamp (or the boost as in schedtune v2) on a dequeue will
>> essentially mean during a narrow window cpufreq can drop the frequency
>> and only to make it go back up again.
>>
>> Currently for schedtune v2, I am working on prototyping something like
>> the following for Android:
>> - if RT task is enqueue, introduce the boost.
>> - When task is dequeued, start a timer for a  "minimum deboost delay
>> time" before taking out the boost.
>> - If task is enqueued again before the timer fires, then cancel the timer.
>>
>> I don't think any "fix" to this particular issue should be to the
>> schedutil governor and should be sorted before going to cpufreq itself
>> (that is before making the request). What do you think about this?
>
> My short observations are:
>
> 1) for certain RT tasks, which have a quite "predictable" activation
>    pattern, we should definitively try to use DEADLINE... which will
>    factor out all "boosting potential races" since the bandwidth
>    requirements are well defined at task description time.

I don't immediately see how deadline can fix this, when a task is
dequeued after end of its current runtime, its bandwidth will be
subtracted from the active running bandwidth. This is what drives the
DL part of the capacity request. In this case, we run into the same
issue as with the boost-removal on dequeue. Isn't it?

> 4) Previous point is about "separation of concerns", thus IMHO any
>    policy defining how to consume the CPU utilization signal
>    (whether it is boosted or not) should be the responsibility of
>    schedutil, which eventually does not exclude useful input from the
>    scheduler.
>
> 5) I understand the usefulness of a scale down threshold for schedutil
>    to reduce the current OPP, while I don't get the point of a scale
>    up threshold. If the system is demanding more capacity and there
>    are no HW constraints (e.g. pending changes) then we should go up
>    as soon as possible.
>
> Finally, I think we can improve quite a lot the boosting issues you
> are having with RT tasks by better refining the schedutil thresholds
> implementation.
>
> We already have some patches pending for review:
>    https://lkml.org/lkml/2017/3/2/385
> which fixes some schedutil issue and we will follow up with others
> trying to improve the rate-limiting to not compromise responsiveness.

I agree we can try to explore fixing schedutil to do the right thing.

J.

Re: [RFC v3 5/5] sched/{core,cpufreq_schedutil}: add capacity clamping for RT/DL tasks

2017-03-15 Thread Patrick Bellasi

On 13-Mar 03:08, Joel Fernandes (Google) wrote:
> Hi Patrick,
> 
> On Tue, Feb 28, 2017 at 6:38 AM, Patrick Bellasi
>  wrote:
> > Currently schedutil enforces a maximum OPP when RT/DL tasks are RUNNABLE.
> > Such a mandatory policy can be made more tunable from userspace, thus
> > allowing, for example, to define a reasonable max capacity (i.e.
> > frequency) which is required for the execution of a specific RT/DL
> > workload. This will contribute to making the RT class more "friendly" for
> > power/energy sensitive applications.
> >
> > This patch extends the usage of capacity_{min,max} to the RT/DL classes.
> > Whenever a task in these classes is RUNNABLE, the capacity required is
> > defined by the constraints of the control group that task belongs to.
> >
> 
> We briefly discussed at Linaro Connect that this works well for
> sporadic RT tasks that run briefly and then sleep for long periods of
> time - so certainly this patch is good, but it's only a partial
> solution to the problem of frequent and short-sleepers and something
> is required to keep the boost active for short non-RUNNABLE periods
> as well. The behavior with many periodic RT tasks is that they will
> sleep for short intervals and run for short intervals periodically.
> In this case removing the clamp (or the boost as in schedtune v2) on
> a dequeue will essentially mean that during a narrow window cpufreq
> can drop the frequency, only to make it go back up again.
> 
> Currently for schedtune v2, I am working on prototyping something like
> the following for Android:
> - If an RT task is enqueued, introduce the boost.
> - When the task is dequeued, start a timer for a "minimum deboost delay
> time" before taking out the boost.
> - If the task is enqueued again before the timer fires, then cancel the timer.
> 
> I don't think any "fix" to this particular issue should go into the
> schedutil governor; it should be sorted out before going to cpufreq itself
> (that is before making the request). What do you think about this?

My short observations are:

1) for certain RT tasks, which have a quite "predictable" activation
   pattern, we should definitively try to use DEADLINE... which will
   factor out all "boosting potential races" since the bandwidth
   requirements are well defined at task description time.

2) CPU boosting is, at least for the time being, a best-effort feature
   which is introduced mainly for FAIR tasks.

3) Tracking the boost at enqueue/dequeue time matches with the design
   to track features/properties of the currently RUNNABLE tasks, while
   avoiding adding yet another signal to track CPU utilization.

4) Previous point is about "separation of concerns", thus IMHO any
   policy defining how to consume the CPU utilization signal
   (whether it is boosted or not) should be the responsibility of
   schedutil, which eventually does not exclude useful input from the
   scheduler.

5) I understand the usefulness of a scale down threshold for schedutil
   to reduce the current OPP, while I don't get the point of a scale
   up threshold. If the system is demanding more capacity and there
   are no HW constraints (e.g. pending changes) then we should go up
   as soon as possible.
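
To make point 5 concrete, here is a minimal userspace sketch of an
asymmetric filter: scale-up requests are accepted immediately, while
scale-down requests are held back until a delay has elapsed since the
last accepted change. The struct, the function name and the
down_delay_ns parameter are all hypothetical and chosen only for
illustration; this is not how schedutil implements its rate limiting.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical filter state; names are illustrative, not schedutil's. */
struct freq_filter {
	unsigned int cur_freq;    /* last accepted frequency (kHz) */
	uint64_t last_change_ns;  /* time of the last accepted change */
	uint64_t down_delay_ns;   /* hold time before a scale down is allowed */
};

/*
 * Returns true when the request is accepted and cur_freq is updated.
 * Scale-up requests always pass; scale-down requests pass only once
 * down_delay_ns has elapsed since the last accepted change.
 */
static bool freq_filter_request(struct freq_filter *f,
				unsigned int next_freq, uint64_t now_ns)
{
	if (next_freq == f->cur_freq)
		return false;

	if (next_freq < f->cur_freq &&
	    now_ns - f->last_change_ns < f->down_delay_ns)
		return false;

	f->cur_freq = next_freq;
	f->last_change_ns = now_ns;
	return true;
}
```

With a 10ms down delay, a request to go up is honoured right away,
whereas a request to go down 4ms later is dropped and the same request
11ms later is accepted.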

Finally, I think we can improve quite a lot the boosting issues you
are having with RT tasks by better refining the schedutil thresholds
implementation.

We already have some patches pending for review:
   https://lkml.org/lkml/2017/3/2/385
which fix some schedutil issues and we will follow up with others
trying to improve the rate-limiting to not compromise responsiveness.

> Thanks,
> Joel

Cheers Patrick

-- 
#include 

Patrick Bellasi

Re: [RFC v3 5/5] sched/{core,cpufreq_schedutil}: add capacity clamping for RT/DL tasks

2017-03-13 Thread Joel Fernandes (Google)

Hi Patrick,

On Tue, Feb 28, 2017 at 6:38 AM, Patrick Bellasi
 wrote:
> Currently schedutil enforces a maximum OPP when RT/DL tasks are RUNNABLE.
> Such a mandatory policy can be made more tunable from userspace thus
> allowing for example to define a reasonable max capacity (i.e.
> frequency) which is required for the execution of a specific RT/DL
> workload. This will contribute to make the RT class more "friendly" for
> power/energy sensitive applications.
>
> This patch extends the usage of capacity_{min,max} to the RT/DL classes.
> Whenever a task in these classes is RUNNABLE, the capacity required is
> defined by the constraints of the control group that task belongs to.
>

We briefly discussed at Linaro Connect that this works well for
sporadic RT tasks that run briefly and then sleep for long periods of
time - so certainly this patch is good, but it's only a partial
solution to the problem of frequent and short-sleepers and something
is required to keep the boost active for short non-RUNNABLE periods
as well. The behavior with many periodic RT tasks is that they will
sleep for short intervals and run for short intervals periodically.
In this case removing the clamp (or the boost as in schedtune v2) on
a dequeue will essentially mean that during a narrow window cpufreq
can drop the frequency, only to make it go back up again.

Currently for schedtune v2, I am working on prototyping something like
the following for Android:
- If an RT task is enqueued, introduce the boost.
- When the task is dequeued, start a timer for a "minimum deboost delay
time" before taking out the boost.
- If the task is enqueued again before the timer fires, then cancel the timer.
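
The three steps above can be sketched as a small userspace state
machine. This is only an illustration under stated assumptions: the
struct, the function names, the nr_rt counter and the explicit "tick"
standing in for a timer callback are all hypothetical, and none of this
is the schedtune prototype code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU boost state; every name here is illustrative. */
struct rt_boost {
	bool boosted;              /* boost currently applied */
	bool timer_armed;          /* a deboost is pending */
	uint64_t timer_expiry_ns;  /* when the pending deboost fires */
	uint64_t deboost_delay_ns; /* "minimum deboost delay time" */
	unsigned int nr_rt;        /* RUNNABLE RT tasks (assumed counter) */
};

/* Enqueue: apply the boost and cancel any pending deboost. */
static void rt_boost_enqueue(struct rt_boost *b)
{
	b->nr_rt++;
	b->timer_armed = false;
	b->boosted = true;
}

/* Dequeue: keep the boost, but arm the deboost timer if no RT task is left. */
static void rt_boost_dequeue(struct rt_boost *b, uint64_t now_ns)
{
	if (b->nr_rt > 0)
		b->nr_rt--;

	if (b->nr_rt == 0 && b->boosted) {
		b->timer_armed = true;
		b->timer_expiry_ns = now_ns + b->deboost_delay_ns;
	}
}

/* Stand-in for the timer callback: drop the boost once the delay expired. */
static void rt_boost_tick(struct rt_boost *b, uint64_t now_ns)
{
	if (b->timer_armed && now_ns >= b->timer_expiry_ns) {
		b->timer_armed = false;
		b->boosted = false;
	}
}
```

A task that sleeps for less than deboost_delay_ns never loses the
boost, because the re-enqueue cancels the pending timer; only a sleep
longer than the delay lets the boost go away.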

I don't think any "fix" to this particular issue should go into the
schedutil governor; it should be sorted out before going to cpufreq itself
(that is before making the request). What do you think about this?

Thanks,
Joel

[RFC v3 5/5] sched/{core,cpufreq_schedutil}: add capacity clamping for RT/DL tasks

2017-02-28 Thread Patrick Bellasi

Currently schedutil enforces a maximum OPP when RT/DL tasks are RUNNABLE.
Such a mandatory policy can be made more tunable from userspace thus
allowing for example to define a reasonable max capacity (i.e.
frequency) which is required for the execution of a specific RT/DL
workload. This will contribute to make the RT class more "friendly" for
power/energy sensitive applications.

This patch extends the usage of capacity_{min,max} to the RT/DL classes.
Whenever a task in these classes is RUNNABLE, the capacity required is
defined by the constraints of the control group that task belongs to.

Signed-off-by: Patrick Bellasi 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Rafael J. Wysocki 
Cc: linux-kernel@vger.kernel.org
Cc: linux...@vger.kernel.org
---
 kernel/sched/cpufreq_schedutil.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 51484f7..18abd62 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -256,7 +256,9 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
 		return;
 
 	if (flags & SCHED_CPUFREQ_RT_DL) {
-		next_f = policy->cpuinfo.max_freq;
+		util = cap_clamp_cpu_util(smp_processor_id(),
+					  SCHED_CAPACITY_SCALE);
+		next_f = get_next_freq(sg_cpu, util, policy->cpuinfo.max_freq);
 	} else {
 		sugov_get_util(&util, &max);
 		sugov_iowait_boost(sg_cpu, &util, &max);
@@ -272,15 +274,11 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu,
 {
 	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
 	struct cpufreq_policy *policy = sg_policy->policy;
-	unsigned int max_f = policy->cpuinfo.max_freq;
 	u64 last_freq_update_time = sg_policy->last_freq_update_time;
 	unsigned int cap_max = SCHED_CAPACITY_SCALE;
 	unsigned int cap_min = 0;
 	unsigned int j;
 
-	if (flags & SCHED_CPUFREQ_RT_DL)
-		return max_f;
-
 	sugov_iowait_boost(sg_cpu, &util, &max);
 
 	/* Initialize clamping range based on caller CPU constraints */
@@ -308,10 +306,11 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu,
 			j_sg_cpu->iowait_boost = 0;
 			continue;
 		}
-		if (j_sg_cpu->flags & SCHED_CPUFREQ_RT_DL)
-			return max_f;
 
-		j_util = j_sg_cpu->util;
+		if (j_sg_cpu->flags & SCHED_CPUFREQ_RT_DL)
+			j_util = cap_clamp_cpu_util(j, SCHED_CAPACITY_SCALE);
+		else
+			j_util = j_sg_cpu->util;
 		j_max = j_sg_cpu->max;
 		if (j_util * max > j_max * util) {
 			util = j_util;
-- 
2.7.4
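
As a rough illustration of what the patch changes for RT/DL tasks, the
sketch below clamps a full-scale utilization into a [cap_min, cap_max]
range and maps the result to a frequency. Both helpers are hypothetical
userspace stand-ins, not cap_clamp_cpu_util() or get_next_freq(); the
25% headroom in the mapping is an assumption modelled on the margin
schedutil applied at the time, and the cap at max_freq is added here
only to keep the example bounded.

```c
#define CAPACITY_SCALE 1024U	/* stand-in for SCHED_CAPACITY_SCALE */

/* Clamp a utilization value into the [cap_min, cap_max] range. */
static unsigned int clamp_util(unsigned int util,
			       unsigned int cap_min, unsigned int cap_max)
{
	if (util < cap_min)
		return cap_min;
	if (util > cap_max)
		return cap_max;
	return util;
}

/*
 * Map utilization to a frequency with 25% headroom (assumed margin):
 * freq = 1.25 * max_freq * util / max_cap, capped at max_freq.
 */
static unsigned int util_to_freq(unsigned int max_freq,
				 unsigned int util, unsigned int max_cap)
{
	unsigned long long f = max_freq + (max_freq >> 2);

	f = f * util / max_cap;
	return f > max_freq ? max_freq : (unsigned int)f;
}
```

With no clamp configured (cap_max equal to CAPACITY_SCALE) an RT/DL
task still ends up at the maximum frequency, which matches the
behaviour before the patch; with cap_max set to half the scale and a
2 GHz maximum, the same task would request 1.25 GHz in this model.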
