Re: [RFC/RFT][PATCH 1/1] cpufreq: New governor using utilization data from the scheduler

2016-03-01 Thread Rafael J. Wysocki
On Tue, Mar 1, 2016 at 3:56 PM, Juri Lelli  wrote:
> On 26/02/16 03:36, Rafael J. Wysocki wrote:
>> On Thursday, February 25, 2016 11:01:20 AM Juri Lelli wrote:
>
> [...]
>
>> >
>> > That is right. But can't a higher priority class eat all the needed
>> > capacity? I mean, suppose that both CFS and DL need 30% of CPU capacity
>> > on the same CPU. DL wins and gets its 30% of capacity. When CFS gets to
>> > run it's too late for requesting anything more (w.r.t. the same time
>> > window). If we somehow aggregate requests instead, we could request 60%
>> > and both classes can have their capacity to run. It seems to me that
>> > this is what governors were already doing by using the 1 - idle metric.
>>
>> That's interesting, because it is about a few different things at a time. :-)
>>
>> So first of all the "old" governors only collect information about what
>> happened in the past and make decisions on that basis (kind of in the hope
>> that what happened once will happen again), while the idea behind what
>> you're describing seems to be to attempt to project future needs for
>> capacity and use that to make decisions (just for the very near future,
>> but that should be sufficient).  If successful, that would be the most
>> suitable approach IMO.
>>
>
> Right, this is a key difference.
>
>> Of course, the $subject patch is not aspiring to anything of that kind.
>> It only uses information about current needs that's already available to
>> it in a very straightforward way.
>>
>
> But using utilization of CFS tasks (based on PELT) already carries some
> notion of "future needs" (even if it is true that tasks might have
> phases). And this will be true for DL as well, once we have a
> corresponding utilization signal that we can consume. I think you are
> already consuming information about the future in some sense. :-)

That's because the already available numbers include that information.
I don't do any projections myself.

>> But there's more to it.  In the sampling, or rate-limiting if you will,
>> situation you really have a window in which many things can happen and
>> making a good decision at the beginning of it is important.  However, if
>> you can just handle *every* request and really switch frequencies on the
>> fly, then each of them may come with a "currently needed capacity" number
>> and you can just give it what it asks for every time.
>>
>
> True. Rate-limiting poses interesting problems.
>
>> My point is that there are quite a few things to consider here and I'm
>> expecting a learning process to happen before we are happy with what we
>> have.  So my approach would be (and is) to start very simple and then
>> add more complexity over time as needed instead of just trying to address
>> every issue I can think about from the outset.
>>
>
> I perfectly understand that, and I agree that there is value in starting
> simple. I simply fear that aggregation of utilization signals will be one
> of the few things that will pop out fairly soon. :-)

That's OK.  If it is demonstrably better than the super-simple initial
approach, there won't be any reason to reject it.

Thanks,
Rafael


Re: [RFC/RFT][PATCH 1/1] cpufreq: New governor using utilization data from the scheduler

2016-03-01 Thread Juri Lelli
On 26/02/16 03:36, Rafael J. Wysocki wrote:
> On Thursday, February 25, 2016 11:01:20 AM Juri Lelli wrote:

[...]

> > 
> > That is right. But can't a higher priority class eat all the needed
> > capacity? I mean, suppose that both CFS and DL need 30% of CPU capacity
> > on the same CPU. DL wins and gets its 30% of capacity. When CFS gets to
> > run it's too late for requesting anything more (w.r.t. the same time
> > window). If we somehow aggregate requests instead, we could request 60%
> > and both classes can have their capacity to run. It seems to me that
> > this is what governors were already doing by using the 1 - idle metric.
> 
> That's interesting, because it is about a few different things at a time. :-)
> 
> So first of all the "old" governors only collect information about what
> happened in the past and make decisions on that basis (kind of in the hope
> that what happened once will happen again), while the idea behind what
> you're describing seems to be to attempt to project future needs for
> capacity and use that to make decisions (just for the very near future,
> but that should be sufficient).  If successful, that would be the most
> suitable approach IMO.
> 

Right, this is a key difference.

> Of course, the $subject patch is not aspiring to anything of that kind.
> It only uses information about current needs that's already available to
> it in a very straightforward way.
> 

But using utilization of CFS tasks (based on PELT) already carries some
notion of "future needs" (even if it is true that tasks might have
phases). And this will be true for DL as well, once we have a
corresponding utilization signal that we can consume. I think you are
already consuming information about the future in some sense. :-)

> But there's more to it.  In the sampling, or rate-limiting if you will,
> situation you really have a window in which many things can happen and
> making a good decision at the beginning of it is important.  However, if
> you can just handle *every* request and really switch frequencies on the
> fly, then each of them may come with a "currently needed capacity" number
> and you can just give it what it asks for every time.
> 

True. Rate-limiting poses interesting problems.

> My point is that there are quite a few things to consider here and I'm
> expecting a learning process to happen before we are happy with what we
> have.  So my approach would be (and is) to start very simple and then
> add more complexity over time as needed instead of just trying to address
> every issue I can think about from the outset.
> 

I perfectly understand that, and I agree that there is value in starting
simple. I simply fear that aggregation of utilization signals will be one
of the few things that will pop out fairly soon. :-)

Best,

- Juri


Re: [RFC/RFT][PATCH 1/1] cpufreq: New governor using utilization data from the scheduler

2016-02-25 Thread Rafael J. Wysocki
On Thursday, February 25, 2016 11:01:20 AM Juri Lelli wrote:
> On 23/02/16 00:02, Rafael J. Wysocki wrote:
> > On Mon, Feb 22, 2016 at 3:16 PM, Juri Lelli  wrote:
> > > Hi Rafael,
> > 
> > Hi,
> > 
> 
> Sorry, my reply to this got delayed a bit.

No problem at all.

> > > thanks for this RFC. I'm going to test it more in the next few days, but
> > > I already have some questions from skimming through it. Please find them
> > > inline below.
> > >
> > > On 22/02/16 00:18, Rafael J. Wysocki wrote:
> > >> From: Rafael J. Wysocki 
> > >>
> > >> Add a new cpufreq scaling governor, called "schedutil", that uses
> > >> scheduler-provided CPU utilization information as input for making
> > >> its decisions.
> > >>
> > >
> > > I guess the first (macro) question is why did you decide to go with a
> > > completely new governor, where new here is w.r.t. the sched-freq solution.
> > 
> > Probably the most comprehensive answer to this question is my intro
> > message: http://marc.info/?l=linux-pm&m=145609673008122&w=2
> > 
> > The executive summary is probably that this was the most
> > straightforward way to use the scheduler-provided numbers in cpufreq
> > that I could think about.
> > 
> > > AFAICT, it is true that your solution directly builds on top of the
> > > latest changes to cpufreq core and governor, but it also seems to have
> > > more than a few points in common with sched-freq,
> > 
> > That surely isn't a drawback, is it?
> > 
> 
> Not at all. I guess that I was simply wondering why you felt that a new
> approach was required. But you explain this below. :-)
> 
> > If two people come to the same conclusions in different ways, that's
> > an indication that the conclusions may actually be correct.
> > 
> > > and sched-freq has been discussed and evaluated for already quite some time.
> > 
> > Yes, it has.
> > 
> > Does this mean that no one is allowed to try any alternatives to it now?
> >
> 
> Of course not. I'm mostly in line with what Steve replied here. But yes,
> I think that we can only gain better understanding by reviewing both
> RFCs.
> 
> > > Also, it appears to me that they both share (or might encounter in the
> > > future as development progresses) the same kind of problems, like for
> > > example the fact that we can't trigger OPP changes from scheduler
> > > context ATM.
> > 
> > "Give them a finger and they will ask for the hand."
> > 
> 
> I'm sorry if you felt that I was asking too much from an RFC. I wasn't,
> in fact; what I wanted to say is that the two alternatives seemed to
> share the same kind of problems. Well, now it seems that you have
> already proposed a solution for one of them. :-)

That actually was a missing piece that I had planned to add from the outset, but
then decided to keep it simple to start with and omit that part until I can
clean up the ACPI driver enough to make it fit the whole picture more naturally.

I still want to do that, which is mostly why I'm regarding that patch as a
prototype.

> > If you read my intro message linked above, you'll find a paragraph or
> > two about that in it.
> > 
> > And the short summary is that I have a plan to actually implement that
> > feature in the schedutil governor at least for the ACPI cpufreq
> > driver.  It shouldn't be too difficult to do either AFAICS.  So it is
> > not "we can't", but rather "we haven't implemented that yet" in this
> > particular case.
> > 
> > I may not be able to do that in the next few days, as I have other
> > things to do too, but you may expect to see that done at one point.
> > 
> > So it's not a fundamental issue or anything, it's just that I haven't
> > done that *yet* at this point, OK?
> > 
> 
> Sure. I saw what you are proposing to solve this. I'll reply to that
> patch if I have any comments.
> 
> > > Don't get me wrong. I think that looking at different ways to solve a
> > > problem is always beneficial, since I guess that the goal in the end is
> > > to come up with something that suits everybody's needs.
> > 
> > Precisely.
> > 
> > > I was only curious about your thoughts on sched-freq. But we can also
> > > wait for the next RFC from Steve for this macro question. :-)
> > 
> > Right, but I have some thoughts anyway.
> > 
> > My goal, which may be quite different from yours, is to reduce
> > cpufreq's overhead as much as I possibly can.  If I have to change the
> > way it drives the CPU frequency selection to achieve that goal, I will
> > do that, but if that can stay the way it is, that's fine too.
> > 
> 
> As Steve already said, this was not our primary goal. But it is for sure
> beneficial for everybody.
> 
> > Some progress has been made already here: we have dealt with the
> > timers for good now I think.
> > 
> > This patch deals with the overhead associated with the load tracking
> > carried out by "traditional" cpufreq governors and with a couple of
> > questionable things done by "ondemand" in addition to that (which is
> > one of the reasons why I 

Re: [RFC/RFT][PATCH 1/1] cpufreq: New governor using utilization data from the scheduler

2016-02-25 Thread Juri Lelli
On 23/02/16 00:02, Rafael J. Wysocki wrote:
> On Mon, Feb 22, 2016 at 3:16 PM, Juri Lelli  wrote:
> > Hi Rafael,
> 
> Hi,
> 

Sorry, my reply to this got delayed a bit.

> > thanks for this RFC. I'm going to test it more in the next few days, but
> > I already have some questions from skimming through it. Please find them
> > inline below.
> >
> > On 22/02/16 00:18, Rafael J. Wysocki wrote:
> >> From: Rafael J. Wysocki 
> >>
> >> Add a new cpufreq scaling governor, called "schedutil", that uses
> >> scheduler-provided CPU utilization information as input for making
> >> its decisions.
> >>
> >
> > I guess the first (macro) question is why did you decide to go with a
> > completely new governor, where new here is w.r.t. the sched-freq solution.
> 
> Probably the most comprehensive answer to this question is my intro
> message: http://marc.info/?l=linux-pm&m=145609673008122&w=2
> 
> The executive summary is probably that this was the most
> straightforward way to use the scheduler-provided numbers in cpufreq
> that I could think about.
> 
> > AFAICT, it is true that your solution directly builds on top of the
> > latest changes to cpufreq core and governor, but it also seems to have
> > more than a few points in common with sched-freq,
> 
> That surely isn't a drawback, is it?
> 

Not at all. I guess that I was simply wondering why you felt that a new
approach was required. But you explain this below. :-)

> If two people come to the same conclusions in different ways, that's
> an indication that the conclusions may actually be correct.
> 
> > and sched-freq has been discussed and evaluated for already quite some time.
> 
> Yes, it has.
> 
> Does this mean that no one is allowed to try any alternatives to it now?
>

Of course not. I'm mostly in line with what Steve replied here. But yes,
I think that we can only gain better understanding by reviewing both
RFCs.

> > Also, it appears to me that they both share (or might encounter in the
> > future as development progresses) the same kind of problems, like for
> > example the fact that we can't trigger OPP changes from scheduler
> > context ATM.
> 
> "Give them a finger and they will ask for the hand."
> 

I'm sorry if you felt that I was asking too much from an RFC. I wasn't,
in fact; what I wanted to say is that the two alternatives seemed to
share the same kind of problems. Well, now it seems that you have
already proposed a solution for one of them. :-)

> If you read my intro message linked above, you'll find a paragraph or
> two about that in it.
> 
> And the short summary is that I have a plan to actually implement that
> feature in the schedutil governor at least for the ACPI cpufreq
> driver.  It shouldn't be too difficult to do either AFAICS.  So it is
> not "we can't", but rather "we haven't implemented that yet" in this
> particular case.
> 
> I may not be able to do that in the next few days, as I have other
> things to do too, but you may expect to see that done at one point.
> 
> So it's not a fundamental issue or anything, it's just that I haven't
> done that *yet* at this point, OK?
> 

Sure. I saw what you are proposing to solve this. I'll reply to that
patch if I have any comments.

> > Don't get me wrong. I think that looking at different ways to solve a
> > problem is always beneficial, since I guess that the goal in the end is
> > to come up with something that suits everybody's needs.
> 
> Precisely.
> 
> > I was only curious about your thoughts on sched-freq. But we can also
> > wait for the next RFC from Steve for this macro question. :-)
> 
> Right, but I have some thoughts anyway.
> 
> My goal, which may be quite different from yours, is to reduce
> cpufreq's overhead as much as I possibly can.  If I have to change the
> way it drives the CPU frequency selection to achieve that goal, I will
> do that, but if that can stay the way it is, that's fine too.
> 

As Steve already said, this was not our primary goal. But it is for sure
beneficial for everybody.

> Some progress has been made already here: we have dealt with the
> timers for good now I think.
> 
> This patch deals with the overhead associated with the load tracking
> carried out by "traditional" cpufreq governors and with a couple of
> questionable things done by "ondemand" in addition to that (which is
> one of the reasons why I didn't want to modify "ondemand" itself for
> now).
> 
> The next step will be to teach the governor and the ACPI driver to
> switch CPU frequencies in the scheduler context, without spawning
> extra work items etc.
> 
> Finally, the sampling should go away and that's where I want it to be.
> 
> I just don't want to run extra code when that's not necessary and I
> want things to stay simple when that's as good as it can get.  If
> sched-freq can pull that off for me, that's fine, but can it really?
> 
> > [...]
> >
> >> +static void sugov_update_commit(struct policy_dbs_info *policy_dbs, u64 time,
> >> + 

Re: [RFC/RFT][PATCH 1/1] cpufreq: New governor using utilization data from the scheduler

2016-02-23 Thread Rafael J. Wysocki
On Monday, February 22, 2016 11:20:33 PM Steve Muckle wrote:
> On 02/22/2016 03:02 PM, Rafael J. Wysocki wrote:
> >> I guess the first (macro) question is why did you decide to go with a
> >> complete new governor, where new here is w.r.t. the sched-freq solution.
> > 
> > Probably the most comprehensive answer to this question is my intro
> > message: http://marc.info/?l=linux-pm&m=145609673008122&w=2
> > 
> > The executive summary is probably that this was the most
> > straightforward way to use the scheduler-provided numbers in cpufreq
> > that I could think about.
> > 
> >> AFAICT, it is true that your solution directly builds on top of the
> >> latest changes to cpufreq core and governor, but it also seems to have
> >> more than a few points in common with sched-freq,
> > 
> > That surely isn't a drawback, is it?
> >
> > If two people come to the same conclusions in different ways, that's
> > an indication that the conclusions may actually be correct.
> > 
> >> and sched-freq has been discussed and evaluated for already quite some time.
> > 
> > Yes, it has.
> > 
> > Does this mean that no one is allowed to try any alternatives to it now?
> 
> As mentioned above they are rather similar so it doesn't really seem
> like an alternative per se, more like a reimplementation.

If that is the case, I don't quite see where or what the problem is.

I posted this mostly because you and Juri were complaining that I wasn't
telling anyone about how I was going to use util and max going forward.

So this is how I'd like to use them, more or less.  If that is in alignment
with the changes you want to make, all should be fine.

> Why do you feel a new starting point for this problem is needed? Are
> there specific technical concerns?

Well, let me comment on the patches you've sent (although not today maybe as I'm
quite tired already and I'm afraid that my comments may not be much to the
point).

That aside, this was rather an attempt to see what could be done on top of
recent fixes in the core and how complicated it would be.

> I see you started looking over the
> latest schedfreq RFC, thank you for your comments thus far. We'd really
> appreciate your continued feedback and the chance to collaborate on it
> to move it forward. I and others have put a fair bit of effort into it
> over the last year or so and will happily and earnestly work to address
> any shortcomings you raise.
> 
> I will review your RFC in the next day or so as well.
> 
> ...
> > My goal, that may be quite different from yours, is to reduce the
> > cpufreq's overhead as much as I possibly can.  If I have to change the
> > way it drives the CPU frequency selection to achieve that goal, I will
> > do that, but if that can stay the way it is, that's fine too.
> 
> Our primary goal has been simply to achieve functional scheduler-driven
> CPU frequency control with equivalent or better power and performance
> than what is available today. Reduction of cpufreq overhead fits within
> this goal (and may be required) so no conflict here.

Good.

Thanks,
Rafael



Re: [RFC/RFT][PATCH 1/1] cpufreq: New governor using utilization data from the scheduler

2016-02-22 Thread Steve Muckle
On 02/22/2016 03:02 PM, Rafael J. Wysocki wrote:
>> I guess the first (macro) question is why did you decide to go with a
>> complete new governor, where new here is w.r.t. the sched-freq solution.
> 
> Probably the most comprehensive answer to this question is my intro
> message: http://marc.info/?l=linux-pm&m=145609673008122&w=2
> 
> The executive summary is probably that this was the most
> straightforward way to use the scheduler-provided numbers in cpufreq
> that I could think about.
> 
>> AFAICT, it is true that your solution directly builds on top of the
>> latest changes to cpufreq core and governor, but it also seems to have
>> more than a few points in common with sched-freq,
> 
> That surely isn't a drawback, is it?
>
> If two people come to the same conclusions in different ways, that's
> an indication that the conclusions may actually be correct.
> 
>> and sched-freq has been discussed and evaluated for already quite some time.
> 
> Yes, it has.
> 
> Does this mean that no one is allowed to try any alternatives to it now?

As mentioned above they are rather similar so it doesn't really seem
like an alternative per se, more like a reimplementation.

Why do you feel a new starting point for this problem is needed? Are
there specific technical concerns? I see you started looking over the
latest schedfreq RFC, thank you for your comments thus far. We'd really
appreciate your continued feedback and the chance to collaborate on it
to move it forward. I and others have put a fair bit of effort into it
over the last year or so and will happily and earnestly work to address
any shortcomings you raise.

I will review your RFC in the next day or so as well.

...
> My goal, that may be quite different from yours, is to reduce the
> cpufreq's overhead as much as I possibly can.  If I have to change the
> way it drives the CPU frequency selection to achieve that goal, I will
> do that, but if that can stay the way it is, that's fine too.

Our primary goal has been simply to achieve functional scheduler-driven
CPU frequency control with equivalent or better power and performance
than what is available today. Reduction of cpufreq overhead fits within
this goal (and may be required) so no conflict here.

thanks,
Steve



Re: [RFC/RFT][PATCH 1/1] cpufreq: New governor using utilization data from the scheduler

2016-02-22 Thread Rafael J. Wysocki
On Mon, Feb 22, 2016 at 3:16 PM, Juri Lelli  wrote:
> Hi Rafael,

Hi,

> thanks for this RFC. I'm going to test it more in the next few days, but
> I already have some questions from skimming through it. Please find them
> inline below.
>
> On 22/02/16 00:18, Rafael J. Wysocki wrote:
>> From: Rafael J. Wysocki 
>>
>> Add a new cpufreq scaling governor, called "schedutil", that uses
>> scheduler-provided CPU utilization information as input for making
>> its decisions.
>>
>
> I guess the first (macro) question is why did you decide to go with a
> complete new governor, where new here is w.r.t. the sched-freq solution.

Probably the most comprehensive answer to this question is my intro
message: http://marc.info/?l=linux-pm&m=145609673008122&w=2

The executive summary is probably that this was the most
straightforward way to use the scheduler-provided numbers in cpufreq
that I could think about.

> AFAICT, it is true that your solution directly builds on top of the
> latest changes to cpufreq core and governor, but it also seems to have
> more than a few points in common with sched-freq,

That surely isn't a drawback, is it?

If two people come to the same conclusions in different ways, that's
an indication that the conclusions may actually be correct.

> and sched-freq has been discussed and evaluated for already quite some time.

Yes, it has.

Does this mean that no one is allowed to try any alternatives to it now?

> Also, it appears to me that they both shares (or they might encounter in the
> future as development progresses) the same kind of problems, like for
> example the fact that we can't trigger opp changes from scheduler
> context ATM.

"Give them a finger and they will ask for the hand."

If you read my intro message linked above, you'll find a paragraph or
two about that in it.

And the short summary is that I have a plan to actually implement that
feature in the schedutil governor at least for the ACPI cpufreq
driver.  It shouldn't be too difficult to do either AFAICS.  So it is
not "we can't", but rather "we haven't implemented that yet" in this
particular case.

I may not be able to do that in the next few days, as I have other
things to do too, but you may expect to see that done at one point.

So it's not a fundamental issue or anything, it's just that I haven't
done that *yet* at this point, OK?

> Don't get me wrong. I think that looking at different ways to solve a
> problem is always beneficial, since I guess that the goal in the end is
> to come up with something that suits everybody's needs.

Precisely.

> I was only curious about your thoughts on sched-freq. But we can also wait
> for the next RFC from Steve for this macro question. :-)

Right, but I have some thoughts anyway.

My goal, that may be quite different from yours, is to reduce the
cpufreq's overhead as much as I possibly can.  If I have to change the
way it drives the CPU frequency selection to achieve that goal, I will
do that, but if that can stay the way it is, that's fine too.

Some progress has been made already here: we have dealt with the
timers for good now I think.

This patch deals with the overhead associated with the load tracking
carried out by "traditional" cpufreq governors and with a couple of
questionable things done by "ondemand" in addition to that (which is
one of the reasons why I didn't want to modify "ondemand" itself for
now).

The next step will be to teach the governor and the ACPI driver to
switch CPU frequencies in the scheduler context, without spawning
extra work items etc.

Finally, the sampling should go away and that's where I want it to be.

I just don't want to run extra code when that's not necessary and I
want things to stay simple when that's as good as it can get.  If
sched-freq can pull that off for me, that's fine, but can it really?

> [...]
>
>> +static void sugov_update_commit(struct policy_dbs_info *policy_dbs, u64 time,
>> + unsigned int next_freq)
>> +{
>> + struct sugov_policy *sg_policy = to_sg_policy(policy_dbs);
>> +
>> + sg_policy->next_freq = next_freq;
>> + policy_dbs->last_sample_time = time;
>> + policy_dbs->work_in_progress = true;
>> + irq_work_queue(&policy_dbs->irq_work);
>
> Here we basically use the system wq to be able to do the freq transition
> in process context. CFS is probably fine with this, but don't you think
> we might get into troubles when, in the future, we will want to service
> RT/DL requests more properly and they will end up being serviced
> together with all the others wq users and at !RT priority?

That may be regarded as a problem, but I'm not sure why you're talking
about it in the context of this particular patch.  That problem has
been there forever in cpufreq: in theory RT tasks may stall frequency
changes indefinitely.

Is the problem real, though?

Suppose that that actually happens and there are RT tasks effectively
stalling frequency updates.  In that case some other important
activities of 

Re: [RFC/RFT][PATCH 1/1] cpufreq: New governor using utilization data from the scheduler

2016-02-22 Thread Juri Lelli
Hi Rafael,

thanks for this RFC. I'm going to test it more in the next few days, but
I already have some questions from skimming through it. Please find them
inline below.

On 22/02/16 00:18, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki 
> 
> Add a new cpufreq scaling governor, called "schedutil", that uses
> scheduler-provided CPU utilization information as input for making
> its decisions.
> 

I guess the first (macro) question is why did you decide to go with a
completely new governor, where new here is w.r.t. the sched-freq solution.
AFAICT, it is true that your solution directly builds on top of the
latest changes to cpufreq core and governor, but it also seems to have
more than a few points in common with sched-freq, and sched-freq has
already been discussed and evaluated for quite some time. Also, it
appears to me that they both share (or might encounter in the
future as development progresses) the same kind of problems, like for
example the fact that we can't trigger OPP changes from scheduler
context ATM.

Don't get me wrong. I think that looking at different ways to solve a
problem is always beneficial, since I guess that the goal in the end is
to come up with something that suits everybody's needs. I was only
curious about your thoughts on sched-freq. But we can also wait for the
next RFC from Steve for this macro question. :-)

[...]

> +static void sugov_update_commit(struct policy_dbs_info *policy_dbs, u64 time,
> + unsigned int next_freq)
> +{
> + struct sugov_policy *sg_policy = to_sg_policy(policy_dbs);
> +
> + sg_policy->next_freq = next_freq;
> + policy_dbs->last_sample_time = time;
> + policy_dbs->work_in_progress = true;
> + irq_work_queue(&policy_dbs->irq_work);

Here we basically use the system wq to be able to do the freq transition
in process context. CFS is probably fine with this, but don't you think
we might get into trouble when, in the future, we want to service
RT/DL requests more properly and they will end up being serviced
together with all the others wq users and at !RT priority?

> +}
> +
> +static void sugov_update_shared(struct update_util_data *data, u64 time,
> + unsigned long util, unsigned long max)
> +{

We don't have a way to tell which scheduling class this is coming
from, do we? And if that is true, can't a request from CFS overwrite
RT/DL "go to max" requests?
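To make the concern concrete: if each CPU keeps only the most recent
(util, max) pair, a CFS update arriving after an RT/DL "go to max"
update simply replaces it. The user-space sketch below contrasts that
with keeping one request per scheduling class and combining them with
a maximum. Everything in it is hypothetical: the class tag, the
UTIL_UNKNOWN encoding of RT/DL requests and all names are assumptions,
and the hook quoted above does not receive a class argument.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical request sources; the quoted hook carries no such tag. */
enum req_class { REQ_CFS, REQ_RT_DL, REQ_NR };

/* Assumption: RT/DL requests are reported as "unknown" utilization. */
#define UTIL_UNKNOWN ULONG_MAX

/* Last writer wins: a single slot per CPU, replaced by every update. */
struct cpu_slot_single {
	unsigned long util, max;
};

/* Class-aware: one utilization slot per class, combined on read. */
struct cpu_slot_per_class {
	unsigned long util[REQ_NR];
	unsigned long max;
};

static void single_update(struct cpu_slot_single *s, unsigned long util,
			  unsigned long max)
{
	s->util = util;
	s->max = max;
}

static void per_class_update(struct cpu_slot_per_class *s, enum req_class c,
			     unsigned long util, unsigned long max)
{
	assert(c < REQ_NR);
	s->util[c] = util;
	/* Assumption: RT/DL updates may not carry a capacity value (max == 0). */
	if (max)
		s->max = max;
}

/* Effective request for the CPU: the largest of the per-class requests. */
static unsigned long per_class_util(const struct cpu_slot_per_class *s)
{
	unsigned long u = 0;

	for (int c = 0; c < REQ_NR; c++)
		if (s->util[c] > u)
			u = s->util[c];
	return u;
}
```

Taking the maximum is only one possible policy, and the sketch never
drops a stale RT/DL request, which a real implementation would have to
handle.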

[...]

Anyway, I'm going to start using our existing testing infrastructure
used to evaluate sched-freq to try to better understand the implications
of your approach.

Best,

- Juri


Re: [RFC/RFT][PATCH 1/1] cpufreq: New governor using utilization data from the scheduler

2016-02-22 Thread Juri Lelli
Hi Rafael,

thanks for this RFC. I'm going to test it more in the next few days, but
I already have some questions from skimming through it. Please find them
inline below.

On 22/02/16 00:18, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki 
> 
> Add a new cpufreq scaling governor, called "schedutil", that uses
> scheduler-provided CPU utilization information as input for making
> its decisions.
> 

I guess the first (macro) question is why did you decide to go with a
complete new governor, where new here is w.r.t. the sched-freq solution.
AFAICT, it is true that your solution directly builds on top of the
latest changes to cpufreq core and governor, but it also seems to have
more than a few points in common with sched-freq, and sched-freq has
been discussed and evaluated for already quite some time. Also, it
appears to me that they both shares (or they might encounter in the
future as development progresses) the same kind of problems, like for
example the fact that we can't trigger opp changes from scheduler
context ATM.

Don't get me wrong. I think that looking at different ways to solve a
problem is always beneficial, since I guess that the goal in the end is
to come up with something that suits everybody's needs. I was only
curious about your thoughts on sched-freq. But we can also wait for the
next RFC from Steve for this macro question. :-)

[...]

> +static void sugov_update_commit(struct policy_dbs_info *policy_dbs, u64 time,
> + unsigned int next_freq)
> +{
> + struct sugov_policy *sg_policy = to_sg_policy(policy_dbs);
> +
> + sg_policy->next_freq = next_freq;
> + policy_dbs->last_sample_time = time;
> + policy_dbs->work_in_progress = true;
> + irq_work_queue(_dbs->irq_work);

Here we basically use the system wq to be able to do the freq transition
in process context. CFS is probably fine with this, but don't you think
we might get into troubles when, in the future, we will want to service
RT/DL requests more properly and they will end up being serviced
together with all the others wq users and at !RT priority?

> +}
> +
> +static void sugov_update_shared(struct update_util_data *data, u64 time,
> + unsigned long util, unsigned long max)
> +{

We don't have a way to tell from which scheduling class this is coming
from, do we? And if that is true can't a request from CFS overwrite
RT/DL go to max requests?

[...]

Anyway, I'm going to start using our existing testing infrastructure
used to evaluate sched-freq to try to better understand the implications
of your approach.

Best,

- Juri


[RFC/RFT][PATCH 1/1] cpufreq: New governor using utilization data from the scheduler

2016-02-21 Thread Rafael J. Wysocki
From: Rafael J. Wysocki 

Add a new cpufreq scaling governor, called "schedutil", that uses
scheduler-provided CPU utilization information as input for making
its decisions.

Doing that is possible after commit fe7034338ba0 (cpufreq: Add
mechanism for registering utilization update callbacks) that
introduced cpufreq_update_util() called by the scheduler on
utilization changes (from CFS) and RT/DL task status updates.
In particular, CPU frequency scaling decisions may be based on
the utilization data passed to cpufreq_update_util() by CFS.

The new governor is very simple.  It is almost as simple as it
can be and remain reasonably functional.

The frequency selection formula it uses is essentially the same
as the one used by the "ondemand" governor, except that it doesn't use
the additional up_threshold parameter.  Moreover, instead of computing
the load as the "non-idle CPU time" to "total CPU time" ratio, it takes
the utilization data provided by CFS as input.  More specifically,
it represents "load" as the util/max ratio, where util and max
are the utilization and CPU capacity coming from CFS.
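A minimal user-space sketch of such a formula, assuming an
ondemand-style linear mapping between the policy limits,
min_f + load * (max_f - min_f), with load = util/max. The function name
and the choice to return the maximum frequency when util exceeds max
(or max is 0) are assumptions for illustration, not taken from the
patch.

```c
#include <assert.h>

/*
 * Hypothetical sketch, not the patch's code: map a CFS utilization
 * sample to a target frequency (kHz) with load = util / max and no
 * up_threshold.
 */
static unsigned int sugov_next_freq_sketch(unsigned long util, unsigned long max,
					   unsigned int min_f, unsigned int max_f)
{
	/* Caller is assumed to pass sane policy limits. */
	assert(min_f <= max_f);

	/* No capacity data, or utilization above capacity: go to max. */
	if (max == 0 || util > max)
		return max_f;

	/* 64-bit intermediate avoids overflow in util * (max_f - min_f). */
	return min_f + (unsigned int)((unsigned long long)util *
				      (max_f - min_f) / max);
}
```

With util = 512 and max = 1024 on a policy spanning 800000 to
2400000 kHz, the sketch returns 1600000 kHz, the midpoint of the range.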

All of the computations are carried out in the utilization update
handlers provided by the new governor.  One of those handlers is
used for cpufreq policies shared between multiple CPUs and the other
one is for policies with one CPU only (and therefore it doesn't need
to use any extra synchronization means).  The only operation carried
out by the new governor's ->gov_dbs_timer callback, sugov_set_freq(),
is a __cpufreq_driver_target() call to trigger a frequency update (to
a value already computed beforehand in one of the utilization update
handlers).  This means that, at least for some cpufreq drivers that
can update CPU frequency by doing simple register writes, it should
be possible to set the frequency in the utilization update handlers
too, in which case all of the governor's activity would take place in
the scheduler paths invoking cpufreq_update_util() without the need
to run anything in process context.
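The split between the cheap utilization update handler and the
deferred frequency change can be modelled roughly as below. This is a
hypothetical sketch: the structure, the cur_freq and sample_delay_ns
fields and the rate-limit check (loosely inspired by the sampling_rate
attribute) are assumptions, and a plain flag stands in for the
irq_work and process-context machinery the patch actually uses. Only
next_freq, last_sample_time and work_in_progress mirror names visible
in the sugov_update_commit() hunk quoted in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified per-policy governor state. */
struct sugov_sketch {
	uint64_t last_sample_time;	/* ns, time of the last committed request */
	uint64_t sample_delay_ns;	/* assumed rate limit between requests */
	unsigned int next_freq;		/* kHz, computed in the fast path */
	unsigned int cur_freq;		/* kHz, last value handed to the "driver" */
	bool work_in_progress;		/* deferred work pending, not yet run */
};

/*
 * Fast path: runs in the scheduler's update hook.  It only records the
 * precomputed target and marks deferred work as pending (in the kernel,
 * this is where irq_work_queue() comes in).
 */
static bool sugov_update_commit_sketch(struct sugov_sketch *sg, uint64_t time,
				       unsigned int next_freq)
{
	if (sg->work_in_progress)
		return false;	/* previous request not serviced yet */
	if (time - sg->last_sample_time < sg->sample_delay_ns)
		return false;	/* rate-limited */

	sg->next_freq = next_freq;
	sg->last_sample_time = time;
	sg->work_in_progress = true;
	return true;
}

/*
 * Slow path: stands in for the process-context work that would call
 * __cpufreq_driver_target() with the value computed above.
 */
static void sugov_work_sketch(struct sugov_sketch *sg)
{
	assert(sg->work_in_progress);
	sg->cur_freq = sg->next_freq;
	sg->work_in_progress = false;
}
```

For a driver that can switch frequencies with plain register writes,
the body of sugov_work_sketch() is the part that could, in principle,
move into the fast path.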

Currently, the governor treats all of the RT and DL tasks as
"unknown utilization" and sets the frequency to the allowed
maximum when updated from the RT or DL sched classes.  That
heavy-handed approach should be replaced with something more
specifically targeted at RT and DL tasks.

To some extent it relies on the common governor code in
cpufreq_governor.c and it uses that code in a somewhat unusual
way (different from what the "ondemand" and "conservative"
governors do), so some small and rather unintrusive changes
have to be made in that code and the other governors to support it.

However, after making it possible to set the CPU frequency from
the utilization update handlers, the new governor's interactions
with the common code might be limited to the initialization, cleanup
and handling of sysfs attributes (currently only one attribute,
sampling_rate, is supported in addition to the standard policy
attributes handled by the cpufreq core).

Signed-off-by: Rafael J. Wysocki 
---

This is on top of the linux-next branch of the linux-pm.git tree (that
should be part of tomorrow's linux-next if all goes well), but it should
also apply on top of the pm-cpufreq-test branch in that tree (which only
contains changes related to cpufreq governors).

---
 drivers/cpufreq/Kconfig                |   15 +
 drivers/cpufreq/Makefile               |    1
 drivers/cpufreq/cpufreq_conservative.c |    3
 drivers/cpufreq/cpufreq_governor.c     |   21 +-
 drivers/cpufreq/cpufreq_governor.h     |    2
 drivers/cpufreq/cpufreq_ondemand.c     |    3
 drivers/cpufreq/cpufreq_schedutil.c    |  249 +
 7 files changed, 284 insertions(+), 10 deletions(-)

Index: linux-pm/drivers/cpufreq/cpufreq_governor.h
===
--- linux-pm.orig/drivers/cpufreq/cpufreq_governor.h
+++ linux-pm/drivers/cpufreq/cpufreq_governor.h
@@ -164,7 +164,7 @@ struct dbs_governor {
 	void (*free)(struct policy_dbs_info *policy_dbs);
 	int (*init)(struct dbs_data *dbs_data, bool notify);
 	void (*exit)(struct dbs_data *dbs_data, bool notify);
-	void (*start)(struct cpufreq_policy *policy);
+	bool (*start)(struct cpufreq_policy *policy);
 };
 
 static inline struct dbs_governor *dbs_governor_of(struct cpufreq_policy *policy)
Index: linux-pm/drivers/cpufreq/cpufreq_schedutil.c
===
--- /dev/null
+++ linux-pm/drivers/cpufreq/cpufreq_schedutil.c
@@ -0,0 +1,249 @@
+/*
+ * CPUFreq governor based on scheduler-provided CPU utilization data.
+ *
+ * Copyright (C) 2016, Intel Corporation
+ * Author: Rafael J. Wysocki 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software 

[RFC/RFT][PATCH 1/1] cpufreq: New governor using utilization data from the scheduler

2016-02-21 Thread Rafael J. Wysocki
From: Rafael J. Wysocki 

Add a new cpufreq scaling governor, called "schedutil", that uses
scheduler-provided CPU utilization information as input for making
its decisions.

Doing that is possible after commit fe7034338ba0 (cpufreq: Add
mechanism for registering utilization update callbacks), which
introduced cpufreq_update_util(), called by the scheduler on
utilization changes (from CFS) and on RT/DL task status updates.
In particular, CPU frequency scaling decisions may be based on
the utilization data passed to cpufreq_update_util() by CFS.
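As a rough illustration of the callback mechanism described above, the
following user-space C sketch models per-CPU hook registration and
dispatch.  All names in it (struct util_hook, set_util_hook(),
model_update_util(), record_util(), NR_CPUS_MODEL) are hypothetical and
do not reflect the actual kernel API added by fe7034338ba0; only the idea
of a scheduler-side function forwarding (time, util, max) to whatever
callback a governor registered for that CPU is taken from the text.

```c
#include <stddef.h>  /* NULL */
#include <stdint.h>  /* uint64_t */

#define NR_CPUS_MODEL 4  /* hypothetical CPU count for this model */

/* Hypothetical per-CPU hook; a governor owns one per CPU. */
struct util_hook {
	void (*func)(struct util_hook *hook, uint64_t time,
		     unsigned long util, unsigned long max);
	unsigned long last_util;  /* used by the example callback below */
	unsigned long last_max;
};

static struct util_hook *hooks[NR_CPUS_MODEL];

/* Governor side: install (or, with NULL, remove) the hook for one CPU. */
static void set_util_hook(int cpu, struct util_hook *hook)
{
	hooks[cpu] = hook;
}

/* Scheduler side: forward a utilization update to the registered hook. */
static void model_update_util(int cpu, uint64_t time,
			      unsigned long util, unsigned long max)
{
	struct util_hook *hook = hooks[cpu];

	if (hook && hook->func)
		hook->func(hook, time, util, max);
}

/* Example governor callback: just remember the latest values. */
static void record_util(struct util_hook *hook, uint64_t time,
			unsigned long util, unsigned long max)
{
	(void)time;  /* unused in this sketch */
	hook->last_util = util;
	hook->last_max = max;
}
```

Updates for a CPU with no registered hook are simply dropped, which is
what lets a governor opt in and out per CPU.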

The new governor is very simple: it is almost as simple as it
can be while remaining reasonably functional.

The frequency selection formula it uses is essentially the same
as the one used by the "ondemand" governor, although it does not use
the additional up_threshold parameter.  However, instead of computing
the load as the ratio of non-idle CPU time to total CPU time, it takes
the utilization data provided by CFS as input.  More specifically,
it represents "load" as the util/max ratio, where util and max
are the utilization and CPU capacity coming from CFS.
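A minimal sketch of the frequency selection step, assuming an
ondemand-style linear mapping of the util/max ratio onto the policy's
frequency range (ondemand maps its 0-100 load linearly between the
minimum and maximum frequencies when the load is below up_threshold).
The helper name next_freq() and the saturation rule are illustrative
assumptions, not the patch's code.

```c
/*
 * Hypothetical helper: map load = util / max linearly onto
 * [min_f, max_f].  Frequencies are in kHz, as in cpufreq.
 */
static unsigned int next_freq(unsigned long util, unsigned long max,
			      unsigned int min_f, unsigned int max_f)
{
	/* No capacity info, full load or a degenerate range: go to max_f. */
	if (max == 0 || util >= max || min_f >= max_f)
		return max_f;

	/* min_f + (util / max) * (max_f - min_f), in integer arithmetic. */
	return min_f + (unsigned int)((unsigned long long)util *
				      (max_f - min_f) / max);
}
```

For example, with util = 512 and max = 1024 (a load of 50%) and a
hypothetical 800000-2400000 kHz range, the sketch returns 1600000 kHz.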

All of the computations are carried out in the utilization update
handlers provided by the new governor.  One of those handlers is
used for cpufreq policies shared between multiple CPUs and the other
one is for policies with one CPU only (so it does not need any
extra synchronization).  The only operation carried out by the new
governor's ->gov_dbs_timer callback, sugov_set_freq(), is a
__cpufreq_driver_target() call to trigger a frequency update (to
a value already computed beforehand in one of the utilization update
handlers).  This means that, at least for some cpufreq drivers that
can update the CPU frequency with simple register writes, it should
be possible to set the frequency in the utilization update handlers
too, in which case all of the governor's activity would take place in
the scheduler paths invoking cpufreq_update_util(), without the need
to run anything in process context.
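The shared-policy handler has to combine per-CPU inputs into a single
request.  Below is a sketch of one plausible approach, assuming the
policy follows whichever CPU has the highest util/max ratio and that one
lock protects the per-CPU samples.  The names (struct cpu_sample, struct
shared_policy, shared_update(), MAX_POLICY_CPUS) and the C11 spinlock are
illustrative stand-ins, not taken from the patch.

```c
#include <stdatomic.h>
#include <stddef.h>

#define MAX_POLICY_CPUS 8  /* hypothetical upper bound for the sketch */

/* Latest (util, max) reported for one CPU; util/max is its "load". */
struct cpu_sample {
	unsigned long util;
	unsigned long max;
};

/* Hypothetical shared-policy state; one lock guards all samples. */
struct shared_policy {
	atomic_flag lock;
	size_t nr_cpus;
	struct cpu_sample samples[MAX_POLICY_CPUS];
};

/*
 * Record an update from one CPU, then return the sample with the highest
 * util/max ratio across the policy.  Ratios are compared by
 * cross-multiplying, so no division or floating point is needed.
 */
static struct cpu_sample shared_update(struct shared_policy *sp, size_t cpu,
				       unsigned long util, unsigned long max)
{
	struct cpu_sample best = { 0, 1 };  /* ratio 0 */
	size_t j;

	while (atomic_flag_test_and_set_explicit(&sp->lock,
						 memory_order_acquire))
		;  /* spin: other CPUs of the policy may be updating too */

	sp->samples[cpu].util = util;
	sp->samples[cpu].max = max;

	for (j = 0; j < sp->nr_cpus; j++) {
		struct cpu_sample s = sp->samples[j];

		/* Is s.util / s.max greater than best.util / best.max? */
		if ((unsigned long long)s.util * best.max >
		    (unsigned long long)best.util * s.max)
			best = s;
	}

	atomic_flag_clear_explicit(&sp->lock, memory_order_release);
	return best;
}
```

A single-CPU policy would skip both the lock and the loop and use its own
(util, max) pair directly, which is why having two handlers pays off.
The cross-multiplication assumes util and max stay small (capacities
around 1024, say) so the products cannot overflow.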

Currently, the governor treats all of the RT and DL tasks as
"unknown utilization" and sets the frequency to the allowed
maximum when updated from the RT or DL sched classes.  That
heavy-handed approach should be replaced with something more
specifically targeted at RT and DL tasks.
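One way to express "unknown utilization" for RT and DL updates is a
sentinel (util, max) pair that any util/max-based frequency step
saturates to the maximum.  This sketch assumes a ULONG_MAX/0 encoding and
hypothetical names (enum update_src, struct util_req, make_request(),
wants_max_freq()); the patch may signal this case differently.

```c
#include <limits.h>  /* ULONG_MAX */

/* Hypothetical tag for the scheduling class that triggered an update. */
enum update_src { SRC_CFS, SRC_RT, SRC_DL };

/* The (util, max) pair the governor's frequency selection consumes. */
struct util_req {
	unsigned long util;
	unsigned long max;
};

/*
 * CFS supplies real utilization; RT and DL do not, so they get a sentinel
 * with util > max, which a "util >= max means run at the maximum" rule
 * turns into a request for the highest allowed frequency.
 */
static struct util_req make_request(enum update_src src,
				    unsigned long cfs_util,
				    unsigned long cfs_max)
{
	struct util_req req;

	if (src == SRC_CFS) {
		req.util = cfs_util;
		req.max = cfs_max;
	} else {
		req.util = ULONG_MAX;  /* "unknown utilization" */
		req.max = 0;
	}
	return req;
}

/* Does this request force the maximum allowed frequency? */
static int wants_max_freq(struct util_req req)
{
	return req.max == 0 || req.util >= req.max;
}
```

Replacing this heavy-handed rule later would only require RT and DL to
produce real (util, max) pairs; the frequency selection itself would not
need to change.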

To some extent, it relies on the common governor code in
cpufreq_governor.c and uses that code in a somewhat unusual
way (different from what the "ondemand" and "conservative"
governors do), so some small and rather unintrusive changes
have to be made in that code and in the other governors to support it.

However, once it is possible to set the CPU frequency from
the utilization update handlers, the new governor's interactions
with the common code might be limited to initialization, cleanup
and handling of sysfs attributes (currently only one attribute,
sampling_rate, is supported in addition to the standard policy
attributes handled by the cpufreq core).

Signed-off-by: Rafael J. Wysocki 
---

This is on top of the linux-next branch of the linux-pm.git tree (which
should be part of tomorrow's linux-next if all goes well), but it should
also apply on top of the pm-cpufreq-test branch in that tree (which only
contains changes related to cpufreq governors).

---
 drivers/cpufreq/Kconfig                |   15 +
 drivers/cpufreq/Makefile               |    1 
 drivers/cpufreq/cpufreq_conservative.c |    3 
 drivers/cpufreq/cpufreq_governor.c     |   21 +-
 drivers/cpufreq/cpufreq_governor.h     |    2 
 drivers/cpufreq/cpufreq_ondemand.c     |    3 
 drivers/cpufreq/cpufreq_schedutil.c    |  249 +
 7 files changed, 284 insertions(+), 10 deletions(-)

Index: linux-pm/drivers/cpufreq/cpufreq_governor.h
===
--- linux-pm.orig/drivers/cpufreq/cpufreq_governor.h
+++ linux-pm/drivers/cpufreq/cpufreq_governor.h
@@ -164,7 +164,7 @@ struct dbs_governor {
 	void (*free)(struct policy_dbs_info *policy_dbs);
 	int (*init)(struct dbs_data *dbs_data, bool notify);
 	void (*exit)(struct dbs_data *dbs_data, bool notify);
-	void (*start)(struct cpufreq_policy *policy);
+	bool (*start)(struct cpufreq_policy *policy);
 };
 
 static inline struct dbs_governor *dbs_governor_of(struct cpufreq_policy *policy)
Index: linux-pm/drivers/cpufreq/cpufreq_schedutil.c
===
--- /dev/null
+++ linux-pm/drivers/cpufreq/cpufreq_schedutil.c
@@ -0,0 +1,249 @@
+/*
+ * CPUFreq governor based on scheduler-provided CPU utilization data.
+ *
+ * Copyright (C) 2016, Intel Corporation
+ * Author: Rafael J. Wysocki 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+
+#include "cpufreq_governor.h"
+
+struct