Re: [RFC/RFT][PATCH 1/1] cpufreq: New governor using utilization data from the scheduler
On Tue, Mar 1, 2016 at 3:56 PM, Juri Lelli wrote:
> On 26/02/16 03:36, Rafael J. Wysocki wrote:
>> On Thursday, February 25, 2016 11:01:20 AM Juri Lelli wrote:
>
> [...]
>
>> >
>> > That is right. But, can't a higher priority class eat all the needed
>> > capacity? I mean, suppose that both CFS and DL need 30% of CPU capacity
>> > on the same CPU. DL wins and gets its 30% of capacity. When CFS gets to
>> > run it's too late for requesting anything more (w.r.t. the same time
>> > window). If we somehow aggregate requests instead, we could request 60%
>> > and both classes can have their capacity to run. It seems to me that
>> > this is what governors were already doing by using the 1 - idle metric.
>>
>> That's interesting, because it is about a few different things at a time. :-)
>>
>> So first of all the "old" governors only collect information about what
>> happened in the past and make decisions on that basis (kind of in the hope
>> that what happened once will happen again), while the idea behind what
>> you're describing seems to be to attempt to project future needs for
>> capacity and use that to make decisions (just for the very near future,
>> but that should be sufficient). If successful, that would be the most
>> suitable approach IMO.
>>
>
> Right, this is a key difference.
>
>> Of course, the $subject patch is not aspiring to anything of that kind.
>> It only uses information about current needs that's already available to
>> it in a very straightforward way.
>>
>
> But, using utilization of CFS tasks (based on PELT) has already some
> notion of "future needs" (even if it is true that tasks might have
> phases). And this will be true for DL as well, once we will have a
> corresponding utilization signal that we can consume. I think you are
> already consuming information about the future in some sense. :-)

That's because the already available numbers include that information.
I don't do any projections myself.

>> But there's more to it. In the sampling, or rate-limiting if you will,
>> situation you really have a window in which many things can happen and
>> making a good decision at the beginning of it is important. However, if
>> you just can handle *every* request and really switch frequencies on the
>> fly, then each of them may come with a "currently needed capacity" number
>> and you can just give it what it asks for every time.
>>
>
> True. Rate-limiting poses interesting problems.
>
>> My point is that there are quite a few things to consider here and I'm
>> expecting a learning process to happen before we are happy with what we
>> have. So my approach would be (and is) to start very simple and then
>> add more complexity over time as needed instead of just trying to address
>> every issue I can think about from the outset.
>>
>
> I perfectly understand that, and I agree that there is value in starting
> simple. I simply fear that aggregation of utilization signals will be one
> of the few things that will pop out fairly soon. :-)

That's OK. If it is demonstrably better than the super-simple initial
approach, there won't be any reason to reject it.

Thanks,
Rafael
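
[Editorial illustration, not part of the original message.] The contrast drawn above between the rate-limited situation and handling *every* request can be sketched in a few lines of C. This is only a minimal sketch under stated assumptions: struct freq_limiter, its fields, and limiter_request() are hypothetical names invented for this example and are not taken from the $subject patch or from the cpufreq core.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical governor state; the field names are illustrative only. */
struct freq_limiter {
	uint64_t last_update_ns;   /* time of the last committed change  */
	uint64_t rate_limit_ns;    /* minimum spacing between changes    */
	unsigned int cur_freq_khz; /* last frequency actually committed  */
};

/*
 * Commit freq_khz and return true if the rate-limit window has elapsed
 * since the last committed change; otherwise drop the request and
 * return false. A request that arrives inside the window is lost,
 * which is why the decision made at the start of a window matters.
 */
static bool limiter_request(struct freq_limiter *fl, uint64_t now_ns,
			    unsigned int freq_khz)
{
	assert(now_ns >= fl->last_update_ns);

	if (now_ns - fl->last_update_ns < fl->rate_limit_ns)
		return false;

	fl->last_update_ns = now_ns;
	fl->cur_freq_khz = freq_khz;
	return true;
}
```

With rate_limit_ns set to 0 the check never rejects anything, which corresponds to the "give it what it asks for every time" case; with a nonzero window, every request arriving before the window expires is dropped.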
Re: [RFC/RFT][PATCH 1/1] cpufreq: New governor using utilization data from the scheduler
On 26/02/16 03:36, Rafael J. Wysocki wrote:
> On Thursday, February 25, 2016 11:01:20 AM Juri Lelli wrote:

[...]

> >
> > That is right. But, can't a higher priority class eat all the needed
> > capacity? I mean, suppose that both CFS and DL need 30% of CPU capacity
> > on the same CPU. DL wins and gets its 30% of capacity. When CFS gets to
> > run it's too late for requesting anything more (w.r.t. the same time
> > window). If we somehow aggregate requests instead, we could request 60%
> > and both classes can have their capacity to run. It seems to me that
> > this is what governors were already doing by using the 1 - idle metric.
>
> That's interesting, because it is about a few different things at a time. :-)
>
> So first of all the "old" governors only collect information about what
> happened in the past and make decisions on that basis (kind of in the hope
> that what happened once will happen again), while the idea behind what
> you're describing seems to be to attempt to project future needs for
> capacity and use that to make decisions (just for the very near future,
> but that should be sufficient). If successful, that would be the most
> suitable approach IMO.
>

Right, this is a key difference.

> Of course, the $subject patch is not aspiring to anything of that kind.
> It only uses information about current needs that's already available to
> it in a very straightforward way.
>

But, using utilization of CFS tasks (based on PELT) has already some
notion of "future needs" (even if it is true that tasks might have
phases). And this will be true for DL as well, once we will have a
corresponding utilization signal that we can consume. I think you are
already consuming information about the future in some sense. :-)

> But there's more to it. In the sampling, or rate-limiting if you will,
> situation you really have a window in which many things can happen and
> making a good decision at the beginning of it is important. However, if
> you just can handle *every* request and really switch frequencies on the
> fly, then each of them may come with a "currently needed capacity" number
> and you can just give it what it asks for every time.
>

True. Rate-limiting poses interesting problems.

> My point is that there are quite a few things to consider here and I'm
> expecting a learning process to happen before we are happy with what we
> have. So my approach would be (and is) to start very simple and then
> add more complexity over time as needed instead of just trying to address
> every issue I can think about from the outset.
>

I perfectly understand that, and I agree that there is value in starting
simple. I simply fear that aggregation of utilization signals will be one
of the few things that will pop out fairly soon. :-)

Best,

- Juri
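
[Editorial illustration, not part of the original message.] The 30% + 30% example quoted above can be made concrete with a small C sketch comparing "the higher-priority class wins" with an aggregated request. Everything here is an assumption made for illustration: the 1024 capacity scale, the per-class request array, and both function names are hypothetical and do not come from the $subject patch or from sched-freq.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed capacity scale: 1024 stands for 100% of one CPU. */
#define CAPACITY_SCALE 1024UL

/* Hypothetical per-class request slots, in priority order. */
enum sched_class_id { CLASS_DL, CLASS_CFS, NR_CLASSES };

/*
 * "Higher priority wins": only one request is honored per window, so a
 * nonzero DL request shadows whatever CFS asked for.
 */
static unsigned long request_single(const unsigned long req[NR_CLASSES])
{
	return req[CLASS_DL] ? req[CLASS_DL] : req[CLASS_CFS];
}

/*
 * Aggregated request: add up what every class needs and clamp the sum
 * to the full capacity of the CPU.
 */
static unsigned long request_aggregated(const unsigned long req[NR_CLASSES])
{
	unsigned long sum = 0;
	size_t i;

	for (i = 0; i < NR_CLASSES; i++) {
		assert(req[i] <= CAPACITY_SCALE);
		sum += req[i];
	}

	return sum > CAPACITY_SCALE ? CAPACITY_SCALE : sum;
}
```

With both classes asking for roughly 30% (307 out of 1024), request_single() yields 307 while request_aggregated() yields 614, i.e. roughly the 60% mentioned in the message.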
Re: [RFC/RFT][PATCH 1/1] cpufreq: New governor using utilization data from the scheduler
On Thursday, February 25, 2016 11:01:20 AM Juri Lelli wrote:
> On 23/02/16 00:02, Rafael J. Wysocki wrote:
> > On Mon, Feb 22, 2016 at 3:16 PM, Juri Lelli wrote:
> > > Hi Rafael,
> >
> > Hi,
> >
> > Sorry, my reply to this got delayed a bit.

No problem at all.

> > > thanks for this RFC. I'm going to test it more in the next few days, but
> > > I already have some questions from skimming through it. Please find them
> > > inline below.
> > >
> > > On 22/02/16 00:18, Rafael J. Wysocki wrote:
> > >> From: Rafael J. Wysocki
> > >>
> > >> Add a new cpufreq scaling governor, called "schedutil", that uses
> > >> scheduler-provided CPU utilization information as input for making
> > >> its decisions.
> > >>
> > >
> > > I guess the first (macro) question is why did you decide to go with a
> > > complete new governor, where new here is w.r.t. the sched-freq solution.
> >
> > Probably the most comprehensive answer to this question is my intro
> > message: http://marc.info/?l=linux-pm&m=145609673008122&w=2
> >
> > The executive summary is probably that this was the most
> > straightforward way to use the scheduler-provided numbers in cpufreq
> > that I could think about.
> >
> > > AFAICT, it is true that your solution directly builds on top of the
> > > latest changes to cpufreq core and governor, but it also seems to have
> > > more than a few points in common with sched-freq,
> >
> > That surely isn't a drawback, is it?
> >
>
> Not at all. I guess that I was simply wondering why you felt that a new
> approach was required. But you explain this below. :-)
>
> > If two people come to the same conclusions in different ways, that's
> > an indication that the conclusions may actually be correct.
> >
> > > and sched-freq has been discussed and evaluated for already quite some
> > > time.
> >
> > Yes, it has.
> >
> > Does this mean that no one is allowed to try any alternatives to it now?
> >
>
> Of course not. I'm mostly in line with what Steve replied here. But yes,
> I think that we can only gain better understanding by reviewing both
> RFCs.
>
> > > Also, it appears to me that they both share (or they might encounter in
> > > the future as development progresses) the same kind of problems, like for
> > > example the fact that we can't trigger opp changes from scheduler
> > > context ATM.
> >
> > "Give them a finger and they will ask for the hand."
> >
>
> I'm sorry if you felt that I was asking too much from an RFC. I wasn't
> in fact, what I wanted to say is that the two alternatives seemed to
> share the same kind of problems. Well, now it seems that you have
> already proposed a solution for one of them. :-)

That actually was a missing piece that I had planned to add from the
outset, but then decided to keep it simple to start with and omit that
part until I can clean up the ACPI driver enough to make it fit the whole
picture more naturally. I still want to do that, which is mostly why I'm
regarding that patch as a prototype.

> > If you read my intro message linked above, you'll find a paragraph or
> > two about that in it.
> >
> > And the short summary is that I have a plan to actually implement that
> > feature in the schedutil governor at least for the ACPI cpufreq
> > driver. It shouldn't be too difficult to do either AFAICS. So it is
> > not "we can't", but rather "we haven't implemented that yet" in this
> > particular case.
> >
> > I may not be able to do that in the next few days, as I have other
> > things to do too, but you may expect to see that done at one point.
> >
> > So it's not a fundamental issue or anything, it's just that I haven't
> > done that *yet* at this point, OK?
> >
>
> Sure. I saw what you are proposing to solve this. I'll reply to that
> patch if I have any comments.
>
> > > Don't get me wrong. I think that looking at different ways to solve a
> > > problem is always beneficial, since I guess that the goal in the end is
> > > to come up with something that suits everybody's needs.
> >
> > Precisely.
> >
> > > I was only curious about your thoughts on sched-freq. But we can also
> > > wait for the next RFC from Steve for this macro question. :-)
> >
> > Right, but I have some thoughts anyway.
> >
> > My goal, that may be quite different from yours, is to reduce the
> > cpufreq's overhead as much as I possibly can. If I have to change the
> > way it drives the CPU frequency selection to achieve that goal, I will
> > do that, but if that can stay the way it is, that's fine too.
> >
>
> As Steve already said, this was not our primary goal. But it is for sure
> beneficial for everybody.
>
> > Some progress has been made already here: we have dealt with the
> > timers for good now I think.
> >
> > This patch deals with the overhead associated with the load tracking
> > carried by "traditional" cpufreq governors and with a couple of
> > questionable things done by "ondemand" in addition to that (which is
> > one of the reasons why I
Re: [RFC/RFT][PATCH 1/1] cpufreq: New governor using utilization data from the scheduler
On 23/02/16 00:02, Rafael J. Wysocki wrote:
> On Mon, Feb 22, 2016 at 3:16 PM, Juri Lelli wrote:
> > Hi Rafael,
>
> Hi,
>
> Sorry, my reply to this got delayed a bit.
>
> > thanks for this RFC. I'm going to test it more in the next few days, but
> > I already have some questions from skimming through it. Please find them
> > inline below.
> >
> > On 22/02/16 00:18, Rafael J. Wysocki wrote:
> >> From: Rafael J. Wysocki
> >>
> >> Add a new cpufreq scaling governor, called "schedutil", that uses
> >> scheduler-provided CPU utilization information as input for making
> >> its decisions.
> >>
> >
> > I guess the first (macro) question is why did you decide to go with a
> > complete new governor, where new here is w.r.t. the sched-freq solution.
>
> Probably the most comprehensive answer to this question is my intro
> message: http://marc.info/?l=linux-pm&m=145609673008122&w=2
>
> The executive summary is probably that this was the most
> straightforward way to use the scheduler-provided numbers in cpufreq
> that I could think about.
>
> > AFAICT, it is true that your solution directly builds on top of the
> > latest changes to cpufreq core and governor, but it also seems to have
> > more than a few points in common with sched-freq,
>
> That surely isn't a drawback, is it?
>

Not at all. I guess that I was simply wondering why you felt that a new
approach was required. But you explain this below. :-)

> If two people come to the same conclusions in different ways, that's
> an indication that the conclusions may actually be correct.
>
> > and sched-freq has been discussed and evaluated for already quite some time.
>
> Yes, it has.
>
> Does this mean that no one is allowed to try any alternatives to it now?
>

Of course not. I'm mostly in line with what Steve replied here. But yes,
I think that we can only gain better understanding by reviewing both
RFCs.

> > Also, it appears to me that they both share (or they might encounter in the
> > future as development progresses) the same kind of problems, like for
> > example the fact that we can't trigger opp changes from scheduler
> > context ATM.
>
> "Give them a finger and they will ask for the hand."
>

I'm sorry if you felt that I was asking too much from an RFC. I wasn't
in fact, what I wanted to say is that the two alternatives seemed to
share the same kind of problems. Well, now it seems that you have
already proposed a solution for one of them. :-)

> If you read my intro message linked above, you'll find a paragraph or
> two about that in it.
>
> And the short summary is that I have a plan to actually implement that
> feature in the schedutil governor at least for the ACPI cpufreq
> driver. It shouldn't be too difficult to do either AFAICS. So it is
> not "we can't", but rather "we haven't implemented that yet" in this
> particular case.
>
> I may not be able to do that in the next few days, as I have other
> things to do too, but you may expect to see that done at one point.
>
> So it's not a fundamental issue or anything, it's just that I haven't
> done that *yet* at this point, OK?
>

Sure. I saw what you are proposing to solve this. I'll reply to that
patch if I have any comments.

> > Don't get me wrong. I think that looking at different ways to solve a
> > problem is always beneficial, since I guess that the goal in the end is
> > to come up with something that suits everybody's needs.
>
> Precisely.
>
> > I was only curious about your thoughts on sched-freq. But we can also wait
> > for the next RFC from Steve for this macro question. :-)
>
> Right, but I have some thoughts anyway.
>
> My goal, that may be quite different from yours, is to reduce the
> cpufreq's overhead as much as I possibly can. If I have to change the
> way it drives the CPU frequency selection to achieve that goal, I will
> do that, but if that can stay the way it is, that's fine too.
>

As Steve already said, this was not our primary goal. But it is for sure
beneficial for everybody.

> Some progress has been made already here: we have dealt with the
> timers for good now I think.
>
> This patch deals with the overhead associated with the load tracking
> carried by "traditional" cpufreq governors and with a couple of
> questionable things done by "ondemand" in addition to that (which is
> one of the reasons why I didn't want to modify "ondemand" itself for
> now).
>
> The next step will be to teach the governor and the ACPI driver to
> switch CPU frequencies in the scheduler context, without spawning
> extra work items etc.
>
> Finally, the sampling should go away and that's where I want it to be.
>
> I just don't want to run extra code when that's not necessary and I
> want things to stay simple when that's as good as it can get. If
> sched-freq can pull that off for me, that's fine, but can it really?
>
> > [...]
> >
> >> +static void sugov_update_commit(struct policy_dbs_info *policy_dbs, u64 time,
> >> +
Re: [RFC/RFT][PATCH 1/1] cpufreq: New governor using utilization data from the scheduler
On Monday, February 22, 2016 11:20:33 PM Steve Muckle wrote:
> On 02/22/2016 03:02 PM, Rafael J. Wysocki wrote:
> >> I guess the first (macro) question is why did you decide to go with a
> >> complete new governor, where new here is w.r.t. the sched-freq solution.
> >
> > Probably the most comprehensive answer to this question is my intro
> > message: http://marc.info/?l=linux-pm=145609673008122=2
> >
> > The executive summary is probably that this was the most
> > straightforward way to use the scheduler-provided numbers in cpufreq
> > that I could think about.
> >
> >> AFAICT, it is true that your solution directly builds on top of the
> >> latest changes to cpufreq core and governor, but it also seems to have
> >> more than a few points in common with sched-freq,
> >
> > That surely isn't a drawback, is it?
> >
> > If two people come to the same conclusions in different ways, that's
> > an indication that the conclusions may actually be correct.
> >
> >> and sched-freq has been discussed and evaluated for already quite some
> >> time.
> >
> > Yes, it has.
> >
> > Does this mean that no one is allowed to try any alternatives to it now?
>
> As mentioned above they are rather similar so it doesn't really seem
> like an alternative per se, more like a reimplementation.

If that is the case, I don't quite see where or what the problem is.

I posted this mostly because you and Juri were complaining that I wasn't
telling anyone about how I was going to use util and max going forward.
So this is how I'd like to use them, more or less. If that is in
alignment with the changes you want to make, all should be fine.

> Why do you feel a new starting point for this problem is needed? Are
> there specific technical concerns?

Well, let me comment on the patches you've sent (although maybe not
today, as I'm quite tired already and I'm afraid that my comments may
not be much to the point).

That aside, this was rather an attempt to see what could be done on top
of recent fixes in the core and how complicated it would be.

> I see you started looking over the
> latest schedfreq RFC, thank you for your comments thus far. We'd really
> appreciate your continued feedback and the chance to collaborate on it
> to move it forward. I and others have put a fair bit of effort into it
> over the last year or so and will happily and earnestly work to address
> any shortcomings you raise.
>
> I will review your RFC in the next day or so as well.
>
> ...
> > My goal, that may be quite different from yours, is to reduce the
> > cpufreq's overhead as much as I possibly can. If I have to change the
> > way it drives the CPU frequency selection to achieve that goal, I will
> > do that, but if that can stay the way it is, that's fine too.
>
> Our primary goal has been simply to achieve functional scheduler-driven
> CPU frequency control with equivalent or better power and performance
> than what is available today. Reduction of cpufreq overhead fits within
> this goal (and may be required) so no conflict here.

Good.

Thanks,
Rafael
Re: [RFC/RFT][PATCH 1/1] cpufreq: New governor using utilization data from the scheduler
On 02/22/2016 03:02 PM, Rafael J. Wysocki wrote:
>> I guess the first (macro) question is why did you decide to go with a
>> complete new governor, where new here is w.r.t. the sched-freq solution.
>
> Probably the most comprehensive answer to this question is my intro
> message: http://marc.info/?l=linux-pm=145609673008122=2
>
> The executive summary is probably that this was the most
> straightforward way to use the scheduler-provided numbers in cpufreq
> that I could think about.
>
>> AFAICT, it is true that your solution directly builds on top of the
>> latest changes to cpufreq core and governor, but it also seems to have
>> more than a few points in common with sched-freq,
>
> That surely isn't a drawback, is it?
>
> If two people come to the same conclusions in different ways, that's
> an indication that the conclusions may actually be correct.
>
>> and sched-freq has been discussed and evaluated for already quite some time.
>
> Yes, it has.
>
> Does this mean that no one is allowed to try any alternatives to it now?

As mentioned above they are rather similar so it doesn't really seem
like an alternative per se, more like a reimplementation.

Why do you feel a new starting point for this problem is needed? Are
there specific technical concerns?

I see you started looking over the latest schedfreq RFC, thank you for
your comments thus far. We'd really appreciate your continued feedback
and the chance to collaborate on it to move it forward. I and others
have put a fair bit of effort into it over the last year or so and will
happily and earnestly work to address any shortcomings you raise.

I will review your RFC in the next day or so as well.

...
> My goal, that may be quite different from yours, is to reduce the
> cpufreq's overhead as much as I possibly can. If I have to change the
> way it drives the CPU frequency selection to achieve that goal, I will
> do that, but if that can stay the way it is, that's fine too.

Our primary goal has been simply to achieve functional scheduler-driven
CPU frequency control with equivalent or better power and performance
than what is available today. Reduction of cpufreq overhead fits within
this goal (and may be required) so no conflict here.

thanks,
Steve
Re: [RFC/RFT][PATCH 1/1] cpufreq: New governor using utilization data from the scheduler
On Mon, Feb 22, 2016 at 3:16 PM, Juri Lelli wrote:
> Hi Rafael,

Hi,

> thanks for this RFC. I'm going to test it more in the next few days, but
> I already have some questions from skimming through it. Please find them
> inline below.
>
> On 22/02/16 00:18, Rafael J. Wysocki wrote:
>> From: Rafael J. Wysocki
>>
>> Add a new cpufreq scaling governor, called "schedutil", that uses
>> scheduler-provided CPU utilization information as input for making
>> its decisions.
>>
>
> I guess the first (macro) question is why did you decide to go with a
> complete new governor, where new here is w.r.t. the sched-freq solution.

Probably the most comprehensive answer to this question is my intro
message: http://marc.info/?l=linux-pm=145609673008122=2

The executive summary is probably that this was the most
straightforward way to use the scheduler-provided numbers in cpufreq
that I could think about.

> AFAICT, it is true that your solution directly builds on top of the
> latest changes to cpufreq core and governor, but it also seems to have
> more than a few points in common with sched-freq,

That surely isn't a drawback, is it?

If two people come to the same conclusions in different ways, that's
an indication that the conclusions may actually be correct.

> and sched-freq has been discussed and evaluated for already quite some time.

Yes, it has.

Does this mean that no one is allowed to try any alternatives to it now?

> Also, it appears to me that they both share (or they might encounter in the
> future as development progresses) the same kind of problems, like for
> example the fact that we can't trigger opp changes from scheduler
> context ATM.

"Give them a finger and they will ask for the hand."

If you read my intro message linked above, you'll find a paragraph or
two about that in it.

And the short summary is that I have a plan to actually implement that
feature in the schedutil governor at least for the ACPI cpufreq
driver. It shouldn't be too difficult to do either AFAICS. So it is
not "we can't", but rather "we haven't implemented that yet" in this
particular case.

I may not be able to do that in the next few days, as I have other
things to do too, but you may expect to see that done at one point.

So it's not a fundamental issue or anything, it's just that I haven't
done that *yet* at this point, OK?

> Don't get me wrong. I think that looking at different ways to solve a
> problem is always beneficial, since I guess that the goal in the end is
> to come up with something that suits everybody's needs.

Precisely.

> I was only curious about your thoughts on sched-freq. But we can also
> wait for the next RFC from Steve for this macro question. :-)

Right, but I have some thoughts anyway.

My goal, that may be quite different from yours, is to reduce the
cpufreq's overhead as much as I possibly can. If I have to change the
way it drives the CPU frequency selection to achieve that goal, I will
do that, but if that can stay the way it is, that's fine too.

Some progress has been made already here: we have dealt with the
timers for good now I think.

This patch deals with the overhead associated with the load tracking
carried by "traditional" cpufreq governors and with a couple of
questionable things done by "ondemand" in addition to that (which is
one of the reasons why I didn't want to modify "ondemand" itself for
now).

The next step will be to teach the governor and the ACPI driver to
switch CPU frequencies in the scheduler context, without spawning
extra work items etc.

Finally, the sampling should go away and that's where I want it to be.

I just don't want to run extra code when that's not necessary and I
want things to stay simple when that's as good as it can get. If
sched-freq can pull that off for me, that's fine, but can it really?

> [...]
>
>> +static void sugov_update_commit(struct policy_dbs_info *policy_dbs, u64 time,
>> +				unsigned int next_freq)
>> +{
>> +	struct sugov_policy *sg_policy = to_sg_policy(policy_dbs);
>> +
>> +	sg_policy->next_freq = next_freq;
>> +	policy_dbs->last_sample_time = time;
>> +	policy_dbs->work_in_progress = true;
>> +	irq_work_queue(&policy_dbs->irq_work);
>
> Here we basically use the system wq to be able to do the freq transition
> in process context. CFS is probably fine with this, but don't you think
> we might get into troubles when, in the future, we will want to service
> RT/DL requests more properly and they will end up being serviced
> together with all the others wq users and at !RT priority?

That may be regarded as a problem, but I'm not sure why you're talking
about it in the context of this particular patch. That problem has
been there forever in cpufreq: in theory RT tasks may stall frequency
changes indefinitely.

Is the problem real, though?

Suppose that that actually happens and there are RT tasks effectively
stalling frequency updates. In that case some other important
activities of
Re: [RFC/RFT][PATCH 1/1] cpufreq: New governor using utilization data from the scheduler
Hi Rafael,

thanks for this RFC. I'm going to test it more in the next few days, but
I already have some questions from skimming through it. Please find them
inline below.

On 22/02/16 00:18, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki
>
> Add a new cpufreq scaling governor, called "schedutil", that uses
> scheduler-provided CPU utilization information as input for making
> its decisions.
>

I guess the first (macro) question is why did you decide to go with a
complete new governor, where new here is w.r.t. the sched-freq solution.
AFAICT, it is true that your solution directly builds on top of the
latest changes to cpufreq core and governor, but it also seems to have
more than a few points in common with sched-freq, and sched-freq has
been discussed and evaluated for already quite some time. Also, it
appears to me that they both share (or they might encounter in the
future as development progresses) the same kind of problems, like for
example the fact that we can't trigger opp changes from scheduler
context ATM.

Don't get me wrong. I think that looking at different ways to solve a
problem is always beneficial, since I guess that the goal in the end is
to come up with something that suits everybody's needs. I was only
curious about your thoughts on sched-freq. But we can also wait for the
next RFC from Steve for this macro question. :-)

[...]

> +static void sugov_update_commit(struct policy_dbs_info *policy_dbs, u64 time,
> +				unsigned int next_freq)
> +{
> +	struct sugov_policy *sg_policy = to_sg_policy(policy_dbs);
> +
> +	sg_policy->next_freq = next_freq;
> +	policy_dbs->last_sample_time = time;
> +	policy_dbs->work_in_progress = true;
> +	irq_work_queue(&policy_dbs->irq_work);

Here we basically use the system wq to be able to do the freq transition
in process context. CFS is probably fine with this, but don't you think
we might get into trouble when, in the future, we will want to service
RT/DL requests more properly and they will end up being serviced
together with all the other wq users and at !RT priority?

> +}
> +
> +static void sugov_update_shared(struct update_util_data *data, u64 time,
> +				unsigned long util, unsigned long max)
> +{

We don't have a way to tell which scheduling class this is coming from,
do we? And if that is true, can't a request from CFS overwrite RT/DL
"go to max" requests?

[...]

Anyway, I'm going to start using our existing testing infrastructure
used to evaluate sched-freq to try to better understand the implications
of your approach.

Best,

- Juri
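
To make the concern about a CFS update overwriting an RT/DL "go to max"
request more concrete, here is a minimal sketch of one possible way to
avoid it: remember the last request of each scheduling class and
combine them, instead of letting the latest caller win. This is not
code from the patch or from sched-freq. All names (sketch_class,
sketch_cpu_req, sketch_update) are hypothetical, and summing the
per-class requests is just one aggregation policy assumed for the
illustration.

```c
#include <stddef.h>

/* Hypothetical scheduling-class identifiers, for illustration only. */
enum sketch_class { SKETCH_CFS, SKETCH_RT, SKETCH_DL, SKETCH_NR_CLASSES };

/* Last utilization requested by each class on one CPU. */
struct sketch_cpu_req {
	unsigned long util[SKETCH_NR_CLASSES];
};

/*
 * Record a request from one class and return the aggregate request,
 * clamped to the CPU capacity "max". Because every class has its own
 * slot, a later CFS update cannot erase an earlier RT/DL request.
 */
static unsigned long sketch_update(struct sketch_cpu_req *req,
				   enum sketch_class cls,
				   unsigned long util, unsigned long max)
{
	unsigned long total = 0;
	size_t i;

	/* Clamp the individual request so the sum cannot grow unbounded. */
	req->util[cls] = util > max ? max : util;

	for (i = 0; i < SKETCH_NR_CLASSES; i++)
		total += req->util[i];

	return total > max ? max : total;
}
```

With a capacity of 1024, two classes asking for roughly 30% each (307)
end up with an aggregate of 614, and an RT request for the full
capacity keeps the result at 1024 even after CFS lowers its own
request. A real design would also need a way to drop stale requests
once a class goes idle, which this sketch does not attempt.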
[RFC/RFT][PATCH 1/1] cpufreq: New governor using utilization data from the scheduler
From: Rafael J. Wysocki Add a new cpufreq scaling governor, called "schedutil", that uses scheduler-provided CPU utilization information as input for making its decisions. Doing that is possible after commit fe7034338ba0 (cpufreq: Add mechanism for registering utilization update callbacks) that introduced cpufreq_update_util() called by the scheduler on utilization changes (from CFS) and RT/DL task status updates. In particular, CPU frequency scaling decisions may be based on the the utilization data passed to cpufreq_update_util() by CFS. The new governor is very simple. It is almost as simple as it can be and remain reasonably functional. The frequency selection formula used by it is essentially the same as the one used by the "ondemand" governor, although it doesn't use the additional up_threshold parameter, but instead of computing the load as the "non-idle CPU time" to "total CPU time" ratio, it takes the utilization data provided by CFS as input. More specifically, it represents "load" as the util/max ratio, where util and max are the utilization and CPU capacity coming from CFS. All of the computations are carried out in the utilization update handlers provided by the new governor. One of those handlers is used for cpufreq policies shared between multiple CPUs and the other one is for policies with one CPU only (and therefore it doesn't need to use any extra synchronization means). The only operation carried out by the new governor's ->gov_dbs_timer callback, sugov_set_freq(), is a __cpufreq_driver_target() call to trigger a frequency update (to a value already computed beforehand in one of the utilization update handlers). 
This means that, at least for some cpufreq drivers that can
update CPU frequency by doing simple register writes, it should be
possible to set the frequency in the utilization update handlers too,
in which case all of the governor's activity would take place in the
scheduler paths invoking cpufreq_update_util() without the need to
run anything in process context.

Currently, the governor treats all of the RT and DL tasks as
"unknown utilization" and sets the frequency to the allowed
maximum when updated from the RT or DL sched classes. That
heavy-handed approach should be replaced with something more
specifically targeted at RT and DL tasks.

To some extent the new governor relies on the common governor code
in cpufreq_governor.c and it uses that code in a somewhat unusual way
(different from what the "ondemand" and "conservative" governors do),
so some small and rather unintrusive changes have to be made in that
code and the other governors to support it.

However, after making it possible to set the CPU frequency from the
utilization update handlers, that new governor's interactions with
the common code might be limited to the initialization, cleanup and
handling of sysfs attributes (currently only one attribute,
sampling_rate, is supported in addition to the standard policy
attributes handled by the cpufreq core).

Signed-off-by: Rafael J. Wysocki
---

This is on top of the linux-next branch of the linux-pm.git tree (that
should be part of tomorrow's linux-next if all goes well), but it should
also apply on top of the pm-cpufreq-test branch in that tree (which only
contains changes related to cpufreq governors).
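
For readers who want the frequency selection from the changelog above in
concrete terms, here is a minimal user-space sketch. It is not code from
the patch. It assumes that "essentially the same formula as ondemand"
means a linear interpolation between the policy's minimum and maximum
frequency, with "load" taken as util/max. The function name, the
parameter names and the util_known flag are all hypothetical and only
serve the illustration.

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not from the patch): map a utilization sample
 * to a target frequency in kHz.
 *
 * util, max         - utilization and CPU capacity as reported by CFS
 * min_khz, max_khz  - policy frequency limits (assumed names)
 * util_known        - false for updates coming from the RT or DL
 *                     classes, which the changelog treats as
 *                     "unknown utilization"
 */
unsigned int sketch_next_freq(unsigned long util, unsigned long max,
			      unsigned int min_khz, unsigned int max_khz,
			      bool util_known)
{
	/* Unknown or saturated utilization: go to the allowed maximum. */
	if (!util_known || max == 0 || util >= max)
		return max_khz;

	/*
	 * "load" is the util/max ratio; interpolate linearly between the
	 * limits (assumption). The 64-bit intermediate avoids overflow
	 * of util * (max_khz - min_khz) on 32-bit builds.
	 */
	return min_khz +
	       (unsigned int)((unsigned long long)util *
			      (max_khz - min_khz) / max);
}
```

With limits of 1000000 and 3000000 kHz, a sample of util = 512 and
max = 1024 lands halfway between the two, while any update flagged as
coming from RT or DL returns the maximum regardless of util.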
---
 drivers/cpufreq/Kconfig                |   15 +
 drivers/cpufreq/Makefile               |    1
 drivers/cpufreq/cpufreq_conservative.c |    3
 drivers/cpufreq/cpufreq_governor.c     |   21 +-
 drivers/cpufreq/cpufreq_governor.h     |    2
 drivers/cpufreq/cpufreq_ondemand.c     |    3
 drivers/cpufreq/cpufreq_schedutil.c    |  249 +
 7 files changed, 284 insertions(+), 10 deletions(-)

Index: linux-pm/drivers/cpufreq/cpufreq_governor.h
===
--- linux-pm.orig/drivers/cpufreq/cpufreq_governor.h
+++ linux-pm/drivers/cpufreq/cpufreq_governor.h
@@ -164,7 +164,7 @@ struct dbs_governor {
 	void (*free)(struct policy_dbs_info *policy_dbs);
 	int (*init)(struct dbs_data *dbs_data, bool notify);
 	void (*exit)(struct dbs_data *dbs_data, bool notify);
-	void (*start)(struct cpufreq_policy *policy);
+	bool (*start)(struct cpufreq_policy *policy);
 };
 
 static inline struct dbs_governor *dbs_governor_of(struct cpufreq_policy *policy)
Index: linux-pm/drivers/cpufreq/cpufreq_schedutil.c
===
--- /dev/null
+++ linux-pm/drivers/cpufreq/cpufreq_schedutil.c
@@ -0,0 +1,249 @@
+/*
+ * CPUFreq governor based on scheduler-provided CPU utilization data.
+ *
+ * Copyright (C) 2016, Intel Corporation
+ * Author: Rafael J. Wysocki
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include
+#include
+
+#include "cpufreq_governor.h"
+
+struct