Re: [PATCH v2 6/10] cpufreq: Support for fast frequency switching
On Mon, Mar 7, 2016 at 2:32 PM, Peter Zijlstrawrote: > On Mon, Mar 07, 2016 at 02:15:47PM +0100, Rafael J. Wysocki wrote: >> On Mon, Mar 7, 2016 at 9:00 AM, Peter Zijlstra wrote: > >> > Sure I know all that. But that, to me, seems like an argument for why >> > you should have done this a long time ago. >> >> While I generally agree with this, I don't quite see why cleaning that >> up necessarily has to be connected to the current patch series which >> is my point. > > Ah OK, fair enough I suppose. But someone should stick this on their > TODO list, we should not 'forget' about this (again). Sure. >> > But I do think something wants to be done here. >> >> So here's what I can do for the "fast switch" thing. >> >> There is the fast_switch_possible policy flag that's necessary anyway. >> I can make notifier registration fail when that is set for at least >> one policy and I can make the setting of it fail if at least one >> notifier has already been registered. >> >> However, without spending too much time on chasing code dependencies i >> sort of suspect that it will uncover things that register cpufreq >> notifiers early and it won't be possible to use fast switch without >> sorting that out. > > The two x86 users don't register notifiers when CONSTANT_TSC, which > seems to be the right thing. > > Much of the other users seem unlikely to be used on x86, so I suspect > the initial fallout will be very limited. OK, let me try this then. > *groan* modules, cpufreq allows drivers to be modules, so init sequences > are poorly defined at best :/ Yes that blows. Yup. >> And that won't even change anything apart from >> removing some code that has not worked for quite a while already and >> nobody noticed. > > Which is always a good thing, but yes, we can do this later. > >> It is doable for the "fast switch" thing, but it won't help in all of >> the other cases when notifications are not reliable. > > Right, you can maybe add a 'NOTIFIERS_BROKEN' flag to the intel_p_state > and HWP drivers or so, and trigger off of that. Something like that, yes.
Re: [PATCH v2 6/10] cpufreq: Support for fast frequency switching
On Mon, Mar 7, 2016 at 2:32 PM, Peter Zijlstra wrote: > On Mon, Mar 07, 2016 at 02:15:47PM +0100, Rafael J. Wysocki wrote: >> On Mon, Mar 7, 2016 at 9:00 AM, Peter Zijlstra wrote: > >> > Sure I know all that. But that, to me, seems like an argument for why >> > you should have done this a long time ago. >> >> While I generally agree with this, I don't quite see why cleaning that >> up necessarily has to be connected to the current patch series which >> is my point. > > Ah OK, fair enough I suppose. But someone should stick this on their > TODO list, we should not 'forget' about this (again). Sure. >> > But I do think something wants to be done here. >> >> So here's what I can do for the "fast switch" thing. >> >> There is the fast_switch_possible policy flag that's necessary anyway. >> I can make notifier registration fail when that is set for at least >> one policy and I can make the setting of it fail if at least one >> notifier has already been registered. >> >> However, without spending too much time on chasing code dependencies i >> sort of suspect that it will uncover things that register cpufreq >> notifiers early and it won't be possible to use fast switch without >> sorting that out. > > The two x86 users don't register notifiers when CONSTANT_TSC, which > seems to be the right thing. > > Much of the other users seem unlikely to be used on x86, so I suspect > the initial fallout will be very limited. OK, let me try this then. > *groan* modules, cpufreq allows drivers to be modules, so init sequences > are poorly defined at best :/ Yes that blows. Yup. >> And that won't even change anything apart from >> removing some code that has not worked for quite a while already and >> nobody noticed. > > Which is always a good thing, but yes, we can do this later. > >> It is doable for the "fast switch" thing, but it won't help in all of >> the other cases when notifications are not reliable. > > Right, you can maybe add a 'NOTIFIERS_BROKEN' flag to the intel_p_state > and HWP drivers or so, and trigger off of that. Something like that, yes.
Re: [PATCH v2 6/10] cpufreq: Support for fast frequency switching
On Mon, Mar 07, 2016 at 02:15:47PM +0100, Rafael J. Wysocki wrote: > On Mon, Mar 7, 2016 at 9:00 AM, Peter Zijlstrawrote: > > Sure I know all that. But that, to me, seems like an argument for why > > you should have done this a long time ago. > > While I generally agree with this, I don't quite see why cleaning that > up necessarily has to be connected to the current patch series which > is my point. Ah OK, fair enough I suppose. But someone should stick this on their TODO list, we should not 'forget' about this (again). > > But I do think something wants to be done here. > > So here's what I can do for the "fast switch" thing. > > There is the fast_switch_possible policy flag that's necessary anyway. > I can make notifier registration fail when that is set for at least > one policy and I can make the setting of it fail if at least one > notifier has already been registered. > > However, without spending too much time on chasing code dependencies i > sort of suspect that it will uncover things that register cpufreq > notifiers early and it won't be possible to use fast switch without > sorting that out. The two x86 users don't register notifiers when CONSTANT_TSC, which seems to be the right thing. Much of the other users seem unlikely to be used on x86, so I suspect the initial fallout will be very limited. *groan* modules, cpufreq allows drivers to be modules, so init sequences are poorly defined at best :/ Yes that blows. > And that won't even change anything apart from > removing some code that has not worked for quite a while already and > nobody noticed. Which is always a good thing, but yes, we can do this later. > It is doable for the "fast switch" thing, but it won't help in all of > the other cases when notifications are not reliable. Right, you can maybe add a 'NOTIFIERS_BROKEN' flag to the intel_p_state and HWP drivers or so, and trigger off of that. > If it changes frequently enough, it's not practical and not even > necessary to cause things like thermal to react on every change, but I > think there needs to be a way to make them reevaluate things > regularly. Arguably, they might set a timer for that, but why would > they need a timer if they could get triggered by the code that > actually makes changes? So that very much depends on what thermal actually needs; but I suspect that using a timer is cheaper than using irq_work to kick off something else. The irq_work is a LAPIC write (self IPI), just as the timer. However timers can be coalesced, resulting in, on average, less timer reprogramming than there are handlers ran. Now, if thermal can do without work and can run in-line just like the fast freq switch, then yes, that might make sense.
Re: [PATCH v2 6/10] cpufreq: Support for fast frequency switching
On Mon, Mar 07, 2016 at 02:15:47PM +0100, Rafael J. Wysocki wrote: > On Mon, Mar 7, 2016 at 9:00 AM, Peter Zijlstra wrote: > > Sure I know all that. But that, to me, seems like an argument for why > > you should have done this a long time ago. > > While I generally agree with this, I don't quite see why cleaning that > up necessarily has to be connected to the current patch series which > is my point. Ah OK, fair enough I suppose. But someone should stick this on their TODO list, we should not 'forget' about this (again). > > But I do think something wants to be done here. > > So here's what I can do for the "fast switch" thing. > > There is the fast_switch_possible policy flag that's necessary anyway. > I can make notifier registration fail when that is set for at least > one policy and I can make the setting of it fail if at least one > notifier has already been registered. > > However, without spending too much time on chasing code dependencies i > sort of suspect that it will uncover things that register cpufreq > notifiers early and it won't be possible to use fast switch without > sorting that out. The two x86 users don't register notifiers when CONSTANT_TSC, which seems to be the right thing. Much of the other users seem unlikely to be used on x86, so I suspect the initial fallout will be very limited. *groan* modules, cpufreq allows drivers to be modules, so init sequences are poorly defined at best :/ Yes that blows. > And that won't even change anything apart from > removing some code that has not worked for quite a while already and > nobody noticed. Which is always a good thing, but yes, we can do this later. > It is doable for the "fast switch" thing, but it won't help in all of > the other cases when notifications are not reliable. Right, you can maybe add a 'NOTIFIERS_BROKEN' flag to the intel_p_state and HWP drivers or so, and trigger off of that. > If it changes frequently enough, it's not practical and not even > necessary to cause things like thermal to react on every change, but I > think there needs to be a way to make them reevaluate things > regularly. Arguably, they might set a timer for that, but why would > they need a timer if they could get triggered by the code that > actually makes changes? So that very much depends on what thermal actually needs; but I suspect that using a timer is cheaper than using irq_work to kick off something else. The irq_work is a LAPIC write (self IPI), just as the timer. However timers can be coalesced, resulting in, on average, less timer reprogramming than there are handlers ran. Now, if thermal can do without work and can run in-line just like the fast freq switch, then yes, that might make sense.
Re: [PATCH v2 6/10] cpufreq: Support for fast frequency switching
On Mon, Mar 7, 2016 at 9:00 AM, Peter Zijlstrawrote: > On Sun, Mar 06, 2016 at 03:17:09AM +0100, Rafael J. Wysocki wrote: >> > Agreed, fail the stuff hard. >> > >> > Simply make cpufreq_register_notifier a __must_check function and add >> > error handling to all call sites. >> >> Quite frankly, I don't see a compelling reason to do anything about >> the notifications at this point. >> >> The ACPI driver is the only one that will support fast switching for >> the time being and on practically all platforms that can use the ACPI >> driver the transition notifications cannot be relied on anyway for a >> few reasons. First, if intel_pstate or HWP is in use, they won't be >> coming at all. Second, anything turbo will just change frequency at >> will without notifying (like HWP). Finally, if they are coming, >> whoever receives them is notified about the frequency that is >> requested and not the real one, which is misleading, because (a) the >> request may just make the CPU go into the turbo range and then see >> above or (b) if the CPU is in a platform-coordinated package, its >> request will only be granted if it's the winning one. > > Sure I know all that. But that, to me, seems like an argument for why > you should have done this a long time ago. While I generally agree with this, I don't quite see why cleaning that up necessarily has to be connected to the current patch series which is my point. > Someone registering a notifier you _know_ won't be called reliably is a > sure sign of borkage. And you want to be notified (pun intended) of > borkage. > > So the alternative option to making the registration fail, is making the > registration WARN (and possibly disable fast support in the driver). > > But I do think something wants to be done here. So here's what I can do for the "fast switch" thing. There is the fast_switch_possible policy flag that's necessary anyway. I can make notifier registration fail when that is set for at least one policy and I can make the setting of it fail if at least one notifier has already been registered. However, without spending too much time on chasing code dependencies i sort of suspect that it will uncover things that register cpufreq notifiers early and it won't be possible to use fast switch without sorting that out. And that won't even change anything apart from removing some code that has not worked for quite a while already and nobody noticed. >> > No no no, that's just horrible. Why would you want to keep this >> > notification stuff alive? If your platform can change frequency 'fast' >> > you don't want notifiers. >> >> I'm not totally sure about that. > > I am, per definition, if you need to call notifiers, you're not fast. > > I would really suggest making that a hard rule and enforcing it. OK, but see above. It is doable for the "fast switch" thing, but it won't help in all of the other cases when notifications are not reliable. >> > What's the point of a notification that says: "At some point in the >> > random past my frequency has changed, and it likely has changed again >> > since then, do 'something'." >> > >> > That's pointless. If you have dependent clock domains or whatever, you >> > simply _cannot_ be fast. >> > >> >> What about thermal? They don't need to get very accurate information, >> but they need to be updated on a regular basis. It would do if they >> get averages instead of momentary values (and may be better even). > > Thermal, should be an integral part of cpufreq, but if they need a > callback from the switching hook (and here I would like to remind > everyone that this is inside scheduler hot paths and the more code you > stuff in the harder the performance regressions will hit you in the > face) Calling notifiers (or any kind of callbacks that anyone can register) from there is out of the question. > it can get a direct function call. No need for no stinking > notifiers. I'm not talking about hooks in the switching code but *some* way to let stuff know about frequency changes. If it changes frequently enough, it's not practical and not even necessary to cause things like thermal to react on every change, but I think there needs to be a way to make them reevaluate things regularly. Arguably, they might set a timer for that, but why would they need a timer if they could get triggered by the code that actually makes changes?
Re: [PATCH v2 6/10] cpufreq: Support for fast frequency switching
On Mon, Mar 7, 2016 at 9:00 AM, Peter Zijlstra wrote: > On Sun, Mar 06, 2016 at 03:17:09AM +0100, Rafael J. Wysocki wrote: >> > Agreed, fail the stuff hard. >> > >> > Simply make cpufreq_register_notifier a __must_check function and add >> > error handling to all call sites. >> >> Quite frankly, I don't see a compelling reason to do anything about >> the notifications at this point. >> >> The ACPI driver is the only one that will support fast switching for >> the time being and on practically all platforms that can use the ACPI >> driver the transition notifications cannot be relied on anyway for a >> few reasons. First, if intel_pstate or HWP is in use, they won't be >> coming at all. Second, anything turbo will just change frequency at >> will without notifying (like HWP). Finally, if they are coming, >> whoever receives them is notified about the frequency that is >> requested and not the real one, which is misleading, because (a) the >> request may just make the CPU go into the turbo range and then see >> above or (b) if the CPU is in a platform-coordinated package, its >> request will only be granted if it's the winning one. > > Sure I know all that. But that, to me, seems like an argument for why > you should have done this a long time ago. While I generally agree with this, I don't quite see why cleaning that up necessarily has to be connected to the current patch series which is my point. > Someone registering a notifier you _know_ won't be called reliably is a > sure sign of borkage. And you want to be notified (pun intended) of > borkage. > > So the alternative option to making the registration fail, is making the > registration WARN (and possibly disable fast support in the driver). > > But I do think something wants to be done here. So here's what I can do for the "fast switch" thing. There is the fast_switch_possible policy flag that's necessary anyway. I can make notifier registration fail when that is set for at least one policy and I can make the setting of it fail if at least one notifier has already been registered. However, without spending too much time on chasing code dependencies i sort of suspect that it will uncover things that register cpufreq notifiers early and it won't be possible to use fast switch without sorting that out. And that won't even change anything apart from removing some code that has not worked for quite a while already and nobody noticed. >> > No no no, that's just horrible. Why would you want to keep this >> > notification stuff alive? If your platform can change frequency 'fast' >> > you don't want notifiers. >> >> I'm not totally sure about that. > > I am, per definition, if you need to call notifiers, you're not fast. > > I would really suggest making that a hard rule and enforcing it. OK, but see above. It is doable for the "fast switch" thing, but it won't help in all of the other cases when notifications are not reliable. >> > What's the point of a notification that says: "At some point in the >> > random past my frequency has changed, and it likely has changed again >> > since then, do 'something'." >> > >> > That's pointless. If you have dependent clock domains or whatever, you >> > simply _cannot_ be fast. >> > >> >> What about thermal? They don't need to get very accurate information, >> but they need to be updated on a regular basis. It would do if they >> get averages instead of momentary values (and may be better even). > > Thermal, should be an integral part of cpufreq, but if they need a > callback from the switching hook (and here I would like to remind > everyone that this is inside scheduler hot paths and the more code you > stuff in the harder the performance regressions will hit you in the > face) Calling notifiers (or any kind of callbacks that anyone can register) from there is out of the question. > it can get a direct function call. No need for no stinking > notifiers. I'm not talking about hooks in the switching code but *some* way to let stuff know about frequency changes. If it changes frequently enough, it's not practical and not even necessary to cause things like thermal to react on every change, but I think there needs to be a way to make them reevaluate things regularly. Arguably, they might set a timer for that, but why would they need a timer if they could get triggered by the code that actually makes changes?
Re: [PATCH v2 6/10] cpufreq: Support for fast frequency switching
On Sun, Mar 06, 2016 at 03:17:09AM +0100, Rafael J. Wysocki wrote: > > Agreed, fail the stuff hard. > > > > Simply make cpufreq_register_notifier a __must_check function and add > > error handling to all call sites. > > Quite frankly, I don't see a compelling reason to do anything about > the notifications at this point. > > The ACPI driver is the only one that will support fast switching for > the time being and on practically all platforms that can use the ACPI > driver the transition notifications cannot be relied on anyway for a > few reasons. First, if intel_pstate or HWP is in use, they won't be > coming at all. Second, anything turbo will just change frequency at > will without notifying (like HWP). Finally, if they are coming, > whoever receives them is notified about the frequency that is > requested and not the real one, which is misleading, because (a) the > request may just make the CPU go into the turbo range and then see > above or (b) if the CPU is in a platform-coordinated package, its > request will only be granted if it's the winning one. Sure I know all that. But that, to me, seems like an argument for why you should have done this a long time ago. Someone registering a notifier you _know_ won't be called reliably is a sure sign of borkage. And you want to be notified (pun intended) of borkage. So the alternative option to making the registration fail, is making the registration WARN (and possibly disable fast support in the driver). But I do think something wants to be done here. > > No no no, that's just horrible. Why would you want to keep this > > notification stuff alive? If your platform can change frequency 'fast' > > you don't want notifiers. > > I'm not totally sure about that. I am, per definition, if you need to call notifiers, you're not fast. I would really suggest making that a hard rule and enforcing it. > > What's the point of a notification that says: "At some point in the > > random past my frequency has changed, and it likely has changed again > > since then, do 'something'." > > > > That's pointless. If you have dependent clock domains or whatever, you > > simply _cannot_ be fast. > > > > What about thermal? They don't need to get very accurate information, > but they need to be updated on a regular basis. It would do if they > get averages instead of momentary values (and may be better even). Thermal, should be an integral part of cpufreq, but if they need a callback from the switching hook (and here I would like to remind everyone that this is inside scheduler hot paths and the more code you stuff in the harder the performance regressions will hit you in the face) it can get a direct function call. No need for no stinking notifiers.
Re: [PATCH v2 6/10] cpufreq: Support for fast frequency switching
On Sun, Mar 06, 2016 at 03:17:09AM +0100, Rafael J. Wysocki wrote: > > Agreed, fail the stuff hard. > > > > Simply make cpufreq_register_notifier a __must_check function and add > > error handling to all call sites. > > Quite frankly, I don't see a compelling reason to do anything about > the notifications at this point. > > The ACPI driver is the only one that will support fast switching for > the time being and on practically all platforms that can use the ACPI > driver the transition notifications cannot be relied on anyway for a > few reasons. First, if intel_pstate or HWP is in use, they won't be > coming at all. Second, anything turbo will just change frequency at > will without notifying (like HWP). Finally, if they are coming, > whoever receives them is notified about the frequency that is > requested and not the real one, which is misleading, because (a) the > request may just make the CPU go into the turbo range and then see > above or (b) if the CPU is in a platform-coordinated package, its > request will only be granted if it's the winning one. Sure I know all that. But that, to me, seems like an argument for why you should have done this a long time ago. Someone registering a notifier you _know_ won't be called reliably is a sure sign of borkage. And you want to be notified (pun intended) of borkage. So the alternative option to making the registration fail, is making the registration WARN (and possibly disable fast support in the driver). But I do think something wants to be done here. > > No no no, that's just horrible. Why would you want to keep this > > notification stuff alive? If your platform can change frequency 'fast' > > you don't want notifiers. > > I'm not totally sure about that. I am, per definition, if you need to call notifiers, you're not fast. I would really suggest making that a hard rule and enforcing it. > > What's the point of a notification that says: "At some point in the > > random past my frequency has changed, and it likely has changed again > > since then, do 'something'." > > > > That's pointless. If you have dependent clock domains or whatever, you > > simply _cannot_ be fast. > > > > What about thermal? They don't need to get very accurate information, > but they need to be updated on a regular basis. It would do if they > get averages instead of momentary values (and may be better even). Thermal, should be an integral part of cpufreq, but if they need a callback from the switching hook (and here I would like to remind everyone that this is inside scheduler hot paths and the more code you stuff in the harder the performance regressions will hit you in the face) it can get a direct function call. No need for no stinking notifiers.
Re: [PATCH v2 6/10] cpufreq: Support for fast frequency switching
On Sat, Mar 5, 2016 at 5:49 PM, Peter Zijlstrawrote: > On Sat, Mar 05, 2016 at 12:18:54AM +0100, Rafael J. Wysocki wrote: > >> >>> Even if there are platforms which may change the CPU frequency behind >> >>> cpufreq's back, breaking the transition notifiers, I'm worried about the >> >>> addition of an interface which itself breaks them. The platforms which >> >>> do change CPU frequency on their own have probably evolved to live with >> >>> or work around this behavior. As other platforms migrate to fast >> >>> frequency switching they might be surprised when things don't work as >> >>> advertised. > > There's only 43 sites of cpufreq_register_notifier in 37 files, that > should be fairly simple to audit. > >> >>> I'm not sure what the easiest way to deal with this is. I see the >> >>> transition notifiers are the srcu type, which I understand to be >> >>> blocking. Going through the tree and reworking everyone's callbacks and >> >>> changing the type to atomic is obviously not realistic. >> >> >> >> Right. > > Even if it was (and per the above it looks entirely feasible), that's > just not going to happen. We're not ever going to call random notifier > crap from this deep within the scheduler. > >> >>> How about modifying cpufreq_register_notifier to return an error if the >> >>> driver has a fast_switch callback installed and an attempt to register a >> >>> transition notifier is made? >> >> >> >> That sounds like a good idea. > > Agreed, fail the stuff hard. > > Simply make cpufreq_register_notifier a __must_check function and add > error handling to all call sites. Quite frankly, I don't see a compelling reason to do anything about the notifications at this point. The ACPI driver is the only one that will support fast switching for the time being and on practically all platforms that can use the ACPI driver the transition notifications cannot be relied on anyway for a few reasons. First, if intel_pstate or HWP is in use, they won't be coming at all. Second, anything turbo will just change frequency at will without notifying (like HWP). Finally, if they are coming, whoever receives them is notified about the frequency that is requested and not the real one, which is misleading, because (a) the request may just make the CPU go into the turbo range and then see above or (b) if the CPU is in a platform-coordinated package, its request will only be granted if it's the winning one. >> > I guess what might be done would be to spawn a work item to carry out >> > a notify when the frequency changes. >> >> In fact, the mechanism may be relatively simple if I'm not mistaken. >> >> In the "fast switch" case, the governor may spawn a work item that >> will just execute cpufreq_get() on policy->cpu. That will notice that >> policy->cur is different from the real current frequency and will >> re-adjust. >> >> Of course, cpufreq_driver_fast_switch() will need to be modified so it >> doesn't update policy->cur then perhaps with a comment that the >> governor using it will be responsible for that. > > No no no, that's just horrible. Why would you want to keep this > notification stuff alive? If your platform can change frequency 'fast' > you don't want notifiers. I'm not totally sure about that. > > What's the point of a notification that says: "At some point in the > random past my frequency has changed, and it likely has changed again > since then, do 'something'." > > That's pointless. If you have dependent clock domains or whatever, you > simply _cannot_ be fast. > What about thermal? They don't need to get very accurate information, but they need to be updated on a regular basis. It would do if they get averages instead of momentary values (and may be better even).
Re: [PATCH v2 6/10] cpufreq: Support for fast frequency switching
On Sat, Mar 5, 2016 at 5:49 PM, Peter Zijlstra wrote: > On Sat, Mar 05, 2016 at 12:18:54AM +0100, Rafael J. Wysocki wrote: > >> >>> Even if there are platforms which may change the CPU frequency behind >> >>> cpufreq's back, breaking the transition notifiers, I'm worried about the >> >>> addition of an interface which itself breaks them. The platforms which >> >>> do change CPU frequency on their own have probably evolved to live with >> >>> or work around this behavior. As other platforms migrate to fast >> >>> frequency switching they might be surprised when things don't work as >> >>> advertised. > > There's only 43 sites of cpufreq_register_notifier in 37 files, that > should be fairly simple to audit. > >> >>> I'm not sure what the easiest way to deal with this is. I see the >> >>> transition notifiers are the srcu type, which I understand to be >> >>> blocking. Going through the tree and reworking everyone's callbacks and >> >>> changing the type to atomic is obviously not realistic. >> >> >> >> Right. > > Even if it was (and per the above it looks entirely feasible), that's > just not going to happen. We're not ever going to call random notifier > crap from this deep within the scheduler. > >> >>> How about modifying cpufreq_register_notifier to return an error if the >> >>> driver has a fast_switch callback installed and an attempt to register a >> >>> transition notifier is made? >> >> >> >> That sounds like a good idea. > > Agreed, fail the stuff hard. > > Simply make cpufreq_register_notifier a __must_check function and add > error handling to all call sites. Quite frankly, I don't see a compelling reason to do anything about the notifications at this point. The ACPI driver is the only one that will support fast switching for the time being and on practically all platforms that can use the ACPI driver the transition notifications cannot be relied on anyway for a few reasons. First, if intel_pstate or HWP is in use, they won't be coming at all. Second, anything turbo will just change frequency at will without notifying (like HWP). Finally, if they are coming, whoever receives them is notified about the frequency that is requested and not the real one, which is misleading, because (a) the request may just make the CPU go into the turbo range and then see above or (b) if the CPU is in a platform-coordinated package, its request will only be granted if it's the winning one. >> > I guess what might be done would be to spawn a work item to carry out >> > a notify when the frequency changes. >> >> In fact, the mechanism may be relatively simple if I'm not mistaken. >> >> In the "fast switch" case, the governor may spawn a work item that >> will just execute cpufreq_get() on policy->cpu. That will notice that >> policy->cur is different from the real current frequency and will >> re-adjust. >> >> Of course, cpufreq_driver_fast_switch() will need to be modified so it >> doesn't update policy->cur then perhaps with a comment that the >> governor using it will be responsible for that. > > No no no, that's just horrible. Why would you want to keep this > notification stuff alive? If your platform can change frequency 'fast' > you don't want notifiers. I'm not totally sure about that. > > What's the point of a notification that says: "At some point in the > random past my frequency has changed, and it likely has changed again > since then, do 'something'." > > That's pointless. If you have dependent clock domains or whatever, you > simply _cannot_ be fast. > What about thermal? They don't need to get very accurate information, but they need to be updated on a regular basis. It would do if they get averages instead of momentary values (and may be better even).
Re: [PATCH v2 6/10] cpufreq: Support for fast frequency switching
On Sat, Mar 05, 2016 at 12:18:54AM +0100, Rafael J. Wysocki wrote: > >>> Even if there are platforms which may change the CPU frequency behind > >>> cpufreq's back, breaking the transition notifiers, I'm worried about the > >>> addition of an interface which itself breaks them. The platforms which > >>> do change CPU frequency on their own have probably evolved to live with > >>> or work around this behavior. As other platforms migrate to fast > >>> frequency switching they might be surprised when things don't work as > >>> advertised. There's only 43 sites of cpufreq_register_notifier in 37 files, that should be fairly simple to audit. > >>> I'm not sure what the easiest way to deal with this is. I see the > >>> transition notifiers are the srcu type, which I understand to be > >>> blocking. Going through the tree and reworking everyone's callbacks and > >>> changing the type to atomic is obviously not realistic. > >> > >> Right. Even if it was (and per the above it looks entirely feasible), that's just not going to happen. We're not ever going to call random notifier crap from this deep within the scheduler. > >>> How about modifying cpufreq_register_notifier to return an error if the > >>> driver has a fast_switch callback installed and an attempt to register a > >>> transition notifier is made? > >> > >> That sounds like a good idea. Agreed, fail the stuff hard. Simply make cpufreq_register_notifier a __must_check function and add error handling to all call sites. > > I guess what might be done would be to spawn a work item to carry out > > a notify when the frequency changes. > > In fact, the mechanism may be relatively simple if I'm not mistaken. > > In the "fast switch" case, the governor may spawn a work item that > will just execute cpufreq_get() on policy->cpu. That will notice that > policy->cur is different from the real current frequency and will > re-adjust. > > Of course, cpufreq_driver_fast_switch() will need to be modified so it > doesn't update policy->cur then perhaps with a comment that the > governor using it will be responsible for that. No no no, that's just horrible. Why would you want to keep this notification stuff alive? If your platform can change frequency 'fast' you don't want notifiers. What's the point of a notification that says: "At some point in the random past my frequency has changed, and it likely has changed again since then, do 'something'." That's pointless. If you have dependent clock domains or whatever, you simply _cannot_ be fast.
Re: [PATCH v2 6/10] cpufreq: Support for fast frequency switching
On Sat, Mar 05, 2016 at 12:18:54AM +0100, Rafael J. Wysocki wrote: > >>> Even if there are platforms which may change the CPU frequency behind > >>> cpufreq's back, breaking the transition notifiers, I'm worried about the > >>> addition of an interface which itself breaks them. The platforms which > >>> do change CPU frequency on their own have probably evolved to live with > >>> or work around this behavior. As other platforms migrate to fast > >>> frequency switching they might be surprised when things don't work as > >>> advertised. There's only 43 sites of cpufreq_register_notifier in 37 files, that should be fairly simple to audit. > >>> I'm not sure what the easiest way to deal with this is. I see the > >>> transition notifiers are the srcu type, which I understand to be > >>> blocking. Going through the tree and reworking everyone's callbacks and > >>> changing the type to atomic is obviously not realistic. > >> > >> Right. Even if it was (and per the above it looks entirely feasible), that's just not going to happen. We're not ever going to call random notifier crap from this deep within the scheduler. > >>> How about modifying cpufreq_register_notifier to return an error if the > >>> driver has a fast_switch callback installed and an attempt to register a > >>> transition notifier is made? > >> > >> That sounds like a good idea. Agreed, fail the stuff hard. Simply make cpufreq_register_notifier a __must_check function and add error handling to all call sites. > > I guess what might be done would be to spawn a work item to carry out > > a notify when the frequency changes. > > In fact, the mechanism may be relatively simple if I'm not mistaken. > > In the "fast switch" case, the governor may spawn a work item that > will just execute cpufreq_get() on policy->cpu. That will notice that > policy->cur is different from the real current frequency and will > re-adjust. > > Of course, cpufreq_driver_fast_switch() will need to be modified so it > doesn't update policy->cur then perhaps with a comment that the > governor using it will be responsible for that. No no no, that's just horrible. Why would you want to keep this notification stuff alive? If your platform can change frequency 'fast' you don't want notifiers. What's the point of a notification that says: "At some point in the random past my frequency has changed, and it likely has changed again since then, do 'something'." That's pointless. If you have dependent clock domains or whatever, you simply _cannot_ be fast.
Re: [PATCH v2 6/10] cpufreq: Support for fast frequency switching
* Rafael J. Wysockiwrote: > > Honestly I wonder if it's better to just try the "no notifiers with fast > > drivers" approach to start. The notifiers could always be added if platform > > owners complain that they absolutely require them. > > Well, I'm not sure what happens if we start to fail notifier registrations. > It > may not be a well tested error code path. :-) Yeah, so as a general principle 'struct notifier_block' as a really bad interface with poor and fragile semantics, and we are trying to get rid of them everywhere from core kernel code. For example Thomas Gleixner et al is working on eliminating them from the CPU hotplug code - which will get rid of most remaining notifier uses from the scheduler as well. So please add explicit cpufreq driver callback functions instead, which can be filled in by a platform if needed. No notifiers! Thanks, Ingo
Re: [PATCH v2 6/10] cpufreq: Support for fast frequency switching
* Rafael J. Wysocki wrote: > > Honestly I wonder if it's better to just try the "no notifiers with fast > > drivers" approach to start. The notifiers could always be added if platform > > owners complain that they absolutely require them. > > Well, I'm not sure what happens if we start to fail notifier registrations. > It > may not be a well tested error code path. :-) Yeah, so as a general principle 'struct notifier_block' as a really bad interface with poor and fragile semantics, and we are trying to get rid of them everywhere from core kernel code. For example Thomas Gleixner et al is working on eliminating them from the CPU hotplug code - which will get rid of most remaining notifier uses from the scheduler as well. So please add explicit cpufreq driver callback functions instead, which can be filled in by a platform if needed. No notifiers! Thanks, Ingo
Re: [PATCH v2 6/10] cpufreq: Support for fast frequency switching
On Sat, Mar 5, 2016 at 12:56 AM, Steve Mucklewrote: > On 03/04/2016 03:18 PM, Rafael J. Wysocki wrote: >> In fact, the mechanism may be relatively simple if I'm not mistaken. >> >> In the "fast switch" case, the governor may spawn a work item that >> will just execute cpufreq_get() on policy->cpu. That will notice that >> policy->cur is different from the real current frequency and will >> re-adjust. >> >> Of course, cpufreq_driver_fast_switch() will need to be modified so it >> doesn't update policy->cur then perhaps with a comment that the >> governor using it will be responsible for that. >> >> And the governor will need to avoid spawning that work item too often >> (basically, if one has been spawned already and hasn't completed, no >> need to spawn a new one, and maybe rate-limit it?), but all that looks >> reasonably straightforward. > > It is another option though definitely a compromise. The semantics seem > different since you'd potentially have multiple freq changes before a > single notifier went through, so stuff might still break. Here I'm not worried. That's basically equivalent to someone doing a "get" and seeing an unexpected frequency in the driver output which is covered already and things need to cope with it or they are just really broken. > The fast path would also be more expensive given the workqueue activity that > could > translate into additional task wakeups. That's a valid concern, so maybe there can be a driver flag to indicate that this has to be done if ->fast_switch is in use? Or something like fast_switch_notify_rate that will tell the governor how often to notify things about transitions if ->fast_switch is in use with either 0 or all ones meaning "never"? That might be a policy property even, so the driver may set this depending on what platform it is used on. > Honestly I wonder if it's better to just try the "no notifiers with fast > drivers" approach to start. The notifiers could always be added if > platform owners complain that they absolutely require them. Well, I'm not sure what happens if we start to fail notifier registrations. It may not be a well tested error code path. :-) Besides, there is the problem with registering notifiers before the driver and I don't think we can fail driver registration if notifiers have already been registered. We may not be able to register a "fast" driver at all in that case. But that whole thing is your worry, not mine. :-) Had I been worrying about that, I would have added some bandaid for that to the patches. Thanks, Rafael
Re: [PATCH v2 6/10] cpufreq: Support for fast frequency switching
On Sat, Mar 5, 2016 at 12:56 AM, Steve Muckle wrote: > On 03/04/2016 03:18 PM, Rafael J. Wysocki wrote: >> In fact, the mechanism may be relatively simple if I'm not mistaken. >> >> In the "fast switch" case, the governor may spawn a work item that >> will just execute cpufreq_get() on policy->cpu. That will notice that >> policy->cur is different from the real current frequency and will >> re-adjust. >> >> Of course, cpufreq_driver_fast_switch() will need to be modified so it >> doesn't update policy->cur then perhaps with a comment that the >> governor using it will be responsible for that. >> >> And the governor will need to avoid spawning that work item too often >> (basically, if one has been spawned already and hasn't completed, no >> need to spawn a new one, and maybe rate-limit it?), but all that looks >> reasonably straightforward. > > It is another option though definitely a compromise. The semantics seem > different since you'd potentially have multiple freq changes before a > single notifier went through, so stuff might still break. Here I'm not worried. That's basically equivalent to someone doing a "get" and seeing an unexpected frequency in the driver output which is covered already and things need to cope with it or they are just really broken. > The fast path would also be more expensive given the workqueue activity that > could > translate into additional task wakeups. That's a valid concern, so maybe there can be a driver flag to indicate that this has to be done if ->fast_switch is in use? Or something like fast_switch_notify_rate that will tell the governor how often to notify things about transitions if ->fast_switch is in use with either 0 or all ones meaning "never"? That might be a policy property even, so the driver may set this depending on what platform it is used on. > Honestly I wonder if it's better to just try the "no notifiers with fast > drivers" approach to start. The notifiers could always be added if > platform owners complain that they absolutely require them. Well, I'm not sure what happens if we start to fail notifier registrations. It may not be a well tested error code path. :-) Besides, there is the problem with registering notifiers before the driver and I don't think we can fail driver registration if notifiers have already been registered. We may not be able to register a "fast" driver at all in that case. But that whole thing is your worry, not mine. :-) Had I been worrying about that, I would have added some bandaid for that to the patches. Thanks, Rafael
Re: [PATCH v2 6/10] cpufreq: Support for fast frequency switching
On 03/04/2016 02:58 PM, Rafael J. Wysocki wrote: >>> How about modifying cpufreq_register_notifier to return an error if the >>> >> driver has a fast_switch callback installed and an attempt to register a >>> >> transition notifier is made? >> > >> > That sounds like a good idea. > > Transition notifiers may be registered before the driver is > registered, so that won't help in all cases. Could that hole be closed by a similar check in cpufreq_register_driver()? I.e. if the transition_notifier list is not empty, fail to register the driver (if the driver has a fast_switch routine)? Or alternatively, the fast_switch routine is not installed. thanks, Steve
Re: [PATCH v2 6/10] cpufreq: Support for fast frequency switching
On 03/04/2016 02:58 PM, Rafael J. Wysocki wrote: >>> How about modifying cpufreq_register_notifier to return an error if the >>> >> driver has a fast_switch callback installed and an attempt to register a >>> >> transition notifier is made? >> > >> > That sounds like a good idea. > > Transition notifiers may be registered before the driver is > registered, so that won't help in all cases. Could that hole be closed by a similar check in cpufreq_register_driver()? I.e. if the transition_notifier list is not empty, fail to register the driver (if the driver has a fast_switch routine)? Or alternatively, the fast_switch routine is not installed. thanks, Steve
Re: [PATCH v2 6/10] cpufreq: Support for fast frequency switching
On 03/04/2016 03:18 PM, Rafael J. Wysocki wrote: > In fact, the mechanism may be relatively simple if I'm not mistaken. > > In the "fast switch" case, the governor may spawn a work item that > will just execute cpufreq_get() on policy->cpu. That will notice that > policy->cur is different from the real current frequency and will > re-adjust. > > Of course, cpufreq_driver_fast_switch() will need to be modified so it > doesn't update policy->cur then perhaps with a comment that the > governor using it will be responsible for that. > > And the governor will need to avoid spawning that work item too often > (basically, if one has been spawned already and hasn't completed, no > need to spawn a new one, and maybe rate-limit it?), but all that looks > reasonably straightforward. It is another option though definitely a compromise. The semantics seem different since you'd potentially have multiple freq changes before a single notifier went through, so stuff might still break. The fast path would also be more expensive given the workqueue activity that could translate into additional task wakeups. Honestly I wonder if it's better to just try the "no notifiers with fast drivers" approach to start. The notifiers could always be added if platform owners complain that they absolutely require them. thanks, Steve
Re: [PATCH v2 6/10] cpufreq: Support for fast frequency switching
On 03/04/2016 03:18 PM, Rafael J. Wysocki wrote: > In fact, the mechanism may be relatively simple if I'm not mistaken. > > In the "fast switch" case, the governor may spawn a work item that > will just execute cpufreq_get() on policy->cpu. That will notice that > policy->cur is different from the real current frequency and will > re-adjust. > > Of course, cpufreq_driver_fast_switch() will need to be modified so it > doesn't update policy->cur then perhaps with a comment that the > governor using it will be responsible for that. > > And the governor will need to avoid spawning that work item too often > (basically, if one has been spawned already and hasn't completed, no > need to spawn a new one, and maybe rate-limit it?), but all that looks > reasonably straightforward. It is another option though definitely a compromise. The semantics seem different since you'd potentially have multiple freq changes before a single notifier went through, so stuff might still break. The fast path would also be more expensive given the workqueue activity that could translate into additional task wakeups. Honestly I wonder if it's better to just try the "no notifiers with fast drivers" approach to start. The notifiers could always be added if platform owners complain that they absolutely require them. thanks, Steve
Re: [PATCH v2 6/10] cpufreq: Support for fast frequency switching
On Fri, Mar 4, 2016 at 11:40 PM, Rafael J. Wysockiwrote: > On Fri, Mar 4, 2016 at 11:32 PM, Rafael J. Wysocki wrote: >> On Fri, Mar 4, 2016 at 11:18 PM, Steve Muckle >> wrote: >>> On 03/03/2016 07:07 PM, Rafael J. Wysocki wrote: +void cpufreq_driver_fast_switch(struct cpufreq_policy *policy, + unsigned int target_freq, unsigned int relation) +{ + unsigned int freq; + + freq = cpufreq_driver->fast_switch(policy, target_freq, relation); + if (freq != CPUFREQ_ENTRY_INVALID) { + policy->cur = freq; + trace_cpu_frequency(freq, smp_processor_id()); + } +} >>> >>> Even if there are platforms which may change the CPU frequency behind >>> cpufreq's back, breaking the transition notifiers, I'm worried about the >>> addition of an interface which itself breaks them. The platforms which >>> do change CPU frequency on their own have probably evolved to live with >>> or work around this behavior. As other platforms migrate to fast >>> frequency switching they might be surprised when things don't work as >>> advertised. >> >> Well, intel_pstate doesn't do notifies at all, so anything depending >> on them is already broken when it is used. Let alone the hardware >> P-states coordination mechanism (HWP) where the frequency is >> controlled by the processor itself entirely. >> >> That said I see your point. >> >>> I'm not sure what the easiest way to deal with this is. I see the >>> transition notifiers are the srcu type, which I understand to be >>> blocking. Going through the tree and reworking everyone's callbacks and >>> changing the type to atomic is obviously not realistic. >> >> Right. >> >>> How about modifying cpufreq_register_notifier to return an error if the >>> driver has a fast_switch callback installed and an attempt to register a >>> transition notifier is made? >> >> That sounds like a good idea. >> >> There also is the CPUFREQ_ASYNC_NOTIFICATION driver flag that in >> principle might be used as a workaround, but I'm not sure how much >> work that would require ATM. > > What I mean is that drivers using it are supposed to handle the > notifications by calling cpufreq_freq_transition_begin(/end() by > themselves, so theoretically there is a mechanism already in place for > that. > > I guess what might be done would be to spawn a work item to carry out > a notify when the frequency changes. In fact, the mechanism may be relatively simple if I'm not mistaken. In the "fast switch" case, the governor may spawn a work item that will just execute cpufreq_get() on policy->cpu. That will notice that policy->cur is different from the real current frequency and will re-adjust. Of course, cpufreq_driver_fast_switch() will need to be modified so it doesn't update policy->cur then perhaps with a comment that the governor using it will be responsible for that. And the governor will need to avoid spawning that work item too often (basically, if one has been spawned already and hasn't completed, no need to spawn a new one, and maybe rate-limit it?), but all that looks reasonably straightforward. Thanks, Rafael
Re: [PATCH v2 6/10] cpufreq: Support for fast frequency switching
On Fri, Mar 4, 2016 at 11:40 PM, Rafael J. Wysocki wrote: > On Fri, Mar 4, 2016 at 11:32 PM, Rafael J. Wysocki wrote: >> On Fri, Mar 4, 2016 at 11:18 PM, Steve Muckle >> wrote: >>> On 03/03/2016 07:07 PM, Rafael J. Wysocki wrote: +void cpufreq_driver_fast_switch(struct cpufreq_policy *policy, + unsigned int target_freq, unsigned int relation) +{ + unsigned int freq; + + freq = cpufreq_driver->fast_switch(policy, target_freq, relation); + if (freq != CPUFREQ_ENTRY_INVALID) { + policy->cur = freq; + trace_cpu_frequency(freq, smp_processor_id()); + } +} >>> >>> Even if there are platforms which may change the CPU frequency behind >>> cpufreq's back, breaking the transition notifiers, I'm worried about the >>> addition of an interface which itself breaks them. The platforms which >>> do change CPU frequency on their own have probably evolved to live with >>> or work around this behavior. As other platforms migrate to fast >>> frequency switching they might be surprised when things don't work as >>> advertised. >> >> Well, intel_pstate doesn't do notifies at all, so anything depending >> on them is already broken when it is used. Let alone the hardware >> P-states coordination mechanism (HWP) where the frequency is >> controlled by the processor itself entirely. >> >> That said I see your point. >> >>> I'm not sure what the easiest way to deal with this is. I see the >>> transition notifiers are the srcu type, which I understand to be >>> blocking. Going through the tree and reworking everyone's callbacks and >>> changing the type to atomic is obviously not realistic. >> >> Right. >> >>> How about modifying cpufreq_register_notifier to return an error if the >>> driver has a fast_switch callback installed and an attempt to register a >>> transition notifier is made? >> >> That sounds like a good idea. >> >> There also is the CPUFREQ_ASYNC_NOTIFICATION driver flag that in >> principle might be used as a workaround, but I'm not sure how much >> work that would require ATM. > > What I mean is that drivers using it are supposed to handle the > notifications by calling cpufreq_freq_transition_begin(/end() by > themselves, so theoretically there is a mechanism already in place for > that. > > I guess what might be done would be to spawn a work item to carry out > a notify when the frequency changes. In fact, the mechanism may be relatively simple if I'm not mistaken. In the "fast switch" case, the governor may spawn a work item that will just execute cpufreq_get() on policy->cpu. That will notice that policy->cur is different from the real current frequency and will re-adjust. Of course, cpufreq_driver_fast_switch() will need to be modified so it doesn't update policy->cur then perhaps with a comment that the governor using it will be responsible for that. And the governor will need to avoid spawning that work item too often (basically, if one has been spawned already and hasn't completed, no need to spawn a new one, and maybe rate-limit it?), but all that looks reasonably straightforward. Thanks, Rafael
Re: [PATCH v2 6/10] cpufreq: Support for fast frequency switching
On Fri, Mar 4, 2016 at 11:32 PM, Rafael J. Wysockiwrote: > On Fri, Mar 4, 2016 at 11:18 PM, Steve Muckle wrote: >> On 03/03/2016 07:07 PM, Rafael J. Wysocki wrote: >>> +void cpufreq_driver_fast_switch(struct cpufreq_policy *policy, >>> + unsigned int target_freq, unsigned int >>> relation) >>> +{ >>> + unsigned int freq; >>> + >>> + freq = cpufreq_driver->fast_switch(policy, target_freq, relation); >>> + if (freq != CPUFREQ_ENTRY_INVALID) { >>> + policy->cur = freq; >>> + trace_cpu_frequency(freq, smp_processor_id()); >>> + } >>> +} >> >> Even if there are platforms which may change the CPU frequency behind >> cpufreq's back, breaking the transition notifiers, I'm worried about the >> addition of an interface which itself breaks them. The platforms which >> do change CPU frequency on their own have probably evolved to live with >> or work around this behavior. As other platforms migrate to fast >> frequency switching they might be surprised when things don't work as >> advertised. > > Well, intel_pstate doesn't do notifies at all, so anything depending > on them is already broken when it is used. Let alone the hardware > P-states coordination mechanism (HWP) where the frequency is > controlled by the processor itself entirely. > > That said I see your point. > >> I'm not sure what the easiest way to deal with this is. I see the >> transition notifiers are the srcu type, which I understand to be >> blocking. Going through the tree and reworking everyone's callbacks and >> changing the type to atomic is obviously not realistic. > > Right. > >> How about modifying cpufreq_register_notifier to return an error if the >> driver has a fast_switch callback installed and an attempt to register a >> transition notifier is made? > > That sounds like a good idea. Transition notifiers may be registered before the driver is registered, so that won't help in all cases. Thanks, Rafael
Re: [PATCH v2 6/10] cpufreq: Support for fast frequency switching
On Fri, Mar 4, 2016 at 11:32 PM, Rafael J. Wysocki wrote: > On Fri, Mar 4, 2016 at 11:18 PM, Steve Muckle wrote: >> On 03/03/2016 07:07 PM, Rafael J. Wysocki wrote: >>> +void cpufreq_driver_fast_switch(struct cpufreq_policy *policy, >>> + unsigned int target_freq, unsigned int >>> relation) >>> +{ >>> + unsigned int freq; >>> + >>> + freq = cpufreq_driver->fast_switch(policy, target_freq, relation); >>> + if (freq != CPUFREQ_ENTRY_INVALID) { >>> + policy->cur = freq; >>> + trace_cpu_frequency(freq, smp_processor_id()); >>> + } >>> +} >> >> Even if there are platforms which may change the CPU frequency behind >> cpufreq's back, breaking the transition notifiers, I'm worried about the >> addition of an interface which itself breaks them. The platforms which >> do change CPU frequency on their own have probably evolved to live with >> or work around this behavior. As other platforms migrate to fast >> frequency switching they might be surprised when things don't work as >> advertised. > > Well, intel_pstate doesn't do notifies at all, so anything depending > on them is already broken when it is used. Let alone the hardware > P-states coordination mechanism (HWP) where the frequency is > controlled by the processor itself entirely. > > That said I see your point. > >> I'm not sure what the easiest way to deal with this is. I see the >> transition notifiers are the srcu type, which I understand to be >> blocking. Going through the tree and reworking everyone's callbacks and >> changing the type to atomic is obviously not realistic. > > Right. > >> How about modifying cpufreq_register_notifier to return an error if the >> driver has a fast_switch callback installed and an attempt to register a >> transition notifier is made? > > That sounds like a good idea. Transition notifiers may be registered before the driver is registered, so that won't help in all cases. Thanks, Rafael
Re: [PATCH v2 6/10] cpufreq: Support for fast frequency switching
On Fri, Mar 4, 2016 at 11:32 PM, Rafael J. Wysockiwrote: > On Fri, Mar 4, 2016 at 11:18 PM, Steve Muckle wrote: >> On 03/03/2016 07:07 PM, Rafael J. Wysocki wrote: >>> +void cpufreq_driver_fast_switch(struct cpufreq_policy *policy, >>> + unsigned int target_freq, unsigned int >>> relation) >>> +{ >>> + unsigned int freq; >>> + >>> + freq = cpufreq_driver->fast_switch(policy, target_freq, relation); >>> + if (freq != CPUFREQ_ENTRY_INVALID) { >>> + policy->cur = freq; >>> + trace_cpu_frequency(freq, smp_processor_id()); >>> + } >>> +} >> >> Even if there are platforms which may change the CPU frequency behind >> cpufreq's back, breaking the transition notifiers, I'm worried about the >> addition of an interface which itself breaks them. The platforms which >> do change CPU frequency on their own have probably evolved to live with >> or work around this behavior. As other platforms migrate to fast >> frequency switching they might be surprised when things don't work as >> advertised. > > Well, intel_pstate doesn't do notifies at all, so anything depending > on them is already broken when it is used. Let alone the hardware > P-states coordination mechanism (HWP) where the frequency is > controlled by the processor itself entirely. > > That said I see your point. > >> I'm not sure what the easiest way to deal with this is. I see the >> transition notifiers are the srcu type, which I understand to be >> blocking. Going through the tree and reworking everyone's callbacks and >> changing the type to atomic is obviously not realistic. > > Right. > >> How about modifying cpufreq_register_notifier to return an error if the >> driver has a fast_switch callback installed and an attempt to register a >> transition notifier is made? > > That sounds like a good idea. > > There also is the CPUFREQ_ASYNC_NOTIFICATION driver flag that in > principle might be used as a workaround, but I'm not sure how much > work that would require ATM. What I mean is that drivers using it are supposed to handle the notifications by calling cpufreq_freq_transition_begin(/end() by themselves, so theoretically there is a mechanism already in place for that. I guess what might be done would be to spawn a work item to carry out a notify when the frequency changes. Thanks, Rafael
Re: [PATCH v2 6/10] cpufreq: Support for fast frequency switching
On Fri, Mar 4, 2016 at 11:32 PM, Rafael J. Wysocki wrote: > On Fri, Mar 4, 2016 at 11:18 PM, Steve Muckle wrote: >> On 03/03/2016 07:07 PM, Rafael J. Wysocki wrote: >>> +void cpufreq_driver_fast_switch(struct cpufreq_policy *policy, >>> + unsigned int target_freq, unsigned int >>> relation) >>> +{ >>> + unsigned int freq; >>> + >>> + freq = cpufreq_driver->fast_switch(policy, target_freq, relation); >>> + if (freq != CPUFREQ_ENTRY_INVALID) { >>> + policy->cur = freq; >>> + trace_cpu_frequency(freq, smp_processor_id()); >>> + } >>> +} >> >> Even if there are platforms which may change the CPU frequency behind >> cpufreq's back, breaking the transition notifiers, I'm worried about the >> addition of an interface which itself breaks them. The platforms which >> do change CPU frequency on their own have probably evolved to live with >> or work around this behavior. As other platforms migrate to fast >> frequency switching they might be surprised when things don't work as >> advertised. > > Well, intel_pstate doesn't do notifies at all, so anything depending > on them is already broken when it is used. Let alone the hardware > P-states coordination mechanism (HWP) where the frequency is > controlled by the processor itself entirely. > > That said I see your point. > >> I'm not sure what the easiest way to deal with this is. I see the >> transition notifiers are the srcu type, which I understand to be >> blocking. Going through the tree and reworking everyone's callbacks and >> changing the type to atomic is obviously not realistic. > > Right. > >> How about modifying cpufreq_register_notifier to return an error if the >> driver has a fast_switch callback installed and an attempt to register a >> transition notifier is made? > > That sounds like a good idea. > > There also is the CPUFREQ_ASYNC_NOTIFICATION driver flag that in > principle might be used as a workaround, but I'm not sure how much > work that would require ATM. What I mean is that drivers using it are supposed to handle the notifications by calling cpufreq_freq_transition_begin(/end() by themselves, so theoretically there is a mechanism already in place for that. I guess what might be done would be to spawn a work item to carry out a notify when the frequency changes. Thanks, Rafael
Re: [PATCH v2 6/10] cpufreq: Support for fast frequency switching
On Fri, Mar 4, 2016 at 11:18 PM, Steve Mucklewrote: > On 03/03/2016 07:07 PM, Rafael J. Wysocki wrote: >> +void cpufreq_driver_fast_switch(struct cpufreq_policy *policy, >> + unsigned int target_freq, unsigned int >> relation) >> +{ >> + unsigned int freq; >> + >> + freq = cpufreq_driver->fast_switch(policy, target_freq, relation); >> + if (freq != CPUFREQ_ENTRY_INVALID) { >> + policy->cur = freq; >> + trace_cpu_frequency(freq, smp_processor_id()); >> + } >> +} > > Even if there are platforms which may change the CPU frequency behind > cpufreq's back, breaking the transition notifiers, I'm worried about the > addition of an interface which itself breaks them. The platforms which > do change CPU frequency on their own have probably evolved to live with > or work around this behavior. As other platforms migrate to fast > frequency switching they might be surprised when things don't work as > advertised. Well, intel_pstate doesn't do notifies at all, so anything depending on them is already broken when it is used. Let alone the hardware P-states coordination mechanism (HWP) where the frequency is controlled by the processor itself entirely. That said I see your point. > I'm not sure what the easiest way to deal with this is. I see the > transition notifiers are the srcu type, which I understand to be > blocking. Going through the tree and reworking everyone's callbacks and > changing the type to atomic is obviously not realistic. Right. > How about modifying cpufreq_register_notifier to return an error if the > driver has a fast_switch callback installed and an attempt to register a > transition notifier is made? That sounds like a good idea. There also is the CPUFREQ_ASYNC_NOTIFICATION driver flag that in principle might be used as a workaround, but I'm not sure how much work that would require ATM. > In the future, perhaps an additional atomic transition callback type can > be added, which platform/driver owners can switch to if they wish to use > fast transitions with their platform. I guess you mean an atomic notification mechanism based on registering callbacks? While technically viable that's somewhat risky, because we are in a fast path and allowing anyone to add stuff to it would be asking for trouble IMO. Thanks, Rafael
Re: [PATCH v2 6/10] cpufreq: Support for fast frequency switching
On Fri, Mar 4, 2016 at 11:18 PM, Steve Muckle wrote: > On 03/03/2016 07:07 PM, Rafael J. Wysocki wrote: >> +void cpufreq_driver_fast_switch(struct cpufreq_policy *policy, >> + unsigned int target_freq, unsigned int >> relation) >> +{ >> + unsigned int freq; >> + >> + freq = cpufreq_driver->fast_switch(policy, target_freq, relation); >> + if (freq != CPUFREQ_ENTRY_INVALID) { >> + policy->cur = freq; >> + trace_cpu_frequency(freq, smp_processor_id()); >> + } >> +} > > Even if there are platforms which may change the CPU frequency behind > cpufreq's back, breaking the transition notifiers, I'm worried about the > addition of an interface which itself breaks them. The platforms which > do change CPU frequency on their own have probably evolved to live with > or work around this behavior. As other platforms migrate to fast > frequency switching they might be surprised when things don't work as > advertised. Well, intel_pstate doesn't do notifies at all, so anything depending on them is already broken when it is used. Let alone the hardware P-states coordination mechanism (HWP) where the frequency is controlled by the processor itself entirely. That said I see your point. > I'm not sure what the easiest way to deal with this is. I see the > transition notifiers are the srcu type, which I understand to be > blocking. Going through the tree and reworking everyone's callbacks and > changing the type to atomic is obviously not realistic. Right. > How about modifying cpufreq_register_notifier to return an error if the > driver has a fast_switch callback installed and an attempt to register a > transition notifier is made? That sounds like a good idea. There also is the CPUFREQ_ASYNC_NOTIFICATION driver flag that in principle might be used as a workaround, but I'm not sure how much work that would require ATM. > In the future, perhaps an additional atomic transition callback type can > be added, which platform/driver owners can switch to if they wish to use > fast transitions with their platform. I guess you mean an atomic notification mechanism based on registering callbacks? While technically viable that's somewhat risky, because we are in a fast path and allowing anyone to add stuff to it would be asking for trouble IMO. Thanks, Rafael
Re: [PATCH v2 6/10] cpufreq: Support for fast frequency switching
On 03/03/2016 07:07 PM, Rafael J. Wysocki wrote: > +void cpufreq_driver_fast_switch(struct cpufreq_policy *policy, > + unsigned int target_freq, unsigned int relation) > +{ > + unsigned int freq; > + > + freq = cpufreq_driver->fast_switch(policy, target_freq, relation); > + if (freq != CPUFREQ_ENTRY_INVALID) { > + policy->cur = freq; > + trace_cpu_frequency(freq, smp_processor_id()); > + } > +} Even if there are platforms which may change the CPU frequency behind cpufreq's back, breaking the transition notifiers, I'm worried about the addition of an interface which itself breaks them. The platforms which do change CPU frequency on their own have probably evolved to live with or work around this behavior. As other platforms migrate to fast frequency switching they might be surprised when things don't work as advertised. I'm not sure what the easiest way to deal with this is. I see the transition notifiers are the srcu type, which I understand to be blocking. Going through the tree and reworking everyone's callbacks and changing the type to atomic is obviously not realistic. How about modifying cpufreq_register_notifier to return an error if the driver has a fast_switch callback installed and an attempt to register a transition notifier is made? In the future, perhaps an additional atomic transition callback type can be added, which platform/driver owners can switch to if they wish to use fast transitions with their platform. thanks, Steve
Re: [PATCH v2 6/10] cpufreq: Support for fast frequency switching
On 03/03/2016 07:07 PM, Rafael J. Wysocki wrote: > +void cpufreq_driver_fast_switch(struct cpufreq_policy *policy, > + unsigned int target_freq, unsigned int relation) > +{ > + unsigned int freq; > + > + freq = cpufreq_driver->fast_switch(policy, target_freq, relation); > + if (freq != CPUFREQ_ENTRY_INVALID) { > + policy->cur = freq; > + trace_cpu_frequency(freq, smp_processor_id()); > + } > +} Even if there are platforms which may change the CPU frequency behind cpufreq's back, breaking the transition notifiers, I'm worried about the addition of an interface which itself breaks them. The platforms which do change CPU frequency on their own have probably evolved to live with or work around this behavior. As other platforms migrate to fast frequency switching they might be surprised when things don't work as advertised. I'm not sure what the easiest way to deal with this is. I see the transition notifiers are the srcu type, which I understand to be blocking. Going through the tree and reworking everyone's callbacks and changing the type to atomic is obviously not realistic. How about modifying cpufreq_register_notifier to return an error if the driver has a fast_switch callback installed and an attempt to register a transition notifier is made? In the future, perhaps an additional atomic transition callback type can be added, which platform/driver owners can switch to if they wish to use fast transitions with their platform. thanks, Steve
[PATCH v2 6/10] cpufreq: Support for fast frequency switching
From: Rafael J. WysockiModify the ACPI cpufreq driver to provide a method for switching CPU frequencies from interrupt context and update the cpufreq core to support that method if available. Introduce a new cpufreq driver callback, ->fast_switch, to be invoked for frequency switching from interrupt context via a new helper function, cpufreq_driver_fast_switch(). Add a new policy flag, fast_switch_possible, to be set if fast frequency switching can be used for the given policy. Implement the ->fast_switch callback in the ACPI cpufreq driver and make it set fast_switch_possible during policy initialization as appropriate. Signed-off-by: Rafael J. Wysocki --- Changes from the previous version: - Drop a bogus check from cpufreq_driver_fast_switch(). --- drivers/cpufreq/acpi-cpufreq.c | 53 + drivers/cpufreq/cpufreq.c | 30 +++ include/linux/cpufreq.h|6 3 files changed, 89 insertions(+) Index: linux-pm/drivers/cpufreq/acpi-cpufreq.c === --- linux-pm.orig/drivers/cpufreq/acpi-cpufreq.c +++ linux-pm/drivers/cpufreq/acpi-cpufreq.c @@ -458,6 +458,55 @@ static int acpi_cpufreq_target(struct cp return result; } +unsigned int acpi_cpufreq_fast_switch(struct cpufreq_policy *policy, + unsigned int target_freq, + unsigned int relation) +{ + struct acpi_cpufreq_data *data = policy->driver_data; + struct acpi_processor_performance *perf; + struct cpufreq_frequency_table *entry, *found; + unsigned int next_perf_state, next_freq, freq; + + /* +* Find the closest frequency above target_freq or equal to it. +* +* The table is sorted in the reverse order with respect to the +* frequency and all of the entries are valid (see the initialization). +*/ + entry = data->freq_table; + do { + entry++; + freq = entry->frequency; + } while (freq >= target_freq && freq != CPUFREQ_TABLE_END); + found = entry - 1; + /* +* Use the one found or the previous one, depending on the relation. +* CPUFREQ_RELATION_H is not taken into account here, but it is not +* expected to be passed to this function anyway. +*/ + next_freq = found->frequency; + if (freq == CPUFREQ_TABLE_END || relation != CPUFREQ_RELATION_C || + target_freq - freq >= next_freq - target_freq) { + next_perf_state = found->driver_data; + } else { + next_freq = freq; + next_perf_state = entry->driver_data; + } + + perf = to_perf_data(data); + if (perf->state == next_perf_state) { + if (unlikely(data->resume)) + data->resume = 0; + else + return next_freq; + } + + data->cpu_freq_write(>control_register, +perf->states[next_perf_state].control); + perf->state = next_perf_state; + return next_freq; +} + static unsigned long acpi_cpufreq_guess_freq(struct acpi_cpufreq_data *data, unsigned int cpu) { @@ -740,6 +789,9 @@ static int acpi_cpufreq_cpu_init(struct goto err_unreg; } + policy->fast_switch_possible = !acpi_pstate_strict && + !(policy_is_shared(policy) && policy->shared_type != CPUFREQ_SHARED_TYPE_ANY); + data->freq_table = kzalloc(sizeof(*data->freq_table) * (perf->state_count+1), GFP_KERNEL); if (!data->freq_table) { @@ -874,6 +926,7 @@ static struct freq_attr *acpi_cpufreq_at static struct cpufreq_driver acpi_cpufreq_driver = { .verify = cpufreq_generic_frequency_table_verify, .target_index = acpi_cpufreq_target, + .fast_switch= acpi_cpufreq_fast_switch, .bios_limit = acpi_processor_get_bios_limit, .init = acpi_cpufreq_cpu_init, .exit = acpi_cpufreq_cpu_exit, Index: linux-pm/drivers/cpufreq/cpufreq.c === --- linux-pm.orig/drivers/cpufreq/cpufreq.c +++ linux-pm/drivers/cpufreq/cpufreq.c @@ -1719,6 +1719,36 @@ EXPORT_SYMBOL(cpufreq_unregister_notifie * GOVERNORS* */ +/** + * cpufreq_driver_fast_switch - Carry out a fast CPU frequency switch. + * @policy: cpufreq policy to switch the frequency for. + * @target_freq: New frequency to set (may be approximate). + * @relation: Relation to use for frequency selection. + * + * Carry out a fast frequency switch from interrupt context. + * + * This function must not be called if policy->fast_switch_possible is unset.
[PATCH v2 6/10] cpufreq: Support for fast frequency switching
From: Rafael J. Wysocki Modify the ACPI cpufreq driver to provide a method for switching CPU frequencies from interrupt context and update the cpufreq core to support that method if available. Introduce a new cpufreq driver callback, ->fast_switch, to be invoked for frequency switching from interrupt context via a new helper function, cpufreq_driver_fast_switch(). Add a new policy flag, fast_switch_possible, to be set if fast frequency switching can be used for the given policy. Implement the ->fast_switch callback in the ACPI cpufreq driver and make it set fast_switch_possible during policy initialization as appropriate. Signed-off-by: Rafael J. Wysocki --- Changes from the previous version: - Drop a bogus check from cpufreq_driver_fast_switch(). --- drivers/cpufreq/acpi-cpufreq.c | 53 + drivers/cpufreq/cpufreq.c | 30 +++ include/linux/cpufreq.h|6 3 files changed, 89 insertions(+) Index: linux-pm/drivers/cpufreq/acpi-cpufreq.c === --- linux-pm.orig/drivers/cpufreq/acpi-cpufreq.c +++ linux-pm/drivers/cpufreq/acpi-cpufreq.c @@ -458,6 +458,55 @@ static int acpi_cpufreq_target(struct cp return result; } +unsigned int acpi_cpufreq_fast_switch(struct cpufreq_policy *policy, + unsigned int target_freq, + unsigned int relation) +{ + struct acpi_cpufreq_data *data = policy->driver_data; + struct acpi_processor_performance *perf; + struct cpufreq_frequency_table *entry, *found; + unsigned int next_perf_state, next_freq, freq; + + /* +* Find the closest frequency above target_freq or equal to it. +* +* The table is sorted in the reverse order with respect to the +* frequency and all of the entries are valid (see the initialization). +*/ + entry = data->freq_table; + do { + entry++; + freq = entry->frequency; + } while (freq >= target_freq && freq != CPUFREQ_TABLE_END); + found = entry - 1; + /* +* Use the one found or the previous one, depending on the relation. +* CPUFREQ_RELATION_H is not taken into account here, but it is not +* expected to be passed to this function anyway. +*/ + next_freq = found->frequency; + if (freq == CPUFREQ_TABLE_END || relation != CPUFREQ_RELATION_C || + target_freq - freq >= next_freq - target_freq) { + next_perf_state = found->driver_data; + } else { + next_freq = freq; + next_perf_state = entry->driver_data; + } + + perf = to_perf_data(data); + if (perf->state == next_perf_state) { + if (unlikely(data->resume)) + data->resume = 0; + else + return next_freq; + } + + data->cpu_freq_write(>control_register, +perf->states[next_perf_state].control); + perf->state = next_perf_state; + return next_freq; +} + static unsigned long acpi_cpufreq_guess_freq(struct acpi_cpufreq_data *data, unsigned int cpu) { @@ -740,6 +789,9 @@ static int acpi_cpufreq_cpu_init(struct goto err_unreg; } + policy->fast_switch_possible = !acpi_pstate_strict && + !(policy_is_shared(policy) && policy->shared_type != CPUFREQ_SHARED_TYPE_ANY); + data->freq_table = kzalloc(sizeof(*data->freq_table) * (perf->state_count+1), GFP_KERNEL); if (!data->freq_table) { @@ -874,6 +926,7 @@ static struct freq_attr *acpi_cpufreq_at static struct cpufreq_driver acpi_cpufreq_driver = { .verify = cpufreq_generic_frequency_table_verify, .target_index = acpi_cpufreq_target, + .fast_switch= acpi_cpufreq_fast_switch, .bios_limit = acpi_processor_get_bios_limit, .init = acpi_cpufreq_cpu_init, .exit = acpi_cpufreq_cpu_exit, Index: linux-pm/drivers/cpufreq/cpufreq.c === --- linux-pm.orig/drivers/cpufreq/cpufreq.c +++ linux-pm/drivers/cpufreq/cpufreq.c @@ -1719,6 +1719,36 @@ EXPORT_SYMBOL(cpufreq_unregister_notifie * GOVERNORS* */ +/** + * cpufreq_driver_fast_switch - Carry out a fast CPU frequency switch. + * @policy: cpufreq policy to switch the frequency for. + * @target_freq: New frequency to set (may be approximate). + * @relation: Relation to use for frequency selection. + * + * Carry out a fast frequency switch from interrupt context. + * + * This function must not be called if policy->fast_switch_possible is unset. + * + * Governors calling this function must guarantee