Re: cpuidle and cpufreq coupling?
On 07/20/2017 05:11 PM, Vikram Mulukutla wrote:
> On 7/20/2017 3:56 PM, Florian Fainelli wrote:
>> On 07/20/2017 07:45 AM, Peter Zijlstra wrote:
>>>
>>> Can your ARM part change OPP without scheduling? Because (for obvious
>>> reasons) the idle thread is not supposed to block.
>>
>> I think it should be able to do that, but I am not sure that if I went
>> through the cpufreq API it would be that straightforward so I may have
>> to re-implement some of the frequency scaling logic outside of cpufreq
>> (or rather make the low-level parts some kind of library I guess).
>>
>
> I think I can safely mention that some of our non-upstream idle drivers
> in the past have invoked low level clock drivers to atomically switch
> CPUs to low frequency OPPs, with no interaction whatsoever with cpufreq.
> It was maintainable since both the idle and clock drivers were
> qcom-specific. However this is no longer necessary in recent designs and
> I really hope we never need to do this again...

Yes, same here: this is for a past-generation product; the current
generation has a smarter design that so far does not require that.

>
> We didn't have to do a voltage switch, just PLL or mux
> work, so this was doable. I'm guessing your atomic switching also allows
> voltage reduction?

Correct, there is a voltage reduction occurring, which is largely under
the control of a separate MCU/firmware.

>
> If your architecture allows another CPU to change the entering-idle CPU's
> frequency, synchronization will be necessary as well - this is where it
> can get a bit tricky.

That is a very good point: the frequency scaling is not per-CPU but for
the entire CPU complex (up to 4 cores), so that might indeed be a problem.

Thanks!
--
Florian
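To make the synchronization concern above concrete, here is a minimal user-space model in Python, not kernel code. It assumes one way to keep the two sides from fighting: while the idle path holds a low-frequency override on the shared clock, frequency requests from another CPU are recorded but not applied, and the most recent request is applied when the override is dropped. The class `SharedPolicy`, its methods, and the frequencies are all hypothetical names chosen for illustration; nothing in the thread says that any driver works this way.

```python
import threading


class SharedPolicy:
    """Toy model of one clock shared by a CPU complex (hypothetical)."""

    def __init__(self, khz):
        self._lock = threading.Lock()
        self.hw_khz = khz          # frequency the "hardware" currently runs at
        self.requested_khz = khz   # last frequency requested by the governor
        self.idle_override = None  # low frequency forced by the idle path, if any

    def governor_set(self, khz):
        """Frequency request coming from any CPU in the complex."""
        with self._lock:
            self.requested_khz = khz
            # Defer the change while the idle path owns the clock.
            if self.idle_override is None:
                self.hw_khz = khz

    def idle_enter(self, low_khz):
        """Idle path forces a low frequency just before WFI."""
        with self._lock:
            self.idle_override = low_khz
            self.hw_khz = low_khz

    def idle_exit(self):
        """Idle path releases the clock and applies the latest request."""
        with self._lock:
            self.idle_override = None
            self.hw_khz = self.requested_khz


policy = SharedPolicy(1_500_000)
policy.idle_enter(300_000)
policy.governor_set(1_200_000)  # arrives while the override is active
policy.idle_exit()              # hardware now follows the deferred request
```

The single lock is the point of the sketch: both paths serialize on it, so a request can never slip in between the idle path's decision and its frequency write. A real implementation would also have to decide which CPUs in the complex are allowed to set the override at all.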
Re: cpuidle and cpufreq coupling?
On 7/20/2017 3:56 PM, Florian Fainelli wrote:
> On 07/20/2017 07:45 AM, Peter Zijlstra wrote:
>>
>> Can your ARM part change OPP without scheduling? Because (for obvious
>> reasons) the idle thread is not supposed to block.
>
> I think it should be able to do that, but I am not sure that if I went
> through the cpufreq API it would be that straightforward so I may have
> to re-implement some of the frequency scaling logic outside of cpufreq
> (or rather make the low-level parts some kind of library I guess).

I think I can safely mention that some of our non-upstream idle drivers
in the past have invoked low level clock drivers to atomically switch
CPUs to low frequency OPPs, with no interaction whatsoever with cpufreq.
It was maintainable since both the idle and clock drivers were
qcom-specific. However this is no longer necessary in recent designs and
I really hope we never need to do this again...

We didn't have to do a voltage switch, just PLL or mux
work, so this was doable. I'm guessing your atomic switching also allows
voltage reduction?

If your architecture allows another CPU to change the entering-idle CPU's
frequency, synchronization will be necessary as well - this is where it
can get a bit tricky.

Thanks,
Vikram
Re: cpuidle and cpufreq coupling?
On 07/20/2017 02:23 AM, Sudeep Holla wrote:
>
>
> On 20/07/17 08:18, Viresh Kumar wrote:
>> On 20-07-17, 01:17, Rafael J. Wysocki wrote:
>>> On Thu, Jul 20, 2017 at 12:54 AM, Florian Fainelli wrote:
>>>> Hi,
>>>>
>>>> We have a particular ARM CPU design that is drawing quite a lot of
>>>> current upon exit from WFI, and it does so in a way even before the
>>>> first instruction out of WFI is executed. That means we cannot influence
>>>> directly the exit from WFI other than by changing the state in which it
>>>> would be previously entered because of this "dead" time during which the
>>>> internal logic needs to ramp up back where it left.
>>>>
>>>> A naive approach to solving this problem because we have CPU frequency
>>>> scaling available would be to do the following:
>>>>
>>>> - just before entering WFI, switch to a low frequency OPP
>>>> - enter WFI
>>>> - upon exit from WFI, ramp up the frequency back to e.g. highest OPP
>>>>
>>>> Some of the parts that I am not exactly clear on would be:
>>>>
>>>> - would that qualify as a cpuidle governor of some kind that ties in
>>>>   with cpufreq?
>>>> - would using cpufreq_driver_fast_switch() be an appropriate API to use
>>>>   from outside?
>>>
>>> Generally, the idle driver is expected to manipulate OPPs as suitable
>>> for it at the low level.
>>
>> Does any idle driver do it today?
>
>> I am not sure, but I haven't heard anyone from ARM doing it. Though I
>> may have completely missed it :)
>>
>
> It doesn't need to be in Linux. E.g. PSCI or any low-level driver can do
> that transparently.

Not everything is PSCI-based: this platform is ARM (32-bit) and now
several years old. Still, the logic and spirit remain largely the same.

>
>> So, that must call into cpufreq (somehow) and look for a low power
>> OPP?
>>
>
> That seems hacky, and NAK if it's a PSCI platform. It's cleaner to do such
> hacks/workarounds in platform-specific PSCI firmware.
>
>> @Florian: It would be more tricky than we anticipate. We don't always
>> want to go to low OPP on idle, as we may get out of it very quickly
>> and changing OPP twice (before and after idle) in that scenario would
>> be a complete waste of time.
>
> Exactly.
>

I completely agree, this is a trade-off between creating a big but short
spike of energy that a poorly designed regulator/power distribution may
not handle versus creating a smaller-amplitude but longer-lasting energy
need.

The key point is that if your only lowest OPP is the lowest CPU
frequency, and the low-level logic to make that happen is already there
in the cpufreq driver, can we somehow both utilize it and feed back its
latency into cpuidle, or should the cpufreq driver have hooks into
cpuidle? (Either way is probably fine, but the former scales better to
the number of diverse cpufreq drivers out there.)

Thanks!
--
Florian
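The "feed back its latency into cpuidle" idea, together with the objection that two OPP switches are wasted on a short idle period, can be sketched as simple arithmetic. The Python below is a user-space model, not kernel code. The function names, the `margin` factor, and all the microsecond figures are assumptions made for illustration. cpuidle states do carry `exit_latency` and `target_residency` values; the sketch only shows how a frequency-switch cost might be folded into numbers of that kind, not how any existing driver does it.

```python
def effective_exit_latency_us(wfi_exit_us, switch_up_us):
    """Wake-up cost seen by the rest of the system once the ramp back up
    to the running OPP is counted as part of leaving the idle state."""
    return wfi_exit_us + switch_up_us


def should_lower_opp(predicted_idle_us, switch_down_us, switch_up_us, margin=2.0):
    """Lower the OPP only if the predicted idle period comfortably exceeds
    the cost of switching down and back up (hypothetical break-even rule)."""
    round_trip_us = switch_down_us + switch_up_us
    return predicted_idle_us >= margin * round_trip_us


# Illustrative numbers only: 50 us per switch, 10 us bare WFI exit.
print(effective_exit_latency_us(10, 50))
print(should_lower_opp(1000, 50, 50))  # long idle period
print(should_lower_opp(150, 50, 50))   # short idle period
```

With these made-up figures the long idle period passes the check and the short one does not, which is the behaviour the quoted objection asks for. Where the predicted idle duration comes from is left open here.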
Re: cpuidle and cpufreq coupling?
On 07/20/2017 07:45 AM, Peter Zijlstra wrote:
> On Thu, Jul 20, 2017 at 11:52:41AM +0200, Rafael J. Wysocki wrote:
>> On Thu, Jul 20, 2017 at 9:18 AM, Viresh Kumar wrote:
>>> On 20-07-17, 01:17, Rafael J. Wysocki wrote:
>>>> On Thu, Jul 20, 2017 at 12:54 AM, Florian Fainelli wrote:
>>>>> Hi,
>>>>>
>>>>> We have a particular ARM CPU design that is drawing quite a lot of
>>>>> current upon exit from WFI, and it does so in a way even before the
>>>>> first instruction out of WFI is executed. That means we cannot influence
>>>>> directly the exit from WFI other than by changing the state in which it
>>>>> would be previously entered because of this "dead" time during which the
>>>>> internal logic needs to ramp up back where it left.
>>>>>
>>>>> A naive approach to solving this problem because we have CPU frequency
>>>>> scaling available would be to do the following:
>>>>>
>>>>> - just before entering WFI, switch to a low frequency OPP
>>>>> - enter WFI
>>>>> - upon exit from WFI, ramp up the frequency back to e.g. highest OPP
>>>>>
>>>>> Some of the parts that I am not exactly clear on would be:
>>>>>
>>>>> - would that qualify as a cpuidle governor of some kind that ties in
>>>>>   with cpufreq?
>>>>> - would using cpufreq_driver_fast_switch() be an appropriate API to use
>>>>>   from outside?
>
> Can your ARM part change OPP without scheduling? Because (for obvious
> reasons) the idle thread is not supposed to block.

I think it should be able to do that, but I am not sure that if I went
through the cpufreq API it would be that straightforward so I may have
to re-implement some of the frequency scaling logic outside of cpufreq
(or rather make the low-level parts some kind of library I guess).
Re: cpuidle and cpufreq coupling?
On Thu, Jul 20, 2017 at 11:52:41AM +0200, Rafael J. Wysocki wrote:
> On Thu, Jul 20, 2017 at 9:18 AM, Viresh Kumar wrote:
> > On 20-07-17, 01:17, Rafael J. Wysocki wrote:
> >> On Thu, Jul 20, 2017 at 12:54 AM, Florian Fainelli wrote:
> >> > Hi,
> >> >
> >> > We have a particular ARM CPU design that is drawing quite a lot of
> >> > current upon exit from WFI, and it does so in a way even before the
> >> > first instruction out of WFI is executed. That means we cannot influence
> >> > directly the exit from WFI other than by changing the state in which it
> >> > would be previously entered because of this "dead" time during which the
> >> > internal logic needs to ramp up back where it left.
> >> >
> >> > A naive approach to solving this problem because we have CPU frequency
> >> > scaling available would be to do the following:
> >> >
> >> > - just before entering WFI, switch to a low frequency OPP
> >> > - enter WFI
> >> > - upon exit from WFI, ramp up the frequency back to e.g. highest OPP
> >> >
> >> > Some of the parts that I am not exactly clear on would be:
> >> >
> >> > - would that qualify as a cpuidle governor of some kind that ties in
> >> >   with cpufreq?
> >> > - would using cpufreq_driver_fast_switch() be an appropriate API to use
> >> >   from outside?

Can your ARM part change OPP without scheduling? Because (for obvious
reasons) the idle thread is not supposed to block.
Re: cpuidle and cpufreq coupling?
On Thu, Jul 20, 2017 at 9:18 AM, Viresh Kumar wrote:
> On 20-07-17, 01:17, Rafael J. Wysocki wrote:
>> On Thu, Jul 20, 2017 at 12:54 AM, Florian Fainelli wrote:
>> > Hi,
>> >
>> > We have a particular ARM CPU design that is drawing quite a lot of
>> > current upon exit from WFI, and it does so in a way even before the
>> > first instruction out of WFI is executed. That means we cannot influence
>> > directly the exit from WFI other than by changing the state in which it
>> > would be previously entered because of this "dead" time during which the
>> > internal logic needs to ramp up back where it left.
>> >
>> > A naive approach to solving this problem because we have CPU frequency
>> > scaling available would be to do the following:
>> >
>> > - just before entering WFI, switch to a low frequency OPP
>> > - enter WFI
>> > - upon exit from WFI, ramp up the frequency back to e.g. highest OPP
>> >
>> > Some of the parts that I am not exactly clear on would be:
>> >
>> > - would that qualify as a cpuidle governor of some kind that ties in
>> >   with cpufreq?
>> > - would using cpufreq_driver_fast_switch() be an appropriate API to use
>> >   from outside?
>>
>> Generally, the idle driver is expected to manipulate OPPs as suitable
>> for it at the low level.
>
> Does any idle driver do it today?
>
> I am not sure, but I haven't heard anyone from ARM doing it. Though I
> may have completely missed it :)

You may not, but that's what is recommended.

Had you attended PM sessions at the LPC and similar, you might have
heard about it ...

> So, that must call into cpufreq (somehow) and look for a low power
> OPP?

It should know what OPP to use and then coordinate with cpufreq so they
don't go against each other (on shared policies).
Re: cpuidle and cpufreq coupling?
On 20/07/17 08:18, Viresh Kumar wrote:
> On 20-07-17, 01:17, Rafael J. Wysocki wrote:
>> On Thu, Jul 20, 2017 at 12:54 AM, Florian Fainelli wrote:
>>> Hi,
>>>
>>> We have a particular ARM CPU design that is drawing quite a lot of
>>> current upon exit from WFI, and it does so in a way even before the
>>> first instruction out of WFI is executed. That means we cannot influence
>>> directly the exit from WFI other than by changing the state in which it
>>> would be previously entered because of this "dead" time during which the
>>> internal logic needs to ramp up back where it left.
>>>
>>> A naive approach to solving this problem because we have CPU frequency
>>> scaling available would be to do the following:
>>>
>>> - just before entering WFI, switch to a low frequency OPP
>>> - enter WFI
>>> - upon exit from WFI, ramp up the frequency back to e.g. highest OPP
>>>
>>> Some of the parts that I am not exactly clear on would be:
>>>
>>> - would that qualify as a cpuidle governor of some kind that ties in
>>>   with cpufreq?
>>> - would using cpufreq_driver_fast_switch() be an appropriate API to use
>>>   from outside?
>>
>> Generally, the idle driver is expected to manipulate OPPs as suitable
>> for it at the low level.
>
> Does any idle driver do it today?
>
> I am not sure, but I haven't heard anyone from ARM doing it. Though I
> may have completely missed it :)
>

It doesn't need to be in Linux. E.g. PSCI or any low-level driver can do
that transparently.

> So, that must call into cpufreq (somehow) and look for a low power
> OPP?
>

That seems hacky, and NAK if it's a PSCI platform. It's cleaner to do such
hacks/workarounds in platform-specific PSCI firmware.

> @Florian: It would be more tricky than we anticipate. We don't always
> want to go to low OPP on idle, as we may get out of it very quickly
> and changing OPP twice (before and after idle) in that scenario would
> be a complete waste of time.

Exactly.

--
Regards,
Sudeep
Re: cpuidle and cpufreq coupling?
On 20-07-17, 01:17, Rafael J. Wysocki wrote:
> On Thu, Jul 20, 2017 at 12:54 AM, Florian Fainelli wrote:
> > Hi,
> >
> > We have a particular ARM CPU design that is drawing quite a lot of
> > current upon exit from WFI, and it does so in a way even before the
> > first instruction out of WFI is executed. That means we cannot influence
> > directly the exit from WFI other than by changing the state in which it
> > would be previously entered because of this "dead" time during which the
> > internal logic needs to ramp up back where it left.
> >
> > A naive approach to solving this problem because we have CPU frequency
> > scaling available would be to do the following:
> >
> > - just before entering WFI, switch to a low frequency OPP
> > - enter WFI
> > - upon exit from WFI, ramp up the frequency back to e.g. highest OPP
> >
> > Some of the parts that I am not exactly clear on would be:
> >
> > - would that qualify as a cpuidle governor of some kind that ties in
> >   with cpufreq?
> > - would using cpufreq_driver_fast_switch() be an appropriate API to use
> >   from outside?
>
> Generally, the idle driver is expected to manipulate OPPs as suitable
> for it at the low level.

Does any idle driver do it today?

I am not sure, but I haven't heard anyone from ARM doing it. Though I
may have completely missed it :)

So, that must call into cpufreq (somehow) and look for a low power
OPP?

@Florian: It would be more tricky than we anticipate. We don't always
want to go to low OPP on idle, as we may get out of it very quickly
and changing OPP twice (before and after idle) in that scenario would
be a complete waste of time.

And I assume your ARM CPUs share clock/voltage lines with each other
as well? In that case we shouldn't touch the OPP unless the whole
cluster is going down, as some CPUs might be running code then.

--
viresh
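The "whole cluster is going down" condition can be illustrated with a small counter. This is a user-space Python model, not kernel code, and every name in it (`Cluster`, `cpu_enter_idle`, `cpu_exit_idle`, the frequencies) is hypothetical. It assumes one shared clock per cluster and only drops the frequency when the last running CPU goes idle, restoring it as soon as any CPU wakes up.

```python
import threading


class Cluster:
    """Toy model of CPUs sharing one clock/voltage domain (hypothetical)."""

    def __init__(self, n_cpus, run_khz, idle_khz):
        self._lock = threading.Lock()
        self.n_cpus = n_cpus
        self.idle_count = 0        # number of CPUs currently idle
        self.run_khz = run_khz     # frequency used while any CPU runs code
        self.idle_khz = idle_khz   # low frequency used when all CPUs idle
        self.freq_khz = run_khz

    def cpu_enter_idle(self):
        with self._lock:
            self.idle_count += 1
            # Only the last CPU to go idle may lower the shared clock.
            if self.idle_count == self.n_cpus:
                self.freq_khz = self.idle_khz

    def cpu_exit_idle(self):
        with self._lock:
            # The first CPU to wake up restores the running frequency.
            if self.idle_count == self.n_cpus:
                self.freq_khz = self.run_khz
            self.idle_count -= 1


cluster = Cluster(n_cpus=4, run_khz=1_500_000, idle_khz=300_000)
for _ in range(3):
    cluster.cpu_enter_idle()   # three CPUs idle: clock untouched
cluster.cpu_enter_idle()       # fourth CPU idles: clock drops
cluster.cpu_exit_idle()        # one CPU wakes: clock restored
```

In this model a CPU that idles while its siblings are still busy never changes the shared clock, so it gets no benefit from the low-OPP trick.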
Re: cpuidle and cpufreq coupling?
On Thu, Jul 20, 2017 at 12:54 AM, Florian Fainelli wrote:
> Hi,
>
> We have a particular ARM CPU design that is drawing quite a lot of
> current upon exit from WFI, and it does so in a way even before the
> first instruction out of WFI is executed. That means we cannot influence
> directly the exit from WFI other than by changing the state in which it
> would be previously entered because of this "dead" time during which the
> internal logic needs to ramp up back where it left.
>
> A naive approach to solving this problem because we have CPU frequency
> scaling available would be to do the following:
>
> - just before entering WFI, switch to a low frequency OPP
> - enter WFI
> - upon exit from WFI, ramp up the frequency back to e.g. highest OPP
>
> Some of the parts that I am not exactly clear on would be:
>
> - would that qualify as a cpuidle governor of some kind that ties in
>   with cpufreq?
> - would using cpufreq_driver_fast_switch() be an appropriate API to use
>   from outside?

Generally, the idle driver is expected to manipulate OPPs as suitable
for it at the low level.

Thanks,
Rafael
cpuidle and cpufreq coupling?
Hi,

We have a particular ARM CPU design that is drawing quite a lot of
current upon exit from WFI, and it does so in a way even before the
first instruction out of WFI is executed. That means we cannot influence
directly the exit from WFI other than by changing the state in which it
would be previously entered because of this "dead" time during which the
internal logic needs to ramp up back where it left.

A naive approach to solving this problem because we have CPU frequency
scaling available would be to do the following:

- just before entering WFI, switch to a low frequency OPP
- enter WFI
- upon exit from WFI, ramp up the frequency back to e.g. highest OPP

Some of the parts that I am not exactly clear on would be:

- would that qualify as a cpuidle governor of some kind that ties in
  with cpufreq?
- would using cpufreq_driver_fast_switch() be an appropriate API to use
  from outside?

Thanks!
--
Florian
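The three-step sequence in the message above can be written down as a tiny user-space model. This is Python, not kernel code. The `Cpu` class, `set_freq`, `wfi`, and the frequencies are hypothetical stand-ins, and `set_freq` is assumed to be a non-blocking switch, which is exactly the property questioned elsewhere in the thread.

```python
from dataclasses import dataclass, field


@dataclass
class Cpu:
    """Toy CPU that records what happens to it (hypothetical model)."""
    freq_khz: int
    log: list = field(default_factory=list)

    def set_freq(self, khz):
        # Stands in for a frequency switch that must not block or schedule.
        self.freq_khz = khz
        self.log.append(f"freq={khz}")

    def wfi(self):
        # Stands in for the WFI instruction; returns on "wake-up".
        self.log.append(f"wfi@{self.freq_khz}")


def idle_with_low_opp(cpu, low_khz, restore_khz):
    cpu.set_freq(low_khz)      # 1. switch to a low frequency OPP
    cpu.wfi()                  # 2. enter WFI (and wake up at low frequency)
    cpu.set_freq(restore_khz)  # 3. ramp the frequency back up


cpu = Cpu(freq_khz=1_500_000)
idle_with_low_opp(cpu, low_khz=300_000, restore_khz=1_500_000)
print(cpu.log)
```

The ordering is what matters: the wake-up out of WFI happens while the low frequency is still in effect, and the ramp back up only starts once software is running again.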