Re: [linux-pm] [RFC PATCH] PM: Introduce generic DVFS framework with device-specific OPPs

2011-04-28 Thread MyungJoo Ham
On Thu, Apr 28, 2011 at 5:48 AM, Colin Cross ccr...@google.com wrote:
 On Wed, Apr 27, 2011 at 12:26 PM, Thomas Gleixner t...@linutronix.de wrote:
 Forget OMAP implementation details for a while, sit back and look at
 the big picture.

 Here's my proposal for DVFS:
 - DVFS is implemented in drivers/clk/dvfs.c, and is called by the
 common clock implementation to adjust the voltages, if necessary, on
 regular clk_* calls.
 - Platform code provides mappings in the form (clk, regulator, max
 frequency, min voltage) to the dvfs code.
 - Everything that is in OPP today gets converted to helper functions
 inside the dvfs implementation, and is never called from SoC code
 (except to pass tables at init), or from drivers.
 - OPP can be recreated in the future as an upper level policy manager
 for clocks that need to move together, if that is ever necessary.  It
 would not know anything about voltages.
 - A few common policy implementations need to be added to the common
 clock implementation, like temperature limits.

I hope that my previous reply answered this.


 For Tegra:
 - DVFS continues to be accessed by calling clk_* functions

 For OMAP:
 - DVFS is triggered by hwmod through clk_* functions.  Any cross-arch
 driver can continue to call clk_* functions.

 OPP currently has opp_enable and opp_disable functions.  I don't
 understand why these are needed, they are only used at init time to
 determine available voltages, which could be handled by never passing
 unavailable voltages to the dvfs implementation.

We need them at runtime.

A device a may want to guarantee that a device b runs at 200MHz or
faster while a does some operations. Then a will
opp_disable(b, 100MHz and others) and opp_enable(b, them) later
on. We have similar issues with multimedia blocks (MFC, Camera, FB,
GPU) and the CPU/memory bus. The ondemand governor of CPUFREQ has some
delay in catching up with a workload (1.5x the sampling rate on average,
2.0x the sampling rate in the worst case), which may cause
flickering/tearing in multimedia streams. On the other hand, a general
thermal monitor or battery manager might want to limit energy usage by
disabling top performance clocks if it is too hot or the battery level
is low.
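
To make the runtime use concrete, here is a minimal userspace sketch in C. It is only a model, not kernel code: struct model_opp, model_opp_set_enabled() and model_opp_find_ceil() are hypothetical names, and the voltages used with it are made-up numbers. It shows how disabling the 100MHz entry of device b moves the lowest selectable operating point up to 200MHz, and how re-enabling it restores the old floor.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of one device's OPP table; not kernel code. */
struct model_opp {
	unsigned long freq_hz;
	unsigned long volt_uv;
	bool enabled;
};

/* Mark the entry with exactly freq_hz as enabled or disabled.
 * Returns 0 on success, -1 if no such entry exists. */
static int model_opp_set_enabled(struct model_opp *tbl, size_t n,
				 unsigned long freq_hz, bool enabled)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].freq_hz == freq_hz) {
			tbl[i].enabled = enabled;
			return 0;
		}
	}
	return -1;
}

/* Return the lowest enabled entry with freq_hz >= want_hz, or NULL.
 * The table must be sorted by ascending frequency. */
static const struct model_opp *model_opp_find_ceil(const struct model_opp *tbl,
						   size_t n,
						   unsigned long want_hz)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].enabled && tbl[i].freq_hz >= want_hz)
			return &tbl[i];
	}
	return NULL;
}
```

With a three-entry table (100MHz, 200MHz, 400MHz), a governor asking for 50MHz gets the 100MHz entry by default and the 200MHz entry while device a holds the 100MHz entry disabled.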

 ___
 linux-pm mailing list
 linux...@lists.linux-foundation.org
 https://lists.linux-foundation.org/mailman/listinfo/linux-pm




-- 
MyungJoo Ham (함명주), Ph.D.
Mobile Software Platform Lab,
Digital Media and Communications (DMC) Business
Samsung Electronics
cell: 82-10-6714-2858
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [linux-pm] [RFC PATCH] PM: Introduce generic DVFS framework with device-specific OPPs

2011-04-28 Thread Colin Cross
On Wed, Apr 27, 2011 at 10:59 PM, MyungJoo Ham myungjoo@samsung.com wrote:
 What one instance of DVFS (devfreq) controls are clocks and
 regulators. (a device may have multiple regulators as well as multiple
 clocks)
 What one instance of DVFS (devfreq) monitors (device load and/or
 temperature) is a device that uses the clocks and regulators.

 If we focus on the things that are controlled by DVFS, connecting DVFS
 with clock seems fine; however, DVFS's decision is based on the status
 of the device and the decision (monitoring result) configures a set of
 clocks and regulators. The clocks are not configured independently
 from others if the clocks are used by a DVFS-capable device. The
 frequency/voltage pair (OPP in this patch) associated with a device
 becomes a representative value of a specific configuration that
 configures the set of clocks and regulators.

 This is quite similar to CPUFREQ. CPUFREQ provides a single
 frequency value as a result of monitoring; however the machine's
 cpufreq driver may set multiple clocks and multiple voltage regulators
 based on the representative value (which is usually the core clock)
 although the cpufreq driver may need to control many more clocks with
 different frequencies.

 With multiple clocks of a device, if there is a clock that is required
 to be set independently from the representative clock with DVFS, it
 means that the DVFS monitoring result (load/temperature) is not a
 scalar value but a vector (multi-dimensional value). That implies that
 we need to monitor different and independent values, which in turn,
 implies that we need separated devices. Note that the DVFS monitor
 result from load and temperature combined is not a multi-dimensional
 value because the temperature limits maximum possible frequency or
 voltage and the load gives preferred lower bound of frequency that
 can be overridden by the limit set by temperature.

 Therefore, having one DVFS per clock where multiple clocks are
 attached to a device will create multiple monitors that monitor the
 same object(device behavior) with same metrics (load and temperature).

 Besides, the reason I've started with target callback, not clk and
 regulator names or pointers is that a device may have multiple clks
 and regulators and the OPP may only show the representative
 clock/regulators as CPUFREQ does. Especially when the order of
 transitions of those multiple clocks and regulators matter (if they
 are in a single device, it sometimes does), running a DVFS per clock,
 not per device, will be bothersome if not disastrous.

I understand the need for some sort of governor that can use device
state to determine the necessary clock frequencies.  Where I disagree
is the connection to voltages.  The governor should ONLY determine the
frequencies desired, and the voltage required to meet those
frequencies should be determined by the clock framework, based only on
the clock and the frequency.
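
A rough sketch of that split, as a userspace C model: the governor only picks a rate, and a per-clock table answers the voltage question. The names struct model_dvfs_step and model_dvfs_min_uv() are hypothetical, and the table layout (max frequency, min voltage per row) is an assumption based on the mapping proposed earlier in the thread, not an existing kernel interface.

```c
#include <stddef.h>

/* Hypothetical per-clock DVFS row: rates up to max_hz need at least min_uv. */
struct model_dvfs_step {
	unsigned long max_hz;
	unsigned long min_uv;
};

/* Look up the minimum voltage for rate_hz in a table sorted by ascending
 * max_hz. Returns 0 if rate_hz is above every row (unsupported rate). */
static unsigned long model_dvfs_min_uv(const struct model_dvfs_step *tbl,
				       size_t n, unsigned long rate_hz)
{
	for (size_t i = 0; i < n; i++) {
		if (rate_hz <= tbl[i].max_hz)
			return tbl[i].min_uv;
	}
	return 0;
}
```

In this model the governor never sees a voltage; a clk_set_rate-style call would consult the table for the clock being changed and adjust the regulator itself.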


Re: [linux-pm] [RFC PATCH] PM: Introduce generic DVFS framework with device-specific OPPs

2011-04-28 Thread Colin Cross
On Wed, Apr 27, 2011 at 11:12 PM, MyungJoo Ham myungjoo@samsung.com wrote:
 On Thu, Apr 28, 2011 at 5:48 AM, Colin Cross ccr...@google.com wrote:
 OPP currently has opp_enable and opp_disable functions.  I don't
 understand why these are needed, they are only used at init time to
 determine available voltages, which could be handled by never passing
 unavailable voltages to the dvfs implementation.

 We need them in runtime.

 A device a may want to guarantee that a device b to be at least
 200MHz or faster while it does some operations. Then, a will
 opp_disable(b, 100MHz and others); and opp_enable(b, them) later
 on. We have similar issues with multimedia blocks (MFC, Camera, FB,
 GPU) and CPU/Memory Bus. Ondemand governor of CPUFREQ has some delay
 on catching up a workload (1.5x the sampling rate in average, 2.0x
 the sampling rate in worst cases), which may incur flickering/tearing
 issues with multimedia streams. On the other hand, a general thermal
 monitor or battery manager might want to limit energy usage by
 disabling top performance clocks if it is too hot or the battery level
 is low.

That sounds like a very strange api, when what you really mean is
clk_set_min_rate or clk_set_max_rate.
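
For illustration, a min/max interface of that kind could combine requests roughly as in the userspace C sketch below. struct model_rate_req and model_resolve_rate() are hypothetical names, and the rule that a ceiling beats a conflicting floor is an assumption chosen to match the earlier remark that a temperature limit can override the load-based lower bound.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical rate request from one client of a clock. */
struct model_rate_req {
	unsigned long min_hz;	/* 0 means no floor */
	unsigned long max_hz;	/* ULONG_MAX means no ceiling */
};

/* Clamp the governor's wanted rate to the combined requests: the highest
 * floor and the lowest ceiling. If they conflict, the ceiling wins. */
static unsigned long model_resolve_rate(const struct model_rate_req *reqs,
					size_t n, unsigned long want_hz)
{
	unsigned long floor_hz = 0;
	unsigned long ceil_hz = ULONG_MAX;

	for (size_t i = 0; i < n; i++) {
		if (reqs[i].min_hz > floor_hz)
			floor_hz = reqs[i].min_hz;
		if (reqs[i].max_hz < ceil_hz)
			ceil_hz = reqs[i].max_hz;
	}
	if (want_hz < floor_hz)
		want_hz = floor_hz;
	if (want_hz > ceil_hz)
		want_hz = ceil_hz;
	return want_hz;
}
```

A multimedia block would file a request with min_hz set to 200MHz, and a thermal monitor would file one with only max_hz set; neither needs to know which table entries exist.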


Re: [linux-pm] [RFC PATCH] PM: Introduce generic DVFS framework with device-specific OPPs

2011-04-28 Thread MyungJoo Ham
On Thu, Apr 28, 2011 at 3:44 PM, Colin Cross ccr...@google.com wrote:
 On Wed, Apr 27, 2011 at 11:12 PM, MyungJoo Ham myungjoo@samsung.com 
 wrote:
 On Thu, Apr 28, 2011 at 5:48 AM, Colin Cross ccr...@google.com wrote:
 OPP currently has opp_enable and opp_disable functions.  I don't
 understand why these are needed, they are only used at init time to
 determine available voltages, which could be handled by never passing
 unavailable voltages to the dvfs implementation.

 We need them in runtime.

 A device a may want to guarantee that a device b to be at least
 200MHz or faster while it does some operations. Then, a will
 opp_disable(b, 100MHz and others); and opp_enable(b, them) later
 on. We have similar issues with multimedia blocks (MFC, Camera, FB,
 GPU) and CPU/Memory Bus. Ondemand governor of CPUFREQ has some delay
 on catching up a workload (1.5x the sampling rate in average, 2.0x
 the sampling rate in worst cases), which may incur flickering/tearing
 issues with multimedia streams. On the other hand, a general thermal
 monitor or battery manager might want to limit energy usage by
 disabling top performance clocks if it is too hot or the battery level
 is low.

 That sounds like a very strange api, when what you really mean is
 clk_set_min_rate or clk_set_max_rate.

Essentially, that's what is needed.
However, with clk_set_min/max_rate, don't we need to let a device
be a consumer of other devices' clocks, rather than just introducing
one device to another?



Re: [linux-pm] [RFC PATCH] PM: Introduce generic DVFS framework with device-specific OPPs

2011-04-28 Thread Colin Cross
On Wed, Apr 27, 2011 at 11:50 PM, MyungJoo Ham myungjoo@samsung.com wrote:
 On Thu, Apr 28, 2011 at 3:44 PM, Colin Cross ccr...@google.com wrote:
 On Wed, Apr 27, 2011 at 11:12 PM, MyungJoo Ham myungjoo@samsung.com 
 wrote:
 On Thu, Apr 28, 2011 at 5:48 AM, Colin Cross ccr...@google.com wrote:
 OPP currently has opp_enable and opp_disable functions.  I don't
 understand why these are needed, they are only used at init time to
 determine available voltages, which could be handled by never passing
 unavailable voltages to the dvfs implementation.

 We need them in runtime.

 A device a may want to guarantee that a device b to be at least
 200MHz or faster while it does some operations. Then, a will
 opp_disable(b, 100MHz and others); and opp_enable(b, them) later
 on. We have similar issues with multimedia blocks (MFC, Camera, FB,
 GPU) and CPU/Memory Bus. Ondemand governor of CPUFREQ has some delay
 on catching up a workload (1.5x the sampling rate in average, 2.0x
 the sampling rate in worst cases), which may incur flickering/tearing
 issues with multimedia streams. On the other hand, a general thermal
 monitor or battery manager might want to limit energy usage by
 disabling top performance clocks if it is too hot or the battery level
 is low.

 That sounds like a very strange api, when what you really mean is
 clk_set_min_rate or clk_set_max_rate.

 Essentially, that's what needed.
 However, with clk_set_min/max_rate, don't we need to let another
 device to be consumer of other devices' clocks? Not just introducing a
 device to other devices?

Yes, but that's effectively what you're doing through a backwards api
anyways.  The question is, for these complicated clock scenarios where
the final frequency of a clock depends on so many factors, should that
control go through the clock framework, or through some sort of global
clock governor (which is where OPP would reappear)?


Re: [linux-pm] [RFC PATCH] PM: Introduce generic DVFS framework with device-specific OPPs

2011-04-28 Thread MyungJoo Ham
On Thu, Apr 28, 2011 at 3:43 PM, Colin Cross ccr...@google.com wrote:
 I understand the need for some sort of governor that can use device
 state to determine the necessary clock frequencies.  Where I disagree
 is the connection to voltages.  The governor should ONLY determine the
 frequencies desired, and the voltage required to meet those
 frequencies should be determined by the clock framework, based only on
 the clock and the frequency.

Yes, as long as AVS (Adaptive Voltage Scaling) is not involved, devfreq
does not need to care about voltages and can let the device driver (such
as the target callback or its callee) take care of them. Besides, my
impression of AVS is that it would not depend on a software DVFS
scheme, at least judging from some AVS tests on S5PC110. So, I'd say that
it's safe to let the devfreq framework handle frequency only and let the
target callback handle everything else, except for choosing the
representative clock frequency.

However, if we are going to detach devfreq from OPP, we only need to
provide a frequency list at init and either an interface to control the
max/min freq or an interface to look up the max/min freq of the
corresponding representative clock.


ps. In our AVS test, the device drivers had nothing to do with voltage
scaling except for initializing devices. The H/W did everything about
voltage scaling dynamically.

Thanks,

MyungJoo.


Re: [linux-pm] [RFC PATCH] PM: Introduce generic DVFS framework with device-specific OPPs

2011-04-28 Thread MyungJoo Ham
On Thu, Apr 28, 2011 at 4:06 PM, Colin Cross ccr...@google.com wrote:
 On Wed, Apr 27, 2011 at 11:50 PM, MyungJoo Ham myungjoo@samsung.com 
 wrote:
 On Thu, Apr 28, 2011 at 3:44 PM, Colin Cross ccr...@google.com wrote:
 On Wed, Apr 27, 2011 at 11:12 PM, MyungJoo Ham myungjoo@samsung.com 
 wrote:
 On Thu, Apr 28, 2011 at 5:48 AM, Colin Cross ccr...@google.com wrote:
 OPP currently has opp_enable and opp_disable functions.  I don't
 understand why these are needed, they are only used at init time to
 determine available voltages, which could be handled by never passing
 unavailable voltages to the dvfs implementation.

 We need them in runtime.

 A device a may want to guarantee that a device b to be at least
 200MHz or faster while it does some operations. Then, a will
 opp_disable(b, 100MHz and others); and opp_enable(b, them) later
 on. We have similar issues with multimedia blocks (MFC, Camera, FB,
 GPU) and CPU/Memory Bus. Ondemand governor of CPUFREQ has some delay
 on catching up a workload (1.5x the sampling rate in average, 2.0x
 the sampling rate in worst cases), which may incur flickering/tearing
 issues with multimedia streams. On the other hand, a general thermal
 monitor or battery manager might want to limit energy usage by
 disabling top performance clocks if it is too hot or the battery level
 is low.

 That sounds like a very strange api, when what you really mean is
 clk_set_min_rate or clk_set_max_rate.

 Essentially, that's what needed.
 However, with clk_set_min/max_rate, don't we need to let another
 device to be consumer of other devices' clocks? Not just introducing a
 device to other devices?

 Yes, but that's effectively what you're doing through a backwards api
 anyways.  The question is, for these complicated clock scenarios where
 the final frequency of a clock depends on so many factors, should that
 control go through the clock framework, or through some sort of global
 clock governor (which is where OPP would reappear).


In the use cases of runtime clock setting by devfreq or other devices
mentioned above, we are controlling the device's performance through the
representative clock of the device, not through a specific clock among
the clocks that the device has. For a device A with clocks a1 and a2,
another device B would not control both a1 and a2 directly to
get the guaranteed performance from A. Besides, B should not do so
if A needs specific orders, delays, and other controls to
change performance properly.

Therefore, my answer is that it is preferable to control through
some wrapper or interface that is connected to the device that owns
the controlled clocks (and let the device's callback or something
similar control its clocks), not through the clock framework directly.
In this version of devfreq+OPP, this is handled by the target
callback.
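
As an illustration of why the ordering belongs to the device, here is a small userspace C model of such a callback. All names (struct model_dev, model_target() and the helpers) are hypothetical, and the ordering rule it uses is the common DVFS convention, assumed here rather than taken from the patch: raise the voltage before raising the clock, and lower the voltage only after lowering the clock. The log array just records the order of the simulated hardware writes.

```c
#include <stddef.h>

/* Hypothetical device state; 'log' records the order of hardware writes. */
struct model_dev {
	unsigned long rate_hz;
	unsigned long volt_uv;
	char log[16];	/* 'V' = voltage write, 'C' = clock write */
	size_t nlog;
};

static void model_note(struct model_dev *d, char what)
{
	if (d->nlog < sizeof(d->log))
		d->log[d->nlog++] = what;
}

static void model_set_volt(struct model_dev *d, unsigned long uv)
{
	d->volt_uv = uv;
	model_note(d, 'V');
}

static void model_set_clk(struct model_dev *d, unsigned long hz)
{
	d->rate_hz = hz;
	model_note(d, 'C');
}

/* Device-level target callback: raise voltage before speeding up,
 * lower it only after slowing down. */
static void model_target(struct model_dev *d, unsigned long hz,
			 unsigned long uv)
{
	if (hz > d->rate_hz) {
		model_set_volt(d, uv);
		model_set_clk(d, hz);
	} else if (hz < d->rate_hz) {
		model_set_clk(d, hz);
		model_set_volt(d, uv);
	}
	/* Equal rate: nothing to do in this simplified model. */
}
```

A caller such as device B only passes the representative frequency (and, in this model, its voltage) to model_target(); the sequence of writes stays private to device A.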


Cheers!
- MyungJoo


Re: [linux-pm] [RFC PATCH] PM: Introduce generic DVFS framework with device-specific OPPs

2011-04-28 Thread Mark Brown
On Wed, Apr 27, 2011 at 01:48:52PM -0700, Colin Cross wrote:

 OPP currently has opp_enable and opp_disable functions.  I don't
 understand why these are needed, they are only used at init time to
 determine available voltages, which could be handled by never passing
 unavailable voltages to the dvfs implementation.

I queried this when OPP was originally added.  The motivation which was
given (which seemed fairly reasonable) was to reduce the number of data
tables for similar parts and board designs.  That did seem like
something which it was reasonable to factor out in some way, though
possibly with a different mechanism.


Re: [linux-pm] [RFC PATCH] PM: Introduce generic DVFS framework with device-specific OPPs

2011-04-27 Thread Menon, Nishanth
On Wed, Apr 27, 2011 at 12:49, Colin Cross ccr...@google.com wrote:
+l-o

 I'm a little confused about the design for this, and OPP as well.  OPP
 matches a struct device * and a frequency to a voltage, which is not a
 generically useful pairing, as far as I can tell.  On Tegra, it is
 quite possible for a single device to have multiple clocks that each
 have different voltage requirements, for example the display block can
 have an interface clock as well as a pixel clock.  Simplifying this to
 dev + freq = voltage seems very OMAP specific, and will be difficult
 or impossible to adapt to Tegra.
We have the same requirements as well (iclk, fclk, pixclk, etc.)! We
group them under voltage domains in OMAP ;). If your issue was the
ability to have a single freq per OPP, it is up to the SoC to do the
proper mapping. The concept of an OPP still remains consistent: for a
given voltage, there is only so much freq you can drive that specific
module to.

 Moreover, from a silicon perspective, there is always a simple link
 from a single frequency to a minimum voltage for a given circuit.
 There is no need to group them into OPPs, which seem to have a group
 of clocks and their frequencies that map to a single voltage.  That is
 an artifact of the way TI specifies voltages.

 I don't think DVFS is even the right place for any sort of governor.
 DVFS is very simple - to increase to a specific clock speed, the
 voltage must be immediately be raised, with minimum or no delay, to a
 specified value that is specific to that clock.  When the frequency is
 lowered, the voltage should be decreased.  There is a tiny bit of
 policy to determine when to delay dropping the voltage in case the
 frequency will immediately be raised again, but nowhere near the
 complexity of what is shown here.

 I proposed in a different thread on LKML that DVFS be handled within
 the generic clock implementation.  Platforms would register a
 regulator and a table of voltages for each struct clock that required
 DVFS, and the voltages would be changed on normal clk_* requests.
 This maintains compatibility with existing clk_* calls.

It is up to SoC frameworks to implement the transitions. E.g., let's look
at scalability: how would the proposed mechanism work with temperature
variances? Example: I don't want to hit 1.5GHz if temp > 70C. Won't it
be an SoC-specific hack I'd need to introduce?

All the OPP framework does is store that map, and it leaves it to users to
choose regulators, clock framework variances, SoC temperature sensors,
or whatever mechanisms they choose to allow through a transition.

 There is a place for a GPU, etc., frequency governor, but it is a
 completely separate issue from DVFS, and should not be mixed in.  I
 could have a GPU that is not voltage scalable, but could still benefit
 from lowering the frequency when it is not in use.  A devfreq
 interface sounds perfect for this, as long as it only ends up calling
 clk_* functions, and those functions handle getting the voltage
 correct.

Regards,
Nishanth Menon
PS:
https://lists.linux-foundation.org/pipermail/linux-pm/2011-April/031113.html
for start of thread


Re: [linux-pm] [RFC PATCH] PM: Introduce generic DVFS framework with device-specific OPPs

2011-04-27 Thread Colin Cross
On Wed, Apr 27, 2011 at 11:07 AM, Menon, Nishanth n...@ti.com wrote:
 On Wed, Apr 27, 2011 at 12:49, Colin Cross ccr...@google.com wrote:
 +l-o

 I'm a little confused about the design for this, and OPP as well.  OPP
 matches a struct device * and a frequency to a voltage, which is not a
 generically useful pairing, as far as I can tell.  On Tegra, it is
 quite possible for a single device to have multiple clocks that each
 have different voltage requirements, for example the display block can
 have an interface clock as well as a pixel clock.  Simplifying this to
 dev + freq = voltage seems very OMAP specific, and will be difficult
 or impossible to adapt to Tegra.
 We have the same requirements as well(iclk,fclk,pixclk etc..)! We
 group them under voltage domains in OMAP ;). if your issue was a
 ability to have a single freq to a OPP, it is upto SoC to do the
 proper mapping. Concept of an OPP still remains consistent - which is
 for a voltage, there is only so much freq you can drive that specific
 module to.
No, that is still wrong.  You don't drive a module at a frequency, you
drive a clock.  You can't map struct device * 1-1 to a clock.  Look at
omap2_set_init_voltage:
static int __init omap2_set_init_voltage(char *vdd_name, char *clk_name,
                                               struct device *dev) {

        clk =  clk_get(NULL, clk_name);
        freq = clk->rate;
        opp = opp_find_freq_ceil(dev, freq);
        ...
}

Now what happens if I have a dev with two frequencies,

 Moreover, from a silicon perspective, there is always a simple link
 from a single frequency to a minimum voltage for a given circuit.
 There is no need to group them into OPPs, which seem to have a group
 of clocks and their frequencies that map to a single voltage.  That is
 an artifact of the way TI specifies voltages.

 I don't think DVFS is even the right place for any sort of governor.
 DVFS is very simple - to increase to a specific clock speed, the
 voltage must be immediately be raised, with minimum or no delay, to a
 specified value that is specific to that clock.  When the frequency is
 lowered, the voltage should be decreased.  There is a tiny bit of
 policy to determine when to delay dropping the voltage in case the
 frequency will immediately be raised again, but nowhere near the
 complexity of what is shown here.

 I proposed in a different thread on LKML that DVFS be handled within
 the generic clock implementation.  Platforms would register a
 regulator and a table of voltages for each struct clock that required
 DVFS, and the voltages would be changed on normal clk_* requests.
 This maintains compatibility with existing clk_* calls.

 It is upto SoC frameworks to implement the transitions. E.g. lets look
 at scalability: How'd the mechanism proposed work with temperature
 variances: Example: I dont want to hit 1.5GHz if temp > 70C - wont it
 be an SoC specific hack I'd need to introduce?

 All OPP framework does is store that maps, and leaves it to users to
 choose regulators, clock framework variances, SoC temperature sensors
 or what ever mechanisms they choose to allow through a transition.

 There is a place for a GPU, etc., frequency governor, but it is a
 completely separate issue from DVFS, and should not be mixed in.  I
 could have a GPU that is not voltage scalable, but could still benefit
 from lowering the frequency when it is not in use.  A devfreq
 interface sounds perfect for this, as long as it only ends up calling
 clk_* functions, and those functions handle getting the voltage
 correct.

 Regards,
 Nishanth Menon
 PS:
 https://lists.linux-foundation.org/pipermail/linux-pm/2011-April/031113.html
 for start of thread



Re: [linux-pm] [RFC PATCH] PM: Introduce generic DVFS framework with device-specific OPPs

2011-04-27 Thread Colin Cross
(sorry, missent the earlier one)

On Wed, Apr 27, 2011 at 11:07 AM, Menon, Nishanth n...@ti.com wrote:
 On Wed, Apr 27, 2011 at 12:49, Colin Cross ccr...@google.com wrote:
 +l-o

 I'm a little confused about the design for this, and OPP as well.  OPP
 matches a struct device * and a frequency to a voltage, which is not a
 generically useful pairing, as far as I can tell.  On Tegra, it is
 quite possible for a single device to have multiple clocks that each
 have different voltage requirements, for example the display block can
 have an interface clock as well as a pixel clock.  Simplifying this to
 dev + freq = voltage seems very OMAP specific, and will be difficult
 or impossible to adapt to Tegra.
 We have the same requirements as well(iclk,fclk,pixclk etc..)! We
 group them under voltage domains in OMAP ;). if your issue was a
 ability to have a single freq to a OPP, it is upto SoC to do the
 proper mapping. Concept of an OPP still remains consistent - which is
 for a voltage, there is only so much freq you can drive that specific
 module to.
No, that is still wrong.  You don't drive a module at a frequency, you
drive a clock.  You can't map struct device * 1-1 to a clock.  Look at
omap2_set_init_voltage:
static int __init omap2_set_init_voltage(char *vdd_name, char *clk_name,
                                               struct device *dev) {
...
        clk =  clk_get(NULL, clk_name);
        freq = clk->rate;
        opp = opp_find_freq_ceil(dev, freq);
        ...
}

What happens if I have a dev with two frequencies?  I can only pass a
dev into opp.  It makes infinitely more sense to pass in a clock:
opp_find_freq_ceil(clk, freq).
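
To sketch what a clock-keyed lookup could look like for a device with two clocks, here is a userspace C model. Everything in it is hypothetical (struct model_vstep, struct model_clk, model_clk_min_uv(), model_rail_uv()). It also assumes both clocks sit on one shared supply rail and that the rail must satisfy the most demanding clock, which is an assumption for the example and not something stated in this thread.

```c
#include <stddef.h>

/* Hypothetical voltage table row: rates up to max_hz need at least min_uv. */
struct model_vstep {
	unsigned long max_hz;
	unsigned long min_uv;
};

/* Hypothetical per-clock state: current rate plus its own voltage table. */
struct model_clk {
	const char *name;
	unsigned long rate_hz;
	const struct model_vstep *steps;	/* sorted by ascending max_hz */
	size_t nsteps;
};

/* Minimum voltage this clock needs at its current rate, 0 if unsupported. */
static unsigned long model_clk_min_uv(const struct model_clk *c)
{
	for (size_t i = 0; i < c->nsteps; i++) {
		if (c->rate_hz <= c->steps[i].max_hz)
			return c->steps[i].min_uv;
	}
	return 0;
}

/* Voltage for a rail shared by several clocks: the highest per-clock need. */
static unsigned long model_rail_uv(const struct model_clk *clks, size_t n)
{
	unsigned long uv = 0;

	for (size_t i = 0; i < n; i++) {
		unsigned long need = model_clk_min_uv(&clks[i]);

		if (need > uv)
			uv = need;
	}
	return uv;
}
```

Keyed this way, a display block with an interface clock and a pixel clock needs no special case: each clock carries its own table, and the device pointer never enters the lookup.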

 It is upto SoC frameworks to implement the transitions. E.g. lets look
 at scalability: How'd the mechanism proposed work with temperature
 variances: Example: I dont want to hit 1.5GHz if temp > 70C - wont it
 be an SoC specific hack I'd need to introduce?
No, because you're putting it in the wrong place; that is a policy
decision.  Handle it in the clock framework, or handle it in the
device driver.  That's a bad example either way - what happens if you
are already at 1.5GHz when the temperature crosses 70C?  You need an
interrupt that tells you the temperature is too high, and that needs
to affect a policy decision at a much higher level than dvfs.


 All OPP framework does is store that maps, and leaves it to users to
 choose regulators, clock framework variances, SoC temperature sensors
 or what ever mechanisms they choose to allow through a transition.
I understand it's just a map, but it's a map between two things that
don't have a direct mapping in many SoCs.  I think if you changed
every usage of struct device * in opp to struct clk *, it would make much
more sense.  There is already a mapping from struct device * to struct
clk *; it's called clk_get, and it takes a second parameter to allow
devices to have multiple clocks.


Re: [linux-pm] [RFC PATCH] PM: Introduce generic DVFS framework with device-specific OPPs

2011-04-27 Thread Menon, Nishanth
On Wed, Apr 27, 2011 at 13:29, Colin Cross ccr...@google.com wrote:
 On Wed, Apr 27, 2011 at 11:07 AM, Menon, Nishanth n...@ti.com wrote:
 On Wed, Apr 27, 2011 at 12:49, Colin Cross ccr...@google.com wrote:
 +l-o

 I'm a little confused about the design for this, and OPP as well.  OPP
 matches a struct device * and a frequency to a voltage, which is not a
 generically useful pairing, as far as I can tell.  On Tegra, it is
 quite possible for a single device to have multiple clocks that each
 have different voltage requirements, for example the display block can
 have an interface clock as well as a pixel clock.  Simplifying this to
 dev + freq = voltage seems very OMAP specific, and will be difficult
 or impossible to adapt to Tegra.
 We have the same requirements as well(iclk,fclk,pixclk etc..)! We
 group them under voltage domains in OMAP ;). if your issue was a
 ability to have a single freq to a OPP, it is upto SoC to do the
 proper mapping. Concept of an OPP still remains consistent - which is
 for a voltage, there is only so much freq you can drive that specific
 module to.
 No, that is still wrong.  You don't drive a module at a frequency, you
 drive a clock.  You can't map struct device * 1-1 to a clock.  Look at
Agreed, a module runs on clocks. Let's say n clocks provide a module
its functionality.

 omap2_set_init_voltage:
 static int __init omap2_set_init_voltage(char *vdd_name, char *clk_name,
                                                struct device *dev) {

        clk =  clk_get(NULL, clk_name);
        freq = clk->rate;
        opp = opp_find_freq_ceil(dev, freq);
        ...
 }

 Now what happens if I have a dev with two frequencies,
We do have it; it depends on what the OPP table represents. We do
have modules which have both interface and functional clocks on OMAP
as well. For a module (represented by struct device *) which has n
clocks, choose a scheme that represents the module by the clock that
depends on voltage.
In the example you provided, "the display block can have an interface
clock as well as a pixel clock", I suppose you mean:
{.pclk = x, .iclk = y, .v = z}
The question I'd ask is this: for a voltage z, is the dependency on
pclk or iclk? I can expect pclk to depend on the iclk requirement
(considering that the pixel clock drives an external display, for
example). The table then reduces to just
{.iclk = y, .v = z} and a different table that has the divisor from .iclk
to pclk, which is SoC based.

The OPP table is just a storage and retrieval mechanism; it is up to SoC
frameworks to choose the most adequate solution. E.g., OMAP has
omap_device, hwmod and a clock framework for more intricate control,
working in conjunction with cpuidle frameworks as well.

There is a cross-domain dependency which OMAP has (yet to be pushed to
mainline). Example: when OMAP4's MPUs are at a certain OPP, L3
(OMAP's SoC bus) needs to be at least at a certain OPP. These are
frameworks which may be very custom to OMAP itself.

---
Regards,
Nishanth Menon


Re: [linux-pm] [RFC PATCH] PM: Introduce generic DVFS framework with device-specific OPPs

2011-04-27 Thread Colin Cross
On Wed, Apr 27, 2011 at 11:48 AM, Menon, Nishanth n...@ti.com wrote:
 On Wed, Apr 27, 2011 at 13:29, Colin Cross ccr...@google.com wrote:
 On Wed, Apr 27, 2011 at 11:07 AM, Menon, Nishanth n...@ti.com wrote:
 On Wed, Apr 27, 2011 at 12:49, Colin Cross ccr...@google.com wrote:
 +l-o

 I'm a little confused about the design for this, and OPP as well.  OPP
 matches a struct device * and a frequency to a voltage, which is not a
 generically useful pairing, as far as I can tell.  On Tegra, it is
 quite possible for a single device to have multiple clocks that each
 have different voltage requirements, for example the display block can
 have an interface clock as well as a pixel clock.  Simplifying this to
 dev + freq = voltage seems very OMAP specific, and will be difficult
 or impossible to adapt to Tegra.
 We have the same requirements as well(iclk,fclk,pixclk etc..)! We
 group them under voltage domains in OMAP ;). if your issue was a
 ability to have a single freq to a OPP, it is upto SoC to do the
 proper mapping. Concept of an OPP still remains consistent - which is
 for a voltage, there is only so much freq you can drive that specific
 module to.
 No, that is still wrong.  You don't drive a module at a frequency, you
 drive a clock.  You can't map struct device * 1-1 to a clock.  Look at
 Agreed, module runs on clocks - Lets say n clocks provide a module
 it's functionality.

 omap2_set_init_voltage:
 static int __init omap2_set_init_voltage(char *vdd_name, char *clk_name,
                                                struct device *dev) {

        clk =  clk_get(NULL, clk_name);
        freq = clk->rate;
        opp = opp_find_freq_ceil(dev, freq);
        ...
 }

 Now what happens if I have a dev with two frequencies,
 we do have it - it depends on what the OPP table represents. we do
 have modules which have both interface and functional clocks on OMAP
 as well. for a module(represented by struct device *) which has n
 clocks, choose the scheme of representation of clock that depends on
 voltage for the module.
 in the example you provided the display block can have an interface
 clock as well as a pixel clock - I suppose you mean:
 {.pclk = x, .iclk = y, .v = z}
 The question I'd ask is this : for a voltage z, is the dependency on
 pclk or iclk? I can expect a dependency of pclk to iclk requirement
 (considering pixel clock drives an external display for example). the
 table reduces to just
 {.iclk = y, .v = z} and a different table that has divisor for .iclk
 to pclk which is SoC based.

No, there can be voltage requirements on both, and the higher voltage
requirement of the two must be used.
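
A minimal sketch of that rule, assuming each clock carries its own
(max frequency, min voltage) table and that several clocks share one
rail. All struct and function names here are hypothetical and are not
part of the OPP or clk code; the point is only that the rail must be
set to the highest per-clock requirement.

```c
#include <stddef.h>

/* Hypothetical table row: rates up to max_hz need at least min_uv. */
struct dvfs_step {
    unsigned long max_hz;
    unsigned long min_uv;
};

/* Hypothetical per-clock state; steps are sorted by ascending max_hz. */
struct dvfs_clk {
    const struct dvfs_step *steps;
    size_t nsteps;
    unsigned long rate_hz;   /* current rate of this clock */
};

/* Lowest voltage that supports the clock's current rate, 0 if none. */
static unsigned long dvfs_clk_min_uv(const struct dvfs_clk *c)
{
    for (size_t i = 0; i < c->nsteps; i++)
        if (c->rate_hz <= c->steps[i].max_hz)
            return c->steps[i].min_uv;
    return 0;
}

/*
 * Voltage a shared rail must supply: the highest requirement among
 * all clocks on it. Returns 0 if any clock's rate is unsupported.
 */
static unsigned long dvfs_rail_required_uv(const struct dvfs_clk *clks,
                                           size_t n)
{
    unsigned long need = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned long uv = dvfs_clk_min_uv(&clks[i]);

        if (uv == 0)
            return 0;
        if (uv > need)
            need = uv;
    }
    return need;
}
```

With a pixel clock that needs 1.2 V and an interface clock that needs
0.95 V at their current rates, the rail would be held at 1.2 V.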

 OPP table is just a storage and retrieval mechanism, it is upto SoC
 frameworks to choose the most adequate of solutions - e.g. OMAP has
 omap_device, hwmod and a clock framework for more intricate control to
 work in conjunction with cpuidle frameworks as well.

 There is cross domain dependency which OMAP (yet to be pushed to
 mainline) has - example: when OMAP4's MPUs are at a certain OPP, L3
 (OMAP's SoC bus) needs to be at least a certain OPP - these are
 framework which may be very custom to OMAP itself.

 ---
 Regards,
 Nishanth Menon



Re: [linux-pm] [RFC PATCH] PM: Introduce generic DVFS framework with device-specific OPPs

2011-04-27 Thread Thomas Gleixner
On Wed, 27 Apr 2011, Menon, Nishanth wrote:
 On Wed, Apr 27, 2011 at 12:49, Colin Cross ccr...@google.com wrote:
  I proposed in a different thread on LKML that DVFS be handled within
  the generic clock implementation.  Platforms would register a
  regulator and a table of voltages for each struct clock that required
  DVFS, and the voltages would be changed on normal clk_* requests.
  This maintains compatibility with existing clk_* calls.
 
 It is upto SoC frameworks to implement the transitions. E.g. lets look
 at scalability: How'd the mechanism proposed work with temperature
 variances: Example: I dont want to hit 1.5GHz if temp > 70C - wont it
 be an SoC specific hack I'd need to introduce?

Why is limiting the max core frequency depending on temperature an
SoC-specific problem?

Everyone wants to do that. x86 does it in hardware / SMM, other
architectures want the kernel to take care of it.

So the decision is simple. Something wants to set core freq to 1.5
GHz, so it calls clk_set_rate() and there we consult the DVFS code
first to validate that setting. If it can be set, fine, then DVFS will
set the voltages _before_ we change the frequency or it will simply
veto the change because one of the preconditions for such a change is
not met.
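
A rough sketch of that flow, using a mock clock rather than the real
struct clk. Every name below (sk_clk, sk_clk_set_rate, the cap_hz
policy field, the return codes) is an assumption made for
illustration. Lowering the voltage only after the frequency has
dropped is the usual counterpart of the rule above and is also an
assumption here.

```c
#include <stddef.h>

enum { SK_OK = 0, SK_VETO = -1, SK_NO_ENTRY = -2 };

struct sk_step {
    unsigned long max_hz;
    unsigned long min_uv;
};

/* Mock clock with its rail and DVFS table folded in for brevity. */
struct sk_clk {
    unsigned long rate_hz;
    unsigned long rail_uv;
    unsigned long cap_hz;          /* policy ceiling, 0 means none */
    const struct sk_step *steps;   /* sorted by ascending max_hz */
    size_t nsteps;
};

/* DVFS validation: veto the rate or report the voltage it needs. */
static int sk_dvfs_check(const struct sk_clk *c, unsigned long hz,
                         unsigned long *uv)
{
    if (c->cap_hz && hz > c->cap_hz)
        return SK_VETO;
    for (size_t i = 0; i < c->nsteps; i++) {
        if (hz <= c->steps[i].max_hz) {
            *uv = c->steps[i].min_uv;
            return SK_OK;
        }
    }
    return SK_NO_ENTRY;
}

static int sk_clk_set_rate(struct sk_clk *c, unsigned long hz)
{
    unsigned long uv;
    int ret = sk_dvfs_check(c, hz, &uv);

    if (ret != SK_OK)
        return ret;            /* vetoed: nothing is changed */
    if (uv > c->rail_uv)
        c->rail_uv = uv;       /* raise voltage before speeding up */
    c->rate_hz = hz;
    if (uv < c->rail_uv)
        c->rail_uv = uv;       /* drop voltage after slowing down */
    return SK_OK;
}
```

A thermal policy would then only have to adjust cap_hz; callers keep
using the same set-rate entry point and get a veto when the cap is
exceeded.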

Please stop thinking that your SoC is sooo special. It's NOT.

The HW concepts are quite similar all over the place, they are just
named differently and use different IP blocks with slightly different
functionality, but the problems are not unique to a particular SoC at
all.

 All OPP framework does is store that maps, and leaves it to users to
 choose regulators, clock framework variances, SoC temperature sensors
 or what ever mechanisms they choose to allow through a transition.

That's how it's implemented, but that does not say that the design is
correct and usable for more than the use case it was modeled after.
 
We are looking into a common clock framework, which abstracts out the
duplicated functionality of the various implementations and reduces
them to the real thing: hardware drivers. So we really need to look
into that DVFS problem as well, simply because it is tightly coupled
and not a completely separate entity.

And looking at the struct clk disaster we really don't want another
incarnation in terms of DVFS where we end up with the same decision
functions in various SoCs over and over.

Thanks,

tglx

Re: [linux-pm] [RFC PATCH] PM: Introduce generic DVFS framework with device-specific OPPs

2011-04-27 Thread Thomas Gleixner
On Wed, 27 Apr 2011, Menon, Nishanth wrote:

 OPP table is just a storage and retrieval mechanism, it is upto SoC
 frameworks to choose the most adequate of solutions - e.g. OMAP has
 omap_device, hwmod and a clock framework for more intricate control to
 work in conjunction with cpuidle frameworks as well.

Can you please stop thinking about OMAP for a minute?

A clock framework is nothing SoC specific. A framework is an
abstraction of common HW functionality, which implements general
functionality and relies on the HW specific part to configure it and
to provide access to the hardware itself.

Clocks are ordered as trees in HW, simply because you cannot have a
clock consumer be driven by more than one active clock at the same
time. A clock consumer may select a different clock producer, but that
merely changes the tree structure, nothing else. So why should every
SoC implement its own (different buggy) version of tree handling and
call it a framework?

Yes, I know you might argue that some devices need two clocks enabled
to be functional. That's correct, but coupling those clocks at the
framework level is the wrong thing to do. If a device needs both an
interface clock and a separate interconnect clock to work, then it
needs to enable both clocks and become a consumer of them.
 
 There is cross domain dependency which OMAP (yet to be pushed to
 mainline) has - example: when OMAP4's MPUs are at a certain OPP, L3
 (OMAP's SoC bus) needs to be at least a certain OPP - these are
 framework which may be very custom to OMAP itself.

Wrong again. That's not a framework when you hack SoC specific
decision functions into it. It's the OMAP internal hackery to make
stuff work, but that's far from a framework.

What you are describing is a restriction which can be expressed in
tables or rules which are fed into a general framework.
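
As a sketch of what such a table could look like for the MPU/L3 case
quoted above: each rule says "if the source domain runs at or above
this rate, the dependent domain needs at least that rate". The struct,
the function name and the numbers used with it are made up for
illustration and are not taken from OMAP4.

```c
#include <stddef.h>

/* Hypothetical rule fed to a generic framework by platform code. */
struct sketch_dep_rule {
    unsigned long source_min_hz;   /* rule applies at or above this */
    unsigned long dependent_hz;    /* floor for the dependent domain */
};

/*
 * Generic evaluation: the floor for the dependent domain is the
 * highest floor among all rules that the source rate triggers.
 * Returns 0 when no rule applies.
 */
static unsigned long sketch_dep_floor(const struct sketch_dep_rule *rules,
                                      size_t n, unsigned long source_hz)
{
    unsigned long floor_hz = 0;

    for (size_t i = 0; i < n; i++) {
        if (source_hz >= rules[i].source_min_hz &&
            rules[i].dependent_hz > floor_hz)
            floor_hz = rules[i].dependent_hz;
    }
    return floor_hz;
}
```

The SoC-specific part is then reduced to the rule table itself; the
decision function stays common.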

Look at generic irqs, generic timekeeping, generic clockevents and
tons of other real frameworks in the kernel. They abstract out
concepts and provide generic interfaces rather than claiming that the
problem is unique to a particular piece of silicon.

Forget OMAP implementation details for a while, sit back and look at
the big picture.

Thanks,

tglx




Re: [linux-pm] [RFC PATCH] PM: Introduce generic DVFS framework with device-specific OPPs

2011-04-27 Thread Colin Cross
On Wed, Apr 27, 2011 at 12:26 PM, Thomas Gleixner t...@linutronix.de wrote:
 Forget OMAP implementation details for a while, sit back and look at
 the big picture.

Here's my proposal for DVFS:
- DVFS is implemented in drivers/clk/dvfs.c, and is called by the
common clock implementation to adjust the voltages, if necessary, on
regular clk_* calls.
- Platform code provides mappings in the form (clk, regulator, max
frequency, min voltage) to the dvfs code.
- Everything that is in OPP today gets converted to helper functions
inside the dvfs implementation, and is never called from SoC code
(except to pass tables at init), or from drivers.
- OPP can be recreated in the future as an upper-level policy manager
for clocks that need to move together, if that is ever necessary.  It
would not know anything about voltages.
- A few common policy implementations need to be added to the common
clock implementation, like temperature limits.
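
To make the (clk, regulator, max frequency, min voltage) mapping
concrete, here is one possible shape for the table that platform code
would pass at init, plus a lookup helper. The names sketch_dvfs_map
and sketch_dvfs_lookup are hypothetical; nothing like them exists in
drivers/clk today, and the clock and regulator names in the usage
note are placeholders.

```c
#include <stddef.h>
#include <string.h>

/* One platform-supplied tuple, as described in the list above. */
struct sketch_dvfs_map {
    const char *clk;          /* clock name */
    const char *regulator;    /* supply that feeds this clock's logic */
    unsigned long max_hz;     /* highest rate this row covers */
    unsigned long min_uv;     /* lowest voltage allowed for that rate */
};

/*
 * Return the first row for clk_name whose max_hz covers rate_hz.
 * Rows for one clock are expected in ascending max_hz order, so the
 * first match is also the lowest sufficient voltage. NULL if the
 * rate is not covered by any row.
 */
static const struct sketch_dvfs_map *
sketch_dvfs_lookup(const struct sketch_dvfs_map *tbl, size_t n,
                   const char *clk_name, unsigned long rate_hz)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(tbl[i].clk, clk_name) == 0 &&
            rate_hz <= tbl[i].max_hz)
            return &tbl[i];
    }
    return NULL;
}
```

A display block would simply contribute rows for both of its clocks,
for example "disp_pclk" and "disp_iclk" on the same regulator, and
drivers would never see the table.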

For Tegra:
- DVFS continues to be accessed by calling clk_* functions

For OMAP:
- DVFS is triggered by hwmod through clk_* functions.  Any cross-arch
driver can continue to call clk_* functions.

OPP currently has opp_enable and opp_disable functions.  I don't
understand why these are needed; they are only used at init time to
determine available voltages, which could be handled by never passing
unavailable voltages to the dvfs implementation.


Re: [linux-pm] [RFC PATCH] PM: Introduce generic DVFS framework with device-specific OPPs

2011-04-27 Thread MyungJoo Ham
On Thu, Apr 28, 2011 at 3:37 AM, Colin Cross ccr...@google.com wrote:
 (sorry, missent the earlier one)

 On Wed, Apr 27, 2011 at 11:07 AM, Menon, Nishanth n...@ti.com wrote:
 On Wed, Apr 27, 2011 at 12:49, Colin Cross ccr...@google.com wrote:
 +l-o

 I'm a little confused about the design for this, and OPP as well.  OPP
 matches a struct device * and a frequency to a voltage, which is not a
 generically useful pairing, as far as I can tell.  On Tegra, it is
 quite possible for a single device to have multiple clocks that each
 have different voltage requirements, for example the display block can
 have an interface clock as well as a pixel clock.  Simplifying this to
 dev + freq = voltage seems very OMAP specific, and will be difficult
 or impossible to adapt to Tegra.
 We have the same requirements as well(iclk,fclk,pixclk etc..)! We
 group them under voltage domains in OMAP ;). if your issue was a
 ability to have a single freq to a OPP, it is upto SoC to do the
 proper mapping. Concept of an OPP still remains consistent - which is
 for a voltage, there is only so much freq you can drive that specific
 module to.
 No, that is still wrong.  You don't drive a module at a frequency, you
 drive a clock.  You can't map struct device * 1-1 to a clock.  Look at
 omap2_set_init_voltage:
 static int __init omap2_set_init_voltage(char *vdd_name, char *clk_name,
                                                struct device *dev) {
        ...
         clk =  clk_get(NULL, clk_name);
         freq = clk->rate;
         opp = opp_find_freq_ceil(dev, freq);
         ...
 }

 What happens if I have a dev with two frequencies?  I can only pass a
 dev into opp.  It makes infinitely more sense to pass in a clock:
 opp_find_freq_ceil(clk, freq).

What one instance of DVFS (devfreq) controls is a set of clocks and
regulators (a device may have multiple regulators as well as multiple
clocks).
What one instance of DVFS (devfreq) monitors (device load and/or
temperature) is a device that uses those clocks and regulators.

If we focus on the things that are controlled by DVFS, connecting DVFS
with clock seems fine; however, DVFS's decision is based on the status
of the device and the decision (monitoring result) configures a set of
clocks and regulators. The clocks are not configured independently
from others if the clocks are used by a DVFS-capable device. The
frequency/voltage pair (OPP in this patch) associated with a device
becomes a representative value of a specific configuration that
configures the set of clocks and regulators.

This is quite similar to CPUFREQ. CPUFREQ provides a single
frequency value as a result of monitoring; however the machine's
cpufreq driver may set multiple clocks and multiple voltage regulators
based on the representative value (which is usually the core clock)
although the cpufreq driver may need to control many more clocks with
different frequencies.

With multiple clocks of a device, if there is a clock that is required
to be set independently from the representative clock with DVFS, it
means that the DVFS monitoring result (load/temperature) is not a
scalar value but a vector (multi-dimensional value). That implies that
we need to monitor different and independent values, which, in turn,
implies that we need separate devices. Note that the DVFS monitor
result from load and temperature combined is not a multi-dimensional
value, because the temperature limits the maximum possible frequency
or voltage and the load gives a preferred lower bound of frequency
that can be overridden by the limit set by temperature.
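
A small sketch of why load and temperature still collapse into one
scalar decision: the load gives a floor, the temperature gives a
ceiling, and the ceiling wins. The function name and the plain
frequency array are assumptions for illustration, not the devfreq
interface.

```c
#include <stddef.h>

/*
 * Pick one frequency from freqs[] (sorted ascending):
 *  - never above thermal_ceiling_hz,
 *  - otherwise the lowest entry that is >= load_floor_hz,
 *  - if the load asks for more than the ceiling allows, the highest
 *    entry under the ceiling.
 * Returns 0 if even the lowest entry exceeds the ceiling.
 */
static unsigned long sketch_pick_freq(const unsigned long *freqs, size_t n,
                                      unsigned long load_floor_hz,
                                      unsigned long thermal_ceiling_hz)
{
    unsigned long pick = 0;

    for (size_t i = 0; i < n; i++) {
        if (freqs[i] > thermal_ceiling_hz)
            break;             /* temperature limit always wins */
        pick = freqs[i];
        if (freqs[i] >= load_floor_hz)
            break;             /* lowest rate that satisfies the load */
    }
    return pick;
}
```

Two inputs go in, but only one representative frequency comes out,
which is what lets a single monitor per device drive the whole set of
clocks and regulators.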

Therefore, having one DVFS per clock where multiple clocks are
attached to a device will create multiple monitors that monitor the
same object (device behavior) with the same metrics (load and
temperature).

Besides, the reason I've started with a target callback rather than
clk and regulator names or pointers is that a device may have multiple
clks and regulators, and the OPP may only show the representative
clock/regulators as CPUFREQ does. Especially when the order of
transitions of those multiple clocks and regulators matters (if they
are in a single device, it sometimes does), running a DVFS per clock,
not per device, will be bothersome if not disastrous.
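
To illustrate the target-callback idea, here is a sketch in which one
representative frequency fans out to two clocks and a rail inside a
single per-device callback, so the callback owns the ordering. The
types, the 2:1 bus ratio, the voltage threshold and the recorded
last_order field are all invented for this example; real devfreq
structures differ.

```c
#include <stddef.h>

enum sketch_step { STEP_RAIL, STEP_BUS_CLK, STEP_CORE_CLK };

/* Mock device state; last_order records the most recent sequence. */
struct sketch_dev {
    unsigned long core_hz;
    unsigned long bus_hz;
    unsigned long rail_uv;
    enum sketch_step last_order[3];
};

/* One DVFS instance per device, with a device-provided callback. */
struct sketch_dvfs {
    struct sketch_dev *dev;
    int (*target)(struct sketch_dev *dev, unsigned long rep_hz);
};

/* Apply a representative frequency to the whole set, in a safe order. */
static int sketch_target(struct sketch_dev *d, unsigned long rep_hz)
{
    static const enum sketch_step up[3] =
        { STEP_RAIL, STEP_BUS_CLK, STEP_CORE_CLK };
    static const enum sketch_step down[3] =
        { STEP_CORE_CLK, STEP_BUS_CLK, STEP_RAIL };
    const enum sketch_step *order = rep_hz > d->core_hz ? up : down;
    /* Assumed rule: above 200 MHz the rail needs 1.2 V, else 1.0 V. */
    unsigned long uv = rep_hz > 200000000UL ? 1200000UL : 1000000UL;

    for (size_t i = 0; i < 3; i++) {
        switch (order[i]) {
        case STEP_RAIL:
            d->rail_uv = uv;
            break;
        case STEP_BUS_CLK:
            d->bus_hz = rep_hz / 2;   /* assumed fixed 2:1 ratio */
            break;
        case STEP_CORE_CLK:
            d->core_hz = rep_hz;
            break;
        }
        d->last_order[i] = order[i];
    }
    return 0;
}
```

With one DVFS instance per clock, nothing would be in a position to
enforce that the rail moves first on the way up and last on the way
down.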


 It is upto SoC frameworks to implement the transitions. E.g. lets look
 at scalability: How'd the mechanism proposed work with temperature
 variances: Example: I dont want to hit 1.5GHz if temp > 70C - wont it
 be an SoC specific hack I'd need to introduce?
 No, because you're putting it in the wrong place, that is a policy
 decision.  Handle it in the clock framework, or handle it in the
 device driver.  That's a bad example either way - what happens if you
 are already at 1.5GHz when the temperature crosses 70C?  You need an
 interrupt that tells you the temperature is too high, and that needs
 to affect a policy decision at a much higher level than dvfs.


 All OPP framework does is store that maps, and leaves it to users to
 choose regulators, clock