Re: [PATCH] arm64: dts: qcom: sc7180: Add 'sustainable_power' for CPU thermal zones

2020-09-03 Thread Doug Anderson
Hi,

On Thu, Sep 3, 2020 at 5:17 AM Matthias Kaehlcke  wrote:
>
> Hi Rajendra,
>
> On Thu, Sep 03, 2020 at 11:00:52AM +0530, Rajendra Nayak wrote:
> >
> > On 9/3/2020 10:14 AM, Rajendra Nayak wrote:
> > >
> > > On 9/2/2020 9:02 PM, Doug Anderson wrote:
> > > > Hi,
> > > >
> > > > On Tue, Sep 1, 2020 at 10:36 PM Rajendra Nayak  
> > > > wrote:
> > > > >
> > > > >
> > > > > > * In terms of the numbers here, I believe that you're claiming that 
> > > > > > we
> > > > > > can dissipate 768 mW * 6 + 1202 mW * 2 = ~7 Watts of power.  My 
> > > > > > memory
> > > > > > of how much power we could dissipate in previous laptops I worked on
> > > > > > is a little fuzzy, but that doesn't seem insane for a 
> > > > > > passively-cooled
> > > > > > laptop.  However, I think someone could conceivably put this chip 
> > > > > > in a
> > > > > > smaller form factor.  In such a case, it seems like we'd want these
> > > > > > things to sum up to ~2000 (if it would ever make sense for someone 
> > > > > > to
> > > > > > put this chip in a phone) or ~4000 (if it would ever make sense for
> > > > > > someone to put this chip in a small tablet).  It seems possible 
> > > > > > that,
> > > > > > to achieve this, we might have to tweak the
> > > > > > "dynamic-power-coefficient".
> > > > >
> > > > > DPC values are calculated (at a SoC) by actually measuring max power 
> > > > > at various
> > > > > frequency/voltage combinations by running things like dhrystone.
> > > > > How would the max power a SoC can generate depend on form factors?
> > > > > How much it can dissipate sure is, but then I am not super familiar 
> > > > > how
> > > > > thermal frameworks end up using DPC for calculating power dissipated,
> > > > > I am guessing they don't.
> > > > >
> > > > > > I don't know how much thought was put
> > > > > > into those numbers, but the fact that the little cores have a super
> > > > > > round 100 for their dynamic-power-coefficient makes me feel like 
> > > > > > they
> > > > > > might have been more schwags than anything.  Rajendra maybe knows?
> > > > >
> > > > > FWIK, the values are always scaled and normalized to 100 for silver 
> > > > > and
> > > > > then used to derive the relative DPC number for gold. If you see the 
> > > > > DPC
> > > > > for silver cores even on sdm845 is a 100.
> > > > > Again these are not estimations but based on actual power 
> > > > > measurements.
> > > >
> > > > The scaling to 100 doesn't seem to match how the thermal framework is
> > > > using them.  Take a look at of_cpufreq_cooling_register().  It takes
> > > > the "dynamic-power-coefficient" and passes it as "capacitance" into
> > > > __cpufreq_cooling_register().  That's eventually used to compute
> > > > power, which is documented in the code to be in mW.
> > > >
> > > > power = (u64)capacitance * freq_mhz * voltage_mv * voltage_mv;
> > > > do_div(power, 10);
> > > >
> > > > /* power is stored in mW */
> > > > freq_table[i].power = power;
> > > >
> > > > That's used together with "sustainable-power", which is the attribute
> > > > that Matthias is trying to set.  That value is documented to be in mW
> > > > as well.
> > > >
> > > > ...so if the silver cores are always scaled to 100 regardless of how
> > > > much power they actually draw then it'll be impossible to actually
> > > > think about "sustainable-power" as a mW value.  Presumably we either
> > > > need to accept that fact (and ideally document it) or we need to
> > > > change the values for silver / gold cores (we could still keep the
> > > > relative values the same and just scale them).
> > >
> > > That sounds reasonable (still keep the relative values and scale them)
> > > I'll get back on what those scaled numbers would look like, and try to
> > > get some sense of why this scaling to 100 was done (like you said
> > > I don't see any documentation on this), but I see atleast a few other 
> > > non-qcom
> > > SoCs doing this too in mainline (like rockchip/rk3399)

I don't think I was too closely involved in these numbers on rk3399,
but as far as I can tell the 100 number came from:

https://crrev.com/c/364003

...interestingly enough the number _wasn't_ scaled to 100 (but was a
number close to 100) and then was changed to scale to 100.  That makes
it seem like 100, though awfully round, was at least based loosely on
fact for rk3399.

In any case, the devicetree bindings make it pretty clear that this
value should be based in reality and not some bogus number.


> > On second thoughts, why wouldn't a relative 'sustainable-power' value work?
> > On every device, one would need to do the exercise that Matthias did to come
> > up with the OPP at which we can sustain max CPU/GPU loads anyway.
>
> You assume that a thermal zone only has cooling devices of a the same type (or
> with the same fake unit for power consumption). This falls apart when multiple
> types are used, which is common.
>
> Also sustainable power is only a derived value, the lying already starts in
> 

Re: [PATCH] arm64: dts: qcom: sc7180: Add 'sustainable_power' for CPU thermal zones

2020-09-03 Thread Matthias Kaehlcke
Hi Rajendra,

On Thu, Sep 03, 2020 at 11:00:52AM +0530, Rajendra Nayak wrote:
> 
> On 9/3/2020 10:14 AM, Rajendra Nayak wrote:
> > 
> > On 9/2/2020 9:02 PM, Doug Anderson wrote:
> > > Hi,
> > > 
> > > On Tue, Sep 1, 2020 at 10:36 PM Rajendra Nayak  
> > > wrote:
> > > > 
> > > > 
> > > > > * In terms of the numbers here, I believe that you're claiming that we
> > > > > can dissipate 768 mW * 6 + 1202 mW * 2 = ~7 Watts of power.  My memory
> > > > > of how much power we could dissipate in previous laptops I worked on
> > > > > is a little fuzzy, but that doesn't seem insane for a passively-cooled
> > > > > laptop.  However, I think someone could conceivably put this chip in a
> > > > > smaller form factor.  In such a case, it seems like we'd want these
> > > > > things to sum up to ~2000 (if it would ever make sense for someone to
> > > > > put this chip in a phone) or ~4000 (if it would ever make sense for
> > > > > someone to put this chip in a small tablet).  It seems possible that,
> > > > > to achieve this, we might have to tweak the
> > > > > "dynamic-power-coefficient".
> > > > 
> > > > DPC values are calculated (at a SoC) by actually measuring max power at 
> > > > various
> > > > frequency/voltage combinations by running things like dhrystone.
> > > > How would the max power a SoC can generate depend on form factors?
> > > > How much it can dissipate sure is, but then I am not super familiar how
> > > > thermal frameworks end up using DPC for calculating power dissipated,
> > > > I am guessing they don't.
> > > > 
> > > > > I don't know how much thought was put
> > > > > into those numbers, but the fact that the little cores have a super
> > > > > round 100 for their dynamic-power-coefficient makes me feel like they
> > > > > might have been more schwags than anything.  Rajendra maybe knows?
> > > > 
> > > > FWIK, the values are always scaled and normalized to 100 for silver and
> > > > then used to derive the relative DPC number for gold. If you see the DPC
> > > > for silver cores even on sdm845 is a 100.
> > > > Again these are not estimations but based on actual power measurements.
> > > 
> > > The scaling to 100 doesn't seem to match how the thermal framework is
> > > using them.  Take a look at of_cpufreq_cooling_register().  It takes
> > > the "dynamic-power-coefficient" and passes it as "capacitance" into
> > > __cpufreq_cooling_register().  That's eventually used to compute
> > > power, which is documented in the code to be in mW.
> > > 
> > > power = (u64)capacitance * freq_mhz * voltage_mv * voltage_mv;
> > > do_div(power, 10);
> > > 
> > > /* power is stored in mW */
> > > freq_table[i].power = power;
> > > 
> > > That's used together with "sustainable-power", which is the attribute
> > > that Matthias is trying to set.  That value is documented to be in mW
> > > as well.
> > > 
> > > ...so if the silver cores are always scaled to 100 regardless of how
> > > much power they actually draw then it'll be impossible to actually
> > > think about "sustainable-power" as a mW value.  Presumably we either
> > > need to accept that fact (and ideally document it) or we need to
> > > change the values for silver / gold cores (we could still keep the
> > > relative values the same and just scale them).
> > 
> > That sounds reasonable (still keep the relative values and scale them)
> > I'll get back on what those scaled numbers would look like, and try to
> > get some sense of why this scaling to 100 was done (like you said
> > I don't see any documentation on this), but I see atleast a few other 
> > non-qcom
> > SoCs doing this too in mainline (like rockchip/rk3399)
> 
> On second thoughts, why wouldn't a relative 'sustainable-power' value work?
> On every device, one would need to do the exercise that Matthias did to come
> up with the OPP at which we can sustain max CPU/GPU loads anyway.

You assume that a thermal zone only has cooling devices of a the same type (or
with the same fake unit for power consumption). This falls apart when multiple
types are used, which is common.

Also sustainable power is only a derived value, the lying already starts in
the energy model, which is used by EAS, so a fake unit could cause further
problems.

> I mean even if we do change the DPC values to match actual power, Matthias 
> would
> still observe that we can sustain at the very same OPP and not any different.
> Its just that the mW values that are passed to kernel are relative and not
> absolute. My worry is that perhaps no SoC vendor wants to put these absolute 
> numbers
> out.

This is pretty much 'security' by obscurity. It would be relatively easy to
measure actual power consumption at different CPU speeds and derive the DPC
values from that.


Re: [PATCH] arm64: dts: qcom: sc7180: Add 'sustainable_power' for CPU thermal zones

2020-09-02 Thread Rajendra Nayak



On 9/3/2020 10:14 AM, Rajendra Nayak wrote:


On 9/2/2020 9:02 PM, Doug Anderson wrote:

Hi,

On Tue, Sep 1, 2020 at 10:36 PM Rajendra Nayak  wrote:




* In terms of the numbers here, I believe that you're claiming that we
can dissipate 768 mW * 6 + 1202 mW * 2 = ~7 Watts of power.  My memory
of how much power we could dissipate in previous laptops I worked on
is a little fuzzy, but that doesn't seem insane for a passively-cooled
laptop.  However, I think someone could conceivably put this chip in a
smaller form factor.  In such a case, it seems like we'd want these
things to sum up to ~2000 (if it would ever make sense for someone to
put this chip in a phone) or ~4000 (if it would ever make sense for
someone to put this chip in a small tablet).  It seems possible that,
to achieve this, we might have to tweak the
"dynamic-power-coefficient".


DPC values are calculated (at a SoC) by actually measuring max power at various
frequency/voltage combinations by running things like dhrystone.
How would the max power a SoC can generate depend on form factors?
How much it can dissipate sure is, but then I am not super familiar how
thermal frameworks end up using DPC for calculating power dissipated,
I am guessing they don't.


I don't know how much thought was put
into those numbers, but the fact that the little cores have a super
round 100 for their dynamic-power-coefficient makes me feel like they
might have been more schwags than anything.  Rajendra maybe knows?


FWIK, the values are always scaled and normalized to 100 for silver and
then used to derive the relative DPC number for gold. If you see the DPC
for silver cores even on sdm845 is a 100.
Again these are not estimations but based on actual power measurements.


The scaling to 100 doesn't seem to match how the thermal framework is
using them.  Take a look at of_cpufreq_cooling_register().  It takes
the "dynamic-power-coefficient" and passes it as "capacitance" into
__cpufreq_cooling_register().  That's eventually used to compute
power, which is documented in the code to be in mW.

power = (u64)capacitance * freq_mhz * voltage_mv * voltage_mv;
do_div(power, 10);

/* power is stored in mW */
freq_table[i].power = power;

That's used together with "sustainable-power", which is the attribute
that Matthias is trying to set.  That value is documented to be in mW
as well.

...so if the silver cores are always scaled to 100 regardless of how
much power they actually draw then it'll be impossible to actually
think about "sustainable-power" as a mW value.  Presumably we either
need to accept that fact (and ideally document it) or we need to
change the values for silver / gold cores (we could still keep the
relative values the same and just scale them).


That sounds reasonable (still keep the relative values and scale them)
I'll get back on what those scaled numbers would look like, and try to
get some sense of why this scaling to 100 was done (like you said
I don't see any documentation on this), but I see atleast a few other non-qcom
SoCs doing this too in mainline (like rockchip/rk3399)


On second thoughts, why wouldn't a relative 'sustainable-power' value work?
On every device, one would need to do the exercise that Matthias did to come
up with the OPP at which we can sustain max CPU/GPU loads anyway.
I mean even if we do change the DPC values to match actual power, Matthias would
still observe that we can sustain at the very same OPP and not any different.
Its just that the mW values that are passed to kernel are relative and not
absolute. My worry is that perhaps no SoC vendor wants to put these absolute 
numbers
out.

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH] arm64: dts: qcom: sc7180: Add 'sustainable_power' for CPU thermal zones

2020-09-02 Thread Rajendra Nayak



On 9/2/2020 9:02 PM, Doug Anderson wrote:

Hi,

On Tue, Sep 1, 2020 at 10:36 PM Rajendra Nayak  wrote:




* In terms of the numbers here, I believe that you're claiming that we
can dissipate 768 mW * 6 + 1202 mW * 2 = ~7 Watts of power.  My memory
of how much power we could dissipate in previous laptops I worked on
is a little fuzzy, but that doesn't seem insane for a passively-cooled
laptop.  However, I think someone could conceivably put this chip in a
smaller form factor.  In such a case, it seems like we'd want these
things to sum up to ~2000 (if it would ever make sense for someone to
put this chip in a phone) or ~4000 (if it would ever make sense for
someone to put this chip in a small tablet).  It seems possible that,
to achieve this, we might have to tweak the
"dynamic-power-coefficient".


DPC values are calculated (at a SoC) by actually measuring max power at various
frequency/voltage combinations by running things like dhrystone.
How would the max power a SoC can generate depend on form factors?
How much it can dissipate sure is, but then I am not super familiar how
thermal frameworks end up using DPC for calculating power dissipated,
I am guessing they don't.


I don't know how much thought was put
into those numbers, but the fact that the little cores have a super
round 100 for their dynamic-power-coefficient makes me feel like they
might have been more schwags than anything.  Rajendra maybe knows?


FWIK, the values are always scaled and normalized to 100 for silver and
then used to derive the relative DPC number for gold. If you see the DPC
for silver cores even on sdm845 is a 100.
Again these are not estimations but based on actual power measurements.


The scaling to 100 doesn't seem to match how the thermal framework is
using them.  Take a look at of_cpufreq_cooling_register().  It takes
the "dynamic-power-coefficient" and passes it as "capacitance" into
__cpufreq_cooling_register().  That's eventually used to compute
power, which is documented in the code to be in mW.

power = (u64)capacitance * freq_mhz * voltage_mv * voltage_mv;
do_div(power, 10);

/* power is stored in mW */
freq_table[i].power = power;

That's used together with "sustainable-power", which is the attribute
that Matthias is trying to set.  That value is documented to be in mW
as well.

...so if the silver cores are always scaled to 100 regardless of how
much power they actually draw then it'll be impossible to actually
think about "sustainable-power" as a mW value.  Presumably we either
need to accept that fact (and ideally document it) or we need to
change the values for silver / gold cores (we could still keep the
relative values the same and just scale them).


That sounds reasonable (still keep the relative values and scale them)
I'll get back on what those scaled numbers would look like, and try to
get some sense of why this scaling to 100 was done (like you said
I don't see any documentation on this), but I see atleast a few other non-qcom
SoCs doing this too in mainline (like rockchip/rk3399)

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH] arm64: dts: qcom: sc7180: Add 'sustainable_power' for CPU thermal zones

2020-09-02 Thread Doug Anderson
Hi,

On Tue, Sep 1, 2020 at 10:36 PM Rajendra Nayak  wrote:
>
>
> > * In terms of the numbers here, I believe that you're claiming that we
> > can dissipate 768 mW * 6 + 1202 mW * 2 = ~7 Watts of power.  My memory
> > of how much power we could dissipate in previous laptops I worked on
> > is a little fuzzy, but that doesn't seem insane for a passively-cooled
> > laptop.  However, I think someone could conceivably put this chip in a
> > smaller form factor.  In such a case, it seems like we'd want these
> > things to sum up to ~2000 (if it would ever make sense for someone to
> > put this chip in a phone) or ~4000 (if it would ever make sense for
> > someone to put this chip in a small tablet).  It seems possible that,
> > to achieve this, we might have to tweak the
> > "dynamic-power-coefficient".
>
> DPC values are calculated (at a SoC) by actually measuring max power at 
> various
> frequency/voltage combinations by running things like dhrystone.
> How would the max power a SoC can generate depend on form factors?
> How much it can dissipate sure is, but then I am not super familiar how
> thermal frameworks end up using DPC for calculating power dissipated,
> I am guessing they don't.
>
> > I don't know how much thought was put
> > into those numbers, but the fact that the little cores have a super
> > round 100 for their dynamic-power-coefficient makes me feel like they
> > might have been more schwags than anything.  Rajendra maybe knows?
>
> FWIK, the values are always scaled and normalized to 100 for silver and
> then used to derive the relative DPC number for gold. If you see the DPC
> for silver cores even on sdm845 is a 100.
> Again these are not estimations but based on actual power measurements.

The scaling to 100 doesn't seem to match how the thermal framework is
using them.  Take a look at of_cpufreq_cooling_register().  It takes
the "dynamic-power-coefficient" and passes it as "capacitance" into
__cpufreq_cooling_register().  That's eventually used to compute
power, which is documented in the code to be in mW.

power = (u64)capacitance * freq_mhz * voltage_mv * voltage_mv;
do_div(power, 10);

/* power is stored in mW */
freq_table[i].power = power;

That's used together with "sustainable-power", which is the attribute
that Matthias is trying to set.  That value is documented to be in mW
as well.

...so if the silver cores are always scaled to 100 regardless of how
much power they actually draw then it'll be impossible to actually
think about "sustainable-power" as a mW value.  Presumably we either
need to accept that fact (and ideally document it) or we need to
change the values for silver / gold cores (we could still keep the
relative values the same and just scale them).

-Doug


Re: [PATCH] arm64: dts: qcom: sc7180: Add 'sustainable_power' for CPU thermal zones

2020-09-01 Thread Rajendra Nayak




I'm not massively familiar with this area of the code, but I guess I
shouldn't let that stop me from having an opinion!  :-P

* I would agree that it seems highly unlikely that someone would put
one of these chips in a device that could only dissipate the heat from
the lowest OPP, so having some higher estimate definitely makes sense.

* In terms of the numbers here, I believe that you're claiming that we
can dissipate 768 mW * 6 + 1202 mW * 2 = ~7 Watts of power.


No, I'm claiming it's 768 mW + 1202 mW = ~2 W.

SC7180 has a 6 thermal zones for the 6 little cores and 4 zones for the
2 big cores. Each of these thermal zones uses either all little or all big
cores as cooling devices, hence the power sustainable power of the
individual zones doesn't add up. 768 mW corresponds to 6x 128 mW (aka all
little cores at 1.8 GHz), and 1202 mW to 2x 601 mW (both big cores at 1.9 GHz).


My memory
of how much power we could dissipate in previous laptops I worked on
is a little fuzzy, but that doesn't seem insane for a passively-cooled
laptop.  However, I think someone could conceivably put this chip in a
smaller form factor.  In such a case, it seems like we'd want these
things to sum up to ~2000 (if it would ever make sense for someone to
put this chip in a phone) or ~4000 (if it would ever make sense for
someone to put this chip in a small tablet).


See above, the sustainable power with this patch only adds up to ~2000.
It is possible though that it would be lower in a smaller form factor
device.

I'd be ok with posting something lower for SC7180 (it would be a guess
though) and use the specific numbers in the device specific DT.


It seems possible that,
to achieve this, we might have to tweak the
"dynamic-power-coefficient".  I don't know how much thought was put
into those numbers, but the fact that the little cores have a super
round 100 for their dynamic-power-coefficient makes me feel like they
might have been more schwags than anything.  Rajendra maybe knows?


Yeah, it's possible that that was just an approximation


No, these are based on actual power measurements.




* I'm curious about the fact that there are two numbers here: one for
littles and one for bigs.  If I had to guess I'd say that since all
the cores are in one package so the contributions kinda need to be
thought of together, right?  If we're sitting there thermally
throttled then we'd want to pick the best perf-per-watt for the
overall package.  This is why your patch says we can sustain the
little cores at max and the big cores get whatever is left over,
right?


It's derived from how Qualcomm specified the thermal zones and cooling
devices. Any ("cpu") zone is either cooled by (all) big cores or by (all)
little cores, but not a mix of them. In my tests I also saw that the big
cores seemed to have little impact on the little ones. The little cores
are at max because even running at max frequency the temperature in the
'little zones' wouldn't come close to the trip point.


* Should we be leaving some room in here for the GPU?  ...or I guess
once we list it as a cooling device we'll have to decrease the amount
the CPUs can use?


I don't know for sure, but judging from the CPU zones I wouldn't be
surprised if the GPU was managed exclusively in the dedicated GPU
thermal zones (I guess that's what 'gpuss0-thermal' and 'gpuss1-thermal'
are). If that's not the case the values in the CPU zones can be
adjusted when specific data is available.


So I guess the tl; dr is:

a) We should check "dynamic-power-coefficient" and possibly adjust.


ok, lets see if Rajendra can check if there is room for tweaking.


I suggest we don't :)

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH] arm64: dts: qcom: sc7180: Add 'sustainable_power' for CPU thermal zones

2020-09-01 Thread Rajendra Nayak




* In terms of the numbers here, I believe that you're claiming that we
can dissipate 768 mW * 6 + 1202 mW * 2 = ~7 Watts of power.  My memory
of how much power we could dissipate in previous laptops I worked on
is a little fuzzy, but that doesn't seem insane for a passively-cooled
laptop.  However, I think someone could conceivably put this chip in a
smaller form factor.  In such a case, it seems like we'd want these
things to sum up to ~2000 (if it would ever make sense for someone to
put this chip in a phone) or ~4000 (if it would ever make sense for
someone to put this chip in a small tablet).  It seems possible that,
to achieve this, we might have to tweak the
"dynamic-power-coefficient".


DPC values are calculated (at a SoC) by actually measuring max power at various
frequency/voltage combinations by running things like dhrystone.
How would the max power a SoC can generate depend on form factors?
How much it can dissipate sure is, but then I am not super familiar how
thermal frameworks end up using DPC for calculating power dissipated,
I am guessing they don't.
 

I don't know how much thought was put
into those numbers, but the fact that the little cores have a super
round 100 for their dynamic-power-coefficient makes me feel like they
might have been more schwags than anything.  Rajendra maybe knows?


FWIK, the values are always scaled and normalized to 100 for silver and
then used to derive the relative DPC number for gold. If you see the DPC
for silver cores even on sdm845 is a 100.
Again these are not estimations but based on actual power measurements.


--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH] arm64: dts: qcom: sc7180: Add 'sustainable_power' for CPU thermal zones

2020-09-01 Thread Matthias Kaehlcke
On Tue, Sep 01, 2020 at 03:45:52PM -0700, Doug Anderson wrote:
> Hi,
> 
> On Tue, Sep 1, 2020 at 2:33 PM Matthias Kaehlcke  wrote:
> >
> > Hi Doug,
> >
> > On Tue, Sep 01, 2020 at 01:19:10PM -0700, Doug Anderson wrote:
> > > Hi,
> > >
> > > On Tue, Sep 1, 2020 at 10:07 AM Matthias Kaehlcke  
> > > wrote:
> > > >
> > > > On Thu, Aug 13, 2020 at 11:30:33AM -0700, Matthias Kaehlcke wrote:
> > > > > The 'sustainable_power' attribute provides an estimate of the 
> > > > > sustained
> > > > > power that can be dissipated at the desired control temperature. One
> > > > > could argue that this value is not necessarily the same for all 
> > > > > devices
> > > > > with the same SoC, which may have different form factors or thermal
> > > > > designs. However there are reasons to specify a (default) value at SoC
> > > > > level for SC7180: most importantly, if no value is specified at all 
> > > > > the
> > > > > power_allocator thermal governor (aka 'IPA') estimates a value, using 
> > > > > the
> > > > > minimum power of all cooling devices of the zone, which can result in
> > > > > overly aggressive thermal throttling. For most devices an approximate
> > > > > conservative value should be more useful than the minimum guesstimate
> > > > > of power_allocator. Devices that need a different value can overwrite
> > > > > it in their .dts. Also the thermal zones for SC7180 have a 
> > > > > high
> > > > > level of granularity (essentially one for each function block), which
> > > > > makes it more likely that the default value just works for many 
> > > > > devices.
> > > > >
> > > > > The values correspond to 1901 MHz for the big cores, and 1804 MHz for
> > > > > the small cores. The values were determined by limiting the CPU
> > > > > frequencies to different max values and launching a bunch of processes
> > > > > that cause high CPU load ('while true; do true; done &' is simple and
> > > > > does a good job). A frequency is deemed sustainable if the CPU
> > > > > temperatures don't rise (consistently) above the second trip point
> > > > > ('control temperature', 95 degC in this case). Once the highest
> > > > > sustainable frequency is found, the sustainable power can be 
> > > > > calculated
> > > > > by multiplying the energy consumption per core at this frequency 
> > > > > (which
> > > > > can be found in /sys/kernel/debug/energy_model/) with the number of
> > > > > cores that are specified as cooling devices.
> > > > >
> > > > > The sustainable frequencies were determined at room temperature
> > > > > on a device without heat sink or other passive cooling elements.
> > >
> > > I'm curious: was this a bare board, or a device in a case?  Hrm, I'm
> > > not sure which one would be worse at heat dissipation, but I would
> > > imagine that being inside a plastic case might be worse?
> >
> > This was with a device in a plastic case.
> >
> > > > > Signed-off-by: Matthias Kaehlcke 
> > > > > ---
> > > > > If maintainers think 'sustainable_power' should be specified at
> > > > > device level (with which I conceptually agree) I'm fine with
> > > > > doing that, just seemed it could be useful to have a reasonable
> > > > > 'default' at SoC level in this case.
> > > >
> > > > Any comments on this?
> > >
> > > I'm not massively familiar with this area of the code, but I guess I
> > > shouldn't let that stop me from having an opinion!  :-P
> > >
> > > * I would agree that it seems highly unlikely that someone would put
> > > one of these chips in a device that could only dissipate the heat from
> > > the lowest OPP, so having some higher estimate definitely makes sense.
> > >
> > > * In terms of the numbers here, I believe that you're claiming that we
> > > can dissipate 768 mW * 6 + 1202 mW * 2 = ~7 Watts of power.
> >
> > No, I'm claiming it's 768 mW + 1202 mW = ~2 W.
> >
> > SC7180 has a 6 thermal zones for the 6 little cores and 4 zones for the
> > 2 big cores. Each of these thermal zones uses either all little or all big
> > cores as cooling devices, hence the power sustainable power of the
> > individual zones doesn't add up. 768 mW corresponds to 6x 128 mW (aka all
> > little cores at 1.8 GHz), and 1202 mW to 2x 601 mW (both big cores at 1.9 
> > GHz).
> 
> Ah!  Thanks for explaining.
> 
> 
> > > My memory
> > > of how much power we could dissipate in previous laptops I worked on
> > > is a little fuzzy, but that doesn't seem insane for a passively-cooled
> > > laptop.  However, I think someone could conceivably put this chip in a
> > > smaller form factor.  In such a case, it seems like we'd want these
> > > things to sum up to ~2000 (if it would ever make sense for someone to
> > > put this chip in a phone) or ~4000 (if it would ever make sense for
> > > someone to put this chip in a small tablet).
> >
> > See above, the sustainable power with this patch only adds up to ~2000.
> > It is possible though that it would be lower in a smaller form factor
> > device.
> >
> > I'd be ok with posting something lower for 

Re: [PATCH] arm64: dts: qcom: sc7180: Add 'sustainable_power' for CPU thermal zones

2020-09-01 Thread Doug Anderson
Hi,

On Tue, Sep 1, 2020 at 2:33 PM Matthias Kaehlcke  wrote:
>
> Hi Doug,
>
> On Tue, Sep 01, 2020 at 01:19:10PM -0700, Doug Anderson wrote:
> > Hi,
> >
> > On Tue, Sep 1, 2020 at 10:07 AM Matthias Kaehlcke  wrote:
> > >
> > > On Thu, Aug 13, 2020 at 11:30:33AM -0700, Matthias Kaehlcke wrote:
> > > > The 'sustainable_power' attribute provides an estimate of the sustained
> > > > power that can be dissipated at the desired control temperature. One
> > > > could argue that this value is not necessarily the same for all devices
> > > > with the same SoC, which may have different form factors or thermal
> > > > designs. However there are reasons to specify a (default) value at SoC
> > > > level for SC7180: most importantly, if no value is specified at all the
> > > > power_allocator thermal governor (aka 'IPA') estimates a value, using 
> > > > the
> > > > minimum power of all cooling devices of the zone, which can result in
> > > > overly aggressive thermal throttling. For most devices an approximate
> > > > conservative value should be more useful than the minimum guesstimate
> > > > of power_allocator. Devices that need a different value can overwrite
> > > > it in their .dts. Also the thermal zones for SC7180 have a high
> > > > level of granularity (essentially one for each function block), which
> > > > makes it more likely that the default value just works for many devices.
> > > >
> > > > The values correspond to 1901 MHz for the big cores, and 1804 MHz for
> > > > the small cores. The values were determined by limiting the CPU
> > > > frequencies to different max values and launching a bunch of processes
> > > > that cause high CPU load ('while true; do true; done &' is simple and
> > > > does a good job). A frequency is deemed sustainable if the CPU
> > > > temperatures don't rise (consistently) above the second trip point
> > > > ('control temperature', 95 degC in this case). Once the highest
> > > > sustainable frequency is found, the sustainable power can be calculated
> > > > by multiplying the energy consumption per core at this frequency (which
> > > > can be found in /sys/kernel/debug/energy_model/) with the number of
> > > > cores that are specified as cooling devices.
> > > >
> > > > The sustainable frequencies were determined at room temperature
> > > > on a device without heat sink or other passive cooling elements.
> >
> > I'm curious: was this a bare board, or a device in a case?  Hrm, I'm
> > not sure which one would be worse at heat dissipation, but I would
> > imagine that being inside a plastic case might be worse?
>
> This was with a device in a plastic case.
>
> > > > Signed-off-by: Matthias Kaehlcke 
> > > > ---
> > > > If maintainers think 'sustainable_power' should be specified at
> > > > device level (with which I conceptually agree) I'm fine with
> > > > doing that, just seemed it could be useful to have a reasonable
> > > > 'default' at SoC level in this case.
> > >
> > > Any comments on this?
> >
> > I'm not massively familiar with this area of the code, but I guess I
> > shouldn't let that stop me from having an opinion!  :-P
> >
> > * I would agree that it seems highly unlikely that someone would put
> > one of these chips in a device that could only dissipate the heat from
> > the lowest OPP, so having some higher estimate definitely makes sense.
> >
> > * In terms of the numbers here, I believe that you're claiming that we
> > can dissipate 768 mW * 6 + 1202 mW * 2 = ~7 Watts of power.
>
> No, I'm claiming it's 768 mW + 1202 mW = ~2 W.
>
> SC7180 has a 6 thermal zones for the 6 little cores and 4 zones for the
> 2 big cores. Each of these thermal zones uses either all little or all big
> cores as cooling devices, hence the power sustainable power of the
> individual zones doesn't add up. 768 mW corresponds to 6x 128 mW (aka all
> little cores at 1.8 GHz), and 1202 mW to 2x 601 mW (both big cores at 1.9 
> GHz).

Ah!  Thanks for explaining.


> > My memory
> > of how much power we could dissipate in previous laptops I worked on
> > is a little fuzzy, but that doesn't seem insane for a passively-cooled
> > laptop.  However, I think someone could conceivably put this chip in a
> > smaller form factor.  In such a case, it seems like we'd want these
> > things to sum up to ~2000 (if it would ever make sense for someone to
> > put this chip in a phone) or ~4000 (if it would ever make sense for
> > someone to put this chip in a small tablet).
>
> See above, the sustainable power with this patch only adds up to ~2000.
> It is possible though that it would be lower in a smaller form factor
> device.
>
> I'd be ok with posting something lower for SC7180 (it would be a guess
> though) and use the specific numbers in the device specific DT.

Given the advice in the bindings it seems like 2W should be fine.


> > It seems possible that,
> > to achieve this, we might have to tweak the
> > "dynamic-power-coefficient".  I don't know how much thought was put
> > into those numbers, but 

Re: [PATCH] arm64: dts: qcom: sc7180: Add 'sustainable_power' for CPU thermal zones

2020-09-01 Thread Matthias Kaehlcke
Hi Doug,

On Tue, Sep 01, 2020 at 01:19:10PM -0700, Doug Anderson wrote:
> Hi,
> 
> On Tue, Sep 1, 2020 at 10:07 AM Matthias Kaehlcke  wrote:
> >
> > On Thu, Aug 13, 2020 at 11:30:33AM -0700, Matthias Kaehlcke wrote:
> > > The 'sustainable_power' attribute provides an estimate of the sustained
> > > power that can be dissipated at the desired control temperature. One
> > > could argue that this value is not necessarily the same for all devices
> > > with the same SoC, which may have different form factors or thermal
> > > designs. However there are reasons to specify a (default) value at SoC
> > > level for SC7180: most importantly, if no value is specified at all the
> > > power_allocator thermal governor (aka 'IPA') estimates a value, using the
> > > minimum power of all cooling devices of the zone, which can result in
> > > overly aggressive thermal throttling. For most devices an approximate
> > > conservative value should be more useful than the minimum guesstimate
> > > of power_allocator. Devices that need a different value can overwrite
> > > it in their .dts. Also the thermal zones for SC7180 have a high
> > > level of granularity (essentially one for each function block), which
> > > makes it more likely that the default value just works for many devices.
> > >
> > > The values correspond to 1901 MHz for the big cores, and 1804 MHz for
> > > the small cores. The values were determined by limiting the CPU
> > > frequencies to different max values and launching a bunch of processes
> > > that cause high CPU load ('while true; do true; done &' is simple and
> > > does a good job). A frequency is deemed sustainable if the CPU
> > > temperatures don't rise (consistently) above the second trip point
> > > ('control temperature', 95 degC in this case). Once the highest
> > > sustainable frequency is found, the sustainable power can be calculated
> > > by multiplying the energy consumption per core at this frequency (which
> > > can be found in /sys/kernel/debug/energy_model/) with the number of
> > > cores that are specified as cooling devices.
> > >
> > > The sustainable frequencies were determined at room temperature
> > > on a device without heat sink or other passive cooling elements.
> 
> I'm curious: was this a bare board, or a device in a case?  Hrm, I'm
> not sure which one would be worse at heat dissipation, but I would
> imagine that being inside a plastic case might be worse?

This was with a device in a plastic case.

> > > Signed-off-by: Matthias Kaehlcke 
> > > ---
> > > If maintainers think 'sustainable_power' should be specified at
> > > device level (with which I conceptually agree) I'm fine with
> > > doing that, just seemed it could be useful to have a reasonable
> > > 'default' at SoC level in this case.
> >
> > Any comments on this?
> 
> I'm not massively familiar with this area of the code, but I guess I
> shouldn't let that stop me from having an opinion!  :-P
> 
> * I would agree that it seems highly unlikely that someone would put
> one of these chips in a device that could only dissipate the heat from
> the lowest OPP, so having some higher estimate definitely makes sense.
> 
> * In terms of the numbers here, I believe that you're claiming that we
> can dissipate 768 mW * 6 + 1202 mW * 2 = ~7 Watts of power.

No, I'm claiming it's 768 mW + 1202 mW = ~2 W.

SC7180 has a 6 thermal zones for the 6 little cores and 4 zones for the
2 big cores. Each of these thermal zones uses either all little or all big
cores as cooling devices, hence the power sustainable power of the
individual zones doesn't add up. 768 mW corresponds to 6x 128 mW (aka all
little cores at 1.8 GHz), and 1202 mW to 2x 601 mW (both big cores at 1.9 GHz).

> My memory
> of how much power we could dissipate in previous laptops I worked on
> is a little fuzzy, but that doesn't seem insane for a passively-cooled
> laptop.  However, I think someone could conceivably put this chip in a
> smaller form factor.  In such a case, it seems like we'd want these
> things to sum up to ~2000 (if it would ever make sense for someone to
> put this chip in a phone) or ~4000 (if it would ever make sense for
> someone to put this chip in a small tablet).

See above, the sustainable power with this patch only adds up to ~2000.
It is possible though that it would be lower in a smaller form factor
device.

I'd be ok with posting something lower for SC7180 (it would be a guess
though) and use the specific numbers in the device specific DT.

> It seems possible that,
> to achieve this, we might have to tweak the
> "dynamic-power-coefficient".  I don't know how much thought was put
> into those numbers, but the fact that the little cores have a super
> round 100 for their dynamic-power-coefficient makes me feel like they
> might have been more schwags than anything.  Rajendra maybe knows?

Yeah, it's possible that that was just an approximation

> * I'm curious about the fact that there are two numbers here: one for
> littles and one for 

Re: [PATCH] arm64: dts: qcom: sc7180: Add 'sustainable_power' for CPU thermal zones

2020-09-01 Thread Doug Anderson
Hi,

On Tue, Sep 1, 2020 at 10:07 AM Matthias Kaehlcke  wrote:
>
> On Thu, Aug 13, 2020 at 11:30:33AM -0700, Matthias Kaehlcke wrote:
> > The 'sustainable_power' attribute provides an estimate of the sustained
> > power that can be dissipated at the desired control temperature. One
> > could argue that this value is not necessarily the same for all devices
> > with the same SoC, which may have different form factors or thermal
> > designs. However there are reasons to specify a (default) value at SoC
> > level for SC7180: most importantly, if no value is specified at all the
> > power_allocator thermal governor (aka 'IPA') estimates a value, using the
> > minimum power of all cooling devices of the zone, which can result in
> > overly aggressive thermal throttling. For most devices an approximate
> > conservative value should be more useful than the minimum guesstimate
> > of power_allocator. Devices that need a different value can overwrite
> > it in their .dts. Also the thermal zones for SC7180 have a high
> > level of granularity (essentially one for each function block), which
> > makes it more likely that the default value just works for many devices.
> >
> > The values correspond to 1901 MHz for the big cores, and 1804 MHz for
> > the small cores. The values were determined by limiting the CPU
> > frequencies to different max values and launching a bunch of processes
> > that cause high CPU load ('while true; do true; done &' is simple and
> > does a good job). A frequency is deemed sustainable if the CPU
> > temperatures don't rise (consistently) above the second trip point
> > ('control temperature', 95 degC in this case). Once the highest
> > sustainable frequency is found, the sustainable power can be calculated
> > by multiplying the energy consumption per core at this frequency (which
> > can be found in /sys/kernel/debug/energy_model/) with the number of
> > cores that are specified as cooling devices.
> >
> > The sustainable frequencies were determined at room temperature
> > on a device without heat sink or other passive cooling elements.

I'm curious: was this a bare board, or a device in a case?  Hrm, I'm
not sure which one would be worse at heat dissipation, but I would
imagine that being inside a plastic case might be worse?


> > Signed-off-by: Matthias Kaehlcke 
> > ---
> > If maintainers think 'sustainable_power' should be specified at
> > device level (with which I conceptually agree) I'm fine with
> > doing that, just seemed it could be useful to have a reasonable
> > 'default' at SoC level in this case.
>
> Any comments on this?

I'm not massively familiar with this area of the code, but I guess I
shouldn't let that stop me from having an opinion!  :-P

* I would agree that it seems highly unlikely that someone would put
one of these chips in a device that could only dissipate the heat from
the lowest OPP, so having some higher estimate definitely makes sense.

* In terms of the numbers here, I believe that you're claiming that we
can dissipate 768 mW * 6 + 1202 mW * 2 = ~7 Watts of power.  My memory
of how much power we could dissipate in previous laptops I worked on
is a little fuzzy, but that doesn't seem insane for a passively-cooled
laptop.  However, I think someone could conceivably put this chip in a
smaller form factor.  In such a case, it seems like we'd want these
things to sum up to ~2000 (if it would ever make sense for someone to
put this chip in a phone) or ~4000 (if it would ever make sense for
someone to put this chip in a small tablet).  It seems possible that,
to achieve this, we might have to tweak the
"dynamic-power-coefficient".  I don't know how much thought was put
into those numbers, but the fact that the little cores have a super
round 100 for their dynamic-power-coefficient makes me feel like they
might have been more schwags than anything.  Rajendra maybe knows?

* I'm curious about the fact that there are two numbers here: one for
littles and one for bigs.  If I had to guess I'd say that since all
the cores are in one package so the contributions kinda need to be
thought of together, right?  If we're sitting there thermally
throttled then we'd want to pick the best perf-per-watt for the
overall package.  This is why your patch says we can sustain the
little cores at max and the big cores get whatever is left over,
right?

* Should we be leaving some room in here for the GPU?  ...or I guess
once we list it as a cooling device we'll have to decrease the amount
the CPUs can use?


So I guess the tl; dr is:

a) We should check "dynamic-power-coefficient" and possibly adjust.

b) I don't think the "conservative" by-default numbers should add up
to 7 Watts.  I could be convinced that this chip is not intended for
phones and thus we could have it add up to 4 Watts, but 7 Watts seems
too much.


-Doug


Re: [PATCH] arm64: dts: qcom: sc7180: Add 'sustainable_power' for CPU thermal zones

2020-09-01 Thread Matthias Kaehlcke
On Thu, Aug 13, 2020 at 11:30:33AM -0700, Matthias Kaehlcke wrote:
> The 'sustainable_power' attribute provides an estimate of the sustained
> power that can be dissipated at the desired control temperature. One
> could argue that this value is not necessarily the same for all devices
> with the same SoC, which may have different form factors or thermal
> designs. However there are reasons to specify a (default) value at SoC
> level for SC7180: most importantly, if no value is specified at all the
> power_allocator thermal governor (aka 'IPA') estimates a value, using the
> minimum power of all cooling devices of the zone, which can result in
> overly aggressive thermal throttling. For most devices an approximate
> conservative value should be more useful than the minimum guesstimate
> of power_allocator. Devices that need a different value can overwrite
> it in their .dts. Also the thermal zones for SC7180 have a high
> level of granularity (essentially one for each function block), which
> makes it more likely that the default value just works for many devices.
> 
> The values correspond to 1901 MHz for the big cores, and 1804 MHz for
> the small cores. The values were determined by limiting the CPU
> frequencies to different max values and launching a bunch of processes
> that cause high CPU load ('while true; do true; done &' is simple and
> does a good job). A frequency is deemed sustainable if the CPU
> temperatures don't rise (consistently) above the second trip point
> ('control temperature', 95 degC in this case). Once the highest
> sustainable frequency is found, the sustainable power can be calculated
> by multiplying the energy consumption per core at this frequency (which
> can be found in /sys/kernel/debug/energy_model/) with the number of
> cores that are specified as cooling devices.
> 
> The sustainable frequencies were determined at room temperature
> on a device without heat sink or other passive cooling elements.
> 
> Signed-off-by: Matthias Kaehlcke 
> ---
> If maintainers think 'sustainable_power' should be specified at
> device level (with which I conceptually agree) I'm fine with
> doing that, just seemed it could be useful to have a reasonable
> 'default' at SoC level in this case.

Any comments on this?


[PATCH] arm64: dts: qcom: sc7180: Add 'sustainable_power' for CPU thermal zones

2020-08-13 Thread Matthias Kaehlcke
The 'sustainable_power' attribute provides an estimate of the sustained
power that can be dissipated at the desired control temperature. One
could argue that this value is not necessarily the same for all devices
with the same SoC, which may have different form factors or thermal
designs. However there are reasons to specify a (default) value at SoC
level for SC7180: most importantly, if no value is specified at all the
power_allocator thermal governor (aka 'IPA') estimates a value, using the
minimum power of all cooling devices of the zone, which can result in
overly aggressive thermal throttling. For most devices an approximate
conservative value should be more useful than the minimum guesstimate
of power_allocator. Devices that need a different value can overwrite
it in their .dts. Also the thermal zones for SC7180 have a high
level of granularity (essentially one for each function block), which
makes it more likely that the default value just works for many devices.

The values correspond to 1901 MHz for the big cores, and 1804 MHz for
the small cores. The values were determined by limiting the CPU
frequencies to different max values and launching a bunch of processes
that cause high CPU load ('while true; do true; done &' is simple and
does a good job). A frequency is deemed sustainable if the CPU
temperatures don't rise (consistently) above the second trip point
('control temperature', 95 degC in this case). Once the highest
sustainable frequency is found, the sustainable power can be calculated
by multiplying the energy consumption per core at this frequency (which
can be found in /sys/kernel/debug/energy_model/) with the number of
cores that are specified as cooling devices.

The sustainable frequencies were determined at room temperature
on a device without heat sink or other passive cooling elements.

Signed-off-by: Matthias Kaehlcke 
---
If maintainers think 'sustainable_power' should be specified at
device level (with which I conceptually agree) I'm fine with
doing that, just seemed it could be useful to have a reasonable
'default' at SoC level in this case.

 arch/arm64/boot/dts/qcom/sc7180.dtsi | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/sc7180.dtsi 
b/arch/arm64/boot/dts/qcom/sc7180.dtsi
index d46b3833e52f..23f84639d6b9 100644
--- a/arch/arm64/boot/dts/qcom/sc7180.dtsi
+++ b/arch/arm64/boot/dts/qcom/sc7180.dtsi
@@ -3320,6 +3320,7 @@ cpu0-thermal {
polling-delay = <0>;
 
thermal-sensors = < 1>;
+   sustainable-power = <768>;
 
trips {
cpu0_alert0: trip-point0 {
@@ -3368,6 +3369,7 @@ cpu1-thermal {
polling-delay = <0>;
 
thermal-sensors = < 2>;
+   sustainable-power = <768>;
 
trips {
cpu1_alert0: trip-point0 {
@@ -3416,6 +3418,7 @@ cpu2-thermal {
polling-delay = <0>;
 
thermal-sensors = < 3>;
+   sustainable-power = <768>;
 
trips {
cpu2_alert0: trip-point0 {
@@ -3464,6 +3467,7 @@ cpu3-thermal {
polling-delay = <0>;
 
thermal-sensors = < 4>;
+   sustainable-power = <768>;
 
trips {
cpu3_alert0: trip-point0 {
@@ -3512,6 +3516,7 @@ cpu4-thermal {
polling-delay = <0>;
 
thermal-sensors = < 5>;
+   sustainable-power = <768>;
 
trips {
cpu4_alert0: trip-point0 {
@@ -3560,6 +3565,7 @@ cpu5-thermal {
polling-delay = <0>;
 
thermal-sensors = < 6>;
+   sustainable-power = <768>;
 
trips {
cpu5_alert0: trip-point0 {
@@ -3608,6 +3614,7 @@ cpu6-thermal {
polling-delay = <0>;
 
thermal-sensors = < 9>;
+   sustainable-power = <1202>;
 
trips {
cpu6_alert0: trip-point0 {
@@ -3648,6 +3655,7 @@ cpu7-thermal {
polling-delay = <0>;
 
thermal-sensors = < 10>;
+   sustainable-power = <1202>;
 
trips {
cpu7_alert0: trip-point0 {
@@ -3688,6 +3696,7 @@ cpu8-thermal {
polling-delay = <0>;
 
thermal-sensors = < 11>;
+   sustainable-power = <1202>;
 
trips {
cpu8_alert0: trip-point0 {
@@ -3728,6 +3737,7 @@ cpu9-thermal {
polling-delay = <0>;