from:"Zhang Rui"

[tip: perf/core] perf/x86/rapl: Add support for Intel Alder Lake

2021-04-20 Thread tip-bot2 for Zhang Rui

The following commit has been merged into the perf/core branch of tip:

Commit-ID: 6a5f4386798d81f7f413e93c87e2b6de7439beea
Gitweb:
https://git.kernel.org/tip/6a5f4386798d81f7f413e93c87e2b6de7439beea
Author:Zhang Rui 
AuthorDate:Mon, 12 Apr 2021 07:31:05 -07:00
Committer: Peter Zijlstra 
CommitterDate: Mon, 19 Apr 2021 20:03:30 +02:00

perf/x86/rapl: Add support for Intel Alder Lake

Alder Lake RAPL support is the same as previous Sky Lake.
Add Alder Lake model for RAPL.

Signed-off-by: Zhang Rui 
Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Andi Kleen 
Link: 
https://lkml.kernel.org/r/1618237865-33448-26-git-send-email-kan.li...@linux.intel.com
---
 arch/x86/events/rapl.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index f42a704..84a1042 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -800,6 +800,8 @@ static const struct x86_cpu_id rapl_model_match[] __initconst = {
    X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_X,   &model_hsx),
    X86_MATCH_INTEL_FAM6_MODEL(COMETLAKE_L, &model_skl),
    X86_MATCH_INTEL_FAM6_MODEL(COMETLAKE,   &model_skl),
+   X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE,   &model_skl),
+   X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE_L, &model_skl),
    X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X,    &model_spr),
    X86_MATCH_VENDOR_FAM(AMD,   0x17,   &model_amd_fam17h),
    X86_MATCH_VENDOR_FAM(HYGON, 0x18,   &model_amd_fam17h),

Re: [PATCH] tools/power/x86/turbostat: Fix TCC offset bit mask

2021-04-11 Thread Zhang Rui

On Sat, 2021-03-13 at 07:16 -0800, Doug Smythies wrote:
> On Fri, Mar 12, 2021 at 2:16 PM Len Brown  wrote:
> > 
> > Doug,
> > The offset works for control.
> > 
> > However, it is erroneous to use it for reporting of the actual
> > temperature, like I did in turbostat.
> 
> Agreed.
> I have been running with a correction for that for a while,
> and as discussed on Rui's thread.
> But this bit mask correction patch is still needed isn't it?
> for this:
> cpu4: MSR_IA32_TEMPERATURE_TARGET: 0x1a64100d (90 C) (100 default -
> 10 offset)
> which should be this:
> cpu4: MSR_IA32_TEMPERATURE_TARGET: 0x1a64100d (74 C) (100 default -
> 26 offset)
> 
> But yes, I do now see the field size is only 4 bits for some parts.

As this is CPU specific, and we don't know which is which for all the
CPUs, it seems that we can have a white list for the ones that we
care about and that have been verified.

For the others, by default, we only show the raw value and default TCC
activation temperature, like

cpu4: MSR_IA32_TEMPERATURE_TARGET: 0x1a64100d (100 default )

And this white list can be updated together with the one in the kernel
tcc_offset_cooling driver.

What do you think?

thanks,
rui



> 
> ... Doug
> 
> > Thus, I'm going to revert the patch that added its use in
> > turbostat
> > for the Temperature column.
> > 
> > thanks,
> > -Len
> > 
> > On Fri, Mar 12, 2021 at 1:26 AM Doug Smythies 
> > wrote:
> > > 
> > > Hi Len,
> > > 
> > > 
> > > thank you for your reply.
> > > 
> > > On Thu, Mar 11, 2021 at 3:19 PM Len Brown 
> > > wrote:
> > > > 
> > > > Thanks for the close read, Doug.
> > > > 
> > > > This field size actually varies from system to system,
> > > > but the reality is that the offset is never that big, and so
> > > > the
> > > > smaller mask is sufficient.
> > > 
> > > Disagree.
> > > 
> > > I want to use an offset of 26.
> > > 
> > > > Finally, this may all be moot, because there is discussion that
> > > > using
> > > > the offset this way is simply erroneous.
> > > 
> > > Disagree.
> > > It works great.
> > > As far as I know/recall I was the only person that responded to
> > > Rui's thread
> > > "thermal/intel: introduce tcc cooling driver" [1]
> > > And, I spent quite a bit of time doing so.
> > > However, I agree the response seems different between the two
> > > systems
> > > under test, Rui's and mine.
> > > 
> > > > [1] https://marc.info/?l=linux-pm&m=161070345329806&w=2
> > > 
> > > >  stay tuned.
> > > 
> > > O.K.
> > > 
> > > ... Doug
> > > > 
> > > > -Len
> > > > 
> > > > 
> > > > On Sat, Jan 16, 2021 at 12:07 PM Doug Smythies <
> > > > doug.smyth...@gmail.com> wrote:
> > > > > 
> > > > > The TCC offset mask is incorrect, resulting in
> > > > > incorrect target temperature calculations, if
> > > > > the offset is big enough to exceed the mask size.
> > > > > 
> > > > > Signed-off-by: Doug Smythies 
> > > > > ---
> > > > >  tools/power/x86/turbostat/turbostat.c | 2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/tools/power/x86/turbostat/turbostat.c
> > > > > b/tools/power/x86/turbostat/turbostat.c
> > > > > index 389ea5209a83..d7acdd4d16c4 100644
> > > > > --- a/tools/power/x86/turbostat/turbostat.c
> > > > > +++ b/tools/power/x86/turbostat/turbostat.c
> > > > > @@ -4823,7 +4823,7 @@ int read_tcc_activation_temp()
> > > > > 
> > > > > target_c = (msr >> 16) & 0xFF;
> > > > > 
> > > > > -   offset_c = (msr >> 24) & 0xF;
> > > > > +   offset_c = (msr >> 24) & 0x3F;
> > > > > 
> > > > > tcc = target_c - offset_c;
> > > > > 
> > > > > --
> > > > > 2.25.1
> > > > > 
> > > > 
> > > > 
> > > > --
> > > > Len Brown, Intel Open Source Technology Center
> > 
> > 
> > 
> > --
> > Len Brown, Intel Open Source Technology Center
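
The arithmetic discussed in this thread can be checked with a small standalone sketch. It decodes a raw MSR_IA32_TEMPERATURE_TARGET value using a caller-supplied offset mask, so the 4-bit and 6-bit cases from the patch can be compared side by side. The helper name tcc_activation_temp() and its offset_mask parameter are hypothetical and are not part of turbostat.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not turbostat code): compute the TCC activation
 * temperature from a raw MSR_IA32_TEMPERATURE_TARGET value.
 * The target temperature is taken from bits 23:16; the offset starts at
 * bit 24 and its width is CPU specific, so the mask is passed in.
 */
static int tcc_activation_temp(uint64_t msr, unsigned int offset_mask)
{
	int target_c = (int)((msr >> 16) & 0xFF);
	int offset_c = (int)((msr >> 24) & offset_mask);

	return target_c - offset_c;
}
```

With the value quoted in the thread, tcc_activation_temp(0x1a64100d, 0xF) yields 90 (100 - 10), while tcc_activation_temp(0x1a64100d, 0x3F) yields 74 (100 - 26), matching the two outputs Doug compares.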

Re: [PATCH v2 04/15] ACPI: table: replace __attribute__((packed)) by __packed

2021-03-31 Thread Zhang Rui

On Tue, 2021-03-30 at 08:14 +0000, David Laight wrote:
> From: Zhang Rui
> > Sent: 30 March 2021 09:00
> > To: Xiaofei Tan ; David Laight <
> > david.lai...@aculab.com>; r...@rjwysocki.net;
> > l...@kernel.org; bhelg...@google.com
> > Cc: linux-a...@vger.kernel.org; linux-kernel@vger.kernel.org; 
> > linux-...@vger.kernel.org;
> > linux...@openeuler.org
> > Subject: Re: [PATCH v2 04/15] ACPI: table: replace
> > __attribute__((packed)) by __packed
> > 
> > On Tue, 2021-03-30 at 15:31 +0800, Zhang Rui wrote:
> > > On Tue, 2021-03-30 at 10:23 +0800, Xiaofei Tan wrote:
> > > > Hi David,
> > > > 
> > > > On 2021/3/29 18:09, David Laight wrote:
> > > > > From: Xiaofei Tan
> > > > > > Sent: 27 March 2021 07:46
> > > > > > 
> > > > > > Replace __attribute__((packed)) by __packed following the
> > > > > > advice of checkpatch.pl.
> > > > > > 
> > > > > > Signed-off-by: Xiaofei Tan 
> > > > > > ---
> > > > > >  drivers/acpi/acpi_fpdt.c | 6 +++---
> > > > > >  1 file changed, 3 insertions(+), 3 deletions(-)
> > > > > > 
> > > > > > diff --git a/drivers/acpi/acpi_fpdt.c
> > > > > > b/drivers/acpi/acpi_fpdt.c
> > > > > > index a89a806..690a88a 100644
> > > > > > --- a/drivers/acpi/acpi_fpdt.c
> > > > > > +++ b/drivers/acpi/acpi_fpdt.c
> > > > > > @@ -53,7 +53,7 @@ struct resume_performance_record {
> > > > > > u32 resume_count;
> > > > > > u64 resume_prev;
> > > > > > u64 resume_avg;
> > > > > > -} __attribute__((packed));
> > > > > > +} __packed;
> > > > > > 
> > > > > >  struct boot_performance_record {
> > > > > > struct fpdt_record_header header;
> > > > > > @@ -63,13 +63,13 @@ struct boot_performance_record {
> > > > > > u64 bootloader_launch;
> > > > > > u64 exitbootservice_start;
> > > > > > u64 exitbootservice_end;
> > > > > > -} __attribute__((packed));
> > > > > > +} __packed;
> > > > > > 
> > > > > >  struct suspend_performance_record {
> > > > > > struct fpdt_record_header header;
> > > > > > u64 suspend_start;
> > > > > > u64 suspend_end;
> > > > > > -} __attribute__((packed));
> > > > > > +} __packed;
> > > > > 
> > > > > My standard question about 'packed' is whether it is actually
> > > > > needed.
> > > > > It should only be used if the structures might be misaligned
> > > > > in
> > > > > memory.
> > > > > If the only problem is that a 64bit item needs to be 32bit
> > > > > aligned
> > > > > then a suitable type should be used for those specific
> > > > > fields.
> > > > > 
> > > > > Those all look very dubious - the standard header isn't
> > > > > packed
> > > > > > so everything must be assumed to be at least 32bit aligned.
> > > > > 
> > > > > There are also other sub-structures that contain 64bit
> > > > > values.
> > > > > These don't contain padding - but that requires 64bit
> > > > > > alignment.
> > > > > 
> > > > > The only problematic structure is the last one - which would
> > > > > have
> > > > > a 32bit pad after the header.
> > > > > > Is this even right given that there are explicit alignment
> > > > > pads
> > > > > in some of the other structures.
> > > > > 
> > > > > If 64bit alignment isn't guaranteed then a '64bit aligned to
> > > > > 32bit'
> > > > > type should be used for the u64 fields.
> > > > > 
> > > > 
> > > > Yes, some of them has been aligned already, then nothing
> > > > changed
> > > > when
> > > > add this "packed ". Maybe the purpose of the original author is
> > > > for
> > > > extension, and can tell others that this struct need be packed.
> > > > 
> > > 
> > > The patch is upstreamed recently but it was made long time ago.
> > > I think the origi

Re: [PATCH v2 04/15] ACPI: table: replace __attribute__((packed)) by __packed

2021-03-30 Thread Zhang Rui

On Tue, 2021-03-30 at 15:31 +0800, Zhang Rui wrote:
> On Tue, 2021-03-30 at 10:23 +0800, Xiaofei Tan wrote:
> > Hi David,
> > 
> > On 2021/3/29 18:09, David Laight wrote:
> > > From: Xiaofei Tan
> > > > Sent: 27 March 2021 07:46
> > > > 
> > > > Replace __attribute__((packed)) by __packed following the
> > > > advice of checkpatch.pl.
> > > > 
> > > > Signed-off-by: Xiaofei Tan 
> > > > ---
> > > >  drivers/acpi/acpi_fpdt.c | 6 +++---
> > > >  1 file changed, 3 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/drivers/acpi/acpi_fpdt.c
> > > > b/drivers/acpi/acpi_fpdt.c
> > > > index a89a806..690a88a 100644
> > > > --- a/drivers/acpi/acpi_fpdt.c
> > > > +++ b/drivers/acpi/acpi_fpdt.c
> > > > @@ -53,7 +53,7 @@ struct resume_performance_record {
> > > > u32 resume_count;
> > > > u64 resume_prev;
> > > > u64 resume_avg;
> > > > -} __attribute__((packed));
> > > > +} __packed;
> > > > 
> > > >  struct boot_performance_record {
> > > > struct fpdt_record_header header;
> > > > @@ -63,13 +63,13 @@ struct boot_performance_record {
> > > > u64 bootloader_launch;
> > > > u64 exitbootservice_start;
> > > > u64 exitbootservice_end;
> > > > -} __attribute__((packed));
> > > > +} __packed;
> > > > 
> > > >  struct suspend_performance_record {
> > > > struct fpdt_record_header header;
> > > > u64 suspend_start;
> > > > u64 suspend_end;
> > > > -} __attribute__((packed));
> > > > +} __packed;
> > > 
> > > My standard question about 'packed' is whether it is actually
> > > needed.
> > > It should only be used if the structures might be misaligned in
> > > memory.
> > > If the only problem is that a 64bit item needs to be 32bit
> > > aligned
> > > then a suitable type should be used for those specific fields.
> > > 
> > > Those all look very dubious - the standard header isn't packed
> > > > so everything must be assumed to be at least 32bit aligned.
> > > 
> > > There are also other sub-structures that contain 64bit values.
> > > > These don't contain padding - but that requires 64bit alignment.
> > > 
> > > The only problematic structure is the last one - which would have
> > > a 32bit pad after the header.
> > > > Is this even right given that there are explicit alignment pads
> > > in some of the other structures.
> > > 
> > > If 64bit alignment isn't guaranteed then a '64bit aligned to
> > > 32bit'
> > > type should be used for the u64 fields.
> > > 
> > 
> > Yes, some of them has been aligned already, then nothing changed
> > when 
> > add this "packed ". Maybe the purpose of the original author is
> > for 
> > extension, and can tell others that this struct need be packed.
> > 
> 
> The patch is upstreamed recently but it was made long time ago.
> I think the original problem is that one of the address, probably the
> suspend_performance record, is not 64bit aligned, thus we can not
> read
> the proper content of suspend_start and suspend_end, mapped from
> physical memory.
> 
> I will try to find a machine to reproduce the problem with all
> __attribute__((packed)) removed to double confirm this.
> 

So here is the problem, without __attribute__((packed))

[0.858442] suspend_record: 0xaad500175020
/sys/firmware/acpi/fpdt/suspend/suspend_end_ns:addr:
0xaad500175030, 15998179292659843072
/sys/firmware/acpi/fpdt/suspend/suspend_start_ns:addr:
0xaad500175028, 0

suspend_record is mapped to 0xaad500175020, and it consists of
one 32bit header followed by two 64bit fields (suspend_start and
suspend_end); this is how it is laid out in physical memory.
So in memory the two 64bit fields start at offsets 4 and 12, and their
addresses are actually not 64bit aligned.

David,
Is this the "a 64bit item needs to be 32bit aligned" problem you
referred to?
If yes, what is the proper fix? Should I use two 32bit fields for each
of the fields instead?

thanks,
rui

> thanks,
> rui
> > >   David
> > > 
> > > -
> > > Registered Address Lakeside, Bramley Road, Mount Farm, Milton
> > > Keynes, MK1 1PT, UK
> > > Registration No: 1397386 (Wales)
> > > 
> > > 
> > > .
> > > 
> > 
> > 
> 
>
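
The layout question raised in this message can be explored with a small sketch. It assumes a 4-byte record header (the mail describes "one 32bit header") and compares three declarations: the natural layout, the packed layout, and a layout using a 64-bit type that only requires 32-bit alignment, which is one reading of David's "64bit aligned to 32bit" suggestion. All struct and type names below are hypothetical stand-ins, and the reduced-alignment typedef relies on a GCC/Clang extension.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed 4-byte record header; a stand-in for fpdt_record_header. */
struct hdr {
	uint16_t type;
	uint8_t  length;
	uint8_t  revision;
};

/* Natural layout: the compiler may insert padding after the header. */
struct suspend_natural {
	struct hdr header;
	uint64_t suspend_start;
	uint64_t suspend_end;
};

/* Packed layout: the fields directly follow the header. */
struct suspend_packed {
	struct hdr header;
	uint64_t suspend_start;
	uint64_t suspend_end;
} __attribute__((packed));

/* 64-bit value that only needs 32-bit alignment (GCC/Clang extension). */
typedef uint64_t __attribute__((aligned(4))) u64_a4;

/* Same field offsets as the packed layout, but the struct stays 4-byte aligned. */
struct suspend_aligned4 {
	struct hdr header;
	u64_a4 suspend_start;
	u64_a4 suspend_end;
};
```

Under these assumptions, offsetof() reports 4 and 12 for the two fields in both the packed and the 4-byte-aligned variants. On a typical x86-64 build the natural layout would be expected to place them at 8 and 16, which is consistent with the 0x...028 and 0x...030 addresses in the log above.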

Re: [PATCH v2 04/15] ACPI: table: replace __attribute__((packed)) by __packed

2021-03-30 Thread Zhang Rui

On Tue, 2021-03-30 at 10:23 +0800, Xiaofei Tan wrote:
> Hi David,
> 
> On 2021/3/29 18:09, David Laight wrote:
> > From: Xiaofei Tan
> > > Sent: 27 March 2021 07:46
> > > 
> > > Replace __attribute__((packed)) by __packed following the
> > > advice of checkpatch.pl.
> > > 
> > > Signed-off-by: Xiaofei Tan 
> > > ---
> > >  drivers/acpi/acpi_fpdt.c | 6 +++---
> > >  1 file changed, 3 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/drivers/acpi/acpi_fpdt.c b/drivers/acpi/acpi_fpdt.c
> > > index a89a806..690a88a 100644
> > > --- a/drivers/acpi/acpi_fpdt.c
> > > +++ b/drivers/acpi/acpi_fpdt.c
> > > @@ -53,7 +53,7 @@ struct resume_performance_record {
> > >   u32 resume_count;
> > >   u64 resume_prev;
> > >   u64 resume_avg;
> > > -} __attribute__((packed));
> > > +} __packed;
> > > 
> > >  struct boot_performance_record {
> > >   struct fpdt_record_header header;
> > > @@ -63,13 +63,13 @@ struct boot_performance_record {
> > >   u64 bootloader_launch;
> > >   u64 exitbootservice_start;
> > >   u64 exitbootservice_end;
> > > -} __attribute__((packed));
> > > +} __packed;
> > > 
> > >  struct suspend_performance_record {
> > >   struct fpdt_record_header header;
> > >   u64 suspend_start;
> > >   u64 suspend_end;
> > > -} __attribute__((packed));
> > > +} __packed;
> > 
> > My standard question about 'packed' is whether it is actually
> > needed.
> > It should only be used if the structures might be misaligned in
> > memory.
> > If the only problem is that a 64bit item needs to be 32bit aligned
> > then a suitable type should be used for those specific fields.
> > 
> > Those all look very dubious - the standard header isn't packed
> > > so everything must be assumed to be at least 32bit aligned.
> > 
> > There are also other sub-structures that contain 64bit values.
> > > These don't contain padding - but that requires 64bit alignment.
> > 
> > The only problematic structure is the last one - which would have
> > a 32bit pad after the header.
> > > Is this even right given that there are explicit alignment pads
> > in some of the other structures.
> > 
> > If 64bit alignment isn't guaranteed then a '64bit aligned to 32bit'
> > type should be used for the u64 fields.
> > 
> 
> Yes, some of them has been aligned already, then nothing changed
> when 
> add this "packed ". Maybe the purpose of the original author is for 
> extension, and can tell others that this struct need be packed.
> 

The patch was upstreamed recently but it was made a long time ago.
I think the original problem is that one of the addresses, probably in
the suspend_performance record, is not 64bit aligned, thus we cannot
read the proper content of suspend_start and suspend_end, mapped from
physical memory.

I will try to find a machine to reproduce the problem with all
__attribute__((packed)) removed to double confirm this.

thanks,
rui
> > David
> > 
> > -
> > Registered Address Lakeside, Bramley Road, Mount Farm, Milton
> > Keynes, MK1 1PT, UK
> > Registration No: 1397386 (Wales)
> > 
> > 
> > .
> > 
> 
>

[tip: perf/core] perf/x86/rapl: Add msr mask support

2021-02-10 Thread tip-bot2 for Zhang Rui

The following commit has been merged into the perf/core branch of tip:

Commit-ID: ffb20c2e52e8709b5fc9951e8863e31efb1f2cba
Gitweb:
https://git.kernel.org/tip/ffb20c2e52e8709b5fc9951e8863e31efb1f2cba
Author:Zhang Rui 
AuthorDate:Fri, 05 Feb 2021 00:18:14 +08:00
Committer: Peter Zijlstra 
CommitterDate: Wed, 10 Feb 2021 14:44:54 +01:00

perf/x86/rapl: Add msr mask support

In some cases, when probing a perf MSR, we're probing certain bits of the
MSR instead of the whole register, thus only these bits should be checked.

For example, for RAPL ENERGY_STATUS MSR, only the lower 32 bits represent
the energy counter, and the higher 32bits are reserved.

Introduce a new mask field in struct perf_msr to allow probing certain
bits of a MSR.

This change is transparent to the current perf_msr_probe() users.

Signed-off-by: Zhang Rui 
Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Andi Kleen 
Link: https://lkml.kernel.org/r/20210204161816.12649-1-rui.zh...@intel.com
---
 arch/x86/events/probe.c | 7 ++++++-
 arch/x86/events/probe.h | 7 ++++---
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/x86/events/probe.c b/arch/x86/events/probe.c
index 136a1e8..600bf8d 100644
--- a/arch/x86/events/probe.c
+++ b/arch/x86/events/probe.c
@@ -28,6 +28,7 @@ perf_msr_probe(struct perf_msr *msr, int cnt, bool zero, void *data)
for (bit = 0; bit < cnt; bit++) {
if (!msr[bit].no_check) {
struct attribute_group *grp = msr[bit].grp;
+   u64 mask;
 
/* skip entry with no group */
if (!grp)
@@ -44,8 +45,12 @@ perf_msr_probe(struct perf_msr *msr, int cnt, bool zero, void *data)
/* Virt sucks; you cannot tell if a R/O MSR is present :/ */
if (rdmsrl_safe(msr[bit].msr, &val))
continue;
+
+   mask = msr[bit].mask;
+   if (!mask)
+   mask = ~0ULL;
/* Disable zero counters if requested. */
-   if (!zero && !val)
+   if (!zero && !(val & mask))
continue;
 
grp->is_visible = NULL;
diff --git a/arch/x86/events/probe.h b/arch/x86/events/probe.h
index 4c8e0af..261b9bd 100644
--- a/arch/x86/events/probe.h
+++ b/arch/x86/events/probe.h
@@ -4,10 +4,11 @@
 #include 
 
 struct perf_msr {
-   u64   msr;
-   struct attribute_group   *grp;
+   u64 msr;
+   struct attribute_group  *grp;
bool                    (*test)(int idx, void *data);
-   bool  no_check;
+   bool                    no_check;
+   u64 mask;
 };
 
 unsigned long
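
The masked zero check introduced by this patch can be illustrated in isolation. The sketch below is a user-space approximation, not the kernel function: msr_counter_present() is a hypothetical name, and the value that the kernel obtains from rdmsrl_safe() is simply passed in as an argument.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the check added to perf_msr_probe():
 * returns true when the probed MSR value should be treated as a live
 * counter. A mask of 0 means "check the whole register", which keeps
 * existing users that never set the field unaffected.
 */
static bool msr_counter_present(uint64_t val, uint64_t mask, bool zero)
{
	if (!mask)
		mask = ~0ULL;

	/* Reject counters that read as zero unless the caller accepts them. */
	if (!zero && !(val & mask))
		return false;

	return true;
}
```

For instance, a value whose upper 32 bits are non-zero but whose lower 32 bits are zero is accepted with a mask of 0 and rejected with a mask of 0xFFFFFFFF, which is the distinction the RAPL follow-up patch relies on.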

[tip: perf/core] perf/x86/rapl: Only check lower 32bits for RAPL energy counters

2021-02-10 Thread tip-bot2 for Zhang Rui

The following commit has been merged into the perf/core branch of tip:

Commit-ID: b6f78d3fba7f605f673185d7292d84af7576fdc1
Gitweb:
https://git.kernel.org/tip/b6f78d3fba7f605f673185d7292d84af7576fdc1
Author:Zhang Rui 
AuthorDate:Fri, 05 Feb 2021 00:18:15 +08:00
Committer: Peter Zijlstra 
CommitterDate: Wed, 10 Feb 2021 14:44:55 +01:00

perf/x86/rapl: Only check lower 32bits for RAPL energy counters

In the RAPL ENERGY_COUNTER MSR, only the lower 32bits represent the energy
counter.

On previous platforms, the higher 32bits are reserved and always return
zero. But on the Intel SapphireRapids platform, the higher 32bits are reused
for other purposes and return a non-zero value.

Thus check the lower 32bits only for these ENERGY_COUNTER MSRs, to make
sure the RAPL PMU events are not added erroneously when the higher 32bits
contain a non-zero value.

Signed-off-by: Zhang Rui 
Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Andi Kleen 
Link: https://lkml.kernel.org/r/20210204161816.12649-2-rui.zh...@intel.com
---
 arch/x86/events/rapl.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index 7dbbeaa..7ed25b2 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -523,12 +523,15 @@ static bool test_msr(int idx, void *data)
return test_bit(idx, (unsigned long *) data);
 }
 
+/* Only lower 32bits of the MSR represents the energy counter */
+#define RAPL_MSR_MASK 0xFFFFFFFF
+
 static struct perf_msr intel_rapl_msrs[] = {
-   [PERF_RAPL_PP0]  = { MSR_PP0_ENERGY_STATUS,      &rapl_events_cores_group, test_msr },
-   [PERF_RAPL_PKG]  = { MSR_PKG_ENERGY_STATUS,      &rapl_events_pkg_group,   test_msr },
-   [PERF_RAPL_RAM]  = { MSR_DRAM_ENERGY_STATUS,     &rapl_events_ram_group,   test_msr },
-   [PERF_RAPL_PP1]  = { MSR_PP1_ENERGY_STATUS,      &rapl_events_gpu_group,   test_msr },
-   [PERF_RAPL_PSYS] = { MSR_PLATFORM_ENERGY_STATUS, &rapl_events_psys_group,  test_msr },
+   [PERF_RAPL_PP0]  = { MSR_PP0_ENERGY_STATUS,      &rapl_events_cores_group, test_msr, false, RAPL_MSR_MASK },
+   [PERF_RAPL_PKG]  = { MSR_PKG_ENERGY_STATUS,      &rapl_events_pkg_group,   test_msr, false, RAPL_MSR_MASK },
+   [PERF_RAPL_RAM]  = { MSR_DRAM_ENERGY_STATUS,     &rapl_events_ram_group,   test_msr, false, RAPL_MSR_MASK },
+   [PERF_RAPL_PP1]  = { MSR_PP1_ENERGY_STATUS,      &rapl_events_gpu_group,   test_msr, false, RAPL_MSR_MASK },
+   [PERF_RAPL_PSYS] = { MSR_PLATFORM_ENERGY_STATUS, &rapl_events_psys_group,  test_msr, false, RAPL_MSR_MASK },
 };
 
 /*

[tip: perf/core] perf/x86/rapl: Fix psys-energy event on Intel SPR platform

2021-02-10 Thread tip-bot2 for Zhang Rui

The following commit has been merged into the perf/core branch of tip:

Commit-ID: 838342a6d6b7ecc475dc052d4a405c4ffb3ad1b5
Gitweb:
https://git.kernel.org/tip/838342a6d6b7ecc475dc052d4a405c4ffb3ad1b5
Author:Zhang Rui 
AuthorDate:Fri, 05 Feb 2021 00:18:16 +08:00
Committer: Peter Zijlstra 
CommitterDate: Wed, 10 Feb 2021 14:44:55 +01:00

perf/x86/rapl: Fix psys-energy event on Intel SPR platform

There are several things special for the RAPL Psys energy counter, on
Intel Sapphire Rapids platform.
1. it contains one Psys master package, and only CPUs on the master
   package can read valid value of the Psys energy counter, reading the
   MSR on CPUs in the slave package returns 0.
2. The master package does not have to be Physical package 0. And when
   all the CPUs on the Psys master package are offlined, we lose the Psys
   energy counter, at runtime.
3. The Psys energy counter can be disabled by BIOS, while all the other
   energy counters are not affected.

It is not easy to handle all of these in the current RAPL PMU design
because
a) perf_msr_probe() validates the MSR on some random CPU, which may either
   be in the Psys master package or in the Psys slave package.
b) all the RAPL events share the same PMU, and there is no API to remove
   the psys-energy event cleanly, without affecting the other events in
   the same PMU.

This patch addresses the problems in a simple way.

First,  by setting .no_check bit for RAPL Psys MSR, the psys-energy event
is always added, so we don't have to check the Psys ENERGY_STATUS MSR on
master package.

Then, by removing rapl_not_visible(), the psys-energy event is always
available in sysfs. This does not affect the previous code because, for
the RAPL MSRs with .no_check cleared, the .is_visible() callback is always
overridden in the perf_msr_probe() function.

Note, although RAPL PMU is die-based, and the Psys energy counter MSR on
Intel SPR is package scope, this is not a problem because there is only
one die in each package on SPR.

Signed-off-by: Zhang Rui 
Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Andi Kleen 
Link: https://lkml.kernel.org/r/20210204161816.12649-3-rui.zh...@intel.com
---
 arch/x86/events/rapl.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index 7ed25b2..f42a704 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -454,16 +454,9 @@ static struct attribute *rapl_events_cores[] = {
NULL,
 };
 
-static umode_t
-rapl_not_visible(struct kobject *kobj, struct attribute *attr, int i)
-{
-   return 0;
-}
-
 static struct attribute_group rapl_events_cores_group = {
.name  = "events",
.attrs = rapl_events_cores,
-   .is_visible = rapl_not_visible,
 };
 
 static struct attribute *rapl_events_pkg[] = {
@@ -476,7 +469,6 @@ static struct attribute *rapl_events_pkg[] = {
 static struct attribute_group rapl_events_pkg_group = {
.name  = "events",
.attrs = rapl_events_pkg,
-   .is_visible = rapl_not_visible,
 };
 
 static struct attribute *rapl_events_ram[] = {
@@ -489,7 +481,6 @@ static struct attribute *rapl_events_ram[] = {
 static struct attribute_group rapl_events_ram_group = {
.name  = "events",
.attrs = rapl_events_ram,
-   .is_visible = rapl_not_visible,
 };
 
 static struct attribute *rapl_events_gpu[] = {
@@ -502,7 +493,6 @@ static struct attribute *rapl_events_gpu[] = {
 static struct attribute_group rapl_events_gpu_group = {
.name  = "events",
.attrs = rapl_events_gpu,
-   .is_visible = rapl_not_visible,
 };
 
 static struct attribute *rapl_events_psys[] = {
@@ -515,7 +505,6 @@ static struct attribute *rapl_events_psys[] = {
 static struct attribute_group rapl_events_psys_group = {
.name  = "events",
.attrs = rapl_events_psys,
-   .is_visible = rapl_not_visible,
 };
 
 static bool test_msr(int idx, void *data)
@@ -534,6 +523,14 @@ static struct perf_msr intel_rapl_msrs[] = {
[PERF_RAPL_PSYS] = { MSR_PLATFORM_ENERGY_STATUS, &rapl_events_psys_group,  test_msr, false, RAPL_MSR_MASK },
 };
 
+static struct perf_msr intel_rapl_spr_msrs[] = {
+   [PERF_RAPL_PP0]  = { MSR_PP0_ENERGY_STATUS,      &rapl_events_cores_group, test_msr, false, RAPL_MSR_MASK },
+   [PERF_RAPL_PKG]  = { MSR_PKG_ENERGY_STATUS,      &rapl_events_pkg_group,   test_msr, false, RAPL_MSR_MASK },
+   [PERF_RAPL_RAM]  = { MSR_DRAM_ENERGY_STATUS,     &rapl_events_ram_group,   test_msr, false, RAPL_MSR_MASK },
+   [PERF_RAPL_PP1]  = { MSR_PP1_ENERGY_STATUS,      &rapl_events_gpu_group,   test_msr, false, RAPL_MSR_MASK },
+   [PERF_RAPL_PSYS] = { MSR_PLATFORM_ENERGY_STATUS, &rapl_events_psys_group,  test_msr, true, RAPL_MSR_MASK },
+};
+
 /*
  * Force to PERF_RAPL_MAX size due to:
  * - perf_msr_probe(PERF_RAPL_MAX)
@@ -764,7 +761,7 @@ static struct rapl_model model_sp
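
The effect of setting .no_check for the Psys entry, as described in the commit message above, can be sketched with a simplified user-space model of the probe loop. This is an illustration under stated assumptions, not the kernel's perf_msr_probe(): struct msr_entry, probe_events() and the EV_* indices are hypothetical, and the MSR read is replaced by a value stored in the table.

```c
#include <stdbool.h>
#include <stdint.h>

enum { EV_PKG, EV_PSYS, EV_MAX };	/* hypothetical event indices */

struct msr_entry {
	uint64_t value;		/* stand-in for the value read on the probing CPU */
	bool     no_check;	/* skip the read/zero check entirely */
	uint64_t mask;		/* bits that hold the counter; 0 = all bits */
};

/* Returns a bitmap of the events that would be exposed. */
static unsigned long probe_events(const struct msr_entry *tbl, int cnt)
{
	unsigned long avail = 0;
	int bit;

	for (bit = 0; bit < cnt; bit++) {
		if (!tbl[bit].no_check) {
			uint64_t mask = tbl[bit].mask ? tbl[bit].mask : UINT64_MAX;

			/* Counter reads as zero on this CPU: hide the event. */
			if (!(tbl[bit].value & mask))
				continue;
		}
		avail |= 1UL << bit;
	}
	return avail;
}
```

In this model, a Psys entry that reads 0 on the probing CPU (for example, a CPU in a slave package) is still reported as available when no_check is true, and is dropped when no_check is false, which is the difference between intel_rapl_spr_msrs[] and intel_rapl_msrs[] in the patch.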

[PATCH V2 1/3] perf/x86/rapl: Add msr mask support

2021-02-04 Thread Zhang Rui

In some cases, when probing a perf MSR, we're probing certain bits of the
MSR instead of the whole register, thus only these bits should be checked.

For example, for RAPL ENERGY_STATUS MSR, only the lower 32 bits represent
the energy counter, and the higher 32bits are reserved.

Introduce a new mask field in struct perf_msr to allow probing certain
bits of a MSR.

This change is transparent to the current perf_msr_probe() users.

Signed-off-by: Zhang Rui 
Reviewed-by: Andi Kleen 
---
 arch/x86/events/probe.c | 5 ++++-
 arch/x86/events/probe.h | 7 ++++---
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/x86/events/probe.c b/arch/x86/events/probe.c
index 136a1e847254..a0a19c404cb5 100644
--- a/arch/x86/events/probe.c
+++ b/arch/x86/events/probe.c
@@ -28,6 +28,7 @@ perf_msr_probe(struct perf_msr *msr, int cnt, bool zero, void *data)
for (bit = 0; bit < cnt; bit++) {
if (!msr[bit].no_check) {
struct attribute_group *grp = msr[bit].grp;
+   u64 mask;
 
/* skip entry with no group */
if (!grp)
@@ -44,8 +45,10 @@ perf_msr_probe(struct perf_msr *msr, int cnt, bool zero, void *data)
/* Virt sucks; you cannot tell if a R/O MSR is present :/ */
if (rdmsrl_safe(msr[bit].msr, &val))
continue;
+
+   mask = msr[bit].mask ? msr[bit].mask : U64_MAX;
/* Disable zero counters if requested. */
-   if (!zero && !val)
+   if (!zero && !(val & mask))
continue;
 
grp->is_visible = NULL;
diff --git a/arch/x86/events/probe.h b/arch/x86/events/probe.h
index 4c8e0afc5fb5..261b9bda24e3 100644
--- a/arch/x86/events/probe.h
+++ b/arch/x86/events/probe.h
@@ -4,10 +4,11 @@
 #include 
 
 struct perf_msr {
-   u64   msr;
-   struct attribute_group   *grp;
+   u64 msr;
+   struct attribute_group  *grp;
bool                    (*test)(int idx, void *data);
-   bool  no_check;
+   bool                    no_check;
+   u64 mask;
 };
 
 unsigned long
-- 
2.17.1

[PATCH V2 2/3] perf/x86/rapl: Only check lower 32bits for RAPL energy counters

2021-02-04 Thread Zhang Rui

In the RAPL ENERGY_COUNTER MSR, only the lower 32bits represent the energy
counter.

On previous platforms, the higher 32bits are reserved and always return
zero. But on the Intel SapphireRapids platform, the higher 32bits are reused
for other purposes and return a non-zero value.

Thus check the lower 32bits only for these ENERGY_COUNTER MSRs, to make
sure the RAPL PMU events are not added erroneously when the higher 32bits
contain a non-zero value.

Signed-off-by: Zhang Rui 
Reviewed-by: Andi Kleen 
---
 arch/x86/events/rapl.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index 7dbbeaacd995..7ed25b2ba05f 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -523,12 +523,15 @@ static bool test_msr(int idx, void *data)
return test_bit(idx, (unsigned long *) data);
 }
 
+/* Only lower 32bits of the MSR represents the energy counter */
+#define RAPL_MSR_MASK 0xFFFFFFFF
+
 static struct perf_msr intel_rapl_msrs[] = {
-   [PERF_RAPL_PP0]  = { MSR_PP0_ENERGY_STATUS,      &rapl_events_cores_group, test_msr },
-   [PERF_RAPL_PKG]  = { MSR_PKG_ENERGY_STATUS,      &rapl_events_pkg_group,   test_msr },
-   [PERF_RAPL_RAM]  = { MSR_DRAM_ENERGY_STATUS,     &rapl_events_ram_group,   test_msr },
-   [PERF_RAPL_PP1]  = { MSR_PP1_ENERGY_STATUS,      &rapl_events_gpu_group,   test_msr },
-   [PERF_RAPL_PSYS] = { MSR_PLATFORM_ENERGY_STATUS, &rapl_events_psys_group,  test_msr },
+   [PERF_RAPL_PP0]  = { MSR_PP0_ENERGY_STATUS,      &rapl_events_cores_group, test_msr, false, RAPL_MSR_MASK },
+   [PERF_RAPL_PKG]  = { MSR_PKG_ENERGY_STATUS,      &rapl_events_pkg_group,   test_msr, false, RAPL_MSR_MASK },
+   [PERF_RAPL_RAM]  = { MSR_DRAM_ENERGY_STATUS,     &rapl_events_ram_group,   test_msr, false, RAPL_MSR_MASK },
+   [PERF_RAPL_PP1]  = { MSR_PP1_ENERGY_STATUS,      &rapl_events_gpu_group,   test_msr, false, RAPL_MSR_MASK },
+   [PERF_RAPL_PSYS] = { MSR_PLATFORM_ENERGY_STATUS, &rapl_events_psys_group,  test_msr, false, RAPL_MSR_MASK },
 };
 
 /*
-- 
2.17.1

[PATCH V2 3/3] perf/x86/rapl: Fix psys-energy event on Intel SPR platform

2021-02-04 Thread Zhang Rui

There are several things special for the RAPL Psys energy counter, on
Intel Sapphire Rapids platform.
1. it contains one Psys master package, and only CPUs on the master
   package can read valid value of the Psys energy counter, reading the
   MSR on CPUs in the slave package returns 0.
2. The master package does not have to be Physical package 0. And when
   all the CPUs on the Psys master package are offlined, we lose the Psys
   energy counter, at runtime.
3. The Psys energy counter can be disabled by BIOS, while all the other
   energy counters are not affected.

It is not easy to handle all of these in the current RAPL PMU design
because
a) perf_msr_probe() validates the MSR on some random CPU, which may either
   be in the Psys master package or in the Psys slave package.
b) all the RAPL events share the same PMU, and there is no API to remove
   the psys-energy event cleanly, without affecting the other events in
   the same PMU.

This patch addresses the problems in a simple way.

First,  by setting .no_check bit for RAPL Psys MSR, the psys-energy event
is always added, so we don't have to check the Psys ENERGY_STATUS MSR on
master package.

Then, by removing rapl_not_visible(), the psys-energy event is always
available in sysfs. This does not affect the previous code because, for
the RAPL MSRs with .no_check cleared, the .is_visible() callback is always
overridden in the perf_msr_probe() function.

Note, although RAPL PMU is die-based, and the Psys energy counter MSR on
Intel SPR is package scope, this is not a problem because there is only
one die in each package on SPR.

Signed-off-by: Zhang Rui 
Reviewed-by: Andi Kleen 
---
 arch/x86/events/rapl.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index 7ed25b2ba05f..f42a70496a24 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -454,16 +454,9 @@ static struct attribute *rapl_events_cores[] = {
NULL,
 };
 
-static umode_t
-rapl_not_visible(struct kobject *kobj, struct attribute *attr, int i)
-{
-   return 0;
-}
-
 static struct attribute_group rapl_events_cores_group = {
.name  = "events",
.attrs = rapl_events_cores,
-   .is_visible = rapl_not_visible,
 };
 
 static struct attribute *rapl_events_pkg[] = {
@@ -476,7 +469,6 @@ static struct attribute *rapl_events_pkg[] = {
 static struct attribute_group rapl_events_pkg_group = {
.name  = "events",
.attrs = rapl_events_pkg,
-   .is_visible = rapl_not_visible,
 };
 
 static struct attribute *rapl_events_ram[] = {
@@ -489,7 +481,6 @@ static struct attribute *rapl_events_ram[] = {
 static struct attribute_group rapl_events_ram_group = {
.name  = "events",
.attrs = rapl_events_ram,
-   .is_visible = rapl_not_visible,
 };
 
 static struct attribute *rapl_events_gpu[] = {
@@ -502,7 +493,6 @@ static struct attribute *rapl_events_gpu[] = {
 static struct attribute_group rapl_events_gpu_group = {
.name  = "events",
.attrs = rapl_events_gpu,
-   .is_visible = rapl_not_visible,
 };
 
 static struct attribute *rapl_events_psys[] = {
@@ -515,7 +505,6 @@ static struct attribute *rapl_events_psys[] = {
 static struct attribute_group rapl_events_psys_group = {
.name  = "events",
.attrs = rapl_events_psys,
-   .is_visible = rapl_not_visible,
 };
 
 static bool test_msr(int idx, void *data)
@@ -534,6 +523,14 @@ static struct perf_msr intel_rapl_msrs[] = {
[PERF_RAPL_PSYS] = { MSR_PLATFORM_ENERGY_STATUS, &rapl_events_psys_group,  test_msr, false, RAPL_MSR_MASK },
 };
 
+static struct perf_msr intel_rapl_spr_msrs[] = {
+   [PERF_RAPL_PP0]  = { MSR_PP0_ENERGY_STATUS,  &rapl_events_cores_group, test_msr, false, RAPL_MSR_MASK },
+   [PERF_RAPL_PKG]  = { MSR_PKG_ENERGY_STATUS,  &rapl_events_pkg_group,   test_msr, false, RAPL_MSR_MASK },
+   [PERF_RAPL_RAM]  = { MSR_DRAM_ENERGY_STATUS, &rapl_events_ram_group,   test_msr, false, RAPL_MSR_MASK },
+   [PERF_RAPL_PP1]  = { MSR_PP1_ENERGY_STATUS,  &rapl_events_gpu_group,   test_msr, false, RAPL_MSR_MASK },
+   [PERF_RAPL_PSYS] = { MSR_PLATFORM_ENERGY_STATUS, &rapl_events_psys_group,  test_msr, true, RAPL_MSR_MASK },
+};
+
 /*
  * Force to PERF_RAPL_MAX size due to:
  * - perf_msr_probe(PERF_RAPL_MAX)
@@ -764,7 +761,7 @@ static struct rapl_model model_spr = {
  BIT(PERF_RAPL_PSYS),
.unit_quirk = RAPL_UNIT_QUIRK_INTEL_SPR,
.msr_power_unit = MSR_RAPL_POWER_UNIT,
-   .rapl_msrs  = intel_rapl_msrs,
+   .rapl_msrs  = intel_rapl_spr_msrs,
 };
 
 static struct rapl_model model_amd_fam17h = {
-- 
2.17.1
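
For readers following the .no_check discussion above, here is a rough
user-space model of the probing behaviour the commit message describes.
It is only a sketch: struct sketch_msr and sketch_probe() are
hypothetical names invented for illustration, not kernel APIs, and the
model assumes that a validated entry is hidden whenever its read returns
0 while an entry with no_check set is always reported as available.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the kernel's struct perf_msr. */
struct sketch_msr {
	uint64_t value;     /* value a read of this MSR would return */
	bool     no_check;  /* true: skip validation, always expose the event */
};

/*
 * Model of the probe loop: an entry is reported as available when either
 * no_check is set or its read returns a non-zero value.
 * Returns a bitmask with one bit per available entry.
 */
static unsigned long sketch_probe(const struct sketch_msr *msrs,
				  unsigned int cnt)
{
	unsigned long avail = 0;

	/* One bit per entry must fit in the returned mask. */
	assert(cnt < sizeof(avail) * 8);

	for (unsigned int bit = 0; bit < cnt; bit++) {
		if (!msrs[bit].no_check && msrs[bit].value == 0)
			continue;	/* validated entry that reads 0: hide it */
		avail |= 1UL << bit;
	}
	return avail;
}
```

In this model, probing from a CPU on a slave package (where the Psys
read returns 0) still reports the Psys entry as available once no_check
is set, which is the effect the patch relies on.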

Re: [PATCH 3/3] perf/x86/rapl: Fix psys-energy event on Intel SPR platform

2021-02-04 Thread Zhang Rui

Hi, Peter,

On Wed, 2021-02-03 at 15:47 +0100, Peter Zijlstra wrote:
> FWIW, your email is malformed, please wrap at 78 chars.
> 
> On Mon, Jan 25, 2021 at 06:11:14AM +0000, Zhang, Rui wrote:
> > In short, the current code does not allow RAPL energy counter to
> > return 0. And all the work I do is to allow Psys energy counter to
> > return 0.
> 
> Ok.
> 
> > In this way, the Psys event is "valid" on all CPUs, so we don't
> > need
> > to handle the master thing.
> 
> So RAPL is mapped to DIEs, and IIRC we can have multiple DIEs per
> Package. But the master thing is a Package.
> 
> Is this all moot because SPR has one DIE per Package?

Oh, right.
This is not a problem on SPR because it is a single-die
platform.

>  Because if it
> would have more, there's be more interesting problems I suppose.

Agreed.

thanks,
rui

RE: [PATCH 3/3] perf/x86/rapl: Fix psys-energy event on Intel SPR platform

2021-02-03 Thread Zhang, Rui

Hi, Peter,

> -Original Message-
> From: Zhang, Rui
> Sent: Monday, January 25, 2021 2:11 PM
> To: 'Peter Zijlstra' 
> Cc: 'mi...@redhat.com' ; 'a...@kernel.org'
> ; 'mark.rutl...@arm.com' ;
> 'alexander.shish...@linux.intel.com' ;
> 'jo...@redhat.com' ; 'namhy...@kernel.org'
> ; 'linux-kernel@vger.kernel.org'  ker...@vger.kernel.org>; 'x...@kernel.org' ;
> 'kan.li...@linux.intel.com' ; 
> 'a...@linux.intel.com'
> 
> Subject: RE: [PATCH 3/3] perf/x86/rapl: Fix psys-energy event on Intel SPR
> platform
> 
> Hi, Peter,
> 
> > -Original Message-
> > From: Zhang, Rui
> > Sent: Sunday, January 17, 2021 10:34 PM
> > To: 'Peter Zijlstra' 
> > Cc: mi...@redhat.com; a...@kernel.org; mark.rutl...@arm.com;
> > alexander.shish...@linux.intel.com; jo...@redhat.com;
> > namhy...@kernel.org; linux-kernel@vger.kernel.org; x...@kernel.org;
> > kan.li...@linux.intel.com; a...@linux.intel.com
> > Subject: RE: [PATCH 3/3] perf/x86/rapl: Fix psys-energy event on Intel
> > SPR platform
> >
> > Hi, Peter,
> >
> > > -Original Message-
> > > From: Peter Zijlstra 
> > > Sent: Saturday, January 16, 2021 8:50 PM
> > > To: Zhang, Rui 
> > > Cc: mi...@redhat.com; a...@kernel.org; mark.rutl...@arm.com;
> > > alexander.shish...@linux.intel.com; jo...@redhat.com;
> > > namhy...@kernel.org; linux-kernel@vger.kernel.org; x...@kernel.org;
> > > kan.li...@linux.intel.com; a...@linux.intel.com
> > > Subject: Re: [PATCH 3/3] perf/x86/rapl: Fix psys-energy event on
> > > Intel SPR platform
> > > Importance: High
> > >
> > > On Fri, Jan 15, 2021 at 05:22:08PM +0800, Zhang Rui wrote:
> > > > There are several things special for the RAPL Psys energy counter,
> > > > on Intel Sapphire Rapids platform.
> > > > 1. it contains one Psys master package, and only CPUs on the master
> > > >package can read valid value of the Psys energy counter, reading the
> > > >MSR on CPUs in the slave package returns 0.
> > > > 2. The master package does not have to be Physical package 0. And
> when
> > > >all the CPUs on the Psys master package are offlined, we lose the 
> > > > Psys
> > > >energy counter, at runtime.
> > > > 3. The Psys energy counter can be disabled by BIOS, while all the other
> > > >energy counters are not affected.
> > > >
> > > > It is not easy to handle all of these in the current RAPL PMU
> > > > design because
> > > > a) perf_msr_probe() validates the MSR on some random CPU, which
> > > > may
> > > either
> > > >be in the Psys master package or in the Psys slave package.
> > > > b) all the RAPL events share the same PMU, and there is not API to
> > remove
> > > >the psys-energy event cleanly, without affecting the other events in
> > > >the same PMU.
> > > >
> > > > This patch addresses the problems in a simple way.
> > > >
> > > > First, by setting .no_check bit for RAPL Psys MSR, the psys-energy
> > > > event is always added, so we don't have to check the Psys
> > > > ENERGY_STATUS MSR on master package.
> > > >
> > > > Then, rapl_not_visible() is removed because 1. it is useless for
> > > > RAPL MSRs with .no_check cleared, because the
> > > >.is_visible() callbacks is always overridden in perf_msr_probe().
> > > > 2. it is useless for RAPL MSRs with .no_check set, because we actually
> > > >want the sysfs attributes always be visible for those MSRs.
> > > >
> > > > With the above changes, we always probe the psys-energy event on
> > > > Intel SPR platform. Difference is that the event counter returns 0
> > > > when the Psys RAPL Domain is disabled by BIOS, or the Psys master
> > > > package is
> > > offlined.
> > >
> > > Maybe I'm too tired, but I cannot follow. How does this cure the
> > > fact that the rapl_cpu_mask might not include that master thing. And
> > > how can software detect what the master thing is to begin with?
> >
> > To make things simple, I ignore the master thing, and probe the
> > psys-energy counter blindly on SPR.
> > So rapl_cpu_mask still includes all the online CPUs.
> > This means that psys-energy is "valid" on all packages, and it just
> > returns different values on different packages.
> > AKA, whole system power consumption on Psys master package, and Zero
> > on Psys slave packages.
> >
> In short, the current code does not allow RAPL energy counter to return 0.
> And all the work I do is to allow Psys energy counter to return 0.
> In this way, the Psys event is "valid" on all CPUs, so we don't need to handle
> the master thing.
> The drawback is that we still see psys-energy event, but with 0 readout,
> when Psys counter is not available (master package offlined, or psys
> disabled).
> 
> TBH, I'm not quite sure if I understand your original question correctly or 
> not,
> so please let me know if there is still something unclear.
> 
Sorry to bother you; may I know your concerns about this patch series?

Thanks,
rui
> Thanks,
> rui
> >
> > Thanks,
> > rui

Re: [PATCH -next] ACPI: tables: Mark acpi_init_fpdt with static keyword

2021-01-28 Thread Zhang Rui

Hi, Wei,

Thanks for the patch.

Given that a couple of things need to be fixed in the original
patch, I'd prefer to refresh the patch with all the fixes included:

https://patchwork.kernel.org/project/linux-acpi/patch/20210129061548.13448-1-rui.zh...@intel.com/

What do you think?

thanks,
rui

On Thu, 2021-01-28 at 19:31 +0800, Zou Wei wrote:
> Fix the following sparse warning:
> 
> drivers/acpi/acpi_fpdt.c:230:6: warning: symbol 'acpi_init_fpdt' was
> not declared. Should it be static?
> 
> Signed-off-by: Zou Wei 
> ---
>  drivers/acpi/acpi_fpdt.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/acpi/acpi_fpdt.c b/drivers/acpi/acpi_fpdt.c
> index b810811..968f9cc 100644
> --- a/drivers/acpi/acpi_fpdt.c
> +++ b/drivers/acpi/acpi_fpdt.c
> @@ -227,7 +227,7 @@ static int fpdt_process_subtable(u64 address, u32
> subtable_type)
>   return 0;
>  }
>  
> -void acpi_init_fpdt(void)
> +static void acpi_init_fpdt(void)
>  {
>   acpi_status status;
>   struct acpi_table_header *header;

Re: [PATCH -next] acpi: fpdt: drop errant comma in pr_info()

2021-01-28 Thread Zhang Rui

Hi, Randy,

Thanks for the patch. A similar patch was posted earlier, but I
forgot to cc the linux-acpi mailing list.
https://marc.info/?l=linux-next&m=161172750710666&w=2

Now given that there are a couple of fixes needed for the original
patch, I just refreshed the original patch to include all the fixes.

https://patchwork.kernel.org/project/linux-acpi/patch/20210129061548.13448-1-rui.zh...@intel.com/

thanks,
rui

On Thu, 2021-01-28 at 15:25 -0800, Randy Dunlap wrote:
> Drop a mistaken comma in the pr_info() args to prevent the
> build warning.
> 
> ../drivers/acpi/acpi_fpdt.c: In function 'acpi_init_fpdt':
> ../include/linux/kern_levels.h:5:18: warning: too many arguments for
> format [-Wformat-extra-args]
> ../drivers/acpi/acpi_fpdt.c:255:4: note: in expansion of macro
> 'pr_info'
> pr_info(FW_BUG, "Invalid subtable type %d found.\n",
> 
> Fixes: 208757d71098 ("ACPI: tables: introduce support for FPDT
> table")
> Signed-off-by: Randy Dunlap 
> Cc: "Rafael J. Wysocki" 
> Cc: Rafael J. Wysocki 
> Cc: Len Brown 
> Cc: linux-a...@vger.kernel.org
> Cc: Zhang Rui 
> ---
>  drivers/acpi/acpi_fpdt.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- linux-next-20210128.orig/drivers/acpi/acpi_fpdt.c
> +++ linux-next-20210128/drivers/acpi/acpi_fpdt.c
> @@ -252,7 +252,7 @@ void acpi_init_fpdt(void)
> subtable->type);
>   break;
>   default:
> - pr_info(FW_BUG, "Invalid subtable type %d
> found.\n",
> + pr_info(FW_BUG "Invalid subtable type %d
> found.\n",
>  subtable->type);
>   return;
>   }
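
The reason the stray comma triggers -Wformat-extra-args can be seen in a
small user-space sketch. SKETCH_FW_BUG and sketch_format() are
hypothetical stand-ins chosen for illustration (the kernel's FW_BUG
macro and pr_info() are not used here); the point is only that adjacent
string literals are concatenated by the compiler, whereas a comma turns
the prefix into the whole format string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a string-literal prefix macro such as FW_BUG. */
#define SKETCH_FW_BUG "[Firmware Bug]: "

/*
 * Adjacent string literals are merged by the compiler, so the prefix
 * becomes part of the format string and "%d" still matches `type`.
 * With a comma after the prefix, the prefix alone would be the format
 * and the remaining arguments would be surplus (-Wformat-extra-args).
 */
static int sketch_format(char *buf, size_t len, int type)
{
	assert(buf != NULL);
	return snprintf(buf, len,
			SKETCH_FW_BUG "Invalid subtable type %d found.\n",
			type);
}
```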

Re: [PATCH -next] acpi: fpdt: drop errant comma in pr_info()

2021-01-28 Thread Zhang Rui

On Thu, 2021-01-28 at 15:56 -0800, Joe Perches wrote:
> On Thu, 2021-01-28 at 15:25 -0800, Randy Dunlap wrote:
> > Drop a mistaken comma in the pr_info() args to prevent the
> > build warning.
> > 
> > ../drivers/acpi/acpi_fpdt.c: In function 'acpi_init_fpdt':
> > ../include/linux/kern_levels.h:5:18: warning: too many arguments
> > for format [-Wformat-extra-args]
> > ../drivers/acpi/acpi_fpdt.c:255:4: note: in expansion of macro
> > 'pr_info'
> > pr_info(FW_BUG, "Invalid subtable type %d found.\n",
> 
> []
> > --- linux-next-20210128.orig/drivers/acpi/acpi_fpdt.c
> > +++ linux-next-20210128/drivers/acpi/acpi_fpdt.c
> > @@ -252,7 +252,7 @@ void acpi_init_fpdt(void)
> >   subtable->type);
> > break;
> > default:
> > -   pr_info(FW_BUG, "Invalid subtable type %d
> > found.\n",
> > +   pr_info(FW_BUG "Invalid subtable type %d
> > found.\n",
> >subtable->type);
> 
> Another question would be why is the pr_info when all the other
> FW_BUG uses in this file are pr_err
> 
Here, this FW_BUG just means an unrecognized subtable is found, and it
should not affect the other subtables that are already supported by
this driver. So that's why we didn't use pr_err.
In fact, I've just posted a V2 patch, 
https://patchwork.kernel.org/project/linux-acpi/patch/20210129061548.13448-1-rui.zh...@intel.com/
and I prefer to continue processing even if this FW_BUG is detected.

> One would think it's at least a defect of some time.
> I would think it should at least be pr_notice or pr_warn

I'm also okay with pr_notice/pr_warn here.
This FW_BUG should be really rare.

thanks,
rui
> 
> Documentation/admin-guide/kernel-parameters.txt-1 (KERN_ALERT)    action must be taken immediately
> Documentation/admin-guide/kernel-parameters.txt-2 (KERN_CRIT)     critical conditions
> Documentation/admin-guide/kernel-parameters.txt-3 (KERN_ERR)      error conditions
> Documentation/admin-guide/kernel-parameters.txt-4 (KERN_WARNING)  warning conditions
> Documentation/admin-guide/kernel-parameters.txt-5 (KERN_NOTICE)   normal but significant condition
> Documentation/admin-guide/kernel-parameters.txt:6 (KERN_INFO)     informational
> Documentation/admin-guide/kernel-parameters.txt-7 (KERN_DEBUG)    debug-level messages
> 
>

Re: [PATCH v2 2/2] thermal: Move therm_throt there from x86/mce

2021-01-27 Thread Zhang Rui

On Mon, 2021-01-25 at 14:05 +0100, Borislav Petkov wrote:
> From: Borislav Petkov 
> 
> This functionality has nothing to do with MCE, move it to the thermal
> framework and untangle it from MCE.
> 
> Have thermal_set_handler() check the build-time assigned default
> handler
> stub was the one used before therm_throt assigns a new one.
> 
> Requested-by: Peter Zijlstra 
> Signed-off-by: Borislav Petkov 

Acked-by: Zhang Rui 

thanks,
rui
> ---
>  arch/x86/Kconfig  |  4 ---
>  arch/x86/include/asm/irq.h|  4 +++
>  arch/x86/include/asm/mce.h| 16 --
>  arch/x86/include/asm/thermal.h| 21 ++
>  arch/x86/kernel/cpu/intel.c   |  3 ++
>  arch/x86/kernel/cpu/mce/Makefile  |  2 --
>  arch/x86/kernel/cpu/mce/intel.c   |  1 -
>  arch/x86/kernel/irq.c | 29
> +++
>  drivers/thermal/intel/Kconfig |  4 +++
>  drivers/thermal/intel/Makefile|  1 +
>  .../thermal/intel}/therm_throt.c  | 25 ++--
>  drivers/thermal/intel/x86_pkg_temp_thermal.c  |  3 +-
>  12 files changed, 67 insertions(+), 46 deletions(-)
>  create mode 100644 arch/x86/include/asm/thermal.h
>  rename {arch/x86/kernel/cpu/mce =>
> drivers/thermal/intel}/therm_throt.c (97%)
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 21f851179ff0..9989db3a9bf5 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1158,10 +1158,6 @@ config X86_MCE_INJECT
> If you don't know what a machine check is and you don't do
> kernel
> QA it is safe to say n.
>  
> -config X86_THERMAL_VECTOR
> - def_bool y
> - depends on X86_MCE_INTEL
> -
>  source "arch/x86/events/Kconfig"
>  
>  config X86_LEGACY_VM86
> diff --git a/arch/x86/include/asm/irq.h b/arch/x86/include/asm/irq.h
> index 528c8a71fe7f..ad65fe7dceb1 100644
> --- a/arch/x86/include/asm/irq.h
> +++ b/arch/x86/include/asm/irq.h
> @@ -53,4 +53,8 @@ void arch_trigger_cpumask_backtrace(const struct
> cpumask *mask,
>  #define arch_trigger_cpumask_backtrace
> arch_trigger_cpumask_backtrace
>  #endif
>  
> +#ifdef CONFIG_X86_THERMAL_VECTOR
> +void thermal_set_handler(void (*handler)(void));
> +#endif
> +
>  #endif /* _ASM_X86_IRQ_H */
> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> index def9aa5e1fa4..ddfb3cad8dff 100644
> --- a/arch/x86/include/asm/mce.h
> +++ b/arch/x86/include/asm/mce.h
> @@ -288,22 +288,6 @@ extern void (*mce_threshold_vector)(void);
>  /* Deferred error interrupt handler */
>  extern void (*deferred_error_int_vector)(void);
>  
> -/*
> - * Thermal handler
> - */
> -
> -void intel_init_thermal(struct cpuinfo_x86 *c);
> -
> -/* Interrupt Handler for core thermal thresholds */
> -extern int (*platform_thermal_notify)(__u64 msr_val);
> -
> -/* Interrupt Handler for package thermal thresholds */
> -extern int (*platform_thermal_package_notify)(__u64 msr_val);
> -
> -/* Callback support of rate control, return true, if
> - * callback has rate control */
> -extern bool (*platform_thermal_package_rate_control)(void);
> -
>  /*
>   * Used by APEI to report memory error via /dev/mcelog
>   */
> diff --git a/arch/x86/include/asm/thermal.h
> b/arch/x86/include/asm/thermal.h
> new file mode 100644
> index ..58b0e0a4af6e
> --- /dev/null
> +++ b/arch/x86/include/asm/thermal.h
> @@ -0,0 +1,21 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_X86_THERMAL_H
> +#define _ASM_X86_THERMAL_H
> +
> +/* Interrupt Handler for package thermal thresholds */
> +extern int (*platform_thermal_package_notify)(__u64 msr_val);
> +
> +/* Interrupt Handler for core thermal thresholds */
> +extern int (*platform_thermal_notify)(__u64 msr_val);
> +
> +/* Callback support of rate control, return true, if
> + * callback has rate control */
> +extern bool (*platform_thermal_package_rate_control)(void);
> +
> +#ifdef CONFIG_X86_THERMAL_VECTOR
> +void intel_init_thermal(struct cpuinfo_x86 *c);
> +#else
> +static inline void intel_init_thermal(struct cpuinfo_x86 *c) { }
> +#endif
> +
> +#endif /* _ASM_X86_THERMAL_H */
> diff --git a/arch/x86/kernel/cpu/intel.c
> b/arch/x86/kernel/cpu/intel.c
> index 59a1e3ce3f14..71221af87cb1 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -24,6 +24,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #ifdef CONFIG_X86_64
>  #include 
> @@ -719,6 +720,8 @@ static void init_intel(struct cpuinfo_x86 *c)
>   tsx_disable();
>  
>
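
As an illustration of the handler check described in the commit message
above ("check the build-time assigned default handler stub was the one
used before therm_throt assigns a new one"), here is a minimal
user-space sketch. All sketch_* names are hypothetical and are not the
kernel's symbols; the sketch also assumes that a second registration is
simply refused, which is one possible policy and not necessarily what
the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static int sketch_unexpected_count;	/* times the default stub ran */
static int sketch_handled_count;	/* times the installed handler ran */

/* Default stub, assigned at build time. */
static void sketch_unexpected_interrupt(void)
{
	sketch_unexpected_count++;
}

static void (*sketch_vector)(void) = sketch_unexpected_interrupt;

/*
 * Install a handler only if the default stub is still in place.
 * Passing NULL restores the default stub.
 * Returns false when another handler is already installed.
 */
static bool sketch_set_handler(void (*handler)(void))
{
	if (handler == NULL) {
		sketch_vector = sketch_unexpected_interrupt;
		return true;
	}
	if (sketch_vector != sketch_unexpected_interrupt)
		return false;
	sketch_vector = handler;
	return true;
}

/* Example handler a thermal driver might register. */
static void sketch_throttle_handler(void)
{
	sketch_handled_count++;
}

/* Dispatch through whichever handler is currently installed. */
static void sketch_raise_interrupt(void)
{
	assert(sketch_vector != NULL);
	sketch_vector();
}
```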

Re: linux-next: build warning after merge of the pm tree

2021-01-26 Thread Zhang Rui

Hi, Stephen,

Sorry that I missed this build warning in the first place, thanks for
reporting.
The patch below fixes it.

BTW, Rafael, I think acpi_init_fpdt() also needs to be fixed to have a
proper return value.
Do you prefer an incremental patch or a V2 of 208757d71098 ("ACPI:
tables: introduce support for FPDT table"), which includes all these
fixes?

thanks,
rui

From 2b8ed148351875b4bf227602a97edba12d08af7e Mon Sep 17 00:00:00 2001
From: Zhang Rui 
Date: Wed, 27 Jan 2021 11:33:33 +0800
Subject: [PATCH] ACPI: FPDT: fix build warning

Fix a build warning,
In file included from ./include/linux/printk.h:7:0,
 from ./include/linux/kernel.h:16,
 from ./include/linux/list.h:9,
 from ./include/linux/kobject.h:19,
 from ./include/linux/of.h:17,
 from ./include/linux/irqdomain.h:35,
 from ./include/linux/acpi.h:13,
 from drivers/acpi/acpi_fpdt.c:11:
drivers/acpi/acpi_fpdt.c: In function ‘acpi_init_fpdt’:
./include/linux/kern_levels.h:5:18: warning: too many arguments for format 
[-Wformat-extra-args]
 #define KERN_SOH "\001"  /* ASCII Start Of Header */
  ^
./include/linux/kern_levels.h:14:19: note: in expansion of macro ‘KERN_SOH’
 #define KERN_INFO KERN_SOH "6" /* informational */
   ^~~~
./include/linux/printk.h:373:9: note: in expansion of macro ‘KERN_INFO’
  printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
 ^
drivers/acpi/acpi_fpdt.c:255:4: note: in expansion of macro ‘pr_info’
pr_info(FW_BUG, "Invalid subtable type %d found.\n",
^~~

Signed-off-by: Zhang Rui 
---
 drivers/acpi/acpi_fpdt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/acpi/acpi_fpdt.c b/drivers/acpi/acpi_fpdt.c
index b8108117262a..64d5733dca0b 100644
--- a/drivers/acpi/acpi_fpdt.c
+++ b/drivers/acpi/acpi_fpdt.c
@@ -252,7 +252,7 @@ void acpi_init_fpdt(void)
  subtable->type);
break;
default:
-   pr_info(FW_BUG, "Invalid subtable type %d found.\n",
+   pr_info(FW_BUG "Invalid subtable type %d found.\n",
   subtable->type);
return;
}
-- 
2.17.1


On Wed, 2021-01-27 at 12:43 +1100, Stephen Rothwell wrote:
> Hi all,
> 
> After merging the pm tree, today's linux-next build (x86_64
> allmodconfig)
> produced this warning:
> 
> In file included from include/linux/printk.h:7,
>  from include/linux/kernel.h:16,
>  from include/linux/list.h:9,
>  from include/linux/kobject.h:19,
>  from include/linux/of.h:17,
>  from include/linux/irqdomain.h:35,
>  from include/linux/acpi.h:13,
>  from drivers/acpi/acpi_fpdt.c:11:
> drivers/acpi/acpi_fpdt.c: In function 'acpi_init_fpdt':
> include/linux/kern_levels.h:5:18: warning: too many arguments for
> format [-Wformat-extra-args]
> 5 | #define KERN_SOH "\001"  /* ASCII Start Of Header */
>   |  ^~
> include/linux/kern_levels.h:14:19: note: in expansion of macro
> 'KERN_SOH'
>14 | #define KERN_INFO KERN_SOH "6" /* informational */
>   |   ^~~~
> include/linux/printk.h:373:9: note: in expansion of macro 'KERN_INFO'
>   373 |  printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
>   | ^
> drivers/acpi/acpi_fpdt.c:255:4: note: in expansion of macro 'pr_info'
>   255 |pr_info(FW_BUG, "Invalid subtable type %d found.\n",
>   |^~~
> 
> Introduced by commit
> 
>   208757d71098 ("ACPI: tables: introduce support for FPDT table")
>

Re: [PATCH v2 2/2] thermal: Move therm_throt there from x86/mce

2021-01-25 Thread Zhang Rui

Hi, Borislav,

Thanks for the patch. CC Srinivas.

On Mon, 2021-01-25 at 14:05 +0100, Borislav Petkov wrote:
> From: Borislav Petkov 
> 
> This functionality has nothing to do with MCE, move it to the thermal
> framework and untangle it from MCE.
> 
Agreed.

Just one question:
there is a lot of overlap between this kernel thermal throttling code and
the x86_pkg_temp_thermal driver. Is it possible to combine these two
pieces of code?

thanks,
rui


> Have thermal_set_handler() check the build-time assigned default
> handler
> stub was the one used before therm_throt assigns a new one.
> 

> Requested-by: Peter Zijlstra 
> Signed-off-by: Borislav Petkov 
> ---
>  arch/x86/Kconfig  |  4 ---
>  arch/x86/include/asm/irq.h|  4 +++
>  arch/x86/include/asm/mce.h| 16 --
>  arch/x86/include/asm/thermal.h| 21 ++
>  arch/x86/kernel/cpu/intel.c   |  3 ++
>  arch/x86/kernel/cpu/mce/Makefile  |  2 --
>  arch/x86/kernel/cpu/mce/intel.c   |  1 -
>  arch/x86/kernel/irq.c | 29
> +++
>  drivers/thermal/intel/Kconfig |  4 +++
>  drivers/thermal/intel/Makefile|  1 +
>  .../thermal/intel}/therm_throt.c  | 25 ++--
>  drivers/thermal/intel/x86_pkg_temp_thermal.c  |  3 +-
>  12 files changed, 67 insertions(+), 46 deletions(-)
>  create mode 100644 arch/x86/include/asm/thermal.h
>  rename {arch/x86/kernel/cpu/mce =>
> drivers/thermal/intel}/therm_throt.c (97%)
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 21f851179ff0..9989db3a9bf5 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1158,10 +1158,6 @@ config X86_MCE_INJECT
> If you don't know what a machine check is and you don't do
> kernel
> QA it is safe to say n.
>  
> -config X86_THERMAL_VECTOR
> - def_bool y
> - depends on X86_MCE_INTEL
> -
>  source "arch/x86/events/Kconfig"
>  
>  config X86_LEGACY_VM86
> diff --git a/arch/x86/include/asm/irq.h b/arch/x86/include/asm/irq.h
> index 528c8a71fe7f..ad65fe7dceb1 100644
> --- a/arch/x86/include/asm/irq.h
> +++ b/arch/x86/include/asm/irq.h
> @@ -53,4 +53,8 @@ void arch_trigger_cpumask_backtrace(const struct
> cpumask *mask,
>  #define arch_trigger_cpumask_backtrace
> arch_trigger_cpumask_backtrace
>  #endif
>  
> +#ifdef CONFIG_X86_THERMAL_VECTOR
> +void thermal_set_handler(void (*handler)(void));
> +#endif
> +
>  #endif /* _ASM_X86_IRQ_H */
> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> index def9aa5e1fa4..ddfb3cad8dff 100644
> --- a/arch/x86/include/asm/mce.h
> +++ b/arch/x86/include/asm/mce.h
> @@ -288,22 +288,6 @@ extern void (*mce_threshold_vector)(void);
>  /* Deferred error interrupt handler */
>  extern void (*deferred_error_int_vector)(void);
>  
> -/*
> - * Thermal handler
> - */
> -
> -void intel_init_thermal(struct cpuinfo_x86 *c);
> -
> -/* Interrupt Handler for core thermal thresholds */
> -extern int (*platform_thermal_notify)(__u64 msr_val);
> -
> -/* Interrupt Handler for package thermal thresholds */
> -extern int (*platform_thermal_package_notify)(__u64 msr_val);
> -
> -/* Callback support of rate control, return true, if
> - * callback has rate control */
> -extern bool (*platform_thermal_package_rate_control)(void);
> -
>  /*
>   * Used by APEI to report memory error via /dev/mcelog
>   */
> diff --git a/arch/x86/include/asm/thermal.h
> b/arch/x86/include/asm/thermal.h
> new file mode 100644
> index ..58b0e0a4af6e
> --- /dev/null
> +++ b/arch/x86/include/asm/thermal.h
> @@ -0,0 +1,21 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_X86_THERMAL_H
> +#define _ASM_X86_THERMAL_H
> +
> +/* Interrupt Handler for package thermal thresholds */
> +extern int (*platform_thermal_package_notify)(__u64 msr_val);
> +
> +/* Interrupt Handler for core thermal thresholds */
> +extern int (*platform_thermal_notify)(__u64 msr_val);
> +
> +/* Callback support of rate control, return true, if
> + * callback has rate control */
> +extern bool (*platform_thermal_package_rate_control)(void);
> +
> +#ifdef CONFIG_X86_THERMAL_VECTOR
> +void intel_init_thermal(struct cpuinfo_x86 *c);
> +#else
> +static inline void intel_init_thermal(struct cpuinfo_x86 *c) { }
> +#endif
> +
> +#endif /* _ASM_X86_THERMAL_H */
> diff --git a/arch/x86/kernel/cpu/intel.c
> b/arch/x86/kernel/cpu/intel.c
> index 59a1e3ce3f14..71221af87cb1 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -24,6 +24,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #ifdef CONFIG_X86_64
>  #include 
> @@ -719,6 +720,8 @@ static void init_intel(struct cpuinfo_x86 *c)
>   tsx_disable();
>  
>   split_lock_init();
> +
> + intel_init_thermal(c);
>  }
>  
>  #ifdef CONFIG_X86_32
> diff --git a/arch/x86/kernel/cpu/mce/Makefile
> b/arch/x86/kernel/cpu/mce/Makefile
> index

RE: [PATCH 3/3] perf/x86/rapl: Fix psys-energy event on Intel SPR platform

2021-01-24 Thread Zhang, Rui

Hi, Peter,

> -Original Message-
> From: Zhang, Rui
> Sent: Sunday, January 17, 2021 10:34 PM
> To: 'Peter Zijlstra' 
> Cc: mi...@redhat.com; a...@kernel.org; mark.rutl...@arm.com;
> alexander.shish...@linux.intel.com; jo...@redhat.com;
> namhy...@kernel.org; linux-kernel@vger.kernel.org; x...@kernel.org;
> kan.li...@linux.intel.com; a...@linux.intel.com
> Subject: RE: [PATCH 3/3] perf/x86/rapl: Fix psys-energy event on Intel SPR
> platform
> 
> Hi, Peter,
> 
> > -Original Message-
> > From: Peter Zijlstra 
> > Sent: Saturday, January 16, 2021 8:50 PM
> > To: Zhang, Rui 
> > Cc: mi...@redhat.com; a...@kernel.org; mark.rutl...@arm.com;
> > alexander.shish...@linux.intel.com; jo...@redhat.com;
> > namhy...@kernel.org; linux-kernel@vger.kernel.org; x...@kernel.org;
> > kan.li...@linux.intel.com; a...@linux.intel.com
> > Subject: Re: [PATCH 3/3] perf/x86/rapl: Fix psys-energy event on Intel
> > SPR platform
> > Importance: High
> >
> > On Fri, Jan 15, 2021 at 05:22:08PM +0800, Zhang Rui wrote:
> > > There are several things special for the RAPL Psys energy counter,
> > > on Intel Sapphire Rapids platform.
> > > 1. it contains one Psys master package, and only CPUs on the master
> > >package can read valid value of the Psys energy counter, reading the
> > >MSR on CPUs in the slave package returns 0.
> > > 2. The master package does not have to be Physical package 0. And when
> > >all the CPUs on the Psys master package are offlined, we lose the Psys
> > >energy counter, at runtime.
> > > 3. The Psys energy counter can be disabled by BIOS, while all the other
> > >energy counters are not affected.
> > >
> > > It is not easy to handle all of these in the current RAPL PMU design
> > > because
> > > a) perf_msr_probe() validates the MSR on some random CPU, which may
> > either
> > >be in the Psys master package or in the Psys slave package.
> > > b) all the RAPL events share the same PMU, and there is not API to
> remove
> > >the psys-energy event cleanly, without affecting the other events in
> > >the same PMU.
> > >
> > > This patch addresses the problems in a simple way.
> > >
> > > First, by setting .no_check bit for RAPL Psys MSR, the psys-energy
> > > event is always added, so we don't have to check the Psys
> > > ENERGY_STATUS MSR on master package.
> > >
> > > Then, rapl_not_visible() is removed because 1. it is useless for
> > > RAPL MSRs with .no_check cleared, because the
> > >.is_visible() callbacks is always overridden in perf_msr_probe().
> > > 2. it is useless for RAPL MSRs with .no_check set, because we actually
> > >want the sysfs attributes always be visible for those MSRs.
> > >
> > > With the above changes, we always probe the psys-energy event on
> > > Intel SPR platform. Difference is that the event counter returns 0
> > > when the Psys RAPL Domain is disabled by BIOS, or the Psys master
> > > package is
> > offlined.
> >
> > Maybe I'm too tired, but I cannot follow. How does this cure the fact
> > that the rapl_cpu_mask might not include that master thing. And how
> > can software detect what the master thing is to begin with?
> 
> To make things simple, I ignore the master thing, and probe the psys-energy
> counter blindly on SPR.
> So rapl_cpu_mask still includes all the online CPUs.
> This means that psys-energy is "valid" on all packages, and it just returns
> different values on different packages.
> AKA, whole system power consumption on Psys master package, and Zero
> on Psys slave packages.
> 
In short, the current code does not allow a RAPL energy counter to return 0,
and all this patch does is allow the Psys energy counter to return 0.
In this way, the Psys event is "valid" on all CPUs, so we don't need to handle
the master package specially.
The drawback is that we still see the psys-energy event, but with a 0 readout,
when the Psys counter is not available (master package offlined, or Psys
disabled).

TBH, I'm not quite sure whether I understood your original question correctly,
so please let me know if anything is still unclear.

Thanks,
rui
> 
> Thanks,
> rui
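
One consequence of the approach described above (system-wide energy on
the Psys master package, zero on slave packages) is that a consumer can
simply add up the per-package readings. The sketch below illustrates
that arithmetic only; sketch_psys_total() is a hypothetical helper, and
summing across packages is an assumption made for this example, not
something the patch itself implements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Sum per-package psys-energy readings. Slave packages contribute 0, so
 * the total equals the master package's reading. A total of 0 suggests
 * that no package reported a value (for example Psys disabled by BIOS,
 * or the master package offline).
 */
static uint64_t sketch_psys_total(const uint64_t *per_pkg, size_t n_pkgs)
{
	uint64_t total = 0;

	assert(per_pkg != NULL || n_pkgs == 0);

	for (size_t i = 0; i < n_pkgs; i++)
		total += per_pkg[i];
	return total;
}
```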

RE: [PATCH 0/2] thermal: Replace thermal_notify_framework with thermal_zone_device_update

2021-01-19 Thread Zhang, Rui

Hi, Thara,

Thanks for the cleanup. I've proposed similar patches previously.
https://patchwork.kernel.org/project/linux-pm/patch/20200430063229.6182-2-rui.zh...@intel.com/
https://patchwork.kernel.org/project/linux-pm/patch/20200430063229.6182-3-rui.zh...@intel.com/
Can you please also address the comments from the previous discussion, such as
the doc cleanup?

Thanks,
rui

> -Original Message-
> From: Thara Gopinath 
> Sent: Tuesday, January 19, 2021 10:06 PM
> To: Zhang, Rui ; daniel.lezc...@linaro.org;
> kv...@codeaurora.org; da...@davemloft.net; k...@kernel.org; Coelho,
> Luciano 
> Cc: linux-wirel...@vger.kernel.org; linux-kernel@vger.kernel.org;
> net...@vger.kernel.org; linux...@vger.kernel.org; am...@kernel.org;
> Errera, Nathan 
> Subject: [PATCH 0/2] thermal: Replace thermal_notify_framework with
> thermal_zone_device_update
> Importance: High
> 
> thermal_notify_framework just updates for a single trip point where as
> thermal_zone_device_update does other bookkeeping like updating the
> temperature of the thermal zone, running through the list of trip points and
> setting the next trip point etc. Since  the later is a more thorough version 
> of
> former, replace thermal_notify_framework with
> thermal_zone_device_update.
> 
> Thara Gopinath (2):
>   net: wireless: intel: iwlwifi: mvm: tt: Replace
> thermal_notify_framework
>   drivers: thermal: Remove thermal_notify_framework
> 
>  drivers/net/wireless/intel/iwlwifi/mvm/tt.c |  4 ++--
>  drivers/thermal/thermal_core.c  | 18 --
>  include/linux/thermal.h |  4 
>  3 files changed, 2 insertions(+), 24 deletions(-)
> 
> --
> 2.25.1

RE: [PATCH v2] thermal/core: Make cooling device state change private

2021-01-18 Thread Zhang, Rui




> -Original Message-
> From: Daniel Lezcano 
> Sent: Tuesday, January 19, 2021 1:38 AM
> To: daniel.lezc...@linaro.org; Zhang, Rui 
> Cc: Guenter Roeck ; Kamil Debski ;
> Bartlomiej Zolnierkiewicz ; Jean Delvare
> ; Neil Armstrong ; Amit
> Kucheria ; open list:PWM FAN DRIVER  hw...@vger.kernel.org>; open list ; open
> list:KHADAS MCU MFD DRIVER ; open
> list:THERMAL 
> Subject: [PATCH v2] thermal/core: Make cooling device state change private
> Importance: High
> 
> The change of the cooling device state should be used by the governor or at
> least by the core code, not by the drivers themselves.
> 
> Remove the API usage and move the function declaration to the internal
> headers.
> 
> Signed-off-by: Daniel Lezcano 
> Acked-by: Guenter Roeck 

Acked-by: Zhang Rui 

Thanks,
rui
> ---
>  drivers/hwmon/pwm-fan.c  | 1 -
>  drivers/thermal/khadas_mcu_fan.c | 1 -
>  drivers/thermal/thermal_core.h   | 2 ++
>  include/linux/thermal.h  | 3 ---
>  4 files changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/hwmon/pwm-fan.c b/drivers/hwmon/pwm-fan.c
> index bdba2143021a..0b1159ceac9b 100644
> --- a/drivers/hwmon/pwm-fan.c
> +++ b/drivers/hwmon/pwm-fan.c
> @@ -378,7 +378,6 @@ static int pwm_fan_probe(struct platform_device *pdev)
>   return ret;
>   }
>   ctx->cdev = cdev;
> - thermal_cdev_update(cdev);
>   }
> 
>   return 0;
> diff --git a/drivers/thermal/khadas_mcu_fan.c b/drivers/thermal/khadas_mcu_fan.c
> index 9eadd2d6413e..d35e5313bea4 100644
> --- a/drivers/thermal/khadas_mcu_fan.c
> +++ b/drivers/thermal/khadas_mcu_fan.c
> @@ -100,7 +100,6 @@ static int khadas_mcu_fan_probe(struct platform_device *pdev)
>   return ret;
>   }
>   ctx->cdev = cdev;
> - thermal_cdev_update(cdev);
> 
>   return 0;
>  }
> diff --git a/drivers/thermal/thermal_core.h b/drivers/thermal/thermal_core.h
> index 90f9a80c8b23..86b8cef7310e 100644
> --- a/drivers/thermal/thermal_core.h
> +++ b/drivers/thermal/thermal_core.h
> @@ -65,6 +65,8 @@ static inline bool cdev_is_power_actor(struct thermal_cooling_device *cdev)
>   cdev->ops->power2state;
>  }
> 
> +void thermal_cdev_update(struct thermal_cooling_device *);
> +
>  /**
>   * struct thermal_trip - representation of a point in temperature domain
>   * @np: pointer to struct device_node that this trip point was created from
> diff --git a/include/linux/thermal.h b/include/linux/thermal.h
> index 1e686404951b..6ac7bb1d2b1f 100644
> --- a/include/linux/thermal.h
> +++ b/include/linux/thermal.h
> @@ -390,7 +390,6 @@ int thermal_zone_get_temp(struct thermal_zone_device *tz, int *temp);
>  int thermal_zone_get_slope(struct thermal_zone_device *tz);
>  int thermal_zone_get_offset(struct thermal_zone_device *tz);
> 
> -void thermal_cdev_update(struct thermal_cooling_device *);
>  void thermal_notify_framework(struct thermal_zone_device *, int);
>  int thermal_zone_device_enable(struct thermal_zone_device *tz);
>  int thermal_zone_device_disable(struct thermal_zone_device *tz);
> @@ -437,8 +436,6 @@ static inline int thermal_zone_get_offset(
>   struct thermal_zone_device *tz)
>  { return -ENODEV; }
> 
> -static inline void thermal_cdev_update(struct thermal_cooling_device *cdev)
> -{ }
>  static inline void thermal_notify_framework(struct thermal_zone_device *tz,
>   int trip)
>  { }
> --
> 2.17.1

RE: [PATCH 2/3] perf/x86/rapl: Fix energy counter detection

2021-01-17 Thread Zhang, Rui




> -Original Message-
> From: Peter Zijlstra 
> Sent: Saturday, January 16, 2021 8:48 PM
> To: Zhang, Rui 
> Cc: mi...@redhat.com; a...@kernel.org; mark.rutl...@arm.com;
> alexander.shish...@linux.intel.com; jo...@redhat.com;
> namhy...@kernel.org; linux-kernel@vger.kernel.org; x...@kernel.org;
> kan.li...@linux.intel.com; a...@linux.intel.com
> Subject: Re: [PATCH 2/3] perf/x86/rapl: Fix energy counter detection
> Importance: High
> 
> On Sat, Jan 16, 2021 at 08:19:35AM +, Zhang, Rui wrote:
> >
> >
> > > -Original Message-
> > > From: Peter Zijlstra 
> > > Sent: Saturday, January 16, 2021 4:03 AM
> > > To: Zhang, Rui 
> > > Cc: mi...@redhat.com; a...@kernel.org; mark.rutl...@arm.com;
> > > alexander.shish...@linux.intel.com; jo...@redhat.com;
> > > namhy...@kernel.org; linux-kernel@vger.kernel.org; x...@kernel.org;
> > > kan.li...@linux.intel.com; a...@linux.intel.com
> > > Subject: Re: [PATCH 2/3] perf/x86/rapl: Fix energy counter detection
> > > Importance: High
> > >
> > > On Fri, Jan 15, 2021 at 05:22:07PM +0800, Zhang Rui wrote:
> > > > In the RAPL ENERGY_COUNTER MSR, only the lower 32bits represent
> > > > the energy counter, and the higher 32bits are reserved.
> > > >
> > > > Add the MSR mask for these MSRs to fix a problem that the RAPL PMU
> > > > events are added erroneously when higher 32bits contain non-zero
> value.
> > >
> > > Why would these high bits be non-zero?
> >
> > On SPR platform, the high bits of Psys energy counter are reused for other
> purpose.
> > High bits for other RAPL domains energy counters still return 0.
> >
> > I didn't mention this because I thought this patch should be okay as a
> generic fix.
> 
> But it doesn't fix anything.. there's not anything broken, except on that daft
> SPR thing.

Well, yes.
Before SPR, this is just a potential issue. But the behavior on SPR suggests
that this potential issue may become a real one.
So are you suggesting that I also include the SPR information as
justification for this patch?

Thanks,
rui

RE: [PATCH 3/3] perf/x86/rapl: Fix psys-energy event on Intel SPR platform

2021-01-17 Thread Zhang, Rui

Hi, Peter,

> -Original Message-
> From: Peter Zijlstra 
> Sent: Saturday, January 16, 2021 8:50 PM
> To: Zhang, Rui 
> Cc: mi...@redhat.com; a...@kernel.org; mark.rutl...@arm.com;
> alexander.shish...@linux.intel.com; jo...@redhat.com;
> namhy...@kernel.org; linux-kernel@vger.kernel.org; x...@kernel.org;
> kan.li...@linux.intel.com; a...@linux.intel.com
> Subject: Re: [PATCH 3/3] perf/x86/rapl: Fix psys-energy event on Intel SPR
> platform
> Importance: High
> 
> On Fri, Jan 15, 2021 at 05:22:08PM +0800, Zhang Rui wrote:
> > There are several things special for the RAPL Psys energy counter, on
> > Intel Sapphire Rapids platform.
> > 1. it contains one Psys master package, and only CPUs on the master
> >package can read valid value of the Psys energy counter, reading the
> >MSR on CPUs in the slave package returns 0.
> > 2. The master package does not have to be Physical package 0. And when
> >all the CPUs on the Psys master package are offlined, we lose the Psys
> >energy counter, at runtime.
> > 3. The Psys energy counter can be disabled by BIOS, while all the other
> >energy counters are not affected.
> >
> > It is not easy to handle all of these in the current RAPL PMU design
> > because
> > a) perf_msr_probe() validates the MSR on some random CPU, which may
> either
> >be in the Psys master package or in the Psys slave package.
> > b) all the RAPL events share the same PMU, and there is not API to remove
> >the psys-energy event cleanly, without affecting the other events in
> >the same PMU.
> >
> > This patch addresses the problems in a simple way.
> >
> > First, by setting .no_check bit for RAPL Psys MSR, the psys-energy
> > event is always added, so we don't have to check the Psys
> > ENERGY_STATUS MSR on master package.
> >
> > Then, rapl_not_visible() is removed because 1. it is useless for RAPL
> > MSRs with .no_check cleared, because the
> >.is_visible() callbacks is always overridden in perf_msr_probe().
> > 2. it is useless for RAPL MSRs with .no_check set, because we actually
> >want the sysfs attributes always be visible for those MSRs.
> >
> > With the above changes, we always probe the psys-energy event on Intel
> > SPR platform. Difference is that the event counter returns 0 when the
> > Psys RAPL Domain is disabled by BIOS, or the Psys master package is
> offlined.
> 
> Maybe I'm too tired, but I cannot follow. How does this cure the fact that the
> rapl_cpu_mask might not include that master thing. And how can software
> detect what the master thing is to begin with?

To make things simple, I ignore the master thing, and probe the psys-energy
counter blindly on SPR.
So rapl_cpu_mask still includes all the online CPUs.
This means that psys-energy is "valid" on all packages, and it just returns
different values on different packages.
That is, whole-system power consumption on the Psys master package, and zero
on the Psys slave packages.

Not sure if I answered your question or not.

Thanks,
rui

RE: [PATCH 2/3] perf/x86/rapl: Fix energy counter detection

2021-01-16 Thread Zhang, Rui




> -Original Message-
> From: Peter Zijlstra 
> Sent: Saturday, January 16, 2021 4:03 AM
> To: Zhang, Rui 
> Cc: mi...@redhat.com; a...@kernel.org; mark.rutl...@arm.com;
> alexander.shish...@linux.intel.com; jo...@redhat.com;
> namhy...@kernel.org; linux-kernel@vger.kernel.org; x...@kernel.org;
> kan.li...@linux.intel.com; a...@linux.intel.com
> Subject: Re: [PATCH 2/3] perf/x86/rapl: Fix energy counter detection
> Importance: High
> 
> On Fri, Jan 15, 2021 at 05:22:07PM +0800, Zhang Rui wrote:
> > In the RAPL ENERGY_COUNTER MSR, only the lower 32bits represent the
> > energy counter, and the higher 32bits are reserved.
> >
> > Add the MSR mask for these MSRs to fix a problem that the RAPL PMU
> > events are added erroneously when higher 32bits contain non-zero value.
> 
> Why would these high bits be non-zero?

On the SPR platform, the high bits of the Psys energy counter are reused for
other purposes.
The high bits of the other RAPL domains' energy counters still return 0.

I didn't mention this because I thought this patch should be okay as a
generic fix.

Thanks,
rui

[PATCH 3/3] perf/x86/rapl: Fix psys-energy event on Intel SPR platform

2021-01-15 Thread Zhang Rui

There are several things special about the RAPL Psys energy counter on the
Intel Sapphire Rapids platform.
1. It contains one Psys master package, and only CPUs on the master
   package can read a valid value of the Psys energy counter; reading the
   MSR on CPUs in the slave package returns 0.
2. The master package does not have to be physical package 0. And when
   all the CPUs on the Psys master package are offlined, we lose the Psys
   energy counter at runtime.
3. The Psys energy counter can be disabled by BIOS, while all the other
   energy counters are not affected.

It is not easy to handle all of these in the current RAPL PMU design
because
a) perf_msr_probe() validates the MSR on some random CPU, which may either
   be in the Psys master package or in the Psys slave package.
b) all the RAPL events share the same PMU, and there is no API to remove
   the psys-energy event cleanly, without affecting the other events in
   the same PMU.

This patch addresses the problems in a simple way.

First, by setting .no_check bit for RAPL Psys MSR, the psys-energy event
is always added, so we don't have to check the Psys ENERGY_STATUS MSR on
master package.

Then, rapl_not_visible() is removed because
1. it is useless for RAPL MSRs with .no_check cleared, because the
   .is_visible() callback is always overridden in perf_msr_probe().
2. it is useless for RAPL MSRs with .no_check set, because we actually
   want the sysfs attributes to always be visible for those MSRs.

With the above changes, we always probe the psys-energy event on the Intel
SPR platform. The difference is that the event counter returns 0 when the
Psys RAPL domain is disabled by BIOS, or the Psys master package is offlined.

Signed-off-by: Zhang Rui 
Reviewed-by: Andi Kleen 
---
 arch/x86/events/rapl.c | 21 +
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index 7ed25b2ba05f..f42a70496a24 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -454,16 +454,9 @@ static struct attribute *rapl_events_cores[] = {
NULL,
 };
 
-static umode_t
-rapl_not_visible(struct kobject *kobj, struct attribute *attr, int i)
-{
-   return 0;
-}
-
 static struct attribute_group rapl_events_cores_group = {
.name  = "events",
.attrs = rapl_events_cores,
-   .is_visible = rapl_not_visible,
 };
 
 static struct attribute *rapl_events_pkg[] = {
@@ -476,7 +469,6 @@ static struct attribute *rapl_events_pkg[] = {
 static struct attribute_group rapl_events_pkg_group = {
.name  = "events",
.attrs = rapl_events_pkg,
-   .is_visible = rapl_not_visible,
 };
 
 static struct attribute *rapl_events_ram[] = {
@@ -489,7 +481,6 @@ static struct attribute *rapl_events_ram[] = {
 static struct attribute_group rapl_events_ram_group = {
.name  = "events",
.attrs = rapl_events_ram,
-   .is_visible = rapl_not_visible,
 };
 
 static struct attribute *rapl_events_gpu[] = {
@@ -502,7 +493,6 @@ static struct attribute *rapl_events_gpu[] = {
 static struct attribute_group rapl_events_gpu_group = {
.name  = "events",
.attrs = rapl_events_gpu,
-   .is_visible = rapl_not_visible,
 };
 
 static struct attribute *rapl_events_psys[] = {
@@ -515,7 +505,6 @@ static struct attribute *rapl_events_psys[] = {
 static struct attribute_group rapl_events_psys_group = {
.name  = "events",
.attrs = rapl_events_psys,
-   .is_visible = rapl_not_visible,
 };
 
 static bool test_msr(int idx, void *data)
@@ -534,6 +523,14 @@ static struct perf_msr intel_rapl_msrs[] = {
[PERF_RAPL_PSYS] = { MSR_PLATFORM_ENERGY_STATUS, &rapl_events_psys_group,  test_msr, false, RAPL_MSR_MASK },
 };
 
+static struct perf_msr intel_rapl_spr_msrs[] = {
+   [PERF_RAPL_PP0]  = { MSR_PP0_ENERGY_STATUS,  &rapl_events_cores_group, test_msr, false, RAPL_MSR_MASK },
+   [PERF_RAPL_PKG]  = { MSR_PKG_ENERGY_STATUS,  &rapl_events_pkg_group,   test_msr, false, RAPL_MSR_MASK },
+   [PERF_RAPL_RAM]  = { MSR_DRAM_ENERGY_STATUS, &rapl_events_ram_group,   test_msr, false, RAPL_MSR_MASK },
+   [PERF_RAPL_PP1]  = { MSR_PP1_ENERGY_STATUS,  &rapl_events_gpu_group,   test_msr, false, RAPL_MSR_MASK },
+   [PERF_RAPL_PSYS] = { MSR_PLATFORM_ENERGY_STATUS, &rapl_events_psys_group,  test_msr, true, RAPL_MSR_MASK },
+};
+
 /*
  * Force to PERF_RAPL_MAX size due to:
  * - perf_msr_probe(PERF_RAPL_MAX)
@@ -764,7 +761,7 @@ static struct rapl_model model_spr = {
  BIT(PERF_RAPL_PSYS),
.unit_quirk = RAPL_UNIT_QUIRK_INTEL_SPR,
.msr_power_unit = MSR_RAPL_POWER_UNIT,
-   .rapl_msrs  = intel_rapl_msrs,
+   .rapl_msrs  = intel_rapl_spr_msrs,
 };
 
 static struct rapl_model model_amd_fam17h = {
-- 
2.17.1

[PATCH 1/3] perf/x86/rapl: Add msr mask support

2021-01-15 Thread Zhang Rui

In some cases, when probing a perf MSR, we're probing certain bits of the
MSR instead of the whole register, thus only these bits should be checked.

For example, for RAPL ENERGY_STATUS MSR, only the lower 32 bits represents
the energy counter, and the higher 32bits are reserved.

Introduce a new mask field in struct perf_msr to allow probing certain
bits of a MSR.

This change is transparent to the current perf_msr_probe() users.

Signed-off-by: Zhang Rui 
Reviewed-by: Andi Kleen 
---
 arch/x86/events/probe.c | 5 -
 arch/x86/events/probe.h | 7 ---
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/x86/events/probe.c b/arch/x86/events/probe.c
index 136a1e847254..a0a19c404cb5 100644
--- a/arch/x86/events/probe.c
+++ b/arch/x86/events/probe.c
@@ -28,6 +28,7 @@ perf_msr_probe(struct perf_msr *msr, int cnt, bool zero, void *data)
for (bit = 0; bit < cnt; bit++) {
if (!msr[bit].no_check) {
struct attribute_group *grp = msr[bit].grp;
+   u64 mask;
 
/* skip entry with no group */
if (!grp)
@@ -44,8 +45,10 @@ perf_msr_probe(struct perf_msr *msr, int cnt, bool zero, void *data)
/* Virt sucks; you cannot tell if a R/O MSR is present :/ */
if (rdmsrl_safe(msr[bit].msr, &val))
continue;
+
+   mask = msr[bit].mask ? msr[bit].mask : U64_MAX;
/* Disable zero counters if requested. */
-   if (!zero && !val)
+   if (!zero && !(val & mask))
continue;
 
grp->is_visible = NULL;
diff --git a/arch/x86/events/probe.h b/arch/x86/events/probe.h
index 4c8e0afc5fb5..261b9bda24e3 100644
--- a/arch/x86/events/probe.h
+++ b/arch/x86/events/probe.h
@@ -4,10 +4,11 @@
 #include 
 
 struct perf_msr {
-   u64   msr;
-   struct attribute_group   *grp;
+   u64 msr;
+   struct attribute_group  *grp;
bool(*test)(int idx, void *data);
-   bool  no_check;
+   boolno_check;
+   u64 mask;
 };
 
 unsigned long
-- 
2.17.1
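
For illustration, the masked check added to perf_msr_probe() above can be
modelled in plain user-space C. This is only a sketch:
sketch_counter_present() and SKETCH_RAPL_MSR_MASK are hypothetical names that
do not exist in the kernel, and the mask value is an assumption taken from the
"lower 32 bits" wording of the changelog.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a RAPL mask: keep only the lower 32 bits. */
#define SKETCH_RAPL_MSR_MASK 0xffffffffULL

static_assert(SKETCH_RAPL_MSR_MASK == UINT32_MAX,
	      "mask must cover exactly the low 32 bits");

/*
 * Returns true when the probed value should count as a live counter.
 * A mask of 0 means "check the whole register", mirroring the
 * `mask ? mask : U64_MAX` fallback in the patch. When zero_ok is set,
 * a zero reading is accepted as well.
 */
static bool sketch_counter_present(uint64_t val, uint64_t mask, bool zero_ok)
{
	uint64_t effective = mask ? mask : UINT64_MAX;

	if (!zero_ok && !(val & effective))
		return false;
	return true;
}
```

With a zero mask the whole register is checked, which is why the change is
transparent to existing users that leave the new field unset.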

[PATCH 2/3] perf/x86/rapl: Fix energy counter detection

2021-01-15 Thread Zhang Rui

In the RAPL ENERGY_COUNTER MSR, only the lower 32bits represent the
energy counter, and the higher 32bits are reserved.

Add the MSR mask for these MSRs to fix a problem that the RAPL PMU events
are added erroneously when higher 32bits contain non-zero value.

Signed-off-by: Zhang Rui 
Reviewed-by: Andi Kleen 
---
 arch/x86/events/rapl.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index 7dbbeaacd995..7ed25b2ba05f 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -523,12 +523,15 @@ static bool test_msr(int idx, void *data)
return test_bit(idx, (unsigned long *) data);
 }
 
+/* Only lower 32bits of the MSR represents the energy counter */
+#define RAPL_MSR_MASK 0x
+
 static struct perf_msr intel_rapl_msrs[] = {
-   [PERF_RAPL_PP0]  = { MSR_PP0_ENERGY_STATUS,  &rapl_events_cores_group, test_msr },
-   [PERF_RAPL_PKG]  = { MSR_PKG_ENERGY_STATUS,  &rapl_events_pkg_group,   test_msr },
-   [PERF_RAPL_RAM]  = { MSR_DRAM_ENERGY_STATUS, &rapl_events_ram_group,   test_msr },
-   [PERF_RAPL_PP1]  = { MSR_PP1_ENERGY_STATUS,  &rapl_events_gpu_group,   test_msr },
-   [PERF_RAPL_PSYS] = { MSR_PLATFORM_ENERGY_STATUS, &rapl_events_psys_group,  test_msr },
+   [PERF_RAPL_PP0]  = { MSR_PP0_ENERGY_STATUS,  &rapl_events_cores_group, test_msr, false, RAPL_MSR_MASK },
+   [PERF_RAPL_PKG]  = { MSR_PKG_ENERGY_STATUS,  &rapl_events_pkg_group,   test_msr, false, RAPL_MSR_MASK },
+   [PERF_RAPL_RAM]  = { MSR_DRAM_ENERGY_STATUS, &rapl_events_ram_group,   test_msr, false, RAPL_MSR_MASK },
+   [PERF_RAPL_PP1]  = { MSR_PP1_ENERGY_STATUS,  &rapl_events_gpu_group,   test_msr, false, RAPL_MSR_MASK },
+   [PERF_RAPL_PSYS] = { MSR_PLATFORM_ENERGY_STATUS, &rapl_events_psys_group,  test_msr, false, RAPL_MSR_MASK },
 };
 
 /*
-- 
2.17.1

RE: [PATCH 4/6] acpi/drivers/thermal: Remove TRIPS_NONE cooling device binding

2021-01-06 Thread Zhang, Rui

The ACPI thermal driver binds the devices listed in the _TZD method with
THERMAL_TRIPS_NONE.
Now given that
1. THERMAL_TRIPS_NONE is removed from the thermal framework
2. _TZP is rarely supported. I searched ~500 acpidumps from different
   platforms reported by end users in kernel Bugzilla; only one platform
   has _TZP implemented, and that was almost 10 years ago.

So, I think it is safe to remove this piece of code.

> -Original Message-
> From: Daniel Lezcano 
> Sent: Tuesday, January 05, 2021 11:44 PM
> To: Zhang, Rui 
> Cc: mj...@codon.org.uk; linux...@vger.kernel.org; linux-
> ker...@vger.kernel.org; am...@kernel.org; thara.gopin...@linaro.org;
> Rafael J. Wysocki ; Len Brown ; open
> list:ACPI THERMAL DRIVER 
> Subject: Re: [PATCH 4/6] acpi/drivers/thermal: Remove TRIPS_NONE cooling
> device binding
> Importance: High
> 
> Hi Rui,
> 
> 
> On 15/12/2020 00:38, Daniel Lezcano wrote:
> > The loop is here to create default cooling device binding on the
> > THERMAL_TRIPS_NONE number which is used to be the 'forced_passive'
> > feature. However, we removed all code dealing with that in the thermal
> > core, thus this binding does no longer make sense.
> >
> > Remove it.
> >
> > Signed-off-by: Daniel Lezcano 

Acked-by: Zhang Rui 

Thanks,
rui
> 
> Are you fine with this change?
> 
> Thanks
> 
>   -- Daniel
> 
> > ---
> >  drivers/acpi/thermal.c | 19 ---
> >  1 file changed, 19 deletions(-)
> >
> > diff --git a/drivers/acpi/thermal.c b/drivers/acpi/thermal.c
> > index b5e4bc9e3282..26a89ff80a0e 100644
> > --- a/drivers/acpi/thermal.c
> > +++ b/drivers/acpi/thermal.c
> > @@ -764,25 +764,6 @@ static int acpi_thermal_cooling_device_cb(struct thermal_zone_device *thermal,
> > }
> > }
> >
> > -   for (i = 0; i < tz->devices.count; i++) {
> > -   handle = tz->devices.handles[i];
> > -   status = acpi_bus_get_device(handle, &dev);
> > -   if (ACPI_SUCCESS(status) && (dev == device)) {
> > -   if (bind)
> > -   result = thermal_zone_bind_cooling_device
> > -   (thermal, THERMAL_TRIPS_NONE,
> > -cdev, THERMAL_NO_LIMIT,
> > -THERMAL_NO_LIMIT,
> > -THERMAL_WEIGHT_DEFAULT);
> > -   else
> > -   result = thermal_zone_unbind_cooling_device
> > -   (thermal, THERMAL_TRIPS_NONE,
> > -cdev);
> > -   if (result)
> > -   goto failed;
> > -   }
> > -   }
> > -
> >  failed:
> > return result;
> >  }
> >
> 
> 

Re: [PATCH] x86/PCI: Convert force_disable_hpet() to standard quirk

2020-12-01 Thread Zhang Rui

On Mon, 2020-11-30 at 20:21 +0100, Thomas Gleixner wrote:
> Feng,
> 
> On Fri, Nov 27 2020 at 14:11, Feng Tang wrote:
> > On Fri, Nov 27, 2020 at 12:27:34AM +0100, Thomas Gleixner wrote:
> > > On Thu, Nov 26 2020 at 09:24, Feng Tang wrote:
> > > Yes, that can happen. But OTOH, we should start to think about
> > > the
> > > requirements for using the TSC watchdog.

My original proposal is to stop using jiffies and refined-jiffies as the
clocksource watchdog, because they are not reliable, and it's better to
use a clocksource that has a hardware counter as the watchdog, like the
patch below, which I didn't send upstream.

From cf9ce0ecab8851a3745edcad92e072022af3dbd9 Mon Sep 17 00:00:00 2001
From: Zhang Rui 
Date: Fri, 19 Jun 2020 22:03:23 +0800
Subject: [RFC PATCH] time/clocksource: do not use refined-jiffies as watchdog

On IA platforms, if HPET is disabled, either via x86 early-quirks or
via the kernel command line, refined-jiffies will be used as the clocksource
watchdog in the early boot phase, before the acpi_pm timer is registered.

This is not a problem if jiffies are accurate.
But in some cases, for example, when the serial console is enabled, it may
frequently take several milliseconds to write to the console with irqs
disabled. Thus many ticks may become longer than they should be.

Using refined-jiffies as watchdog in this case breaks the system because
a) duration calculated by refined-jiffies watchdog is always consistent
   with the watchdog timeout issued using add_timer(), say, around 500ms.
b) duration calculated by the running clocksource, usually TSC on IA
   platforms, reflects the real time cost, which may be much larger.
This results in the running clocksource being disabled erroneously.

This is reproduced on ICL because HPET is disabled in x86 early-quirks,
and also reproduced on a KBL and a WHL platform when HPET is disabled
via command line.

BTW, commit fd329f276eca
("x86/mtrr: Skip cache flushes on CPUs with cache self-snooping") is
another example that refined-jiffies causes the same problem when ticks
become slow for some other reason.

IMO, the right solution is to only use a hardware clocksource as the watchdog.
Then even if ticks are slow, both the running clocksource and the watchdog
return the real time cost, and they still match.

Signed-off-by: Zhang Rui 
---
 kernel/time/clocksource.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 02441ead3c3b..e7e703858fa6 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -364,6 +364,10 @@ static void clocksource_select_watchdog(bool fallback)
watchdog = NULL;

list_for_each_entry(cs, &clocksource_list, list) {
+   /* Do not use refined-jiffies as clocksource watchdog */
+   if (cs->rating <= 2)
+   continue;
+
/* cs is a clocksource to be watched. */
if (cs->flags & CLOCK_SOURCE_MUST_VERIFY)
continue;
-- 
2.17.1
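
The failure mode described in the changelog above can be sketched as a small
user-space model. This is only an illustration: sketch_marked_unstable() and
SKETCH_THRESHOLD_NS are hypothetical names, and the 62.5 ms tolerance is an
assumed value chosen for the example, not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed tolerance between watchdog and clocksource intervals (62.5 ms). */
#define SKETCH_THRESHOLD_NS 62500000LL

/*
 * wd_nsec: interval as measured by the watchdog (e.g. refined-jiffies).
 * cs_nsec: the same interval as measured by the running clocksource.
 * Returns true when the two disagree by more than the tolerance, which is
 * the condition under which the running clocksource would be distrusted.
 */
static bool sketch_marked_unstable(int64_t wd_nsec, int64_t cs_nsec)
{
	assert(wd_nsec >= 0 && cs_nsec >= 0);
	return llabs(cs_nsec - wd_nsec) > SKETCH_THRESHOLD_NS;
}
```

If slow ticks make a jiffies-based watchdog report about 500 ms while the
hardware counter sees 700 ms of real time, the model flags the hardware
counter even though it is the accurate one.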

> > > 
> > > I'm inclined to lift that requirement when the CPU has:
> > > 
> > > 1) X86_FEATURE_CONSTANT_TSC
> > > 2) X86_FEATURE_NONSTOP_TSC
> > > 3) X86_FEATURE_NONSTOP_TSC_S3
> > 
> > IIUC, this feature exists for several generations of Atom
> > platforms,
> > and it is always coupled with 1) and 2), so it could be skipped for
> > the checking.
> 
> Yes, we can ignore that bit as it's not widely available and not
> required to solve the problem.
> 
> > > 4) X86_FEATURE_TSC_ADJUST
> > > 
> > > 5) At max. 4 sockets
> > > 
Should we consider some other corner cases, such as when TSC is not used as
the clocksource? The refined-jiffies watchdog can break any other hardware
clocksource when it becomes inaccurate.

> > > The only reason I hate to disable HPET upfront at least during
> > > boot is
> > > that HPET is the best mechanism for the refined TSC calibration.
> > > PMTIMER
> > > sucks because it's slow and wraps around pretty quick.
> > > 
> > > So we could do the following even on platforms where HPET stops
> > > in some
> > > magic PC? state:
> > > 
> > >   - Register it during early boot as clocksource
> > > 
> > >   - Prevent the enablement as clockevent and the chardev hpet
> > > timer muck
> > > 
> > >   - Prevent the magic PC? state up to the point where the refined
> > > TSC calibration is finished.
> > > 
> > >   - Unregister it once the TSC has taken over as system
> > > clocksource and
> > > enable the magic PC? state in which HPET gets disfunctional.
> > 

On the other side, for ICL, the HPET problem is observed on early
sampl

Re: [PATCH] thermal: Fix NULL pointer dereference issue

2020-11-17 Thread Zhang Rui

On Tue, 2020-11-17 at 09:57 +0100, Daniel Lezcano wrote:
> On 17/11/2020 08:18, Zhang Rui wrote:
> > On Mon, 2020-11-16 at 21:59 +0530, Mukesh Ojha wrote:
> > > Cooling stats variable inside
> > > thermal_cooling_device_stats_update()
> > > can get NULL. We should add a NULL check on stat inside for
> > > sanity.
> > > 
> > > Signed-off-by: Mukesh Ojha 
> > > ---
> > >  drivers/thermal/thermal_sysfs.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > > 
> > > diff --git a/drivers/thermal/thermal_sysfs.c
> > > b/drivers/thermal/thermal_sysfs.c
> > > index a6f371f..f52708f 100644
> > > --- a/drivers/thermal/thermal_sysfs.c
> > > +++ b/drivers/thermal/thermal_sysfs.c
> > > @@ -754,6 +754,9 @@ void
> > > thermal_cooling_device_stats_update(struct
> > > thermal_cooling_device *cdev,
> > >  {
> > >   struct cooling_dev_stats *stats = cdev->stats;
> > >  
> > > + if (!stats)
> > > + return;
> > > +
> > 
> > May I know in which case stats can be NULL?
> > The only possibility I can see is that cdev->ops->get_max_state()
> > fails
> > in cooling_device_stats_setup(), right?
> 
> A few lines below, the allocation could fail.
> 
> stats = kzalloc(var, GFP_KERNEL);
> if (!stats)
> return;
> 
> Some drivers define themselves as a cooling device state to let the
> userspace to act on their power. The screen brightness is one example
> with a cdev with 1024 states, the resulting stats table to be
> allocated
> is very big and the kzalloc is prone to fail.
> 
Oh, right.
Since we're not going to fix the cdev, I think we do need this patch,
right?

thanks,
rui
> > thanks,
> > rui
> > 
> > >   spin_lock(&stats->lock);
> > >  
> > >   if (stats->state == new_state)
> 
>
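
To get a feel for the allocation size discussed above, here is a rough
user-space estimate. It assumes the stats object holds one 64-bit time value
per state plus a states x states table of 32-bit transition counters; that
layout and the name sketch_stats_bytes() are assumptions made for this
example, not copied from thermal_sysfs.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Estimate the bytes needed for a cooling-device stats object.
 * header: size of the fixed part of the structure (assumed by the caller).
 * states: number of cooling states the device exposes.
 */
static size_t sketch_stats_bytes(size_t header, size_t states)
{
	assert(states > 0);
	return header
		+ states * sizeof(int64_t)             /* time spent in each state */
		+ states * states * sizeof(uint32_t);  /* transition table */
}
```

Under these assumptions a device with 1024 states needs about 4 MiB for the
transition table alone, which makes a failed allocation plausible.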

Re: [PATCH] thermal: Fix NULL pointer dereference issue

2020-11-16 Thread Zhang Rui

On Mon, 2020-11-16 at 21:59 +0530, Mukesh Ojha wrote:
> Cooling stats variable inside thermal_cooling_device_stats_update()
> can get NULL. We should add a NULL check on stat inside for sanity.
> 
> Signed-off-by: Mukesh Ojha 
> ---
>  drivers/thermal/thermal_sysfs.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/thermal/thermal_sysfs.c
> b/drivers/thermal/thermal_sysfs.c
> index a6f371f..f52708f 100644
> --- a/drivers/thermal/thermal_sysfs.c
> +++ b/drivers/thermal/thermal_sysfs.c
> @@ -754,6 +754,9 @@ void thermal_cooling_device_stats_update(struct
> thermal_cooling_device *cdev,
>  {
>   struct cooling_dev_stats *stats = cdev->stats;
>  
> + if (!stats)
> + return;
> +
May I know in which case stats can be NULL?
The only possibility I can see is that cdev->ops->get_max_state() fails
in cooling_device_stats_setup(), right?

thanks,
rui

>   spin_lock(&stats->lock);
>  
>   if (stats->state == new_state)

Re: [PATCH] thermal: Fix slab-out-of-bounds in thermal_cooling_device_stats_update()

2020-11-16 Thread Zhang Rui

On Tue, 2020-09-15 at 13:58 +0800, zhuguangqin...@gmail.com wrote:
> From: zhuguangqing 
> 
> In function thermal_cooling_device_stats_update(), if the input
> parameter
> new_state is greater or equal to stats->max_states, then it will
> cause
> slab-out-of-bounds error when execute the code as follows:
> stats->trans_table[stats->state * stats->max_states + new_state]++;
> 
> Two functions call the function
> thermal_cooling_device_stats_update().
> 1. cur_state_store()
> 2. thermal_cdev_set_cur_state()
> Both of the two functions call cdev->ops->set_cur_state(cdev, state)
> before thermal_cooling_device_stats_update(), if the return value is
> not 0, then thermal_cooling_device_stats_update() will not be called.
> So if all cdev->ops->set_cur_state(cdev, state) check validity of the
> parameter state, then it's ok. Unfortunately, it's not now.
> 
> We have two methods to avoid the slab-out-of-bounds error in
> thermal_cooling_device_stats_update().
> 1. Check the validity of the parameter state in all
> cdev->ops->set_cur_state(cdev, state).
> 2. Check the validity of the parameter state in
> thermal_cooling_device_stats_update().
> 
> Use method 2 in this patch. Because the modification is simple and
> resolve the problem in the scope of "#ifdef
> CONFIG_THERMAL_STATISTICS".
> 
> Signed-off-by: zhuguangqing 

Hi, Daniel,

This patch is a similar fix to

https://patchwork.kernel.org/project/linux-pm/patch/20200408041917.2329-4-rui.zh...@intel.com/

I think we'd better take the original fix from Takashi Iwai.
And I will refresh and submit the patches that support dynamic cooling
states later when I have time.

thanks,
rui
> ---
>  drivers/thermal/thermal_sysfs.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/thermal/thermal_sysfs.c
> b/drivers/thermal/thermal_sysfs.c
> index 8c231219e15d..9c49f744d79d 100644
> --- a/drivers/thermal/thermal_sysfs.c
> +++ b/drivers/thermal/thermal_sysfs.c
> @@ -756,7 +756,7 @@ void thermal_cooling_device_stats_update(struct
> thermal_cooling_device *cdev,
>  
>   spin_lock(&stats->lock);
>  
> - if (stats->state == new_state)
> + if (stats->state == new_state || new_state >= stats->max_states)
>   goto unlock;
>  
>   update_time_in_state(stats);
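
The two guards discussed in this thread (the NULL check on stats and the
bounds check on new_state) can be shown together in a user-space sketch.
struct sketch_stats and sketch_stats_update() are hypothetical, simplified
stand-ins; locking and time accounting are left out on purpose.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the cooling device statistics. */
struct sketch_stats {
	unsigned int state;        /* current cooling state */
	unsigned int max_states;   /* number of valid states */
	unsigned int total_trans;  /* transitions recorded so far */
	unsigned int *trans_table; /* max_states * max_states counters */
};

/* Returns 1 if a transition was recorded, 0 if the update was skipped. */
static int sketch_stats_update(struct sketch_stats *stats,
			       unsigned int new_state)
{
	/* The stats allocation may have failed earlier. */
	if (!stats)
		return 0;

	/* Invariant: the stored state is always a valid index. */
	assert(stats->state < stats->max_states);

	/* Skip no-op updates and states that would index out of bounds. */
	if (stats->state == new_state || new_state >= stats->max_states)
		return 0;

	stats->trans_table[stats->state * stats->max_states + new_state]++;
	stats->state = new_state;
	stats->total_trans++;
	return 1;
}
```

Checking in the update helper keeps the guard in one place instead of relying
on every set_cur_state() implementation to validate its argument.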

Re: [PATCH v3 3/4] powercap: Add AMD Fam17h RAPL support

2020-11-01 Thread Zhang Rui

On Tue, 2020-10-27 at 07:23 +, Victor Ding wrote:
> This patch enables AMD Fam17h RAPL support for the power capping
> framework. The support is as per AMD Fam17h Model31h (Zen2) and
> model 00-ffh (Zen1) PPR.
> 
> Tested by comparing the results of following two sysfs entries and
> the values directly read from corresponding MSRs via /dev/cpu/[x]/msr:
>   /sys/class/powercap/intel-rapl/intel-rapl:0/energy_uj
>   /sys/class/powercap/intel-rapl/intel-rapl:0/intel-rapl:0:0/energy_uj
> 
> Signed-off-by: Victor Ding 
> Acked-by: Kim Phillips 
> 
> 
> ---
> 
> Changes in v3:
> By Victor Ding 
>  - Rebased to the latest code.
>  - Created a new rapl_defaults for AMD CPUs.
>  - Removed redundant setting to zeros.
>  - Stopped using the fake power limit domain 1.
> 
> Changes in v2:
> By Kim Phillips :
>  - Added Kim's Acked-by.
>  - Added Daniel Lezcano to Cc.
>  - (No code change).
> 
>  arch/x86/include/asm/msr-index.h |  1 +
>  drivers/powercap/intel_rapl_common.c |  6 ++
>  drivers/powercap/intel_rapl_msr.c| 20 +++-
>  3 files changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/msr-index.h
> b/arch/x86/include/asm/msr-index.h
> index 21917e134ad4..c36a083c8ec0 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -327,6 +327,7 @@
>  #define MSR_PP1_POLICY   0x0642
>  
>  #define MSR_AMD_RAPL_POWER_UNIT  0xc0010299
> +#define MSR_AMD_CORE_ENERGY_STATUS   0xc001029a
>  #define MSR_AMD_PKG_ENERGY_STATUS0xc001029b
>  
>  /* Config TDP MSRs */
> diff --git a/drivers/powercap/intel_rapl_common.c
> b/drivers/powercap/intel_rapl_common.c
> index 0b2830efc574..bedd780bed12 100644
> --- a/drivers/powercap/intel_rapl_common.c
> +++ b/drivers/powercap/intel_rapl_common.c
> @@ -1011,6 +1011,10 @@ static const struct rapl_defaults
> rapl_defaults_cht = {
>   .compute_time_window = rapl_compute_time_window_atom,
>  };
>  
> +static const struct rapl_defaults rapl_defaults_amd = {
> + .check_unit = rapl_check_unit_core,
> +};
> +

Why do we need power_unit and time_unit if we only want to expose the
energy counter?

Plus, rapl_init_domains() enables PL1 for every RAPL domain blindly,
and I'm not sure how this is handled on the AMD CPUs.
Is PL1 invalidated by rapl_detect_powerlimit()? Or is it still
registered as a valid constraint in the powercap sysfs I/F?

Currently, the code assumes that there is exactly one power limit
if priv->limits[domain_id] is not set; we probably need to change
this if we want to support RAPL domains with no power limit.

thanks,
rui
>  static const struct x86_cpu_id rapl_ids[] __initconst = {
>   X86_MATCH_INTEL_FAM6_MODEL(SANDYBRIDGE, &rapl_defaults_core),
>   X86_MATCH_INTEL_FAM6_MODEL(SANDYBRIDGE_X,   &rapl_defaults_core),
> @@ -1061,6 +1065,8 @@ static const struct x86_cpu_id rapl_ids[]
> __initconst = {
>  
>   X86_MATCH_INTEL_FAM6_MODEL(XEON_PHI_KNL, &rapl_defaults_hsw_server),
>   X86_MATCH_INTEL_FAM6_MODEL(XEON_PHI_KNM, &rapl_defaults_hsw_server),
> +
> + X86_MATCH_VENDOR_FAM(AMD, 0x17, &rapl_defaults_amd),
>   {}
>  };
>  MODULE_DEVICE_TABLE(x86cpu, rapl_ids);
> diff --git a/drivers/powercap/intel_rapl_msr.c
> b/drivers/powercap/intel_rapl_msr.c
> index a819b3b89b2f..78213d4b5b16 100644
> --- a/drivers/powercap/intel_rapl_msr.c
> +++ b/drivers/powercap/intel_rapl_msr.c
> @@ -49,6 +49,14 @@ static struct rapl_if_priv rapl_msr_priv_intel = {
>   .limits[RAPL_DOMAIN_PLATFORM] = 2,
>  };
>  
> +static struct rapl_if_priv rapl_msr_priv_amd = {
> + .reg_unit = MSR_AMD_RAPL_POWER_UNIT,
> + .regs[RAPL_DOMAIN_PACKAGE] = {
> + 0, MSR_AMD_PKG_ENERGY_STATUS, 0, 0, 0 },
> + .regs[RAPL_DOMAIN_PP0] = {
> + 0, MSR_AMD_CORE_ENERGY_STATUS, 0, 0, 0 },
> +};
> +
>  /* Handles CPU hotplug on multi-socket systems.
>   * If a CPU goes online as the first CPU of the physical package
>   * we add the RAPL package to the system. Similarly, when the last
> @@ -138,7 +146,17 @@ static int rapl_msr_probe(struct platform_device
> *pdev)
>   const struct x86_cpu_id *id = x86_match_cpu(pl4_support_ids);
>   int ret;
>  
> - rapl_msr_priv = &rapl_msr_priv_intel;
> + switch (boot_cpu_data.x86_vendor) {
> + case X86_VENDOR_INTEL:
> + rapl_msr_priv = &rapl_msr_priv_intel;
> + break;
> + case X86_VENDOR_AMD:
> + rapl_msr_priv = &rapl_msr_priv_amd;
> + break;
> + default:
> + pr_err("intel-rapl does not support CPU vendor %d\n",
> boot_cpu_data.x86_vendor);
> + return -ENODEV;
> + }
>   rapl_msr_priv->read_raw = rapl_msr_read_raw;
>   rapl_msr_priv->write_raw = rapl_msr_write_raw;
>

Re: [PATCH v2 2/4] cpufreq: intel_pstate: Avoid missing HWP max updates in passive mode

2020-10-27 Thread Zhang Rui

On Fri, 2020-10-23 at 17:35 +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki 
> 
> If the cpufreq policy max limit is changed when intel_pstate operates
> in the passive mode with HWP enabled and the "powersave" governor is
> used on top of it, the HWP max limit is not updated as appropriate.
> 
> Namely, in the "powersave" governor case, the target P-state
> is always equal to the policy min limit, so if the latter does
> not change, intel_cpufreq_adjust_hwp() is not invoked to update
> the HWP Request MSR due to the "target_pstate != old_pstate" check
> in intel_cpufreq_update_pstate(), so the HWP max limit is not
> updated as a result.
> 
> Also, if the CPUFREQ_NEED_UPDATE_LIMITS flag is not set for the
> driver and the target frequency does not change along with the
> policy max limit, the "target_freq == policy->cur" check in
> __cpufreq_driver_target() prevents the driver's ->target() callback
> from being invoked at all, so the HWP max limit is not updated.
> 
> To prevent that occurring, set the CPUFREQ_NEED_UPDATE_LIMITS flag
> in the intel_cpufreq driver structure if HWP is enabled and modify
> intel_cpufreq_update_pstate() to do the "target_pstate != old_pstate"
> check only in the non-HWP case and let intel_cpufreq_adjust_hwp()
> always run in the HWP case (it will update HWP Request only if the
> cached value of the register is different from the new one including
> the limits, so if neither the target P-state value nor the max limit
> changes, the register write will still be avoided).
> 
> Fixes: f6ebbcf08f37 ("cpufreq: intel_pstate: Implement passive mode
> with HWP enabled")
> Reported-by: Zhang Rui 
> Cc: 5.9+  # 5.9+
> Signed-off-by: Rafael J. Wysocki 

I have confirmed that the problem is gone with this patch series
applied.
The HWP register is updated after changing the scaling_max_freq sysfs
attribute, with powersave governor.

Tested-by: Zhang Rui 

thanks,
rui
> ---
> 
> The v2 is just the intel_pstate changes (without the core changes)
> and setting
> the new flag.
> 
> ---
>  drivers/cpufreq/intel_pstate.c |   13 ++---
>  1 file changed, 6 insertions(+), 7 deletions(-)
> 
> Index: linux-pm/drivers/cpufreq/intel_pstate.c
> ===
> --- linux-pm.orig/drivers/cpufreq/intel_pstate.c
> +++ linux-pm/drivers/cpufreq/intel_pstate.c
> @@ -2550,14 +2550,12 @@ static int intel_cpufreq_update_pstate(s
>   int old_pstate = cpu->pstate.current_pstate;
>  
>   target_pstate = intel_pstate_prepare_request(cpu,
> target_pstate);
> - if (target_pstate != old_pstate) {
> + if (hwp_active) {
> + intel_cpufreq_adjust_hwp(cpu, target_pstate,
> fast_switch);
> + cpu->pstate.current_pstate = target_pstate;
> + } else if (target_pstate != old_pstate) {
> + intel_cpufreq_adjust_perf_ctl(cpu, target_pstate,
> fast_switch);
>   cpu->pstate.current_pstate = target_pstate;
> - if (hwp_active)
> - intel_cpufreq_adjust_hwp(cpu, target_pstate,
> -  fast_switch);
> - else
> - intel_cpufreq_adjust_perf_ctl(cpu,
> target_pstate,
> -   fast_switch);
>   }
>  
>   intel_cpufreq_trace(cpu, fast_switch ?
> INTEL_PSTATE_TRACE_FAST_SWITCH :
> @@ -3014,6 +3012,7 @@ static int __init intel_pstate_init(void
>   hwp_mode_bdw = id->driver_data;
>   intel_pstate.attr = hwp_cpufreq_attrs;
>   intel_cpufreq.attr = hwp_cpufreq_attrs;
> + intel_cpufreq.flags |=
> CPUFREQ_NEED_UPDATE_LIMITS;
>   if (!default_driver)
>   default_driver = &intel_pstate;
>  
> 
> 
>
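
The changelog above says that, in the HWP case, the adjust routine now
always runs but still skips the register write when the newly composed
request matches the cached one. A minimal userspace sketch of that idea
follows. The struct, the helper names and the simulated write counter are
hypothetical and only illustrate the pattern; the field positions (min in
bits 7:0, max in bits 15:8) follow the HWP request layout, but this is not
the driver's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one CPU's cached HWP request state. */
struct hwp_model {
	uint64_t cached_req;	/* last value written to the simulated MSR */
	unsigned int writes;	/* number of simulated MSR writes */
};

/* Compose a request value: min in bits 7:0, max in bits 15:8. */
static uint64_t hwp_compose(uint64_t prev, uint8_t min_perf, uint8_t max_perf)
{
	uint64_t v = prev & ~UINT64_C(0xffff);

	return v | min_perf | ((uint64_t)max_perf << 8);
}

/* Always called in the HWP case; writes only when the value changed. */
static bool hwp_adjust(struct hwp_model *m, uint8_t min_perf, uint8_t max_perf)
{
	uint64_t value = hwp_compose(m->cached_req, min_perf, max_perf);

	if (value == m->cached_req)
		return false;		/* nothing changed, skip the write */

	m->cached_req = value;		/* stands in for the real MSR write */
	m->writes++;
	return true;
}
```

Calling hwp_adjust() with unchanged limits is cheap, which is why letting it
run unconditionally does not add register writes; only a changed max (or
min) limit produces one.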

Re: [PATCH v2 4/4] cpufreq: schedutil: Always call drvier if need_freq_update is set

2020-10-27 Thread Zhang Rui

On Fri, 2020-10-23 at 17:36 +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki 
> 
> Because sugov_update_next_freq() may skip a frequency update even if
> the need_freq_update flag has been set for the policy at hand, policy
> limits updates may not take effect as expected.
> 
> For example, if the intel_pstate driver operates in the passive mode
> with HWP enabled, it needs to update the HWP min and max limits when
> the policy min and max limits change, respectively, but that may not
> happen if the target frequency does not change along with the limit
> at hand.  In particular, if the policy min is changed first, causing
> the target frequency to be adjusted to it, and the policy max limit
> is changed later to the same value, the HWP max limit will not be
> updated to follow it as expected, because the target frequency is
> still equal to the policy min limit and it will not change until
> that limit is updated.
> 
> To address this issue, modify get_next_freq() to clear
> need_freq_update only if the CPUFREQ_NEED_UPDATE_LIMITS flag is
> not set for the cpufreq driver in use (and it should be set for all
> potentially affected drivers) and make sugov_update_next_freq()
> check need_freq_update and continue when it is set regardless of
> whether or not the new target frequency is equal to the old one.
> 
> Fixes: f6ebbcf08f37 ("cpufreq: intel_pstate: Implement passive mode
> with HWP enabled")
> Reported-by: Zhang Rui 
> Cc: 5.9+  # 5.9+
> Signed-off-by: Rafael J. Wysocki 

I have confirmed that the problem is gone with this patch series
applied.

Tested-by: Zhang Rui 

thanks,
rui

> ---
> 
> New patch in v2.
> 
> ---
>  kernel/sched/cpufreq_schedutil.c |8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> Index: linux-pm/kernel/sched/cpufreq_schedutil.c
> ===
> --- linux-pm.orig/kernel/sched/cpufreq_schedutil.c
> +++ linux-pm/kernel/sched/cpufreq_schedutil.c
> @@ -102,11 +102,12 @@ static bool sugov_should_update_freq(str
>  static bool sugov_update_next_freq(struct sugov_policy *sg_policy,
> u64 time,
>  unsigned int next_freq)
>  {
> - if (sg_policy->next_freq == next_freq)
> + if (sg_policy->next_freq == next_freq && !sg_policy-
> >need_freq_update)
>   return false;
>  
>   sg_policy->next_freq = next_freq;
>   sg_policy->last_freq_update_time = time;
> + sg_policy->need_freq_update = false;
>  
>   return true;
>  }
> @@ -164,7 +165,10 @@ static unsigned int get_next_freq(struct
>   if (freq == sg_policy->cached_raw_freq && !sg_policy-
> >need_freq_update)
>   return sg_policy->next_freq;
>  
> - sg_policy->need_freq_update = false;
> + if (sg_policy->need_freq_update)
> + sg_policy->need_freq_update =
> + cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_L
> IMITS);
> +
>   sg_policy->cached_raw_freq = freq;
>   return cpufreq_driver_resolve_freq(policy, freq);
>  }
> 
> 
>
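
The effect of the sugov_update_next_freq() change can be shown with a
reduced userspace model. The struct below is a hypothetical stand-in that
keeps only the two fields involved; the two functions mirror the check
before and after the patch and are a sketch, not the scheduler code itself.

```c
#include <stdbool.h>

/* Hypothetical, reduced stand-in for struct sugov_policy. */
struct sg_model {
	unsigned int next_freq;
	bool need_freq_update;
};

/* Old check: a limits change is lost when the frequency is unchanged. */
static bool update_next_freq_old(struct sg_model *sg, unsigned int next_freq)
{
	if (sg->next_freq == next_freq)
		return false;

	sg->next_freq = next_freq;
	return true;
}

/* Patched check: a pending limits update forces the driver call. */
static bool update_next_freq_new(struct sg_model *sg, unsigned int next_freq)
{
	if (sg->next_freq == next_freq && !sg->need_freq_update)
		return false;

	sg->next_freq = next_freq;
	sg->need_freq_update = false;
	return true;
}
```

With need_freq_update set and an unchanged target frequency, the old
variant returns false (the driver is never called), while the patched one
returns true once and then clears the flag.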

Re: [PATCH v2 3/4] powercap: Add AMD Fam17h RAPL support

2020-10-08 Thread Zhang Rui

On Wed, 2020-10-07 at 11:14 -0500, Kim Phillips wrote:
> From: Victor Ding 
> 
> This patch enables AMD Fam17h RAPL support for the power capping
> framework. The support is as per AMD Fam17h Model31h (Zen2) and
> model 00-ffh (Zen1) PPR.
> 
> Tested by comparing the results of following two sysfs entries and
> the
> values directly read from corresponding MSRs via /dev/cpu/[x]/msr:
>   /sys/class/powercap/intel-rapl/intel-rapl:0/energy_uj
>   /sys/class/powercap/intel-rapl/intel-rapl:0/intel-
> rapl:0:0/energy_uj
> 
> Signed-off-by: Victor Ding 
> Acked-by: Kim Phillips 
> Cc: Victor Ding 
> Cc: Alexander Shishkin 
> Cc: Borislav Petkov 
> Cc: Daniel Lezcano 
> Cc: "H. Peter Anvin" 
> Cc: Ingo Molnar 
> Cc: Josh Poimboeuf 
> Cc: Pawan Gupta 
> Cc: "Peter Zijlstra (Intel)" 
> Cc: "Rafael J. Wysocki" 
> Cc: Sean Christopherson 
> Cc: Thomas Gleixner 
> Cc: Tony Luck 
> Cc: Vineela Tummalapalli 
> Cc: LKML 
> Cc: linux...@vger.kernel.org
> Cc: x...@kernel.org
> ---
> Kim's changes from Victor's original submission:
> 
> 
https://lore.kernel.org/lkml/20200729205144.3.I01b89fb23d7498521c84cfdf417450cbbfca46bb@changeid/
> 
>  - Added my Acked-by.
>  - Added Daniel Lezcano to Cc.
> 
>  arch/x86/include/asm/msr-index.h |  1 +
>  drivers/powercap/intel_rapl_common.c |  2 ++
>  drivers/powercap/intel_rapl_msr.c| 27
> ++-
>  3 files changed, 29 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/msr-index.h
> b/arch/x86/include/asm/msr-index.h
> index f1b24f1b774d..c0646f69d2a5 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -324,6 +324,7 @@
>  #define MSR_PP1_POLICY   0x0642
>  
>  #define MSR_AMD_RAPL_POWER_UNIT  0xc0010299
> +#define MSR_AMD_CORE_ENERGY_STATUS   0xc001029a
>  #define MSR_AMD_PKG_ENERGY_STATUS0xc001029b
>  
>  /* Config TDP MSRs */
> diff --git a/drivers/powercap/intel_rapl_common.c
> b/drivers/powercap/intel_rapl_common.c
> index 983d75bd5bd1..6905ccffcec3 100644
> --- a/drivers/powercap/intel_rapl_common.c
> +++ b/drivers/powercap/intel_rapl_common.c
> @@ -1054,6 +1054,8 @@ static const struct x86_cpu_id rapl_ids[]
> __initconst = {
>  
>   X86_MATCH_INTEL_FAM6_MODEL(XEON_PHI_KNL, &rapl_defaults_hsw_server),
>   X86_MATCH_INTEL_FAM6_MODEL(XEON_PHI_KNM, &rapl_defaults_hsw_server),
> +
> + X86_MATCH_VENDOR_FAM(AMD, 0x17, &rapl_defaults_core),

I doubt we can use rapl_defaults_core here.

static const struct rapl_defaults rapl_defaults_core = {
.floor_freq_reg_addr = 0,
.check_unit = rapl_check_unit_core,
.set_floor_freq = set_floor_freq_default,
.compute_time_window = rapl_compute_time_window_core,
};

.floor_freq_reg_addr = 0,
is redundant here; we can remove it even for rapl_defaults_core.

.check_unit = rapl_check_unit_core,
The Intel UNIT MSR supports three units: Energy, Power and Time.
From the change below, only the energy counter is supported, so you may
need to confirm whether all three units are supported.

.set_floor_freq = set_floor_freq_default,
This function sets the PL1_CLAMP bit in RAPL_DOMAIN_REG_LIMIT, but
RAPL_DOMAIN_REG_LIMIT is not supported on the AMD CPUs.

.compute_time_window = rapl_compute_time_window_core,
This is used for setting the power limits, which is not supported on
the AMD CPUs.

IMO, it's better to use a new rapl_defaults that contains valid
callbacks for AMD CPUs.

>   {}
>  };
>  MODULE_DEVICE_TABLE(x86cpu, rapl_ids);
> diff --git a/drivers/powercap/intel_rapl_msr.c
> b/drivers/powercap/intel_rapl_msr.c
> index c68ef5e4e1c4..dcaef917f79d 100644
> --- a/drivers/powercap/intel_rapl_msr.c
> +++ b/drivers/powercap/intel_rapl_msr.c
> @@ -48,6 +48,21 @@ static struct rapl_if_priv rapl_msr_priv_intel = {
>   .limits[RAPL_DOMAIN_PACKAGE] = 2,
>  };
>  
> +static struct rapl_if_priv rapl_msr_priv_amd = {
> + .reg_unit = MSR_AMD_RAPL_POWER_UNIT,
> + .regs[RAPL_DOMAIN_PACKAGE] = {
> + 0, MSR_AMD_PKG_ENERGY_STATUS, 0, 0, 0 },
> + .regs[RAPL_DOMAIN_PP0] = {
> + 0, MSR_AMD_CORE_ENERGY_STATUS, 0, 0, 0 },
> + .regs[RAPL_DOMAIN_PP1] = {
> + 0, 0, 0, 0, 0 },
> + .regs[RAPL_DOMAIN_DRAM] = {
> + 0, 0, 0, 0, 0 },
> + .regs[RAPL_DOMAIN_PLATFORM] = {
> + 0, 0, 0, 0, 0},

I don't think you need to set the PP1/DRAM/PLATFORM registers to 0 explicitly 
if they are not supported.
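
The reason the explicit zeros are unnecessary is a C language rule: in an
initializer with designators, every member that is not named is implicitly
zero-initialized. A small standalone sketch, using hypothetical reduced
copies of the domain and register indices rather than the real
intel_rapl.h definitions:

```c
#include <stdint.h>

/* Hypothetical, reduced copies of the domain and register indices. */
enum { DOM_PACKAGE, DOM_PP0, DOM_PP1, DOM_DRAM, DOM_PLATFORM, DOM_MAX };
enum { REG_LIMIT, REG_STATUS, REG_PERF, REG_POLICY, REG_INFO, REG_MAX };

struct if_priv_model {
	uint32_t reg_unit;
	uint32_t regs[DOM_MAX][REG_MAX];
	int limits[DOM_MAX];
};

/* Only the supported registers are named; the rest are implicitly zero. */
static const struct if_priv_model amd_model = {
	.reg_unit = 0xc0010299,		/* MSR_AMD_RAPL_POWER_UNIT */
	.regs[DOM_PACKAGE] = { 0, 0xc001029b, 0, 0, 0 },	/* PKG energy */
	.regs[DOM_PP0] = { 0, 0xc001029a, 0, 0, 0 },		/* core energy */
};
```

Here regs[DOM_PP1], regs[DOM_DRAM], regs[DOM_PLATFORM] and the whole
limits[] array read back as zero without being mentioned.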

> + .limits[RAPL_DOMAIN_PACKAGE] = 1,


Is Pkg Domain PL1 really supported?
At least according to this patch, I don't think so. If the power limit
is supported, it is better to add that support in this patch as well.

If not, we're actually exposing energy counters only. In that case,
I'm not sure it is proper to do this in the powercap class, because
we cannot actually do power capping. Or at least, if we want to support
power zones with no power limits, we should enhance the code to

Re: [PATCH] intel_idle: Add ICL support

2020-08-26 Thread Zhang Rui

On Wed, 2020-08-26 at 19:00 +0200, Rafael J. Wysocki wrote:
> On Wed, Aug 26, 2020 at 6:46 PM Guilhem Lettron 
> wrote:
> > 
> > I've done more tests, maybe it can give you more hints.
> > I don't see that much differences between both (with and without
> > patches) in this cases.
> 
> OK, thanks!
> 
> I'm assuming that the topmost two sets of data are for the "without
> the patch" case whereas the other three correspond to the "with the
> patch" case.

I think the sample period is too short.
Even with the same kernel, I can see the Busy% varies from 1% to 9%,
and the PkgWatt varies from 0.4W to 2.4W.

thanks,
rui

> 
> If so, the processor clearly enters PC10 in both cases and the
> residency percentages are similar.
> 
> The numbers of times the POLL state was selected in the first test
> look kind of unusual (relatively very large), but other than this the
> patch doesn't seem to make much of a difference, so I'm not going to
> apply it.
> 
> Thanks!

Re: [PATCH] intel_idle: Add ICL support

2020-08-26 Thread Zhang Rui

On Wed, 2020-08-26 at 15:32 +0200, Guilhem Lettron wrote:
> On Wed, 26 Aug 2020 at 15:18, Artem Bityutskiy 
> wrote:
> > 
> > On Wed, 2020-08-26 at 16:16 +0300, Artem Bityutskiy wrote:
> > > Just get a reasonably new turbostat (it is part of the kernel
> > > tree, you
> > > can compile it yourself) and run it for few seconds (like
> > > 'turbostat
> > > sleep 10'), get the output (will be a lot of it), and we can
> > > check what
> > > is actually going on with regards to C-states.
> > 
> > Oh, and if you could do that with and without your patch, we could
> > even
> > compare things. But try to do it at least with the default
> > (acpi_idle)
> > configuration.
> 
> with my patch:
> 
> turbostat version 20.03.20 - Len Brown 
> CPUID(0): GenuineIntel 0x1b CPUID levels; 0x8008 xlevels;
> family:model:stepping 0x6:7e:5 (6:126:5)
> CPUID(1): SSE3 MONITOR - EIST TM2 TSC MSR ACPI-TM HT TM
> CPUID(6): APERF, TURBO, DTS, PTM, HWP, No-HWPnotify, HWPwindow,
> HWPepp, HWPpkg, EPB
> cpu2: MSR_IA32_MISC_ENABLE: 0x00850089 (TCC EIST MWAIT PREFETCH
> TURBO)
> CPUID(7): SGX
> cpu2: MSR_IA32_FEATURE_CONTROL: 0x00020005 (Locked )
> CPUID(0x15): eax_crystal: 2 ebx_tsc: 78 ecx_crystal_hz: 3840
> TSC: 1497 MHz (3840 Hz * 78 / 2 / 100)
> CPUID(0x16): base_mhz: 1500 max_mhz: 3900 bus_mhz: 100
> cpu2: MSR_MISC_PWR_MGMT: 0x00401c40 (ENable-EIST_Coordination
> DISable-EPB DISable-OOB)
> RAPL: 17476 sec. Joule Counter Range, at 15 Watts
> cpu2: MSR_PLATFORM_INFO: 0x4043cf1810f00
> 4 * 100.0 = 400.0 MHz max efficiency frequency
> 15 * 100.0 = 1500.0 MHz base frequency
> cpu2: MSR_IA32_POWER_CTL: 0x0024005d (C1E auto-promotion: DISabled)
> cpu2: MSR_TURBO_RATIO_LIMIT: 0x2323232323232627
> 35 * 100.0 = 3500.0 MHz max turbo 8 active cores
> 35 * 100.0 = 3500.0 MHz max turbo 7 active cores
> 35 * 100.0 = 3500.0 MHz max turbo 6 active cores
> 35 * 100.0 = 3500.0 MHz max turbo 5 active cores
> 35 * 100.0 = 3500.0 MHz max turbo 4 active cores
> 35 * 100.0 = 3500.0 MHz max turbo 3 active cores
> 38 * 100.0 = 3800.0 MHz max turbo 2 active cores
> 39 * 100.0 = 3900.0 MHz max turbo 1 active cores
> cpu2: MSR_CONFIG_TDP_NOMINAL: 0x000d (base_ratio=13)
> cpu2: MSR_CONFIG_TDP_LEVEL_1: 0x000a0060 (PKG_MIN_PWR_LVL1=0
> PKG_MAX_PWR_LVL1=0 LVL1_RATIO=10 PKG_TDP_LVL1=96)
> cpu2: MSR_CONFIG_TDP_LEVEL_2: 0x000f00c8 (PKG_MIN_PWR_LVL2=0
> PKG_MAX_PWR_LVL2=0 LVL2_RATIO=15 PKG_TDP_LVL2=200)
> cpu2: MSR_CONFIG_TDP_CONTROL: 0x ( lock=0)
> cpu2: MSR_TURBO_ACTIVATION_RATIO: 0x000c (MAX_NON_TURBO_RATIO=12
> lock=0)
> cpu2: MSR_PKG_CST_CONFIG_CONTROL: 0x74008008 (UNdemote-C1, demote-C1,
> locked, pkg-cstate-limit=8 (unlimited))
> current_driver: intel_idle
> current_governor: menu
> current_governor_ro: menu
> cpu2: POLL: CPUIDLE CORE POLL IDLE
> cpu2: C1: MWAIT 0x00
> cpu2: C1E: MWAIT 0x01
> cpu2: C6: MWAIT 0x20
> cpu2: C7s: MWAIT 0x33
> cpu2: C8: MWAIT 0x40
> cpu2: C9: MWAIT 0x50
> cpu2: C10: MWAIT 0x60
> cpu2: cpufreq driver: intel_cpufreq
> cpu2: cpufreq governor: schedutil
> cpufreq intel_pstate no_turbo: 0
> cpu2: MSR_MISC_FEATURE_CONTROL: 0x (L2-Prefetch
> L2-Prefetch-pair L1-Prefetch L1-IP-Prefetch)
> cpu0: MSR_PM_ENABLE: 0x0001 (HWP)
> cpu0: MSR_HWP_CAPABILITIES: 0x010e0d27 (high 39 guar 13 eff 14 low 1)
> cpu0: MSR_HWP_REQUEST: 0x80002727 (min 39 max 39 des 0 epp 0x80
> window
> 0x0 pkg 0x0)
> cpu0: MSR_HWP_REQUEST_PKG: 0x8000ff01 (min 1 max 255 des 0 epp 0x80
> window 0x0)
> cpu0: MSR_HWP_STATUS: 0x (No-Guaranteed_Perf_Change, No-
> Excursion_Min)
> cpu0: MSR_IA32_ENERGY_PERF_BIAS: 0x0006 (balanced)
> cpu0: MSR_RAPL_POWER_UNIT: 0x000a0e03 (0.125000 Watts, 0.000061
> Joules, 0.000977 sec.)
> cpu0: MSR_PKG_POWER_INFO: 0x0078 (15 W TDP, RAPL 0 - 0 W,
> 0.00 sec.)
> cpu0: MSR_PKG_POWER_LIMIT: 0x5a8118009d80c8 (UNlocked)
> cpu0: PKG Limit #1: ENabled (25.00 Watts, 24.00 sec, clamp
> ENabled)
> cpu0: PKG Limit #2: ENabled (35.00 Watts, 10.00* sec, clamp
> DISabled)
> cpu0: MSR_DRAM_POWER_LIMIT: 0x5400de (UNlocked)
> cpu0: DRAM Limit: DISabled (0.00 Watts, 0.000977 sec, clamp
> DISabled)
> cpu0: MSR_PP0_POLICY: 0
> cpu0: MSR_PP0_POWER_LIMIT: 0x (UNlocked)
> cpu0: Cores Limit: DISabled (0.00 Watts, 0.000977 sec, clamp
> DISabled)
> cpu0: MSR_PP1_POLICY: 0
> cpu0: MSR_PP1_POWER_LIMIT: 0x (UNlocked)
> cpu0: GFX Limit: DISabled (0.00 Watts, 0.000977 sec, clamp
> DISabled)
> cpu0: MSR_IA32_TEMPERATURE_TARGET: 0x0564 (100 C)
> cpu0: MSR_IA32_PACKAGE_THERM_STATUS: 0x88290800 (59 C)
> cpu0: MSR_IA32_PACKAGE_THERM_INTERRUPT: 0x0003 (100 C, 100 C)
> cpu2: MSR_PKGC3_IRTL: 0x (NOTvalid, 0 ns)
> cpu2: MSR_PKGC6_IRTL: 0x (NOTvalid, 0 ns)
> cpu2: MSR_PKGC7_IRTL: 0x (NOTvalid, 0 ns)
> cpu2: MSR_PKGC8_IRTL: 0x (NOTvalid, 0 ns)
> cpu2: MSR_PKGC9_IRTL: 0x (NOTvalid, 0 ns)
> cpu2: MSR_PKGC10_IRTL: 0x (NOTvalid, 0 ns)
> 10.003466 sec
> Core CPU Avg_MHz Busy% Bzy_MHz TSC_MHz IRQ SMI POLL C1 C1E C6 C7s C8
> C9

[tip: perf/urgent] perf/x86/rapl: Fix missing psys sysfs attributes

2020-08-14 Thread tip-bot2 for Zhang Rui

The following commit has been merged into the perf/urgent branch of tip:

Commit-ID: 4bb5fcb97a5df0bbc0a27e0252b1e7ce140a8431
Gitweb:
https://git.kernel.org/tip/4bb5fcb97a5df0bbc0a27e0252b1e7ce140a8431
Author:Zhang Rui 
AuthorDate:Tue, 11 Aug 2020 23:31:47 +08:00
Committer: Ingo Molnar 
CommitterDate: Fri, 14 Aug 2020 12:35:11 +02:00

perf/x86/rapl: Fix missing psys sysfs attributes

This fixes a problem introduced by commit:

  5fb5273a905c ("perf/x86/rapl: Use new MSR detection interface")

that perf event sysfs attributes for psys RAPL domain are missing.

Fixes: 5fb5273a905c ("perf/x86/rapl: Use new MSR detection interface")
Signed-off-by: Zhang Rui 
Signed-off-by: Ingo Molnar 
Reviewed-by: Kan Liang 
Reviewed-by: Len Brown 
Acked-by: Jiri Olsa 
Link: https://lore.kernel.org/r/20200811153149.12242-2-rui.zh...@intel.com
---
 arch/x86/events/rapl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index 68b3882..e972383 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -665,7 +665,7 @@ static const struct attribute_group *rapl_attr_update[] = {
&rapl_events_pkg_group,
&rapl_events_ram_group,
&rapl_events_gpu_group,
-   &rapl_events_gpu_group,
+   &rapl_events_psys_group,
NULL,
 };

[tip: perf/urgent] perf/x86/rapl: Support multiple RAPL unit quirks

2020-08-14 Thread tip-bot2 for Zhang Rui

The following commit has been merged into the perf/urgent branch of tip:

Commit-ID: 74f41adab0f4a61857833e1b6fa8e9ad12c251b6
Gitweb:
https://git.kernel.org/tip/74f41adab0f4a61857833e1b6fa8e9ad12c251b6
Author:Zhang Rui 
AuthorDate:Tue, 11 Aug 2020 23:31:48 +08:00
Committer: Ingo Molnar 
CommitterDate: Fri, 14 Aug 2020 12:35:12 +02:00

perf/x86/rapl: Support multiple RAPL unit quirks

There will be more platforms with different fixed energy units.
Enhance the code to support different RAPL unit quirks for different
platforms.

Signed-off-by: Zhang Rui 
Signed-off-by: Ingo Molnar 
Reviewed-by: Kan Liang 
Reviewed-by: Len Brown 
Link: https://lore.kernel.org/r/20200811153149.12242-3-rui.zh...@intel.com
---
 arch/x86/events/rapl.c | 24 +++-
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index e972383..d0002eb 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -130,11 +130,16 @@ struct rapl_pmus {
struct rapl_pmu *pmus[];
 };
 
+enum rapl_unit_quirk {
+   RAPL_UNIT_QUIRK_NONE,
+   RAPL_UNIT_QUIRK_INTEL_HSW,
+};
+
 struct rapl_model {
struct perf_msr *rapl_msrs;
unsigned long   events;
unsigned intmsr_power_unit;
-   boolapply_quirk;
+   enum rapl_unit_quirkunit_quirk;
 };
 
  /* 1/2^hw_unit Joule */
@@ -612,14 +617,20 @@ static int rapl_check_hw_unit(struct rapl_model *rm)
for (i = 0; i < NR_RAPL_DOMAINS; i++)
rapl_hw_unit[i] = (msr_rapl_power_unit_bits >> 8) & 0x1FULL;
 
+   switch (rm->unit_quirk) {
/*
 * DRAM domain on HSW server and KNL has fixed energy unit which can be
 * different than the unit from power unit MSR. See
 * "Intel Xeon Processor E5-1600 and E5-2600 v3 Product Families, V2
 * of 2. Datasheet, September 2014, Reference Number: 330784-001 "
 */
-   if (rm->apply_quirk)
+   case RAPL_UNIT_QUIRK_INTEL_HSW:
rapl_hw_unit[PERF_RAPL_RAM] = 16;
+   break;
+   default:
+   break;
+   }
+
 
/*
 * Calculate the timer rate:
@@ -698,7 +709,6 @@ static struct rapl_model model_snb = {
.events = BIT(PERF_RAPL_PP0) |
  BIT(PERF_RAPL_PKG) |
  BIT(PERF_RAPL_PP1),
-   .apply_quirk= false,
.msr_power_unit = MSR_RAPL_POWER_UNIT,
.rapl_msrs  = intel_rapl_msrs,
 };
@@ -707,7 +717,6 @@ static struct rapl_model model_snbep = {
.events = BIT(PERF_RAPL_PP0) |
  BIT(PERF_RAPL_PKG) |
  BIT(PERF_RAPL_RAM),
-   .apply_quirk= false,
.msr_power_unit = MSR_RAPL_POWER_UNIT,
.rapl_msrs  = intel_rapl_msrs,
 };
@@ -717,7 +726,6 @@ static struct rapl_model model_hsw = {
  BIT(PERF_RAPL_PKG) |
  BIT(PERF_RAPL_RAM) |
  BIT(PERF_RAPL_PP1),
-   .apply_quirk= false,
.msr_power_unit = MSR_RAPL_POWER_UNIT,
.rapl_msrs  = intel_rapl_msrs,
 };
@@ -726,7 +734,7 @@ static struct rapl_model model_hsx = {
.events = BIT(PERF_RAPL_PP0) |
  BIT(PERF_RAPL_PKG) |
  BIT(PERF_RAPL_RAM),
-   .apply_quirk= true,
+   .unit_quirk = RAPL_UNIT_QUIRK_INTEL_HSW,
.msr_power_unit = MSR_RAPL_POWER_UNIT,
.rapl_msrs  = intel_rapl_msrs,
 };
@@ -734,7 +742,7 @@ static struct rapl_model model_hsx = {
 static struct rapl_model model_knl = {
.events = BIT(PERF_RAPL_PKG) |
  BIT(PERF_RAPL_RAM),
-   .apply_quirk= true,
+   .unit_quirk = RAPL_UNIT_QUIRK_INTEL_HSW,
.msr_power_unit = MSR_RAPL_POWER_UNIT,
.rapl_msrs  = intel_rapl_msrs,
 };
@@ -745,14 +753,12 @@ static struct rapl_model model_skl = {
  BIT(PERF_RAPL_RAM) |
  BIT(PERF_RAPL_PP1) |
  BIT(PERF_RAPL_PSYS),
-   .apply_quirk= false,
.msr_power_unit = MSR_RAPL_POWER_UNIT,
.rapl_msrs  = intel_rapl_msrs,
 };
 
 static struct rapl_model model_amd_fam17h = {
.events = BIT(PERF_RAPL_PKG),
-   .apply_quirk= false,
.msr_power_unit = MSR_AMD_RAPL_POWER_UNIT,
.rapl_msrs  = amd_rapl_msrs,
 };
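
For context on what rapl_check_hw_unit() reads before any quirk is applied:
the energy unit is an exponent taken from bits 12:8 of the power unit MSR,
and one count is 1/2^exponent Joule. The sketch below decodes all three
fields in userspace. The struct and function names are hypothetical; the
power (bits 3:0) and time (bits 19:16) field positions are the Intel layout
and are not taken from this patch.

```c
#include <stdint.h>

struct rapl_units {
	double watts;	/* power unit */
	double joules;	/* energy unit */
	double seconds;	/* time unit */
};

/* Each field is an exponent n; the unit is 1 / 2^n. */
static struct rapl_units decode_power_unit(uint64_t msr)
{
	struct rapl_units u;

	u.watts = 1.0 / (double)(UINT64_C(1) << (msr & 0xf));
	u.joules = 1.0 / (double)(UINT64_C(1) << ((msr >> 8) & 0x1f));
	u.seconds = 1.0 / (double)(UINT64_C(1) << ((msr >> 16) & 0xf));
	return u;
}
```

Feeding it the MSR_RAPL_POWER_UNIT value 0x000a0e03 from the turbostat
output quoted earlier in this archive gives 1/8 W, 1/16384 J and 1/1024 s,
which matches the 0.125000 Watts and 0.000977 sec. printed there.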

[tip: perf/urgent] perf/x86/rapl: Add support for Intel SPR platform

2020-08-14 Thread tip-bot2 for Zhang Rui

The following commit has been merged into the perf/urgent branch of tip:

Commit-ID: bcfd218b66790243ef303c1b35ce59f786ded225
Gitweb:
https://git.kernel.org/tip/bcfd218b66790243ef303c1b35ce59f786ded225
Author:Zhang Rui 
AuthorDate:Tue, 11 Aug 2020 23:31:49 +08:00
Committer: Ingo Molnar 
CommitterDate: Fri, 14 Aug 2020 12:35:12 +02:00

perf/x86/rapl: Add support for Intel SPR platform

Intel SPR platform uses fixed 16 bit energy unit for DRAM RAPL domain,
and fixed 0 bit energy unit for Psys RAPL domain.
After this, on SPR platform the energy counters appear in perf list.

Signed-off-by: Zhang Rui 
Signed-off-by: Ingo Molnar 
Reviewed-by: Kan Liang 
Acked-by: Len Brown 
Link: https://lore.kernel.org/r/20200811153149.12242-4-rui.zh...@intel.com
---
 arch/x86/events/rapl.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index d0002eb..67b411f 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -133,6 +133,7 @@ struct rapl_pmus {
 enum rapl_unit_quirk {
RAPL_UNIT_QUIRK_NONE,
RAPL_UNIT_QUIRK_INTEL_HSW,
+   RAPL_UNIT_QUIRK_INTEL_SPR,
 };
 
 struct rapl_model {
@@ -627,6 +628,14 @@ static int rapl_check_hw_unit(struct rapl_model *rm)
case RAPL_UNIT_QUIRK_INTEL_HSW:
rapl_hw_unit[PERF_RAPL_RAM] = 16;
break;
+   /*
+* SPR shares the same DRAM domain energy unit as HSW, plus it
+* also has a fixed energy unit for Psys domain.
+*/
+   case RAPL_UNIT_QUIRK_INTEL_SPR:
+   rapl_hw_unit[PERF_RAPL_RAM] = 16;
+   rapl_hw_unit[PERF_RAPL_PSYS] = 0;
+   break;
default:
break;
}
@@ -757,6 +766,16 @@ static struct rapl_model model_skl = {
.rapl_msrs  = intel_rapl_msrs,
 };
 
+static struct rapl_model model_spr = {
+   .events = BIT(PERF_RAPL_PP0) |
+ BIT(PERF_RAPL_PKG) |
+ BIT(PERF_RAPL_RAM) |
+ BIT(PERF_RAPL_PSYS),
+   .unit_quirk = RAPL_UNIT_QUIRK_INTEL_SPR,
+   .msr_power_unit = MSR_RAPL_POWER_UNIT,
+   .rapl_msrs  = intel_rapl_msrs,
+};
+
 static struct rapl_model model_amd_fam17h = {
.events = BIT(PERF_RAPL_PKG),
.msr_power_unit = MSR_AMD_RAPL_POWER_UNIT,
@@ -793,6 +812,7 @@ static const struct x86_cpu_id rapl_model_match[] 
__initconst = {
X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_X,   &model_hsx),
X86_MATCH_INTEL_FAM6_MODEL(COMETLAKE_L, &model_skl),
X86_MATCH_INTEL_FAM6_MODEL(COMETLAKE,   &model_skl),
+   X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X,&model_spr),
X86_MATCH_VENDOR_FAM(AMD,   0x17,   &model_amd_fam17h),
X86_MATCH_VENDOR_FAM(HYGON, 0x18,   &model_amd_fam17h),
{},
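
To make the "fixed 16 bit" and "fixed 0 bit" energy units concrete: since
one raw count is 1/2^hw_unit Joule, a DRAM count on SPR is 1/65536 J and a
Psys count is a full Joule. The helper below is a hypothetical userspace
sketch of that conversion, not the scaling code used by the perf driver.

```c
#include <stdint.h>

/* Convert a raw energy counter delta to Joules: one count is 1/2^hw_unit J. */
static double raw_to_joules(uint64_t raw, unsigned int hw_unit)
{
	return (double)raw / (double)(UINT64_C(1) << hw_unit);
}
```

For example, raw_to_joules(65536, 16) and raw_to_joules(1, 0) both describe
one Joule, which is why applying the MSR-reported unit to these two domains
would misreport their energy.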

Re: [PATCH v2 2/3] perf/x86/rapl: Support multiple rapl unit quirks

2020-08-11 Thread Zhang Rui

Hi,

Thanks for reviewing.

On Tue, 2020-08-11 at 11:19 -0700, Joe Perches wrote:
> On Tue, 2020-08-11 at 23:31 +0800, Zhang Rui wrote:
> > There will be more platforms with different fixed energy units.
> > Enhance the code to support different rapl unit quirks for
> > different
> > platforms.
> 
> This seems like one quirk per platform.
> 
> Should multiple quirks on individual platforms be supported?
> 
enum rapl_unit_quirk is just used as a flag.
Multiple quirks can be applied under the same flag, as I did in
patch 3/3.
Also, different platforms can either have different flags or share the
same flag.

thanks,
rui
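
A reduced sketch of that point, with hypothetical names that only loosely
follow the patch: each enum value selects a group of overrides, so one flag
can carry one fixed unit or several.

```c
enum dom { DOM_PP0, DOM_PKG, DOM_RAM, DOM_PP1, DOM_PSYS, NR_DOMS };
enum unit_quirk { QUIRK_NONE, QUIRK_HSW, QUIRK_SPR };

/* Fill per-domain unit exponents, then apply whatever the flag selects. */
static void fill_units(unsigned int units[NR_DOMS], unsigned int msr_unit,
		       enum unit_quirk quirk)
{
	for (int i = 0; i < NR_DOMS; i++)
		units[i] = msr_unit;

	switch (quirk) {
	case QUIRK_HSW:		/* one override */
		units[DOM_RAM] = 16;
		break;
	case QUIRK_SPR:		/* two overrides behind one flag */
		units[DOM_RAM] = 16;
		units[DOM_PSYS] = 0;
		break;
	default:
		break;
	}
}
```

A platform that needs a different combination would get its own enum value
and case, while platforms with identical fixed units can share one.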

> > diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
> 
> []
> > @@ -130,11 +130,16 @@ struct rapl_pmus {
> > struct rapl_pmu *pmus[];
> >  };
> >  
> > +enum rapl_unit_quirk {
> > +   RAPL_UNIT_QUIRK_NONE,
> > +   RAPL_UNIT_QUIRK_INTEL_HSW,
> > +};
> > +
> >  struct rapl_model {
> > struct perf_msr *rapl_msrs;
> > unsigned long   events;
> > unsigned intmsr_power_unit;
> > -   boolapply_quirk;
> > +   enum rapl_unit_quirkunit_quirk;
> >  };
> 
>

[PATCH v2 0/3] perf/x86/rapl: Add Intel SapphireRapids support

2020-08-11 Thread Zhang Rui

Hi, all,

This patch set adds rapl perf event support for Intel SapphireRapids
platform.

Patch 1/3 fixes a regression where the Psys RAPL Domain sysfs I/F is missing.
Patch 2/3 introduces support for different energy unit quirks.
Patch 3/3 introduces support for Intel SapphireRapids platform, which has
  fixed energy units for DRAM RAPL Domain and Psys RAPL Domain.

Any feedback is appreciated.

thanks,
rui

v1..v2:
- add ACK from Jiri Olsa.
- update patch 3/3 to solve a conflict introduced in the merge window.

[PATCH v2 3/3] perf/x86/rapl: Add support for Intel SPR platform

2020-08-11 Thread Zhang Rui

Intel SPR platform uses fixed 16 bit energy unit for DRAM RAPL domain,
and fixed 0 bit energy unit for Psys RAPL domain.
After this, on SPR platform the energy counters appear in perf list.

Signed-off-by: Zhang Rui 
Reviewed-by: Kan Liang 
Acked-by: Len Brown 
---
 arch/x86/events/rapl.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index d0002eb971b7..67b411f7e8c4 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -133,6 +133,7 @@ struct rapl_pmus {
 enum rapl_unit_quirk {
RAPL_UNIT_QUIRK_NONE,
RAPL_UNIT_QUIRK_INTEL_HSW,
+   RAPL_UNIT_QUIRK_INTEL_SPR,
 };
 
 struct rapl_model {
@@ -627,6 +628,14 @@ static int rapl_check_hw_unit(struct rapl_model *rm)
case RAPL_UNIT_QUIRK_INTEL_HSW:
rapl_hw_unit[PERF_RAPL_RAM] = 16;
break;
+   /*
+* SPR shares the same DRAM domain energy unit as HSW, plus it
+* also has a fixed energy unit for Psys domain.
+*/
+   case RAPL_UNIT_QUIRK_INTEL_SPR:
+   rapl_hw_unit[PERF_RAPL_RAM] = 16;
+   rapl_hw_unit[PERF_RAPL_PSYS] = 0;
+   break;
default:
break;
}
@@ -757,6 +766,16 @@ static struct rapl_model model_skl = {
.rapl_msrs  = intel_rapl_msrs,
 };
 
+static struct rapl_model model_spr = {
+   .events = BIT(PERF_RAPL_PP0) |
+ BIT(PERF_RAPL_PKG) |
+ BIT(PERF_RAPL_RAM) |
+ BIT(PERF_RAPL_PSYS),
+   .unit_quirk = RAPL_UNIT_QUIRK_INTEL_SPR,
+   .msr_power_unit = MSR_RAPL_POWER_UNIT,
+   .rapl_msrs  = intel_rapl_msrs,
+};
+
 static struct rapl_model model_amd_fam17h = {
.events = BIT(PERF_RAPL_PKG),
.msr_power_unit = MSR_AMD_RAPL_POWER_UNIT,
@@ -793,6 +812,7 @@ static const struct x86_cpu_id rapl_model_match[] 
__initconst = {
X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_X,   &model_hsx),
X86_MATCH_INTEL_FAM6_MODEL(COMETLAKE_L, &model_skl),
X86_MATCH_INTEL_FAM6_MODEL(COMETLAKE,   &model_skl),
+   X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X,&model_spr),
X86_MATCH_VENDOR_FAM(AMD,   0x17,   &model_amd_fam17h),
X86_MATCH_VENDOR_FAM(HYGON, 0x18,   &model_amd_fam17h),
{},
-- 
2.17.1

[PATCH v2 2/3] perf/x86/rapl: Support multiple rapl unit quirks

2020-08-11 Thread Zhang Rui

There will be more platforms with different fixed energy units.
Enhance the code to support different rapl unit quirks for different
platforms.

Signed-off-by: Zhang Rui 
Reviewed-by: Kan Liang 
Reviewed-by: Len Brown 
---
 arch/x86/events/rapl.c | 24 +++-
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index e9723833551f..d0002eb971b7 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -130,11 +130,16 @@ struct rapl_pmus {
struct rapl_pmu *pmus[];
 };
 
+enum rapl_unit_quirk {
+   RAPL_UNIT_QUIRK_NONE,
+   RAPL_UNIT_QUIRK_INTEL_HSW,
+};
+
 struct rapl_model {
struct perf_msr *rapl_msrs;
unsigned long   events;
unsigned int    msr_power_unit;
-   bool            apply_quirk;
+   enum rapl_unit_quirk    unit_quirk;
 };
 
  /* 1/2^hw_unit Joule */
@@ -612,14 +617,20 @@ static int rapl_check_hw_unit(struct rapl_model *rm)
for (i = 0; i < NR_RAPL_DOMAINS; i++)
rapl_hw_unit[i] = (msr_rapl_power_unit_bits >> 8) & 0x1FULL;
 
+   switch (rm->unit_quirk) {
/*
 * DRAM domain on HSW server and KNL has fixed energy unit which can be
 * different than the unit from power unit MSR. See
 * "Intel Xeon Processor E5-1600 and E5-2600 v3 Product Families, V2
 * of 2. Datasheet, September 2014, Reference Number: 330784-001 "
 */
-   if (rm->apply_quirk)
+   case RAPL_UNIT_QUIRK_INTEL_HSW:
rapl_hw_unit[PERF_RAPL_RAM] = 16;
+   break;
+   default:
+   break;
+   }
+
 
/*
 * Calculate the timer rate:
@@ -698,7 +709,6 @@ static struct rapl_model model_snb = {
.events = BIT(PERF_RAPL_PP0) |
  BIT(PERF_RAPL_PKG) |
  BIT(PERF_RAPL_PP1),
-   .apply_quirk= false,
.msr_power_unit = MSR_RAPL_POWER_UNIT,
.rapl_msrs  = intel_rapl_msrs,
 };
@@ -707,7 +717,6 @@ static struct rapl_model model_snbep = {
.events = BIT(PERF_RAPL_PP0) |
  BIT(PERF_RAPL_PKG) |
  BIT(PERF_RAPL_RAM),
-   .apply_quirk= false,
.msr_power_unit = MSR_RAPL_POWER_UNIT,
.rapl_msrs  = intel_rapl_msrs,
 };
@@ -717,7 +726,6 @@ static struct rapl_model model_hsw = {
  BIT(PERF_RAPL_PKG) |
  BIT(PERF_RAPL_RAM) |
  BIT(PERF_RAPL_PP1),
-   .apply_quirk= false,
.msr_power_unit = MSR_RAPL_POWER_UNIT,
.rapl_msrs  = intel_rapl_msrs,
 };
@@ -726,7 +734,7 @@ static struct rapl_model model_hsx = {
.events = BIT(PERF_RAPL_PP0) |
  BIT(PERF_RAPL_PKG) |
  BIT(PERF_RAPL_RAM),
-   .apply_quirk= true,
+   .unit_quirk = RAPL_UNIT_QUIRK_INTEL_HSW,
.msr_power_unit = MSR_RAPL_POWER_UNIT,
.rapl_msrs  = intel_rapl_msrs,
 };
@@ -734,7 +742,7 @@ static struct rapl_model model_hsx = {
 static struct rapl_model model_knl = {
.events = BIT(PERF_RAPL_PKG) |
  BIT(PERF_RAPL_RAM),
-   .apply_quirk= true,
+   .unit_quirk = RAPL_UNIT_QUIRK_INTEL_HSW,
.msr_power_unit = MSR_RAPL_POWER_UNIT,
.rapl_msrs  = intel_rapl_msrs,
 };
@@ -745,14 +753,12 @@ static struct rapl_model model_skl = {
  BIT(PERF_RAPL_RAM) |
  BIT(PERF_RAPL_PP1) |
  BIT(PERF_RAPL_PSYS),
-   .apply_quirk= false,
.msr_power_unit = MSR_RAPL_POWER_UNIT,
.rapl_msrs  = intel_rapl_msrs,
 };
 
 static struct rapl_model model_amd_fam17h = {
.events = BIT(PERF_RAPL_PKG),
-   .apply_quirk= false,
.msr_power_unit = MSR_AMD_RAPL_POWER_UNIT,
.rapl_msrs  = amd_rapl_msrs,
 };
-- 
2.17.1

[PATCH v2 1/3] perf/x86/rapl: Fix missing psys sysfs attributes

2020-08-11 Thread Zhang Rui

This fixes a problem introduced by
commit 5fb5273a905c ("perf/x86/rapl: Use new MSR detection interface")
that perf event sysfs attributes for psys RAPL domain are missing.

Fixes: 5fb5273a905c ("perf/x86/rapl: Use new MSR detection interface")
Signed-off-by: Zhang Rui 
Reviewed-by: Kan Liang 
Reviewed-by: Len Brown 
Acked-by: Jiri Olsa 
---
 arch/x86/events/rapl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index 68b38820b10e..e9723833551f 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -665,7 +665,7 @@ static const struct attribute_group *rapl_attr_update[] = {
&rapl_events_pkg_group,
&rapl_events_ram_group,
&rapl_events_gpu_group,
-   &rapl_events_gpu_group,
+   &rapl_events_psys_group,
NULL,
 };
 
-- 
2.17.1

Re: [GIT PULL] thermal for v5.9-rc1

2020-08-04 Thread Zhang Rui

Hi, Linus,

On Mon, 2020-08-03 at 20:26 -0700, Linus Torvalds wrote:
> On Mon, Aug 3, 2020 at 2:44 PM Daniel Lezcano <
> daniel.lezc...@linaro.org> wrote:
> > 
> > 
ssh://g...@gitolite.kernel.org/pub/scm/linux/kernel/git/thermal/linux.git
> > tags/thermal-v5.9-rc1
> 
> This was all rebased just an hour before you sent it to me.
> 
> Why?
> 

There must be something wrong here. Daniel and I follow a strict
process to make sure that we don't lose any history.

For this PR, I'm not quite sure what happened; he probably did
something by mistake when generating it.

thanks,
rui

> Maybe it's how you commonly work, and I just haven't noticed before,
> but it's wrong for all the reasons I've stated about a million times
> now.
> 
> What makes it so hard for people to understand? What makes that "you
> sent me a completely untested pull request and that's not ok" so
> difficult a concept to get?
> 
> And dammit, if you do it and have a good reason to do this despite
> literally *decades* of me telling people not to do that, and why it's
> wrong, then  you can spend the five minutes *explaining* why you do
> something that is widely documented to be bad.
> 
> These commits sure as hell weren't in linux-next either.
> 
>Linus

Re: [PATCH 1/3] perf/x86/rapl: Fix missing psys sysfs attributes

2020-07-28 Thread Zhang Rui

On Fri, 2020-07-17 at 10:33 +0200, Jiri Olsa wrote:
> On Thu, Jul 16, 2020 at 11:18:57PM +0800, Zhang Rui wrote:
> > This fixes a problem introduced by
> > commit 5fb5273a905c ("perf/x86/rapl: Use new MSR detection
> > interface")
> > that perf event sysfs attributes for psys RAPL domain are missing.
> > 
> > Fixes: 5fb5273a905c ("perf/x86/rapl: Use new MSR detection
> > interface")
> > Signed-off-by: Zhang Rui 
> > Reviewed-by: Kan Liang 
> > Reviewed-by: Len Brown 
> > ---
> >  arch/x86/events/rapl.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
> > index 0f2bf59f4354..51ff9a3618c9 100644
> > --- a/arch/x86/events/rapl.c
> > +++ b/arch/x86/events/rapl.c
> > @@ -665,7 +665,7 @@ static const struct attribute_group
> > *rapl_attr_update[] = {
> > &rapl_events_pkg_group,
> > &rapl_events_ram_group,
> > &rapl_events_gpu_group,
> > -   &rapl_events_gpu_group,
> > +   &rapl_events_psys_group,
> 
> I did copy & paste but did not change to psys :-\
> 
> Acked-by: Jiri Olsa 

Hi, jirka,

Thanks for your ACK.


Hi, Peter,

A gentle ping on this patch series.

thanks,
rui
> 
> thanks,
> jirka
> 
> > NULL,
> >  };
> >  
> > -- 
> > 2.17.1
> > 
> 
>

Re: [PATCH] thermal: core: Add thermal zone enable/disable notification

2020-07-28 Thread Zhang Rui

On Tue, 2020-07-28 at 01:10 +0200, Daniel Lezcano wrote:
> Now the calls to enable/disable a thermal zone are centralized in a
> call to a function, we can add in these the corresponding netlink
> notifications.
> 
> Signed-off-by: Daniel Lezcano 

Acked-by: Zhang Rui 
> ---
>  drivers/thermal/thermal_core.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/thermal/thermal_core.c
> b/drivers/thermal/thermal_core.c
> index 9748fbb9a3a1..72bf159bcecc 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -509,6 +509,11 @@ static int thermal_zone_device_set_mode(struct
> thermal_zone_device *tz,
>  
>   thermal_zone_device_update(tz, THERMAL_EVENT_UNSPECIFIED);
>  
> + if (mode == THERMAL_DEVICE_ENABLED)
> + thermal_notify_tz_enable(tz->id);
> + else
> + thermal_notify_tz_disable(tz->id);
> +
>   return ret;
>  }
>

[PATCH 3/3] perf/x86/rapl: Add support for Intel SPR platform

2020-07-16 Thread Zhang Rui

Intel SPR platform uses fixed 16 bit energy unit for DRAM RAPL domain,
and fixed 0 bit energy unit for Psys RAPL domain.
After this, on SPR platform the energy counters appear in perf list.

Signed-off-by: Zhang Rui 
Reviewed-by: Kan Liang 
Acked-by: Len Brown 
---
 arch/x86/events/rapl.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index 5b3e11299c8d..731e3a32f723 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -133,6 +133,7 @@ struct rapl_pmus {
 enum rapl_unit_quirk {
RAPL_UNIT_QUIRK_NONE,
RAPL_UNIT_QUIRK_INTEL_HSW,
+   RAPL_UNIT_QUIRK_INTEL_SPR,
 };
 
 struct rapl_model {
@@ -627,6 +628,14 @@ static int rapl_check_hw_unit(struct rapl_model *rm)
case RAPL_UNIT_QUIRK_INTEL_HSW:
rapl_hw_unit[PERF_RAPL_RAM] = 16;
break;
+   /*
+* SPR shares the same DRAM domain energy unit as HSW, plus it
+* also has a fixed energy unit for Psys domain.
+*/
+   case RAPL_UNIT_QUIRK_INTEL_SPR:
+   rapl_hw_unit[PERF_RAPL_RAM] = 16;
+   rapl_hw_unit[PERF_RAPL_PSYS] = 0;
+   break;
default:
break;
}
@@ -757,6 +766,16 @@ static struct rapl_model model_skl = {
.rapl_msrs  = intel_rapl_msrs,
 };
 
+static struct rapl_model model_spr = {
+   .events = BIT(PERF_RAPL_PP0) |
+ BIT(PERF_RAPL_PKG) |
+ BIT(PERF_RAPL_RAM) |
+ BIT(PERF_RAPL_PSYS),
+   .unit_quirk = RAPL_UNIT_QUIRK_INTEL_SPR,
+   .msr_power_unit = MSR_RAPL_POWER_UNIT,
+   .rapl_msrs  = intel_rapl_msrs,
+};
+
 static struct rapl_model model_amd_fam17h = {
.events = BIT(PERF_RAPL_PKG),
.msr_power_unit = MSR_AMD_RAPL_POWER_UNIT,
@@ -793,6 +812,7 @@ static const struct x86_cpu_id rapl_model_match[] __initconst = {
X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_X,   &model_hsx),
X86_MATCH_INTEL_FAM6_MODEL(COMETLAKE_L, &model_skl),
X86_MATCH_INTEL_FAM6_MODEL(COMETLAKE,   &model_skl),
+   X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X,&model_spr),
X86_MATCH_VENDOR_FAM(AMD, 0x17, &model_amd_fam17h),
{},
 };
-- 
2.17.1
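
For readers following the unit quirk in the patch above, here is a minimal
user-space sketch of how a fixed energy unit overrides the value read from
the power unit MSR, and how a raw counter maps to Joules (1/2^hw_unit Joule
per count, as the rapl.c comment says). The names fill_hw_units(),
raw_to_joules() and the DOM_*/QUIRK_* enums are hypothetical stand-ins that
only mirror the patch; this is not the kernel code.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the PERF_RAPL_* domain indexes. */
enum rapl_domain { DOM_PP0, DOM_PKG, DOM_RAM, DOM_PP1, DOM_PSYS, DOM_MAX };

/* Hypothetical stand-ins for enum rapl_unit_quirk. */
enum unit_quirk { QUIRK_NONE, QUIRK_HSW, QUIRK_SPR };

/*
 * Start from the energy unit in bits 12:8 of the power unit MSR value,
 * then apply the per-platform fixed units on top.
 */
static void fill_hw_units(uint64_t power_unit_msr, enum unit_quirk quirk,
			  unsigned int hw_unit[DOM_MAX])
{
	int i;

	for (i = 0; i < DOM_MAX; i++)
		hw_unit[i] = (power_unit_msr >> 8) & 0x1F;

	switch (quirk) {
	case QUIRK_HSW:
		hw_unit[DOM_RAM] = 16;
		break;
	case QUIRK_SPR:
		hw_unit[DOM_RAM] = 16;
		hw_unit[DOM_PSYS] = 0;
		break;
	default:
		break;
	}
}

/* One count is 1/2^hw_unit Joule. */
static double raw_to_joules(uint64_t raw, unsigned int hw_unit)
{
	return (double)raw / (double)(1ULL << hw_unit);
}
```

In this sketch, an MSR value whose energy unit field is 14 leaves the
package domain at 14 under the SPR quirk, while DRAM uses 16 and Psys uses
0, so one Psys count corresponds to one Joule.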

[PATCH 1/3] perf/x86/rapl: Fix missing psys sysfs attributes

2020-07-16 Thread Zhang Rui

This fixes a problem introduced by
commit 5fb5273a905c ("perf/x86/rapl: Use new MSR detection interface")
that perf event sysfs attributes for psys RAPL domain are missing.

Fixes: 5fb5273a905c ("perf/x86/rapl: Use new MSR detection interface")
Signed-off-by: Zhang Rui 
Reviewed-by: Kan Liang 
Reviewed-by: Len Brown 
---
 arch/x86/events/rapl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index 0f2bf59f4354..51ff9a3618c9 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -665,7 +665,7 @@ static const struct attribute_group *rapl_attr_update[] = {
&rapl_events_pkg_group,
&rapl_events_ram_group,
&rapl_events_gpu_group,
-   &rapl_events_gpu_group,
+   &rapl_events_psys_group,
NULL,
 };
 
-- 
2.17.1

[PATCH 2/3] perf/x86/rapl: Support multiple rapl unit quirks

2020-07-16 Thread Zhang Rui

There will be more platforms with different fixed energy units.
Enhance the code to support different rapl unit quirks for different
platforms.

Signed-off-by: Zhang Rui 
Reviewed-by: Kan Liang 
Reviewed-by: Len Brown 
---
 arch/x86/events/rapl.c | 24 +++-
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index 51ff9a3618c9..5b3e11299c8d 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -130,11 +130,16 @@ struct rapl_pmus {
struct rapl_pmu *pmus[];
 };
 
+enum rapl_unit_quirk {
+   RAPL_UNIT_QUIRK_NONE,
+   RAPL_UNIT_QUIRK_INTEL_HSW,
+};
+
 struct rapl_model {
struct perf_msr *rapl_msrs;
unsigned long   events;
unsigned int    msr_power_unit;
-   bool            apply_quirk;
+   enum rapl_unit_quirk    unit_quirk;
 };
 
  /* 1/2^hw_unit Joule */
@@ -612,14 +617,20 @@ static int rapl_check_hw_unit(struct rapl_model *rm)
for (i = 0; i < NR_RAPL_DOMAINS; i++)
rapl_hw_unit[i] = (msr_rapl_power_unit_bits >> 8) & 0x1FULL;
 
+   switch (rm->unit_quirk) {
/*
 * DRAM domain on HSW server and KNL has fixed energy unit which can be
 * different than the unit from power unit MSR. See
 * "Intel Xeon Processor E5-1600 and E5-2600 v3 Product Families, V2
 * of 2. Datasheet, September 2014, Reference Number: 330784-001 "
 */
-   if (rm->apply_quirk)
+   case RAPL_UNIT_QUIRK_INTEL_HSW:
rapl_hw_unit[PERF_RAPL_RAM] = 16;
+   break;
+   default:
+   break;
+   }
+
 
/*
 * Calculate the timer rate:
@@ -698,7 +709,6 @@ static struct rapl_model model_snb = {
.events = BIT(PERF_RAPL_PP0) |
  BIT(PERF_RAPL_PKG) |
  BIT(PERF_RAPL_PP1),
-   .apply_quirk= false,
.msr_power_unit = MSR_RAPL_POWER_UNIT,
.rapl_msrs  = intel_rapl_msrs,
 };
@@ -707,7 +717,6 @@ static struct rapl_model model_snbep = {
.events = BIT(PERF_RAPL_PP0) |
  BIT(PERF_RAPL_PKG) |
  BIT(PERF_RAPL_RAM),
-   .apply_quirk= false,
.msr_power_unit = MSR_RAPL_POWER_UNIT,
.rapl_msrs  = intel_rapl_msrs,
 };
@@ -717,7 +726,6 @@ static struct rapl_model model_hsw = {
  BIT(PERF_RAPL_PKG) |
  BIT(PERF_RAPL_RAM) |
  BIT(PERF_RAPL_PP1),
-   .apply_quirk= false,
.msr_power_unit = MSR_RAPL_POWER_UNIT,
.rapl_msrs  = intel_rapl_msrs,
 };
@@ -726,7 +734,7 @@ static struct rapl_model model_hsx = {
.events = BIT(PERF_RAPL_PP0) |
  BIT(PERF_RAPL_PKG) |
  BIT(PERF_RAPL_RAM),
-   .apply_quirk= true,
+   .unit_quirk = RAPL_UNIT_QUIRK_INTEL_HSW,
.msr_power_unit = MSR_RAPL_POWER_UNIT,
.rapl_msrs  = intel_rapl_msrs,
 };
@@ -734,7 +742,7 @@ static struct rapl_model model_hsx = {
 static struct rapl_model model_knl = {
.events = BIT(PERF_RAPL_PKG) |
  BIT(PERF_RAPL_RAM),
-   .apply_quirk= true,
+   .unit_quirk = RAPL_UNIT_QUIRK_INTEL_HSW,
.msr_power_unit = MSR_RAPL_POWER_UNIT,
.rapl_msrs  = intel_rapl_msrs,
 };
@@ -745,14 +753,12 @@ static struct rapl_model model_skl = {
  BIT(PERF_RAPL_RAM) |
  BIT(PERF_RAPL_PP1) |
  BIT(PERF_RAPL_PSYS),
-   .apply_quirk= false,
.msr_power_unit = MSR_RAPL_POWER_UNIT,
.rapl_msrs  = intel_rapl_msrs,
 };
 
 static struct rapl_model model_amd_fam17h = {
.events = BIT(PERF_RAPL_PKG),
-   .apply_quirk= false,
.msr_power_unit = MSR_AMD_RAPL_POWER_UNIT,
.rapl_msrs  = amd_rapl_msrs,
 };
-- 
2.17.1

Re: [RFC PATCH 3/4] thermal:core:Add genetlink notifications for monitoring falling temperature

2020-07-15 Thread Zhang Rui

On Fri, 2020-07-10 at 09:51 -0400, Thara Gopinath wrote:
> Add notification calls for trip type THERMAL_TRIP_COLD when
> temperature
> crosses the trip point in either direction.
> 
> Signed-off-by: Thara Gopinath 
> ---
>  drivers/thermal/thermal_core.c | 21 +++--
>  1 file changed, 15 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/thermal/thermal_core.c
> b/drivers/thermal/thermal_core.c
> index 750a89f0c20a..e2302ca1cd3b 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -429,12 +429,21 @@ static void handle_thermal_trip(struct
> thermal_zone_device *tz, int trip)
>   tz->ops->get_trip_hyst(tz, trip, &hyst);
>  
>   if (tz->last_temperature != THERMAL_TEMP_INVALID) {
> - if (tz->last_temperature < trip_temp &&
> - tz->temperature >= trip_temp)
> - thermal_notify_tz_trip_up(tz->id, trip);
> - if (tz->last_temperature >= trip_temp &&
> - tz->temperature < (trip_temp - hyst))
> - thermal_notify_tz_trip_down(tz->id, trip);
> + if (type == THERMAL_TRIP_COLD) {
> + if (tz->last_temperature > trip_temp &&
> + tz->temperature <= trip_temp)
> + thermal_notify_tz_trip_down(tz->id,
> trip);

trip_type should also be part of the event because trip_down/trip_up
for hot trip and cold trip have different meanings.
Or can we use some more generic names like trip_on/trip_off? trip_on
means the trip point is violated or actions need to be taken for the
specific trip points, for both hot and cold trips. I know
trip_on/trip_off doesn't represent what I mean clearly, but surely you
can find a better name.

thanks,
rui

> + if (tz->last_temperature <= trip_temp &&
> + tz->temperature > (trip_temp + hyst))
> + thermal_notify_tz_trip_up(tz->id,
> trip);
> + } else {
> + if (tz->last_temperature < trip_temp &&
> + tz->temperature >= trip_temp)
> + thermal_notify_tz_trip_up(tz->id,
> trip);
> + if (tz->last_temperature >= trip_temp &&
> + tz->temperature < (trip_temp - hyst))
> + thermal_notify_tz_trip_down(tz->id,
> trip);
> + }
>   }
>  
>   if (type == THERMAL_TRIP_CRITICAL || type == THERMAL_TRIP_HOT)
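
The crossing logic in the quoted hunk can be sketched in isolation as
below. This is a simplified, standalone illustration, not the thermal core
code: classify_crossing(), enum trip_event and the is_cold flag are
hypothetical names, and temperatures are plain ints as in the hunk. It also
shows the point made in the reply: the same "down" event means the trip is
violated for a cold trip but cleared for a hot one, so a consumer needs the
trip type to interpret it.

```c
/* Hypothetical event codes standing in for the trip_up/trip_down calls. */
enum trip_event { TRIP_EVENT_NONE, TRIP_EVENT_UP, TRIP_EVENT_DOWN };

/*
 * Compare the previous and current temperature against a trip point.
 * The hysteresis is applied on the side that leaves the violated region:
 * below the trip for a normal trip, above it for a cold trip.
 */
static enum trip_event classify_crossing(int last, int cur, int trip_temp,
					 int hyst, int is_cold)
{
	if (is_cold) {
		if (last > trip_temp && cur <= trip_temp)
			return TRIP_EVENT_DOWN;
		if (last <= trip_temp && cur > trip_temp + hyst)
			return TRIP_EVENT_UP;
	} else {
		if (last < trip_temp && cur >= trip_temp)
			return TRIP_EVENT_UP;
		if (last >= trip_temp && cur < trip_temp - hyst)
			return TRIP_EVENT_DOWN;
	}

	return TRIP_EVENT_NONE;
}
```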

Re: [RFC PATCH 4/4] thermal: Modify thermal governors to do nothing for "cold" trip points

2020-07-15 Thread Zhang Rui

On Fri, 2020-07-10 at 09:51 -0400, Thara Gopinath wrote:
> For now, thermal governors do not support monitoring of falling
> temperature. Hence, in case of calls to the governor for trip points
> marked
> as cold, return doing nothing.
> 
> Signed-off-by: Thara Gopinath 
> ---
>  drivers/thermal/gov_bang_bang.c   | 8 
>  drivers/thermal/gov_fair_share.c  | 8 
>  drivers/thermal/gov_power_allocator.c | 8 
>  drivers/thermal/gov_step_wise.c   | 8 
>  4 files changed, 32 insertions(+)

The userspace governor does not support cold trip points either.

So how about adding the check in handle_non_critical_trips first, and
removing it later, once all the governors support cold trips?

thanks,
rui
> 
> diff --git a/drivers/thermal/gov_bang_bang.c
> b/drivers/thermal/gov_bang_bang.c
> index 991a1c54296d..8324d13de1e7 100644
> --- a/drivers/thermal/gov_bang_bang.c
> +++ b/drivers/thermal/gov_bang_bang.c
> @@ -99,6 +99,14 @@ static void thermal_zone_trip_update(struct
> thermal_zone_device *tz, int trip)
>  static int bang_bang_control(struct thermal_zone_device *tz, int
> trip)
>  {
>   struct thermal_instance *instance;
> + enum thermal_trip_type trip_type;
> +
> + /* Return doing nothing in case of cold trip point */
> + if (trip != THERMAL_TRIPS_NONE) {
> + tz->ops->get_trip_type(tz, trip, &trip_type);
> + if (trip_type == THERMAL_TRIP_COLD)
> + return 0;
> + }
>  
>   thermal_zone_trip_update(tz, trip);
>  
> diff --git a/drivers/thermal/gov_fair_share.c
> b/drivers/thermal/gov_fair_share.c
> index aaa07180ab48..c0adce525faa 100644
> --- a/drivers/thermal/gov_fair_share.c
> +++ b/drivers/thermal/gov_fair_share.c
> @@ -81,6 +81,14 @@ static int fair_share_throttle(struct
> thermal_zone_device *tz, int trip)
>   int total_weight = 0;
>   int total_instance = 0;
>   int cur_trip_level = get_trip_level(tz);
> + enum thermal_trip_type trip_type;
> +
> + /* Return doing nothing in case of cold trip point */
> + if (trip != THERMAL_TRIPS_NONE) {
> + tz->ops->get_trip_type(tz, trip, &trip_type);
> + if (trip_type == THERMAL_TRIP_COLD)
> + return 0;
> + }
>  
>   list_for_each_entry(instance, &tz->thermal_instances, tz_node)
> {
>   if (instance->trip != trip)
> diff --git a/drivers/thermal/gov_power_allocator.c
> b/drivers/thermal/gov_power_allocator.c
> index 44636475b2a3..2644ad4d4032 100644
> --- a/drivers/thermal/gov_power_allocator.c
> +++ b/drivers/thermal/gov_power_allocator.c
> @@ -613,8 +613,16 @@ static int power_allocator_throttle(struct
> thermal_zone_device *tz, int trip)
>  {
>   int ret;
>   int switch_on_temp, control_temp;
> + enum thermal_trip_type trip_type;
>   struct power_allocator_params *params = tz->governor_data;
>  
> + /* Return doing nothing in case of cold trip point */
> + if (trip != THERMAL_TRIPS_NONE) {
> + tz->ops->get_trip_type(tz, trip, &trip_type);
> + if (trip_type == THERMAL_TRIP_COLD)
> + return 0;
> + }
> +
>   /*
>* We get called for every trip point but we only need to do
>* our calculations once
> diff --git a/drivers/thermal/gov_step_wise.c
> b/drivers/thermal/gov_step_wise.c
> index 2ae7198d3067..009aefda0441 100644
> --- a/drivers/thermal/gov_step_wise.c
> +++ b/drivers/thermal/gov_step_wise.c
> @@ -186,6 +186,14 @@ static void thermal_zone_trip_update(struct
> thermal_zone_device *tz, int trip)
>  static int step_wise_throttle(struct thermal_zone_device *tz, int
> trip)
>  {
>   struct thermal_instance *instance;
> + enum thermal_trip_type trip_type;
> +
> + /* For now, return doing nothing in case of cold trip point */
> + if (trip != THERMAL_TRIPS_NONE) {
> + tz->ops->get_trip_type(tz, trip, &trip_type);
> + if (trip_type == THERMAL_TRIP_COLD)
> + return 0;
> + }
>  
>   thermal_zone_trip_update(tz, trip);
>
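
The suggestion in the reply above, doing the cold trip check once in the
common dispatch path instead of in every governor, could look roughly like
the sketch below. This is a standalone illustration under assumptions: the
struct zone type, the trip_types array, counting_throttle() and
handle_non_critical_trip() are hypothetical and much simpler than the real
thermal core structures.

```c
#include <stddef.h>

/* Hypothetical, reduced set of trip types. */
enum trip_type { TRIP_ACTIVE, TRIP_PASSIVE, TRIP_HOT, TRIP_CRITICAL, TRIP_COLD };

/* Hypothetical, minimal stand-in for a thermal zone with a governor hook. */
struct zone {
	const enum trip_type *trip_types;
	int (*throttle)(struct zone *z, int trip);
	int throttle_calls;
};

/* Stub governor callback that only counts how often it is invoked. */
static int counting_throttle(struct zone *z, int trip)
{
	(void)trip;
	z->throttle_calls++;
	return 0;
}

/*
 * Central check: governors are not called for cold trip points, so no
 * individual governor (including userspace) needs its own test.
 */
static int handle_non_critical_trip(struct zone *z, int trip)
{
	if (z->trip_types[trip] == TRIP_COLD)
		return 0;

	return z->throttle ? z->throttle(z, trip) : 0;
}
```

With this shape, dropping the restriction later means removing a single
check rather than touching each governor again.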

Re: [RFC PATCH 0/4] thermal: Introduce support for monitoring falling temperature

2020-07-15 Thread Zhang Rui

Hi, Thara,

On Tue, 2020-07-14 at 17:39 -0400, Thara Gopinath wrote:
> 
> > 
> > For example, to support this, we can
> > either
> > introduce both "cold" trip points and "warming devices", and
> > introduce
> > new logic in thermal framework and governors to handle them,
> > Or
> > introduce "cold" trip point and "warming" device, but only
> > semantically, and treat them just like normal trip points and
> > cooling
> > devices. And strictly define cooling state 0 as the state that
> > generates most heat, and define max cooling state as the state that
> > generates least heat. Then, say, we have a trip point at -10C, the
> > "warming" device is set to cooling state 0 when the temperature is
> > lower than -10C, and in most cases, this thermal zone is always in
> > a
> > "overheating" state (temperature higher than -10C), and the
> > "warming"
> > device for this thermal zone is "throttled" to generate as least
> > heat
> > as possible. And this is pretty much what the current code has
> > always
> > been doing, right?
> 
> 
> IMHO, thermal framework should move to a direction where the term 
> "mitigation" is used rather than cooling or warming. In this case 
> "cooling dev" and "warming dev" should become 
> "temp-mitigating-dev". So going by this, I think what you mention as 
> option 1 is more suitable where new logic is introduced into the 
> framework and governors to handle the trip points marked as "cold".
> 
> Also in the current set of requirements, we have a few power domain 
> rails and other resources that are used exclusively in the thermal 
> framework for warming alone as in they are not used ever for cooling 
> down a zone. But then one of the requirements we have discussed is
> for cpufreq and gpu scaling to be behave as warming devices where
> the minimum operating point/ voltage of the relevant cpu/gpu is
> restricted.
> So in this case, Daniel had this suggestion of introducing negative 
> states for presently what is defined as cooling devices. So cooling
> dev 
> / temp-mitigation-dev states can range from say -3 to 5 with 0 as
> the 
> good state where no mitigation is happening. This is an interesting
> idea 
> though I have not proto-typed it yet.

Agreed. If some devices support both "cooling" and "warming", we should
have only one "temp-mitigating-dev" instead.
> 
> > 
> > I can not say which one is better for now as I don't have the
> > background of this requirement. It's nice that Thara sent this RFC
> > series for discussion, but from upstream point of view, I'd prefer
> > to
> > see a full stack solution, before taking any code.
> 
> We had done a session at ELC on this requirement. Here is the link
> to 
> the presentation. Hopefully it gives you some back ground on this.

yes, it helps. :)
> 
> 
https://elinux.org/images/f/f7/ELC-2020-Thara-Ram-Linux-Kernel-Thermal-Warming.pdf
> 
> I have sent across some patches for introducing a generic power
> domain 
> warming device which is under review by Daniel.
> 
> So how do you want to proceed on this? Can you elaborate a bit more
> on 
> what you mean by a full stack solution.

I mean, the patches and the idea look good to me, with just some minor
comments. But applying this patch series alone does not bring us
anything, because we don't have a thermal zone driver that supports cold
trip points, right?
I'd like to see this patch series together with the support in
thermal_core/governors and real users, like updated/new thermal
zone/cdev drivers that support the cold trip point and warming
actions.
Otherwise, I'm concerned that this piece of code may be changed back
and forth while the rest of the support is being prototyped.

thanks,
rui

Re: [RFC PATCH 0/4] thermal: Introduce support for monitoring falling temperature

2020-07-14 Thread Zhang Rui

On Mon, 2020-07-13 at 17:03 +0200, Daniel Lezcano wrote:
> On 10/07/2020 15:51, Thara Gopinath wrote:
> > Thermal framework today supports monitoring for rising temperatures
> > and
> > subsequently initiating cooling action in case of a thermal trip
> > point
> > being crossed. There are scenarios where a SoC needs some warming
> > action to
> > be activated if the temperature falls below a certain permissible
> > limit.
> > Since warming action can be considered mirror opposite of cooling
> > action,
> > most of the thermal framework can be re-used to achieve this.
> > 
> > This patch series is yet another attempt to add support for
> > monitoring
> > falling temperature in thermal framework. Unlike the first
> > attempt[1]
> > (where a new property was added to thermal trip point binding to
> > indicate
> > direction of temperature monitoring), this series introduces a new
> > trip
> > point type (THERMAL_TRIP_COLD) to indicate a trip point at which
> > falling
> > temperature monitoring must be triggered. This patch series uses
> > Daniel
> > Lezcano's recently added thermal genetlink interface[2] to notify
> > userspace
> > of falling temperature and rising temperature at the cold trip
> > point. This
> > will enable a user space engine to trigger the relevant mitigation
> > for
> > falling temperature. At present, no support is added to any of the
> > thermal
> > governors to monitor and mitigate falling temperature at the cold
> > trip
> > point; rather all governors return doing nothing if triggered for a
> > cold
> > trip point. As future extension, monitoring of falling temperature
> > can be
> > added to the relevant thermal governor. 
> 
> I agree we need a cold trip point in order to introduce the
> functioning
> temperature range in the thermal framework.
> 
> Rui, what is your opinion ?

I agree with the concept of a "cold" trip point.
In this patch set, the cold trip point is defined with only netlink
event support. But quite a lot of things are still unclear,
especially what the thermal framework itself should do.

For example, to support this, we can either

introduce both "cold" trip points and "warming devices", and introduce
new logic in the thermal framework and governors to handle them,

or

introduce "cold" trip points and "warming" devices only semantically,
and treat them just like normal trip points and cooling devices. We
would then strictly define cooling state 0 as the state that generates
the most heat, and the max cooling state as the state that generates
the least heat. Then, say, we have a trip point at -10C: the "warming"
device is set to cooling state 0 when the temperature is lower than
-10C. In most cases, this thermal zone is in an "overheating" state
(temperature higher than -10C), and the "warming" device for this
thermal zone is "throttled" to generate as little heat as possible.
And this is pretty much what the current code has always been doing,
right?

I cannot say which one is better for now, as I don't have the
background on this requirement. It's nice that Thara sent this RFC
series for discussion, but from an upstream point of view, I'd prefer to
see a full-stack solution before taking any code.

thanks,
Rui
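
One reading of the second option above, reduced to a single function, is
sketched below. It is deliberately simplified (no hysteresis and no
intermediate states), and warming_dev_state() with its millidegree
parameters is a hypothetical name, not an existing thermal API.

```c
/*
 * State 0 is defined as "generates the most heat" and max_state as
 * "generates the least heat". Below the cold trip the device is left
 * unthrottled (state 0); at or above it the zone counts as "overheating"
 * and the device is throttled to max_state.
 */
static int warming_dev_state(int temp_mC, int cold_trip_mC, int max_state)
{
	return temp_mC < cold_trip_mC ? 0 : max_state;
}
```

For a trip at -10C this gives state 0 at -12C and the max state at room
temperature, which matches the behaviour a normal cooling device already
has around a normal trip point.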

Re: [PATCH v4 3/4] thermal: core: genetlink support for events/cmd/sampling

2020-07-06 Thread Zhang Rui

On Mon, 2020-07-06 at 12:55 +0200, Daniel Lezcano wrote:
> Initially the thermal framework had a very simple notification
> mechanism to send generic netlink messages to the userspace.
> 
> The notification function was never called from anywhere and the
> corresponding dead code was removed. It was probably a first attempt
> to introduce the netlink notification.
> 
> At LPC2018, the presentation "Linux thermal: User kernel interface",
> proposed to create the notifications to the userspace via a kfifo.
> 
> The advantage of the kfifo is the performance. It is usually used
> from
> a 1:1 communication channel where a driver captures data and sends it
> as fast as possible to a userspace process.
> 
> The drawback is that only one process uses the notification channel
> exclusively, thus no other process is allowed to use the channel to
> get temperature or notifications.
> 
> This patch defines a generic netlink API to discover the current
> thermal setup and adds event notifications as well as temperature
> sampling. As any genetlink protocol, it can evolve and the versioning
> allows to keep the backward compatibility.
> 
> In order to prevent the user from getting flooded with data on a
> single channel, there are two multicast channels, one for the
> temperature sampling when the thermal zone is updated and another one
> for the events, so the user can get the events only without the
> thermal zone temperature sampling.
> 
> Also, a list of commands to discover the thermal setup is added and
> can be extended when needed.
> 
> Reviewed-by: Amit Kucheria 
> Signed-off-by: Daniel Lezcano 

Acked-by: Zhang Rui 
> ---
>   v4:
>   - Removed max cdev state and renamed function and attributes
>   with CDEV_STATE_UPDATE
>   v3:
>   - Fixed changelog from Amit Kucheria suggestions
>   - Prefixed fields in the parameter structure (trip_*, cdev_*)
>   - Fixed leading whitespaces errors
>   - Replaced id by trip_id
>   - s/THERMAL_GENL_CMD_TZ_GET/THERMAL_GENL_CMD_TZ_GET_ID/
>   - Added the cdev max state in the cdev change event
>   - Removed min state
>   - Fixed checkpatch warnings
> ---
> ---
>  drivers/thermal/Makefile  |   2 +-
>  drivers/thermal/thermal_core.h|  18 +
>  drivers/thermal/thermal_netlink.c | 648
> ++
>  include/linux/thermal.h   |  17 -
>  include/uapi/linux/thermal.h  |  89 +++-
>  5 files changed, 739 insertions(+), 35 deletions(-)
>  create mode 100644 drivers/thermal/thermal_netlink.c
> 
> diff --git a/drivers/thermal/Makefile b/drivers/thermal/Makefile
> index 0c8b84a09b9a..1bbf0805fb04 100644
> --- a/drivers/thermal/Makefile
> +++ b/drivers/thermal/Makefile
> @@ -5,7 +5,7 @@
>  
>  obj-$(CONFIG_THERMAL)+= thermal_sys.o
>  thermal_sys-y+= thermal_core.o
> thermal_sysfs.o \
> - thermal_helpers.o
> + thermal_helpers.o
> thermal_netlink.o
>  
>  # interface to/from other layers providing sensors
>  thermal_sys-$(CONFIG_THERMAL_HWMON)  += thermal_hwmon.o
> diff --git a/drivers/thermal/thermal_core.h
> b/drivers/thermal/thermal_core.h
> index 4f8389efaa62..b44969d50ec0 100644
> --- a/drivers/thermal/thermal_core.h
> +++ b/drivers/thermal/thermal_core.h
> @@ -52,6 +52,24 @@ int for_each_thermal_governor(int (*cb)(struct
> thermal_governor *, void *),
>  
>  struct thermal_zone_device *thermal_zone_get_by_id(int id);
>  
> +/* Netlink notification function */
> +int thermal_notify_tz_create(int tz_id, const char *name);
> +int thermal_notify_tz_delete(int tz_id);
> +int thermal_notify_tz_enable(int tz_id);
> +int thermal_notify_tz_disable(int tz_id);
> +int thermal_notify_tz_trip_down(int tz_id, int id);
> +int thermal_notify_tz_trip_up(int tz_id, int id);
> +int thermal_notify_tz_trip_delete(int tz_id, int id);
> +int thermal_notify_tz_trip_add(int tz_id, int id, int type,
> +int temp, int hyst);
> +int thermal_notify_tz_trip_change(int tz_id, int id, int type,
> +   int temp, int hyst);
> +int thermal_notify_cdev_state_update(int cdev_id, int state);
> +int thermal_notify_cdev_add(int cdev_id, const char *name, int
> max_state);
> +int thermal_notify_cdev_delete(int cdev_id);
> +int thermal_notify_tz_gov_change(int tz_id, const char *name);
> +int thermal_genl_sampling_temp(int id, int temp);
> +
>  struct thermal_attr {
>   struct device_attribute attr;
>   char name[THERMAL_NAME_LENGTH];
> diff --git a/drivers/thermal/thermal_netlink.c
> b/drivers/thermal/thermal_netlink.c
> new fi

Re: [PATCH v4 4/4] thermal: core: Add notifications call in the framework

2020-07-06 Thread Zhang Rui

On Mon, 2020-07-06 at 12:55 +0200, Daniel Lezcano wrote:
> The generic netlink protocol is implemented but the different
> notification functions are not yet connected to the core code.
> 
> These changes add the notification calls in the different
> corresponding places.
> 
> Reviewed-by: Amit Kucheria 
> Signed-off-by: Daniel Lezcano 

Acked-by: Zhang Rui 
> ---
>   v4:
>  - Fixed missing static declaration, reported by kbuild-bot
>  - Removed max state notification
> ---
>  drivers/thermal/thermal_core.c| 21 +
>  drivers/thermal/thermal_helpers.c | 13 +++--
>  drivers/thermal/thermal_sysfs.c   | 15 ++-
>  3 files changed, 46 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/thermal/thermal_core.c
> b/drivers/thermal/thermal_core.c
> index 5fae1621fb01..25ef29123f72 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -215,6 +215,8 @@ int thermal_zone_device_set_policy(struct
> thermal_zone_device *tz,
>   mutex_unlock(&tz->lock);
>   mutex_unlock(&thermal_governor_lock);
>  
> + thermal_notify_tz_gov_change(tz->id, policy);
> +
>   return ret;
>  }
>  
> @@ -406,12 +408,25 @@ static void handle_critical_trips(struct
> thermal_zone_device *tz,
>  static void handle_thermal_trip(struct thermal_zone_device *tz, int
> trip)
>  {
>   enum thermal_trip_type type;
> + int trip_temp, hyst = 0;
>  
>   /* Ignore disabled trip points */
>   if (test_bit(trip, &tz->trips_disabled))
>   return;
>  
> + tz->ops->get_trip_temp(tz, trip, &trip_temp);
>   tz->ops->get_trip_type(tz, trip, &type);
> + if (tz->ops->get_trip_hyst)
> + tz->ops->get_trip_hyst(tz, trip, &hyst);
> +
> + if (tz->last_temperature != THERMAL_TEMP_INVALID) {
> + if (tz->last_temperature < trip_temp &&
> + tz->temperature >= trip_temp)
> + thermal_notify_tz_trip_up(tz->id, trip);
> + if (tz->last_temperature >= trip_temp &&
> + tz->temperature < (trip_temp - hyst))
> + thermal_notify_tz_trip_down(tz->id, trip);
> + }
>  
>   if (type == THERMAL_TRIP_CRITICAL || type == THERMAL_TRIP_HOT)
>   handle_critical_trips(tz, trip, type);
> @@ -443,6 +458,8 @@ static void update_temperature(struct
> thermal_zone_device *tz)
>   mutex_unlock(&tz->lock);
>  
>   trace_thermal_temperature(tz);
> +
> + thermal_genl_sampling_temp(tz->id, temp);
>  }
>  
>  static void thermal_zone_device_init(struct thermal_zone_device *tz)
> @@ -1405,6 +1422,8 @@ thermal_zone_device_register(const char *type,
> int trips, int mask,
>   if (atomic_cmpxchg(&tz->need_update, 1, 0))
>   thermal_zone_device_update(tz,
> THERMAL_EVENT_UNSPECIFIED);
>  
> + thermal_notify_tz_create(tz->id, tz->type);
> +
>   return tz;
>  
>  unregister:
> @@ -1476,6 +1495,8 @@ void thermal_zone_device_unregister(struct
> thermal_zone_device *tz)
>   ida_destroy(&tz->ida);
>   mutex_destroy(&tz->lock);
>   device_unregister(&tz->device);
> +
> + thermal_notify_tz_delete(tz->id);
>  }
>  EXPORT_SYMBOL_GPL(thermal_zone_device_unregister);
>  
> diff --git a/drivers/thermal/thermal_helpers.c
> b/drivers/thermal/thermal_helpers.c
> index 87b1256fa2f2..c94bc824e5d3 100644
> --- a/drivers/thermal/thermal_helpers.c
> +++ b/drivers/thermal/thermal_helpers.c
> @@ -175,6 +175,16 @@ void thermal_zone_set_trips(struct
> thermal_zone_device *tz)
>   mutex_unlock(&tz->lock);
>  }
>  
> +static void thermal_cdev_set_cur_state(struct thermal_cooling_device
> *cdev,
> +int target)
> +{
> + if (cdev->ops->set_cur_state(cdev, target))
> + return;
> +
> + thermal_notify_cdev_state_update(cdev->id, target);
> + thermal_cooling_device_stats_update(cdev, target);
> +}
> +
>  void thermal_cdev_update(struct thermal_cooling_device *cdev)
>  {
>   struct thermal_instance *instance;
> @@ -197,8 +207,7 @@ void thermal_cdev_update(struct
> thermal_cooling_device *cdev)
>   target = instance->target;
>   }
>  
> - if (!cdev->ops->set_cur_state(cdev, target))
> - thermal_cooling_device_stats_update(cdev, target);
> + thermal_cdev_set_cur_state(cdev, target);
>  
>   cdev->updated = true;
>   mutex_unlock(&cdev->lock);
> diff --git a/drivers/thermal/thermal_sysfs.c
> b/drivers/thermal/thermal_sysfs.c
> index aa99edb4dff7..ff
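
The trip crossing check added to handle_thermal_trip() in the patch above can be sketched in isolation. This is a minimal userspace sketch, not the kernel code: the enum and the function name classify_crossing() are hypothetical, and the THERMAL_TEMP_INVALID guard from the patch is left to the caller.

```c
/* Hypothetical result type for this sketch; not part of the kernel API. */
enum trip_crossing { TRIP_NONE, TRIP_UP, TRIP_DOWN };

/*
 * Classify one temperature update against a single trip point.
 * An upward crossing is reported when the trip temperature is reached.
 * A downward crossing is reported only when the temperature drops
 * below trip_temp - hyst in a single update, as in the quoted patch.
 */
static enum trip_crossing classify_crossing(int last_temp, int temp,
					    int trip_temp, int hyst)
{
	if (last_temp < trip_temp && temp >= trip_temp)
		return TRIP_UP;
	if (last_temp >= trip_temp && temp < trip_temp - hyst)
		return TRIP_DOWN;
	return TRIP_NONE;
}
```

With a trip at 80000 and a hysteresis of 5000 (millidegrees are an assumption here), a drop from 85000 to 78000 stays inside the hysteresis band and reports nothing. Note that with this check a later step from 78000 to 74000 reports nothing either, because the previous sample is already below the trip temperature.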

Re: [PATCH v4 2/4] thermal: core: Get thermal zone by id

2020-07-06 Thread Zhang Rui

On Mon, 2020-07-06 at 12:55 +0200, Daniel Lezcano wrote:
> The next patch will introduce the generic netlink protocol to handle
> events, sampling and command from the thermal framework. In order to
> deal with the thermal zone, it uses its unique identifier to
> characterize it in the message. Passing an integer is more efficient
> than passing an entire string.
> 
> This change provides a function returning back a thermal zone pointer
> corresponding to the identifier passed as parameter.
> 
> Signed-off-by: Daniel Lezcano 
> Reviewed-by: Amit Kucheria 

Acked-by: Zhang Rui 
> ---
>  drivers/thermal/thermal_core.c | 14 ++
>  drivers/thermal/thermal_core.h |  2 ++
>  2 files changed, 16 insertions(+)
> 
> diff --git a/drivers/thermal/thermal_core.c
> b/drivers/thermal/thermal_core.c
> index 9caaa0b6d662..5fae1621fb01 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -668,6 +668,20 @@ int for_each_thermal_zone(int (*cb)(struct
> thermal_zone_device *, void *),
>   return ret;
>  }
>  
> +struct thermal_zone_device *thermal_zone_get_by_id(int id)
> +{
> + struct thermal_zone_device *tz = NULL;
> +
> + mutex_lock(&thermal_list_lock);
> + list_for_each_entry(tz, &thermal_tz_list, node) {
> + if (tz->id == id)
> + break;
> + }
> + mutex_unlock(&thermal_list_lock);
> +
> + return tz;
> +}
> +
>  void thermal_zone_device_unbind_exception(struct thermal_zone_device
> *tz,
> const char *cdev_type, size_t
> size)
>  {
> diff --git a/drivers/thermal/thermal_core.h
> b/drivers/thermal/thermal_core.h
> index 71d88dac0791..4f8389efaa62 100644
> --- a/drivers/thermal/thermal_core.h
> +++ b/drivers/thermal/thermal_core.h
> @@ -50,6 +50,8 @@ int for_each_thermal_cooling_device(int
> (*cb)(struct thermal_cooling_device *,
>  int for_each_thermal_governor(int (*cb)(struct thermal_governor *,
> void *),
> void *thermal_governor);
>  
> +struct thermal_zone_device *thermal_zone_get_by_id(int id);
> +
>  struct thermal_attr {
>   struct device_attribute attr;
>   char name[THERMAL_NAME_LENGTH];
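
The lookup by identifier described in the changelog above can be sketched outside the kernel. This is a minimal sketch assuming a plain NULL-terminated list; struct zone and zone_get_by_id() are hypothetical stand-ins for struct thermal_zone_device and thermal_zone_get_by_id(), and the locking is only mentioned in a comment.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct thermal_zone_device. */
struct zone {
	int id;
	struct zone *next;
};

/*
 * Return the zone with the given id, or NULL when no zone matches.
 * The hit is stored in a separate "match" pointer. That mirrors the
 * idiom needed with the kernel's list_for_each_entry(), whose cursor
 * is not NULL after a full traversal without a break.
 * In the kernel the walk would run under thermal_list_lock.
 */
static struct zone *zone_get_by_id(struct zone *head, int id)
{
	struct zone *tz, *match = NULL;

	for (tz = head; tz; tz = tz->next) {
		if (tz->id == id) {
			match = tz;
			break;
		}
	}
	return match;
}
```

Callers can then rely on a NULL return to mean "no such thermal zone", which is what a netlink command handler would typically need before dereferencing the result.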

Re: [PATCH v3 3/4] thermal: core: genetlink support for events/cmd/sampling

2020-07-05 Thread Zhang Rui

On Fri, 2020-07-03 at 10:53 +0200, Daniel Lezcano wrote:
> Initially the thermal framework had a very simple notification
> mechanism to send generic netlink messages to the userspace.
> 
> The notification function was never called from anywhere and the
> corresponding dead code was removed. It was probably a first attempt
> to introduce the netlink notification.
> 
> At LPC2018, the presentation "Linux thermal: User kernel interface",
> proposed to create the notifications to the userspace via a kfifo.
> 
> The advantage of the kfifo is the performance. It is usually used
> from a 1:1 communication channel where a driver captures data and
> sends it as fast as possible to a userspace process.
> 
> The drawback is that only one process uses the notification channel
> exclusively, thus no other process is allowed to use the channel to
> get temperature or notifications.
> 
> This patch defines a generic netlink API to discover the current
> thermal setup and adds event notifications as well as temperature
> sampling. As any genetlink protocol, it can evolve and the versioning
> allows to keep the backward compatibility.
> 
> In order to prevent the user from getting flooded with data on a
> single channel, there are two multicast channels, one for the
> temperature sampling when the thermal zone is updated and another one
> for the events, so the user can get the events only without the
> thermal zone temperature sampling.
> 
> Also, a list of commands to discover the thermal setup is added and
> can be extended when needed.
> 
> Signed-off-by: Daniel Lezcano 
> ---
>   v3:
>   - Fixed changelog from Amit Kucheria suggestions
>   - Prefixed fields in the parameter structure (trip_*, cdev_*)
>   - Fixed leading whitespaces errors
>   - Replaced id by trip_id
>   - s/THERMAL_GENL_CMD_TZ_GET/THERMAL_GENL_CMD_TZ_GET_ID/
>   - Added the cdev max state in the cdev change event
>   - Removed min state
>   - Fixed checkpatch warnings
> ---
> ---
>  drivers/thermal/Makefile  |   2 +-
>  drivers/thermal/thermal_core.h|  18 +
>  drivers/thermal/thermal_netlink.c | 650
> ++
>  include/linux/thermal.h   |  17 -
>  include/uapi/linux/thermal.h  |  90 -
>  5 files changed, 742 insertions(+), 35 deletions(-)
>  create mode 100644 drivers/thermal/thermal_netlink.c
> 
> diff --git a/drivers/thermal/Makefile b/drivers/thermal/Makefile
> index 0c8b84a09b9a..1bbf0805fb04 100644
> --- a/drivers/thermal/Makefile
> +++ b/drivers/thermal/Makefile
> @@ -5,7 +5,7 @@
>  
>  obj-$(CONFIG_THERMAL)+= thermal_sys.o
>  thermal_sys-y+= thermal_core.o
> thermal_sysfs.o \
> - thermal_helpers.o
> + thermal_helpers.o
> thermal_netlink.o
>  
>  # interface to/from other layers providing sensors
>  thermal_sys-$(CONFIG_THERMAL_HWMON)  += thermal_hwmon.o
> diff --git a/drivers/thermal/thermal_core.h
> b/drivers/thermal/thermal_core.h
> index 4f8389efaa62..12bca87fb709 100644
> --- a/drivers/thermal/thermal_core.h
> +++ b/drivers/thermal/thermal_core.h
> @@ -52,6 +52,24 @@ int for_each_thermal_governor(int (*cb)(struct
> thermal_governor *, void *),
>  
>  struct thermal_zone_device *thermal_zone_get_by_id(int id);
>  
> +/* Netlink notification function */
> +int thermal_notify_tz_create(int tz_id, const char *name);
> +int thermal_notify_tz_delete(int tz_id);
> +int thermal_notify_tz_enable(int tz_id);
> +int thermal_notify_tz_disable(int tz_id);
> +int thermal_notify_tz_trip_down(int tz_id, int id);
> +int thermal_notify_tz_trip_up(int tz_id, int id);
> +int thermal_notify_tz_trip_delete(int tz_id, int id);
> +int thermal_notify_tz_trip_add(int tz_id, int id, int type,
> +int temp, int hyst);
> +int thermal_notify_tz_trip_change(int tz_id, int id, int type,
> +   int temp, int hyst);
> +int thermal_notify_cdev_update(int cdev_id, int state);
> +int thermal_notify_cdev_add(int cdev_id, const char *name, int
> max_state);
> +int thermal_notify_cdev_delete(int cdev_id);
> +int thermal_notify_tz_gov_change(int tz_id, const char *name);
> +int thermal_genl_sampling_temp(int id, int temp);
> +
>  struct thermal_attr {
>   struct device_attribute attr;
>   char name[THERMAL_NAME_LENGTH];
> diff --git a/drivers/thermal/thermal_netlink.c
> b/drivers/thermal/thermal_netlink.c
> new file mode 100644
> index ..d3c48bbcd269
> --- /dev/null
> +++ b/drivers/thermal/thermal_netlink.c
> @@ -0,0 +1,650 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright 2020 Linaro Limited
> + *
> + * Author: Daniel Lezcano 
> + *
> + * Generic netlink for thermal management framework
> + */
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "thermal_core.h"
> +
> +static const struct genl_multicast_group thermal_genl_mcgrps[] = {
> + { .name =

Re: [PATCH v2 1/5] thermal: core: Add helpers to browse the cdev, tz and governor list

2020-07-02 Thread Zhang Rui

On Wed, 2020-07-01 at 11:50 +0200, Daniel Lezcano wrote:
> On 01/07/2020 09:57, Zhang Rui wrote:
> 
> [ ... ]
> 
> > > Do you want to move them out?
> > 
> > Then no. I have no objection to removing thermal_helpers.c, so
> > you can just leave these functions in thermal_core.c
> 
> Shall I consider that as an ack for this patch ?
> 
sure.

Acked-by: Zhang Rui 

thanks,
rui
>

Re: [PATCH v2 1/5] thermal: core: Add helpers to browse the cdev, tz and governor list

2020-07-01 Thread Zhang Rui

On Wed, 2020-07-01 at 09:35 +0200, Daniel Lezcano wrote:
> On 30/06/2020 17:09, Zhang Rui wrote:
> > Hi, Daniel,
> > 
> > seems that you forgot to cc linux-pm mailing list.
> > 
> > On Tue, 2020-06-30 at 17:16 +0530, Amit Kucheria wrote:
> > > On Thu, Jun 25, 2020 at 8:15 PM Daniel Lezcano
> > >  wrote:
> > > > 
> > > > The cdev, tz and governor list, as well as their respective
> > > > locks are statically defined in the thermal_core.c file.
> > > > 
> > > > In order to give a sane access to these list, like browsing all
> > > > the thermal zones or all the cooling devices, let's define a
> > > > set of helpers where we pass a callback as a parameter to be
> > > > called for each thermal entity.
> > > > 
> > > > We keep the self-encapsulation and ensure the locks are
> > > > correctly taken when looking at the list.
> > > > 
> > > > Signed-off-by: Daniel Lezcano 
> > > > ---
> > > >  drivers/thermal/thermal_core.c | 51
> > > > ++
> > > 
> > > Is the idea to not use thermal_helpers.c from now on? It fits
> > > perfectly with a patch I have to merge all its contents to
> > > thermal_core.c :-)
> > 
> > > I agree these changes should be in thermal_helpers.c
> 
> Oh, actually I remember putting those functions in the thermal_core.c
> file because they need the locks which are statically defined in there.
> 
> If the functions are moved to thermal_helpers.c, that will imply
> exporting the locks outside of the file, thus breaking the
> self-encapsulation.
> 
> Do you want to move them out?

Then no. I have no objection to removing thermal_helpers.c, so
you can just leave these functions in thermal_core.c

thanks,
rui
> 
>

Re: [PATCH v2 4/5] thermal: core: genetlink support for events/cmd/sampling

2020-07-01 Thread Zhang Rui

Hi, Daniel,

On Tue, 2020-06-30 at 20:32 +0200, Daniel Lezcano wrote:
> Hi Zhang,
> 
> thanks for taking the time to review
> 
> 
> On 30/06/2020 18:12, Zhang Rui wrote:
> 
> [ ... ]
> 
> > > +int thermal_notify_tz_enable(int tz_id);
> > > +int thermal_notify_tz_disable(int tz_id);
> > 
> > these two will be used after merging the mode enhancement patches
> > from
> > Andrzej Pietrasiewicz, right?
> 
> Yes, that is correct.
> 
> > > +int thermal_notify_tz_trip_down(int tz_id, int id);
> > > +int thermal_notify_tz_trip_up(int tz_id, int id);
> > > +int thermal_notify_tz_trip_delete(int tz_id, int id);
> > > +int thermal_notify_tz_trip_add(int tz_id, int id, int type,
> > > +int temp, int hyst);
> > 
> > is there any case we need to use these two?
> 
> There is the initial proposal to add/del trip points via sysfs which
> is
> not merged because of no comments and the presentation from Srinivas
> giving the 'add' and 'del' notification description, so I assumed the
> feature would be added soon.
> 
> These functions are here ready to be connected to the core code. If
> anyone is planning to implement add/del trip point, he won't have to
> implement these two notifications as they will be ready to be called.
> 
Then I'd prefer we only introduce the events that are used or will be
used soon, like the tz disable/enable, to avoid some potential dead
code.
We can easily add more events when they are needed.

Srinivas, do you plan to use the trip add/delete events?

> > > +int thermal_notify_tz_trip_change(int tz_id, int id, int type,
> > > +   int temp, int hyst);
> > > +int thermal_notify_cdev_update(int cdev_id, int state);
> > 
> > This is used when the cdev cur_state is changed.
> > what about cases that cdev->max_state changes? I think we need an
> > extra
> > event for it, right?
> 
> Sure, I can add:
> 
> int thermal_notify_cdev_change(...)

thermal_notify_cdev_change() and thermal_notify_cdev_update() look
confusing to me.
Can we use thermal_notify_cdev_update_cur() and
thermal_notify_cdev_update_max()?
Or can we use one event, with updated cur_state and max_state?

> > > +int thermal_notify_cdev_add(int cdev_id, const char *name,
> > > + int min_state, int max_state);
> > > +int thermal_notify_cdev_delete(int cdev_id);
> > 
> > These two should be used in patch 5/5.
> 
> Sure.
> 
> > > +int thermal_notify_tz_gov_change(int tz_id, const char *name);
> > > +int thermal_genl_sampling_temp(int id, int temp);
> > > +
> > 
> >  struct thermal_attr {
> > >   struct device_attribute attr;
> > >   char name[THERMAL_NAME_LENGTH];
> > > diff --git a/drivers/thermal/thermal_netlink.c
> > > b/drivers/thermal/thermal_netlink.c
> > > new file mode 100644
> > > index ..a8d6a815a21d
> > > --- /dev/null
> > > +++ b/drivers/thermal/thermal_netlink.c
> > > @@ -0,0 +1,645 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +/*
> > > + * Copyright 2020 Linaro Limited
> > > + *
> > > + * Author: Daniel Lezcano 
> > > + *
> > > + * Generic netlink for thermal management framework
> > > + */
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +
> > > +#include "thermal_core.h"
> > > +
> > > +static const struct genl_multicast_group thermal_genl_mcgrps[] =
> > > {
> > > + { .name = THERMAL_GENL_SAMPLING_GROUP_NAME, },
> > > + { .name = THERMAL_GENL_EVENT_GROUP_NAME,  },
> > > +};
> > > +
> > > +static const struct nla_policy
> > > thermal_genl_policy[THERMAL_GENL_ATTR_MAX + 1] = {
> > > + /* Thermal zone */
> > > + [THERMAL_GENL_ATTR_TZ]  = { .type =
> > > NLA_NESTED },
> > > + [THERMAL_GENL_ATTR_TZ_ID]   = { .type = NLA_U32 },
> > > + [THERMAL_GENL_ATTR_TZ_TEMP] = { .type = NLA_U32
> > > },
> > > + [THERMAL_GENL_ATTR_TZ_TRIP] = { .type =
> > > NLA_NESTED },
> > > + [THERMAL_GENL_ATTR_TZ_TRIP_ID]  = { .type =
> > > NLA_U32
> > > },
> > > + [THERMAL_GENL_ATTR_TZ_TRIP_TEMP]= { .type = NLA_U32 },
> > > + [THERMAL_GENL_ATTR_TZ_TRIP_TYPE]= { .type = NLA_U32 },
> > > + [THERMAL_GENL_ATTR_TZ_TRIP_HYST]= { .type = NLA_U32 },
> > > + [THERMAL_GENL_ATTR_TZ_MODE] = { .type = NLA_U32

Re: [PATCH v2 4/5] thermal: core: genetlink support for events/cmd/sampling

2020-06-30 Thread Zhang Rui

On Thu, 2020-06-25 at 16:45 +0200, Daniel Lezcano wrote:
> Initially the thermal framework had a very simple notification
> mechanism to send generic netlink messages to the userspace.
> 
> The notification function was never called from anywhere and the
> corresponding dead code was removed. It was probably a first attempt
> to introduce the netlink notification.
> 
> At LPC2018, the presentation "Linux thermal: User kernel interface",
> proposed to create the notifications to the userspace via a kfifo.
> 
> The advantage of the kfifo is the performance. It is usually used
> from a 1:1 communication channel where a driver captures data and
> sends it as fast as possible to a userspace process.
> 
> The drawback is that the process uses the notification channel
> exclusively, thus no other process is allowed to use the channel to
> get temperature or notifications.
> 
> The purpose of this series is to provide a fully netlink featured
> thermal management.
> 
> This patch is defining a generic netlink API to discover the current
> thermal setup, get events and temperature sampling. As any genetlink
> protocol, it can evolve and the versioning allows to keep the
> backward compatibility.
> 
> In order to not flood the user with a single channel data, there are
> two multicast channels, one for the temperature sampling when the
> thermal zone is updated and another one for the events, so the user
> can get the events only without the thermal zone temperature
> sampling. All these events are from the ones presented at the
> LPC2018.
> 
> Also, a list of commands to discover the thermal setup is given and
> can be extended on purpose.
> 
> Signed-off-by: Daniel Lezcano 
> ---
>  drivers/thermal/Makefile  |   2 +-
>  drivers/thermal/thermal_core.h|  19 +
>  drivers/thermal/thermal_netlink.c | 645
> ++
>  include/linux/thermal.h   |  12 -
>  include/uapi/linux/thermal.h  |  80 +++-
>  5 files changed, 738 insertions(+), 20 deletions(-)
>  create mode 100644 drivers/thermal/thermal_netlink.c
> 
> diff --git a/drivers/thermal/Makefile b/drivers/thermal/Makefile
> index 0c8b84a09b9a..1bbf0805fb04 100644
> --- a/drivers/thermal/Makefile
> +++ b/drivers/thermal/Makefile
> @@ -5,7 +5,7 @@
>  
>  obj-$(CONFIG_THERMAL)+= thermal_sys.o
>  thermal_sys-y+= thermal_core.o
> thermal_sysfs.o \
> - thermal_helpers.o
> + thermal_helpers.o
> thermal_netlink.o
>  
>  # interface to/from other layers providing sensors
>  thermal_sys-$(CONFIG_THERMAL_HWMON)  += thermal_hwmon.o
> diff --git a/drivers/thermal/thermal_core.h
> b/drivers/thermal/thermal_core.h
> index 7e8f45db6bbf..08eb03fe4f69 100644
> --- a/drivers/thermal/thermal_core.h
> +++ b/drivers/thermal/thermal_core.h
> @@ -52,6 +52,25 @@ int for_each_thermal_governor(int (*cb)(struct
> thermal_governor *, void *),
>  
>  struct thermal_zone_device *thermal_zone_get_by_id(int id);
>  
> +/* Netlink notification function */
> +int thermal_notify_tz_create(int tz_id, const char *name);
> +int thermal_notify_tz_delete(int tz_id);

> +int thermal_notify_tz_enable(int tz_id);
> +int thermal_notify_tz_disable(int tz_id);

these two will be used after merging the mode enhancement patches from
Andrzej Pietrasiewicz, right?


> +int thermal_notify_tz_trip_down(int tz_id, int id);
> +int thermal_notify_tz_trip_up(int tz_id, int id);

> +int thermal_notify_tz_trip_delete(int tz_id, int id);
> +int thermal_notify_tz_trip_add(int tz_id, int id, int type,
> +int temp, int hyst);

is there any case we need to use these two?

> +int thermal_notify_tz_trip_change(int tz_id, int id, int type,
> +   int temp, int hyst);
> +int thermal_notify_cdev_update(int cdev_id, int state);

This is used when the cdev cur_state is changed.
what about cases that cdev->max_state changes? I think we need an extra
event for it, right?
> 
> +int thermal_notify_cdev_add(int cdev_id, const char *name,
> + int min_state, int max_state);
> +int thermal_notify_cdev_delete(int cdev_id);

These two should be used in patch 5/5.

> +int thermal_notify_tz_gov_change(int tz_id, const char *name);
> +int thermal_genl_sampling_temp(int id, int temp);
> +

 struct thermal_attr {
>   struct device_attribute attr;
>   char name[THERMAL_NAME_LENGTH];
> diff --git a/drivers/thermal/thermal_netlink.c
> b/drivers/thermal/thermal_netlink.c
> new file mode 100644
> index ..a8d6a815a21d
> --- /dev/null
> +++ b/drivers/thermal/thermal_netlink.c
> @@ -0,0 +1,645 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright 2020 Linaro Limited
> + *
> + * Author: Daniel Lezcano 
> + *
> + * Generic netlink for thermal management framework
> + */
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "thermal_core.h"
> +
> +static const struct

Re: [PATCH v2 2/5] thermal: core: Get thermal zone by id

2020-06-30 Thread Zhang Rui

On Thu, 2020-06-25 at 16:45 +0200, Daniel Lezcano wrote:
> The next patch will introduce the generic netlink protocol to handle
> events, sampling and command from the thermal framework. In order to
> deal with the thermal zone, it uses its unique identifier to
> characterize it in the message. Passing an integer is more efficient
> than passing an entire string.
> 
> This change provides a function returning back a thermal zone pointer
> corresponding to the identifier passed as parameter.
> 
> Signed-off-by: Daniel Lezcano 
> ---
>  drivers/thermal/thermal_core.c | 14 ++
>  drivers/thermal/thermal_core.h |  2 ++
>  2 files changed, 16 insertions(+)
> 
> diff --git a/drivers/thermal/thermal_core.c
> b/drivers/thermal/thermal_core.c
> index e2f8d2550ecd..58c95aeafb7f 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -662,6 +662,20 @@ int for_each_thermal_zone(int (*cb)(struct
> thermal_zone_device *, void *),
>   return ret;
>  }
>  
> +struct thermal_zone_device *thermal_zone_get_by_id(int id)
> +{
> + struct thermal_zone_device *tz = NULL;
> +
> + mutex_lock(&thermal_list_lock);
> + list_for_each_entry(tz, &thermal_tz_list, node) {
> + if (tz->id == id)
> + break;
> + }
> + mutex_unlock(&thermal_list_lock);
> +
> + return tz;
> +}
> +

IMO, this one, as well as thermal_zone_get_zone_by_name(), should both
be in thermal_helpers.c

thanks,
rui
>  void thermal_zone_device_unbind_exception(struct thermal_zone_device
> *tz,
> const char *cdev_type, size_t
> size)
>  {
> diff --git a/drivers/thermal/thermal_core.h
> b/drivers/thermal/thermal_core.h
> index bb8f8aee79eb..7e8f45db6bbf 100644
> --- a/drivers/thermal/thermal_core.h
> +++ b/drivers/thermal/thermal_core.h
> @@ -50,6 +50,8 @@ int for_each_thermal_cooling_device(int
> (*cb)(struct thermal_cooling_device *,
>  int for_each_thermal_governor(int (*cb)(struct thermal_governor *,
> void *),
> void *thermal_governor);
>  
> +struct thermal_zone_device *thermal_zone_get_by_id(int id);
> +
>  struct thermal_attr {
>   struct device_attribute attr;
>   char name[THERMAL_NAME_LENGTH];

Re: [PATCH v2 1/5] thermal: core: Add helpers to browse the cdev, tz and governor list

2020-06-30 Thread Zhang Rui

Hi, Daniel,

seems that you forgot to cc linux-pm mailing list.

On Tue, 2020-06-30 at 17:16 +0530, Amit Kucheria wrote:
> On Thu, Jun 25, 2020 at 8:15 PM Daniel Lezcano
>  wrote:
> > 
> > > The cdev, tz and governor list, as well as their respective locks
> > > are statically defined in the thermal_core.c file.
> > > 
> > > In order to give a sane access to these list, like browsing all the
> > > thermal zones or all the cooling devices, let's define a set of
> > > helpers where we pass a callback as a parameter to be called for
> > > each thermal entity.
> > > 
> > > We keep the self-encapsulation and ensure the locks are correctly
> > > taken when looking at the list.
> > 
> > Signed-off-by: Daniel Lezcano 
> > ---
> >  drivers/thermal/thermal_core.c | 51
> > ++
> 
> Is the idea to not use thermal_helpers.c from now on? It fits
> perfectly with a patch I have to merge all its contents to
> thermal_core.c :-)

I agree these changes should be in thermal_helpers.c

thanks,
rui
> 
> 
> >  drivers/thermal/thermal_core.h |  9 ++
> >  2 files changed, 60 insertions(+)
> > 
> > diff --git a/drivers/thermal/thermal_core.c
> > b/drivers/thermal/thermal_core.c
> > index 2a3f83265d8b..e2f8d2550ecd 100644
> > --- a/drivers/thermal/thermal_core.c
> > +++ b/drivers/thermal/thermal_core.c
> > @@ -611,6 +611,57 @@ void
> > thermal_zone_device_rebind_exception(struct thermal_zone_device
> > *tz,
> > > mutex_unlock(&thermal_list_lock);
> >  }
> > 
> > +int for_each_thermal_governor(int (*cb)(struct thermal_governor *,
> > void *),
> > + void *data)
> 
> 
> > +{
> > +   struct thermal_governor *gov;
> > +   int ret = 0;
> > +
> > > +   mutex_lock(&thermal_governor_lock);
> > > +   list_for_each_entry(gov, &thermal_governor_list,
> > > governor_list) {
> > > +   ret = cb(gov, data);
> > > +   if (ret)
> > > +   break;
> > > +   }
> > > +   mutex_unlock(&thermal_governor_lock);
> > +
> > +   return ret;
> > +}
> > +
> > +int for_each_thermal_cooling_device(int (*cb)(struct
> > thermal_cooling_device *,
> > + void *), void *data)
> > +{
> > +   struct thermal_cooling_device *cdev;
> > +   int ret = 0;
> > +
> > > +   mutex_lock(&thermal_list_lock);
> > > +   list_for_each_entry(cdev, &thermal_cdev_list, node) {
> > > +   ret = cb(cdev, data);
> > > +   if (ret)
> > > +   break;
> > > +   }
> > > +   mutex_unlock(&thermal_list_lock);
> > +
> > +   return ret;
> > +}
> > +
> > +int for_each_thermal_zone(int (*cb)(struct thermal_zone_device *,
> > void *),
> > + void *data)
> > +{
> > +   struct thermal_zone_device *tz;
> > +   int ret = 0;
> > +
> > > +   mutex_lock(&thermal_list_lock);
> > > +   list_for_each_entry(tz, &thermal_tz_list, node) {
> > > +   ret = cb(tz, data);
> > > +   if (ret)
> > > +   break;
> > > +   }
> > > +   mutex_unlock(&thermal_list_lock);
> > +
> > +   return ret;
> > +}
> > +
> >  void thermal_zone_device_unbind_exception(struct
> > thermal_zone_device *tz,
> >   const char *cdev_type,
> > size_t size)
> >  {
> > diff --git a/drivers/thermal/thermal_core.h
> > b/drivers/thermal/thermal_core.h
> > index 4e271016b7a9..bb8f8aee79eb 100644
> > --- a/drivers/thermal/thermal_core.h
> > +++ b/drivers/thermal/thermal_core.h
> > @@ -41,6 +41,15 @@ extern struct thermal_governor
> > *__governor_thermal_table_end[];
> >  __governor < __governor_thermal_table_end; \
> >  __governor++)
> > 
> > +int for_each_thermal_zone(int (*cb)(struct thermal_zone_device *,
> > void *),
> > + void *);
> > +
> > +int for_each_thermal_cooling_device(int (*cb)(struct
> > thermal_cooling_device *,
> > + void *), void *);
> > +
> > +int for_each_thermal_governor(int (*cb)(struct thermal_governor *,
> > void *),
> > + void *thermal_governor);
> > +
> >  struct thermal_attr {
> > struct device_attribute attr;
> > char name[THERMAL_NAME_LENGTH];
> > --
> > 2.17.1
> >

Re: [PATCH V2 1/3] thermal/int340x_thermal: Export GDDV

2020-05-29 Thread Zhang Rui

On Fri, 2020-05-29 at 00:00 -0600, Pandruvada, Srinivas wrote:
> On Mon, 2020-05-18 at 23:18 +, Pandruvada, Srinivas wrote:
> > On Mon, 2020-04-13 at 19:09 -0700, Matthew Garrett wrote:
> > > From: Matthew Garrett 
> > > 
> > > Implementing DPTF properly requires making use of firmware-
> > > provided
> > > information associated with the INT3400 device. Calling GDDV
> > > provides
> > > a
> > > buffer of information which userland can then interpret to
> > > determine
> > > appropriate DPTF policy.
> > > 
> > > Signed-off-by: Matthew Garrett 
> > 
> > Tested-by: Pandruvada, Srinivas <
> > srinivas.pandruv...@linux.intel.com>
> 
> Can we take this series for 5.8?

They are in the testing branch and have just passed the build test. I
will queue them for 5.8.

thanks,
rui
> 
> Thanks,
> Srinivas
> 
> > 
> > > ---
> > >  .../intel/int340x_thermal/int3400_thermal.c   | 60
> > > +++
> > >  1 file changed, 60 insertions(+)
> > > 
> > > diff --git
> > > a/drivers/thermal/intel/int340x_thermal/int3400_thermal.c
> > > b/drivers/thermal/intel/int340x_thermal/int3400_thermal.c
> > > index ceef89c956bd4..00a7732724cd0 100644
> > > --- a/drivers/thermal/intel/int340x_thermal/int3400_thermal.c
> > > +++ b/drivers/thermal/intel/int340x_thermal/int3400_thermal.c
> > > @@ -52,6 +52,25 @@ struct int3400_thermal_priv {
> > >   u8 uuid_bitmap;
> > >   int rel_misc_dev_res;
> > >   int current_uuid_index;
> > > + char *data_vault;
> > > +};
> > > +
> > > +static ssize_t data_vault_read(struct file *file, struct kobject
> > > *kobj,
> > > +  struct bin_attribute *attr, char *buf, loff_t off, size_t
> > > count)
> > > +{
> > > + memcpy(buf, attr->private + off, count);
> > > + return count;
> > > +}
> > > +
> > > +static BIN_ATTR_RO(data_vault, 0);
> > > +
> > > +static struct bin_attribute *data_attributes[] = {
> > > > + &bin_attr_data_vault,
> > > + NULL,
> > > +};
> > > +
> > > +static const struct attribute_group data_attribute_group = {
> > > + .bin_attrs = data_attributes,
> > >  };
> > >  
> > >  static ssize_t available_uuids_show(struct device *dev,
> > > @@ -278,6 +297,32 @@ static struct thermal_zone_params
> > > int3400_thermal_params = {
> > >   .no_hwmon = true,
> > >  };
> > >  
> > > +static void int3400_setup_gddv(struct int3400_thermal_priv
> > > *priv)
> > > +{
> > > + struct acpi_buffer buffer = { ACPI_ALLOCATE_BUFFER, NULL };
> > > + union acpi_object *obj;
> > > + acpi_status status;
> > > +
> > > > + status = acpi_evaluate_object(priv->adev->handle, "GDDV", NULL,
> > > > +   &buffer);
> > > + if (ACPI_FAILURE(status) || !buffer.length)
> > > + return;
> > > +
> > > + obj = buffer.pointer;
> > > + if (obj->type != ACPI_TYPE_PACKAGE || obj->package.count != 1
> > > + || obj->package.elements[0].type != ACPI_TYPE_BUFFER) {
> > > + kfree(buffer.pointer);
> > > + return;
> > > + }
> > > +
> > > > + priv->data_vault = kmemdup(obj->package.elements[0].buffer.pointer,
> > > > +obj->package.elements[0].buffer.length,
> > > > +GFP_KERNEL);
> > > > + bin_attr_data_vault.private = priv->data_vault;
> > > > + bin_attr_data_vault.size = obj->package.elements[0].buffer.length;
> > > + kfree(buffer.pointer);
> > > +}
> > > +
> > >  static int int3400_thermal_probe(struct platform_device *pdev)
> > >  {
> > > >   struct acpi_device *adev = ACPI_COMPANION(&pdev->dev);
> > > @@ -309,6 +354,8 @@ static int int3400_thermal_probe(struct
> > > platform_device *pdev)
> > >  
> > >   platform_set_drvdata(pdev, priv);
> > >  
> > > + int3400_setup_gddv(priv);
> > > +
> > >   int3400_thermal_ops.get_mode = int3400_thermal_get_mode;
> > >   int3400_thermal_ops.set_mode = int3400_thermal_set_mode;
> > >  
> > > @@ -327,6 +374,13 @@ static int int3400_thermal_probe(struct
> > > platform_device *pdev)
> > >   if (result)
> > >   goto free_rel_misc;
> > >  
> > > + if (priv->data_vault) {
> > > > + result = sysfs_create_group(&pdev->dev.kobj,
> > > > + &data_attribute_group);
> > > + if (result)
> > > + goto free_uuid;
> > > + }
> > > +
> > >   result = acpi_install_notify_handler(
> > >   priv->adev->handle, ACPI_DEVICE_NOTIFY,
> > > int3400_notify,
> > >   (void *)priv);
> > > @@ -336,6 +390,9 @@ static int int3400_thermal_probe(struct
> > > platform_device *pdev)
> > >   return 0;
> > >  
> > >  free_sysfs:
> > > + if (priv->data_vault)
> > > > + sysfs_remove_group(&pdev->dev.kobj,
> > > > &data_attribute_group);
> > > > +free_uuid:
> > > >   sysfs_remove_group(&pdev->dev.kobj, &uuid_attribute_group);
> > >  free_rel_misc:
> > >   if (!priv->rel_misc_dev_res)
> > > @@ -360,8 +417,11 @@ static int int3400_thermal_remove(struct
> > > platform_device *pdev)
> > >   if (!priv->rel_misc_dev_res)
> > > >   acpi_thermal_rel_misc_device_remove(priv->adev->handle);
> > >  
> > > + if (priv->data_vault)
> > > +

Re: [GIT PULL] Thermal management updates for v5.4-rc1

2019-09-28 Thread Zhang Rui

Hi, Linus,

I'm really sorry about this.

I thought that having no code changes made a rebase acceptable, and
did not realize that this is exactly the case where a rebase should be
avoided. I wish I had read
Documentation/maintainer/rebasing-and-merging.rst earlier, so that I
would not have made this mistake.
Sorry for causing this trouble.

thanks,
rui

On Fri, 2019-09-27 at 11:34 -0700, Linus Torvalds wrote:
> On Fri, Sep 27, 2019 at 6:08 AM Zhang Rui 
> wrote:
> > 
> > One thing to mention is that all the patches have been tested in
> > linux-next for weeks, but a conflict was detected because upstream
> > has taken commit eaf7b46083a7e34 ("docs: thermal: add it to the
> > driver API") from the jc-docs tree while I was keeping a wrong
> > version of the patch, so I just rebased my tree to fix this.
> 
> Why do I have to say this EVERY single release?
> 
> A conflict is not a reason to rebase. Conflicts happen. They happen a
> lot. I deal with them, and it's usually trivial.
> 
> If you feel it's not trivial, just describe what the resolution is,
> rather than rebasing. Really.
> 
> Rebasing for a random conflict (particularly in documentation, for
> chrissake!) is like using an atomic bomb to swat a fly.  You have all
> those downsides, and there are basically _no_ upsides. It only makes
> for more work for me because I have to re-write this email for the
> millionth time, and that takes longer and is more aggravating than
> the conflict would have taken to just sort out.
> 
>Linus

[GIT PULL] Thermal management updates for v5.4-rc1

2019-09-27 Thread Zhang Rui

Hi, Linus,

Please pull from
  git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux.git for-5.4

to receive the latest Thermal Management updates for v5.4-rc1 with
top-most commit 0f84d1d18c46d0f995962c876c8b2900fd183fd7:

  Merge branches 'thermal-core', 'thermal-intel' and 'thermal-soc' into
for-5.4 (2019-09-24 09:56:37 +0800)

on top of commit d1abaeb3be7b5fa6d7a1fbbd2e14e3310005c4c1:

  Linux 5.3-rc5 (2019-08-18 14:31:08 -0700)

One thing to mention is that all the patches have been tested in
linux-next for weeks, but a conflict was detected because upstream has
taken commit eaf7b46083a7e34 ("docs: thermal: add it to the driver
API") from the jc-docs tree while I was keeping a wrong version of the
patch, so I just rebased my tree to fix this.

Specifics:
 - Add Amit Kucheria as thermal subsystem Reviewer. (Amit Kucheria)
 - Fix a use-after-free bug when unregistering thermal zone devices.
   (Ido Schimmel)
 - Fix thermal core framework to use put_device() when
   device_register() fails. (Yue Hu)
 - Enable intel_pch_thermal and MMIO RAPL support for Intel Icelake
   platform. (Srinivas Pandruvada)
 - Add clock operations in qoriq thermal driver, for some platforms
   with clock control like i.MX8MQ. (Anson Huang)
 - A couple of trivial fixes and cleanups for thermal core and
   different soc thermal drivers. (Amit Kucheria, Christophe
   JAILLET, Chuhong Yuan, Fuqian Huang, Kelsey Skunberg, Nathan
   Huckleberry, Rishi Gupta, Srinivas Kandagatla)

thanks,
rui


Amit Kucheria (2):
  thermal: Add some error messages
  MAINTAINERS: Add Amit Kucheria as reviewer for thermal

Anson Huang (5):
  thermal: qoriq: Add clock operations
  thermal: qoriq: Fix error path of calling qoriq_tmu_register_tmu_zone fail
  thermal: qoriq: Use devm_platform_ioremap_resource() instead of of_iomap()
  thermal: qoriq: Use __maybe_unused instead of #if CONFIG_PM_SLEEP
  dt-bindings: thermal: qoriq: Add optional clocks property

Christophe JAILLET (1):
  thermal: tegra: Fix a typo

Chuhong Yuan (1):
  thermal: intel: Use dev_get_drvdata

Fuqian Huang (1):
  thermal: rcar_gen3_thermal: Replace devm_add_action() followed by failure action with devm_add_action_or_reset()

Ido Schimmel (1):
  thermal: Fix use-after-free when unregistering thermal zone device

Kelsey Skunberg (1):
  thermal: intel: int340x_thermal: Remove unnecessary acpi_has_method() uses

Nathan Huckleberry (1):
  thermal: armada: Fix -Wshift-negative-value

Rishi Gupta (1):
  thermal: intel: int3403: replace printk(KERN_WARN...) with pr_warn(...)

Srinivas Kandagatla (1):
  drivers: thermal: qcom: tsens: Fix memory leak from qfprom read

Srinivas Pandruvada (2):
  drivers: thermal: processor_thermal_device: Export sysfs interface for TCC offset
  thermal: int340x: processor_thermal: Add Ice Lake support

Stefan Mavrodiev (1):
  thermal_hwmon: Sanitize thermal_zone type

Yue Hu (1):
  thermal/drivers/core: Use put_device() if device_register() fails

Zhang Rui (2):
  Merge branches 'thermal-soc-misc' and 'thermal-soc-qoriq' into thermal-soc
  Merge branches 'thermal-core', 'thermal-intel' and 'thermal-soc' into for-5.4

 .../devicetree/bindings/thermal/qoriq-thermal.txt  |  1 +
 MAINTAINERS                                        |  1 +
 drivers/thermal/armada_thermal.c                   |  5 +-
 .../intel/int340x_thermal/acpi_thermal_rel.c       |  6 --
 .../intel/int340x_thermal/int3403_thermal.c        |  2 +-
 .../int340x_thermal/processor_thermal_device.c     | 96 +-
 drivers/thermal/intel/intel_pch_thermal.c          |  6 +-
 drivers/thermal/qcom/tsens-8960.c                  |  2 +
 drivers/thermal/qcom/tsens-v0_1.c                  | 12 ++-
 drivers/thermal/qcom/tsens-v1.c                    |  1 +
 drivers/thermal/qcom/tsens.h                       |  1 +
 drivers/thermal/qoriq_thermal.c                    | 45 ++
 drivers/thermal/rcar_gen3_thermal.c                |  3 +-
 drivers/thermal/tegra/soctherm.c                   |  2 +-
 drivers/thermal/thermal_core.c                     | 44 ++
 drivers/thermal/thermal_hwmon.c                    |  8 +-
 16 files changed, 178 insertions(+), 57 deletions(-)

Re: [PATCH] thermal: qoriq: add thermal monitor unit version 2 support

2019-09-23 Thread Zhang Rui

On Mon, 2019-09-23 at 09:24 +, Andy Tang wrote:
> Hi Rui, Edubezval,
> 
> Would you please review this patch?
> 
CC Anson Huang.
I'd prefer all the qoriq thermal patches go through his review first.

thanks,
rui

> BR,
> Andy
> 
> > -Original Message-
> > From: Andy Tang
> > Sent: August 29, 2019 16:38
> > To: 'edubez...@gmail.com' ; 
> > 'rui.zh...@intel.com'
> > 
> > Cc: 'daniel.lezc...@linaro.org' ; Leo Li
> > ; 'linux...@vger.kernel.org'
> > ; 'linux-kernel@vger.kernel.org'
> > 
> > Subject: RE: [PATCH] thermal: qoriq: add thermal monitor unit
> > version 2
> > support
> > 
> > Hi Rui, Edubezval,
> > 
> > Almost three months have passed and I have not received any comments
> > from you. Could you please take a look at this patch?
> > 
> > BR,
> > Andy
> > 
> > > -Original Message-
> > > From: Andy Tang
> > > Sent: August 6, 2019 10:57
> > > To: edubez...@gmail.com; rui.zh...@intel.com
> > > Cc: daniel.lezc...@linaro.org; Leo Li ;
> > > linux...@vger.kernel.org; linux-kernel@vger.kernel.org
> > > Subject: RE: [PATCH] thermal: qoriq: add thermal monitor unit
> > > version
> > > 2 support
> > > 
> > > Any comments?
> > > 
> > > BR,
> > > Andy
> > > 
> > > > -Original Message-
> > > > From: Yuantian Tang 
> > > > Sent: June 4, 2019 10:51
> > > > To: edubez...@gmail.com; rui.zh...@intel.com
> > > > Cc: daniel.lezc...@linaro.org; Leo Li ;
> > > > linux...@vger.kernel.org; linux-kernel@vger.kernel.org; Andy
> > > > Tang
> > > > 
> > > > Subject: [PATCH] thermal: qoriq: add thermal monitor unit
> > > > version 2
> > > > support
> > > > 
> > > > Thermal Monitor Unit v2 is introduced on new Layerscape SoCs.
> > > > Compared to v1, TMUv2 has a slightly different register layout
> > > > and its digital output is fairly linear.
> > > > 
> > > > Signed-off-by: Yuantian Tang 
> > > > ---
> > > >  drivers/thermal/qoriq_thermal.c | 122
> > > > +---
> > > >  1 file changed, 98 insertions(+), 24 deletions(-)
> > > > 
> > > > diff --git a/drivers/thermal/qoriq_thermal.c b/drivers/thermal/qoriq_thermal.c
> > > > index 3b5f5b3fb1bc..0df6dfddf804 100644
> > > > --- a/drivers/thermal/qoriq_thermal.c
> > > > +++ b/drivers/thermal/qoriq_thermal.c
> > > > @@ -13,6 +13,15 @@
> > > >  #include "thermal_core.h"
> > > > 
> > > >  #define SITES_MAX  16
> > > > +#define TMR_DISABLE    0x0
> > > > +#define TMR_ME         0x8000
> > > > +#define TMR_ALPF       0x0c00
> > > > +#define TMR_ALPF_V2    0x0300
> > > > +#define TMTMIR_DEFAULT 0x000f
> > > > +#define TIER_DISABLE   0x0
> > > > +#define TEUMR0_V2      0x51009C00
> > > > +#define TMU_VER1       0x1
> > > > +#define TMU_VER2       0x2
> > > > 
> > > >  /*
> > > >   * QorIQ TMU Registers
> > > > @@ -23,17 +32,55 @@ struct qoriq_tmu_site_regs {
> > > > u8 res0[0x8];
> > > >  };
> > > > 
> > > > -struct qoriq_tmu_regs {
> > > > +struct qoriq_tmu_regs_v2 {
> > > > +   u32 tmr;        /* Mode Register */
> > > > +   u32 tsr;        /* Status Register */
> > > > +   u32 tmsr;       /* monitor site register */
> > > > +   u32 tmtmir;     /* Temperature measurement interval Register */
> > > > +   u8 res0[0x10];
> > > > +   u32 tier;       /* Interrupt Enable Register */
> > > > +   u32 tidr;       /* Interrupt Detect Register */
> > > > +   u8 res1[0x8];
> > > > +   u32 tiiscr;     /* interrupt immediate site capture register */
> > > > +   u32 tiascr;     /* interrupt average site capture register */
> > > > +   u32 ticscr;     /* Interrupt Critical Site Capture Register */
> > > > +   u32 res2;
> > > > +   u32 tmhtcr;     /* monitor high temperature capture register */
> > > > +   u32 tmltcr;     /* monitor low temperature capture register */
> > > > +   u32 tmrtrcr;    /* monitor rising temperature rate capture register */
> > > > +   u32 tmftrcr;    /* monitor falling temperature rate capture register */
> > > > +   u32 tmhtitr;    /* High Temperature Immediate Threshold */
> > > > +   u32 tmhtatr;    /* High Temperature Average Threshold */
> > > > +   u32 tmhtactr;   /* High Temperature Average Crit Threshold */
> > > > +   u32 res3;
> > > > +   u32 tmltitr;    /* monitor low temperature immediate threshold */
> > > > +   u32 tmltatr;    /* monitor low temperature average threshold register */
> > > > +   u32 tmltactr;   /* monitor low temperature average critical threshold */
> > > > +   u32 res4;
> > > > +   u32 tmrtrctr;   /* monitor rising temperature rate critical threshold */
> > > > +   u32 tmftrctr;   /* monitor falling temperature rate critical threshold */
> > > > +   u8 res5[0x8];
> > > > +   u32 ttcfgr; /*

Re: [PATCH 1/5] thermal: Initialize thermal subsystem earlier

2019-09-19 Thread Zhang Rui

On Tue, 2019-09-17 at 14:48 +0530, Amit Kucheria wrote:
> On Tue, Sep 17, 2019 at 1:30 AM Daniel Lezcano
>  wrote:
> > 
> > On 12/09/2019 00:32, Amit Kucheria wrote:
> > > From: Lina Iyer 
> > > 
> > > Now that the thermal framework is built-in, in order to
> > > facilitate
> > > thermal mitigation as early as possible in the boot cycle, move
> > > the
> > > thermal framework initialization to core_initcall.
> > > 
> > > However, netlink initialization happens only as part of
> > > subsys_initcall.
> > > At this time in the boot process, the userspace is not available
> > > yet. So
> > > initialize the netlink events later in fs_initcall.
> > 
> > Why not kill directly the netlink part, no one is using it in the
> > kernel?
> 
> That's a good point. I wasn't sure if anybody was using it, but I can
> remove it completely since no driver seems to be using the
> thermal_generate_netlink_event() api.

Interesting, I recall that thermal_generate_netlink_event() was indeed
used by some thermal driver, but it's true that no one is using it now.

Let's remove it and see if we get any complaints.

thanks,
rui
> 
> Regards,
> Amit
> 
> $ git grep thermal_generate_netlink_event
> Documentation/thermal/sysfs-api.rst:just need to call thermal_generate_netlink_event() with two arguments viz
> drivers/thermal/thermal_core.c:int thermal_generate_netlink_event(struct thermal_zone_device *tz,
> drivers/thermal/thermal_core.c:EXPORT_SYMBOL_GPL(thermal_generate_netlink_event);
> include/linux/thermal.h:extern int thermal_generate_netlink_event(struct thermal_zone_device *tz,
> include/linux/thermal.h:static inline int thermal_generate_netlink_event(struct thermal_zone_device *tz,
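
The boot ordering described in the quoted patch description (thermal core at core_initcall, its netlink part deferred to fs_initcall, after netlink itself is up) can be illustrated with a small user-space sketch. This is not kernel code: every sketch_* name is hypothetical, and only the relative order of the three levels (core before subsys before fs) is taken from the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Relative order of the initcall levels named in the thread (core runs first). */
enum sketch_level { SKETCH_CORE = 1, SKETCH_SUBSYS = 4, SKETCH_FS = 5 };

struct sketch_initcall {
	enum sketch_level level;
	void (*fn)(void);
};

/* Records the order in which the hypothetical init functions ran. */
static char sketch_boot_log[64];

static void sketch_thermal_init(void)
{
	strcat(sketch_boot_log, "thermal;");
}

static void sketch_netlink_init(void)
{
	strcat(sketch_boot_log, "netlink;");
}

static void sketch_thermal_netlink_init(void)
{
	strcat(sketch_boot_log, "thermal-netlink;");
}

/* Run every registered call level by level, regardless of registration order. */
static void sketch_run_initcalls(const struct sketch_initcall *calls, size_t n)
{
	assert(calls != NULL);
	sketch_boot_log[0] = '\0';

	for (int level = SKETCH_CORE; level <= SKETCH_FS; level++)
		for (size_t i = 0; i < n; i++)
			if (calls[i].level == (enum sketch_level)level)
				calls[i].fn();
}
```

Registering the three functions in any order and calling sketch_run_initcalls() always runs the thermal core first and its netlink part last, which is the property the patch relies on.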

Re: [PATCH V3 1/5] thermal: qoriq: Add clock operations

2019-08-28 Thread Zhang Rui

On Thu, 2019-08-29 at 02:49 +, Anson Huang wrote:
> Hi, Rui
> 
> > > On Wed, 2019-08-28 at 08:51 +, Anson Huang wrote:
> > > > Hi, Rui
> > > > 
> > > > > On Tue, 2019-08-27 at 12:41 +, Leonard Crestez wrote:
> > > > > > On 27.08.2019 04:51, Anson Huang wrote:
> > > > > > > > > In an earlier series the CLK_IS_CRITICAL flag was
> > > > > > > > > removed from the TMU clock, so if the thermal driver
> > > > > > > > > doesn't explicitly enable it the system will hang on
> > > > > > > > > probe. This is what happens in linux-next right now!
> > > > > > > 
> > > > > > > > The thermal driver should be built as a module, so the
> > > > > > > > default kernel should be able to boot. Did you change
> > > > > > > > the thermal driver to built-in?
> > > > > > > 
> > > > > > > > > Unless these patches are merged soon we'll end up with
> > > > > > > > > a 5.4-rc1 that doesn't boot on imx8mq. An easy fix
> > > > > > > > > would be to drop/revert commit
> > > > > > > > > 951c1aef9691 ("clk: imx8mq: Remove CLK_IS_CRITICAL flag
> > > > > > > > > for IMX8MQ_CLK_TMU_ROOT") until the thermal patches are
> > > > > > > > > accepted.
> > > > > > > 
> > > > > > > > If the thermal driver is built as a module, I think there
> > > > > > > > is no need to revert the commit, but if the thermal driver
> > > > > > > > is built-in or modprobed by default, then yes, it should
> > > > > > > > NOT break kernel boot.
> > > > > > 
> > > > > > > The qoriq_thermal driver is built as a module in defconfig,
> > > > > > > and when modules are properly installed in the rootfs they
> > > > > > > will automatically be probed on boot and cause a hang.
> > > > > > 
> > > > > > I usually run nfsroot with modules:
> > > > > > 
> > > > > > >  make modules_install INSTALL_MOD_PATH=/srv/nfs/imx8-root
> > > > > 
> > > > > so we need this patch shipped in the beginning of the merge
> > > > > window, right?
> > > > > if there is hard dependency between patches, it's better to
> > > > > send
> > > > > them in one series, and get shipped via either tree.
> > > > 
> > > > > There is no hard dependency in this patch series. Previously,
> > > > > with the patch that removed CLK_IS_CRITICAL from the TMU clock,
> > > > > I did NOT find the issue because the thermal driver is built
> > > > > as a module. The patch series is the correct fix.
> > > > 
> > > 
> > > Got it.
> > > the clock patch is also queued for 5.4-rc1, right?
> > > I will apply this series and try to push it as early as possible
> > > during the merge window.
> > 
> > > The clock patch is as below in the linux-next tree. I did NOT see
> > > it in v5.3-rc6, so it should be queued for 5.4-rc1, right?
> > > Thanks for taking the patch series!
> 
> Sorry for pushing. So you will apply this patch series to avoid the
> i.MX8MQ kernel boot hang caused by insmod of the qoriq thermal driver,
> right? Then we do not need to revert the TMU clock patch
> 951c1aef9691 ("clk: imx8mq: Remove CLK_IS_CRITICAL flag for
> IMX8MQ_CLK_TMU_ROOT").
> 
right. I will queue it for 5.4-rc1.

thanks,
rui

> Thanks,
> Anson
> 
> > 
> > 
> > commit 951c1aef9691491ddf4dd5aab76f2665d56bd5d3
> > Author: Anson Huang 
> > Date:   Fri Jul 5 12:56:11 2019 +0800
> > 
> > clk: imx8mq: Remove CLK_IS_CRITICAL flag for
> > IMX8MQ_CLK_TMU_ROOT
> > 
> > IMX8MQ_CLK_TMU_ROOT is ONLY used for thermal module, the driver
> > should manage this clock, so no need to have CLK_IS_CRITICAL
> > flag
> > set.
> > 
> > Signed-off-by: Anson Huang 
> > Reviewed-by: Abel Vesa 
> > Acked-by: Stephen Boyd 
> > Signed-off-by: Shawn Guo 
> > 
> > drivers/clk/imx/clk-imx8mq.c
> > 
> > 
> > Thanks,
> > Anson
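
The failure mode discussed in this thread (a driver touching TMU registers while its clock is gated, once the clock is no longer marked critical) can be modelled in user space. This is only a sketch under stated assumptions: all sketch_* names are hypothetical, the gated-clock access returns -EIO where real hardware would hang, and the helpers merely imitate the enable-before-access pattern rather than the actual qoriq driver or clock framework.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical clock model: running if marked critical or explicitly enabled. */
struct sketch_clk {
	int enable_count;
	bool critical;
};

struct sketch_tmu {
	struct sketch_clk *clk;
	unsigned int tmr;	/* stand-in for one device register */
};

static int sketch_clk_prepare_enable(struct sketch_clk *clk)
{
	clk->enable_count++;
	return 0;
}

static void sketch_clk_disable_unprepare(struct sketch_clk *clk)
{
	assert(clk->enable_count > 0);
	clk->enable_count--;
}

static bool sketch_clk_running(const struct sketch_clk *clk)
{
	return clk->critical || clk->enable_count > 0;
}

/* A register write only succeeds while the clock runs; real hardware would hang. */
static int sketch_tmu_write(struct sketch_tmu *tmu, unsigned int val)
{
	if (!sketch_clk_running(tmu->clk))
		return -EIO;
	tmu->tmr = val;
	return 0;
}

/* Probe that optionally manages its own clock before touching the device. */
static int sketch_probe(struct sketch_tmu *tmu, bool manage_clock)
{
	int ret;

	if (manage_clock) {
		ret = sketch_clk_prepare_enable(tmu->clk);
		if (ret)
			return ret;
	}

	ret = sketch_tmu_write(tmu, 0x1);
	if (ret && manage_clock)
		sketch_clk_disable_unprepare(tmu->clk);

	return ret;
}
```

With critical set to false, sketch_probe(&tmu, false) fails while sketch_probe(&tmu, true) succeeds, which mirrors why the clock-operations patch has to land together with the removal of the critical flag.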

Re: [PATCH] thermal: armada: Fix -Wshift-negative-value

2019-08-28 Thread Zhang Rui

On Wed, 2019-08-28 at 11:49 -0700, Nick Desaulniers wrote:
> On Wed, Aug 28, 2019 at 1:53 AM Zhang Rui 
> wrote:
> > 
> > On Mon, 2019-08-19 at 10:21 +0200, Miquel Raynal wrote:
> > > Hello,
> > > 
> > > Daniel Lezcano  wrote on Thu, 15 Aug
> > > 2019
> > > 01:06:21 +0200:
> > > 
> > > > On 15/08/2019 00:12, Nick Desaulniers wrote:
> > > > > On Tue, Aug 13, 2019 at 10:28 AM 'Nathan Huckleberry' via
> > > > > Clang
> > > > > Built
> > > > > Linux  wrote:
> > > > > > 
> > > > > > Following up to see if this patch is going to be accepted.
> > > > > 
> > > > > Miquel is listed as the maintainer of this file in
> > > > > MAINTAINERS.
> > > > > Miquel, can you please pick this up?  Otherwise Zhang,
> > > > > Eduardo,
> > > > > and
> > > > > Daniel are listed as maintainers for drivers/thermal/.
> > > > 
> > > > I'm listed as reviewer, it is up to Zhang or Eduardo to take
> > > > the
> > > > patches.
> > > > 
> > > > 
> > > 
> > > Sorry for the delay, I don't manage a tree for this driver, I'll
> > > let
> > > Zhang or Eduardo take the patch with my
> > > 
> > > Acked-by: Miquel Raynal 
> > > 
> > 
> > Patch applied.
> > 
> > thanks,
> > rui
> 
> Thanks Rui, did you push the branch?  I guess I would have expected
> it in
> https://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux.git/log/?h=next
> ?
> I'm trying to track where this lands in
> https://github.com/ClangBuiltLinux/linux/issues/532.

Not yet. I will push it to kernel.org after I finish my internal build
test.

thanks,
rui
>

Re: [PATCH] drivers: thermal: qcom: tsens: Fix memory leak from qfprom read

2019-08-28 Thread Zhang Rui

On Fri, 2019-08-23 at 17:29 +0530, Amit Kucheria wrote:
> On Fri, Aug 23, 2019 at 3:08 PM Srinivas Kandagatla
>  wrote:
> > 
> > memory returned as part of nvmem_read via qfprom_read should be
> > freed by the consumer once done.
> > Existing code is not doing it so fix it.
> > 
> > Below memory leak detected by kmemleak
> >[] kmemleak_alloc+0x50/0x84
> > [] __kmalloc+0xe8/0x168
> > [] nvmem_cell_read+0x30/0x80
> > [] qfprom_read+0x4c/0x7c
> > [] calibrate_v1+0x34/0x204
> > [] tsens_probe+0x164/0x258
> > [] platform_drv_probe+0x80/0xa0
> > [] really_probe+0x208/0x248
> > [] driver_probe_device+0x98/0xc0
> > [] __device_attach_driver+0x9c/0xac
> > [] bus_for_each_drv+0x60/0x8c
> > [] __device_attach+0x8c/0x100
> > [] device_initial_probe+0x20/0x28
> > [] bus_probe_device+0x34/0x7c
> > [] deferred_probe_work_func+0x6c/0x98
> > [] process_one_work+0x160/0x2f8
> > 
> > Signed-off-by: Srinivas Kandagatla 
> 
> Acked-by: Amit Kucheria 

patch applied.

thanks,
rui
> 
> > ---
> >  drivers/thermal/qcom/tsens-8960.c |  2 ++
> >  drivers/thermal/qcom/tsens-v0_1.c | 12 ++--
> >  drivers/thermal/qcom/tsens-v1.c   |  1 +
> >  drivers/thermal/qcom/tsens.h  |  1 +
> >  4 files changed, 14 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/thermal/qcom/tsens-8960.c
> > b/drivers/thermal/qcom/tsens-8960.c
> > index 8d9b721dadb6..e46a4e3f25c4 100644
> > --- a/drivers/thermal/qcom/tsens-8960.c
> > +++ b/drivers/thermal/qcom/tsens-8960.c
> > @@ -229,6 +229,8 @@ static int calibrate_8960(struct tsens_priv
> > *priv)
> > for (i = 0; i < num_read; i++, s++)
> > s->offset = data[i];
> > 
> > +   kfree(data);
> > +
> > return 0;
> >  }
> > 
> > diff --git a/drivers/thermal/qcom/tsens-v0_1.c
> > b/drivers/thermal/qcom/tsens-v0_1.c
> > index 6f26fadf4c27..055647bcee67 100644
> > --- a/drivers/thermal/qcom/tsens-v0_1.c
> > +++ b/drivers/thermal/qcom/tsens-v0_1.c
> > @@ -145,8 +145,10 @@ static int calibrate_8916(struct tsens_priv
> > *priv)
> > return PTR_ERR(qfprom_cdata);
> > 
> > qfprom_csel = (u32 *)qfprom_read(priv->dev, "calib_sel");
> > -   if (IS_ERR(qfprom_csel))
> > +   if (IS_ERR(qfprom_csel)) {
> > +   kfree(qfprom_cdata);
> > return PTR_ERR(qfprom_csel);
> > +   }
> > 
> > mode = (qfprom_csel[0] & MSM8916_CAL_SEL_MASK) >>
> > MSM8916_CAL_SEL_SHIFT;
> > dev_dbg(priv->dev, "calibration mode is %d\n", mode);
> > @@ -181,6 +183,8 @@ static int calibrate_8916(struct tsens_priv
> > *priv)
> > }
> > 
> > compute_intercept_slope(priv, p1, p2, mode);
> > +   kfree(qfprom_cdata);
> > +   kfree(qfprom_csel);
> > 
> > return 0;
> >  }
> > @@ -198,8 +202,10 @@ static int calibrate_8974(struct tsens_priv
> > *priv)
> > return PTR_ERR(calib);
> > 
> > bkp = (u32 *)qfprom_read(priv->dev, "calib_backup");
> > -   if (IS_ERR(bkp))
> > +   if (IS_ERR(bkp)) {
> > +   kfree(calib);
> > return PTR_ERR(bkp);
> > +   }
> > 
> > calib_redun_sel =  bkp[1] & BKP_REDUN_SEL;
> > calib_redun_sel >>= BKP_REDUN_SHIFT;
> > @@ -313,6 +319,8 @@ static int calibrate_8974(struct tsens_priv
> > *priv)
> > }
> > 
> > compute_intercept_slope(priv, p1, p2, mode);
> > +   kfree(calib);
> > +   kfree(bkp);
> > 
> > return 0;
> >  }
> > diff --git a/drivers/thermal/qcom/tsens-v1.c
> > b/drivers/thermal/qcom/tsens-v1.c
> > index 10b595d4f619..870f502f2cb6 100644
> > --- a/drivers/thermal/qcom/tsens-v1.c
> > +++ b/drivers/thermal/qcom/tsens-v1.c
> > @@ -138,6 +138,7 @@ static int calibrate_v1(struct tsens_priv
> > *priv)
> > }
> > 
> > compute_intercept_slope(priv, p1, p2, mode);
> > +   kfree(qfprom_cdata);
> > 
> > return 0;
> >  }
> > diff --git a/drivers/thermal/qcom/tsens.h
> > b/drivers/thermal/qcom/tsens.h
> > index 2fd94997245b..b89083b61c38 100644
> > --- a/drivers/thermal/qcom/tsens.h
> > +++ b/drivers/thermal/qcom/tsens.h
> > @@ -17,6 +17,7 @@
> > 
> >  #include 
> >  #include 
> > +#include 
> > 
> >  struct tsens_priv;
> > 
> > --
> > 2.21.0
> >

Re: [PATCH v7 4/4] thermal: cpu_cooling: Migrate to using the EM framework

2019-08-28 Thread Zhang Rui

On Mon, 2019-08-12 at 09:42 +0100, Quentin Perret wrote:
> The newly introduced Energy Model framework manages power cost tables
> in a generic way. Moreover, it supports several types of models since
> the tables can come from DT or firmware (through SCMI) for example. On
> the other hand, the cpu_cooling subsystem manages its own power cost
> tables using only DT data.
> 
> In order to avoid the duplication of data in the kernel, and in order
> to enable IPA with EMs coming from more than just DT, remove the
> private tables from cpu_cooling.c and migrate it to using the
> centralized EM framework. Doing so should have no visible functional
> impact for existing users of IPA since:
> 
>  - recent extensions to the PM_OPP infrastructure enable the
>    registration of EMs in PM_EM using the DT property used by IPA;
> 
>  - the existing upstream cpufreq drivers marked with the
>'CPUFREQ_IS_COOLING_DEV' flag all use the aforementioned PM_OPP
>infrastructure, which means they all support PM_EM. The only two
>exceptions are qoriq-cpufreq which doesn't in fact use an EM and
>scmi-cpufreq which doesn't use DT for power costs.
> 
> For existing users of cpu_cooling, PM_EM tables will contain the
> exact same power values that IPA used to compute on its own until
> now. The only new dependency for them is to compile in
> CONFIG_ENERGY_MODEL.
> 
> The case where the thermal subsystem is used without an Energy Model
> (cpufreq_cooling_ops) is handled by looking directly at CPUFreq's
> frequency table which is already a dependency for cpu_cooling.c
> anyway. Since the thermal framework expects the cooling states in a
> particular order, bail out whenever the CPUFreq table is unsorted,
> since that is fairly uncommon in general, and there are currently no
> users of cpu_cooling for this use-case.
> 
> Acked-by: Daniel Lezcano 
> Acked-by: Viresh Kumar 
> Signed-off-by: Quentin Perret 

this patch has coding style problems, please check the checkpatch.pl
output.
total: 5 errors, 17 warnings, 413 lines checked

thanks,
rui
> ---
>  drivers/thermal/Kconfig   |   1 +
>  drivers/thermal/cpu_cooling.c | 250 --
> 
>  2 files changed, 91 insertions(+), 160 deletions(-)
> 
> diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
> index 9966364a6deb..340853a3ca48 100644
> --- a/drivers/thermal/Kconfig
> +++ b/drivers/thermal/Kconfig
> @@ -144,6 +144,7 @@ config THERMAL_GOV_USER_SPACE
>  
>  config THERMAL_GOV_POWER_ALLOCATOR
>   bool "Power allocator thermal governor"
> + depends on ENERGY_MODEL
>   help
> Enable this to manage platform thermals by dynamically
> allocating and limiting power to devices.
> diff --git a/drivers/thermal/cpu_cooling.c
> b/drivers/thermal/cpu_cooling.c
> index 498f59ab64b2..83486775e593 100644
> --- a/drivers/thermal/cpu_cooling.c
> +++ b/drivers/thermal/cpu_cooling.c
> @@ -19,6 +19,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  
> @@ -36,21 +37,6 @@
>   *   ...
>   */
>  
> -/**
> - * struct freq_table - frequency table along with power entries
> - * @frequency:   frequency in KHz
> - * @power:   power in mW
> - *
> - * This structure is built when the cooling device registers and
> helps
> - * in translating frequency to power and vice versa.
> - */
> -struct freq_table {
> - u32 frequency;
> -#ifdef CONFIG_THERMAL_GOV_POWER_ALLOCATOR
> - u32 power;
> -#endif
> -};
> -
>  /**
>   * struct time_in_idle - Idle time stats
>   * @time: previous reading of the absolute time that this cpu was
> idle
> @@ -72,7 +58,7 @@ struct time_in_idle {
>   *   frequency.
>   * @max_level: maximum cooling level. One less than total number of
> valid
>   *   cpufreq frequencies.
> - * @freq_table: Freq table in descending order of frequencies
> + * @em: Reference on the Energy Model of the device
>   * @cdev: thermal_cooling_device pointer to keep track of the
>   *   registered cooling device.
>   * @policy: cpufreq policy.
> @@ -88,7 +74,7 @@ struct cpufreq_cooling_device {
>   unsigned int cpufreq_state;
>   unsigned int clipped_freq;
>   unsigned int max_level;
> - struct freq_table *freq_table;  /* In descending order */
> + struct em_perf_domain *em;
>   struct cpufreq_policy *policy;
>   struct list_head node;
>   struct time_in_idle *idle_time;
> @@ -162,114 +148,40 @@ static int cpufreq_thermal_notifier(struct
> notifier_block *nb,
>  static unsigned long get_level(struct cpufreq_cooling_device
> *cpufreq_cdev,
>  unsigned int freq)
>  {
> - struct freq_table *freq_table = cpufreq_cdev->freq_table;
> - unsigned long level;
> + int i;
>  
> - for (level = 1; level <= cpufreq_cdev->max_level; level++)
> - if (freq > freq_table[level].frequency)
> + for (i = cpufreq_cdev->max_level - 1; i >= 0; i--) {
> + if (freq > cpufreq_cdev->em->table[i].frequency)
>
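
The quoted diff is cut off inside the new get_level(), so the following is only a sketch of how such a lookup could work, not the patch's actual body. It assumes the performance-state table is sorted by ascending frequency, that cooling level 0 corresponds to the highest frequency, and that max_level is the number of states minus one, as the structure comment in the patch says. The sketch_* types and the loop exit are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one performance state of an Energy Model table. */
struct sketch_perf_state {
	unsigned long frequency;	/* kHz, ascending across the table */
};

struct sketch_cooling_dev {
	int max_level;			/* number of states minus one */
	const struct sketch_perf_state *table;
};

/*
 * Map a frequency to a cooling level: level 0 is the highest frequency
 * (no throttling), max_level is the lowest frequency.
 */
static unsigned long sketch_get_level(const struct sketch_cooling_dev *cdev,
				      unsigned long freq)
{
	int i;

	assert(cdev != NULL && cdev->table != NULL);

	/* Walk down from the second-highest state until freq exceeds it. */
	for (i = cdev->max_level - 1; i >= 0; i--) {
		if (freq > cdev->table[i].frequency)
			break;
	}

	return (unsigned long)(cdev->max_level - i - 1);
}
```

For a four-entry table of 500000, 1000000, 1500000 and 2000000 kHz, the highest frequency maps to level 0 and the lowest to level 3.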

Re: [PATCH v5 09/18] thermal: sun8i: rework for ths calibrate func

2019-08-28 Thread Zhang Rui

On Sat, 2019-08-10 at 05:28 +, Yangtao Li wrote:
> Here, we do something to prepare for the subsequent
> support of multiple platforms.
> 
> 1) rename sun50i_ths_calibrate to sun8i_ths_calibrate, because
>this function should be suitable for all platforms now.
> 
> 2) introduce calibrate callback to mask calibration method
>differences.
> 
> Signed-off-by: Yangtao Li 

IMO, patch 4/18 to patch 9/18 are all changes to a new file introduced
in patch 1/18, so why not have a prettier patch 1/18 instead of
separate patches?

thanks,
rui

> ---
>  drivers/thermal/sun8i_thermal.c | 86 ++-
> --
>  1 file changed, 48 insertions(+), 38 deletions(-)
> 
> diff --git a/drivers/thermal/sun8i_thermal.c
> b/drivers/thermal/sun8i_thermal.c
> index 6f4294c2aba7..47c20c4c69e7 100644
> --- a/drivers/thermal/sun8i_thermal.c
> +++ b/drivers/thermal/sun8i_thermal.c
> @@ -60,6 +60,8 @@ struct ths_thermal_chip {
>   int scale;
>   int ft_deviation;
>   int temp_data_base;
> + int (*calibrate)(struct ths_device *tmdev,
> +  u16 *caldata, int callen);
>   int (*init)(struct ths_device *tmdev);
>   int (*irq_ack)(struct ths_device *tmdev);
>  };
> @@ -152,45 +154,14 @@ static irqreturn_t sun8i_irq_thread(int irq,
> void *data)
>   return IRQ_HANDLED;
>  }
>  
> -static int sun50i_ths_calibrate(struct ths_device *tmdev)
> +static int sun50i_h6_ths_calibrate(struct ths_device *tmdev,
> +u16 *caldata, int callen)
>  {
> - struct nvmem_cell *calcell;
>   struct device *dev = tmdev->dev;
> - u16 *caldata;
> - size_t callen;
> - int ft_temp;
> - int i, ret = 0;
> -
> - calcell = devm_nvmem_cell_get(dev, "calib");
> - if (IS_ERR(calcell)) {
> - if (PTR_ERR(calcell) == -EPROBE_DEFER)
> - return -EPROBE_DEFER;
> - /*
> -  * Even if the external calibration data stored in sid is
> -  * not accessible, the THS hardware can still work, although
> -  * the data won't be so accurate.
> -  *
> -  * The default value of calibration register is 0x800 for
> -  * every sensor, and the calibration value is usually 0x7xx
> -  * or 0x8xx, so they won't be away from the default value
> -  * for a lot.
> -  *
> -  * So here we do not return error if the calibartion data is
> -  * not available, except the probe needs deferring.
> -  */
> - goto out;
> - }
> + int i, ft_temp;
>  
> - caldata = nvmem_cell_read(calcell, &callen);
> - if (IS_ERR(caldata)) {
> - ret = PTR_ERR(caldata);
> - goto out;
> - }
> -
> - if (!caldata[0] || callen < 2 + 2 * tmdev->chip->sensor_num) {
> - ret = -EINVAL;
> - goto out_free;
> - }
> + if (!caldata[0] || callen < 2 + 2 * tmdev->chip->sensor_num)
> + return -EINVAL;
>  
>   /*
>* efuse layout:
> @@ -245,7 +216,45 @@ static int sun50i_ths_calibrate(struct
> ths_device *tmdev)
>  cdata << offset);
>   }
>  
> -out_free:
> + return 0;
> +}
> +
> +static int sun8i_ths_calibrate(struct ths_device *tmdev)
> +{
> + struct nvmem_cell *calcell;
> + struct device *dev = tmdev->dev;
> + u16 *caldata;
> + size_t callen;
> + int ret = 0;
> +
> + calcell = devm_nvmem_cell_get(dev, "calib");
> + if (IS_ERR(calcell)) {
> + if (PTR_ERR(calcell) == -EPROBE_DEFER)
> + return -EPROBE_DEFER;
> + /*
> +  * Even if the external calibration data stored in sid is
> +  * not accessible, the THS hardware can still work, although
> +  * the data won't be so accurate.
> +  *
> +  * The default value of calibration register is 0x800 for
> +  * every sensor, and the calibration value is usually 0x7xx
> +  * or 0x8xx, so they won't be away from the default value
> +  * for a lot.
> +  *
> +  * So here we do not return error if the calibartion data is
> +  * not available, except the probe needs deferring.
> +  */
> + goto out;
> + }
> +
> + caldata = nvmem_cell_read(calcell, &callen);
> + if (IS_ERR(caldata)) {
> + ret = PTR_ERR(caldata);
> + goto out;
> + }
> +
> + tmdev->chip->calibrate(tmdev, caldata, callen);
> +
>   kfree(caldata);
>  out:
>   return ret;
> @@ -294,7 +303,7 @@ static int sun8i_ths_resource_init(struct
> ths_device *tmdev)
>   if (ret)
>   goto bus_disable;
>  
> - ret = sun50i_ths_calibrate(tmdev);
> + ret = sun8i_ths_calibrate(tmdev);
>   if (ret)
>
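
The idea of the quoted patch (a per-chip calibrate callback so that a generic wrapper can serve several platforms) can be sketched as follows. The sketch_* types are hypothetical simplifications of the driver's structures, and the 12-bit mask and register array are assumptions for illustration. Unlike the quoted hunk, which as posted calls the callback without storing its result, this sketch propagates the callback's return value.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_SENSORS 4

struct sketch_ths_device;

/* Per-chip description: the calibrate callback hides platform differences. */
struct sketch_ths_chip {
	int sensor_num;
	int (*calibrate)(struct sketch_ths_device *tmdev,
			 uint16_t *caldata, size_t callen);
};

struct sketch_ths_device {
	const struct sketch_ths_chip *chip;
	uint16_t calib_reg[SKETCH_MAX_SENSORS];	/* stand-in for hardware registers */
};

/* One platform's method: validate the blob, then store one value per sensor. */
static int sketch_h6_calibrate(struct sketch_ths_device *tmdev,
			       uint16_t *caldata, size_t callen)
{
	int i;

	if (!caldata[0] || callen < 2 + 2 * (size_t)tmdev->chip->sensor_num)
		return -EINVAL;

	for (i = 0; i < tmdev->chip->sensor_num; i++)
		tmdev->calib_reg[i] = caldata[i + 1] & 0xfff;

	return 0;
}

/* Generic entry point: dispatch to whatever method the chip provides. */
static int sketch_ths_calibrate(struct sketch_ths_device *tmdev,
				uint16_t *caldata, size_t callen)
{
	assert(tmdev->chip->calibrate != NULL);
	assert(tmdev->chip->sensor_num <= SKETCH_MAX_SENSORS);

	return tmdev->chip->calibrate(tmdev, caldata, callen);
}
```

Supporting another platform then only requires a new chip description with its own calibrate function; the generic wrapper stays unchanged.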

Re: [PATCH v5 03/18] thermal: fix indentation in makefile

2019-08-28 Thread Zhang Rui

On Sat, 2019-08-10 at 05:28 +, Yangtao Li wrote:
> To unify code style.
> 
> Signed-off-by: Yangtao Li 

The later patches in this series do not change the Makefile, so this
looks like a cleanup patch that is independent of this patch set.
It would be better to drop it from this series.

thanks,
rui
> ---
>  drivers/thermal/Makefile | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/thermal/Makefile b/drivers/thermal/Makefile
> index fa6f8b206281..d7eafb5ef8ef 100644
> --- a/drivers/thermal/Makefile
> +++ b/drivers/thermal/Makefile
> @@ -5,7 +5,7 @@
>  
>  obj-$(CONFIG_THERMAL)+= thermal_sys.o
>  thermal_sys-y+= thermal_core.o
> thermal_sysfs.o \
> - thermal_helpers.o
> +thermal_helpers.o
>  
>  # interface to/from other layers providing sensors
>  thermal_sys-$(CONFIG_THERMAL_HWMON)  += thermal_hwmon.o
> @@ -25,11 +25,11 @@ thermal_sys-$(CONFIG_CPU_THERMAL) +=
> cpu_cooling.o
>  thermal_sys-$(CONFIG_CLOCK_THERMAL)  += clock_cooling.o
>  
>  # devfreq cooling
> -thermal_sys-$(CONFIG_DEVFREQ_THERMAL) += devfreq_cooling.o
> +thermal_sys-$(CONFIG_DEVFREQ_THERMAL)+= devfreq_cooling.o
>  
>  # platform thermal drivers
>  obj-y+= broadcom/
> -obj-$(CONFIG_THERMAL_MMIO)   += thermal_mmio.o
> +obj-$(CONFIG_THERMAL_MMIO)   += thermal_mmio.o
>  obj-$(CONFIG_SPEAR_THERMAL)  += spear_thermal.o
>  obj-$(CONFIG_SUN8I_THERMAL) += sun8i_thermal.o
>  obj-$(CONFIG_ROCKCHIP_THERMAL)   += rockchip_thermal.o
> @@ -50,7 +50,7 @@ obj-$(CONFIG_TI_SOC_THERMAL)+= ti-soc-
> thermal/
>  obj-y+= st/
>  obj-$(CONFIG_QCOM_TSENS) += qcom/
>  obj-y+= tegra/
> -obj-$(CONFIG_HISI_THERMAL) += hisi_thermal.o
> +obj-$(CONFIG_HISI_THERMAL)   += hisi_thermal.o
>  obj-$(CONFIG_MTK_THERMAL)+= mtk_thermal.o
>  obj-$(CONFIG_GENERIC_ADC_THERMAL)+= thermal-generic-adc.o
>  obj-$(CONFIG_ZX2967_THERMAL) += zx2967_thermal.o

Re: [PATCH] thermal: rcar_gen3_thermal: Use devm_add_action_or_reset() helper

2019-08-28 Thread Zhang Rui

On Wed, 2019-07-31 at 20:44 +0200, Niklas Söderlund wrote:
> Hi Geert,
> 
> Thanks for your work.
> 
> On 2019-07-31 14:50:53 +0200, Geert Uytterhoeven wrote:
> > Use the devm_add_action_or_reset() helper instead of open-coding
> > the
> > same operations.
> > 
> > Signed-off-by: Geert Uytterhoeven 
> 
> Reviewed-by: Niklas Söderlund 
> 
Hi, Geert,

https://patchwork.kernel.org/patch/11034969/ is the same fix submitted
a few days earlier, so I will take that patch instead. Thanks for the
patch anyway.

thanks,
rui

> > ---
> >  drivers/thermal/rcar_gen3_thermal.c | 7 +++
> >  1 file changed, 3 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/thermal/rcar_gen3_thermal.c
> > b/drivers/thermal/rcar_gen3_thermal.c
> > index a56463308694e937..2db7e7f8baf939fd 100644
> > --- a/drivers/thermal/rcar_gen3_thermal.c
> > +++ b/drivers/thermal/rcar_gen3_thermal.c
> > @@ -443,11 +443,10 @@ static int rcar_gen3_thermal_probe(struct
> > platform_device *pdev)
> > if (ret)
> > goto error_unregister;
> >  
> > -   ret = devm_add_action(dev, rcar_gen3_hwmon_action,
> > zone);
> > -   if (ret) {
> > -   rcar_gen3_hwmon_action(zone);
> > +   ret = devm_add_action_or_reset(dev,
> > rcar_gen3_hwmon_action,
> > +  zone);
> > +   if (ret)
> > goto error_unregister;
> > -   }
> >  
> > ret = of_thermal_get_ntrips(tsc->zone);
> > if (ret < 0)
> > -- 
> > 2.17.1
> > 
> 
>
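
Editorial sketch: the patch quoted above replaces an open-coded "register
the cleanup action, and run it by hand if registration fails" sequence with
devm_add_action_or_reset(). The userspace mock below only illustrates that
control flow. Every name prefixed with sketch_ is hypothetical and is not
part of the kernel API; the single-slot "device" and the fail_next_add test
hook are assumptions made to keep the example self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef void (*sketch_action_t)(void *data);

/* Hypothetical stand-in for a device that can hold one cleanup action. */
struct sketch_dev {
	sketch_action_t action;
	void *data;
	int fail_next_add;	/* test hook: force the next registration to fail */
};

/* Mock of "add action": may fail, and then nothing is registered. */
static int sketch_add_action(struct sketch_dev *dev, sketch_action_t action,
			     void *data)
{
	if (dev->fail_next_add) {
		dev->fail_next_add = 0;
		return -ENOMEM;
	}
	dev->action = action;
	dev->data = data;
	return 0;
}

/* Mock of "add action or reset": on failure, run the action immediately. */
static int sketch_add_action_or_reset(struct sketch_dev *dev,
				      sketch_action_t action, void *data)
{
	int ret = sketch_add_action(dev, action, data);

	if (ret)
		action(data);
	return ret;
}

/* Cleanup action used by the checks: counts how often it ran. */
static void sketch_count_cleanup(void *data)
{
	(*(int *)data)++;
}

/* Exercises both the success path and the failure path. */
static void sketch_devm_selfcheck(void)
{
	struct sketch_dev dev = { 0 };
	int ran = 0;

	assert(sketch_add_action_or_reset(&dev, sketch_count_cleanup, &ran) == 0);
	assert(ran == 0);	/* registered, not run yet */

	dev.fail_next_add = 1;
	assert(sketch_add_action_or_reset(&dev, sketch_count_cleanup, &ran) == -ENOMEM);
	assert(ran == 1);	/* registration failed, so cleanup ran at once */
}
```

With the combined helper, the caller only needs the "if (ret) goto
error_unregister;" check, which is the simplification the patch makes.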

Re: [PATCH] thermal: mediatek: add suspend/resume callback

2019-08-28 Thread Zhang Rui

On Tue, 2019-07-02 at 17:16 +0800, michael@mediatek.com wrote:
> From: Louis Yu 
> 
> Add suspend/resume callback to disable/enable Mediatek thermal sensor
> respectively. Since thermal power domain is off in suspend, thermal
> driver
> needs re-initialization during resume.
> 
> Signed-off-by: Louis Yu 
> Signed-off-by: Michael Kao 
> ---
> This patch series base on these patches [1][2][3].
> 
> [1]thermal: mediatek: mt8183: fix bank number settings (
> https://patchwork.kernel.org/patch/10938817/)
> [2]thermal: mediatek: add another get_temp ops for thermal sensors (
> https://patchwork.kernel.org/patch/10938829/)
> [3]thermal: mediatek: use spinlock to protect PTPCORESEL (
> https://patchwork.kernel.org/patch/10938841/)
> 
None of these patches were sent to the linux-pm mailing list, so they
never had a chance to get merged. Please resend them to linux-pm.

I don't know what the first part of the patch set does, so I'm wondering
whether there is any dependency?

thanks,
rui

>  drivers/thermal/mtk_thermal.c | 134
> +++---
>  1 file changed, 125 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/thermal/mtk_thermal.c
> b/drivers/thermal/mtk_thermal.c
> index 3d01153..61d4114 100644
> --- a/drivers/thermal/mtk_thermal.c
> +++ b/drivers/thermal/mtk_thermal.c
> @@ -30,6 +30,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  /* AUXADC Registers */
>  #define AUXADC_CON1_SET_V0x008
> @@ -39,6 +40,8 @@
>  
>  #define APMIXED_SYS_TS_CON1  0x604
>  
> +#define APMIXED_SYS_TS_CON1_BUFFER_OFF   0x30
> +
>  /* Thermal Controller Registers */
>  #define TEMP_MONCTL0 0x000
>  #define TEMP_MONCTL1 0x004
> @@ -46,6 +49,7 @@
>  #define TEMP_MONIDET00x014
>  #define TEMP_MONIDET10x018
>  #define TEMP_MSRCTL0 0x038
> +#define TEMP_MSRCTL1 0x03c
>  #define TEMP_AHBPOLL 0x040
>  #define TEMP_AHBTO   0x044
>  #define TEMP_ADCPNP0 0x048
> @@ -95,6 +99,9 @@
>  #define TEMP_ADCVALIDMASK_VALID_HIGH BIT(5)
>  #define TEMP_ADCVALIDMASK_VALID_POS(bit) (bit)
>  
> +#define TEMP_MSRCTL1_BUS_STA (BIT(0) | BIT(7))
> +#define TEMP_MSRCTL1_SENSING_POINTS_PAUSE0x10E
> +
>  /* MT8173 thermal sensors */
>  #define MT8173_TS1   0
>  #define MT8173_TS2   1
> @@ -266,6 +273,10 @@ struct mtk_thermal_data {
>  struct mtk_thermal {
>   struct device *dev;
>   void __iomem *thermal_base;
> + void __iomem *apmixed_base;
> + void __iomem *auxadc_base;
> + u64 apmixed_phys_base;
> + u64 auxadc_phys_base;
>  
>   struct clk *clk_peri_therm;
>   struct clk *clk_auxadc;
> @@ -795,6 +806,42 @@ static void mtk_thermal_init_bank(struct
> mtk_thermal *mt, int num,
>   mtk_thermal_put_bank(bank);
>  }
>  
> +static int mtk_thermal_disable_sensing(struct mtk_thermal *mt, int
> num)
> +{
> + struct mtk_thermal_bank *bank = >banks[num];
> + u32 val;
> + unsigned long timeout;
> + void __iomem *addr;
> + int ret = 0;
> +
> + bank->id = num;
> + bank->mt = mt;
> +
> + mtk_thermal_get_bank(bank);
> +
> + val = readl(mt->thermal_base + TEMP_MSRCTL1);
> + /* pause periodic temperature measurement for sensing points */
> + writel(val | TEMP_MSRCTL1_SENSING_POINTS_PAUSE,
> +mt->thermal_base + TEMP_MSRCTL1);
> +
> + /* wait until temperature measurement bus idle */
> + timeout = jiffies + HZ;
> + addr = mt->thermal_base + TEMP_MSRCTL1;
> +
> + ret = readl_poll_timeout(addr, val, (val &
> TEMP_MSRCTL1_BUS_STA) == 0x0,
> +  0, timeout);
> + if (ret < 0)
> + goto out;
> +
> + /* disable periodic temperature meausrement on sensing points
> */
> + writel(0x0, mt->thermal_base + TEMP_MONCTL0);
> +
> +out:
> + mtk_thermal_put_bank(bank);
> +
> + return ret;
> +}
> +
>  static u64 of_get_phys_base(struct device_node *np)
>  {
>   u64 size64;
> @@ -917,7 +964,6 @@ static int mtk_thermal_probe(struct
> platform_device *pdev)
>   struct device_node *auxadc, *apmixedsys, *np = pdev-
> >dev.of_node;
>   struct mtk_thermal *mt;
>   struct resource *res;
> - u64 auxadc_phys_base, apmixed_phys_base;
>   struct thermal_zone_device *tzdev;
>   struct mtk_thermal_zone *tz;
>  
> @@ -954,11 +1000,11 @@ static int mtk_thermal_probe(struct
> platform_device *pdev)
>   return -ENODEV;
>   }
>  
> - auxadc_phys_base = of_get_phys_base(auxadc);
> + mt->auxadc_phys_base = of_get_phys_base(auxadc);
>  
>   of_node_put(auxadc);
>  
> - if (auxadc_phys_base == OF_BAD_ADDR) {
> + if (mt->auxadc_phys_base == OF_BAD_ADDR) {
>   dev_err(>dev, "Can't get auxadc phys address\n");
>   return -EINVAL;
>   }
> @@ -969,11 +1015,12 @@ static int mtk_thermal_probe(struct
> platform_device *pdev)
>   return -ENODEV;
>   }
>  
> - apmixed_phys_base = of_get_phys_base(apmixedsys);

Re: [PATCH V3 1/5] thermal: qoriq: Add clock operations

2019-08-28 Thread Zhang Rui

On Wed, 2019-08-28 at 08:51 +, Anson Huang wrote:
> Hi, Rui
> 
> > On Tue, 2019-08-27 at 12:41 +, Leonard Crestez wrote:
> > > On 27.08.2019 04:51, Anson Huang wrote:
> > > > > In an earlier series the CLK_IS_CRITICAL flags was removed
> > > > > from
> > > > > the TMU clock so if the thermal driver doesn't explicitly
> > > > > enable
> > > > > it the system will hang on probe. This is what happens in
> > > > > linux-next right now!
> > > > 
> > > > The thermal driver should be built with module, so default
> > > > kernel
> > > > should can boot up, do you modify the thermal driver as built-
> > > > in?
> > > > 
> > > > > Unless this patches is merged soon we'll end up with a 5.4-
> > > > > rc1
> > > > > that doesn't boot on imx8mq. An easy fix would be to
> > > > > drop/revert
> > > > > commit
> > > > > 951c1aef9691 ("clk: imx8mq: Remove CLK_IS_CRITICAL flag for
> > > > > IMX8MQ_CLK_TMU_ROOT") until the thermal patches are accepted.
> > > > 
> > > > If the thermal driver is built as module, I think no need to
> > > > revert
> > > > the commit, but if by default thermal driver is built-in or mod
> > > > probed, then yes, it should NOT break kernel boot up.
> > > 
> > > The qoriq_thermal driver is built as a module in defconfig and
> > > when
> > > modules are properly installed in rootfs they will be
> > > automatically be
> > > probed on boot and cause a hang.
> > > 
> > > I usually run nfsroot with modules:
> > > 
> > >  make modules_install INSTALL_MOD_PATH=/srv/nfs/imx8-root
> > 
> > so we need this patch shipped in the beginning of the merge window,
> > right?
> > if there is hard dependency between patches, it's better to send
> > them in one
> > series, and get shipped via either tree.
> 
> There is no hard dependency in this patch series. Previous for the
> TMU clock disabled
> patch, since thermal driver is built as module so I did NOT found the
> issue. The patch
> series is the correct fix.
> 
Got it.
The clock patch is also queued for 5.4-rc1, right?
I will apply this series and try to push it as early as possible during
the merge window.

thanks,
rui
> Thanks,
> Anson 
> 
> > 
> > BTW, who is maintaining qoriq driver from NXP? If Anson is
> > maintaining and
> > developing this driver, it's better to update this in the driver or
> > the
> > MAINTAINER file, I will take the driver specific patches as long as
> > we have
> > ACK/Reviewed-By from the driver maintainer.
> > 
> > thanks,
> > rui
> > 
> > > 
> > > --
> > > Regards,
> > > Leonard
> 
>

Re: [PATCH V3 1/5] thermal: qoriq: Add clock operations

2019-08-28 Thread Zhang Rui

On Wed, 2019-08-28 at 08:49 +, Anson Huang wrote:
> Hi, Rui
> 
> > On Wed, 2019-08-28 at 16:32 +0800, Zhang Rui wrote:
> > > On Tue, 2019-08-27 at 12:41 +, Leonard Crestez wrote:
> > > > On 27.08.2019 04:51, Anson Huang wrote:
> > > > > > In an earlier series the CLK_IS_CRITICAL flags was removed
> > > > > > from
> > > > > > the TMU clock so if the thermal driver doesn't explicitly
> > > > > > enable
> > > > > > it the system will hang on probe. This is what happens in
> > > > > > linux-next right now!
> > > > > 
> > > > > The thermal driver should be built with module, so default
> > > > > kernel
> > > > > should can boot up, do you modify the thermal driver as
> > > > > built-in?
> > > > > 
> > > > > > Unless this patches is merged soon we'll end up with a 5.4-
> > > > > > rc1
> > > > > > that doesn't boot on imx8mq. An easy fix would be to
> > > > > > drop/revert
> > > > > > commit
> > > > > > 951c1aef9691 ("clk: imx8mq: Remove CLK_IS_CRITICAL flag for
> > > > > > IMX8MQ_CLK_TMU_ROOT") until the thermal patches are
> > > > > > accepted.
> > > > > 
> > > > > If the thermal driver is built as module, I think no need to
> > > > > revert the commit, but if by default thermal driver is built-
> > > > > in or
> > > > > mod probed, then yes, it should NOT break kernel boot up.
> > > > 
> > > > The qoriq_thermal driver is built as a module in defconfig and
> > > > when
> > > > modules are properly installed in rootfs they will be
> > > > automatically
> > > > be probed on boot and cause a hang.
> > > > 
> > > > I usually run nfsroot with modules:
> > > > 
> > > >  make modules_install INSTALL_MOD_PATH=/srv/nfs/imx8-root
> > > 
> > > so we need this patch shipped in the beginning of the merge
> > > window,
> > > right?
> > > if there is hard dependency between patches, it's better to send
> > > them
> > > in one series, and get shipped via either tree.
> > > 
> > > BTW, who is maintaining qoriq driver from NXP? If Anson is
> > > maintaining
> > > and developing this driver, it's better to update this in the
> > > driver
> > > or the MAINTAINER file, I will take the driver specific patches
> > > as
> > > long as we have ACK/Reviewed-By from the driver maintainer.
> 
> I am NOT sure who is the qoriq driver from NXP, some of our i.MX SoCs
> use
> qoriq thermal IP, so I have to add support for them.  The first
> maintainer for
> this driver is hongtao@nxp.com, but I can NOT find it from NXP's
> mail system,
> NOT sure if he is still in NXP. The MAINTAINER file does NOT have
> info for this
> Driver's maintainer, so how to update it? Just change the name in
> driver? Or leave
> it as what it is?
> 
>  MODULE_AUTHOR("Jia Hongtao ");
>  MODULE_DESCRIPTION("QorIQ Thermal Monitoring Unit driver");
>  MODULE_LICENSE("GPL v2");
> 
I see several people from NXP are actively working on this driver.
If you're okay with it, I'd like to get your comments on all the patches
for this driver before I take them, and I can update the MAINTAINERS file
later so that you're CCed on all the qoriq thermal driver changes.

> > 
> > And also, can you provide your feedback for this one?
> > 
> > https://patchwork.kernel.org/patch/10974147/
> 
> I can take a look at it later.
> 
thanks!

-rui
> Thanks,
> Anson

Re: [PATCH] thermal: armada: Fix -Wshift-negative-value

2019-08-28 Thread Zhang Rui

On Mon, 2019-08-19 at 10:21 +0200, Miquel Raynal wrote:
> Hello,
> 
> Daniel Lezcano  wrote on Thu, 15 Aug 2019
> 01:06:21 +0200:
> 
> > On 15/08/2019 00:12, Nick Desaulniers wrote:
> > > On Tue, Aug 13, 2019 at 10:28 AM 'Nathan Huckleberry' via Clang
> > > Built
> > > Linux  wrote:  
> > > > 
> > > > Following up to see if this patch is going to be accepted.  
> > > 
> > > Miquel is listed as the maintainer of this file in MAINTAINERS.
> > > Miquel, can you please pick this up?  Otherwise Zhang, Eduardo,
> > > and
> > > Daniel are listed as maintainers for drivers/thermal/.  
> > 
> > I'm listed as reviewer, it is up to Zhang or Eduardo to take the
> > patches.
> > 
> > 
> 
> Sorry for the delay, I don't manage a tree for this driver, I'll let
> Zhang or Eduardo take the patch with my
> 
> Acked-by: Miquel Raynal 
> 

Patch applied.

thanks,
rui
> 
> Thanks,
> Miquèl

Re: [PATCH] thermal: rcar_gen3_thermal: Fix Wshift-negative-value

2019-08-28 Thread Zhang Rui

On Fri, 2019-06-14 at 12:52 +0200, Daniel Lezcano wrote:
> Hi Nathan,
> 
> On 13/06/2019 23:12, Nathan Huckleberry wrote:
> > Clang produces the following warning
> > 
> > vers/thermal/rcar_gen3_thermal.c:147:33: warning: shifting a
> > negative
> > signed value is undefined [-Wshift-negative-value] / (ptat[0] -
> > ptat[2])) +
> > FIXPT_INT(TJ_3); ^~~
> > drivers/thermal/rcar_gen3_thermal.c:126:29
> > note: expanded from macro 'FIXPT_INT' #define FIXPT_INT(_x) ((_x)
> > <<
> > FIXPT_SHIFT)  ^ drivers/thermal/rcar_gen3_thermal.c:150:18:
> > warning:
> > shifting a negative signed value is undefined [-Wshift-negative-
> > value]
> > tsc->tj_t - FIXPT_INT(TJ_3)); ^~~~
> > 
> > Upon further investigating it looks like there is no real reason
> > for
> > TJ_3 to be -41. Usages subtract -41, code would be cleaner to just
> > add.
> 
> All the code seems broken regarding the negative value shifting as
> the
> macros pass an integer:
> 
> eg.
> tsc->coef.a2 = FIXPT_DIV(FIXPT_INT(thcode[1] - thcode[0]),
>  tsc->tj_t - FIXPT_INT(ths_tj_1));
> 
> thcode[1] is always < than thcode[0],
> 
> thcode[1] - thcode[0] < 0
> 
> FIXPT_INT(thcode[1] - thcode[0]) is undefined
> 
> 
> Is it done on purpose FIXPT_DIV(FIXPT_INT(thcode[1] - thcode[0]) ?
> 
> Try developing the macro with the coef.a2 computation ...
> 
> The code quality of this driver could be better, it deserves a rework
> IMHO ...
> 
> I suggest to revert:
> 
> 4eb39f79ef443fa566d36bd43f1f578d5c140305
> bdc4480a669d476814061b4da6bb006f7048c8e5
> 6a310f8f97bb8bc2e2bb9db6f49a1b8678c8d144
> 
> Rework the coefficient computation and re-introduce what was reverted
> in
> a nicer way.

Sounds reasonable to me.

Yoshihiro,
can you please clarify this? Otherwise, I will revert the above commits
first.

Also CC Wolfram Sang, the driver author.

thanks,
rui
> 
> 
> > Fixes: 4eb39f79ef44 ("thermal: rcar_gen3_thermal: Update value of
> > Tj_1")
> > Cc: clang-built-li...@googlegroups.com
> > Link: https://github.com/ClangBuiltLinux/linux/issues/531
> > Signed-off-by: Nathan Huckleberry 
> > ---
> >  drivers/thermal/rcar_gen3_thermal.c | 8 
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/thermal/rcar_gen3_thermal.c
> > b/drivers/thermal/rcar_gen3_thermal.c
> > index a56463308694..f4b4558c08e9 100644
> > --- a/drivers/thermal/rcar_gen3_thermal.c
> > +++ b/drivers/thermal/rcar_gen3_thermal.c
> > @@ -131,7 +131,7 @@ static inline void
> > rcar_gen3_thermal_write(struct rcar_gen3_thermal_tsc *tsc,
> >  #define RCAR3_THERMAL_GRAN 500 /* mili Celsius */
> >  
> >  /* no idea where these constants come from */
> > -#define TJ_3 -41
> > +#define TJ_3 41
> >  
> >  static void rcar_gen3_thermal_calc_coefs(struct
> > rcar_gen3_thermal_tsc *tsc,
> >  int *ptat, const int *thcode,
> > @@ -144,11 +144,11 @@ static void
> > rcar_gen3_thermal_calc_coefs(struct rcar_gen3_thermal_tsc *tsc,
> >  * the dividend (4095 * 4095 << 14 > INT_MAX) so keep it
> > unscaled
> >  */
> > tsc->tj_t = (FIXPT_INT((ptat[1] - ptat[2]) * 157)
> > -/ (ptat[0] - ptat[2])) + FIXPT_INT(TJ_3);
> > +/ (ptat[0] - ptat[2])) - FIXPT_INT(TJ_3);
> >  
> > tsc->coef.a1 = FIXPT_DIV(FIXPT_INT(thcode[1] - thcode[2]),
> > -tsc->tj_t - FIXPT_INT(TJ_3));
> > -   tsc->coef.b1 = FIXPT_INT(thcode[2]) - tsc->coef.a1 * TJ_3;
> > +tsc->tj_t + FIXPT_INT(TJ_3));
> > +   tsc->coef.b1 = FIXPT_INT(thcode[2]) + tsc->coef.a1 * TJ_3;
> >  
> > tsc->coef.a2 = FIXPT_DIV(FIXPT_INT(thcode[1] - thcode[0]),
> >  tsc->tj_t - FIXPT_INT(ths_tj_1));
> > 
> 
>
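
Editorial sketch: the warning discussed above comes from left-shifting a
negative signed value, which C leaves undefined. One common way to keep the
fixed-point conversion well defined for negative inputs is to multiply by
the scale factor instead of shifting. The code below is a userspace
illustration only, not the driver's code: the sketch_ names are
hypothetical, and the shift width of 7 is an assumption chosen for the
example.

```c
#include <assert.h>

#define SKETCH_FIXPT_SHIFT 7	/* assumed width, for illustration only */
#define SKETCH_FIXPT_ONE   (1 << SKETCH_FIXPT_SHIFT)

/*
 * Integer to fixed point. Multiplication is defined for negative x
 * (absent overflow), whereas "x << SKETCH_FIXPT_SHIFT" is not.
 */
static int sketch_fixpt_int(int x)
{
	return x * SKETCH_FIXPT_ONE;
}

/* Fixed point back to integer; C division truncates toward zero. */
static int sketch_fixpt_to_int(int fx)
{
	return fx / SKETCH_FIXPT_ONE;
}

/* Checks the negative case and the sign flip used in the quoted patch. */
static void sketch_fixpt_selfcheck(void)
{
	int tj = sketch_fixpt_int(100);

	assert(sketch_fixpt_int(-41) == -5248);
	assert(sketch_fixpt_to_int(sketch_fixpt_int(-41)) == -41);
	/* Adding FIXPT(-41) equals subtracting FIXPT(41). */
	assert(tj + sketch_fixpt_int(-41) == tj - sketch_fixpt_int(41));
}
```

The last check mirrors the patch's approach of defining TJ_3 as 41 and
swapping the operators. As Daniel points out, that does not cover other
call sites where the argument is a negative difference such as
thcode[1] - thcode[0].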

Re: [PATCH V3 1/5] thermal: qoriq: Add clock operations

2019-08-28 Thread Zhang Rui

On Wed, 2019-08-28 at 16:32 +0800, Zhang Rui wrote:
> On Tue, 2019-08-27 at 12:41 +, Leonard Crestez wrote:
> > On 27.08.2019 04:51, Anson Huang wrote:
> > > > In an earlier series the CLK_IS_CRITICAL flags was removed from
> > > > the TMU
> > > > clock so if the thermal driver doesn't explicitly enable it the
> > > > system will hang
> > > > on probe. This is what happens in linux-next right now!
> > > 
> > > The thermal driver should be built with module, so default kernel
> > > should can boot
> > > up, do you modify the thermal driver as built-in?
> > > 
> > > > Unless this patches is merged soon we'll end up with a 5.4-rc1
> > > > that doesn't
> > > > boot on imx8mq. An easy fix would be to drop/revert commit
> > > > 951c1aef9691 ("clk: imx8mq: Remove CLK_IS_CRITICAL flag for
> > > > IMX8MQ_CLK_TMU_ROOT") until the thermal patches are accepted.
> > > 
> > > If the thermal driver is built as module, I think no need to
> > > revert
> > > the commit, but
> > > if by default thermal driver is built-in or mod probed, then yes,
> > > it should NOT break
> > > kernel boot up.
> > 
> > The qoriq_thermal driver is built as a module in defconfig and
> > when 
> > modules are properly installed in rootfs they will be automatically
> > be 
> > probed on boot and cause a hang.
> > 
> > I usually run nfsroot with modules:
> > 
> >  make modules_install INSTALL_MOD_PATH=/srv/nfs/imx8-root
> 
> so we need this patch shipped in the beginning of the merge window,
> right?
> if there is hard dependency between patches, it's better to send them
> in one series, and get shipped via either tree.
> 
> BTW, who is maintaining qoriq driver from NXP? If Anson is
> maintaining
> and developing this driver, it's better to update this in the driver
> or
> the MAINTAINER file, I will take the driver specific patches as long
> as
> we have ACK/Reviewed-By from the driver maintainer.

Also, can you provide your feedback on this one?
https://patchwork.kernel.org/patch/10974147/

thanks,
rui
> 
> thanks,
> rui
> 
> > 
> > --
> > Regards,
> > Leonard

Re: [PATCH V3 1/5] thermal: qoriq: Add clock operations

2019-08-28 Thread Zhang Rui

On Tue, 2019-08-27 at 12:41 +, Leonard Crestez wrote:
> On 27.08.2019 04:51, Anson Huang wrote:
> > > In an earlier series the CLK_IS_CRITICAL flags was removed from
> > > the TMU
> > > clock so if the thermal driver doesn't explicitly enable it the
> > > system will hang
> > > on probe. This is what happens in linux-next right now!
> > 
> > The thermal driver should be built with module, so default kernel
> > should can boot
> > up, do you modify the thermal driver as built-in?
> > 
> > > Unless this patches is merged soon we'll end up with a 5.4-rc1
> > > that doesn't
> > > boot on imx8mq. An easy fix would be to drop/revert commit
> > > 951c1aef9691 ("clk: imx8mq: Remove CLK_IS_CRITICAL flag for
> > > IMX8MQ_CLK_TMU_ROOT") until the thermal patches are accepted.
> > 
> > If the thermal driver is built as module, I think no need to revert
> > the commit, but
> > if by default thermal driver is built-in or mod probed, then yes,
> > it should NOT break
> > kernel boot up.
> 
> The qoriq_thermal driver is built as a module in defconfig and when 
> modules are properly installed in rootfs they will be automatically
> be 
> probed on boot and cause a hang.
> 
> I usually run nfsroot with modules:
> 
>  make modules_install INSTALL_MOD_PATH=/srv/nfs/imx8-root

So we need this patch shipped at the beginning of the merge window,
right?
If there is a hard dependency between patches, it's better to send them
in one series and get them shipped via either tree.

BTW, who is maintaining the qoriq driver from NXP? If Anson is maintaining
and developing this driver, it's better to record this in the driver or
in the MAINTAINERS file. I will take the driver-specific patches as long
as we have an ACK/Reviewed-by from the driver maintainer.

thanks,
rui

> 
> --
> Regards,
> Leonard

Re: [PATCH V3 1/5] thermal: qoriq: Add clock operations

2019-08-26 Thread Zhang Rui

On Tue, 2019-08-27 at 01:51 +, Anson Huang wrote:
> > On 7/30/2019 5:31 AM, anson.hu...@nxp.com wrote:
> > > From: Anson Huang 
> > > 
> > > Some platforms like i.MX8MQ has clock control for this module,
> > > need to
> > > add clock operations to make sure the driver is working properly.
> > > 
> > > Signed-off-by: Anson Huang 
> > > Reviewed-by: Guido Günther 
> > 
> > This series looks good, do you think it can be merged in time for
> > v5.4?
> > Today was v5.3-rc6.
> 
> If the question is for me, then I am NOT sure, the thermal patches
> are pending
> there for almost half year and I did NOT receive any response,

Which patch series are you referring to?

>  looks like no one
> is maintaining the thermal sub-system?
> 

Eduardo maintains all the thermal-soc driver changes, so I usually
filter the soc driver patches out of my mailbox.

In his last email, Eduardo said he would be offline during July and
would be back and taking patches in August.

I will double check with Eduardo anyway.

thanks,
rui


> > 
> > In an earlier series the CLK_IS_CRITICAL flags was removed from the
> > TMU
> > clock so if the thermal driver doesn't explicitly enable it the
> > system will hang
> > on probe. This is what happens in linux-next right now!
> 
> The thermal driver should be built with module, so default kernel
> should can boot
> up, do you modify the thermal driver as built-in?
> 
> > 
> > Unless this patches is merged soon we'll end up with a 5.4-rc1 that
> > doesn't
> > boot on imx8mq. An easy fix would be to drop/revert commit
> > 951c1aef9691 ("clk: imx8mq: Remove CLK_IS_CRITICAL flag for
> > IMX8MQ_CLK_TMU_ROOT") until the thermal patches are accepted.
> 
> If the thermal driver is built as module, I think no need to revert
> the commit, but
> if by default thermal driver is built-in or mod probed, then yes, it
> should NOT break
> kernel boot up.
> 
> Anson.
> 
> > 
> > Merging patches out-of-order when they have hard (boot-breaking)
> > dependencies also breaks bisect.
> > 
> > --
> > Regards,
> > Leonard

Re: [PATCH][V2] drivers: thermal: processor_thermal_device: fix missing bitwise-or operators

2019-08-19 Thread Zhang Rui

On Mon, 2019-07-29 at 14:09 -0700, Srinivas Pandruvada wrote:
> On Mon, 2019-07-29 at 13:03 +0100, Colin King wrote:
> > From: Colin Ian King 
> > 
> > The variable val is having the top 8 bits cleared and then the
> > variable is being
> > re-assinged and setting just the top 8 bits.  I believe the
> > intention
> > was bitwise-or
> > in the top 8 bits.  Fix this by replacing the = operators with &=
> > and
> > > = instead.
> > 
> > Addresses-Coverity: ("Unused value")
> > Fixes: b0c74b08517e ("drivers: thermal: processor_thermal_device:
> > Export sysfs inteface for TCC offset")
> > Signed-off-by: Colin Ian King 
> 
> Reviewed-by: Srinivas Pandruvada  >

Hi, Colin,

Thanks for the fix. As b0c74b08517e ("drivers: thermal:
processor_thermal_device: Export sysfs inteface for TCC offset") has
not been shipped upstream yet, I will fold this fix into the
original patch directly.

thanks,
rui
> 
> > ---
> > 
> > V2: Add in &= operator missing from V1. Doh.
> > 
> > ---
> >  .../thermal/intel/int340x_thermal/processor_thermal_device.c  | 4
> > ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git
> > a/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
> > b/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
> > index 6f6ac6a8e82d..97333fc4be42 100644
> > ---
> > a/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
> > +++
> > b/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
> > @@ -163,8 +163,8 @@ static int tcc_offset_update(int tcc)
> > if (err)
> > return err;
> >  
> > -   val = ~GENMASK_ULL(31, 24);
> > -   val = (tcc & 0xff) << 24;
> > +   val &= ~GENMASK_ULL(31, 24);
> > +   val |= (tcc & 0xff) << 24;
> >  
> > err = wrmsrl_safe(MSR_IA32_TEMPERATURE_TARGET, val);
> > if (err)
> 
>
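
Editorial sketch: the fix quoted above turns two plain assignments into a
read-modify-write of bits 31:24. The userspace code below contrasts the two
forms. The sketch_ names and the local mask macro are stand-ins written for
this example (the kernel's GENMASK_ULL is not available in userspace), and
the explicit cast to a 64-bit type before shifting is a choice made for the
sketch, not something taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for a 64-bit mask with bits h..l set. */
#define SKETCH_GENMASK_ULL(h, l) \
	(((~0ULL) >> (63 - (h))) & ((~0ULL) << (l)))

/* Before the fix: each "=" throws away what was read from the register. */
static uint64_t sketch_tcc_update_buggy(uint64_t val, int tcc)
{
	val = ~SKETCH_GENMASK_ULL(31, 24);
	val = (uint64_t)(tcc & 0xff) << 24;
	return val;
}

/* After the fix: clear bits 31:24, then OR in the new offset. */
static uint64_t sketch_tcc_update_fixed(uint64_t val, int tcc)
{
	val &= ~SKETCH_GENMASK_ULL(31, 24);
	val |= (uint64_t)(tcc & 0xff) << 24;
	return val;
}

/* Shows that only the fixed form preserves the low 24 bits. */
static void sketch_tcc_selfcheck(void)
{
	assert(SKETCH_GENMASK_ULL(31, 24) == 0xff000000ULL);
	assert(sketch_tcc_update_fixed(0x12345678ULL, 0x0a) == 0x0a345678ULL);
	assert(sketch_tcc_update_buggy(0x12345678ULL, 0x0a) == 0x0a000000ULL);
}
```

In the buggy form every other field of the register value is written back
as zero, which is why the missing operators matter beyond the "unused
value" report.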

Re: [PATCH] int340X/processor_thermal_device: Fix proc_thermal_rapl_remove()

2019-07-23 Thread Zhang Rui

On Tue, 2019-07-23 at 09:30 +0200, Rafael J. Wysocki wrote:
> On Mon, Jul 22, 2019 at 12:23 PM Rafael J. Wysocki  > wrote:
> > 
> > 
> > From: Rafael J. Wysocki 
> > 
> > Passing 0 to cpuhp_remove_state() triggers the BUG_ON() in
> > __cpuhp_remove_state_cpuslocked() and the argument passed to
> > powercap_unregister_control_type() is expected to be a valid
> > pointer, so avoid calling these functions with incorrect
> > arguments from proc_thermal_rapl_remove().
> > 
> > Fixes: 555c45fe0d04 ("int340X/processor_thermal_device: add support
> > for MMIO RAPL")
> > Signed-off-by: Rafael J. Wysocki 
Acked-by: Zhang Rui 

> Any comments?
> 
> If not, I'll queue this up along with the other RAPL-related fix
> (https://patchwork.kernel.org/patch/11050999/).
> 
> > 
> > ---
> >  drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
> > |4 
> >  1 file changed, 4 insertions(+)
> > 
> > Index: linux-
> > pm/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
> > ===
> > --- linux-
> > pm.orig/drivers/thermal/intel/int340x_thermal/processor_thermal_dev
> > ice.c
> > +++ linux-
> > pm/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
> > @@ -487,6 +487,7 @@ static int proc_thermal_rapl_add(struct
> > rapl_mmio_cpu_online,
> > rapl_mmio_cpu_down_prep);
> > if (ret < 0) {
> > powercap_unregister_control_type(rapl_mmio_priv.con
> > trol_type);
> > +   rapl_mmio_priv.control_type = NULL;
> > return ret;
> > }
> > rapl_mmio_priv.pcap_rapl_online = ret;
> > @@ -496,6 +497,9 @@ static int proc_thermal_rapl_add(struct
> > 
> >  static void proc_thermal_rapl_remove(void)
> >  {
> > +   if (IS_ERR_OR_NULL(rapl_mmio_priv.control_type))
> > +   return;
> > +
> > cpuhp_remove_state(rapl_mmio_priv.pcap_rapl_online);
> > powercap_unregister_control_type(rapl_mmio_priv.control_typ
> > e);
> >  }
> > 
> > 
> >
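
Editorial sketch: the patch quoted above makes the remove path return early
when the control type was never registered or holds an error pointer. The
userspace code below imitates that guard. All sketch_ names are
hypothetical; the error-pointer encoding follows the usual kernel
convention of reserving the top 4095 addresses, reproduced here as an
assumption so the example compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static void *sketch_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* True for NULL and for pointers in the reserved error range. */
static int sketch_is_err_or_null(const void *p)
{
	return !p || (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Hypothetical stand-in for the driver's private RAPL state. */
struct sketch_rapl {
	void *control_type;
	int unregister_calls;	/* counts teardown calls actually made */
};

/* Teardown that refuses to act on an invalid control type. */
static void sketch_rapl_remove(struct sketch_rapl *r)
{
	if (sketch_is_err_or_null(r->control_type))
		return;

	r->unregister_calls++;
	r->control_type = NULL;
}

/* Covers the NULL, error-pointer, and valid-pointer cases. */
static void sketch_rapl_selfcheck(void)
{
	int token;
	struct sketch_rapl r = { NULL, 0 };

	sketch_rapl_remove(&r);
	assert(r.unregister_calls == 0);

	r.control_type = sketch_err_ptr(-12);
	sketch_rapl_remove(&r);
	assert(r.unregister_calls == 0);

	r.control_type = &token;
	sketch_rapl_remove(&r);
	assert(r.unregister_calls == 1);
}
```

Resetting the pointer to NULL after teardown plays the same role as the
"rapl_mmio_priv.control_type = NULL;" line added to the error path in the
patch: a later remove call then sees nothing to undo.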

[GIT PULL] Thermal management updates for v5.3-rc1

2019-07-17 Thread Zhang Rui

Hi, Linus,

Please pull from
  git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux.git next

to receive the latest Thermal Management updates for v5.3-rc1 with
top-most commit 6c395f66e98c895cf3ebf87c0b2fc63b6a57a196:

  drivers: thermal: processor_thermal_device: Fix build warning (2019-07-09 21:19:12 +0800)

on top of commit 4b972a01a7da614b4796475f933094751a295a2f:

  Linux 5.2-rc6 (2019-06-22 16:01:36 -0700)

Specifics:
 - Convert thermal documents to ReST. (Mauro Carvalho Chehab)
 - Fix a cyclic dependency between thermal core and governors.
(Daniel Lezcano)
 - Fix processor_thermal_device driver to re-evaluate power limits
after resume. (Srinivas Pandruvada, Zhang Rui)

thanks,
rui



Daniel Lezcano (2):
  thermal/drivers/core: Add init section table for self-
encapsulation
  thermal/drivers/core: Use governor table to initialize

Mauro Carvalho Chehab (1):
  docs: thermal: convert to ReST

Srinivas Pandruvada (1):
  drivers: thermal: processor_thermal: Read PPCC on resume

Zhang Rui (2):
  Merge branches 'thermal-core' and 'thermal-intel' into next
  drivers: thermal: processor_thermal_device: Fix build warning

 .../{cpu-cooling-api.txt => cpu-cooling-api.rst}   |  39 +-
 .../thermal/{exynos_thermal => exynos_thermal.rst} |  47 +-
 Documentation/thermal/exynos_thermal_emulation |  53 ---
 Documentation/thermal/exynos_thermal_emulation.rst |  61 +++
 Documentation/thermal/index.rst|  18 +
 .../{intel_powerclamp.txt => intel_powerclamp.rst} | 183 
 .../{nouveau_thermal => nouveau_thermal.rst}   |  54 ++-
 .../{power_allocator.txt => power_allocator.rst}   | 144 +++---
 .../thermal/{sysfs-api.txt => sysfs-api.rst}   | 488
++---
 ...ure_thermal => x86_pkg_temperature_thermal.rst} |  28 +-
 MAINTAINERS|   2 +-
 drivers/thermal/fair_share.c   |  12 +-
 drivers/thermal/gov_bang_bang.c|  11 +-
 .../int340x_thermal/processor_thermal_device.c |  18 +
 drivers/thermal/power_allocator.c  |  11 +-
 drivers/thermal/step_wise.c|  11 +-
 drivers/thermal/thermal_core.c |  52 ++-
 drivers/thermal/thermal_core.h |  55 +--
 drivers/thermal/user_space.c   |  12 +-
 include/asm-generic/vmlinux.lds.h  |  11 +
 include/linux/thermal.h|   4 +-
 21 files changed, 771 insertions(+), 543 deletions(-)
 rename Documentation/thermal/{cpu-cooling-api.txt => cpu-cooling-
api.rst} (82%)
 rename Documentation/thermal/{exynos_thermal => exynos_thermal.rst}
(67%)
 delete mode 100644 Documentation/thermal/exynos_thermal_emulation
 create mode 100644 Documentation/thermal/exynos_thermal_emulation.rst
 create mode 100644 Documentation/thermal/index.rst
 rename Documentation/thermal/{intel_powerclamp.txt =>
intel_powerclamp.rst} (76%)
 rename Documentation/thermal/{nouveau_thermal => nouveau_thermal.rst}
(64%)
 rename Documentation/thermal/{power_allocator.txt =>
power_allocator.rst} (74%)
 rename Documentation/thermal/{sysfs-api.txt => sysfs-api.rst} (66%)
 rename Documentation/thermal/{x86_pkg_temperature_thermal =>
x86_pkg_temperature_thermal.rst} (80%)

Re: [PATCH] drivers: thermal: processor_thermal: mark pm function __maybe_unused

2019-07-10 Thread Zhang Rui

Hi, Arnd,

Thanks for the report.

On Mon, 2019-07-08 at 14:47 +0200, Arnd Bergmann wrote:
> Without CONFIG_PM, we get a harmless warning:
> 
> drivers/thermal/intel/int340x_thermal/processor_thermal_device.c:446:
> 12: error: 'proc_thermal_resume' defined but not used [-
> Werror=unused-function]
>  static int proc_thermal_resume(struct device *dev)
> 
> Mark it __maybe_unused to shut up the warning.
> 
> Fixes: aaba9791fbb4 ("drivers: thermal: processor_thermal: Read PPCC
> on resume")
> Signed-off-by: Arnd Bergmann 
> ---
>  .../thermal/intel/int340x_thermal/processor_thermal_device.c| 2
> +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git
> a/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
> b/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
> index a3210f09f366..5ce639a99330 100644
> ---
> a/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
> +++
> b/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
> @@ -443,7 +443,7 @@ static void  proc_thermal_pci_remove(struct
> pci_dev *pdev)
>   pci_disable_device(pdev);
>  }
>  
> -static int proc_thermal_resume(struct device *dev)
> +static int __maybe_unused proc_thermal_resume(struct device *dev)
>  {
>   struct proc_thermal_device *proc_dev;
>  
I'd prefer to add #ifdef CONFIG_PM_SLEEP around
proc_thermal_resume() instead,
as in the patch below. What do you think?

thanks,
rui

From 6c395f66e98c895cf3ebf87c0b2fc63b6a57a196 Mon Sep 17 00:00:00 2001
From: Zhang Rui 
Date: Tue, 9 Jul 2019 21:19:12 +0800
Subject: [PATCH] drivers: thermal: processor_thermal_device: Fix build warning

As a system sleep callback, proc_thermal_resume() should be defined only
if CONFIG_PM_SLEEP is set.

This fixes a build warning when CONFIG_PM_SLEEP is not set,
drivers/thermal/intel/int340x_thermal/processor_thermal_device.c:446:12: error: 
'proc_thermal_resume' defined but not used [-Werror=unused-function]
 static int proc_thermal_resume(struct device *dev)

Fixes: aaba9791fbb4 ("drivers: thermal: processor_thermal: Read PPCC on resume")
Reported-by: Arnd Bergmann 
Signed-off-by: Zhang Rui 
---
 drivers/thermal/intel/int340x_thermal/processor_thermal_device.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c 
b/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
index a3210f0..77dae1e 100644
--- a/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
+++ b/drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
@@ -443,6 +443,7 @@ static void  proc_thermal_pci_remove(struct pci_dev *pdev)
    pci_disable_device(pdev);
 }
 
+#ifdef CONFIG_PM_SLEEP
 static int proc_thermal_resume(struct device *dev)
 {
    struct proc_thermal_device *proc_dev;
@@ -452,6 +453,9 @@ static int proc_thermal_resume(struct device *dev)
 
    return 0;
 }
+#else
+#define proc_thermal_resume NULL
+#endif
 
 static SIMPLE_DEV_PM_OPS(proc_thermal_pm, NULL, proc_thermal_resume);
 
-- 
2.7.4
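
Note on the two approaches in this thread: __maybe_unused keeps the function and only silences the warning, while the #ifdef variant drops the function and substitutes NULL for the callback. The following is a minimal userspace sketch of the second pattern, not the driver code. DEMO_PM_SLEEP, struct demo_pm_ops, demo_resume and demo_run_resume are hypothetical names standing in for CONFIG_PM_SLEEP and the kernel's PM ops plumbing.

```c
#include <assert.h>
#include <stddef.h>

/*
 * DEMO_PM_SLEEP is a hypothetical stand-in for CONFIG_PM_SLEEP.
 * Build with -DDEMO_PM_SLEEP to compile the callback in.
 */
struct demo_pm_ops {
	int (*resume)(void *dev);	/* NULL means "nothing to do" */
};

#ifdef DEMO_PM_SLEEP
static int demo_resume(void *dev)
{
	(void)dev;	/* a real driver would re-read hardware state here */
	return 0;
}
#else
/* No function body is emitted, so nothing is left to be "unused". */
#define demo_resume NULL
#endif

static const struct demo_pm_ops demo_pm = {
	.resume = demo_resume,
};

/* Caller side: skip the callback when it was compiled out. */
static int demo_run_resume(void *dev)
{
	return demo_pm.resume ? demo_pm.resume(dev) : 0;
}
```

With the macro undefined, demo_pm.resume is NULL and demo_run_resume() simply returns 0; with -DDEMO_PM_SLEEP the real callback runs. Either way the compiler never sees a defined-but-unused static function.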

Re: [PATCH 1/2] thermal/drivers/core: Add init section table for self-encapsulation

2019-06-27 Thread Zhang Rui

On Mon, 2019-06-24 at 09:32 +0200, Daniel Lezcano wrote:
> Any chance this patch gets merged for v5.4?
> 
> Thanks
>   -- Daniel
> 

Have you compile-tested the patch?
I got the following errors when compiling.

In file included from drivers/thermal/fair_share.c:16:0:
drivers/thermal/thermal_core.h:23:3: error: expected identifier or ‘(’
before ‘static’
  (static typeof(name) *__thermal_table_entry_##name \
   ^
drivers/thermal/thermal_core.h:26:40: note: in expansion of macro
‘THERMAL_TABLE_ENTRY’
 #define THERMAL_GOVERNOR_DECLARE(name) THERMAL_TABLE_ENTRY(governor,
name)
^
drivers/thermal/fair_share.c:120:1: note: in expansion of macro
‘THERMAL_GOVERNOR_DECLARE’
 THERMAL_GOVERNOR_DECLARE(thermal_gov_fair_share);
 ^
drivers/thermal/fair_share.c:116:32: warning: ‘thermal_gov_fair_share’
defined but not used [-Wunused-variable]
 static struct thermal_governor thermal_gov_fair_share = {
^
make[2]: *** [drivers/thermal/fair_share.o] Error 1
make[2]: *** Waiting for unfinished jobs
In file included from drivers/thermal/gov_bang_bang.c:14:0:
drivers/thermal/thermal_core.h:23:3: error: expected identifier or ‘(’
before ‘static’
  (static typeof(name) *__thermal_table_entry_##name \
   ^
drivers/thermal/thermal_core.h:26:40: note: in expansion of macro
‘THERMAL_TABLE_ENTRY’
 #define THERMAL_GOVERNOR_DECLARE(name) THERMAL_TABLE_ENTRY(governor,
name)
^
drivers/thermal/gov_bang_bang.c:119:1: note: in expansion of macro
‘THERMAL_GOVERNOR_DECLARE’
 THERMAL_GOVERNOR_DECLARE(thermal_gov_bang_bang);
 ^
drivers/thermal/gov_bang_bang.c:115:32: warning:
‘thermal_gov_bang_bang’ defined but not used [-Wunused-variable]
 static struct thermal_governor thermal_gov_bang_bang = {
^
make[2]: *** [drivers/thermal/gov_bang_bang.o] Error 1
make[1]: *** [drivers/thermal] Error 2
make[1]: *** Waiting for unfinished jobs
make: *** [drivers] Error 2

I fixed the problem by removing the round brackets
around the body of THERMAL_TABLE_ENTRY(), and applied the patch.

thanks,
rui
> On 12/06/2019 22:13, Daniel Lezcano wrote:
> > 
> > Currently the governors are declared in their respective files but
> > they
> > export their [un]register functions which in turn call the
> > [un]register
> > governors core's functions. That implies a cyclic dependency which
> > is
> > not desirable. There is a way to self-encapsulate the governors by
> > letting
> > them to declare themselves in a __init section table.
> > 
> > Define the table in the asm generic linker description like the
> > other
> > tables and provide the specific macros to deal with.
> > 
> > Reviewed-by: Amit Kucheria 
> > Signed-off-by: Daniel Lezcano 
> > ---
> >  drivers/thermal/thermal_core.h| 15 +++
> >  include/asm-generic/vmlinux.lds.h | 11 +++
> >  2 files changed, 26 insertions(+)
> > 
> > diff --git a/drivers/thermal/thermal_core.h
> > b/drivers/thermal/thermal_core.h
> > index 0df190ed82a7..be901e84aa65 100644
> > --- a/drivers/thermal/thermal_core.h
> > +++ b/drivers/thermal/thermal_core.h
> > @@ -15,6 +15,21 @@
> >  /* Initial state of a cooling device during binding */
> >  #define THERMAL_NO_TARGET -1UL
> >  
> > +/* Init section thermal table */
> > +extern struct thermal_governor *__governor_thermal_table[];
> > +extern struct thermal_governor *__governor_thermal_table_end[];
> > +
> > +#define THERMAL_TABLE_ENTRY(table, name)   \
> > +   (static typeof(name) *__thermal_table_entry_##name  \
> > +   __used __section(__##table##_thermal_table) = &name)
> > +
> > +#define THERMAL_GOVERNOR_DECLARE(name) THERMAL_TABLE_ENTRY(governor, name)
> > +
> > +#define for_each_governor_table(__governor)\
> > +   for (__governor = __governor_thermal_table; \
> > +    __governor < __governor_thermal_table_end; \
> > +    __governor++)
> > +
> >  /*
> >   * This structure is used to describe the behavior of
> >   * a certain cooling device on a certain trip point
> > diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-
> > generic/vmlinux.lds.h
> > index f8f6f04c4453..8312fdc2b2fa 100644
> > --- a/include/asm-generic/vmlinux.lds.h
> > +++ b/include/asm-generic/vmlinux.lds.h
> > @@ -239,6 +239,16 @@
> >  #define ACPI_PROBE_TABLE(name)
> >  #endif
> >  
> > +#ifdef CONFIG_THERMAL
> > +#define THERMAL_TABLE(name)   \
> > +   . = ALIGN(8);   \
> > +   __##name##_thermal_table = .;   \
> > +   KEEP(*(__##name##_thermal_table))   \
> > +   __##name##_thermal_table_end = .;
> > +#else
> > +#define THERMAL_TABLE(name)
> > +#endif
> > +
> >  #define KERNEL_DTB()   \
> >     STRUCT_ALIGN(); \
> >     __dtb_start = .;
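
Note on the section-table idea in the quoted patch: each governor places a pointer to itself into a named section, and the core walks that section between a start and an end symbol. The compile error above comes from wrapping a declaration in parentheses, which is not valid C at file scope. The following is a userspace sketch of the same technique, assuming a GNU toolchain on ELF, where the linker provides __start_<section> and __stop_<section> for sections whose names are valid C identifiers. All demo_ names are hypothetical, and the kernel uses its own linker script symbols instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_governor {
	const char *name;
};

/* Provided by GNU ld for a section named "demo_gov_table" (assumption). */
extern struct demo_governor *__start_demo_gov_table[];
extern struct demo_governor *__stop_demo_gov_table[];

/*
 * Note: no parentheses around the declaration. A parenthesised
 * "(static ... = &gov)" is what triggers "expected identifier or '('".
 */
#define DEMO_GOVERNOR_DECLARE(gov)					\
	static struct demo_governor *__demo_table_entry_##gov		\
	__attribute__((used, section("demo_gov_table"))) = &gov

static struct demo_governor demo_step_wise = { .name = "step_wise" };
DEMO_GOVERNOR_DECLARE(demo_step_wise);

static struct demo_governor demo_fair_share = { .name = "fair_share" };
DEMO_GOVERNOR_DECLARE(demo_fair_share);

/* Walk the table the same way for_each_governor_table() does. */
static int demo_count_governors(void)
{
	struct demo_governor **g;
	int n = 0;

	for (g = __start_demo_gov_table; g < __stop_demo_gov_table; g++)
		n++;
	return n;
}

static struct demo_governor *demo_find_governor(const char *name)
{
	struct demo_governor **g;

	for (g = __start_demo_gov_table; g < __stop_demo_gov_table; g++)
		if (strcmp((*g)->name, name) == 0)
			return *g;
	return NULL;
}
```

The governors never call a registration function here; the table is assembled at link time, which is the self-encapsulation the patch description is after.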

[PATCH] perf/rapl: restart perf rapl counter after resume

2019-06-17 Thread Zhang Rui

From b74a74f953f4c34818a58831b6eb468b42b17c62 Mon Sep 17 00:00:00 2001
From: Zhang Rui 
Date: Tue, 23 Apr 2019 16:26:50 +0800
Subject: [PATCH] perf/rapl: restart perf rapl counter after resume

After S3 suspend/resume, "perf stat -I 1000 -e power/energy-pkg/ -a"
reports an insane value for the very first sampling period after resume
as shown below.

19.278989977   2.16 Joules power/energy-pkg/
20.279373569   1.96 Joules power/energy-pkg/
21.279765481   2.09 Joules power/energy-pkg/
22.280305420   2.10 Joules power/energy-pkg/
25.504782277   4,294,966,686.01 Joules power/energy-pkg/
26.505114993   3.58 Joules power/energy-pkg/
27.505471758   1.66 Joules power/energy-pkg/

Fix this by resetting the counter right after resume.

Kan, Liang proposed the prototype patch and I reworked it to use syscore
ops.

Signed-off-by: Zhang Rui 
Signed-off-by: Kan Liang 
---
 arch/x86/events/intel/rapl.c | 84 +++-
 1 file changed, 76 insertions(+), 8 deletions(-)

diff --git a/arch/x86/events/intel/rapl.c b/arch/x86/events/intel/rapl.c
index 26c03f5..6cff8fd 100644
--- a/arch/x86/events/intel/rapl.c
+++ b/arch/x86/events/intel/rapl.c
@@ -55,6 +55,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include "../perf_event.h"
@@ -228,6 +229,32 @@ static u64 rapl_event_update(struct perf_event *event)
    return new_raw_count;
 }
 
+static void rapl_pmu_update_all(struct rapl_pmu *pmu)
+{
+   struct perf_event *event;
+   unsigned long flags;
+
+   raw_spin_lock_irqsave(&pmu->lock, flags);
+
+   list_for_each_entry(event, &pmu->active_list, active_entry)
+       rapl_event_update(event);
+
+   raw_spin_unlock_irqrestore(&pmu->lock, flags);
+}
+
+static void rapl_pmu_restart_all(struct rapl_pmu *pmu)
+{
+   struct perf_event *event;
+   unsigned long flags;
+
+   raw_spin_lock_irqsave(&pmu->lock, flags);
+
+   list_for_each_entry(event, &pmu->active_list, active_entry)
+       local64_set(&event->hw.prev_count, rapl_read_counter(event));
+
+   raw_spin_unlock_irqrestore(&pmu->lock, flags);
+}
+
 static void rapl_start_hrtimer(struct rapl_pmu *pmu)
 {
     hrtimer_start(&pmu->hrtimer, pmu->timer_interval,
@@ -237,18 +264,11 @@ static void rapl_start_hrtimer(struct rapl_pmu *pmu)
 static enum hrtimer_restart rapl_hrtimer_handle(struct hrtimer *hrtimer)
 {
    struct rapl_pmu *pmu = container_of(hrtimer, struct rapl_pmu, hrtimer);
-   struct perf_event *event;
-   unsigned long flags;
 
    if (!pmu->n_active)
    return HRTIMER_NORESTART;
 
-   raw_spin_lock_irqsave(&pmu->lock, flags);
-
-   list_for_each_entry(event, &pmu->active_list, active_entry)
-       rapl_event_update(event);
-
-   raw_spin_unlock_irqrestore(&pmu->lock, flags);
+   rapl_pmu_update_all(pmu);
 
    hrtimer_forward_now(hrtimer, pmu->timer_interval);
 
@@ -698,6 +718,52 @@ static int __init init_rapl_pmus(void)
    return 0;
 }
 
+
+#ifdef CONFIG_PM
+
+static int perf_rapl_suspend(void)
+{
+   int i;
+
+   get_online_cpus();
+   for (i = 0; i < rapl_pmus->maxpkg; i++)
+   rapl_pmu_update_all(rapl_pmus->pmus[i]);
+   put_online_cpus();
+   return 0;
+}
+
+static void perf_rapl_resume(void)
+{
+   int i;
+
+   get_online_cpus();
+   for (i = 0; i < rapl_pmus->maxpkg; i++)
+   rapl_pmu_restart_all(rapl_pmus->pmus[i]);
+   put_online_cpus();
+}
+
+static struct syscore_ops perf_rapl_syscore_ops = {
+   .resume = perf_rapl_resume,
+   .suspend = perf_rapl_suspend,
+};
+
+static void perf_rapl_pm_register(void)
+{
+   register_syscore_ops(&perf_rapl_syscore_ops);
+}
+
+static void perf_rapl_pm_unregister(void)
+{
+   unregister_syscore_ops(&perf_rapl_syscore_ops);
+}
+
+#else
+
+static inline void perf_rapl_pm_register(void) { }
+static inline void perf_rapl_pm_unregister(void) { }
+
+#endif
+
 #define X86_RAPL_MODEL_MATCH(model, init)  \
     { X86_VENDOR_INTEL, 6, model, X86_FEATURE_ANY, (unsigned long)&init }
 
@@ -798,6 +864,7 @@ static int __init rapl_pmu_init(void)
    apply_quirk = rapl_init->apply_quirk;
    rapl_cntr_mask = rapl_init->cntr_mask;
    rapl_pmu_events_group.attrs = rapl_init->attrs;
+   perf_rapl_pm_register();
 
    ret = rapl_check_hw_unit(apply_quirk);
    if (ret)
@@ -836,6 +903,7 @@ static void __exit intel_rapl_exit(void)
 {
    cpuhp_remove_state_nocalls(CPUHP_AP_PERF_X86_RAPL_ONLINE);
     perf_pmu_unregister(&rapl_pmus->pmu);
+   perf_rapl_pm_unregister();
    cleanup_rapl_pmus();
 }
 module_exit(intel_rapl_exit);
-- 
2.7.4
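
Note on the bogus sample above: 4,294,966,686.01 is just below 2^32 (4,294,967,296), which is consistent with a counter reading that went backwards across suspend and a difference that wrapped around. The following is a simplified model of that effect and of the fix (re-seeding the previous count on resume), not the kernel's rapl_event_update(). struct demo_event and both helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified per-event state. */
struct demo_event {
	uint64_t prev_count;	/* last raw counter value seen */
	uint64_t total;		/* accumulated delta */
};

/*
 * Accumulate the difference between the new raw reading and the
 * previous one. If the hardware counter restarted from a lower value,
 * the unsigned subtraction wraps and yields a huge delta.
 */
static uint64_t demo_event_update(struct demo_event *ev, uint32_t raw_now)
{
	uint64_t delta = (uint64_t)raw_now - ev->prev_count;

	ev->prev_count = raw_now;
	ev->total += delta;
	return delta;
}

/* What the resume hook does in this model: forget the stale baseline. */
static void demo_event_restart(struct demo_event *ev, uint32_t raw_now)
{
	ev->prev_count = raw_now;
}
```

Without the restart, a reading of 400 after a baseline of 1000 produces a delta of 2^64 - 600. With the restart, the first post-resume period only counts what was consumed since resume.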

Re: [RFC PATCH] ACPI / processors: allow a processor device _UID to be a string

2019-06-11 Thread Zhang Rui

On Tue, 2019-06-11 at 17:11 +0100, Sudeep Holla wrote:
> On Tue, Jun 11, 2019 at 10:03:15AM -0600, Al Stone wrote:
> > 
> > On 6/11/19 6:53 AM, Sudeep Holla wrote:
> > > 
> > > On Mon, Jun 10, 2019 at 02:07:34PM -0600, Al Stone wrote:
> > > > 
> > > > In the ACPI specification, section 6.1.12, a _UID may be either
> > > > an
> > > > integer or a string object.  Up until now, when defining
> > > > processor
> > > > Device()s in ACPI (_HID ACPI0007), only integers were allowed
> > > > even
> > > > though this ignored the specification.  As a practical matter,
> > > > it
> > > > was not an issue.
> > > > 
> > > > Recently, some DSDTs have shown up that look like this:
> > > > 
> > > >   Device (XX00)
> > > >   {
> > > > Name (_HID, "ACPI0007" /* Processor Device */)
> > > > Name (_UID, "XYZZY-XX00")
> > > > .
> > > >   }
> > > > 
> > > > which is perfectly legal.  However, the kernel will report
> > > > instead:
> > > > 
> > > I am not sure how this can be perfectly legal from specification
> > > perspective. It's legal with respect to AML namespace but then
> > > the
> > > other condition of this matching with entries in static tables
> > > like
> > > MADT is not possible where there are declared to be simple 4 byte
> > > integer/word. Same is true for even ACPI0010, the processor
> > > container
> > > objects which need to match entries in PPTT,
> > > 
> > > ACPI Processor UID(in MADT): The OS associates this GICC(applies
> > > even
> > > for APIC and family) Structure with a processor device object in
> > > the namespace when the _UID child object of the processor device
> > > evaluates to a numeric value that matches the numeric value in
> > > this
> > > field.
> > > 
> > > So for me that indicates it can't be string unless you have some
> > > ways to
> > > match those _UID entries to ACPI Processor ID in MADT and PPTT.
> > > 
> > > Let me know if I am missing to consider something here.
> > > 
> > > --
> > > Regards,
> > > Sudeep
> > > 
> > Harumph.  I think what we have here is a big mess in the spec, but
> > that is exactly why this is an RFC.
> > 
> > The MADT can have any of ~16 different subtables, as you note.  Of
> > those, only these require a numeric _UID:
> > 
> >    -- Type 0x0: Processor Local APIC
> >    -- Type 0x4: Local APIC NMI [0]
> >    -- Type 0x7: Processor Local SAPIC [1]
> >    -- Type 0x9: Processor Local x2APIC
> >    -- Type 0xa: Local x2APIC NMI [0]
> >    -- Type 0xb: GICC
> > 
> > Note [0]: a value of !0x0 is also allowed, indicating all
> > processors
> >  [1]: this has two fields that could be interpreted as an ID
> > when
> >   used together
> > 
> > It does not appear that you could build a usable system without any
> > of these subtables -- but perhaps someone knows of incantations
> > that
> > could -- which is why I thought a string _UID might be viable.
> > 
> I hope no one is shipping such device yet or am I wrong ?
> We can ask them to fix as Linux simply can't boot on such system or
> even if it boots, it may have issues with acpi_processor drivers.
> 
> > 
> > If we consider the PPTT too, then yeah, _UID must be an integer for
> > some devices.
> > 
> > Thanks for the feedback; it forced me to double-check my thinking
> > about
> > the MADT.  The root cause of the issue is not the kernel in this
> > case,
> > but a lack of clarity in the spec -- or at least implied
> > requirements
> > that probably need to be explicit.  I'll send in a spec change.
> > 
> Completely agreed. Even little more clarification on this is helpful.
> Thanks for volunteering :) to take up spec change, much appreciated.
> 

Hmm, we've run into the same problem. I think there are two issues here:
1. This is a BIOS bug. A numeric _UID is required when using the "Device"
object, because it is the reference used to find the processor's APIC ID
in the MADT. So a BIOS fix is indeed needed in this case.
2. Although the ACPI spec has deprecated the "Processor" object, it does
not provide a clear ASL example of how to use the "Device" object
instead. In addition, the clarification lives in the MADT section rather
than in the "Declare Processors" section, where it is easily overlooked.
So I totally agree with you that we need a spec change here. Thanks!


-rui


> --
> Regards,
> Sudeep
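
Note on why a numeric _UID matters in the discussion above: the MADT side of the match is an integer field, so the lookup only works when the namespace side evaluates to an integer too. The following is a rough sketch of that matching step under simplified assumptions. struct demo_madt_entry and demo_find_by_uid are hypothetical and do not mirror the kernel's ACPI code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one MADT processor entry. */
struct demo_madt_entry {
	uint32_t acpi_processor_uid;	/* numeric by definition of the table */
	uint32_t apic_id;
};

/* Return the entry whose UID matches, or NULL if none does. */
static const struct demo_madt_entry *
demo_find_by_uid(const struct demo_madt_entry *tbl, size_t n, uint32_t uid)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (tbl[i].acpi_processor_uid == uid)
			return &tbl[i];
	return NULL;
}
```

A _UID such as "XYZZY-XX00" offers no integer to pass as uid, so in this model there is no way to reach the APIC ID for that processor device.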

Re: 5.2-rc2: low framerate in flightgear, cpu not running at full speed, thermal related?

2019-06-09 Thread Zhang Rui

On Sun, 2019-06-09 at 14:12 +0200, Pavel Machek wrote:
> Hi!
> 
> > 
> > > 
> > > When I start flightgear, I get framerates around 20 fps and cpu
> > > at
> > > 3GHz:
> > > 
> > > pavel@duo:~/bt$ cat /proc/cpuinfo  | grep MHz
> > > cpu MHz   : 3027.471
> > > cpu MHz     : 2981.863
> > > cpu MHz       : 2958.352
> > > cpu MHz     : 2864.001
> > > pavel@duo:~/bt$
> > > 
> > > (Ok, fgfs is really only running at single core, so why do both
> > > cores
> > > run at 3GHz?)
> > > 
> > > But temperatures get quite high:
> > > 
> > > pavel@duo:~/bt$ sensors
> > > thinkpad-isa-
> > > Adapter: ISA adapter
> > > fan1:4485 RPM
> > > 
> > > coretemp-isa-
> > > Adapter: ISA adapter
> > > Package id 0:  +98.0°C  (high = +86.0°C, crit = +100.0°C)
> > > Core 0:+98.0°C  (high = +86.0°C, crit = +100.0°C)
> > > Core 1:+91.0°C  (high = +86.0°C, crit = +100.0°C)
> > > 
> > > And soon cpu goes to 1.5GHz range, with framerates going down to
> > > 12fps. That's a bit low.
> > > 
> > > Room temperature is 26Celsius.
> > > 
> > > The CPU is Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz . I guess it
> > > means
> > > it should be able to sustain both cores running at 2.5GHz?
> > > 
> > > Any ideas? Were there any recent changes in that area?
> > I tried kernel compile. It keeps both cores at 3GHz, temperature
> > goes
> > up over 95C, and then cpus start going down to 2.3GHz... and then
> > down
> > to 2GHz... and down to 1.9GHz.
> > 
> > watch bash -c 'sensors;  cat /proc/cpuinfo | grep MHz'
> Situation is very different with v4.6 distro based kernel.
> 
> CPU MHz is only getting values round to 100MHz. It does not go above
> 2.5GHz, but it does not go below 2.5GHz under the load, either.
> 
> ACPI adapter appears in sensors output.

What temperature does coretemp report?
Can you please provide the sensors output on 4.6 during a kernel
compile?

Can you also attach the output of "grep . /sys/class/thermal/thermal*/*"
when the temperature goes high, on both kernels?

thanks,
rui
> 
> Now I tried going to 5.2-rc4. It behaves the same as 5.2-rc2. Goes up
> to 3GHz briefly but then down to 2.0GHz and below under load.
> 
> Ideas welcome.
> 
> Best regards,
>   
> Pavel

RE: mainline/master boot bisection: v5.2-rc1-165-g54dee406374c on rk3288-veyron-jaq

2019-05-23 Thread Zhang, Rui




> -Original Message-
> From: Enric Balletbo i Serra [mailto:enric.balle...@collabora.com]
> Sent: Thursday, May 23, 2019 9:19 PM
> To: Mark Brown ; Eduardo Valentin
> ; Elaine Zhang 
> Cc: tomeu.viz...@collabora.com; guillaume.tuc...@collabora.com;
> mga...@collabora.com; matthew.h...@linaro.org; khil...@baylibre.com;
> Daniel Lezcano ; Heiko Stuebner
> ; linux...@vger.kernel.org; linux-
> ker...@vger.kernel.org; linux-rockc...@lists.infradead.org; Zhang, Rui
> ; linux-arm-ker...@lists.infradead.org
> Subject: Re: mainline/master boot bisection: v5.2-rc1-165-g54dee406374c on
> rk3288-veyron-jaq
> Importance: High
> 
> Hi Mark,
> 
> On 23/5/19 15:12, Mark Brown wrote:
> > On Wed, May 22, 2019 at 07:23:09PM -0700, kernelci.org bot wrote:
> >
> >>   Details:https://kernelci.org/boot/id/5ce5984c59b514e6a47a364c
> >>   Plain log:  https://storage.kernelci.org//mainline/master/v5.2-rc1-165-
> g54dee406374c/arm/multi_v7_defconfig+CONFIG_EFI=y+CONFIG_ARM_LPAE
> =y/gcc-8/lab-collabora/boot-rk3288-veyron-jaq.txt
> >>   HTML log:   https://storage.kernelci.org//mainline/master/v5.2-rc1-165-
> g54dee406374c/arm/multi_v7_defconfig+CONFIG_EFI=y+CONFIG_ARM_LPAE
> =y/gcc-8/lab-collabora/boot-rk3288-veyron-jaq.html
> >>   Result: 28694e009e51 thermal: rockchip: fix up the tsadc pinctrl 
> >> setting
> error
> >
> > It looks like this issue has persisted for a while without any kind of
> > fix happening - given that the bisection has identified this commit as
> > causing the regression and confirmed that reverting it fixes shouldn't
> > we just revert?  My guess would be that there's some error with the
> > pinctrl settings in the DT for the board.
> >
> 
> After some discussion Heiko sent a patch that reverts the offending commit
> one day ago [1] and it's waiting for maintainer to pick-up the patch.
> 
I thought Eduardo would take the patch.
But I will apply it and queue it for -rc2 anyway.

Thanks,
Rui

> The reason why we think is best reverting that fix it is explained here [2]
> 
> [1] https://lkml.org/lkml/2019/5/22/467
> [2] https://lkml.org/lkml/2019/4/30/270
> 
> Thanks,
>  Enric

[tip:x86/topology] hwmon/coretemp: Support multi-die/package

2019-05-23 Thread tip-bot for Zhang Rui

Commit-ID:  cfcd82e632882372db960b50782a439a8ba56c09
Gitweb: https://git.kernel.org/tip/cfcd82e632882372db960b50782a439a8ba56c09
Author: Zhang Rui 
AuthorDate: Mon, 13 May 2019 13:58:54 -0400
Committer:  Thomas Gleixner 
CommitDate: Thu, 23 May 2019 10:08:33 +0200

hwmon/coretemp: Support multi-die/package

Package temperature sensors are actually implemented in hardware per-die.

Update coretemp to be "die-aware", so it can expose multiple sensors per
package, instead of just one.  No change to single-die/package systems.

Signed-off-by: Zhang Rui 
Signed-off-by: Len Brown 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Ingo Molnar 
Acked-by: Guenter Roeck 
Acked-by: Peter Zijlstra (Intel) 
Cc: linux...@vger.kernel.org
Cc: linux-hw...@vger.kernel.org
Link: 
https://lkml.kernel.org/r/ec2868f35113a01ff72d9041e0b97fc6a1c7df84.1557769318.git.len.br...@intel.com

---
 drivers/hwmon/coretemp.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c
index 5d34f7271e67..c64ce32d3214 100644
--- a/drivers/hwmon/coretemp.c
+++ b/drivers/hwmon/coretemp.c
@@ -435,7 +435,7 @@ static int chk_ucode_version(unsigned int cpu)
 
 static struct platform_device *coretemp_get_pdev(unsigned int cpu)
 {
-   int pkgid = topology_logical_package_id(cpu);
+   int pkgid = topology_logical_die_id(cpu);
 
if (pkgid >= 0 && pkgid < max_packages)
return pkg_devices[pkgid];
@@ -579,7 +579,7 @@ static struct platform_driver coretemp_driver = {
 
 static struct platform_device *coretemp_device_add(unsigned int cpu)
 {
-   int err, pkgid = topology_logical_package_id(cpu);
+   int err, pkgid = topology_logical_die_id(cpu);
struct platform_device *pdev;
 
if (pkgid < 0)
@@ -703,7 +703,7 @@ static int coretemp_cpu_offline(unsigned int cpu)
 * the rest.
 */
        if (cpumask_empty(&pd->cpumask)) {
-   pkg_devices[topology_logical_package_id(cpu)] = NULL;
+   pkg_devices[topology_logical_die_id(cpu)] = NULL;
platform_device_unregister(pdev);
return 0;
}
@@ -741,7 +741,7 @@ static int __init coretemp_init(void)
if (!x86_match_cpu(coretemp_ids))
return -ENODEV;
 
-   max_packages = topology_max_packages();
+   max_packages = topology_max_packages() * topology_max_die_per_package();
pkg_devices = kcalloc(max_packages, sizeof(struct platform_device *),
  GFP_KERNEL);
if (!pkg_devices)
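
Note on the sizing change in this diff: once devices are looked up by logical die id instead of logical package id, the lookup array has to hold packages times dies-per-package slots, and every access is bounds-checked against that product. The following is a small userspace sketch of that layout, assuming die ids are dense and start at 0. struct demo_die_table and its helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical flat lookup table indexed by logical die id. */
struct demo_die_table {
	void **slots;
	int max_slots;
};

/* Size the table for every die in every package. */
static int demo_table_init(struct demo_die_table *t,
			   int max_packages, int max_dies_per_package)
{
	t->max_slots = max_packages * max_dies_per_package;
	t->slots = calloc((size_t)t->max_slots, sizeof(*t->slots));
	return t->slots ? 0 : -1;
}

/* Bounds-checked store, mirroring the "id >= 0 && id < max" test. */
static int demo_table_set(struct demo_die_table *t, int die_id, void *dev)
{
	if (die_id < 0 || die_id >= t->max_slots)
		return -1;
	t->slots[die_id] = dev;
	return 0;
}

static void *demo_table_get(const struct demo_die_table *t, int die_id)
{
	if (die_id >= 0 && die_id < t->max_slots)
		return t->slots[die_id];
	return NULL;
}

static void demo_table_free(struct demo_die_table *t)
{
	free(t->slots);
	t->slots = NULL;
	t->max_slots = 0;
}
```

On a single-die system the product equals the package count, which matches the "no change to single-die/package systems" statement in the commit message.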

[tip:x86/topology] powercap/intel_rapl: Update RAPL domain name and debug messages

2019-05-23 Thread tip-bot for Zhang Rui

Commit-ID:  9ea7612c46586d9eacfd517e73ff76ef294feca0
Gitweb: https://git.kernel.org/tip/9ea7612c46586d9eacfd517e73ff76ef294feca0
Author: Zhang Rui 
AuthorDate: Mon, 13 May 2019 13:58:53 -0400
Committer:  Thomas Gleixner 
CommitDate: Thu, 23 May 2019 10:08:33 +0200

powercap/intel_rapl: Update RAPL domain name and debug messages

The RAPL domain "name" attribute contains "Package-N", which is ambiguous
on multi-die per-package systems.

Update the name to "package-X-die-Y" on those systems.

No change on systems without multi-die/package.

Update driver debug messages.

Signed-off-by: Zhang Rui 
Signed-off-by: Len Brown 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Ingo Molnar 
Acked-by: Rafael J. Wysocki 
Acked-by: Peter Zijlstra (Intel) 
Cc: linux...@vger.kernel.org
Link: 
https://lkml.kernel.org/r/6510b784e16374447965925588ec6e46d5d007d8.1557769318.git.len.br...@intel.com

---
 drivers/powercap/intel_rapl.c | 57 ---
 1 file changed, 32 insertions(+), 25 deletions(-)

diff --git a/drivers/powercap/intel_rapl.c b/drivers/powercap/intel_rapl.c
index 9202dbcef96d..ad78c1d08260 100644
--- a/drivers/powercap/intel_rapl.c
+++ b/drivers/powercap/intel_rapl.c
@@ -178,12 +178,15 @@ struct rapl_domain {
 #define power_zone_to_rapl_domain(_zone) \
container_of(_zone, struct rapl_domain, power_zone)
 
+/* maximum rapl package domain name: package-%d-die-%d */
+#define PACKAGE_DOMAIN_NAME_LENGTH 30
 
-/* Each physical package contains multiple domains, these are the common
+
+/* Each rapl package contains multiple domains, these are the common
  * data across RAPL domains within a package.
  */
 struct rapl_package {
-   unsigned int id; /* physical package/socket id */
+   unsigned int id; /* logical die id, equals physical 1-die systems */
unsigned int nr_domains;
unsigned long domain_map; /* bit map of active domains */
unsigned int power_unit;
@@ -198,6 +201,7 @@ struct rapl_package {
int lead_cpu; /* one active cpu per package for access */
/* Track active cpus */
struct cpumask cpumask;
+   char name[PACKAGE_DOMAIN_NAME_LENGTH];
 };
 
 struct rapl_defaults {
@@ -926,8 +930,8 @@ static int rapl_check_unit_core(struct rapl_package *rp, 
int cpu)
value = (msr_val & TIME_UNIT_MASK) >> TIME_UNIT_OFFSET;
rp->time_unit = 100 / (1 << value);
 
-   pr_debug("Core CPU package %d energy=%dpJ, time=%dus, power=%duW\n",
-   rp->id, rp->energy_unit, rp->time_unit, rp->power_unit);
+   pr_debug("Core CPU %s energy=%dpJ, time=%dus, power=%duW\n",
+   rp->name, rp->energy_unit, rp->time_unit, rp->power_unit);
 
return 0;
 }
@@ -951,8 +955,8 @@ static int rapl_check_unit_atom(struct rapl_package *rp, 
int cpu)
value = (msr_val & TIME_UNIT_MASK) >> TIME_UNIT_OFFSET;
rp->time_unit = 100 / (1 << value);
 
-   pr_debug("Atom package %d energy=%dpJ, time=%dus, power=%duW\n",
-   rp->id, rp->energy_unit, rp->time_unit, rp->power_unit);
+   pr_debug("Atom %s energy=%dpJ, time=%dus, power=%duW\n",
+   rp->name, rp->energy_unit, rp->time_unit, rp->power_unit);
 
return 0;
 }
@@ -1181,7 +1185,7 @@ static void rapl_update_domain_data(struct rapl_package 
*rp)
u64 val;
 
for (dmn = 0; dmn < rp->nr_domains; dmn++) {
-   pr_debug("update package %d domain %s data\n", rp->id,
+   pr_debug("update %s domain %s data\n", rp->name,
 rp->domains[dmn].name);
/* exclude non-raw primitives */
for (prim = 0; prim < NR_RAW_PRIMITIVES; prim++) {
@@ -1206,7 +1210,6 @@ static void rapl_unregister_powercap(void)
 static int rapl_package_register_powercap(struct rapl_package *rp)
 {
struct rapl_domain *rd;
-   char dev_name[17]; /* max domain name = 7 + 1 + 8 for int + 1 for null*/
struct powercap_zone *power_zone = NULL;
int nr_pl, ret;
 
@@ -1217,20 +1220,16 @@ static int rapl_package_register_powercap(struct 
rapl_package *rp)
for (rd = rp->domains; rd < rp->domains + rp->nr_domains; rd++) {
if (rd->id == RAPL_DOMAIN_PACKAGE) {
nr_pl = find_nr_power_limit(rd);
-   pr_debug("register socket %d package domain %s\n",
-   rp->id, rd->name);
-   memset(dev_name, 0, sizeof(dev_name));
-   snprintf(dev_name, sizeof(dev_name), "%s-%d",
-   rd->name, rp->id);
+   pr_debug("register package domain %s\n", rp->name);
power_z
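
Note on the naming scheme described in the commit message: the domain name becomes "package-X-die-Y" only on multi-die systems and stays unchanged elsewhere. The following is a sketch of how such a name could be built into a fixed-size buffer. demo_format_name is hypothetical, and the single-die format string "package-%d" is an assumption, since the diff above is cut off before the relevant hunk.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same size as PACKAGE_DOMAIN_NAME_LENGTH in the patch. */
#define DEMO_NAME_LEN 30

/*
 * Build the domain name. The die suffix is added only when a package
 * can contain more than one die; snprintf() always NUL-terminates.
 */
static void demo_format_name(char *buf, size_t len,
			     int pkg, int die, int dies_per_pkg)
{
	if (dies_per_pkg > 1)
		snprintf(buf, len, "package-%d-die-%d", pkg, die);
	else
		snprintf(buf, len, "package-%d", pkg);
}
```

Storing the formatted name once in the package structure lets every debug message print it directly instead of re-deriving it from the numeric id.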

[tip:x86/topology] thermal/x86_pkg_temp_thermal: Support multi-die/package

2019-05-23 Thread tip-bot for Zhang Rui

Commit-ID:  724adec33c2491f26f739f285ddca25fca226e48
Gitweb: https://git.kernel.org/tip/724adec33c2491f26f739f285ddca25fca226e48
Author: Zhang Rui 
AuthorDate: Mon, 13 May 2019 13:58:52 -0400
Committer:  Thomas Gleixner 
CommitDate: Thu, 23 May 2019 10:08:33 +0200

thermal/x86_pkg_temp_thermal: Support multi-die/package

Package temperature sensors are actually implemented in hardware per-die.
Thus, the new multi-die/package systems sport multiple package thermal
zones for each package.

Update the x86_pkg_temp_thermal to be "multi-die-aware", so it can expose
multiple zones per package, instead of just one.

Signed-off-by: Zhang Rui 
Signed-off-by: Len Brown 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Ingo Molnar 
Acked-by: Peter Zijlstra (Intel) 
Link: 
https://lkml.kernel.org/r/281695c854d38d3bdec803480c3049c36198ca44.1557769318.git.len.br...@intel.com

---
 drivers/thermal/intel/x86_pkg_temp_thermal.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/thermal/intel/x86_pkg_temp_thermal.c 
b/drivers/thermal/intel/x86_pkg_temp_thermal.c
index 1ef937d799e4..405b3858900a 100644
--- a/drivers/thermal/intel/x86_pkg_temp_thermal.c
+++ b/drivers/thermal/intel/x86_pkg_temp_thermal.c
@@ -122,7 +122,7 @@ err_out:
  */
 static struct pkg_device *pkg_temp_thermal_get_dev(unsigned int cpu)
 {
-   int pkgid = topology_logical_package_id(cpu);
+   int pkgid = topology_logical_die_id(cpu);
 
if (pkgid >= 0 && pkgid < max_packages)
return packages[pkgid];
@@ -353,7 +353,7 @@ static int pkg_thermal_notify(u64 msr_val)
 
 static int pkg_temp_thermal_device_add(unsigned int cpu)
 {
-   int pkgid = topology_logical_package_id(cpu);
+   int pkgid = topology_logical_die_id(cpu);
u32 tj_max, eax, ebx, ecx, edx;
struct pkg_device *pkgdev;
int thres_count, err;
@@ -449,7 +449,7 @@ static int pkg_thermal_cpu_offline(unsigned int cpu)
 * worker will see the package anymore.
 */
if (lastcpu) {
-   packages[topology_logical_package_id(cpu)] = NULL;
+   packages[topology_logical_die_id(cpu)] = NULL;
/* After this point nothing touches the MSR anymore. */
wrmsr(MSR_IA32_PACKAGE_THERM_INTERRUPT,
  pkgdev->msr_pkg_therm_low, pkgdev->msr_pkg_therm_high);
@@ -515,7 +515,7 @@ static int __init pkg_temp_thermal_init(void)
if (!x86_match_cpu(pkg_temp_thermal_ids))
return -ENODEV;
 
-   max_packages = topology_max_packages();
+   max_packages = topology_max_packages() * topology_max_die_per_package();
packages = kcalloc(max_packages, sizeof(struct pkg_device *),
   GFP_KERNEL);
if (!packages)

[tip:x86/topology] powercap/intel_rapl: Support multi-die/package

2019-05-23 Thread tip-bot for Zhang Rui

Commit-ID:  32fb480e0a2cf1f71e4174d6477198c94dbc746c
Gitweb: https://git.kernel.org/tip/32fb480e0a2cf1f71e4174d6477198c94dbc746c
Author: Zhang Rui 
AuthorDate: Mon, 13 May 2019 13:58:51 -0400
Committer:  Thomas Gleixner 
CommitDate: Thu, 23 May 2019 10:08:32 +0200

powercap/intel_rapl: Support multi-die/package

RAPL "package" domains are actually implemented in hardware per-die.
Thus, the new multi-die/package systems have multiple domains
within each physical package.

Update the intel_rapl driver to be "die aware" -- exporting multiple
domains within a single package, when present.  No change on single
die/package systems.

Signed-off-by: Zhang Rui 
Signed-off-by: Len Brown 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Ingo Molnar 
Acked-by: Rafael J. Wysocki 
Acked-by: Peter Zijlstra (Intel) 
Cc: linux...@vger.kernel.org
Link: 
https://lkml.kernel.org/r/9fcb4719aeb7efccf3bc75ed8dd559e46121649f.1557769318.git.len.br...@intel.com

---
 drivers/powercap/intel_rapl.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/powercap/intel_rapl.c b/drivers/powercap/intel_rapl.c
index 3c3c0c23180b..9202dbcef96d 100644
--- a/drivers/powercap/intel_rapl.c
+++ b/drivers/powercap/intel_rapl.c
@@ -266,7 +266,7 @@ static struct rapl_domain *platform_rapl_domain; /* 
Platform (PSys) domain */
 /* caller to ensure CPU hotplug lock is held */
 static struct rapl_package *rapl_find_package_domain(int cpu)
 {
-   int id = topology_physical_package_id(cpu);
+   int id = topology_logical_die_id(cpu);
struct rapl_package *rp;
 
        list_for_each_entry(rp, &rapl_packages, plist) {
@@ -1459,7 +1459,7 @@ static void rapl_remove_package(struct rapl_package *rp)
 /* called from CPU hotplug notifier, hotplug lock held */
 static struct rapl_package *rapl_add_package(int cpu)
 {
-   int id = topology_physical_package_id(cpu);
+   int id = topology_logical_die_id(cpu);
struct rapl_package *rp;
int ret;
