[PATCH V5] perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task

2021-04-20 Thread kan . liang
From: Kan Liang 

The counter value of a perf task may leak to another RDPMC task.
For example, a perf stat task as below is running on CPU 0.

perf stat -e 'branches,cycles' -- taskset -c 0 ./workload

In the meantime, an RDPMC task, which is also running on CPU 0, may read
the GP counters periodically. (The RDPMC task creates a fixed event,
but reads four GP counters.)

$ taskset -c 0 ./rdpmc_read_all_counters
index 0x0 value 0x8001e5970f99
index 0x1 value 0x8005d750edb6
index 0x2 value 0x0
index 0x3 value 0x0

index 0x0 value 0x8002358e48a5
index 0x1 value 0x8006bd1e3bc9
index 0x2 value 0x0
index 0x3 value 0x0

This is a potential security issue. Once an attacker knows what the
other thread is counting, the PerfMon counters can be used as a side
channel to attack cryptosystems.
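
For reference, a user-space reader like the one above boils down to
issuing the RDPMC instruction for each counter index. A minimal sketch
(the loop and program structure are illustrative; the task must have an
mmap'ed perf event so that RDPMC is permitted for it, i.e. CR4.PCE set):

#include <stdint.h>
#include <stdio.h>

/* Read a PMC directly from user space; 'idx' is the HW counter index. */
static inline uint64_t rdpmc(uint32_t idx)
{
	uint32_t lo, hi;

	asm volatile("rdpmc" : "=a" (lo), "=d" (hi) : "c" (idx));
	return ((uint64_t)hi << 32) | lo;
}

int main(void)
{
	uint32_t i;

	/* Assumes a perf event has already been opened and mmap'ed by
	 * this task, so the kernel allows RDPMC for it. */
	for (i = 0; i < 4; i++)
		printf("index 0x%x value 0x%llx\n", i,
		       (unsigned long long)rdpmc(i));
	return 0;
}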

The counter value of the perf stat task leaks to the RDPMC task because
perf never clears the counter when it's stopped.

Two methods were considered to address the issue.
- Unconditionally reset the counter in x86_pmu_del(). It can bring extra
  overhead even when there is no RDPMC task running.
- Only reset the un-assigned dirty counters when the RDPMC task is
  scheduled in. The method is implemented here.

A dirty counter is a counter whose assigned event has been deleted but
which has not been reset. To track the dirty counters, add a 'dirty'
variable in the struct cpu_hw_events.

The current code doesn't reset the counter when the assigned event is
deleted. Set the corresponding bit in the 'dirty' variable in
x86_pmu_del(), if the RDPMC feature is available on the system.

The security issue can only be exploited by an RDPMC task. The event for
an RDPMC task requires the mmap buffer, which can be used to detect an
RDPMC task. Once the event is detected in event_mapped(), enable
sched_task(), which is invoked on each context switch. Add a check in
sched_task() to clear the dirty counters when the RDPMC task is
scheduled in. Only the currently un-assigned dirty counters are reset,
because the dirty counters assigned to the RDPMC task will be updated soon.

The RDPMC instruction is also supported on older platforms. Add
sched_task() for the core_pmu. The core_pmu doesn't support large PEBS
and LBR callstack, so intel_pmu_pebs/lbr_sched_task() will be ignored.

RDPMC is not an Intel-only feature. Add the dirty-counter clearing code
to the x86 generic code.
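
Put together, the mechanism amounts to two small pieces of bookkeeping.
A condensed sketch in the arch/x86/events/core.c context (helper names
are illustrative; the field names follow the description above, not
necessarily the final patch):

/* Called from x86_pmu_del(): remember which counter just lost its event. */
static void x86_pmu_mark_dirty(struct cpu_hw_events *cpuc,
			       struct perf_event *event)
{
	if (x86_pmu.attr_rdpmc)		/* RDPMC available on the system */
		__set_bit(event->hw.idx, cpuc->dirty);
}

/* Called from sched_task() when an RDPMC task is scheduled in. */
static void x86_pmu_clear_dirty_counters(void)
{
	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
	int i;

	/* The assigned counters will be updated soon; skip them. */
	for (i = 0; i < cpuc->n_events; i++)
		__clear_bit(cpuc->assign[i], cpuc->dirty);

	/* Checked after the loop above, per the V3 change note below. */
	if (bitmap_empty(cpuc->dirty, X86_PMC_IDX_MAX))
		return;

	for_each_set_bit(i, cpuc->dirty, X86_PMC_IDX_MAX) {
		if (i >= INTEL_PMC_IDX_FIXED)
			wrmsrl(MSR_ARCH_PERFMON_FIXED_CTR0 +
			       (i - INTEL_PMC_IDX_FIXED), 0);
		else
			wrmsrl(x86_pmu_event_addr(i), 0);
	}

	bitmap_zero(cpuc->dirty, X86_PMC_IDX_MAX);
}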

After applying the patch,

$ taskset -c 0 ./rdpmc_read_all_counters
index 0x0 value 0x0
index 0x1 value 0x0
index 0x2 value 0x0
index 0x3 value 0x0

index 0x0 value 0x0
index 0x1 value 0x0
index 0x2 value 0x0
index 0x3 value 0x0

Performance

Context switch performance is only impacted when there are two or more
perf users and at least one of them is an RDPMC user. In other cases,
there is no performance impact.

The worst case occurs when there are two users: the RDPMC user only
uses one counter, while the other user uses all available counters.
When the RDPMC task is scheduled in, all the counters, other than the
RDPMC assigned one, have to be reset.

Here is the test result for the worst-case.

The test was run on an Ice Lake platform, which has 8 GP counters and
3 fixed counters (not including the SLOTS counter).

The lat_ctx is used to measure the context switching time.

lat_ctx -s 128K -N 1000 processes 2

I instrumented lat_ctx to open all 8 GP counters and 3 fixed counters
for one task. The other task opens a fixed counter and enables RDPMC.

Without the patch:
The context switch time is 4.97 us

With the patch:
The context switch time is 5.16 us

There is a ~4% drop in context switch time in the worst case.

Suggested-by: Peter Zijlstra (Intel) 
Signed-off-by: Kan Liang 
---
Changes since V4:
- Fix the warning with CONFIG_DEBUG_PREEMPT=y
  Disable interrupts and preemption around perf_sched_cb_inc/dec()
  to protect the sched_cb_list. I don't think we touch this area in NMI
  context. Disabling interrupts should be good enough to protect the
  cpuctx. We don't need perf_ctx_lock(). (See the sketch below.)
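
A minimal sketch of that V5 change (the wrapper name is illustrative;
the point is only that the callback accounting runs with interrupts,
and therefore preemption, disabled):

static void x86_pmu_set_sched_cb(struct pmu *pmu, bool inc)
{
	unsigned long flags;

	/* Keep the sched_cb_list/cpuctx updates away from preemption
	 * and interrupts on this CPU. */
	local_irq_save(flags);
	if (inc)
		perf_sched_cb_inc(pmu);
	else
		perf_sched_cb_dec(pmu);
	local_irq_restore(flags);
}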

Changes since V3:
- Fix warnings reported by kernel test robot 
- Move the bitmap_empty() check after clearing the assigned counters.
  It is very likely that cpuc->dirty is non-empty.
  Moving the check after the clearing can skip the for_each_set_bit()
  and bitmap_zero().

The V2 can be found here.
https://lore.kernel.org/lkml/20200821195754.20159-3-kan.li...@linux.intel.com/

Changes since V2:
- Unconditionally set cpuc->dirty. The worst case for an RDPMC task is
  that we may have to clear all counters the first time in
  x86_pmu_event_mapped(). After that, sched_task() will clear/update
  the 'dirty'. Only the real 'dirty' counters are cleared. For a
  non-RDPMC task, it's harmless to unconditionally set cpuc->dirty.
- Remove the !is_sampling_event() check
- Move the code into the x86 generic file, because RDPMC is not an
  Intel-only feature.

[tip: perf/core] perf/x86: Hybrid PMU support for counters

2021-04-20 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/core branch of tip:

Commit-ID: d4b294bf84db7a84e295ddf19cb8e7f71b7bd045
Gitweb:
https://git.kernel.org/tip/d4b294bf84db7a84e295ddf19cb8e7f71b7bd045
Author:Kan Liang 
AuthorDate:Mon, 12 Apr 2021 07:30:46 -07:00
Committer: Peter Zijlstra 
CommitterDate: Mon, 19 Apr 2021 20:03:25 +02:00

perf/x86: Hybrid PMU support for counters

The number of GP and fixed counters are different among hybrid PMUs.
Each hybrid PMU should use its own counter related information.

When handling a certain hybrid PMU, apply the number of counters from
the corresponding hybrid PMU.

When reserving the counters in the initialization of a new event,
reserve all possible counters.

The number of counters recorded in the global x86_pmu is for the
architectural counters which are available on all hybrid PMUs. KVM
doesn't support the hybrid PMU yet. Return the number of the
architectural counters for now.

For the functions only available for the old platforms, e.g.,
intel_pmu_drain_pebs_nhm(), nothing is changed.

Signed-off-by: Kan Liang 
Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Andi Kleen 
Link: 
https://lkml.kernel.org/r/1618237865-33448-7-git-send-email-kan.li...@linux.intel.com
---
 arch/x86/events/core.c   | 55 ---
 arch/x86/events/intel/core.c |  8 +++--
 arch/x86/events/intel/ds.c   | 14 +
 arch/x86/events/perf_event.h |  4 +++-
 4 files changed, 56 insertions(+), 25 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 7d3c19e..1aeb31c 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -185,16 +185,29 @@ static DEFINE_MUTEX(pmc_reserve_mutex);
 
 #ifdef CONFIG_X86_LOCAL_APIC
 
+static inline int get_possible_num_counters(void)
+{
+   int i, num_counters = x86_pmu.num_counters;
+
+   if (!is_hybrid())
+   return num_counters;
+
+   for (i = 0; i < x86_pmu.num_hybrid_pmus; i++)
+   num_counters = max_t(int, num_counters, 
x86_pmu.hybrid_pmu[i].num_counters);
+
+   return num_counters;
+}
+
 static bool reserve_pmc_hardware(void)
 {
-   int i;
+   int i, num_counters = get_possible_num_counters();
 
-   for (i = 0; i < x86_pmu.num_counters; i++) {
+   for (i = 0; i < num_counters; i++) {
if (!reserve_perfctr_nmi(x86_pmu_event_addr(i)))
goto perfctr_fail;
}
 
-   for (i = 0; i < x86_pmu.num_counters; i++) {
+   for (i = 0; i < num_counters; i++) {
if (!reserve_evntsel_nmi(x86_pmu_config_addr(i)))
goto eventsel_fail;
}
@@ -205,7 +218,7 @@ eventsel_fail:
for (i--; i >= 0; i--)
release_evntsel_nmi(x86_pmu_config_addr(i));
 
-   i = x86_pmu.num_counters;
+   i = num_counters;
 
 perfctr_fail:
for (i--; i >= 0; i--)
@@ -216,9 +229,9 @@ perfctr_fail:
 
 static void release_pmc_hardware(void)
 {
-   int i;
+   int i, num_counters = get_possible_num_counters();
 
-   for (i = 0; i < x86_pmu.num_counters; i++) {
+   for (i = 0; i < num_counters; i++) {
release_perfctr_nmi(x86_pmu_event_addr(i));
release_evntsel_nmi(x86_pmu_config_addr(i));
}
@@ -946,6 +959,7 @@ EXPORT_SYMBOL_GPL(perf_assign_events);
 
 int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
 {
+   int num_counters = hybrid(cpuc->pmu, num_counters);
struct event_constraint *c;
struct perf_event *e;
int n0, i, wmin, wmax, unsched = 0;
@@ -1021,7 +1035,7 @@ int x86_schedule_events(struct cpu_hw_events *cpuc, int 
n, int *assign)
 
/* slow path */
if (i != n) {
-   int gpmax = x86_pmu.num_counters;
+   int gpmax = num_counters;
 
/*
 * Do not allow scheduling of more than half the available
@@ -1042,7 +1056,7 @@ int x86_schedule_events(struct cpu_hw_events *cpuc, int 
n, int *assign)
 * the extra Merge events needed by large increment events.
 */
if (x86_pmu.flags & PMU_FL_PAIR) {
-   gpmax = x86_pmu.num_counters - cpuc->n_pair;
+   gpmax = num_counters - cpuc->n_pair;
WARN_ON(gpmax <= 0);
}
 
@@ -1129,10 +1143,12 @@ static int collect_event(struct cpu_hw_events *cpuc, 
struct perf_event *event,
  */
 static int collect_events(struct cpu_hw_events *cpuc, struct perf_event 
*leader, bool dogrp)
 {
+   int num_counters = hybrid(cpuc->pmu, num_counters);
+   int num_counters_fixed = hybrid(cpuc->pmu, num_counters_fixed);
struct perf_event *event;
int n, max_count;
 
-   max_count = x86_pmu.num_counters + x86_pmu.num_counters_fixed;
+   max_count = num_counters + num_counters_fixed;
 
/* current number of events already ac

[tip: perf/core] perf/x86: Track pmu in per-CPU cpu_hw_events

2021-04-20 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/core branch of tip:

Commit-ID: 61e76d53c39bb768ad264d379837cfc56b9e35b4
Gitweb:
https://git.kernel.org/tip/61e76d53c39bb768ad264d379837cfc56b9e35b4
Author:Kan Liang 
AuthorDate:Mon, 12 Apr 2021 07:30:43 -07:00
Committer: Peter Zijlstra 
CommitterDate: Mon, 19 Apr 2021 20:03:24 +02:00

perf/x86: Track pmu in per-CPU cpu_hw_events

Some platforms, e.g. Alder Lake, have hybrid architecture. In the same
package, there may be more than one type of CPU. The PMU capabilities
are different among different types of CPU. Perf will register a
dedicated PMU for each type of CPU.

Add a 'pmu' variable in the struct cpu_hw_events to track the dedicated
PMU of the current CPU.

The current x86_get_pmu() uses the global 'pmu', which will be broken on
a hybrid platform. Modify it to use the 'pmu' of the specific CPU.

Initialize the per-CPU 'pmu' variable with the global 'pmu'. There is
nothing changed for the non-hybrid platforms.

The is_x86_event() will be updated in the later patch ("perf/x86:
Register hybrid PMUs") for hybrid platforms. For the non-hybrid
platforms, nothing is changed here.

Suggested-by: Peter Zijlstra (Intel) 
Signed-off-by: Kan Liang 
Signed-off-by: Peter Zijlstra (Intel) 
Link: 
https://lkml.kernel.org/r/1618237865-33448-4-git-send-email-kan.li...@linux.intel.com
---
 arch/x86/events/core.c   | 17 +
 arch/x86/events/intel/core.c |  2 +-
 arch/x86/events/intel/ds.c   |  4 ++--
 arch/x86/events/intel/lbr.c  |  9 +
 arch/x86/events/perf_event.h |  4 +++-
 5 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index dd9f3c2..a49a8bd 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -45,9 +45,11 @@
 #include "perf_event.h"
 
 struct x86_pmu x86_pmu __read_mostly;
+static struct pmu pmu;
 
 DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events) = {
.enabled = 1,
+   .pmu = &pmu,
 };
 
 DEFINE_STATIC_KEY_FALSE(rdpmc_never_available_key);
@@ -724,16 +726,23 @@ void x86_pmu_enable_all(int added)
}
 }
 
-static struct pmu pmu;
-
 static inline int is_x86_event(struct perf_event *event)
 {
return event->pmu == &pmu;
 }
 
-struct pmu *x86_get_pmu(void)
+struct pmu *x86_get_pmu(unsigned int cpu)
 {
-   return &pmu;
+   struct cpu_hw_events *cpuc = &per_cpu(cpu_hw_events, cpu);
+
+   /*
+* All CPUs of the hybrid type have been offline.
+* The x86_get_pmu() should not be invoked.
+*/
+   if (WARN_ON_ONCE(!cpuc->pmu))
+   return &pmu;
+
+   return cpuc->pmu;
 }
 /*
  * Event scheduler state:
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 7bbb5bb..f116c63 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4876,7 +4876,7 @@ static void update_tfa_sched(void *ignored)
 * and if so force schedule out for all event types all contexts
 */
if (test_bit(3, cpuc->active_mask))
-   perf_pmu_resched(x86_get_pmu());
+   perf_pmu_resched(x86_get_pmu(smp_processor_id()));
 }
 
 static ssize_t show_sysctl_tfa(struct device *cdev,
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 7ebae18..1bfea8c 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -2192,7 +2192,7 @@ void __init intel_ds_init(void)
PERF_SAMPLE_TIME;
x86_pmu.flags |= PMU_FL_PEBS_ALL;
pebs_qual = "-baseline";
-   x86_get_pmu()->capabilities |= 
PERF_PMU_CAP_EXTENDED_REGS;
+   x86_get_pmu(smp_processor_id())->capabilities 
|= PERF_PMU_CAP_EXTENDED_REGS;
} else {
/* Only basic record supported */
x86_pmu.large_pebs_flags &=
@@ -2207,7 +2207,7 @@ void __init intel_ds_init(void)
 
if (x86_pmu.intel_cap.pebs_output_pt_available) {
pr_cont("PEBS-via-PT, ");
-   x86_get_pmu()->capabilities |= 
PERF_PMU_CAP_AUX_OUTPUT;
+   x86_get_pmu(smp_processor_id())->capabilities 
|= PERF_PMU_CAP_AUX_OUTPUT;
}
 
break;
diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index 21890da..bb4486c 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -705,7 +705,7 @@ void intel_pmu_lbr_add(struct perf_event *event)
 
 void release_lbr_buffers(void)
 {
-   struct kmem_cache *kmem_cache = x86_get_pmu()->task_ctx_cache;
+   struct kmem_cache *kmem_cache;
struct cpu_hw_events *cpuc;
int cpu;
 
@@ -714,6 +714,7 @@ void release_lbr_buffers(void)
 
for_each

[tip: perf/core] perf/x86/intel: Hybrid PMU support for perf capabilities

2021-04-20 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/core branch of tip:

Commit-ID: d0946a882e6220229a29f9031641e54379be5a1e
Gitweb:
https://git.kernel.org/tip/d0946a882e6220229a29f9031641e54379be5a1e
Author:Kan Liang 
AuthorDate:Mon, 12 Apr 2021 07:30:44 -07:00
Committer: Peter Zijlstra 
CommitterDate: Mon, 19 Apr 2021 20:03:24 +02:00

perf/x86/intel: Hybrid PMU support for perf capabilities

Some platforms, e.g. Alder Lake, have hybrid architecture. Although most
PMU capabilities are the same, there are still some unique PMU
capabilities for different hybrid PMUs. Perf should register a dedicated
pmu for each hybrid PMU.

Add a new struct x86_hybrid_pmu, which saves the dedicated pmu and
capabilities for each hybrid PMU.

The architecture MSR, MSR_IA32_PERF_CAPABILITIES, only indicates the
architecture features which are available on all hybrid PMUs. The
architecture features are stored in the global x86_pmu.intel_cap.

For Alder Lake, the model-specific features are perf metrics and
PEBS-via-PT. The corresponding bits of the global x86_pmu.intel_cap
should be 0 for these two features. Perf should not use the global
intel_cap to check the features on a hybrid system.
Add a dedicated intel_cap in the x86_hybrid_pmu to store the
model-specific capabilities. Use the dedicated intel_cap to replace
the global intel_cap for these two features. The dedicated intel_cap
will be set in the following "Add Alder Lake Hybrid support" patch.

Add is_hybrid() to distinguish a hybrid system. ADL may have an
alternative configuration. With that configuration, the
X86_FEATURE_HYBRID_CPU is not set. Perf cannot rely on the feature bit.
Add a new static_key_false, perf_is_hybrid, to indicate a hybrid system.
It will be assigned in the following "Add Alder Lake Hybrid support"
patch as well.
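
A sketch of what such a helper can look like, built on the static key
described above (the enable call is only shown to illustrate where the
key would be flipped in the later patch):

extern struct static_key_false perf_is_hybrid;

#define is_hybrid()	static_branch_unlikely(&perf_is_hybrid)

/* Flipped from the Alder Lake init path in the later patch, e.g.: */
/*	static_branch_enable(&perf_is_hybrid); */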

Suggested-by: Peter Zijlstra (Intel) 
Signed-off-by: Kan Liang 
Signed-off-by: Peter Zijlstra (Intel) 
Link: 
https://lkml.kernel.org/r/1618237865-33448-5-git-send-email-kan.li...@linux.intel.com
---
 arch/x86/events/core.c   |  7 +--
 arch/x86/events/intel/core.c | 22 +
 arch/x86/events/intel/ds.c   |  2 +-
 arch/x86/events/perf_event.h | 33 +++-
 arch/x86/include/asm/msr-index.h |  3 +++-
 5 files changed, 60 insertions(+), 7 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index a49a8bd..7fc2001 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -54,6 +54,7 @@ DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events) = {
 
 DEFINE_STATIC_KEY_FALSE(rdpmc_never_available_key);
 DEFINE_STATIC_KEY_FALSE(rdpmc_always_available_key);
+DEFINE_STATIC_KEY_FALSE(perf_is_hybrid);
 
 /*
  * This here uses DEFINE_STATIC_CALL_NULL() to get a static_call defined
@@ -1105,8 +1106,9 @@ static void del_nr_metric_event(struct cpu_hw_events 
*cpuc,
 static int collect_event(struct cpu_hw_events *cpuc, struct perf_event *event,
 int max_count, int n)
 {
+   union perf_capabilities intel_cap = hybrid(cpuc->pmu, intel_cap);
 
-   if (x86_pmu.intel_cap.perf_metrics && add_nr_metric_event(cpuc, event))
+   if (intel_cap.perf_metrics && add_nr_metric_event(cpuc, event))
return -EINVAL;
 
if (n >= max_count + cpuc->n_metric)
@@ -1581,6 +1583,7 @@ void x86_pmu_stop(struct perf_event *event, int flags)
 static void x86_pmu_del(struct perf_event *event, int flags)
 {
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+   union perf_capabilities intel_cap = hybrid(cpuc->pmu, intel_cap);
int i;
 
/*
@@ -1620,7 +1623,7 @@ static void x86_pmu_del(struct perf_event *event, int 
flags)
}
cpuc->event_constraint[i-1] = NULL;
--cpuc->n_events;
-   if (x86_pmu.intel_cap.perf_metrics)
+   if (intel_cap.perf_metrics)
del_nr_metric_event(cpuc, event);
 
perf_event_update_userpage(event);
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index f116c63..dc9e2fb 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3646,6 +3646,12 @@ static inline bool is_mem_loads_aux_event(struct 
perf_event *event)
return (event->attr.config & INTEL_ARCH_EVENT_MASK) == 
X86_CONFIG(.event=0x03, .umask=0x82);
 }
 
+static inline bool intel_pmu_has_cap(struct perf_event *event, int idx)
+{
+   union perf_capabilities *intel_cap = &hybrid(event->pmu, intel_cap);
+
+   return test_bit(idx, (unsigned long *)&intel_cap->capabilities);
+}
 
 static int intel_pmu_hw_config(struct perf_event *event)
 {
@@ -3712,7 +3718,7 @@ static int intel_pmu_hw_config(struct perf_event *event)
 * with a slots event as group leader. When the slots event
 * is used in a metrics group, it too cannot support sampling.
 */
-   if (x86_pmu.intel_cap.perf_metrics && is_topdow

[tip: perf/core] perf/x86: Hybrid PMU support for intel_ctrl

2021-04-20 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/core branch of tip:

Commit-ID: fc4b8fca2d8fc8aecd58508e81d55afe4ed76344
Gitweb:
https://git.kernel.org/tip/fc4b8fca2d8fc8aecd58508e81d55afe4ed76344
Author:Kan Liang 
AuthorDate:Mon, 12 Apr 2021 07:30:45 -07:00
Committer: Peter Zijlstra 
CommitterDate: Mon, 19 Apr 2021 20:03:24 +02:00

perf/x86: Hybrid PMU support for intel_ctrl

The intel_ctrl is the counter mask of a PMU. The PMU counter information
may be different among hybrid PMUs, so each hybrid PMU should use its own
intel_ctrl to check and access the counters.

When handling a certain hybrid PMU, apply the intel_ctrl from the
corresponding hybrid PMU.

When checking the HW existence, apply the PMU and number of counters
from the corresponding hybrid PMU as well. Perf will check the HW
existence for each Hybrid PMU before registration. Expose the
check_hw_exists() for a later patch.

Signed-off-by: Kan Liang 
Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Andi Kleen 
Link: 
https://lkml.kernel.org/r/1618237865-33448-6-git-send-email-kan.li...@linux.intel.com
---
 arch/x86/events/core.c   | 14 +++---
 arch/x86/events/intel/core.c | 14 +-
 arch/x86/events/perf_event.h | 10 --
 3 files changed, 24 insertions(+), 14 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 7fc2001..7d3c19e 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -231,7 +231,7 @@ static void release_pmc_hardware(void) {}
 
 #endif
 
-static bool check_hw_exists(void)
+bool check_hw_exists(struct pmu *pmu, int num_counters, int num_counters_fixed)
 {
u64 val, val_fail = -1, val_new= ~0;
int i, reg, reg_fail = -1, ret = 0;
@@ -242,7 +242,7 @@ static bool check_hw_exists(void)
 * Check to see if the BIOS enabled any of the counters, if so
 * complain and bail.
 */
-   for (i = 0; i < x86_pmu.num_counters; i++) {
+   for (i = 0; i < num_counters; i++) {
reg = x86_pmu_config_addr(i);
ret = rdmsrl_safe(reg, &val);
if (ret)
@@ -256,13 +256,13 @@ static bool check_hw_exists(void)
}
}
 
-   if (x86_pmu.num_counters_fixed) {
+   if (num_counters_fixed) {
reg = MSR_ARCH_PERFMON_FIXED_CTR_CTRL;
ret = rdmsrl_safe(reg, &val);
if (ret)
goto msr_fail;
-   for (i = 0; i < x86_pmu.num_counters_fixed; i++) {
-   if (fixed_counter_disabled(i))
+   for (i = 0; i < num_counters_fixed; i++) {
+   if (fixed_counter_disabled(i, pmu))
continue;
if (val & (0x03 << i*4)) {
bios_fail = 1;
@@ -1547,7 +1547,7 @@ void perf_event_print_debug(void)
cpu, idx, prev_left);
}
for (idx = 0; idx < x86_pmu.num_counters_fixed; idx++) {
-   if (fixed_counter_disabled(idx))
+   if (fixed_counter_disabled(idx, cpuc->pmu))
continue;
rdmsrl(MSR_ARCH_PERFMON_FIXED_CTR0 + idx, pmc_count);
 
@@ -1992,7 +1992,7 @@ static int __init init_hw_perf_events(void)
pmu_check_apic();
 
/* sanity check that the hardware exists or is emulated */
-   if (!check_hw_exists())
+   if (!check_hw_exists(&pmu, x86_pmu.num_counters, x86_pmu.num_counters_fixed))
return 0;
 
pr_cont("%s PMU driver.\n", x86_pmu.name);
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index dc9e2fb..2d56055 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2153,10 +2153,11 @@ static void intel_pmu_disable_all(void)
 static void __intel_pmu_enable_all(int added, bool pmi)
 {
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+   u64 intel_ctrl = hybrid(cpuc->pmu, intel_ctrl);
 
intel_pmu_lbr_enable_all(pmi);
wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL,
-   x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_guest_mask);
+  intel_ctrl & ~cpuc->intel_ctrl_guest_mask);
 
if (test_bit(INTEL_PMC_IDX_FIXED_BTS, cpuc->active_mask)) {
struct perf_event *event =
@@ -2709,6 +2710,7 @@ int intel_pmu_save_and_restart(struct perf_event *event)
 static void intel_pmu_reset(void)
 {
struct debug_store *ds = __this_cpu_read(cpu_hw_events.ds);
+   struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
unsigned long flags;
int idx;
 
@@ -2724,7 +2726,7 @@ static void intel_pmu_reset(void)
wrmsrl_safe(x86_pmu_event_addr(idx),  0ull);
}
for (idx = 0; idx < x86_pmu.num_counters_fixed; idx++) {
-   if (fixed_counter_disabled(idx))
+   if (fixed_counter_disabled(idx, cpuc->pmu

[tip: perf/core] perf/x86: Hybrid PMU support for hardware cache event

2021-04-20 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/core branch of tip:

Commit-ID: 0d18f2dfead8dd63bf1186c9ef38528d6a615a55
Gitweb:
https://git.kernel.org/tip/0d18f2dfead8dd63bf1186c9ef38528d6a615a55
Author:Kan Liang 
AuthorDate:Mon, 12 Apr 2021 07:30:48 -07:00
Committer: Peter Zijlstra 
CommitterDate: Mon, 19 Apr 2021 20:03:25 +02:00

perf/x86: Hybrid PMU support for hardware cache event

The hardware cache events are different among hybrid PMUs. Each hybrid
PMU should have its own hw cache event table.

Suggested-by: Peter Zijlstra (Intel) 
Signed-off-by: Kan Liang 
Signed-off-by: Peter Zijlstra (Intel) 
Link: 
https://lkml.kernel.org/r/1618237865-33448-9-git-send-email-kan.li...@linux.intel.com
---
 arch/x86/events/core.c   |  5 ++---
 arch/x86/events/perf_event.h |  9 +
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 1aeb31c..e8cb892 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -376,8 +376,7 @@ set_ext_hw_attr(struct hw_perf_event *hwc, struct 
perf_event *event)
return -EINVAL;
cache_result = array_index_nospec(cache_result, 
PERF_COUNT_HW_CACHE_RESULT_MAX);
 
-   val = hw_cache_event_ids[cache_type][cache_op][cache_result];
-
+   val = hybrid_var(event->pmu, 
hw_cache_event_ids)[cache_type][cache_op][cache_result];
if (val == 0)
return -ENOENT;
 
@@ -385,7 +384,7 @@ set_ext_hw_attr(struct hw_perf_event *hwc, struct 
perf_event *event)
return -EINVAL;
 
hwc->config |= val;
-   attr->config1 = hw_cache_extra_regs[cache_type][cache_op][cache_result];
+   attr->config1 = hybrid_var(event->pmu, 
hw_cache_extra_regs)[cache_type][cache_op][cache_result];
return x86_pmu_extra_regs(val, event);
 }
 
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 2688e45..b65cf46 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -639,6 +639,15 @@ struct x86_hybrid_pmu {
int num_counters;
int num_counters_fixed;
struct event_constraint unconstrained;
+
+   u64 hw_cache_event_ids
+   [PERF_COUNT_HW_CACHE_MAX]
+   [PERF_COUNT_HW_CACHE_OP_MAX]
+   [PERF_COUNT_HW_CACHE_RESULT_MAX];
+   u64 hw_cache_extra_regs
+   [PERF_COUNT_HW_CACHE_MAX]
+   [PERF_COUNT_HW_CACHE_OP_MAX]
+   [PERF_COUNT_HW_CACHE_RESULT_MAX];
 };
 
 static __always_inline struct x86_hybrid_pmu *hybrid_pmu(struct pmu *pmu)


[tip: perf/core] perf/x86: Hybrid PMU support for unconstrained

2021-04-20 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/core branch of tip:

Commit-ID: eaacf07d1116f6bf3b93b265515fccf2301097f2
Gitweb:
https://git.kernel.org/tip/eaacf07d1116f6bf3b93b265515fccf2301097f2
Author:Kan Liang 
AuthorDate:Mon, 12 Apr 2021 07:30:47 -07:00
Committer: Peter Zijlstra 
CommitterDate: Mon, 19 Apr 2021 20:03:25 +02:00

perf/x86: Hybrid PMU support for unconstrained

The unconstrained value depends on the number of GP and fixed counters.
Each hybrid PMU should use its own unconstrained.

Suggested-by: Peter Zijlstra (Intel) 
Signed-off-by: Kan Liang 
Signed-off-by: Peter Zijlstra (Intel) 
Link: 
https://lkml.kernel.org/r/1618237865-33448-8-git-send-email-kan.li...@linux.intel.com
---
 arch/x86/events/intel/core.c |  2 +-
 arch/x86/events/perf_event.h | 11 +++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 3ea0126..4cfc382 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3147,7 +3147,7 @@ x86_get_event_constraints(struct cpu_hw_events *cpuc, int 
idx,
}
}
 
-   return &unconstrained;
+   return &hybrid_var(cpuc->pmu, unconstrained);
 }
 
 static struct event_constraint *
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 0539ad4..2688e45 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -638,6 +638,7 @@ struct x86_hybrid_pmu {
int max_pebs_events;
int num_counters;
int num_counters_fixed;
+   struct event_constraint unconstrained;
 };
 
 static __always_inline struct x86_hybrid_pmu *hybrid_pmu(struct pmu *pmu)
@@ -658,6 +659,16 @@ extern struct static_key_false perf_is_hybrid;
__Fp;   \
 }))
 
+#define hybrid_var(_pmu, _var) \
+(*({   \
+   typeof(&_var) __Fp = &_var; \
+   \
+   if (is_hybrid() && (_pmu))  \
+   __Fp = &hybrid_pmu(_pmu)->_var; \
+   \
+   __Fp;   \
+}))
+
 /*
  * struct x86_pmu - generic x86 pmu
  */


[tip: perf/core] perf/x86: Hybrid PMU support for event constraints

2021-04-20 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/core branch of tip:

Commit-ID: 24ee38ffe61a68fc35065fcab1908883a34c866b
Gitweb:
https://git.kernel.org/tip/24ee38ffe61a68fc35065fcab1908883a34c866b
Author:Kan Liang 
AuthorDate:Mon, 12 Apr 2021 07:30:49 -07:00
Committer: Peter Zijlstra 
CommitterDate: Mon, 19 Apr 2021 20:03:25 +02:00

perf/x86: Hybrid PMU support for event constraints

The events are different among hybrid PMUs. Each hybrid PMU should use
its own event constraints.

Signed-off-by: Kan Liang 
Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Andi Kleen 
Link: 
https://lkml.kernel.org/r/1618237865-33448-10-git-send-email-kan.li...@linux.intel.com
---
 arch/x86/events/core.c   | 3 ++-
 arch/x86/events/intel/core.c | 5 +++--
 arch/x86/events/intel/ds.c   | 5 +++--
 arch/x86/events/perf_event.h | 2 ++
 4 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index e8cb892..f92d234 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1518,6 +1518,7 @@ void perf_event_print_debug(void)
struct cpu_hw_events *cpuc = &per_cpu(cpu_hw_events, cpu);
int num_counters = hybrid(cpuc->pmu, num_counters);
int num_counters_fixed = hybrid(cpuc->pmu, num_counters_fixed);
+   struct event_constraint *pebs_constraints = hybrid(cpuc->pmu, 
pebs_constraints);
unsigned long flags;
int idx;
 
@@ -1537,7 +1538,7 @@ void perf_event_print_debug(void)
pr_info("CPU#%d: status: %016llx\n", cpu, status);
pr_info("CPU#%d: overflow:   %016llx\n", cpu, overflow);
pr_info("CPU#%d: fixed:  %016llx\n", cpu, fixed);
-   if (x86_pmu.pebs_constraints) {
+   if (pebs_constraints) {
rdmsrl(MSR_IA32_PEBS_ENABLE, pebs);
pr_info("CPU#%d: pebs:   %016llx\n", cpu, pebs);
}
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 4cfc382..447a80f 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3136,10 +3136,11 @@ struct event_constraint *
 x86_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
  struct perf_event *event)
 {
+   struct event_constraint *event_constraints = hybrid(cpuc->pmu, 
event_constraints);
struct event_constraint *c;
 
-   if (x86_pmu.event_constraints) {
-   for_each_event_constraint(c, x86_pmu.event_constraints) {
+   if (event_constraints) {
+   for_each_event_constraint(c, event_constraints) {
if (constraint_match(c, event->hw.config)) {
event->hw.flags |= c->flags;
return c;
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 312bf3b..f1402bc 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -959,13 +959,14 @@ struct event_constraint 
intel_spr_pebs_event_constraints[] = {
 
 struct event_constraint *intel_pebs_constraints(struct perf_event *event)
 {
+   struct event_constraint *pebs_constraints = hybrid(event->pmu, 
pebs_constraints);
struct event_constraint *c;
 
if (!event->attr.precise_ip)
return NULL;
 
-   if (x86_pmu.pebs_constraints) {
-   for_each_event_constraint(c, x86_pmu.pebs_constraints) {
+   if (pebs_constraints) {
+   for_each_event_constraint(c, pebs_constraints) {
if (constraint_match(c, event->hw.config)) {
event->hw.flags |= c->flags;
return c;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index b65cf46..34b7fc9 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -648,6 +648,8 @@ struct x86_hybrid_pmu {
[PERF_COUNT_HW_CACHE_MAX]
[PERF_COUNT_HW_CACHE_OP_MAX]
[PERF_COUNT_HW_CACHE_RESULT_MAX];
+   struct event_constraint *event_constraints;
+   struct event_constraint *pebs_constraints;
 };
 
 static __always_inline struct x86_hybrid_pmu *hybrid_pmu(struct pmu *pmu)


[tip: perf/core] perf/x86: Hybrid PMU support for extra_regs

2021-04-20 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/core branch of tip:

Commit-ID: 183af7366b4e813ee4e0b995ff731e3ac28251f0
Gitweb:
https://git.kernel.org/tip/183af7366b4e813ee4e0b995ff731e3ac28251f0
Author:Kan Liang 
AuthorDate:Mon, 12 Apr 2021 07:30:50 -07:00
Committer: Peter Zijlstra 
CommitterDate: Mon, 19 Apr 2021 20:03:26 +02:00

perf/x86: Hybrid PMU support for extra_regs

Different hybrid PMUs may have different extra registers, e.g. a Core PMU
may have offcore registers, a frontend register and a ldlat register,
while an Atom core may only have offcore registers and a ldlat register.
Each hybrid PMU should use its own extra_regs.

An Intel Hybrid system should always have extra registers.
Unconditionally allocate shared_regs for Intel Hybrid system.

Signed-off-by: Kan Liang 
Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Andi Kleen 
Link: 
https://lkml.kernel.org/r/1618237865-33448-11-git-send-email-kan.li...@linux.intel.com
---
 arch/x86/events/core.c   |  5 +++--
 arch/x86/events/intel/core.c | 15 +--
 arch/x86/events/perf_event.h |  1 +
 3 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index f92d234..57d3fe1 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -154,15 +154,16 @@ again:
  */
 static int x86_pmu_extra_regs(u64 config, struct perf_event *event)
 {
+   struct extra_reg *extra_regs = hybrid(event->pmu, extra_regs);
struct hw_perf_event_extra *reg;
struct extra_reg *er;
 
reg = &event->hw.extra_reg;
 
-   if (!x86_pmu.extra_regs)
+   if (!extra_regs)
return 0;
 
-   for (er = x86_pmu.extra_regs; er->msr; er++) {
+   for (er = extra_regs; er->msr; er++) {
if (er->event != (config & er->config_mask))
continue;
if (event->attr.config1 & ~er->valid_mask)
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 447a80f..f727aa5 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2966,8 +2966,10 @@ intel_vlbr_constraints(struct perf_event *event)
return NULL;
 }
 
-static int intel_alt_er(int idx, u64 config)
+static int intel_alt_er(struct cpu_hw_events *cpuc,
+   int idx, u64 config)
 {
+   struct extra_reg *extra_regs = hybrid(cpuc->pmu, extra_regs);
int alt_idx = idx;
 
if (!(x86_pmu.flags & PMU_FL_HAS_RSP_1))
@@ -2979,7 +2981,7 @@ static int intel_alt_er(int idx, u64 config)
if (idx == EXTRA_REG_RSP_1)
alt_idx = EXTRA_REG_RSP_0;
 
-   if (config & ~x86_pmu.extra_regs[alt_idx].valid_mask)
+   if (config & ~extra_regs[alt_idx].valid_mask)
return idx;
 
return alt_idx;
@@ -2987,15 +2989,16 @@ static int intel_alt_er(int idx, u64 config)
 
 static void intel_fixup_er(struct perf_event *event, int idx)
 {
+   struct extra_reg *extra_regs = hybrid(event->pmu, extra_regs);
event->hw.extra_reg.idx = idx;
 
if (idx == EXTRA_REG_RSP_0) {
event->hw.config &= ~INTEL_ARCH_EVENT_MASK;
-   event->hw.config |= x86_pmu.extra_regs[EXTRA_REG_RSP_0].event;
+   event->hw.config |= extra_regs[EXTRA_REG_RSP_0].event;
event->hw.extra_reg.reg = MSR_OFFCORE_RSP_0;
} else if (idx == EXTRA_REG_RSP_1) {
event->hw.config &= ~INTEL_ARCH_EVENT_MASK;
-   event->hw.config |= x86_pmu.extra_regs[EXTRA_REG_RSP_1].event;
+   event->hw.config |= extra_regs[EXTRA_REG_RSP_1].event;
event->hw.extra_reg.reg = MSR_OFFCORE_RSP_1;
}
 }
@@ -3071,7 +3074,7 @@ again:
 */
c = NULL;
} else {
-   idx = intel_alt_er(idx, reg->config);
+   idx = intel_alt_er(cpuc, idx, reg->config);
if (idx != reg->idx) {
raw_spin_unlock_irqrestore(&era->lock, flags);
goto again;
@@ -4155,7 +4158,7 @@ int intel_cpuc_prepare(struct cpu_hw_events *cpuc, int 
cpu)
 {
cpuc->pebs_record_size = x86_pmu.pebs_record_size;
 
-   if (x86_pmu.extra_regs || x86_pmu.lbr_sel_map) {
+   if (is_hybrid() || x86_pmu.extra_regs || x86_pmu.lbr_sel_map) {
cpuc->shared_regs = allocate_shared_regs(cpu);
if (!cpuc->shared_regs)
goto err;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 34b7fc9..d8c448b 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -650,6 +650,7 @@ struct x86_hybrid_pmu {
[PERF_COUNT_HW_CACHE_RESULT_MAX];
struct event_constraint *event_constraints;
struct event_constraint *pebs_constraints;
+   struct extra_reg*extra_regs;
 };
 
 static __always_inline struct x86_hybrid_pmu *hybrid_pmu(struct pmu *pmu)


[tip: perf/core] perf/x86/intel: Factor out intel_pmu_check_num_counters

2021-04-20 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/core branch of tip:

Commit-ID: b8c4d1a87610ba20da1abddb7aacbde0b2817c1a
Gitweb:
https://git.kernel.org/tip/b8c4d1a87610ba20da1abddb7aacbde0b2817c1a
Author:Kan Liang 
AuthorDate:Mon, 12 Apr 2021 07:30:51 -07:00
Committer: Peter Zijlstra 
CommitterDate: Mon, 19 Apr 2021 20:03:26 +02:00

perf/x86/intel: Factor out intel_pmu_check_num_counters

Each Hybrid PMU has to check its own number of counters and mask fixed
counters before registration.

The intel_pmu_check_num_counters will be reused later to check the
number of the counters for each hybrid PMU.

Signed-off-by: Kan Liang 
Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Andi Kleen 
Link: 
https://lkml.kernel.org/r/1618237865-33448-12-git-send-email-kan.li...@linux.intel.com
---
 arch/x86/events/intel/core.c | 38 ++-
 1 file changed, 24 insertions(+), 14 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index f727aa5..d7e2021 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -5064,6 +5064,26 @@ static const struct attribute_group *attr_update[] = {
 
 static struct attribute *empty_attrs;
 
+static void intel_pmu_check_num_counters(int *num_counters,
+int *num_counters_fixed,
+u64 *intel_ctrl, u64 fixed_mask)
+{
+   if (*num_counters > INTEL_PMC_MAX_GENERIC) {
+   WARN(1, KERN_ERR "hw perf events %d > max(%d), clipping!",
+*num_counters, INTEL_PMC_MAX_GENERIC);
+   *num_counters = INTEL_PMC_MAX_GENERIC;
+   }
+   *intel_ctrl = (1ULL << *num_counters) - 1;
+
+   if (*num_counters_fixed > INTEL_PMC_MAX_FIXED) {
+   WARN(1, KERN_ERR "hw perf events fixed %d > max(%d), clipping!",
+*num_counters_fixed, INTEL_PMC_MAX_FIXED);
+   *num_counters_fixed = INTEL_PMC_MAX_FIXED;
+   }
+
+   *intel_ctrl |= fixed_mask << INTEL_PMC_IDX_FIXED;
+}
+
 __init int intel_pmu_init(void)
 {
struct attribute **extra_skl_attr = _attrs;
@@ -5703,20 +5723,10 @@ __init int intel_pmu_init(void)
 
x86_pmu.attr_update = attr_update;
 
-   if (x86_pmu.num_counters > INTEL_PMC_MAX_GENERIC) {
-   WARN(1, KERN_ERR "hw perf events %d > max(%d), clipping!",
-x86_pmu.num_counters, INTEL_PMC_MAX_GENERIC);
-   x86_pmu.num_counters = INTEL_PMC_MAX_GENERIC;
-   }
-   x86_pmu.intel_ctrl = (1ULL << x86_pmu.num_counters) - 1;
-
-   if (x86_pmu.num_counters_fixed > INTEL_PMC_MAX_FIXED) {
-   WARN(1, KERN_ERR "hw perf events fixed %d > max(%d), clipping!",
-x86_pmu.num_counters_fixed, INTEL_PMC_MAX_FIXED);
-   x86_pmu.num_counters_fixed = INTEL_PMC_MAX_FIXED;
-   }
-
-   x86_pmu.intel_ctrl |= (u64)fixed_mask << INTEL_PMC_IDX_FIXED;
+   intel_pmu_check_num_counters(&x86_pmu.num_counters,
+                                &x86_pmu.num_counters_fixed,
+                                &x86_pmu.intel_ctrl,
+                                (u64)fixed_mask);
 
/* AnyThread may be deprecated on arch perfmon v5 or later */
if (x86_pmu.intel_cap.anythread_deprecated)


[tip: perf/core] perf/x86/intel: Factor out intel_pmu_check_event_constraints

2021-04-20 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/core branch of tip:

Commit-ID: bc14fe1beeec1d80ee39f03019c10e130c8d376b
Gitweb:
https://git.kernel.org/tip/bc14fe1beeec1d80ee39f03019c10e130c8d376b
Author:Kan Liang 
AuthorDate:Mon, 12 Apr 2021 07:30:52 -07:00
Committer: Peter Zijlstra 
CommitterDate: Mon, 19 Apr 2021 20:03:26 +02:00

perf/x86/intel: Factor out intel_pmu_check_event_constraints

Each Hybrid PMU has to check and update its own event constraints before
registration.

The intel_pmu_check_event_constraints will be reused later to check
the event constraints of each hybrid PMU.

Signed-off-by: Kan Liang 
Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Andi Kleen 
Link: 
https://lkml.kernel.org/r/1618237865-33448-13-git-send-email-kan.li...@linux.intel.com
---
 arch/x86/events/intel/core.c | 82 ---
 1 file changed, 47 insertions(+), 35 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index d7e2021..5c5f330 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -5084,6 +5084,49 @@ static void intel_pmu_check_num_counters(int 
*num_counters,
*intel_ctrl |= fixed_mask << INTEL_PMC_IDX_FIXED;
 }
 
+static void intel_pmu_check_event_constraints(struct event_constraint 
*event_constraints,
+ int num_counters,
+ int num_counters_fixed,
+ u64 intel_ctrl)
+{
+   struct event_constraint *c;
+
+   if (!event_constraints)
+   return;
+
+   /*
+* event on fixed counter2 (REF_CYCLES) only works on this
+* counter, so do not extend mask to generic counters
+*/
+   for_each_event_constraint(c, event_constraints) {
+   /*
+* Don't extend the topdown slots and metrics
+* events to the generic counters.
+*/
+   if (c->idxmsk64 & INTEL_PMC_MSK_TOPDOWN) {
+   /*
+* Disable topdown slots and metrics events,
+* if slots event is not in CPUID.
+*/
+   if (!(INTEL_PMC_MSK_FIXED_SLOTS & intel_ctrl))
+   c->idxmsk64 = 0;
+   c->weight = hweight64(c->idxmsk64);
+   continue;
+   }
+
+   if (c->cmask == FIXED_EVENT_FLAGS) {
+   /* Disabled fixed counters which are not in CPUID */
+   c->idxmsk64 &= intel_ctrl;
+
+   if (c->idxmsk64 != INTEL_PMC_MSK_FIXED_REF_CYCLES)
+   c->idxmsk64 |= (1ULL << num_counters) - 1;
+   }
+   c->idxmsk64 &=
+   ~(~0ULL << (INTEL_PMC_IDX_FIXED + num_counters_fixed));
+   c->weight = hweight64(c->idxmsk64);
+   }
+}
+
 __init int intel_pmu_init(void)
 {
struct attribute **extra_skl_attr = _attrs;
@@ -5094,7 +5137,6 @@ __init int intel_pmu_init(void)
union cpuid10_edx edx;
union cpuid10_eax eax;
union cpuid10_ebx ebx;
-   struct event_constraint *c;
unsigned int fixed_mask;
struct extra_reg *er;
bool pmem = false;
@@ -5732,40 +5774,10 @@ __init int intel_pmu_init(void)
if (x86_pmu.intel_cap.anythread_deprecated)
x86_pmu.format_attrs = intel_arch_formats_attr;
 
-   if (x86_pmu.event_constraints) {
-   /*
-* event on fixed counter2 (REF_CYCLES) only works on this
-* counter, so do not extend mask to generic counters
-*/
-   for_each_event_constraint(c, x86_pmu.event_constraints) {
-   /*
-* Don't extend the topdown slots and metrics
-* events to the generic counters.
-*/
-   if (c->idxmsk64 & INTEL_PMC_MSK_TOPDOWN) {
-   /*
-* Disable topdown slots and metrics events,
-* if slots event is not in CPUID.
-*/
-   if (!(INTEL_PMC_MSK_FIXED_SLOTS & 
x86_pmu.intel_ctrl))
-   c->idxmsk64 = 0;
-   c->weight = hweight64(c->idxmsk64);
-   continue;
-   }
-
-   if (c->cmask == FIXED_EVENT_FLAGS) {
-   /* Disabled fixed counters which are not in 
CPUID */
-   c->idxmsk64 &= x86_pmu.intel_ctrl;
-
-   if (c->idxmsk64 != 
INTEL_PMC_MSK_FIXED_RE

[tip: perf/core] perf/x86/intel: Factor out intel_pmu_check_extra_regs

2021-04-20 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/core branch of tip:

Commit-ID: 34d5b61f29eea656be4283213273c33d5987e4d2
Gitweb:
https://git.kernel.org/tip/34d5b61f29eea656be4283213273c33d5987e4d2
Author:Kan Liang 
AuthorDate:Mon, 12 Apr 2021 07:30:53 -07:00
Committer: Peter Zijlstra 
CommitterDate: Mon, 19 Apr 2021 20:03:26 +02:00

perf/x86/intel: Factor out intel_pmu_check_extra_regs

Each Hybrid PMU has to check and update its own extra registers before
registration.

The intel_pmu_check_extra_regs will be reused later to check the extra
registers of each hybrid PMU.

Signed-off-by: Kan Liang 
Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Andi Kleen 
Link: 
https://lkml.kernel.org/r/1618237865-33448-14-git-send-email-kan.li...@linux.intel.com
---
 arch/x86/events/intel/core.c | 35 +--
 1 file changed, 21 insertions(+), 14 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 5c5f330..55ccfbb 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -5127,6 +5127,26 @@ static void intel_pmu_check_event_constraints(struct 
event_constraint *event_con
}
 }
 
+static void intel_pmu_check_extra_regs(struct extra_reg *extra_regs)
+{
+   struct extra_reg *er;
+
+   /*
+* Access extra MSR may cause #GP under certain circumstances.
+* E.g. KVM doesn't support offcore event
+* Check all extra_regs here.
+*/
+   if (!extra_regs)
+   return;
+
+   for (er = extra_regs; er->msr; er++) {
+   er->extra_msr_access = check_msr(er->msr, 0x11UL);
+   /* Disable LBR select mapping */
+   if ((er->idx == EXTRA_REG_LBR) && !er->extra_msr_access)
+   x86_pmu.lbr_sel_map = NULL;
+   }
+}
+
 __init int intel_pmu_init(void)
 {
struct attribute **extra_skl_attr = _attrs;
@@ -5138,7 +5158,6 @@ __init int intel_pmu_init(void)
union cpuid10_eax eax;
union cpuid10_ebx ebx;
unsigned int fixed_mask;
-   struct extra_reg *er;
bool pmem = false;
int version, i;
char *name;
@@ -5795,19 +5814,7 @@ __init int intel_pmu_init(void)
if (x86_pmu.lbr_nr)
pr_cont("%d-deep LBR, ", x86_pmu.lbr_nr);
 
-   /*
-* Access extra MSR may cause #GP under certain circumstances.
-* E.g. KVM doesn't support offcore event
-* Check all extra_regs here.
-*/
-   if (x86_pmu.extra_regs) {
-   for (er = x86_pmu.extra_regs; er->msr; er++) {
-   er->extra_msr_access = check_msr(er->msr, 0x11UL);
-   /* Disable LBR select mapping */
-   if ((er->idx == EXTRA_REG_LBR) && !er->extra_msr_access)
-   x86_pmu.lbr_sel_map = NULL;
-   }
-   }
+   intel_pmu_check_extra_regs(x86_pmu.extra_regs);
 
/* Support full width counters using alternative MSR range */
if (x86_pmu.intel_cap.full_width_write) {


[tip: perf/core] perf/x86: Factor out x86_pmu_show_pmu_cap

2021-04-20 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/core branch of tip:

Commit-ID: e11c1a7eb302ac8f6f47c18fa662546405a5fd83
Gitweb:
https://git.kernel.org/tip/e11c1a7eb302ac8f6f47c18fa662546405a5fd83
Author:Kan Liang 
AuthorDate:Mon, 12 Apr 2021 07:30:55 -07:00
Committer: Peter Zijlstra 
CommitterDate: Mon, 19 Apr 2021 20:03:27 +02:00

perf/x86: Factor out x86_pmu_show_pmu_cap

The PMU capabilities are different among hybrid PMUs. Perf should dump
the PMU capabilities information for each hybrid PMU.

Factor out x86_pmu_show_pmu_cap() which shows the PMU capabilities
information. The function will be reused later when registering a
dedicated hybrid PMU.

Signed-off-by: Kan Liang 
Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Andi Kleen 
Link: 
https://lkml.kernel.org/r/1618237865-33448-16-git-send-email-kan.li...@linux.intel.com
---
 arch/x86/events/core.c   | 25 -
 arch/x86/events/perf_event.h |  3 +++
 2 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index ed8dcfb..2e7ae52 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1976,6 +1976,20 @@ static void _x86_pmu_read(struct perf_event *event)
x86_perf_event_update(event);
 }
 
+void x86_pmu_show_pmu_cap(int num_counters, int num_counters_fixed,
+ u64 intel_ctrl)
+{
+   pr_info("... version:%d\n", x86_pmu.version);
+   pr_info("... bit width:  %d\n", x86_pmu.cntval_bits);
+   pr_info("... generic registers:  %d\n", num_counters);
+   pr_info("... value mask: %016Lx\n", x86_pmu.cntval_mask);
+   pr_info("... max period: %016Lx\n", x86_pmu.max_period);
+   pr_info("... fixed-purpose events:   %lu\n",
+   hweight64((((1ULL << num_counters_fixed) - 1)
+   << INTEL_PMC_IDX_FIXED) & intel_ctrl));
+   pr_info("... event mask: %016Lx\n", intel_ctrl);
+}
+
 static int __init init_hw_perf_events(void)
 {
struct x86_pmu_quirk *quirk;
@@ -2036,15 +2050,8 @@ static int __init init_hw_perf_events(void)
 
pmu.attr_update = x86_pmu.attr_update;
 
-   pr_info("... version:%d\n", x86_pmu.version);
-   pr_info("... bit width:  %d\n", x86_pmu.cntval_bits);
-   pr_info("... generic registers:  %d\n", x86_pmu.num_counters);
-   pr_info("... value mask: %016Lx\n", x86_pmu.cntval_mask);
-   pr_info("... max period: %016Lx\n", x86_pmu.max_period);
-   pr_info("... fixed-purpose events:   %lu\n",
-   hweight64((((1ULL << x86_pmu.num_counters_fixed) - 1)
-   << INTEL_PMC_IDX_FIXED) & x86_pmu.intel_ctrl));
-   pr_info("... event mask: %016Lx\n", x86_pmu.intel_ctrl);
+   x86_pmu_show_pmu_cap(x86_pmu.num_counters, x86_pmu.num_counters_fixed,
+x86_pmu.intel_ctrl);
 
if (!x86_pmu.read)
x86_pmu.read = _x86_pmu_read;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index d8c448b..a3534e3 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -1092,6 +1092,9 @@ void x86_pmu_enable_event(struct perf_event *event);
 
 int x86_pmu_handle_irq(struct pt_regs *regs);
 
+void x86_pmu_show_pmu_cap(int num_counters, int num_counters_fixed,
+ u64 intel_ctrl);
+
 extern struct event_constraint emptyconstraint;
 
 extern struct event_constraint unconstrained;


[tip: perf/core] perf/x86: Remove temporary pmu assignment in event_init

2021-04-20 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/core branch of tip:

Commit-ID: b98567298bad891774054113690b30bd90d5738d
Gitweb:
https://git.kernel.org/tip/b98567298bad891774054113690b30bd90d5738d
Author:Kan Liang 
AuthorDate:Mon, 12 Apr 2021 07:30:54 -07:00
Committer: Peter Zijlstra 
CommitterDate: Mon, 19 Apr 2021 20:03:27 +02:00

perf/x86: Remove temporary pmu assignment in event_init

The temporary pmu assignment in event_init is unnecessary.

The assignment was introduced by commit 8113070d6639 ("perf_events:
Add fast-path to the rescheduling code"). At that time, event->pmu is
not assigned yet when initializing an event. The assignment is required.
However, from commit 7e5b2a01d2ca ("perf: provide PMU when initing
events"), the event->pmu is provided before event_init is invoked.
The temporary pmu assignment in event_init should be removed.

Signed-off-by: Kan Liang 
Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Andi Kleen 
Link: 
https://lkml.kernel.org/r/1618237865-33448-15-git-send-email-kan.li...@linux.intel.com
---
 arch/x86/events/core.c | 11 ---
 1 file changed, 11 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 57d3fe1..ed8dcfb 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2291,7 +2291,6 @@ out:
 
 static int x86_pmu_event_init(struct perf_event *event)
 {
-   struct pmu *tmp;
int err;
 
switch (event->attr.type) {
@@ -2306,20 +2305,10 @@ static int x86_pmu_event_init(struct perf_event *event)
 
err = __x86_pmu_event_init(event);
if (!err) {
-   /*
-* we temporarily connect event to its pmu
-* such that validate_group() can classify
-* it as an x86 event using is_x86_event()
-*/
-   tmp = event->pmu;
-   event->pmu = &pmu;
-
if (event->group_leader != event)
err = validate_group(event);
else
err = validate_event(event);
-
-   event->pmu = tmp;
}
if (err) {
if (event->destroy)


[tip: perf/core] perf/x86: Register hybrid PMUs

2021-04-20 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/core branch of tip:

Commit-ID: d9977c43bff895ed49a9d25e1f382b0a98bb271f
Gitweb:
https://git.kernel.org/tip/d9977c43bff895ed49a9d25e1f382b0a98bb271f
Author:Kan Liang 
AuthorDate:Mon, 12 Apr 2021 07:30:56 -07:00
Committer: Peter Zijlstra 
CommitterDate: Mon, 19 Apr 2021 20:03:27 +02:00

perf/x86: Register hybrid PMUs

Different hybrid PMUs have different PMU capabilities and events. Perf
should register a dedicated PMU for each of them.

To check an x86 event, perf has to go through all possible hybrid PMUs.

All the hybrid PMUs are registered at boot time. Before the
registration, add intel_pmu_check_hybrid_pmus() to check and update the
counter information, the event constraints, the extra registers and the
unique capabilities for each hybrid PMU.

Postpone the display of the PMU information and HW check to
CPU_STARTING, because the boot CPU is the only online CPU in the
init_hw_perf_events(). Perf doesn't know the availability of the other
PMUs. Perf should display the PMU information only if the counters of
the PMU are available.

One type of CPUs may be all offline. For this case, users can still
observe the PMU in /sys/devices, but its CPU mask is 0.

All hybrid PMUs have capability PERF_PMU_CAP_HETEROGENEOUS_CPUS.
The PMU name for hybrid PMUs will be "cpu_XXX", which will be assigned
later in a separate patch.

The PMU type id for the core PMU is still PERF_TYPE_RAW. For the other
hybrid PMUs, the PMU type id is not hard coded.

The event->cpu must be compatible with the supported CPUs of the PMU.
Add a check in x86_pmu_event_init().

The events in a group must be from the same type of hybrid PMU.
The fake cpuc used in the validation must be from the supported CPU of
the event->pmu.

Perf may not retrieve a valid core type from get_this_hybrid_cpu_type().
For example, ADL may have an alternative configuration. With that
configuration, Perf cannot retrieve the core type from the CPUID leaf
0x1a. Add a platform specific get_hybrid_cpu_type(). If the generic way
fails, invoke the platform specific get_hybrid_cpu_type().
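
A sketch of the event->cpu check mentioned above (the supported_cpus
mask on struct x86_hybrid_pmu is assumed here; it is not shown in the
hunks quoted in this mail):

	/* In x86_pmu_event_init(): reject events bound to a CPU that the
	 * hybrid PMU of the event does not cover. */
	if (is_hybrid() && (event->cpu != -1) &&
	    !cpumask_test_cpu(event->cpu,
			      &hybrid_pmu(event->pmu)->supported_cpus))
		return -ENOENT;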

Suggested-by: Peter Zijlstra (Intel) 
Signed-off-by: Kan Liang 
Signed-off-by: Peter Zijlstra (Intel) 
Link: 
https://lkml.kernel.org/r/1618237865-33448-17-git-send-email-kan.li...@linux.intel.com
---
 arch/x86/events/core.c   | 137 +-
 arch/x86/events/intel/core.c |  93 ++-
 arch/x86/events/perf_event.h |  14 +++-
 3 files changed, 223 insertions(+), 21 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 2e7ae52..bd465a8 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -478,7 +478,7 @@ int x86_setup_perfctr(struct perf_event *event)
local64_set(>period_left, hwc->sample_period);
}
 
-   if (attr->type == PERF_TYPE_RAW)
+   if (attr->type == event->pmu->type)
return x86_pmu_extra_regs(event->attr.config, event);
 
if (attr->type == PERF_TYPE_HW_CACHE)
@@ -613,7 +613,7 @@ int x86_pmu_hw_config(struct perf_event *event)
if (!event->attr.exclude_kernel)
event->hw.config |= ARCH_PERFMON_EVENTSEL_OS;
 
-   if (event->attr.type == PERF_TYPE_RAW)
+   if (event->attr.type == event->pmu->type)
event->hw.config |= event->attr.config & X86_RAW_EVENT_MASK;
 
if (event->attr.sample_period && x86_pmu.limit_period) {
@@ -742,7 +742,17 @@ void x86_pmu_enable_all(int added)
 
 static inline int is_x86_event(struct perf_event *event)
 {
-   return event->pmu == &pmu;
+   int i;
+
+   if (!is_hybrid())
+   return event->pmu == &pmu;
+
+   for (i = 0; i < x86_pmu.num_hybrid_pmus; i++) {
+   if (event->pmu == &x86_pmu.hybrid_pmu[i].pmu)
+   return true;
+   }
+
+   return false;
 }
 
 struct pmu *x86_get_pmu(unsigned int cpu)
@@ -1990,6 +2000,23 @@ void x86_pmu_show_pmu_cap(int num_counters, int 
num_counters_fixed,
pr_info("... event mask: %016Lx\n", intel_ctrl);
 }
 
+/*
+ * The generic code is not hybrid friendly. The hybrid_pmu->pmu
+ * of the first registered PMU is unconditionally assigned to
+ * each possible cpuctx->ctx.pmu.
+ * Update the correct hybrid PMU to the cpuctx->ctx.pmu.
+ */
+void x86_pmu_update_cpu_context(struct pmu *pmu, int cpu)
+{
+   struct perf_cpu_context *cpuctx;
+
+   if (!pmu->pmu_cpu_context)
+   return;
+
+   cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
+   cpuctx->ctx.pmu = pmu;
+}
+
 static int __init init_hw_perf_events(void)
 {
struct x86_pmu_quirk *quirk;
@@ -2050,8 +2077,11 @@ static int __init init_hw_perf_events(void)
 
pmu.attr_update = x86_pmu.attr_update;
 
-   x86_pmu_show_pmu_cap(x86

[tip: perf/core] perf/x86: Add structures for the attributes of Hybrid PMUs

2021-04-20 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/core branch of tip:

Commit-ID: a9c81ccdf52dd73a20178c40bca34cf52991fdea
Gitweb:
https://git.kernel.org/tip/a9c81ccdf52dd73a20178c40bca34cf52991fdea
Author:Kan Liang 
AuthorDate:Mon, 12 Apr 2021 07:30:57 -07:00
Committer: Peter Zijlstra 
CommitterDate: Mon, 19 Apr 2021 20:03:28 +02:00

perf/x86: Add structures for the attributes of Hybrid PMUs

Hybrid PMUs have different events and formats. In theory, Hybrid PMU
specific attributes should be maintained in the dedicated struct
x86_hybrid_pmu, but it wastes space because the events and formats are
similar among Hybrid PMUs.

To reduce duplication, all hybrid PMUs will share a group of attributes
in the following patch. To distinguish an attribute from different
Hybrid PMUs, a PMU aware attribute structure is introduced. A PMU type
is required for the attribute structure. The type is for internal usage
only; it is not visible in the sysfs API.

Hybrid PMUs may support the same event name, but with different event
encoding, e.g., the mem-loads event on an Atom PMU has different event
encoding from a Core PMU. This brings an issue if two attributes are
created for them. The current sysfs_update_group() finds an attribute by
searching the attr name (aka event name). If two attributes have the
same event name, the first attribute will be replaced.
To address the issue, only one attribute is created for the event. The
event_str is extended and stores event encodings from all Hybrid PMUs.
Each event encoding is separated by ";". The order of the event encodings
must follow the order of the hybrid PMU index. The event_str is for
internal usage as well. When a user wants to show the attribute of a
Hybrid PMU, only the corresponding part of the string is displayed.
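
As an illustration, such a shared attribute could look like the
following (the event name, the encodings and the PMU-type mask below
are hypothetical placeholders, chosen only to show the ";"-separated
format):

/* Part before ';' belongs to hybrid PMU index 0, part after to index 1. */
EVENT_ATTR_STR_HYBRID(my-event, my_event_adl,
		      "event=0x11,umask=0x01;event=0x22,umask=0x02",
		      hybrid_big_small /* hypothetical PMU-type mask */);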

Signed-off-by: Kan Liang 
Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Andi Kleen 
Link: 
https://lkml.kernel.org/r/1618237865-33448-18-git-send-email-kan.li...@linux.intel.com
---
 arch/x86/events/core.c   | 43 +++-
 arch/x86/events/perf_event.h | 19 +++-
 include/linux/perf_event.h   | 12 ++-
 3 files changed, 74 insertions(+)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index bd465a8..37ab109 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1860,6 +1860,49 @@ ssize_t events_ht_sysfs_show(struct device *dev, struct 
device_attribute *attr,
pmu_attr->event_str_noht);
 }
 
+ssize_t events_hybrid_sysfs_show(struct device *dev,
+struct device_attribute *attr,
+char *page)
+{
+   struct perf_pmu_events_hybrid_attr *pmu_attr =
+   container_of(attr, struct perf_pmu_events_hybrid_attr, attr);
+   struct x86_hybrid_pmu *pmu;
+   const char *str, *next_str;
+   int i;
+
+   if (hweight64(pmu_attr->pmu_type) == 1)
+   return sprintf(page, "%s", pmu_attr->event_str);
+
+   /*
+* Hybrid PMUs may support the same event name, but with different
+* event encoding, e.g., the mem-loads event on an Atom PMU has
+* different event encoding from a Core PMU.
+*
+* The event_str includes all event encodings. Each event encoding
+* is divided by ";". The order of the event encodings must follow
+* the order of the hybrid PMU index.
+*/
+   pmu = container_of(dev_get_drvdata(dev), struct x86_hybrid_pmu, pmu);
+
+   str = pmu_attr->event_str;
+   for (i = 0; i < x86_pmu.num_hybrid_pmus; i++) {
+   if (!(x86_pmu.hybrid_pmu[i].cpu_type & pmu_attr->pmu_type))
+   continue;
+   if (x86_pmu.hybrid_pmu[i].cpu_type & pmu->cpu_type) {
+   next_str = strchr(str, ';');
+   if (next_str)
+   return snprintf(page, next_str - str + 1, "%s", 
str);
+   else
+   return sprintf(page, "%s", str);
+   }
+   str = strchr(str, ';');
+   str++;
+   }
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(events_hybrid_sysfs_show);
+
 EVENT_ATTR(cpu-cycles, CPU_CYCLES  );
 EVENT_ATTR(instructions,   INSTRUCTIONS);
 EVENT_ATTR(cache-references,   CACHE_REFERENCES);
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 4282ce4..e2be927 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -979,6 +979,22 @@ static struct perf_pmu_events_ht_attr event_attr_##v = {   
\
.event_str_ht   = ht,   \
 }
 
+#define EVENT_ATTR_STR_HYBRID(_name, v, str, _pmu) \
+static struct perf_pmu_events_hybrid_attr event_attr_##v = {   \
+   .att
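
To make the ";"-separated event_str concrete, a hypothetical declaration using the EVENT_ATTR_STR_HYBRID() macro (whose definition is cut short above) might look like the sketch below; the event name, the encodings, and the hybrid_big_small mask are illustrative placeholders, not values taken from the patch:

/*
 * Illustrative only: one attribute carries the encodings for every hybrid
 * PMU, in hybrid PMU index order.
 */
EVENT_ATTR_STR_HYBRID(mem-loads, mem_ld_example,
                      "event=0xd0,umask=0x5,ldlat=3;event=0xcd,umask=0x1,ldlat=3",
                      hybrid_big_small);

events_hybrid_sysfs_show() then emits only the substring whose position matches the hybrid PMU whose sysfs file is being read.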

[tip: perf/core] perf/x86/intel: Add Alder Lake Hybrid support

2021-04-20 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/core branch of tip:

Commit-ID: f83d2f91d2590318e083d05bd7b1beda2489050e
Gitweb:
https://git.kernel.org/tip/f83d2f91d2590318e083d05bd7b1beda2489050e
Author:Kan Liang 
AuthorDate:Mon, 12 Apr 2021 07:31:00 -07:00
Committer: Peter Zijlstra 
CommitterDate: Mon, 19 Apr 2021 20:03:28 +02:00

perf/x86/intel: Add Alder Lake Hybrid support

An Alder Lake Hybrid system has two different types of cores, the Golden
Cove core and the Gracemont core. The Golden Cove core is registered to
the "cpu_core" PMU. The Gracemont core is registered to the "cpu_atom" PMU.

The differences between the two PMUs include:
- Number of GP and fixed counters
- Events
- The "cpu_core" PMU supports Topdown metrics.
  The "cpu_atom" PMU supports PEBS-via-PT.

The "cpu_core" PMU is similar to the Sapphire Rapids PMU, but without
PMEM.
The "cpu_atom" PMU is similar to Tremont, but with different events,
event_constraints, extra_regs and number of counters.

The mem-loads AUX event workaround only applies to the Golden Cove core.

Users may disable all CPUs of the same CPU type on the command line or
in the BIOS. In this case, perf still registers a PMU for that CPU type,
but the CPU mask is 0.

Current caps/pmu_name is usually the microarch codename. Assign the
"alderlake_hybrid" to the caps/pmu_name of both PMUs to indicate the
hybrid Alder Lake microarchitecture.

Signed-off-by: Kan Liang 
Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Andi Kleen 
Link: 
https://lkml.kernel.org/r/1618237865-33448-21-git-send-email-kan.li...@linux.intel.com
---
 arch/x86/events/intel/core.c | 255 +-
 arch/x86/events/intel/ds.c   |   7 +-
 arch/x86/events/perf_event.h |   7 +-
 3 files changed, 268 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index ba24638..5272f34 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2076,6 +2076,14 @@ static struct extra_reg intel_tnt_extra_regs[] 
__read_mostly = {
EVENT_EXTRA_END
 };
 
+static struct extra_reg intel_grt_extra_regs[] __read_mostly = {
+   /* must define OFFCORE_RSP_X first, see intel_fixup_er() */
+   INTEL_UEVENT_EXTRA_REG(0x01b7, MSR_OFFCORE_RSP_0, 0x3full, 
RSP_0),
+   INTEL_UEVENT_EXTRA_REG(0x02b7, MSR_OFFCORE_RSP_1, 0x3full, 
RSP_1),
+   INTEL_UEVENT_PEBS_LDLAT_EXTRA_REG(0x5d0),
+   EVENT_EXTRA_END
+};
+
 #define KNL_OT_L2_HITE BIT_ULL(19) /* Other Tile L2 Hit */
 #define KNL_OT_L2_HITF BIT_ULL(20) /* Other Tile L2 Hit */
 #define KNL_MCDRAM_LOCAL   BIT_ULL(21)
@@ -2430,6 +2438,16 @@ static int icl_set_topdown_event_period(struct 
perf_event *event)
return 0;
 }
 
+static int adl_set_topdown_event_period(struct perf_event *event)
+{
+   struct x86_hybrid_pmu *pmu = hybrid_pmu(event->pmu);
+
+   if (pmu->cpu_type != hybrid_big)
+   return 0;
+
+   return icl_set_topdown_event_period(event);
+}
+
 static inline u64 icl_get_metrics_event_value(u64 metric, u64 slots, int idx)
 {
u32 val;
@@ -2570,6 +2588,17 @@ static u64 icl_update_topdown_event(struct perf_event 
*event)
 x86_pmu.num_topdown_events - 
1);
 }
 
+static u64 adl_update_topdown_event(struct perf_event *event)
+{
+   struct x86_hybrid_pmu *pmu = hybrid_pmu(event->pmu);
+
+   if (pmu->cpu_type != hybrid_big)
+   return 0;
+
+   return icl_update_topdown_event(event);
+}
+
+
 static void intel_pmu_read_topdown_event(struct perf_event *event)
 {
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
@@ -3655,6 +3684,17 @@ static inline bool is_mem_loads_aux_event(struct 
perf_event *event)
return (event->attr.config & INTEL_ARCH_EVENT_MASK) == 
X86_CONFIG(.event=0x03, .umask=0x82);
 }
 
+static inline bool require_mem_loads_aux_event(struct perf_event *event)
+{
+   if (!(x86_pmu.flags & PMU_FL_MEM_LOADS_AUX))
+   return false;
+
+   if (is_hybrid())
+   return hybrid_pmu(event->pmu)->cpu_type == hybrid_big;
+
+   return true;
+}
+
 static inline bool intel_pmu_has_cap(struct perf_event *event, int idx)
 {
union perf_capabilities *intel_cap = &hybrid(event->pmu, intel_cap);
@@ -3779,7 +3819,7 @@ static int intel_pmu_hw_config(struct perf_event *event)
 * event. The rule is to simplify the implementation of the check.
 * That's because perf cannot have a complete group at the moment.
 */
-   if (x86_pmu.flags & PMU_FL_MEM_LOADS_AUX &&
+   if (require_mem_loads_aux_event(event) &&
(event->attr.sample_type & PERF_SAMPLE_DATA_SRC) &&
is_mem_loads_event(event)) {
struct perf_event *leader = event->group_leader;
@@ -4056,6 +4096,39 @@ tfa_get

[tip: perf/core] perf/x86/intel: Add attr_update for Hybrid PMUs

2021-04-20 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/core branch of tip:

Commit-ID: 58ae30c29a370c09eb49e0007d881a9aed13c5a3
Gitweb:
https://git.kernel.org/tip/58ae30c29a370c09eb49e0007d881a9aed13c5a3
Author:Kan Liang 
AuthorDate:Mon, 12 Apr 2021 07:30:58 -07:00
Committer: Peter Zijlstra 
CommitterDate: Mon, 19 Apr 2021 20:03:28 +02:00

perf/x86/intel: Add attr_update for Hybrid PMUs

The attribute_group for Hybrid PMUs should be different from that of the
previous cpu PMU. For example, cpumask is required for a Hybrid PMU. The
PMU type should be included in the event and format attributes.

Add hybrid_attr_update for the Hybrid PMU.
Check the PMU type in the is_visible() functions. Only display the event or
format for the matching Hybrid PMU.

Signed-off-by: Kan Liang 
Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Andi Kleen 
Link: 
https://lkml.kernel.org/r/1618237865-33448-19-git-send-email-kan.li...@linux.intel.com
---
 arch/x86/events/intel/core.c | 120 --
 1 file changed, 114 insertions(+), 6 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 4881209..ba24638 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -5118,6 +5118,106 @@ static const struct attribute_group *attr_update[] = {
NULL,
 };
 
+static bool is_attr_for_this_pmu(struct kobject *kobj, struct attribute *attr)
+{
+   struct device *dev = kobj_to_dev(kobj);
+   struct x86_hybrid_pmu *pmu =
+   container_of(dev_get_drvdata(dev), struct x86_hybrid_pmu, pmu);
+   struct perf_pmu_events_hybrid_attr *pmu_attr =
+   container_of(attr, struct perf_pmu_events_hybrid_attr, 
attr.attr);
+
+   return pmu->cpu_type & pmu_attr->pmu_type;
+}
+
+static umode_t hybrid_events_is_visible(struct kobject *kobj,
+   struct attribute *attr, int i)
+{
+   return is_attr_for_this_pmu(kobj, attr) ? attr->mode : 0;
+}
+
+static inline int hybrid_find_supported_cpu(struct x86_hybrid_pmu *pmu)
+{
+   int cpu = cpumask_first(&pmu->supported_cpus);
+
+   return (cpu >= nr_cpu_ids) ? -1 : cpu;
+}
+
+static umode_t hybrid_tsx_is_visible(struct kobject *kobj,
+struct attribute *attr, int i)
+{
+   struct device *dev = kobj_to_dev(kobj);
+   struct x86_hybrid_pmu *pmu =
+container_of(dev_get_drvdata(dev), struct x86_hybrid_pmu, pmu);
+   int cpu = hybrid_find_supported_cpu(pmu);
+
+   return (cpu >= 0) && is_attr_for_this_pmu(kobj, attr) && 
cpu_has(&cpu_data(cpu), X86_FEATURE_RTM) ? attr->mode : 0;
+}
+
+static umode_t hybrid_format_is_visible(struct kobject *kobj,
+   struct attribute *attr, int i)
+{
+   struct device *dev = kobj_to_dev(kobj);
+   struct x86_hybrid_pmu *pmu =
+   container_of(dev_get_drvdata(dev), struct x86_hybrid_pmu, pmu);
+   struct perf_pmu_format_hybrid_attr *pmu_attr =
+   container_of(attr, struct perf_pmu_format_hybrid_attr, 
attr.attr);
+   int cpu = hybrid_find_supported_cpu(pmu);
+
+   return (cpu >= 0) && (pmu->cpu_type & pmu_attr->pmu_type) ? attr->mode 
: 0;
+}
+
+static struct attribute_group hybrid_group_events_td  = {
+   .name   = "events",
+   .is_visible = hybrid_events_is_visible,
+};
+
+static struct attribute_group hybrid_group_events_mem = {
+   .name   = "events",
+   .is_visible = hybrid_events_is_visible,
+};
+
+static struct attribute_group hybrid_group_events_tsx = {
+   .name   = "events",
+   .is_visible = hybrid_tsx_is_visible,
+};
+
+static struct attribute_group hybrid_group_format_extra = {
+   .name   = "format",
+   .is_visible = hybrid_format_is_visible,
+};
+
+static ssize_t intel_hybrid_get_attr_cpus(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+   struct x86_hybrid_pmu *pmu =
+   container_of(dev_get_drvdata(dev), struct x86_hybrid_pmu, pmu);
+
+   return cpumap_print_to_pagebuf(true, buf, &pmu->supported_cpus);
+}
+
+static DEVICE_ATTR(cpus, S_IRUGO, intel_hybrid_get_attr_cpus, NULL);
+static struct attribute *intel_hybrid_cpus_attrs[] = {
+   &dev_attr_cpus.attr,
+   NULL,
+};
+
+static struct attribute_group hybrid_group_cpus = {
+   .attrs  = intel_hybrid_cpus_attrs,
+};
+
+static const struct attribute_group *hybrid_attr_update[] = {
+   &hybrid_group_events_td,
+   &hybrid_group_events_mem,
+   &hybrid_group_events_tsx,
+   &group_caps_gen,
+   &group_caps_lbr,
+   &hybrid_group_format_extra,
+   &group_default,
+   &hybrid_group_cpus,
+   NULL,
+};
+
 static struct attribute *empty_attrs;
 
 static void intel_pmu_check_num_counters(int *num_counters,
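
A minimal sketch of how the groups above get populated and exposed; the attribute array and function names are hypothetical, while hybrid_group_events_mem and hybrid_attr_update refer to the objects defined in the diff, and event_attr_mem_ld_example stands in for any attribute declared with EVENT_ATTR_STR_HYBRID():

/* Illustrative only: hook per-PMU event attributes into the hybrid groups. */
static struct attribute *example_hybrid_mem_attrs[] = {
        &event_attr_mem_ld_example.attr.attr,   /* from an EVENT_ATTR_STR_HYBRID() declaration */
        NULL,
};

static void __init example_wire_hybrid_sysfs(void)
{
        hybrid_group_events_mem.attrs = example_hybrid_mem_attrs;
        x86_pmu.attr_update           = hybrid_attr_update;
}

hybrid_events_is_visible() then hides the attribute on any PMU whose cpu_type does not match the attribute's pmu_type.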

[tip: perf/core] perf/x86: Support filter_match callback

2021-04-20 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/core branch of tip:

Commit-ID: 3e9a8b219e4cc897dba20e19185d0471f129f6f3
Gitweb:
https://git.kernel.org/tip/3e9a8b219e4cc897dba20e19185d0471f129f6f3
Author:Kan Liang 
AuthorDate:Mon, 12 Apr 2021 07:30:59 -07:00
Committer: Peter Zijlstra 
CommitterDate: Mon, 19 Apr 2021 20:03:28 +02:00

perf/x86: Support filter_match callback

Implement the filter_match callback for X86, which checks whether an event
is schedulable on the current CPU.

Signed-off-by: Kan Liang 
Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Andi Kleen 
Link: 
https://lkml.kernel.org/r/1618237865-33448-20-git-send-email-kan.li...@linux.intel.com
---
 arch/x86/events/core.c   | 10 ++
 arch/x86/events/perf_event.h |  1 +
 2 files changed, 11 insertions(+)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 37ab109..4f6595e 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2641,6 +2641,14 @@ static int x86_pmu_aux_output_match(struct perf_event 
*event)
return 0;
 }
 
+static int x86_pmu_filter_match(struct perf_event *event)
+{
+   if (x86_pmu.filter_match)
+   return x86_pmu.filter_match(event);
+
+   return 1;
+}
+
 static struct pmu pmu = {
.pmu_enable = x86_pmu_enable,
.pmu_disable= x86_pmu_disable,
@@ -2668,6 +2676,8 @@ static struct pmu pmu = {
.check_period   = x86_pmu_check_period,
 
.aux_output_match   = x86_pmu_aux_output_match,
+
+   .filter_match   = x86_pmu_filter_match,
 };
 
 void arch_perf_update_userpage(struct perf_event *event,
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index e2be927..606fb6e 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -879,6 +879,7 @@ struct x86_pmu {
 
int (*aux_output_match) (struct perf_event *event);
 
+   int (*filter_match)(struct perf_event *event);
/*
 * Hybrid support
 *
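
On a hybrid system, the natural implementation of this hook rejects events on CPUs that the event's PMU does not cover. A sketch of such a filter (illustrative only; the Intel-side hook is wired up elsewhere in the series, and the function name here is hypothetical):

/* Illustrative sketch of a hybrid-aware filter_match implementation. */
static int example_hybrid_filter_match(struct perf_event *event)
{
        struct x86_hybrid_pmu *pmu = hybrid_pmu(event->pmu);
        unsigned int cpu = smp_processor_id();

        /* The event is only schedulable on CPUs covered by its hybrid PMU. */
        return cpumask_test_cpu(cpu, &pmu->supported_cpus);
}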


[tip: perf/core] perf: Extend PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE

2021-04-20 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/core branch of tip:

Commit-ID: 55bcf6ef314ae8ba81bcd74aa760247b635ed47b
Gitweb:
https://git.kernel.org/tip/55bcf6ef314ae8ba81bcd74aa760247b635ed47b
Author:Kan Liang 
AuthorDate:Mon, 12 Apr 2021 07:31:01 -07:00
Committer: Peter Zijlstra 
CommitterDate: Mon, 19 Apr 2021 20:03:29 +02:00

perf: Extend PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE

Current Hardware events and Hardware cache events have special perf
types, PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE. The two types don't
pass the PMU type in the user interface. For a hybrid system, the perf
subsystem doesn't know which PMU the events belong to. The first capable
PMU will always be assigned to the events. The events never get a chance
to run on the other capable PMUs.

Extend the two types to become PMU aware types. The PMU type ID is
stored at attr.config[63:32].

Add a new PMU capability, PERF_PMU_CAP_EXTENDED_HW_TYPE, to indicate a
PMU which supports the extended PERF_TYPE_HARDWARE and
PERF_TYPE_HW_CACHE.

The PMU type is only required when searching for a specific PMU. The
PMU-specific code is only interested in the 'real' config value, which
is stored in the low 32 bits of event->attr.config. Update
event->attr.config in the generic code, so the PMU-specific code doesn't
need to calculate it separately.

If a user specifies a PMU type, but the PMU doesn't support the extended
type, error out.

If an event cannot be initialized in a PMU specified by a user, error
out immediately. Perf should not try to open it on other PMUs.

The new PMU capability is only set for the X86 hybrid PMUs for now.
Other architectures, e.g., ARM, may need it as well. The support on ARM
may be implemented later separately.

Suggested-by: Peter Zijlstra (Intel) 
Signed-off-by: Kan Liang 
Signed-off-by: Peter Zijlstra (Intel) 
Link: 
https://lkml.kernel.org/r/1618237865-33448-22-git-send-email-kan.li...@linux.intel.com
---
 arch/x86/events/core.c  |  1 +
 include/linux/perf_event.h  | 19 ++-
 include/uapi/linux/perf_event.h | 15 +++
 kernel/events/core.c| 19 ---
 4 files changed, 42 insertions(+), 12 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 4f6595e..3fe66b7 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2173,6 +2173,7 @@ static int __init init_hw_perf_events(void)
hybrid_pmu->pmu.type = -1;
hybrid_pmu->pmu.attr_update = x86_pmu.attr_update;
hybrid_pmu->pmu.capabilities |= 
PERF_PMU_CAP_HETEROGENEOUS_CPUS;
+   hybrid_pmu->pmu.capabilities |= 
PERF_PMU_CAP_EXTENDED_HW_TYPE;
 
err = perf_pmu_register(&hybrid_pmu->pmu, 
hybrid_pmu->name,
(hybrid_pmu->cpu_type == 
hybrid_big) ? PERF_TYPE_RAW : -1);
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 61b3851..a763928 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -260,15 +260,16 @@ struct perf_event;
 /**
  * pmu::capabilities flags
  */
-#define PERF_PMU_CAP_NO_INTERRUPT          0x01
-#define PERF_PMU_CAP_NO_NMI                0x02
-#define PERF_PMU_CAP_AUX_NO_SG             0x04
-#define PERF_PMU_CAP_EXTENDED_REGS         0x08
-#define PERF_PMU_CAP_EXCLUSIVE             0x10
-#define PERF_PMU_CAP_ITRACE                0x20
-#define PERF_PMU_CAP_HETEROGENEOUS_CPUS    0x40
-#define PERF_PMU_CAP_NO_EXCLUDE            0x80
-#define PERF_PMU_CAP_AUX_OUTPUT            0x100
+#define PERF_PMU_CAP_NO_INTERRUPT          0x0001
+#define PERF_PMU_CAP_NO_NMI                0x0002
+#define PERF_PMU_CAP_AUX_NO_SG             0x0004
+#define PERF_PMU_CAP_EXTENDED_REGS         0x0008
+#define PERF_PMU_CAP_EXCLUSIVE             0x0010
+#define PERF_PMU_CAP_ITRACE                0x0020
+#define PERF_PMU_CAP_HETEROGENEOUS_CPUS    0x0040
+#define PERF_PMU_CAP_NO_EXCLUDE            0x0080
+#define PERF_PMU_CAP_AUX_OUTPUT            0x0100
+#define PERF_PMU_CAP_EXTENDED_HW_TYPE      0x0200
 
 struct perf_output_handle;
 
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 0b58970..e54e639 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -38,6 +38,21 @@ enum perf_type_id {
 };
 
 /*
+ * attr.config layout for type PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE
+ * PERF_TYPE_HARDWARE: 0xEEEEEEEE000000AA
+ * AA: hardware event ID
+ * EEEEEEEE: PMU type ID
+ * PERF_TYPE_HW_CACHE: 0xEEEEEEEE00DDCCBB
+ * BB: hardware cache ID
+ *  
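
As a concrete illustration of the attr.config layout described above, a small user-space sketch follows; the helper name is hypothetical, and the PMU type ID would normally be read from /sys/devices/<pmu>/type:

#include <string.h>
#include <stdint.h>
#include <linux/perf_event.h>

/* Illustrative only: build an attr that asks a specific hybrid PMU for "cycles". */
static void example_fill_extended_hw_attr(struct perf_event_attr *attr, uint64_t pmu_type)
{
        memset(attr, 0, sizeof(*attr));
        attr->size   = sizeof(*attr);
        attr->type   = PERF_TYPE_HARDWARE;
        /* PMU type ID in attr.config[63:32], hardware event ID in the low bits. */
        attr->config = (pmu_type << 32) | PERF_COUNT_HW_CPU_CYCLES;
}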

[tip: perf/core] perf/x86/intel/uncore: Add Alder Lake support

2021-04-20 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/core branch of tip:

Commit-ID: 772ed05f3c5ce722b9de6c4c2dd87538a33fb8d3
Gitweb:
https://git.kernel.org/tip/772ed05f3c5ce722b9de6c4c2dd87538a33fb8d3
Author:Kan Liang 
AuthorDate:Mon, 12 Apr 2021 07:31:02 -07:00
Committer: Peter Zijlstra 
CommitterDate: Mon, 19 Apr 2021 20:03:29 +02:00

perf/x86/intel/uncore: Add Alder Lake support

The uncore subsystem for Alder Lake is similar to the previous Tiger
Lake.

The differences include:
- New MSR addresses for global control, fixed counters, CBOX and ARB.
  Add a new adl_uncore_msr_ops for uncore operations.
- Add a new threshold field for CBOX.
- New PCIIDs for IMC devices.

Signed-off-by: Kan Liang 
Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Andi Kleen 
Link: 
https://lkml.kernel.org/r/1618237865-33448-23-git-send-email-kan.li...@linux.intel.com
---
 arch/x86/events/intel/uncore.c |   7 +-
 arch/x86/events/intel/uncore.h |   1 +-
 arch/x86/events/intel/uncore_snb.c | 131 -
 3 files changed, 139 insertions(+)

diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index a2b68bb..df7b07d 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -1752,6 +1752,11 @@ static const struct intel_uncore_init_fun 
rkl_uncore_init __initconst = {
.pci_init = skl_uncore_pci_init,
 };
 
+static const struct intel_uncore_init_fun adl_uncore_init __initconst = {
+   .cpu_init = adl_uncore_cpu_init,
+   .mmio_init = tgl_uncore_mmio_init,
+};
+
 static const struct intel_uncore_init_fun icx_uncore_init __initconst = {
.cpu_init = icx_uncore_cpu_init,
.pci_init = icx_uncore_pci_init,
@@ -1806,6 +1811,8 @@ static const struct x86_cpu_id intel_uncore_match[] 
__initconst = {
X86_MATCH_INTEL_FAM6_MODEL(TIGERLAKE_L, _l_uncore_init),
X86_MATCH_INTEL_FAM6_MODEL(TIGERLAKE,   _uncore_init),
X86_MATCH_INTEL_FAM6_MODEL(ROCKETLAKE,  _uncore_init),
+   X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE,   _uncore_init),
+   X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE_L, _uncore_init),
X86_MATCH_INTEL_FAM6_MODEL(ATOM_TREMONT_D,  _uncore_init),
{},
 };
diff --git a/arch/x86/events/intel/uncore.h b/arch/x86/events/intel/uncore.h
index 96569dc..2917910 100644
--- a/arch/x86/events/intel/uncore.h
+++ b/arch/x86/events/intel/uncore.h
@@ -582,6 +582,7 @@ void snb_uncore_cpu_init(void);
 void nhm_uncore_cpu_init(void);
 void skl_uncore_cpu_init(void);
 void icl_uncore_cpu_init(void);
+void adl_uncore_cpu_init(void);
 void tgl_uncore_cpu_init(void);
 void tgl_uncore_mmio_init(void);
 void tgl_l_uncore_mmio_init(void);
diff --git a/arch/x86/events/intel/uncore_snb.c 
b/arch/x86/events/intel/uncore_snb.c
index 5127128..0f63706 100644
--- a/arch/x86/events/intel/uncore_snb.c
+++ b/arch/x86/events/intel/uncore_snb.c
@@ -62,6 +62,8 @@
 #define PCI_DEVICE_ID_INTEL_TGL_H_IMC  0x9a36
 #define PCI_DEVICE_ID_INTEL_RKL_1_IMC  0x4c43
 #define PCI_DEVICE_ID_INTEL_RKL_2_IMC  0x4c53
+#define PCI_DEVICE_ID_INTEL_ADL_1_IMC  0x4660
+#define PCI_DEVICE_ID_INTEL_ADL_2_IMC  0x4641
 
 /* SNB event control */
#define SNB_UNC_CTL_EV_SEL_MASK        0x000000ff
@@ -131,12 +133,33 @@
#define ICL_UNC_ARB_PER_CTR            0x3b1
 #define ICL_UNC_ARB_PERFEVTSEL 0x3b3
 
+/* ADL uncore global control */
+#define ADL_UNC_PERF_GLOBAL_CTL        0x2ff0
+#define ADL_UNC_FIXED_CTR_CTRL  0x2fde
+#define ADL_UNC_FIXED_CTR   0x2fdf
+
+/* ADL Cbo register */
+#define ADL_UNC_CBO_0_PER_CTR0 0x2002
+#define ADL_UNC_CBO_0_PERFEVTSEL0  0x2000
+#define ADL_UNC_CTL_THRESHOLD          0x3f000000
+#define ADL_UNC_RAW_EVENT_MASK (SNB_UNC_CTL_EV_SEL_MASK | \
+SNB_UNC_CTL_UMASK_MASK | \
+SNB_UNC_CTL_EDGE_DET | \
+SNB_UNC_CTL_INVERT | \
+ADL_UNC_CTL_THRESHOLD)
+
+/* ADL ARB register */
+#define ADL_UNC_ARB_PER_CTR0   0x2FD2
+#define ADL_UNC_ARB_PERFEVTSEL00x2FD0
+#define ADL_UNC_ARB_MSR_OFFSET 0x8
+
 DEFINE_UNCORE_FORMAT_ATTR(event, event, "config:0-7");
 DEFINE_UNCORE_FORMAT_ATTR(umask, umask, "config:8-15");
 DEFINE_UNCORE_FORMAT_ATTR(edge, edge, "config:18");
 DEFINE_UNCORE_FORMAT_ATTR(inv, inv, "config:23");
 DEFINE_UNCORE_FORMAT_ATTR(cmask5, cmask, "config:24-28");
 DEFINE_UNCORE_FORMAT_ATTR(cmask8, cmask, "config:24-31");
+DEFINE_UNCORE_FORMAT_ATTR(threshold, threshold, "config:24-29");
 
 /* Sandy Bridge uncore support */
 static void snb_uncore_msr_enable_event(struct intel_

[tip: perf/core] perf/x86/cstate: Add Alder Lake CPU support

2021-04-20 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/core branch of tip:

Commit-ID: d0ca946bcf84e1f9847571923bb1e6bd1264f424
Gitweb:
https://git.kernel.org/tip/d0ca946bcf84e1f9847571923bb1e6bd1264f424
Author:Kan Liang 
AuthorDate:Mon, 12 Apr 2021 07:31:04 -07:00
Committer: Peter Zijlstra 
CommitterDate: Mon, 19 Apr 2021 20:03:29 +02:00

perf/x86/cstate: Add Alder Lake CPU support

Compared with the Rocket Lake, the CORE C1 Residency Counter is added
for Alder Lake, but the CORE C3 Residency Counter is removed. Other
counters are the same.

Create a new adl_cstates for Alder Lake. Update the comments
accordingly.

The External Design Specification (EDS) is not published yet. It comes
from an authoritative internal source.

The patch has been tested on real hardware.

Signed-off-by: Kan Liang 
Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Andi Kleen 
Link: 
https://lkml.kernel.org/r/1618237865-33448-25-git-send-email-kan.li...@linux.intel.com
---
 arch/x86/events/intel/cstate.c | 39 -
 1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/arch/x86/events/intel/cstate.c b/arch/x86/events/intel/cstate.c
index 407eee5..4333990 100644
--- a/arch/x86/events/intel/cstate.c
+++ b/arch/x86/events/intel/cstate.c
@@ -40,7 +40,7 @@
  * Model specific counters:
  * MSR_CORE_C1_RES: CORE C1 Residency Counter
  *  perf code: 0x00
- *  Available model: SLM,AMT,GLM,CNL,TNT
+ *  Available model: SLM,AMT,GLM,CNL,TNT,ADL
  *  Scope: Core (each processor core has a MSR)
  * MSR_CORE_C3_RESIDENCY: CORE C3 Residency Counter
  *perf code: 0x01
@@ -51,46 +51,49 @@
  *perf code: 0x02
  *Available model: SLM,AMT,NHM,WSM,SNB,IVB,HSW,BDW,
  * SKL,KNL,GLM,CNL,KBL,CML,ICL,TGL,
- * TNT,RKL
+ * TNT,RKL,ADL
  *Scope: Core
  * MSR_CORE_C7_RESIDENCY: CORE C7 Residency Counter
  *perf code: 0x03
  *Available model: SNB,IVB,HSW,BDW,SKL,CNL,KBL,CML,
- * ICL,TGL,RKL
+ * ICL,TGL,RKL,ADL
  *Scope: Core
  * MSR_PKG_C2_RESIDENCY:  Package C2 Residency Counter.
  *perf code: 0x00
  *Available model: SNB,IVB,HSW,BDW,SKL,KNL,GLM,CNL,
- * KBL,CML,ICL,TGL,TNT,RKL
+ * KBL,CML,ICL,TGL,TNT,RKL,ADL
  *Scope: Package (physical package)
  * MSR_PKG_C3_RESIDENCY:  Package C3 Residency Counter.
  *perf code: 0x01
  *Available model: NHM,WSM,SNB,IVB,HSW,BDW,SKL,KNL,
- * GLM,CNL,KBL,CML,ICL,TGL,TNT,RKL
+ * GLM,CNL,KBL,CML,ICL,TGL,TNT,RKL,
+ * ADL
  *Scope: Package (physical package)
  * MSR_PKG_C6_RESIDENCY:  Package C6 Residency Counter.
  *perf code: 0x02
  *Available model: SLM,AMT,NHM,WSM,SNB,IVB,HSW,BDW,
  * SKL,KNL,GLM,CNL,KBL,CML,ICL,TGL,
- * TNT,RKL
+ * TNT,RKL,ADL
  *Scope: Package (physical package)
  * MSR_PKG_C7_RESIDENCY:  Package C7 Residency Counter.
  *perf code: 0x03
  *Available model: NHM,WSM,SNB,IVB,HSW,BDW,SKL,CNL,
- * KBL,CML,ICL,TGL,RKL
+ * KBL,CML,ICL,TGL,RKL,ADL
  *Scope: Package (physical package)
  * MSR_PKG_C8_RESIDENCY:  Package C8 Residency Counter.
  *perf code: 0x04
- *Available model: HSW ULT,KBL,CNL,CML,ICL,TGL,RKL
+ *Available model: HSW ULT,KBL,CNL,CML,ICL,TGL,RKL,
+ * ADL
  *Scope: Package (physical package)
  * MSR_PKG_C9_RESIDENCY:  Package C9 Residency Counter.
  *perf code: 0x05
- *Available model: HSW ULT,KBL,CNL,CML,ICL,TGL,RKL
+ *Available model: HSW ULT,KBL,CNL,CML,ICL,TGL,RKL,
+ * ADL
  *Scope: Package (physical package)
  * MSR_PKG_C10_RESIDENCY
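
Although the hunk that adds the new model structure is not shown above, the changelog (CORE C1 added, CORE C3 removed, other counters unchanged relative to Rocket Lake) implies an Alder Lake cstate model roughly like the sketch below; the exact bit list is an inference from that description, not a quote from the patch:

/* Sketch of the Alder Lake cstate model implied by the changelog above. */
static const struct cstate_model adl_cstates __initconst = {
        .core_events    = BIT(PERF_CSTATE_CORE_C1_RES) |
                          BIT(PERF_CSTATE_CORE_C6_RES) |
                          BIT(PERF_CSTATE_CORE_C7_RES),

        .pkg_events     = BIT(PERF_CSTATE_PKG_C2_RES) |
                          BIT(PERF_CSTATE_PKG_C3_RES) |
                          BIT(PERF_CSTATE_PKG_C6_RES) |
                          BIT(PERF_CSTATE_PKG_C7_RES) |
                          BIT(PERF_CSTATE_PKG_C8_RES) |
                          BIT(PERF_CSTATE_PKG_C9_RES) |
                          BIT(PERF_CSTATE_PKG_C10_RES),
};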

[tip: perf/core] perf/x86/msr: Add Alder Lake CPU support

2021-04-20 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/core branch of tip:

Commit-ID: 19d3a81fd92dc9b73950564955164ecfd0dfbea1
Gitweb:
https://git.kernel.org/tip/19d3a81fd92dc9b73950564955164ecfd0dfbea1
Author:Kan Liang 
AuthorDate:Mon, 12 Apr 2021 07:31:03 -07:00
Committer: Peter Zijlstra 
CommitterDate: Mon, 19 Apr 2021 20:03:29 +02:00

perf/x86/msr: Add Alder Lake CPU support

PPERF and SMI_COUNT MSRs are also supported on Alder Lake.

The External Design Specification (EDS) is not published yet. It comes
from an authoritative internal source.

The patch has been tested on real hardware.

Signed-off-by: Kan Liang 
Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Andi Kleen 
Link: 
https://lkml.kernel.org/r/1618237865-33448-24-git-send-email-kan.li...@linux.intel.com
---
 arch/x86/events/msr.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/events/msr.c b/arch/x86/events/msr.c
index 680404c..c853b28 100644
--- a/arch/x86/events/msr.c
+++ b/arch/x86/events/msr.c
@@ -100,6 +100,8 @@ static bool test_intel(int idx, void *data)
case INTEL_FAM6_TIGERLAKE_L:
case INTEL_FAM6_TIGERLAKE:
case INTEL_FAM6_ROCKETLAKE:
+   case INTEL_FAM6_ALDERLAKE:
+   case INTEL_FAM6_ALDERLAKE_L:
if (idx == PERF_MSR_SMI || idx == PERF_MSR_PPERF)
return true;
break;


[tip: perf/core] perf/x86: Move cpuc->running into P4 specific code

2021-04-16 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/core branch of tip:

Commit-ID: 46ade4740bbf9bf4e804ddb2c85845cccd219f3c
Gitweb:
https://git.kernel.org/tip/46ade4740bbf9bf4e804ddb2c85845cccd219f3c
Author:Kan Liang 
AuthorDate:Wed, 14 Apr 2021 07:36:29 -07:00
Committer: Peter Zijlstra 
CommitterDate: Fri, 16 Apr 2021 16:32:42 +02:00

perf/x86: Move cpuc->running into P4 specific code

The 'running' variable is only used in the P4 PMU. Current perf sets the
variable in the critical function x86_pmu_start(), which wastes cycles
for everybody not running on P4.

Move cpuc->running into the P4 specific p4_pmu_enable_event().

Add a static per-CPU 'p4_running' variable to replace the 'running'
variable in the struct cpu_hw_events. Saves space for the generic
structure.

The p4_pmu_enable_all() also invokes the p4_pmu_enable_event(), but it
should not set cpuc->running. Factor out __p4_pmu_enable_event() for
p4_pmu_enable_all().

Suggested-by: Peter Zijlstra (Intel) 
Signed-off-by: Kan Liang 
Signed-off-by: Peter Zijlstra (Intel) 
Link: 
https://lkml.kernel.org/r/1618410990-21383-1-git-send-email-kan.li...@linux.intel.com
---
 arch/x86/events/core.c   |  1 -
 arch/x86/events/intel/p4.c   | 16 +---
 arch/x86/events/perf_event.h |  1 -
 3 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 18df171..dd9f3c2 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1480,7 +1480,6 @@ static void x86_pmu_start(struct perf_event *event, int 
flags)
 
cpuc->events[idx] = event;
__set_bit(idx, cpuc->active_mask);
-   __set_bit(idx, cpuc->running);
static_call(x86_pmu_enable)(event);
perf_event_update_userpage(event);
 }
diff --git a/arch/x86/events/intel/p4.c b/arch/x86/events/intel/p4.c
index a4cc660..9c10cbb 100644
--- a/arch/x86/events/intel/p4.c
+++ b/arch/x86/events/intel/p4.c
@@ -947,7 +947,7 @@ static void p4_pmu_enable_pebs(u64 config)
(void)wrmsrl_safe(MSR_P4_PEBS_MATRIX_VERT,  (u64)bind->metric_vert);
 }
 
-static void p4_pmu_enable_event(struct perf_event *event)
+static void __p4_pmu_enable_event(struct perf_event *event)
 {
struct hw_perf_event *hwc = &event->hw;
int thread = p4_ht_config_thread(hwc->config);
@@ -983,6 +983,16 @@ static void p4_pmu_enable_event(struct perf_event *event)
(cccr & ~P4_CCCR_RESERVED) | P4_CCCR_ENABLE);
 }
 
+static DEFINE_PER_CPU(unsigned long [BITS_TO_LONGS(X86_PMC_IDX_MAX)], 
p4_running);
+
+static void p4_pmu_enable_event(struct perf_event *event)
+{
+   int idx = event->hw.idx;
+
+   __set_bit(idx, per_cpu(p4_running, smp_processor_id()));
+   __p4_pmu_enable_event(event);
+}
+
 static void p4_pmu_enable_all(int added)
 {
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
@@ -992,7 +1002,7 @@ static void p4_pmu_enable_all(int added)
struct perf_event *event = cpuc->events[idx];
if (!test_bit(idx, cpuc->active_mask))
continue;
-   p4_pmu_enable_event(event);
+   __p4_pmu_enable_event(event);
}
 }
 
@@ -1012,7 +1022,7 @@ static int p4_pmu_handle_irq(struct pt_regs *regs)
 
if (!test_bit(idx, cpuc->active_mask)) {
/* catch in-flight IRQs */
-   if (__test_and_clear_bit(idx, cpuc->running))
+   if (__test_and_clear_bit(idx, per_cpu(p4_running, 
smp_processor_id())))
handled++;
continue;
}
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 53b2b5f..54a340e 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -228,7 +228,6 @@ struct cpu_hw_events {
 */
struct perf_event   *events[X86_PMC_IDX_MAX]; /* in counter order */
unsigned long   active_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
-   unsigned long   running[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
int enabled;
 
int n_events; /* the # of events in the below 
arrays */


[tip: perf/core] perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task

2021-04-16 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/core branch of tip:

Commit-ID: 01fd9661e168de7cfc4f947e7220fca0e6791999
Gitweb:
https://git.kernel.org/tip/01fd9661e168de7cfc4f947e7220fca0e6791999
Author:Kan Liang 
AuthorDate:Wed, 14 Apr 2021 07:36:30 -07:00
Committer: Peter Zijlstra 
CommitterDate: Fri, 16 Apr 2021 16:32:43 +02:00

perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task

The counter value of a perf task may leak to another RDPMC task.
For example, a perf stat task as below is running on CPU 0.

perf stat -e 'branches,cycles' -- taskset -c 0 ./workload

In the meantime, an RDPMC task, which is also running on CPU 0, may read
the GP counters periodically. (The RDPMC task creates a fixed event,
but read four GP counters.)

$ taskset -c 0 ./rdpmc_read_all_counters
index 0x0 value 0x8001e5970f99
index 0x1 value 0x8005d750edb6
index 0x2 value 0x0
index 0x3 value 0x0

index 0x0 value 0x8002358e48a5
index 0x1 value 0x8006bd1e3bc9
index 0x2 value 0x0
index 0x3 value 0x0

It is a potential security issue: once the attacker knows what the other
thread is counting, the PerfMon counter can be used as a side-channel to
attack cryptosystems.

The counter value of the perf stat task leaks to the RDPMC task because
perf never clears the counter when it's stopped.

Two methods were considered to address the issue.
- Unconditionally reset the counter in x86_pmu_del(). It can bring extra
  overhead even when there is no RDPMC task running.
- Only reset the un-assigned dirty counters when the RDPMC task is
  scheduled in. The method is implemented here.

The dirty counter is a counter, on which the assigned event has been
deleted, but the counter is not reset. To track the dirty counters,
add a 'dirty' variable in the struct cpu_hw_events.

The current code doesn't reset the counter when the assigned event is
deleted. Set the corresponding bit in the 'dirty' variable in
x86_pmu_del(), if the RDPMC feature is available on the system.

The security issue can only be found with an RDPMC task. The event for
an RDPMC task requires the mmap buffer. This can be used to detect an
RDPMC task. Once the event is detected in the event_mapped(), enable
sched_task(), which is invoked in each context switch. Add a check in
the sched_task() to clear the dirty counters, when the RDPMC task is
scheduled in. Only the current un-assigned dirty counters are reset,
because the RDPMC assigned dirty counters will be updated soon.

The RDPMC instruction is also supported on the older platforms. Add
sched_task() for the core_pmu. The core_pmu doesn't support large PEBS
and LBR callstack, the intel_pmu_pebs/lbr_sched_task() will be ignored.

RDPMC is not an Intel-only feature. Add the dirty counter clearing code
in the X86 generic code.

After applying the patch,

$ taskset -c 0 ./rdpmc_read_all_counters
index 0x0 value 0x0
index 0x1 value 0x0
index 0x2 value 0x0
index 0x3 value 0x0

index 0x0 value 0x0
index 0x1 value 0x0
index 0x2 value 0x0
index 0x3 value 0x0

Performance

The performance of a context switch is only impacted when there are two
or more perf users and one of the users is an RDPMC user. In other
cases, there is no performance impact.

The worst-case occurs when there are two users: the RDPMC user only
applies one counter; while the other user applies all available
counters. When the RDPMC task is scheduled in, all the counters, other
than the RDPMC assigned one, have to be reset.

Here is the test result for the worst-case.

The test is implemented on an Ice Lake platform, which has 8 GP
counters and 3 fixed counters (not including the SLOTS counter).

The lat_ctx is used to measure the context switching time.

lat_ctx -s 128K -N 1000 processes 2

I instrument the lat_ctx to open all 8 GP counters and 3 fixed
counters for one task. The other task opens a fixed counter and enables
RDPMC.

Without the patch:
The context switch time is 4.97 us

With the patch:
The context switch time is 5.16 us

There is ~4% performance drop for the context switching time in the
worst-case.

Suggested-by: Peter Zijlstra (Intel) 
Signed-off-by: Kan Liang 
Signed-off-by: Peter Zijlstra (Intel) 
Link: 
https://lkml.kernel.org/r/1618410990-21383-2-git-send-email-kan.li...@linux.intel.com
---
 arch/x86/events/core.c   | 47 +++-
 arch/x86/events/perf_event.h |  1 +-
 2 files changed, 48 insertions(+)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index dd9f3c2..e34eb72 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1585,6 +1585,8 @@ static void x86_pmu_del(struct perf_event *event, int 
flags)
if (cpuc->txn_flags & PERF_PMU_TXN_ADD)
goto do_del;
 
+   __set_bit(event->hw.idx, cpuc->dirty);
+
/*
 * Not a TXN, therefore cleanup properly.
 */
@@ -230
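
The diff above is cut off before the sched_task() helper; the clearing logic described in the changelog (reset only the currently un-assigned dirty counters, skipping pseudo counters) comes down to roughly the following sketch, which should be read as an approximation rather than the exact hunk:

/* Sketch of the clearing helper described above (not the verbatim hunk). */
static void x86_pmu_clear_dirty_counters(void)
{
        struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
        int i;

        /* The assigned counters will be updated soon; don't clear them. */
        for (i = 0; i < cpuc->n_events; i++)
                __clear_bit(cpuc->assign[i], cpuc->dirty);

        if (bitmap_empty(cpuc->dirty, X86_PMC_IDX_MAX))
                return;

        for_each_set_bit(i, cpuc->dirty, X86_PMC_IDX_MAX) {
                /* Pseudo counters (metrics, fake VLBR) have no HW counter to clear. */
                if (is_metric_idx(i) || i == INTEL_PMC_IDX_FIXED_VLBR)
                        continue;
                else if (i >= INTEL_PMC_IDX_FIXED)
                        wrmsrl(MSR_ARCH_PERFMON_FIXED_CTR0 + (i - INTEL_PMC_IDX_FIXED), 0);
                else
                        wrmsrl(x86_pmu_event_addr(i), 0);
        }

        bitmap_zero(cpuc->dirty, X86_PMC_IDX_MAX);
}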

[PATCH] perf/x86/intel/uncore: Remove uncore extra PCI dev HSWEP_PCI_PCU_3

2021-04-15 Thread kan . liang
From: Kan Liang 

There may be a kernel panic on the Haswell server and the Broadwell
server, if the snbep_pci2phy_map_init() returns an error.

The uncore_extra_pci_dev[HSWEP_PCI_PCU_3] is used in the cpu_init() to
detect the existence of the SBOX, which is an MSR type of PMON unit.
The uncore_extra_pci_dev is allocated in the uncore_pci_init(). If the
snbep_pci2phy_map_init() returns error, perf doesn't initialize the
PCI type of the PMON units, so the uncore_extra_pci_dev will not be
allocated. But perf may continue initializing the MSR type of PMON
units. A null dereference kernel panic will be triggered.

The sockets in a Haswell server or a Broadwell server are identical.
The existence of the SBOX only needs to be detected once.
Currently, perf probes all available PCU devices and stores them into the
uncore_extra_pci_dev, which is unnecessary.
Use the pci_get_device() to replace the uncore_extra_pci_dev. Only
detect the existence of the SBOX on the first available PCU device once.

Factor out hswep_has_limit_sbox(), since the Haswell server and the
Broadwell server use the same way to detect the existence of the SBOX.

Add some macros to replace the magic number.

Fixes: 5306c31c5733 ("perf/x86/uncore/hsw-ep: Handle systems with only two 
SBOXes")
Reported-by: Steve Wahl 
Signed-off-by: Kan Liang 
---
 arch/x86/events/intel/uncore_snbep.c | 61 +++-
 1 file changed, 26 insertions(+), 35 deletions(-)

diff --git a/arch/x86/events/intel/uncore_snbep.c 
b/arch/x86/events/intel/uncore_snbep.c
index b79951d..9b89376 100644
--- a/arch/x86/events/intel/uncore_snbep.c
+++ b/arch/x86/events/intel/uncore_snbep.c
@@ -1159,7 +1159,6 @@ enum {
SNBEP_PCI_QPI_PORT0_FILTER,
SNBEP_PCI_QPI_PORT1_FILTER,
BDX_PCI_QPI_PORT2_FILTER,
-   HSWEP_PCI_PCU_3,
 };
 
 static int snbep_qpi_hw_config(struct intel_uncore_box *box, struct perf_event 
*event)
@@ -2857,22 +2856,33 @@ static struct intel_uncore_type *hswep_msr_uncores[] = {
NULL,
 };
 
-void hswep_uncore_cpu_init(void)
+#define HSWEP_PCU_DID  0x2fc0
+#define HSWEP_PCU_CAPID4_OFFET 0x94
+#define hswep_get_chop(_cap)   (((_cap) >> 6) & 0x3)
+
+static bool hswep_has_limit_sbox(unsigned int device)
 {
-   int pkg = boot_cpu_data.logical_proc_id;
+   struct pci_dev *dev = pci_get_device(PCI_VENDOR_ID_INTEL, device, NULL);
+   u32 capid4;
+
+   if (!dev)
+   return false;
+
+   pci_read_config_dword(dev, HSWEP_PCU_CAPID4_OFFET, &capid4);
+   if (!hswep_get_chop(capid4))
+   return true;
 
+   return false;
+}
+
+void hswep_uncore_cpu_init(void)
+{
if (hswep_uncore_cbox.num_boxes > boot_cpu_data.x86_max_cores)
hswep_uncore_cbox.num_boxes = boot_cpu_data.x86_max_cores;
 
/* Detect 6-8 core systems with only two SBOXes */
-   if (uncore_extra_pci_dev[pkg].dev[HSWEP_PCI_PCU_3]) {
-   u32 capid4;
-
-   
pci_read_config_dword(uncore_extra_pci_dev[pkg].dev[HSWEP_PCI_PCU_3],
- 0x94, &capid4);
-   if (((capid4 >> 6) & 0x3) == 0)
-   hswep_uncore_sbox.num_boxes = 2;
-   }
+   if (hswep_has_limit_sbox(HSWEP_PCU_DID))
+   hswep_uncore_sbox.num_boxes = 2;
 
uncore_msr_uncores = hswep_msr_uncores;
 }
@@ -3135,11 +3145,6 @@ static const struct pci_device_id hswep_uncore_pci_ids[] 
= {
.driver_data = UNCORE_PCI_DEV_DATA(UNCORE_EXTRA_PCI_DEV,
   SNBEP_PCI_QPI_PORT1_FILTER),
},
-   { /* PCU.3 (for Capability registers) */
-   PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x2fc0),
-   .driver_data = UNCORE_PCI_DEV_DATA(UNCORE_EXTRA_PCI_DEV,
-  HSWEP_PCI_PCU_3),
-   },
{ /* end: all zeroes */ }
 };
 
@@ -3231,27 +3236,18 @@ static struct event_constraint 
bdx_uncore_pcu_constraints[] = {
EVENT_CONSTRAINT_END
 };
 
+#define BDX_PCU_DID                    0x6fc0
+
 void bdx_uncore_cpu_init(void)
 {
-   int pkg = topology_phys_to_logical_pkg(boot_cpu_data.phys_proc_id);
-
if (bdx_uncore_cbox.num_boxes > boot_cpu_data.x86_max_cores)
bdx_uncore_cbox.num_boxes = boot_cpu_data.x86_max_cores;
uncore_msr_uncores = bdx_msr_uncores;
 
-   /* BDX-DE doesn't have SBOX */
-   if (boot_cpu_data.x86_model == 86) {
-   uncore_msr_uncores[BDX_MSR_UNCORE_SBOX] = NULL;
/* Detect systems with no SBOXes */
-   } else if (uncore_extra_pci_dev[pkg].dev[HSWEP_PCI_PCU_3]) {
-   struct pci_dev *pdev;
-   u32 capid4;
-
-   pdev = uncore_extra_pci_dev[pkg].dev[HSWEP_PCI_PCU_3];
-   pci_read_config_dword(pdev, 0x94, &capid4);
-   if (((capid4 >> 6) & 0x3) == 0)
-   bdx_msr_uncores[BDX_MSR_UNCORE_SBOX] = NULL;
- 

[PATCH V4 2/2] perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task

2021-04-14 Thread kan . liang
From: Kan Liang 

The counter value of a perf task may leak to another RDPMC task.
For example, a perf stat task as below is running on CPU 0.

perf stat -e 'branches,cycles' -- taskset -c 0 ./workload

In the meantime, an RDPMC task, which is also running on CPU 0, may read
the GP counters periodically. (The RDPMC task creates a fixed event,
but read four GP counters.)

$ taskset -c 0 ./rdpmc_read_all_counters
index 0x0 value 0x8001e5970f99
index 0x1 value 0x8005d750edb6
index 0x2 value 0x0
index 0x3 value 0x0

index 0x0 value 0x8002358e48a5
index 0x1 value 0x8006bd1e3bc9
index 0x2 value 0x0
index 0x3 value 0x0

It is a potential security issue: once the attacker knows what the other
thread is counting, the PerfMon counter can be used as a side-channel to
attack cryptosystems.

The counter value of the perf stat task leaks to the RDPMC task because
perf never clears the counter when it's stopped.

Two methods were considered to address the issue.
- Unconditionally reset the counter in x86_pmu_del(). It can bring extra
  overhead even when there is no RDPMC task running.
- Only reset the un-assigned dirty counters when the RDPMC task is
  scheduled in. The method is implemented here.

The dirty counter is a counter, on which the assigned event has been
deleted, but the counter is not reset. To track the dirty counters,
add a 'dirty' variable in the struct cpu_hw_events.

The current code doesn't reset the counter when the assigned event is
deleted. Set the corresponding bit in the 'dirty' variable in
x86_pmu_del(), if the RDPMC feature is available on the system.

The security issue can only be found with an RDPMC task. The event for
an RDPMC task requires the mmap buffer. This can be used to detect an
RDPMC task. Once the event is detected in the event_mapped(), enable
sched_task(), which is invoked in each context switch. Add a check in
the sched_task() to clear the dirty counters, when the RDPMC task is
scheduled in. Only the current un-assigned dirty counters are reset,
because the RDPMC assigned dirty counters will be updated soon.

The RDPMC instruction is also supported on the older platforms. Add
sched_task() for the core_pmu. The core_pmu doesn't support large PEBS
and LBR callstack, the intel_pmu_pebs/lbr_sched_task() will be ignored.

RDPMC is not an Intel-only feature. Add the dirty counter clearing code
in the X86 generic code.

After applying the patch,

$ taskset -c 0 ./rdpmc_read_all_counters
index 0x0 value 0x0
index 0x1 value 0x0
index 0x2 value 0x0
index 0x3 value 0x0

index 0x0 value 0x0
index 0x1 value 0x0
index 0x2 value 0x0
index 0x3 value 0x0

Performance

The performance of a context switch is only impacted when there are two
or more perf users and one of the users is an RDPMC user. In other
cases, there is no performance impact.

The worst-case occurs when there are two users: the RDPMC user only
applies one counter; while the other user applies all available
counters. When the RDPMC task is scheduled in, all the counters, other
than the RDPMC assigned one, have to be reset.

Here is the test result for the worst-case.

The test is implemented on an Ice Lake platform, which has 8 GP
counters and 3 fixed counters (not including the SLOTS counter).

The lat_ctx is used to measure the context switching time.

lat_ctx -s 128K -N 1000 processes 2

I instrument the lat_ctx to open all 8 GP counters and 3 fixed
counters for one task. The other task opens a fixed counter and enables
RDPMC.

Without the patch:
The context switch time is 4.97 us

With the patch:
The context switch time is 5.16 us

There is ~4% performance drop for the context switching time in the
worst-case.

Suggested-by: Peter Zijlstra (Intel) 
Signed-off-by: Kan Liang 
---
Changes since V3:
- Fix warnings reported by kernel test robot 
- Move bitmap_empty() check after clearing assigned counters.
  It should be very likely that the cpuc->dirty is non-empty.
  Moving it after the clearing can skip the for_each_set_bit() and
  bitmap_zero(). 

The V2 can be found here.
https://lore.kernel.org/lkml/20200821195754.20159-3-kan.li...@linux.intel.com/

Changes since V2:
- Unconditionally set cpuc->dirty. The worst case for an RDPMC task is
  that we may have to clear all counters for the first time in
  x86_pmu_event_mapped. After that, the sched_task() will clear/update
  the 'dirty'. Only the real 'dirty' counters are cleared. For a non-RDPMC
  task, it's harmless to unconditionally set the cpuc->dirty.
- Remove the !is_sampling_event() check
- Move the code into the X86 generic file, because RDPMC is not an Intel-only
  feature.

Changes since V1:
- Drop the old method, which unconditionally reset the counter in
  x86_pmu_del().
  Only reset the dirty counters when an RDPMC task is scheduled in.

 arch/x86/events/core.c   | 47 
 arch/x86/events/perf_event.h |  1 +

[PATCH V4 1/2] perf/x86: Move cpuc->running into P4 specific code

2021-04-14 Thread kan . liang
From: Kan Liang 

The 'running' variable is only used in the P4 PMU. Current perf sets the
variable in the critical function x86_pmu_start(), which wastes cycles
for everybody not running on P4.

Move cpuc->running into the P4 specific p4_pmu_enable_event().

Add a static per-CPU 'p4_running' variable to replace the 'running'
variable in the struct cpu_hw_events. Saves space for the generic
structure.

The p4_pmu_enable_all() also invokes the p4_pmu_enable_event(), but it
should not set cpuc->running. Factor out __p4_pmu_enable_event() for
p4_pmu_enable_all().

Suggested-by: Peter Zijlstra (Intel) 
Signed-off-by: Kan Liang 
---
No changes since V3

New patch for V3
- Address the suggestion from Peter.
https://lore.kernel.org/lkml/20200908155840.gc35...@hirez.programming.kicks-ass.net/

 arch/x86/events/core.c   |  1 -
 arch/x86/events/intel/p4.c   | 16 +---
 arch/x86/events/perf_event.h |  1 -
 3 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 18df171..dd9f3c2 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1480,7 +1480,6 @@ static void x86_pmu_start(struct perf_event *event, int 
flags)
 
cpuc->events[idx] = event;
__set_bit(idx, cpuc->active_mask);
-   __set_bit(idx, cpuc->running);
static_call(x86_pmu_enable)(event);
perf_event_update_userpage(event);
 }
diff --git a/arch/x86/events/intel/p4.c b/arch/x86/events/intel/p4.c
index a4cc660..9c10cbb 100644
--- a/arch/x86/events/intel/p4.c
+++ b/arch/x86/events/intel/p4.c
@@ -947,7 +947,7 @@ static void p4_pmu_enable_pebs(u64 config)
(void)wrmsrl_safe(MSR_P4_PEBS_MATRIX_VERT,  (u64)bind->metric_vert);
 }
 
-static void p4_pmu_enable_event(struct perf_event *event)
+static void __p4_pmu_enable_event(struct perf_event *event)
 {
struct hw_perf_event *hwc = &event->hw;
int thread = p4_ht_config_thread(hwc->config);
@@ -983,6 +983,16 @@ static void p4_pmu_enable_event(struct perf_event *event)
(cccr & ~P4_CCCR_RESERVED) | P4_CCCR_ENABLE);
 }
 
+static DEFINE_PER_CPU(unsigned long [BITS_TO_LONGS(X86_PMC_IDX_MAX)], 
p4_running);
+
+static void p4_pmu_enable_event(struct perf_event *event)
+{
+   int idx = event->hw.idx;
+
+   __set_bit(idx, per_cpu(p4_running, smp_processor_id()));
+   __p4_pmu_enable_event(event);
+}
+
 static void p4_pmu_enable_all(int added)
 {
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
@@ -992,7 +1002,7 @@ static void p4_pmu_enable_all(int added)
struct perf_event *event = cpuc->events[idx];
if (!test_bit(idx, cpuc->active_mask))
continue;
-   p4_pmu_enable_event(event);
+   __p4_pmu_enable_event(event);
}
 }
 
@@ -1012,7 +1022,7 @@ static int p4_pmu_handle_irq(struct pt_regs *regs)
 
if (!test_bit(idx, cpuc->active_mask)) {
/* catch in-flight IRQs */
-   if (__test_and_clear_bit(idx, cpuc->running))
+   if (__test_and_clear_bit(idx, per_cpu(p4_running, 
smp_processor_id())))
handled++;
continue;
}
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 53b2b5f..54a340e 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -228,7 +228,6 @@ struct cpu_hw_events {
 */
struct perf_event   *events[X86_PMC_IDX_MAX]; /* in counter order */
unsigned long   active_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
-   unsigned long   running[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
int enabled;
 
int n_events; /* the # of events in the below 
arrays */
-- 
2.7.4



[PATCH V3 2/2] perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task

2021-04-13 Thread kan . liang
From: Kan Liang 

The counter value of a perf task may leak to another RDPMC task.
For example, a perf stat task as below is running on CPU 0.

perf stat -e 'branches,cycles' -- taskset -c 0 ./workload

In the meantime, an RDPMC task, which is also running on CPU 0, may read
the GP counters periodically. (The RDPMC task creates a fixed event,
but read four GP counters.)

$ taskset -c 0 ./rdpmc_read_all_counters
index 0x0 value 0x8001e5970f99
index 0x1 value 0x8005d750edb6
index 0x2 value 0x0
index 0x3 value 0x0

index 0x0 value 0x8002358e48a5
index 0x1 value 0x8006bd1e3bc9
index 0x2 value 0x0
index 0x3 value 0x0

It is a potential security issue: once the attacker knows what the other
thread is counting, the PerfMon counter can be used as a side-channel to
attack cryptosystems.

The counter value of the perf stat task leaks to the RDPMC task because
perf never clears the counter when it's stopped.

Two methods were considered to address the issue.
- Unconditionally reset the counter in x86_pmu_del(). It can bring extra
  overhead even when there is no RDPMC task running.
- Only reset the un-assigned dirty counters when the RDPMC task is
  scheduled in. The method is implemented here.

The dirty counter is a counter, on which the assigned event has been
deleted, but the counter is not reset. To track the dirty counters,
add a 'dirty' variable in the struct cpu_hw_events.

The current code doesn't reset the counter when the assigned event is
deleted. Set the corresponding bit in the 'dirty' variable in
x86_pmu_del(), if the RDPMC feature is available on the system.

The security issue can only be found with an RDPMC task. The event for
an RDPMC task requires the mmap buffer. This can be used to detect an
RDPMC task. Once the event is detected in the event_mapped(), enable
sched_task(), which is invoked in each context switch. Add a check in
the sched_task() to clear the dirty counters, when the RDPMC task is
scheduled in. Only the current un-assigned dirty counters are reset,
because the RDPMC assigned dirty counters will be updated soon.

The RDPMC instruction is also supported on the older platforms. Add
sched_task() for the core_pmu. The core_pmu doesn't support large PEBS
and LBR callstack, the intel_pmu_pebs/lbr_sched_task() will be ignored.

RDPMC is not an Intel-only feature. Add the dirty counter clearing code
in the X86 generic code.

After applying the patch,

$ taskset -c 0 ./rdpmc_read_all_counters
index 0x0 value 0x0
index 0x1 value 0x0
index 0x2 value 0x0
index 0x3 value 0x0

index 0x0 value 0x0
index 0x1 value 0x0
index 0x2 value 0x0
index 0x3 value 0x0

Performance

The performance of a context switch is only impacted when there are two
or more perf users and one of the users is an RDPMC user. In other
cases, there is no performance impact.

The worst-case occurs when there are two users: the RDPMC user only
applies one counter; while the other user applies all available
counters. When the RDPMC task is scheduled in, all the counters, other
than the RDPMC assigned one, have to be reset.

Here is the test result for the worst-case.

The test is implemented on an Ice Lake platform, which has 8 GP
counters and 3 fixed counters (not including the SLOTS counter).

The lat_ctx is used to measure the context switching time.

lat_ctx -s 128K -N 1000 processes 2

I instrument the lat_ctx to open all 8 GP counters and 3 fixed
counters for one task. The other task opens a fixed counter and enables
RDPMC.

Without the patch:
The context switch time is 4.97 us

With the patch:
The context switch time is 5.16 us

There is ~4% performance drop for the context switching time in the
worst-case.

Suggested-by: Peter Zijlstra (Intel) 
Signed-off-by: Kan Liang 
---
The V2 can be found here.
https://lore.kernel.org/lkml/20200821195754.20159-3-kan.li...@linux.intel.com/

Changes since V2:
- Unconditionally set cpuc->dirty. The worst case for an RDPMC task is
  that we may have to clear all counters for the first time in
  x86_pmu_event_mapped. After that, the sched_task() will clear/update
  the 'dirty'. Only the real 'dirty' counters are cleared. For a non-RDPMC
  task, it's harmless to unconditionally set the cpuc->dirty.
- Remove the !is_sampling_event() check
- Move the code into the X86 generic file, because RDPMC is not an Intel-only
  feature.

Changes since V1:
- Drop the old method, which unconditionally reset the counter in
  x86_pmu_del().
  Only reset the dirty counters when an RDPMC task is scheduled in.

 arch/x86/events/core.c   | 47 
 arch/x86/events/perf_event.h |  1 +
 2 files changed, 48 insertions(+)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index dd9f3c2..0d4a1a3 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1585,6 +1585,8 @@ static void x86_pmu_del(struct perf_event *event, int 

[PATCH V3 1/2] perf/x86: Move cpuc->running into P4 specific code

2021-04-13 Thread kan . liang
From: Kan Liang 

The 'running' variable is only used in the P4 PMU. Current perf sets the
variable in the critical function x86_pmu_start(), which wastes cycles
for everybody not running on P4.

Move cpuc->running into the P4 specific p4_pmu_enable_event().

Add a static per-CPU 'p4_running' variable to replace the 'running'
variable in the struct cpu_hw_events. Saves space for the generic
structure.

The p4_pmu_enable_all() also invokes the p4_pmu_enable_event(), but it
should not set cpuc->running. Factor out __p4_pmu_enable_event() for
p4_pmu_enable_all().

Suggested-by: Peter Zijlstra (Intel) 
Signed-off-by: Kan Liang 
---

New patch for V3
- Address the suggestion from Peter.
https://lore.kernel.org/lkml/20200908155840.gc35...@hirez.programming.kicks-ass.net/

 arch/x86/events/core.c   |  1 -
 arch/x86/events/intel/p4.c   | 16 +---
 arch/x86/events/perf_event.h |  1 -
 3 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 18df171..dd9f3c2 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1480,7 +1480,6 @@ static void x86_pmu_start(struct perf_event *event, int 
flags)
 
cpuc->events[idx] = event;
__set_bit(idx, cpuc->active_mask);
-   __set_bit(idx, cpuc->running);
static_call(x86_pmu_enable)(event);
perf_event_update_userpage(event);
 }
diff --git a/arch/x86/events/intel/p4.c b/arch/x86/events/intel/p4.c
index a4cc660..9c10cbb 100644
--- a/arch/x86/events/intel/p4.c
+++ b/arch/x86/events/intel/p4.c
@@ -947,7 +947,7 @@ static void p4_pmu_enable_pebs(u64 config)
(void)wrmsrl_safe(MSR_P4_PEBS_MATRIX_VERT,  (u64)bind->metric_vert);
 }
 
-static void p4_pmu_enable_event(struct perf_event *event)
+static void __p4_pmu_enable_event(struct perf_event *event)
 {
struct hw_perf_event *hwc = &event->hw;
int thread = p4_ht_config_thread(hwc->config);
@@ -983,6 +983,16 @@ static void p4_pmu_enable_event(struct perf_event *event)
(cccr & ~P4_CCCR_RESERVED) | P4_CCCR_ENABLE);
 }
 
+static DEFINE_PER_CPU(unsigned long [BITS_TO_LONGS(X86_PMC_IDX_MAX)], 
p4_running);
+
+static void p4_pmu_enable_event(struct perf_event *event)
+{
+   int idx = event->hw.idx;
+
+   __set_bit(idx, per_cpu(p4_running, smp_processor_id()));
+   __p4_pmu_enable_event(event);
+}
+
 static void p4_pmu_enable_all(int added)
 {
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
@@ -992,7 +1002,7 @@ static void p4_pmu_enable_all(int added)
struct perf_event *event = cpuc->events[idx];
if (!test_bit(idx, cpuc->active_mask))
continue;
-   p4_pmu_enable_event(event);
+   __p4_pmu_enable_event(event);
}
 }
 
@@ -1012,7 +1022,7 @@ static int p4_pmu_handle_irq(struct pt_regs *regs)
 
if (!test_bit(idx, cpuc->active_mask)) {
/* catch in-flight IRQs */
-   if (__test_and_clear_bit(idx, cpuc->running))
+   if (__test_and_clear_bit(idx, per_cpu(p4_running, 
smp_processor_id())))
handled++;
continue;
}
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 53b2b5f..54a340e 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -228,7 +228,6 @@ struct cpu_hw_events {
 */
struct perf_event   *events[X86_PMC_IDX_MAX]; /* in counter order */
unsigned long   active_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
-   unsigned long   running[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
int enabled;
 
int n_events; /* the # of events in the below 
arrays */
-- 
2.7.4



[PATCH V6 16/25] perf/x86: Register hybrid PMUs

2021-04-12 Thread kan . liang
From: Kan Liang 

Different hybrid PMUs have different PMU capabilities and events. Perf
should register a dedicated PMU for each of them.

To check an X86 event, perf has to go through all possible hybrid PMUs.

All the hybrid PMUs are registered at boot time. Before the
registration, add intel_pmu_check_hybrid_pmus() to check and update the
counters information, the event constraints, the extra registers and the
unique capabilities for each hybrid PMUs.

Postpone the display of the PMU information and HW check to
CPU_STARTING, because the boot CPU is the only online CPU in the
init_hw_perf_events(). Perf doesn't know the availability of the other
PMUs. Perf should display the PMU information only if the counters of
the PMU are available.

All CPUs of one type may be offline. In this case, users can still
observe the PMU in /sys/devices, but its CPU mask is 0.

All hybrid PMUs have the capability PERF_PMU_CAP_HETEROGENEOUS_CPUS.
The PMU name for hybrid PMUs will be "cpu_XXX", which will be assigned
later in a separate patch.

The PMU type id for the core PMU is still PERF_TYPE_RAW. For the other
hybrid PMUs, the PMU type id is not hard-coded.

The event->cpu must be compatible with the supported CPUs of the PMU.
Add a check in x86_pmu_event_init().

The events in a group must be from the same type of hybrid PMU.
The fake cpuc used in the validation must be from a supported CPU of
the event->pmu.

Perf may not retrieve a valid core type from get_this_hybrid_cpu_type().
For example, ADL may have an alternative configuration. With that
configuration, Perf cannot retrieve the core type from the CPUID leaf
0x1a. Add a platform specific get_hybrid_cpu_type(). If the generic way
fails, invoke the platform specific get_hybrid_cpu_type().
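
A minimal sketch of the intended fallback, assuming the hybrid PMU
initialization path looks roughly like this (the surrounding function
and the exact placement of the hook in struct x86_pmu are assumptions,
not copied from this patch):

	u8 cpu_type = get_this_hybrid_cpu_type();

	/*
	 * CPUID leaf 0x1a may not report a valid core type, e.g. on an
	 * ADL alternative configuration. Fall back to the platform hook.
	 */
	if (!cpu_type && x86_pmu.get_hybrid_cpu_type)
		cpu_type = x86_pmu.get_hybrid_cpu_type();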

Suggested-by: Peter Zijlstra (Intel) 
Signed-off-by: Kan Liang 
---
 arch/x86/events/core.c   | 137 +--
 arch/x86/events/intel/core.c |  93 -
 arch/x86/events/perf_event.h |  14 +
 3 files changed, 223 insertions(+), 21 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index bb375ab..ccc639d 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -478,7 +478,7 @@ int x86_setup_perfctr(struct perf_event *event)
local64_set(>period_left, hwc->sample_period);
}
 
-   if (attr->type == PERF_TYPE_RAW)
+   if (attr->type == event->pmu->type)
return x86_pmu_extra_regs(event->attr.config, event);
 
if (attr->type == PERF_TYPE_HW_CACHE)
@@ -613,7 +613,7 @@ int x86_pmu_hw_config(struct perf_event *event)
if (!event->attr.exclude_kernel)
event->hw.config |= ARCH_PERFMON_EVENTSEL_OS;
 
-   if (event->attr.type == PERF_TYPE_RAW)
+   if (event->attr.type == event->pmu->type)
event->hw.config |= event->attr.config & X86_RAW_EVENT_MASK;
 
if (event->attr.sample_period && x86_pmu.limit_period) {
@@ -742,7 +742,17 @@ void x86_pmu_enable_all(int added)
 
 static inline int is_x86_event(struct perf_event *event)
 {
-	return event->pmu == &pmu;
+   int i;
+
+   if (!is_hybrid())
+		return event->pmu == &pmu;
+
+   for (i = 0; i < x86_pmu.num_hybrid_pmus; i++) {
+		if (event->pmu == &x86_pmu.hybrid_pmu[i].pmu)
+   return true;
+   }
+
+   return false;
 }
 
 struct pmu *x86_get_pmu(unsigned int cpu)
@@ -1991,6 +2001,23 @@ void x86_pmu_show_pmu_cap(int num_counters, int 
num_counters_fixed,
pr_info("... event mask: %016Lx\n", intel_ctrl);
 }
 
+/*
+ * The generic code is not hybrid friendly. The hybrid_pmu->pmu
+ * of the first registered PMU is unconditionally assigned to
+ * each possible cpuctx->ctx.pmu.
+ * Update cpuctx->ctx.pmu with the correct hybrid PMU.
+ */
+void x86_pmu_update_cpu_context(struct pmu *pmu, int cpu)
+{
+   struct perf_cpu_context *cpuctx;
+
+   if (!pmu->pmu_cpu_context)
+   return;
+
+   cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
+   cpuctx->ctx.pmu = pmu;
+}
+
 static int __init init_hw_perf_events(void)
 {
struct x86_pmu_quirk *quirk;
@@ -2051,8 +2078,11 @@ static int __init init_hw_perf_events(void)
 
pmu.attr_update = x86_pmu.attr_update;
 
-   x86_pmu_show_pmu_cap(x86_pmu.num_counters, x86_pmu.num_counters_fixed,
-x86_pmu.intel_ctrl);
+   if (!is_hybrid()) {
+   x86_pmu_show_pmu_cap(x86_pmu.num_counters,
+x86_pmu.num_counters_fixed,
+x86_pmu.intel_ctrl);
+   }
 
if (!x86_pmu.read)
x86_pmu.read = _x86_pmu_read;
@@ -2082,9 +2112,45 @@ static int __init init_hw_perf_events(void)
if (err)
goto out1;
 
-   err = perf_pmu_regist

[PATCH V6 17/25] perf/x86: Add structures for the attributes of Hybrid PMUs

2021-04-12 Thread kan . liang
From: Kan Liang 

Hybrid PMUs have different events and formats. In theory, Hybrid PMU
specific attributes should be maintained in the dedicated struct
x86_hybrid_pmu, but it wastes space because the events and formats are
similar among Hybrid PMUs.

To reduce duplication, all hybrid PMUs will share a group of attributes
in the following patch. To distinguish an attribute from different
Hybrid PMUs, a PMU-aware attribute structure is introduced. A PMU type
is required for the attribute structure. The type is for internal use
only. It is not visible via the sysfs API.

Hybrid PMUs may support the same event name, but with different event
encodings, e.g., the mem-loads event on an Atom PMU has a different
event encoding from a Core PMU. This brings an issue if two attributes
are created for them. The current sysfs_update_group() finds an
attribute by searching for the attr name (aka the event name). If two
attributes have the same event name, the first attribute will be
replaced.
To address the issue, only one attribute is created for the event. The
event_str is extended and stores the event encodings of all Hybrid
PMUs. The event encodings are separated by ";". The order of the event
encodings must follow the order of the hybrid PMU index. The event_str
is for internal use as well. When a user wants to show the attribute of
a Hybrid PMU, only the corresponding part of the string is displayed.
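
For illustration, an extended event attribute could be defined with the
EVENT_ATTR_STR_HYBRID() macro added below; the encodings and the
pmu_type mask in this example are placeholders, not values taken from
this patch set:

	EVENT_ATTR_STR_HYBRID(mem-loads, mem_ld_hybrid,
			      "event=0xd0,umask=0x5,ldlat=3;event=0xd0,umask=0x5",
			      hybrid_big_small /* assumed mask covering both core types */);

events_hybrid_sysfs_show() then walks the hybrid PMUs in index order,
skips the segments that belong to other PMUs, and prints only the
segment up to the next ";" for the PMU that owns the sysfs file.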

Reviewed-by: Andi Kleen 
Signed-off-by: Kan Liang 
---
 arch/x86/events/core.c   | 43 +++
 arch/x86/events/perf_event.h | 19 +++
 include/linux/perf_event.h   | 12 
 3 files changed, 74 insertions(+)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index ccc639d..ba3736c 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1861,6 +1861,49 @@ ssize_t events_ht_sysfs_show(struct device *dev, struct 
device_attribute *attr,
pmu_attr->event_str_noht);
 }
 
+ssize_t events_hybrid_sysfs_show(struct device *dev,
+struct device_attribute *attr,
+char *page)
+{
+   struct perf_pmu_events_hybrid_attr *pmu_attr =
+   container_of(attr, struct perf_pmu_events_hybrid_attr, attr);
+   struct x86_hybrid_pmu *pmu;
+   const char *str, *next_str;
+   int i;
+
+   if (hweight64(pmu_attr->pmu_type) == 1)
+   return sprintf(page, "%s", pmu_attr->event_str);
+
+   /*
+* Hybrid PMUs may support the same event name, but with different
+* event encoding, e.g., the mem-loads event on an Atom PMU has
+* different event encoding from a Core PMU.
+*
+* The event_str includes all event encodings. Each event encoding
+* is divided by ";". The order of the event encodings must follow
+* the order of the hybrid PMU index.
+*/
+   pmu = container_of(dev_get_drvdata(dev), struct x86_hybrid_pmu, pmu);
+
+   str = pmu_attr->event_str;
+   for (i = 0; i < x86_pmu.num_hybrid_pmus; i++) {
+   if (!(x86_pmu.hybrid_pmu[i].cpu_type & pmu_attr->pmu_type))
+   continue;
+   if (x86_pmu.hybrid_pmu[i].cpu_type & pmu->cpu_type) {
+   next_str = strchr(str, ';');
+   if (next_str)
+   return snprintf(page, next_str - str + 1, "%s", 
str);
+   else
+   return sprintf(page, "%s", str);
+   }
+   str = strchr(str, ';');
+   str++;
+   }
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(events_hybrid_sysfs_show);
+
 EVENT_ATTR(cpu-cycles, CPU_CYCLES  );
 EVENT_ATTR(instructions,   INSTRUCTIONS);
 EVENT_ATTR(cache-references,   CACHE_REFERENCES);
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 22e13ff..4d94ec9 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -980,6 +980,22 @@ static struct perf_pmu_events_ht_attr event_attr_##v = {   
\
.event_str_ht   = ht,   \
 }
 
+#define EVENT_ATTR_STR_HYBRID(_name, v, str, _pmu) \
+static struct perf_pmu_events_hybrid_attr event_attr_##v = {   \
+   .attr   = __ATTR(_name, 0444, events_hybrid_sysfs_show, NULL),\
+   .id = 0,\
+   .event_str  = str,  \
+   .pmu_type   = _pmu, \
+}
+
+#define FORMAT_HYBRID_PTR(_id) (&format_attr_hybrid_##_id.attr.attr)
+
+#define FORMAT_ATTR_HYBRID(_name, _pmu)
\
+static struct perf_pmu_format_hybrid_attr format_attr_hybrid_##_name = {\
+   .att

[PATCH V6 15/25] perf/x86: Factor out x86_pmu_show_pmu_cap

2021-04-12 Thread kan . liang
From: Kan Liang 

The PMU capabilities are different among hybrid PMUs. Perf should dump
the PMU capabilities information for each hybrid PMU.

Factor out x86_pmu_show_pmu_cap() which shows the PMU capabilities
information. The function will be reused later when registering a
dedicated hybrid PMU.

Reviewed-by: Andi Kleen 
Signed-off-by: Kan Liang 
---
 arch/x86/events/core.c   | 25 -
 arch/x86/events/perf_event.h |  3 +++
 2 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index fe811b5..bb375ab 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1977,6 +1977,20 @@ static void _x86_pmu_read(struct perf_event *event)
x86_perf_event_update(event);
 }
 
+void x86_pmu_show_pmu_cap(int num_counters, int num_counters_fixed,
+ u64 intel_ctrl)
+{
+   pr_info("... version:%d\n", x86_pmu.version);
+   pr_info("... bit width:  %d\n", x86_pmu.cntval_bits);
+   pr_info("... generic registers:  %d\n", num_counters);
+   pr_info("... value mask: %016Lx\n", x86_pmu.cntval_mask);
+   pr_info("... max period: %016Lx\n", x86_pmu.max_period);
+   pr_info("... fixed-purpose events:   %lu\n",
+		hweight64((((1ULL << num_counters_fixed) - 1)
+			<< INTEL_PMC_IDX_FIXED) & intel_ctrl));
+   pr_info("... event mask: %016Lx\n", intel_ctrl);
+}
+
 static int __init init_hw_perf_events(void)
 {
struct x86_pmu_quirk *quirk;
@@ -2037,15 +2051,8 @@ static int __init init_hw_perf_events(void)
 
pmu.attr_update = x86_pmu.attr_update;
 
-   pr_info("... version:%d\n", x86_pmu.version);
-   pr_info("... bit width:  %d\n", x86_pmu.cntval_bits);
-   pr_info("... generic registers:  %d\n", x86_pmu.num_counters);
-   pr_info("... value mask: %016Lx\n", x86_pmu.cntval_mask);
-   pr_info("... max period: %016Lx\n", x86_pmu.max_period);
-   pr_info("... fixed-purpose events:   %lu\n",
-		hweight64((((1ULL << x86_pmu.num_counters_fixed) - 1)
-			<< INTEL_PMC_IDX_FIXED) & x86_pmu.intel_ctrl));
-   pr_info("... event mask: %016Lx\n", x86_pmu.intel_ctrl);
+   x86_pmu_show_pmu_cap(x86_pmu.num_counters, x86_pmu.num_counters_fixed,
+x86_pmu.intel_ctrl);
 
if (!x86_pmu.read)
x86_pmu.read = _x86_pmu_read;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index f04be6b..8523700 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -1093,6 +1093,9 @@ void x86_pmu_enable_event(struct perf_event *event);
 
 int x86_pmu_handle_irq(struct pt_regs *regs);
 
+void x86_pmu_show_pmu_cap(int num_counters, int num_counters_fixed,
+ u64 intel_ctrl);
+
 extern struct event_constraint emptyconstraint;
 
 extern struct event_constraint unconstrained;
-- 
2.7.4



[PATCH V6 14/25] perf/x86: Remove temporary pmu assignment in event_init

2021-04-12 Thread kan . liang
From: Kan Liang 

The temporary pmu assignment in event_init is unnecessary.

The assignment was introduced by commit 8113070d6639 ("perf_events:
Add fast-path to the rescheduling code"). At that time, event->pmu was
not yet assigned when an event was initialized, so the assignment was
required. However, since commit 7e5b2a01d2ca ("perf: provide PMU when
initing events"), event->pmu is provided before event_init is invoked.
The temporary pmu assignment in event_init can therefore be removed.

Reviewed-by: Andi Kleen 
Signed-off-by: Kan Liang 
---
 arch/x86/events/core.c | 11 ---
 1 file changed, 11 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 4dcf0de..fe811b5 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2292,7 +2292,6 @@ static int validate_group(struct perf_event *event)
 
 static int x86_pmu_event_init(struct perf_event *event)
 {
-   struct pmu *tmp;
int err;
 
switch (event->attr.type) {
@@ -2307,20 +2306,10 @@ static int x86_pmu_event_init(struct perf_event *event)
 
err = __x86_pmu_event_init(event);
if (!err) {
-   /*
-* we temporarily connect event to its pmu
-* such that validate_group() can classify
-* it as an x86 event using is_x86_event()
-*/
-   tmp = event->pmu;
-	event->pmu = &pmu;
-
if (event->group_leader != event)
err = validate_group(event);
else
err = validate_event(event);
-
-   event->pmu = tmp;
}
if (err) {
if (event->destroy)
-- 
2.7.4



[PATCH V6 11/25] perf/x86/intel: Factor out intel_pmu_check_num_counters

2021-04-12 Thread kan . liang
From: Kan Liang 

Each hybrid PMU has to check its own number of counters and the fixed
counters mask before registration.

intel_pmu_check_num_counters() will be reused later to check the number
of counters for each hybrid PMU.

Reviewed-by: Andi Kleen 
Signed-off-by: Kan Liang 
---
 arch/x86/events/intel/core.c | 38 --
 1 file changed, 24 insertions(+), 14 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index f727aa5..d7e2021 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -5064,6 +5064,26 @@ static const struct attribute_group *attr_update[] = {
 
 static struct attribute *empty_attrs;
 
+static void intel_pmu_check_num_counters(int *num_counters,
+int *num_counters_fixed,
+u64 *intel_ctrl, u64 fixed_mask)
+{
+   if (*num_counters > INTEL_PMC_MAX_GENERIC) {
+   WARN(1, KERN_ERR "hw perf events %d > max(%d), clipping!",
+*num_counters, INTEL_PMC_MAX_GENERIC);
+   *num_counters = INTEL_PMC_MAX_GENERIC;
+   }
+   *intel_ctrl = (1ULL << *num_counters) - 1;
+
+   if (*num_counters_fixed > INTEL_PMC_MAX_FIXED) {
+   WARN(1, KERN_ERR "hw perf events fixed %d > max(%d), clipping!",
+*num_counters_fixed, INTEL_PMC_MAX_FIXED);
+   *num_counters_fixed = INTEL_PMC_MAX_FIXED;
+   }
+
+   *intel_ctrl |= fixed_mask << INTEL_PMC_IDX_FIXED;
+}
+
 __init int intel_pmu_init(void)
 {
struct attribute **extra_skl_attr = _attrs;
@@ -5703,20 +5723,10 @@ __init int intel_pmu_init(void)
 
x86_pmu.attr_update = attr_update;
 
-   if (x86_pmu.num_counters > INTEL_PMC_MAX_GENERIC) {
-   WARN(1, KERN_ERR "hw perf events %d > max(%d), clipping!",
-x86_pmu.num_counters, INTEL_PMC_MAX_GENERIC);
-   x86_pmu.num_counters = INTEL_PMC_MAX_GENERIC;
-   }
-   x86_pmu.intel_ctrl = (1ULL << x86_pmu.num_counters) - 1;
-
-   if (x86_pmu.num_counters_fixed > INTEL_PMC_MAX_FIXED) {
-   WARN(1, KERN_ERR "hw perf events fixed %d > max(%d), clipping!",
-x86_pmu.num_counters_fixed, INTEL_PMC_MAX_FIXED);
-   x86_pmu.num_counters_fixed = INTEL_PMC_MAX_FIXED;
-   }
-
-   x86_pmu.intel_ctrl |= (u64)fixed_mask << INTEL_PMC_IDX_FIXED;
+	intel_pmu_check_num_counters(&x86_pmu.num_counters,
+				     &x86_pmu.num_counters_fixed,
+				     &x86_pmu.intel_ctrl,
+				     (u64)fixed_mask);
 
/* AnyThread may be deprecated on arch perfmon v5 or later */
if (x86_pmu.intel_cap.anythread_deprecated)
-- 
2.7.4



[PATCH V6 08/25] perf/x86: Hybrid PMU support for hardware cache event

2021-04-12 Thread kan . liang
From: Kan Liang 

The hardware cache events are different among hybrid PMUs. Each hybrid
PMU should have its own hw cache event table.
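
The per-PMU tables are reached through the hybrid_var() accessor
introduced earlier in this series. A rough sketch of how such an
accessor can be built (an illustration, not necessarily the exact macro
from the earlier patch):

	#define hybrid_var(_pmu, _var)					\
	(*({								\
		typeof(&_var) __ptr = &_var;				\
									\
		/* On a hybrid system, redirect to the per-PMU copy. */	\
		if (is_hybrid() && (_pmu))				\
			__ptr = &hybrid_pmu(_pmu)->_var;		\
									\
		__ptr;							\
	}))

On a non-hybrid system the accessor degenerates to the global table, so
the existing fast path is unchanged.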

Suggested-by: Peter Zijlstra (Intel) 
Signed-off-by: Kan Liang 
---
 arch/x86/events/core.c   | 5 ++---
 arch/x86/events/perf_event.h | 9 +
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 3b99864..a5f8a5e 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -376,8 +376,7 @@ set_ext_hw_attr(struct hw_perf_event *hwc, struct 
perf_event *event)
return -EINVAL;
cache_result = array_index_nospec(cache_result, 
PERF_COUNT_HW_CACHE_RESULT_MAX);
 
-   val = hw_cache_event_ids[cache_type][cache_op][cache_result];
-
+   val = hybrid_var(event->pmu, 
hw_cache_event_ids)[cache_type][cache_op][cache_result];
if (val == 0)
return -ENOENT;
 
@@ -385,7 +384,7 @@ set_ext_hw_attr(struct hw_perf_event *hwc, struct 
perf_event *event)
return -EINVAL;
 
hwc->config |= val;
-   attr->config1 = hw_cache_extra_regs[cache_type][cache_op][cache_result];
+   attr->config1 = hybrid_var(event->pmu, 
hw_cache_extra_regs)[cache_type][cache_op][cache_result];
return x86_pmu_extra_regs(val, event);
 }
 
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 93d6479..10ef244 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -640,6 +640,15 @@ struct x86_hybrid_pmu {
int num_counters;
int num_counters_fixed;
struct event_constraint unconstrained;
+
+   u64 hw_cache_event_ids
+   [PERF_COUNT_HW_CACHE_MAX]
+   [PERF_COUNT_HW_CACHE_OP_MAX]
+   [PERF_COUNT_HW_CACHE_RESULT_MAX];
+   u64 hw_cache_extra_regs
+   [PERF_COUNT_HW_CACHE_MAX]
+   [PERF_COUNT_HW_CACHE_OP_MAX]
+   [PERF_COUNT_HW_CACHE_RESULT_MAX];
 };
 
 static __always_inline struct x86_hybrid_pmu *hybrid_pmu(struct pmu *pmu)
-- 
2.7.4



[PATCH V6 10/25] perf/x86: Hybrid PMU support for extra_regs

2021-04-12 Thread kan . liang
From: Kan Liang 

Different hybrid PMUs may have different extra registers, e.g., a Core
PMU may have offcore registers, a frontend register and a ldlat
register, while an Atom core may only have offcore registers and a
ldlat register. Each hybrid PMU should use its own extra_regs.

An Intel Hybrid system should always have extra registers.
Unconditionally allocate shared_regs for Intel Hybrid system.

Reviewed-by: Andi Kleen 
Signed-off-by: Kan Liang 
---
 arch/x86/events/core.c   |  5 +++--
 arch/x86/events/intel/core.c | 15 +--
 arch/x86/events/perf_event.h |  1 +
 3 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index f3e6fb0..4dcf0de 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -154,15 +154,16 @@ u64 x86_perf_event_update(struct perf_event *event)
  */
 static int x86_pmu_extra_regs(u64 config, struct perf_event *event)
 {
+   struct extra_reg *extra_regs = hybrid(event->pmu, extra_regs);
struct hw_perf_event_extra *reg;
struct extra_reg *er;
 
	reg = &event->hw.extra_reg;
 
-   if (!x86_pmu.extra_regs)
+   if (!extra_regs)
return 0;
 
-   for (er = x86_pmu.extra_regs; er->msr; er++) {
+   for (er = extra_regs; er->msr; er++) {
if (er->event != (config & er->config_mask))
continue;
if (event->attr.config1 & ~er->valid_mask)
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 447a80f..f727aa5 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2966,8 +2966,10 @@ intel_vlbr_constraints(struct perf_event *event)
return NULL;
 }
 
-static int intel_alt_er(int idx, u64 config)
+static int intel_alt_er(struct cpu_hw_events *cpuc,
+   int idx, u64 config)
 {
+   struct extra_reg *extra_regs = hybrid(cpuc->pmu, extra_regs);
int alt_idx = idx;
 
if (!(x86_pmu.flags & PMU_FL_HAS_RSP_1))
@@ -2979,7 +2981,7 @@ static int intel_alt_er(int idx, u64 config)
if (idx == EXTRA_REG_RSP_1)
alt_idx = EXTRA_REG_RSP_0;
 
-   if (config & ~x86_pmu.extra_regs[alt_idx].valid_mask)
+   if (config & ~extra_regs[alt_idx].valid_mask)
return idx;
 
return alt_idx;
@@ -2987,15 +2989,16 @@ static int intel_alt_er(int idx, u64 config)
 
 static void intel_fixup_er(struct perf_event *event, int idx)
 {
+   struct extra_reg *extra_regs = hybrid(event->pmu, extra_regs);
event->hw.extra_reg.idx = idx;
 
if (idx == EXTRA_REG_RSP_0) {
event->hw.config &= ~INTEL_ARCH_EVENT_MASK;
-   event->hw.config |= x86_pmu.extra_regs[EXTRA_REG_RSP_0].event;
+   event->hw.config |= extra_regs[EXTRA_REG_RSP_0].event;
event->hw.extra_reg.reg = MSR_OFFCORE_RSP_0;
} else if (idx == EXTRA_REG_RSP_1) {
event->hw.config &= ~INTEL_ARCH_EVENT_MASK;
-   event->hw.config |= x86_pmu.extra_regs[EXTRA_REG_RSP_1].event;
+   event->hw.config |= extra_regs[EXTRA_REG_RSP_1].event;
event->hw.extra_reg.reg = MSR_OFFCORE_RSP_1;
}
 }
@@ -3071,7 +3074,7 @@ __intel_shared_reg_get_constraints(struct cpu_hw_events 
*cpuc,
 */
c = NULL;
} else {
-   idx = intel_alt_er(idx, reg->config);
+   idx = intel_alt_er(cpuc, idx, reg->config);
if (idx != reg->idx) {
			raw_spin_unlock_irqrestore(&era->lock, flags);
goto again;
@@ -4155,7 +4158,7 @@ int intel_cpuc_prepare(struct cpu_hw_events *cpuc, int 
cpu)
 {
cpuc->pebs_record_size = x86_pmu.pebs_record_size;
 
-   if (x86_pmu.extra_regs || x86_pmu.lbr_sel_map) {
+   if (is_hybrid() || x86_pmu.extra_regs || x86_pmu.lbr_sel_map) {
cpuc->shared_regs = allocate_shared_regs(cpu);
if (!cpuc->shared_regs)
goto err;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index a38c5b6..f04be6b 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -651,6 +651,7 @@ struct x86_hybrid_pmu {
[PERF_COUNT_HW_CACHE_RESULT_MAX];
struct event_constraint *event_constraints;
struct event_constraint *pebs_constraints;
+   struct extra_reg*extra_regs;
 };
 
 static __always_inline struct x86_hybrid_pmu *hybrid_pmu(struct pmu *pmu)
-- 
2.7.4



[PATCH V6 06/25] perf/x86: Hybrid PMU support for counters

2021-04-12 Thread kan . liang
From: Kan Liang 

The numbers of GP and fixed counters differ among hybrid PMUs.
Each hybrid PMU should use its own counter-related information.

When handling a certain hybrid PMU, apply the number of counters from
the corresponding hybrid PMU.

When reserving the counters in the initialization of a new event,
reserve all possible counters.

The number of counters recorded in the global x86_pmu is the number of
architectural counters, which are available on all hybrid PMUs. KVM
doesn't support the hybrid PMU yet. Return the number of architectural
counters for now.

For the functions only available for the old platforms, e.g.,
intel_pmu_drain_pebs_nhm(), nothing is changed.

Reviewed-by: Andi Kleen 
Signed-off-by: Kan Liang 
---
 arch/x86/events/core.c   | 55 ++--
 arch/x86/events/intel/core.c |  8 ---
 arch/x86/events/intel/ds.c   | 14 +++
 arch/x86/events/perf_event.h |  4 
 4 files changed, 56 insertions(+), 25 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 2382ace..3b99864 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -185,16 +185,29 @@ static DEFINE_MUTEX(pmc_reserve_mutex);
 
 #ifdef CONFIG_X86_LOCAL_APIC
 
+static inline int get_possible_num_counters(void)
+{
+   int i, num_counters = x86_pmu.num_counters;
+
+   if (!is_hybrid())
+   return num_counters;
+
+   for (i = 0; i < x86_pmu.num_hybrid_pmus; i++)
+   num_counters = max_t(int, num_counters, 
x86_pmu.hybrid_pmu[i].num_counters);
+
+   return num_counters;
+}
+
 static bool reserve_pmc_hardware(void)
 {
-   int i;
+   int i, num_counters = get_possible_num_counters();
 
-   for (i = 0; i < x86_pmu.num_counters; i++) {
+   for (i = 0; i < num_counters; i++) {
if (!reserve_perfctr_nmi(x86_pmu_event_addr(i)))
goto perfctr_fail;
}
 
-   for (i = 0; i < x86_pmu.num_counters; i++) {
+   for (i = 0; i < num_counters; i++) {
if (!reserve_evntsel_nmi(x86_pmu_config_addr(i)))
goto eventsel_fail;
}
@@ -205,7 +218,7 @@ static bool reserve_pmc_hardware(void)
for (i--; i >= 0; i--)
release_evntsel_nmi(x86_pmu_config_addr(i));
 
-   i = x86_pmu.num_counters;
+   i = num_counters;
 
 perfctr_fail:
for (i--; i >= 0; i--)
@@ -216,9 +229,9 @@ static bool reserve_pmc_hardware(void)
 
 static void release_pmc_hardware(void)
 {
-   int i;
+   int i, num_counters = get_possible_num_counters();
 
-   for (i = 0; i < x86_pmu.num_counters; i++) {
+   for (i = 0; i < num_counters; i++) {
release_perfctr_nmi(x86_pmu_event_addr(i));
release_evntsel_nmi(x86_pmu_config_addr(i));
}
@@ -946,6 +959,7 @@ EXPORT_SYMBOL_GPL(perf_assign_events);
 
 int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
 {
+   int num_counters = hybrid(cpuc->pmu, num_counters);
struct event_constraint *c;
struct perf_event *e;
int n0, i, wmin, wmax, unsched = 0;
@@ -1021,7 +1035,7 @@ int x86_schedule_events(struct cpu_hw_events *cpuc, int 
n, int *assign)
 
/* slow path */
if (i != n) {
-   int gpmax = x86_pmu.num_counters;
+   int gpmax = num_counters;
 
/*
 * Do not allow scheduling of more than half the available
@@ -1042,7 +1056,7 @@ int x86_schedule_events(struct cpu_hw_events *cpuc, int 
n, int *assign)
 * the extra Merge events needed by large increment events.
 */
if (x86_pmu.flags & PMU_FL_PAIR) {
-   gpmax = x86_pmu.num_counters - cpuc->n_pair;
+   gpmax = num_counters - cpuc->n_pair;
WARN_ON(gpmax <= 0);
}
 
@@ -1129,10 +1143,12 @@ static int collect_event(struct cpu_hw_events *cpuc, 
struct perf_event *event,
  */
 static int collect_events(struct cpu_hw_events *cpuc, struct perf_event 
*leader, bool dogrp)
 {
+   int num_counters = hybrid(cpuc->pmu, num_counters);
+   int num_counters_fixed = hybrid(cpuc->pmu, num_counters_fixed);
struct perf_event *event;
int n, max_count;
 
-   max_count = x86_pmu.num_counters + x86_pmu.num_counters_fixed;
+   max_count = num_counters + num_counters_fixed;
 
/* current number of events already accepted */
n = cpuc->n_events;
@@ -1500,18 +1516,18 @@ void perf_event_print_debug(void)
 {
u64 ctrl, status, overflow, pmc_ctrl, pmc_count, prev_left, fixed;
u64 pebs, debugctl;
-   struct cpu_hw_events *cpuc;
+   int cpu = smp_processor_id();
+	struct cpu_hw_events *cpuc = &per_cpu(cpu_hw_events, cpu);
+   int num_counters = hybrid(cpuc->pmu, num_counters);
+   int num_counters_fixed = hybrid(cpuc->pmu, num_cou

[PATCH V6 04/25] perf/x86/intel: Hybrid PMU support for perf capabilities

2021-04-12 Thread kan . liang
From: Kan Liang 

Some platforms, e.g. Alder Lake, have hybrid architecture. Although most
PMU capabilities are the same, there are still some unique PMU
capabilities for different hybrid PMUs. Perf should register a dedicated
pmu for each hybrid PMU.

Add a new struct x86_hybrid_pmu, which saves the dedicated pmu and
capabilities for each hybrid PMU.

The architecture MSR, MSR_IA32_PERF_CAPABILITIES, only indicates the
architecture features which are available on all hybrid PMUs. The
architecture features are stored in the global x86_pmu.intel_cap.

For Alder Lake, the model-specific features are perf metrics and
PEBS-via-PT. The corresponding bits of the global x86_pmu.intel_cap
should be 0 for these two features. Perf should not use the global
intel_cap to check the features on a hybrid system.
Add a dedicated intel_cap in the x86_hybrid_pmu to store the
model-specific capabilities. Use the dedicated intel_cap to replace
the global intel_cap for these two features. The dedicated intel_cap
will be set in the following "Add Alder Lake Hybrid support" patch.

Add is_hybrid() to distinguish a hybrid system. ADL may have an
alternative configuration. With that configuration, the
X86_FEATURE_HYBRID_CPU is not set. Perf cannot rely on the feature bit.
Add a new static_key_false, perf_is_hybrid, to indicate a hybrid system.
It will be assigned in the following "Add Alder Lake Hybrid support"
patch as well.
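
A minimal sketch of what the new helper boils down to (the exact
definition lives in perf_event.h; this is an illustration):

	DECLARE_STATIC_KEY_FALSE(perf_is_hybrid);

	/* Patched to a taken branch at runtime only on hybrid systems. */
	#define is_hybrid()	static_branch_unlikely(&perf_is_hybrid)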

Suggested-by: Peter Zijlstra (Intel) 
Signed-off-by: Kan Liang 
---
 arch/x86/events/core.c   |  7 +--
 arch/x86/events/intel/core.c | 22 ++
 arch/x86/events/intel/ds.c   |  2 +-
 arch/x86/events/perf_event.h | 33 +
 arch/x86/include/asm/msr-index.h |  3 +++
 5 files changed, 60 insertions(+), 7 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index e564e96..a8e7247 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -54,6 +54,7 @@ DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events) = {
 
 DEFINE_STATIC_KEY_FALSE(rdpmc_never_available_key);
 DEFINE_STATIC_KEY_FALSE(rdpmc_always_available_key);
+DEFINE_STATIC_KEY_FALSE(perf_is_hybrid);
 
 /*
  * This here uses DEFINE_STATIC_CALL_NULL() to get a static_call defined
@@ -1105,8 +1106,9 @@ static void del_nr_metric_event(struct cpu_hw_events 
*cpuc,
 static int collect_event(struct cpu_hw_events *cpuc, struct perf_event *event,
 int max_count, int n)
 {
+   union perf_capabilities intel_cap = hybrid(cpuc->pmu, intel_cap);
 
-   if (x86_pmu.intel_cap.perf_metrics && add_nr_metric_event(cpuc, event))
+   if (intel_cap.perf_metrics && add_nr_metric_event(cpuc, event))
return -EINVAL;
 
if (n >= max_count + cpuc->n_metric)
@@ -1582,6 +1584,7 @@ void x86_pmu_stop(struct perf_event *event, int flags)
 static void x86_pmu_del(struct perf_event *event, int flags)
 {
struct cpu_hw_events *cpuc = this_cpu_ptr(_hw_events);
+   union perf_capabilities intel_cap = hybrid(cpuc->pmu, intel_cap);
int i;
 
/*
@@ -1621,7 +1624,7 @@ static void x86_pmu_del(struct perf_event *event, int 
flags)
}
cpuc->event_constraint[i-1] = NULL;
--cpuc->n_events;
-   if (x86_pmu.intel_cap.perf_metrics)
+   if (intel_cap.perf_metrics)
del_nr_metric_event(cpuc, event);
 
perf_event_update_userpage(event);
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index f116c63..dc9e2fb 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3646,6 +3646,12 @@ static inline bool is_mem_loads_aux_event(struct 
perf_event *event)
return (event->attr.config & INTEL_ARCH_EVENT_MASK) == 
X86_CONFIG(.event=0x03, .umask=0x82);
 }
 
+static inline bool intel_pmu_has_cap(struct perf_event *event, int idx)
+{
+	union perf_capabilities *intel_cap = &hybrid(event->pmu, intel_cap);
+
+   return test_bit(idx, (unsigned long *)_cap->capabilities);
+}
 
 static int intel_pmu_hw_config(struct perf_event *event)
 {
@@ -3712,7 +3718,7 @@ static int intel_pmu_hw_config(struct perf_event *event)
 * with a slots event as group leader. When the slots event
 * is used in a metrics group, it too cannot support sampling.
 */
-   if (x86_pmu.intel_cap.perf_metrics && is_topdown_event(event)) {
+   if (intel_pmu_has_cap(event, PERF_CAP_METRICS_IDX) && 
is_topdown_event(event)) {
if (event->attr.config1 || event->attr.config2)
return -EINVAL;
 
@@ -4219,8 +4225,16 @@ static void intel_pmu_cpu_starting(int cpu)
if (x86_pmu.version > 1)
flip_smm_bit(_pmu.attr_freeze_on_smi);
 
-   /* Disable perf metrics if any added CPU doesn't support it. */
-   if (x86_pmu.intel_cap.perf_metrics) {
+   /*
+* Disable perf metric

[PATCH V6 03/25] perf/x86: Track pmu in per-CPU cpu_hw_events

2021-04-12 Thread kan . liang
From: Kan Liang 

Some platforms, e.g. Alder Lake, have hybrid architecture. In the same
package, there may be more than one type of CPU. The PMU capabilities
are different among different types of CPU. Perf will register a
dedicated PMU for each type of CPU.

Add a 'pmu' variable in the struct cpu_hw_events to track the dedicated
PMU of the current CPU.

The current x86_get_pmu() uses the global 'pmu', which will be broken
on a hybrid platform. Modify it to return the 'pmu' of the specified
CPU.

Initialize the per-CPU 'pmu' variable with the global 'pmu'. Nothing is
changed for the non-hybrid platforms.

The is_x86_event() will be updated in the later patch ("perf/x86:
Register hybrid PMUs") for hybrid platforms. For the non-hybrid
platforms, nothing is changed here.

Suggested-by: Peter Zijlstra (Intel) 
Signed-off-by: Kan Liang 
---
 arch/x86/events/core.c   | 17 +
 arch/x86/events/intel/core.c |  2 +-
 arch/x86/events/intel/ds.c   |  4 ++--
 arch/x86/events/intel/lbr.c  |  9 +
 arch/x86/events/perf_event.h |  4 +++-
 5 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 18df171..e564e96 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -45,9 +45,11 @@
 #include "perf_event.h"
 
 struct x86_pmu x86_pmu __read_mostly;
+static struct pmu pmu;
 
 DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events) = {
.enabled = 1,
+	.pmu = &pmu,
 };
 
 DEFINE_STATIC_KEY_FALSE(rdpmc_never_available_key);
@@ -724,16 +726,23 @@ void x86_pmu_enable_all(int added)
}
 }
 
-static struct pmu pmu;
-
 static inline int is_x86_event(struct perf_event *event)
 {
	return event->pmu == &pmu;
 }
 
-struct pmu *x86_get_pmu(void)
+struct pmu *x86_get_pmu(unsigned int cpu)
 {
-	return &pmu;
+	struct cpu_hw_events *cpuc = &per_cpu(cpu_hw_events, cpu);
+
+   /*
+* All CPUs of the hybrid type have been offline.
+* The x86_get_pmu() should not be invoked.
+*/
+   if (WARN_ON_ONCE(!cpuc->pmu))
+		return &pmu;
+
+   return cpuc->pmu;
 }
 /*
  * Event scheduler state:
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 7bbb5bb..f116c63 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4876,7 +4876,7 @@ static void update_tfa_sched(void *ignored)
 * and if so force schedule out for all event types all contexts
 */
if (test_bit(3, cpuc->active_mask))
-   perf_pmu_resched(x86_get_pmu());
+   perf_pmu_resched(x86_get_pmu(smp_processor_id()));
 }
 
 static ssize_t show_sysctl_tfa(struct device *cdev,
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 7ebae18..1bfea8c 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -2192,7 +2192,7 @@ void __init intel_ds_init(void)
PERF_SAMPLE_TIME;
x86_pmu.flags |= PMU_FL_PEBS_ALL;
pebs_qual = "-baseline";
-   x86_get_pmu()->capabilities |= 
PERF_PMU_CAP_EXTENDED_REGS;
+   x86_get_pmu(smp_processor_id())->capabilities 
|= PERF_PMU_CAP_EXTENDED_REGS;
} else {
/* Only basic record supported */
x86_pmu.large_pebs_flags &=
@@ -2207,7 +2207,7 @@ void __init intel_ds_init(void)
 
if (x86_pmu.intel_cap.pebs_output_pt_available) {
pr_cont("PEBS-via-PT, ");
-   x86_get_pmu()->capabilities |= 
PERF_PMU_CAP_AUX_OUTPUT;
+   x86_get_pmu(smp_processor_id())->capabilities 
|= PERF_PMU_CAP_AUX_OUTPUT;
}
 
break;
diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index 21890da..bb4486c 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -705,7 +705,7 @@ void intel_pmu_lbr_add(struct perf_event *event)
 
 void release_lbr_buffers(void)
 {
-   struct kmem_cache *kmem_cache = x86_get_pmu()->task_ctx_cache;
+   struct kmem_cache *kmem_cache;
struct cpu_hw_events *cpuc;
int cpu;
 
@@ -714,6 +714,7 @@ void release_lbr_buffers(void)
 
for_each_possible_cpu(cpu) {
		cpuc = per_cpu_ptr(&cpu_hw_events, cpu);
+   kmem_cache = x86_get_pmu(cpu)->task_ctx_cache;
if (kmem_cache && cpuc->lbr_xsave) {
kmem_cache_free(kmem_cache, cpuc->lbr_xsave);
cpuc->lbr_xsave = NULL;
@@ -1609,7 +1610,7 @@ void intel_pmu_lbr_init_hsw(void)
x86_pmu.lbr_sel_mask = LBR_SEL_MASK;
x86_pmu.lbr_sel_map  = hsw_lbr_sel_map;
 
-   x86_get_pmu()->task_ctx_c

[PATCH V6 23/25] perf/x86/msr: Add Alder Lake CPU support

2021-04-12 Thread kan . liang
From: Kan Liang 

PPERF and SMI_COUNT MSRs are also supported on Alder Lake.

The External Design Specification (EDS) is not published yet. It comes
from an authoritative internal source.

The patch has been tested on real hardware.

Reviewed-by: Andi Kleen 
Signed-off-by: Kan Liang 
---
 arch/x86/events/msr.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/events/msr.c b/arch/x86/events/msr.c
index 680404c..c853b28 100644
--- a/arch/x86/events/msr.c
+++ b/arch/x86/events/msr.c
@@ -100,6 +100,8 @@ static bool test_intel(int idx, void *data)
case INTEL_FAM6_TIGERLAKE_L:
case INTEL_FAM6_TIGERLAKE:
case INTEL_FAM6_ROCKETLAKE:
+   case INTEL_FAM6_ALDERLAKE:
+   case INTEL_FAM6_ALDERLAKE_L:
if (idx == PERF_MSR_SMI || idx == PERF_MSR_PPERF)
return true;
break;
-- 
2.7.4



[PATCH V6 02/25] x86/cpu: Add helper function to get the type of the current hybrid CPU

2021-04-12 Thread kan . liang
From: Ricardo Neri 

On processors with Intel Hybrid Technology (i.e., one having more than
one type of CPU in the same package), all CPUs support the same
instruction set and enumerate the same features on CPUID. Thus, all
software can run on any CPU without restrictions. However, there may be
model-specific differences among types of CPUs. For instance, each type
of CPU may support a different number of performance counters. Also,
machine check error banks may be wired differently. Even though most
software will not care about these differences, kernel subsystems
dealing with these differences must know.

Add and expose a new helper function get_this_hybrid_cpu_type() to query
the type of the current hybrid CPU. The function will be used later in
the perf subsystem.

The Intel Software Developer's Manual defines the CPU type as an 8-bit
identifier.

Cc: Andi Kleen 
Cc: Andy Lutomirski 
Cc: Dave Hansen 
Cc: Kan Liang 
Cc: "Peter Zijlstra (Intel)" 
Cc: "Rafael J. Wysocki" 
Cc: "Ravi V. Shankar" 
Cc: Srinivas Pandruvada 
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Len Brown 
Reviewed-by: Tony Luck 
Acked-by: Borislav Petkov 
Signed-off-by: Ricardo Neri 
---
Changes since v5 (as part of patchset for perf change for Alderlake)
 * None

Changes since v4 (as part of patchset for perf change for Alderlake)
 * Put the X86_HYBRID_CPU_TYPE_ID_SHIFT over the function where it is
   used (Boris) 
 * Add Acked-by

Changes since v3 (as part of patchset for perf change for Alderlake)
 * None

Changes since v2 (as part of patchset for perf change for Alderlake)
 * Use get_this_hybrid_cpu_type() to replace get_hybrid_cpu_type() to
   avoid the trouble of IPIs. The new function retrieves the type of the
   current hybrid CPU. It's good enough for perf. (Dave)
 * Remove definitions for Atom and Core CPU types. Perf will define a
   enum for the hybrid CPU type in the perf_event.h (Peter)
 * Remove X86_HYBRID_CPU_NATIVE_MODEL_ID_MASK. Not used in the patch
   set. (Kan)
 * Update the description accordingly. (Boris)

Changes since v1 (as part of patchset for perf change for Alderlake)
 * Removed cpuinfo_x86.x86_cpu_type. It can be added later if needed.
   Instead, implement helper functions that subsystems can use.(Boris)
 * Add definitions for Atom and Core CPU types. (Kan)

Changes since v1 (in a separate posting)
 * Simplify code by using cpuid_eax(). (Boris)
 * Reworded the commit message to clarify the concept of Intel Hybrid
   Technology. Stress that all CPUs can run the same instruction set
   and support the same features.
---
 arch/x86/include/asm/cpu.h  |  6 ++
 arch/x86/kernel/cpu/intel.c | 16 
 2 files changed, 22 insertions(+)

diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index da78ccb..610905d 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -45,6 +45,7 @@ extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 
*c);
 extern void switch_to_sld(unsigned long tifn);
 extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
 extern bool handle_guest_split_lock(unsigned long ip);
+u8 get_this_hybrid_cpu_type(void);
 #else
 static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
 static inline void switch_to_sld(unsigned long tifn) {}
@@ -57,6 +58,11 @@ static inline bool handle_guest_split_lock(unsigned long ip)
 {
return false;
 }
+
+static inline u8 get_this_hybrid_cpu_type(void)
+{
+   return 0;
+}
 #endif
 #ifdef CONFIG_IA32_FEAT_CTL
 void init_ia32_feat_ctl(struct cpuinfo_x86 *c);
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 0e422a5..26fb626 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -1195,3 +1195,19 @@ void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
cpu_model_supports_sld = true;
split_lock_setup();
 }
+
+#define X86_HYBRID_CPU_TYPE_ID_SHIFT   24
+
+/**
+ * get_this_hybrid_cpu_type() - Get the type of this hybrid CPU
+ *
+ * Returns the CPU type [31:24] (i.e., Atom or Core) of a CPU in
+ * a hybrid processor. If the processor is not hybrid, returns 0.
+ */
+u8 get_this_hybrid_cpu_type(void)
+{
+   if (!cpu_feature_enabled(X86_FEATURE_HYBRID_CPU))
+   return 0;
+
+   return cpuid_eax(0x001a) >> X86_HYBRID_CPU_TYPE_ID_SHIFT;
+}
-- 
2.7.4



[PATCH V6 18/25] perf/x86/intel: Add attr_update for Hybrid PMUs

2021-04-12 Thread kan . liang
From: Kan Liang 

The attribute_group for Hybrid PMUs should be different from the
previous cpu PMU. For example, cpumask is required for a Hybrid PMU.
The PMU type should be included in the event and format attributes.

Add hybrid_attr_update for the Hybrid PMU.
Check the PMU type in the is_visible() functions. Only display the
event or format for the matching Hybrid PMU.

Reviewed-by: Andi Kleen 
Signed-off-by: Kan Liang 
---
 arch/x86/events/intel/core.c | 120 ---
 1 file changed, 114 insertions(+), 6 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 4881209..ba24638 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -5118,6 +5118,106 @@ static const struct attribute_group *attr_update[] = {
NULL,
 };
 
+static bool is_attr_for_this_pmu(struct kobject *kobj, struct attribute *attr)
+{
+   struct device *dev = kobj_to_dev(kobj);
+   struct x86_hybrid_pmu *pmu =
+   container_of(dev_get_drvdata(dev), struct x86_hybrid_pmu, pmu);
+   struct perf_pmu_events_hybrid_attr *pmu_attr =
+   container_of(attr, struct perf_pmu_events_hybrid_attr, 
attr.attr);
+
+   return pmu->cpu_type & pmu_attr->pmu_type;
+}
+
+static umode_t hybrid_events_is_visible(struct kobject *kobj,
+   struct attribute *attr, int i)
+{
+   return is_attr_for_this_pmu(kobj, attr) ? attr->mode : 0;
+}
+
+static inline int hybrid_find_supported_cpu(struct x86_hybrid_pmu *pmu)
+{
+	int cpu = cpumask_first(&pmu->supported_cpus);
+
+   return (cpu >= nr_cpu_ids) ? -1 : cpu;
+}
+
+static umode_t hybrid_tsx_is_visible(struct kobject *kobj,
+struct attribute *attr, int i)
+{
+   struct device *dev = kobj_to_dev(kobj);
+   struct x86_hybrid_pmu *pmu =
+container_of(dev_get_drvdata(dev), struct x86_hybrid_pmu, pmu);
+   int cpu = hybrid_find_supported_cpu(pmu);
+
+	return (cpu >= 0) && is_attr_for_this_pmu(kobj, attr) && cpu_has(&cpu_data(cpu), X86_FEATURE_RTM) ? attr->mode : 0;
+}
+
+static umode_t hybrid_format_is_visible(struct kobject *kobj,
+   struct attribute *attr, int i)
+{
+   struct device *dev = kobj_to_dev(kobj);
+   struct x86_hybrid_pmu *pmu =
+   container_of(dev_get_drvdata(dev), struct x86_hybrid_pmu, pmu);
+   struct perf_pmu_format_hybrid_attr *pmu_attr =
+   container_of(attr, struct perf_pmu_format_hybrid_attr, 
attr.attr);
+   int cpu = hybrid_find_supported_cpu(pmu);
+
+   return (cpu >= 0) && (pmu->cpu_type & pmu_attr->pmu_type) ? attr->mode 
: 0;
+}
+
+static struct attribute_group hybrid_group_events_td  = {
+   .name   = "events",
+   .is_visible = hybrid_events_is_visible,
+};
+
+static struct attribute_group hybrid_group_events_mem = {
+   .name   = "events",
+   .is_visible = hybrid_events_is_visible,
+};
+
+static struct attribute_group hybrid_group_events_tsx = {
+   .name   = "events",
+   .is_visible = hybrid_tsx_is_visible,
+};
+
+static struct attribute_group hybrid_group_format_extra = {
+   .name   = "format",
+   .is_visible = hybrid_format_is_visible,
+};
+
+static ssize_t intel_hybrid_get_attr_cpus(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+   struct x86_hybrid_pmu *pmu =
+   container_of(dev_get_drvdata(dev), struct x86_hybrid_pmu, pmu);
+
+	return cpumap_print_to_pagebuf(true, buf, &pmu->supported_cpus);
+}
+
+static DEVICE_ATTR(cpus, S_IRUGO, intel_hybrid_get_attr_cpus, NULL);
+static struct attribute *intel_hybrid_cpus_attrs[] = {
+	&dev_attr_cpus.attr,
+   NULL,
+};
+
+static struct attribute_group hybrid_group_cpus = {
+   .attrs  = intel_hybrid_cpus_attrs,
+};
+
+static const struct attribute_group *hybrid_attr_update[] = {
+	&hybrid_group_events_td,
+	&hybrid_group_events_mem,
+	&hybrid_group_events_tsx,
+	&group_caps_gen,
+	&group_caps_lbr,
+	&hybrid_group_format_extra,
+	&group_default,
+	&hybrid_group_cpus,
+   NULL,
+};
+
 static struct attribute *empty_attrs;
 
 static void intel_pmu_check_num_counters(int *num_counters,
@@ -5861,14 +5961,22 @@ __init int intel_pmu_init(void)
 
snprintf(pmu_name_str, sizeof(pmu_name_str), "%s", name);
 
+   if (!is_hybrid()) {
+   group_events_td.attrs  = td_attr;
+   group_events_mem.attrs = mem_attr;
+   group_events_tsx.attrs = tsx_attr;
+   group_format_extra.attrs = extra_attr;
+   group_format_extra_skl.attrs = extra_skl_attr;
 
-   group_events_td.attrs  = td_attr;
-   group

[PATCH V6 20/25] perf/x86/intel: Add Alder Lake Hybrid support

2021-04-12 Thread kan . liang
From: Kan Liang 

Alder Lake Hybrid system has two different types of core, Golden Cove
core and Gracemont core. The Golden Cove core is registered to
"cpu_core" PMU. The Gracemont core is registered to "cpu_atom" PMU.

The difference between the two PMUs include:
- Number of GP and fixed counters
- Events
- The "cpu_core" PMU supports Topdown metrics.
  The "cpu_atom" PMU supports PEBS-via-PT.

The "cpu_core" PMU is similar to the Sapphire Rapids PMU, but without
PMEM.
The "cpu_atom" PMU is similar to Tremont, but with different events,
event_constraints, extra_regs and number of counters.

The mem-loads AUX event workaround only applies to the Golden Cove core.

Users may disable all CPUs of the same CPU type on the command line or
in the BIOS. In this case, perf still registers a PMU for that CPU
type, but its CPU mask is 0.

The current caps/pmu_name is usually the microarch codename. Assign
"alderlake_hybrid" to the caps/pmu_name of both PMUs to indicate the
hybrid Alder Lake microarchitecture.

Reviewed-by: Andi Kleen 
Signed-off-by: Kan Liang 
---
 arch/x86/events/intel/core.c | 255 ++-
 arch/x86/events/intel/ds.c   |   7 ++
 arch/x86/events/perf_event.h |   7 ++
 3 files changed, 268 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index ba24638..5272f34 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2076,6 +2076,14 @@ static struct extra_reg intel_tnt_extra_regs[] 
__read_mostly = {
EVENT_EXTRA_END
 };
 
+static struct extra_reg intel_grt_extra_regs[] __read_mostly = {
+   /* must define OFFCORE_RSP_X first, see intel_fixup_er() */
+   INTEL_UEVENT_EXTRA_REG(0x01b7, MSR_OFFCORE_RSP_0, 0x3full, 
RSP_0),
+   INTEL_UEVENT_EXTRA_REG(0x02b7, MSR_OFFCORE_RSP_1, 0x3full, 
RSP_1),
+   INTEL_UEVENT_PEBS_LDLAT_EXTRA_REG(0x5d0),
+   EVENT_EXTRA_END
+};
+
 #define KNL_OT_L2_HITE BIT_ULL(19) /* Other Tile L2 Hit */
 #define KNL_OT_L2_HITF BIT_ULL(20) /* Other Tile L2 Hit */
 #define KNL_MCDRAM_LOCAL   BIT_ULL(21)
@@ -2430,6 +2438,16 @@ static int icl_set_topdown_event_period(struct 
perf_event *event)
return 0;
 }
 
+static int adl_set_topdown_event_period(struct perf_event *event)
+{
+   struct x86_hybrid_pmu *pmu = hybrid_pmu(event->pmu);
+
+   if (pmu->cpu_type != hybrid_big)
+   return 0;
+
+   return icl_set_topdown_event_period(event);
+}
+
 static inline u64 icl_get_metrics_event_value(u64 metric, u64 slots, int idx)
 {
u32 val;
@@ -2570,6 +2588,17 @@ static u64 icl_update_topdown_event(struct perf_event 
*event)
 x86_pmu.num_topdown_events - 
1);
 }
 
+static u64 adl_update_topdown_event(struct perf_event *event)
+{
+   struct x86_hybrid_pmu *pmu = hybrid_pmu(event->pmu);
+
+   if (pmu->cpu_type != hybrid_big)
+   return 0;
+
+   return icl_update_topdown_event(event);
+}
+
+
 static void intel_pmu_read_topdown_event(struct perf_event *event)
 {
struct cpu_hw_events *cpuc = this_cpu_ptr(_hw_events);
@@ -3655,6 +3684,17 @@ static inline bool is_mem_loads_aux_event(struct 
perf_event *event)
return (event->attr.config & INTEL_ARCH_EVENT_MASK) == 
X86_CONFIG(.event=0x03, .umask=0x82);
 }
 
+static inline bool require_mem_loads_aux_event(struct perf_event *event)
+{
+   if (!(x86_pmu.flags & PMU_FL_MEM_LOADS_AUX))
+   return false;
+
+   if (is_hybrid())
+   return hybrid_pmu(event->pmu)->cpu_type == hybrid_big;
+
+   return true;
+}
+
 static inline bool intel_pmu_has_cap(struct perf_event *event, int idx)
 {
	union perf_capabilities *intel_cap = &hybrid(event->pmu, intel_cap);
@@ -3779,7 +3819,7 @@ static int intel_pmu_hw_config(struct perf_event *event)
 * event. The rule is to simplify the implementation of the check.
 * That's because perf cannot have a complete group at the moment.
 */
-   if (x86_pmu.flags & PMU_FL_MEM_LOADS_AUX &&
+   if (require_mem_loads_aux_event(event) &&
(event->attr.sample_type & PERF_SAMPLE_DATA_SRC) &&
is_mem_loads_event(event)) {
struct perf_event *leader = event->group_leader;
@@ -4056,6 +4096,39 @@ tfa_get_event_constraints(struct cpu_hw_events *cpuc, 
int idx,
return c;
 }
 
+static struct event_constraint *
+adl_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
+ struct perf_event *event)
+{
+   struct x86_hybrid_pmu *pmu = hybrid_pmu(event->pmu);
+
+   if (pmu->cpu_type == hybrid_big)
+   return spr_get_event_constraints(cpuc, idx, event);
+   else if (pmu->cpu_type == hybrid_small)
+   return tnt_get_event_constraints(cp

[PATCH V6 24/25] perf/x86/cstate: Add Alder Lake CPU support

2021-04-12 Thread kan . liang
From: Kan Liang 

Compared with the Rocket Lake, the CORE C1 Residency Counter is added
for Alder Lake, but the CORE C3 Residency Counter is removed. Other
counters are the same.

Create a new adl_cstates for Alder Lake. Update the comments
accordingly.

The External Design Specification (EDS) is not published yet. It comes
from an authoritative internal source.

The patch has been tested on real hardware.

Reviewed-by: Andi Kleen 
Signed-off-by: Kan Liang 
---
 arch/x86/events/intel/cstate.c | 39 +--
 1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/arch/x86/events/intel/cstate.c b/arch/x86/events/intel/cstate.c
index 407eee5..4333990 100644
--- a/arch/x86/events/intel/cstate.c
+++ b/arch/x86/events/intel/cstate.c
@@ -40,7 +40,7 @@
  * Model specific counters:
  * MSR_CORE_C1_RES: CORE C1 Residency Counter
  *  perf code: 0x00
- *  Available model: SLM,AMT,GLM,CNL,TNT
+ *  Available model: SLM,AMT,GLM,CNL,TNT,ADL
  *  Scope: Core (each processor core has a MSR)
  * MSR_CORE_C3_RESIDENCY: CORE C3 Residency Counter
  *perf code: 0x01
@@ -51,46 +51,49 @@
  *perf code: 0x02
  *Available model: SLM,AMT,NHM,WSM,SNB,IVB,HSW,BDW,
  * SKL,KNL,GLM,CNL,KBL,CML,ICL,TGL,
- * TNT,RKL
+ * TNT,RKL,ADL
  *Scope: Core
  * MSR_CORE_C7_RESIDENCY: CORE C7 Residency Counter
  *perf code: 0x03
  *Available model: SNB,IVB,HSW,BDW,SKL,CNL,KBL,CML,
- * ICL,TGL,RKL
+ * ICL,TGL,RKL,ADL
  *Scope: Core
  * MSR_PKG_C2_RESIDENCY:  Package C2 Residency Counter.
  *perf code: 0x00
  *Available model: SNB,IVB,HSW,BDW,SKL,KNL,GLM,CNL,
- * KBL,CML,ICL,TGL,TNT,RKL
+ * KBL,CML,ICL,TGL,TNT,RKL,ADL
  *Scope: Package (physical package)
  * MSR_PKG_C3_RESIDENCY:  Package C3 Residency Counter.
  *perf code: 0x01
  *Available model: NHM,WSM,SNB,IVB,HSW,BDW,SKL,KNL,
- * GLM,CNL,KBL,CML,ICL,TGL,TNT,RKL
+ * GLM,CNL,KBL,CML,ICL,TGL,TNT,RKL,
+ * ADL
  *Scope: Package (physical package)
  * MSR_PKG_C6_RESIDENCY:  Package C6 Residency Counter.
  *perf code: 0x02
  *Available model: SLM,AMT,NHM,WSM,SNB,IVB,HSW,BDW,
  * SKL,KNL,GLM,CNL,KBL,CML,ICL,TGL,
- * TNT,RKL
+ * TNT,RKL,ADL
  *Scope: Package (physical package)
  * MSR_PKG_C7_RESIDENCY:  Package C7 Residency Counter.
  *perf code: 0x03
  *Available model: NHM,WSM,SNB,IVB,HSW,BDW,SKL,CNL,
- * KBL,CML,ICL,TGL,RKL
+ * KBL,CML,ICL,TGL,RKL,ADL
  *Scope: Package (physical package)
  * MSR_PKG_C8_RESIDENCY:  Package C8 Residency Counter.
  *perf code: 0x04
- *Available model: HSW ULT,KBL,CNL,CML,ICL,TGL,RKL
+ *Available model: HSW ULT,KBL,CNL,CML,ICL,TGL,RKL,
+ * ADL
  *Scope: Package (physical package)
  * MSR_PKG_C9_RESIDENCY:  Package C9 Residency Counter.
  *perf code: 0x05
- *Available model: HSW ULT,KBL,CNL,CML,ICL,TGL,RKL
+ *Available model: HSW ULT,KBL,CNL,CML,ICL,TGL,RKL,
+ * ADL
  *Scope: Package (physical package)
  * MSR_PKG_C10_RESIDENCY: Package C10 Residency Counter.
  *perf code: 0x06
  *Available model: HSW ULT,KBL,GLM,CNL,CML,ICL,TGL,
- * TNT,RKL
+ * TNT,RKL,ADL
  *Scope: Package (physical package)
  *
  */
@@ -563,6 +566,20 @@ static const struct cstate_model icl_cstates __initconst = 
{
  BIT(PERF_CSTATE_PKG_C10_RES),
 };
 
+static const struct

[PATCH V6 19/25] perf/x86: Support filter_match callback

2021-04-12 Thread kan . liang
From: Kan Liang 

Implement the filter_match callback for X86, which checks whether an
event is schedulable on the current CPU.
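
The x86 callback simply forwards to an optional x86_pmu.filter_match()
hook. On a hybrid system the hook is expected to test the supported-CPU
mask of the event's PMU; a minimal sketch (the actual hybrid hook is
added in a later patch of this series):

	static int intel_pmu_filter_match(struct perf_event *event)
	{
		struct x86_hybrid_pmu *pmu = hybrid_pmu(event->pmu);
		unsigned int cpu = smp_processor_id();

		/* Schedulable only on CPUs this hybrid PMU covers. */
		return cpumask_test_cpu(cpu, &pmu->supported_cpus);
	}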

Reviewed-by: Andi Kleen 
Signed-off-by: Kan Liang 
---
 arch/x86/events/core.c   | 10 ++
 arch/x86/events/perf_event.h |  1 +
 2 files changed, 11 insertions(+)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index ba3736c..8fc45b8 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2642,6 +2642,14 @@ static int x86_pmu_aux_output_match(struct perf_event 
*event)
return 0;
 }
 
+static int x86_pmu_filter_match(struct perf_event *event)
+{
+   if (x86_pmu.filter_match)
+   return x86_pmu.filter_match(event);
+
+   return 1;
+}
+
 static struct pmu pmu = {
.pmu_enable = x86_pmu_enable,
.pmu_disable= x86_pmu_disable,
@@ -2669,6 +2677,8 @@ static struct pmu pmu = {
.check_period   = x86_pmu_check_period,
 
.aux_output_match   = x86_pmu_aux_output_match,
+
+   .filter_match   = x86_pmu_filter_match,
 };
 
 void arch_perf_update_userpage(struct perf_event *event,
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 4d94ec9..0051c87 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -880,6 +880,7 @@ struct x86_pmu {
 
int (*aux_output_match) (struct perf_event *event);
 
+   int (*filter_match)(struct perf_event *event);
/*
 * Hybrid support
 *
-- 
2.7.4



[PATCH V6 25/25] perf/x86/rapl: Add support for Intel Alder Lake

2021-04-12 Thread kan . liang
From: Zhang Rui 

Alder Lake RAPL support is the same as the previous Sky Lake.
Add the Alder Lake models for RAPL.

Reviewed-by: Andi Kleen 
Signed-off-by: Zhang Rui 
---
 arch/x86/events/rapl.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index f42a704..84a1042 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -800,6 +800,8 @@ static const struct x86_cpu_id rapl_model_match[] 
__initconst = {
X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_X,   _hsx),
X86_MATCH_INTEL_FAM6_MODEL(COMETLAKE_L, _skl),
X86_MATCH_INTEL_FAM6_MODEL(COMETLAKE,   _skl),
+   X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE,   _skl),
+   X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE_L, _skl),
X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X,_spr),
X86_MATCH_VENDOR_FAM(AMD,   0x17,   _amd_fam17h),
X86_MATCH_VENDOR_FAM(HYGON, 0x18,   _amd_fam17h),
-- 
2.7.4



[PATCH V6 22/25] perf/x86/intel/uncore: Add Alder Lake support

2021-04-12 Thread kan . liang
From: Kan Liang 

The uncore subsystem for Alder Lake is similar to the previous Tiger
Lake.

The differences include:
- New MSR addresses for global control, fixed counters, CBOX and ARB.
  Add a new adl_uncore_msr_ops for uncore operations.
- Add a new threshold field for CBOX.
- New PCIIDs for IMC devices.

Reviewed-by: Andi Kleen 
Signed-off-by: Kan Liang 
---
 arch/x86/events/intel/uncore.c |   7 ++
 arch/x86/events/intel/uncore.h |   1 +
 arch/x86/events/intel/uncore_snb.c | 131 +
 3 files changed, 139 insertions(+)

diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index 35b3470..70816f3 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -1740,6 +1740,11 @@ static const struct intel_uncore_init_fun 
rkl_uncore_init __initconst = {
.pci_init = skl_uncore_pci_init,
 };
 
+static const struct intel_uncore_init_fun adl_uncore_init __initconst = {
+   .cpu_init = adl_uncore_cpu_init,
+   .mmio_init = tgl_uncore_mmio_init,
+};
+
 static const struct intel_uncore_init_fun icx_uncore_init __initconst = {
.cpu_init = icx_uncore_cpu_init,
.pci_init = icx_uncore_pci_init,
@@ -1794,6 +1799,8 @@ static const struct x86_cpu_id intel_uncore_match[] 
__initconst = {
X86_MATCH_INTEL_FAM6_MODEL(TIGERLAKE_L, _l_uncore_init),
X86_MATCH_INTEL_FAM6_MODEL(TIGERLAKE,   _uncore_init),
X86_MATCH_INTEL_FAM6_MODEL(ROCKETLAKE,  _uncore_init),
+   X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE,   _uncore_init),
+   X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE_L, _uncore_init),
X86_MATCH_INTEL_FAM6_MODEL(ATOM_TREMONT_D,  _uncore_init),
{},
 };
diff --git a/arch/x86/events/intel/uncore.h b/arch/x86/events/intel/uncore.h
index 549cfb2..426212f 100644
--- a/arch/x86/events/intel/uncore.h
+++ b/arch/x86/events/intel/uncore.h
@@ -575,6 +575,7 @@ void snb_uncore_cpu_init(void);
 void nhm_uncore_cpu_init(void);
 void skl_uncore_cpu_init(void);
 void icl_uncore_cpu_init(void);
+void adl_uncore_cpu_init(void);
 void tgl_uncore_cpu_init(void);
 void tgl_uncore_mmio_init(void);
 void tgl_l_uncore_mmio_init(void);
diff --git a/arch/x86/events/intel/uncore_snb.c 
b/arch/x86/events/intel/uncore_snb.c
index 5127128..0f63706 100644
--- a/arch/x86/events/intel/uncore_snb.c
+++ b/arch/x86/events/intel/uncore_snb.c
@@ -62,6 +62,8 @@
 #define PCI_DEVICE_ID_INTEL_TGL_H_IMC  0x9a36
 #define PCI_DEVICE_ID_INTEL_RKL_1_IMC  0x4c43
 #define PCI_DEVICE_ID_INTEL_RKL_2_IMC  0x4c53
+#define PCI_DEVICE_ID_INTEL_ADL_1_IMC  0x4660
+#define PCI_DEVICE_ID_INTEL_ADL_2_IMC  0x4641
 
 /* SNB event control */
 #define SNB_UNC_CTL_EV_SEL_MASK0x00ff
@@ -131,12 +133,33 @@
 #define ICL_UNC_ARB_PER_CTR0x3b1
 #define ICL_UNC_ARB_PERFEVTSEL 0x3b3
 
+/* ADL uncore global control */
+#define ADL_UNC_PERF_GLOBAL_CTL0x2ff0
+#define ADL_UNC_FIXED_CTR_CTRL  0x2fde
+#define ADL_UNC_FIXED_CTR   0x2fdf
+
+/* ADL Cbo register */
+#define ADL_UNC_CBO_0_PER_CTR0 0x2002
+#define ADL_UNC_CBO_0_PERFEVTSEL0  0x2000
+#define ADL_UNC_CTL_THRESHOLD  0x3f00
+#define ADL_UNC_RAW_EVENT_MASK (SNB_UNC_CTL_EV_SEL_MASK | \
+SNB_UNC_CTL_UMASK_MASK | \
+SNB_UNC_CTL_EDGE_DET | \
+SNB_UNC_CTL_INVERT | \
+ADL_UNC_CTL_THRESHOLD)
+
+/* ADL ARB register */
+#define ADL_UNC_ARB_PER_CTR0   0x2FD2
+#define ADL_UNC_ARB_PERFEVTSEL00x2FD0
+#define ADL_UNC_ARB_MSR_OFFSET 0x8
+
 DEFINE_UNCORE_FORMAT_ATTR(event, event, "config:0-7");
 DEFINE_UNCORE_FORMAT_ATTR(umask, umask, "config:8-15");
 DEFINE_UNCORE_FORMAT_ATTR(edge, edge, "config:18");
 DEFINE_UNCORE_FORMAT_ATTR(inv, inv, "config:23");
 DEFINE_UNCORE_FORMAT_ATTR(cmask5, cmask, "config:24-28");
 DEFINE_UNCORE_FORMAT_ATTR(cmask8, cmask, "config:24-31");
+DEFINE_UNCORE_FORMAT_ATTR(threshold, threshold, "config:24-29");
 
 /* Sandy Bridge uncore support */
 static void snb_uncore_msr_enable_event(struct intel_uncore_box *box, struct 
perf_event *event)
@@ -422,6 +445,106 @@ void tgl_uncore_cpu_init(void)
skl_uncore_msr_ops.init_box = rkl_uncore_msr_init_box;
 }
 
+static void adl_uncore_msr_init_box(struct intel_uncore_box *box)
+{
+   if (box->pmu->pmu_idx == 0)
+   wrmsrl(ADL_UNC_PERF_GLOBAL_CTL, SNB_UNC_GLOBAL_CTL_EN);
+}
+
+static void adl_uncore_msr_enable_box(struct intel_uncore_box *box)
+{
+   wrmsrl(ADL_UNC_PERF_GLOBAL_CTL, SNB_UNC_GLOBAL_CTL_EN);
+}
+
+static void adl_uncore_ms

[PATCH V6 21/25] perf: Extend PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE

2021-04-12 Thread kan . liang
From: Kan Liang 

Current Hardware events and Hardware cache events have special perf
types, PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE. The two types don't
pass the PMU type in the user interface. For a hybrid system, the perf
subsystem doesn't know which PMU the events belong to. The first capable
PMU will always be assigned to the events. The events never get a chance
to run on the other capable PMUs.

Extend the two types to become PMU aware types. The PMU type ID is
stored at attr.config[63:32].
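
As an illustration only (not part of this patch), a user-space caller on
a hybrid system could encode the target PMU like the sketch below. The
PMU type ID is assumed to be read from the PMU's sysfs "type" file,
e.g. /sys/bus/event_source/devices/cpu_core/type:

#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* sketch: open cycles on one specific hybrid PMU; pmu_type is assumed
 * to come from the PMU's sysfs "type" file
 */
static int open_cycles_on_pmu(unsigned long long pmu_type)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size   = sizeof(attr);
	attr.type   = PERF_TYPE_HARDWARE;
	/* PMU type ID in config[63:32], hardware event ID in config[7:0] */
	attr.config = (pmu_type << 32) | PERF_COUNT_HW_CPU_CYCLES;

	return syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
}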

Add a new PMU capability, PERF_PMU_CAP_EXTENDED_HW_TYPE, to indicate a
PMU which supports the extended PERF_TYPE_HARDWARE and
PERF_TYPE_HW_CACHE.

The PMU type is only required when searching a specific PMU. The PMU
specific code will only be interested in the 'real' config value, which
is stored in the low 32 bits of the event->attr.config. Update the
event->attr.config in the generic code, so the PMU specific code doesn't
need to calculate it separately.

If a user specifies a PMU type, but the PMU doesn't support the extended
type, error out.

If an event cannot be initialized in a PMU specified by a user, error
out immediately. Perf should not try to open it on other PMUs.

The new PMU capability is only set for the X86 hybrid PMUs for now.
Other architectures, e.g., ARM, may need it as well. The support on ARM
may be implemented later separately.

Cc: Mark Rutland 
Suggested-by: Peter Zijlstra (Intel) 
Signed-off-by: Kan Liang 
---
 arch/x86/events/core.c  |  1 +
 include/linux/perf_event.h  | 19 ++-
 include/uapi/linux/perf_event.h | 15 +++
 kernel/events/core.c| 19 ---
 4 files changed, 42 insertions(+), 12 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 8fc45b8..88f2dbf 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2174,6 +2174,7 @@ static int __init init_hw_perf_events(void)
hybrid_pmu->pmu.type = -1;
hybrid_pmu->pmu.attr_update = x86_pmu.attr_update;
hybrid_pmu->pmu.capabilities |= PERF_PMU_CAP_HETEROGENEOUS_CPUS;
+   hybrid_pmu->pmu.capabilities |= PERF_PMU_CAP_EXTENDED_HW_TYPE;
 
err = perf_pmu_register(&hybrid_pmu->pmu, hybrid_pmu->name,
            (hybrid_pmu->cpu_type == hybrid_big) ? PERF_TYPE_RAW : -1);
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index b832e09..2f12dca 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -260,15 +260,16 @@ struct perf_event;
 /**
  * pmu::capabilities flags
  */
-#define PERF_PMU_CAP_NO_INTERRUPT  0x01
-#define PERF_PMU_CAP_NO_NMI0x02
-#define PERF_PMU_CAP_AUX_NO_SG 0x04
-#define PERF_PMU_CAP_EXTENDED_REGS 0x08
-#define PERF_PMU_CAP_EXCLUSIVE 0x10
-#define PERF_PMU_CAP_ITRACE0x20
-#define PERF_PMU_CAP_HETEROGENEOUS_CPUS0x40
-#define PERF_PMU_CAP_NO_EXCLUDE0x80
-#define PERF_PMU_CAP_AUX_OUTPUT0x100
+#define PERF_PMU_CAP_NO_INTERRUPT  0x0001
+#define PERF_PMU_CAP_NO_NMI0x0002
+#define PERF_PMU_CAP_AUX_NO_SG 0x0004
+#define PERF_PMU_CAP_EXTENDED_REGS 0x0008
+#define PERF_PMU_CAP_EXCLUSIVE 0x0010
+#define PERF_PMU_CAP_ITRACE0x0020
+#define PERF_PMU_CAP_HETEROGENEOUS_CPUS0x0040
+#define PERF_PMU_CAP_NO_EXCLUDE0x0080
+#define PERF_PMU_CAP_AUX_OUTPUT0x0100
+#define PERF_PMU_CAP_EXTENDED_HW_TYPE  0x0200
 
 struct perf_output_handle;
 
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index ad15e40..14332f4 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -38,6 +38,21 @@ enum perf_type_id {
 };
 
 /*
+ * attr.config layout for type PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE
+ * PERF_TYPE_HARDWARE:         0xEEEEEEEE000000AA
+ * AA: hardware event ID
+ * EEEEEEEE: PMU type ID
+ * PERF_TYPE_HW_CACHE:         0xEEEEEEEE00DDCCBB
+ * BB: hardware cache ID
+ * CC: hardware cache op ID
+ * DD: hardware cache op result ID
+ * EEEEEEEE: PMU type ID
+ * If the PMU type ID is 0, the PERF_TYPE_RAW will be applied.
+ */
+#define PERF_PMU_TYPE_SHIFT    32
+#define PERF_HW_EVENT_MASK 0xffffffff
+
+/*
  * Generalized performance event event_id types, used by the
  * attr.event_id parameter of the sys_perf_event_open()
  * syscall:
diff --git a/kernel/events/core.c b/kernel/events/core.c
index f079431..c87c51e

[PATCH V6 12/25] perf/x86/intel: Factor out intel_pmu_check_event_constraints

2021-04-12 Thread kan . liang
From: Kan Liang 

Each Hybrid PMU has to check and update its own event constraints before
registration.

The intel_pmu_check_event_constraints will be reused later to check
the event constraints of each hybrid PMU.

Reviewed-by: Andi Kleen 
Signed-off-by: Kan Liang 
---
 arch/x86/events/intel/core.c | 82 +---
 1 file changed, 47 insertions(+), 35 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index d7e2021..5c5f330 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -5084,6 +5084,49 @@ static void intel_pmu_check_num_counters(int 
*num_counters,
*intel_ctrl |= fixed_mask << INTEL_PMC_IDX_FIXED;
 }
 
+static void intel_pmu_check_event_constraints(struct event_constraint 
*event_constraints,
+ int num_counters,
+ int num_counters_fixed,
+ u64 intel_ctrl)
+{
+   struct event_constraint *c;
+
+   if (!event_constraints)
+   return;
+
+   /*
+* event on fixed counter2 (REF_CYCLES) only works on this
+* counter, so do not extend mask to generic counters
+*/
+   for_each_event_constraint(c, event_constraints) {
+   /*
+* Don't extend the topdown slots and metrics
+* events to the generic counters.
+*/
+   if (c->idxmsk64 & INTEL_PMC_MSK_TOPDOWN) {
+   /*
+* Disable topdown slots and metrics events,
+* if slots event is not in CPUID.
+*/
+   if (!(INTEL_PMC_MSK_FIXED_SLOTS & intel_ctrl))
+   c->idxmsk64 = 0;
+   c->weight = hweight64(c->idxmsk64);
+   continue;
+   }
+
+   if (c->cmask == FIXED_EVENT_FLAGS) {
+   /* Disabled fixed counters which are not in CPUID */
+   c->idxmsk64 &= intel_ctrl;
+
+   if (c->idxmsk64 != INTEL_PMC_MSK_FIXED_REF_CYCLES)
+   c->idxmsk64 |= (1ULL << num_counters) - 1;
+   }
+   c->idxmsk64 &=
+   ~(~0ULL << (INTEL_PMC_IDX_FIXED + num_counters_fixed));
+   c->weight = hweight64(c->idxmsk64);
+   }
+}
+
 __init int intel_pmu_init(void)
 {
struct attribute **extra_skl_attr = &empty_attrs;
@@ -5094,7 +5137,6 @@ __init int intel_pmu_init(void)
union cpuid10_edx edx;
union cpuid10_eax eax;
union cpuid10_ebx ebx;
-   struct event_constraint *c;
unsigned int fixed_mask;
struct extra_reg *er;
bool pmem = false;
@@ -5732,40 +5774,10 @@ __init int intel_pmu_init(void)
if (x86_pmu.intel_cap.anythread_deprecated)
x86_pmu.format_attrs = intel_arch_formats_attr;
 
-   if (x86_pmu.event_constraints) {
-   /*
-* event on fixed counter2 (REF_CYCLES) only works on this
-* counter, so do not extend mask to generic counters
-*/
-   for_each_event_constraint(c, x86_pmu.event_constraints) {
-   /*
-* Don't extend the topdown slots and metrics
-* events to the generic counters.
-*/
-   if (c->idxmsk64 & INTEL_PMC_MSK_TOPDOWN) {
-   /*
-* Disable topdown slots and metrics events,
-* if slots event is not in CPUID.
-*/
-   if (!(INTEL_PMC_MSK_FIXED_SLOTS & 
x86_pmu.intel_ctrl))
-   c->idxmsk64 = 0;
-   c->weight = hweight64(c->idxmsk64);
-   continue;
-   }
-
-   if (c->cmask == FIXED_EVENT_FLAGS) {
-   /* Disabled fixed counters which are not in 
CPUID */
-   c->idxmsk64 &= x86_pmu.intel_ctrl;
-
-   if (c->idxmsk64 != 
INTEL_PMC_MSK_FIXED_REF_CYCLES)
-   c->idxmsk64 |= (1ULL << 
x86_pmu.num_counters) - 1;
-   }
-   c->idxmsk64 &=
-   ~(~0ULL << (INTEL_PMC_IDX_FIXED + 
x86_pmu.num_counters_fixed));
-   c->weight = hweight64(c->idxmsk64);
-   }
-   }
-
+   intel_pmu_check_event_constraints(x86_pmu.event_constraints,
+ x86_pmu.num_counters,
+ 

[PATCH V6 13/25] perf/x86/intel: Factor out intel_pmu_check_extra_regs

2021-04-12 Thread kan . liang
From: Kan Liang 

Each Hybrid PMU has to check and update its own extra registers before
registration.

The intel_pmu_check_extra_regs will be reused later to check the extra
registers of each hybrid PMU.

Reviewed-by: Andi Kleen 
Signed-off-by: Kan Liang 
---
 arch/x86/events/intel/core.c | 35 +--
 1 file changed, 21 insertions(+), 14 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 5c5f330..55ccfbb 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -5127,6 +5127,26 @@ static void intel_pmu_check_event_constraints(struct 
event_constraint *event_con
}
 }
 
+static void intel_pmu_check_extra_regs(struct extra_reg *extra_regs)
+{
+   struct extra_reg *er;
+
+   /*
+* Access extra MSR may cause #GP under certain circumstances.
+* E.g. KVM doesn't support offcore event
+* Check all extra_regs here.
+*/
+   if (!extra_regs)
+   return;
+
+   for (er = extra_regs; er->msr; er++) {
+   er->extra_msr_access = check_msr(er->msr, 0x11UL);
+   /* Disable LBR select mapping */
+   if ((er->idx == EXTRA_REG_LBR) && !er->extra_msr_access)
+   x86_pmu.lbr_sel_map = NULL;
+   }
+}
+
 __init int intel_pmu_init(void)
 {
struct attribute **extra_skl_attr = &empty_attrs;
@@ -5138,7 +5158,6 @@ __init int intel_pmu_init(void)
union cpuid10_eax eax;
union cpuid10_ebx ebx;
unsigned int fixed_mask;
-   struct extra_reg *er;
bool pmem = false;
int version, i;
char *name;
@@ -5795,19 +5814,7 @@ __init int intel_pmu_init(void)
if (x86_pmu.lbr_nr)
pr_cont("%d-deep LBR, ", x86_pmu.lbr_nr);
 
-   /*
-* Access extra MSR may cause #GP under certain circumstances.
-* E.g. KVM doesn't support offcore event
-* Check all extra_regs here.
-*/
-   if (x86_pmu.extra_regs) {
-   for (er = x86_pmu.extra_regs; er->msr; er++) {
-   er->extra_msr_access = check_msr(er->msr, 0x11UL);
-   /* Disable LBR select mapping */
-   if ((er->idx == EXTRA_REG_LBR) && !er->extra_msr_access)
-   x86_pmu.lbr_sel_map = NULL;
-   }
-   }
+   intel_pmu_check_extra_regs(x86_pmu.extra_regs);
 
/* Support full width counters using alternative MSR range */
if (x86_pmu.intel_cap.full_width_write) {
-- 
2.7.4



[PATCH V6 09/25] perf/x86: Hybrid PMU support for event constraints

2021-04-12 Thread kan . liang
From: Kan Liang 

The events are different among hybrid PMUs. Each hybrid PMU should use
its own event constraints.

Reviewed-by: Andi Kleen 
Signed-off-by: Kan Liang 
---
 arch/x86/events/core.c   | 3 ++-
 arch/x86/events/intel/core.c | 5 +++--
 arch/x86/events/intel/ds.c   | 5 +++--
 arch/x86/events/perf_event.h | 2 ++
 4 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index a5f8a5e..f3e6fb0 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1519,6 +1519,7 @@ void perf_event_print_debug(void)
struct cpu_hw_events *cpuc = &per_cpu(cpu_hw_events, cpu);
int num_counters = hybrid(cpuc->pmu, num_counters);
int num_counters_fixed = hybrid(cpuc->pmu, num_counters_fixed);
+   struct event_constraint *pebs_constraints = hybrid(cpuc->pmu, 
pebs_constraints);
unsigned long flags;
int idx;
 
@@ -1538,7 +1539,7 @@ void perf_event_print_debug(void)
pr_info("CPU#%d: status: %016llx\n", cpu, status);
pr_info("CPU#%d: overflow:   %016llx\n", cpu, overflow);
pr_info("CPU#%d: fixed:  %016llx\n", cpu, fixed);
-   if (x86_pmu.pebs_constraints) {
+   if (pebs_constraints) {
rdmsrl(MSR_IA32_PEBS_ENABLE, pebs);
pr_info("CPU#%d: pebs:   %016llx\n", cpu, pebs);
}
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 4cfc382f..447a80f 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3136,10 +3136,11 @@ struct event_constraint *
 x86_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
  struct perf_event *event)
 {
+   struct event_constraint *event_constraints = hybrid(cpuc->pmu, 
event_constraints);
struct event_constraint *c;
 
-   if (x86_pmu.event_constraints) {
-   for_each_event_constraint(c, x86_pmu.event_constraints) {
+   if (event_constraints) {
+   for_each_event_constraint(c, event_constraints) {
if (constraint_match(c, event->hw.config)) {
event->hw.flags |= c->flags;
return c;
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 312bf3b..f1402bc 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -959,13 +959,14 @@ struct event_constraint 
intel_spr_pebs_event_constraints[] = {
 
 struct event_constraint *intel_pebs_constraints(struct perf_event *event)
 {
+   struct event_constraint *pebs_constraints = hybrid(event->pmu, 
pebs_constraints);
struct event_constraint *c;
 
if (!event->attr.precise_ip)
return NULL;
 
-   if (x86_pmu.pebs_constraints) {
-   for_each_event_constraint(c, x86_pmu.pebs_constraints) {
+   if (pebs_constraints) {
+   for_each_event_constraint(c, pebs_constraints) {
if (constraint_match(c, event->hw.config)) {
event->hw.flags |= c->flags;
return c;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 10ef244..a38c5b6 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -649,6 +649,8 @@ struct x86_hybrid_pmu {
[PERF_COUNT_HW_CACHE_MAX]
[PERF_COUNT_HW_CACHE_OP_MAX]
[PERF_COUNT_HW_CACHE_RESULT_MAX];
+   struct event_constraint *event_constraints;
+   struct event_constraint *pebs_constraints;
 };
 
 static __always_inline struct x86_hybrid_pmu *hybrid_pmu(struct pmu *pmu)
-- 
2.7.4



[PATCH V6 07/25] perf/x86: Hybrid PMU support for unconstrained

2021-04-12 Thread kan . liang
From: Kan Liang 

The unconstrained value depends on the number of GP and fixed counters.
Each hybrid PMU should use its own unconstrained.

Suggested-by: Peter Zijlstra (Intel) 
Signed-off-by: Kan Liang 
---
 arch/x86/events/intel/core.c |  2 +-
 arch/x86/events/perf_event.h | 11 +++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 3ea0126e..4cfc382f 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3147,7 +3147,7 @@ x86_get_event_constraints(struct cpu_hw_events *cpuc, int 
idx,
}
}
 
-   return &unconstrained;
+   return &hybrid_var(cpuc->pmu, unconstrained);
 }
 
 static struct event_constraint *
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index df3689b..93d6479 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -639,6 +639,7 @@ struct x86_hybrid_pmu {
int max_pebs_events;
int num_counters;
int num_counters_fixed;
+   struct event_constraint unconstrained;
 };
 
 static __always_inline struct x86_hybrid_pmu *hybrid_pmu(struct pmu *pmu)
@@ -659,6 +660,16 @@ extern struct static_key_false perf_is_hybrid;
__Fp;   \
 }))
 
+#define hybrid_var(_pmu, _var) \
+(*({   \
+   typeof(&_var) __Fp = &_var; \
+   \
+   if (is_hybrid() && (_pmu))  \
+   __Fp = &hybrid_pmu(_pmu)->_var; \
+   \
+   __Fp;   \
+}))
+
 /*
  * struct x86_pmu - generic x86 pmu
  */
-- 
2.7.4



[PATCH V6 05/25] perf/x86: Hybrid PMU support for intel_ctrl

2021-04-12 Thread kan . liang
From: Kan Liang 

The intel_ctrl is the counter mask of a PMU. The PMU counter information
may be different among hybrid PMUs, each hybrid PMU should use its own
intel_ctrl to check and access the counters.

When handling a certain hybrid PMU, apply the intel_ctrl from the
corresponding hybrid PMU.

When checking the HW existence, apply the PMU and number of counters
from the corresponding hybrid PMU as well. Perf will check the HW
existence for each Hybrid PMU before registration. Expose the
check_hw_exists() for a later patch.
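
As a rough sketch only (not the literal later patch), the exposed helper
could then be used per hybrid PMU before registration; the loop below is
an assumption, using the x86_hybrid_pmu naming of this series:

/* sketch: probe the counters of each hybrid PMU before registering it */
for (i = 0; i < x86_pmu.num_hybrid_pmus; i++) {
	struct x86_hybrid_pmu *h = &x86_pmu.hybrid_pmu[i];

	if (!check_hw_exists(&h->pmu, h->num_counters, h->num_counters_fixed))
		continue;	/* skip registration of this hybrid PMU */
}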

Reviewed-by: Andi Kleen 
Signed-off-by: Kan Liang 
---
 arch/x86/events/core.c   | 14 +++---
 arch/x86/events/intel/core.c | 14 +-
 arch/x86/events/perf_event.h | 10 --
 3 files changed, 24 insertions(+), 14 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index a8e7247..2382ace 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -231,7 +231,7 @@ static void release_pmc_hardware(void) {}
 
 #endif
 
-static bool check_hw_exists(void)
+bool check_hw_exists(struct pmu *pmu, int num_counters, int num_counters_fixed)
 {
u64 val, val_fail = -1, val_new= ~0;
int i, reg, reg_fail = -1, ret = 0;
@@ -242,7 +242,7 @@ static bool check_hw_exists(void)
 * Check to see if the BIOS enabled any of the counters, if so
 * complain and bail.
 */
-   for (i = 0; i < x86_pmu.num_counters; i++) {
+   for (i = 0; i < num_counters; i++) {
reg = x86_pmu_config_addr(i);
ret = rdmsrl_safe(reg, &val);
if (ret)
@@ -256,13 +256,13 @@ static bool check_hw_exists(void)
}
}
 
-   if (x86_pmu.num_counters_fixed) {
+   if (num_counters_fixed) {
reg = MSR_ARCH_PERFMON_FIXED_CTR_CTRL;
ret = rdmsrl_safe(reg, &val);
if (ret)
goto msr_fail;
-   for (i = 0; i < x86_pmu.num_counters_fixed; i++) {
-   if (fixed_counter_disabled(i))
+   for (i = 0; i < num_counters_fixed; i++) {
+   if (fixed_counter_disabled(i, pmu))
continue;
if (val & (0x03 << i*4)) {
bios_fail = 1;
@@ -1548,7 +1548,7 @@ void perf_event_print_debug(void)
cpu, idx, prev_left);
}
for (idx = 0; idx < x86_pmu.num_counters_fixed; idx++) {
-   if (fixed_counter_disabled(idx))
+   if (fixed_counter_disabled(idx, cpuc->pmu))
continue;
rdmsrl(MSR_ARCH_PERFMON_FIXED_CTR0 + idx, pmc_count);
 
@@ -1993,7 +1993,7 @@ static int __init init_hw_perf_events(void)
pmu_check_apic();
 
/* sanity check that the hardware exists or is emulated */
-   if (!check_hw_exists())
+   if (!check_hw_exists(&pmu, x86_pmu.num_counters, x86_pmu.num_counters_fixed))
return 0;
 
pr_cont("%s PMU driver.\n", x86_pmu.name);
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index dc9e2fb..2d56055 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2153,10 +2153,11 @@ static void intel_pmu_disable_all(void)
 static void __intel_pmu_enable_all(int added, bool pmi)
 {
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+   u64 intel_ctrl = hybrid(cpuc->pmu, intel_ctrl);
 
intel_pmu_lbr_enable_all(pmi);
wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL,
-   x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_guest_mask);
+  intel_ctrl & ~cpuc->intel_ctrl_guest_mask);
 
if (test_bit(INTEL_PMC_IDX_FIXED_BTS, cpuc->active_mask)) {
struct perf_event *event =
@@ -2709,6 +2710,7 @@ int intel_pmu_save_and_restart(struct perf_event *event)
 static void intel_pmu_reset(void)
 {
struct debug_store *ds = __this_cpu_read(cpu_hw_events.ds);
+   struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
unsigned long flags;
int idx;
 
@@ -2724,7 +2726,7 @@ static void intel_pmu_reset(void)
wrmsrl_safe(x86_pmu_event_addr(idx),  0ull);
}
for (idx = 0; idx < x86_pmu.num_counters_fixed; idx++) {
-   if (fixed_counter_disabled(idx))
+   if (fixed_counter_disabled(idx, cpuc->pmu))
continue;
wrmsrl_safe(MSR_ARCH_PERFMON_FIXED_CTR0 + idx, 0ull);
}
@@ -2753,6 +2755,7 @@ static int handle_pmi_common(struct pt_regs *regs, u64 
status)
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
int bit;
int handled = 0;
+   u64 intel_ctrl = hybrid(cpuc->pmu, intel_ctrl);
 
inc_irq_stat(apic_perf_irqs);
 
@@ -2798,7 +2801,7 @@ static int handle_pmi_common(struct pt_regs *regs, u64 
status)
 
handled++;

[PATCH V6 01/25] x86/cpufeatures: Enumerate Intel Hybrid Technology feature bit

2021-04-12 Thread kan . liang
From: Ricardo Neri 

Add feature enumeration to identify a processor with Intel Hybrid
Technology: one in which CPUs of more than one type are in the same package.
On a hybrid processor, all CPUs support the same homogeneous (i.e.,
symmetric) instruction set. All CPUs enumerate the same features in CPUID.
Thus, software (user space and kernel) can run and migrate to any CPU in
the system as well as utilize any of the enumerated features without any
change or special provisions. The main differences among CPUs in a hybrid
processor are power and performance properties.
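
The flag string is deliberately empty (""), so the bit does not show up
in /proc/cpuinfo; kernel code is expected to test it directly, for
example:

/* illustrative check only */
if (boot_cpu_has(X86_FEATURE_HYBRID_CPU))
	pr_info("hybrid part: CPUs of more than one type in this package\n");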

Cc: Andi Kleen 
Cc: Kan Liang 
Cc: "Peter Zijlstra (Intel)" 
Cc: "Rafael J. Wysocki" 
Cc: "Ravi V. Shankar" 
Cc: Srinivas Pandruvada 
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Len Brown 
Reviewed-by: Tony Luck 
Acked-by: Borislav Petkov 
Signed-off-by: Ricardo Neri 
---
Changes since v5 (as part of patchset for perf change for Alderlake)
 * None

Changes since v4 (as part of patchset for perf change for Alderlake)
 * Add Acked-by

Changes since v3 (as part of patchset for perf change for Alderlake)
 * None

Changes since V2 (as part of patchset for perf change for Alderlake)
 * Don't show "hybrid_cpu" in /proc/cpuinfo (Boris)

Changes since v1 (as part of patchset for perf change for Alderlake)
 * None

Changes since v1 (in a separate posting):
 * Reworded commit message to clearly state what is Intel Hybrid
   Technology. Stress that all CPUs can run the same instruction
   set and support the same features.
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h 
b/arch/x86/include/asm/cpufeatures.h
index cc96e26..1ba4a6e 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -374,6 +374,7 @@
 #define X86_FEATURE_MD_CLEAR   (18*32+10) /* VERW clears CPU buffers */
 #define X86_FEATURE_TSX_FORCE_ABORT(18*32+13) /* "" TSX_FORCE_ABORT */
 #define X86_FEATURE_SERIALIZE  (18*32+14) /* SERIALIZE instruction */
+#define X86_FEATURE_HYBRID_CPU (18*32+15) /* "" This part has CPUs of 
more than one type */
 #define X86_FEATURE_TSXLDTRK   (18*32+16) /* TSX Suspend Load Address 
Tracking */
 #define X86_FEATURE_PCONFIG(18*32+18) /* Intel PCONFIG */
 #define X86_FEATURE_ARCH_LBR   (18*32+19) /* Intel ARCH LBR */
-- 
2.7.4



[PATCH V6 00/25] Add Alder Lake support for perf (kernel)

2021-04-12 Thread kan . liang
From: Kan Liang 

Changes since V5:
- Add a new static_key_false "perf_is_hybrid" to indicate a hybrid
  system. Update hybrid() so we can get a pointer for the hybrid
  variables (Peter) (Patch 4 & 20)
- Use (not change) the x86_pmu.intel_cap.pebs_output_pt_available in
  intel_pmu_aux_output_match(). Drop the PEBS via PT support on the
  small core for now. A separate patch to enable the PEBS via PT
  support on the small core may be submitted later. (Patch 4)
- Add hybrid_var() for the variables which not in the x86_pmu, e.g.,
  unconstrained. (Peter) (Patch 7 & 8)
- Only register the hybrid PMUs when all PMUs are registered
  correctly. (Peter) (Patch 16)
- Remove two new perf types. Extend the two existing types,
  PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE, to become PMU aware types.
  (Peter) (Patch 21)

Changes since V4:
- Put the X86_HYBRID_CPU_TYPE_ID_SHIFT over the function where it is
  used (Boris) (Patch 2)
- Add Acked-by from Boris for Patch 1 & 2
- Fix a smatch warning, "allocate_fake_cpuc() warn: possible memory
  leak of 'cpuc'" (0-DAY test) (Patch 16)

Changes since V3:
- Check whether the supported_cpus is empty in allocate_fake_cpuc().
  A user may offline all the CPUs of a certain type. Perf should not
  create an event for that PMU. (Patch 16)
- Don't clear a cpuc->pmu when the cpu is offlined in intel_pmu_cpu_dead().
  We never unregister a PMU, even if all the CPUs of a certain type are
  offlined. A cpuc->pmu should be always valid and unchanged. There is no
  harm to keep the pointer of the PMU. Also, some functions, e.g.,
  release_lbr_buffers(), require a valid cpuc->pmu for each possible CPU.
  (Patch 16)
- ADL may have an alternative configuration. With that configuration
  X86_FEATURE_HYBRID_CPU is not set. Perf cannot retrieve the core type
  from the CPUID leaf 0x1a either.
  Use the number of hybrid PMUs, which implies a hybrid system, to replace
  the check of the X86_FEATURE_HYBRID_CPU. (Patch 4)
  Introduce a platform specific get_hybrid_cpu_type to retrieve the core
  type if the generic one doesn't return a valid core type. (Patch 16 & 20)

Changes since V2:
- Don't show "hybrid_cpu" in /proc/cpuinfo (Boris) (Patch 1)
- Use get_this_hybrid_cpu_type() to replace get_hybrid_cpu_type() to
  avoid the trouble of IPIs. The new function retrieves the type of the
  current hybrid CPU. It's good enough for perf. (Dave) (Patch 2)
- Remove definitions for Atom and Core CPU types. Perf will define a
  enum for the hybrid CPU type in the perf_event.h (Peter) (Patch 2 & 16)
- Remove X86_HYBRID_CPU_NATIVE_MODEL_ID_MASK. Not used in the patch set
  (Kan)(Patch 2)
- Update the description of the patch 2 accordingly. (Boris) (Patch 2)
- All the hybrid PMUs are registered at boot time. (Peter) (Patch 16)
- Align all ATTR things. (Peter) (Patch 20)
- The patchset doesn't change the caps/pmu_name. The perf tool doesn't
  rely on it to distinguish the event list. The caps/pmu_name is only to
  indicate the microarchitecture, which is the hybrid Alder Lake for
  both PMUs.

Changes since V1:
- Drop all user space patches, which will be reviewed later separately.
- Don't save the CPU type in struct cpuinfo_x86. Instead, provide helper
  functions to get parameters of hybrid CPUs. (Boris)
- Rework the perf kernel patches according to Peter's suggestion. The
  key changes include,
  - Code style changes. Drop all the macro which names in capital
letters.
  - Drop the hybrid PMU index, track the pointer of the hybrid PMU in
the per-CPU struct cpu_hw_events.
  - Fix the x86_get_pmu() support
  - Fix the allocate_fake_cpuc() support
  - Fix validate_group() support
  - Dynamically allocate the *hybrid_pmu for each hybrid PMU

Alder Lake uses a hybrid architecture utilizing Golden Cove cores
and Gracemont cores. On such architectures, all CPUs support the same,
homogeneous and symmetric, instruction set. Also, CPUID enumerates
the same features for all CPUs. There may be model-specific differences,
such as those addressed in this patchset.

The first two patches enumerate the hybrid CPU feature bit and provide
a helper function to get the CPU type of hybrid CPUs. (The initial idea
[1] was to save the CPU type in a new field x86_cpu_type in struct
cpuinfo_x86. Since the only user of the new field is perf, querying the
X86_FEATURE_HYBRID_CPU at the call site is a simpler alternative.[2])
Compared with the initial submission, the below two concerns[3][4] are
also addressed,
- Provide a good use case, PMU.
- Clarify what Intel Hybrid Technology is and is not.

The PMU capabilities for Golden Cove core and Gracemont core are not the
same. The key differences include the number of counters, events, perf
metrics feature, and PEBS-via-PT feature. A dedicated hybrid PMU has to
be registered for each of them. However, the current perf X86 assumes
that there is only one CPU PMU. To handle the hybrid PMUs, the patchset
- Introduce a new s

[PATCH V5 25/25] perf/x86/rapl: Add support for Intel Alder Lake

2021-04-05 Thread kan . liang
From: Zhang Rui 

Alder Lake RAPL support is the same as previous Sky Lake.
Add Alder Lake model for RAPL.

Reviewed-by: Andi Kleen 
Signed-off-by: Zhang Rui 
---
 arch/x86/events/rapl.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index f42a704..84a1042 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -800,6 +800,8 @@ static const struct x86_cpu_id rapl_model_match[] 
__initconst = {
X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_X,   &model_hsx),
X86_MATCH_INTEL_FAM6_MODEL(COMETLAKE_L, &model_skl),
X86_MATCH_INTEL_FAM6_MODEL(COMETLAKE,   &model_skl),
+   X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE,   &model_skl),
+   X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE_L, &model_skl),
X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X,&model_spr),
X86_MATCH_VENDOR_FAM(AMD,   0x17,   &model_amd_fam17h),
X86_MATCH_VENDOR_FAM(HYGON, 0x18,   &model_amd_fam17h),
-- 
2.7.4



[PATCH V5 24/25] perf/x86/cstate: Add Alder Lake CPU support

2021-04-05 Thread kan . liang
From: Kan Liang 

Compared with the Rocket Lake, the CORE C1 Residency Counter is added
for Alder Lake, but the CORE C3 Residency Counter is removed. Other
counters are the same.

Create a new adl_cstates for Alder Lake. Update the comments
accordingly.

The External Design Specification (EDS) is not published yet. It comes
from an authoritative internal source.

The patch has been tested on real hardware.
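
Consistent with the comment table in the diff (C1 added, C3 dropped), the
new adl_cstates would look roughly like the sketch below; the exact bit
list is inferred from that table, not quoted from the truncated hunk:

static const struct cstate_model adl_cstates __initconst = {
	.core_events		= BIT(PERF_CSTATE_CORE_C1_RES) |
				  BIT(PERF_CSTATE_CORE_C6_RES) |
				  BIT(PERF_CSTATE_CORE_C7_RES),

	.pkg_events		= BIT(PERF_CSTATE_PKG_C2_RES) |
				  BIT(PERF_CSTATE_PKG_C3_RES) |
				  BIT(PERF_CSTATE_PKG_C6_RES) |
				  BIT(PERF_CSTATE_PKG_C7_RES) |
				  BIT(PERF_CSTATE_PKG_C8_RES) |
				  BIT(PERF_CSTATE_PKG_C9_RES) |
				  BIT(PERF_CSTATE_PKG_C10_RES),
};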

Reviewed-by: Andi Kleen 
Signed-off-by: Kan Liang 
---
 arch/x86/events/intel/cstate.c | 39 +--
 1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/arch/x86/events/intel/cstate.c b/arch/x86/events/intel/cstate.c
index 407eee5..4333990 100644
--- a/arch/x86/events/intel/cstate.c
+++ b/arch/x86/events/intel/cstate.c
@@ -40,7 +40,7 @@
  * Model specific counters:
  * MSR_CORE_C1_RES: CORE C1 Residency Counter
  *  perf code: 0x00
- *  Available model: SLM,AMT,GLM,CNL,TNT
+ *  Available model: SLM,AMT,GLM,CNL,TNT,ADL
  *  Scope: Core (each processor core has a MSR)
  * MSR_CORE_C3_RESIDENCY: CORE C3 Residency Counter
  *perf code: 0x01
@@ -51,46 +51,49 @@
  *perf code: 0x02
  *Available model: SLM,AMT,NHM,WSM,SNB,IVB,HSW,BDW,
  * SKL,KNL,GLM,CNL,KBL,CML,ICL,TGL,
- * TNT,RKL
+ * TNT,RKL,ADL
  *Scope: Core
  * MSR_CORE_C7_RESIDENCY: CORE C7 Residency Counter
  *perf code: 0x03
  *Available model: SNB,IVB,HSW,BDW,SKL,CNL,KBL,CML,
- * ICL,TGL,RKL
+ * ICL,TGL,RKL,ADL
  *Scope: Core
  * MSR_PKG_C2_RESIDENCY:  Package C2 Residency Counter.
  *perf code: 0x00
  *Available model: SNB,IVB,HSW,BDW,SKL,KNL,GLM,CNL,
- * KBL,CML,ICL,TGL,TNT,RKL
+ * KBL,CML,ICL,TGL,TNT,RKL,ADL
  *Scope: Package (physical package)
  * MSR_PKG_C3_RESIDENCY:  Package C3 Residency Counter.
  *perf code: 0x01
  *Available model: NHM,WSM,SNB,IVB,HSW,BDW,SKL,KNL,
- * GLM,CNL,KBL,CML,ICL,TGL,TNT,RKL
+ * GLM,CNL,KBL,CML,ICL,TGL,TNT,RKL,
+ * ADL
  *Scope: Package (physical package)
  * MSR_PKG_C6_RESIDENCY:  Package C6 Residency Counter.
  *perf code: 0x02
  *Available model: SLM,AMT,NHM,WSM,SNB,IVB,HSW,BDW,
  * SKL,KNL,GLM,CNL,KBL,CML,ICL,TGL,
- * TNT,RKL
+ * TNT,RKL,ADL
  *Scope: Package (physical package)
  * MSR_PKG_C7_RESIDENCY:  Package C7 Residency Counter.
  *perf code: 0x03
  *Available model: NHM,WSM,SNB,IVB,HSW,BDW,SKL,CNL,
- * KBL,CML,ICL,TGL,RKL
+ * KBL,CML,ICL,TGL,RKL,ADL
  *Scope: Package (physical package)
  * MSR_PKG_C8_RESIDENCY:  Package C8 Residency Counter.
  *perf code: 0x04
- *Available model: HSW ULT,KBL,CNL,CML,ICL,TGL,RKL
+ *Available model: HSW ULT,KBL,CNL,CML,ICL,TGL,RKL,
+ * ADL
  *Scope: Package (physical package)
  * MSR_PKG_C9_RESIDENCY:  Package C9 Residency Counter.
  *perf code: 0x05
- *Available model: HSW ULT,KBL,CNL,CML,ICL,TGL,RKL
+ *Available model: HSW ULT,KBL,CNL,CML,ICL,TGL,RKL,
+ * ADL
  *Scope: Package (physical package)
  * MSR_PKG_C10_RESIDENCY: Package C10 Residency Counter.
  *perf code: 0x06
  *Available model: HSW ULT,KBL,GLM,CNL,CML,ICL,TGL,
- * TNT,RKL
+ * TNT,RKL,ADL
  *Scope: Package (physical package)
  *
  */
@@ -563,6 +566,20 @@ static const struct cstate_model icl_cstates __initconst = 
{
  BIT(PERF_CSTATE_PKG_C10_RES),
 };
 
+static const struct

[PATCH V5 21/25] perf: Introduce PERF_TYPE_HARDWARE_PMU and PERF_TYPE_HW_CACHE_PMU

2021-04-05 Thread kan . liang
From: Kan Liang 

Current Hardware events and Hardware cache events have special perf
types, PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE. The two types don't
pass the PMU type in the user interface. For a hybrid system, the perf
subsystem doesn't know which PMU the events belong to. The first capable
PMU will always be assigned to the events. The events never get a chance
to run on the other capable PMUs.

Add a PMU aware version PERF_TYPE_HARDWARE_PMU and
PERF_TYPE_HW_CACHE_PMU. The PMU type ID is stored at attr.config[40:32].
Support the new types for X86.
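
For illustration only (not from this patch), a user-space caller could
select a specific PMU with the new type roughly as in the fragment below;
pmu_type is assumed to be read from the PMU's sysfs "type" file:

/* sketch: PMU type ID goes into attr.config[40:32] for the new types */
struct perf_event_attr attr = {};

attr.size   = sizeof(attr);
attr.type   = PERF_TYPE_HARDWARE_PMU;
attr.config = (pmu_type << PERF_PMU_TYPE_SHIFT) | PERF_COUNT_HW_CPU_CYCLES;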

Suggested-by: Andi Kleen 
Reviewed-by: Andi Kleen 
Signed-off-by: Kan Liang 
---
 arch/x86/events/core.c  | 10 --
 include/uapi/linux/perf_event.h | 26 ++
 kernel/events/core.c| 14 +-
 3 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 09922ee..f8d1222 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -488,7 +488,7 @@ int x86_setup_perfctr(struct perf_event *event)
if (attr->type == event->pmu->type)
return x86_pmu_extra_regs(event->attr.config, event);
 
-   if (attr->type == PERF_TYPE_HW_CACHE)
+   if ((attr->type == PERF_TYPE_HW_CACHE) || (attr->type == 
PERF_TYPE_HW_CACHE_PMU))
return set_ext_hw_attr(hwc, event);
 
if (attr->config >= x86_pmu.max_events)
@@ -2452,9 +2452,15 @@ static int x86_pmu_event_init(struct perf_event *event)
 
if ((event->attr.type != event->pmu->type) &&
(event->attr.type != PERF_TYPE_HARDWARE) &&
-   (event->attr.type != PERF_TYPE_HW_CACHE))
+   (event->attr.type != PERF_TYPE_HW_CACHE) &&
+   (event->attr.type != PERF_TYPE_HARDWARE_PMU) &&
+   (event->attr.type != PERF_TYPE_HW_CACHE_PMU))
return -ENOENT;
 
+   if ((event->attr.type == PERF_TYPE_HARDWARE_PMU) ||
+   (event->attr.type == PERF_TYPE_HW_CACHE_PMU))
+   event->attr.config &= PERF_HW_CACHE_EVENT_MASK;
+
if (is_hybrid() && (event->cpu != -1)) {
pmu = hybrid_pmu(event->pmu);
+   if (!cpumask_test_cpu(event->cpu, &pmu->supported_cpus))
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index ad15e40..c0a511e 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -33,6 +33,8 @@ enum perf_type_id {
PERF_TYPE_HW_CACHE  = 3,
PERF_TYPE_RAW   = 4,
PERF_TYPE_BREAKPOINT= 5,
+   PERF_TYPE_HARDWARE_PMU  = 6,
+   PERF_TYPE_HW_CACHE_PMU  = 7,
 
PERF_TYPE_MAX,  /* non-ABI */
 };
@@ -95,6 +97,30 @@ enum perf_hw_cache_op_result_id {
 };
 
 /*
+ * attr.config layout for type PERF_TYPE_HARDWARE* and PERF_TYPE_HW_CACHE*
+ * PERF_TYPE_HARDWARE: 0xAA
+ * AA: hardware event ID
+ * PERF_TYPE_HW_CACHE: 0xCCBBAA
+ * AA: hardware cache ID
+ * BB: hardware cache op ID
+ * CC: hardware cache op result ID
+ * PERF_TYPE_HARDWARE_PMU: 0xDD000000AA
+ * AA: hardware event ID
+ * DD: PMU type ID
+ * PERF_TYPE_HW_CACHE_PMU: 0xDD00CCBBAA
+ * AA: hardware cache ID
+ * BB: hardware cache op ID
+ * CC: hardware cache op result ID
+ * DD: PMU type ID
+ */
+#define PERF_HW_CACHE_ID_SHIFT 0
+#define PERF_HW_CACHE_OP_ID_SHIFT  8
+#define PERF_HW_CACHE_OP_RESULT_ID_SHIFT   16
+#define PERF_HW_CACHE_EVENT_MASK   0xffffff
+
+#define PERF_PMU_TYPE_SHIFT32
+
+/*
  * Special "software" events provided by the kernel, even if the hardware
  * does not support performance events. These events measure various
  * physical and sw events of the kernel (and allow the profiling of them as
diff --git a/kernel/events/core.c b/kernel/events/core.c
index f079431..b8ab756 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11093,6 +11093,14 @@ static int perf_try_init_event(struct pmu *pmu, struct 
perf_event *event)
return ret;
 }
 
+static bool perf_event_is_hw_pmu_type(struct perf_event *event)
+{
+   int type = event->attr.type;
+
+   return type == PERF_TYPE_HARDWARE_PMU ||
+  type == PERF_TYPE_HW_CACHE_PMU;
+}
+
 static struct pmu *perf_init_event(struct perf_event *event)
 {
int idx, type, ret;
@@ -6,13 +11124,17 @@ static struct pmu *perf_init_event(struct perf_event 
*event)
if (type == PERF_TYPE_HARDWARE || type == PERF_

[PATCH V5 23/25] perf/x86/msr: Add Alder Lake CPU support

2021-04-05 Thread kan . liang
From: Kan Liang 

PPERF and SMI_COUNT MSRs are also supported on Alder Lake.

The External Design Specification (EDS) is not published yet. It comes
from an authoritative internal source.

The patch has been tested on real hardware.

Reviewed-by: Andi Kleen 
Signed-off-by: Kan Liang 
---
 arch/x86/events/msr.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/events/msr.c b/arch/x86/events/msr.c
index 680404c..c853b28 100644
--- a/arch/x86/events/msr.c
+++ b/arch/x86/events/msr.c
@@ -100,6 +100,8 @@ static bool test_intel(int idx, void *data)
case INTEL_FAM6_TIGERLAKE_L:
case INTEL_FAM6_TIGERLAKE:
case INTEL_FAM6_ROCKETLAKE:
+   case INTEL_FAM6_ALDERLAKE:
+   case INTEL_FAM6_ALDERLAKE_L:
if (idx == PERF_MSR_SMI || idx == PERF_MSR_PPERF)
return true;
break;
-- 
2.7.4



[PATCH V5 22/25] perf/x86/intel/uncore: Add Alder Lake support

2021-04-05 Thread kan . liang
From: Kan Liang 

The uncore subsystem for Alder Lake is similar to the previous Tiger
Lake.

The differences include:
- New MSR addresses for global control, fixed counters, CBOX and ARB.
  Add a new adl_uncore_msr_ops for uncore operations.
- Add a new threshold field for CBOX.
- New PCIIDs for IMC devices.

Reviewed-by: Andi Kleen 
Signed-off-by: Kan Liang 
---
 arch/x86/events/intel/uncore.c |   7 ++
 arch/x86/events/intel/uncore.h |   1 +
 arch/x86/events/intel/uncore_snb.c | 131 +
 3 files changed, 139 insertions(+)

diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index 35b3470..70816f3 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -1740,6 +1740,11 @@ static const struct intel_uncore_init_fun 
rkl_uncore_init __initconst = {
.pci_init = skl_uncore_pci_init,
 };
 
+static const struct intel_uncore_init_fun adl_uncore_init __initconst = {
+   .cpu_init = adl_uncore_cpu_init,
+   .mmio_init = tgl_uncore_mmio_init,
+};
+
 static const struct intel_uncore_init_fun icx_uncore_init __initconst = {
.cpu_init = icx_uncore_cpu_init,
.pci_init = icx_uncore_pci_init,
@@ -1794,6 +1799,8 @@ static const struct x86_cpu_id intel_uncore_match[] 
__initconst = {
X86_MATCH_INTEL_FAM6_MODEL(TIGERLAKE_L, &tgl_l_uncore_init),
X86_MATCH_INTEL_FAM6_MODEL(TIGERLAKE,   &tgl_uncore_init),
X86_MATCH_INTEL_FAM6_MODEL(ROCKETLAKE,  &rkl_uncore_init),
+   X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE,   &adl_uncore_init),
+   X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE_L, &adl_uncore_init),
X86_MATCH_INTEL_FAM6_MODEL(ATOM_TREMONT_D,  &snr_uncore_init),
{},
 };
diff --git a/arch/x86/events/intel/uncore.h b/arch/x86/events/intel/uncore.h
index 549cfb2..426212f 100644
--- a/arch/x86/events/intel/uncore.h
+++ b/arch/x86/events/intel/uncore.h
@@ -575,6 +575,7 @@ void snb_uncore_cpu_init(void);
 void nhm_uncore_cpu_init(void);
 void skl_uncore_cpu_init(void);
 void icl_uncore_cpu_init(void);
+void adl_uncore_cpu_init(void);
 void tgl_uncore_cpu_init(void);
 void tgl_uncore_mmio_init(void);
 void tgl_l_uncore_mmio_init(void);
diff --git a/arch/x86/events/intel/uncore_snb.c 
b/arch/x86/events/intel/uncore_snb.c
index 5127128..0f63706 100644
--- a/arch/x86/events/intel/uncore_snb.c
+++ b/arch/x86/events/intel/uncore_snb.c
@@ -62,6 +62,8 @@
 #define PCI_DEVICE_ID_INTEL_TGL_H_IMC  0x9a36
 #define PCI_DEVICE_ID_INTEL_RKL_1_IMC  0x4c43
 #define PCI_DEVICE_ID_INTEL_RKL_2_IMC  0x4c53
+#define PCI_DEVICE_ID_INTEL_ADL_1_IMC  0x4660
+#define PCI_DEVICE_ID_INTEL_ADL_2_IMC  0x4641
 
 /* SNB event control */
 #define SNB_UNC_CTL_EV_SEL_MASK0x00ff
@@ -131,12 +133,33 @@
 #define ICL_UNC_ARB_PER_CTR0x3b1
 #define ICL_UNC_ARB_PERFEVTSEL 0x3b3
 
+/* ADL uncore global control */
+#define ADL_UNC_PERF_GLOBAL_CTL0x2ff0
+#define ADL_UNC_FIXED_CTR_CTRL  0x2fde
+#define ADL_UNC_FIXED_CTR   0x2fdf
+
+/* ADL Cbo register */
+#define ADL_UNC_CBO_0_PER_CTR0 0x2002
+#define ADL_UNC_CBO_0_PERFEVTSEL0  0x2000
+#define ADL_UNC_CTL_THRESHOLD  0x3f00
+#define ADL_UNC_RAW_EVENT_MASK (SNB_UNC_CTL_EV_SEL_MASK | \
+SNB_UNC_CTL_UMASK_MASK | \
+SNB_UNC_CTL_EDGE_DET | \
+SNB_UNC_CTL_INVERT | \
+ADL_UNC_CTL_THRESHOLD)
+
+/* ADL ARB register */
+#define ADL_UNC_ARB_PER_CTR0   0x2FD2
+#define ADL_UNC_ARB_PERFEVTSEL00x2FD0
+#define ADL_UNC_ARB_MSR_OFFSET 0x8
+
 DEFINE_UNCORE_FORMAT_ATTR(event, event, "config:0-7");
 DEFINE_UNCORE_FORMAT_ATTR(umask, umask, "config:8-15");
 DEFINE_UNCORE_FORMAT_ATTR(edge, edge, "config:18");
 DEFINE_UNCORE_FORMAT_ATTR(inv, inv, "config:23");
 DEFINE_UNCORE_FORMAT_ATTR(cmask5, cmask, "config:24-28");
 DEFINE_UNCORE_FORMAT_ATTR(cmask8, cmask, "config:24-31");
+DEFINE_UNCORE_FORMAT_ATTR(threshold, threshold, "config:24-29");
 
 /* Sandy Bridge uncore support */
 static void snb_uncore_msr_enable_event(struct intel_uncore_box *box, struct 
perf_event *event)
@@ -422,6 +445,106 @@ void tgl_uncore_cpu_init(void)
skl_uncore_msr_ops.init_box = rkl_uncore_msr_init_box;
 }
 
+static void adl_uncore_msr_init_box(struct intel_uncore_box *box)
+{
+   if (box->pmu->pmu_idx == 0)
+   wrmsrl(ADL_UNC_PERF_GLOBAL_CTL, SNB_UNC_GLOBAL_CTL_EN);
+}
+
+static void adl_uncore_msr_enable_box(struct intel_uncore_box *box)
+{
+   wrmsrl(ADL_UNC_PERF_GLOBAL_CTL, SNB_UNC_GLOBAL_CTL_EN);
+}
+
+static void adl_uncore_ms

[PATCH V5 20/25] perf/x86/intel: Add Alder Lake Hybrid support

2021-04-05 Thread kan . liang
From: Kan Liang 

Alder Lake Hybrid system has two different types of core, Golden Cove
core and Gracemont core. The Golden Cove core is registered to
"cpu_core" PMU. The Gracemont core is registered to "cpu_atom" PMU.

The differences between the two PMUs include:
- Number of GP and fixed counters
- Events
- The "cpu_core" PMU supports Topdown metrics.
  The "cpu_atom" PMU supports PEBS-via-PT.

The "cpu_core" PMU is similar to the Sapphire Rapids PMU, but without
PMEM.
The "cpu_atom" PMU is similar to Tremont, but with different events,
event_constraints, extra_regs and number of counters.

The mem-loads AUX event workaround only applies to the Golden Cove core.

Users may disable all CPUs of the same CPU type on the command line or
in the BIOS. For this case, perf still registers a PMU for the CPU type
but the CPU mask is 0.

Current caps/pmu_name is usually the microarch codename. Assign the
"alderlake_hybrid" to the caps/pmu_name of both PMUs to indicate the
hybrid Alder Lake microarchitecture.
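
With both PMUs registered, the expected result is along these lines (the
sysfs paths follow the standard perf layout; the output is shown for
illustration):

$ cat /sys/devices/cpu_core/caps/pmu_name
alderlake_hybrid
$ cat /sys/devices/cpu_atom/caps/pmu_name
alderlake_hybrid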

Reviewed-by: Andi Kleen 
Signed-off-by: Kan Liang 
---
 arch/x86/events/intel/core.c | 254 ++-
 arch/x86/events/intel/ds.c   |   7 ++
 arch/x86/events/perf_event.h |   7 ++
 3 files changed, 267 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 07af58c..2b553d9 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2076,6 +2076,14 @@ static struct extra_reg intel_tnt_extra_regs[] 
__read_mostly = {
EVENT_EXTRA_END
 };
 
+static struct extra_reg intel_grt_extra_regs[] __read_mostly = {
+   /* must define OFFCORE_RSP_X first, see intel_fixup_er() */
+   INTEL_UEVENT_EXTRA_REG(0x01b7, MSR_OFFCORE_RSP_0, 0x3fffffffffull, RSP_0),
+   INTEL_UEVENT_EXTRA_REG(0x02b7, MSR_OFFCORE_RSP_1, 0x3fffffffffull, RSP_1),
+   INTEL_UEVENT_PEBS_LDLAT_EXTRA_REG(0x5d0),
+   EVENT_EXTRA_END
+};
+
 #define KNL_OT_L2_HITE BIT_ULL(19) /* Other Tile L2 Hit */
 #define KNL_OT_L2_HITF BIT_ULL(20) /* Other Tile L2 Hit */
 #define KNL_MCDRAM_LOCAL   BIT_ULL(21)
@@ -2430,6 +2438,16 @@ static int icl_set_topdown_event_period(struct 
perf_event *event)
return 0;
 }
 
+static int adl_set_topdown_event_period(struct perf_event *event)
+{
+   struct x86_hybrid_pmu *pmu = hybrid_pmu(event->pmu);
+
+   if (pmu->cpu_type != hybrid_big)
+   return 0;
+
+   return icl_set_topdown_event_period(event);
+}
+
 static inline u64 icl_get_metrics_event_value(u64 metric, u64 slots, int idx)
 {
u32 val;
@@ -2570,6 +2588,17 @@ static u64 icl_update_topdown_event(struct perf_event 
*event)
 x86_pmu.num_topdown_events - 
1);
 }
 
+static u64 adl_update_topdown_event(struct perf_event *event)
+{
+   struct x86_hybrid_pmu *pmu = hybrid_pmu(event->pmu);
+
+   if (pmu->cpu_type != hybrid_big)
+   return 0;
+
+   return icl_update_topdown_event(event);
+}
+
+
 static void intel_pmu_read_topdown_event(struct perf_event *event)
 {
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
@@ -3658,6 +3687,17 @@ static inline bool is_mem_loads_aux_event(struct 
perf_event *event)
return (event->attr.config & INTEL_ARCH_EVENT_MASK) == 
X86_CONFIG(.event=0x03, .umask=0x82);
 }
 
+static inline bool require_mem_loads_aux_event(struct perf_event *event)
+{
+   if (!(x86_pmu.flags & PMU_FL_MEM_LOADS_AUX))
+   return false;
+
+   if (is_hybrid())
+   return hybrid_pmu(event->pmu)->cpu_type == hybrid_big;
+
+   return true;
+}
+
 static inline bool intel_pmu_has_cap(struct perf_event *event, int idx)
 {
union perf_capabilities *intel_cap;
@@ -3785,7 +3825,7 @@ static int intel_pmu_hw_config(struct perf_event *event)
 * event. The rule is to simplify the implementation of the check.
 * That's because perf cannot have a complete group at the moment.
 */
-   if (x86_pmu.flags & PMU_FL_MEM_LOADS_AUX &&
+   if (require_mem_loads_aux_event(event) &&
(event->attr.sample_type & PERF_SAMPLE_DATA_SRC) &&
is_mem_loads_event(event)) {
struct perf_event *leader = event->group_leader;
@@ -4062,6 +4102,39 @@ tfa_get_event_constraints(struct cpu_hw_events *cpuc, 
int idx,
return c;
 }
 
+static struct event_constraint *
+adl_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
+ struct perf_event *event)
+{
+   struct x86_hybrid_pmu *pmu = hybrid_pmu(event->pmu);
+
+   if (pmu->cpu_type == hybrid_big)
+   return spr_get_event_constraints(cpuc, idx, event);
+   else if (pmu->cpu_type == hybrid_small)
+   return tnt_get_event_constraints(cpuc, idx, event);
+
+   

[PATCH V5 17/25] perf/x86: Add structures for the attributes of Hybrid PMUs

2021-04-05 Thread kan . liang
From: Kan Liang 

Hybrid PMUs have different events and formats. In theory, Hybrid PMU
specific attributes should be maintained in the dedicated struct
x86_hybrid_pmu, but it wastes space because the events and formats are
similar among Hybrid PMUs.

To reduce duplication, all hybrid PMUs will share a group of attributes
in the following patch. To distinguish an attribute from different
Hybrid PMUs, a PMU aware attribute structure is introduced. A PMU type
is required for the attribute structure. The type is for internal use
only; it is not visible in the sysfs API.

Hybrid PMUs may support the same event name, but with different event
encoding, e.g., the mem-loads event on an Atom PMU has different event
encoding from a Core PMU. It brings issue if two attributes are
created for them. Current sysfs_update_group finds an attribute by
searching the attr name (aka event name). If two attributes have the
same event name, the first attribute will be replaced.
To address the issue, only one attribute is created for the event. The
event_str is extended and stores event encodings from all Hybrid PMUs.
Each event encoding is divided by ";". The order of the event encodings
must follow the order of the hybrid PMU index. The event_str is for
internal use as well. When a user wants to show the attribute of a Hybrid PMU,
only the corresponding part of the string is displayed.
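
For example (the event encodings below are placeholders to show the
format, not the real Alder Lake values, and hybrid_big_small is assumed
to be a mask covering both PMU types):

/* one attribute, two encodings: the part before ';' is for hybrid PMU
 * index 0, the part after it for index 1
 */
EVENT_ATTR_STR_HYBRID(mem-loads, mem_ld_adl,
		      "event=0xd0,umask=0x5,ldlat=3;event=0xcd,umask=0x1,ldlat=3",
		      hybrid_big_small);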

Reviewed-by: Andi Kleen 
Signed-off-by: Kan Liang 
---
 arch/x86/events/core.c   | 43 +++
 arch/x86/events/perf_event.h | 19 +++
 include/linux/perf_event.h   | 12 
 3 files changed, 74 insertions(+)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 901b52c..16b4f6f 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1868,6 +1868,49 @@ ssize_t events_ht_sysfs_show(struct device *dev, struct 
device_attribute *attr,
pmu_attr->event_str_noht);
 }
 
+ssize_t events_hybrid_sysfs_show(struct device *dev,
+struct device_attribute *attr,
+char *page)
+{
+   struct perf_pmu_events_hybrid_attr *pmu_attr =
+   container_of(attr, struct perf_pmu_events_hybrid_attr, attr);
+   struct x86_hybrid_pmu *pmu;
+   const char *str, *next_str;
+   int i;
+
+   if (hweight64(pmu_attr->pmu_type) == 1)
+   return sprintf(page, "%s", pmu_attr->event_str);
+
+   /*
+* Hybrid PMUs may support the same event name, but with different
+* event encoding, e.g., the mem-loads event on an Atom PMU has
+* different event encoding from a Core PMU.
+*
+* The event_str includes all event encodings. Each event encoding
+* is divided by ";". The order of the event encodings must follow
+* the order of the hybrid PMU index.
+*/
+   pmu = container_of(dev_get_drvdata(dev), struct x86_hybrid_pmu, pmu);
+
+   str = pmu_attr->event_str;
+   for (i = 0; i < x86_pmu.num_hybrid_pmus; i++) {
+   if (!(x86_pmu.hybrid_pmu[i].cpu_type & pmu_attr->pmu_type))
+   continue;
+   if (x86_pmu.hybrid_pmu[i].cpu_type & pmu->cpu_type) {
+   next_str = strchr(str, ';');
+   if (next_str)
+   return snprintf(page, next_str - str + 1, "%s", 
str);
+   else
+   return sprintf(page, "%s", str);
+   }
+   str = strchr(str, ';');
+   str++;
+   }
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(events_hybrid_sysfs_show);
+
 EVENT_ATTR(cpu-cycles, CPU_CYCLES  );
 EVENT_ATTR(instructions,   INSTRUCTIONS);
 EVENT_ATTR(cache-references,   CACHE_REFERENCES);
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 35510a9..c1c90c3 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -970,6 +970,22 @@ static struct perf_pmu_events_ht_attr event_attr_##v = {   
\
.event_str_ht   = ht,   \
 }
 
+#define EVENT_ATTR_STR_HYBRID(_name, v, str, _pmu) \
+static struct perf_pmu_events_hybrid_attr event_attr_##v = {   \
+   .attr   = __ATTR(_name, 0444, events_hybrid_sysfs_show, NULL),\
+   .id = 0,\
+   .event_str  = str,  \
+   .pmu_type   = _pmu, \
+}
+
+#define FORMAT_HYBRID_PTR(_id) (&format_attr_hybrid_##_id.attr.attr)
+
+#define FORMAT_ATTR_HYBRID(_name, _pmu)
\
+static struct perf_pmu_format_hybrid_attr format_attr_hybrid_##_name = {\
+   .att

[PATCH V5 18/25] perf/x86/intel: Add attr_update for Hybrid PMUs

2021-04-05 Thread kan . liang
From: Kan Liang 

The attribute_group for Hybrid PMUs should be different from the previous
cpu PMU's. For example, cpumask is required for a Hybrid PMU. The PMU type
should be included in the event and format attribute.

Add hybrid_attr_update for the Hybrid PMU.
Check the PMU type in is_visible() function. Only display the event or
format for the matched Hybrid PMU.

Reviewed-by: Andi Kleen 
Signed-off-by: Kan Liang 
---
 arch/x86/events/intel/core.c | 120 ---
 1 file changed, 114 insertions(+), 6 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 27919ae..07af58c 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -5124,6 +5124,106 @@ static const struct attribute_group *attr_update[] = {
NULL,
 };
 
+static bool is_attr_for_this_pmu(struct kobject *kobj, struct attribute *attr)
+{
+   struct device *dev = kobj_to_dev(kobj);
+   struct x86_hybrid_pmu *pmu =
+   container_of(dev_get_drvdata(dev), struct x86_hybrid_pmu, pmu);
+   struct perf_pmu_events_hybrid_attr *pmu_attr =
+   container_of(attr, struct perf_pmu_events_hybrid_attr, 
attr.attr);
+
+   return pmu->cpu_type & pmu_attr->pmu_type;
+}
+
+static umode_t hybrid_events_is_visible(struct kobject *kobj,
+   struct attribute *attr, int i)
+{
+   return is_attr_for_this_pmu(kobj, attr) ? attr->mode : 0;
+}
+
+static inline int hybrid_find_supported_cpu(struct x86_hybrid_pmu *pmu)
+{
+   int cpu = cpumask_first(&pmu->supported_cpus);
+
+   return (cpu >= nr_cpu_ids) ? -1 : cpu;
+}
+
+static umode_t hybrid_tsx_is_visible(struct kobject *kobj,
+struct attribute *attr, int i)
+{
+   struct device *dev = kobj_to_dev(kobj);
+   struct x86_hybrid_pmu *pmu =
+container_of(dev_get_drvdata(dev), struct x86_hybrid_pmu, pmu);
+   int cpu = hybrid_find_supported_cpu(pmu);
+
+   return (cpu >= 0) && is_attr_for_this_pmu(kobj, attr) && cpu_has(&cpu_data(cpu), X86_FEATURE_RTM) ? attr->mode : 0;
+}
+
+static umode_t hybrid_format_is_visible(struct kobject *kobj,
+   struct attribute *attr, int i)
+{
+   struct device *dev = kobj_to_dev(kobj);
+   struct x86_hybrid_pmu *pmu =
+   container_of(dev_get_drvdata(dev), struct x86_hybrid_pmu, pmu);
+   struct perf_pmu_format_hybrid_attr *pmu_attr =
+   container_of(attr, struct perf_pmu_format_hybrid_attr, 
attr.attr);
+   int cpu = hybrid_find_supported_cpu(pmu);
+
+   return (cpu >= 0) && (pmu->cpu_type & pmu_attr->pmu_type) ? attr->mode 
: 0;
+}
+
+static struct attribute_group hybrid_group_events_td  = {
+   .name   = "events",
+   .is_visible = hybrid_events_is_visible,
+};
+
+static struct attribute_group hybrid_group_events_mem = {
+   .name   = "events",
+   .is_visible = hybrid_events_is_visible,
+};
+
+static struct attribute_group hybrid_group_events_tsx = {
+   .name   = "events",
+   .is_visible = hybrid_tsx_is_visible,
+};
+
+static struct attribute_group hybrid_group_format_extra = {
+   .name   = "format",
+   .is_visible = hybrid_format_is_visible,
+};
+
+static ssize_t intel_hybrid_get_attr_cpus(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+   struct x86_hybrid_pmu *pmu =
+   container_of(dev_get_drvdata(dev), struct x86_hybrid_pmu, pmu);
+
+   return cpumap_print_to_pagebuf(true, buf, &pmu->supported_cpus);
+}
+
+static DEVICE_ATTR(cpus, S_IRUGO, intel_hybrid_get_attr_cpus, NULL);
+static struct attribute *intel_hybrid_cpus_attrs[] = {
+   &dev_attr_cpus.attr,
+   NULL,
+};
+
+static struct attribute_group hybrid_group_cpus = {
+   .attrs  = intel_hybrid_cpus_attrs,
+};
+
+static const struct attribute_group *hybrid_attr_update[] = {
+   &hybrid_group_events_td,
+   &hybrid_group_events_mem,
+   &hybrid_group_events_tsx,
+   &group_caps_gen,
+   &group_caps_lbr,
+   &hybrid_group_format_extra,
+   &group_default,
+   &hybrid_group_cpus,
+   NULL,
+};
+
 static struct attribute *empty_attrs;
 
 static void intel_pmu_check_num_counters(int *num_counters,
@@ -5867,14 +5967,22 @@ __init int intel_pmu_init(void)
 
snprintf(pmu_name_str, sizeof(pmu_name_str), "%s", name);
 
+   if (!is_hybrid()) {
+   group_events_td.attrs  = td_attr;
+   group_events_mem.attrs = mem_attr;
+   group_events_tsx.attrs = tsx_attr;
+   group_format_extra.attrs = extra_attr;
+   group_format_extra_skl.attrs = extra_skl_attr;
 
-   group_events_td.attrs  = td_attr;
-   group

[PATCH V5 19/25] perf/x86: Support filter_match callback

2021-04-05 Thread kan . liang
From: Kan Liang 

Implement filter_match callback for X86, which checks whether an event is
schedulable on the current CPU.
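
A minimal sketch of such a callback for a hybrid system (assuming the
supported_cpus mask introduced earlier in this series; a later patch
wires it up for real):

static int intel_pmu_filter_match(struct perf_event *event)
{
	struct x86_hybrid_pmu *pmu = hybrid_pmu(event->pmu);
	unsigned int cpu = smp_processor_id();

	/* schedulable only if the current CPU belongs to the event's PMU */
	return cpumask_test_cpu(cpu, &pmu->supported_cpus);
}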

Reviewed-by: Andi Kleen 
Signed-off-by: Kan Liang 
---
 arch/x86/events/core.c   | 10 ++
 arch/x86/events/perf_event.h |  1 +
 2 files changed, 11 insertions(+)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 16b4f6f..09922ee 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2650,6 +2650,14 @@ static int x86_pmu_aux_output_match(struct perf_event 
*event)
return 0;
 }
 
+static int x86_pmu_filter_match(struct perf_event *event)
+{
+   if (x86_pmu.filter_match)
+   return x86_pmu.filter_match(event);
+
+   return 1;
+}
+
 static struct pmu pmu = {
.pmu_enable = x86_pmu_enable,
.pmu_disable= x86_pmu_disable,
@@ -2677,6 +2685,8 @@ static struct pmu pmu = {
.check_period   = x86_pmu_check_period,
 
.aux_output_match   = x86_pmu_aux_output_match,
+
+   .filter_match   = x86_pmu_filter_match,
 };
 
 void arch_perf_update_userpage(struct perf_event *event,
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index c1c90c3..f996686 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -870,6 +870,7 @@ struct x86_pmu {
 
int (*aux_output_match) (struct perf_event *event);
 
+   int (*filter_match)(struct perf_event *event);
/*
 * Hybrid support
 *
-- 
2.7.4



[PATCH V5 15/25] perf/x86: Factor out x86_pmu_show_pmu_cap

2021-04-05 Thread kan . liang
From: Kan Liang 

The PMU capabilities are different among hybrid PMUs. Perf should dump
the PMU capabilities information for each hybrid PMU.

Factor out x86_pmu_show_pmu_cap() which shows the PMU capabilities
information. The function will be reused later when registering a
dedicated hybrid PMU.

Reviewed-by: Andi Kleen 
Signed-off-by: Kan Liang 
---
 arch/x86/events/core.c   | 25 -
 arch/x86/events/perf_event.h |  3 +++
 2 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 9c931ec..f9d299b 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1984,6 +1984,20 @@ static void _x86_pmu_read(struct perf_event *event)
x86_perf_event_update(event);
 }
 
+void x86_pmu_show_pmu_cap(int num_counters, int num_counters_fixed,
+ u64 intel_ctrl)
+{
+   pr_info("... version:%d\n", x86_pmu.version);
+   pr_info("... bit width:  %d\n", x86_pmu.cntval_bits);
+   pr_info("... generic registers:  %d\n", num_counters);
+   pr_info("... value mask: %016Lx\n", x86_pmu.cntval_mask);
+   pr_info("... max period: %016Lx\n", x86_pmu.max_period);
+   pr_info("... fixed-purpose events:   %lu\n",
+   hweight64((((1ULL << num_counters_fixed) - 1)
+   << INTEL_PMC_IDX_FIXED) & intel_ctrl));
+   pr_info("... event mask: %016Lx\n", intel_ctrl);
+}
+
 static int __init init_hw_perf_events(void)
 {
struct x86_pmu_quirk *quirk;
@@ -2044,15 +2058,8 @@ static int __init init_hw_perf_events(void)
 
pmu.attr_update = x86_pmu.attr_update;
 
-   pr_info("... version:%d\n", x86_pmu.version);
-   pr_info("... bit width:  %d\n", x86_pmu.cntval_bits);
-   pr_info("... generic registers:  %d\n", x86_pmu.num_counters);
-   pr_info("... value mask: %016Lx\n", x86_pmu.cntval_mask);
-   pr_info("... max period: %016Lx\n", x86_pmu.max_period);
-   pr_info("... fixed-purpose events:   %lu\n",
-   hweight64((((1ULL << x86_pmu.num_counters_fixed) - 1)
-   << INTEL_PMC_IDX_FIXED) & 
x86_pmu.intel_ctrl));
-   pr_info("... event mask: %016Lx\n", x86_pmu.intel_ctrl);
+   x86_pmu_show_pmu_cap(x86_pmu.num_counters, x86_pmu.num_counters_fixed,
+x86_pmu.intel_ctrl);
 
if (!x86_pmu.read)
x86_pmu.read = _x86_pmu_read;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 5679c12..1da91b7 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -1083,6 +1083,9 @@ void x86_pmu_enable_event(struct perf_event *event);
 
 int x86_pmu_handle_irq(struct pt_regs *regs);
 
+void x86_pmu_show_pmu_cap(int num_counters, int num_counters_fixed,
+ u64 intel_ctrl);
+
 extern struct event_constraint emptyconstraint;
 
 extern struct event_constraint unconstrained;
-- 
2.7.4



[PATCH V5 16/25] perf/x86: Register hybrid PMUs

2021-04-05 Thread kan . liang
From: Kan Liang 

Different hybrid PMUs have different PMU capabilities and events. Perf
should register a dedicated PMU for each of them.

To check the X86 event, perf has to go through all possible hybrid PMUs.

All the hybrid PMUs are registered at boot time. Before the
registration, add intel_pmu_check_hybrid_pmus() to check and update the
counters information, the event constraints, the extra registers and the
unique capabilities for each hybrid PMU.

Postpone the display of the PMU information and HW check to
CPU_STARTING, because the boot CPU is the only online CPU in the
init_hw_perf_events(). Perf doesn't know the availability of the other
PMUs. Perf should display the PMU information only if the counters of
the PMU are available.

All CPUs of one type may be offline. In this case, users can still
observe the PMU in /sys/devices, but its CPU mask is 0.

All hybrid PMUs have capability PERF_PMU_CAP_HETEROGENEOUS_CPUS.
The PMU name for hybrid PMUs will be "cpu_XXX", which will be assigned
later in a separate patch.

The PMU type id for the core PMU is still PERF_TYPE_RAW. For the other
hybrid PMUs, the PMU type id is not hard-coded.
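
As a rough sketch (not the exact hunk, which is truncated in this archive),
the boot-time registration loop could look like the following; is_core_pmu()
is a hypothetical placeholder for the real check of which PMU is the core PMU:

	/* Sketch: register one struct pmu per hybrid PMU at boot time. */
	for (i = 0; i < x86_pmu.num_hybrid_pmus; i++) {
		struct x86_hybrid_pmu *h = &x86_pmu.hybrid_pmu[i];

		h->pmu = pmu;		/* start from the common template */
		h->pmu.capabilities |= PERF_PMU_CAP_HETEROGENEOUS_CPUS;

		/* is_core_pmu() is a placeholder for the real core-type check */
		err = perf_pmu_register(&h->pmu, h->name,
					is_core_pmu(h) ? PERF_TYPE_RAW : -1);
		if (err)
			break;
	}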

The event->cpu must be compatible with the supported CPUs of the PMU.
Add a check in the x86_pmu_event_init().
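
A minimal sketch of that check, assuming the supported_cpus mask in
struct x86_hybrid_pmu:

	/* Sketch: reject events bound to a CPU this hybrid PMU does not cover. */
	if (is_hybrid() && (event->cpu != -1)) {
		struct x86_hybrid_pmu *pmu = hybrid_pmu(event->pmu);

		if (!cpumask_test_cpu(event->cpu, &pmu->supported_cpus))
			return -ENOENT;
	}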

The events in a group must be from the same type of hybrid PMU.
The fake cpuc used in the validation must be from a supported CPU of
the event->pmu.

Perf may not retrieve a valid core type from get_this_hybrid_cpu_type().
For example, ADL may have an alternative configuration. With that
configuration, Perf cannot retrieve the core type from the CPUID leaf
0x1a. Add a platform specific get_hybrid_cpu_type(). If the generic way
fails, invoke the platform specific get_hybrid_cpu_type().
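
The fallback could look roughly like this (a sketch; only the callback name
is taken from the description above):

	/* Sketch: prefer CPUID leaf 0x1a, fall back to the platform callback. */
	u8 cpu_type = get_this_hybrid_cpu_type();

	if (!cpu_type && x86_pmu.get_hybrid_cpu_type)
		cpu_type = x86_pmu.get_hybrid_cpu_type();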

Suggested-by: Peter Zijlstra (Intel) 
Signed-off-by: Kan Liang 
---
 arch/x86/events/core.c   | 138 +--
 arch/x86/events/intel/core.c |  93 -
 arch/x86/events/perf_event.h |  14 +
 3 files changed, 224 insertions(+), 21 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index f9d299b..901b52c 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -485,7 +485,7 @@ int x86_setup_perfctr(struct perf_event *event)
	local64_set(&hwc->period_left, hwc->sample_period);
}
 
-   if (attr->type == PERF_TYPE_RAW)
+   if (attr->type == event->pmu->type)
return x86_pmu_extra_regs(event->attr.config, event);
 
if (attr->type == PERF_TYPE_HW_CACHE)
@@ -620,7 +620,7 @@ int x86_pmu_hw_config(struct perf_event *event)
if (!event->attr.exclude_kernel)
event->hw.config |= ARCH_PERFMON_EVENTSEL_OS;
 
-   if (event->attr.type == PERF_TYPE_RAW)
+   if (event->attr.type == event->pmu->type)
event->hw.config |= event->attr.config & X86_RAW_EVENT_MASK;
 
if (event->attr.sample_period && x86_pmu.limit_period) {
@@ -749,7 +749,17 @@ void x86_pmu_enable_all(int added)
 
 static inline int is_x86_event(struct perf_event *event)
 {
-   return event->pmu == &pmu;
+   int i;
+
+   if (!is_hybrid())
+   return event->pmu == &pmu;
+
+   for (i = 0; i < x86_pmu.num_hybrid_pmus; i++) {
+   if (event->pmu == &x86_pmu.hybrid_pmu[i].pmu)
+   return true;
+   }
+
+   return false;
 }
 
 struct pmu *x86_get_pmu(unsigned int cpu)
@@ -1998,6 +2008,23 @@ void x86_pmu_show_pmu_cap(int num_counters, int 
num_counters_fixed,
pr_info("... event mask: %016Lx\n", intel_ctrl);
 }
 
+/*
+ * The generic code is not hybrid friendly. The hybrid_pmu->pmu
+ * of the first registered PMU is unconditionally assigned to
+ * each possible cpuctx->ctx.pmu.
+ * Update the correct hybrid PMU to the cpuctx->ctx.pmu.
+ */
+void x86_pmu_update_cpu_context(struct pmu *pmu, int cpu)
+{
+   struct perf_cpu_context *cpuctx;
+
+   if (!pmu->pmu_cpu_context)
+   return;
+
+   cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
+   cpuctx->ctx.pmu = pmu;
+}
+
 static int __init init_hw_perf_events(void)
 {
struct x86_pmu_quirk *quirk;
@@ -2058,8 +2085,11 @@ static int __init init_hw_perf_events(void)
 
pmu.attr_update = x86_pmu.attr_update;
 
-   x86_pmu_show_pmu_cap(x86_pmu.num_counters, x86_pmu.num_counters_fixed,
-x86_pmu.intel_ctrl);
+   if (!is_hybrid()) {
+   x86_pmu_show_pmu_cap(x86_pmu.num_counters,
+x86_pmu.num_counters_fixed,
+x86_pmu.intel_ctrl);
+   }
 
if (!x86_pmu.read)
x86_pmu.read = _x86_pmu_read;
@@ -2089,9 +2119,46 @@ static int __init init_hw_perf_events(void)
if (err)
goto out1;
 
-   err = perf_pmu_regist

[PATCH V5 12/25] perf/x86/intel: Factor out intel_pmu_check_event_constraints

2021-04-05 Thread kan . liang
From: Kan Liang 

Each Hybrid PMU has to check and update its own event constraints before
registration.

The intel_pmu_check_event_constraints will be reused later to check
the event constraints of each hybrid PMU.

Reviewed-by: Andi Kleen 
Signed-off-by: Kan Liang 
---
 arch/x86/events/intel/core.c | 82 +---
 1 file changed, 47 insertions(+), 35 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 9394646..53a2e2e 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -5090,6 +5090,49 @@ static void intel_pmu_check_num_counters(int 
*num_counters,
*intel_ctrl |= fixed_mask << INTEL_PMC_IDX_FIXED;
 }
 
+static void intel_pmu_check_event_constraints(struct event_constraint 
*event_constraints,
+ int num_counters,
+ int num_counters_fixed,
+ u64 intel_ctrl)
+{
+   struct event_constraint *c;
+
+   if (!event_constraints)
+   return;
+
+   /*
+* event on fixed counter2 (REF_CYCLES) only works on this
+* counter, so do not extend mask to generic counters
+*/
+   for_each_event_constraint(c, event_constraints) {
+   /*
+* Don't extend the topdown slots and metrics
+* events to the generic counters.
+*/
+   if (c->idxmsk64 & INTEL_PMC_MSK_TOPDOWN) {
+   /*
+* Disable topdown slots and metrics events,
+* if slots event is not in CPUID.
+*/
+   if (!(INTEL_PMC_MSK_FIXED_SLOTS & intel_ctrl))
+   c->idxmsk64 = 0;
+   c->weight = hweight64(c->idxmsk64);
+   continue;
+   }
+
+   if (c->cmask == FIXED_EVENT_FLAGS) {
+   /* Disabled fixed counters which are not in CPUID */
+   c->idxmsk64 &= intel_ctrl;
+
+   if (c->idxmsk64 != INTEL_PMC_MSK_FIXED_REF_CYCLES)
+   c->idxmsk64 |= (1ULL << num_counters) - 1;
+   }
+   c->idxmsk64 &=
+   ~(~0ULL << (INTEL_PMC_IDX_FIXED + num_counters_fixed));
+   c->weight = hweight64(c->idxmsk64);
+   }
+}
+
 __init int intel_pmu_init(void)
 {
	struct attribute **extra_skl_attr = &empty_attrs;
@@ -5100,7 +5143,6 @@ __init int intel_pmu_init(void)
union cpuid10_edx edx;
union cpuid10_eax eax;
union cpuid10_ebx ebx;
-   struct event_constraint *c;
unsigned int fixed_mask;
struct extra_reg *er;
bool pmem = false;
@@ -5738,40 +5780,10 @@ __init int intel_pmu_init(void)
if (x86_pmu.intel_cap.anythread_deprecated)
x86_pmu.format_attrs = intel_arch_formats_attr;
 
-   if (x86_pmu.event_constraints) {
-   /*
-* event on fixed counter2 (REF_CYCLES) only works on this
-* counter, so do not extend mask to generic counters
-*/
-   for_each_event_constraint(c, x86_pmu.event_constraints) {
-   /*
-* Don't extend the topdown slots and metrics
-* events to the generic counters.
-*/
-   if (c->idxmsk64 & INTEL_PMC_MSK_TOPDOWN) {
-   /*
-* Disable topdown slots and metrics events,
-* if slots event is not in CPUID.
-*/
-   if (!(INTEL_PMC_MSK_FIXED_SLOTS & 
x86_pmu.intel_ctrl))
-   c->idxmsk64 = 0;
-   c->weight = hweight64(c->idxmsk64);
-   continue;
-   }
-
-   if (c->cmask == FIXED_EVENT_FLAGS) {
-   /* Disabled fixed counters which are not in 
CPUID */
-   c->idxmsk64 &= x86_pmu.intel_ctrl;
-
-   if (c->idxmsk64 != 
INTEL_PMC_MSK_FIXED_REF_CYCLES)
-   c->idxmsk64 |= (1ULL << 
x86_pmu.num_counters) - 1;
-   }
-   c->idxmsk64 &=
-   ~(~0ULL << (INTEL_PMC_IDX_FIXED + 
x86_pmu.num_counters_fixed));
-   c->weight = hweight64(c->idxmsk64);
-   }
-   }
-
+   intel_pmu_check_event_constraints(x86_pmu.event_constraints,
+ x86_pmu.num_counters,
+ 

[PATCH V5 05/25] perf/x86: Hybrid PMU support for intel_ctrl

2021-04-05 Thread kan . liang
From: Kan Liang 

The intel_ctrl is the counter mask of a PMU. The PMU counter information
may be different among hybrid PMUs; each hybrid PMU should use its own
intel_ctrl to check and access the counters.

When handling a certain hybrid PMU, apply the intel_ctrl from the
corresponding hybrid PMU.

When checking the HW existence, apply the PMU and number of counters
from the corresponding hybrid PMU as well. Perf will check the HW
existence for each Hybrid PMU before registration. Expose the
check_hw_exists() for a later patch.
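
The per-PMU fields used below are reached through a hybrid() accessor. A
simplified sketch of the idea (the macro in the actual patch may differ in
detail):

/*
 * Sketch: return a reference to the field of the hybrid PMU when running
 * on a hybrid system, otherwise to the global x86_pmu copy.
 */
#define hybrid(_pmu, _field)					\
(*({								\
	typeof(&x86_pmu._field) __Fp = &x86_pmu._field;		\
								\
	if (is_hybrid() && (_pmu))				\
		__Fp = &hybrid_pmu(_pmu)->_field;		\
								\
	__Fp;							\
}))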

Reviewed-by: Andi Kleen 
Signed-off-by: Kan Liang 
---
 arch/x86/events/core.c   | 14 +++---
 arch/x86/events/intel/core.c | 14 +-
 arch/x86/events/perf_event.h | 10 --
 3 files changed, 24 insertions(+), 14 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index d3d3c6b..fc14697 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -230,7 +230,7 @@ static void release_pmc_hardware(void) {}
 
 #endif
 
-static bool check_hw_exists(void)
+bool check_hw_exists(struct pmu *pmu, int num_counters, int num_counters_fixed)
 {
u64 val, val_fail = -1, val_new= ~0;
int i, reg, reg_fail = -1, ret = 0;
@@ -241,7 +241,7 @@ static bool check_hw_exists(void)
 * Check to see if the BIOS enabled any of the counters, if so
 * complain and bail.
 */
-   for (i = 0; i < x86_pmu.num_counters; i++) {
+   for (i = 0; i < num_counters; i++) {
reg = x86_pmu_config_addr(i);
		ret = rdmsrl_safe(reg, &val);
if (ret)
@@ -255,13 +255,13 @@ static bool check_hw_exists(void)
}
}
 
-   if (x86_pmu.num_counters_fixed) {
+   if (num_counters_fixed) {
reg = MSR_ARCH_PERFMON_FIXED_CTR_CTRL;
		ret = rdmsrl_safe(reg, &val);
if (ret)
goto msr_fail;
-   for (i = 0; i < x86_pmu.num_counters_fixed; i++) {
-   if (fixed_counter_disabled(i))
+   for (i = 0; i < num_counters_fixed; i++) {
+   if (fixed_counter_disabled(i, pmu))
continue;
if (val & (0x03 << i*4)) {
bios_fail = 1;
@@ -1547,7 +1547,7 @@ void perf_event_print_debug(void)
cpu, idx, prev_left);
}
for (idx = 0; idx < x86_pmu.num_counters_fixed; idx++) {
-   if (fixed_counter_disabled(idx))
+   if (fixed_counter_disabled(idx, cpuc->pmu))
continue;
rdmsrl(MSR_ARCH_PERFMON_FIXED_CTR0 + idx, pmc_count);
 
@@ -1992,7 +1992,7 @@ static int __init init_hw_perf_events(void)
pmu_check_apic();
 
/* sanity check that the hardware exists or is emulated */
-   if (!check_hw_exists())
+   if (!check_hw_exists(&pmu, x86_pmu.num_counters, 
x86_pmu.num_counters_fixed))
return 0;
 
pr_cont("%s PMU driver.\n", x86_pmu.name);
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 494b9bc..7cc2c45 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2153,10 +2153,11 @@ static void intel_pmu_disable_all(void)
 static void __intel_pmu_enable_all(int added, bool pmi)
 {
	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+   u64 intel_ctrl = hybrid(cpuc->pmu, intel_ctrl);
 
intel_pmu_lbr_enable_all(pmi);
wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL,
-   x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_guest_mask);
+  intel_ctrl & ~cpuc->intel_ctrl_guest_mask);
 
if (test_bit(INTEL_PMC_IDX_FIXED_BTS, cpuc->active_mask)) {
struct perf_event *event =
@@ -2709,6 +2710,7 @@ int intel_pmu_save_and_restart(struct perf_event *event)
 static void intel_pmu_reset(void)
 {
struct debug_store *ds = __this_cpu_read(cpu_hw_events.ds);
+   struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
unsigned long flags;
int idx;
 
@@ -2724,7 +2726,7 @@ static void intel_pmu_reset(void)
wrmsrl_safe(x86_pmu_event_addr(idx),  0ull);
}
for (idx = 0; idx < x86_pmu.num_counters_fixed; idx++) {
-   if (fixed_counter_disabled(idx))
+   if (fixed_counter_disabled(idx, cpuc->pmu))
continue;
wrmsrl_safe(MSR_ARCH_PERFMON_FIXED_CTR0 + idx, 0ull);
}
@@ -2753,6 +2755,7 @@ static int handle_pmi_common(struct pt_regs *regs, u64 
status)
	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
int bit;
int handled = 0;
+   u64 intel_ctrl = hybrid(cpuc->pmu, intel_ctrl);
 
inc_irq_stat(apic_perf_irqs);
 
@@ -2798,7 +2801,7 @@ static int handle_pmi_common(struct pt_regs *regs, u64 
status)
 
handled++;

[PATCH V5 13/25] perf/x86/intel: Factor out intel_pmu_check_extra_regs

2021-04-05 Thread kan . liang
From: Kan Liang 

Each Hybrid PMU has to check and update its own extra registers before
registration.

The intel_pmu_check_extra_regs will be reused later to check the extra
registers of each hybrid PMU.

Reviewed-by: Andi Kleen 
Signed-off-by: Kan Liang 
---
 arch/x86/events/intel/core.c | 35 +--
 1 file changed, 21 insertions(+), 14 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 53a2e2e..d1a13e0 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -5133,6 +5133,26 @@ static void intel_pmu_check_event_constraints(struct 
event_constraint *event_con
}
 }
 
+static void intel_pmu_check_extra_regs(struct extra_reg *extra_regs)
+{
+   struct extra_reg *er;
+
+   /*
+* Access extra MSR may cause #GP under certain circumstances.
+* E.g. KVM doesn't support offcore event
+* Check all extra_regs here.
+*/
+   if (!extra_regs)
+   return;
+
+   for (er = extra_regs; er->msr; er++) {
+   er->extra_msr_access = check_msr(er->msr, 0x11UL);
+   /* Disable LBR select mapping */
+   if ((er->idx == EXTRA_REG_LBR) && !er->extra_msr_access)
+   x86_pmu.lbr_sel_map = NULL;
+   }
+}
+
 __init int intel_pmu_init(void)
 {
	struct attribute **extra_skl_attr = &empty_attrs;
@@ -5144,7 +5164,6 @@ __init int intel_pmu_init(void)
union cpuid10_eax eax;
union cpuid10_ebx ebx;
unsigned int fixed_mask;
-   struct extra_reg *er;
bool pmem = false;
int version, i;
char *name;
@@ -5801,19 +5820,7 @@ __init int intel_pmu_init(void)
if (x86_pmu.lbr_nr)
pr_cont("%d-deep LBR, ", x86_pmu.lbr_nr);
 
-   /*
-* Access extra MSR may cause #GP under certain circumstances.
-* E.g. KVM doesn't support offcore event
-* Check all extra_regs here.
-*/
-   if (x86_pmu.extra_regs) {
-   for (er = x86_pmu.extra_regs; er->msr; er++) {
-   er->extra_msr_access = check_msr(er->msr, 0x11UL);
-   /* Disable LBR select mapping */
-   if ((er->idx == EXTRA_REG_LBR) && !er->extra_msr_access)
-   x86_pmu.lbr_sel_map = NULL;
-   }
-   }
+   intel_pmu_check_extra_regs(x86_pmu.extra_regs);
 
/* Support full width counters using alternative MSR range */
if (x86_pmu.intel_cap.full_width_write) {
-- 
2.7.4



[PATCH V5 06/25] perf/x86: Hybrid PMU support for counters

2021-04-05 Thread kan . liang
From: Kan Liang 

The number of GP and fixed counters are different among hybrid PMUs.
Each hybrid PMU should use its own counter related information.

When handling a certain hybrid PMU, apply the number of counters from
the corresponding hybrid PMU.

When reserving the counters in the initialization of a new event,
reserve all possible counters.

The number of counters recorded in the global x86_pmu is for the
architecture counters which are available for all hybrid PMUs. KVM
doesn't support the hybrid PMU yet. Return the number of the
architecture counters for now.
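
In other words, the KVM-facing capability query keeps reporting the global
(architectural) values, roughly as sketched below:

/* Sketch: KVM keeps seeing the architectural (global) counter numbers. */
void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap)
{
	cap->version		= x86_pmu.version;
	cap->num_counters_gp	= x86_pmu.num_counters;
	cap->num_counters_fixed	= x86_pmu.num_counters_fixed;
	/* ... remaining fields are filled from the global x86_pmu as before ... */
}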

For the functions only available for the old platforms, e.g.,
intel_pmu_drain_pebs_nhm(), nothing is changed.

Reviewed-by: Andi Kleen 
Signed-off-by: Kan Liang 
---
 arch/x86/events/core.c   | 55 ++--
 arch/x86/events/intel/core.c |  8 ---
 arch/x86/events/intel/ds.c   | 14 +++
 arch/x86/events/perf_event.h |  3 +++
 4 files changed, 55 insertions(+), 25 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index fc14697..0bd9554 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -184,16 +184,29 @@ static DEFINE_MUTEX(pmc_reserve_mutex);
 
 #ifdef CONFIG_X86_LOCAL_APIC
 
+static inline int get_possible_num_counters(void)
+{
+   int i, num_counters = x86_pmu.num_counters;
+
+   if (!is_hybrid())
+   return num_counters;
+
+   for (i = 0; i < x86_pmu.num_hybrid_pmus; i++)
+   num_counters = max_t(int, num_counters, 
x86_pmu.hybrid_pmu[i].num_counters);
+
+   return num_counters;
+}
+
 static bool reserve_pmc_hardware(void)
 {
-   int i;
+   int i, num_counters = get_possible_num_counters();
 
-   for (i = 0; i < x86_pmu.num_counters; i++) {
+   for (i = 0; i < num_counters; i++) {
if (!reserve_perfctr_nmi(x86_pmu_event_addr(i)))
goto perfctr_fail;
}
 
-   for (i = 0; i < x86_pmu.num_counters; i++) {
+   for (i = 0; i < num_counters; i++) {
if (!reserve_evntsel_nmi(x86_pmu_config_addr(i)))
goto eventsel_fail;
}
@@ -204,7 +217,7 @@ static bool reserve_pmc_hardware(void)
for (i--; i >= 0; i--)
release_evntsel_nmi(x86_pmu_config_addr(i));
 
-   i = x86_pmu.num_counters;
+   i = num_counters;
 
 perfctr_fail:
for (i--; i >= 0; i--)
@@ -215,9 +228,9 @@ static bool reserve_pmc_hardware(void)
 
 static void release_pmc_hardware(void)
 {
-   int i;
+   int i, num_counters = get_possible_num_counters();
 
-   for (i = 0; i < x86_pmu.num_counters; i++) {
+   for (i = 0; i < num_counters; i++) {
release_perfctr_nmi(x86_pmu_event_addr(i));
release_evntsel_nmi(x86_pmu_config_addr(i));
}
@@ -945,6 +958,7 @@ EXPORT_SYMBOL_GPL(perf_assign_events);
 
 int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
 {
+   int num_counters = hybrid(cpuc->pmu, num_counters);
struct event_constraint *c;
struct perf_event *e;
int n0, i, wmin, wmax, unsched = 0;
@@ -1020,7 +1034,7 @@ int x86_schedule_events(struct cpu_hw_events *cpuc, int 
n, int *assign)
 
/* slow path */
if (i != n) {
-   int gpmax = x86_pmu.num_counters;
+   int gpmax = num_counters;
 
/*
 * Do not allow scheduling of more than half the available
@@ -1041,7 +1055,7 @@ int x86_schedule_events(struct cpu_hw_events *cpuc, int 
n, int *assign)
 * the extra Merge events needed by large increment events.
 */
if (x86_pmu.flags & PMU_FL_PAIR) {
-   gpmax = x86_pmu.num_counters - cpuc->n_pair;
+   gpmax = num_counters - cpuc->n_pair;
WARN_ON(gpmax <= 0);
}
 
@@ -1128,10 +1142,12 @@ static int collect_event(struct cpu_hw_events *cpuc, 
struct perf_event *event,
  */
 static int collect_events(struct cpu_hw_events *cpuc, struct perf_event 
*leader, bool dogrp)
 {
+   int num_counters = hybrid(cpuc->pmu, num_counters);
+   int num_counters_fixed = hybrid(cpuc->pmu, num_counters_fixed);
struct perf_event *event;
int n, max_count;
 
-   max_count = x86_pmu.num_counters + x86_pmu.num_counters_fixed;
+   max_count = num_counters + num_counters_fixed;
 
/* current number of events already accepted */
n = cpuc->n_events;
@@ -1499,18 +1515,18 @@ void perf_event_print_debug(void)
 {
u64 ctrl, status, overflow, pmc_ctrl, pmc_count, prev_left, fixed;
u64 pebs, debugctl;
-   struct cpu_hw_events *cpuc;
+   int cpu = smp_processor_id();
+   struct cpu_hw_events *cpuc = &per_cpu(cpu_hw_events, cpu);
+   int num_counters = hybrid(cpuc->pmu, num_counters);
+   int num_counters_fixed = hybrid(cpuc->pmu, num_cou

[PATCH V5 08/25] perf/x86: Hybrid PMU support for hardware cache event

2021-04-05 Thread kan . liang
From: Kan Liang 

The hardware cache events are different among hybrid PMUs. Each hybrid
PMU should have its own hw cache event table.

The hw_cache_extra_regs is not part of the struct x86_pmu, the hybrid()
cannot be applied here.

Reviewed-by: Andi Kleen 
Signed-off-by: Kan Liang 
---
 arch/x86/events/core.c   | 11 +--
 arch/x86/events/perf_event.h |  9 +
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 0bd9554..d71ca69 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -356,6 +356,7 @@ set_ext_hw_attr(struct hw_perf_event *hwc, struct 
perf_event *event)
 {
	struct perf_event_attr *attr = &event->attr;
unsigned int cache_type, cache_op, cache_result;
+   struct x86_hybrid_pmu *pmu = is_hybrid() ? hybrid_pmu(event->pmu) : 
NULL;
u64 config, val;
 
config = attr->config;
@@ -375,7 +376,10 @@ set_ext_hw_attr(struct hw_perf_event *hwc, struct 
perf_event *event)
return -EINVAL;
cache_result = array_index_nospec(cache_result, 
PERF_COUNT_HW_CACHE_RESULT_MAX);
 
-   val = hw_cache_event_ids[cache_type][cache_op][cache_result];
+   if (pmu)
+   val = 
pmu->hw_cache_event_ids[cache_type][cache_op][cache_result];
+   else
+   val = hw_cache_event_ids[cache_type][cache_op][cache_result];
 
if (val == 0)
return -ENOENT;
@@ -384,7 +388,10 @@ set_ext_hw_attr(struct hw_perf_event *hwc, struct 
perf_event *event)
return -EINVAL;
 
hwc->config |= val;
-   attr->config1 = hw_cache_extra_regs[cache_type][cache_op][cache_result];
+   if (pmu)
+   attr->config1 = 
pmu->hw_cache_extra_regs[cache_type][cache_op][cache_result];
+   else
+   attr->config1 = 
hw_cache_extra_regs[cache_type][cache_op][cache_result];
return x86_pmu_extra_regs(val, event);
 }
 
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index cfb2da0..203c165 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -640,6 +640,15 @@ struct x86_hybrid_pmu {
int num_counters;
int num_counters_fixed;
struct event_constraint unconstrained;
+
+   u64 hw_cache_event_ids
+   [PERF_COUNT_HW_CACHE_MAX]
+   [PERF_COUNT_HW_CACHE_OP_MAX]
+   [PERF_COUNT_HW_CACHE_RESULT_MAX];
+   u64 hw_cache_extra_regs
+   [PERF_COUNT_HW_CACHE_MAX]
+   [PERF_COUNT_HW_CACHE_OP_MAX]
+   [PERF_COUNT_HW_CACHE_RESULT_MAX];
 };
 
 static __always_inline struct x86_hybrid_pmu *hybrid_pmu(struct pmu *pmu)
-- 
2.7.4



[PATCH V5 11/25] perf/x86/intel: Factor out intel_pmu_check_num_counters

2021-04-05 Thread kan . liang
From: Kan Liang 

Each Hybrid PMU has to check its own number of counters and mask fixed
counters before registration.

The intel_pmu_check_num_counters will be reused later to check the
number of the counters for each hybrid PMU.

Reviewed-by: Andi Kleen 
Signed-off-by: Kan Liang 
---
 arch/x86/events/intel/core.c | 38 --
 1 file changed, 24 insertions(+), 14 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index b5b7694..9394646 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -5070,6 +5070,26 @@ static const struct attribute_group *attr_update[] = {
 
 static struct attribute *empty_attrs;
 
+static void intel_pmu_check_num_counters(int *num_counters,
+int *num_counters_fixed,
+u64 *intel_ctrl, u64 fixed_mask)
+{
+   if (*num_counters > INTEL_PMC_MAX_GENERIC) {
+   WARN(1, KERN_ERR "hw perf events %d > max(%d), clipping!",
+*num_counters, INTEL_PMC_MAX_GENERIC);
+   *num_counters = INTEL_PMC_MAX_GENERIC;
+   }
+   *intel_ctrl = (1ULL << *num_counters) - 1;
+
+   if (*num_counters_fixed > INTEL_PMC_MAX_FIXED) {
+   WARN(1, KERN_ERR "hw perf events fixed %d > max(%d), clipping!",
+*num_counters_fixed, INTEL_PMC_MAX_FIXED);
+   *num_counters_fixed = INTEL_PMC_MAX_FIXED;
+   }
+
+   *intel_ctrl |= fixed_mask << INTEL_PMC_IDX_FIXED;
+}
+
 __init int intel_pmu_init(void)
 {
	struct attribute **extra_skl_attr = &empty_attrs;
@@ -5709,20 +5729,10 @@ __init int intel_pmu_init(void)
 
x86_pmu.attr_update = attr_update;
 
-   if (x86_pmu.num_counters > INTEL_PMC_MAX_GENERIC) {
-   WARN(1, KERN_ERR "hw perf events %d > max(%d), clipping!",
-x86_pmu.num_counters, INTEL_PMC_MAX_GENERIC);
-   x86_pmu.num_counters = INTEL_PMC_MAX_GENERIC;
-   }
-   x86_pmu.intel_ctrl = (1ULL << x86_pmu.num_counters) - 1;
-
-   if (x86_pmu.num_counters_fixed > INTEL_PMC_MAX_FIXED) {
-   WARN(1, KERN_ERR "hw perf events fixed %d > max(%d), clipping!",
-x86_pmu.num_counters_fixed, INTEL_PMC_MAX_FIXED);
-   x86_pmu.num_counters_fixed = INTEL_PMC_MAX_FIXED;
-   }
-
-   x86_pmu.intel_ctrl |= (u64)fixed_mask << INTEL_PMC_IDX_FIXED;
+   intel_pmu_check_num_counters(&x86_pmu.num_counters,
+&x86_pmu.num_counters_fixed,
+&x86_pmu.intel_ctrl,
+(u64)fixed_mask);
 
/* AnyThread may be deprecated on arch perfmon v5 or later */
if (x86_pmu.intel_cap.anythread_deprecated)
-- 
2.7.4



[PATCH V5 07/25] perf/x86: Hybrid PMU support for unconstrained

2021-04-05 Thread kan . liang
From: Kan Liang 

The unconstrained value depends on the number of GP and fixed counters.
Each hybrid PMU should use its own unconstrained.
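
When each hybrid PMU is checked before registration, its unconstrained value
can be rebuilt from its own counter numbers, e.g. (a sketch using the
structures of this series):

	/* Sketch: size the unconstrained constraint with this PMU's counters. */
	pmu->unconstrained = (struct event_constraint)
		__EVENT_CONSTRAINT(0, (1ULL << pmu->num_counters) - 1,
				   0, pmu->num_counters, 0, 0);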

Suggested-by: Peter Zijlstra (Intel) 
Signed-off-by: Kan Liang 
---
 arch/x86/events/intel/core.c | 5 -
 arch/x86/events/perf_event.h | 1 +
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 33d26ed..39f57ae 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3147,7 +3147,10 @@ x86_get_event_constraints(struct cpu_hw_events *cpuc, 
int idx,
}
}
 
-   return &unconstrained;
+   if (!is_hybrid() || !cpuc->pmu)
+   return &unconstrained;
+
+   return &hybrid_pmu(cpuc->pmu)->unconstrained;
 }
 
 static struct event_constraint *
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 993f0de..cfb2da0 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -639,6 +639,7 @@ struct x86_hybrid_pmu {
int max_pebs_events;
int num_counters;
int num_counters_fixed;
+   struct event_constraint unconstrained;
 };
 
 static __always_inline struct x86_hybrid_pmu *hybrid_pmu(struct pmu *pmu)
-- 
2.7.4



[PATCH V5 14/25] perf/x86: Remove temporary pmu assignment in event_init

2021-04-05 Thread kan . liang
From: Kan Liang 

The temporary pmu assignment in event_init is unnecessary.

The assignment was introduced by commit 8113070d6639 ("perf_events:
Add fast-path to the rescheduling code"). At that time, event->pmu is
not assigned yet when initializing an event. The assignment is required.
However, from commit 7e5b2a01d2ca ("perf: provide PMU when initing
events"), the event->pmu is provided before event_init is invoked.
The temporary pmu assignment in event_init should be removed.

Reviewed-by: Andi Kleen 
Signed-off-by: Kan Liang 
---
 arch/x86/events/core.c | 11 ---
 1 file changed, 11 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index b79a506..9c931ec 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2299,7 +2299,6 @@ static int validate_group(struct perf_event *event)
 
 static int x86_pmu_event_init(struct perf_event *event)
 {
-   struct pmu *tmp;
int err;
 
switch (event->attr.type) {
@@ -2314,20 +2313,10 @@ static int x86_pmu_event_init(struct perf_event *event)
 
err = __x86_pmu_event_init(event);
if (!err) {
-   /*
-* we temporarily connect event to its pmu
-* such that validate_group() can classify
-* it as an x86 event using is_x86_event()
-*/
-   tmp = event->pmu;
-   event->pmu = &pmu;
-
if (event->group_leader != event)
err = validate_group(event);
else
err = validate_event(event);
-
-   event->pmu = tmp;
}
if (err) {
if (event->destroy)
-- 
2.7.4



[PATCH V5 09/25] perf/x86: Hybrid PMU support for event constraints

2021-04-05 Thread kan . liang
From: Kan Liang 

The events are different among hybrid PMUs. Each hybrid PMU should use
its own event constraints.

Reviewed-by: Andi Kleen 
Signed-off-by: Kan Liang 
---
 arch/x86/events/core.c   | 3 ++-
 arch/x86/events/intel/core.c | 5 +++--
 arch/x86/events/intel/ds.c   | 5 +++--
 arch/x86/events/perf_event.h | 2 ++
 4 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index d71ca69..b866867 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1526,6 +1526,7 @@ void perf_event_print_debug(void)
	struct cpu_hw_events *cpuc = &per_cpu(cpu_hw_events, cpu);
int num_counters = hybrid(cpuc->pmu, num_counters);
int num_counters_fixed = hybrid(cpuc->pmu, num_counters_fixed);
+   struct event_constraint *pebs_constraints = hybrid(cpuc->pmu, 
pebs_constraints);
unsigned long flags;
int idx;
 
@@ -1545,7 +1546,7 @@ void perf_event_print_debug(void)
pr_info("CPU#%d: status: %016llx\n", cpu, status);
pr_info("CPU#%d: overflow:   %016llx\n", cpu, overflow);
pr_info("CPU#%d: fixed:  %016llx\n", cpu, fixed);
-   if (x86_pmu.pebs_constraints) {
+   if (pebs_constraints) {
rdmsrl(MSR_IA32_PEBS_ENABLE, pebs);
pr_info("CPU#%d: pebs:   %016llx\n", cpu, pebs);
}
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 39f57ae..d304ba3 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3136,10 +3136,11 @@ struct event_constraint *
 x86_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
  struct perf_event *event)
 {
+   struct event_constraint *event_constraints = hybrid(cpuc->pmu, 
event_constraints);
struct event_constraint *c;
 
-   if (x86_pmu.event_constraints) {
-   for_each_event_constraint(c, x86_pmu.event_constraints) {
+   if (event_constraints) {
+   for_each_event_constraint(c, event_constraints) {
if (constraint_match(c, event->hw.config)) {
event->hw.flags |= c->flags;
return c;
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 312bf3b..f1402bc 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -959,13 +959,14 @@ struct event_constraint 
intel_spr_pebs_event_constraints[] = {
 
 struct event_constraint *intel_pebs_constraints(struct perf_event *event)
 {
+   struct event_constraint *pebs_constraints = hybrid(event->pmu, 
pebs_constraints);
struct event_constraint *c;
 
if (!event->attr.precise_ip)
return NULL;
 
-   if (x86_pmu.pebs_constraints) {
-   for_each_event_constraint(c, x86_pmu.pebs_constraints) {
+   if (pebs_constraints) {
+   for_each_event_constraint(c, pebs_constraints) {
if (constraint_match(c, event->hw.config)) {
event->hw.flags |= c->flags;
return c;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 203c165..c32e9dc 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -649,6 +649,8 @@ struct x86_hybrid_pmu {
[PERF_COUNT_HW_CACHE_MAX]
[PERF_COUNT_HW_CACHE_OP_MAX]
[PERF_COUNT_HW_CACHE_RESULT_MAX];
+   struct event_constraint *event_constraints;
+   struct event_constraint *pebs_constraints;
 };
 
 static __always_inline struct x86_hybrid_pmu *hybrid_pmu(struct pmu *pmu)
-- 
2.7.4



[PATCH V5 10/25] perf/x86: Hybrid PMU support for extra_regs

2021-04-05 Thread kan . liang
From: Kan Liang 

Different hybrid PMUs may have different extra registers, e.g. the Core PMU
may have offcore registers, frontend register and ldlat register. Atom
core may only have offcore registers and ldlat register. Each hybrid PMU
should use its own extra_regs.

An Intel Hybrid system should always have extra registers.
Unconditionally allocate shared_regs for Intel Hybrid system.

Reviewed-by: Andi Kleen 
Signed-off-by: Kan Liang 
---
 arch/x86/events/core.c   |  5 +++--
 arch/x86/events/intel/core.c | 15 +--
 arch/x86/events/perf_event.h |  1 +
 3 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index b866867..b79a506 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -153,15 +153,16 @@ u64 x86_perf_event_update(struct perf_event *event)
  */
 static int x86_pmu_extra_regs(u64 config, struct perf_event *event)
 {
+   struct extra_reg *extra_regs = hybrid(event->pmu, extra_regs);
struct hw_perf_event_extra *reg;
struct extra_reg *er;
 
	reg = &event->hw.extra_reg;
 
-   if (!x86_pmu.extra_regs)
+   if (!extra_regs)
return 0;
 
-   for (er = x86_pmu.extra_regs; er->msr; er++) {
+   for (er = extra_regs; er->msr; er++) {
if (er->event != (config & er->config_mask))
continue;
if (event->attr.config1 & ~er->valid_mask)
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index d304ba3..b5b7694 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2966,8 +2966,10 @@ intel_vlbr_constraints(struct perf_event *event)
return NULL;
 }
 
-static int intel_alt_er(int idx, u64 config)
+static int intel_alt_er(struct cpu_hw_events *cpuc,
+   int idx, u64 config)
 {
+   struct extra_reg *extra_regs = hybrid(cpuc->pmu, extra_regs);
int alt_idx = idx;
 
if (!(x86_pmu.flags & PMU_FL_HAS_RSP_1))
@@ -2979,7 +2981,7 @@ static int intel_alt_er(int idx, u64 config)
if (idx == EXTRA_REG_RSP_1)
alt_idx = EXTRA_REG_RSP_0;
 
-   if (config & ~x86_pmu.extra_regs[alt_idx].valid_mask)
+   if (config & ~extra_regs[alt_idx].valid_mask)
return idx;
 
return alt_idx;
@@ -2987,15 +2989,16 @@ static int intel_alt_er(int idx, u64 config)
 
 static void intel_fixup_er(struct perf_event *event, int idx)
 {
+   struct extra_reg *extra_regs = hybrid(event->pmu, extra_regs);
event->hw.extra_reg.idx = idx;
 
if (idx == EXTRA_REG_RSP_0) {
event->hw.config &= ~INTEL_ARCH_EVENT_MASK;
-   event->hw.config |= x86_pmu.extra_regs[EXTRA_REG_RSP_0].event;
+   event->hw.config |= extra_regs[EXTRA_REG_RSP_0].event;
event->hw.extra_reg.reg = MSR_OFFCORE_RSP_0;
} else if (idx == EXTRA_REG_RSP_1) {
event->hw.config &= ~INTEL_ARCH_EVENT_MASK;
-   event->hw.config |= x86_pmu.extra_regs[EXTRA_REG_RSP_1].event;
+   event->hw.config |= extra_regs[EXTRA_REG_RSP_1].event;
event->hw.extra_reg.reg = MSR_OFFCORE_RSP_1;
}
 }
@@ -3071,7 +3074,7 @@ __intel_shared_reg_get_constraints(struct cpu_hw_events 
*cpuc,
 */
c = NULL;
} else {
-   idx = intel_alt_er(idx, reg->config);
+   idx = intel_alt_er(cpuc, idx, reg->config);
if (idx != reg->idx) {
			raw_spin_unlock_irqrestore(&era->lock, flags);
goto again;
@@ -4161,7 +4164,7 @@ int intel_cpuc_prepare(struct cpu_hw_events *cpuc, int 
cpu)
 {
cpuc->pebs_record_size = x86_pmu.pebs_record_size;
 
-   if (x86_pmu.extra_regs || x86_pmu.lbr_sel_map) {
+   if (is_hybrid() || x86_pmu.extra_regs || x86_pmu.lbr_sel_map) {
cpuc->shared_regs = allocate_shared_regs(cpu);
if (!cpuc->shared_regs)
goto err;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index c32e9dc..5679c12 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -651,6 +651,7 @@ struct x86_hybrid_pmu {
[PERF_COUNT_HW_CACHE_RESULT_MAX];
struct event_constraint *event_constraints;
struct event_constraint *pebs_constraints;
+   struct extra_reg*extra_regs;
 };
 
 static __always_inline struct x86_hybrid_pmu *hybrid_pmu(struct pmu *pmu)
-- 
2.7.4



[PATCH V5 02/25] x86/cpu: Add helper function to get the type of the current hybrid CPU

2021-04-05 Thread kan . liang
From: Ricardo Neri 

On processors with Intel Hybrid Technology (i.e., one having more than
one type of CPU in the same package), all CPUs support the same
instruction set and enumerate the same features on CPUID. Thus, all
software can run on any CPU without restrictions. However, there may be
model-specific differences among types of CPUs. For instance, each type
of CPU may support a different number of performance counters. Also,
machine check error banks may be wired differently. Even though most
software will not care about these differences, kernel subsystems
dealing with these differences must know.

Add and expose a new helper function get_this_hybrid_cpu_type() to query
the type of the current hybrid CPU. The function will be used later in
the perf subsystem.

The Intel Software Developer's Manual defines the CPU type as an 8-bit
identifier.

Cc: Andi Kleen 
Cc: Andy Lutomirski 
Cc: Dave Hansen 
Cc: Kan Liang 
Cc: "Peter Zijlstra (Intel)" 
Cc: "Rafael J. Wysocki" 
Cc: "Ravi V. Shankar" 
Cc: Srinivas Pandruvada 
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Len Brown 
Reviewed-by: Tony Luck 
Acked-by: Borislav Petkov 
Signed-off-by: Ricardo Neri 
---
Changes since v4 (as part of patchset for perf change for Alderlake)
 * Put the X86_HYBRID_CPU_TYPE_ID_SHIFT over the function where it is
   used (Boris) 
 * Add Acked-by

Changes since v3 (as part of patchset for perf change for Alderlake)
 * None

Changes since v2 (as part of patchset for perf change for Alderlake)
 * Use get_this_hybrid_cpu_type() to replace get_hybrid_cpu_type() to
   avoid the trouble of IPIs. The new function retrieves the type of the
   current hybrid CPU. It's good enough for perf. (Dave)
 * Remove definitions for Atom and Core CPU types. Perf will define a
   enum for the hybrid CPU type in the perf_event.h (Peter)
 * Remove X86_HYBRID_CPU_NATIVE_MODEL_ID_MASK. Not used in the patch
   set. (Kan)
 * Update the description accordingly. (Boris)

Changes since v1 (as part of patchset for perf change for Alderlake)
 * Removed cpuinfo_x86.x86_cpu_type. It can be added later if needed.
   Instead, implement helper functions that subsystems can use.(Boris)
 * Add definitions for Atom and Core CPU types. (Kan)

Changes since v1 (in a separate posting)
 * Simplify code by using cpuid_eax(). (Boris)
 * Reworded the commit message to clarify the concept of Intel Hybrid
   Technology. Stress that all CPUs can run the same instruction set
   and support the same features.
---
 arch/x86/include/asm/cpu.h  |  6 ++
 arch/x86/kernel/cpu/intel.c | 16 
 2 files changed, 22 insertions(+)

diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index da78ccb..610905d 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -45,6 +45,7 @@ extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 
*c);
 extern void switch_to_sld(unsigned long tifn);
 extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
 extern bool handle_guest_split_lock(unsigned long ip);
+u8 get_this_hybrid_cpu_type(void);
 #else
 static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
 static inline void switch_to_sld(unsigned long tifn) {}
@@ -57,6 +58,11 @@ static inline bool handle_guest_split_lock(unsigned long ip)
 {
return false;
 }
+
+static inline u8 get_this_hybrid_cpu_type(void)
+{
+   return 0;
+}
 #endif
 #ifdef CONFIG_IA32_FEAT_CTL
 void init_ia32_feat_ctl(struct cpuinfo_x86 *c);
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 0e422a5..26fb626 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -1195,3 +1195,19 @@ void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
cpu_model_supports_sld = true;
split_lock_setup();
 }
+
+#define X86_HYBRID_CPU_TYPE_ID_SHIFT   24
+
+/**
+ * get_this_hybrid_cpu_type() - Get the type of this hybrid CPU
+ *
+ * Returns the CPU type [31:24] (i.e., Atom or Core) of a CPU in
+ * a hybrid processor. If the processor is not hybrid, returns 0.
+ */
+u8 get_this_hybrid_cpu_type(void)
+{
+   if (!cpu_feature_enabled(X86_FEATURE_HYBRID_CPU))
+   return 0;
+
+   return cpuid_eax(0x001a) >> X86_HYBRID_CPU_TYPE_ID_SHIFT;
+}
-- 
2.7.4



[PATCH V5 04/25] perf/x86/intel: Hybrid PMU support for perf capabilities

2021-04-05 Thread kan . liang
From: Kan Liang 

Some platforms, e.g. Alder Lake, have hybrid architecture. Although most
PMU capabilities are the same, there are still some unique PMU
capabilities for different hybrid PMUs. Perf should register a dedicated
pmu for each hybrid PMU.

Add a new struct x86_hybrid_pmu, which saves the dedicated pmu and
capabilities for each hybrid PMU.

The architecture MSR, MSR_IA32_PERF_CAPABILITIES, only indicates the
architecture features which are available on all hybrid PMUs. The
architecture features are stored in the global x86_pmu.intel_cap.

For Alder Lake, the model-specific features are perf metrics and
PEBS-via-PT. The corresponding bits of the global x86_pmu.intel_cap
should be 0 for these two features. Perf should not use the global
intel_cap to check the features on a hybrid system.
Add a dedicated intel_cap in the x86_hybrid_pmu to store the
model-specific capabilities. Use the dedicated intel_cap to replace
the global intel_cap for thse two features. The dedicated intel_cap
will be set in the following "Add Alder Lake Hybrid support" patch.

Add is_hybrid() to distinguish a hybrid system. ADL may have an
alternative configuration. With that configuration, the
X86_FEATURE_HYBRID_CPU is not set. Perf cannot rely on the feature bit.
The number of hybrid PMUs implies whether it's a hybrid system. The
number will be assigned in the following "Add Alder Lake Hybrid support"
patch as well.
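
So is_hybrid() can simply key off that count, e.g. (a sketch):

/* Sketch: a non-zero number of hybrid PMUs implies a hybrid system. */
#define is_hybrid()		(!!x86_pmu.num_hybrid_pmus)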

Suggested-by: Peter Zijlstra (Intel) 
Signed-off-by: Kan Liang 
---
 arch/x86/events/core.c   |  6 --
 arch/x86/events/intel/core.c | 27 ++-
 arch/x86/events/intel/ds.c   |  2 +-
 arch/x86/events/perf_event.h | 34 ++
 arch/x86/include/asm/msr-index.h |  3 +++
 5 files changed, 64 insertions(+), 8 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index e564e96..d3d3c6b 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1105,8 +1105,9 @@ static void del_nr_metric_event(struct cpu_hw_events 
*cpuc,
 static int collect_event(struct cpu_hw_events *cpuc, struct perf_event *event,
 int max_count, int n)
 {
+   union perf_capabilities intel_cap = hybrid(cpuc->pmu, intel_cap);
 
-   if (x86_pmu.intel_cap.perf_metrics && add_nr_metric_event(cpuc, event))
+   if (intel_cap.perf_metrics && add_nr_metric_event(cpuc, event))
return -EINVAL;
 
if (n >= max_count + cpuc->n_metric)
@@ -1582,6 +1583,7 @@ void x86_pmu_stop(struct perf_event *event, int flags)
 static void x86_pmu_del(struct perf_event *event, int flags)
 {
	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+   union perf_capabilities intel_cap = hybrid(cpuc->pmu, intel_cap);
int i;
 
/*
@@ -1621,7 +1623,7 @@ static void x86_pmu_del(struct perf_event *event, int 
flags)
}
cpuc->event_constraint[i-1] = NULL;
--cpuc->n_events;
-   if (x86_pmu.intel_cap.perf_metrics)
+   if (intel_cap.perf_metrics)
del_nr_metric_event(cpuc, event);
 
perf_event_update_userpage(event);
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index f116c63..494b9bc 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3646,6 +3646,15 @@ static inline bool is_mem_loads_aux_event(struct 
perf_event *event)
return (event->attr.config & INTEL_ARCH_EVENT_MASK) == 
X86_CONFIG(.event=0x03, .umask=0x82);
 }
 
+static inline bool intel_pmu_has_cap(struct perf_event *event, int idx)
+{
+   union perf_capabilities *intel_cap;
+
+   intel_cap = is_hybrid() ? &hybrid_pmu(event->pmu)->intel_cap :
+ &x86_pmu.intel_cap;
+
+   return test_bit(idx, (unsigned long *)&intel_cap->capabilities);
+}
 
 static int intel_pmu_hw_config(struct perf_event *event)
 {
@@ -3712,7 +3721,7 @@ static int intel_pmu_hw_config(struct perf_event *event)
 * with a slots event as group leader. When the slots event
 * is used in a metrics group, it too cannot support sampling.
 */
-   if (x86_pmu.intel_cap.perf_metrics && is_topdown_event(event)) {
+   if (intel_pmu_has_cap(event, PERF_CAP_METRICS_IDX) && 
is_topdown_event(event)) {
if (event->attr.config1 || event->attr.config2)
return -EINVAL;
 
@@ -4219,8 +4228,16 @@ static void intel_pmu_cpu_starting(int cpu)
if (x86_pmu.version > 1)
		flip_smm_bit(&x86_pmu.attr_freeze_on_smi);
 
-   /* Disable perf metrics if any added CPU doesn't support it. */
-   if (x86_pmu.intel_cap.perf_metrics) {
+   /*
+* Disable perf metrics if any added CPU doesn't support it.
+*
+* Turn off the check for a hybrid architecture, because the
+* architecture MSR, MSR_IA32_PERF_CAPABILITIES, only indicate
+ 

[PATCH V5 03/25] perf/x86: Track pmu in per-CPU cpu_hw_events

2021-04-05 Thread kan . liang
From: Kan Liang 

Some platforms, e.g. Alder Lake, have hybrid architecture. In the same
package, there may be more than one type of CPU. The PMU capabilities
are different among different types of CPU. Perf will register a
dedicated PMU for each type of CPU.

Add a 'pmu' variable in the struct cpu_hw_events to track the dedicated
PMU of the current CPU.

The current x86_get_pmu() uses the global 'pmu', which will be broken on a
hybrid platform. Modify it to apply the 'pmu' of the specific CPU.

Initialize the per-CPU 'pmu' variable with the global 'pmu'. There is
nothing changed for the non-hybrid platforms.

The is_x86_event() will be updated in the later patch ("perf/x86:
Register hybrid PMUs") for hybrid platforms. For the non-hybrid
platforms, nothing is changed here.

Suggested-by: Peter Zijlstra (Intel) 
Signed-off-by: Kan Liang 
---
 arch/x86/events/core.c   | 17 +
 arch/x86/events/intel/core.c |  2 +-
 arch/x86/events/intel/ds.c   |  4 ++--
 arch/x86/events/intel/lbr.c  |  9 +
 arch/x86/events/perf_event.h |  4 +++-
 5 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 18df171..e564e96 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -45,9 +45,11 @@
 #include "perf_event.h"
 
 struct x86_pmu x86_pmu __read_mostly;
+static struct pmu pmu;
 
 DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events) = {
.enabled = 1,
+   .pmu = &pmu,
 };
 
 DEFINE_STATIC_KEY_FALSE(rdpmc_never_available_key);
@@ -724,16 +726,23 @@ void x86_pmu_enable_all(int added)
}
 }
 
-static struct pmu pmu;
-
 static inline int is_x86_event(struct perf_event *event)
 {
	return event->pmu == &pmu;
 }
 
-struct pmu *x86_get_pmu(void)
+struct pmu *x86_get_pmu(unsigned int cpu)
 {
-   return &pmu;
+   struct cpu_hw_events *cpuc = &per_cpu(cpu_hw_events, cpu);
+
+   /*
+* All CPUs of the hybrid type have been offline.
+* The x86_get_pmu() should not be invoked.
+*/
+   if (WARN_ON_ONCE(!cpuc->pmu))
+   return &pmu;
+
+   return cpuc->pmu;
 }
 /*
  * Event scheduler state:
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 7bbb5bb..f116c63 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4876,7 +4876,7 @@ static void update_tfa_sched(void *ignored)
 * and if so force schedule out for all event types all contexts
 */
if (test_bit(3, cpuc->active_mask))
-   perf_pmu_resched(x86_get_pmu());
+   perf_pmu_resched(x86_get_pmu(smp_processor_id()));
 }
 
 static ssize_t show_sysctl_tfa(struct device *cdev,
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 7ebae18..1bfea8c 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -2192,7 +2192,7 @@ void __init intel_ds_init(void)
PERF_SAMPLE_TIME;
x86_pmu.flags |= PMU_FL_PEBS_ALL;
pebs_qual = "-baseline";
-   x86_get_pmu()->capabilities |= 
PERF_PMU_CAP_EXTENDED_REGS;
+   x86_get_pmu(smp_processor_id())->capabilities 
|= PERF_PMU_CAP_EXTENDED_REGS;
} else {
/* Only basic record supported */
x86_pmu.large_pebs_flags &=
@@ -2207,7 +2207,7 @@ void __init intel_ds_init(void)
 
if (x86_pmu.intel_cap.pebs_output_pt_available) {
pr_cont("PEBS-via-PT, ");
-   x86_get_pmu()->capabilities |= 
PERF_PMU_CAP_AUX_OUTPUT;
+   x86_get_pmu(smp_processor_id())->capabilities 
|= PERF_PMU_CAP_AUX_OUTPUT;
}
 
break;
diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index 21890da..bb4486c 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -705,7 +705,7 @@ void intel_pmu_lbr_add(struct perf_event *event)
 
 void release_lbr_buffers(void)
 {
-   struct kmem_cache *kmem_cache = x86_get_pmu()->task_ctx_cache;
+   struct kmem_cache *kmem_cache;
struct cpu_hw_events *cpuc;
int cpu;
 
@@ -714,6 +714,7 @@ void release_lbr_buffers(void)
 
for_each_possible_cpu(cpu) {
		cpuc = per_cpu_ptr(&cpu_hw_events, cpu);
+   kmem_cache = x86_get_pmu(cpu)->task_ctx_cache;
if (kmem_cache && cpuc->lbr_xsave) {
kmem_cache_free(kmem_cache, cpuc->lbr_xsave);
cpuc->lbr_xsave = NULL;
@@ -1609,7 +1610,7 @@ void intel_pmu_lbr_init_hsw(void)
x86_pmu.lbr_sel_mask = LBR_SEL_MASK;
x86_pmu.lbr_sel_map  = hsw_lbr_sel_map;
 
-   x86_get_pmu()->task_ctx_c

[PATCH V5 01/25] x86/cpufeatures: Enumerate Intel Hybrid Technology feature bit

2021-04-05 Thread kan . liang
From: Ricardo Neri 

Add feature enumeration to identify a processor with Intel Hybrid
Technology: one in which CPUs of more than one type are in the same package.
On a hybrid processor, all CPUs support the same homogeneous (i.e.,
symmetric) instruction set. All CPUs enumerate the same features in CPUID.
Thus, software (user space and kernel) can run and migrate to any CPU in
the system as well as utilize any of the enumerated features without any
change or special provisions. The main differences among CPUs in a hybrid
processor are their power and performance properties.

Cc: Andi Kleen 
Cc: Kan Liang 
Cc: "Peter Zijlstra (Intel)" 
Cc: "Rafael J. Wysocki" 
Cc: "Ravi V. Shankar" 
Cc: Srinivas Pandruvada 
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Len Brown 
Reviewed-by: Tony Luck 
Acked-by: Borislav Petkov 
Signed-off-by: Ricardo Neri 
---
Changes since v4 (as part of patchset for perf change for Alderlake)
 * Add Acked-by

Changes since v3 (as part of patchset for perf change for Alderlake)
 * None

Changes since V2 (as part of patchset for perf change for Alderlake)
 * Don't show "hybrid_cpu" in /proc/cpuinfo (Boris)

Changes since v1 (as part of patchset for perf change for Alderlake)
 * None

Changes since v1 (in a separate posting):
 * Reworded commit message to clearly state what is Intel Hybrid
   Technology. Stress that all CPUs can run the same instruction
   set and support the same features.
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h 
b/arch/x86/include/asm/cpufeatures.h
index cc96e26..1ba4a6e 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -374,6 +374,7 @@
 #define X86_FEATURE_MD_CLEAR   (18*32+10) /* VERW clears CPU buffers */
 #define X86_FEATURE_TSX_FORCE_ABORT(18*32+13) /* "" TSX_FORCE_ABORT */
 #define X86_FEATURE_SERIALIZE  (18*32+14) /* SERIALIZE instruction */
+#define X86_FEATURE_HYBRID_CPU (18*32+15) /* "" This part has CPUs of 
more than one type */
 #define X86_FEATURE_TSXLDTRK   (18*32+16) /* TSX Suspend Load Address 
Tracking */
 #define X86_FEATURE_PCONFIG(18*32+18) /* Intel PCONFIG */
 #define X86_FEATURE_ARCH_LBR   (18*32+19) /* Intel ARCH LBR */
-- 
2.7.4



[PATCH V5 00/25] Add Alder Lake support for perf (kernel)

2021-04-05 Thread kan . liang
From: Kan Liang 

Changes since V4:
- Put the X86_HYBRID_CPU_TYPE_ID_SHIFT over the function where it is
  used (Boris) (Patch 2)
- Add Acked-by from Boris for Patch 1 & 2
- Fix a smatch warning, "allocate_fake_cpuc() warn: possible memory
  leak of 'cpuc'" (0-DAY test) (Patch 16)

Changes since V3:
- Check whether the supported_cpus is empty in allocate_fake_cpuc().
  A user may offline all the CPUs of a certain type. Perf should not
  create an event for that PMU. (Patch 16)
- Don't clear a cpuc->pmu when the cpu is offlined in intel_pmu_cpu_dead().
  We never unregister a PMU, even all the CPUs of a certain type are
  offlined. A cpuc->pmu should be always valid and unchanged. There is no
  harm to keep the pointer of the PMU. Also, some functions, e.g.,
  release_lbr_buffers(), require a valid cpuc->pmu for each possible CPU.
  (Patch 16)
- ADL may have an alternative configuration. With that configuration
  X86_FEATURE_HYBRID_CPU is not set. Perf cannot retrieve the core type
  from the CPUID leaf 0x1a either.
  Use the number of hybrid PMUs, which implies a hybrid system, to replace
  the check of the X86_FEATURE_HYBRID_CPU. (Patch 4)
  Introduce a platform specific get_hybrid_cpu_type to retrieve the core
  type if the generic one doesn't return a valid core type. (Patch 16 & 20)

Changes since V2:
- Don't show "hybrid_cpu" in /proc/cpuinfo (Boris) (Patch 1)
- Use get_this_hybrid_cpu_type() to replace get_hybrid_cpu_type() to
  avoid the trouble of IPIs. The new function retrieves the type of the
  current hybrid CPU. It's good enough for perf. (Dave) (Patch 2)
- Remove definitions for Atom and Core CPU types. Perf will define a
  enum for the hybrid CPU type in the perf_event.h (Peter) (Patch 2 & 16)
- Remove X86_HYBRID_CPU_NATIVE_MODEL_ID_MASK. Not used in the patch set
  (Kan)(Patch 2)
- Update the description of the patch 2 accordingly. (Boris) (Patch 2)
- All the hybrid PMUs are registered at boot time. (Peter) (Patch 16)
- Align all ATTR things. (Peter) (Patch 20)
- The patchset doesn't change the caps/pmu_name. The perf tool doesn't
  rely on it to distinguish the event list. The caps/pmu_name is only to
  indicate the microarchitecture, which is the hybrid Alder Lake for
  both PMUs.

Changes since V1:
- Drop all user space patches, which will be reviewed later separately.
- Don't save the CPU type in struct cpuinfo_x86. Instead, provide helper
  functions to get parameters of hybrid CPUs. (Boris)
- Rework the perf kernel patches according to Peter's suggestion. The
  key changes include,
  - Code style changes. Drop all the macro which names in capital
letters.
  - Drop the hybrid PMU index, track the pointer of the hybrid PMU in
the per-CPU struct cpu_hw_events.
  - Fix the x86_get_pmu() support
  - Fix the allocate_fake_cpuc() support
  - Fix validate_group() support
  - Dynamically allocate the *hybrid_pmu for each hybrid PMU

Alder Lake uses a hybrid architecture utilizing Golden Cove cores
and Gracemont cores. On such architectures, all CPUs support the same,
homogeneous and symmetric, instruction set. Also, CPUID enumerate
the same features for all CPUs. There may be model-specific differences,
such as those addressed in this patchset.

The first two patches enumerate the hybrid CPU feature bit and provide
a helper function to get the CPU type of hybrid CPUs. (The initial idea
[1] was to save the CPU type in a new field x86_cpu_type in struct
cpuinfo_x86. Since the only user of the new field is perf, querying the
X86_FEATURE_HYBRID_CPU at the call site is a simpler alternative.[2])
Compared with the initial submission, the below two concerns[3][4] are
also addressed,
- Provide a good use case, PMU.
- Clarify what Intel Hybrid Technology is and is not.

The PMU capabilities for Golden Cove core and Gracemont core are not the
same. The key differences include the number of counters, events, perf
metrics feature, and PEBS-via-PT feature. A dedicated hybrid PMU has to
be registered for each of them. However, the current perf X86 assumes
that there is only one CPU PMU. To handle the hybrid PMUs, the patchset
- Introduce a new struct x86_hybrid_pmu to save the unique capabilities
  from different PMUs. It's part of the global x86_pmu. The architecture
  capabilities, which are available for all PMUs, are still saved in
  the global x86_pmu. To save the space, the x86_hybrid_pmu is
  dynamically allocated.
- The hybrid PMU registration has been moved to the cpu_starting(),
  because only boot CPU is available when invoking the
  init_hw_perf_events().
- Hybrid PMUs have different events and formats. Add new structures and
  helpers for events attribute and format attribute which take the PMU
  type into account.
- Add a PMU aware version PERF_TYPE_HARDWARE_PMU and
  PERF_TYPE_HW_CACHE_PMU to facilitate user space tools

The uncore, MSR and cstate are the same between hybrid CPUs.
There is no need to register hybrid PMUs for them.

The generi

[tip: perf/core] perf/x86/intel/uncore: Generic support for the MSR type of uncore blocks

2021-04-02 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/core branch of tip:

Commit-ID: d6c754130435ab786711bed75d04a2388a6b4da8
Gitweb:
https://git.kernel.org/tip/d6c754130435ab786711bed75d04a2388a6b4da8
Author:Kan Liang 
AuthorDate:Wed, 17 Mar 2021 10:59:34 -07:00
Committer: Peter Zijlstra 
CommitterDate: Fri, 02 Apr 2021 10:04:54 +02:00

perf/x86/intel/uncore: Generic support for the MSR type of uncore blocks

The discovery table provides the generic uncore block information for
the MSR type of uncore blocks, e.g., the counter width, the number of
counters, the location of control/counter registers, which is good
enough to provide basic uncore support. It can be used as a fallback
solution when the kernel doesn't support a platform.

The name of the uncore box cannot be retrieved from the discovery table.
uncore_type_<type_id>_<box_id> will be used as its name. Save the type ID
and the box ID information in the struct intel_uncore_type.
Factor out uncore_get_pmu_name() to handle different naming methods.

Implement generic support for the MSR type of uncore block.

Some advanced features, such as filters and constraints, cannot be
retrieved from discovery tables. Features that rely on that
information are not supported here.
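
As a rough sketch of what the generic MSR-type support boils down to
(names simplified; the actual implementation lives in the
uncore_discovery.c hunk, which is not fully shown below), the counter
and control registers discovered from the table are simply programmed
with wrmsrl():

/* Sketch only: enable/disable an event on a discovered MSR-type box. */
static void generic_uncore_msr_enable_event(struct intel_uncore_box *box,
					    struct perf_event *event)
{
	struct hw_perf_event *hwc = &event->hw;

	/* config_base was filled in from the discovery table */
	wrmsrl(hwc->config_base, hwc->config);
}

static void generic_uncore_msr_disable_event(struct intel_uncore_box *box,
					     struct perf_event *event)
{
	wrmsrl(event->hw.config_base, 0);
}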

Signed-off-by: Kan Liang 
Signed-off-by: Peter Zijlstra (Intel) 
Link: 
https://lkml.kernel.org/r/1616003977-90612-3-git-send-email-kan.li...@linux.intel.com
---
 arch/x86/events/intel/uncore.c   |  45 ++--
 arch/x86/events/intel/uncore.h   |   3 +-
 arch/x86/events/intel/uncore_discovery.c | 126 ++-
 arch/x86/events/intel/uncore_discovery.h |  18 +++-
 4 files changed, 182 insertions(+), 10 deletions(-)

diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index d111370..dabc01f 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -10,7 +10,7 @@ static bool uncore_no_discover;
 module_param(uncore_no_discover, bool, 0);
 MODULE_PARM_DESC(uncore_no_discover, "Don't enable the Intel uncore PerfMon discovery mechanism "
			     "(default: enable the discovery mechanism).");
-static struct intel_uncore_type *empty_uncore[] = { NULL, };
+struct intel_uncore_type *empty_uncore[] = { NULL, };
 struct intel_uncore_type **uncore_msr_uncores = empty_uncore;
 struct intel_uncore_type **uncore_pci_uncores = empty_uncore;
 struct intel_uncore_type **uncore_mmio_uncores = empty_uncore;
@@ -834,6 +834,34 @@ static const struct attribute_group uncore_pmu_attr_group 
= {
.attrs = uncore_pmu_attrs,
 };
 
+static void uncore_get_pmu_name(struct intel_uncore_pmu *pmu)
+{
+   struct intel_uncore_type *type = pmu->type;
+
+   /*
+    * No uncore block name in discovery table.
+    * Use uncore_type_<type_id>_<box_id> as name.
+    */
+   if (!type->name) {
+   if (type->num_boxes == 1)
+   sprintf(pmu->name, "uncore_type_%u", type->type_id);
+   else {
+   sprintf(pmu->name, "uncore_type_%u_%d",
+   type->type_id, type->box_ids[pmu->pmu_idx]);
+   }
+   return;
+   }
+
+   if (type->num_boxes == 1) {
+   if (strlen(type->name) > 0)
+   sprintf(pmu->name, "uncore_%s", type->name);
+   else
+   sprintf(pmu->name, "uncore");
+   } else
+   sprintf(pmu->name, "uncore_%s_%d", type->name, pmu->pmu_idx);
+
+}
+
 static int uncore_pmu_register(struct intel_uncore_pmu *pmu)
 {
int ret;
@@ -860,15 +888,7 @@ static int uncore_pmu_register(struct intel_uncore_pmu 
*pmu)
pmu->pmu.attr_update = pmu->type->attr_update;
}
 
-   if (pmu->type->num_boxes == 1) {
-   if (strlen(pmu->type->name) > 0)
-   sprintf(pmu->name, "uncore_%s", pmu->type->name);
-   else
-   sprintf(pmu->name, "uncore");
-   } else {
-   sprintf(pmu->name, "uncore_%s_%d", pmu->type->name,
-   pmu->pmu_idx);
-   }
+   uncore_get_pmu_name(pmu);
 
	ret = perf_pmu_register(&pmu->pmu, pmu->name, -1);
if (!ret)
@@ -909,6 +929,10 @@ static void uncore_type_exit(struct intel_uncore_type 
*type)
kfree(type->pmus);
type->pmus = NULL;
}
+   if (type->box_ids) {
+   kfree(type->box_ids);
+   type->box_ids = NULL;
+   }
kfree(type->events_group);
type->events_group = NULL;
 }
@@ -1643,6 +1667,7 @@ static const struct intel_uncore_init_fun snr_uncore_init 
__initconst = {
 };
 
 static const struct intel_uncore_init_fun generic_uncore_init __initcons

[tip: perf/core] perf/x86/intel/uncore: Parse uncore discovery tables

2021-04-02 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/core branch of tip:

Commit-ID: edae1f06c2cda41edffc93de6aedc8ba8dc883c3
Gitweb:
https://git.kernel.org/tip/edae1f06c2cda41edffc93de6aedc8ba8dc883c3
Author:Kan Liang 
AuthorDate:Wed, 17 Mar 2021 10:59:33 -07:00
Committer: Peter Zijlstra 
CommitterDate: Fri, 02 Apr 2021 10:04:54 +02:00

perf/x86/intel/uncore: Parse uncore discovery tables

A self-describing mechanism for the uncore PerfMon hardware has been
introduced with the latest Intel platforms. By reading through an MMIO
page worth of information, perf can 'discover' all the standard uncore
PerfMon registers in a machine.

The discovery mechanism relies on BIOS support. With a proper BIOS,
a PCI device with the unique capability ID 0x23 can be found on each
die. Perf can retrieve the information of all available uncore PerfMons
from the device via MMIO. The information is composed of one global
discovery table and several unit discovery tables.
- The global discovery table includes global uncore information of the
  die, e.g., the address of the global control register, the offset of
  the global status register, the number of uncore units, the offset of
  unit discovery tables, etc.
- The unit discovery table includes generic uncore unit information,
  e.g., the access type, the counter width, the address of counters,
  the address of the counter control, the unit ID, the unit type, etc.
  The unit is also called "box" in the code.
Perf can provide basic uncore support based on this information
with the following patches.
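
For illustration only, a unit ("box") discovery entry carries roughly
the following information; the field names and widths here are
assumptions, not the exact layout defined in uncore_discovery.h:

struct example_unit_discovery {
	u8	access_type;	/* MSR, PCI or MMIO */
	u8	counter_width;
	u8	num_counters;
	u16	unit_id;	/* the "box" ID */
	u16	unit_type;
	u64	counter_base;	/* address of the first counter */
	u64	ctl_base;	/* address of the counter control */
};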

To locate the PCI device with the discovery tables, check the generic
PCI ID first. If it doesn't match, go through the entire PCI device tree
and locate the device with the unique capability ID.

The uncore information is similar among dies. To save parsing time and
space, only completely parse and store the discovery tables on the first
die and the first box of each die. The parsed information is stored in
an RB tree structure, intel_uncore_discovery_type. The size of the stored
discovery tables varies among platforms. It's around 4KB for a Sapphire
Rapids server.

If a BIOS doesn't support the 'discovery' mechanism, the uncore driver
will exit with -ENODEV, and nothing else is changed.

Add a module parameter to disable the discovery feature. If a BIOS gets
the discovery tables wrong, users have an option to disable the
feature. For the current patchset, the uncore driver will exit with
-ENODEV in that case. In the future, it may fall back to the hard-coded
uncore driver on a known platform.

Signed-off-by: Kan Liang 
Signed-off-by: Peter Zijlstra (Intel) 
Link: 
https://lkml.kernel.org/r/1616003977-90612-2-git-send-email-kan.li...@linux.intel.com
---
 arch/x86/events/intel/Makefile   |   2 +-
 arch/x86/events/intel/uncore.c   |  31 +-
 arch/x86/events/intel/uncore_discovery.c | 318 ++-
 arch/x86/events/intel/uncore_discovery.h | 105 +++-
 4 files changed, 448 insertions(+), 8 deletions(-)
 create mode 100644 arch/x86/events/intel/uncore_discovery.c
 create mode 100644 arch/x86/events/intel/uncore_discovery.h

diff --git a/arch/x86/events/intel/Makefile b/arch/x86/events/intel/Makefile
index e67a588..10bde6c 100644
--- a/arch/x86/events/intel/Makefile
+++ b/arch/x86/events/intel/Makefile
@@ -3,6 +3,6 @@ obj-$(CONFIG_CPU_SUP_INTEL) += core.o bts.o
 obj-$(CONFIG_CPU_SUP_INTEL)+= ds.o knc.o
 obj-$(CONFIG_CPU_SUP_INTEL)+= lbr.o p4.o p6.o pt.o
 obj-$(CONFIG_PERF_EVENTS_INTEL_UNCORE) += intel-uncore.o
-intel-uncore-objs  := uncore.o uncore_nhmex.o uncore_snb.o uncore_snbep.o
+intel-uncore-objs  := uncore.o uncore_nhmex.o uncore_snb.o uncore_snbep.o uncore_discovery.o
 obj-$(CONFIG_PERF_EVENTS_INTEL_CSTATE) += intel-cstate.o
 intel-cstate-objs  := cstate.o
diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index 33c8180..d111370 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -4,7 +4,12 @@
 #include 
 #include 
 #include "uncore.h"
+#include "uncore_discovery.h"
 
+static bool uncore_no_discover;
+module_param(uncore_no_discover, bool, 0);
+MODULE_PARM_DESC(uncore_no_discover, "Don't enable the Intel uncore PerfMon discovery mechanism "
+			     "(default: enable the discovery mechanism).");
 static struct intel_uncore_type *empty_uncore[] = { NULL, };
 struct intel_uncore_type **uncore_msr_uncores = empty_uncore;
 struct intel_uncore_type **uncore_pci_uncores = empty_uncore;
@@ -1637,6 +1642,9 @@ static const struct intel_uncore_init_fun snr_uncore_init 
__initconst = {
.mmio_init = snr_uncore_mmio_init,
 };
 
+static const struct intel_uncore_init_fun generic_uncore_init __initconst = {
+};
+
 static const struct x86_cpu_id intel_uncore_match[] __initconst = {
X86_MATCH_

[tip: perf/core] perf/x86/intel/uncore: Generic support for the PCI type of uncore blocks

2021-04-02 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/core branch of tip:

Commit-ID: 42839ef4a20a4bda415974ff0e7d85ff540fffa4
Gitweb:
https://git.kernel.org/tip/42839ef4a20a4bda415974ff0e7d85ff540fffa4
Author:Kan Liang 
AuthorDate:Wed, 17 Mar 2021 10:59:36 -07:00
Committer: Peter Zijlstra 
CommitterDate: Fri, 02 Apr 2021 10:04:55 +02:00

perf/x86/intel/uncore: Generic support for the PCI type of uncore blocks

The discovery table provides the generic uncore block information
for the PCI type of uncore blocks, which is good enough to provide
basic uncore support.

The PCI BUS and DEVFN information can be retrieved from the box control
field. Introduce the uncore_pci_pmus_register() to register all the
PCICFG type of uncore blocks. The old PCI probe/remove way is dropped.

The PCI BUS and DEVFN information are different among dies. Add box_ctls
to store the box control field of each die.

Add a new BUS notifier for the PCI type of uncore block to support the
hotplug. If a device is hot removed, the corresponding registered PMU
has to be unregistered. Perf cannot locate the PMU by searching a const
pci_device_id table, because the discovery tables don't provide such
information. Introduce uncore_pci_find_dev_pmu_from_types() to search
the whole uncore_pci_uncores for the PMU.

Implement generic support for the PCI type of uncore block.
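
For reference, a sketch of how a PCI device is matched against a box
control field; the bit positions are assumptions for illustration, the
real UNCORE_DISCOVERY_PCI_* macros are defined in uncore_discovery.h:

#include <linux/pci.h>

/* Illustrative layout of the PCICFG box control field (assumed). */
#define EXAMPLE_PCI_DOMAIN(data)	(((data) >> 28) & 0x7)
#define EXAMPLE_PCI_BUS(data)		(((data) >> 20) & 0xff)
#define EXAMPLE_PCI_DEVFN(data)		(((data) >> 12) & 0xff)

static bool example_box_ctl_matches(struct pci_dev *pdev, u64 box_ctl)
{
	return pdev->devfn == EXAMPLE_PCI_DEVFN(box_ctl) &&
	       pdev->bus->number == EXAMPLE_PCI_BUS(box_ctl) &&
	       pci_domain_nr(pdev->bus) == EXAMPLE_PCI_DOMAIN(box_ctl);
}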

Signed-off-by: Kan Liang 
Signed-off-by: Peter Zijlstra (Intel) 
Link: 
https://lkml.kernel.org/r/1616003977-90612-5-git-send-email-kan.li...@linux.intel.com
---
 arch/x86/events/intel/uncore.c   | 91 +--
 arch/x86/events/intel/uncore.h   |  6 +-
 arch/x86/events/intel/uncore_discovery.c | 80 -
 arch/x86/events/intel/uncore_discovery.h |  7 ++-
 4 files changed, 177 insertions(+), 7 deletions(-)

diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index 391fa7c..3109082 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -1032,10 +1032,37 @@ static int uncore_pci_get_dev_die_info(struct pci_dev 
*pdev, int *die)
return 0;
 }
 
+static struct intel_uncore_pmu *
+uncore_pci_find_dev_pmu_from_types(struct pci_dev *pdev)
+{
+   struct intel_uncore_type **types = uncore_pci_uncores;
+   struct intel_uncore_type *type;
+   u64 box_ctl;
+   int i, die;
+
+   for (; *types; types++) {
+   type = *types;
+   for (die = 0; die < __uncore_max_dies; die++) {
+   for (i = 0; i < type->num_boxes; i++) {
+   if (!type->box_ctls[die])
+   continue;
+				box_ctl = type->box_ctls[die] + type->pci_offsets[i];
+				if (pdev->devfn == UNCORE_DISCOVERY_PCI_DEVFN(box_ctl) &&
+				    pdev->bus->number == UNCORE_DISCOVERY_PCI_BUS(box_ctl) &&
+				    pci_domain_nr(pdev->bus) == UNCORE_DISCOVERY_PCI_DOMAIN(box_ctl))
+					return &type->pmus[i];
+   }
+   }
+   }
+
+   return NULL;
+}
+
 /*
  * Find the PMU of a PCI device.
  * @pdev: The PCI device.
  * @ids: The ID table of the available PCI devices with a PMU.
+ *   If NULL, search the whole uncore_pci_uncores.
  */
 static struct intel_uncore_pmu *
 uncore_pci_find_dev_pmu(struct pci_dev *pdev, const struct pci_device_id *ids)
@@ -1045,6 +1072,9 @@ uncore_pci_find_dev_pmu(struct pci_dev *pdev, const 
struct pci_device_id *ids)
kernel_ulong_t data;
unsigned int devfn;
 
+   if (!ids)
+   return uncore_pci_find_dev_pmu_from_types(pdev);
+
while (ids && ids->vendor) {
if ((ids->vendor == pdev->vendor) &&
(ids->device == pdev->device)) {
@@ -1283,6 +1313,48 @@ static void uncore_pci_sub_driver_init(void)
uncore_pci_sub_driver = NULL;
 }
 
+static int uncore_pci_bus_notify(struct notifier_block *nb,
+unsigned long action, void *data)
+{
+   return uncore_bus_notify(nb, action, data, NULL);
+}
+
+static struct notifier_block uncore_pci_notifier = {
+   .notifier_call = uncore_pci_bus_notify,
+};
+
+
+static void uncore_pci_pmus_register(void)
+{
+   struct intel_uncore_type **types = uncore_pci_uncores;
+   struct intel_uncore_type *type;
+   struct intel_uncore_pmu *pmu;
+   struct pci_dev *pdev;
+   u64 box_ctl;
+   int i, die;
+
+   for (; *types; types++) {
+   type = *types;
+   for (die = 0; die < __uncore_max_dies; die++) {
+   for (i = 0; i < type->num_boxes; i++) {
+   if (!type->box_ctls[die])
+   

[tip: perf/core] perf/x86/intel/uncore: Rename uncore_notifier to uncore_pci_sub_notifier

2021-04-02 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/core branch of tip:

Commit-ID: 6477dc3934775f82a571fac469fd8c348e611095
Gitweb:
https://git.kernel.org/tip/6477dc3934775f82a571fac469fd8c348e611095
Author:Kan Liang 
AuthorDate:Wed, 17 Mar 2021 10:59:35 -07:00
Committer: Peter Zijlstra 
CommitterDate: Fri, 02 Apr 2021 10:04:54 +02:00

perf/x86/intel/uncore: Rename uncore_notifier to uncore_pci_sub_notifier

Perf will use a similar method to the PCI sub driver to register
the PMUs for the PCI type of uncore blocks. The method requires a BUS
notifier to support hotplug. The current BUS notifier cannot be reused,
because it searches a const id_table for the corresponding registered
PMU. The PCI type of uncore blocks in the discovery tables doesn't
provide an id_table.

Factor out uncore_bus_notify() and add the pointer of an id_table as a
parameter. The uncore_bus_notify() will be reused in the following
patch.

The current BUS notifier is only used by the PCI sub driver. Its name is
too generic. Rename it to uncore_pci_sub_notifier, which is specific for
the PCI sub driver.

Signed-off-by: Kan Liang 
Signed-off-by: Peter Zijlstra (Intel) 
Link: 
https://lkml.kernel.org/r/1616003977-90612-4-git-send-email-kan.li...@linux.intel.com
---
 arch/x86/events/intel/uncore.c | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index dabc01f..391fa7c 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -1203,7 +1203,8 @@ static void uncore_pci_remove(struct pci_dev *pdev)
 }
 
 static int uncore_bus_notify(struct notifier_block *nb,
-unsigned long action, void *data)
+unsigned long action, void *data,
+const struct pci_device_id *ids)
 {
struct device *dev = data;
struct pci_dev *pdev = to_pci_dev(dev);
@@ -1214,7 +1215,7 @@ static int uncore_bus_notify(struct notifier_block *nb,
if (action != BUS_NOTIFY_DEL_DEVICE)
return NOTIFY_DONE;
 
-   pmu = uncore_pci_find_dev_pmu(pdev, uncore_pci_sub_driver->id_table);
+   pmu = uncore_pci_find_dev_pmu(pdev, ids);
if (!pmu)
return NOTIFY_DONE;
 
@@ -1226,8 +1227,15 @@ static int uncore_bus_notify(struct notifier_block *nb,
return NOTIFY_OK;
 }
 
-static struct notifier_block uncore_notifier = {
-   .notifier_call = uncore_bus_notify,
+static int uncore_pci_sub_bus_notify(struct notifier_block *nb,
+unsigned long action, void *data)
+{
+   return uncore_bus_notify(nb, action, data,
+uncore_pci_sub_driver->id_table);
+}
+
+static struct notifier_block uncore_pci_sub_notifier = {
+   .notifier_call = uncore_pci_sub_bus_notify,
 };
 
 static void uncore_pci_sub_driver_init(void)
@@ -1268,7 +1276,7 @@ static void uncore_pci_sub_driver_init(void)
ids++;
}
 
-	if (notify && bus_register_notifier(&pci_bus_type, &uncore_notifier))
+	if (notify && bus_register_notifier(&pci_bus_type, &uncore_pci_sub_notifier))
notify = false;
 
if (!notify)
@@ -1319,7 +1327,7 @@ static void uncore_pci_exit(void)
if (pcidrv_registered) {
pcidrv_registered = false;
if (uncore_pci_sub_driver)
-			bus_unregister_notifier(&pci_bus_type, &uncore_notifier);
+			bus_unregister_notifier(&pci_bus_type, &uncore_pci_sub_notifier);
pci_unregister_driver(uncore_pci_driver);
uncore_types_exit(uncore_pci_uncores);
kfree(uncore_extra_pci_dev);


[tip: perf/core] perf/x86/intel/uncore: Generic support for the MMIO type of uncore blocks

2021-04-02 Thread tip-bot2 for Kan Liang
The following commit has been merged into the perf/core branch of tip:

Commit-ID: c4c55e362a521d763356b9e02bc9a4348c71a471
Gitweb:
https://git.kernel.org/tip/c4c55e362a521d763356b9e02bc9a4348c71a471
Author:Kan Liang 
AuthorDate:Wed, 17 Mar 2021 10:59:37 -07:00
Committer: Peter Zijlstra 
CommitterDate: Fri, 02 Apr 2021 10:04:55 +02:00

perf/x86/intel/uncore: Generic support for the MMIO type of uncore blocks

The discovery table provides the generic uncore block information
for the MMIO type of uncore blocks, which is good enough to provide
basic uncore support.

The box control field is composed of the BAR address and box control
offset. When initializing the uncore blocks, perf should ioremap the
address from the box control field.

Implement the generic support for the MMIO type of uncore block.

Signed-off-by: Kan Liang 
Signed-off-by: Peter Zijlstra (Intel) 
Link: 
https://lkml.kernel.org/r/1616003977-90612-6-git-send-email-kan.li...@linux.intel.com
---
 arch/x86/events/intel/uncore.c   |  1 +-
 arch/x86/events/intel/uncore.h   |  1 +-
 arch/x86/events/intel/uncore_discovery.c | 98 +++-
 arch/x86/events/intel/uncore_discovery.h |  1 +-
 4 files changed, 101 insertions(+)

diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index 3109082..35b3470 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -1755,6 +1755,7 @@ static const struct intel_uncore_init_fun snr_uncore_init 
__initconst = {
 static const struct intel_uncore_init_fun generic_uncore_init __initconst = {
.cpu_init = intel_uncore_generic_uncore_cpu_init,
.pci_init = intel_uncore_generic_uncore_pci_init,
+   .mmio_init = intel_uncore_generic_uncore_mmio_init,
 };
 
 static const struct x86_cpu_id intel_uncore_match[] __initconst = {
diff --git a/arch/x86/events/intel/uncore.h b/arch/x86/events/intel/uncore.h
index 76fc898..549cfb2 100644
--- a/arch/x86/events/intel/uncore.h
+++ b/arch/x86/events/intel/uncore.h
@@ -70,6 +70,7 @@ struct intel_uncore_type {
union {
unsigned *msr_offsets;
unsigned *pci_offsets;
+   unsigned *mmio_offsets;
};
unsigned *box_ids;
struct event_constraint unconstrainted;
diff --git a/arch/x86/events/intel/uncore_discovery.c 
b/arch/x86/events/intel/uncore_discovery.c
index 784d7b4..aba9bff 100644
--- a/arch/x86/events/intel/uncore_discovery.c
+++ b/arch/x86/events/intel/uncore_discovery.c
@@ -442,6 +442,90 @@ static struct intel_uncore_ops generic_uncore_pci_ops = {
.read_counter   = intel_generic_uncore_pci_read_counter,
 };
 
+#define UNCORE_GENERIC_MMIO_SIZE   0x4000
+
+static unsigned int generic_uncore_mmio_box_ctl(struct intel_uncore_box *box)
+{
+   struct intel_uncore_type *type = box->pmu->type;
+
+	if (!type->box_ctls || !type->box_ctls[box->dieid] || !type->mmio_offsets)
+		return 0;
+
+	return type->box_ctls[box->dieid] + type->mmio_offsets[box->pmu->pmu_idx];
+}
+
+static void intel_generic_uncore_mmio_init_box(struct intel_uncore_box *box)
+{
+   unsigned int box_ctl = generic_uncore_mmio_box_ctl(box);
+   struct intel_uncore_type *type = box->pmu->type;
+   resource_size_t addr;
+
+   if (!box_ctl) {
+   pr_warn("Uncore type %d box %d: Invalid box control address.\n",
+   type->type_id, type->box_ids[box->pmu->pmu_idx]);
+   return;
+   }
+
+   addr = box_ctl;
+   box->io_addr = ioremap(addr, UNCORE_GENERIC_MMIO_SIZE);
+   if (!box->io_addr) {
+   pr_warn("Uncore type %d box %d: ioremap error for 0x%llx.\n",
+   type->type_id, type->box_ids[box->pmu->pmu_idx],
+   (unsigned long long)addr);
+   return;
+   }
+
+   writel(GENERIC_PMON_BOX_CTL_INT, box->io_addr);
+}
+
+static void intel_generic_uncore_mmio_disable_box(struct intel_uncore_box *box)
+{
+   if (!box->io_addr)
+   return;
+
+   writel(GENERIC_PMON_BOX_CTL_FRZ, box->io_addr);
+}
+
+static void intel_generic_uncore_mmio_enable_box(struct intel_uncore_box *box)
+{
+   if (!box->io_addr)
+   return;
+
+   writel(0, box->io_addr);
+}
+
+static void intel_generic_uncore_mmio_enable_event(struct intel_uncore_box *box,
+						    struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+
+   if (!box->io_addr)
+   return;
+
+   writel(hwc->config, box->io_addr + hwc->config_base);
+}
+
+static void intel_generic_uncore_mmio_disable_event(struct intel_uncore_box *box,
+						     struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+
+   if (!box-&g

[PATCH V4 25/25] perf/x86/rapl: Add support for Intel Alder Lake

2021-04-01 Thread kan . liang
From: Zhang Rui 

Alder Lake RAPL support is the same as previous Sky Lake.
Add Alder Lake model for RAPL.

Reviewed-by: Andi Kleen 
Signed-off-by: Zhang Rui 
---
 arch/x86/events/rapl.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index f42a704..84a1042 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -800,6 +800,8 @@ static const struct x86_cpu_id rapl_model_match[] 
__initconst = {
	X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_X,		&model_hsx),
	X86_MATCH_INTEL_FAM6_MODEL(COMETLAKE_L,		&model_skl),
	X86_MATCH_INTEL_FAM6_MODEL(COMETLAKE,		&model_skl),
+	X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE,		&model_skl),
+	X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE_L,		&model_skl),
	X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X,	&model_spr),
	X86_MATCH_VENDOR_FAM(AMD,	0x17,		&model_amd_fam17h),
	X86_MATCH_VENDOR_FAM(HYGON,	0x18,		&model_amd_fam17h),
-- 
2.7.4



[PATCH V4 23/25] perf/x86/msr: Add Alder Lake CPU support

2021-04-01 Thread kan . liang
From: Kan Liang 

PPERF and SMI_COUNT MSRs are also supported on Alder Lake.

The External Design Specification (EDS) is not published yet. It comes
from an authoritative internal source.

The patch has been tested on real hardware.

Reviewed-by: Andi Kleen 
Signed-off-by: Kan Liang 
---
 arch/x86/events/msr.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/events/msr.c b/arch/x86/events/msr.c
index 680404c..c853b28 100644
--- a/arch/x86/events/msr.c
+++ b/arch/x86/events/msr.c
@@ -100,6 +100,8 @@ static bool test_intel(int idx, void *data)
case INTEL_FAM6_TIGERLAKE_L:
case INTEL_FAM6_TIGERLAKE:
case INTEL_FAM6_ROCKETLAKE:
+   case INTEL_FAM6_ALDERLAKE:
+   case INTEL_FAM6_ALDERLAKE_L:
if (idx == PERF_MSR_SMI || idx == PERF_MSR_PPERF)
return true;
break;
-- 
2.7.4



[PATCH V4 24/25] perf/x86/cstate: Add Alder Lake CPU support

2021-04-01 Thread kan . liang
From: Kan Liang 

Compared with Rocket Lake, the CORE C1 Residency Counter is added
for Alder Lake, but the CORE C3 Residency Counter is removed. Other
counters are the same.

Create a new adl_cstates for Alder Lake. Update the comments
accordingly.

The External Design Specification (EDS) is not published yet. It comes
from an authoritative internal source.

The patch has been tested on real hardware.
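
As a sketch derived from the counter list in the updated comments below
(the applied adl_cstates hunk itself is truncated in this archive), the
new model entry would enumerate the supported residency counters roughly
as follows:

static const struct cstate_model adl_cstates __initconst = {
	.core_events	= BIT(PERF_CSTATE_CORE_C1_RES) |
			  BIT(PERF_CSTATE_CORE_C6_RES) |
			  BIT(PERF_CSTATE_CORE_C7_RES),

	.pkg_events	= BIT(PERF_CSTATE_PKG_C2_RES) |
			  BIT(PERF_CSTATE_PKG_C3_RES) |
			  BIT(PERF_CSTATE_PKG_C6_RES) |
			  BIT(PERF_CSTATE_PKG_C7_RES) |
			  BIT(PERF_CSTATE_PKG_C8_RES) |
			  BIT(PERF_CSTATE_PKG_C9_RES) |
			  BIT(PERF_CSTATE_PKG_C10_RES),
};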

Reviewed-by: Andi Kleen 
Signed-off-by: Kan Liang 
---
 arch/x86/events/intel/cstate.c | 39 +--
 1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/arch/x86/events/intel/cstate.c b/arch/x86/events/intel/cstate.c
index 407eee5..4333990 100644
--- a/arch/x86/events/intel/cstate.c
+++ b/arch/x86/events/intel/cstate.c
@@ -40,7 +40,7 @@
  * Model specific counters:
  * MSR_CORE_C1_RES: CORE C1 Residency Counter
  *  perf code: 0x00
- *  Available model: SLM,AMT,GLM,CNL,TNT
+ *  Available model: SLM,AMT,GLM,CNL,TNT,ADL
  *  Scope: Core (each processor core has a MSR)
  * MSR_CORE_C3_RESIDENCY: CORE C3 Residency Counter
  *perf code: 0x01
@@ -51,46 +51,49 @@
  *perf code: 0x02
  *Available model: SLM,AMT,NHM,WSM,SNB,IVB,HSW,BDW,
  * SKL,KNL,GLM,CNL,KBL,CML,ICL,TGL,
- * TNT,RKL
+ * TNT,RKL,ADL
  *Scope: Core
  * MSR_CORE_C7_RESIDENCY: CORE C7 Residency Counter
  *perf code: 0x03
  *Available model: SNB,IVB,HSW,BDW,SKL,CNL,KBL,CML,
- * ICL,TGL,RKL
+ * ICL,TGL,RKL,ADL
  *Scope: Core
  * MSR_PKG_C2_RESIDENCY:  Package C2 Residency Counter.
  *perf code: 0x00
  *Available model: SNB,IVB,HSW,BDW,SKL,KNL,GLM,CNL,
- * KBL,CML,ICL,TGL,TNT,RKL
+ * KBL,CML,ICL,TGL,TNT,RKL,ADL
  *Scope: Package (physical package)
  * MSR_PKG_C3_RESIDENCY:  Package C3 Residency Counter.
  *perf code: 0x01
  *Available model: NHM,WSM,SNB,IVB,HSW,BDW,SKL,KNL,
- * GLM,CNL,KBL,CML,ICL,TGL,TNT,RKL
+ * GLM,CNL,KBL,CML,ICL,TGL,TNT,RKL,
+ * ADL
  *Scope: Package (physical package)
  * MSR_PKG_C6_RESIDENCY:  Package C6 Residency Counter.
  *perf code: 0x02
  *Available model: SLM,AMT,NHM,WSM,SNB,IVB,HSW,BDW,
  * SKL,KNL,GLM,CNL,KBL,CML,ICL,TGL,
- * TNT,RKL
+ * TNT,RKL,ADL
  *Scope: Package (physical package)
  * MSR_PKG_C7_RESIDENCY:  Package C7 Residency Counter.
  *perf code: 0x03
  *Available model: NHM,WSM,SNB,IVB,HSW,BDW,SKL,CNL,
- * KBL,CML,ICL,TGL,RKL
+ * KBL,CML,ICL,TGL,RKL,ADL
  *Scope: Package (physical package)
  * MSR_PKG_C8_RESIDENCY:  Package C8 Residency Counter.
  *perf code: 0x04
- *Available model: HSW ULT,KBL,CNL,CML,ICL,TGL,RKL
+ *Available model: HSW ULT,KBL,CNL,CML,ICL,TGL,RKL,
+ * ADL
  *Scope: Package (physical package)
  * MSR_PKG_C9_RESIDENCY:  Package C9 Residency Counter.
  *perf code: 0x05
- *Available model: HSW ULT,KBL,CNL,CML,ICL,TGL,RKL
+ *Available model: HSW ULT,KBL,CNL,CML,ICL,TGL,RKL,
+ * ADL
  *Scope: Package (physical package)
  * MSR_PKG_C10_RESIDENCY: Package C10 Residency Counter.
  *perf code: 0x06
  *Available model: HSW ULT,KBL,GLM,CNL,CML,ICL,TGL,
- * TNT,RKL
+ * TNT,RKL,ADL
  *Scope: Package (physical package)
  *
  */
@@ -563,6 +566,20 @@ static const struct cstate_model icl_cstates __initconst = 
{
  BIT(PERF_CSTATE_PKG_C10_RES),
 };
 
+static const struct

[PATCH V4 22/25] perf/x86/intel/uncore: Add Alder Lake support

2021-04-01 Thread kan . liang
From: Kan Liang 

The uncore subsystem for Alder Lake is similar to that of the previous
Tiger Lake.

The differences include:
- New MSR addresses for global control, fixed counters, CBOX and ARB.
  Add a new adl_uncore_msr_ops for uncore operations.
- Add a new threshold field for CBOX.
- New PCIIDs for IMC devices.

Reviewed-by: Andi Kleen 
Signed-off-by: Kan Liang 
---
 arch/x86/events/intel/uncore.c |   7 ++
 arch/x86/events/intel/uncore.h |   1 +
 arch/x86/events/intel/uncore_snb.c | 131 +
 3 files changed, 139 insertions(+)

diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index 35b3470..70816f3 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -1740,6 +1740,11 @@ static const struct intel_uncore_init_fun 
rkl_uncore_init __initconst = {
.pci_init = skl_uncore_pci_init,
 };
 
+static const struct intel_uncore_init_fun adl_uncore_init __initconst = {
+   .cpu_init = adl_uncore_cpu_init,
+   .mmio_init = tgl_uncore_mmio_init,
+};
+
 static const struct intel_uncore_init_fun icx_uncore_init __initconst = {
.cpu_init = icx_uncore_cpu_init,
.pci_init = icx_uncore_pci_init,
@@ -1794,6 +1799,8 @@ static const struct x86_cpu_id intel_uncore_match[] 
__initconst = {
	X86_MATCH_INTEL_FAM6_MODEL(TIGERLAKE_L,		&tgl_l_uncore_init),
	X86_MATCH_INTEL_FAM6_MODEL(TIGERLAKE,		&tgl_uncore_init),
	X86_MATCH_INTEL_FAM6_MODEL(ROCKETLAKE,		&rkl_uncore_init),
+	X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE,		&adl_uncore_init),
+	X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE_L,		&adl_uncore_init),
	X86_MATCH_INTEL_FAM6_MODEL(ATOM_TREMONT_D,	&snr_uncore_init),
{},
 };
diff --git a/arch/x86/events/intel/uncore.h b/arch/x86/events/intel/uncore.h
index 549cfb2..426212f 100644
--- a/arch/x86/events/intel/uncore.h
+++ b/arch/x86/events/intel/uncore.h
@@ -575,6 +575,7 @@ void snb_uncore_cpu_init(void);
 void nhm_uncore_cpu_init(void);
 void skl_uncore_cpu_init(void);
 void icl_uncore_cpu_init(void);
+void adl_uncore_cpu_init(void);
 void tgl_uncore_cpu_init(void);
 void tgl_uncore_mmio_init(void);
 void tgl_l_uncore_mmio_init(void);
diff --git a/arch/x86/events/intel/uncore_snb.c 
b/arch/x86/events/intel/uncore_snb.c
index 5127128..0f63706 100644
--- a/arch/x86/events/intel/uncore_snb.c
+++ b/arch/x86/events/intel/uncore_snb.c
@@ -62,6 +62,8 @@
 #define PCI_DEVICE_ID_INTEL_TGL_H_IMC  0x9a36
 #define PCI_DEVICE_ID_INTEL_RKL_1_IMC  0x4c43
 #define PCI_DEVICE_ID_INTEL_RKL_2_IMC  0x4c53
+#define PCI_DEVICE_ID_INTEL_ADL_1_IMC  0x4660
+#define PCI_DEVICE_ID_INTEL_ADL_2_IMC  0x4641
 
 /* SNB event control */
 #define SNB_UNC_CTL_EV_SEL_MASK0x00ff
@@ -131,12 +133,33 @@
 #define ICL_UNC_ARB_PER_CTR0x3b1
 #define ICL_UNC_ARB_PERFEVTSEL 0x3b3
 
+/* ADL uncore global control */
+#define ADL_UNC_PERF_GLOBAL_CTL0x2ff0
+#define ADL_UNC_FIXED_CTR_CTRL  0x2fde
+#define ADL_UNC_FIXED_CTR   0x2fdf
+
+/* ADL Cbo register */
+#define ADL_UNC_CBO_0_PER_CTR0 0x2002
+#define ADL_UNC_CBO_0_PERFEVTSEL0  0x2000
+#define ADL_UNC_CTL_THRESHOLD  0x3f00
+#define ADL_UNC_RAW_EVENT_MASK (SNB_UNC_CTL_EV_SEL_MASK | \
+SNB_UNC_CTL_UMASK_MASK | \
+SNB_UNC_CTL_EDGE_DET | \
+SNB_UNC_CTL_INVERT | \
+ADL_UNC_CTL_THRESHOLD)
+
+/* ADL ARB register */
+#define ADL_UNC_ARB_PER_CTR0   0x2FD2
+#define ADL_UNC_ARB_PERFEVTSEL00x2FD0
+#define ADL_UNC_ARB_MSR_OFFSET 0x8
+
 DEFINE_UNCORE_FORMAT_ATTR(event, event, "config:0-7");
 DEFINE_UNCORE_FORMAT_ATTR(umask, umask, "config:8-15");
 DEFINE_UNCORE_FORMAT_ATTR(edge, edge, "config:18");
 DEFINE_UNCORE_FORMAT_ATTR(inv, inv, "config:23");
 DEFINE_UNCORE_FORMAT_ATTR(cmask5, cmask, "config:24-28");
 DEFINE_UNCORE_FORMAT_ATTR(cmask8, cmask, "config:24-31");
+DEFINE_UNCORE_FORMAT_ATTR(threshold, threshold, "config:24-29");
 
 /* Sandy Bridge uncore support */
 static void snb_uncore_msr_enable_event(struct intel_uncore_box *box, struct 
perf_event *event)
@@ -422,6 +445,106 @@ void tgl_uncore_cpu_init(void)
skl_uncore_msr_ops.init_box = rkl_uncore_msr_init_box;
 }
 
+static void adl_uncore_msr_init_box(struct intel_uncore_box *box)
+{
+   if (box->pmu->pmu_idx == 0)
+   wrmsrl(ADL_UNC_PERF_GLOBAL_CTL, SNB_UNC_GLOBAL_CTL_EN);
+}
+
+static void adl_uncore_msr_enable_box(struct intel_uncore_box *box)
+{
+   wrmsrl(ADL_UNC_PERF_GLOBAL_CTL, SNB_UNC_GLOBAL_CTL_EN);
+}
+
+static void adl_uncore_ms

[PATCH V4 21/25] perf: Introduce PERF_TYPE_HARDWARE_PMU and PERF_TYPE_HW_CACHE_PMU

2021-04-01 Thread kan . liang
From: Kan Liang 

Current Hardware events and Hardware cache events have special perf
types, PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE. The two types don't
pass the PMU type in the user interface. For a hybrid system, the perf
subsystem doesn't know which PMU the events belong to. The first capable
PMU will always be assigned to the events. The events never get a chance
to run on the other capable PMUs.

Add a PMU aware version PERF_TYPE_HARDWARE_PMU and
PERF_TYPE_HW_CACHE_PMU. The PMU type ID is stored at attr.config[40:32].
Support the new types for X86.
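
A minimal userspace sketch of the proposed ABI (the type value, the
config layout and the /sys/devices/cpu_core/type path below follow this
patch and are assumptions if the interface changes):

#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#ifndef PERF_TYPE_HARDWARE_PMU
#define PERF_TYPE_HARDWARE_PMU	6	/* value proposed by this patch */
#endif

int main(void)
{
	struct perf_event_attr attr;
	uint64_t pmu_type = 0;
	FILE *f = fopen("/sys/devices/cpu_core/type", "r");
	int fd;

	if (!f)
		return 1;
	if (fscanf(f, "%" SCNu64, &pmu_type) != 1) {
		fclose(f);
		return 1;
	}
	fclose(f);

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE_PMU;
	/* PMU type id goes in the upper config bits (PERF_PMU_TYPE_SHIFT == 32) */
	attr.config = (pmu_type << 32) | PERF_COUNT_HW_CPU_CYCLES;

	fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
	if (fd < 0) {
		perror("perf_event_open");
		return 1;
	}
	/* read(fd, ...) as with any other counting event */
	return 0;
}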

Suggested-by: Andi Kleen 
Reviewed-by: Andi Kleen 
Signed-off-by: Kan Liang 
---
 arch/x86/events/core.c  | 10 --
 include/uapi/linux/perf_event.h | 26 ++
 kernel/events/core.c| 14 +-
 3 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 18b75d6..36cfebad 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -488,7 +488,7 @@ int x86_setup_perfctr(struct perf_event *event)
if (attr->type == event->pmu->type)
return x86_pmu_extra_regs(event->attr.config, event);
 
-   if (attr->type == PERF_TYPE_HW_CACHE)
+	if ((attr->type == PERF_TYPE_HW_CACHE) || (attr->type == PERF_TYPE_HW_CACHE_PMU))
return set_ext_hw_attr(hwc, event);
 
if (attr->config >= x86_pmu.max_events)
@@ -2452,9 +2452,15 @@ static int x86_pmu_event_init(struct perf_event *event)
 
if ((event->attr.type != event->pmu->type) &&
(event->attr.type != PERF_TYPE_HARDWARE) &&
-   (event->attr.type != PERF_TYPE_HW_CACHE))
+   (event->attr.type != PERF_TYPE_HW_CACHE) &&
+   (event->attr.type != PERF_TYPE_HARDWARE_PMU) &&
+   (event->attr.type != PERF_TYPE_HW_CACHE_PMU))
return -ENOENT;
 
+   if ((event->attr.type == PERF_TYPE_HARDWARE_PMU) ||
+   (event->attr.type == PERF_TYPE_HW_CACHE_PMU))
+   event->attr.config &= PERF_HW_CACHE_EVENT_MASK;
+
if (is_hybrid() && (event->cpu != -1)) {
pmu = hybrid_pmu(event->pmu);
if (!cpumask_test_cpu(event->cpu, >supported_cpus))
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index ad15e40..c0a511e 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -33,6 +33,8 @@ enum perf_type_id {
PERF_TYPE_HW_CACHE  = 3,
PERF_TYPE_RAW   = 4,
PERF_TYPE_BREAKPOINT= 5,
+   PERF_TYPE_HARDWARE_PMU  = 6,
+   PERF_TYPE_HW_CACHE_PMU  = 7,
 
PERF_TYPE_MAX,  /* non-ABI */
 };
@@ -95,6 +97,30 @@ enum perf_hw_cache_op_result_id {
 };
 
 /*
+ * attr.config layout for type PERF_TYPE_HARDWARE* and PERF_TYPE_HW_CACHE*
+ * PERF_TYPE_HARDWARE: 0xAA
+ * AA: hardware event ID
+ * PERF_TYPE_HW_CACHE: 0xCCBBAA
+ * AA: hardware cache ID
+ * BB: hardware cache op ID
+ * CC: hardware cache op result ID
+ * PERF_TYPE_HARDWARE_PMU: 0xDD00AA
+ * AA: hardware event ID
+ * DD: PMU type ID
+ * PERF_TYPE_HW_CACHE_PMU: 0xDD00CCBBAA
+ * AA: hardware cache ID
+ * BB: hardware cache op ID
+ * CC: hardware cache op result ID
+ * DD: PMU type ID
+ */
+#define PERF_HW_CACHE_ID_SHIFT 0
+#define PERF_HW_CACHE_OP_ID_SHIFT  8
+#define PERF_HW_CACHE_OP_RESULT_ID_SHIFT   16
+#define PERF_HW_CACHE_EVENT_MASK   0xff
+
+#define PERF_PMU_TYPE_SHIFT32
+
+/*
  * Special "software" events provided by the kernel, even if the hardware
  * does not support performance events. These events measure various
  * physical and sw events of the kernel (and allow the profiling of them as
diff --git a/kernel/events/core.c b/kernel/events/core.c
index f079431..b8ab756 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11093,6 +11093,14 @@ static int perf_try_init_event(struct pmu *pmu, struct 
perf_event *event)
return ret;
 }
 
+static bool perf_event_is_hw_pmu_type(struct perf_event *event)
+{
+   int type = event->attr.type;
+
+   return type == PERF_TYPE_HARDWARE_PMU ||
+  type == PERF_TYPE_HW_CACHE_PMU;
+}
+
 static struct pmu *perf_init_event(struct perf_event *event)
 {
int idx, type, ret;
@@ -6,13 +11124,17 @@ static struct pmu *perf_init_event(struct perf_event 
*event)
if (type == PERF_TYPE_HARDWARE || type == PERF_

[PATCH V4 20/25] perf/x86/intel: Add Alder Lake Hybrid support

2021-04-01 Thread kan . liang
From: Kan Liang 

Alder Lake hybrid systems have two different types of cores, Golden Cove
cores and Gracemont cores. The Golden Cove cores are registered to the
"cpu_core" PMU. The Gracemont cores are registered to the "cpu_atom" PMU.

The differences between the two PMUs include:
- Number of GP and fixed counters
- Events
- The "cpu_core" PMU supports Topdown metrics.
  The "cpu_atom" PMU supports PEBS-via-PT.

The "cpu_core" PMU is similar to the Sapphire Rapids PMU, but without
PMEM.
The "cpu_atom" PMU is similar to Tremont, but with different events,
event_constraints, extra_regs and number of counters.

The mem-loads AUX event workaround only applies to the Golden Cove core.

Users may disable all CPUs of the same CPU type on the command line or
in the BIOS. For this case, perf still registers a PMU for the CPU type,
but its CPU mask is 0.

Current caps/pmu_name is usually the microarch codename. Assign the
"alderlake_hybrid" to the caps/pmu_name of both PMUs to indicate the
hybrid Alder Lake microarchitecture.

Reviewed-by: Andi Kleen 
Signed-off-by: Kan Liang 
---
 arch/x86/events/intel/core.c | 254 ++-
 arch/x86/events/intel/ds.c   |   7 ++
 arch/x86/events/perf_event.h |   7 ++
 3 files changed, 267 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 07af58c..2b553d9 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2076,6 +2076,14 @@ static struct extra_reg intel_tnt_extra_regs[] 
__read_mostly = {
EVENT_EXTRA_END
 };
 
+static struct extra_reg intel_grt_extra_regs[] __read_mostly = {
+   /* must define OFFCORE_RSP_X first, see intel_fixup_er() */
+	INTEL_UEVENT_EXTRA_REG(0x01b7, MSR_OFFCORE_RSP_0, 0x3full, RSP_0),
+	INTEL_UEVENT_EXTRA_REG(0x02b7, MSR_OFFCORE_RSP_1, 0x3full, RSP_1),
+   INTEL_UEVENT_PEBS_LDLAT_EXTRA_REG(0x5d0),
+   EVENT_EXTRA_END
+};
+
 #define KNL_OT_L2_HITE BIT_ULL(19) /* Other Tile L2 Hit */
 #define KNL_OT_L2_HITF BIT_ULL(20) /* Other Tile L2 Hit */
 #define KNL_MCDRAM_LOCAL   BIT_ULL(21)
@@ -2430,6 +2438,16 @@ static int icl_set_topdown_event_period(struct 
perf_event *event)
return 0;
 }
 
+static int adl_set_topdown_event_period(struct perf_event *event)
+{
+   struct x86_hybrid_pmu *pmu = hybrid_pmu(event->pmu);
+
+   if (pmu->cpu_type != hybrid_big)
+   return 0;
+
+   return icl_set_topdown_event_period(event);
+}
+
 static inline u64 icl_get_metrics_event_value(u64 metric, u64 slots, int idx)
 {
u32 val;
@@ -2570,6 +2588,17 @@ static u64 icl_update_topdown_event(struct perf_event 
*event)
 x86_pmu.num_topdown_events - 
1);
 }
 
+static u64 adl_update_topdown_event(struct perf_event *event)
+{
+   struct x86_hybrid_pmu *pmu = hybrid_pmu(event->pmu);
+
+   if (pmu->cpu_type != hybrid_big)
+   return 0;
+
+   return icl_update_topdown_event(event);
+}
+
+
 static void intel_pmu_read_topdown_event(struct perf_event *event)
 {
	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
@@ -3658,6 +3687,17 @@ static inline bool is_mem_loads_aux_event(struct 
perf_event *event)
	return (event->attr.config & INTEL_ARCH_EVENT_MASK) == X86_CONFIG(.event=0x03, .umask=0x82);
 }
 
+static inline bool require_mem_loads_aux_event(struct perf_event *event)
+{
+   if (!(x86_pmu.flags & PMU_FL_MEM_LOADS_AUX))
+   return false;
+
+   if (is_hybrid())
+   return hybrid_pmu(event->pmu)->cpu_type == hybrid_big;
+
+   return true;
+}
+
 static inline bool intel_pmu_has_cap(struct perf_event *event, int idx)
 {
union perf_capabilities *intel_cap;
@@ -3785,7 +3825,7 @@ static int intel_pmu_hw_config(struct perf_event *event)
 * event. The rule is to simplify the implementation of the check.
 * That's because perf cannot have a complete group at the moment.
 */
-   if (x86_pmu.flags & PMU_FL_MEM_LOADS_AUX &&
+   if (require_mem_loads_aux_event(event) &&
(event->attr.sample_type & PERF_SAMPLE_DATA_SRC) &&
is_mem_loads_event(event)) {
struct perf_event *leader = event->group_leader;
@@ -4062,6 +4102,39 @@ tfa_get_event_constraints(struct cpu_hw_events *cpuc, 
int idx,
return c;
 }
 
+static struct event_constraint *
+adl_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
+ struct perf_event *event)
+{
+   struct x86_hybrid_pmu *pmu = hybrid_pmu(event->pmu);
+
+   if (pmu->cpu_type == hybrid_big)
+   return spr_get_event_constraints(cpuc, idx, event);
+   else if (pmu->cpu_type == hybrid_small)
+   return tnt_get_event_constraints(cpuc, idx, event);
+
+   

[PATCH V4 19/25] perf/x86: Support filter_match callback

2021-04-01 Thread kan . liang
From: Kan Liang 

Implement the filter_match callback for X86, which checks whether an
event is schedulable on the current CPU.

Reviewed-by: Andi Kleen 
Signed-off-by: Kan Liang 
---
 arch/x86/events/core.c   | 10 ++
 arch/x86/events/perf_event.h |  1 +
 2 files changed, 11 insertions(+)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 19e026a..18b75d6 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2650,6 +2650,14 @@ static int x86_pmu_aux_output_match(struct perf_event 
*event)
return 0;
 }
 
+static int x86_pmu_filter_match(struct perf_event *event)
+{
+   if (x86_pmu.filter_match)
+   return x86_pmu.filter_match(event);
+
+   return 1;
+}
+
 static struct pmu pmu = {
.pmu_enable = x86_pmu_enable,
.pmu_disable= x86_pmu_disable,
@@ -2677,6 +2685,8 @@ static struct pmu pmu = {
.check_period   = x86_pmu_check_period,
 
.aux_output_match   = x86_pmu_aux_output_match,
+
+   .filter_match   = x86_pmu_filter_match,
 };
 
 void arch_perf_update_userpage(struct perf_event *event,
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index c1c90c3..f996686 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -870,6 +870,7 @@ struct x86_pmu {
 
int (*aux_output_match) (struct perf_event *event);
 
+   int (*filter_match)(struct perf_event *event);
/*
 * Hybrid support
 *
-- 
2.7.4



[PATCH V4 03/25] perf/x86: Track pmu in per-CPU cpu_hw_events

2021-04-01 Thread kan . liang
From: Kan Liang 

Some platforms, e.g. Alder Lake, have hybrid architecture. In the same
package, there may be more than one type of CPU. The PMU capabilities
are different among different types of CPU. Perf will register a
dedicated PMU for each type of CPU.

Add a 'pmu' variable in the struct cpu_hw_events to track the dedicated
PMU of the current CPU.

The current x86_get_pmu() uses the global 'pmu', which will be broken on
a hybrid platform. Modify it to use the 'pmu' of the specific CPU.

Initialize the per-CPU 'pmu' variable with the global 'pmu'. There is
nothing changed for the non-hybrid platforms.

The is_x86_event() will be updated in the later patch ("perf/x86:
Register hybrid PMUs") for hybrid platforms. For the non-hybrid
platforms, nothing is changed here.

Suggested-by: Peter Zijlstra (Intel) 
Signed-off-by: Kan Liang 
---
 arch/x86/events/core.c   | 17 +
 arch/x86/events/intel/core.c |  2 +-
 arch/x86/events/intel/ds.c   |  4 ++--
 arch/x86/events/intel/lbr.c  |  9 +
 arch/x86/events/perf_event.h |  4 +++-
 5 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 18df171..e564e96 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -45,9 +45,11 @@
 #include "perf_event.h"
 
 struct x86_pmu x86_pmu __read_mostly;
+static struct pmu pmu;
 
 DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events) = {
.enabled = 1,
+	.pmu = &pmu,
 };
 
 DEFINE_STATIC_KEY_FALSE(rdpmc_never_available_key);
@@ -724,16 +726,23 @@ void x86_pmu_enable_all(int added)
}
 }
 
-static struct pmu pmu;
-
 static inline int is_x86_event(struct perf_event *event)
 {
	return event->pmu == &pmu;
 }
 
-struct pmu *x86_get_pmu(void)
+struct pmu *x86_get_pmu(unsigned int cpu)
 {
-	return &pmu;
+	struct cpu_hw_events *cpuc = &per_cpu(cpu_hw_events, cpu);
+
+   /*
+* All CPUs of the hybrid type have been offline.
+* The x86_get_pmu() should not be invoked.
+*/
+   if (WARN_ON_ONCE(!cpuc->pmu))
+		return &pmu;
+
+   return cpuc->pmu;
 }
 /*
  * Event scheduler state:
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 7bbb5bb..f116c63 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4876,7 +4876,7 @@ static void update_tfa_sched(void *ignored)
 * and if so force schedule out for all event types all contexts
 */
if (test_bit(3, cpuc->active_mask))
-   perf_pmu_resched(x86_get_pmu());
+   perf_pmu_resched(x86_get_pmu(smp_processor_id()));
 }
 
 static ssize_t show_sysctl_tfa(struct device *cdev,
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 7ebae18..1bfea8c 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -2192,7 +2192,7 @@ void __init intel_ds_init(void)
PERF_SAMPLE_TIME;
x86_pmu.flags |= PMU_FL_PEBS_ALL;
pebs_qual = "-baseline";
-			x86_get_pmu()->capabilities |= PERF_PMU_CAP_EXTENDED_REGS;
+			x86_get_pmu(smp_processor_id())->capabilities |= PERF_PMU_CAP_EXTENDED_REGS;
} else {
/* Only basic record supported */
x86_pmu.large_pebs_flags &=
@@ -2207,7 +2207,7 @@ void __init intel_ds_init(void)
 
if (x86_pmu.intel_cap.pebs_output_pt_available) {
pr_cont("PEBS-via-PT, ");
-			x86_get_pmu()->capabilities |= PERF_PMU_CAP_AUX_OUTPUT;
+			x86_get_pmu(smp_processor_id())->capabilities |= PERF_PMU_CAP_AUX_OUTPUT;
}
 
break;
diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index 21890da..bb4486c 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -705,7 +705,7 @@ void intel_pmu_lbr_add(struct perf_event *event)
 
 void release_lbr_buffers(void)
 {
-   struct kmem_cache *kmem_cache = x86_get_pmu()->task_ctx_cache;
+   struct kmem_cache *kmem_cache;
struct cpu_hw_events *cpuc;
int cpu;
 
@@ -714,6 +714,7 @@ void release_lbr_buffers(void)
 
for_each_possible_cpu(cpu) {
		cpuc = per_cpu_ptr(&cpu_hw_events, cpu);
+   kmem_cache = x86_get_pmu(cpu)->task_ctx_cache;
if (kmem_cache && cpuc->lbr_xsave) {
kmem_cache_free(kmem_cache, cpuc->lbr_xsave);
cpuc->lbr_xsave = NULL;
@@ -1609,7 +1610,7 @@ void intel_pmu_lbr_init_hsw(void)
x86_pmu.lbr_sel_mask = LBR_SEL_MASK;
x86_pmu.lbr_sel_map  = hsw_lbr_sel_map;
 
-   x86_get_pmu()->task_ctx_c

[PATCH V4 18/25] perf/x86/intel: Add attr_update for Hybrid PMUs

2021-04-01 Thread kan . liang
From: Kan Liang 

The attribute_group for hybrid PMUs should be different from that of the
previous cpu PMU. For example, cpumask is required for a hybrid PMU. The
PMU type should be included in the event and format attributes.

Add hybrid_attr_update for the Hybrid PMU.
Check the PMU type in is_visible() function. Only display the event or
format for the matched Hybrid PMU.

Reviewed-by: Andi Kleen 
Signed-off-by: Kan Liang 
---
 arch/x86/events/intel/core.c | 120 ---
 1 file changed, 114 insertions(+), 6 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 27919ae..07af58c 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -5124,6 +5124,106 @@ static const struct attribute_group *attr_update[] = {
NULL,
 };
 
+static bool is_attr_for_this_pmu(struct kobject *kobj, struct attribute *attr)
+{
+   struct device *dev = kobj_to_dev(kobj);
+   struct x86_hybrid_pmu *pmu =
+   container_of(dev_get_drvdata(dev), struct x86_hybrid_pmu, pmu);
+   struct perf_pmu_events_hybrid_attr *pmu_attr =
+		container_of(attr, struct perf_pmu_events_hybrid_attr, attr.attr);
+
+   return pmu->cpu_type & pmu_attr->pmu_type;
+}
+
+static umode_t hybrid_events_is_visible(struct kobject *kobj,
+   struct attribute *attr, int i)
+{
+   return is_attr_for_this_pmu(kobj, attr) ? attr->mode : 0;
+}
+
+static inline int hybrid_find_supported_cpu(struct x86_hybrid_pmu *pmu)
+{
+	int cpu = cpumask_first(&pmu->supported_cpus);
+
+   return (cpu >= nr_cpu_ids) ? -1 : cpu;
+}
+
+static umode_t hybrid_tsx_is_visible(struct kobject *kobj,
+struct attribute *attr, int i)
+{
+   struct device *dev = kobj_to_dev(kobj);
+   struct x86_hybrid_pmu *pmu =
+container_of(dev_get_drvdata(dev), struct x86_hybrid_pmu, pmu);
+   int cpu = hybrid_find_supported_cpu(pmu);
+
+	return (cpu >= 0) && is_attr_for_this_pmu(kobj, attr) && cpu_has(&cpu_data(cpu), X86_FEATURE_RTM) ? attr->mode : 0;
+}
+
+static umode_t hybrid_format_is_visible(struct kobject *kobj,
+   struct attribute *attr, int i)
+{
+   struct device *dev = kobj_to_dev(kobj);
+   struct x86_hybrid_pmu *pmu =
+   container_of(dev_get_drvdata(dev), struct x86_hybrid_pmu, pmu);
+   struct perf_pmu_format_hybrid_attr *pmu_attr =
+		container_of(attr, struct perf_pmu_format_hybrid_attr, attr.attr);
+   int cpu = hybrid_find_supported_cpu(pmu);
+
+	return (cpu >= 0) && (pmu->cpu_type & pmu_attr->pmu_type) ? attr->mode : 0;
+}
+
+static struct attribute_group hybrid_group_events_td  = {
+   .name   = "events",
+   .is_visible = hybrid_events_is_visible,
+};
+
+static struct attribute_group hybrid_group_events_mem = {
+   .name   = "events",
+   .is_visible = hybrid_events_is_visible,
+};
+
+static struct attribute_group hybrid_group_events_tsx = {
+   .name   = "events",
+   .is_visible = hybrid_tsx_is_visible,
+};
+
+static struct attribute_group hybrid_group_format_extra = {
+   .name   = "format",
+   .is_visible = hybrid_format_is_visible,
+};
+
+static ssize_t intel_hybrid_get_attr_cpus(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+   struct x86_hybrid_pmu *pmu =
+   container_of(dev_get_drvdata(dev), struct x86_hybrid_pmu, pmu);
+
+	return cpumap_print_to_pagebuf(true, buf, &pmu->supported_cpus);
+}
+
+static DEVICE_ATTR(cpus, S_IRUGO, intel_hybrid_get_attr_cpus, NULL);
+static struct attribute *intel_hybrid_cpus_attrs[] = {
+	&dev_attr_cpus.attr,
+   NULL,
+};
+
+static struct attribute_group hybrid_group_cpus = {
+   .attrs  = intel_hybrid_cpus_attrs,
+};
+
+static const struct attribute_group *hybrid_attr_update[] = {
+	&hybrid_group_events_td,
+	&hybrid_group_events_mem,
+	&hybrid_group_events_tsx,
+	&group_caps_gen,
+	&group_caps_lbr,
+	&hybrid_group_format_extra,
+	&group_default,
+	&hybrid_group_cpus,
+   NULL,
+};
+
 static struct attribute *empty_attrs;
 
 static void intel_pmu_check_num_counters(int *num_counters,
@@ -5867,14 +5967,22 @@ __init int intel_pmu_init(void)
 
snprintf(pmu_name_str, sizeof(pmu_name_str), "%s", name);
 
+   if (!is_hybrid()) {
+   group_events_td.attrs  = td_attr;
+   group_events_mem.attrs = mem_attr;
+   group_events_tsx.attrs = tsx_attr;
+   group_format_extra.attrs = extra_attr;
+   group_format_extra_skl.attrs = extra_skl_attr;
 
-   group_events_td.attrs  = td_attr;
-   group

[PATCH V4 05/25] perf/x86: Hybrid PMU support for intel_ctrl

2021-04-01 Thread kan . liang
From: Kan Liang 

The intel_ctrl is the counter mask of a PMU. The PMU counter information
may be different among hybrid PMUs, so each hybrid PMU should use its own
intel_ctrl to check and access the counters.

When handling a certain hybrid PMU, apply the intel_ctrl from the
corresponding hybrid PMU.

When checking the HW existence, apply the PMU and number of counters
from the corresponding hybrid PMU as well. Perf will check the HW
existence for each Hybrid PMU before registration. Expose the
check_hw_exists() for a later patch.

Reviewed-by: Andi Kleen 
Signed-off-by: Kan Liang 
---
 arch/x86/events/core.c   | 14 +++---
 arch/x86/events/intel/core.c | 14 +-
 arch/x86/events/perf_event.h | 10 --
 3 files changed, 24 insertions(+), 14 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index d3d3c6b..fc14697 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -230,7 +230,7 @@ static void release_pmc_hardware(void) {}
 
 #endif
 
-static bool check_hw_exists(void)
+bool check_hw_exists(struct pmu *pmu, int num_counters, int num_counters_fixed)
 {
u64 val, val_fail = -1, val_new= ~0;
int i, reg, reg_fail = -1, ret = 0;
@@ -241,7 +241,7 @@ static bool check_hw_exists(void)
 * Check to see if the BIOS enabled any of the counters, if so
 * complain and bail.
 */
-   for (i = 0; i < x86_pmu.num_counters; i++) {
+   for (i = 0; i < num_counters; i++) {
reg = x86_pmu_config_addr(i);
		ret = rdmsrl_safe(reg, &val);
if (ret)
@@ -255,13 +255,13 @@ static bool check_hw_exists(void)
}
}
 
-   if (x86_pmu.num_counters_fixed) {
+   if (num_counters_fixed) {
reg = MSR_ARCH_PERFMON_FIXED_CTR_CTRL;
		ret = rdmsrl_safe(reg, &val);
if (ret)
goto msr_fail;
-   for (i = 0; i < x86_pmu.num_counters_fixed; i++) {
-   if (fixed_counter_disabled(i))
+   for (i = 0; i < num_counters_fixed; i++) {
+   if (fixed_counter_disabled(i, pmu))
continue;
if (val & (0x03 << i*4)) {
bios_fail = 1;
@@ -1547,7 +1547,7 @@ void perf_event_print_debug(void)
cpu, idx, prev_left);
}
for (idx = 0; idx < x86_pmu.num_counters_fixed; idx++) {
-   if (fixed_counter_disabled(idx))
+   if (fixed_counter_disabled(idx, cpuc->pmu))
continue;
rdmsrl(MSR_ARCH_PERFMON_FIXED_CTR0 + idx, pmc_count);
 
@@ -1992,7 +1992,7 @@ static int __init init_hw_perf_events(void)
pmu_check_apic();
 
/* sanity check that the hardware exists or is emulated */
-	if (!check_hw_exists())
+	if (!check_hw_exists(&pmu, x86_pmu.num_counters, x86_pmu.num_counters_fixed))
return 0;
 
pr_cont("%s PMU driver.\n", x86_pmu.name);
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 494b9bc..7cc2c45 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2153,10 +2153,11 @@ static void intel_pmu_disable_all(void)
 static void __intel_pmu_enable_all(int added, bool pmi)
 {
	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+   u64 intel_ctrl = hybrid(cpuc->pmu, intel_ctrl);
 
intel_pmu_lbr_enable_all(pmi);
wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL,
-   x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_guest_mask);
+  intel_ctrl & ~cpuc->intel_ctrl_guest_mask);
 
if (test_bit(INTEL_PMC_IDX_FIXED_BTS, cpuc->active_mask)) {
struct perf_event *event =
@@ -2709,6 +2710,7 @@ int intel_pmu_save_and_restart(struct perf_event *event)
 static void intel_pmu_reset(void)
 {
struct debug_store *ds = __this_cpu_read(cpu_hw_events.ds);
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
unsigned long flags;
int idx;
 
@@ -2724,7 +2726,7 @@ static void intel_pmu_reset(void)
wrmsrl_safe(x86_pmu_event_addr(idx),  0ull);
}
for (idx = 0; idx < x86_pmu.num_counters_fixed; idx++) {
-   if (fixed_counter_disabled(idx))
+   if (fixed_counter_disabled(idx, cpuc->pmu))
continue;
wrmsrl_safe(MSR_ARCH_PERFMON_FIXED_CTR0 + idx, 0ull);
}
@@ -2753,6 +2755,7 @@ static int handle_pmi_common(struct pt_regs *regs, u64 
status)
	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
int bit;
int handled = 0;
+   u64 intel_ctrl = hybrid(cpuc->pmu, intel_ctrl);
 
inc_irq_stat(apic_perf_irqs);
 
@@ -2798,7 +2801,7 @@ static int handle_pmi_common(struct pt_regs *regs, u64 
status)
 
handled++;

[PATCH V4 17/25] perf/x86: Add structures for the attributes of Hybrid PMUs

2021-04-01 Thread kan . liang
From: Kan Liang 

Hybrid PMUs have different events and formats. In theory, Hybrid PMU
specific attributes should be maintained in the dedicated struct
x86_hybrid_pmu, but it wastes space because the events and formats are
similar among Hybrid PMUs.

To reduce duplication, all hybrid PMUs will share a group of attributes
in the following patch. To distinguish an attribute from different
Hybrid PMUs, a PMU aware attribute structure is introduced. A PMU type
is required for the attribute structure. The type is internal usage. It
is not visible in the sysfs API.

Hybrid PMUs may support the same event name, but with different event
encoding, e.g., the mem-loads event on an Atom PMU has different event
encoding from a Core PMU. It brings an issue if two attributes are
created for them. The current sysfs_update_group() finds an attribute by
searching the attr name (aka the event name). If two attributes have the
same event name, the first attribute will be replaced.
To address the issue, only one attribute is created for the event. The
event_str is extended and stores event encodings from all Hybrid PMUs.
Each event encoding is divided by ";". The order of the event encodings
must follow the order of the hybrid PMU index. The event_str is internal
usage as well. When a user wants to show the attribute of a Hybrid PMU,
only the corresponding part of the string is displayed.
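
A hypothetical usage sketch (the event encodings and the
hybrid_big_small mask below are invented for illustration, not taken
from the actual Alder Lake patch):

/* One attribute, two encodings: big core first, then Atom, ';' separated. */
EVENT_ATTR_STR_HYBRID(mem-loads, mem_ld_hybrid,
		      "event=0xcd,umask=0x1,ldlat=3;event=0xd0,umask=0x5,ldlat=3",
		      hybrid_big_small);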

Reviewed-by: Andi Kleen 
Signed-off-by: Kan Liang 
---
 arch/x86/events/core.c   | 43 +++
 arch/x86/events/perf_event.h | 19 +++
 include/linux/perf_event.h   | 12 
 3 files changed, 74 insertions(+)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index e15b177..19e026a 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1868,6 +1868,49 @@ ssize_t events_ht_sysfs_show(struct device *dev, struct 
device_attribute *attr,
pmu_attr->event_str_noht);
 }
 
+ssize_t events_hybrid_sysfs_show(struct device *dev,
+struct device_attribute *attr,
+char *page)
+{
+   struct perf_pmu_events_hybrid_attr *pmu_attr =
+   container_of(attr, struct perf_pmu_events_hybrid_attr, attr);
+   struct x86_hybrid_pmu *pmu;
+   const char *str, *next_str;
+   int i;
+
+   if (hweight64(pmu_attr->pmu_type) == 1)
+   return sprintf(page, "%s", pmu_attr->event_str);
+
+   /*
+* Hybrid PMUs may support the same event name, but with different
+* event encoding, e.g., the mem-loads event on an Atom PMU has
+* different event encoding from a Core PMU.
+*
+* The event_str includes all event encodings. Each event encoding
+* is divided by ";". The order of the event encodings must follow
+* the order of the hybrid PMU index.
+*/
+   pmu = container_of(dev_get_drvdata(dev), struct x86_hybrid_pmu, pmu);
+
+   str = pmu_attr->event_str;
+   for (i = 0; i < x86_pmu.num_hybrid_pmus; i++) {
+   if (!(x86_pmu.hybrid_pmu[i].cpu_type & pmu_attr->pmu_type))
+   continue;
+   if (x86_pmu.hybrid_pmu[i].cpu_type & pmu->cpu_type) {
+   next_str = strchr(str, ';');
+   if (next_str)
+				return snprintf(page, next_str - str + 1, "%s", str);
+   else
+   return sprintf(page, "%s", str);
+   }
+   str = strchr(str, ';');
+   str++;
+   }
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(events_hybrid_sysfs_show);
+
 EVENT_ATTR(cpu-cycles, CPU_CYCLES  );
 EVENT_ATTR(instructions,   INSTRUCTIONS);
 EVENT_ATTR(cache-references,   CACHE_REFERENCES);
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 35510a9..c1c90c3 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -970,6 +970,22 @@ static struct perf_pmu_events_ht_attr event_attr_##v = {   
\
.event_str_ht   = ht,   \
 }
 
+#define EVENT_ATTR_STR_HYBRID(_name, v, str, _pmu) \
+static struct perf_pmu_events_hybrid_attr event_attr_##v = {   \
+   .attr   = __ATTR(_name, 0444, events_hybrid_sysfs_show, NULL),\
+   .id = 0,\
+   .event_str  = str,  \
+   .pmu_type   = _pmu, \
+}
+
+#define FORMAT_HYBRID_PTR(_id) (&format_attr_hybrid_##_id.attr.attr)
+
+#define FORMAT_ATTR_HYBRID(_name, _pmu)	\
+static struct perf_pmu_format_hybrid_attr format_attr_hybrid_##_name = {\
+   .att

[PATCH V4 16/25] perf/x86: Register hybrid PMUs

2021-04-01 Thread kan . liang
From: Kan Liang 

Different hybrid PMUs have different PMU capabilities and events. Perf
should register a dedicated PMU for each of them.

To check whether an event is an X86 event, perf has to go through all
possible hybrid PMUs.

All the hybrid PMUs are registered at boot time. Before the
registration, add intel_pmu_check_hybrid_pmus() to check and update the
counters information, the event constraints, the extra registers and the
unique capabilities for each hybrid PMU.

Postpone the display of the PMU information and the HW check to
CPU_STARTING, because the boot CPU is the only online CPU in
init_hw_perf_events(); at that point perf doesn't know the availability
of the other PMUs. Perf should display the PMU information only if the
counters of the PMU are available.

All CPUs of one type may be offline. In this case, users can still
observe the PMU in /sys/devices, but its CPU mask is 0.

All hybrid PMUs have capability PERF_PMU_CAP_HETEROGENEOUS_CPUS.
The PMU name for hybrid PMUs will be "cpu_XXX", which will be assigned
later in a separate patch.

The PMU type id for the core PMU is still PERF_TYPE_RAW. For the other
hybrid PMUs, the PMU type id is not hard-coded.

The event->cpu must be compatible with the CPUs supported by the
event's PMU. Add a check in x86_pmu_event_init(), illustrated by the
sketch below.
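
As a rough illustration of that check, here is a self-contained
user-space sketch (not the kernel code): the supported-CPU set is
modelled as a plain bitmask, whereas the kernel check operates on a
per-hybrid-PMU cpumask, so the structure and names below are
assumptions for illustration only.

#include <errno.h>
#include <stdint.h>
#include <stdio.h>

struct fake_hybrid_pmu {
	uint64_t supported_cpus;	/* bit n set => CPU n served by this PMU */
};

static int validate_event_cpu(const struct fake_hybrid_pmu *pmu, int cpu)
{
	if (cpu == -1)			/* task event: any CPU is fine */
		return 0;
	if (!(pmu->supported_cpus & (1ULL << cpu)))
		return -ENOENT;		/* CPU not served by this hybrid PMU */
	return 0;
}

int main(void)
{
	struct fake_hybrid_pmu atom = { .supported_cpus = 0xf0 }; /* CPUs 4-7 */

	printf("cpu 5 -> %d\n", validate_event_cpu(&atom, 5));	/* 0 */
	printf("cpu 0 -> %d\n", validate_event_cpu(&atom, 0));	/* -ENOENT */
	return 0;
}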

The events in a group must come from the same type of hybrid PMU.
The fake cpuc used in the group validation must be allocated from a CPU
supported by the event->pmu.

Perf may not retrieve a valid core type from get_this_hybrid_cpu_type().
For example, ADL may have an alternative configuration in which perf
cannot retrieve the core type from CPUID leaf 0x1a. Add a
platform-specific get_hybrid_cpu_type() callback, and invoke it when
the generic way fails.
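
A hedged sketch of this fallback order is below; the HYBRID_TYPE_*
values and the callback shape are illustrative assumptions only (in the
kernel the hook is a member of struct x86_pmu):

#include <stdio.h>

#define HYBRID_TYPE_INVALID	0x00
#define HYBRID_TYPE_ATOM	0x20	/* illustrative CPUID.1AH-style values */
#define HYBRID_TYPE_CORE	0x40

/* Generic path: in the kernel this reads CPUID leaf 0x1a. */
static unsigned int get_this_hybrid_cpu_type(void)
{
	return HYBRID_TYPE_INVALID;	/* pretend the leaf reports nothing */
}

/* Optional platform hook, set by the PMU driver for such configurations. */
static unsigned int (*get_hybrid_cpu_type)(void);

static unsigned int adl_alt_config_cpu_type(void)
{
	return HYBRID_TYPE_CORE;
}

int main(void)
{
	unsigned int type;

	get_hybrid_cpu_type = adl_alt_config_cpu_type;

	type = get_this_hybrid_cpu_type();
	if (type == HYBRID_TYPE_INVALID && get_hybrid_cpu_type)
		type = get_hybrid_cpu_type();

	printf("resolved core type: %#x\n", type);	/* 0x40 */
	return 0;
}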

Suggested-by: Peter Zijlstra (Intel) 
Signed-off-by: Kan Liang 
---
 arch/x86/events/core.c   | 138 +--
 arch/x86/events/intel/core.c |  93 -
 arch/x86/events/perf_event.h |  14 +
 3 files changed, 224 insertions(+), 21 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index f9d299b..e15b177 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -485,7 +485,7 @@ int x86_setup_perfctr(struct perf_event *event)
	local64_set(&hwc->period_left, hwc->sample_period);
}
 
-   if (attr->type == PERF_TYPE_RAW)
+   if (attr->type == event->pmu->type)
return x86_pmu_extra_regs(event->attr.config, event);
 
if (attr->type == PERF_TYPE_HW_CACHE)
@@ -620,7 +620,7 @@ int x86_pmu_hw_config(struct perf_event *event)
if (!event->attr.exclude_kernel)
event->hw.config |= ARCH_PERFMON_EVENTSEL_OS;
 
-   if (event->attr.type == PERF_TYPE_RAW)
+   if (event->attr.type == event->pmu->type)
event->hw.config |= event->attr.config & X86_RAW_EVENT_MASK;
 
if (event->attr.sample_period && x86_pmu.limit_period) {
@@ -749,7 +749,17 @@ void x86_pmu_enable_all(int added)
 
 static inline int is_x86_event(struct perf_event *event)
 {
-   return event->pmu == &pmu;
+   int i;
+
+   if (!is_hybrid())
+   return event->pmu == &pmu;
+
+   for (i = 0; i < x86_pmu.num_hybrid_pmus; i++) {
+   if (event->pmu == &x86_pmu.hybrid_pmu[i].pmu)
+   return true;
+   }
+
+   return false;
 }
 
 struct pmu *x86_get_pmu(unsigned int cpu)
@@ -1998,6 +2008,23 @@ void x86_pmu_show_pmu_cap(int num_counters, int num_counters_fixed,
pr_info("... event mask: %016Lx\n", intel_ctrl);
 }
 
+/*
+ * The generic code is not hybrid friendly. The hybrid_pmu->pmu
+ * of the first registered PMU is unconditionally assigned to
+ * each possible cpuctx->ctx.pmu.
+ * Update the correct hybrid PMU to the cpuctx->ctx.pmu.
+ */
+void x86_pmu_update_cpu_context(struct pmu *pmu, int cpu)
+{
+   struct perf_cpu_context *cpuctx;
+
+   if (!pmu->pmu_cpu_context)
+   return;
+
+   cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
+   cpuctx->ctx.pmu = pmu;
+}
+
 static int __init init_hw_perf_events(void)
 {
struct x86_pmu_quirk *quirk;
@@ -2058,8 +2085,11 @@ static int __init init_hw_perf_events(void)
 
pmu.attr_update = x86_pmu.attr_update;
 
-   x86_pmu_show_pmu_cap(x86_pmu.num_counters, x86_pmu.num_counters_fixed,
-x86_pmu.intel_ctrl);
+   if (!is_hybrid()) {
+   x86_pmu_show_pmu_cap(x86_pmu.num_counters,
+x86_pmu.num_counters_fixed,
+x86_pmu.intel_ctrl);
+   }
 
if (!x86_pmu.read)
x86_pmu.read = _x86_pmu_read;
@@ -2089,9 +2119,46 @@ static int __init init_hw_perf_events(void)
if (err)
goto out1;
 
-   err = perf_pmu_regist

[PATCH V4 15/25] perf/x86: Factor out x86_pmu_show_pmu_cap

2021-04-01 Thread kan . liang
From: Kan Liang 

The PMU capabilities are different among hybrid PMUs. Perf should dump
the PMU capabilities information for each hybrid PMU.

Factor out x86_pmu_show_pmu_cap() which shows the PMU capabilities
information. The function will be reused later when registering a
dedicated hybrid PMU.

Reviewed-by: Andi Kleen 
Signed-off-by: Kan Liang 
---
 arch/x86/events/core.c   | 25 -
 arch/x86/events/perf_event.h |  3 +++
 2 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 9c931ec..f9d299b 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1984,6 +1984,20 @@ static void _x86_pmu_read(struct perf_event *event)
x86_perf_event_update(event);
 }
 
+void x86_pmu_show_pmu_cap(int num_counters, int num_counters_fixed,
+ u64 intel_ctrl)
+{
+   pr_info("... version:%d\n", x86_pmu.version);
+   pr_info("... bit width:  %d\n", x86_pmu.cntval_bits);
+   pr_info("... generic registers:  %d\n", num_counters);
+   pr_info("... value mask: %016Lx\n", x86_pmu.cntval_mask);
+   pr_info("... max period: %016Lx\n", x86_pmu.max_period);
+   pr_info("... fixed-purpose events:   %lu\n",
+   hweight64((((1ULL << num_counters_fixed) - 1)
+   << INTEL_PMC_IDX_FIXED) & intel_ctrl));
+   pr_info("... event mask: %016Lx\n", intel_ctrl);
+}
+
 static int __init init_hw_perf_events(void)
 {
struct x86_pmu_quirk *quirk;
@@ -2044,15 +2058,8 @@ static int __init init_hw_perf_events(void)
 
pmu.attr_update = x86_pmu.attr_update;
 
-   pr_info("... version:%d\n", x86_pmu.version);
-   pr_info("... bit width:  %d\n", x86_pmu.cntval_bits);
-   pr_info("... generic registers:  %d\n", x86_pmu.num_counters);
-   pr_info("... value mask: %016Lx\n", x86_pmu.cntval_mask);
-   pr_info("... max period: %016Lx\n", x86_pmu.max_period);
-   pr_info("... fixed-purpose events:   %lu\n",
-   hweight64((((1ULL << x86_pmu.num_counters_fixed) - 1)
-   << INTEL_PMC_IDX_FIXED) & x86_pmu.intel_ctrl));
-   pr_info("... event mask: %016Lx\n", x86_pmu.intel_ctrl);
+   x86_pmu_show_pmu_cap(x86_pmu.num_counters, x86_pmu.num_counters_fixed,
+x86_pmu.intel_ctrl);
 
if (!x86_pmu.read)
x86_pmu.read = _x86_pmu_read;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 5679c12..1da91b7 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -1083,6 +1083,9 @@ void x86_pmu_enable_event(struct perf_event *event);
 
 int x86_pmu_handle_irq(struct pt_regs *regs);
 
+void x86_pmu_show_pmu_cap(int num_counters, int num_counters_fixed,
+ u64 intel_ctrl);
+
 extern struct event_constraint emptyconstraint;
 
 extern struct event_constraint unconstrained;
-- 
2.7.4


