Re: [PATCH v4 00/25] perf tool: AlderLake hybrid support series 1

2021-04-20 Thread Jin, Yao

Hi Arnaldo, Hi Jiri,

Kan's patch series for AlderLake perf core support has been upstreamed, so the interface will not be 
changed any more.


For this perf tool series (v4), do you have any comments?

Thanks
Jin Yao

On 4/16/2021 10:04 PM, Jin Yao wrote:

AlderLake uses a hybrid architecture with Golden Cove cores
(core cpus) and Gracemont cores (atom cpus). Each cpu type has a
dedicated event list. Some events are available only on core cpus,
some only on atom cpus, and some on both.

Kernel exports new pmus "cpu_core" and "cpu_atom" through sysfs:
/sys/devices/cpu_core
/sys/devices/cpu_atom

cat /sys/devices/cpu_core/cpus
0-15

cat /sys/devices/cpu_atom/cpus
16-23

In this example, core cpus are 0-15 and atom cpus are 16-23.
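
For illustration only, here is a minimal standalone sketch (not the perf
tool's own detection code, which scans pmus via perf_pmu__scan and
perf_pmu__hybrid_mounted) that probes these sysfs directories to decide
whether the system is hybrid:

#include <stdio.h>
#include <sys/stat.h>

/* A pmu counts as present if its sysfs directory exists. */
static int pmu_mounted(const char *path)
{
	struct stat st;

	return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}

int main(void)
{
	int hybrid = pmu_mounted("/sys/devices/cpu_core") &&
		     pmu_mounted("/sys/devices/cpu_atom");

	printf("hybrid platform: %s\n", hybrid ? "yes" : "no");
	return 0;
}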

To enable a core only event or atom only event:

 cpu_core//
or
 cpu_atom//

Count the 'cycles' event on core cpus.

   # perf stat -e cpu_core/cycles/ -a -- sleep 1

Performance counter stats for 'system wide':

   12,853,951,349  cpu_core/cycles/

  1.002581249 seconds time elapsed

If one event is available on both atom cpu and core cpu, two events
are created automatically.

   # perf stat -e cycles -a -- sleep 1

Performance counter stats for 'system wide':

   12,856,467,438  cpu_core/cycles/
6,404,634,785  cpu_atom/cycles/

  1.002453013 seconds time elapsed

Grouping is supported if the events are from the same pmu; otherwise a
warning is displayed and grouping is disabled automatically.

   # perf stat -e '{cpu_core/cycles/,cpu_core/instructions/}' -a -- sleep 1

Performance counter stats for 'system wide':

   12,863,866,968  cpu_core/cycles/
  554,795,017  cpu_core/instructions/

  1.002616117 seconds time elapsed

   # perf stat -e '{cpu_core/cycles/,cpu_atom/instructions/}' -a -- sleep 1
   WARNING: events in group from different hybrid PMUs!
   WARNING: grouped events cpus do not match, disabling group:
 anon group { cpu_core/cycles/, cpu_atom/instructions/ }

Performance counter stats for 'system wide':

6,283,970  cpu_core/cycles/
  765,635  cpu_atom/instructions/

  1.003959036 seconds time elapsed

Note that since the whole patchset for AlderLake hybrid support is very
large (40+ patches), it is split into several patch series for
simplicity.

Patch series 1 only supports the basic functionality. Advanced support
for perf-c2c/perf-mem/topdown/metrics/topology header and others will be
added in follow-up patch series.

The perf tool code can also be found at:
https://github.com/yaoj/perf.git

v4:
---
- In Liang Kan's patch:
   '[PATCH V6 21/25] perf: Extend PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE',
   the user interface for hardware events and cache events is changed, so the
   perf tool patches are changed as well.

- Fix an issue when atom CPUs are offlined:
   "/sys/bus/event_source/devices/cpu_atom/cpus" exists but its content is
   empty. In this case, we can't enable the cpu_atom PMU.
   '[PATCH v4 05/25] perf pmu: Save detected hybrid pmus to a global pmu list'

- Define 'ret' variable for return value in patch
   '[PATCH v4 09/25] perf parse-events: Create two hybrid cache events'

- Directly return add_raw_hybrid() in patch
   '[PATCH v4 10/25] perf parse-events: Create two hybrid raw events'
  
- Drop the patch 'perf pmu: Support 'cycles' and 'branches' inside
   hybrid PMU'.

- Separate '[PATCH v3 12/27] perf parse-events: Support no alias assigned event
   inside hybrid PMU' into two patches:
   '[PATCH v4 11/25] perf parse-events: Compare with hybrid pmu name'
   '[PATCH v4 12/25] perf parse-events: Support event inside hybrid pmu'.
   And these two patches are improved according to Jiri's comments.

v3:
---
- Drop 'perf evlist: Hybrid event uses its own cpus'. This patch is broad
   and not really necessary. The current perf framework already handles the
   cpus for an evsel well, even for a hybrid evsel, so this patch can be
   dropped.

- Drop 'perf evsel: Adjust hybrid event and global event mixed group'.
   The patch is a bit tricky and hard to understand. In v3, we disable
   grouping when the group members are from different PMUs, so this patch
   is no longer necessary.

- Create parse-events-hybrid.c/parse-events-hybrid.h and
   evlist-hybrid.c/evlist-hybrid.h, and move the hybrid-related code to
   these files.

- Create a new patch 'perf pmu: Support 'cycles' and 'branches' inside
   hybrid PMU' to support 'cycles' and 'branches' inside the PMU.

- Create a new patch 'perf record: Uniquify hybrid event name' to tell the
   user which pmu the event belongs to in perf-record.

- If group members are from different hybrid PMUs, show a warning and disable
   grouping.

- Other refining and refactoring.

v2:
---
- Drop kernel patches (Kan posted the series "Add Alder Lake support for perf (kernel)" separately).
- Drop the patch

[PATCH v4] perf Documentation: Document intel-hybrid support

2021-04-16 Thread Jin Yao
Add some words and examples to help understand Intel hybrid perf support.

Signed-off-by: Jin Yao 
---
 v4:
  - Update due to PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE being
    extended to be PMU type aware.

 tools/perf/Documentation/intel-hybrid.txt | 214 ++
 tools/perf/Documentation/perf-record.txt  |   1 +
 tools/perf/Documentation/perf-stat.txt|   2 +
 3 files changed, 217 insertions(+)
 create mode 100644 tools/perf/Documentation/intel-hybrid.txt

diff --git a/tools/perf/Documentation/intel-hybrid.txt 
b/tools/perf/Documentation/intel-hybrid.txt
new file mode 100644
index ..07f0aa3bf682
--- /dev/null
+++ b/tools/perf/Documentation/intel-hybrid.txt
@@ -0,0 +1,214 @@
+Intel hybrid support
+
+Support for Intel hybrid events within perf tools.
+
+Some Intel platforms, such as AlderLake, are hybrid platforms that
+consist of atom cpus and core cpus. Each cpu type has a dedicated event
+list. Some events are available only on core cpus, some only on atom
+cpus, and some on both.
+
+Kernel exports two new cpu pmus via sysfs:
+/sys/devices/cpu_core
+/sys/devices/cpu_atom
+
+The 'cpus' files are created under the directories. For example,
+
+cat /sys/devices/cpu_core/cpus
+0-15
+
+cat /sys/devices/cpu_atom/cpus
+16-23
+
+It indicates cpu0-cpu15 are core cpus and cpu16-cpu23 are atom cpus.
+
+Quickstart
+
+List hybrid event
+-----------------
+
+As before, use perf-list to list the symbolic event.
+
+perf list
+
+inst_retired.any
+   [Fixed Counter: Counts the number of instructions retired. Unit: cpu_atom]
+inst_retired.any
+   [Number of instructions retired. Fixed Counter - architectural event. Unit: cpu_core]
+
+The 'Unit: xxx' suffix is added to the brief description to indicate
+which pmu the event belongs to. The same event name can be supported on
+different pmus.
+
+Enable hybrid event with a specific pmu
+---------------------------------------
+
+To enable a core-only event or an atom-only event, the following syntax is supported:
+
+   cpu_core//
+or
+   cpu_atom//
+
+For example, count the 'cycles' event on core cpus.
+
+   perf stat -e cpu_core/cycles/
+
+Create two events for one hardware event automatically
+------------------------------------------------------
+
+When creating one event that is available on both atom and core, two
+events are created automatically: one for atom, the other for core. Most
+hardware events and cache events are available on both cpu_core and
+cpu_atom.
+
+Hardware events have pre-defined configs (e.g. 0 for cycles). But on a
+hybrid platform, the kernel needs to know which pmu the event comes from
+(atom or core). The original perf event type PERF_TYPE_HARDWARE can't
+carry pmu information, so this type is now extended to be a PMU aware
+type. The PMU type ID is stored at attr.config[63:32].
+
+PMU type ID is retrieved from sysfs.
+/sys/devices/cpu_atom/type
+/sys/devices/cpu_core/type
+
+The new attr.config layout for PERF_TYPE_HARDWARE:
+
+PERF_TYPE_HARDWARE:     0xEEEEEEEE000000AA
+                        AA: hardware event ID
+                        EEEEEEEE: PMU type ID
+
+Cache events are similar. The type PERF_TYPE_HW_CACHE is extended to be
+a PMU aware type. The PMU type ID is stored at attr.config[63:32].
+
+The new attr.config layout for PERF_TYPE_HW_CACHE:
+
+PERF_TYPE_HW_CACHE:     0xEEEEEEEE00DDCCBB
+                        BB: hardware cache ID
+                        CC: hardware cache op ID
+                        DD: hardware cache op result ID
+                        EEEEEEEE: PMU type ID
+
+When enabling a hardware event without a specified pmu, such as
+'perf stat -e cycles -a' (system-wide in this example), two events
+are created automatically.
+
+  
+  perf_event_attr:
+size 120
+config   0x4
+sample_type  IDENTIFIER
+read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
+disabled 1
+inherit  1
+exclude_guest1
+  
+
+and
+
+  
+  perf_event_attr:
+size 120
+config   0x8
+sample_type  IDENTIFIER
+read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
+disabled 1
+inherit  1
+exclude_guest1
+  
+
+type 0 is PERF_TYPE_HARDWARE.
+The 0x4 in attr.config[63:32] indicates it's the cpu_core pmu.
+The 0x8 in attr.config[63:32] indicates it's the cpu_atom pmu.

[PATCH v4 25/25] perf tests: Skip 'perf stat metrics (shadow stat) test' for hybrid

2021-04-16 Thread Jin Yao
Currently we don't support shadow stat for hybrid.

  root@ssp-pwrt-002:~# ./perf stat -e cycles,instructions -a -- sleep 1

   Performance counter stats for 'system wide':

  12,883,109,591  cpu_core/cycles/
   6,405,163,221  cpu_atom/cycles/
 555,553,778  cpu_core/instructions/
 841,158,734  cpu_atom/instructions/

 1.002644773 seconds time elapsed

Now no 'insn per cycle' shadow stat is reported. We will support it
later; for now, just skip the 'perf stat metrics (shadow stat)' test.

Signed-off-by: Jin Yao 
---
 tools/perf/tests/shell/stat+shadow_stat.sh | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/tests/shell/stat+shadow_stat.sh 
b/tools/perf/tests/shell/stat+shadow_stat.sh
index ebebd3596cf9..e6e35fc6c882 100755
--- a/tools/perf/tests/shell/stat+shadow_stat.sh
+++ b/tools/perf/tests/shell/stat+shadow_stat.sh
@@ -7,6 +7,9 @@ set -e
 # skip if system-wide mode is forbidden
 perf stat -a true > /dev/null 2>&1 || exit 2
 
+# skip if on hybrid platform
+perf stat -a -e cycles sleep 1 2>&1 | grep -e cpu_core && exit 2
+
 test_global_aggr()
 {
perf stat -a --no-big-num -e cycles,instructions sleep 1  2>&1 | \
-- 
2.17.1



[PATCH v4 22/25] perf tests: Support 'Parse and process metrics' test for hybrid

2021-04-16 Thread Jin Yao
Some events are not supported on hybrid platforms. Only pick up the cases that work for hybrid.

  # ./perf test 67
  67: Parse and process metrics   : Ok

Signed-off-by: Jin Yao 
---
 tools/perf/tests/parse-metric.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/tools/perf/tests/parse-metric.c b/tools/perf/tests/parse-metric.c
index 4968c4106254..24e5ddff515e 100644
--- a/tools/perf/tests/parse-metric.c
+++ b/tools/perf/tests/parse-metric.c
@@ -11,6 +11,7 @@
 #include "debug.h"
 #include "expr.h"
 #include "stat.h"
+#include "pmu.h"
 
 static struct pmu_event pme_test[] = {
 {
@@ -370,12 +371,17 @@ static int test_metric_group(void)
 
 int test__parse_metric(struct test *test __maybe_unused, int subtest 
__maybe_unused)
 {
+   perf_pmu__scan(NULL);
+
TEST_ASSERT_VAL("IPC failed", test_ipc() == 0);
TEST_ASSERT_VAL("frontend failed", test_frontend() == 0);
-   TEST_ASSERT_VAL("cache_miss_cycles failed", test_cache_miss_cycles() == 
0);
TEST_ASSERT_VAL("DCache_L2 failed", test_dcache_l2() == 0);
TEST_ASSERT_VAL("recursion fail failed", test_recursion_fail() == 0);
-   TEST_ASSERT_VAL("test metric group", test_metric_group() == 0);
TEST_ASSERT_VAL("Memory bandwidth", test_memory_bandwidth() == 0);
+
+   if (!perf_pmu__has_hybrid()) {
+   TEST_ASSERT_VAL("cache_miss_cycles failed", 
test_cache_miss_cycles() == 0);
+   TEST_ASSERT_VAL("test metric group", test_metric_group() == 0);
+   }
return 0;
 }
-- 
2.17.1



[PATCH v4 24/25] perf tests: Support 'Convert perf time to TSC' test for hybrid

2021-04-16 Thread Jin Yao
Since for "cycles:u' on hybrid platform, it creates two "cycles".
So the second evsel in evlist also needs initialization.

With this patch,

  # ./perf test 71
  71: Convert perf time to TSC: Ok

Signed-off-by: Jin Yao 
---
 tools/perf/tests/perf-time-to-tsc.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/tools/perf/tests/perf-time-to-tsc.c 
b/tools/perf/tests/perf-time-to-tsc.c
index 680c3cffb128..72f268c6cc5d 100644
--- a/tools/perf/tests/perf-time-to-tsc.c
+++ b/tools/perf/tests/perf-time-to-tsc.c
@@ -20,6 +20,7 @@
 #include "tsc.h"
 #include "mmap.h"
 #include "tests.h"
+#include "pmu.h"
 
 #define CHECK__(x) {   \
while ((x) < 0) {   \
@@ -66,6 +67,10 @@ int test__perf_time_to_tsc(struct test *test __maybe_unused, 
int subtest __maybe
u64 test_tsc, comm1_tsc, comm2_tsc;
u64 test_time, comm1_time = 0, comm2_time = 0;
struct mmap *md;
+   bool hybrid = false;
+
+   if (perf_pmu__has_hybrid())
+   hybrid = true;
 
threads = thread_map__new(-1, getpid(), UINT_MAX);
CHECK_NOT_NULL__(threads);
@@ -88,6 +93,17 @@ int test__perf_time_to_tsc(struct test *test __maybe_unused, 
int subtest __maybe
evsel->core.attr.disabled = 1;
evsel->core.attr.enable_on_exec = 0;
 
+   /*
+* For hybrid "cycles:u", it creates two events.
+* Init the second evsel here.
+*/
+   if (hybrid) {
+   evsel = evsel__next(evsel);
+   evsel->core.attr.comm = 1;
+   evsel->core.attr.disabled = 1;
+   evsel->core.attr.enable_on_exec = 0;
+   }
+
CHECK__(evlist__open(evlist));
 
CHECK__(evlist__mmap(evlist, UINT_MAX));
-- 
2.17.1



[PATCH v4 23/25] perf tests: Support 'Session topology' test for hybrid

2021-04-16 Thread Jin Yao
Force the creation of one event "cpu_core/cycles/" by default,
otherwise the check 'if (evlist->core.nr_entries == 1)' in
evlist__valid_sample_type would fail.

  # ./perf test 41
  41: Session topology: Ok

Signed-off-by: Jin Yao 
---
 tools/perf/tests/topology.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/tools/perf/tests/topology.c b/tools/perf/tests/topology.c
index 050489807a47..30b4acb08d35 100644
--- a/tools/perf/tests/topology.c
+++ b/tools/perf/tests/topology.c
@@ -8,6 +8,7 @@
 #include "session.h"
 #include "evlist.h"
 #include "debug.h"
+#include "pmu.h"
 #include 
 
 #define TEMPL "/tmp/perf-test-XX"
@@ -40,7 +41,14 @@ static int session_write_header(char *path)
session = perf_session__new(, false, NULL);
TEST_ASSERT_VAL("can't get session", !IS_ERR(session));
 
-   session->evlist = evlist__new_default();
+   if (!perf_pmu__has_hybrid()) {
+   session->evlist = evlist__new_default();
+   } else {
+   struct parse_events_error err;
+
+   session->evlist = evlist__new();
parse_events(session->evlist, "cpu_core/cycles/", &err);
+   }
TEST_ASSERT_VAL("can't get evlist", session->evlist);
 
perf_header__set_feat(&session->header, HEADER_CPU_TOPOLOGY);
-- 
2.17.1



[PATCH v4 21/25] perf tests: Support 'Track with sched_switch' test for hybrid

2021-04-16 Thread Jin Yao
Since for "cycles:u' on hybrid platform, it creates two "cycles".
So the number of events in evlist is not expected in next test
steps. Now we just use one event "cpu_core/cycles:u/" for hybrid.

  # ./perf test 35
  35: Track with sched_switch     : Ok

Signed-off-by: Jin Yao 
---
 tools/perf/tests/switch-tracking.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/tools/perf/tests/switch-tracking.c 
b/tools/perf/tests/switch-tracking.c
index 3ebaa758df77..3a12176f8c46 100644
--- a/tools/perf/tests/switch-tracking.c
+++ b/tools/perf/tests/switch-tracking.c
@@ -18,6 +18,7 @@
 #include "record.h"
 #include "tests.h"
 #include "util/mmap.h"
+#include "pmu.h"
 
 static int spin_sleep(void)
 {
@@ -340,6 +341,10 @@ int test__switch_tracking(struct test *test 
__maybe_unused, int subtest __maybe_
struct evsel *switch_evsel, *tracking_evsel;
const char *comm;
int err = -1;
+   bool hybrid = false;
+
+   if (perf_pmu__has_hybrid())
+   hybrid = true;
 
threads = thread_map__new(-1, getpid(), UINT_MAX);
if (!threads) {
@@ -371,7 +376,10 @@ int test__switch_tracking(struct test *test 
__maybe_unused, int subtest __maybe_
cpu_clocks_evsel = evlist__last(evlist);
 
/* Second event */
-   err = parse_events(evlist, "cycles:u", NULL);
+   if (!hybrid)
+   err = parse_events(evlist, "cycles:u", NULL);
+   else
+   err = parse_events(evlist, "cpu_core/cycles/u", NULL);
if (err) {
pr_debug("Failed to parse event cycles:u\n");
goto out_err;
-- 
2.17.1



[PATCH v4 20/25] perf tests: Skip 'Setup struct perf_event_attr' test for hybrid

2021-04-16 Thread Jin Yao
For hybrid, the attr.type consists of the pmu type id + the original
type. This test would need many changes to handle that, so temporarily
skip it; full hybrid support here is a TODO for the future.

Signed-off-by: Jin Yao 
---
 tools/perf/tests/attr.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/tools/perf/tests/attr.c b/tools/perf/tests/attr.c
index dd39ce9b0277..b37c35fb5a46 100644
--- a/tools/perf/tests/attr.c
+++ b/tools/perf/tests/attr.c
@@ -34,6 +34,7 @@
 #include "event.h"
 #include "util.h"
 #include "tests.h"
+#include "pmu.h"
 
 #define ENV "PERF_TEST_ATTR"
 
@@ -184,6 +185,9 @@ int test__attr(struct test *test __maybe_unused, int 
subtest __maybe_unused)
char path_dir[PATH_MAX];
char *exec_path;
 
+   if (perf_pmu__has_hybrid())
+   return 0;
+
/* First try development tree tests. */
if (!lstat("./tests", ))
return run_dir("./tests", "./perf");
-- 
2.17.1



[PATCH v4 18/25] perf tests: Add hybrid cases for 'Parse event definition strings' test

2021-04-16 Thread Jin Yao
Add basic hybrid test cases for 'Parse event definition strings' test.

  # perf test 6
   6: Parse event definition strings  : Ok

Signed-off-by: Jin Yao 
---
 tools/perf/tests/parse-events.c | 152 
 1 file changed, 152 insertions(+)

diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c
index 026c54743311..40eb08049ab2 100644
--- a/tools/perf/tests/parse-events.c
+++ b/tools/perf/tests/parse-events.c
@@ -1512,6 +1512,110 @@ static int test__all_tracepoints(struct evlist *evlist)
return test__checkevent_tracepoint_multi(evlist);
 }
 
+static int test__hybrid_hw_event_with_pmu(struct evlist *evlist)
+{
+   struct evsel *evsel = evlist__first(evlist);
+
+   TEST_ASSERT_VAL("wrong number of entries", 1 == 
evlist->core.nr_entries);
+   TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
+   TEST_ASSERT_VAL("wrong config", 0x3c == evsel->core.attr.config);
+   return 0;
+}
+
+static int test__hybrid_hw_group_event(struct evlist *evlist)
+{
+   struct evsel *evsel, *leader;
+
+   evsel = leader = evlist__first(evlist);
+   TEST_ASSERT_VAL("wrong number of entries", 2 == 
evlist->core.nr_entries);
+   TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
+   TEST_ASSERT_VAL("wrong config", 0x3c == evsel->core.attr.config);
+   TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+
+   evsel = evsel__next(evsel);
+   TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
+   TEST_ASSERT_VAL("wrong config", 0xc0 == evsel->core.attr.config);
+   TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+   return 0;
+}
+
+static int test__hybrid_sw_hw_group_event(struct evlist *evlist)
+{
+   struct evsel *evsel, *leader;
+
+   evsel = leader = evlist__first(evlist);
+   TEST_ASSERT_VAL("wrong number of entries", 2 == 
evlist->core.nr_entries);
+   TEST_ASSERT_VAL("wrong type", PERF_TYPE_SOFTWARE == 
evsel->core.attr.type);
+   TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+
+   evsel = evsel__next(evsel);
+   TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
+   TEST_ASSERT_VAL("wrong config", 0x3c == evsel->core.attr.config);
+   TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+   return 0;
+}
+
+static int test__hybrid_hw_sw_group_event(struct evlist *evlist)
+{
+   struct evsel *evsel, *leader;
+
+   evsel = leader = evlist__first(evlist);
+   TEST_ASSERT_VAL("wrong number of entries", 2 == 
evlist->core.nr_entries);
+   TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
+   TEST_ASSERT_VAL("wrong config", 0x3c == evsel->core.attr.config);
+   TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+
+   evsel = evsel__next(evsel);
+   TEST_ASSERT_VAL("wrong type", PERF_TYPE_SOFTWARE == 
evsel->core.attr.type);
+   TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+   return 0;
+}
+
+static int test__hybrid_group_modifier1(struct evlist *evlist)
+{
+   struct evsel *evsel, *leader;
+
+   evsel = leader = evlist__first(evlist);
+   TEST_ASSERT_VAL("wrong number of entries", 2 == 
evlist->core.nr_entries);
+   TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
+   TEST_ASSERT_VAL("wrong config", 0x3c == evsel->core.attr.config);
+   TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+   TEST_ASSERT_VAL("wrong exclude_user", evsel->core.attr.exclude_user);
+   TEST_ASSERT_VAL("wrong exclude_kernel", 
!evsel->core.attr.exclude_kernel);
+
+   evsel = evsel__next(evsel);
+   TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
+   TEST_ASSERT_VAL("wrong config", 0xc0 == evsel->core.attr.config);
+   TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+   TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
+   TEST_ASSERT_VAL("wrong exclude_kernel", 
evsel->core.attr.exclude_kernel);
+   return 0;
+}
+
+static int test__hybrid_raw1(struct evlist *evlist)
+{
+   struct evsel *evsel = evlist__first(evlist);
+
+   TEST_ASSERT_VAL("wrong number of entries", 2 == 
evlist->core.nr_entries);
+   TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
+   TEST_ASSERT_VAL("wrong config", 0x1a == evsel->core.attr.config);
+
+   /* The ty

[PATCH v4 19/25] perf tests: Add hybrid cases for 'Roundtrip evsel->name' test

2021-04-16 Thread Jin Yao
Since for one hw event, two hybrid events are created.

For example,

evsel->idx  evsel__name(evsel)
0   cycles
1   cycles
2   instructions
3   instructions
...

So for comparing the evsel name on hybrid, the evsel->idx
needs to be divided by 2.
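
A tiny standalone demonstration of that index mapping (illustrative only;
the actual check is the names[evsel->idx / distance] comparison in the
diff below):

#include <stdio.h>

int main(void)
{
	const char *names[] = { "cycles", "instructions" };
	int distance = 2;	/* two hybrid events are created per hw event */

	for (int idx = 0; idx < 4; idx++)
		printf("evsel->idx %d -> expected name %s\n",
		       idx, names[idx / distance]);
	return 0;
}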

  # ./perf test 14
  14: Roundtrip evsel->name   : Ok

Signed-off-by: Jin Yao 
---
 tools/perf/tests/evsel-roundtrip-name.c | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/tools/perf/tests/evsel-roundtrip-name.c 
b/tools/perf/tests/evsel-roundtrip-name.c
index f7f3e5b4c180..b74cf80d1f10 100644
--- a/tools/perf/tests/evsel-roundtrip-name.c
+++ b/tools/perf/tests/evsel-roundtrip-name.c
@@ -4,6 +4,7 @@
 #include "parse-events.h"
 #include "tests.h"
 #include "debug.h"
+#include "pmu.h"
 #include 
 #include 
 
@@ -62,7 +63,8 @@ static int perf_evsel__roundtrip_cache_name_test(void)
return ret;
 }
 
-static int __perf_evsel__name_array_test(const char *names[], int nr_names)
+static int __perf_evsel__name_array_test(const char *names[], int nr_names,
+int distance)
 {
int i, err;
struct evsel *evsel;
@@ -82,9 +84,9 @@ static int __perf_evsel__name_array_test(const char *names[], 
int nr_names)
 
err = 0;
evlist__for_each_entry(evlist, evsel) {
-   if (strcmp(evsel__name(evsel), names[evsel->idx])) {
+   if (strcmp(evsel__name(evsel), names[evsel->idx / distance])) {
--err;
-   pr_debug("%s != %s\n", evsel__name(evsel), 
names[evsel->idx]);
+   pr_debug("%s != %s\n", evsel__name(evsel), 
names[evsel->idx / distance]);
}
}
 
@@ -93,18 +95,21 @@ static int __perf_evsel__name_array_test(const char 
*names[], int nr_names)
return err;
 }
 
-#define perf_evsel__name_array_test(names) \
-   __perf_evsel__name_array_test(names, ARRAY_SIZE(names))
+#define perf_evsel__name_array_test(names, distance) \
+   __perf_evsel__name_array_test(names, ARRAY_SIZE(names), distance)
 
 int test__perf_evsel__roundtrip_name_test(struct test *test __maybe_unused, 
int subtest __maybe_unused)
 {
int err = 0, ret = 0;
 
-   err = perf_evsel__name_array_test(evsel__hw_names);
+   if (perf_pmu__has_hybrid())
+   return perf_evsel__name_array_test(evsel__hw_names, 2);
+
+   err = perf_evsel__name_array_test(evsel__hw_names, 1);
if (err)
ret = err;
 
-   err = __perf_evsel__name_array_test(evsel__sw_names, 
PERF_COUNT_SW_DUMMY + 1);
+   err = __perf_evsel__name_array_test(evsel__sw_names, 
PERF_COUNT_SW_DUMMY + 1, 1);
if (err)
ret = err;
 
-- 
2.17.1



[PATCH v4 16/25] perf stat: Warn group events from different hybrid PMU

2021-04-16 Thread Jin Yao
If a group has events from different hybrid PMUs, show a warning:

"WARNING: events in group from different hybrid PMUs!"

This reminds the user not to put a core event and an atom event into
one group.

After the warning, just disable grouping.

  # perf stat -e "{cpu_core/cycles/,cpu_atom/cycles/}" -a -- sleep 1
  WARNING: events in group from different hybrid PMUs!
  WARNING: grouped events cpus do not match, disabling group:
anon group { cpu_core/cycles/, cpu_atom/cycles/ }

   Performance counter stats for 'system wide':

   5,438,125  cpu_core/cycles/
   3,914,586  cpu_atom/cycles/

 1.004250966 seconds time elapsed

Signed-off-by: Jin Yao 
---
v4:
 - No change.

 tools/perf/builtin-stat.c  |  4 +++
 tools/perf/util/evlist-hybrid.c| 47 ++
 tools/perf/util/evlist-hybrid.h|  2 ++
 tools/perf/util/evsel.c|  6 
 tools/perf/util/evsel.h|  1 +
 tools/perf/util/python-ext-sources |  2 ++
 6 files changed, 62 insertions(+)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 0351b99d17a7..c429aae6eeb6 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -48,6 +48,7 @@
 #include "util/pmu.h"
 #include "util/event.h"
 #include "util/evlist.h"
+#include "util/evlist-hybrid.h"
 #include "util/evsel.h"
 #include "util/debug.h"
 #include "util/color.h"
@@ -240,6 +241,9 @@ static void evlist__check_cpu_maps(struct evlist *evlist)
struct evsel *evsel, *pos, *leader;
char buf[1024];
 
+   if (evlist__has_hybrid(evlist))
+   evlist__warn_hybrid_group(evlist);
+
evlist__for_each_entry(evlist, evsel) {
leader = evsel->leader;
 
diff --git a/tools/perf/util/evlist-hybrid.c b/tools/perf/util/evlist-hybrid.c
index e11998526f2e..db3f5fbdebe1 100644
--- a/tools/perf/util/evlist-hybrid.c
+++ b/tools/perf/util/evlist-hybrid.c
@@ -7,6 +7,7 @@
 #include "../perf.h"
 #include "util/pmu-hybrid.h"
 #include "util/evlist-hybrid.h"
+#include "debug.h"
 #include 
 #include 
 #include 
@@ -39,3 +40,49 @@ int evlist__add_default_hybrid(struct evlist *evlist, bool 
precise)
 
return 0;
 }
+
+static bool group_hybrid_conflict(struct evsel *leader)
+{
+   struct evsel *pos, *prev = NULL;
+
+   for_each_group_evsel(pos, leader) {
+   if (!evsel__is_hybrid(pos))
+   continue;
+
+   if (prev && strcmp(prev->pmu_name, pos->pmu_name))
+   return true;
+
+   prev = pos;
+   }
+
+   return false;
+}
+
+void evlist__warn_hybrid_group(struct evlist *evlist)
+{
+   struct evsel *evsel;
+
+   evlist__for_each_entry(evlist, evsel) {
+   if (evsel__is_group_leader(evsel) &&
+   evsel->core.nr_members > 1 &&
+   group_hybrid_conflict(evsel)) {
+   pr_warning("WARNING: events in group from "
+  "different hybrid PMUs!\n");
+   return;
+   }
+   }
+}
+
+bool evlist__has_hybrid(struct evlist *evlist)
+{
+   struct evsel *evsel;
+
+   evlist__for_each_entry(evlist, evsel) {
+   if (evsel->pmu_name &&
+   perf_pmu__is_hybrid(evsel->pmu_name)) {
+   return true;
+   }
+   }
+
+   return false;
+}
diff --git a/tools/perf/util/evlist-hybrid.h b/tools/perf/util/evlist-hybrid.h
index e25861649d8f..19f74b4c340a 100644
--- a/tools/perf/util/evlist-hybrid.h
+++ b/tools/perf/util/evlist-hybrid.h
@@ -8,5 +8,7 @@
 #include 
 
 int evlist__add_default_hybrid(struct evlist *evlist, bool precise);
+void evlist__warn_hybrid_group(struct evlist *evlist);
+bool evlist__has_hybrid(struct evlist *evlist);
 
 #endif /* __PERF_EVLIST_HYBRID_H */
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 0ba4daa09453..0f64a32ea9c5 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -47,6 +47,7 @@
 #include "memswap.h"
 #include "util.h"
 #include "hashmap.h"
+#include "pmu-hybrid.h"
 #include "../perf-sys.h"
 #include "util/parse-branch-options.h"
 #include 
@@ -2797,3 +2798,8 @@ void evsel__zero_per_pkg(struct evsel *evsel)
hashmap__clear(evsel->per_pkg_mask);
}
 }
+
+bool evsel__is_hybrid(struct evsel *evsel)
+{
+   return evsel->pmu_name && perf_pmu__is_hybrid(evsel->pmu_name);
+}
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index ff89196281bd..f6f90f68381b 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -453,4 +453,5 @@ struct perf_env *evsel__env(struct evsel *evsel);
 int evsel__sto

[PATCH v4 17/25] perf record: Uniquify hybrid event name

2021-04-16 Thread Jin Yao
For perf-record, it would be useful to tell the user which pmu the
event belongs to.

For example,

  # perf record -a -- sleep 1
  # perf report

  # To display the perf.data header info, please use --header/--header-only 
options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 106  of event 'cpu_core/cycles/'
  # Event count (approx.): 22043448
  #
  # Overhead  Command   Shared ObjectSymbol
  #     ...  

  #
  ...

Signed-off-by: Jin Yao 
---
v4:
 - No change.

 tools/perf/builtin-record.c | 28 
 1 file changed, 28 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 6af46c6a4fd8..3337b5f93336 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1605,6 +1605,32 @@ static void hit_auxtrace_snapshot_trigger(struct record 
*rec)
}
 }
 
+static void record__uniquify_name(struct record *rec)
+{
+   struct evsel *pos;
+   struct evlist *evlist = rec->evlist;
+   char *new_name;
+   int ret;
+
+   if (!perf_pmu__has_hybrid())
+   return;
+
+   evlist__for_each_entry(evlist, pos) {
+   if (!evsel__is_hybrid(pos))
+   continue;
+
+   if (strchr(pos->name, '/'))
+   continue;
+
+   ret = asprintf(&new_name, "%s/%s/",
+  pos->pmu_name, pos->name);
+   if (ret) {
+   free(pos->name);
+   pos->name = new_name;
+   }
+   }
+}
+
 static int __cmd_record(struct record *rec, int argc, const char **argv)
 {
int err;
@@ -1709,6 +1735,8 @@ static int __cmd_record(struct record *rec, int argc, 
const char **argv)
if (data->is_pipe && rec->evlist->core.nr_entries == 1)
rec->opts.sample_id = true;
 
+   record__uniquify_name(rec);
+
if (record__open(rec) != 0) {
err = -1;
goto out_child;
-- 
2.17.1



[PATCH v4 14/25] perf stat: Add default hybrid events

2021-04-16 Thread Jin Yao
Previously, if '-e' is not specified in perf stat, some software events
and hardware events are added to the evlist by default.

Before:

  # ./perf stat -a -- sleep 1

   Performance counter stats for 'system wide':

   24,044.40 msec cpu-clock #   23.946 CPUs utilized
  99  context-switches  #4.117 /sec
  24  cpu-migrations#0.998 /sec
   3  page-faults   #0.125 /sec
   7,000,244  cycles#0.000 GHz
   2,955,024  instructions  #0.42  insn per cycle
 608,941  branches  #   25.326 K/sec
  31,991  branch-misses #5.25% of all branches

 1.004106859 seconds time elapsed

Among the events, cycles, instructions, branches and branch-misses
are hardware events.

On a hybrid platform, two hardware events are created for one
hardware event:

cpu_core/cycles/,
cpu_atom/cycles/,
cpu_core/instructions/,
cpu_atom/instructions/,
cpu_core/branches/,
cpu_atom/branches/,
cpu_core/branch-misses/,
cpu_atom/branch-misses/

These events are added to the evlist on a hybrid platform.

Since parse_events() now supports creating two hardware events for one
event on a hybrid platform, we just use parse_events(evlist,
"cycles,instructions,branches,branch-misses") to create the default
events and add them to the evlist.

After:

  # ./perf stat -a -- sleep 1

   Performance counter stats for 'system wide':

   24,048.60 msec task-clock#   23.947 CPUs utilized
 438  context-switches  #   18.213 /sec
  24  cpu-migrations#0.998 /sec
   6  page-faults   #0.249 /sec
  24,813,157  cpu_core/cycles/  #1.032 M/sec
   8,072,687  cpu_atom/cycles/  #  335.682 K/sec
  20,731,286  cpu_core/instructions/#  862.058 K/sec
   3,737,203  cpu_atom/instructions/#  155.402 K/sec
   2,620,924  cpu_core/branches/#  108.984 K/sec
 381,186  cpu_atom/branches/#   15.851 K/sec
  93,248  cpu_core/branch-misses/   #3.877 K/sec
  36,515  cpu_atom/branch-misses/   #1.518 K/sec

 1.004235472 seconds time elapsed

We can see two events are created for one hardware event.

One TODO: the shadow stats look a bit different now; they are just
'M/sec'.

The perf_stat__update_shadow_stats and perf_stat__print_shadow_stats
need to be improved in future if we want to get the original shadow
stats.

Signed-off-by: Jin Yao 
---
v4:
 - No change.

 tools/perf/builtin-stat.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 1255af4751c2..0351b99d17a7 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1145,6 +1145,13 @@ static int parse_stat_cgroups(const struct option *opt,
return parse_cgroups(opt, str, unset);
 }
 
+static int add_default_hybrid_events(struct evlist *evlist)
+{
+   struct parse_events_error err;
+
+   return parse_events(evlist, "cycles,instructions,branches,branch-misses", &err);
+}
+
 static struct option stat_options[] = {
OPT_BOOLEAN('T', "transaction", _run,
"hardware transaction statistics"),
@@ -1626,6 +1633,12 @@ static int add_default_attributes(void)
   { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS },
   { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_MISSES },
 
+};
+   struct perf_event_attr default_sw_attrs[] = {
+  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK },
+  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CONTEXT_SWITCHES },
+  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CPU_MIGRATIONS },
+  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_PAGE_FAULTS },
 };
 
 /*
@@ -1863,6 +1876,14 @@ static int add_default_attributes(void)
}
 
if (!evsel_list->core.nr_entries) {
+   if (perf_pmu__has_hybrid()) {
+   if (evlist__add_default_attrs(evsel_list,
+ default_sw_attrs) < 0) {
+   return -1;
+   }
+   return add_default_hybrid_events(evsel_list);
+   }
+
if (target__has_cpu())
default_attrs0[0].config = PERF_COUNT_SW_CPU_CLOCK;
 
-- 
2.17.1



[PATCH v4 15/25] perf stat: Filter out unmatched aggregation for hybrid event

2021-04-16 Thread Jin Yao
perf-stat supports several aggregation modes, such as --per-core and
--per-socket. A hybrid event, however, may only be available on a
subset of cpus. So for --per-core we need to filter out the
unavailable cores, for --per-socket the unavailable sockets, and so
on.

Before:

  # perf stat --per-core -e cpu_core/cycles/ -a -- sleep 1

   Performance counter stats for 'system wide':

  S0-D0-C0   2    479,530  cpu_core/cycles/
  S0-D0-C4   2    175,007  cpu_core/cycles/
  S0-D0-C8   2    166,240  cpu_core/cycles/
  S0-D0-C12  2    704,673  cpu_core/cycles/
  S0-D0-C16  2    865,835  cpu_core/cycles/
  S0-D0-C20  2  2,958,461  cpu_core/cycles/
  S0-D0-C24  2    163,988  cpu_core/cycles/
  S0-D0-C28  2    164,729  cpu_core/cycles/
  S0-D0-C32  0             cpu_core/cycles/
  S0-D0-C33  0             cpu_core/cycles/
  S0-D0-C34  0             cpu_core/cycles/
  S0-D0-C35  0             cpu_core/cycles/
  S0-D0-C36  0             cpu_core/cycles/
  S0-D0-C37  0             cpu_core/cycles/
  S0-D0-C38  0             cpu_core/cycles/
  S0-D0-C39  0             cpu_core/cycles/

 1.003597211 seconds time elapsed

After:

  # perf stat --per-core -e cpu_core/cycles/ -a -- sleep 1

   Performance counter stats for 'system wide':

  S0-D0-C0   2    210,428  cpu_core/cycles/
  S0-D0-C4   2    444,830  cpu_core/cycles/
  S0-D0-C8   2    435,241  cpu_core/cycles/
  S0-D0-C12  2    423,976  cpu_core/cycles/
  S0-D0-C16  2    859,350  cpu_core/cycles/
  S0-D0-C20  2  1,559,589  cpu_core/cycles/
  S0-D0-C24  2    163,924  cpu_core/cycles/
  S0-D0-C28  2    376,610  cpu_core/cycles/

 1.003621290 seconds time elapsed

Signed-off-by: Jin Yao 
---
v4:
 - No change.

 tools/perf/util/stat-display.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index 5255d78b1c30..15eafd249e46 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -643,6 +643,20 @@ static void aggr_cb(struct perf_stat_config *config,
}
 }
 
+static bool aggr_id_hybrid_matched(struct perf_stat_config *config,
+  struct evsel *counter, struct aggr_cpu_id id)
+{
+   struct aggr_cpu_id s;
+
+   for (int i = 0; i < evsel__nr_cpus(counter); i++) {
+   s = config->aggr_get_id(config, evsel__cpus(counter), i);
+   if (cpu_map__compare_aggr_cpu_id(s, id))
+   return true;
+   }
+
+   return false;
+}
+
 static void print_counter_aggrdata(struct perf_stat_config *config,
   struct evsel *counter, int s,
   char *prefix, bool metric_only,
@@ -656,6 +670,12 @@ static void print_counter_aggrdata(struct perf_stat_config 
*config,
double uval;
 
ad.id = id = config->aggr_map->map[s];
+
+   if (perf_pmu__has_hybrid() &&
+   !aggr_id_hybrid_matched(config, counter, id)) {
+   return;
+   }
+
ad.val = ad.ena = ad.run = 0;
ad.nr = 0;
if (!collect_data(config, counter, aggr_cb, &ad))
-- 
2.17.1



[PATCH v4 13/25] perf record: Create two hybrid 'cycles' events by default

2021-04-16 Thread Jin Yao
When the evlist is empty (for example, no '-e' specified in perf record),
one default 'cycles' event is added to the evlist.

On a hybrid platform, two default 'cycles' events need to be created
instead: one for cpu_core, the other for cpu_atom.

This patch calls evsel__new_cycles() twice to create the two 'cycles'
events.

  # ./perf record -vv -a -- sleep 1
  ...
  
  perf_event_attr:
size 120
config   0x4
{ sample_period, sample_freq }   4000
sample_type  IP|TID|TIME|ID|CPU|PERIOD
read_format  ID
disabled 1
inherit  1
freq 1
precise_ip   3
sample_id_all1
exclude_guest1
  
  sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 5
  sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 6
  sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 7
  sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 9
  sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8 = 10
  sys_perf_event_open: pid -1  cpu 5  group_fd -1  flags 0x8 = 11
  sys_perf_event_open: pid -1  cpu 6  group_fd -1  flags 0x8 = 12
  sys_perf_event_open: pid -1  cpu 7  group_fd -1  flags 0x8 = 13
  sys_perf_event_open: pid -1  cpu 8  group_fd -1  flags 0x8 = 14
  sys_perf_event_open: pid -1  cpu 9  group_fd -1  flags 0x8 = 15
  sys_perf_event_open: pid -1  cpu 10  group_fd -1  flags 0x8 = 16
  sys_perf_event_open: pid -1  cpu 11  group_fd -1  flags 0x8 = 17
  sys_perf_event_open: pid -1  cpu 12  group_fd -1  flags 0x8 = 18
  sys_perf_event_open: pid -1  cpu 13  group_fd -1  flags 0x8 = 19
  sys_perf_event_open: pid -1  cpu 14  group_fd -1  flags 0x8 = 20
  sys_perf_event_open: pid -1  cpu 15  group_fd -1  flags 0x8 = 21
  
  perf_event_attr:
size 120
config   0x8
{ sample_period, sample_freq }   4000
sample_type  IP|TID|TIME|ID|CPU|PERIOD
read_format  ID
disabled 1
inherit  1
freq 1
precise_ip   3
sample_id_all1
exclude_guest1
  
  sys_perf_event_open: pid -1  cpu 16  group_fd -1  flags 0x8 = 22
  sys_perf_event_open: pid -1  cpu 17  group_fd -1  flags 0x8 = 23
  sys_perf_event_open: pid -1  cpu 18  group_fd -1  flags 0x8 = 24
  sys_perf_event_open: pid -1  cpu 19  group_fd -1  flags 0x8 = 25
  sys_perf_event_open: pid -1  cpu 20  group_fd -1  flags 0x8 = 26
  sys_perf_event_open: pid -1  cpu 21  group_fd -1  flags 0x8 = 27
  sys_perf_event_open: pid -1  cpu 22  group_fd -1  flags 0x8 = 28
  sys_perf_event_open: pid -1  cpu 23  group_fd -1  flags 0x8 = 29
  

We have to create evlist-hybrid.c, otherwise 'perf test python'
would fail due to the symbol dependency.

Signed-off-by: Jin Yao 
---
v4:
 - Use PERF_TYPE_HARDWARE (v3 uses PERF_TYPE_HARDWARE_PMU).

v3:
 - Move the major code to new created evlist-hybrid.c.

 tools/perf/builtin-record.c | 19 +++
 tools/perf/util/Build   |  1 +
 tools/perf/util/evlist-hybrid.c | 41 +
 tools/perf/util/evlist-hybrid.h | 12 ++
 tools/perf/util/evlist.c|  5 +++-
 tools/perf/util/evsel.c |  6 ++---
 tools/perf/util/evsel.h |  2 +-
 7 files changed, 77 insertions(+), 9 deletions(-)
 create mode 100644 tools/perf/util/evlist-hybrid.c
 create mode 100644 tools/perf/util/evlist-hybrid.h

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 5fb9665a2ec2..6af46c6a4fd8 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -47,6 +47,8 @@
 #include "util/util.h"
 #include "util/pfm.h"
 #include "util/clockid.h"
+#include "util/pmu-hybrid.h"
+#include "util/evlist-hybrid.h"
 #include "asm/bug.h"
 #include "perf.h"
 
@@ -2790,10 +2792,19 @@ int cmd_record(int argc, const char **argv)
if (record.opts.overwrite)
record.opts.tail_synthesize = true;
 
-   if (rec->evlist->core.nr_entries == 0 &&
-   __evlist__add_default(rec->evlist, !record.opts.no_samples) < 0) {
-   pr_err("Not enough memory for event selector list\n");
-   goto out;
+   if (rec->evlist->core.nr_entries == 0) {
+   

[PATCH v4 11/25] perf parse-events: Compare with hybrid pmu name

2021-04-16 Thread Jin Yao
On a hybrid platform, the user may want to enable an event on only one
pmu. The following syntax will be supported:

cpu_core//
cpu_atom//

For hardware events, hardware cache events and raw events, two events
are created by default. We pass the specified pmu name in parse_state
and check it before event creation, so only the event for the specified
pmu is created.

Signed-off-by: Jin Yao 
---
v4:
 - New in v4.

 tools/perf/util/parse-events-hybrid.c | 21 -
 tools/perf/util/parse-events-hybrid.h |  3 ++-
 tools/perf/util/parse-events.c|  5 +++--
 tools/perf/util/parse-events.h|  4 +++-
 tools/perf/util/parse-events.y|  9 ++---
 5 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/parse-events-hybrid.c 
b/tools/perf/util/parse-events-hybrid.c
index e27b747080e1..10160ab126f9 100644
--- a/tools/perf/util/parse-events-hybrid.c
+++ b/tools/perf/util/parse-events-hybrid.c
@@ -59,6 +59,15 @@ static int create_event_hybrid(__u32 config_type, int *idx,
return 0;
 }
 
+static int pmu_cmp(struct parse_events_state *parse_state,
+  struct perf_pmu *pmu)
+{
+   if (!parse_state->hybrid_pmu_name)
+   return 0;
+
+   return strcmp(parse_state->hybrid_pmu_name, pmu->name);
+}
+
 static int add_hw_hybrid(struct parse_events_state *parse_state,
 struct list_head *list, struct perf_event_attr *attr,
 char *name, struct list_head *config_terms)
@@ -67,6 +76,9 @@ static int add_hw_hybrid(struct parse_events_state 
*parse_state,
int ret;
 
perf_pmu__for_each_hybrid_pmu(pmu) {
+   if (pmu_cmp(parse_state, pmu))
+   continue;
+
ret = create_event_hybrid(PERF_TYPE_HARDWARE,
  &parse_state->idx, list, attr, name,
  config_terms, pmu);
@@ -103,6 +115,9 @@ static int add_raw_hybrid(struct parse_events_state 
*parse_state,
int ret;
 
perf_pmu__for_each_hybrid_pmu(pmu) {
+   if (pmu_cmp(parse_state, pmu))
+   continue;
+
ret = create_raw_event_hybrid(&parse_state->idx, list, attr,
  name, config_terms, pmu);
if (ret)
@@ -138,7 +153,8 @@ int parse_events__add_numeric_hybrid(struct 
parse_events_state *parse_state,
 int parse_events__add_cache_hybrid(struct list_head *list, int *idx,
   struct perf_event_attr *attr, char *name,
   struct list_head *config_terms,
-  bool *hybrid)
+  bool *hybrid,
+  struct parse_events_state *parse_state)
 {
struct perf_pmu *pmu;
int ret;
@@ -149,6 +165,9 @@ int parse_events__add_cache_hybrid(struct list_head *list, 
int *idx,
 
*hybrid = true;
perf_pmu__for_each_hybrid_pmu(pmu) {
+   if (pmu_cmp(parse_state, pmu))
+   continue;
+
ret = create_event_hybrid(PERF_TYPE_HW_CACHE, idx, list,
  attr, name, config_terms, pmu);
if (ret)
diff --git a/tools/perf/util/parse-events-hybrid.h 
b/tools/perf/util/parse-events-hybrid.h
index 9ad33cd0cef4..f33bd67aa851 100644
--- a/tools/perf/util/parse-events-hybrid.h
+++ b/tools/perf/util/parse-events-hybrid.h
@@ -17,6 +17,7 @@ int parse_events__add_numeric_hybrid(struct 
parse_events_state *parse_state,
 int parse_events__add_cache_hybrid(struct list_head *list, int *idx,
   struct perf_event_attr *attr, char *name,
   struct list_head *config_terms,
-  bool *hybrid);
+  bool *hybrid,
+  struct parse_events_state *parse_state);
 
 #endif /* __PERF_PARSE_EVENTS_HYBRID_H */
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 9b2588df22a4..f69475a158bb 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -453,7 +453,8 @@ static int config_attr(struct perf_event_attr *attr,
 int parse_events_add_cache(struct list_head *list, int *idx,
   char *type, char *op_result1, char *op_result2,
   struct parse_events_error *err,
-  struct list_head *head_config)
+  struct list_head *head_config,
+  struct parse_events_state *parse_state)
 {
struct perf_event_attr attr;
LIST_HEAD(config_terms);
@@ -524,7 +525,7 @@ int parse_events_add_cache(struct list_head *list, int *idx,
 
ret = parse_events__add_cache_hybrid(list, idx, &attr,
 config_name ? 

[PATCH v4 12/25] perf parse-events: Support event inside hybrid pmu

2021-04-16 Thread Jin Yao
On a hybrid platform, the user may want to enable events on one pmu.

The following syntax is supported:

cpu_core//
cpu_atom//

But the syntax doesn't work for cache events.

Before:

  # perf stat -e cpu_core/LLC-loads/ -a -- sleep 1
  event syntax error: 'cpu_core/LLC-loads/'
\___ unknown term 'LLC-loads' for pmu 'cpu_core'

Cache events are a bit complex. We can't create aliases for them.
We use another solution. For example, if we use "cpu_core/LLC-loads/",
in parse_events_add_pmu(), term->config is "LLC-loads".

Then we create a new parser to scan "LLC-loads". The
parse_events_add_cache() would be called during parsing.
The parse_state->hybrid_pmu_name is used to identify the pmu
on which the event should be enabled.

After:

  # perf stat -e cpu_core/LLC-loads/ -a -- sleep 1

   Performance counter stats for 'system wide':

  24,593  cpu_core/LLC-loads/

 1.003911601 seconds time elapsed

If the user sets the config name, we will not uniquify the hybrid
event name.

  # perf stat -e cpu_core/r3c/ -a -- sleep 1

   Performance counter stats for 'system wide':

   5,072,048  cpu_core/r3c/

 1.001989415 seconds time elapsed

  # perf stat -e cpu_core/r3c,name=EVENT/ -a -- sleep 1

   Performance counter stats for 'system wide':

   6,819,847  EVENT

 1.001795630 seconds time elapsed

Signed-off-by: Jin Yao 
---
v4:
 - New in v4.

 tools/perf/util/parse-events.c | 55 ++
 1 file changed, 55 insertions(+)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index f69475a158bb..bd3fd722b4ac 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -38,6 +38,7 @@
 #include "util/event.h"
 #include "util/pfm.h"
 #include "util/parse-events-hybrid.h"
+#include "util/pmu-hybrid.h"
 #include "perf.h"
 
 #define MAX_NAME_LEN 100
@@ -48,6 +49,9 @@ extern int parse_events_debug;
 int parse_events_parse(void *parse_state, void *scanner);
 static int get_config_terms(struct list_head *head_config,
struct list_head *head_terms __maybe_unused);
+static int parse_events__with_hybrid_pmu(struct parse_events_state 
*parse_state,
+const char *str, char *pmu_name,
+struct list_head *list, bool *parsed);
 
 static struct perf_pmu_event_symbol *perf_pmu_events_list;
 /*
@@ -1567,6 +1571,27 @@ int parse_events_add_pmu(struct parse_events_state 
*parse_state,
if (pmu->default_config && get_config_chgs(pmu, head_config, 
_terms))
return -ENOMEM;
 
+   if (!parse_state->fake_pmu && head_config &&
+   perf_pmu__is_hybrid(name)) {
+   struct parse_events_term *term;
+   bool parsed;
+   int ret;
+
+   term = list_first_entry(head_config, struct parse_events_term,
+   list);
+   if (term && term->config && strcmp(term->config, "event")) {
+   ret = parse_events__with_hybrid_pmu(parse_state,
+   term->config, name,
+   list, );
+   /*
+* If the string inside the pmu can't be parsed,
+* don't return, try next steps.
+*/
+   if (parsed)
+   return ret;
+   }
+   }
+
if (!parse_state->fake_pmu && perf_pmu__config(pmu, , head_config, 
parse_state->error)) {
struct evsel_config_term *pos, *tmp;
 
@@ -1585,6 +1610,9 @@ int parse_events_add_pmu(struct parse_events_state 
*parse_state,
if (!evsel)
return -ENOMEM;
 
+   if (evsel->name)
+   evsel->use_config_name = true;
+
evsel->pmu_name = name ? strdup(name) : NULL;
evsel->use_uncore_alias = use_uncore_alias;
evsel->percore = config_term_percore(>config_terms);
@@ -2180,6 +2208,33 @@ int parse_events_terms(struct list_head *terms, const 
char *str)
return ret;
 }
 
+static int parse_events__with_hybrid_pmu(struct parse_events_state 
*parse_state,
+const char *str, char *pmu_name,
+struct list_head *list, bool *parsed)
+{
+   struct parse_events_state ps = {
+   .list= LIST_HEAD_INIT(ps.list),
+   .stoken  = PE_START_EVENTS,
+   .hybrid_pmu_name = pmu_name,
+   .idx = parse_state->idx,
+   };
+   int ret;
+
+   *parsed = false;
+   ret = parse_events_

[PATCH v4 10/25] perf parse-events: Create two hybrid raw events

2021-04-16 Thread Jin Yao
On a hybrid platform, the same raw event may be available on both the
cpu_core pmu and the cpu_atom pmu, so creating two raw events for one
event encoding is supported. For raw events, attr.type is the PMU type.
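
For illustration only (not part of this patch), opening such a raw event
on one hybrid pmu boils down to reading the pmu type from sysfs, putting
it into attr.type and the raw encoding into attr.config. A hedged
standalone sketch, with a local sys_perf_event_open() wrapper:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

static long sys_perf_event_open(struct perf_event_attr *attr, pid_t pid,
				int cpu, int group_fd, unsigned long flags)
{
	return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int main(void)
{
	struct perf_event_attr attr;
	FILE *f = fopen("/sys/devices/cpu_core/type", "r");
	int type;
	long fd;

	if (!f || fscanf(f, "%d", &type) != 1)
		return 1;
	fclose(f);

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = type;	/* pmu type id from sysfs, e.g. 4 for cpu_core */
	attr.config = 0x3c;	/* raw encoding taken from 'r3c' */

	/* system-wide counting on cpu 0; needs sufficient privileges */
	fd = sys_perf_event_open(&attr, -1, 0, -1, 0);
	if (fd < 0) {
		perror("perf_event_open");
		return 1;
	}
	close(fd);
	return 0;
}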

  # perf stat -e r3c -a -vv -- sleep 1
  Control descriptor is not initialized
  
  perf_event_attr:
type 4
size 120
config   0x3c
sample_type  IDENTIFIER
read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit  1
exclude_guest1
  
  sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 3
  
  ...
  
  perf_event_attr:
type 4
size 120
config   0x3c
sample_type  IDENTIFIER
read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit  1
exclude_guest1
  
  sys_perf_event_open: pid -1  cpu 15  group_fd -1  flags 0x8 = 19
  
  perf_event_attr:
type 8
size 120
config   0x3c
sample_type  IDENTIFIER
read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit  1
exclude_guest1
  
  sys_perf_event_open: pid -1  cpu 16  group_fd -1  flags 0x8 = 20
  
  ...
  
  perf_event_attr:
type 8
size 120
config   0x3c
sample_type  IDENTIFIER
read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit  1
exclude_guest1
  
  sys_perf_event_open: pid -1  cpu 23  group_fd -1  flags 0x8 = 27
  r3c: 0: 434449 1001412521 1001412521
  r3c: 1: 173162 1001482031 1001482031
  r3c: 2: 231710 1001524974 1001524974
  r3c: 3: 110012 1001563523 1001563523
  r3c: 4: 191517 1001593221 1001593221
  r3c: 5: 956458 1001628147 1001628147
  r3c: 6: 416969 1001715626 1001715626
  r3c: 7: 1047527 1001596650 1001596650
  r3c: 8: 103877 1001633520 1001633520
  r3c: 9: 70571 1001637898 1001637898
  r3c: 10: 550284 1001714398 1001714398
  r3c: 11: 1257274 1001738349 1001738349
  r3c: 12: 107797 1001801432 1001801432
  r3c: 13: 67471 1001836281 1001836281
  r3c: 14: 286782 1001923161 1001923161
  r3c: 15: 815509 1001952550 1001952550
  r3c: 0: 95994 1002071117 1002071117
  r3c: 1: 105570 1002142438 1002142438
  r3c: 2: 115921 1002189147 1002189147
  r3c: 3: 72747 1002238133 1002238133
  r3c: 4: 103519 1002276753 1002276753
  r3c: 5: 121382 1002315131 1002315131
  r3c: 6: 80298 1002248050 1002248050
  r3c: 7: 466790 1002278221 1002278221
  r3c: 6821369 16026754282 16026754282
  r3c: 1162221 8017758990 8017758990

   Performance counter stats for 'system wide':

   6,821,369  cpu_core/r3c/
   1,162,221  cpu_atom/r3c/

 1.002289965 seconds time elapsed

Signed-off-by: Jin Yao 
---
v4:
 - Directly return add_raw_hybrid().

v3:
 - Raw event creation is moved to parse-events-hybrid.c.

 tools/perf/util/parse-events-hybrid.c | 38 ++-
 1 file changed, 37 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/parse-events-hybrid.c 
b/tools/perf/util/parse-events-hybrid.c
index 7a7e065d2b5f..e27b747080e1 100644
--- a/tools/perf/util/parse-events-hybrid.c
+++ b/tools/perf/util/parse-events-hybrid.c
@@ -77,6 +77,41 @@ static int add_hw_hybrid(struct parse_events_state 
*parse_state,
return 0;
 }
 
+static int create_raw_event_hybrid(int *idx, struct list_head *list,
+  struct perf_event_attr *attr, char *name,
+  struct list_head *config_terms,
+  struct perf_pmu *pmu)
+{
+   struct evsel *evsel;
+
+   attr->type = pmu->type;
+   evsel = parse_events__add_event_hybrid(list, idx, attr, name,
+  pmu, config_terms);
+   if 

[PATCH v4 08/25] perf parse-events: Create two hybrid hardware events

2021-04-16 Thread Jin Yao
Current hardware events have the special perf type PERF_TYPE_HARDWARE,
but it doesn't carry the PMU type in the user interface. On a hybrid
system, the kernel therefore doesn't know which PMU the events belong to.

So this type is now extended to be a PMU aware type. The PMU type ID is
stored at attr.config[63:32].

PMU type ID is retrieved from sysfs.

  root@lkp-adl-d01:/sys/devices/cpu_atom# cat type
  8

  root@lkp-adl-d01:/sys/devices/cpu_core# cat type
  4
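
As a hedged illustration of that encoding (a standalone sketch, not the
perf tool's parse-events code), the extended config for a hybrid
'cycles' event can be composed from the sysfs type value like this:

#include <stdio.h>
#include <stdint.h>
#include <linux/perf_event.h>

/* Put the pmu type id read from sysfs into config[63:32]; the generic
 * hardware event id stays in the low bits. */
static uint64_t hybrid_hw_config(uint32_t pmu_type, uint64_t hw_event_id)
{
	return ((uint64_t)pmu_type << 32) | hw_event_id;
}

int main(void)
{
	unsigned int type;
	FILE *f = fopen("/sys/devices/cpu_core/type", "r");

	if (!f || fscanf(f, "%u", &type) != 1)
		return 1;
	fclose(f);

	/* e.g. pmu type 4 gives config 0x400000000 for 'cycles' */
	printf("attr.type   = %d (PERF_TYPE_HARDWARE)\n", PERF_TYPE_HARDWARE);
	printf("attr.config = 0x%llx\n", (unsigned long long)
	       hybrid_hw_config(type, PERF_COUNT_HW_CPU_CYCLES));
	return 0;
}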

When enabling a hybrid hardware event without a specified pmu, such as
'perf stat -e cycles -a', two events are created automatically: one
for atom, the other for core.

  # perf stat -e cycles -a -vv -- sleep 1
  Control descriptor is not initialized
  
  perf_event_attr:
size 120
config   0x4
sample_type  IDENTIFIER
read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit  1
exclude_guest1
  
  sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 3
  
  ...
  
  perf_event_attr:
size 120
config   0x4
sample_type  IDENTIFIER
read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit  1
exclude_guest1
  
  sys_perf_event_open: pid -1  cpu 15  group_fd -1  flags 0x8 = 19
  
  perf_event_attr:
size 120
config   0x8
sample_type  IDENTIFIER
read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit  1
exclude_guest1
  
  sys_perf_event_open: pid -1  cpu 16  group_fd -1  flags 0x8 = 20
  
  ...
  
  perf_event_attr:
size 120
config   0x8
sample_type  IDENTIFIER
read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit  1
exclude_guest1
  
  sys_perf_event_open: pid -1  cpu 23  group_fd -1  flags 0x8 = 27
  cycles: 0: 836272 1001525722 1001525722
  cycles: 1: 628564 1001580453 1001580453
  cycles: 2: 872693 1001605997 1001605997
  cycles: 3: 70417 1001641369 1001641369
  cycles: 4: 88593 1001726722 1001726722
  cycles: 5: 470495 1001752993 1001752993
  cycles: 6: 484733 1001840440 1001840440
  cycles: 7: 1272477 1001593105 1001593105
  cycles: 8: 209185 1001608616 1001608616
  cycles: 9: 204391 1001633962 1001633962
  cycles: 10: 264121 1001661745 1001661745
  cycles: 11: 826104 1001689904 1001689904
  cycles: 12: 89935 1001728861 1001728861
  cycles: 13: 70639 1001756757 1001756757
  cycles: 14: 185266 1001784810 1001784810
  cycles: 15: 171094 1001825466 1001825466
  cycles: 0: 129624 1001854843 1001854843
  cycles: 1: 122533 1001840421 1001840421
  cycles: 2: 90055 1001882506 1001882506
  cycles: 3: 139607 1001896463 1001896463
  cycles: 4: 141791 1001907838 1001907838
  cycles: 5: 530927 1001883880 1001883880
  cycles: 6: 143246 1001852529 1001852529
  cycles: 7: 667769 1001872626 1001872626
  cycles: 6744979 16026956922 16026956922
  cycles: 1965552 8014991106 8014991106

   Performance counter stats for 'system wide':

   6,744,979  cpu_core/cycles/
   1,965,552  cpu_atom/cycles/

 1.001882711 seconds time elapsed

The 0x4 in attr.config[63:32] indicates the cpu_core pmu.
The 0x8 in attr.config[63:32] indicates the cpu_atom pmu.

Signed-off-by: Jin Yao 
---
v4:
 - Use PERF_TYPE_HARDWARE (v3 uses PERF_TYPE_HARDWARE_PMU)

v3:
 - Create new parse-events-hybrid.c/parse-events-hybrid.h
 - Refine the code

 tools/perf/util/Build |   1 +
 tools/perf/util/parse-events-hybrid.c | 100 ++
 tools/perf/util/parse-events-hybrid.h |  17 +
 tools/perf/util/parse-events.c|  18 +
 tools/perf/util/parse-events.h|   5 ++
 5 files changed, 141 insertions(+)
 create mode 100644 tools/perf/util/parse-events-hybrid.c
 create mode 100644 tools/perf/util/parse-events-hybrid.h

diff --git

[PATCH v4 06/25] perf pmu: Add hybrid helper functions

2021-04-16 Thread Jin Yao
The functions perf_pmu__is_hybrid and perf_pmu__find_hybrid_pmu can be
used to identify a hybrid platform and return the found hybrid cpu pmu.
All the detected hybrid pmus are saved in the 'perf_pmu__hybrid_pmus'
list, so we just need to search this list.

perf_pmu__hybrid_type_to_pmu converts a user specified string to a
hybrid pmu name. This is used to support the '--cputype' option in
later patches.

perf_pmu__has_hybrid checks for the existence of a hybrid pmu. Note that
we have to define it in pmu.c (keeping pmu-hybrid.c free of extra symbol
dependencies), otherwise 'perf test python' would fail.

Signed-off-by: Jin Yao 
---
v4:
 - No change.
  
v3:
 - Move perf_pmu__has_hybrid from pmu-hybrid.c to pmu.c. We have to
   add pmu-hybrid.c to python-ext-sources to solve symbol dependency
   issue found in perf test python. For perf_pmu__has_hybrid, it calls
   perf_pmu__scan, which is defined in pmu.c. It's very hard to add
   pmu.c to python-ext-sources, too much symbol dependency here.

 tools/perf/util/pmu-hybrid.c | 40 
 tools/perf/util/pmu-hybrid.h |  4 
 tools/perf/util/pmu.c| 11 ++
 tools/perf/util/pmu.h|  2 ++
 4 files changed, 57 insertions(+)

diff --git a/tools/perf/util/pmu-hybrid.c b/tools/perf/util/pmu-hybrid.c
index 8ed0e6e1776d..f51ccaac60ee 100644
--- a/tools/perf/util/pmu-hybrid.c
+++ b/tools/perf/util/pmu-hybrid.c
@@ -47,3 +47,43 @@ bool perf_pmu__hybrid_mounted(const char *name)
 
return true;
 }
+
+struct perf_pmu *perf_pmu__find_hybrid_pmu(const char *name)
+{
+   struct perf_pmu *pmu;
+
+   if (!name)
+   return NULL;
+
+   perf_pmu__for_each_hybrid_pmu(pmu) {
+   if (!strcmp(name, pmu->name))
+   return pmu;
+   }
+
+   return NULL;
+}
+
+bool perf_pmu__is_hybrid(const char *name)
+{
+   return perf_pmu__find_hybrid_pmu(name) != NULL;
+}
+
+char *perf_pmu__hybrid_type_to_pmu(const char *type)
+{
+   char *pmu_name = NULL;
+
+   if (asprintf(&pmu_name, "cpu_%s", type) < 0)
+   return NULL;
+
+   if (perf_pmu__is_hybrid(pmu_name))
+   return pmu_name;
+
+   /*
+* pmu may be not scanned, check the sysfs.
+*/
+   if (perf_pmu__hybrid_mounted(pmu_name))
+   return pmu_name;
+
+   free(pmu_name);
+   return NULL;
+}
diff --git a/tools/perf/util/pmu-hybrid.h b/tools/perf/util/pmu-hybrid.h
index 35bed3714438..d0fa7bc50a76 100644
--- a/tools/perf/util/pmu-hybrid.h
+++ b/tools/perf/util/pmu-hybrid.h
@@ -15,4 +15,8 @@ extern struct list_head perf_pmu__hybrid_pmus;
 
 bool perf_pmu__hybrid_mounted(const char *name);
 
+struct perf_pmu *perf_pmu__find_hybrid_pmu(const char *name);
+bool perf_pmu__is_hybrid(const char *name);
+char *perf_pmu__hybrid_type_to_pmu(const char *type);
+
 #endif /* __PMU_HYBRID_H */
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 6e49c7b8ad71..88c8ecdc60b0 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -40,6 +40,7 @@ int perf_pmu_parse(struct list_head *list, char *name);
 extern FILE *perf_pmu_in;
 
 static LIST_HEAD(pmus);
+static bool hybrid_scanned;
 
 /*
  * Parse & process all the sysfs attributes located under
@@ -1861,3 +1862,13 @@ void perf_pmu__warn_invalid_config(struct perf_pmu *pmu, 
__u64 config,
   "'%llx' not supported by kernel)!\n",
   name ?: "N/A", buf, config);
 }
+
+bool perf_pmu__has_hybrid(void)
+{
+   if (!hybrid_scanned) {
+   hybrid_scanned = true;
+   perf_pmu__scan(NULL);
+   }
+
+   return !list_empty(&perf_pmu__hybrid_pmus);
+}
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 9a2f89eeab6f..a790ef758171 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -132,4 +132,6 @@ int perf_pmu__caps_parse(struct perf_pmu *pmu);
 void perf_pmu__warn_invalid_config(struct perf_pmu *pmu, __u64 config,
   char *name);
 
+bool perf_pmu__has_hybrid(void);
+
 #endif /* __PMU_H */
-- 
2.17.1



[PATCH v4 07/25] perf stat: Uniquify hybrid event name

2021-04-16 Thread Jin Yao
It would be useful to tell the user which pmu an event belongs to.
perf-stat already supports the '--no-merge' option, which prints the
pmu name after the event name, such as:

"cycles [cpu_core]"

Now this behavior is enabled by default on a hybrid platform, but the
format is changed to:

"cpu_core/cycles/"

If the user configures a name explicitly, we still use the user
specified name.

Signed-off-by: Jin Yao 
---
v4:
 - If user configs the name, we still use the user specified name.

v3:
 - No change.

 tools/perf/builtin-stat.c  |  4 
 tools/perf/util/evsel.h|  1 +
 tools/perf/util/stat-display.c | 15 +--
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 2a2c15cac80a..1255af4751c2 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -68,6 +68,7 @@
 #include "util/affinity.h"
 #include "util/pfm.h"
 #include "util/bpf_counter.h"
+#include "util/pmu-hybrid.h"
 #include "asm/bug.h"
 
 #include 
@@ -2378,6 +2379,9 @@ int cmd_stat(int argc, const char **argv)
 
evlist__check_cpu_maps(evsel_list);
 
+   if (perf_pmu__has_hybrid())
+   stat_config.no_merge = true;
+
/*
 * Initialize thread_map with comm names,
 * so we could print it out on output.
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index eccc4fd5b3eb..d518da2fd2eb 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -115,6 +115,7 @@ struct evsel {
boolmerged_stat;
boolreset_group;
boolerrored;
+   booluse_config_name;
struct hashmap  *per_pkg_mask;
struct evsel*leader;
struct list_headconfig_terms;
diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index d3137bc17065..5255d78b1c30 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -17,6 +17,7 @@
 #include "cgroup.h"
 #include 
 #include "util.h"
+#include "pmu-hybrid.h"
 
 #define CNTR_NOT_SUPPORTED ""
 #define CNTR_NOT_COUNTED   ""
@@ -532,6 +533,7 @@ static void uniquify_event_name(struct evsel *counter)
 {
char *new_name;
char *config;
+   int ret = 0;
 
if (counter->uniquified_name ||
!counter->pmu_name || !strncmp(counter->name, counter->pmu_name,
@@ -546,8 +548,17 @@ static void uniquify_event_name(struct evsel *counter)
counter->name = new_name;
}
} else {
-   if (asprintf(&new_name,
-"%s [%s]", counter->name, counter->pmu_name) > 0) {
+   if (perf_pmu__has_hybrid()) {
+   if (!counter->use_config_name) {
+   ret = asprintf(&new_name, "%s/%s/",
+  counter->pmu_name, 
counter->name);
+   }
+   } else {
+   ret = asprintf(&new_name, "%s [%s]",
+  counter->name, counter->pmu_name);
+   }
+
+   if (ret) {
free(counter->name);
counter->name = new_name;
}
-- 
2.17.1



[PATCH v4 09/25] perf parse-events: Create two hybrid cache events

2021-04-16 Thread Jin Yao
Cache events have pre-defined configs. The kernel needs to know
which pmu a cache event comes from (e.g. from the cpu_core pmu or
from the cpu_atom pmu), but the perf type PERF_TYPE_HW_CACHE
can't carry pmu information.

Now the type PERF_TYPE_HW_CACHE is extended to be a PMU aware type.
The PMU type ID is stored at attr.config[63:32].

When enabling a hybrid cache event without a specified pmu, such as
'perf stat -e LLC-loads -a', two events are created automatically:
one for atom, the other for core.
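
For illustration only (not part of this patch), composing such a config by
hand would look like the sketch below, reusing PERF_PMU_TYPE_SHIFT from the
uapi header update in patch 01:

/* Illustrative sketch: build a pmu-aware PERF_TYPE_HW_CACHE config.
 * cache/op/result are the usual PERF_COUNT_HW_CACHE_* IDs; pmu_type is
 * read from /sys/devices/cpu_core/type or /sys/devices/cpu_atom/type. */
static __u64 hybrid_cache_config(__u64 cache, __u64 op, __u64 result,
                                 __u32 pmu_type)
{
        __u64 config = cache | (op << 8) | (result << 16);

        return config | ((__u64)pmu_type << PERF_PMU_TYPE_SHIFT);
}

/* e.g. LLC read accesses on cpu_core (type 4) -> 0x400000002 */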

  # perf stat -e LLC-loads -a -vv -- sleep 1
  Control descriptor is not initialized
  
  perf_event_attr:
type 3
size 120
config   0x400000002
sample_type  IDENTIFIER
read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit  1
exclude_guest 1
  
  sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 3
  
  ...
  
  perf_event_attr:
type 3
size 120
config   0x400000002
sample_type  IDENTIFIER
read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit  1
exclude_guest 1
  
  sys_perf_event_open: pid -1  cpu 15  group_fd -1  flags 0x8 = 19
  
  perf_event_attr:
type 3
size 120
config   0x800000002
sample_type  IDENTIFIER
read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit  1
exclude_guest 1
  
  sys_perf_event_open: pid -1  cpu 16  group_fd -1  flags 0x8 = 20
  
  ...
  
  perf_event_attr:
type 3
size 120
config   0x800000002
sample_type  IDENTIFIER
read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit  1
exclude_guest 1
  
  sys_perf_event_open: pid -1  cpu 23  group_fd -1  flags 0x8 = 27
  LLC-loads: 0: 1507 1001800280 1001800280
  LLC-loads: 1: 666 1001812250 1001812250
  LLC-loads: 2: 3353 1001813453 1001813453
  LLC-loads: 3: 514 1001848795 1001848795
  LLC-loads: 4: 627 1001952832 1001952832
  LLC-loads: 5: 4399 1001451154 1001451154
  LLC-loads: 6: 1240 1001481052 1001481052
  LLC-loads: 7: 478 1001520348 1001520348
  LLC-loads: 8: 691 1001551236 1001551236
  LLC-loads: 9: 310 1001578945 1001578945
  LLC-loads: 10: 1018 1001594354 1001594354
  LLC-loads: 11: 3656 1001622355 1001622355
  LLC-loads: 12: 882 1001661416 1001661416
  LLC-loads: 13: 506 1001693963 1001693963
  LLC-loads: 14: 3547 1001721013 1001721013
  LLC-loads: 15: 1399 1001734818 1001734818
  LLC-loads: 0: 1314 1001793826 1001793826
  LLC-loads: 1: 2857 1001752764 1001752764
  LLC-loads: 2: 646 1001830694 1001830694
  LLC-loads: 3: 1612 1001864861 1001864861
  LLC-loads: 4: 2244 1001912381 1001912381
  LLC-loads: 5: 1255 1001943889 1001943889
  LLC-loads: 6: 4624 1002021109 1002021109
  LLC-loads: 7: 2703 1001959302 1001959302
  LLC-loads: 24793 16026838264 16026838264
  LLC-loads: 17255 8015078826 8015078826

   Performance counter stats for 'system wide':

  24,793  cpu_core/LLC-loads/
  17,255  cpu_atom/LLC-loads/

 1.001970988 seconds time elapsed

0x4 in 0x400000002 indicates the cpu_core pmu.
0x8 in 0x800000002 indicates the cpu_atom pmu.

Signed-off-by: Jin Yao 
---
v4:
 - Use PERF_TYPE_HW_CACHE (v3 uses PERF_TYPE_HW_CACHE_PMU)
 - Define 'ret' variable for return value.

v3:
 - Raw event creation is moved to parse-events-hybrid.c.

 tools/perf/util/parse-events-hybrid.c | 23 +++
 tools/perf/util/parse-events-hybrid.h |  5 +
 tools/perf/util/parse-events.c| 10 +-
 3 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/parse-events-hybrid.c 
b/tools/perf/util/parse-events-hybrid.c
index 8fd7f19a9865..7a7e065d2b5f

[PATCH v4 05/25] perf pmu: Save detected hybrid pmus to a global pmu list

2021-04-16 Thread Jin Yao
We identify the cpu_core pmu and cpu_atom pmu by explicitly
checking the following files:

For cpu_core, check:
"/sys/bus/event_source/devices/cpu_core/cpus"

For cpu_atom, check:
"/sys/bus/event_source/devices/cpu_atom/cpus"

If the 'cpus' file exists and it has data, the pmu exists.

But in order not to hardcode "cpu_core" and "cpu_atom", and to keep
the code generic, we treat any existing path
"/sys/bus/event_source/devices/cpu_xxx/cpus" as a hybrid pmu. All the
detected hybrid pmus are linked to a global list
'perf_pmu__hybrid_pmus'; afterwards we just iterate this list with
perf_pmu__for_each_hybrid_pmu to get all hybrid pmus.
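
As a small illustration (not part of this patch), counting the detected
hybrid pmus is then just a walk over that list:

/* Illustration only: count the hybrid pmus found by pmu_lookup(). */
static int nr_hybrid_pmus(void)
{
        struct perf_pmu *pmu;
        int n = 0;

        perf_pmu__for_each_hybrid_pmu(pmu)
                n++;

        return n;
}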

Signed-off-by: Jin Yao 
---
v4:
 - Check if 'cpus' file is empty. If so, don't create pmu. 

v3:
 - No functional change.

 tools/perf/util/Build|  1 +
 tools/perf/util/pmu-hybrid.c | 49 
 tools/perf/util/pmu-hybrid.h | 18 +
 tools/perf/util/pmu.c|  9 ++-
 tools/perf/util/pmu.h|  4 +++
 5 files changed, 80 insertions(+), 1 deletion(-)
 create mode 100644 tools/perf/util/pmu-hybrid.c
 create mode 100644 tools/perf/util/pmu-hybrid.h

diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index e3e12f9d4733..37a8a63c7195 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -69,6 +69,7 @@ perf-y += parse-events-bison.o
 perf-y += pmu.o
 perf-y += pmu-flex.o
 perf-y += pmu-bison.o
+perf-y += pmu-hybrid.o
 perf-y += trace-event-read.o
 perf-y += trace-event-info.o
 perf-y += trace-event-scripting.o
diff --git a/tools/perf/util/pmu-hybrid.c b/tools/perf/util/pmu-hybrid.c
new file mode 100644
index ..8ed0e6e1776d
--- /dev/null
+++ b/tools/perf/util/pmu-hybrid.c
@@ -0,0 +1,49 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "fncache.h"
+#include "pmu-hybrid.h"
+
+LIST_HEAD(perf_pmu__hybrid_pmus);
+
+bool perf_pmu__hybrid_mounted(const char *name)
+{
+   char path[PATH_MAX];
+   const char *sysfs;
+   FILE *file;
+   int n, cpu;
+
+   if (strncmp(name, "cpu_", 4))
+   return false;
+
+   sysfs = sysfs__mountpoint();
+   if (!sysfs)
+   return false;
+
+   snprintf(path, PATH_MAX, CPUS_TEMPLATE_CPU, sysfs, name);
+   if (!file_available(path))
+   return false;
+
+   file = fopen(path, "r");
+   if (!file)
+   return false;
+
+   n = fscanf(file, "%u", &cpu);
+   fclose(file);
+   if (n <= 0)
+   return false;
+
+   return true;
+}
diff --git a/tools/perf/util/pmu-hybrid.h b/tools/perf/util/pmu-hybrid.h
new file mode 100644
index ..35bed3714438
--- /dev/null
+++ b/tools/perf/util/pmu-hybrid.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __PMU_HYBRID_H
+#define __PMU_HYBRID_H
+
+#include 
+#include 
+#include 
+#include 
+#include "pmu.h"
+
+extern struct list_head perf_pmu__hybrid_pmus;
+
+#define perf_pmu__for_each_hybrid_pmu(pmu) \
+   list_for_each_entry(pmu, &perf_pmu__hybrid_pmus, hybrid_list)
+
+bool perf_pmu__hybrid_mounted(const char *name);
+
+#endif /* __PMU_HYBRID_H */
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 44225838eb03..6e49c7b8ad71 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -25,6 +25,7 @@
 #include "string2.h"
 #include "strbuf.h"
 #include "fncache.h"
+#include "pmu-hybrid.h"
 
 struct perf_pmu perf_pmu__fake;
 
@@ -613,7 +614,6 @@ static struct perf_cpu_map *__pmu_cpumask(const char *path)
  */
 #define SYS_TEMPLATE_ID"./bus/event_source/devices/%s/identifier"
 #define CPUS_TEMPLATE_UNCORE   "%s/bus/event_source/devices/%s/cpumask"
-#define CPUS_TEMPLATE_CPU  "%s/bus/event_source/devices/%s/cpus"
 
 static struct perf_cpu_map *pmu_cpumask(const char *name)
 {
@@ -645,6 +645,9 @@ static bool pmu_is_uncore(const char *name)
char path[PATH_MAX];
const char *sysfs;
 
+   if (perf_pmu__hybrid_mounted(name))
+   return false;
+
sysfs = sysfs__mountpoint();
snprintf(path, PATH_MAX, CPUS_TEMPLATE_UNCORE, sysfs, name);
return file_available(path);
@@ -951,6 +954,7 @@ static struct perf_pmu *pmu_lookup(const char *name)
pmu->is_uncore = pmu_is_uncore(name);
if (pmu->is_uncore)
pmu->id = pmu_id(name);
+   pmu->is_hybrid = perf_pmu__hybrid_mounted(name);
pmu->max_precise = pmu_max_precise(name);
pmu_add_cpu_aliases(&aliases, pmu);
pmu_add_sys_aliases(&aliases, pmu);
@@ -962,6 +966,9 @@ static struct perf_pmu *pmu_lookup(const char *name)
list_splice(&aliases, &pmu->aliases);
list_add_tail(&pmu->list, &pmus);

[PATCH v4 02/25] perf jevents: Support unit value "cpu_core" and "cpu_atom"

2021-04-16 Thread Jin Yao
Some Intel platforms, such as AlderLake, are hybrid platforms which
consist of atom cpus and core cpus. Each cpu type has a dedicated
event list. Some events are available on core cpus, some events are
available on atom cpus.

The kernel exports new cpu pmus: cpu_core and cpu_atom. The event in
json is added with a new field "Unit" to indicate which pmu the event
is available on.

For example, one event in cache.json,

{
"BriefDescription": "Counts the number of load ops retired that",
"CollectPEBSRecord": "2",
"Counter": "0,1,2,3",
"EventCode": "0xd2",
"EventName": "MEM_LOAD_UOPS_RETIRED_MISC.MMIO",
"PEBScounters": "0,1,2,3",
"SampleAfterValue": "103",
"UMask": "0x80",
"Unit": "cpu_atom"
},

The unit "cpu_atom" indicates this event is only available on "cpu_atom".

In generated pmu-events.c, we can see:

{
.name = "mem_load_uops_retired_misc.mmio",
.event = "period=103,umask=0x80,event=0xd2",
.desc = "Counts the number of load ops retired that. Unit: cpu_atom ",
.topic = "cache",
.pmu = "cpu_atom",
},

But without this patch, the "uncore_" prefix would be added before "cpu_atom",
such as:
.pmu = "uncore_cpu_atom"

That would be a wrong pmu.

Signed-off-by: Jin Yao 
---
v4:
 - No change.

 tools/perf/pmu-events/jevents.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/perf/pmu-events/jevents.c b/tools/perf/pmu-events/jevents.c
index 33aa3c885eaf..ed4f0bd72e5a 100644
--- a/tools/perf/pmu-events/jevents.c
+++ b/tools/perf/pmu-events/jevents.c
@@ -285,6 +285,8 @@ static struct map {
{ "imx8_ddr", "imx8_ddr" },
{ "L3PMC", "amd_l3" },
{ "DFPMC", "amd_df" },
+   { "cpu_core", "cpu_core" },
+   { "cpu_atom", "cpu_atom" },
{}
 };
 
-- 
2.17.1



[PATCH v4 03/25] perf pmu: Simplify arguments of __perf_pmu__new_alias

2021-04-16 Thread Jin Yao
Simplify the arguments of __perf_pmu__new_alias() by passing
the whole 'struct pmu_event' pointer.

Signed-off-by: Jin Yao 
---
v4:
 - No change.

 tools/perf/util/pmu.c | 36 
 1 file changed, 16 insertions(+), 20 deletions(-)

diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 286d5e415bdc..8214def7b0f0 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -306,18 +306,25 @@ static bool perf_pmu_merge_alias(struct perf_pmu_alias 
*newalias,
 }
 
 static int __perf_pmu__new_alias(struct list_head *list, char *dir, char *name,
-char *desc, char *val,
-char *long_desc, char *topic,
-char *unit, char *perpkg,
-char *metric_expr,
-char *metric_name,
-char *deprecated)
+char *desc, char *val, struct pmu_event *pe)
 {
struct parse_events_term *term;
struct perf_pmu_alias *alias;
int ret;
int num;
char newval[256];
+   char *long_desc = NULL, *topic = NULL, *unit = NULL, *perpkg = NULL,
+*metric_expr = NULL, *metric_name = NULL, *deprecated = NULL;
+
+   if (pe) {
+   long_desc = (char *)pe->long_desc;
+   topic = (char *)pe->topic;
+   unit = (char *)pe->unit;
+   perpkg = (char *)pe->perpkg;
+   metric_expr = (char *)pe->metric_expr;
+   metric_name = (char *)pe->metric_name;
+   deprecated = (char *)pe->deprecated;
+   }
 
alias = malloc(sizeof(*alias));
if (!alias)
@@ -406,8 +413,7 @@ static int perf_pmu__new_alias(struct list_head *list, char 
*dir, char *name, FI
/* Remove trailing newline from sysfs file */
strim(buf);
 
-   return __perf_pmu__new_alias(list, dir, name, NULL, buf, NULL, NULL, 
NULL,
-NULL, NULL, NULL, NULL);
+   return __perf_pmu__new_alias(list, dir, name, NULL, buf, NULL);
 }
 
 static inline bool pmu_alias_info_file(char *name)
@@ -798,11 +804,7 @@ void pmu_add_cpu_aliases_map(struct list_head *head, 
struct perf_pmu *pmu,
/* need type casts to override 'const' */
__perf_pmu__new_alias(head, NULL, (char *)pe->name,
(char *)pe->desc, (char *)pe->event,
-   (char *)pe->long_desc, (char *)pe->topic,
-   (char *)pe->unit, (char *)pe->perpkg,
-   (char *)pe->metric_expr,
-   (char *)pe->metric_name,
-   (char *)pe->deprecated);
+   pe);
}
 }
 
@@ -869,13 +871,7 @@ static int pmu_add_sys_aliases_iter_fn(struct pmu_event 
*pe, void *data)
  (char *)pe->name,
  (char *)pe->desc,
  (char *)pe->event,
- (char *)pe->long_desc,
- (char *)pe->topic,
- (char *)pe->unit,
- (char *)pe->perpkg,
- (char *)pe->metric_expr,
- (char *)pe->metric_name,
- (char *)pe->deprecated);
+ pe);
}
 
return 0;
-- 
2.17.1



[PATCH v4 04/25] perf pmu: Save pmu name

2021-04-16 Thread Jin Yao
On a hybrid platform, an event may be available on only one pmu
(for example, on cpu_core or on cpu_atom).

This patch saves the pmu name to the pmu_name field of struct
perf_pmu_alias, so later we can know which pmu the event can be
enabled on.
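
A hypothetical consumer (not part of this patch) could later use the new
field like this when matching an alias against a pmu:

/* Hypothetical sketch: skip aliases that belong to another hybrid pmu. */
static bool alias_matches_pmu(struct perf_pmu_alias *alias,
                              struct perf_pmu *pmu)
{
        /* aliases without a pmu name apply to every pmu */
        if (!alias->pmu_name)
                return true;

        return !strcasecmp(alias->pmu_name, pmu->name);
}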

Signed-off-by: Jin Yao 
---
v4:
 - No change.

v3:
 - Change pmu to pmu_name in struct perf_pmu_alias.

 tools/perf/util/pmu.c | 10 +-
 tools/perf/util/pmu.h |  1 +
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 8214def7b0f0..44225838eb03 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -283,6 +283,7 @@ void perf_pmu_free_alias(struct perf_pmu_alias *newalias)
zfree(&newalias->str);
zfree(&newalias->metric_expr);
zfree(&newalias->metric_name);
+   zfree(&newalias->pmu_name);
parse_events_terms__purge(&newalias->terms);
free(newalias);
 }
@@ -297,6 +298,10 @@ static bool perf_pmu_merge_alias(struct perf_pmu_alias 
*newalias,
 
list_for_each_entry(a, alist, list) {
if (!strcasecmp(newalias->name, a->name)) {
+   if (newalias->pmu_name && a->pmu_name &&
+   !strcasecmp(newalias->pmu_name, a->pmu_name)) {
+   continue;
+   }
perf_pmu_update_alias(a, newalias);
perf_pmu_free_alias(newalias);
return true;
@@ -314,7 +319,8 @@ static int __perf_pmu__new_alias(struct list_head *list, 
char *dir, char *name,
int num;
char newval[256];
char *long_desc = NULL, *topic = NULL, *unit = NULL, *perpkg = NULL,
-*metric_expr = NULL, *metric_name = NULL, *deprecated = NULL;
+*metric_expr = NULL, *metric_name = NULL, *deprecated = NULL,
+*pmu_name = NULL;
 
if (pe) {
long_desc = (char *)pe->long_desc;
@@ -324,6 +330,7 @@ static int __perf_pmu__new_alias(struct list_head *list, 
char *dir, char *name,
metric_expr = (char *)pe->metric_expr;
metric_name = (char *)pe->metric_name;
deprecated = (char *)pe->deprecated;
+   pmu_name = (char *)pe->pmu;
}
 
alias = malloc(sizeof(*alias));
@@ -389,6 +396,7 @@ static int __perf_pmu__new_alias(struct list_head *list, 
char *dir, char *name,
}
alias->per_pkg = perpkg && sscanf(perpkg, "%d", &num) == 1 && num == 1;
alias->str = strdup(newval);
+   alias->pmu_name = pmu_name ? strdup(pmu_name) : NULL;
 
if (deprecated)
alias->deprecated = true;
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 1f1749ba830f..4f100768c264 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -72,6 +72,7 @@ struct perf_pmu_alias {
bool deprecated;
char *metric_expr;
char *metric_name;
+   char *pmu_name;
 };
 
 struct perf_pmu *perf_pmu__find(const char *name);
-- 
2.17.1



[PATCH v4 01/25] tools headers uapi: Update tools's copy of linux/perf_event.h

2021-04-16 Thread Jin Yao
To get the changes in:

Liang Kan's patch
[PATCH V6 21/25] perf: Extend PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE

Kan's patch is in review at the moment. The following perf tool
patches need this interface for hybrid support.

This patch can be removed after Kan's patch is upstreamed.
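
For illustration only (not part of the patch; assumes <stdio.h> and the
usual __u64 type), the two macros added below split a hybrid attr.config
into its halves like this:

/* Illustrative sketch using the macros introduced by this update. */
static void print_hybrid_config(__u64 config)
{
        __u64 hw_id    = config & PERF_HW_EVENT_MASK;   /* low 32 bits: event/cache ID */
        __u64 pmu_type = config >> PERF_PMU_TYPE_SHIFT; /* high 32 bits: PMU type ID   */

        /* e.g. config 0x400000000 -> hw_id 0x0 (cycles), pmu_type 4 (cpu_core) */
        printf("event 0x%llx, pmu type %llu\n",
               (unsigned long long)hw_id, (unsigned long long)pmu_type);
}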

Signed-off-by: Jin Yao 
---
v4:
 - Updated by Kan's latest patch,
   '[PATCH V6 21/25] perf: Extend PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE'

 include/uapi/linux/perf_event.h   | 15 +++
 tools/include/uapi/linux/perf_event.h | 15 +++
 2 files changed, 30 insertions(+)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index ad15e40d7f5d..14332f4cf816 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -37,6 +37,21 @@ enum perf_type_id {
PERF_TYPE_MAX,  /* non-ABI */
 };
 
+/*
+ * attr.config layout for type PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE
+ * PERF_TYPE_HARDWARE: 0xEEEEEEEE000000AA
+ * AA: hardware event ID
+ * EEEEEEEE: PMU type ID
+ * PERF_TYPE_HW_CACHE: 0xEEEEEEEE00DDCCBB
+ * BB: hardware cache ID
+ * CC: hardware cache op ID
+ * DD: hardware cache op result ID
+ * EEEEEEEE: PMU type ID
+ * If the PMU type ID is 0, the PERF_TYPE_RAW will be applied.
+ */
+#define PERF_PMU_TYPE_SHIFT 32
+#define PERF_HW_EVENT_MASK 0xffffffff
+
 /*
  * Generalized performance event event_id types, used by the
  * attr.event_id parameter of the sys_perf_event_open()
diff --git a/tools/include/uapi/linux/perf_event.h 
b/tools/include/uapi/linux/perf_event.h
index ad15e40d7f5d..14332f4cf816 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -37,6 +37,21 @@ enum perf_type_id {
PERF_TYPE_MAX,  /* non-ABI */
 };
 
+/*
+ * attr.config layout for type PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE
+ * PERF_TYPE_HARDWARE: 0xEEEEEEEE000000AA
+ * AA: hardware event ID
+ * EEEEEEEE: PMU type ID
+ * PERF_TYPE_HW_CACHE: 0xEEEEEEEE00DDCCBB
+ * BB: hardware cache ID
+ * CC: hardware cache op ID
+ * DD: hardware cache op result ID
+ * EEEEEEEE: PMU type ID
+ * If the PMU type ID is 0, the PERF_TYPE_RAW will be applied.
+ */
+#define PERF_PMU_TYPE_SHIFT 32
+#define PERF_HW_EVENT_MASK 0xffffffff
+
 /*
  * Generalized performance event event_id types, used by the
  * attr.event_id parameter of the sys_perf_event_open()
-- 
2.17.1



[PATCH v4 00/25] perf tool: AlderLake hybrid support series 1

2021-04-16 Thread Jin Yao
- Move command output two chars to the right.
- Move pmu hybrid functions to new created pmu-hybrid.c/pmu-hybrid.h.
  This is to pass the perf test python case.

Jin Yao (25):
  tools headers uapi: Update tools's copy of linux/perf_event.h
  perf jevents: Support unit value "cpu_core" and "cpu_atom"
  perf pmu: Simplify arguments of __perf_pmu__new_alias
  perf pmu: Save pmu name
  perf pmu: Save detected hybrid pmus to a global pmu list
  perf pmu: Add hybrid helper functions
  perf stat: Uniquify hybrid event name
  perf parse-events: Create two hybrid hardware events
  perf parse-events: Create two hybrid cache events
  perf parse-events: Create two hybrid raw events
  perf parse-events: Compare with hybrid pmu name
  perf parse-events: Support event inside hybrid pmu
  perf record: Create two hybrid 'cycles' events by default
  perf stat: Add default hybrid events
  perf stat: Filter out unmatched aggregation for hybrid event
  perf stat: Warn group events from different hybrid PMU
  perf record: Uniquify hybrid event name
  perf tests: Add hybrid cases for 'Parse event definition strings' test
  perf tests: Add hybrid cases for 'Roundtrip evsel->name' test
  perf tests: Skip 'Setup struct perf_event_attr' test for hybrid
  perf tests: Support 'Track with sched_switch' test for hybrid
  perf tests: Support 'Parse and process metrics' test for hybrid
  perf tests: Support 'Session topology' test for hybrid
  perf tests: Support 'Convert perf time to TSC' test for hybrid
  perf tests: Skip 'perf stat metrics (shadow stat) test' for hybrid

 include/uapi/linux/perf_event.h|  15 ++
 tools/include/uapi/linux/perf_event.h  |  15 ++
 tools/perf/builtin-record.c|  47 +-
 tools/perf/builtin-stat.c  |  29 
 tools/perf/pmu-events/jevents.c|   2 +
 tools/perf/tests/attr.c|   4 +
 tools/perf/tests/evsel-roundtrip-name.c|  19 ++-
 tools/perf/tests/parse-events.c| 152 ++
 tools/perf/tests/parse-metric.c|  10 +-
 tools/perf/tests/perf-time-to-tsc.c|  16 ++
 tools/perf/tests/shell/stat+shadow_stat.sh |   3 +
 tools/perf/tests/switch-tracking.c |  10 +-
 tools/perf/tests/topology.c|  10 +-
 tools/perf/util/Build  |   3 +
 tools/perf/util/evlist-hybrid.c|  88 ++
 tools/perf/util/evlist-hybrid.h|  14 ++
 tools/perf/util/evlist.c   |   5 +-
 tools/perf/util/evsel.c|  12 +-
 tools/perf/util/evsel.h|   4 +-
 tools/perf/util/parse-events-hybrid.c  | 178 +
 tools/perf/util/parse-events-hybrid.h  |  23 +++
 tools/perf/util/parse-events.c |  86 +-
 tools/perf/util/parse-events.h |   9 +-
 tools/perf/util/parse-events.y |   9 +-
 tools/perf/util/pmu-hybrid.c   |  89 +++
 tools/perf/util/pmu-hybrid.h   |  22 +++
 tools/perf/util/pmu.c  |  64 +---
 tools/perf/util/pmu.h  |   7 +
 tools/perf/util/python-ext-sources |   2 +
 tools/perf/util/stat-display.c |  35 +++-
 30 files changed, 933 insertions(+), 49 deletions(-)
 create mode 100644 tools/perf/util/evlist-hybrid.c
 create mode 100644 tools/perf/util/evlist-hybrid.h
 create mode 100644 tools/perf/util/parse-events-hybrid.c
 create mode 100644 tools/perf/util/parse-events-hybrid.h
 create mode 100644 tools/perf/util/pmu-hybrid.c
 create mode 100644 tools/perf/util/pmu-hybrid.h

-- 
2.17.1



Re: [PATCH v3 12/27] perf parse-events: Support no alias assigned event inside hybrid PMU

2021-04-15 Thread Jin, Yao

Hi Jiri,

On 4/16/2021 3:39 AM, Jiri Olsa wrote:

On Thu, Apr 15, 2021 at 10:53:33PM +0800, Jin, Yao wrote:

SNIP



With my current code,

static int parse_events__with_hybrid_pmu(struct parse_events_state *parse_state,
 const char *str, char *pmu_name,
 struct list_head *list)
{
struct parse_events_state ps = {
.list   = LIST_HEAD_INIT(ps.list),
.stoken = PE_START_EVENTS,
.pmu_name   = pmu_name,
.idx= parse_state->idx,
};
int ret;

ret = parse_events__scanner(str, &ps);
perf_pmu__parse_cleanup();

if (!ret) {
if (!list_empty(&ps.list)) {
list_splice(&ps.list, list);
parse_state->idx = ps.idx;
}
}

return ret;
}

The new created evsels are added to the tail of list (ps.list) and ps.list
is joined to the list (the parameter 'list').

If we want to reuse the __parse_events(), we may need to:

struct evlist *evlist = evlist__new();


there's the original evlist pointer passed to the initial parser
that we should use no?



Unfortunately the answer is no. :(

For "cpu_core/LLC-loads/", if we do the second parse by just calling __parse_events, then
__parse_events will actually be called twice (nested).


int __parse_events(struct evlist *evlist, const char *str,
   struct parse_events_error *err, struct perf_pmu *fake_pmu,
   char *pmu_name)
{
struct parse_events_state parse_state = {
.list = LIST_HEAD_INIT(parse_state.list),
...
};

ret = parse_events__scanner(str, &parse_state);
perf_pmu__parse_cleanup();

if (!ret && list_empty(&parse_state.list)) {
WARN_ONCE(true, "WARNING: event parser found nothing\n");
return -1;
}
...
}

When returning to the first __parse_events, 'parse_state.list' is an empty list, so it would return
"WARNING: event parser found nothing".


So in my patch, I pass a list pointer in and the new created evsels will be 
added to this list.



__parse_events(evlist, str, NULL, NULL);
Add the evsels in evlist to the tail of list (the parameter 'list')
evlist__delete(evlist);

Is my understanding correct?

Yes, we have to change the interface of __parse_events() by adding a new
parameter 'pmu_name', which will bring many more changes. I agree to make
this change in follow-up patches.
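
A minimal sketch of that follow-up interface (hypothetical, not part of this
series; error handling and the evlist splicing of the real __parse_events()
are omitted):

int __parse_events(struct evlist *evlist, const char *str,
                   struct parse_events_error *err,
                   struct perf_pmu *fake_pmu, char *pmu_name)
{
        struct parse_events_state parse_state = {
                .list      = LIST_HEAD_INIT(parse_state.list),
                .idx       = evlist->core.nr_entries,
                .error     = err,
                .evlist    = evlist,
                .stoken    = PE_START_EVENTS,
                .fake_pmu  = fake_pmu,
                .pmu_name  = pmu_name,  /* new: hybrid pmu to filter on */
        };

        return parse_events__scanner(str, &parse_state);
}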


let's check on this over the next version



That's fine, thanks.

Thanks
Jin Yao


thanks,
jirka



Re: [PATCH v3 12/27] perf parse-events: Support no alias assigned event inside hybrid PMU

2021-04-15 Thread Jin, Yao

Hi Jiri,

On 4/15/2021 10:11 PM, Jiri Olsa wrote:

On Thu, Apr 15, 2021 at 09:36:16PM +0800, Jin, Yao wrote:

SNIP


+   int n = 0;
+
+   list_for_each(pos, list)
+   n++;
+
+   return n;
+}
+
+static int parse_events__with_hybrid_pmu(struct parse_events_state 
*parse_state,
+const char *str, char *pmu_name,
+bool *found, struct list_head *list)
+{
+   struct parse_events_state ps = {
+   .list   = LIST_HEAD_INIT(ps.list),
+   .stoken = PE_START_EVENTS,
+   .pmu_name   = pmu_name,
+   .idx= parse_state->idx,
+   };


could we add this pmu_name directly to __parse_events?



Do you suggest we directly call __parse_events()?

int __parse_events(struct evlist *evlist, const char *str,
   struct parse_events_error *err, struct perf_pmu *fake_pmu)

struct parse_events_state parse_state = {
.list = LIST_HEAD_INIT(parse_state.list),
.idx  = evlist->core.nr_entries,
.error= err,
.evlist   = evlist,
.stoken   = PE_START_EVENTS,
.fake_pmu = fake_pmu,
};

But for parse_events__with_hybrid_pmu, we don't have a valid evlist. So if we
switch to __parse_events, the evlist processing may be a problem.


you should use parse_state->evlist no? but we can check/make this
change in the next iteration.. it's already a lot of changes

jirka



With my current code,

static int parse_events__with_hybrid_pmu(struct parse_events_state *parse_state,
 const char *str, char *pmu_name,
 struct list_head *list)
{
struct parse_events_state ps = {
.list   = LIST_HEAD_INIT(ps.list),
.stoken = PE_START_EVENTS,
.pmu_name   = pmu_name,
.idx= parse_state->idx,
};
int ret;

ret = parse_events__scanner(str, &ps);
perf_pmu__parse_cleanup();

if (!ret) {
if (!list_empty(&ps.list)) {
list_splice(&ps.list, list);
parse_state->idx = ps.idx;
}
}

return ret;
}

The new created evsels are added to the tail of list (ps.list) and ps.list is joined to the list 
(the parameter 'list').


If we want to reuse the __parse_events(), we may need to:

struct evlist *evlist = evlist__new();

__parse_events(evlist, str, NULL, NULL);
Add the evsels in evlist to the tail of list (the parameter 'list')
evlist__delete(evlist);

Is my understanding correct?

Yes, we have to change the interface of __parse_events() by adding a new parameter 'pmu_name', which 
will bring much more changes. I agree to make this change in follow-up patches.


Thanks
Jin Yao


Re: [PATCH v3 12/27] perf parse-events: Support no alias assigned event inside hybrid PMU

2021-04-15 Thread Jin, Yao

Hi Jiri,

On 4/15/2021 7:03 PM, Jiri Olsa wrote:

On Mon, Mar 29, 2021 at 03:00:31PM +0800, Jin Yao wrote:

SNIP


---
v3:
  - Rename the patch:
'perf parse-events: Support hardware events inside PMU' -->
'perf parse-events: Support no alias assigned event inside hybrid PMU'

  - Major code is moved to parse-events-hybrid.c.
  - Refine the code.

  tools/perf/util/parse-events-hybrid.c | 18 +-
  tools/perf/util/parse-events-hybrid.h |  3 +-
  tools/perf/util/parse-events.c| 80 +--
  tools/perf/util/parse-events.h|  4 +-
  tools/perf/util/parse-events.y|  9 ++-
  tools/perf/util/pmu.c |  4 +-
  tools/perf/util/pmu.h |  2 +-
  7 files changed, 108 insertions(+), 12 deletions(-)


please move the support to pass pmu_name and filter
on it within hybrid code in to separate patch



OK.



diff --git a/tools/perf/util/parse-events-hybrid.c 
b/tools/perf/util/parse-events-hybrid.c
index 8a630cbab8f3..5bf176b55573 100644
--- a/tools/perf/util/parse-events-hybrid.c
+++ b/tools/perf/util/parse-events-hybrid.c
@@ -64,6 +64,11 @@ static int add_hw_hybrid(struct parse_events_state 
*parse_state,
int ret;
  
  	perf_pmu__for_each_hybrid_pmu(pmu) {

+   if (parse_state->pmu_name &&
+   strcmp(parse_state->pmu_name, pmu->name)) {
+   continue;


please add this check to separate function

if (pmu_cmp(parse_stat))
continue;



OK.


SNIP


+   if (!parse_state->fake_pmu && head_config && !found &&
+   perf_pmu__is_hybrid(name)) {
+   struct parse_events_term *term;
+   int ret;
+
+   list_for_each_entry(term, head_config, list) {
+   if (!term->config)
+   continue;
+
+   ret = parse_events__with_hybrid_pmu(parse_state,
+   term->config,
+   name, &found,
+   list);
+   if (found)
+   return ret;


what if there are more terms in head_config?
should we make sure there's just one term and fail if there's more?



Yes, it should have only one term in head_config.

Now I change the code to:

+   if (!parse_state->fake_pmu && head_config && !found &&
+   perf_pmu__is_hybrid(name)) {
+   struct parse_events_term *term;
+
+   term = list_first_entry(head_config, struct parse_events_term,
+   list);
+   if (term->config) {
+   return parse_events__with_hybrid_pmu(parse_state,
+term->config,
+name, list);
+   }
+   }


also we already know the perf_pmu__is_hybrid(name) is true,
so can't we just call:

   return parse_events__with_hybrid_pmu()




Yes, we can direct return parse_events__with_hybrid_pmu().


+   }
+   }
  
  	if (verbose > 1) {

fprintf(stderr, "After aliases, add event pmu '%s' with '",
@@ -1605,6 +1630,15 @@ int parse_events_multi_pmu_add(struct parse_events_state 
*parse_state,
struct perf_pmu *pmu = NULL;
int ok = 0;
  
+	if (parse_state->pmu_name) {

+   list = malloc(sizeof(struct list_head));
+   if (!list)
+   return -1;
+   INIT_LIST_HEAD(list);
+   *listp = list;
+   return 0;
+   }


hum, why is this needed?



Hmm, it's not necessary in new code, sorry about that.


+
*listp = NULL;
/* Add it for all PMUs that support the alias */
list = malloc(sizeof(struct list_head));
@@ -2176,6 +2210,44 @@ int parse_events_terms(struct list_head *terms, const 
char *str)
return ret;
  }
  
+static int list_entries_nr(struct list_head *list)

+{
+   struct list_head *pos;
+   int n = 0;
+
+   list_for_each(pos, list)
+   n++;
+
+   return n;
+}
+
+static int parse_events__with_hybrid_pmu(struct parse_events_state 
*parse_state,
+const char *str, char *pmu_name,
+bool *found, struct list_head *list)
+{
+   struct parse_events_state ps = {
+   .list   = LIST_HEAD_INIT(ps.list),
+   .stoken = PE_START_EVENTS,
+   .pmu_name   = pmu_name,
+   .idx= parse_state->idx,
+   };


could we add this pmu_name directly to __parse_events?



Do you suggest we directly call __parse_events()?

int __parse_events(struct evlist *evlist, const

Re: [PATCH v3 12/27] perf parse-events: Support no alias assigned event inside hybrid PMU

2021-04-11 Thread Jin, Yao

Hi Jiri,

On 4/9/2021 9:47 PM, Jiri Olsa wrote:

On Mon, Mar 29, 2021 at 03:00:31PM +0800, Jin Yao wrote:

SNIP


+  struct parse_events_state *parse_state)
  {
struct perf_event_attr attr;
LIST_HEAD(config_terms);
@@ -521,7 +526,7 @@ int parse_events_add_cache(struct list_head *list, int *idx,
  
  	i = parse_events__add_cache_hybrid(list, idx, &attr,
   config_name ? : name, &config_terms,
-  &hybrid);
+  &hybrid, parse_state);
+  , parse_state);
if (hybrid)
return i;
  
@@ -1481,7 +1486,7 @@ int parse_events_add_pmu(struct parse_events_state *parse_state,

struct perf_pmu *pmu;
struct evsel *evsel;
struct parse_events_error *err = parse_state->error;
-   bool use_uncore_alias;
+   bool use_uncore_alias, found = false;
LIST_HEAD(config_terms);
  
  	if (verbose > 1) {

@@ -1530,8 +1535,28 @@ int parse_events_add_pmu(struct parse_events_state 
*parse_state,
}
}
  
-	if (!parse_state->fake_pmu && perf_pmu__check_alias(pmu, head_config, &info))

+   if (!parse_state->fake_pmu &&
+   perf_pmu__check_alias(pmu, head_config, &info, &found)) {
return -EINVAL;
+   }
+


ok, let's not pollute the surrounding functions and make a strict check
on what we want in here.. we are after the following events:

cpu_xxx/L1-dcache/
cpu_xxx/l1-d|/
 ...
right?



Yes, we only focus on the cache events now.


so we are after events with single term in head_config that has name in:

L1-dcache|l1-d|l1d|L1-data  |
L1-icache|l1-i|l1i|L1-instruction   |
LLC|L2  |
dTLB|d-tlb|Data-TLB |
iTLB|i-tlb|Instruction-TLB  |
branch|branches|bpu|btb|bpc |
node

I think that with such a direct check the code will be more
straightforward, also let's move it to parse-events-hybrid



Do you suggest we just use string comparison for doing the direct check?

e.g.

if (strstr(term->config, "L1-dcache"))
...

Of course, we can define a string array first and use a loop for string 
comparison.
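
A minimal sketch of that string-array approach (illustrative only, the names
are taken from the list quoted above):

/* Illustrative only: does a term name look like a generic cache event? */
static const char * const hybrid_cache_names[] = {
        "L1-dcache", "l1-d", "l1d", "L1-data",
        "L1-icache", "l1-i", "l1i", "L1-instruction",
        "LLC", "L2",
        "dTLB", "d-tlb", "Data-TLB",
        "iTLB", "i-tlb", "Instruction-TLB",
        "branch", "branches", "bpu", "btb", "bpc",
        "node",
};

static bool is_cache_event_name(const char *str)
{
        unsigned int i;

        for (i = 0; i < ARRAY_SIZE(hybrid_cache_names); i++) {
                if (!strcasecmp(str, hybrid_cache_names[i]))
                        return true;
        }
        return false;
}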


+   if (!parse_state->fake_pmu && head_config && !found &&
+   perf_pmu__is_hybrid(name)) {
+   struct parse_events_term *term;
+   int ret;
+
+   list_for_each_entry(term, head_config, list) {
+   if (!term->config)
+   continue;
+
+   ret = parse_events__with_hybrid_pmu(parse_state,
+   term->config,
+   name, &found,
+   list);


do we need to call the parsing again? could we just call
parse_events__add_cache_hybrid?

jirka




If we do the direct check for cache events, I think we don't need the parsing 
again.

As I mentioned above, we need to define a string array and compare with 
term->config one by one.

I'm OK with this solution. :)

Thanks
Jin Yao


+   if (found)
+   return ret;
+   }
+   }
  
  	if (verbose > 1) {

fprintf(stderr, "After aliases, add event pmu '%s' with '",
@@ -1605,6 +1630,15 @@ int parse_events_multi_pmu_add(struct parse_events_state 
*parse_state,
struct perf_pmu *pmu = NULL;
int ok = 0;
  


SNIP



Re: [PATCH v3 11/27] perf pmu: Support 'cycles' and 'branches' inside hybrid PMU

2021-04-11 Thread Jin, Yao

Hi Jiri,

On 4/9/2021 9:48 PM, Jiri Olsa wrote:

On Mon, Mar 29, 2021 at 03:00:30PM +0800, Jin Yao wrote:

On hybrid platform, user may want to enable the hardware event
only on one PMU. So following syntax is supported:

cpu_core//
cpu_atom//

   # perf stat -e cpu_core/cpu-cycles/ -a -- sleep 1

Performance counter stats for 'system wide':

6,049,336  cpu_core/cpu-cycles/

  1.003577042 seconds time elapsed

It enables the event 'cpu-cycles' only on cpu_core pmu.

But for 'cycles' and 'branches', the syntax doesn't work.


because the alias is not there.. but there's:
   cpu/cpu-cycles/
   cpu/branch-instructions/

doing the same thing..  what's wrong with that?

I have a feeling we discussed this in the previous
version.. did I give up? ;-)



Yes, we discussed this in previous threads. :)

Now I'm fine with keeping the original behavior, because the syntaxes 'cpu/cycles/' and 'cpu/branches/' are
not supported by current perf.



SNIP


diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index beff29981101..72e5ae5e868e 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -916,6 +916,35 @@ static int pmu_max_precise(const char *name)
return max_precise;
  }
  
+static void perf_pmu__add_hybrid_aliases(struct list_head *head)

+{
+   static struct pmu_event pme_hybrid_fixup[] = {
+   {
+   .name = "cycles",
+   .event = "event=0x3c",
+   },
+   {
+   .name = "branches",
+   .event = "event=0xc4",
+   },
+   {
+   .name = 0,
+   .event = 0,
+   },


if you really need to access these 2 events with special name,
why not add it through the json.. let's not have yet another
place that defines aliases ... also this should be model specific
no?



Yes, defining them in json is a good idea if we really need to support 'cpu/cycles/'
and 'cpu/branches/'.

Anyway, I will drop this patch in next version.

Thanks
Jin Yao


jirka



Re: [PATCH v3 09/27] perf parse-events: Create two hybrid cache events

2021-04-11 Thread Jin, Yao

Hi Jiri,

On 4/9/2021 9:48 PM, Jiri Olsa wrote:

On Mon, Mar 29, 2021 at 03:00:28PM +0800, Jin Yao wrote:

SNIP


index 1bbd0ba92ba7..3692fa3c964a 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -458,6 +458,7 @@ int parse_events_add_cache(struct list_head *list, int *idx,
int cache_type = -1, cache_op = -1, cache_result = -1;
char *op_result[2] = { op_result1, op_result2 };
int i, n;
+   bool hybrid;
  
  	/*

 * No fallback - if we cannot get a clear cache type
@@ -517,6 +518,13 @@ int parse_events_add_cache(struct list_head *list, int 
*idx,
if (get_config_terms(head_config, _terms))
return -ENOMEM;
}
+
+   i = parse_events__add_cache_hybrid(list, idx, &attr,
+  config_name ? : name, &config_terms,
+  &hybrid);
+   if (hybrid)
+   return i;


please define 'ret' for the return value, i is confusing

thanks,
jirka



Previously I wanted to save a 'ret' variable, but yes, it's confusing. I will define 'ret' in the next
version.


Thanks
Jin Yao


+
return add_event(list, idx, &attr, config_name ? : name, &config_terms);
  }
  
--

2.17.1





Re: [PATCH v3 10/27] perf parse-events: Create two hybrid raw events

2021-04-11 Thread Jin, Yao

Hi Jiri,

On 4/9/2021 9:49 PM, Jiri Olsa wrote:

On Mon, Mar 29, 2021 at 03:00:29PM +0800, Jin Yao wrote:

SNIP


+ name, config_terms, pmu);
+   if (ret)
+   return ret;
+   }
+
+   return 0;
+}
+
  int parse_events__add_numeric_hybrid(struct parse_events_state *parse_state,
 struct list_head *list,
 struct perf_event_attr *attr,
@@ -91,6 +126,9 @@ int parse_events__add_numeric_hybrid(struct 
parse_events_state *parse_state,
if (attr->type != PERF_TYPE_RAW) {
return add_hw_hybrid(parse_state, list, attr, name,
 config_terms);
+   } else {
+   return add_raw_hybrid(parse_state, list, attr, name,
+ config_terms);
}
  
  	return -1;


no need for the return -1

jirka



Yes, no need return -1 here.

if (attr->type != PERF_TYPE_RAW) {
return add_hw_hybrid(parse_state, list, attr, name,
 config_terms);
}

return add_raw_hybrid(parse_state, list, attr, name,
  config_terms);

Thanks
Jin Yao


--
2.17.1





[PATCH] perf report: Fix wrong LBR block sorting

2021-04-06 Thread Jin Yao
[get-dynamic-info.h:102 -> get-dynamic-info.h:111]ld-2.27.so
  0.08%  820.06%  11  
[intel_pmu_drain_pebs_nhm+580 -> intel_pmu_drain_pebs_nhm+627] 
[kernel.kallsyms]
  0.08%  770.42%  77
  [lru_add_drain_cpu+0 -> lru_add_drain_cpu+133] [kernel.kallsyms]
  0.08%  740.10%  18
[handle_pmi_common+271 -> handle_pmi_common+310] [kernel.kallsyms]
  0.08%  740.40%  74
  [get-dynamic-info.h:131 -> get-dynamic-info.h:157]ld-2.27.so
  0.07%  690.09%  17  
[intel_pmu_drain_pebs_nhm+432 -> intel_pmu_drain_pebs_nhm+468] 
[kernel.kallsyms]

Now the hottest block is reported at the top of output.

Fixes: b65a7d372b1a ("perf hist: Support block formats with 
compare/sort/display")
Signed-off-by: Jin Yao 
---
 tools/perf/util/block-info.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/block-info.c b/tools/perf/util/block-info.c
index 423ec69bda6c..5ecd4f401f32 100644
--- a/tools/perf/util/block-info.c
+++ b/tools/perf/util/block-info.c
@@ -201,7 +201,7 @@ static int block_total_cycles_pct_entry(struct perf_hpp_fmt 
*fmt,
double ratio = 0.0;
 
if (block_fmt->total_cycles)
-   ratio = (double)bi->cycles / (double)block_fmt->total_cycles;
+   ratio = (double)bi->cycles_aggr / 
(double)block_fmt->total_cycles;
 
return color_pct(hpp, block_fmt->width, 100.0 * ratio);
 }
@@ -216,9 +216,9 @@ static int64_t block_total_cycles_pct_sort(struct 
perf_hpp_fmt *fmt,
double l, r;
 
if (block_fmt->total_cycles) {
-   l = ((double)bi_l->cycles /
+   l = ((double)bi_l->cycles_aggr /
(double)block_fmt->total_cycles) * 10.0;
-   r = ((double)bi_r->cycles /
+   r = ((double)bi_r->cycles_aggr /
(double)block_fmt->total_cycles) * 10.0;
return (int64_t)l - (int64_t)r;
}
-- 
2.17.1



[PATCH] perf vendor events: Add missing model numbers

2021-03-29 Thread Jin Yao
The kernel already supports COMETLAKE/COMETLAKE_L using the SKYLAKE
events and TIGERLAKE_L/TIGERLAKE/ROCKETLAKE using the ICELAKE
events, but the pmu-events mapfile.csv is missing these model
numbers.

Now add the missing model numbers to mapfile.csv.

Signed-off-by: Jin Yao 
---
 tools/perf/pmu-events/arch/x86/mapfile.csv | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv 
b/tools/perf/pmu-events/arch/x86/mapfile.csv
index 2f2a209e87e1..6455f06f35d3 100644
--- a/tools/perf/pmu-events/arch/x86/mapfile.csv
+++ b/tools/perf/pmu-events/arch/x86/mapfile.csv
@@ -24,6 +24,7 @@ GenuineIntel-6-1F,v2,nehalemep,core
 GenuineIntel-6-1A,v2,nehalemep,core
 GenuineIntel-6-2E,v2,nehalemex,core
 GenuineIntel-6-[4589]E,v24,skylake,core
+GenuineIntel-6-A[56],v24,skylake,core
 GenuineIntel-6-37,v13,silvermont,core
 GenuineIntel-6-4D,v13,silvermont,core
 GenuineIntel-6-4C,v13,silvermont,core
@@ -35,6 +36,8 @@ GenuineIntel-6-55-[01234],v1,skylakex,core
 GenuineIntel-6-55-[56789ABCDEF],v1,cascadelakex,core
 GenuineIntel-6-7D,v1,icelake,core
 GenuineIntel-6-7E,v1,icelake,core
+GenuineIntel-6-8[CD],v1,icelake,core
+GenuineIntel-6-A7,v1,icelake,core
 GenuineIntel-6-86,v1,tremontx,core
 AuthenticAMD-23-([12][0-9A-F]|[0-9A-F]),v2,amdzen1,core
 AuthenticAMD-23-[[:xdigit:]]+,v1,amdzen2,core
-- 
2.17.1



[PATCH v3 19/27] perf tests: Add hybrid cases for 'Parse event definition strings' test

2021-03-29 Thread Jin Yao
Add basic hybrid test cases for 'Parse event definition strings' test.

  # ./perf test 6
   6: Parse event definition strings  : Ok

Signed-off-by: Jin Yao 
---
v3:
 - Use PERF_TYPE_RAW for cpu_core/cycles/

 tools/perf/tests/parse-events.c | 170 
 1 file changed, 170 insertions(+)

diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c
index 026c54743311..1cd1d2778172 100644
--- a/tools/perf/tests/parse-events.c
+++ b/tools/perf/tests/parse-events.c
@@ -1512,6 +1512,123 @@ static int test__all_tracepoints(struct evlist *evlist)
return test__checkevent_tracepoint_multi(evlist);
 }
 
+static int test__hybrid_hw_event_with_pmu(struct evlist *evlist)
+{
+   struct evsel *evsel = evlist__first(evlist);
+
+   TEST_ASSERT_VAL("wrong number of entries", 1 == 
evlist->core.nr_entries);
+   TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
+   TEST_ASSERT_VAL("wrong config", 0x3c == evsel->core.attr.config);
+   return 0;
+}
+
+static int test__hybrid_hw_event(struct evlist *evlist)
+{
+   struct evsel *evsel1 = evlist__first(evlist);
+   struct evsel *evsel2 = evlist__last(evlist);
+
+   TEST_ASSERT_VAL("wrong number of entries", 2 == 
evlist->core.nr_entries);
+   TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE_PMU == 
evsel1->core.attr.type);
+   TEST_ASSERT_VAL("wrong config", 0x4 == 
evsel1->core.attr.config);
+   TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE_PMU == 
evsel2->core.attr.type);
+   TEST_ASSERT_VAL("wrong config", 0xa == 
evsel2->core.attr.config);
+   return 0;
+}
+
+static int test__hybrid_hw_group_event(struct evlist *evlist)
+{
+   struct evsel *evsel, *leader;
+
+   evsel = leader = evlist__first(evlist);
+   TEST_ASSERT_VAL("wrong number of entries", 2 == 
evlist->core.nr_entries);
+   TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
+   TEST_ASSERT_VAL("wrong config", 0x3c == evsel->core.attr.config);
+   TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+
+   evsel = evsel__next(evsel);
+   TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
+   TEST_ASSERT_VAL("wrong config", 0xc0 == evsel->core.attr.config);
+   TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+   return 0;
+}
+
+static int test__hybrid_sw_hw_group_event(struct evlist *evlist)
+{
+   struct evsel *evsel, *leader;
+
+   evsel = leader = evlist__first(evlist);
+   TEST_ASSERT_VAL("wrong number of entries", 2 == 
evlist->core.nr_entries);
+   TEST_ASSERT_VAL("wrong type", PERF_TYPE_SOFTWARE == 
evsel->core.attr.type);
+   TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+
+   evsel = evsel__next(evsel);
+   TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
+   TEST_ASSERT_VAL("wrong config", 0x3c == evsel->core.attr.config);
+   TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+   return 0;
+}
+
+static int test__hybrid_hw_sw_group_event(struct evlist *evlist)
+{
+   struct evsel *evsel, *leader;
+
+   evsel = leader = evlist__first(evlist);
+   TEST_ASSERT_VAL("wrong number of entries", 2 == 
evlist->core.nr_entries);
+   TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
+   TEST_ASSERT_VAL("wrong config", 0x3c == evsel->core.attr.config);
+   TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+
+   evsel = evsel__next(evsel);
+   TEST_ASSERT_VAL("wrong type", PERF_TYPE_SOFTWARE == 
evsel->core.attr.type);
+   TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+   return 0;
+}
+
+static int test__hybrid_group_modifier1(struct evlist *evlist)
+{
+   struct evsel *evsel, *leader;
+
+   evsel = leader = evlist__first(evlist);
+   TEST_ASSERT_VAL("wrong number of entries", 2 == 
evlist->core.nr_entries);
+   TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
+   TEST_ASSERT_VAL("wrong config", 0x3c == evsel->core.attr.config);
+   TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+   TEST_ASSERT_VAL("wrong exclude_user", evsel->core.attr.exclude_user);
+   TEST_ASSERT_VAL("wrong exclude_kernel", 
!evsel->core.attr.exclude_kernel);
+
+   evsel = evsel__next(evsel);
+   TEST_ASSERT_VAL("wrong type", PERF_TYPE_RAW == evsel->core.attr.type);
+   TEST_ASSERT_VAL("wrong config"

[PATCH v3 24/27] perf tests: Support 'Session topology' test for hybrid

2021-03-29 Thread Jin Yao
Force creating one event "cpu_core/cycles/" by default,
otherwise in evlist__valid_sample_type, the check of
'if (evlist->core.nr_entries == 1)' would fail.

  # ./perf test 41
  41: Session topology: Ok

Signed-off-by: Jin Yao 
---
v3:
 - No functional change.

 tools/perf/tests/topology.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/tools/perf/tests/topology.c b/tools/perf/tests/topology.c
index 050489807a47..30b4acb08d35 100644
--- a/tools/perf/tests/topology.c
+++ b/tools/perf/tests/topology.c
@@ -8,6 +8,7 @@
 #include "session.h"
 #include "evlist.h"
 #include "debug.h"
+#include "pmu.h"
 #include 
 
 #define TEMPL "/tmp/perf-test-XX"
@@ -40,7 +41,14 @@ static int session_write_header(char *path)
session = perf_session__new(&data, false, NULL);
TEST_ASSERT_VAL("can't get session", !IS_ERR(session));
 
-   session->evlist = evlist__new_default();
+   if (!perf_pmu__has_hybrid()) {
+   session->evlist = evlist__new_default();
+   } else {
+   struct parse_events_error err;
+
+   session->evlist = evlist__new();
+   parse_events(session->evlist, "cpu_core/cycles/", &err);
+   }
TEST_ASSERT_VAL("can't get evlist", session->evlist);
 
perf_header__set_feat(&session->header, HEADER_CPU_TOPOLOGY);
-- 
2.17.1



[PATCH v3 18/27] perf record: Uniquify hybrid event name

2021-03-29 Thread Jin Yao
For perf-record, it would be useful to tell the user which pmu the
event belongs to.

For example,

  # perf record -a -- sleep 1
  # perf report

  # To display the perf.data header info, please use --header/--header-only 
options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 106  of event 'cpu_core/cycles/'
  # Event count (approx.): 22043448
  #
  # Overhead  Command   Shared ObjectSymbol
  #     ...  

  #
  ...

Signed-off-by: Jin Yao 
---
v3:
 - New patch in v3.

 tools/perf/builtin-record.c | 28 
 1 file changed, 28 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 1b44b1a8636e..74cc9ffbd9ef 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1605,6 +1605,32 @@ static void hit_auxtrace_snapshot_trigger(struct record 
*rec)
}
 }
 
+static void record__uniquify_name(struct record *rec)
+{
+   struct evsel *pos;
+   struct evlist *evlist = rec->evlist;
+   char *new_name;
+   int ret;
+
+   if (!perf_pmu__has_hybrid())
+   return;
+
+   evlist__for_each_entry(evlist, pos) {
+   if (!evsel__is_hybrid(pos))
+   continue;
+
+   if (strchr(pos->name, '/'))
+   continue;
+
+   ret = asprintf(&new_name, "%s/%s/",
+  pos->pmu_name, pos->name);
+   if (ret) {
+   free(pos->name);
+   pos->name = new_name;
+   }
+   }
+}
+
 static int __cmd_record(struct record *rec, int argc, const char **argv)
 {
int err;
@@ -1709,6 +1735,8 @@ static int __cmd_record(struct record *rec, int argc, 
const char **argv)
if (data->is_pipe && rec->evlist->core.nr_entries == 1)
rec->opts.sample_id = true;
 
+   record__uniquify_name(rec);
+
if (record__open(rec) != 0) {
err = -1;
goto out_child;
-- 
2.17.1



[PATCH v3 25/27] perf tests: Support 'Convert perf time to TSC' test for hybrid

2021-03-29 Thread Jin Yao
Since 'cycles:u' creates two "cycles" events on a hybrid platform,
the second evsel in the evlist also needs initialization.

With this patch,

  # ./perf test 71
  71: Convert perf time to TSC: Ok

Signed-off-by: Jin Yao 
---
v3:
 - No functional change.

 tools/perf/tests/perf-time-to-tsc.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/tools/perf/tests/perf-time-to-tsc.c 
b/tools/perf/tests/perf-time-to-tsc.c
index 680c3cffb128..72f268c6cc5d 100644
--- a/tools/perf/tests/perf-time-to-tsc.c
+++ b/tools/perf/tests/perf-time-to-tsc.c
@@ -20,6 +20,7 @@
 #include "tsc.h"
 #include "mmap.h"
 #include "tests.h"
+#include "pmu.h"
 
 #define CHECK__(x) {   \
while ((x) < 0) {   \
@@ -66,6 +67,10 @@ int test__perf_time_to_tsc(struct test *test __maybe_unused, 
int subtest __maybe
u64 test_tsc, comm1_tsc, comm2_tsc;
u64 test_time, comm1_time = 0, comm2_time = 0;
struct mmap *md;
+   bool hybrid = false;
+
+   if (perf_pmu__has_hybrid())
+   hybrid = true;
 
threads = thread_map__new(-1, getpid(), UINT_MAX);
CHECK_NOT_NULL__(threads);
@@ -88,6 +93,17 @@ int test__perf_time_to_tsc(struct test *test __maybe_unused, 
int subtest __maybe
evsel->core.attr.disabled = 1;
evsel->core.attr.enable_on_exec = 0;
 
+   /*
+* For hybrid "cycles:u", it creates two events.
+* Init the second evsel here.
+*/
+   if (hybrid) {
+   evsel = evsel__next(evsel);
+   evsel->core.attr.comm = 1;
+   evsel->core.attr.disabled = 1;
+   evsel->core.attr.enable_on_exec = 0;
+   }
+
CHECK__(evlist__open(evlist));
 
CHECK__(evlist__mmap(evlist, UINT_MAX));
-- 
2.17.1



[PATCH v3 22/27] perf tests: Support 'Track with sched_switch' test for hybrid

2021-03-29 Thread Jin Yao
Since 'cycles:u' creates two "cycles" events on a hybrid platform,
the number of events in the evlist is not what the next test steps
expect. Now we just use one event "cpu_core/cycles:u/" for hybrid.

  # ./perf test 35
  35: Track with sched_switch     : Ok

Signed-off-by: Jin Yao 
---
v3:
 - No functional change.

 tools/perf/tests/switch-tracking.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/tools/perf/tests/switch-tracking.c 
b/tools/perf/tests/switch-tracking.c
index 3ebaa758df77..3a12176f8c46 100644
--- a/tools/perf/tests/switch-tracking.c
+++ b/tools/perf/tests/switch-tracking.c
@@ -18,6 +18,7 @@
 #include "record.h"
 #include "tests.h"
 #include "util/mmap.h"
+#include "pmu.h"
 
 static int spin_sleep(void)
 {
@@ -340,6 +341,10 @@ int test__switch_tracking(struct test *test 
__maybe_unused, int subtest __maybe_
struct evsel *switch_evsel, *tracking_evsel;
const char *comm;
int err = -1;
+   bool hybrid = false;
+
+   if (perf_pmu__has_hybrid())
+   hybrid = true;
 
threads = thread_map__new(-1, getpid(), UINT_MAX);
if (!threads) {
@@ -371,7 +376,10 @@ int test__switch_tracking(struct test *test 
__maybe_unused, int subtest __maybe_
cpu_clocks_evsel = evlist__last(evlist);
 
/* Second event */
-   err = parse_events(evlist, "cycles:u", NULL);
+   if (!hybrid)
+   err = parse_events(evlist, "cycles:u", NULL);
+   else
+   err = parse_events(evlist, "cpu_core/cycles/u", NULL);
if (err) {
pr_debug("Failed to parse event cycles:u\n");
goto out_err;
-- 
2.17.1



[PATCH v3 27/27] perf Documentation: Document intel-hybrid support

2021-03-29 Thread Jin Yao
Add some words and examples to help with understanding
Intel hybrid perf support.

Signed-off-by: Jin Yao 
---
v3:
 - No functional change.

 tools/perf/Documentation/intel-hybrid.txt | 228 ++
 tools/perf/Documentation/perf-record.txt  |   1 +
 tools/perf/Documentation/perf-stat.txt|   2 +
 3 files changed, 231 insertions(+)
 create mode 100644 tools/perf/Documentation/intel-hybrid.txt

diff --git a/tools/perf/Documentation/intel-hybrid.txt 
b/tools/perf/Documentation/intel-hybrid.txt
new file mode 100644
index ..784f598dd36f
--- /dev/null
+++ b/tools/perf/Documentation/intel-hybrid.txt
@@ -0,0 +1,228 @@
+Intel hybrid support
+
+Support for Intel hybrid events within perf tools.
+
+For some Intel platforms, such as AlderLake, which is hybrid platform and
+it consists of atom cpu and core cpu. Each cpu has dedicated event list.
+Part of events are available on core cpu, part of events are available
+on atom cpu and even part of events are available on both.
+
+Kernel exports two new cpu pmus via sysfs:
+/sys/devices/cpu_core
+/sys/devices/cpu_atom
+
+The 'cpus' files are created under the directories. For example,
+
+cat /sys/devices/cpu_core/cpus
+0-15
+
+cat /sys/devices/cpu_atom/cpus
+16-23
+
+It indicates cpu0-cpu15 are core cpus and cpu16-cpu23 are atom cpus.
+
+Quickstart
+
+List hybrid event
+-----------------
+
+As before, use perf-list to list the symbolic event.
+
+perf list
+
+inst_retired.any
+   [Fixed Counter: Counts the number of instructions retired. Unit: 
cpu_atom]
+inst_retired.any
+   [Number of instructions retired. Fixed Counter - architectural event. 
Unit: cpu_core]
+
+The 'Unit: xxx' is added to the brief description to indicate which pmu
+the event belongs to. The same event name can be supported on
+different pmus.
+
+Enable hybrid event with a specific pmu
+---------------------------------------
+
+To enable a core-only event or an atom-only event, the following syntax is supported:
+
+   cpu_core//
+or
+   cpu_atom//
+
+For example, count the 'cycles' event on core cpus.
+
+   perf stat -e cpu_core/cycles/
+
+Create two events for one hardware event automatically
+------------------------------------------------------
+
+When creating one event which is available on both atom and core,
+two events are created automatically. One is for atom, the other is for
+core. Most hardware events and cache events are available on both
+cpu_core and cpu_atom.
+
+Hardware events have pre-defined configs (e.g. 0 for cycles). But on a
+hybrid platform, the kernel needs to know which pmu the event comes from
+(atom or core). The original perf event type PERF_TYPE_HARDWARE
+can't carry pmu information, so a new type PERF_TYPE_HARDWARE_PMU is
+introduced.
+
+The new attr.config layout for PERF_TYPE_HARDWARE_PMU:
+
+0xDD000000AA
+AA: original hardware event ID
+DD: PMU type ID
+
+Cache event is similar. A new type PERF_TYPE_HW_CACHE_PMU is introduced.
+
+The new attr.config layout for PERF_TYPE_HW_CACHE_PMU:
+
+0xDD00CCBBAA
+AA: original hardware cache ID
+BB: original hardware cache op ID
+CC: original hardware cache op result ID
+DD: PMU type ID
+
+PMU type ID is retrieved from sysfs
+
+cat /sys/devices/cpu_atom/type
+10
+
+cat /sys/devices/cpu_core/type
+4
+
+When enabling a hardware event without a specified pmu, such as
+perf stat -e cycles -a (system-wide in this example), two events
+are created automatically.
+
+
+perf_event_attr:
+  type 6
+  size 120
+  config   0x400000000
+  sample_type  IDENTIFIER
+  read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
+  disabled 1
+  inherit  1
+  exclude_guest1
+
+
+and
+
+
+perf_event_attr:
+  type 6
+  size 120
+  config   0xa00000000
+  sample_type  IDENTIFIER
+  read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
+  disabled 1
+  inherit  1
+  exclude_guest1
+
+
+type 6 is PERF_TYPE_HARDWARE_PMU.
+The 0x4 in config 0x400000000 indicates the cpu_core pmu.
+The 0xa in config 0xa00000000 indicates the cpu_atom pmu (the atom pmu type id is not fixed).
+
+The kernel creates 'cycles' (config 0x400000000) on cpu0-cpu15 (core cpus),
+and creates 'cycles' (config 0xa00000000) on cpu16-cpu23 (atom cpus).
+
+For perf-stat result, it displays two events:
+
+ Performance counter stats for 'system wide
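
The config encoding described above can be sanity-checked with a small
standalone program. This is only a sketch, not part of the patch series;
the PMU type IDs 4 (cpu_core) and 10 (cpu_atom) are the sysfs values from
the example machine above and will differ on other systems.

/* Sketch: compose attr.config for PERF_TYPE_HARDWARE_PMU as documented
 * above (hardware event ID in the low bits, PMU type ID shifted up by 32).
 */
#include <linux/perf_event.h>
#include <stdio.h>

#define PMU_TYPE_SHIFT 32	/* matches the 0xDD000000AA layout */

static __u64 hybrid_hw_config(__u64 hw_id, __u64 pmu_type)
{
	return hw_id | (pmu_type << PMU_TYPE_SHIFT);
}

int main(void)
{
	/* 'cycles' is PERF_COUNT_HW_CPU_CYCLES (0) */
	printf("cpu_core cycles config: %#llx\n", (unsigned long long)
	       hybrid_hw_config(PERF_COUNT_HW_CPU_CYCLES, 4));
	printf("cpu_atom cycles config: %#llx\n", (unsigned long long)
	       hybrid_hw_config(PERF_COUNT_HW_CPU_CYCLES, 10));
	return 0;	/* prints 0x400000000 and 0xa00000000 */
}

The two printed values match the attr.config values in the perf_event_attr
dumps shown above.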

[PATCH v3 23/27] perf tests: Support 'Parse and process metrics' test for hybrid

2021-03-29 Thread Jin Yao
Some events are not supported on hybrid platforms. Only pick up the cases that work for hybrid.

  # ./perf test 67
  67: Parse and process metrics   : Ok

Signed-off-by: Jin Yao 
---
v3:
 - No functional change.

 tools/perf/tests/parse-metric.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/tools/perf/tests/parse-metric.c b/tools/perf/tests/parse-metric.c
index 4968c4106254..24e5ddff515e 100644
--- a/tools/perf/tests/parse-metric.c
+++ b/tools/perf/tests/parse-metric.c
@@ -11,6 +11,7 @@
 #include "debug.h"
 #include "expr.h"
 #include "stat.h"
+#include "pmu.h"
 
 static struct pmu_event pme_test[] = {
 {
@@ -370,12 +371,17 @@ static int test_metric_group(void)
 
 int test__parse_metric(struct test *test __maybe_unused, int subtest 
__maybe_unused)
 {
+   perf_pmu__scan(NULL);
+
TEST_ASSERT_VAL("IPC failed", test_ipc() == 0);
TEST_ASSERT_VAL("frontend failed", test_frontend() == 0);
-   TEST_ASSERT_VAL("cache_miss_cycles failed", test_cache_miss_cycles() == 
0);
TEST_ASSERT_VAL("DCache_L2 failed", test_dcache_l2() == 0);
TEST_ASSERT_VAL("recursion fail failed", test_recursion_fail() == 0);
-   TEST_ASSERT_VAL("test metric group", test_metric_group() == 0);
TEST_ASSERT_VAL("Memory bandwidth", test_memory_bandwidth() == 0);
+
+   if (!perf_pmu__has_hybrid()) {
+   TEST_ASSERT_VAL("cache_miss_cycles failed", 
test_cache_miss_cycles() == 0);
+   TEST_ASSERT_VAL("test metric group", test_metric_group() == 0);
+   }
return 0;
 }
-- 
2.17.1



[PATCH v3 26/27] perf tests: Skip 'perf stat metrics (shadow stat) test' for hybrid

2021-03-29 Thread Jin Yao
Currently we don't support shadow stat for hybrid.

  root@ssp-pwrt-002:~# ./perf stat -e cycles,instructions -a -- sleep 1

   Performance counter stats for 'system wide':

  12,883,109,591  cpu_core/cycles/
   6,405,163,221  cpu_atom/cycles/
 555,553,778  cpu_core/instructions/
 841,158,734  cpu_atom/instructions/

 1.002644773 seconds time elapsed

Currently no shadow stat such as 'insn per cycle' is reported. We will
support it later; for now just skip the 'perf stat metrics (shadow stat) test'.

Signed-off-by: Jin Yao 
---
v3:
 - No functional change.

 tools/perf/tests/shell/stat+shadow_stat.sh | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/tests/shell/stat+shadow_stat.sh 
b/tools/perf/tests/shell/stat+shadow_stat.sh
index ebebd3596cf9..e6e35fc6c882 100755
--- a/tools/perf/tests/shell/stat+shadow_stat.sh
+++ b/tools/perf/tests/shell/stat+shadow_stat.sh
@@ -7,6 +7,9 @@ set -e
 # skip if system-wide mode is forbidden
 perf stat -a true > /dev/null 2>&1 || exit 2
 
+# skip if on hybrid platform
+perf stat -a -e cycles sleep 1 2>&1 | grep -e cpu_core && exit 2
+
 test_global_aggr()
 {
perf stat -a --no-big-num -e cycles,instructions sleep 1  2>&1 | \
-- 
2.17.1



[PATCH v3 21/27] perf tests: Skip 'Setup struct perf_event_attr' test for hybrid

2021-03-29 Thread Jin Yao
For hybrid, the kernel introduces a new perf type PERF_TYPE_HARDWARE_PMU (6),
which is assigned to hybrid hardware events.

  # ./perf test 17 -vvv
...
compare
  matching [event:base-stat]
to [event-6-17179869184-4]
[cpu] * 0
[flags] 0|8 8
[type] 0 6
  ->FAIL
  match: [event:base-stat] matches []
  event:base-stat does not match, but is optional
matched
compare
  matching [event-6-17179869184-4]
to [event:base-stat]
[cpu] 0 *
[flags] 8 0|8
[type] 6 0
  ->FAIL
  match: [event-6-17179869184-4] matches []
  expected type=6, got 0
  expected config=17179869184, got 0
  FAILED './tests/attr/test-stat-C0' - match failure

The type matching fails because the expected type is 0 while the type of
the hybrid hardware event is 6. Temporarily skip this test case; handling
it properly is a TODO for the future.

Signed-off-by: Jin Yao 
---
v3:
 - No functional change.

 tools/perf/tests/attr.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/tools/perf/tests/attr.c b/tools/perf/tests/attr.c
index dd39ce9b0277..b37c35fb5a46 100644
--- a/tools/perf/tests/attr.c
+++ b/tools/perf/tests/attr.c
@@ -34,6 +34,7 @@
 #include "event.h"
 #include "util.h"
 #include "tests.h"
+#include "pmu.h"
 
 #define ENV "PERF_TEST_ATTR"
 
@@ -184,6 +185,9 @@ int test__attr(struct test *test __maybe_unused, int 
subtest __maybe_unused)
char path_dir[PATH_MAX];
char *exec_path;
 
+   if (perf_pmu__has_hybrid())
+   return 0;
+
/* First try development tree tests. */
if (!lstat("./tests", ))
return run_dir("./tests", "./perf");
-- 
2.17.1



[PATCH v3 20/27] perf tests: Add hybrid cases for 'Roundtrip evsel->name' test

2021-03-29 Thread Jin Yao
On a hybrid platform, two hybrid events are created for one hw event.

For example,

evsel->idx  evsel__name(evsel)
0   cycles
1   cycles
2   instructions
3   instructions
...

So for comparing the evsel name on hybrid, the evsel->idx
needs to be divided by 2.

  # ./perf test 14
  14: Roundtrip evsel->name   : Ok

Signed-off-by: Jin Yao 
---
v3:
 - No functional change.

 tools/perf/tests/evsel-roundtrip-name.c | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/tools/perf/tests/evsel-roundtrip-name.c 
b/tools/perf/tests/evsel-roundtrip-name.c
index f7f3e5b4c180..b74cf80d1f10 100644
--- a/tools/perf/tests/evsel-roundtrip-name.c
+++ b/tools/perf/tests/evsel-roundtrip-name.c
@@ -4,6 +4,7 @@
 #include "parse-events.h"
 #include "tests.h"
 #include "debug.h"
+#include "pmu.h"
 #include 
 #include 
 
@@ -62,7 +63,8 @@ static int perf_evsel__roundtrip_cache_name_test(void)
return ret;
 }
 
-static int __perf_evsel__name_array_test(const char *names[], int nr_names)
+static int __perf_evsel__name_array_test(const char *names[], int nr_names,
+int distance)
 {
int i, err;
struct evsel *evsel;
@@ -82,9 +84,9 @@ static int __perf_evsel__name_array_test(const char *names[], 
int nr_names)
 
err = 0;
evlist__for_each_entry(evlist, evsel) {
-   if (strcmp(evsel__name(evsel), names[evsel->idx])) {
+   if (strcmp(evsel__name(evsel), names[evsel->idx / distance])) {
--err;
-   pr_debug("%s != %s\n", evsel__name(evsel), 
names[evsel->idx]);
+   pr_debug("%s != %s\n", evsel__name(evsel), 
names[evsel->idx / distance]);
}
}
 
@@ -93,18 +95,21 @@ static int __perf_evsel__name_array_test(const char 
*names[], int nr_names)
return err;
 }
 
-#define perf_evsel__name_array_test(names) \
-   __perf_evsel__name_array_test(names, ARRAY_SIZE(names))
+#define perf_evsel__name_array_test(names, distance) \
+   __perf_evsel__name_array_test(names, ARRAY_SIZE(names), distance)
 
 int test__perf_evsel__roundtrip_name_test(struct test *test __maybe_unused, 
int subtest __maybe_unused)
 {
int err = 0, ret = 0;
 
-   err = perf_evsel__name_array_test(evsel__hw_names);
+   if (perf_pmu__has_hybrid())
+   return perf_evsel__name_array_test(evsel__hw_names, 2);
+
+   err = perf_evsel__name_array_test(evsel__hw_names, 1);
if (err)
ret = err;
 
-   err = __perf_evsel__name_array_test(evsel__sw_names, 
PERF_COUNT_SW_DUMMY + 1);
+   err = __perf_evsel__name_array_test(evsel__sw_names, 
PERF_COUNT_SW_DUMMY + 1, 1);
if (err)
ret = err;
 
-- 
2.17.1



[PATCH v3 15/27] perf stat: Filter out unmatched aggregation for hybrid event

2021-03-29 Thread Jin Yao
perf-stat supports several aggregation modes, such as --per-core and
--per-socket. A hybrid event, however, may be available on only a
subset of the cpus. So for --per-core we need to filter out the
unavailable cores, for --per-socket the unavailable sockets, and
so on.

Before:

  # perf stat --per-core -e cpu_core/cycles/ -a -- sleep 1

   Performance counter stats for 'system wide':

  S0-D0-C0   2479,530  cpu_core/cycles/
  S0-D0-C4   2175,007  cpu_core/cycles/
  S0-D0-C8   2166,240  cpu_core/cycles/
  S0-D0-C12  2704,673  cpu_core/cycles/
  S0-D0-C16  2865,835  cpu_core/cycles/
  S0-D0-C20  2  2,958,461  cpu_core/cycles/
  S0-D0-C24  2163,988  cpu_core/cycles/
  S0-D0-C28  2164,729  cpu_core/cycles/
  S0-D0-C32  0cpu_core/cycles/
  S0-D0-C33  0cpu_core/cycles/
  S0-D0-C34  0cpu_core/cycles/
  S0-D0-C35  0cpu_core/cycles/
  S0-D0-C36  0cpu_core/cycles/
  S0-D0-C37  0cpu_core/cycles/
  S0-D0-C38  0cpu_core/cycles/
  S0-D0-C39  0cpu_core/cycles/

 1.003597211 seconds time elapsed

After:

  # perf stat --per-core -e cpu_core/cycles/ -a -- sleep 1

   Performance counter stats for 'system wide':

  S0-D0-C0   2210,428  cpu_core/cycles/
  S0-D0-C4   2444,830  cpu_core/cycles/
  S0-D0-C8   2435,241  cpu_core/cycles/
  S0-D0-C12  2423,976  cpu_core/cycles/
  S0-D0-C16  2859,350  cpu_core/cycles/
  S0-D0-C20  2  1,559,589  cpu_core/cycles/
  S0-D0-C24  2163,924  cpu_core/cycles/
  S0-D0-C28  2376,610  cpu_core/cycles/

 1.003621290 seconds time elapsed

Signed-off-by: Jin Yao 
---
v3:
 - No functional change.

 tools/perf/util/stat-display.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index 161826938a00..b7ce3c4ae5a8 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -635,6 +635,20 @@ static void aggr_cb(struct perf_stat_config *config,
}
 }
 
+static bool aggr_id_hybrid_matched(struct perf_stat_config *config,
+  struct evsel *counter, struct aggr_cpu_id id)
+{
+   struct aggr_cpu_id s;
+
+   for (int i = 0; i < evsel__nr_cpus(counter); i++) {
+   s = config->aggr_get_id(config, evsel__cpus(counter), i);
+   if (cpu_map__compare_aggr_cpu_id(s, id))
+   return true;
+   }
+
+   return false;
+}
+
 static void print_counter_aggrdata(struct perf_stat_config *config,
   struct evsel *counter, int s,
   char *prefix, bool metric_only,
@@ -648,6 +662,12 @@ static void print_counter_aggrdata(struct perf_stat_config 
*config,
double uval;
 
ad.id = id = config->aggr_map->map[s];
+
+   if (perf_pmu__has_hybrid() &&
+   !aggr_id_hybrid_matched(config, counter, id)) {
+   return;
+   }
+
ad.val = ad.ena = ad.run = 0;
ad.nr = 0;
if (!collect_data(config, counter, aggr_cb, &ad))
-- 
2.17.1



[PATCH v3 17/27] perf script: Support PERF_TYPE_HARDWARE_PMU and PERF_TYPE_HW_CACHE_PMU

2021-03-29 Thread Jin Yao
For a hybrid system, the perf subsystem doesn't know which PMU the
events belong to. So the PMU aware version PERF_TYPE_HARDWARE_PMU and
PERF_TYPE_HW_CACHE_PMU are introduced.

Now define the new output[] entries for these two types.

Signed-off-by: Jin Yao 
---
v3:
 - No change.

 tools/perf/builtin-script.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 1280cbfad4db..627ec640d2e6 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -275,6 +275,30 @@ static struct {
.invalid_fields = PERF_OUTPUT_TRACE | PERF_OUTPUT_BPF_OUTPUT,
},
 
+   [PERF_TYPE_HARDWARE_PMU] = {
+   .user_set = false,
+
+   .fields = PERF_OUTPUT_COMM | PERF_OUTPUT_TID |
+ PERF_OUTPUT_CPU | PERF_OUTPUT_TIME |
+ PERF_OUTPUT_EVNAME | PERF_OUTPUT_IP |
+ PERF_OUTPUT_SYM | PERF_OUTPUT_SYMOFFSET |
+ PERF_OUTPUT_DSO | PERF_OUTPUT_PERIOD,
+
+   .invalid_fields = PERF_OUTPUT_TRACE | PERF_OUTPUT_BPF_OUTPUT,
+   },
+
+   [PERF_TYPE_HW_CACHE_PMU] = {
+   .user_set = false,
+
+   .fields = PERF_OUTPUT_COMM | PERF_OUTPUT_TID |
+ PERF_OUTPUT_CPU | PERF_OUTPUT_TIME |
+ PERF_OUTPUT_EVNAME | PERF_OUTPUT_IP |
+ PERF_OUTPUT_SYM | PERF_OUTPUT_SYMOFFSET |
+ PERF_OUTPUT_DSO | PERF_OUTPUT_PERIOD,
+
+   .invalid_fields = PERF_OUTPUT_TRACE | PERF_OUTPUT_BPF_OUTPUT,
+   },
+
[OUTPUT_TYPE_SYNTH] = {
.user_set = false,
 
-- 
2.17.1



[PATCH v3 14/27] perf stat: Add default hybrid events

2021-03-29 Thread Jin Yao
Previously if '-e' is not specified in perf stat, some software events
and hardware events are added to evlist by default.

Before:

  # ./perf stat -a -- sleep 1

   Performance counter stats for 'system wide':

   24,044.40 msec cpu-clock #   23.946 CPUs utilized
  99  context-switches  #4.117 /sec
  24  cpu-migrations#0.998 /sec
   3  page-faults   #0.125 /sec
   7,000,244  cycles#0.000 GHz
   2,955,024  instructions  #0.42  insn per cycle
 608,941  branches  #   25.326 K/sec
  31,991  branch-misses #5.25% of all branches

 1.004106859 seconds time elapsed

Among the events, cycles, instructions, branches and branch-misses
are hardware events.

On a hybrid platform, two hardware events are created for one
hardware event.

cpu_core/cycles/,
cpu_atom/cycles/,
cpu_core/instructions/,
cpu_atom/instructions/,
cpu_core/branches/,
cpu_atom/branches/,
cpu_core/branch-misses/,
cpu_atom/branch-misses/

These events would be added to evlist on hybrid platform.

Since parse_events() already creates two hardware events for one event
on a hybrid platform, we just use parse_events(evlist,
"cycles,instructions,branches,branch-misses") to create the default
events and add them to the evlist.

After:

  # ./perf stat -a -- sleep 1

   Performance counter stats for 'system wide':

   24,048.60 msec task-clock#   23.947 CPUs utilized
 438  context-switches  #   18.213 /sec
  24  cpu-migrations#0.998 /sec
   6  page-faults   #0.249 /sec
  24,813,157  cpu_core/cycles/  #1.032 M/sec
   8,072,687  cpu_atom/cycles/  #  335.682 K/sec
  20,731,286  cpu_core/instructions/#  862.058 K/sec
   3,737,203  cpu_atom/instructions/#  155.402 K/sec
   2,620,924  cpu_core/branches/#  108.984 K/sec
 381,186  cpu_atom/branches/#   15.851 K/sec
  93,248  cpu_core/branch-misses/   #3.877 K/sec
  36,515  cpu_atom/branch-misses/   #1.518 K/sec

 1.004235472 seconds time elapsed

We can see two events are created for one hardware event.

One TODO: the shadow stats look a bit different; for now they are just
'M/sec'.

The perf_stat__update_shadow_stats and perf_stat__print_shadow_stats
need to be improved in future if we want to get the original shadow
stats.

Signed-off-by: Jin Yao 
---
v3:
 - No functional change.

 tools/perf/builtin-stat.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 7b2dfe21c5a8..33fda8f55f66 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1140,6 +1140,13 @@ static int parse_stat_cgroups(const struct option *opt,
return parse_cgroups(opt, str, unset);
 }
 
+static int add_default_hybrid_events(struct evlist *evlist)
+{
+   struct parse_events_error err;
+
+   return parse_events(evlist, 
"cycles,instructions,branches,branch-misses", );
+}
+
 static struct option stat_options[] = {
OPT_BOOLEAN('T', "transaction", _run,
"hardware transaction statistics"),
@@ -1619,6 +1626,12 @@ static int add_default_attributes(void)
   { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS
},
   { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_MISSES  
},
 
+};
+   struct perf_event_attr default_sw_attrs[] = {
+  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK 
},
+  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CONTEXT_SWITCHES   
},
+  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CPU_MIGRATIONS 
},
+  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_PAGE_FAULTS
},
 };
 
 /*
@@ -1856,6 +1869,14 @@ static int add_default_attributes(void)
}
 
if (!evsel_list->core.nr_entries) {
+   if (perf_pmu__has_hybrid()) {
+   if (evlist__add_default_attrs(evsel_list,
+ default_sw_attrs) < 0) {
+   return -1;
+   }
+   return add_default_hybrid_events(evsel_list);
+   }
+
if (target__has_cpu(&target))
default_attrs0[0].config = PERF_COUNT_SW_CPU_CLOCK;
 
-- 
2.17.1



[PATCH v3 16/27] perf stat: Warn group events from different hybrid PMU

2021-03-29 Thread Jin Yao
If a group has events from different hybrid PMUs, show a warning:

"WARNING: events in group from different hybrid PMUs!"

This is to remind the user not to put the core event and atom
event into one group.

Next, just disable grouping.

  # perf stat -e "{cpu_core/cycles/,cpu_atom/cycles/}" -a -- sleep 1
  WARNING: events in group from different hybrid PMUs!
  WARNING: grouped events cpus do not match, disabling group:
anon group { cpu_core/cycles/, cpu_atom/cycles/ }

   Performance counter stats for 'system wide':

   5,438,125  cpu_core/cycles/
   3,914,586  cpu_atom/cycles/

 1.004250966 seconds time elapsed

Signed-off-by: Jin Yao 
---
v3:
 - Change the processing logic. In v2, it just reported the warning
   and returned error. But in v3, we also disable grouping.

 tools/perf/builtin-stat.c  |  4 +++
 tools/perf/util/evlist-hybrid.c| 47 ++
 tools/perf/util/evlist-hybrid.h|  2 ++
 tools/perf/util/evsel.c|  6 
 tools/perf/util/evsel.h|  1 +
 tools/perf/util/python-ext-sources |  2 ++
 6 files changed, 62 insertions(+)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 33fda8f55f66..6f70a9f1971e 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -48,6 +48,7 @@
 #include "util/pmu.h"
 #include "util/event.h"
 #include "util/evlist.h"
+#include "util/evlist-hybrid.h"
 #include "util/evsel.h"
 #include "util/debug.h"
 #include "util/color.h"
@@ -240,6 +241,9 @@ static void evlist__check_cpu_maps(struct evlist *evlist)
struct evsel *evsel, *pos, *leader;
char buf[1024];
 
+   if (evlist__has_hybrid(evlist))
+   evlist__warn_hybrid_group(evlist);
+
evlist__for_each_entry(evlist, evsel) {
leader = evsel->leader;
 
diff --git a/tools/perf/util/evlist-hybrid.c b/tools/perf/util/evlist-hybrid.c
index 185f60ec4351..39f520372447 100644
--- a/tools/perf/util/evlist-hybrid.c
+++ b/tools/perf/util/evlist-hybrid.c
@@ -7,6 +7,7 @@
 #include "../perf.h"
 #include "util/pmu-hybrid.h"
 #include "util/evlist-hybrid.h"
+#include "debug.h"
 #include 
 #include 
 #include 
@@ -39,3 +40,49 @@ int evlist__add_default_hybrid(struct evlist *evlist, bool 
precise)
 
return 0;
 }
+
+static bool group_hybrid_conflict(struct evsel *leader)
+{
+   struct evsel *pos, *prev = NULL;
+
+   for_each_group_evsel(pos, leader) {
+   if (!evsel__is_hybrid(pos))
+   continue;
+
+   if (prev && strcmp(prev->pmu_name, pos->pmu_name))
+   return true;
+
+   prev = pos;
+   }
+
+   return false;
+}
+
+void evlist__warn_hybrid_group(struct evlist *evlist)
+{
+   struct evsel *evsel;
+
+   evlist__for_each_entry(evlist, evsel) {
+   if (evsel__is_group_leader(evsel) &&
+   evsel->core.nr_members > 1 &&
+   group_hybrid_conflict(evsel)) {
+   pr_warning("WARNING: events in group from "
+  "different hybrid PMUs!\n");
+   return;
+   }
+   }
+}
+
+bool evlist__has_hybrid(struct evlist *evlist)
+{
+   struct evsel *evsel;
+
+   evlist__for_each_entry(evlist, evsel) {
+   if (evsel->pmu_name &&
+   perf_pmu__is_hybrid(evsel->pmu_name)) {
+   return true;
+   }
+   }
+
+   return false;
+}
diff --git a/tools/perf/util/evlist-hybrid.h b/tools/perf/util/evlist-hybrid.h
index e25861649d8f..19f74b4c340a 100644
--- a/tools/perf/util/evlist-hybrid.h
+++ b/tools/perf/util/evlist-hybrid.h
@@ -8,5 +8,7 @@
 #include 
 
 int evlist__add_default_hybrid(struct evlist *evlist, bool precise);
+void evlist__warn_hybrid_group(struct evlist *evlist);
+bool evlist__has_hybrid(struct evlist *evlist);
 
 #endif /* __PERF_EVLIST_HYBRID_H */
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 0ba4daa09453..0f64a32ea9c5 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -47,6 +47,7 @@
 #include "memswap.h"
 #include "util.h"
 #include "hashmap.h"
+#include "pmu-hybrid.h"
 #include "../perf-sys.h"
 #include "util/parse-branch-options.h"
 #include 
@@ -2797,3 +2798,8 @@ void evsel__zero_per_pkg(struct evsel *evsel)
hashmap__clear(evsel->per_pkg_mask);
}
 }
+
+bool evsel__is_hybrid(struct evsel *evsel)
+{
+   return evsel->pmu_name && perf_pmu__is_hybrid(evsel->pmu_name);
+}
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index e7dc3448ab2f..e56826bbb628 100644
--- a/tools/perf/util/evsel.h
+++ b/t

[PATCH v3 12/27] perf parse-events: Support no alias assigned event inside hybrid PMU

2021-03-29 Thread Jin Yao
On a hybrid platform, similar to hardware events, the user may want
to enable other events only on one PMU. So the following syntax
should be supported:

cpu_core//
cpu_atom//

But the syntax doesn't work for some events, such as cache events.

Before:

  # perf stat -e cpu_core/LLC-loads/ -a -- sleep 1
  event syntax error: 'cpu_core/LLC-loads/'
\___ unknown term 'LLC-loads' for pmu 'cpu_core'

Cache events are much more complex than hardware events, so
we can't create aliases for them. We use another solution.
For example, if we use "cpu_core/LLC-loads/", in parse_events_add_pmu(),
term->config is "LLC-loads".

We create a new "parse_events_state" with the pmu_name and use
parse_events__scanner to scan the term->config ("LLC-loads" in
this example). The parse_events_add_cache() would be called during
parsing. The parse_state->pmu_name is used to identify the pmu
where the event should be enabled on.

After:

  # ./perf stat -e cpu_core/LLC-loads/ -a -- sleep 1

   Performance counter stats for 'system wide':

  24,593  cpu_core/LLC-loads/

 1.003911601 seconds time elapsed

Signed-off-by: Jin Yao 
---
v3:
 - Rename the patch:
   'perf parse-events: Support hardware events inside PMU' -->
   'perf parse-events: Support no alias assigned event inside hybrid PMU'

 - Major code is moved to parse-events-hybrid.c.
 - Refine the code.

 tools/perf/util/parse-events-hybrid.c | 18 +-
 tools/perf/util/parse-events-hybrid.h |  3 +-
 tools/perf/util/parse-events.c| 80 +--
 tools/perf/util/parse-events.h|  4 +-
 tools/perf/util/parse-events.y|  9 ++-
 tools/perf/util/pmu.c |  4 +-
 tools/perf/util/pmu.h |  2 +-
 7 files changed, 108 insertions(+), 12 deletions(-)

diff --git a/tools/perf/util/parse-events-hybrid.c 
b/tools/perf/util/parse-events-hybrid.c
index 8a630cbab8f3..5bf176b55573 100644
--- a/tools/perf/util/parse-events-hybrid.c
+++ b/tools/perf/util/parse-events-hybrid.c
@@ -64,6 +64,11 @@ static int add_hw_hybrid(struct parse_events_state 
*parse_state,
int ret;
 
perf_pmu__for_each_hybrid_pmu(pmu) {
+   if (parse_state->pmu_name &&
+   strcmp(parse_state->pmu_name, pmu->name)) {
+   continue;
+   }
+
ret = create_event_hybrid(PERF_TYPE_HARDWARE_PMU,
  &parse_state->idx, list, attr, name,
  config_terms, pmu);
@@ -100,6 +105,11 @@ static int add_raw_hybrid(struct parse_events_state 
*parse_state,
int ret;
 
perf_pmu__for_each_hybrid_pmu(pmu) {
+   if (parse_state->pmu_name &&
+   strcmp(parse_state->pmu_name, pmu->name)) {
+   continue;
+   }
+
ret = create_raw_event_hybrid(&parse_state->idx, list, attr,
  name, config_terms, pmu);
if (ret)
@@ -137,7 +147,8 @@ int parse_events__add_numeric_hybrid(struct 
parse_events_state *parse_state,
 int parse_events__add_cache_hybrid(struct list_head *list, int *idx,
   struct perf_event_attr *attr, char *name,
   struct list_head *config_terms,
-  bool *hybrid)
+  bool *hybrid,
+  struct parse_events_state *parse_state)
 {
struct perf_pmu *pmu;
int ret;
@@ -148,6 +159,11 @@ int parse_events__add_cache_hybrid(struct list_head *list, 
int *idx,
 
*hybrid = true;
perf_pmu__for_each_hybrid_pmu(pmu) {
+   if (parse_state->pmu_name &&
+   strcmp(parse_state->pmu_name, pmu->name)) {
+   continue;
+   }
+
ret = create_event_hybrid(PERF_TYPE_HW_CACHE_PMU, idx, list,
  attr, name, config_terms, pmu);
if (ret)
diff --git a/tools/perf/util/parse-events-hybrid.h 
b/tools/perf/util/parse-events-hybrid.h
index 9ad33cd0cef4..f33bd67aa851 100644
--- a/tools/perf/util/parse-events-hybrid.h
+++ b/tools/perf/util/parse-events-hybrid.h
@@ -17,6 +17,7 @@ int parse_events__add_numeric_hybrid(struct 
parse_events_state *parse_state,
 int parse_events__add_cache_hybrid(struct list_head *list, int *idx,
   struct perf_event_attr *attr, char *name,
   struct list_head *config_terms,
-  bool *hybrid);
+  bool *hybrid,
+  struct parse_events_state *parse_state);
 
 #endif /* __PERF_PARSE_EVENTS_HYBRID_H */
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-even

[PATCH v3 10/27] perf parse-events: Create two hybrid raw events

2021-03-29 Thread Jin Yao
/

 1.003722876 seconds time elapsed

type 4 is cpu_core pmu type.
type 10 is cpu_atom pmu type.

Signed-off-by: Jin Yao 
---
v3:
 - Raw event creation is moved to parse-events-hybrid.c.

 tools/perf/util/parse-events-hybrid.c | 38 +++
 1 file changed, 38 insertions(+)

diff --git a/tools/perf/util/parse-events-hybrid.c 
b/tools/perf/util/parse-events-hybrid.c
index ff2909bfbf86..8a630cbab8f3 100644
--- a/tools/perf/util/parse-events-hybrid.c
+++ b/tools/perf/util/parse-events-hybrid.c
@@ -74,6 +74,41 @@ static int add_hw_hybrid(struct parse_events_state 
*parse_state,
return 0;
 }
 
+static int create_raw_event_hybrid(int *idx, struct list_head *list,
+  struct perf_event_attr *attr, char *name,
+  struct list_head *config_terms,
+  struct perf_pmu *pmu)
+{
+   struct evsel *evsel;
+
+   attr->type = pmu->type;
+   evsel = parse_events__add_event_hybrid(list, idx, attr, name,
+  pmu, config_terms);
+   if (evsel)
+   evsel->pmu_name = strdup(pmu->name);
+   else
+   return -ENOMEM;
+
+   return 0;
+}
+
+static int add_raw_hybrid(struct parse_events_state *parse_state,
+ struct list_head *list, struct perf_event_attr *attr,
+ char *name, struct list_head *config_terms)
+{
+   struct perf_pmu *pmu;
+   int ret;
+
+   perf_pmu__for_each_hybrid_pmu(pmu) {
+   ret = create_raw_event_hybrid(&parse_state->idx, list, attr,
+ name, config_terms, pmu);
+   if (ret)
+   return ret;
+   }
+
+   return 0;
+}
+
 int parse_events__add_numeric_hybrid(struct parse_events_state *parse_state,
 struct list_head *list,
 struct perf_event_attr *attr,
@@ -91,6 +126,9 @@ int parse_events__add_numeric_hybrid(struct 
parse_events_state *parse_state,
if (attr->type != PERF_TYPE_RAW) {
return add_hw_hybrid(parse_state, list, attr, name,
 config_terms);
+   } else {
+   return add_raw_hybrid(parse_state, list, attr, name,
+ config_terms);
}
 
return -1;
-- 
2.17.1



[PATCH v3 13/27] perf record: Create two hybrid 'cycles' events by default

2021-03-29 Thread Jin Yao
When evlist is empty, for example no '-e' specified in perf record,
one default 'cycles' event is added to evlist.

On a hybrid platform, however, two default 'cycles' events need to be
created. One is for cpu_core, the other is for cpu_atom.

This patch actually calls evsel__new_cycles() two times to create
two 'cycles' events.

  # ./perf record -vv -a -- sleep 1
  ...
  
  perf_event_attr:
type 6
size 120
config   0x400000000
{ sample_period, sample_freq }   4000
sample_type  IP|TID|TIME|ID|CPU|PERIOD
read_format  ID
disabled 1
inherit  1
freq 1
precise_ip   3
sample_id_all1
exclude_guest1
  
  sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 5
  sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 6
  sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 7
  sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 9
  sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8 = 10
  sys_perf_event_open: pid -1  cpu 5  group_fd -1  flags 0x8 = 11
  sys_perf_event_open: pid -1  cpu 6  group_fd -1  flags 0x8 = 12
  sys_perf_event_open: pid -1  cpu 7  group_fd -1  flags 0x8 = 13
  sys_perf_event_open: pid -1  cpu 8  group_fd -1  flags 0x8 = 14
  sys_perf_event_open: pid -1  cpu 9  group_fd -1  flags 0x8 = 15
  sys_perf_event_open: pid -1  cpu 10  group_fd -1  flags 0x8 = 16
  sys_perf_event_open: pid -1  cpu 11  group_fd -1  flags 0x8 = 17
  sys_perf_event_open: pid -1  cpu 12  group_fd -1  flags 0x8 = 18
  sys_perf_event_open: pid -1  cpu 13  group_fd -1  flags 0x8 = 19
  sys_perf_event_open: pid -1  cpu 14  group_fd -1  flags 0x8 = 20
  sys_perf_event_open: pid -1  cpu 15  group_fd -1  flags 0x8 = 21
  
  perf_event_attr:
type 6
size 120
config   0xa00000000
{ sample_period, sample_freq }   4000
sample_type  IP|TID|TIME|ID|CPU|PERIOD
read_format  ID
disabled 1
inherit  1
freq 1
precise_ip   3
sample_id_all1
exclude_guest1
  
  sys_perf_event_open: pid -1  cpu 16  group_fd -1  flags 0x8 = 22
  sys_perf_event_open: pid -1  cpu 17  group_fd -1  flags 0x8 = 23
  sys_perf_event_open: pid -1  cpu 18  group_fd -1  flags 0x8 = 24
  sys_perf_event_open: pid -1  cpu 19  group_fd -1  flags 0x8 = 25
  sys_perf_event_open: pid -1  cpu 20  group_fd -1  flags 0x8 = 26
  sys_perf_event_open: pid -1  cpu 21  group_fd -1  flags 0x8 = 27
  sys_perf_event_open: pid -1  cpu 22  group_fd -1  flags 0x8 = 28
  sys_perf_event_open: pid -1  cpu 23  group_fd -1  flags 0x8 = 29
  

We have to create evlist-hybrid.c, otherwise 'perf test python' would
fail due to a symbol dependency.

Signed-off-by: Jin Yao 
---
v3:
 - Move the major code to new created evlist-hybrid.c.

 tools/perf/builtin-record.c | 19 +++
 tools/perf/util/Build   |  1 +
 tools/perf/util/evlist-hybrid.c | 41 +
 tools/perf/util/evlist-hybrid.h | 12 ++
 tools/perf/util/evlist.c|  5 +++-
 tools/perf/util/evsel.c |  6 ++---
 tools/perf/util/evsel.h |  2 +-
 7 files changed, 77 insertions(+), 9 deletions(-)
 create mode 100644 tools/perf/util/evlist-hybrid.c
 create mode 100644 tools/perf/util/evlist-hybrid.h

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 35465d1db6dd..1b44b1a8636e 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -47,6 +47,8 @@
 #include "util/util.h"
 #include "util/pfm.h"
 #include "util/clockid.h"
+#include "util/pmu-hybrid.h"
+#include "util/evlist-hybrid.h"
 #include "asm/bug.h"
 #include "perf.h"
 
@@ -2786,10 +2788,19 @@ int cmd_record(int argc, const char **argv)
if (record.opts.overwrite)
record.opts.tail_synthesize = true;
 
-   if (rec->evlist->core.nr_entries == 0 &&
-   __evlist__add_default(rec->evlist, !record.opts.no_samples) < 0) {
-   pr_err("Not enough memory for event selector list\n");
-   goto out;
+   if (rec->evlist->core.nr_entries == 0) {
+   

[PATCH v3 11/27] perf pmu: Support 'cycles' and 'branches' inside hybrid PMU

2021-03-29 Thread Jin Yao
On a hybrid platform, the user may want to enable a hardware event
only on one PMU. So the following syntax is supported:

cpu_core//
cpu_atom//

  # perf stat -e cpu_core/cpu-cycles/ -a -- sleep 1

   Performance counter stats for 'system wide':

   6,049,336  cpu_core/cpu-cycles/

 1.003577042 seconds time elapsed

It enables the event 'cpu-cycles' only on cpu_core pmu.

But for 'cycles' and 'branches', the syntax doesn't work.

Before:

  # perf stat -e cpu_core/cycles/ -a -- sleep 1
  event syntax error: 'cpu_core/cycles/'
\___ unknown term 'cycles' for pmu 'cpu_core'

  # perf stat -e cpu_core/branches/ -a -- sleep 1
  event syntax error: 'cpu_core/branches/'
\___ unknown term 'branches' for pmu 'cpu_core'

The reason 'cpu-cycles' works is that the event is defined in
/sys/devices/cpu_core/events/. It's added as an alias by
pmu_add_sys_aliases and treated as an 'event' term when parsing
term->config.

We use a similar idea: create a pme_hybrid_fixup table for
'cycles' and 'branches' and add them as aliases.

After:

  # perf stat -e cpu_core/cycles/ -a -- sleep 1

   Performance counter stats for 'system wide':

   5,769,631  cpu_core/cycles/

 1.003833235 seconds time elapsed

  # perf stat -e cpu_core/branches/ -a -- sleep 1

   Performance counter stats for 'system wide':

 490,951  cpu_core/branches/

 1.003693946 seconds time elapsed

Signed-off-by: Jin Yao 
---
v3:
 - New patch in v3.

 tools/perf/util/pmu.c | 32 
 1 file changed, 32 insertions(+)

diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index beff29981101..72e5ae5e868e 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -916,6 +916,35 @@ static int pmu_max_precise(const char *name)
return max_precise;
 }
 
+static void perf_pmu__add_hybrid_aliases(struct list_head *head)
+{
+   static struct pmu_event pme_hybrid_fixup[] = {
+   {
+   .name = "cycles",
+   .event = "event=0x3c",
+   },
+   {
+   .name = "branches",
+   .event = "event=0xc4",
+   },
+   {
+   .name = 0,
+   .event = 0,
+   },
+   };
+   int i = 0;
+
+   while (1) {
+   struct pmu_event *pe = &pme_hybrid_fixup[i++];
+
+   if (!pe->name)
+   break;
+
+   __perf_pmu__new_alias(head, NULL, (char *)pe->name, NULL,
+ (char *)pe->event, NULL);
+   }
+}
+
 static struct perf_pmu *pmu_lookup(const char *name)
 {
struct perf_pmu *pmu;
@@ -955,6 +984,9 @@ static struct perf_pmu *pmu_lookup(const char *name)
pmu_add_cpu_aliases(&aliases, pmu);
pmu_add_sys_aliases(&aliases, pmu);
 
+   if (pmu->is_hybrid)
+   perf_pmu__add_hybrid_aliases(&aliases);
+
INIT_LIST_HEAD(&pmu->format);
INIT_LIST_HEAD(&pmu->aliases);
INIT_LIST_HEAD(&pmu->caps);
-- 
2.17.1



[PATCH v3 09/27] perf parse-events: Create two hybrid cache events

2021-03-29 Thread Jin Yao
 1001696907 1001696907
  L1-dcache-loads: 12: 7723 1001716599 1001716599
  L1-dcache-loads: 13: 10211 1001743285 1001743285
  L1-dcache-loads: 14: 13023 1001765343 1001765343
  L1-dcache-loads: 15: 8991 1001788673 1001788673
  L1-dcache-loads: 0: 240163 1001800830 1001800830
  L1-dcache-loads: 1: 7454 1001773983 1001773983
  L1-dcache-loads: 2: 32323 1001686339 1001686339
  L1-dcache-loads: 3: 11039 1001732430 1001732430
  L1-dcache-loads: 4: 52867 1001753753 1001753753
  L1-dcache-loads: 5: 7481 1001756879 1001756879
  L1-dcache-loads: 6: 7471 1001814616 1001814616
  L1-dcache-loads: 7: 29627 1001815092 1001815092
  L1-dcache-loads: 453924 16027424826 16027424826
  L1-dcache-loads: 388425 8014133922 8014133922

   Performance counter stats for 'system wide':

 453,924  cpu_core/L1-dcache-loads/
 388,425  cpu_atom/L1-dcache-loads/

 1.003644499 seconds time elapsed

type 7 is PERF_TYPE_HW_CACHE_PMU.
The 0x4 in config 0x400000000 indicates the cpu_core pmu.
The 0xa in config 0xa00000000 indicates the cpu_atom pmu.

Signed-off-by: Jin Yao 
---
v3:
 - Hybrid cache event creation is moved to parse-events-hybrid.c.

 tools/perf/util/parse-events-hybrid.c | 23 +++
 tools/perf/util/parse-events-hybrid.h |  5 +
 tools/perf/util/parse-events.c|  8 
 3 files changed, 36 insertions(+)

diff --git a/tools/perf/util/parse-events-hybrid.c 
b/tools/perf/util/parse-events-hybrid.c
index bd48563596e0..ff2909bfbf86 100644
--- a/tools/perf/util/parse-events-hybrid.c
+++ b/tools/perf/util/parse-events-hybrid.c
@@ -95,3 +95,26 @@ int parse_events__add_numeric_hybrid(struct 
parse_events_state *parse_state,
 
return -1;
 }
+
+int parse_events__add_cache_hybrid(struct list_head *list, int *idx,
+  struct perf_event_attr *attr, char *name,
+  struct list_head *config_terms,
+  bool *hybrid)
+{
+   struct perf_pmu *pmu;
+   int ret;
+
+   *hybrid = false;
+   if (!perf_pmu__has_hybrid())
+   return 0;
+
+   *hybrid = true;
+   perf_pmu__for_each_hybrid_pmu(pmu) {
+   ret = create_event_hybrid(PERF_TYPE_HW_CACHE_PMU, idx, list,
+ attr, name, config_terms, pmu);
+   if (ret)
+   return ret;
+   }
+
+   return 0;
+}
diff --git a/tools/perf/util/parse-events-hybrid.h 
b/tools/perf/util/parse-events-hybrid.h
index d81a76978480..9ad33cd0cef4 100644
--- a/tools/perf/util/parse-events-hybrid.h
+++ b/tools/perf/util/parse-events-hybrid.h
@@ -14,4 +14,9 @@ int parse_events__add_numeric_hybrid(struct 
parse_events_state *parse_state,
 char *name, struct list_head *config_terms,
 bool *hybrid);
 
+int parse_events__add_cache_hybrid(struct list_head *list, int *idx,
+  struct perf_event_attr *attr, char *name,
+  struct list_head *config_terms,
+  bool *hybrid);
+
 #endif /* __PERF_PARSE_EVENTS_HYBRID_H */
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 1bbd0ba92ba7..3692fa3c964a 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -458,6 +458,7 @@ int parse_events_add_cache(struct list_head *list, int *idx,
int cache_type = -1, cache_op = -1, cache_result = -1;
char *op_result[2] = { op_result1, op_result2 };
int i, n;
+   bool hybrid;
 
/*
 * No fallback - if we cannot get a clear cache type
@@ -517,6 +518,13 @@ int parse_events_add_cache(struct list_head *list, int 
*idx,
if (get_config_terms(head_config, &config_terms))
return -ENOMEM;
}
+
+   i = parse_events__add_cache_hybrid(list, idx, &attr,
+  config_name ? : name, &config_terms,
+  &hybrid);
+   if (hybrid)
+   return i;
+
return add_event(list, idx, &attr, config_name ? : name, &config_terms);
 }
 
-- 
2.17.1
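
The cache config values quoted above follow the same arithmetic. Below is
a small standalone sketch (not part of the patch; the PMU type IDs 4 and
10 are again the sysfs values of the example machine).

/* Sketch: compose attr.config for PERF_TYPE_HW_CACHE_PMU as described in
 * this series: cache ID | op << 8 | result << 16 | PMU type << 32.
 */
#include <linux/perf_event.h>
#include <stdio.h>

static __u64 hybrid_cache_config(__u64 cache, __u64 op, __u64 result,
				 __u64 pmu_type)
{
	return cache | (op << 8) | (result << 16) | (pmu_type << 32);
}

int main(void)
{
	/* L1-dcache-loads = L1D cache, read op, access result (all 0) */
	printf("cpu_core: %#llx\n", (unsigned long long)
	       hybrid_cache_config(PERF_COUNT_HW_CACHE_L1D,
				   PERF_COUNT_HW_CACHE_OP_READ,
				   PERF_COUNT_HW_CACHE_RESULT_ACCESS, 4));
	printf("cpu_atom: %#llx\n", (unsigned long long)
	       hybrid_cache_config(PERF_COUNT_HW_CACHE_L1D,
				   PERF_COUNT_HW_CACHE_OP_READ,
				   PERF_COUNT_HW_CACHE_RESULT_ACCESS, 10));
	return 0;	/* prints 0x400000000 and 0xa00000000 */
}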



[PATCH v3 08/27] perf parse-events: Create two hybrid hardware events

2021-03-29 Thread Jin Yao
 = 27
  cycles: 0: 402953 1001514714 1001514714
  cycles: 1: 137456 1001554851 1001554851
  cycles: 2: 130449 1001575948 1001575948
  cycles: 3: 61370 1001600256 1001600256
  cycles: 4: 100084 1001614135 1001614135
  cycles: 5: 62549 1001646802 1001646802
  cycles: 6: 453760 1001703487 1001703487
  cycles: 7: 66527 1001739138 1001739138
  cycles: 8: 80526 1001823867 1001823867
  cycles: 9: 74942 1001863884 1001863884
  cycles: 10: 322356 1001952832 1001952832
  cycles: 11: 1681751 1001846058 1001846058
  cycles: 12: 97608 1001874578 1001874578
  cycles: 13: 62060 1001899200 1001899200
  cycles: 14: 546496 1001920608 1001920608
  cycles: 15: 65631 1001939206 1001939206
  cycles: 0: 59901 1002047096 1002047096
  cycles: 1: 57304 1001995373 1001995373
  cycles: 2: 781291 1002027732 1002027732
  cycles: 3: 99656 1002058466 1002058466
  cycles: 4: 95071 1002092749 1002092749
  cycles: 5: 346827 1002142979 1002142979
  cycles: 6: 183967 1002183046 1002183046
  cycles: 7: 391779 1002218286 1002218286
  cycles: 4346518 16028069564 16028069564
  cycles: 2015796 8016765727 8016765727

   Performance counter stats for 'system wide':

   4,346,518  cpu_core/cycles/
   2,015,796  cpu_atom/cycles/

 1.003685897 seconds time elapsed

type 6 is PERF_TYPE_HARDWARE_PMU.
The 0x4 in config 0x400000000 indicates the cpu_core pmu.
The 0xa in config 0xa00000000 indicates the cpu_atom pmu.

Signed-off-by: Jin Yao 
---
v3:
 - Create new parse-events-hybrid.c/parse-events-hybrid.h
 - Refine the code

 tools/perf/util/Build |  1 +
 tools/perf/util/parse-events-hybrid.c | 97 +++
 tools/perf/util/parse-events-hybrid.h | 17 +
 tools/perf/util/parse-events.c| 18 +
 tools/perf/util/parse-events.h|  5 ++
 5 files changed, 138 insertions(+)
 create mode 100644 tools/perf/util/parse-events-hybrid.c
 create mode 100644 tools/perf/util/parse-events-hybrid.h

diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 37a8a63c7195..00c9fb064ba6 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -23,6 +23,7 @@ perf-y += llvm-utils.o
 perf-y += mmap.o
 perf-y += memswap.o
 perf-y += parse-events.o
+perf-y += parse-events-hybrid.o
 perf-y += perf_regs.o
 perf-y += path.o
 perf-y += print_binary.o
diff --git a/tools/perf/util/parse-events-hybrid.c 
b/tools/perf/util/parse-events-hybrid.c
new file mode 100644
index ..bd48563596e0
--- /dev/null
+++ b/tools/perf/util/parse-events-hybrid.c
@@ -0,0 +1,97 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "evlist.h"
+#include "evsel.h"
+#include "parse-events.h"
+#include "parse-events-hybrid.h"
+#include "debug.h"
+#include "pmu.h"
+#include "pmu-hybrid.h"
+#include "perf.h"
+
+static void config_hybrid_attr(struct perf_event_attr *attr,
+  int type, int pmu_type)
+{
+   /*
+* attr.config layout:
+* PERF_TYPE_HARDWARE_PMU: 0xDD000000AA
+* AA: hardware event ID
+* DD: PMU type ID
+* PERF_TYPE_HW_CACHE_PMU: 0xDD00CCBBAA
+* AA: hardware cache ID
+* BB: hardware cache op ID
+* CC: hardware cache op result ID
+* DD: PMU type ID
+*/
+   attr->type = type;
+   attr->config = attr->config | ((__u64)pmu_type << PERF_PMU_TYPE_SHIFT);
+}
+
+static int create_event_hybrid(__u32 config_type, int *idx,
+  struct list_head *list,
+  struct perf_event_attr *attr, char *name,
+  struct list_head *config_terms,
+  struct perf_pmu *pmu)
+{
+   struct evsel *evsel;
+   __u32 type = attr->type;
+   __u64 config = attr->config;
+
+   config_hybrid_attr(attr, config_type, pmu->type);
+   evsel = parse_events__add_event_hybrid(list, idx, attr, name,
+  pmu, config_terms);
+   if (evsel)
+   evsel->pmu_name = strdup(pmu->name);
+   else
+   return -ENOMEM;
+
+   attr->type = type;
+   attr->config = config;
+   return 0;
+}
+
+static int add_hw_hybrid(struct parse_events_state *parse_state,
+struct list_head *list, struct perf_event_attr *attr,
+char *name, struct list_head *config_terms)
+{
+   struct perf_pmu *pmu;
+   int ret;
+
+   perf_pmu__for_each_hybrid_pmu(pmu) {
+   ret = create_event_hybrid(PERF_TYPE_HARDWARE_PMU,
+ &parse_state->idx, list, attr, name,
+ c

[PATCH v3 04/27] perf pmu: Save pmu name

2021-03-29 Thread Jin Yao
On a hybrid platform, an event is available on one pmu
(e.g. on cpu_core or on cpu_atom).

This patch saves the pmu name to the pmu_name field of struct perf_pmu_alias,
so that later we can know which pmu the event can be enabled on.

Signed-off-by: Jin Yao 
---
v3:
 - Change pmu to pmu_name in struct perf_pmu_alias.

 tools/perf/util/pmu.c | 10 +-
 tools/perf/util/pmu.h |  1 +
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 9ed9a6a8b2d2..10709ec1cc3e 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -283,6 +283,7 @@ void perf_pmu_free_alias(struct perf_pmu_alias *newalias)
zfree(&newalias->str);
zfree(&newalias->metric_expr);
zfree(&newalias->metric_name);
+   zfree(&newalias->pmu_name);
parse_events_terms__purge(>terms);
free(newalias);
 }
@@ -297,6 +298,10 @@ static bool perf_pmu_merge_alias(struct perf_pmu_alias 
*newalias,
 
list_for_each_entry(a, alist, list) {
if (!strcasecmp(newalias->name, a->name)) {
+   if (newalias->pmu_name && a->pmu_name &&
+   !strcasecmp(newalias->pmu_name, a->pmu_name)) {
+   continue;
+   }
perf_pmu_update_alias(a, newalias);
perf_pmu_free_alias(newalias);
return true;
@@ -314,7 +319,8 @@ static int __perf_pmu__new_alias(struct list_head *list, 
char *dir, char *name,
int num;
char newval[256];
char *long_desc = NULL, *topic = NULL, *unit = NULL, *perpkg = NULL,
-*metric_expr = NULL, *metric_name = NULL, *deprecated = NULL;
+*metric_expr = NULL, *metric_name = NULL, *deprecated = NULL,
+*pmu_name = NULL;
 
if (pe) {
long_desc = (char *)pe->long_desc;
@@ -324,6 +330,7 @@ static int __perf_pmu__new_alias(struct list_head *list, 
char *dir, char *name,
metric_expr = (char *)pe->metric_expr;
metric_name = (char *)pe->metric_name;
deprecated = (char *)pe->deprecated;
+   pmu_name = (char *)pe->pmu;
}
 
alias = malloc(sizeof(*alias));
@@ -389,6 +396,7 @@ static int __perf_pmu__new_alias(struct list_head *list, 
char *dir, char *name,
}
alias->per_pkg = perpkg && sscanf(perpkg, "%d", &num) == 1 && num == 1;
alias->str = strdup(newval);
+   alias->pmu_name = pmu_name ? strdup(pmu_name) : NULL;
 
if (deprecated)
alias->deprecated = true;
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 8164388478c6..8493b1719e10 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -72,6 +72,7 @@ struct perf_pmu_alias {
bool deprecated;
char *metric_expr;
char *metric_name;
+   char *pmu_name;
 };
 
 struct perf_pmu *perf_pmu__find(const char *name);
-- 
2.17.1



[PATCH v3 05/27] perf pmu: Save detected hybrid pmus to a global pmu list

2021-03-29 Thread Jin Yao
We identify the cpu_core pmu and cpu_atom pmu by explicitly
checking the following files:

For cpu_core, checks:
"/sys/bus/event_source/devices/cpu_core/cpus"

For cpu_atom, checks:
"/sys/bus/event_source/devices/cpu_atom/cpus"

If the 'cpus' file exists, the pmu exists.

But in order not to hardcode "cpu_core" and "cpu_atom" and to keep
the code generic, any pmu whose path
"/sys/bus/event_source/devices/cpu_xxx/cpus" exists is treated as a
hybrid pmu. All the detected hybrid pmus are linked to a
global list 'perf_pmu__hybrid_pmus', and then we just need to
iterate the list to get all hybrid pmus by using
perf_pmu__for_each_hybrid_pmu.

Signed-off-by: Jin Yao 
---
v3:
 - No functional change.

 tools/perf/util/Build|  1 +
 tools/perf/util/pmu-hybrid.c | 35 +++
 tools/perf/util/pmu-hybrid.h | 18 ++
 tools/perf/util/pmu.c|  9 -
 tools/perf/util/pmu.h|  4 
 5 files changed, 66 insertions(+), 1 deletion(-)
 create mode 100644 tools/perf/util/pmu-hybrid.c
 create mode 100644 tools/perf/util/pmu-hybrid.h

diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index e3e12f9d4733..37a8a63c7195 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -69,6 +69,7 @@ perf-y += parse-events-bison.o
 perf-y += pmu.o
 perf-y += pmu-flex.o
 perf-y += pmu-bison.o
+perf-y += pmu-hybrid.o
 perf-y += trace-event-read.o
 perf-y += trace-event-info.o
 perf-y += trace-event-scripting.o
diff --git a/tools/perf/util/pmu-hybrid.c b/tools/perf/util/pmu-hybrid.c
new file mode 100644
index ..7316bf46e54b
--- /dev/null
+++ b/tools/perf/util/pmu-hybrid.c
@@ -0,0 +1,35 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "fncache.h"
+#include "pmu-hybrid.h"
+
+LIST_HEAD(perf_pmu__hybrid_pmus);
+
+bool perf_pmu__hybrid_mounted(const char *name)
+{
+   char path[PATH_MAX];
+   const char *sysfs;
+
+   if (strncmp(name, "cpu_", 4))
+   return false;
+
+   sysfs = sysfs__mountpoint();
+   if (!sysfs)
+   return false;
+
+   snprintf(path, PATH_MAX, CPUS_TEMPLATE_CPU, sysfs, name);
+   return file_available(path);
+}
diff --git a/tools/perf/util/pmu-hybrid.h b/tools/perf/util/pmu-hybrid.h
new file mode 100644
index ..35bed3714438
--- /dev/null
+++ b/tools/perf/util/pmu-hybrid.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __PMU_HYBRID_H
+#define __PMU_HYBRID_H
+
+#include 
+#include 
+#include 
+#include 
+#include "pmu.h"
+
+extern struct list_head perf_pmu__hybrid_pmus;
+
+#define perf_pmu__for_each_hybrid_pmu(pmu) \
+   list_for_each_entry(pmu, &perf_pmu__hybrid_pmus, hybrid_list)
+
+bool perf_pmu__hybrid_mounted(const char *name);
+
+#endif /* __PMU_HYBRID_H */
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 10709ec1cc3e..35e9660c3904 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -25,6 +25,7 @@
 #include "string2.h"
 #include "strbuf.h"
 #include "fncache.h"
+#include "pmu-hybrid.h"
 
 struct perf_pmu perf_pmu__fake;
 
@@ -613,7 +614,6 @@ static struct perf_cpu_map *__pmu_cpumask(const char *path)
  */
 #define SYS_TEMPLATE_ID"./bus/event_source/devices/%s/identifier"
 #define CPUS_TEMPLATE_UNCORE   "%s/bus/event_source/devices/%s/cpumask"
-#define CPUS_TEMPLATE_CPU  "%s/bus/event_source/devices/%s/cpus"
 
 static struct perf_cpu_map *pmu_cpumask(const char *name)
 {
@@ -645,6 +645,9 @@ static bool pmu_is_uncore(const char *name)
char path[PATH_MAX];
const char *sysfs;
 
+   if (perf_pmu__hybrid_mounted(name))
+   return false;
+
sysfs = sysfs__mountpoint();
snprintf(path, PATH_MAX, CPUS_TEMPLATE_UNCORE, sysfs, name);
return file_available(path);
@@ -946,6 +949,7 @@ static struct perf_pmu *pmu_lookup(const char *name)
pmu->is_uncore = pmu_is_uncore(name);
if (pmu->is_uncore)
pmu->id = pmu_id(name);
+   pmu->is_hybrid = perf_pmu__hybrid_mounted(name);
pmu->max_precise = pmu_max_precise(name);
pmu_add_cpu_aliases(&aliases, pmu);
pmu_add_sys_aliases(&aliases, pmu);
@@ -957,6 +961,9 @@ static struct perf_pmu *pmu_lookup(const char *name)
list_splice(&aliases, &pmu->aliases);
list_add_tail(&pmu->list, &pmus);
 
+   if (pmu->is_hybrid)
+   list_add_tail(&pmu->hybrid_list, &perf_pmu__hybrid_pmus);
+
pmu->default_config = perf_pmu__get_default_config(pmu);
 
return pmu;
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 8493b1719e10..29289e7c2649 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -5,6 +5,7
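
The detection described in this patch boils down to a path-existence
check. As a standalone illustration (a sketch only, assuming sysfs is
mounted at /sys; the real code above resolves the mountpoint with
sysfs__mountpoint() and uses the tool's file_available() helper):

/* Sketch: a pmu named "cpu_<something>" is treated as hybrid when its
 * "cpus" file exists under sysfs.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int hybrid_pmu_mounted(const char *name)
{
	char path[4096];

	if (strncmp(name, "cpu_", 4))
		return 0;

	snprintf(path, sizeof(path),
		 "/sys/bus/event_source/devices/%s/cpus", name);
	return access(path, F_OK) == 0;
}

int main(void)
{
	printf("cpu_core hybrid: %d\n", hybrid_pmu_mounted("cpu_core"));
	printf("cpu_atom hybrid: %d\n", hybrid_pmu_mounted("cpu_atom"));
	return 0;
}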

[PATCH v3 06/27] perf pmu: Add hybrid helper functions

2021-03-29 Thread Jin Yao
The functions perf_pmu__is_hybrid and perf_pmu__find_hybrid_pmu
can be used to identify the hybrid platform and return the found
hybrid cpu pmu. All the detected hybrid pmus have been saved in
the 'perf_pmu__hybrid_pmus' list, so we just need to search this list.

perf_pmu__hybrid_type_to_pmu converts the user-specified string
to a hybrid pmu name. This is used to support the '--cputype' option
in later patches.

perf_pmu__has_hybrid checks the existence of a hybrid pmu. Note that
we have to define it in pmu.c (to keep pmu-hybrid.c free of extra
symbol dependencies), otherwise 'perf test python' would fail.

Signed-off-by: Jin Yao 
---
v3:
 - Move perf_pmu__has_hybrid from pmu-hybrid.c to pmu.c. We have to
   add pmu-hybrid.c to python-ext-sources to solve symbol dependency
   issue found in perf test python. For perf_pmu__has_hybrid, it calls
   perf_pmu__scan, which is defined in pmu.c. It's very hard to add
   pmu.c to python-ext-sources, too much symbol dependency here.

 tools/perf/util/pmu-hybrid.c | 40 
 tools/perf/util/pmu-hybrid.h |  4 
 tools/perf/util/pmu.c| 11 ++
 tools/perf/util/pmu.h|  2 ++
 4 files changed, 57 insertions(+)

diff --git a/tools/perf/util/pmu-hybrid.c b/tools/perf/util/pmu-hybrid.c
index 7316bf46e54b..86ba84d9469c 100644
--- a/tools/perf/util/pmu-hybrid.c
+++ b/tools/perf/util/pmu-hybrid.c
@@ -33,3 +33,43 @@ bool perf_pmu__hybrid_mounted(const char *name)
snprintf(path, PATH_MAX, CPUS_TEMPLATE_CPU, sysfs, name);
return file_available(path);
 }
+
+struct perf_pmu *perf_pmu__find_hybrid_pmu(const char *name)
+{
+   struct perf_pmu *pmu;
+
+   if (!name)
+   return NULL;
+
+   perf_pmu__for_each_hybrid_pmu(pmu) {
+   if (!strcmp(name, pmu->name))
+   return pmu;
+   }
+
+   return NULL;
+}
+
+bool perf_pmu__is_hybrid(const char *name)
+{
+   return perf_pmu__find_hybrid_pmu(name) != NULL;
+}
+
+char *perf_pmu__hybrid_type_to_pmu(const char *type)
+{
+   char *pmu_name = NULL;
+
+   if (asprintf(&pmu_name, "cpu_%s", type) < 0)
+   return NULL;
+
+   if (perf_pmu__is_hybrid(pmu_name))
+   return pmu_name;
+
+   /*
+* pmu may be not scanned, check the sysfs.
+*/
+   if (perf_pmu__hybrid_mounted(pmu_name))
+   return pmu_name;
+
+   free(pmu_name);
+   return NULL;
+}
diff --git a/tools/perf/util/pmu-hybrid.h b/tools/perf/util/pmu-hybrid.h
index 35bed3714438..d0fa7bc50a76 100644
--- a/tools/perf/util/pmu-hybrid.h
+++ b/tools/perf/util/pmu-hybrid.h
@@ -15,4 +15,8 @@ extern struct list_head perf_pmu__hybrid_pmus;
 
 bool perf_pmu__hybrid_mounted(const char *name);
 
+struct perf_pmu *perf_pmu__find_hybrid_pmu(const char *name);
+bool perf_pmu__is_hybrid(const char *name);
+char *perf_pmu__hybrid_type_to_pmu(const char *type);
+
 #endif /* __PMU_HYBRID_H */
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 35e9660c3904..beff29981101 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -40,6 +40,7 @@ int perf_pmu_parse(struct list_head *list, char *name);
 extern FILE *perf_pmu_in;
 
 static LIST_HEAD(pmus);
+static bool hybrid_scanned;
 
 /*
  * Parse & process all the sysfs attributes located under
@@ -1823,3 +1824,13 @@ int perf_pmu__caps_parse(struct perf_pmu *pmu)
 
return nr_caps;
 }
+
+bool perf_pmu__has_hybrid(void)
+{
+   if (!hybrid_scanned) {
+   hybrid_scanned = true;
+   perf_pmu__scan(NULL);
+   }
+
+   return !list_empty(&perf_pmu__hybrid_pmus);
+}
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 29289e7c2649..5ed2abab7fe0 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -128,4 +128,6 @@ int perf_pmu__convert_scale(const char *scale, char **end, 
double *sval);
 
 int perf_pmu__caps_parse(struct perf_pmu *pmu);
 
+bool perf_pmu__has_hybrid(void);
+
 #endif /* __PMU_H */
-- 
2.17.1



[PATCH v3 07/27] perf stat: Uniquify hybrid event name

2021-03-29 Thread Jin Yao
It would be useful to tell the user which pmu the event belongs to.
perf-stat already supports the '--no-merge' option, which prints the pmu
name after the event name, such as:

"cycles [cpu_core]"

Now this option is enabled by default on hybrid platforms, but the
format is changed to:

"cpu_core/cycles/"

Signed-off-by: Jin Yao 
---
v3:
 - No functional change.

 tools/perf/builtin-stat.c  |  4 
 tools/perf/util/stat-display.c | 13 +++--
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 4bb48c6b6698..7b2dfe21c5a8 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -68,6 +68,7 @@
 #include "util/affinity.h"
 #include "util/pfm.h"
 #include "util/bpf_counter.h"
+#include "util/pmu-hybrid.h"
 #include "asm/bug.h"
 
 #include 
@@ -2371,6 +2372,9 @@ int cmd_stat(int argc, const char **argv)
 
evlist__check_cpu_maps(evsel_list);
 
+   if (perf_pmu__has_hybrid())
+   stat_config.no_merge = true;
+
/*
 * Initialize thread_map with comm names,
 * so we could print it out on output.
diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index 7f09cdaf5b60..161826938a00 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -17,6 +17,7 @@
 #include "cgroup.h"
 #include 
 #include "util.h"
+#include "pmu-hybrid.h"
 
 #define CNTR_NOT_SUPPORTED ""
 #define CNTR_NOT_COUNTED   ""
@@ -526,6 +527,7 @@ static void uniquify_event_name(struct evsel *counter)
 {
char *new_name;
char *config;
+   int ret;
 
if (counter->uniquified_name ||
!counter->pmu_name || !strncmp(counter->name, counter->pmu_name,
@@ -540,8 +542,15 @@ static void uniquify_event_name(struct evsel *counter)
counter->name = new_name;
}
} else {
-   if (asprintf(&new_name,
-"%s [%s]", counter->name, counter->pmu_name) > 0) {
+   if (perf_pmu__has_hybrid()) {
+   ret = asprintf(_name, "%s/%s/",
+  counter->pmu_name, counter->name);
+   } else {
+   ret = asprintf(_name, "%s [%s]",
+  counter->name, counter->pmu_name);
+   }
+
+   if (ret) {
free(counter->name);
counter->name = new_name;
}
-- 
2.17.1



[PATCH v3 02/27] perf jevents: Support unit value "cpu_core" and "cpu_atom"

2021-03-29 Thread Jin Yao
Some Intel platforms, such as Alderlake, are hybrid platforms consisting
of atom cpus and core cpus. Each cpu has a dedicated event list. Some
events are available on the core cpus, some are available on the atom
cpus.

The kernel exports new cpu pmus: cpu_core and cpu_atom. A new field
"Unit" is added to the event in json to indicate which pmu the event
is available on.

For example, one event in cache.json,

{
"BriefDescription": "Counts the number of load ops retired that",
"CollectPEBSRecord": "2",
"Counter": "0,1,2,3",
"EventCode": "0xd2",
"EventName": "MEM_LOAD_UOPS_RETIRED_MISC.MMIO",
"PEBScounters": "0,1,2,3",
"SampleAfterValue": "103",
"UMask": "0x80",
"Unit": "cpu_atom"
},

The unit "cpu_atom" indicates this event is only availabe on "cpu_atom".

In generated pmu-events.c, we can see:

{
.name = "mem_load_uops_retired_misc.mmio",
.event = "period=103,umask=0x80,event=0xd2",
.desc = "Counts the number of load ops retired that. Unit: cpu_atom ",
.topic = "cache",
.pmu = "cpu_atom",
},

But without this patch, the "uncore_" prefix would be added before "cpu_atom",
such as:
.pmu = "uncore_cpu_atom"

That would be a wrong pmu.
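
For illustration only, a minimal sketch of the unit-to-pmu lookup this table
drives; the helper names and the fallback here are simplified stand-ins for
the jevents internals, not the actual code:

  #include <stdio.h>
  #include <string.h>
  #include <strings.h>

  struct map {
          const char *json;   /* "Unit" value from the JSON event */
          const char *perf;   /* pmu string emitted into pmu-events.c */
  };

  static const struct map unit_to_pmu[] = {
          { "cpu_core", "cpu_core" },
          { "cpu_atom", "cpu_atom" },
          { "imx8_ddr", "imx8_ddr" },
          { NULL, NULL },
  };

  /* Return the mapped pmu name, or fall back to the "uncore_" prefix. */
  static const char *lookup_pmu(const char *unit, char *buf, size_t sz)
  {
          for (int i = 0; unit_to_pmu[i].json; i++) {
                  if (!strcasecmp(unit, unit_to_pmu[i].json))
                          return unit_to_pmu[i].perf;
          }
          snprintf(buf, sz, "uncore_%s", unit);
          return buf;
  }

  int main(void)
  {
          char buf[64];

          printf("%s\n", lookup_pmu("cpu_atom", buf, sizeof(buf)));  /* cpu_atom */
          printf("%s\n", lookup_pmu("cbox", buf, sizeof(buf)));      /* uncore_cbox */
          return 0;
  }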

Signed-off-by: Jin Yao 
---
v3:
 - No change.

 tools/perf/pmu-events/jevents.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/perf/pmu-events/jevents.c b/tools/perf/pmu-events/jevents.c
index 33aa3c885eaf..ed4f0bd72e5a 100644
--- a/tools/perf/pmu-events/jevents.c
+++ b/tools/perf/pmu-events/jevents.c
@@ -285,6 +285,8 @@ static struct map {
{ "imx8_ddr", "imx8_ddr" },
{ "L3PMC", "amd_l3" },
{ "DFPMC", "amd_df" },
+   { "cpu_core", "cpu_core" },
+   { "cpu_atom", "cpu_atom" },
{}
 };
 
-- 
2.17.1



[PATCH v3 03/27] perf pmu: Simplify arguments of __perf_pmu__new_alias

2021-03-29 Thread Jin Yao
Simplify the arguments of __perf_pmu__new_alias() by passing
the whole 'struct pmu_event' pointer.

Signed-off-by: Jin Yao 
---
v3:
 - No change.

 tools/perf/util/pmu.c | 36 
 1 file changed, 16 insertions(+), 20 deletions(-)

diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 88da5cf6aee8..9ed9a6a8b2d2 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -306,18 +306,25 @@ static bool perf_pmu_merge_alias(struct perf_pmu_alias 
*newalias,
 }
 
 static int __perf_pmu__new_alias(struct list_head *list, char *dir, char *name,
-char *desc, char *val,
-char *long_desc, char *topic,
-char *unit, char *perpkg,
-char *metric_expr,
-char *metric_name,
-char *deprecated)
+char *desc, char *val, struct pmu_event *pe)
 {
struct parse_events_term *term;
struct perf_pmu_alias *alias;
int ret;
int num;
char newval[256];
+   char *long_desc = NULL, *topic = NULL, *unit = NULL, *perpkg = NULL,
+*metric_expr = NULL, *metric_name = NULL, *deprecated = NULL;
+
+   if (pe) {
+   long_desc = (char *)pe->long_desc;
+   topic = (char *)pe->topic;
+   unit = (char *)pe->unit;
+   perpkg = (char *)pe->perpkg;
+   metric_expr = (char *)pe->metric_expr;
+   metric_name = (char *)pe->metric_name;
+   deprecated = (char *)pe->deprecated;
+   }
 
alias = malloc(sizeof(*alias));
if (!alias)
@@ -406,8 +413,7 @@ static int perf_pmu__new_alias(struct list_head *list, char 
*dir, char *name, FI
/* Remove trailing newline from sysfs file */
strim(buf);
 
-   return __perf_pmu__new_alias(list, dir, name, NULL, buf, NULL, NULL, 
NULL,
-NULL, NULL, NULL, NULL);
+   return __perf_pmu__new_alias(list, dir, name, NULL, buf, NULL);
 }
 
 static inline bool pmu_alias_info_file(char *name)
@@ -793,11 +799,7 @@ void pmu_add_cpu_aliases_map(struct list_head *head, 
struct perf_pmu *pmu,
/* need type casts to override 'const' */
__perf_pmu__new_alias(head, NULL, (char *)pe->name,
(char *)pe->desc, (char *)pe->event,
-   (char *)pe->long_desc, (char *)pe->topic,
-   (char *)pe->unit, (char *)pe->perpkg,
-   (char *)pe->metric_expr,
-   (char *)pe->metric_name,
-   (char *)pe->deprecated);
+   pe);
}
 }
 
@@ -864,13 +866,7 @@ static int pmu_add_sys_aliases_iter_fn(struct pmu_event 
*pe, void *data)
  (char *)pe->name,
  (char *)pe->desc,
  (char *)pe->event,
- (char *)pe->long_desc,
- (char *)pe->topic,
- (char *)pe->unit,
- (char *)pe->perpkg,
- (char *)pe->metric_expr,
- (char *)pe->metric_name,
- (char *)pe->deprecated);
+ pe);
}
 
return 0;
-- 
2.17.1



[PATCH v3 01/27] tools headers uapi: Update tools's copy of linux/perf_event.h

2021-03-29 Thread Jin Yao
To get the changes in:

Liang Kan's patch
("perf: Introduce PERF_TYPE_HARDWARE_PMU and PERF_TYPE_HW_CACHE_PMU")

Kan's patch is in review at the moment, but the following perf tool
patches need this interface for hybrid support.

This patch can be removed after Kan's patch is upstreamed.

Signed-off-by: Jin Yao 
---
v3:
 - No change.

 tools/include/uapi/linux/perf_event.h | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/tools/include/uapi/linux/perf_event.h 
b/tools/include/uapi/linux/perf_event.h
index ad15e40d7f5d..c0a511eea498 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -33,6 +33,8 @@ enum perf_type_id {
PERF_TYPE_HW_CACHE  = 3,
PERF_TYPE_RAW   = 4,
PERF_TYPE_BREAKPOINT= 5,
+   PERF_TYPE_HARDWARE_PMU  = 6,
+   PERF_TYPE_HW_CACHE_PMU  = 7,
 
PERF_TYPE_MAX,  /* non-ABI */
 };
@@ -94,6 +96,30 @@ enum perf_hw_cache_op_result_id {
PERF_COUNT_HW_CACHE_RESULT_MAX, /* non-ABI */
 };
 
+/*
+ * attr.config layout for type PERF_TYPE_HARDWARE* and PERF_TYPE_HW_CACHE*
+ * PERF_TYPE_HARDWARE: 0xAA
+ * AA: hardware event ID
+ * PERF_TYPE_HW_CACHE: 0xCCBBAA
+ * AA: hardware cache ID
+ * BB: hardware cache op ID
+ * CC: hardware cache op result ID
+ * PERF_TYPE_HARDWARE_PMU: 0xDD00AA
+ * AA: hardware event ID
+ * DD: PMU type ID
+ * PERF_TYPE_HW_CACHE_PMU: 0xDD00CCBBAA
+ * AA: hardware cache ID
+ * BB: hardware cache op ID
+ * CC: hardware cache op result ID
+ * DD: PMU type ID
+ */
+#define PERF_HW_CACHE_ID_SHIFT 0
+#define PERF_HW_CACHE_OP_ID_SHIFT  8
+#define PERF_HW_CACHE_OP_RESULT_ID_SHIFT   16
+#define PERF_HW_CACHE_EVENT_MASK   0xff
+
+#define PERF_PMU_TYPE_SHIFT32
+
 /*
  * Special "software" events provided by the kernel, even if the hardware
  * does not support performance events. These events measure various
-- 
2.17.1
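
As a worked example of the attr.config layout described in the comment above
(a standalone sketch, not part of the patch; the pmu type values 4 and 10 are
just the sysfs values quoted elsewhere in this series):

  #include <stdio.h>
  #include <stdint.h>

  #define PERF_HW_CACHE_OP_ID_SHIFT          8
  #define PERF_HW_CACHE_OP_RESULT_ID_SHIFT  16
  #define PERF_PMU_TYPE_SHIFT               32

  int main(void)
  {
          uint64_t cpu_core_type = 4;    /* cat /sys/devices/cpu_core/type */
          uint64_t cpu_atom_type = 10;   /* cat /sys/devices/cpu_atom/type */

          /* PERF_TYPE_HARDWARE_PMU: AA = hw event id (cycles == 0), DD = pmu type */
          uint64_t core_cycles = (cpu_core_type << PERF_PMU_TYPE_SHIFT) | 0;
          uint64_t atom_cycles = (cpu_atom_type << PERF_PMU_TYPE_SHIFT) | 0;

          /* PERF_TYPE_HW_CACHE_PMU: LL cache (2), op read (0), result access (0) */
          uint64_t core_llc_loads = (cpu_core_type << PERF_PMU_TYPE_SHIFT) |
                                    (0ULL << PERF_HW_CACHE_OP_RESULT_ID_SHIFT) |
                                    (0ULL << PERF_HW_CACHE_OP_ID_SHIFT) | 2;

          printf("cpu_core cycles    config = 0x%llx\n", (unsigned long long)core_cycles);
          printf("cpu_atom cycles    config = 0x%llx\n", (unsigned long long)atom_cycles);
          printf("cpu_core LLC-loads config = 0x%llx\n", (unsigned long long)core_llc_loads);
          return 0;
  }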



[PATCH v3 00/27] perf tool: AlderLake hybrid support series 1

2021-03-29 Thread Jin Yao
AlderLake uses a hybrid architecture utilizing Golden Cove cores
(core cpu) and Gracemont cores (atom cpu). Each cpu has dedicated
event list. Some events are available on core cpu, some events
are available on atom cpu and some events can be available on both.

Kernel exports new pmus "cpu_core" and "cpu_atom" through sysfs:
/sys/devices/cpu_core
/sys/devices/cpu_atom

cat /sys/devices/cpu_core/cpus
0-15

cat /sys/devices/cpu_atom/cpus
16-23

In this example, core cpus are 0-15 and atom cpus are 16-23.

To enable a core only event or atom only event:

cpu_core//
or
cpu_atom//

Count the 'cycles' event on core cpus.

  # perf stat -e cpu_core/cycles/ -a -- sleep 1

   Performance counter stats for 'system wide':

  12,853,951,349  cpu_core/cycles/

 1.002581249 seconds time elapsed

If one event is available on both atom cpu and core cpu, two events
are created automatically.

  # perf stat -e cycles -a -- sleep 1

   Performance counter stats for 'system wide':

  12,856,467,438  cpu_core/cycles/
   6,404,634,785  cpu_atom/cycles/

 1.002453013 seconds time elapsed

Grouping is supported if the events are from the same pmu; otherwise a warning
is displayed and grouping is disabled automatically.

  # perf stat -e '{cpu_core/cycles/,cpu_core/instructions/}' -a -- sleep 1

   Performance counter stats for 'system wide':

  12,863,866,968  cpu_core/cycles/
 554,795,017  cpu_core/instructions/

 1.002616117 seconds time elapsed

  # perf stat -e '{cpu_core/cycles/,cpu_atom/instructions/}' -a -- sleep 1
  WARNING: events in group from different hybrid PMUs!
  WARNING: grouped events cpus do not match, disabling group:
anon group { cpu_core/cycles/, cpu_atom/instructions/ }

   Performance counter stats for 'system wide':

   6,283,970  cpu_core/cycles/
 765,635  cpu_atom/instructions/

 1.003959036 seconds time elapsed

Note that the whole patchset for AlderLake hybrid support is very
large (40+ patches). For simplicity, it's split into several patch
series.

Patch series 1 only supports the basic functionality. The advanced
support for perf-c2c/perf-mem/topdown/metrics/topology header and others
will be added in follow-up patch series.

The perf tool codes can also be found at:
https://github.com/yaoj/perf.git

v3:
---
- Drop 'perf evlist: Hybrid event uses its own cpus'. This patch is too broad
  and actually not very necessary. The current perf framework has
  processed the cpus for evsel well even for hybrid evsel. So this patch can
  be dropped.

- Drop 'perf evsel: Adjust hybrid event and global event mixed group'.
  The patch is a bit tricky and hard to understand. In v3, we will disable
  grouping when the group members are from different PMUs. So this patch
  would not be necessary.

- Create parse-events-hybrid.c/parse-events-hybrid.h and 
evlist-hybrid.c/evlist-hybrid.h.
  Move hybrid related codes to these files.

- Create a new patch 'perf pmu: Support 'cycles' and 'branches' inside hybrid 
PMU' to
  support 'cycles' and 'branches' inside PMU.

- Create a new patch 'perf record: Uniquify hybrid event name' to tell the user
  which pmu the event belongs to for perf-record.

- If group members are from different hybrid PMUs, show a warning and disable
  grouping.

- Other refining and refactoring.

v2:
---
- Drop kernel patches (Kan posted the series "Add Alder Lake support for perf 
(kernel)" separately).
- Drop the patches for perf-c2c/perf-mem/topdown/metrics/topology header 
supports,
  which will be added in series 2 or series 3.
- Simplify the arguments of __perf_pmu__new_alias() by passing
  the 'struct pmu_event' pointer.
- Check sysfs validity before access.
- Use pmu style event name, such as "cpu_core/cycles/".
- Move command output two chars to the right.
- Move pmu hybrid functions to new created pmu-hybrid.c/pmu-hybrid.h.
  This is to pass the perf test python case.

Jin Yao (27):
  tools headers uapi: Update tools's copy of linux/perf_event.h
  perf jevents: Support unit value "cpu_core" and "cpu_atom"
  perf pmu: Simplify arguments of __perf_pmu__new_alias
  perf pmu: Save pmu name
  perf pmu: Save detected hybrid pmus to a global pmu list
  perf pmu: Add hybrid helper functions
  perf stat: Uniquify hybrid event name
  perf parse-events: Create two hybrid hardware events
  perf parse-events: Create two hybrid cache events
  perf parse-events: Create two hybrid raw events
  perf pmu: Support 'cycles' and 'branches' inside hybrid PMU
  perf parse-events: Support no alias assigned event inside hybrid PMU
  perf record: Create two hybrid 'cycles' events by default
  perf stat: Add default hybrid events
  perf stat: Filter out unmatched aggregation for hybrid event
  perf stat: Warn group events from different hybrid PMU
  perf script: Support PERF_TYPE_HARDWARE_PMU and PERF_TYPE_HW_CACHE_PMU
  perf

[PATCH v3 2/2] perf test: Add CVS summary test

2021-03-19 Thread Jin Yao
The patch "perf stat: Align CSV output for summary mode" aligned
the CSV output and added "summary" to the first column of summary
lines.

Now we check that the "summary" string is added to the CSV output.

If the '--no-cvs-summary' option is set, the "summary" string is
not added; this case is checked as well.

Signed-off-by: Jin Yao 
---
 v3:
   - New in v3.
 
 tools/perf/tests/shell/stat+cvs_summary.sh | 31 ++
 1 file changed, 31 insertions(+)
 create mode 100755 tools/perf/tests/shell/stat+cvs_summary.sh

diff --git a/tools/perf/tests/shell/stat+cvs_summary.sh 
b/tools/perf/tests/shell/stat+cvs_summary.sh
new file mode 100755
index ..dd14f2ce7f6b
--- /dev/null
+++ b/tools/perf/tests/shell/stat+cvs_summary.sh
@@ -0,0 +1,31 @@
+#!/bin/sh
+# perf stat cvs summary test
+# SPDX-License-Identifier: GPL-2.0
+
+set -e
+
+#
+# 1.001364330 9224197  cycles 8012885033 100.00
+# summary 9224197  cycles 8012885033 100.00
+#
+perf stat -e cycles  -x' ' -I1000 --interval-count 1 --summary 2>&1 | \
+grep -e summary | \
+while read summary num event run pct
+do
+   if [ $summary != "summary" ]; then
+   exit 1
+   fi
+done
+
+#
+# 1.001360298 9148534  cycles 8012853854 100.00
+#9148534  cycles 8012853854 100.00
+#
+perf stat -e cycles  -x' ' -I1000 --interval-count 1 --summary 
--no-cvs-summary 2>&1 | \
+grep -e summary | \
+while read num event run pct
+do
+   exit 1
+done
+
+exit 0
-- 
2.17.1



[PATCH v3 1/2] perf stat: Align CSV output for summary mode

2021-03-19 Thread Jin Yao
,8013356501,100.00,0.36,insn per cycle
  553564,,branches,8013366204,100.00,69.081,K/sec
  54021,,branch-misses,8013375952,100.00,9.76,of all branches

Signed-off-by: Jin Yao 
---
 v3:
   - No change.

 v2:
   - Add new option '--no-cvs-summary'.
   - Add perf config variable 'stat.no-cvs-summary'.

 tools/perf/Documentation/perf-stat.txt | 9 +
 tools/perf/builtin-stat.c  | 7 +++
 tools/perf/util/config.c   | 3 +++
 tools/perf/util/stat-display.c | 6 ++
 tools/perf/util/stat.h | 2 ++
 5 files changed, 27 insertions(+)

diff --git a/tools/perf/Documentation/perf-stat.txt 
b/tools/perf/Documentation/perf-stat.txt
index 3055aad38d46..854597e70406 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -471,6 +471,15 @@ convenient for post processing.
 --summary::
 Print summary for interval mode (-I).
 
+--no-cvs-summary::
+Don't print 'summary' at the first column for CVS summary output.
+This option must be used with -x and --summary.
+
+This option can be enabled in perf config by setting the variable
+'stat.no-cvs-summary'.
+
+$ perf config stat.no-cvs-summary=true
+
 EXAMPLES
 
 
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 2e2e4a8345ea..3823dd5fd6e8 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1083,6 +1083,11 @@ void perf_stat__set_big_num(int set)
stat_config.big_num = (set != 0);
 }
 
+void perf_stat__set_no_cvs_summary(int set)
+{
+   stat_config.no_cvs_summary = (set != 0);
+}
+
 static int stat__set_big_num(const struct option *opt __maybe_unused,
 const char *s __maybe_unused, int unset)
 {
@@ -1235,6 +1240,8 @@ static struct option stat_options[] = {
"threads of same physical core"),
	OPT_BOOLEAN(0, "summary", &stat_config.summary,
	   "print summary for interval mode"),
+   OPT_BOOLEAN(0, "no-cvs-summary", &stat_config.no_cvs_summary,
+  "don't print 'summary' for CVS summary output"),
	OPT_BOOLEAN(0, "quiet", &stat_config.quiet,
"don't print output (useful with record)"),
 #ifdef HAVE_LIBPFM
diff --git a/tools/perf/util/config.c b/tools/perf/util/config.c
index 6984c77068a3..dbf585460791 100644
--- a/tools/perf/util/config.c
+++ b/tools/perf/util/config.c
@@ -457,6 +457,9 @@ static int perf_stat_config(const char *var, const char 
*value)
if (!strcmp(var, "stat.big-num"))
perf_stat__set_big_num(perf_config_bool(var, value));
 
+   if (!strcmp(var, "stat.no-cvs-summary"))
+   perf_stat__set_no_cvs_summary(perf_config_bool(var, value));
+
/* Add other config variables here. */
return 0;
 }
diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index 7f09cdaf5b60..2e7fec0bd8f3 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -439,6 +439,12 @@ static void printout(struct perf_stat_config *config, 
struct aggr_cpu_id id, int
if (counter->cgrp)
os.nfields++;
}
+
+   if (!config->no_cvs_summary && config->csv_output &&
+   config->summary && !config->interval) {
+   fprintf(config->output, "%16s%s", "summary", config->csv_sep);
+   }
+
if (run == 0 || ena == 0 || counter->counts->scaled == -1) {
if (config->metric_only) {
			pm(config, &os, NULL, "", "", 0);
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 41107b8deac5..def0cdc84133 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -128,6 +128,7 @@ struct perf_stat_config {
bool all_user;
bool percore_show_thread;
bool summary;
+   bool no_cvs_summary;
bool metric_no_group;
bool metric_no_merge;
bool stop_read_counter;
@@ -160,6 +161,7 @@ struct perf_stat_config {
 };
 
 void perf_stat__set_big_num(int set);
+void perf_stat__set_no_cvs_summary(int set);
 
 void update_stats(struct stats *stats, u64 val);
 double avg_stats(struct stats *stats);
-- 
2.17.1



Re: [PATCH v2] perf stat: Align CSV output for summary mode

2021-03-18 Thread Jin, Yao

Hi Arnaldo,

On 3/18/2021 9:15 PM, Arnaldo Carvalho de Melo wrote:

Em Wed, Mar 17, 2021 at 02:51:42PM -0700, Andi Kleen escreveu:

If you care about not breaking existing scripts, then the output they
get with what they use as command line options must continue to produce
the same output.


It's not clear there are any useful ones (except for tools that handle
both). It's really hard to parse the previous mess. It's simply not
valid CSV.

That's why I'm arguing that keeping compatibility is not useful here.

We would be stuck with the broken mess as default forever.


Fair enough, lets fix the default then. Jin, can you please consider
adding a 'perf test' shell entry to parse the CSV mode with/without that
summary? This way we'll notice when the new normal gets broken.

- Arnaldo



Thanks Arnaldo! I will post v3 with the perf test script.

Thanks
Jin Yao



Re: [PATCH v2 11/27] perf parse-events: Support hardware events inside PMU

2021-03-17 Thread Jin, Yao

Hi Jiri,

On 3/17/2021 6:06 PM, Jiri Olsa wrote:

On Wed, Mar 17, 2021 at 10:12:03AM +0800, Jin, Yao wrote:



On 3/16/2021 10:04 PM, Jiri Olsa wrote:

On Tue, Mar 16, 2021 at 09:49:42AM +0800, Jin, Yao wrote:

SNIP



   Performance counter stats for 'system wide':

 136,655,302  cpu_core/branch-instructions/

 1.003171561 seconds time elapsed

So we need special rules for both cycles and branches.

The worse thing is, we also need to process the hardware cache events.

# ./perf stat -e cpu_core/LLC-loads/
event syntax error: 'cpu_core/LLC-loads/'
\___ unknown term 'LLC-loads' for pmu 'cpu_core'

valid terms: 
event,pc,edge,offcore_rsp,ldlat,inv,umask,frontend,cmask,config,config1,config2,name,period,percore

Initial error:
event syntax error: 'cpu_core/LLC-loads/'
\___ unknown term 'LLC-loads' for pmu 'cpu_core'

If we use special rules for establishing all event mapping, that looks too 
much. :(


hmmm but wait, currently we do not support events like this:

'cpu/cycles/'
'cpu/branches/'

the pmu style accepts only 'events' or 'format' terms within //

we made hw events like 'cycles','instructions','branches' special
to be used without the pmu

so why do we need to support cpu_code/cycles/ ?

jirka



Actually we have to support pmu style event for hybrid platform.

User may want to enable the events from specified pmus and also with flexible 
grouping.

For example,

perf stat -e '{cpu_core/cycles/,cpu_core/instructions/}' -e 
'{cpu_atom/cycles/,cpu_atom/instructions/}'

This usage is common and reasonable. So I think we may need to support pmu 
style events.


sure, but we don't support 'cpu/cycles/' but we support 'cpu/cpu-cycles/'
why do you insist on supporting cpu_core/cycles/ ?

jirka



I'm OK to only support 'cpu_core/cpu-cycles/' or 'cpu_atom/cpu-cycles/'. But what would we do for 
cache event?


'perf stat -e LLC-loads' is OK, but 'perf stat -e cpu/LLC-loads/' is not 
supported currently.

For a hybrid platform, the user may only want to enable LLC-loads on core CPUs or on atom CPUs. That's
reasonable. But if we don't support the pmu style event, how do we satisfy this requirement?


If we can support the pmu style event, we can also use the same way for cpu_core/cycles/. At least 
it's not a bad thing, right? :)


Thanks
Jin Yao


[PATCH v2] perf stat: Align CSV output for summary mode

2021-03-17 Thread Jin Yao
,8013356501,100.00,0.36,insn per cycle
  553564,,branches,8013366204,100.00,69.081,K/sec
  54021,,branch-misses,8013375952,100.00,9.76,of all branches

Signed-off-by: Jin Yao 
---
 v2:
   - Add new option '--no-cvs-summary'.
   - Add perf config variable 'stat.no-cvs-summary'.

 tools/perf/Documentation/perf-stat.txt | 9 +
 tools/perf/builtin-stat.c  | 7 +++
 tools/perf/util/config.c   | 3 +++
 tools/perf/util/stat-display.c | 6 ++
 tools/perf/util/stat.h | 2 ++
 5 files changed, 27 insertions(+)

diff --git a/tools/perf/Documentation/perf-stat.txt 
b/tools/perf/Documentation/perf-stat.txt
index 3055aad38d46..854597e70406 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -471,6 +471,15 @@ convenient for post processing.
 --summary::
 Print summary for interval mode (-I).
 
+--no-cvs-summary::
+Don't print 'summary' at the first column for CVS summary output.
+This option must be used with -x and --summary.
+
+This option can be enabled in perf config by setting the variable
+'stat.no-cvs-summary'.
+
+$ perf config stat.no-cvs-summary=true
+
 EXAMPLES
 
 
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 2e2e4a8345ea..3823dd5fd6e8 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1083,6 +1083,11 @@ void perf_stat__set_big_num(int set)
stat_config.big_num = (set != 0);
 }
 
+void perf_stat__set_no_cvs_summary(int set)
+{
+   stat_config.no_cvs_summary = (set != 0);
+}
+
 static int stat__set_big_num(const struct option *opt __maybe_unused,
 const char *s __maybe_unused, int unset)
 {
@@ -1235,6 +1240,8 @@ static struct option stat_options[] = {
"threads of same physical core"),
	OPT_BOOLEAN(0, "summary", &stat_config.summary,
	   "print summary for interval mode"),
+   OPT_BOOLEAN(0, "no-cvs-summary", &stat_config.no_cvs_summary,
+  "don't print 'summary' for CVS summary output"),
	OPT_BOOLEAN(0, "quiet", &stat_config.quiet,
"don't print output (useful with record)"),
 #ifdef HAVE_LIBPFM
diff --git a/tools/perf/util/config.c b/tools/perf/util/config.c
index 6984c77068a3..dbf585460791 100644
--- a/tools/perf/util/config.c
+++ b/tools/perf/util/config.c
@@ -457,6 +457,9 @@ static int perf_stat_config(const char *var, const char 
*value)
if (!strcmp(var, "stat.big-num"))
perf_stat__set_big_num(perf_config_bool(var, value));
 
+   if (!strcmp(var, "stat.no-cvs-summary"))
+   perf_stat__set_no_cvs_summary(perf_config_bool(var, value));
+
/* Add other config variables here. */
return 0;
 }
diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index 7f09cdaf5b60..2e7fec0bd8f3 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -439,6 +439,12 @@ static void printout(struct perf_stat_config *config, 
struct aggr_cpu_id id, int
if (counter->cgrp)
os.nfields++;
}
+
+   if (!config->no_cvs_summary && config->csv_output &&
+   config->summary && !config->interval) {
+   fprintf(config->output, "%16s%s", "summary", config->csv_sep);
+   }
+
if (run == 0 || ena == 0 || counter->counts->scaled == -1) {
if (config->metric_only) {
			pm(config, &os, NULL, "", "", 0);
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 41107b8deac5..def0cdc84133 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -128,6 +128,7 @@ struct perf_stat_config {
bool all_user;
bool percore_show_thread;
bool summary;
+   bool no_cvs_summary;
bool metric_no_group;
bool metric_no_merge;
bool stop_read_counter;
@@ -160,6 +161,7 @@ struct perf_stat_config {
 };
 
 void perf_stat__set_big_num(int set);
+void perf_stat__set_no_cvs_summary(int set);
 
 void update_stats(struct stats *stats, u64 val);
 double avg_stats(struct stats *stats);
-- 
2.17.1



Re: [PATCH v2 11/27] perf parse-events: Support hardware events inside PMU

2021-03-16 Thread Jin, Yao




On 3/16/2021 10:04 PM, Jiri Olsa wrote:

On Tue, Mar 16, 2021 at 09:49:42AM +0800, Jin, Yao wrote:

SNIP



  Performance counter stats for 'system wide':

136,655,302  cpu_core/branch-instructions/

1.003171561 seconds time elapsed

So we need special rules for both cycles and branches.

The worse thing is, we also need to process the hardware cache events.

# ./perf stat -e cpu_core/LLC-loads/
event syntax error: 'cpu_core/LLC-loads/'
   \___ unknown term 'LLC-loads' for pmu 'cpu_core'

valid terms: 
event,pc,edge,offcore_rsp,ldlat,inv,umask,frontend,cmask,config,config1,config2,name,period,percore

Initial error:
event syntax error: 'cpu_core/LLC-loads/'
   \___ unknown term 'LLC-loads' for pmu 'cpu_core'

If we use special rules for establishing all event mapping, that looks too 
much. :(


hmmm but wait, currently we do not support events like this:

   'cpu/cycles/'
   'cpu/branches/'

the pmu style accepts only 'events' or 'format' terms within //

we made hw events like 'cycles','instructions','branches' special
to be used without the pmu

so why do we need to support cpu_code/cycles/ ?

jirka



Actually we have to support pmu style event for hybrid platform.

User may want to enable the events from specified pmus and also with flexible 
grouping.

For example,

perf stat -e '{cpu_core/cycles/,cpu_core/instructions/}' -e 
'{cpu_atom/cycles/,cpu_atom/instructions/}'

This usage is common and reasonable. So I think we may need to support pmu 
style events.

Thanks
Jin Yao



Re: [PATCH] perf stat: Align CSV output for summary mode

2021-03-16 Thread Jin, Yao




On 3/17/2021 9:30 AM, Andi Kleen wrote:

Is it serious or just a joke? :)


I would prefer to not be compatible (at least not until someone complains),
but if compatibility is required then yes opting in to the broken
format would be better. Perhaps not with that name.

And the option could be hidden in the perf config file instead
of being on the command line.

-Andi



That makes sense, thanks Andi!

Thanks
Jin Yao


Re: [PATCH] perf stat: Align CSV output for summary mode

2021-03-16 Thread Jin, Yao



On 3/17/2021 5:55 AM, Jiri Olsa wrote:

On Tue, Mar 16, 2021 at 01:02:20PM -0700, Andi Kleen wrote:

On Tue, Mar 16, 2021 at 04:05:13PM -0300, Arnaldo Carvalho de Melo wrote:

Em Tue, Mar 16, 2021 at 09:34:21AM -0700, Andi Kleen escreveu:

looks ok, but maybe make the option more related to CVS, like:

   --x-summary, --cvs-summary  ...?


Actually I don't think it should be a new option. I doubt
anyone could parse the previous mess. So just make it default
with -x


In these cases I always fear that people are already parsing that mess
by considering the summary lines to be the ones not starting with
spaces, and now we go on and change it to be "better" by prefixing it
with "summary" and... break existing scripts.


I think it was just one version or so?

FWIW perf has broken CSV output several times, I added workarounds
to toplev every time. Having a broken version for a short time
shouldn't be too bad.

I actually had a workaround for this one, but it can parse either way.



Can we do this with a new option?

I.e. like --cvs-summary?


If you do it I would add an option for the old broken format
--i-want-broken-csv. But not  require the option forever
just to get sane output.


I like that.. also we'll find out how many people are actually parsing that ;-)

jirka



Is it serious or just a joke? :)

Thanks
Jin Yao



Or maybe only a perf config option.

-Andi





[PATCH] perf stat: Align CSV output for summary mode

2021-03-16 Thread Jin Yao
perf-stat supports the summary mode, but the summary
lines break the CSV output, so it's hard for scripts to parse
the result.

Before:

  # perf stat -x, -I1000 --interval-count 1 --summary
   1.001323097,8013.48,msec,cpu-clock,8013483384,100.00,8.013,CPUs utilized
   1.001323097,270,,context-switches,8013513297,100.00,0.034,K/sec
   1.001323097,13,,cpu-migrations,8013530032,100.00,0.002,K/sec
   1.001323097,184,,page-faults,8013546992,100.00,0.023,K/sec
   1.001323097,20574191,,cycles,8013551506,100.00,0.003,GHz
   1.001323097,10562267,,instructions,8013564958,100.00,0.51,insn per cycle
   1.001323097,2019244,,branches,8013575673,100.00,0.252,M/sec
   1.001323097,106152,,branch-misses,8013585776,100.00,5.26,of all branches
  8013.48,msec,cpu-clock,8013483384,100.00,7.984,CPUs utilized
  270,,context-switches,8013513297,100.00,0.034,K/sec
  13,,cpu-migrations,8013530032,100.00,0.002,K/sec
  184,,page-faults,8013546992,100.00,0.023,K/sec
  20574191,,cycles,8013551506,100.00,0.003,GHz
  10562267,,instructions,8013564958,100.00,0.51,insn per cycle
  2019244,,branches,8013575673,100.00,0.252,M/sec
  106152,,branch-misses,8013585776,100.00,5.26,of all branches

The summary lines lose the timestamp column, which breaks the
CSV output.

We add a column at the 'timestamp' position which just says 'summary'
for the summary lines.

After:

  # perf stat -x, -I1000 --interval-count 1 --summary
   1.001196053,8012.72,msec,cpu-clock,8012722903,100.00,8.013,CPUs utilized
   1.001196053,218,,context-switches,8012753271,100.00,0.027,K/sec
   1.001196053,9,,cpu-migrations,8012769767,100.00,0.001,K/sec
   1.001196053,0,,page-faults,8012786257,100.00,0.000,K/sec
   1.001196053,15004518,,cycles,8012790637,100.00,0.002,GHz
   1.001196053,7954691,,instructions,8012804027,100.00,0.53,insn per cycle
   1.001196053,1590259,,branches,8012814766,100.00,0.198,M/sec
   1.001196053,82601,,branch-misses,8012824365,100.00,5.19,of all branches
   summary,8012.72,msec,cpu-clock,8012722903,100.00,7.986,CPUs utilized
   summary,218,,context-switches,8012753271,100.00,0.027,K/sec
   summary,9,,cpu-migrations,8012769767,100.00,0.001,K/sec
   summary,0,,page-faults,8012786257,100.00,0.000,K/sec
   summary,15004518,,cycles,8012790637,100.00,0.002,GHz
   summary,7954691,,instructions,8012804027,100.00,0.53,insn per cycle
   summary,1590259,,branches,8012814766,100.00,0.198,M/sec
   summary,82601,,branch-misses,8012824365,100.00,5.19,of all branches

Now it's easy for scripts to parse the summary lines.
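
As an example of what post-processing can now do, here is a minimal
standalone C sketch (not part of the patch) that splits summary rows from
interval rows by looking at the first CSV field:

  #include <stdio.h>
  #include <string.h>

  /* Read perf-stat CSV output (-x,) from stdin. */
  int main(void)
  {
          char line[512];

          while (fgets(line, sizeof(line), stdin)) {
                  const char *p = line + strspn(line, " \t");  /* skip leading blanks */

                  /* The first field is either a timestamp or the literal "summary". */
                  if (!strncmp(p, "summary,", 8))
                          printf("summary  row: %s", line);
                  else
                          printf("interval row: %s", line);
          }
          return 0;
  }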

Signed-off-by: Jin Yao 
---
 tools/perf/util/stat-display.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index 7f09cdaf5b60..c4183d3e87a4 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -439,6 +439,10 @@ static void printout(struct perf_stat_config *config, 
struct aggr_cpu_id id, int
if (counter->cgrp)
os.nfields++;
}
+
+   if (config->csv_output && config->summary && !config->interval)
+   fprintf(config->output, "%16s%s", "summary", config->csv_sep);
+
if (run == 0 || ena == 0 || counter->counts->scaled == -1) {
if (config->metric_only) {
			pm(config, &os, NULL, "", "", 0);
-- 
2.17.1



Re: [PATCH v2 16/27] perf evlist: Warn as events from different hybrid PMUs in a group

2021-03-15 Thread Jin, Yao

Hi Jiri,

On 3/16/2021 7:03 AM, Jiri Olsa wrote:

On Thu, Mar 11, 2021 at 03:07:31PM +0800, Jin Yao wrote:

SNIP


goto try_again;
}
+
+   if (errno == EINVAL && perf_pmu__hybrid_exist())
+   evlist__warn_hybrid_group(evlist);
rc = -errno;
evsel__open_strerror(pos, &opts->target, errno, msg, 
sizeof(msg));
ui__error("%s\n", msg);
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 7a732508b2b4..6f780a039db0 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -239,6 +239,9 @@ static void evlist__check_cpu_maps(struct evlist *evlist)
struct evsel *evsel, *pos, *leader;
char buf[1024];
  
+	if (evlist__hybrid_exist(evlist))

+   return;


this should be in separate patch and explained



Now I have another idea. If a group consists of atom events and core events, do we still follow the
current solution of disabling the group?


I mean removing following code:

if (evlist__hybrid_exist(evlist))
return;

evlist__check_cpu_maps then continues running and disabling the group. But it also reports a
warning that says "WARNING: Group has events from different hybrid PMUs".


Do you like this way?


+
evlist__for_each_entry(evlist, evsel) {
leader = evsel->leader;
  
@@ -726,6 +729,10 @@ enum counter_recovery {

  static enum counter_recovery stat_handle_error(struct evsel *counter)
  {
char msg[BUFSIZ];
+
+   if (perf_pmu__hybrid_exist() && errno == EINVAL)
+   evlist__warn_hybrid_group(evsel_list);
+
/*
 * PPC returns ENXIO for HW counters until 2.6.37
 * (behavior changed with commit b0a873e).
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index f139151b9433..5ec891418cdd 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -2224,3 +2224,47 @@ void evlist__invalidate_all_cpus(struct evlist *evlist)
perf_cpu_map__put(evlist->core.all_cpus);
evlist->core.all_cpus = perf_cpu_map__empty_new(1);
  }
+
+static bool group_hybrid_conflict(struct evsel *leader)
+{
+   struct evsel *pos, *prev = NULL;
+
+   for_each_group_evsel(pos, leader) {
+   if (!pos->pmu_name || !perf_pmu__is_hybrid(pos->pmu_name))
+   continue;
+
+   if (prev && strcmp(prev->pmu_name, pos->pmu_name))
+   return true;
+
+   prev = pos;
+   }
+
+   return false;
+}
+
+void evlist__warn_hybrid_group(struct evlist *evlist)
+{
+   struct evsel *evsel;
+
+   evlist__for_each_entry(evlist, evsel) {
+   if (evsel__is_group_leader(evsel) &&
+   evsel->core.nr_members > 1 &&


hm, could we just iterate all the members and make sure the first found
hybrid event's pmu matches the pmu of the rest hybrid events in the list?



'{cpu_core/event1/,cpu_core/event2/}','{cpu_atom/event3/,cpu_atom/event4/}'

Two or more groups need to be supported. We get the first hybrid event's pmu (cpu_core in this 
example) but it doesn't match the cpu_atom/event3/ and cpu_atom/event4/. But actually this case 
should be supported, right?



+   group_hybrid_conflict(evsel)) {
+   WARN_ONCE(1, "WARNING: Group has events from "
+"different hybrid PMUs\n");
+   return;
+   }
+   }
+}
+
+bool evlist__hybrid_exist(struct evlist *evlist)


evlist__has_hybrid seems better



Yes, agree.

Thanks
Jin Yao



jirka



Re: [PATCH v2 17/27] perf evsel: Adjust hybrid event and global event mixed group

2021-03-15 Thread Jin, Yao

Hi Jiri,

On 3/16/2021 7:04 AM, Jiri Olsa wrote:

On Thu, Mar 11, 2021 at 03:07:32PM +0800, Jin Yao wrote:

A group mixed with hybrid event and global event is allowed. For example,
group leader is 'cpu-clock' and the group member is 'cpu_atom/cycles/'.

e.g.
perf stat -e '{cpu-clock,cpu_atom/cycles/}' -a

The challenge is their available cpus are not fully matched.
For example, 'cpu-clock' is available on CPU0-CPU23, but 'cpu_atom/cycles/'
is available on CPU16-CPU23.

When getting the group id for group member, we must be very careful
because the cpu for 'cpu-clock' is not equal to the cpu for 'cpu_atom/cycles/'.
Actually the cpu here is the index of evsel->core.cpus, not the real CPU ID.
e.g. cpu0 for 'cpu-clock' is CPU0, but cpu0 for 'cpu_atom/cycles/' is CPU16.

Another challenge is for group read. The events in group may be not
available on all cpus. For example the leader is a software event and
it's available on CPU0-CPU1, but the group member is a hybrid event and
it's only available on CPU1. For CPU0, we have only one event, but for CPU1
we have two events. So we need to change the read size according to
the real number of events on that cpu.
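
To illustrate why the read size matters, here is a small standalone sketch
(not part of the patch) of the group read layout used with PERF_FORMAT_GROUP
plus the TOTAL_TIME_ENABLED/TOTAL_TIME_RUNNING/ID flags quoted elsewhere in
this thread; 'nr' is what differs between a cpu with one event and a cpu
with two:

  #include <stdio.h>
  #include <stdint.h>

  /* Layout returned by read() on a group leader with those read_format flags. */
  struct group_value {
          uint64_t value;
          uint64_t id;
  };

  struct group_read {
          uint64_t nr;              /* number of events counted on this cpu */
          uint64_t time_enabled;
          uint64_t time_running;
          struct group_value v[];   /* nr entries follow */
  };

  static size_t group_read_size(uint64_t nr_members)
  {
          /* three leading u64s plus two u64s per member */
          return sizeof(uint64_t) * (3 + 2 * nr_members);
  }

  int main(void)
  {
          /* e.g. CPU0: only the software leader (cpu-clock) counts there */
          printf("1 event : %zu bytes\n", group_read_size(1));
          /* e.g. CPU16: leader plus the cpu_atom hybrid member */
          printf("2 events: %zu bytes\n", group_read_size(2));
          return 0;
  }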


ugh, this is really bad.. do we really want to support it? ;-)
I guess we need that for metrics..



Yes, it's a bit of a pain but the use case makes sense. Some metrics need an event group which
consists of a global event + a hybrid event.


For example, CPU_Utilization = 'cpu_clk_unhalted.ref_tsc' / 'msr/tsc/'.

'msr/tsc/' is a global event. It's valid on all CPUs.

But 'cpu_clk_unhalted.ref' is hybrid event.
'cpu_core/cpu_clk_unhalted.ref/' is valid on core CPUs
'cpu_atom/cpu_clk_unhalted.ref/' is valid on atom CPUs.

So we have to support this usage. :)


SNIP



Performance counter stats for 'system wide':

24,059.14 msec cpu-clock #   23.994 CPUs utilized
6,406,677,892  cpu_atom/cycles/  #  266.289 M/sec

  1.002699058 seconds time elapsed

For cpu_atom/cycles/, cpu16-cpu23 are set with valid group fd (cpu-clock's fd
on that cpu). For counting results, cpu-clock has 24 cpus aggregation and
cpu_atom/cycles/ has 8 cpus aggregation. That's expected.

But if the event order is changed, e.g. '{cpu_atom/cycles/,cpu-clock}',
there leaves more works to do.

   root@ssp-pwrt-002:~# ./perf stat -e '{cpu_atom/cycles/,cpu-clock}' -a -vvv 
-- sleep 1


what id you add the other hybrid pmu event? or just cycles?



Do you mean the config for cpu_atom/cycles/? Let's see the log.

root@ssp-pwrt-002:~# perf stat -e '{cpu_atom/cycles/,cpu-clock}' -a -vvv -- 
sleep 1
Control descriptor is not initialized

perf_event_attr:
  type 6
  size 120
  config   0xa
  sample_type  IDENTIFIER
  read_format  
TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
  disabled 1
  inherit  1
  exclude_guest    1

sys_perf_event_open: pid -1  cpu 16  group_fd -1  flags 0x8 = 3
sys_perf_event_open: pid -1  cpu 17  group_fd -1  flags 0x8 = 4
sys_perf_event_open: pid -1  cpu 18  group_fd -1  flags 0x8 = 5
sys_perf_event_open: pid -1  cpu 19  group_fd -1  flags 0x8 = 7
sys_perf_event_open: pid -1  cpu 20  group_fd -1  flags 0x8 = 8
sys_perf_event_open: pid -1  cpu 21  group_fd -1  flags 0x8 = 9
sys_perf_event_open: pid -1  cpu 22  group_fd -1  flags 0x8 = 10
sys_perf_event_open: pid -1  cpu 23  group_fd -1  flags 0x8 = 11

perf_event_attr:
  type 1
  size 120
  sample_type  IDENTIFIER
  read_format  
TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
  inherit  1
  exclude_guest    1

sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 12
sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 13
sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 14
sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 15
sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8 = 16
sys_perf_event_open: pid -1  cpu 5  group_fd -1  flags 0x8 = 17
sys_perf_event_open: pid -1  cpu 6  group_fd -1  flags 0x8 = 18
sys_perf_event_open: pid -1  cpu 7  group_fd -1  flags 0x8 = 19
sys_perf_event_open: pid -1  cpu 8  group_fd -1  flags 0x8 = 20
sys_perf_event_open: pid -1  cpu 9  group_fd -1  flags 0x8 = 21
sys_perf_event_open: pid -1  cpu 10  group_fd -1  flags 0x8 = 22
sys_perf_event_open: pid -1  cpu 11  group_fd -1  flags 0x8 = 23
sys_perf_event_open: pid -1  cpu 12  group_fd -1  flags 0x8 = 24
sys_perf_event_open: pid -1  cpu 13  group_fd -1  flags 0x8 =

Re: [PATCH v2 10/27] perf parse-events: Create two hybrid cache events

2021-03-15 Thread Jin, Yao

Hi Jiri,

On 3/16/2021 7:05 AM, Jiri Olsa wrote:

On Thu, Mar 11, 2021 at 03:07:25PM +0800, Jin Yao wrote:

SNIP


+   config_terms, pmu);
+   if (ret)
+   return ret;
+   }
+
+   return 0;
+}
+
  int parse_events_add_cache(struct list_head *list, int *idx,
   char *type, char *op_result1, char *op_result2,
   struct parse_events_error *err,
@@ -474,7 +516,8 @@ int parse_events_add_cache(struct list_head *list, int *idx,
char name[MAX_NAME_LEN], *config_name;
int cache_type = -1, cache_op = -1, cache_result = -1;
char *op_result[2] = { op_result1, op_result2 };
-   int i, n;
+   int i, n, ret;
+   bool hybrid;
  
  	/*

 * No fallback - if we cannot get a clear cache type
@@ -534,6 +577,15 @@ int parse_events_add_cache(struct list_head *list, int 
*idx,
	if (get_config_terms(head_config, &config_terms))
return -ENOMEM;
}
+
+   if (!perf_pmu__hybrid_exist())
+   perf_pmu__scan(NULL);


actualy how about construct like:

perf_pmu_is_hybrid()
return hybrid_add_event_cache(...)

return add_event(...)


with:
perf_pmu_is_hybrid()
{
static bool initialized;

if (!initialized) {
initialized = true;
perf_pmu__scan(NULL)
}

return ...
}

jirka



Thanks, that's a good solution. I will do that in v3.

Thanks
Jin Yao


+
+   ret = add_hybrid_cache(list, idx, &attr, config_name ? : name,
+                          &config_terms, &hybrid);
+   if (hybrid)
+   return ret;
+
	return add_event(list, idx, &attr, config_name ? : name, &config_terms);
  }
  
--

2.17.1





Re: [PATCH v2 09/27] perf parse-events: Create two hybrid hardware events

2021-03-15 Thread Jin, Yao

Hi Jiri,

On 3/16/2021 7:05 AM, Jiri Olsa wrote:

On Thu, Mar 11, 2021 at 03:07:24PM +0800, Jin Yao wrote:

For hardware events, they have pre-defined configs. The kernel
needs to know where the event comes from (e.g. from cpu_core pmu
or from cpu_atom pmu). But the perf type 'PERF_TYPE_HARDWARE'
can't carry pmu information.

So the kernel introduces a new type 'PERF_TYPE_HARDWARE_PMU'.
The new attr.config layout for PERF_TYPE_HARDWARE_PMU is:

0xDD00AA
AA: original hardware event ID
DD: PMU type ID

PMU type ID is retrieved from sysfs. For example,

   cat /sys/devices/cpu_atom/type
   10

   cat /sys/devices/cpu_core/type
   4

When enabling a hybrid hardware event without specified pmu, such as,
'perf stat -e cycles -a', two events are created automatically. One
is for atom, the other is for core.


ok I think I understand the need for this (and the following) patch
the perf_hw_id counters could be global, so when you specify only
event like:

-e cycles

you want all the cycles, which on hybrid system means cycles from
more than one pmus



Yes, on a hybrid system it means the cycles from two pmus. One 'cycles' event is from the cpu_core
pmu, the other is from the cpu_atom pmu.



SNIP


@@ -1416,6 +1475,8 @@ int parse_events_add_numeric(struct parse_events_state 
*parse_state,
  {
struct perf_event_attr attr;
LIST_HEAD(config_terms);
+   bool hybrid;
+   int ret;
  
	memset(&attr, 0, sizeof(attr));

attr.type = type;
@@ -1430,6 +1491,18 @@ int parse_events_add_numeric(struct parse_events_state 
*parse_state,
return -ENOMEM;
}
  
+	/*

+* Skip the software dummy event.
+*/
+   if (type != PERF_TYPE_SOFTWARE) {
+   if (!perf_pmu__hybrid_exist())
+   perf_pmu__scan(NULL);


this could be checked in the following add_hybrid_numeric call



Yes, that should be OK. I will move the check in the next version.


+
+   ret = add_hybrid_numeric(parse_state, list, &attr, &hybrid);
+   if (hybrid)
+   return ret;
+   }


could we add this to separate object.. hybrid.c or maybe parse-events-hybrid.c,

there's already global __add_event wrapper - parse_events__add_event


jirka



Use a new parse-events-hybrid.c, hmm, well that's OK.

Thanks
Jin Yao


+
	return add_event(list, &parse_state->idx, &attr,
			 get_config_name(head_config), &config_terms);
  }
--
2.17.1





Re: [PATCH v2 11/27] perf parse-events: Support hardware events inside PMU

2021-03-15 Thread Jin, Yao

Hi Jiri,

On 3/16/2021 1:37 AM, Jiri Olsa wrote:

On Mon, Mar 15, 2021 at 10:28:12AM +0800, Jin, Yao wrote:

Hi Jiri,

On 3/13/2021 3:15 AM, Jiri Olsa wrote:

On Thu, Mar 11, 2021 at 03:07:26PM +0800, Jin Yao wrote:

On hybrid platform, some hardware events are only available
on a specific pmu. For example, 'L1-dcache-load-misses' is only
available on 'cpu_core' pmu. And even for the event which can be
available on both pmus, the user also may want to just enable
one event. So now following syntax is supported:

cpu_core//
cpu_core//
cpu_core//

cpu_atom//
cpu_atom//
cpu_atom//

It limits the event to be enabled only on a specified pmu.

The patch uses this idea, for example, if we use "cpu_core/LLC-loads/",
in parse_events_add_pmu(), term->config is "LLC-loads".


hum, I don't understand how this doest not work even now,
I assume both cpu_core and cpu_atom have sysfs device directory
with events/ directory right?



Yes, we have cpu_core and cpu_atom directories with events.

root@ssp-pwrt-002:/sys/devices/cpu_atom/events# ls
branch-instructions  bus-cycles  cache-references  instructions  mem-stores  topdown-bad-spec  topdown-fe-bound
branch-misses  cache-misses  cpu-cycles  mem-loads  ref-cycles  topdown-be-bound  topdown-retiring

root@ssp-pwrt-002:/sys/devices/cpu_core/events# ls
branch-instructions  cache-misses      instructions   mem-stores  topdown-bad-spec       topdown-fe-bound   topdown-mem-bound
branch-misses        cache-references  mem-loads      ref-cycles  topdown-be-bound       topdown-fetch-lat  topdown-retiring
bus-cycles           cpu-cycles        mem-loads-aux  slots       topdown-br-mispredict  topdown-heavy-ops


and whatever is defined in events we allow in parsing syntax..

why can't we treat them like 2 separated pmus?



But without this patch, it reports the error,

root@ssp-pwrt-002:~# ./perf stat -e cpu_core/cycles/ -a -vv -- sleep 1
event syntax error: 'cpu_core/cycles/'
   \___ unknown term 'cycles' for pmu 'cpu_core'


yep, because there's special care for 'cycles' unfortunately,
but you should be able to run 'cpu_core/cpu-cycles/' right?



Yes, cpu_core/cpu-cycles/ is OK.

# ./perf stat -e cpu_core/cpu-cycles/ -a -- sleep 1

 Performance counter stats for 'system wide':

12,831,980,326  cpu_core/cpu-cycles/

   1.003132639 seconds time elapsed



valid terms: 
event,pc,edge,offcore_rsp,ldlat,inv,umask,frontend,cmask,config,config1,config2,name,period,percore

Initial error:
event syntax error: 'cpu_core/cycles/'
   \___ unknown term 'cycles' for pmu 'cpu_core'

valid terms: 
event,pc,edge,offcore_rsp,ldlat,inv,umask,frontend,cmask,config,config1,config2,name,period,percore
Run 'perf list' for a list of valid events

The 'cycles' is treated as an unknown term, then it errors out.


yep, because it's not in events.. we could add special rule to
treat cycles as cpu-cycles inside pmu definition ;-)

jirka



But not only the cycles, the branches has error too.

# ./perf stat -e cpu_core/branches/ -a -- sleep 1
event syntax error: 'cpu_core/branches/'
  \___ unknown term 'branches' for pmu 'cpu_core'

valid terms: 
event,pc,edge,offcore_rsp,ldlat,inv,umask,frontend,cmask,config,config1,config2,name,period,percore


Initial error:
event syntax error: 'cpu_core/branches/'
  \___ unknown term 'branches' for pmu 'cpu_core'

Of course, branch-instructions runs OK.

# ./perf stat -e cpu_core/branch-instructions/ -a -- sleep 1

 Performance counter stats for 'system wide':

   136,655,302  cpu_core/branch-instructions/

   1.003171561 seconds time elapsed

So we need special rules for both cycles and branches.

The worse thing is, we also need to process the hardware cache events.

# ./perf stat -e cpu_core/LLC-loads/
event syntax error: 'cpu_core/LLC-loads/'
  \___ unknown term 'LLC-loads' for pmu 'cpu_core'

valid terms: 
event,pc,edge,offcore_rsp,ldlat,inv,umask,frontend,cmask,config,config1,config2,name,period,percore


Initial error:
event syntax error: 'cpu_core/LLC-loads/'
  \___ unknown term 'LLC-loads' for pmu 'cpu_core'

If we use special rules for establishing all event mapping, that looks too 
much. :(

Thanks
Jin Yao


Re: [PATCH v2 04/27] perf pmu: Save pmu name

2021-03-15 Thread Jin, Yao

Hi Jiri,

On 3/16/2021 7:03 AM, Jiri Olsa wrote:

On Thu, Mar 11, 2021 at 03:07:19PM +0800, Jin Yao wrote:

On hybrid platform, one event is available on one pmu
(such as, available on cpu_core or on cpu_atom).

This patch saves the pmu name to the pmu field of struct perf_pmu_alias.
Then next we can know the pmu which the event can be available on.

Signed-off-by: Jin Yao 
---
  tools/perf/util/pmu.c | 10 +-
  tools/perf/util/pmu.h |  1 +
  2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 54e586bf19a5..45d8db1af8d2 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -283,6 +283,7 @@ void perf_pmu_free_alias(struct perf_pmu_alias *newalias)
	zfree(&newalias->str);
	zfree(&newalias->metric_expr);
	zfree(&newalias->metric_name);
+   zfree(&newalias->pmu);
	parse_events_terms__purge(&newalias->terms);
free(newalias);
  }
@@ -297,6 +298,10 @@ static bool perf_pmu_merge_alias(struct perf_pmu_alias 
*newalias,
  
  	list_for_each_entry(a, alist, list) {

if (!strcasecmp(newalias->name, a->name)) {
+   if (newalias->pmu && a->pmu &&
+   !strcasecmp(newalias->pmu, a->pmu)) {
+   continue;
+   }
perf_pmu_update_alias(a, newalias);
perf_pmu_free_alias(newalias);
return true;
@@ -314,7 +319,8 @@ static int __perf_pmu__new_alias(struct list_head *list, 
char *dir, char *name,
int num;
char newval[256];
char *long_desc = NULL, *topic = NULL, *unit = NULL, *perpkg = NULL,
-*metric_expr = NULL, *metric_name = NULL, *deprecated = NULL;
+*metric_expr = NULL, *metric_name = NULL, *deprecated = NULL,
+*pmu = NULL;
  
  	if (pe) {

long_desc = (char *)pe->long_desc;
@@ -324,6 +330,7 @@ static int __perf_pmu__new_alias(struct list_head *list, 
char *dir, char *name,
metric_expr = (char *)pe->metric_expr;
metric_name = (char *)pe->metric_name;
deprecated = (char *)pe->deprecated;
+   pmu = (char *)pe->pmu;
}
  
  	alias = malloc(sizeof(*alias));

@@ -389,6 +396,7 @@ static int __perf_pmu__new_alias(struct list_head *list, 
char *dir, char *name,
}
	alias->per_pkg = perpkg && sscanf(perpkg, "%d", &num) == 1 && num == 1;
alias->str = strdup(newval);
+   alias->pmu = pmu ? strdup(pmu) : NULL;
  
  	if (deprecated)

alias->deprecated = true;
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 8164388478c6..0e724d5b84c6 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -72,6 +72,7 @@ struct perf_pmu_alias {
bool deprecated;
char *metric_expr;
char *metric_name;
+   char *pmu;


please use pmu_name

thanks,
jirka



OK, I will use pmu_name in next version.

Thanks
Jin Yao



Re: [PATCH v2 11/27] perf parse-events: Support hardware events inside PMU

2021-03-14 Thread Jin, Yao

Hi Jiri,

On 3/13/2021 3:15 AM, Jiri Olsa wrote:

On Thu, Mar 11, 2021 at 03:07:26PM +0800, Jin Yao wrote:

On hybrid platform, some hardware events are only available
on a specific pmu. For example, 'L1-dcache-load-misses' is only
available on 'cpu_core' pmu. And even for the event which can be
available on both pmus, the user also may want to just enable
one event. So now following syntax is supported:

cpu_core//
cpu_core//
cpu_core//

cpu_atom//
cpu_atom//
cpu_atom//

It limits the event to be enabled only on a specified pmu.

The patch uses this idea, for example, if we use "cpu_core/LLC-loads/",
in parse_events_add_pmu(), term->config is "LLC-loads".


hum, I don't understand how this doest not work even now,
I assume both cpu_core and cpu_atom have sysfs device directory
with events/ directory right?



Yes, we have cpu_core and cpu_atom directories with events.

root@ssp-pwrt-002:/sys/devices/cpu_atom/events# ls
branch-instructions  bus-cycles  cache-references  instructions  mem-stores  topdown-bad-spec  topdown-fe-bound
branch-misses  cache-misses  cpu-cycles  mem-loads  ref-cycles  topdown-be-bound  topdown-retiring


root@ssp-pwrt-002:/sys/devices/cpu_core/events# ls
branch-instructions  cache-misses      instructions   mem-stores  topdown-bad-spec       topdown-fe-bound   topdown-mem-bound
branch-misses        cache-references  mem-loads      ref-cycles  topdown-be-bound       topdown-fetch-lat  topdown-retiring
bus-cycles           cpu-cycles        mem-loads-aux  slots       topdown-br-mispredict  topdown-heavy-ops



and whatever is defined in events we allow in parsing syntax..

why can't we treat them like 2 separated pmus?



But without this patch, it reports the error,

root@ssp-pwrt-002:~# ./perf stat -e cpu_core/cycles/ -a -vv -- sleep 1
event syntax error: 'cpu_core/cycles/'
  \___ unknown term 'cycles' for pmu 'cpu_core'

valid terms: 
event,pc,edge,offcore_rsp,ldlat,inv,umask,frontend,cmask,config,config1,config2,name,period,percore


Initial error:
event syntax error: 'cpu_core/cycles/'
  \___ unknown term 'cycles' for pmu 'cpu_core'

valid terms: 
event,pc,edge,offcore_rsp,ldlat,inv,umask,frontend,cmask,config,config1,config2,name,period,percore

Run 'perf list' for a list of valid events

The 'cycles' is treated as an unknown term, then it errors out.

So we have to create another parser to scan the term.

Thanks
Jin Yao


thanks,
jirka



We create a new "parse_events_state" with the pmu_name and use
parse_events__scanner to scan the term->config (the string "LLC-loads"
in this example). The parse_events_add_cache() will be called during
parsing. The parse_state->pmu_name is used to identify the pmu
where the event is enabled.
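
As a rough illustration of that flow (simplified stand-ins, not the real
parse-events API), the term inside the pmu// syntax is handed to a second
parsing pass that carries the pmu name along:

  #include <stdio.h>

  /* Simplified stand-in for struct parse_events_state. */
  struct parse_state {
          const char *pmu_name;   /* e.g. "cpu_core"; NULL for non-pmu syntax */
  };

  /* Stand-in for parse_events_add_cache(): decides which pmu gets the event. */
  static void add_cache_event(struct parse_state *ps, const char *name)
  {
          printf("add %s on pmu %s\n", name, ps->pmu_name ? ps->pmu_name : "all");
  }

  /* Stand-in for the second scanner pass over term->config ("LLC-loads"). */
  static void scan_term(const char *pmu, const char *term)
  {
          struct parse_state ps = { .pmu_name = pmu };

          /* a real scanner would tokenize here; the term is already one token */
          add_cache_event(&ps, term);
  }

  int main(void)
  {
          scan_term("cpu_core", "LLC-loads");   /* cpu_core/LLC-loads/ */
          scan_term(NULL, "LLC-loads");         /* plain LLC-loads */
          return 0;
  }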

Let's see examples:

   root@ssp-pwrt-002:~# ./perf stat -e cpu_core/cycles/,cpu_core/LLC-loads/ -vv 
-- ./triad_loop
   Control descriptor is not initialized
   
   perf_event_attr:
 type 6
 size 120
 config   0x4
 sample_type  IDENTIFIER
 read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
 disabled 1
 inherit  1
 enable_on_exec   1
 exclude_guest1
   
   sys_perf_event_open: pid 7267  cpu -1  group_fd -1  flags 0x8 = 3
   
   perf_event_attr:
 type 7
 size 120
 config   0x40002
 sample_type  IDENTIFIER
 read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
 disabled 1
 inherit  1
 enable_on_exec   1
 exclude_guest1
   
   sys_perf_event_open: pid 7267  cpu -1  group_fd -1  flags 0x8 = 4
   cycles: 0: 449252097 29724 29724
   LLC-loads: 0: 1857 29724 29724
   cycles: 449252097 29724 29724
   LLC-loads: 1857 29724 29724

Performance counter stats for './triad_loop':

  449,252,097  cpu_core/cycles/
1,857  cpu_core/LLC-loads/

  0.298898415 seconds time elapsed

   root@ssp-pwrt-002:~# ./perf stat -e cpu_atom/cycles/,cpu_atom/LLC-loads/ -vv 
-- taskset -c 16 ./triad_loop
   Control descriptor is not initialized
   
   perf_event_attr:
 type 6
 size 120
 config   0xa
 sample_type

Re: [PATCH v2 09/27] perf parse-events: Create two hybrid hardware events

2021-03-14 Thread Jin, Yao

Hi Jiri,

On 3/13/2021 3:15 AM, Jiri Olsa wrote:

On Thu, Mar 11, 2021 at 03:07:24PM +0800, Jin Yao wrote:

SNIP


   cycles: 4: 800933425 1002536659 1002536659
   cycles: 5: 800928573 1002528386 1002528386
   cycles: 6: 800924347 1002520527 1002520527
   cycles: 7: 800922009 1002513176 1002513176
   cycles: 8: 800919624 1002507326 1002507326
   cycles: 9: 800917204 1002500663 1002500663
   cycles: 10: 802096579 1002494280 1002494280
   cycles: 11: 802093770 1002486404 1002486404
   cycles: 12: 803284338 1002479491 1002479491
   cycles: 13: 803277609 1002469777 1002469777
   cycles: 14: 800875902 1002458861 1002458861
   cycles: 15: 800873241 1002451350 1002451350
   cycles: 0: 800837379 1002444645 1002444645
   cycles: 1: 800833400 1002438505 1002438505
   cycles: 2: 800829291 1002433698 1002433698
   cycles: 3: 800824390 1002427584 1002427584
   cycles: 4: 800819360 1002422099 1002422099
   cycles: 5: 800814787 1002415845 1002415845
   cycles: 6: 800810125 1002410301 1002410301
   cycles: 7: 800791893 1002386845 1002386845
   cycles: 12855737722 16040169029 16040169029
   cycles: 6406560625 8019379522 8019379522

Performance counter stats for 'system wide':

   12,855,737,722  cpu_core/cycles/
6,406,560,625  cpu_atom/cycles/


so we do that no_merge stuff for uncore pmus, why can't we do
that in here? that'd seems like generic way

jirka



We have set the "stat_config.no_merge = true;" in "[PATCH v2 08/27] perf stat: Uniquify hybrid event 
name".


For hybrid hardware events, they have different configs. The config is 0xDD00AA (0x4 for 
core vs. 0xa for atom in this example)


We use perf_pmu__for_each_hybrid_pmu() to iterate all hybrid PMUs, generate the configs and create 
the evsels for each hybrid PMU. This logic and the code are not complex and easy to understand.


Uncore looks complicated. It has uncore alias concept which is for different PMUs but with same 
prefix. Such as "uncore_cbox" for "uncore_cbox_0" to "uncore_cbox_9". But the uncore alias concept 
doesn't apply to hybrid pmu (we just have "cpu_core" and "cpu_atom" here). And actually I also don't 
want to mix the core stuff with uncore stuff; that would be hard to understand.


Perhaps I misunderstand, correct me if I'm wrong.

Thanks
Jin Yao



Re: [PATCH v2 07/27] perf evlist: Hybrid event uses its own cpus

2021-03-14 Thread Jin, Yao

Hi Jiri,

On 3/13/2021 3:15 AM, Jiri Olsa wrote:

On Thu, Mar 11, 2021 at 03:07:22PM +0800, Jin Yao wrote:

On hybrid platform, atom events can be only enabled on atom CPUs. Core
events can be only enabled on core CPUs. So for a hybrid event, it can
be only enabled on it's own CPUs.

But the problem for current perf is, the cpus for evsel (via PMU sysfs)
have been merged to evsel_list->core.all_cpus. It might be all CPUs.

So we need to figure out one way to let the hybrid event only use it's
own CPUs.

The idea is to create a new evlist__invalidate_all_cpus to invalidate
the evsel_list->core.all_cpus then evlist__for_each_cpu returns cpu -1
for hybrid evsel. If cpu is -1, hybrid evsel will use it's own cpus.


that's wild.. I don't understand when you say we don't have
cpus for evsel, because they have been merged.. each evsel
has evsel->core.own_cpus coming from pmu->cpus, right?

why can't you just filter out cpus that are in there?

jirka



Yes, you're right. This patch is too broad and actually not very necessary.

The current framework has processed the cpus for evsel well even for hybrid evsel. So this patch can 
be dropped.


Thanks
Jin Yao



[PATCH v2 25/27] perf tests: Support 'Convert perf time to TSC' test for hybrid

2021-03-10 Thread Jin Yao
Since "cycles:u" on a hybrid platform creates two "cycles" events,
the second evsel in the evlist also needs initialization.

With this patch,

root@otcpl-adl-s-2:~# ./perf test 71
71: Convert perf time to TSC: Ok

Signed-off-by: Jin Yao 
---
 tools/perf/tests/perf-time-to-tsc.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/tools/perf/tests/perf-time-to-tsc.c 
b/tools/perf/tests/perf-time-to-tsc.c
index 680c3cffb128..b472205ec8e3 100644
--- a/tools/perf/tests/perf-time-to-tsc.c
+++ b/tools/perf/tests/perf-time-to-tsc.c
@@ -66,6 +66,11 @@ int test__perf_time_to_tsc(struct test *test __maybe_unused, 
int subtest __maybe
u64 test_tsc, comm1_tsc, comm2_tsc;
u64 test_time, comm1_time = 0, comm2_time = 0;
struct mmap *md;
+   bool hybrid = false;
+
+   perf_pmu__scan(NULL);
+   if (perf_pmu__hybrid_exist())
+   hybrid = true;
 
threads = thread_map__new(-1, getpid(), UINT_MAX);
CHECK_NOT_NULL__(threads);
@@ -88,6 +93,17 @@ int test__perf_time_to_tsc(struct test *test __maybe_unused, 
int subtest __maybe
evsel->core.attr.disabled = 1;
evsel->core.attr.enable_on_exec = 0;
 
+   /*
+* For hybrid "cycles:u", it creates two events.
+* Init the second evsel here.
+*/
+   if (hybrid) {
+   evsel = evsel__next(evsel);
+   evsel->core.attr.comm = 1;
+   evsel->core.attr.disabled = 1;
+   evsel->core.attr.enable_on_exec = 0;
+   }
+
CHECK__(evlist__open(evlist));
 
CHECK__(evlist__mmap(evlist, UINT_MAX));
-- 
2.17.1



[PATCH v2 26/27] perf tests: Skip 'perf stat metrics (shadow stat) test' for hybrid

2021-03-10 Thread Jin Yao
Currently we don't support shadow stat for hybrid.

  root@ssp-pwrt-002:~# ./perf stat -e cycles,instructions -a -- sleep 1

   Performance counter stats for 'system wide':

  12,883,109,591  cpu_core/cycles/
   6,405,163,221  cpu_atom/cycles/
 555,553,778  cpu_core/instructions/
 841,158,734  cpu_atom/instructions/

 1.002644773 seconds time elapsed

Now there is no shadow stat 'insn per cycle' reported. We will support
it later and now just skip the 'perf stat metrics (shadow stat) test'.

Signed-off-by: Jin Yao 
---
 tools/perf/tests/shell/stat+shadow_stat.sh | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/tests/shell/stat+shadow_stat.sh 
b/tools/perf/tests/shell/stat+shadow_stat.sh
index ebebd3596cf9..e6e35fc6c882 100755
--- a/tools/perf/tests/shell/stat+shadow_stat.sh
+++ b/tools/perf/tests/shell/stat+shadow_stat.sh
@@ -7,6 +7,9 @@ set -e
 # skip if system-wide mode is forbidden
 perf stat -a true > /dev/null 2>&1 || exit 2
 
+# skip if on hybrid platform
+perf stat -a -e cycles sleep 1 2>&1 | grep -e cpu_core && exit 2
+
 test_global_aggr()
 {
perf stat -a --no-big-num -e cycles,instructions sleep 1  2>&1 | \
-- 
2.17.1



[PATCH v2 27/27] perf Documentation: Document intel-hybrid support

2021-03-10 Thread Jin Yao
Add some words and examples to help understanding of
Intel hybrid perf support.

Signed-off-by: Jin Yao 
---
 tools/perf/Documentation/intel-hybrid.txt | 228 ++
 tools/perf/Documentation/perf-record.txt  |   1 +
 tools/perf/Documentation/perf-stat.txt|   2 +
 3 files changed, 231 insertions(+)
 create mode 100644 tools/perf/Documentation/intel-hybrid.txt

diff --git a/tools/perf/Documentation/intel-hybrid.txt 
b/tools/perf/Documentation/intel-hybrid.txt
new file mode 100644
index ..ff641d9ac81b
--- /dev/null
+++ b/tools/perf/Documentation/intel-hybrid.txt
@@ -0,0 +1,228 @@
+Intel hybrid support
+
+Support for Intel hybrid events within perf tools.
+
+Some Intel platforms, such as AlderLake, are hybrid platforms consisting
+of atom cpus and core cpus. Each cpu has a dedicated event list. Some events
+are available on the core cpu, some events are available on the atom cpu,
+and some events are available on both.
+
+Kernel exports two new cpu pmus via sysfs:
+/sys/devices/cpu_core
+/sys/devices/cpu_atom
+
+The 'cpus' files are created under the directories. For example,
+
+cat /sys/devices/cpu_core/cpus
+0-15
+
+cat /sys/devices/cpu_atom/cpus
+16-23
+
+It indicates cpu0-cpu15 are core cpus and cpu16-cpu23 are atom cpus.
+
+Quickstart
+
+List hybrid event
+-
+
+As before, use perf-list to list the symbolic event.
+
+perf list
+
+inst_retired.any
+   [Fixed Counter: Counts the number of instructions retired. Unit: 
cpu_atom]
+inst_retired.any
+   [Number of instructions retired. Fixed Counter - architectural event. 
Unit: cpu_core]
+
+The 'Unit: xxx' is added to the brief description to indicate which pmu
+the event belongs to. The same event name can be supported on
+different pmus.
+
+Enable hybrid event with a specific pmu
+---
+
+To enable a core only event or an atom only event, the following syntax is supported:
+
+   cpu_core//
+or
+   cpu_atom//
+
+For example, count the 'cycles' event on core cpus.
+
+   perf stat -e cpu_core/cycles/
+
+Create two events for one hardware event automatically
+--
+
+When creating one event and the event is available on both atom and core,
+two events are created automatically. One is for atom, the other is for
+core. Most hardware events and cache events are available on both
+cpu_core and cpu_atom.
+
+Hardware events have pre-defined configs (e.g. 0 for cycles). But on a
+hybrid platform, the kernel needs to know which pmu the event comes from
+(atom or core). The original perf event type PERF_TYPE_HARDWARE can't
+carry pmu information, so a new type PERF_TYPE_HARDWARE_PMU is
+introduced.
+
+The new attr.config layout for PERF_TYPE_HARDWARE_PMU:
+
+0xDD00AA
+AA: original hardware event ID
+DD: PMU type ID
+
+Cache events are similar. A new type PERF_TYPE_HW_CACHE_PMU is introduced.
+
+The new attr.config layout for PERF_TYPE_HW_CACHE_PMU:
+
+0xDD00CCBBAA
+AA: original hardware cache ID
+BB: original hardware cache op ID
+CC: original hardware cache op result ID
+DD: PMU type ID
+
+The PMU type ID is retrieved from sysfs:
+
+cat /sys/devices/cpu_atom/type
+10
+
+cat /sys/devices/cpu_core/type
+4
+
+When enabling a hardware event without a specified pmu, such as
+'perf stat -e cycles -a' (system-wide in this example), two events
+are created automatically.
+
+
+perf_event_attr:
+  type 6
+  size 120
+  config   0x4
+  sample_type  IDENTIFIER
+  read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
+  disabled 1
+  inherit  1
+  exclude_guest1
+
+
+and
+
+
+perf_event_attr:
+  type 6
+  size 120
+  config   0xa
+  sample_type  IDENTIFIER
+  read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
+  disabled 1
+  inherit  1
+  exclude_guest1
+
+
+type 6 is PERF_TYPE_HARDWARE_PMU.
+0x4 in the config indicates the cpu_core pmu.
+0xa in the config indicates the cpu_atom pmu (the atom pmu type id is not fixed).
+
+The kernel creates 'cycles' (0x4) on cpu0-cpu15 (core cpus),
+and creates 'cycles' (0xa) on cpu16-cpu23 (atom cpus).
+
+For perf-stat result, it displays two events:
+
+ Performance counter stats for 'system wide':
+
+12,869,720,529  cpu_core/cycles
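
For reference, a small self-contained C snippet that reads the sysfs files described in the
documentation above (paths as documented; the output format and error handling here are my
own and kept minimal):

    /* Minimal sketch: print the type id and cpu list of the hybrid PMUs. */
    #include <stdio.h>

    static void show_hybrid_pmu(const char *name)
    {
            char path[128], line[128];
            const char *files[] = { "type", "cpus" };
            FILE *f;

            for (unsigned int i = 0; i < sizeof(files) / sizeof(files[0]); i++) {
                    snprintf(path, sizeof(path), "/sys/devices/%s/%s", name, files[i]);
                    f = fopen(path, "r");
                    if (!f)
                            continue;       /* PMU not present on this system */
                    if (fgets(line, sizeof(line), f))
                            printf("%s %s: %s", name, files[i], line);
                    fclose(f);
            }
    }

    int main(void)
    {
            show_hybrid_pmu("cpu_core");
            show_hybrid_pmu("cpu_atom");
            return 0;
    }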

[PATCH v2 19/27] perf tests: Add hybrid cases for 'Parse event definition strings' test

2021-03-10 Thread Jin Yao
Add basic hybrid test cases for 'Parse event definition strings' test.

root@otcpl-adl-s-2:~# ./perf test 6
 6: Parse event definition strings  : Ok

Signed-off-by: Jin Yao 
---
 tools/perf/tests/parse-events.c | 171 
 1 file changed, 171 insertions(+)

diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c
index a7f6661e6112..aec929867020 100644
--- a/tools/perf/tests/parse-events.c
+++ b/tools/perf/tests/parse-events.c
@@ -1512,6 +1512,123 @@ static int test__all_tracepoints(struct evlist *evlist)
return test__checkevent_tracepoint_multi(evlist);
 }
 
+static int test__hybrid_hw_event_with_pmu(struct evlist *evlist)
+{
+   struct evsel *evsel = evlist__first(evlist);
+
+   TEST_ASSERT_VAL("wrong number of entries", 1 == 
evlist->core.nr_entries);
+   TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE_PMU == 
evsel->core.attr.type);
+   TEST_ASSERT_VAL("wrong config", 0x4 == evsel->core.attr.config);
+   return 0;
+}
+
+static int test__hybrid_hw_event(struct evlist *evlist)
+{
+   struct evsel *evsel1 = evlist__first(evlist);
+   struct evsel *evsel2 = evlist__last(evlist);
+
+   TEST_ASSERT_VAL("wrong number of entries", 2 == 
evlist->core.nr_entries);
+   TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE_PMU == 
evsel1->core.attr.type);
+   TEST_ASSERT_VAL("wrong config", 0x4 == 
evsel1->core.attr.config);
+   TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE_PMU == 
evsel2->core.attr.type);
+   TEST_ASSERT_VAL("wrong config", 0xa == 
evsel2->core.attr.config);
+   return 0;
+}
+
+static int test__hybrid_hw_group_event(struct evlist *evlist)
+{
+   struct evsel *evsel, *leader;
+
+   evsel = leader = evlist__first(evlist);
+   TEST_ASSERT_VAL("wrong number of entries", 2 == 
evlist->core.nr_entries);
+   TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE_PMU == 
evsel->core.attr.type);
+   TEST_ASSERT_VAL("wrong config", 0x4 == evsel->core.attr.config);
+   TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+
+   evsel = evsel__next(evsel);
+   TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE_PMU == 
evsel->core.attr.type);
+   TEST_ASSERT_VAL("wrong config", 0x40001 == evsel->core.attr.config);
+   TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+   return 0;
+}
+
+static int test__hybrid_sw_hw_group_event(struct evlist *evlist)
+{
+   struct evsel *evsel, *leader;
+
+   evsel = leader = evlist__first(evlist);
+   TEST_ASSERT_VAL("wrong number of entries", 2 == 
evlist->core.nr_entries);
+   TEST_ASSERT_VAL("wrong type", PERF_TYPE_SOFTWARE == 
evsel->core.attr.type);
+   TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+
+   evsel = evsel__next(evsel);
+   TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE_PMU == 
evsel->core.attr.type);
+   TEST_ASSERT_VAL("wrong config", 0x4 == evsel->core.attr.config);
+   TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+   return 0;
+}
+
+static int test__hybrid_hw_sw_group_event(struct evlist *evlist)
+{
+   struct evsel *evsel, *leader;
+
+   evsel = leader = evlist__first(evlist);
+   TEST_ASSERT_VAL("wrong number of entries", 2 == 
evlist->core.nr_entries);
+   TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE_PMU == 
evsel->core.attr.type);
+   TEST_ASSERT_VAL("wrong config", 0x4 == evsel->core.attr.config);
+   TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+
+   evsel = evsel__next(evsel);
+   TEST_ASSERT_VAL("wrong type", PERF_TYPE_SOFTWARE == 
evsel->core.attr.type);
+   TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+   return 0;
+}
+
+static int test__hybrid_group_modifier1(struct evlist *evlist)
+{
+   struct evsel *evsel, *leader;
+
+   evsel = leader = evlist__first(evlist);
+   TEST_ASSERT_VAL("wrong number of entries", 2 == 
evlist->core.nr_entries);
+   TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE_PMU == 
evsel->core.attr.type);
+   TEST_ASSERT_VAL("wrong config", 0x4 == evsel->core.attr.config);
+   TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+   TEST_ASSERT_VAL("wrong exclude_user", evsel->core.attr.exclude_user);
+   TEST_ASSERT_VAL("wrong exclude_kernel", 
!evsel->core.attr.exclude_kernel);
+
+   evsel = evsel__next(evsel);
+   TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE_P

[PATCH v2 21/27] perf tests: Skip 'Setup struct perf_event_attr' test for hybrid

2021-03-10 Thread Jin Yao
For hybrid, the kernel introduces the new perf type PERF_TYPE_HARDWARE_PMU (6),
which is assigned to hybrid hardware events.

root@otcpl-adl-s-2:~# ./perf test 17 -vvv
  ...
  compare
matching [event:base-stat]
  to [event-6-17179869184-4]
  [cpu] * 0
  [flags] 0|8 8
  [type] 0 6
->FAIL
match: [event:base-stat] matches []
event:base-stat does not match, but is optional
  matched
  compare
matching [event-6-17179869184-4]
  to [event:base-stat]
  [cpu] 0 *
  [flags] 8 0|8
  [type] 6 0
->FAIL
match: [event-6-17179869184-4] matches []
expected type=6, got 0
expected config=17179869184, got 0
FAILED './tests/attr/test-stat-C0' - match failure

The type matching fails because the expected type is 0 while the type of
the hybrid hardware event is 6. We temporarily skip this test case; the
TODO is to support it properly in the future.

Signed-off-by: Jin Yao 
---
 tools/perf/tests/attr.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/tools/perf/tests/attr.c b/tools/perf/tests/attr.c
index dd39ce9b0277..fc7f74159764 100644
--- a/tools/perf/tests/attr.c
+++ b/tools/perf/tests/attr.c
@@ -34,6 +34,7 @@
 #include "event.h"
 #include "util.h"
 #include "tests.h"
+#include "pmu-hybrid.h"
 
 #define ENV "PERF_TEST_ATTR"
 
@@ -184,6 +185,10 @@ int test__attr(struct test *test __maybe_unused, int 
subtest __maybe_unused)
char path_dir[PATH_MAX];
char *exec_path;
 
+   perf_pmu__scan(NULL);
+   if (perf_pmu__hybrid_exist())
+   return 0;
+
/* First try development tree tests. */
if (!lstat("./tests", &st))
return run_dir("./tests", "./perf");
-- 
2.17.1



[PATCH v2 24/27] perf tests: Support 'Session topology' test for hybrid

2021-03-10 Thread Jin Yao
Force the creation of one event "cpu_core/cycles/" by default,
otherwise the check 'if (evlist->core.nr_entries == 1)' in
evlist__valid_sample_type() would fail.

root@otcpl-adl-s-2:~# ./perf test 41
41: Session topology: Ok

Signed-off-by: Jin Yao 
---
 tools/perf/tests/topology.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/tools/perf/tests/topology.c b/tools/perf/tests/topology.c
index 74748ed75b2c..0f6e73baab2d 100644
--- a/tools/perf/tests/topology.c
+++ b/tools/perf/tests/topology.c
@@ -40,7 +40,15 @@ static int session_write_header(char *path)
session = perf_session__new(&data, false, NULL);
TEST_ASSERT_VAL("can't get session", !IS_ERR(session));
 
-   session->evlist = evlist__new_default();
+   perf_pmu__scan(NULL);
+   if (!perf_pmu__hybrid_exist()) {
+   session->evlist = evlist__new_default();
+   } else {
+   struct parse_events_error err;
+
+   session->evlist = evlist__new();
+   parse_events(session->evlist, "cpu_core/cycles/", &err);
+   }
TEST_ASSERT_VAL("can't get evlist", session->evlist);
 
perf_header__set_feat(>header, HEADER_CPU_TOPOLOGY);
-- 
2.17.1



[PATCH v2 23/27] perf tests: Support 'Parse and process metrics' test for hybrid

2021-03-10 Thread Jin Yao
Some events are not supported on hybrid. Only pick up the cases that work for hybrid.

root@otcpl-adl-s-2:~# ./perf test 67
67: Parse and process metrics   : Ok

Signed-off-by: Jin Yao 
---
 tools/perf/tests/parse-metric.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/tools/perf/tests/parse-metric.c b/tools/perf/tests/parse-metric.c
index 55bf52e588be..149b18f1f96a 100644
--- a/tools/perf/tests/parse-metric.c
+++ b/tools/perf/tests/parse-metric.c
@@ -370,12 +370,17 @@ static int test_metric_group(void)
 
 int test__parse_metric(struct test *test __maybe_unused, int subtest 
__maybe_unused)
 {
+   perf_pmu__scan(NULL);
+
TEST_ASSERT_VAL("IPC failed", test_ipc() == 0);
TEST_ASSERT_VAL("frontend failed", test_frontend() == 0);
-   TEST_ASSERT_VAL("cache_miss_cycles failed", test_cache_miss_cycles() == 
0);
TEST_ASSERT_VAL("DCache_L2 failed", test_dcache_l2() == 0);
TEST_ASSERT_VAL("recursion fail failed", test_recursion_fail() == 0);
-   TEST_ASSERT_VAL("test metric group", test_metric_group() == 0);
TEST_ASSERT_VAL("Memory bandwidth", test_memory_bandwidth() == 0);
-   return 0;
+
+   if (!perf_pmu__hybrid_exist()) {
+   TEST_ASSERT_VAL("cache_miss_cycles failed", 
test_cache_miss_cycles() == 0);
+   TEST_ASSERT_VAL("test metric group", test_metric_group() == 0);
+   }
+return 0;
 }
-- 
2.17.1



[PATCH v2 20/27] perf tests: Add hybrid cases for 'Roundtrip evsel->name' test

2021-03-10 Thread Jin Yao
On hybrid, two events are created for one hw event.

For example,

evsel->idx  evsel__name(evsel)
0   cycles
1   cycles
2   instructions
3   instructions
...

So when comparing the evsel name on hybrid, evsel->idx needs to be
divided by 2.

root@otcpl-adl-s-2:~# ./perf test 14
14: Roundtrip evsel->name   : Ok

Signed-off-by: Jin Yao 
---
 tools/perf/tests/evsel-roundtrip-name.c | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/tools/perf/tests/evsel-roundtrip-name.c 
b/tools/perf/tests/evsel-roundtrip-name.c
index f7f3e5b4c180..2b938a15901e 100644
--- a/tools/perf/tests/evsel-roundtrip-name.c
+++ b/tools/perf/tests/evsel-roundtrip-name.c
@@ -62,7 +62,8 @@ static int perf_evsel__roundtrip_cache_name_test(void)
return ret;
 }
 
-static int __perf_evsel__name_array_test(const char *names[], int nr_names)
+static int __perf_evsel__name_array_test(const char *names[], int nr_names,
+int distance)
 {
int i, err;
struct evsel *evsel;
@@ -82,9 +83,9 @@ static int __perf_evsel__name_array_test(const char *names[], 
int nr_names)
 
err = 0;
evlist__for_each_entry(evlist, evsel) {
-   if (strcmp(evsel__name(evsel), names[evsel->idx])) {
+   if (strcmp(evsel__name(evsel), names[evsel->idx / distance])) {
--err;
-   pr_debug("%s != %s\n", evsel__name(evsel), 
names[evsel->idx]);
+   pr_debug("%s != %s\n", evsel__name(evsel), 
names[evsel->idx / distance]);
}
}
 
@@ -93,18 +94,22 @@ static int __perf_evsel__name_array_test(const char 
*names[], int nr_names)
return err;
 }
 
-#define perf_evsel__name_array_test(names) \
-   __perf_evsel__name_array_test(names, ARRAY_SIZE(names))
+#define perf_evsel__name_array_test(names, distance) \
+   __perf_evsel__name_array_test(names, ARRAY_SIZE(names), distance)
 
 int test__perf_evsel__roundtrip_name_test(struct test *test __maybe_unused, 
int subtest __maybe_unused)
 {
int err = 0, ret = 0;
 
-   err = perf_evsel__name_array_test(evsel__hw_names);
+   perf_pmu__scan(NULL);
+   if (perf_pmu__hybrid_exist())
+   return perf_evsel__name_array_test(evsel__hw_names, 2);
+
+   err = perf_evsel__name_array_test(evsel__hw_names, 1);
if (err)
ret = err;
 
-   err = __perf_evsel__name_array_test(evsel__sw_names, 
PERF_COUNT_SW_DUMMY + 1);
+   err = __perf_evsel__name_array_test(evsel__sw_names, 
PERF_COUNT_SW_DUMMY + 1, 1);
if (err)
ret = err;
 
-- 
2.17.1



[PATCH v2 22/27] perf tests: Support 'Track with sched_switch' test for hybrid

2021-03-10 Thread Jin Yao
Since "cycles:u" on a hybrid platform creates two "cycles" events,
the number of events in the evlist is not what the next test steps
expect. Now we just use one event "cpu_core/cycles:u/" for hybrid.

root@otcpl-adl-s-2:~# ./perf test 35
35: Track with sched_switch     : Ok

Signed-off-by: Jin Yao 
---
 tools/perf/tests/switch-tracking.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/tools/perf/tests/switch-tracking.c 
b/tools/perf/tests/switch-tracking.c
index 3ebaa758df77..13a11ce51a1a 100644
--- a/tools/perf/tests/switch-tracking.c
+++ b/tools/perf/tests/switch-tracking.c
@@ -340,6 +340,11 @@ int test__switch_tracking(struct test *test 
__maybe_unused, int subtest __maybe_
struct evsel *switch_evsel, *tracking_evsel;
const char *comm;
int err = -1;
+   bool hybrid = false;
+
+   perf_pmu__scan(NULL);
+   if (perf_pmu__hybrid_exist())
+   hybrid = true;
 
threads = thread_map__new(-1, getpid(), UINT_MAX);
if (!threads) {
@@ -371,7 +376,10 @@ int test__switch_tracking(struct test *test 
__maybe_unused, int subtest __maybe_
cpu_clocks_evsel = evlist__last(evlist);
 
/* Second event */
-   err = parse_events(evlist, "cycles:u", NULL);
+   if (!hybrid)
+   err = parse_events(evlist, "cycles:u", NULL);
+   else
+   err = parse_events(evlist, "cpu_core/cycles:u/", NULL);
if (err) {
pr_debug("Failed to parse event cycles:u\n");
goto out_err;
-- 
2.17.1



[PATCH v2 16/27] perf evlist: Warn as events from different hybrid PMUs in a group

2021-03-10 Thread Jin Yao
If a group has events from different hybrid PMUs,
show a warning.

This is to remind the user not to put a core event and an atom
event into one group.

  root@ssp-pwrt-002:~# ./perf stat -e "{cpu_core/cycles/,cpu_atom/cycles/}" -- 
sleep 1
  WARNING: Group has events from different hybrid PMUs

   Performance counter stats for 'sleep 1':

 cpu_core/cycles/
   cpu_atom/cycles/

 1.002585908 seconds time elapsed

Signed-off-by: Jin Yao 
---
 tools/perf/builtin-record.c |  3 +++
 tools/perf/builtin-stat.c   |  7 ++
 tools/perf/util/evlist.c| 44 +
 tools/perf/util/evlist.h|  2 ++
 4 files changed, 56 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 363ea1047148..188a1198cd4b 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -929,6 +929,9 @@ static int record__open(struct record *rec)
pos = evlist__reset_weak_group(evlist, pos, 
true);
goto try_again;
}
+
+   if (errno == EINVAL && perf_pmu__hybrid_exist())
+   evlist__warn_hybrid_group(evlist);
rc = -errno;
evsel__open_strerror(pos, &opts->target, errno, msg, 
sizeof(msg));
ui__error("%s\n", msg);
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 7a732508b2b4..6f780a039db0 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -239,6 +239,9 @@ static void evlist__check_cpu_maps(struct evlist *evlist)
struct evsel *evsel, *pos, *leader;
char buf[1024];
 
+   if (evlist__hybrid_exist(evlist))
+   return;
+
evlist__for_each_entry(evlist, evsel) {
leader = evsel->leader;
 
@@ -726,6 +729,10 @@ enum counter_recovery {
 static enum counter_recovery stat_handle_error(struct evsel *counter)
 {
char msg[BUFSIZ];
+
+   if (perf_pmu__hybrid_exist() && errno == EINVAL)
+   evlist__warn_hybrid_group(evsel_list);
+
/*
 * PPC returns ENXIO for HW counters until 2.6.37
 * (behavior changed with commit b0a873e).
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index f139151b9433..5ec891418cdd 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -2224,3 +2224,47 @@ void evlist__invalidate_all_cpus(struct evlist *evlist)
perf_cpu_map__put(evlist->core.all_cpus);
evlist->core.all_cpus = perf_cpu_map__empty_new(1);
 }
+
+static bool group_hybrid_conflict(struct evsel *leader)
+{
+   struct evsel *pos, *prev = NULL;
+
+   for_each_group_evsel(pos, leader) {
+   if (!pos->pmu_name || !perf_pmu__is_hybrid(pos->pmu_name))
+   continue;
+
+   if (prev && strcmp(prev->pmu_name, pos->pmu_name))
+   return true;
+
+   prev = pos;
+   }
+
+   return false;
+}
+
+void evlist__warn_hybrid_group(struct evlist *evlist)
+{
+   struct evsel *evsel;
+
+   evlist__for_each_entry(evlist, evsel) {
+   if (evsel__is_group_leader(evsel) &&
+   evsel->core.nr_members > 1 &&
+   group_hybrid_conflict(evsel)) {
+   WARN_ONCE(1, "WARNING: Group has events from "
+"different hybrid PMUs\n");
+   return;
+   }
+   }
+}
+
+bool evlist__hybrid_exist(struct evlist *evlist)
+{
+   struct evsel *evsel;
+
+   evlist__for_each_entry(evlist, evsel) {
+   if (evsel__is_hybrid_event(evsel))
+   return true;
+   }
+
+   return false;
+}
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 0da683511d98..33dec3bb5739 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -369,4 +369,6 @@ struct evsel *evlist__find_evsel(struct evlist *evlist, int 
idx);
 void evlist__invalidate_all_cpus(struct evlist *evlist);
 
 bool evlist__has_hybrid_events(struct evlist *evlist);
+void evlist__warn_hybrid_group(struct evlist *evlist);
+bool evlist__hybrid_exist(struct evlist *evlist);
 #endif /* __PERF_EVLIST_H */
-- 
2.17.1



[PATCH v2 17/27] perf evsel: Adjust hybrid event and global event mixed group

2021-03-10 Thread Jin Yao
: 6: 800568167 1002104339 1002104339
  cycles: 7: 800566760 1002102953 1002102953
  WARNING: for cpu-clock, some CPU counts not read
  cpu-clock: 0: 0 0 0
  cpu-clock: 1: 0 0 0
  cpu-clock: 2: 0 0 0
  cpu-clock: 3: 0 0 0
  cpu-clock: 4: 0 0 0
  cpu-clock: 5: 0 0 0
  cpu-clock: 6: 0 0 0
  cpu-clock: 7: 0 0 0
  cpu-clock: 8: 0 0 0
  cpu-clock: 9: 0 0 0
  cpu-clock: 10: 0 0 0
  cpu-clock: 11: 0 0 0
  cpu-clock: 12: 0 0 0
  cpu-clock: 13: 0 0 0
  cpu-clock: 14: 0 0 0
  cpu-clock: 15: 0 0 0
  cpu-clock: 16: 1002125111 1002124999 1002124999
  cpu-clock: 17: 1002118626 1002118442 1002118442
  cpu-clock: 18: 1002115058 1002114853 1002114853
  cpu-clock: 19: 1002111740 1002111730 1002111730
  cpu-clock: 20: 1002109031 1002108582 1002108582
  cpu-clock: 21: 1002105927 1002106441 1002106441
  cpu-clock: 22: 1002104010 1002104339 1002104339
  cpu-clock: 23: 1002102730 1002102953 1002102953
  cycles: 6446120799 8016892339 8016892339
  cpu-clock: 8016892233 8016892339 8016892339

   Performance counter stats for 'system wide':

   6,446,120,799  cpu_atom/cycles/  #  804.067 M/sec
8,016.89 msec cpu-clock #7.999 CPUs utilized

 1.002212870 seconds time elapsed

For cpu-clock, cpu16-cpu23 are set with a valid group fd (cpu_atom/cycles/'s
fd on that cpu). For the counting results, cpu_atom/cycles/ aggregates over 8
cpus, which is correct. But cpu-clock also aggregates over only 8 cpus
(cpu16-cpu23, not all cpus); the code should be improved. For now one warning
is displayed: "WARNING: for cpu-clock, some CPU counts not read".

Signed-off-by: Jin Yao 
---
 tools/perf/util/evsel.c | 105 ++--
 tools/perf/util/stat.h  |   1 +
 2 files changed, 101 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index e0b6227d263f..862fdc145f05 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1464,15 +1464,26 @@ static void evsel__set_count(struct evsel *counter, int 
cpu, int thread, u64 val
perf_counts__set_loaded(counter->counts, cpu, thread, true);
 }
 
-static int evsel__process_group_data(struct evsel *leader, int cpu, int 
thread, u64 *data)
+static int evsel_cpuid_match(struct evsel *evsel1, struct evsel *evsel2,
+int cpu)
+{
+   int cpuid;
+
+   cpuid = perf_cpu_map__cpu(evsel1->core.cpus, cpu);
+   return perf_cpu_map__idx(evsel2->core.cpus, cpuid);
+}
+
+static int evsel__process_group_data(struct evsel *leader, int cpu, int thread,
+u64 *data, int nr_members)
 {
u64 read_format = leader->core.attr.read_format;
struct sample_read_value *v;
u64 nr, ena = 0, run = 0, i;
+   int idx;
 
nr = *data++;
 
-   if (nr != (u64) leader->core.nr_members)
+   if (nr != (u64) nr_members)
return -EINVAL;
 
if (read_format & PERF_FORMAT_TOTAL_TIME_ENABLED)
@@ -1492,24 +1503,85 @@ static int evsel__process_group_data(struct evsel 
*leader, int cpu, int thread,
if (!counter)
return -EINVAL;
 
-   evsel__set_count(counter, cpu, thread, v[i].value, ena, run);
+   if (evsel__is_hybrid_event(counter) ||
+   evsel__is_hybrid_event(leader)) {
+   idx = evsel_cpuid_match(leader, counter, cpu);
+   if (idx == -1)
+   return -EINVAL;
+   } else
+   idx = cpu;
+
+   evsel__set_count(counter, idx, thread, v[i].value, ena, run);
}
 
return 0;
 }
 
+static int hybrid_read_size(struct evsel *leader, int cpu, int *nr_members)
+{
+   struct evsel *pos;
+   int nr = 1, back, new_size = 0, idx;
+
+   for_each_group_member(pos, leader) {
+   idx = evsel_cpuid_match(leader, pos, cpu);
+   if (idx != -1)
+   nr++;
+   }
+
+   if (nr != leader->core.nr_members) {
+   back = leader->core.nr_members;
+   leader->core.nr_members = nr;
+   new_size = perf_evsel__read_size(&leader->core);
+   leader->core.nr_members = back;
+   }
+
+   *nr_members = nr;
+   return new_size;
+}
+
 static int evsel__read_group(struct evsel *leader, int cpu, int thread)
 {
struct perf_stat_evsel *ps = leader->stats;
u64 read_format = leader->core.attr.read_format;
int size = perf_evsel__read_size(&leader->core);
+   int new_size, nr_members;
u64 *data = ps->group_data;
 
if (!(read_format & PERF_FORMAT_ID))
return -EINVAL;
 
-   if (!evsel__is_group_leader(leader))
+   if (!evsel__is_group_leader(leader)) {
+   if (evsel__is_hybrid_event(leader->leader) &&
+   !evsel__is_hybrid_event(leader)) {
+

[PATCH v2 18/27] perf script: Support PERF_TYPE_HARDWARE_PMU and PERF_TYPE_HW_CACHE_PMU

2021-03-10 Thread Jin Yao
For a hybrid system, the perf subsystem doesn't know which PMU the
events belong to. So the PMU-aware versions PERF_TYPE_HARDWARE_PMU and
PERF_TYPE_HW_CACHE_PMU are introduced.

Now define the new output[] entries for these two types.

Signed-off-by: Jin Yao 
---
 tools/perf/builtin-script.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 5915f19cee55..d0e889e636d5 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -275,6 +275,30 @@ static struct {
.invalid_fields = PERF_OUTPUT_TRACE | PERF_OUTPUT_BPF_OUTPUT,
},
 
+   [PERF_TYPE_HARDWARE_PMU] = {
+   .user_set = false,
+
+   .fields = PERF_OUTPUT_COMM | PERF_OUTPUT_TID |
+ PERF_OUTPUT_CPU | PERF_OUTPUT_TIME |
+ PERF_OUTPUT_EVNAME | PERF_OUTPUT_IP |
+ PERF_OUTPUT_SYM | PERF_OUTPUT_SYMOFFSET |
+ PERF_OUTPUT_DSO | PERF_OUTPUT_PERIOD,
+
+   .invalid_fields = PERF_OUTPUT_TRACE | PERF_OUTPUT_BPF_OUTPUT,
+   },
+
+   [PERF_TYPE_HW_CACHE_PMU] = {
+   .user_set = false,
+
+   .fields = PERF_OUTPUT_COMM | PERF_OUTPUT_TID |
+ PERF_OUTPUT_CPU | PERF_OUTPUT_TIME |
+ PERF_OUTPUT_EVNAME | PERF_OUTPUT_IP |
+ PERF_OUTPUT_SYM | PERF_OUTPUT_SYMOFFSET |
+ PERF_OUTPUT_DSO | PERF_OUTPUT_PERIOD,
+
+   .invalid_fields = PERF_OUTPUT_TRACE | PERF_OUTPUT_BPF_OUTPUT,
+   },
+
[OUTPUT_TYPE_SYNTH] = {
.user_set = false,
 
-- 
2.17.1



[PATCH v2 15/27] perf stat: Filter out unmatched aggregation for hybrid event

2021-03-10 Thread Jin Yao
perf-stat supports several aggregation modes, such as --per-core,
--per-socket, etc. A hybrid event may be available on only part of
the cpus. So for --per-core, we need to filter out the unavailable
cores, for --per-socket, filter out the unavailable sockets, and
so on.

Before:

  root@ssp-pwrt-002:~# ./perf stat --per-core -e cpu_core/cycles/ -a -- sleep 1

   Performance counter stats for 'system wide':

  S0-D0-C0   2  1,604,426,524  cpu_core/cycles/
  S0-D0-C4   2  1,604,408,224  cpu_core/cycles/
  S0-D0-C8   2  1,605,995,644  cpu_core/cycles/
  S0-D0-C12  2  1,628,056,554  cpu_core/cycles/
  S0-D0-C16  2  1,611,488,734  cpu_core/cycles/
  S0-D0-C20  2  1,616,314,761  cpu_core/cycles/
  S0-D0-C24  2  1,603,558,295  cpu_core/cycles/
  S0-D0-C28  2  1,603,541,128  cpu_core/cycles/
  S0-D0-C32  0cpu_core/cycles/
  S0-D0-C33  0cpu_core/cycles/
  S0-D0-C34  0cpu_core/cycles/
  S0-D0-C35  0cpu_core/cycles/
  S0-D0-C36  0cpu_core/cycles/
  S0-D0-C37  0cpu_core/cycles/
  S0-D0-C38  0cpu_core/cycles/
  S0-D0-C39  0cpu_core/cycles/

After:

  root@ssp-pwrt-002:~# ./perf stat --per-core -e cpu_core/cycles/ -a -- sleep 1

   Performance counter stats for 'system wide':

  S0-D0-C0   2  1,621,781,943  cpu_core/cycles/
  S0-D0-C4   2  1,621,755,088  cpu_core/cycles/
  S0-D0-C8   2  1,604,276,920  cpu_core/cycles/
  S0-D0-C12  2  1,603,446,963  cpu_core/cycles/
  S0-D0-C16  2  1,604,231,725  cpu_core/cycles/
  S0-D0-C20  2  1,603,435,286  cpu_core/cycles/
  S0-D0-C24  2  1,603,387,250  cpu_core/cycles/
  S0-D0-C28  2  1,604,173,183  cpu_core/cycles/

Signed-off-by: Jin Yao 
---
 tools/perf/util/stat-display.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index ed37d8e7ea1a..2db7c36a03ad 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -634,6 +634,20 @@ static void aggr_cb(struct perf_stat_config *config,
}
 }
 
+static bool aggr_id_hybrid_matched(struct perf_stat_config *config,
+  struct evsel *counter, struct aggr_cpu_id id)
+{
+   struct aggr_cpu_id s;
+
+   for (int i = 0; i < evsel__nr_cpus(counter); i++) {
+   s = config->aggr_get_id(config, evsel__cpus(counter), i);
+   if (cpu_map__compare_aggr_cpu_id(s, id))
+   return true;
+   }
+
+   return false;
+}
+
 static void print_counter_aggrdata(struct perf_stat_config *config,
   struct evsel *counter, int s,
   char *prefix, bool metric_only,
@@ -647,6 +661,12 @@ static void print_counter_aggrdata(struct perf_stat_config 
*config,
double uval;
 
ad.id = id = config->aggr_map->map[s];
+
+   if (perf_pmu__hybrid_exist() &&
+   !aggr_id_hybrid_matched(config, counter, id)) {
+   return;
+   }
+
ad.val = ad.ena = ad.run = 0;
ad.nr = 0;
if (!collect_data(config, counter, aggr_cb, &ad))
-- 
2.17.1



[PATCH v2 14/27] perf stat: Add default hybrid events

2021-03-10 Thread Jin Yao
Previously if '-e' is not specified in perf stat, some software events
and hardware events are added to evlist by default.

  root@otcpl-adl-s-2:~# ./perf stat  -- ./triad_loop

   Performance counter stats for './triad_loop':

  109.43 msec task-clock#0.993 CPUs utilized
   1  context-switches  #0.009 K/sec
   0  cpu-migrations#0.000 K/sec
 105  page-faults   #0.960 K/sec
 401,161,982  cycles#3.666 GHz
   1,601,216,357  instructions  #3.99  insn per cycle
 200,217,751  branches  # 1829.686 M/sec
  14,555  branch-misses #0.01% of all branches

 0.110176860 seconds time elapsed

Among the events, cycles, instructions, branches and branch-misses
are hardware events.

On a hybrid platform, two events are created for one hardware event.

core cycles,
atom cycles,
core instructions,
atom instructions,
core branches,
atom branches,
core branch-misses,
atom branch-misses

These events will be added to evlist in order on hybrid platform
if '-e' is not set.

Since parse_events() already supports creating two hardware events
for one event on a hybrid platform, we just use parse_events(evlist,
"cycles,instructions,branches,branch-misses") to create the default
events and add them to the evlist.

After:

  root@ssp-pwrt-002:~# ./perf stat  -- ./triad_loop

   Performance counter stats for './triad_loop':

  290.77 msec task-clock#0.996 CPUs utilized
  25  context-switches  #0.086 K/sec
  13  cpu-migrations#0.045 K/sec
 107  page-faults   #0.368 K/sec
 449,620,957  cpu_core/cycles/  # 1546.334 M/sec
 cpu_atom/cycles/  
(0.00%)
   1,601,499,820  cpu_core/instructions/# 5507.870 M/sec
 cpu_atom/instructions/
(0.00%)
 200,272,310  cpu_core/branches/#  688.776 M/sec
 cpu_atom/branches/
(0.00%)
  15,255  cpu_core/branch-misses/   #0.052 M/sec
 cpu_atom/branch-misses/   
(0.00%)

 0.291897676 seconds time elapsed

We can see that two events are created for one hardware event.
The first one is the core event, the second one is the atom event.

One thing to note is that the shadow stats look a bit different; now it's
just 'M/sec'.

perf_stat__update_shadow_stats() and perf_stat__print_shadow_stats()
need to be improved in the future if we want to get the original shadow
stats.

Signed-off-by: Jin Yao 
---
 tools/perf/builtin-stat.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 6c0a21323814..7a732508b2b4 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1162,6 +1162,13 @@ static int parse_stat_cgroups(const struct option *opt,
return parse_cgroups(opt, str, unset);
 }
 
+static int add_default_hybrid_events(struct evlist *evlist)
+{
+   struct parse_events_error err;
+
+   return parse_events(evlist, 
"cycles,instructions,branches,branch-misses", &err);
+}
+
 static struct option stat_options[] = {
OPT_BOOLEAN('T', "transaction", &transaction_run,
"hardware transaction statistics"),
@@ -1637,6 +1644,12 @@ static int add_default_attributes(void)
   { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS
},
   { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_MISSES  
},
 
+};
+   struct perf_event_attr default_sw_attrs[] = {
+  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK 
},
+  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CONTEXT_SWITCHES   
},
+  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CPU_MIGRATIONS 
},
+  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_PAGE_FAULTS
},
 };
 
 /*
@@ -1874,6 +1887,15 @@ static int add_default_attributes(void)
}
 
if (!evsel_list->core.nr_entries) {
+   perf_pmu__scan(NULL);
+   if (perf_pmu__hybrid_exist()) {
+   if (evlist__add_default_attrs(evsel_list,
+ default_sw_attrs) < 0) {
+   return -1;
+   }
+   return add_default_hybrid_events(evsel_list);
+   }
+
if (target__has_cpu(&target))
default_attrs0[0].config = PERF_COUNT_SW_CPU_CLOCK;
 
-- 
2.17.1



[PATCH v2 13/27] perf evlist: Create two hybrid 'cycles' events by default

2021-03-10 Thread Jin Yao
When the evlist is empty, for example when no '-e' is specified in perf
record, one default 'cycles' event is added to the evlist.

On a hybrid platform, two default 'cycles' events need to be created
instead: one for core and the other for atom.

This patch calls evsel__new_cycles() twice to create the
two 'cycles' events.

  root@ssp-pwrt-002:~# ./perf record -vv -- sleep 1
  ...
  
  perf_event_attr:
type 6
size 120
config   0x4
{ sample_period, sample_freq }   4000
sample_type  IP|TID|TIME|ID|PERIOD
read_format  ID
disabled 1
inherit  1
mmap 1
comm 1
freq 1
enable_on_exec   1
task 1
precise_ip   3
sample_id_all1
exclude_guest1
mmap21
comm_exec1
ksymbol  1
bpf_event1
  
  sys_perf_event_open: pid 22300  cpu 0  group_fd -1  flags 0x8 = 5
  sys_perf_event_open: pid 22300  cpu 1  group_fd -1  flags 0x8 = 6
  sys_perf_event_open: pid 22300  cpu 2  group_fd -1  flags 0x8 = 7
  sys_perf_event_open: pid 22300  cpu 3  group_fd -1  flags 0x8 = 9
  sys_perf_event_open: pid 22300  cpu 4  group_fd -1  flags 0x8 = 10
  sys_perf_event_open: pid 22300  cpu 5  group_fd -1  flags 0x8 = 11
  sys_perf_event_open: pid 22300  cpu 6  group_fd -1  flags 0x8 = 12
  sys_perf_event_open: pid 22300  cpu 7  group_fd -1  flags 0x8 = 13
  sys_perf_event_open: pid 22300  cpu 8  group_fd -1  flags 0x8 = 14
  sys_perf_event_open: pid 22300  cpu 9  group_fd -1  flags 0x8 = 15
  sys_perf_event_open: pid 22300  cpu 10  group_fd -1  flags 0x8 = 16
  sys_perf_event_open: pid 22300  cpu 11  group_fd -1  flags 0x8 = 17
  sys_perf_event_open: pid 22300  cpu 12  group_fd -1  flags 0x8 = 18
  sys_perf_event_open: pid 22300  cpu 13  group_fd -1  flags 0x8 = 19
  sys_perf_event_open: pid 22300  cpu 14  group_fd -1  flags 0x8 = 20
  sys_perf_event_open: pid 22300  cpu 15  group_fd -1  flags 0x8 = 21
  
  perf_event_attr:
type 6
size 120
config   0xa
{ sample_period, sample_freq }   4000
sample_type  IP|TID|TIME|ID|PERIOD
read_format  ID
disabled 1
inherit  1
freq 1
enable_on_exec   1
precise_ip   3
sample_id_all1
exclude_guest1
  
  sys_perf_event_open: pid 22300  cpu 16  group_fd -1  flags 0x8 = 22
  sys_perf_event_open: pid 22300  cpu 17  group_fd -1  flags 0x8 = 23
  sys_perf_event_open: pid 22300  cpu 18  group_fd -1  flags 0x8 = 24
  sys_perf_event_open: pid 22300  cpu 19  group_fd -1  flags 0x8 = 25
  sys_perf_event_open: pid 22300  cpu 20  group_fd -1  flags 0x8 = 26
  sys_perf_event_open: pid 22300  cpu 21  group_fd -1  flags 0x8 = 27
  sys_perf_event_open: pid 22300  cpu 22  group_fd -1  flags 0x8 = 28
  sys_perf_event_open: pid 22300  cpu 23  group_fd -1  flags 0x8 = 29
  ...

We can see one core 'cycles' (0x4) is enabled on cpu0-cpu15
and atom 'cycles' (0xa) is enabled on cpu16-cpu23.

Signed-off-by: Jin Yao 
---
 tools/perf/builtin-record.c | 10 ++
 tools/perf/util/evlist.c| 32 +++-
 tools/perf/util/evsel.c |  6 +++---
 tools/perf/util/evsel.h |  2 +-
 4 files changed, 41 insertions(+), 9 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 35465d1db6dd..363ea1047148 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -2786,10 +2786,12 @@ int cmd_record(int argc, const char **argv)
if (record.opts.overwrite)
record.opts.tail_synthesize = true;
 
-   if (rec->evlist->core.nr_entries == 0 &&
-   __evlist__add_default(rec->evlist, !record.opts.no_samples) < 0) {
-   pr_err("Not enough memory for event selector list\n");
-   goto out;
+   if (rec->evlist->core.nr_entries == 0) {
+   perf_pmu__scan(NULL);
+   if (__evlist__add_default(rec->evlist, !record.opts.no_samples) 
< 0) {
+   pr_err("Not enough memory for event selector list\n");
+   got
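
As a rough sketch of the idea in the commit message above (the helper name is illustrative
and this is not the patch code; the real patch also makes each evsel target its own PMU via
the attr type/config, which is only hinted at here):

    /*
     * Sketch only: on a hybrid platform, add one default 'cycles' evsel
     * per hybrid PMU instead of a single one.
     */
    static int evlist__add_default_hybrid(struct evlist *evlist, bool precise)
    {
            struct perf_pmu *pmu;
            struct evsel *evsel;

            perf_pmu__for_each_hybrid_pmu(pmu) {
                    evsel = evsel__new_cycles(precise);
                    if (!evsel)
                            return -ENOMEM;

                    /* Restrict the evsel to the cpus of this hybrid PMU. */
                    evsel->core.cpus = perf_cpu_map__get(pmu->cpus);
                    evsel->core.own_cpus = perf_cpu_map__get(pmu->cpus);
                    evsel->pmu_name = strdup(pmu->name);
                    evlist__add(evlist, evsel);
            }

            return 0;
    }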

[PATCH v2 11/27] perf parse-events: Support hardware events inside PMU

2021-03-10 Thread Jin Yao
On a hybrid platform, some hardware events are only available
on a specific pmu. For example, 'L1-dcache-load-misses' is only
available on the 'cpu_core' pmu. And even for an event that is
available on both pmus, the user may want to enable it on just
one of them. So the following syntax is now supported:

cpu_core//
cpu_core//
cpu_core//

cpu_atom//
cpu_atom//
cpu_atom//

It limits the event to be enabled only on the specified pmu.

The patch uses this idea. For example, if we use "cpu_core/LLC-loads/",
then in parse_events_add_pmu() term->config is "LLC-loads".

We create a new "parse_events_state" with the pmu_name and use
parse_events__scanner() to scan term->config (the string "LLC-loads"
in this example). parse_events_add_cache() will be called during
parsing. parse_state->pmu_name is used to identify the pmu on which
the event is enabled (a rough sketch of this flow follows the diff below).

Let's see examples:

  root@ssp-pwrt-002:~# ./perf stat -e cpu_core/cycles/,cpu_core/LLC-loads/ -vv 
-- ./triad_loop
  Control descriptor is not initialized
  
  perf_event_attr:
type 6
size 120
config   0x4
sample_type  IDENTIFIER
read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit  1
enable_on_exec   1
exclude_guest1
  
  sys_perf_event_open: pid 7267  cpu -1  group_fd -1  flags 0x8 = 3
  
  perf_event_attr:
type 7
size 120
config   0x40002
sample_type  IDENTIFIER
read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit  1
enable_on_exec   1
exclude_guest1
  
  sys_perf_event_open: pid 7267  cpu -1  group_fd -1  flags 0x8 = 4
  cycles: 0: 449252097 29724 29724
  LLC-loads: 0: 1857 29724 29724
  cycles: 449252097 29724 29724
  LLC-loads: 1857 29724 29724

   Performance counter stats for './triad_loop':

 449,252,097  cpu_core/cycles/
   1,857  cpu_core/LLC-loads/

 0.298898415 seconds time elapsed

  root@ssp-pwrt-002:~# ./perf stat -e cpu_atom/cycles/,cpu_atom/LLC-loads/ -vv 
-- taskset -c 16 ./triad_loop
  Control descriptor is not initialized
  
  perf_event_attr:
type 6
size 120
config   0xa
sample_type  IDENTIFIER
read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit  1
enable_on_exec   1
exclude_guest1
  
  sys_perf_event_open: pid 7339  cpu -1  group_fd -1  flags 0x8 = 3
  
  perf_event_attr:
type 7
size 120
config   0xa0002
sample_type  IDENTIFIER
read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit  1
enable_on_exec   1
exclude_guest1
  
  sys_perf_event_open: pid 7339  cpu -1  group_fd -1  flags 0x8 = 4
  cycles: 0: 602020010 343657939 342553275
  LLC-loads: 0: 3537 343657939 342553275
  cycles: 603961400 343657939 342553275
  LLC-loads: 3548 343657939 342553275

   Performance counter stats for 'taskset -c 16 ./triad_loop':

 603,961,400  cpu_atom/cycles/  
(99.68%)
   3,548  cpu_atom/LLC-loads/   
(99.68%)

 0.344904585 seconds time elapsed

Signed-off-by: Jin Yao 
---
 tools/perf/util/parse-events.c | 100 +++--
 tools/perf/util/parse-events.h |   6 +-
 tools/perf/util/parse-events.y |  21 ++-
 3 files changed, 105 insertions(+), 22 deletions(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 09e42245f71a..30435adc7a7b 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -489,7 +489,8 @@ static int create_hybrid_cache_eve
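
A condensed sketch of the flow described in the commit message, in the context of
util/parse-events.c (the helper below is illustrative; the field and function names follow
what this series exposes, e.g. parse_state->pmu_name, but this is not the exact patch code):

    /*
     * Sketch only: re-run the event parser on the string inside the PMU
     * term (e.g. "LLC-loads" from "cpu_core/LLC-loads/"), with pmu_name
     * set so that parse_events_add_cache()/parse_events_add_numeric()
     * create the event only for that hybrid PMU.
     */
    static int parse_events__for_hybrid_pmu(struct parse_events_state *parse_state,
                                            const char *str, const char *pmu_name,
                                            struct list_head *list)
    {
            struct parse_events_state ps = {
                    .list     = LIST_HEAD_INIT(ps.list),
                    .stoken   = PE_START_EVENTS,
                    .error    = parse_state->error,
                    .idx      = parse_state->idx,
                    .pmu_name = pmu_name,
            };
            int ret;

            ret = parse_events__scanner(str, &ps);
            if (!ret && !list_empty(&ps.list)) {
                    list_splice(&ps.list, list);
                    parse_state->idx = ps.idx;
            }

            return ret;
    }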

[PATCH v2 12/27] perf parse-events: Support hybrid raw events

2021-03-10 Thread Jin Yao
 293915211 293915211
  cpu_core/r3c/: 449000613 293915211 293915211

   Performance counter stats for './triad_loop':

 449,000,613  cpu_core/r3c/

 0.294859229 seconds time elapsed

Signed-off-by: Jin Yao 
---
 tools/perf/util/parse-events.c | 56 +-
 1 file changed, 55 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 30435adc7a7b..9b2a33103a57 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -1532,6 +1532,55 @@ static int add_hybrid_numeric(struct parse_events_state 
*parse_state,
return 0;
 }
 
+static int create_hybrid_raw_event(struct parse_events_state *parse_state,
+  struct list_head *list,
+  struct perf_event_attr *attr,
+  struct list_head *head_config,
+  struct list_head *config_terms,
+  struct perf_pmu *pmu)
+{
+   struct evsel *evsel;
+
+   attr->type = pmu->type;
+   evsel = __add_event(list, &parse_state->idx, attr, true,
+   get_config_name(head_config),
+   pmu, config_terms, false, NULL);
+   if (evsel)
+   evsel->pmu_name = strdup(pmu->name);
+   else
+   return -ENOMEM;
+
+   return 0;
+}
+
+static int add_hybrid_raw(struct parse_events_state *parse_state,
+ struct list_head *list,
+ struct perf_event_attr *attr,
+ struct list_head *head_config,
+ struct list_head *config_terms,
+ bool *hybrid)
+{
+   struct perf_pmu *pmu;
+   int ret;
+
+   *hybrid = false;
+   perf_pmu__for_each_hybrid_pmu(pmu) {
+   *hybrid = true;
+   if (parse_state->pmu_name &&
+   strcmp(parse_state->pmu_name, pmu->name)) {
+   continue;
+   }
+
+   ret = create_hybrid_raw_event(parse_state, list, attr,
+ head_config, config_terms,
+ pmu);
+   if (ret)
+   return ret;
+   }
+
+   return 0;
+}
+
 int parse_events_add_numeric(struct parse_events_state *parse_state,
 struct list_head *list,
 u32 type, u64 config,
@@ -1558,7 +1607,12 @@ int parse_events_add_numeric(struct parse_events_state 
*parse_state,
/*
 * Skip the software dummy event.
 */
-   if (type != PERF_TYPE_SOFTWARE) {
+   if (type == PERF_TYPE_RAW) {
+   ret = add_hybrid_raw(parse_state, list, &attr, head_config,
+&config_terms, &hybrid);
+   if (hybrid)
+   return ret;
+   } else if (type != PERF_TYPE_SOFTWARE) {
if (!perf_pmu__hybrid_exist())
perf_pmu__scan(NULL);
 
-- 
2.17.1


