Re: [PATCH] powerpc/perf: prevent mixed EBB and non-EBB events

2021-04-06 Thread Athira Rajeev



> On 05-Mar-2021, at 11:20 AM, Athira Rajeev  
> wrote:
> 
> 
> 
>> On 24-Feb-2021, at 5:51 PM, Thadeu Lima de Souza Cascardo 
>>  wrote:
>> 
>> EBB events must be under exclusive groups, so there is no mix of EBB and
>> non-EBB events on the same PMU. This requirement worked fine as perf core
>> would not allow other pinned events to be scheduled together with exclusive
>> events.
>> 
>> This assumption was broken by commit 1908dc911792 ("perf: Tweak
>> perf_event_attr::exclusive semantics").
>> 
>> After that, the test cpu_event_pinned_vs_ebb_test started succeeding after
>> read_events, but worse, the task would not have been given access to PMC1,
>> so when it tried to write to it, it was killed with "illegal instruction".
>> 
>> Preventing mixed EBB and non-EBB events from being added to the same PMU
>> will just revert to the previous behavior and the test will succeed.
> 
> 
> Hi,
> 
> Thanks for checking this. I checked your patch, which fixes
> “check_excludes” to make sure all events agree on EBB. But in the PMU
> group constraints, we already have a check for EBB events. This is in
> arch/powerpc/perf/isa207-common.c (the isa207_get_constraint function).
> 
> <<>>
> mask  |= CNST_EBB_VAL(ebb);
> value |= CNST_EBB_MASK;
> <<>>
> 
> But the above settings for mask and value are interchanged. That is what
> actually needs to be fixed.
> 

Hi,

I have sent a patch for fixing this EBB mask/value setting.
This is the link to patch:

powerpc/perf: Fix PMU constraint check for EBB events
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=237669

Thanks
Athira

> Below patch should fix this:
> 
> diff --git a/arch/powerpc/perf/isa207-common.c 
> b/arch/powerpc/perf/isa207-common.c
> index e4f577da33d8..8b5eeb6fb2fb 100644
> --- a/arch/powerpc/perf/isa207-common.c
> +++ b/arch/powerpc/perf/isa207-common.c
> @@ -447,8 +447,8 @@ int isa207_get_constraint(u64 event, unsigned long 
> *maskp, unsigned long *valp,
> * EBB events are pinned & exclusive, so this should never actually
> * hit, but we leave it as a fallback in case.
> */
> -   mask  |= CNST_EBB_VAL(ebb);
> -   value |= CNST_EBB_MASK;
> +   mask  |= CNST_EBB_MASK;
> +   value |= CNST_EBB_VAL(ebb);
> 
>*maskp = mask;
>*valp = value;
> 
> 
> Can you please try with this patch.
> 
> Thanks
> Athira
> 
> 
>> 
>> Fixes: 1908dc911792 (perf: Tweak perf_event_attr::exclusive semantics)
>> Signed-off-by: Thadeu Lima de Souza Cascardo 
>> ---
>> arch/powerpc/perf/core-book3s.c | 20 
>> 1 file changed, 16 insertions(+), 4 deletions(-)
>> 
>> diff --git a/arch/powerpc/perf/core-book3s.c 
>> b/arch/powerpc/perf/core-book3s.c
>> index 43599e671d38..d767f7944f85 100644
>> --- a/arch/powerpc/perf/core-book3s.c
>> +++ b/arch/powerpc/perf/core-book3s.c
>> @@ -1010,9 +1010,25 @@ static int check_excludes(struct perf_event **ctrs, 
>> unsigned int cflags[],
>>int n_prev, int n_new)
>> {
>>  int eu = 0, ek = 0, eh = 0;
>> +bool ebb = false;
>>  int i, n, first;
>>  struct perf_event *event;
>> 
>> +n = n_prev + n_new;
>> +if (n <= 1)
>> +return 0;
>> +
>> +first = 1;
>> +for (i = 0; i < n; ++i) {
>> +event = ctrs[i];
>> +if (first) {
>> +ebb = is_ebb_event(event);
>> +first = 0;
>> +} else if (is_ebb_event(event) != ebb) {
>> +return -EAGAIN;
>> +}
>> +}
>> +
>>  /*
>>   * If the PMU we're on supports per event exclude settings then we
>>   * don't need to do any of this logic. NB. This assumes no PMU has both
>> @@ -1021,10 +1037,6 @@ static int check_excludes(struct perf_event **ctrs, 
>> unsigned int cflags[],
>>  if (ppmu->flags & PPMU_ARCH_207S)
>>  return 0;
>> 
>> -n = n_prev + n_new;
>> -if (n <= 1)
>> -return 0;
>> -
>>  first = 1;
>>  for (i = 0; i < n; ++i) {
>>  if (cflags[i] & PPMU_LIMITED_PMC_OK) {
>> -- 
>> 2.27.0



[PATCH V2 5/5] tools/perf: Display sort dimension p_stage_cyc only on supported archs

2021-03-22 Thread Athira Rajeev
The sort dimension "p_stage_cyc" is used to represent pipeline
stage cycle information. Presently, this is used only in powerpc.
For unsupported platforms, we don't want to display it in the perf report
output columns. Hence add a check in sort_dimension__add() and skip the
sort key in case it is not applicable to the particular arch.

Signed-off-by: Athira Rajeev 
---
 tools/perf/arch/powerpc/util/event.c |  7 +++
 tools/perf/util/event.h  |  1 +
 tools/perf/util/sort.c   | 19 +++
 3 files changed, 27 insertions(+)

diff --git a/tools/perf/arch/powerpc/util/event.c 
b/tools/perf/arch/powerpc/util/event.c
index 22521bc9481a..3bf441257466 100644
--- a/tools/perf/arch/powerpc/util/event.c
+++ b/tools/perf/arch/powerpc/util/event.c
@@ -44,3 +44,10 @@ const char *arch_perf_header_entry(const char *se_header)
return "Dispatch Cyc";
return se_header;
 }
+
+int arch_support_sort_key(const char *sort_key)
+{
+   if (!strcmp(sort_key, "p_stage_cyc"))
+   return 1;
+   return 0;
+}
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index e5da4a695ff2..8a62fb39e365 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -429,5 +429,6 @@ void  cpu_map_data__synthesize(struct 
perf_record_cpu_map_data *data, struct per
 void arch_perf_parse_sample_weight(struct perf_sample *data, const __u64 
*array, u64 type);
 void arch_perf_synthesize_sample_weight(const struct perf_sample *data, __u64 
*array, u64 type);
 const char *arch_perf_header_entry(const char *se_header);
+int arch_support_sort_key(const char *sort_key);
 
 #endif /* __PERF_RECORD_H */
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index d262261ad1a6..e8030778ff44 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -47,6 +47,7 @@
 inthave_ignore_callees = 0;
 enum sort_mode sort__mode = SORT_MODE__NORMAL;
 const char *dynamic_headers[] = {"local_ins_lat", "p_stage_cyc"};
+const char *arch_specific_sort_keys[] = {"p_stage_cyc"};
 
 /*
  * Replaces all occurrences of a char used with the:
@@ -1837,6 +1838,11 @@ struct sort_dimension {
int taken;
 };
 
+int __weak arch_support_sort_key(const char *sort_key __maybe_unused)
+{
+   return 0;
+}
+
 const char * __weak arch_perf_header_entry(const char *se_header)
 {
return se_header;
@@ -2773,6 +2779,19 @@ int sort_dimension__add(struct perf_hpp_list *list, 
const char *tok,
 {
unsigned int i, j;
 
+   /*
+* Check to see if there are any arch specific
+* sort dimensions not applicable for the current
+* architecture. If so, Skip that sort key since
+* we don't want to display it in the output fields.
+*/
+   for (j = 0; j < ARRAY_SIZE(arch_specific_sort_keys); j++) {
+   if (!strcmp(arch_specific_sort_keys[j], tok) &&
+   !arch_support_sort_key(tok)) {
+   return 0;
+   }
+   }
+
for (i = 0; i < ARRAY_SIZE(common_sort_dimensions); i++) {
	struct sort_dimension *sd = &common_sort_dimensions[i];
 
-- 
1.8.3.1



[PATCH V2 0/5] powerpc/perf: Export processor pipeline stage cycles information

2021-03-22 Thread Athira Rajeev
rch specific header string for matching
sort order in patch2.
  
Athira Rajeev (5):
  powerpc/perf: Expose processor pipeline stage cycles using
PERF_SAMPLE_WEIGHT_STRUCT
  tools/perf: Add dynamic headers for perf report columns
  tools/perf: Add powerpc support for PERF_SAMPLE_WEIGHT_STRUCT
  tools/perf: Support pipeline stage cycles for powerpc
  tools/perf: Display sort dimension p_stage_cyc only on supported archs

 arch/powerpc/include/asm/perf_event_server.h |  2 +-
 arch/powerpc/perf/core-book3s.c  |  4 +-
 arch/powerpc/perf/isa207-common.c| 29 --
 arch/powerpc/perf/isa207-common.h|  6 ++-
 tools/perf/Documentation/perf-report.txt |  2 +
 tools/perf/arch/powerpc/util/Build   |  2 +
 tools/perf/arch/powerpc/util/event.c | 53 
 tools/perf/arch/powerpc/util/evsel.c |  8 
 tools/perf/util/event.h  |  3 ++
 tools/perf/util/hist.c   | 11 +++--
 tools/perf/util/hist.h   |  1 +
 tools/perf/util/session.c|  4 +-
 tools/perf/util/sort.c   | 60 +++-
 tools/perf/util/sort.h   |  2 +
 14 files changed, 174 insertions(+), 13 deletions(-)
 create mode 100644 tools/perf/arch/powerpc/util/event.c
 create mode 100644 tools/perf/arch/powerpc/util/evsel.c

-- 
1.8.3.1



[PATCH V2 2/5] tools/perf: Add dynamic headers for perf report columns

2021-03-22 Thread Athira Rajeev
Currently the header string for different columns in perf report
is fixed. Some fields of perf sample could have different meaning
for different architectures than the meaning conveyed by the header
string. An example is the new field 'var2_w' of perf_sample_weight
structure. This is presently captured as 'Local INSTR Latency' in
perf mem report. But this could be used to denote a different latency
cycle in another architecture.

Introduce a weak function arch_perf_header_entry() to set the
arch specific header string for the fields which can contain a dynamic
header. If the architecture does not have this function, fall back to
the default header string value.

Signed-off-by: Athira Rajeev 
---
 tools/perf/util/event.h |  1 +
 tools/perf/util/sort.c  | 19 ++-
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index f603edbbbc6f..6106a9c134c9 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -427,5 +427,6 @@ void  cpu_map_data__synthesize(struct 
perf_record_cpu_map_data *data, struct per
 
 void arch_perf_parse_sample_weight(struct perf_sample *data, const __u64 
*array, u64 type);
 void arch_perf_synthesize_sample_weight(const struct perf_sample *data, __u64 
*array, u64 type);
+const char *arch_perf_header_entry(const char *se_header);
 
 #endif /* __PERF_RECORD_H */
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 552b590485bf..eeb03e749181 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -25,6 +25,7 @@
 #include 
 #include "mem-events.h"
 #include "annotate.h"
+#include "event.h"
 #include "time-utils.h"
 #include "cgroup.h"
 #include "machine.h"
@@ -45,6 +46,7 @@
 regex_tignore_callees_regex;
 inthave_ignore_callees = 0;
 enum sort_mode sort__mode = SORT_MODE__NORMAL;
+const char *dynamic_headers[] = {"local_ins_lat"};
 
 /*
  * Replaces all occurrences of a char used with the:
@@ -1816,6 +1818,16 @@ struct sort_dimension {
int taken;
 };
 
+const char * __weak arch_perf_header_entry(const char *se_header)
+{
+   return se_header;
+}
+
+static void sort_dimension_add_dynamic_header(struct sort_dimension *sd)
+{
+   sd->entry->se_header = arch_perf_header_entry(sd->entry->se_header);
+}
+
 #define DIM(d, n, func) [d] = { .name = n, .entry = &(func) }
 
 static struct sort_dimension common_sort_dimensions[] = {
@@ -2739,7 +2751,7 @@ int sort_dimension__add(struct perf_hpp_list *list, const 
char *tok,
struct evlist *evlist,
int level)
 {
-   unsigned int i;
+   unsigned int i, j;
 
for (i = 0; i < ARRAY_SIZE(common_sort_dimensions); i++) {
	struct sort_dimension *sd = &common_sort_dimensions[i];
@@ -2747,6 +2759,11 @@ int sort_dimension__add(struct perf_hpp_list *list, 
const char *tok,
if (strncasecmp(tok, sd->name, strlen(tok)))
continue;
 
+   for (j = 0; j < ARRAY_SIZE(dynamic_headers); j++) {
+   if (!strcmp(dynamic_headers[j], sd->name))
+   sort_dimension_add_dynamic_header(sd);
+   }
+
	if (sd->entry == &sort_parent) {
	int ret = regcomp(&parent_regex, parent_pattern, 
REG_EXTENDED);
if (ret) {
-- 
1.8.3.1



[PATCH V2 1/5] powerpc/perf: Expose processor pipeline stage cycles using PERF_SAMPLE_WEIGHT_STRUCT

2021-03-22 Thread Athira Rajeev
Performance Monitoring Unit (PMU) registers in powerpc provide
information on cycles elapsed between different stages in the
pipeline. This can be used for application tuning. On the ISA v3.1
platform, this information is exposed by sampling registers.
The patch adds kernel support to capture two of the cycle counters
as part of the perf sample, using the sample type
PERF_SAMPLE_WEIGHT_STRUCT.

The power PMU function 'get_mem_weight' currently uses the 64-bit weight
field of perf_sample_data to capture memory latency. But following the
introduction of PERF_SAMPLE_WEIGHT_TYPE, the weight field could contain a
64-bit or 32-bit value depending on the architecture's support for
PERF_SAMPLE_WEIGHT_STRUCT. The patch uses WEIGHT_STRUCT to expose the
pipeline stage cycles info, hence the ppmu functions are updated to work
for both 64-bit and 32-bit weight values.

If the sample type is PERF_SAMPLE_WEIGHT, use the 64-bit weight field.
If the sample type is PERF_SAMPLE_WEIGHT_STRUCT, the memory subsystem
latency is stored in the lower 32 bits of the perf_sample_weight structure.
Also, for CPU_FTR_ARCH_31, capture the two cycle counter values in the
two 16-bit fields of the perf_sample_weight structure.

Signed-off-by: Athira Rajeev 
---
 arch/powerpc/include/asm/perf_event_server.h |  2 +-
 arch/powerpc/perf/core-book3s.c  |  4 ++--
 arch/powerpc/perf/isa207-common.c| 29 +---
 arch/powerpc/perf/isa207-common.h|  6 +-
 4 files changed, 34 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/perf_event_server.h 
b/arch/powerpc/include/asm/perf_event_server.h
index 00e7e671bb4b..112cf092d7b3 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -43,7 +43,7 @@ struct power_pmu {
u64 alt[]);
void(*get_mem_data_src)(union perf_mem_data_src *dsrc,
u32 flags, struct pt_regs *regs);
-   void(*get_mem_weight)(u64 *weight);
+   void(*get_mem_weight)(u64 *weight, u64 type);
unsigned long   group_constraint_mask;
unsigned long   group_constraint_val;
u64 (*bhrb_filter_map)(u64 branch_sample_type);
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 766f064f00fb..6936763246bd 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2206,9 +2206,9 @@ static void record_and_restart(struct perf_event *event, 
unsigned long val,
ppmu->get_mem_data_src)
	ppmu->get_mem_data_src(&data.data_src, ppmu->flags, 
regs);
 
-   if (event->attr.sample_type & PERF_SAMPLE_WEIGHT &&
+   if (event->attr.sample_type & PERF_SAMPLE_WEIGHT_TYPE &&
ppmu->get_mem_weight)
-   ppmu->get_mem_weight(&data.weight.full);
+   ppmu->get_mem_weight(&data.weight.full, 
event->attr.sample_type);
 
	if (perf_event_overflow(event, &data, regs))
power_pmu_stop(event, 0);
diff --git a/arch/powerpc/perf/isa207-common.c 
b/arch/powerpc/perf/isa207-common.c
index e4f577da33d8..5dcbdbd54598 100644
--- a/arch/powerpc/perf/isa207-common.c
+++ b/arch/powerpc/perf/isa207-common.c
@@ -284,8 +284,10 @@ void isa207_get_mem_data_src(union perf_mem_data_src 
*dsrc, u32 flags,
}
 }
 
-void isa207_get_mem_weight(u64 *weight)
+void isa207_get_mem_weight(u64 *weight, u64 type)
 {
+   union perf_sample_weight *weight_fields;
+   u64 weight_lat;
u64 mmcra = mfspr(SPRN_MMCRA);
u64 exp = MMCRA_THR_CTR_EXP(mmcra);
u64 mantissa = MMCRA_THR_CTR_MANT(mmcra);
@@ -296,9 +298,30 @@ void isa207_get_mem_weight(u64 *weight)
mantissa = P10_MMCRA_THR_CTR_MANT(mmcra);
 
if (val == 0 || val == 7)
-   *weight = 0;
+   weight_lat = 0;
else
-   *weight = mantissa << (2 * exp);
+   weight_lat = mantissa << (2 * exp);
+
+   /*
+* Use 64 bit weight field (full) if sample type is
+* WEIGHT.
+*
+* if sample type is WEIGHT_STRUCT:
+* - store memory latency in the lower 32 bits.
+* - For ISA v3.1, use remaining two 16 bit fields of
+*   perf_sample_weight to store cycle counter values
+*   from sier2.
+*/
+   weight_fields = (union perf_sample_weight *)weight;
+   if (type & PERF_SAMPLE_WEIGHT)
+   weight_fields->full = weight_lat;
+   else {
+   weight_fields->var1_dw = (u32)weight_lat;
+   if (cpu_has_feature(CPU_FTR_ARCH_31)) {
+   weight_fields->var2_w = 
P10_SIER2_FINISH_CYC(mfspr(SPRN_SIER2));
+   weight_fields->var3_w = 
P10_SIER2_DISPATCH_CYC(mfspr(SPRN_SIER2));
+   

[PATCH V2 4/5] tools/perf: Support pipeline stage cycles for powerpc

2021-03-22 Thread Athira Rajeev
The pipeline stage cycles details can be recorded on powerpc from
the contents of Performance Monitor Unit (PMU) registers. On the
ISA v3.1 platform, sampling registers expose the cycles spent in
different pipeline stages. The patch adds perf tools support to present
two of the cycle counters along with memory latency (weight).

Re-use the field 'ins_lat' for storing the first pipeline stage cycle.
This is stored in 'var2_w' field of 'perf_sample_weight'.

Add a new field 'p_stage_cyc' to store the second pipeline stage cycle
which is stored in 'var3_w' field of perf_sample_weight.

Add new sort function 'Pipeline Stage Cycle' and include this in
default_mem_sort_order[]. This new sort function may be used to denote
some other pipeline stage in another architecture. So add this to
list of sort entries that can have dynamic header string.

Signed-off-by: Athira Rajeev 
---
 tools/perf/Documentation/perf-report.txt |  2 ++
 tools/perf/arch/powerpc/util/event.c | 18 --
 tools/perf/util/event.h  |  1 +
 tools/perf/util/hist.c   | 11 ---
 tools/perf/util/hist.h   |  1 +
 tools/perf/util/session.c|  4 +++-
 tools/perf/util/sort.c   | 24 ++--
 tools/perf/util/sort.h   |  2 ++
 8 files changed, 55 insertions(+), 8 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt 
b/tools/perf/Documentation/perf-report.txt
index f546b5e9db05..563fb01a9b8d 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -112,6 +112,8 @@ OPTIONS
- ins_lat: Instruction latency in core cycles. This is the global 
instruction
  latency
- local_ins_lat: Local instruction latency version
+   - p_stage_cyc: On powerpc, this presents the number of cycles spent in a
+ pipeline stage. And currently supported only on powerpc.
 
By default, comm, dso and symbol keys are used.
(i.e. --sort comm,dso,symbol)
diff --git a/tools/perf/arch/powerpc/util/event.c 
b/tools/perf/arch/powerpc/util/event.c
index f49d32c2c8ae..22521bc9481a 100644
--- a/tools/perf/arch/powerpc/util/event.c
+++ b/tools/perf/arch/powerpc/util/event.c
@@ -18,8 +18,11 @@ void arch_perf_parse_sample_weight(struct perf_sample *data,
weight.full = *array;
if (type & PERF_SAMPLE_WEIGHT)
data->weight = weight.full;
-   else
+   else {
data->weight = weight.var1_dw;
+   data->ins_lat = weight.var2_w;
+   data->p_stage_cyc = weight.var3_w;
+   }
 }
 
 void arch_perf_synthesize_sample_weight(const struct perf_sample *data,
@@ -27,6 +30,17 @@ void arch_perf_synthesize_sample_weight(const struct 
perf_sample *data,
 {
*array = data->weight;
 
-   if (type & PERF_SAMPLE_WEIGHT_STRUCT)
+   if (type & PERF_SAMPLE_WEIGHT_STRUCT) {
*array &= 0x;
+   *array |= ((u64)data->ins_lat << 32);
+   }
+}
+
+const char *arch_perf_header_entry(const char *se_header)
+{
+   if (!strcmp(se_header, "Local INSTR Latency"))
+   return "Finish Cyc";
+   else if (!strcmp(se_header, "Pipeline Stage Cycle"))
+   return "Dispatch Cyc";
+   return se_header;
 }
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 6106a9c134c9..e5da4a695ff2 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -147,6 +147,7 @@ struct perf_sample {
u8  cpumode;
u16 misc;
u16 ins_lat;
+   u16 p_stage_cyc;
bool no_hw_idx; /* No hw_idx collected in branch_stack */
char insn[MAX_INSN];
void *raw_data;
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index c82f5fc26af8..9299ee535518 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -211,6 +211,7 @@ void hists__calc_col_len(struct hists *hists, struct 
hist_entry *h)
hists__new_col_len(hists, HISTC_MEM_BLOCKED, 10);
hists__new_col_len(hists, HISTC_LOCAL_INS_LAT, 13);
hists__new_col_len(hists, HISTC_GLOBAL_INS_LAT, 13);
+   hists__new_col_len(hists, HISTC_P_STAGE_CYC, 13);
if (symbol_conf.nanosecs)
hists__new_col_len(hists, HISTC_TIME, 16);
else
@@ -289,13 +290,14 @@ static long hist_time(unsigned long htime)
 }
 
 static void he_stat__add_period(struct he_stat *he_stat, u64 period,
-   u64 weight, u64 ins_lat)
+   u64 weight, u64 ins_lat, u64 p_stage_cyc)
 {
 
he_stat->period += period;
he_stat->weight += weight;
he_stat->nr_events  += 1;
he_stat->ins_lat+= ins_lat;
+   he_stat->p_stage_cyc+= p_stage_cyc;
 }
 
 static void he_stat__add_stat(struct he_stat *dest, stru

[PATCH V2 3/5] tools/perf: Add powerpc support for PERF_SAMPLE_WEIGHT_STRUCT

2021-03-22 Thread Athira Rajeev
Add arch specific arch_evsel__set_sample_weight() to set the new
sample type for powerpc.

Add arch specific arch_perf_parse_sample_weight() to store the
sample->weight values depending on the sample type applied.
If the new sample type (PERF_SAMPLE_WEIGHT_STRUCT) is applied,
store only the lower 32 bits to sample->weight. If the sample type
is 'PERF_SAMPLE_WEIGHT', store the full 64-bit value to sample->weight.

Signed-off-by: Athira Rajeev 
---
 tools/perf/arch/powerpc/util/Build   |  2 ++
 tools/perf/arch/powerpc/util/event.c | 32 
 tools/perf/arch/powerpc/util/evsel.c |  8 
 3 files changed, 42 insertions(+)
 create mode 100644 tools/perf/arch/powerpc/util/event.c
 create mode 100644 tools/perf/arch/powerpc/util/evsel.c

diff --git a/tools/perf/arch/powerpc/util/Build 
b/tools/perf/arch/powerpc/util/Build
index b7945e5a543b..8a79c4126e5b 100644
--- a/tools/perf/arch/powerpc/util/Build
+++ b/tools/perf/arch/powerpc/util/Build
@@ -4,6 +4,8 @@ perf-y += kvm-stat.o
 perf-y += perf_regs.o
 perf-y += mem-events.o
 perf-y += sym-handling.o
+perf-y += evsel.o
+perf-y += event.o
 
 perf-$(CONFIG_DWARF) += dwarf-regs.o
 perf-$(CONFIG_DWARF) += skip-callchain-idx.o
diff --git a/tools/perf/arch/powerpc/util/event.c 
b/tools/perf/arch/powerpc/util/event.c
new file mode 100644
index ..f49d32c2c8ae
--- /dev/null
+++ b/tools/perf/arch/powerpc/util/event.c
@@ -0,0 +1,32 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include 
+#include 
+
+#include "../../../util/event.h"
+#include "../../../util/synthetic-events.h"
+#include "../../../util/machine.h"
+#include "../../../util/tool.h"
+#include "../../../util/map.h"
+#include "../../../util/debug.h"
+
+void arch_perf_parse_sample_weight(struct perf_sample *data,
+  const __u64 *array, u64 type)
+{
+   union perf_sample_weight weight;
+
+   weight.full = *array;
+   if (type & PERF_SAMPLE_WEIGHT)
+   data->weight = weight.full;
+   else
+   data->weight = weight.var1_dw;
+}
+
+void arch_perf_synthesize_sample_weight(const struct perf_sample *data,
+   __u64 *array, u64 type)
+{
+   *array = data->weight;
+
+   if (type & PERF_SAMPLE_WEIGHT_STRUCT)
+   *array &= 0x;
+}
diff --git a/tools/perf/arch/powerpc/util/evsel.c 
b/tools/perf/arch/powerpc/util/evsel.c
new file mode 100644
index ..2f733cdc8dbb
--- /dev/null
+++ b/tools/perf/arch/powerpc/util/evsel.c
@@ -0,0 +1,8 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include "util/evsel.h"
+
+void arch_evsel__set_sample_weight(struct evsel *evsel)
+{
+   evsel__set_sample_bit(evsel, WEIGHT_STRUCT);
+}
-- 
1.8.3.1



Re: [PATCH 4/4] tools/perf: Support pipeline stage cycles for powerpc

2021-03-15 Thread Athira Rajeev



> On 12-Mar-2021, at 6:26 PM, Jiri Olsa  wrote:
> 
> On Tue, Mar 09, 2021 at 09:04:00AM -0500, Athira Rajeev wrote:
>> The pipeline stage cycles details can be recorded on powerpc from
>> the contents of Performance Monitor Unit (PMU) registers. On
>> ISA v3.1 platform, sampling registers expose the cycles spent in
>> different pipeline stages. The patch adds perf tools support to present
>> two of the cycle counters along with memory latency (weight).
>> 
>> Re-use the field 'ins_lat' for storing the first pipeline stage cycle.
>> This is stored in 'var2_w' field of 'perf_sample_weight'.
>> 
>> Add a new field 'p_stage_cyc' to store the second pipeline stage cycle
>> which is stored in 'var3_w' field of perf_sample_weight.
>> 
>> Add new sort function 'Pipeline Stage Cycle' and include this in
>> default_mem_sort_order[]. This new sort function may be used to denote
>> some other pipeline stage in another architecture. So add this to
>> list of sort entries that can have dynamic header string.
>> 
>> Signed-off-by: Athira Rajeev 
>> ---
>> tools/perf/Documentation/perf-report.txt |  1 +
>> tools/perf/arch/powerpc/util/event.c | 18 --
>> tools/perf/util/event.h  |  1 +
>> tools/perf/util/hist.c   | 11 ---
>> tools/perf/util/hist.h   |  1 +
>> tools/perf/util/session.c|  4 +++-
>> tools/perf/util/sort.c   | 24 ++--
>> tools/perf/util/sort.h   |  2 ++
>> 8 files changed, 54 insertions(+), 8 deletions(-)
>> 
>> diff --git a/tools/perf/Documentation/perf-report.txt 
>> b/tools/perf/Documentation/perf-report.txt
>> index f546b5e9db05..9691d9c227ba 100644
>> --- a/tools/perf/Documentation/perf-report.txt
>> +++ b/tools/perf/Documentation/perf-report.txt
>> @@ -112,6 +112,7 @@ OPTIONS
>>  - ins_lat: Instruction latency in core cycles. This is the global 
>> instruction
>>latency
>>  - local_ins_lat: Local instruction latency version
>> +- p_stage_cyc: Number of cycles spent in a pipeline stage.
> 
> please specify in here that it's ppc only

Ok Sure,

> 
> SNIP
> 
>> +struct sort_entry sort_p_stage_cyc = {
>> +.se_header  = "Pipeline Stage Cycle",
>> +.se_cmp = sort__global_p_stage_cyc_cmp,
>> +.se_snprintf= hist_entry__p_stage_cyc_snprintf,
>> +.se_width_idx   = HISTC_P_STAGE_CYC,
>> +};
>> +
>> struct sort_entry sort_mem_daddr_sym = {
>>  .se_header  = "Data Symbol",
>>  .se_cmp = sort__daddr_cmp,
>> @@ -1853,6 +1872,7 @@ static void sort_dimension_add_dynamic_header(struct 
>> sort_dimension *sd)
>>  DIM(SORT_CODE_PAGE_SIZE, "code_page_size", sort_code_page_size),
>>  DIM(SORT_LOCAL_INS_LAT, "local_ins_lat", sort_local_ins_lat),
>>  DIM(SORT_GLOBAL_INS_LAT, "ins_lat", sort_global_ins_lat),
>> +DIM(SORT_P_STAGE_CYC, "p_stage_cyc", sort_p_stage_cyc),
> 
> this might be out of scope for this patch, but would it make sense
> to add arch specific sort dimension? so the specific column is
> not even visible on arch that it's not supported on
> 

Hi Jiri,

Thanks for the suggestions.

Below is an approach I came up with for adding a dynamic sort key based on
architecture support. With this patch, perf report for mem mode will display
the new sort key only on supported archs.
Please help to review whether this approach looks good. I have created this
on top of my current set. If this looks fine,
I can include it in the version 2 patch set.

From 8ebbe6ae802d895103335899e4e60dde5e562f33 Mon Sep 17 00:00:00 2001
From: Athira Rajeev 
Date: Mon, 15 Mar 2021 02:33:28 +
Subject: [PATCH] tools/perf: Add dynamic sort dimensions for mem mode

Add dynamic sort dimensions for mem mode.

Signed-off-by: Athira Rajeev 
---
 tools/perf/arch/powerpc/util/event.c |  7 +
 tools/perf/util/event.h  |  1 +
 tools/perf/util/sort.c   | 43 +++-
 3 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/tools/perf/arch/powerpc/util/event.c 
b/tools/perf/arch/powerpc/util/event.c
index b80fbee83b6e..fddfc288c415 100644
--- a/tools/perf/arch/powerpc/util/event.c
+++ b/tools/perf/arch/powerpc/util/event.c
@@ -44,3 +44,10 @@ const char *arch_perf_header_entry__add(const char 
*se_header)
return "Dispatch Cyc";
return se_header;
 }
+
+int arch_support_dynamic_key(const char *sort_key)
+{
+   if (!strcmp(sort_key, "p_stage_cyc"))
+   return 1;
+   return 0;
+}
diff --g

Re: [PATCH 2/4] tools/perf: Add dynamic headers for perf report columns

2021-03-15 Thread Athira Rajeev



> On 12-Mar-2021, at 6:27 PM, Jiri Olsa  wrote:
> 
> On Tue, Mar 09, 2021 at 09:03:58AM -0500, Athira Rajeev wrote:
>> Currently the header string for different columns in perf report
>> is fixed. Some fields of perf sample could have different meaning
>> for different architectures than the meaning conveyed by the header
>> string. An example is the new field 'var2_w' of perf_sample_weight
>> structure. This is presently captured as 'Local INSTR Latency' in
>> perf mem report. But this could be used to denote a different latency
>> cycle in another architecture.
>> 
>> Introduce a weak function arch_perf_header_entry__add() to set
>> the arch specific header string for the fields which can contain dynamic
>> header. If the architecture does not have this function, fall back to the
>> default header string value.
>> 
>> Signed-off-by: Athira Rajeev 
>> ---
>> tools/perf/util/event.h |  1 +
>> tools/perf/util/sort.c  | 19 ++-
>> 2 files changed, 19 insertions(+), 1 deletion(-)
>> 
>> diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
>> index f603edbbbc6f..89b149e2e70a 100644
>> --- a/tools/perf/util/event.h
>> +++ b/tools/perf/util/event.h
>> @@ -427,5 +427,6 @@ void  cpu_map_data__synthesize(struct 
>> perf_record_cpu_map_data *data, struct per
>> 
>> void arch_perf_parse_sample_weight(struct perf_sample *data, const __u64 
>> *array, u64 type);
>> void arch_perf_synthesize_sample_weight(const struct perf_sample *data, 
>> __u64 *array, u64 type);
>> +const char *arch_perf_header_entry__add(const char *se_header);
>> 
>> #endif /* __PERF_RECORD_H */
>> diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
>> index 0d5ad42812b9..741a6df29fa0 100644
>> --- a/tools/perf/util/sort.c
>> +++ b/tools/perf/util/sort.c
>> @@ -25,6 +25,7 @@
>> #include 
>> #include "mem-events.h"
>> #include "annotate.h"
>> +#include "event.h"
>> #include "time-utils.h"
>> #include "cgroup.h"
>> #include "machine.h"
>> @@ -45,6 +46,7 @@
>> regex_t  ignore_callees_regex;
>> int  have_ignore_callees = 0;
>> enum sort_mode   sort__mode = SORT_MODE__NORMAL;
>> +const char  *dynamic_headers[] = {"local_ins_lat"};
>> 
>> /*
>>  * Replaces all occurrences of a char used with the:
>> @@ -1816,6 +1818,16 @@ struct sort_dimension {
>>  int taken;
>> };
>> 
>> +const char * __weak arch_perf_header_entry__add(const char *se_header)
> 
> no need for the __add suffix in here
> 
> jirka
> 

Thanks Jiri for the review.

I will include this change in next version.

Thanks
Athira

>> +{
>> +return se_header;
>> +}
>> +
>> +static void sort_dimension_add_dynamic_header(struct sort_dimension *sd)
>> +{
>> +sd->entry->se_header = 
>> arch_perf_header_entry__add(sd->entry->se_header);
>> +}
>> +
>> #define DIM(d, n, func) [d] = { .name = n, .entry = &(func) }
>> 
>> static struct sort_dimension common_sort_dimensions[] = {
>> @@ -2739,11 +2751,16 @@ int sort_dimension__add(struct perf_hpp_list *list, 
>> const char *tok,
>>  struct evlist *evlist,
>>  int level)
>> {
>> -unsigned int i;
>> +unsigned int i, j;
>> 
>>  for (i = 0; i < ARRAY_SIZE(common_sort_dimensions); i++) {
>>  struct sort_dimension *sd = &common_sort_dimensions[i];
>> 
>> +for (j = 0; j < ARRAY_SIZE(dynamic_headers); j++) {
>> +if (!strcmp(dynamic_headers[j], sd->name))
>> +sort_dimension_add_dynamic_header(sd);
>> +}
>> +
>>  if (strncasecmp(tok, sd->name, strlen(tok)))
>>  continue;
>> 
>> -- 
>> 1.8.3.1



[PATCH 4/4] tools/perf: Support pipeline stage cycles for powerpc

2021-03-09 Thread Athira Rajeev
The pipeline stage cycles details can be recorded on powerpc from
the contents of Performance Monitor Unit (PMU) registers. On the
ISA v3.1 platform, sampling registers expose the cycles spent in
different pipeline stages. The patch adds perf tools support to present
two of the cycle counters along with memory latency (weight).

Re-use the field 'ins_lat' for storing the first pipeline stage cycle.
This is stored in 'var2_w' field of 'perf_sample_weight'.

Add a new field 'p_stage_cyc' to store the second pipeline stage cycle
which is stored in 'var3_w' field of perf_sample_weight.

Add new sort function 'Pipeline Stage Cycle' and include this in
default_mem_sort_order[]. This new sort function may be used to denote
some other pipeline stage in another architecture. So add this to
list of sort entries that can have dynamic header string.

Signed-off-by: Athira Rajeev 
---
 tools/perf/Documentation/perf-report.txt |  1 +
 tools/perf/arch/powerpc/util/event.c | 18 --
 tools/perf/util/event.h  |  1 +
 tools/perf/util/hist.c   | 11 ---
 tools/perf/util/hist.h   |  1 +
 tools/perf/util/session.c|  4 +++-
 tools/perf/util/sort.c   | 24 ++--
 tools/perf/util/sort.h   |  2 ++
 8 files changed, 54 insertions(+), 8 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt 
b/tools/perf/Documentation/perf-report.txt
index f546b5e9db05..9691d9c227ba 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -112,6 +112,7 @@ OPTIONS
- ins_lat: Instruction latency in core cycles. This is the global instruction
  latency
- local_ins_lat: Local instruction latency version
+   - p_stage_cyc: Number of cycles spent in a pipeline stage.
 
By default, comm, dso and symbol keys are used.
(i.e. --sort comm,dso,symbol)
diff --git a/tools/perf/arch/powerpc/util/event.c 
b/tools/perf/arch/powerpc/util/event.c
index f49d32c2c8ae..b80fbee83b6e 100644
--- a/tools/perf/arch/powerpc/util/event.c
+++ b/tools/perf/arch/powerpc/util/event.c
@@ -18,8 +18,11 @@ void arch_perf_parse_sample_weight(struct perf_sample *data,
weight.full = *array;
if (type & PERF_SAMPLE_WEIGHT)
data->weight = weight.full;
-   else
+   else {
data->weight = weight.var1_dw;
+   data->ins_lat = weight.var2_w;
+   data->p_stage_cyc = weight.var3_w;
+   }
 }
 
 void arch_perf_synthesize_sample_weight(const struct perf_sample *data,
@@ -27,6 +30,17 @@ void arch_perf_synthesize_sample_weight(const struct 
perf_sample *data,
 {
*array = data->weight;
 
-   if (type & PERF_SAMPLE_WEIGHT_STRUCT)
+   if (type & PERF_SAMPLE_WEIGHT_STRUCT) {
*array &= 0xffffffff;
+   *array |= ((u64)data->ins_lat << 32);
+   }
+}
+
+const char *arch_perf_header_entry__add(const char *se_header)
+{
+   if (!strcmp(se_header, "Local INSTR Latency"))
+   return "Finish Cyc";
+   else if (!strcmp(se_header, "Pipeline Stage Cycle"))
+   return "Dispatch Cyc";
+   return se_header;
 }
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 89b149e2e70a..65f89e80916f 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -147,6 +147,7 @@ struct perf_sample {
u8  cpumode;
u16 misc;
u16 ins_lat;
+   u16 p_stage_cyc;
bool no_hw_idx; /* No hw_idx collected in branch_stack */
char insn[MAX_INSN];
void *raw_data;
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index c82f5fc26af8..9299ee535518 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -211,6 +211,7 @@ void hists__calc_col_len(struct hists *hists, struct 
hist_entry *h)
hists__new_col_len(hists, HISTC_MEM_BLOCKED, 10);
hists__new_col_len(hists, HISTC_LOCAL_INS_LAT, 13);
hists__new_col_len(hists, HISTC_GLOBAL_INS_LAT, 13);
+   hists__new_col_len(hists, HISTC_P_STAGE_CYC, 13);
if (symbol_conf.nanosecs)
hists__new_col_len(hists, HISTC_TIME, 16);
else
@@ -289,13 +290,14 @@ static long hist_time(unsigned long htime)
 }
 
 static void he_stat__add_period(struct he_stat *he_stat, u64 period,
-   u64 weight, u64 ins_lat)
+   u64 weight, u64 ins_lat, u64 p_stage_cyc)
 {
 
he_stat->period += period;
he_stat->weight += weight;
he_stat->nr_events  += 1;
he_stat->ins_lat+= ins_lat;
+   he_stat->p_stage_cyc+= p_stage_cyc;
 }
 
 static void he_stat__add_stat(struct he_stat *dest, struct he_stat *src)
@@ -308,6 +310,7 @@ static void he_stat__add_stat(struct he_st

[PATCH 3/4] tools/perf: Add powerpc support for PERF_SAMPLE_WEIGHT_STRUCT

2021-03-09 Thread Athira Rajeev
Add arch specific arch_evsel__set_sample_weight() to set the new
sample type for powerpc.

Add arch specific arch_perf_parse_sample_weight() to store the
sample->weight values depending on the sample type applied.
If the new sample type (PERF_SAMPLE_WEIGHT_STRUCT) is applied,
store only the lower 32 bits to sample->weight. If the sample type
is 'PERF_SAMPLE_WEIGHT', store the full 64-bit value to sample->weight.

Signed-off-by: Athira Rajeev 
---
 tools/perf/arch/powerpc/util/Build   |  2 ++
 tools/perf/arch/powerpc/util/event.c | 32 
 tools/perf/arch/powerpc/util/evsel.c |  8 
 3 files changed, 42 insertions(+)
 create mode 100644 tools/perf/arch/powerpc/util/event.c
 create mode 100644 tools/perf/arch/powerpc/util/evsel.c

diff --git a/tools/perf/arch/powerpc/util/Build 
b/tools/perf/arch/powerpc/util/Build
index b7945e5a543b..8a79c4126e5b 100644
--- a/tools/perf/arch/powerpc/util/Build
+++ b/tools/perf/arch/powerpc/util/Build
@@ -4,6 +4,8 @@ perf-y += kvm-stat.o
 perf-y += perf_regs.o
 perf-y += mem-events.o
 perf-y += sym-handling.o
+perf-y += evsel.o
+perf-y += event.o
 
 perf-$(CONFIG_DWARF) += dwarf-regs.o
 perf-$(CONFIG_DWARF) += skip-callchain-idx.o
diff --git a/tools/perf/arch/powerpc/util/event.c 
b/tools/perf/arch/powerpc/util/event.c
new file mode 100644
index ..f49d32c2c8ae
--- /dev/null
+++ b/tools/perf/arch/powerpc/util/event.c
@@ -0,0 +1,32 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include 
+#include 
+
+#include "../../../util/event.h"
+#include "../../../util/synthetic-events.h"
+#include "../../../util/machine.h"
+#include "../../../util/tool.h"
+#include "../../../util/map.h"
+#include "../../../util/debug.h"
+
+void arch_perf_parse_sample_weight(struct perf_sample *data,
+  const __u64 *array, u64 type)
+{
+   union perf_sample_weight weight;
+
+   weight.full = *array;
+   if (type & PERF_SAMPLE_WEIGHT)
+   data->weight = weight.full;
+   else
+   data->weight = weight.var1_dw;
+}
+
+void arch_perf_synthesize_sample_weight(const struct perf_sample *data,
+   __u64 *array, u64 type)
+{
+   *array = data->weight;
+
+   if (type & PERF_SAMPLE_WEIGHT_STRUCT)
+   *array &= 0xffffffff;
+}
diff --git a/tools/perf/arch/powerpc/util/evsel.c 
b/tools/perf/arch/powerpc/util/evsel.c
new file mode 100644
index ..2f733cdc8dbb
--- /dev/null
+++ b/tools/perf/arch/powerpc/util/evsel.c
@@ -0,0 +1,8 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include "util/evsel.h"
+
+void arch_evsel__set_sample_weight(struct evsel *evsel)
+{
+   evsel__set_sample_bit(evsel, WEIGHT_STRUCT);
+}
-- 
1.8.3.1



[PATCH 2/4] tools/perf: Add dynamic headers for perf report columns

2021-03-09 Thread Athira Rajeev
Currently the header string for different columns in perf report
is fixed. Some fields of perf sample could have different meaning
for different architectures than the meaning conveyed by the header
string. An example is the new field 'var2_w' of perf_sample_weight
structure. This is presently captured as 'Local INSTR Latency' in
perf mem report. But this could be used to denote a different latency
cycle in another architecture.

Introduce a weak function arch_perf_header_entry__add() to set
the arch specific header string for the fields which can have a dynamic
header. If the architecture does not implement this function, fall back
to the default header string value.

Signed-off-by: Athira Rajeev 
---
 tools/perf/util/event.h |  1 +
 tools/perf/util/sort.c  | 19 ++-
 2 files changed, 19 insertions(+), 1 deletion(-)
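The weak-function pattern this patch relies on can be sketched standalone as below (structures reduced to the fields needed; with no strong arch definition linked in, the weak fallback keeps the default header, which is exactly the fallback behaviour described above):

```c
#include <assert.h>
#include <string.h>

/* GCC/Clang weak attribute, as the kernel's __weak macro expands to. */
#define __weak __attribute__((weak))

/* Reduced stand-ins for the perf sort structures. */
struct sort_entry {
	const char *se_header;
};

struct sort_dimension {
	const char *name;
	struct sort_entry *entry;
};

/*
 * Weak fallback: an architecture that wants a different column header
 * provides a strong definition (as powerpc does in patch 4/4), which
 * the linker prefers; otherwise the default string is returned as-is.
 */
const char * __weak arch_perf_header_entry__add(const char *se_header)
{
	return se_header;
}

static void sort_dimension_add_dynamic_header(struct sort_dimension *sd)
{
	sd->entry->se_header = arch_perf_header_entry__add(sd->entry->se_header);
}
```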

diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index f603edbbbc6f..89b149e2e70a 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -427,5 +427,6 @@ void  cpu_map_data__synthesize(struct 
perf_record_cpu_map_data *data, struct per
 
 void arch_perf_parse_sample_weight(struct perf_sample *data, const __u64 *array, u64 type);
 void arch_perf_synthesize_sample_weight(const struct perf_sample *data, __u64 *array, u64 type);
+const char *arch_perf_header_entry__add(const char *se_header);
 
 #endif /* __PERF_RECORD_H */
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 0d5ad42812b9..741a6df29fa0 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -25,6 +25,7 @@
 #include 
 #include "mem-events.h"
 #include "annotate.h"
+#include "event.h"
 #include "time-utils.h"
 #include "cgroup.h"
 #include "machine.h"
@@ -45,6 +46,7 @@
 regex_tignore_callees_regex;
 inthave_ignore_callees = 0;
 enum sort_mode sort__mode = SORT_MODE__NORMAL;
+const char *dynamic_headers[] = {"local_ins_lat"};
 
 /*
  * Replaces all occurrences of a char used with the:
@@ -1816,6 +1818,16 @@ struct sort_dimension {
int taken;
 };
 
+const char * __weak arch_perf_header_entry__add(const char *se_header)
+{
+   return se_header;
+}
+
+static void sort_dimension_add_dynamic_header(struct sort_dimension *sd)
+{
+   sd->entry->se_header = arch_perf_header_entry__add(sd->entry->se_header);
+}
+
 #define DIM(d, n, func) [d] = { .name = n, .entry = &(func) }
 
 static struct sort_dimension common_sort_dimensions[] = {
@@ -2739,11 +2751,16 @@ int sort_dimension__add(struct perf_hpp_list *list, 
const char *tok,
struct evlist *evlist,
int level)
 {
-   unsigned int i;
+   unsigned int i, j;
 
for (i = 0; i < ARRAY_SIZE(common_sort_dimensions); i++) {
struct sort_dimension *sd = &common_sort_dimensions[i];
 
+   for (j = 0; j < ARRAY_SIZE(dynamic_headers); j++) {
+   if (!strcmp(dynamic_headers[j], sd->name))
+   sort_dimension_add_dynamic_header(sd);
+   }
+
if (strncasecmp(tok, sd->name, strlen(tok)))
continue;
 
-- 
1.8.3.1



[PATCH 1/4] powerpc/perf: Expose processor pipeline stage cycles using PERF_SAMPLE_WEIGHT_STRUCT

2021-03-09 Thread Athira Rajeev
Performance Monitoring Unit (PMU) registers in powerpc provides
information on cycles elapsed between different stages in the
pipeline. This can be used for application tuning. On ISA v3.1
platform, this information is exposed by sampling registers.
Patch adds kernel support to capture two of the cycle counters
as part of perf sample using the sample type:
PERF_SAMPLE_WEIGHT_STRUCT.

The power PMU function 'get_mem_weight' currently uses the 64-bit weight
field of perf_sample_data to capture memory latency. But following the
introduction of PERF_SAMPLE_WEIGHT_TYPE, the weight field could contain
a 64-bit or 32-bit value depending on the architecture's support for
PERF_SAMPLE_WEIGHT_STRUCT. This patch uses WEIGHT_STRUCT to expose the
pipeline stage cycles info; hence update the ppmu functions to work for
both 64-bit and 32-bit weight values.

If the sample type is PERF_SAMPLE_WEIGHT, use the 64-bit weight field.
If the sample type is PERF_SAMPLE_WEIGHT_STRUCT, memory subsystem
latency is stored in the low 32 bits of the perf_sample_weight structure.
Also for CPU_FTR_ARCH_31, capture the two cycle counter values in
two 16-bit fields of the perf_sample_weight structure.

Signed-off-by: Athira Rajeev 
---
 arch/powerpc/include/asm/perf_event_server.h |  2 +-
 arch/powerpc/perf/core-book3s.c  |  4 ++--
 arch/powerpc/perf/isa207-common.c| 29 +---
 arch/powerpc/perf/isa207-common.h|  6 +-
 4 files changed, 34 insertions(+), 7 deletions(-)
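The dual-format handling described above can be sketched standalone as follows (the PERF_SAMPLE_* bit values mirror the uapi at the time of this series; the SIER2 register reads are replaced by plain parameters, so this is an illustration, not the kernel implementation):

```c
#include <assert.h>
#include <stdint.h>

/* Sample-type bits as in include/uapi/linux/perf_event.h. */
#define PERF_SAMPLE_WEIGHT		(1ULL << 14)
#define PERF_SAMPLE_WEIGHT_STRUCT	(1ULL << 24)

union sample_weight {
	uint64_t full;
	struct {
		uint32_t var1_dw;
		uint16_t var2_w;
		uint16_t var3_w;
	};
};

/*
 * Sketch of the split done in isa207_get_mem_weight(): a plain WEIGHT
 * sample keeps the full 64-bit latency; a WEIGHT_STRUCT sample keeps
 * the latency in the low 32 bits, freeing var2_w/var3_w for the SIER2
 * finish/dispatch cycle counters (passed in here as parameters).
 */
static void fill_mem_weight(union sample_weight *w, uint64_t sample_type,
			    uint64_t weight_lat, uint16_t finish_cyc,
			    uint16_t dispatch_cyc)
{
	if (sample_type & PERF_SAMPLE_WEIGHT) {
		w->full = weight_lat;
	} else {
		w->var1_dw = (uint32_t)weight_lat;
		w->var2_w  = finish_cyc;
		w->var3_w  = dispatch_cyc;
	}
}
```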

diff --git a/arch/powerpc/include/asm/perf_event_server.h 
b/arch/powerpc/include/asm/perf_event_server.h
index 00e7e671bb4b..112cf092d7b3 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -43,7 +43,7 @@ struct power_pmu {
u64 alt[]);
void(*get_mem_data_src)(union perf_mem_data_src *dsrc,
u32 flags, struct pt_regs *regs);
-   void(*get_mem_weight)(u64 *weight);
+   void(*get_mem_weight)(u64 *weight, u64 type);
unsigned long   group_constraint_mask;
unsigned long   group_constraint_val;
u64 (*bhrb_filter_map)(u64 branch_sample_type);
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 6817331e22ff..57ff2494880c 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2206,9 +2206,9 @@ static void record_and_restart(struct perf_event *event, 
unsigned long val,
ppmu->get_mem_data_src)
ppmu->get_mem_data_src(&data.data_src, ppmu->flags, regs);
 
-   if (event->attr.sample_type & PERF_SAMPLE_WEIGHT &&
+   if (event->attr.sample_type & PERF_SAMPLE_WEIGHT_TYPE &&
ppmu->get_mem_weight)
-   ppmu->get_mem_weight(&data.weight.full);
+   ppmu->get_mem_weight(&data.weight.full, event->attr.sample_type);
 
if (perf_event_overflow(event, , regs))
power_pmu_stop(event, 0);
diff --git a/arch/powerpc/perf/isa207-common.c 
b/arch/powerpc/perf/isa207-common.c
index e4f577da33d8..5dcbdbd54598 100644
--- a/arch/powerpc/perf/isa207-common.c
+++ b/arch/powerpc/perf/isa207-common.c
@@ -284,8 +284,10 @@ void isa207_get_mem_data_src(union perf_mem_data_src 
*dsrc, u32 flags,
}
 }
 
-void isa207_get_mem_weight(u64 *weight)
+void isa207_get_mem_weight(u64 *weight, u64 type)
 {
+   union perf_sample_weight *weight_fields;
+   u64 weight_lat;
u64 mmcra = mfspr(SPRN_MMCRA);
u64 exp = MMCRA_THR_CTR_EXP(mmcra);
u64 mantissa = MMCRA_THR_CTR_MANT(mmcra);
@@ -296,9 +298,30 @@ void isa207_get_mem_weight(u64 *weight)
mantissa = P10_MMCRA_THR_CTR_MANT(mmcra);
 
if (val == 0 || val == 7)
-   *weight = 0;
+   weight_lat = 0;
else
-   *weight = mantissa << (2 * exp);
+   weight_lat = mantissa << (2 * exp);
+
+   /*
+* Use 64 bit weight field (full) if sample type is
+* WEIGHT.
+*
+* if sample type is WEIGHT_STRUCT:
+* - store memory latency in the lower 32 bits.
+* - For ISA v3.1, use remaining two 16 bit fields of
+*   perf_sample_weight to store cycle counter values
+*   from sier2.
+*/
+   weight_fields = (union perf_sample_weight *)weight;
+   if (type & PERF_SAMPLE_WEIGHT)
+   weight_fields->full = weight_lat;
+   else {
+   weight_fields->var1_dw = (u32)weight_lat;
+   if (cpu_has_feature(CPU_FTR_ARCH_31)) {
+   weight_fields->var2_w = P10_SIER2_FINISH_CYC(mfspr(SPRN_SIER2));
+   weight_fields->var3_w = P10_SIER2_DISPATCH_CYC(mfspr(SPRN_SIER2));
+   

[PATCH 0/4] powerpc/perf: Export processor pipeline stage cycles information

2021-03-09 Thread Athira Rajeev
Performance Monitoring Unit (PMU) registers in powerpc exports
number of cycles elapsed between different stages in the pipeline.
Example, sampling registers in ISA v3.1.

This patchset implements kernel and perf tools support to expose
these pipeline stage cycles using the sample type PERF_SAMPLE_WEIGHT_TYPE.

Patch 1/4 adds kernel side support to store the cycle counter
values as part of 'var2_w' and 'var3_w' fields of perf_sample_weight
structure.

Patch 2/4 adds support to make the perf report column header
strings as dynamic.
Patch 3/4 adds powerpc support in perf tools for PERF_SAMPLE_WEIGHT_STRUCT
in sample type: PERF_SAMPLE_WEIGHT_TYPE.
Patch 4/4 adds support to present pipeline stage cycles as part of
mem-mode.

Sample output on powerpc:

# perf mem record ls
# perf mem report

# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 11  of event 'cpu/mem-loads/'
# Total weight : 1332
# Sort order   : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked,local_ins_lat,stall_cyc
#
# Overhead  Samples  Local Weight  Memory access  Symbol                              Shared Object     Data Symbol                          Data Object            Snoop  TLB access  Locked  Blocked  Finish Cyc  Dispatch Cyc
# ........  .......  ............  .............  ..................................  ................  ...................................  .....................  .....  ..........  ......  .......  ..........  ............
#
    44.14%        1  588           L1 hit         [k] rcu_nmi_exit                    [kernel.vmlinux]  [k] 0xc007ffdd21b0                   [unknown]              N/A    N/A         No      N/A      7           5
    22.22%        1  296           L1 hit         [k] copypage_power7                 [kernel.vmlinux]  [k] 0xc000ff6a1780                   [unknown]              N/A    N/A         No      N/A      29          33
     6.98%        1  93            L1 hit         [.] _dl_addr                        libc-2.31.so      [.] 0x7fff86fa5058                   libc-2.31.so           N/A    N/A         No      N/A      7           1
     6.61%        1  88            L2 hit         [.] new_do_write                    libc-2.31.so      [.] _IO_2_1_stdout_+0x0              libc-2.31.so           N/A    N/A         No      N/A      84          1
     5.93%        1  79            L1 hit         [k] printk_nmi_exit                 [kernel.vmlinux]  [k] 0xc006085df6b0                   [unknown]              N/A    N/A         No      N/A      7           1
     4.05%        1  54            L2 hit         [.] __alloc_dir                     libc-2.31.so      [.] 0x7fffdb70a640                   [stack]                N/A    N/A         No      N/A      18          1
     3.60%        1  48            L1 hit         [.] _init                           ls                [.] 0x00016ca82118                   [heap]                 N/A    N/A         No      N/A      7           6
     2.40%        1  32            L1 hit         [k] desc_read                       [kernel.vmlinux]  [k] _printk_rb_static_descs+0x1ea10  [kernel.vmlinux].data  N/A    N/A         No      N/A      7           1
     1.65%        1  22            L2 hit         [k] perf_iterate_ctx.constprop.139  [kernel.vmlinux]  [k] 0xc0064d79e8a8                   [unknown]              N/A    N/A         No      N/A      16          1
     1.58%        1  21            L1 hit         [k] perf_event_interrupt            [kernel.vmlinux]  [k] 0xc006085df6b0                   [unknown]              N/A    N/A         No      N/A      7           1
     0.83%        1  11            L1 hit         [k] perf_event_exec                 [kernel.vmlinux]  [k] 0xc007ffdd3288                   [unknown]              N/A    N/A         No      N/A      7           4


Athira Rajeev (4):
  powerpc/perf: Expose processor pipeline stage cycles using
PERF_SAMPLE_WEIGHT_STRUCT
  tools/perf: Add dynamic headers for perf report columns
  tools/perf: Add powerpc support for PERF_SAMPLE_WEIGHT_STRUCT
  tools/perf: Support

Re: [PATCH] powerpc/perf: prevent mixed EBB and non-EBB events

2021-03-04 Thread Athira Rajeev



> On 24-Feb-2021, at 5:51 PM, Thadeu Lima de Souza Cascardo 
>  wrote:
> 
> EBB events must be under exclusive groups, so there is no mix of EBB and
> non-EBB events on the same PMU. This requirement worked fine as perf core
> would not allow other pinned events to be scheduled together with exclusive
> events.
> 
> This assumption was broken by commit 1908dc911792 ("perf: Tweak
> perf_event_attr::exclusive semantics").
> 
> After that, the test cpu_event_pinned_vs_ebb_test started succeeding after
> read_events, but worse, the task would not have given access to PMC1, so
> when it tried to write to it, it was killed with "illegal instruction".
> 
> Preventing mixed EBB and non-EBB events from being added to the same PMU will
> just revert to the previous behavior and the test will succeed.


Hi,

Thanks for checking this. I checked your patch, which fixes “check_excludes” to
make sure all events agree on EBB. But in the PMU group constraints, we already
have a check for EBB events. This is in arch/powerpc/perf/isa207-common.c
(isa207_get_constraint function).

<<>>
mask  |= CNST_EBB_VAL(ebb);
value |= CNST_EBB_MASK;
<<>>

But the above settings for mask and value are interchanged. That is what
actually needs to be fixed.

Below patch should fix this:

diff --git a/arch/powerpc/perf/isa207-common.c 
b/arch/powerpc/perf/isa207-common.c
index e4f577da33d8..8b5eeb6fb2fb 100644
--- a/arch/powerpc/perf/isa207-common.c
+++ b/arch/powerpc/perf/isa207-common.c
@@ -447,8 +447,8 @@ int isa207_get_constraint(u64 event, unsigned long *maskp, 
unsigned long *valp,
 * EBB events are pinned & exclusive, so this should never actually
 * hit, but we leave it as a fallback in case.
 */
-   mask  |= CNST_EBB_VAL(ebb);
-   value |= CNST_EBB_MASK;
+   mask  |= CNST_EBB_MASK;
+   value |= CNST_EBB_VAL(ebb);
 
*maskp = mask;
*valp = value;


Can you please try with this patch.
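To illustrate why the interchange matters, here is a reduced standalone model of the constraint check (the real scheduler logic in power_check_constraints() is more involved, and the EBB bit position here is illustrative, not the ISA 2.07 layout):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative EBB constraint field (bit position is arbitrary here). */
#define CNST_EBB_SHIFT	24
#define CNST_EBB_MASK	(1ULL << CNST_EBB_SHIFT)
#define CNST_EBB_VAL(v)	(((uint64_t)(v)) << CNST_EBB_SHIFT)

/*
 * Two events conflict on a constraint field when, within the bits that
 * both masks cover, their required values differ. The mask says "this
 * bit must match"; the value says what it must match.
 */
static int events_conflict(uint64_t mask_a, uint64_t val_a,
			   uint64_t mask_b, uint64_t val_b)
{
	return ((val_a ^ val_b) & mask_a & mask_b) != 0;
}
```

With the correct encoding, an EBB/non-EBB pair conflicts because both masks carry the EBB bit. With mask and value interchanged, a non-EBB event contributes a zero mask bit (CNST_EBB_VAL(0) == 0), so the conflict is never detected.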

Thanks
Athira


> 
> Fixes: 1908dc911792 (perf: Tweak perf_event_attr::exclusive semantics)
> Signed-off-by: Thadeu Lima de Souza Cascardo 
> ---
> arch/powerpc/perf/core-book3s.c | 20 
> 1 file changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
> index 43599e671d38..d767f7944f85 100644
> --- a/arch/powerpc/perf/core-book3s.c
> +++ b/arch/powerpc/perf/core-book3s.c
> @@ -1010,9 +1010,25 @@ static int check_excludes(struct perf_event **ctrs, 
> unsigned int cflags[],
> int n_prev, int n_new)
> {
>   int eu = 0, ek = 0, eh = 0;
> + bool ebb = false;
>   int i, n, first;
>   struct perf_event *event;
> 
> + n = n_prev + n_new;
> + if (n <= 1)
> + return 0;
> +
> + first = 1;
> + for (i = 0; i < n; ++i) {
> + event = ctrs[i];
> + if (first) {
> + ebb = is_ebb_event(event);
> + first = 0;
> + } else if (is_ebb_event(event) != ebb) {
> + return -EAGAIN;
> + }
> + }
> +
>   /*
>* If the PMU we're on supports per event exclude settings then we
>* don't need to do any of this logic. NB. This assumes no PMU has both
> @@ -1021,10 +1037,6 @@ static int check_excludes(struct perf_event **ctrs, 
> unsigned int cflags[],
>   if (ppmu->flags & PPMU_ARCH_207S)
>   return 0;
> 
> - n = n_prev + n_new;
> - if (n <= 1)
> - return 0;
> -
>   first = 1;
>   for (i = 0; i < n; ++i) {
>   if (cflags[i] & PPMU_LIMITED_PMC_OK) {
> -- 
> 2.27.0
> 



Re: [PATCH] perf report: Fix -F for branch & mem modes

2021-03-03 Thread Athira Rajeev



> On 04-Mar-2021, at 11:59 AM, Ravi Bangoria  
> wrote:
> 
> perf report fails to add valid additional fields with -F when
> used with branch or mem modes. Fix it.
> 
> Before patch:
> 
>  $ ./perf record -b
>  $ ./perf report -b -F +srcline_from --stdio
>  Error:
>  Invalid --fields key: `srcline_from'
> 
> After patch:
> 
>  $ ./perf report -b -F +srcline_from --stdio
>  # Samples: 8K of event 'cycles'
>  # Event count (approx.): 8784
>  ...
> 
> Reported-by: Athira Rajeev 
> Fixes: aa6b3c99236b ("perf report: Make -F more strict like -s")
> Signed-off-by: Ravi Bangoria 

Thanks for the fix Ravi.

Reviewed-and-tested-by: Athira Rajeev 

> ---
> tools/perf/util/sort.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
> index 0d5ad42812b9..552b590485bf 100644
> --- a/tools/perf/util/sort.c
> +++ b/tools/perf/util/sort.c
> @@ -3140,7 +3140,7 @@ int output_field_add(struct perf_hpp_list *list, char 
> *tok)
>   if (strncasecmp(tok, sd->name, strlen(tok)))
>   continue;
> 
> - if (sort__mode != SORT_MODE__MEMORY)
> + if (sort__mode != SORT_MODE__BRANCH)
>   return -EINVAL;
> 
>   return __sort_dimension__add_output(list, sd);
> @@ -3152,7 +3152,7 @@ int output_field_add(struct perf_hpp_list *list, char 
> *tok)
>   if (strncasecmp(tok, sd->name, strlen(tok)))
>   continue;
> 
> - if (sort__mode != SORT_MODE__BRANCH)
> + if (sort__mode != SORT_MODE__MEMORY)
>   return -EINVAL;
> 
>   return __sort_dimension__add_output(list, sd);
> -- 
> 2.29.2
> 



Re: [PATCH] perf test: Test case 27 fails on s390 and non-x86 platforms

2021-03-03 Thread Athira Rajeev



> On 03-Mar-2021, at 1:40 AM, Liang, Kan  wrote:
> 
> 
> 
> On 3/2/2021 12:08 PM, Thomas Richter wrote:
>> On 3/2/21 4:23 PM, Liang, Kan wrote:
>>> 
>>> 
>>> On 3/2/2021 9:48 AM, Thomas Richter wrote:
>>>> On 3/2/21 3:03 PM, Liang, Kan wrote:
>>>>> 
>>>>> + Athira Rajeev
>>>>> 
>>>>> On 3/2/2021 8:31 AM, Thomas Richter wrote:
>>>>>> Executing perf test 27 fails on s390:
>>>>>>[root@t35lp46 perf]# ./perf test -Fv 27
>>>>>>27: Sample parsing
>>>>>>--- start ---
>>>>>> end 
>>>>>>Sample parsing: FAILED!
>>>>>>[root@t35lp46 perf]#
>>>>>> 
>>>>>> The root cause is
>>>>>> commit c7444297fd3769 ("perf test: Support PERF_SAMPLE_WEIGHT_STRUCT")
>>>>>> This commit introduced a test case for PERF_SAMPLE_WEIGHT_STRUCT
>>>>>> but does not adjust non-x86 weak linkage functions.
>>>>>> 
>>>>>> The error is in test__sample_parsing() --> do_test()
>>>>>> Function do_test() defines two structures of type struct perf_sample 
>>>>>> named
>>>>>> sample and sample_out. The first sets member sample.ins_lat = 117
>>>>>> 
>>>>>> Structure sample_out is constructed dynamically using functions
>>>>>> perf_event__synthesize_sample() and evsel__parse_sample().
>>>>>> Both functions have an x86 specific function version which sets member
>>>>>> ins_lat. The weak common functions do not set member ins_lat.
>>>>>> 
>>>>> 
>>>>> I don't think Power supports the instruction latency. As a request from 
>>>>> Athira Rajeev, I moved the PERF_SAMPLE_WEIGHT_STRUCT to the X86 specific 
>>>>> codes.
>>>>> https://lore.kernel.org/lkml/d97fef4f-dd88-4760-885e-9a6161a9b...@linux.vnet.ibm.com/
>>>>> https://lore.kernel.org/lkml/1612540912-6562-1-git-send-email-kan.li...@linux.intel.com/
>>>>> 
>>>>> I don't think we want to add the ins_lat back in the weak common 
>>>>> functions.


Hi Kan Liang,

Yes, presently in powerpc we are not using PERF_SAMPLE_WEIGHT_STRUCT.
But I am working on a patch set to expose some of the pipeline stall details
using PERF_SAMPLE_WEIGHT_STRUCT, by using the two 16-bit fields of sample
weight. I could use the same "ins_lat" field and then use an arch specific
header string while displaying with "perf report". I will be sharing an RFC
patch on this soon.

But I believe it is good to keep the weak function
"arch_perf_parse_sample_weight" in case we want to create a different field
for 'weight->var2_w' in future.

Thanks
Athira

>>>>> 
>>>>> Could you please update the perf test and don't apply the 
>>>>> PERF_SAMPLE_WEIGHT_STRUCT for the non-X86 platform?
>>>> 
>>>> I used offical linux git tree
>>>>   [root@t35lp46 perf]# git tag | fgrep 5.12
>>>> v5.12-rc1
>>>> [root@t35lp46 perf]#
>>>> 
>>>> So this change is in the pipe. I do not plan to revert individual patches.
>>> 
>>> No, we shouldn't revert the patch.
>>> I mean can you fix the issue in perf test?
>>> Don't test ins_lat or PERF_SAMPLE_WEIGHT_STRUCT for a non-X86 platform.
>> That would be very ugly code. We would end up in conditional compiles like
>> #ifdef __s390x__
>> #endif
>> and other architectes like ARM/POWER etc come along. This is something I 
>> want to avoid.
> 
> The ins_lat is a model specific variable. Maybe we should move it to the arch 
> specific test.
> 
> 
>> And this fix only touches perf, not the kernel.
> 
> The patch changes the behavior of the PERF_SAMPLE_WEIGHT. The high 32 bit 
> will be dropped. It should bring some problems if the high 32 bit contains 
> valid information.
> 
>>>>> 
>>>>> 
>>>>>> Later in function samples_same() both data in variable sample and 
>>>>>> sample_out
>>>>>> are compared. The comparison fails because sample.ins_lat is 117
>>>>>> and samples_out.ins_lat is 0, the weak functions never set member 
>>>>>> ins_lat.
>>>>>> 
>>>>>> Output after:
>>>>>>[root@t35lp46 perf]# ./perf test -Fv 27
>>>>>>27: Sample parsing
>>>>>>--- start ---

[PATCH] perf bench numa: Fix the condition checks for max number of numa nodes

2021-02-25 Thread Athira Rajeev
In systems having higher node numbers available like node
255, perf numa bench will fail with SIGABORT.

<<>>
perf: bench/numa.c:1416: init: Assertion `!(g->p.nr_nodes > 64 || g->p.nr_nodes < 0)' failed.
Aborted (core dumped)
<<>>

Snippet from 'numactl -H' below on a powerpc system where the highest
node number available is 255.

available: 6 nodes (0,8,252-255)
node 0 cpus: 
node 0 size: 519587 MB
node 0 free: 516659 MB
node 8 cpus: 
node 8 size: 523607 MB
node 8 free: 486757 MB
node 252 cpus:
node 252 size: 0 MB
node 252 free: 0 MB
node 253 cpus:
node 253 size: 0 MB
node 253 free: 0 MB
node 254 cpus:
node 254 size: 0 MB
node 254 free: 0 MB
node 255 cpus:
node 255 size: 0 MB
node 255 free: 0 MB
node distances:
node   0   8  252  253  254  255

Note: the cpu lists for nodes 0 and 8 are elided above; they expand to the
actual cpu lists in the original output. The nodes 252-255 represent the
memory on GPUs and are valid nodes.

The perf numa bench init code has a condition check to see if the number
of numa nodes (nr_nodes) exceeds MAX_NR_NODES. The value of MAX_NR_NODES
defined in perf code is 64. And the 'nr_nodes' is the value from
numa_max_node() which represents the highest node number available in the
system. In some systems where we could have numa node 255, this condition
check fails and results in SIGABORT.

The numa benchmark uses the static value of MAX_NR_NODES to size the two
numa node arrays and the node bitmask used for setting the memory policy.
This patch fixes the problem by sizing the two arrays and the bitmask
dynamically, based on the node numbers available in the system. With the
fix, the perf numa benchmark works with any node configuration, and the
static MAX_NR_NODES value is removed.

Signed-off-by: Athira Rajeev 
---
 tools/perf/bench/numa.c | 42 +-
 1 file changed, 29 insertions(+), 13 deletions(-)
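The core problem is that a single unsigned long can only cover nodes 0-63, and `1L << node` is undefined for larger node numbers. A minimal dynamic bitmask along the lines of what numa_allocate_nodemask() returns is sketched below (names are illustrative, not the libnuma API):

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)

/* Dynamically sized node bitmask: a plain 'unsigned long nodemask'
 * with '1L << node' breaks once node numbers reach 64, because
 * shifting by the type width or more is undefined behaviour. */
struct node_mask {
	unsigned long *bits;
	int max_node;
};

static struct node_mask *node_mask_alloc(int max_node)
{
	struct node_mask *m = calloc(1, sizeof(*m));
	size_t nlongs = max_node / BITS_PER_LONG + 1;

	m->max_node = max_node;
	m->bits = calloc(nlongs, sizeof(unsigned long));
	return m;
}

static void node_mask_set(struct node_mask *m, int node)
{
	m->bits[node / BITS_PER_LONG] |= 1UL << (node % BITS_PER_LONG);
}

static int node_mask_test(const struct node_mask *m, int node)
{
	return !!(m->bits[node / BITS_PER_LONG] &
		  (1UL << (node % BITS_PER_LONG)));
}
```

Sized from the highest node number (255 in the report above) rather than a compile-time constant, the mask handles any node configuration the kernel reports.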

diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
index 11726ec..20b87e2 100644
--- a/tools/perf/bench/numa.c
+++ b/tools/perf/bench/numa.c
@@ -344,18 +344,22 @@ static void mempol_restore(void)
 
 static void bind_to_memnode(int node)
 {
-   unsigned long nodemask;
+   struct bitmask *node_mask;
int ret;
 
if (node == NUMA_NO_NODE)
return;
 
-   BUG_ON(g->p.nr_nodes > (int)sizeof(nodemask)*8);
-   nodemask = 1L << node;
+   node_mask = numa_allocate_nodemask();
+   BUG_ON(!node_mask);
 
-   ret = set_mempolicy(MPOL_BIND, &nodemask, sizeof(nodemask)*8);
-   dprintf("binding to node %d, mask: %016lx => %d\n", node, nodemask, ret);
+   numa_bitmask_clearall(node_mask);
+   numa_bitmask_setbit(node_mask, node);
 
+   ret = set_mempolicy(MPOL_BIND, node_mask->maskp, node_mask->size + 1);
+   dprintf("binding to node %d, mask: %016lx => %d\n", node, *node_mask->maskp, ret);
+
+   numa_bitmask_free(node_mask);
BUG_ON(ret);
 }
 
@@ -876,8 +880,6 @@ static void update_curr_cpu(int task_nr, unsigned long 
bytes_worked)
prctl(0, bytes_worked);
 }
 
-#define MAX_NR_NODES   64
-
 /*
  * Count the number of nodes a process's threads
  * are spread out on.
@@ -888,10 +890,15 @@ static void update_curr_cpu(int task_nr, unsigned long 
bytes_worked)
  */
 static int count_process_nodes(int process_nr)
 {
-   char node_present[MAX_NR_NODES] = { 0, };
+   char *node_present;
int nodes;
int n, t;
 
+   node_present = (char *)malloc(g->p.nr_nodes * sizeof(char));
+   BUG_ON(!node_present);
+   for (nodes = 0; nodes < g->p.nr_nodes; nodes++)
+   node_present[nodes] = 0;
+
for (t = 0; t < g->p.nr_threads; t++) {
struct thread_data *td;
int task_nr;
@@ -901,17 +908,20 @@ static int count_process_nodes(int process_nr)
td = g->threads + task_nr;
 
node = numa_node_of_cpu(td->curr_cpu);
-   if (node < 0) /* curr_cpu was likely still -1 */
+   if (node < 0) /* curr_cpu was likely still -1 */ {
+   free(node_present);
return 0;
+   }
 
node_present[node] = 1;
}
 
nodes = 0;
 
-   for (n = 0; n < MAX_NR_NODES; n++)
+   for (n = 0; n < g->p.nr_nodes; n++)
nodes += node_present[n];
 
+   free(node_present);
return nodes;
 }
 
@@ -980,7 +990,7 @@ static void calc_convergence(double runtime_ns_max, double 
*convergence)
 {
unsigned int loops_done_min, loops_done_max;
int process_groups;
-   int nodes[MAX_NR_NODES];
+   int *nodes;
int distance;
int nr_min;
int nr_max;
@@ -994,6 +1004,8 @@ static void calc_convergence(double runtime_ns_max, double 
*convergence)
if (!g->p.show_convergence && !g->p.measure_convergence)
return;
 

Re: [PATCH 6/9] perf report: Support instruction latency

2021-02-07 Thread Athira Rajeev



> On 05-Feb-2021, at 8:21 PM, Liang, Kan  wrote:
> 
> 
> 
> On 2/5/2021 7:55 AM, Athira Rajeev wrote:
>>>> Because in other archs, the var2_w of ‘perf_sample_weight’ could be used 
>>>> to capture something else than the Local INSTR Latency.
>>>> Can we have some weak function to populate the header string ?
>>> I agree that the var2_w has different meanings among architectures. We 
>>> should not force it to data->ins_lat.
>>> 
>>> The patch as below should fix it. Does it work for you?
>> My point about weak function was actually for the arch specific header 
>> string. But I guess we should not force it to data->ins_lat
> 
> Yes, I don't think PowerPC should force var2_w to data->ins_lat. I think you 
> can create your own field.
> 
>> as you mentioned. I checked the below patch defining an 
>> ‘arch_perf_parse_sample_weight' for powerpc and it works.
>> But one observation is that, for cases with kernel having support for 
>> PERF_SAMPLE_WEIGHT_STRUCT but missing arch specific support for  
>> ‘arch_perf_parse_sample_weight', it will report ‘Local Weight’ wrongly since 
>> weak function takes it as 64 bit. Not sure if that is a valid case to 
>> consider though.
> 
> Currently, the PERF_SAMPLE_WEIGHT_STRUCT is only enabled on X86 by default.
> https://lore.kernel.org/lkml/1612296553-21962-6-git-send-email-kan.li...@linux.intel.com/
> 
> For PowerPC, the PERF_SAMPLE_WEIGHT is still the default setting. There is no 
> way to set PERF_SAMPLE_WEIGHT_STRUCT via perf tool.
> I don't think the above case will happen.

Yes. 

I tested with kernel changes from perf/core branch of 
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
And perf tools side changes from tmp.perf/core branch of 
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git along with the 
above change. 
The default setting for powerpc works with out breaking anything and verified 
using “perf mem record ”

Tested-by: Athira Rajeev 

Thanks
Athira Rajeev
> 
> Thanks,
> Kan



Re: [PATCH 6/9] perf report: Support instruction latency

2021-02-05 Thread Athira Rajeev



> On 04-Feb-2021, at 8:49 PM, Liang, Kan  wrote:
> 
> 
> 
> On 2/4/2021 8:11 AM, Athira Rajeev wrote:
>>> On 03-Feb-2021, at 1:39 AM, kan.li...@linux.intel.com wrote:
>>> 
>>> From: Kan Liang 
>>> 
>>> The instruction latency information can be recorded on some platforms,
>>> e.g., the Intel Sapphire Rapids server. With both memory latency
>>> (weight) and the new instruction latency information, users can easily
>>> locate the expensive load instructions, and also understand the time
>>> spent in different stages. The users can optimize their applications
>>> in different pipeline stages.
>>> 
>>> The 'weight' field is shared among different architectures. Reusing the
>>> 'weight' field may impact other architectures. Add a new field to store
>>> the instruction latency.
>>> 
>>> Like the 'weight' support, introduce a 'ins_lat' for the global
>>> instruction latency, and a 'local_ins_lat' for the local instruction
>>> latency version.
>>> 
>>> Add new sort functions, INSTR Latency and Local INSTR Latency,
>>> accordingly.
>>> 
>>> Add local_ins_lat to the default_mem_sort_order[].
>>> 
>>> Signed-off-by: Kan Liang 
>>> ---
>>> tools/perf/Documentation/perf-report.txt |  6 +++-
>>> tools/perf/util/event.h  |  1 +
>>> tools/perf/util/evsel.c  |  4 ++-
>>> tools/perf/util/hist.c   | 12 ++--
>>> tools/perf/util/hist.h   |  2 ++
>>> tools/perf/util/intel-pt.c   |  5 ++--
>>> tools/perf/util/session.c|  8 --
>>> tools/perf/util/sort.c   | 47 
>>> +++-
>>> tools/perf/util/sort.h   |  3 ++
>>> tools/perf/util/synthetic-events.c   |  4 ++-
>>> 10 files changed, 81 insertions(+), 11 deletions(-)
>>> 
>>> diff --git a/tools/perf/Documentation/perf-report.txt 
>>> b/tools/perf/Documentation/perf-report.txt
>>> index 826b5a9..0565b7c 100644
>>> --- a/tools/perf/Documentation/perf-report.txt
>>> +++ b/tools/perf/Documentation/perf-report.txt
>>> @@ -108,6 +108,9 @@ OPTIONS
>>> - period: Raw number of event count of sample
>>> - time: Separate the samples by time stamp with the resolution 
>>> specified by
>>> --time-quantum (default 100ms). Specify with overhead and before it.
>>> +   - ins_lat: Instruction latency in core cycles. This is the global
>>> +   instruction latency
>>> +   - local_ins_lat: Local instruction latency version
>>> 
>>> By default, comm, dso and symbol keys are used.
>>> (i.e. --sort comm,dso,symbol)
>>> @@ -154,7 +157,8 @@ OPTIONS
>>> - blocked: reason of blocked load access for the data at the time of 
>>> the sample
>>> 
>>> And the default sort keys are changed to local_weight, mem, sym, dso,
>>> -   symbol_daddr, dso_daddr, snoop, tlb, locked, blocked, see '--mem-mode'.
>>> +   symbol_daddr, dso_daddr, snoop, tlb, locked, blocked, local_ins_lat,
>>> +   see '--mem-mode'.
>>> 
>>> If the data file has tracepoint event(s), following (dynamic) sort keys
>>> are also available:
>>> diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
>>> index ff403ea..5d50a49 100644
>>> --- a/tools/perf/util/event.h
>>> +++ b/tools/perf/util/event.h
>>> @@ -141,6 +141,7 @@ struct perf_sample {
>>> u16 insn_len;
>>> u8  cpumode;
>>> u16 misc;
>>> +   u16 ins_lat;
>>> bool no_hw_idx; /* No hw_idx collected in branch_stack */
>>> char insn[MAX_INSN];
>>> void *raw_data;
>>> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
>>> index 0a2a307..24c0b59 100644
>>> --- a/tools/perf/util/evsel.c
>>> +++ b/tools/perf/util/evsel.c
>>> @@ -2337,8 +2337,10 @@ int evsel__parse_sample(struct evsel *evsel, union 
>>> perf_event *event,
>>> weight.full = *array;
>>> if (type & PERF_SAMPLE_WEIGHT)
>>> data->weight = weight.full;
>>> -   else
>>> +   else {
>>> data->weight = weight.var1_dw;
>>> +   data->ins_lat = weight.var2_w;
>>> +   }
>>> array++;
>>> }
>>> 
>>> diff --git a/t

Re: [PATCH 6/9] perf report: Support instruction latency

2021-02-04 Thread Athira Rajeev
ols/perf/util/sort.c b/tools/perf/util/sort.c
> index 249a03c..e0529f2 100644
> --- a/tools/perf/util/sort.c
> +++ b/tools/perf/util/sort.c
> @@ -36,7 +36,7 @@ const char default_parent_pattern[] = 
> "^sys_|^do_page_fault";
> const char *parent_pattern = default_parent_pattern;
> const char *default_sort_order = "comm,dso,symbol";
> const char default_branch_sort_order[] = 
> "comm,dso_from,symbol_from,symbol_to,cycles";
> -const char default_mem_sort_order[] = 
> "local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked";
> +const char default_mem_sort_order[] = 
> "local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked,local_ins_lat";
> const char default_top_sort_order[] = "dso,symbol";
> const char default_diff_sort_order[] = "dso,symbol";
> const char default_tracepoint_sort_order[] = "trace";
> @@ -1365,6 +1365,49 @@ struct sort_entry sort_global_weight = {
>   .se_width_idx   = HISTC_GLOBAL_WEIGHT,
> };
> 
> +static u64 he_ins_lat(struct hist_entry *he)
> +{
> + return he->stat.nr_events ? he->stat.ins_lat / 
> he->stat.nr_events : 0;
> +}
> +
> +static int64_t
> +sort__local_ins_lat_cmp(struct hist_entry *left, struct hist_entry *right)
> +{
> + return he_ins_lat(left) - he_ins_lat(right);
> +}
> +
> +static int hist_entry__local_ins_lat_snprintf(struct hist_entry *he, char 
> *bf,
> +   size_t size, unsigned int width)
> +{
> + return repsep_snprintf(bf, size, "%-*u", width, he_ins_lat(he));
> +}
> +
> +struct sort_entry sort_local_ins_lat = {
> + .se_header  = "Local INSTR Latency",
> + .se_cmp = sort__local_ins_lat_cmp,
> + .se_snprintf= hist_entry__local_ins_lat_snprintf,
> + .se_width_idx   = HISTC_LOCAL_INS_LAT,
> +};

Hi Kan Liang,

Currently with these changes, perf will display “Local INSTR Latency” for the 
new column in ‘perf mem report’.

Can we make this header string Architecture specific ?
Because on other archs, the var2_w of ‘perf_sample_weight’ could be used to 
capture something other than the Local INSTR Latency.
Can we have some weak function to populate the header string ?


Thanks
Athira Rajeev
> +
> +static int64_t
> +sort__global_ins_lat_cmp(struct hist_entry *left, struct hist_entry *right)
> +{
> + return left->stat.ins_lat - right->stat.ins_lat;
> +}
> +
> +static int hist_entry__global_ins_lat_snprintf(struct hist_entry *he, char 
> *bf,
> +size_t size, unsigned int width)
> +{
> + return repsep_snprintf(bf, size, "%-*u", width, 
> he->stat.ins_lat);
> +}
> +
> +struct sort_entry sort_global_ins_lat = {
> + .se_header  = "INSTR Latency",
> + .se_cmp = sort__global_ins_lat_cmp,
> + .se_snprintf= hist_entry__global_ins_lat_snprintf,
> + .se_width_idx   = HISTC_GLOBAL_INS_LAT,

> +};
> +
> struct sort_entry sort_mem_daddr_sym = {
>   .se_header  = "Data Symbol",
>   .se_cmp = sort__daddr_cmp,
> @@ -1770,6 +1813,8 @@ static struct sort_dimension common_sort_dimensions[] = 
> {
>   DIM(SORT_CGROUP_ID, "cgroup_id", sort_cgroup_id),
>   DIM(SORT_SYM_IPC_NULL, "ipc_null", sort_sym_ipc_null),
>   DIM(SORT_TIME, "time", sort_time),
> + DIM(SORT_LOCAL_INS_LAT, "local_ins_lat", sort_local_ins_lat),
> + DIM(SORT_GLOBAL_INS_LAT, "ins_lat", sort_global_ins_lat),
> };
> 
> #undef DIM
> diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
> index 2b2645b..c92ca15 100644
> --- a/tools/perf/util/sort.h
> +++ b/tools/perf/util/sort.h
> @@ -50,6 +50,7 @@ struct he_stat {
>   u64 period_guest_sys;
>   u64 period_guest_us;
>   u64 weight;
> + u64 ins_lat;
>   u32 nr_events;
> };
> 
> @@ -229,6 +230,8 @@ enum sort_type {
>   SORT_CGROUP_ID,
>   SORT_SYM_IPC_NULL,
>   SORT_TIME,
> + SORT_LOCAL_INS_LAT,
> + SORT_GLOBAL_INS_LAT,
> 
>   /* branch stack specific sort keys */
>   __SORT_BRANCH_STACK,
> diff --git a/tools/perf/util/synthetic-events.c 
> b/tools/perf/util/synthetic-events.c
> index bc16268..95401c9 100644
> --- a/tools/perf/util/synthetic-events.c
> +++ b/tools/perf/util/synthetic-events.c
> @@ -1557,8 +1557,10 @@ int perf_event__synthesize_sample(union perf_event 
> *event, u64 type, u64 read_fo
> 
>   if (type & PERF_SAMPLE_WEIGHT_TYPE) {
>   *array = sample->weight;
> - if (type & PERF_SAMPLE_WEIGHT_STRUCT)
> + if (type & PERF_SAMPLE_WEIGHT_STRUCT) {
>   *array &= 0xffffffff;
> + *array |= ((u64)sample->ins_lat << 32);
> + }
>   array++;
>   }
> 
> -- 
> 2.7.4
> 
> 
> 



Re: [PATCH V4 0/6] Add the page size in the perf record (user tools)

2021-01-19 Thread Athira Rajeev



> On 13-Jan-2021, at 12:43 AM, Liang, Kan  wrote:
> 
> 
> 
> On 1/12/2021 12:24 AM, Athira Rajeev wrote:
>>> On 06-Jan-2021, at 1:27 AM, kan.li...@linux.intel.com wrote:
>>> 
>>> From: Kan Liang 
>>> 
>>> Changes since V3:
>>> - Rebase on top of acme's perf/core branch
>>>  commit c07b45a355ee ("perf record: Tweak "Lowering..." warning in 
>>> record_opts__config_freq")
>>> 
>>> Changes since V2:
>>> - Rebase on top of acme perf/core branch
>>>  commit eec7b53d5916 ("perf test: Make sample-parsing test aware of 
>>> PERF_SAMPLE_{CODE,DATA}_PAGE_SIZE")
>>> - Use unit_number__scnprintf() in get_page_size_name()
>>> - Emit warning about kernel not supporting the code page size sample_type 
>>> bit
>>> 
>>> Changes since V1:
>>> - Fix the compile warning with GCC 10
>>> - Add Acked-by from Namhyung Kim
>>> 
>>> Current perf can report both virtual addresses and physical addresses,
>>> but not the page size. Without the page size information of the utilized
>>> page, users cannot decide whether to promote/demote large pages to
>>> optimize memory usage.
>>> 
>>> The kernel patches have been merged into tip perf/core branch,
>>> commit 8d97e71811aa ("perf/core: Add PERF_SAMPLE_DATA_PAGE_SIZE")
>>> commit 76a5433f95f3 ("perf/x86/intel: Support PERF_SAMPLE_DATA_PAGE_SIZE")
>>> commit 4cb6a42e4c4b ("powerpc/perf: Support PERF_SAMPLE_DATA_PAGE_SIZE")
>>> commit 995f088efebe ("perf/core: Add support for 
>>> PERF_SAMPLE_CODE_PAGE_SIZE")
>>> commit 51b646b2d9f8 ("perf,mm: Handle non-page-table-aligned hugetlbfs")
>>> 
>>> and Peter's perf/core branch
>>> commit 524680ce47a1 ("mm/gup: Provide gup_get_pte() more generic")
>>> commit 44a35d6937d2 ("mm: Introduce pXX_leaf_size()")
>>> commit 2f1e2f091ad0 ("perf/core: Fix arch_perf_get_page_size()")
>>> commit 7649e44aacdd ("arm64/mm: Implement pXX_leaf_size() support")
>>> commit 1df1ae7e262c ("sparc64/mm: Implement pXX_leaf_size() support")
>>> 
>>> This patch set is to enable the page size support in user tools.
>> Hi Kan Liang,
>> I am trying to check this series on powerpc.
>> # perf mem --phys-data --data-page-size record 
>> To my observation, some of the samples returned zero size and comes as ’N/A’ 
>> in the perf report
>> # perf mem --phys-data --data-page-size report
>> For fetching the page size, though initially there was a weak function added 
>> ( as arch_perf_get_page_size ) here:
>> https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/commit/?h=perf/core=51b646b2d9f84d6ff6300e3c1d09f2be4329a424
>> later I see it got removed here:
>> https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/commit/?h=perf/core=8af26be062721e52eba1550caf50b712f774c5fd
>> I picked kernel changes from 
>> git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git , or I am 
>> missing something ?
> 
> I believe all the kernel changes have been merged.
> 
> According to the commit message of the recent changes, only Power/8xxx is 
> supported for power for now. I guess that may be the reason of some 'N/A's.
> https://lore.kernel.org/patchwork/cover/1345521/

Thanks for clarifying. 
For the tools-side changes, apart from the ’N/A’ entries I got in the perf 
report, I verified the --data-page-size option for perf mem record and mem report.

For tools-side changes,
Tested-by: Athira Rajeev

Thanks
Athira
> 
> Thanks,
> Kan
> 
> 
>> Thanks
>> Athira
>>> 
>>> Kan Liang (3):
>>>  perf mem: Clean up output format
>>>  perf mem: Support data page size
>>>  perf tools: Add support for PERF_SAMPLE_CODE_PAGE_SIZE
>>> 
>>> Stephane Eranian (3):
>>>  perf script: Add support for PERF_SAMPLE_CODE_PAGE_SIZE
>>>  perf report: Add support for PERF_SAMPLE_CODE_PAGE_SIZE
>>>  perf test: Add test case for PERF_SAMPLE_CODE_PAGE_SIZE
>>> 
>>> tools/perf/Documentation/perf-mem.txt |   3 +
>>> tools/perf/Documentation/perf-record.txt  |   3 +
>>> tools/perf/Documentation/perf-report.txt  |   1 +
>>> tools/perf/Documentation/perf-script.txt  |   2 +-
>>> tools/perf/builtin-mem.c  | 111 +++---
>>> tools/perf/builtin-record.c   |   2 +
>>> tools/perf/builtin-script.c   |  13 ++-
>>> tools/perf/tests/sample-parsing.c |   4 +
>>> tools/perf/util/event.h   |   1 +
>>> tools/perf/util/evsel.c   |  18 +++-
>>> tools/perf/util/evsel.h   |   1 +
>>> tools/perf/util/hist.c|   2 +
>>> tools/perf/util/hist.h|   1 +
>>> tools/perf/util/perf_event_attr_fprintf.c |   2 +-
>>> tools/perf/util/record.h  |   1 +
>>> tools/perf/util/session.c |   3 +
>>> tools/perf/util/sort.c|  26 +
>>> tools/perf/util/sort.h|   2 +
>>> tools/perf/util/synthetic-events.c|   8 ++
>>> 19 files changed, 144 insertions(+), 60 deletions(-)
>>> 
>>> -- 
>>> 2.25.1



Re: [PATCH V4 0/6] Add the page size in the perf record (user tools)

2021-01-11 Thread Athira Rajeev



> On 06-Jan-2021, at 1:27 AM, kan.li...@linux.intel.com wrote:
> 
> From: Kan Liang 
> 
> Changes since V3:
> - Rebase on top of acme's perf/core branch
>  commit c07b45a355ee ("perf record: Tweak "Lowering..." warning in 
> record_opts__config_freq")
> 
> Changes since V2:
> - Rebase on top of acme perf/core branch
>  commit eec7b53d5916 ("perf test: Make sample-parsing test aware of 
> PERF_SAMPLE_{CODE,DATA}_PAGE_SIZE")
> - Use unit_number__scnprintf() in get_page_size_name()
> - Emit warning about kernel not supporting the code page size sample_type bit
> 
> Changes since V1:
> - Fix the compile warning with GCC 10
> - Add Acked-by from Namhyung Kim
> 
> Current perf can report both virtual addresses and physical addresses,
> but not the page size. Without the page size information of the utilized
> page, users cannot decide whether to promote/demote large pages to
> optimize memory usage.
> 
> The kernel patches have been merged into tip perf/core branch,
> commit 8d97e71811aa ("perf/core: Add PERF_SAMPLE_DATA_PAGE_SIZE")
> commit 76a5433f95f3 ("perf/x86/intel: Support PERF_SAMPLE_DATA_PAGE_SIZE")
> commit 4cb6a42e4c4b ("powerpc/perf: Support PERF_SAMPLE_DATA_PAGE_SIZE")
> commit 995f088efebe ("perf/core: Add support for PERF_SAMPLE_CODE_PAGE_SIZE")
> commit 51b646b2d9f8 ("perf,mm: Handle non-page-table-aligned hugetlbfs")
> 
> and Peter's perf/core branch
> commit 524680ce47a1 ("mm/gup: Provide gup_get_pte() more generic")
> commit 44a35d6937d2 ("mm: Introduce pXX_leaf_size()")
> commit 2f1e2f091ad0 ("perf/core: Fix arch_perf_get_page_size()")
> commit 7649e44aacdd ("arm64/mm: Implement pXX_leaf_size() support")
> commit 1df1ae7e262c ("sparc64/mm: Implement pXX_leaf_size() support")
> 
> This patch set is to enable the page size support in user tools.

Hi Kan Liang,

I am trying to check this series on powerpc. 

# perf mem --phys-data --data-page-size record 

From my observation, some of the samples returned zero page size and show up as 
’N/A’ in the perf report. 

# perf mem --phys-data --data-page-size report 

For fetching the page size, there was initially a weak function added 
( arch_perf_get_page_size ) here: 

https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/commit/?h=perf/core=51b646b2d9f84d6ff6300e3c1d09f2be4329a424

later I see it got removed here: 

https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/commit/?h=perf/core=8af26be062721e52eba1550caf50b712f774c5fd
 

I picked the kernel changes from 
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git. Am I missing 
something ?

Thanks
Athira

> 
> Kan Liang (3):
>  perf mem: Clean up output format
>  perf mem: Support data page size
>  perf tools: Add support for PERF_SAMPLE_CODE_PAGE_SIZE
> 
> Stephane Eranian (3):
>  perf script: Add support for PERF_SAMPLE_CODE_PAGE_SIZE
>  perf report: Add support for PERF_SAMPLE_CODE_PAGE_SIZE
>  perf test: Add test case for PERF_SAMPLE_CODE_PAGE_SIZE
> 
> tools/perf/Documentation/perf-mem.txt |   3 +
> tools/perf/Documentation/perf-record.txt  |   3 +
> tools/perf/Documentation/perf-report.txt  |   1 +
> tools/perf/Documentation/perf-script.txt  |   2 +-
> tools/perf/builtin-mem.c  | 111 +++---
> tools/perf/builtin-record.c   |   2 +
> tools/perf/builtin-script.c   |  13 ++-
> tools/perf/tests/sample-parsing.c |   4 +
> tools/perf/util/event.h   |   1 +
> tools/perf/util/evsel.c   |  18 +++-
> tools/perf/util/evsel.h   |   1 +
> tools/perf/util/hist.c|   2 +
> tools/perf/util/hist.h|   1 +
> tools/perf/util/perf_event_attr_fprintf.c |   2 +-
> tools/perf/util/record.h  |   1 +
> tools/perf/util/session.c |   3 +
> tools/perf/util/sort.c|  26 +
> tools/perf/util/sort.h|   2 +
> tools/perf/util/synthetic-events.c|   8 ++
> 19 files changed, 144 insertions(+), 60 deletions(-)
> 
> -- 
> 2.25.1
> 
> 
> 



Re: [PATCH -next v2] powerpc/perf: Fix symbol undeclared warning

2020-09-23 Thread Athira Rajeev



> On 23-Sep-2020, at 12:44 PM, Wang Wensheng  wrote:
> 
> Build kernel with `C=2`:
> arch/powerpc/perf/isa207-common.c:24:18: warning: symbol
> 'isa207_pmu_format_attr' was not declared. Should it be static?
> arch/powerpc/perf/power9-pmu.c:101:5: warning: symbol 'p9_dd21_bl_ev'
> was not declared. Should it be static?
> arch/powerpc/perf/power9-pmu.c:115:5: warning: symbol 'p9_dd22_bl_ev'
> was not declared. Should it be static?
> 
> Those symbols are used only in the files that define them so we declare
> them as static to fix the warnings.

Hi, 

Looks fine to me. 

Reviewed-by: Athira Rajeev 

Thanks
Athira
> 
> Signed-off-by: Wang Wensheng 
> ---
> arch/powerpc/perf/isa207-common.c | 2 +-
> arch/powerpc/perf/power9-pmu.c| 4 ++--
> 2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/perf/isa207-common.c 
> b/arch/powerpc/perf/isa207-common.c
> index 964437adec18..85dc860b265b 100644
> --- a/arch/powerpc/perf/isa207-common.c
> +++ b/arch/powerpc/perf/isa207-common.c
> @@ -21,7 +21,7 @@ PMU_FORMAT_ATTR(thresh_stop,"config:32-35");
> PMU_FORMAT_ATTR(thresh_start, "config:36-39");
> PMU_FORMAT_ATTR(thresh_cmp,   "config:40-49");
> 
> -struct attribute *isa207_pmu_format_attr[] = {
> +static struct attribute *isa207_pmu_format_attr[] = {
>   &format_attr_event.attr,
>   &format_attr_pmcxsel.attr,
>   &format_attr_mark.attr,
> diff --git a/arch/powerpc/perf/power9-pmu.c b/arch/powerpc/perf/power9-pmu.c
> index 2a57e93a79dc..4a315fad1f99 100644
> --- a/arch/powerpc/perf/power9-pmu.c
> +++ b/arch/powerpc/perf/power9-pmu.c
> @@ -98,7 +98,7 @@ extern u64 PERF_REG_EXTENDED_MASK;
> /* PowerISA v2.07 format attribute structure*/
> extern struct attribute_group isa207_pmu_format_group;
> 
> -int p9_dd21_bl_ev[] = {
> +static int p9_dd21_bl_ev[] = {
>   PM_MRK_ST_DONE_L2,
>   PM_RADIX_PWC_L1_HIT,
>   PM_FLOP_CMPL,
> @@ -112,7 +112,7 @@ int p9_dd21_bl_ev[] = {
>   PM_DISP_HELD_SYNC_HOLD,
> };
> 
> -int p9_dd22_bl_ev[] = {
> +static int p9_dd22_bl_ev[] = {
>   PM_DTLB_MISS_16G,
>   PM_DERAT_MISS_2M,
>   PM_DTLB_MISS_2M,
> -- 
> 2.25.0
> 



Re: [PATCH -next] powerpc/perf: Fix symbol undeclared warning

2020-09-22 Thread Athira Rajeev



> On 21-Sep-2020, at 4:55 PM, Wang Wensheng  wrote:
> 
> Build kernel with `C=2`:
> arch/powerpc/perf/isa207-common.c:24:18: warning: symbol
> 'isa207_pmu_format_attr' was not declared. Should it be static?
> arch/powerpc/perf/power9-pmu.c:101:5: warning: symbol 'p9_dd21_bl_ev'
> was not declared. Should it be static?
> arch/powerpc/perf/power9-pmu.c:115:5: warning: symbol 'p9_dd22_bl_ev'
> was not declared. Should it be static?

Hi, 

It would be good to include a comment in the commit message stating what the 
fix is here, 
e.g., declare p9_dd21_bl_ev/p9_dd22_bl_ev as static variables.

Thanks
Athira
> 
> Signed-off-by: Wang Wensheng 
> ---
> arch/powerpc/perf/isa207-common.c | 2 +-
> arch/powerpc/perf/power9-pmu.c| 4 ++--
> 2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/perf/isa207-common.c 
> b/arch/powerpc/perf/isa207-common.c
> index 964437adec18..85dc860b265b 100644
> --- a/arch/powerpc/perf/isa207-common.c
> +++ b/arch/powerpc/perf/isa207-common.c
> @@ -21,7 +21,7 @@ PMU_FORMAT_ATTR(thresh_stop,"config:32-35");
> PMU_FORMAT_ATTR(thresh_start, "config:36-39");
> PMU_FORMAT_ATTR(thresh_cmp,   "config:40-49");
> 
> -struct attribute *isa207_pmu_format_attr[] = {
> +static struct attribute *isa207_pmu_format_attr[] = {
>   &format_attr_event.attr,
>   &format_attr_pmcxsel.attr,
>   &format_attr_mark.attr,
> diff --git a/arch/powerpc/perf/power9-pmu.c b/arch/powerpc/perf/power9-pmu.c
> index 2a57e93a79dc..4a315fad1f99 100644
> --- a/arch/powerpc/perf/power9-pmu.c
> +++ b/arch/powerpc/perf/power9-pmu.c
> @@ -98,7 +98,7 @@ extern u64 PERF_REG_EXTENDED_MASK;
> /* PowerISA v2.07 format attribute structure*/
> extern struct attribute_group isa207_pmu_format_group;
> 
> -int p9_dd21_bl_ev[] = {
> +static int p9_dd21_bl_ev[] = {
>   PM_MRK_ST_DONE_L2,
>   PM_RADIX_PWC_L1_HIT,
>   PM_FLOP_CMPL,
> @@ -112,7 +112,7 @@ int p9_dd21_bl_ev[] = {
>   PM_DISP_HELD_SYNC_HOLD,
> };
> 
> -int p9_dd22_bl_ev[] = {
> +static int p9_dd22_bl_ev[] = {
>   PM_DTLB_MISS_16G,
>   PM_DERAT_MISS_2M,
>   PM_DTLB_MISS_2M,
> -- 
> 2.25.0
> 



Re: [PATCH v2 1/5] perf record: Set PERF_RECORD_PERIOD if attr->freq is set.

2020-07-29 Thread Athira Rajeev



> On 28-Jul-2020, at 9:33 PM, Arnaldo Carvalho de Melo  wrote:
> 
> Em Tue, Jul 28, 2020 at 05:43:47PM +0200, Jiri Olsa escreveu:
>> On Tue, Jul 28, 2020 at 01:57:30AM -0700, Ian Rogers wrote:
>>> From: David Sharp 
>>> 
>>> evsel__config() would only set PERF_RECORD_PERIOD if it set attr->freq
>>> from perf record options. When it is set by libpfm events, it would not
>>> get set. This changes evsel__config to see if attr->freq is set outside of
>>> whether or not it changes attr->freq itself.
>>> 
>>> Signed-off-by: David Sharp 
>>> Signed-off-by: Ian Rogers 
>> 
>> Acked-by: Jiri Olsa 
> 
> So, somebody else complained that it's not PERF_RECORD_PERIOD (there is
> no such thing) that is being set, it's PERF_SAMPLE_PERIOD.

Hi Arnaldo

Thanks for adding in that correction.

Athira
> 
> Since you acked it I merged it now, with that correction,
> 
> - Arnaldo
> 
>> thanks,
>> jirka
>> 
>>> ---
>>> tools/perf/util/evsel.c | 7 ++-
>>> 1 file changed, 6 insertions(+), 1 deletion(-)
>>> 
>>> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
>>> index ef802f6d40c1..811f538f7d77 100644
>>> --- a/tools/perf/util/evsel.c
>>> +++ b/tools/perf/util/evsel.c
>>> @@ -979,13 +979,18 @@ void evsel__config(struct evsel *evsel, struct 
>>> record_opts *opts,
>>> if (!attr->sample_period || (opts->user_freq != UINT_MAX ||
>>>  opts->user_interval != ULLONG_MAX)) {
>>> if (opts->freq) {
>>> -   evsel__set_sample_bit(evsel, PERIOD);
>>> attr->freq  = 1;
>>> attr->sample_freq   = opts->freq;
>>> } else {
>>> attr->sample_period = opts->default_interval;
>>> }
>>> }
>>> +   /*
>>> +* If attr->freq was set (here or earlier), ask for period
>>> +* to be sampled.
>>> +*/
>>> +   if (attr->freq)
>>> +   evsel__set_sample_bit(evsel, PERIOD);
>>> 
>>> if (opts->no_samples)
>>> attr->sample_freq = 0;
>>> -- 
>>> 2.28.0.163.g6104cc2f0b6-goog
>>> 
>> 
> 
> -- 
> 
> - Arnaldo



Re: [PATCH] perf record: Set PERF_RECORD_SAMPLE if attr->freq is set.

2020-07-27 Thread Athira Rajeev



> On 27-Jul-2020, at 12:29 PM, Ian Rogers  wrote:
> 
> From: David Sharp 
> 
> evsel__config() would only set PERF_RECORD_SAMPLE if it set attr->freq

Hi Ian,

The commit message says PERF_RECORD_SAMPLE, but since we are setting the 
period sample bit here, shouldn't it say “PERF_SAMPLE_PERIOD” ?


Thanks
Athira 

> from perf record options. When it is set by libpfm events, it would not
> get set. This changes evsel__config to see if attr->freq is set outside of
> whether or not it changes attr->freq itself.
> 
> Signed-off-by: David Sharp 
> Signed-off-by: Ian Rogers 
> ---
> tools/perf/util/evsel.c | 7 ++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index ef802f6d40c1..811f538f7d77 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -979,13 +979,18 @@ void evsel__config(struct evsel *evsel, struct 
> record_opts *opts,
>   if (!attr->sample_period || (opts->user_freq != UINT_MAX ||
>opts->user_interval != ULLONG_MAX)) {
>   if (opts->freq) {
> - evsel__set_sample_bit(evsel, PERIOD);
>   attr->freq  = 1;
>   attr->sample_freq   = opts->freq;
>   } else {
>   attr->sample_period = opts->default_interval;
>   }
>   }
> + /*
> +  * If attr->freq was set (here or earlier), ask for period
> +  * to be sampled.
> +  */
> + if (attr->freq)
> + evsel__set_sample_bit(evsel, PERIOD);
> 
>   if (opts->no_samples)
>   attr->sample_freq = 0;
> -- 
> 2.28.0.rc0.142.g3c755180ce-goog
> 
> 
> 



Re: [PATCH 1/2] lockdep: improve current->(hard|soft)irqs_enabled synchronisation with actual irq state

2020-07-24 Thread Athira Rajeev



> On 24-Jul-2020, at 9:46 AM, Alexey Kardashevskiy  wrote:
> 
> 
> 
> On 23/07/2020 23:11, Nicholas Piggin wrote:
>> Excerpts from Peter Zijlstra's message of July 23, 2020 9:40 pm:
>>> On Thu, Jul 23, 2020 at 08:56:14PM +1000, Nicholas Piggin wrote:
>>> 
>>>> diff --git a/arch/powerpc/include/asm/hw_irq.h 
>>>> b/arch/powerpc/include/asm/hw_irq.h
>>>> index 3a0db7b0b46e..35060be09073 100644
>>>> --- a/arch/powerpc/include/asm/hw_irq.h
>>>> +++ b/arch/powerpc/include/asm/hw_irq.h
>>>> @@ -200,17 +200,14 @@ static inline bool arch_irqs_disabled(void)
>>>> #define powerpc_local_irq_pmu_save(flags)  \
>>>> do {   \
>>>>raw_local_irq_pmu_save(flags);  \
>>>> -  trace_hardirqs_off();   \
>>>> +  if (!raw_irqs_disabled_flags(flags))\
>>>> +  trace_hardirqs_off();   \
>>>>} while(0)
>>>> #define powerpc_local_irq_pmu_restore(flags)   \
>>>>do {\
>>>> -  if (raw_irqs_disabled_flags(flags)) {   \
>>>> -  raw_local_irq_pmu_restore(flags);   \
>>>> -  trace_hardirqs_off();   \
>>>> -  } else {\
>>>> +  if (!raw_irqs_disabled_flags(flags))\
>>>>trace_hardirqs_on();\
>>>> -  raw_local_irq_pmu_restore(flags);   \
>>>> -  }   \
>>>> +  raw_local_irq_pmu_restore(flags);   \
>>>>} while(0)
>>> 
>>> You shouldn't be calling lockdep from NMI context!
>> 
>> After this patch it doesn't.
>> 
>> trace_hardirqs_on/off implementation appears to expect to be called in NMI 
>> context though, for some reason.
>> 
>>> That is, I recently
>>> added suport for that on x86:
>>> 
>>>  https://lkml.kernel.org/r/20200623083721.155449...@infradead.org
>>>  https://lkml.kernel.org/r/20200623083721.216740...@infradead.org
>>> 
>>> But you need to be very careful on how you order things, as you can see
>>> the above relies on preempt_count() already having been incremented with
>>> NMI_MASK.
>> 
>> Hmm. My patch seems simpler.
> 
> And your patches fix my error while Peter's do not:
> 
> 
> IRQs not enabled as expected
> WARNING: CPU: 0 PID: 1377 at /home/aik/p/kernel/kernel/softirq.c:169
> __local_bh_enable_ip+0x118/0x190

Hi Nicholas, Alexey

I was able to reproduce the warning which Alexey reported using the perf_fuzzer 
test suite. 
With the patch provided by Nick, I don’t see the issue anymore. This patch 
fixes the warnings I got with the perf_fuzzer run.

Thanks Nick for the fix. 

Tested-by: Athira Rajeev


> 
> 
>> 
>> I don't know this stuff very well, I don't really understand what your patch 
>> enables for x86 but at least it shouldn't be incompatible with this one 
>> AFAIKS.
>> 
>> Thanks,
>> Nick
>> 
> 
> -- 
> Alexey



[PATCH V4 2/2] tools/perf: Add perf tools support for extended register capability in powerpc

2020-05-27 Thread Athira Rajeev
From: Anju T Sudhakar 

Add extended regs to sample_reg_mask in the tool side for use
with the `-I?` option. The perf tools side uses the extended mask to display
the platform-supported register names (with the -I? option) to the user,
and also sends this mask to the kernel to capture the extended registers
in each sample. Hence the mask value is decided at runtime based on the
processor version.

Signed-off-by: Anju T Sudhakar 
[Decide extended mask at run time based on platform]
Signed-off-by: Athira Rajeev 
Reviewed-by: Madhavan Srinivasan 
---
 tools/arch/powerpc/include/uapi/asm/perf_regs.h | 14 ++-
 tools/perf/arch/powerpc/include/perf_regs.h |  5 ++-
 tools/perf/arch/powerpc/util/perf_regs.c| 55 +
 3 files changed, 72 insertions(+), 2 deletions(-)

diff --git a/tools/arch/powerpc/include/uapi/asm/perf_regs.h 
b/tools/arch/powerpc/include/uapi/asm/perf_regs.h
index f599064..485b1d5 100644
--- a/tools/arch/powerpc/include/uapi/asm/perf_regs.h
+++ b/tools/arch/powerpc/include/uapi/asm/perf_regs.h
@@ -48,6 +48,18 @@ enum perf_event_powerpc_regs {
PERF_REG_POWERPC_DSISR,
PERF_REG_POWERPC_SIER,
PERF_REG_POWERPC_MMCRA,
-   PERF_REG_POWERPC_MAX,
+   /* Extended registers */
+   PERF_REG_POWERPC_MMCR0,
+   PERF_REG_POWERPC_MMCR1,
+   PERF_REG_POWERPC_MMCR2,
+   /* Max regs without the extended regs */
+   PERF_REG_POWERPC_MAX = PERF_REG_POWERPC_MMCRA + 1,
 };
+
+#define PERF_REG_PMU_MASK  ((1ULL << PERF_REG_POWERPC_MAX) - 1)
+
+/* PERF_REG_EXTENDED_MASK value for CPU_FTR_ARCH_300 */
+#define PERF_REG_PMU_MASK_300   (((1ULL << (PERF_REG_POWERPC_MMCR2 + 1)) - 1) \
+   - PERF_REG_PMU_MASK)
+
 #endif /* _UAPI_ASM_POWERPC_PERF_REGS_H */
diff --git a/tools/perf/arch/powerpc/include/perf_regs.h 
b/tools/perf/arch/powerpc/include/perf_regs.h
index e18a355..46ed00d 100644
--- a/tools/perf/arch/powerpc/include/perf_regs.h
+++ b/tools/perf/arch/powerpc/include/perf_regs.h
@@ -64,7 +64,10 @@
[PERF_REG_POWERPC_DAR] = "dar",
[PERF_REG_POWERPC_DSISR] = "dsisr",
[PERF_REG_POWERPC_SIER] = "sier",
-   [PERF_REG_POWERPC_MMCRA] = "mmcra"
+   [PERF_REG_POWERPC_MMCRA] = "mmcra",
+   [PERF_REG_POWERPC_MMCR0] = "mmcr0",
+   [PERF_REG_POWERPC_MMCR1] = "mmcr1",
+   [PERF_REG_POWERPC_MMCR2] = "mmcr2",
 };
 
 static inline const char *perf_reg_name(int id)
diff --git a/tools/perf/arch/powerpc/util/perf_regs.c 
b/tools/perf/arch/powerpc/util/perf_regs.c
index 0a52429..9179230 100644
--- a/tools/perf/arch/powerpc/util/perf_regs.c
+++ b/tools/perf/arch/powerpc/util/perf_regs.c
@@ -6,9 +6,14 @@
 
 #include "../../../util/perf_regs.h"
 #include "../../../util/debug.h"
+#include "../../../util/event.h"
+#include "../../../util/header.h"
+#include "../../../perf-sys.h"
 
 #include 
 
+#define PVR_POWER9 0x004E
+
 const struct sample_reg sample_reg_masks[] = {
SMPL_REG(r0, PERF_REG_POWERPC_R0),
SMPL_REG(r1, PERF_REG_POWERPC_R1),
@@ -55,6 +60,9 @@
SMPL_REG(dsisr, PERF_REG_POWERPC_DSISR),
SMPL_REG(sier, PERF_REG_POWERPC_SIER),
SMPL_REG(mmcra, PERF_REG_POWERPC_MMCRA),
+   SMPL_REG(mmcr0, PERF_REG_POWERPC_MMCR0),
+   SMPL_REG(mmcr1, PERF_REG_POWERPC_MMCR1),
+   SMPL_REG(mmcr2, PERF_REG_POWERPC_MMCR2),
SMPL_REG_END
 };
 
@@ -163,3 +171,50 @@ int arch_sdt_arg_parse_op(char *old_op, char **new_op)
 
return SDT_ARG_VALID;
 }
+
+uint64_t arch__intr_reg_mask(void)
+{
+   struct perf_event_attr attr = {
+   .type   = PERF_TYPE_HARDWARE,
+   .config = PERF_COUNT_HW_CPU_CYCLES,
+   .sample_type= PERF_SAMPLE_REGS_INTR,
+   .precise_ip = 1,
+   .disabled   = 1,
+   .exclude_kernel = 1,
+   };
+   int fd, ret;
+   char buffer[64];
+   u32 version;
+   u64 extended_mask = 0;
+
+   /* Get the PVR value to set the extended
+* mask specific to platform
+*/
+   get_cpuid(buffer, sizeof(buffer));
+   ret = sscanf(buffer, "%u,", &version);
+
+   if (ret != 1) {
+   pr_debug("Failed to get the processor version, unable to output 
extended registers\n");
+   return PERF_REGS_MASK;
+   }
+
+   if (version == PVR_POWER9)
+   extended_mask = PERF_REG_PMU_MASK_300;
+   else
+   return PERF_REGS_MASK;
+
+   attr.sample_regs_intr = extended_mask;
+   attr.sample_period = 1;
+   event_attr_init(&attr);
+
+   /*
+* check if the pmu supports perf extended regs, before
+* returning the register mask to sample.
+*/
+   fd = sys_perf_event_open(&attr, 0, -1, -1, 0);
+   if (fd != -1) {
+   close(fd);
+   return (extended_mask | PERF_REGS_MASK);
+   }
+   return PERF_REGS_MASK;
+}
-- 
1.8.3.1



[PATCH V4 1/2] powerpc/perf: Add support for outputting extended regs in perf intr_regs

2020-05-27 Thread Athira Rajeev
From: Anju T Sudhakar 

Add support for perf extended register capability in powerpc.
The capability flag PERF_PMU_CAP_EXTENDED_REGS, is used to indicate the
PMU which support extended registers. The generic code define the mask
of extended registers as 0 for non supported architectures.

Patch adds extended regs support for power9 platform by
exposing MMCR0, MMCR1 and MMCR2 registers.

REG_RESERVED mask needs update to include extended regs.
`PERF_REG_EXTENDED_MASK`, contains mask value of the supported registers,
is defined at runtime in the kernel based on platform since the supported
registers may differ from one processor version to another and hence the
MASK value.

with patch
--

available registers: r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11
r12 r13 r14 r15 r16 r17 r18 r19 r20 r21 r22 r23 r24 r25 r26
r27 r28 r29 r30 r31 nip msr orig_r3 ctr link xer ccr softe
trap dar dsisr sier mmcra mmcr0 mmcr1 mmcr2

PERF_RECORD_SAMPLE(IP, 0x1): 4784/4784: 0 period: 1 addr: 0
... intr regs: mask 0x ABI 64-bit
 r00xc012b77c
 r10xc03fe5e03930
 r20xc1b0e000
 r30xc03fdcddf800
 r40xc03fc788
 r50x9c422724be
 r60xc03fe5e03908
 r70xff63bddc8706
 r80x9e4
 r90x0
 r10   0x1
 r11   0x0
 r12   0xc01299c0
 r13   0xc03c4800
 r14   0x0
 r15   0x7fffdd8b8b00
 r16   0x0
 r17   0x7fffdd8be6b8
 r18   0x7e7076607730
 r19   0x2f
 r20   0xc0001fc26c68
 r21   0xc0002041e4227e00
 r22   0xc0002018fb60
 r23   0x1
 r24   0xc03ffec4d900
 r25   0x8000
 r26   0x0
 r27   0x1
 r28   0x1
 r29   0xc1be1260
 r30   0x6008010
 r31   0xc03ffebb7218
 nip   0xc012b910
 msr   0x90009033
 orig_r3 0xc012b86c
 ctr   0xc01299c0
 link  0xc012b77c
 xer   0x0
 ccr   0x2800
 softe 0x1
 trap  0xf00
 dar   0x0
 dsisr 0x800
 sier  0x0
 mmcra 0x800
 mmcr0 0x82008090
 mmcr1 0x1e00
 mmcr2 0x0
 ... thread: perf:4784

Signed-off-by: Anju T Sudhakar 
[Defined PERF_REG_EXTENDED_MASK at run time to add support for different platforms]
Signed-off-by: Athira Rajeev 
Reviewed-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/perf_event_server.h |  8 +++
 arch/powerpc/include/uapi/asm/perf_regs.h| 14 +++-
 arch/powerpc/perf/core-book3s.c  |  1 +
 arch/powerpc/perf/perf_regs.c| 34 +---
 arch/powerpc/perf/power9-pmu.c   |  6 +
 5 files changed, 59 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
index 3e9703f..1458e1a 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -15,6 +15,9 @@
 #define MAX_EVENT_ALTERNATIVES 8
 #define MAX_LIMITED_HWCOUNTERS 2
 
+extern u64 mask_var;
+#define PERF_REG_EXTENDED_MASK  mask_var
+
 struct perf_event;
 
 /*
@@ -55,6 +58,11 @@ struct power_pmu {
int *blacklist_ev;
/* BHRB entries in the PMU */
int bhrb_nr;
+   /*
+* set this flag with `PERF_PMU_CAP_EXTENDED_REGS` if
+* the pmu supports extended perf regs capability
+*/
+   int capabilities;
 };
 
 /*
diff --git a/arch/powerpc/include/uapi/asm/perf_regs.h b/arch/powerpc/include/uapi/asm/perf_regs.h
index f599064..485b1d5 100644
--- a/arch/powerpc/include/uapi/asm/perf_regs.h
+++ b/arch/powerpc/include/uapi/asm/perf_regs.h
@@ -48,6 +48,18 @@ enum perf_event_powerpc_regs {
PERF_REG_POWERPC_DSISR,
PERF_REG_POWERPC_SIER,
PERF_REG_POWERPC_MMCRA,
-   PERF_REG_POWERPC_MAX,
+   /* Extended registers */
+   PERF_REG_POWERPC_MMCR0,
+   PERF_REG_POWERPC_MMCR1,
+   PERF_REG_POWERPC_MMCR2,
+   /* Max regs without the extended regs */
+   PERF_REG_POWERPC_MAX = PERF_REG_POWERPC_MMCRA + 1,
 };
+
+#define PERF_REG_PMU_MASK  ((1ULL << PERF_REG_POWERPC_MAX) - 1)
+
+/* PERF_REG_EXTENDED_MASK value for CPU_FTR_ARCH_300 */
+#define PERF_REG_PMU_MASK_300   (((1ULL << (PERF_REG_POWERPC_MMCR2 + 1)) - 1) \
+   - PERF_REG_PMU_MASK)
+
 #endif /* _UAPI_ASM_POWERPC_PERF_REGS_H */
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 3dcfecf..7f63edf 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2275,6 +2275,7 @@ int register_power_pmu(struct power_pmu *pmu)
pmu->name);
 
power_pmu.attr_groups = ppmu->attr_groups;
+   power_pmu.capabilities |= (ppmu->capabilities & PERF_PMU_CAP_EXTENDED_REGS);
 
 #ifdef MSR_HV
/*
diff --git a/arch/powerpc/perf/perf_regs.c b/arch/powerpc/perf/perf_regs

[PATCH V4 0/2] powerpc/perf: Add support for perf extended regs in powerpc

2020-05-27 Thread Athira Rajeev
Patch set to add support for the perf extended register capability in
powerpc. The capability flag PERF_PMU_CAP_EXTENDED_REGS is used to
indicate a PMU that supports extended registers. The generic code
defines the mask of extended registers as 0 for unsupported
architectures.

Patch 1/2 uses this PERF_PMU_CAP_EXTENDED_REGS capability to output the
values of mmcr0, mmcr1 and mmcr2 for POWER9, and defines
`PERF_REG_EXTENDED_MASK` at runtime, which contains the mask value of
the supported registers under extended regs.

Patch 2/2 adds the extended regs to sample_reg_mask on the tool side
for use with the `-I?` option.

Anju T Sudhakar (2):
  powerpc/perf: Add support for outputting extended regs in perf
intr_regs
  tools/perf: Add perf tools support for extended register capability in
powerpc

---
Changes from v3 -> v4
- Addressed the comments on the newline/tab issue
  and added "Reviewed-by" from Madhavan Srinivasan.

Changes from v2 -> v3
- Split kernel and tools side patches as suggested by Arnaldo
- Addressed review comment from Madhavan Srinivasan

Changes from v1 -> v2

- `PERF_REG_EXTENDED_MASK` is defined at runtime in the kernel
based on the platform. This gives flexibility in using extended
regs for all processor versions where the supported registers may differ.
- Removed PERF_REG_EXTENDED_MASK from the perf tools side. Based on the
processor version (from the PVR value), the tool side will return the
appropriate extended mask.
- Since the tool changes can be handled without a "PERF_REG_EXTENDED_MASK"
macro, dropped the patch to set NO_AUXTRACE.
- Addressed review comments from Ravi Bangoria for V1

---

 arch/powerpc/include/asm/perf_event_server.h|  8 
 arch/powerpc/include/uapi/asm/perf_regs.h   | 14 ++-
 arch/powerpc/perf/core-book3s.c |  1 +
 arch/powerpc/perf/perf_regs.c   | 34 +--
 arch/powerpc/perf/power9-pmu.c  |  6 +++
 tools/arch/powerpc/include/uapi/asm/perf_regs.h | 14 ++-
 tools/perf/arch/powerpc/include/perf_regs.h |  5 ++-
 tools/perf/arch/powerpc/util/perf_regs.c| 55 +
 8 files changed, 131 insertions(+), 6 deletions(-)

-- 
1.8.3.1



[PATCH V3 0/2] powerpc/perf: Add support for perf extended regs in powerpc

2020-05-20 Thread Athira Rajeev
Patch set to add support for the perf extended register capability in
powerpc. The capability flag PERF_PMU_CAP_EXTENDED_REGS is used to
indicate a PMU that supports extended registers. The generic code
defines the mask of extended registers as 0 for unsupported
architectures.

Patch 1/2 uses this PERF_PMU_CAP_EXTENDED_REGS capability to output the
values of mmcr0, mmcr1 and mmcr2 for POWER9, and defines
`PERF_REG_EXTENDED_MASK` at runtime, which contains the mask value of
the supported registers under extended regs.

Patch 2/2 adds the extended regs to sample_reg_mask on the tool side
for use with the `-I?` option.

Anju T Sudhakar (2):
  powerpc/perf: Add support for outputting extended regs in perf
intr_regs
  tools/perf: Add perf tools support for extended register capability in
powerpc
---
Changes from v2 -> v3
- Split kernel and tools side patches as suggested by Arnaldo
- Addressed review comment from Madhavan Srinivasan

Changes from v1 -> v2

- `PERF_REG_EXTENDED_MASK` is defined at runtime in the kernel
based on the platform. This gives flexibility in using extended
regs for all processor versions where the supported registers may differ.
- Removed PERF_REG_EXTENDED_MASK from the perf tools side. Based on the
processor version (from the PVR value), the tool side will return the
appropriate extended mask.
- Since the tool changes can be handled without a "PERF_REG_EXTENDED_MASK"
macro, dropped the patch to set NO_AUXTRACE.
- Addressed review comments from Ravi Bangoria for V1

---

 arch/powerpc/include/asm/perf_event_server.h|  8 
 arch/powerpc/include/uapi/asm/perf_regs.h   | 14 ++-
 arch/powerpc/perf/core-book3s.c |  1 +
 arch/powerpc/perf/perf_regs.c   | 34 +--
 arch/powerpc/perf/power9-pmu.c  |  6 +++
 tools/arch/powerpc/include/uapi/asm/perf_regs.h | 14 ++-
 tools/perf/arch/powerpc/include/perf_regs.h |  5 ++-
 tools/perf/arch/powerpc/util/perf_regs.c| 55 +
 8 files changed, 131 insertions(+), 6 deletions(-)

-- 
1.8.3.1



[PATCH V3 1/2] powerpc/perf: Add support for outputting extended regs in perf intr_regs

2020-05-20 Thread Athira Rajeev
From: Anju T Sudhakar 

Add support for the perf extended register capability in powerpc.
The capability flag PERF_PMU_CAP_EXTENDED_REGS is used to indicate a
PMU that supports extended registers. The generic code defines the mask
of extended registers as 0 for unsupported architectures.

This patch adds extended regs support for the power9 platform by
exposing the MMCR0, MMCR1 and MMCR2 registers.

The REG_RESERVED mask needs an update to include the extended regs.
`PERF_REG_EXTENDED_MASK`, which contains the mask value of the supported
registers, is defined at runtime in the kernel based on the platform,
since the supported registers (and hence the mask value) may differ from
one processor version to another.

with patch
--

available registers: r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11
r12 r13 r14 r15 r16 r17 r18 r19 r20 r21 r22 r23 r24 r25 r26
r27 r28 r29 r30 r31 nip msr orig_r3 ctr link xer ccr softe
trap dar dsisr sier mmcra mmcr0 mmcr1 mmcr2

PERF_RECORD_SAMPLE(IP, 0x1): 4784/4784: 0 period: 1 addr: 0
... intr regs: mask 0x ABI 64-bit
 r00xc012b77c
 r10xc03fe5e03930
 r20xc1b0e000
 r30xc03fdcddf800
 r40xc03fc788
 r50x9c422724be
 r60xc03fe5e03908
 r70xff63bddc8706
 r80x9e4
 r90x0
 r10   0x1
 r11   0x0
 r12   0xc01299c0
 r13   0xc03c4800
 r14   0x0
 r15   0x7fffdd8b8b00
 r16   0x0
 r17   0x7fffdd8be6b8
 r18   0x7e7076607730
 r19   0x2f
 r20   0xc0001fc26c68
 r21   0xc0002041e4227e00
 r22   0xc0002018fb60
 r23   0x1
 r24   0xc03ffec4d900
 r25   0x8000
 r26   0x0
 r27   0x1
 r28   0x1
 r29   0xc1be1260
 r30   0x6008010
 r31   0xc03ffebb7218
 nip   0xc012b910
 msr   0x90009033
 orig_r3 0xc012b86c
 ctr   0xc01299c0
 link  0xc012b77c
 xer   0x0
 ccr   0x2800
 softe 0x1
 trap  0xf00
 dar   0x0
 dsisr 0x800
 sier  0x0
 mmcra 0x800
 mmcr0 0x82008090
 mmcr1 0x1e00
 mmcr2 0x0
 ... thread: perf:4784

Signed-off-by: Anju T Sudhakar 
[Defined PERF_REG_EXTENDED_MASK at run time to add support for different platforms]
Signed-off-by: Athira Rajeev 
---
 arch/powerpc/include/asm/perf_event_server.h |  8 +++
 arch/powerpc/include/uapi/asm/perf_regs.h| 14 +++-
 arch/powerpc/perf/core-book3s.c  |  1 +
 arch/powerpc/perf/perf_regs.c| 34 +---
 arch/powerpc/perf/power9-pmu.c   |  6 +
 5 files changed, 59 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
index 3e9703f..1458e1a 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -15,6 +15,9 @@
 #define MAX_EVENT_ALTERNATIVES 8
 #define MAX_LIMITED_HWCOUNTERS 2
 
+extern u64 mask_var;
+#define PERF_REG_EXTENDED_MASK  mask_var
+
 struct perf_event;
 
 /*
@@ -55,6 +58,11 @@ struct power_pmu {
int *blacklist_ev;
/* BHRB entries in the PMU */
int bhrb_nr;
+   /*
+* set this flag with `PERF_PMU_CAP_EXTENDED_REGS` if
+* the pmu supports extended perf regs capability
+*/
+   int capabilities;
 };
 
 /*
diff --git a/arch/powerpc/include/uapi/asm/perf_regs.h b/arch/powerpc/include/uapi/asm/perf_regs.h
index f599064..485b1d5 100644
--- a/arch/powerpc/include/uapi/asm/perf_regs.h
+++ b/arch/powerpc/include/uapi/asm/perf_regs.h
@@ -48,6 +48,18 @@ enum perf_event_powerpc_regs {
PERF_REG_POWERPC_DSISR,
PERF_REG_POWERPC_SIER,
PERF_REG_POWERPC_MMCRA,
-   PERF_REG_POWERPC_MAX,
+   /* Extended registers */
+   PERF_REG_POWERPC_MMCR0,
+   PERF_REG_POWERPC_MMCR1,
+   PERF_REG_POWERPC_MMCR2,
+   /* Max regs without the extended regs */
+   PERF_REG_POWERPC_MAX = PERF_REG_POWERPC_MMCRA + 1,
 };
+
+#define PERF_REG_PMU_MASK  ((1ULL << PERF_REG_POWERPC_MAX) - 1)
+
+/* PERF_REG_EXTENDED_MASK value for CPU_FTR_ARCH_300 */
+#define PERF_REG_PMU_MASK_300   (((1ULL << (PERF_REG_POWERPC_MMCR2 + 1)) - 1) \
+   - PERF_REG_PMU_MASK)
+
 #endif /* _UAPI_ASM_POWERPC_PERF_REGS_H */
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 3dcfecf..f56b778 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2276,6 +2276,7 @@ int register_power_pmu(struct power_pmu *pmu)
 
power_pmu.attr_groups = ppmu->attr_groups;
 
+   power_pmu.capabilities |= (ppmu->capabilities & PERF_PMU_CAP_EXTENDED_REGS);
 #ifdef MSR_HV
/*
 * Use FCHV to ignore kernel events if MSR.HV is set.
diff --git a/arch/powerpc/perf/perf_regs.c b/arch/powerpc/perf/perf_regs.c

[PATCH V3 2/2] tools/perf: Add perf tools support for extended register capability in powerpc

2020-05-20 Thread Athira Rajeev
From: Anju T Sudhakar 

Add the extended regs to sample_reg_mask on the tool side for use with
the `-I?` option. The perf tools side uses the extended mask to display
the platform-supported register names (with the -I? option) to the user
and also sends this mask to the kernel to capture the extended
registers in each sample. Hence the mask value is decided based on the
processor version.

Signed-off-by: Anju T Sudhakar 
[Decide extended mask at run time based on platform]
Signed-off-by: Athira Rajeev 
---
 tools/arch/powerpc/include/uapi/asm/perf_regs.h | 14 ++-
 tools/perf/arch/powerpc/include/perf_regs.h |  5 ++-
 tools/perf/arch/powerpc/util/perf_regs.c| 55 +
 3 files changed, 72 insertions(+), 2 deletions(-)

diff --git a/tools/arch/powerpc/include/uapi/asm/perf_regs.h b/tools/arch/powerpc/include/uapi/asm/perf_regs.h
index f599064..485b1d5 100644
--- a/tools/arch/powerpc/include/uapi/asm/perf_regs.h
+++ b/tools/arch/powerpc/include/uapi/asm/perf_regs.h
@@ -48,6 +48,18 @@ enum perf_event_powerpc_regs {
PERF_REG_POWERPC_DSISR,
PERF_REG_POWERPC_SIER,
PERF_REG_POWERPC_MMCRA,
-   PERF_REG_POWERPC_MAX,
+   /* Extended registers */
+   PERF_REG_POWERPC_MMCR0,
+   PERF_REG_POWERPC_MMCR1,
+   PERF_REG_POWERPC_MMCR2,
+   /* Max regs without the extended regs */
+   PERF_REG_POWERPC_MAX = PERF_REG_POWERPC_MMCRA + 1,
 };
+
+#define PERF_REG_PMU_MASK  ((1ULL << PERF_REG_POWERPC_MAX) - 1)
+
+/* PERF_REG_EXTENDED_MASK value for CPU_FTR_ARCH_300 */
+#define PERF_REG_PMU_MASK_300   (((1ULL << (PERF_REG_POWERPC_MMCR2 + 1)) - 1) \
+   - PERF_REG_PMU_MASK)
+
 #endif /* _UAPI_ASM_POWERPC_PERF_REGS_H */
diff --git a/tools/perf/arch/powerpc/include/perf_regs.h b/tools/perf/arch/powerpc/include/perf_regs.h
index e18a355..46ed00d 100644
--- a/tools/perf/arch/powerpc/include/perf_regs.h
+++ b/tools/perf/arch/powerpc/include/perf_regs.h
@@ -64,7 +64,10 @@
[PERF_REG_POWERPC_DAR] = "dar",
[PERF_REG_POWERPC_DSISR] = "dsisr",
[PERF_REG_POWERPC_SIER] = "sier",
-   [PERF_REG_POWERPC_MMCRA] = "mmcra"
+   [PERF_REG_POWERPC_MMCRA] = "mmcra",
+   [PERF_REG_POWERPC_MMCR0] = "mmcr0",
+   [PERF_REG_POWERPC_MMCR1] = "mmcr1",
+   [PERF_REG_POWERPC_MMCR2] = "mmcr2",
 };
 
 static inline const char *perf_reg_name(int id)
diff --git a/tools/perf/arch/powerpc/util/perf_regs.c b/tools/perf/arch/powerpc/util/perf_regs.c
index 0a52429..9179230 100644
--- a/tools/perf/arch/powerpc/util/perf_regs.c
+++ b/tools/perf/arch/powerpc/util/perf_regs.c
@@ -6,9 +6,14 @@
 
 #include "../../../util/perf_regs.h"
 #include "../../../util/debug.h"
+#include "../../../util/event.h"
+#include "../../../util/header.h"
+#include "../../../perf-sys.h"
 
 #include 
 
+#define PVR_POWER9 0x004E
+
 const struct sample_reg sample_reg_masks[] = {
SMPL_REG(r0, PERF_REG_POWERPC_R0),
SMPL_REG(r1, PERF_REG_POWERPC_R1),
@@ -55,6 +60,9 @@
SMPL_REG(dsisr, PERF_REG_POWERPC_DSISR),
SMPL_REG(sier, PERF_REG_POWERPC_SIER),
SMPL_REG(mmcra, PERF_REG_POWERPC_MMCRA),
+   SMPL_REG(mmcr0, PERF_REG_POWERPC_MMCR0),
+   SMPL_REG(mmcr1, PERF_REG_POWERPC_MMCR1),
+   SMPL_REG(mmcr2, PERF_REG_POWERPC_MMCR2),
SMPL_REG_END
 };
 
@@ -163,3 +171,50 @@ int arch_sdt_arg_parse_op(char *old_op, char **new_op)
 
return SDT_ARG_VALID;
 }
+
+uint64_t arch__intr_reg_mask(void)
+{
+   struct perf_event_attr attr = {
+   .type   = PERF_TYPE_HARDWARE,
+   .config = PERF_COUNT_HW_CPU_CYCLES,
+   .sample_type= PERF_SAMPLE_REGS_INTR,
+   .precise_ip = 1,
+   .disabled   = 1,
+   .exclude_kernel = 1,
+   };
+   int fd, ret;
+   char buffer[64];
+   u32 version;
+   u64 extended_mask = 0;
+
+   /* Get the PVR value to set the extended
+* mask specific to platform
+*/
+   get_cpuid(buffer, sizeof(buffer));
+   ret = sscanf(buffer, "%u,", &version);
+
+   if (ret != 1) {
+   pr_debug("Failed to get the processor version, unable to output extended registers\n");
+   return PERF_REGS_MASK;
+   }
+
+   if (version == PVR_POWER9)
+   extended_mask = PERF_REG_PMU_MASK_300;
+   else
+   return PERF_REGS_MASK;
+
+   attr.sample_regs_intr = extended_mask;
+   attr.sample_period = 1;
+   event_attr_init(&attr);
+
+   /*
+* check if the pmu supports perf extended regs, before
+* returning the register mask to sample.
+*/
+   fd = sys_perf_event_open(&attr, 0, -1, -1, 0);
+   if (fd != -1) {
+   close(fd);
+   return (extended_mask | PERF_REGS_MASK);
+   }
+   return PERF_REGS_MASK;
+}
-- 
1.8.3.1



[PATCH V2] powerpc/perf: Add support for outputting extended regs in perf intr_regs

2020-05-19 Thread Athira Rajeev
From: Anju T Sudhakar 

Add support for the perf extended register capability in powerpc.
The capability flag PERF_PMU_CAP_EXTENDED_REGS is used to indicate a
PMU that supports extended registers. The generic code defines the mask
of extended registers as 0 for unsupported architectures.

This patch adds extended regs support for the power9 platform by
exposing the MMCR0, MMCR1 and MMCR2 registers.

The REG_RESERVED mask needs an update to include the extended regs.
`PERF_REG_EXTENDED_MASK`, which contains the mask value of the supported
registers, is defined at runtime in the kernel based on the platform,
since the supported registers (and hence the mask value) may differ from
one processor version to another.

The perf tools side uses the extended mask to display the
platform-supported register names (with the -I? option) to the user and
also sends this mask to the kernel to capture the extended registers in
each sample. Hence the mask value is decided based on the processor
version.

with patch
--

available registers: r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11
r12 r13 r14 r15 r16 r17 r18 r19 r20 r21 r22 r23 r24 r25 r26
r27 r28 r29 r30 r31 nip msr orig_r3 ctr link xer ccr softe
trap dar dsisr sier mmcra mmcr0 mmcr1 mmcr2

PERF_RECORD_SAMPLE(IP, 0x1): 4784/4784: 0 period: 1 addr: 0
... intr regs: mask 0x ABI 64-bit
 r00xc012b77c
 r10xc03fe5e03930
 r20xc1b0e000
 r30xc03fdcddf800
 r40xc03fc788
 r50x9c422724be
 r60xc03fe5e03908
 r70xff63bddc8706
 r80x9e4
 r90x0
 r10   0x1
 r11   0x0
 r12   0xc01299c0
 r13   0xc03c4800
 r14   0x0
 r15   0x7fffdd8b8b00
 r16   0x0
 r17   0x7fffdd8be6b8
 r18   0x7e7076607730
 r19   0x2f
 r20   0xc0001fc26c68
 r21   0xc0002041e4227e00
 r22   0xc0002018fb60
 r23   0x1
 r24   0xc03ffec4d900
 r25   0x8000
 r26   0x0
 r27   0x1
 r28   0x1
 r29   0xc1be1260
 r30   0x6008010
 r31   0xc03ffebb7218
 nip   0xc012b910
 msr   0x90009033
 orig_r3 0xc012b86c
 ctr   0xc01299c0
 link  0xc012b77c
 xer   0x0
 ccr   0x2800
 softe 0x1
 trap  0xf00
 dar   0x0
 dsisr 0x800
 sier  0x0
 mmcra 0x800
 mmcr0 0x82008090
 mmcr1 0x1e00
 mmcr2 0x0
 ... thread: perf:4784

Signed-off-by: Anju T Sudhakar 
[Defined PERF_REG_EXTENDED_MASK at run time to add support for different platforms]
Signed-off-by: Athira Rajeev 
---
Changes from v1 -> v2

- `PERF_REG_EXTENDED_MASK` is defined at runtime in the kernel
based on the platform. This gives flexibility in using extended
regs for all processor versions where the supported registers may differ.
- Removed PERF_REG_EXTENDED_MASK from the perf tools side. Based on the
processor version (from the PVR value), the tool side will return the
appropriate extended mask.
- Since the tool changes can be handled without a "PERF_REG_EXTENDED_MASK"
macro, dropped the patch to set NO_AUXTRACE.
- Addressed review comments from Ravi Bangoria for V1

---

 arch/powerpc/include/asm/perf_event_server.h|  8 
 arch/powerpc/include/uapi/asm/perf_regs.h   | 14 ++-
 arch/powerpc/perf/core-book3s.c |  1 +
 arch/powerpc/perf/perf_regs.c   | 34 ++--
 arch/powerpc/perf/power9-pmu.c  |  6 +++
 tools/arch/powerpc/include/uapi/asm/perf_regs.h | 14 ++-
 tools/perf/arch/powerpc/include/perf_regs.h |  5 ++-
 tools/perf/arch/powerpc/util/perf_regs.c| 54 +
 8 files changed, 130 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
index 3e9703f..1458e1a 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -15,6 +15,9 @@
 #define MAX_EVENT_ALTERNATIVES 8
 #define MAX_LIMITED_HWCOUNTERS 2
 
+extern u64 mask_var;
+#define PERF_REG_EXTENDED_MASK  mask_var
+
 struct perf_event;
 
 /*
@@ -55,6 +58,11 @@ struct power_pmu {
int *blacklist_ev;
/* BHRB entries in the PMU */
int bhrb_nr;
+   /*
+* set this flag with `PERF_PMU_CAP_EXTENDED_REGS` if
+* the pmu supports extended perf regs capability
+*/
+   int capabilities;
 };
 
 /*
diff --git a/arch/powerpc/include/uapi/asm/perf_regs.h b/arch/powerpc/include/uapi/asm/perf_regs.h
index f599064..485b1d5 100644
--- a/arch/powerpc/include/uapi/asm/perf_regs.h
+++ b/arch/powerpc/include/uapi/asm/perf_regs.h
@@ -48,6 +48,18 @@ enum perf_event_powerpc_regs {
PERF_REG_POWERPC_DSISR,
PERF_REG_POWERPC_SIER,
PERF_REG_POWERPC_MMCRA,
-   PERF_REG_POWERPC_MAX,
+   /* Extended registers */
+   PERF_REG_POWERPC_MMCR0,
+   PERF_REG_POWERPC_MMCR1,
+   PERF_REG_