Re: [PATCH V2 1/3] tools/tracing: Use tools/build makefiles on latency-collector

2024-03-15 Thread Arnaldo Carvalho de Melo
On Fri, Mar 15, 2024 at 03:48:58PM +0100, Daniel Bristot de Oliveira wrote:
> On 3/15/24 15:24, Daniel Bristot de Oliveira wrote:
> > Use tools/build/ makefiles to build latency-collector, inheriting
> > the benefits of it. For example: before this patch, missing
> > tracefs/traceevents headers would result in a failure like this:
> > 
> >  %< ---
> 
> Oops, b4 is interpreting these -- as the '---' separator, and is truncating
> the message. I will fix this in a v3.
> 
> sorry.

Yeah, that confuses scripts, that separator.

But overall I tested various versions of your patches and they look ok.

tools/build/ was done for other tools to use, and so far some projects
living in tools/ use it: tools/objtool, tools/lib/subcmd, etc.

I just did a:

git grep tools\/build

And there are quite a few more that I didn't realize have been using bits
and pieces, good.

- Arnaldo



Re: [PATCH v8 4/4] libperf: Add support for user space counter access

2021-04-20 Thread Arnaldo Carvalho de Melo
On Wed, Apr 14, 2021 at 11:07:39AM -0500, Rob Herring wrote:
> x86 and arm64 can both support direct access of event counters in
> userspace. The access sequence is less than trivial and currently exists
> in perf test code (tools/perf/arch/x86/tests/rdpmc.c) with copies in
> projects such as PAPI and libpfm4.
> 
> In order to support userspace access, an event must be mmapped first
> with perf_evsel__mmap(). Then subsequent calls to perf_evsel__read()
> will use the fast path (assuming the arch supports it).

Had to apply this to fix the build on the other arches:
 
> +#if defined(__i386__) || defined(__x86_64__)
> +static u64 read_perf_counter(unsigned int counter)
> +{
> + unsigned int low, high;
> +
> + asm volatile("rdpmc" : "=a" (low), "=d" (high) : "c" (counter));
> +
> + return low | ((u64)high) << 32;
> +}
> +
> +static u64 read_timestamp(void)
> +{
> + unsigned int low, high;
> +
> + asm volatile("rdtsc" : "=a" (low), "=d" (high));
> +
> + return low | ((u64)high) << 32;
> +}
> +#else
> +static u64 read_perf_counter(unsigned int counter) { return 0; }
> +static u64 read_timestamp(void) { return 0; }
> +#endif

diff --git a/tools/lib/perf/mmap.c b/tools/lib/perf/mmap.c
index 915469f00cf4c3fb..c89dfa5f67b3a408 100644
--- a/tools/lib/perf/mmap.c
+++ b/tools/lib/perf/mmap.c
@@ -295,7 +295,7 @@ static u64 read_timestamp(void)
 	return low | ((u64)high) << 32;
 }
 #else
-static u64 read_perf_counter(unsigned int counter) { return 0; }
+static u64 read_perf_counter(unsigned int counter __maybe_unused) { return 0; }
 static u64 read_timestamp(void) { return 0; }
 #endif
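
For readers following the fix above, here is a minimal standalone sketch of the same idea. It assumes a GCC or Clang compiler and defines `__maybe_unused` locally as a stand-in for the macro used in tools/; the `_stub` and `demo_` names are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;

/* Assumption: local stand-in for the __maybe_unused macro that tools/ code uses. */
#ifndef __maybe_unused
#define __maybe_unused __attribute__((unused))
#endif

/*
 * Fallback for architectures without a userspace counter read.
 * The parameter is deliberately ignored; the attribute keeps
 * -Wunused-parameter (an error under -Werror) quiet.
 */
static u64 read_perf_counter_stub(unsigned int counter __maybe_unused)
{
	return 0;
}

/* Hypothetical usage check: returns 0 when the stub behaves as expected. */
static int demo_stub(void)
{
	assert(read_perf_counter_stub(3) == 0);
	return 0;
}
```

Without the attribute, a build that enables -Wunused-parameter together with -Werror would be expected to reject the one-line stub, which matches the build failure described for the non-x86 arches.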
 


Re: [PATCH] objtool: prevent memory leak in error paths

2021-04-19 Thread Arnaldo Carvalho de Melo
On Thu, Apr 15, 2021 at 09:24:41AM +0200, Peter Zijlstra wrote:
> On Wed, Apr 14, 2021 at 11:47:09AM +0300, Dan Carpenter wrote:
> > On Wed, Apr 14, 2021 at 01:45:11AM +0500, Muhammad Usama Anjum wrote:
> > > Memory allocated by sym and sym->name isn't being freed if some error
> > > occurs in elf_create_undef_symbol(). Free the sym and sym->name if error
> > > is detected before returning NULL.
> > > 
> > > Addresses-Coverity: ("Prevent memory leak")
> > > Fixes: 2f2f7e47f052 ("objtool: Add elf_create_undef_symbol()")
> > > Signed-off-by: Muhammad Usama Anjum 
> > > ---
> > > Only build has been tested.
> > > 
> > 
> > Just ignore leaks from the tools/ directory.  These things run and then
> > exit and all the memory is freed.  #OldSchoolGarbageCollector
> 
> Mostly true; but I suspect tools/perf might care, it has some longer
> running things in.

Yes, and now we have 'perf daemon' that is long running.

- Arnaldo


Re: [PATCH 1/1] perf data: Fix error return code in perf_data__create_dir()

2021-04-19 Thread Arnaldo Carvalho de Melo
On Thu, Apr 15, 2021 at 01:38:40PM +0200, Jiri Olsa wrote:
> On Thu, Apr 15, 2021 at 04:34:16PM +0800, Zhen Lei wrote:
> > Although 'ret' is initialized to -1, it is reassigned by
> > the "ret = open(...)" statement in the for loop, so the value of
> > 'ret' is unknown when asprintf() fails.
> > 
> > Reported-by: Hulk Robot 
> > Signed-off-by: Zhen Lei 
> 
> Acked-by: Jiri Olsa 

Thanks, applied.

- Arnaldo

 
> thanks,
> jirka
> 
> > ---
> >  tools/perf/util/data.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c
> > index f29af4fc3d09390..8fca4779ae6a8e9 100644
> > --- a/tools/perf/util/data.c
> > +++ b/tools/perf/util/data.c
> > @@ -35,7 +35,7 @@ void perf_data__close_dir(struct perf_data *data)
> >  int perf_data__create_dir(struct perf_data *data, int nr)
> >  {
> > struct perf_data_file *files = NULL;
> > -   int i, ret = -1;
> > +   int i, ret;
> >  
> > if (WARN_ON(!data->is_dir))
> > return -EINVAL;
> > @@ -51,7 +51,8 @@ int perf_data__create_dir(struct perf_data *data, int nr)
> > for (i = 0; i < nr; i++) {
> > > struct perf_data_file *file = &files[i];
> > >  
> > > -   if (asprintf(&file->path, "%s/data.%d", data->path, i) < 0)
> > > +   ret = asprintf(&file->path, "%s/data.%d", data->path, i);
> > +   if (ret < 0)
> > goto out_err;
> >  
> > ret = open(file->path, O_RDWR|O_CREAT|O_TRUNC, S_IRUSR|S_IWUSR);
> > -- 
> > 2.26.0.106.g9fadedd
> > 
> > 
> 

-- 

- Arnaldo
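
As an illustration of the pattern this patch applies (assign 'ret' at the point of failure instead of relying on an initial value that the loop overwrites), here is a small standalone sketch. It is not the perf code: `make_path()`, `build_paths()` and `demo_paths()` are hypothetical names, and snprintf()/malloc() are used in place of asprintf() so the example stays within standard C.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate "<dir>/data.<idx>"; returns its length or -1. */
static int make_path(char **out, const char *dir, int idx)
{
	int len = snprintf(NULL, 0, "%s/data.%d", dir, idx);

	if (len < 0)
		return -1;
	*out = malloc((size_t)len + 1);
	if (!*out)
		return -1;
	snprintf(*out, (size_t)len + 1, "%s/data.%d", dir, idx);
	return len;
}

/* Build nr paths; on failure, free what was built and return the failing call's result. */
static int build_paths(const char *dir, char **paths, int nr)
{
	int i, ret;

	if (!dir || !paths || nr <= 0)
		return -1;

	for (i = 0; i < nr; i++) {
		/* Capture the result so a failure here is what gets returned. */
		ret = make_path(&paths[i], dir, i);
		if (ret < 0)
			goto out_err;
	}
	return 0;

out_err:
	/* Free only the entries created before the failing index. */
	while (--i >= 0)
		free(paths[i]);
	return ret;
}

/* Hypothetical usage check: returns 0 when both paths are built correctly. */
static int demo_paths(void)
{
	char *paths[2];

	if (build_paths("/tmp/perf.data", paths, 2) != 0)
		return -1;
	assert(strcmp(paths[1], "/tmp/perf.data/data.1") == 0);
	free(paths[0]);
	free(paths[1]);
	return 0;
}
```

Because every failure site stores into 'ret' before jumping to the error label, the value returned from out_err never depends on a successful result left over from an earlier iteration.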


Re: [PATCH] perf arm64: Fix off-by-one directory paths.

2021-04-19 Thread Arnaldo Carvalho de Melo
On Fri, Apr 16, 2021 at 02:41:13PM -0700, Ian Rogers wrote:
> Relative path include works in the regular build due to -I paths but may
> break in other situations.

Thanks, applied.

- Arnaldo

 
> Signed-off-by: Ian Rogers 
> ---
>  tools/perf/arch/arm64/util/kvm-stat.c | 4 ++--
>  tools/perf/arch/arm64/util/pmu.c  | 4 ++--
>  tools/perf/arch/arm64/util/unwind-libunwind.c | 4 ++--
>  3 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/perf/arch/arm64/util/kvm-stat.c b/tools/perf/arch/arm64/util/kvm-stat.c
> index 50376b9062c1..2303256b7d05 100644
> --- a/tools/perf/arch/arm64/util/kvm-stat.c
> +++ b/tools/perf/arch/arm64/util/kvm-stat.c
> @@ -1,8 +1,8 @@
>  // SPDX-License-Identifier: GPL-2.0
>  #include 
>  #include 
> -#include "../../util/evsel.h"
> -#include "../../util/kvm-stat.h"
> +#include "../../../util/evsel.h"
> +#include "../../../util/kvm-stat.h"
>  #include "arm64_exception_types.h"
>  #include "debug.h"
>  
> diff --git a/tools/perf/arch/arm64/util/pmu.c b/tools/perf/arch/arm64/util/pmu.c
> index d3259d61ca75..2234fbd0a912 100644
> --- a/tools/perf/arch/arm64/util/pmu.c
> +++ b/tools/perf/arch/arm64/util/pmu.c
> @@ -1,7 +1,7 @@
>  // SPDX-License-Identifier: GPL-2.0
>  
> -#include "../../util/cpumap.h"
> -#include "../../util/pmu.h"
> +#include "../../../util/cpumap.h"
> +#include "../../../util/pmu.h"
>  
>  struct pmu_events_map *pmu_events_map__find(void)
>  {
> diff --git a/tools/perf/arch/arm64/util/unwind-libunwind.c b/tools/perf/arch/arm64/util/unwind-libunwind.c
> index 1495a9523a23..5aecf88e3de6 100644
> --- a/tools/perf/arch/arm64/util/unwind-libunwind.c
> +++ b/tools/perf/arch/arm64/util/unwind-libunwind.c
> @@ -4,9 +4,9 @@
>  #ifndef REMOTE_UNWIND_LIBUNWIND
>  #include 
>  #include "perf_regs.h"
> -#include "../../util/unwind.h"
> +#include "../../../util/unwind.h"
>  #endif
> -#include "../../util/debug.h"
> +#include "../../../util/debug.h"
>  
>  int LIBUNWIND__ARCH_REG_ID(int regnum)
>  {
> -- 
> 2.31.1.368.gbe11c130af-goog
> 

-- 

- Arnaldo


Re: [PATCH] perf annotate: improve --stdio mode

2021-04-19 Thread Arnaldo Carvalho de Melo
On Mon, Apr 19, 2021 at 09:39:37AM +0200, Martin Liška wrote:
> @Arnaldo: May I please ping this?

Applied the refreshed version,

- Arnaldo
 
> Thanks,
> Martin
> 
> On 4/8/21 12:08 PM, Martin Liška wrote:
> > On 4/7/21 10:25 PM, Arnaldo Carvalho de Melo wrote:
> >> On Wed, Apr 07, 2021 at 04:30:46PM -0300, Arnaldo Carvalho de Melo wrote:
> >>> On Fri, Feb 26, 2021 at 10:24:00AM +0100, Martin Liška wrote:
> >>>> On 2/23/21 8:47 PM, Arnaldo Carvalho de Melo wrote:
> >>>> Sure. But I think the current format provides quite broken visual layout:
> >>>>
> >>>>   0.00 :   405ef1: inc%r15
> >>>>   0.01 :   405ef4: vfmadd213sd 0x2b9b3(%rip),%xmm0,%xmm3# 
> >>>> 4318b0 <_IO_stdin_used+0x8b0>
> >>>>eff.c:18110.67 :   405efd: vfmadd213sd 0x2b9b2(%rip),%xmm0,%xmm3  
> >>>>   # 4318b8 <_IO_stdin_used+0x8b8>
> >>>>   :TA + tmpsd * (TB +
> >>>>
> >>>> vs.
> >>>>
> >>>>   0.00 :   405ef1: inc%r15
> >>>>   0.01 :   405ef4: vfmadd213sd 0x2b9b3(%rip),%xmm0,%xmm3# 
> >>>> 4318b0 <_IO_stdin_used+0x8b0>
> >>>>   0.67 :   405efd: vfmadd213sd 0x2b9b2(%rip),%xmm0,%xmm3# 
> >>>> 4318b8 <_IO_stdin_used+0x8b8> // eff.c:1811
> >>>>: 1810   TA + tmpsd * (TB +
> >>>>
> >>>> I bet also the current users of --stdio mode would benefit from it.
> >>>> What do you think?
> >>  
> >>> Agreed, I tried applying but it bitrotted, it seems :-\
> >>
> >> I refreshed it, please check.
> > 
> > > Thanks! I've just tested the patch on top of acme/perf/core and it works as
> > > planned. I'm attaching 2 perf annotate snippets
> > > (perf annotate --stdio -l --stdio-color=always) from before and after the
> > > revision:
> > 
> > https://splichal.eu/tmp/perf-before.html
> > https://splichal.eu/tmp/perf-after.html
> > 
> > I hope it nicely describes that it's an improvement.
> > 
> > Cheers,
> > Martin
> > 
> >>
> >> - Arnaldo
> >>
> >> diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
> >> index 18eee25b4976bea8..abe1499a91645375 100644
> >> --- a/tools/perf/util/annotate.c
> >> +++ b/tools/perf/util/annotate.c
> >> @@ -1368,7 +1368,6 @@ annotation_line__print(struct annotation_line *al, struct symbol *sym, u64 start
> >>  {
> >>struct disasm_line *dl = container_of(al, struct disasm_line, al);
> >>static const char *prev_line;
> >> -  static const char *prev_color;
> >>  
> >>if (al->offset != -1) {
> >>double max_percent = 0.0;
> >> @@ -1407,20 +1406,6 @@ annotation_line__print(struct annotation_line *al, struct symbol *sym, u64 start
> >>  
> >>color = get_percent_color(max_percent);
> >>  
> >> -  /*
> >> -   * Also color the filename and line if needed, with
> >> -   * the same color than the percentage. Don't print it
> >> -   * twice for close colored addr with the same filename:line
> >> -   */
> >> -  if (al->path) {
> >> -  if (!prev_line || strcmp(prev_line, al->path)
> >> - || color != prev_color) {
> >> -  color_fprintf(stdout, color, " %s", al->path);
> >> -  prev_line = al->path;
> >> -  prev_color = color;
> >> -  }
> >> -  }
> >> -
> >>for (i = 0; i < nr_percent; i++) {
> >>struct annotation_data *data = &al->data[i];
> >>double percent;
> >> @@ -1441,6 +1426,19 @@ annotation_line__print(struct annotation_line *al, struct symbol *sym, u64 start
> >>printf(" : ");
> >>  
> >>disasm_line__print(dl, start, addr_fmt_width);
> >> +
> >> +  /*
> >> +   * Also color the filename and line if needed, with
> >> +   * the same color than the percentage. Don't print it
> >> +   * twice for close colored addr with the same filename:line
> >> +   */
> >> +  if (al->path) {
> >> +  if (!prev_line || strcmp(prev_line, al->path)) {
> >> +  color_fprintf(stdout, color, " // %s", al->path);
> >> +  prev_line = al->path;
> >> +  }
> >> +  }
> >> +
> >>printf("\n");
> >>} else if (max_lines && printed >= max_lines)
> >>return 1;
> >> @@ -1456,7 +1454,7 @@ annotation_line__print(struct annotation_line *al, struct symbol *sym, u64 start
> >>if (!*al->line)
> >>printf(" %*s:\n", width, " ");
> >>else
> >> -  printf(" %*s: %*s %s\n", width, " ", addr_fmt_width, " ", al->line);
> >> +  printf(" %*s: %-*d %s\n", width, " ", addr_fmt_width, al->line_nr, al->line);
> >>}
> >>  
> >>return 0;
> >>
> > 
> 

-- 

- Arnaldo


Re: [RESEND PATCH v5 0/4] perf stat: Introduce iostat mode to provide I/O performance metrics

2021-04-19 Thread Arnaldo Carvalho de Melo
On Mon, Apr 19, 2021 at 12:41:43PM +0300, alexander.anto...@linux.intel.com wrote:
> From: Alexander Antonov 
> 
> Resending V5 with added Acked-by: Namhyung Kim  tag.

Thanks, applied.

- Arnaldo

 
> Thanks,
> Alexander
> 
> The previous version can be found at:
> v4: 
> https://lkml.kernel.org/r/20210203135830.38568-1-alexander.anto...@linux.intel.com/
> Changes in this revision are:
> v4 -> v5:
> - Addressed comments from Namhyung Kim:
>   1. Removed AGGR_PCIE_PORT aggregation mode
>   2. Added iostat_prepare() function
>   3. Moved implementation specific fprintf() calls to separate x86-related function
>   4. Fixed code-related issues
> - Moved __weak iostat's functions to separate util/iostat.c file
> 
> The previous version can be found at:
> v3: 
> https://lkml.kernel.org/r/20210126080619.30275-1-alexander.anto...@linux.intel.com/
> Changes in this revision are:
> v3 -> v4:
> - Addressed comment from Namhyung Kim:
>   1. Removed NULL-termination of root ports list
> 
> The previous version can be found at:
> v2: 
> https://lkml.kernel.org/r/20201223130320.3930-1-alexander.anto...@linux.intel.com
> 
> Changes in this revision are:
> v2 -> v3:
> - Addressed comments from Namhyung Kim:
>   1. Removed perf_device pointer from evsel structure. Use priv field instead
>   2. Renamed 'iiostat' to 'iostat'
>   3. Renamed 'show' mode to 'list' mode
>   4. Renamed iiostat_delete_root_ports() to iiostat_release() and
>  iostat_show_root_ports() to iostat_list()
> 
> The previous version can be found at:
> v1: 
> https://lkml.kernel.org/r/20201210090340.14358-1-alexander.anto...@linux.intel.com
> 
> Changes in this revision are:
> v1 -> v2:
> - Addressed comment from Arnaldo Carvalho de Melo:
>   1. Using 'perf iiostat' subcommand instead of 'perf stat --iiostat':
> - Added perf-iiostat.sh script to use short command
> - Updated manual pages to get help for 'perf iiostat'
> - Added 'perf-iiostat' to perf's gitignore file
> 
> Mode is intended to provide four I/O performance metrics in MB per each
> root port:
>  - Inbound Read:   I/O devices below root port read from the host memory
>  - Inbound Write:  I/O devices below root port write to the host memory
>  - Outbound Read:  CPU reads from I/O devices below root port
>  - Outbound Write: CPU writes to I/O devices below root port
> 
> Each metric requires only one uncore event which increments at every 4B
> transfer in corresponding direction. The formulas to compute metrics
> are generic:
> #EventCount * 4B / (1024 * 1024)
> 
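
The generic formula quoted above can be sketched as a small C helper. This is only an illustration of the arithmetic in the cover letter, not code from the series; `iostat_count_to_mb()`, `IOSTAT_BYTES_PER_EVENT` and `demo_iostat()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Each uncore event counts one 4-byte transfer (as stated in the cover letter). */
#define IOSTAT_BYTES_PER_EVENT 4ULL

/* Hypothetical helper: convert a raw event count to MB (1024 * 1024 bytes). */
static double iostat_count_to_mb(uint64_t event_count)
{
	return (double)(event_count * IOSTAT_BYTES_PER_EVENT) / (1024.0 * 1024.0);
}

/* Hypothetical usage check: returns 0 when the conversion matches the formula. */
static int demo_iostat(void)
{
	/* 262144 events * 4 bytes = 1048576 bytes, i.e. exactly 1 MB. */
	assert(iostat_count_to_mb(262144) == 1.0);
	return 0;
}
```

The multiplication is done in 64-bit unsigned arithmetic before converting to double, so large counts do not lose low bits earlier than necessary.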
> Note: iostat introduces new perf data aggregation mode - per PCIe root port
> hence -e and -M options are not supported.
> 
> Usage examples:
> 
> 1. List all PCIe root ports (example for 2-S platform):
>$ perf iostat list
>S0-uncore_iio_0<:00>
>S1-uncore_iio_0<:80>
>S0-uncore_iio_1<:17>
>S1-uncore_iio_1<:85>
>S0-uncore_iio_2<:3a>
>S1-uncore_iio_2<:ae>
>S0-uncore_iio_3<:5d>
>S1-uncore_iio_3<:d7>
> 
> 2. Collect metrics for all PCIe root ports:
>$ perf iostat -- dd if=/dev/zero of=/dev/nvme0n1 bs=1M oflag=direct
>357708+0 records in
>357707+0 records out
>375083606016 bytes (375 GB, 349 GiB) copied, 215.974 s, 1.7 GB/s
> 
> Performance counter stats for 'system wide':
> 
>    port    Inbound Read(MB)   Inbound Write(MB)   Outbound Read(MB)   Outbound Write(MB)
>    :00                    1                   0                   2                    3
>    :80                    0                   0                   0                    0
>    :17               352552                  43                   0                   21
>    :85                    0                   0                   0                    0
>    :3a                    3                   0                   0                    0
>    :ae                    0                   0                   0                    0
>    :5d                    0                   0                   0                    0
>    :d7                    0                   0                   0                    0
> 
> 3. Collect metrics for comma separated list of PCIe root ports:
>$ perf iostat :17,0:3a -- dd if=/dev/zero of=/dev/nvme0n1 bs=1M oflag=direct
>357708+0 records in
>357707+0 records out
>375083606016 bytes (375 GB, 349 GiB) copied, 197.08 s, 1.9 GB/s
> 
> Performance counter stats for 'system wide':
> 
>   port   

Re: [PATCH v2] perf vendor events: Initial json/events list for power10 platform

2021-04-19 Thread Arnaldo Carvalho de Melo
On Mon, Apr 19, 2021 at 10:38:46PM +1000, Michael Ellerman wrote:
> Kajol Jain  writes:
> > Patch adds initial json/events for POWER10.
> 
> Acked-by: Michael Ellerman 

Thanks, applied.

- Arnaldo

 
> cheers
> 
> > Signed-off-by: Kajol Jain 
> > Tested-by: Paul A. Clarke 
> > Reviewed-by: Paul A. Clarke 
> > ---
> >  .../perf/pmu-events/arch/powerpc/mapfile.csv  |   1 +
> >  .../arch/powerpc/power10/cache.json   |  47 +++
> >  .../arch/powerpc/power10/floating_point.json  |   7 +
> >  .../arch/powerpc/power10/frontend.json| 217 +
> >  .../arch/powerpc/power10/locks.json   |  12 +
> >  .../arch/powerpc/power10/marked.json  | 147 +
> >  .../arch/powerpc/power10/memory.json  | 192 +++
> >  .../arch/powerpc/power10/others.json  | 297 ++
> >  .../arch/powerpc/power10/pipeline.json| 297 ++
> >  .../pmu-events/arch/powerpc/power10/pmc.json  |  22 ++
> >  .../arch/powerpc/power10/translation.json |  57 
> >  11 files changed, 1296 insertions(+)
> >  create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/cache.json
> >  create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/floating_point.json
> >  create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/frontend.json
> >  create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/locks.json
> >  create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/marked.json
> >  create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/memory.json
> >  create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/others.json
> >  create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/pipeline.json
> >  create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/pmc.json
> >  create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/translation.json
> >
> > ---
> > Changelog:
> > v1 -> v2
> > - Removed inconsistencies in "BriefDescription" field and make sure
> >   it will end with period without any space at the end.
> >   Suggested by : Paul A. Clarke  
> > - Added Tested-by and Reviewed-by tag.
> > ---
> > diff --git a/tools/perf/pmu-events/arch/powerpc/mapfile.csv b/tools/perf/pmu-events/arch/powerpc/mapfile.csv
> > index 229150e7ab7d..4abdfc3f9692 100644
> > --- a/tools/perf/pmu-events/arch/powerpc/mapfile.csv
> > +++ b/tools/perf/pmu-events/arch/powerpc/mapfile.csv
> > @@ -15,3 +15,4 @@
> >  # Power8 entries
> >  004[bcd][[:xdigit:]]{4},1,power8,core
> >  004e[[:xdigit:]]{4},1,power9,core
> > +0080[[:xdigit:]]{4},1,power10,core
> > diff --git a/tools/perf/pmu-events/arch/powerpc/power10/cache.json b/tools/perf/pmu-events/arch/powerpc/power10/cache.json
> > new file mode 100644
> > index ..95e33531fbc6
> > --- /dev/null
> > +++ b/tools/perf/pmu-events/arch/powerpc/power10/cache.json
> > @@ -0,0 +1,47 @@
> > +[
> > +  {
> > +"EventCode": "1003C",
> > +"EventName": "PM_EXEC_STALL_DMISS_L2L3",
> > +"BriefDescription": "Cycles in which the oldest instruction in the pipeline was waiting for a load miss to resolve from either the local L2 or local L3."
> > +  },
> > +  {
> > +"EventCode": "34056",
> > +"EventName": "PM_EXEC_STALL_LOAD_FINISH",
> > +"BriefDescription": "Cycles in which the oldest instruction in the pipeline was finishing a load after its data was reloaded from a data source beyond the local L1; cycles in which the LSU was processing an L1-hit; cycles in which the NTF instruction merged with another load in the LMQ."
> > +  },
> > +  {
> > +"EventCode": "3006C",
> > +"EventName": "PM_RUN_CYC_SMT2_MODE",
> > +"BriefDescription": "Cycles when this thread's run latch is set and the core is in SMT2 mode."
> > +  },
> > +  {
> > +"EventCode": "300F4",
> > +"EventName": "PM_RUN_INST_CMPL_CONC",
> > +"BriefDescription": "PowerPC instructions completed by this thread when all threads in the core had the run-latch set."
> > +  },
> > +  {
> > +"EventCode": "4C016",
> > +"EventName": "PM_EXEC_STALL_DMISS_L2L3_CONFLICT",
> > +"BriefDescription": "Cycles in which the oldest instruction in the pipeline was waiting for a load miss to resolve from the local L2 or local L3, with a dispatch conflict."
> > +  },
> > +  {
> > +"EventCode": "4D014",
> > +"EventName": "PM_EXEC_STALL_LOAD",
> > +"BriefDescription": "Cycles in which the oldest instruction in the pipeline was a load instruction executing in the Load Store Unit."
> > +  },
> > +  {
> > +"EventCode": "4D016",
> > +"EventName": "PM_EXEC_STALL_PTESYNC",
> > +"BriefDescription": "Cycles in which the oldest instruction in the pipeline was a PTESYNC instruction executing in the Load Store Unit."
> > +  },
> > +  {
> > +"EventCode": "401EA",
> > +"EventName": "PM_THRESH_EXC_128",
> > +"BriefDescription": "Threshold counter exceeded a value of 128."
> > +  },
> > +  {
> > 

Re: [PATCH v2 0/2] perf cs-etm: Set time on synthesised samples to preserve ordering

2021-04-16 Thread Arnaldo Carvalho de Melo
On Fri, Apr 16, 2021 at 09:07:09AM -0600, Mathieu Poirier wrote:
> Hi James,
> 
> On Fri, Apr 16, 2021 at 01:56:30PM +0300, James Clark wrote:
> > Changes since v1:
> >  * Improved variable name from etm_timestamp -> cs_timestamp
> >  * Fixed ordering of Signed-off-by
> > 
> 
> You forgot to add the RB and AB you received.  Since Arnaldo is responsible for
> the perf tools subsystem, please send another revision.
 


Yep, please collect Reviewed-by and Acked-by tags as you go sending new
versions of a patchset. For the last version I don't have a problem collecting
them myself, but if you have to resend, please collect the feedback tags.

- Arnaldo

> Thanks,
> Mathieu
> 
> > James Clark (2):
> >   perf cs-etm: Refactor timestamp variable names
> >   perf cs-etm: Set time on synthesised samples to preserve ordering
> > 
> >  .../perf/util/cs-etm-decoder/cs-etm-decoder.c | 18 +++
> >  tools/perf/util/cs-etm.c  | 52 ++-
> >  tools/perf/util/cs-etm.h  |  4 +-
> >  3 files changed, 39 insertions(+), 35 deletions(-)
> > 
> > -- 
> > 2.28.0
> > 

-- 

- Arnaldo


Re: [QUESTION] Will the pahole tar source code with corresponding libbpf submodule codes be released as well in the future?

2021-04-15 Thread Arnaldo Carvalho de Melo
On Thu, Apr 15, 2021 at 12:01:23PM +0800, Tiezhu Yang wrote:
> (1) tools/bpf/bpftool build failed due to the following reason:
> 
> Error: failed to load BTF from /boot/vmlinux-5.12.0-rc2: No such
> file or directory
> make: *** [Makefile:158: vmlinux.h] Error 2
> 
> (2) With CONFIG_DEBUG_INFO_BTF=y set, generating BTF for vmlinux failed
> because pahole is not available
> 
> BTF: .tmp_vmlinux.btf: pahole (pahole) is not available
> Failed to generate BTF for vmlinux
> Try to disable CONFIG_DEBUG_INFO_BTF
> make: *** [Makefile:1197: vmlinux] Error 1
> 
> (3) When building pahole from the tar.gz source code, it still failed
> due to the missing libbpf submodule.


You're getting the tarball from the wrong place, you should get it from:

https://fedorapeople.org/~acme/dwarves/dwarves-1.21.tar.xz

Please read the announcement:

https://lore.kernel.org/bpf/yhrixnx1juf2a...@kernel.org/

- Arnaldo

 
> loongson@linux:~$ wget https://git.kernel.org/pub/scm/devel/pahole/pahole.git/snapshot/pahole-1.21.tar.gz
> loongson@linux:~$ tar xf pahole-1.21.tar.gz
> loongson@linux:~$ cd pahole-1.21
> loongson@linux:~/pahole-1.21$ mkdir build
> loongson@linux:~/pahole-1.21$ cd build/
> loongson@linux:~/pahole-1.21/build$ cmake -D__LIB=lib ..
> -- The C compiler identification is GNU 10.2.1
> -- Detecting C compiler ABI info
> -- Detecting C compiler ABI info - done
> -- Check for working C compiler: /usr/bin/cc - skipped
> -- Detecting C compile features
> -- Detecting C compile features - done
> -- Checking availability of DWARF and ELF development libraries
> -- Looking for dwfl_module_build_id in elf
> -- Looking for dwfl_module_build_id in elf - found
> -- Found dwarf.h header: /usr/include
> -- Found elfutils/libdw.h header: /usr/include
> -- Found libdw library: /usr/lib/mips64el-linux-gnuabi64/libdw.so
> -- Found libelf library: /usr/lib/mips64el-linux-gnuabi64/libelf.so
> -- Checking availability of DWARF and ELF development libraries - done
> -- Found ZLIB: /usr/lib/mips64el-linux-gnuabi64/libz.so (found version "1.2.11")
> CMake Error at CMakeLists.txt:60 (message):
>   The submodules were not downloaded! GIT_SUBMODULE was turned off or failed.
>   Please update submodules and try again.
> 
> -- Configuring incomplete, errors occurred!
> See also "/home/loongson/pahole-1.21/build/CMakeFiles/CMakeOutput.log".
> 
> (4) I notice that the pahole git source code builds successfully because
> it clones libbpf automatically:
> 
> -- Submodule update
> Submodule 'lib/bpf' (https://github.com/libbpf/libbpf) registered for path 'lib/bpf'
> Cloning into '/home/loongson/pahole/lib/bpf'...
> Submodule path 'lib/bpf': checked out '986962fade5dfa89c2890f3854eb040d2a64ab38'
> -- Submodule update - done
> 
> (5) So will a pahole source tarball that includes the corresponding libbpf
> submodule code be released as well in the future? Just like bcc:
> https://github.com/iovisor/bcc/releases
> https://github.com/iovisor/bcc/commit/708f786e3784dc32570a079f2ed74c35731664ea
> 
> Thanks,
> Tiezhu



Re: [PATCH v8 2/4] libperf: Add evsel mmap support

2021-04-15 Thread Arnaldo Carvalho de Melo
On Thu, Apr 15, 2021 at 03:09:28PM -0500, Rob Herring wrote:
> On Thu, Apr 15, 2021 at 2:37 PM Arnaldo Carvalho de Melo wrote:
> > Ok, b4 failed on it, probably some missing Reply to, so I'll apply it by
> > hand:
> 
> That's my fault. A duplicate message-id is the issue. git-send-email
> died after patch 1/4 (can't say I've ever had that happen). So in my
> attempt to manually resend 2-4, I was off by 1 in the message-id and
> duplicated patch 1's message-id. I should have just resent the whole
> thing.

No problem, it is already in, just letting you know to fix your scripts
:-)

- Arnaldo


Re: [PATCH] libperf: xyarray: Add bounds checks to xyarray__entry()

2021-04-15 Thread Arnaldo Carvalho de Melo
On Thu, Apr 15, 2021 at 04:48:34PM -0300, Arnaldo Carvalho de Melo wrote:
> On Thu, Apr 15, 2021 at 04:46:46PM -0300, Arnaldo Carvalho de Melo wrote:
> > On Wed, Apr 14, 2021 at 03:53:36PM -0500, Rob Herring wrote:
> > > On Wed, Apr 14, 2021 at 3:25 PM Namhyung Kim  wrote:
> > > > > +static inline void *xyarray__entry(struct xyarray *xy, int x, int y)
> > > > > +{
> > > > > +   if (x >= xy->max_x || y >= xy->max_y)
> > > > > +   return NULL;
> > > >
> > > > Maybe better to check negatives as well.
> > > 
> > > max_x and max_y are size_t and unsigned, so x and y will be promoted
> > > to unsigned and the check will still work.
> > 
> > Fair enough, applied.
> 
> So...:
> 
>   CC   /tmp/build/perf/builtin-script.o
> In file included from xyarray.c:2:
> /home/acme/git/perf/tools/lib/perf/include/internal/xyarray.h: In function ‘xyarray__entry’:
> /home/acme/git/perf/tools/lib/perf/include/internal/xyarray.h:28:8: error: comparison of integer expressions of different signedness: ‘int’ and ‘size_t’ {aka ‘long unsigned int’} [-Werror=sign-compare]
>    28 |  if (x >= xy->max_x || y >= xy->max_y)
>       |        ^~
> /home/acme/git/perf/tools/lib/perf/include/internal/xyarray.h:28:26: error: comparison of integer expressions of different signedness: ‘int’ and ‘size_t’ {aka ‘long unsigned int’} [-Werror=sign-compare]
>    28 |  if (x >= xy->max_x || y >= xy->max_y)
>       |                          ^~
> cc1: all warnings being treated as errors
> 
> 
> Fedora 33's gcc complains, so I'll cast it to size_t.

> > > It's probably better to change the args to size_t though. And perhaps
> > > on xyarray__new(), xyarray__max_y(), and xyarray__max_x() as well.

So I did this, should be enough:

diff --git a/tools/lib/perf/include/internal/xyarray.h b/tools/lib/perf/include/internal/xyarray.h
index f0896c00b4940016..f10af3da7b21cc15 100644
--- a/tools/lib/perf/include/internal/xyarray.h
+++ b/tools/lib/perf/include/internal/xyarray.h
@@ -23,7 +23,7 @@ static inline void *__xyarray__entry(struct xyarray *xy, int x, int y)
 	return &xy->contents[x * xy->row_size + y * xy->entry_size];
 }
 
-static inline void *xyarray__entry(struct xyarray *xy, int x, int y)
+static inline void *xyarray__entry(struct xyarray *xy, size_t x, size_t y)
 {
if (x >= xy->max_x || y >= xy->max_y)
return NULL;
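
To make the signedness discussion concrete, here is a small standalone sketch. It assumes a simplified stand-in structure rather than the real struct xyarray, and the `in_bounds_*` and `demo_bounds` names are hypothetical. It shows both points made in the thread: a negative int index is rejected once it is converted to size_t, and taking size_t arguments leaves no mixed-sign comparison for -Wsign-compare to flag.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct xyarray; only the bounds are modelled. */
struct bounds {
	size_t max_x;
	size_t max_y;
};

/* int arguments: the explicit casts turn a negative index into a huge size_t. */
static int in_bounds_int(const struct bounds *b, int x, int y)
{
	return !((size_t)x >= b->max_x || (size_t)y >= b->max_y);
}

/* size_t arguments: both sides of each comparison are already unsigned. */
static int in_bounds_size(const struct bounds *b, size_t x, size_t y)
{
	return !(x >= b->max_x || y >= b->max_y);
}

/* Hypothetical usage check: returns 0 when all cases behave as described. */
static int demo_bounds(void)
{
	struct bounds b = { 4, 2 };

	assert(in_bounds_int(&b, 3, 1) == 1);   /* last valid entry */
	assert(in_bounds_int(&b, -1, 0) == 0);  /* negative x is rejected */
	assert(in_bounds_int(&b, 4, 0) == 0);   /* x == max_x is out of range */
	assert(in_bounds_size(&b, 0, 2) == 0);  /* y == max_y is out of range */
	return 0;
}
```

A caller that passes -1 to the size_t variant gets the same result through the implicit argument conversion, which is why a single upper-bound check is enough in either form.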


Re: [PATCH] libperf: xyarray: Add bounds checks to xyarray__entry()

2021-04-15 Thread Arnaldo Carvalho de Melo
On Thu, Apr 15, 2021 at 04:46:46PM -0300, Arnaldo Carvalho de Melo wrote:
> On Wed, Apr 14, 2021 at 03:53:36PM -0500, Rob Herring wrote:
> > On Wed, Apr 14, 2021 at 3:25 PM Namhyung Kim  wrote:
> > >
> > > On Thu, Apr 15, 2021 at 4:58 AM Rob Herring  wrote:
> > > >
> > > > xyarray__entry() is missing any bounds checking yet often the x and y
> > > > parameters come from external callers. Add bounds checks and an
> > > > unchecked __xyarray__entry().
> > > >
> > > > Cc: Peter Zijlstra 
> > > > Cc: Ingo Molnar 
> > > > Cc: Arnaldo Carvalho de Melo 
> > > > Cc: Mark Rutland 
> > > > Cc: Alexander Shishkin 
> > > > Cc: Jiri Olsa 
> > > > Cc: Namhyung Kim 
> > > > Signed-off-by: Rob Herring 
> > > > ---
> > > >  tools/lib/perf/include/internal/xyarray.h | 9 -
> > > >  1 file changed, 8 insertions(+), 1 deletion(-)
> > > >
> > > > > diff --git a/tools/lib/perf/include/internal/xyarray.h b/tools/lib/perf/include/internal/xyarray.h
> > > > index 51e35d6c8ec4..f0896c00b494 100644
> > > > --- a/tools/lib/perf/include/internal/xyarray.h
> > > > +++ b/tools/lib/perf/include/internal/xyarray.h
> > > > @@ -18,11 +18,18 @@ struct xyarray *xyarray__new(int xlen, int ylen, 
> > > > size_t entry_size);
> > > >  void xyarray__delete(struct xyarray *xy);
> > > >  void xyarray__reset(struct xyarray *xy);
> > > >
> > > > -static inline void *xyarray__entry(struct xyarray *xy, int x, int y)
> > > > +static inline void *__xyarray__entry(struct xyarray *xy, int x, int y)
> > > >  {
> > > > > return &xy->contents[x * xy->row_size + y * xy->entry_size];
> > > >  }
> > > >
> > > > +static inline void *xyarray__entry(struct xyarray *xy, int x, int y)
> > > > +{
> > > > +   if (x >= xy->max_x || y >= xy->max_y)
> > > > +   return NULL;
> > >
> > > Maybe better to check negatives as well.
> > 
> > max_x and max_y are size_t and unsigned, so x and y will be promoted
> > to unsigned and the check will still work.
> 
> Fair enough, applied.

So...:

  CC   /tmp/build/perf/builtin-script.o
In file included from xyarray.c:2:
/home/acme/git/perf/tools/lib/perf/include/internal/xyarray.h: In function ‘xyarray__entry’:
/home/acme/git/perf/tools/lib/perf/include/internal/xyarray.h:28:8: error: comparison of integer expressions of different signedness: ‘int’ and ‘size_t’ {aka ‘long unsigned int’} [-Werror=sign-compare]
   28 |  if (x >= xy->max_x || y >= xy->max_y)
      |        ^~
/home/acme/git/perf/tools/lib/perf/include/internal/xyarray.h:28:26: error: comparison of integer expressions of different signedness: ‘int’ and ‘size_t’ {aka ‘long unsigned int’} [-Werror=sign-compare]
   28 |  if (x >= xy->max_x || y >= xy->max_y)
      |                          ^~
cc1: all warnings being treated as errors


Fedora 33's gcc complains, so I'll cast it to size_t.

- Arnaldo
 
>  
> > It's probably better to change the args to size_t though. And perhaps
> > on xyarray__new(), xyarray__max_y(), and xyarray__max_x() as well.

-- 

- Arnaldo


Re: [PATCH] libperf: xyarray: Add bounds checks to xyarray__entry()

2021-04-15 Thread Arnaldo Carvalho de Melo
On Wed, Apr 14, 2021 at 03:53:36PM -0500, Rob Herring wrote:
> On Wed, Apr 14, 2021 at 3:25 PM Namhyung Kim  wrote:
> >
> > On Thu, Apr 15, 2021 at 4:58 AM Rob Herring  wrote:
> > >
> > > xyarray__entry() is missing any bounds checking yet often the x and y
> > > parameters come from external callers. Add bounds checks and an
> > > unchecked __xyarray__entry().
> > >
> > > Cc: Peter Zijlstra 
> > > Cc: Ingo Molnar 
> > > Cc: Arnaldo Carvalho de Melo 
> > > Cc: Mark Rutland 
> > > Cc: Alexander Shishkin 
> > > Cc: Jiri Olsa 
> > > Cc: Namhyung Kim 
> > > Signed-off-by: Rob Herring 
> > > ---
> > >  tools/lib/perf/include/internal/xyarray.h | 9 -
> > >  1 file changed, 8 insertions(+), 1 deletion(-)
> > >
> > > > diff --git a/tools/lib/perf/include/internal/xyarray.h b/tools/lib/perf/include/internal/xyarray.h
> > > index 51e35d6c8ec4..f0896c00b494 100644
> > > --- a/tools/lib/perf/include/internal/xyarray.h
> > > +++ b/tools/lib/perf/include/internal/xyarray.h
> > > @@ -18,11 +18,18 @@ struct xyarray *xyarray__new(int xlen, int ylen, 
> > > size_t entry_size);
> > >  void xyarray__delete(struct xyarray *xy);
> > >  void xyarray__reset(struct xyarray *xy);
> > >
> > > -static inline void *xyarray__entry(struct xyarray *xy, int x, int y)
> > > +static inline void *__xyarray__entry(struct xyarray *xy, int x, int y)
> > >  {
> > > > return &xy->contents[x * xy->row_size + y * xy->entry_size];
> > >  }
> > >
> > > +static inline void *xyarray__entry(struct xyarray *xy, int x, int y)
> > > +{
> > > +   if (x >= xy->max_x || y >= xy->max_y)
> > > +   return NULL;
> >
> > Maybe better to check negatives as well.
> 
> max_x and max_y are size_t and unsigned, so x and y will be promoted
> to unsigned and the check will still work.

Fair enough, applied.

- Arnaldo
 
> It's probably better to change the args to size_t though. And perhaps
> on xyarray__new(), xyarray__max_y(), and xyarray__max_x() as well.


Re: [PATCH v8 2/4] libperf: Add evsel mmap support

2021-04-15 Thread Arnaldo Carvalho de Melo
Em Thu, Apr 15, 2021 at 04:14:31AM +0900, Namhyung Kim escreveu:
> On Thu, Apr 15, 2021 at 3:23 AM Arnaldo Carvalho de Melo
>  wrote:
> >
> > Em Wed, Apr 14, 2021 at 03:02:08PM -0300, Arnaldo Carvalho de Melo escreveu:
> > > Em Thu, Apr 15, 2021 at 01:41:35AM +0900, Namhyung Kim escreveu:
> > > > Hello,
> > > >
> > > > On Thu, Apr 15, 2021 at 1:07 AM Rob Herring  wrote:
> > > > > +void *perf_evsel__mmap_base(struct perf_evsel *evsel, int cpu, int 
> > > > > thread)
> > > > > +{
> > > > > +   if (FD(evsel, cpu, thread) < 0 || MMAP(evsel, cpu, thread) == 
> > > > > NULL)
> > > > > +   return NULL;
> > > >
> > > > I think you should check the cpu and the thread is in
> > > > a valid range.  Currently xyarray__entry() simply accesses
> > > > the content without checking the boundaries.
> > >
> > > So, since xyarray has the bounds, it should check it, i.e. we need to
> > > have a __xyarray__entry() that is what xyarray__entry() does, i.e.
> > > assume the values have been bounds checked, then a new method,
> > > xyarray__entry() that does bounds check, if it fails, return NULL,
> > > otherwise calls __xyarray__entry().
> > >
> > > I see this is frustrating and I should've chimed in earlier, but at
> > > least now this is getting traction, and the end result will be better
> > > > not just for the feature you've been diligently working on,
> > >
> > > Thank you for your persistence,
> >
> > Re-reading, yeah, this can be done in a separate patch, Namhyung, can I
> > have your Reviewed-by? That or an Acked-by?
> 
> Sure, for the series:
> 
> Acked-by: Namhyung Kim 

Ok, b4 failed on it, probably some missing Reply to, so I'll apply it by
hand:

[acme@five perf]$ b4 am -t -s -l --cc-trailers 
20210414155412.3697605-1-r...@kernel.org
Looking up https://lore.kernel.org/r/20210414155412.3697605-1-robh%40kernel.org
Grabbing thread from lore.kernel.org/lkml
Analyzing 11 messages in the thread
---
Thread incomplete, attempting to backfill
---
Writing ./v8_20210414_robh_libperf_userspace_counter_access.mbx
  [PATCH v8 1/4] tools/include: Add an initial math64.h
+ Acked-by: Namhyung Kim 
+ Acked-by: Jiri Olsa  (✓ DKIM/redhat.com)
+ Signed-off-by: Arnaldo Carvalho de Melo 
+ Link: https://lore.kernel.org/r/20210414155412.3697605-2-r...@kernel.org
+ Cc: Catalin Marinas 
    + Cc: Mark Rutland 
+ Cc: Itaru Kitayama 
+ Cc: Arnaldo Carvalho de Melo 
+ Cc: Will Deacon 
+ Cc: Ingo Molnar 
+ Cc: linux-kernel@vger.kernel.org
  ERROR: missing [2/4]!
  [PATCH v8 3/4] libperf: tests: Add support for verbose printing
+ Acked-by: Jiri Olsa  (✓ DKIM/redhat.com)
+ Signed-off-by: Arnaldo Carvalho de Melo 
+ Link: https://lore.kernel.org/r/20210414155412.3697605-3-r...@kernel.org
+ Cc: Catalin Marinas 
+ Cc: Mark Rutland 
+ Cc: Itaru Kitayama 
+ Cc: Peter Zijlstra 
+ Cc: Arnaldo Carvalho de Melo 
+ Cc: Namhyung Kim 
+ Cc: Will Deacon 
+ Cc: Ingo Molnar 
+ Cc: linux-kernel@vger.kernel.org
  [PATCH v8 4/4] libperf: Add support for user space counter access
+ Acked-by: Jiri Olsa  (✓ DKIM/redhat.com)
+ Signed-off-by: Arnaldo Carvalho de Melo 
+ Link: https://lore.kernel.org/r/20210414155412.3697605-4-r...@kernel.org
+ Cc: Catalin Marinas 
+ Cc: Mark Rutland 
+ Cc: Itaru Kitayama 
+ Cc: Peter Zijlstra 
+ Cc: Arnaldo Carvalho de Melo 
+ Cc: Namhyung Kim 
+ Cc: Will Deacon 
+ Cc: Ingo Molnar 
+ Cc: linux-kernel@vger.kernel.org
---
Total patches: 3
---
WARNING: Thread incomplete!
Cover: ./v8_20210414_robh_libperf_userspace_counter_access.cover
 Link: https://lore.kernel.org/r/20210414155412.3697605-1-r...@kernel.org
 Base: not found
   git am ./v8_20210414_robh_libperf_userspace_counter_access.mbx
[acme@five perf]$


Re: [PATCH 1/1] perf map: Fix error return code in maps__clone()

2021-04-15 Thread Arnaldo Carvalho de Melo
Em Thu, Apr 15, 2021 at 05:27:44PM +0800, Zhen Lei escreveu:
> Although 'err' is initialized to -ENOMEM, it is reassigned by the
> "err = unwind__prepare_access(...)" statement in the for loop, so the
> value of 'err' is unknown when map__clone() fails.
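
The failure mode described above can be sketched in isolation. This is a
simplified illustration, not the perf code: clone_ok() and prepare() are
hypothetical stand-ins for map__clone() and unwind__prepare_access(), and
the loop runs over plain indexes instead of a maps list.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for map__clone(): fails only at index fail_at. */
static int clone_ok(size_t i, size_t fail_at)
{
	return i != fail_at;
}

/* Hypothetical stand-in for unwind__prepare_access(): always succeeds. */
static int prepare(void)
{
	return 0;
}

/* Buggy shape: err starts as -ENOMEM but is overwritten inside the loop. */
static int clone_all_buggy(size_t n, size_t fail_at)
{
	int err = -ENOMEM;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!clone_ok(i, fail_at))
			goto out;	/* err may hold 0 from an earlier iteration */
		err = prepare();
		if (err)
			goto out;
	}
	err = 0;
out:
	return err;
}

/* Fixed shape: the error code is set at the point of failure. */
static int clone_all_fixed(size_t n, size_t fail_at)
{
	int err;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!clone_ok(i, fail_at)) {
			err = -ENOMEM;
			goto out;
		}
		err = prepare();
		if (err)
			goto out;
	}
	err = 0;
out:
	return err;
}
```

In this sketch a clone failure on the second entry makes the buggy variant
return 0, i.e. it reports success, while the fixed variant returns -ENOMEM.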

You forgot to research and add this:

Fixes: 6c502584438bda63 ("perf unwind: Call unwind__prepare_access for forked 
thread")

So that the sta...@kernel.org guys can pick this up automagically and
apply this fix to the stable kernels.

I've added it.

Thanks, applied.

- Arnaldo
 
> Reported-by: Hulk Robot 
> Signed-off-by: Zhen Lei 
> ---
>  tools/perf/util/map.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
> index fbc40a2c17d4dca..8af693d9678cefe 100644
> --- a/tools/perf/util/map.c
> +++ b/tools/perf/util/map.c
> @@ -840,15 +840,18 @@ int maps__fixup_overlappings(struct maps *maps, struct 
> map *map, FILE *fp)
>  int maps__clone(struct thread *thread, struct maps *parent)
>  {
>   struct maps *maps = thread->maps;
> - int err = -ENOMEM;
> + int err;
>   struct map *map;
>  
>   down_read(&parent->lock);
>  
>   maps__for_each_entry(parent, map) {
>   struct map *new = map__clone(map);
> - if (new == NULL)
> +
> + if (new == NULL) {
> + err = -ENOMEM;
>   goto out_unlock;
> + }
>  
>   err = unwind__prepare_access(maps, new, NULL);
>   if (err)
> -- 
> 2.26.0.106.g9fadedd
> 
> 

-- 

- Arnaldo


Re: [PATCH v2] perf beauty: Fix fsconfig generator

2021-04-14 Thread Arnaldo Carvalho de Melo
Em Wed, Apr 14, 2021 at 04:08:12PM -0300, Arnaldo Carvalho de Melo escreveu:
> [root@6db6d5ad9661 perf]# tools/perf/trace/beauty/fsconfig.sh
> static const char *fsconfig_cmds[] = {
>   [0] = "SET_FLAG",
>   [1] = "SET_STRING",
>   [2] = "SET_BINARY",
>   [3] = "SET_PATH",
>   [4] = "SET_PATH_EMPTY",
>   [5] = "SET_FD",
>   [6] = "CMD_CREATE",
>   [7] = "CMD_RECONFIGURE",
> };
> [root@6db6d5ad9661 perf]#
> 
> So I guess we can sweep thru tools/perf/trace/beauty/*.sh and simplify
> things in other table generators?
> 
> Please consider this.
> 
> Thanks, applied.

It's in my tmp.perf/core branch, will go to the main one after what is in
there passes my longish regression test suite,

- Arnaldo


Re: [PATCH v2] perf beauty: Fix fsconfig generator

2021-04-14 Thread Arnaldo Carvalho de Melo
Em Wed, Apr 14, 2021 at 09:27:23PM +0300, Vitaly Chikunov escreveu:
> After gnulib update sed stopped matching `[[:space:]]*+' as before,
> causing the following compilation error:
> 
>   In file included from builtin-trace.c:719:
>   trace/beauty/generated/fsconfig_arrays.c:2:3: error: expected expression 
> before ']' token
>   2 |  [] = "",
>   |   ^
>   trace/beauty/generated/fsconfig_arrays.c:2:3: error: array index in 
> initializer not of integer type
>   trace/beauty/generated/fsconfig_arrays.c:2:3: note: (near initialization 
> for 'fsconfig_cmds')
> 
> Fix this by correcting the regular expression used in the generator.
> Also, clean up the script by removing redundant egrep, xargs, and printf
> invocations.
> 
> Fixes: d35293004a5e4 ("perf beauty: Add generator for fsconfig's 'cmd' arg 
> values")
> Co-authored-by: Dmitry V. Levin 
> Signed-off-by: Vitaly Chikunov 
> ---
>  tools/perf/trace/beauty/fsconfig.sh | 7 +++
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/perf/trace/beauty/fsconfig.sh 
> b/tools/perf/trace/beauty/fsconfig.sh
> index 83fb24df05c9f..bc6ef7bb7a5f9 100755
> --- a/tools/perf/trace/beauty/fsconfig.sh
> +++ b/tools/perf/trace/beauty/fsconfig.sh
> @@ -10,8 +10,7 @@ fi
>  linux_mount=${linux_header_dir}/mount.h
>  
>  printf "static const char *fsconfig_cmds[] = {\n"
> -regex='^[[:space:]]*+FSCONFIG_([[:alnum:]_]+)[[:space:]]*=[[:space:]]*([[:digit:]]+)[[:space:]]*,[[:space:]]*.*'
> -egrep $regex ${linux_mount} | \
> - sed -r "s/$regex/\2 \1/g"   | \
> - xargs printf "\t[%s] = \"%s\",\n"
> +ms='[[:space:]]*'
> +sed -nr 
> "s/^${ms}FSCONFIG_([[:alnum:]_]+)${ms}=${ms}([[:digit:]]+)${ms},.*/\t[\2] = 
> \"\1\",/p" \
> + ${linux_mount}
>  printf "};\n"

It continues working:

[acme@five perf]$ tools/perf/trace/beauty/fsconfig.sh
static const char *fsconfig_cmds[] = {
[0] = "SET_FLAG",
[1] = "SET_STRING",
[2] = "SET_BINARY",
[3] = "SET_PATH",
[4] = "SET_PATH_EMPTY",
[5] = "SET_FD",
[6] = "CMD_CREATE",
[7] = "CMD_RECONFIGURE",
};
[acme@five perf]$

Cool, this is on f33, lemme see on some other distro:

perfbuilder@fd2d918f35e1:/git/perf$ sed --version | head -1
sed (GNU sed) 4.2.2
perfbuilder@fd2d918f35e1:/git/perf$ cat tools/perf/trace/beauty/fsconfig.sh 
#!/bin/sh
# SPDX-License-Identifier: LGPL-2.1

if [ $# -ne 1 ] ; then
linux_header_dir=tools/include/uapi/linux
else
linux_header_dir=$1
fi

linux_mount=${linux_header_dir}/mount.h

printf "static const char *fsconfig_cmds[] = {\n"
ms='[[:space:]]*'
sed -nr 
"s/^${ms}FSCONFIG_([[:alnum:]_]+)${ms}=${ms}([[:digit:]]+)${ms},.*/\t[\2] = 
\"\1\",/p" \
${linux_mount}
printf "};\n"
perfbuilder@fd2d918f35e1:/git/perf$ tools/perf/trace/beauty/fsconfig.sh 
static const char *fsconfig_cmds[] = {
[0] = "SET_FLAG",
[1] = "SET_STRING",
[2] = "SET_BINARY",
[3] = "SET_PATH",
[4] = "SET_PATH_EMPTY",
[5] = "SET_FD",
[6] = "CMD_CREATE",
[7] = "CMD_RECONFIGURE",
};
perfbuilder@fd2d918f35e1:/git/perf$


[perfbuilder@five sisyphus]$ dsh alt:sisyphus
sh-4.4# bash
[root@6db6d5ad9661 /]# cat /etc/redhat-release
ALT Sisyphus Sisyphus (unstable) (sisyphus)
[root@6db6d5ad9661 /]# cd /git
[root@6db6d5ad9661 git]# cd perf
[root@6db6d5ad9661 perf]# cat tools/perf/trace/beauty/fsconfig.sh
#!/bin/sh
# SPDX-License-Identifier: LGPL-2.1

if [ $# -ne 1 ] ; then
linux_header_dir=tools/include/uapi/linux
else
linux_header_dir=$1
fi

linux_mount=${linux_header_dir}/mount.h

printf "static const char *fsconfig_cmds[] = {\n"
ms='[[:space:]]*'
sed -nr 
"s/^${ms}FSCONFIG_([[:alnum:]_]+)${ms}=${ms}([[:digit:]]+)${ms},.*/\t[\2] = 
\"\1\",/p" \
${linux_mount}
printf "};\n"
[root@6db6d5ad9661 perf]# tools/perf/trace/beauty/fsconfig.sh
static const char *fsconfig_cmds[] = {
[0] = "SET_FLAG",
[1] = "SET_STRING",
[2] = "SET_BINARY",
[3] = "SET_PATH",
[4] = "SET_PATH_EMPTY",
[5] = "SET_FD",
[6] = "CMD_CREATE",
[7] = "CMD_RECONFIGURE",
};
[root@6db6d5ad9661 perf]#

So I guess we can sweep thru tools/perf/trace/beauty/*.sh and simplify
things in other table generators?

Please consider this.

Thanks, applied.

- Arnaldo


Re: [PATCH v8 2/4] libperf: Add evsel mmap support

2021-04-14 Thread Arnaldo Carvalho de Melo
Em Wed, Apr 14, 2021 at 03:02:08PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Thu, Apr 15, 2021 at 01:41:35AM +0900, Namhyung Kim escreveu:
> > Hello,
> > 
> > On Thu, Apr 15, 2021 at 1:07 AM Rob Herring  wrote:
> > > +void *perf_evsel__mmap_base(struct perf_evsel *evsel, int cpu, int 
> > > thread)
> > > +{
> > > +   if (FD(evsel, cpu, thread) < 0 || MMAP(evsel, cpu, thread) == 
> > > NULL)
> > > +   return NULL;
> > 
> > I think you should check the cpu and the thread is in
> > a valid range.  Currently xyarray__entry() simply accesses
> > the content without checking the boundaries.
> 
> So, since xyarray has the bounds, it should check it, i.e. we need to
> have a __xyarray__entry() that is what xyarray__entry() does, i.e.
> assume the values have been bounds checked, then a new method,
> xyarray__entry() that does bounds check, if it fails, return NULL,
> otherwise calls __xyarray__entry().
> 
> I see this is frustrating and I should've chimed in earlier, but at
> least now this is getting traction, and the end result will be better
> not just for the feature you've been diligently working on,
> 
> Thank you for your persistence,

Re-reading, yeah, this can be done in a separate patch, Namhyung, can I
have your Reviewed-by? That or an Acked-by?

- Arnaldo
 
> - Arnaldo
>  
> > Thanks,
> > Namhyung
> > 
> > 
> > > +
> > > +   return MMAP(evsel, cpu, thread)->base;
> > > +}
> > > +
> > >  int perf_evsel__read_size(struct perf_evsel *evsel)
> > >  {
> > > u64 read_format = evsel->attr.read_format;
> > > diff --git a/tools/lib/perf/include/internal/evsel.h 
> > > b/tools/lib/perf/include/internal/evsel.h
> > > index 1ffd083b235e..1c067d088bc6 100644
> > > --- a/tools/lib/perf/include/internal/evsel.h
> > > +++ b/tools/lib/perf/include/internal/evsel.h
> > > @@ -41,6 +41,7 @@ struct perf_evsel {
> > > struct perf_cpu_map *own_cpus;
> > > struct perf_thread_map  *threads;
> > > struct xyarray  *fd;
> > > +   struct xyarray  *mmap;
> > > struct xyarray  *sample_id;
> > > u64 *id;
> > > u32  ids;
> > > diff --git a/tools/lib/perf/include/perf/evsel.h 
> > > b/tools/lib/perf/include/perf/evsel.h
> > > index c82ec39a4ad0..60eae25076d3 100644
> > > --- a/tools/lib/perf/include/perf/evsel.h
> > > +++ b/tools/lib/perf/include/perf/evsel.h
> > > @@ -27,6 +27,9 @@ LIBPERF_API int perf_evsel__open(struct perf_evsel 
> > > *evsel, struct perf_cpu_map *
> > >  struct perf_thread_map *threads);
> > >  LIBPERF_API void perf_evsel__close(struct perf_evsel *evsel);
> > >  LIBPERF_API void perf_evsel__close_cpu(struct perf_evsel *evsel, int 
> > > cpu);
> > > +LIBPERF_API int perf_evsel__mmap(struct perf_evsel *evsel, int pages);
> > > +LIBPERF_API void perf_evsel__munmap(struct perf_evsel *evsel);
> > > +LIBPERF_API void *perf_evsel__mmap_base(struct perf_evsel *evsel, int 
> > > cpu, int thread);
> > >  LIBPERF_API int perf_evsel__read(struct perf_evsel *evsel, int cpu, int 
> > > thread,
> > >  struct perf_counts_values *count);
> > >  LIBPERF_API int perf_evsel__enable(struct perf_evsel *evsel);
> > > diff --git a/tools/lib/perf/libperf.map b/tools/lib/perf/libperf.map
> > > index 7be1af8a546c..c0c7ceb11060 100644
> > > --- a/tools/lib/perf/libperf.map
> > > +++ b/tools/lib/perf/libperf.map
> > > @@ -23,6 +23,9 @@ LIBPERF_0.0.1 {
> > > perf_evsel__disable;
> > > perf_evsel__open;
> > > perf_evsel__close;
> > > +   perf_evsel__mmap;
> > > +   perf_evsel__munmap;
> > > +   perf_evsel__mmap_base;
> > > perf_evsel__read;
> > > perf_evsel__cpus;
> > > perf_evsel__threads;
> > > --
> > > 2.27.0
> 
> -- 
> 
> - Arnaldo

-- 

- Arnaldo


Re: [PATCH] perf beauty: Fix fsconfig generator

2021-04-14 Thread Arnaldo Carvalho de Melo
Em Wed, Apr 14, 2021 at 07:29:42PM +0300, Vitaly Chikunov escreveu:
> After gnulib update sed stopped matching `[[:space:]]*+' as before,
> causing the following compilation error:
> 
>   In file included from builtin-trace.c:719:
>   trace/beauty/generated/fsconfig_arrays.c:2:3: error: expected expression 
> before ']' token
>   2 |  [] = "",
>   |   ^
>   trace/beauty/generated/fsconfig_arrays.c:2:3: error: array index in 
> initializer not of integer type
>   trace/beauty/generated/fsconfig_arrays.c:2:3: note: (near initialization 
> for 'fsconfig_cmds')
> 
> Fix this by correcting the regular expression used in the generator.
> Also, clean up the script by removing redundant egrep, xargs, and printf
> invocations.
> 
> Fixes: d35293004a5e4 ("perf beauty: Add generator for fsconfig's 'cmd' arg 
> values")
> Co-authored-by: Dmitry V. Levin 
> Signed-off-by: Vitaly Chikunov 
> ---
>  tools/perf/trace/beauty/fsconfig.sh | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/perf/trace/beauty/fsconfig.sh 
> b/tools/perf/trace/beauty/fsconfig.sh
> index 83fb24df05c9f..cc76b2aa7a5af 100755
> --- a/tools/perf/trace/beauty/fsconfig.sh
> +++ b/tools/perf/trace/beauty/fsconfig.sh
> @@ -10,8 +10,6 @@ fi
>  linux_mount=${linux_header_dir}/mount.h
>  
>  printf "static const char *fsconfig_cmds[] = {\n"
> -regex='^[[:space:]]*+FSCONFIG_([[:alnum:]_]+)[[:space:]]*=[[:space:]]*([[:digit:]]+)[[:space:]]*,[[:space:]]*.*'
> -egrep $regex ${linux_mount} | \
> - sed -r "s/$regex/\2 \1/g"   | \
> - xargs printf "\t[%s] = \"%s\",\n"
> +regex='^[[:space:]]*FSCONFIG_([[:alnum:]_]+)[[:space:]]*=[[:space:]]*([[:digit:]]+)[[:space:]]*,.*'
> +sed -nr "s/$regex/\t[\2] = \"\1\",/p" ${linux_mount}
>  printf "};\n"

Testing this, all working, I'll step back and ask you to remove that now
useless regex variable and do it directly in the now only line using it,
the sed one.
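
For readers following the regular expression itself, here is a minimal C
rendering of what the corrected pattern captures. It is only a sketch: it
assumes POSIX <regex.h> with extended syntax, the function name
fsconfig_line_to_entry() is made up for illustration, and the generator in
the tree remains the sed script, not this code.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/*
 * Turn one "FSCONFIG_NAME = N," header line into "\t[N] = \"NAME\",".
 * Returns 1 on a match, 0 when the line does not match and -1 when the
 * pattern fails to compile.
 */
static int fsconfig_line_to_entry(const char *line, char *out, size_t size)
{
	/* Same shape as the fixed expression: no possessive "*+" after the first class. */
	static const char pattern[] =
		"^[[:space:]]*FSCONFIG_([[:alnum:]_]+)[[:space:]]*="
		"[[:space:]]*([[:digit:]]+)[[:space:]]*,";
	regex_t re;
	regmatch_t m[3];
	int matched;

	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return -1;

	matched = regexec(&re, line, 3, m, 0) == 0;
	if (matched) {
		/* Group 2 is the number, group 1 is the name without the prefix. */
		snprintf(out, size, "\t[%.*s] = \"%.*s\",",
			 (int)(m[2].rm_eo - m[2].rm_so), line + m[2].rm_so,
			 (int)(m[1].rm_eo - m[1].rm_so), line + m[1].rm_so);
	}

	regfree(&re);
	return matched;
}
```

Lines that do not carry an FSCONFIG_ enumerator, such as the enum opening
line, simply do not match and produce no output, which mirrors the role of
"sed -n ... /p" in the script.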

- Arnaldo


Re: [PATCH v8 2/4] libperf: Add evsel mmap support

2021-04-14 Thread Arnaldo Carvalho de Melo
Em Thu, Apr 15, 2021 at 01:41:35AM +0900, Namhyung Kim escreveu:
> Hello,
> 
> On Thu, Apr 15, 2021 at 1:07 AM Rob Herring  wrote:
> > +void *perf_evsel__mmap_base(struct perf_evsel *evsel, int cpu, int thread)
> > +{
> > +   if (FD(evsel, cpu, thread) < 0 || MMAP(evsel, cpu, thread) == NULL)
> > +   return NULL;
> 
> I think you should check the cpu and the thread is in
> a valid range.  Currently xyarray__entry() simply accesses
> the content without checking the boundaries.

So, since xyarray has the bounds, it should check it, i.e. we need to
have a __xyarray__entry() that is what xyarray__entry() does, i.e.
assume the values have been bounds checked, then a new method,
xyarray__entry() that does bounds check, if it fails, return NULL,
otherwise calls __xyarray__entry().
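
The split described above can be sketched roughly as follows. This is an
illustration under stated assumptions, not the libperf implementation:
struct grid and the grid__*() names are hypothetical stand-ins for struct
xyarray and its methods, and the indexes are size_t here to sidestep
signedness questions.

```c
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for struct xyarray; the layout is an assumption. */
struct grid {
	size_t row_size;	/* bytes per row: max_y * entry_size */
	size_t entry_size;	/* bytes per entry */
	size_t max_x;
	size_t max_y;
	char *contents;
};

static struct grid *grid__new(size_t xlen, size_t ylen, size_t entry_size)
{
	struct grid *g = calloc(1, sizeof(*g));

	if (g == NULL)
		return NULL;

	g->contents = calloc(xlen * ylen, entry_size);
	if (g->contents == NULL) {
		free(g);
		return NULL;
	}

	g->entry_size = entry_size;
	g->row_size = ylen * entry_size;
	g->max_x = xlen;
	g->max_y = ylen;
	return g;
}

static void grid__delete(struct grid *g)
{
	if (g != NULL) {
		free(g->contents);
		free(g);
	}
}

/* Unchecked accessor: the caller guarantees x < max_x and y < max_y. */
static void *grid__entry_unchecked(struct grid *g, size_t x, size_t y)
{
	return &g->contents[x * g->row_size + y * g->entry_size];
}

/* Checked accessor: returns NULL for an out-of-range index. */
static void *grid__entry(struct grid *g, size_t x, size_t y)
{
	if (x >= g->max_x || y >= g->max_y)
		return NULL;

	return grid__entry_unchecked(g, x, y);
}
```

Callers that take indexes from outside would use the checked accessor and
test for NULL, while internal loops that already iterate within max_x and
max_y could keep using the unchecked one.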

I see this is frustrating and I should've chimed in earlier, but at
least now this is getting traction, and the end result will be better
not just for the feature you've been diligently working on,

Thank you for your persistence,

- Arnaldo
 
> Thanks,
> Namhyung
> 
> 
> > +
> > +   return MMAP(evsel, cpu, thread)->base;
> > +}
> > +
> >  int perf_evsel__read_size(struct perf_evsel *evsel)
> >  {
> > u64 read_format = evsel->attr.read_format;
> > diff --git a/tools/lib/perf/include/internal/evsel.h 
> > b/tools/lib/perf/include/internal/evsel.h
> > index 1ffd083b235e..1c067d088bc6 100644
> > --- a/tools/lib/perf/include/internal/evsel.h
> > +++ b/tools/lib/perf/include/internal/evsel.h
> > @@ -41,6 +41,7 @@ struct perf_evsel {
> > struct perf_cpu_map *own_cpus;
> > struct perf_thread_map  *threads;
> > struct xyarray  *fd;
> > +   struct xyarray  *mmap;
> > struct xyarray  *sample_id;
> > u64 *id;
> > u32  ids;
> > diff --git a/tools/lib/perf/include/perf/evsel.h 
> > b/tools/lib/perf/include/perf/evsel.h
> > index c82ec39a4ad0..60eae25076d3 100644
> > --- a/tools/lib/perf/include/perf/evsel.h
> > +++ b/tools/lib/perf/include/perf/evsel.h
> > @@ -27,6 +27,9 @@ LIBPERF_API int perf_evsel__open(struct perf_evsel 
> > *evsel, struct perf_cpu_map *
> >  struct perf_thread_map *threads);
> >  LIBPERF_API void perf_evsel__close(struct perf_evsel *evsel);
> >  LIBPERF_API void perf_evsel__close_cpu(struct perf_evsel *evsel, int cpu);
> > +LIBPERF_API int perf_evsel__mmap(struct perf_evsel *evsel, int pages);
> > +LIBPERF_API void perf_evsel__munmap(struct perf_evsel *evsel);
> > +LIBPERF_API void *perf_evsel__mmap_base(struct perf_evsel *evsel, int cpu, 
> > int thread);
> >  LIBPERF_API int perf_evsel__read(struct perf_evsel *evsel, int cpu, int 
> > thread,
> >  struct perf_counts_values *count);
> >  LIBPERF_API int perf_evsel__enable(struct perf_evsel *evsel);
> > diff --git a/tools/lib/perf/libperf.map b/tools/lib/perf/libperf.map
> > index 7be1af8a546c..c0c7ceb11060 100644
> > --- a/tools/lib/perf/libperf.map
> > +++ b/tools/lib/perf/libperf.map
> > @@ -23,6 +23,9 @@ LIBPERF_0.0.1 {
> > perf_evsel__disable;
> > perf_evsel__open;
> > perf_evsel__close;
> > +   perf_evsel__mmap;
> > +   perf_evsel__munmap;
> > +   perf_evsel__mmap_base;
> > perf_evsel__read;
> > perf_evsel__cpus;
> > perf_evsel__threads;
> > --
> > 2.27.0

-- 

- Arnaldo


[RFC] Improve workload error in 'perf record'

2021-04-14 Thread Arnaldo Carvalho de Melo
Hi,

Please take a look,

Best regards,

- Arnaldo

Arnaldo Carvalho de Melo (2):
  perf evlist: Add a method to return the list of evsels as a string
  perf record: Improve 'Workload failed' message printing events + what
was exec'ed

 tools/perf/builtin-record.c |  8 ++--
 tools/perf/util/evlist.c| 19 +++
 tools/perf/util/evlist.h|  2 ++
 3 files changed, 27 insertions(+), 2 deletions(-)

-- 
2.26.2



[PATCH 2/2] perf record: Improve 'Workload failed' message printing events + what was exec'ed

2021-04-14 Thread Arnaldo Carvalho de Melo
From: Arnaldo Carvalho de Melo 

Before:

  # perf record -a cycles,instructions,cache-misses
  Workload failed: No such file or directory
  #

After:

  # perf record -a cycles,instructions,cache-misses
  Failed to collect 'cycles' for the 'cycles,instructions,cache-misses' 
workload: No such file or directory
  #

Helps disambiguate other error scenarios:

  # perf record -a -e cycles,instructions,cache-misses bla
  Failed to collect 'cycles,instructions,cache-misses' for the 'bla' workload: 
No such file or directory
  # perf record -a cycles,instructions,cache-misses sleep 1
  Failed to collect 'cycles' for the 'cycles,instructions,cache-misses' 
workload: No such file or directory
  #

When all goes well we're back to the usual:

  # perf record -a -e cycles,instructions,cache-misses sleep 1
  [ perf record: Woken up 3 times to write data ]
  [ perf record: Captured and wrote 3.151 MB perf.data (21242 samples) ]
  #

Cc: Adrian Hunter 
Cc: Ian Rogers 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/builtin-record.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 35465d1db6dda3ae..5fb9665a2ec27dde 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1977,9 +1977,13 @@ static int __cmd_record(struct record *rec, int argc, 
const char **argv)
record__auxtrace_snapshot_exit(rec);
 
if (forks && workload_exec_errno) {
-   char msg[STRERR_BUFSIZE];
+   char msg[STRERR_BUFSIZE], strevsels[2048];
const char *emsg = str_error_r(workload_exec_errno, msg, 
sizeof(msg));
-   pr_err("Workload failed: %s\n", emsg);
+
+   evlist__scnprintf_evsels(rec->evlist, sizeof(strevsels), 
strevsels);
+
+   pr_err("Failed to collect '%s' for the '%s' workload: %s\n",
+   strevsels, argv[0], emsg);
err = -1;
goto out_child;
}
-- 
2.26.2



[PATCH 1/2] perf evlist: Add a method to return the list of evsels as a string

2021-04-14 Thread Arnaldo Carvalho de Melo
From: Arnaldo Carvalho de Melo 

Add a 'scnprintf' method to obtain the list of evsels in a evlist as a
string, excluding the "dummy" event used for things like receiving
metadata events (PERF_RECORD_FORK, MMAP, etc) when synthesizing
preexisting threads.

Will be used to improve the error message for workload failure in 'perf
record'.
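
The idea can be sketched outside of perf as below. This is a minimal
userspace illustration, not the patch itself: bounded_printf() is a
hypothetical stand-in for the kernel-style scnprintf(), join_names() works
on a plain array of strings rather than an evlist, and, unlike the code in
this patch, the sketch compares each name against the space still left in
the buffer, which is a choice made here for illustration only.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for scnprintf(): returns the number of bytes actually written. */
static int bounded_printf(char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (size == 0)
		return 0;

	va_start(ap, fmt);
	n = vsnprintf(buf, size, fmt, ap);
	va_end(ap);

	if (n < 0) {
		buf[0] = '\0';
		return 0;
	}
	/* vsnprintf() reports the untruncated length, so clamp it. */
	return (size_t)n >= size ? (int)(size - 1) : n;
}

/* Join names with commas; append "..." when the next name does not fit. */
static int join_names(const char *const *names, size_t nr, size_t size, char *bf)
{
	int printed = 0;
	size_t i;

	if (size > 0)
		bf[0] = '\0';

	for (i = 0; i < nr; i++) {
		size_t left = size - (size_t)printed;
		/* Name, optional leading comma, terminating NUL. */
		size_t need = strlen(names[i]) + (printed ? 2 : 1);

		if (left >= need) {
			printed += bounded_printf(bf + printed, left, "%s%s",
						  printed ? "," : "", names[i]);
		} else {
			printed += bounded_printf(bf + printed, left, "%s...",
						  printed ? "," : "");
			break;
		}
	}

	return printed;
}
```

With a large enough buffer the three event names used in the examples of
this series come out comma separated; with a 12 byte buffer only the first
name fits and the result ends in ",...".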

Cc: Adrian Hunter 
Cc: Ian Rogers 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/evlist.c | 19 +++
 tools/perf/util/evlist.h |  2 ++
 2 files changed, 21 insertions(+)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index f1c79ecf81073f74..d29a8a118973c71c 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -2138,3 +2138,22 @@ struct evsel *evlist__find_evsel(struct evlist *evlist, 
int idx)
}
return NULL;
 }
+
+int evlist__scnprintf_evsels(struct evlist *evlist, size_t size, char *bf)
+{
+   struct evsel *evsel;
+   int printed = 0;
+
+   evlist__for_each_entry(evlist, evsel) {
+   if (evsel__is_dummy_event(evsel))
+   continue;
+   if (size > (strlen(evsel__name(evsel)) + (printed ? 2 : 1))) {
+   printed += scnprintf(bf + printed, size - printed, 
"%s%s", printed ? "," : "", evsel__name(evsel));
+   } else {
+   printed += scnprintf(bf + printed, size - printed, 
"%s...", printed ? "," : "");
+   break;
+   }
+   }
+
+   return printed;
+}
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index b695ffaae519a5d0..a8b97b50cceb7e43 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -365,4 +365,6 @@ int evlist__ctlfd_ack(struct evlist *evlist);
 #define EVLIST_DISABLED_MSG "Events disabled\n"
 
 struct evsel *evlist__find_evsel(struct evlist *evlist, int idx);
+
+int evlist__scnprintf_evsels(struct evlist *evlist, size_t size, char *bf);
 #endif /* __PERF_EVLIST_H */
-- 
2.26.2



Re: [PATCH v7] perf annotate: Fix sample events lost in stdio mode

2021-04-14 Thread Arnaldo Carvalho de Melo
Em Mon, Apr 12, 2021 at 03:22:29PM +0800, Yang Jihong escreveu:
> On 2021/3/31 10:18, Yang Jihong wrote:
> > On 2021/3/30 15:26, Namhyung Kim wrote:
> > > On Sat, Mar 27, 2021 at 11:16 AM Yang Jihong  
> > > wrote:
> > > > On 2021/3/26 20:06, Arnaldo Carvalho de Melo wrote:
> > > > > So it seems to be working, what am I missing? Is this strictly non
> > > > > group related?

> > > > Yes, it is non group related.
> > > > This problem occurs only when different events need to be recorded at
> > > > the same time, i.e.:
> > > > perf record -e branch-misses -e branch-instructions -a sleep 1

> > > > The output results of perf script and perf annotate do not match.
> > > > Some events are not output in perf annotate.

> > > Yeah I think it's related to sort keys.  The code works with a single
> > > hist_entry for each event and symbol.  But the default sort key
> > > creates multiple entries for different threads and it causes the
> > > confusion.

> > Yes, after removing zfree from hists__find_annotations, the output of perf
> > annotate is repeated, which is related to sort keys.

> > The original problem is that notes->src may correspond to multiple
> > sample events. Therefore, we cannot simply zfree notes->src to avoid
> > repeated output.

> > Arnaldo, is there any problem with this patch? :)

> PING :)
> Is there any problem with this patch that needs to be modified?

I continue having a feeling this is kinda a bandaid, i.e. avoid the
problem, and since we have a way to work this when using a group, I fail
to see why it couldn't work when not grouping events.

But since I have no time to dive into this and Namhyung is ok with it,
I'll merge it now.

- Arnaldo


Re: [PATCH v7 2/4] libperf: Add evsel mmap support

2021-04-14 Thread Arnaldo Carvalho de Melo
Em Tue, Apr 13, 2021 at 02:07:57PM -0500, Rob Herring escreveu:
> On Tue, Apr 13, 2021 at 1:39 PM Arnaldo Carvalho de Melo  
> wrote:
> > > --- a/tools/lib/perf/evsel.c
> > > +int perf_evsel__mmap(struct perf_evsel *evsel, int pages)
> > > +{
> > > + int ret, cpu, thread;
> > Where is the counterpart?
> 
> I was assuming implicitly unmapped when closing the fd(s), but looks
> like it's when exiting the process only.
> 
> > I.e. perf_evsel__munmap(), and it should be
> > called if perf_evsel__mmap() fails, right?
> 
> If perf_evsel__mmap() fails, the caller shouldn't have to do anything
> WRT mmap, right? But if the perf_mmap__mmap() call fails, we do need
> some internal clean-up. I'll fix both.

You're right, thanks!

- Arnaldo


Re: [PATCH v7 3/4] libperf: tests: Add support for verbose printing

2021-04-13 Thread Arnaldo Carvalho de Melo
Em Tue, Apr 13, 2021 at 03:49:31PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Tue, Apr 13, 2021 at 12:16:05PM -0500, Rob Herring escreveu:
> > Add __T_VERBOSE() so tests can add verbose output. The verbose output is
> > enabled with the '-v' command line option.
> 
> You forgot to show how this is used, I'm trying:
> 
>   # cd tools/lib/perf
>   # sudo make tests
> 
> So how from the command line one asks for verbose output from the tests?
> 
>   Should be:
> 
>   # sudo make tests V=1
> 

> I'm only getting a more verbose output for the Makefile steps, not from
> the actual tests.
> 
> Perhaps if I read the last cset... will do that now.

Ok, I misread, I thought that was adding a way to enable verbose mode
for _pre-existing tests_, so I tried to use it, it is only used in the
following patch...

- Arnaldo


Re: [PATCH v7 3/4] libperf: tests: Add support for verbose printing

2021-04-13 Thread Arnaldo Carvalho de Melo
Em Tue, Apr 13, 2021 at 12:16:05PM -0500, Rob Herring escreveu:
> Add __T_VERBOSE() so tests can add verbose output. The verbose output is
> enabled with the '-v' command line option.

You forgot to show how this is used, I'm trying:

  # cd tools/lib/perf
  # sudo make tests

So how from the command line one asks for verbose output from the tests?

  Should be:

  # sudo make tests V=1

?

The default output, non-verbose, is:

[acme@five perf]$ sudo make tests
  LINK test-cpumap-a
  LINK test-threadmap-a
  LINK test-evlist-a
  LINK test-evsel-a
  LINK test-cpumap-so
  LINK test-threadmap-so
  LINK test-evlist-so
  LINK test-evsel-so
running static:
- running test-cpumap.c...OK
- running test-threadmap.c...OK
- running test-evlist.c...OK
- running test-evsel.c...OK
running dynamic:
- running test-cpumap.c...OK
- running test-threadmap.c...OK
- running test-evlist.c...OK
- running test-evsel.c...OK
[acme@five perf]$

Trying a verbose mode:

[acme@five perf]$ sudo make tests V=1
make -f /home/acme/git/perf/tools/build/Makefile.build dir=. obj=libperf
make -C /home/acme/git/perf/tools/lib/api/ O= libapi.a
make -f /home/acme/git/perf/tools/build/Makefile.build dir=./fd obj=libapi
make -f /home/acme/git/perf/tools/build/Makefile.build dir=./fs obj=libapi
make -C tests
gcc -I/home/acme/git/perf/tools/lib/perf/include 
-I/home/acme/git/perf/tools/include -I/home/acme/git/perf/tools/lib -g -Wall -o 
test-cpumap-a test-cpumap.c ../libperf.a 
/home/acme/git/perf/tools/lib/api/libapi.a
gcc -I/home/acme/git/perf/tools/lib/perf/include 
-I/home/acme/git/perf/tools/include -I/home/acme/git/perf/tools/lib -g -Wall -o 
test-threadmap-a test-threadmap.c ../libperf.a 
/home/acme/git/perf/tools/lib/api/libapi.a
gcc -I/home/acme/git/perf/tools/lib/perf/include 
-I/home/acme/git/perf/tools/include -I/home/acme/git/perf/tools/lib -g -Wall -o 
test-evlist-a test-evlist.c ../libperf.a 
/home/acme/git/perf/tools/lib/api/libapi.a
gcc -I/home/acme/git/perf/tools/lib/perf/include 
-I/home/acme/git/perf/tools/include -I/home/acme/git/perf/tools/lib -g -Wall -o 
test-evsel-a test-evsel.c ../libperf.a 
/home/acme/git/perf/tools/lib/api/libapi.a
gcc -I/home/acme/git/perf/tools/lib/perf/include 
-I/home/acme/git/perf/tools/include -I/home/acme/git/perf/tools/lib -g -Wall 
-L.. -o test-cpumap-so test-cpumap.c /home/acme/git/perf/tools/lib/api/libapi.a 
-lperf
gcc -I/home/acme/git/perf/tools/lib/perf/include 
-I/home/acme/git/perf/tools/include -I/home/acme/git/perf/tools/lib -g -Wall 
-L.. -o test-threadmap-so test-threadmap.c 
/home/acme/git/perf/tools/lib/api/libapi.a -lperf
gcc -I/home/acme/git/perf/tools/lib/perf/include 
-I/home/acme/git/perf/tools/include -I/home/acme/git/perf/tools/lib -g -Wall 
-L.. -o test-evlist-so test-evlist.c /home/acme/git/perf/tools/lib/api/libapi.a 
-lperf
gcc -I/home/acme/git/perf/tools/lib/perf/include 
-I/home/acme/git/perf/tools/include -I/home/acme/git/perf/tools/lib -g -Wall 
-L.. -o test-evsel-so test-evsel.c /home/acme/git/perf/tools/lib/api/libapi.a 
-lperf
make -C tests run
running static:
- running test-cpumap.c...OK
- running test-threadmap.c...OK
- running test-evlist.c...OK
- running test-evsel.c...OK
running dynamic:
- running test-cpumap.c...OK
- running test-threadmap.c...OK
- running test-evlist.c...OK
- running test-evsel.c...OK
[acme@five perf]$


I'm only getting a more verbose output for the Makefile steps, not from
the actual tests.

Perhaps if I read the last cset... will do that now.

- Arnaldo
 
> Signed-off-by: Rob Herring 
> ---
> v5:
>  - Pass verbose flag to static tests
>  - Fix getopt loop with unsigned char (arm64)
> v3:
>  - New patch
> ---
>  tools/lib/perf/include/internal/tests.h | 32 +
>  tools/lib/perf/tests/Makefile   |  6 +++--
>  2 files changed, 36 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/lib/perf/include/internal/tests.h 
> b/tools/lib/perf/include/internal/tests.h
> index 2093e8868a67..29425c2dabe1 100644
> --- a/tools/lib/perf/include/internal/tests.h
> +++ b/tools/lib/perf/include/internal/tests.h
> @@ -3,11 +3,32 @@
>  #define __LIBPERF_INTERNAL_TESTS_H
>  
>  #include 
> +#include 
>  
>  int tests_failed;
> +int tests_verbose;
> +
> +static inline int get_verbose(char **argv, int argc)
> +{
> + int c;
> + int verbose = 0;
> +
> + while ((c = getopt(argc, argv, "v")) != -1) {
> + switch (c)
> + {
> + case 'v':
> + verbose = 1;
> + break;
> + default:
> + break;
> + }
> + }
> + return verbose;
> +}
>  
>  #define __T_START\
>  do { \
> + tests_verbose = get_verbose(argv, argc);\
>   fprintf(stdout, "- running %s...", __FILE__);   \
>   fflush(NULL);   \
>   tests_failed = 0;   \
> 

Re: [PATCH v7 2/4] libperf: Add evsel mmap support

2021-04-13 Thread Arnaldo Carvalho de Melo
Em Tue, Apr 13, 2021 at 12:16:04PM -0500, Rob Herring escreveu:
> In order to support userspace access, an event must be mmapped. While
> there's already mmap support for evlist, the usecase is a bit different
> than the self monitoring with userspace access. So let's add a new
> perf_evsel__mmap() function to mmap an evsel. This allows implementing
> userspace access as a fastpath for perf_evsel__read().
> 
> The mmapped address is returned by perf_evsel__mmap_base(), which is
> primarily for users/tests to check if userspace access is enabled.
> 
> Signed-off-by: Rob Herring 
> ---
> v7:
>  - Add NULL fd check to perf_evsel__mmap
> v6:
>  - split mmap struct into its own xyarray
> v5:
>  - Create an mmap for every underlying event opened. Due to this, we
>need a different way to get the mmap ptr, so perf_evsel__mmap_base()
>is introduced.
> v4:
>  - Change perf_evsel__mmap size to pages instead of bytes
> v3:
>  - New patch split out from user access patch
> ---
>  tools/lib/perf/Documentation/libperf.txt |  2 +
>  tools/lib/perf/evsel.c   | 54 
>  tools/lib/perf/include/internal/evsel.h  |  1 +
>  tools/lib/perf/include/perf/evsel.h  |  2 +
>  tools/lib/perf/libperf.map   |  2 +
>  5 files changed, 61 insertions(+)
> 
> diff --git a/tools/lib/perf/Documentation/libperf.txt 
> b/tools/lib/perf/Documentation/libperf.txt
> index 0c74c30ed23a..a2c73df191ca 100644
> --- a/tools/lib/perf/Documentation/libperf.txt
> +++ b/tools/lib/perf/Documentation/libperf.txt
> @@ -136,6 +136,8 @@ SYNOPSIS
> struct perf_thread_map *threads);
>void perf_evsel__close(struct perf_evsel *evsel);
>void perf_evsel__close_cpu(struct perf_evsel *evsel, int cpu);
> +  int perf_evsel__mmap(struct perf_evsel *evsel, int pages);
> +  void *perf_evsel__mmap_base(struct perf_evsel *evsel, int cpu, int thread);
>int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread,
> struct perf_counts_values *count);
>int perf_evsel__enable(struct perf_evsel *evsel);
> diff --git a/tools/lib/perf/evsel.c b/tools/lib/perf/evsel.c
> index 4dc06289f4c7..7e140763552f 100644
> --- a/tools/lib/perf/evsel.c
> +++ b/tools/lib/perf/evsel.c
> @@ -11,10 +11,12 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  void perf_evsel__init(struct perf_evsel *evsel, struct perf_event_attr *attr)
>  {
> @@ -38,6 +40,7 @@ void perf_evsel__delete(struct perf_evsel *evsel)
>  }
>  
>  #define FD(e, x, y) (*(int *) xyarray__entry(e->fd, x, y))
> +#define MMAP(e, x, y) (e->mmap ? ((struct perf_mmap *) 
> xyarray__entry(e->mmap, x, y)) : NULL)
>  
>  int perf_evsel__alloc_fd(struct perf_evsel *evsel, int ncpus, int nthreads)
>  {
> @@ -55,6 +58,13 @@ int perf_evsel__alloc_fd(struct perf_evsel *evsel, int 
> ncpus, int nthreads)
>   return evsel->fd != NULL ? 0 : -ENOMEM;
>  }
>  
> +static int perf_evsel__alloc_mmap(struct perf_evsel *evsel, int ncpus, int 
> nthreads)
> +{
> + evsel->mmap = xyarray__new(ncpus, nthreads, sizeof(struct perf_mmap));
> +
> + return evsel->mmap != NULL ? 0 : -ENOMEM;
> +}
> +
>  static int
>  sys_perf_event_open(struct perf_event_attr *attr,
>   pid_t pid, int cpu, int group_fd,
> @@ -137,6 +147,8 @@ void perf_evsel__free_fd(struct perf_evsel *evsel)
>  {
>   xyarray__delete(evsel->fd);
>   evsel->fd = NULL;
> + xyarray__delete(evsel->mmap);
> + evsel->mmap = NULL;
>  }
>  
>  void perf_evsel__close(struct perf_evsel *evsel)
> @@ -156,6 +168,48 @@ void perf_evsel__close_cpu(struct perf_evsel *evsel, int 
> cpu)
>   perf_evsel__close_fd_cpu(evsel, cpu);
>  }
>  
> +int perf_evsel__mmap(struct perf_evsel *evsel, int pages)
> +{
> + int ret, cpu, thread;
> + struct perf_mmap_param mp = {
> + .prot = PROT_READ | PROT_WRITE,
> + .mask = (pages * page_size) - 1,
> + };
> +
> + if (evsel->fd == NULL)
> + return -EINVAL;
> +
> + if (evsel->mmap == NULL &&
> + perf_evsel__alloc_mmap(evsel, xyarray__max_x(evsel->fd), 
> xyarray__max_y(evsel->fd)) < 0)
> + return -ENOMEM;
> +
> + for (cpu = 0; cpu < xyarray__max_x(evsel->fd); cpu++) {
> + for (thread = 0; thread < xyarray__max_y(evsel->fd); thread++) {
> + int fd = FD(evsel, cpu, thread);
> + struct perf_mmap *map = MMAP(evsel, cpu, thread);
> +
> + if (fd < 0)
> + continue;
> +
> + perf_mmap__init(map, NULL, false, NULL);
> +
> + ret = perf_mmap__mmap(map, &mp, fd, cpu);
> + if (ret)
> + return -1;
> + }
> + }
> +
> + return 0;
> +}

Where is the counterpart? I.e. perf_evsel__munmap(), and it should be
called if perf_evsel__mmap() fails, right?

- Arnaldo

> +void 

ANNOUNCE: pahole v1.21 (clang's LTO edition, BTF floats)

2021-04-12 Thread Arnaldo Carvalho de Melo
Hi,
 
The v1.21 release of pahole and its friends is out, this time it's
about using clang to build the kernel with LTO, some DWARF5 fixes, supporting
floating types in the BTF encoder for s/390 sake and some misc fixes and
improvements. Ah, it should also be faster due to switching to using libbpf's
hashing routines.

Main git repo:

   git://git.kernel.org/pub/scm/devel/pahole/pahole.git

Mirror git repo:

   https://github.com/acmel/dwarves.git

tarball + gpg signature:

   https://fedorapeople.org/~acme/dwarves/dwarves-1.21.tar.xz
   https://fedorapeople.org/~acme/dwarves/dwarves-1.21.tar.bz2
   https://fedorapeople.org/~acme/dwarves/dwarves-1.21.tar.sign

Thanks a lot to all the contributors and distro packagers, you're on the
CC list, I appreciate a lot the work you put into these tools,

Best Regards,
 
 - Arnaldo

DWARF loader:

- Handle DWARF5 DW_OP_addrx properly

  Part of the effort to support the subset of DWARF5 that is generated when
  building the kernel.

- Handle subprogram ret type with abstract_origin properly

  Adds a second pass to resolve abstract origin DWARF description of functions
  to aid the BTF encoder in getting the right return type.

- Check .notes section for LTO build info

  When LTO is used, currently only with clang, we need to do extra steps to
  handle references from one object (compile unit, aka CU) to another, a way
  for DWARF to avoid duplicating information.

- Check .debug_abbrev for cross-CU references

  When the kernel build process doesn't add an ELF note in vmlinux indicating
  that LTO was used, we need to look at DWARF's .debug_abbrev ELF section to
  figure out if cross-CU references are present, as those require a more
  expensive way to resolve types and thus to encode BTF.

- Permit merging all DWARF CU's for clang LTO built binary

  Allow not throwing away previously processed, supposedly self contained
  compile units (objects, aka CU, aka Compile Units), as they have type
  descriptions that will be used in later CUs.

- Permit a flexible HASHTAGS__BITS

  So that we can use a more expensive algorithm when we need to keep previously
  processed compile units that will then be referenced by later ones to resolve
  types.

- Use a better hashing function, from libbpf

  Enabling patch to combine compile units when using LTO.
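
  As a rough illustration of how a flexible number of hash bits and a
  multiplicative hash fit together, here is a minimal sketch. The function
  name and the golden-ratio constant are assumptions chosen for this example
  and are not copied from pahole or libbpf; see the relevant csets for what is
  actually used.

```c
#include <stdint.h>

/*
 * Hypothetical multiplicative hash: scramble the key with a 64-bit
 * golden-ratio constant and keep only the top 'bits' bits, so the
 * table can be sized as 1 << bits at run time.
 */
static uint64_t hash_u64_bits(uint64_t key, unsigned int bits)
{
	if (bits == 0 || bits > 64)
		return 0;	/* empty or invalid table size: single bucket */

	return (key * 11400714819323198485ULL) >> (64 - bits);
}
```

  With bits chosen at run time, a tool can use a small table for ordinary
  binaries and a much larger one when all CUs have to be kept around.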

BTF encoder:

- Add --btf_gen_all flag

  A new command line option to ask for the generation of all BTF encodings,
  so that we can stop adding new command line options to enable new encodings
  in the kernel Makefile.

- Match ftrace addresses within ELF functions

  To cope with differences in how DWARF and ftrace describe function boundaries.

- Funnel ELF error reporting through a macro

  To use libelf's elf_error() function, improving error messages.

- Sanitize non-regular int base type

  Cope with clang with dwarf5 non-regular int base types, tricky stuff, see yhs
  full explanation in the relevant cset.

- Add support for the floating-point types

  S/390 has floats'n'doubles in its arch specific linux headers, cope with that.

Pretty printer:

- Honour conf_fprintf.hex when printing enumerations

  If the user specifies --hex in the command line, honour it when printing 
enumerations.

Signed-off-by: Arnaldo Carvalho de Melo 


[GIT PULL] perf tools fixes for v5.12: 3rd batch

2021-04-09 Thread Arnaldo Carvalho de Melo
Hi Linus,

Please consider pulling,

Best regards,

- Arnaldo

Test results at the end of this message, as usual.

The following changes since commit e49d033bddf5b565044e2abe4241353959bc9120:

  Linux 5.12-rc6 (2021-04-04 14:15:36 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git 
tags/perf-tools-fixes-for-v5.12-2020-04-09

for you to fetch changes up to 92f1e8adf7db2ef9b90e5662182810c0cf8ac22e:

  perf arm-spe: Avoid potential buffer overrun (2021-04-07 16:23:20 -0300)


perf tools fixes for v5.12: 3rd batch

- Fix wrong LBR block sorting in 'perf report'.

- Fix 'perf inject' repipe usage when consuming perf.data files.

- Avoid potential buffer overrun when decoding ARM SPE hardware tracing
  packets, bug found using a fuzzer.

Signed-off-by: Arnaldo Carvalho de Melo 


Adrian Hunter (1):
  perf inject: Fix repipe usage

Ian Rogers (1):
  perf arm-spe: Avoid potential buffer overrun

Jin Yao (1):
  perf report: Fix wrong LBR block sorting

 tools/perf/builtin-inject.c   | 2 +-
 tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c | 4 +++-
 tools/perf/util/block-info.c  | 6 +++---
 3 files changed, 7 insertions(+), 5 deletions(-)

Test results:

The first ones are container based builds of tools/perf with and without libelf
support.  Where clang is available, it is also used to build perf with/without
libelf, and building with LIBCLANGLLVM=1 (built-in clang) with gcc and clang
when clang and its devel libraries are installed.

The objtool and samples/bpf/ builds are disabled now that I'm switching from
using the sources in a local volume to fetching them from an http server to
build it inside the container, to make it easier to build in a container
cluster. Those will come back later.

Several are cross builds, the ones with -x-ARCH and the android one, and those
may not have all the features built, due to lack of multi-arch devel packages,
available and being used so far on just a few, like
debian:experimental-x-{arm64,mipsel}.

The 'perf test' one will perform a variety of tests exercising
tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as run perf commands
with a variety of command line event specifications to then intercept the
sys_perf_event syscall to check that the perf_event_attr fields are set up as
expected, among a variety of other unit tests.

Then there are the 'make -C tools/perf build-test' ones, which build tools/perf/
with a variety of feature sets, exercising the build with an incomplete set of
features as well as with a complete one. It is planned to have it run on each
of the containers mentioned above, using some container orchestration
infrastructure. Get in contact if interested in helping to get this in place.

  $ grep "model name" -m1 /proc/cpuinfo 
  model name: AMD Ryzen 9 3900X 12-Core Processor
  # export PERF_TARBALL=http://192.168.86.5/perf/perf-5.12.0-rc6.tar.xz
  # dm
   1    68.93 alpine:3.4    : Ok   gcc (Alpine 5.3.0) 5.3.0 , clang version 3.8.0 (tags/RELEASE_380/final)
   2    69.06 alpine:3.5    : Ok   gcc (Alpine 6.2.1) 6.2.1 20160822 , clang version 3.8.1 (tags/RELEASE_381/final)
   3    71.72 alpine:3.6    : Ok   gcc (Alpine 6.3.0) 6.3.0 , clang version 4.0.0 (tags/RELEASE_400/final)
   4    78.94 alpine:3.7    : Ok   gcc (Alpine 6.4.0) 6.4.0 , Alpine clang version 5.0.0 (tags/RELEASE_500/final) (based on LLVM 5.0.0)
   5    79.08 alpine:3.8    : Ok   gcc (Alpine 6.4.0) 6.4.0 , Alpine clang version 5.0.1 (tags/RELEASE_501/final) (based on LLVM 5.0.1)
   6    82.77 alpine:3.9    : Ok   gcc (Alpine 8.3.0) 8.3.0 , Alpine clang version 5.0.1 (tags/RELEASE_502/final) (based on LLVM 5.0.1)
   7   109.04 alpine:3.10   : Ok   gcc (Alpine 8.3.0) 8.3.0 , Alpine clang version 8.0.0 (tags/RELEASE_800/final) (based on LLVM 8.0.0)
   8   123.95 alpine:3.11   : Ok   gcc (Alpine 9.3.0) 9.3.0 , Alpine clang version 9.0.0 (https://git.alpinelinux.org/aports f7f0d2c2b8bcd6a5843401a9a702029556492689) (based on LLVM 9.0.0)
   9   110.89 alpine:3.12   : Ok   gcc (Alpine 9.3.0) 9.3.0 , Alpine clang version 10.0.0 (https://gitlab.alpinelinux.org/alpine/aports.git 7445adce501f8473efdb93b17b5eaf2f1445ed4c)
  10   118.17 alpine:3.13   : Ok   gcc (Alpine 10.2.1_pre1) 10.2.1 20201203 , Alpine clang version 10.0.1
  11   104.02 alpine:edge   : Ok   gcc (Alpine 10.2.1_git20210328) 10.2.1 20210328 , Alpine clang version 11.1.0
  12    67.37 alt:p8        : Ok   x86_64-alt-linux-gcc (GCC) 5.3.1 20151207 (ALT p8 5.3.1-alt3.M80P.1) , clang version 3.8.0 (tags/RELEASE_380/final)
  13    82.46 alt:p9

Re: [PATCH v2 3/3] perf-stat: introduce config stat.bpf-counter-events

2021-04-08 Thread Arnaldo Carvalho de Melo
Em Thu, Apr 08, 2021 at 08:24:47PM +0200, Jiri Olsa escreveu:
> On Thu, Apr 08, 2021 at 06:08:20PM +, Song Liu wrote:
> > 
> > 
> > > On Apr 8, 2021, at 10:45 AM, Jiri Olsa  wrote:
> > > 
> > > On Thu, Apr 08, 2021 at 05:28:10PM +, Song Liu wrote:
> > >> 
> > >> 
> > >>> On Apr 8, 2021, at 10:20 AM, Jiri Olsa  wrote:
> > >>> 
> > >>> On Thu, Apr 08, 2021 at 04:39:33PM +, Song Liu wrote:
> >  
> >  
> > > On Apr 8, 2021, at 4:47 AM, Jiri Olsa  wrote:
> > > 
> > > On Tue, Apr 06, 2021 at 05:36:01PM -0700, Song Liu wrote:
> > >> Currently, to use BPF to aggregate perf event counters, the user uses
> > >> --bpf-counters option. Enable "use bpf by default" events with a 
> > >> config
> > >> option, stat.bpf-counter-events. This is limited to hardware events 
> > >> in
> > >> evsel__hw_names.
> > >> 
> > >> This also enables mixed BPF event and regular event in the same
> > >> session.
> > >> For example:
> > >> 
> > >> perf config stat.bpf-counter-events=instructions
> > >> perf stat -e instructions,cs
> > >> 
> > > 
> > > so if we are mixing events now, how about using modifier for bpf
> > > counters,
> > > instead of configuring .perfconfig list we could use:
> > > 
> > > perf stat -e instructions:b,cs
> > > 
> > > thoughts?
> > > 
> > > the change below adds 'b' modifier and sets 'evsel::bpf_counter',
> > > feel free to use it
> >  
> >  I think we will need both 'b' modifier and .perfconfig configuration. 
> >  For systems with BPF-managed perf events running in the background, 
> > >>> 
> > >>> hum, I'm not sure I understand what that means.. you mean there
> > >>> are tools that run perf stat so you don't want to change them?
> > >> 
> > >> We have tools that do perf_event_open(). I will change them to use 
> > >> BPF managed perf events for "cycles" and "instructions". Since these 
> > >> tools are running 24/7, perf-stat on the system should use BPF managed
> > >> "cycles" and "instructions" by default. 
> > > 
> > > well if you are already changing the tools why not change them to add
> > > modifier.. but I don't mind adding that .perfconfig stuff if you need
> > > that
> > 
> > The tools I mentioned here don't use perf-stat, they just use 
> > perf_event_open() and read the perf events fds. We want a config to make
> 
> just curious, how those tools use perf_event_open?

I.e. do they use tools/lib/perf/? :-)

I guess they will use it now for getting that "struct perf_event_attr_map_entry"
and the map name define.
 
> > "cycles" to use BPF by default, so that when the user (not these tools)
> > runs perf-stat, it will share PMCs with those events by default. 
 
> I'm sorry but I still don't see the usecase.. if you need to change both 
> tools,
> you can change them to use bpf-managed event, why bother with the list?

He wants users not to have to care whether they are using BPF-based counters:
this will happen automagically after they set up their ~/.perfconfig with some
command line Song provides.

Then they will be using BPF counters that won't get exclusive access to those
scarce counters, the tooling they are using will use bpf-counters and all will
be well.

Right Song?

- Arnaldo



Re: [PATCH v2 3/3] perf-stat: introduce config stat.bpf-counter-events

2021-04-08 Thread Arnaldo Carvalho de Melo
Em Thu, Apr 08, 2021 at 04:39:33PM +, Song Liu escreveu:
> > On Apr 8, 2021, at 4:47 AM, Jiri Olsa  wrote:
> > On Tue, Apr 06, 2021 at 05:36:01PM -0700, Song Liu wrote:
> >> Currently, to use BPF to aggregate perf event counters, the user uses
> >> --bpf-counters option. Enable "use bpf by default" events with a config
> >> option, stat.bpf-counter-events. This is limited to hardware events in
> >> evsel__hw_names.
> >> 
> >> This also enables mixed BPF event and regular event in the same session.
> >> For example:
> >> 
> >>   perf config stat.bpf-counter-events=instructions
> >>   perf stat -e instructions,cs
> >> 
> > 
> > so if we are mixing events now, how about using modifier for bpf counters,
> > instead of configuring .perfconfig list we could use:
> > 
> >  perf stat -e instructions:b,cs
> > 
> > thoughts?
> > 
> > the change below adds 'b' modifier and sets 'evsel::bpf_counter',
> > feel free to use it
> 
> I think we will need both 'b' modifier and .perfconfig configuration. 

Agreed, maximum flexibility.
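
To make the "both knobs" idea concrete, here is a rough sketch of how a per-event 'b' modifier and a config-provided default list could combine. Everything here is hypothetical: wants_bpf_counter() and list_has_event() are names made up for this example, the config value is modelled as a plain comma-separated string, and this is not how the perf parser is actually structured.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical: is 'name' one of the comma-separated entries in 'list'? */
static bool list_has_event(const char *list, const char *name)
{
	size_t n = strlen(name);

	while (list && *list) {
		const char *end = strchr(list, ',');
		size_t len = end ? (size_t)(end - list) : strlen(list);

		if (len == n && strncmp(list, name, n) == 0)
			return true;
		list = end ? end + 1 : NULL;
	}
	return false;
}

/*
 * Hypothetical: decide whether an event spec such as "instructions:b"
 * or "cs" should use a BPF-managed counter. An explicit 'b' modifier
 * wins; otherwise fall back to the configured default list.
 */
static bool wants_bpf_counter(const char *spec, const char *config_list)
{
	const char *colon = strchr(spec, ':');
	size_t len = colon ? (size_t)(colon - spec) : strlen(spec);
	char name[64];

	if (colon && strchr(colon + 1, 'b'))
		return true;
	if (len >= sizeof(name))
		return false;

	memcpy(name, spec, len);
	name[len] = '\0';
	return list_has_event(config_list, name);
}
```

With a configured list of "cycles,instructions", plain "instructions" would be BPF-managed while "cs" would not, and "cs:b" would opt in explicitly.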

> For systems with BPF-managed perf events running in the background, 
> .perfconfig makes sure perf-stat sessions will share PMCs with these 
> background monitoring tools. 'b' modifier, on the other hand, is useful
> when the user knows there is opportunity to share the PMCs. 
> 
> Does this make sense? 

I think so.

- Arnaldo
 
> Thanks,
> Song
> 
> > 
> > jirka
> > 
> > 
> > ---
> > diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
> > index ca52581f1b17..c55e4e58d1dc 100644
> > --- a/tools/perf/util/evsel.h
> > +++ b/tools/perf/util/evsel.h
> > @@ -82,6 +82,7 @@ struct evsel {
> > boolauto_merge_stats;
> > boolcollect_stat;
> > boolweak_group;
> > +   boolbpf_counter;
> > int bpf_fd;
> > struct bpf_object   *bpf_obj;
> > };
> > diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> > index 9ecb45bea948..b5850f1ea90b 100644
> > --- a/tools/perf/util/parse-events.c
> > +++ b/tools/perf/util/parse-events.c
> > @@ -1801,6 +1801,7 @@ struct event_modifier {
> > int pinned;
> > int weak;
> > int exclusive;
> > +   int bpf_counter;
> > };
> > 
> > static int get_event_modifier(struct event_modifier *mod, char *str,
> > @@ -1821,6 +1822,7 @@ static int get_event_modifier(struct event_modifier 
> > *mod, char *str,
> > int exclude = eu | ek | eh;
> > int exclude_GH = evsel ? evsel->exclude_GH : 0;
> > int weak = 0;
> > +   int bpf_counter = 0;
> > 
> > memset(mod, 0, sizeof(*mod));
> > 
> > @@ -1864,6 +1866,8 @@ static int get_event_modifier(struct event_modifier 
> > *mod, char *str,
> > exclusive = 1;
> > } else if (*str == 'W') {
> > weak = 1;
> > +   } else if (*str == 'b') {
> > +   bpf_counter = 1;
> > } else
> > break;
> > 
> > @@ -1895,6 +1899,7 @@ static int get_event_modifier(struct event_modifier 
> > *mod, char *str,
> > mod->sample_read = sample_read;
> > mod->pinned = pinned;
> > mod->weak = weak;
> > +   mod->bpf_counter = bpf_counter;
> > mod->exclusive = exclusive;
> > 
> > return 0;
> > @@ -1909,7 +1914,7 @@ static int check_modifier(char *str)
> > char *p = str;
> > 
> > /* The sizeof includes 0 byte as well. */
> > -   if (strlen(str) > (sizeof("ukhGHpppPSDIWe") - 1))
> > +   if (strlen(str) > (sizeof("ukhGHpppPSDIWeb") - 1))
> > return -1;
> > 
> > while (*p) {
> > @@ -1950,6 +1955,7 @@ int parse_events__modifier_event(struct list_head 
> > *list, char *str, bool add)
> > evsel->sample_read = mod.sample_read;
> > evsel->precise_max = mod.precise_max;
> > evsel->weak_group  = mod.weak;
> > +   evsel->bpf_counter = mod.bpf_counter;
> > 
> > if (evsel__is_group_leader(evsel)) {
> > evsel->core.attr.pinned = mod.pinned;
> > diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
> > index 0b36285a9435..fb8646cc3e83 100644
> > --- a/tools/perf/util/parse-events.l
> > +++ b/tools/perf/util/parse-events.l
> > @@ -210,7 +210,7 @@ name_tag
> > [\'][a-zA-Z_*?\[\]][a-zA-Z0-9_*?\-,\.\[\]:=]*[\']
> > name_minus  [a-zA-Z_*?][a-zA-Z0-9\-_*?.:]*
> > drv_cfg_term[a-zA-Z0-9_\.]+(=[a-zA-Z0-9_*?\.:]+)?
> > /* If you add a modifier you need to update check_modifier() */
> > -modifier_event [ukhpPGHSDIWe]+
> > +modifier_event [ukhpPGHSDIWeb]+
> > modifier_bp [rwx]{1,3}
> > 
> > %%
> > 
> 

-- 

- Arnaldo


Re: [PATCH] perf annotate: improve --stdio mode

2021-04-07 Thread Arnaldo Carvalho de Melo
Em Wed, Apr 07, 2021 at 04:30:46PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Fri, Feb 26, 2021 at 10:24:00AM +0100, Martin Liška escreveu:
> > On 2/23/21 8:47 PM, Arnaldo Carvalho de Melo wrote:
> > Sure. But I think the current format provides quite broken visual layout:
> > 
> >   0.00 :   405ef1: inc    %r15
> >   0.01 :   405ef4: vfmadd213sd 0x2b9b3(%rip),%xmm0,%xmm3    # 4318b0 <_IO_stdin_used+0x8b0>
> >    eff.c:1811    0.67 :   405efd: vfmadd213sd 0x2b9b2(%rip),%xmm0,%xmm3    # 4318b8 <_IO_stdin_used+0x8b8>
> >        :    TA + tmpsd * (TB +
> > 
> > vs.
> > 
> >   0.00 :   405ef1: inc    %r15
> >   0.01 :   405ef4: vfmadd213sd 0x2b9b3(%rip),%xmm0,%xmm3    # 4318b0 <_IO_stdin_used+0x8b0>
> >   0.67 :   405efd: vfmadd213sd 0x2b9b2(%rip),%xmm0,%xmm3    # 4318b8 <_IO_stdin_used+0x8b8> // eff.c:1811
> >        : 1810   TA + tmpsd * (TB +
> > 
> > I bet also the current users of --stdio mode would benefit from it.
> > What do you think?
 
> Agreed, I tried applying but it bitrotted, it seems :-\

I refreshed it, please check.

- Arnaldo

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 18eee25b4976bea8..abe1499a91645375 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -1368,7 +1368,6 @@ annotation_line__print(struct annotation_line *al, struct 
symbol *sym, u64 start
 {
struct disasm_line *dl = container_of(al, struct disasm_line, al);
static const char *prev_line;
-   static const char *prev_color;
 
if (al->offset != -1) {
double max_percent = 0.0;
@@ -1407,20 +1406,6 @@ annotation_line__print(struct annotation_line *al, 
struct symbol *sym, u64 start
 
color = get_percent_color(max_percent);
 
-   /*
-* Also color the filename and line if needed, with
-* the same color than the percentage. Don't print it
-* twice for close colored addr with the same filename:line
-*/
-   if (al->path) {
-   if (!prev_line || strcmp(prev_line, al->path)
-  || color != prev_color) {
-   color_fprintf(stdout, color, " %s", al->path);
-   prev_line = al->path;
-   prev_color = color;
-   }
-   }
-
for (i = 0; i < nr_percent; i++) {
struct annotation_data *data = &al->data[i];
double percent;
@@ -1441,6 +1426,19 @@ annotation_line__print(struct annotation_line *al, 
struct symbol *sym, u64 start
printf(" : ");
 
disasm_line__print(dl, start, addr_fmt_width);
+
+   /*
+* Also color the filename and line if needed, with
+* the same color than the percentage. Don't print it
+* twice for close colored addr with the same filename:line
+*/
+   if (al->path) {
+   if (!prev_line || strcmp(prev_line, al->path)) {
+   color_fprintf(stdout, color, " // %s", 
al->path);
+   prev_line = al->path;
+   }
+   }
+
printf("\n");
} else if (max_lines && printed >= max_lines)
return 1;
@@ -1456,7 +1454,7 @@ annotation_line__print(struct annotation_line *al, struct 
symbol *sym, u64 start
if (!*al->line)
printf(" %*s:\n", width, " ");
else
-   printf(" %*s: %*s %s\n", width, " ", 
addr_fmt_width, " ", al->line);
+   printf(" %*s: %-*d %s\n", width, " ", addr_fmt_width, 
al->line_nr, al->line);
}
 
return 0;
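
The "don't print it twice" logic in the refreshed hunk above amounts to remembering the last location printed and emitting the " // file:line" suffix only when it changes. A minimal standalone sketch, where format_location() is a hypothetical helper and not a function in annotate.c:

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write " // <path>" into buf only when path differs
 * from the previously printed one (*prev). Returns the number of characters
 * produced, or 0 when nothing needs to be appended. size must be >= 1.
 */
static int format_location(char *buf, size_t size, const char *path,
			   const char **prev)
{
	buf[0] = '\0';

	if (path == NULL)
		return 0;
	if (*prev != NULL && strcmp(*prev, path) == 0)
		return 0;	/* same file:line as the previous hot line */

	*prev = path;
	return snprintf(buf, size, " // %s", path);
}
```

Appending the suffix after the disassembly, instead of prefixing the percentage column with it, is what keeps the columns aligned.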


Re: [PATCH] perf annotate: improve --stdio mode

2021-04-07 Thread Arnaldo Carvalho de Melo
Em Fri, Feb 26, 2021 at 10:24:00AM +0100, Martin Liška escreveu:
> On 2/23/21 8:47 PM, Arnaldo Carvalho de Melo wrote:
> > Em Sun, Feb 21, 2021 at 01:46:36PM +0100, Martin Liška escreveu:
> > > The patch changes the output format in 2 ways:
> > > - line number is displayed for all source lines (matching TUI mode)
> > 
> > Are you aware of 'perf annotate --stdio2' ? If the goal is to make the
> > stdio mode better, doing it in that mode would be best, as it was done
> > to share as much code as possible, not just the looks, with the TUI
> > mode.
> 
> Yes, I'm aware of it. My motivation is to generate a HTML perf annotate report
> and I see the following parts of --stdio2 not ideal:
> 
> - coloring is not available (--stdio-color=always does not work)
> - 'Sorted summary for file ' is missing so one can't easily search for
>   hot spots in browser
> - source line number are displayed, but not the source files
> - there's a missing option for 'Toggle disassembler output/simplified view' 
> which
>   is available in TUI mode
> 
> That said, the stdio2 annotation report is quite different and so handy for 
> my use case.

I'll add these to my TODO list, all valid concerns. And being able to
directly generate some sort of HTML with CSS, etc also seems a great
feature to have.

- Arnaldo
 
> > 
> > I kept --stdio around because changing the output in that way could
> > annoy people used to that format.
> 
> Sure. But I think the current format provides quite broken visual layout:
> 
>   0.00 :   405ef1: inc    %r15
>   0.01 :   405ef4: vfmadd213sd 0x2b9b3(%rip),%xmm0,%xmm3    # 4318b0 <_IO_stdin_used+0x8b0>
>    eff.c:1811    0.67 :   405efd: vfmadd213sd 0x2b9b2(%rip),%xmm0,%xmm3    # 4318b8 <_IO_stdin_used+0x8b8>
>        :    TA + tmpsd * (TB +
> 
> vs.
> 
>   0.00 :   405ef1: inc    %r15
>   0.01 :   405ef4: vfmadd213sd 0x2b9b3(%rip),%xmm0,%xmm3    # 4318b0 <_IO_stdin_used+0x8b0>
>   0.67 :   405efd: vfmadd213sd 0x2b9b2(%rip),%xmm0,%xmm3    # 4318b8 <_IO_stdin_used+0x8b8> // eff.c:1811
>        : 1810   TA + tmpsd * (TB +
> 
> I bet also the current users of --stdio mode would benefit from it.
> What do you think?

Agreed, I tried applying but it bitrotted, it seems :-\

- Arnaldo
 
> Thanks,
> Martin
> 
> > 
> > Please take a look at 'man perf-config' and see what can be configured
> > for both 'perf annotate --tui' and 'perf annotate --stdio2'.
> > 
> > Perhaps we can do something like:
> > 
> > perf config annotate.stdio=tui_like
> > 
> > And, for completeness have:
> > 
> > perf config annotate.stdio=classical
> > 
> > wdyt?
> > 
> > Looking at the other patches now.
> > 
> > - Arnaldo
> > 
> > > - source locations for the hottest lines are printed
> > >at the line end in order to preserve layout
> > > 
> > > Before:
> > > 
> > >  0.00 :   405ef1: inc    %r15
> > >       :    tmpsd * (TD + tmpsd * TDD)));
> > >  0.01 :   405ef4: vfmadd213sd 0x2b9b3(%rip),%xmm0,%xmm3    # 4318b0 <_IO_stdin_used+0x8b0>
> > >       :    tmpsd * (TC +
> > >   eff.c:1811    0.67 :   405efd: vfmadd213sd 0x2b9b2(%rip),%xmm0,%xmm3    # 4318b8 <_IO_stdin_used+0x8b8>
> > >       :    TA + tmpsd * (TB +
> > >  0.35 :   405f06: vfmadd213sd 0x2b9b1(%rip),%xmm0,%xmm3    # 4318c0 <_IO_stdin_used+0x8c0>
> > >       :    dumbo =
> > >   eff.c:1809    1.41 :   405f0f: vfmadd213sd 0x2b9b0(%rip),%xmm0,%xmm3    # 4318c8 <_IO_stdin_used+0x8c8>
> > >       :    sumi -= sj * tmpsd * dij2i * dumbo;
> > >   eff.c:1813    2.58 :   405f18: vmulsd %xmm3,%xmm0,%xmm0
> > >  2.81 :   405f1c: vfnmadd213sd 0x30(%rsp),%xmm1,%xmm0
> > >  3.78 :   405f23: vmovsd %xmm0,0x30(%rsp)
> > >       :    for (k = 0; k < lpears[i] + upears[i]; k++) {
> > >   eff.c:1761    0.90 :   405f29: cmp    %r15d,%r12d
> > > 
> > > After:
> > > 
> > >  0.00 :   405ef1: inc    %r15
> > >       : 1812   tmpsd * (TD + tmpsd * TDD)));
> > >  0.01 :   405ef4: vfmadd213sd 0x2b9b3(%rip),%xmm0,%xmm3    # 4318b0 <_IO_stdin_used+0x8b0>
> > >       : 1811   tmpsd * (TC +
> > >  0.67 :   405efd: vfmadd213sd 0x2b9b2(%rip),%xmm0,%xmm3    # 4318b8 <_IO_stdin_used+0x8b8> // eff.c:1811
> > 

Re: [PATCH] perf annotate: improve --stdio mode

2021-04-07 Thread Arnaldo Carvalho de Melo
Em Fri, Mar 19, 2021 at 10:38:26AM +0100, Martin Liška escreveu:
> PING

Can you please try to refresh the patch? It isn't applying, probably my
bad for not having processed it already :-\

- Arnaldo
 
> On 2/26/21 10:24 AM, Martin Liška wrote:
> > On 2/23/21 8:47 PM, Arnaldo Carvalho de Melo wrote:
> > > Em Sun, Feb 21, 2021 at 01:46:36PM +0100, Martin Liška escreveu:
> > > > The patch changes the output format in 2 ways:
> > > > - line number is displayed for all source lines (matching TUI mode)
> > > 
> > > Are you aware of 'perf annotate --stdio2' ? If the goal is to make the
> > > stdio mode better, doing it in that mode would be best, as it was done
> > > to share as much code as possible, not just the looks, with the TUI
> > > mode.
> > 
> > Yes, I'm aware of it. My motivation is to generate a HTML perf annotate 
> > report
> > and I see the following parts of --stdio2 not ideal:
> > 
> > - coloring is not available (--stdio-color=always does not work)
> > - 'Sorted summary for file ' is missing so one can't easily search for
> >    hot spots in browser
> > - source line number are displayed, but not the source files
> > - there's a missing option for 'Toggle disassembler output/simplified view' 
> > which
> >    is available in TUI mode
> > 
> > That said, the stdio2 annotation report is quite different and so handy for 
> > my use case.
> > 
> > > 
> > > I kept --stdio around because changing the output in that way could
> > > annoy people used to that format.
> > 
> > Sure. But I think the current format provides quite broken visual layout:
> > 
> >    0.00 :   405ef1: inc    %r15
> >    0.01 :   405ef4: vfmadd213sd 0x2b9b3(%rip),%xmm0,%xmm3    # 
> > 4318b0 <_IO_stdin_used+0x8b0>
> >     eff.c:1811    0.67 :   405efd: vfmadd213sd 0x2b9b2(%rip),%xmm0,%xmm3
> >     # 4318b8 <_IO_stdin_used+0x8b8>
> >    :    TA + tmpsd * (TB +
> > 
> > vs.
> > 
> >    0.00 :   405ef1: inc    %r15
> >    0.01 :   405ef4: vfmadd213sd 0x2b9b3(%rip),%xmm0,%xmm3    # 
> > 4318b0 <_IO_stdin_used+0x8b0>
> >    0.67 :   405efd: vfmadd213sd 0x2b9b2(%rip),%xmm0,%xmm3    # 
> > 4318b8 <_IO_stdin_used+0x8b8> // eff.c:1811
> >     : 1810   TA + tmpsd * (TB +
> > 
> > I bet also the current users of --stdio mode would benefit from it.
> > What do you think?
> > 
> > Thanks,
> > Martin
> > 
> > > 
> > > Please take a look at 'man perf-config' and see what can be configured
> > > for both 'perf annotate --tui' and 'perf annotate --stdio2'.
> > > 
> > > Perhaps we can do something like:
> > > 
> > > perf config annotate.stdio=tui_like
> > > 
> > > And, for completeness have:
> > > 
> > > perf config annotate.stdio=classical
> > > 
> > > wdyt?
> > > 
> > > Looking at the other patches now.
> > > 
> > > - Arnaldo
> > > 
> > > > - source locations for the hottest lines are printed
> > > >    at the line end in order to preserve layout
> > > > 
> > > > Before:
> > > > 
> > > >  0.00 :   405ef1: inc    %r15
> > > >   :    tmpsd * (TD + tmpsd * TDD)));
> > > >  0.01 :   405ef4: vfmadd213sd 0x2b9b3(%rip),%xmm0,%xmm3    # 
> > > > 4318b0 <_IO_stdin_used+0x8b0>
> > > >   :    tmpsd * (TC +
> > > >   eff.c:1811    0.67 :   405efd: vfmadd213sd 0x2b9b2(%rip),%xmm0,%xmm3  
> > > >   # 4318b8 <_IO_stdin_used+0x8b8>
> > > >   :    TA + tmpsd * (TB +
> > > >  0.35 :   405f06: vfmadd213sd 0x2b9b1(%rip),%xmm0,%xmm3    # 
> > > > 4318c0 <_IO_stdin_used+0x8c0>
> > > >   :    dumbo =
> > > >   eff.c:1809    1.41 :   405f0f: vfmadd213sd 0x2b9b0(%rip),%xmm0,%xmm3  
> > > >   # 4318c8 <_IO_stdin_used+0x8c8>
> > > >   :    sumi -= sj * tmpsd * dij2i * dumbo;
> > > >   eff.c:1813    2.58 :   405f18: vmulsd %xmm3,%xmm0,%xmm0
> > > >  2.81 :   405f1c: vfnmadd213sd 0x30(%rsp),%xmm1,%xmm0
> > > >  3.78 :   405f23: vmovsd %xmm0,0x30(%rsp)
> > > >   :    for (k = 0; k < lpears[i] + upears[i]; k++) {
> > > >   eff.c:1761    0.90 :   405f29: cmp    %r15d,%r12d
> > > > 
> > > > After:
> > > >

Re: [PATCH] perf arm-spe: Avoid potential buffer overrun.

2021-04-07 Thread Arnaldo Carvalho de Melo
Em Wed, Apr 07, 2021 at 08:39:55AM -0700, Ian Rogers escreveu:
> SPE extended headers are >1 byte so ensure the buffer contains at
> least this before reading. This issue was detected by fuzzing.

Thanks, applied.

- Arnaldo

 
> Signed-off-by: Ian Rogers 
> ---
>  tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> index f3ac9d40cebf..2e5eff4f8f03 100644
> --- a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> +++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> @@ -210,8 +210,10 @@ static int arm_spe_do_get_packet(const unsigned char *buf, size_t len,
>  
>   if ((hdr & SPE_HEADER0_MASK2) == SPE_HEADER0_EXTENDED) {
>   /* 16-bit extended format header */
> - ext_hdr = 1;
> + if (len == 1)
> + return ARM_SPE_BAD_PACKET;
>  
> + ext_hdr = 1;
>   hdr = buf[1];
>   if (hdr == SPE_HEADER1_ALIGNMENT)
>   return arm_spe_get_alignment(buf, len, packet);
> -- 
> 2.31.0.208.g409f899ff0-goog
> 

-- 

- Arnaldo
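
For illustration, the length check that the arm-spe patch adds can be
sketched outside the decoder. This is a minimal sketch only: the DEMO_*
constants and demo_header_len() are hypothetical names with made-up values,
not the real SPE encoding or the decoder's API. The point is that the
second header byte is only read after confirming the buffer holds it.

```c
#include <stddef.h>

/* Hypothetical values for illustration; not the real SPE header encoding. */
#define DEMO_HDR_EXT_MASK  0xfc
#define DEMO_HDR_EXTENDED  0x20
#define DEMO_BAD_PACKET    (-1)

/*
 * Return the header length in bytes (1 or 2), or DEMO_BAD_PACKET when the
 * buffer is too short to hold the header that byte 0 announces.
 */
int demo_header_len(const unsigned char *buf, size_t len)
{
	if (len == 0)
		return DEMO_BAD_PACKET;

	if ((buf[0] & DEMO_HDR_EXT_MASK) == DEMO_HDR_EXTENDED) {
		/* Extended headers are two bytes: check before touching buf[1]. */
		if (len < 2)
			return DEMO_BAD_PACKET;
		return 2;
	}

	return 1;
}
```

A one-byte buffer that starts an extended header is rejected instead of
being read past its end, which is the class of input a fuzzer tends to find.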


Re: [PATCH] perf report: Fix wrong LBR block sorting

2021-04-07 Thread Arnaldo Carvalho de Melo
Em Wed, Apr 07, 2021 at 06:49:57AM -0700, Andi Kleen escreveu:
> > Now the hottest block is reported at the top of output.
> > 
> > Fixes: b65a7d372b1a ("perf hist: Support block formats with compare/sort/display")
> > Signed-off-by: Jin Yao 
> 
> 
> Reviewed-by: Andi Kleen 

Thanks, applied.

- Arnaldo



Re: [PATCH 0/4] perf events vendor amd: Fixes, cleanups and updates for AMD Zen cores

2021-04-07 Thread Arnaldo Carvalho de Melo
Em Tue, Apr 06, 2021 at 04:59:40PM -0500, Smita Koralahalli escreveu:
> This series of patches provides a fix for the broken metric and does some
> cleanup for AMD Zen1/Zen2 cores. Additionally, adds Zen3 events.
> 
> The first patch fixes broken L2 Cache Hits from L2 HWPF recommended event.
> 
> The second and third patches address the inconsistency by defaulting all
> event codes and umask values to use lowercase and 0x%02x as their
> format.
> 
> The final patch adds Zen3 events.

Thanks, applied.

- Arnaldo

 
> Cc: Peter Zijlstra 
> Cc: Ingo Molnar 
> Cc: Arnaldo Carvalho de Melo 
> Cc: Mark Rutland 
> Cc: Alexander Shishkin 
> Cc: Jiri Olsa 
> Cc: Namhyung Kim 
> Cc: Ian Rogers 
> Cc: Vijay Thakkar 
> Cc: Martin Liška 
> Cc: Michael Petlan 
> Cc: Kim Phillips 
> Cc: linux-perf-us...@vger.kernel.org
> 
> Smita Koralahalli (4):
>   perf vendor events amd: Fix broken L2 Cache Hits from L2 HWPF metric
>   perf vendor events amd: Use lowercases for all the eventcodes and umasks
>   perf vendor events amd: Use 0x%02x format for event code and umask
>   perf vendor events amd: Add Zen3 events
> 
>  .../pmu-events/arch/x86/amdzen1/cache.json|  48 +-
>  .../pmu-events/arch/x86/amdzen1/core.json |  12 +-
>  .../arch/x86/amdzen1/floating-point.json  |  42 +-
>  .../pmu-events/arch/x86/amdzen1/memory.json   |  42 +-
>  .../pmu-events/arch/x86/amdzen1/other.json|  12 +-
>  .../arch/x86/amdzen1/recommended.json |   8 +-
>  .../pmu-events/arch/x86/amdzen2/branch.json   |   8 +-
>  .../pmu-events/arch/x86/amdzen2/cache.json|  60 +--
>  .../pmu-events/arch/x86/amdzen2/core.json |  12 +-
>  .../arch/x86/amdzen2/floating-point.json  |  42 +-
>  .../pmu-events/arch/x86/amdzen2/memory.json   |  86 ++--
>  .../pmu-events/arch/x86/amdzen2/other.json|  20 +-
>  .../arch/x86/amdzen2/recommended.json |   8 +-
>  .../pmu-events/arch/x86/amdzen3/branch.json   |  53 +++
>  .../pmu-events/arch/x86/amdzen3/cache.json| 402 
>  .../pmu-events/arch/x86/amdzen3/core.json | 137 ++
>  .../arch/x86/amdzen3/data-fabric.json |  98 
>  .../arch/x86/amdzen3/floating-point.json  | 139 ++
>  .../pmu-events/arch/x86/amdzen3/memory.json   | 428 ++
>  .../pmu-events/arch/x86/amdzen3/other.json| 103 +
>  .../arch/x86/amdzen3/recommended.json | 214 +
>  tools/perf/pmu-events/arch/x86/mapfile.csv|   2 +-
>  22 files changed, 1775 insertions(+), 201 deletions(-)
>  create mode 100644 tools/perf/pmu-events/arch/x86/amdzen3/branch.json
>  create mode 100644 tools/perf/pmu-events/arch/x86/amdzen3/cache.json
>  create mode 100644 tools/perf/pmu-events/arch/x86/amdzen3/core.json
>  create mode 100644 tools/perf/pmu-events/arch/x86/amdzen3/data-fabric.json
>  create mode 100644 tools/perf/pmu-events/arch/x86/amdzen3/floating-point.json
>  create mode 100644 tools/perf/pmu-events/arch/x86/amdzen3/memory.json
>  create mode 100644 tools/perf/pmu-events/arch/x86/amdzen3/other.json
>  create mode 100644 tools/perf/pmu-events/arch/x86/amdzen3/recommended.json
> 
> -- 
> 2.17.1
> 

-- 

- Arnaldo
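
As a small illustration of the formatting convention the AMD series settles
on (lowercase hex, 0x%02x), the sketch below formats a value the way the
cover letter describes. demo_format_code() is a hypothetical helper, not
something from the pmu-events tooling.

```c
#include <stdio.h>

/*
 * Format an event code or umask as lowercase hex, zero-padded to at least
 * two digits. Returns what snprintf() returns.
 */
int demo_format_code(char *out, size_t sz, unsigned int value)
{
	return snprintf(out, sz, "0x%02x", value);
}
```

With this format, 0x4 is written as "0x04" and 0xC3 as "0xc3"; wider values
are not truncated, they simply use more digits.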


Re: [PATCH v3 0/6] perf arm64 metricgroup support

2021-04-07 Thread Arnaldo Carvalho de Melo
Em Wed, Apr 07, 2021 at 06:32:44PM +0800, John Garry escreveu:
> This series contains support to get basic metricgroups working for
> arm64 CPUs.
> 
> Initial support is added for HiSilicon hip08 platform.
> 
> Some sample usage on Huawei D06 board:
> 
>  $ ./perf list metric

Thanks, applied.

- Arnaldo

 
> List of pre-defined events (to be used in -e): 
> 
> Metrics: 
> 
>   bp_misp_flush
>[BP misp flush L3 topdown metric]
>   branch_mispredicts
>[Branch mispredicts L2 topdown metric]
>   core_bound
>[Core bound L2 topdown metric]
>   divider
>[Divider L3 topdown metric]
>   exe_ports_util
>[EXE ports util L3 topdown metric]
>   fetch_bandwidth_bound
>[Fetch bandwidth bound L2 topdown metric]
>   fetch_latency_bound
>[Fetch latency bound L2 topdown metric]
>   fsu_stall
>[FSU stall L3 topdown metric]
>   idle_by_icache_miss
> 
> $ sudo ./perf stat -v -M core_bound sleep 1
> Using CPUID 0x480fd010
> metric expr (exe_stall_cycle - (mem_stall_anyload + armv8_pmuv3_0@event\=0x7005@)) / cpu_cycles for core_bound
> found event cpu_cycles
> found event armv8_pmuv3_0/event=0x7005/
> found event exe_stall_cycle
> found event mem_stall_anyload
> adding {cpu_cycles -> armv8_pmuv3_0/event=0x7001/
> mem_stall_anyload -> armv8_pmuv3_0/event=0x7004/
> Control descriptor is not initialized
> cpu_cycles: 989433 385050 385050
> armv8_pmuv3_0/event=0x7005/: 19207 385050 385050
> exe_stall_cycle: 900825 385050 385050
> mem_stall_anyload: 253516 385050 385050
> 
> Performance counter stats for 'sleep':
> 
> 989,433  cpu_cycles  # 0.63 core_bound
>   19,207  armv8_pmuv3_0/event=0x7005/
>  900,825  exe_stall_cycle
>  253,516  mem_stall_anyload
> 
>0.000805809 seconds time elapsed
> 
>0.000875000 seconds user
>0.0 seconds sys
>
> perf stat --topdown is not supported, as this requires the CPU PMU to
> expose (alias) events for the TopDown L1 metrics from sysfs, which arm 
> does not do. To get that to work, we probably need to make perf use the
> pmu-events cpumap to learn about those alias events.
> 
> Metric reuse support is added for pmu-events parse metric testcase.
> This had been broken on power9 recently:
> https://lore.kernel.org/lkml/20210324015418.gc8...@li-24c3614c-2adc-11b2-a85c-85f334518bdb.ibm.com/
>  
> 
> Differences to v2:
> - Add TB and RB tags (Thanks!)
> - Rename metricgroup__find_metric() from metricgroup_find_metric()
> - Change resolve_metric_simple() to rescan after any insert
> 
> Differences to v1:
> - Add pmu_events_map__find() as arm64-specific function
> - Fix metric reuse for pmu-events parse metric testcase 
> 
> John Garry (6):
>   perf metricgroup: Make find_metric() public with name change
>   perf test: Handle metric reuse in pmu-events parsing test
>   perf pmu: Add pmu_events_map__find()
>   perf vendor events arm64: Add Hisi hip08 L1 metrics
>   perf vendor events arm64: Add Hisi hip08 L2 metrics
>   perf vendor events arm64: Add Hisi hip08 L3 metrics
> 
>  tools/perf/arch/arm64/util/Build  |   1 +
>  tools/perf/arch/arm64/util/pmu.c  |  25 ++
>  .../arch/arm64/hisilicon/hip08/metrics.json   | 233 ++
>  tools/perf/tests/pmu-events.c |  83 ++-
>  tools/perf/util/metricgroup.c |  12 +-
>  tools/perf/util/metricgroup.h |   3 +-
>  tools/perf/util/pmu.c |   5 +
>  tools/perf/util/pmu.h |   1 +
>  tools/perf/util/s390-sample-raw.c |   4 +-
>  9 files changed, 356 insertions(+), 11 deletions(-)
>  create mode 100644 tools/perf/arch/arm64/util/pmu.c
>  create mode 100644 tools/perf/pmu-events/arch/arm64/hisilicon/hip08/metrics.json
> 
> -- 
> 2.26.2
> 

-- 

- Arnaldo
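
The core_bound figure in the quoted 'perf stat -v' output can be checked by
hand from the metric expression and the raw counts printed there. The sketch
below evaluates the same expression; demo_core_bound() and its parameter
names are only illustrative, perf evaluates metric expressions through its
own parser.

```c
/*
 * Evaluate (exe_stall_cycle - (mem_stall_anyload + ev_0x7005)) / cpu_cycles,
 * the expression shown in the quoted output for core_bound.
 */
double demo_core_bound(double exe_stall_cycle, double mem_stall_anyload,
		       double ev_0x7005, double cpu_cycles)
{
	if (cpu_cycles == 0.0)
		return 0.0; /* avoid dividing by zero when nothing was counted */

	return (exe_stall_cycle - (mem_stall_anyload + ev_0x7005)) / cpu_cycles;
}
```

Plugging in the counts from the output, (900825 - (253516 + 19207)) / 989433
is 628102 / 989433, roughly 0.635, which matches the "0.63 core_bound" shown
next to cpu_cycles.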


Re: [PATCH-next] perf daemon: Remove duplicate header file

2021-04-07 Thread Arnaldo Carvalho de Melo
Em Wed, Apr 07, 2021 at 05:49:02PM +0800, johnny.che...@huawei.com escreveu:
> From: Chen Yi 
> 
> Delete one of the header files  that are included twice.

Thanks, but I got a patch merged for this already.

- Arnaldo
 
> Signed-off-by: Chen Yi 
> ---
>  tools/perf/builtin-daemon.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/tools/perf/builtin-daemon.c b/tools/perf/builtin-daemon.c
> index 7c4a9d424a64..be1a13d06b9c 100644
> --- a/tools/perf/builtin-daemon.c
> +++ b/tools/perf/builtin-daemon.c
> @@ -6,7 +6,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
> -- 
> 2.31.0
> 

-- 

- Arnaldo


Re: [PATCH] perf: util/mem-events.h: Remove unnecessary struct declaration

2021-04-06 Thread Arnaldo Carvalho de Melo
Em Tue, Apr 06, 2021 at 06:51:02PM +0800, Wan Jiabing escreveu:
> struct mem_info is defined at 22nd line.
> The declaration here is unnecessary. Remove it.
 

Thanks, applied.

- Arnaldo

> Signed-off-by: Wan Jiabing 
> ---
>  tools/perf/util/mem-events.h | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/tools/perf/util/mem-events.h b/tools/perf/util/mem-events.h
> index 755cef7e0625..5ddbeaa057b0 100644
> --- a/tools/perf/util/mem-events.h
> +++ b/tools/perf/util/mem-events.h
> @@ -44,7 +44,6 @@ bool is_mem_loads_aux_event(struct evsel *leader);
>  
>  void perf_mem_events__list(void);
>  
> -struct mem_info;
>  int perf_mem__tlb_scnprintf(char *out, size_t sz, struct mem_info *mem_info);
>  int perf_mem__lvl_scnprintf(char *out, size_t sz, struct mem_info *mem_info);
>  int perf_mem__snp_scnprintf(char *out, size_t sz, struct mem_info *mem_info);
> -- 
> 2.25.1
> 

-- 

- Arnaldo


Re: [PATCH] perf record: Disallow -c and -F option at the same time

2021-04-03 Thread Arnaldo Carvalho de Melo
Em Fri, Apr 02, 2021 at 08:25:30PM -0700, Alexey Alexandrov escreveu:
> A warning can be missed when the tool is run by some kind of automation.
> Backward compatibility aside, I think conflicting flags should result in an
> early exit to avoid later surprises.

Sure, I agree with you in principle, but we erred in the past by
accepting this combination, and now, out of the blue, treating it as
what it always should have been, an error, feels like an error in
itself.

I sent this message after merging the change, but before pushing it out
publicly I felt some (more) discussion would be in order.

Are you sure that potentially breaking existing scripts is ok in this
case?

Up to you, frankly.

- Arnaldo
 
> On Fri, Apr 2, 2021 at 6:37 AM Arnaldo Carvalho de Melo 
> wrote:
> 
> > Em Fri, Apr 02, 2021 at 06:40:20PM +0900, Namhyung Kim escreveu:
> > > It's confusing which one is effective when both options are given.
> > > The current code happens to use -c in this case but users might not be
> > > aware of it.  We can change it to complain about that instead of
> > > relying on the implicit priority.
> > >
> > > Before:
> > >   $ perf record -c 11 -F 99 true
> > >   [ perf record: Woken up 1 times to write data ]
> > >   [ perf record: Captured and wrote 0.031 MB perf.data (8 samples) ]
> > >
> > >   $ perf evlist -F
> > >   cycles: sample_period=11
> > >
> > > After:
> > >   $ perf record -c 11 -F 99 true
> > >   cannot set frequency and period at the same time
> > >
> > > So this change can break existing usages, but I think it's rare to
> > > have both options and it'd be better changing them.
> >
> > Humm, perhaps we can just make that a warning stating that -c is used
> > if both are specified?
> >
> > $ perf record -c 11 -F 99 true
> > Frequency and period can't be used at the same time, -c 11 will be used.
> >
> > - Arnaldo
> >
> > > Suggested-by: Alexey Alexandrov 
> > > Signed-off-by: Namhyung Kim 
> > > ---
> > >  tools/perf/util/record.c | 8 +++-
> > >  1 file changed, 7 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/tools/perf/util/record.c b/tools/perf/util/record.c
> > > index f99852d54b14..43e5b563dee8 100644
> > > --- a/tools/perf/util/record.c
> > > +++ b/tools/perf/util/record.c
> > > @@ -157,9 +157,15 @@ static int get_max_rate(unsigned int *rate)
> > >  static int record_opts__config_freq(struct record_opts *opts)
> > >  {
> > >   bool user_freq = opts->user_freq != UINT_MAX;
> > > + bool user_interval = opts->user_interval != ULLONG_MAX;
> > >   unsigned int max_rate;
> > >
> > > - if (opts->user_interval != ULLONG_MAX)
> > > + if (user_interval && user_freq) {
> > > + pr_err("cannot set frequency and period at the same time\n");
> > > + return -1;
> > > + }
> > > +
> > > + if (user_interval)
> > >   opts->default_interval = opts->user_interval;
> > >   if (user_freq)
> > >   opts->freq = opts->user_freq;
> > > --
> > > 2.31.0.208.g409f899ff0-goog
> > >
> >
> > --
> >
> > - Arnaldo
> >

-- 

- Arnaldo


Re: [PATCH] tools: perf: util: Remove duplicate struct declaration

2021-04-02 Thread Arnaldo Carvalho de Melo
Em Thu, Apr 01, 2021 at 04:19:38PM +0900, Namhyung Kim escreveu:
> Hello,
> 
> On Thu, Apr 1, 2021 at 3:25 PM Wan Jiabing  wrote:
> >
> > struct target is declared twice. One has been declared
> > at 21st line. Remove the duplicate.
> >
> > Signed-off-by: Wan Jiabing 
> 
> Acked-by: Namhyung Kim 
> 
> I think we can move all the forward declarations to the top
> (and sort them) as well.

Thanks, applied.

- Arnaldo



Re: [PATCH] perf record: Disallow -c and -F option at the same time

2021-04-02 Thread Arnaldo Carvalho de Melo
Em Fri, Apr 02, 2021 at 06:40:20PM +0900, Namhyung Kim escreveu:
> It's confusing which one is effective when both options are given.
> The current code happens to use -c in this case but users might not be
> aware of it.  We can change it to complain about that instead of
> relying on the implicit priority.
> 
> Before:
>   $ perf record -c 11 -F 99 true
>   [ perf record: Woken up 1 times to write data ]
>   [ perf record: Captured and wrote 0.031 MB perf.data (8 samples) ]
> 
>   $ perf evlist -F
>   cycles: sample_period=11
> 
> After:
>   $ perf record -c 11 -F 99 true
>   cannot set frequency and period at the same time
> 
> So this change can break existing usages, but I think it's rare to
> have both options and it'd be better changing them.

Humm, perhaps we can just make that a warning stating that -c is used
if both are specified?

$ perf record -c 11 -F 99 true
Frequency and period can't be used at the same time, -c 11 will be used.

- Arnaldo
 
> Suggested-by: Alexey Alexandrov 
> Signed-off-by: Namhyung Kim 
> ---
>  tools/perf/util/record.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/perf/util/record.c b/tools/perf/util/record.c
> index f99852d54b14..43e5b563dee8 100644
> --- a/tools/perf/util/record.c
> +++ b/tools/perf/util/record.c
> @@ -157,9 +157,15 @@ static int get_max_rate(unsigned int *rate)
>  static int record_opts__config_freq(struct record_opts *opts)
>  {
>   bool user_freq = opts->user_freq != UINT_MAX;
> + bool user_interval = opts->user_interval != ULLONG_MAX;
>   unsigned int max_rate;
>  
> - if (opts->user_interval != ULLONG_MAX)
> + if (user_interval && user_freq) {
> + pr_err("cannot set frequency and period at the same time\n");
> + return -1;
> + }
> +
> + if (user_interval)
>   opts->default_interval = opts->user_interval;
>   if (user_freq)
>   opts->freq = opts->user_freq;
> -- 
> 2.31.0.208.g409f899ff0-goog
> 

-- 

- Arnaldo
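
The two behaviours discussed in this thread (error out, or warn and keep -c)
can be compared side by side in a small standalone sketch. It mirrors the
sentinel checks visible in the quoted diff (UINT_MAX and ULLONG_MAX meaning
"not given"), but struct demo_record_opts, demo_config_freq() and the
'strict' switch are hypothetical, and the warning text is only an example.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

/* Reduced, hypothetical stand-in for the option fields used in the diff. */
struct demo_record_opts {
	unsigned int user_freq;           /* UINT_MAX: -F was not given */
	unsigned long long user_interval; /* ULLONG_MAX: -c was not given */
	unsigned int freq;
	unsigned long long default_interval;
};

/*
 * strict == true:  reject -c together with -F (the patch's behaviour).
 * strict == false: warn and let -c win (the alternative suggested above).
 */
int demo_config_freq(struct demo_record_opts *opts, bool strict)
{
	bool user_freq = opts->user_freq != UINT_MAX;
	bool user_interval = opts->user_interval != ULLONG_MAX;

	if (user_interval && user_freq) {
		if (strict) {
			fprintf(stderr, "cannot set frequency and period at the same time\n");
			return -1;
		}
		fprintf(stderr, "warning: both -c and -F given, using -c %llu\n",
			opts->user_interval);
		user_freq = false; /* ignore -F, keep the period */
	}

	if (user_interval)
		opts->default_interval = opts->user_interval;
	if (user_freq)
		opts->freq = opts->user_freq;

	return 0;
}
```

The strict variant fails early, which is what automation that never reads
warnings needs; the lenient variant keeps existing scripts working while
making the implicit priority of -c visible.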


Re: [PATCH] perf inject: Fix repipe usage

2021-04-02 Thread Arnaldo Carvalho de Melo
Em Thu, Apr 01, 2021 at 04:05:13PM +0200, Jiri Olsa escreveu:
> On Thu, Apr 01, 2021 at 01:36:05PM +0300, Adrian Hunter wrote:
> > Since commit 14d3d5405253 ("perf session: Try to read pipe data from file")
> > perf inject has started printing "PERFILE2h" when not processing pipes.
> > 
> > The commit exposed perf to the possibility that the input is not a pipe but
> > the 'repipe' parameter gets used. That causes the printing because perf
> > inject sets 'repipe' to true always.
> > 
> > The 'repipe' parameter of perf_session__new() is used by 2 functions:
> > - perf_file_header__read_pipe()
> > - trace_report()
> > In both cases, the functions copy data to STDOUT_FILENO when 'repipe' is
> > true.
> > 
> > Fix by setting 'repipe' to true only if the output is a pipe.
> > 
> > Fixes: e558a5bd8b74 ("perf inject: Work with files")
> > Signed-off-by: Adrian Hunter 
> 
> Acked-by: Jiri Olsa 

Thanks, applied.

- Arnaldo



Re: [PATCH] perf annotate: add --demangle and --demangle-kernel

2021-03-30 Thread Arnaldo Carvalho de Melo
Em Tue, Mar 30, 2021 at 08:19:10PM +0200, Martin Liška escreveu:
> On 3/30/21 5:42 PM, Arnaldo Carvalho de Melo wrote:
> > Trying to find V2

You said you would resend fixing up this:

+   OPT_BOOLEAN(0, "demangle", _conf.demangle,
+   "Disable symbol demangling"),
 ^^^



+   OPT_BOOLEAN(0, "demangle-kernel", _conf.demangle_kernel,
+   "Enable kernel symbol demangling"),
 
> It's this email:
> https://lore.kernel.org/lkml/deb2af9e-25dd-ac72-29f4-ab90c2b24...@suse.cz/
> 
> Subject: Re: [PATCH] perf config: add annotate.demangle{,_kernel}
> From:   =?UTF-8?Q?Martin_Li=c5=a1ka?= 
> To: Arnaldo Carvalho de Melo 
> Cc: linux-kernel@vger.kernel.org, linux-perf-us...@vger.kernel.org
> References: 
>   
> Message-ID: 
> Date:   Fri, 26 Feb 2021 11:08:12 +0100
> 
> 
> Cheers,
> Martin

-- 

- Arnaldo
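
The point about the help string is easier to see with the usual boolean
option convention spelled out: the plain option sets the flag and the "no-"
form clears it. The sketch below is a hypothetical, much reduced parser
(demo_parse_bool_opt() is not part of perf's option code) that follows that
convention.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Match "--<name>" or "--no-<name>". On a match, store true or false in
 * *value and return true; return false when 'arg' is some other option.
 */
bool demo_parse_bool_opt(const char *arg, const char *name, bool *value)
{
	if (strncmp(arg, "--", 2) != 0)
		return false;
	arg += 2;

	if (strcmp(arg, name) == 0) {
		*value = true;  /* --demangle enables */
		return true;
	}

	if (strncmp(arg, "no-", 3) == 0 && strcmp(arg + 3, name) == 0) {
		*value = false; /* --no-demangle disables */
		return true;
	}

	return false;
}
```

Under this convention a help text of "Disable symbol demangling" on the
"demangle" option describes the opposite of what passing the option does.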


Re: [PATCH] perf tools: Preserve identifier id in OCaml demangler

2021-03-30 Thread Arnaldo Carvalho de Melo
Em Fri, Feb 26, 2021 at 02:52:23AM -0500, Fabian Hemmer escreveu:
> Some OCaml developers reported that this bit of information is sometimes
> useful for disambiguating functions for which the OCaml compiler assigns
> the same name, e.g. nested or inlined functions.

Sorry for the delay in processing, applied.

- Arnaldo
 
> Signed-off-by: Fabian Hemmer 
> ---
>  tools/perf/tests/demangle-ocaml-test.c |  6 +++---
>  tools/perf/util/demangle-ocaml.c   | 12 
>  2 files changed, 3 insertions(+), 15 deletions(-)
> 
> diff --git a/tools/perf/tests/demangle-ocaml-test.c b/tools/perf/tests/demangle-ocaml-test.c
> index a273ed5163d7..1d232c2e2190 100644
> --- a/tools/perf/tests/demangle-ocaml-test.c
> +++ b/tools/perf/tests/demangle-ocaml-test.c
> @@ -19,11 +19,11 @@ int test__demangle_ocaml(struct test *test __maybe_unused, int subtest __maybe_u
>   { "main",
> NULL },
>   { "camlStdlib__array__map_154",
> -   "Stdlib.array.map" },
> +   "Stdlib.array.map_154" },
>   { "camlStdlib__anon_fn$5bstdlib$2eml$3a334$2c0$2d$2d54$5d_1453",
> -   "Stdlib.anon_fn[stdlib.ml:334,0--54]" },
> +   "Stdlib.anon_fn[stdlib.ml:334,0--54]_1453" },
>   { "camlStdlib__bytes__$2b$2b_2205",
> -   "Stdlib.bytes.++" },
> +   "Stdlib.bytes.++_2205" },
>   };
>  
>   for (i = 0; i < sizeof(test_cases) / sizeof(test_cases[0]); i++) {
> diff --git a/tools/perf/util/demangle-ocaml.c b/tools/perf/util/demangle-ocaml.c
> index 3df14e67c622..9d707bb60b4b 100644
> --- a/tools/perf/util/demangle-ocaml.c
> +++ b/tools/perf/util/demangle-ocaml.c
> @@ -64,17 +64,5 @@ ocaml_demangle_sym(const char *sym)
>   }
>   result[j] = '\0';
>  
> - /* scan backwards to remove an "_" followed by decimal digits */
> - if (j != 0 && isdigit(result[j - 1])) {
> - while (--j) {
> - if (!isdigit(result[j])) {
> - break;
> - }
> - }
> - if (result[j] == '_') {
> - result[j] = '\0';
> - }
> - }
> -
>   return result;
>  }
> -- 
> 2.30.1
> 

-- 

- Arnaldo
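
To make the behaviour change concrete, here is a standalone sketch of the
kind of suffix stripping the OCaml patch removes: dropping a trailing "_"
followed by decimal digits. demo_strip_trailing_id() is a hypothetical
rewrite for illustration, not the code from demangle-ocaml.c.

```c
#include <ctype.h>
#include <string.h>

/* Remove a trailing "_<digits>" suffix from 's' in place, if present. */
void demo_strip_trailing_id(char *s)
{
	size_t j = strlen(s);

	/* Nothing to do unless the string ends in a digit. */
	if (j == 0 || !isdigit((unsigned char)s[j - 1]))
		return;

	/* Walk back over the run of trailing digits. */
	while (j > 0 && isdigit((unsigned char)s[j - 1]))
		j--;

	/* Only strip when the digits are preceded by an underscore. */
	if (j > 0 && s[j - 1] == '_')
		s[j - 1] = '\0';
}
```

With stripping, "Stdlib.array.map_154" becomes "Stdlib.array.map"; after the
patch the demangler keeps the "_154" part, so two functions that the
compiler gave the same name stay distinguishable.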


Re: [PATCH] perf annotate: add --demangle and --demangle-kernel

2021-03-30 Thread Arnaldo Carvalho de Melo
Em Tue, Mar 30, 2021 at 09:41:33AM +0200, Martin Liška escreveu:
> PING^2
> 
> On 3/7/21 8:23 PM, Martin Liška wrote:
> > Hello.
> > 
> > May I please remind this patch. Apparently, you applied the perf-config
> > counterpart of the patch as 804fd30c6bd9aec7859a0503581312834fb197f1
> > (in tmp.perf/core branch), but we miss setting the same via options.
> > 
> > Thank you,
> > Martin
> > 
> > On 2/26/21 11:01 AM, Martin Liška wrote:
> > > On 2/23/21 8:49 PM, Arnaldo Carvalho de Melo wrote:
> > > > Em Mon, Feb 22, 2021 at 09:29:22AM +0100, Martin Liška escreveu:
> > > > > Perf annotate supports --symbol but it's impossible to filter
> > > > > a C++ symbol. With --no-demangle one can filter easily by
> > > > > mangled function name.
> > > > > 
> > > > > Signed-off-by: Martin Liška 
> > > > > ---
> > > > >   tools/perf/Documentation/perf-annotate.txt | 7 +++
> > > > >   tools/perf/builtin-annotate.c  | 4 
> > > > >   2 files changed, 11 insertions(+)
> > > > > 
> > > > > diff --git a/tools/perf/Documentation/perf-annotate.txt 
> > > > > b/tools/perf/Documentation/perf-annotate.txt
> > > > > index 1b5042f134a8..80c1be5d566c 100644
> > > > > --- a/tools/perf/Documentation/perf-annotate.txt
> > > > > +++ b/tools/perf/Documentation/perf-annotate.txt
> > > > > @@ -124,6 +124,13 @@ OPTIONS
> > > > >   --group::
> > > > >   Show event group information together
> > > > > +--demangle::
> > > > > +    Demangle symbol names to human readable form. It's enabled by 
> > > > > default,
> > > > > +    disable with --no-demangle.
> > > > > +
> > > > > +--demangle-kernel::
> > > > > +    Demangle kernel symbol names to human readable form (for C++ 
> > > > > kernels).
> > > > > +
> > > > >   --percent-type::
> > > > >   Set annotation percent type from following choices:
> > > > >     global-period, local-period, global-hits, local-hits
> > > > > diff --git a/tools/perf/builtin-annotate.c 
> > > > > b/tools/perf/builtin-annotate.c
> > > > > index a23ba6bb99b6..ef70a17b9b5b 100644
> > > > > --- a/tools/perf/builtin-annotate.c
> > > > > +++ b/tools/perf/builtin-annotate.c
> > > > > @@ -538,6 +538,10 @@ int cmd_annotate(int argc, const char **argv)
> > > > >   "Strip first N entries of source file path name in 
> > > > > programs (with --prefix)"),
> > > > >   OPT_STRING(0, "objdump", _path, "path",
> > > > >  "objdump binary to use for disassembly and annotations"),
> > > > > +    OPT_BOOLEAN(0, "demangle", _conf.demangle,
> > > > > +    "Disable symbol demangling"),
> > > > 
> > > > Nope, this _enables_ demangling, i.e.:
> > > > 
> > > > perf annotate --demangle
> > > 
> > > Oh, yeah, you are right.
> > > 
> > > > 
> > > > Asks for symbol demangling, while:
> > > > 
> > > > perf annotate --no-demangle
> > > > 
> > > > As you correctly wrote in your commit message and on the
> > > > --demangle-kernel case, disables demangling.
> > > 
> > > Fixed that in V2.

Trying to find V2

- Arnaldo


[GIT PULL] perf tools changes for v5.12: 2nd batch

2021-03-28 Thread Arnaldo Carvalho de Melo
Hi Linus,

Please consider pulling,

Best regards,

- Arnaldo

Tests at the end of this message:

The following changes since commit 1e28eed17697bcf343c6743f0028cc3b5dd88bf0:

  Linux 5.12-rc3 (2021-03-14 14:41:02 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-tools-fixes-for-v5.12-2020-03-28

for you to fetch changes up to 1dc481c0b0cf18d3952d93a73c4ece90dec277f0:

  perf test: Change to use bash for daemon test (2021-03-26 08:56:57 -0300)


Some more perf tools fixes for v5.12:

- Avoid write of uninitialized memory when generating PERF_RECORD_MMAP* records.

- Fix 'perf top' BPF support related crash with perf_event_paranoid=3 +
  kptr_restrict.

- Validate raw event with sysfs exported format bits.

- Fix waitpid on SIGCHLD delivery bugs in 'perf daemon'.

- Change to use bash for daemon test on Debian, where the default is dash and
  thus fails for use of bashisms in this test.

- Fix memory leak in vDSO found using ASAN.

- Remove now useless (due to the fact that BPF now supports static vars)
  failing sub test "BPF relocation checker".

- Fix auxtrace queue conflict.

- Sync linux/kvm.h with the kernel sources.

Signed-off-by: Arnaldo Carvalho de Melo 


Adrian Hunter (1):
  perf auxtrace: Fix auxtrace queue conflict

Arnaldo Carvalho de Melo (2):
  tools headers UAPI: Sync linux/kvm.h with the kernel sources
  Merge remote-tracking branch 'torvalds/master' into perf/urgent

Ian Rogers (1):
  perf synthetic events: Avoid write of uninitialized memory when generating PERF_RECORD_MMAP* records

Jackie Liu (1):
  perf top: Fix BPF support related crash with perf_event_paranoid=3 + kptr_restrict

Jin Yao (1):
  perf pmu: Validate raw event with sysfs exported format bits

Jiri Olsa (2):
  perf daemon: Force waipid for all session on SIGCHLD delivery
  perf daemon: Return from kill functions

Leo Yan (1):
  perf test: Change to use bash for daemon test

Namhyung Kim (1):
  perf record: Fix memory leak in vDSO found using ASAN

Thomas Richter (2):
  perf synthetic-events: Fix uninitialized 'kernel_thread' variable
  perf test: Remove now useless failing sub test "BPF relocation checker"

 tools/include/uapi/linux/kvm.h | 13 +
 tools/perf/builtin-daemon.c| 57 ++
 tools/perf/tests/bpf.c |  9 +-
 tools/perf/tests/shell/daemon.sh   |  2 +-
 tools/perf/util/auxtrace.c |  4 ---
 tools/perf/util/bpf-event.c| 13 +++--
 tools/perf/util/parse-events.c |  3 ++
 tools/perf/util/pmu.c  | 33 ++
 tools/perf/util/pmu.h  |  3 ++
 tools/perf/util/synthetic-events.c | 11 
 tools/perf/util/vdso.c |  2 ++
 11 files changed, 105 insertions(+), 45 deletions(-)

Test results:

The first ones are container based builds of tools/perf with and without libelf
support.  Where clang is available, it is also used to build perf with/without
libelf, and building with LIBCLANGLLVM=1 (built-in clang) with gcc and clang
when clang and its devel libraries are installed.

The objtool and samples/bpf/ builds are disabled now that I'm switching from
using the sources in a local volume to fetching them from an HTTP server to
build them inside the container, to make it easier to build in a container
cluster. Those will come back later.

Several are cross builds, the ones with -x-ARCH and the android one, and those
may not have all the features built, due to lack of multi-arch devel packages,
available and being used so far on just a few, like
debian:experimental-x-{arm64,mipsel}.

The 'perf test' one will perform a variety of tests exercising
tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as run perf commands
with a variety of command line event specifications to then intercept the
sys_perf_event syscall to check that the perf_event_attr fields are set up as
expected, among a variety of other unit tests.

Then there is the 'make -C tools/perf build-test' ones, that build tools/perf/
with a variety of feature sets, exercising the build with an incomplete set of
features as well as with a complete one. It is planned to have it run on each
of the containers mentioned above, using some container orchestration
infrastructure. Get in contact if interested in helping having this in place.

  $ grep "model name" -m1 /proc/cpuinfo 
  model name: AMD Ryzen 9 3900X 12-Core Processor
   180.82 alpine:3.4: Ok   gcc (Alpine 5.3.0) 5.3.0 , 
clang version 3.8.0 (tags/RELEASE_380/final)
   278.88 alpine:3.5: Ok   gcc (Alpine 6.2.1) 6.2.1 
20160822 , clang version 3.8.1 (tags/RELEASE_381/final)
   383.59 alpine:3.6: Ok   gcc (Alpine 6.3.0

Re: [PATCH v7] perf annotate: Fix sample events lost in stdio mode

2021-03-26 Thread Arnaldo Carvalho de Melo
Em Fri, Mar 26, 2021 at 12:25:37PM +0900, Namhyung Kim escreveu:
> On Fri, Mar 26, 2021 at 11:24 AM Yang Jihong  wrote:
> > On 2021/3/19 20:35, Yang Jihong wrote:
> > > > In the hists__find_annotations() function, since different hist_entry
> > > > instances may point to the same symbol, we free notes->src to signal
> > > > that this symbol was already processed in stdio mode; when annotating,
> > > > an entry is skipped if notes->src is NULL, to avoid repeated output.
> > >
> > > However, there is a problem, for example, run the following command:
> > >
> > >   # perf record -e branch-misses -e branch-instructions -a sleep 1
> > >
> > > perf.data file contains different types of sample event.
> > >
> > > > If a sample with the same IP exists in both branch-misses and
> > > > branch-instructions, both samples resolve to the same symbol. When
> > > > annotating the branch-misses events, the notes->src corresponding to
> > > > that symbol is set to NULL; as a result, when annotating the
> > > > branch-instructions events, the entry is skipped and no annotation is
> > > > output.
> > >
> > > Solution of this patch is to remove zfree in hists__find_annotations and
> > > change sort order to "dso,symbol" to avoid duplicate output when different
> > > processes correspond to the same symbol.

> > > Signed-off-by: Yang Jihong 

> Acked-by: Namhyung Kim 

Without looking at the patch, just at its description of the problem, I
tried to annotate two events in a group, to get the annotate group view
output with both events, and it seems I'm getting samples accounted for
both events:

[root@five ~]# perf record -e '{branch-misses,branch-instructions}' -a sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.296 MB perf.data (2507 samples) ]
[root@five ~]#
[root@five ~]# perf report | grep -v '^#' | head -5
17.49%  19.19%  ThreadPoolForeg  chromium-browser  [.] 
v8::internal::ConcurrentMarking::Run
12.17%  17.04%  ThreadPoolForeg  chromium-browser  [.] 
v8::internal::Sweeper::RawSweep
11.14%  11.63%  ThreadPoolForeg  chromium-browser  [.] 
v8::internal::MarkingVisitorBase::ProcessStrongHeapObject
 7.65%   7.84%  ThreadPoolForeg  chromium-browser  [.] 
v8::internal::ConcurrentMarkingVisitor::ShouldVisit
 5.66%   6.72%  ThreadPoolForeg  chromium-browser  [.] 
v8::internal::ConcurrentMarkingVisitor::VisitPointersInSnapshot

[root@five ~]# perf annotate --stdio2 v8::internal::ConcurrentMarking::Run
Samples: 444  of events 'anon group { branch-misses, branch-instructions }', 
4000 Hz, Event count (approx.): 596221, [percent: local period]
v8::internal::ConcurrentMarking::Run() 
/usr/lib64/chromium-browser/chromium-browser
Percent  
 
 
 Disassembly of section .text:
 
 03290b30 
:
 v8::internal::ConcurrentMarking::Run(v8::JobDelegate*, 
unsigned int, bool):
   push   %rbp
   mov%rsp,%rbp   
   push   %r15
   push   %r14
   mov%rdi,%r14   
   push   %r13
   mov%edx,%r13d  
   push   %r12
   mov%ecx,%r12d  
   push   %rbx
   sub$0x1298,%rsp
   mov%rsi,-0x1228(%rbp)
   mov%fs:0x28,%rax
   mov%rax,-0x38(%rbp)
 
   movzwl 0x2(%rbx),%eax
   test   %ax,%ax 
 ↓ jne4a9 
   mov-0x10e8(%rbp),%rdx
   cmpw   $0x0,0x2(%rdx)
  0.41   0.39↓ je 4b90
   movq   %rbx,%xmm0  
   movq   %rdx,%xmm2  
   mov%rdx,%rbx   
   punpcklqdq %xmm2,%xmm0 
   movups %xmm0,-0x10e8(%rbp)
   movzwl 0x2(%rdx),%eax
4a9:   sub$0x1,%eax   
   mov%ax,0x2(%rbx)
  0.36   0.91  movzwl %ax,%eax
  0.60   0.00  mov0x10(%rbx,%rax,8),%rax
  3.44   2.46  mov%rax,-0x11e0(%rbp)
  0.00   0.34   4bf:   mov0x8(%r13),%rax
  0.00   0.36  add$0x1,%r15d  
  0.00   0.34  mov0x110(%rax),%rax
   mov0x128(%rax),%rcx
  0.88   0.36  mov0x8(%r13),%rax
   mov0x110(%rax),%rdx
   mov0x130(%rdx),%rdx
  0.00   0.48  mov0x140(%rax),%rax
   mov0x110(%rax),%rsi
  0.61  

Re: [PATCH V2 1/5] powerpc/perf: Expose processor pipeline stage cycles using PERF_SAMPLE_WEIGHT_STRUCT

2021-03-25 Thread Arnaldo Carvalho de Melo
Em Wed, Mar 24, 2021 at 10:05:23AM +0530, Madhavan Srinivasan escreveu:
> 
> On 3/22/21 8:27 PM, Athira Rajeev wrote:
> > Performance Monitoring Unit (PMU) registers in powerpc provide
> > information on cycles elapsed between different stages in the
> > pipeline. This can be used for application tuning. On ISA v3.1
> > platforms, this information is exposed by sampling registers.
> > This patch adds kernel support to capture two of the cycle counters
> > as part of a perf sample using the sample type:
> > PERF_SAMPLE_WEIGHT_STRUCT.
> > 
> > The power PMU function 'get_mem_weight' currently uses the 64-bit weight
> > field of perf_sample_data to capture memory latency. But following the
> > introduction of PERF_SAMPLE_WEIGHT_TYPE, the weight field could contain
> > a 64-bit or 32-bit value depending on the architecture support for
> > PERF_SAMPLE_WEIGHT_STRUCT. The patches use WEIGHT_STRUCT to expose the
> > pipeline stage cycles info. Hence update the ppmu functions to work for
> > 64-bit and 32-bit weight values.
> > 
> > If the sample type is PERF_SAMPLE_WEIGHT, use the 64-bit weight field.
> > If the sample type is PERF_SAMPLE_WEIGHT_STRUCT, memory subsystem
> > latency is stored in the low 32 bits of the perf_sample_weight structure.
> > Also for CPU_FTR_ARCH_31, capture the two cycle counter values in
> > two 16-bit fields of the perf_sample_weight structure.
> 
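For readers following the layout described in the changelog, here is a minimal userspace sketch of the two storage modes. It is not the kernel code: the union below is a local stand-in that assumes the little-endian field order, and the names decode_latency(), store_weight(), cyc_a and cyc_b are made up for illustration. The latency decode (mantissa << (2 * exp)) is taken from the quoted hunk; the special-casing of 'val' in the patch is left out because its definition is not visible here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Local stand-in for union perf_sample_weight, little-endian field
 * order assumed. Illustration only, not the uapi header.
 */
union sample_weight {
	uint64_t full;			/* PERF_SAMPLE_WEIGHT: whole 64 bits */
	struct {			/* PERF_SAMPLE_WEIGHT_STRUCT */
		uint32_t var1_dw;	/* memory latency */
		uint16_t var2_w;	/* first cycle counter */
		uint16_t var3_w;	/* second cycle counter */
	};
};

/* Latency as computed in the quoted hunk: mantissa << (2 * exp). */
uint64_t decode_latency(uint64_t mantissa, unsigned int exp)
{
	assert(exp < 32);		/* keep the shift count below 64 */
	return mantissa << (2 * exp);
}

/*
 * use_struct == 0: store the latency in the full 64-bit field.
 * use_struct != 0: truncate the latency to 32 bits and fill the
 * two 16-bit fields with the (hypothetical) cycle counter values.
 */
void store_weight(union sample_weight *w, int use_struct,
		  uint64_t lat, uint16_t cyc_a, uint16_t cyc_b)
{
	if (!use_struct) {
		w->full = lat;
		return;
	}
	w->var1_dw = (uint32_t)lat;
	w->var2_w = cyc_a;
	w->var3_w = cyc_b;
}
```

The point of the sketch is only that the same 8 bytes are interpreted two ways, so the callee needs the sample type to know which one to fill in.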
> Changes look fine to me.

Do you mean just the kernel part, or can I add your Reviewed-by to the
whole patchset?
 
> Reviewed-by: Madhavan Srinivasan 
> 
> 
> > Signed-off-by: Athira Rajeev 
> > ---
> >   arch/powerpc/include/asm/perf_event_server.h |  2 +-
> >   arch/powerpc/perf/core-book3s.c  |  4 ++--
> >   arch/powerpc/perf/isa207-common.c| 29 
> > +---
> >   arch/powerpc/perf/isa207-common.h|  6 +-
> >   4 files changed, 34 insertions(+), 7 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/perf_event_server.h 
> > b/arch/powerpc/include/asm/perf_event_server.h
> > index 00e7e671bb4b..112cf092d7b3 100644
> > --- a/arch/powerpc/include/asm/perf_event_server.h
> > +++ b/arch/powerpc/include/asm/perf_event_server.h
> > @@ -43,7 +43,7 @@ struct power_pmu {
> > u64 alt[]);
> > void(*get_mem_data_src)(union perf_mem_data_src *dsrc,
> > u32 flags, struct pt_regs *regs);
> > -   void(*get_mem_weight)(u64 *weight);
> > +   void(*get_mem_weight)(u64 *weight, u64 type);
> > unsigned long   group_constraint_mask;
> > unsigned long   group_constraint_val;
> > u64 (*bhrb_filter_map)(u64 branch_sample_type);
> > diff --git a/arch/powerpc/perf/core-book3s.c 
> > b/arch/powerpc/perf/core-book3s.c
> > index 766f064f00fb..6936763246bd 100644
> > --- a/arch/powerpc/perf/core-book3s.c
> > +++ b/arch/powerpc/perf/core-book3s.c
> > @@ -2206,9 +2206,9 @@ static void record_and_restart(struct perf_event 
> > *event, unsigned long val,
> > ppmu->get_mem_data_src)
> > ppmu->get_mem_data_src(_src, ppmu->flags, 
> > regs);
> > -   if (event->attr.sample_type & PERF_SAMPLE_WEIGHT &&
> > +   if (event->attr.sample_type & PERF_SAMPLE_WEIGHT_TYPE &&
> > ppmu->get_mem_weight)
> > -   ppmu->get_mem_weight();
> > +   ppmu->get_mem_weight(, 
> > event->attr.sample_type);
> > if (perf_event_overflow(event, , regs))
> > power_pmu_stop(event, 0);
> > diff --git a/arch/powerpc/perf/isa207-common.c 
> > b/arch/powerpc/perf/isa207-common.c
> > index e4f577da33d8..5dcbdbd54598 100644
> > --- a/arch/powerpc/perf/isa207-common.c
> > +++ b/arch/powerpc/perf/isa207-common.c
> > @@ -284,8 +284,10 @@ void isa207_get_mem_data_src(union perf_mem_data_src 
> > *dsrc, u32 flags,
> > }
> >   }
> > -void isa207_get_mem_weight(u64 *weight)
> > +void isa207_get_mem_weight(u64 *weight, u64 type)
> >   {
> > +   union perf_sample_weight *weight_fields;
> > +   u64 weight_lat;
> > u64 mmcra = mfspr(SPRN_MMCRA);
> > u64 exp = MMCRA_THR_CTR_EXP(mmcra);
> > u64 mantissa = MMCRA_THR_CTR_MANT(mmcra);
> > @@ -296,9 +298,30 @@ void isa207_get_mem_weight(u64 *weight)
> > mantissa = P10_MMCRA_THR_CTR_MANT(mmcra);
> > if (val == 0 || val == 7)
> > -   *weight = 0;
> > +   weight_lat = 0;
> > else
> > -   *weight = mantissa << (2 * exp);
> > +   weight_lat = mantissa << (2 * exp);
> > +
> > +   /*
> > +* Use 64 bit weight field (full) if sample type is
> > +* WEIGHT.
> > +*
> > +* if sample type is WEIGHT_STRUCT:
> > +* - store memory latency in the lower 32 bits.
> > +* - For ISA v3.1, use remaining two 16 bit fields of
> > +*   perf_sample_weight to store cycle counter values
> > +*   from sier2.
> > +*/
> > +   weight_fields = (union perf_sample_weight 

Re: [PATCH V2 1/5] powerpc/perf: Expose processor pipeline stage cycles using PERF_SAMPLE_WEIGHT_STRUCT

2021-03-25 Thread Arnaldo Carvalho de Melo
Em Wed, Mar 24, 2021 at 10:05:23AM +0530, Madhavan Srinivasan escreveu:
> 
> On 3/22/21 8:27 PM, Athira Rajeev wrote:
> > Performance Monitoring Unit (PMU) registers in powerpc provide
> > information on cycles elapsed between different stages in the
> > pipeline. This can be used for application tuning. On ISA v3.1
> > platforms, this information is exposed by sampling registers.
> > This patch adds kernel support to capture two of the cycle counters
> > as part of a perf sample using the sample type:
> > PERF_SAMPLE_WEIGHT_STRUCT.
> > 
> > The power PMU function 'get_mem_weight' currently uses the 64-bit weight
> > field of perf_sample_data to capture memory latency. But following the
> > introduction of PERF_SAMPLE_WEIGHT_TYPE, the weight field could contain
> > a 64-bit or 32-bit value depending on the architecture support for
> > PERF_SAMPLE_WEIGHT_STRUCT. The patches use WEIGHT_STRUCT to expose the
> > pipeline stage cycles info. Hence update the ppmu functions to work for
> > 64-bit and 32-bit weight values.
> > 
> > If the sample type is PERF_SAMPLE_WEIGHT, use the 64-bit weight field.
> > If the sample type is PERF_SAMPLE_WEIGHT_STRUCT, memory subsystem
> > latency is stored in the low 32 bits of the perf_sample_weight structure.
> > Also for CPU_FTR_ARCH_31, capture the two cycle counter values in
> > two 16-bit fields of the perf_sample_weight structure.
> 
> Changes look fine to me.
> 
> Reviewed-by: Madhavan Srinivasan 

So who will process the kernel bits? I'm merging the tooling parts,

Thanks,

- Arnaldo
 
> 
> > Signed-off-by: Athira Rajeev 
> > ---
> >   arch/powerpc/include/asm/perf_event_server.h |  2 +-
> >   arch/powerpc/perf/core-book3s.c  |  4 ++--
> >   arch/powerpc/perf/isa207-common.c| 29 
> > +---
> >   arch/powerpc/perf/isa207-common.h|  6 +-
> >   4 files changed, 34 insertions(+), 7 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/perf_event_server.h 
> > b/arch/powerpc/include/asm/perf_event_server.h
> > index 00e7e671bb4b..112cf092d7b3 100644
> > --- a/arch/powerpc/include/asm/perf_event_server.h
> > +++ b/arch/powerpc/include/asm/perf_event_server.h
> > @@ -43,7 +43,7 @@ struct power_pmu {
> > u64 alt[]);
> > void(*get_mem_data_src)(union perf_mem_data_src *dsrc,
> > u32 flags, struct pt_regs *regs);
> > -   void(*get_mem_weight)(u64 *weight);
> > +   void(*get_mem_weight)(u64 *weight, u64 type);
> > unsigned long   group_constraint_mask;
> > unsigned long   group_constraint_val;
> > u64 (*bhrb_filter_map)(u64 branch_sample_type);
> > diff --git a/arch/powerpc/perf/core-book3s.c 
> > b/arch/powerpc/perf/core-book3s.c
> > index 766f064f00fb..6936763246bd 100644
> > --- a/arch/powerpc/perf/core-book3s.c
> > +++ b/arch/powerpc/perf/core-book3s.c
> > @@ -2206,9 +2206,9 @@ static void record_and_restart(struct perf_event 
> > *event, unsigned long val,
> > ppmu->get_mem_data_src)
> > ppmu->get_mem_data_src(_src, ppmu->flags, 
> > regs);
> > -   if (event->attr.sample_type & PERF_SAMPLE_WEIGHT &&
> > +   if (event->attr.sample_type & PERF_SAMPLE_WEIGHT_TYPE &&
> > ppmu->get_mem_weight)
> > -   ppmu->get_mem_weight();
> > +   ppmu->get_mem_weight(, 
> > event->attr.sample_type);
> > if (perf_event_overflow(event, , regs))
> > power_pmu_stop(event, 0);
> > diff --git a/arch/powerpc/perf/isa207-common.c 
> > b/arch/powerpc/perf/isa207-common.c
> > index e4f577da33d8..5dcbdbd54598 100644
> > --- a/arch/powerpc/perf/isa207-common.c
> > +++ b/arch/powerpc/perf/isa207-common.c
> > @@ -284,8 +284,10 @@ void isa207_get_mem_data_src(union perf_mem_data_src 
> > *dsrc, u32 flags,
> > }
> >   }
> > -void isa207_get_mem_weight(u64 *weight)
> > +void isa207_get_mem_weight(u64 *weight, u64 type)
> >   {
> > +   union perf_sample_weight *weight_fields;
> > +   u64 weight_lat;
> > u64 mmcra = mfspr(SPRN_MMCRA);
> > u64 exp = MMCRA_THR_CTR_EXP(mmcra);
> > u64 mantissa = MMCRA_THR_CTR_MANT(mmcra);
> > @@ -296,9 +298,30 @@ void isa207_get_mem_weight(u64 *weight)
> > mantissa = P10_MMCRA_THR_CTR_MANT(mmcra);
> > if (val == 0 || val == 7)
> > -   *weight = 0;
> > +   weight_lat = 0;
> > else
> > -   *weight = mantissa << (2 * exp);
> > +   weight_lat = mantissa << (2 * exp);
> > +
> > +   /*
> > +* Use 64 bit weight field (full) if sample type is
> > +* WEIGHT.
> > +*
> > +* if sample type is WEIGHT_STRUCT:
> > +* - store memory latency in the lower 32 bits.
> > +* - For ISA v3.1, use remaining two 16 bit fields of
> > +*   perf_sample_weight to store cycle counter values
> > +*   from sier2.
> > +*/
> > +   weight_fields = (union 

Re: [PATCH] tools: perf: Remove duplicate includes

2021-03-25 Thread Arnaldo Carvalho de Melo
Em Tue, Mar 23, 2021 at 01:01:39PM +0800, Wan Jiabing escreveu:
> sys/stat.h has been included at line 23, so remove the
> duplicate one at line 27.
> linux/string.h has been included at line 7, so remove the
> duplicate one at line 9.
> time.h has been included at line 14, so remove the
> duplicate one at line 28.

Thanks, applied.

- Arnaldo

 
> Signed-off-by: Wan Jiabing 
> ---
>  tools/perf/builtin-daemon.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/tools/perf/builtin-daemon.c b/tools/perf/builtin-daemon.c
> index ace8772a4f03..632ecd010a4f 100644
> --- a/tools/perf/builtin-daemon.c
> +++ b/tools/perf/builtin-daemon.c
> @@ -6,7 +6,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
> @@ -24,8 +23,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
> -#include 
>  #include "builtin.h"
>  #include "perf.h"
>  #include "debug.h"
> -- 
> 2.25.1
> 

-- 

- Arnaldo


Re: [PATCH 1/2] perf/core: Share an event with multiple cgroups

2021-03-25 Thread Arnaldo Carvalho de Melo
Em Thu, Mar 25, 2021 at 12:55:50AM +, Song Liu escreveu:
> > On Mar 23, 2021, at 9:21 AM, Namhyung Kim  wrote:
> > #ifdef CONFIG_SECURITY
> > @@ -780,6 +792,14 @@ struct perf_event {
> > #endif /* CONFIG_PERF_EVENTS */
> > };

> > +struct perf_cgroup_node {
> > +   struct hlist_node   node;
> > +   u64 id;
> > +   u64 count;
> > +   u64 time_enabled;
> > +   u64 time_running;
> > +   u64 padding[2];
> 
> Do we really need the padding? For cache line alignment? 

I guess so, to get it to 64 bytes, then having it as:

struct perf_cgroup_node {
	struct hlist_node	node;
	u64			id;
	u64			count;
	u64			time_enabled;
	u64			time_running;
} cacheline_aligned;

Seems better :-)

Testing:

[acme@five c]$ cat cacheline_aligned.c
#ifndef cacheline_aligned
#define cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES)))
#endif

// from ../build/v5.12.0-rc4+/include/generated/autoconf.h
#define CONFIG_X86_L1_CACHE_SHIFT 6

#define L1_CACHE_SHIFT  (CONFIG_X86_L1_CACHE_SHIFT)
#define L1_CACHE_BYTES  (1 << L1_CACHE_SHIFT)

#ifndef SMP_CACHE_BYTES
#define SMP_CACHE_BYTES L1_CACHE_BYTES
#endif

typedef long long unsigned int u64;

struct hlist_node {
	struct hlist_node *        next;                 /*     0     8 */
	struct hlist_node * *      pprev;                /*     8     8 */

	/* size: 16, cachelines: 1, members: 2 */
	/* last cacheline: 16 bytes */
};

struct perf_cgroup_node {
	struct hlist_node	node;
	u64			id;
	u64			count;
	u64			time_enabled;
	u64			time_running;
} cacheline_aligned foo;

[acme@five c]$ cc  -g  -c -o cacheline_aligned.o cacheline_aligned.c
[acme@five c]$ pahole cacheline_aligned.o
struct hlist_node {
	struct hlist_node *        next;                 /*     0     8 */
	struct hlist_node * *      pprev;                /*     8     8 */

	/* size: 16, cachelines: 1, members: 2 */
	/* last cacheline: 16 bytes */
};
struct perf_cgroup_node {
	struct hlist_node          node;                 /*     0    16 */
	u64                        id;                   /*    16     8 */
	u64                        count;                /*    24     8 */
	u64                        time_enabled;         /*    32     8 */
	u64                        time_running;         /*    40     8 */

	/* size: 64, cachelines: 1, members: 5 */
	/* padding: 16 */
} __attribute__((__aligned__(64)));
[acme@five c]$
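
The same check can be made at compile time, so that a later change to the struct that pushes it past one cacheline fails the build instead of needing a pahole run. A minimal sketch, assuming a 64-byte cacheline; CACHELINE_BYTES is a name invented here, and the struct is a userspace copy with uint64_t standing in for u64:

```c
#include <stdint.h>

#define CACHELINE_BYTES 64	/* assumption: same value as SMP_CACHE_BYTES above */

struct hlist_node {
	struct hlist_node *next, **pprev;
};

/* Userspace copy of the proposed layout, without the explicit padding. */
struct perf_cgroup_node {
	struct hlist_node	node;
	uint64_t		id;
	uint64_t		count;
	uint64_t		time_enabled;
	uint64_t		time_running;
} __attribute__((__aligned__(CACHELINE_BYTES)));

/* The aligned attribute rounds sizeof() up to a multiple of the alignment. */
_Static_assert(sizeof(struct perf_cgroup_node) == CACHELINE_BYTES,
	       "perf_cgroup_node should occupy exactly one cacheline");
_Static_assert(_Alignof(struct perf_cgroup_node) == CACHELINE_BYTES,
	       "perf_cgroup_node should be cacheline aligned");
```

With the attribute the compiler supplies the 16 bytes of tail padding that the explicit padding[2] member provided by hand.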

- Arnaldo


Re: [PATCH] tools: perf: util: Remove duplicate declaration

2021-03-25 Thread Arnaldo Carvalho de Melo
Em Thu, Mar 25, 2021 at 12:39:34PM +0800, Wan Jiabing escreveu:
> struct evlist has been declared at 10th line.
> struct comm has been declared at 15th line.
> Remove the duplicate.

Thanks, applied.

- Arnaldo

 
> Signed-off-by: Wan Jiabing 
> ---
>  tools/perf/util/metricgroup.h  | 1 -
>  tools/perf/util/thread-stack.h | 1 -
>  2 files changed, 2 deletions(-)
> 
> diff --git a/tools/perf/util/metricgroup.h b/tools/perf/util/metricgroup.h
> index ed1b9392e624..026bbf416c48 100644
> --- a/tools/perf/util/metricgroup.h
> +++ b/tools/perf/util/metricgroup.h
> @@ -9,7 +9,6 @@
>  
>  struct evlist;
>  struct evsel;
> -struct evlist;
>  struct option;
>  struct rblist;
>  struct pmu_events_map;
> diff --git a/tools/perf/util/thread-stack.h b/tools/perf/util/thread-stack.h
> index 3bc47a42af8e..b3cd09beb62f 100644
> --- a/tools/perf/util/thread-stack.h
> +++ b/tools/perf/util/thread-stack.h
> @@ -16,7 +16,6 @@ struct comm;
>  struct ip_callchain;
>  struct symbol;
>  struct dso;
> -struct comm;
>  struct perf_sample;
>  struct addr_location;
>  struct call_path;
> -- 
> 2.25.1
> 

-- 

- Arnaldo


Re: [PATCH v3 04/21] x86/insn: Add an insn_decode() API

2021-03-24 Thread Arnaldo Carvalho de Melo
Em Wed, Mar 24, 2021 at 11:21:19AM -0700, Ian Rogers escreveu:
> On Wed, Mar 24, 2021 at 6:54 AM Borislav Petkov  wrote:
> >
> > On Wed, Mar 24, 2021 at 10:43:20AM -0300, Arnaldo Carvalho de Melo wrote:
> > > Borislav, was this addressed? Ian?
> >
> > Yap:
> >
> > https://git.kernel.org/tip/0705ef64d1ff52b817e278ca6e28095585ff31e1
> 
> Tested on PPC and ARM64 fwiw. Thanks,

Thank you guys for clearing this up,

- Arnaldo


Re: [PATCH] perf test: Change to use bash for daemon test

2021-03-24 Thread Arnaldo Carvalho de Melo
Em Wed, Mar 24, 2021 at 10:45:22AM +0900, Namhyung Kim escreveu:
> On Sat, Mar 20, 2021 at 7:46 PM Leo Yan  wrote:
> > [1] https://bugs.launchpad.net/ubuntu/+source/dash/+bug/139097
> >
> > Fixes: 2291bb915b55 ("perf tests: Add daemon 'list' command test")
> > Signed-off-by: Leo Yan 
> 
> Acked-by: Namhyung Kim 

Thanks, applied.

- Arnaldo



Re: [PATCH v3 04/21] x86/insn: Add an insn_decode() API

2021-03-24 Thread Arnaldo Carvalho de Melo
Em Tue, Mar 16, 2021 at 06:14:54PM -0700, Ian Rogers escreveu:
> On Thu, Mar 4, 2021 at 9:56 AM Borislav Petkov  wrote:
> > From: Borislav Petkov 
> >
> > Users of the instruction decoder should use this to decode instruction
> > bytes. For that, have insn*() helpers return an int value to denote
> > success/failure. When there's an error fetching the next insn byte and
> > the insn falls short, return -ENODATA to denote that.
> >
> > While at it, make insn_get_opcode() stricter as to whether what it has
> > seen so far is a valid insn or not.
> >
> > Copy linux/kconfig.h for the tools-version of the decoder so that it can
> > use IS_ENABLED().
> >
> > Also, cast the INSN_MODE_KERN dummy define value to (enum insn_mode)
> > for tools use of the decoder because perf tool builds with -Werror and
> > errors out with -Werror=sign-compare otherwise.
> >
> > Signed-off-by: Borislav Petkov 
> > Acked-by: Masami Hiramatsu 

> > +++ b/tools/arch/x86/lib/insn.c
> > @@ -11,10 +11,13 @@
> >  #else
> >  #include 
> >  #endif
> > -#include "../include/asm/inat.h" /* __ignore_sync_check__ */
> > -#include "../include/asm/insn.h" /* __ignore_sync_check__ */
> > +#include  /*__ignore_sync_check__ */
> > +#include  /* __ignore_sync_check__ */
> 
> Hi, this change is breaking non-x86 builds of perf for me in
> tip.git/master. The reason being that non-x86 builds compile the
> intel-pt-decoder, which includes this file, but don't have their
> include paths set to find tools/arch/x86. I think we want to keep the
> relative paths.

Borislav, was this addressed? Ian?

- Arnaldo


Re: [PATCH] MAINTAINERS: Add Mailing list and Web-page for PERFORMANCE EVENTS SUBSYSTEM

2021-03-24 Thread Arnaldo Carvalho de Melo
Em Mon, Mar 15, 2021 at 11:56:32AM +0800, Tiezhu Yang escreveu:
> Add entry "L: linux-perf-us...@vger.kernel.org" to archive the
> related mail on https://lore.kernel.org/linux-perf-users/, add
> entry "W: https://perf.wiki.kernel.org/" so that newbies could
> get some useful materials.

Thanks, applied.

- Arnaldo

 
> Signed-off-by: Tiezhu Yang 
> ---
>  MAINTAINERS | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index aa84121..e1626db 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -14021,8 +14021,10 @@ R:   Mark Rutland 
>  R:   Alexander Shishkin 
>  R:   Jiri Olsa 
>  R:   Namhyung Kim 
> +L:   linux-perf-us...@vger.kernel.org
>  L:   linux-kernel@vger.kernel.org
>  S:   Supported
> +W:   https://perf.wiki.kernel.org/
>  T:   git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf/core
>  F:   arch/*/events/*
>  F:   arch/*/events/*/*
> -- 
> 2.1.0
> 

-- 

- Arnaldo


Re: [PATCH] perf record: Fix memory leak in vDSO

2021-03-24 Thread Arnaldo Carvalho de Melo
Em Tue, Mar 16, 2021 at 02:50:48PM +0100, Jiri Olsa escreveu:
> On Tue, Mar 16, 2021 at 09:56:26AM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Tue, Mar 16, 2021 at 11:28:12AM +0900, Namhyung Kim escreveu:
> > > On Mon, Mar 15, 2021 at 10:28 PM Jiri Olsa  wrote:
> > > >
> > > > On Mon, Mar 15, 2021 at 01:56:41PM +0900, Namhyung Kim wrote:
> > > > > I got several memory leak reports from Asan with a simple command.  It
> > > > > was because VDSO is not released due to the refcount.  Like in
> > > > > __dsos_addnew_id(), it should put the refcount after adding to the 
> > > > > list.
> > > > >
> > > > >   $ perf record true
> > > > >   [ perf record: Woken up 1 times to write data ]
> > > > >   [ perf record: Captured and wrote 0.030 MB perf.data (10 samples) ]
> > > > >
> > > > >   =
> > > > >   ==692599==ERROR: LeakSanitizer: detected memory leaks
> > > > >
> > > > >   Direct leak of 439 byte(s) in 1 object(s) allocated from:
> > > > > #0 0x7fea52341037 in __interceptor_calloc 
> > > > > ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
> > > > > #1 0x559bce4aa8ee in dso__new_id util/dso.c:1256
> > > > > #2 0x559bce59245a in __machine__addnew_vdso util/vdso.c:132
> > > > > #3 0x559bce59245a in machine__findnew_vdso util/vdso.c:347
> > > > > #4 0x559bce50826c in map__new util/map.c:175
> > > > > #5 0x559bce503c92 in machine__process_mmap2_event 
> > > > > util/machine.c:1787
> > > > > #6 0x559bce512f6b in machines__deliver_event util/session.c:1481
> > > > > #7 0x559bce515107 in perf_session__deliver_event 
> > > > > util/session.c:1551
> > > > > #8 0x559bce51d4d2 in do_flush util/ordered-events.c:244
> > > > > #9 0x559bce51d4d2 in __ordered_events__flush 
> > > > > util/ordered-events.c:323
> > > > > #10 0x559bce519bea in __perf_session__process_events 
> > > > > util/session.c:2268
> > > > > #11 0x559bce519bea in perf_session__process_events 
> > > > > util/session.c:2297
> > > > > #12 0x559bce2e7a52 in process_buildids 
> > > > > /home/namhyung/project/linux/tools/perf/builtin-record.c:1017
> > > > > #13 0x559bce2e7a52 in record__finish_output 
> > > > > /home/namhyung/project/linux/tools/perf/builtin-record.c:1234
> > > > > #14 0x559bce2ed4f6 in __cmd_record 
> > > > > /home/namhyung/project/linux/tools/perf/builtin-record.c:2026
> > > > > #15 0x559bce2ed4f6 in cmd_record 
> > > > > /home/namhyung/project/linux/tools/perf/builtin-record.c:2858
> > > > > #16 0x559bce422db4 in run_builtin 
> > > > > /home/namhyung/project/linux/tools/perf/perf.c:313
> > > > > #17 0x559bce2acac8 in handle_internal_command 
> > > > > /home/namhyung/project/linux/tools/perf/perf.c:365
> > > > > #18 0x559bce2acac8 in run_argv 
> > > > > /home/namhyung/project/linux/tools/perf/perf.c:409
> > > > > #19 0x559bce2acac8 in main 
> > > > > /home/namhyung/project/linux/tools/perf/perf.c:539
> > > > > #20 0x7fea51e76d09 in __libc_start_main ../csu/libc-start.c:308
> > > > >
> > > > >   Indirect leak of 32 byte(s) in 1 object(s) allocated from:
> > > > > #0 0x7fea52341037 in __interceptor_calloc 
> > > > > ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
> > > > > #1 0x559bce520907 in nsinfo__copy util/namespaces.c:169
> > > > > #2 0x559bce50821b in map__new util/map.c:168
> > > > > #3 0x559bce503c92 in machine__process_mmap2_event 
> > > > > util/machine.c:1787
> > > > > #4 0x559bce512f6b in machines__deliver_event util/session.c:1481
> > > > > #5 0x559bce515107 in perf_session__deliver_event 
> > > > > util/session.c:1551
> > > > > #6 0x559bce51d4d2 in do_flush util/ordered-events.c:244
> > > > > #7 0x559bce51d4d2 in __ordered_events__flush 
> > > > > util/ordered-events.c:323
> > > > > #8 0x559bce519bea in __perf_session__process_events 
> > > > > util/session.c:2268
> > > > > #9 0x559bce519bea in perf_session__pr

Re: [PATCH] perf test: Remove perf sub test 42.4 BPF relocation checker

2021-03-24 Thread Arnaldo Carvalho de Melo
Em Wed, Mar 24, 2021 at 09:37:34AM +0100, Thomas Richter escreveu:
> For some time now the perf test 42: BPF filter returns an error
> on bpf relocation subtest, at least on x86 and s390. This is caused by
> 
> commit d859900c4c56 ("bpf, libbpf: support global data/bss/rodata sections")
> 
> which introduces support for global variables in eBPF programs.
> 
> Perf test 42.4 checks that the eBPF relocation fails when the eBPF program
> contains a global variable. It returns OK when the eBPF program
> could not be loaded and FAILED otherwise.
> 
> With above commit the test logic for the eBPF relocation is obsolete.
> The loading of the eBPF now succeeds and the test always shows FAILED.
> 
> This patch removes the sub test completely.
> Also, a lot of eBPF program testing is done in the eBPF test suite,
> which also contains tests for global variables.

Thanks, applied.

- Arnaldo



Re: [PATCH] perf data: export to JSON

2021-03-24 Thread Arnaldo Carvalho de Melo
Em Wed, Mar 24, 2021 at 09:06:50AM -0400, Nicholas Fraser escreveu:
> This adds preliminary support to dump the contents of a perf.data file to
> human-readable JSON.
> 
> The "perf data" command currently only supports exporting to Common Trace
> Format and it doesn't do symbol resolution among other things. Dumping to JSON
> means the data can be trivially parsed by anything without any dependencies
> (besides a JSON parser.) We use this to import the data into a tool on Windows
> where integrating perf or libbabeltrace is impractical.
> 
> The JSON is encoded using some trivial fprintf() commands; there is no
> dependency on any JSON library. It currently only outputs samples. Other stuff
> like processes and mappings could easily be added as needed. The output is of
> course huge but it compresses well enough.
> 
> Use it like this:
> 
> perf data convert --to-json out.json

Interesting, see below for some minor stuff while others have the chance
to further review this.

I'm ok with how it is right now, not being that well versed in JSON
details.

Do you plan to output the headers too? I think we should, for
completeness.
 
- Arnaldo
 
> Here's what the output looks like:
> 
> {
> "linux-perf-json-version": 1,
> "samples": [
> {
> "timestamp": 3074717308597,
> "pid": 8604,
> "tid": 8604,
> "comm": "sh",
> "callchain": [
> {
> "ip": "0x7f1e0deb2d36",
> "symbol": "__strcmp_avx2",
> "dso": "libc-2.33.so"
> },
> {
> "ip": "0x7f1e0dd7f49f",
> "symbol": "__gconv_find_transform",
> "dso": "libc-2.33.so"
> },
> {
> "ip": "0x7f1e0de0b71c",
> "symbol": "__wcsmbs_load_conv",
> "dso": "libc-2.33.so"
> }
> ]
> },
> ...
> ]
> }
> 
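
For anyone wondering what "some trivial fprintf() commands" amounts to, a minimal sketch follows. It is not the code from data-convert-json.c: struct frame, struct sample and sample__fprintf_json() are hypothetical names, the output is emitted on one line rather than pretty-printed, and strings are assumed to contain nothing that needs JSON escaping (quotes, backslashes, control characters), which a real exporter would have to handle.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical types for the sketch, not perf's structures. */
struct frame {
	uint64_t ip;
	const char *symbol;
	const char *dso;
};

struct sample {
	uint64_t timestamp;
	int pid, tid;
	const char *comm;
	const struct frame *callchain;
	size_t nr_frames;
};

/* Emit one sample object; the caller writes the enclosing array. */
void sample__fprintf_json(FILE *out, const struct sample *s)
{
	assert(out && s);

	fprintf(out, "{\"timestamp\": %" PRIu64 ", \"pid\": %d, \"tid\": %d, "
		     "\"comm\": \"%s\", \"callchain\": [",
		s->timestamp, s->pid, s->tid, s->comm);

	for (size_t i = 0; i < s->nr_frames; i++) {
		const struct frame *f = &s->callchain[i];

		/* Comma goes before every element except the first. */
		fprintf(out, "%s{\"ip\": \"0x%" PRIx64 "\", \"symbol\": \"%s\", "
			     "\"dso\": \"%s\"}",
			i ? ", " : "", f->ip, f->symbol, f->dso);
	}

	fprintf(out, "]}");
}
```

The addresses go out as hex strings, as in the quoted output, since JSON numbers have no hex form and many parsers lose precision on 64-bit integers.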
> Signed-off-by: Nicholas Fraser 
> ---
>  tools/perf/Documentation/perf-data.txt |   5 +-
>  tools/perf/builtin-data.c  |  39 -
>  tools/perf/util/Build  |   1 +
>  tools/perf/util/data-convert-json.c| 228 +
>  tools/perf/util/data-convert-json.h|   9 +
>  tools/perf/util/data-convert.h |   2 +
>  6 files changed, 276 insertions(+), 8 deletions(-)
>  create mode 100644 tools/perf/util/data-convert-json.c
>  create mode 100644 tools/perf/util/data-convert-json.h
> 
> diff --git a/tools/perf/Documentation/perf-data.txt 
> b/tools/perf/Documentation/perf-data.txt
> index 726b9bc9e1a7..417bf17e265c 100644
> --- a/tools/perf/Documentation/perf-data.txt
> +++ b/tools/perf/Documentation/perf-data.txt
> @@ -17,7 +17,7 @@ Data file related processing.
>  COMMANDS
>  
>  convert::
> - Converts perf data file into another format (only CTF [1] format is 
> support by now).
> + Converts perf data file into another format.
>   It's possible to set data-convert debug variable to get debug messages 
> from conversion,
>   like:
> perf --debug data-convert data convert ...
> @@ -27,6 +27,9 @@ OPTIONS for 'convert'
>  --to-ctf::
>   Triggers the CTF conversion, specify the path of CTF data directory.
>  
> +--to-json::
> + Triggers JSON conversion. Specify the JSON filename to output.
> +
>  --tod::
>   Convert time to wall clock time.
>  
> diff --git a/tools/perf/builtin-data.c b/tools/perf/builtin-data.c
> index 8d23b8d6ee8e..64546ba517a5 100644
> --- a/tools/perf/builtin-data.c
> +++ b/tools/perf/builtin-data.c
> @@ -8,6 +8,7 @@
>  #include 
>  #include "data-convert.h"
>  #include "data-convert-bt.h"
> +#include "data-convert-json.h"
>  
>  typedef int (*data_cmd_fn_t)(int argc, const char **argv);
>  
> @@ -55,7 +56,8 @@ static const char * const data_convert_usage[] = {
>  
>  static int cmd_data_convert(int argc, const char **argv)
>  {
> - const char *to_ctf = NULL;
> + const char *to_json = NULL;
> + const char *to_ctf = NULL;
>   struct perf_data_convert_opts opts = {
>   .force = false,
>   .all = false,
> @@ -63,6 +65,7 @@ static int cmd_data_convert(int argc, const char **argv)
>   const struct option options[] = {
>   OPT_INCR('v', "verbose", , "be more verbose"),
>   OPT_STRING('i', "input", _name, "file", "input file 
> name"),
> + OPT_STRING(0, "to-json", _json, NULL, "Convert to JSON 
> format"),
>  #ifdef HAVE_LIBBABELTRACE_SUPPORT
>   

Re: [PATCH 1/2] perf daemon: Force waipid for all session on SIGCHLD delivery

2021-03-24 Thread Arnaldo Carvalho de Melo
Em Sat, Mar 20, 2021 at 11:10:12PM +0100, Jiri Olsa escreveu:
> If we don't process SIGCHLD before another comes, we will
> see just one SIGCHLD as a result. In this case current code
> will miss exit notification for a session and wait forever.
> 
> Adding extra waitpid check for all sessions when SIGCHLD
> is received, to make sure we don't miss any session exit.
> 
> Also fix close condition for signal_fd.

Thanks, applied.

- Arnaldo

 
> Reported-by: Ian Rogers 
> Signed-off-by: Jiri Olsa 
> ---
>  tools/perf/builtin-daemon.c | 50 +
>  1 file changed, 28 insertions(+), 22 deletions(-)
> 
> diff --git a/tools/perf/builtin-daemon.c b/tools/perf/builtin-daemon.c
> index ace8772a4f03..4697493842f5 100644
> --- a/tools/perf/builtin-daemon.c
> +++ b/tools/perf/builtin-daemon.c
> @@ -402,35 +402,42 @@ static pid_t handle_signalfd(struct daemon *daemon)
>   int status;
>   pid_t pid;
>  
> + /*
> +  * Take signal fd data as pure signal notification and check all
> +  * the sessions state. The reason is that multiple signals can get
> +  * coalesced in kernel and we can receive only single signal even
> +  * if multiple SIGCHLD were generated.
> +  */
>   err = read(daemon->signal_fd, , sizeof(struct signalfd_siginfo));
> - if (err != sizeof(struct signalfd_siginfo))
> + if (err != sizeof(struct signalfd_siginfo)) {
> + pr_err("failed to read signal fd\n");
>   return -1;
> + }
>  
>   list_for_each_entry(session, >sessions, list) {
> + if (session->pid == -1)
> + continue;
>  
> - if (session->pid != (int) si.ssi_pid)
> + pid = waitpid(session->pid, , WNOHANG);
> + if (pid <= 0)
>   continue;
>  
> - pid = waitpid(session->pid, , 0);
> - if (pid == session->pid) {
> - if (WIFEXITED(status)) {
> - pr_info("session '%s' exited, status=%d\n",
> - session->name, WEXITSTATUS(status));
> - } else if (WIFSIGNALED(status)) {
> - pr_info("session '%s' killed (signal %d)\n",
> - session->name, WTERMSIG(status));
> - } else if (WIFSTOPPED(status)) {
> - pr_info("session '%s' stopped (signal %d)\n",
> - session->name, WSTOPSIG(status));
> - } else {
> - pr_info("session '%s' Unexpected status 
> (0x%x)\n",
> - session->name, status);
> - }
> + if (WIFEXITED(status)) {
> + pr_info("session '%s' exited, status=%d\n",
> + session->name, WEXITSTATUS(status));
> + } else if (WIFSIGNALED(status)) {
> + pr_info("session '%s' killed (signal %d)\n",
> + session->name, WTERMSIG(status));
> + } else if (WIFSTOPPED(status)) {
> + pr_info("session '%s' stopped (signal %d)\n",
> + session->name, WSTOPSIG(status));
> + } else {
> + pr_info("session '%s' Unexpected status (0x%x)\n",
> + session->name, status);
>   }
>  
>   session->state = KILL;
>   session->pid = -1;
> - return pid;
>   }
>  
>   return 0;
> @@ -443,7 +450,6 @@ static int daemon_session__wait(struct daemon_session 
> *session, struct daemon *d
>   .fd = daemon->signal_fd,
>   .events = POLLIN,
>   };
> - pid_t wpid = 0, pid = session->pid;
>   time_t start;
>  
>   start = time(NULL);
> @@ -452,7 +458,7 @@ static int daemon_session__wait(struct daemon_session 
> *session, struct daemon *d
>   int err = poll(, 1, 1000);
>  
>   if (err > 0) {
> - wpid = handle_signalfd(daemon);
> + handle_signalfd(daemon);
>   } else if (err < 0) {
>   perror("failed: poll\n");
>   return -1;
> @@ -460,7 +466,7 @@ static int daemon_session__wait(struct daemon_session 
> *session, struct daemon *d
>  
>   if (start + secs < time(NULL))
>   return -1;
> - } while (wpid != pid);
> + } while (session->pid != -1);
>  
>   return 0;
>  }
> @@ -1344,7 +1350,7 @@ static int __cmd_start(struct daemon *daemon, struct 
> option parent_options[],
>   close(sock_fd);
>   if (conf_fd != -1)
>   close(conf_fd);
> - if (conf_fd != -1)
> + if (signal_fd != -1)
>   close(signal_fd);
>  
>   pr_info("daemon exited\n");
> -- 
> 2.30.2
> 

-- 

- Arnaldo
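
For reference, the pattern the patch moves to can be shown in isolation. A minimal sketch, not the daemon code: struct session and reap_sessions() are names made up here. The idea is to treat the signal only as a wakeup and then poll every live child with WNOHANG, so that coalesced SIGCHLDs cannot hide an exit.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the daemon's per-session state. */
struct session {
	pid_t pid;		/* -1 once the child has been reaped */
};

/*
 * Check every live session without blocking. Returns how many
 * sessions were reaped on this pass.
 */
int reap_sessions(struct session *s, size_t n)
{
	int reaped = 0;

	assert(n == 0 || s != NULL);

	for (size_t i = 0; i < n; i++) {
		int status;

		if (s[i].pid == -1)
			continue;

		/* 0: still running, -1: error; either way, not reaped. */
		if (waitpid(s[i].pid, &status, WNOHANG) <= 0)
			continue;

		s[i].pid = -1;
		reaped++;
	}
	return reaped;
}
```

A caller waiting for one particular session then loops until that session's pid becomes -1, which matches the changed loop condition in daemon_session__wait().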


Re: [PATCH v3 1/2] perf stat: Align CSV output for summary mode

2021-03-24 Thread Arnaldo Carvalho de Melo
Em Mon, Mar 22, 2021 at 10:15:48PM +0100, Jiri Olsa escreveu:
> On Fri, Mar 19, 2021 at 03:01:55PM +0800, Jin Yao wrote:
> 
> SNIP
> 
> >   102107,,branch-misses,8012781751,100.00,4.15,of all branches
> > 
> > This option can be enabled in perf config by setting the variable
> > 'stat.no-cvs-summary'.
> > 
> >   # perf config stat.no-cvs-summary=true
> > 
> >   # perf config -l
> >   stat.no-cvs-summary=true
> > 
> >   # perf stat -x, -I1000 --interval-count 1 --summary
> >      1.001330198,8013.28,msec,cpu-clock,8013279201,100.00,8.013,CPUs utilized
> >      1.001330198,205,,context-switches,8013308394,100.00,25.583,/sec
> >      1.001330198,10,,cpu-migrations,8013324681,100.00,1.248,/sec
> >      1.001330198,0,,page-faults,8013340926,100.00,0.000,/sec
> >      1.001330198,8027742,,cycles,8013344503,100.00,0.001,GHz
> >      1.001330198,2871717,,instructions,8013356501,100.00,0.36,insn per cycle
> >      1.001330198,553564,,branches,8013366204,100.00,69.081,K/sec
> >      1.001330198,54021,,branch-misses,8013375952,100.00,9.76,of all branches
> >   8013.28,msec,cpu-clock,8013279201,100.00,7.985,CPUs utilized
> >   205,,context-switches,8013308394,100.00,25.583,/sec
> >   10,,cpu-migrations,8013324681,100.00,1.248,/sec
> >   0,,page-faults,8013340926,100.00,0.000,/sec
> >   8027742,,cycles,8013344503,100.00,0.001,GHz
> >   2871717,,instructions,8013356501,100.00,0.36,insn per cycle
> >   553564,,branches,8013366204,100.00,69.081,K/sec
> >   54021,,branch-misses,8013375952,100.00,9.76,of all branches
> > 
> > Signed-off-by: Jin Yao 
> 
> LGTM, for patchset:
> 
> Acked-by: Jiri Olsa 

After doing the s/cvs/csv/ changes, applied.

- Arnaldo


Re: [PATCH v3 2/2] perf test: Add CVS summary test

2021-03-24 Thread Arnaldo Carvalho de Melo
On Wed, Mar 24, 2021 at 10:12:43AM -0300, Arnaldo Carvalho de Melo wrote:
> On Wed, Mar 24, 2021 at 10:05:18AM -0300, Arnaldo Carvalho de Melo wrote:
> > On Fri, Mar 19, 2021 at 03:01:56PM +0800, Jin Yao wrote:
> > > The patch "perf stat: Align CSV output for summary mode" aligned
> > > CVS output and added "summary" to the first column of summary
> > > lines.
> > > 
> > > Now we check if the "summary" string is added to the CVS output.
> > > 
> > > If we set '--no-cvs-summary' option, the "summary" string would
> > > not be added, also check with this case.
> > 
> > You mixed up cvs with csv in various places, I'm fixing it up...
> 
> This, for the first patch, now fixing the second.

nah, there were some missing fixes:


diff --git a/tools/perf/Documentation/perf-stat.txt 
b/tools/perf/Documentation/perf-stat.txt
index e81a45cadd4a0bdb..6ec5960b08c3de21 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -482,14 +482,14 @@ convenient for post processing.
 --summary::
 Print summary for interval mode (-I).
 
---no-cvs-summary::
+--no-csv-summary::
 Don't print 'summary' at the first column for CVS summary output.
 This option must be used with -x and --summary.
 
 This option can be enabled in perf config by setting the variable
-'stat.no-cvs-summary'.
+'stat.no-csv-summary'.
 
-$ perf config stat.no-cvs-summary=true
+$ perf config stat.no-csv-summary=true
 
 EXAMPLES
 
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 6daa090129a65c78..2a2c15cac80a3bee 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1093,9 +1093,9 @@ void perf_stat__set_big_num(int set)
stat_config.big_num = (set != 0);
 }
 
-void perf_stat__set_no_cvs_summary(int set)
+void perf_stat__set_no_csv_summary(int set)
 {
-   stat_config.no_cvs_summary = (set != 0);
+   stat_config.no_csv_summary = (set != 0);
 }
 
 static int stat__set_big_num(const struct option *opt __maybe_unused,
@@ -1254,8 +1254,8 @@ static struct option stat_options[] = {
"threads of same physical core"),
OPT_BOOLEAN(0, "summary", &stat_config.summary,
   "print summary for interval mode"),
-   OPT_BOOLEAN(0, "no-cvs-summary", &stat_config.no_cvs_summary,
-  "don't print 'summary' for CVS summary output"),
+   OPT_BOOLEAN(0, "no-csv-summary", &stat_config.no_csv_summary,
+  "don't print 'summary' for CSV summary output"),
OPT_BOOLEAN(0, "quiet", &stat_config.quiet,
"don't print output (useful with record)"),
 #ifdef HAVE_LIBPFM
diff --git a/tools/perf/util/config.c b/tools/perf/util/config.c
index df78f11f6fb50a0b..6bcb5ef221f8c1be 100644
--- a/tools/perf/util/config.c
+++ b/tools/perf/util/config.c
@@ -457,8 +457,8 @@ static int perf_stat_config(const char *var, const char 
*value)
if (!strcmp(var, "stat.big-num"))
perf_stat__set_big_num(perf_config_bool(var, value));
 
-   if (!strcmp(var, "stat.no-cvs-summary"))
-   perf_stat__set_no_cvs_summary(perf_config_bool(var, value));
+   if (!strcmp(var, "stat.no-csv-summary"))
+   perf_stat__set_no_csv_summary(perf_config_bool(var, value));
 
/* Add other config variables here. */
return 0;
diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index 2e7fec0bd8f3f3bb..d3137bc1706548d4 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -440,7 +440,7 @@ static void printout(struct perf_stat_config *config, 
struct aggr_cpu_id id, int
os.nfields++;
}
 
-   if (!config->no_cvs_summary && config->csv_output &&
+   if (!config->no_csv_summary && config->csv_output &&
config->summary && !config->interval) {
fprintf(config->output, "%16s%s", "summary", config->csv_sep);
}
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index def0cdc841330210..48e6a06233faef8e 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -128,7 +128,7 @@ struct perf_stat_config {
bool all_user;
bool percore_show_thread;
bool summary;
-   bool no_cvs_summary;
+   bool no_csv_summary;
bool metric_no_group;
bool metric_no_merge;
bool stop_read_counter;
@@ -161,7 +161,7 @@ struct perf_stat_config {
 };
 
 void perf_stat__set_big_num(int set);
-void perf_stat__set_no_cvs_summary(int set);
+void perf_stat__set_no_csv_summary(int set);
 
 void update_stats(struct stats *stats, u64 val);
 double avg_stats(struct stats *stats);


Re: [PATCH v3 2/2] perf test: Add CVS summary test

2021-03-24 Thread Arnaldo Carvalho de Melo
On Wed, Mar 24, 2021 at 10:05:18AM -0300, Arnaldo Carvalho de Melo wrote:
> On Fri, Mar 19, 2021 at 03:01:56PM +0800, Jin Yao wrote:
> > The patch "perf stat: Align CSV output for summary mode" aligned
> > CVS output and added "summary" to the first column of summary
> > lines.
> > 
> > Now we check if the "summary" string is added to the CVS output.
> > 
> > If we set '--no-cvs-summary' option, the "summary" string would
> > not be added, also check with this case.
> 
> You mixed up cvs with csv in various places, I'm fixing it up...

This, for the first patch, now fixing the second.


diff --git a/tools/perf/Documentation/perf-stat.txt 
b/tools/perf/Documentation/perf-stat.txt
index e81a45cadd4a0bdb..6ec5960b08c3de21 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -482,14 +482,14 @@ convenient for post processing.
 --summary::
 Print summary for interval mode (-I).
 
---no-cvs-summary::
+--no-csv-summary::
 Don't print 'summary' at the first column for CVS summary output.
 This option must be used with -x and --summary.
 
 This option can be enabled in perf config by setting the variable
-'stat.no-cvs-summary'.
+'stat.no-csv-summary'.
 
-$ perf config stat.no-cvs-summary=true
+$ perf config stat.no-csv-summary=true
 
 EXAMPLES
 
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 6daa090129a65c78..2a2c15cac80a3bee 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1093,9 +1093,9 @@ void perf_stat__set_big_num(int set)
stat_config.big_num = (set != 0);
 }
 
-void perf_stat__set_no_cvs_summary(int set)
+void perf_stat__set_no_csv_summary(int set)
 {
-   stat_config.no_cvs_summary = (set != 0);
+   stat_config.no_csv_summary = (set != 0);
 }
 
 static int stat__set_big_num(const struct option *opt __maybe_unused,
@@ -1254,8 +1254,8 @@ static struct option stat_options[] = {
"threads of same physical core"),
OPT_BOOLEAN(0, "summary", &stat_config.summary,
   "print summary for interval mode"),
-   OPT_BOOLEAN(0, "no-cvs-summary", &stat_config.no_cvs_summary,
-  "don't print 'summary' for CVS summary output"),
+   OPT_BOOLEAN(0, "no-csv-summary", &stat_config.no_csv_summary,
+  "don't print 'summary' for CSV summary output"),
OPT_BOOLEAN(0, "quiet", &stat_config.quiet,
"don't print output (useful with record)"),
 #ifdef HAVE_LIBPFM
diff --git a/tools/perf/util/config.c b/tools/perf/util/config.c
index df78f11f6fb50a0b..6bcb5ef221f8c1be 100644
--- a/tools/perf/util/config.c
+++ b/tools/perf/util/config.c
@@ -457,8 +457,8 @@ static int perf_stat_config(const char *var, const char 
*value)
if (!strcmp(var, "stat.big-num"))
perf_stat__set_big_num(perf_config_bool(var, value));
 
-   if (!strcmp(var, "stat.no-cvs-summary"))
-   perf_stat__set_no_cvs_summary(perf_config_bool(var, value));
+   if (!strcmp(var, "stat.no-csv-summary"))
+   perf_stat__set_no_csv_summary(perf_config_bool(var, value));
 
/* Add other config variables here. */
return 0;


Re: [PATCH v3 2/2] perf test: Add CVS summary test

2021-03-24 Thread Arnaldo Carvalho de Melo
On Fri, Mar 19, 2021 at 03:01:56PM +0800, Jin Yao wrote:
> The patch "perf stat: Align CSV output for summary mode" aligned
> CVS output and added "summary" to the first column of summary
> lines.
> 
> Now we check if the "summary" string is added to the CVS output.
> 
> If we set '--no-cvs-summary' option, the "summary" string would
> not be added, also check with this case.

You mixed up cvs with csv in various places, I'm fixing it up...
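
The check the test is after can be tried without running perf at all.
A minimal sketch, using the sample lines from the comments in the
script below; the helper name has_summary_line is made up for
illustration:

```shell
#!/bin/sh
# Sample output lines copied from the comments in the test script.
with_summary='1.001364330 9224197  cycles 8012885033 100.00
summary 9224197  cycles 8012885033 100.00'
without_summary='1.001360298 9148534  cycles 8012853854 100.00
9148534  cycles 8012853854 100.00'

# Hypothetical helper: succeed if any line has "summary" as its first
# field, allowing for leading padding in front of it.
has_summary_line() {
	printf '%s\n' "$1" | grep -q '^ *summary '
}

has_summary_line "$with_summary" && echo "summary column present"
has_summary_line "$without_summary" || echo "summary column absent"
```

Matching on the first field only keeps an event name that happens to
contain "summary" from passing the check by accident.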

- Arnaldo
 
> Signed-off-by: Jin Yao 
> ---
>  v3:
>- New in v3.
>  
>  tools/perf/tests/shell/stat+cvs_summary.sh | 31 ++
>  1 file changed, 31 insertions(+)
>  create mode 100755 tools/perf/tests/shell/stat+cvs_summary.sh
> 
> diff --git a/tools/perf/tests/shell/stat+cvs_summary.sh 
> b/tools/perf/tests/shell/stat+cvs_summary.sh
> new file mode 100755
> index ..dd14f2ce7f6b
> --- /dev/null
> +++ b/tools/perf/tests/shell/stat+cvs_summary.sh
> @@ -0,0 +1,31 @@
> +#!/bin/sh
> +# perf stat cvs summary test
> +# SPDX-License-Identifier: GPL-2.0
> +
> +set -e
> +
> +#
> +# 1.001364330 9224197  cycles 8012885033 100.00
> +# summary 9224197  cycles 8012885033 100.00
> +#
> +perf stat -e cycles  -x' ' -I1000 --interval-count 1 --summary 2>&1 | \
> +grep -e summary | \
> +while read summary num event run pct
> +do
> + if [ $summary != "summary" ]; then
> + exit 1
> + fi
> +done
> +
> +#
> +# 1.001360298 9148534  cycles 8012853854 100.00
> +#9148534  cycles 8012853854 100.00
> +#
> +perf stat -e cycles  -x' ' -I1000 --interval-count 1 --summary 
> --no-cvs-summary 2>&1 | \
> +grep -e summary | \
> +while read num event run pct
> +do
> + exit 1
> +done
> +
> +exit 0
> -- 
> 2.17.1
> 

-- 

- Arnaldo


Re: [PATCH v2 0/3] perf-stat: share hardware PMCs with BPF

2021-03-23 Thread Arnaldo Carvalho de Melo
On Fri, Mar 19, 2021 at 04:14:42PM +, Song Liu wrote:
> > On Mar 19, 2021, at 8:58 AM, Namhyung Kim  wrote:
> > On Sat, Mar 20, 2021 at 12:35 AM Arnaldo Carvalho de Melo  
> > wrote:
> >> On Fri, Mar 19, 2021 at 09:54:59AM +0900, Namhyung Kim wrote:
> >>> On Fri, Mar 19, 2021 at 9:22 AM Song Liu  wrote:
> >>>>> On Mar 18, 2021, at 5:09 PM, Arnaldo  wrote:
> >>>>> On March 18, 2021 6:14:34 PM GMT-03:00, Jiri Olsa  
> >>>>> wrote:
> >>>>>> On Thu, Mar 18, 2021 at 03:52:51AM +, Song Liu wrote:
> >>>>>>> perf stat -C 1,3,5  107.063 [sec]
> >>>>>>> perf stat -C 1,3,5 --bpf-counters   106.406 [sec]

> >>>>>> I can't see why it's actually faster than normal perf ;-)
> >>>>>> would be worth to find out

> >>>>> Isn't this all about contended cases?

> >>>> Yeah, the normal perf is doing time multiplexing; while --bpf-counters
> >>>> doesn't need it.

> >>> Yep, so for uncontended cases, normal perf should be the same as the
> >>> baseline (faster than the bperf).  But for contended cases, the bperf
> >>> works faster.

> >> The difference should be small enough that for people that use this in a
> >> machine where contention happens most of the time, setting a
> >> ~/.perfconfig to use it by default should be advantageous, i.e. no need
> >> to use --bpf-counters on the command line all the time.

> >> So, Namhyung, can I take that as an Acked-by or a Reviewed-by? I'll take
> >> a look again now but I want to have this merged on perf/core so that I
> >> can work on a new BPF SKEL to use this:

> > I have a concern for the per cpu target, but it can be done later, so

> > Acked-by: Namhyung Kim 

> >> https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/log/?h=tmp.bpf/bpf_perf_enable

> > Interesting!  Actually I was thinking about something similar too. :)
> 
> Hi Namhyung, Jiri, and Arnaldo,
> 
> Thanks a lot for your kind review. 
> 
> Here is updated 3/3, where we use perf-bench instead of stressapptest.

I had to apply this updated 3/3 manually, as there was some munging, it's
all now at:

https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/log/?h=tmp.perf/core

Please take a look at the "Committer testing" section I added to the
main patch, introducing bperf:

https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/commit/?h=tmp.perf/core=7fac83aaf2eecc9e7e7b72da694c49bb4ce7fdfc

And check if I made any mistake or if something else could be added.

It'll move to perf/core after my set of automated tests finishes.

- Arnaldo


Re: [PATCH] perf tools: Fix various typos in comments

2021-03-23 Thread Arnaldo Carvalho de Melo
On Tue, Mar 23, 2021 at 02:59:57PM -0300, Arnaldo Carvalho de Melo wrote:
> On Tue, Mar 23, 2021 at 05:10:10PM +0100, Ingo Molnar wrote:
> > 
> > Here's the delta between -v1 and -v2, in case you already have -v1 or 
> > want to review the changes only:
> 
> I had not pushed it out, so I just replaced v1 with v2, thanks.

So the first hunk has a problem, I'm fixing it up :-)

- Arnaldo
 
> > +++ b/tools/perf/arch/arm64/util/machine.c
> > @@ -1,4 +1,4 @@
> > -// SPDX-License-Identifier: GPL-2.0
> > +/ SPDX-License-Identifier: GPL-2.0
> >  
> >  #include 
> >  #include 
> > @@ -6,11 +6,11 @@
> >  #include "debug.h"
> >  #include "symbol.h"


Re: [PATCH v2 1/3] perf-stat: introduce bperf, share hardware PMCs with BPF

2021-03-23 Thread Arnaldo Carvalho de Melo
On Tue, Mar 23, 2021 at 09:37:42AM -0300, Arnaldo Carvalho de Melo wrote:
> On Tue, Mar 23, 2021 at 09:25:52AM -0300, Arnaldo Carvalho de Melo wrote:
> > On Fri, Mar 19, 2021 at 03:41:57PM -0300, Arnaldo Carvalho de Melo wrote:
> > > On Thu, Mar 18, 2021 at 10:15:13PM +0100, Jiri Olsa wrote:
> > > > On Tue, Mar 16, 2021 at 02:18:35PM -0700, Song Liu wrote:
> > > > > bperf is off by default. To enable it, pass --bpf-counters option to
> > > > > perf-stat. bperf uses a BPF hashmap to share information about BPF
> > > > > programs and maps used by bperf. This map is pinned to bpffs. The 
> > > > > default
> > > > > path is /sys/fs/bpf/perf_attr_map. The user could change the path with
> > > > > option --bpf-attr-map.
> > > > > 
> > > > > Signed-off-by: Song Liu 
> > > > 
> > > > Reviewed-by: Jiri Olsa 
> > > 
> > > After applying just this first patch in the series I'm getting this
> > > after a 'make -C tools/ clean', now I'm checking if I need some new
> > > clang, ideas?
> > 
> > Works now with clang from Fedora 33; I was using an older, locally built
> > one. Now I get this when trying as non-root, which is expected, but we
> > need to improve the wording.
> 
> Fails as root as well, investigating:
> 
> [root@five ~]# ls -lad /sys/fs/bpf/
> drwx-T. 2 root root 0 Mar 23 06:03 /sys/fs/bpf/
> [root@five ~]# strace -e bpf perf stat --bpf-counters sleep 1
> bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH, key_size=120, value_size=8, 
> max_entries=16, map_flags=0, inner_map_fd=0, map_name="", map_ifindex=0, 
> btf_fd=0, btf_key_type_id=0, btf_value_type_id=0, 
> btf_vmlinux_value_type_id=0}, 120) = -1 EPERM (Operation not permitted)
> Failed to lock perf_event_attr map
> --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=13916, si_uid=0, 
> si_status=SIGTERM, si_utime=0, si_stime=0} ---
> +++ exited with 255 +++
> [root@five ~]#
>  
> > [acme@five perf]$ perf stat --bpf-counters sleep 1
> > Failed to lock perf_event_attr map
> > [acme@five perf]$

Now it works, on 5.12-rc2+

[root@five pahole]# perf stat --bpf-counters sleep 1
libbpf: elf: skipping unrecognized data section(7) .eh_frame
libbpf: elf: skipping relo section(12) .rel.eh_frame for section(7) .eh_frame
libbpf: elf: skipping unrecognized data section(8) .eh_frame
libbpf: elf: skipping relo section(13) .rel.eh_frame for section(8) .eh_frame
libbpf: elf: skipping unrecognized data section(7) .eh_frame
libbpf: elf: skipping relo section(12) .rel.eh_frame for section(7) .eh_frame
libbpf: elf: skipping unrecognized data section(8) .eh_frame
libbpf: elf: skipping relo section(13) .rel.eh_frame for section(8) .eh_frame
libbpf: elf: skipping unrecognized data section(7) .eh_frame
libbpf: elf: skipping relo section(12) .rel.eh_frame for section(7) .eh_frame
libbpf: elf: skipping unrecognized data section(8) .eh_frame
libbpf: elf: skipping relo section(13) .rel.eh_frame for section(8) .eh_frame
libbpf: elf: skipping unrecognized data section(7) .eh_frame
libbpf: elf: skipping relo section(12) .rel.eh_frame for section(7) .eh_frame
libbpf: elf: skipping unrecognized data section(8) .eh_frame
libbpf: elf: skipping relo section(13) .rel.eh_frame for section(8) .eh_frame
libbpf: elf: skipping unrecognized data section(7) .eh_frame
libbpf: elf: skipping relo section(12) .rel.eh_frame for section(7) .eh_frame
libbpf: elf: skipping unrecognized data section(8) .eh_frame
libbpf: elf: skipping relo section(13) .rel.eh_frame for section(8) .eh_frame
libbpf: elf: skipping unrecognized data section(7) .eh_frame
libbpf: elf: skipping relo section(12) .rel.eh_frame for section(7) .eh_frame
libbpf: elf: skipping unrecognized data section(8) .eh_frame
libbpf: elf: skipping relo section(13) .rel.eh_frame for section(8) .eh_frame
libbpf: elf: skipping unrecognized data section(7) .eh_frame
libbpf: elf: skipping relo section(12) .rel.eh_frame for section(7) .eh_frame
libbpf: elf: skipping unrecognized data section(8) .eh_frame
libbpf: elf: skipping relo section(13) .rel.eh_frame for section(8) .eh_frame
libbpf: elf: skipping unrecognized data section(7) .eh_frame
libbpf: elf: skipping relo section(12) .rel.eh_frame for section(7) .eh_frame
libbpf: elf: skipping unrecognized data section(8) .eh_frame
libbpf: elf: skipping relo section(13) .rel.eh_frame for section(8) .eh_frame
libbpf: elf: skipping unrecognized data section(7) .eh_frame
libbpf: elf: skipping relo section(12) .rel.eh_frame for section(7) .eh_frame
libbpf: elf: skipping unrecognized data section(8) .eh_frame
libbpf: elf: skipping relo section(13) .rel.eh_frame for section(8) .eh_frame
libbpf: elf: skipping unrecognized data section(7) .eh_frame
libbpf: elf: skipping relo section(12) .rel.eh_frame for 

Re: [PATCH] perf test: Fix perf test 42

2021-03-23 Thread Arnaldo Carvalho de Melo
On Mon, Mar 22, 2021 at 01:53:39PM +0100, Thomas Richter wrote:
> For some time now the perf test 42: BPF filter returns an error
> on bpf relocation subtest, at least on x86 and s390. This is caused by
> 
> commit d859900c4c56 ("bpf, libbpf: support global data/bss/rodata sections")
> 
> which introduces support for global variables in eBPF programs. At least
> for global variables defined static.
> 
> Perf test 42 checks that the eBPF relocation fails when the eBPF program
> contains a global variable. It returns OK when the eBPF program
> could not be loaded and FAILED otherwise.
> 
> With the above commit the test logic for the eBPF relocation needs to change:
> 1. The function prepare_bpf() now always succeeds, the eBPF program
>compiled without errors and returns a valid object pointer instead of
>NULL.
> 2. There is no kprobe named sys_write, it is now named ksys_write.
> 3. The function do_test() now returns TEST_FAIL because function
>parse_events_load_bpf_obj() can not execute the eBPF program. The
>eBPF verifier complains on an invalid map pointer:
>   libbpf: load bpf program failed: Permission denied
>   libbpf: -- BEGIN DUMP LOG ---
>   libbpf:
>   0: (b7) r1 = 0
>   1: (63) *(u32 *)(r10 -4) = r1
>   last_idx 1 first_idx 0
>   regs=2 stack=0 before 0: (b7) r1 = 0
>   2: (63) *(u32 *)(r10 -8) = r1
>   3: (bf) r2 = r10
>   4: (07) r2 += -4
>   5: (bf) r3 = r10
>   6: (07) r3 += -8
>   7: (18) r1 = 0x380006ce000
>   9: (b7) r4 = 0
>   10: (85) call bpf_map_update_elem#2
>   R1 type=map_value expected=map_ptr
> 
> Fix this by adding logic to handle the kernel verifier return code:
> 1. Add function myksys_write() to cope with successful compile.
> 2. Use kprobe ksys_write
> 3. Handle eBPF verifier error.
> 
> Output before:
>  42: BPF filter  :
>  42.1: Basic BPF filtering   : Ok
>  42.2: BPF pinning   : Ok
>  42.3: BPF prologue generation   : Ok
>  42.4: BPF relocation checker: Failed
>  #
> 
> Output after:
>  # ./perf test -F 42
>  42: BPF filter  :
>  42.1: Basic BPF filtering   : Ok
>  42.2: BPF pinning   : Ok
>  42.3: BPF prologue generation   : Ok
>  42.4: BPF relocation checker: Ok
>  #
> 
> Signed-off-by: Thomas Richter 
> ---
>  tools/perf/tests/bpf-script-test-relocation.c |  4 ++--
>  tools/perf/tests/bpf.c| 11 +++
>  2 files changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/perf/tests/bpf-script-test-relocation.c 
> b/tools/perf/tests/bpf-script-test-relocation.c
> index 74006e4b2d24..f8f8176ad4d1 100644
> --- a/tools/perf/tests/bpf-script-test-relocation.c
> +++ b/tools/perf/tests/bpf-script-test-relocation.c
> @@ -34,8 +34,8 @@ struct bpf_map_def SEC("maps") my_table = {
>  
>  int this_is_a_global_val;
>  
> -SEC("func=sys_write")
> -int bpf_func__sys_write(void *ctx)
> +SEC("func=ksys_write")
> +int bpf_func__ksys_write(void *ctx)
>  {
>   int key = 0;
>   int value = 0;
> diff --git a/tools/perf/tests/bpf.c b/tools/perf/tests/bpf.c
> index f57e075b0ed2..d60ef9472d3d 100644
> --- a/tools/perf/tests/bpf.c
> +++ b/tools/perf/tests/bpf.c
> @@ -59,6 +59,11 @@ static int llseek_loop(void)
>  
>  #endif
>  
> +static int myksys_write(void)
> +{
> + return 0;
> +}
> +
>  static struct {
>   enum test_llvm__testcase prog_id;
>   const char *desc;
> @@ -105,6 +110,7 @@ static struct {
>   .name = "[bpf_relocation_test]",
>   .msg_compile_fail = "fix 'perf test LLVM' first",
>   .msg_load_fail= "libbpf error when dealing with relocation",
> + .target_func  = &myksys_write,
>   },
>  };
>  
> @@ -258,6 +264,11 @@ static int __test__bpf(int idx)
>   ret = do_test(obj,
> bpf_testcase_table[idx].target_func,
> bpf_testcase_table[idx].expect_result);
> + if (bpf_testcase_table[idx].prog_id == 
> LLVM_TESTCASE_BPF_RELOCATION
> + && ret == TEST_FAIL) {
> + ret = TEST_OK;
> + goto out;
> + }

At this point, if it doesn't matter if it fails or succeeds, just drop
this test case?
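
To spell that out, here is a rough model of what the hunk above does to
the result of the relocation subtest. The numeric values for TEST_OK
and TEST_FAIL are assumed, and relocation_result is a made-up name, not
something in tools/perf:

```shell
#!/bin/sh
# Assumed values of the perf test result codes.
TEST_OK=0
TEST_FAIL=-1

# Hypothetical model of the proposed check for the relocation test
# case: a failing do_test() result is turned into success.
relocation_result() {
	ret=$1
	if [ "$ret" -eq "$TEST_FAIL" ]; then
		ret=$TEST_OK
	fi
	echo "$ret"
}

# Both outcomes of do_test() end up reported the same way.
relocation_result "$TEST_OK"
relocation_result "$TEST_FAIL"
```

Both calls print 0, so as far as this one check goes the subtest no
longer tells a load failure apart from a successful load.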

- Arnaldo

>   if (ret != TEST_OK)
>   goto out;
>   if (bpf_testcase_table[idx].pin) {
> -- 
> 2.30.2
> 

-- 

- Arnaldo


Re: [PATCH] perf tools: Fix various typos in comments

2021-03-23 Thread Arnaldo Carvalho de Melo
On Tue, Mar 23, 2021 at 05:10:10PM +0100, Ingo Molnar wrote:
> 
> Here's the delta between -v1 and -v2, in case you already have -v1 or 
> want to review the changes only:

I had not pushed it out, so I just replaced v1 with v2, thanks.

- Arnaldo
 
> ---
>  tools/perf/arch/arm64/util/machine.c   |  8 
>  tools/perf/examples/bpf/augmented_raw_syscalls.c   |  4 ++--
>  tools/perf/pmu-events/arch/powerpc/power8/metrics.json | 12 ++--
>  tools/perf/pmu-events/arch/powerpc/power9/metrics.json |  2 +-
>  tools/perf/pmu-events/jevents.c|  2 +-
>  tools/perf/tests/bp_signal.c   |  2 +-
>  tools/perf/tests/parse-events.c|  2 +-
>  tools/perf/util/bpf-loader.c   |  2 +-
>  tools/perf/util/config.c   |  2 +-
>  tools/perf/util/cs-etm.h   |  3 ++-
>  tools/perf/util/demangle-java.c|  2 +-
>  tools/perf/util/dso.h  |  2 +-
>  tools/perf/util/evsel.c|  2 +-
>  tools/perf/util/header.c   | 16 
>  tools/perf/util/intel-pt.c |  2 +-
>  tools/perf/util/machine.c  |  2 +-
>  tools/perf/util/map.h  |  4 ++--
>  tools/perf/util/parse-events.c |  4 ++--
>  tools/perf/util/pmu.c  |  2 +-
>  tools/perf/util/probe-finder.c |  2 +-
>  tools/perf/util/scripting-engines/trace-event-python.c |  2 +-
>  tools/perf/util/session.c  |  2 +-
>  tools/perf/util/strbuf.h   |  2 +-
>  tools/perf/util/strfilter.h|  4 ++--
>  24 files changed, 44 insertions(+), 43 deletions(-)
> 
> diff --git a/tools/perf/arch/arm64/util/machine.c 
> b/tools/perf/arch/arm64/util/machine.c
> index 40c5e0b5bda8..acdf8dc1189b 100644
> --- a/tools/perf/arch/arm64/util/machine.c
> +++ b/tools/perf/arch/arm64/util/machine.c
> @@ -1,4 +1,4 @@
> -// SPDX-License-Identifier: GPL-2.0
> +/ SPDX-License-Identifier: GPL-2.0
>  
>  #include 
>  #include 
> @@ -6,11 +6,11 @@
>  #include "debug.h"
>  #include "symbol.h"
>  
> -/* On arm64, kernel text segment start at high memory address,
> +/* On arm64, kernel text segment starts at high memory address,
>   * for example 0x  8xxx . Modules start at a low memory
> - * address, like 0x  00ax . When only samll amount of
> + * address, like 0x  00ax . When only small amount of
>   * memory is used by modules, gap between end of module's text segment
> - * and start of kernel text segment may be reach 2G.
> + * and start of kernel text segment may reach 2G.
>   * Therefore do not fill this gap and do not assign it to the kernel dso map.
>   */
>  
> diff --git a/tools/perf/examples/bpf/augmented_raw_syscalls.c 
> b/tools/perf/examples/bpf/augmented_raw_syscalls.c
> index b80437971d80..a262dcd020f4 100644
> --- a/tools/perf/examples/bpf/augmented_raw_syscalls.c
> +++ b/tools/perf/examples/bpf/augmented_raw_syscalls.c
> @@ -262,7 +262,7 @@ int sys_enter(struct syscall_enter_args *args)
>   /*
>* Jump to syscall specific augmenter, even if the default one,
>* "!raw_syscalls:unaugmented" that will just return 1 to return the
> -  * unagmented tracepoint payload.
> +  * unaugmented tracepoint payload.
>*/
>   bpf_tail_call(args, _sys_enter, 
> augmented_args->args.syscall_nr);
>  
> @@ -282,7 +282,7 @@ int sys_exit(struct syscall_exit_args *args)
>   /*
>* Jump to syscall specific return augmenter, even if the default one,
>* "!raw_syscalls:unaugmented" that will just return 1 to return the
> -  * unagmented tracepoint payload.
> +  * unaugmented tracepoint payload.
>*/
>   bpf_tail_call(args, _sys_exit, exit_args.syscall_nr);
>   /*
> diff --git a/tools/perf/pmu-events/arch/powerpc/power8/metrics.json 
> b/tools/perf/pmu-events/arch/powerpc/power8/metrics.json
> index fc4aa6c2ddc9..4e25525b7da6 100644
> --- a/tools/perf/pmu-events/arch/powerpc/power8/metrics.json
> +++ b/tools/perf/pmu-events/arch/powerpc/power8/metrics.json
> @@ -885,37 +885,37 @@
>  "MetricName": "flush_rate_percent"
>  },
>  {
> -"BriefDescription": "GCT slot utilization (11 to 14) as a % of 
> cycles this thread had atleast 1 slot valid",
> +"BriefDescription": "GCT slot utilization (11 to 14) as a % of 
> cycles this thread had at least 1 slot valid",
>  "MetricExpr": "PM_GCT_UTIL_11_14_ENTRIES / ( PM_RUN_CYC - 
> PM_GCT_NOSLOT_CYC) * 100",
>  "MetricGroup": "general",
>  "MetricName": "gct_util_11to14_slots_percent"
>  },
>  {
> -"BriefDescription": "GCT slot utilization (15 to 17) as a % of 
> cycles this 

Re: [PATCH] perf tools: Fix various typos in comments

2021-03-23 Thread Arnaldo Carvalho de Melo
On Sun, Mar 21, 2021 at 12:37:34PM +0100, Ingo Molnar wrote:
> 
> Fix ~81 single-word typos in the perf tooling code - accumulated over the 
> years.

Thanks, applied.

- Arnaldo

 
> Signed-off-by: Ingo Molnar 
> ---
>  tools/perf/Documentation/perf-buildid-cache.txt |  2 +-
>  tools/perf/Documentation/perf-report.txt|  2 +-
>  tools/perf/Documentation/perf-top.txt   |  2 +-
>  tools/perf/arch/arm/util/cs-etm.c   |  2 +-
>  tools/perf/arch/arm64/util/perf_regs.c  |  2 +-
>  tools/perf/arch/powerpc/util/kvm-stat.c |  2 +-
>  tools/perf/arch/powerpc/util/utils_header.h |  2 +-
>  tools/perf/arch/x86/tests/bp-modify.c   |  2 +-
>  tools/perf/arch/x86/util/perf_regs.c|  4 ++--
>  tools/perf/bench/epoll-wait.c   |  4 ++--
>  tools/perf/bench/numa.c |  2 +-
>  tools/perf/builtin-annotate.c   |  2 +-
>  tools/perf/builtin-diff.c   |  2 +-
>  tools/perf/builtin-lock.c   |  2 +-
>  tools/perf/builtin-sched.c  |  2 +-
>  tools/perf/builtin-script.c |  4 ++--
>  tools/perf/builtin-stat.c   |  4 ++--
>  tools/perf/builtin-top.c|  2 +-
>  tools/perf/jvmti/jvmti_agent.c  |  4 ++--
>  tools/perf/scripts/python/netdev-times.py   |  2 +-
>  tools/perf/tests/bp_signal.c|  4 ++--
>  tools/perf/tests/code-reading.c |  2 +-
>  tools/perf/tests/hists_cumulate.c   |  4 ++--
>  tools/perf/tests/parse-metric.c |  2 +-
>  tools/perf/tests/topology.c |  2 +-
>  tools/perf/trace/beauty/include/linux/socket.h  |  2 +-
>  tools/perf/ui/browsers/annotate.c   |  2 +-
>  tools/perf/ui/browsers/hists.c  |  2 +-
>  tools/perf/util/call-path.h |  2 +-
>  tools/perf/util/callchain.c |  2 +-
>  tools/perf/util/cs-etm-decoder/cs-etm-decoder.c |  2 +-
>  tools/perf/util/cs-etm.c|  8 
>  tools/perf/util/cs-etm.h|  2 +-
>  tools/perf/util/data-convert-bt.c   |  2 +-
>  tools/perf/util/demangle-java.c |  2 +-
>  tools/perf/util/dwarf-aux.c |  6 +++---
>  tools/perf/util/dwarf-aux.h |  2 +-
>  tools/perf/util/events_stats.h  |  2 +-
>  tools/perf/util/evlist.c|  2 +-
>  tools/perf/util/evsel.c |  2 +-
>  tools/perf/util/expr.h  |  2 +-
>  tools/perf/util/header.c|  2 +-
>  tools/perf/util/levenshtein.c   |  2 +-
>  tools/perf/util/libunwind/arm64.c   |  2 +-
>  tools/perf/util/libunwind/x86_32.c  |  2 +-
>  tools/perf/util/llvm-utils.c|  2 +-
>  tools/perf/util/machine.c   |  6 +++---
>  tools/perf/util/mem-events.h|  2 +-
>  tools/perf/util/metricgroup.c   |  2 +-
>  tools/perf/util/parse-events.c  |  6 +++---
>  tools/perf/util/pmu.c   |  2 +-
>  tools/perf/util/probe-event.c   |  4 ++--
>  tools/perf/util/probe-finder.c  |  4 ++--
>  tools/perf/util/s390-cpumsf.c   | 10 +-
>  tools/perf/util/session.c   |  2 +-
>  tools/perf/util/symbol-elf.c|  2 +-
>  tools/perf/util/synthetic-events.c  |  4 ++--
>  tools/perf/util/unwind-libunwind-local.c|  2 +-
>  58 files changed, 81 insertions(+), 81 deletions(-)
> 
> diff --git a/tools/perf/Documentation/perf-buildid-cache.txt 
> b/tools/perf/Documentation/perf-buildid-cache.txt
> index bb167e32a1d7..cd8ce6e8ec12 100644
> --- a/tools/perf/Documentation/perf-buildid-cache.txt
> +++ b/tools/perf/Documentation/perf-buildid-cache.txt
> @@ -57,7 +57,7 @@ OPTIONS
>  -u::
>  --update=::
>   Update specified file of the cache. Note that this doesn't remove
> - older entires since those may be still needed for annotating old
> + older entries since those may be still needed for annotating old
>   (or remote) perf.data. Only if there is already a cache which has
>   exactly same build-id, that is replaced by new one. It can be used
>   to update kallsyms and kernel dso to vmlinux in order to support
> diff --git a/tools/perf/Documentation/perf-report.txt 
> b/tools/perf/Documentation/perf-report.txt
> index f546b5e9db05..d2d2a8d8f8f5 100644
> --- a/tools/perf/Documentation/perf-report.txt
> +++ b/tools/perf/Documentation/perf-report.txt
> @@ -472,7 +472,7 @@ OPTIONS
>   but probably we'll make the default not to show the switch-on/off events
>  on the --group mode and if there is only one event besides the 
> off/on ones,
>   go straight to the histogram browser, 

Re: [PATCH v2 1/3] perf-stat: introduce bperf, share hardware PMCs with BPF

2021-03-23 Thread Arnaldo Carvalho de Melo
On Tue, Mar 23, 2021 at 09:25:52AM -0300, Arnaldo Carvalho de Melo wrote:
> On Fri, Mar 19, 2021 at 03:41:57PM -0300, Arnaldo Carvalho de Melo wrote:
> > On Thu, Mar 18, 2021 at 10:15:13PM +0100, Jiri Olsa wrote:
> > > On Tue, Mar 16, 2021 at 02:18:35PM -0700, Song Liu wrote:
> > > > bperf is off by default. To enable it, pass --bpf-counters option to
> > > > perf-stat. bperf uses a BPF hashmap to share information about BPF
> > > > programs and maps used by bperf. This map is pinned to bpffs. The 
> > > > default
> > > > path is /sys/fs/bpf/perf_attr_map. The user could change the path with
> > > > option --bpf-attr-map.
> > > > 
> > > > Signed-off-by: Song Liu 
> > > 
> > > Reviewed-by: Jiri Olsa 
> > 
> > After applying just this first patch in the series I'm getting this
> > after a 'make -C tools/ clean', now I'm checking if I need some new
> > clang, ideas?
> 
> Works now with clang from Fedora 33; I was using an older, locally built
> one. Now I get this when trying as non-root, which is expected, but we
> need to improve the wording.

Fails as root as well, investigating:

[root@five ~]# ls -lad /sys/fs/bpf/
drwx-T. 2 root root 0 Mar 23 06:03 /sys/fs/bpf/
[root@five ~]# strace -e bpf perf stat --bpf-counters sleep 1
bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH, key_size=120, value_size=8, 
max_entries=16, map_flags=0, inner_map_fd=0, map_name="", map_ifindex=0, 
btf_fd=0, btf_key_type_id=0, btf_value_type_id=0, btf_vmlinux_value_type_id=0}, 
120) = -1 EPERM (Operation not permitted)
Failed to lock perf_event_attr map
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=13916, si_uid=0, 
si_status=SIGTERM, si_utime=0, si_stime=0} ---
+++ exited with 255 +++
[root@five ~]#
 
> [acme@five perf]$ perf stat --bpf-counters sleep 1
> Failed to lock perf_event_attr map
> [acme@five perf]$
>  
> > - Arnaldo
> > 

Re: [PATCH v2 1/3] perf-stat: introduce bperf, share hardware PMCs with BPF

2021-03-23 Thread Arnaldo Carvalho de Melo
On Fri, Mar 19, 2021 at 03:41:57PM -0300, Arnaldo Carvalho de Melo wrote:
> On Thu, Mar 18, 2021 at 10:15:13PM +0100, Jiri Olsa wrote:
> > On Tue, Mar 16, 2021 at 02:18:35PM -0700, Song Liu wrote:
> > > bperf is off by default. To enable it, pass --bpf-counters option to
> > > perf-stat. bperf uses a BPF hashmap to share information about BPF
> > > programs and maps used by bperf. This map is pinned to bpffs. The default
> > > path is /sys/fs/bpf/perf_attr_map. The user could change the path with
> > > option --bpf-attr-map.
> > > 
> > > Signed-off-by: Song Liu 
> > 
> > Reviewed-by: Jiri Olsa 
> 
> After applying just this first patch in the series I'm getting this
> after a 'make -C tools/ clean', now I'm checking if I need some new
> clang, ideas?

It works now with clang from Fedora 33; I was using an older, locally built
one. Now I get this when trying as non-root, which is expected, but we need
to improve the wording.

[acme@five perf]$ perf stat --bpf-counters sleep 1
Failed to lock perf_event_attr map
[acme@five perf]$
 
> - Arnaldo
> 

Re: [PATCH v2 1/3] perf-stat: introduce bperf, share hardware PMCs with BPF

2021-03-19 Thread Arnaldo Carvalho de Melo
On Thu, Mar 18, 2021 at 10:15:13PM +0100, Jiri Olsa wrote:
> On Tue, Mar 16, 2021 at 02:18:35PM -0700, Song Liu wrote:
> > bperf is off by default. To enable it, pass --bpf-counters option to
> > perf-stat. bperf uses a BPF hashmap to share information about BPF
> > programs and maps used by bperf. This map is pinned to bpffs. The default
> > path is /sys/fs/bpf/perf_attr_map. The user could change the path with
> > option --bpf-attr-map.
> > 
> > Signed-off-by: Song Liu 
> 
> Reviewed-by: Jiri Olsa 

After applying just this first patch in the series I'm getting the output
below after a 'make -C tools/ clean'. I'm now checking if I need a newer
clang. Any ideas?

- Arnaldo

[acme@quaco perf]$ make O=/tmp/build/perf -C tools/perf BUILD_BPF_SKEL=1 
PYTHON=python3 install-bin
make: Entering directory '/home/acme/git/perf/tools/perf'
  BUILD:   Doing 'make -j8' parallel build
Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from 
latest version at 'include/uapi/linux/kvm.h'
diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
Warning: Kernel ABI header at 
'tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl' differs from latest 
version at 'arch/mips/kernel/syscalls/syscall_n64.tbl'
diff -u tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl 
arch/mips/kernel/syscalls/syscall_n64.tbl

Auto-detecting system features:
...                         dwarf: [ on  ]
...            dwarf_getlocations: [ on  ]
...                         glibc: [ on  ]
...                        libbfd: [ on  ]
...                libbfd-buildid: [ on  ]
...                        libcap: [ on  ]
...                        libelf: [ on  ]
...                       libnuma: [ on  ]
...        numa_num_possible_cpus: [ on  ]
...                       libperl: [ on  ]
...                     libpython: [ on  ]
...                     libcrypto: [ on  ]
...                     libunwind: [ on  ]
...            libdw-dwarf-unwind: [ on  ]
...                          zlib: [ on  ]
...                          lzma: [ on  ]
...                     get_cpuid: [ on  ]
...                           bpf: [ on  ]
...                        libaio: [ on  ]
...                       libzstd: [ on  ]
...        disassembler-four-args: [ on  ]

  GEN  /tmp/build/perf/common-cmds.h
  CC   /tmp/build/perf/exec-cmd.o
  MKDIR    /tmp/build/perf/fd/
  MKDIR    /tmp/build/perf/fs/
  CC   /tmp/build/perf/fs/fs.o
  CC   /tmp/build/perf/event-parse.o
  CC   /tmp/build/perf/fd/array.o
  CC   /tmp/build/perf/core.o
  GEN  /tmp/build/perf/bpf_helper_defs.h
  CC   /tmp/build/perf/event-plugin.o
  MKDIR    /tmp/build/perf/staticobjs/
  PERF_VERSION = 5.12.rc2.g3df07f57f205
  CC   /tmp/build/perf/staticobjs/libbpf.o
  CC   /tmp/build/perf/cpu.o
  LD   /tmp/build/perf/fd/libapi-in.o
  CC   /tmp/build/perf/cpumap.o
  CC   /tmp/build/perf/help.o
  MKDIR    /tmp/build/perf/fs/
  CC   /tmp/build/perf/fs/tracing_path.o
  CC   /tmp/build/perf/fs/cgroup.o
  CC   /tmp/build/perf/trace-seq.o
  CC   /tmp/build/perf/pager.o
  CC   /tmp/build/perf/parse-options.o
  LD   /tmp/build/perf/fs/libapi-in.o
  CC   /tmp/build/perf/debug.o
  CC   /tmp/build/perf/str_error_r.o
  CC   /tmp/build/perf/run-command.o
  CC   /tmp/build/perf/sigchain.o
  LD   /tmp/build/perf/libapi-in.o
  AR   /tmp/build/perf/libapi.a
  CC   /tmp/build/perf/subcmd-config.o
  CC   /tmp/build/perf/threadmap.o
  CC   /tmp/build/perf/evsel.o
  CC   /tmp/build/perf/parse-filter.o
  MKDIR    /tmp/build/perf/staticobjs/
  CC   /tmp/build/perf/staticobjs/bpf.o
  CC   /tmp/build/perf/evlist.o
  CC   /tmp/build/perf/parse-utils.o
  CC   /tmp/build/perf/kbuffer-parse.o
  CC   /tmp/build/perf/tep_strerror.o
  CC   /tmp/build/perf/mmap.o
  CC   /tmp/build/perf/zalloc.o
  CC   /tmp/build/perf/event-parse-api.o
  LD   /tmp/build/perf/libsubcmd-in.o
  AR   /tmp/build/perf/libsubcmd.a
  CC   /tmp/build/perf/xyarray.o
  LD   /tmp/build/perf/libtraceevent-in.o
  LINK /tmp/build/perf/libtraceevent.a
  CC   /tmp/build/perf/staticobjs/nlattr.o
  CC   /tmp/build/perf/staticobjs/btf.o
  CC   /tmp/build/perf/lib.o
  CC   /tmp/build/perf/staticobjs/libbpf_errno.o
  CC   /tmp/build/perf/staticobjs/str_error.o
  CC   /tmp/build/perf/staticobjs/netlink.o
  CC   /tmp/build/perf/staticobjs/bpf_prog_linfo.o
  CC   /tmp/build/perf/staticobjs/libbpf_probes.o
  LD   /tmp/build/perf/libperf-in.o
  AR   /tmp/build/perf/libperf.a
  MKDIR    /tmp/build/perf/pmu-events/
  HOSTCC   /tmp/build/perf/pmu-events/json.o
  CC   /tmp/build/perf/plugin_jbd2.o
  CC   /tmp/build/perf/staticobjs/xsk.o
  MKDIR    /tmp/build/perf/pmu-events/
  HOSTCC   /tmp/build/perf/pmu-events/jsmn.o
  CC   /tmp/build/perf/staticobjs/hashmap.o
  LD   /tmp/build/perf/plugin_jbd2-in.o
  CC   /tmp/build/perf/staticobjs/btf_dump.o
  CC  

Re: [PATCH] Tools: lib: string: Fix isspace() parameter to avoid undefined behavior

2021-03-19 Thread Arnaldo Carvalho de Melo
On Fri, Mar 19, 2021 at 09:14:15AM +0900, hyunji-Hong wrote:
> The C library isspace() has undefined behavior when passed a negative value
> other than EOF. So, the parameter of isspace() should be cast to 'unsigned
> char'. We found that information on these sites (Microsoft, Stack Overflow):
> url: [https://docs.microsoft.com/en-us/cpp/code-quality/c6328?view=msvc-160]
> url: [https://stackoverflow.com/questions/122721]

The kernel and tools/ versions of isspace() are implemented in
include/linux/ctype.h and tools/include/linux/ctype.h, respectively.
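
To illustrate why that matters: if I recall correctly, those headers look the
character up in a table through an unsigned char conversion, so a negative
char can never index outside the table and the cast at the call site adds
nothing. A minimal sketch of that style of classifier follows; the names
(sketch_ctype, sketch_isspace, sketch_skip_spaces) are made up for the
example and this is not the kernel's code:

```c
#include <limits.h>

/* Hypothetical flag and table, one entry per possible unsigned char value. */
#define SKETCH_SPACE 0x01

static const unsigned char sketch_ctype[UCHAR_MAX + 1] = {
	[' ']  = SKETCH_SPACE, ['\t'] = SKETCH_SPACE, ['\n'] = SKETCH_SPACE,
	['\v'] = SKETCH_SPACE, ['\f'] = SKETCH_SPACE, ['\r'] = SKETCH_SPACE,
};

/*
 * The conversion to unsigned char keeps the index in 0..UCHAR_MAX even
 * when plain char is signed and holds a negative value.
 */
static int sketch_isspace(char c)
{
	return (sketch_ctype[(unsigned char)c] & SKETCH_SPACE) != 0;
}

/* Same loop shape as skip_spaces(), with no cast needed at the call site. */
const char *sketch_skip_spaces(const char *str)
{
	while (sketch_isspace(*str))
		++str;
	return str;
}
```

With the C library's isspace() the situation is different: there the caller
is responsible for passing a value representable as unsigned char (or EOF).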

- Arnaldo
 
> Signed-off-by: SeungHoon Woo, Hyunji Hong, Kyeongseok Yang 
> 
> ---
>  tools/lib/string.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/lib/string.c b/tools/lib/string.c
> index 8b6892f959ab..7d01be5cdcf8 100644
> --- a/tools/lib/string.c
> +++ b/tools/lib/string.c
> @@ -123,7 +123,7 @@ size_t __weak strlcpy(char *dest, const char *src, size_t size)
>   */
>  char *skip_spaces(const char *str)
>  {
> -	while (isspace(*str))
> +	while (isspace((unsigned char)*str))
>  		++str;
>  	return (char *)str;
>  }
> @@ -146,7 +146,7 @@ char *strim(char *s)
>  		return s;
>  
>  	end = s + size - 1;
> -	while (end >= s && isspace(*end))
> +	while (end >= s && isspace((unsigned char)*end))
>  		end--;
>  	*(end + 1) = '\0';
>  
> -- 
> 2.17.1



Re: [PATCH v2 0/3] perf-stat: share hardware PMCs with BPF

2021-03-19 Thread Arnaldo Carvalho de Melo
On Fri, Mar 19, 2021 at 09:54:59AM +0900, Namhyung Kim wrote:
> On Fri, Mar 19, 2021 at 9:22 AM Song Liu  wrote:
> > > On Mar 18, 2021, at 5:09 PM, Arnaldo  wrote:
> > > On March 18, 2021 6:14:34 PM GMT-03:00, Jiri Olsa wrote:
> > >> On Thu, Mar 18, 2021 at 03:52:51AM +, Song Liu wrote:
> > >>> perf stat -C 1,3,5  107.063 [sec]
> > >>> perf stat -C 1,3,5 --bpf-counters   106.406 [sec]

> > >> I can't see why it's actually faster than normal perf ;-)
> > >> would be worth finding out

> > > Isn't this all about contended cases?

> > Yeah, the normal perf is doing time multiplexing; while --bpf-counters
> > doesn't need it.

> Yep, so for uncontended cases, normal perf should be the same as the
> baseline (faster than the bperf).  But for contended cases, the bperf
> works faster.

The difference should be small enough that for people that use this in a
machine where contention happens most of the time, setting a
~/.perfconfig to use it by default should be advantageous, i.e. no need
to use --bpf-counters on the command line all the time.

So, Namhyung, can I take that as an Acked-by or a Reviewed-by? I'll take
a look again now but I want to have this merged on perf/core so that I
can work on a new BPF SKEL to use this:

https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/log/?h=tmp.bpf/bpf_perf_enable

:-)

- Arnaldo


Re: [PATCH v2 11/27] perf parse-events: Support hardware events inside PMU

2021-03-18 Thread Arnaldo Carvalho de Melo
On Thu, Mar 18, 2021 at 01:16:37PM +0100, Jiri Olsa wrote:
> On Wed, Mar 17, 2021 at 10:42:45AM -0300, Arnaldo Carvalho de Melo wrote:
> > On Wed, Mar 17, 2021 at 08:17:52PM +0800, Jin, Yao wrote:
> > > I'm OK to only support 'cpu_core/cpu-cycles/' or 'cpu_atom/cpu-cycles/'. But
> > > what would we do for cache event?

> > > 'perf stat -e LLC-loads' is OK, but 'perf stat -e cpu/LLC-loads/' is not
> > > supported currently.

> > > For hybrid platform, user may only want to enable the LLC-loads on core CPUs
> > > or on atom CPUs. That's reasonable. While if we don't support the pmu style
> > > event, how to satisfy this requirement?

> > > If we can support the pmu style event, we can also use the same way for
> > > cpu_core/cycles/. At least it's not a bad thing, right? :)

> > While we're discussing, do we really want to use the "core" and "atom"
> > terms here? I thought cpu/cycles/ would be ok for the main (Big) CPU and
> > > that we should come up with some short name for the "little" CPUs.

> > Won't we have the same situation with ARM where we want to know the
> > number of cycles spent on a BIG core and also on a little one?

> > Perhaps 'cycles' should mean all cycles, and then we use 'big/cycles/' and
> > 'little/cycles/'?

> do arm servers already export multiple pmus like this?
> I did not notice

I haven't checked, but AFAIK this BIG/Little kind of arch started there,
Mark?

- Arnaldo
 
> it'd definitely be great to have some unified way for this,
> so far we have the hybrid pmu detection and support in
> hw events like cycles/instructions.. which should be easy
> to follow on arm
> 
> there's also support to have these events on specific pmu
> pmu/cycles/ , which I still need to check on


Re: [PATCH v2 1/3] perf test: Remove unused argument

2021-03-18 Thread Arnaldo Carvalho de Melo
On Thu, Mar 18, 2021 at 02:01:49PM +0100, Jiri Olsa wrote:
> On Tue, Mar 16, 2021 at 05:55:03PM -0700, Ian Rogers wrote:
> > Remove unused argument from daemon_exit.
> > 
> > Signed-off-by: Ian Rogers 
> 
> for the patchset
> 
> Acked-by: Jiri Olsa 

Thanks, added to the csets I had applied already in my local repo.

- Arnaldo


Re: [PATCH v2] perf stat: Align CSV output for summary mode

2021-03-18 Thread Arnaldo Carvalho de Melo
On Wed, Mar 17, 2021 at 02:51:42PM -0700, Andi Kleen wrote:
> > If you care about not breaking existing scripts, then the output they
> > get with what they use as command line options must continue to produce
> > the same output.
> 
> It's not clear there are any useful ones (except for tools that handle
> both). It's really hard to parse the previous mess. It's simply not
> valid CSV.
> 
> That's why I'm arguing that keeping compatibility is not useful here.
> 
> We would be stuck with the broken mess as default forever.

Fair enough, let's fix the default then. Jin, can you please consider
adding a 'perf test' shell entry to parse the CSV mode with/without that
summary? This way we'll notice when the new normal gets broken.
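
The actual 'perf test' entry would be a shell script, but the check it needs
to make is small: every line of the -x, output, interval or summary, should
have the same number of fields. A rough sketch of that check in C, with
made-up function names (csv_field_count, csv_lines_aligned) and assuming no
field contains a quoted comma:

```c
#include <stddef.h>

/* Count comma-separated fields; empty fields (as in ",,") count too. */
size_t csv_field_count(const char *line)
{
	size_t n = 1;

	for (; *line; line++)
		if (*line == ',')
			n++;
	return n;
}

/* Return 1 if every line has as many fields as the first one, else 0. */
int csv_lines_aligned(const char *const *lines, size_t nr)
{
	size_t i;

	for (i = 1; i < nr; i++)
		if (csv_field_count(lines[i]) != csv_field_count(lines[0]))
			return 0;
	return 1;
}
```

Fed the lines from the patch description, an interval line and a
"summary,..." line both have 8 fields, while an old-style summary line
without the leading column has 7 and would be flagged.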

- Arnaldo


Re: [PATCH v2 11/27] perf parse-events: Support hardware events inside PMU

2021-03-17 Thread Arnaldo Carvalho de Melo
On Wed, Mar 17, 2021 at 08:17:52PM +0800, Jin, Yao wrote:
> Hi Jiri,
> 
> On 3/17/2021 6:06 PM, Jiri Olsa wrote:
> > On Wed, Mar 17, 2021 at 10:12:03AM +0800, Jin, Yao wrote:
> > > 
> > > 
> > > On 3/16/2021 10:04 PM, Jiri Olsa wrote:
> > > > On Tue, Mar 16, 2021 at 09:49:42AM +0800, Jin, Yao wrote:
> > > > 
> > > > SNIP
> > > > 
> > > > > 
> > > > >Performance counter stats for 'system wide':
> > > > > 
> > > > >  136,655,302  cpu_core/branch-instructions/
> > > > > 
> > > > >  1.003171561 seconds time elapsed
> > > > > 
> > > > > So we need special rules for both cycles and branches.
> > > > > 
> > > > > The worse thing is, we also need to process the hardware cache events.
> > > > > 
> > > > > # ./perf stat -e cpu_core/LLC-loads/
> > > > > event syntax error: 'cpu_core/LLC-loads/'
> > > > > > \___ unknown term 'LLC-loads' for pmu 'cpu_core'
> > > > > 
> > > > > valid terms: 
> > > > > event,pc,edge,offcore_rsp,ldlat,inv,umask,frontend,cmask,config,config1,config2,name,period,percore
> > > > > 
> > > > > Initial error:
> > > > > event syntax error: 'cpu_core/LLC-loads/'
> > > > > > \___ unknown term 'LLC-loads' for pmu 'cpu_core'
> > > > > 
> > > > > > If we use special rules for establishing all event mapping, that
> > > > > > looks too much. :(
> > > > 
> > > > hmmm but wait, currently we do not support events like this:
> > > > 
> > > > 'cpu/cycles/'
> > > > 'cpu/branches/'
> > > > 
> > > > the pmu style accepts only 'events' or 'format' terms within //
> > > > 
> > > > we made hw events like 'cycles','instructions','branches' special
> > > > to be used without the pmu
> > > > 
> > > > > so why do we need to support cpu_core/cycles/ ?

> > > Actually we have to support pmu style event for hybrid platform.

> > > > User may want to enable the events from specified pmus and also with
> > > > flexible grouping.

> > > > For example,

> > > > perf stat -e '{cpu_core/cycles/,cpu_core/instructions/}' -e '{cpu_atom/cycles/,cpu_atom/instructions/}'

> > > > This usage is common and reasonable. So I think we may need to support
> > > > pmu style events.

> > sure, but we don't support 'cpu/cycles/' but we support 'cpu/cpu-cycles/'
> > why do you insist on supporting cpu_core/cycles/ ?

> 
> I'm OK to only support 'cpu_core/cpu-cycles/' or 'cpu_atom/cpu-cycles/'. But
> what would we do for cache event?
> 
> 'perf stat -e LLC-loads' is OK, but 'perf stat -e cpu/LLC-loads/' is not 
> supported currently.
> 
> For hybrid platform, user may only want to enable the LLC-loads on core CPUs
> or on atom CPUs. That's reasonable. While if we don't support the pmu style
> event, how to satisfy this requirement?
> 
> If we can support the pmu style event, we can also use the same way for
> cpu_core/cycles/. At least it's not a bad thing, right? :)

While we're discussing, do we really want to use the "core" and "atom"
terms here? I thought cpu/cycles/ would be ok for the main (Big) CPU and
that we should come up with some short name for the "little" CPUs.

Won't we have the same situation with ARM where we want to know the
number of cycles spent on a BIG core and also on a little one?

Perhaps 'cycles' should mean all cycles, and then we use 'big/cycles/' and
'little/cycles/'?
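
To make the naming question concrete, here is a small sketch of how a
'pmu/term/' spec splits into its two parts, whatever the PMU ends up being
called (cpu_core, big, ...). split_pmu_event() is a made-up helper for
illustration, not perf's parser, and it ignores modifiers after the closing
slash and multiple comma-separated terms:

```c
#include <stddef.h>
#include <string.h>

/*
 * Split "pmu/term/" into pmu and term. Hypothetical helper.
 * Returns 0 on success, -1 if spec is not of that form (for example a
 * plain event name such as "cycles") or a buffer is too small.
 */
int split_pmu_event(const char *spec, char *pmu, size_t pmu_sz,
		    char *term, size_t term_sz)
{
	const char *first = strchr(spec, '/');
	const char *last;
	size_t pmu_len, term_len;

	if (!first)
		return -1;

	last = strrchr(spec, '/');
	/* Need two distinct slashes, the second one ending the string. */
	if (last == first || last[1] != '\0')
		return -1;

	pmu_len = (size_t)(first - spec);
	term_len = (size_t)(last - first) - 1;
	if (pmu_len == 0 || term_len == 0 ||
	    pmu_len >= pmu_sz || term_len >= term_sz)
		return -1;

	memcpy(pmu, spec, pmu_len);
	pmu[pmu_len] = '\0';
	memcpy(term, first + 1, term_len);
	term[term_len] = '\0';
	return 0;
}
```

With this, 'cpu_core/cycles/' and a hypothetical 'big/cycles/' differ only in
the pmu string, and a bare 'cycles' is rejected, which is where the "all
cycles" meaning could be hooked in.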

- Arnaldo


Re: unknown NMI on AMD Rome

2021-03-17 Thread Arnaldo Carvalho de Melo
On Wed, Mar 17, 2021 at 04:32:17PM +0300, Alexander Monakov wrote:
> On Wed, 17 Mar 2021, Peter Zijlstra wrote:
> > On Wed, Mar 17, 2021 at 09:48:29AM +0100, Ingo Molnar wrote:
> > > > https://developer.amd.com/wp-content/resources/56323-PUB_0.78.pdf

> > >   1215 IBS (Instruction Based Sampling) Counter Valid Value
> > >   May be Incorrect After Exit From Core C6 (CC6) State

> > >   Description

> > > >   If a core's IBS feature is enabled and configured to generate an
> > > >   interrupt, including NMI (Non-Maskable Interrupt), and the IBS counter
> > > >   overflows during the entry into the Core C6 (CC6) state, the interrupt
> > > >   may be issued, but an invalid value of the valid bit may be restored
> > > >   when the core exits CC6.

> > > >   Potential Effect on System

> > > >   The operating system may receive interrupts due to an IBS counter
> > > >   event, including NMI, and not observe a valid IBS register. Console
> > > >   messages indicating "NMI received for unknown reason" have been
> > > >   observed on Linux systems.

> > >   Suggested Workaround: None
> > >   Fix Planned: No fix planned

> > Should be simple enough to disable CC6 while IBS is in use. Kim, can you
> > please make that happen?

> Wouldn't that "magically" significantly speed up workloads running under
> 'perf top', in case they don't saturate the CPUs? Scheduling gets
> much snappier if the target CPU doesn't need to wake up from deep sleep :)

> Alternatively, would you consider adding the errata reference to the
> printk message when IBS is in use, and rate-limit it so it doesn't
> flood dmesg? Then the user will know what's going on, and may
> choose to temporarily disable C-states using the 'cpupower' tool.

It would be interesting as well to make 'perf top' realize that somehow
(looking at some cpu id, etc) and not use IBS when C-states are being
used and/or warn the user about the situation, i.e. cycles:P can't be
used in this machine if C-states are enabled?

- Arnaldo


Re: [PATCH v2] perf stat: Align CSV output for summary mode

2021-03-17 Thread Arnaldo Carvalho de Melo
On Wed, Mar 17, 2021 at 03:02:05PM +0800, Jin Yao wrote:
> perf-stat has supported the summary mode. But the summary
> lines break the CSV output so it's hard for scripts to parse
> the result.
> 
> Before:
> 
>   # perf stat -x, -I1000 --interval-count 1 --summary
>1.001323097,8013.48,msec,cpu-clock,8013483384,100.00,8.013,CPUs utilized
>1.001323097,270,,context-switches,8013513297,100.00,0.034,K/sec
>1.001323097,13,,cpu-migrations,8013530032,100.00,0.002,K/sec
>1.001323097,184,,page-faults,8013546992,100.00,0.023,K/sec
>1.001323097,20574191,,cycles,8013551506,100.00,0.003,GHz
>1.001323097,10562267,,instructions,8013564958,100.00,0.51,insn per cycle
>1.001323097,2019244,,branches,8013575673,100.00,0.252,M/sec
>1.001323097,106152,,branch-misses,8013585776,100.00,5.26,of all branches
>   8013.48,msec,cpu-clock,8013483384,100.00,7.984,CPUs utilized
>   270,,context-switches,8013513297,100.00,0.034,K/sec
>   13,,cpu-migrations,8013530032,100.00,0.002,K/sec
>   184,,page-faults,8013546992,100.00,0.023,K/sec
>   20574191,,cycles,8013551506,100.00,0.003,GHz
>   10562267,,instructions,8013564958,100.00,0.51,insn per cycle
>   2019244,,branches,8013575673,100.00,0.252,M/sec
>   106152,,branch-misses,8013585776,100.00,5.26,of all branches
> 
> The summary line loses the timestamp column, which breaks the
> CSV output.
> 
> We add a column at the original 'timestamp' position and it just says
> 'summary' for the summary line.
> 
> After:
> 
>   # perf stat -x, -I1000 --interval-count 1 --summary
>1.001196053,8012.72,msec,cpu-clock,8012722903,100.00,8.013,CPUs utilized
>1.001196053,218,,context-switches,8012753271,100.00,0.027,K/sec
>1.001196053,9,,cpu-migrations,8012769767,100.00,0.001,K/sec
>1.001196053,0,,page-faults,8012786257,100.00,0.000,K/sec
>1.001196053,15004518,,cycles,8012790637,100.00,0.002,GHz
>1.001196053,7954691,,instructions,8012804027,100.00,0.53,insn per cycle
>1.001196053,1590259,,branches,8012814766,100.00,0.198,M/sec
>1.001196053,82601,,branch-misses,8012824365,100.00,5.19,of all branches
>summary,8012.72,msec,cpu-clock,8012722903,100.00,7.986,CPUs utilized
>summary,218,,context-switches,8012753271,100.00,0.027,K/sec
>summary,9,,cpu-migrations,8012769767,100.00,0.001,K/sec
>summary,0,,page-faults,8012786257,100.00,0.000,K/sec
>summary,15004518,,cycles,8012790637,100.00,0.002,GHz
>summary,7954691,,instructions,8012804027,100.00,0.53,insn per cycle
>summary,1590259,,branches,8012814766,100.00,0.198,M/sec
>summary,82601,,branch-misses,8012824365,100.00,5.19,of all branches
> 
> Now it's easy for scripts to analyse the summary lines.
> 
> Of course, we also want to avoid breaking possible existing scripts which
> have worked around the broken CSV format, so we provide an option
> '--no-cvs-summary' to keep the original output.

If you care about not breaking existing scripts, then the output they
get with what they use as command line options must continue to produce
the same output.

Adding a new option for these pre-existing scripts to use will by definition
break them, as they will need to be modified to use this new option to
ask that the pre-existing output be produced. :-)
 
>   # perf stat -x, -I1000 --interval-count 1 --summary --no-cvs-summary
>1.001213261,8012.67,msec,cpu-clock,8012672327,100.00,8.013,CPUs utilized
>1.001213261,197,,context-switches,8012703742,100.00,24.586,/sec
>1.001213261,9,,cpu-migrations,8012720902,100.00,1.123,/sec
>1.001213261,644,,page-faults,8012738266,100.00,80.373,/sec
>1.001213261,18350698,,cycles,8012744109,100.00,0.002,GHz
>1.001213261,12745021,,instructions,8012759001,100.00,0.69,insn per cycle
>1.001213261,2458033,,branches,8012770864,100.00,306.768,K/sec
>1.001213261,102107,,branch-misses,8012781751,100.00,4.15,of all branches
>   8012.67,msec,cpu-clock,8012672327,100.00,7.985,CPUs utilized
>   197,,context-switches,8012703742,100.00,24.586,/sec
>   9,,cpu-migrations,8012720902,100.00,1.123,/sec
>   644,,page-faults,8012738266,100.00,80.373,/sec
>   18350698,,cycles,8012744109,100.00,0.002,GHz
>   12745021,,instructions,8012759001,100.00,0.69,insn per cycle
>   2458033,,branches,8012770864,100.00,306.768,K/sec
>   102107,,branch-misses,8012781751,100.00,4.15,of all branches
> 
> This option can be enabled in perf config by setting the variable
> 'stat.no-cvs-summary'.
> 
>   # perf config stat.no-cvs-summary=true
> 
>   # perf config -l
>   stat.no-cvs-summary=true
> 
>   # perf stat -x, -I1000 --interval-count 1 --summary
>1.001330198,8013.28,msec,cpu-clock,8013279201,100.00,8.013,CPUs utilized
>1.001330198,205,,context-switches,8013308394,100.00,25.583,/sec
>1.001330198,10,,cpu-migrations,8013324681,100.00,1.248,/sec
>

Re: [PATCH v2 0/3] perf-stat: share hardware PMCs with BPF

2021-03-17 Thread Arnaldo Carvalho de Melo
On Wed, Mar 17, 2021 at 02:29:28PM +0900, Namhyung Kim wrote:
> Hi Song,
> 
> On Wed, Mar 17, 2021 at 6:18 AM Song Liu  wrote:
> >
> > perf uses performance monitoring counters (PMCs) to monitor system
> > performance. The PMCs are limited hardware resources. For example,
> > Intel CPUs have 3x fixed PMCs and 4x programmable PMCs per cpu.
> >
> > Modern data center systems use these PMCs in many different ways:
> > system level monitoring, (maybe nested) container level monitoring, per
> > process monitoring, profiling (in sample mode), etc. In some cases,
> > there are more active perf_events than available hardware PMCs. To allow
> > all perf_events to have a chance to run, it is necessary to do expensive
> > time multiplexing of events.
> >
> > On the other hand, many monitoring tools count the common metrics (cycles,
> > instructions). It is a waste to have multiple tools create multiple
> > perf_events of "cycles" and occupy multiple PMCs.
> 
> Right, it'd be really helpful when the PMCs are frequently or mostly shared.
> But it'd also increase the overhead for uncontended cases as BPF programs
> need to run on every context switch.  Depending on the workload, it may
> cause a non-negligible performance impact.  So users should be aware of it.

Would be interesting to, humm, measure both cases to have a firm number
of the impact, how many instructions are added when sharing using
--bpf-counters?

I.e. compare the "expensive time multiplexing of events" with its
avoidance by using --bpf-counters.
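
For context on what multiplexing does to the numbers: when an event only gets
to run on a PMC for part of the time it was enabled, the reported count is
normally extrapolated from the raw count using the enabled and running times.
A minimal sketch of that scaling; scale_count() is a made-up name and the
rounding is simplified, so treat it as an illustration rather than what
perf-stat does line for line:

```c
#include <stdint.h>

/*
 * Extrapolate a raw counter value to the full enabled time.
 * raw:     value read from the counter
 * enabled: time the event was enabled
 * running: time the event was actually scheduled on a PMC
 */
uint64_t scale_count(uint64_t raw, uint64_t enabled, uint64_t running)
{
	if (running == 0)
		return 0;		/* never scheduled, nothing to scale */
	if (running >= enabled)
		return raw;		/* not multiplexed, count is exact */

	return (uint64_t)((double)raw * (double)enabled / (double)running);
}
```

So an event that counted 1000 while running for half of its enabled time is
reported as roughly 2000. The estimate gets noisier as the running fraction
shrinks, which is one more thing a comparison of the two modes could capture.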

Song, have you performed such measurements?

- Arnaldo
 
> Thanks,
> Namhyung
> 
> >
> > bperf tries to reduce such wastes by allowing multiple perf_events of
> > "cycles" or "instructions" (at different scopes) to share PMUs. Instead
> > of having each perf-stat session to read its own perf_events, bperf uses
> > BPF programs to read the perf_events and aggregate readings to BPF maps.
> > Then, the perf-stat session(s) reads the values from these BPF maps.
> >
> > Changes v1 => v2:
> >   1. Add documentation.
> >   2. Add a shell test.
> > >   3. Rename options, default path of the attr-map, and some variables.
> >   4. Add a separate patch that moves clock_gettime() in __run_perf_stat()
> >  to after enable_counters().
> >   5. Make perf_cpu_map for all cpus a global variable.
> >   6. Use sysfs__mountpoint() for default attr-map path.
> >   7. Use cpu__max_cpu() instead of libbpf_num_possible_cpus().
> >   8. Add flag "enabled" to the follower program. Then move follower attach
> >  to bperf__load() and simplify bperf__enable().
> >
> > Song Liu (3):
> >   perf-stat: introduce bperf, share hardware PMCs with BPF
> >   perf-stat: measure t0 and ref_time after enable_counters()
> >   perf-test: add a test for perf-stat --bpf-counters option
> >
> >  tools/perf/Documentation/perf-stat.txt|  11 +
> >  tools/perf/Makefile.perf  |   1 +
> >  tools/perf/builtin-stat.c |  20 +-
> >  tools/perf/tests/shell/stat_bpf_counters.sh   |  34 ++
> >  tools/perf/util/bpf_counter.c | 519 +-
> >  tools/perf/util/bpf_skel/bperf.h  |  14 +
> >  tools/perf/util/bpf_skel/bperf_follower.bpf.c |  69 +++
> >  tools/perf/util/bpf_skel/bperf_leader.bpf.c   |  46 ++
> >  tools/perf/util/bpf_skel/bperf_u.h|  14 +
> >  tools/perf/util/evsel.h   |  20 +-
> >  tools/perf/util/target.h  |   4 +-
> >  11 files changed, 742 insertions(+), 10 deletions(-)
> >  create mode 100755 tools/perf/tests/shell/stat_bpf_counters.sh
> >  create mode 100644 tools/perf/util/bpf_skel/bperf.h
> >  create mode 100644 tools/perf/util/bpf_skel/bperf_follower.bpf.c
> >  create mode 100644 tools/perf/util/bpf_skel/bperf_leader.bpf.c
> >  create mode 100644 tools/perf/util/bpf_skel/bperf_u.h
> >
> > --
> > 2.30.2

-- 

- Arnaldo


Re: [PATCH] perf stat: Align CSV output for summary mode

2021-03-16 Thread Arnaldo Carvalho de Melo
On Tue, Mar 16, 2021 at 09:34:21AM -0700, Andi Kleen wrote:
> > looks ok, but maybe make the option more related to CSV, like:
> > 
> >   --x-summary, --cvs-summary  ...? 
> 
> Actually I don't think it should be a new option. I doubt
> anyone could parse the previous mess. So just make it default
> with -x

In these cases I always fear that people are already parsing that mess
by considering the summary lines to be the ones not starting with
spaces, and now we go on and change it to be "better" by prefixing it
with "summary" and... break existing scripts.

Can we do this with a new option?

I.e. like --cvs-summary?

- Arnaldo


Re: [PATCH v4] perf tools: perf_event_paranoid and kptr_restrict may crash on 'perf top'

2021-03-16 Thread Arnaldo Carvalho de Melo
On Tue, Mar 16, 2021 at 09:24:53AM +0800, Jackie Liu wrote:
> After install the libelf-dev package and compiling perf, kptr_restrict=2
> and perf_event_paranoid=3 will cause perf top to crash, because the
> value of /proc/kallsyms cannot be obtained, which leads to
> info->jited_ksyms == NULL. In order to solve this problem, Add a
> judgment before use.
> 
> v3->v4:
> Fix memory leaks in more places
> 
> v2->v3:
> free info_linear memory and move code above, don't do those extra btf
> checks.

Applied, edited the commit message to:


perf top: Fix BPF support related crash with perf_event_paranoid=3 + kptr_restrict

After installing the libelf-dev package and compiling perf, if we have
kptr_restrict=2 and perf_event_paranoid=3 'perf top' will crash because
the value of /proc/kallsyms cannot be obtained, which leads to
info->jited_ksyms == NULL. In order to solve this problem, add a
check before use.

Also plug some leaks on the error path.
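
For readers skimming the diff below, a minimal sketch of the pattern:
check the pointer-carrying field before using it, and release the
allocation on every early exit. The struct layouts and the function name
synthesize_one_prog() are hypothetical stand-ins, not the real
perf/libbpf definitions, and the sketch uses a single cleanup label
where the patch itself calls free() in each branch.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins holding only the fields this sketch needs. */
struct prog_info {
	uint64_t jited_ksyms;     /* 0 when kernel addresses are not available */
	uint32_t nr_jited_ksyms;
	uint32_t nr_prog_tags;
};

struct info_linear {
	struct prog_info info;
};

/* Returns 0 on success, -1 on any validation or allocation failure. */
static int synthesize_one_prog(uint64_t ksyms, uint32_t nr_ksyms,
			       uint32_t nr_tags)
{
	struct info_linear *info_linear = calloc(1, sizeof(*info_linear));
	struct prog_info *info;
	int err = -1;

	if (!info_linear)
		return -1;

	/* Stand-in for the data the kernel would have filled in. */
	info_linear->info.jited_ksyms = ksyms;
	info_linear->info.nr_jited_ksyms = nr_ksyms;
	info_linear->info.nr_prog_tags = nr_tags;
	info = &info_linear->info;

	/* The added check: bail out instead of dereferencing a NULL array. */
	if (!info->jited_ksyms)
		goto out;

	/* Counts must match before the arrays are walked together. */
	if (info->nr_jited_ksyms != info->nr_prog_tags)
		goto out;

	err = 0;
out:
	/* This sketch owns the buffer on every path, so it always frees it. */
	free(info_linear);
	return err;
}
```

In the real function the buffer is handed over to another structure on
success, so only the error paths free it; the sketch frees it
unconditionally to stay self-contained.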

 
> Suggested-by: Jiri Olsa 
> Cc: Peter Zijlstra 
> Cc: Ingo Molnar 
> Cc: Arnaldo Carvalho de Melo 
> Cc: Mark Rutland 
> Cc: Alexander Shishkin 
> Cc: Namhyung Kim 
> Signed-off-by: Jackie Liu 
> ---
>  tools/perf/util/bpf-event.c | 13 ++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/perf/util/bpf-event.c b/tools/perf/util/bpf-event.c
> index 57d58c81a5f8..cdecda1ddd36 100644
> --- a/tools/perf/util/bpf-event.c
> +++ b/tools/perf/util/bpf-event.c
> @@ -196,25 +196,32 @@ static int perf_event__synthesize_one_bpf_prog(struct 
> perf_session *session,
>   }
>  
>   if (info_linear->info_len < offsetof(struct bpf_prog_info, prog_tags)) {
> + free(info_linear);
>   pr_debug("%s: the kernel is too old, aborting\n", __func__);
>   return -2;
>   }
>  
>   info = &info_linear->info;
> + if (!info->jited_ksyms) {
> + free(info_linear);
> + return -1;
> + }
>  
>   /* number of ksyms, func_lengths, and tags should match */
>   sub_prog_cnt = info->nr_jited_ksyms;
>   if (sub_prog_cnt != info->nr_prog_tags ||
> - sub_prog_cnt != info->nr_jited_func_lens)
> + sub_prog_cnt != info->nr_jited_func_lens) {
> + free(info_linear);
>   return -1;
> + }
>  
>   /* check BTF func info support */
>   if (info->btf_id && info->nr_func_info && info->func_info_rec_size) {
>   /* btf func info number should be same as sub_prog_cnt */
>   if (sub_prog_cnt != info->nr_func_info) {
>   pr_debug("%s: mismatch in BPF sub program count and BTF 
> function info count, aborting\n", __func__);
> - err = -1;
> - goto out;
> + free(info_linear);
> + return -1;
>   }
>   if (btf__get_from_id(info->btf_id, &btf)) {
>   pr_debug("%s: failed to get BTF of id %u, aborting\n", 
> __func__, info->btf_id);
> -- 
> 2.25.1
> 

-- 

- Arnaldo


Re: [PATCH v5] perf annotate: Fix sample events lost in stdio mode

2021-03-16 Thread Arnaldo Carvalho de Melo
On Tue, Mar 16, 2021 at 10:17:59AM +0800, Yang Jihong wrote:
> In hist__find_annotations function, since different hist_entry may point to 
> same
> symbol, we free notes->src to signal already processed this symbol in stdio 
> mode;
> when annotate, entry will skipped if notes->src is NULL to avoid repeated 
> output.
> 
> However, there is a problem, for example, run the following command:
> 
>  # perf record -e branch-misses -e branch-instructions -a sleep 1
> 
> perf.data file contains different types of sample event.
> 
> If the same IP sample event exists in branch-misses and branch-instructions,
> this event uses the same symbol. When annotate branch-misses events, 
> notes->src
> corresponding to this event is set to null, as a result, when annotate
> branch-instructions events, this event is skipped and no annotate is output.
> 
> Solution of this patch is to remove zfree in hists__find_annotations and
> change sort order to "dso,symbol" to avoid duplicate output when different
> processes correspond to the same symbol.

You forgot to add your Signed-off-by tag, i.e.:

Signed-off-by: Yang Jihong 

Please take a look at Documentation/process/submitting-patches.rst.

Regards,

- Arnaldo

> ---
>  tools/perf/builtin-annotate.c | 13 ++---
>  1 file changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
> index a23ba6bb99b6..92c55f292c11 100644
> --- a/tools/perf/builtin-annotate.c
> +++ b/tools/perf/builtin-annotate.c
> @@ -374,13 +374,6 @@ static void hists__find_annotations(struct hists *hists,
>   } else {
>   hist_entry__tty_annotate(he, evsel, ann);
>   nd = rb_next(nd);
> - /*
> -  * Since we have a hist_entry per IP for the same
> -  * symbol, free he->ms.sym->src to signal we already
> -  * processed this symbol.
> -  */
> - zfree(&notes->src->cycles_hist);
> - zfree(&notes->src);
>   }
>   }
>  }
> @@ -619,6 +612,12 @@ int cmd_annotate(int argc, const char **argv)
>  
>   setup_browser(true);
>  
> + /*
> +  * Events of different processes may correspond to the same
> +  * symbol, we do not care about the processes in annotate,
> +  * set sort order to avoid repeated output.
> +  */
> + sort_order = "dso,symbol";
>   if ((use_browser == 1 || annotate.use_stdio2) && annotate.has_br_stack) 
> {
>   sort__mode = SORT_MODE__BRANCH;
>   if (setup_sorting(annotate.session->evlist) < 0)
> -- 
> 2.30.GIT
> 

-- 

- Arnaldo


Re: [PATCH] perf record: Fix memory leak in vDSO

2021-03-16 Thread Arnaldo Carvalho de Melo
On Tue, Mar 16, 2021 at 11:28:12AM +0900, Namhyung Kim wrote:
> On Mon, Mar 15, 2021 at 10:28 PM Jiri Olsa  wrote:
> >
> > On Mon, Mar 15, 2021 at 01:56:41PM +0900, Namhyung Kim wrote:
> > > I got several memory leak reports from Asan with a simple command.  It
> > > was because VDSO is not released due to the refcount.  Like in
> > > __dsos_addnew_id(), it should put the refcount after adding to the list.
> > >
> > >   $ perf record true
> > >   [ perf record: Woken up 1 times to write data ]
> > >   [ perf record: Captured and wrote 0.030 MB perf.data (10 samples) ]
> > >
> > >   =
> > >   ==692599==ERROR: LeakSanitizer: detected memory leaks
> > >
> > >   Direct leak of 439 byte(s) in 1 object(s) allocated from:
> > > #0 0x7fea52341037 in __interceptor_calloc 
> > > ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
> > > #1 0x559bce4aa8ee in dso__new_id util/dso.c:1256
> > > #2 0x559bce59245a in __machine__addnew_vdso util/vdso.c:132
> > > #3 0x559bce59245a in machine__findnew_vdso util/vdso.c:347
> > > #4 0x559bce50826c in map__new util/map.c:175
> > > #5 0x559bce503c92 in machine__process_mmap2_event util/machine.c:1787
> > > #6 0x559bce512f6b in machines__deliver_event util/session.c:1481
> > > #7 0x559bce515107 in perf_session__deliver_event util/session.c:1551
> > > #8 0x559bce51d4d2 in do_flush util/ordered-events.c:244
> > > #9 0x559bce51d4d2 in __ordered_events__flush util/ordered-events.c:323
> > > #10 0x559bce519bea in __perf_session__process_events 
> > > util/session.c:2268
> > > #11 0x559bce519bea in perf_session__process_events util/session.c:2297
> > > #12 0x559bce2e7a52 in process_buildids 
> > > /home/namhyung/project/linux/tools/perf/builtin-record.c:1017
> > > #13 0x559bce2e7a52 in record__finish_output 
> > > /home/namhyung/project/linux/tools/perf/builtin-record.c:1234
> > > #14 0x559bce2ed4f6 in __cmd_record 
> > > /home/namhyung/project/linux/tools/perf/builtin-record.c:2026
> > > #15 0x559bce2ed4f6 in cmd_record 
> > > /home/namhyung/project/linux/tools/perf/builtin-record.c:2858
> > > #16 0x559bce422db4 in run_builtin 
> > > /home/namhyung/project/linux/tools/perf/perf.c:313
> > > #17 0x559bce2acac8 in handle_internal_command 
> > > /home/namhyung/project/linux/tools/perf/perf.c:365
> > > #18 0x559bce2acac8 in run_argv 
> > > /home/namhyung/project/linux/tools/perf/perf.c:409
> > > #19 0x559bce2acac8 in main 
> > > /home/namhyung/project/linux/tools/perf/perf.c:539
> > > #20 0x7fea51e76d09 in __libc_start_main ../csu/libc-start.c:308
> > >
> > >   Indirect leak of 32 byte(s) in 1 object(s) allocated from:
> > > #0 0x7fea52341037 in __interceptor_calloc 
> > > ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
> > > #1 0x559bce520907 in nsinfo__copy util/namespaces.c:169
> > > #2 0x559bce50821b in map__new util/map.c:168
> > > #3 0x559bce503c92 in machine__process_mmap2_event util/machine.c:1787
> > > #4 0x559bce512f6b in machines__deliver_event util/session.c:1481
> > > #5 0x559bce515107 in perf_session__deliver_event util/session.c:1551
> > > #6 0x559bce51d4d2 in do_flush util/ordered-events.c:244
> > > #7 0x559bce51d4d2 in __ordered_events__flush util/ordered-events.c:323
> > > #8 0x559bce519bea in __perf_session__process_events 
> > > util/session.c:2268
> > > #9 0x559bce519bea in perf_session__process_events util/session.c:2297
> > > #10 0x559bce2e7a52 in process_buildids 
> > > /home/namhyung/project/linux/tools/perf/builtin-record.c:1017
> > > #11 0x559bce2e7a52 in record__finish_output 
> > > /home/namhyung/project/linux/tools/perf/builtin-record.c:1234
> > > #12 0x559bce2ed4f6 in __cmd_record 
> > > /home/namhyung/project/linux/tools/perf/builtin-record.c:2026
> > > #13 0x559bce2ed4f6 in cmd_record 
> > > /home/namhyung/project/linux/tools/perf/builtin-record.c:2858
> > > #14 0x559bce422db4 in run_builtin 
> > > /home/namhyung/project/linux/tools/perf/perf.c:313
> > > #15 0x559bce2acac8 in handle_internal_command 
> > > /home/namhyung/project/linux/tools/perf/perf.c:365
> > > #16 0x559bce2acac8 in run_argv 
> > > /home/namhyung/project/linux/tools/perf/perf.c:409
> > > #17 0x559bce2acac8 in main 
> > > /home/namhyung/project/linux/tools/perf/perf.c:539
> > > #18 0x7fea51e76d09 in __libc_start_main ../csu/libc-start.c:308
> > >
> > >   SUMMARY: AddressSanitizer: 471 byte(s) leaked in 2 allocation(s).
> > >
> > > Signed-off-by: Namhyung Kim 
> > > ---
> > >  tools/perf/util/vdso.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > >
> > > diff --git a/tools/perf/util/vdso.c b/tools/perf/util/vdso.c
> > > index 3cc91ad048ea..43beb169631d 100644
> > > --- a/tools/perf/util/vdso.c
> > > +++ b/tools/perf/util/vdso.c
> > > @@ -133,6 +133,8 @@ static struct dso *__machine__addnew_vdso(struct 
> > > machine *machine, const char *s
> > >   

Re: [PATCH v9 2/2] perf vendor events arm64: Add Fujitsu A64FX pmu event

2021-03-15 Thread Arnaldo Carvalho de Melo
On Tue, Mar 09, 2021 at 11:59:18AM -0500, Masayoshi Mizuma wrote:
> On Mon, Mar 08, 2021 at 07:53:41PM +0900, Shunsuke Nakamura wrote:
> > Add pmu events for A64FX.
> > 
> > Documentation source:
> > https://github.com/fujitsu/A64FX/blob/master/doc/A64FX_PMU_Events_v1.2.pdf
> 
> The PMU events described in above document seems to work well
> with this patch!
> Please feel free to add:
> 
> Tested-by: Masayoshi Mizuma 

Thanks, applied.

- Arnaldo

 
> Thanks!
> Masa
> 
> > 
> > Signed-off-by: Shunsuke Nakamura 
> > Reviewed-by: John Garry 
> > ---
> >  .../arch/arm64/fujitsu/a64fx/branch.json  |   8 +
> >  .../arch/arm64/fujitsu/a64fx/bus.json |  62 ++
> >  .../arch/arm64/fujitsu/a64fx/cache.json   | 128 
> >  .../arch/arm64/fujitsu/a64fx/cycle.json   |   5 +
> >  .../arch/arm64/fujitsu/a64fx/exception.json   |  29 +++
> >  .../arch/arm64/fujitsu/a64fx/instruction.json | 131 
> >  .../arch/arm64/fujitsu/a64fx/memory.json  |   8 +
> >  .../arch/arm64/fujitsu/a64fx/other.json   | 188 +
> >  .../arch/arm64/fujitsu/a64fx/pipeline.json| 194 ++
> >  .../arch/arm64/fujitsu/a64fx/sve.json | 110 ++
> >  tools/perf/pmu-events/arch/arm64/mapfile.csv  |   1 +
> >  11 files changed, 864 insertions(+)
> >  create mode 100644 
> > tools/perf/pmu-events/arch/arm64/fujitsu/a64fx/branch.json
> >  create mode 100644 tools/perf/pmu-events/arch/arm64/fujitsu/a64fx/bus.json
> >  create mode 100644 
> > tools/perf/pmu-events/arch/arm64/fujitsu/a64fx/cache.json
> >  create mode 100644 
> > tools/perf/pmu-events/arch/arm64/fujitsu/a64fx/cycle.json
> >  create mode 100644 
> > tools/perf/pmu-events/arch/arm64/fujitsu/a64fx/exception.json
> >  create mode 100644 
> > tools/perf/pmu-events/arch/arm64/fujitsu/a64fx/instruction.json
> >  create mode 100644 
> > tools/perf/pmu-events/arch/arm64/fujitsu/a64fx/memory.json
> >  create mode 100644 
> > tools/perf/pmu-events/arch/arm64/fujitsu/a64fx/other.json
> >  create mode 100644 
> > tools/perf/pmu-events/arch/arm64/fujitsu/a64fx/pipeline.json
> >  create mode 100644 tools/perf/pmu-events/arch/arm64/fujitsu/a64fx/sve.json
> > 
> > diff --git a/tools/perf/pmu-events/arch/arm64/fujitsu/a64fx/branch.json 
> > b/tools/perf/pmu-events/arch/arm64/fujitsu/a64fx/branch.json
> > new file mode 100644
> > index ..b011af11bf94
> > --- /dev/null
> > +++ b/tools/perf/pmu-events/arch/arm64/fujitsu/a64fx/branch.json
> > @@ -0,0 +1,8 @@
> > +[
> > +  {
> > +"ArchStdEvent": "BR_MIS_PRED"
> > +  },
> > +  {
> > +"ArchStdEvent": "BR_PRED"
> > +  }
> > +]
> > diff --git a/tools/perf/pmu-events/arch/arm64/fujitsu/a64fx/bus.json 
> > b/tools/perf/pmu-events/arch/arm64/fujitsu/a64fx/bus.json
> > new file mode 100644
> > index ..084e88d7df73
> > --- /dev/null
> > +++ b/tools/perf/pmu-events/arch/arm64/fujitsu/a64fx/bus.json
> > @@ -0,0 +1,62 @@
> > +[
> > +  {
> > +"PublicDescription": "This event counts read transactions from tofu 
> > controller to measured CMG.",
> > +"EventCode": "0x314",
> > +"EventName": "BUS_READ_TOTAL_TOFU",
> > +"BriefDescription": "This event counts read transactions from tofu 
> > controller to measured CMG."
> > +  },
> > +  {
> > +"PublicDescription": "This event counts read transactions from PCI 
> > controller to measured CMG.",
> > +"EventCode": "0x315",
> > +"EventName": "BUS_READ_TOTAL_PCI",
> > +"BriefDescription": "This event counts read transactions from PCI 
> > controller to measured CMG."
> > +  },
> > +  {
> > +"PublicDescription": "This event counts read transactions from 
> > measured CMG local memory to measured CMG.",
> > +"EventCode": "0x316",
> > +"EventName": "BUS_READ_TOTAL_MEM",
> > +"BriefDescription": "This event counts read transactions from measured 
> > CMG local memory to measured CMG."
> > +  },
> > +  {
> > +"PublicDescription": "This event counts write transactions from 
> > measured CMG to CMG0, if measured CMG is not CMG0.",
> > +"EventCode": "0x318",
> > +"EventName": "BUS_WRITE_TOTAL_CMG0",
> > +"BriefDescription": "This event counts write transactions from 
> > measured CMG to CMG0, if measured CMG is not CMG0."
> > +  },
> > +  {
> > +"PublicDescription": "This event counts write transactions from 
> > measured CMG to CMG1, if measured CMG is not CMG1.",
> > +"EventCode": "0x319",
> > +"EventName": "BUS_WRITE_TOTAL_CMG1",
> > +"BriefDescription": "This event counts write transactions from 
> > measured CMG to CMG1, if measured CMG is not CMG1."
> > +  },
> > +  {
> > +"PublicDescription": "This event counts write transactions from 
> > measured CMG to CMG2, if measured CMG is not CMG2.",
> > +"EventCode": "0x31A",
> > +"EventName": "BUS_WRITE_TOTAL_CMG2",
> > +"BriefDescription": "This event counts write transactions from 
> > measured CMG to CMG2, if measured CMG is not CMG2."
> > +  },
> > +  {
> > +

Re: [PATCH v4] perf pmu: Validate raw event with sysfs exported format bits

2021-03-15 Thread Arnaldo Carvalho de Melo
On Wed, Mar 10, 2021 at 12:38:02PM +0100, Jiri Olsa wrote:
> On Wed, Mar 10, 2021 at 01:11:38PM +0800, Jin Yao wrote:
> > Warnings are reported for invalid bits.
> > 
> > Co-developed-by: Jiri Olsa 
> > Signed-off-by: Jin Yao 
> 
> Reviewed-by: Jiri Olsa 

Thanks, applied.

- Arnaldo



Re: [PATCH] perf-stat: introduce bperf, share hardware PMCs with BPF

2021-03-15 Thread Arnaldo Carvalho de Melo
On Fri, Mar 12, 2021 at 06:52:39PM +, Song Liu wrote:
> > On Mar 12, 2021, at 6:24 AM, Arnaldo Carvalho de Melo  
> > wrote:
> > On Thu, Mar 11, 2021 at 06:02:57PM -0800, Song Liu wrote:
> >> perf uses performance monitoring counters (PMCs) to monitor system
> >> performance. The PMCs are limited hardware resources. For example,
> >> Intel CPUs have 3x fixed PMCs and 4x programmable PMCs per cpu.
> >> 
> >> Modern data center systems use these PMCs in many different ways:
> >> system level monitoring, (maybe nested) container level monitoring, per
> >> process monitoring, profiling (in sample mode), etc. In some cases,
> >> there are more active perf_events than available hardware PMCs. To allow
> >> all perf_events to have a chance to run, it is necessary to do expensive
> >> time multiplexing of events.
> >> 
> >> On the other hand, many monitoring tools count the common metrics (cycles,
> >> instructions). It is a waste to have multiple tools create multiple
> >> perf_events of "cycles" and occupy multiple PMCs.
> >> 
> >> bperf tries to reduce such wastes by allowing multiple perf_events of
> >> "cycles" or "instructions" (at different scopes) to share PMUs. Instead
> >> of having each perf-stat session to read its own perf_events, bperf uses
> >> BPF programs to read the perf_events and aggregate readings to BPF maps.
> >> Then, the perf-stat session(s) reads the values from these BPF maps.
> >> 
> >> Please refer to the comment before the definition of bperf_ops for the
> >> description of bperf architecture.
> >> 
> >> bperf is off by default. To enable it, pass --use-bpf option to perf-stat.
> >> bperf uses a BPF hashmap to share information about BPF programs and maps
> >> used by bperf. This map is pinned to bpffs. The default address is
> >> /sys/fs/bpf/bperf_attr_map. The user could change the address with option
> >> --attr-map.
> >> 
> >> ---
> >> Known limitations:
> >> 1. Do not support per cgroup events;
> >> 2. Do not support monitoring of BPF program (perf-stat -b);
> >> 3. Do not support event groups.
> >> The following commands have been tested:
> >> 
> >>   perf stat --use-bpf -e cycles -a
> >>   perf stat --use-bpf -e cycles -C 1,3,4
> >>   perf stat --use-bpf -e cycles -p 123
> >>   perf stat --use-bpf -e cycles -t 100,101



> >> @@ -925,15 +931,15 @@ static int __run_perf_stat(int argc, const char 
> >> **argv, int run_idx)
> >>/*
> >> * Enable counters and exec the command:
> >> */
> >> -  t0 = rdclock();
> >> -  clock_gettime(CLOCK_MONOTONIC, &ref_time);
> >> -
> >>if (forks) {
> >>evlist__start_workload(evsel_list);
> >>err = enable_counters();
> >>if (err)
> >>return -1;
> >> 
> >> +  t0 = rdclock();
> >> +  clock_gettime(CLOCK_MONOTONIC, &ref_time);
> >> +
> >>if (interval || timeout || 
> >> evlist__ctlfd_initialized(evsel_list))
> >>status = dispatch_events(forks, timeout, interval, 
> >> &times);
> >>if (child_pid != -1) {
> >> @@ -954,6 +960,10 @@ static int __run_perf_stat(int argc, const char 
> >> **argv, int run_idx)
> >>err = enable_counters();
> >>if (err)
> >>return -1;
> >> +
> >> +  t0 = rdclock();
> >> +  clock_gettime(CLOCK_MONOTONIC, &ref_time);
> >> +
> >>status = dispatch_events(forks, timeout, interval, &times);
> >>}
> >> 
> > 
> > The above two hunks seem out of place, i.e. can they go to a different
> > patch and with an explanation about why this is needed?
> 
> Actually, I am still debating whether we want the above change in a separate 
> patch. It is related to the following change. 
> 
> [...]
> 
> >> +  /*
> >> +   * Attaching the skeleton takes non-trivial time (0.2s+ on a kernel
> >> +   * with some debug options enabled). This shows as a longer first
> >> +   * interval:
> >> +   *
> >> +   * # perf stat -e cycles -a -I 1000
> >> +   * #   time counts unit events
> >> +   *  1.267634674 26,259,166,523  cycles
> >> +   *  2.271637827 22,550,822,286  cycles
> >> +  

Re: [PATCH] perf-stat: introduce bperf, share hardware PMCs with BPF

2021-03-12 Thread Arnaldo Carvalho de Melo
On Fri, Mar 12, 2021 at 04:07:42PM +, Song Liu wrote:
> 
> 
> > On Mar 12, 2021, at 6:24 AM, Arnaldo Carvalho de Melo  
> > wrote:
> > 
> > On Thu, Mar 11, 2021 at 06:02:57PM -0800, Song Liu wrote:
> >> perf uses performance monitoring counters (PMCs) to monitor system
> >> performance. The PMCs are limited hardware resources. For example,
> >> Intel CPUs have 3x fixed PMCs and 4x programmable PMCs per cpu.
> >> 
> >> Modern data center systems use these PMCs in many different ways:
> >> system level monitoring, (maybe nested) container level monitoring, per
> >> process monitoring, profiling (in sample mode), etc. In some cases,
> >> there are more active perf_events than available hardware PMCs. To allow
> >> all perf_events to have a chance to run, it is necessary to do expensive
> >> time multiplexing of events.
> >> 
> >> On the other hand, many monitoring tools count the common metrics (cycles,
> >> instructions). It is a waste to have multiple tools create multiple
> >> perf_events of "cycles" and occupy multiple PMCs.
> >> 
> >> bperf tries to reduce such wastes by allowing multiple perf_events of
> >> "cycles" or "instructions" (at different scopes) to share PMUs. Instead
> >> of having each perf-stat session to read its own perf_events, bperf uses
> >> BPF programs to read the perf_events and aggregate readings to BPF maps.
> >> Then, the perf-stat session(s) reads the values from these BPF maps.
> >> 
> >> Please refer to the comment before the definition of bperf_ops for the
> >> description of bperf architecture.
> >> 
> >> bperf is off by default. To enable it, pass --use-bpf option to perf-stat.
> >> bperf uses a BPF hashmap to share information about BPF programs and maps
> >> used by bperf. This map is pinned to bpffs. The default address is
> >> /sys/fs/bpf/bperf_attr_map. The user could change the address with option
> >> --attr-map.
> >> 
> >> ---
> >> Known limitations:
> >> 1. Do not support per cgroup events;
> >> 2. Do not support monitoring of BPF program (perf-stat -b);
> >> 3. Do not support event groups.
> > 
> > Cool stuff, but I think you can break this up into more self contained
> > patches, see below.
> > 
> > Apart from that, some suggestions/requests:
> > 
> > We need a shell 'perf test' that uses some synthetic workload so that we
> > can count events with/without --use-bpf (--bpf-counters is my
> > alternative name, see below), and then compare if the difference is
> > under some acceptable range.
> 
> Yes, "perf test" makes sense. Would this be the extension of current 
> perf-test command? Or a new set of tests?

Extension, please look at:

tools/perf/tests/shell/

Those are shell script based tests, that will be run by 'perf test'
right after the other, C based ones.
 
> > As a followup patch we could have something like:
> > 
> > perf config stat.bpf-counters=yes
> > 
> > That would make 'perf stat' use BPF counters for what it can, using the
> > default method for the non-supported targets, emitting some 'perf stat
> > -v' visible warning (i.e. a debug message), i.e. make it opt-in that the
> > user wants to use BPF counters for all that is possible at that point in
> > time.
 
> Yes, the fallback mechanism will be very helpful. I also have ideas on
> setting a list for "common events", and only use BPF for the common 
> events. Not common events should just use the original method. 

Yeah, transition period, as the need arises, more can be done, with the
pre-existing method being the fallback or better than any BPF based
mechanism already.
 
> > Thanks for working on this,
> > 
> >> The following commands have been tested:
> >> 
> >>   perf stat --use-bpf -e cycles -a
> >>   perf stat --use-bpf -e cycles -C 1,3,4
> >>   perf stat --use-bpf -e cycles -p 123
> >>   perf stat --use-bpf -e cycles -t 100,101
> >> 
> >> Signed-off-by: Song Liu 
> >> ---
> >> tools/perf/Makefile.perf  |   1 +
> >> tools/perf/builtin-stat.c |  20 +-
> >> tools/perf/util/bpf_counter.c | 552 +-
> >> tools/perf/util/bpf_skel/bperf.h  |  14 +
> >> tools/perf/util/bpf_skel/bperf_follower.bpf.c |  65 +++
> >> tools/perf/util/bpf_skel/bperf_leader.bpf.c   |  46 ++
> >> tools/perf/util/evsel.h   | 

Re: [PATCH] perf-stat: introduce bperf, share hardware PMCs with BPF

2021-03-12 Thread Arnaldo Carvalho de Melo
On Thu, Mar 11, 2021 at 06:02:57PM -0800, Song Liu wrote:
> perf uses performance monitoring counters (PMCs) to monitor system
> performance. The PMCs are limited hardware resources. For example,
> Intel CPUs have 3x fixed PMCs and 4x programmable PMCs per cpu.
> 
> Modern data center systems use these PMCs in many different ways:
> system level monitoring, (maybe nested) container level monitoring, per
> process monitoring, profiling (in sample mode), etc. In some cases,
> there are more active perf_events than available hardware PMCs. To allow
> all perf_events to have a chance to run, it is necessary to do expensive
> time multiplexing of events.
> 
> On the other hand, many monitoring tools count the common metrics (cycles,
> instructions). It is a waste to have multiple tools create multiple
> perf_events of "cycles" and occupy multiple PMCs.
> 
> bperf tries to reduce such wastes by allowing multiple perf_events of
> "cycles" or "instructions" (at different scopes) to share PMUs. Instead
> of having each perf-stat session to read its own perf_events, bperf uses
> BPF programs to read the perf_events and aggregate readings to BPF maps.
> Then, the perf-stat session(s) reads the values from these BPF maps.
> 
> Please refer to the comment before the definition of bperf_ops for the
> description of bperf architecture.
> 
> bperf is off by default. To enable it, pass --use-bpf option to perf-stat.
> bperf uses a BPF hashmap to share information about BPF programs and maps
> used by bperf. This map is pinned to bpffs. The default address is
> /sys/fs/bpf/bperf_attr_map. The user could change the address with option
> --attr-map.
> 
> ---
> Known limitations:
> 1. Do not support per cgroup events;
> 2. Do not support monitoring of BPF program (perf-stat -b);
> 3. Do not support event groups.

Cool stuff, but I think you can break this up into more self contained
patches, see below.

Apart from that, some suggestions/requests:

We need a shell 'perf test' that uses some synthetic workload so that we
can count events with/without --use-bpf (--bpf-counters is my
alternative name, see below), and then compare if the difference is
under some acceptable range.

As a followup patch we could have something like:

 perf config stat.bpf-counters=yes

That would make 'perf stat' use BPF counters for what it can, using the
default method for the non-supported targets, emitting some 'perf stat
-v' visible warning (i.e. a debug message), i.e. make it opt-in that the
user wants to use BPF counters for all that is possible at that point in
time.

Thanks for working on this,

- Arnaldo
 
> The following commands have been tested:
> 
>perf stat --use-bpf -e cycles -a
>perf stat --use-bpf -e cycles -C 1,3,4
>perf stat --use-bpf -e cycles -p 123
>perf stat --use-bpf -e cycles -t 100,101
> 
> Signed-off-by: Song Liu 
> ---
>  tools/perf/Makefile.perf  |   1 +
>  tools/perf/builtin-stat.c |  20 +-
>  tools/perf/util/bpf_counter.c | 552 +-
>  tools/perf/util/bpf_skel/bperf.h  |  14 +
>  tools/perf/util/bpf_skel/bperf_follower.bpf.c |  65 +++
>  tools/perf/util/bpf_skel/bperf_leader.bpf.c   |  46 ++
>  tools/perf/util/evsel.h   |  20 +-
>  tools/perf/util/target.h  |   4 +-
>  8 files changed, 712 insertions(+), 10 deletions(-)
>  create mode 100644 tools/perf/util/bpf_skel/bperf.h
>  create mode 100644 tools/perf/util/bpf_skel/bperf_follower.bpf.c
>  create mode 100644 tools/perf/util/bpf_skel/bperf_leader.bpf.c
> 
> diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
> index f6e609673de2b..ca9aa08e85a1f 100644
> --- a/tools/perf/Makefile.perf
> +++ b/tools/perf/Makefile.perf
> @@ -1007,6 +1007,7 @@ python-clean:
>  SKEL_OUT := $(abspath $(OUTPUT)util/bpf_skel)
>  SKEL_TMP_OUT := $(abspath $(SKEL_OUT)/.tmp)
>  SKELETONS := $(SKEL_OUT)/bpf_prog_profiler.skel.h
> +SKELETONS += $(SKEL_OUT)/bperf_leader.skel.h 
> $(SKEL_OUT)/bperf_follower.skel.h
>  
>  ifdef BUILD_BPF_SKEL
>  BPFTOOL := $(SKEL_TMP_OUT)/bootstrap/bpftool
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index 2e2e4a8345ea2..34df713a8eea9 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -792,6 +792,12 @@ static int __run_perf_stat(int argc, const char **argv, 
> int run_idx)
>   }
>  
>   evlist__for_each_cpu (evsel_list, i, cpu) {
> + /*
> +  * bperf calls evsel__open_per_cpu() in bperf__load(), so
> +  * no need to call it again here.
> +  */
> + if (target.use_bpf)
> + break;
>   affinity__set(&affinity, cpu);
>  
>   evlist__for_each_entry(evsel_list, counter) {
> @@ -925,15 +931,15 @@ static int __run_perf_stat(int argc, const char **argv, 
> int run_idx)
>   /*
>* Enable counters and exec the command:
>*/
> - t0 = rdclock();
> -   

Re: [PATCH] perf synthetic events: Avoid write of uninitialized memory.

2021-03-10 Thread Arnaldo Carvalho de Melo
On Wed, Mar 10, 2021 at 12:48:36PM +0100, Jiri Olsa wrote:
> On Tue, Mar 09, 2021 at 03:49:45PM -0800, Ian Rogers wrote:
> > Account for alignment bytes in the zero-ing memset.
> > 
> > Signed-off-by: Ian Rogers 
> > ---
> >  tools/perf/util/synthetic-events.c | 9 +
> >  1 file changed, 5 insertions(+), 4 deletions(-)
> > 
> > diff --git a/tools/perf/util/synthetic-events.c 
> > b/tools/perf/util/synthetic-events.c
> > index b698046ec2db..31bf3dd6a1e0 100644
> > --- a/tools/perf/util/synthetic-events.c
> > +++ b/tools/perf/util/synthetic-events.c
> > @@ -424,7 +424,7 @@ int perf_event__synthesize_mmap_events(struct perf_tool 
> > *tool,
> >  
> > while (!io.eof) {
> > static const char anonstr[] = "//anon";
> > -   size_t size;
> > +   size_t size, aligned_size;
> >  
> > /* ensure null termination since stack will be reused. */
> > event->mmap2.filename[0] = '\0';
> > @@ -484,11 +484,12 @@ int perf_event__synthesize_mmap_events(struct 
> > perf_tool *tool,
> > }
> >  
> > size = strlen(event->mmap2.filename) + 1;
> > -   size = PERF_ALIGN(size, sizeof(u64));
> > +   aligned_size = PERF_ALIGN(size, sizeof(u64));
> > event->mmap2.len -= event->mmap.start;
> > event->mmap2.header.size = (sizeof(event->mmap2) -
> > -   (sizeof(event->mmap2.filename) - size));
> > -   memset(event->mmap2.filename + size, 0, machine->id_hdr_size);
> > +   (sizeof(event->mmap2.filename) - 
> > aligned_size));
> > +   memset(event->mmap2.filename + size, 0, machine->id_hdr_size +
> > +   (aligned_size - size));
> 
> so we did not zero the extra alignment bytes, nice ;-) looks good
> 
> Acked-by: Jiri Olsa 

That is really old:

Fixes: 1a853e36871b533c ("perf record: Allow specifying a pid to record")

Circa 2009, the PERF_RECORD_COMM is ok as TASK_COMM_LEN is 16.

But I think there are other places synthesizing PERF_RECORD_MMAP,
jitdump maybe:

tools/perf/bench/inject-buildid.c, but it uses memset to zero the whole
union, no problem.

tools/perf/util/jitdump.c
jit_repipe_code_load() but it uses calloc to allocate the union
perf_event, so no problem as well.
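
A minimal sketch of what the fix zeroes, for reference. The names
ALIGN_UP() and pad_filename() are hypothetical, and 'trailer' stands in
for the extra bytes that follow the filename (machine->id_hdr_size in
the patch); this is an illustration of the idea, not the perf code.

```c
#include <stddef.h>
#include <string.h>

/* Round x up to a multiple of a, where a is a power of two. */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((size_t)(a) - 1))

/*
 * Zero everything after the terminating NUL: first the padding up to the
 * next 8-byte boundary, then 'trailer' more bytes. Returns the aligned
 * size. The caller must provide at least aligned size + trailer bytes.
 */
static size_t pad_filename(char *buf, size_t trailer)
{
	size_t size = strlen(buf) + 1;
	size_t aligned_size = ALIGN_UP(size, sizeof(unsigned long long));

	/* Start at 'size', not 'aligned_size', so the padding is covered too. */
	memset(buf + size, 0, (aligned_size - size) + trailer);
	return aligned_size;
}
```

For "//anon" the string plus NUL is 7 bytes and the aligned size is 8, so
exactly one padding byte sits between the NUL and the trailer. Starting
the memset at the aligned offset, as the old code effectively did, would
skip that byte and leave stale stack contents in the event.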

Thanks, applied.

- Arnaldo


Re: [PATCH] tool/perf: Perf build fails on 5.12.0rc2 on s390

2021-03-09 Thread Arnaldo Carvalho de Melo
On Tue, Mar 09, 2021 at 12:04:47PM +0100, Thomas Richter wrote:
> perf build fails on 5.12.0rc2 on s390 with this error message:
> 
> util/synthetic-events.c: In function
>   ‘__event__synthesize_thread.part.0.isra’:
> util/synthetic-events.c:787:19: error: ‘kernel_thread’ may be
> used uninitialized in this function [-Werror=maybe-uninitialized]
> 787 |   if (_pid == pid && !kernel_thread) {
> |   ^
> 
> The build succeeds using command 'make DEBUG=y'.
> 
> The variable kernel_thread is set by this function sequence:
> 
> __event__synthesize_thread()
> |defines bool kernel_thread; as local variable and calls
> +--> perf_event__prepare_comm(..., &kernel_thread)
>  +--> perf_event__get_comm_ids(..., bool *kernel);
>   On return of this function variable kernel is always
>   set to true of false.

s/of/or/

But it is only called for the host 'struct machine', if that is not the
case, then the value of 'kernel_thread' is left undefined/uninitialized,
right?

> 
> To prevent this compile error, assign variable kernel_thread
> a value when it is defined.

Applied, and added:

Fixes: c1b907953b2cd9ff ("perf tools: Skip PERF_RECORD_MMAP event synthesis for 
kernel threads")

Changed the subject to:

perf synthetic-events: Fix uninitialized 'kernel_thread' variable

As this doesn't affect just s/390, it is entirely possible that that
variable gets used with an undefined value.
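
A minimal sketch of the hazard, under stated assumptions: get_comm_ids()
and should_synthesize_mmap() are hypothetical names, and the "pid == 2"
rule is a placeholder, not how perf decides what a kernel thread is. The
point is only that an out-parameter written on one path needs a defined
value at its definition.

```c
#include <stdbool.h>

/* Hypothetical helper: writes *kernel only on the host path. */
static int get_comm_ids(bool is_host, int pid, bool *kernel)
{
	if (!is_host)
		return 0;	/* non-host path: *kernel is left untouched */

	*kernel = (pid == 2);	/* placeholder rule for this sketch */
	return 0;
}

static bool should_synthesize_mmap(bool is_host, int pid)
{
	/* Initialized at definition, so the skipped path reads 'false'. */
	bool kernel_thread = false;

	get_comm_ids(is_host, pid, &kernel_thread);
	return !kernel_thread;
}
```

Without the initializer, the non-host call would read an indeterminate
value, which is what -Wmaybe-uninitialized is flagging.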

- Arnaldo
 
> Output after:
> [root@m35lp76 perf]# make  util/synthetic-events.o
> 
>  CC   util/synthetic-events.o
> [root@m35lp76 perf]#
> 
> Signed-off-by: Thomas Richter 
> ---
>  tools/perf/util/synthetic-events.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
> index b698046ec2db..5dd451695f33 100644
> --- a/tools/perf/util/synthetic-events.c
> +++ b/tools/perf/util/synthetic-events.c
> @@ -758,7 +758,7 @@ static int __event__synthesize_thread(union perf_event *comm_event,
>   for (i = 0; i < n; i++) {
>   char *end;
>   pid_t _pid;
> - bool kernel_thread;
> + bool kernel_thread = false;
>  
>   _pid = strtol(dirent[i]->d_name, &end, 10);
>   if (*end)
> -- 
> 2.29.2
> 

-- 

- Arnaldo


Re: [PATCH] perf auxtrace: Fix auxtrace queue conflict

2021-03-09 Thread Arnaldo Carvalho de Melo
Em Mon, Mar 08, 2021 at 08:54:37AM -0800, Andi Kleen escreveu:
> On Mon, Mar 08, 2021 at 05:11:43PM +0200, Adrian Hunter wrote:
> > The only requirement of an auxtrace queue is that the buffers are in
> > time order.  That is achieved by making separate queues for separate
> > perf buffer or AUX area buffer mmaps.
> > 
> > That generally means a separate queue per cpu for per-cpu contexts,
> > and a separate queue per thread for per-task contexts.
> > 
> > When buffers are added to a queue, perf checks that the buffer cpu
> > and thread id (tid) match the queue cpu and thread id.
> > 
> > However, generally, that need not be true, and perf will queue
> > buffers correctly anyway, so the check is not needed.
> > 
> > In addition, the check gets erroneously hit when using sample mode
> > to trace multiple threads.
> > 
> > Consequently, fix that case by removing the check.
> 
> Thanks!
> 
> Reviewed-by: Andi Kleen 

Thanks, applied.

- Arnaldo



Re: [PATCH] perf machine: Assign boolean values to a bool variable

2021-03-09 Thread Arnaldo Carvalho de Melo
Em Tue, Mar 09, 2021 at 06:11:09PM +0800, Jiapeng Chong escreveu:
> Fix the following coccicheck warnings:
> 
> ./tools/perf/util/machine.c:2041:9-10: WARNING: return of 0/1 in
> function 'symbol__match_regex' with return type bool.

Thanks, applied.

- Arnaldo

 
> Reported-by: Abaci Robot 
> Signed-off-by: Jiapeng Chong 
> ---
>  tools/perf/util/machine.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index b5c2d8b..435771e 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -2038,8 +2038,8 @@ int machine__process_event(struct machine *machine, union perf_event *event,
>  static bool symbol__match_regex(struct symbol *sym, regex_t *regex)
>  {
>   if (!regexec(regex, sym->name, 0, NULL, 0))
> - return 1;
> - return 0;
> + return true;
> + return false;
>  }
>  
>  static void ip__resolve_ams(struct thread *thread,
> -- 
> 1.8.3.1
> 

-- 

- Arnaldo


Re: [PATCH] perf tools: use ARRAY_SIZE

2021-03-09 Thread Arnaldo Carvalho de Melo
Em Tue, Mar 09, 2021 at 05:12:25PM +0800, Jiapeng Chong escreveu:
> Fix the following cppcheck warnings:
> 
> ./tools/perf/tests/demangle-ocaml-test.c:29:34-35: WARNING: Use
> ARRAY_SIZE.

Thanks, applied.

- Arnaldo

 
> Reported-by: Abaci Robot 
> Signed-off-by: Jiapeng Chong 
> ---
>  tools/perf/tests/demangle-ocaml-test.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/perf/tests/demangle-ocaml-test.c b/tools/perf/tests/demangle-ocaml-test.c
> index a273ed5..2fac7d7 100644
> --- a/tools/perf/tests/demangle-ocaml-test.c
> +++ b/tools/perf/tests/demangle-ocaml-test.c
> @@ -26,7 +26,7 @@ int test__demangle_ocaml(struct test *test __maybe_unused, 
> int subtest __maybe_u
> "Stdlib.bytes.++" },
>   };
>  
> - for (i = 0; i < sizeof(test_cases) / sizeof(test_cases[0]); i++) {
> + for (i = 0; i < ARRAY_SIZE(test_cases); i++) {
>   buf = ocaml_demangle_sym(test_cases[i].mangled);
>   if ((buf == NULL && test_cases[i].demangled != NULL)
>   || (buf != NULL && test_cases[i].demangled == NULL)
> -- 
> 1.8.3.1
> 

-- 

- Arnaldo

