[PATCH v4 14/15] perf tools script: Add array bound checking to list_scripts

2019-03-05 Thread Andi Kleen
From: Andi Kleen 

Don't overflow array when the scripts directory is too large,
or the script file name is too long.

Signed-off-by: Andi Kleen 
---
 tools/perf/builtin-script.c  | 8 ++--
 tools/perf/builtin.h | 3 ++-
 tools/perf/ui/browsers/scripts.c | 3 ++-
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index d5e819b68970..652c114f8d40 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -2983,7 +2983,8 @@ static int check_ev_match(char *dir_name, char 
*scriptname,
  * will list all statically runnable scripts, select one, execute it and
  * show the output in a perf browser.
  */
-int find_scripts(char **scripts_array, char **scripts_path_array)
+int find_scripts(char **scripts_array, char **scripts_path_array, int num,
+int pathlen)
 {
struct dirent *script_dirent, *lang_dirent;
char scripts_path[MAXPATHLEN], lang_path[MAXPATHLEN];
@@ -3028,7 +3029,10 @@ int find_scripts(char **scripts_array, char 
**scripts_path_array)
/* Skip those real time scripts: xxxtop.p[yl] */
if (strstr(script_dirent->d_name, "top."))
continue;
-   sprintf(scripts_path_array[i], "%s/%s", lang_path,
+   if (i >= num)
+   break;
+   snprintf(scripts_path_array[i], pathlen, "%s/%s",
+   lang_path,
script_dirent->d_name);
temp = strchr(script_dirent->d_name, '.');
snprintf(scripts_array[i],
diff --git a/tools/perf/builtin.h b/tools/perf/builtin.h
index 05745f3ce912..999fe9170122 100644
--- a/tools/perf/builtin.h
+++ b/tools/perf/builtin.h
@@ -40,5 +40,6 @@ int cmd_mem(int argc, const char **argv);
 int cmd_data(int argc, const char **argv);
 int cmd_ftrace(int argc, const char **argv);
 
-int find_scripts(char **scripts_array, char **scripts_path_array);
+int find_scripts(char **scripts_array, char **scripts_path_array, int num,
+int pathlen);
 #endif
diff --git a/tools/perf/ui/browsers/scripts.c b/tools/perf/ui/browsers/scripts.c
index 5d407ab834b5..6759d6657e1a 100644
--- a/tools/perf/ui/browsers/scripts.c
+++ b/tools/perf/ui/browsers/scripts.c
@@ -117,7 +117,8 @@ static int list_scripts(char *script_name, bool *custom,
paths[i] = names[i] + SCRIPT_NAMELEN;
}
 
-   num = find_scripts(names + max_std, paths + max_std);
+   num = find_scripts(names + max_std, paths + max_std, SCRIPT_MAX_NO - 
max_std,
+   SCRIPT_FULLPATH_LEN);
if (num < 0)
num = 0;
choice = ui__popup_menu(num + max_std, (char * const *)names);
-- 
2.20.1



Re: [PATCH v5 07/10] perf record: implement -z,--compression_level=n option and compression

2019-03-04 Thread Andi Kleen
On Fri, Mar 01, 2019 at 06:58:32PM +0300, Alexey Budankov wrote:

Could do this as a follow up patch, but at some point the new
records need to be documented in Documentation/perf.data-file-format.txt

-Andi


Re: [PATCH v3 02/11] perf tools script: Support insn output for normal samples

2019-03-04 Thread Andi Kleen
> > +   uname();
> > +   if (!strcmp(uts.machine, session->header.env.arch) ||
> > +   (!strcmp(uts.machine, "x86_64") &&
> > +!strcmp(session->header.env.arch, "i386")))
> 
> why is this check and native_arch bool necessary?
> i386 data will be overed by arch/x86

This is so that e.g. if someone displays an ARM perf.data
on x86 it won't try to decode the ARM instructions as x86
instructions.

It also assumes that noone parses 64bit x86 on 32bit,
but I guess that's ok.

-Andi


Re: [PATCH v3 10/11] perf tools report: Implement browsing of individual samples

2019-03-04 Thread Andi Kleen
> > +--samples=N::
> > +   Save N individual samples for each histogram entry to show context in 
> > perf
> > +   report tui browser.
> 
> maybe we could set some default value (50?)

50 wouldn't fit on the screen.

I turned it off by default intentionally because it will increase the
memory consumption of perf report quite a bit.

It might be possible to set some small default though, but right
now I guess whoever wants it can enable it.

-Andi


[PATCH v3 07/11] perf tools report: Support running scripts for current time range

2019-02-28 Thread Andi Kleen
From: Andi Kleen 

When using the time sort key, add new context menus to run
scripts for only the currently selected time range. Compute
the correct range for the selection add pass it as the --time option to
perf script.

Signed-off-by: Andi Kleen 

---
v2:
Use symbol_conf.time_quantum
v3:
Work around gcc bug that triggers bogus warning.
---
 tools/perf/ui/browsers/hists.c | 82 +-
 1 file changed, 71 insertions(+), 11 deletions(-)

diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index aef800d97ea1..691056b99b76 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -30,6 +30,7 @@
 #include "srcline.h"
 #include "string2.h"
 #include "units.h"
+#include "time-utils.h"
 
 #include "sane_ctype.h"
 
@@ -2338,6 +2339,7 @@ static int switch_data_file(void)
 }
 
 struct popup_action {
+   unsigned long   time;
struct thread   *thread;
struct map_symbol   ms;
int socket;
@@ -2527,36 +2529,64 @@ static int
 do_run_script(struct hist_browser *browser __maybe_unused,
  struct popup_action *act)
 {
-   char script_opt[64];
-   memset(script_opt, 0, sizeof(script_opt));
+   char *script_opt;
+   int len;
+   int n = 0;
+
+   len = 100;
+   if (act->thread)
+   len += strlen(thread__comm_str(act->thread));
+   else if (act->ms.sym)
+   len += strlen(act->ms.sym->name);
+   script_opt = malloc(len);
+   if (!script_opt)
+   return -1;
 
+   script_opt[0] = 0;
if (act->thread) {
-   scnprintf(script_opt, sizeof(script_opt), " -c %s ",
+   n = scnprintf(script_opt, len, " -c %s ",
  thread__comm_str(act->thread));
} else if (act->ms.sym) {
-   scnprintf(script_opt, sizeof(script_opt), " -S %s ",
+   n = scnprintf(script_opt, len, " -S %s ",
  act->ms.sym->name);
}
 
+   if (act->time) {
+   char start[22], end[22];
+   unsigned long starttime = act->time;
+   unsigned long endtime = act->time + symbol_conf.time_quantum;
+
+   if (starttime == endtime) { /* Display 1ms as fallback */
+   starttime -= 1*100;
+   endtime += 1*100;
+   }
+   timestamp__scnprintf_usec(starttime, start, sizeof start);
+   timestamp__scnprintf_usec(endtime, end, sizeof end);
+   n += snprintf(script_opt + n, len - n, " --time %s,%s", start, 
end);
+   }
+
script_browse(script_opt);
+   free(script_opt);
return 0;
 }
 
 static int
-add_script_opt(struct hist_browser *browser __maybe_unused,
+add_script_opt_2(struct hist_browser *browser __maybe_unused,
   struct popup_action *act, char **optstr,
-  struct thread *thread, struct symbol *sym)
+  struct thread *thread, struct symbol *sym,
+  const char *tstr)
 {
+
if (thread) {
-   if (asprintf(optstr, "Run scripts for samples of thread [%s]",
-thread__comm_str(thread)) < 0)
+   if (asprintf(optstr, "Run scripts for samples of thread [%s]%s",
+thread__comm_str(thread), tstr) < 0)
return 0;
} else if (sym) {
-   if (asprintf(optstr, "Run scripts for samples of symbol [%s]",
-sym->name) < 0)
+   if (asprintf(optstr, "Run scripts for samples of symbol [%s]%s",
+sym->name, tstr) < 0)
return 0;
} else {
-   if (asprintf(optstr, "Run scripts for all samples") < 0)
+   if (asprintf(optstr, "Run scripts for all samples%s", tstr) < 0)
return 0;
}
 
@@ -2566,6 +2596,36 @@ add_script_opt(struct hist_browser *browser 
__maybe_unused,
return 1;
 }
 
+static int
+add_script_opt(struct hist_browser *browser,
+  struct popup_action *act, char **optstr,
+  struct thread *thread, struct symbol *sym)
+{
+   int n, j;
+   struct hist_entry *he;
+
+   n = add_script_opt_2(browser, act, optstr, thread, sym, "");
+
+   he = hist_browser__selected_entry(browser);
+   if (sort_order && strstr(sort_order, "time")) {
+   char tstr[128];
+
+   optstr++;
+   act++;
+   j = sprintf(tstr, " in ");
+   j += timestamp__scnprintf_usec(he->time, tstr + j,
+ 

Support sample context in perf report

2019-02-28 Thread Andi Kleen
[Changes: 
v3:
Fix compile problem on Fedora.
Rebase on latest tip. Now hopefully no missing patches.]

We currently have two ways to look at sample data in perf:
either use perf report to aggregate everything, or use
perf script to look at all individual samples.

Both ways are useful. Of course aggregation is useful
to quickly find the most expensive part of the code.

But sometimes a single sample is not good enough to
determine the problem and we need to look at context, either
through branch contexts, or other previous samples (e.g. for
correlating different micro architecture events or computing
metrics)

This can be done through perf script today, but it can
be rather cumbersome to find the right samples to look
at.

Another problem with perf report is that it aggregates
the whole measurement period. But many real workloads
have phases where they behave quite differently, and it is
often not useful to combine them into a single histogram.

While this can be worked around with the --time option
to report, it can be quite cumbersome.

This patch kit attempts to address some of these
problems in perf report by making it time aware.

- It adds a new time sort key that allows perf report
to separate samples from different regions. The time
regions can be defined with a new --time-quantum option.

- Then it extends the perf script support in the
tui record browser to allow browsing samples for 
different time regions from within a perf report
session.

- Extends the report browser script display
to automatically select sensible defaults
based on what was recorded. For example it will
automatically show branch contexts with -b.

- Support browsing the context of individual samples.
perf report can save a limited number of random samples
per histogram entry with the new --samples option.
Then the browser allows directly jumping to any
of the saved samples and browsing the context on the current
thread or CPU.

There could be probably be done more to make
perf report even better for such use cases (e.g. a real
time line display), but this basic support is good
enough for many useful usages.

Also available in
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git perf/streams-3




[PATCH v3 09/11] perf tools: Add utility function to print ns time stamps

2019-02-28 Thread Andi Kleen
From: Andi Kleen 

Add a utility function to print nanosecond timestamps.

Signed-off-by: Andi Kleen 
---
 tools/perf/util/time-utils.c | 8 
 tools/perf/util/time-utils.h | 1 +
 2 files changed, 9 insertions(+)

diff --git a/tools/perf/util/time-utils.c b/tools/perf/util/time-utils.c
index 6193b46050a5..a63bdf4cfd1b 100644
--- a/tools/perf/util/time-utils.c
+++ b/tools/perf/util/time-utils.c
@@ -404,6 +404,14 @@ int timestamp__scnprintf_usec(u64 timestamp, char *buf, 
size_t sz)
return scnprintf(buf, sz, "%"PRIu64".%06"PRIu64, sec, usec);
 }
 
+int timestamp__scnprintf_nsec(u64 timestamp, char *buf, size_t sz)
+{
+   u64  sec = timestamp / NSEC_PER_SEC;
+   u64 nsec = timestamp % NSEC_PER_SEC;
+
+   return scnprintf(buf, sz, "%"PRIu64".%09"PRIu64, sec, nsec);
+}
+
 int fetch_current_timestamp(char *buf, size_t sz)
 {
struct timeval tv;
diff --git a/tools/perf/util/time-utils.h b/tools/perf/util/time-utils.h
index 70b177d2b98c..9266cf4a8e58 100644
--- a/tools/perf/util/time-utils.h
+++ b/tools/perf/util/time-utils.h
@@ -24,6 +24,7 @@ bool perf_time__ranges_skip_sample(struct perf_time_interval 
*ptime_buf,
   int num, u64 timestamp);
 
 int timestamp__scnprintf_usec(u64 timestamp, char *buf, size_t sz);
+int timestamp__scnprintf_nsec(u64 timestamp, char *buf, size_t sz);
 
 int fetch_current_timestamp(char *buf, size_t sz);
 
-- 
2.17.2



[PATCH v3 03/11] perf tools report: Support nano seconds

2019-02-28 Thread Andi Kleen
From: Andi Kleen 

Upcoming changes add timestamp output in perf report. Add a --ns
argument similar to perf script to support nanoseconds resolution
when needed.

Signed-off-by: Andi Kleen 

---
v2:
Move flag into symbol_conf and change all users
---
 tools/perf/Documentation/perf-report.txt |  3 +++
 tools/perf/builtin-report.c  |  1 +
 tools/perf/builtin-script.c  | 11 +--
 tools/perf/util/symbol.c |  1 +
 tools/perf/util/symbol_conf.h|  1 +
 5 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt 
b/tools/perf/Documentation/perf-report.txt
index 1a27bfe05039..51dbc519dbce 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -477,6 +477,9 @@ include::itrace.txt[]
Please note that not all mmaps are stored, options affecting which ones
are include 'perf record --data', for instance.
 
+--ns::
+   Show time stamps in nanoseconds.
+
 --stats::
Display overall events statistics without any further processing.
(like the one at the end of the perf report -D command)
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 1532ebde6c4b..09180e559ad6 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -1147,6 +1147,7 @@ int cmd_report(int argc, const char **argv)
OPT_CALLBACK(0, "percent-type", _opts, "local-period",
 "Set percent type local/global-period/hits",
 annotate_parse_percent_type),
+   OPT_BOOLEAN(0, "ns", _conf.nanosecs, "Show times in nanosecs"),
OPT_END()
};
struct perf_data data = {
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index fbc440bdf880..6575322c61c0 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -60,7 +60,6 @@ static bool   no_callchain;
 static boollatency_format;
 static boolsystem_wide;
 static boolprint_flags;
-static boolnanosecs;
 static const char  *cpu_list;
 static DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS);
 static struct perf_stat_config stat_config;
@@ -691,7 +690,7 @@ static int perf_sample__fprintf_start(struct perf_sample 
*sample,
secs = nsecs / NSEC_PER_SEC;
nsecs -= secs * NSEC_PER_SEC;
 
-   if (nanosecs)
+   if (symbol_conf.nanosecs)
printed += fprintf(fp, "%5lu.%09llu: ", secs, nsecs);
else {
char sample_time[32];
@@ -3244,7 +3243,7 @@ static int parse_insn_trace(const struct option *opt 
__maybe_unused,
 {
parse_output_fields(NULL, "+insn,-event,-period", 0);
itrace_parse_synth_opts(opt, "i0ns", 0);
-   nanosecs = true;
+   symbol_conf.nanosecs = true;
return 0;
 }
 
@@ -3262,7 +3261,7 @@ static int parse_call_trace(const struct option *opt 
__maybe_unused,
 {
parse_output_fields(NULL, "-ip,-addr,-event,-period,+callindent", 0);
itrace_parse_synth_opts(opt, "cewp", 0);
-   nanosecs = true;
+   symbol_conf.nanosecs = true;
return 0;
 }
 
@@ -3272,7 +3271,7 @@ static int parse_callret_trace(const struct option *opt 
__maybe_unused,
 {
parse_output_fields(NULL, 
"-ip,-addr,-event,-period,+callindent,+flags", 0);
itrace_parse_synth_opts(opt, "crewp", 0);
-   nanosecs = true;
+   symbol_conf.nanosecs = true;
return 0;
 }
 
@@ -3408,7 +3407,7 @@ int cmd_script(int argc, const char **argv)
OPT_BOOLEAN('f', "force", _conf.force, "don't complain, do it"),
OPT_INTEGER(0, "max-blocks", _blocks,
"Maximum number of code blocks to dump with brstackinsn"),
-   OPT_BOOLEAN(0, "ns", ,
+   OPT_BOOLEAN(0, "ns", _conf.nanosecs,
"Use 9 decimal places when displaying time"),
OPT_CALLBACK_OPTARG(0, "itrace", _synth_opts, NULL, "opts",
"Instruction Tracing options\n" ITRACE_HELP,
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 758bf5f74e6e..eb873ea1c405 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -39,6 +39,7 @@ int vmlinux_path__nr_entries;
 char **vmlinux_path;
 
 struct symbol_conf symbol_conf = {
+   .nanosecs   = false,
.use_modules= true,
.try_vmlinux_path   = true,
.demangle   = true,
diff --git a/tools/perf/util/symbol_conf.h b/tools/perf/util/symbol_conf.h
index fffea68c1203..095a297c8b47 100644
--- a/tools/perf/util/symbol_conf.

[PATCH v3 08/11] perf tools report: Support builtin perf script in scripts menu

2019-02-28 Thread Andi Kleen
From: Andi Kleen 

The scripts menu traditionally only showed custom perf scripts.

Allow to run standard perf script with useful default options too.

- Normal perf script
- perf script with assembler (needs xed installed)
- perf script with source code output (needs debuginfo)
- perf script with custom arguments

Then we automatically select the right options to
display the information in the perf.data file.

For example with -b display branch contexts.

It's not easily possible to check for xed's existence
in advance.  perf script usually gives sensible error messages when
it's not available.

Signed-off-by: Andi Kleen 

---
v2:
Pass --inline if needed

squash! perf tools report: Support builtin perf script in scripts menu

Avoid compiler warnings. Allocate cmd buffer dynamically.
---
 tools/perf/ui/browsers/annotate.c |   2 +-
 tools/perf/ui/browsers/hists.c|  20 +++--
 tools/perf/ui/browsers/scripts.c  | 127 +++---
 tools/perf/util/hist.h|   8 +-
 4 files changed, 119 insertions(+), 38 deletions(-)

diff --git a/tools/perf/ui/browsers/annotate.c 
b/tools/perf/ui/browsers/annotate.c
index 35bdfd8b1e71..98d934a36d86 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -750,7 +750,7 @@ static int annotate_browser__run(struct annotate_browser 
*browser,
continue;
case 'r':
{
-   script_browse(NULL);
+   script_browse(NULL, NULL);
continue;
}
case 'k':
diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index 691056b99b76..41027284f1e1 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -2343,6 +2343,7 @@ struct popup_action {
struct thread   *thread;
struct map_symbol   ms;
int socket;
+   struct perf_evsel   *evsel;
 
int (*fn)(struct hist_browser *browser, struct popup_action *act);
 };
@@ -2565,7 +2566,7 @@ do_run_script(struct hist_browser *browser __maybe_unused,
n += snprintf(script_opt + n, len - n, " --time %s,%s", start, 
end);
}
 
-   script_browse(script_opt);
+   script_browse(script_opt, act->evsel);
free(script_opt);
return 0;
 }
@@ -2574,7 +2575,7 @@ static int
 add_script_opt_2(struct hist_browser *browser __maybe_unused,
   struct popup_action *act, char **optstr,
   struct thread *thread, struct symbol *sym,
-  const char *tstr)
+  struct perf_evsel *evsel, const char *tstr)
 {
 
if (thread) {
@@ -2592,6 +2593,7 @@ add_script_opt_2(struct hist_browser *browser 
__maybe_unused,
 
act->thread = thread;
act->ms.sym = sym;
+   act->evsel = evsel;
act->fn = do_run_script;
return 1;
 }
@@ -2599,12 +2601,13 @@ add_script_opt_2(struct hist_browser *browser 
__maybe_unused,
 static int
 add_script_opt(struct hist_browser *browser,
   struct popup_action *act, char **optstr,
-  struct thread *thread, struct symbol *sym)
+  struct thread *thread, struct symbol *sym,
+  struct perf_evsel *evsel)
 {
int n, j;
struct hist_entry *he;
 
-   n = add_script_opt_2(browser, act, optstr, thread, sym, "");
+   n = add_script_opt_2(browser, act, optstr, thread, sym, evsel, "");
 
he = hist_browser__selected_entry(browser);
if (sort_order && strstr(sort_order, "time")) {
@@ -2620,7 +2623,7 @@ add_script_opt(struct hist_browser *browser,
  tstr + j,
  sizeof tstr - j);
n += add_script_opt_2(browser, act, optstr, thread, sym,
- tstr);
+ evsel, tstr);
act->time = he->time;
}
return n;
@@ -3091,7 +3094,7 @@ static int perf_evsel__hists_browse(struct perf_evsel 
*evsel, int nr_events,
nr_options += add_script_opt(browser,
 
[nr_options],
 
[nr_options],
-thread, NULL);
+thread, NULL, 
evsel);
}
/*
 * Note that browser->selection != NULL
@@ -3106,11 +3109,12 @@ static int perf_evsel__hists_browse(struct perf_evsel 
*evsel, int nr_events,
nr_options += add_script_opt(browser,
 
[nr_options],
 

[PATCH v3 04/11] perf tools report: Parse time quantum

2019-02-28 Thread Andi Kleen
From: Andi Kleen 

Many workloads change over time. perf report currently aggregates
the whole time range reported in perf.data.

This patch adds an option for a time quantum to quantisize the
perf.data over time.

This just adds the option, will be used in follow on patches
for a time sort key.

Signed-off-by: Andi Kleen 

---
v2:
Move time_quantum to symbol_conf. check for zero time quantum
---
 tools/perf/Documentation/perf-report.txt |  4 +++
 tools/perf/builtin-report.c  | 41 
 tools/perf/util/symbol.c |  1 +
 tools/perf/util/symbol_conf.h|  1 +
 4 files changed, 47 insertions(+)

diff --git a/tools/perf/Documentation/perf-report.txt 
b/tools/perf/Documentation/perf-report.txt
index 51dbc519dbce..9ec1702bccdd 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -497,6 +497,10 @@ include::itrace.txt[]
The period/hits keywords set the base the percentage is computed
on - the samples period or the number of samples (hits).
 
+--time-quantum::
+   Configure time quantum for time sort key. Default 100ms.
+   Accepts s, us, ms, ns units.
+
 include::callchain-overhead-calculation.txt[]
 
 SEE ALSO
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 09180e559ad6..f071f19d12d6 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -47,6 +47,7 @@
 #include 
 #include 
 #include 
+#include "sane_ctype.h"
 #include 
 #include 
 #include 
@@ -926,6 +927,43 @@ report_parse_callchain_opt(const struct option *opt, const 
char *arg, int unset)
return parse_callchain_report_opt(arg);
 }
 
+static int
+parse_time_quantum(const struct option *opt, const char *arg,
+  int unset __maybe_unused)
+{
+   unsigned long *time_q = opt->value;
+   char *end;
+
+   *time_q = strtoul(arg, , 0);
+   if (end == arg)
+   goto parse_err;
+   if (*time_q == 0) {
+   pr_err("time quantum cannot be 0");
+   return -1;
+   }
+   while (isspace(*end))
+   end++;
+   if (*end == 0)
+   return 0;
+   if (!strcmp(end, "s")) {
+   *time_q *= 10;
+   return 0;
+   }
+   if (!strcmp(end, "ms")) {
+   *time_q *= 100;
+   return 0;
+   }
+   if (!strcmp(end, "us")) {
+   *time_q *= 1000;
+   return 0;
+   }
+   if (!strcmp(end, "ns"))
+   return 0;
+parse_err:
+   pr_err("Cannot parse time quantum `%s'\n", arg);
+   return -1;
+}
+
 int
 report_parse_ignore_callees_opt(const struct option *opt __maybe_unused,
const char *arg, int unset __maybe_unused)
@@ -1148,6 +1186,9 @@ int cmd_report(int argc, const char **argv)
 "Set percent type local/global-period/hits",
 annotate_parse_percent_type),
OPT_BOOLEAN(0, "ns", _conf.nanosecs, "Show times in nanosecs"),
+   OPT_CALLBACK(0, "time-quantum", _conf.time_quantum, "time 
(ms|us|ns)",
+"Set time quantum for time sort key (default 100ms)",
+parse_time_quantum),
OPT_END()
};
struct perf_data data = {
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index eb873ea1c405..0f80743a1c25 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -45,6 +45,7 @@ struct symbol_conf symbol_conf = {
.demangle   = true,
.demangle_kernel= false,
.cumulate_callchain = true,
+   .time_quantum   = 1, /* 100ms */
.show_hist_headers  = true,
.symfs  = "",
.event_group= true,
diff --git a/tools/perf/util/symbol_conf.h b/tools/perf/util/symbol_conf.h
index 095a297c8b47..a5684a71b78e 100644
--- a/tools/perf/util/symbol_conf.h
+++ b/tools/perf/util/symbol_conf.h
@@ -56,6 +56,7 @@ struct symbol_conf {
*sym_list_str,
*col_width_list_str,
*bt_stop_list_str;
+   unsigned long   time_quantum;
struct strlist  *dso_list,
*comm_list,
*sym_list,
-- 
2.17.2



[PATCH v3 11/11] perf tools: Add some new tips describing the new options

2019-02-28 Thread Andi Kleen
From: Andi Kleen 

And one old option.

Signed-off-by: Andi Kleen 
---
 tools/perf/Documentation/tips.txt | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/Documentation/tips.txt 
b/tools/perf/Documentation/tips.txt
index 849599f39c5e..4ec8107ed512 100644
--- a/tools/perf/Documentation/tips.txt
+++ b/tools/perf/Documentation/tips.txt
@@ -15,6 +15,7 @@ To see callchains in a more compact form: perf report -g 
folded
 Show individual samples with: perf script
 Limit to show entries above 5% only: perf report --percent-limit 5
 Profiling branch (mis)predictions with: perf record -b / perf report
+To show assembler sample contexts use perf record -b / perf script -F 
+brstackinsn --xed
 Treat branches as callchains: perf report --branch-history
 To count events in every 1000 msec: perf stat -I 1000
 Print event counts in CSV format with: perf stat -x,
@@ -34,3 +35,5 @@ Show current config key-value pairs: perf config --list
 Show user configuration overrides: perf config --user --list
 To add Node.js USDT(User-Level Statically Defined Tracing): perf buildid-cache 
--add `which node`
 To report cacheline events from previous recording: perf c2c report
+To browse sample contexts use perf report --sample 10 and select in context 
menu
+To separate samples by time use perf report --sort time,overhead,sym
-- 
2.17.2



[PATCH v3 06/11] perf tools report: Use less for scripts output

2019-02-28 Thread Andi Kleen
From: Andi Kleen 

The UI viewer for scripts output has a lot of limitations: limited size,
no search or save function, slow, and various other issues.

Just use 'less' to display directly on the terminal instead.

This won't work in gtk mode, but gtk doesn't support these
context menus anyways. If that is ever done could use an terminal
for the output.

Signed-off-by: Andi Kleen 

---

v2:
Remove some unneeded headers
---
 tools/perf/ui/browsers/scripts.c | 130 ---
 1 file changed, 17 insertions(+), 113 deletions(-)

diff --git a/tools/perf/ui/browsers/scripts.c b/tools/perf/ui/browsers/scripts.c
index 90a32ac69e76..7f36630694bf 100644
--- a/tools/perf/ui/browsers/scripts.c
+++ b/tools/perf/ui/browsers/scripts.c
@@ -1,35 +1,12 @@
 // SPDX-License-Identifier: GPL-2.0
-#include 
-#include 
-#include 
-#include 
 #include "../../util/sort.h"
 #include "../../util/util.h"
 #include "../../util/hist.h"
 #include "../../util/debug.h"
 #include "../../util/symbol.h"
 #include "../browser.h"
-#include "../helpline.h"
 #include "../libslang.h"
 
-/* 2048 lines should be enough for a script output */
-#define MAX_LINES  2048
-
-/* 160 bytes for one output line */
-#define AVERAGE_LINE_LEN   160
-
-struct script_line {
-   struct list_head node;
-   char line[AVERAGE_LINE_LEN];
-};
-
-struct perf_script_browser {
-   struct ui_browser b;
-   struct list_head entries;
-   const char *script_name;
-   int nr_lines;
-};
-
 #define SCRIPT_NAMELEN 128
 #define SCRIPT_MAX_NO  64
 /*
@@ -73,69 +50,29 @@ static int list_scripts(char *script_name)
return ret;
 }
 
-static void script_browser__write(struct ui_browser *browser,
-  void *entry, int row)
+static void run_script(char *cmd)
 {
-   struct script_line *sline = list_entry(entry, struct script_line, node);
-   bool current_entry = ui_browser__is_current_entry(browser, row);
-
-   ui_browser__set_color(browser, current_entry ? HE_COLORSET_SELECTED :
-  HE_COLORSET_NORMAL);
-
-   ui_browser__write_nstring(browser, sline->line, browser->width);
+   pr_debug("Running %s\n", cmd);
+   SLang_reset_tty();
+   if (system(cmd) < 0)
+   pr_warning("Cannot run %s\n", cmd);
+   /*
+* SLang doesn't seem to reset the whole terminal, so be more
+* forceful to get back to the original state.
+*/
+   printf("\033[c\033[H\033[J");
+   fflush(stdout);
+   SLang_init_tty(0, 0, 0);
+   SLsmg_refresh();
 }
 
-static int script_browser__run(struct perf_script_browser *browser)
-{
-   int key;
-
-   if (ui_browser__show(>b, browser->script_name,
-"Press ESC to exit") < 0)
-   return -1;
-
-   while (1) {
-   key = ui_browser__run(>b, 0);
-
-   /* We can add some special key handling here if needed */
-   break;
-   }
-
-   ui_browser__hide(>b);
-   return key;
-}
-
-
 int script_browse(const char *script_opt)
 {
char cmd[SCRIPT_FULLPATH_LEN*2], script_name[SCRIPT_FULLPATH_LEN];
-   char *line = NULL;
-   size_t len = 0;
-   ssize_t retlen;
-   int ret = -1, nr_entries = 0;
-   FILE *fp;
-   void *buf;
-   struct script_line *sline;
-
-   struct perf_script_browser script = {
-   .b = {
-   .refresh= ui_browser__list_head_refresh,
-   .seek   = ui_browser__list_head_seek,
-   .write  = script_browser__write,
-   },
-   .script_name = script_name,
-   };
-
-   INIT_LIST_HEAD();
-
-   /* Save each line of the output in one struct script_line object. */
-   buf = zalloc((sizeof(*sline)) * MAX_LINES);
-   if (!buf)
-   return -1;
-   sline = buf;
 
memset(script_name, 0, SCRIPT_FULLPATH_LEN);
if (list_scripts(script_name))
-   goto exit;
+   return -1;
 
sprintf(cmd, "perf script -s %s ", script_name);
 
@@ -147,42 +84,9 @@ int script_browse(const char *script_opt)
strcat(cmd, input_name);
}
 
-   strcat(cmd, " 2>&1");
-
-   fp = popen(cmd, "r");
-   if (!fp)
-   goto exit;
-
-   while ((retlen = getline(, , fp)) != -1) {
-   strncpy(sline->line, line, AVERAGE_LINE_LEN);
-
-   /* If one output line is very large, just cut it short */
-   if (retlen >= AVERAGE_LINE_LEN) {
-   sline->line[AVERAGE_LINE_LEN - 1] = '\0';
-   sline->line[AVERAGE_LINE_LEN - 2] = '\n';
-   }
-   list_add_

[PATCH v3 01/11] perf tools: Add utility function to fetch executable

2019-02-28 Thread Andi Kleen
From: Andi Kleen 

Add a utility function to fetch executable code. Convert one
user over to it. There are more places doing that, but they
do significantly different actions, so they are not
easy to fit into a single library function.

Signed-off-by: Andi Kleen 
---
 tools/perf/util/Build   |  1 +
 tools/perf/util/fetch.c | 28 
 tools/perf/util/fetch.h |  7 +++
 tools/perf/util/intel-bts.c | 21 +++--
 4 files changed, 39 insertions(+), 18 deletions(-)
 create mode 100644 tools/perf/util/fetch.c
 create mode 100644 tools/perf/util/fetch.h

diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 8dd3102301ea..649321fc3fb9 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -65,6 +65,7 @@ perf-y += trace-event-scripting.o
 perf-y += trace-event.o
 perf-y += svghelper.o
 perf-y += sort.o
+perf-y += fetch.o
 perf-y += hist.o
 perf-y += util.o
 perf-y += xyarray.o
diff --git a/tools/perf/util/fetch.c b/tools/perf/util/fetch.c
new file mode 100644
index ..1430083e7eca
--- /dev/null
+++ b/tools/perf/util/fetch.c
@@ -0,0 +1,28 @@
+#include "perf.h"
+#include "machine.h"
+#include "thread.h"
+#include "symbol.h"
+#include "map.h"
+#include "fetch.h"
+
+int fetch_exe(u64 ip, struct thread *thread, struct machine *machine,
+ char *buf, int len, bool *is64bit)
+{
+   struct addr_location al;
+   u8 cpumode;
+   long offset;
+
+   if (machine__kernel_ip(machine, ip))
+   cpumode = PERF_RECORD_MISC_KERNEL;
+   else
+   cpumode = PERF_RECORD_MISC_USER;
+   if (!thread__find_map(thread, cpumode, ip, ) || !al.map->dso)
+   return -1;
+   if (al.map->dso->data.status == DSO_DATA_STATUS_ERROR)
+   return -1;
+   map__load(al.map);
+   offset = al.map->map_ip(al.map, ip);
+   if (is64bit)
+   *is64bit = al.map->dso->is_64_bit;
+   return dso__data_read_offset(al.map->dso, machine, offset, (u8 *)buf, 
len);
+}
diff --git a/tools/perf/util/fetch.h b/tools/perf/util/fetch.h
new file mode 100644
index ..7b77b8cee55a
--- /dev/null
+++ b/tools/perf/util/fetch.h
@@ -0,0 +1,7 @@
+#ifndef FETCH_H
+#define FETCH_H 1
+
+int fetch_exe(u64 ip, struct thread *thread, struct machine *machine,
+ char *buf, int len, bool *is64bit);
+
+#endif
diff --git a/tools/perf/util/intel-bts.c b/tools/perf/util/intel-bts.c
index 0c0180c67574..915f4662e52e 100644
--- a/tools/perf/util/intel-bts.c
+++ b/tools/perf/util/intel-bts.c
@@ -38,6 +38,7 @@
 #include "auxtrace.h"
 #include "intel-pt-decoder/intel-pt-insn-decoder.h"
 #include "intel-bts.h"
+#include "fetch.h"
 
 #define MAX_TIMESTAMP (~0ULL)
 
@@ -328,35 +329,19 @@ static int intel_bts_get_next_insn(struct intel_bts_queue 
*btsq, u64 ip)
 {
struct machine *machine = btsq->bts->machine;
struct thread *thread;
-   struct addr_location al;
unsigned char buf[INTEL_PT_INSN_BUF_SZ];
ssize_t len;
-   int x86_64;
-   uint8_t cpumode;
+   bool x86_64;
int err = -1;
 
-   if (machine__kernel_ip(machine, ip))
-   cpumode = PERF_RECORD_MISC_KERNEL;
-   else
-   cpumode = PERF_RECORD_MISC_USER;
-
thread = machine__find_thread(machine, -1, btsq->tid);
if (!thread)
return -1;
 
-   if (!thread__find_map(thread, cpumode, ip, ) || !al.map->dso)
-   goto out_put;
-
-   len = dso__data_read_addr(al.map->dso, al.map, machine, ip, buf,
- INTEL_PT_INSN_BUF_SZ);
+   len = fetch_exe(ip, thread, machine, (char*)buf, INTEL_PT_INSN_BUF_SZ, 
_64);
if (len <= 0)
goto out_put;
 
-   /* Load maps to ensure dso->is_64_bit has been updated */
-   map__load(al.map);
-
-   x86_64 = al.map->dso->is_64_bit;
-
if (intel_pt_get_insn(buf, len, x86_64, >intel_pt_insn))
goto out_put;
 
-- 
2.17.2



[PATCH v3 10/11] perf tools report: Implement browsing of individual samples

2019-02-28 Thread Andi Kleen
From: Andi Kleen 

Now report can show whole time periods with perf script,
but the user still has to find individual samples of interest
manually.

It would be expensive and complicated to search for the
right samples in the whole perf file. Typically users
only need to look at a small number of samples for useful
analysis.

Also the full scripts tend to show samples of all CPUs and all
threads mixed up, which can be very confusing on larger systems.

Add a new --samples option to save a small random number of samples
per hist entry

Use a reservoir sample technique to select a representatve
number of samples.

Then allow browsing the samples using perf script
as part of the hist entry context menu. This automatically
adds the right filters, so only the thread or cpu of the sample
is displayed. Then we use less' search functionality
to directly jump the to the time stamp of the selected
sample.

It uses different menus for assembler and source display.
Assembler needs xed installed and source needs debuginfo.

Currently it only supports as many samples as fit on
the screen due to some limitations in the slang ui code.

Signed-off-by: Andi Kleen 

---
v2:
Free names on error path
Pass --inline and --show-*-event to child perf as needed.
---
 tools/perf/Documentation/perf-report.txt |  4 ++
 tools/perf/builtin-report.c  |  2 +
 tools/perf/ui/browsers/Build |  1 +
 tools/perf/ui/browsers/hists.c   | 47 ++
 tools/perf/ui/browsers/res_sample.c  | 80 
 tools/perf/ui/browsers/scripts.c |  2 +-
 tools/perf/util/hist.c   | 36 +++
 tools/perf/util/hist.h   | 19 ++
 tools/perf/util/sort.h   |  8 +++
 tools/perf/util/symbol.c |  1 +
 tools/perf/util/symbol_conf.h|  1 +
 11 files changed, 200 insertions(+), 1 deletion(-)
 create mode 100644 tools/perf/ui/browsers/res_sample.c

diff --git a/tools/perf/Documentation/perf-report.txt 
b/tools/perf/Documentation/perf-report.txt
index 546d87221ad8..f441baa794ce 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -461,6 +461,10 @@ include::itrace.txt[]
 --socket-filter::
Only report the samples on the processor socket that match with this 
filter
 
+--samples=N::
+   Save N individual samples for each histogram entry to show context in 
perf
+   report tui browser.
+
 --raw-trace::
When displaying traceevent output, do not use print fmt or plugins.
 
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index f071f19d12d6..791eb77e41fe 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -1158,6 +1158,8 @@ int cmd_report(int argc, const char **argv)
OPT_BOOLEAN(0, "demangle-kernel", _conf.demangle_kernel,
"Enable kernel symbol demangling"),
OPT_BOOLEAN(0, "mem-mode", _mode, "mem access profile"),
+   OPT_INTEGER(0, "samples", _conf.res_sample,
+   "Number of samples to save per histogram entry for 
individual browsing"),
OPT_CALLBACK(0, "percent-limit", , "percent",
 "Don't show entries under that percent", 
parse_percent_limit),
OPT_CALLBACK(0, "percentage", NULL, "relative|absolute",
diff --git a/tools/perf/ui/browsers/Build b/tools/perf/ui/browsers/Build
index 8fee56b46502..fdf86f7981ca 100644
--- a/tools/perf/ui/browsers/Build
+++ b/tools/perf/ui/browsers/Build
@@ -3,6 +3,7 @@ perf-y += hists.o
 perf-y += map.o
 perf-y += scripts.o
 perf-y += header.o
+perf-y += res_sample.o
 
 CFLAGS_annotate.o += -DENABLE_SLFUTURE_CONST
 CFLAGS_hists.o+= -DENABLE_SLFUTURE_CONST
diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index 41027284f1e1..8ca988506388 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -2344,6 +2344,7 @@ struct popup_action {
struct map_symbol   ms;
int socket;
struct perf_evsel   *evsel;
+   enum rstype rstype;
 
int (*fn)(struct hist_browser *browser, struct popup_action *act);
 };
@@ -2571,6 +2572,19 @@ do_run_script(struct hist_browser *browser 
__maybe_unused,
return 0;
 }
 
+static int
+do_res_sample_script(struct hist_browser *browser __maybe_unused,
+struct popup_action *act)
+{
+   struct hist_entry *he;
+   int num_res;
+
+   he = hist_browser__selected_entry(browser);
+   num_res = min(he->num_res, symbol_conf.res_sample);
+   res_sample_browse(he->res_samples, num_res, act->evsel, act->rstype);
+   return 0;
+}
+
 static int
 add_script_opt_2(struct hist_browser *browser __maybe_unused,
   struct popup_action *act, char **optstr,
@@ -2629,6 +2643,27 @@ add_sc

[PATCH v3 02/11] perf tools script: Support insn output for normal samples

2019-02-28 Thread Andi Kleen
From: Andi Kleen 

perf script -F +insn was only working for PT traces because
the PT instruction decoder was filling in the insn/insn_len
sample attributes. Support it for non PT samples too on x86
using the existing x86 instruction decoder.

% perf record -a sleep 1
% perf script -F ip,sym,insn --xed
 811704c9 remote_function   movl  %eax, 0x18(%rbx)
 8100bb50 intel_bts_enable_localretq
 81048612 native_apic_mem_write movl  %esi, 
-0xa04000(%rdi)
 81048612 native_apic_mem_write movl  %esi, 
-0xa04000(%rdi)
 81048612 native_apic_mem_write movl  %esi, 
-0xa04000(%rdi)
 810f1f79 generic_exec_single   xor %eax, %eax
 811704c9 remote_function   movl  %eax, 0x18(%rbx)
 8100bb34 intel_bts_enable_localmovl  0x2000(%rax), %edx
 81048610 native_apic_mem_write mov %edi, %edi
...

Signed-off-by: Andi Kleen 

---
v2:
Avoid printing instruction when empty
Only decode when perf.data file was collected on same architecture
---
 tools/perf/arch/x86/util/Build  |  1 +
 tools/perf/arch/x86/util/archinsn.c | 28 
 tools/perf/builtin-script.c | 21 -
 tools/perf/util/archinsn.h  | 12 
 4 files changed, 61 insertions(+), 1 deletion(-)
 create mode 100644 tools/perf/arch/x86/util/archinsn.c
 create mode 100644 tools/perf/util/archinsn.h

diff --git a/tools/perf/arch/x86/util/Build b/tools/perf/arch/x86/util/Build
index 7aab0be5fc5f..7b8e69bbbdfe 100644
--- a/tools/perf/arch/x86/util/Build
+++ b/tools/perf/arch/x86/util/Build
@@ -6,6 +6,7 @@ perf-y += perf_regs.o
 perf-y += group.o
 perf-y += machine.o
 perf-y += event.o
+perf-y += archinsn.o
 
 perf-$(CONFIG_DWARF) += dwarf-regs.o
 perf-$(CONFIG_BPF_PROLOGUE) += dwarf-regs.o
diff --git a/tools/perf/arch/x86/util/archinsn.c 
b/tools/perf/arch/x86/util/archinsn.c
new file mode 100644
index ..10b3c2a08b8f
--- /dev/null
+++ b/tools/perf/arch/x86/util/archinsn.c
@@ -0,0 +1,28 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "perf.h"
+#include "archinsn.h"
+#include "fetch.h"
+#include "util/intel-pt-decoder/insn.h"
+#include "machine.h"
+#include "thread.h"
+#include "symbol.h"
+
+void arch_fetch_insn(struct perf_sample *sample,
+struct thread *thread,
+struct machine *machine)
+{
+   struct insn insn;
+   int len;
+   bool is64bit = false;
+
+   if (!sample->ip)
+   return;
+   len = fetch_exe(sample->ip, thread, machine, sample->insn,
+   sizeof(sample->insn), );
+   if (len <= 0)
+   return;
+   insn_init(, sample->insn, len, is64bit);
+   insn_get_length();
+   if (insn_complete() && insn.length <= len)
+   sample->insn_len = insn.length;
+}
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 2d8cb1d1682c..fbc440bdf880 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -29,10 +29,12 @@
 #include "util/time-utils.h"
 #include "util/path.h"
 #include "print_binary.h"
+#include "archinsn.h"
 #include 
 #include 
 #include 
 #include 
+#include 
 #include "asm/bug.h"
 #include "util/mem-events.h"
 #include "util/dump-insn.h"
@@ -63,6 +65,7 @@ static const char *cpu_list;
 static DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS);
 static struct perf_stat_config stat_config;
 static int max_blocks;
+static boolnative_arch;
 
 unsigned int scripting_max_stack = PERF_MAX_STACK_DEPTH;
 
@@ -1227,6 +1230,12 @@ static int perf_sample__fprintf_callindent(struct 
perf_sample *sample,
return len + dlen;
 }
 
+__weak void arch_fetch_insn(struct perf_sample *sample __maybe_unused,
+   struct thread *thread __maybe_unused,
+   struct machine *machine __maybe_unused)
+{
+}
+
 static int perf_sample__fprintf_insn(struct perf_sample *sample,
 struct perf_event_attr *attr,
 struct thread *thread,
@@ -1234,9 +1243,12 @@ static int perf_sample__fprintf_insn(struct perf_sample 
*sample,
 {
int printed = 0;
 
+   if (sample->insn_len == 0 && native_arch)
+   arch_fetch_insn(sample, thread, machine);
+
if (PRINT_FIELD(INSNLEN))
printed += fprintf(fp, " ilen: %d", sample->insn_len);
-   if (PRINT_FIELD(INSN)) {
+   if (PRINT_FIELD(INSN) && sample->insn_len) {
int i;
 
printed += fprintf(fp, " insn:");
@@ -3277,6 +3289,7 @@ int cmd_script(int argc, const char **argv)

[PATCH v3 05/11] perf tools report: Support time sort key

2019-02-28 Thread Andi Kleen
From: Andi Kleen 

Add a time sort key to perf report to display samples for
different time quantums separately. This allows easier
analysis of workloads that change over time, and also
will allow looking at the context of samples.

% perf record ...
% perf report --sort time,overhead,symbol --time-quantum 1ms --stdio
...
 0.67%  277061.87300  [.] _dl_start
 0.50%  277061.87300  [.] f1
 0.50%  277061.87300  [.] f2
 0.33%  277061.87300  [.] main
 0.29%  277061.87300  [.] _dl_lookup_symbol_x
 0.29%  277061.87300  [.] dl_main
 0.29%  277061.87300  [.] do_lookup_x
 0.17%  277061.87300  [.] _dl_debug_initialize
 0.17%  277061.87300  [.] _dl_init_paths
 0.08%  277061.87300  [.] check_match
 0.04%  277061.87300  [.] _dl_count_modids
 1.33%  277061.87400  [.] f1
 1.33%  277061.87400  [.] f2
 1.33%  277061.87400  [.] main
 1.17%  277061.87500  [.] main
 1.08%  277061.87500  [.] f1
 1.08%  277061.87500  [.] f2
 1.00%  277061.87600  [.] main
 0.83%  277061.87600  [.] f1
 0.83%  277061.87600  [.] f2
 1.00%  277061.87700  [.] main

Signed-off-by: Andi Kleen 

---
v2:
Use symbol_conf.time_quantum
---
 tools/perf/Documentation/perf-report.txt |  2 ++
 tools/perf/util/hist.c   | 10 +++
 tools/perf/util/hist.h   |  1 +
 tools/perf/util/sort.c   | 38 
 tools/perf/util/sort.h   |  2 ++
 5 files changed, 53 insertions(+)

diff --git a/tools/perf/Documentation/perf-report.txt 
b/tools/perf/Documentation/perf-report.txt
index 9ec1702bccdd..546d87221ad8 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -105,6 +105,8 @@ OPTIONS
guest machine
- sample: Number of sample
- period: Raw number of event count of sample
+   - time: Separate the samples by time stamp with the resolution 
specified by
+   --time-quantum (default 100ms). Specify with overhead and before it.
 
By default, comm, dso and symbol keys are used.
(i.e. --sort comm,dso,symbol)
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 669f961316f0..6040eb49ea23 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -192,6 +192,7 @@ void hists__calc_col_len(struct hists *hists, struct 
hist_entry *h)
hists__new_col_len(hists, HISTC_MEM_LVL, 21 + 3);
hists__new_col_len(hists, HISTC_LOCAL_WEIGHT, 12);
hists__new_col_len(hists, HISTC_GLOBAL_WEIGHT, 12);
+   hists__new_col_len(hists, HISTC_TIME, 12);
 
if (h->srcline) {
len = MAX(strlen(h->srcline), strlen(sort_srcline.se_header));
@@ -246,6 +247,14 @@ static void he_stat__add_cpumode_period(struct he_stat 
*he_stat,
}
 }
 
+static long hist_time(unsigned long time)
+{
+   unsigned long time_quantum = symbol_conf.time_quantum;
+   if (time_quantum)
+   return (time / time_quantum) * time_quantum;
+   return (time / 100) * 100;
+}
+
 static void he_stat__add_period(struct he_stat *he_stat, u64 period,
u64 weight)
 {
@@ -626,6 +635,7 @@ __hists__add_entry(struct hists *hists,
.raw_data = sample->raw_data,
.raw_size = sample->raw_size,
.ops = ops,
+   .time = hist_time(sample->time),
}, *he = hists__findnew_entry(hists, , al, sample_self);
 
if (!hists->has_callchains && he && he->callchain_size != 0)
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 4af27fbab24f..6279eca56409 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -31,6 +31,7 @@ enum hist_filter {
 
 enum hist_column {
HISTC_SYMBOL,
+   HISTC_TIME,
HISTC_DSO,
HISTC_THREAD,
HISTC_COMM,
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index d2299e912e59..22f24bb2bf8a 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -15,6 +15,7 @@
 #include 
 #include "mem-events.h"
 #include "annotate.h"
+#include "time-utils.h"
 #include 
 
 regex_tparent_regex;
@@ -654,6 +655,42 @@ struct sort_entry sort_socket = {
.se_width_idx   = HISTC_SOCKET,
 };
 
+/* --sort time */
+
+static int64_t
+sort__time_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+   return right->time - left->time;
+}
+
+static int hist_entry__time_snprintf(struct hist_entry *he, char *bf,
+   size_t size, unsigned int width)
+{
+   unsigned long secs;
+   unsigned long long nsecs;
+   char he_time[32];
+
+   nsecs = he->time;
+   secs = nsecs / 10;
+   nsecs -= secs * 10;
+
+   if (symbol_conf.nanosecs)
+   snprintf(he_time, sizeof he_time, "%5lu.%09llu: ",
+

[tip:perf/core] perf tools: Add perf_exe() helper to find perf binary

2019-02-28 Thread tip-bot for Andi Kleen
Commit-ID:  94816add0005595ea33fc8456519be582330401e
Gitweb: https://git.kernel.org/tip/94816add0005595ea33fc8456519be582330401e
Author: Andi Kleen 
AuthorDate: Sun, 24 Feb 2019 07:37:19 -0800
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Mon, 25 Feb 2019 10:58:28 -0300

perf tools: Add perf_exe() helper to find perf binary

Also convert one existing user.

Signed-off-by: Andi Kleen 
Acked-by: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Stephane Eranian 
Link: http://lkml.kernel.org/r/20190224153722.27020-9-a...@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/header.c | 12 +++-
 tools/perf/util/util.c   | 10 ++
 tools/perf/util/util.h   |  2 ++
 3 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index a2323d777dae..01b324c275b9 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -527,17 +527,11 @@ static int write_event_desc(struct feat_fd *ff,
 static int write_cmdline(struct feat_fd *ff,
 struct perf_evlist *evlist __maybe_unused)
 {
-   char buf[MAXPATHLEN];
-   u32 n;
-   int i, ret;
+   char pbuf[MAXPATHLEN], *buf;
+   int i, ret, n;
 
/* actual path to perf binary */
-   ret = readlink("/proc/self/exe", buf, sizeof(buf) - 1);
-   if (ret <= 0)
-   return -1;
-
-   /* readlink() does not add null termination */
-   buf[ret] = '\0';
+   buf = perf_exe(pbuf, MAXPATHLEN);
 
/* account for binary path */
n = perf_env.nr_cmdline + 1;
diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index 706818693086..d388f80d8703 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -568,3 +568,13 @@ out:
 
return tip;
 }
+
+char *perf_exe(char *buf, int len)
+{
+   int n = readlink("/proc/self/exe", buf, len);
+   if (n > 0) {
+   buf[n] = 0;
+   return buf;
+   }
+   return strcpy(buf, "perf");
+}
diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h
index 01c538027c6f..09c1b0f91f65 100644
--- a/tools/perf/util/util.h
+++ b/tools/perf/util/util.h
@@ -77,6 +77,8 @@ extern bool perf_singlethreaded;
 void perf_set_singlethreaded(void);
 void perf_set_multithreaded(void);
 
+char *perf_exe(char *buf, int len);
+
 #ifndef O_CLOEXEC
 #ifdef __sparc__
 #define O_CLOEXEC  0x40


[tip:perf/core] perf script: Handle missing fields with -F +..

2019-02-28 Thread tip-bot for Andi Kleen
Commit-ID:  4b6ac811bce46c83811b83cdf87b41251596b9fc
Gitweb: https://git.kernel.org/tip/4b6ac811bce46c83811b83cdf87b41251596b9fc
Author: Andi Kleen 
AuthorDate: Sun, 24 Feb 2019 07:37:12 -0800
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Mon, 25 Feb 2019 10:58:07 -0300

perf script: Handle missing fields with -F +..

When using -F + syntax to add a field the existing defaults are
currently all marked user_set. This can cause errors when some field is
missing in the perf.data

This patch tracks the actually user set fields separately, so that we don't
error out in this case.

Before:

  % perf record true
  % perf script -F +metric
  Samples for 'cycles:ppp' event do not have CPU attribute set. Cannot print 
'cpu' field.
  %

After:

  5 perf record true
  % perf script -F +metric
  perf 28936 278636.237688:  1 cycles:ppp:  
8117da99 perf_event_exec+0x59 (/lib/modules/4.20.0-odilo/build/vmlinux)
  ...
  %

Signed-off-by: Andi Kleen 
Tested-by: Arnaldo Carvalho de Melo 
Acked-by: Jiri Olsa 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Stephane Eranian 
Link: http://lkml.kernel.org/r/20190224153722.27020-2-a...@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/builtin-script.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 5b1543f42290..2d8cb1d1682c 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -149,6 +149,7 @@ static struct {
unsigned int print_ip_opts;
u64 fields;
u64 invalid_fields;
+   u64 user_set_fields;
 } output[OUTPUT_TYPE_MAX] = {
 
[PERF_TYPE_HARDWARE] = {
@@ -345,7 +346,7 @@ static int perf_evsel__do_check_stype(struct perf_evsel 
*evsel,
if (attr->sample_type & sample_type)
return 0;
 
-   if (output[type].user_set) {
+   if (output[type].user_set_fields & field) {
if (allow_user_set)
return 0;
evname = perf_evsel__name(evsel);
@@ -2632,10 +2633,13 @@ parse:
pr_warning("\'%s\' not valid for %s 
events. Ignoring.\n",
   all_output_options[i].str, 
event_type(j));
} else {
-   if (change == REMOVE)
+   if (change == REMOVE) {
output[j].fields &= 
~all_output_options[i].field;
-   else
+   output[j].user_set_fields &= 
~all_output_options[i].field;
+   } else {
output[j].fields |= 
all_output_options[i].field;
+   output[j].user_set_fields |= 
all_output_options[i].field;
+   }
output[j].user_set = true;
output[j].wildcard_set = true;
}


Re: Support sample context in perf report

2019-02-27 Thread Andi Kleen
On Wed, Feb 27, 2019 at 05:16:59PM +0100, Jiri Olsa wrote:
> On Wed, Feb 27, 2019 at 08:01:35AM -0800, Andi Kleen wrote:
> > > > Also available in
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git 
> > > > perf/streams-2
> > > 
> > > your post is missing this patch, it's only in the branch:
> > >   perf tools: Add utility function to fetch executable
> > 
> > Because Arnaldo already merged it. But the branch is still based
> 
> can't see it on his perf/core branch,
> please paste me commit id

I don't know the commit id, but there was an email from him earlier
that said "applied". Maybe he hasn't pushed yet.

-Andi


Re: Support sample context in perf report

2019-02-27 Thread Andi Kleen
> > Also available in
> > git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git 
> > perf/streams-2
> 
> your post is missing this patch, it's only in the branch:
>   perf tools: Add utility function to fetch executable

Because Arnaldo already merged it. But the branch is still based
on tip/perf/core, which doesn't have it yet.

-Andi


Re: Support sample context in perf report

2019-02-26 Thread Andi Kleen
Jiri Olsa  writes:
>
> im still getting compile error the new branch:
>
>   CC   ui/browsers/hists.o
> ui/browsers/hists.c: In function ‘perf_evsel__hists_browse’:
> ui/browsers/hists.c:2567:8: error: ‘%s’ directive output may be truncated 
> writing up to 63 bytes into a region of size between 28 and 91 
> [-Werror=format-truncation=]
>n += snprintf(script_opt + n, len - n, " --time %s,%s", start, end);
> ^~
> In file included from /usr/include/stdio.h:862,
>  from ui/browsers/hists.c:5:
> /usr/include/bits/stdio2.h:64:10: note: ‘__builtin___snprintf_chk’ output 
> between 10 and 136 bytes into a destination of size 100
>return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
>   ^~~~
> __bos (__s), __fmt, __va_arg_pack ());
> ~
> cc1: all warnings being treated as errors
> mv: cannot stat 'ui/browsers/.hists.o.tmp': No such file or directory

I tested with gcc 8 and it built on a opensuse leap system.

Of course you never know where you end up with the gcc -Werror
russian roulette. I don't think any of those can really overflow,
it's all false positives. This one is particularly annoying
because the compiler seems to assume that every char[] variable
is filled up to the maximum, which is flat out wrong.

Anyways this patch should help.

-Andi

diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index e35b274ee863..8ca988506388 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -2554,7 +2554,7 @@ do_run_script(struct hist_browser *browser __maybe_unused,
}
 
if (act->time) {
-   char start[64], end[64];
+   char start[22], end[22];
unsigned long starttime = act->time;
unsigned long endtime = act->time + symbol_conf.time_quantum;
 


[PATCH v2 02/11] perf tools report: Support nano seconds

2019-02-25 Thread Andi Kleen
From: Andi Kleen 

Upcoming changes add timestamp output in perf report. Add a --ns
argument similar to perf script to support nanoseconds resolution
when needed.

Signed-off-by: Andi Kleen 

---
v2:
Move flag into symbol_conf and change all users
---
 tools/perf/Documentation/perf-report.txt |  3 +++
 tools/perf/builtin-report.c  |  1 +
 tools/perf/builtin-script.c  | 11 +--
 tools/perf/util/symbol.c |  1 +
 tools/perf/util/symbol_conf.h|  1 +
 5 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt 
b/tools/perf/Documentation/perf-report.txt
index 1a27bfe05039..51dbc519dbce 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -477,6 +477,9 @@ include::itrace.txt[]
Please note that not all mmaps are stored, options affecting which ones
are include 'perf record --data', for instance.
 
+--ns::
+   Show time stamps in nanoseconds.
+
 --stats::
Display overall events statistics without any further processing.
(like the one at the end of the perf report -D command)
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 2e8c74d6430c..6b2cc91ee26a 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -1147,6 +1147,7 @@ int cmd_report(int argc, const char **argv)
OPT_CALLBACK(0, "percent-type", _opts, "local-period",
 "Set percent type local/global-period/hits",
 annotate_parse_percent_type),
+   OPT_BOOLEAN(0, "ns", _conf.nanosecs, "Show times in nanosecs"),
OPT_END()
};
struct perf_data data = {
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 1d651a9b110e..1e6f2f095152 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -60,7 +60,6 @@ static bool   no_callchain;
 static boollatency_format;
 static boolsystem_wide;
 static boolprint_flags;
-static boolnanosecs;
 static const char  *cpu_list;
 static DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS);
 static struct perf_stat_config stat_config;
@@ -691,7 +690,7 @@ static int perf_sample__fprintf_start(struct perf_sample 
*sample,
secs = nsecs / NSEC_PER_SEC;
nsecs -= secs * NSEC_PER_SEC;
 
-   if (nanosecs)
+   if (symbol_conf.nanosecs)
printed += fprintf(fp, "%5lu.%09llu: ", secs, nsecs);
else {
char sample_time[32];
@@ -3238,7 +3237,7 @@ static int parse_insn_trace(const struct option *opt 
__maybe_unused,
 {
parse_output_fields(NULL, "+insn,-event,-period", 0);
itrace_parse_synth_opts(opt, "i0ns", 0);
-   nanosecs = true;
+   symbol_conf.nanosecs = true;
return 0;
 }
 
@@ -3256,7 +3255,7 @@ static int parse_call_trace(const struct option *opt 
__maybe_unused,
 {
parse_output_fields(NULL, "-ip,-addr,-event,-period,+callindent", 0);
itrace_parse_synth_opts(opt, "cewp", 0);
-   nanosecs = true;
+   symbol_conf.nanosecs = true;
return 0;
 }
 
@@ -3266,7 +3265,7 @@ static int parse_callret_trace(const struct option *opt 
__maybe_unused,
 {
parse_output_fields(NULL, 
"-ip,-addr,-event,-period,+callindent,+flags", 0);
itrace_parse_synth_opts(opt, "crewp", 0);
-   nanosecs = true;
+   symbol_conf.nanosecs = true;
return 0;
 }
 
@@ -3402,7 +3401,7 @@ int cmd_script(int argc, const char **argv)
OPT_BOOLEAN('f', "force", _conf.force, "don't complain, do it"),
OPT_INTEGER(0, "max-blocks", _blocks,
"Maximum number of code blocks to dump with brstackinsn"),
-   OPT_BOOLEAN(0, "ns", ,
+   OPT_BOOLEAN(0, "ns", _conf.nanosecs,
"Use 9 decimal places when displaying time"),
OPT_CALLBACK_OPTARG(0, "itrace", _synth_opts, NULL, "opts",
"Instruction Tracing options\n" ITRACE_HELP,
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 758bf5f74e6e..eb873ea1c405 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -39,6 +39,7 @@ int vmlinux_path__nr_entries;
 char **vmlinux_path;
 
 struct symbol_conf symbol_conf = {
+   .nanosecs   = false,
.use_modules= true,
.try_vmlinux_path   = true,
.demangle   = true,
diff --git a/tools/perf/util/symbol_conf.h b/tools/perf/util/symbol_conf.h
index fffea68c1203..095a297c8b47 100644
--- a/tools/perf/util/symbol_conf.

[PATCH v2 01/11] perf tools script: Support insn output for normal samples

2019-02-25 Thread Andi Kleen
From: Andi Kleen 

perf script -F +insn was only working for PT traces because
the PT instruction decoder was filling in the insn/insn_len
sample attributes. Support it for non PT samples too on x86
using the existing x86 instruction decoder.

% perf record -a sleep 1
% perf script -F ip,sym,insn --xed
 811704c9 remote_function   movl  %eax, 0x18(%rbx)
 8100bb50 intel_bts_enable_localretq
 81048612 native_apic_mem_write movl  %esi, 
-0xa04000(%rdi)
 81048612 native_apic_mem_write movl  %esi, 
-0xa04000(%rdi)
 81048612 native_apic_mem_write movl  %esi, 
-0xa04000(%rdi)
 810f1f79 generic_exec_single   xor %eax, %eax
 811704c9 remote_function   movl  %eax, 0x18(%rbx)
 8100bb34 intel_bts_enable_localmovl  0x2000(%rax), %edx
 81048610 native_apic_mem_write mov %edi, %edi
...

Signed-off-by: Andi Kleen 

---
v2:
Avoid printing instruction when empty
Only decode when perf.data file was collected on same architecture
---
 tools/perf/arch/x86/util/Build  |  1 +
 tools/perf/arch/x86/util/archinsn.c | 28 
 tools/perf/builtin-script.c | 21 -
 tools/perf/util/archinsn.h  | 12 
 4 files changed, 61 insertions(+), 1 deletion(-)
 create mode 100644 tools/perf/arch/x86/util/archinsn.c
 create mode 100644 tools/perf/util/archinsn.h

diff --git a/tools/perf/arch/x86/util/Build b/tools/perf/arch/x86/util/Build
index 7aab0be5fc5f..7b8e69bbbdfe 100644
--- a/tools/perf/arch/x86/util/Build
+++ b/tools/perf/arch/x86/util/Build
@@ -6,6 +6,7 @@ perf-y += perf_regs.o
 perf-y += group.o
 perf-y += machine.o
 perf-y += event.o
+perf-y += archinsn.o
 
 perf-$(CONFIG_DWARF) += dwarf-regs.o
 perf-$(CONFIG_BPF_PROLOGUE) += dwarf-regs.o
diff --git a/tools/perf/arch/x86/util/archinsn.c 
b/tools/perf/arch/x86/util/archinsn.c
new file mode 100644
index ..10b3c2a08b8f
--- /dev/null
+++ b/tools/perf/arch/x86/util/archinsn.c
@@ -0,0 +1,28 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "perf.h"
+#include "archinsn.h"
+#include "fetch.h"
+#include "util/intel-pt-decoder/insn.h"
+#include "machine.h"
+#include "thread.h"
+#include "symbol.h"
+
+void arch_fetch_insn(struct perf_sample *sample,
+struct thread *thread,
+struct machine *machine)
+{
+   struct insn insn;
+   int len;
+   bool is64bit = false;
+
+   if (!sample->ip)
+   return;
+   len = fetch_exe(sample->ip, thread, machine, sample->insn,
+   sizeof(sample->insn), );
+   if (len <= 0)
+   return;
+   insn_init(, sample->insn, len, is64bit);
+   insn_get_length();
+   if (insn_complete() && insn.length <= len)
+   sample->insn_len = insn.length;
+}
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 29e95a25c6e6..1d651a9b110e 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -29,10 +29,12 @@
 #include "util/time-utils.h"
 #include "util/path.h"
 #include "print_binary.h"
+#include "archinsn.h"
 #include 
 #include 
 #include 
 #include 
+#include 
 #include "asm/bug.h"
 #include "util/mem-events.h"
 #include "util/dump-insn.h"
@@ -63,6 +65,7 @@ static const char *cpu_list;
 static DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS);
 static struct perf_stat_config stat_config;
 static int max_blocks;
+static boolnative_arch;
 
 unsigned int scripting_max_stack = PERF_MAX_STACK_DEPTH;
 
@@ -1227,6 +1230,12 @@ static int perf_sample__fprintf_callindent(struct 
perf_sample *sample,
return len + dlen;
 }
 
+__weak void arch_fetch_insn(struct perf_sample *sample __maybe_unused,
+   struct thread *thread __maybe_unused,
+   struct machine *machine __maybe_unused)
+{
+}
+
 static int perf_sample__fprintf_insn(struct perf_sample *sample,
 struct perf_event_attr *attr,
 struct thread *thread,
@@ -1234,9 +1243,12 @@ static int perf_sample__fprintf_insn(struct perf_sample 
*sample,
 {
int printed = 0;
 
+   if (sample->insn_len == 0 && native_arch)
+   arch_fetch_insn(sample, thread, machine);
+
if (PRINT_FIELD(INSNLEN))
printed += fprintf(fp, " ilen: %d", sample->insn_len);
-   if (PRINT_FIELD(INSN)) {
+   if (PRINT_FIELD(INSN) && sample->insn_len) {
int i;
 
printed += fprintf(fp, " insn:");
@@ -3271,6 +3283,7 @@ int cmd_script(int argc, const char **argv)

[PATCH v2 07/11] perf tools: Add perf_exe() helper to find perf binary

2019-02-25 Thread Andi Kleen
From: Andi Kleen 

Also convert one existing user.

Signed-off-by: Andi Kleen 
---
 tools/perf/util/header.c | 12 +++-
 tools/perf/util/util.c   | 10 ++
 tools/perf/util/util.h   |  2 ++
 3 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 61ce197c5362..2f290a0343d4 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -526,17 +526,11 @@ static int write_event_desc(struct feat_fd *ff,
 static int write_cmdline(struct feat_fd *ff,
 struct perf_evlist *evlist __maybe_unused)
 {
-   char buf[MAXPATHLEN];
-   u32 n;
-   int i, ret;
+   char pbuf[MAXPATHLEN], *buf;
+   int i, ret, n;
 
/* actual path to perf binary */
-   ret = readlink("/proc/self/exe", buf, sizeof(buf) - 1);
-   if (ret <= 0)
-   return -1;
-
-   /* readlink() does not add null termination */
-   buf[ret] = '\0';
+   buf = perf_exe(pbuf, MAXPATHLEN);
 
/* account for binary path */
n = perf_env.nr_cmdline + 1;
diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index 320b0fef249a..dd693e018aef 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -507,3 +507,13 @@ const char *perf_tip(const char *dirpath)
 
return tip;
 }
+
+char *perf_exe(char *buf, int len)
+{
+   int n = readlink("/proc/self/exe", buf, len);
+   if (n > 0) {
+   buf[n] = 0;
+   return buf;
+   }
+   return strcpy(buf, "perf");
+}
diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h
index ece040b799f6..dc2a99ad5c96 100644
--- a/tools/perf/util/util.h
+++ b/tools/perf/util/util.h
@@ -76,6 +76,8 @@ extern bool perf_singlethreaded;
 void perf_set_singlethreaded(void);
 void perf_set_multithreaded(void);
 
+char *perf_exe(char *buf, int len);
+
 #ifndef O_CLOEXEC
 #ifdef __sparc__
 #define O_CLOEXEC  0x40
-- 
2.17.2



[PATCH v2 04/11] perf tools report: Support time sort key

2019-02-25 Thread Andi Kleen
From: Andi Kleen 

Add a time sort key to perf report to display samples for
different time quantums separately. This allows easier
analysis of workloads that change over time, and also
will allow looking at the context of samples.

% perf record ...
% perf report --sort time,overhead,symbol --time-quantum 1ms --stdio
...
 0.67%  277061.87300  [.] _dl_start
 0.50%  277061.87300  [.] f1
 0.50%  277061.87300  [.] f2
 0.33%  277061.87300  [.] main
 0.29%  277061.87300  [.] _dl_lookup_symbol_x
 0.29%  277061.87300  [.] dl_main
 0.29%  277061.87300  [.] do_lookup_x
 0.17%  277061.87300  [.] _dl_debug_initialize
 0.17%  277061.87300  [.] _dl_init_paths
 0.08%  277061.87300  [.] check_match
 0.04%  277061.87300  [.] _dl_count_modids
 1.33%  277061.87400  [.] f1
 1.33%  277061.87400  [.] f2
 1.33%  277061.87400  [.] main
 1.17%  277061.87500  [.] main
 1.08%  277061.87500  [.] f1
 1.08%  277061.87500  [.] f2
 1.00%  277061.87600  [.] main
 0.83%  277061.87600  [.] f1
 0.83%  277061.87600  [.] f2
 1.00%  277061.87700  [.] main

Signed-off-by: Andi Kleen 

---
v2:
Use symbol_conf.time_quantum
---
 tools/perf/Documentation/perf-report.txt |  2 ++
 tools/perf/util/hist.c   | 10 +++
 tools/perf/util/hist.h   |  1 +
 tools/perf/util/sort.c   | 38 
 tools/perf/util/sort.h   |  2 ++
 5 files changed, 53 insertions(+)

diff --git a/tools/perf/Documentation/perf-report.txt 
b/tools/perf/Documentation/perf-report.txt
index 9ec1702bccdd..546d87221ad8 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -105,6 +105,8 @@ OPTIONS
guest machine
- sample: Number of sample
- period: Raw number of event count of sample
+   - time: Separate the samples by time stamp with the resolution 
specified by
+   --time-quantum (default 100ms). Specify with overhead and before it.
 
By default, comm, dso and symbol keys are used.
(i.e. --sort comm,dso,symbol)
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 669f961316f0..6040eb49ea23 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -192,6 +192,7 @@ void hists__calc_col_len(struct hists *hists, struct 
hist_entry *h)
hists__new_col_len(hists, HISTC_MEM_LVL, 21 + 3);
hists__new_col_len(hists, HISTC_LOCAL_WEIGHT, 12);
hists__new_col_len(hists, HISTC_GLOBAL_WEIGHT, 12);
+   hists__new_col_len(hists, HISTC_TIME, 12);
 
if (h->srcline) {
len = MAX(strlen(h->srcline), strlen(sort_srcline.se_header));
@@ -246,6 +247,14 @@ static void he_stat__add_cpumode_period(struct he_stat 
*he_stat,
}
 }
 
+static long hist_time(unsigned long time)
+{
+   unsigned long time_quantum = symbol_conf.time_quantum;
+   if (time_quantum)
+   return (time / time_quantum) * time_quantum;
+   return (time / 100) * 100;
+}
+
 static void he_stat__add_period(struct he_stat *he_stat, u64 period,
u64 weight)
 {
@@ -626,6 +635,7 @@ __hists__add_entry(struct hists *hists,
.raw_data = sample->raw_data,
.raw_size = sample->raw_size,
.ops = ops,
+   .time = hist_time(sample->time),
}, *he = hists__findnew_entry(hists, , al, sample_self);
 
if (!hists->has_callchains && he && he->callchain_size != 0)
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 4af27fbab24f..6279eca56409 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -31,6 +31,7 @@ enum hist_filter {
 
 enum hist_column {
HISTC_SYMBOL,
+   HISTC_TIME,
HISTC_DSO,
HISTC_THREAD,
HISTC_COMM,
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 2b6c1ccb878c..414c8704cced 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -15,6 +15,7 @@
 #include 
 #include "mem-events.h"
 #include "annotate.h"
+#include "time-utils.h"
 #include 
 
 regex_tparent_regex;
@@ -648,6 +649,42 @@ struct sort_entry sort_socket = {
.se_width_idx   = HISTC_SOCKET,
 };
 
+/* --sort time */
+
+static int64_t
+sort__time_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+   return right->time - left->time;
+}
+
+static int hist_entry__time_snprintf(struct hist_entry *he, char *bf,
+   size_t size, unsigned int width)
+{
+   unsigned long secs;
+   unsigned long long nsecs;
+   char he_time[32];
+
+   nsecs = he->time;
+   secs = nsecs / 10;
+   nsecs -= secs * 10;
+
+   if (symbol_conf.nanosecs)
+   snprintf(he_time, sizeof he_time, "%5lu.%09llu: ",
+

[PATCH v2 03/11] perf tools report: Parse time quantum

2019-02-25 Thread Andi Kleen
From: Andi Kleen 

Many workloads change over time. perf report currently aggregates
the whole time range reported in perf.data.

This patch adds an option for a time quantum to quantisize the
perf.data over time.

This just adds the option, will be used in follow on patches
for a time sort key.

Signed-off-by: Andi Kleen 

---
v2:
Move time_quantum to symbol_conf. check for zero time quantum
---
 tools/perf/Documentation/perf-report.txt |  4 +++
 tools/perf/builtin-report.c  | 41 
 tools/perf/util/symbol.c |  1 +
 tools/perf/util/symbol_conf.h|  1 +
 4 files changed, 47 insertions(+)

diff --git a/tools/perf/Documentation/perf-report.txt 
b/tools/perf/Documentation/perf-report.txt
index 51dbc519dbce..9ec1702bccdd 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -497,6 +497,10 @@ include::itrace.txt[]
The period/hits keywords set the base the percentage is computed
on - the samples period or the number of samples (hits).
 
+--time-quantum::
+   Configure time quantum for time sort key. Default 100ms.
+   Accepts s, us, ms, ns units.
+
 include::callchain-overhead-calculation.txt[]
 
 SEE ALSO
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 6b2cc91ee26a..f290c19304c8 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -47,6 +47,7 @@
 #include 
 #include 
 #include 
+#include "sane_ctype.h"
 #include 
 #include 
 #include 
@@ -926,6 +927,43 @@ report_parse_callchain_opt(const struct option *opt, const 
char *arg, int unset)
return parse_callchain_report_opt(arg);
 }
 
+static int
+parse_time_quantum(const struct option *opt, const char *arg,
+  int unset __maybe_unused)
+{
+   unsigned long *time_q = opt->value;
+   char *end;
+
+   *time_q = strtoul(arg, , 0);
+   if (end == arg)
+   goto parse_err;
+   if (*time_q == 0) {
+   pr_err("time quantum cannot be 0");
+   return -1;
+   }
+   while (isspace(*end))
+   end++;
+   if (*end == 0)
+   return 0;
+   if (!strcmp(end, "s")) {
+   *time_q *= 10;
+   return 0;
+   }
+   if (!strcmp(end, "ms")) {
+   *time_q *= 100;
+   return 0;
+   }
+   if (!strcmp(end, "us")) {
+   *time_q *= 1000;
+   return 0;
+   }
+   if (!strcmp(end, "ns"))
+   return 0;
+parse_err:
+   pr_err("Cannot parse time quantum `%s'\n", arg);
+   return -1;
+}
+
 int
 report_parse_ignore_callees_opt(const struct option *opt __maybe_unused,
const char *arg, int unset __maybe_unused)
@@ -1148,6 +1186,9 @@ int cmd_report(int argc, const char **argv)
 "Set percent type local/global-period/hits",
 annotate_parse_percent_type),
OPT_BOOLEAN(0, "ns", _conf.nanosecs, "Show times in nanosecs"),
+   OPT_CALLBACK(0, "time-quantum", _conf.time_quantum, "time 
(ms|us|ns)",
+"Set time quantum for time sort key (default 100ms)",
+parse_time_quantum),
OPT_END()
};
struct perf_data data = {
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index eb873ea1c405..0f80743a1c25 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -45,6 +45,7 @@ struct symbol_conf symbol_conf = {
.demangle   = true,
.demangle_kernel= false,
.cumulate_callchain = true,
+   .time_quantum   = 1, /* 100ms */
.show_hist_headers  = true,
.symfs  = "",
.event_group= true,
diff --git a/tools/perf/util/symbol_conf.h b/tools/perf/util/symbol_conf.h
index 095a297c8b47..a5684a71b78e 100644
--- a/tools/perf/util/symbol_conf.h
+++ b/tools/perf/util/symbol_conf.h
@@ -56,6 +56,7 @@ struct symbol_conf {
*sym_list_str,
*col_width_list_str,
*bt_stop_list_str;
+   unsigned long   time_quantum;
struct strlist  *dso_list,
*comm_list,
*sym_list,
-- 
2.17.2



[PATCH v2 06/11] perf tools report: Support running scripts for current time range

2019-02-25 Thread Andi Kleen
From: Andi Kleen 

When using the time sort key, add new context menus to run
scripts for only the currently selected time range. Compute
the correct range for the selection add pass it as the --time option to
perf script.

Signed-off-by: Andi Kleen 

---
v2:
Use symbol_conf.time_quantum
---
 tools/perf/ui/browsers/hists.c | 82 +-
 1 file changed, 71 insertions(+), 11 deletions(-)

diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index aef800d97ea1..663970f766e1 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -30,6 +30,7 @@
 #include "srcline.h"
 #include "string2.h"
 #include "units.h"
+#include "time-utils.h"
 
 #include "sane_ctype.h"
 
@@ -2338,6 +2339,7 @@ static int switch_data_file(void)
 }
 
 struct popup_action {
+   unsigned long   time;
struct thread   *thread;
struct map_symbol   ms;
int socket;
@@ -2527,36 +2529,64 @@ static int
 do_run_script(struct hist_browser *browser __maybe_unused,
  struct popup_action *act)
 {
-   char script_opt[64];
-   memset(script_opt, 0, sizeof(script_opt));
+   char *script_opt;
+   int len;
+   int n = 0;
+
+   len = 100;
+   if (act->thread)
+   len += strlen(thread__comm_str(act->thread));
+   else if (act->ms.sym)
+   len += strlen(act->ms.sym->name);
+   script_opt = malloc(len);
+   if (!script_opt)
+   return -1;
 
+   script_opt[0] = 0;
if (act->thread) {
-   scnprintf(script_opt, sizeof(script_opt), " -c %s ",
+   n = scnprintf(script_opt, len, " -c %s ",
  thread__comm_str(act->thread));
} else if (act->ms.sym) {
-   scnprintf(script_opt, sizeof(script_opt), " -S %s ",
+   n = scnprintf(script_opt, len, " -S %s ",
  act->ms.sym->name);
}
 
+   if (act->time) {
+   char start[64], end[64];
+   unsigned long starttime = act->time;
+   unsigned long endtime = act->time + symbol_conf.time_quantum;
+
+   if (starttime == endtime) { /* Display 1ms as fallback */
+   starttime -= 1*100;
+   endtime += 1*100;
+   }
+   timestamp__scnprintf_usec(starttime, start, sizeof start);
+   timestamp__scnprintf_usec(endtime, end, sizeof end);
+   n += snprintf(script_opt + n, len - n, " --time %s,%s", start, 
end);
+   }
+
script_browse(script_opt);
+   free(script_opt);
return 0;
 }
 
 static int
-add_script_opt(struct hist_browser *browser __maybe_unused,
+add_script_opt_2(struct hist_browser *browser __maybe_unused,
   struct popup_action *act, char **optstr,
-  struct thread *thread, struct symbol *sym)
+  struct thread *thread, struct symbol *sym,
+  const char *tstr)
 {
+
if (thread) {
-   if (asprintf(optstr, "Run scripts for samples of thread [%s]",
-thread__comm_str(thread)) < 0)
+   if (asprintf(optstr, "Run scripts for samples of thread [%s]%s",
+thread__comm_str(thread), tstr) < 0)
return 0;
} else if (sym) {
-   if (asprintf(optstr, "Run scripts for samples of symbol [%s]",
-sym->name) < 0)
+   if (asprintf(optstr, "Run scripts for samples of symbol [%s]%s",
+sym->name, tstr) < 0)
return 0;
} else {
-   if (asprintf(optstr, "Run scripts for all samples") < 0)
+   if (asprintf(optstr, "Run scripts for all samples%s", tstr) < 0)
return 0;
}
 
@@ -2566,6 +2596,36 @@ add_script_opt(struct hist_browser *browser 
__maybe_unused,
return 1;
 }
 
+static int
+add_script_opt(struct hist_browser *browser,
+  struct popup_action *act, char **optstr,
+  struct thread *thread, struct symbol *sym)
+{
+   int n, j;
+   struct hist_entry *he;
+
+   n = add_script_opt_2(browser, act, optstr, thread, sym, "");
+
+   he = hist_browser__selected_entry(browser);
+   if (sort_order && strstr(sort_order, "time")) {
+   char tstr[128];
+
+   optstr++;
+   act++;
+   j = sprintf(tstr, " in ");
+   j += timestamp__scnprintf_usec(he->time, tstr + j,
+  sizeof tstr - j);
+   j += sprintf(t

Support sample context in perf report

2019-02-25 Thread Andi Kleen
[Changes: 
Removed already merged patches.
Address review feedback, see individual patches.
Now compiles with gcc 8.
Some minor bug fixes and improvements.]

We currently have two ways to look at sample data in perf:
either use perf report to aggregate everything, or use
perf script to look at all individual samples.

Both ways are useful. Of course aggregation is useful
to quickly find the most expensive part of the code.

But sometimes a single sample is not good enough to
determine the problem and we need to look at context, either
through branch contexts, or other previous samples (e.g. for
correlating different micro architecture events or computing
metrics)

This can be done through perf script today, but it can
be rather cumbersome to find the right samples to look
at.

Another problem with perf report is that it aggregates
the whole measurement period. But many real workloads
have phases where they behave quite differently, and it is
often not useful to combine them into a single histogram.

While this can be worked around with the --time option
to report, it can be quite cumbersome.

This patch kit attempts to address some of these
problems in perf report by making it time aware.

- It adds a new time sort key that allows perf report
to separate samples from different regions. The time
regions can be defined with a new --time-quantum option.

- Then it extends the perf script support in the
tui record browser to allow browsing samples for 
different time regions from within a perf report
session.

- Extends the report browser script display
to automatically select sensible defaults
based on what was recorded. For example it will
automatically show branch contexts with -b.

- Support browsing the context of individual samples.
perf report can save a limited number of random samples
per histogram entry with the new --samples option.
Then the browser allows directly jumping to any
of the saved samples and browsing the context on the current
thread or CPU.

There could be probably be done more to make
perf report even better for such use cases (e.g. a real
time line display), but this basic support is good
enough for many useful usages.

Also available in
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git perf/streams-2




[PATCH v2 09/11] perf tools: Add utility function to print ns time stamps

2019-02-25 Thread Andi Kleen
From: Andi Kleen 

Add a utility function to print nanosecond timestamps.

Signed-off-by: Andi Kleen 
---
 tools/perf/util/time-utils.c | 8 
 tools/perf/util/time-utils.h | 1 +
 2 files changed, 9 insertions(+)

diff --git a/tools/perf/util/time-utils.c b/tools/perf/util/time-utils.c
index 6193b46050a5..a63bdf4cfd1b 100644
--- a/tools/perf/util/time-utils.c
+++ b/tools/perf/util/time-utils.c
@@ -404,6 +404,14 @@ int timestamp__scnprintf_usec(u64 timestamp, char *buf, 
size_t sz)
return scnprintf(buf, sz, "%"PRIu64".%06"PRIu64, sec, usec);
 }
 
+int timestamp__scnprintf_nsec(u64 timestamp, char *buf, size_t sz)
+{
+   u64  sec = timestamp / NSEC_PER_SEC;
+   u64 nsec = timestamp % NSEC_PER_SEC;
+
+   return scnprintf(buf, sz, "%"PRIu64".%09"PRIu64, sec, nsec);
+}
+
 int fetch_current_timestamp(char *buf, size_t sz)
 {
struct timeval tv;
diff --git a/tools/perf/util/time-utils.h b/tools/perf/util/time-utils.h
index 70b177d2b98c..9266cf4a8e58 100644
--- a/tools/perf/util/time-utils.h
+++ b/tools/perf/util/time-utils.h
@@ -24,6 +24,7 @@ bool perf_time__ranges_skip_sample(struct perf_time_interval 
*ptime_buf,
   int num, u64 timestamp);
 
 int timestamp__scnprintf_usec(u64 timestamp, char *buf, size_t sz);
+int timestamp__scnprintf_nsec(u64 timestamp, char *buf, size_t sz);
 
 int fetch_current_timestamp(char *buf, size_t sz);
 
-- 
2.17.2



[PATCH v2 10/11] perf tools report: Implement browsing of individual samples

2019-02-25 Thread Andi Kleen
From: Andi Kleen 

Now report can show whole time periods with perf script,
but the user still has to find individual samples of interest
manually.

It would be expensive and complicated to search for the
right samples in the whole perf file. Typically users
only need to look at a small number of samples for useful
analysis.

Also the full scripts tend to show samples of all CPUs and all
threads mixed up, which can be very confusing on larger systems.

Add a new --samples option to save a small random number of samples
per hist entry

Use a reservoir sample technique to select a representatve
number of samples.

Then allow browsing the samples using perf script
as part of the hist entry context menu. This automatically
adds the right filters, so only the thread or cpu of the sample
is displayed. Then we use less' search functionality
to directly jump the to the time stamp of the selected
sample.

It uses different menus for assembler and source display.
Assembler needs xed installed and source needs debuginfo.

Currently it only supports as many samples as fit on
the screen due to some limitations in the slang ui code.

Signed-off-by: Andi Kleen 

---
v2:
Free names on error path
Pass --inline and --show-*-event to child perf as needed.
---
 tools/perf/Documentation/perf-report.txt |  4 ++
 tools/perf/builtin-report.c  |  2 +
 tools/perf/ui/browsers/Build |  1 +
 tools/perf/ui/browsers/hists.c   | 47 ++
 tools/perf/ui/browsers/res_sample.c  | 80 
 tools/perf/ui/browsers/scripts.c |  2 +-
 tools/perf/util/hist.c   | 36 +++
 tools/perf/util/hist.h   | 19 ++
 tools/perf/util/sort.h   |  8 +++
 tools/perf/util/symbol.c |  1 +
 tools/perf/util/symbol_conf.h|  1 +
 11 files changed, 200 insertions(+), 1 deletion(-)
 create mode 100644 tools/perf/ui/browsers/res_sample.c

diff --git a/tools/perf/Documentation/perf-report.txt 
b/tools/perf/Documentation/perf-report.txt
index 546d87221ad8..f441baa794ce 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -461,6 +461,10 @@ include::itrace.txt[]
 --socket-filter::
Only report the samples on the processor socket that match with this 
filter
 
+--samples=N::
+   Save N individual samples for each histogram entry to show context in 
perf
+   report tui browser.
+
 --raw-trace::
When displaying traceevent output, do not use print fmt or plugins.
 
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index f290c19304c8..f16a9a6f436a 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -1158,6 +1158,8 @@ int cmd_report(int argc, const char **argv)
OPT_BOOLEAN(0, "demangle-kernel", _conf.demangle_kernel,
"Enable kernel symbol demangling"),
OPT_BOOLEAN(0, "mem-mode", _mode, "mem access profile"),
+   OPT_INTEGER(0, "samples", _conf.res_sample,
+   "Number of samples to save per histogram entry for 
individual browsing"),
OPT_CALLBACK(0, "percent-limit", , "percent",
 "Don't show entries under that percent", 
parse_percent_limit),
OPT_CALLBACK(0, "percentage", NULL, "relative|absolute",
diff --git a/tools/perf/ui/browsers/Build b/tools/perf/ui/browsers/Build
index 8fee56b46502..fdf86f7981ca 100644
--- a/tools/perf/ui/browsers/Build
+++ b/tools/perf/ui/browsers/Build
@@ -3,6 +3,7 @@ perf-y += hists.o
 perf-y += map.o
 perf-y += scripts.o
 perf-y += header.o
+perf-y += res_sample.o
 
 CFLAGS_annotate.o += -DENABLE_SLFUTURE_CONST
 CFLAGS_hists.o+= -DENABLE_SLFUTURE_CONST
diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index 10c5aa13ca0e..e35b274ee863 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -2344,6 +2344,7 @@ struct popup_action {
struct map_symbol   ms;
int socket;
struct perf_evsel   *evsel;
+   enum rstype rstype;
 
int (*fn)(struct hist_browser *browser, struct popup_action *act);
 };
@@ -2571,6 +2572,19 @@ do_run_script(struct hist_browser *browser 
__maybe_unused,
return 0;
 }
 
+static int
+do_res_sample_script(struct hist_browser *browser __maybe_unused,
+struct popup_action *act)
+{
+   struct hist_entry *he;
+   int num_res;
+
+   he = hist_browser__selected_entry(browser);
+   num_res = min(he->num_res, symbol_conf.res_sample);
+   res_sample_browse(he->res_samples, num_res, act->evsel, act->rstype);
+   return 0;
+}
+
 static int
 add_script_opt_2(struct hist_browser *browser __maybe_unused,
   struct popup_action *act, char **optstr,
@@ -2629,6 +2643,27 @@ add_sc

[PATCH v2 11/11] perf tools: Add some new tips describing the new options

2019-02-25 Thread Andi Kleen
From: Andi Kleen 

And one old option.

Signed-off-by: Andi Kleen 
---
 tools/perf/Documentation/tips.txt | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/Documentation/tips.txt 
b/tools/perf/Documentation/tips.txt
index 849599f39c5e..4ec8107ed512 100644
--- a/tools/perf/Documentation/tips.txt
+++ b/tools/perf/Documentation/tips.txt
@@ -15,6 +15,7 @@ To see callchains in a more compact form: perf report -g 
folded
 Show individual samples with: perf script
 Limit to show entries above 5% only: perf report --percent-limit 5
 Profiling branch (mis)predictions with: perf record -b / perf report
+To show assembler sample contexts use perf record -b / perf script -F 
+brstackinsn --xed
 Treat branches as callchains: perf report --branch-history
 To count events in every 1000 msec: perf stat -I 1000
 Print event counts in CSV format with: perf stat -x,
@@ -34,3 +35,5 @@ Show current config key-value pairs: perf config --list
 Show user configuration overrides: perf config --user --list
 To add Node.js USDT(User-Level Statically Defined Tracing): perf buildid-cache 
--add `which node`
 To report cacheline events from previous recording: perf c2c report
+To browse sample contexts use perf report --sample 10 and select in context 
menu
+To separate samples by time use perf report --sort time,overhead,sym
-- 
2.17.2



[PATCH v2 05/11] perf tools report: Use less for scripts output

2019-02-25 Thread Andi Kleen
From: Andi Kleen 

The UI viewer for scripts output has a lot of limitations: limited size,
no search or save function, slow, and various other issues.

Just use 'less' to display directly on the terminal instead.

This won't work in gtk mode, but gtk doesn't support these
context menus anyways. If that is ever done could use an terminal
for the output.

Signed-off-by: Andi Kleen 

---

v2:
Remove some unneeded headers
---
 tools/perf/ui/browsers/scripts.c | 130 ---
 1 file changed, 17 insertions(+), 113 deletions(-)

diff --git a/tools/perf/ui/browsers/scripts.c b/tools/perf/ui/browsers/scripts.c
index 90a32ac69e76..7f36630694bf 100644
--- a/tools/perf/ui/browsers/scripts.c
+++ b/tools/perf/ui/browsers/scripts.c
@@ -1,35 +1,12 @@
 // SPDX-License-Identifier: GPL-2.0
-#include 
-#include 
-#include 
-#include 
 #include "../../util/sort.h"
 #include "../../util/util.h"
 #include "../../util/hist.h"
 #include "../../util/debug.h"
 #include "../../util/symbol.h"
 #include "../browser.h"
-#include "../helpline.h"
 #include "../libslang.h"
 
-/* 2048 lines should be enough for a script output */
-#define MAX_LINES  2048
-
-/* 160 bytes for one output line */
-#define AVERAGE_LINE_LEN   160
-
-struct script_line {
-   struct list_head node;
-   char line[AVERAGE_LINE_LEN];
-};
-
-struct perf_script_browser {
-   struct ui_browser b;
-   struct list_head entries;
-   const char *script_name;
-   int nr_lines;
-};
-
 #define SCRIPT_NAMELEN 128
 #define SCRIPT_MAX_NO  64
 /*
@@ -73,69 +50,29 @@ static int list_scripts(char *script_name)
return ret;
 }
 
-static void script_browser__write(struct ui_browser *browser,
-  void *entry, int row)
+static void run_script(char *cmd)
 {
-   struct script_line *sline = list_entry(entry, struct script_line, node);
-   bool current_entry = ui_browser__is_current_entry(browser, row);
-
-   ui_browser__set_color(browser, current_entry ? HE_COLORSET_SELECTED :
-  HE_COLORSET_NORMAL);
-
-   ui_browser__write_nstring(browser, sline->line, browser->width);
+   pr_debug("Running %s\n", cmd);
+   SLang_reset_tty();
+   if (system(cmd) < 0)
+   pr_warning("Cannot run %s\n", cmd);
+   /*
+* SLang doesn't seem to reset the whole terminal, so be more
+* forceful to get back to the original state.
+*/
+   printf("\033[c\033[H\033[J");
+   fflush(stdout);
+   SLang_init_tty(0, 0, 0);
+   SLsmg_refresh();
 }
 
-static int script_browser__run(struct perf_script_browser *browser)
-{
-   int key;
-
-   if (ui_browser__show(>b, browser->script_name,
-"Press ESC to exit") < 0)
-   return -1;
-
-   while (1) {
-   key = ui_browser__run(>b, 0);
-
-   /* We can add some special key handling here if needed */
-   break;
-   }
-
-   ui_browser__hide(>b);
-   return key;
-}
-
-
 int script_browse(const char *script_opt)
 {
char cmd[SCRIPT_FULLPATH_LEN*2], script_name[SCRIPT_FULLPATH_LEN];
-   char *line = NULL;
-   size_t len = 0;
-   ssize_t retlen;
-   int ret = -1, nr_entries = 0;
-   FILE *fp;
-   void *buf;
-   struct script_line *sline;
-
-   struct perf_script_browser script = {
-   .b = {
-   .refresh= ui_browser__list_head_refresh,
-   .seek   = ui_browser__list_head_seek,
-   .write  = script_browser__write,
-   },
-   .script_name = script_name,
-   };
-
-   INIT_LIST_HEAD();
-
-   /* Save each line of the output in one struct script_line object. */
-   buf = zalloc((sizeof(*sline)) * MAX_LINES);
-   if (!buf)
-   return -1;
-   sline = buf;
 
memset(script_name, 0, SCRIPT_FULLPATH_LEN);
if (list_scripts(script_name))
-   goto exit;
+   return -1;
 
sprintf(cmd, "perf script -s %s ", script_name);
 
@@ -147,42 +84,9 @@ int script_browse(const char *script_opt)
strcat(cmd, input_name);
}
 
-   strcat(cmd, " 2>&1");
-
-   fp = popen(cmd, "r");
-   if (!fp)
-   goto exit;
-
-   while ((retlen = getline(, , fp)) != -1) {
-   strncpy(sline->line, line, AVERAGE_LINE_LEN);
-
-   /* If one output line is very large, just cut it short */
-   if (retlen >= AVERAGE_LINE_LEN) {
-   sline->line[AVERAGE_LINE_LEN - 1] = '\0';
-   sline->line[AVERAGE_LINE_LEN - 2] = '\n';
-   }
-   list_add_

[PATCH v2 08/11] perf tools report: Support builtin perf script in scripts menu

2019-02-25 Thread Andi Kleen
From: Andi Kleen 

The scripts menu traditionally only showed custom perf scripts.

Allow to run standard perf script with useful default options too.

- Normal perf script
- perf script with assembler (needs xed installed)
- perf script with source code output (needs debuginfo)
- perf script with custom arguments

Then we automatically select the right options to
display the information in the perf.data file.

For example with -b display branch contexts.

It's not easily possible to check for xed's existence
in advance.  perf script usually gives sensible error messages when
it's not available.

Signed-off-by: Andi Kleen 

---
v2:
Pass --inline if needed

squash! perf tools report: Support builtin perf script in scripts menu

Avoid compiler warnings. Allocate cmd buffer dynamically.
---
 tools/perf/ui/browsers/annotate.c |   2 +-
 tools/perf/ui/browsers/hists.c|  20 +++--
 tools/perf/ui/browsers/scripts.c  | 127 +++---
 tools/perf/util/hist.h|   8 +-
 4 files changed, 119 insertions(+), 38 deletions(-)

diff --git a/tools/perf/ui/browsers/annotate.c 
b/tools/perf/ui/browsers/annotate.c
index 35bdfd8b1e71..98d934a36d86 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -750,7 +750,7 @@ static int annotate_browser__run(struct annotate_browser 
*browser,
continue;
case 'r':
{
-   script_browse(NULL);
+   script_browse(NULL, NULL);
continue;
}
case 'k':
diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index 663970f766e1..10c5aa13ca0e 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -2343,6 +2343,7 @@ struct popup_action {
struct thread   *thread;
struct map_symbol   ms;
int socket;
+   struct perf_evsel   *evsel;
 
int (*fn)(struct hist_browser *browser, struct popup_action *act);
 };
@@ -2565,7 +2566,7 @@ do_run_script(struct hist_browser *browser __maybe_unused,
n += snprintf(script_opt + n, len - n, " --time %s,%s", start, 
end);
}
 
-   script_browse(script_opt);
+   script_browse(script_opt, act->evsel);
free(script_opt);
return 0;
 }
@@ -2574,7 +2575,7 @@ static int
 add_script_opt_2(struct hist_browser *browser __maybe_unused,
   struct popup_action *act, char **optstr,
   struct thread *thread, struct symbol *sym,
-  const char *tstr)
+  struct perf_evsel *evsel, const char *tstr)
 {
 
if (thread) {
@@ -2592,6 +2593,7 @@ add_script_opt_2(struct hist_browser *browser 
__maybe_unused,
 
act->thread = thread;
act->ms.sym = sym;
+   act->evsel = evsel;
act->fn = do_run_script;
return 1;
 }
@@ -2599,12 +2601,13 @@ add_script_opt_2(struct hist_browser *browser 
__maybe_unused,
 static int
 add_script_opt(struct hist_browser *browser,
   struct popup_action *act, char **optstr,
-  struct thread *thread, struct symbol *sym)
+  struct thread *thread, struct symbol *sym,
+  struct perf_evsel *evsel)
 {
int n, j;
struct hist_entry *he;
 
-   n = add_script_opt_2(browser, act, optstr, thread, sym, "");
+   n = add_script_opt_2(browser, act, optstr, thread, sym, evsel, "");
 
he = hist_browser__selected_entry(browser);
if (sort_order && strstr(sort_order, "time")) {
@@ -2620,7 +2623,7 @@ add_script_opt(struct hist_browser *browser,
  tstr + j,
  sizeof tstr - j);
n += add_script_opt_2(browser, act, optstr, thread, sym,
- tstr);
+ evsel, tstr);
act->time = he->time;
}
return n;
@@ -3091,7 +3094,7 @@ static int perf_evsel__hists_browse(struct perf_evsel 
*evsel, int nr_events,
nr_options += add_script_opt(browser,
 
[nr_options],
 
[nr_options],
-thread, NULL);
+thread, NULL, 
evsel);
}
/*
 * Note that browser->selection != NULL
@@ -3106,11 +3109,12 @@ static int perf_evsel__hists_browse(struct perf_evsel 
*evsel, int nr_events,
nr_options += add_script_opt(browser,
 
[nr_options],
 

Re: [PATCH 03/11] perf tools report: Support nano seconds

2019-02-25 Thread Andi Kleen
On Mon, Feb 25, 2019 at 11:40:45AM -0500, Sebastien Boisvert wrote:
> 
> 
> On 2019-02-24 10:37 a.m., Andi Kleen wrote:
> > From: Andi Kleen 
> > 
> > Upcoming changes add timestamp output in perf report. Add a --ns
> > argument similar to perf script to support nanoseconds resolution
> > when needed.
> 
> Is this a ISO-8601 date with nanoseconds ?

No. 

-Andi


Re: [PATCH 11/11] perf tools report: Implement browsing of individual samples

2019-02-25 Thread Andi Kleen
On Mon, Feb 25, 2019 at 01:56:15PM +0100, Jiri Olsa wrote:
> On Sun, Feb 24, 2019 at 07:37:22AM -0800, Andi Kleen wrote:
> 
> SNIP
> 
> > +static void hists__res_sample(struct hist_entry *he, struct perf_sample 
> > *sample)
> > +{
> > +   struct res_sample *r;
> > +   int j;
> > +
> > +   if (!he->res_samples) {
> > +   he->res_samples = calloc(sizeof(struct res_sample),
> > +   symbol_conf.res_sample);
> > +   if (!he->res_samples)
> > +   return;
> > +   }
> > +   if (he->num_res < symbol_conf.res_sample) {
> > +   j = he->num_res++;
> > +   } else {
> > +   j = random_max(++he->num_res + 1);
> 
> will he->num_res keep raising all the time?

Yes, but the display code limits it to symbol_conf.res_samples

That was intentional so that we can keep track of the total
number of samples. Plan was to eventually display it somewhere
so that the users know how many of the samples are covered
(currently it is not though) 

-Andi


Re: [PATCH 11/11] perf tools report: Implement browsing of individual samples

2019-02-25 Thread Andi Kleen
> for some reason can't see those items in menu

These one needs --samples 10 or similar.

It's off by default currently.

-Andi


Re: [PATCH 07/11] perf tools report: Support running scripts for current time range

2019-02-25 Thread Andi Kleen
> can't see the time ranges in the menu.. what's the path
> to get them listed?

You need to use --sort time,overhead,sym

Without time sorting there are no available ranges.

-Andi


[PATCH 04/11] perf tools report: Parse time quantum

2019-02-24 Thread Andi Kleen
From: Andi Kleen 

Many workloads change over time. perf report currently aggregates
the whole time range reported in perf.data.

This patch adds an option for a time quantum to quantisize the
perf.data over time.

This just adds the option, will be used in follow on patches
for a time sort key.

Signed-off-by: Andi Kleen 
---
 tools/perf/Documentation/perf-report.txt |  4 +++
 tools/perf/builtin-report.c  | 37 
 tools/perf/util/hist.c   |  2 ++
 tools/perf/util/hist.h   |  2 ++
 4 files changed, 45 insertions(+)

diff --git a/tools/perf/Documentation/perf-report.txt 
b/tools/perf/Documentation/perf-report.txt
index 51dbc519dbce..9ec1702bccdd 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -497,6 +497,10 @@ include::itrace.txt[]
The period/hits keywords set the base the percentage is computed
on - the samples period or the number of samples (hits).
 
+--time-quantum::
+   Configure time quantum for time sort key. Default 100ms.
+   Accepts s, us, ms, ns units.
+
 include::callchain-overhead-calculation.txt[]
 
 SEE ALSO
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index bb8918a747ba..600e168f3b99 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -47,6 +47,7 @@
 #include 
 #include 
 #include 
+#include "sane_ctype.h"
 #include 
 #include 
 #include 
@@ -926,6 +927,39 @@ report_parse_callchain_opt(const struct option *opt, const 
char *arg, int unset)
return parse_callchain_report_opt(arg);
 }
 
+static int
+parse_time_quantum(const struct option *opt, const char *arg,
+  int unset __maybe_unused)
+{
+   unsigned long *time_q = opt->value;
+   char *end;
+
+   *time_q = strtoul(arg, , 0);
+   if (end == arg)
+   goto parse_err;
+   while (isspace(*end))
+   end++;
+   if (*end == 0)
+   return 0;
+   if (!strcmp(end, "s")) {
+   *time_q *= 10;
+   return 0;
+   }
+   if (!strcmp(end, "ms")) {
+   *time_q *= 100;
+   return 0;
+   }
+   if (!strcmp(end, "us")) {
+   *time_q *= 1000;
+   return 0;
+   }
+   if (!strcmp(end, "ns"))
+   return 0;
+parse_err:
+   pr_err("Cannot parse time quantum `%s'\n", arg);
+   return -1;
+}
+
 int
 report_parse_ignore_callees_opt(const struct option *opt __maybe_unused,
const char *arg, int unset __maybe_unused)
@@ -1148,6 +1182,9 @@ int cmd_report(int argc, const char **argv)
 "Set percent type local/global-period/hits",
 annotate_parse_percent_type),
OPT_BOOLEAN(0, "ns", , "Show times in nanosecs"),
+   OPT_CALLBACK(0, "time-quantum", _quantum, "time (ms|us|ns)",
+"Set time quantum for time sort key (default 100ms)",
+parse_time_quantum),
OPT_END()
};
struct perf_data data = {
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 669f961316f0..d25ca5bb91e6 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -20,6 +20,8 @@
 #include 
 #include 
 
+unsigned long time_quantum = 1;
+
 static bool hists__filter_entry_by_dso(struct hists *hists,
   struct hist_entry *he);
 static bool hists__filter_entry_by_thread(struct hists *hists,
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 4af27fbab24f..8e602b486c5c 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -531,4 +531,6 @@ static inline int hists__scnprintf_title(struct hists 
*hists, char *bf, size_t s
return __hists__scnprintf_title(hists, bf, size, true);
 }
 
+extern unsigned long time_quantum;
+
 #endif /* __PERF_HIST_H */
-- 
2.17.2



[PATCH 01/11] perf tools script: Handle missing fields with -F +..

2019-02-24 Thread Andi Kleen
From: Andi Kleen 

When using -F + syntax to add a field the existing defaults
are currently all marked user_set. This can cause errors when
some field is missing in the perf.data

This patch tracks the actually user set fields separately,
so that we don't error out in this case.

Before:

% perf record true
% perf script -F +metric
Samples for 'cycles:ppp' event do not have CPU attribute set. Cannot print 
'cpu' field.
%

After

5 perf record true
% perf script -F +metric
perf 28936 278636.237688:  1 cycles:ppp:  8117da99 
perf_event_exec+0x59 (/lib/modules/4.20.0-odilo/build/vmlinux)
...
%

Signed-off-by: Andi Kleen 
---
 tools/perf/builtin-script.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 8d5fe092525c..29e95a25c6e6 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -149,6 +149,7 @@ static struct {
unsigned int print_ip_opts;
u64 fields;
u64 invalid_fields;
+   u64 user_set_fields;
 } output[OUTPUT_TYPE_MAX] = {
 
[PERF_TYPE_HARDWARE] = {
@@ -345,7 +346,7 @@ static int perf_evsel__do_check_stype(struct perf_evsel 
*evsel,
if (attr->sample_type & sample_type)
return 0;
 
-   if (output[type].user_set) {
+   if (output[type].user_set_fields & field) {
if (allow_user_set)
return 0;
evname = perf_evsel__name(evsel);
@@ -2628,10 +2629,13 @@ static int parse_output_fields(const struct option *opt 
__maybe_unused,
pr_warning("\'%s\' not valid for %s 
events. Ignoring.\n",
   all_output_options[i].str, 
event_type(j));
} else {
-   if (change == REMOVE)
+   if (change == REMOVE) {
output[j].fields &= 
~all_output_options[i].field;
-   else
+   output[j].user_set_fields &= 
~all_output_options[i].field;
+   } else {
output[j].fields |= 
all_output_options[i].field;
+   output[j].user_set_fields |= 
all_output_options[i].field;
+   }
output[j].user_set = true;
output[j].wildcard_set = true;
}
-- 
2.17.2



Support sample context in perf report

2019-02-24 Thread Andi Kleen
We currently have two ways to look at sample data in perf:
either use perf report to aggregate everything, or use
perf script to look at all individual samples.

Both ways are useful. Of course aggregation is useful
to quickly find the most expensive part of the code.

But sometimes a single sample is not good enough to
determine the problem and we need to look at context, either
through branch contexts, or other previous samples (e.g. for
correlating different micro architecture events or computing
metrics)

This can be done through perf script today, but it can
be rather cumbersome to find the right samples to look
at.

Another problem with perf report is that it aggregates
the whole measurement period. But many real workloads
have phases where they behave quite differently, and it is
often not useful to combine them into a single histogram.

While this can be worked around with the --time option
to report, it can be quite cumbersome.

This patch kit attempts to address some of these
problems in perf report by making it time aware.

- It adds a new time sort key that allows perf report
to separate samples from different regions. The time
regions can be defined with a new --time-quantum option.

- Then it extends the perf script support in the
tui record browser to allow browsing samples for 
different time regions from within a perf report
session.

- Finally it extends the report browser script display
support to automatically select sensible defaults
based on what was recorded. For example it will
automatically show branch contexts with -b.

There could be probably be done more to make
perf report even better for such use cases (e.g. a real
time line display), but this basic support is good
enough for many useful usages.

Also available in
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git perf/streams-1



[PATCH 08/11] perf tools: Add perf_exe() helper to find perf binary

2019-02-24 Thread Andi Kleen
From: Andi Kleen 

Also convert one existing user.

Signed-off-by: Andi Kleen 
---
 tools/perf/util/header.c | 12 +++-
 tools/perf/util/util.c   | 10 ++
 tools/perf/util/util.h   |  2 ++
 3 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 61ce197c5362..2f290a0343d4 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -526,17 +526,11 @@ static int write_event_desc(struct feat_fd *ff,
 static int write_cmdline(struct feat_fd *ff,
 struct perf_evlist *evlist __maybe_unused)
 {
-   char buf[MAXPATHLEN];
-   u32 n;
-   int i, ret;
+   char pbuf[MAXPATHLEN], *buf;
+   int i, ret, n;
 
/* actual path to perf binary */
-   ret = readlink("/proc/self/exe", buf, sizeof(buf) - 1);
-   if (ret <= 0)
-   return -1;
-
-   /* readlink() does not add null termination */
-   buf[ret] = '\0';
+   buf = perf_exe(pbuf, MAXPATHLEN);
 
/* account for binary path */
n = perf_env.nr_cmdline + 1;
diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index 320b0fef249a..dd693e018aef 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -507,3 +507,13 @@ const char *perf_tip(const char *dirpath)
 
return tip;
 }
+
+char *perf_exe(char *buf, int len)
+{
+   int n = readlink("/proc/self/exe", buf, len);
+   if (n > 0) {
+   buf[n] = 0;
+   return buf;
+   }
+   return strcpy(buf, "perf");
+}
diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h
index ece040b799f6..dc2a99ad5c96 100644
--- a/tools/perf/util/util.h
+++ b/tools/perf/util/util.h
@@ -76,6 +76,8 @@ extern bool perf_singlethreaded;
 void perf_set_singlethreaded(void);
 void perf_set_multithreaded(void);
 
+char *perf_exe(char *buf, int len);
+
 #ifndef O_CLOEXEC
 #ifdef __sparc__
 #define O_CLOEXEC  0x40
-- 
2.17.2



[PATCH 06/11] perf tools report: Use less for scripts output

2019-02-24 Thread Andi Kleen
From: Andi Kleen 

The UI viewer for scripts output has a lot of limitations: limited size,
no search or save function, slow, and various other issues.

Just use 'less' to display directly on the terminal instead.

This won't work in gtk mode, but gtk doesn't support these
context menus anyways. If that is every done could use an terminal
for the output.

Signed-off-by: Andi Kleen 
---
 tools/perf/ui/browsers/scripts.c | 125 +--
 1 file changed, 17 insertions(+), 108 deletions(-)

diff --git a/tools/perf/ui/browsers/scripts.c b/tools/perf/ui/browsers/scripts.c
index 90a32ac69e76..92707be174e4 100644
--- a/tools/perf/ui/browsers/scripts.c
+++ b/tools/perf/ui/browsers/scripts.c
@@ -12,24 +12,6 @@
 #include "../helpline.h"
 #include "../libslang.h"
 
-/* 2048 lines should be enough for a script output */
-#define MAX_LINES  2048
-
-/* 160 bytes for one output line */
-#define AVERAGE_LINE_LEN   160
-
-struct script_line {
-   struct list_head node;
-   char line[AVERAGE_LINE_LEN];
-};
-
-struct perf_script_browser {
-   struct ui_browser b;
-   struct list_head entries;
-   const char *script_name;
-   int nr_lines;
-};
-
 #define SCRIPT_NAMELEN 128
 #define SCRIPT_MAX_NO  64
 /*
@@ -73,69 +55,29 @@ static int list_scripts(char *script_name)
return ret;
 }
 
-static void script_browser__write(struct ui_browser *browser,
-  void *entry, int row)
+static void run_script(char *cmd)
 {
-   struct script_line *sline = list_entry(entry, struct script_line, node);
-   bool current_entry = ui_browser__is_current_entry(browser, row);
-
-   ui_browser__set_color(browser, current_entry ? HE_COLORSET_SELECTED :
-  HE_COLORSET_NORMAL);
-
-   ui_browser__write_nstring(browser, sline->line, browser->width);
+   pr_debug("Running %s\n", cmd);
+   SLang_reset_tty();
+   if (system(cmd) < 0)
+   pr_warning("Cannot run %s\n", cmd);
+   /*
+* SLang doesn't seem to reset the whole terminal, so be more
+* forceful to get back to the original state.
+*/
+   printf("\033[c\033[H\033[J");
+   fflush(stdout);
+   SLang_init_tty(0, 0, 0);
+   SLsmg_refresh();
 }
 
-static int script_browser__run(struct perf_script_browser *browser)
-{
-   int key;
-
-   if (ui_browser__show(>b, browser->script_name,
-"Press ESC to exit") < 0)
-   return -1;
-
-   while (1) {
-   key = ui_browser__run(>b, 0);
-
-   /* We can add some special key handling here if needed */
-   break;
-   }
-
-   ui_browser__hide(>b);
-   return key;
-}
-
-
 int script_browse(const char *script_opt)
 {
char cmd[SCRIPT_FULLPATH_LEN*2], script_name[SCRIPT_FULLPATH_LEN];
-   char *line = NULL;
-   size_t len = 0;
-   ssize_t retlen;
-   int ret = -1, nr_entries = 0;
-   FILE *fp;
-   void *buf;
-   struct script_line *sline;
-
-   struct perf_script_browser script = {
-   .b = {
-   .refresh= ui_browser__list_head_refresh,
-   .seek   = ui_browser__list_head_seek,
-   .write  = script_browser__write,
-   },
-   .script_name = script_name,
-   };
-
-   INIT_LIST_HEAD();
-
-   /* Save each line of the output in one struct script_line object. */
-   buf = zalloc((sizeof(*sline)) * MAX_LINES);
-   if (!buf)
-   return -1;
-   sline = buf;
 
memset(script_name, 0, SCRIPT_FULLPATH_LEN);
if (list_scripts(script_name))
-   goto exit;
+   return -1;
 
sprintf(cmd, "perf script -s %s ", script_name);
 
@@ -147,42 +89,9 @@ int script_browse(const char *script_opt)
strcat(cmd, input_name);
}
 
-   strcat(cmd, " 2>&1");
-
-   fp = popen(cmd, "r");
-   if (!fp)
-   goto exit;
-
-   while ((retlen = getline(, , fp)) != -1) {
-   strncpy(sline->line, line, AVERAGE_LINE_LEN);
-
-   /* If one output line is very large, just cut it short */
-   if (retlen >= AVERAGE_LINE_LEN) {
-   sline->line[AVERAGE_LINE_LEN - 1] = '\0';
-   sline->line[AVERAGE_LINE_LEN - 2] = '\n';
-   }
-   list_add_tail(>node, );
-
-   if (script.b.width < retlen)
-   script.b.width = retlen;
-
-   if (nr_entries++ >= MAX_LINES - 1)
-   break;
-   sline++;
-   }
-
-   if (script.b.width > AVERAGE_LINE_LEN)
-   script.b.width = AVERAGE_LINE_LEN;
-
-   free

[PATCH 05/11] perf tools report: Support time sort key

2019-02-24 Thread Andi Kleen
From: Andi Kleen 

Add a time sort key to perf report to display samples for
different time quantums separately. This allows easier
analysis of workloads that change over time, and also
will allow looking at the context of samples.

% perf record ...
% perf report --sort time,overhead,symbol --time-quantum 1ms --stdio
...
 0.67%  277061.87300  [.] _dl_start
 0.50%  277061.87300  [.] f1
 0.50%  277061.87300  [.] f2
 0.33%  277061.87300  [.] main
 0.29%  277061.87300  [.] _dl_lookup_symbol_x
 0.29%  277061.87300  [.] dl_main
 0.29%  277061.87300  [.] do_lookup_x
 0.17%  277061.87300  [.] _dl_debug_initialize
 0.17%  277061.87300  [.] _dl_init_paths
 0.08%  277061.87300  [.] check_match
 0.04%  277061.87300  [.] _dl_count_modids
 1.33%  277061.87400  [.] f1
 1.33%  277061.87400  [.] f2
 1.33%  277061.87400  [.] main
 1.17%  277061.87500  [.] main
 1.08%  277061.87500  [.] f1
 1.08%  277061.87500  [.] f2
 1.00%  277061.87600  [.] main
 0.83%  277061.87600  [.] f1
 0.83%  277061.87600  [.] f2
 1.00%  277061.87700  [.] main

Signed-off-by: Andi Kleen 
---
 tools/perf/Documentation/perf-report.txt |  2 ++
 tools/perf/util/hist.c   |  9 ++
 tools/perf/util/hist.h   |  1 +
 tools/perf/util/sort.c   | 38 
 tools/perf/util/sort.h   |  2 ++
 5 files changed, 52 insertions(+)

diff --git a/tools/perf/Documentation/perf-report.txt 
b/tools/perf/Documentation/perf-report.txt
index 9ec1702bccdd..546d87221ad8 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -105,6 +105,8 @@ OPTIONS
guest machine
- sample: Number of sample
- period: Raw number of event count of sample
+   - time: Separate the samples by time stamp with the resolution 
specified by
+   --time-quantum (default 100ms). Specify with overhead and before it.
 
By default, comm, dso and symbol keys are used.
(i.e. --sort comm,dso,symbol)
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index d25ca5bb91e6..7bf4d85a3021 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -194,6 +194,7 @@ void hists__calc_col_len(struct hists *hists, struct 
hist_entry *h)
hists__new_col_len(hists, HISTC_MEM_LVL, 21 + 3);
hists__new_col_len(hists, HISTC_LOCAL_WEIGHT, 12);
hists__new_col_len(hists, HISTC_GLOBAL_WEIGHT, 12);
+   hists__new_col_len(hists, HISTC_TIME, 12);
 
if (h->srcline) {
len = MAX(strlen(h->srcline), strlen(sort_srcline.se_header));
@@ -248,6 +249,13 @@ static void he_stat__add_cpumode_period(struct he_stat 
*he_stat,
}
 }
 
+static long hist_time(unsigned long time)
+{
+   if (time_quantum)
+   return (time / time_quantum) * time_quantum;
+   return (time / 100) * 100;
+}
+
 static void he_stat__add_period(struct he_stat *he_stat, u64 period,
u64 weight)
 {
@@ -628,6 +636,7 @@ __hists__add_entry(struct hists *hists,
.raw_data = sample->raw_data,
.raw_size = sample->raw_size,
.ops = ops,
+   .time = hist_time(sample->time),
}, *he = hists__findnew_entry(hists, , al, sample_self);
 
if (!hists->has_callchains && he && he->callchain_size != 0)
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 8e602b486c5c..8b21b6d006be 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -31,6 +31,7 @@ enum hist_filter {
 
 enum hist_column {
HISTC_SYMBOL,
+   HISTC_TIME,
HISTC_DSO,
HISTC_THREAD,
HISTC_COMM,
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index d1010a966f9b..974a4c553f82 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -15,6 +15,7 @@
 #include 
 #include "mem-events.h"
 #include "annotate.h"
+#include "time-utils.h"
 #include 
 
 regex_tparent_regex;
@@ -649,6 +650,42 @@ struct sort_entry sort_socket = {
.se_width_idx   = HISTC_SOCKET,
 };
 
+/* --sort time */
+
+static int64_t
+sort__time_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+   return right->time - left->time;
+}
+
+static int hist_entry__time_snprintf(struct hist_entry *he, char *bf,
+   size_t size, unsigned int width)
+{
+   unsigned long secs;
+   unsigned long long nsecs;
+   char he_time[32];
+
+   nsecs = he->time;
+   secs = nsecs / 10;
+   nsecs -= secs * 10;
+
+   if (nanosecs)
+   snprintf(he_time, sizeof he_time, "%5lu.%09llu: ",
+secs, nsecs);
+   else
+   timestamp__scnprintf_usec(he->time, he_time,
+

[PATCH 02/11] perf tools script: Support insn output for normal samples

2019-02-24 Thread Andi Kleen
From: Andi Kleen 

perf script -F +insn was only working for PT traces because
the PT instruction decoder was filling in the insn/insn_len
sample attributes. Support it for non PT samples too on x86
using the existing x86 instruction decoder.

% perf record -a sleep 1
% perf script -F ip,sym,insn --xed
 811704c9 remote_function   movl  %eax, 0x18(%rbx)
 8100bb50 intel_bts_enable_localretq
 81048612 native_apic_mem_write movl  %esi, 
-0xa04000(%rdi)
 81048612 native_apic_mem_write movl  %esi, 
-0xa04000(%rdi)
 81048612 native_apic_mem_write movl  %esi, 
-0xa04000(%rdi)
 810f1f79 generic_exec_single   xor %eax, %eax
 811704c9 remote_function   movl  %eax, 0x18(%rbx)
 8100bb34 intel_bts_enable_localmovl  0x2000(%rax), %edx
 81048610 native_apic_mem_write mov %edi, %edi
...

Signed-off-by: Andi Kleen 
---
 tools/perf/arch/x86/util/Build  |  1 +
 tools/perf/arch/x86/util/archinsn.c | 41 +
 tools/perf/builtin-script.c | 10 +++
 tools/perf/util/archinsn.h  | 12 +
 4 files changed, 64 insertions(+)
 create mode 100644 tools/perf/arch/x86/util/archinsn.c
 create mode 100644 tools/perf/util/archinsn.h

diff --git a/tools/perf/arch/x86/util/Build b/tools/perf/arch/x86/util/Build
index 7aab0be5fc5f..7b8e69bbbdfe 100644
--- a/tools/perf/arch/x86/util/Build
+++ b/tools/perf/arch/x86/util/Build
@@ -6,6 +6,7 @@ perf-y += perf_regs.o
 perf-y += group.o
 perf-y += machine.o
 perf-y += event.o
+perf-y += archinsn.o
 
 perf-$(CONFIG_DWARF) += dwarf-regs.o
 perf-$(CONFIG_BPF_PROLOGUE) += dwarf-regs.o
diff --git a/tools/perf/arch/x86/util/archinsn.c 
b/tools/perf/arch/x86/util/archinsn.c
new file mode 100644
index ..9e3b0828b018
--- /dev/null
+++ b/tools/perf/arch/x86/util/archinsn.c
@@ -0,0 +1,41 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "perf.h"
+#include "archinsn.h"
+#include "util/intel-pt-decoder/insn.h"
+#include "machine.h"
+#include "thread.h"
+#include "symbol.h"
+#include "map.h"
+
+void arch_fetch_insn(struct perf_sample *sample,
+struct thread *thread,
+struct machine *machine)
+{
+   struct addr_location al;
+   u8 cpumode;
+   long offset;
+   struct insn insn;
+   int len;
+
+   if (!sample->ip)
+   return;
+
+   if (machine__kernel_ip(machine, sample->ip))
+   cpumode = PERF_RECORD_MISC_KERNEL;
+   else
+   cpumode = PERF_RECORD_MISC_USER;
+   if (!thread__find_map(thread, cpumode, sample->ip, ) || !al.map->dso)
+   return;
+   if (al.map->dso->data.status == DSO_DATA_STATUS_ERROR)
+   return;
+   map__load(al.map);
+   offset = al.map->map_ip(al.map, sample->ip);
+   len = dso__data_read_offset(al.map->dso, machine, offset, (u8 
*)sample->insn,
+  sizeof(sample->insn));
+   if (len <= 0)
+   return;
+   insn_init(, sample->insn, len, al.map->dso->is_64_bit);
+   insn_get_length();
+   if (insn_complete() && insn.length <= len)
+   sample->insn_len = insn.length;
+}
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 29e95a25c6e6..261055302d08 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -29,6 +29,7 @@
 #include "util/time-utils.h"
 #include "util/path.h"
 #include "print_binary.h"
+#include "archinsn.h"
 #include 
 #include 
 #include 
@@ -1227,6 +1228,12 @@ static int perf_sample__fprintf_callindent(struct 
perf_sample *sample,
return len + dlen;
 }
 
+__weak void arch_fetch_insn(struct perf_sample *sample __maybe_unused,
+   struct thread *thread __maybe_unused,
+   struct machine *machine __maybe_unused)
+{
+}
+
 static int perf_sample__fprintf_insn(struct perf_sample *sample,
 struct perf_event_attr *attr,
 struct thread *thread,
@@ -1234,6 +1241,9 @@ static int perf_sample__fprintf_insn(struct perf_sample 
*sample,
 {
int printed = 0;
 
+   if (sample->insn_len == 0)
+   arch_fetch_insn(sample, thread, machine);
+
if (PRINT_FIELD(INSNLEN))
printed += fprintf(fp, " ilen: %d", sample->insn_len);
if (PRINT_FIELD(INSN)) {
diff --git a/tools/perf/util/archinsn.h b/tools/perf/util/archinsn.h
new file mode 100644
index ..448cbb6b8d7e
--- /dev/null
+++ b/tools/perf/util/archinsn.h
@@ -0,0 +1,12 @@
+#ifndef INSN_H
+#define INSN_H 1
+
+struct perf_sample;
+struct machine;
+struct thread;
+
+void arch_fetch_insn(struct perf_sample *sample,
+struct thread *thread,
+struct machine *machine);
+
+#endif
-- 
2.17.2



[PATCH 07/11] perf tools report: Support running scripts for current time range

2019-02-24 Thread Andi Kleen
From: Andi Kleen 

When using the time sort key, add new context menus to run
scripts for only the currently selected time range. Compute
the correct range for the selection add pass it as the --time option to
perf script.

Signed-off-by: Andi Kleen 
---
 tools/perf/ui/browsers/hists.c | 82 +-
 1 file changed, 71 insertions(+), 11 deletions(-)

diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index aef800d97ea1..2a4f363f8cc7 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -30,6 +30,7 @@
 #include "srcline.h"
 #include "string2.h"
 #include "units.h"
+#include "time-utils.h"
 
 #include "sane_ctype.h"
 
@@ -2338,6 +2339,7 @@ static int switch_data_file(void)
 }
 
 struct popup_action {
+   unsigned long   time;
struct thread   *thread;
struct map_symbol   ms;
int socket;
@@ -2527,36 +2529,64 @@ static int
 do_run_script(struct hist_browser *browser __maybe_unused,
  struct popup_action *act)
 {
-   char script_opt[64];
-   memset(script_opt, 0, sizeof(script_opt));
+   char *script_opt;
+   int len;
+   int n = 0;
+
+   len = 100;
+   if (act->thread)
+   len += strlen(thread__comm_str(act->thread));
+   else if (act->ms.sym)
+   len += strlen(act->ms.sym->name);
+   script_opt = malloc(len);
+   if (!script_opt)
+   return -1;
 
+   script_opt[0] = 0;
if (act->thread) {
-   scnprintf(script_opt, sizeof(script_opt), " -c %s ",
+   n = scnprintf(script_opt, len, " -c %s ",
  thread__comm_str(act->thread));
} else if (act->ms.sym) {
-   scnprintf(script_opt, sizeof(script_opt), " -S %s ",
+   n = scnprintf(script_opt, len, " -S %s ",
  act->ms.sym->name);
}
 
+   if (act->time) {
+   char start[64], end[64];
+   unsigned long starttime = act->time;
+   unsigned long endtime = act->time + time_quantum;
+
+   if (starttime == endtime) { /* Display 1ms as fallback */
+   starttime -= 1*100;
+   endtime += 1*100;
+   }
+   timestamp__scnprintf_usec(starttime, start, sizeof start);
+   timestamp__scnprintf_usec(endtime, end, sizeof end);
+   n += snprintf(script_opt + n, len - n, " --time %s,%s", start, 
end);
+   }
+
script_browse(script_opt);
+   free(script_opt);
return 0;
 }
 
 static int
-add_script_opt(struct hist_browser *browser __maybe_unused,
+add_script_opt_2(struct hist_browser *browser __maybe_unused,
   struct popup_action *act, char **optstr,
-  struct thread *thread, struct symbol *sym)
+  struct thread *thread, struct symbol *sym,
+  const char *tstr)
 {
+
if (thread) {
-   if (asprintf(optstr, "Run scripts for samples of thread [%s]",
-thread__comm_str(thread)) < 0)
+   if (asprintf(optstr, "Run scripts for samples of thread [%s]%s",
+thread__comm_str(thread), tstr) < 0)
return 0;
} else if (sym) {
-   if (asprintf(optstr, "Run scripts for samples of symbol [%s]",
-sym->name) < 0)
+   if (asprintf(optstr, "Run scripts for samples of symbol [%s]%s",
+sym->name, tstr) < 0)
return 0;
} else {
-   if (asprintf(optstr, "Run scripts for all samples") < 0)
+   if (asprintf(optstr, "Run scripts for all samples%s", tstr) < 0)
return 0;
}
 
@@ -2566,6 +2596,36 @@ add_script_opt(struct hist_browser *browser 
__maybe_unused,
return 1;
 }
 
+static int
+add_script_opt(struct hist_browser *browser,
+  struct popup_action *act, char **optstr,
+  struct thread *thread, struct symbol *sym)
+{
+   int n, j;
+   struct hist_entry *he;
+
+   n = add_script_opt_2(browser, act, optstr, thread, sym, "");
+
+   he = hist_browser__selected_entry(browser);
+   if (sort_order && strstr(sort_order, "time")) {
+   char tstr[128];
+
+   optstr++;
+   act++;
+   j = sprintf(tstr, " in ");
+   j += timestamp__scnprintf_usec(he->time, tstr + j,
+  sizeof tstr - j);
+   j += sprintf(tstr + j, "-");
+   timestam

[PATCH 10/11] perf tools: Add utility function to print ns time stamps

2019-02-24 Thread Andi Kleen
From: Andi Kleen 

Add a utility function to print nanosecond timestamps.

Signed-off-by: Andi Kleen 
---
 tools/perf/util/time-utils.c | 8 
 tools/perf/util/time-utils.h | 1 +
 2 files changed, 9 insertions(+)

diff --git a/tools/perf/util/time-utils.c b/tools/perf/util/time-utils.c
index 6193b46050a5..a63bdf4cfd1b 100644
--- a/tools/perf/util/time-utils.c
+++ b/tools/perf/util/time-utils.c
@@ -404,6 +404,14 @@ int timestamp__scnprintf_usec(u64 timestamp, char *buf, 
size_t sz)
return scnprintf(buf, sz, "%"PRIu64".%06"PRIu64, sec, usec);
 }
 
+int timestamp__scnprintf_nsec(u64 timestamp, char *buf, size_t sz)
+{
+   u64  sec = timestamp / NSEC_PER_SEC;
+   u64 nsec = timestamp % NSEC_PER_SEC;
+
+   return scnprintf(buf, sz, "%"PRIu64".%09"PRIu64, sec, nsec);
+}
+
 int fetch_current_timestamp(char *buf, size_t sz)
 {
struct timeval tv;
diff --git a/tools/perf/util/time-utils.h b/tools/perf/util/time-utils.h
index 70b177d2b98c..9266cf4a8e58 100644
--- a/tools/perf/util/time-utils.h
+++ b/tools/perf/util/time-utils.h
@@ -24,6 +24,7 @@ bool perf_time__ranges_skip_sample(struct perf_time_interval 
*ptime_buf,
   int num, u64 timestamp);
 
 int timestamp__scnprintf_usec(u64 timestamp, char *buf, size_t sz);
+int timestamp__scnprintf_nsec(u64 timestamp, char *buf, size_t sz);
 
 int fetch_current_timestamp(char *buf, size_t sz);
 
-- 
2.17.2



[PATCH 11/11] perf tools report: Implement browsing of individual samples

2019-02-24 Thread Andi Kleen
From: Andi Kleen 

Now report can show whole time periods with perf script,
but the user still has to find individual samples of interest
manually.

It would be expensive and complicated to search for the
right samples in the whole perf file. Typically users
only need to look at a small number of samples for useful
analysis.

Also the full scripts tend to show samples of all CPUs and all
threads mixed up, which can be very confusing on larger systems.

Add a new --samples option to save a small random number of samples
per hist entry

Use a reservoir sample technique to select a representatve
number of samples.

Then allow browsing the samples using perf script
as part of the hist entry context menu. This automatically
adds the right filters, so only the thread or cpu of the sample
is displayed. Then we use less' search functionality
to directly jump the to the time stamp of the selected
sample.

It uses different menus for assembler and source display.
Assembler needs xed installed and source needs debuginfo.

Currently it only supports as many samples as fit on
the screen due to some limitations in the slang ui code.

Signed-off-by: Andi Kleen 
---
 tools/perf/Documentation/perf-report.txt |  4 ++
 tools/perf/builtin-report.c  |  2 +
 tools/perf/ui/browsers/Build |  1 +
 tools/perf/ui/browsers/hists.c   | 47 +++
 tools/perf/ui/browsers/res_sample.c  | 74 
 tools/perf/ui/browsers/scripts.c |  2 +-
 tools/perf/util/hist.c   | 36 
 tools/perf/util/hist.h   | 19 ++
 tools/perf/util/sort.h   |  8 +++
 tools/perf/util/symbol.c |  1 +
 tools/perf/util/symbol_conf.h|  1 +
 11 files changed, 194 insertions(+), 1 deletion(-)
 create mode 100644 tools/perf/ui/browsers/res_sample.c

diff --git a/tools/perf/Documentation/perf-report.txt 
b/tools/perf/Documentation/perf-report.txt
index 546d87221ad8..f441baa794ce 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -461,6 +461,10 @@ include::itrace.txt[]
 --socket-filter::
Only report the samples on the processor socket that match with this 
filter
 
+--samples=N::
+   Save N individual samples for each histogram entry to show context in 
perf
+   report tui browser.
+
 --raw-trace::
When displaying traceevent output, do not use print fmt or plugins.
 
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 600e168f3b99..9ad9dfa431d1 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -1154,6 +1154,8 @@ int cmd_report(int argc, const char **argv)
OPT_BOOLEAN(0, "demangle-kernel", _conf.demangle_kernel,
"Enable kernel symbol demangling"),
OPT_BOOLEAN(0, "mem-mode", _mode, "mem access profile"),
+   OPT_INTEGER(0, "samples", _conf.res_sample,
+   "Number of samples to save per histogram entry for 
individual browsing"),
OPT_CALLBACK(0, "percent-limit", , "percent",
 "Don't show entries under that percent", 
parse_percent_limit),
OPT_CALLBACK(0, "percentage", NULL, "relative|absolute",
diff --git a/tools/perf/ui/browsers/Build b/tools/perf/ui/browsers/Build
index 8fee56b46502..fdf86f7981ca 100644
--- a/tools/perf/ui/browsers/Build
+++ b/tools/perf/ui/browsers/Build
@@ -3,6 +3,7 @@ perf-y += hists.o
 perf-y += map.o
 perf-y += scripts.o
 perf-y += header.o
+perf-y += res_sample.o
 
 CFLAGS_annotate.o += -DENABLE_SLFUTURE_CONST
 CFLAGS_hists.o+= -DENABLE_SLFUTURE_CONST
diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index 9db8daa20044..6b27ff1494a8 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -2344,6 +2344,7 @@ struct popup_action {
struct map_symbol   ms;
int socket;
struct perf_evsel   *evsel;
+   enum rstype rstype;
 
int (*fn)(struct hist_browser *browser, struct popup_action *act);
 };
@@ -2571,6 +2572,19 @@ do_run_script(struct hist_browser *browser 
__maybe_unused,
return 0;
 }
 
+static int
+do_res_sample_script(struct hist_browser *browser __maybe_unused,
+struct popup_action *act)
+{
+   struct hist_entry *he;
+   int num_res;
+
+   he = hist_browser__selected_entry(browser);
+   num_res = min(he->num_res, symbol_conf.res_sample);
+   res_sample_browse(he->res_samples, num_res, act->evsel, act->rstype);
+   return 0;
+}
+
 static int
 add_script_opt_2(struct hist_browser *browser __maybe_unused,
   struct popup_action *act, char **optstr,
@@ -2629,6 +2643,27 @@ add_script_opt(struct hist_browser *browser,
return n;
 }
 
+static int
+

[PATCH 09/11] perf tools report: Support builtin perf script in scripts menu

2019-02-24 Thread Andi Kleen
From: Andi Kleen 

The scripts menu traditionally only showed custom perf scripts.

Allow to run standard perf script with useful default options too.

- Normal perf script
- perf script with assembler (needs xed installed)
- perf script with source code output (needs debuginfo)
- perf script with custom arguments

Then we automatically select the right options to
display the information in the perf.data file.

For example with -b display branch contexts.

It's not easily possible to check for xed's existence
in advance.  perf script usually gives sensible error messages when
it's not available.

Signed-off-by: Andi Kleen 
---
 tools/perf/ui/browsers/annotate.c |   2 +-
 tools/perf/ui/browsers/hists.c|  20 +++---
 tools/perf/ui/browsers/scripts.c  | 109 +-
 tools/perf/util/hist.h|   8 ++-
 4 files changed, 112 insertions(+), 27 deletions(-)

diff --git a/tools/perf/ui/browsers/annotate.c 
b/tools/perf/ui/browsers/annotate.c
index 35bdfd8b1e71..98d934a36d86 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -750,7 +750,7 @@ static int annotate_browser__run(struct annotate_browser 
*browser,
continue;
case 'r':
{
-   script_browse(NULL);
+   script_browse(NULL, NULL);
continue;
}
case 'k':
diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index 2a4f363f8cc7..9db8daa20044 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -2343,6 +2343,7 @@ struct popup_action {
struct thread   *thread;
struct map_symbol   ms;
int socket;
+   struct perf_evsel   *evsel;
 
int (*fn)(struct hist_browser *browser, struct popup_action *act);
 };
@@ -2565,7 +2566,7 @@ do_run_script(struct hist_browser *browser __maybe_unused,
n += snprintf(script_opt + n, len - n, " --time %s,%s", start, 
end);
}
 
-   script_browse(script_opt);
+   script_browse(script_opt, act->evsel);
free(script_opt);
return 0;
 }
@@ -2574,7 +2575,7 @@ static int
 add_script_opt_2(struct hist_browser *browser __maybe_unused,
   struct popup_action *act, char **optstr,
   struct thread *thread, struct symbol *sym,
-  const char *tstr)
+  struct perf_evsel *evsel, const char *tstr)
 {
 
if (thread) {
@@ -2592,6 +2593,7 @@ add_script_opt_2(struct hist_browser *browser 
__maybe_unused,
 
act->thread = thread;
act->ms.sym = sym;
+   act->evsel = evsel;
act->fn = do_run_script;
return 1;
 }
@@ -2599,12 +2601,13 @@ add_script_opt_2(struct hist_browser *browser 
__maybe_unused,
 static int
 add_script_opt(struct hist_browser *browser,
   struct popup_action *act, char **optstr,
-  struct thread *thread, struct symbol *sym)
+  struct thread *thread, struct symbol *sym,
+  struct perf_evsel *evsel)
 {
int n, j;
struct hist_entry *he;
 
-   n = add_script_opt_2(browser, act, optstr, thread, sym, "");
+   n = add_script_opt_2(browser, act, optstr, thread, sym, evsel, "");
 
he = hist_browser__selected_entry(browser);
if (sort_order && strstr(sort_order, "time")) {
@@ -2620,7 +2623,7 @@ add_script_opt(struct hist_browser *browser,
  tstr + j,
  sizeof tstr - j);
n += add_script_opt_2(browser, act, optstr, thread, sym,
- tstr);
+ evsel, tstr);
act->time = he->time;
}
return n;
@@ -3091,7 +3094,7 @@ static int perf_evsel__hists_browse(struct perf_evsel 
*evsel, int nr_events,
nr_options += add_script_opt(browser,
 
[nr_options],
 
[nr_options],
-thread, NULL);
+thread, NULL, 
evsel);
}
/*
 * Note that browser->selection != NULL
@@ -3106,11 +3109,12 @@ static int perf_evsel__hists_browse(struct perf_evsel 
*evsel, int nr_events,
nr_options += add_script_opt(browser,
 
[nr_options],
 
[nr_options],
-NULL, 
browser->selecti

[PATCH 03/11] perf tools report: Support nano seconds

2019-02-24 Thread Andi Kleen
From: Andi Kleen 

Upcoming changes add timestamp output in perf report. Add a --ns
argument similar to perf script to support nanoseconds resolution
when needed.

Signed-off-by: Andi Kleen 
---
 tools/perf/Documentation/perf-report.txt | 3 +++
 tools/perf/builtin-report.c  | 1 +
 tools/perf/builtin-script.c  | 1 -
 tools/perf/util/sort.c   | 1 +
 tools/perf/util/sort.h   | 2 ++
 5 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/perf-report.txt 
b/tools/perf/Documentation/perf-report.txt
index 1a27bfe05039..51dbc519dbce 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -477,6 +477,9 @@ include::itrace.txt[]
Please note that not all mmaps are stored, options affecting which ones
are include 'perf record --data', for instance.
 
+--ns::
+   Show time stamps in nanoseconds.
+
 --stats::
Display overall events statistics without any further processing.
(like the one at the end of the perf report -D command)
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 2e8c74d6430c..bb8918a747ba 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -1147,6 +1147,7 @@ int cmd_report(int argc, const char **argv)
OPT_CALLBACK(0, "percent-type", _opts, "local-period",
 "Set percent type local/global-period/hits",
 annotate_parse_percent_type),
+   OPT_BOOLEAN(0, "ns", , "Show times in nanosecs"),
OPT_END()
};
struct perf_data data = {
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 261055302d08..628c04543974 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -59,7 +59,6 @@ static bool   no_callchain;
 static boollatency_format;
 static boolsystem_wide;
 static boolprint_flags;
-static boolnanosecs;
 static const char  *cpu_list;
 static DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS);
 static struct perf_stat_config stat_config;
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 2b6c1ccb878c..d1010a966f9b 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -31,6 +31,7 @@ const char*field_order;
 regex_tignore_callees_regex;
 inthave_ignore_callees = 0;
 enum sort_mode sort__mode = SORT_MODE__NORMAL;
+bool   nanosecs = false;
 
 /*
  * Replaces all occurrences of a char used with the:
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index 2fbee0b1011c..98223d952404 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -278,6 +278,8 @@ struct sort_entry {
 extern struct sort_entry sort_thread;
 extern struct list_head hist_entry__sort_list;
 
+extern bool nanosecs;
+
 struct perf_evlist;
 struct tep_handle;
 int setup_sorting(struct perf_evlist *evlist);
-- 
2.17.2



Re: [PATCH 04/17] perf data: Fail check_backup in case of error

2019-02-21 Thread Andi Kleen
On Thu, Feb 21, 2019 at 10:41:32AM +0100, Jiri Olsa wrote:
> And display the error message from removing
> the old data file:
> 
>   $ perf record ls
>   Can't remove old data: Permission denied (perf.data.old)
>   Perf session creation failed.
> 
> Not sure how to make fail the rename (after we successfully
> remove the destination file/dir) to show the message,
> anyway let's have it there.

The use of rm_rf is really scary. So I simple command line typo could turn
this into rm -rf / ? 

Please add some sanity checks to not remove multiple levels,
and only remove files which look like they are generated
by perf.

-Andi


Re: [PATCH] x86/fpu: Parse comma separated list passed in clearcpuid

2019-02-21 Thread Andi Kleen
On Thu, Feb 21, 2019 at 02:37:45PM +0100, Peter Zijlstra wrote:
> On Thu, Feb 21, 2019 at 08:12:25AM -0500, Prarit Bhargava wrote:
> > Users cannot disable multiple CPU features with the kernel parameter
> > clearcpuid=.  For example, "clearcpuid=154 clearcpuid=227" only disables
> > CPUID bit 154.
> > 
> > Previous to commit 0c2a3913d6f5 ("x86/fpu: Parse clearcpuid= as early XSAVE
> > argument") it was possible to pass multiple clearcpuid options as kernel
> > parameters using individual entries.  With the new code it isn't easy to
> > replicate exactly that behaviour but a comma separated list can be easily
> > implemented, eg) "clearcpuid=154,227"
> > 
> > Make the clearcpuid parse a comma-separated list of values instead of only
> > a single value.
> 
> So I think the feature is broken as is; because it doesn't clear the
> CPUID bits for userspace.

Usually it's enough to make the kernel stop using something. I used it many 
times for this.

People who want to affect user space usually run VMs anyways.

-Andi


Re: [PATCH v5 12/12] KVM/VMX/vPMU: support to report GLOBAL_STATUS_LBRS_FROZEN

2019-02-15 Thread Andi Kleen
On Fri, Feb 15, 2019 at 08:56:02AM +, Wang, Wei W wrote:
> On Friday, February 15, 2019 12:32 AM, Andi Kleen wrote:
> > 
> > > +static void intel_pmu_get_global_status(struct kvm_pmu *pmu,
> > > + struct msr_data *msr_info)
> > > +{
> > > + u64 guest_debugctl, freeze_lbr_bits =
> > DEBUGCTLMSR_FREEZE_LBRS_ON_PMI |
> > > +   DEBUGCTLMSR_LBR;
> > > +
> > > + if (!pmu->global_status) {
> > > + msr_info->data = 0;
> > > + return;
> > > + }
> > > +
> > > + msr_info->data = pmu->global_status;
> > > + if (pmu->version >= 4) {
> > > + guest_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL);
> > > + if ((guest_debugctl & freeze_lbr_bits) == freeze_lbr_bits)
> > 
> > It should only check for the freeze bit, the freeze bit can be set even when
> > LBRs are disabled.
> > 
> > Also you seem to set the bit unconditionally?
> > That doesn't seem right. It should only be set after an overflow.
> > 
> > So the PMI injection needs to set it.
> 
> OK. The freeze bits need to be cleared by IA32_PERF_GLOBAL_STATUS_RESET, 
> which seems not supported by the perf code yet (thus guest won't clear them). 
> Would handle_irq_v4 also need to be changed to support that?

In Arch Perfmon v4 it is  cleared by the MSR_CORE_PERF_GLOBAL_OVF_CTRL write
But the guest KVM pmu doesn't support v4 so far, so the only way to clear it is 
through DEBUGCTL.

STATUS_RESET would only be needed to set it from the guest, which is not 
necessary at least for now
(and would be also v4)

At some point the guest PMU should probably be updated for v4, but it can be 
done
separately from this.

-Andi


Re: [PATCH v5 12/12] KVM/VMX/vPMU: support to report GLOBAL_STATUS_LBRS_FROZEN

2019-02-14 Thread Andi Kleen
> +static void intel_pmu_get_global_status(struct kvm_pmu *pmu,
> + struct msr_data *msr_info)
> +{
> + u64 guest_debugctl, freeze_lbr_bits = DEBUGCTLMSR_FREEZE_LBRS_ON_PMI |
> +   DEBUGCTLMSR_LBR;
> +
> + if (!pmu->global_status) {
> + msr_info->data = 0;
> + return;
> + }
> +
> + msr_info->data = pmu->global_status;
> + if (pmu->version >= 4) {
> + guest_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL);
> + if ((guest_debugctl & freeze_lbr_bits) == freeze_lbr_bits)

It should only check for the freeze bit, the freeze bit can be set
even when LBRs are disabled.

Also you seem to set the bit unconditionally?
That doesn't seem right. It should only be set after an overflow.

So the PMI injection needs to set it.

-Andi


Re: [PATCH v5 07/12] perf/x86: no counter allocation support

2019-02-14 Thread Andi Kleen
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index 9de8780..ec97a70 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -372,7 +372,8 @@ struct perf_event_attr {
>   context_switch :  1, /* context switch data */
>   write_backward :  1, /* Write ring buffer from 
> end to beginning */
>   namespaces :  1, /* include namespaces data 
> */
> - __reserved_1   : 35;
> + no_counter :  1, /* no counter allocation */

Not sure we really want to expose this in the user ABI. Perhaps make
it a feature of the in kernel API only?

-Andi


Re: [PATCH v5 03/12] KVM/x86: KVM_CAP_X86_GUEST_LBR

2019-02-14 Thread Andi Kleen
> + case KVM_CAP_X86_GUEST_LBR:
> + r = -EINVAL;
> + if (cap->args[0] &&
> + x86_perf_get_lbr_stack(>arch.lbr_stack)) {
> + pr_err("Failed to enable the guest lbr feature\n");

Remove the pr_err. We don't want unprivileged users trigger unlimited
kernel printk.





Re: [PATCH 4.9 137/137] perf: Add support for supplementary event registers

2019-02-11 Thread Andi Kleen
On Mon, Feb 11, 2019 at 03:20:18PM +0100, Greg Kroah-Hartman wrote:
> 4.9-stable review patch.  If anyone has any objections, please let me know.
> 
> --
> 
> From: Andi Kleen 
> 
> commit a7e3ed1e470116c9d12c2f778431a481a6be8ab6 upstream.

The patch doesn't seem to match the commit log.

Did something got mixed up?

> Unfortunately this event requires programming a mask in a separate
> register. And worse this separate register is per core, not per
> CPU thread.
> 
> This patch:
> 
> - Teaches perf_events that OFFCORE_RESPONSE needs extra parameters.
>   The extra parameters are passed by user space in the
>   perf_event_attr::config1 field.
> 
> - Adds support to the Intel perf_event core to schedule per
>   core resources. This adds fairly generic infrastructure that
>   can be also used for other per core resources.
>   The basic code has is patterned after the similar AMD northbridge
>   constraints code.
> 
> Thanks to Stephane Eranian who pointed out some problems
> in the original version and suggested improvements.
> 
> Signed-off-by: Andi Kleen 
> Signed-off-by: Lin Ming 
> Signed-off-by: Peter Zijlstra 
> LKML-Reference: <1299119690-13991-2-git-send-email-ming.m@intel.com>
> Signed-off-by: Ingo Molnar 
> [ He Zhe: Fixes conflict caused by missing disable_counter_freeze which is
>  introduced since v4.20 af3bdb991a5cb. ]
> Signed-off-by: He Zhe 
> Signed-off-by: Greg Kroah-Hartman 
> 
> ---
>  arch/x86/events/intel/core.c |   10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -3235,6 +3235,11 @@ static void free_excl_cntrs(int cpu)
>  
>  static void intel_pmu_cpu_dying(int cpu)
>  {
> + fini_debug_store_on_cpu(cpu);
> +}
> +
> +static void intel_pmu_cpu_dead(int cpu)
> +{
>   struct cpu_hw_events *cpuc = _cpu(cpu_hw_events, cpu);
>   struct intel_shared_regs *pc;
>  
> @@ -3246,8 +3251,6 @@ static void intel_pmu_cpu_dying(int cpu)
>   }
>  
>   free_excl_cntrs(cpu);
> -
> - fini_debug_store_on_cpu(cpu);
>  }
>  
>  static void intel_pmu_sched_task(struct perf_event_context *ctx,
> @@ -3324,6 +3327,7 @@ static __initconst const struct x86_pmu
>   .cpu_prepare= intel_pmu_cpu_prepare,
>   .cpu_starting   = intel_pmu_cpu_starting,
>   .cpu_dying  = intel_pmu_cpu_dying,
> + .cpu_dead   = intel_pmu_cpu_dead,
>  };
>  
>  static __initconst const struct x86_pmu intel_pmu = {
> @@ -3359,6 +3363,8 @@ static __initconst const struct x86_pmu
>   .cpu_prepare= intel_pmu_cpu_prepare,
>   .cpu_starting   = intel_pmu_cpu_starting,
>   .cpu_dying  = intel_pmu_cpu_dying,
> + .cpu_dead   = intel_pmu_cpu_dead,
> +
>   .guest_get_msrs = intel_guest_get_msrs,
>   .sched_task = intel_pmu_sched_task,
>  };
> 
> 


[tip:perf/core] perf/x86/kvm: Avoid unnecessary work in guest filtering

2019-02-11 Thread tip-bot for Andi Kleen
Commit-ID:  9b545c04abd4f7246a3bde040efde587abebb23c
Gitweb: https://git.kernel.org/tip/9b545c04abd4f7246a3bde040efde587abebb23c
Author: Andi Kleen 
AuthorDate: Mon, 4 Feb 2019 14:23:30 -0800
Committer:  Ingo Molnar 
CommitDate: Mon, 11 Feb 2019 08:00:39 +0100

perf/x86/kvm: Avoid unnecessary work in guest filtering

KVM added a workaround for PEBS events leaking into guests with
commit:

  26a4f3c08de4 ("perf/x86: disable PEBS on a guest entry.")

This uses the VT entry/exit list to add an extra disable of the
PEBS_ENABLE MSR.

Intel also added a fix for this issue to microcode updates on
Haswell/Broadwell/Skylake.

It turns out using the MSR entry/exit list makes VM exits
significantly slower. The list is only needed for disabling
PEBS, because the GLOBAL_CTRL change gets optimized by
KVM into changing the VMCS.

Check for the microcode updates that have the microcode
fix for leaking PEBS, and disable the extra entry/exit list
entry for PEBS_ENABLE. In addition we always clear the
GLOBAL_CTRL for the PEBS counter while running in the guest,
which is enough to make them never fire at the wrong
side of the host/guest transition.

The overhead for VM exits with the filtering active with the patch is
reduced from 8% to 4%.

The microcode patch has already been merged into future platforms.
This patch is one-off thing. The quirks is used here.

For other old platforms which doesn't have microcode patch and quirks,
extra disable of the PEBS_ENABLE MSR is still required.

Signed-off-by: Andi Kleen 
Signed-off-by: Kan Liang 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Alexander Shishkin 
Cc: Arnaldo Carvalho de Melo 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: Vince Weaver 
Cc: b...@alien8.de
Link: 
https://lkml.kernel.org/r/1549319013-4522-2-git-send-email-kan.li...@linux.intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/events/intel/core.c | 74 ++--
 arch/x86/events/intel/ds.c   |  2 ++
 arch/x86/events/perf_event.h | 15 -
 3 files changed, 75 insertions(+), 16 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index daafb893449b..8fe2afa9c818 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "../perf_event.h"
 
@@ -3206,16 +3207,27 @@ static struct perf_guest_switch_msr 
*intel_guest_get_msrs(int *nr)
arr[0].msr = MSR_CORE_PERF_GLOBAL_CTRL;
arr[0].host = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_guest_mask;
arr[0].guest = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_host_mask;
-   /*
-* If PMU counter has PEBS enabled it is not enough to disable counter
-* on a guest entry since PEBS memory write can overshoot guest entry
-* and corrupt guest memory. Disabling PEBS solves the problem.
-*/
-   arr[1].msr = MSR_IA32_PEBS_ENABLE;
-   arr[1].host = cpuc->pebs_enabled;
-   arr[1].guest = 0;
+   if (x86_pmu.flags & PMU_FL_PEBS_ALL)
+   arr[0].guest &= ~cpuc->pebs_enabled;
+   else
+   arr[0].guest &= ~(cpuc->pebs_enabled & PEBS_COUNTER_MASK);
+   *nr = 1;
+
+   if (x86_pmu.pebs && x86_pmu.pebs_no_isolation) {
+   /*
+* If PMU counter has PEBS enabled it is not enough to
+* disable counter on a guest entry since PEBS memory
+* write can overshoot guest entry and corrupt guest
+* memory. Disabling PEBS solves the problem.
+*
+* Don't do this if the CPU already enforces it.
+*/
+   arr[1].msr = MSR_IA32_PEBS_ENABLE;
+   arr[1].host = cpuc->pebs_enabled;
+   arr[1].guest = 0;
+   *nr = 2;
+   }
 
-   *nr = 2;
return arr;
 }
 
@@ -3739,6 +3751,47 @@ static __init void intel_clovertown_quirk(void)
x86_pmu.pebs_constraints = NULL;
 }
 
+static const struct x86_cpu_desc isolation_ucodes[] = {
+   INTEL_CPU_DESC(INTEL_FAM6_HASWELL_CORE,  3, 0x001f),
+   INTEL_CPU_DESC(INTEL_FAM6_HASWELL_ULT,   1, 0x001e),
+   INTEL_CPU_DESC(INTEL_FAM6_HASWELL_GT3E,  1, 0x0015),
+   INTEL_CPU_DESC(INTEL_FAM6_HASWELL_X, 2, 0x0037),
+   INTEL_CPU_DESC(INTEL_FAM6_HASWELL_X, 4, 0x000a),
+   INTEL_CPU_DESC(INTEL_FAM6_BROADWELL_CORE,4, 0x0023),
+   INTEL_CPU_DESC(INTEL_FAM6_BROADWELL_GT3E,1, 0x0014),
+   INTEL_CPU_DESC(INTEL_FAM6_BROADWELL_XEON_D,  2, 0x0010),
+   INTEL_CPU_DESC(INTEL_FAM6_BROADWELL_XEON_D,  3, 0x0709),
+   INTEL_CPU_DESC(INTEL_FAM6_BROADWELL_XEON_D,  4, 0x0f09),
+   INTEL_CPU_DESC(INTEL_FAM6_BROADWELL_XEON_D,  5, 0

Re: [PATCH V6 2/5] perf/x86/kvm: Avoid unnecessary work in guest filtering

2019-02-04 Thread Andi Kleen
> As my understanding, the microcode version for each stepping is independent
> and irrelevant. The 0x004e should be just coincidence.
> If so, I don't think X86_STEPPING_ANY is very useful.
> 
> Andi, if I'm wrong please correct me.

Yes that's right. You cannot match microcode without stepping.

-Andi


Re: [PATCH 3.16 045/305] x86/speculation: Apply IBPB more strictly to avoid cross-process data leak

2019-02-03 Thread Andi Kleen
On Sun, Feb 03, 2019 at 08:05:53PM +0100, Jiri Kosina wrote:
> On Sun, 3 Feb 2019, Ben Hutchings wrote:
> 
> > 3.16.63-rc1 review patch.  If anyone has any objections, please let me know.
> > 
> > --
> > 
> > From: Jiri Kosina 
> > 
> > commit dbfe2953f63c640463c630746cd5d9de8b2f63ae upstream.
> 
> You really want the whole IBPB+STIBP revamp from upstream, otherwise 
> you're going to get noticeable performance penalties on some workloads 
> with some microcodes.

Yes, we would need the opt-in/opt-out support too.

Please don't merge it just as is.

-Andi


Re: [PATCH v5 00/13] x86: Enable FSGSBASE instructions

2019-02-01 Thread Andi Kleen


Patches all look good to me.

Reviewed-by: Andi Kleen 

-Andi


Re: System crash with perf_fuzzer (kernel: 5.0.0-rc3)

2019-01-31 Thread Andi Kleen
> Yeah, a loop stuck looks really scary inside an NMI handler.
> Should I just go ahead to send a patch to remove this warning?
> Or probably turn it into a pr_info()?

Not at this point. Would need to fix the PMU reset first to be
more selective. 

-Andi


Re: [PATCH V3 01/13] perf/core, x86: Add PERF_SAMPLE_DATA_PAGE_SIZE

2019-01-31 Thread Andi Kleen
> 
> Aside from all the missin {}, I'm fairly sure this is broken since this
> happens from NMI context. This can interrupt switch_mm() and things like
> use_temporary_mm().

So the concern is that the sample is from before the switch, and then
looks it up in the wrong page tables if the PMI happens after the switch
due to sampling skid?

First this can happen only with PEBS, which doesn't have that
bad worst case skid (perhaps tens of cycles)

I doubt it is very likely because this problem could only happen
for user addresses because kernel page tables don't change. 

But we would be in the middle of the context 
switch (or use_temporary_mm) here, and there should be no
user space accesses within a tens of cycle window 
(except perhaps for the rseq address, but that's not a very
interesting case)

I assume the use_temporary_mm() cases are similar.

I suppose we could enforce flushing the PMU on such context switches,
but I would suspect while it's a valid theoretical problem, 
it's unlikely to be a real problem in practice.

Likely it means that large buffer PEBS cannot be ever used
with this option, but I guess that's ok.

-Andi



Re: System crash with perf_fuzzer (kernel: 5.0.0-rc3)

2019-01-31 Thread Andi Kleen
On Thu, Jan 31, 2019 at 01:28:34PM +0530, Ravi Bangoria wrote:
> Hi Andi,
> 
> On 1/25/19 9:30 PM, Andi Kleen wrote:
> >> [Fri Jan 25 10:28:53 2019] perf: interrupt took too long (2501 > 2500), 
> >> lowering kernel.perf_event_max_sample_rate to 79750
> >> [Fri Jan 25 10:29:08 2019] perf: interrupt took too long (3136 > 3126), 
> >> lowering kernel.perf_event_max_sample_rate to 63750
> >> [Fri Jan 25 10:29:11 2019] perf: interrupt took too long (4140 > 3920), 
> >> lowering kernel.perf_event_max_sample_rate to 48250
> >> [Fri Jan 25 10:29:11 2019] perf: interrupt took too long (5231 > 5175), 
> >> lowering kernel.perf_event_max_sample_rate to 38000
> >> [Fri Jan 25 10:29:11 2019] perf: interrupt took too long (6736 > 6538), 
> >> lowering kernel.perf_event_max_sample_rate to 29500
> > 
> > These are fairly normal.
> 
> I understand that throttling mechanism is designed exactly to do this.
> But I've observed that, everytime I run the fuzzer, max_sample_rates is
> been throttled down to 250 (which is CONFIG_HZ I guess). Doesn't this
> mean the interrupt time is somehow increasing gradually? Is that fine?

It's more like the throttling mechanism is an controller
and it takes multiple tries to zoom in on the truely
needed value.

You can measure the PMI time by enabling the nmi:nmi_handler
trace point. It directly reports it. From what I've seen
it's a long tail distribution with regular large outliers.
Most of the PMIs are not that slow, just an occassional
few are.

When I did some investigation on this a couple years back
the outliers were either due to call stack processing,
or due to flushing the perf ring buffer. There were some
fixes on the the call stack case back then, but I'm sure more could
be done.

For the call stack processing there isn't much more we can do I think
(other than switching to call stack LBR only),
but I suspect the buffer flushing problem could be improved more.

It's relatively easy to investigate with a variant of the ftrace
recipe I posted earlier (but you need to fix the Makefile first
to enable ftrace for all of perf) Just add a ftrace trigger on the
nmi_handler trace point to stop tracing when the nmi_handler
time exceeds a threshold and look at the traces.

-Andi



Re: System crash with perf_fuzzer (kernel: 5.0.0-rc3)

2019-01-30 Thread Andi Kleen
Jiri Olsa  writes:
>
> the patch adds check_eriod pmu callback.. I need to check if there's
> better way to do this, but so far it fixes the crash for me
>
> if you guys could check this patch, that'd be great

There's already a limit_period callback, perhaps that could
be extended. But ok, can do it this way too.

I suspect there are some other cases that need this callback, not
just BTS, e.g. the checks in hsw_hw_config


-Andi


Re: [PATCH v2 0/4] perf: enable compression of record mode trace to save storage space

2019-01-29 Thread Andi Kleen
On Tue, Jan 29, 2019 at 11:45:43AM +0100, Arnaldo Carvalho de Melo wrote:
> Em Mon, Jan 28, 2019 at 09:40:28AM +0300, Alexey Budankov escreveu:
> > The patch set implements runtime trace compression for record mode and 
> > trace file decompression for report mode. Zstandard API [1] is used for 
> > compression/decompression of data that come from perf_events kernel 
> 
> Interesting, wasn't aware of this zstd library, I wonder if we can add
> it and switch the other compression libraries we link against, so that
> we're not adding one more library to the dep list of perf but removing
> some instead, do you think this would be possible?

Sadly we can't because perf needs the existing libraries to uncompress
modules, and they might be compressed with zlib or bzip2. zstd cannot
uncompress those formats.

-Andi


Re: [PATCH] perf vendor events intel: Fix Load_Miss_Real_Latency on CLX

2019-01-29 Thread Andi Kleen
On Tue, Jan 29, 2019 at 12:05:36PM -0500, William Cohen wrote:
> Fix incorrect event names for the Load_Miss_Real_Latency metric for
> Cascadelake server in the same manner as commit 91b2b97025 for SKL/SKX.

Reviewed-by: Andi Kleen 

> 
> Signed-off-by: William Cohen 
> ---
>  tools/perf/pmu-events/arch/x86/cascadelakex/clx-metrics.json | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/perf/pmu-events/arch/x86/cascadelakex/clx-metrics.json 
> b/tools/perf/pmu-events/arch/x86/cascadelakex/clx-metrics.json
> index 36c903faed0b..71e9737f4614 100644
> --- a/tools/perf/pmu-events/arch/x86/cascadelakex/clx-metrics.json
> +++ b/tools/perf/pmu-events/arch/x86/cascadelakex/clx-metrics.json
> @@ -73,7 +73,7 @@
>  },
>  {
>  "BriefDescription": "Actual Average Latency for L1 data-cache miss 
> demand loads",
> -"MetricExpr": "L1D_PEND_MISS.PENDING / ( MEM_LOAD_RETIRED.L1_MISS_PS 
> + MEM_LOAD_RETIRED.FB_HIT_PS )",
> +"MetricExpr": "L1D_PEND_MISS.PENDING / ( MEM_LOAD_RETIRED.L1_MISS + 
> MEM_LOAD_RETIRED.FB_HIT )",
>  "MetricGroup": "Memory_Bound;Memory_Lat",
>  "MetricName": "Load_Miss_Real_Latency"
>  },
> -- 
> 2.20.1
> 


Re: [RFC] Don't print sample_type bits in non-group events not set in the group's was Re: [PATCH] perf, script: Fix crash with printing mixed trace point and other events

2019-01-28 Thread Andi Kleen
> > also now it won't make sample for slave events
> > with zero value/period read
> > 
> > note the patch needs to be split into more patches,
> > sending it all together for discussion over the solution
> 
> any feedback on this one?

Looks good to me.

Reviewed-by: Andi Kleen 
-Andi


Re: System crash with perf_fuzzer (kernel: 5.0.0-rc3)

2019-01-25 Thread Andi Kleen
> [Fri Jan 25 10:28:53 2019] perf: interrupt took too long (2501 > 2500), 
> lowering kernel.perf_event_max_sample_rate to 79750
> [Fri Jan 25 10:29:08 2019] perf: interrupt took too long (3136 > 3126), 
> lowering kernel.perf_event_max_sample_rate to 63750
> [Fri Jan 25 10:29:11 2019] perf: interrupt took too long (4140 > 3920), 
> lowering kernel.perf_event_max_sample_rate to 48250
> [Fri Jan 25 10:29:11 2019] perf: interrupt took too long (5231 > 5175), 
> lowering kernel.perf_event_max_sample_rate to 38000
> [Fri Jan 25 10:29:11 2019] perf: interrupt took too long (6736 > 6538), 
> lowering kernel.perf_event_max_sample_rate to 29500

These are fairly normal.

> [Fri Jan 25 10:32:44 2019] [ cut here ]
> [Fri Jan 25 10:32:44 2019] perfevents: irq loop stuck!

I believe it's always possible to cause an irq loop. This happens when
the PMU is programmed to cause PMIs on multiple counters 
too quickly. Maybe should just recover from it without printing such
scary messages.

Right now the scary message is justified because it resets the complete
PMU. Perhaps need to be a bit more selective resetting on only
the events that loop.

> [Fri Jan 25 10:32:44 2019] WARNING: CPU: 1 PID: 0 at 
> arch/x86/events/intel/core.c:2440 intel_pmu_handle_irq+0x158/0x170

This looks independent.

I would apply the following patch (cut'n'pasted, so may need manual apply) 
and then run with

cd /sys/kernel/debug/tracing
echo 5 > buffer_size_kb
echo default_do_nmi > set_graph_function 
echo 1 > events/msr/enable 
echo 'msr != 0xc100 && msr != 0x6e0' > events/msr/write_msr/filter
echo function_graph > current_tracer 
echo printk:traceoff > set_ftrace_filter
echo 1 > tracing_on

and then collect the trace from /sys/kernel/debug/tracing/trace
after the oops.  This should show the context of when it happens.

diff --git a/kernel/events/Makefile b/kernel/events/Makefile
index 3c022e33c109..8afc997110e0 100644
--- a/kernel/events/Makefile
+++ b/kernel/events/Makefile
@@ -1,7 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0
-ifdef CONFIG_FUNCTION_TRACER
-CFLAGS_REMOVE_core.o = $(CC_FLAGS_FTRACE)
-endif
 
 obj-y := core.o ring_buffer.o callchain.o
 


Re: [LSF/MM TOPIC] Page flags, can we free up space ?

2019-01-22 Thread Andi Kleen
Jerome Glisse  writes:
>
> Right now this is more a temptative ie i do not know if i will succeed,
> in any case i can report on failure or success and discuss my finding to
> get people opinions on the matter.

I would just stop putting node/zone number into the flags. These
could be all handled with a small perfect hash table, like the original
x86_64 port did, which should be quite cheap to look up.
Then there should be enough bits for everyone again.

-Andi


Re: [PATCH 10/12] perf script: Add support for PERF_SAMPLE_CODE_PAGE_SIZE

2019-01-22 Thread Andi Kleen
> + PERF_OUTPUT_CODE_PAGE_SIZE  = 1UL << 32,

That won't work on 32bit. You need 1ULL

Also might want to audit that noone puts these flags into 
an int.

-Andi


[tip:perf/urgent] perf script: Fix crash with printing mixed trace point and other events

2019-01-22 Thread tip-bot for Andi Kleen
Commit-ID:  96167167b6e17b25c0e05ecc31119b73baeab094
Gitweb: https://git.kernel.org/tip/96167167b6e17b25c0e05ecc31119b73baeab094
Author: Andi Kleen 
AuthorDate: Thu, 17 Jan 2019 11:48:34 -0800
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Fri, 18 Jan 2019 09:53:07 -0300

perf script: Fix crash with printing mixed trace point and other events

'perf script' crashes currently when printing mixed trace points and
other events because the trace format does not handle events without
trace meta data. Add a simple check to avoid that.

  % cat > test.c
  main()
  {
  printf("Hello world\n");
  }
  ^D
  % gcc -g -o test test.c
  % sudo perf probe -x test 'test.c:3'
  % perf record -e '{cpu/cpu-cycles,period=1/,probe_test:main}:S' ./test
  % perf script
  

Committer testing:

Before:

  # perf probe -x /lib64/libc-2.28.so malloc
  Added new event:
probe_libc:malloc(on malloc in /usr/lib64/libc-2.28.so)

  You can now use it in all perf tools, such as:

perf record -e probe_libc:malloc -aR sleep 1

  # perf probe -l
  probe_libc:malloc(on __libc_malloc@malloc/malloc.c in 
/usr/lib64/libc-2.28.so)
  # perf record -e '{cpu/cpu-cycles,period=1/,probe_libc:*}:S' sleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.023 MB perf.data (40 samples) ]
  # perf script
  Segmentation fault (core dumped)
  ^C
  #

After:

  # perf script | head -6
 sleep 2888 94796.944981: 16198 cpu/cpu-cycles,period=1/: 
925dc04f get_random_u32+0x1f (/lib/modules/5.0.0-rc2+/build/vmlinux)
 sleep 2888 [-01] 94796.944981: probe_libc:malloc:
 sleep 2888 94796.944983:  4713 cpu/cpu-cycles,period=1/: 
922763af change_protection+0xcf (/lib/modules/5.0.0-rc2+/build/vmlinux)
 sleep 2888 [-01] 94796.944983: probe_libc:malloc:
 sleep 2888 94796.944986:  9934 cpu/cpu-cycles,period=1/: 
922777e0 move_page_tables+0x0 (/lib/modules/5.0.0-rc2+/build/vmlinux)
 sleep 2888 [-01] 94796.944986: probe_libc:malloc:
  #

Signed-off-by: Andi Kleen 
Tested-by: Arnaldo Carvalho de Melo 
Acked-by: Jiri Olsa 
Link: http://lkml.kernel.org/r/20190117194834.21940-1-a...@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/builtin-script.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index d079f36d342d..357906ed1898 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -1794,7 +1794,7 @@ static void process_event(struct perf_script *script,
return;
}
 
-   if (PRINT_FIELD(TRACE)) {
+   if (PRINT_FIELD(TRACE) && sample->raw_data) {
event_format__fprintf(evsel->tp_format, sample->cpu,
  sample->raw_data, sample->raw_size, fp);
}


Re: [RFC] x86/speculation: add L1 Terminal Fault / Foreshadow demo

2019-01-21 Thread Andi Kleen
> + /* Check the start address: needs to be page-aligned.. */
> +-if (start & ~PAGE_MASK)
> ++if (start & ~PAGE_MASK) {
> ++
> ++/*
> ++ * XXX Hack
> ++ *
> ++ * We re-use this error case to show case a cache load gadget:
> ++ * There is a mispredicted branch, which leads to prefetching
> ++ * the cache with attacker controlled data.
> ++ */
> ++asm volatile (

Obviously that can never be added to a standard kernel.

And I don't see much point in shipping test cases that require non
standard kernel patching. The idea of shipping test cases is that
you can easily test them, but in this form it can't.

Also even without that problem, not sure what benefit including such a thing 
would have.

If you want to improve regression test coverage, it would be far better to have
test cases which do more directed unit testing against specific software 
parts of the mitigation.

For example some automated testing that the host page tables are inverted as
expected for different scenarios. I checked that manually during development,
but something automated would be great as a regression test. It would
need some way to translate VA->PA in user space.

Or have some tests that run test cases with PT or the MSR tracer with
a guest and automatically check that the MSR writes for VM entries are in
the right location.

-Andi



Re: [RFC] Don't print sample_type bits in non-group events not set in the group's was Re: [PATCH] perf, script: Fix crash with printing mixed trace point and other events

2019-01-18 Thread Andi Kleen
> +static bool perf_evsel__should_skip(struct perf_evsel *evsel)
> +{
> + struct perf_event_attr *attr = >attr;
> + struct perf_evsel *leader = evsel->leader;
> +
> + return (leader != evsel) && !attr->freq && !attr->sample_freq;
> +}
> +
>  static int process_sample_event(struct perf_tool *tool,
>   union perf_event *event,
>   struct perf_sample *sample,
> @@ -1934,6 +1942,9 @@ static int process_sample_event(struct perf_tool *tool,
>   struct perf_script *scr = container_of(tool, struct perf_script, tool);
>   struct addr_location al;
>  
> + if (perf_evsel__should_skip(evsel))
> + return 0;

That just skips, but surely it has to be displayed somewhere?

-Andi


Re: [PATCH] perf/core: fix perf_proc_update_handler() bug

2019-01-17 Thread Andi Kleen
On Thu, Jan 17, 2019 at 01:03:20PM -0800, Stephane Eranian wrote:
> On Thu, Jan 10, 2019 at 5:17 PM Stephane Eranian  wrote:
> >
> > The perf_proc_update_handler() handles 
> > /proc/sys/kernel/perf_event_max_sample_rate
> > syctl variable.  When the PMU IRQ handler timing monitoring is disabled, 
> > i.e,
> > when /proc/sys/kernel/perf_cpu_time_max_percent is equal to 0 or 100,
> > then no modification to sysctl_perf_event_sample_rate is allowed to prevent
> > possible hang from wrong values.
> >
> > The problem is that the test to prevent modification is made after the
> > sysctl variable is modified in perf_proc_update_handler().
> >
> > You get an error:
> >
> > $ echo 10001 >/proc/sys/kernel/perf_event_max_sample_rate
> > echo: write error: invalid argument
> >
> > But the value is still modified causing all sorts of inconsistencies:
> >
> > $ cat /proc/sys/kernel/perf_event_max_sample_rate
> > 10001
> >
> > This patch fixes the problem by moving the parsing of the value after
> > the test.
> >
> > Signed-off-by: Stephane Eranian 
> 
> Ping.

Reviewed-by: Andi Kleen 

-Andi


[PATCH] perf, script: Fix crash with printing mixed trace point and other events

2019-01-17 Thread Andi Kleen
From: Andi Kleen 

perf script crashes currently when printing mixed trace points and other
events because the trace format does not handle events without trace
meta data. Add a simple check to avoid that.

% cat > test.c
main()
{
printf("Hello world\n");
}
^D
% gcc -g -o test test.c
% sudo perf probe -x test 'test.c:3'
% perf record -e '{cpu/cpu-cycles,period=1/,probe_test:main}:S' ./test
% perf script


Signed-off-by: Andi Kleen 
---
 tools/perf/builtin-script.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 3728b50e52e2..ea2396b33a57 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -1779,7 +1779,7 @@ static void process_event(struct perf_script *script,
return;
}
 
-   if (PRINT_FIELD(TRACE)) {
+   if (PRINT_FIELD(TRACE) && sample->raw_data) {
event_format__fprintf(evsel->tp_format, sample->cpu,
  sample->raw_data, sample->raw_size, fp);
}
-- 
2.17.2



Re: [PATCH v4 04/13] x86/fsgsbase/64: Add intrinsics/macros for FSGSBASE instructions

2019-01-16 Thread Andi Kleen
> +#ifdef CONFIG_X86_64
> +
> +#include 
> +
> +.macro RDGSBASE opd

The caller can now use the assembler instructions directly, so the macros
are not needed anymore.

-Andi


Re: [RFC v2 0/6] x86: dynamic indirect branch promotion

2019-01-08 Thread Andi Kleen
> BTW: I am not sure that static-keys are much better. Their change also
> affects the control flow, and they do affect the control flow.

Static keys have the same problem, but they only change infrequently so usually
it's not too big a problem if you dump the kernel close to the tracing 
sessions.

simple-pt doesn't have a similar mechanism, so it suffers more from it.

-Andi


Re: [RFC v2 0/6] x86: dynamic indirect branch promotion

2019-01-08 Thread Andi Kleen
On Tue, Jan 08, 2019 at 11:10:58AM +0100, Peter Zijlstra wrote:
> On Tue, Jan 08, 2019 at 12:01:11PM +0200, Adrian Hunter wrote:
> > The problem is that the jitted code gets freed from memory, which is why I
> > suggested the ability to pin it for a while.
> 
> Then what do you tell the guy that keeps PT running for a day and runs
> out of memory because he likes to JIT a lot?

It only would need to be saved until the next kcore dump, so they would
need to do regular kcore dumps, after each of which the JIT code could be freed.

In a sense it would be like RCU for code.

You would somehow need to tell the kernel when that happens though
so it can schedule the frees.

It doesn't work when the code is modified in place, like the
patch in the $SUBJECT.

-Andi


Re: [PATCH v4 05/10] KVM/x86: expose MSR_IA32_PERF_CAPABILITIES to the guest

2019-01-07 Thread Andi Kleen
On Mon, Jan 07, 2019 at 10:48:38AM -0800, Jim Mattson wrote:
> On Mon, Jan 7, 2019 at 10:20 AM Andi Kleen  wrote:
> >
> > > The issue is compatibility. Prior to your change, reading this MSR
> > > from a VM would raise #GP. After your change, it won't. That means
> > > that if you have a VM migrating between hosts with kernel versions
> > > before and after this change, the results will be inconsistent. In the
> >
> > No it will not be. All Linux kernel uses of this MSR are guarded
> > by a CPUID check.
> 
> Linux usage is irrelevant to the architected behavior of the virtual
> CPU. According to volume 4 of the SDM, this MSR is only supported when
> CPUID.01H:ECX.PDCM [bit 15] is set. Therefore, kvm should raise #GP
> whenever a guest tries to read this MSR and the guest's
> CPUID.01H:ECX.PDCM [bit 15] is clear.

That's not general practice in KVM. There are lots of MSRs
that don't fault even if their CPUID doesn't support them.

It's also not generally true for instructions, where it is even
impossible.

You could argue it should be done for MSRs, but that would
be a much larger patch for lots of MSRs. It seems pointless
to single out this particular one.

In practice I doubt it will make any difference for
real software either way.

-Andi



Re: [PATCH v4 05/10] KVM/x86: expose MSR_IA32_PERF_CAPABILITIES to the guest

2019-01-07 Thread Andi Kleen
> The issue is compatibility. Prior to your change, reading this MSR
> from a VM would raise #GP. After your change, it won't. That means
> that if you have a VM migrating between hosts with kernel versions
> before and after this change, the results will be inconsistent. In the

No it will not be. All Linux kernel uses of this MSR are guarded
by a CPUID check.

-Andi


Re: [RFC v2 1/6] x86: introduce kernel restartable sequence

2019-01-03 Thread Andi Kleen
> Thanks for the explanations. I don’t think it would work (e.g., IRQs). I can
> avoid generalizing and just detect the "magic sequence” of the code, but let
> me give it some more thought.

If you ask me I would just use compiler profile feedback or autofdo (if your
compiler has a working version)

The compiler can do a much better job at optimizing this than you ever could.

Manual FDO needs some kernel patching though.

-Andi



Re: [RFC v2 1/6] x86: introduce kernel restartable sequence

2019-01-03 Thread Andi Kleen
> Ok… I’ll try to think about another solution. Just note that this is just
> used as a hint to avoid unnecessary lookups. (IOW, nothing will break if the
> prefix is used.)

Are you sure actually? 

The empty prefix could mean 8bit register accesses.

> > You're doing the equivalent of patching a private system call
> > into your own kernel without working with upstream, don't do that.
> 
> I don’t understand this comment though. Can you please explain?

Instruction encoding = system call ABI
Upstream = CPU vendors

Early in Linux's history, naive Linux distribution vendors patched in their own
private system calls without waiting for upstream to define an ABI, which caused
endless compatibility problems. These days this is very frowned upon.

> > Better to find some other solution to do the restart.
> > How about simply using a per cpu variable? That should be cheaper
> > anyways.
> 
> The problem is that the per-cpu variable needs to be updated after the call
> is executed, when we are already not in the context of the “injected” code.
> I can increase it before the call, and decrease it after return - but this
> can create (in theory) long periods in which the code is “unpatchable”,
> increase the code size and slow performance.
> 
> Anyhow, I’ll give more thought. Ideas are welcomed.

Write the address of the instruction into the per cpu variable.

-Andi



Re: [RFC v2 1/6] x86: introduce kernel restartable sequence

2019-01-03 Thread Andi Kleen
Nadav Amit  writes:

I see another poor man's attempt to reinvent TSX.

> It is sometimes beneficial to have a restartable sequence - very few
> instructions which if they are preempted jump to a predefined point.
>
> To provide such functionality on x86-64, we use an empty REX-prefix
> (opcode 0x40) as an indication for instruction in such a sequence. Before
> calling the schedule IRQ routine, if the "magic" prefix is found, we
> call a routine to adjust the instruction pointer.  It is expected that
> this opcode is not in common use.

You cannot just assume something like that. x86 is a constantly
evolving architecture. The prefix might well have meaning at
some point.

Before doing something like that you would need to ask the CPU
vendors to reserve the sequence you're using for software use.

You're doing the equivalent of patching a private system call
into your own kernel without working with upstream, don't do that.

Better to find some other solution to do the restart.
How about simply using a per cpu variable? That should be cheaper
anyways.

-Andi


Re: [RFC v2 0/6] x86: dynamic indirect branch promotion

2019-01-03 Thread Andi Kleen
Nadav Amit  writes:
>
> - Do we use periodic learning or not? Josh suggested to reconfigure the
>   branches whenever a new target is found. However, I do not know at
>   this time how to do learning efficiently, without making learning much
>   more expensive.

FWIW frequent patching will likely completely break perf Processor Trace
decoding, which needs a somewhat stable kernel text image to decode the
traces generated by the CPU. Right now it relies on kcore dumped after
the trace usually being stable because jumplabel changes happen only
infrequently. But if you start patching frequently this assumption will
break.

You would either need a way to turn this off, or provide
updates for every change to the trace, so that the decoder can
keep track.

-Andi


Re: [PATCH v4 04/10] KVM/x86: intel_pmu_lbr_enable

2019-01-03 Thread Andi Kleen
> Yes, but then what happens?
> 
> Fast forward to, say, 2021. You're decommissioning all Broadwell
> servers in your data center. You have to migrate the running VMs off
> of those Broadwell systems onto newer hardware. But, with the current
> implementation, the migration cannot happen. So, what do you do? I
> suppose you just never enable the feature in the first place. Right?

There would in theory ways to handle this.

LBRs are normally not required for running correctly, they're not like
instructions, so it would be actually quite reasonable to just return 0 
and ignore writes for incompatible machines for them. Your profilers might
return some bogus data, but that is normally acceptable.

In some cases it might be also possible to emulate LBRs of
different types, but I don't think it would be worth the effort.
 
Yes, but the patches don't do this, so right now you should only enable it when
all systems in your migration pool have compatible LBRs.

-Andi


[tip:perf/urgent] perf script: Fix LBR skid dump problems in brstackinsn

2019-01-03 Thread tip-bot for Andi Kleen
Commit-ID:  61f611593f2c90547cb09c0bf6977414454a27e6
Gitweb: https://git.kernel.org/tip/61f611593f2c90547cb09c0bf6977414454a27e6
Author: Andi Kleen 
AuthorDate: Mon, 19 Nov 2018 21:06:17 -0800
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Fri, 28 Dec 2018 16:33:02 -0300

perf script: Fix LBR skid dump problems in brstackinsn

This is a fix for another instance of the skid problem Milian recently
found [1]

The LBRs don't freeze at the exact same time as the PMI is triggered.
The perf script brstackinsn code that dumps LBR assembler assumes that
the last branch in the LBR leads to the sample point.  But with skid
it's possible that the CPU executes one or more branches before the
sample, but which do not appear in the LBR.

What happens then is either that the sample point is before the last LBR
branch. In this case the dumper sees a negative length and ignores it.
Or it the sample point is long after the last branch. Then the dumper
sees a very long block and dumps it upto its block limit (16k bytes),
which is noise in the output.

On typical sample session this can happen regularly.

This patch tries to detect and handle the situation. On the last block
that is dumped by the LBR dumper we always stop on the first branch. If
the block length is negative just scan forward to the first branch.
Otherwise scan until a branch is found.

The PT decoder already has a function that uses the instruction decoder
to detect branches, so we can just reuse it here.

Then when a terminating branch is found print an indication and stop
dumping. This might miss a few instructions, but at least shows no
runaway blocks.

Signed-off-by: Andi Kleen 
Acked-by: Adrian Hunter 
Cc: Jiri Olsa 
Cc: Milian Wolff 
Link: http://lkml.kernel.org/r/20181120050617.4119-1-a...@firstfloor.org
[ Resolved conflict with dd2e18e9ac20 ("perf tools: Support 'srccode' output") ]
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/builtin-script.c | 17 -
 tools/perf/util/dump-insn.c |  8 
 tools/perf/util/dump-insn.h |  2 ++
 .../perf/util/intel-pt-decoder/intel-pt-insn-decoder.c  |  8 
 4 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 3728b50e52e2..88d52ed85ffc 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -1073,9 +1073,18 @@ static int perf_sample__fprintf_brstackinsn(struct 
perf_sample *sample,
 
/*
 * Print final block upto sample
+*
+* Due to pipeline delays the LBRs might be missing a branch
+* or two, which can result in very large or negative blocks
+* between final branch and sample. When this happens just
+* continue walking after the last TO until we hit a branch.
 */
start = br->entries[0].to;
end = sample->ip;
+   if (end < start) {
+   /* Missing jump. Scan 128 bytes for the next branch */
+   end = start + 128;
+   }
len = grab_bb(buffer, start, end, machine, thread, , 
, true);
printed += ip__fprintf_sym(start, thread, x.cpumode, x.cpu, , 
attr, fp);
if (len <= 0) {
@@ -1084,7 +1093,6 @@ static int perf_sample__fprintf_brstackinsn(struct 
perf_sample *sample,
  machine, thread, , , false);
if (len <= 0)
goto out;
-
printed += fprintf(fp, "\t%016" PRIx64 "\t%s\n", sample->ip,
dump_insn(, sample->ip, buffer, len, NULL));
if (PRINT_FIELD(SRCCODE))
@@ -1096,6 +1104,13 @@ static int perf_sample__fprintf_brstackinsn(struct 
perf_sample *sample,
   dump_insn(, start + off, buffer + off, len 
- off, ));
if (ilen == 0)
break;
+   if (arch_is_branch(buffer + off, len - off, x.is64bit) && start 
+ off != sample->ip) {
+   /*
+* Hit a missing branch. Just stop.
+*/
+   printed += fprintf(fp, "\t... not reaching sample 
...\n");
+   break;
+   }
if (PRINT_FIELD(SRCCODE))
print_srccode(thread, x.cpumode, start + off);
}
diff --git a/tools/perf/util/dump-insn.c b/tools/perf/util/dump-insn.c
index 10988d3de7ce..2bd8585db93c 100644
--- a/tools/perf/util/dump-insn.c
+++ b/tools/perf/util/dump-insn.c
@@ -13,3 +13,11 @@ const char *dump_insn(struct perf_insn *x __maybe_unused,
*lenp = 0;
return "?";
 }
+
+__weak
+int arch_is_branch(const unsigned char *buf __maybe_unused,
+  size_t len __maybe_unused,
+  int x86_64 __maybe_unused)
+{
+   return 0;
+}
diff --git a/tools/pe

Re: [PATCH v4 10/10] KVM/x86/lbr: lazy save the guest lbr stack

2018-12-28 Thread Andi Kleen
On Fri, Dec 28, 2018 at 11:47:06AM +0800, Wei Wang wrote:
> On 12/28/2018 04:51 AM, Andi Kleen wrote:
> > Thanks. This looks a lot better than the earlier versions.
> > 
> > Some more comments.
> > 
> > On Wed, Dec 26, 2018 at 05:25:38PM +0800, Wei Wang wrote:
> > > When the vCPU is scheduled in:
> > > - if the lbr feature was used in the last vCPU time slice, set the lbr
> > >stack to be interceptible, so that the host can capture whether the
> > >lbr feature will be used in this time slice;
> > > - if the lbr feature wasn't used in the last vCPU time slice, disable
> > >the vCPU support of the guest lbr switching.
> > time slice is the time from exit to exit?
> 
> It's the vCPU thread time slice (e.g. 100ms).

I don't think the time slices are that long, but ok.

> 
> > 
> > This might be rather short in some cases if the workload does a lot of exits
> > (which I would expect PMU workloads to do) Would be better to use some
> > explicit time check, or at least N exits.
> 
> Did you mean further increasing the lazy time to multiple host thread
> scheduling time slices?
> What would be a good value for "N"?

I'm not sure -- i think the goal would be to find a value that optimizes
performance (or rather minimizes overhead). But perhaps if it's as you say the
scheduler time slice it might be good enough as it is.

I guess it could be tuned later based on more experneice.

> > or partially cleared. This would be user visible.
> > 
> > In theory could try to detect if the guest is inside a PMI and
> > save/restore then, but that would likely be complicated. I would
> > save/restore for all cases.
> 
> Yes, it is easier to save for all the cases. But curious for the
> non-callstack
> mode, it's just ponit sampling functions (kind of speculative in some
> degree).
> Would rarely losing a few recordings important in that case?

In principle no for statistical samples, but I know some tools complain
for bogus samples (e.g. autofdo will). Also with perf report --branch-history 
it will
be definitely visible. I think it's easier to always safe now than to
handle the user complaints about this later.


-Andi


Re: [PATCH v4 10/10] KVM/x86/lbr: lazy save the guest lbr stack II

2018-12-27 Thread Andi Kleen


Actually forgot one case.

In Arch Perfmon v4 the LBR freezing is also controlled through a GLOBAL_CTRL 
bit.
I didn't see any code handling that bit?


-Andi


Re: [PATCH v4 10/10] KVM/x86/lbr: lazy save the guest lbr stack

2018-12-27 Thread Andi Kleen


Thanks. This looks a lot better than the earlier versions.

Some more comments.

On Wed, Dec 26, 2018 at 05:25:38PM +0800, Wei Wang wrote:
> When the vCPU is scheduled in:
> - if the lbr feature was used in the last vCPU time slice, set the lbr
>   stack to be interceptible, so that the host can capture whether the
>   lbr feature will be used in this time slice;
> - if the lbr feature wasn't used in the last vCPU time slice, disable
>   the vCPU support of the guest lbr switching.

time slice is the time from exit to exit?

This might be rather short in some cases if the workload does a lot of exits
(which I would expect PMU workloads to do) Would be better to use some
explicit time check, or at least N exits.

> 
> Upon the first access to one of the lbr related MSRs (since the vCPU was
> scheduled in):
> - record that the guest has used the lbr;
> - create a host perf event to help save/restore the guest lbr stack if
>   the guest uses the user callstack mode lbr stack;

This is a bit risky. It would be safer (but also more expensive)
to always safe even for any guest LBR use independent of callstack.

Otherwise we might get into a situation where
a vCPU context switch inside the guest PMI will clear the LBRs
before they can be read in the PMI, so some LBR samples will be fully
or partially cleared. This would be user visible.

In theory could try to detect if the guest is inside a PMI and
save/restore then, but that would likely be complicated. I would
save/restore for all cases.

> +static void
> +__always_inline vmx_set_intercept_for_msr(unsigned long *msr_bitmap, u32 msr,
> +   int type, bool value);

__always_inline should only be used if it's needed for functionality,
or in a header.

The rest looks good.

-Andi



Re: [PATCH v2] x86, kbuild: revert macrofying inline assembly code

2018-12-21 Thread Andi Kleen
Masahiro Yamada  writes:

> Revert the following 9 commits:

FWIW the -Wa additional also broke LTO builds because it doesn't really
support -Wa for individual files.

So I'm glad they got reverted.

-Andi


Re: [PATCH 0/7] ARM: hacks for link-time optimization

2018-12-21 Thread Andi Kleen
> In particular turning an address-dependency into a control-dependency,
> which is something allowed by the C language, since it doesn't recognise
> these concepts as such.
> 
> The 'optimization' is allowed currently, but LTO will make it much more
> likely since it will have a much wider view of things. Esp. when combined
> with PGO.
> 
> Specifically; if you have something like:
> 
> int idx;
> struct object objs[2];
> 
> the statement:
> 
>   val = objs[idx & 1].ponies;
> 
> which you 'need' to be translated like:
> 
>   struct object *obj = objs;
>   obj += (idx & 1);
>   val = obj->ponies;
> 
> Such that the load of obj->ponies depends on the load of idx. However
> our dear compiler is allowed to make it:
> 
>   if (idx & 1)
> obj = [1];
>   else
> obj = [0];
> 
>   val = obj->ponies;

I don't see why a compiler would do such an optimization. Clearly
the second variant is worse than the first, bigger and needs
branch prediction resources.

In fact compilers usually try hard to go into the other direction
and apply if conversion.

Has anyone seen real world examples of such changes being done, or is this
all language lawyering theory?

-Andi

> 
> Because C doesn't recognise this as being different. However this is
> utterly broken, because in this translation we can speculate the load
> of obj->ponies such that it no longer depends on the load of idx, which
> breaks RCU.
> 
> Note that further 'optimization' is possible and the compiler could even
> make it:
> 
>   if (idx & 1)
> val = objs[1].ponies;
>   else
> val = objs[0].ponies;
> 
> Now, granted, this is a fairly artificial example, but it does
> illustrate the exact problem.
> 
> The more the compiler can see of the complete program, the more likely
> it can make inferrences like this, esp. when coupled with PGO.
> 
> Now, we're (usually) very careful to wrap things in READ_ONCE() and
> rcu_dereference() and the like, which makes it harder on the compiler
> (because 'volatile' is special), but nothing really stops it from doing
> this.
> 
> Paul has been trying to beat clue into the language people, but given
> he's been at it for 10 years now, and there's no resolution, I figure we
> ought to get compiler implementations to give us a knob.


Re: [RFC PATCH] x86/speculation: Don't inherit TIF_SSBD on execve()

2018-12-19 Thread Andi Kleen
> You can always force disable SSB. In that case, all the child processes
> will have SSBD on.

Okay that sounds reasonable, given the below. Thanks.

-Andi

> 
> >
> > Do you have a real use case where this behavior is a problem?
> >
> > -Andi
> 
> Yes, we have an enterprise application partner that found that their
> application slow down up to 10-20% depending on how their application
> was set up. With the slow setup, the application was spawned by Java
> processes causing the SSBD bit to stay on when the application was running.


Re: [PATCH] checkpatch.pl: Improve WARNING on Kconfig help

2018-12-19 Thread Andi Kleen
>"expecting a 'help' section of 
> $min_conf_desc_length or more lines\n" . $herecurr);
> or maybe
>"please write a paragraph that describes 
> the config symbol fully ($min_conf_desc_length or more lines)\n" . $herecurr);
> 
> Andi?
> You are the originator of this text.
> Do you have an opinion?

Either change is fine for me.

-Andi

> 
> 


Re: [RFC PATCH] x86/speculation: Don't inherit TIF_SSBD on execve()

2018-12-19 Thread Andi Kleen
On Wed, Dec 19, 2018 at 02:09:50PM -0500, Waiman Long wrote:
> With the default SPEC_STORE_BYPASS_SECCOMP/SPEC_STORE_BYPASS_PRCTL mode,
> the TIF_SSBD bit will be inherited when a new task is fork'ed or cloned.
> 
> As only certain class of applications (like Java) requires disabling
> speculative store bypass for security purpose, it may not make sense to
> allow the TIF_SSBD bit to be inherited across execve() boundary where the
> new application may not need SSBD at all and is probably not aware that
> SSBD may have been turned on. This may cause an unnecessary performance
> loss of up to 20% in some cases.
> 
> The arch_setup_new_exec() function is updated to clear the TIF_SSBD
> bit unless it has been force-disabled.

This makes it impossible to write a wrapper that turns this mode
on for unmodified programs.

Do you have a real use case where this behavior is a problem?

-Andi


Re: objtool warnings for kernel/trace/trace_selftest_dynamic.o

2018-12-18 Thread Andi Kleen
On Tue, Dec 18, 2018 at 05:16:20PM -0500, Steven Rostedt wrote:
> On Tue, 18 Dec 2018 14:13:38 -0800
> Andi Kleen  wrote:
> 
> > > Again, that's not the ftrace case. It doesn't care about more than one
> > > out of line instance. Thus, for this particular use, "used" should be
> > > good enough.  
> > 
> > You mean noinline used? 
> 
> I thought that someone said that "used" would also prevent inlining.

that's not correct. You need noinline

-Andi

[ak@tassilo tsrc]$ cat tinline.c

int i;

inline __attribute__((used)) int finline(void)
{
i++;
}


main()
{
finline();
}

[ak@tassilo tsrc]$ gcc -O2 -S tinline.c
tinline.c:10:1: warning: return type defaults to ‘int’ [-Wimplicit-int]
 main()
 ^~~~
[ak@tassilo tsrc]$
[ak@tassilo tsrc]$ cat tinline.s
.file   "tinline.c"
.text
.section.text.startup,"ax",@progbits
.p2align 4,,15
.globl  main
.type   main, @function
main:
.LFB1:
.cfi_startproc
addl$1, i(%rip)
xorl%eax, %eax
ret
.cfi_endproc
.LFE1:
.size   main, .-main
.comm   i,4,4
.ident  "GCC: (GNU) 8.2.1 20181105 (Red Hat 8.2.1-5)"
.section.note.GNU-stack,"",@progbits
[ak@tassilo tsrc]$






Re: objtool warnings for kernel/trace/trace_selftest_dynamic.o

2018-12-18 Thread Andi Kleen
On Tue, Dec 18, 2018 at 04:57:13PM -0500, Steven Rostedt wrote:
> Hmm, how does that work? When does LTO do its linker magic? Because the
> fentry/mcounts are added when the object is created. Are they removed
> if the compiler sees that it can be inlined? Or does LTO just compile
> everything in one go?

LTO compiles everything in one go at link time. The objects
just contain immediate code.

Also in principle it should track command line options from the command
line and apply them by function, but there were bugs regarding
this in older gcc versions so it may not always work. 

> Again, that's not the ftrace case. It doesn't care about more than one
> out of line instance. Thus, for this particular use, "used" should be
> good enough.

You mean noinline used? 

Inlining would certainly break the test.

-Andi


Re: [PATCH v6 1/3] x86/fpu: track AVX-512 usage of tasks

2018-12-18 Thread Andi Kleen
On Tue, Dec 18, 2018 at 01:44:41PM -0800, Dave Hansen wrote:
> On 12/18/18 1:38 PM, Andi Kleen wrote:
> >> I misunderstood, you mean 32bit kernel, not 32bit machine. Theoretically 
> >> 32bit
> >> kernel can use AVX512, but not sure if anyone use it like this. 
> >> get_jiffies_64()
> >> includes jiffies_lock ops so not good in context switch. So I want to use 
> >> raw
> >> jiffies_64 here. jiffies is a good candidate but it has wraparound 
> >> overflow issue.
> >> Other time source are expensive here.
> >>
> >> Should I limit the code only running on 64bit kernel? 
> > Yes making it 64bit only should be fine.
> 
> I think I'd rather just disable AVX512 itself on 32-bit and be done with
> it.  I think more than half of the ~2k of XSAVE space that it consumes
> in *every* *task* is just pure waste because it has to be 0's.
> 
> This ~2k of extra space is also lowmem, which makes it even more valuable.

That will actually break programs.

If someone compiled binaries with -march=native on a system with AVX512
they wouldn't work anymore.

Don't think we can do it.


-Andi


Re: [PATCH v6 1/3] x86/fpu: track AVX-512 usage of tasks

2018-12-18 Thread Andi Kleen
> I misunderstood, you mean 32bit kernel, not 32bit machine. Theoretically 32bit
> kernel can use AVX512, but not sure if anyone use it like this. 
> get_jiffies_64()
> includes jiffies_lock ops so not good in context switch. So I want to use raw
> jiffies_64 here. jiffies is a good candidate but it has wraparound overflow 
> issue.
> Other time source are expensive here.
> 
> Should I limit the code only running on 64bit kernel? 

Yes making it 64bit only should be fine.

Other alternative would be to use 32bit jiffies on 32bit. I assume
wrapping is not that big a problem here.

-Andi


Re: objtool warnings for kernel/trace/trace_selftest_dynamic.o

2018-12-18 Thread Andi Kleen
On Tue, Dec 18, 2018 at 10:19:32AM +0100, Peter Zijlstra wrote:
> On Mon, Dec 17, 2018 at 03:59:50PM -0800, Andi Kleen wrote:
> > BTW I have a user base for LTO and so far noone has reported any issues
> > like this.
> 
> Because ordering issues are immediately obvious and easy to debug no
> doubt.

Again if it's a real problem it should be already there 
because we have a lot of code inside files that gets happily inlined.

Do you have an example in the tree where it happened or could happen?


-Andi


Re: objtool warnings for kernel/trace/trace_selftest_dynamic.o

2018-12-18 Thread Andi Kleen
> I whittled it down to a small test case.  It turns out the problem is
> caused by the "__optimize__("no-tracer")" atribute, which is used by our
> __noclone macro:
> 
> 
> # if __has_attribute(__optimize__)
> #  define __noclone __attribute__((__noclone__, 
> __optimize__("no-tracer")))
> # else
> #  define __noclone __attribute__((__noclone__))
> # endif

Ah interesting. That makes sense. Thanks for tracking that down.

> 
> commit 95272c29378ee7dc15f43fa2758cb28a5913a06d
> Author: Paolo Bonzini 
> Date:   Thu Mar 31 09:38:51 2016 +0200
> 
> compiler-gcc: disable -ftracer for __noclone functions
> 
> -ftracer can duplicate asm blocks causing compilation to fail in
> noclone functions.  For example, KVM declares a global variable
> in an asm like
> 
> asm("2: ... \n
>  .pushsection data \n
>  .global vmx_return \n
>  vmx_return: .long 2b");
> 
> and -ftracer causes a double declaration.

Ok. So the real solution would be to add a no tracer to the KVM
function only.

But of course it will drop its frame pointer. So if you rely on
frame pointer for live patching you would still have a problem.

Just fixing the ftrace case independently may not be useful?

ftrace here really not interesting at all for live patching because
the ftracing self test only happens at boot, when there is no live patching.
Likely it's even in __init code.

KVM on the other hand is quite important and actually happens at boot.

So I would concentrate on fixing that one instead, but leave ftrace
alone.

My suggestion to fix KVM would be to drop the inline assembler
and use out of line assembler in a separate file. Then disabling
the tracer wouldn't be needed.

VM entries are very expensive anyways, I doubt avoiding a single call/ret
makes any difference at all.

-Andi



<    1   2   3   4   5   6   7   8   9   10   >