[PATCH] perf kvm record/report: 'unprocessable sample' error while recording/reporting guest data
While recording guest samples in the host using perf kvm record, perf
reports an 'unprocessable samples' warning even though the samples are
recorded properly. While generating a report using perf kvm report, no
samples are processed and the same warning appears. We have seen this
behaviour with upstream perf (4.4-rc3) on x86 and ppc64 hardware.

The failure occurs because the lookup of the machine in the rb_tree of
machines fails. While tracing the bug, we found that this code was
incorrectly refactored in commit 54245fdc357613633954bfd38cffb71cb9def067
("perf session: Remove wrappers to machines__find").

This patch changes the behaviour so that if the machine can't be found
on the first attempt, a machine node is created and added to the
rb_tree. The next lookup of the same machine from the rb_tree then
succeeds. This is how the code behaved before the refactoring in the
aforementioned commit.

This patch is generated from the acme perf/core branch.

Below is an example that demonstrates the behaviour before and after
applying the patch.

Before applying patch:
[Note: a guest must be running before recording data in the host]

  ravi@ravi-bangoria:~$ ./perf kvm record -a
  Warning:
  5903 unprocessable samples recorded.
  Do you have a KVM guest running and not using 'perf kvm'?
  [ perf record: Captured and wrote 1.409 MB perf.data.guest (285 samples) ]

  ravi@ravi-bangoria:~$ ./perf kvm report --stdio
  Warning:
  5903 unprocessable samples recorded.
  Do you have a KVM guest running and not using 'perf kvm'?
  # To display the perf.data header info, please use --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 285 of event 'cycles'
  # Event count (approx.): 88715406
  #
  # Overhead  Command  Shared Object  Symbol
  # ... . ..
  #
  #
  #
  # (For a higher level overview, try: perf report --sort comm,dso)
  #

After applying patch:

  ravi@ravi-bangoria:~$ ./perf kvm record -a
  [ perf record: Captured and wrote 1.188 MB perf.data.guest (17 samples) ]

  ravi@ravi-bangoria:~$ ./perf kvm report --stdio
  # To display the perf.data header info, please use --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 17 of event 'cycles'
  # Event count (approx.): 700746
  #
  # Overhead  Command  Shared Object  Symbol
  # ... ..
  #
      34.19%  :5758    [unknown]      [g] 0x818682ab
      22.79%  :5758    [unknown]      [g] 0x812dc7f8
      22.79%  :5758    [unknown]      [g] 0x818650d0
      14.83%  :5758    [unknown]      [g] 0x8161a1b6
       2.49%  :5758    [unknown]      [g] 0x818692bf
       0.48%  :5758    [unknown]      [g] 0x81869253
       0.05%  :5758    [unknown]      [g] 0x81869250

Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com>
---
 tools/perf/util/session.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index c35ffdd..468de95 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -972,7 +972,7 @@ static struct machine *machines__find_for_cpumode(struct machines *machines,
 		machine = machines__find(machines, pid);
 		if (!machine)
-			machine = machines__find(machines, DEFAULT_GUEST_KERNEL_ID);
+			machine = machines__findnew(machines, DEFAULT_GUEST_KERNEL_ID);
 		return machine;
 	}
-- 
2.1.4
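
For readers unfamiliar with the two lookups, the difference between a plain find and a find-or-create can be sketched roughly as follows. This is only an illustration under stated assumptions, not perf's code: the struct layout, the linked list (perf uses an rb_tree) and the function names machines_find()/machines_findnew() are hypothetical stand-ins.

```c
#include <stdlib.h>

/* Hypothetical stand-in for perf's struct machine; only the pid matters here. */
struct machine {
	int pid;
	struct machine *next;
};

/* Assumption: a plain singly linked list replaces perf's rb_tree. */
struct machines {
	struct machine *head;
};

/* Lookup only: returns NULL when no machine with this pid is registered. */
static struct machine *machines_find(struct machines *ms, int pid)
{
	struct machine *m;

	for (m = ms->head; m; m = m->next)
		if (m->pid == pid)
			return m;
	return NULL;
}

/* Lookup, and on a miss create the node and register it for later lookups. */
static struct machine *machines_findnew(struct machines *ms, int pid)
{
	struct machine *m = machines_find(ms, pid);

	if (m)
		return m;

	m = calloc(1, sizeof(*m));
	if (!m)
		return NULL;

	m->pid = pid;
	m->next = ms->head;
	ms->head = m;
	return m;
}
```

With only the plain find, the fallback lookup keeps returning NULL if nothing ever registered the default guest machine; with the find-or-create variant the first miss registers it, so later lookups hit.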
Re: [PATCH] perf kvm record/report: 'unprocessable sample' error while recording/reporting guest data
Hi acme,

I sent this patch a few days ago. Unfortunately nobody has paid attention.
Can you please pick this up?

Regards,
Ravi

On Monday 07 December 2015 12:25 PM, Ravi Bangoria wrote:
> [...]
[PATCH] perf/kvm: Guest Symbol Resolution for powerpc
0.00% :9688[guest.kernel.kallsyms] [g] .plpar_hcall 0.00% :9689[guest.kernel.kallsyms] [g] .__srcu_read_unlock 0.00% :9689[guest.kernel.kallsyms] [g] ._raw_spin_lock 0.00% :9689[guest.kernel.kallsyms] [g] .arch_local_irq_restore Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com> Signed-off-by: Hemant Kumar <hem...@linux.vnet.ibm.com> --- tools/perf/arch/powerpc/util/Build | 2 + tools/perf/arch/powerpc/util/evlist.c | 22 ++ tools/perf/arch/powerpc/util/parse-tp.c | 74 + tools/perf/builtin-annotate.c | 3 +- tools/perf/builtin-diff.c | 3 +- tools/perf/builtin-mem.c| 10 +++-- tools/perf/builtin-report.c | 3 +- tools/perf/builtin-script.c | 3 +- tools/perf/builtin-timechart.c | 8 ++-- tools/perf/builtin-top.c| 3 +- tools/perf/tests/hists_cumulate.c | 2 +- tools/perf/tests/hists_filter.c | 2 +- tools/perf/tests/hists_link.c | 4 +- tools/perf/tests/hists_output.c | 2 +- tools/perf/util/event.c | 15 +-- tools/perf/util/event.h | 3 +- tools/perf/util/evlist.c| 8 tools/perf/util/evlist.h| 1 + tools/perf/util/evsel.c | 7 tools/perf/util/evsel.h | 3 ++ tools/perf/util/session.c | 9 ++-- tools/perf/util/util.c | 5 +++ tools/perf/util/util.h | 2 + 23 files changed, 169 insertions(+), 25 deletions(-) create mode 100644 tools/perf/arch/powerpc/util/evlist.c create mode 100644 tools/perf/arch/powerpc/util/parse-tp.c diff --git a/tools/perf/arch/powerpc/util/Build b/tools/perf/arch/powerpc/util/Build index 7b8b0d1..edd08e4 100644 --- a/tools/perf/arch/powerpc/util/Build +++ b/tools/perf/arch/powerpc/util/Build @@ -1,5 +1,7 @@ libperf-y += header.o libperf-y += sym-handling.o +libperf-y += parse-tp.o +libperf-y += evlist.o libperf-$(CONFIG_DWARF) += dwarf-regs.o libperf-$(CONFIG_DWARF) += skip-callchain-idx.o diff --git a/tools/perf/arch/powerpc/util/evlist.c b/tools/perf/arch/powerpc/util/evlist.c new file mode 100644 index 000..6a16d72 --- /dev/null +++ b/tools/perf/arch/powerpc/util/evlist.c @@ -0,0 +1,22 @@ +#include +#include "../../util/evsel.h" +#include 
"../../util/evlist.h" + +/* + * To sample for only guest, record kvm_hv:kvm_guest_exit. + * Otherwise go via normal way(cycles). + */ +int perf_evlist__arch_add_default(struct perf_evlist *evlist) +{ + struct perf_evsel *evsel; + + if (!perf_guest_only()) + return -1; + + evsel = perf_evsel__newtp_idx("kvm_hv", "kvm_guest_exit", 0); + if (IS_ERR(evsel)) + return PTR_ERR(evsel); + + perf_evlist__add(evlist, evsel); + return 0; +} diff --git a/tools/perf/arch/powerpc/util/parse-tp.c b/tools/perf/arch/powerpc/util/parse-tp.c new file mode 100644 index 000..50c4ac8 --- /dev/null +++ b/tools/perf/arch/powerpc/util/parse-tp.c @@ -0,0 +1,74 @@ +#include "../../util/evsel.h" +#include "../../util/trace-event.h" +#include "../../util/session.h" +#include "../../util/util.h" + +#define KVMPPC_EXIT "kvm_hv:kvm_guest_exit" +#define HV_DECREMENTER 2432 +#define HV_BIT 3 +#define PR_BIT 49 +#define PPC_MAX 63 + +static bool is_kvmppc_exit_event(struct perf_evsel *evsel) +{ + static unsigned int kvmppc_exit; + + if (evsel->attr.type != PERF_TYPE_TRACEPOINT) + return false; + + if (unlikely(kvmppc_exit == 0)) { + if (strcmp(KVMPPC_EXIT, evsel->name)) + return false; + kvmppc_exit = evsel->attr.config; + } else if (kvmppc_exit != evsel->attr.config) { + return false; + } + + return true; +} + +static bool is_hv_dec_trap(struct perf_evsel *evsel, struct perf_sample *sample) +{ + int trap = perf_evsel__intval(evsel, sample, "trap"); + return trap == HV_DECREMENTER; +} + +/* + * Get the instruction pointer from the tracepoint data + */ +u64 arch__get_ip(struct perf_evsel *evsel, struct perf_sample *sample) +{ + if (perf_guest_only() && + is_kvmppc_exit_event(evsel) && + is_hv_dec_trap(evsel, sample)) + return perf_evsel__intval(evsel, sample, "pc"); + + return sample->ip; +} + +/* + * Get the HV and PR bits and accordingly, determine the cpumode + */ +u8 arch__get_cpumode(const union perf_event *event, struct perf_evsel *evsel, +struct perf_sample *sample) +{ + unsigned long hv, pr, 
msr; + u8 cpumode = event->header.misc & PERF_RECORD_MISC_CPUMODE_MASK; + + if (!perf_guest_only() || !is_kvmppc_exit_event(e
[RFC 4/6] perf annotate: generalize handling of ret instructions
From: "Naveen N. Rao"

Introduce a helper to detect ret instructions and use it in the TUI. A
helper is needed since some architectures, such as powerpc, have more
than one return instruction.

Signed-off-by: Naveen N. Rao
---
 tools/perf/ui/browsers/annotate.c | 20 +---
 tools/perf/util/annotate.c        | 10 ++
 tools/perf/util/annotate.h        |  1 +
 3 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c
index b65a979..288200f 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -223,16 +223,14 @@ static void annotate_browser__write(struct ui_browser *browser, void *entry, int
 			} else if (ins__is_call(dl->ins)) {
 				ui_browser__write_graph(browser, SLSMG_RARROW_CHAR);
 				SLsmg_write_char(' ');
+			} else if (ins__is_ret(dl->ins)) {
+				ui_browser__write_graph(browser, SLSMG_LARROW_CHAR);
+				SLsmg_write_char(' ');
 			} else {
 				ui_browser__write_nstring(browser, " ", 2);
 			}
 		} else {
-			if (strcmp(dl->name, "retq")) {
-				ui_browser__write_nstring(browser, " ", 2);
-			} else {
-				ui_browser__write_graph(browser, SLSMG_LARROW_CHAR);
-				SLsmg_write_char(' ');
-			}
+			ui_browser__write_nstring(browser, " ", 2);
 		}
 
 		disasm_line__scnprintf(dl, bf, sizeof(bf), !annotate_browser__opts.use_offset);
@@ -843,14 +841,14 @@ show_help:
 				ui_helpline__puts("Huh? No selection. Report to linux-kernel@vger.kernel.org");
 			else if (browser->selection->offset == -1)
 				ui_helpline__puts("Actions are only available for assembly lines.");
-			else if (!browser->selection->ins) {
-				if (strcmp(browser->selection->name, "retq"))
-					goto show_sup_ins;
+			else if (!browser->selection->ins)
+				goto show_sup_ins;
+			else if (ins__is_ret(browser->selection->ins))
 				goto out;
-			} else if (!(annotate_browser__jump(browser) ||
+			else if (!(annotate_browser__jump(browser) ||
 				     annotate_browser__callq(browser, evsel, hbt))) {
 show_sup_ins:
-				ui_helpline__puts("Actions are only available for 'callq', 'retq' & jump instructions.");
+				ui_helpline__puts("Actions are only available for function call/return & jump/branch instructions.");
 			}
 			continue;
 		case 't':
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index e0dc7b2..634daf5 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -363,6 +363,15 @@ static struct ins_ops nop_ops = {
 	.scnprintf = nop__scnprintf,
 };
 
+static struct ins_ops ret_ops = {
+	.scnprintf = ins__raw_scnprintf,
+};
+
+bool ins__is_ret(const struct ins *ins)
+{
+	return ins->ops == &ret_ops;
+}
+
 static struct ins instructions_x86[] = {
 	{ .name = "add",   .ops  = &mov_ops, },
 	{ .name = "addl",  .ops  = &mov_ops, },
@@ -439,6 +448,7 @@ static struct ins instructions_x86[] = {
 	{ .name = "xadd",  .ops  = &mov_ops, },
 	{ .name = "xbeginl", .ops  = &jump_ops, },
 	{ .name = "xbeginq", .ops  = &jump_ops, },
+	{ .name = "retq",  .ops  = &ret_ops, },
 };
 
 static struct ins instructions_arm[] = {
diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
index f7b669e..488c427 100644
--- a/tools/perf/util/annotate.h
+++ b/tools/perf/util/annotate.h
@@ -48,6 +48,7 @@ struct ins {
 
 bool ins__is_jump(const struct ins *ins);
 bool ins__is_call(const struct ins *ins);
+bool ins__is_ret(const struct ins *ins);
 int ins__scnprintf(struct ins *ins, char *bf, size_t size, struct ins_operands *ops);
 
 struct annotation;
-- 
2.5.5
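
The idea of classifying an instruction by the identity of its ops pointer, rather than by comparing mnemonic strings at every call site, can be sketched in isolation as below. This is a hedged illustration only: struct ins_ops here carries just a label, the three-entry table and the names ins_find()/ins_is_ret() are hypothetical, and the sketch sorts the table before the first bsearch() so that entry order does not matter (the patch itself does not show whether perf sorts its table).

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced versions of perf's structs. */
struct ins_ops {
	const char *label;
};

struct ins {
	const char *name;
	const struct ins_ops *ops;
};

static const struct ins_ops call_ops = { "call" };
static const struct ins_ops ret_ops  = { "ret" };

/* Deliberately unsorted: several mnemonics may share ret_ops. */
static struct ins table[] = {
	{ "retq",  &ret_ops  },
	{ "callq", &call_ops },
	{ "blr",   &ret_ops  },
};

static int ins_cmp(const void *a, const void *b)
{
	const struct ins *ia = a, *ib = b;

	return strcmp(ia->name, ib->name);
}

/* Sort once, then binary-search by mnemonic. Returns NULL if unknown. */
static const struct ins *ins_find(const char *name)
{
	static bool sorted;
	const size_t nmemb = sizeof(table) / sizeof(table[0]);
	struct ins key = { name, NULL };

	if (!sorted) {
		qsort(table, nmemb, sizeof(table[0]), ins_cmp);
		sorted = true;
	}
	return bsearch(&key, table, nmemb, sizeof(table[0]), ins_cmp);
}

/* A return instruction is any entry whose ops pointer is &ret_ops. */
static bool ins_is_ret(const struct ins *ins)
{
	return ins && ins->ops == &ret_ops;
}
```

Callers then ask ins_is_ret(ins_find(name)) and never need to know how many return mnemonics a given architecture has.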
[RFC 5/6] perf annotate: add powerpc support
From: "Naveen N. Rao" <naveen.n@linux.vnet.ibm.com>

Powerpc has a long list of branch instructions, and hardcoding them in
a table appears to be error-prone. So, add a new function to find an
instruction instead of creating a table.

Signed-off-by: Naveen N. Rao <naveen.n@linux.vnet.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com>
---
 tools/perf/util/annotate.c | 64 ++
 1 file changed, 64 insertions(+)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 634daf5..ad01825 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -492,6 +492,68 @@ static struct ins *ins__find(const char *name)
 		       sizeof(struct ins), ins__key_cmp);
 }
 
+static struct ins *ins__find_powerpc(const char *name)
+{
+	int i;
+	struct ins *ins;
+
+	ins = zalloc(sizeof(struct ins));
+	if (!ins)
+		return NULL;
+
+	ins->name = strdup(name);
+	if (!ins->name)
+		return NULL;
+
+	if (name[0] == 'b') {
+		/* branch instructions */
+		ins->ops = &jump_ops;
+
+		/*
+		 * - Few start with 'b', but aren't branch instructions.
+		 * - Let's also ignore instructions involving 'ctr' and
+		 *   'tar' since target branch addresses for those can't
+		 *   be determined statically.
+		 */
+		if (!strncmp(name, "bcd", 3)   ||
+		    !strncmp(name, "brinc", 5) ||
+		    !strncmp(name, "bper", 4)  ||
+		    strstr(name, "ctr")        ||
+		    strstr(name, "tar"))
+			return NULL;
+
+		i = strlen(name) - 1;
+		if (i < 0)
+			return NULL;
+
+		/* ignore optional hints at the end of the instructions */
+		if (name[i] == '+' || name[i] == '-')
+			i--;
+
+		if (name[i] == 'l' || (name[i] == 'a' && name[i-1] == 'l')) {
+			/*
+			 * if the instruction ends up with 'l' or 'la', then
+			 * those are considered 'calls' since they update LR.
+			 * ... except for 'bnl' which is branch if not less than
+			 * and the absolute form of the same.
+			 */
+			if (strcmp(name, "bnl") && strcmp(name, "bnl+") &&
+			    strcmp(name, "bnl-") && strcmp(name, "bnla") &&
+			    strcmp(name, "bnla+") && strcmp(name, "bnla-"))
+				ins->ops = &call_ops;
+		}
+		if (name[i] == 'r' && name[i-1] == 'l')
+			/*
+			 * instructions ending with 'lr' are considered to be
+			 * return instructions
+			 */
+			ins->ops = &ret_ops;
+
+		return ins;
+	}
+	return NULL;
+}
+
 static void __init_arch_ins(const char *arch, struct ins *instructions,
 			    int size, struct ins *(*func)(const char *))
 {
@@ -513,6 +575,8 @@ static int _init_arch_ins(const char *norm_arch)
 		__init_arch_ins(norm_arch, instructions_arm,
 				ARRAY_SIZE(instructions_arm),
 				ins__find);
+	else if (!strcmp(norm_arch, PERF_ARCH_POWERPC))
+		__init_arch_ins(norm_arch, NULL, 0, ins__find_powerpc);
 	else
 		return -1;
 
-- 
2.5.5
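
The mnemonic rules described in the comments of this patch can be tried out on their own with a small classifier like the one below. It is a sketch under assumptions rather than the patch's code: it returns an enum instead of allocating a struct ins, the names ppc_classify() and enum ppc_ins_kind are made up for illustration, and it compares the stem after stripping the '+'/'-' hint and guards the lookbehind index, which the patch does not do.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical result type; the patch assigns jump/call/ret ops instead. */
enum ppc_ins_kind { PPC_INS_OTHER, PPC_INS_JUMP, PPC_INS_CALL, PPC_INS_RET };

/* True if the first len bytes of name are exactly the string s. */
static bool stem_is(const char *name, size_t len, const char *s)
{
	return strlen(s) == len && !strncmp(name, s, len);
}

static enum ppc_ins_kind ppc_classify(const char *name)
{
	size_t len;

	if (name[0] != 'b')
		return PPC_INS_OTHER;

	/* Not branches, or branches whose target is not statically known. */
	if (!strncmp(name, "bcd", 3) || !strncmp(name, "brinc", 5) ||
	    !strncmp(name, "bper", 4) || strstr(name, "ctr") ||
	    strstr(name, "tar"))
		return PPC_INS_OTHER;

	/* Drop an optional branch-prediction hint ('+' or '-'). */
	len = strlen(name);
	if (len > 1 && (name[len - 1] == '+' || name[len - 1] == '-'))
		len--;

	/* Mnemonics ending in "lr" branch to the link register: returns. */
	if (len >= 3 && name[len - 2] == 'l' && name[len - 1] == 'r')
		return PPC_INS_RET;

	/* "bnl"/"bnla" end in 'l'/'la' but mean "branch if not less". */
	if (stem_is(name, len, "bnl") || stem_is(name, len, "bnla"))
		return PPC_INS_JUMP;

	/* A trailing 'l' or "la" means the branch updates LR: calls. */
	if (name[len - 1] == 'l' ||
	    (len >= 3 && name[len - 2] == 'l' && name[len - 1] == 'a'))
		return PPC_INS_CALL;

	return PPC_INS_JUMP;
}
```

For example, "bl" and "bla" classify as calls, "blr" and "beqlr" as returns, "bnl+" as a plain jump, and "bctr" is left unclassified because its target cannot be determined statically.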
[RFC 6/6] perf: add more triplets
Add few more triplets based on Fedora and Ubuntu binutils(cross tools). Before applying patch on x86: ( Install binutils-powerpc64-linux-gnu.x86_64 ) $ perf report -i perf.data.powerpc --vmlinux vmlinux.powerpc \ --objdump powerpc64-linux-gnu-objdump After applying patch on x86: $ perf report -i perf.data.powerpc --vmlinux vmlinux.powerpc Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com> --- tools/perf/arch/common.c | 17 + 1 file changed, 17 insertions(+) diff --git a/tools/perf/arch/common.c b/tools/perf/arch/common.c index 7da6ac7..93afa60 100644 --- a/tools/perf/arch/common.c +++ b/tools/perf/arch/common.c @@ -9,34 +9,44 @@ const char *const arm_triplets[] = { "arm-unknown-linux-", "arm-unknown-linux-gnu-", "arm-unknown-linux-gnueabi-", + "arm-linux-gnu-", + "arm-linux-gnueabihf-", + "arm-none-eabi-", NULL }; const char *const arm64_triplets[] = { "aarch64-linux-android-", + "aarch64-linux-gnu-", NULL }; const char *const powerpc_triplets[] = { "powerpc-unknown-linux-gnu-", "powerpc64-unknown-linux-gnu-", + "powerpc64-linux-gnu-", + "powerpc64le-linux-gnu-", NULL }; const char *const s390_triplets[] = { "s390-ibm-linux-", + "s390x-linux-gnu-", NULL }; const char *const sh_triplets[] = { "sh-unknown-linux-gnu-", "sh64-unknown-linux-gnu-", + "sh-linux-gnu-", + "sh64-linux-gnu-", NULL }; const char *const sparc_triplets[] = { "sparc-unknown-linux-gnu-", "sparc64-unknown-linux-gnu-", + "sparc64-linux-gnu-", NULL }; @@ -49,12 +59,19 @@ const char *const x86_triplets[] = { "i386-pc-linux-gnu-", "i686-linux-android-", "i686-android-linux-", + "x86_64-linux-gnu-", + "i586-linux-gnu-", NULL }; const char *const mips_triplets[] = { "mips-unknown-linux-gnu-", "mipsel-linux-android-", + "mips-linux-gnu-", + "mips64-linux-gnu-", + "mips64el-linux-gnuabi64-", + "mips64-linux-gnuabi64-", + "mipsel-linux-gnu-", NULL }; -- 2.5.5
[RFC 3/6] perf annotate: Enable cross arch annotate
Change current data structures and function to enable cross arch annotate and add support for x86 and arm instructions. Current implementation does not contain logic of recording on one arch and annotating on other. This remote annotate is partially possible with current implementation for x86 (or may be arm as well) only. But, to make remote annotation work properly, all architecture instruction tables need to be included in the perf binary. And while annotating, look for instruction table where perf.data was recorded. For arm, few instructions were defined under #if __arm__ which I've used as a table for arm. But I'm not sure whether instruction defined outside of that also contains arm instructions. Note: Here arch_ins is global var. And init_arch_ins will be called every time when we annotate symbol. So I still need to optimize this. May be make arch_ins per session. Please suggest best way to do it. Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com> --- tools/perf/builtin-top.c | 2 +- tools/perf/ui/browsers/annotate.c | 5 +- tools/perf/ui/gtk/annotate.c | 6 +- tools/perf/util/annotate.c| 116 +- tools/perf/util/annotate.h| 3 +- tools/perf/util/evsel.c | 7 +++ tools/perf/util/evsel.h | 2 + 7 files changed, 108 insertions(+), 33 deletions(-) diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c index 07fc792..d4fd947 100644 --- a/tools/perf/builtin-top.c +++ b/tools/perf/builtin-top.c @@ -128,7 +128,7 @@ static int perf_top__parse_source(struct perf_top *top, struct hist_entry *he) return err; } - err = symbol__annotate(sym, map, 0); + err = symbol__annotate(sym, map, 0, NULL); if (err == 0) { out_assign: top->sym_filter_entry = he; diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c index 0e106bb..b65a979 100644 --- a/tools/perf/ui/browsers/annotate.c +++ b/tools/perf/ui/browsers/annotate.c @@ -1031,6 +1031,7 @@ int symbol__tui_annotate(struct symbol *sym, struct map *map, int ret = -1; int nr_pcnt = 1; 
size_t sizeof_bdl = sizeof(struct browser_disasm_line); + char *target_arch = NULL; if (sym == NULL) return -1; @@ -1052,7 +1053,9 @@ int symbol__tui_annotate(struct symbol *sym, struct map *map, (nr_pcnt - 1); } - if (symbol__annotate(sym, map, sizeof_bdl) < 0) { + target_arch = perf_evsel__env_arch(evsel); + + if (symbol__annotate(sym, map, sizeof_bdl, target_arch) < 0) { ui__error("%s", ui_helpline__last_msg); goto out_free_offsets; } diff --git a/tools/perf/ui/gtk/annotate.c b/tools/perf/ui/gtk/annotate.c index 9c7ff8d..e468c1a 100644 --- a/tools/perf/ui/gtk/annotate.c +++ b/tools/perf/ui/gtk/annotate.c @@ -4,7 +4,6 @@ #include "util/evsel.h" #include "ui/helpline.h" - enum { ANN_COL__PERCENT, ANN_COL__OFFSET, @@ -162,11 +161,14 @@ static int symbol__gtk_annotate(struct symbol *sym, struct map *map, GtkWidget *notebook; GtkWidget *scrolled_window; GtkWidget *tab_label; + char *target_arch = NULL; if (map->dso->annotate_warned) return -1; - if (symbol__annotate(sym, map, 0) < 0) { + target_arch = perf_evsel__env_arch(evsel); + + if (symbol__annotate(sym, map, 0, target_arch) < 0) { ui__error("%s", ui_helpline__current); return -1; } diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c index b2c7ae4..e0dc7b2 100644 --- a/tools/perf/util/annotate.c +++ b/tools/perf/util/annotate.c @@ -20,6 +20,8 @@ #include #include #include +#include +#include "../arch/common.h" const char *disassembler_style; const char *objdump_path; @@ -28,6 +30,13 @@ static regex_tfile_lineno; static struct ins *ins__find(const char *name); static int disasm_line__parse(char *line, char **namep, char **rawp); +static struct arch_instructions { + const char *arch; + intnmemb; + struct ins *instructions; + struct ins *(*ins__find)(const char *); +} arch_ins; + static void ins__delete(struct ins_operands *ops) { if (ops == NULL) @@ -183,7 +192,7 @@ static int lock__parse(struct ins_operands *ops) if (disasm_line__parse(ops->raw, , >locked.ops->raw) < 0) goto out_free_ops; - 
ops->locked.ins = ins__find(name); + ops->locked.ins = arch_ins.ins__find(name); free(name); if (ops->locked.ins == NULL) @@ -354,26 +363,12 @@ static struct ins_ops nop_ops = { .scnprintf = nop__scnprintf, }; -static struct ins instructions[] = { +static struct ins instructions_x86[] = { { .name = &qu
[RFC 1/6] perf: Remove unused hist_entry__annotate function
hist_entry__annotate looks like part of the API, but I can't find any
caller of this function, so remove it.

Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com>
---
 tools/perf/util/annotate.c | 5 -----
 tools/perf/util/annotate.h | 2 --
 2 files changed, 7 deletions(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 7e5a1e8..b2c7ae4 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -1676,11 +1676,6 @@ int symbol__tty_annotate(struct symbol *sym, struct map *map,
 	return 0;
 }
 
-int hist_entry__annotate(struct hist_entry *he, size_t privsize)
-{
-	return symbol__annotate(he->ms.sym, he->ms.map, privsize);
-}
-
 bool ui__has_annotation(void)
 {
 	return use_browser == 1 && perf_hpp_list.sym;
diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
index 9241f8c..82f3781 100644
--- a/tools/perf/util/annotate.h
+++ b/tools/perf/util/annotate.h
@@ -156,8 +156,6 @@ void symbol__annotate_zero_histograms(struct symbol *sym);
 
 int symbol__annotate(struct symbol *sym, struct map *map, size_t privsize);
 
-int hist_entry__annotate(struct hist_entry *he, size_t privsize);
-
 int symbol__annotate_init(struct map *map, struct symbol *sym);
 int symbol__annotate_printf(struct symbol *sym, struct map *map,
 			    struct perf_evsel *evsel, bool full_paths,
-- 
2.5.5
[RFC 2/6] perf annotate: Define macro for arch names
Define macro for each arch name and use them instead of using arch name as string. Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com> --- tools/perf/arch/common.c | 36 ++-- tools/perf/arch/common.h | 10 ++ tools/perf/util/unwind-libunwind.c | 5 +++-- 3 files changed, 31 insertions(+), 20 deletions(-) diff --git a/tools/perf/arch/common.c b/tools/perf/arch/common.c index fa090a9..7da6ac7 100644 --- a/tools/perf/arch/common.c +++ b/tools/perf/arch/common.c @@ -105,25 +105,25 @@ static int lookup_triplets(const char *const *triplets, const char *name) const char *normalize_arch(char *arch) { if (!strcmp(arch, "x86_64")) - return "x86"; + return PERF_ARCH_X86; if (arch[0] == 'i' && arch[2] == '8' && arch[3] == '6') - return "x86"; + return PERF_ARCH_X86; if (!strcmp(arch, "sun4u") || !strncmp(arch, "sparc", 5)) - return "sparc"; + return PERF_ARCH_SPARC; if (!strcmp(arch, "aarch64") || !strcmp(arch, "arm64")) - return "arm64"; + return PERF_ARCH_ARM64; if (!strncmp(arch, "arm", 3) || !strcmp(arch, "sa110")) - return "arm"; + return PERF_ARCH_ARM; if (!strncmp(arch, "s390", 4)) - return "s390"; + return PERF_ARCH_S390; if (!strncmp(arch, "parisc", 6)) - return "parisc"; + return PERF_ARCH_PARISC; if (!strncmp(arch, "powerpc", 7) || !strncmp(arch, "ppc", 3)) - return "powerpc"; + return PERF_ARCH_POWERPC; if (!strncmp(arch, "mips", 4)) - return "mips"; + return PERF_ARCH_MIPS; if (!strncmp(arch, "sh", 2) && isdigit(arch[2])) - return "sh"; + return PERF_ARCH_SH; return arch; } @@ -163,21 +163,21 @@ static int perf_env__lookup_binutils_path(struct perf_env *env, zfree(); } - if (!strcmp(arch, "arm")) + if (!strcmp(arch, PERF_ARCH_ARM)) path_list = arm_triplets; - else if (!strcmp(arch, "arm64")) + else if (!strcmp(arch, PERF_ARCH_ARM64)) path_list = arm64_triplets; - else if (!strcmp(arch, "powerpc")) + else if (!strcmp(arch, PERF_ARCH_POWERPC)) path_list = powerpc_triplets; - else if (!strcmp(arch, "sh")) + else if (!strcmp(arch, PERF_ARCH_SH)) path_list = 
sh_triplets; - else if (!strcmp(arch, "s390")) + else if (!strcmp(arch, PERF_ARCH_S390)) path_list = s390_triplets; - else if (!strcmp(arch, "sparc")) + else if (!strcmp(arch, PERF_ARCH_SPARC)) path_list = sparc_triplets; - else if (!strcmp(arch, "x86")) + else if (!strcmp(arch, PERF_ARCH_X86)) path_list = x86_triplets; - else if (!strcmp(arch, "mips")) + else if (!strcmp(arch, PERF_ARCH_MIPS)) path_list = mips_triplets; else { ui__error("binutils for %s not supported.\n", arch); diff --git a/tools/perf/arch/common.h b/tools/perf/arch/common.h index 6b01c73..bbb6960 100644 --- a/tools/perf/arch/common.h +++ b/tools/perf/arch/common.h @@ -5,6 +5,16 @@ extern const char *objdump_path; +#define PERF_ARCH_X86 "x86" +#define PERF_ARCH_SPARC"sparc" +#define PERF_ARCH_ARM64"arm64" +#define PERF_ARCH_ARM "arm" +#define PERF_ARCH_S390 "s390" +#define PERF_ARCH_PARISC "parisc" +#define PERF_ARCH_POWERPC "powerpc" +#define PERF_ARCH_MIPS "mips" +#define PERF_ARCH_SH "sh" + int perf_env__lookup_objdump(struct perf_env *env); const char *normalize_arch(char *arch); diff --git a/tools/perf/util/unwind-libunwind.c b/tools/perf/util/unwind-libunwind.c index 8547119..ccebe5e 100644 --- a/tools/perf/util/unwind-libunwind.c +++ b/tools/perf/util/unwind-libunwind.c @@ -36,10 +36,11 @@ int unwind__prepare_access(struct thread *thread, struct map *map) arch = normalize_arch(thread->mg->machine->env->arch); - if (!strcmp(arch, "x86")) { + if (!strcmp(arch, PERF_ARCH_X86)) { if (dso_type != DSO__TYPE_64BIT) ops = x86_32_unwind_libunwind_ops; - } else if (!strcmp(arch, "arm64") || !strcmp(arch, "arm")) { + } else if (!strcmp(arch, PERF_ARCH_ARM64) || + !strcmp(arch, PERF_ARCH_ARM)) { if (dso_type == DSO__TYPE_64BIT) ops = arm64_unwind_libunwind_ops; } -- 2.5.5
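
As a standalone illustration of how the arch-name macros are meant to be used, here is a reduced version of the normalization step. It is a sketch, not the perf function: only four of the architectures are covered, the name normalize_arch_sketch() is hypothetical, and a length check is added before indexing the "iX86" pattern, which the original code does not have.

```c
#include <string.h>

/* Subset of the macros introduced by this patch. */
#define PERF_ARCH_X86     "x86"
#define PERF_ARCH_ARM64   "arm64"
#define PERF_ARCH_ARM     "arm"
#define PERF_ARCH_POWERPC "powerpc"

/* Map a uname-style machine string to a canonical arch name. */
static const char *normalize_arch_sketch(const char *arch)
{
	if (!strcmp(arch, "x86_64"))
		return PERF_ARCH_X86;
	/* i386, i486, i586, i686: check the length before indexing. */
	if (strlen(arch) >= 4 && arch[0] == 'i' &&
	    arch[2] == '8' && arch[3] == '6')
		return PERF_ARCH_X86;
	/* arm64 must be tested before the generic "arm" prefix. */
	if (!strcmp(arch, "aarch64") || !strcmp(arch, "arm64"))
		return PERF_ARCH_ARM64;
	if (!strncmp(arch, "arm", 3))
		return PERF_ARCH_ARM;
	if (!strncmp(arch, "powerpc", 7) || !strncmp(arch, "ppc", 3))
		return PERF_ARCH_POWERPC;
	/* Unknown names are passed through unchanged. */
	return arch;
}
```

Since the macros expand to string literals, callers still have to compare with strcmp(), as the patch does; comparing the returned pointer against a macro with == is not guaranteed to work.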
[RFC 0/6] perf annotate: Enable cross arch annotate
Perf can currently support code navigation (branches and calls) in
annotate only when run on the same architecture where perf.data was
recorded; cross-arch annotate is not supported.

This patchset enables cross-arch annotate. So far I've used the x86 and
arm instruction tables, which are already available, and added support
for powerpc as well. Adding support for other architectures should be
easy.

I've created this patchset on top of acme/perf/core and tested it with
x86 and powerpc only.

Example:

  Record on powerpc:
  $ ./perf record -a

  Report -> Annotate on x86:
  $ ./perf report -i perf.data.powerpc --vmlinux vmlinux.powerpc

Naveen N. Rao (2):
  perf annotate: generalize handling of ret instructions
  perf annotate: add powerpc support

Ravi Bangoria (4):
  perf: Remove unused hist_entry__annotate function
  perf annotate: Define macro for arch names
  perf annotate: Enable cross arch annotate
  perf: add more triplets

 tools/perf/arch/common.c           |  53 ++
 tools/perf/arch/common.h           |  10 ++
 tools/perf/builtin-top.c           |   2 +-
 tools/perf/ui/browsers/annotate.c  |  25 ++---
 tools/perf/ui/gtk/annotate.c       |   6 +-
 tools/perf/util/annotate.c         | 195 ++---
 tools/perf/util/annotate.h         |   6 +-
 tools/perf/util/evsel.c            |   7 ++
 tools/perf/util/evsel.h            |   2 +
 tools/perf/util/unwind-libunwind.c |   5 +-
 10 files changed, 240 insertions(+), 71 deletions(-)

-- 
2.5.5
[PATCH 2/4] perf annotate: Enable cross arch annotate
Change current data structures and function to enable cross arch annotate. Current implementation does not contain logic of record on one arch and annotate on other. This remote annotate is partially possible with current implementation for x86 (or may be arm as well) only. But, to make remote annotation work properly, all architecture instruction tables need to be included in the perf binary. And while annotating, look for instruction table where perf.data was recorded. For arm, few instructions were defined under #if __arm__ which I've used as a table for arm. But I'm not sure whether instruction defined outside of that also contains arm instructions. Apart from that, 'call__parse()' and 'move__parse()' contains #ifdef __arm__ directive. I've changed it to if (!strcmp(norm_arch, "arm")). But I've not tested this as well. Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com> --- tools/perf/builtin-top.c | 2 +- tools/perf/ui/browsers/annotate.c | 3 +- tools/perf/ui/gtk/annotate.c | 2 +- tools/perf/util/annotate.c| 136 -- tools/perf/util/annotate.h| 5 +- 5 files changed, 95 insertions(+), 53 deletions(-) diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c index 07fc792..d4fd947 100644 --- a/tools/perf/builtin-top.c +++ b/tools/perf/builtin-top.c @@ -128,7 +128,7 @@ static int perf_top__parse_source(struct perf_top *top, struct hist_entry *he) return err; } - err = symbol__annotate(sym, map, 0); + err = symbol__annotate(sym, map, 0, NULL); if (err == 0) { out_assign: top->sym_filter_entry = he; diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c index 29dc6d2..3a652a6f 100644 --- a/tools/perf/ui/browsers/annotate.c +++ b/tools/perf/ui/browsers/annotate.c @@ -1050,7 +1050,8 @@ int symbol__tui_annotate(struct symbol *sym, struct map *map, (nr_pcnt - 1); } - if (symbol__annotate(sym, map, sizeof_bdl) < 0) { + if (symbol__annotate(sym, map, sizeof_bdl, +perf_evsel__env_arch(evsel)) < 0) { ui__error("%s", 
ui_helpline__last_msg); goto out_free_offsets; } diff --git a/tools/perf/ui/gtk/annotate.c b/tools/perf/ui/gtk/annotate.c index 9c7ff8d..d7150b3 100644 --- a/tools/perf/ui/gtk/annotate.c +++ b/tools/perf/ui/gtk/annotate.c @@ -166,7 +166,7 @@ static int symbol__gtk_annotate(struct symbol *sym, struct map *map, if (map->dso->annotate_warned) return -1; - if (symbol__annotate(sym, map, 0) < 0) { + if (symbol__annotate(sym, map, 0, perf_evsel__env_arch(evsel)) < 0) { ui__error("%s", ui_helpline__current); return -1; } diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c index c385fec..36a5825 100644 --- a/tools/perf/util/annotate.c +++ b/tools/perf/util/annotate.c @@ -20,12 +20,14 @@ #include #include #include +#include +#include "../arch/common.h" const char *disassembler_style; const char *objdump_path; static regex_t file_lineno; -static struct ins *ins__find(const char *name); +static struct ins *ins__find(const char *name, const char *norm_arch); static int disasm_line__parse(char *line, char **namep, char **rawp); static void ins__delete(struct ins_operands *ops) @@ -53,7 +55,8 @@ int ins__scnprintf(struct ins *ins, char *bf, size_t size, return ins__raw_scnprintf(ins, bf, size, ops); } -static int call__parse(struct ins_operands *ops) +static int call__parse(struct ins_operands *ops, + __maybe_unused const char *norm_arch) { char *endptr, *tok, *name; @@ -65,10 +68,8 @@ static int call__parse(struct ins_operands *ops) name++; -#ifdef __arm__ - if (strchr(name, '+')) + if (!strcmp(norm_arch, "arm") && strchr(name, '+')) return -1; -#endif tok = strchr(name, '>'); if (tok == NULL) @@ -117,7 +118,8 @@ bool ins__is_call(const struct ins *ins) return ins->ops == _ops; } -static int jump__parse(struct ins_operands *ops) +static int jump__parse(struct ins_operands *ops, + __maybe_unused const char *norm_arch) { const char *s = strchr(ops->raw, '+'); @@ -172,7 +174,7 @@ static int comment__symbol(char *raw, char *comment, u64 *addrp, char **namep) return 
0; } -static int lock__parse(struct ins_operands *ops) +static int lock__parse(struct ins_operands *ops, const char *norm_arch) { char *name; @@ -183,7 +185,7 @@ static int lock__parse(struct ins_operands *ops) if (disasm_line__parse(ops->raw, , >locked.ops->raw) < 0) goto out_free_ops; - ops->locked.ins = ins__find(name); + ops-&
[PATCH 1/4] perf: Utility function to fetch arch from evsel
Add a utility function to fetch 'arch' from 'evsel'
(evsel->evlist->env->arch).

Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com>
---
 tools/perf/util/evsel.c | 7 +++++++
 tools/perf/util/evsel.h | 2 ++
 2 files changed, 9 insertions(+)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 1d8f2bb..0fea724 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -2422,3 +2422,10 @@ int perf_evsel__open_strerror(struct perf_evsel *evsel, struct target *target,
 			 err, strerror_r(err, sbuf, sizeof(sbuf)),
 			 perf_evsel__name(evsel));
 }
+
+char *perf_evsel__env_arch(struct perf_evsel *evsel)
+{
+	if (evsel && evsel->evlist && evsel->evlist->env)
+		return evsel->evlist->env->arch;
+	return NULL;
+}
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 828ddd1..86fed7a 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -435,4 +435,6 @@ typedef int (*attr__fprintf_f)(FILE *, const char *, const char *, void *);
 int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr,
 			     attr__fprintf_f attr__fprintf, void *priv);
 
+char *perf_evsel__env_arch(struct perf_evsel *evsel);
+
 #endif /* __PERF_EVSEL_H */
-- 
2.5.5
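A minimal sketch of how a caller might consume such a getter. The three structs below are simplified stand-ins for perf's real ones (which carry many more fields), and arch_or_default() is a hypothetical helper, not part of this series. It only illustrates that the getter can return NULL and that callers need a fallback.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the perf structures; illustration only. */
struct perf_env { char *arch; };
struct perf_evlist { struct perf_env *env; };
struct perf_evsel { struct perf_evlist *evlist; };

/* Same shape as the helper in the patch: walk evsel -> evlist -> env. */
static char *perf_evsel__env_arch(struct perf_evsel *evsel)
{
	if (evsel && evsel->evlist && evsel->evlist->env)
		return evsel->evlist->env->arch;
	return NULL;
}

/* Hypothetical caller-side helper: fall back when no arch was recorded. */
static const char *arch_or_default(struct perf_evsel *evsel, const char *dflt)
{
	const char *arch = perf_evsel__env_arch(evsel);

	return arch ? arch : dflt;
}
```

Any link in the chain may be missing (for instance an evsel that is not attached to an evlist), so the NULL return is part of the contract rather than an error case.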
[PATCH 4/4] perf annotate: Define macro for arch names
Define macro for each arch name and use them instead of using arch name as string. Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com> --- tools/perf/arch/common.c | 36 ++-- tools/perf/arch/common.h | 11 +++ tools/perf/util/annotate.c | 10 +- tools/perf/util/unwind-libunwind.c | 4 ++-- 4 files changed, 36 insertions(+), 25 deletions(-) diff --git a/tools/perf/arch/common.c b/tools/perf/arch/common.c index ee69668..feb2113 100644 --- a/tools/perf/arch/common.c +++ b/tools/perf/arch/common.c @@ -122,25 +122,25 @@ static int lookup_triplets(const char *const *triplets, const char *name) const char *normalize_arch(char *arch) { if (!strcmp(arch, "x86_64")) - return "x86"; + return NORM_X86; if (arch[0] == 'i' && arch[2] == '8' && arch[3] == '6') - return "x86"; + return NORM_X86; if (!strcmp(arch, "sun4u") || !strncmp(arch, "sparc", 5)) - return "sparc"; + return NORM_SPARC; if (!strcmp(arch, "aarch64") || !strcmp(arch, "arm64")) - return "arm64"; + return NORM_ARM64; if (!strncmp(arch, "arm", 3) || !strcmp(arch, "sa110")) - return "arm"; + return NORM_ARM; if (!strncmp(arch, "s390", 4)) - return "s390"; + return NORM_S390; if (!strncmp(arch, "parisc", 6)) - return "parisc"; + return NORM_PARISC; if (!strncmp(arch, "powerpc", 7) || !strncmp(arch, "ppc", 3)) - return "powerpc"; + return NORM_POWERPC; if (!strncmp(arch, "mips", 4)) - return "mips"; + return NORM_MIPS; if (!strncmp(arch, "sh", 2) && isdigit(arch[2])) - return "sh"; + return NORM_SH; return arch; } @@ -180,21 +180,21 @@ static int perf_env__lookup_binutils_path(struct perf_env *env, zfree(); } - if (!strcmp(arch, "arm")) + if (!strcmp(arch, NORM_ARM)) path_list = arm_triplets; - else if (!strcmp(arch, "arm64")) + else if (!strcmp(arch, NORM_ARM64)) path_list = arm64_triplets; - else if (!strcmp(arch, "powerpc")) + else if (!strcmp(arch, NORM_POWERPC)) path_list = powerpc_triplets; - else if (!strcmp(arch, "sh")) + else if (!strcmp(arch, NORM_SH)) path_list = sh_triplets; - else if 
(!strcmp(arch, "s390")) + else if (!strcmp(arch, NORM_S390)) path_list = s390_triplets; - else if (!strcmp(arch, "sparc")) + else if (!strcmp(arch, NORM_SPARC)) path_list = sparc_triplets; - else if (!strcmp(arch, "x86")) + else if (!strcmp(arch, NORM_X86)) path_list = x86_triplets; - else if (!strcmp(arch, "mips")) + else if (!strcmp(arch, NORM_MIPS)) path_list = mips_triplets; else { ui__error("binutils for %s not supported.\n", arch); diff --git a/tools/perf/arch/common.h b/tools/perf/arch/common.h index 6b01c73..14ca8ca 100644 --- a/tools/perf/arch/common.h +++ b/tools/perf/arch/common.h @@ -5,6 +5,17 @@ extern const char *objdump_path; +/* Macro for normalized arch names */ +#define NORM_X86 "x86" +#define NORM_SPARC "sparc" +#define NORM_ARM64 "arm64" +#define NORM_ARM "arm" +#define NORM_S390 "s390" +#define NORM_PARISC"parisc" +#define NORM_POWERPC "powerpc" +#define NORM_MIPS "mips" +#define NORM_SH"sh" + int perf_env__lookup_objdump(struct perf_env *env); const char *normalize_arch(char *arch); diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c index 96c6610..8146a25 100644 --- a/tools/perf/util/annotate.c +++ b/tools/perf/util/annotate.c @@ -68,7 +68,7 @@ static int call__parse(struct ins_operands *ops, name++; - if (!strcmp(norm_arch, "arm") && strchr(name, '+')) + if (!strcmp(norm_arch, NORM_ARM) && strchr(name, '+')) return -1; tok = strchr(name, '>'); @@ -255,7 +255,7 @@ static int mov__parse(struct ins_operands *ops, target = ++s; - if (!strcmp(norm_arch, "arm")) + if (!strcmp(norm_arch, NORM_ARM)) comment = strchr(s, ';'); else comment = strchr(
Re: [RFC 3/6] perf annotate: Enable cross arch annotate
On Monday 27 June 2016 10:46 PM, Arnaldo Carvalho de Melo wrote:
> On Fri, Jun 24, 2016 at 05:23:57PM +0530, Ravi Bangoria wrote:
>> Change current data structures and function to enable cross arch
>> annotate and add support for x86 and arm instructions.
>>
>> Current implementation does not contain logic of recording on one arch
>> and annotating on other. This remote annotate is partially possible
>> with current implementation for x86 (or may be arm as well) only.
>> But, to make remote annotation work properly, all architecture
>> instruction tables need to be included in the perf binary. And while
>> annotating, look for instruction table where perf.data was recorded.
>> ...
>> +static struct arch_instructions {
>> +	const char *arch;
>> +	int nmemb;
>> +	struct ins *instructions;
>> +	struct ins *(*ins__find)(const char *);
>
> Why do we need arch specific find functions? Why not pass the
> instructions pointer to it, just like you did with ins__sort().
>
> Probably it is not needed to be global, you just pick the right
> instructions table + its ARRAY_SIZE and pass it around, again, like you
> did in ins__sort().
>
> - Arnaldo

Thanks for the suggestion, Arnaldo.

To determine the arch in ins__find, I need to pass 'arch' all the way
down to ins__find, which requires changing the definitions of many
functions. That is why I went with a global variable.

Anyway, I've prepared a patch as you suggested and sent it as a [PATCH].
Please review it.

-Ravi
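The suggestion above (pick the right table plus its ARRAY_SIZE and pass both around) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the series: the tables hold only a few example mnemonics, struct ins is reduced to a name, and ins__find_for_arch() is a hypothetical wrapper name.

```c
#include <stdlib.h>
#include <string.h>

#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

/* Reduced stand-in for perf's struct ins. */
struct ins { const char *name; };

/* Tiny illustrative tables; they must stay sorted by name for bsearch(). */
static struct ins instructions_x86[] = {
	{ "call" }, { "jmp" }, { "ret" },
};
static struct ins instructions_arm[] = {
	{ "b" }, { "bl" }, { "blx" },
};

/* bsearch() comparator: the key is the mnemonic, the element a struct ins. */
static int ins__key_cmp(const void *name, const void *insp)
{
	const struct ins *ins = insp;

	return strcmp(name, ins->name);
}

/* One generic lookup: the table and its size are passed in by the caller. */
static struct ins *ins__find(const char *name, struct ins *table, size_t nmemb)
{
	return bsearch(name, table, nmemb, sizeof(struct ins), ins__key_cmp);
}

/* Hypothetical wrapper: choose the table from the normalized arch name. */
static struct ins *ins__find_for_arch(const char *name, const char *norm_arch)
{
	if (!strcmp(norm_arch, "x86"))
		return ins__find(name, instructions_x86,
				 ARRAY_SIZE(instructions_x86));
	if (!strcmp(norm_arch, "arm"))
		return ins__find(name, instructions_arm,
				 ARRAY_SIZE(instructions_arm));
	return NULL; /* no table for this arch */
}
```

With this shape there is no global state and no per-arch function pointer. The cost is that norm_arch has to be threaded through every caller, which is the signature churn mentioned in the reply.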
[PATCH 3/4] perf annotate: add powerpc support
From: "Naveen N. Rao" <naveen.n@linux.vnet.ibm.com> Powerpc has long list of branch instructions and hardcoding them in table appears to be error-prone. So, add new function to find instruction instead of creating table. Signed-off-by: Naveen N. Rao <naveen.n@linux.vnet.ibm.com> Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com> --- tools/perf/util/annotate.c | 64 ++ 1 file changed, 64 insertions(+) diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c index 36a5825..96c6610 100644 --- a/tools/perf/util/annotate.c +++ b/tools/perf/util/annotate.c @@ -476,6 +476,68 @@ static int ins__cmp(const void *a, const void *b) return strcmp(ia->name, ib->name); } +static struct ins *ins__find_powerpc(const char *name) +{ + int i; + struct ins *ins; + + ins = zalloc(sizeof(struct ins)); + if (!ins) + return NULL; + + ins->name = strdup(name); + if (!ins->name) + return NULL; + + if (name[0] == 'b') { + /* branch instructions */ + ins->ops = _ops; + + /* +* - Few start with 'b', but aren't branch instructions. +* - Let's also ignore instructions involving 'ctr' and +* 'tar' since target branch addresses for those can't +* be determined statically. +*/ + if (!strncmp(name, "bcd", 3) || + !strncmp(name, "brinc", 5) || + !strncmp(name, "bper", 4) || + strstr(name, "ctr")|| + strstr(name, "tar")) + return NULL; + + i = strlen(name) - 1; + if (i < 0) + return NULL; + + /* ignore optional hints at the end of the instructions */ + if (name[i] == '+' || name[i] == '-') + i--; + + if (name[i] == 'l' || (name[i] == 'a' && name[i-1] == 'l')) { + /* +* if the instruction ends up with 'l' or 'la', then +* those are considered 'calls' since they update LR. +* ... except for 'bnl' which is branch if not less than +* and the absolute form of the same. 
+*/ + if (strcmp(name, "bnl") && strcmp(name, "bnl+") && + strcmp(name, "bnl-") && strcmp(name, "bnla") && + strcmp(name, "bnla+") && strcmp(name, "bnla-")) + ins->ops = _ops; + } + if (name[i] == 'r' && name[i-1] == 'l') + /* +* instructions ending with 'lr' are considered to be +* return instructions +*/ + ins->ops = _ops; + + return ins; + } + return NULL; +} + static void ins__sort(struct ins *instructions, int nmemb) { qsort(instructions, nmemb, sizeof(struct ins), ins__cmp); @@ -511,6 +573,8 @@ static struct ins *ins__find(const char *name, const char *norm_arch) } else if (!strcmp(norm_arch, "arm")) { instructions = instructions_arm; nmemb = ARRAY_SIZE(instructions_arm); + } else if (!strcmp(norm_arch, "powerpc")) { + return ins__find_powerpc(name); } else { pr_err("perf annotate not supported by %s arch\n", norm_arch); return NULL; -- 2.5.5
[PATCH 0/4] perf annotate: Enable cross arch annotate
Currently, perf supports code navigation (branches and calls) in
annotate only when it is run on the same architecture on which
perf.data was recorded. Cross-arch annotate is not supported.

This patchset enables cross-arch annotate. It uses the x86 and arm
instruction tables that are already available and adds support for
powerpc as well. Adding support for other architectures should be easy.

I've created this series on top of acme/perf/core and tested it on x86
and powerpc only.

Example:

  Record on powerpc:
  $ ./perf record -a

  Report -> Annotate on x86:
  $ ./perf report -i perf.data.powerpc --vmlinux vmlinux.powerpc

Changes in [PATCH] vs [RFC]:
  - Removed global var 'arch__ins'; arch info is now passed down to
    ins__find.

Naveen N. Rao (1):
  perf annotate: add powerpc support

Ravi Bangoria (3):
  perf: Utility function to fetch arch
  perf annotate: Enable cross arch annotate
  perf annotate: Define macro for arch names

 tools/perf/arch/common.c           |  36 +++
 tools/perf/arch/common.h           |  11 +++
 tools/perf/builtin-top.c           |   2 +-
 tools/perf/ui/browsers/annotate.c  |   3 +-
 tools/perf/ui/gtk/annotate.c       |   2 +-
 tools/perf/util/annotate.c         | 198 -
 tools/perf/util/annotate.h         |   5 +-
 tools/perf/util/evsel.c            |   7 ++
 tools/perf/util/evsel.h            |   2 +
 tools/perf/util/unwind-libunwind.c |   4 +-
 10 files changed, 198 insertions(+), 72 deletions(-)

-- 
2.5.5
Re: [PATCH 3/4] perf annotate: add powerpc support
Thanks David.

On Tuesday 28 June 2016 09:37 PM, David Laight wrote:
> From: Ravi Bangoria
> Sent: 28 June 2016 12:37
>> Powerpc has long list of branch instructions and hardcoding them in
>> table appears to be error-prone. So, add new function to find
>> instruction instead of creating table.
>>
>> Signed-off-by: Naveen N. Rao <naveen.n@linux.vnet.ibm.com>
>> Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com>
>> ---
>>  tools/perf/util/annotate.c | 64 ++
>>  1 file changed, 64 insertions(+)
>>
>> diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
>> index 36a5825..96c6610 100644
>> --- a/tools/perf/util/annotate.c
>> +++ b/tools/perf/util/annotate.c
>> @@ -476,6 +476,68 @@ static int ins__cmp(const void *a, const void *b)
>>  	return strcmp(ia->name, ib->name);
>>  }
>>
>> +static struct ins *ins__find_powerpc(const char *name)
>
> It would be better if the function name include 'branch'.
>
>> +{
>> +	int i;
>> +	struct ins *ins;
>> +
>> +	ins = zalloc(sizeof(struct ins));
>> +	if (!ins)
>> +		return NULL;
>> +
>> +	ins->name = strdup(name);
>> +	if (!ins->name)
>> +		return NULL;
>
> You leak 'ins' here.
>
>> +
>> +	if (name[0] == 'b') {
>> +		/* branch instructions */
>> +		ins->ops = _ops;
>> +
>> +		/*
>> +		 * - Few start with 'b', but aren't branch instructions.
>> +		 * - Let's also ignore instructions involving 'ctr' and
>> +		 *   'tar' since target branch addresses for those can't
>> +		 *   be determined statically.
>> +		 */
>> +		if (!strncmp(name, "bcd", 3)   ||
>> +		    !strncmp(name, "brinc", 5) ||
>> +		    !strncmp(name, "bper", 4)  ||
>> +		    strstr(name, "ctr")        ||
>> +		    strstr(name, "tar"))
>> +			return NULL;
>
> More importantly you leak 'ins' and 'ins->name' here.
> And on other paths below.

Yes, fair points.

I could keep a linked list of the allocated instructions and look it up
each time before allocating memory. But then I'd need to free that
memory at the end, and it gets complicated.

Alternatively, I can go back to the usual approach of a table for
powerpc. That is the simplest option. The only problem is that powerpc
has around 400 branch instructions (including call and ret), and listing
them all is error-prone.

Suggestions?

- Ravi

> ...
>
> David
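For reference, the leaks pointed out in the review can be avoided without a cache or a table by rejecting non-candidates before allocating and by funnelling every failure through one cleanup label. The sketch below assumes a reduced struct ins with a hypothetical is_branch flag instead of the real ops pointer, plain calloc() in place of perf's zalloc(), and invented names ins__find_powerpc_branch() and ins__free(). It shows only the ownership pattern and omits the call/return classification.

```c
#define _POSIX_C_SOURCE 200809L /* for strdup() under strict ISO C modes */
#include <stdlib.h>
#include <string.h>

/* Reduced stand-in for perf's struct ins; is_branch is hypothetical. */
struct ins {
	char *name;
	int is_branch;
};

/* Release an ins returned by ins__find_powerpc_branch(); NULL is fine. */
static void ins__free(struct ins *ins)
{
	if (ins) {
		free(ins->name);
		free(ins);
	}
}

/* Returns a heap-allocated ins for a branch mnemonic, or NULL. */
static struct ins *ins__find_powerpc_branch(const char *name)
{
	struct ins *ins;

	/* Reject everything we will not handle before allocating. */
	if (name[0] != 'b')
		return NULL;
	if (!strncmp(name, "bcd", 3) || !strncmp(name, "brinc", 5) ||
	    !strncmp(name, "bper", 4) || strstr(name, "ctr") ||
	    strstr(name, "tar"))
		return NULL;

	ins = calloc(1, sizeof(*ins));
	if (!ins)
		return NULL;

	ins->name = strdup(name);
	if (!ins->name)
		goto out_free;

	ins->is_branch = 1;
	return ins;

out_free:
	/* Single exit for failures after allocation: nothing is leaked. */
	ins__free(ins);
	return NULL;
}
```

The caller owns the returned object and has to call ins__free() on it. Whether that fits the lifetime of disassembly lines in annotate is a separate question that this sketch does not settle.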
Re: [PATCH v2 2/3] perf kvm: enable record|report feature on powerpc
Hi acme,

On Tuesday 02 February 2016 02:36 PM, Ravi Bangoria wrote:
> Hi acme,
>
> On Tuesday 02 February 2016 02:36 AM, Arnaldo Carvalho de Melo wrote:
>> On Fri, Jan 22, 2016 at 11:28:11AM +0530, Ravi Bangoria wrote:
>>> +	return event->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
>>> +}
>>
>> This hunk and the next should be on the previous patch, that is not
>> even compiling...
>>
>> You have to compile patch by patch, we can't just test at the end of a
>> patchkit like this, this destroys bisection ;-\
>
> I wasn't aware of that. I'll make sure each patch compiles on its own
> from now on.
>
>> Also you first need to put in place a way to override how to obtain
>> the cpumode, then you should use it.
>>
>> Also this mode doesn't look feasible at all, think about processing
>> perf.data files generated in !powerpc systems being analysed in a
>> powerpc system.
>>
>> This has to be dependent on the architecture of the machine where the
>> perf.data file was recorded, not on the architecture of the machine
>> the binary was built for.
>
> Valid point. I'll rethink the approach for this case.

I've analyzed the approach. Here are my observations:

1. With the current approach, record on !powerpc and report on powerpc
   will work, because we depend solely on the tracepoint: we don't
   change the ip and cpumode of a sample unless it comes from
   kvm_hv:kvm_guest_exit.

2. However, record on powerpc and report on !powerpc won't work with
   the current approach. To enable that, we have two options:

   Option A. Change the ip and cpumode of the sample at record time.
   This adds overhead while recording and may have side effects such
   as lost data.

   Option B. Extend the current approach (change ip and cpumode at
   report time only). I'll need to move most of the code from
   arch/powerpc/util/kvm.c into common code that is built on all
   architectures, and use it to decide at run time whether to change
   the ip and cpumode of a sample. These functions then need to be
   present in the binary no matter which platform it is compiled on.

I'd like your suggestions on how best to achieve that.

Regards,
Ravi
Re: [PATCH v2 1/3] perf kvm: Introduce evsel as argument to perf_event__preprocess_sample
Hi acme,

Thanks for reviewing the patch.

On Tuesday 02 February 2016 02:23 AM, Arnaldo Carvalho de Melo wrote:
> On Fri, Jan 22, 2016 at 11:28:10AM +0530, Ravi Bangoria wrote:
>> This patch changes prototype of perf_event__preprocess_sample() with
>> additional argument evsel added at last. This change is required
>> because perf_event__preprocess_sample() function will use evsel to
>> determine cpumode of samples for powerpc architecture.
>>
>> Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com>
>
> Fixing these problems:
>
>   CC /tmp/build/perf/ui/gtk/util.o
> util/event.c: In function ‘perf_event__preprocess_sample’:
> util/event.c:1302:26: error: unused parameter ‘evsel’ [-Werror=unused-parameter]
>     struct perf_evsel *evsel)
>                        ^
> cc1: all warnings being treated as errors
> mv: cannot stat ‘/tmp/build/perf/util/.event.o.tmp’: No such file or directory
> /home/acme/git/linux/tools/build/Makefile.build:77: recipe for target '/tmp/build/perf/util/event.o' failed
> make[3]: *** [/tmp/build/perf/util/event.o] Error 1
> make[3]: *** Waiting for unfinished jobs
>   CC /tmp/build/perf/ui/gtk/helpline.o
>   CC /tmp/build/perf/arch/common.o
> /home/acme/git/linux/tools/build/Makefile.build:116: recipe for target 'util' failed
> make[2]: *** [util] Error 2
> make[2]: *** Waiting for unfinished jobs

Thanks for pointing this out. I was not aware of it and will take care
of it from now on.

Regards,
Ravi
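The build failure above is what happens when a parameter is added to a prototype before anything uses it while the tree is built with -Werror=unused-parameter. A minimal sketch of the usual remedy, marking the parameter with __maybe_unused until a later patch starts using it, is shown below. The macro is defined locally so the example stands alone (perf takes it from its own headers), and preprocess_sample() is a hypothetical reduced function, not the real perf_event__preprocess_sample().

```c
#include <stddef.h>

/* perf provides this macro itself; defined here so the sketch stands alone. */
#ifndef __maybe_unused
#define __maybe_unused __attribute__((unused))
#endif

struct perf_evsel;                 /* opaque stand-in */
struct perf_sample { int pid; };   /* reduced stand-in */

/*
 * evsel is part of the signature but not used yet. The annotation keeps
 * -Werror=unused-parameter quiet until a later patch consumes it.
 */
static int preprocess_sample(struct perf_sample *sample,
			     struct perf_evsel *evsel __maybe_unused)
{
	return sample ? 0 : -1;
}
```

This keeps every patch in a series buildable on its own, which is what bisection needs. The annotation is dropped in the patch that starts using the parameter.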
Re: [PATCH v2 2/3] perf kvm: enable record|report feature on powerpc
Hi acme,

On Tuesday 02 February 2016 02:36 AM, Arnaldo Carvalho de Melo wrote:
> On Fri, Jan 22, 2016 at 11:28:11AM +0530, Ravi Bangoria wrote:
>> +	return event->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
>> +}
>
> This hunk and the next should be on the previous patch, that is not
> even compiling...
>
> You have to compile patch by patch, we can't just test at the end of a
> patchkit like this, this destroys bisection ;-\

I wasn't aware of that. I'll make sure each patch compiles on its own
from now on.

> Also you first need to put in place a way to override how to obtain
> the cpumode, then you should use it.
>
> Also this mode doesn't look feasible at all, think about processing
> perf.data files generated in !powerpc systems being analysed in a
> powerpc system.
>
> This has to be dependent on the architecture of the machine where the
> perf.data file was recorded, not on the architecture of the machine
> the binary was built for.

Valid point. I'll rethink the approach for this case.

> It is only when you do live analysis, like with 'perf trace' and 'perf
> top' that it's guaranteed to be all on the same machine.
>
> IIRC in one of the patches in this series you introduce and use a
> library function on the same patch, please break it into two patches
> as well, lemme see what is the name... Yeah, it is also in this patch:
>
> perf_evlist__arch_add_default(struct perf_evlist *evlist)
>
> Please add this in a separate patch, stating in the changeset comment
> why it is needed and how architectures can override it.

Will do that. Thanks for reviewing.

Regards,
Ravi
[RFC 0/4] perf kvm: Guest Symbol Resolution for powerpc
The design of the [patch v2] Guest Symbol Resolution series focused on
enabling perf kvm {record|report} on powerpc. Here is the link to it:
thread.gmane.org/gmane.linux.kernel/2132409

As acme pointed out, that design does not allow cross-arch reporting,
i.e. record on powerpc and report on !powerpc. This series aims to
enable cross-arch reporting along with perf kvm {record|report} on
powerpc. Note that the basic principle of enabling perf kvm
{record|report} on powerpc through the tracepoint kvm_hv:kvm_guest_exit
has not changed.

The major change between [patch v2] and this [RFC] is that I've moved
the 'perf kvm report' related, ppc-specific functionality from
tools/perf/arch/powerpc/ to generic tools/perf/ code. This is required
because the perf binary needs the ppc-specific code even when it is
compiled on !ppc, so that cross-arch reporting works.

I need suggestions specifically on patch 3 (Enable 'report' on
powerpc), which puts arch-specific code in a generic area. Right now
I've added the code to util/evsel.c, but please let me know if there is
a better place for it.

This series is meant to gather feedback on the approach, so I've tagged
it as RFC and am not continuing the patch version numbering.

Ravi Bangoria (4): perf kvm: Enable 'record' on powerpc perf kvm: Introduce evsel as argument to perf_event__preprocess_sample perf kvm: Enable 'report' on powerpc perf kvm: Fix output fields instead of 'trace' for perf kvm report on powerpc tools/perf/arch/powerpc/util/Build | 1 + tools/perf/arch/powerpc/util/kvm.c | 18 + tools/perf/builtin-annotate.c | 3 +- tools/perf/builtin-diff.c | 3 +- tools/perf/builtin-mem.c | 10 +++-- tools/perf/builtin-report.c| 8 +++- tools/perf/builtin-script.c| 3 +- tools/perf/builtin-timechart.c | 8 ++-- tools/perf/builtin-top.c | 3 +- tools/perf/tests/hists_cumulate.c | 2 +- tools/perf/tests/hists_filter.c| 2 +- tools/perf/tests/hists_link.c | 4 +- tools/perf/tests/hists_output.c| 2 +- tools/perf/util/event.c| 8 ++-- tools/perf/util/event.h| 3 +- tools/perf/util/evlist.c | 9 + tools/perf/util/evlist.h | 1 + tools/perf/util/evsel.c| 77 ++ tools/perf/util/evsel.h| 7 tools/perf/util/session.c | 7 ++-- tools/perf/util/util.c | 5 +++ tools/perf/util/util.h | 1 + 22 files changed, 161 insertions(+), 24 deletions(-) create mode 100644 tools/perf/arch/powerpc/util/kvm.c -- 2.1.4
[RFC 2/4] perf kvm: Introduce evsel as argument to perf_event__preprocess_sample
This patch changes prototype of perf_event__preprocess_sample() with additional argument evsel added at the end. This change is required because perf_event__preprocess_sample() function will use evsel to determine cpumode of samples for powerpc architecture. Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com> --- tools/perf/builtin-annotate.c | 3 ++- tools/perf/builtin-diff.c | 3 ++- tools/perf/builtin-mem.c | 10 ++ tools/perf/builtin-report.c | 3 ++- tools/perf/builtin-script.c | 3 ++- tools/perf/builtin-timechart.c| 8 +--- tools/perf/builtin-top.c | 3 ++- tools/perf/tests/hists_cumulate.c | 2 +- tools/perf/tests/hists_filter.c | 2 +- tools/perf/tests/hists_link.c | 4 ++-- tools/perf/tests/hists_output.c | 2 +- tools/perf/util/event.c | 3 ++- tools/perf/util/event.h | 3 ++- 13 files changed, 30 insertions(+), 19 deletions(-) diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c index cfe3663..da330ae 100644 --- a/tools/perf/builtin-annotate.c +++ b/tools/perf/builtin-annotate.c @@ -94,7 +94,8 @@ static int process_sample_event(struct perf_tool *tool, struct addr_location al; int ret = 0; - if (perf_event__preprocess_sample(event, machine, , sample) < 0) { + if (perf_event__preprocess_sample(event, machine, , + sample, evsel) < 0) { pr_warning("problem processing %d event, skipping it.\n", event->header.type); return -1; diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c index 36ccc2b..d2a27fe 100644 --- a/tools/perf/builtin-diff.c +++ b/tools/perf/builtin-diff.c @@ -330,7 +330,8 @@ static int diff__process_sample_event(struct perf_tool *tool __maybe_unused, struct hists *hists = evsel__hists(evsel); int ret = -1; - if (perf_event__preprocess_sample(event, machine, , sample) < 0) { + if (perf_event__preprocess_sample(event, machine, , + sample, evsel) < 0) { pr_warning("problem processing %d event, skipping it.\n", event->header.type); return -1; diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c index 
b3f8a89..a7c01fe 100644 --- a/tools/perf/builtin-mem.c +++ b/tools/perf/builtin-mem.c @@ -118,13 +118,15 @@ static int dump_raw_samples(struct perf_tool *tool, union perf_event *event, struct perf_sample *sample, -struct machine *machine) +struct machine *machine, +struct perf_evsel *evsel) { struct perf_mem *mem = container_of(tool, struct perf_mem, tool); struct addr_location al; const char *fmt; - if (perf_event__preprocess_sample(event, machine, , sample) < 0) { + if (perf_event__preprocess_sample(event, machine, , + sample, evsel) < 0) { fprintf(stderr, "problem processing %d event, skipping it.\n", event->header.type); return -1; @@ -168,10 +170,10 @@ out_put: static int process_sample_event(struct perf_tool *tool, union perf_event *event, struct perf_sample *sample, - struct perf_evsel *evsel __maybe_unused, + struct perf_evsel *evsel, struct machine *machine) { - return dump_raw_samples(tool, event, sample, machine); + return dump_raw_samples(tool, event, sample, machine, evsel); } static int report_raw_events(struct perf_mem *mem) diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index 760e886..31ec4ba 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -154,7 +154,8 @@ static int process_sample_event(struct perf_tool *tool, }; int ret = 0; - if (perf_event__preprocess_sample(event, machine, , sample) < 0) { + if (perf_event__preprocess_sample(event, machine, , + sample, evsel) < 0) { pr_debug("problem processing %d event, skipping it.\n", event->header.type); return -1; diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c index f4caf48..792868e 100644 --- a/tools/perf/builtin-script.c +++ b/tools/perf/builtin-script.c @@ -804,7 +804,8 @@ static int process_sample_event(struct perf_tool *tool, return 0; } - if (perf_event__preprocess_sample(event, machine, , sample) < 0) { + if (perf_event__preprocess_sample(event, machine, , +
[RFC 3/4] perf kvm: Enable 'report' on powerpc
'perf kvm record' on powerpc will record kvm_hv:kvm_guest_exit event instead of cycles. However, to have some kind of periodicity, we can't use all the kvm exits, rather exits which are bound to happen in certain intervals. HV_DECREMENTER Interrupt forces the threads to exit after an interval of 10 ms. This patch makes use of the 'kvm_guest_exit' tracepoint and checks the exit reason for any kvm exit. If it is HV_DECREMENTER, then the instruction pointer dumped along with this tracepoint is retrieved and mapped with the guest kallsyms. Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com> Signed-off-by: Hemant Kumar <hem...@linux.vnet.ibm.com> --- tools/perf/util/event.c | 7 +++-- tools/perf/util/evsel.c | 77 +++ tools/perf/util/evsel.h | 7 + tools/perf/util/session.c | 7 +++-- 4 files changed, 92 insertions(+), 6 deletions(-) diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c index bc0a3f0..31bbc50 100644 --- a/tools/perf/util/event.c +++ b/tools/perf/util/event.c @@ -1299,15 +1299,16 @@ int perf_event__preprocess_sample(const union perf_event *event, struct machine *machine, struct addr_location *al, struct perf_sample *sample, - struct perf_evsel *evsel __maybe_unused) + struct perf_evsel *evsel) { - u8 cpumode = event->header.misc & PERF_RECORD_MISC_CPUMODE_MASK; + u8 cpumode; struct thread *thread = machine__findnew_thread(machine, sample->pid, sample->tid); - if (thread == NULL) return -1; + al->cpumode = cpumode = arch__get_cpumode(event, evsel, sample); + dump_printf(" ... thread: %s:%d\n", thread__comm_str(thread), thread->tid); /* * Have we already created the kernel maps for this machine? 
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c index 0902fe4..a4d309e 100644 --- a/tools/perf/util/evsel.c +++ b/tools/perf/util/evsel.c @@ -1622,6 +1622,82 @@ static inline bool overflow(const void *endp, u16 max_size, const void *offset, #define OVERFLOW_CHECK_u64(offset) \ OVERFLOW_CHECK(offset, sizeof(u64), sizeof(u64)) +#define KVMPPC_EXIT "kvm_hv:kvm_guest_exit" +#define HV_DECREMENTER 2432 +#define HV_BIT 3 +#define PR_BIT 49 +#define PPC_MAX 63 + +bool is_kvmppc_exit_event(struct perf_evsel *evsel) +{ + static unsigned int kvmppc_exit; + + if (evsel->attr.type != PERF_TYPE_TRACEPOINT) + return false; + + if (unlikely(kvmppc_exit == 0)) { + if (strcmp(KVMPPC_EXIT, evsel->name)) + return false; + kvmppc_exit = evsel->attr.config; + } else if (kvmppc_exit != evsel->attr.config) { + return false; + } + + return true; +} + +bool is_hv_dec_trap(struct perf_evsel *evsel, struct perf_sample *sample) +{ + int trap = perf_evsel__intval(evsel, sample, "trap"); + return trap == HV_DECREMENTER; +} + +bool is_perf_data_reorded_on_ppc(struct perf_evlist *evlist) +{ + if (evlist && evlist->env && evlist->env->arch) + return !strcmp(evlist->env->arch, "ppc64") || + !strcmp(evlist->env->arch, "ppc64le"); + return false; +} + +u8 arch__get_cpumode(const union perf_event *event, +struct perf_evsel *evsel, +struct perf_sample *sample) +{ + unsigned long hv, pr, msr; + u8 cpumode = event->header.misc & PERF_RECORD_MISC_CPUMODE_MASK; + + if (!(is_perf_data_reorded_on_ppc(evsel->evlist) && + perf_guest_only() && + is_kvmppc_exit_event(evsel))) + goto ret; + + if (sample->raw_data && is_hv_dec_trap(evsel, sample)) { + msr = perf_evsel__intval(evsel, sample, "msr"); + hv = msr & ((unsigned long)1 << (PPC_MAX - HV_BIT)); + pr = msr & ((unsigned long)1 << (PPC_MAX - PR_BIT)); + + if (!hv && pr) + cpumode = PERF_RECORD_MISC_GUEST_USER; + else + cpumode = PERF_RECORD_MISC_GUEST_KERNEL; + } + +ret: + return cpumode; +} + +u64 arch__get_ip(struct perf_evsel *evsel, struct 
perf_sample *sample) +{ + if (is_perf_data_reorded_on_ppc(evsel->evlist) && + perf_guest_only() && + is_kvmppc_exit_event(evsel) && + is_hv_dec_trap(evsel, sample)) + return perf_evsel__intval(evsel, sample, "pc"); + + return sample->ip;
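The cpumode decision in arch__get_cpumode() comes down to testing two MSR bits given in IBM bit numbering, where bit 0 is the most significant bit of the 64-bit register. A minimal standalone sketch of that test follows. The enum and the function name msr_to_guest_mode() are hypothetical, and the sketch uses uint64_t where the patch uses unsigned long, on the assumption that the mask must stay 64 bits wide when perf is built on a 32-bit host for cross-arch reporting.

```c
#include <stdint.h>

/* IBM bit numbering: bit 0 is the MSB of the 64-bit MSR. */
#define PPC_MAX 63
#define HV_BIT  3   /* hypervisor state */
#define PR_BIT  49  /* problem (user) state */

/* Hypothetical result type; perf uses PERF_RECORD_MISC_GUEST_* values. */
enum guest_mode { GUEST_KERNEL, GUEST_USER };

static enum guest_mode msr_to_guest_mode(uint64_t msr)
{
	/* Convert IBM bit numbers to conventional shift counts. */
	uint64_t hv = msr & (UINT64_C(1) << (PPC_MAX - HV_BIT));
	uint64_t pr = msr & (UINT64_C(1) << (PPC_MAX - PR_BIT));

	/* User mode only when PR is set and HV is clear, as in the patch. */
	return (!hv && pr) ? GUEST_USER : GUEST_KERNEL;
}
```

HV_BIT 3 therefore maps to shift 60 and PR_BIT 49 to shift 14. Every sample that is not clearly guest user is treated as guest kernel, mirroring the else branch in the patch.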
[RFC 4/4] perf kvm: Fix output fields instead of 'trace' for perf kvm report on powerpc
Commit d49dadea7862 ("perf tools: Make 'trace' or 'trace_fields' sort
key default for tracepoint events") makes 'trace' the default sort key
when displaying a report for tracepoint events. Because the tracepoint
kvm_hv:kvm_guest_exit is used as the default event, perf kvm report
displays a list of tracepoint hits instead of the normal report
columns.

This patch replaces the 'trace' field with 'overhead,comm,dso,sym' when
displaying a perf kvm report for powerpc.

Before applying patch:

$ ./perf kvm --guestkallsyms=guest.kallsyms --guestmodules=guest.modules report --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 181K of event 'kvm_hv:kvm_guest_exit'
# Event count (approx.): 181061
#
# Overhead  Trace output
# .
#
     0.02%  VCPU 8: trap=HV_DECREMENTER pc=0xc0091924 msr=0x80009032, ceded=0
     0.00%  VCPU 0: trap=HV_DECREMENTER pc=0xc0091924 msr=0x80009032, ceded=0
     0.00%  VCPU 8: trap=HV_DECREMENTER pc=0x10005c7c msr=0x8280f032, ceded=0
     0.00%  VCPU 8: trap=HV_DECREMENTER pc=0x1001ef14 msr=0x8280f032, ceded=0
     0.00%  VCPU 8: trap=HV_DECREMENTER pc=0x3fff83398830 msr=0x8280f032, ceded=0
     0.00%  VCPU 8: trap=HV_DECREMENTER pc=0x3fff833a6fe4 msr=0x8280f032, ceded=0
     0.00%  VCPU 8: trap=HV_DECREMENTER pc=0x3fff833a7a64 msr=0x8280f032, ceded=0

After applying patch:

$ ./perf kvm --guestkallsyms=guest.kallsyms --guestmodules=guest.modules report --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 181K of event 'kvm_hv:kvm_guest_exit'
# Event count (approx.): 181061
#
# Overhead  Command  Shared Object            Symbol
# ... ... ..
#
     0.02%  :57276   [guest.kernel.kallsyms]  [g] .plpar_hcall_norets
     0.00%  :57274   [guest.kernel.kallsyms]  [g] .plpar_hcall_norets
     0.00%  :57276   [guest.kernel.kallsyms]  [g] .__copy_tofrom_user_power7
     0.00%  :57276   [guest.kernel.kallsyms]  [g] ._atomic_dec_and_lock
     0.00%  :57276   [guest.kernel.kallsyms]  [g] ._raw_spin_lock
     0.00%  :57276   [guest.kernel.kallsyms]  [g] ._switch
     0.00%  :57276   [guest.kernel.kallsyms]  [g] .bio_add_page
     0.00%  :57276   [guest.kernel.kallsyms]  [g] .kmem_cache_alloc

Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com>
---
 tools/perf/builtin-report.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 31ec4ba..5d96882 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -930,6 +930,11 @@ repeat:
 	else
 		use_browser = 0;
 
+	if (!field_order &&
+	    is_perf_data_reorded_on_ppc(session->evlist) &&
+	    perf_guest_only())
+		field_order = "overhead,comm,dso,sym";
+
 	if (setup_sorting(session->evlist) < 0) {
 		if (sort_order)
 			parse_options_usage(report_usage, options, "s", 1);
-- 
2.1.4
[RFC 1/4] perf kvm: Enable 'record' on powerpc
'perf kvm record' is not available on powerpc because 'perf' relies on the 'cycles' event (a PMU event) to profile the guest. However, for powerpc, this can't be used from the host because the PMUs are controlled by the guest rather than the host. There exists a tracepoint 'kvm_hv:kvm_guest_exit' in powerpc which is hit whenever any of the threads exit the guest context. The guest instruction pointer dumped along with this tracepoint data in the field 'pc', can be used as guest instruction pointer. This patch changes default event as kvm_hv:kvm_guest_exit for recording guest data in host on powerpc. As we are using host event to record guest data, this approach will enable only --guest option of 'perf kvm'. Still --host --guest together won't work. Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com> --- tools/perf/arch/powerpc/util/Build | 1 + tools/perf/arch/powerpc/util/kvm.c | 18 ++ tools/perf/util/evlist.c | 9 + tools/perf/util/evlist.h | 1 + tools/perf/util/util.c | 5 + tools/perf/util/util.h | 1 + 6 files changed, 35 insertions(+) create mode 100644 tools/perf/arch/powerpc/util/kvm.c diff --git a/tools/perf/arch/powerpc/util/Build b/tools/perf/arch/powerpc/util/Build index c8fe207..4cb620d 100644 --- a/tools/perf/arch/powerpc/util/Build +++ b/tools/perf/arch/powerpc/util/Build @@ -1,6 +1,7 @@ libperf-y += header.o libperf-y += sym-handling.o libperf-y += kvm-stat.o +libperf-y += kvm.o libperf-$(CONFIG_DWARF) += dwarf-regs.o libperf-$(CONFIG_DWARF) += skip-callchain-idx.o diff --git a/tools/perf/arch/powerpc/util/kvm.c b/tools/perf/arch/powerpc/util/kvm.c new file mode 100644 index 000..878d323 --- /dev/null +++ b/tools/perf/arch/powerpc/util/kvm.c @@ -0,0 +1,18 @@ +#include +#include "../../../util/evsel.h" +#include "../../../util/evlist.h" + +int perf_evlist__arch_add_default(struct perf_evlist *evlist) +{ + struct perf_evsel *evsel; + + if (!perf_guest_only()) + return -1; + + evsel = perf_evsel__newtp_idx("kvm_hv", "kvm_guest_exit", 0); + if 
(IS_ERR(evsel)) + return PTR_ERR(evsel); + + perf_evlist__add(evlist, evsel); + return 0; +} diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c index c42e196..8b7b84f 100644 --- a/tools/perf/util/evlist.c +++ b/tools/perf/util/evlist.c @@ -231,6 +231,12 @@ void perf_event_attr__set_max_precise_ip(struct perf_event_attr *attr) } } +int __weak +perf_evlist__arch_add_default(struct perf_evlist *evlist __maybe_unused) +{ + return -1; +} + int perf_evlist__add_default(struct perf_evlist *evlist) { struct perf_event_attr attr = { @@ -239,6 +245,9 @@ int perf_evlist__add_default(struct perf_evlist *evlist) }; struct perf_evsel *evsel; + if (!perf_evlist__arch_add_default(evlist)) + return 0; + event_attr_init(); perf_event_attr__set_max_precise_ip(); diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h index a0d1522..7951406 100644 --- a/tools/perf/util/evlist.h +++ b/tools/perf/util/evlist.h @@ -75,6 +75,7 @@ void perf_evlist__delete(struct perf_evlist *evlist); void perf_evlist__add(struct perf_evlist *evlist, struct perf_evsel *entry); void perf_evlist__remove(struct perf_evlist *evlist, struct perf_evsel *evsel); +int perf_evlist__arch_add_default(struct perf_evlist *evlist); int perf_evlist__add_default(struct perf_evlist *evlist); int __perf_evlist__add_default_attrs(struct perf_evlist *evlist, struct perf_event_attr *attrs, size_t nr_attrs); diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c index 35b20dd..567e3da 100644 --- a/tools/perf/util/util.c +++ b/tools/perf/util/util.c @@ -37,6 +37,11 @@ bool test_attr__enabled; bool perf_host = true; bool perf_guest = false; +bool perf_guest_only(void) +{ + return !perf_host && perf_guest; +} + void event_attr_init(struct perf_event_attr *attr) { if (!perf_host) diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h index 3dd0408..c459c45 100644 --- a/tools/perf/util/util.h +++ b/tools/perf/util/util.h @@ -344,5 +344,6 @@ int fetch_kernel_version(unsigned int *puint, const char 
*perf_tip(const char *dirpath); bool is_regular_file(const char *file); int fetch_current_timestamp(char *buf, size_t sz); +bool perf_guest_only(void); #endif /* GIT_COMPAT_UTIL_H */ -- 2.1.4
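The weak-default pattern used by this patch can be sketched in isolation. This is a minimal illustration, not perf code: struct evlist_sketch, arch_add_default(), ppc_add_default() and add_default() are hypothetical names, and the powerpc-style body lives under its own name here only because a weak and a strong definition of the same symbol cannot share one file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for perf's globals; the names mirror the patch. */
bool perf_host = true;
bool perf_guest = false;

bool perf_guest_only(void)
{
	return !perf_host && perf_guest;
}

/* Hypothetical miniature event list: it only remembers one event name. */
struct evlist_sketch {
	const char *default_event;
};

/*
 * Weak generic fallback: -1 means "no arch-specific default".
 * An architecture may supply a strong definition in another object
 * file, and the linker then prefers that one.
 */
__attribute__((weak))
int arch_add_default(struct evlist_sketch *evlist)
{
	(void)evlist;
	return -1;
}

/* What a powerpc-style strong definition could look like (hypothetical). */
int ppc_add_default(struct evlist_sketch *evlist)
{
	assert(evlist != NULL);
	if (!perf_guest_only())
		return -1;
	evlist->default_event = "kvm_hv:kvm_guest_exit";
	return 0;
}

/* Generic caller: try the arch hook first, then fall back to cycles. */
int add_default(struct evlist_sketch *evlist)
{
	assert(evlist != NULL);
	if (arch_add_default(evlist) == 0)
		return 0;
	evlist->default_event = "cycles";
	return 0;
}
```

The design choice is that generic code never names an architecture: it calls the hook and treats a non-zero return as "use the usual default".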
[PATCH v2 1/3] perf kvm: Introduce evsel as argument to perf_event__preprocess_sample
This patch changes prototype of perf_event__preprocess_sample() with additional argument evsel added at last. This change is required because perf_event__preprocess_sample() function will use evsel to determine cpumode of samples for powerpc architecture. Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com> --- changes in v2: - Breakdown of v1 patch into two sub patches tools/perf/builtin-annotate.c | 3 ++- tools/perf/builtin-diff.c | 3 ++- tools/perf/builtin-mem.c | 10 ++ tools/perf/builtin-report.c | 3 ++- tools/perf/builtin-script.c | 3 ++- tools/perf/builtin-timechart.c| 8 +--- tools/perf/builtin-top.c | 3 ++- tools/perf/tests/hists_cumulate.c | 2 +- tools/perf/tests/hists_filter.c | 2 +- tools/perf/tests/hists_link.c | 4 ++-- tools/perf/tests/hists_output.c | 2 +- tools/perf/util/event.c | 3 ++- tools/perf/util/event.h | 3 ++- 13 files changed, 30 insertions(+), 19 deletions(-) diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c index cc5c126..b488a5c 100644 --- a/tools/perf/builtin-annotate.c +++ b/tools/perf/builtin-annotate.c @@ -94,7 +94,8 @@ static int process_sample_event(struct perf_tool *tool, struct addr_location al; int ret = 0; - if (perf_event__preprocess_sample(event, machine, , sample) < 0) { + if (perf_event__preprocess_sample(event, machine, , + sample, evsel) < 0) { pr_warning("problem processing %d event, skipping it.\n", event->header.type); return -1; diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c index 36ccc2b..d2a27fe 100644 --- a/tools/perf/builtin-diff.c +++ b/tools/perf/builtin-diff.c @@ -330,7 +330,8 @@ static int diff__process_sample_event(struct perf_tool *tool __maybe_unused, struct hists *hists = evsel__hists(evsel); int ret = -1; - if (perf_event__preprocess_sample(event, machine, , sample) < 0) { + if (perf_event__preprocess_sample(event, machine, , + sample, evsel) < 0) { pr_warning("problem processing %d event, skipping it.\n", event->header.type); return -1; diff --git 
a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c index 3901700..eb27b49 100644 --- a/tools/perf/builtin-mem.c +++ b/tools/perf/builtin-mem.c @@ -61,13 +61,15 @@ static int dump_raw_samples(struct perf_tool *tool, union perf_event *event, struct perf_sample *sample, -struct machine *machine) +struct machine *machine, +struct perf_evsel *evsel) { struct perf_mem *mem = container_of(tool, struct perf_mem, tool); struct addr_location al; const char *fmt; - if (perf_event__preprocess_sample(event, machine, , sample) < 0) { + if (perf_event__preprocess_sample(event, machine, , + sample, evsel) < 0) { fprintf(stderr, "problem processing %d event, skipping it.\n", event->header.type); return -1; @@ -111,10 +113,10 @@ out_put: static int process_sample_event(struct perf_tool *tool, union perf_event *event, struct perf_sample *sample, - struct perf_evsel *evsel __maybe_unused, + struct perf_evsel *evsel, struct machine *machine) { - return dump_raw_samples(tool, event, sample, machine); + return dump_raw_samples(tool, event, sample, machine, evsel); } static int report_raw_events(struct perf_mem *mem) diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index 2bf537f..fa7bbd9 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -151,7 +151,8 @@ static int process_sample_event(struct perf_tool *tool, }; int ret = 0; - if (perf_event__preprocess_sample(event, machine, , sample) < 0) { + if (perf_event__preprocess_sample(event, machine, , + sample, evsel) < 0) { pr_debug("problem processing %d event, skipping it.\n", event->header.type); return -1; diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c index c691214..4363e8a 100644 --- a/tools/perf/builtin-script.c +++ b/tools/perf/builtin-script.c @@ -783,7 +783,8 @@ static int process_sample_event(struct perf_tool *tool, return 0; } - if (perf_event__preprocess_sample(event, machine, , sample) < 0) { + if (perf_event__preprocess_
[PATCH v2 3/3] perf kvm: Fix output fields instead of 'trace' for perf kvm report on powerpc
commit d49dadea7862 ("perf tools: Make 'trace' or 'trace_fields' sort key default for tracepoint events") makes 'trace' sort key as a default while displaying report for tracepoint. As tracepoint(kvm_hv:kvm_guest_exit) is used as a default event for recording data, perf kvm report will display output as a list of tracepoint hits and not with a normal report columns. This patch will replace 'overhead,comm,dso,sym' fields instead of 'trace' while displaying perf kvm report on powerpc. Before applying patch: $ ./perf kvm --guestkallsyms=guest.kallsyms --guestmodules=guest.modules report --stdio # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 181K of event 'kvm_hv:kvm_guest_exit' # Event count (approx.): 181061 # # Overhead Trace output # . # 0.02% VCPU 8: trap=HV_DECREMENTER pc=0xc0091924 msr=0x80009032, ceded=0 0.00% VCPU 0: trap=HV_DECREMENTER pc=0xc0091924 msr=0x80009032, ceded=0 0.00% VCPU 8: trap=HV_DECREMENTER pc=0x10005c7c msr=0x8280f032, ceded=0 0.00% VCPU 8: trap=HV_DECREMENTER pc=0x1001ef14 msr=0x8280f032, ceded=0 0.00% VCPU 8: trap=HV_DECREMENTER pc=0x3fff83398830 msr=0x8280f032, ceded=0 0.00% VCPU 8: trap=HV_DECREMENTER pc=0x3fff833a6fe4 msr=0x8280f032, ceded=0 0.00% VCPU 8: trap=HV_DECREMENTER pc=0x3fff833a7a64 msr=0x8280f032, ceded=0 After applying patch: $ ./perf kvm --guestkallsyms=guest.kallsyms --guestmodules=guest.modules report --stdio # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 181K of event 'kvm_hv:kvm_guest_exit' # Event count (approx.): 181061 # # Overhead Command Shared ObjectSymbol # ... ... .. 
# 0.02% :57276 [guest.kernel.kallsyms] [g] .plpar_hcall_norets 0.00% :57274 [guest.kernel.kallsyms] [g] .plpar_hcall_norets 0.00% :57276 [guest.kernel.kallsyms] [g] .__copy_tofrom_user_power7 0.00% :57276 [guest.kernel.kallsyms] [g] ._atomic_dec_and_lock 0.00% :57276 [guest.kernel.kallsyms] [g] ._raw_spin_lock 0.00% :57276 [guest.kernel.kallsyms] [g] ._switch 0.00% :57276 [guest.kernel.kallsyms] [g] .bio_add_page 0.00% :57276 [guest.kernel.kallsyms] [g] .kmem_cache_alloc Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com> --- changes in v2: - Fixes output format of perf kvm report on powerpc tools/perf/arch/powerpc/util/kvm.c | 30 ++ tools/perf/builtin-kvm.c | 23 +-- tools/perf/builtin.h | 3 +++ 3 files changed, 50 insertions(+), 6 deletions(-) diff --git a/tools/perf/arch/powerpc/util/kvm.c b/tools/perf/arch/powerpc/util/kvm.c index 317f29a..e5d88cc 100644 --- a/tools/perf/arch/powerpc/util/kvm.c +++ b/tools/perf/arch/powerpc/util/kvm.c @@ -8,11 +8,13 @@ */ #include +#include #include "../../../util/evsel.h" #include "../../../util/evlist.h" #include "../../../util/trace-event.h" #include "../../../util/session.h" #include "../../../util/util.h" +#include "../../../builtin.h" #define KVMPPC_EXIT "kvm_hv:kvm_guest_exit" #define HV_DECREMENTER 2432 @@ -102,3 +104,31 @@ u8 arch__get_cpumode(const union perf_event *event, struct perf_evsel *evsel, ret: return cpumode; } + +const char **arch__cmd_kvm_report_argv(const char *file_name, int argc, + int *rec_argc, const char **argv) +{ + int i = 0, j, arch_argc = 0; + const char **rec_argv; + + if (perf_guest_only()) + arch_argc = 2; + + *rec_argc = argc + arch_argc + 2; + rec_argv = calloc(*rec_argc + 1, sizeof(char *)); + rec_argv[i++] = strdup("report"); + rec_argv[i++] = strdup("-i"); + rec_argv[i++] = strdup(file_name); + + if (arch_argc) { + rec_argv[i++] = strdup("-F"); + rec_argv[i++] = strdup("overhead,comm,dso,sym"); + } + + for (j = 1; j < argc; j++, i++) + rec_argv[i] = argv[j]; + + BUG_ON(i != 
*rec_argc); + + return rec_argv; +} diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c index 4418d92..48455c9 100644 --- a/tools/perf/builtin-kvm.c +++ b/tools/perf/builtin-kvm.c @@ -1480,22 +1480,33 @@ static int __cmd_record(const char *file_name, int argc, const char **argv) return cmd_record(i, rec_argv, NULL); } -static int __cmd_report(const char *file_name, int argc, const char **argv) + +const char
[PATCH v2 0/3] perf kvm: Guest Symbol Resolution for powerpc
] [g] fast_exception_return 0.00% :9690[unknown][u] 0x3fff966eb6a0 0.00% :9690[unknown][u] 0x3fff966fd09c 0.00% :9687[guest.kernel.kallsyms] [g] .__copy_tofrom_user_power7 0.00% :9688[guest.kernel.kallsyms] [g] ._raw_spin_lock_irqsave 0.00% :9688[guest.kernel.kallsyms] [g] .n_tty_write 0.00% :9688[guest.kernel.kallsyms] [g] .plpar_hcall 0.00% :9689[guest.kernel.kallsyms] [g] .__srcu_read_unlock 0.00% :9689[guest.kernel.kallsyms] [g] ._raw_spin_lock 0.00% :9689[guest.kernel.kallsyms] [g] .arch_local_irq_restore Ravi Bangoria (3): perf kvm: Introduce evsel as argument to perf_event__preprocess_sample perf kvm: enable record|report feature on powerpc perf kvm: Fix output fields instead of 'trace' for perf kvm report on powerpc tools/perf/arch/powerpc/util/Build | 1 + tools/perf/arch/powerpc/util/kvm.c | 134 + tools/perf/builtin-annotate.c | 3 +- tools/perf/builtin-diff.c | 3 +- tools/perf/builtin-kvm.c | 23 +-- tools/perf/builtin-mem.c | 10 +-- tools/perf/builtin-report.c| 3 +- tools/perf/builtin-script.c| 3 +- tools/perf/builtin-timechart.c | 8 ++- tools/perf/builtin-top.c | 3 +- tools/perf/builtin.h | 3 + tools/perf/tests/hists_cumulate.c | 2 +- tools/perf/tests/hists_filter.c| 2 +- tools/perf/tests/hists_link.c | 4 +- tools/perf/tests/hists_output.c| 2 +- tools/perf/util/event.c| 15 - tools/perf/util/event.h| 3 +- tools/perf/util/evlist.c | 9 +++ tools/perf/util/evlist.h | 1 + tools/perf/util/evsel.c| 7 ++ tools/perf/util/evsel.h| 4 ++ tools/perf/util/session.c | 9 +-- tools/perf/util/util.c | 5 ++ tools/perf/util/util.h | 1 + 24 files changed, 227 insertions(+), 31 deletions(-) create mode 100644 tools/perf/arch/powerpc/util/kvm.c -- 2.1.4
[PATCH v2 2/3] perf kvm: enable record|report feature on powerpc
This patch contains core logic for enabling perf kvm {record|report} on powerpc. For perf kvm record, This patch will replace default event(cycle) with kvm_hv:kvm_guest_exit while recording guest data from host. For perf kvm report, This patch makes use of the 'kvm_guest_exit' tracepoint and checks the exit reason for any kvm exit. If it is HV_DECREMENTER, then the instruction pointer dumped along with this tracepoint is retrieved and mapped with the guest kallsyms. Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com> Signed-off-by: Hemant Kumar <hem...@linux.vnet.ibm.com> --- changes in v2: - Breakdown of v1 patch into two sub patches - Merged parse-tp.c and evlist.c from tools/perf/arch/powerpc/util/ into single file with name kvm.c tools/perf/arch/powerpc/util/Build | 1 + tools/perf/arch/powerpc/util/kvm.c | 104 + tools/perf/util/event.c| 12 - tools/perf/util/evlist.c | 9 tools/perf/util/evlist.h | 1 + tools/perf/util/evsel.c| 7 +++ tools/perf/util/evsel.h| 4 ++ tools/perf/util/session.c | 9 ++-- tools/perf/util/util.c | 5 ++ tools/perf/util/util.h | 1 + 10 files changed, 147 insertions(+), 6 deletions(-) create mode 100644 tools/perf/arch/powerpc/util/kvm.c diff --git a/tools/perf/arch/powerpc/util/Build b/tools/perf/arch/powerpc/util/Build index 7b8b0d1..eb819e0 100644 --- a/tools/perf/arch/powerpc/util/Build +++ b/tools/perf/arch/powerpc/util/Build @@ -1,5 +1,6 @@ libperf-y += header.o libperf-y += sym-handling.o +libperf-y += kvm.o libperf-$(CONFIG_DWARF) += dwarf-regs.o libperf-$(CONFIG_DWARF) += skip-callchain-idx.o diff --git a/tools/perf/arch/powerpc/util/kvm.c b/tools/perf/arch/powerpc/util/kvm.c new file mode 100644 index 000..317f29a --- /dev/null +++ b/tools/perf/arch/powerpc/util/kvm.c @@ -0,0 +1,104 @@ +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. 
+ * + * Copyright (C) 2016 Hemant Kumar Shaw, IBM Corporation + * Copyright (C) 2016 Ravikumar B. Bangoria, IBM Corporation + */ + +#include +#include "../../../util/evsel.h" +#include "../../../util/evlist.h" +#include "../../../util/trace-event.h" +#include "../../../util/session.h" +#include "../../../util/util.h" + +#define KVMPPC_EXIT "kvm_hv:kvm_guest_exit" +#define HV_DECREMENTER 2432 +#define HV_BIT 3 +#define PR_BIT 49 +#define PPC_MAX 63 + +/* + * To sample for only guest, record kvm_hv:kvm_guest_exit. + * Otherwise go via normal way(cycles). + */ +int perf_evlist__arch_add_default(struct perf_evlist *evlist) +{ + struct perf_evsel *evsel; + + if (!perf_guest_only()) + return -1; + + evsel = perf_evsel__newtp_idx("kvm_hv", "kvm_guest_exit", 0); + if (IS_ERR(evsel)) + return PTR_ERR(evsel); + + perf_evlist__add(evlist, evsel); + return 0; +} + +static bool is_kvmppc_exit_event(struct perf_evsel *evsel) +{ + static unsigned int kvmppc_exit; + + if (evsel->attr.type != PERF_TYPE_TRACEPOINT) + return false; + + if (unlikely(kvmppc_exit == 0)) { + if (strcmp(KVMPPC_EXIT, evsel->name)) + return false; + kvmppc_exit = evsel->attr.config; + } else if (kvmppc_exit != evsel->attr.config) { + return false; + } + + return true; +} + +static bool is_hv_dec_trap(struct perf_evsel *evsel, struct perf_sample *sample) +{ + int trap = perf_evsel__intval(evsel, sample, "trap"); + return trap == HV_DECREMENTER; +} + +/* + * Get the instruction pointer from the tracepoint data + */ +u64 arch__get_ip(struct perf_evsel *evsel, struct perf_sample *sample) +{ + if (perf_guest_only() && + is_kvmppc_exit_event(evsel) && + is_hv_dec_trap(evsel, sample)) + return perf_evsel__intval(evsel, sample, "pc"); + + return sample->ip; +} + +/* + * Get the HV and PR bits and accordingly, determine the cpumode + */ +u8 arch__get_cpumode(const union perf_event *event, struct perf_evsel *evsel, +struct perf_sample *sample) +{ + unsigned long hv, pr, msr; + u8 cpumode = event->header.misc & 
PERF_RECORD_MISC_CPUMODE_MASK; + + if (!perf_guest_only() || !is_kvmppc_exit_event(evsel)) + goto ret; + + if (sample->raw_data && is_hv_dec_trap(evsel, sample)) { + msr = perf_evsel__intval(evsel, sample, "msr"); + hv = msr & ((unsigned long)1 << (PPC_MAX - HV_BIT)); + pr = msr & ((unsigned long)1 << (PPC_MAX -
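The bit arithmetic in arch__get_cpumode() can be checked in isolation. The sketch below reuses the patch's HV_BIT, PR_BIT and PPC_MAX constants, which count bits from the most significant end; ppc_msr_mask(), msr_hv() and msr_pr() are hypothetical helper names, and uint64_t replaces unsigned long so the shift is well defined on 32-bit hosts as well. How the patch maps these bits onto a cpumode value is cut off in this message, so that part is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HV_BIT	3
#define PR_BIT	49
#define PPC_MAX	63

/* Convert an IBM-style bit number (0 = most significant) into a mask. */
uint64_t ppc_msr_mask(unsigned int ibm_bit)
{
	assert(ibm_bit <= PPC_MAX);
	return (uint64_t)1 << (PPC_MAX - ibm_bit);
}

/* True when the hypervisor-state bit is set in the MSR value. */
bool msr_hv(uint64_t msr)
{
	return (msr & ppc_msr_mask(HV_BIT)) != 0;
}

/* True when the problem-state (user mode) bit is set in the MSR value. */
bool msr_pr(uint64_t msr)
{
	return (msr & ppc_msr_mask(PR_BIT)) != 0;
}
```

With these masks, a value ending in 0x9032 has PR clear and one ending in 0xf032 has PR set, which is consistent with the kernel and user pc values shown next to those MSR values in the trace output of this series.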
Re: [PATCH] perf/kvm: Guest Symbol Resolution for powerpc
Hi Arnaldo,

On Wednesday 13 January 2016 10:29 PM, Arnaldo Carvalho de Melo wrote:
> On Tue, Dec 29, 2015 at 03:38:40PM +0530, Ravi Bangoria wrote:
>> 'perf kvm {record|report}' is used to record and report the profiled
>> performance of any workload on a guest. From the host, we can collect
>> guest kernel statistics which is useful in finding out any contentions
>> in guest kernel symbols for a certain workload.
>>
>> This feature is not available on powerpc because 'perf' relies on the
>> 'cycles' event (a PMU event) to profile the guest. However, for powerpc,
>> this can't be used from the host because the PMUs are controlled by the
>> guest rather than the host.
>
> Without getting into whether the approach is the right one, which I
> leave to PowerPC experts, Ingo, PeterZ, etc:
>
> So, in these cases, please break this into a series, where you, for
> instance, will add that extra evsel parameter to the functions that
> will ultimately use it to extract those event fields. That should be a
> separate patch, so that when reviewing the "meat" of your patch we can
> quickly see what it does, not having to extract that from leg work.
>
> Two other patches should introduce arch__get_{ip,cpumode}().
>
> - Arnaldo

Thanks for the suggestion. I've sent v2 with the changes you suggested.
Could you please take a look?

Regards,
Ravi
Re: [RFC 4/4] perf kvm: Fix output fields instead of 'trace' for perf kvm report on powerpc
Hi Arnaldo,

Gentle reminder :) Any updates?

Regards,
Ravi

On Thursday 03 March 2016 06:49 AM, Ravi Bangoria wrote:
> Thanks acme,
>
> On Wednesday 02 March 2016 09:52 PM, Arnaldo Carvalho de Melo wrote:
>> On Wed, Mar 02, 2016 at 09:16:48PM +0530, Ravi Bangoria wrote:
>>> Thanks Arnaldo,
>>>
>>> Please find my comments.
>>>
>>> On Wednesday 02 March 2016 07:55 PM, Arnaldo Carvalho de Melo wrote:
>>>> On Wed, Feb 24, 2016 at 02:37:45PM +0530, Ravi Bangoria wrote:
>>>>> 		use_browser = 0;
>>>>> +	if (!field_order &&
>>>>> +	    is_perf_data_reorded_on_ppc(session->evlist) &&
>>>>> +	    perf_guest_only())
>>>>> +		field_order = "overhead,comm,dso,sym";
>>>>> +
>>>>
>>>> Can you please do it as:
>>>>
>>>> __weak void arch__override_field_order(struct perf_evlist *evlist,
>>>> 				       const char **field_order)
>>>> {
>>>> }
>>>
>>> So you mean like this: implement only the weak function and move
>>> the code into it, i.e. no strong implementation at this point in
>>> time? Like,
>>>
>>> __weak void arch__override_field_order(struct perf_evlist *evlist,
>>> 				       const char **f_order)
>>> {
>>> 	if (!field_order &&
>>> 	    is_perf_data_reorded_on_ppc(session->evlist) &&
>>
>> Oh, I see, ugh, when running on x86_64 we wouldn't use this, so we
>> need to have per arch default field orders, now I have to recall why
>> is it that we need this per-arch field order :-\
>
> Sorry, I'm a little bit confused. We need arch-specific functionality
> present on all archs to make cross-arch reporting possible. For
> example, to record perf.data on ppc and report on x86, we need the
> ppc-specific function present in a perf binary compiled on x86.
>
> Please let me know if I understood it wrong.
>
> Regards,
> Ravi
Re: [RFC 4/4] perf kvm: Fix output fields instead of 'trace' for perf kvm report on powerpc
Thanks Arnaldo,

Please find my comments.

On Wednesday 02 March 2016 07:55 PM, Arnaldo Carvalho de Melo wrote:
> On Wed, Feb 24, 2016 at 02:37:45PM +0530, Ravi Bangoria wrote:
>> 		use_browser = 0;
>> +	if (!field_order &&
>> +	    is_perf_data_reorded_on_ppc(session->evlist) &&
>> +	    perf_guest_only())
>> +		field_order = "overhead,comm,dso,sym";
>> +
>
> Can you please do it as:
>
> __weak void arch__override_field_order(struct perf_evlist *evlist,
> 				       const char **field_order)
> {
> }

So you mean like this: implement only the weak function and move the
code into it, i.e. no strong implementation at this point in time? Like,

__weak void arch__override_field_order(struct perf_evlist *evlist,
				       const char **f_order)
{
	if (!field_order &&
	    is_perf_data_reorded_on_ppc(session->evlist) &&
	    perf_guest_only())
		*field_order = "overhead,comm,dso,sym";
}

Then I can do that. But if you are proposing to implement a strong
function and move this code into it, then we won't be able to enable
cross-arch reporting.

> This way we don't see any arch specific stuff in the tool, also I
> haven't seen any doc update, are you sure nothing needs to be added to
> tools/perf/Documentation/ for any of these patches?
>
> I think this needs to be documented further, probably in
> tools/perf/design.txt too?

Yes, I'll do this in the next version.

Regards,
Ravi
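For readers following the hook discussion, here is a minimal standalone sketch of the override logic with consistent parameter names. override_field_order_sketch() is a hypothetical name, and the two booleans stand in for is_perf_data_reorded_on_ppc() and perf_guest_only(); nothing here is taken from the merged perf code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Replace the default field order only when the user gave none
 * (*field_order == NULL) and the data is a guest-only recording
 * made on powerpc. A user-supplied order is never overridden.
 */
void override_field_order_sketch(const char **field_order,
				 bool recorded_on_ppc, bool guest_only)
{
	assert(field_order != NULL);

	if (*field_order == NULL && recorded_on_ppc && guest_only)
		*field_order = "overhead,comm,dso,sym";
}
```

Because the decision depends on where the data was recorded rather than on the architecture of the reporting binary, logic of this shape can be compiled into every build, which is the cross-arch reporting concern raised above.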
Re: [RFC 4/4] perf kvm: Fix output fields instead of 'trace' for perf kvm report on powerpc
Thanks acme,

On Wednesday 02 March 2016 09:52 PM, Arnaldo Carvalho de Melo wrote:
> On Wed, Mar 02, 2016 at 09:16:48PM +0530, Ravi Bangoria wrote:
>> Thanks Arnaldo,
>>
>> Please find my comments.
>>
>> On Wednesday 02 March 2016 07:55 PM, Arnaldo Carvalho de Melo wrote:
>>> On Wed, Feb 24, 2016 at 02:37:45PM +0530, Ravi Bangoria wrote:
>>>> 		use_browser = 0;
>>>> +	if (!field_order &&
>>>> +	    is_perf_data_reorded_on_ppc(session->evlist) &&
>>>> +	    perf_guest_only())
>>>> +		field_order = "overhead,comm,dso,sym";
>>>> +
>>>
>>> Can you please do it as:
>>>
>>> __weak void arch__override_field_order(struct perf_evlist *evlist,
>>> 				       const char **field_order)
>>> {
>>> }
>>
>> So you mean like this: implement only the weak function and move
>> the code into it, i.e. no strong implementation at this point in
>> time? Like,
>>
>> __weak void arch__override_field_order(struct perf_evlist *evlist,
>> 				       const char **f_order)
>> {
>> 	if (!field_order &&
>> 	    is_perf_data_reorded_on_ppc(session->evlist) &&
>
> Oh, I see, ugh, when running on x86_64 we wouldn't use this, so we
> need to have per arch default field orders, now I have to recall why
> is it that we need this per-arch field order :-\

Sorry, I'm a little bit confused. We need arch-specific functionality
present on all archs to make cross-arch reporting possible. For example,
to record perf.data on ppc and report on x86, we need the ppc-specific
function present in a perf binary compiled on x86.

Please let me know if I understood it wrong.

Regards,
Ravi
[PATCH] hw_breakpoint: Fix Oops at destroying hw_breakpoint event on powerpc
At a time of destroying hw_breakpoint event, kernel ends up with Oops. Here is the sample output from 4.5.0-rc6 kernel. [ 450.708568] Unable to handle kernel paging request for data at address 0x0c07 [ 450.708684] Faulting instruction address: 0xc00291d0 [ 450.708750] Oops: Kernel access of bad area, sig: 11 [#1] [ 450.708798] SMP NR_CPUS=1024 NUMA pSeries [ 450.708856] Modules linked in: stap_4c2bdcf3e1aee79b646bb9a844e600f7__4962(O) xt_CHECKSUM ... [ 450.709539] CPU: 5 PID: 5106 Comm: perf_fuzzer Tainted: G O 4.5.0-rc5+ #1 [ 450.709620] task: c000f8795c80 ti: c000e334 task.ti: c000e334 [ 450.709691] NIP: c00291d0 LR: c020b6b4 CTR: c020b6f0 [ 450.709760] REGS: c000e3343760 TRAP: 0300 Tainted: G O (4.5.0-rc5+) [ 450.709831] MSR: 80009033 <SF,EE,ME,IR,DR,RI,LE> CR: 22008828 XER: 2000 [ 450.710001] CFAR: c0010708 DAR: 0c07 DSISR: 4200 SOFTE: 1 GPR00: c020b6b4 c000e33439e0 c1350900 c0009efa7000 GPR04: 0001 c0009efa7000 0001 GPR08: GPR12: c020b6f0 c7e02800 c0009efa5208 GPR16: 0001 c000f3ad7f10 GPR20: c000f87964c8 0001 c000f8795c80 fffd GPR24: c000f3ad7f08 c000f3ad7f68 c0009efa6800 GPR28: c000f3ad7f00 c0009efa5000 c1259520 c0009efa7000 [ 450.710996] NIP [c00291d0] arch_unregister_hw_breakpoint+0x40/0x60 [ 450.711066] LR [c020b6b4] release_bp_slot+0x44/0x80 [ 450.77] Call Trace: [ 450.711165] [c000e33439e0] [c09c1e38] mutex_lock+0x28/0x70 (unreliable) [ 450.711257] [c000e3343a10] [c020b6b4] release_bp_slot+0x44/0x80 [ 450.711332] [c000e3343a40] [c02036c8] _free_event+0xd8/0x350 [ 450.711404] [c000e3343a70] [c0208260] perf_event_exit_task+0x2b0/0x4c0 [ 450.711490] [c000e3343b20] [c00b8ac8] do_exit+0x388/0xc60 [ 450.711563] [c000e3343be0] [c00b9484] do_group_exit+0x64/0x100 [ 450.711641] [c000e3343c20] [c00c9100] get_signal+0x220/0x770 [ 450.711716] [c000e3343d10] [c0017884] do_signal+0x54/0x2b0 [ 450.711793] [c000e3343e00] [c0017cac] do_notify_resume+0xbc/0xd0 [ 450.711865] [c000e3343e30] [c0009838] ret_from_except_lite+0x64/0x68 [ 450.711948] Instruction dump: [ 
450.711986] f8010010 f821ffd1 7c7f1b78 6000 6000 e93f01e8 2fa9 419e0018 [ 450.712107] e9290098 2fa9 419e000c 3940 38210030 e8010010 ebe1fff8 [ 450.712230] ---[ end trace 3cf087de955e9358 ]--- Call chain: hw_breakpoint_event_init() bp->destroy = bp_perf_event_destroy; do_exit() perf_event_exit_task() perf_event_exit_task_context() WRITE_ONCE(child_ctx->task, TASK_TOMBSTONE); perf_event_exit_event() free_event() _free_event() bp_perf_event_destroy()//event->destroy(event); release_bp_slot() arch_unregister_hw_breakpoint() perf_event_exit_task_context sets child_ctx->task as TASK_TOMBSTONE which is (void *)-1. arch_unregister_hw_breakpoint tries to fetch 'thread' attribute of 'task' resulting in Oops. This patch adds one more condition before accessing data from 'task'. Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com> --- arch/powerpc/kernel/hw_breakpoint.c | 3 ++- include/linux/perf_event.h | 2 ++ kernel/events/core.c| 2 -- 3 files changed, 4 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c index 05e804c..43d8496 100644 --- a/arch/powerpc/kernel/hw_breakpoint.c +++ b/arch/powerpc/kernel/hw_breakpoint.c @@ -29,6 +29,7 @@ #include #include #include +#include #include #include @@ -110,7 +111,7 @@ void arch_unregister_hw_breakpoint(struct perf_event *bp) * and the single_step_dabr_instruction(), then cleanup the breakpoint * restoration variables to prevent dangling pointers. */ - if (bp->ctx && bp->ctx->task) + if (bp->ctx && bp->ctx->task && bp->ctx->task != TASK_TOMBSTONE) bp->ctx->task->thread.last_hit_ubp = NULL; } diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index f5c5a3f..491c50e 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -1192,4 +1192,6 @@ _name##_show(struct device *dev, \ \ static struct device_attribute format_
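The guard added to arch_unregister_hw_breakpoint() can be illustrated outside the kernel. In this sketch struct task_sketch, struct ctx_sketch and clear_last_hit() are hypothetical userspace stand-ins; only the sentinel value, (void *)-1, comes from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature versions of the kernel structures involved. */
struct task_sketch {
	void *last_hit_ubp;
};

struct ctx_sketch {
	struct task_sketch *task;
};

/* Sentinel marking a context whose task has already exited. */
#define TASK_TOMBSTONE ((struct task_sketch *)-1L)

/*
 * Clear the dangling breakpoint pointer only when ctx->task is a real
 * task. A NULL check alone is not enough: the tombstone is non-NULL,
 * so dereferencing it would fault, as in the Oops above.
 */
bool clear_last_hit(struct ctx_sketch *ctx)
{
	if (ctx && ctx->task && ctx->task != TASK_TOMBSTONE) {
		ctx->task->last_hit_ubp = NULL;
		return true;
	}
	return false;
}
```

The function returns whether anything was cleared purely so the behaviour is observable in a test; the kernel function itself returns void.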
Re: [RFC 1/4] perf kvm: Enable 'record' on powerpc
Thanks Arnaldo for putting the effort. I've tested this patch on powerpc and it looks fine to me. Please find my below comments. On Friday 25 March 2016 02:45 AM, Arnaldo Carvalho de Melo wrote: Em Tue, Mar 22, 2016 at 11:19:21PM -0300, Arnaldo Carvalho de Melo escreveu: Em Tue, Mar 22, 2016 at 04:12:11PM -0300, Arnaldo Carvalho de Melo escreveu: Em Wed, Feb 24, 2016 at 02:37:42PM +0530, Ravi Bangoria escreveu: 'perf kvm record' is not available on powerpc because 'perf' relies on the 'cycles' event (a PMU event) to profile the guest. However, for powerpc, this can't be used from the host because the PMUs are controlled by the guest rather than the host. There exists a tracepoint 'kvm_hv:kvm_guest_exit' in powerpc which is hit whenever any of the threads exit the guest context. The guest instruction pointer dumped along with this tracepoint data in the field 'pc', can be used as guest instruction pointer. This patch changes default event as kvm_hv:kvm_guest_exit for recording guest data in host on powerpc. As we are using host event to record guest data, this approach will enable only --guest option of 'perf kvm'. Still --host --guest together won't work. It should, i.e. --host --guest should translate to: -e cycles:H,kvm_hv:kvm_guest_exit I.e. both collect cycles only in the host, and also the tracepoint that will allow us to get the guest approximation for the unavailable cycles event, no? I'm putting the infrastructure work needed for this the perf/cpumode branch. More work will be put there soon. So I took a different path and made perf_evsel__parse_sample set a new perf_sample.cpumode field, this way we'll end up having just to set a per-evsel ->post_parse_sample() callback for the event that replaces "cycles" for PPC guests where we'll just set data->ip and data->cpumode, the rest of the code remains unchanged. The changes I made looks useful in itself, as, IIRC more code was removed than added. 
I'll continue tomorrow and will test with the kvm:kvm_exit on x86_64 for testing, that has: Ok, so the infrastructure got merged already and from there the next steps are in running with: perf kvm --guest record -a -e cycles:H,kvm:kvm_exit And then, with the patch below applied, try: perf kvm --guestkallsyms kallsyms.guest --guestmodules modules.guest report -i perf.data.guest --munge-ppc-guest-sample kvm:kvm_exit The initial proposal was to change the default event as "kvm_guest_exit" for kvm recording/reporting on ppc. If I understand it correctly, your patch creates a handler for reporting kvm events based on "munge_ppc_guest_event" and the required tracepoint i.e., we need to mention the required tracepoint event name for recording and reporting. There might be a little bit of an issue here. For scripts which depend on generic perf kvm record/report, we need to change those appropriately to prevent those from failing on powerpc. Otherwise, (just a thought) can we create some kind of an alias to map the ppc specific perf kvm commands with the generic perf kvm. For e.g : perf kvm record -e "kvm_hv:kvm_guest_exit" mapped to perf kvm record & perf kvm report --munge-ppc-guest-sample kvm_hv:kvm_guest_exit mapped to perf kvm report. Regards, Ravi
Re: [RFC] perf probe: Fix module probe issue if no dwarf support
On Tuesday 26 April 2016 02:59 AM, Masami Hiramatsu wrote:
On Mon, 25 Apr 2016 16:08:28 +0530 Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com> wrote:
Perf is not able to register a probe in a kernel module when DWARF support is not available (so it falls back to symtab). Perf passes the full path of the module where only the module name is required, which causes the problem. This patch fixes the issue.
Before applying patch:
$ dpkg -s libdw-dev
dpkg-query: package 'libdw-dev' is not installed ...
$ ./perf probe -m /linux/samples/kobject/kobject-example.ko foo_show
Added new event:
  probe:foo_show (on foo_show in /linux/samples/kobject/kobject-example.ko)
You can now use it in all perf tools, such as:
  perf record -e probe:foo_show -aR sleep 1
$ cat /sys/kernel/debug/tracing/kprobe_events
p:probe/foo_show /linux/samples/kobject/kobject-example.ko:foo_show
After applying patch:
$ ./perf probe -m /linux/samples/kobject/kobject-example.ko foo_show
Added new event:
  probe:foo_show (on foo_show in kobject_example)
You can now use it in all perf tools, such as:
  perf record -e probe:foo_show -aR sleep 1
$ cat /sys/kernel/debug/tracing/kprobe_events
p:probe/foo_show kobject_example:foo_show
Looks good to me :) However, it seems that this patch depends on your previous patch ("perf probe: Fix offline module name missmatch issue"). In that case, could you make these a series of patches?
Acked-by: Masami Hiramatsu <mhira...@kernel.org>
Thanks Masami, I've sent v2 with the changes you suggested. Please review it.
Regards,
Ravi
Re: [RFC] perf probe: Fix offline module name missmatch issue
Thanks Masami for reviewing. Please find my replies to your comment. On Tuesday 26 April 2016 02:54 AM, Masami Hiramatsu wrote: Hi Ravi, On Mon, 25 Apr 2016 16:08:27 +0530 Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com> wrote: Perf can add a probe on kernel module which has not been loaded yet. Current implementation finds module name from path. But if filename is different from actual module name then perf fails to register probe while loading module because of mismatch in names. For example, samples/kobject/kobject-example.ko is loaded as kobject_example. Ah! right, good catch! Have some comment below; diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c index 8319fbb..05d0905 100644 --- a/tools/perf/util/probe-event.c +++ b/tools/perf/util/probe-event.c @@ -265,6 +265,65 @@ static bool kprobe_warn_out_range(const char *symbol, unsigned long address) return true; } +/* + * NOTE: + * '.gnu.linkonce.this_module' section of kernel module elf directly + * maps to 'struct module' from linux/module.h. This section contains + * actual module name which will be used by kernel after loading it. + * But, we cannot use 'struct module' here since linux/module.h is not + * exposed to user-space. Offset of 'name' has remained same from long + * time, so hardcoding it here. + */ +#ifdef __LP64__ +#define MOD_NAME_OFFSET 24 +#else +#define MOD_NAME_OFFSET 12 +#endif + +/* + * @module can be module name of module file path. In case of path, + * inspect elf and find out what is actual module name. + * Caller has to free mod_name after using it. + */ +char *find_module_name(const char *module) Could you make this function static, since there is no caller outside this file? Yes. no caller outside of this file. But, In this patch, function find_module_name is defined outside of #ifdef HAVE_DWARF_SUPPORT while it's being called from inside of #ifdef HAVE_DWARF_SUPPORT. 
If I make it static and there is no DWARF support, compilation fails with a "defined but not used" error. In the second patch ("perf probe: Fix module probe issue if no dwarf support"), I call it from outside of #ifdef HAVE_DWARF_SUPPORT. So I have two options:
1. Merge both patches and make the definition static.
2. Make the function static in the second patch.
I chose the second approach and sent v2. Please let me know if there is a better way to do it.
Regards,
Ravi
Re: [RFC] perf probe: Fix offline module name missmatch issue
On Tuesday 26 April 2016 02:45 PM, Wangnan (F) wrote: On 2016/4/26 16:56, Ravi Bangoria wrote: Thanks Masami for reviewing. Please find my replies to your comment. On Tuesday 26 April 2016 02:54 AM, Masami Hiramatsu wrote: Hi Ravi, On Mon, 25 Apr 2016 16:08:27 +0530 Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com> wrote: Perf can add a probe on kernel module which has not been loaded yet. Current implementation finds module name from path. But if filename is different from actual module name then perf fails to register probe while loading module because of mismatch in names. For example, samples/kobject/kobject-example.ko is loaded as kobject_example. Ah! right, good catch! Have some comment below; diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c index 8319fbb..05d0905 100644 --- a/tools/perf/util/probe-event.c +++ b/tools/perf/util/probe-event.c @@ -265,6 +265,65 @@ static bool kprobe_warn_out_range(const char *symbol, unsigned long address) return true; } +/* + * NOTE: + * '.gnu.linkonce.this_module' section of kernel module elf directly + * maps to 'struct module' from linux/module.h. This section contains + * actual module name which will be used by kernel after loading it. + * But, we cannot use 'struct module' here since linux/module.h is not + * exposed to user-space. Offset of 'name' has remained same from long + * time, so hardcoding it here. + */ +#ifdef __LP64__ +#define MOD_NAME_OFFSET 24 +#else +#define MOD_NAME_OFFSET 12 +#endif + +/* + * @module can be module name of module file path. In case of path, + * inspect elf and find out what is actual module name. + * Caller has to free mod_name after using it. + */ +char *find_module_name(const char *module) Could you make this function static, since there is no caller outside this file? Yes. no caller outside of this file. But, In this patch, function find_module_name is defined outside of #ifdef HAVE_DWARF_SUPPORT while it's being called from inside of #ifdef HAVE_DWARF_SUPPORT. 
If I make it static and there is no DWARF support, compilation fails with a "defined but not used" error. In the second patch ("perf probe: Fix module probe issue if no dwarf support"), I call it from outside of #ifdef HAVE_DWARF_SUPPORT. So I have two options:
1. Merge both patches and make the definition static.
2. Make the function static in the second patch.
I chose the second approach and sent v2. Please let me know if there is a better way to do it.
Try __maybe_unused directive?
Thanks, Wangnan, for the suggestion. I did try __maybe_unused on the definition of find_module_name, but it produces the following compilation error:
util/probe-event.c:289:1: error: expected ‘,’ or ‘;’ before ‘{’ token
 {
 ^
util/probe-event.c:288:14: error: ‘find_module_name’ declared ‘static’ but never defined [-Werror=unused-function]
 static char *find_module_name(const char *module) __maybe_unused
              ^
  CC       util/zlib.o
cc1: all warnings being treated as errors
I would have to declare a prototype of the function with __maybe_unused before its definition to resolve this error. Since this is temporary anyway and would have to be removed again in patch 2, I don't think this change is needed.
Regards,
Ravi
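For reference, the error quoted above comes from where the attribute sits, not from the attribute itself. A minimal sketch, assuming GCC or Clang and a stand-in function (last_component() is a hypothetical name, and the macro is redefined here only so the snippet stands alone):

```c
#include <string.h>

/* perf has its own definition of this macro; repeated for a standalone sketch. */
#ifndef __maybe_unused
#define __maybe_unused __attribute__((unused))
#endif

/*
 * Attribute written among the declaration specifiers: accepted on a
 * function definition. Writing it after the parameter list instead,
 *     static const char *last_component(const char *path) __maybe_unused { ... }
 * is the form that triggers "expected ',' or ';' before '{' token".
 */
static __maybe_unused const char *last_component(const char *path)
{
	const char *slash = strrchr(path, '/');

	/* Point just past the last '/', or at the whole string if there is none. */
	return slash ? slash + 1 : path;
}
```

Placing the attribute before the declarator avoids the separate prototype, although in this series the point became moot once the patches were reordered.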
Re: [RFC] perf probe: Fix offline module name missmatch issue
Thanks Masami,
On Tuesday 26 April 2016 07:49 AM, Masami Hiramatsu wrote:
On Tue, 26 Apr 2016 06:24:38 +0900 Masami Hiramatsu wrote:
+/*
+ * NOTE:
+ * '.gnu.linkonce.this_module' section of kernel module elf directly
+ * maps to 'struct module' from linux/module.h. This section contains
+ * actual module name which will be used by kernel after loading it.
+ * But, we cannot use 'struct module' here since linux/module.h is not
+ * exposed to user-space. Offset of 'name' has remained same from long
+ * time, so hardcoding it here.
+ */
BTW, is there no way to get the module name without accessing this "hidden" data structure? This looks like a very tricky way...
This is the same approach the kernel uses to find the module name when a module is loaded. Please refer to this function for more detail:
kernel/module.c :: static struct module *setup_load_info(...)
Regards,
Ravi
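To make the hardcoded offsets less mysterious, here is a rough sketch of where 24 and 12 would come from and how a name could be copied out of the section image with a bounds check. The mock_* types are assumptions that mimic only the leading fields of the kernel's 'struct module' (a state enum followed by a list head); they and module_name_from_section() are illustrative names, not kernel or perf definitions.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#ifdef __LP64__
#define MOD_NAME_OFFSET 24
#else
#define MOD_NAME_OFFSET 12
#endif

/* Assumed layout of the first fields of 'struct module' (mock types). */
struct mock_list_head { void *next, *prev; };
enum mock_module_state { MOCK_MODULE_STATE_LIVE };
struct mock_module {
	enum mock_module_state state;	/* 4 bytes, padded to pointer alignment */
	struct mock_list_head list;	/* two pointers */
	char name[64];			/* size chosen arbitrarily for the sketch */
};

/*
 * Copy the NUL-terminated name found at MOD_NAME_OFFSET inside a
 * section image of 'size' bytes. Returns NULL if the image is too
 * small, the name is not terminated within it, or allocation fails.
 * The caller frees the result.
 */
static char *module_name_from_section(const void *buf, size_t size)
{
	const size_t off = MOD_NAME_OFFSET;
	const char *start, *end;
	size_t len;
	char *copy;

	if (!buf || size <= off)
		return NULL;

	start = (const char *)buf + off;
	end = memchr(start, '\0', size - off);
	if (!end)
		return NULL;

	len = (size_t)(end - start);
	copy = malloc(len + 1);
	if (copy)
		memcpy(copy, start, len + 1);	/* includes the terminator */
	return copy;
}
```

Unlike a bare strdup() on the section data, the memchr() bound keeps the read inside the section even if the file is malformed; whether that extra care is wanted in perf is a design choice the thread does not discuss.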
[PATCH 2/2] perf probe: Fix offline module name missmatch issue
Perf can add a probe on kernel module which has not been loaded yet. Current implementation finds module name from path. But if filename is different from actual module name then perf fails to register probe while loading module because of mismatch in names. For example, samples/kobject/kobject-example.ko is loaded as kobject_example.
Before applying patch:
$ sudo ./perf probe -m /linux/samples/kobject/kobject-example.ko foo_show
Added new event:
  probe:foo_show (on foo_show in kobject-example)
You can now use it in all perf tools, such as:
  perf record -e probe:foo_show -aR sleep 1
$ cat /sys/kernel/debug/tracing/kprobe_events
p:probe/foo_show kobject-example:foo_show
$ insmod kobject-example.ko
$ lsmod
Module                  Size  Used by
kobject_example        16384  0
Generate read to /sys/kernel/kobject_example/foo while recording data with below command
$ sudo ./perf record -e probe:foo_show -a
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.093 MB perf.data ]
$ ./perf report --stdio -F overhead,comm,dso,sym
Error: The perf.data.old file has no samples!
After applying patch:
$ sudo ./perf probe -m /linux/samples/kobject/kobject-example.ko foo_show
Added new event:
  probe:foo_show (on foo_show in kobject_example)
You can now use it in all perf tools, such as:
  perf record -e probe:foo_show -aR sleep 1
$ sudo cat /sys/kernel/debug/tracing/kprobe_events
p:probe/foo_show kobject_example:foo_show
$ insmod kobject-example.ko
$ lsmod
Module                  Size  Used by
kobject_example        16384  0
Generate read to /sys/kernel/kobject_example/foo while recording data with below command
$ sudo ./perf record -e probe:foo_show -a
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.097 MB perf.data (8 samples) ]
$ sudo ./perf report --stdio -F overhead,comm,dso,sym
...
# Samples: 8 of event 'probe:foo_show'
# Event count (approx.): 8
#
# Overhead  Command  Shared Object      Symbol
# ........  .......  .................  ............
#
   100.00%  cat      [kobject_example]  [k] foo_show
Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com>
---
 tools/perf/util/probe-event.c | 19 +--
 1 file changed, 5 insertions(+), 14 deletions(-)
diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index d58de20..26803e1 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -642,32 +642,23 @@ static int add_module_to_probe_trace_events(struct probe_trace_event *tevs,
 					    int ntevs, const char *module)
 {
 	int i, ret = 0;
-	char *tmp;
+	char *mod_name = NULL;
 
 	if (!module)
 		return 0;
 
-	tmp = strrchr(module, '/');
-	if (tmp) {
-		/* This is a module path -- get the module name */
-		module = strdup(tmp + 1);
-		if (!module)
-			return -ENOMEM;
-		tmp = strchr(module, '.');
-		if (tmp)
-			*tmp = '\0';
-		tmp = (char *)module;	/* For free() */
-	}
+	mod_name = find_module_name(module);
 
 	for (i = 0; i < ntevs; i++) {
-		tevs[i].point.module = strdup(module);
+		tevs[i].point.module =
+			strdup(mod_name ? mod_name : module);
 		if (!tevs[i].point.module) {
 			ret = -ENOMEM;
 			break;
 		}
 	}
 
-	free(tmp);
+	free(mod_name);
 	return ret;
 }
 
-- 
1.9.1
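The logic removed by this patch can be reproduced in isolation to show why the names diverge. The following is a standalone sketch of that path-based derivation; module_name_from_path() is a hypothetical name, and it allocates with malloc() rather than strdup() only to stay within standard C.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Sketch of the removed logic: if 'path' contains a '/', take the last
 * path component and cut it at the first '.'; otherwise use the string
 * as given. Returns a malloc'ed copy (caller frees) or NULL on failure.
 */
static char *module_name_from_path(const char *path)
{
	const char *slash = strrchr(path, '/');
	const char *base = slash ? slash + 1 : path;
	size_t len = slash ? strcspn(base, ".") : strlen(base);
	char *name = malloc(len + 1);

	if (!name)
		return NULL;
	memcpy(name, base, len);
	name[len] = '\0';
	return name;
}
```

For /linux/samples/kobject/kobject-example.ko this yields kobject-example, with a hyphen, while lsmod in the transcript above reports kobject_example, so a probe registered under the path-derived name never matches the loaded module.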
[PATCH 1/2] perf probe: Fix module probe issue if no dwarf support
Perf is not able to register a probe in a kernel module when DWARF support is not available (so it falls back to symtab). Perf passes the full path of the module where only the module name is required, which causes the problem. This patch fixes the issue.
Before applying patch:
$ dpkg -s libdw-dev
dpkg-query: package 'libdw-dev' is not installed and no information is...
$ sudo ./perf probe -m /linux/samples/kprobes/kprobe_example.ko kprobe_init
Added new event:
  probe:kprobe_init (on kprobe_init in /linux/samples/kprobes/kprobe_example.ko)
You can now use it in all perf tools, such as:
  perf record -e probe:kprobe_init -aR sleep 1
$ sudo cat /sys/kernel/debug/tracing/kprobe_events
p:probe/kprobe_init /linux/samples/kprobes/kprobe_example.ko:kprobe_init
$ sudo ./perf record -a -e probe:kprobe_init
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.105 MB perf.data ]
$ sudo ./perf script # No output here
After applying patch:
$ sudo ./perf probe -m /linux/samples/kprobes/kprobe_example.ko kprobe_init
Added new event:
  probe:kprobe_init (on kprobe_init in kprobe_example)
You can now use it in all perf tools, such as:
  perf record -e probe:kprobe_init -aR sleep 1
$ sudo cat /sys/kernel/debug/tracing/kprobe_events
p:probe/kprobe_init kprobe_example:kprobe_init
$ sudo ./perf record -a -e probe:kprobe_init
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.105 MB perf.data (2 samples) ]
$ sudo ./perf script
insmod 13990 [002] 5961.216833: probe:kprobe_init: ...
insmod 13995 [002] 5962.889384: probe:kprobe_init: ...
Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com>
---
 tools/perf/util/probe-event.c | 76 +--
 1 file changed, 73 insertions(+), 3 deletions(-)
diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 8319fbb..d58de20 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -265,6 +265,65 @@ static bool kprobe_warn_out_range(const char *symbol, unsigned long address)
 	return true;
 }
 
+/*
+ * NOTE:
+ * '.gnu.linkonce.this_module' section of kernel module elf directly
+ * maps to 'struct module' from linux/module.h. This section contains
+ * actual module name which will be used by kernel after loading it.
+ * But, we cannot use 'struct module' here since linux/module.h is not
+ * exposed to user-space. Offset of 'name' has remained same from long
+ * time, so hardcoding it here.
+ */
+#ifdef __LP64__
+#define MOD_NAME_OFFSET 24
+#else
+#define MOD_NAME_OFFSET 12
+#endif
+
+/*
+ * @module can be module name of module file path. In case of path,
+ * inspect elf and find out what is actual module name.
+ * Caller has to free mod_name after using it.
+ */
+static char *find_module_name(const char *module)
+{
+	int fd;
+	Elf *elf;
+	GElf_Ehdr ehdr;
+	GElf_Shdr shdr;
+	Elf_Data *data;
+	Elf_Scn *sec;
+	char *mod_name = NULL;
+
+	fd = open(module, O_RDONLY);
+	if (fd < 0)
+		return NULL;
+
+	elf = elf_begin(fd, PERF_ELF_C_READ_MMAP, NULL);
+	if (elf == NULL)
+		goto elf_err;
+
+	if (gelf_getehdr(elf, &ehdr) == NULL)
+		goto ret_err;
+
+	sec = elf_section_by_name(elf, &ehdr, &shdr,
+			".gnu.linkonce.this_module", NULL);
+	if (!sec)
+		goto ret_err;
+
+	data = elf_getdata(sec, NULL);
+	if (!data || !data->d_buf)
+		goto ret_err;
+
+	mod_name = strdup((char *)data->d_buf + MOD_NAME_OFFSET);
+
+ret_err:
+	elf_end(elf);
+elf_err:
+	close(fd);
+	return mod_name;
+}
+
 #ifdef HAVE_DWARF_SUPPORT
 
 static int kernel_get_module_dso(const char *module, struct dso **pdso)
@@ -2516,6 +2575,7 @@ static int find_probe_trace_events_from_map(struct perf_probe_event *pev,
 	struct probe_trace_point *tp;
 	int num_matched_functions;
 	int ret, i, j, skipped = 0;
+	char *mod_name;
 
 	map = get_target_map(pev->target, pev->uprobes);
 	if (!map) {
@@ -2600,9 +2660,19 @@ static int find_probe_trace_events_from_map(struct perf_probe_event *pev,
 		tp->realname = strdup_or_goto(sym->name, nomem_out);
 
 		tp->retprobe = pp->retprobe;
-		if (pev->target)
-			tev->point.module = strdup_or_goto(pev->target,
-							   nomem_out);
+		if (pev->target) {
+			if (pev->uprobes) {
+				tev->point.module = strdup_or_goto(pev->target,
+								   nomem_out);
+			} else {
+				mod_name = find_module_name(pev->target);
+
Re: [PATCH 1/2] perf probe: Fix module probe issue if no dwarf support
On Tuesday 26 April 2016 08:04 PM, Arnaldo Carvalho de Melo wrote:
On Tue, Apr 26, 2016 at 07:55:40PM +0530, Ravi Bangoria wrote:
Perf is not able to register a probe in a kernel module when DWARF support is not available (so it falls back to symtab). Perf passes the full path of the module where only the module name is required, which causes the problem. This patch fixes the issue.
Is this v3? What has changed from v2?
- Arnaldo
Yes, Arnaldo. But I changed it from [RFC] to [PATCH], so I didn't include a version. Here is the [RFC v2] link: https://lkml.org/lkml/2016/4/26/114
Changes w.r.t. [RFC v2]:
- Swapped the patches in the series and moved the definition of find_module_name into the other patch.
Regards,
Ravi
Re: [RFC] perf probe: Fix offline module name missmatch issue
On Tuesday 26 April 2016 04:16 PM, Masami Hiramatsu wrote: On Tue, 26 Apr 2016 14:26:48 +0530 Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com> wrote: Thanks Masami for reviewing. Please find my replies to your comment. On Tuesday 26 April 2016 02:54 AM, Masami Hiramatsu wrote: Hi Ravi, On Mon, 25 Apr 2016 16:08:27 +0530 Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com> wrote: Perf can add a probe on kernel module which has not been loaded yet. Current implementation finds module name from path. But if filename is different from actual module name then perf fails to register probe while loading module because of mismatch in names. For example, samples/kobject/kobject-example.ko is loaded as kobject_example. Ah! right, good catch! Have some comment below; diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c index 8319fbb..05d0905 100644 --- a/tools/perf/util/probe-event.c +++ b/tools/perf/util/probe-event.c @@ -265,6 +265,65 @@ static bool kprobe_warn_out_range(const char *symbol, unsigned long address) return true; } +/* + * NOTE: + * '.gnu.linkonce.this_module' section of kernel module elf directly + * maps to 'struct module' from linux/module.h. This section contains + * actual module name which will be used by kernel after loading it. + * But, we cannot use 'struct module' here since linux/module.h is not + * exposed to user-space. Offset of 'name' has remained same from long + * time, so hardcoding it here. + */ +#ifdef __LP64__ +#define MOD_NAME_OFFSET 24 +#else +#define MOD_NAME_OFFSET 12 +#endif + +/* + * @module can be module name of module file path. In case of path, + * inspect elf and find out what is actual module name. + * Caller has to free mod_name after using it. + */ +char *find_module_name(const char *module) Could you make this function static, since there is no caller outside this file? Yes. no caller outside of this file. 
But in this patch, find_module_name is defined outside of #ifdef HAVE_DWARF_SUPPORT while it is called from inside of #ifdef HAVE_DWARF_SUPPORT. If I make it static and there is no DWARF support, compilation fails with a "defined but not used" error. In the second patch ("perf probe: Fix module probe issue if no dwarf support"), I call it from outside of #ifdef HAVE_DWARF_SUPPORT. So I have two options:
1. Merge both patches and make the definition static.
2. Make the function static in the second patch.
I chose the second approach and sent v2. Please let me know if there is a better way to do it.
Ah, I see. In that case, you can swap the patches in the series and move find_module_name into the other patch ;)
Thanks :) I've sent the patch with these changes. Please review it.
Regards,
Ravi
Re: [RFC 1/4] perf kvm: Enable 'record' on powerpc
0ce..ffe5bbb 100644 --- a/tools/perf/util/util.h +++ b/tools/perf/util/util.h @@ -367,4 +367,9 @@ typedef void (*print_binary_t)(enum binary_printer_ops, void print_binary(unsigned char *data, size_t len, size_t bytes_per_line, print_binary_t printer, void *extra); + +struct perf_evlist; + +int perf_kvm__setup_munge_ppc_guest_event(struct perf_evlist *evlist); + #endif /* GIT_COMPAT_UTIL_H */ On Friday 25 March 2016 02:45 AM, Arnaldo Carvalho de Melo wrote: Em Tue, Mar 22, 2016 at 11:19:21PM -0300, Arnaldo Carvalho de Melo escreveu: Em Tue, Mar 22, 2016 at 04:12:11PM -0300, Arnaldo Carvalho de Melo escreveu: Em Wed, Feb 24, 2016 at 02:37:42PM +0530, Ravi Bangoria escreveu: 'perf kvm record' is not available on powerpc because 'perf' relies on the 'cycles' event (a PMU event) to profile the guest. However, for powerpc, this can't be used from the host because the PMUs are controlled by the guest rather than the host. There exists a tracepoint 'kvm_hv:kvm_guest_exit' in powerpc which is hit whenever any of the threads exit the guest context. The guest instruction pointer dumped along with this tracepoint data in the field 'pc', can be used as guest instruction pointer. This patch changes default event as kvm_hv:kvm_guest_exit for recording guest data in host on powerpc. As we are using host event to record guest data, this approach will enable only --guest option of 'perf kvm'. Still --host --guest together won't work. It should, i.e. --host --guest should translate to: -e cycles:H,kvm_hv:kvm_guest_exit I.e. both collect cycles only in the host, and also the tracepoint that will allow us to get the guest approximation for the unavailable cycles event, no? I'm putting the infrastructure work needed for this the perf/cpumode branch. More work will be put there soon. 
So I took a different path and made perf_evsel__parse_sample set a new perf_sample.cpumode field, this way we'll end up having just to set a per-evsel ->post_parse_sample() callback for the event that replaces "cycles" for PPC guests where we'll just set data->ip and data->cpumode, the rest of the code remains unchanged. The changes I made looks useful in itself, as, IIRC more code was removed than added. I'll continue tomorrow and will test with the kvm:kvm_exit on x86_64 for testing, that has: Ok, so the infrastructure got merged already and from there the next steps are in running with: perf kvm --guest record -a -e cycles:H,kvm:kvm_exit And then, with the patch below applied, try: perf kvm --guestkallsyms kallsyms.guest --guestmodules modules.guest report -i perf.data.guest --munge-ppc-guest-sample kvm:kvm_exit I'm almost there, it is still not resolving to the kernel DSO, etc, so I get: Samples: 1K of event 'kvm:kvm_exit', Event count (approx.): 1924 Overhead Command Shared Object Symbol 34.77% :5343[unknown] [g] 0x81043158 16.84% :5345[unknown] [g] 0x813f3d5a 16.84% :5345[unknown] [g] 0x813f43ec 13.83% :5345[unknown] [g] 0x81043158 9.62% :5343[unknown] [g] 0x8104301a 3.79% :5345[unknown] [g] 0x8104301a 1.77% :5345[unknown] [u] 0x003ae6c75dc9 0.52% :5343[unknown] [g] 0x812a29b1 0.16% :5343[unknown] [g] 0x8100bc00 0.10% :5343[unknown] [g] 0x8104315a 0.10% :5343[unknown] [g] 0x8106306f 0.10% :5343[unknown] [g] 0x8153b7fc 0.10% :5345[unknown] [g] 0x8106306f 0.05% :5343[unknown] [g] 0x8100b720 [root@jouet ~]# cat /proc/*/task/5343/comm qemu-system-x86 [root@jouet ~]# The patch does several of the things you did per sample, but only right after opening the perf.data file, and I'll break it down in multiple patches, this is just a heads up, please review it if you have the time, in the end we should have a mechanism useful not just for PPC and that affects just 'perf kvm' in this specific case. 
- Arnaldo diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c index bff666458b28..b7b6527446f8 100644 --- a/tools/perf/builtin-kvm.c +++ b/tools/perf/builtin-kvm.c @@ -1480,6 +1480,86 @@ perf_stat: } #endif /* HAVE_KVM_STAT_SUPPORT */ +#ifdef __powerpc__ +#define PPC_HV_DECREMENTER 2432 +#define PPC_HV_BIT 3 +#define PPC_PR_BIT 49 +#define PPC_MAX 63 + +static bool perf_sample__is_hv_dec_trap(struct perf_sample *sample, struct perf_evsel *evsel) +{ + int trap = perf_evsel__intval(evsel, sample, "trap"); + return trap == PPC_HV_DECREMENTER; +} + +static void perf_kvm__munge_ppc_guest_sample(struct perf_evsel *evsel, struct perf_sample *sample) +{ + unsigned long msr, hv, pr; + + if (!perf_sample__is_hv_dec_trap(sample, evsel)) + return; + + sample->ip = perf_evsel__intval
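The diff above is cut off before the MSR constants are used, so the following is only a guess at their intent. It assumes the bit numbers follow IBM numbering (bit 0 is the most significant bit of the 64-bit MSR), which is why PPC_MAX is subtracted, and that a sample with HV clear is in the guest, with PR distinguishing user from kernel. The enum and both function names are hypothetical stand-ins, not perf definitions.

```c
#include <stdint.h>

#define PPC_HV_BIT	3
#define PPC_PR_BIT	49
#define PPC_MAX		63

/* Local stand-ins for perf's cpumode values; the names are hypothetical. */
enum guest_mode { GUEST_MODE_UNKNOWN, GUEST_MODE_KERNEL, GUEST_MODE_USER };

/* Extract one MSR bit given its IBM-style number (bit 0 = MSB). */
static int ppc_msr_bit(uint64_t msr, unsigned int ibm_bit)
{
	return (int)((msr >> (PPC_MAX - ibm_bit)) & 1);
}

/* Classify a sample from the MSR value captured at guest exit. */
static enum guest_mode ppc_guest_mode(uint64_t msr)
{
	if (ppc_msr_bit(msr, PPC_HV_BIT))
		return GUEST_MODE_UNKNOWN;	/* hypervisor state, not the guest */

	return ppc_msr_bit(msr, PPC_PR_BIT) ? GUEST_MODE_USER
					     : GUEST_MODE_KERNEL;
}
```

In a ->post_parse_sample() style callback, a result like this would be what ends up in the sample's cpumode field next to the guest instruction pointer.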
[RFC] perf probe: Fix offline module name missmatch issue
Perf can add a probe on kernel module which has not been loaded yet. Current implementation finds module name from path. But if filename is different from actual module name then perf fails to register probe while loading module because of mismatch in names. For example, samples/kobject/kobject-example.ko is loaded as kobject_example. Before applying patch: $ sudo ./perf probe -m /linux/samples/kobject/kobject-example.ko foo_show Added new event: probe:foo_show (on foo_show in kobject-example) You can now use it in all perf tools, such as: perf record -e probe:foo_show -aR sleep 1 $ cat /sys/kernel/debug/tracing/kprobe_events p:probe/foo_show kobject-example:foo_show $ insmod kobject-example.ko $ lsmod Module Size Used by kobject_example16384 0 Generate read to /sys/kernel/kobject_example/foo while recording data with below command $ sudo ./perf record -e probe:foo_show -a [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.093 MB perf.data ] $./perf report --stdio -F overhead,comm,dso,sym Error: The perf.data.old file has no samples! After applying patch: $ sudo ./perf probe -m /linux/samples/kobject/kobject-example.ko foo_show Added new event: probe:foo_show (on foo_show in kobject_example) You can now use it in all perf tools, such as: perf record -e probe:foo_show -aR sleep 1 $ sudo cat /sys/kernel/debug/tracing/kprobe_events p:probe/foo_show kobject_example:foo_show $ insmod kobject-example.ko $ lsmod Module Size Used by kobject_example16384 0 Generate read to /sys/kernel/kobject_example/foo while recording data with below command $ sudo ./perf record -e probe:foo_show -a [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.097 MB perf.data (8 samples) ] $ sudo ./perf report --stdio -F overhead,comm,dso,sym ... # Samples: 8 of event 'probe:foo_show' # Event count (approx.): 8 # # Overhead Command Shared Object Symbol # ... . 
# 100.00% cat [kobject_example] [k] foo_show Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com> --- tools/perf/util/probe-event.c | 78 +++ tools/perf/util/probe-event.h | 2 ++ 2 files changed, 66 insertions(+), 14 deletions(-) diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c index 8319fbb..05d0905 100644 --- a/tools/perf/util/probe-event.c +++ b/tools/perf/util/probe-event.c @@ -265,6 +265,65 @@ static bool kprobe_warn_out_range(const char *symbol, unsigned long address) return true; } +/* + * NOTE: + * '.gnu.linkonce.this_module' section of kernel module elf directly + * maps to 'struct module' from linux/module.h. This section contains + * actual module name which will be used by kernel after loading it. + * But, we cannot use 'struct module' here since linux/module.h is not + * exposed to user-space. Offset of 'name' has remained same from long + * time, so hardcoding it here. + */ +#ifdef __LP64__ +#define MOD_NAME_OFFSET 24 +#else +#define MOD_NAME_OFFSET 12 +#endif + +/* + * @module can be module name of module file path. In case of path, + * inspect elf and find out what is actual module name. + * Caller has to free mod_name after using it. 
+ */ +char *find_module_name(const char *module) +{ + int fd; + Elf *elf; + GElf_Ehdr ehdr; + GElf_Shdr shdr; + Elf_Data *data; + Elf_Scn *sec; + char *mod_name = NULL; + + fd = open(module, O_RDONLY); + if (fd < 0) + return NULL; + + elf = elf_begin(fd, PERF_ELF_C_READ_MMAP, NULL); + if (elf == NULL) + goto elf_err; + + if (gelf_getehdr(elf, &ehdr) == NULL) + goto ret_err; + + sec = elf_section_by_name(elf, &ehdr, &shdr, + ".gnu.linkonce.this_module", NULL); + if (!sec) + goto ret_err; + + data = elf_getdata(sec, NULL); + if (!data || !data->d_buf) + goto ret_err; + + mod_name = strdup((char *)data->d_buf + MOD_NAME_OFFSET); + +ret_err: + elf_end(elf); +elf_err: + close(fd); + return mod_name; +} + #ifdef HAVE_DWARF_SUPPORT static int kernel_get_module_dso(const char *module, struct dso **pdso) @@ -583,32 +642,23 @@ static int add_module_to_probe_trace_events(struct probe_trace_event *tevs, int ntevs, const char *module) { int i, ret = 0; - char *tmp; + char *mod_name; if (!module) return 0; - tmp = strrchr(module, '/'); - if (tmp) { - /* This is a module path -- get the module name */ - module = strdup(tmp + 1); -
[RFC] perf probe: Fix module probe issue if no dwarf support
Perf is not able to register a probe in a kernel module when DWARF support is not available (so it falls back to symtab). Perf passes the full path of the module where only the module name is required, which causes the problem. This patch fixes the issue.
Before applying patch:
$ dpkg -s libdw-dev
dpkg-query: package 'libdw-dev' is not installed ...
$ ./perf probe -m /linux/samples/kobject/kobject-example.ko foo_show
Added new event:
  probe:foo_show (on foo_show in /linux/samples/kobject/kobject-example.ko)
You can now use it in all perf tools, such as:
  perf record -e probe:foo_show -aR sleep 1
$ cat /sys/kernel/debug/tracing/kprobe_events
p:probe/foo_show /linux/samples/kobject/kobject-example.ko:foo_show
After applying patch:
$ ./perf probe -m /linux/samples/kobject/kobject-example.ko foo_show
Added new event:
  probe:foo_show (on foo_show in kobject_example)
You can now use it in all perf tools, such as:
  perf record -e probe:foo_show -aR sleep 1
$ cat /sys/kernel/debug/tracing/kprobe_events
p:probe/foo_show kobject_example:foo_show
Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com>
---
 tools/perf/util/probe-event.c | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)
diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 05d0905..54e6a5a 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -2566,6 +2566,7 @@ static int find_probe_trace_events_from_map(struct perf_probe_event *pev,
 	struct probe_trace_point *tp;
 	int num_matched_functions;
 	int ret, i, j, skipped = 0;
+	char *mod_name;
 
 	map = get_target_map(pev->target, pev->uprobes);
 	if (!map) {
@@ -2650,9 +2651,19 @@ static int find_probe_trace_events_from_map(struct perf_probe_event *pev,
 		tp->realname = strdup_or_goto(sym->name, nomem_out);
 
 		tp->retprobe = pp->retprobe;
-		if (pev->target)
-			tev->point.module = strdup_or_goto(pev->target,
-							   nomem_out);
+		if (pev->target) {
+			if (pev->uprobes) {
+				tev->point.module = strdup_or_goto(pev->target,
+								   nomem_out);
+			} else {
+				mod_name = find_module_name(pev->target);
+				tev->point.module =
+					strdup(mod_name ? mod_name : pev->target);
+				free(mod_name);
+				if (!tev->point.module)
+					goto nomem_out;
+			}
+		}
 		tev->uprobes = pev->uprobes;
 		tev->nargs = pev->nargs;
 		if (tev->nargs) {
-- 
2.1.4
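The kprobe branch of this hunk follows a "resolve, fall back, then free the temporary" pattern. A self-contained sketch of that ownership flow is below; the resolver callback and the two stub resolvers are hypothetical and exist only so the pattern can be exercised without libelf, with find_module_name() being what would fill that role in perf.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical resolver: returns a malloc'ed name, or NULL if it cannot tell. */
typedef char *(*name_resolver_t)(const char *target);

/* Plain-C replacement for strdup() so the sketch needs no POSIX extensions. */
static char *copy_string(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/* Stub resolvers for illustration only. */
static char *resolve_none(const char *target)
{
	(void)target;
	return NULL;			/* e.g. the file could not be opened */
}

static char *resolve_fixed(const char *target)
{
	(void)target;
	return copy_string("kobject_example");
}

/*
 * Return a newly allocated module name for 'target': the resolved name
 * when the resolver succeeds, otherwise a copy of 'target' itself.
 * Returns NULL only on allocation failure. The caller frees the result.
 */
static char *module_name_or_target(const char *target, name_resolver_t resolve)
{
	char *resolved = resolve ? resolve(target) : NULL;
	char *result = copy_string(resolved ? resolved : target);

	free(resolved);			/* free(NULL) is a no-op */
	return result;
}
```

Copying first and freeing the temporary afterwards keeps a single exit path, which matches the hunk's choice of a plain strdup() followed by free() and one NULL check instead of strdup_or_goto().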
[RFC v2 1/2] perf probe: Fix offline module name missmatch issue
Perf can add a probe on kernel module which has not been loaded yet. Current implementation finds module name from path. But if filename is different from actual module name then perf fails to register probe while loading module because of mismatch in names. For example, samples/kobject/kobject-example.ko is loaded as kobject_example. Before applying patch: $ sudo ./perf probe -m /linux/samples/kobject/kobject-example.ko foo_show Added new event: probe:foo_show (on foo_show in kobject-example) You can now use it in all perf tools, such as: perf record -e probe:foo_show -aR sleep 1 $ cat /sys/kernel/debug/tracing/kprobe_events p:probe/foo_show kobject-example:foo_show $ insmod kobject-example.ko $ lsmod Module Size Used by kobject_example16384 0 Generate read to /sys/kernel/kobject_example/foo while recording data with below command $ sudo ./perf record -e probe:foo_show -a [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.093 MB perf.data ] $./perf report --stdio -F overhead,comm,dso,sym Error: The perf.data.old file has no samples! After applying patch: $ sudo ./perf probe -m /linux/samples/kobject/kobject-example.ko foo_show Added new event: probe:foo_show (on foo_show in kobject_example) You can now use it in all perf tools, such as: perf record -e probe:foo_show -aR sleep 1 $ sudo cat /sys/kernel/debug/tracing/kprobe_events p:probe/foo_show kobject_example:foo_show $ insmod kobject-example.ko $ lsmod Module Size Used by kobject_example16384 0 Generate read to /sys/kernel/kobject_example/foo while recording data with below command $ sudo ./perf record -e probe:foo_show -a [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.097 MB perf.data (8 samples) ] $ sudo ./perf report --stdio -F overhead,comm,dso,sym ... # Samples: 8 of event 'probe:foo_show' # Event count (approx.): 8 # # Overhead Command Shared Object Symbol # ... . 
  #   100.00%  cat      [kobject_example]  [k] foo_show

Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com>
---
 tools/perf/util/probe-event.c | 78 +++
 tools/perf/util/probe-event.h |  2 ++
 2 files changed, 66 insertions(+), 14 deletions(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 8319fbb..5f1a9bf 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -265,6 +265,65 @@ static bool kprobe_warn_out_range(const char *symbol, unsigned long address)
 	return true;
 }
 
+/*
+ * NOTE:
+ * '.gnu.linkonce.this_module' section of kernel module elf directly
+ * maps to 'struct module' from linux/module.h. This section contains
+ * actual module name which will be used by kernel after loading it.
+ * But, we cannot use 'struct module' here since linux/module.h is not
+ * exposed to user-space. Offset of 'name' has remained same from long
+ * time, so hardcoding it here.
+ */
+#ifdef __LP64__
+#define MOD_NAME_OFFSET 24
+#else
+#define MOD_NAME_OFFSET 12
+#endif
+
+/*
+ * @module can be module name or module file path. In case of path,
+ * inspect elf and find out what is actual module name.
+ * Caller has to free mod_name after using it.
+ */
+char *find_module_name(const char *module)
+{
+	int fd;
+	Elf *elf;
+	GElf_Ehdr ehdr;
+	GElf_Shdr shdr;
+	Elf_Data *data;
+	Elf_Scn *sec;
+	char *mod_name = NULL;
+
+	fd = open(module, O_RDONLY);
+	if (fd < 0)
+		return NULL;
+
+	elf = elf_begin(fd, PERF_ELF_C_READ_MMAP, NULL);
+	if (elf == NULL)
+		goto elf_err;
+
+	if (gelf_getehdr(elf, &ehdr) == NULL)
+		goto ret_err;
+
+	sec = elf_section_by_name(elf, &ehdr, &shdr,
+			".gnu.linkonce.this_module", NULL);
+	if (!sec)
+		goto ret_err;
+
+	data = elf_getdata(sec, NULL);
+	if (!data || !data->d_buf)
+		goto ret_err;
+
+	mod_name = strdup((char *)data->d_buf + MOD_NAME_OFFSET);
+
+ret_err:
+	elf_end(elf);
+elf_err:
+	close(fd);
+	return mod_name;
+}
+
 #ifdef HAVE_DWARF_SUPPORT
 
 static int kernel_get_module_dso(const char *module, struct dso **pdso)
@@ -583,32 +642,23 @@ static int add_module_to_probe_trace_events(struct probe_trace_event *tevs,
 					    int ntevs, const char *module)
 {
 	int i, ret = 0;
-	char *tmp;
+	char *mod_name;
 
 	if (!module)
 		return 0;
 
-	tmp = strrchr(module, '/');
-	if (tmp) {
-		/* This is a module path -- get the module name */
-		module = strdup(tmp + 1);
-
[RFC v2 2/2] perf probe: Fix module probe issue if no dwarf support
Perf is not able to register a probe in a kernel module when dwarf support is not available (and so it falls back to symtab). Perf passes the full path of the module where only the module name is required, which causes the problem. This patch fixes the issue.

Before applying patch:

  $ dpkg -s libdw-dev
  dpkg-query: package 'libdw-dev' is not installed
  ...

  $ ./perf probe -m /linux/samples/kobject/kobject-example.ko foo_show
  Added new event:
    probe:foo_show       (on foo_show in /linux/samples/kobject/kobject-example.ko)

  You can now use it in all perf tools, such as:

    perf record -e probe:foo_show -aR sleep 1

  $ cat /sys/kernel/debug/tracing/kprobe_events
  p:probe/foo_show /linux/samples/kobject/kobject-example.ko:foo_show

After applying patch:

  $ ./perf probe -m /linux/samples/kobject/kobject-example.ko foo_show
  Added new event:
    probe:foo_show       (on foo_show in kobject_example)

  You can now use it in all perf tools, such as:

    perf record -e probe:foo_show -aR sleep 1

  $ cat /sys/kernel/debug/tracing/kprobe_events
  p:probe/foo_show kobject_example:foo_show

Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com>
---
Changes in v2:
  - made find_module_name static

 tools/perf/util/probe-event.c | 19 +++
 tools/perf/util/probe-event.h |  2 --
 2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 5f1a9bf..c570a6c 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -285,7 +285,7 @@ static bool kprobe_warn_out_range(const char *symbol, unsigned long address)
  * inspect elf and find out what is actual module name.
  * Caller has to free mod_name after using it.
  */
-char *find_module_name(const char *module)
+static char *find_module_name(const char *module)
 {
 	int fd;
 	Elf *elf;
@@ -2566,6 +2566,7 @@ static int find_probe_trace_events_from_map(struct perf_probe_event *pev,
 	struct probe_trace_point *tp;
 	int num_matched_functions;
 	int ret, i, j, skipped = 0;
+	char *mod_name;
 
 	map = get_target_map(pev->target, pev->uprobes);
 	if (!map) {
@@ -2650,9 +2651,19 @@ static int find_probe_trace_events_from_map(struct perf_probe_event *pev,
 		tp->realname = strdup_or_goto(sym->name, nomem_out);
 		tp->retprobe = pp->retprobe;
-		if (pev->target)
-			tev->point.module = strdup_or_goto(pev->target,
-							   nomem_out);
+		if (pev->target) {
+			if (pev->uprobes) {
+				tev->point.module = strdup_or_goto(pev->target,
+								   nomem_out);
+			} else {
+				mod_name = find_module_name(pev->target);
+				tev->point.module =
+					strdup(mod_name ? mod_name : pev->target);
+				free(mod_name);
+				if (!tev->point.module)
+					goto nomem_out;
+			}
+		}
 		tev->uprobes = pev->uprobes;
 		tev->nargs = pev->nargs;
 		if (tev->nargs) {
diff --git a/tools/perf/util/probe-event.h b/tools/perf/util/probe-event.h
index 0468fa3..e54e7b0 100644
--- a/tools/perf/util/probe-event.h
+++ b/tools/perf/util/probe-event.h
@@ -166,6 +166,4 @@ int e_snprintf(char *str, size_t size, const char *format, ...)
 int copy_to_probe_trace_arg(struct probe_trace_arg *tvar,
 			    struct perf_probe_arg *pvar);
 
-char *find_module_name(const char *module);
-
 #endif /*_PROBE_EVENT_H */
-- 
2.1.4
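As a rough illustration of the lookup this series relies on (not the patch's actual code), the sketch below copies a NUL-terminated name out of a section buffer at a fixed offset, with bounds checks added. The names name_at_offset and SKETCH_NAME_OFFSET are hypothetical; the value 24 mirrors the MOD_NAME_OFFSET the series uses for LP64, and the buffer in any caller would stand in for the '.gnu.linkonce.this_module' section data.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical constant; mirrors MOD_NAME_OFFSET for LP64 in the series. */
#define SKETCH_NAME_OFFSET 24

/*
 * Return a malloc'ed copy of the NUL-terminated string that starts at
 * 'offset' inside buf[0..len), or NULL if the offset is out of range or
 * no terminator is found inside the buffer. The caller frees the result.
 */
static char *name_at_offset(const unsigned char *buf, size_t len, size_t offset)
{
	const unsigned char *start, *end;
	size_t n;
	char *name;

	if (!buf || offset >= len)
		return NULL;

	start = buf + offset;
	end = memchr(start, '\0', len - offset);
	if (!end)
		return NULL;	/* unterminated: refuse to read past the buffer */

	n = (size_t)(end - start);
	name = malloc(n + 1);
	if (!name)
		return NULL;

	memcpy(name, start, n + 1);	/* copies the terminator as well */
	return name;
}
```

A caller would pass the section's data pointer and size, then free the returned string once it has been copied into the probe point. The explicit bounds check is a design choice of this sketch: the section contents come from a file on disk, so the sketch does not trust that a terminator exists.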
Re: [RFC 1/4] perf kvm: Enable 'record' on powerpc
Hi Arnaldo, Thanks for the review. Please find my comments below. On Thursday 28 April 2016 03:17 AM, Arnaldo Carvalho de Melo wrote: Em Wed, Apr 27, 2016 at 06:02:21PM +0530, Ravi Bangoria escreveu: Hi Arnaldo, I've worked on your patch. I'm sending this patch(diff) to check if this is the same idea you want to progress with. I cleanup your patch, removed arch specific compile time directives and changed code to enable cross arch reporting. I tested record on powerpc and report on x86 and it's working. Please give suggestion about your approach. Let me know if you have some other idea to progress with. Here is the diff w.r.t perf/cpumode branch: diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c index bff6664..83ef6c6 100644 --- a/tools/perf/builtin-kvm.c +++ b/tools/perf/builtin-kvm.c @@ -1480,6 +1480,60 @@ perf_stat: } #endif /* HAVE_KVM_STAT_SUPPORT */ +#define PPC_HV_DECREMENTER 2432 +#define PPC_HV_BIT 3 +#define PPC_PR_BIT 49 +#define PPC_MAX 63 + +static bool perf_sample__is_hv_dec_trap(struct perf_sample *sample, struct perf_evsel *evsel) +{ +int trap = perf_evsel__intval(evsel, sample, "trap"); +return trap == PPC_HV_DECREMENTER; +} + +static void perf_kvm__munge_ppc_guest_sample(struct perf_evsel *evsel, struct perf_sample *sample) +{ +unsigned long msr, hv, pr; + +if (!perf_sample__is_hv_dec_trap(sample, evsel)) +return; + +sample->ip = perf_evsel__intval(evsel, sample, "pc"); +sample->cpumode = PERF_RECORD_MISC_GUEST_KERNEL; + +msr = perf_evsel__intval(evsel, sample, "msr"); +hv = msr & ((unsigned long)1 << (PPC_MAX - PPC_HV_BIT)); +pr = msr & ((unsigned long)1 << (PPC_MAX - PPC_PR_BIT)); +if (!hv && pr) +sample->cpumode = PERF_RECORD_MISC_GUEST_USER; +} + +static bool perf_evlist__recorded_on_ppc(const struct perf_evlist *evlist) +{ +if (evlist->env && evlist->env->arch) { +return !strcmp(evlist->env->arch, "ppc64") || + !strcmp(evlist->env->arch, "ppc64le"); +} +return false; +} + +int perf_kvm__setup_munge_ppc_guest_event(struct 
perf_evlist *evlist) +{ +struct perf_evsel *evsel; +const char name[] = "kvm_hv:kvm_guest_exit"; + +if (!perf_evlist__recorded_on_ppc(evlist)) +return 0; + +evsel = perf_evlist__find_tracepoint_by_name(evlist, name); +if (evsel == NULL) +return -1; + +evsel->munge_sample = perf_kvm__munge_ppc_guest_sample; + +return 0; +} + static int __cmd_record(const char *file_name, int argc, const char **argv) { int rec_argc, i = 0, j; diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index ab47273..7cb41f7 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -879,6 +879,12 @@ repeat: if (session == NULL) return -1; +if (perf_guest && +perf_kvm__setup_munge_ppc_guest_event(session->evlist)) { +pr_err("PPC event not present in %s file\n", input_name); +goto error; +} This looks out of place, i.e. this reads: "For all cases where there is a guest and we can't setup the ppc KVM guest related stuff, its an error" I think this gets clearer as: if (perf_guest && perf_evlist__recorded_on_ppc(evlist) && perf_kvm__setup_munge_ppc_guest_event(session->evlist)) { pr_err("PPC event not present in %s file\n", input_name); goto error; } Then we read this as "Hey, if this was recorded on ppc, try to set things up for ppc", Yes I'll change this. but then again, what is this KVM stuff doing in the generic 'perf report' code? Basically we are checking if data recorded on ppc in perf.data file. Which can be done after opening a file and mapping header info in evlist. And evlist is initialized in builtin-record.c only. So, I don't see any possibility to move this in builtin-kvm.c. Kindly guide how can we do it. What if this is a perf.data file generated on PPC but being read on PPC? This will not make sense to munge it, right? If you are asking about normal(without kvm) perf record and report, it's working with this patch. Otherwise can you please explain little bit more. 
But yes, we can change this code like this:

    if (perf_guest && perf_evlist__recorded_on_ppc(session->evlist))
        perf_kvm__setup_munge_ppc_guest_event(session->evlist);

and change definition of perf_kvm__setup_munge_ppc_guest_event as:

    void perf_kvm__setup_munge_ppc_guest_event(struct perf_evlist *evlist)
    {
        struct perf_evsel *evsel;
        const char name[] = "kvm_hv:kvm_guest_exit";

        evsel = perf_evlist__find_tracepoint_by_name(evlist, name);
        if (evsel == NULL)
            return;

        evsel->munge_sample = perf_kvm__munge_ppc_guest_sample;
    }

This is with what I remember from this case, please bear with me.

Regards,
Ravi
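The MSR test in the diff quoted in this message can be tried in isolation. Below is a minimal sketch, assuming only the bit positions given there (HV is IBM bit 3, PR is IBM bit 49, counted from the most significant bit of a 64-bit register). The enum and function names are hypothetical, and uint64_t replaces the diff's unsigned long so the shifts stay valid when the report runs on a 32-bit host.

```c
#include <stdbool.h>
#include <stdint.h>

/* IBM bit numbering: bit 0 is the most significant bit of the 64-bit MSR. */
#define SKETCH_PPC_MAX    63
#define SKETCH_PPC_HV_BIT 3
#define SKETCH_PPC_PR_BIT 49

/* Hypothetical stand-ins for the two guest cpumode values. */
enum sketch_cpumode { SKETCH_GUEST_KERNEL, SKETCH_GUEST_USER };

/* Convert an IBM bit number into a mask for a 64-bit value. */
static uint64_t ibm_bit(unsigned int bit)
{
	return (uint64_t)1 << (SKETCH_PPC_MAX - bit);
}

/* Guest user mode only when HV is clear and PR is set; otherwise kernel. */
static enum sketch_cpumode classify_guest_msr(uint64_t msr)
{
	bool hv = msr & ibm_bit(SKETCH_PPC_HV_BIT);
	bool pr = msr & ibm_bit(SKETCH_PPC_PR_BIT);

	return (!hv && pr) ? SKETCH_GUEST_USER : SKETCH_GUEST_KERNEL;
}
```

With these definitions the PR mask is 1 << 14 and the HV mask is 1 << 60, so an MSR of 0x4000 would be classified as guest user and an MSR of 0 as guest kernel.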
[RFC] perf uprobe: Skip prologue if program compiled without optimization
Function prologue prepares stack and registers before executing function logic. When the target program is compiled without optimization, function parameter information is only valid after the prologue. When we probe the entrypc of the function and try to record a function parameter, it contains a garbage value.

For example,

  $ vim test.c
    #include <stdio.h>

    void foo(int i)
    {
       printf("i: %d\n", i);
    }

    int main()
    {
      foo(42);
      return 0;
    }

  $ gcc -g test.c -o test
  $ objdump -dl test | less
    foo():
    /home/ravi/test.c:4
      400536: 55                    push   %rbp
      400537: 48 89 e5              mov    %rsp,%rbp
      40053a: 48 83 ec 10           sub    $0x10,%rsp
      40053e: 89 7d fc              mov    %edi,-0x4(%rbp)
    /home/ravi/test.c:5
      400541: 8b 45 fc              mov    -0x4(%rbp),%eax
    ...
    ...
    main():
    /home/ravi/test.c:9
      400558: 55                    push   %rbp
      400559: 48 89 e5              mov    %rsp,%rbp
    /home/ravi/test.c:10
      40055c: bf 2a 00 00 00        mov    $0x2a,%edi
      400561: e8 d0 ff ff ff        callq  400536 <foo>
    /home/ravi/test.c:11

  $ ./perf probe -x ./test 'foo i'
  $ cat /sys/kernel/debug/tracing/uprobe_events
    p:probe_test/foo /home/ravi/test:0x0536 i=-12(%sp):s32

  $ ./perf record -e probe_test:foo ./test
  $ ./perf script
    test  5778 [001]  4918.562027: probe_test:foo: (400536) i=0

Here variable 'i' is stored on the stack by the instruction at 0x40053e, but we are probing at 0x400536, before that store has run.

To resolve this issue, we need to probe on the next instruction after the prologue. gdb and systemtap do the same thing. I've implemented this patch based on the approach systemtap uses.

After applying patch:

  $ ./perf probe -x ./test 'foo i'
  $ cat /sys/kernel/debug/tracing/uprobe_events
    p:probe_test/foo /home/ravi/test:0x0541 i=-4(%bp):s32

  $ ./perf record -e probe_test:foo ./test
  $ ./perf script
    test  6300 [001]  5877.879327: probe_test:foo: (400541) i=42

No need to skip the prologue for the optimized case, since debug info is correct for each instruction with -O2 -g.
For more details please visit:
https://bugzilla.redhat.com/show_bug.cgi?id=612253#c6

Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com>
---
 tools/perf/util/probe-finder.c | 156 +
 1 file changed, 156 insertions(+)

diff --git a/tools/perf/util/probe-finder.c b/tools/perf/util/probe-finder.c
index f2d9ff0..a788b9c2 100644
--- a/tools/perf/util/probe-finder.c
+++ b/tools/perf/util/probe-finder.c
@@ -892,6 +892,161 @@ static int find_probe_point_lazy(Dwarf_Die *sp_die, struct probe_finder *pf)
 	return die_walk_lines(sp_die, probe_point_lazy_walker, pf);
 }
 
+static bool var_has_loclist(Dwarf_Die *die)
+{
+	Dwarf_Attribute loc;
+	int tag = dwarf_tag(die);
+
+	if (tag != DW_TAG_formal_parameter &&
+	    tag != DW_TAG_variable)
+		return false;
+
+	return (dwarf_attr_integrate(die, DW_AT_location, &loc) &&
+		dwarf_whatform(&loc) == DW_FORM_sec_offset);
+}
+
+/*
+ * For any object in given CU whose DW_AT_location is a location list,
+ * target program is compiled with optimization.
+ */
+static bool optimized_target(Dwarf_Die *die)
+{
+	Dwarf_Die tmp_die;
+
+	if (var_has_loclist(die))
+		return true;
+
+	if (!dwarf_child(die, &tmp_die) && optimized_target(&tmp_die))
+		return true;
+
+	if (!dwarf_siblingof(die, &tmp_die) && optimized_target(&tmp_die))
+		return true;
+
+	return false;
+}
+
+static bool get_entrypc_idx(Dwarf_Lines *lines, unsigned long nr_lines,
+			    Dwarf_Addr pf_addr, unsigned long *entrypc_idx)
+{
+	unsigned long i;
+	Dwarf_Addr addr;
+
+	for (i = 0; i < nr_lines; i++) {
+		if (dwarf_lineaddr(dwarf_onesrcline(lines, i), &addr))
+			return false;
+
+		if (addr == pf_addr) {
+			*entrypc_idx = i;
+			return true;
+		}
+	}
+	return false;
+}
+
+static bool get_postprologue_addr(unsigned long entrypc_idx,
+				  Dwarf_Lines *lines,
+				  unsigned long nr_lines,
+				  Dwarf_Addr highpc,
+				  Dwarf_Addr *postprologue_addr)
+{
+	unsigned long i;
+	int entrypc_lno, lno;
+	Dwarf_Line *line;
+	Dwarf_Addr addr;
+	bool p_end;
+
+	/* entrypc_lno is actual source line number */
+	line = dwarf_onesrcline(lines, entrypc_idx);
+	if (dwarf_lineno(line, &entrypc_lno))
+		return false;
+
+	for (i = entrypc_idx; i < nr_lines; i++) {
+		line = dwarf_onesrclin
[PATCH] perf uprobe: Skip prologue if program compiled without optimization
Function prologue prepares stack and registers before executing function logic. When the target program is compiled without optimization, function parameter information is only valid after the prologue. When we probe the entrypc of the function and try to record a function parameter, it contains a garbage value.

For example,

  $ vim test.c
    #include <stdio.h>

    void foo(int i)
    {
       printf("i: %d\n", i);
    }

    int main()
    {
      foo(42);
      return 0;
    }

  $ gcc -g test.c -o test
  $ objdump -dl test | less
    foo():
    /home/ravi/test.c:4
      400536: 55                    push   %rbp
      400537: 48 89 e5              mov    %rsp,%rbp
      40053a: 48 83 ec 10           sub    $0x10,%rsp
      40053e: 89 7d fc              mov    %edi,-0x4(%rbp)
    /home/ravi/test.c:5
      400541: 8b 45 fc              mov    -0x4(%rbp),%eax
    ...
    ...
    main():
    /home/ravi/test.c:9
      400558: 55                    push   %rbp
      400559: 48 89 e5              mov    %rsp,%rbp
    /home/ravi/test.c:10
      40055c: bf 2a 00 00 00        mov    $0x2a,%edi
      400561: e8 d0 ff ff ff        callq  400536 <foo>
    /home/ravi/test.c:11

  $ ./perf probe -x ./test 'foo i'
  $ cat /sys/kernel/debug/tracing/uprobe_events
    p:probe_test/foo /home/ravi/test:0x0536 i=-12(%sp):s32

  $ ./perf record -e probe_test:foo ./test
  $ ./perf script
    test  5778 [001]  4918.562027: probe_test:foo: (400536) i=0

Here variable 'i' is stored on the stack by the instruction at 0x40053e, but we are probing at 0x400536, before that store has run.

To resolve this issue, we need to probe on the next instruction after the prologue. gdb and systemtap do the same thing. I've implemented this patch based on the approach systemtap uses.

After applying patch:

  $ ./perf probe -x ./test 'foo i'
  $ cat /sys/kernel/debug/tracing/uprobe_events
    p:probe_test/foo /home/ravi/test:0x0541 i=-4(%bp):s32

  $ ./perf record -e probe_test:foo ./test
  $ ./perf script
    test  6300 [001]  5877.879327: probe_test:foo: (400541) i=42

No need to skip the prologue for the optimized case, since debug info is correct for each instruction with -O2 -g.
For more details please visit:
https://bugzilla.redhat.com/show_bug.cgi?id=612253#c6

Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com>
---
Changes wrt RFC:
  - Skip prologue only when function parameter is specified
  - Notify user about skipping prologue

 tools/perf/util/probe-finder.c | 164 +
 1 file changed, 164 insertions(+)

diff --git a/tools/perf/util/probe-finder.c b/tools/perf/util/probe-finder.c
index f2d9ff0..8efa7f2 100644
--- a/tools/perf/util/probe-finder.c
+++ b/tools/perf/util/probe-finder.c
@@ -892,6 +892,169 @@ static int find_probe_point_lazy(Dwarf_Die *sp_die, struct probe_finder *pf)
 	return die_walk_lines(sp_die, probe_point_lazy_walker, pf);
 }
 
+static bool var_has_loclist(Dwarf_Die *die)
+{
+	Dwarf_Attribute loc;
+	int tag = dwarf_tag(die);
+
+	if (tag != DW_TAG_formal_parameter &&
+	    tag != DW_TAG_variable)
+		return false;
+
+	return (dwarf_attr_integrate(die, DW_AT_location, &loc) &&
+		dwarf_whatform(&loc) == DW_FORM_sec_offset);
+}
+
+/*
+ * For any object in given CU whose DW_AT_location is a location list,
+ * target program is compiled with optimization.
+ */
+static bool optimized_target(Dwarf_Die *die)
+{
+	Dwarf_Die tmp_die;
+
+	if (var_has_loclist(die))
+		return true;
+
+	if (!dwarf_child(die, &tmp_die) && optimized_target(&tmp_die))
+		return true;
+
+	if (!dwarf_siblingof(die, &tmp_die) && optimized_target(&tmp_die))
+		return true;
+
+	return false;
+}
+
+static bool get_entrypc_idx(Dwarf_Lines *lines, unsigned long nr_lines,
+			    Dwarf_Addr pf_addr, unsigned long *entrypc_idx)
+{
+	unsigned long i;
+	Dwarf_Addr addr;
+
+	for (i = 0; i < nr_lines; i++) {
+		if (dwarf_lineaddr(dwarf_onesrcline(lines, i), &addr))
+			return false;
+
+		if (addr == pf_addr) {
+			*entrypc_idx = i;
+			return true;
+		}
+	}
+	return false;
+}
+
+static bool get_postprologue_addr(unsigned long entrypc_idx,
+				  Dwarf_Lines *lines,
+				  unsigned long nr_lines,
+				  Dwarf_Addr highpc,
+				  Dwarf_Addr *postprologue_addr)
+{
+	unsigned long i;
+	int entrypc_lno, lno;
+	Dwarf_Line *line;
+	Dwarf_Addr addr;
+	bool p_end;
+
+	/* entrypc_lno is actual source line number */
+	line = dwarf_onesrcline(lines, entrypc_idx);
+	if (dwarf_lineno(line, &entrypc_lno))
+		return fal
Re: [RFC] perf uprobe: Skip prologue if program compiled without optimization
On Saturday 30 July 2016 08:34 AM, Masami Hiramatsu wrote: On Thu, 28 Jul 2016 20:01:51 +0530 Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com> wrote: Function prologue prepares stack and registers before executing function logic. When target program is compiled without optimization, function parameter information is only valid after prologue. When we probe entrypc of the function, and try to record function parameter, it contains garbage value. Right! :) Thanks Masami for review. I've sent patch with changes you suggested. Please review it. -Ravi
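The prologue-skipping idea discussed in this thread can be sketched on a flattened line table. This is only an illustration under stated assumptions: struct line_row and both function names are hypothetical, and the rule "the prologue ends at the first row after the entry that has a different source line and lies below highpc" is an assumed simplification, since the patch's own get_postprologue_addr() is cut off in this archive. The real code walks libdw line records rather than a plain array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened line-table row: address plus source line. */
struct line_row {
	uint64_t addr;
	int lineno;
};

/* Find the row whose address equals the function entry address. */
static bool find_entry_idx(const struct line_row *rows, size_t nr,
			   uint64_t entry, size_t *idx)
{
	for (size_t i = 0; i < nr; i++) {
		if (rows[i].addr == entry) {
			*idx = i;
			return true;
		}
	}
	return false;
}

/*
 * Assumed heuristic: the prologue ends at the first row after the entry
 * that belongs to a different source line and still lies below highpc
 * (the first address past the end of the function).
 */
static bool find_postprologue_addr(const struct line_row *rows, size_t nr,
				   size_t entry_idx, uint64_t highpc,
				   uint64_t *out)
{
	if (entry_idx >= nr)
		return false;

	for (size_t i = entry_idx + 1; i < nr; i++) {
		if (rows[i].addr >= highpc)
			break;	/* left the function without a line change */
		if (rows[i].lineno != rows[entry_idx].lineno) {
			*out = rows[i].addr;
			return true;
		}
	}
	return false;
}
```

Fed the rows from the objdump listing in the patch description (0x400536 at line 4, 0x400541 at line 5) and a highpc at the start of main(), this sketch would pick 0x400541, which matches the probe address shown after applying the patch.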
[PATCH v2 2/2] perf uprobe: Skip prologue if program compiled without optimization
Function prologue prepares stack and registers before executing function logic. When the target program is compiled without optimization, function parameter information is only valid after the prologue. When we probe the entrypc of the function and try to record a function parameter, it contains a garbage value.

For example,

  $ vim test.c
    #include <stdio.h>

    void foo(int i)
    {
       printf("i: %d\n", i);
    }

    int main()
    {
      foo(42);
      return 0;
    }

  $ gcc -g test.c -o test
  $ objdump -dl test | less
    foo():
    /home/ravi/test.c:4
      400536: 55                    push   %rbp
      400537: 48 89 e5              mov    %rsp,%rbp
      40053a: 48 83 ec 10           sub    $0x10,%rsp
      40053e: 89 7d fc              mov    %edi,-0x4(%rbp)
    /home/ravi/test.c:5
      400541: 8b 45 fc              mov    -0x4(%rbp),%eax
    ...
    ...
    main():
    /home/ravi/test.c:9
      400558: 55                    push   %rbp
      400559: 48 89 e5              mov    %rsp,%rbp
    /home/ravi/test.c:10
      40055c: bf 2a 00 00 00        mov    $0x2a,%edi
      400561: e8 d0 ff ff ff        callq  400536 <foo>
    /home/ravi/test.c:11

  $ ./perf probe -x ./test 'foo i'
  $ cat /sys/kernel/debug/tracing/uprobe_events
    p:probe_test/foo /home/ravi/test:0x0536 i=-12(%sp):s32

  $ ./perf record -e probe_test:foo ./test
  $ ./perf script
    test  5778 [001]  4918.562027: probe_test:foo: (400536) i=0

Here variable 'i' is stored on the stack by the instruction at 0x40053e, but we are probing at 0x400536, before that store has run.

To resolve this issue, we need to probe on the next instruction after the prologue. gdb and systemtap do the same thing. I've implemented this patch based on the approach systemtap uses.

After applying patch:

  $ ./perf probe -x ./test 'foo i'
  $ cat /sys/kernel/debug/tracing/uprobe_events
    p:probe_test/foo /home/ravi/test:0x0541 i=-4(%bp):s32

  $ ./perf record -e probe_test:foo ./test
  $ ./perf script
    test  6300 [001]  5877.879327: probe_test:foo: (400541) i=42

No need to skip the prologue for the optimized case, since debug info is correct for each instruction with -O2 -g.
For more details please visit:
https://bugzilla.redhat.com/show_bug.cgi?id=612253#c6

Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com>
---
Changes in v2:
  - Skipping prologue only when any ARG is either C variable, $params
    or $vars.
  - Probe on line(:1) may not be always possible. Recommend only address
    to force probe on function entry.

 tools/perf/util/probe-finder.c | 164 +
 1 file changed, 164 insertions(+)

diff --git a/tools/perf/util/probe-finder.c b/tools/perf/util/probe-finder.c
index f2d9ff0..b2bc77c 100644
--- a/tools/perf/util/probe-finder.c
+++ b/tools/perf/util/probe-finder.c
@@ -892,6 +892,169 @@ static int find_probe_point_lazy(Dwarf_Die *sp_die, struct probe_finder *pf)
 	return die_walk_lines(sp_die, probe_point_lazy_walker, pf);
 }
 
+static bool var_has_loclist(Dwarf_Die *die)
+{
+	Dwarf_Attribute loc;
+	int tag = dwarf_tag(die);
+
+	if (tag != DW_TAG_formal_parameter &&
+	    tag != DW_TAG_variable)
+		return false;
+
+	return (dwarf_attr_integrate(die, DW_AT_location, &loc) &&
+		dwarf_whatform(&loc) == DW_FORM_sec_offset);
+}
+
+/*
+ * For any object in given CU whose DW_AT_location is a location list,
+ * target program is compiled with optimization.
+ */
+static bool optimized_target(Dwarf_Die *die)
+{
+	Dwarf_Die tmp_die;
+
+	if (var_has_loclist(die))
+		return true;
+
+	if (!dwarf_child(die, &tmp_die) && optimized_target(&tmp_die))
+		return true;
+
+	if (!dwarf_siblingof(die, &tmp_die) && optimized_target(&tmp_die))
+		return true;
+
+	return false;
+}
+
+static bool get_entrypc_idx(Dwarf_Lines *lines, unsigned long nr_lines,
+			    Dwarf_Addr pf_addr, unsigned long *entrypc_idx)
+{
+	unsigned long i;
+	Dwarf_Addr addr;
+
+	for (i = 0; i < nr_lines; i++) {
+		if (dwarf_lineaddr(dwarf_onesrcline(lines, i), &addr))
+			return false;
+
+		if (addr == pf_addr) {
+			*entrypc_idx = i;
+			return true;
+		}
+	}
+	return false;
+}
+
+static bool get_postprologue_addr(unsigned long entrypc_idx,
+				  Dwarf_Lines *lines,
+				  unsigned long nr_lines,
+				  Dwarf_Addr highpc,
+				  Dwarf_Addr *postprologue_addr)
+{
+	unsigned long i;
+	int entrypc_lno, lno;
+	Dwarf_Line *line;
+	Dwarf_Addr addr;
+	bool p_end;
+
+	/* entrypc_lno is actual source line number */
+	line = dwarf_onesrcline
[PATCH 1/2] perf probe: Helper function to check if probe with variable
Introduce helper function instead of inline code and replace hardcoded strings "$vars" and "$params" with their corresponding macros. perf_probe_with_var is not declared as static since it will be called from different file in subsequent patch. Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com> --- tools/perf/util/probe-event.c | 22 +++--- tools/perf/util/probe-event.h | 2 ++ 2 files changed, 17 insertions(+), 7 deletions(-) diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c index 953dc1a..bc9317e 100644 --- a/tools/perf/util/probe-event.c +++ b/tools/perf/util/probe-event.c @@ -1592,19 +1592,27 @@ out: return ret; } +/* Returns true if *any* ARG is either C variable, $params or $vars. */ +bool perf_probe_with_var(struct perf_probe_event *pev) +{ + int i = 0; + + for (i = 0; i < pev->nargs; i++) + if (is_c_varname(pev->args[i].var) || + !strcmp(pev->args[i].var, PROBE_ARG_PARAMS) || + !strcmp(pev->args[i].var, PROBE_ARG_VARS)) + return true; + return false; +} + /* Return true if this perf_probe_event requires debuginfo */ bool perf_probe_event_need_dwarf(struct perf_probe_event *pev) { - int i; - if (pev->point.file || pev->point.line || pev->point.lazy_line) return true; - for (i = 0; i < pev->nargs; i++) - if (is_c_varname(pev->args[i].var) || - !strcmp(pev->args[i].var, "$params") || - !strcmp(pev->args[i].var, "$vars")) - return true; + if (perf_probe_with_var(pev)) + return true; return false; } diff --git a/tools/perf/util/probe-event.h b/tools/perf/util/probe-event.h index e18ea9f..4d1139b 100644 --- a/tools/perf/util/probe-event.h +++ b/tools/perf/util/probe-event.h @@ -128,6 +128,8 @@ char *synthesize_perf_probe_point(struct perf_probe_point *pp); int perf_probe_event__copy(struct perf_probe_event *dst, struct perf_probe_event *src); +bool perf_probe_with_var(struct perf_probe_event *pev); + /* Check the perf_probe_event needs debuginfo */ bool perf_probe_event_need_dwarf(struct perf_probe_event *pev); -- 2.5.5
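To make the helper's rule concrete outside of perf, here is a minimal standalone sketch. Everything in it is a stand-in: the argument list is a plain array of strings instead of struct perf_probe_event, the SKETCH_ macros replace PROBE_ARG_VARS and PROBE_ARG_PARAMS, and looks_like_c_varname() is an assumed approximation of is_c_varname() (first character is a letter or underscore), not a copy of it.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Stand-ins for PROBE_ARG_VARS / PROBE_ARG_PARAMS. */
#define SKETCH_ARG_VARS   "$vars"
#define SKETCH_ARG_PARAMS "$params"

/* Assumed approximation of is_c_varname(): starts with a letter or '_'. */
static bool looks_like_c_varname(const char *s)
{
	return s && (isalpha((unsigned char)s[0]) || s[0] == '_');
}

/* True if *any* argument is a C variable name, $params or $vars. */
static bool any_arg_is_variable(const char *const *args, int nargs)
{
	for (int i = 0; i < nargs; i++) {
		if (!args[i])
			continue;
		if (looks_like_c_varname(args[i]) ||
		    !strcmp(args[i], SKETCH_ARG_PARAMS) ||
		    !strcmp(args[i], SKETCH_ARG_VARS))
			return true;
	}
	return false;
}
```

Under these assumptions, an argument list such as { "%ax", "i" } counts as using a variable, while one made only of registers and stack offsets, such as { "%ax", "-12(%sp)" }, does not.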
Re: [PATCH] perf uprobe: Skip prologue if program compiled without optimization
Thanks Masami, On Tuesday 02 August 2016 08:22 PM, Masami Hiramatsu wrote: On Mon, 1 Aug 2016 14:19:28 +0530 Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com> wrote: Function prologue prepares stack and registers before executing function logic. When target program is compiled without optimization, function parameter information is only valid after prologue. When we probe entrypc of the function, and try to record function parameter, it contains garbage value. [SNIP] + + /* Only FUNC and FUNC@SRC are eligible. */ + if (!pp->function || pp->line || pp->retprobe || pp->lazy_line || + pp->offset || pp->abs_address) + return; + + /* Not interested in func parameter? */ + if (!pf->pev->nargs) + return; Hmm, this is not enough, since perf-probe accepts registers and stacks. At least you should check if all argument are !is_c_varname(), !PROBE_ARG_VARS and !PROBE_ARG_PARAMS here, instead of checking nargs. + + pr_info("Target program is compiled without optimization. Skipping prologue.\n" + "Use %s:1 or absolute address 0x%lx to force probe on entry point.\n\n", Hmm, is :1 always available? I think we should just recommend to use only the address. (moreover, pf->addr may not the absolute address in uprobe event, we'd better say "the address 0x%x") Nice catch. :) Sent v2. Please review it. -Ravi
Re: [PATCH 2/2] perf ppc64le: Fix probe location when using DWARF
On Thursday 11 August 2016 05:20 PM, Arnaldo Carvalho de Melo wrote: Em Thu, Aug 11, 2016 at 10:01:04AM +0530, Ravi Bangoria escreveu: On Thursday 11 August 2016 05:24 AM, Anton Blanchard wrote: Hi, Powerpc has Global Entry Point and Local Entry Point for functions. LEP catches call from both the GEP and the LEP. Symbol table of ELF contains GEP and Offset from which we can calculate LEP, but debuginfo does not have LEP info. Currently, perf prioritize symbol table over dwarf to probe on LEP for ppc64le. But when user tries to probe with function parameter, we fall back to using dwarf(i.e. GEP) and when function called via LEP, probe will never hit. This patch causes a build failure for me on ppc64le: libperf.a(libperf-in.o): In function `arch__post_process_probe_trace_events': tools/perf/arch/powerpc/util/sym-handling.c:109: undefined reference to `get_target_map' Thanks Anton. Sorry, I should have caught that. @Arnaldo, Can you please pick this up. I've prepared this on top of acme/perf/core. From 89c977ae9c3ae35c78b16cddabcf2b01d3cf5cc8 Mon Sep 17 00:00:00 2001 From: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com> Date: Wed, 10 Aug 2016 23:13:45 -0500 Subject: [PATCH] perf ppc64le: Fix build failure when no dwarf support Fix perf build failure on ppc64le because of Commit 99e608b5954c ("perf probe ppc64le: Fix probe location when using DWARF") Can you please provide a better explanation? I had to look at the patch to understand what it was fixing, and then the patch adds LIBELF_SUPPORT ifdefs while the patch description, talks about DWARF. Yes. Explanation could have been better. Apologies for that. arch__post_process_probe_trace_events() calls get_target_map() to prepare symbol table. get_target_map() is defined inside util/probe-event.c. probe-event.c will only get included in perf binary if CONFIG_LIBELF is set. Hence arch__post_process_probe_trace_events() needs to be defined inside #ifdef HAVE_LIBELF_SUPPORT to solve compilation error. 
Please let me know if any doubts. Thanks, Ravi Anyway, Anton, does this fix the problem for you? - Arnaldo Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com> --- tools/perf/arch/powerpc/util/sym-handling.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/perf/arch/powerpc/util/sym-handling.c b/tools/perf/arch/powerpc/util/sym-handling.c index 8d4dc97..c27a51a 100644 --- a/tools/perf/arch/powerpc/util/sym-handling.c +++ b/tools/perf/arch/powerpc/util/sym-handling.c @@ -97,6 +97,7 @@ void arch__fix_tev_from_maps(struct perf_probe_event *pev, } } +#ifdef HAVE_LIBELF_SUPPORT void arch__post_process_probe_trace_events(struct perf_probe_event *pev, int ntevs) { @@ -118,5 +119,6 @@ void arch__post_process_probe_trace_events(struct perf_probe_event *pev, } } } +#endif #endif -- 2.7.4
Re: [PATCH v2 2/2] perf uprobe: Skip prologue if program compiled without optimization
Hi Masami, Arnaldo,

Any updates on this?

Thanks,
Ravi

On Wednesday 03 August 2016 02:28 PM, Ravi Bangoria wrote:
> Function prologue prepares stack and registers before executing function
> logic. When target program is compiled without optimization, function
> parameter information is only valid after prologue. When we probe entrypc
> of the function, and try to record function parameter, it contains
> garbage value.
>
> For example,
>   $ vim test.c
>     #include <stdio.h>
>
>     void foo(int i)
>     {
>        printf("i: %d\n", i);
>     }
>
>     int main()
>     {
>       foo(42);
>       return 0;
>     }
>
>   $ gcc -g test.c -o test
>   $ objdump -dl test | less
>     foo():
>     /home/ravi/test.c:4
>       400536: 55                    push   %rbp
>       400537: 48 89 e5              mov    %rsp,%rbp
>       40053a: 48 83 ec 10           sub    $0x10,%rsp
>       40053e: 89 7d fc              mov    %edi,-0x4(%rbp)
>     /home/ravi/test.c:5
>       400541: 8b 45 fc              mov    -0x4(%rbp),%eax
>     ...
>     ...
>     main():
>     /home/ravi/test.c:9
>       400558: 55                    push   %rbp
>       400559: 48 89 e5              mov    %rsp,%rbp
>     /home/ravi/test.c:10
>       40055c: bf 2a 00 00 00        mov    $0x2a,%edi
>       400561: e8 d0 ff ff ff        callq  400536 <foo>
>     /home/ravi/test.c:11
>
>   $ ./perf probe -x ./test 'foo i'
>   $ cat /sys/kernel/debug/tracing/uprobe_events
>     p:probe_test/foo /home/ravi/test:0x0536 i=-12(%sp):s32
>
>   $ ./perf record -e probe_test:foo ./test
>   $ ./perf script
>     test  5778 [001]  4918.562027: probe_test:foo: (400536) i=0
>
> Here variable 'i' is passed via stack which is pushed on stack at
> 0x40053e. But we are probing at 0x400536.
>
> To resolve this issues, we need to probe on next instruction after
> prologue. gdb and systemtap also does same thing. I've implemented
> this patch based on approach systemtap has used.
> > After applying patch: > > $ ./perf probe -x ./test 'foo i' > $ cat /sys/kernel/debug/tracing/uprobe_events > p:probe_test/foo /home/ravi/test:0x0541 i=-4(%bp):s32 > > $ ./perf record -e probe_test:foo ./test > $ ./perf script > test 6300 [001] 5877.879327: probe_test:foo: (400541) i=42 > > No need to skip prologue for optimized case since debug info is correct > for each instructions for -O2 -g. For more details please visit: > https://bugzilla.redhat.com/show_bug.cgi?id=612253#c6 > > Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com> > --- > Changes in v2: > - Skipping prologue only when any ARG is either C variable, $params > or $vars. > - Probe on line(:1) may not be always possible. Recommend only address > to force probe on function entry. > > tools/perf/util/probe-finder.c | 164 > + > 1 file changed, 164 insertions(+) > > diff --git a/tools/perf/util/probe-finder.c b/tools/perf/util/probe-finder.c > index f2d9ff0..b2bc77c 100644 > --- a/tools/perf/util/probe-finder.c > +++ b/tools/perf/util/probe-finder.c > @@ -892,6 +892,169 @@ static int find_probe_point_lazy(Dwarf_Die *sp_die, > struct probe_finder *pf) > return die_walk_lines(sp_die, probe_point_lazy_walker, pf); > } > > +static bool var_has_loclist(Dwarf_Die *die) > +{ > + Dwarf_Attribute loc; > + int tag = dwarf_tag(die); > + > + if (tag != DW_TAG_formal_parameter && > + tag != DW_TAG_variable) > + return false; > + > + return (dwarf_attr_integrate(die, DW_AT_location, ) && > + dwarf_whatform() == DW_FORM_sec_offset); > +} > + > +/* > + * For any object in given CU whose DW_AT_location is a location list, > + * target program is compiled with optimization. 
> + */ > +static bool optimized_target(Dwarf_Die *die) > +{ > + Dwarf_Die tmp_die; > + > + if (var_has_loclist(die)) > + return true; > + > + if (!dwarf_child(die, _die) && optimized_target(_die)) > + return true; > + > + if (!dwarf_siblingof(die, _die) && optimized_target(_die)) > + return true; > + > + return false; > +} > + > +static bool get_entrypc_idx(Dwarf_Lines *lines, unsigned long nr_lines, > + Dwarf_Addr pf_addr, unsigned long *entrypc_idx) > +{ > + unsigned long i; > + Dwarf_Addr addr; > + > + for (i = 0; i < nr_lines; i++) { > + if (dwarf_lineaddr(dwarf_onesrcline(lines, i),
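As an illustration of the prologue-skipping idea described in the quoted patch, here is a minimal sketch. It is not the code from the patch (which is cut off above) and it does not use libdw; the `line_row` structure and `skip_prologue()` helper are hypothetical names, and the line table is assumed to be a plain array sorted by address. The sketch picks the first line-table row after the function entry that maps to a different source line.

```c
#include <stddef.h>
#include <stdint.h>

/* One row of a simplified, hypothetical line table: address -> source line. */
struct line_row {
	uint64_t addr;
	int lineno;
};

/*
 * Return the address of the first row after the function's entry row that
 * belongs to a different source line, i.e. the first instruction past the
 * prologue in unoptimized code. Falls back to entrypc when the entry row is
 * missing or no later row is found below highpc.
 * Assumption: rows[] is sorted by address.
 */
static uint64_t skip_prologue(const struct line_row *rows, size_t nr_rows,
			      uint64_t entrypc, uint64_t highpc)
{
	size_t i, entry_idx = nr_rows;

	/* Locate the row that starts exactly at the function entry. */
	for (i = 0; i < nr_rows; i++) {
		if (rows[i].addr == entrypc) {
			entry_idx = i;
			break;
		}
	}
	if (entry_idx == nr_rows)
		return entrypc;

	/* Walk forward to the next source line inside the function. */
	for (i = entry_idx + 1; i < nr_rows; i++) {
		if (rows[i].addr >= highpc)
			break;
		if (rows[i].lineno != rows[entry_idx].lineno)
			return rows[i].addr;
	}
	return entrypc;
}
```

With the rows from the objdump listing in the quoted message (0x400536 on line 4, 0x400541 on line 5, and 0x400558 as the end of foo), the helper returns 0x400541, which matches the probe address shown after applying the patch.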
Re: [PATCH 2/2] perf ppc64le: Fix probe location when using DWARF
On Thursday 11 August 2016 05:24 AM, Anton Blanchard wrote:
> Hi,
>
>> Powerpc has Global Entry Point and Local Entry Point for functions.
>> LEP catches call from both the GEP and the LEP. Symbol table of ELF
>> contains GEP and Offset from which we can calculate LEP, but debuginfo
>> does not have LEP info.
>>
>> Currently, perf prioritize symbol table over dwarf to probe on LEP
>> for ppc64le. But when user tries to probe with function parameter,
>> we fall back to using dwarf(i.e. GEP) and when function called via
>> LEP, probe will never hit.
>
> This patch causes a build failure for me on ppc64le:
>
> libperf.a(libperf-in.o): In function `arch__post_process_probe_trace_events':
> tools/perf/arch/powerpc/util/sym-handling.c:109: undefined reference to `get_target_map'

Thanks Anton. Sorry, I should have caught that.

@Arnaldo, Can you please pick this up. I've prepared this on top of
acme/perf/core.

From 89c977ae9c3ae35c78b16cddabcf2b01d3cf5cc8 Mon Sep 17 00:00:00 2001
From: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com>
Date: Wed, 10 Aug 2016 23:13:45 -0500
Subject: [PATCH] perf ppc64le: Fix build failure when no dwarf support

Fix perf build failure on ppc64le because of Commit 99e608b5954c ("perf
probe ppc64le: Fix probe location when using DWARF")

Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com>
---
 tools/perf/arch/powerpc/util/sym-handling.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/perf/arch/powerpc/util/sym-handling.c b/tools/perf/arch/powerpc/util/sym-handling.c
index 8d4dc97..c27a51a 100644
--- a/tools/perf/arch/powerpc/util/sym-handling.c
+++ b/tools/perf/arch/powerpc/util/sym-handling.c
@@ -97,6 +97,7 @@ void arch__fix_tev_from_maps(struct perf_probe_event *pev,
 	}
 }
 
+#ifdef HAVE_LIBELF_SUPPORT
 void arch__post_process_probe_trace_events(struct perf_probe_event *pev,
 					   int ntevs)
 {
@@ -118,5 +119,6 @@ void arch__post_process_probe_trace_events(struct perf_probe_event *pev,
 		}
 	}
 }
+#endif
 
 #endif
--
2.7.4
Re: [PATCH v3 3/4] perf annotate: add powerpc support
On Wednesday 13 July 2016 01:09 PM, Michael Ellerman wrote:
> Arnaldo Carvalho de Melo <a...@kernel.org> writes:
>> Em Tue, Jul 12, 2016 at 07:51:46AM +0530, Ravi Bangoria escreveu:
>>> Hi Arnaldo,
>>>
>>> On Friday 08 July 2016 02:01 PM, Michael Ellerman wrote:
>>>> Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com> writes:
>>>>> On Wednesday 06 July 2016 03:38 PM, Michael Ellerman wrote:
>>>>> I've sent v4 which enables annotate for 'bctr' instructions.
>>>>> For 'bctr', it will show a down arrow (indicating a jump) and
>>>>> 'bctrl' will show a right arrow (indicating a call). But no
>>>>> navigation options will be provided. Pressing the Enter key on
>>>>> such a line shows a message like "Invalid target".
>>>> Great thanks.
>>> I've sent v4 series. Please review it.
>> If somebody else could do it and provide acks/reviewed by, that would
>> help, Michael, can I get your comments as such?
> It looks OK to me. But I don't know the code really, and I haven't had
> time to test it personally.
>
> Ravi, have you tested on a big endian machine?

Yes Michael, I've tested annotate on BE and LE both.

-Ravi

> cheers
Re: [PATCH v4 0/3] perf annotate: Enable cross arch annotate
Arnaldo, Michael,

I've tested this patchset on ppc64 BE and LE both. Please review this.

-Ravi

On Friday 08 July 2016 10:10 AM, Ravi Bangoria wrote:
> Perf currently supports code navigation (branches and calls) in
> annotate only when it runs on the same architecture on which perf.data
> was recorded. Cross arch annotate is not supported.
>
> This patchset enables cross arch annotate. Currently I've used x86
> and arm instructions which are already available and adding support
> for powerpc as well. Adding support for other arch will be easy.
>
> I've created this patch on top of acme/perf/core. And tested it with
> x86 and powerpc only.
>
> Note for arm:
> A few instructions were defined under #if __arm__ which I've used as a
> table for arm. But I'm not sure whether the instructions defined outside
> of that also contain arm instructions. Apart from that, 'call__parse()'
> and 'mov__parse()' contain an #ifdef __arm__ directive. I've changed it
> to if (!strcmp(norm_arch, NORM_ARM)). I don't have an arm machine to
> test these changes.
>
> Example:
>
>   Record on powerpc:
>   $ ./perf record -a
>
>   Report -> Annotate on x86:
>   $ ./perf report -i perf.data.powerpc --vmlinux vmlinux.powerpc
>
> Changes in v4:
>   - powerpc: Added support for branch instructions that include 'ctr'
>   - __maybe_unused was misplaced at a few locations. Corrected it.
>   - Moved position of v3 last patch that defines a macro for each arch
>     name
>
> v3 link: https://lkml.org/lkml/2016/6/30/99
>
> Naveen N. Rao (1):
>   perf annotate: add powerpc support
>
> Ravi Bangoria (2):
>   perf: Define macro for normalized arch names
>   perf annotate: Enable cross arch annotate
>
>  tools/perf/arch/common.c           |  36 ++---
>  tools/perf/arch/common.h           |  11 ++
>  tools/perf/builtin-top.c           |   2 +-
>  tools/perf/ui/browsers/annotate.c  |   3 +-
>  tools/perf/ui/gtk/annotate.c       |   2 +-
>  tools/perf/util/annotate.c         | 273 ++---
>  tools/perf/util/annotate.h         |   6 +-
>  tools/perf/util/unwind-libunwind.c |   4 +-
>  8 files changed, 265 insertions(+), 72 deletions(-)
>
> --
> 2.5.5
Re: [PATCH v3 3/4] perf annotate: add powerpc support
Hi Arnaldo,

On Friday 08 July 2016 02:01 PM, Michael Ellerman wrote:
> Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com> writes:
>> On Wednesday 06 July 2016 03:38 PM, Michael Ellerman wrote:
>> I've sent v4 which enables annotate for 'bctr' instructions.
>> For 'bctr', it will show a down arrow (indicating a jump) and
>> 'bctrl' will show a right arrow (indicating a call). But no
>> navigation options will be provided. Pressing the Enter key on
>> such a line shows a message like "Invalid target".
> Great thanks.

I've sent v4 series. Please review it.

-Ravi

>>>>> It doesn't look like we have the opcode handy here? Could we get it
>>>>> somehow? That would make this a *lot* more robust.
>>>> objdump prints machine code, but I don't know how difficult that
>>>> would be to parse to get opcode.
>>> Normal objdump -d output includes the opcode, eg:
>>>
>>>   c000886c: 2c 2c 00 00 cmpdi r12,0
>>>             ^^^
>>>
>>> The only thing you need to know is the endian and you can reconstruct
>>> the raw instruction.
>>>
>>> Then you can just decode the opcode, see how we do it in the kernel
>>> with eg. instr_is_relative_branch().
>> I'm sorry. I was thinking that you want to show opcodes with perf
>> annotate. But you were asking to use opcode instead of parsing
>> instructions.
> Yeah.
>> This looks like rewriting the parsing code. I don't know whether there
>> is any library already available for this which we can directly use.
>> I'm thinking about this.
> OK don't worry about it for now. We should get this merged for starters
> and we can always improve it later.
>
> cheers
Re: [PATCH v3 3/4] perf annotate: add powerpc support
Hi Michael, On Friday 01 July 2016 02:13 PM, Ravi Bangoria wrote: Thanks Michael for your suggestion. On Thursday 30 June 2016 11:51 AM, Michael Ellerman wrote: On Thu, 2016-06-30 at 11:44 +0530, Ravi Bangoria wrote: diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c index 36a5825..b87eac7 100644 --- a/tools/perf/util/annotate.c +++ b/tools/perf/util/annotate.c @@ -476,6 +481,125 @@ static int ins__cmp(const void *a, const void *b) ... + +static struct ins *ins__find_powerpc(const char *name) +{ +int i; +struct ins *ins; +struct ins_ops *ops; +static struct instructions_powerpc head; +static bool list_initialized; + +/* + * - Interested only if instruction starts with 'b'. + * - Few start with 'b', but aren't branch instructions. + * - Let's also ignore instructions involving 'ctr' and + * 'tar' since target branch addresses for those can't + * be determined statically. + */ +if (name[0] != 'b' || +!strncmp(name, "bcd", 3) || +!strncmp(name, "brinc", 5) || +!strncmp(name, "bper", 4) || +strstr(name, "ctr")|| +strstr(name, "tar")) +return NULL; It would be good if 'bctr' was at least recognised as a branch, even if we can't determine the target. They are very common. We can not show arrow for this since we don't know the target location. can you please suggest how you intends perf to display bctr? bctr can be classified into two variants -- 'bctr' and 'bctrl'. 'bctr' will be considered as jump instruction but jump__parse() won't be able to find any target location and hence it will set target to UINT64_MAX which transform 'bctr' to 'bctr UINT64_MAX'. This looks misleading. bctrl will be considered as call instruction but call_parse() won't be able to find any target function and hence it won't show any navigation arrow for this instruction. Which is same as filter it beforehand. It doesn't look like we have the opcode handy here? Could we get it somehow? That would make this a *lot* more robust. 
objdump prints machine code, but I don't know how difficult that would be to parse to get opcode. Perf uses --no-show-raw with objdump and hence objdump output does not show opcodes. So a change in the current objdump output may require changes in the current parsing logic. Additionally I would need to change the TUI as well to show opcodes. This looks like considerably more work. And this patchset is about enabling annotate for cross arch. So if you really need opcodes with perf annotate, can we do it separately? Please let me know your thoughts. -Ravi -Ravi cheers
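To make the mnemonic-based approach discussed in this message concrete, here is a rough sketch. It reuses the name filter quoted above but deliberately keeps the 'ctr' forms, so that 'bctr' is treated as a jump and 'bctrl' as a call, as proposed in the thread. The `ppc_ins_kind` enum and `ppc_classify_mnemonic()` are hypothetical names, not part of perf, and the link-suffix check (trailing 'l' or 'la') is an assumption about powerpc mnemonics rather than something taken from the patch.

```c
#include <string.h>

/* Hypothetical classification result; not a perf type. */
enum ppc_ins_kind {
	PPC_INS_OTHER,	/* not treated as a branch */
	PPC_INS_JUMP,	/* branch without link, e.g. 'b', 'bne', 'bctr' */
	PPC_INS_CALL,	/* branch with link, e.g. 'bl', 'bla', 'bctrl' */
};

static enum ppc_ins_kind ppc_classify_mnemonic(const char *name)
{
	size_t len = strlen(name);

	/*
	 * Same filter as in the quoted patch: only names starting with 'b'
	 * are interesting, and a few of those are not branches at all.
	 */
	if (name[0] != 'b' ||
	    !strncmp(name, "bcd", 3) ||
	    !strncmp(name, "brinc", 5) ||
	    !strncmp(name, "bper", 4))
		return PPC_INS_OTHER;

	/* Assumption: a trailing 'l' or 'la' means the link register is set. */
	if (name[len - 1] == 'l' ||
	    (len >= 2 && name[len - 2] == 'l' && name[len - 1] == 'a'))
		return PPC_INS_CALL;

	return PPC_INS_JUMP;
}
```

The sketch inherits the limits of name matching: it reports 'blr' as a plain jump although it is really a return, and it does not handle branch-prediction suffixes such as '+' or '-'.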
[PATCH v4 1/3] perf: Define macro for normalized arch names
Define macro for each normalized arch name and use them instead
of using arch name as string

Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com>
---
Changes in v4:
  - Moved position of patch

 tools/perf/arch/common.c           | 36 ++--
 tools/perf/arch/common.h           | 11 +++
 tools/perf/util/unwind-libunwind.c |  4 ++--
 3 files changed, 31 insertions(+), 20 deletions(-)

diff --git a/tools/perf/arch/common.c b/tools/perf/arch/common.c
index ee69668..feb2113 100644
--- a/tools/perf/arch/common.c
+++ b/tools/perf/arch/common.c
@@ -122,25 +122,25 @@ static int lookup_triplets(const char *const *triplets, const char *name)
 const char *normalize_arch(char *arch)
 {
 	if (!strcmp(arch, "x86_64"))
-		return "x86";
+		return NORM_X86;
 	if (arch[0] == 'i' && arch[2] == '8' && arch[3] == '6')
-		return "x86";
+		return NORM_X86;
 	if (!strcmp(arch, "sun4u") || !strncmp(arch, "sparc", 5))
-		return "sparc";
+		return NORM_SPARC;
 	if (!strcmp(arch, "aarch64") || !strcmp(arch, "arm64"))
-		return "arm64";
+		return NORM_ARM64;
 	if (!strncmp(arch, "arm", 3) || !strcmp(arch, "sa110"))
-		return "arm";
+		return NORM_ARM;
 	if (!strncmp(arch, "s390", 4))
-		return "s390";
+		return NORM_S390;
 	if (!strncmp(arch, "parisc", 6))
-		return "parisc";
+		return NORM_PARISC;
 	if (!strncmp(arch, "powerpc", 7) || !strncmp(arch, "ppc", 3))
-		return "powerpc";
+		return NORM_POWERPC;
 	if (!strncmp(arch, "mips", 4))
-		return "mips";
+		return NORM_MIPS;
 	if (!strncmp(arch, "sh", 2) && isdigit(arch[2]))
-		return "sh";
+		return NORM_SH;
 
 	return arch;
 }
@@ -180,21 +180,21 @@ static int perf_env__lookup_binutils_path(struct perf_env *env,
 		zfree();
 	}
 
-	if (!strcmp(arch, "arm"))
+	if (!strcmp(arch, NORM_ARM))
 		path_list = arm_triplets;
-	else if (!strcmp(arch, "arm64"))
+	else if (!strcmp(arch, NORM_ARM64))
 		path_list = arm64_triplets;
-	else if (!strcmp(arch, "powerpc"))
+	else if (!strcmp(arch, NORM_POWERPC))
 		path_list = powerpc_triplets;
-	else if (!strcmp(arch, "sh"))
+	else if (!strcmp(arch, NORM_SH))
 		path_list = sh_triplets;
-	else if (!strcmp(arch, "s390"))
+	else if (!strcmp(arch, NORM_S390))
 		path_list = s390_triplets;
-	else if (!strcmp(arch, "sparc"))
+	else if (!strcmp(arch, NORM_SPARC))
 		path_list = sparc_triplets;
-	else if (!strcmp(arch, "x86"))
+	else if (!strcmp(arch, NORM_X86))
 		path_list = x86_triplets;
-	else if (!strcmp(arch, "mips"))
+	else if (!strcmp(arch, NORM_MIPS))
 		path_list = mips_triplets;
 	else {
 		ui__error("binutils for %s not supported.\n", arch);
diff --git a/tools/perf/arch/common.h b/tools/perf/arch/common.h
index 6b01c73..14ca8ca 100644
--- a/tools/perf/arch/common.h
+++ b/tools/perf/arch/common.h
@@ -5,6 +5,17 @@
 
 extern const char *objdump_path;
 
+/* Macro for normalized arch names */
+#define NORM_X86	"x86"
+#define NORM_SPARC	"sparc"
+#define NORM_ARM64	"arm64"
+#define NORM_ARM	"arm"
+#define NORM_S390	"s390"
+#define NORM_PARISC	"parisc"
+#define NORM_POWERPC	"powerpc"
+#define NORM_MIPS	"mips"
+#define NORM_SH		"sh"
+
 int perf_env__lookup_objdump(struct perf_env *env);
 const char *normalize_arch(char *arch);
 
diff --git a/tools/perf/util/unwind-libunwind.c b/tools/perf/util/unwind-libunwind.c
index 6d542a4..6199102 100644
--- a/tools/perf/util/unwind-libunwind.c
+++ b/tools/perf/util/unwind-libunwind.c
@@ -40,10 +40,10 @@ int unwind__prepare_access(struct thread *thread, struct map *map,
 
 	arch = normalize_arch(thread->mg->machine->env->arch);
 
-	if (!strcmp(arch, "x86")) {
+	if (!strcmp(arch, NORM_X86)) {
 		if (dso_type != DSO__TYPE_64BIT)
 			ops = x86_32_unwind_libunwind_ops;
-	} else if (!strcmp(arch, "arm64") || !strcmp(arch, "arm")) {
+	} else if (!strcmp(arch, NORM_ARM64) || !strcmp(arch, NORM_ARM)) {
 		if (dso_type == DSO__TYPE_64BIT)
 			ops = arm64_unwind_libunwind_ops;
 	}
--
2.5.5
[PATCH v4 0/3] perf annotate: Enable cross arch annotate
Perf currently supports code navigation (branches and calls) in
annotate only when it runs on the same architecture on which perf.data
was recorded. Cross arch annotate is not supported.

This patchset enables cross arch annotate. Currently I've used x86
and arm instructions which are already available and adding support
for powerpc as well. Adding support for other arch will be easy.

I've created this patch on top of acme/perf/core. And tested it with
x86 and powerpc only.

Note for arm:
A few instructions were defined under #if __arm__ which I've used as a
table for arm. But I'm not sure whether the instructions defined outside
of that also contain arm instructions. Apart from that, 'call__parse()'
and 'mov__parse()' contain an #ifdef __arm__ directive. I've changed it
to if (!strcmp(norm_arch, NORM_ARM)). I don't have an arm machine to
test these changes.

Example:

  Record on powerpc:
  $ ./perf record -a

  Report -> Annotate on x86:
  $ ./perf report -i perf.data.powerpc --vmlinux vmlinux.powerpc

Changes in v4:
  - powerpc: Added support for branch instructions that include 'ctr'
  - __maybe_unused was misplaced at a few locations. Corrected it.
  - Moved position of v3 last patch that defines a macro for each arch
    name

v3 link: https://lkml.org/lkml/2016/6/30/99

Naveen N. Rao (1):
  perf annotate: add powerpc support

Ravi Bangoria (2):
  perf: Define macro for normalized arch names
  perf annotate: Enable cross arch annotate

 tools/perf/arch/common.c           |  36 ++---
 tools/perf/arch/common.h           |  11 ++
 tools/perf/builtin-top.c           |   2 +-
 tools/perf/ui/browsers/annotate.c  |   3 +-
 tools/perf/ui/gtk/annotate.c       |   2 +-
 tools/perf/util/annotate.c         | 273 ++---
 tools/perf/util/annotate.h         |   6 +-
 tools/perf/util/unwind-libunwind.c |   4 +-
 8 files changed, 265 insertions(+), 72 deletions(-)

--
2.5.5
Re: [PATCH v3 3/4] perf annotate: add powerpc support
Hi Michael, On Wednesday 06 July 2016 03:38 PM, Michael Ellerman wrote: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com> writes: On Thursday 30 June 2016 11:51 AM, Michael Ellerman wrote: On Thu, 2016-06-30 at 11:44 +0530, Ravi Bangoria wrote: diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c index 36a5825..b87eac7 100644 --- a/tools/perf/util/annotate.c +++ b/tools/perf/util/annotate.c @@ -476,6 +481,125 @@ static int ins__cmp(const void *a, const void *b) ... + +static struct ins *ins__find_powerpc(const char *name) +{ + int i; + struct ins *ins; + struct ins_ops *ops; + static struct instructions_powerpc head; + static bool list_initialized; + + /* +* - Interested only if instruction starts with 'b'. +* - Few start with 'b', but aren't branch instructions. +* - Let's also ignore instructions involving 'ctr' and +* 'tar' since target branch addresses for those can't +* be determined statically. +*/ + if (name[0] != 'b' || + !strncmp(name, "bcd", 3) || + !strncmp(name, "brinc", 5) || + !strncmp(name, "bper", 4) || + strstr(name, "ctr")|| + strstr(name, "tar")) + return NULL; It would be good if 'bctr' was at least recognised as a branch, even if we can't determine the target. They are very common. We can not show arrow for this since we don't know the target location. can you please suggest how you intends perf to display bctr? Yeah I understand you can't show an arrow. I guess it could just be an unterminated arrow? But I'm not sure if that's easy to do with the way the UI is constructed. eg. something like: ld r12,0(r12) mtctr r12 bctrl --> ld r3,-32704(r2) But that's just an idea. I've sent v4 which enables annotate for bctr' instructions. for 'bctr', it will show down arrow(indicate jump) and 'bctrl' will show right arrow(indicate call). But no navigation options will be provided. By pressing Enter key on that, message will be shown that like "Invalid target" Please review it. bctr can be classified into two variants -- 'bctr' and 'bctrl'. 
'bctr' will be considered as jump instruction but jump__parse() won't be able to find any target location and hence it will set target to UINT64_MAX which transforms 'bctr' to 'bctr UINT64_MAX'. This looks misleading. Agreed. bctrl will be considered as call instruction but call__parse() won't be able to find any target function and hence it won't show any navigation arrow for this instruction. Which is the same as filtering it beforehand. OK. Maybe what I'm asking for is an enhancement and can be done later. It doesn't look like we have the opcode handy here? Could we get it somehow? That would make this a *lot* more robust. objdump prints machine code, but I don't know how difficult that would be to parse to get opcode. Normal objdump -d output includes the opcode, eg: c000886c: 2c 2c 00 00 cmpdi r12,0 ^^^ The only thing you need to know is the endian and you can reconstruct the raw instruction. Then you can just decode the opcode, see how we do it in the kernel with eg. instr_is_relative_branch(). I'm sorry. I was thinking that you want to show opcodes with perf annotate. But you were asking to use opcode instead of parsing instructions. This looks like rewriting the parsing code. I don't know whether there is any library already available for this which we can directly use. I'm thinking about this. - Ravi cheers
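The opcode-based check suggested in this exchange could be sketched roughly as follows. This is only an illustration of the idea (rebuild the 32-bit instruction from the bytes objdump prints, given the endianness, then look at the primary opcode); it is neither perf code nor the kernel's instr_is_relative_branch(). All function names here are hypothetical. The opcode numbers (16 for bc, 18 for b, 19 with extended opcode 16 or 528 for bclr and bcctr) follow the Power ISA encoding; other XL-form branches such as bctar are not covered.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rebuild the 32-bit instruction word from the four bytes objdump prints. */
static uint32_t ppc_insn_from_bytes(const uint8_t b[4], bool little_endian)
{
	if (little_endian)
		return (uint32_t)b[3] << 24 | (uint32_t)b[2] << 16 |
		       (uint32_t)b[1] << 8 | b[0];

	return (uint32_t)b[0] << 24 | (uint32_t)b[1] << 16 |
	       (uint32_t)b[2] << 8 | b[3];
}

/* Primary opcode: the top six bits of the instruction word. */
static unsigned int ppc_primary_opcode(uint32_t insn)
{
	return insn >> 26;
}

/* b/bc (opcodes 18/16) and the XL-form bclr/bcctr (opcode 19, XO 16/528). */
static bool ppc_insn_is_branch(uint32_t insn)
{
	unsigned int xo = (insn >> 1) & 0x3ff;

	switch (ppc_primary_opcode(insn)) {
	case 16:
	case 18:
		return true;
	case 19:
		return xo == 16 || xo == 528;
	default:
		return false;
	}
}

/* LK bit set: the branch also updates the link register (a call). */
static bool ppc_branch_sets_lr(uint32_t insn)
{
	return insn & 1;
}
```

For the example bytes in the message (2c 2c 00 00 read as big endian) this yields primary opcode 11, the compare-immediate family that cmpdi belongs to, so the instruction is not reported as a branch.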
[PATCH v4 2/3] perf annotate: Enable cross arch annotate
Change current data structures and function to enable cross arch annotate. Current implementation does not contain logic of record on one arch and annotating on other. This remote annotate is partially possible with current implementation for x86 (or may be arm as well) only. But, to make remote annotation work properly, all architecture instruction tables need to be included in the perf binary. And while annotating, look for instruction table where perf.data was recorded. Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com> --- Changes in v4: - __maybe_unused was misplaced at few location. Corrected it tools/perf/builtin-top.c | 2 +- tools/perf/ui/browsers/annotate.c | 3 +- tools/perf/ui/gtk/annotate.c | 2 +- tools/perf/util/annotate.c| 134 -- tools/perf/util/annotate.h| 5 +- 5 files changed, 93 insertions(+), 53 deletions(-) diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c index 07fc792..d4fd947 100644 --- a/tools/perf/builtin-top.c +++ b/tools/perf/builtin-top.c @@ -128,7 +128,7 @@ static int perf_top__parse_source(struct perf_top *top, struct hist_entry *he) return err; } - err = symbol__annotate(sym, map, 0); + err = symbol__annotate(sym, map, 0, NULL); if (err == 0) { out_assign: top->sym_filter_entry = he; diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c index 29dc6d2..3a652a6f 100644 --- a/tools/perf/ui/browsers/annotate.c +++ b/tools/perf/ui/browsers/annotate.c @@ -1050,7 +1050,8 @@ int symbol__tui_annotate(struct symbol *sym, struct map *map, (nr_pcnt - 1); } - if (symbol__annotate(sym, map, sizeof_bdl) < 0) { + if (symbol__annotate(sym, map, sizeof_bdl, +perf_evsel__env_arch(evsel)) < 0) { ui__error("%s", ui_helpline__last_msg); goto out_free_offsets; } diff --git a/tools/perf/ui/gtk/annotate.c b/tools/perf/ui/gtk/annotate.c index 9c7ff8d..d7150b3 100644 --- a/tools/perf/ui/gtk/annotate.c +++ b/tools/perf/ui/gtk/annotate.c @@ -166,7 +166,7 @@ static int symbol__gtk_annotate(struct symbol *sym, 
struct map *map, if (map->dso->annotate_warned) return -1; - if (symbol__annotate(sym, map, 0) < 0) { + if (symbol__annotate(sym, map, 0, perf_evsel__env_arch(evsel)) < 0) { ui__error("%s", ui_helpline__current); return -1; } diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c index e9825fe..32889ce 100644 --- a/tools/perf/util/annotate.c +++ b/tools/perf/util/annotate.c @@ -20,12 +20,14 @@ #include #include #include +#include +#include "../arch/common.h" const char *disassembler_style; const char *objdump_path; static regex_t file_lineno; -static struct ins *ins__find(const char *name); +static struct ins *ins__find(const char *name, const char *norm_arch); static int disasm_line__parse(char *line, char **namep, char **rawp); static void ins__delete(struct ins_operands *ops) @@ -53,7 +55,7 @@ int ins__scnprintf(struct ins *ins, char *bf, size_t size, return ins__raw_scnprintf(ins, bf, size, ops); } -static int call__parse(struct ins_operands *ops) +static int call__parse(struct ins_operands *ops, const char *norm_arch) { char *endptr, *tok, *name; @@ -65,10 +67,8 @@ static int call__parse(struct ins_operands *ops) name++; -#ifdef __arm__ - if (strchr(name, '+')) + if (!strcmp(norm_arch, NORM_ARM) && strchr(name, '+')) return -1; -#endif tok = strchr(name, '>'); if (tok == NULL) @@ -117,7 +117,8 @@ bool ins__is_call(const struct ins *ins) return ins->ops == _ops; } -static int jump__parse(struct ins_operands *ops) +static int jump__parse(struct ins_operands *ops, + const char *norm_arch __maybe_unused) { const char *s = strchr(ops->raw, '+'); @@ -172,7 +173,7 @@ static int comment__symbol(char *raw, char *comment, u64 *addrp, char **namep) return 0; } -static int lock__parse(struct ins_operands *ops) +static int lock__parse(struct ins_operands *ops, const char *norm_arch) { char *name; @@ -183,7 +184,7 @@ static int lock__parse(struct ins_operands *ops) if (disasm_line__parse(ops->raw, , >locked.ops->raw) < 0) goto out_free_ops; - ops->locked.ins = 
ins__find(name); + ops->locked.ins = ins__find(name, norm_arch); free(name); if (ops->locked.ins == NULL) @@ -193,7 +194,7 @@ static int lock__parse(struct ins_operands *ops) return 0; if (ops->locked.ins->ops->parse && - ops->locked.ins->ops->parse(ops->
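The core idea of this patch, picking the instruction table by the normalized arch of the recorded data rather than by the build-time arch, can be sketched as below. This is a simplified stand-in, not the code from the (truncated) diff: `struct ins` is cut down to a name, the tables hold only a few illustrative entries, and `arch_tables` and `ins_find()` are hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Cut-down stand-in for perf's 'struct ins'; only the name matters here. */
struct ins {
	const char *name;
};

/* Illustrative tables only; each must stay sorted by name for bsearch(). */
static const struct ins ins_x86[] = {
	{ "call" }, { "jmp" }, { "ret" },
};

static const struct ins ins_arm[] = {
	{ "b" }, { "bl" }, { "bne" },
};

struct arch_ins_table {
	const char *norm_arch;
	const struct ins *table;
	size_t nr;
};

#define NR(a) (sizeof(a) / sizeof((a)[0]))

/* All tables are compiled in, whatever arch the binary itself runs on. */
static const struct arch_ins_table arch_tables[] = {
	{ "x86", ins_x86, NR(ins_x86) },
	{ "arm", ins_arm, NR(ins_arm) },
};

static int ins_key_cmp(const void *name, const void *insp)
{
	const struct ins *ins = insp;

	return strcmp(name, ins->name);
}

/* Look up 'name' in the table that belongs to the recorded arch. */
static const struct ins *ins_find(const char *name, const char *norm_arch)
{
	size_t i;

	if (!name || !norm_arch)
		return NULL;

	for (i = 0; i < NR(arch_tables); i++) {
		if (strcmp(norm_arch, arch_tables[i].norm_arch))
			continue;
		return bsearch(name, arch_tables[i].table, arch_tables[i].nr,
			       sizeof(struct ins), ins_key_cmp);
	}
	return NULL;
}
```

A caller would pass the arch string taken from the perf.data header (after normalization), so that "jmp" resolves for x86 data even when the report is generated on a powerpc host.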
[PATCH v4 3/3] perf annotate: add powerpc support
From: Naveen N. Rao <naveen.n@linux.vnet.ibm.com> Powerpc has long list of branch instructions and hardcoding them in table appears to be error-prone. So, add new function to find instruction instead of creating table. This function dynamically create table(list of 'struct ins'), and instead of creating object every time, first check if list already contain object for that instruction. Signed-off-by: Naveen N. Rao <naveen.n@linux.vnet.ibm.com> Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com> --- Chnages in v4: - Added support for branch instructions that includes 'ctr' tools/perf/util/annotate.c | 155 +++-- tools/perf/util/annotate.h | 3 +- 2 files changed, 150 insertions(+), 8 deletions(-) diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c index 32889ce..9de1271 100644 --- a/tools/perf/util/annotate.c +++ b/tools/perf/util/annotate.c @@ -55,10 +55,15 @@ int ins__scnprintf(struct ins *ins, char *bf, size_t size, return ins__raw_scnprintf(ins, bf, size, ops); } -static int call__parse(struct ins_operands *ops, const char *norm_arch) +static int call__parse(char *ins_name, struct ins_operands *ops, + const char *norm_arch) { char *endptr, *tok, *name; + /* Special case for powerpc */ + if (!strcmp(norm_arch, NORM_POWERPC) && strstr(ins_name, "ctr")) + return 0; + ops->target.addr = strtoull(ops->raw, , 16); name = strchr(endptr, '<'); @@ -117,7 +122,7 @@ bool ins__is_call(const struct ins *ins) return ins->ops == _ops; } -static int jump__parse(struct ins_operands *ops, +static int jump__parse(char *ins_name __maybe_unused, struct ins_operands *ops, const char *norm_arch __maybe_unused) { const char *s = strchr(ops->raw, '+'); @@ -135,6 +140,13 @@ static int jump__parse(struct ins_operands *ops, static int jump__scnprintf(struct ins *ins, char *bf, size_t size, struct ins_operands *ops) { + /* +* Instructions that does not include target address in operand +* like 'bctr' for powerpc. 
+*/ + if (!ops->target.addr) + return scnprintf(bf, size, "%-6.6s", ins->name); + return scnprintf(bf, size, "%-6.6s %" PRIx64, ins->name, ops->target.offset); } @@ -173,7 +185,8 @@ static int comment__symbol(char *raw, char *comment, u64 *addrp, char **namep) return 0; } -static int lock__parse(struct ins_operands *ops, const char *norm_arch) +static int lock__parse(char *ins_name, struct ins_operands *ops, + const char *norm_arch) { char *name; @@ -194,7 +207,8 @@ static int lock__parse(struct ins_operands *ops, const char *norm_arch) return 0; if (ops->locked.ins->ops->parse && - ops->locked.ins->ops->parse(ops->locked.ops, norm_arch) < 0) + ops->locked.ins->ops->parse(ins_name, + ops->locked.ops, norm_arch) < 0) goto out_free_ops; return 0; @@ -237,7 +251,8 @@ static struct ins_ops lock_ops = { .scnprintf = lock__scnprintf, }; -static int mov__parse(struct ins_operands *ops, const char *norm_arch) +static int mov__parse(char *ins_name __maybe_unused, struct ins_operands *ops, + const char *norm_arch) { char *s = strchr(ops->raw, ','), *target, *comment, prev; @@ -304,7 +319,7 @@ static struct ins_ops mov_ops = { .scnprintf = mov__scnprintf, }; -static int dec__parse(struct ins_operands *ops, +static int dec__parse(char *ins_name __maybe_unused, struct ins_operands *ops, const char *norm_arch __maybe_unused) { char *target, *comment, *s, prev; @@ -459,6 +474,11 @@ static struct ins instructions_arm[] = { { .name = "bne", .ops = _ops, }, }; +struct instructions_powerpc { + struct ins *ins; + struct list_head list; +}; + static int ins__key_cmp(const void *name, const void *insp) { const struct ins *ins = insp; @@ -474,6 +494,125 @@ static int ins__cmp(const void *a, const void *b) return strcmp(ia->name, ib->name); } +static struct ins *list_add__ins_powerpc(struct instructions_powerpc *head, +const char *name, struct ins_ops *ops) +{ + struct instructions_powerpc *ins_powerpc; + struct ins *ins; + + ins = zalloc(sizeof(struct ins)); + if (!ins) + return NULL; + 
+ ins_powerpc = zalloc(sizeof(struct instructions_powerpc)); + if (!ins_powerpc) + goto out_free_ins; + + ins->name = strdup(name); + if (!ins->name) + goto out_free_ins_power; + + ins->ops = ops; +
[PATCH 2/2] perf ppc64le: Fix probe location when using DWARF
Powerpc has Global Entry Point and Local Entry Point for functions.
LEP catches call from both the GEP and the LEP. Symbol table of ELF
contains GEP and Offset from which we can calculate LEP, but debuginfo
does not have LEP info.

Currently, perf prioritizes symbol table over dwarf to probe on LEP
for ppc64le. But when user tries to probe with function parameter,
we fall back to using dwarf(i.e. GEP) and when function called via
LEP, probe will never hit.

For example:
  $ objdump -d vmlinux
    ...
    do_sys_open():
    c02eb4a0: e8 00 4c 3c  addis r2,r12,232
    c02eb4a4: 60 00 42 38  addi  r2,r2,96
    c02eb4a8: a6 02 08 7c  mflr  r0
    c02eb4ac: d0 ff 41 fb  std   r26,-48(r1)

  $ sudo ./perf probe do_sys_open
  $ sudo cat /sys/kernel/debug/tracing/kprobe_events
    p:probe/do_sys_open _text+3060904

  $ sudo ./perf probe 'do_sys_open filename:string'
  $ sudo cat /sys/kernel/debug/tracing/kprobe_events
    p:probe/do_sys_open _text+3060896 filename_string=+0(%gpr4):string

For second case, perf probed on GEP. So when function will be called
via LEP, probe won't hit.

  $ sudo ./perf record -a -e probe:do_sys_open ls
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.195 MB perf.data ]

To resolve this issue, let's not prioritize symbol table, let perf
decide what it wants to use. Perf is already converting GEP to LEP
when it uses symbol table. When perf uses debuginfo, let it find
LEP offset from symbol table. This way we fall back to probe on LEP
for all cases.

After patch:
  $ sudo ./perf probe 'do_sys_open filename:string'
  $ sudo cat /sys/kernel/debug/tracing/kprobe_events
    p:probe/do_sys_open _text+3060904 filename_string=+0(%gpr4):string

  $ sudo ./perf record -a -e probe:do_sys_open ls
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.197 MB perf.data (11 samples) ]

Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com>
---
 tools/perf/arch/powerpc/util/sym-handling.c | 27 +
 tools/perf/util/probe-event.c               | 37 -
 tools/perf/util/probe-event.h               |  6 -
 3 files changed, 49 insertions(+), 21 deletions(-)

diff --git a/tools/perf/arch/powerpc/util/sym-handling.c b/tools/perf/arch/powerpc/util/sym-handling.c
index c6d0f91..8d4dc97 100644
--- a/tools/perf/arch/powerpc/util/sym-handling.c
+++ b/tools/perf/arch/powerpc/util/sym-handling.c
@@ -54,10 +54,6 @@ int arch__compare_symbol_names(const char *namea, const char *nameb)
 #endif
 
 #if defined(_CALL_ELF) && _CALL_ELF == 2
-bool arch__prefers_symtab(void)
-{
-	return true;
-}
 
 #ifdef HAVE_LIBELF_SUPPORT
 void arch__sym_update(struct symbol *s, GElf_Sym *sym)
@@ -100,4 +96,27 @@ void arch__fix_tev_from_maps(struct perf_probe_event *pev,
 		tev->point.offset += lep_offset;
 	}
 }
+
+void arch__post_process_probe_trace_events(struct perf_probe_event *pev,
+					   int ntevs)
+{
+	struct probe_trace_event *tev;
+	struct map *map;
+	struct symbol *sym = NULL;
+	struct rb_node *tmp;
+	int i = 0;
+
+	map = get_target_map(pev->target, pev->uprobes);
+	if (!map || map__load(map, NULL) < 0)
+		return;
+
+	for (i = 0; i < ntevs; i++) {
+		tev = &pev->tevs[i];
+		map__for_each_symbol(map, sym, tmp) {
+			if (map->unmap_ip(map, sym->start) == tev->point.address)
+				arch__fix_tev_from_maps(pev, tev, map, sym);
+		}
+	}
+}
+
 #endif
diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 4e215e7..5efa535 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -178,7 +178,7 @@ static struct map *kernel_get_module_map(const char *module)
 	return NULL;
 }
 
-static struct map *get_target_map(const char *target, bool user)
+struct map *get_target_map(const char *target, bool user)
 {
 	/* Init maps of given executable or kernel */
 	if (user)
@@ -703,19 +703,32 @@ post_process_kernel_probe_trace_events(struct probe_trace_event *tevs,
 	return skipped;
 }
 
+void __weak
+arch__post_process_probe_trace_events(struct perf_probe_event *pev __maybe_unused,
+				      int ntevs __maybe_unused)
+{
+}
+
 /* Post processing the probe events */
-static int post_process_probe_trace_events(struct probe_trace_event *tevs,
+static int post_process_probe_trace_events(struct perf_probe_event *pev,
+					   struct probe_trace_event *tevs,
 					   int ntevs, const char *module,
 					   bool uprobe)
 {
-	if (uprobe)
-		return add_exec_to_probe_trace_eve
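For readers unfamiliar with how the LEP is derived from the symbol table, here is a small sketch of the arithmetic. It mirrors the `PPC64_LOCAL_ENTRY_OFFSET()` macro from `<elf.h>`: the three high bits of a symbol's `st_other` field encode the distance between the global and local entry points. The helper names are hypothetical, the macro values are redefined locally so the block stands alone, and the `st_other` value 0x60 used in the usage note is an assumption chosen to reproduce the 8-byte offset visible in the example above (it is not taken from the patch).

```c
#include <stdint.h>

/* Same values as in <elf.h>; redefined so the sketch is self-contained. */
#define STO_PPC64_LOCAL_BIT	5
#define STO_PPC64_LOCAL_MASK	(7 << STO_PPC64_LOCAL_BIT)

/* Byte offset from the global entry point to the local entry point. */
static unsigned int ppc64_local_entry_offset(unsigned char st_other)
{
	unsigned int enc = (st_other & STO_PPC64_LOCAL_MASK) >>
			   STO_PPC64_LOCAL_BIT;

	/* Encodings 0 and 1 both mean "no separate local entry point". */
	return ((1u << enc) >> 2) << 2;
}

/* Hypothetical helper: turn a GEP address or offset into the LEP one. */
static uint64_t ppc64_lep_from_gep(uint64_t gep, unsigned char st_other)
{
	return gep + ppc64_local_entry_offset(st_other);
}
```

With an assumed `st_other` of 0x60 (encoding 3), the offset is 8 bytes, so the GEP at `_text+3060896` maps to the LEP at `_text+3060904`, the address that the probe lands on after the patch. The two skipped instructions are the `addis`/`addi` pair that sets up r2 in the objdump listing.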
[PATCH 1/2] perf: Add function to post process kernel trace events
Instead of inline code, introduce function to post process kernel
probe trace events.

Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com>
---
 tools/perf/util/probe-event.c | 29 ++---
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 953dc1a..4e215e7 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -664,22 +664,14 @@ static int add_module_to_probe_trace_events(struct probe_trace_event *tevs,
 	return ret;
 }
 
-/* Post processing the probe events */
-static int post_process_probe_trace_events(struct probe_trace_event *tevs,
-					   int ntevs, const char *module,
-					   bool uprobe)
+static int
+post_process_kernel_probe_trace_events(struct probe_trace_event *tevs,
+				       int ntevs)
 {
 	struct ref_reloc_sym *reloc_sym;
 	char *tmp;
 	int i, skipped = 0;
 
-	if (uprobe)
-		return add_exec_to_probe_trace_events(tevs, ntevs, module);
-
-	/* Note that currently ref_reloc_sym based probe is not for drivers */
-	if (module)
-		return add_module_to_probe_trace_events(tevs, ntevs, module);
-
 	reloc_sym = kernel_get_ref_reloc_sym();
 	if (!reloc_sym) {
 		pr_warning("Relocated base symbol is not found!\n");
@@ -711,6 +703,21 @@ static int post_process_probe_trace_events(struct probe_trace_event *tevs,
 	return skipped;
 }
 
+/* Post processing the probe events */
+static int post_process_probe_trace_events(struct probe_trace_event *tevs,
+					   int ntevs, const char *module,
+					   bool uprobe)
+{
+	if (uprobe)
+		return add_exec_to_probe_trace_events(tevs, ntevs, module);
+
+	if (module)
+		/* Currently ref_reloc_sym based probe is not for drivers */
+		return add_module_to_probe_trace_events(tevs, ntevs, module);
+
+	return post_process_kernel_probe_trace_events(tevs, ntevs);
+}
+
 /* Try to find perf_probe_event with debuginfo */
 static int try_to_find_probe_trace_events(struct perf_probe_event *pev,
 					  struct probe_trace_event **tevs)
--
2.7.4
[PATCH v3 1/4] perf: Utility function to fetch arch
Add a utility function to fetch the arch using evsel
(evsel->evlist->env->arch).

Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com>
---
Change in v3:
  - No changes

 tools/perf/util/evsel.c | 7 +++
 tools/perf/util/evsel.h | 2 ++
 2 files changed, 9 insertions(+)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 1d8f2bb..0fea724 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -2422,3 +2422,10 @@ int perf_evsel__open_strerror(struct perf_evsel *evsel, struct target *target,
 			 err, strerror_r(err, sbuf, sizeof(sbuf)),
 			 perf_evsel__name(evsel));
 }
+
+char *perf_evsel__env_arch(struct perf_evsel *evsel)
+{
+	if (evsel && evsel->evlist && evsel->evlist->env)
+		return evsel->evlist->env->arch;
+	return NULL;
+}
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 828ddd1..86fed7a 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -435,4 +435,6 @@ typedef int (*attr__fprintf_f)(FILE *, const char *, const char *, void *);
 int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr,
 			     attr__fprintf_f attr__fprintf, void *priv);

+char *perf_evsel__env_arch(struct perf_evsel *evsel);
+
 #endif /* __PERF_EVSEL_H */
--
2.5.5
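The NULL-safe pointer walk in this helper can be tried outside of perf. The following is only a sketch: the three structs are trimmed-down, hypothetical stand-ins for perf's real `struct perf_evsel`, `struct perf_evlist` and `struct perf_env`, and `evsel_env_arch()` is an illustrative name, not the function added by the patch.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for perf's real structures. */
struct perf_env {
	char *arch;
};

struct perf_evlist {
	struct perf_env *env;
};

struct perf_evsel {
	struct perf_evlist *evlist;
};

/*
 * Same shape as the helper in the patch: give up with NULL as soon as
 * any link in the evsel -> evlist -> env chain is missing.
 */
static char *evsel_env_arch(struct perf_evsel *evsel)
{
	if (evsel && evsel->evlist && evsel->evlist->env)
		return evsel->evlist->env->arch;
	return NULL;
}
```

Callers therefore have to be prepared for a NULL arch, for example when an evsel is not attached to an evlist.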
[PATCH v3 0/4] perf annotate: Enable cross arch annotate
Currently, perf annotate supports code navigation (branches and calls)
only when it runs on the same architecture on which perf.data was
recorded; cross arch annotate is not supported.

This patchset enables cross arch annotate. It uses the x86 and arm
instruction tables that are already available and adds support for
powerpc as well. Adding support for other architectures should be easy.

I've created this patchset on top of acme/perf/core and tested it with
x86 and powerpc only.

Example:

  Record on powerpc:
  $ ./perf record -a

  Report -> Annotate on x86:
  $ ./perf report -i perf.data.powerpc --vmlinux vmlinux.powerpc

Changes in v3:
  - Optimized patch that enables annotate on powerpc
  - Corrected one memory leak

v2 link: https://lkml.org/lkml/2016/6/29/278

Naveen N. Rao (1):
  perf annotate: add powerpc support

Ravi Bangoria (3):
  perf: Utility function to fetch arch
  perf annotate: Enable cross arch annotate
  perf: Define macro for normalized arch names

 tools/perf/arch/common.c           |  36 ++---
 tools/perf/arch/common.h           |  11 ++
 tools/perf/builtin-top.c           |   2 +-
 tools/perf/ui/browsers/annotate.c  |   3 +-
 tools/perf/ui/gtk/annotate.c       |   2 +-
 tools/perf/util/annotate.c         | 260 ++---
 tools/perf/util/annotate.h         |   5 +-
 tools/perf/util/evsel.c            |   7 +
 tools/perf/util/evsel.h            |   2 +
 tools/perf/util/unwind-libunwind.c |   4 +-
 10 files changed, 260 insertions(+), 72 deletions(-)

--
2.5.5
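The core idea of the series, picking the instruction table from the architecture recorded in perf.data rather than from the architecture perf was built for, can be sketched as below. Everything in this block is hypothetical: `struct ins_table`, `ins_table__find()` and `ins_table__has()` are illustrative names, the tables hold only a few mnemonics, and returning NULL for an unknown or missing arch is an assumption rather than what the patches do.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-arch table; perf's real tables hold 'struct ins'. */
struct ins_table {
	const char *norm_arch;		/* normalized arch name */
	const char *const *names;	/* mnemonics with navigation support */
	size_t nr;
};

static const char *const x86_names[] = { "call", "jmp", "ret" };
static const char *const arm_names[] = { "b", "bl", "bne" };

static const struct ins_table tables[] = {
	{ "x86", x86_names, sizeof(x86_names) / sizeof(x86_names[0]) },
	{ "arm", arm_names, sizeof(arm_names) / sizeof(arm_names[0]) },
};

/* Select the table by the arch recorded in perf.data, not the host arch. */
static const struct ins_table *ins_table__find(const char *norm_arch)
{
	size_t i;

	if (!norm_arch)
		return NULL;	/* assumption: no fallback to the host arch */

	for (i = 0; i < sizeof(tables) / sizeof(tables[0]); i++) {
		if (!strcmp(tables[i].norm_arch, norm_arch))
			return &tables[i];
	}
	return NULL;
}

/* Return 1 if 'name' is listed in table 't', 0 otherwise. */
static int ins_table__has(const struct ins_table *t, const char *name)
{
	size_t i;

	if (!t)
		return 0;

	for (i = 0; i < t->nr; i++) {
		if (!strcmp(t->names[i], name))
			return 1;
	}
	return 0;
}
```

Because every table is compiled into the binary, a perf built on x86 can still look up arm mnemonics, which is what makes the record-on-one-arch, annotate-on-another flow possible.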
[PATCH v3 2/4] perf annotate: Enable cross arch annotate
Change current data structures and function to enable cross arch annotate. Current implementation does not contain logic of record on one arch and annotating on other. This remote annotate is partially possible with current implementation for x86 (or may be arm as well) only. But, to make remote annotation work properly, all architecture instruction tables need to be included in the perf binary. And while annotating, look for instruction table where perf.data was recorded. For arm, few instructions were defined under #if __arm__ which I've used as a table for arm. But I'm not sure whether instruction defined outside of that also contains arm instructions. Apart from that, 'call__parse()' and 'move__parse()' contains #ifdef __arm__ directive. I've changed it to if (!strcmp(norm_arch, arm)). But I've not tested this as well. Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com> --- Changes in v3: - No changes tools/perf/builtin-top.c | 2 +- tools/perf/ui/browsers/annotate.c | 3 +- tools/perf/ui/gtk/annotate.c | 2 +- tools/perf/util/annotate.c| 136 -- tools/perf/util/annotate.h| 5 +- 5 files changed, 95 insertions(+), 53 deletions(-) diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c index 07fc792..d4fd947 100644 --- a/tools/perf/builtin-top.c +++ b/tools/perf/builtin-top.c @@ -128,7 +128,7 @@ static int perf_top__parse_source(struct perf_top *top, struct hist_entry *he) return err; } - err = symbol__annotate(sym, map, 0); + err = symbol__annotate(sym, map, 0, NULL); if (err == 0) { out_assign: top->sym_filter_entry = he; diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c index 29dc6d2..3a652a6f 100644 --- a/tools/perf/ui/browsers/annotate.c +++ b/tools/perf/ui/browsers/annotate.c @@ -1050,7 +1050,8 @@ int symbol__tui_annotate(struct symbol *sym, struct map *map, (nr_pcnt - 1); } - if (symbol__annotate(sym, map, sizeof_bdl) < 0) { + if (symbol__annotate(sym, map, sizeof_bdl, +perf_evsel__env_arch(evsel)) < 0) { 
ui__error("%s", ui_helpline__last_msg); goto out_free_offsets; } diff --git a/tools/perf/ui/gtk/annotate.c b/tools/perf/ui/gtk/annotate.c index 9c7ff8d..d7150b3 100644 --- a/tools/perf/ui/gtk/annotate.c +++ b/tools/perf/ui/gtk/annotate.c @@ -166,7 +166,7 @@ static int symbol__gtk_annotate(struct symbol *sym, struct map *map, if (map->dso->annotate_warned) return -1; - if (symbol__annotate(sym, map, 0) < 0) { + if (symbol__annotate(sym, map, 0, perf_evsel__env_arch(evsel)) < 0) { ui__error("%s", ui_helpline__current); return -1; } diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c index c385fec..36a5825 100644 --- a/tools/perf/util/annotate.c +++ b/tools/perf/util/annotate.c @@ -20,12 +20,14 @@ #include #include #include +#include +#include "../arch/common.h" const char *disassembler_style; const char *objdump_path; static regex_t file_lineno; -static struct ins *ins__find(const char *name); +static struct ins *ins__find(const char *name, const char *norm_arch); static int disasm_line__parse(char *line, char **namep, char **rawp); static void ins__delete(struct ins_operands *ops) @@ -53,7 +55,8 @@ int ins__scnprintf(struct ins *ins, char *bf, size_t size, return ins__raw_scnprintf(ins, bf, size, ops); } -static int call__parse(struct ins_operands *ops) +static int call__parse(struct ins_operands *ops, + __maybe_unused const char *norm_arch) { char *endptr, *tok, *name; @@ -65,10 +68,8 @@ static int call__parse(struct ins_operands *ops) name++; -#ifdef __arm__ - if (strchr(name, '+')) + if (!strcmp(norm_arch, "arm") && strchr(name, '+')) return -1; -#endif tok = strchr(name, '>'); if (tok == NULL) @@ -117,7 +118,8 @@ bool ins__is_call(const struct ins *ins) return ins->ops == _ops; } -static int jump__parse(struct ins_operands *ops) +static int jump__parse(struct ins_operands *ops, + __maybe_unused const char *norm_arch) { const char *s = strchr(ops->raw, '+'); @@ -172,7 +174,7 @@ static int comment__symbol(char *raw, char *comment, u64 *addrp, char 
**namep) return 0; } -static int lock__parse(struct ins_operands *ops) +static int lock__parse(struct ins_operands *ops, const char *norm_arch) { char *name; @@ -183,7 +185,7 @@ static int lock__parse(struct ins_operands *ops) if (disasm_line__parse(ops->raw, , >locked.ops->raw) < 0) goto out_free_ops; - ops->locked.ins = ins__find
[PATCH v3 3/4] perf annotate: add powerpc support
From: Naveen N. Rao <naveen.n@linux.vnet.ibm.com>

Powerpc has a long list of branch instructions, and hardcoding them in a
table appears to be error-prone. So, add a new function to find an
instruction instead of creating a table. This function dynamically
creates the table (a list of 'struct ins') and, instead of creating an
object every time, first checks whether the list already contains an
object for that instruction.

Signed-off-by: Naveen N. Rao <naveen.n@linux.vnet.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com>
---
Changes in v3:
  - Optimized code
  - Corrected one memory leak

 tools/perf/util/annotate.c | 126 +
 1 file changed, 126 insertions(+)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 36a5825..b87eac7 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -461,6 +461,11 @@ static struct ins instructions_arm[] = {
 	{ .name = "bne",	.ops  = &jump_ops, },
 };

+struct instructions_powerpc {
+	struct ins *ins;
+	struct list_head list;
+};
+
 static int ins__key_cmp(const void *name, const void *insp)
 {
 	const struct ins *ins = insp;
@@ -476,6 +481,125 @@ static int ins__cmp(const void *a, const void *b)
 	return strcmp(ia->name, ib->name);
 }

+static struct ins *list_add__ins_powerpc(struct instructions_powerpc *head,
+					 const char *name, struct ins_ops *ops)
+{
+	struct instructions_powerpc *ins_powerpc;
+	struct ins *ins;
+
+	ins = zalloc(sizeof(struct ins));
+	if (!ins)
+		return NULL;
+
+	ins_powerpc = zalloc(sizeof(struct instructions_powerpc));
+	if (!ins_powerpc)
+		goto out_free_ins;
+
+	ins->name = strdup(name);
+	if (!ins->name)
+		goto out_free_ins_power;
+
+	ins->ops = ops;
+	ins_powerpc->ins = ins;
+	list_add_tail(&(ins_powerpc->list), &(head->list));
+
+	return ins;
+
+out_free_ins_power:
+	zfree(&ins_powerpc);
+out_free_ins:
+	zfree(&ins);
+	return NULL;
+}
+
+static struct ins *list_search__ins_powerpc(struct instructions_powerpc *head,
+					    const char *name)
+{
+	struct instructions_powerpc *pos;
+
+	list_for_each_entry(pos, &head->list, list) {
+		if (!strcmp(pos->ins->name, name))
+			return pos->ins;
+	}
+	return NULL;
+}
+
+static struct ins *ins__find_powerpc(const char *name)
+{
+	int i;
+	struct ins *ins;
+	struct ins_ops *ops;
+	static struct instructions_powerpc head;
+	static bool list_initialized;
+
+	/*
+	 * - Interested only if instruction starts with 'b'.
+	 * - Few start with 'b', but aren't branch instructions.
+	 * - Let's also ignore instructions involving 'ctr' and
+	 *   'tar' since target branch addresses for those can't
+	 *   be determined statically.
+	 */
+	if (name[0] != 'b'             ||
+	    !strncmp(name, "bcd", 3)   ||
+	    !strncmp(name, "brinc", 5) ||
+	    !strncmp(name, "bper", 4)  ||
+	    strstr(name, "ctr")        ||
+	    strstr(name, "tar"))
+		return NULL;
+
+	if (!list_initialized) {
+		INIT_LIST_HEAD(&head.list);
+		list_initialized = true;
+	}
+
+	/*
+	 * Return if we already have object of 'struct ins' for this
+	 * instruction
+	 */
+	ins = list_search__ins_powerpc(&head, name);
+	if (ins)
+		return ins;
+
+	ops = &jump_ops;
+
+	i = strlen(name) - 1;
+	if (i < 0)
+		return NULL;
+
+	/* ignore optional hints at the end of the instructions */
+	if (name[i] == '+' || name[i] == '-')
+		i--;
+
+	if (name[i] == 'l' || (name[i] == 'a' && name[i-1] == 'l')) {
+		/*
+		 * if the instruction ends up with 'l' or 'la', then
+		 * those are considered 'calls' since they update LR.
+		 * ... except for 'bnl' which is branch if not less than
+		 * and the absolute form of the same.
+		 */
+		if (strcmp(name, "bnl") && strcmp(name, "bnl+") &&
+		    strcmp(name, "bnl-") && strcmp(name, "bnla") &&
+		    strcmp(name, "bnla+") && strcmp(name, "bnla-"))
+			ops = &call_ops;
+	}
+	if (name[i] == 'r' && name[i-1] == 'l')
+		/*
+		 * instructions ending with 'lr' are considered to be
+		 * return instructions
+		 */
+		ops = &ret_ops;
+
+	/*
+	 * Add instruction to list so next tim
[PATCH v3 4/4] perf: Define macro for normalized arch names
Define macro for each normalized arch name and use them instead of using arch name as string Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com> --- Changes in v3: - No changes tools/perf/arch/common.c | 36 ++-- tools/perf/arch/common.h | 11 +++ tools/perf/util/annotate.c | 10 +- tools/perf/util/unwind-libunwind.c | 4 ++-- 4 files changed, 36 insertions(+), 25 deletions(-) diff --git a/tools/perf/arch/common.c b/tools/perf/arch/common.c index ee69668..feb2113 100644 --- a/tools/perf/arch/common.c +++ b/tools/perf/arch/common.c @@ -122,25 +122,25 @@ static int lookup_triplets(const char *const *triplets, const char *name) const char *normalize_arch(char *arch) { if (!strcmp(arch, "x86_64")) - return "x86"; + return NORM_X86; if (arch[0] == 'i' && arch[2] == '8' && arch[3] == '6') - return "x86"; + return NORM_X86; if (!strcmp(arch, "sun4u") || !strncmp(arch, "sparc", 5)) - return "sparc"; + return NORM_SPARC; if (!strcmp(arch, "aarch64") || !strcmp(arch, "arm64")) - return "arm64"; + return NORM_ARM64; if (!strncmp(arch, "arm", 3) || !strcmp(arch, "sa110")) - return "arm"; + return NORM_ARM; if (!strncmp(arch, "s390", 4)) - return "s390"; + return NORM_S390; if (!strncmp(arch, "parisc", 6)) - return "parisc"; + return NORM_PARISC; if (!strncmp(arch, "powerpc", 7) || !strncmp(arch, "ppc", 3)) - return "powerpc"; + return NORM_POWERPC; if (!strncmp(arch, "mips", 4)) - return "mips"; + return NORM_MIPS; if (!strncmp(arch, "sh", 2) && isdigit(arch[2])) - return "sh"; + return NORM_SH; return arch; } @@ -180,21 +180,21 @@ static int perf_env__lookup_binutils_path(struct perf_env *env, zfree(); } - if (!strcmp(arch, "arm")) + if (!strcmp(arch, NORM_ARM)) path_list = arm_triplets; - else if (!strcmp(arch, "arm64")) + else if (!strcmp(arch, NORM_ARM64)) path_list = arm64_triplets; - else if (!strcmp(arch, "powerpc")) + else if (!strcmp(arch, NORM_POWERPC)) path_list = powerpc_triplets; - else if (!strcmp(arch, "sh")) + else if (!strcmp(arch, NORM_SH)) path_list 
= sh_triplets; - else if (!strcmp(arch, "s390")) + else if (!strcmp(arch, NORM_S390)) path_list = s390_triplets; - else if (!strcmp(arch, "sparc")) + else if (!strcmp(arch, NORM_SPARC)) path_list = sparc_triplets; - else if (!strcmp(arch, "x86")) + else if (!strcmp(arch, NORM_X86)) path_list = x86_triplets; - else if (!strcmp(arch, "mips")) + else if (!strcmp(arch, NORM_MIPS)) path_list = mips_triplets; else { ui__error("binutils for %s not supported.\n", arch); diff --git a/tools/perf/arch/common.h b/tools/perf/arch/common.h index 6b01c73..14ca8ca 100644 --- a/tools/perf/arch/common.h +++ b/tools/perf/arch/common.h @@ -5,6 +5,17 @@ extern const char *objdump_path; +/* Macro for normalized arch names */ +#define NORM_X86 "x86" +#define NORM_SPARC "sparc" +#define NORM_ARM64 "arm64" +#define NORM_ARM "arm" +#define NORM_S390 "s390" +#define NORM_PARISC"parisc" +#define NORM_POWERPC "powerpc" +#define NORM_MIPS "mips" +#define NORM_SH"sh" + int perf_env__lookup_objdump(struct perf_env *env); const char *normalize_arch(char *arch); diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c index b87eac7..fce60b4 100644 --- a/tools/perf/util/annotate.c +++ b/tools/perf/util/annotate.c @@ -68,7 +68,7 @@ static int call__parse(struct ins_operands *ops, name++; - if (!strcmp(norm_arch, "arm") && strchr(name, '+')) + if (!strcmp(norm_arch, NORM_ARM) && strchr(name, '+')) return -1; tok = strchr(name, '>'); @@ -255,7 +255,7 @@ static int mov__parse(struct ins_operands *ops, target = ++s; - if (!strcmp(norm_arch, "arm")) + if (!strcmp(norm_arch, NORM_ARM)) comment = strchr(s,
Re: [PATCH v3 3/4] perf annotate: add powerpc support
Thanks Michael for your suggestion.

On Thursday 30 June 2016 11:51 AM, Michael Ellerman wrote:
> On Thu, 2016-06-30 at 11:44 +0530, Ravi Bangoria wrote:
>> diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
>> index 36a5825..b87eac7 100644
>> --- a/tools/perf/util/annotate.c
>> +++ b/tools/perf/util/annotate.c
>> @@ -476,6 +481,125 @@ static int ins__cmp(const void *a, const void *b)
> ...
>> +
>> +static struct ins *ins__find_powerpc(const char *name)
>> +{
>> +	int i;
>> +	struct ins *ins;
>> +	struct ins_ops *ops;
>> +	static struct instructions_powerpc head;
>> +	static bool list_initialized;
>> +
>> +	/*
>> +	 * - Interested only if instruction starts with 'b'.
>> +	 * - Few start with 'b', but aren't branch instructions.
>> +	 * - Let's also ignore instructions involving 'ctr' and
>> +	 *   'tar' since target branch addresses for those can't
>> +	 *   be determined statically.
>> +	 */
>> +	if (name[0] != 'b'             ||
>> +	    !strncmp(name, "bcd", 3)   ||
>> +	    !strncmp(name, "brinc", 5) ||
>> +	    !strncmp(name, "bper", 4)  ||
>> +	    strstr(name, "ctr")        ||
>> +	    strstr(name, "tar"))
>> +		return NULL;
>
> It would be good if 'bctr' was at least recognised as a branch, even if we
> can't determine the target. They are very common.

We can not show arrow for this since we don't know the target location.
can you please suggest how you intends perf to display bctr?

bctr can be classified into two variants -- 'bctr' and 'bctrl'.

'bctr' will be considered as jump instruction but jump__parse() won't
be able to find any target location and hence it will set target to
UINT64_MAX which transform 'bctr' to 'bctr UINT64_MAX'. This looks
misleading.

bctrl will be considered as call instruction but call_parse() won't
be able to find any target function and hence it won't show any
navigation arrow for this instruction. Which is same as filter it
beforehand.

> It doesn't look like we have the opcode handy here? Could we get it somehow?
> That would make this a *lot* more robust.

objdump prints machine code, but I don't know how difficult that would
be to parse to get opcode.

-Ravi

> cheers
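To see which mnemonics the quoted filter accepts and how the suffix rules sort them, the classification can be exercised on its own. This is a minimal sketch, not the code from the patch: `enum ppc_ins_kind` and `ppc_classify()` are hypothetical names that replace the `struct ins_ops` pointers, and the `i > 0` guards are an added assumption to keep the index arithmetic obviously in bounds.

```c
#include <string.h>

/* Hypothetical result type; the patch selects an ins_ops pointer instead. */
enum ppc_ins_kind { PPC_INS_NONE, PPC_INS_JUMP, PPC_INS_CALL, PPC_INS_RET };

static enum ppc_ins_kind ppc_classify(const char *name)
{
	enum ppc_ins_kind kind = PPC_INS_JUMP;
	size_t i;

	/* Only 'b...' mnemonics, minus non-branches and ctr/tar forms. */
	if (name[0] != 'b' ||
	    !strncmp(name, "bcd", 3) ||
	    !strncmp(name, "brinc", 5) ||
	    !strncmp(name, "bper", 4) ||
	    strstr(name, "ctr") ||
	    strstr(name, "tar"))
		return PPC_INS_NONE;

	/* name[0] == 'b' here, so the string has at least one character. */
	i = strlen(name) - 1;

	/* Ignore an optional branch hint at the end of the mnemonic. */
	if (i > 0 && (name[i] == '+' || name[i] == '-'))
		i--;

	/* A trailing 'l' or 'la' updates LR, so treat it as a call ... */
	if (name[i] == 'l' || (i > 0 && name[i] == 'a' && name[i - 1] == 'l')) {
		/* ... except 'bnl' (branch if not less than) and 'bnla'. */
		if (strcmp(name, "bnl") && strcmp(name, "bnl+") &&
		    strcmp(name, "bnl-") && strcmp(name, "bnla") &&
		    strcmp(name, "bnla+") && strcmp(name, "bnla-"))
			kind = PPC_INS_CALL;
	}

	/* A trailing 'lr' is a branch to the link register: a return. */
	if (i > 0 && name[i] == 'r' && name[i - 1] == 'l')
		kind = PPC_INS_RET;

	return kind;
}
```

With this filter, both 'bctr' and 'bctrl' come back as PPC_INS_NONE, which is the behaviour being questioned in the reply above.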
Re: [PATCH v3 3/4] perf annotate: add powerpc support
Hi Balbir,

On Friday 01 July 2016 06:18 PM, Balbir Singh wrote:
> On Fri, 2016-07-01 at 14:13 +0530, Ravi Bangoria wrote:
>> Thanks Michael for your suggestion.
>>
>> On Thursday 30 June 2016 11:51 AM, Michael Ellerman wrote:
>>> On Thu, 2016-06-30 at 11:44 +0530, Ravi Bangoria wrote:
>>>> diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
>>>> index 36a5825..b87eac7 100644
>>>> --- a/tools/perf/util/annotate.c
>>>> +++ b/tools/perf/util/annotate.c
>>>> @@ -476,6 +481,125 @@ static int ins__cmp(const void *a, const void *b)
>>> ...
>>>> +
>>>> +static struct ins *ins__find_powerpc(const char *name)
>>>> +{
>>>> +	int i;
>>>> +	struct ins *ins;
>>>> +	struct ins_ops *ops;
>>>> +	static struct instructions_powerpc head;
>>>> +	static bool list_initialized;
>>>> +
>>>> +	/*
>>>> +	 * - Interested only if instruction starts with 'b'.
>>>> +	 * - Few start with 'b', but aren't branch instructions.
>>>> +	 * - Let's also ignore instructions involving 'ctr' and
>>>> +	 *   'tar' since target branch addresses for those can't
>>>> +	 *   be determined statically.
>>>> +	 */
>>>> +	if (name[0] != 'b'             ||
>>>> +	    !strncmp(name, "bcd", 3)   ||
>>>> +	    !strncmp(name, "brinc", 5) ||
>>>> +	    !strncmp(name, "bper", 4)  ||
>>>> +	    strstr(name, "ctr")        ||
>>>> +	    strstr(name, "tar"))
>>>> +		return NULL;
>>>
>>> It would be good if 'bctr' was at least recognised as a branch, even if we
>>> can't determine the target. They are very common.
>>
>> We can not show arrow for this since we don't know the target location.
>> can you please suggest how you intends perf to display bctr?
>>
>> bctr can be classified into two variants -- 'bctr' and 'bctrl'.
>>
>> 'bctr' will be considered as jump instruction but jump__parse() won't
>> be able to find any target location and hence it will set target to
>> UINT64_MAX which transform 'bctr' to 'bctr UINT64_MAX'. This looks
>> misleading.
>>
>> bctrl will be considered as call instruction but call_parse() won't
>> be able to find any target function and hence it won't show any
>> navigation arrow for this instruction. Which is same as filter it
>> beforehand.
>
> The target location and function are in the counter. Can't we add this to
> instruction ops? Is it a major change to add it?

Of course we can add it. What I mean is that we cannot determine the
target location statically by parsing objdump output. For example,
consider this snippet:

objdump output:

  c0143848: lwarx  r8,0,r10
  c014384c: addic  r8,r8,1
  c0143850: stwcx. r8,0,r10
  c0143854: bne-   c0143848 <.rcu_idle_exit+0x58>

Corresponding perf annotate output:

  58: lwarx  r8,0,r10
      addic  r8,r8,1
      stwcx. r8,0,r10
      bne-   58

The TUI shows an arrow before the 'bne- 58' instruction, which marks it
as a jump instruction. When we focus on 'bne- 58', the arrow spans from
that instruction to the instruction at offset 58 (lwarx). Pressing
Enter moves the focus to the target.

In the case of 'bctr', we cannot determine the target location
statically and hence cannot provide any navigation options. The same
holds for 'bctrl'.

Please correct me if I misunderstood anything.

-Ravi

> Balbir Singh.
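The difference between 'bne-' and 'bctr' comes down to whether the objdump operand text carries an address at all. The sketch below illustrates that, assuming an operand string shaped like the one in the snippet above; `struct branch_target` and `parse_branch_target()` are hypothetical names, and this is not perf's jump__parse().

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type for this sketch. */
struct branch_target {
	uint64_t addr;		/* absolute target address */
	uint64_t offset;	/* offset inside the symbol, after '+' */
	int valid;		/* 0 when the operands carry no address */
};

/*
 * Parse operands such as "c0143848 <.rcu_idle_exit+0x58>". A leading
 * condition-register field ("cr7,...") is skipped by starting after the
 * last comma. Instructions like 'bctr' have no address operand, so the
 * result stays invalid for them.
 */
static struct branch_target parse_branch_target(const char *raw)
{
	struct branch_target t = { 0, 0, 0 };
	const char *p, *plus;
	char *end;

	if (!raw || !*raw)
		return t;

	p = strrchr(raw, ',');
	p = p ? p + 1 : raw;

	t.addr = strtoull(p, &end, 16);
	if (end == p)
		return t;	/* no hex digits: nothing to navigate to */

	plus = strchr(end, '+');
	if (plus)
		t.offset = strtoull(plus + 1, NULL, 16);

	t.valid = 1;
	return t;
}
```

For the 'bne-' line this yields address 0xc0143848 and offset 0x58, the "58" that annotate displays; for 'bctr' the operand string is empty, so there is nothing to draw an arrow to.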
Re: [PATCH v2 3/4] perf annotate: add powerpc support
Thanks Naveen, On Wednesday 29 June 2016 08:15 PM, Naveen N. Rao wrote: On 2016/06/29 04:45PM, Ravi Bangoria wrote: From: Naveen N. Rao <naveen.n@linux.vnet.ibm.com> Powerpc has long list of branch instructions and hardcoding them in table appears to be error-prone. So, add new function to find instruction instead of creating table. This function dynamically create table(list of 'struct ins'), and instead of creating object every time, first check if list already contain object for that nemonics. Signed-off-by: Naveen N. Rao <naveen.n@linux.vnet.ibm.com> Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com> --- Changes in v2: - Corrected few memory leaks. - Created Dynamic list for powerpc to optimize memory consumption tools/perf/util/annotate.c | 121 + 1 file changed, 121 insertions(+) diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c index 36a5825..812bfad 100644 --- a/tools/perf/util/annotate.c +++ b/tools/perf/util/annotate.c @@ -461,6 +461,11 @@ static struct ins instructions_arm[] = { { .name = "bne", .ops = _ops, }, }; +struct instructions_powerpc { + struct ins *ins; + struct list_head list; +}; + static int ins__key_cmp(const void *name, const void *insp) { const struct ins *ins = insp; @@ -476,6 +481,120 @@ static int ins__cmp(const void *a, const void *b) return strcmp(ia->name, ib->name); } +static int list_add__ins_powerpc(struct instructions_powerpc *head, +struct ins *ins) +{ + struct instructions_powerpc *ins_powerpc; + + ins_powerpc = zalloc(sizeof(struct instructions_powerpc)); + if (!ins_powerpc) + return -1; + + ins_powerpc->ins = ins; + list_add_tail(&(ins_powerpc->list), &(head->list)); + + return 0; +} + +static struct ins *list_search__ins_powerpc(struct instructions_powerpc *head, + const char *name) +{ + struct instructions_powerpc *pos; + + list_for_each_entry(pos, >list, list) { + if (!strcmp(pos->ins->name, name)) + return pos->ins; + } + return NULL; +} + +static struct ins *ins__find_powerpc(const char 
*name) +{ + int i; + struct ins *ins; + static struct instructions_powerpc head; + static bool list_initialized; + + if (!list_initialized) { + INIT_LIST_HEAD(); + list_initialized = true; + } + + /* +* Search if we already created object of 'struct ins' +* for this instruction +*/ + ins = list_search__ins_powerpc(, name); + if (ins) + return ins; + + ins = zalloc(sizeof(struct ins)); + if (!ins) + return NULL; + + ins->name = strdup(name); + if (!ins->name) + goto err; You can move the above two inside the below if condition, so that you only allocate memory if needed. Or, what would be better would be to pass 'name' and the appropriate ops pointer to the helper above (list_add__ins_powerpc) and have that allocate 'struct ins' and insert into the list. Yes I will think about this. + + if (name[0] == 'b') { + /* branch instructions */ + ins->ops = _ops; + + /* +* - Few start with 'b', but aren't branch instructions. +* - Let's also ignore instructions involving 'ctr' and +* 'tar' since target branch addresses for those can't +* be determined statically. +*/ + if (!strncmp(name, "bcd", 3) || + !strncmp(name, "brinc", 5) || + !strncmp(name, "bper", 4) || + strstr(name, "ctr")|| + strstr(name, "tar")) + goto err; You are still leaking ins->name here. Ah!! Sorry. I missed that we are using strdup here. Will correct it. -Ravi
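The suggestion in this review, letting one helper allocate the 'struct ins', duplicate the name and link the node, with a single unwind path on failure, can be sketched without perf's headers. This is an illustration under stated assumptions: a singly linked list replaces the kernel-style `list_head`, plain `calloc()`/`malloc()`/`free()` replace `zalloc()`/`strdup()`/`zfree()`, `kind` replaces the ops pointer, and the names `ins_list__add()`, `ins_list__find()` and `ins_list__findnew()` are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for 'struct ins' and the powerpc list node. */
struct ins {
	char *name;
	int kind;
};

struct ins_node {
	struct ins *ins;
	struct ins_node *next;
};

/* Allocate everything in one place; unwind in reverse order on failure. */
static struct ins *ins_list__add(struct ins_node **head, const char *name,
				 int kind)
{
	struct ins_node *node;
	struct ins *ins;
	size_t len = strlen(name) + 1;

	ins = calloc(1, sizeof(*ins));
	if (!ins)
		return NULL;

	node = calloc(1, sizeof(*node));
	if (!node)
		goto out_free_ins;

	ins->name = malloc(len);
	if (!ins->name)
		goto out_free_node;
	memcpy(ins->name, name, len);

	ins->kind = kind;
	node->ins = ins;
	node->next = *head;
	*head = node;
	return ins;

out_free_node:
	free(node);
out_free_ins:
	free(ins);
	return NULL;
}

/* Linear search by mnemonic. */
static struct ins *ins_list__find(struct ins_node *head, const char *name)
{
	for (; head; head = head->next) {
		if (!strcmp(head->ins->name, name))
			return head->ins;
	}
	return NULL;
}

/* Reuse an existing object if there is one, otherwise create it. */
static struct ins *ins_list__findnew(struct ins_node **head, const char *name,
				     int kind)
{
	struct ins *ins = ins_list__find(*head, name);

	return ins ? ins : ins_list__add(head, name, kind);
}
```

Because nothing is allocated before the caller has decided that the mnemonic is worth keeping, the early-exit paths have nothing to free, which avoids the kind of leak pointed out in the review.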
[PATCH v2 3/4] perf annotate: add powerpc support
From: Naveen N. Rao <naveen.n@linux.vnet.ibm.com> Powerpc has long list of branch instructions and hardcoding them in table appears to be error-prone. So, add new function to find instruction instead of creating table. This function dynamically create table(list of 'struct ins'), and instead of creating object every time, first check if list already contain object for that nemonics. Signed-off-by: Naveen N. Rao <naveen.n@linux.vnet.ibm.com> Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com> --- Changes in v2: - Corrected few memory leaks. - Created Dynamic list for powerpc to optimize memory consumption tools/perf/util/annotate.c | 121 + 1 file changed, 121 insertions(+) diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c index 36a5825..812bfad 100644 --- a/tools/perf/util/annotate.c +++ b/tools/perf/util/annotate.c @@ -461,6 +461,11 @@ static struct ins instructions_arm[] = { { .name = "bne", .ops = _ops, }, }; +struct instructions_powerpc { + struct ins *ins; + struct list_head list; +}; + static int ins__key_cmp(const void *name, const void *insp) { const struct ins *ins = insp; @@ -476,6 +481,120 @@ static int ins__cmp(const void *a, const void *b) return strcmp(ia->name, ib->name); } +static int list_add__ins_powerpc(struct instructions_powerpc *head, +struct ins *ins) +{ + struct instructions_powerpc *ins_powerpc; + + ins_powerpc = zalloc(sizeof(struct instructions_powerpc)); + if (!ins_powerpc) + return -1; + + ins_powerpc->ins = ins; + list_add_tail(&(ins_powerpc->list), &(head->list)); + + return 0; +} + +static struct ins *list_search__ins_powerpc(struct instructions_powerpc *head, + const char *name) +{ + struct instructions_powerpc *pos; + + list_for_each_entry(pos, >list, list) { + if (!strcmp(pos->ins->name, name)) + return pos->ins; + } + return NULL; +} + +static struct ins *ins__find_powerpc(const char *name) +{ + int i; + struct ins *ins; + static struct instructions_powerpc head; + static bool list_initialized; + + 
if (!list_initialized) { + INIT_LIST_HEAD(); + list_initialized = true; + } + + /* +* Search if we already created object of 'struct ins' +* for this instruction +*/ + ins = list_search__ins_powerpc(, name); + if (ins) + return ins; + + ins = zalloc(sizeof(struct ins)); + if (!ins) + return NULL; + + ins->name = strdup(name); + if (!ins->name) + goto err; + + if (name[0] == 'b') { + /* branch instructions */ + ins->ops = _ops; + + /* +* - Few start with 'b', but aren't branch instructions. +* - Let's also ignore instructions involving 'ctr' and +* 'tar' since target branch addresses for those can't +* be determined statically. +*/ + if (!strncmp(name, "bcd", 3) || + !strncmp(name, "brinc", 5) || + !strncmp(name, "bper", 4) || + strstr(name, "ctr")|| + strstr(name, "tar")) + goto err; + + i = strlen(name) - 1; + if (i < 0) + goto err; + + /* ignore optional hints at the end of the instructions */ + if (name[i] == '+' || name[i] == '-') + i--; + + if (name[i] == 'l' || (name[i] == 'a' && name[i-1] == 'l')) { + /* +* if the instruction ends up with 'l' or 'la', then +* those are considered 'calls' since they update LR. +* ... except for 'bnl' which is branch if not less than +* and the absolute form of the same. +*/ + if (strcmp(name, "bnl") && strcmp(name, "bnl+") && + strcmp(name, "bnl-") && strcmp(name, "bnla") && + strcmp(name, "bnla+") && strcmp(name, "bnla-")) + ins->ops = _ops; + } + if (name[i] == 'r' && name[i-1] == 'l') + /* +* instructions ending with 'lr' are considered to be +* return instructions +*/
[PATCH v2 2/4] perf annotate: Enable cross arch annotate
Change current data structures and function to enable cross arch annotate. Current implementation does not contain logic of record on one arch and annotating on other. This remote annotate is partially possible with current implementation for x86 (or may be arm as well) only. But, to make remote annotation work properly, all architecture instruction tables need to be included in the perf binary. And while annotating, look for instruction table where perf.data was recorded. For arm, few instructions were defined under #if __arm__ which I've used as a table for arm. But I'm not sure whether instruction defined outside of that also contains arm instructions. Apart from that, 'call__parse()' and 'move__parse()' contains #ifdef __arm__ directive. I've changed it to if (!strcmp(norm_arch, "arm")). But I've not tested this as well. Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com> --- Changes in v2: - No changes tools/perf/builtin-top.c | 2 +- tools/perf/ui/browsers/annotate.c | 3 +- tools/perf/ui/gtk/annotate.c | 2 +- tools/perf/util/annotate.c| 136 -- tools/perf/util/annotate.h| 5 +- 5 files changed, 95 insertions(+), 53 deletions(-) diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c index 07fc792..d4fd947 100644 --- a/tools/perf/builtin-top.c +++ b/tools/perf/builtin-top.c @@ -128,7 +128,7 @@ static int perf_top__parse_source(struct perf_top *top, struct hist_entry *he) return err; } - err = symbol__annotate(sym, map, 0); + err = symbol__annotate(sym, map, 0, NULL); if (err == 0) { out_assign: top->sym_filter_entry = he; diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c index 29dc6d2..3a652a6f 100644 --- a/tools/perf/ui/browsers/annotate.c +++ b/tools/perf/ui/browsers/annotate.c @@ -1050,7 +1050,8 @@ int symbol__tui_annotate(struct symbol *sym, struct map *map, (nr_pcnt - 1); } - if (symbol__annotate(sym, map, sizeof_bdl) < 0) { + if (symbol__annotate(sym, map, sizeof_bdl, +perf_evsel__env_arch(evsel)) < 0) { 
ui__error("%s", ui_helpline__last_msg); goto out_free_offsets; } diff --git a/tools/perf/ui/gtk/annotate.c b/tools/perf/ui/gtk/annotate.c index 9c7ff8d..d7150b3 100644 --- a/tools/perf/ui/gtk/annotate.c +++ b/tools/perf/ui/gtk/annotate.c @@ -166,7 +166,7 @@ static int symbol__gtk_annotate(struct symbol *sym, struct map *map, if (map->dso->annotate_warned) return -1; - if (symbol__annotate(sym, map, 0) < 0) { + if (symbol__annotate(sym, map, 0, perf_evsel__env_arch(evsel)) < 0) { ui__error("%s", ui_helpline__current); return -1; } diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c index c385fec..36a5825 100644 --- a/tools/perf/util/annotate.c +++ b/tools/perf/util/annotate.c @@ -20,12 +20,14 @@ #include #include #include +#include +#include "../arch/common.h" const char *disassembler_style; const char *objdump_path; static regex_t file_lineno; -static struct ins *ins__find(const char *name); +static struct ins *ins__find(const char *name, const char *norm_arch); static int disasm_line__parse(char *line, char **namep, char **rawp); static void ins__delete(struct ins_operands *ops) @@ -53,7 +55,8 @@ int ins__scnprintf(struct ins *ins, char *bf, size_t size, return ins__raw_scnprintf(ins, bf, size, ops); } -static int call__parse(struct ins_operands *ops) +static int call__parse(struct ins_operands *ops, + __maybe_unused const char *norm_arch) { char *endptr, *tok, *name; @@ -65,10 +68,8 @@ static int call__parse(struct ins_operands *ops) name++; -#ifdef __arm__ - if (strchr(name, '+')) + if (!strcmp(norm_arch, "arm") && strchr(name, '+')) return -1; -#endif tok = strchr(name, '>'); if (tok == NULL) @@ -117,7 +118,8 @@ bool ins__is_call(const struct ins *ins) return ins->ops == _ops; } -static int jump__parse(struct ins_operands *ops) +static int jump__parse(struct ins_operands *ops, + __maybe_unused const char *norm_arch) { const char *s = strchr(ops->raw, '+'); @@ -172,7 +174,7 @@ static int comment__symbol(char *raw, char *comment, u64 *addrp, char 
**namep) return 0; } -static int lock__parse(struct ins_operands *ops) +static int lock__parse(struct ins_operands *ops, const char *norm_arch) { char *name; @@ -183,7 +185,7 @@ static int lock__parse(struct ins_operands *ops) if (disasm_line__parse(ops->raw, , >locked.ops->raw) < 0) goto out_free_ops; -
[PATCH v2 0/4] perf annotate: Enable cross arch annotate
Currently, perf annotate supports code navigation (branches and calls)
only when it runs on the same architecture on which perf.data was
recorded; cross arch annotate is not supported.

This patchset enables cross arch annotate. It uses the x86 and arm
instruction tables that are already available and adds support for
powerpc as well. Adding support for other architectures should be easy.

I've created this patchset on top of acme/perf/core and tested it with
x86 and powerpc only.

Example:

  Record on powerpc:
  $ ./perf record -a

  Report -> Annotate on x86:
  $ ./perf report -i perf.data.powerpc --vmlinux vmlinux.powerpc

Changes in v2:
  - Corrected few memory leaks.
  - Created Dynamic list for powerpc to optimize memory consumption

Naveen N. Rao (1):
  perf annotate: add powerpc support

Ravi Bangoria (3):
  perf: Utility function to fetch arch
  perf annotate: Enable cross arch annotate
  perf: Define macro for arch names

 tools/perf/arch/common.c           |  36 +++---
 tools/perf/arch/common.h           |  11 ++
 tools/perf/builtin-top.c           |   2 +-
 tools/perf/ui/browsers/annotate.c  |   3 +-
 tools/perf/ui/gtk/annotate.c       |   2 +-
 tools/perf/util/annotate.c         | 255 ++---
 tools/perf/util/annotate.h         |   5 +-
 tools/perf/util/evsel.c            |   7 +
 tools/perf/util/evsel.h            |   2 +
 tools/perf/util/unwind-libunwind.c |   4 +-
 10 files changed, 255 insertions(+), 72 deletions(-)

--
2.5.5
[PATCH v2 4/4] perf annotate: Define macro for arch names
Define a macro for each arch name and use the macros instead of arch
name string literals.

Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com>
---
Changes in v2:
  - No changes

 tools/perf/arch/common.c           | 36 ++--
 tools/perf/arch/common.h           | 11 +++
 tools/perf/util/annotate.c         | 10 +-
 tools/perf/util/unwind-libunwind.c |  4 ++--
 4 files changed, 36 insertions(+), 25 deletions(-)

diff --git a/tools/perf/arch/common.c b/tools/perf/arch/common.c
index ee69668..feb2113 100644
--- a/tools/perf/arch/common.c
+++ b/tools/perf/arch/common.c
@@ -122,25 +122,25 @@ static int lookup_triplets(const char *const *triplets, const char *name)
 const char *normalize_arch(char *arch)
 {
 	if (!strcmp(arch, "x86_64"))
-		return "x86";
+		return NORM_X86;
 	if (arch[0] == 'i' && arch[2] == '8' && arch[3] == '6')
-		return "x86";
+		return NORM_X86;
 	if (!strcmp(arch, "sun4u") || !strncmp(arch, "sparc", 5))
-		return "sparc";
+		return NORM_SPARC;
 	if (!strcmp(arch, "aarch64") || !strcmp(arch, "arm64"))
-		return "arm64";
+		return NORM_ARM64;
 	if (!strncmp(arch, "arm", 3) || !strcmp(arch, "sa110"))
-		return "arm";
+		return NORM_ARM;
 	if (!strncmp(arch, "s390", 4))
-		return "s390";
+		return NORM_S390;
 	if (!strncmp(arch, "parisc", 6))
-		return "parisc";
+		return NORM_PARISC;
 	if (!strncmp(arch, "powerpc", 7) || !strncmp(arch, "ppc", 3))
-		return "powerpc";
+		return NORM_POWERPC;
 	if (!strncmp(arch, "mips", 4))
-		return "mips";
+		return NORM_MIPS;
 	if (!strncmp(arch, "sh", 2) && isdigit(arch[2]))
-		return "sh";
+		return NORM_SH;

 	return arch;
 }
@@ -180,21 +180,21 @@ static int perf_env__lookup_binutils_path(struct perf_env *env,
 		zfree();
 	}

-	if (!strcmp(arch, "arm"))
+	if (!strcmp(arch, NORM_ARM))
 		path_list = arm_triplets;
-	else if (!strcmp(arch, "arm64"))
+	else if (!strcmp(arch, NORM_ARM64))
 		path_list = arm64_triplets;
-	else if (!strcmp(arch, "powerpc"))
+	else if (!strcmp(arch, NORM_POWERPC))
 		path_list = powerpc_triplets;
-	else if (!strcmp(arch, "sh"))
+	else if (!strcmp(arch, NORM_SH))
 		path_list = sh_triplets;
-	else if (!strcmp(arch, "s390"))
+	else if (!strcmp(arch, NORM_S390))
 		path_list = s390_triplets;
-	else if (!strcmp(arch, "sparc"))
+	else if (!strcmp(arch, NORM_SPARC))
 		path_list = sparc_triplets;
-	else if (!strcmp(arch, "x86"))
+	else if (!strcmp(arch, NORM_X86))
 		path_list = x86_triplets;
-	else if (!strcmp(arch, "mips"))
+	else if (!strcmp(arch, NORM_MIPS))
 		path_list = mips_triplets;
 	else {
 		ui__error("binutils for %s not supported.\n", arch);
diff --git a/tools/perf/arch/common.h b/tools/perf/arch/common.h
index 6b01c73..14ca8ca 100644
--- a/tools/perf/arch/common.h
+++ b/tools/perf/arch/common.h
@@ -5,6 +5,17 @@

 extern const char *objdump_path;

+/* Macro for normalized arch names */
+#define NORM_X86	"x86"
+#define NORM_SPARC	"sparc"
+#define NORM_ARM64	"arm64"
+#define NORM_ARM	"arm"
+#define NORM_S390	"s390"
+#define NORM_PARISC	"parisc"
+#define NORM_POWERPC	"powerpc"
+#define NORM_MIPS	"mips"
+#define NORM_SH		"sh"
+
 int perf_env__lookup_objdump(struct perf_env *env);
 const char *normalize_arch(char *arch);

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 812bfad..8c27486 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -68,7 +68,7 @@ static int call__parse(struct ins_operands *ops,

 	name++;

-	if (!strcmp(norm_arch, "arm") && strchr(name, '+'))
+	if (!strcmp(norm_arch, NORM_ARM) && strchr(name, '+'))
 		return -1;

 	tok = strchr(name, '>');
@@ -255,7 +255,7 @@ static int mov__parse(struct ins_operands *ops,

 	target = ++s;

-	if (!strcmp(norm_arch, "arm"))
+	if (!strcmp(norm_arch, NORM_ARM))
 		comment = strchr(s, ';');
 	else
[PATCH v2 1/4] perf: Utility function to fetch arch
Add a utility function to fetch arch using evsel
(evsel->evlist->env->arch).

Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com>
---
Changes in v2:
  - No changes

 tools/perf/util/evsel.c | 7 +++
 tools/perf/util/evsel.h | 2 ++
 2 files changed, 9 insertions(+)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 1d8f2bb..0fea724 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -2422,3 +2422,10 @@ int perf_evsel__open_strerror(struct perf_evsel *evsel, struct target *target,
 			 err, strerror_r(err, sbuf, sizeof(sbuf)),
 			 perf_evsel__name(evsel));
 }
+
+char *perf_evsel__env_arch(struct perf_evsel *evsel)
+{
+	if (evsel && evsel->evlist && evsel->evlist->env)
+		return evsel->evlist->env->arch;
+	return NULL;
+}
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 828ddd1..86fed7a 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -435,4 +435,6 @@ typedef int (*attr__fprintf_f)(FILE *, const char *, const char *, void *);
 int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr,
 			     attr__fprintf_f attr__fprintf, void *priv);

+char *perf_evsel__env_arch(struct perf_evsel *evsel);
+
 #endif /* __PERF_EVSEL_H */
--
2.5.5
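For readers without the perf tree at hand, the accessor in this patch can be tried in isolation. The following is only a sketch: the three `*_sketch` structs are hypothetical, cut-down stand-ins for perf's `perf_evsel`, `perf_evlist` and `perf_env`, and only the NULL-checked pointer walk is taken from the patch.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-ins for perf's structures. */
struct env_sketch    { char *arch; };
struct evlist_sketch { struct env_sketch *env; };
struct evsel_sketch  { struct evlist_sketch *evlist; };

/* Return the recorded arch string, or NULL if any link in the chain is missing. */
char *evsel_sketch__env_arch(struct evsel_sketch *evsel)
{
	if (evsel && evsel->evlist && evsel->evlist->env)
		return evsel->evlist->env->arch;
	return NULL;
}
```

Callers have to be prepared for a NULL return, for example when an evsel is not attached to an evlist.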
[PATCH 3/5] perf/sdt/x86: Move OP parser to tools/perf/arch/x86/
SDT marker argument is in N@OP format. N is the size of argument and OP is the actual assembly operand. OP is arch dependent component and hence it's parsing logic also should be placed under tools/perf/arch/. Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com> --- tools/perf/arch/x86/util/perf_regs.c | 93 - tools/perf/util/perf_regs.c | 9 ++- tools/perf/util/perf_regs.h | 7 +- tools/perf/util/probe-file.c | 127 +-- 4 files changed, 134 insertions(+), 102 deletions(-) diff --git a/tools/perf/arch/x86/util/perf_regs.c b/tools/perf/arch/x86/util/perf_regs.c index d8a8dcf..34fcb0d 100644 --- a/tools/perf/arch/x86/util/perf_regs.c +++ b/tools/perf/arch/x86/util/perf_regs.c @@ -3,6 +3,7 @@ #include "../../perf.h" #include "../../util/util.h" #include "../../util/perf_regs.h" +#include "../../util/debug.h" const struct sample_reg sample_reg_masks[] = { SMPL_REG(AX, PERF_REG_X86_AX), @@ -87,7 +88,16 @@ static const struct sdt_name_reg sdt_reg_renamings[] = { SDT_NAME_REG_END, }; -int sdt_rename_register(char **pdesc, char *old_name) +bool arch_sdt_probe_arg_supp(void) +{ + return true; +} + +/* + * The table sdt_reg_renamings is used for adjusting gcc/gas-generated + * registers before filling the uprobe tracer interface. + */ +static int sdt_rename_register(char **pdesc, char *old_name) { const struct sdt_name_reg *rnames = sdt_reg_renamings; char *new_desc, *old_desc = *pdesc; @@ -129,3 +139,84 @@ int sdt_rename_register(char **pdesc, char *old_name) return 0; } + +/* + * x86 specific implementation + * return value: + * <0 : error + * 0 : success + * >0 : skip + */ +int arch_sdt_probe_parse_op(char **desc, const char **prefix) +{ + char *tmp; + int ret = 0; + + /* +* The uprobe tracer format does not support all the addressing +* modes (notably: in x86 the scaled mode); so, we detect ',' +* characters, if there is just one, there is no use converting +* the sdt arg into a uprobe one. 
+* +* Also it does not support constants; if we find one in the +* current argument, let's skip the argument. +*/ + if (strchr(*desc, ',') || strchr(*desc, '$')) { + pr_debug4("Skipping unsupported SDT argument; %s\n", *desc); + return 1; + } + + /* +* If the argument addressing mode is indirect, we must check +* a few things... +*/ + tmp = strchr(*desc, '('); + if (tmp) { + int j; + + /* +* ...if the addressing mode is indirect with a +* positive offset (ex.: "1608(%ax)"), we need to add +* a '+' prefix so as to be compliant with uprobe +* format. +*/ + if ((*desc)[0] != '+' && (*desc)[0] != '-') + *prefix = ((*desc)[0] == '(') ? "+0" : "+"; + + /* +* ...or if the addressing mode is indirect with a symbol +* as offset, the argument will not be supported by +* the uprobe tracer format; so, let's skip this one. +*/ + for (j = 0; j < tmp - *desc; j++) { + if ((*desc)[j] != '+' && (*desc)[j] != '-' && + !isdigit((*desc)[j])) { + pr_debug4("Skipping unsupported SDT argument; " + "%s\n", *desc); + return 1; + } + } + } + + /* +* The uprobe parser does not support all gas register names; +* so, we have to replace them (ex. for x86_64: %rax -> %ax); +* the loop below looks for the register names (starting with +* a '%' and tries to perform the needed renamings. +*/ + tmp = strchr(*desc, '%'); + while (tmp) { + size_t offset = tmp - *desc; + + ret = sdt_rename_register(desc, *desc + offset); + if (ret < 0) + return ret; + + /* +* The *desc pointer might have changed; so, let's not +* try to reuse tmp for next lookup +*/ + tmp = strchr(*desc + offset + 1, '%'); + } + return 0; +} diff --git a/tools/perf/util/perf_regs.c b/tools/perf/util/perf_regs.c index a37e593..f2b3d0d 100644 --- a/tools/perf/util/perf_regs.c +++ b/tools/perf/util/perf_regs.c @@ -6,8 +6,13 @@ const struct sample_reg __weak sample_reg_masks[] = { SMPL_REG_END }; -int __weak sdt_rename_register(char **pdesc __maybe_unused, - char *ol
[PATCH v2] perf/probe: Change MAX_CMDLEN
There are many SDT markers on powerpc whose uprobe definition goes
beyond the current MAX_CMDLEN, especially when the target filename is
long and the sdt marker has a long list of arguments. For example, the
definition of sdt marker

  method__compile__end: 8@17 8@9 8@10 -4@8 8@7 -4@6 8@5 -4@4 1@37(28)

from file

  /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.91-2.b14.fc22.ppc64/jre/lib/\
  ppc64/server/libjvm.so

is

  p:sdt_hotspot/method__compile__end /usr/lib/jvm/java-1.8.0-openjdk-\
  1.8.0.91-2.b14.fc22.ppc64/jre/lib/ppc64/server/libjvm.so:0x4c4e00\
  arg1=%gpr17:u64 arg2=%gpr9:u64 arg3=%gpr10:u64 arg4=%gpr8:s32\
  arg5=%gpr7:u64 arg6=%gpr6:s32 arg7=%gpr5:u64 arg8=%gpr4:s32\
  arg9=+37(%gpr28):u8

Perf probe fails with a seg fault for such markers. As the uprobe_events
file accepts definitions of up to 4094 characters (4096 - 2 (\n\0)),
increase the value of MAX_CMDLEN to 4094.

Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com>
---
Changes in v2:
  - Set MAX_CMDLEN to 4094 instead of 512

 tools/perf/util/probe-event.c | 1 -
 tools/perf/util/probe-file.c  | 3 ++-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 6a6f44d..e6e3244 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -47,7 +47,6 @@
 #include "probe-file.h"
 #include "session.h"

-#define MAX_CMDLEN 256
 #define PERFPROBE_GROUP "probe"

 bool probe_event_dry_run;	/* Dry run flag */
diff --git a/tools/perf/util/probe-file.c b/tools/perf/util/probe-file.c
index 38eca3c..fdabe7e 100644
--- a/tools/perf/util/probe-file.c
+++ b/tools/perf/util/probe-file.c
@@ -29,7 +29,8 @@
 #include "session.h"
 #include "perf_regs.h"

-#define MAX_CMDLEN 256
+/* 4096 - 2 ('\n' + '\0') */
+#define MAX_CMDLEN 4094

 static void print_open_warning(int err, bool uprobe)
 {
--
2.9.3
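The patch does not show the code that overruns the buffer, so the following is only a sketch of one defensive way to build such a definition into a MAX_CMDLEN sized buffer. The helper name `build_uprobe_cmd_sketch` and its parameters are hypothetical and are not perf's API. The point it illustrates is that snprintf() returns the length it wanted to write, so a too-long definition can be detected instead of silently truncated.

```c
#include <stdio.h>

/* 4096 - 2 ('\n' + '\0'), the value chosen in the patch above */
#define MAX_CMDLEN 4094

/*
 * Hypothetical helper: format "p:GROUP/EVENT PATH:0xOFFSET ARGS" into buf.
 * Returns 0 on success, -1 if the definition does not fit in size bytes.
 */
int build_uprobe_cmd_sketch(char *buf, size_t size, const char *group,
			    const char *event, const char *path,
			    unsigned long offset, const char *args)
{
	int n = snprintf(buf, size, "p:%s/%s %s:0x%lx %s",
			 group, event, path, offset, args);

	/* n is the untruncated length, so n >= size means the output was cut */
	if (n < 0 || (size_t)n >= size)
		return -1;
	return 0;
}
```

A caller would pass a `char buf[MAX_CMDLEN]` and skip or report the marker when the helper returns -1.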
Re: [PATCH 3/5] perf/sdt/x86: Move OP parser to tools/perf/arch/x86/
Thanks Masami for the review.

On Tuesday 07 February 2017 08:41 AM, Masami Hiramatsu wrote:
> On Thu, 2 Feb 2017 16:41:41 +0530
> Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com> wrote:
>
>> SDT marker argument is in N@OP format. N is the size of argument and
>> OP is the actual assembly operand. OP is arch dependent component and
>> hence it's parsing logic also should be placed under tools/perf/arch/.
>>
> Ok, I have one question. Is there any possibility that N is different
> size of OP? e.g. 8@dil, in this case we will record whole rdi.
> is that OK?

Looking at the list of markers on my x86 Fedora 25 box: yes, it's
possible for the register used in OP to be larger than the size
specified by N. For example, -4@68(%rbx). But I don't see any argument
that specifies a larger size in N than the size of the register in OP,
as in your example.

Ravi
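To make the N@OP discussion concrete, here is a small sketch of splitting an argument into its size part and its operand. The helper name `sdt_arg_split_sketch` is hypothetical and is not taken from perf. The mapping from N to a uprobe type suffix is inferred from the example elsewhere in this thread, where 8@17 becomes %gpr17:u64, -4@8 becomes %gpr8:s32 and 1@37(28) becomes +37(%gpr28):u8. The sketch assumes that only sizes 1, 2, 4 and 8 are valid.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: split "N@OP" and write the uprobe type suffix for N
 * ("u64", "s32", ...) into type. On success *op points at OP inside arg.
 * Returns 0 on success, -1 if arg is not in N@OP form.
 */
int sdt_arg_split_sketch(const char *arg, char *type, size_t type_size,
			 const char **op)
{
	char *end;
	long n = strtol(arg, &end, 10);
	long bytes = n < 0 ? -n : n;

	if (end == arg || *end != '@')
		return -1;
	if (bytes != 1 && bytes != 2 && bytes != 4 && bytes != 8)
		return -1;

	/* A negative N marks a signed argument */
	snprintf(type, type_size, "%c%ld", n < 0 ? 's' : 'u', bytes * 8);
	*op = end + 1;
	return 0;
}
```

With this split, -4@68(%rbx) gives type s32 and operand 68(%rbx): the recorded width comes from N, not from the width of the register named in OP.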
[PATCH 0/5] perf/sdt: Argument support for x86 and powerpc
The v5 patchset for sdt marker argument support for x86 [1] has a couple
of issues. For example, it still has x86 specific code in general code.
It lacks support for rNN (with size postfix b/w/d), %rsp, %esp, %sil
etc. registers, and such sdt markers fail at 'perf probe'. It also fails
to convert arguments that have no offset but still surround the register
with parentheses; for example, 8@(%rdi) is converted to +(%di):u64, which
is rejected by uprobe_events. It's causing failure at 'perf probe' for
all SDT events on all archs except x86. With this patchset, I've solved
these issues. (patch 2,3)

Also, existing perf shows a misleading message when the user tries to
record an sdt event without probing it. I've prepared a patch for that.
(patch 1)

Apart from that, I've also added logic to support arguments with sdt
markers on powerpc. (patch 4)

There are cases where the uprobe definition of an sdt event goes beyond
the current limit MAX_CMDLEN (256), and in such cases perf fails with a
seg fault. I've solved this issue. (patch 5)

Note: This patchset is prepared on top of Alexis' v5 series. [1]

[1] http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1292251.html

Ravi Bangoria (5):
  perf/sdt: Show proper hint
  perf/sdt/x86: Add renaming logic for rNN and other registers
  perf/sdt/x86: Move OP parser to tools/perf/arch/x86/
  perf/sdt/powerpc: Add argument support
  perf/probe: Change MAX_CMDLEN

 tools/lib/api/fs/tracing_path.c          |  16 +++-
 tools/perf/arch/powerpc/util/perf_regs.c | 115 ++
 tools/perf/arch/x86/util/perf_regs.c     | 137 ---
 tools/perf/util/perf_regs.c              |   9 +-
 tools/perf/util/perf_regs.h              |   7 +-
 tools/perf/util/probe-event.c            |   1 -
 tools/perf/util/probe-file.c             | 129 -
 7 files changed, 294 insertions(+), 120 deletions(-)

--
2.9.3
[PATCH 4/5] perf/sdt/powerpc: Add argument support
SDT marker argument is in N@OP format. Here OP is the arch dependent
component. Add powerpc logic to parse OP and convert it to a uprobe
compatible format.

Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com>
---
 tools/perf/arch/powerpc/util/perf_regs.c | 115 +++
 1 file changed, 115 insertions(+)

diff --git a/tools/perf/arch/powerpc/util/perf_regs.c b/tools/perf/arch/powerpc/util/perf_regs.c
index a3c3e1c..bbd6f91 100644
--- a/tools/perf/arch/powerpc/util/perf_regs.c
+++ b/tools/perf/arch/powerpc/util/perf_regs.c
@@ -1,5 +1,10 @@
+#include
+#include
+
 #include "../../perf.h"
+#include "../../util/util.h"
 #include "../../util/perf_regs.h"
+#include "../../util/debug.h"

 const struct sample_reg sample_reg_masks[] = {
 	SMPL_REG(r0, PERF_REG_POWERPC_R0),
@@ -47,3 +52,113 @@ const struct sample_reg sample_reg_masks[] = {
 	SMPL_REG(dsisr, PERF_REG_POWERPC_DSISR),
 	SMPL_REG_END
 };
+
+bool arch_sdt_probe_arg_supp(void)
+{
+	return true;
+}
+
+static regex_t regex1, regex2;
+
+static int init_op_regex(void)
+{
+	static int initialized;
+
+	if (initialized)
+		return 0;
+
+	/* REG or %rREG */
+	if (regcomp(&regex1, "^(%r)?([1-2]?[0-9]|3[0-1])$", REG_EXTENDED))
+		goto error;
+
+	/* -NUM(REG) or NUM(REG) or -NUM(%rREG) or NUM(%rREG) */
+	if (regcomp(&regex2, "^(\\-)?([0-9]+)\\((%r)?([1-2]?[0-9]|3[0-1])\\)$",
+			REG_EXTENDED))
+		goto free_regex1;
+
+	initialized = 1;
+	return 0;
+
+free_regex1:
+	regfree(&regex1);
+error:
+	pr_debug4("Regex compilation error.\n");
+	return -1;
+}
+
+/*
+ * Parse OP and convert it into uprobe format, which is, +/-NUM(%gprREG).
+ * Possible variants of OP are:
+ *	Format		Example
+ *	-------------------------
+ *	NUM(REG)	48(18)
+ *	-NUM(REG)	-48(18)
+ *	NUM(%rREG)	48(%r18)
+ *	-NUM(%rREG)	-48(%r18)
+ *	REG		18
+ *	%rREG		%r18
+ *	iNUM		i0
+ *	i-NUM		i-1
+ *
+ * SDT marker arguments on Powerpc uses %rREG form with -mregnames flag
+ * and REG form with -mno-regnames. Here REG is general purpose register,
+ * which is in 0 to 31 range.
+ *
+ * return value of the function:
+ *	<0 : error
+ *	 0 : success
+ *	>0 : skip
+ */
+int arch_sdt_probe_parse_op(char **desc, const char **prefix)
+{
+	char *tmp = NULL;
+	size_t new_len;
+	regmatch_t rm[5];
+
+	/* Constant argument. Uprobe does not support it */
+	if (*desc[0] == 'i') {
+		pr_debug4("Skipping unsupported SDT argument: %s\n", *desc);
+		return 1;
+	}
+
+	if (init_op_regex() < 0)
+		return -1;
+
+	if (!regexec(&regex1, *desc, 3, rm, 0)) {
+		/* REG or %rREG --> %gprREG */
+		new_len = 5;
+		new_len += (int)(rm[2].rm_eo - rm[2].rm_so);
+
+		tmp = zalloc(new_len);
+		if (!tmp)
+			return -1;
+
+		scnprintf(tmp, new_len, "%%gpr%.*s",
+			(int)(rm[2].rm_eo - rm[2].rm_so), *desc + rm[2].rm_so);
+	} else if (!regexec(&regex2, *desc, 5, rm, 0)) {
+		/*
+		 * -NUM(REG) or NUM(REG) or -NUM(%rREG) or NUM(%rREG) -->
+		 * +/-NUM(%gprREG)
+		 */
+		*prefix = (rm[1].rm_so == -1) ? "+" : "-";
+
+		new_len = 7;
+		new_len += (int)(rm[2].rm_eo - rm[2].rm_so);
+		new_len += (int)(rm[4].rm_eo - rm[4].rm_so);
+
+		tmp = zalloc(new_len);
+		if (!tmp)
+			return -1;
+
+		scnprintf(tmp, new_len, "%.*s(%%gpr%.*s)",
+			(int)(rm[2].rm_eo - rm[2].rm_so), *desc + rm[2].rm_so,
+			(int)(rm[4].rm_eo - rm[4].rm_so), *desc + rm[4].rm_so);
+	} else {
+		pr_debug4("Skipping unsupported SDT argument: %s\n", *desc);
+		return 1;
+	}
+
+	free(*desc);
+	*desc = tmp;
+	return 0;
+}
--
2.9.3
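The conversion described in the comment above can be tried outside perf with plain POSIX regex calls. This is a sketch only, not the patch's code: the name `ppc_op_to_uprobe_sketch` is hypothetical, it compiles the expressions on every call instead of caching them, it folds the +/- sign into the returned string rather than returning it through a separate prefix, and it uses malloc()/snprintf() in place of perf's zalloc()/scnprintf().

```c
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: convert a powerpc SDT operand into uprobe syntax.
 * Returns a malloc()ed string, or NULL if OP is unsupported or on error.
 */
char *ppc_op_to_uprobe_sketch(const char *op)
{
	regex_t reg_re, mem_re;
	regmatch_t rm[5];
	char *out = NULL;
	size_t len;

	/* REG or %rREG, REG in 0..31 */
	if (regcomp(&reg_re, "^(%r)?([1-2]?[0-9]|3[0-1])$", REG_EXTENDED))
		return NULL;
	/* NUM(REG), -NUM(REG), NUM(%rREG) or -NUM(%rREG) */
	if (regcomp(&mem_re, "^(-)?([0-9]+)\\((%r)?([1-2]?[0-9]|3[0-1])\\)$",
		    REG_EXTENDED)) {
		regfree(&reg_re);
		return NULL;
	}

	if (!regexec(&reg_re, op, 3, rm, 0)) {
		int reg_len = (int)(rm[2].rm_eo - rm[2].rm_so);

		len = 5 + reg_len;	/* "%gpr" + REG + '\0' */
		out = malloc(len);
		if (out)
			snprintf(out, len, "%%gpr%.*s", reg_len, op + rm[2].rm_so);
	} else if (!regexec(&mem_re, op, 5, rm, 0)) {
		int num_len = (int)(rm[2].rm_eo - rm[2].rm_so);
		int reg_len = (int)(rm[4].rm_eo - rm[4].rm_so);

		len = 8 + num_len + reg_len;	/* sign + NUM + "(%gpr" + REG + ")" + '\0' */
		out = malloc(len);
		if (out)
			snprintf(out, len, "%c%.*s(%%gpr%.*s)",
				 rm[1].rm_so == -1 ? '+' : '-',
				 num_len, op + rm[2].rm_so,
				 reg_len, op + rm[4].rm_so);
	}

	/* Constants such as "i0" match neither expression and fall through as NULL */
	regfree(&reg_re);
	regfree(&mem_re);
	return out;
}
```

The caller owns the returned string and must free() it. Compiling the expressions once, as the patch does with a static flag, avoids repeated regcomp() cost when many arguments are parsed.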
[PATCH 2/5] perf/sdt/x86: Add renaming logic for rNN and other registers
'perf probe' is failing for sdt markers whose arguments have rNN (with
postfix b/w/d), %rsp, %esp, %sil etc. registers. Add renaming logic for
these registers.

Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com>
---
 tools/perf/arch/x86/util/perf_regs.c | 44 ++--
 1 file changed, 32 insertions(+), 12 deletions(-)

diff --git a/tools/perf/arch/x86/util/perf_regs.c b/tools/perf/arch/x86/util/perf_regs.c
index 09a7f55..d8a8dcf 100644
--- a/tools/perf/arch/x86/util/perf_regs.c
+++ b/tools/perf/arch/x86/util/perf_regs.c
@@ -48,10 +48,42 @@ static const struct sdt_name_reg sdt_reg_renamings[] = {
 	SDT_NAME_REG(rdx, dx),
 	SDT_NAME_REG(esi, si),
 	SDT_NAME_REG(rsi, si),
+	SDT_NAME_REG(sil, si),
 	SDT_NAME_REG(edi, di),
 	SDT_NAME_REG(rdi, di),
+	SDT_NAME_REG(dil, di),
 	SDT_NAME_REG(ebp, bp),
 	SDT_NAME_REG(rbp, bp),
+	SDT_NAME_REG(bpl, bp),
+	SDT_NAME_REG(rsp, sp),
+	SDT_NAME_REG(esp, sp),
+	SDT_NAME_REG(spl, sp),
+
+	/* rNN registers */
+	SDT_NAME_REG(r8b, r8),
+	SDT_NAME_REG(r8w, r8),
+	SDT_NAME_REG(r8d, r8),
+	SDT_NAME_REG(r9b, r9),
+	SDT_NAME_REG(r9w, r9),
+	SDT_NAME_REG(r9d, r9),
+	SDT_NAME_REG(r10b, r10),
+	SDT_NAME_REG(r10w, r10),
+	SDT_NAME_REG(r10d, r10),
+	SDT_NAME_REG(r11b, r11),
+	SDT_NAME_REG(r11w, r11),
+	SDT_NAME_REG(r11d, r11),
+	SDT_NAME_REG(r12b, r12),
+	SDT_NAME_REG(r12w, r12),
+	SDT_NAME_REG(r12d, r12),
+	SDT_NAME_REG(r13b, r13),
+	SDT_NAME_REG(r13w, r13),
+	SDT_NAME_REG(r13d, r13),
+	SDT_NAME_REG(r14b, r14),
+	SDT_NAME_REG(r14w, r14),
+	SDT_NAME_REG(r14d, r14),
+	SDT_NAME_REG(r15b, r15),
+	SDT_NAME_REG(r15w, r15),
+	SDT_NAME_REG(r15d, r15),
 	SDT_NAME_REG_END,
 };

@@ -88,18 +120,6 @@ int sdt_rename_register(char **pdesc, char *old_name)

 	/* Copy the chars after the register name (if need be) */
 	offset = prefix_len + sdt_len;
-	if (offset < old_desc_len) {
-		/*
-		 * The orginal register name can be suffixed by 'b',
-		 * 'w' or 'd' to indicate its size; so, we need to
-		 * skip this char if we met one.
-		 */
-		char sfx = old_desc[offset];
-
-		if (sfx == 'b' || sfx == 'w' || sfx == 'd')
-			offset++;
-	}
-
 	if (offset < old_desc_len)
 		memcpy(new_desc + prefix_len + uprobe_len,
 			old_desc + offset, old_desc_len - offset);
--
2.9.3
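The table-driven renaming can be illustrated with a stripped-down lookup. This is a sketch under assumptions: the names `name_reg_sketch`, `renamings_sketch` and `sdt_reg_lookup_sketch` are hypothetical, only a handful of entries from the patch are reproduced, and the lookup matches a whole register name exactly, whereas perf's sdt_rename_register() matches a prefix of the argument descriptor and rewrites it in place.

```c
#include <stddef.h>
#include <string.h>

struct name_reg_sketch {
	const char *sdt_name;		/* name as emitted by gcc/gas */
	const char *uprobe_name;	/* name accepted by uprobe_events */
};

/* A few entries only; the patch above carries the full table. */
static const struct name_reg_sketch renamings_sketch[] = {
	{ "%rsp",  "%sp"  },
	{ "%esp",  "%sp"  },
	{ "%sil",  "%si"  },
	{ "%r9d",  "%r9"  },
	{ "%r15b", "%r15" },
	{ NULL, NULL },
};

/* Return the uprobe name for an SDT register name, or NULL if no entry exists. */
const char *sdt_reg_lookup_sketch(const char *sdt_name)
{
	const struct name_reg_sketch *r;

	for (r = renamings_sketch; r->sdt_name; r++) {
		if (!strcmp(sdt_name, r->sdt_name))
			return r->uprobe_name;
	}
	return NULL;
}
```

Listing each sized variant explicitly, as the patch does, replaces the removed code that skipped a trailing 'b', 'w' or 'd' after any renamed register.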
[PATCH 1/5] perf/sdt: Show proper hint
All events from 'perf list', except SDT events, can be directly recorded
with 'perf record'. But the flow is a little different for SDT events:
the user has to probe on SDT events before recording them. Perf shows a
misleading message when the user tries to record an SDT event without
probing it. Show a proper hint there.

Before patch:

  $ perf record -a -e sdt_glib:idle__add
  event syntax error: 'sdt_glib:idle__add'
                       \___ unknown tracepoint

  Error: File /sys/kernel/debug/tracing/events/sdt_glib/idle__add ...
  Hint: Perhaps this kernel misses some CONFIG_ setting to enable...
  ...

After patch:

  $ perf record -e sdt_glib:main__after_check
  event syntax error: 'sdt_glib:idle__add'
                       \___ unknown tracepoint

  Error: File /sys/kernel/debug/tracing/events/sdt_glib/idle__add ...
  Hint: SDT event has to be probed before recording it.

Suggested-by: Ingo Molnar <mi...@redhat.com>
Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com>
---
 tools/lib/api/fs/tracing_path.c | 16
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/tools/lib/api/fs/tracing_path.c b/tools/lib/api/fs/tracing_path.c
index 251b7c3..a0e85df 100644
--- a/tools/lib/api/fs/tracing_path.c
+++ b/tools/lib/api/fs/tracing_path.c
@@ -99,10 +99,18 @@ static int strerror_open(int err, char *buf, size_t size, const char *filename)
 		 * - jirka
 		 */
 		if (debugfs__configured() || tracefs__configured()) {
-			snprintf(buf, size,
-				 "Error:\tFile %s/%s not found.\n"
-				 "Hint:\tPerhaps this kernel misses some CONFIG_ setting to enable this feature?.\n",
-				 tracing_events_path, filename);
+			/* sdt markers */
+			if (!strncmp(filename, "sdt_", 4)) {
+				snprintf(buf, size,
+					"Error:\tFile %s/%s not found.\n"
+					"Hint:\tSDT event has to be probed before recording it.\n",
+					tracing_events_path, filename);
+			} else {
+				snprintf(buf, size,
+					 "Error:\tFile %s/%s not found.\n"
+					 "Hint:\tPerhaps this kernel misses some CONFIG_ setting to enable this feature?.\n",
+					 tracing_events_path, filename);
+			}
 			break;
 		}
 		snprintf(buf, size, "%s",
--
2.9.3
[PATCH 5/5] perf/probe: Change MAX_CMDLEN
There are many SDT markers on powerpc whose uprobe definition goes
beyond the current MAX_CMDLEN, especially when the target filename is
long and the sdt marker has a long list of arguments. For example, the
definition of sdt marker

  method__compile__end: 8@17 8@9 8@10 -4@8 8@7 -4@6 8@5 -4@4 1@37(28)

from file

  /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.91-2.b14.fc22.ppc64/jre/lib/\
  ppc64/server/libjvm.so

is

  p:sdt_hotspot/method__compile__end /usr/lib/jvm/java-1.8.0-openjdk-\
  1.8.0.91-2.b14.fc22.ppc64/jre/lib/ppc64/server/libjvm.so:0x4c4e00\
  arg1=%gpr17:u64 arg2=%gpr9:u64 arg3=%gpr10:u64 arg4=%gpr8:s32\
  arg5=%gpr7:u64 arg6=%gpr6:s32 arg7=%gpr5:u64 arg8=%gpr4:s32\
  arg9=+37(%gpr28):u8

Perf probe fails with a seg fault for such markers. As the uprobe_events
file accepts definitions beyond 256 characters, increase the value of
MAX_CMDLEN to 512.

Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com>
---
 tools/perf/util/probe-event.c | 1 -
 tools/perf/util/probe-file.c  | 2 +-
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 2c1bca2..5f3256f 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -47,7 +47,6 @@
 #include "probe-file.h"
 #include "session.h"

-#define MAX_CMDLEN 256
 #define PERFPROBE_GROUP "probe"

 bool probe_event_dry_run;	/* Dry run flag */
diff --git a/tools/perf/util/probe-file.c b/tools/perf/util/probe-file.c
index 38eca3c..1580e26 100644
--- a/tools/perf/util/probe-file.c
+++ b/tools/perf/util/probe-file.c
@@ -29,7 +29,7 @@
 #include "session.h"
 #include "perf_regs.h"

-#define MAX_CMDLEN 256
+#define MAX_CMDLEN 512

 static void print_open_warning(int err, bool uprobe)
 {
--
2.9.3
[PATCH v2] perf/sdt: Show proper hint
All events from 'perf list', except SDT events, can be directly recorded
with 'perf record'. But the flow is a little different for SDT events:
the probe point for an SDT event needs to be created using 'perf probe'
before recording it using 'perf record'. Perf shows a misleading hint
when the user tries to record an SDT event without creating a probe
point. Show a proper hint there.

Before patch:

  $ perf record -a -e sdt_glib:idle__add
  event syntax error: 'sdt_glib:idle__add'
                       \___ unknown tracepoint

  Error: File /sys/kernel/debug/tracing/events/sdt_glib/idle__add not found.
  Hint: Perhaps this kernel misses some CONFIG_ setting to enable this feature?.
  ...

After patch:

  $ perf record -a -e sdt_glib:idle__add
  event syntax error: 'sdt_glib:idle__add'
                       \___ unknown tracepoint

  Error: File /sys/kernel/debug/tracing/events/sdt_glib/idle__add not found.
  Hint: SDT event cannot be directly recorded on. Please use 'perf probe sdt_glib:idle__add' before recording it.
  ...

  $ perf probe sdt_glib:idle__add
  Added new event:
    sdt_glib:idle__add (on %idle__add in /usr/lib64/libglib-2.0.so.0.5000.2)

  You can now use it in all perf tools, such as:

      perf record -e sdt_glib:idle__add -aR sleep 1

  $ perf record -a -e sdt_glib:idle__add
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.175 MB perf.data ]

Suggested-by: Ingo Molnar <mi...@redhat.com>
Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com>
---
Changes in v2:
  - More precise hint

 tools/lib/api/fs/tracing_path.c | 31 +--
 1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/tools/lib/api/fs/tracing_path.c b/tools/lib/api/fs/tracing_path.c
index 251b7c3..aaafc99 100644
--- a/tools/lib/api/fs/tracing_path.c
+++ b/tools/lib/api/fs/tracing_path.c
@@ -86,9 +86,13 @@ void put_tracing_file(char *file)
 	free(file);
 }

-static int strerror_open(int err, char *buf, size_t size, const char *filename)
+int tracing_path__strerror_open_tp(int err, char *buf, size_t size,
+				   const char *sys, const char *name)
 {
 	char sbuf[128];
+	char filename[PATH_MAX];
+
+	snprintf(filename, PATH_MAX, "%s/%s", sys, name ?: "*");

 	switch (err) {
 	case ENOENT:
@@ -99,10 +103,18 @@ static int strerror_open(int err, char *buf, size_t size, const char *filename)
 		 * - jirka
 		 */
 		if (debugfs__configured() || tracefs__configured()) {
-			snprintf(buf, size,
-				 "Error:\tFile %s/%s not found.\n"
-				 "Hint:\tPerhaps this kernel misses some CONFIG_ setting to enable this feature?.\n",
-				 tracing_events_path, filename);
+			/* sdt markers */
+			if (!strncmp(filename, "sdt_", 4)) {
+				snprintf(buf, size,
+					"Error:\tFile %s/%s not found.\n"
+					"Hint:\tSDT event cannot be directly recorded on. Please use 'perf probe %s:%s' before recording it.\n",
+					tracing_events_path, filename, sys, name);
+			} else {
+				snprintf(buf, size,
+					 "Error:\tFile %s/%s not found.\n"
+					 "Hint:\tPerhaps this kernel misses some CONFIG_ setting to enable this feature?.\n",
+					 tracing_events_path, filename);
+			}
 			break;
 		}
 		snprintf(buf, size, "%s",
@@ -125,12 +137,3 @@ static int strerror_open(int err, char *buf, size_t size, const char *filename)

 	return 0;
 }
-
-int tracing_path__strerror_open_tp(int err, char *buf, size_t size, const char *sys, const char *name)
-{
-	char path[PATH_MAX];
-
-	snprintf(path, PATH_MAX, "%s/%s", sys, name ?: "*");
-
-	return strerror_open(err, buf, size, path);
-}
--
2.9.3
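The hint selection in this patch reduces to a prefix test on the tracepoint system name. The following is a sketch only, with a hypothetical helper name `tp_missing_hint_sketch`; it keeps just the hint line, leaves out the debugfs/tracefs checks and the "Error:" line, and assumes that name is non-NULL.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write the hint for a missing tracepoint directory.
 * sys and name are the two halves of "sys:name", e.g. "sdt_glib" and
 * "idle__add". Returns what snprintf() returns.
 */
int tp_missing_hint_sketch(char *buf, size_t size,
			   const char *sys, const char *name)
{
	/* SDT events live under an "sdt_" prefixed system and need a probe first */
	if (!strncmp(sys, "sdt_", 4))
		return snprintf(buf, size,
				"Hint:\tSDT event cannot be directly recorded on. "
				"Please use 'perf probe %s:%s' before recording it.\n",
				sys, name);

	return snprintf(buf, size,
			"Hint:\tPerhaps this kernel misses some CONFIG_ setting "
			"to enable this feature?.\n");
}
```

Because the hint embeds the exact 'perf probe' command, the user can copy it from the error output, which is the "more precise hint" noted in the v2 changelog.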
Re: [PATCH v5 0/2] perf probe: add sdt probes arguments into the uprobe cmd string
On Wednesday 14 December 2016 01:06 PM, Ingo Molnar wrote: > * Alexis Berlemontwrote: > >> Hi Masami, >> >> Many thanks for your mail. >> >> Here is another patch set which tries to fix the points you mentioned: >> >> * Skip the arguments containing a constant ($123); >> * Review the code in charge of the register renaming (search for '%' >> and parse it); >> * Minor changes (print the argument in case of error, skipping, check >> the sdt arg type index); >> >> Many thanks, >> >> Alexis. >> >> Alexis Berlemont (2): >> perf sdt: add scanning of sdt probles arguments >> perf probe: add sdt probes arguments into the uprobe cmd string > I'd like to hijack this thread to report an SDT oddity - one of my boxen > reports > lots of SDT tracepoints in 'perf list': > > mem:[/len][:access] [Hardware breakpoint] > > sdt_libc:lll_lock_wait_private [SDT event] > sdt_libc:longjmp [SDT event] > sdt_libc:longjmp_target[SDT event] > sdt_libc:memory_arena_new [SDT event] > sdt_libc:memory_arena_retry[SDT event] > sdt_libc:memory_arena_reuse[SDT event] > sdt_libc:memory_arena_reuse_free_list [SDT event] > sdt_libc:memory_arena_reuse_wait [SDT event] > sdt_libc:memory_calloc_retry [SDT event] > sdt_libc:memory_heap_free [SDT event] > ... > > But none of them work: > > Error: No permissions to read > /sys/kernel/debug/tracing/events/sdt_libc/longjmp > Hint: Try 'sudo mount -o remount,mode=755 /sys/kernel/debug/tracing' > > ... > > Error: File /sys/kernel/debug/tracing/events/sdt_libc/longjmp not found. > Hint: Perhaps this kernel misses some CONFIG_ setting to enable this > feature?. > > What kind of patches are required for SDT probes to work? Hi Ingo, Works for me on my x86 Fedora 25 box. May be some permission issue? @Alexis, Planning to progress on it :) ? I would like to prepare patch for powerpc. Thanks, Ravi > Thanks, > > Ingo >
Re: [PATCH v5 0/2] perf probe: add sdt probes arguments into the uprobe cmd string
On Tuesday 24 January 2017 01:52 PM, Ingo Molnar wrote: > * Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com> wrote: > >> >> On Wednesday 14 December 2016 01:06 PM, Ingo Molnar wrote: >>> * Alexis Berlemont <alexis.berlem...@gmail.com> wrote: >>> >>>> Hi Masami, >>>> >>>> Many thanks for your mail. >>>> >>>> Here is another patch set which tries to fix the points you mentioned: >>>> >>>> * Skip the arguments containing a constant ($123); >>>> * Review the code in charge of the register renaming (search for '%' >>>> and parse it); >>>> * Minor changes (print the argument in case of error, skipping, check >>>> the sdt arg type index); >>>> >>>> Many thanks, >>>> >>>> Alexis. >>>> >>>> Alexis Berlemont (2): >>>> perf sdt: add scanning of sdt probles arguments >>>> perf probe: add sdt probes arguments into the uprobe cmd string >>> I'd like to hijack this thread to report an SDT oddity - one of my boxen >>> reports >>> lots of SDT tracepoints in 'perf list': >>> >>> mem:[/len][:access] [Hardware breakpoint] >>> >>> sdt_libc:lll_lock_wait_private [SDT event] >>> sdt_libc:longjmp [SDT event] >>> sdt_libc:longjmp_target[SDT event] >>> sdt_libc:memory_arena_new [SDT event] >>> sdt_libc:memory_arena_retry[SDT event] >>> sdt_libc:memory_arena_reuse[SDT event] >>> sdt_libc:memory_arena_reuse_free_list [SDT event] >>> sdt_libc:memory_arena_reuse_wait [SDT event] >>> sdt_libc:memory_calloc_retry [SDT event] >>> sdt_libc:memory_heap_free [SDT event] >>> ... >>> >>> But none of them work: >>> >>> Error: No permissions to read >>> /sys/kernel/debug/tracing/events/sdt_libc/longjmp >>> Hint: Try 'sudo mount -o remount,mode=755 /sys/kernel/debug/tracing' >>> >>> ... >>> >>> Error: File /sys/kernel/debug/tracing/events/sdt_libc/longjmp not found. >>> Hint: Perhaps this kernel misses some CONFIG_ setting to enable this >>> feature?. >>> >>> What kind of patches are required for SDT probes to work? >> Hi Ingo, >> >> I suppose you are trying to record SDT events without probing it. 
>> In that case, first put a probe on an event and then try to record >> it. For example, > > Well, I was mainly complaining about the misleading messages and flow of the > tooling here. Could you please improve the messages so that if I use it like > the > way I reported it results in me trying the right approach? Right, message is misleading. Will prepare a patch for this. Also it's little odd flow for sdt markers, to put a probe first and then record it while other events can be recorded directly. There was a patch by Hemant about directly recording SDT marker events. I don't see any updates on that: https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1138183.html -Ravi > Thanks, > > Ingo >
Re: [PATCH v5 2/2] perf probe: add sdt probes arguments into the uprobe cmd string
Hi Alexis,

On Wednesday 14 December 2016 05:37 AM, Alexis Berlemont wrote:
> An sdt probe can be associated with arguments but they were not passed
> to the user probe tracing interface (uprobe_events); this patch adapts
> the sdt argument descriptors according to the uprobe input format.
>
> As the uprobe parser does not support scaled address mode, perf will
> skip arguments which cannot be adapted to the uprobe format.
>
> Here are the results:
>
> $ perf buildid-cache -v --add test_sdt
> $ perf probe -x test_sdt sdt_libfoo:table_frob
> $ perf probe -x test_sdt sdt_libfoo:table_diddle
> $ perf record -e sdt_libfoo:table_frob -e sdt_libfoo:table_diddle test_sdt
> $ perf script
> test_sdt ... 666.255678: sdt_libfoo:table_frob: (4004d7) arg0=0 arg1=0
> test_sdt ... 666.255683: sdt_libfoo:table_diddle: (40051a) arg0=0 arg1=0
> test_sdt ... 666.255686: sdt_libfoo:table_frob: (4004d7) arg0=1 arg1=2
> test_sdt ... 666.255689: sdt_libfoo:table_diddle: (40051a) arg0=3 arg1=4
> test_sdt ... 666.255692: sdt_libfoo:table_frob: (4004d7) arg0=2 arg1=4
> test_sdt ... 666.255694: sdt_libfoo:table_diddle: (40051a) arg0=6 arg1=8
>
> Signed-off-by: Alexis Berlemont
> ---
>  tools/perf/arch/x86/util/perf_regs.c | 83 +
>  tools/perf/util/perf_regs.c | 6 ++
>  tools/perf/util/perf_regs.h | 6 ++
>  tools/perf/util/probe-file.c | 170 ++-
>  4 files changed, 261 insertions(+), 4 deletions(-)
>
> diff --git a/tools/perf/arch/x86/util/perf_regs.c b/tools/perf/arch/x86/util/perf_regs.c
> index c5db14f..09a7f55 100644
> --- a/tools/perf/arch/x86/util/perf_regs.c
> +++ b/tools/perf/arch/x86/util/perf_regs.c
> @@ -1,4 +1,7 @@
> +#include
> +
>  #include "../../perf.h"
> +#include "../../util/util.h"
>  #include "../../util/perf_regs.h"
>
>  const struct sample_reg sample_reg_masks[] = {
> @@ -26,3 +29,83 @@ const struct sample_reg sample_reg_masks[] = {
>  #endif
>  	SMPL_REG_END
>  };
> +
> +struct sdt_name_reg {
> +	const char *sdt_name;
> +	const char *uprobe_name;
> +};
> +#define SDT_NAME_REG(n, m) {.sdt_name = "%" #n, .uprobe_name = "%" #m}
> +#define SDT_NAME_REG_END {.sdt_name = NULL, .uprobe_name = NULL}
> +
> +static const struct sdt_name_reg sdt_reg_renamings[] = {
> +	SDT_NAME_REG(eax, ax),
> +	SDT_NAME_REG(rax, ax),
> +	SDT_NAME_REG(ebx, bx),
> +	SDT_NAME_REG(rbx, bx),
> +	SDT_NAME_REG(ecx, cx),
> +	SDT_NAME_REG(rcx, cx),
> +	SDT_NAME_REG(edx, dx),
> +	SDT_NAME_REG(rdx, dx),
> +	SDT_NAME_REG(esi, si),
> +	SDT_NAME_REG(rsi, si),
> +	SDT_NAME_REG(edi, di),
> +	SDT_NAME_REG(rdi, di),
> +	SDT_NAME_REG(ebp, bp),
> +	SDT_NAME_REG(rbp, bp),
> +	SDT_NAME_REG_END,
> +};

I see many markers that use %rsp. Such markers fail at perf probe.
Please add renaming entries for %rsp and %esp. For example:

  $ readelf -n /usr/lib64/libpython3.5m.so.1.0
  ...
  Name: function__entry
  Arguments: 8@224(%rsp) 8@232(%rsp) -4@240(%rsp) 8@%rbx

  $ sudo ./perf probe sdt_python:function__entry
  Failed to write event: Invalid argument
  Please upgrade your kernel to at least 3.14 to have access to feature +224(%rsp):u64
  Error: Failed to add events.

This code also does not handle rNN registers with a suffix ('b', 'w', 'd').
Such markers fail at perf probe as well. For example:

  $ readelf -n /usr/lib64/libperl.so.5.24.0
  ...
  Name: sub__return
  Arguments: 8@%rax 8@%r8 4@%r9d 8@%rsi

  $ sudo ./perf probe -v sdt_perl:sub__entry
  ...
  Opening /sys/kernel/debug/tracing//uprobe_events write=1
  Writing event: p:sdt_perl/sub__entry /usr/lib64/libperl.so.5.24.0:0xbb780 arg1=%ax:u64 arg2=%r8:u64 arg3=%r9d:u32 arg4=%si:u64
  Failed to write event: Invalid argument
  Error: Failed to add events. Reason: Invalid argument (Code: -22)

Can we add them like:

  /* rNN registers */
  SDT_NAME_REG(r8b, r8),
  SDT_NAME_REG(r8w, r8),
  SDT_NAME_REG(r8d, r8),
  SDT_NAME_REG(r9b, r9),
  ...
  SDT_NAME_REG(r14d, r14),
  SDT_NAME_REG(r15b, r15),
  SDT_NAME_REG(r15w, r15),
  SDT_NAME_REG(r15d, r15),

and ...

> +
> +int sdt_rename_register(char **pdesc, char *old_name)
> +{
> +	const struct sdt_name_reg *rnames = sdt_reg_renamings;
> +	char *new_desc, *old_desc = *pdesc;
> +	size_t prefix_len, sdt_len, uprobe_len, old_desc_len, offset;
> +	int ret = -1;
> +
> +	while (ret != 0 && rnames->sdt_name != NULL) {
> +		sdt_len = strlen(rnames->sdt_name);
> +		ret = strncmp(old_name, rnames->sdt_name, sdt_len);
> +		rnames += !!ret;
> +	}
> +
> +	if (rnames->sdt_name == NULL)
> +		return 0;
> +
> +	sdt_len = strlen(rnames->sdt_name);
> +	uprobe_len = strlen(rnames->uprobe_name);
> +	old_desc_len = strlen(old_desc) + 1;
> +
> +	new_desc = zalloc(old_desc_len + uprobe_len - sdt_len);
> +	if (new_desc == NULL)
> +		return -1;
> +
> +	/*
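
The two requests in this review (entries for %rsp/%esp, and rNN registers with a 'b', 'w' or 'd' suffix) could be sketched roughly as below. This is only an illustration, not the patch's code: sdt_reg_to_uprobe() and copy_name() are hypothetical names, the lookup uses an exact string match instead of the strncmp() prefix match in the quoted patch, and mapping %rsp/%esp to %sp is an assumption made by analogy with the existing %rbp/%ebp to %bp entries. The rNN registers are handled by parsing rather than by listing every entry.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

struct sdt_name_reg {
	const char *sdt_name;
	const char *uprobe_name;
};

/* Table from the quoted patch, plus the entries requested in the review. */
static const struct sdt_name_reg sdt_reg_renamings[] = {
	{ "%eax", "%ax" }, { "%rax", "%ax" },
	{ "%ebx", "%bx" }, { "%rbx", "%bx" },
	{ "%ecx", "%cx" }, { "%rcx", "%cx" },
	{ "%edx", "%dx" }, { "%rdx", "%dx" },
	{ "%esi", "%si" }, { "%rsi", "%si" },
	{ "%edi", "%di" }, { "%rdi", "%di" },
	{ "%ebp", "%bp" }, { "%rbp", "%bp" },
	/* Assumption: the stack pointer follows the same pattern as bp. */
	{ "%esp", "%sp" }, { "%rsp", "%sp" },
	{ NULL, NULL },
};

/* Copy name into out; fail if it does not fit. */
static int copy_name(char *out, size_t out_sz, const char *name)
{
	int n = snprintf(out, out_sz, "%s", name);

	return (n >= 0 && (size_t)n < out_sz) ? 0 : -1;
}

/*
 * Hypothetical helper: translate an SDT register name such as "%r9d"
 * into the name uprobe_events expects ("%r9").
 * Returns 0 on success, -1 if the register is not recognised.
 */
int sdt_reg_to_uprobe(const char *sdt_reg, char *out, size_t out_sz)
{
	const struct sdt_name_reg *r;
	const char *p;
	char name[16];
	unsigned int n = 0;

	for (r = sdt_reg_renamings; r->sdt_name != NULL; r++) {
		if (strcmp(sdt_reg, r->sdt_name) == 0)
			return copy_name(out, out_sz, r->uprobe_name);
	}

	/* %r8 .. %r15, optionally followed by exactly one of 'b', 'w', 'd'. */
	if (sdt_reg[0] != '%' || sdt_reg[1] != 'r')
		return -1;
	p = sdt_reg + 2;
	if (!isdigit((unsigned char)*p) || *p == '0')
		return -1;
	while (isdigit((unsigned char)*p) && p - sdt_reg < 4)
		n = n * 10 + (unsigned int)(*p++ - '0');
	if (n < 8 || n > 15)
		return -1;
	if (*p != '\0') {
		if ((*p != 'b' && *p != 'w' && *p != 'd') || p[1] != '\0')
			return -1;
	}

	snprintf(name, sizeof(name), "%%r%u", n);
	return copy_name(out, out_sz, name);
}
```

With this sketch, "%r9d" from the libperl marker above would be rewritten to "%r9", and "%rsp" from the libpython marker to "%sp"; anything unrecognised is reported to the caller so the argument can be skipped.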
Re: [PATCH v5 0/2] perf probe: add sdt probes arguments into the uprobe cmd string
On Wednesday 14 December 2016 01:06 PM, Ingo Molnar wrote:
> * Alexis Berlemont wrote:
>
>> Hi Masami,
>>
>> Many thanks for your mail.
>>
>> Here is another patch set which tries to fix the points you mentioned:
>>
>> * Skip the arguments containing a constant ($123);
>> * Review the code in charge of the register renaming (search for '%'
>>   and parse it);
>> * Minor changes (print the argument in case of error, skipping, check
>>   the sdt arg type index);
>>
>> Many thanks,
>>
>> Alexis.
>>
>> Alexis Berlemont (2):
>>   perf sdt: add scanning of sdt probles arguments
>>   perf probe: add sdt probes arguments into the uprobe cmd string
> I'd like to hijack this thread to report an SDT oddity - one of my boxen reports
> lots of SDT tracepoints in 'perf list':
>
>   mem:[/len][:access]                     [Hardware breakpoint]
>
>   sdt_libc:lll_lock_wait_private          [SDT event]
>   sdt_libc:longjmp                        [SDT event]
>   sdt_libc:longjmp_target                 [SDT event]
>   sdt_libc:memory_arena_new               [SDT event]
>   sdt_libc:memory_arena_retry             [SDT event]
>   sdt_libc:memory_arena_reuse             [SDT event]
>   sdt_libc:memory_arena_reuse_free_list   [SDT event]
>   sdt_libc:memory_arena_reuse_wait        [SDT event]
>   sdt_libc:memory_calloc_retry            [SDT event]
>   sdt_libc:memory_heap_free               [SDT event]
>   ...
>
> But none of them work:
>
>   Error: No permissions to read /sys/kernel/debug/tracing/events/sdt_libc/longjmp
>   Hint: Try 'sudo mount -o remount,mode=755 /sys/kernel/debug/tracing'
>
>   ...
>
>   Error: File /sys/kernel/debug/tracing/events/sdt_libc/longjmp not found.
>   Hint: Perhaps this kernel misses some CONFIG_ setting to enable this feature?.
>
> What kind of patches are required for SDT probes to work?

Hi Ingo,

I suppose you are trying to record SDT events without probing them first.
In that case, first put a probe on an event and then try to record it.
For example,

  $ ./perf list | grep sdt_
    sdt_glib:main__after_prepare       [SDT event]
    sdt_glib:main__before_dispatch     [SDT event]
    ...

  $ ./perf record -a -e sdt_glib:main__after_prepare
    event syntax error: 'sdt_glib:main__after_prepare'
                         \___ unknown tracepoint
    Error: File /sys/kernel/debug/tracing/events/sdt_glib/main__after_prepare not found.
    Hint: Perhaps this kernel misses some CONFIG_ setting to enable this feature?.
    ...

  $ ./perf probe sdt_glib:main__after_prepare
    Added new events:
      sdt_glib:main__after_prepare (on %main__after_prepare in /usr/lib64/libglib-2.0.so.0.5000.2)
      sdt_glib:main__after_prepare_1 (on %main__after_prepare in /usr/lib64/libglib-2.0.so.0.5000.2)

    You can now use it in all perf tools, such as:

      perf record -e sdt_glib:main__after_prepare_1 -aR sleep 1

  $ ./perf record -a -e sdt_glib:main__after_prepare
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.191 MB perf.data ]

-Ravi
Re: [RFC] perf/sdt: Directly record SDT event with 'perf record'
Thanks Ingo,

On Monday 20 February 2017 02:12 PM, Ingo Molnar wrote:
> * Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com> wrote:
>
>> What should be the behavior of the tool? Should it record only one
>> 'sdt_libpthread:mutex_entry' which exists in uprobe_events? Or it
>> should record all the SDT events from libpthread? We can choose either
>> of two but both the cases are ambiguous.
> They are not ambiguous really if coded right: just pick one of the outcomes and
> maybe print a warning to inform the user that something weird is going on because
> not all markers are enabled?
>
> As a user I'd expect 'perf record' to enable all markers and print a warning that
> the markers were in a partial state. This would result in consistent behaviour.

Yes, makes sense.

> Does it make sense to only enable some of the markers that alias on the same name?
> If not then maybe disallow that in perf probe - or change perf probe to do the
> same thing as perf record.

'perf probe' is doing that correctly. It fetches all events with the given
name from the probe cache and creates entries for them in uprobe_events.
The problem is the 2-step process of adding probes and then recording,
which allows users to select individual markers to record on.

>
> I.e. this is IMHO an artificial problem that users should not be exposed to and
> which can be solved by tooling.
>
> In particular if it's possible to enable only a part of the markers then perf
> record not continuing would be a failure mode: if for example a previous perf
> record session segfaulted (or ran out of RAM or was killed in the wrong moment or
> whatever) then it would not be possible to (easily) clean up the mess.

Agreed. We need to make this more robust.

>
>> Not allowing 'perf probe' for SDT event will solve all such issues.
>> Also it will make user interface simple and consistent. Other current
>> tooling (systemtap, for instance) also do not allow probing individual
>> markers when there are multiple markers with the same name.
> In any case if others agree with your change in UI flow too then it's fine by me,
> but please make it robust, i.e. if perf record sees partially enabled probes it
> should still continue.

@Masami, can you please provide your thoughts as well?

Thanks,
Ravi
[PATCH v3 2/2] perf/sdt: Directly record SDT events with 'perf record'
From: Hemant Kumar <hem...@linux.vnet.ibm.com>

Add support for directly recording SDT events which are present in
the probe cache. Without this patch, we could probe into SDT events
using 'perf probe' and 'perf record'. With this patch, we can probe
the SDT events directly using 'perf record'.

For example:

  $ perf list sdt
    sdt_libpthread:mutex_entry      [SDT event]
    sdt_libc:setjmp                 [SDT event]
    ...

  $ perf record -a -e sdt_libc:setjmp
    ^C[ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.286 MB perf.data (1065 samples) ]

  $ perf script
         bash   803 [002]  6492.190311: sdt_libc:setjmp: (7f1d503b56a1)
        login   488 [001]  6496.791583: sdt_libc:setjmp: (7ff3013d56a1)
      fprintd 11038 [003]  6496.808032: sdt_libc:setjmp: (7fdedf5936a1)

Recording on SDT events with same provider and marker names is also
supported:

  $ readelf -n /usr/lib64/libpthread-2.24.so | grep -A2 Provider
      Provider: libpthread
      Name: mutex_entry
      Location: 0x9ddb, Base: 0x000139cc, ...
    --
      Provider: libpthread
      Name: mutex_entry
      Location: 0xbcbb, Base: 0x000139cc, ...

  $ perf record -a -e sdt_libpthread:mutex_entry
    Info: Recording on 2 occurrences of sdt_libpthread:mutex_entry
    ^C[ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.197 MB perf.data (31 samples) ]

  $ perf script
      in:imjournal  442 [000]  6625.701833: sdt_libpthread:mutex_entry: (7fb1a1940ddb)
     rs:main Q:Reg  443 [001]  6625.701889: sdt_libpthread:mutex_entry_1: (7fb1a1942cbb)

After invoking 'perf record', behind the scenes, it checks whether the
event specified is an SDT event by using the flag '%' or string 'sdt_'.
After that, it first checks whether an event already exists with that
*name* in the uprobe_events file. If found, it records that particular
event. Otherwise, it does a lookup of the probe cache to find the SDT
event. If it is not present, it throws an error. If found, it again
tries to find existing events in the uprobe_events file, but this time
it uses *address* and *filename* for comparison. Finally it writes new
events into the uprobe_events file and starts recording. It also
maintains a list of the event names that were written to the
uprobe_events file for this session. After finishing the record
session, it removes the events from the uprobe_events file using the
maintained name list.

Signed-off-by: Hemant Kumar <hem...@linux.vnet.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com>
---
 tools/lib/api/fs/tracing_path.c | 17 +-
 tools/perf/builtin-probe.c | 21 ++-
 tools/perf/builtin-record.c | 23 +++
 tools/perf/perf.h | 2 +
 tools/perf/util/parse-events.c | 56 +-
 tools/perf/util/parse-events.h | 2 +
 tools/perf/util/probe-event.c | 44 -
 tools/perf/util/probe-event.h | 4 +
 tools/perf/util/probe-file.c | 376
 tools/perf/util/probe-file.h | 27 +++
 10 files changed, 552 insertions(+), 20 deletions(-)

diff --git a/tools/lib/api/fs/tracing_path.c b/tools/lib/api/fs/tracing_path.c
index 3e606b9..fa52e67 100644
--- a/tools/lib/api/fs/tracing_path.c
+++ b/tools/lib/api/fs/tracing_path.c
@@ -103,19 +103,10 @@ int tracing_path__strerror_open_tp(int err, char *buf, size_t size,
 		 * - jirka
 		 */
 		if (debugfs__configured() || tracefs__configured()) {
-			/* sdt markers */
-			if (!strncmp(filename, "sdt_", 4)) {
-				snprintf(buf, size,
-					"Error:\tFile %s/%s not found.\n"
-					"Hint:\tSDT event cannot be directly recorded on.\n"
-					"\tPlease first use 'perf probe %s:%s' before recording it.\n",
-					tracing_events_path, filename, sys, name);
-			} else {
-				snprintf(buf, size,
-					 "Error:\tFile %s/%s not found.\n"
-					 "Hint:\tPerhaps this kernel misses some CONFIG_ setting to enable this feature?.\n",
-					 tracing_events_path, filename);
-			}
+			snprintf(buf, size,
+				 "Error:\tFile %s/%s not found.\n"
+				 "Hint:\tPerhaps this kernel misses some CONFIG_ setting to enable this feature?.\n",
+				 tracing_events_path, filename);
 			break;
 		}
 		snprintf(buf, size, "%s",
diff --git a/tools/perf/builtin-pro
[PATCH v3 1/2] perf/sdt: Introduce util func is_sdt_event()
No functional changes.

Signed-off-by: Ravi Bangoria <ravi.bango...@linux.vnet.ibm.com>
---
 tools/perf/util/probe-event.c | 9 +
 tools/perf/util/util.c | 12
 tools/perf/util/util.h | 2 ++
 3 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 28fb62c..2b1409f 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -1339,14 +1339,7 @@ static int parse_perf_probe_point(char *arg, struct perf_probe_event *pev)
 	if (!arg)
 		return -EINVAL;
 
-	/*
-	 * If the probe point starts with '%',
-	 * or starts with "sdt_" and has a ':' but no '=',
-	 * then it should be a SDT/cached probe point.
-	 */
-	if (arg[0] == '%' ||
-	    (!strncmp(arg, "sdt_", 4) &&
-	     !!strchr(arg, ':') && !strchr(arg, '='))) {
+	if (is_sdt_event(arg)) {
 		pev->sdt = true;
 		if (arg[0] == '%')
 			arg++;
diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index d8b45ce..b827428 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -802,3 +802,15 @@ int unit_number__scnprintf(char *buf, size_t size, u64 n)
 
 	return scnprintf(buf, size, "%" PRIu64 "%c", n, unit[i]);
 }
+
+/*
+ * If the probe point starts with '%',
+ * or starts with "sdt_" and has a ':' but no '=',
+ * then it should be a SDT/cached probe point.
+ */
+bool is_sdt_event(char *str)
+{
+	return (str[0] == '%' ||
+		(!strncmp(str, "sdt_", 4) &&
+		 !!strchr(str, ':') && !strchr(str, '=')));
+}
diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h
index c74708d..ee599dc 100644
--- a/tools/perf/util/util.h
+++ b/tools/perf/util/util.h
@@ -364,4 +364,6 @@ int is_printable_array(char *p, unsigned int len);
 int timestamp__scnprintf_usec(u64 timestamp, char *buf, size_t sz);
 
 int unit_number__scnprintf(char *buf, size_t size, u64 n);
+
+bool is_sdt_event(char *str);
 #endif /* GIT_COMPAT_UTIL_H */
-- 
2.9.3
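
For readers who want to try the rule outside the perf tree, here is a standalone sketch of the same check. The logic follows the is_sdt_event() added by this patch; the only deliberate difference is that the parameter is const-qualified, which is a choice made for this illustration and not something the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Same rule as in the patch: a probe point is treated as an SDT/cached
 * probe point if it starts with '%', or if it starts with "sdt_" and
 * contains a ':' but no '='.
 */
bool is_sdt_event(const char *str)
{
	return str[0] == '%' ||
	       (!strncmp(str, "sdt_", 4) &&
		strchr(str, ':') != NULL && strchr(str, '=') == NULL);
}
```

Under this rule "sdt_libc:setjmp" and "%mutex_entry" are SDT events, while "sched:sched_switch" (no "sdt_" prefix) and "sdt_foo:bar=baz" (contains '=') are not.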
[PATCH v3 0/2] perf/sdt: Directly record SDT events with 'perf record'
with 0xbcbb address, but also it has created new entry for event with
0x9ddb address. It also maintains a list of entries created for a
particular record session, and uses that list to remove the entries at
the end of the session.

Finally, if the tool somehow fails to clean events from uprobe_events at
the end of the session, the user has to clean the events manually with
'perf probe -d'. perf will give a warning in such a case. For example:

  $ perf record -a -e sdt_libpthread:mutex_entry
    Warning: Recording on 2 occurrences of sdt_libpthread:mutex_entry

    /** Fails with segfault **/

  $ cat /sys/kernel/debug/tracing/uprobe_events
    p:sdt_libpthread/mutex_entry /usr/lib64/libpthread-2.24.so:0x9ddb
    p:sdt_libpthread/mutex_entry_1 /usr/lib64/libpthread-2.24.so:0xbcbb

The next time the user tries to record, it will show a warning:

  $ perf record -a -e sdt_libpthread:mutex_entry
    Matching event(s) from uprobe_events:
       sdt_libpthread:mutex_entry  0x9ddb@/usr/lib64/libpthread-2.24.so
    Use 'perf probe -d ' to delete event(s).
    Warning: Found 2 events from probe-cache with name 'sdt_libpthread:mutex_entry'.
             Since probe point already exists with this name, recording only 1 event.
    Hint: Please use 'perf probe -d sdt_libpthread:mutex_entry*' to allow record on all events.

But there is no such warning for 'sdt_libpthread:mutex_entry_1':

  $ perf record -a -e sdt_libpthread:mutex_entry_1
    Matching event(s) from uprobe_events:
       sdt_libpthread:mutex_entry_1  0xbcbb@/usr/lib64/libpthread-2.24.so
    Use 'perf probe -d ' to delete event(s).

[1] https://lkml.org/lkml/2017/2/7/59
[2] https://lkml.org/lkml/2016/5/3/810
[3] https://lkml.org/lkml/2016/5/2/689

Hemant Kumar (1):
  perf/sdt: Directly record SDT events with 'perf record'

Ravi Bangoria (1):
  perf/sdt: Introduce util func is_sdt_event()

 tools/lib/api/fs/tracing_path.c | 17 +-
 tools/perf/builtin-probe.c | 21 ++-
 tools/perf/builtin-record.c | 23 +++
 tools/perf/perf.h | 2 +
 tools/perf/util/parse-events.c | 56 +-
 tools/perf/util/parse-events.h | 2 +
 tools/perf/util/probe-event.c | 53 +-
 tools/perf/util/probe-event.h | 4 +
 tools/perf/util/probe-file.c | 376
 tools/perf/util/probe-file.h | 27 +++
 tools/perf/util/util.c | 12 ++
 tools/perf/util/util.h | 2 +
 12 files changed, 567 insertions(+), 28 deletions(-)

-- 
2.9.3
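
To make the matching of uprobe_events entries more concrete, here is a small sketch that parses a line in the format shown by the 'cat uprobe_events' output in this cover letter and compares it either by name or by file and address, the two comparisons the patch description mentions. It is not the code from the series: struct uprobe_entry, parse_uprobe_line(), matches_name() and matches_location() are hypothetical names, and the parser assumes a plain "p:GROUP/EVENT PATH:0xADDR" line whose path contains no ':' or spaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory form of one uprobe_events line. */
struct uprobe_entry {
	char group[64];
	char event[64];
	char path[256];
	unsigned long long addr;
};

/*
 * Parse "p:GROUP/EVENT PATH:0xADDR". Anything after the address
 * (for example probe arguments) is ignored.
 * Returns 0 on success, -1 if the line does not have this shape.
 */
int parse_uprobe_line(const char *line, struct uprobe_entry *e)
{
	int n = sscanf(line, "p:%63[^/ ]/%63s %255[^: ]:%llx",
		       e->group, e->event, e->path, &e->addr);

	return n == 4 ? 0 : -1;
}

/* First stage: does an entry with this group and event name exist? */
bool matches_name(const struct uprobe_entry *e,
		  const char *group, const char *event)
{
	return strcmp(e->group, group) == 0 && strcmp(e->event, event) == 0;
}

/* Second stage: does an entry point at the same file and address? */
bool matches_location(const struct uprobe_entry *e,
		      const char *path, unsigned long long addr)
{
	return strcmp(e->path, path) == 0 && e->addr == addr;
}
```

In the leftover-entry example above, a name lookup for sdt_libpthread:mutex_entry would hit only the 0x9ddb entry, while a location lookup for 0xbcbb in libpthread-2.24.so would hit mutex_entry_1.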
Re: [PATCH 4/4] perf annotate: Introduce source_code to collect actual code
Hi Taeung,

On Wednesday 22 February 2017 03:38 PM, Taeung Song wrote:
> +	INIT_LIST_HEAD(>src->code);
> +
> +	while (!feof(file)) {
> +		int nr;
> +		char *c, *parsed_line;
> +		struct source_code *code;
> +
> +		if (getline(, , file) < 0) {
> +			symbol__free_source_code(sym);
> +			break;
> +		}
> +
> +		if (++nr < first_linenr)

Please initialize variable nr. I got a compilation error:

  util/annotate.c: In function ‘symbol__tty_annotate’:
  util/annotate.c:1674:6: error: ‘nr’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
     if (++nr < first_linenr)
        ^
  util/annotate.c:1665:7: note: ‘nr’ was declared here
     int nr;
         ^

Ravi
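
A minimal sketch of the fix being asked for, outside the perf tree: the counter is declared once before the loop and initialized to 0, so the pre-increment in the skip test reads a defined value. count_lines_from() is a hypothetical helper written for this illustration, and looping on the getline() return value instead of feof() is a choice made here, not something taken from the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Hypothetical helper: count the lines of 'file' whose 1-based line
 * number is at least first_linenr, skipping the earlier ones.
 */
int count_lines_from(FILE *file, int first_linenr)
{
	char *line = NULL;
	size_t len = 0;
	int nr = 0;	/* initialized: ++nr below must read a defined value */
	int kept = 0;

	while (getline(&line, &len, file) >= 0) {
		if (++nr < first_linenr)
			continue;	/* still before the first wanted line */
		kept++;
	}

	free(line);	/* getline() allocates the buffer, the caller frees it */
	return kept;
}
```

Declaring nr inside the loop body, as in the quoted hunk, would also reset it on every iteration even when initialized, so the sketch keeps it outside the loop.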