Re: [PATCH 3/3] perf script: Fix crash because of missing feat_op[] entry
Hi Arnaldo,

On 06/20/2018 07:19 PM, Arnaldo Carvalho de Melo wrote:
> Em Wed, Jun 20, 2018 at 07:00:30PM +0530, Ravi Bangoria escreveu:
>> perf_event__process_feature() tries to access feat_ops[feat].process
>> which is not defined for feat = HEADER_LAST_FEATURE and thus perf is
>> crashing. Add a dummy entry for HEADER_LAST_FEATURE in feat_ops.
>
> Humm, first impression is that we should check for HEADER_LAST_FEATURE
> and not try to access that array slot, as it is just a marker, not an
> actual feature.

Sure. Let me try to handle it inside perf_event__process_feature() itself.
Maybe something like below:

diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 540cd2dcd3e7..4dc251d3b372 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -3461,6 +3461,10 @@ int perf_event__process_feature(struct perf_tool *tool,
 		return -1;
 	}
 
+	/* HEADER_LAST_FEATURE is just a marker. Ignore it. */
+	if (feat == HEADER_LAST_FEATURE)
+		return 0;
+
 	if (!feat_ops[feat].process)
 		return 0;

Thanks,
Ravi
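The marker-vs-feature distinction Arnaldo points out — a *_LAST_* enumerator that sizes the ops array but owns no slot of its own — can be sketched in a self-contained miniature. All names below (FEAT_*, feat_ops, process_feature) are hypothetical stand-ins, not the real perf structures:

```c
#include <assert.h>

/* A *_LAST_* sentinel sizes the array; it must never be used as an index. */
enum feature_id {
	FEAT_TRACING_DATA,
	FEAT_BUILD_ID,
	FEAT_LAST_FEATURE,	/* marker only, not a real feature */
};

struct feat_ops_entry {
	int (*process)(void);
};

static int process_tracing_data(void) { return 1; }
static int process_build_id(void)     { return 2; }

/* Valid slots are [0, FEAT_LAST_FEATURE); the sentinel has no entry. */
static const struct feat_ops_entry feat_ops[FEAT_LAST_FEATURE] = {
	[FEAT_TRACING_DATA] = { .process = process_tracing_data },
	[FEAT_BUILD_ID]     = { .process = process_build_id },
};

static int process_feature(enum feature_id feat)
{
	/* The sentinel is just an end marker: ignore it instead of indexing. */
	if (feat >= FEAT_LAST_FEATURE)
		return 0;
	if (!feat_ops[feat].process)
		return 0;
	return feat_ops[feat].process();
}
```

Indexing `feat_ops[FEAT_LAST_FEATURE]` would read one past the end of the array, which is the out-of-bounds access behind the crash being discussed.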
Re: [PATCH 2/3] perf script: Fix crash because of missing evsel->priv
Hi Arnaldo,

On 06/20/2018 07:22 PM, Arnaldo Carvalho de Melo wrote:
> Em Wed, Jun 20, 2018 at 07:00:29PM +0530, Ravi Bangoria escreveu:
>> perf script in piped mode is crashing because evsel->priv is not
>> set properly. Fix it.
>>
>> Before:
>>   # ./perf record -o - -- ls | ./perf script
>>   Segmentation fault (core dumped)
>>
>> After:
>>   # ./perf record -o - -- ls | ./perf script
>>   ls 2282 1031.731974: 25 cpu-clock:uhH: 7effe4b3d29e
>>   ls 2282 1031.73: 25 cpu-clock:uhH: 7effe4b3a650
>>
>> Signed-off-by: Ravi Bangoria
>> Fixes: a14390fde64e ("perf script: Allow creating per-event dump files")
>
> Humm, this cset doesn't set evsel->priv to a 'struct perf_evsel_script'
> object, will check which one does to continue review.

It's not about setting evsel->priv to a 'struct perf_evsel_script'; it's
about setting the proper output stream, either a file or stdout. Before
'struct perf_evsel_script' was introduced, we were setting evsel->priv
to the output stream directly. So I think this commit failed to set
evsel->priv properly in the piped case. For example, the patch below,
applied _directly_ on top of a14390fde64e, fixes the issue:

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index fb5e49b3bc44..66cc4b29bf4d 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -1645,6 +1645,9 @@ static int process_attr(struct perf_tool *tool, union perf_event *event,
 	evlist = *pevlist;
 	evsel = perf_evlist__last(*pevlist);
 
+	if (!evsel->priv)
+		evsel->priv = stdout;
+
 	if (evsel->attr.type >= PERF_TYPE_MAX &&
 	    evsel->attr.type != PERF_TYPE_SYNTH)
 		return 0;

To me, this commit seems to be the bad one. Please let me know if that
is not the case. I'll change the last patch as you suggested and post
v2 soon.

Thanks,
Ravi
[PATCH v2 2/3] perf script: Fix crash because of missing evsel->priv
perf script in piped mode is crashing because evsel->priv is not set
properly. Fix it.

Before:
  # ./perf record -o - -- ls | ./perf script
  Segmentation fault (core dumped)

After:
  # ./perf record -o - -- ls | ./perf script
  ls 2282 1031.731974: 25 cpu-clock:uhH: 7effe4b3d29e
  ls 2282 1031.73: 25 cpu-clock:uhH: 7effe4b3a650

Signed-off-by: Ravi Bangoria
Fixes: a14390fde64e ("perf script: Allow creating per-event dump files")
---
 tools/perf/builtin-script.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index f3fefbcc4503..ad2ac1300420 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -1834,6 +1834,7 @@ static int process_attr(struct perf_tool *tool, union perf_event *event,
 	struct perf_evlist *evlist;
 	struct perf_evsel *evsel, *pos;
 	int err;
+	static struct perf_evsel_script *es;
 
 	err = perf_event__process_attr(tool, event, pevlist);
 	if (err)
@@ -1842,6 +1843,19 @@ static int process_attr(struct perf_tool *tool, union perf_event *event,
 	evlist = *pevlist;
 	evsel = perf_evlist__last(*pevlist);
 
+	if (!evsel->priv) {
+		if (scr->per_event_dump) {
+			evsel->priv = perf_evsel_script__new(evsel,
+						scr->session->data);
+		} else {
+			es = zalloc(sizeof(*es));
+			if (!es)
+				return -ENOMEM;
+			es->fp = stdout;
+			evsel->priv = es;
+		}
+	}
+
 	if (evsel->attr.type >= PERF_TYPE_MAX &&
 	    evsel->attr.type != PERF_TYPE_SYNTH)
 		return 0;
-- 
2.14.4
[PATCH v2 3/3] perf script/annotate: Fix crash caused by accessing feat_ops[HEADER_LAST_FEATURE]
perf_event__process_feature() accesses feat_ops[HEADER_LAST_FEATURE]
which is not defined and thus perf is crashing. HEADER_LAST_FEATURE is
used as an end marker for perf report, but it's unused for perf
script/annotate. Ignore HEADER_LAST_FEATURE for perf script/annotate.

Before:
  # ./perf record -o - ls | ./perf script
  Segmentation fault (core dumped)

After:
  # ./perf record -o - ls | ./perf script
  ls 7031 4392.099856: 25 cpu-clock:uhH: 7f5e0ce7cd60
  ls 7031 4392.100355: 25 cpu-clock:uhH: 7f5e0c706ef7

Signed-off-by: Ravi Bangoria
Fixes: 57b5de463925 ("perf report: Support forced leader feature in pipe mode")
---
 tools/perf/builtin-annotate.c | 11 ++++++++++-
 tools/perf/builtin-report.c   |  3 ++-
 tools/perf/builtin-script.c   | 11 ++++++++++-
 tools/perf/util/header.c      |  2 +-
 4 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index 5eb22cc56363..8180319285af 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -283,6 +283,15 @@ static int process_sample_event(struct perf_tool *tool,
 	return ret;
 }
 
+static int process_feature_event(struct perf_tool *tool,
+				 union perf_event *event,
+				 struct perf_session *session)
+{
+	if (event->feat.feat_id < HEADER_LAST_FEATURE)
+		return perf_event__process_feature(tool, event, session);
+	return 0;
+}
+
 static int hist_entry__tty_annotate(struct hist_entry *he,
 				    struct perf_evsel *evsel,
 				    struct perf_annotate *ann)
@@ -471,7 +480,7 @@ int cmd_annotate(int argc, const char **argv)
 			.attr	= perf_event__process_attr,
 			.build_id = perf_event__process_build_id,
 			.tracing_data = perf_event__process_tracing_data,
-			.feature	= perf_event__process_feature,
+			.feature	= process_feature_event,
 			.ordered_events = true,
 			.ordering_requires_timestamps = true,
 		},
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index cdb5b6949832..c04dc7b53797 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -217,7 +217,8 @@ static int process_feature_event(struct perf_tool *tool,
 	}
 
 	/*
-	 * All features are received, we can force the
+	 * (feat_id = HEADER_LAST_FEATURE) is the end marker which
+	 * means all features are received, now we can force the
 	 * group if needed.
 	 */
 	setup_forced_leader(rep, session->evlist);
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index ad2ac1300420..568ddfac3213 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -3044,6 +3044,15 @@ int process_cpu_map_event(struct perf_tool *tool __maybe_unused,
 	return set_maps(script);
 }
 
+static int process_feature_event(struct perf_tool *tool,
+				 union perf_event *event,
+				 struct perf_session *session)
+{
+	if (event->feat.feat_id < HEADER_LAST_FEATURE)
+		return perf_event__process_feature(tool, event, session);
+	return 0;
+}
+
 #ifdef HAVE_AUXTRACE_SUPPORT
 static int perf_script__process_auxtrace_info(struct perf_tool *tool,
 					      union perf_event *event,
@@ -3088,7 +3097,7 @@ int cmd_script(int argc, const char **argv)
 			.attr		 = process_attr,
 			.event_update	 = perf_event__process_event_update,
 			.tracing_data	 = perf_event__process_tracing_data,
-			.feature	 = perf_event__process_feature,
+			.feature	 = process_feature_event,
 			.build_id	 = perf_event__process_build_id,
 			.id_index	 = perf_event__process_id_index,
 			.auxtrace_info	 = perf_script__process_auxtrace_info,
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 59fcc790c865..653ff65aa2c3 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -3464,7 +3464,7 @@ int perf_event__process_feature(struct perf_tool *tool,
 		pr_warning("invalid record type %d in pipe-mode\n", type);
 		return 0;
 	}
-	if (feat == HEADER_RESERVED || feat > HEADER_LAST_FEATURE) {
+	if (feat == HEADER_RESERVED || feat >= HEADER_LAST_FEATURE) {
 		pr_warning("invalid record type %d in pipe-mode\n", type);
 		return -1;
 	}
-- 
2.14.4
[PATCH v2 1/3] perf script: Add missing output fields in a hint
A few fields are missing from the 'perf script -F' hint. Add them.

Signed-off-by: Ravi Bangoria
---
 tools/perf/builtin-script.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index a31d7082188e..f3fefbcc4503 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -3125,8 +3125,9 @@ int cmd_script(int argc, const char **argv)
 		     "+field to add and -field to remove."
 		     "Valid types: hw,sw,trace,raw,synth. "
 		     "Fields: comm,tid,pid,time,cpu,event,trace,ip,sym,dso,"
-		     "addr,symoff,period,iregs,uregs,brstack,brstacksym,flags,"
-		     "bpf-output,callindent,insn,insnlen,brstackinsn,synth,phys_addr",
+		     "addr,symoff,srcline,period,iregs,uregs,brstack,"
+		     "brstacksym,flags,bpf-output,brstackinsn,brstackoff,"
+		     "callindent,insn,insnlen,synth,phys_addr,metric,misc",
 		     parse_output_fields),
 	OPT_BOOLEAN('a', "all-cpus", &system_wide,
 		    "system-wide collection from all CPUs"),
-- 
2.14.4
[PATCH v2 0/3] perf script: Few trivial fixes
The first patch fixes the perf output field hint. The second and third
fix crashes when perf is used in piped mode.

v2 changes:
  - [PATCH 3/3] Changed as suggested by Arnaldo

v1: https://lkml.org/lkml/2018/6/20/538

Ravi Bangoria (3):
  perf script: Add missing output fields in a hint
  perf script: Fix crash because of missing evsel->priv
  perf script/annotate: Fix crash caused by accessing
    feat_ops[HEADER_LAST_FEATURE]

 tools/perf/builtin-annotate.c | 11 ++++++++++-
 tools/perf/builtin-report.c   |  3 ++-
 tools/perf/builtin-script.c   | 30 +++++++++++++++++++++++++-----
 tools/perf/util/header.c      |  2 +-
 4 files changed, 40 insertions(+), 6 deletions(-)

-- 
2.14.4
xfs: Deadlock between fs_reclaim and sb_internal
Hello Darrick,

Lockdep is reporting a deadlock with the following trace. I saw this on
my powerpc VM with 4GB of RAM, running Linus's master kernel. I don't
have an exact testcase to reproduce it, though. Is this something known?

[ 1797.620389] ======================================================
[ 1797.620541] WARNING: possible circular locking dependency detected
[ 1797.620695] 4.18.0-rc2+ #2 Not tainted
[ 1797.620791] ------------------------------------------------------
[ 1797.620942] kswapd0/362 is trying to acquire lock:
[ 1797.621067] 01ad2774 (sb_internal){.+.+}, at: xfs_trans_alloc+0x284/0x360 [xfs]
[ 1797.621616]
[ 1797.621616] but task is already holding lock:
[ 1797.621779] bb0609ea (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x10/0x60
[ 1797.622090]
[ 1797.622090] which lock already depends on the new lock.
[ 1797.622090]
[ 1797.622266]
[ 1797.622266] the existing dependency chain (in reverse order) is:
[ 1797.622440]
[ 1797.622440] -> #1 (fs_reclaim){+.+.}:
[ 1797.622609]        fs_reclaim_acquire.part.22+0x44/0x60
[ 1797.622764]        kmem_cache_alloc+0x60/0x500
[ 1797.622983]        kmem_zone_alloc+0xb4/0x168 [xfs]
[ 1797.623220]        xfs_trans_alloc+0xac/0x360 [xfs]
[ 1797.623454]        xfs_vn_update_time+0x278/0x420 [xfs]
[ 1797.623576]        touch_atime+0xfc/0x120
[ 1797.623668]        generic_file_read_iter+0x95c/0xbd0
[ 1797.623948]        xfs_file_buffered_aio_read+0x98/0x2e0 [xfs]
[ 1797.624190]        xfs_file_read_iter+0xac/0x170 [xfs]
[ 1797.624304]        __vfs_read+0x128/0x1d0
[ 1797.624388]        vfs_read+0xbc/0x1b0
[ 1797.624476]        kernel_read+0x60/0xa0
[ 1797.624562]        prepare_binprm+0xf8/0x1f0
[ 1797.624676]        __do_execve_file.isra.13+0x7ac/0xc80
[ 1797.624791]        sys_execve+0x58/0x70
[ 1797.624880]        system_call+0x5c/0x70
[ 1797.624965]
[ 1797.624965] -> #0 (sb_internal){.+.+}:
[ 1797.625119]        lock_acquire+0xec/0x310
[ 1797.625204]        __sb_start_write+0x18c/0x290
[ 1797.625435]        xfs_trans_alloc+0x284/0x360 [xfs]
[ 1797.625692]        xfs_iomap_write_allocate+0x230/0x470 [xfs]
[ 1797.625945]        xfs_map_blocks+0x1a8/0x610 [xfs]
[ 1797.626181]        xfs_do_writepage+0x1a8/0x9e0 [xfs]
[ 1797.626414]        xfs_vm_writepage+0x4c/0x78 [xfs]
[ 1797.626527]        pageout.isra.17+0x180/0x680
[ 1797.626640]        shrink_page_list+0xe0c/0x1470
[ 1797.626758]        shrink_inactive_list+0x3a4/0x890
[ 1797.626872]        shrink_node_memcg+0x268/0x790
[ 1797.626986]        shrink_node+0x124/0x630
[ 1797.627089]        balance_pgdat+0x1f0/0x550
[ 1797.627204]        kswapd+0x214/0x8c0
[ 1797.627301]        kthread+0x1b8/0x1c0
[ 1797.627389]        ret_from_kernel_thread+0x5c/0x88
[ 1797.627509]
[ 1797.627509] other info that might help us debug this:
[ 1797.627509]
[ 1797.627684]  Possible unsafe locking scenario:
[ 1797.627684]
[ 1797.627820]        CPU0                    CPU1
[ 1797.627965]        ----                    ----
[ 1797.628072]   lock(fs_reclaim);
[ 1797.628166]                                lock(sb_internal);
[ 1797.628308]                                lock(fs_reclaim);
[ 1797.628453]   lock(sb_internal);
[ 1797.628551]
[ 1797.628551]  *** DEADLOCK ***
[ 1797.628551]
[ 1797.628693] 1 lock held by kswapd0/362:
[ 1797.628779]  #0: bb0609ea (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x10/0x60
[ 1797.628955]
[ 1797.628955] stack backtrace:
[ 1797.629067] CPU: 18 PID: 362 Comm: kswapd0 Not tainted 4.18.0-rc2+ #2
[ 1797.629199] Call Trace:
[ 1797.629311] [c000f4ebaff0] [c0cc4414] dump_stack+0xe8/0x164 (unreliable)
[ 1797.629473] [c000f4ebb040] [c01a9f54] print_circular_bug.isra.17+0x284/0x3d0
[ 1797.629639] [c000f4ebb0e0] [c01aed58] __lock_acquire+0x1d48/0x2020
[ 1797.629797] [c000f4ebb260] [c01af7cc] lock_acquire+0xec/0x310
[ 1797.629933] [c000f4ebb330] [c0454c5c] __sb_start_write+0x18c/0x290
[ 1797.630191] [c000f4ebb380] [d9b7896c] xfs_trans_alloc+0x284/0x360 [xfs]
[ 1797.630443] [c000f4ebb3e0] [d9b58958] xfs_iomap_write_allocate+0x230/0x470 [xfs]
[ 1797.630710] [c000f4ebb560] [d9b2c760] xfs_map_blocks+0x1a8/0x610 [xfs]
[ 1797.630961] [c000f4ebb5d0] [d9b2eb20] xfs_do_writepage+0x1a8/0x9e0 [xfs]
[ 1797.631223] [c000f4ebb6b0] [d9b2f3a4] xfs_vm_writepage+0x4c/0x78 [xfs]
[ 1797.631384] [c000f4ebb720] [c03692e0] pageout.isra.17+0x180/0x680
[ 1797.631557] [c000f4ebb800] [c036ccfc] shrink_page_list+0xe0c/0x1470
[ 1797.631730] [c000f4ebb920] [c036e064] shrink_inactive_list+0x3a4/0x890
[ 1797.631912] [c000f4ebb9f0] [c036ef48] shrink_node_memcg+0x268/0x790
[ 1797.632078] [c000f4ebbb00] [c036f594] shrink_node+0x124/0x630
[ 1797.632213] [c000f4ebbbd0] [c03714e0] balance_pgdat+0x1f0/0x550
[ 1797.632350] [c000f4ebbcd0] [c0371a54] kswapd+0x214/0x8c0
[ 1797.632488] [c000f4ebbdc0]
Re: [PATCH v3 6/9] trace_uprobe: Support SDT markers having reference count (semaphore)
Thanks Oleg for the review,

On 05/24/2018 09:56 PM, Oleg Nesterov wrote:
> On 04/17, Ravi Bangoria wrote:
>>
>> @@ -941,6 +1091,9 @@ typedef bool (*filter_func_t)(struct uprobe_consumer *self,
>>  	if (ret)
>>  		goto err_buffer;
>>
>> +	if (tu->ref_ctr_offset)
>> +		sdt_increment_ref_ctr(tu);
>> +
>
> iiuc, this is probe_event_enable()...
>
> Looks racy, but afaics the race with uprobe_mmap() will be closed by the next
> change. However, it seems that probe_event_disable() can race with
> trace_uprobe_mmap() too and the next 7/9 patch won't help,
>
>> +	if (tu->ref_ctr_offset)
>> +		sdt_decrement_ref_ctr(tu);
>> +
>>  	uprobe_unregister(tu->inode, tu->offset, &tu->consumer);
>>  	tu->tp.flags &= file ? ~TP_FLAG_TRACE : ~TP_FLAG_PROFILE;
>
> so what if trace_uprobe_mmap() comes right after uprobe_unregister() ?
> Note that trace_probe_is_enabled() is T until we update tp.flags.

Sure, I'll look at your comments.

Apart from these, I've also found a deadlock between uprobe_lock and
mm->mmap_sem. trace_uprobe_mmap() takes these locks in

  mm->mmap_sem
    uprobe_lock

order, but some other code paths take them in the reverse order. I've
included a sample lockdep warning at the end. The issue is that
mm->mmap_sem is not in the control of trace_uprobe_mmap(), and we have
to take uprobe_lock to loop over all trace_uprobes. Any idea how this
can be resolved?

Sample lockdep warning:

[  499.258006] ======================================================
[  499.258205] WARNING: possible circular locking dependency detected
[  499.258409] 4.17.0-rc3+ #76 Not tainted
[  499.258528] ------------------------------------------------------
[  499.258731] perf/6744 is trying to acquire lock:
[  499.258895] e4895f49 (uprobe_lock){+.+.}, at: trace_uprobe_mmap+0x78/0x130
[  499.259147]
[  499.259147] but task is already holding lock:
[  499.259349] 9ec93a76 (&mm->mmap_sem){}, at: vm_mmap_pgoff+0xe0/0x160
[  499.259597]
[  499.259597] which lock already depends on the new lock.
[  499.259597]
[  499.259848]
[  499.259848] the existing dependency chain (in reverse order) is:
[  499.260086]
[  499.260086] -> #4 (&mm->mmap_sem){}:
[  499.260277]        __lock_acquire+0x53c/0x910
[  499.260442]        lock_acquire+0xf4/0x2f0
[  499.260595]        down_write_killable+0x6c/0x150
[  499.260764]        copy_process.isra.34.part.35+0x1594/0x1be0
[  499.260967]        _do_fork+0xf8/0x910
[  499.261090]        ppc_clone+0x8/0xc
[  499.261209]
[  499.261209] -> #3 (&dup_mmap_sem){}:
[  499.261378]        __lock_acquire+0x53c/0x910
[  499.261540]        lock_acquire+0xf4/0x2f0
[  499.261669]        down_write+0x6c/0x110
[  499.261793]        percpu_down_write+0x48/0x140
[  499.261954]        register_for_each_vma+0x6c/0x2a0
[  499.262116]        uprobe_register+0x230/0x320
[  499.262277]        probe_event_enable+0x1cc/0x540
[  499.262435]        perf_trace_event_init+0x1e0/0x350
[  499.262587]        perf_trace_init+0xb0/0x110
[  499.262750]        perf_tp_event_init+0x38/0x90
[  499.262910]        perf_try_init_event+0x10c/0x150
[  499.263075]        perf_event_alloc+0xbb0/0xf10
[  499.263235]        sys_perf_event_open+0x2a8/0xdd0
[  499.263396]        system_call+0x58/0x6c
[  499.263516]
[  499.263516] -> #2 (&uprobe->register_rwsem){}:
[  499.263723]        __lock_acquire+0x53c/0x910
[  499.263884]        lock_acquire+0xf4/0x2f0
[  499.264002]        down_write+0x6c/0x110
[  499.264118]        uprobe_register+0x1ec/0x320
[  499.264283]        probe_event_enable+0x1cc/0x540
[  499.264442]        perf_trace_event_init+0x1e0/0x350
[  499.264603]        perf_trace_init+0xb0/0x110
[  499.264766]        perf_tp_event_init+0x38/0x90
[  499.264930]        perf_try_init_event+0x10c/0x150
[  499.265092]        perf_event_alloc+0xbb0/0xf10
[  499.265261]        sys_perf_event_open+0x2a8/0xdd0
[  499.265424]        system_call+0x58/0x6c
[  499.265542]
[  499.265542] -> #1 (event_mutex){+.+.}:
[  499.265738]        __lock_acquire+0x53c/0x910
[  499.265896]        lock_acquire+0xf4/0x2f0
[  499.266019]        __mutex_lock+0xa0/0xab0
[  499.266142]        trace_add_event_call+0x44/0x100
[  499.266310]        create_trace_uprobe+0x4a0/0x8b0
[  499.266474]        trace_run_command+0xa4/0xc0
[  499.266631]        trace_parse_run_command+0xe4/0x200
[  499.266799]        probes_write+0x20/0x40
[  499.266922]        __vfs_write+0x6c/0x240
[  499.267041]        vfs_write+0xd0/0x240
[  499.267166]        ksys_write+0x6c/0x110
[  499.267295]        system_call+0x58/0x6c
[  499.267413]
[  499.267413] -> #0 (uprobe_lock){+.+.}:
[  499.267591]        validate_chain.isra.34+0xbd0/0x1000
[  499.267747]        __lock_acquire+0x53c/0x910
[  499.267917]        lock_acquire+0xf4/0x2f0
[  499.268048]        __mu
Re: [PATCH v5 09/10] Uprobes/sdt: Document about reference counter
Hi Srikar,

On 07/02/2018 08:24 PM, Srikar Dronamraju wrote:
> * Ravi Bangoria [2018-06-28 10:52:08]:
>
>> The reference counter gates the invocation of the probe. If present,
>> the reference count is 0 by default. The kernel needs to increment it
>> before tracing the probe and decrement it when done. This is identical
>> to the semaphore in Userspace Statically Defined Tracepoints (USDT).
>>
>> Document usage of the reference counter.
>>
>> Signed-off-by: Ravi Bangoria
>> Acked-by: Masami Hiramatsu
>
> Unlike perf, this mechanism cannot detect the ref count and depends on
> user-provided data. What happens if the user mistakenly provides a wrong
> location? I guess he can corrupt some other data structures?

Yes, if the user gives a wrong ref_ctr_offset, the uprobe infrastructure
will corrupt some user data.

>
> Hence I would think twice about advertising this mechanism, i.e. keep
> this as an undocumented feature.

I don't mind.

Thanks,
Ravi
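To illustrate the corruption Srikar is asking about: the semaphore is just an unsigned short somewhere in the process image, and the kernel increments whatever lives at the offset it was given. A minimal userspace simulation of this, with an entirely hypothetical data layout (the real mechanism patches the probed process's memory, not a local struct):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical slice of a probed process's data section: the SDT
 * semaphore sits next to ordinary program data. */
struct fake_data_section {
	unsigned short my_probe_semaphore;	/* the intended target */
	unsigned short some_user_counter;	/* an innocent neighbour */
};

/* Simulates the kernel applying "+1" at a caller-supplied byte offset,
 * exactly as trusting as the real ref_ctr_offset handling. */
static void increment_at(struct fake_data_section *sec, size_t off)
{
	unsigned short v;

	memcpy(&v, (char *)sec + off, sizeof(v));
	v++;
	memcpy((char *)sec + off, &v, sizeof(v));
}
```

With the correct offset (0 here) only the semaphore changes; with a wrong offset (2 here) the neighbouring counter is silently modified, which is the data corruption being discussed.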
Re: [RFC 2/4] perf: Pass pmu pointer to perf_paranoid_* helpers
Hi Tvrtko,

> @@ -199,7 +199,7 @@ static inline void perf_get_data_addr(struct pt_regs *regs, u64 *addrp)
>  	if (!(mmcra & MMCRA_SAMPLE_ENABLE) || sdar_valid)
>  		*addrp = mfspr(SPRN_SDAR);
>
> -	if (perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN) &&
> +	if (perf_paranoid_kernel(ppmu) && !capable(CAP_SYS_ADMIN) &&
>  	    is_kernel_addr(mfspr(SPRN_SDAR)))
>  		*addrp = 0;
>  }

This patch fails to build for me on powerpc:

arch/powerpc/perf/core-book3s.c: In function ‘perf_get_data_addr’:
arch/powerpc/perf/core-book3s.c:202:27: error: passing argument 1 of ‘perf_paranoid_kernel’ from incompatible pointer type [-Werror=incompatible-pointer-types]
  if (perf_paranoid_kernel(ppmu) && !capable(CAP_SYS_ADMIN) &&
                           ^~~~
In file included from arch/powerpc/perf/core-book3s.c:13:0:
./include/linux/perf_event.h:1191:20: note: expected ‘const struct pmu *’ but argument is of type ‘struct power_pmu *’
 static inline bool perf_paranoid_kernel(const struct pmu *pmu)
                    ^~~~
arch/powerpc/perf/core-book3s.c: In function ‘power_pmu_bhrb_read’:
arch/powerpc/perf/core-book3s.c:470:8: error: too few arguments to function ‘perf_paranoid_kernel’
  if (perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN) &&
        ^~~~
In file included from arch/powerpc/perf/core-book3s.c:13:0:
./include/linux/perf_event.h:1191:20: note: declared here
 static inline bool perf_paranoid_kernel(const struct pmu *pmu)
                    ^~~~
Re: [PATCH v6 5/6] Uprobes/sdt: Prevent multiple reference counter for same uprobe
Hi Oleg,

On 07/25/2018 04:38 PM, Oleg Nesterov wrote:
> No, I can't understand this patch...
>
> On 07/16, Ravi Bangoria wrote:
>>
>> --- a/kernel/events/uprobes.c
>> +++ b/kernel/events/uprobes.c
>> @@ -63,6 +63,8 @@ static struct percpu_rw_semaphore dup_mmap_sem;
>>
>>  /* Have a copy of original instruction */
>>  #define UPROBE_COPY_INSN	0
>> +/* Reference counter offset is reloaded with non-zero value. */
>> +#define REF_CTR_OFF_RELOADED	1
>>
>>  struct uprobe {
>>  	struct rb_node		rb_node;	/* node in the rb tree */
>> @@ -476,9 +478,23 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
>>  		return ret;
>>
>>  	ret = verify_opcode(old_page, vaddr, &opcode);
>> -	if (ret <= 0)
>> +	if (ret < 0)
>>  		goto put_old;
>
> I agree, "ret <= 0" wasn't nice even before this change, but "ret < 0"
> looks worse because this is simply not possible.

Ok. Any better idea? I think if we don't track all mms patched by the
uprobe, we have to rely on the current instruction.

>
>> +	/*
>> +	 * If instruction is already patched but reference counter offset
>> +	 * has been reloaded to non-zero value, increment the reference
>> +	 * counter and return.
>> +	 */
>> +	if (ret == 0) {
>> +		if (is_register &&
>> +		    test_bit(REF_CTR_OFF_RELOADED, &uprobe->flags)) {
>> +			WARN_ON(!uprobe->ref_ctr_offset);
>> +			ret = update_ref_ctr(uprobe, mm, true);
>> +		}
>> +		goto put_old;
>> +	}
>
> So we need to force update_ref_ctr(true) in case when uprobe_register_refctr()
> detects the already registered uprobe with ref_ctr_offset == 0, and then it
> calls register_for_each_vma().
>
> Why this can't race with uprobe_mmap() ?
>
> uprobe_mmap() can do install_breakpoint() right after REF_CTR_OFF_RELOADED
> was set, then register_for_each_vma() will find this vma and do
> install_breakpoint() too. If ref_ctr_vma was already mmaped, the counter
> will be incremented twice, no?

Hmm, right. Probably I can fix this race by using some lock, but I don't
know if it's good to do that inside uprobe_mmap().

>
>> @@ -971,6 +1011,7 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
>>  	bool is_register = !!new;
>>  	struct map_info *info;
>>  	int err = 0;
>> +	bool installed = false;
>>
>>  	percpu_down_write(&dup_mmap_sem);
>>  	info = build_map_info(uprobe->inode->i_mapping,
>> @@ -1000,8 +1041,10 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
>>  		if (is_register) {
>>  			/* consult only the "caller", new consumer. */
>>  			if (consumer_filter(new,
>> -					UPROBE_FILTER_REGISTER, mm))
>> +					UPROBE_FILTER_REGISTER, mm)) {
>>  				err = install_breakpoint(uprobe, mm, vma, info->vaddr);
>> +				installed = true;
>> +			}
>>  		} else if (test_bit(MMF_HAS_UPROBES, &mm->flags)) {
>>  			if (!filter_chain(uprobe, UPROBE_FILTER_UNREGISTER, mm))
>> @@ -1016,6 +1059,8 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
>>  	}
>>  out:
>>  	percpu_up_write(&dup_mmap_sem);
>> +	if (installed)
>> +		clear_bit(REF_CTR_OFF_RELOADED, &uprobe->flags);
>
> I simply can't understand this "bool installed"

That boolean is needed because consumer_filter() returns false when this
function gets called the first time from uprobe_register(), and
consumer_filter() returns true when it gets called by uprobe_apply(). If
I made it unconditional, there would be no effect of setting the
REF_CTR_OFF_RELOADED bit. So this boolean is needed.

Though, there is one more issue I found. Let's say there are two
processes running and a user probes on both of them using
uprobe_register(), let's say with systemtap. Now, someone else does
uprobe_register_refctr() using 'perf -p PID' on the same uprobe, but he
is interested in only one process. Here, we will increment the reference
counter only in the "PID" process and we will clear the
REF_CTR_OFF_RELOADED flag. Later, someone else does 'perf -a', which
will call uprobe_register_refctr() for both targets, but we will fail to
increment the counter in the "non-PID" process
[PATCH v7 4/6] Uprobes/sdt: Prevent multiple reference counter for same uprobe
We assume there is only one reference counter for one uprobe. Don't
allow a user to register multiple uprobes having the same inode+offset
but different reference counter offsets.

Signed-off-by: Ravi Bangoria
---
 kernel/events/uprobes.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index ad92fed11526..c27546929ae7 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -679,6 +679,12 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset,
 	cur_uprobe = insert_uprobe(uprobe);
 	/* a uprobe exists for this inode:offset combination */
 	if (cur_uprobe) {
+		if (cur_uprobe->ref_ctr_offset != uprobe->ref_ctr_offset) {
+			pr_warn("Reference counter offset mismatch.\n");
+			put_uprobe(cur_uprobe);
+			kfree(uprobe);
+			return ERR_PTR(-EINVAL);
+		}
 		kfree(uprobe);
 		uprobe = cur_uprobe;
 	}
@@ -1093,6 +1099,9 @@ static int __uprobe_register(struct inode *inode, loff_t offset,
 	uprobe = alloc_uprobe(inode, offset, ref_ctr_offset);
 	if (!uprobe)
 		return -ENOMEM;
+	if (IS_ERR(uprobe))
+		return PTR_ERR(uprobe);
+
 	/*
 	 * We can race with uprobe_unregister()->delete_uprobe().
 	 * Check uprobe_is_active() and retry if it is false.
-- 
2.14.4
[PATCH v7 6/6] perf probe: Support SDT markers having reference counter (semaphore)
With this, perf buildid-cache will save SDT markers with reference
counters in the probe cache, and perf probe will be able to probe
markers having a reference counter. Ex,

  # readelf -n /tmp/tick | grep -A1 loop2
    Name: loop2
    ...
    Semaphore: 0x10020036

  # ./perf buildid-cache --add /tmp/tick
  # ./perf probe sdt_tick:loop2
  # ./perf stat -e sdt_tick:loop2 /tmp/tick
    hi: 0
    hi: 1
    hi: 2
    ^C
     Performance counter stats for '/tmp/tick':
                 3      sdt_tick:loop2
       2.561851452 seconds time elapsed

Signed-off-by: Ravi Bangoria
Acked-by: Masami Hiramatsu
Acked-by: Srikar Dronamraju
---
 tools/perf/util/probe-event.c | 39 +++++++++++++++++++++++++-------
 tools/perf/util/probe-event.h |  1 +
 tools/perf/util/probe-file.c  | 34 ++++++++++++++++++++++++++--
 tools/perf/util/probe-file.h  |  1 +
 tools/perf/util/symbol-elf.c  | 46 ++++++++++++++++++++++++-------------
 tools/perf/util/symbol.h      |  7 +++++++
 6 files changed, 106 insertions(+), 22 deletions(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index f119eb628dbb..e86f8be89157 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -1819,6 +1819,12 @@ int parse_probe_trace_command(const char *cmd, struct probe_trace_event *tev)
 		tp->offset = strtoul(fmt2_str, NULL, 10);
 	}
 
+	if (tev->uprobes) {
+		fmt2_str = strchr(p, '(');
+		if (fmt2_str)
+			tp->ref_ctr_offset = strtoul(fmt2_str + 1, NULL, 0);
+	}
+
 	tev->nargs = argc - 2;
 	tev->args = zalloc(sizeof(struct probe_trace_arg) * tev->nargs);
 	if (tev->args == NULL) {
@@ -2012,6 +2018,22 @@ static int synthesize_probe_trace_arg(struct probe_trace_arg *arg,
 	return err;
 }
 
+static int
+synthesize_uprobe_trace_def(struct probe_trace_event *tev, struct strbuf *buf)
+{
+	struct probe_trace_point *tp = &tev->point;
+	int err;
+
+	err = strbuf_addf(buf, "%s:0x%lx", tp->module, tp->address);
+
+	if (err >= 0 && tp->ref_ctr_offset) {
+		if (!uprobe_ref_ctr_is_supported())
+			return -1;
+		err = strbuf_addf(buf, "(0x%lx)", tp->ref_ctr_offset);
+	}
+	return err >= 0 ? 0 : -1;
+}
+
 char *synthesize_probe_trace_command(struct probe_trace_event *tev)
 {
 	struct probe_trace_point *tp = &tev->point;
@@ -2041,15 +2063,17 @@ char *synthesize_probe_trace_command(struct probe_trace_event *tev)
 	}
 
 	/* Use the tp->address for uprobes */
-	if (tev->uprobes)
-		err = strbuf_addf(&buf, "%s:0x%lx", tp->module, tp->address);
-	else if (!strncmp(tp->symbol, "0x", 2))
+	if (tev->uprobes) {
+		err = synthesize_uprobe_trace_def(tev, &buf);
+	} else if (!strncmp(tp->symbol, "0x", 2)) {
 		/* Absolute address. See try_to_find_absolute_address() */
 		err = strbuf_addf(&buf, "%s%s0x%lx", tp->module ?: "",
 				  tp->module ? ":" : "", tp->address);
-	else
+	} else {
 		err = strbuf_addf(&buf, "%s%s%s+%lu", tp->module ?: "",
 				  tp->module ? ":" : "", tp->symbol, tp->offset);
+	}
+
 	if (err)
 		goto error;
@@ -2633,6 +2657,13 @@ static void warn_uprobe_event_compat(struct probe_trace_event *tev)
 {
 	int i;
 	char *buf = synthesize_probe_trace_command(tev);
+	struct probe_trace_point *tp = &tev->point;
+
+	if (tp->ref_ctr_offset && !uprobe_ref_ctr_is_supported()) {
+		pr_warning("A semaphore is associated with %s:%s and "
+			   "seems your kernel doesn't support it.\n",
+			   tev->group, tev->event);
+	}
 
 	/* Old uprobe event doesn't support memory dereference */
 	if (!tev->uprobes || tev->nargs == 0 || !buf)
diff --git a/tools/perf/util/probe-event.h b/tools/perf/util/probe-event.h
index 45b14f020558..15a98c3a2a2f 100644
--- a/tools/perf/util/probe-event.h
+++ b/tools/perf/util/probe-event.h
@@ -27,6 +27,7 @@ struct probe_trace_point {
 	char		*symbol;	/* Base symbol */
 	char		*module;	/* Module name */
 	unsigned long	offset;		/* Offset from symbol */
+	unsigned long	ref_ctr_offset;	/* SDT reference counter offset */
 	unsigned long	address;	/* Actual address of the trace point */
 	bool		retprobe;	/* Return probe flag */
 };
diff --git a/tools/perf/util/probe-file.c b/tools/perf/util/probe-file.c
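The probe definition format this patch emits and parses — `<path>:0x<offset>` with an optional `(0x<ref_ctr_offset>)` suffix — can be sketched as a standalone parser. This is a hypothetical simplification (real perf tokenizes the whole command line first and the field names below are made up), but it shows the same strchr/strtoul approach the patch uses:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Parse "<path>:0x<offset>[(0x<ref_ctr_offset>)]".  The parenthesised
 * reference counter offset is optional and defaults to 0, matching the
 * "optional field" semantics described above. */
static int parse_uprobe_def(const char *def, unsigned long *off,
			    unsigned long *ref_ctr_off)
{
	const char *colon = strrchr(def, ':');	/* last ':' splits path/offset */
	const char *paren;

	if (!colon)
		return -1;
	/* base 0: strtoul honours the "0x" prefix and stops at '(' */
	*off = strtoul(colon + 1, NULL, 0);
	paren = strchr(colon, '(');
	*ref_ctr_off = paren ? strtoul(paren + 1, NULL, 0) : 0;
	return 0;
}
```

A definition like `/tmp/tick:0x6e4(0x10020036)` would yield offset 0x6e4 and reference counter offset 0x10020036; without the suffix, the reference counter offset stays 0 and the old behaviour is preserved.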
[PATCH v7 3/6] Uprobes: Support SDT markers having reference count (semaphore)
Userspace Statically Defined Tracepoints[1] are dtrace-style markers
inside userspace applications. Applications like PostgreSQL, MySQL,
Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc.
have these markers embedded in them. These markers are added by the
developer at important places in the code. Each marker source expands to
a single nop instruction in the compiled code, but there may be
additional overhead for computing the marker arguments, which expands to
a couple of instructions. If that overhead matters, execution of it can
be omitted by a runtime if() condition when no one is tracing the
marker:

    if (reference_counter > 0) {
        Execute marker instructions;
    }

The default value of the reference counter is 0. A tracer has to
increment the reference counter before tracing a marker and decrement it
when done with the tracing.

Implement the reference counter logic in core uprobes. Users will be
able to use it from trace_uprobe as well as from kernel modules. The new
trace_uprobe definition with a reference counter will now be:

    <path>:<offset>[(ref_ctr_offset)]

where ref_ctr_offset is an optional field. For kernel modules, a new
variant of uprobe_register() has been introduced:

    uprobe_register_refctr(inode, offset, ref_ctr_offset, consumer)

There is no new variant of uprobe_unregister(), because we assume only
one reference counter per uprobe.

[1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation

Note: the 'reference counter' is called a 'semaphore' in the original
Dtrace (and Systemtap, bcc, and even ELF) documentation and code. But
the term 'semaphore' is misleading in this context: it is just a counter
holding the number of tracers tracing a marker, and is not really used
for any synchronization. So we refer to it as a 'reference counter' in
kernel/perf code.
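The gating described above can be sketched from the userspace side as a tiny self-contained program. This is not the real sys/sdt.h machinery — the names (loop2_semaphore, tick_marker) are hypothetical, and in reality the kernel increments the counter by patching the probed process's memory — but the control flow is the one the commit message describes:

```c
#include <assert.h>

/* Hypothetical SDT semaphore: an unsigned short in the data section,
 * 0 by default, incremented by a tracer before tracing. */
static unsigned short loop2_semaphore;

/* Stands in for the (possibly costly) marker argument computation. */
static int marker_arg_computations;

static void tick_marker(void)
{
	/* Marker body runs only while at least one tracer is attached. */
	if (loop2_semaphore > 0)
		marker_arg_computations++;
}
```

With no tracer attached the marker body is skipped entirely; after a tracer increments the counter (the uprobe_register_refctr() path), the body executes, and it is skipped again once the counter is decremented back to 0.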
Signed-off-by: Ravi Bangoria
Reviewed-by: Masami Hiramatsu [Only trace_uprobe.c]
---
 include/linux/uprobes.h     |   5 +
 kernel/events/uprobes.c     | 232 ++--
 kernel/trace/trace.c        |   2 +-
 kernel/trace/trace_uprobe.c |  38 +++-
 4 files changed, 267 insertions(+), 10 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index bb9d2084af03..103a48a48872 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -123,6 +123,7 @@ extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs);
 extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
 extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t);
 extern int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
+extern int uprobe_register_refctr(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc);
 extern int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool);
 extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
 extern int uprobe_mmap(struct vm_area_struct *vma);
@@ -160,6 +161,10 @@ uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
 {
 	return -ENOSYS;
 }
+static inline int uprobe_register_refctr(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc)
+{
+	return -ENOSYS;
+}
 static inline int
 uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool add)
 {
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index c0418ba52ba8..ad92fed11526 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -73,6 +73,7 @@ struct uprobe {
 	struct uprobe_consumer	*consumers;
 	struct inode		*inode;		/* Also hold a ref to inode */
 	loff_t			offset;
+	loff_t			ref_ctr_offset;
 	unsigned long		flags;
 
 	/*
@@ -88,6 +89,15 @@ struct uprobe {
 	struct arch_uprobe	arch;
 };
 
+struct delayed_uprobe {
+	struct list_head list;
+	struct uprobe *uprobe;
+	struct mm_struct *mm;
+};
+
+static DEFINE_MUTEX(delayed_uprobe_lock);
+static LIST_HEAD(delayed_uprobe_list);
+
 /*
  * Execute out of line area: anonymous executable mapping installed
  * by the probed task to execute the copy of the original instruction
@@ -282,6 +292,154 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t
 	return 1;
 }
 
+static struct delayed_uprobe *
+delayed_uprobe_check(struct uprobe *uprobe, struct mm_struct *mm)
+{
+	struct delayed_uprobe *du;
+
+	list_for_each_entry(du, &delayed_uprobe_list, list)
+		if (du->uprobe == uprobe && du->mm == mm)
+			return du;
+	return NULL;
+}
+
+static int delayed_uprobe_add(struct uprobe *uprobe, struct mm_struct *mm)
+{
+	struct delayed_uprobe *du;
+
+	if (delayed_uprobe_check(uprobe, mm))
+
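The gating behaviour that the reference counter provides can be sketched in a few lines of plain Python. This is a conceptual model only — in reality the counter lives in the traced binary's data section and is incremented/decremented by the kernel on behalf of the tracer, not by library code:

```python
class SdtMarker:
    """Toy model of an SDT marker guarded by a reference counter."""

    def __init__(self):
        self.reference_counter = 0  # default is 0: marker body is skipped
        self.hits = 0

    def attach_tracer(self):
        # a tracer increments the counter before it starts tracing
        self.reference_counter += 1

    def detach_tracer(self):
        # ... and decrements it when done
        self.reference_counter -= 1

    def fire(self):
        # the (possibly expensive) argument computation only runs
        # while at least one tracer is attached
        if self.reference_counter > 0:
            self.hits += 1

m = SdtMarker()
m.fire()            # no tracer attached: skipped
m.attach_tracer()
m.fire()            # traced
m.fire()            # traced
m.detach_tracer()
m.fire()            # skipped again
print(m.hits)       # -> 2
```

Because the counter counts tracers rather than granting exclusive access, two concurrent tracers simply leave it at 2 until both detach — which is why "semaphore" is a misleading name for it.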
[PATCH v7 2/6] Uprobe: Additional argument arch_uprobe to uprobe_write_opcode()
Add an additional argument, 'arch_uprobe', to uprobe_write_opcode(). We will need this in a later set of patches.

Signed-off-by: Ravi Bangoria
---
 arch/arm/probes/uprobes/core.c | 2 +-
 arch/mips/kernel/uprobes.c     | 2 +-
 include/linux/uprobes.h        | 2 +-
 kernel/events/uprobes.c        | 9 +
 4 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/arm/probes/uprobes/core.c b/arch/arm/probes/uprobes/core.c
index d1329f1ba4e4..bf992264060e 100644
--- a/arch/arm/probes/uprobes/core.c
+++ b/arch/arm/probes/uprobes/core.c
@@ -32,7 +32,7 @@ bool is_swbp_insn(uprobe_opcode_t *insn)
 int set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm,
 	     unsigned long vaddr)
 {
-	return uprobe_write_opcode(mm, vaddr,
+	return uprobe_write_opcode(auprobe, mm, vaddr,
 		   __opcode_to_mem_arm(auprobe->bpinsn));
 }
 
diff --git a/arch/mips/kernel/uprobes.c b/arch/mips/kernel/uprobes.c
index f7a0645ccb82..4aaff3b3175c 100644
--- a/arch/mips/kernel/uprobes.c
+++ b/arch/mips/kernel/uprobes.c
@@ -224,7 +224,7 @@ unsigned long arch_uretprobe_hijack_return_addr(
 int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm,
 	unsigned long vaddr)
 {
-	return uprobe_write_opcode(mm, vaddr, UPROBE_SWBP_INSN);
+	return uprobe_write_opcode(auprobe, mm, vaddr, UPROBE_SWBP_INSN);
 }
 
 void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 0a294e950df8..bb9d2084af03 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -121,7 +121,7 @@ extern bool is_swbp_insn(uprobe_opcode_t *insn);
 extern bool is_trap_insn(uprobe_opcode_t *insn);
 extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs);
 extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
-extern int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t);
+extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t);
 extern int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
 extern int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool);
 extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 471eac896635..c0418ba52ba8 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -299,8 +299,8 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t
  * Called with mm->mmap_sem held for write.
  * Return 0 (success) or a negative errno.
  */
-int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr,
-			uprobe_opcode_t opcode)
+int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
+			unsigned long vaddr, uprobe_opcode_t opcode)
 {
 	struct page *old_page, *new_page;
 	struct vm_area_struct *vma;
@@ -351,7 +351,7 @@ int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr,
  */
 int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr)
 {
-	return uprobe_write_opcode(mm, vaddr, UPROBE_SWBP_INSN);
+	return uprobe_write_opcode(auprobe, mm, vaddr, UPROBE_SWBP_INSN);
 }
 
 /**
@@ -366,7 +366,8 @@ int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned
 int __weak set_orig_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr)
 {
-	return uprobe_write_opcode(mm, vaddr, *(uprobe_opcode_t *)&auprobe->insn);
+	return uprobe_write_opcode(auprobe, mm, vaddr,
+				   *(uprobe_opcode_t *)&auprobe->insn);
 }
 
 static struct uprobe *get_uprobe(struct uprobe *uprobe)
-- 
2.14.4
[PATCH v7 0/6] Uprobes: Support SDT markers having reference count (semaphore)
v7 changes:
 - Don't allow both zero and non-zero reference counter offsets at the same time. It's painful and error prone.
 - Don't call delayed_uprobe_install() if vma->vm_mm does not have any breakpoint installed.

v6: https://lkml.org/lkml/2018/7/16/353

Description:

Userspace Statically Defined Tracepoints[1] are dtrace-style markers inside userspace applications. Applications like PostgreSQL, MySQL, Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc. have these markers embedded in them. These markers are added by developers at important places in the code. Each marker source expands to a single nop instruction in the compiled code, but there may be additional overhead for computing the marker arguments, which expands to a couple of instructions. If this overhead is significant, the runtime can skip it with an if() condition when no one is tracing the marker:

	if (reference_counter > 0) {
		Execute marker instructions;
	}

The default value of the reference counter is 0. A tracer has to increment the reference counter before tracing a marker and decrement it when done with the tracing.

Currently, the perf tool has limited support for SDT markers, i.e. it cannot trace markers surrounded by a reference counter. Also, it's not easy to add reference counter logic in a userspace tool like perf, so the basic idea of this patch set is to add the reference counter logic in the core uprobe infrastructure. For example,[2]

 # cat tick.c
 ...
 for (i = 0; i < 100; i++) {
	DTRACE_PROBE1(tick, loop1, i);
	if (TICK_LOOP2_ENABLED()) {
		DTRACE_PROBE1(tick, loop2, i);
	}
	printf("hi: %d\n", i);
	sleep(1);
 }
 ...

Here tick:loop1 is a marker without a reference counter, whereas tick:loop2 is surrounded by a reference counter condition.
 # perf buildid-cache --add /tmp/tick
 # perf probe sdt_tick:loop1
 # perf probe sdt_tick:loop2
 # perf stat -e sdt_tick:loop1,sdt_tick:loop2 -- /tmp/tick
   hi: 0
   hi: 1
   hi: 2
   ^C
 Performance counter stats for '/tmp/tick':
             3      sdt_tick:loop1
             0      sdt_tick:loop2
   2.747086086 seconds time elapsed

Perf failed to record data for tick:loop2. The same experiment with this patch series:

 # ./perf buildid-cache --add /tmp/tick
 # ./perf probe sdt_tick:loop2
 # ./perf stat -e sdt_tick:loop2 /tmp/tick
   hi: 0
   hi: 1
   hi: 2
   ^C
 Performance counter stats for '/tmp/tick':
             3      sdt_tick:loop2
   2.561851452 seconds time elapsed

Ravi Bangoria (6):
  Uprobes: Simplify uprobe_register() body
  Uprobe: Additional argument arch_uprobe to uprobe_write_opcode()
  Uprobes: Support SDT markers having reference count (semaphore)
  Uprobes/sdt: Prevent multiple reference counter for same uprobe
  trace_uprobe/sdt: Prevent multiple reference counter for same uprobe
  perf probe: Support SDT markers having reference counter (semaphore)

 arch/arm/probes/uprobes/core.c |   2 +-
 arch/mips/kernel/uprobes.c     |   2 +-
 include/linux/uprobes.h        |   7 +-
 kernel/events/uprobes.c        | 315 +++--
 kernel/trace/trace.c           |   2 +-
 kernel/trace/trace_uprobe.c    |  75 +-
 tools/perf/util/probe-event.c  |  39 -
 tools/perf/util/probe-event.h  |   1 +
 tools/perf/util/probe-file.c   |  34 -
 tools/perf/util/probe-file.h   |   1 +
 tools/perf/util/symbol-elf.c   |  46 --
 tools/perf/util/symbol.h       |   7 +
 12 files changed, 459 insertions(+), 72 deletions(-)

-- 
2.14.4
[PATCH v7 5/6] trace_uprobe/sdt: Prevent multiple reference counter for same uprobe
We assume each uprobe has only one reference counter. Don't allow users to add multiple trace_uprobe entries having the same inode+offset but different reference counter offsets. For example,

 # echo "p:sdt_tick/loop2 /home/ravi/tick:0x6e4(0x10036)" > uprobe_events
 # echo "p:sdt_tick/loop2_1 /home/ravi/tick:0x6e4(0xf)" >> uprobe_events
 bash: echo: write error: Invalid argument

 # dmesg
 trace_kprobe: Reference counter offset mismatch.

There is one exception though: when the user is replacing an old entry with a new one, we allow this if the new entry does not conflict with any other existing entries.

Signed-off-by: Ravi Bangoria
---
 kernel/trace/trace_uprobe.c | 37 +++--
 1 file changed, 35 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index bf2be098eb08..be64d943d7ea 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -324,6 +324,35 @@ static int unregister_trace_uprobe(struct trace_uprobe *tu)
 	return 0;
 }
 
+/*
+ * Uprobe with multiple reference counter is not allowed. i.e.
+ * If inode and offset matches, reference counter offset *must*
+ * match as well. Though, there is one exception: If user is
+ * replacing old trace_uprobe with new one(same group/event),
+ * then we allow same uprobe with new reference counter as far
+ * as the new one does not conflict with any other existing
+ * ones.
+ */
+static struct trace_uprobe *find_old_trace_uprobe(struct trace_uprobe *new)
+{
+	struct trace_uprobe *tmp, *old = NULL;
+	struct inode *new_inode = d_real_inode(new->path.dentry);
+
+	old = find_probe_event(trace_event_name(&new->tp.call),
+				new->tp.call.class->system);
+
+	list_for_each_entry(tmp, &uprobe_list, list) {
+		if ((old ? old != tmp : true) &&
+		    new_inode == d_real_inode(tmp->path.dentry) &&
+		    new->offset == tmp->offset &&
+		    new->ref_ctr_offset != tmp->ref_ctr_offset) {
+			pr_warn("Reference counter offset mismatch.");
+			return ERR_PTR(-EINVAL);
+		}
+	}
+	return old;
+}
+
 /* Register a trace_uprobe and probe_event */
 static int register_trace_uprobe(struct trace_uprobe *tu)
 {
@@ -333,8 +362,12 @@ static int register_trace_uprobe(struct trace_uprobe *tu)
 	mutex_lock(&uprobe_lock);
 
 	/* register as an event */
-	old_tu = find_probe_event(trace_event_name(&tu->tp.call),
-			tu->tp.call.class->system);
+	old_tu = find_old_trace_uprobe(tu);
+	if (IS_ERR(old_tu)) {
+		ret = PTR_ERR(old_tu);
+		goto end;
+	}
+
 	if (old_tu) {
 		/* delete old event */
 		ret = unregister_trace_uprobe(old_tu);
-- 
2.14.4
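The validation rule in this patch — one reference counter offset per inode+offset, with a carve-out for replacing the same group/event — can be modelled compactly. The sketch below is a hypothetical Python rendering of the check's logic, not the kernel code; `find_conflict`, the dict keys, and the sample entries are all made up for illustration:

```python
def find_conflict(existing, new):
    """Return the entry `new` replaces (same group/event name), or
    raise if another entry pins a different ref_ctr_offset for the
    same inode+offset -- mirroring find_old_trace_uprobe() above."""
    old = next((e for e in existing if e["name"] == new["name"]), None)
    for e in existing:
        if e is old:
            continue  # replacing this exact event is the allowed exception
        if (e["inode"] == new["inode"] and e["offset"] == new["offset"]
                and e["ref_ctr"] != new["ref_ctr"]):
            raise ValueError("Reference counter offset mismatch.")
    return old
```

Trying to add `sdt_tick/loop2_1` with a different ref_ctr_offset at the same inode+offset raises, while re-registering `sdt_tick/loop2` itself with a new ref_ctr_offset is accepted as a replacement.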
[PATCH v7 1/6] Uprobes: Simplify uprobe_register() body
Simplify the uprobe_register() function body and let __uprobe_register() handle everything. Also move dependency functions around to fix build failures.

Signed-off-by: Ravi Bangoria
---
 kernel/events/uprobes.c | 69 ++---
 1 file changed, 36 insertions(+), 33 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index ccc579a7d32e..471eac896635 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -840,13 +840,8 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
 	return err;
 }
 
-static int __uprobe_register(struct uprobe *uprobe, struct uprobe_consumer *uc)
-{
-	consumer_add(uprobe, uc);
-	return register_for_each_vma(uprobe, uc);
-}
-
-static void __uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
+static void
+__uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
 {
 	int err;
 
@@ -860,24 +855,46 @@ static void __uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *u
 }
 
 /*
- * uprobe_register - register a probe
+ * uprobe_unregister - unregister a already registered probe.
+ * @inode: the file in which the probe has to be removed.
+ * @offset: offset from the start of the file.
+ * @uc: identify which probe if multiple probes are colocated.
+ */
+void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
+{
+	struct uprobe *uprobe;
+
+	uprobe = find_uprobe(inode, offset);
+	if (WARN_ON(!uprobe))
+		return;
+
+	down_write(&uprobe->register_rwsem);
+	__uprobe_unregister(uprobe, uc);
+	up_write(&uprobe->register_rwsem);
+	put_uprobe(uprobe);
+}
+EXPORT_SYMBOL_GPL(uprobe_unregister);
+
+/*
+ * __uprobe_register - register a probe
  * @inode: the file in which the probe has to be placed.
  * @offset: offset from the start of the file.
  * @uc: information on howto handle the probe..
  *
- * Apart from the access refcount, uprobe_register() takes a creation
+ * Apart from the access refcount, __uprobe_register() takes a creation
  * refcount (thro alloc_uprobe) if and only if this @uprobe is getting
  * inserted into the rbtree (i.e first consumer for a @inode:@offset
  * tuple).  Creation refcount stops uprobe_unregister from freeing the
  * @uprobe even before the register operation is complete. Creation
  * refcount is released when the last @uc for the @uprobe
- * unregisters. Caller of uprobe_register() is required to keep @inode
+ * unregisters. Caller of __uprobe_register() is required to keep @inode
  * (and the containing mount) referenced.
  *
  * Return errno if it cannot successully install probes
  * else return 0 (success)
  */
-int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
+static int __uprobe_register(struct inode *inode, loff_t offset,
+			     struct uprobe_consumer *uc)
 {
 	struct uprobe *uprobe;
 	int ret;
@@ -904,7 +921,8 @@ int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *
 	down_write(&uprobe->register_rwsem);
 	ret = -EAGAIN;
 	if (likely(uprobe_is_active(uprobe))) {
-		ret = __uprobe_register(uprobe, uc);
+		consumer_add(uprobe, uc);
+		ret = register_for_each_vma(uprobe, uc);
 		if (ret)
 			__uprobe_unregister(uprobe, uc);
 	}
@@ -915,6 +933,12 @@ int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *
 		goto retry;
 	return ret;
 }
+
+int uprobe_register(struct inode *inode, loff_t offset,
+		    struct uprobe_consumer *uc)
+{
+	return __uprobe_register(inode, offset, uc);
+}
 EXPORT_SYMBOL_GPL(uprobe_register);
 
 /*
@@ -946,27 +970,6 @@ int uprobe_apply(struct inode *inode, loff_t offset,
 	return ret;
 }
 
-/*
- * uprobe_unregister - unregister a already registered probe.
- * @inode: the file in which the probe has to be removed.
- * @offset: offset from the start of the file.
- * @uc: identify which probe if multiple probes are colocated.
- */
-void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
-{
-	struct uprobe *uprobe;
-
-	uprobe = find_uprobe(inode, offset);
-	if (WARN_ON(!uprobe))
-		return;
-
-	down_write(&uprobe->register_rwsem);
-	__uprobe_unregister(uprobe, uc);
-	up_write(&uprobe->register_rwsem);
-	put_uprobe(uprobe);
-}
-EXPORT_SYMBOL_GPL(uprobe_unregister);
-
 static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm)
 {
 	struct vm_area_struct *vma;
-- 
2.14.4
Re: [PATCH] perf test 39 (Session topology) dumps core on s390
Hi Thomas,

On 05/24/2018 07:26 PM, Thomas Richter wrote:
> @@ -95,7 +98,7 @@ int test__session_topology(struct test *test __maybe_unused, int subtest __maybe
> {
> 	char path[PATH_MAX];
> 	struct cpu_map *map;
> -	int ret = -1;
> +	int ret;

This is failing for me:

  tests/topology.c: In function ‘test__session_topology’:
  tests/topology.c:101:6: error: ‘ret’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
    int ret;
        ^
  cc1: all warnings being treated as errors
    CC       util/intlist.o
  mv: cannot stat 'tests/.topology.o.tmp': No such file or directory
  /home/ravi/Workspace/linux/tools/build/Makefile.build:96: recipe for target 'tests/topology.o' failed
  make[4]: *** [tests/topology.o] Error 1
  make[4]: *** Waiting for unfinished jobs

Thanks,
Ravi
[PATCH] perf script powerpc: Python script for hypervisor call statistics
Add python script to show hypervisor call statistics. Ex, # perf record -a -e "{powerpc:hcall_entry,powerpc:hcall_exit}" # perf script -s scripts/python/powerpc-hcalls.py hcallcount min(ns) max(ns) avg(ns) H_RANDOM82 838 1164 904 H_PUT_TCE 47 1078 5928 2003 H_EOI 266 1336 3546 1654 H_ENTER 28 1646 4038 1952 H_PUT_TCE_INDIRECT 230 2166 18168 6109 H_IPI 238 1072 3232 1688 H_SEND_LOGICAL_LAN 42 5488 21366 7694 H_STUFF_TCE294 986 6210 3591 H_XIRR 266 2286 6990 3783 H_PROTECT 10 2196 3556 2555 H_VIO_SIGNAL 294 1028 2784 1311 H_ADD_LOGICAL_LAN_BUFFER53 1978 3450 2600 H_SEND_CRQ 77 1762 7240 2447 Signed-off-by: Ravi Bangoria --- .../perf/scripts/python/bin/powerpc-hcalls-record | 2 + .../perf/scripts/python/bin/powerpc-hcalls-report | 2 + tools/perf/scripts/python/powerpc-hcalls.py| 200 + 3 files changed, 204 insertions(+) create mode 100644 tools/perf/scripts/python/bin/powerpc-hcalls-record create mode 100644 tools/perf/scripts/python/bin/powerpc-hcalls-report create mode 100644 tools/perf/scripts/python/powerpc-hcalls.py diff --git a/tools/perf/scripts/python/bin/powerpc-hcalls-record b/tools/perf/scripts/python/bin/powerpc-hcalls-record new file mode 100644 index ..b7402aa9147d --- /dev/null +++ b/tools/perf/scripts/python/bin/powerpc-hcalls-record @@ -0,0 +1,2 @@ +#!/bin/bash +perf record -e "{powerpc:hcall_entry,powerpc:hcall_exit}" $@ diff --git a/tools/perf/scripts/python/bin/powerpc-hcalls-report b/tools/perf/scripts/python/bin/powerpc-hcalls-report new file mode 100644 index ..dd32ad7465f6 --- /dev/null +++ b/tools/perf/scripts/python/bin/powerpc-hcalls-report @@ -0,0 +1,2 @@ +#!/bin/bash +perf script $@ -s "$PERF_EXEC_PATH"/scripts/python/powerpc-hcalls.py diff --git a/tools/perf/scripts/python/powerpc-hcalls.py b/tools/perf/scripts/python/powerpc-hcalls.py new file mode 100644 index ..ff732118cae8 --- /dev/null +++ b/tools/perf/scripts/python/powerpc-hcalls.py @@ -0,0 +1,200 @@ +# SPDX-License-Identifier: GPL-2.0+ +# +# Copyright (C) 2018 Ravi Bangoria, IBM 
Corporation +# +# Hypervisor call statisics + +import os +import sys + +sys.path.append(os.environ['PERF_EXEC_PATH'] + \ + '/scripts/python/Perf-Trace-Util/lib/Perf/Trace') + +from perf_trace_context import * +from Core import * +from Util import * + +# output: { +# opcode: { +# 'min': minimum time nsec +# 'max': maximum time nsec +# 'time': average time nsec +# 'cnt': counter +# } ... +# } +output = {} + +# d_enter: { +# cpu: { +# opcode: nsec +# } ... +# } +d_enter = {} + +hcall_table = { + 4: 'H_REMOVE', + 8: 'H_ENTER', + 12: 'H_READ', + 16: 'H_CLEAR_MOD', + 20: 'H_CLEAR_REF', + 24: 'H_PROTECT', + 28: 'H_GET_TCE', + 32: 'H_PUT_TCE', + 36: 'H_SET_SPRG0', + 40: 'H_SET_DABR', + 44: 'H_PAGE_INIT', + 48: 'H_SET_ASR', + 52: 'H_ASR_ON', + 56: 'H_ASR_OFF', + 60: 'H_LOGICAL_CI_LOAD', + 64: 'H_LOGICAL_CI_STORE', + 68: 'H_LOGICAL_CACHE_LOAD', + 72: 'H_LOGICAL_CACHE_STORE', + 76: 'H_LOGICAL_ICBI', + 80: 'H_LOGICAL_DCBF', + 84: 'H_GET_TERM_CHAR', + 88: 'H_PUT_TERM_CHAR', + 92: 'H_REAL_TO_LOGICAL', + 96: 'H_HYPERVISOR_DATA', + 100: 'H_EOI', + 104: 'H_CPPR', + 108: 'H_IPI', + 112: 'H_IPOLL', + 116: 'H_XIRR', + 120: 'H_MIGRATE_DMA', + 124: 'H_PERFMON', + 220: 'H_REGISTER_VPA', + 224: 'H_CEDE', + 228: 'H_CONFER', + 232: 'H_PROD', + 236: 'H_GET_PPP', + 240: 'H_SET_PPP', + 244: 'H_PURR', + 248: 'H_PIC', + 252: 'H_REG_CRQ', + 256: 'H_FREE_CRQ', + 260: 'H_VIO_SIGNAL', + 264: 'H_SEND_CRQ', + 272: 'H_COPY_RDMA', + 276: 'H_REGISTER_LOGICAL_LAN', + 280: 'H_FREE_LOGICAL_LAN', + 284: 'H_ADD_LOGICAL_LAN_BUFFER', + 288: 'H_SEND_LOGICAL_LAN', + 292: 'H_BULK_REMOVE', + 304: 'H_MULTICAST_CTRL', + 308: 'H_SET_XDABR', + 312: 'H_STUFF_TCE', + 316: 'H_PUT_TCE_INDIRECT', + 332: 'H_CHANGE_LOGICAL_LAN_MAC', + 336: 'H_VTERM_PARTNER_INFO', + 340: 'H_REGISTER_VTERM', + 344: 'H_FREE_VTERM'
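The bookkeeping that the script performs — pairing each `hcall_entry` with the matching `hcall_exit` on the same CPU, then keeping per-opcode min/max/avg — can be condensed into a standalone sketch. This is a simplified illustration with made-up sample data, not real trace output or the script itself:

```python
def aggregate(events):
    """events: iterable of (kind, cpu, opcode, ns) tuples,
    kind being 'enter' or 'exit'. Returns per-opcode
    (min, max, avg, count) of entry->exit latencies in ns."""
    pending = {}   # (cpu, opcode) -> entry timestamp, like d_enter above
    stats = {}     # opcode -> running min/max/total/cnt, like `output`
    for kind, cpu, op, ns in events:
        if kind == "enter":
            pending[(cpu, op)] = ns
        elif (cpu, op) in pending:          # exit without enter is ignored
            d = ns - pending.pop((cpu, op))
            s = stats.setdefault(op, {"min": d, "max": d, "total": 0, "cnt": 0})
            s["min"] = min(s["min"], d)
            s["max"] = max(s["max"], d)
            s["total"] += d
            s["cnt"] += 1
    return {op: (s["min"], s["max"], s["total"] // s["cnt"], s["cnt"])
            for op, s in stats.items()}

sample = [
    ("enter", 0, 100, 1000), ("exit", 0, 100, 1020),   # 20 ns
    ("enter", 1, 100, 2000), ("exit", 1, 100, 2080),   # 80 ns
]
print(aggregate(sample))   # -> {100: (20, 80, 50, 2)}
```

Keying the pending dict on (cpu, opcode) matters: the same hcall can be in flight on several CPUs at once, so a flat per-opcode timestamp would pair the wrong entry/exit events.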
[PATCH 0/7] Uprobes: Support SDT markers having reference count (semaphore)
Why RFC again:

This series is different from earlier versions[1]. The earlier series implemented this feature in trace_uprobe, while this one implements the logic in the core uprobe code. A few reasons for this:

1. One of the major reasons was the deadlock between uprobe_lock and mm->mmap_sem inside trace_uprobe_mmap(). That deadlock was not easy to fix because mm->mmap_sem is not under the control of trace_uprobe_mmap(), and it has to take uprobe_lock to loop over the trace_uprobe list. More details can be found at [2]. With this new approach, no deadlocks have been found so far.

2. Many core uprobe functions and data structures would need to be exported to keep the earlier implementation simple. With this new approach, the reference counter logic is implemented in core uprobe code and thus nothing needs to be exported.

Description:

Userspace Statically Defined Tracepoints[3] are dtrace-style markers inside userspace applications. Applications like PostgreSQL, MySQL, Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc. have these markers embedded in them. These markers are added by developers at important places in the code. Each marker source expands to a single nop instruction in the compiled code, but there may be additional overhead for computing the marker arguments, which expands to a couple of instructions. If this overhead is significant, the runtime can skip it with an if() condition when no one is tracing the marker:

	if (reference_counter > 0) {
		Execute marker instructions;
	}

The default value of the reference counter is 0. A tracer has to increment the reference counter before tracing a marker and decrement it when done with the tracing.

Currently, the perf tool has limited support for SDT markers, i.e. it cannot trace markers surrounded by a reference counter. Also, it's not easy to add reference counter logic in a userspace tool like perf, so the basic idea of this patch set is to add the reference counter logic in the trace_uprobe infrastructure. For example,[4]

 # cat tick.c
 ...
 for (i = 0; i < 100; i++) {
	DTRACE_PROBE1(tick, loop1, i);
	if (TICK_LOOP2_ENABLED()) {
		DTRACE_PROBE1(tick, loop2, i);
	}
	printf("hi: %d\n", i);
	sleep(1);
 }
 ...

Here tick:loop1 is a marker without a reference counter, whereas tick:loop2 is surrounded by a reference counter condition.

 # perf buildid-cache --add /tmp/tick
 # perf probe sdt_tick:loop1
 # perf probe sdt_tick:loop2
 # perf stat -e sdt_tick:loop1,sdt_tick:loop2 -- /tmp/tick
   hi: 0
   hi: 1
   hi: 2
   ^C
 Performance counter stats for '/tmp/tick':
             3      sdt_tick:loop1
             0      sdt_tick:loop2
   2.747086086 seconds time elapsed

Perf failed to record data for tick:loop2. The same experiment with this patch series:

 # ./perf buildid-cache --add /tmp/tick
 # ./perf probe sdt_tick:loop2
 # ./perf stat -e sdt_tick:loop2 /tmp/tick
   hi: 0
   hi: 1
   hi: 2
   ^C
 Performance counter stats for '/tmp/tick':
             3      sdt_tick:loop2
   2.561851452 seconds time elapsed

Notes:

- The 'reference counter' is called a 'semaphore' in the original Dtrace (and Systemtap, bcc and even ELF) documentation and code. But the term 'semaphore' is misleading in this context: it is just a counter that holds the number of tracers tracing a marker, and it is not used for any synchronization. So we refer to it as a 'reference counter' in kernel / perf code.

- This patch set still has one issue. If there are multiple instances of the same application running and the user wants to trace one particular instance, trace_uprobe updates the reference counter in all instances. This is not a problem on the user side, because the instruction is not replaced with a trap/int3, and thus the user will only see samples from the process of interest. But it is still a correctness issue. I'm working on a fix for this.
[1] https://lkml.org/lkml/2018/4/17/23
[2] https://lkml.org/lkml/2018/5/25/111
[3] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation
[4] https://github.com/iovisor/bcc/issues/327#issuecomment-200576506

Ravi Bangoria (7):
  Uprobes: Simplify uprobe_register() body
  Uprobes: Support SDT markers having reference count (semaphore)
  Uprobes/sdt: Fix multiple update of same reference counter
  trace_uprobe/sdt: Prevent multiple reference counter for same uprobe
  Uprobes/sdt: Prevent multiple reference counter for same uprobe
  Uprobes/sdt: Document about reference counter
  perf probe: Support SDT markers having reference counter (semaphore)

 Documentation/trace/uprobetracer.rst |  16 +-
 include/linux/uprobes.h              |   5 +
 kernel/events/uprobes.c              | 502 +++
 kernel/trace/trace.c                 |   2 +-
 kernel/trace/trace_uprobe.c          |  74 +-
 tools/perf/util/probe-event.c        |  39 ++-
 tools/perf/util/probe-event.h        |   1 +
 tools/
Re: [PATCH] perf stat: Fix shadow stats for clock events
On 11/16/18 7:05 PM, Jiri Olsa wrote:
> On Fri, Nov 16, 2018 at 09:58:43AM +0530, Ravi Bangoria wrote:
>> Commit 0aa802a79469 ("perf stat: Get rid of extra clock display
>> function") introduced scale and unit for clock events. Thus,
>> perf_stat__update_shadow_stats() now saves scaled values of
>> clock events in msecs, instead of original nsecs. But while
>> calculating values of shadow stats we still consider clock
>> event values in nsecs. This results in wrong shadow stat
>> values. Ex,
>>
>>   # ./perf stat -e task-clock,cycles ls
>>
>>        2.60 msec task-clock:u    #    0.877 CPUs utilized
>>   2,430,564      cycles:u        # 1215282.000 GHz
>>
>> Fix this by saving original nsec values for clock events in
>> perf_stat__update_shadow_stats(). After patch:
>>
>>   # ./perf stat -e task-clock,cycles ls
>>
>>        3.14 msec task-clock:u    #    0.839 CPUs utilized
>>   3,094,528      cycles:u        #    0.985 GHz
>>
>> Reported-by: Anton Blanchard
>> Suggested-by: Jiri Olsa
>> Fixes: 0aa802a79469 ("perf stat: Get rid of extra clock display function")
>> Signed-off-by: Ravi Bangoria
>
> Reviewed-by: Jiri Olsa

Hi Arnaldo,

Please pull this. Thanks.
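The bug fixed here boils down to a unit mismatch: the shadow-stat code divides cycles by a task-clock value it believes is in nanoseconds, but the msec-scaled value was stored instead. A back-of-the-envelope check (using the numbers from the commit message's "after" run, and treating the msec conversion as an assumption for illustration) reproduces both the sane and the absurd GHz figures:

```python
cycles = 3_094_528
task_clock_ns = 3_140_000        # 3.14 msec of task-clock, in nsec

# correct: the shadow stats keep the raw nsec value
ghz_ok = cycles / task_clock_ns           # cycles per nsec == GHz
print(f"{ghz_ok:.3f} GHz")                # ~0.986 GHz

# buggy: the msec-scaled value is stored, but still divided as if nsec
task_clock_msec = task_clock_ns / 1e6
ghz_bug = cycles / task_clock_msec
print(f"{ghz_bug:.3f} GHz")               # absurd, ~1e6x too large
```

The factor between the two results is exactly the nsec-to-msec scale (10^6), which matches the "1215282.000 GHz" symptom in the report.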
[PATCH] Uprobes: Fix kernel oops with delayed_uprobe_remove()
syzbot reported a kernel crash with delayed_uprobe_remove():
https://lkml.org/lkml/2018/11/1/1244

The backtrace mentioned in the link points to a race between process exit and uprobe_unregister(). Fix it by locking delayed_uprobe_lock before calling delayed_uprobe_remove() from put_uprobe().

Reported-by: syzbot+cb1fb754b771caca0...@syzkaller.appspotmail.com
Signed-off-by: Ravi Bangoria
---
 kernel/events/uprobes.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 96fb51f3994f..e527c4753d4f 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -572,7 +572,9 @@ static void put_uprobe(struct uprobe *uprobe)
 		 * gets called, we don't get a chance to remove uprobe from
 		 * delayed_uprobe_list from remove_breakpoint(). Do it here.
 		 */
+		mutex_lock(&delayed_uprobe_lock);
 		delayed_uprobe_remove(uprobe, NULL);
+		mutex_unlock(&delayed_uprobe_lock);
 		kfree(uprobe);
 	}
 }
-- 
2.19.1
Re: mainline/master boot bisection: v4.20-rc5-79-gabb8d6ecbd8f on jetson-tk1
Hi,

Can you please provide more details? I don't understand how this patch can cause a boot failure.

From the log found at
https://storage.kernelci.org/mainline/master/v4.20-rc5-79-gabb8d6ecbd8f/arm/multi_v7_defconfig+CONFIG_EFI=y+CONFIG_ARM_LPAE=y/lab-baylibre/boot-tegra124-jetson-tk1.html

23:21:06.680269 [    7.500733] Unable to handle kernel NULL pointer dereference at virtual address 0064
23:21:06.680455 [    7.508893] pgd = (ptrval)
23:21:06.721940 [    7.511591] [0064] *pgd=ad7d8003, *pmd=f9d5d003
23:21:06.722241 [    7.516500] Internal error: Oops: 207 [#1] SMP ARM
...
23:21:06.722724 [    7.546706] CPU: 0 PID: 122 Comm: udevd Not tainted 4.20.0-rc5 #1
23:21:06.722911 [    7.552785] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
23:21:06.765203 [    7.559045] PC is at drm_plane_register_all+0x18/0x50
23:21:06.765493 [    7.564094] LR is at drm_modeset_register_all+0xc/0x6c
23:21:06.765698 [    7.569217] pc : []    lr : []    psr: a013
23:21:06.765882 [    7.575470] sp : c3451c70  ip : 2d827000  fp : c1804c48
23:21:06.766053 [    7.580680] r10:   r9 : ec9cc300  r8 :
23:21:06.766229 [    7.585893] r7 : bf193c80  r6 :   r5 : c3694224  r4 : fffc
23:21:06.766403 [    7.592404] r3 : 2000  r2 : 0002f000  r1 : eef92cf0  r0 : c3694000
...
23:21:07.068237 [    7.880215] [] (drm_plane_register_all) from [] (drm_modeset_register_all+0xc/0x6c)
23:21:07.068493 [    7.889603] [] (drm_modeset_register_all) from [] (drm_dev_register+0x16c/0x1c4)
23:21:07.109960 [    7.898915] [] (drm_dev_register) from [] (nouveau_platform_probe+0x54/0x8c [nouveau])
23:21:07.110285 [    7.908750] [] (nouveau_platform_probe [nouveau]) from [] (platform_drv_probe+0x48/0x98)
23:21:07.110515 [    7.918572] [] (platform_drv_probe) from [] (really_probe+0x228/0x2d0)
23:21:07.110706 [    7.926832] [] (really_probe) from [] (driver_probe_device+0x60/0x174)
23:21:07.110893 [    7.935093] [] (driver_probe_device) from [] (__driver_attach+0xd0/0xd4)
23:21:07.153794 [    7.943528] [] (__driver_attach) from [] (bus_for_each_dev+0x74/0xb4)
23:21:07.154133 [    7.951688] [] (bus_for_each_dev) from [] (bus_add_driver+0x18c/0x210)
23:21:07.154352 [    7.959946] [] (bus_add_driver) from [] (driver_register+0x74/0x108)
23:21:07.154544 [    7.968212] [] (driver_register) from [] (nouveau_drm_init+0x170/0x1000 [nouveau])
23:21:07.154739 [    7.977692] [] (nouveau_drm_init [nouveau]) from [] (do_one_initcall+0x54/0x1fc)
23:21:07.197008 [    7.986820] [] (do_one_initcall) from [] (do_init_module+0x64/0x1f4)
23:21:07.197344 [    7.994906] [] (do_init_module) from [] (load_module+0x1ee8/0x23c8)
23:21:07.197553 [    8.002907] [] (load_module) from [] (sys_finit_module+0xac/0xd8)
23:21:07.197751 [    8.010722] [] (sys_finit_module) from [] (ret_fast_syscall+0x0/0x4c)
23:21:07.197935 [    8.018884] Exception stack(0xc3451fa8 to 0xc3451ff0)

Both PC and LR point to drm_* code. I don't see how this is related to uprobes. Did I miss anything?

Thanks,
Ravi

On 12/7/18 9:30 AM, kernelci.org bot wrote:
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * This automated bisection report was sent to you on the basis  *
> * that you may be involved with the breaking commit it has      *
> * found. No manual investigation has been done to verify it,    *
> * and the root cause of the problem may be somewhere else.      *
> * Hope this helps!                                              *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
>
> Bisection result for mainline/master (v4.20-rc5-79-gabb8d6ecbd8f) on jetson-tk1
>
> Good: cf76c364a1e1 Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
> Bad:  abb8d6ecbd8f Merge tag 'trace-v4.20-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
> Found: 1aed58e67a6e Uprobes: Fix kernel oops with delayed_uprobe_remove()
>
> Details:
>   Good: https://kernelci.org/boot/all/job/mainline/branch/master/kernel/v4.20-rc5-62-gcf76c364a1e1/
>   Bad:  https://kernelci.org/boot/all/job/mainline/branch/master/kernel/v4.20-rc5-79-gabb8d6ecbd8f/
>
> Checks:
>   revert:  PASS
>   verify:  PASS
>
> Parameters:
>   Tree:   mainline
>   URL:    http://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>   Branch: master
>   Target: jetson-tk1
>   Lab:    lab-baylibre
>   Config: multi_v7_defconfig
>   Plan:   boot
>
> Breaking commit found:
>
> ---
> commit 1aed58e67a6ec1e7a18bfabe8ba6ec2d27c15636
> Author: Ravi Bangoria
> Date:   Wed Dec 5 09:04:23 2018 +0530
>
>     Uprobes: Fix kernel oops with delayed_uprobe
[PATCH v2] Uprobes: Fix kernel oops with delayed_uprobe_remove()
There could be a race between task exit and probe unregister: exit_mm() mmput() __mmput() uprobe_unregister() uprobe_clear_state() put_uprobe() delayed_uprobe_remove() delayed_uprobe_remove() put_uprobe() is calling delayed_uprobe_remove() without taking delayed_uprobe_lock and thus the race sometimes results in a kernel crash. Fix this by taking delayed_uprobe_lock before calling delayed_uprobe_remove() from put_uprobe(). Detailed crash log can be found at: https://lkml.org/lkml/2018/11/1/1244 Reported-by: syzbot+cb1fb754b771caca0...@syzkaller.appspotmail.com Fixes: 1cc33161a83d ("uprobes: Support SDT markers having reference count (semaphore)") Signed-off-by: Ravi Bangoria --- kernel/events/uprobes.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 322e97bbb437..abbd8da9ac21 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -572,7 +572,9 @@ static void put_uprobe(struct uprobe *uprobe) * gets called, we don't get a chance to remove uprobe from * delayed_uprobe_list from remove_breakpoint(). Do it here. */ + mutex_lock(&delayed_uprobe_lock); delayed_uprobe_remove(uprobe, NULL); + mutex_unlock(&delayed_uprobe_lock); kfree(uprobe); } } -- 2.19.2
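The invariant the fix restores — every path that unlinks entries from the shared delayed list must hold the same lock — can be sketched as a userspace analogue. This is an illustrative sketch, not kernel code: the names and the intrusive list are hypothetical, with a pthread mutex standing in for delayed_uprobe_lock.

```c
#include <pthread.h>
#include <stdlib.h>

/* Userspace sketch of the pattern the fix enforces: every walk that
 * unlinks entries from the shared list holds the same mutex, mirroring
 * delayed_uprobe_lock around delayed_uprobe_remove(). */
struct delayed_entry {
    struct delayed_entry *next;
    void *uprobe;               /* stand-in for the uprobe/mm keys */
};

static pthread_mutex_t delayed_lock = PTHREAD_MUTEX_INITIALIZER;
static struct delayed_entry *delayed_list;

void delayed_add(void *uprobe)
{
    struct delayed_entry *e = malloc(sizeof(*e));
    e->uprobe = uprobe;
    pthread_mutex_lock(&delayed_lock);
    e->next = delayed_list;
    delayed_list = e;
    pthread_mutex_unlock(&delayed_lock);
}

/* Remove all entries matching 'uprobe'; NULL matches everything,
 * loosely like delayed_uprobe_remove(uprobe, NULL). Returns the
 * number of entries removed. */
int delayed_remove(void *uprobe)
{
    int removed = 0;
    pthread_mutex_lock(&delayed_lock);
    for (struct delayed_entry **p = &delayed_list; *p; ) {
        if (!uprobe || (*p)->uprobe == uprobe) {
            struct delayed_entry *victim = *p;
            *p = victim->next;
            free(victim);
            removed++;
        } else {
            p = &(*p)->next;
        }
    }
    pthread_mutex_unlock(&delayed_lock);
    return removed;
}
```

Without the lock, two concurrent removers can both unlink the same entry and free it twice — the crash mode the patch closes.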
[PATCH v9 0/4] Uprobes: Support SDT markers having reference count (semaphore)
v8 -> v9: - Rebased to rostedt/for-next (Commit bb730b5833b5 to be precise) - Not including first two patches now. They are already pulled by Steven. - Change delayed_uprobe_remove() function as suggested by Oleg - Dump inode, offset, ref_ctr_offset, mm etc if we fail to update reference counter. - Rename delayed_uprobe_install() to delayed_ref_ctr_inc() - Use 'short d' (delta) in update_ref_ctr() in place of 'bool is_register'. v8: https://lkml.org/lkml/2018/8/9/81 Future work: - Optimize uprobe_mmap()->delayed_ref_ctr_inc() by making delayed_uprobe_list per mm. Description: Userspace Statically Defined Tracepoints[1] are dtrace style markers inside userspace applications. Applications like PostgreSQL, MySQL, Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc have these markers embedded in them. These markers are added by developers at important places in the code. Each marker source expands to a single nop instruction in the compiled code, but there may be additional overhead for computing the marker arguments, which expand to a couple of instructions. If that overhead is a concern, execution can be omitted via a runtime if() condition when no one is tracing on the marker: if (reference_counter > 0) { Execute marker instructions; } The default value of the reference counter is 0. A tracer has to increment the reference counter before tracing on a marker and decrement it when done with the tracing. Currently, the perf tool has limited support for SDT markers, i.e. it cannot trace markers surrounded by a reference counter. Also, it's not easy to add reference counter logic in a userspace tool like perf, so the basic idea of this patchset is to add the reference counter logic in the uprobe infrastructure. Ex, [2] # cat tick.c ... for (i = 0; i < 100; i++) { DTRACE_PROBE1(tick, loop1, i); if (TICK_LOOP2_ENABLED()) { DTRACE_PROBE1(tick, loop2, i); } printf("hi: %d\n", i); sleep(1); } ... 
Here tick:loop1 is a marker without a reference counter, whereas tick:loop2 is surrounded by a reference counter condition. # perf buildid-cache --add /tmp/tick # perf probe sdt_tick:loop1 # perf probe sdt_tick:loop2 # perf stat -e sdt_tick:loop1,sdt_tick:loop2 -- /tmp/tick hi: 0 hi: 1 hi: 2 ^C Performance counter stats for '/tmp/tick': 3 sdt_tick:loop1 0 sdt_tick:loop2 2.747086086 seconds time elapsed Perf failed to record data for tick:loop2. Same experiment with this patch series: # ./perf buildid-cache --add /tmp/tick # ./perf probe sdt_tick:loop2 # ./perf stat -e sdt_tick:loop2 /tmp/tick hi: 0 hi: 1 hi: 2 ^C Performance counter stats for '/tmp/tick': 3 sdt_tick:loop2 2.561851452 seconds time elapsed [1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation [2] https://github.com/iovisor/bcc/issues/327#issuecomment-200576506 Ravi Bangoria (4): Uprobes: Support SDT markers having reference count (semaphore) Uprobes/sdt: Prevent multiple reference counter for same uprobe trace_uprobe/sdt: Prevent multiple reference counter for same uprobe perf probe: Support SDT markers having reference counter (semaphore) include/linux/uprobes.h | 5 + kernel/events/uprobes.c | 278 -- kernel/trace/trace.c | 2 +- kernel/trace/trace_uprobe.c | 75 +++- tools/perf/util/probe-event.c | 39 +- tools/perf/util/probe-event.h | 1 + tools/perf/util/probe-file.c | 34 +- tools/perf/util/probe-file.h | 1 + tools/perf/util/symbol-elf.c | 46 +-- tools/perf/util/symbol.h | 7 ++ 10 files changed, 453 insertions(+), 35 deletions(-) -- 2.14.4
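The reference-counter convention is easy to model without the sys/sdt.h macros. Below is a minimal hand-rolled sketch of the guard the cover letter describes — hypothetical names, and real SDT markers should of course come from <sys/sdt.h> so the semaphore lands in the .probes section:

```c
#include <stdio.h>

/* Stand-in for the semaphore the SDT macros emit: it stays 0 until a
 * tracer arms the marker by incrementing it. */
volatile unsigned short loop2_semaphore;

static int marker_fired;   /* instrumentation for the example only */

/* The guarded marker: argument setup and the probe site are skipped
 * entirely unless some tracer bumped the reference counter. */
void tick_loop2(int i)
{
    if (loop2_semaphore > 0) {
        /* expensive argument computation + the nop probe site live here */
        marker_fired++;
        printf("loop2 hit: %d\n", i);
    }
}
```

With the counter at 0 the marker body never runs, which is exactly why perf must increment it (via the uprobe infrastructure) before tracing and decrement it afterwards.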
[PATCH v9 2/4] Uprobes/sdt: Prevent multiple reference counter for same uprobe
We assume there is only one reference counter for one uprobe. Don't allow users to register multiple uprobes having the same inode+offset but different reference counters. Signed-off-by: Ravi Bangoria Acked-by: Srikar Dronamraju Reviewed-by: Oleg Nesterov --- kernel/events/uprobes.c | 19 +++ 1 file changed, 19 insertions(+) diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 35065febcb6c..ecee371a59c7 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -679,6 +679,16 @@ static struct uprobe *insert_uprobe(struct uprobe *uprobe) return u; } +static void +ref_ctr_mismatch_warn(struct uprobe *cur_uprobe, struct uprobe *uprobe) +{ + pr_warn("ref_ctr_offset mismatch. inode: 0x%lx offset: 0x%llx " + "ref_ctr_offset(old): 0x%llx ref_ctr_offset(new): 0x%llx\n", + uprobe->inode->i_ino, (unsigned long long) uprobe->offset, + (unsigned long long) cur_uprobe->ref_ctr_offset, + (unsigned long long) uprobe->ref_ctr_offset); +} + static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset, loff_t ref_ctr_offset) { @@ -698,6 +708,12 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset, cur_uprobe = insert_uprobe(uprobe); /* a uprobe exists for this inode:offset combination */ if (cur_uprobe) { + if (cur_uprobe->ref_ctr_offset != uprobe->ref_ctr_offset) { + ref_ctr_mismatch_warn(cur_uprobe, uprobe); + put_uprobe(cur_uprobe); + kfree(uprobe); + return ERR_PTR(-EINVAL); + } kfree(uprobe); uprobe = cur_uprobe; } @@ -1112,6 +1128,9 @@ static int __uprobe_register(struct inode *inode, loff_t offset, uprobe = alloc_uprobe(inode, offset, ref_ctr_offset); if (!uprobe) return -ENOMEM; + if (IS_ERR(uprobe)) + return PTR_ERR(uprobe); + /* * We can race with uprobe_unregister()->delete_uprobe(). * Check uprobe_is_active() and retry if it is false. -- 2.14.4
[PATCH v9 1/4] Uprobes: Support SDT markers having reference count (semaphore)
Userspace Statically Defined Tracepoints[1] are dtrace style markers inside userspace applications. Applications like PostgreSQL, MySQL, Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc have these markers embedded in them. These markers are added by developers at important places in the code. Each marker source expands to a single nop instruction in the compiled code, but there may be additional overhead for computing the marker arguments, which expand to a couple of instructions. If that overhead is a concern, execution can be omitted via a runtime if() condition when no one is tracing on the marker: if (reference_counter > 0) { Execute marker instructions; } The default value of the reference counter is 0. A tracer has to increment the reference counter before tracing on a marker and decrement it when done with the tracing. Implement the reference counter logic in core uprobe. Users will be able to use it from trace_uprobe as well as from kernel modules. The new trace_uprobe definition with a reference counter will now be: <path>:<offset>[(ref_ctr_offset)] where ref_ctr_offset is an optional field. For kernel modules, a new variant of uprobe_register() has been introduced: uprobe_register_refctr(inode, offset, ref_ctr_offset, consumer) No new variant for uprobe_unregister() because it's assumed to have only one reference counter for one uprobe. [1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation Note: the 'reference counter' is called a 'semaphore' in the original Dtrace (or Systemtap, bcc and even in ELF) documentation and code. But the term 'semaphore' is misleading in this context. This is just a counter used to hold the number of tracers tracing on a marker. It is not really used for any synchronization. So we are calling it a 'reference counter' in kernel / perf code. 
Signed-off-by: Ravi Bangoria Reviewed-by: Masami Hiramatsu [Only trace_uprobe.c] Reviewed-by: Oleg Nesterov --- include/linux/uprobes.h | 5 + kernel/events/uprobes.c | 259 ++-- kernel/trace/trace.c| 2 +- kernel/trace/trace_uprobe.c | 38 ++- 4 files changed, 293 insertions(+), 11 deletions(-) diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h index bb9d2084af03..103a48a48872 100644 --- a/include/linux/uprobes.h +++ b/include/linux/uprobes.h @@ -123,6 +123,7 @@ extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs); extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs); extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t); extern int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc); +extern int uprobe_register_refctr(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc); extern int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool); extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc); extern int uprobe_mmap(struct vm_area_struct *vma); @@ -160,6 +161,10 @@ uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc) { return -ENOSYS; } +static inline int uprobe_register_refctr(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc) +{ + return -ENOSYS; +} static inline int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool add) { diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 919c1ce32beb..35065febcb6c 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -73,6 +73,7 @@ struct uprobe { struct uprobe_consumer *consumers; struct inode*inode; /* Also hold a ref to inode */ loff_t offset; + loff_t ref_ctr_offset; unsigned long flags; /* @@ -88,6 +89,15 @@ struct uprobe { struct arch_uprobe arch; }; +struct delayed_uprobe { 
+ struct list_head list; + struct uprobe *uprobe; + struct mm_struct *mm; +}; + +static DEFINE_MUTEX(delayed_uprobe_lock); +static LIST_HEAD(delayed_uprobe_list); + /* * Execute out of line area: anonymous executable mapping installed * by the probed task to execute the copy of the original instruction */ @@ -282,6 +292,166 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t return 1; } +static struct delayed_uprobe * +delayed_uprobe_check(struct uprobe *uprobe, struct mm_struct *mm) +{ + struct delayed_uprobe *du; + + list_for_each_entry(du, &delayed_uprobe_list, list) + if (du->uprobe == uprobe && du->mm == mm) + return du; + return NULL; +} + +static int delayed_uprobe_add(struct uprobe *uprobe, struct mm_struct *mm) +{ + struct delayed_uprobe *du; + + if (delayed_uprobe
[PATCH v9 4/4] perf probe: Support SDT markers having reference counter (semaphore)
With this, perf buildid-cache will save SDT markers with reference counter in probe cache. Perf probe will be able to probe markers having reference counter. Ex, # readelf -n /tmp/tick | grep -A1 loop2 Name: loop2 ... Semaphore: 0x10020036 # ./perf buildid-cache --add /tmp/tick # ./perf probe sdt_tick:loop2 # ./perf stat -e sdt_tick:loop2 /tmp/tick hi: 0 hi: 1 hi: 2 ^C Performance counter stats for '/tmp/tick': 3 sdt_tick:loop2 2.561851452 seconds time elapsed Signed-off-by: Ravi Bangoria Acked-by: Masami Hiramatsu Acked-by: Srikar Dronamraju --- tools/perf/util/probe-event.c | 39 tools/perf/util/probe-event.h | 1 + tools/perf/util/probe-file.c | 34 ++-- tools/perf/util/probe-file.h | 1 + tools/perf/util/symbol-elf.c | 46 --- tools/perf/util/symbol.h | 7 +++ 6 files changed, 106 insertions(+), 22 deletions(-) diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c index f119eb628dbb..e86f8be89157 100644 --- a/tools/perf/util/probe-event.c +++ b/tools/perf/util/probe-event.c @@ -1819,6 +1825,12 @@ int parse_probe_trace_command(const char *cmd, struct probe_trace_event *tev) tp->offset = strtoul(fmt2_str, NULL, 10); } + if (tev->uprobes) { + fmt2_str = strchr(p, '('); + if (fmt2_str) + tp->ref_ctr_offset = strtoul(fmt2_str + 1, NULL, 0); + } + tev->nargs = argc - 2; tev->args = zalloc(sizeof(struct probe_trace_arg) * tev->nargs); if (tev->args == NULL) { @@ -2012,6 +2018,22 @@ static int synthesize_probe_trace_arg(struct probe_trace_arg *arg, return err; } +static int +synthesize_uprobe_trace_def(struct probe_trace_event *tev, struct strbuf *buf) +{ + struct probe_trace_point *tp = &tev->point; + int err; + + err = strbuf_addf(buf, "%s:0x%lx", tp->module, tp->address); + + if (err >= 0 && tp->ref_ctr_offset) { + if (!uprobe_ref_ctr_is_supported()) + return -1; + err = strbuf_addf(buf, "(0x%lx)", tp->ref_ctr_offset); + } + return err >= 0 ? 
0 : -1; +} + char *synthesize_probe_trace_command(struct probe_trace_event *tev) { struct probe_trace_point *tp = &tev->point; @@ -2041,15 +2063,17 @@ char *synthesize_probe_trace_command(struct probe_trace_event *tev) } /* Use the tp->address for uprobes */ - if (tev->uprobes) - err = strbuf_addf(&buf, "%s:0x%lx", tp->module, tp->address); - else if (!strncmp(tp->symbol, "0x", 2)) + if (tev->uprobes) { + err = synthesize_uprobe_trace_def(tev, &buf); + } else if (!strncmp(tp->symbol, "0x", 2)) { /* Absolute address. See try_to_find_absolute_address() */ err = strbuf_addf(&buf, "%s%s0x%lx", tp->module ?: "", tp->module ? ":" : "", tp->address); - else + } else { err = strbuf_addf(&buf, "%s%s%s+%lu", tp->module ?: "", tp->module ? ":" : "", tp->symbol, tp->offset); + } + if (err) goto error; @@ -2633,6 +2657,13 @@ static void warn_uprobe_event_compat(struct probe_trace_event *tev) { int i; char *buf = synthesize_probe_trace_command(tev); + struct probe_trace_point *tp = &tev->point; + + if (tp->ref_ctr_offset && !uprobe_ref_ctr_is_supported()) { + pr_warning("A semaphore is associated with %s:%s and " + "seems your kernel doesn't support it.\n", + tev->group, tev->event); + } /* Old uprobe event doesn't support memory dereference */ if (!tev->uprobes || tev->nargs == 0 || !buf) diff --git a/tools/perf/util/probe-event.h b/tools/perf/util/probe-event.h index 45b14f020558..15a98c3a2a2f 100644 --- a/tools/perf/util/probe-event.h +++ b/tools/perf/util/probe-event.h @@ -27,6 +27,7 @@ struct probe_trace_point { char *symbol; /* Base symbol */ char *module; /* Module name */ unsigned long offset; /* Offset from symbol */ + unsigned long ref_ctr_offset; /* SDT reference counter offset */ unsigned long address; /* Actual address of the trace point */ bool retprobe; /* Return probe flag */ }; diff --git a/tools/perf/util/probe-file.c b/tools/perf/util/probe-file.c index b76088fadf3d..aac7817d9e14 100644 --- a/tools/perf/util/probe-file.c
[PATCH v9 3/4] trace_uprobe/sdt: Prevent multiple reference counter for same uprobe
We assume there is only one reference counter for one uprobe. Don't allow users to add multiple trace_uprobe entries having the same inode+offset but different reference counters. Ex, # echo "p:sdt_tick/loop2 /home/ravi/tick:0x6e4(0x10036)" > uprobe_events # echo "p:sdt_tick/loop2_1 /home/ravi/tick:0x6e4(0xf)" >> uprobe_events bash: echo: write error: Invalid argument # dmesg trace_kprobe: Reference counter offset mismatch. There is one exception though: When user is trying to replace the old entry with the new one, we allow this if the new entry does not conflict with any other existing entries. Signed-off-by: Ravi Bangoria Acked-by: Srikar Dronamraju Reviewed-by: Song Liu Reviewed-by: Oleg Nesterov --- kernel/trace/trace_uprobe.c | 37 +++-- 1 file changed, 35 insertions(+), 2 deletions(-) diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c index a7ef6c4ca16e..21d9fffaa096 100644 --- a/kernel/trace/trace_uprobe.c +++ b/kernel/trace/trace_uprobe.c @@ -324,6 +324,35 @@ static int unregister_trace_uprobe(struct trace_uprobe *tu) return 0; } +/* + * Uprobe with multiple reference counter is not allowed. i.e. + * If inode and offset matches, reference counter offset *must* + * match as well. Though, there is one exception: If user is + * replacing old trace_uprobe with new one(same group/event), + * then we allow same uprobe with new reference counter as far + * as the new one does not conflict with any other existing + * ones. + */ +static struct trace_uprobe *find_old_trace_uprobe(struct trace_uprobe *new) +{ + struct trace_uprobe *tmp, *old = NULL; + struct inode *new_inode = d_real_inode(new->path.dentry); + + old = find_probe_event(trace_event_name(&new->tp.call), + new->tp.call.class->system); + + list_for_each_entry(tmp, &uprobe_list, list) { + if ((old ? 
old != tmp : true) && + new_inode == d_real_inode(tmp->path.dentry) && + new->offset == tmp->offset && + new->ref_ctr_offset != tmp->ref_ctr_offset) { + pr_warn("Reference counter offset mismatch."); + return ERR_PTR(-EINVAL); + } + } + return old; +} + /* Register a trace_uprobe and probe_event */ static int register_trace_uprobe(struct trace_uprobe *tu) { @@ -333,8 +362,12 @@ static int register_trace_uprobe(struct trace_uprobe *tu) mutex_lock(&uprobe_lock); /* register as an event */ - old_tu = find_probe_event(trace_event_name(&tu->tp.call), - tu->tp.call.class->system); + old_tu = find_old_trace_uprobe(tu); + if (IS_ERR(old_tu)) { + ret = PTR_ERR(old_tu); + goto end; + } + if (old_tu) { /* delete old event */ ret = unregister_trace_uprobe(old_tu); -- 2.14.4
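The rule this patch enforces reduces to a small predicate: same inode+offset implies same ref_ctr_offset, except that the entry being replaced is exempt. A hedged userspace sketch of that check, with simplified integer keys standing in for the inode/dentry lookups (all names hypothetical):

```c
#include <stddef.h>

/* Simplified trace_uprobe key: inode and offsets reduced to integers. */
struct tu {
    unsigned long inode;
    unsigned long offset;
    unsigned long ref_ctr_offset;
};

/* Return 0 if 'new' may be registered, -1 on a reference counter
 * conflict. 'old' is the entry being replaced (may be NULL) and is
 * exempt from the check, mirroring the exception in
 * find_old_trace_uprobe() above. */
int ref_ctr_conflict(const struct tu *list, size_t n,
                     const struct tu *old, const struct tu *new)
{
    for (size_t i = 0; i < n; i++) {
        const struct tu *tmp = &list[i];
        if (tmp == old)
            continue;           /* replacement target: allowed to differ */
        if (tmp->inode == new->inode &&
            tmp->offset == new->offset &&
            tmp->ref_ctr_offset != new->ref_ctr_offset)
            return -1;          /* one uprobe, two counters: reject */
    }
    return 0;
}
```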
Re: [PATCH v9 0/4] Uprobes: Support SDT markers having reference count (semaphore)
Hi Song, > root@virt-test:~# ~/a.out > 11 > semaphore 0 > semaphore 0 > semaphore 2 <<< when the uprobe is enabled Yes, this happens when multiple vmas point to the same file portion. Can you check /proc/`pgrep a.out`/maps? The logic is simple. If we are going to patch an instruction, increment the reference counter. If we are going to unpatch an instruction, decrement the reference counter. In this case, we patched the instruction twice and thus incremented the reference counter twice as well. Ravi
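The "counter reaches 2" behaviour is reproducible in plain userspace: map the same file page twice and bump a counter once per mapping, the way uprobes bumps the semaphore once per matching vma. A sketch under that assumption (temporary file path and names are illustrative):

```c
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map the same file page at two addresses and increment a short once
 * through each view; both views alias one file page, so the final
 * value is 2 - one increment per "vma". */
int count_after_double_patch(void)
{
    char path[] = "/tmp/refctr-XXXXXX";
    int fd = mkstemp(path);
    short zero = 0;
    if (fd < 0 || write(fd, &zero, sizeof(zero)) != sizeof(zero))
        exit(1);
    unlink(path);    /* file lives on via the open fd */

    short *a = mmap(NULL, sizeof(short), PROT_READ | PROT_WRITE,
                    MAP_SHARED, fd, 0);
    short *b = mmap(NULL, sizeof(short), PROT_READ | PROT_WRITE,
                    MAP_SHARED, fd, 0);
    if (a == MAP_FAILED || b == MAP_FAILED)
        exit(1);

    (*a)++;          /* "patch" seen through the first mapping  */
    (*b)++;          /* "patch" seen through the second mapping */

    int val = *a;
    munmap(a, sizeof(short));
    munmap(b, sizeof(short));
    close(fd);
    return val;
}
```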
Re: [PATCH v9 0/4] Uprobes: Support SDT markers having reference count (semaphore)
Hi Song, > However, if I start a.out AFTER enabling the uprobe, there is something wrong: > > root@virt-test:~# ~/a.out > 11 > semaphore 0 <<< this should be non-zero, as the uprobe is already > enabled Ok. I'm able to reproduce this. Digging deeper. Ravi
Re: [PATCH v9 0/4] Uprobes: Support SDT markers having reference count (semaphore)
Hi Song, On 08/21/2018 10:53 AM, Ravi Bangoria wrote: > Hi Song, > >> However, if I start a.out AFTER enabling the uprobe, there is something >> wrong: >> >> root@virt-test:~# ~/a.out >> 11 >> semaphore 0 <<< this should be non-zero, as the uprobe is already >> enabled In this testcase, the semaphore variable is stored in .bss: $ nm test | grep semaphore 10010c5e B semaphore $ readelf -SW ./test | grep "data\|bss" [22] .data PROGBITS 10010c58 000c58 04 00 WA 0 0 1 [23] .bss NOBITS 10010c5c 000c5c 04 00 WA 0 0 2 I'm not so sure, but I guess .bss data initialization happens after calling uprobe_mmap() and thus you are seeing the semaphore as 0. To verify this, if I force the semaphore into the data section by assigning a non-zero value to it: volatile short semaphore = 1 $ nm test | grep semaphore 10010c5c D semaphore $ readelf -SW ./test | grep "data\|bss" [22] .data PROGBITS 10010c58 000c58 06 00 WA 0 0 2 [23] .bss NOBITS 10010c5e 000c5e 02 00 WA 0 0 1 increment/decrement works fine. Ravi
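The section placement being discussed can be reproduced with two globals (hypothetical names); nm and readelf, as shown above, reveal where each lands, while the only runtime-observable difference is that the .bss copy is guaranteed zero until something stores to it:

```c
/* An all-zero global lands in .bss, which is NOBITS: no bytes in the
 * ELF file back it, so there is nothing at a file offset to patch.
 * A non-zero initializer moves it to .data (PROGBITS).
 * Inspect with:  nm ./a.out | grep sem_   and   readelf -SW ./a.out */
volatile short sem_bss;          /* .bss:  "B" in nm, zeroed at startup */
volatile short sem_data = 1;     /* .data: "D" in nm, bytes in the file */
```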
Re: [PATCH v9 0/4] Uprobes: Support SDT markers having reference count (semaphore)
On 08/21/2018 01:04 PM, Naveen N. Rao wrote: > Song Liu wrote: >> I am testing the patch set with the following code: >> >> #include <stdio.h> >> #include <unistd.h> >> >> volatile short semaphore = 0; >> >> int for_uprobe(int c) >> { >> printf("%d\n", c + 10); >> return c + 1; >> } >> >> int main(int argc, char *argv[]) >> { >> for_uprobe(argc); >> while (1) { >> sleep(1); >> printf("semaphore %d\n", semaphore); >> } >> } >> >> I created a uprobe on function for_uprobe(), that uses semaphore as >> reference counter: >> >> echo "p:uprobe_1 /root/a.out:0x49a(0x1036)" >> uprobe_events > > Is that even valid? That _looks_ like a semaphore, but I'm not quite sure > that it qualifies as an _SDT_ semaphore. Do you see this issue if you instead > use the macros provided by <sys/sdt.h> to create SDT markers? > Right. By default, SDT reference counters (semaphores) go into the .probes section: [25] .probes PROGBITS 0060102c 00102c 04 00 WA 0 0 2 which has PROGBITS set. So this works fine. And there are many other things which are coded into <sys/sdt.h>. So the official way to use SDT markers should be through that. Ravi
Re: [PATCH] trace_uprobe: support reference count in fd-based uprobe
On 08/22/2018 03:53 AM, Song Liu wrote: > This patch applies on top of Ravi Bangoria's work that enables reference > count for uprobe: > > https://lkml.org/lkml/2018/8/20/37 > > After Ravi's work, the effort to enable it in fd-based uprobe is > straightforward. The highest 40 bits of perf_event_attr.config are used to > store the offset of the reference count (semaphore). > > Format information in /sys/bus/event_source/devices/uprobe/format/ is > updated to reflect this new feature. LGTM Reviewed-and-tested-by: Ravi Bangoria > > Signed-off-by: Song Liu > Cc: Ravi Bangoria > Cc: Masami Hiramatsu > Cc: Oleg Nesterov > Cc: Srikar Dronamraju > Cc: Naveen N. Rao > Cc: Steven Rostedt (VMware) > Cc: Peter Zijlstra > Cc: Ingo Molnar > Cc: Teng Qin > --- > include/linux/trace_events.h | 3 +- > kernel/events/core.c | 49 ++--- > kernel/trace/trace_event_perf.c | 7 +++-- > kernel/trace/trace_probe.h | 3 +- > kernel/trace/trace_uprobe.c | 4 ++- > 5 files changed, 50 insertions(+), 16 deletions(-) > > diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h > index 78a010e19ed4..4130a5497d40 100644 > --- a/include/linux/trace_events.h > +++ b/include/linux/trace_events.h > @@ -575,7 +575,8 @@ extern int bpf_get_kprobe_info(const struct perf_event > *event, > bool perf_type_tracepoint); > #endif > #ifdef CONFIG_UPROBE_EVENTS > -extern int perf_uprobe_init(struct perf_event *event, bool is_retprobe); > +extern int perf_uprobe_init(struct perf_event *event, > + unsigned long ref_ctr_offset, bool is_retprobe); > extern void perf_uprobe_destroy(struct perf_event *event); > extern int bpf_get_uprobe_info(const struct perf_event *event, > u32 *fd_type, const char **filename, > diff --git a/kernel/events/core.c b/kernel/events/core.c > index 8f0434a9951a..75a0219be420 100644 > --- a/kernel/events/core.c > +++ b/kernel/events/core.c > @@ -8363,30 +8363,39 @@ static struct pmu perf_tracepoint = { > * > * PERF_PROBE_CONFIG_IS_RETPROBE if set, create kretprobe/uretprobe > * if not 
set, create kprobe/uprobe > + * > + * The following values specify a reference counter (or semaphore in the > + * terminology of tools like dtrace, systemtap, etc.) Userspace Statically > + * Defined Tracepoints (USDT). Currently, we use 40 bit for the offset. > + * > + * PERF_UPROBE_REF_CTR_OFFSET_BITS # of bits in config as the offset > + * PERF_UPROBE_REF_CTR_OFFSET_SHIFT # of bits to shift left > */ > enum perf_probe_config { > PERF_PROBE_CONFIG_IS_RETPROBE = 1U << 0, /* [k,u]retprobe */ > + PERF_UPROBE_REF_CTR_OFFSET_BITS = 40, > + PERF_UPROBE_REF_CTR_OFFSET_SHIFT = 64 - PERF_UPROBE_REF_CTR_OFFSET_BITS, > }; > > PMU_FORMAT_ATTR(retprobe, "config:0"); > +#endif > > -static struct attribute *probe_attrs[] = { > +#ifdef CONFIG_KPROBE_EVENTS > +static struct attribute *kprobe_attrs[] = { > &format_attr_retprobe.attr, > NULL, > }; > > -static struct attribute_group probe_format_group = { > +static struct attribute_group kprobe_format_group = { > .name = "format", > - .attrs = probe_attrs, > + .attrs = kprobe_attrs, > }; > > -static const struct attribute_group *probe_attr_groups[] = { > - &probe_format_group, > +static const struct attribute_group *kprobe_attr_groups[] = { > + &kprobe_format_group, > NULL, > }; > -#endif > > -#ifdef CONFIG_KPROBE_EVENTS > static int perf_kprobe_event_init(struct perf_event *event); > static struct pmu perf_kprobe = { > .task_ctx_nr = perf_sw_context, > @@ -8396,7 +8405,7 @@ static struct pmu perf_kprobe = { > .start = perf_swevent_start, > .stop = perf_swevent_stop, > .read = perf_swevent_read, > - .attr_groups = probe_attr_groups, > + .attr_groups = kprobe_attr_groups, > }; > > static int perf_kprobe_event_init(struct perf_event *event) > @@ -8428,6 +8437,24 @@ static int perf_kprobe_event_init(struct perf_event > *event) > #endif /* CONFIG_KPROBE_EVENTS */ > > #ifdef CONFIG_UPROBE_EVENTS > +PMU_FORMAT_ATTR(ref_ctr_offset, "config:63-24"); > + > +static struct attribute *uprobe_attrs[] = { > + &format_attr_retprobe.attr, > + &format_attr_ref_ctr_offset.attr, > + NULL, 
> +}; > + > +static struct attribute_group uprobe_format_group = { > + .name = "format", > + .attrs = uprobe_attrs, > +}; > + > +static const struct attribute_group *uprobe_attr_gr
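The config-bit layout described in the patch (retprobe flag in bit 0, the 40-bit ref_ctr_offset shifted into the top of perf_event_attr.config) round-trips as sketched below; the helper names are hypothetical, but the constants are the ones quoted from the patch:

```c
#include <stdint.h>

/* Constants from the patch: ref_ctr_offset occupies the top 40 bits
 * of perf_event_attr.config for the uprobe PMU, so the shift is 24. */
#define PERF_UPROBE_REF_CTR_OFFSET_BITS  40
#define PERF_UPROBE_REF_CTR_OFFSET_SHIFT (64 - PERF_UPROBE_REF_CTR_OFFSET_BITS)

/* Pack a semaphore offset and the retprobe flag into a config value. */
uint64_t encode_config(uint64_t ref_ctr_offset, int is_retprobe)
{
    return (ref_ctr_offset << PERF_UPROBE_REF_CTR_OFFSET_SHIFT) |
           (is_retprobe ? 1u : 0u);
}

/* Recover the semaphore offset, as the kernel side would. */
uint64_t decode_ref_ctr_offset(uint64_t config)
{
    return config >> PERF_UPROBE_REF_CTR_OFFSET_SHIFT;
}
```

For example, the SDT semaphore offset 0x1036 used earlier in the thread would be passed as config = 0x1036 << 24.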
Re: [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore)
Hi Song, On 08/11/2018 01:27 PM, Song Liu wrote: >> + >> +static void delayed_uprobe_delete(struct delayed_uprobe *du) >> +{ >> + if (!du) >> + return; > Do we really need this check? Not strictly necessary, but I would still like to keep it for safety. > >> + list_del(&du->list); >> + kfree(du); >> +} >> + >> +static void delayed_uprobe_remove(struct uprobe *uprobe, struct mm_struct >> *mm) >> +{ >> + struct list_head *pos, *q; >> + struct delayed_uprobe *du; >> + >> + if (!uprobe && !mm) >> + return; > And do we really need this check? Yes. delayed_uprobe_remove(uprobe=NULL, mm=NULL) is an invalid case. If I remove this check, the code below (or more accurately the code suggested by Oleg) will remove all entries from delayed_uprobe_list. So I will keep this check but put a comment above the function. [...] >> + >> + ret = get_user_pages_remote(NULL, mm, vaddr, 1, >> + FOLL_WRITE, &page, &vma, NULL); >> + if (unlikely(ret <= 0)) { >> + /* >> +* We are asking for 1 page. If get_user_pages_remote() >> fails, >> +* it may return 0, in that case we have to return error. >> +*/ >> + ret = (ret == 0) ? -EBUSY : ret; >> + pr_warn("Failed to %s ref_ctr. (%d)\n", >> + d > 0 ? "increment" : "decrement", ret); > This warning is not really useful. Seems this function has little information > about which uprobe is failing here. Maybe we only need warning in the caller > (or caller of caller). Sure, I can move this warning to the caller of this function, but which exact fields would you like to print with the warning? Is something like this fine? pr_warn("ref_ctr %s failed for 0x%lx, 0x%lx, 0x%lx, 0x%p", d > 0 ? "increment" : "decrement", inode->i_ino, offset, ref_ctr_offset, mm); More importantly, the reason I didn't print more info is because dmesg is accessible to unprivileged users in many distros but uprobes are not. So printing this information may be a security violation. No? 
> >> + return ret; >> + } >> + >> + kaddr = kmap_atomic(page); >> + ptr = kaddr + (vaddr & ~PAGE_MASK); >> + >> + if (unlikely(*ptr + d < 0)) { >> + pr_warn("ref_ctr going negative. vaddr: 0x%lx, " >> + "curr val: %d, delta: %d\n", vaddr, *ptr, d); >> + ret = -EINVAL; >> + goto out; >> + } >> + >> + *ptr += d; >> + ret = 0; >> +out: >> + kunmap_atomic(kaddr); >> + put_page(page); >> + return ret; >> +} >> + >> +static int update_ref_ctr(struct uprobe *uprobe, struct mm_struct *mm, >> + bool is_register) > What's the reason of bool is_register here vs. short d in __update_ref_ctr()? > Can we use short for both? Yes, I can use short as well. > >> +{ >> + struct vm_area_struct *rc_vma; >> + unsigned long rc_vaddr; >> + int ret = 0; >> + >> + rc_vma = find_ref_ctr_vma(uprobe, mm); >> + >> + if (rc_vma) { >> + rc_vaddr = offset_to_vaddr(rc_vma, uprobe->ref_ctr_offset); >> + ret = __update_ref_ctr(mm, rc_vaddr, is_register ? 1 : -1); >> + >> + if (is_register) >> + return ret; >> + } > Mixing __update_ref_ctr() here and delayed_uprobe_add() in the same > function is a little confusing (at least for me). How about we always use > delayed uprobe for uprobe_mmap() and use non-delayed in other case(s)? No. delayed_uprobe_add() is needed for uprobe_register() case to handle race between uprobe_register() and process creation. [...] >> >> +static int delayed_uprobe_install(struct vm_area_struct *vma) > This function name is confusing. How about we call it delayed_ref_ctr_incr() > or > something similar? Also, we should add comments to highlight this is vma is > not > the vma containing the uprobe, but the vma containing the ref_ctr. Sure, I'll do that. 
> >> +{ >> + struct list_head *pos, *q; >> + struct delayed_uprobe *du; >> + unsigned long vaddr; >> + int ret = 0, err = 0; >> + >> + mutex_lock(&delayed_uprobe_lock); >> + list_for_each_safe(pos, q, &delayed_uprobe_list) { >> + du = list_entry(pos, struct delayed_uprobe, list); >> + >> + if (!valid_ref_ctr_vma(du->uprobe, vma)) >> + continue; >> + >> + vaddr = offset_to_vaddr(vma, du->uprobe->ref_ctr_offset); >> + ret = __update_ref_ctr(vma->vm_mm, vaddr, 1); >> + /* Record an error and continue. */ >> + if (ret && !err) >> + err = ret; > I think this is a good place (when ret != 0) to call pr_warn(). I guess we can > print which mm get error for which uprobe (inode+offset). __update_ref_ctr() is already printing a warning, so I didn't add anything
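For reference, the sanity check in __update_ref_ctr() that the thread keeps coming back to reduces to refusing any delta that would drive the counter negative. A userspace sketch of just that core (a plain pointer instead of the kmap'ed page of the traced process; function name hypothetical):

```c
#include <errno.h>

/* Apply delta 'd' to the reference counter unless it would go
 * negative (e.g. a decrement against a counter that was never
 * incremented), mirroring the "*ptr + d < 0" check discussed above. */
int update_ref_ctr_value(short *ptr, short d)
{
    if (*ptr + d < 0)       /* int promotion makes this safe */
        return -EINVAL;
    *ptr += d;
    return 0;
}
```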
Re: [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore)
Hi Song, On 08/13/2018 11:17 AM, Ravi Bangoria wrote: >>> + >>> +static void delayed_uprobe_remove(struct uprobe *uprobe, struct mm_struct >>> *mm) >>> +{ >>> + struct list_head *pos, *q; >>> + struct delayed_uprobe *du; >>> + >>> + if (!uprobe && !mm) >>> + return; >> And do we really need this check? > > > Yes. delayed_uprobe_remove(uprobe=NULL, mm=NULL) is an invalid case. If I > remove > this check, the code below (or more accurately the code suggested by Oleg) will remove > all entries from delayed_uprobe_list. So I will keep this check but put a > comment > above the function. > Sorry, my bad. Please ignore the above comment. Even though it saves us an unnecessary loop over the entire delayed_uprobe_list when both uprobe and mm are NULL, that case is not possible with the current code. Also, I'm not dereferencing either of them. So, IMHO it's fine to remove this check. Thanks, Ravi
Re: [PATCH v8 5/6] trace_uprobe/sdt: Prevent multiple reference counter for same uprobe
Hi Song, On 08/11/2018 01:42 PM, Song Liu wrote: > Do we really need this given we already have PATCH 4/6? > uprobe_register() can be called > out of trace_uprobe, this patch won't catch all conflicts anyway. Right, but it at least catches all conflicts happening via trace_uprobe. I don't mind removing this patch, but I would like to get the opinion of Oleg/Srikar/Steven/Masami. Thanks, Ravi
Re: [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore)
Hi Oleg, On 08/13/2018 05:20 PM, Oleg Nesterov wrote: > On 08/13, Ravi Bangoria wrote: >> >> On 08/11/2018 01:27 PM, Song Liu wrote: >>>> + >>>> +static void delayed_uprobe_delete(struct delayed_uprobe *du) >>>> +{ >>>> + if (!du) >>>> + return; >>> Do we really need this check? >> >> Not necessary though, but I would still like to keep it for a safety. > > Heh. I tried to ignore all minor problems in this version, but now that Song > mentioned this unnecessary check... > > Personally I really dislike the checks like this one. > > - It can confuse the reader who will try to understand the purpose > > - it can hide a bug if delayed_uprobe_delete(du) is actually called > with du == NULL. > > IMO, you should either remove it and let the kernel crash (to notice the > problem), or turn it into > > if (WARN_ON(!du)) > return; > > which is self-documented and reports the problem without kernel crash. Ok. I'll remove that check. > >>>> + rc_vma = find_ref_ctr_vma(uprobe, mm); >>>> + >>>> + if (rc_vma) { >>>> + rc_vaddr = offset_to_vaddr(rc_vma, uprobe->ref_ctr_offset); >>>> + ret = __update_ref_ctr(mm, rc_vaddr, is_register ? 1 : -1); >>>> + >>>> + if (is_register) >>>> + return ret; >>>> + } >>> Mixing __update_ref_ctr() here and delayed_uprobe_add() in the same >>> function is a little confusing (at least for me). How about we always use >>> delayed uprobe for uprobe_mmap() and use non-delayed in other case(s)? >> >> >> No. delayed_uprobe_add() is needed for uprobe_register() case to handle race >> between uprobe_register() and process creation. > > Yes. > > But damn, process creation (exec) is trivial. We could add a new uprobe_exec() > hook and avoid delayed_uprobe_install() in uprobe_mmap(). I'm sorry. I didn't get this. > > Afaics, the really problematic case is dlopen() which can race with > _register() > too, right? dlopen() should internally use mmap() right? So what is the problem here? Can you please elaborate. Thanks, Ravi
Re: [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore)
On 08/13/2018 06:47 PM, Oleg Nesterov wrote: > On 08/13, Ravi Bangoria wrote: >> >>> But damn, process creation (exec) is trivial. We could add a new >>> uprobe_exec() >>> hook and avoid delayed_uprobe_install() in uprobe_mmap(). >> >> I'm sorry, I didn't get this. > > Sorry for the confusion... > > I meant, if only exec*() could race with _register(), we could add another > uprobe > hook which updates all (delayed) counters before returning to user-mode. Ok. > >>> Afaics, the really problematic case is dlopen() which can race with >>> _register() >>> too, right? >> >> dlopen() should internally use mmap(), right? So what is the problem here? Can >> you please elaborate. > > What I tried to say is that we can't avoid > uprobe_mmap()->delayed_uprobe_install() > because dlopen() can race with _register() too, just like exec. Right :) Thanks, Ravi
Re: [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore)
Hi Song, On 08/13/2018 10:42 PM, Song Liu wrote: > On Mon, Aug 13, 2018 at 6:17 AM, Oleg Nesterov wrote: >> On 08/13, Ravi Bangoria wrote: >>> >>>> But damn, process creation (exec) is trivial. We could add a new >>>> uprobe_exec() >>>> hook and avoid delayed_uprobe_install() in uprobe_mmap(). >>> >>> I'm sorry, I didn't get this. >> >> Sorry for the confusion... >> >> I meant, if only exec*() could race with _register(), we could add another >> uprobe >> hook which updates all (delayed) counters before returning to user-mode. >> >>>> Afaics, the really problematic case is dlopen() which can race with >>>> _register() >>>> too, right? >>> >>> dlopen() should internally use mmap(), right? So what is the problem here? >>> Can >>> you please elaborate. >> >> What I tried to say is that we can't avoid >> uprobe_mmap()->delayed_uprobe_install() >> because dlopen() can race with _register() too, just like exec. >> >> Oleg. >> > > How about we do delayed_uprobe_install() per file? Say we keep a list > of delayed_uprobe > in load_elf_binary(). Then we can install delayed_uprobe after loading > all sections of the > file. I'm not sure I totally understood the idea. But how can this approach solve the dlopen() race with _register()? Rather, making delayed_uprobe_list an mm field seems a simple and effective idea to me. The only overhead will be the list_empty(&mm->delayed_list) check. Please let me know if I misunderstood you. Thanks, Ravi
Re: [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore)
> +static int delayed_uprobe_install(struct vm_area_struct *vma) > +{ > + struct list_head *pos, *q; > + struct delayed_uprobe *du; > + unsigned long vaddr; > + int ret = 0, err = 0; > + > + mutex_lock(&delayed_uprobe_lock); > + list_for_each_safe(pos, q, &delayed_uprobe_list) { > + du = list_entry(pos, struct delayed_uprobe, list); > + > + if (!valid_ref_ctr_vma(du->uprobe, vma)) > + continue; I think we should compare mm here. I.e.: if (du->mm != vma->vm_mm || !valid_ref_ctr_vma(du->uprobe, vma)) continue; Otherwise things can mess up. > + > + vaddr = offset_to_vaddr(vma, du->uprobe->ref_ctr_offset); > + ret = __update_ref_ctr(vma->vm_mm, vaddr, 1); > + /* Record an error and continue. */ > + if (ret && !err) > + err = ret; > + delayed_uprobe_delete(du); > + } > + mutex_unlock(&delayed_uprobe_lock); > + return err; > +}
Re: [PATCH] perf record: use unmapped IP for inline callchain cursors
On 10/02/2018 09:02 PM, Arnaldo Carvalho de Melo wrote: > Em Tue, Oct 02, 2018 at 09:39:49AM +0200, Milian Wolff escreveu: >> Only use the mapped IP to find inline frames, but keep >> using the unmapped IP for the callchain cursor. This >> ensures we properly show the unmapped IP when displaying >> a frame we received via the dso__parse_addr_inlines API >> for a module which does not contain sufficient debug symbols >> to show the srcline. > > So, the patch came mangled, I fixed it up, please check, I'm adding Ravi > to the CC list, so that he can check as well and retest, right Ravi? > > - Arnaldo > > From d9ddee9653d5676130471de9bc688fc7ec7b4fc4 Mon Sep 17 00:00:00 2001 > From: Milian Wolff > Date: Tue, 2 Oct 2018 09:39:49 +0200 > Subject: [PATCH 1/1] perf record: use unmapped IP for inline callchain cursors > > Only use the mapped IP to find inline frames, but keep using the unmapped IP > for the callchain cursor. This ensures we properly show the unmapped IP when > displaying a frame we received via the dso__parse_addr_inlines API for a > module > which does not contain sufficient debug symbols to show the srcline. > > Before: > > $ perf record -e cycles:u --call-graph ls > $ perf script > ... > ls 12853 2735.563911: 43354 cycles:u: > 17878 __GI___tunables_init+0x01d1d63a0118 > (/usr/lib/ld-2.28.so) > 19ee9 _dl_sysdep_start+0x01d1d63a02e9 > (/usr/lib/ld-2.28.so) > 3087 _dl_start+0x01d1d63a0287 (/usr/lib/ld-2.28.so) > 2007 _start+0x01d1d63a0007 (/usr/lib/ld-2.28.so) > > After: > > $ perf script > ... 
> ls 12853 2735.563911: 43354 cycles:u: > 7f1714e46878 __GI___tunables_init+0x118 (/usr/lib/ld-2.28.so) > 7f1714e48ee9 _dl_sysdep_start+0x2e9 (/usr/lib/ld-2.28.so) > 7f1714e32087 _dl_start+0x287 (/usr/lib/ld-2.28.so) > 7f1714e31007 _start+0x7 (/usr/lib/ld-2.28.so) With current perf/urgent: $ sudo ./perf record --call-graph=dwarf -e probe_libc:inet_pton -- ping -6 -c 1 ::1 $ sudo ./perf script ping 4109 [011] 767269.031273: probe_libc:inet_pton: (7fff944cb458) 15b458 __inet_pton+0xd7920008 (inlined) 10feb3 gaih_inet.constprop.7+0xd7920f43 (/usr/lib64/libc-2.26. 110a13 __GI_getaddrinfo+0xd7920163 (inlined) 13c752d6f _init+0xbfb (/usr/bin/ping) 2371f generic_start_main.isra.0+0xd792013f (/usr/lib64/libc-2 2391b __libc_start_main+0xd79200bb (/usr/lib64/libc-2.26.so) With this patch: $ sudo ./perf script ping 4109 [011] 767269.031273: probe_libc:inet_pton: (7fff944cb458) 7fff944cb458 __inet_pton+0x8 (inlined) 7fff9447feb3 gaih_inet.constprop.7+0xf43 (/usr/lib64/libc-2.26.so) 7fff94480a13 __GI_getaddrinfo+0x163 (inlined) 13c752d6f _init+0xbfb (/usr/bin/ping) 7fff9439371f generic_start_main.isra.0+0x13f (/usr/lib64/libc-2.26.so) 7fff9439391b __libc_start_main+0xbb (/usr/lib64/libc-2.26.so) LGTM. Tested-by: Ravi Bangoria
Re: [PATCH v3 1/1] perf: Sharing PMU counters across compatible events
Hi Song, On 09/25/2018 03:55 AM, Song Liu wrote: > This patch tries to enable PMU sharing. To make perf event scheduling > fast, we use special data structures. > > An array of "struct perf_event_dup" is added to the perf_event_context, > to remember all the duplicated events under this ctx. All the events > under this ctx have a "dup_id" pointing to its perf_event_dup. Compatible > events under the same ctx share the same perf_event_dup. The following > figure shows a simplified version of the data structure. > > ctx -> perf_event_dup -> master > ^ > | > perf_event /| > | > perf_event / > I've not gone through the patch in detail, but I was specifically interested in scenarios where one perf instance is counting an event system-wide and thus another perf instance fails to count the same event for a specific workload, because that event can be counted on only one hw counter. Ex: https://lkml.org/lkml/2018/3/12/1011 Seems this patch does not solve this issue. Please let me know if I'm missing anything. Thanks, Ravi
Re: [PATCH v2] perf test: S390 does not support watchpoints in test 22
Hi Thomas, On 09/28/2018 01:13 PM, Thomas Richter wrote: > diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c > index 54ca7d87236f..e83c15a95c43 100644 > --- a/tools/perf/tests/builtin-test.c > +++ b/tools/perf/tests/builtin-test.c > @@ -124,7 +124,7 @@ static struct test generic_tests[] = { > .desc = "Watchpoint", > .func = test__wp, > .subtest = { > - .skip_if_fail = false, > + .skip_if_fail = true, Can you instead use ".is_supported"? Thanks, Ravi
Re: [PATCH v2] perf test: S390 does not support watchpoints in test 22
Hi Thomas, On 09/28/2018 03:32 PM, Thomas-Mich Richter wrote: > I can rework the patch to use the is_supported() member function. The > down side is that the test does not show up in the list of executed tests > anymore, > unless some debug output is enabled: Which should be fine, because s390 does that in all breakpoint-related tests. But the more important reason to suggest the .is_supported field is that this patch causes a regression on x86: Before: $ sudo ./perf test 22 22: Watchpoint: 22.1: Read Only Watchpoint: Skip 22.2: Write Only Watchpoint : Ok 22.3: Read / Write Watchpoint : Ok 22.4: Modify Watchpoint : Ok After: $ git apply 1.patch $ make $ sudo ./perf test 22 22: Watchpoint: 22.1: Read Only Watchpoint: Skip 22.2: Write Only Watchpoint : Skip 22.3: Read / Write Watchpoint : Skip 22.4: Modify Watchpoint : Skip Intel hw does not support read-only watchpoints. If you use skip_if_fail=true, all subsequent testcases are skipped as well. Thanks, Ravi
Re: [PATCH v3] perf test: S390 does not support watchpoints in test 22
On 09/28/2018 04:23 PM, Thomas Richter wrote: > S390 does not support the perf_event_open system call for > attribute type PERF_TYPE_BREAKPOINT. This results in test > failure for test 22: > > [root@s8360046 perf]# ./perf test 22 > 22: Watchpoint: > 22.1: Read Only Watchpoint: FAILED! > 22.2: Write Only Watchpoint : FAILED! > 22.3: Read / Write Watchpoint : FAILED! > 22.4: Modify Watchpoint : FAILED! > [root@s8360046 perf]# > > Add s390 support to avoid these tests being executed on > s390 platform: > > [root@s8360046 perf]# ./perf test 22 > [root@s8360046 perf]# ./perf test -v 22 > 22: Watchpoint: Disabled > [root@s8360046 perf]# > > Signed-off-by: Thomas Richter Acked-by: Ravi Bangoria Thanks, Ravi
Re: [PATCH 4/6] perf report: Use the offset address to find inline frames
Hi Milian, Seems this has a regression: With acme/perf/urgent: $ ./perf record -e cycles:u --call-graph=dwarf ls $ ./perf script ls 13585 602082.534478: 28032 cycles:u: 1f1f4 __GI___tunables_init+0xd3dc00a4 (/usr/lib64/ld-2.26.so) 20e2b _dl_sysdep_start+0xd3dc04ab (/usr/lib64/ld-2.26.so) 1ca7 _dl_start_final+0xd3dc00f7 (/usr/lib64/ld-2.26.so) 29b3 _dl_start+0xd3dc0553 (/usr/lib64/ld-2.26.so) 1437 _start+0xd3dc0017 (/usr/lib64/ld-2.26.so) After reverting this patch: $ git revert d9c910d5f8c3a1858f115dc9d3b157df32da70a3 [a/perf/urgent 72cada4e6b30] Revert "perf report: Use the offset address to find inline frames" $ ./perf script ls 13585 602082.534478: 28032 cycles:u: 7fff9613f1f4 __GI___tunables_init+0xa4 (/usr/lib64/ld-2.26.so) 7fff96140e2b _dl_sysdep_start+0x4ab (/usr/lib64/ld-2.26.so) 7fff96121ca7 _dl_start_final+0xf7 (/usr/lib64/ld-2.26.so) 7fff961229b3 _dl_start+0x553 (/usr/lib64/ld-2.26.so) 7fff96121437 _start+0x17 (/usr/lib64/ld-2.26.so) Thanks, Ravi On 09/28/2018 05:55 PM, Arnaldo Carvalho de Melo wrote: > From: Milian Wolff > > To correctly find inlined frames, we have to use the file offset instead > of the virtual memory address. This was already fixed for displaying > srcline information while displaying in commit 2a9d5050dc84fa20 ("perf > script: Show correct offsets for DWARF-based unwinding"). We just need > to use the same corrected address also when trying to find inline > frames. > > This is another follow-up to commit 19610184693c ("perf script: Show > virtual addresses instead of offsets"). 
> > Signed-off-by: Milian Wolff > Acked-by: Jiri Olsa > Cc: Jin Yao > Cc: Namhyung Kim > Cc: Sandipan Das > Fixes: 19610184693c ("perf script: Show virtual addresses instead of offsets") > Link: http://lkml.kernel.org/r/20180926135207.30263-2-milian.wo...@kdab.com > Signed-off-by: Arnaldo Carvalho de Melo > --- > tools/perf/util/machine.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c > index 0cb4f8bf3ca7..73a651f10a0f 100644 > --- a/tools/perf/util/machine.c > +++ b/tools/perf/util/machine.c > @@ -2317,9 +2317,6 @@ static int unwind_entry(struct unwind_entry *entry, > void *arg) > if (symbol_conf.hide_unresolved && entry->sym == NULL) > return 0; > > - if (append_inlines(cursor, entry->map, entry->sym, entry->ip) == 0) > - return 0; > - > /* >* Convert entry->ip from a virtual address to an offset in >* its corresponding binary. > @@ -2327,6 +2324,9 @@ static int unwind_entry(struct unwind_entry *entry, > void *arg) > if (entry->map) > addr = map__map_ip(entry->map, entry->ip); > > + if (append_inlines(cursor, entry->map, entry->sym, addr) == 0) > + return 0; > + > srcline = callchain_srcline(entry->map, entry->sym, addr); > return callchain_cursor_append(cursor, entry->ip, > entry->map, entry->sym, >
Re: [PATCH v7 3/6] Uprobes: Support SDT markers having reference count (semaphore)
Hi Oleg, Sorry for the slightly late reply. On 08/03/2018 04:54 PM, Oleg Nesterov wrote: > Hi Ravi, > > I was going to give up and ack this series, but it seems I noticed > a bug... > > On 07/31, Ravi Bangoria wrote: >> >> +static int delayed_uprobe_add(struct uprobe *uprobe, struct mm_struct *mm) >> +{ >> +struct delayed_uprobe *du; >> + >> +if (delayed_uprobe_check(uprobe, mm)) >> +return 0; >> + >> +du = kzalloc(sizeof(*du), GFP_KERNEL); >> +if (!du) >> +return -ENOMEM; >> + >> +du->uprobe = uprobe; >> +du->mm = mm; > > I am surprised I didn't notice this before... > > So > du->mm = mm; > > is fine, mm can't go away, uprobe_clear_state() does > delayed_uprobe_remove(NULL,mm). > > But > du->uprobe = uprobe; > > doesn't look right, uprobe can go away and it can be freed, its memory can be > reused. > We can't rely on remove_breakpoint(), I'm sorry, I didn't get this. How can a uprobe go away without calling uprobe_unregister() -> register_for_each_vma() -> remove_breakpoint() And remove_breakpoint() will get called only when the last uprobe consumer is going away _for that mm_. So, we can rely on remove_breakpoint() to remove {uprobe,mm} from delayed_uprobe_list. Am I missing anything? It would be even better if you could give me an example scenario. > the application can unmap the probed page/vma. > Yes we do not care about the application in this case, say, the next > uprobe_mmap() can > wrongly increment the counter, we do not care although this can lead to > hard-to-debug > problems. But, if nothing else, the kernel can crash if the freed memory is > unmapped. > So I think put_uprobe() should do delayed_uprobe_remove(uprobe, NULL) before > kfree() > and delayed_uprobe_remove() should be updated to handle the mm==NULL case. > > Also. delayed_uprobe_add() should check the list and avoid duplicates. > Otherwise the > trivial > > for (;;) > munmap(mmap(uprobed_file)); > > will eat the memory until the uprobe is unregistered.
I'm already calling delayed_uprobe_check(uprobe, mm) from delayed_uprobe_add(), so we don't add the same {uprobe,mm} pair multiple times to delayed_uprobe_list. Is that the same check you are asking me to add, or something else? > > >> +static bool valid_ref_ctr_vma(struct uprobe *uprobe, >> + struct vm_area_struct *vma) >> +{ >> +unsigned long vaddr = offset_to_vaddr(vma, uprobe->ref_ctr_offset); >> + >> +return uprobe->ref_ctr_offset && >> +vma->vm_file && >> +file_inode(vma->vm_file) == uprobe->inode && >> +vma->vm_flags & VM_WRITE && >> +!(vma->vm_flags & VM_SHARED) && > > vma->vm_flags & (VM_WRITE|VM_SHARED) == VM_WRITE && > > looks a bit better to me, but I won't insist. Sure. I'll change it. > >> +static int delayed_uprobe_install(struct vm_area_struct *vma) >> +{ >> +struct list_head *pos, *q; >> +struct delayed_uprobe *du; >> +unsigned long vaddr; >> +int ret = 0, err = 0; >> + >> +mutex_lock(&delayed_uprobe_lock); >> +list_for_each_safe(pos, q, &delayed_uprobe_list) { >> +du = list_entry(pos, struct delayed_uprobe, list); >> + >> +if (!valid_ref_ctr_vma(du->uprobe, vma)) >> +continue; >> + >> +vaddr = offset_to_vaddr(vma, du->uprobe->ref_ctr_offset); >> +ret = __update_ref_ctr(vma->vm_mm, vaddr, 1); >> +/* Record an error and continue. */ >> +err = ret && !err ? ret : err; > > I try to avoid the cosmetic nits, but I simply can't look at this line ;) > > if (ret && !err) > err = ret; This is neat. Will replace it. > >> @@ -1072,7 +1281,14 @@ int uprobe_mmap(struct vm_area_struct *vma) >> struct uprobe *uprobe, *u; >> struct inode *inode; >> >> -if (no_uprobe_events() || !valid_vma(vma, true)) >> +if (no_uprobe_events()) >> +return 0; >> + >> +if (vma->vm_flags & VM_WRITE && >> +test_bit(MMF_HAS_UPROBES, &vma->vm_mm->flags)) >> +delayed_uprobe_install(vma); > > OK, so you also added the VM_WRITE check and I agree. But then I think we > should also check VM_SHARED, just like valid_ref_ctr_vma() does? Right. I'll add a check here. Thanks for reviewing :)
Re: [PATCH v7 3/6] Uprobes: Support SDT markers having reference count (semaphore)
Hi Oleg, >> I'm sorry. I didn't get this. How can uprobe go away without calling >> uprobe_unregister() >> -> rergister_for_each_vma() >>-> remove_breakpoint() >> And remove_breakpoint() will get called > > assuming that _unregister() will find the same vma with the probed insn. But > as I said, the application can munmap the probed page/vma. Got it now :) Thanks.
[PATCH v8 4/6] Uprobes/sdt: Prevent multiple reference counter for same uprobe
We assume there is only one reference counter per uprobe. Don't allow users to register multiple uprobes having the same inode+offset but different reference counters. Signed-off-by: Ravi Bangoria --- kernel/events/uprobes.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 61b0481ef417..492a0e005b4d 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -689,6 +689,12 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset, cur_uprobe = insert_uprobe(uprobe); /* a uprobe exists for this inode:offset combination */ if (cur_uprobe) { + if (cur_uprobe->ref_ctr_offset != uprobe->ref_ctr_offset) { + pr_warn("Reference counter offset mismatch.\n"); + put_uprobe(cur_uprobe); + kfree(uprobe); + return ERR_PTR(-EINVAL); + } kfree(uprobe); uprobe = cur_uprobe; } @@ -1103,6 +1109,9 @@ static int __uprobe_register(struct inode *inode, loff_t offset, uprobe = alloc_uprobe(inode, offset, ref_ctr_offset); if (!uprobe) return -ENOMEM; + if (IS_ERR(uprobe)) + return PTR_ERR(uprobe); + /* * We can race with uprobe_unregister()->delete_uprobe(). * Check uprobe_is_active() and retry if it is false. -- 2.14.4
[PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore)
Userspace Statically Defined Tracepoints[1] are dtrace-style markers inside userspace applications. Applications like PostgreSQL, MySQL, Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc. have these markers embedded in them. These markers are added by developers at important places in the code. Each marker source expands to a single nop instruction in the compiled code, but there may be additional overhead for computing the marker arguments, which expands to a couple of instructions. When that overhead matters, its execution can be omitted by a runtime if() condition when no one is tracing on the marker: if (reference_counter > 0) { Execute marker instructions; } The default value of the reference counter is 0. A tracer has to increment the reference counter before tracing on a marker and decrement it when done with the tracing. Implement the reference counter logic in the core uprobe code. Users will be able to use it from trace_uprobe as well as from kernel modules. The new trace_uprobe definition with a reference counter will now be: <path>:<offset>[(ref_ctr_offset)] where ref_ctr_offset is an optional field. For kernel modules, a new variant of uprobe_register() has been introduced: uprobe_register_refctr(inode, offset, ref_ctr_offset, consumer) No new variant of uprobe_unregister() is needed because one uprobe is assumed to have only one reference counter. [1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation Note: the 'reference counter' is called a 'semaphore' in the original Dtrace (and Systemtap, bcc and even ELF) documentation and code. But the term 'semaphore' is misleading in this context: it is just a counter holding the number of tracers tracing on a marker, and is not really used for any synchronization. So we refer to it as a 'reference counter' in the kernel / perf code.
Signed-off-by: Ravi Bangoria Reviewed-by: Masami Hiramatsu [Only trace_uprobe.c] --- include/linux/uprobes.h | 5 + kernel/events/uprobes.c | 246 ++-- kernel/trace/trace.c| 2 +- kernel/trace/trace_uprobe.c | 38 ++- 4 files changed, 280 insertions(+), 11 deletions(-) diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h index bb9d2084af03..103a48a48872 100644 --- a/include/linux/uprobes.h +++ b/include/linux/uprobes.h @@ -123,6 +123,7 @@ extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs); extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs); extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t); extern int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc); +extern int uprobe_register_refctr(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc); extern int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool); extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc); extern int uprobe_mmap(struct vm_area_struct *vma); @@ -160,6 +161,10 @@ uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc) { return -ENOSYS; } +static inline int uprobe_register_refctr(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc) +{ + return -ENOSYS; +} static inline int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool add) { diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index c0418ba52ba8..61b0481ef417 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -73,6 +73,7 @@ struct uprobe { struct uprobe_consumer *consumers; struct inode*inode; /* Also hold a ref to inode */ loff_t offset; + loff_t ref_ctr_offset; unsigned long flags; /* @@ -88,6 +89,15 @@ struct uprobe { struct arch_uprobe arch; }; +struct delayed_uprobe { + struct list_head list; + 
struct uprobe *uprobe; + struct mm_struct *mm; +}; + +static DEFINE_MUTEX(delayed_uprobe_lock); +static LIST_HEAD(delayed_uprobe_list); + /* * Execute out of line area: anonymous executable mapping installed * by the probed task to execute the copy of the original instruction */ @@ -282,6 +292,157 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t return 1; } +static struct delayed_uprobe * +delayed_uprobe_check(struct uprobe *uprobe, struct mm_struct *mm) +{ + struct delayed_uprobe *du; + + list_for_each_entry(du, &delayed_uprobe_list, list) + if (du->uprobe == uprobe && du->mm == mm) + return du; + return NULL; +} + +static int delayed_uprobe_add(struct uprobe *uprobe, struct mm_struct *mm) +{ + struct delayed_uprobe *du; + + if (delayed_uprobe_check(uprobe, mm)) +
[PATCH v8 1/6] Uprobes: Simplify uprobe_register() body
Simplify uprobe_register() function body and let __uprobe_register() handle everything. Also move dependency functions around to fix build failures. Signed-off-by: Ravi Bangoria --- kernel/events/uprobes.c | 69 ++--- 1 file changed, 36 insertions(+), 33 deletions(-) diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index ccc579a7d32e..471eac896635 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -840,13 +840,8 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new) return err; } -static int __uprobe_register(struct uprobe *uprobe, struct uprobe_consumer *uc) -{ - consumer_add(uprobe, uc); - return register_for_each_vma(uprobe, uc); -} - -static void __uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc) +static void +__uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc) { int err; @@ -860,24 +855,46 @@ static void __uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *u } /* - * uprobe_register - register a probe + * uprobe_unregister - unregister a already registered probe. + * @inode: the file in which the probe has to be removed. + * @offset: offset from the start of the file. + * @uc: identify which probe if multiple probes are colocated. + */ +void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc) +{ + struct uprobe *uprobe; + + uprobe = find_uprobe(inode, offset); + if (WARN_ON(!uprobe)) + return; + + down_write(&uprobe->register_rwsem); + __uprobe_unregister(uprobe, uc); + up_write(&uprobe->register_rwsem); + put_uprobe(uprobe); +} +EXPORT_SYMBOL_GPL(uprobe_unregister); + +/* + * __uprobe_register - register a probe * @inode: the file in which the probe has to be placed. * @offset: offset from the start of the file. * @uc: information on howto handle the probe..
* - * Apart from the access refcount, uprobe_register() takes a creation + * Apart from the access refcount, __uprobe_register() takes a creation * refcount (thro alloc_uprobe) if and only if this @uprobe is getting * inserted into the rbtree (i.e first consumer for a @inode:@offset * tuple). Creation refcount stops uprobe_unregister from freeing the * @uprobe even before the register operation is complete. Creation * refcount is released when the last @uc for the @uprobe - * unregisters. Caller of uprobe_register() is required to keep @inode + * unregisters. Caller of __uprobe_register() is required to keep @inode * (and the containing mount) referenced. * * Return errno if it cannot successully install probes * else return 0 (success) */ -int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc) +static int __uprobe_register(struct inode *inode, loff_t offset, +struct uprobe_consumer *uc) { struct uprobe *uprobe; int ret; @@ -904,7 +921,8 @@ int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer * down_write(&uprobe->register_rwsem); ret = -EAGAIN; if (likely(uprobe_is_active(uprobe))) { - ret = __uprobe_register(uprobe, uc); + consumer_add(uprobe, uc); + ret = register_for_each_vma(uprobe, uc); if (ret) __uprobe_unregister(uprobe, uc); } @@ -915,6 +933,12 @@ int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer * goto retry; return ret; } + +int uprobe_register(struct inode *inode, loff_t offset, + struct uprobe_consumer *uc) +{ + return __uprobe_register(inode, offset, uc); +} EXPORT_SYMBOL_GPL(uprobe_register); /* @@ -946,27 +970,6 @@ int uprobe_apply(struct inode *inode, loff_t offset, return ret; } -/* - * uprobe_unregister - unregister a already registered probe. - * @inode: the file in which the probe has to be removed. - * @offset: offset from the start of the file. - * @uc: identify which probe if multiple probes are colocated.
- */ -void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc) -{ - struct uprobe *uprobe; - - uprobe = find_uprobe(inode, offset); - if (WARN_ON(!uprobe)) - return; - - down_write(&uprobe->register_rwsem); - __uprobe_unregister(uprobe, uc); - up_write(&uprobe->register_rwsem); - put_uprobe(uprobe); -} -EXPORT_SYMBOL_GPL(uprobe_unregister); - static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm) { struct vm_area_struct *vma; -- 2.14.4
[PATCH v8 5/6] trace_uprobe/sdt: Prevent multiple reference counter for same uprobe
We assume there is only one reference counter per uprobe. Don't allow users to add multiple trace_uprobe entries having the same inode+offset but different reference counters. Ex, # echo "p:sdt_tick/loop2 /home/ravi/tick:0x6e4(0x10036)" > uprobe_events # echo "p:sdt_tick/loop2_1 /home/ravi/tick:0x6e4(0xf)" >> uprobe_events bash: echo: write error: Invalid argument # dmesg trace_kprobe: Reference counter offset mismatch. There is one exception though: when the user is trying to replace the old entry with a new one, we allow this if the new entry does not conflict with any other existing entries. Signed-off-by: Ravi Bangoria --- kernel/trace/trace_uprobe.c | 37 +++-- 1 file changed, 35 insertions(+), 2 deletions(-) diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c index bf2be098eb08..be64d943d7ea 100644 --- a/kernel/trace/trace_uprobe.c +++ b/kernel/trace/trace_uprobe.c @@ -324,6 +324,35 @@ static int unregister_trace_uprobe(struct trace_uprobe *tu) return 0; } +/* + * Uprobe with multiple reference counter is not allowed. i.e. + * If inode and offset matches, reference counter offset *must* + * match as well. Though, there is one exception: If user is + * replacing old trace_uprobe with new one(same group/event), + * then we allow same uprobe with new reference counter as far + * as the new one does not conflict with any other existing + * ones. + */ +static struct trace_uprobe *find_old_trace_uprobe(struct trace_uprobe *new) +{ + struct trace_uprobe *tmp, *old = NULL; + struct inode *new_inode = d_real_inode(new->path.dentry); + + old = find_probe_event(trace_event_name(&new->tp.call), + new->tp.call.class->system); + + list_for_each_entry(tmp, &uprobe_list, list) { + if ((old ?
old != tmp : true) && + new_inode == d_real_inode(tmp->path.dentry) && + new->offset == tmp->offset && + new->ref_ctr_offset != tmp->ref_ctr_offset) { + pr_warn("Reference counter offset mismatch."); + return ERR_PTR(-EINVAL); + } + } + return old; +} + /* Register a trace_uprobe and probe_event */ static int register_trace_uprobe(struct trace_uprobe *tu) { @@ -333,8 +362,12 @@ static int register_trace_uprobe(struct trace_uprobe *tu) mutex_lock(&uprobe_lock); /* register as an event */ - old_tu = find_probe_event(trace_event_name(&tu->tp.call), - tu->tp.call.class->system); + old_tu = find_old_trace_uprobe(tu); + if (IS_ERR(old_tu)) { + ret = PTR_ERR(old_tu); + goto end; + } + if (old_tu) { /* delete old event */ ret = unregister_trace_uprobe(old_tu); -- 2.14.4
[PATCH v8 2/6] Uprobe: Additional argument arch_uprobe to uprobe_write_opcode()
Add an additional argument, 'arch_uprobe', to uprobe_write_opcode(). We need this in a later set of patches. Signed-off-by: Ravi Bangoria --- arch/arm/probes/uprobes/core.c | 2 +- arch/mips/kernel/uprobes.c | 2 +- include/linux/uprobes.h| 2 +- kernel/events/uprobes.c| 9 + 4 files changed, 8 insertions(+), 7 deletions(-) diff --git a/arch/arm/probes/uprobes/core.c b/arch/arm/probes/uprobes/core.c index d1329f1ba4e4..bf992264060e 100644 --- a/arch/arm/probes/uprobes/core.c +++ b/arch/arm/probes/uprobes/core.c @@ -32,7 +32,7 @@ bool is_swbp_insn(uprobe_opcode_t *insn) int set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr) { - return uprobe_write_opcode(mm, vaddr, + return uprobe_write_opcode(auprobe, mm, vaddr, __opcode_to_mem_arm(auprobe->bpinsn)); } diff --git a/arch/mips/kernel/uprobes.c b/arch/mips/kernel/uprobes.c index f7a0645ccb82..4aaff3b3175c 100644 --- a/arch/mips/kernel/uprobes.c +++ b/arch/mips/kernel/uprobes.c @@ -224,7 +224,7 @@ unsigned long arch_uretprobe_hijack_return_addr( int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr) { - return uprobe_write_opcode(mm, vaddr, UPROBE_SWBP_INSN); + return uprobe_write_opcode(auprobe, mm, vaddr, UPROBE_SWBP_INSN); } void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr, diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h index 0a294e950df8..bb9d2084af03 100644 --- a/include/linux/uprobes.h +++ b/include/linux/uprobes.h @@ -121,7 +121,7 @@ extern bool is_swbp_insn(uprobe_opcode_t *insn); extern bool is_trap_insn(uprobe_opcode_t *insn); extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs); extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs); -extern int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t); +extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t); extern int uprobe_register(struct inode *inode, loff_t
offset, struct uprobe_consumer *uc); extern int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool); extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc); diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 471eac896635..c0418ba52ba8 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -299,8 +299,8 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t * Called with mm->mmap_sem held for write. * Return 0 (success) or a negative errno. */ -int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr, - uprobe_opcode_t opcode) +int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, + unsigned long vaddr, uprobe_opcode_t opcode) { struct page *old_page, *new_page; struct vm_area_struct *vma; @@ -351,7 +351,7 @@ int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr, */ int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr) { - return uprobe_write_opcode(mm, vaddr, UPROBE_SWBP_INSN); + return uprobe_write_opcode(auprobe, mm, vaddr, UPROBE_SWBP_INSN); } /** @@ -366,7 +366,8 @@ int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned int __weak set_orig_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr) { - return uprobe_write_opcode(mm, vaddr, *(uprobe_opcode_t *)&auprobe->insn); + return uprobe_write_opcode(auprobe, mm, vaddr, + *(uprobe_opcode_t *)&auprobe->insn); } static struct uprobe *get_uprobe(struct uprobe *uprobe) -- 2.14.4
[PATCH v8 6/6] perf probe: Support SDT markers having reference counter (semaphore)
With this, perf buildid-cache will save SDT markers with reference counter in probe cache. Perf probe will be able to probe markers having reference counter. Ex, # readelf -n /tmp/tick | grep -A1 loop2 Name: loop2 ... Semaphore: 0x10020036 # ./perf buildid-cache --add /tmp/tick # ./perf probe sdt_tick:loop2 # ./perf stat -e sdt_tick:loop2 /tmp/tick hi: 0 hi: 1 hi: 2 ^C Performance counter stats for '/tmp/tick': 3 sdt_tick:loop2 2.561851452 seconds time elapsed Signed-off-by: Ravi Bangoria Acked-by: Masami Hiramatsu Acked-by: Srikar Dronamraju --- tools/perf/util/probe-event.c | 39 tools/perf/util/probe-event.h | 1 + tools/perf/util/probe-file.c | 34 ++-- tools/perf/util/probe-file.h | 1 + tools/perf/util/symbol-elf.c | 46 --- tools/perf/util/symbol.h | 7 +++ 6 files changed, 106 insertions(+), 22 deletions(-) diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c index f119eb628dbb..e86f8be89157 100644 --- a/tools/perf/util/probe-event.c +++ b/tools/perf/util/probe-event.c @@ -1819,6 +1819,12 @@ int parse_probe_trace_command(const char *cmd, struct probe_trace_event *tev) tp->offset = strtoul(fmt2_str, NULL, 10); } + if (tev->uprobes) { + fmt2_str = strchr(p, '('); + if (fmt2_str) + tp->ref_ctr_offset = strtoul(fmt2_str + 1, NULL, 0); + } + tev->nargs = argc - 2; tev->args = zalloc(sizeof(struct probe_trace_arg) * tev->nargs); if (tev->args == NULL) { @@ -2012,6 +2018,22 @@ static int synthesize_probe_trace_arg(struct probe_trace_arg *arg, return err; } +static int +synthesize_uprobe_trace_def(struct probe_trace_event *tev, struct strbuf *buf) +{ + struct probe_trace_point *tp = >point; + int err; + + err = strbuf_addf(buf, "%s:0x%lx", tp->module, tp->address); + + if (err >= 0 && tp->ref_ctr_offset) { + if (!uprobe_ref_ctr_is_supported()) + return -1; + err = strbuf_addf(buf, "(0x%lx)", tp->ref_ctr_offset); + } + return err >= 0 ? 
0 : -1; +} + char *synthesize_probe_trace_command(struct probe_trace_event *tev) { struct probe_trace_point *tp = >point; @@ -2041,15 +2063,17 @@ char *synthesize_probe_trace_command(struct probe_trace_event *tev) } /* Use the tp->address for uprobes */ - if (tev->uprobes) - err = strbuf_addf(, "%s:0x%lx", tp->module, tp->address); - else if (!strncmp(tp->symbol, "0x", 2)) + if (tev->uprobes) { + err = synthesize_uprobe_trace_def(tev, ); + } else if (!strncmp(tp->symbol, "0x", 2)) { /* Absolute address. See try_to_find_absolute_address() */ err = strbuf_addf(, "%s%s0x%lx", tp->module ?: "", tp->module ? ":" : "", tp->address); - else + } else { err = strbuf_addf(, "%s%s%s+%lu", tp->module ?: "", tp->module ? ":" : "", tp->symbol, tp->offset); + } + if (err) goto error; @@ -2633,6 +2657,13 @@ static void warn_uprobe_event_compat(struct probe_trace_event *tev) { int i; char *buf = synthesize_probe_trace_command(tev); + struct probe_trace_point *tp = >point; + + if (tp->ref_ctr_offset && !uprobe_ref_ctr_is_supported()) { + pr_warning("A semaphore is associated with %s:%s and " + "seems your kernel doesn't support it.\n", + tev->group, tev->event); + } /* Old uprobe event doesn't support memory dereference */ if (!tev->uprobes || tev->nargs == 0 || !buf) diff --git a/tools/perf/util/probe-event.h b/tools/perf/util/probe-event.h index 45b14f020558..15a98c3a2a2f 100644 --- a/tools/perf/util/probe-event.h +++ b/tools/perf/util/probe-event.h @@ -27,6 +27,7 @@ struct probe_trace_point { char*symbol;/* Base symbol */ char*module;/* Module name */ unsigned long offset; /* Offset from symbol */ + unsigned long ref_ctr_offset; /* SDT reference counter offset */ unsigned long address;/* Actual address of the trace point */ boolretprobe; /* Return probe flag */ }; diff --git a/tools/perf/util/probe-file.c b/tools/perf/util/probe-file.c index b76088fadf3d..aac7817d9e14 100644 --- a/tools/perf/util/probe-file.c
[PATCH v8 0/6] Uprobes: Support SDT markers having reference count (semaphore)
v8 changes: - Call delayed_uprobe_remove(uprobe, NULL) from put_uprobe() as well. - Other minor optimizations. v7: https://lkml.org/lkml/2018/7/31/20 Description: Userspace Statically Defined Tracepoints[1] are dtrace-style markers inside userspace applications. Applications like PostgreSQL, MySQL, Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc. have these markers embedded in them. These markers are added by developers at important places in the code. Each marker source expands to a single nop instruction in the compiled code, but there may be additional overhead for computing the marker arguments, which expands to a couple of instructions. Where that overhead matters, execution of the argument setup can be omitted by a runtime if() condition when no one is tracing the marker: if (reference_counter > 0) { Execute marker instructions; } The default value of the reference counter is 0. A tracer has to increment the reference counter before tracing a marker and decrement it when done with the tracing. Currently, the perf tool has limited support for SDT markers, i.e. it cannot trace markers surrounded by a reference counter. Also, it's not easy to add reference counter logic in a userspace tool like perf, so the basic idea of this patchset is to add the reference counter logic in the uprobe infrastructure. Ex,[2] # cat tick.c ... for (i = 0; i < 100; i++) { DTRACE_PROBE1(tick, loop1, i); if (TICK_LOOP2_ENABLED()) { DTRACE_PROBE1(tick, loop2, i); } printf("hi: %d\n", i); sleep(1); } ... Here tick:loop1 is a marker without a reference counter, whereas tick:loop2 is surrounded by a reference counter condition. # perf buildid-cache --add /tmp/tick # perf probe sdt_tick:loop1 # perf probe sdt_tick:loop2 # perf stat -e sdt_tick:loop1,sdt_tick:loop2 -- /tmp/tick hi: 0 hi: 1 hi: 2 ^C Performance counter stats for '/tmp/tick': 3 sdt_tick:loop1 0 sdt_tick:loop2 2.747086086 seconds time elapsed Perf failed to record data for tick:loop2.
Same experiment with this patch series: # ./perf buildid-cache --add /tmp/tick # ./perf probe sdt_tick:loop2 # ./perf stat -e sdt_tick:loop2 /tmp/tick hi: 0 hi: 1 hi: 2 ^C Performance counter stats for '/tmp/tick': 3 sdt_tick:loop2 2.561851452 seconds time elapsed [1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation Ravi Bangoria (6): Uprobes: Simplify uprobe_register() body Uprobe: Additional argument arch_uprobe to uprobe_write_opcode() Uprobes: Support SDT markers having reference count (semaphore) Uprobes/sdt: Prevent multiple reference counter for same uprobe trace_uprobe/sdt: Prevent multiple reference counter for same uprobe perf probe: Support SDT markers having reference counter (semaphore) arch/arm/probes/uprobes/core.c | 2 +- arch/mips/kernel/uprobes.c | 2 +- include/linux/uprobes.h| 7 +- kernel/events/uprobes.c| 329 +++-- kernel/trace/trace.c | 2 +- kernel/trace/trace_uprobe.c| 75 +- tools/perf/util/probe-event.c | 39 - tools/perf/util/probe-event.h | 1 + tools/perf/util/probe-file.c | 34 - tools/perf/util/probe-file.h | 1 + tools/perf/util/symbol-elf.c | 46 -- tools/perf/util/symbol.h | 7 + 12 files changed, 472 insertions(+), 73 deletions(-) -- 2.14.4
[PATCH 2/2] powerpc/perf: Fix mmcra corruption by bhrb_filter
Consider a scenario where user creates two events: 1st event: attr.sample_type |= PERF_SAMPLE_BRANCH_STACK; attr.branch_sample_type = PERF_SAMPLE_BRANCH_ANY; fd = perf_event_open(attr, 0, 1, -1, 0); This sets cpuhw->bhrb_filter to 0 and returns valid fd. 2nd event: attr.sample_type |= PERF_SAMPLE_BRANCH_STACK; attr.branch_sample_type = PERF_SAMPLE_BRANCH_CALL; fd = perf_event_open(attr, 0, 1, -1, 0); It overrides cpuhw->bhrb_filter to -1 and returns with error. Now if power_pmu_enable() gets called by any path other than power_pmu_add(), ppmu->config_bhrb(-1) will set mmcra to -1. Signed-off-by: Ravi Bangoria --- arch/powerpc/perf/core-book3s.c | 6 -- arch/powerpc/perf/power8-pmu.c | 3 +++ arch/powerpc/perf/power9-pmu.c | 3 +++ 3 files changed, 10 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c index b0723002a396..8eb5dc5df62b 100644 --- a/arch/powerpc/perf/core-book3s.c +++ b/arch/powerpc/perf/core-book3s.c @@ -1846,6 +1846,7 @@ static int power_pmu_event_init(struct perf_event *event) int n; int err; struct cpu_hw_events *cpuhw; + u64 bhrb_filter; if (!ppmu) return -ENOENT; @@ -1951,13 +1952,14 @@ static int power_pmu_event_init(struct perf_event *event) err = power_check_constraints(cpuhw, events, cflags, n + 1); if (has_branch_stack(event)) { - cpuhw->bhrb_filter = ppmu->bhrb_filter_map( + bhrb_filter = ppmu->bhrb_filter_map( event->attr.branch_sample_type); - if (cpuhw->bhrb_filter == -1) { + if (bhrb_filter == -1) { put_cpu_var(cpu_hw_events); return -EOPNOTSUPP; } + cpuhw->bhrb_filter = bhrb_filter; } put_cpu_var(cpu_hw_events); diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c index d12a2db26353..d10feef93b6b 100644 --- a/arch/powerpc/perf/power8-pmu.c +++ b/arch/powerpc/perf/power8-pmu.c @@ -29,6 +29,7 @@ enum { #definePOWER8_MMCRA_IFM1 0x4000UL #definePOWER8_MMCRA_IFM2 0x8000UL #definePOWER8_MMCRA_IFM3 0xC000UL +#definePOWER8_MMCRA_BHRB_MASK 0xC000UL /* * Raw event 
encoding for PowerISA v2.07 (Power8): @@ -243,6 +244,8 @@ static u64 power8_bhrb_filter_map(u64 branch_sample_type) static void power8_config_bhrb(u64 pmu_bhrb_filter) { + pmu_bhrb_filter &= POWER8_MMCRA_BHRB_MASK; + /* Enable BHRB filter in PMU */ mtspr(SPRN_MMCRA, (mfspr(SPRN_MMCRA) | pmu_bhrb_filter)); } diff --git a/arch/powerpc/perf/power9-pmu.c b/arch/powerpc/perf/power9-pmu.c index 030544e35959..f3987915cadc 100644 --- a/arch/powerpc/perf/power9-pmu.c +++ b/arch/powerpc/perf/power9-pmu.c @@ -92,6 +92,7 @@ enum { #define POWER9_MMCRA_IFM1 0x4000UL #define POWER9_MMCRA_IFM2 0x8000UL #define POWER9_MMCRA_IFM3 0xC000UL +#define POWER9_MMCRA_BHRB_MASK 0xC000UL /* Nasty Power9 specific hack */ #define PVR_POWER9_CUMULUS 0x2000 @@ -300,6 +301,8 @@ static u64 power9_bhrb_filter_map(u64 branch_sample_type) static void power9_config_bhrb(u64 pmu_bhrb_filter) { + pmu_bhrb_filter &= POWER9_MMCRA_BHRB_MASK; + /* Enable BHRB filter in PMU */ mtspr(SPRN_MMCRA, (mfspr(SPRN_MMCRA) | pmu_bhrb_filter)); } -- 2.20.1
[PATCH 1/2] perf ioctl: Add check for the sample_period value
Add a check for the sample_period value sent from userspace. A negative value does not make sense, and in powerpc arch code it can cause a recursive PMI leading to a hang (reported when running perf-fuzzer). Signed-off-by: Ravi Bangoria --- kernel/events/core.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/kernel/events/core.c b/kernel/events/core.c index abbd4b3b96c2..e44c90378940 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -5005,6 +5005,9 @@ static int perf_event_period(struct perf_event *event, u64 __user *arg) if (perf_event_check_period(event, value)) return -EINVAL; + if (!event->attr.freq && (value & (1ULL << 63))) + return -EINVAL; + event_function_call(event, __perf_event_period, &value); return 0; -- 2.20.1
Re: [PATCH 2/2] powerpc/perf: Fix mmcra corruption by bhrb_filter
On 5/11/19 8:12 AM, Ravi Bangoria wrote: > Consider a scenario where user creates two events: > > 1st event: > attr.sample_type |= PERF_SAMPLE_BRANCH_STACK; > attr.branch_sample_type = PERF_SAMPLE_BRANCH_ANY; > fd = perf_event_open(attr, 0, 1, -1, 0); > > This sets cpuhw->bhrb_filter to 0 and returns valid fd. > > 2nd event: > attr.sample_type |= PERF_SAMPLE_BRANCH_STACK; > attr.branch_sample_type = PERF_SAMPLE_BRANCH_CALL; > fd = perf_event_open(attr, 0, 1, -1, 0); > > It overrides cpuhw->bhrb_filter to -1 and returns with error. > > Now if power_pmu_enable() gets called by any path other than > power_pmu_add(), ppmu->config_bhrb(-1) will set mmcra to -1. > > Signed-off-by: Ravi Bangoria Fixes: 3925f46bb590 ("powerpc/perf: Enable branch stack sampling framework")
Re: [PATCH 1/2] perf ioctl: Add check for the sample_period value
On 5/13/19 2:26 PM, Peter Zijlstra wrote: > On Mon, May 13, 2019 at 09:42:13AM +0200, Peter Zijlstra wrote: >> On Sat, May 11, 2019 at 08:12:16AM +0530, Ravi Bangoria wrote: >>> Add a check for sample_period value sent from userspace. Negative >>> value does not make sense. And in powerpc arch code this could cause >>> a recursive PMI leading to a hang (reported when running perf-fuzzer). >>> >>> Signed-off-by: Ravi Bangoria >>> --- >>> kernel/events/core.c | 3 +++ >>> 1 file changed, 3 insertions(+) >>> >>> diff --git a/kernel/events/core.c b/kernel/events/core.c >>> index abbd4b3b96c2..e44c90378940 100644 >>> --- a/kernel/events/core.c >>> +++ b/kernel/events/core.c >>> @@ -5005,6 +5005,9 @@ static int perf_event_period(struct perf_event >>> *event, u64 __user *arg) >>> if (perf_event_check_period(event, value)) >>> return -EINVAL; >>> >>> + if (!event->attr.freq && (value & (1ULL << 63))) >>> + return -EINVAL; >> >> Well, perf_event_attr::sample_period is __u64. Would not be the site >> using it as signed be the one in error? > > You forgot to mention commit: 0819b2e30ccb9, so I guess this just makes > it consistent and is fine. > Yeah, I was about to reply :)
Re: [PATCH V2 1/3] perf parse-regs: Split parse_regs
On 5/15/19 1:49 AM, kan.li...@linux.intel.com wrote: > From: Kan Liang > > The available registers for --int-regs and --user-regs may be different, > e.g. XMM registers. > > Split parse_regs into two dedicated functions for --int-regs and > --user-regs respectively. > > Modify the warning message. "--user-regs=?" should be applied to show > the available registers for --user-regs. > > Signed-off-by: Kan Liang > --- For patch 1 and 2, Tested-by: Ravi Bangoria Minor nit: should we update the documentation as well? Maybe something like: tools/perf/Documentation/perf-record.txt --user-regs:: Similar to -I, but capture user registers at sample time. To list the available user registers use --user-regs=\?. Ravi
Re: [PATCH]perf:Remove P8 HW events which are not supported
On 3/18/19 11:56 PM, Arnaldo Carvalho de Melo wrote: > I added this: > > Cc: Sukadev Bhattiprolu > Fixes: 2a81fa3bb5ed ("perf vendor events: Add power8 PMU events") > > - Arnaldo Sure. Thanks a lot Arnaldo!
[PATCH] perf c2c: Fix c2c report for empty numa node
perf c2c report fails if the system has an empty NUMA node (0 CPUs): $ lscpu NUMA node0 CPU(s): NUMA node1 CPU(s): 0-4 $ sudo ./perf c2c report node/cpu topology bug Failed setup nodes Fix this. Reported-by: Nageswara R Sastry Signed-off-by: Ravi Bangoria --- tools/perf/util/cpumap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/perf/util/cpumap.c b/tools/perf/util/cpumap.c index 383674f448fc..517c3f37c613 100644 --- a/tools/perf/util/cpumap.c +++ b/tools/perf/util/cpumap.c @@ -261,7 +261,7 @@ struct cpu_map *cpu_map__dummy_new(void) struct cpu_map *cpus = malloc(sizeof(*cpus) + sizeof(int)); if (cpus != NULL) { - cpus->nr = 1; + cpus->nr = 0; cpus->map[0] = -1; refcount_set(&cpus->refcnt, 1); } -- 2.17.1
Re: [PATCH] perf c2c: Fix c2c report for empty numa node
On 2/28/19 9:52 PM, Jiri Olsa wrote: > how about attached change (untested)? LGTM. Would you mind sending a patch. > > but I wonder there are some other hidden > bugs wrt empty node > > jirka > > > --- > diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c > index 4272763a5e96..9e6cc868bdb4 100644 > --- a/tools/perf/builtin-c2c.c > +++ b/tools/perf/builtin-c2c.c > @@ -2056,6 +2056,12 @@ static int setup_nodes(struct perf_session *session) > if (!set) > return -ENOMEM; > > + nodes[node] = set; > + > + /* empty node, skip */ > + if (cpu_map__empty(map)) > + continue; > + > for (cpu = 0; cpu < map->nr; cpu++) { > set_bit(map->map[cpu], set); > > @@ -2064,8 +2070,6 @@ static int setup_nodes(struct perf_session *session) > > cpu2node[map->map[cpu]] = node; > } > - > - nodes[node] = set; > } > > setup_nodes_header(); >
Re: [PATCH v4 3/4] perf stat: Support 'percore' event qualifier
On 4/12/19 7:29 PM, Jin Yao wrote: > diff --git a/tools/perf/Documentation/perf-stat.txt > b/tools/perf/Documentation/perf-stat.txt > index 39c05f8..1e312c2 100644 > --- a/tools/perf/Documentation/perf-stat.txt > +++ b/tools/perf/Documentation/perf-stat.txt > @@ -43,6 +43,10 @@ report:: > param1 and param2 are defined as formats for the PMU in > /sys/bus/event_source/devices//format/* > > + 'percore' is a event qualifier that sums up the event counts for both > + hardware threads in a core. s/both/all/ : $ lscpu | grep Thread Thread(s) per core:4 Apart from that, for the series: Tested-by: Ravi Bangoria
Re: [PATCH] perf c2c: Fix c2c report for empty numa node
On 3/1/19 3:56 PM, Jiri Olsa wrote: > Ravi Bangoria reported that we fail with empty > numa node with following message: > > $ lscpu > NUMA node0 CPU(s): > NUMA node1 CPU(s): 0-4 > > $ sudo ./perf c2c report > node/cpu topology bugFailed setup nodes > > Fixing this by detecting empty node and keeping > its cpu set empty. > > Reported-by: Ravi Bangoria > Link: http://lkml.kernel.org/n/tip-dyq5jo6pn1j3yqavb5ukj...@git.kernel.org > Signed-off-by: Jiri Olsa Reported-by: Nageswara R Sastry Tested-by: Ravi Bangoria
Re: [PATCH] Uprobes: Fix deadlock between delayed_uprobe_lock and fs_reclaim
On 2/8/19 2:03 PM, Ravi Bangoria wrote: > > > On 2/6/19 7:06 PM, Oleg Nesterov wrote: >> Ravi, I am on vacation till the end of this week, can't read your patch >> carefully. >> >> I am not sure I fully understand the problem, but shouldn't we change >> binder_alloc_free_page() to use mmput_async() ? Like it does if trylock >> fails. > > I don't understand binderfs code much so I'll let Sherry comment on this. Sherry, Can you please comment on this. Thanks, Ravi
Re: [PATCH]perf:Remove P8 HW events which are not supported
On 2/7/19 3:09 PM, Mamatha Inamdar wrote: > This patch is to remove following hardware events > from JSON file which are not supported on POWER8 > Acked-by: Ravi Bangoria
Re: System crash with perf_fuzzer (kernel: 5.0.0-rc3)
Hi Jiri, On 2/1/19 1:13 PM, Jiri Olsa wrote: > On Thu, Jan 31, 2019 at 09:27:11AM +0100, Jiri Olsa wrote: >> On Wed, Jan 30, 2019 at 07:36:48PM +0100, Jiri Olsa wrote: >> >> SNIP >> >>> diff --git a/kernel/events/core.c b/kernel/events/core.c >>> index 280a72b3a553..22ec63a0782e 100644 >>> --- a/kernel/events/core.c >>> +++ b/kernel/events/core.c >>> @@ -4969,6 +4969,26 @@ static void __perf_event_period(struct perf_event >>> *event, >>> } >>> } >>> >>> +static int check_period(struct perf_event *event, u64 value) >>> +{ >>> + u64 sample_period_attr = event->attr.sample_period; >>> + u64 sample_period_hw = event->hw.sample_period; >>> + int ret; >>> + >>> + if (event->attr.freq) { >>> + event->attr.sample_freq = value; >>> + } else { >>> + event->attr.sample_period = value; >>> + event->hw.sample_period = value; >>> + } >> >> hm, I think we need to check the period without changing the event, >> because we don't disable pmu, so it might get picked up by bts code >> >> will check > > with attached patch I did not trigger the fuzzer crash > for over a day now, could you guys try? I ran fuzzer for couple of hours but I didn't see any crash with your previous patch. I'll try this newer one as well. Thanks.
Re: System crash with perf_fuzzer (kernel: 5.0.0-rc3)
On 2/1/19 1:24 PM, Ravi Bangoria wrote: > I ran fuzzer for couple of hours but I didn't see any crash with > your previous patch. > > I'll try this newer one as well. I ran fuzzer for ~8 hrs and no lockup so far. Thanks.
Re: [PATCH] perf script: fix crash when processing recorded stat data
On 1/21/19 12:44 AM, Tony Jones wrote: > While updating Perf to work with Python3 and Python2 I noticed that the > stat-cpi script was dumping core. [...] > Fixes: 1fcd03946b52 ("perf stat: Update per-thread shadow stats") > Signed-off-by: Tony Jones Tested-by: Ravi Bangoria
Re: perf trace syscall table generation for powerpc with syscall.tbl
Hi Arnaldo, Yes. I'm aware of it. Just that I was busy with something else so couldn't do it. Thanks for reminding :). Will post a patch soon. Ravi On 1/8/19 10:34 PM, Arnaldo Carvalho de Melo wrote: > Hi Ravi, > > I noticed that in: > > commit ab66dcc76d6ab8fae9d69d149ae38c42605e7fc5 > Author: Firoz Khan > Date: Mon Dec 17 16:10:36 2018 +0530 > > powerpc: generate uapi header and system call table files > > powerpc now generates its syscall tables headers from a syscall.tbl just > like x86 and s390, could you please switch to using it, taking the x86 > and s390 scripts as a starting point, then test on your systems > everything works? > > Thanks, > > - Arnaldo >
[PATCH 1/2] perf powerpc: Rework syscall table generation
Commit aff850393200 ("powerpc: add system call table generation support") changed how systemcall table is generated for powerpc. Incorporate these changes into perf as well. Signed-off-by: Ravi Bangoria --- tools/perf/arch/powerpc/Makefile | 15 +- .../perf/arch/powerpc/entry/syscalls/mksyscalltbl | 22 +- tools/perf/arch/powerpc/entry/syscalls/syscall.tbl | 427 + 3 files changed, 450 insertions(+), 14 deletions(-) create mode 100644 tools/perf/arch/powerpc/entry/syscalls/syscall.tbl diff --git a/tools/perf/arch/powerpc/Makefile b/tools/perf/arch/powerpc/Makefile index a111239df182..e58d00d62f02 100644 --- a/tools/perf/arch/powerpc/Makefile +++ b/tools/perf/arch/powerpc/Makefile @@ -14,18 +14,25 @@ PERF_HAVE_JITDUMP := 1 out:= $(OUTPUT)arch/powerpc/include/generated/asm header32 := $(out)/syscalls_32.c header64 := $(out)/syscalls_64.c -sysdef := $(srctree)/tools/arch/powerpc/include/uapi/asm/unistd.h -sysprf := $(srctree)/tools/perf/arch/powerpc/entry/syscalls/ +syskrn := $(srctree)/arch/powerpc/kernel/syscalls/syscall.tbl +sysprf := $(srctree)/tools/perf/arch/powerpc/entry/syscalls +sysdef := $(sysprf)/syscall.tbl systbl := $(sysprf)/mksyscalltbl # Create output directory if not already present _dummy := $(shell [ -d '$(out)' ] || mkdir -p '$(out)') $(header64): $(sysdef) $(systbl) - $(Q)$(SHELL) '$(systbl)' '64' '$(CC)' $(sysdef) > $@ + @(test -d ../../kernel -a -d ../../tools -a -d ../perf && ( \ + (diff -B $(sysdef) $(syskrn) >/dev/null) \ + || echo "Warning: Kernel ABI header at '$(sysdef)' differs from latest version at '$(syskrn)'" >&2 )) || true + $(Q)$(SHELL) '$(systbl)' '64' $(sysdef) > $@ $(header32): $(sysdef) $(systbl) - $(Q)$(SHELL) '$(systbl)' '32' '$(CC)' $(sysdef) > $@ + @(test -d ../../kernel -a -d ../../tools -a -d ../perf && ( \ + (diff -B $(sysdef) $(syskrn) >/dev/null) \ + || echo "Warning: Kernel ABI header at '$(sysdef)' differs from latest version at '$(syskrn)'" >&2 )) || true + $(Q)$(SHELL) '$(systbl)' '32' $(sysdef) > $@ clean:: $(call 
QUIET_CLEAN, powerpc) $(RM) $(header32) $(header64) diff --git a/tools/perf/arch/powerpc/entry/syscalls/mksyscalltbl b/tools/perf/arch/powerpc/entry/syscalls/mksyscalltbl index ef52e1dd694b..6c58060aa03b 100755 --- a/tools/perf/arch/powerpc/entry/syscalls/mksyscalltbl +++ b/tools/perf/arch/powerpc/entry/syscalls/mksyscalltbl @@ -9,10 +9,9 @@ # Changed by: Ravi Bangoria wordsize=$1 -gcc=$2 -input=$3 +SYSCALL_TBL=$2 -if ! test -r $input; then +if ! test -r $SYSCALL_TBL; then echo "Could not read input file" >&2 exit 1 fi @@ -20,18 +19,21 @@ fi create_table() { local wordsize=$1 - local max_nr + local max_nr nr abi sc discard + max_nr=-1 + nr=0 echo "static const char *syscalltbl_powerpc_${wordsize}[] = {" - while read sc nr; do - printf '\t[%d] = "%s",\n' $nr $sc - max_nr=$nr + while read nr abi sc discard; do + if [ "$max_nr" -lt "$nr" ]; then + printf '\t[%d] = "%s",\n' $nr $sc + max_nr=$nr + fi done echo '};' echo "#define SYSCALLTBL_POWERPC_${wordsize}_MAX_ID $max_nr" } -$gcc -m${wordsize} -E -dM -x c $input\ - |sed -ne 's/^#define __NR_//p' \ - |sort -t' ' -k2 -nu\ +grep -E "^[[:digit:]]+[[:space:]]+(common|spu|nospu|${wordsize})" $SYSCALL_TBL \ + |sort -k1 -n \ |create_table ${wordsize} diff --git a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl new file mode 100644 index ..db3bbb8744af --- /dev/null +++ b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl @@ -0,0 +1,427 @@ +# SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note +# +# system call numbers and entry vectors for powerpc +# +# The format is: +# +# +# The can be common, spu, nospu, 64, or 32 for this file. +# +0 nospu restart_syscall sys_restart_syscall +1 nospu exitsys_exit +2 nospu forkppc_fork +3 common readsys_read +4 common write sys_write +5 common opensys_open compat_sys_open +6 common close sys_close +7 common waitpid sys_waitpid +8 common creat sys_creat +9 common linksys_link +10 common unlink sys_unlink +11 nosp
[PATCH 2/2] perf powerpc: Remove unistd.h
We use syscall.tbl to generate system call table on powerpc. unistd.h is no longer required now. Remove it. Signed-off-by: Ravi Bangoria --- tools/arch/powerpc/include/uapi/asm/unistd.h | 404 --- tools/perf/check-headers.sh | 1 - 2 files changed, 405 deletions(-) delete mode 100644 tools/arch/powerpc/include/uapi/asm/unistd.h diff --git a/tools/arch/powerpc/include/uapi/asm/unistd.h b/tools/arch/powerpc/include/uapi/asm/unistd.h deleted file mode 100644 index 985534d0b448.. --- a/tools/arch/powerpc/include/uapi/asm/unistd.h +++ /dev/null @@ -1,404 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */ -/* - * This file contains the system call numbers. - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. - */ -#ifndef _UAPI_ASM_POWERPC_UNISTD_H_ -#define _UAPI_ASM_POWERPC_UNISTD_H_ - - -#define __NR_restart_syscall 0 -#define __NR_exit1 -#define __NR_fork2 -#define __NR_read3 -#define __NR_write 4 -#define __NR_open5 -#define __NR_close 6 -#define __NR_waitpid 7 -#define __NR_creat 8 -#define __NR_link9 -#define __NR_unlink 10 -#define __NR_execve 11 -#define __NR_chdir 12 -#define __NR_time 13 -#define __NR_mknod 14 -#define __NR_chmod 15 -#define __NR_lchown 16 -#define __NR_break 17 -#define __NR_oldstat18 -#define __NR_lseek 19 -#define __NR_getpid 20 -#define __NR_mount 21 -#define __NR_umount 22 -#define __NR_setuid 23 -#define __NR_getuid 24 -#define __NR_stime 25 -#define __NR_ptrace 26 -#define __NR_alarm 27 -#define __NR_oldfstat 28 -#define __NR_pause 29 -#define __NR_utime 30 -#define __NR_stty 31 -#define __NR_gtty 32 -#define __NR_access 33 -#define __NR_nice 34 -#define __NR_ftime 35 -#define __NR_sync 36 -#define __NR_kill 37 -#define __NR_rename 38 -#define __NR_mkdir 39 -#define __NR_rmdir 40 -#define __NR_dup41 -#define 
__NR_pipe 42 -#define __NR_times 43 -#define __NR_prof 44 -#define __NR_brk45 -#define __NR_setgid 46 -#define __NR_getgid 47 -#define __NR_signal 48 -#define __NR_geteuid49 -#define __NR_getegid50 -#define __NR_acct 51 -#define __NR_umount252 -#define __NR_lock 53 -#define __NR_ioctl 54 -#define __NR_fcntl 55 -#define __NR_mpx56 -#define __NR_setpgid57 -#define __NR_ulimit 58 -#define __NR_oldolduname59 -#define __NR_umask 60 -#define __NR_chroot 61 -#define __NR_ustat 62 -#define __NR_dup2 63 -#define __NR_getppid64 -#define __NR_getpgrp65 -#define __NR_setsid 66 -#define __NR_sigaction 67 -#define __NR_sgetmask 68 -#define __NR_ssetmask 69 -#define __NR_setreuid 70 -#define __NR_setregid 71 -#define __NR_sigsuspend 72 -#define __NR_sigpending 73 -#define __NR_sethostname74 -#define __NR_setrlimit 75 -#define __NR_getrlimit 76 -#define __NR_getrusage 77 -#define __NR_gettimeofday 78 -#define __NR_settimeofday 79 -#define __NR_getgroups 80 -#define __NR_setgroups 81 -#define __NR_select 82 -#define __NR_symlink83 -#define __NR_oldlstat 84 -#define __NR_readlink 85 -#define __NR_uselib 86 -#define __NR_swapon 87 -#define __NR_reboot 88 -#define __NR_readdir89 -#define __NR_mmap 90 -#define __NR_munmap 91 -#define __NR_truncate 92 -#define __NR_ftruncate 93 -#define __NR_fchmod 94 -#define __NR_fchown 95 -#define __NR_getpriority96 -#define __NR_setpriority97 -#define __NR_profil 98 -#define __NR_statfs 99 -#define __NR_fstatfs 100 -#define __NR_ioperm101 -#define __NR_socketcall102 -#define __NR_syslog103 -#define __NR_setitimer 104 -#define __NR_getitimer 105 -#define __NR_stat 106 -#define __NR_lstat 107
Re: [PATCH] Uprobes: Fix kernel oops with delayed_uprobe_remove()
Hi Steve, Please pull this patch. Thanks. On 11/15/18 6:13 PM, Oleg Nesterov wrote: > On 11/15, Ravi Bangoria wrote: >> >> There could be a race between task exit and probe unregister: >> >> exit_mm() >> mmput() >> __mmput() uprobe_unregister() >> uprobe_clear_state() put_uprobe() >> delayed_uprobe_remove() delayed_uprobe_remove() >> >> put_uprobe() is calling delayed_uprobe_remove() without taking >> delayed_uprobe_lock and thus the race sometimes results in a >> kernel crash. Fix this by taking delayed_uprobe_lock before >> calling delayed_uprobe_remove() from put_uprobe(). >> >> Detailed crash log can be found at: >> https://lkml.org/lkml/2018/11/1/1244 > > Thanks, looks good, > > Oleg. >
[PATCH] perf test: Add watchpoint test
We don't have perf test available to test watchpoint functionality. Add simple set of tests: - Read only watchpoint - Write only watchpoint - Read / Write watchpoint - Runtime watchpoint modification Ex on powerpc: $ sudo ./perf test 22 22: Watchpoint: 22.1: Read Only Watchpoint: Ok 22.2: Write Only Watchpoint : Ok 22.3: Read / Write Watchpoint : Ok 22.4: Modify Watchpoint : Ok Signed-off-by: Ravi Bangoria --- tools/perf/tests/Build | 1 + tools/perf/tests/builtin-test.c | 9 ++ tools/perf/tests/tests.h| 3 + tools/perf/tests/wp.c | 225 4 files changed, 238 insertions(+) create mode 100644 tools/perf/tests/wp.c diff --git a/tools/perf/tests/Build b/tools/perf/tests/Build index 6c108fa79ae3..0b2b8305c965 100644 --- a/tools/perf/tests/Build +++ b/tools/perf/tests/Build @@ -21,6 +21,7 @@ perf-y += python-use.o perf-y += bp_signal.o perf-y += bp_signal_overflow.o perf-y += bp_account.o +perf-y += wp.o perf-y += task-exit.o perf-y += sw-clock.o perf-y += mmap-thread-lookup.o diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c index d7a5e1b9aa6f..54ca7d87236f 100644 --- a/tools/perf/tests/builtin-test.c +++ b/tools/perf/tests/builtin-test.c @@ -120,6 +120,15 @@ static struct test generic_tests[] = { .func = test__bp_accounting, .is_supported = test__bp_signal_is_supported, }, + { + .desc = "Watchpoint", + .func = test__wp, + .subtest = { + .skip_if_fail = false, + .get_nr = test__wp_subtest_get_nr, + .get_desc = test__wp_subtest_get_desc, + }, + }, { .desc = "Number of exit events of a simple workload", .func = test__task_exit, diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h index a9760e790563..8e26a4148f30 100644 --- a/tools/perf/tests/tests.h +++ b/tools/perf/tests/tests.h @@ -59,6 +59,9 @@ int test__python_use(struct test *test, int subtest); int test__bp_signal(struct test *test, int subtest); int test__bp_signal_overflow(struct test *test, int subtest); int test__bp_accounting(struct test *test, int subtest); +int 
test__wp(struct test *test, int subtest); const char *test__wp_subtest_get_desc(int subtest); int test__wp_subtest_get_nr(void); int test__task_exit(struct test *test, int subtest); int test__mem(struct test *test, int subtest); int test__sw_clock_freq(struct test *test, int subtest); diff --git a/tools/perf/tests/wp.c b/tools/perf/tests/wp.c new file mode 100644 index ..f9f6d9ee5172 --- /dev/null +++ b/tools/perf/tests/wp.c @@ -0,0 +1,225 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include "tests/tests.h" +#include "arch-tests.h" +#include "debug.h" +#include "cloexec.h" + +#define WP_TEST_ASSERT_VAL(fd, text, val) \ +do {\ + long long count;\ + wp_read(fd, &count, sizeof(long long)); \ + TEST_ASSERT_VAL(text, count == val);\ +} while (0) + +volatile u64 data1; +volatile u8 data2[3]; + +static int wp_read(int fd, long long *count, int size) +{ + int ret = read(fd, count, size); + + if (ret != size) { + pr_debug("failed to read: %d\n", ret); + return -1; + } + return 0; +} + +static void get__perf_event_attr(struct perf_event_attr *attr, int wp_type, +void *wp_addr, unsigned long wp_len) +{ + memset(attr, 0, sizeof(struct perf_event_attr)); + attr->type = PERF_TYPE_BREAKPOINT; + attr->size = sizeof(struct perf_event_attr); + attr->config = 0; + attr->bp_type= wp_type; + attr->bp_addr= (unsigned long)wp_addr; + attr->bp_len = wp_len; + attr->sample_period = 1; + attr->sample_type= PERF_SAMPLE_IP; + attr->exclude_kernel = 1; + attr->exclude_hv = 1; +} + +static int __event(int wp_type, void *wp_addr, unsigned long wp_len) +{ + int fd; + struct perf_event_attr attr; + + get__perf_event_attr(&attr, wp_type, wp_addr, wp_len); + fd = sys_perf_event_open(&attr, 0, -1, -1, +perf_event_open_cloexec_flag()); + if (fd < 0) + pr_debug("failed opening event %x\n", attr.bp_type); + + return fd; +} + +static int wp_ro_test(void) +{ + int fd; + unsigned long tmp, tmp1 = rand(); + + fd = __event(HW_BREAKPOINT_R, (void *)&data1, sizeof(data1)); +
Re: [PATCH v6 3/6] Uprobes: Support SDT markers having reference count (semaphore)
Hi Oleg, On 07/23/2018 09:56 PM, Oleg Nesterov wrote: > I have a mixed feeling about this series... I'll try to summarise my thinking > tomorrow, but I do not see any obvious problem so far. Although I have some > concerns about 5/6, I need to re-read it after sleep. Sure. > > > On 07/16, Ravi Bangoria wrote: >> >> +static int delayed_uprobe_install(struct vm_area_struct *vma) >> +{ >> +struct list_head *pos, *q; >> +struct delayed_uprobe *du; >> +unsigned long vaddr; >> +int ret = 0, err = 0; >> + >> +mutex_lock(&delayed_uprobe_lock); >> +list_for_each_safe(pos, q, &delayed_uprobe_list) { >> +du = list_entry(pos, struct delayed_uprobe, list); >> + >> +if (!du->uprobe->ref_ctr_offset || > > Is it possible to see ->ref_ctr_offset == 0 in delayed_uprobe_list? I'll remove this check. > >> @@ -1072,7 +1282,13 @@ int uprobe_mmap(struct vm_area_struct *vma) >> struct uprobe *uprobe, *u; >> struct inode *inode; >> >> -if (no_uprobe_events() || !valid_vma(vma, true)) >> +if (no_uprobe_events()) >> +return 0; >> + >> +if (vma->vm_flags & VM_WRITE) >> +delayed_uprobe_install(vma); > > Obviously not nice performance-wise... OK, I do not know if it will actually > hurt in practice and probably we can use the better data structures if > necessary. > But can't we check MMF_HAS_UPROBES at least? I mean, > > if (vma->vm_flags & VM_WRITE && test_bit(MMF_HAS_UPROBES, > &vma->vm_mm->flags)) > delayed_uprobe_install(vma); > > ? Yes. > > > Perhaps we can even add another flag later, MMF_HAS_DELAYED_UPROBES set by > delayed_uprobe_add(). Yes, good idea. Thanks for the review, Ravi
Re: [PATCH v6 0/6] Uprobes: Support SDT markers having reference count (semaphore)
Oleg, Srikar, Masami, Song, Any feedback please? :) Thanks, Ravi On 07/16/2018 02:17 PM, Ravi Bangoria wrote: > Userspace Statically Defined Tracepoints[1] are dtrace style markers > inside userspace applications. Applications like PostgreSQL, MySQL, > Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc > have these markers embedded in them. These markers are added by developer > at important places in the code. Each marker source expands to a single > nop instruction in the compiled code but there may be additional > overhead for computing the marker arguments which expands to couple of > instructions. In case the overhead is more, execution of it can be > omitted by runtime if() condition when no one is tracing on the marker: > > if (reference_counter > 0) { > Execute marker instructions; > } > > Default value of reference counter is 0. Tracer has to increment the > reference counter before tracing on a marker and decrement it when > done with the tracing. > > Currently, perf tool has limited supports for SDT markers. I.e. it > can not trace markers surrounded by reference counter. Also, it's > not easy to add reference counter logic in userspace tool like perf, > so basic idea for this patchset is to add reference counter logic in > the a uprobe infrastructure. Ex,[2] > > # cat tick.c > ... > for (i = 0; i < 100; i++) { > DTRACE_PROBE1(tick, loop1, i); > if (TICK_LOOP2_ENABLED()) { > DTRACE_PROBE1(tick, loop2, i); > } > printf("hi: %d\n", i); > sleep(1); > } > ... > > Here tick:loop1 is marker without reference counter where as tick:loop2 > is surrounded by reference counter condition. > > # perf buildid-cache --add /tmp/tick > # perf probe sdt_tick:loop1 > # perf probe sdt_tick:loop2 > > # perf stat -e sdt_tick:loop1,sdt_tick:loop2 -- /tmp/tick > hi: 0 > hi: 1 > hi: 2 > ^C > Performance counter stats for '/tmp/tick': > 3 sdt_tick:loop1 > 0 sdt_tick:loop2 > 2.747086086 seconds time elapsed > > Perf failed to record data for tick:loop2. 
Same experiment with this > patch series: > > # ./perf buildid-cache --add /tmp/tick > # ./perf probe sdt_tick:loop2 > # ./perf stat -e sdt_tick:loop2 /tmp/tick > hi: 0 > hi: 1 > hi: 2 > ^C > Performance counter stats for '/tmp/tick': > 3 sdt_tick:loop2 >2.561851452 seconds time elapsed > > > v6 changes: > - Do not export struct uprobe outside of uprobe.c. Instead, use >container_of(arch_uprobe) to regain uprobe in uprobe_write_opcode(). > - Allow 0 as a special value for reference counter _offset_. I.e. >two uprobes, one having ref_ctr_offset=0 and the other having >non-zero ref_ctr_offset can coexists. > - If vma holding reference counter is not present while patching an >instruction, we add that uprobe in delayed_uprobe_list. When >appropriate mapping is created, we increment the reference counter >and remove uprobe from delayed_uprobe_list. While doing all this, >v5 was searching for all such uprobes in uprobe_tree which is not >require. Also, uprobes are stored in rbtree with inode+offset as >the key and v5 was searching uprobe based on inode+ref_ctr_offset >which is wrong too. Fix this by directly looing on delayed_uprobe_list. > - Consider VM_SHARED vma as invalid for reference counter. Return >false from valid_ref_ctr_vma() if vma->vm_flags has VM_SHARED set. > - No need to use FOLL_FORCE in get_user_pages_remote() while getting >a page to update reference counter. Remove it. > - Do not mention usage of reference counter in Documentation/. 
> > > v5 can be found at: https://lkml.org/lkml/2018/6/28/51 > > Ravi Bangoria (6): > Uprobes: Simplify uprobe_register() body > Uprobe: Additional argument arch_uprobe to uprobe_write_opcode() > Uprobes: Support SDT markers having reference count (semaphore) > trace_uprobe/sdt: Prevent multiple reference counter for same uprobe > Uprobes/sdt: Prevent multiple reference counter for same uprobe > perf probe: Support SDT markers having reference counter (semaphore) > > arch/arm/probes/uprobes/core.c | 2 +- > arch/mips/kernel/uprobes.c | 2 +- > include/linux/uprobes.h| 7 +- > kernel/events/uprobes.c| 358 > - > kernel/trace/trace.c | 2 +- > kernel/trace/trace_uprobe.c| 78 - > tools/perf/util/probe-event.c | 39 - > tools/perf/util/probe-event.h | 1 + > tools/perf/util/probe-file.c | 34 +++- > tools/perf/util/probe-file.h | 1 + > tools/perf/util/symbol-elf.c | 46 -- > tools/perf/util/symbol.h | 7 + > 12 files changed, 503 insertions(+), 74 deletions(-) >
Re: [PATCH] perf report powerpc: Fix crash if callchain is empty
On 06/11/2018 04:10 PM, Sandipan Das wrote: > For some cases, the callchain provided by the kernel may be > empty. So, the callchain ip filtering code will cause a crash > if we do not check whether the struct ip_callchain pointer is > NULL before accessing any members. > > This can be observed on a powerpc64le system running Fedora 27 > as shown below. > > # perf record -b -e cycles:u ls > > Before applying this patch: > > # perf report --branch-history > > perf: Segmentation fault > backtrace > perf[0x1027615c] > linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x7fff856304d8] > perf(arch_skip_callchain_idx+0x44)[0x10257c58] > perf[0x1017f2e4] > perf(thread__resolve_callchain+0x124)[0x1017ff5c] > perf(sample__resolve_callchain+0xf0)[0x10172788] > ... > > After applying this patch: > > # perf report --branch-history > > Samples: 25 of event 'cycles:u', Event count (approx.): 2306870 > Overhead  Source:Line         Symbol            Shared Object > + 11.60%  _init+35736         [.] _init         ls > +  9.84%  strcoll_l.c:137     [.] __strcoll_l   libc-2.26.so > +  9.16%  memcpy.S:175        [.] __memcpy_power7  libc-2.26.so > +  9.01%  gconv_charset.h:54  [.] _nl_find_locale  libc-2.26.so > +  8.87%  dl-addr.c:52        [.] _dl_addr      libc-2.26.so > +  8.83%  _init+236           [.] _init         ls > ... > > Reported-by: Ravi Bangoria > Signed-off-by: Sandipan Das Acked-by: Ravi Bangoria
Re: [PATCH v5 06/10] Uprobes: Support SDT markers having reference count (semaphore)
Hi Song, On 07/14/2018 05:20 AM, Song Liu wrote: > > Hmm... what happens when we have multiple uprobes sharing the same > reference counter? It feels equally complicate to me. Or did I miss any > cases here? As far as I can think of, it should be handled by default. No special treatment needed for this. ... > > This patch tries to resolve this imbalance by passing extra flag > "restore_insn" to probe_event_disable(). > > Signed-off-by: Song Liu Ah cool. Seems this will make them balanced. But is it fine to change uprobe_unregister()? It's already an ABI. No other way around? Also, one more source of imbalance is this condition: if (is_register) flags |= VM_WRITE; in valid_vma(). You need to take care of that as well. Thanks, Ravi
[PATCH v6 2/6] Uprobe: Additional argument arch_uprobe to uprobe_write_opcode()
Add an additional argument, 'arch_uprobe', to uprobe_write_opcode(). We need this in a later set of patches. Signed-off-by: Ravi Bangoria --- arch/arm/probes/uprobes/core.c | 2 +- arch/mips/kernel/uprobes.c | 2 +- include/linux/uprobes.h| 2 +- kernel/events/uprobes.c| 9 + 4 files changed, 8 insertions(+), 7 deletions(-) diff --git a/arch/arm/probes/uprobes/core.c b/arch/arm/probes/uprobes/core.c index d1329f1ba4e4..bf992264060e 100644 --- a/arch/arm/probes/uprobes/core.c +++ b/arch/arm/probes/uprobes/core.c @@ -32,7 +32,7 @@ bool is_swbp_insn(uprobe_opcode_t *insn) int set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr) { - return uprobe_write_opcode(mm, vaddr, + return uprobe_write_opcode(auprobe, mm, vaddr, __opcode_to_mem_arm(auprobe->bpinsn)); } diff --git a/arch/mips/kernel/uprobes.c b/arch/mips/kernel/uprobes.c index f7a0645ccb82..4aaff3b3175c 100644 --- a/arch/mips/kernel/uprobes.c +++ b/arch/mips/kernel/uprobes.c @@ -224,7 +224,7 @@ unsigned long arch_uretprobe_hijack_return_addr( int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr) { - return uprobe_write_opcode(mm, vaddr, UPROBE_SWBP_INSN); + return uprobe_write_opcode(auprobe, mm, vaddr, UPROBE_SWBP_INSN); } void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr, diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h index 0a294e950df8..bb9d2084af03 100644 --- a/include/linux/uprobes.h +++ b/include/linux/uprobes.h @@ -121,7 +121,7 @@ extern bool is_swbp_insn(uprobe_opcode_t *insn); extern bool is_trap_insn(uprobe_opcode_t *insn); extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs); extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs); -extern int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t); +extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t); extern int uprobe_register(struct inode *inode, loff_t
offset, struct uprobe_consumer *uc); extern int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool); extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc); diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 471eac896635..c0418ba52ba8 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -299,8 +299,8 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t * Called with mm->mmap_sem held for write. * Return 0 (success) or a negative errno. */ -int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr, - uprobe_opcode_t opcode) +int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, + unsigned long vaddr, uprobe_opcode_t opcode) { struct page *old_page, *new_page; struct vm_area_struct *vma; @@ -351,7 +351,7 @@ int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr, */ int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr) { - return uprobe_write_opcode(mm, vaddr, UPROBE_SWBP_INSN); + return uprobe_write_opcode(auprobe, mm, vaddr, UPROBE_SWBP_INSN); } /** @@ -366,7 +366,8 @@ int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned int __weak set_orig_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr) { - return uprobe_write_opcode(mm, vaddr, *(uprobe_opcode_t *)&auprobe->insn); + return uprobe_write_opcode(auprobe, mm, vaddr, + *(uprobe_opcode_t *)&auprobe->insn); } static struct uprobe *get_uprobe(struct uprobe *uprobe) -- 2.14.4
[PATCH v6 1/6] Uprobes: Simplify uprobe_register() body
Simplify uprobe_register() function body and let __uprobe_register() handle everything. Also move dependency functions around to fix build failures. Signed-off-by: Ravi Bangoria --- kernel/events/uprobes.c | 69 ++--- 1 file changed, 36 insertions(+), 33 deletions(-) diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index ccc579a7d32e..471eac896635 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -840,13 +840,8 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new) return err; } -static int __uprobe_register(struct uprobe *uprobe, struct uprobe_consumer *uc) -{ - consumer_add(uprobe, uc); - return register_for_each_vma(uprobe, uc); -} - -static void __uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc) +static void +__uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc) { int err; @@ -860,24 +855,46 @@ static void __uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *u } /* - * uprobe_register - register a probe + * uprobe_unregister - unregister a already registered probe. + * @inode: the file in which the probe has to be removed. + * @offset: offset from the start of the file. + * @uc: identify which probe if multiple probes are colocated. + */ +void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc) +{ + struct uprobe *uprobe; + + uprobe = find_uprobe(inode, offset); + if (WARN_ON(!uprobe)) + return; + + down_write(&uprobe->register_rwsem); + __uprobe_unregister(uprobe, uc); + up_write(&uprobe->register_rwsem); + put_uprobe(uprobe); +} +EXPORT_SYMBOL_GPL(uprobe_unregister); + +/* + * __uprobe_register - register a probe * @inode: the file in which the probe has to be placed. * @offset: offset from the start of the file. * @uc: information on howto handle the probe..
 * - * Apart from the access refcount, uprobe_register() takes a creation + * Apart from the access refcount, __uprobe_register() takes a creation * refcount (thro alloc_uprobe) if and only if this @uprobe is getting * inserted into the rbtree (i.e first consumer for a @inode:@offset * tuple). Creation refcount stops uprobe_unregister from freeing the * @uprobe even before the register operation is complete. Creation * refcount is released when the last @uc for the @uprobe - * unregisters. Caller of uprobe_register() is required to keep @inode + * unregisters. Caller of __uprobe_register() is required to keep @inode * (and the containing mount) referenced. * * Return errno if it cannot successully install probes * else return 0 (success) */ -int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc) +static int __uprobe_register(struct inode *inode, loff_t offset, +struct uprobe_consumer *uc) { struct uprobe *uprobe; int ret; @@ -904,7 +921,8 @@ int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer * down_write(&uprobe->register_rwsem); ret = -EAGAIN; if (likely(uprobe_is_active(uprobe))) { - ret = __uprobe_register(uprobe, uc); + consumer_add(uprobe, uc); + ret = register_for_each_vma(uprobe, uc); if (ret) __uprobe_unregister(uprobe, uc); } @@ -915,6 +933,12 @@ int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer * goto retry; return ret; } + +int uprobe_register(struct inode *inode, loff_t offset, + struct uprobe_consumer *uc) +{ + return __uprobe_register(inode, offset, uc); +} EXPORT_SYMBOL_GPL(uprobe_register); /* @@ -946,27 +970,6 @@ int uprobe_apply(struct inode *inode, loff_t offset, return ret; } -/* - * uprobe_unregister - unregister a already registered probe. - * @inode: the file in which the probe has to be removed. - * @offset: offset from the start of the file. - * @uc: identify which probe if multiple probes are colocated.
- */ -void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc) -{ - struct uprobe *uprobe; - - uprobe = find_uprobe(inode, offset); - if (WARN_ON(!uprobe)) - return; - - down_write(&uprobe->register_rwsem); - __uprobe_unregister(uprobe, uc); - up_write(&uprobe->register_rwsem); - put_uprobe(uprobe); -} -EXPORT_SYMBOL_GPL(uprobe_unregister); - static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm) { struct vm_area_struct *vma; -- 2.14.4
[PATCH v6 0/6] Uprobes: Support SDT markers having reference count (semaphore)
Userspace Statically Defined Tracepoints[1] are dtrace style markers inside userspace applications. Applications like PostgreSQL, MySQL, Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc have these markers embedded in them. These markers are added by developers at important places in the code. Each marker source expands to a single nop instruction in the compiled code but there may be additional overhead for computing the marker arguments which expands to a couple of instructions. In case the overhead is more, execution of it can be omitted by a runtime if() condition when no one is tracing on the marker: if (reference_counter > 0) { Execute marker instructions; } Default value of the reference counter is 0. The tracer has to increment the reference counter before tracing on a marker and decrement it when done with the tracing. Currently, the perf tool has limited support for SDT markers. I.e. it cannot trace markers surrounded by a reference counter. Also, it's not easy to add reference counter logic in a userspace tool like perf, so the basic idea for this patchset is to add the reference counter logic in the uprobe infrastructure. Ex,[2] # cat tick.c ... for (i = 0; i < 100; i++) { DTRACE_PROBE1(tick, loop1, i); if (TICK_LOOP2_ENABLED()) { DTRACE_PROBE1(tick, loop2, i); } printf("hi: %d\n", i); sleep(1); } ... Here tick:loop1 is a marker without a reference counter whereas tick:loop2 is surrounded by a reference counter condition. # perf buildid-cache --add /tmp/tick # perf probe sdt_tick:loop1 # perf probe sdt_tick:loop2 # perf stat -e sdt_tick:loop1,sdt_tick:loop2 -- /tmp/tick hi: 0 hi: 1 hi: 2 ^C Performance counter stats for '/tmp/tick': 3 sdt_tick:loop1 0 sdt_tick:loop2 2.747086086 seconds time elapsed Perf failed to record data for tick:loop2.
Same experiment with this patch series: # ./perf buildid-cache --add /tmp/tick # ./perf probe sdt_tick:loop2 # ./perf stat -e sdt_tick:loop2 /tmp/tick hi: 0 hi: 1 hi: 2 ^C Performance counter stats for '/tmp/tick': 3 sdt_tick:loop2 2.561851452 seconds time elapsed v6 changes: - Do not export struct uprobe outside of uprobe.c. Instead, use container_of(arch_uprobe) to regain uprobe in uprobe_write_opcode(). - Allow 0 as a special value for reference counter _offset_. I.e. two uprobes, one having ref_ctr_offset=0 and the other having non-zero ref_ctr_offset, can coexist. - If the vma holding the reference counter is not present while patching an instruction, we add that uprobe to delayed_uprobe_list. When the appropriate mapping is created, we increment the reference counter and remove the uprobe from delayed_uprobe_list. While doing all this, v5 was searching for all such uprobes in uprobe_tree, which is not required. Also, uprobes are stored in the rbtree with inode+offset as the key, and v5 was searching for uprobes based on inode+ref_ctr_offset, which is wrong too. Fix this by directly looking at delayed_uprobe_list. - Consider a VM_SHARED vma as invalid for the reference counter. Return false from valid_ref_ctr_vma() if vma->vm_flags has VM_SHARED set. - No need to use FOLL_FORCE in get_user_pages_remote() while getting a page to update the reference counter. Remove it. - Do not mention usage of the reference counter in Documentation/.
v5 can be found at: https://lkml.org/lkml/2018/6/28/51 Ravi Bangoria (6): Uprobes: Simplify uprobe_register() body Uprobe: Additional argument arch_uprobe to uprobe_write_opcode() Uprobes: Support SDT markers having reference count (semaphore) trace_uprobe/sdt: Prevent multiple reference counter for same uprobe Uprobes/sdt: Prevent multiple reference counter for same uprobe perf probe: Support SDT markers having reference counter (semaphore) arch/arm/probes/uprobes/core.c | 2 +- arch/mips/kernel/uprobes.c | 2 +- include/linux/uprobes.h| 7 +- kernel/events/uprobes.c| 358 - kernel/trace/trace.c | 2 +- kernel/trace/trace_uprobe.c| 78 - tools/perf/util/probe-event.c | 39 - tools/perf/util/probe-event.h | 1 + tools/perf/util/probe-file.c | 34 +++- tools/perf/util/probe-file.h | 1 + tools/perf/util/symbol-elf.c | 46 -- tools/perf/util/symbol.h | 7 + 12 files changed, 503 insertions(+), 74 deletions(-) -- 2.14.4
[PATCH v6 3/6] Uprobes: Support SDT markers having reference count (semaphore)
Userspace Statically Defined Tracepoints[1] are dtrace style markers inside userspace applications. Applications like PostgreSQL, MySQL, Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc have these markers embedded in them. These markers are added by developers at important places in the code. Each marker source expands to a single nop instruction in the compiled code but there may be additional overhead for computing the marker arguments which expands to a couple of instructions. In case the overhead is more, execution of it can be omitted by a runtime if() condition when no one is tracing on the marker: if (reference_counter > 0) { Execute marker instructions; } Default value of the reference counter is 0. The tracer has to increment the reference counter before tracing on a marker and decrement it when done with the tracing. Implement the reference counter logic in core uprobe. The user will be able to use it from trace_uprobe as well as from a kernel module. The new trace_uprobe definition with a reference counter will now be: <path>:<offset>[(ref_ctr_offset)] where ref_ctr_offset is an optional field. For kernel modules, a new variant of uprobe_register() has been introduced: uprobe_register_refctr(inode, offset, ref_ctr_offset, consumer) No new variant for uprobe_unregister() because it's assumed to have only one reference counter for one uprobe. [1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation Note: the 'reference counter' is called a 'semaphore' in the original Dtrace (or Systemtap, bcc and even in ELF) documentation and code. But the term 'semaphore' is misleading in this context. This is just a counter used to hold the number of tracers tracing on a marker. It is not really used for any synchronization. So we are referring to it as a 'reference counter' in kernel / perf code.
Signed-off-by: Ravi Bangoria --- include/linux/uprobes.h | 5 + kernel/events/uprobes.c | 232 ++-- kernel/trace/trace.c| 2 +- kernel/trace/trace_uprobe.c | 38 +++- 4 files changed, 267 insertions(+), 10 deletions(-) diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h index bb9d2084af03..103a48a48872 100644 --- a/include/linux/uprobes.h +++ b/include/linux/uprobes.h @@ -123,6 +123,7 @@ extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs); extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs); extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t); extern int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc); +extern int uprobe_register_refctr(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc); extern int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool); extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc); extern int uprobe_mmap(struct vm_area_struct *vma); @@ -160,6 +161,10 @@ uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc) { return -ENOSYS; } +static inline int uprobe_register_refctr(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc) +{ + return -ENOSYS; +} static inline int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool add) { diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index c0418ba52ba8..84da8512a974 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -73,6 +73,7 @@ struct uprobe { struct uprobe_consumer *consumers; struct inode*inode; /* Also hold a ref to inode */ loff_t offset; + loff_t ref_ctr_offset; unsigned long flags; /* @@ -88,6 +89,15 @@ struct uprobe { struct arch_uprobe arch; }; +struct delayed_uprobe { + struct list_head list; + struct uprobe *uprobe; + struct mm_struct *mm; +}; 
+ +static DEFINE_MUTEX(delayed_uprobe_lock); +static LIST_HEAD(delayed_uprobe_list); + /* * Execute out of line area: anonymous executable mapping installed * by the probed task to execute the copy of the original instruction */ @@ -282,6 +292,154 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t return 1; } +static struct delayed_uprobe * +delayed_uprobe_check(struct uprobe *uprobe, struct mm_struct *mm) +{ + struct delayed_uprobe *du; + + list_for_each_entry(du, &delayed_uprobe_list, list) + if (du->uprobe == uprobe && du->mm == mm) + return du; + return NULL; +} + +static int delayed_uprobe_add(struct uprobe *uprobe, struct mm_struct *mm) +{ + struct delayed_uprobe *du; + + if (delayed_uprobe_check(uprobe, mm)) + return 0; + + du = kzalloc(sizeof
[PATCH v6 4/6] trace_uprobe/sdt: Prevent multiple reference counter for same uprobe
We assume to have only one reference counter for one uprobe. Don't allow user to add multiple trace_uprobe entries having same inode+offset but different reference counter. Ex, # echo "p:sdt_tick/loop2 /home/ravi/tick:0x6e4(0x10036)" > uprobe_events # echo "p:sdt_tick/loop2_1 /home/ravi/tick:0x6e4(0xf)" >> uprobe_events bash: echo: write error: Invalid argument # dmesg trace_kprobe: Reference counter offset mismatch. There are two exceptions though: 1) When user is trying to replace the old entry with the new one, we allow this if the new entry does not conflict with any other existing entry. 2) Any one of the reference counter offset is 0 while comparing new entry with old one. Signed-off-by: Ravi Bangoria --- kernel/trace/trace_uprobe.c | 40 ++-- 1 file changed, 38 insertions(+), 2 deletions(-) diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c index bf2be098eb08..9fafb183c49d 100644 --- a/kernel/trace/trace_uprobe.c +++ b/kernel/trace/trace_uprobe.c @@ -324,6 +324,38 @@ static int unregister_trace_uprobe(struct trace_uprobe *tu) return 0; } +/* + * Uprobe with multiple reference counter is not allowed. i.e. + * If inode and offset matches, reference counter offset *must* + * match as well. Though, there are two exceptions: + * 1) We are replacing old trace_uprobe with new one(same + *group/event) then we don't care as far as the new entry + *does not conflict with any other existing entry. + * 2) Any one of the reference counter offset is 0 while comparing + *new entry with old one. + */ +static struct trace_uprobe *find_old_trace_uprobe(struct trace_uprobe *new) +{ + struct trace_uprobe *tmp, *old = NULL; + struct inode *new_inode = d_real_inode(new->path.dentry); + + old = find_probe_event(trace_event_name(&new->tp.call), + new->tp.call.class->system); + + list_for_each_entry(tmp, &uprobe_list, list) { + if ((old ?
old != tmp : true) && + new_inode == d_real_inode(tmp->path.dentry) && + new->offset == tmp->offset && + new->ref_ctr_offset != tmp->ref_ctr_offset && + new->ref_ctr_offset != 0 && + tmp->ref_ctr_offset != 0) { + pr_warn("Reference counter offset mismatch."); + return ERR_PTR(-EINVAL); + } + } + return old; +} + /* Register a trace_uprobe and probe_event */ static int register_trace_uprobe(struct trace_uprobe *tu) { @@ -333,8 +365,12 @@ static int register_trace_uprobe(struct trace_uprobe *tu) mutex_lock(&uprobe_lock); /* register as an event */ - old_tu = find_probe_event(trace_event_name(&tu->tp.call), - tu->tp.call.class->system); + old_tu = find_old_trace_uprobe(tu); + if (IS_ERR(old_tu)) { + ret = PTR_ERR(old_tu); + goto end; + } + if (old_tu) { /* delete old event */ ret = unregister_trace_uprobe(old_tu); -- 2.14.4
[PATCH v6 5/6] Uprobes/sdt: Prevent multiple reference counter for same uprobe
We assume to have only one reference counter for one uprobe. Don't allow user to register multiple uprobes having same inode+offset but different reference counter. Though, existing tools which already support SDT events create a normal uprobe and update the reference counter on their own. Allow 0 as a special value for the reference counter offset. I.e. two uprobes, one having ref_ctr_offset=0 and the other having non-zero ref_ctr_offset, can coexist. This gives the user the flexibility to either depend on the kernel uprobe infrastructure to maintain the reference counter or just use a normal uprobe and maintain the reference counter on their own. Signed-off-by: Ravi Bangoria --- kernel/events/uprobes.c | 52 +++-- 1 file changed, 50 insertions(+), 2 deletions(-) diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 84da8512a974..563cc3e625b3 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -63,6 +63,8 @@ static struct percpu_rw_semaphore dup_mmap_sem; /* Have a copy of original instruction */ #define UPROBE_COPY_INSN 0 +/* Reference counter offset is reloaded with non-zero value. */ +#define REF_CTR_OFF_RELOADED 1 struct uprobe { struct rb_node rb_node;/* node in the rb tree */ @@ -476,9 +478,23 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, return ret; ret = verify_opcode(old_page, vaddr, &opcode); - if (ret <= 0) + if (ret < 0) goto put_old; + /* +* If instruction is already patched but reference counter offset +* has been reloaded to non-zero value, increment the reference +* counter and return. +*/ + if (ret == 0) { + if (is_register && + test_bit(REF_CTR_OFF_RELOADED, &uprobe->flags)) { + WARN_ON(!uprobe->ref_ctr_offset); + ret = update_ref_ctr(uprobe, mm, true); + } + goto put_old; + } + /* We are going to replace instruction, update ref_ctr.
*/ if (!ref_ctr_updated && uprobe->ref_ctr_offset) { ret = update_ref_ctr(uprobe, mm, is_register); @@ -679,6 +695,30 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset, cur_uprobe = insert_uprobe(uprobe); /* a uprobe exists for this inode:offset combination */ if (cur_uprobe) { + /* +* If inode+offset matches, ref_ctr_offset must match as +* well. Though, 0 is a special value for ref_ctr_offset. +*/ + if (cur_uprobe->ref_ctr_offset != uprobe->ref_ctr_offset && + cur_uprobe->ref_ctr_offset != 0 && + uprobe->ref_ctr_offset != 0) { + pr_warn("Err: Reference counter mismatch.\n"); + put_uprobe(cur_uprobe); + kfree(uprobe); + return ERR_PTR(-EINVAL); + } + + /* +* If existing uprobe->ref_ctr_offset is 0 and user is +* registering same uprobe with non-zero ref_ctr_offset, +* set new ref_ctr_offset to existing uprobe. +*/ + + if (!cur_uprobe->ref_ctr_offset && uprobe->ref_ctr_offset) { + cur_uprobe->ref_ctr_offset = uprobe->ref_ctr_offset; + set_bit(REF_CTR_OFF_RELOADED, &cur_uprobe->flags); + } + kfree(uprobe); uprobe = cur_uprobe; } @@ -971,6 +1011,7 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new) bool is_register = !!new; struct map_info *info; int err = 0; + bool installed = false; percpu_down_write(&dup_mmap_sem); info = build_map_info(uprobe->inode->i_mapping, @@ -1000,8 +1041,10 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new) if (is_register) { /* consult only the "caller", new consumer. */ if (consumer_filter(new, - UPROBE_FILTER_REGISTER, mm)) + UPROBE_FILTER_REGISTER, mm)) { err = install_breakpoint(uprobe, mm, vma, info->vaddr); + installed = true; + } } else if (test_bit(MMF_HAS_UPROBES, &mm->flags)) { if (!filter_chain(uprobe, UPROBE_FILTER_UNREGISTER, mm)) @@ -1016,6 +1059,8 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new) } out: percpu_up_write(&dup_mmap_sem); + if (installed) + clear_bit(RE
[PATCH v6 6/6] perf probe: Support SDT markers having reference counter (semaphore)
With this, perf buildid-cache will save SDT markers with reference counter in probe cache. Perf probe will be able to probe markers having reference counter. Ex, # readelf -n /tmp/tick | grep -A1 loop2 Name: loop2 ... Semaphore: 0x10020036 # ./perf buildid-cache --add /tmp/tick # ./perf probe sdt_tick:loop2 # ./perf stat -e sdt_tick:loop2 /tmp/tick hi: 0 hi: 1 hi: 2 ^C Performance counter stats for '/tmp/tick': 3 sdt_tick:loop2 2.561851452 seconds time elapsed Signed-off-by: Ravi Bangoria Acked-by: Masami Hiramatsu Acked-by: Srikar Dronamraju --- tools/perf/util/probe-event.c | 39 tools/perf/util/probe-event.h | 1 + tools/perf/util/probe-file.c | 34 ++-- tools/perf/util/probe-file.h | 1 + tools/perf/util/symbol-elf.c | 46 --- tools/perf/util/symbol.h | 7 +++ 6 files changed, 106 insertions(+), 22 deletions(-) diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c index f119eb628dbb..e86f8be89157 100644 --- a/tools/perf/util/probe-event.c +++ b/tools/perf/util/probe-event.c @@ -1819,6 +1819,12 @@ int parse_probe_trace_command(const char *cmd, struct probe_trace_event *tev) tp->offset = strtoul(fmt2_str, NULL, 10); } + if (tev->uprobes) { + fmt2_str = strchr(p, '('); + if (fmt2_str) + tp->ref_ctr_offset = strtoul(fmt2_str + 1, NULL, 0); + } + tev->nargs = argc - 2; tev->args = zalloc(sizeof(struct probe_trace_arg) * tev->nargs); if (tev->args == NULL) { @@ -2012,6 +2018,22 @@ static int synthesize_probe_trace_arg(struct probe_trace_arg *arg, return err; } +static int +synthesize_uprobe_trace_def(struct probe_trace_event *tev, struct strbuf *buf) +{ + struct probe_trace_point *tp = &tev->point; + int err; + + err = strbuf_addf(buf, "%s:0x%lx", tp->module, tp->address); + + if (err >= 0 && tp->ref_ctr_offset) { + if (!uprobe_ref_ctr_is_supported()) + return -1; + err = strbuf_addf(buf, "(0x%lx)", tp->ref_ctr_offset); + } + return err >= 0 ?
0 : -1; +} + char *synthesize_probe_trace_command(struct probe_trace_event *tev) { struct probe_trace_point *tp = &tev->point; @@ -2041,15 +2063,17 @@ char *synthesize_probe_trace_command(struct probe_trace_event *tev) } /* Use the tp->address for uprobes */ - if (tev->uprobes) - err = strbuf_addf(&buf, "%s:0x%lx", tp->module, tp->address); - else if (!strncmp(tp->symbol, "0x", 2)) + if (tev->uprobes) { + err = synthesize_uprobe_trace_def(tev, &buf); + } else if (!strncmp(tp->symbol, "0x", 2)) { /* Absolute address. See try_to_find_absolute_address() */ err = strbuf_addf(&buf, "%s%s0x%lx", tp->module ?: "", tp->module ? ":" : "", tp->address); - else + } else { err = strbuf_addf(&buf, "%s%s%s+%lu", tp->module ?: "", tp->module ? ":" : "", tp->symbol, tp->offset); + } + if (err) goto error; @@ -2633,6 +2657,13 @@ static void warn_uprobe_event_compat(struct probe_trace_event *tev) { int i; char *buf = synthesize_probe_trace_command(tev); + struct probe_trace_point *tp = &tev->point; + + if (tp->ref_ctr_offset && !uprobe_ref_ctr_is_supported()) { + pr_warning("A semaphore is associated with %s:%s and " + "seems your kernel doesn't support it.\n", + tev->group, tev->event); + } /* Old uprobe event doesn't support memory dereference */ if (!tev->uprobes || tev->nargs == 0 || !buf) diff --git a/tools/perf/util/probe-event.h b/tools/perf/util/probe-event.h index 45b14f020558..15a98c3a2a2f 100644 --- a/tools/perf/util/probe-event.h +++ b/tools/perf/util/probe-event.h @@ -27,6 +27,7 @@ struct probe_trace_point { char *symbol; /* Base symbol */ char *module; /* Module name */ unsigned long offset; /* Offset from symbol */ + unsigned long ref_ctr_offset; /* SDT reference counter offset */ unsigned long address; /* Actual address of the trace point */ bool retprobe; /* Return probe flag */ }; diff --git a/tools/perf/util/probe-file.c b/tools/perf/util/probe-file.c index b76088fadf3d..aac7817d9e14 100644 --- a/tools/perf/util/probe-file.c
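The new synthesize_uprobe_trace_def() above emits `module:0xADDR` with an optional `(0xREF)` suffix when a reference counter offset is present, and bails out when the kernel lacks support. A quick Python model of that string format (the 0x6c4 address below is illustrative; the semaphore offset is the one from the readelf output above):

```python
# Model of the uprobe definition string built by synthesize_uprobe_trace_def():
# "module:0xADDR" plus "(0xREF)" when a reference counter offset is set.

def uprobe_trace_def(module, address, ref_ctr_offset=0, ref_ctr_supported=True):
    """Return the uprobe definition string, or None when a reference
    counter is present but unsupported (the -1 path in the C code)."""
    s = "%s:0x%x" % (module, address)
    if ref_ctr_offset:
        if not ref_ctr_supported:
            return None  # uprobe_ref_ctr_is_supported() failed
        s += "(0x%x)" % ref_ctr_offset
    return s
```

So an SDT marker with a semaphore becomes e.g. `/tmp/tick:0x6c4(0x10020036)`, which is what gets written to uprobe_events.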
Re: [PATCH 42/46] perf script powerpc: Python script for hypervisor call statistics
Hi Paul, On 06/06/2018 08:23 PM, Paul Clarke wrote: > On 06/05/2018 12:50 PM, Arnaldo Carvalho de Melo wrote: >> From: Ravi Bangoria >> >> Add python script to show hypervisor call statistics. Ex, >> >> # perf record -a -e "{powerpc:hcall_entry,powerpc:hcall_exit}" >> # perf script -s scripts/python/powerpc-hcalls.py >> hcall count min(ns) max(ns) avg(ns) >> >> H_RANDOM 82 838 1164 904 >> H_PUT_TCE 47 1078 5928 2003 >> H_EOI 266 1336 3546 1654 >> H_ENTER 28 1646 4038 1952 >> H_PUT_TCE_INDIRECT 230 2166 18168 6109 >> H_IPI 238 1072 3232 1688 >> H_SEND_LOGICAL_LAN 42 5488 21366 7694 >> H_STUFF_TCE 294 986 6210 3591 >> H_XIRR 266 2286 6990 3783 >> H_PROTECT 10 2196 3556 2555 >> H_VIO_SIGNAL 294 1028 2784 1311 >> H_ADD_LOGICAL_LAN_BUFFER 53 1978 3450 2600 >> H_SEND_CRQ 77 1762 7240 2447 > > This translation from HCALL code to mnemonic is more generally useful. Is > there a good way to make the "hcall_table_lookup" function more generally > available, like "syscall_name" in > scripts/python/Perf-Trace-Util/lib/Perf/Trace/Util.py ? It's even simpler > than the syscall ID-to-name mapping, because the HCALL codes are constant, > unlike syscall IDs which vary between arches. > Right, but I don't see any other python script for hcalls, so I thought to add it in the script itself. Let me know if you are planning to add any :) and we can then move it into a generic module. Ravi
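The aggregation the script performs is straightforward: one (count, min, max, avg) tuple per hcall name, printed as the table above. A minimal sketch of that reduction; the hcall names are taken from the table, but the sample durations here are made up:

```python
# Sketch of the per-hcall aggregation done by powerpc-hcalls.py:
# pair up entry/exit durations and reduce to count/min/max/avg in ns.

def hcall_stats(samples):
    """samples: iterable of (hcall_name, duration_ns) pairs.
    Returns {name: (count, min_ns, max_ns, avg_ns)}, i.e. the columns
    of the table the script prints."""
    stats = {}
    for name, ns in samples:
        cnt, lo, hi, total = stats.get(name, (0, None, None, 0))
        lo = ns if lo is None else min(lo, ns)
        hi = ns if hi is None else max(hi, ns)
        stats[name] = (cnt + 1, lo, hi, total + ns)
    # integer average, matching the whole-ns values shown above
    return {n: (c, lo, hi, t // c) for n, (c, lo, hi, t) in stats.items()}
```

A shared module, as Paul suggests, would only need to add the constant HCALL-code-to-mnemonic table next to this.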
Re: [PATCH 0/7] Uprobes: Support SDT markers having reference count (semaphore)
Hi Oleg, On 06/08/2018 10:06 PM, Oleg Nesterov wrote: > Hello, > > I am travelling till the end of the next week, can't read this version > until I return. Just one question, > > On 06/06, Ravi Bangoria wrote: >> >> 1. One of the major reason was the deadlock between uprobe_lock and >> mm->mmap inside trace_uprobe_mmap(). That deadlock was not easy to fix > > Could you remind what exactly was wrong? > > I can't find your previous email about this problem, but iirc you didn't > explain the deadlock in details, just copied some traces from lockdep... The deadlock is between mm->mmap_sem and uprobe_lock. Existing code paths take these locks in the following order: uprobe_lock -> event_mutex -> uprobe->register_rwsem -> dup_mmap_sem -> mm->mmap_sem I've introduced a new function, trace_uprobe_mmap(), which gets called from mmap_region() / vma_adjust() with mm->mmap_sem already acquired. And it has to take uprobe_lock to loop over all trace_uprobes, i.e. the sequence is: mm->mmap_sem -> uprobe_lock This is difficult to resolve because trace_uprobe_mmap() has no control over mm->mmap_sem. Detailed trace from lockdep: [ 499.258006] == [ 499.258205] WARNING: possible circular locking dependency detected [ 499.258409] 4.17.0-rc3+ #76 Not tainted [ 499.258528] -- [ 499.258731] perf/6744 is trying to acquire lock: [ 499.258895] e4895f49 (uprobe_lock){+.+.}, at: trace_uprobe_mmap+0x78/0x130 [ 499.259147] [ 499.259147] but task is already holding lock: [ 499.259349] 9ec93a76 (&mm->mmap_sem){}, at: vm_mmap_pgoff+0xe0/0x160 [ 499.259597] [ 499.259597] which lock already depends on the new lock. 
[ 499.259597] [ 499.259848] [ 499.259848] the existing dependency chain (in reverse order) is: [ 499.260086] [ 499.260086] -> #4 (&mm->mmap_sem){}: [ 499.260277]__lock_acquire+0x53c/0x910 [ 499.260442]lock_acquire+0xf4/0x2f0 [ 499.260595]down_write_killable+0x6c/0x150 [ 499.260764]copy_process.isra.34.part.35+0x1594/0x1be0 [ 499.260967]_do_fork+0xf8/0x910 [ 499.261090]ppc_clone+0x8/0xc [ 499.261209] [ 499.261209] -> #3 (&dup_mmap_sem){}: [ 499.261378]__lock_acquire+0x53c/0x910 [ 499.261540]lock_acquire+0xf4/0x2f0 [ 499.261669]down_write+0x6c/0x110 [ 499.261793]percpu_down_write+0x48/0x140 [ 499.261954]register_for_each_vma+0x6c/0x2a0 [ 499.262116]uprobe_register+0x230/0x320 [ 499.262277]probe_event_enable+0x1cc/0x540 [ 499.262435]perf_trace_event_init+0x1e0/0x350 [ 499.262587]perf_trace_init+0xb0/0x110 [ 499.262750]perf_tp_event_init+0x38/0x90 [ 499.262910]perf_try_init_event+0x10c/0x150 [ 499.263075]perf_event_alloc+0xbb0/0xf10 [ 499.263235]sys_perf_event_open+0x2a8/0xdd0 [ 499.263396]system_call+0x58/0x6c [ 499.263516] [ 499.263516] -> #2 (&uprobe->register_rwsem){}: [ 499.263723]__lock_acquire+0x53c/0x910 [ 499.263884]lock_acquire+0xf4/0x2f0 [ 499.264002]down_write+0x6c/0x110 [ 499.264118]uprobe_register+0x1ec/0x320 [ 499.264283]probe_event_enable+0x1cc/0x540 [ 499.264442]perf_trace_event_init+0x1e0/0x350 [ 499.264603]perf_trace_init+0xb0/0x110 [ 499.264766]perf_tp_event_init+0x38/0x90 [ 499.264930]perf_try_init_event+0x10c/0x150 [ 499.265092]perf_event_alloc+0xbb0/0xf10 [ 499.265261]sys_perf_event_open+0x2a8/0xdd0 [ 499.265424]system_call+0x58/0x6c [ 499.265542] [ 499.265542] -> #1 (event_mutex){+.+.}: [ 499.265738]__lock_acquire+0x53c/0x910 [ 499.265896]lock_acquire+0xf4/0x2f0 [ 499.266019]__mutex_lock+0xa0/0xab0 [ 499.266142]trace_add_event_call+0x44/0x100 [ 499.266310]create_trace_uprobe+0x4a0/0x8b0 [ 499.266474]trace_run_command+0xa4/0xc0 [ 499.266631]trace_parse_run_command+0xe4/0x200 [ 499.266799]probes_write+0x20/0x40 [ 499.266922]__vfs_write+0x6c/0x240 [ 
499.267041]vfs_write+0xd0/0x240 [ 499.267166]ksys_write+0x6c/0x110 [ 499.267295]system_call+0x58/0x6c [ 499.267413] [ 499.267413] -> #0 (uprobe_lock){+.+.}: [ 499.267591]validate_chain.isra.34+0xbd0/0x1000 [ 499.267747]__lock_acquire+0x53c/0x910 [ 499.267917]lock_acquire+0xf4/0x2f0 [ 499.268048]__mutex_lock+0xa0/0xab0 [ 499.268170]trace_uprobe_mmap+0x78/0x130 [ 499.268335]uprobe_mmap+0x80/0x3b0 [ 499.268464]mmap_region+0x290/0x660 [ 499.268590]do_mmap+0x40c/0x500 [ 499.268718]vm_mmap_pgoff+0x114/0x160 [ 499.268870]ksys_mmap_pgoff+0xe8/0x2e0 [ 499.269034]sys_mma
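The inversion described above (uprobe_lock taken before mm->mmap_sem on the register path, mm->mmap_sem taken before uprobe_lock in trace_uprobe_mmap()) is exactly the cycle lockdep is reporting. A toy order-checker makes the point; the two lock sequences are the ones from the explanation above, everything else is a simplified model:

```python
# Toy model of what lockdep detects: record "A taken before B" edges from
# each code path's lock acquisition order, then flag a cycle in the
# combined ordering graph (a potential ABBA deadlock).

def has_order_cycle(paths):
    """paths: list of lock-name sequences, each the order in which one
    code path takes its locks. Returns True if the combined ordering
    graph contains a cycle."""
    edges = {}
    for path in paths:
        for i, a in enumerate(path):
            for b in path[i + 1:]:
                edges.setdefault(a, set()).add(b)

    def reaches(src, dst, seen):
        if src == dst:
            return True
        for nxt in edges.get(src, ()):
            if nxt not in seen:
                seen.add(nxt)
                if reaches(nxt, dst, seen):
                    return True
        return False

    # A cycle exists if some edge a->b can be closed by a path b->...->a.
    return any(reaches(b, a, set()) for a in edges for b in edges[a])
```

Feeding it the register path and the trace_uprobe_mmap() path reproduces the report; either path alone is cycle-free, which is why the bug only appeared once trace_uprobe_mmap() was added.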
Re: [PATCH 0/7] Uprobes: Support SDT markers having reference count (semaphore)
Hi Masami, >>> Hmm, it sounds simple... maybe we can increment refctr in >>> install_breakpoint/ >>> remove_breakpoint? >> >> Not really, it would be simpler if I can put it inside install_breakpoint(). >> Consider an mmap() case. Probed instruction resides in the text section >> whereas >> reference counter resides in the data section. These sections get mapped >> using >> separate mmap() calls. So, when the process mmaps the text section we will >> change the >> instruction, but the section holding the reference counter may not have been >> mapped >> yet in the virtual memory. If so, we will fail to update the reference >> counter. > > Got it. > In such case, maybe we can hook the target page mmapped and do > install_breakpoint() > at that point. Since the instruction is protected by a refctr, unless mmap the > page on where the refctr is, the program doesn't reach the tracepoint. Is > that right? > You mean, when mmap(text) happens, save the target page somewhere and when mmap(data) happens, update both the instruction and the ref_ctr? This sounds feasible. Let me think about it. Thanks for the suggestion, Ravi
Re: [PATCH] perf test: Add watchpoint test
On 09/10/2018 11:01 PM, Arnaldo Carvalho de Melo wrote: > Em Mon, Sep 10, 2018 at 11:18:30AM -0300, Arnaldo Carvalho de Melo escreveu: >> Em Mon, Sep 10, 2018 at 10:47:54AM -0300, Arnaldo Carvalho de Melo escreveu: >>> Em Mon, Sep 10, 2018 at 12:31:54PM +0200, Jiri Olsa escreveu: >>>> On Mon, Sep 10, 2018 at 03:28:11PM +0530, Ravi Bangoria wrote: >>>>> Ex on powerpc: >>>>> $ sudo ./perf test 22 >>>>> 22: Watchpoint: >>>>> 22.1: Read Only Watchpoint: Ok >>>>> 22.2: Write Only Watchpoint : Ok >>>>> 22.3: Read / Write Watchpoint : Ok >>>>> 22.4: Modify Watchpoint : Ok > >>>> cool, thanks! > >>>> Acked-by: Jiri Olsa > >>> Thanks, applied. > > Oops, fails when cross-building it to mips, I'll try to fix after lunch: Sorry for the slightly late reply. Will send v2 with the fix. Thanks Ravi > > 18 109.48 debian:experimental : Ok gcc (Debian 8.2.0-4) 8.2.0 > 19 42.66 debian:experimental-x-arm64 : Ok aarch64-linux-gnu-gcc > (Debian 8.1.0-12) 8.1.0 > 20 22.33 debian:experimental-x-mips: FAIL mips-linux-gnu-gcc (Debian > 8.1.0-12) 8.1.0 > 21 20.05 debian:experimental-x-mips64 : FAIL mips64-linux-gnuabi64-gcc > (Debian 8.1.0-12) 8.1.0 > 22 22.85 debian:experimental-x-mipsel : FAIL mipsel-linux-gnu-gcc > (Debian 8.1.0-12) 8.1.0 > > CC /tmp/build/perf/tests/bp_account.o > CC /tmp/build/perf/tests/wp.o > tests/wp.c:5:10: fatal error: arch-tests.h: No such file or directory > #include "arch-tests.h" > ^~ > compilation terminated. > mv: cannot stat '/tmp/build/perf/tests/.wp.o.tmp': No such file or directory > make[4]: *** [/git/linux/tools/build/Makefile.build:97: > /tmp/build/perf/tests/wp.o] Error 1 > make[4]: *** Waiting for unfinished jobs > CC /tmp/build/perf/util/record.o > CC /tmp/build/perf/util/srcline.o > make[3]: *** [/git/linux/tools/build/Makefile.build:139: tests] Error 2 > make[2]: *** [Makefile.perf:507: /tmp/build/perf/perf-in.o] Error 2 > make[2]: *** Waiting for unfinished jobs >
Re: [PATCH] perf test: Add watchpoint test
> While testing, I got curious, as a 'perf test' user, why one of the > tests had the "Skip" result: > > [root@seventh ~]# perf test watchpoint > 22: Watchpoint: > 22.1: Read Only Watchpoint: Skip > 22.2: Write Only Watchpoint : Ok > 22.3: Read / Write Watchpoint : Ok > 22.4: Modify Watchpoint : Ok > [root@seventh ~]# > > I tried with 'perf test -v watchpoint' but that didn't help, perhaps you > could add some message after the "Skip" telling why it skipped that > test? I.e. hardware doesn't have that capability, kernel driver not yet > supporting that, something else? Sure, I will add a message: pr_debug("Hardware does not support read only watchpoints."); Ravi
[PATCH v2] perf test: Add watchpoint test
We don't have a perf test available to test watchpoint functionality. Add a simple set of tests: - Read only watchpoint - Write only watchpoint - Read / Write watchpoint - Runtime watchpoint modification Ex on powerpc: $ sudo ./perf test 22 22: Watchpoint: 22.1: Read Only Watchpoint: Ok 22.2: Write Only Watchpoint : Ok 22.3: Read / Write Watchpoint : Ok 22.4: Modify Watchpoint : Ok Signed-off-by: Ravi Bangoria Acked-by: Jiri Olsa --- v1 -> v2: - Fix build failure for mips and other archs. - Show debug message if subtest is not supported. tools/perf/tests/Build | 1 + tools/perf/tests/builtin-test.c | 9 ++ tools/perf/tests/tests.h | 3 + tools/perf/tests/wp.c | 229 4 files changed, 242 insertions(+) create mode 100644 tools/perf/tests/wp.c diff --git a/tools/perf/tests/Build b/tools/perf/tests/Build index 6c108fa79ae3..0b2b8305c965 100644 --- a/tools/perf/tests/Build +++ b/tools/perf/tests/Build @@ -21,6 +21,7 @@ perf-y += python-use.o perf-y += bp_signal.o perf-y += bp_signal_overflow.o perf-y += bp_account.o +perf-y += wp.o perf-y += task-exit.o perf-y += sw-clock.o perf-y += mmap-thread-lookup.o diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c index d7a5e1b9aa6f..54ca7d87236f 100644 --- a/tools/perf/tests/builtin-test.c +++ b/tools/perf/tests/builtin-test.c @@ -120,6 +120,15 @@ static struct test generic_tests[] = { .func = test__bp_accounting, .is_supported = test__bp_signal_is_supported, }, + { + .desc = "Watchpoint", + .func = test__wp, + .subtest = { + .skip_if_fail = false, + .get_nr = test__wp_subtest_get_nr, + .get_desc = test__wp_subtest_get_desc, + }, + }, { .desc = "Number of exit events of a simple workload", .func = test__task_exit, diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h index a9760e790563..8e26a4148f30 100644 --- a/tools/perf/tests/tests.h +++ b/tools/perf/tests/tests.h @@ -59,6 +59,9 @@ int test__python_use(struct test *test, int subtest); int test__bp_signal(struct test *test, int subtest); int 
test__bp_signal_overflow(struct test *test, int subtest); int test__bp_accounting(struct test *test, int subtest); +int test__wp(struct test *test, int subtest); +const char *test__wp_subtest_get_desc(int subtest); +int test__wp_subtest_get_nr(void); int test__task_exit(struct test *test, int subtest); int test__mem(struct test *test, int subtest); int test__sw_clock_freq(struct test *test, int subtest); diff --git a/tools/perf/tests/wp.c b/tools/perf/tests/wp.c new file mode 100644 index ..017a99317f94 --- /dev/null +++ b/tools/perf/tests/wp.c @@ -0,0 +1,229 @@ +// SPDX-License-Identifier: GPL-2.0 +#include <stdlib.h> +#include <sys/ioctl.h> +#include <linux/hw_breakpoint.h> +#include "tests.h" +#include "debug.h" +#include "cloexec.h" + +#define WP_TEST_ASSERT_VAL(fd, text, val) \ +do {\ + long long count;\ + wp_read(fd, &count, sizeof(long long)); \ + TEST_ASSERT_VAL(text, count == val);\ +} while (0) + +volatile u64 data1; +volatile u8 data2[3]; + +static int wp_read(int fd, long long *count, int size) +{ + int ret = read(fd, count, size); + + if (ret != size) { + pr_debug("failed to read: %d\n", ret); + return -1; + } + return 0; +} + +static void get__perf_event_attr(struct perf_event_attr *attr, int wp_type, +void *wp_addr, unsigned long wp_len) +{ + memset(attr, 0, sizeof(struct perf_event_attr)); + attr->type = PERF_TYPE_BREAKPOINT; + attr->size = sizeof(struct perf_event_attr); + attr->config = 0; + attr->bp_type = wp_type; + attr->bp_addr = (unsigned long)wp_addr; + attr->bp_len = wp_len; + attr->sample_period = 1; + attr->sample_type = PERF_SAMPLE_IP; + attr->exclude_kernel = 1; + attr->exclude_hv = 1; +} + +static int __event(int wp_type, void *wp_addr, unsigned long wp_len) +{ + int fd; + struct perf_event_attr attr; + + get__perf_event_attr(&attr, wp_type, wp_addr, wp_len); + fd = sys_perf_event_open(&attr, 0, -1, -1, +perf_event_open_cloexec_flag()); + if (fd < 0) + pr_debug("failed opening event %x\n", attr.bp_type); + + return fd; +} + +static int wp_ro_test(void) +{ + int fd
Re: [PATCH] Uprobes: Fix kernel oops with delayed_uprobe_remove()
Hi Oleg, On 11/14/18 9:36 PM, Oleg Nesterov wrote: > On 11/14, Ravi Bangoria wrote: >> >> syzbot reported a kernel crash with delayed_uprobe_remove(): >> https://lkml.org/lkml/2018/11/1/1244 >> >> Backtrace mentioned in the link points to a race between process >> exit and uprobe_unregister(). Fix it by locking delayed_uprobe_lock >> before calling delayed_uprobe_remove() from put_uprobe(). > > The patch looks good to me, but could you update the changelog? > > Please explain that the exiting task calls uprobe_clear_state() which > can race with delayed_uprobe_remove(). IIUC this is the only problem > solved by this patch, right? Right. Is this better: There could be a race between task exit and probe unregister: exit_mm() mmput() __mmput() uprobe_unregister() uprobe_clear_state() put_uprobe() delayed_uprobe_remove() delayed_uprobe_remove() put_uprobe() is calling delayed_uprobe_remove() without taking delayed_uprobe_lock and thus the race sometimes results in a kernel crash. Fix this by taking delayed_uprobe_lock before calling delayed_uprobe_remove() from put_uprobe(). Detailed crash log can be found at: https://lkml.org/lkml/2018/11/1/1244
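The shape of the fix (take delayed_uprobe_lock around delayed_uprobe_remove() so the exiting task and the unregister path cannot mutate the list concurrently) can be modelled in a few lines. This is only a hedged sketch in user-space terms; the names mirror the kernel's but nothing here is kernel code:

```python
# Model of the fix: put_uprobe() must hold delayed_uprobe_lock while
# calling delayed_uprobe_remove(), since two paths (task exit and
# uprobe_unregister()) can reach it at the same time.
import threading

delayed_uprobe_lock = threading.Lock()
delayed_uprobe_list = []  # stand-in for the kernel's delayed_uprobe list

def delayed_uprobe_remove(uprobe):
    # Caller must hold delayed_uprobe_lock, as in the kernel.
    if uprobe in delayed_uprobe_list:
        delayed_uprobe_list.remove(uprobe)

def put_uprobe(uprobe):
    # The fix: take the lock instead of calling the removal unlocked.
    with delayed_uprobe_lock:
        delayed_uprobe_remove(uprobe)
```

With the lock, concurrent callers serialize: the first removes the entry and the rest see it already gone, instead of racing on the list.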
[PATCH 1/2] perf stat: Use perf_evsel__is_clocki() for clock events
We already have a function to check whether a given event is either SW_CPU_CLOCK or SW_TASK_CLOCK. Utilize it. Signed-off-by: Ravi Bangoria --- tools/perf/util/stat-shadow.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c index 8ad32763cfff..f0a8cec55c47 100644 --- a/tools/perf/util/stat-shadow.c +++ b/tools/perf/util/stat-shadow.c @@ -212,8 +212,7 @@ void perf_stat__update_shadow_stats(struct perf_evsel *counter, u64 count, count *= counter->scale; - if (perf_evsel__match(counter, SOFTWARE, SW_TASK_CLOCK) || - perf_evsel__match(counter, SOFTWARE, SW_CPU_CLOCK)) + if (perf_evsel__is_clock(counter)) update_runtime_stat(st, STAT_NSECS, 0, cpu, count); else if (perf_evsel__match(counter, HARDWARE, HW_CPU_CYCLES)) update_runtime_stat(st, STAT_CYCLES, ctx, cpu, count); -- 2.17.1
[RFC 2/2] perf stat: Fix shadow stats for clock events
Commit 0aa802a79469 ("perf stat: Get rid of extra clock display function") introduced scale and unit for clock events. Thus, perf_stat__update_shadow_stats() now saves scaled values of clock events in msecs, instead of original nsecs. But while calculating values of shadow stats we still consider clock event values in nsecs. This results in wrong shadow stat values. Ex, # ./perf stat -e task-clock,cycles ls 2.62 msec task-clock:u # 0.624 CPUs utilized 2,501,536 cycles:u # 1250768.000 GHz Fix this by considering clock events' saved stats in msecs: # ./perf stat -e task-clock,cycles ls 2.42 msec task-clock:u # 0.754 CPUs utilized 2,338,747 cycles:u # 1.169 GHz Note: The problem with this approach is that we lose the fractional part while converting nsecs to msecs. This results in slightly different shadow stat values. Reported-by: Anton Blanchard Fixes: 0aa802a79469 ("perf stat: Get rid of extra clock display function") Signed-off-by: Ravi Bangoria --- tools/perf/util/stat-shadow.c | 13 +++-- tools/perf/util/stat.h | 2 +- 2 files changed, 8 insertions(+), 7 deletions(-) diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c index f0a8cec55c47..5b28b278a24e 100644 --- a/tools/perf/util/stat-shadow.c +++ b/tools/perf/util/stat-shadow.c @@ -1,5 +1,6 @@ // SPDX-License-Identifier: GPL-2.0 #include <stdio.h> +#include <linux/time64.h> #include "evsel.h" #include "stat.h" #include "color.h" @@ -213,7 +214,7 @@ void perf_stat__update_shadow_stats(struct perf_evsel *counter, u64 count, count *= counter->scale; if (perf_evsel__is_clock(counter)) - update_runtime_stat(st, STAT_NSECS, 0, cpu, count); + update_runtime_stat(st, STAT_MSECS, 0, cpu, count); else if (perf_evsel__match(counter, HARDWARE, HW_CPU_CYCLES)) update_runtime_stat(st, STAT_CYCLES, ctx, cpu, count); else if (perf_stat_evsel__is(counter, CYCLES_IN_TX)) @@ -873,10 +874,10 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, } else if (perf_evsel__match(evsel, HARDWARE, HW_STALLED_CYCLES_BACKEND)) { 
print_stalled_cycles_backend(config, cpu, evsel, avg, out, st); } else if (perf_evsel__match(evsel, HARDWARE, HW_CPU_CYCLES)) { - total = runtime_stat_avg(st, STAT_NSECS, 0, cpu); + total = runtime_stat_avg(st, STAT_MSECS, 0, cpu); if (total) { - ratio = avg / total; + ratio = avg / (total * NSEC_PER_MSEC); print_metric(config, ctxp, NULL, "%8.3f", "GHz", ratio); } else { print_metric(config, ctxp, NULL, NULL, "Ghz", 0); @@ -972,14 +973,14 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, } else if (evsel->metric_expr) { generic_metric(config, evsel->metric_expr, evsel->metric_events, evsel->name, evsel->metric_name, avg, cpu, out, st); - } else if (runtime_stat_n(st, STAT_NSECS, 0, cpu) != 0) { + } else if (runtime_stat_n(st, STAT_MSECS, 0, cpu) != 0) { char unit = 'M'; char unit_buf[10]; - total = runtime_stat_avg(st, STAT_NSECS, 0, cpu); + total = runtime_stat_avg(st, STAT_MSECS, 0, cpu); if (total) - ratio = 1000.0 * avg / total; + ratio = 1000.0 * avg / (total * NSEC_PER_MSEC); if (ratio < 0.001) { ratio *= 1000; unit = 'K'; diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h index 2f9c9159a364..941ad4b0836c 100644 --- a/tools/perf/util/stat.h +++ b/tools/perf/util/stat.h @@ -62,7 +62,7 @@ enum { enum stat_type { STAT_NONE = 0, - STAT_NSECS, + STAT_MSECS, /* Milliseconds */ STAT_CYCLES, STAT_STALLED_CYCLES_FRONT, STAT_STALLED_CYCLES_BACK, -- 2.17.1
[PATCH] perf stat: Fix shadow stats for clock events
Commit 0aa802a79469 ("perf stat: Get rid of extra clock display function") introduced scale and unit for clock events. Thus, perf_stat__update_shadow_stats() now saves scaled values of clock events in msecs, instead of original nsecs. But while calculating values of shadow stats we still consider clock event values in nsecs. This results in wrong shadow stat values. Ex, # ./perf stat -e task-clock,cycles ls 2.60 msec task-clock:u # 0.877 CPUs utilized 2,430,564 cycles:u # 1215282.000 GHz Fix this by saving original nsec values for clock events in perf_stat__update_shadow_stats(). After patch: # ./perf stat -e task-clock,cycles ls 3.14 msec task-clock:u # 0.839 CPUs utilized 3,094,528 cycles:u # 0.985 GHz Reported-by: Anton Blanchard Suggested-by: Jiri Olsa Fixes: 0aa802a79469 ("perf stat: Get rid of extra clock display function") Signed-off-by: Ravi Bangoria --- tools/perf/util/stat-shadow.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c index f0a8cec55c47..3c22c58b3e90 100644 --- a/tools/perf/util/stat-shadow.c +++ b/tools/perf/util/stat-shadow.c @@ -209,11 +209,12 @@ void perf_stat__update_shadow_stats(struct perf_evsel *counter, u64 count, int cpu, struct runtime_stat *st) { int ctx = evsel_context(counter); + u64 count_ns = count; count *= counter->scale; if (perf_evsel__is_clock(counter)) - update_runtime_stat(st, STAT_NSECS, 0, cpu, count); + update_runtime_stat(st, STAT_NSECS, 0, cpu, count_ns); else if (perf_evsel__match(counter, HARDWARE, HW_CPU_CYCLES)) update_runtime_stat(st, STAT_CYCLES, ctx, cpu, count); else if (perf_stat_evsel__is(counter, CYCLES_IN_TX)) -- 2.17.1
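The arithmetic behind the bogus "GHz" value is worth spelling out: since 1 GHz is 1 cycle per nanosecond, dividing cycles by nanoseconds gives GHz directly, and feeding a millisecond-scaled total inflates the result by NSEC_PER_MSEC (10^6). A sketch using the figures from the commit message above; the stored task-clock value of 2.0 msec is inferred from the printed 1215282.000, not quoted from the patch:

```python
# The GHz shadow-stat computation: cycles / nanoseconds == GHz.
# When the shadow stat holds a msec-scaled task-clock value but the
# display code divides as if it were nsec, the ratio is 10^6 too large.

NSEC_PER_MSEC = 1_000_000

def ghz(cycles, time_ns):
    """Cycles divided by elapsed nanoseconds, i.e. frequency in GHz."""
    return cycles / time_ns

broken = ghz(2_430_564, 2.0)                  # msec value treated as nsec
fixed = ghz(2_430_564, 2.0 * NSEC_PER_MSEC)   # same value back in nsec
```

`broken` reproduces the absurd 1215282.000 "GHz" from the report, while `fixed` lands near the plausible ~1.2 GHz range.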
Re: [PATCH] powerpc/xmon: Fix data-breakpoint
Thanks Michael, On Tuesday 22 November 2016 05:03 PM, Michael Ellerman wrote: > Ravi Bangoria writes: > >> Xmon data-breakpoint feature is broken. >> >> Whenever a watchpoint match occurs, hw_breakpoint_handler will >> be called by do_break via the notifier chain mechanism. If the watchpoint is >> registered by xmon, hw_breakpoint_handler won't find any associated >> perf_event and returns immediately with NOTIFY_STOP. Similarly, do_break >> also returns without notifying xmon. >> >> Solve this by returning NOTIFY_DONE when hw_breakpoint_handler does not >> find any perf_event associated with the matched watchpoint. > .. rather than NOTIFY_STOP, which tells the core code to continue > calling the other breakpoint handlers including the xmon one. > > Right? Yes. > Also any idea when we broke this? Hmm, not sure exactly. The code has been the same since it was merged in 2010, when support for hw_breakpoint was added for server processors. -Ravi > cheers >
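The NOTIFY_DONE vs NOTIFY_STOP distinction is the crux here: NOTIFY_STOP halts traversal of the notifier chain, so xmon's handler never runs; NOTIFY_DONE lets the chain continue. A toy model of that chain semantics (handler names are illustrative; the real chain and return codes live in the kernel):

```python
# Toy model of a kernel notifier chain: handlers run in order until one
# returns NOTIFY_STOP, which halts traversal.
NOTIFY_DONE, NOTIFY_STOP = 0, 1

def run_notifier_chain(handlers):
    """Call (name, fn) handlers in order; stop early on NOTIFY_STOP.
    Returns the list of handler names that actually ran."""
    ran = []
    for name, fn in handlers:
        ran.append(name)
        if fn() == NOTIFY_STOP:
            break
    return ran

# Before the fix: hw_breakpoint_handler returned NOTIFY_STOP even with no
# perf_event attached, so a later xmon handler was never reached.
before = run_notifier_chain([
    ("hw_breakpoint_handler", lambda: NOTIFY_STOP),
    ("xmon_handler", lambda: NOTIFY_DONE),
])
# After the fix it returns NOTIFY_DONE and xmon gets its turn.
after = run_notifier_chain([
    ("hw_breakpoint_handler", lambda: NOTIFY_DONE),
    ("xmon_handler", lambda: NOTIFY_DONE),
])
```

This is why returning the right code from a handler that has nothing to do matters: it is not just a status, it controls whether the rest of the chain sees the event at all.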
Re: [PATCH v8 2/3] perf annotate: Support jump instruction with target as second operand
Hi Arnaldo, Hmm, so it's difficult to find an example of this when we use debuginfo. Because... jump__parse tries to look for two things: 'offset' and 'target address'. objdump with debuginfo will include the offset in the assembly, e.g. annotate of 'smp_call_function_single' with the perf.data and vmlinux I shared. │c016d6ac: cmpwi cr7,r9,0 ▒ │c016d6b0: ↑ bne cr7,c016d59c <.smp_call_function_single+0x8c> ▒ │c016d6b4: addis r10,r2,-15 ▒ objdump of the same function with kcore: │c016d6ac: cmpwi cr7,r9,0 ▒ │c016d6b0: ↓ bne cr7,0xc016d59c ▒ │c016d6b4: addis r10,r2,-15 ▒ Annotating in the first case won't show any issue because we directly get the offset, but even there we parse the wrong target address into ops->target.addr. In the second case we don't have the offset, so we use the target address to find it, and thus it shows wrong output, something like: │ cmpwi cr7,r9,0 ▒ │ ↓ bne 3fe92afc ▒ │ addis r10,r2,-15 ▒ BTW, we have a lot of such instructions in the kernel. Thanks, -Ravi On Monday 05 December 2016 09:26 PM, Ravi Bangoria wrote: > Arch like powerpc has jump instructions that include the target address > as second operand. For example, 'bne cr7,0xc00f6154'. Add > support for such instruction in perf annotate. > > objdump o/p: > c00f6140: ld r9,1032(r31) > c00f6144: cmpdi cr7,r9,0 > c00f6148: bne cr7,0xc00f6154 > c00f614c: ld r9,2312(r30) > c00f6150: std r9,1032(r31) > c00f6154: ld r9,88(r31) > > Corresponding perf annotate o/p: > > Before patch: > ld r9,1032(r31) > cmpdi cr7,r9,0 > v bne 3ff09f2c > ld r9,2312(r30) > std r9,1032(r31) > 74: ld r9,88(r31) > > After patch: > ld r9,1032(r31) > cmpdi cr7,r9,0 > v bne 74 > ld r9,2312(r30) > std r9,1032(r31) > 74: ld r9,88(r31) > > Signed-off-by: Ravi Bangoria > --- > Changes in v8: > - v7: https://lkml.org/lkml/2016/9/21/436 > - Rebase to acme/perf/core > - Little change in patch description. > - No logical changes. (Cross arch annotate patches are in. This patch > is for hardening annotate for powerpc.)
> > tools/perf/util/annotate.c | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c > index ea7e0de..590244e 100644 > --- a/tools/perf/util/annotate.c > +++ b/tools/perf/util/annotate.c > @@ -223,8 +223,12 @@ bool ins__is_call(const struct ins *ins) > static int jump__parse(struct arch *arch __maybe_unused, struct ins_operands > *ops, struct map *map __maybe_unused) > { > const char *s = strchr(ops->raw, '+'); > + const char *c = strchr(ops->raw, ','); > > - ops->target.addr = strtoull(ops->raw, NULL, 16); > + if (c++ != NULL) > + ops->target.addr = strtoull(c, NULL, 16); > + else > + ops->target.addr = strtoull(ops->raw, NULL, 16); > > if (s++ != NULL) > ops->target.offset = strtoull(s, NULL, 16);
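The parsing change above boils down to: if the raw operand string contains a comma (e.g. `cr7,0xc00f6154`), the target address is the hex token after it; otherwise the whole string is the address. Mirrored in Python as a sketch (strtoull with base 16 accepts the `0x` prefix, and so does Python's int):

```python
# Sketch of the jump__parse() change: powerpc jumps like 'bne cr7,0xc00f6154'
# carry the target address as the second operand, after a comma.

def jump_target_addr(raw):
    """Return the parsed target address of a jump's raw operands.
    'cr7,0xc00f6154' -> take the hex token after the first comma;
    '0xc00f6154'     -> parse the whole string."""
    _, sep, after = raw.partition(',')
    return int(after if sep else raw, 16)
```

Without the comma handling, `strtoull("cr7,0xc00f6154", NULL, 16)` parses the leading `c` hex digit and yields a bogus address, which is exactly the `bne 3ff09f2c`-style garbage shown in the "Before patch" output.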