Re: [PATCH 3/3] perf script: Fix crash because of missing feat_op[] entry

2018-06-20 Thread Ravi Bangoria
Hi Arnaldo,

On 06/20/2018 07:19 PM, Arnaldo Carvalho de Melo wrote:
> Em Wed, Jun 20, 2018 at 07:00:30PM +0530, Ravi Bangoria escreveu:
>> perf_event__process_feature() tries to access feat_ops[feat].process,
>> which is not defined for feat = HEADER_LAST_FEATURE, and thus perf
>> crashes. Add a dummy entry for HEADER_LAST_FEATURE in feat_ops.
> 
> Humm, first impression is that we should check for HEADER_LAST_FEATURE
> and not try to access that array slot, as it is just a marker, not an
> actual feature.

Sure. Let me try to handle it inside perf_event__process_feature() itself.
Maybe something like the patch below:

-
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 540cd2dcd3e7..4dc251d3b372 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -3461,6 +3461,10 @@ int perf_event__process_feature(struct perf_tool *tool,
return -1;
}
 
+   /* HEADER_LAST_FEATURE is just a marker. Ignore it. */
+   if (feat == HEADER_LAST_FEATURE)
+   return 0;
+
if (!feat_ops[feat].process)
return 0;
-

Thanks,
Ravi



Re: [PATCH 2/3] perf script: Fix crash because of missing evsel->priv

2018-06-21 Thread Ravi Bangoria
Hi Arnaldo,

On 06/20/2018 07:22 PM, Arnaldo Carvalho de Melo wrote:
> Em Wed, Jun 20, 2018 at 07:00:29PM +0530, Ravi Bangoria escreveu:
>> perf script in piped mode crashes because evsel->priv is not
>> set properly. Fix it.
>>
>> Before:
>>   # ./perf record -o - -- ls | ./perf script
>> Segmentation fault (core dumped)
>>
>> After:
>>   # ./perf record -o - -- ls | ./perf script
>>   ls 2282 1031.731974:  25 cpu-clock:uhH:  7effe4b3d29e
>>   ls 2282 1031.73:  25 cpu-clock:uhH:  7effe4b3a650
>>
>> Signed-off-by: Ravi Bangoria 
>> Fixes: a14390fde64e ("perf script: Allow creating per-event dump files")
> 
> Humm, this cset doesn't set evsel->priv to a 'struct perf_evsel_script'
> object, will check which one does to continue review.

So it's not about setting evsel->priv to 'struct perf_evsel_script';
it's about setting the proper output stream, either a file or stdout.
Before 'struct perf_evsel_script' was introduced, we set evsel->priv
to the output stream directly. So I think this commit failed to set
evsel->priv properly in the piped case. For example, the patch below,
applied _directly_ on top of a14390fde64e, fixes the issue.

--
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index fb5e49b3bc44..66cc4b29bf4d 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -1645,6 +1645,9 @@ static int process_attr(struct perf_tool *tool, union 
perf_event *event,
evlist = *pevlist;
evsel = perf_evlist__last(*pevlist);
 
+   if (!evsel->priv)
+   evsel->priv = stdout;
+
if (evsel->attr.type >= PERF_TYPE_MAX &&
evsel->attr.type != PERF_TYPE_SYNTH)
return 0;
--

To me this commit seems to be the bad one. Please let me know if that is
not the case. I'll change the last patch as you suggested and post v2 soon.

Thanks,
Ravi



[PATCH v2 2/3] perf script: Fix crash because of missing evsel->priv

2018-06-25 Thread Ravi Bangoria
perf script in piped mode crashes because evsel->priv is not
set properly. Fix it.

Before:
  # ./perf record -o - -- ls | ./perf script
Segmentation fault (core dumped)

After:
  # ./perf record -o - -- ls | ./perf script
  ls 2282 1031.731974:  25 cpu-clock:uhH:  7effe4b3d29e
  ls 2282 1031.73:  25 cpu-clock:uhH:  7effe4b3a650

Signed-off-by: Ravi Bangoria 
Fixes: a14390fde64e ("perf script: Allow creating per-event dump files")
---
 tools/perf/builtin-script.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index f3fefbcc4503..ad2ac1300420 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -1834,6 +1834,7 @@ static int process_attr(struct perf_tool *tool, union 
perf_event *event,
struct perf_evlist *evlist;
struct perf_evsel *evsel, *pos;
int err;
+   static struct perf_evsel_script *es;
 
err = perf_event__process_attr(tool, event, pevlist);
if (err)
@@ -1842,6 +1843,19 @@ static int process_attr(struct perf_tool *tool, union 
perf_event *event,
evlist = *pevlist;
evsel = perf_evlist__last(*pevlist);
 
+   if (!evsel->priv) {
+   if (scr->per_event_dump) {
+   evsel->priv = perf_evsel_script__new(evsel,
+   scr->session->data);
+   } else {
+   es = zalloc(sizeof(*es));
+   if (!es)
+   return -ENOMEM;
+   es->fp = stdout;
+   evsel->priv = es;
+   }
+   }
+
if (evsel->attr.type >= PERF_TYPE_MAX &&
evsel->attr.type != PERF_TYPE_SYNTH)
return 0;
-- 
2.14.4



[PATCH v2 3/3] perf script/annotate: Fix crash caused by accessing feat_ops[HEADER_LAST_FEATURE]

2018-06-25 Thread Ravi Bangoria
perf_event__process_feature() accesses feat_ops[HEADER_LAST_FEATURE],
which is not defined, and thus perf crashes. HEADER_LAST_FEATURE is
used as an end marker for perf report but is unused by perf script/
annotate. Ignore HEADER_LAST_FEATURE for perf script/annotate.

Before:
  # ./perf record -o - ls | ./perf script
  Segmentation fault (core dumped)

After:
  # ./perf record -o - ls | ./perf script
  ls 7031 4392.099856:  25 cpu-clock:uhH:  7f5e0ce7cd60
  ls 7031 4392.100355:  25 cpu-clock:uhH:  7f5e0c706ef7

Signed-off-by: Ravi Bangoria 
Fixes: 57b5de463925 ("perf report: Support forced leader feature in pipe mode")
---
 tools/perf/builtin-annotate.c | 11 ++-
 tools/perf/builtin-report.c   |  3 ++-
 tools/perf/builtin-script.c   | 11 ++-
 tools/perf/util/header.c  |  2 +-
 4 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index 5eb22cc56363..8180319285af 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -283,6 +283,15 @@ static int process_sample_event(struct perf_tool *tool,
return ret;
 }
 
+static int process_feature_event(struct perf_tool *tool,
+union perf_event *event,
+struct perf_session *session)
+{
+   if (event->feat.feat_id < HEADER_LAST_FEATURE)
+   return perf_event__process_feature(tool, event, session);
+   return 0;
+}
+
 static int hist_entry__tty_annotate(struct hist_entry *he,
struct perf_evsel *evsel,
struct perf_annotate *ann)
@@ -471,7 +480,7 @@ int cmd_annotate(int argc, const char **argv)
.attr   = perf_event__process_attr,
.build_id = perf_event__process_build_id,
.tracing_data   = perf_event__process_tracing_data,
-   .feature= perf_event__process_feature,
+   .feature= process_feature_event,
.ordered_events = true,
.ordering_requires_timestamps = true,
},
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index cdb5b6949832..c04dc7b53797 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -217,7 +217,8 @@ static int process_feature_event(struct perf_tool *tool,
}
 
/*
-* All features are received, we can force the
+* (feat_id = HEADER_LAST_FEATURE) is the end marker which
+* means all features are received, now we can force the
 * group if needed.
 */
setup_forced_leader(rep, session->evlist);
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index ad2ac1300420..568ddfac3213 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -3044,6 +3044,15 @@ int process_cpu_map_event(struct perf_tool *tool 
__maybe_unused,
return set_maps(script);
 }
 
+static int process_feature_event(struct perf_tool *tool,
+union perf_event *event,
+struct perf_session *session)
+{
+   if (event->feat.feat_id < HEADER_LAST_FEATURE)
+   return perf_event__process_feature(tool, event, session);
+   return 0;
+}
+
 #ifdef HAVE_AUXTRACE_SUPPORT
 static int perf_script__process_auxtrace_info(struct perf_tool *tool,
  union perf_event *event,
@@ -3088,7 +3097,7 @@ int cmd_script(int argc, const char **argv)
.attr= process_attr,
.event_update   = perf_event__process_event_update,
.tracing_data= perf_event__process_tracing_data,
-   .feature = perf_event__process_feature,
+   .feature = process_feature_event,
.build_id= perf_event__process_build_id,
.id_index= perf_event__process_id_index,
.auxtrace_info   = perf_script__process_auxtrace_info,
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 59fcc790c865..653ff65aa2c3 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -3464,7 +3464,7 @@ int perf_event__process_feature(struct perf_tool *tool,
pr_warning("invalid record type %d in pipe-mode\n", type);
return 0;
}
-   if (feat == HEADER_RESERVED || feat > HEADER_LAST_FEATURE) {
+   if (feat == HEADER_RESERVED || feat >= HEADER_LAST_FEATURE) {
pr_warning("invalid record type %d in pipe-mode\n", type);
return -1;
}
-- 
2.14.4



[PATCH v2 1/3] perf script: Add missing output fields in a hint

2018-06-25 Thread Ravi Bangoria
A few fields are missing from the 'perf script -F' hint. Add them.

Signed-off-by: Ravi Bangoria 
---
 tools/perf/builtin-script.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index a31d7082188e..f3fefbcc4503 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -3125,8 +3125,9 @@ int cmd_script(int argc, const char **argv)
 "+field to add and -field to remove."
 "Valid types: hw,sw,trace,raw,synth. "
 "Fields: comm,tid,pid,time,cpu,event,trace,ip,sym,dso,"
-"addr,symoff,period,iregs,uregs,brstack,brstacksym,flags,"
-"bpf-output,callindent,insn,insnlen,brstackinsn,synth,phys_addr",
+"addr,symoff,srcline,period,iregs,uregs,brstack,"
+"brstacksym,flags,bpf-output,brstackinsn,brstackoff,"
+"callindent,insn,insnlen,synth,phys_addr,metric,misc",
 parse_output_fields),
	OPT_BOOLEAN('a', "all-cpus", &system_wide,
"system-wide collection from all CPUs"),
-- 
2.14.4



[PATCH v2 0/3] perf script: Few trivial fixes

2018-06-25 Thread Ravi Bangoria
The first patch fixes the perf script output field hint. The second
and third fix crashes when perf script is used in piped mode.

v2 changes:
 - [PATCH 3/3] Changed as suggested by Arnaldo

v1: https://lkml.org/lkml/2018/6/20/538

Ravi Bangoria (3):
  perf script: Add missing output fields in a hint
  perf script: Fix crash because of missing evsel->priv
  perf script/annotate: Fix crash caused by accessing
feat_ops[HEADER_LAST_FEATURE]

 tools/perf/builtin-annotate.c | 11 ++-
 tools/perf/builtin-report.c   |  3 ++-
 tools/perf/builtin-script.c   | 30 +++---
 tools/perf/util/header.c  |  2 +-
 4 files changed, 40 insertions(+), 6 deletions(-)

-- 
2.14.4



xfs: Deadlock between fs_reclaim and sb_internal

2018-06-27 Thread Ravi Bangoria
Hello Darrick,

Lockdep is reporting a deadlock with the following trace. I saw this on my
powerpc VM with 4GB of RAM, running Linus' master kernel. Though I don't
have an exact testcase to reproduce it. Is this something known?


[ 1797.620389] ==
[ 1797.620541] WARNING: possible circular locking dependency detected
[ 1797.620695] 4.18.0-rc2+ #2 Not tainted
[ 1797.620791] --
[ 1797.620942] kswapd0/362 is trying to acquire lock:
[ 1797.621067] 01ad2774 (sb_internal){.+.+}, at: 
xfs_trans_alloc+0x284/0x360 [xfs]
[ 1797.621616] 
[ 1797.621616] but task is already holding lock:
[ 1797.621779] bb0609ea (fs_reclaim){+.+.}, at: 
__fs_reclaim_acquire+0x10/0x60
[ 1797.622090] 
[ 1797.622090] which lock already depends on the new lock.
[ 1797.622090] 
[ 1797.622266] 
[ 1797.622266] the existing dependency chain (in reverse order) is:
[ 1797.622440] 
[ 1797.622440] -> #1 (fs_reclaim){+.+.}:
[ 1797.622609]fs_reclaim_acquire.part.22+0x44/0x60
[ 1797.622764]kmem_cache_alloc+0x60/0x500
[ 1797.622983]kmem_zone_alloc+0xb4/0x168 [xfs]
[ 1797.623220]xfs_trans_alloc+0xac/0x360 [xfs]
[ 1797.623454]xfs_vn_update_time+0x278/0x420 [xfs]
[ 1797.623576]touch_atime+0xfc/0x120
[ 1797.623668]generic_file_read_iter+0x95c/0xbd0
[ 1797.623948]xfs_file_buffered_aio_read+0x98/0x2e0 [xfs]
[ 1797.624190]xfs_file_read_iter+0xac/0x170 [xfs]
[ 1797.624304]__vfs_read+0x128/0x1d0
[ 1797.624388]vfs_read+0xbc/0x1b0
[ 1797.624476]kernel_read+0x60/0xa0
[ 1797.624562]prepare_binprm+0xf8/0x1f0
[ 1797.624676]__do_execve_file.isra.13+0x7ac/0xc80
[ 1797.624791]sys_execve+0x58/0x70
[ 1797.624880]system_call+0x5c/0x70
[ 1797.624965] 
[ 1797.624965] -> #0 (sb_internal){.+.+}:
[ 1797.625119]lock_acquire+0xec/0x310
[ 1797.625204]__sb_start_write+0x18c/0x290
[ 1797.625435]xfs_trans_alloc+0x284/0x360 [xfs]
[ 1797.625692]xfs_iomap_write_allocate+0x230/0x470 [xfs]
[ 1797.625945]xfs_map_blocks+0x1a8/0x610 [xfs]
[ 1797.626181]xfs_do_writepage+0x1a8/0x9e0 [xfs]
[ 1797.626414]xfs_vm_writepage+0x4c/0x78 [xfs]
[ 1797.626527]pageout.isra.17+0x180/0x680
[ 1797.626640]shrink_page_list+0xe0c/0x1470
[ 1797.626758]shrink_inactive_list+0x3a4/0x890
[ 1797.626872]shrink_node_memcg+0x268/0x790
[ 1797.626986]shrink_node+0x124/0x630
[ 1797.627089]balance_pgdat+0x1f0/0x550
[ 1797.627204]kswapd+0x214/0x8c0
[ 1797.627301]kthread+0x1b8/0x1c0
[ 1797.627389]ret_from_kernel_thread+0x5c/0x88
[ 1797.627509] 
[ 1797.627509] other info that might help us debug this:
[ 1797.627509] 
[ 1797.627684]  Possible unsafe locking scenario:
[ 1797.627684] 
[ 1797.627820]CPU0CPU1
[ 1797.627965]
[ 1797.628072]   lock(fs_reclaim);
[ 1797.628166]lock(sb_internal);
[ 1797.628308]lock(fs_reclaim);
[ 1797.628453]   lock(sb_internal);
[ 1797.628551] 
[ 1797.628551]  *** DEADLOCK ***
[ 1797.628551] 
[ 1797.628693] 1 lock held by kswapd0/362:
[ 1797.628779]  #0: bb0609ea (fs_reclaim){+.+.}, at: 
__fs_reclaim_acquire+0x10/0x60
[ 1797.628955] 
[ 1797.628955] stack backtrace:
[ 1797.629067] CPU: 18 PID: 362 Comm: kswapd0 Not tainted 4.18.0-rc2+ #2
[ 1797.629199] Call Trace:
[ 1797.629311] [c000f4ebaff0] [c0cc4414] dump_stack+0xe8/0x164 
(unreliable)
[ 1797.629473] [c000f4ebb040] [c01a9f54] 
print_circular_bug.isra.17+0x284/0x3d0
[ 1797.629639] [c000f4ebb0e0] [c01aed58] 
__lock_acquire+0x1d48/0x2020
[ 1797.629797] [c000f4ebb260] [c01af7cc] lock_acquire+0xec/0x310
[ 1797.629933] [c000f4ebb330] [c0454c5c] 
__sb_start_write+0x18c/0x290
[ 1797.630191] [c000f4ebb380] [d9b7896c] 
xfs_trans_alloc+0x284/0x360 [xfs]
[ 1797.630443] [c000f4ebb3e0] [d9b58958] 
xfs_iomap_write_allocate+0x230/0x470 [xfs]
[ 1797.630710] [c000f4ebb560] [d9b2c760] xfs_map_blocks+0x1a8/0x610 
[xfs]
[ 1797.630961] [c000f4ebb5d0] [d9b2eb20] 
xfs_do_writepage+0x1a8/0x9e0 [xfs]
[ 1797.631223] [c000f4ebb6b0] [d9b2f3a4] xfs_vm_writepage+0x4c/0x78 
[xfs]
[ 1797.631384] [c000f4ebb720] [c03692e0] pageout.isra.17+0x180/0x680
[ 1797.631557] [c000f4ebb800] [c036ccfc] 
shrink_page_list+0xe0c/0x1470
[ 1797.631730] [c000f4ebb920] [c036e064] 
shrink_inactive_list+0x3a4/0x890
[ 1797.631912] [c000f4ebb9f0] [c036ef48] 
shrink_node_memcg+0x268/0x790
[ 1797.632078] [c000f4ebbb00] [c036f594] shrink_node+0x124/0x630
[ 1797.632213] [c000f4ebbbd0] [c03714e0] balance_pgdat+0x1f0/0x550
[ 1797.632350] [c000f4ebbcd0] [c0371a54] kswapd+0x214/0x8c0
[ 1797.632488] [c000f4ebbdc0] 

Re: [PATCH v3 6/9] trace_uprobe: Support SDT markers having reference count (semaphore)

2018-05-25 Thread Ravi Bangoria
Thanks Oleg for the review,

On 05/24/2018 09:56 PM, Oleg Nesterov wrote:
> On 04/17, Ravi Bangoria wrote:
>>
>> @@ -941,6 +1091,9 @@ typedef bool (*filter_func_t)(struct uprobe_consumer 
>> *self,
>>  if (ret)
>>  goto err_buffer;
>>  
>> +if (tu->ref_ctr_offset)
>> +sdt_increment_ref_ctr(tu);
>> +
> 
> iiuc, this is probe_event_enable()...
> 
> Looks racy, but afaics the race with uprobe_mmap() will be closed by the next
> change. However, it seems that probe_event_disable() can race with 
> trace_uprobe_mmap()
> too and the next 7/9 patch won't help,
> 
>> +if (tu->ref_ctr_offset)
>> +sdt_decrement_ref_ctr(tu);
>> +
>>  uprobe_unregister(tu->inode, tu->offset, &tu->consumer);
>>  tu->tp.flags &= file ? ~TP_FLAG_TRACE : ~TP_FLAG_PROFILE;
> 
> so what if trace_uprobe_mmap() comes right after uprobe_unregister() ?
> Note that trace_probe_is_enabled() is T until we update tp.flags.

Sure, I'll look at your comments.

Apart from these, I've also found a deadlock between uprobe_lock and
mm->mmap_sem. trace_uprobe_mmap() takes these locks in

   mm->mmap_sem
      uprobe_lock

order, but some other code path takes them in the reverse order. I've
included a sample lockdep warning at the end. The issue is that
mm->mmap_sem is not under the control of trace_uprobe_mmap(), and we
have to take uprobe_lock to loop over all trace_uprobes.

Any idea how this can be resolved?


Sample lockdep warning:

[  499.258006] ==
[  499.258205] WARNING: possible circular locking dependency detected
[  499.258409] 4.17.0-rc3+ #76 Not tainted
[  499.258528] --
[  499.258731] perf/6744 is trying to acquire lock:
[  499.258895] e4895f49 (uprobe_lock){+.+.}, at: 
trace_uprobe_mmap+0x78/0x130
[  499.259147]
[  499.259147] but task is already holding lock:
[  499.259349] 9ec93a76 (>mmap_sem){}, at: 
vm_mmap_pgoff+0xe0/0x160
[  499.259597]
[  499.259597] which lock already depends on the new lock.
[  499.259597]
[  499.259848]
[  499.259848] the existing dependency chain (in reverse order) is:
[  499.260086]
[  499.260086] -> #4 (&mm->mmap_sem){}:
[  499.260277]__lock_acquire+0x53c/0x910
[  499.260442]lock_acquire+0xf4/0x2f0
[  499.260595]down_write_killable+0x6c/0x150
[  499.260764]copy_process.isra.34.part.35+0x1594/0x1be0
[  499.260967]_do_fork+0xf8/0x910
[  499.261090]ppc_clone+0x8/0xc
[  499.261209]
[  499.261209] -> #3 (_mmap_sem){}:
[  499.261378]__lock_acquire+0x53c/0x910
[  499.261540]lock_acquire+0xf4/0x2f0
[  499.261669]down_write+0x6c/0x110
[  499.261793]percpu_down_write+0x48/0x140
[  499.261954]register_for_each_vma+0x6c/0x2a0
[  499.262116]uprobe_register+0x230/0x320
[  499.262277]probe_event_enable+0x1cc/0x540
[  499.262435]perf_trace_event_init+0x1e0/0x350
[  499.262587]perf_trace_init+0xb0/0x110
[  499.262750]perf_tp_event_init+0x38/0x90
[  499.262910]perf_try_init_event+0x10c/0x150
[  499.263075]perf_event_alloc+0xbb0/0xf10
[  499.263235]sys_perf_event_open+0x2a8/0xdd0
[  499.263396]system_call+0x58/0x6c
[  499.263516]
[  499.263516] -> #2 (&uprobe->register_rwsem){}:
[  499.263723]__lock_acquire+0x53c/0x910
[  499.263884]lock_acquire+0xf4/0x2f0
[  499.264002]down_write+0x6c/0x110
[  499.264118]uprobe_register+0x1ec/0x320
[  499.264283]probe_event_enable+0x1cc/0x540
[  499.264442]perf_trace_event_init+0x1e0/0x350
[  499.264603]perf_trace_init+0xb0/0x110
[  499.264766]perf_tp_event_init+0x38/0x90
[  499.264930]perf_try_init_event+0x10c/0x150
[  499.265092]perf_event_alloc+0xbb0/0xf10
[  499.265261]sys_perf_event_open+0x2a8/0xdd0
[  499.265424]system_call+0x58/0x6c
[  499.265542]
[  499.265542] -> #1 (event_mutex){+.+.}:
[  499.265738]__lock_acquire+0x53c/0x910
[  499.265896]lock_acquire+0xf4/0x2f0
[  499.266019]__mutex_lock+0xa0/0xab0
[  499.266142]trace_add_event_call+0x44/0x100
[  499.266310]create_trace_uprobe+0x4a0/0x8b0
[  499.266474]trace_run_command+0xa4/0xc0
[  499.266631]trace_parse_run_command+0xe4/0x200
[  499.266799]probes_write+0x20/0x40
[  499.266922]__vfs_write+0x6c/0x240
[  499.267041]vfs_write+0xd0/0x240
[  499.267166]ksys_write+0x6c/0x110
[  499.267295]system_call+0x58/0x6c
[  499.267413]
[  499.267413] -> #0 (uprobe_lock){+.+.}:
[  499.267591]validate_chain.isra.34+0xbd0/0x1000
[  499.267747]__lock_acquire+0x53c/0x910
[  499.267917]lock_acquire+0xf4/0x2f0
[  499.268048]__mu

Re: [PATCH v5 09/10] Uprobes/sdt: Document about reference counter

2018-07-03 Thread Ravi Bangoria
Hi Srikar,

On 07/02/2018 08:24 PM, Srikar Dronamraju wrote:
> * Ravi Bangoria  [2018-06-28 10:52:08]:
> 
>> A reference counter gates the invocation of a probe. If present,
>> the reference count is 0 by default. The kernel needs to increment
>> it before tracing the probe and decrement it when done. This is
>> identical to the semaphore in Userspace Statically Defined
>> Tracepoints (USDT).
>>
>> Document usage of the reference counter.
>>
>> Signed-off-by: Ravi Bangoria 
>> Acked-by: Masami Hiramatsu 
> 
> Unlike perf, this mechanism cannot detect ref count and depends on the
> users data. What happens if the user mistakenly provides a wrong location?
> I guess he can corrupt some other data structures?

Yes, if the user gives a wrong ref_ctr_offset, the uprobe infrastructure
will corrupt some user data.

> 
> Hence I would think twice of advertising this mechanism. i.e keep this
> as an undocumented feature.
> 

I don't mind.

Thanks,
Ravi



Re: [RFC 2/4] perf: Pass pmu pointer to perf_paranoid_* helpers

2018-07-03 Thread Ravi Bangoria
Hi Tvrtko,

> @@ -199,7 +199,7 @@ static inline void perf_get_data_addr(struct pt_regs 
> *regs, u64 *addrp)
>   if (!(mmcra & MMCRA_SAMPLE_ENABLE) || sdar_valid)
>   *addrp = mfspr(SPRN_SDAR);
>  
> - if (perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN) &&
> + if (perf_paranoid_kernel(ppmu) && !capable(CAP_SYS_ADMIN) &&
>   is_kernel_addr(mfspr(SPRN_SDAR)))
>   *addrp = 0;
>  }

This patch fails to build for me on powerpc:

arch/powerpc/perf/core-book3s.c: In function ‘perf_get_data_addr’:
arch/powerpc/perf/core-book3s.c:202:27: error: passing argument 1 of 
‘perf_paranoid_kernel’ from incompatible pointer type 
[-Werror=incompatible-pointer-types]
  if (perf_paranoid_kernel(ppmu) && !capable(CAP_SYS_ADMIN) &&
   ^~~~
In file included from arch/powerpc/perf/core-book3s.c:13:0:
./include/linux/perf_event.h:1191:20: note: expected ‘const struct pmu *’ but 
argument is of type ‘struct power_pmu *’
 static inline bool perf_paranoid_kernel(const struct pmu *pmu)
^~~~
arch/powerpc/perf/core-book3s.c: In function ‘power_pmu_bhrb_read’:
arch/powerpc/perf/core-book3s.c:470:8: error: too few arguments to function 
‘perf_paranoid_kernel’
if (perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN) &&
^~~~
In file included from arch/powerpc/perf/core-book3s.c:13:0:
./include/linux/perf_event.h:1191:20: note: declared here
 static inline bool perf_paranoid_kernel(const struct pmu *pmu)
^~~~
  CC  net/ipv6/route.o



Re: [PATCH v6 5/6] Uprobes/sdt: Prevent multiple reference counter for same uprobe

2018-07-26 Thread Ravi Bangoria
Hi Oleg,

On 07/25/2018 04:38 PM, Oleg Nesterov wrote:
> No, I can't understand this patch...
> 
> On 07/16, Ravi Bangoria wrote:
>>
>> --- a/kernel/events/uprobes.c
>> +++ b/kernel/events/uprobes.c
>> @@ -63,6 +63,8 @@ static struct percpu_rw_semaphore dup_mmap_sem;
>>
>>  /* Have a copy of original instruction */
>>  #define UPROBE_COPY_INSN0
>> +/* Reference counter offset is reloaded with non-zero value. */
>> +#define REF_CTR_OFF_RELOADED1
>>
>>  struct uprobe {
>>  struct rb_node  rb_node;/* node in the rb tree */
>> @@ -476,9 +478,23 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, 
>> struct mm_struct *mm,
>>  return ret;
>>
>>  ret = verify_opcode(old_page, vaddr, &opcode);
>> -if (ret <= 0)
>> +if (ret < 0)
>>  goto put_old;
> 
> I agree, "ret <= 0" wasn't nice even before this change, but "ret < 0" looks
> worse because this is simply not possible.


Ok. Any better idea? I think if we don't track all mms patched by the
uprobe, we have to rely on the current instruction.


> 
>> +/*
>> + * If instruction is already patched but reference counter offset
>> + * has been reloaded to non-zero value, increment the reference
>> + * counter and return.
>> + */
>> +if (ret == 0) {
>> +if (is_register &&
>> +test_bit(REF_CTR_OFF_RELOADED, &uprobe->flags)) {
>> +WARN_ON(!uprobe->ref_ctr_offset);
>> +ret = update_ref_ctr(uprobe, mm, true);
>> +}
>> +goto put_old;
>> +}
> 
> So we need to force update_ref_ctr(true) in case when uprobe_register_refctr()
> detects the already registered uprobe with ref_ctr_offset == 0, and then it 
> calls
> register_for_each_vma().
> 
> Why this can't race with uprobe_mmap() ?
> 
> uprobe_mmap() can do install_breakpoint() right after REF_CTR_OFF_RELOADED 
> was set,
> then register_for_each_vma() will find this vma and do install_breakpoint() 
> too.
> If ref_ctr_vma was already mmaped, the counter will be incremented twice, no?


Hmm, right. I can probably fix this race by using some lock, but I don't
know if it's good to do that inside uprobe_mmap().


> 
>> @@ -971,6 +1011,7 @@ register_for_each_vma(struct uprobe *uprobe, struct 
>> uprobe_consumer *new)
>>  bool is_register = !!new;
>>  struct map_info *info;
>>  int err = 0;
>> +bool installed = false;
>>
>>  percpu_down_write(&dup_mmap_sem);
>>  info = build_map_info(uprobe->inode->i_mapping,
>> @@ -1000,8 +1041,10 @@ register_for_each_vma(struct uprobe *uprobe, struct 
>> uprobe_consumer *new)
>>  if (is_register) {
>>  /* consult only the "caller", new consumer. */
>>  if (consumer_filter(new,
>> -UPROBE_FILTER_REGISTER, mm))
>> +UPROBE_FILTER_REGISTER, mm)) {
>>  err = install_breakpoint(uprobe, mm, vma, 
>> info->vaddr);
>> +installed = true;
>> +}
>>  } else if (test_bit(MMF_HAS_UPROBES, &mm->flags)) {
>>  if (!filter_chain(uprobe,
>>  UPROBE_FILTER_UNREGISTER, mm))
>> @@ -1016,6 +1059,8 @@ register_for_each_vma(struct uprobe *uprobe, struct 
>> uprobe_consumer *new)
>>  }
>>   out:
>>  percpu_up_write(&dup_mmap_sem);
>> +if (installed)
>> +clear_bit(REF_CTR_OFF_RELOADED, &uprobe->flags);
> 
> I simply can't understand this "bool installed"


That boolean is needed because consumer_filter() returns false when this
function gets called the first time, from uprobe_register(), and
consumer_filter() returns true when it gets called by uprobe_apply(). If
I clear the bit unconditionally, setting REF_CTR_OFF_RELOADED has no
effect. So the boolean is needed.

Though, there is one more issue I found. Let's say there are two
processes running and a user probes both of them with uprobe_register(),
using, say, SystemTap. Now some other user does uprobe_register_refctr()
on the same uprobe via 'perf -p PID', but is interested in only one
process. Here, we will increment the reference counter only in the "PID"
process and clear the REF_CTR_OFF_RELOADED flag. Later, some other user
does 'perf -a', which will call uprobe_register_refctr() for both
targets, but we will fail to increment the counter in the "non-PID"
process.

[PATCH v7 4/6] Uprobes/sdt: Prevent multiple reference counter for same uprobe

2018-07-30 Thread Ravi Bangoria
We assume there is only one reference counter per uprobe. Don't allow
the user to register multiple uprobes having the same inode+offset but
different reference counters.

Signed-off-by: Ravi Bangoria 
---
 kernel/events/uprobes.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index ad92fed11526..c27546929ae7 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -679,6 +679,12 @@ static struct uprobe *alloc_uprobe(struct inode *inode, 
loff_t offset,
cur_uprobe = insert_uprobe(uprobe);
/* a uprobe exists for this inode:offset combination */
if (cur_uprobe) {
+   if (cur_uprobe->ref_ctr_offset != uprobe->ref_ctr_offset) {
+   pr_warn("Reference counter offset mismatch.\n");
+   put_uprobe(cur_uprobe);
+   kfree(uprobe);
+   return ERR_PTR(-EINVAL);
+   }
kfree(uprobe);
uprobe = cur_uprobe;
}
@@ -1093,6 +1099,9 @@ static int __uprobe_register(struct inode *inode, loff_t 
offset,
uprobe = alloc_uprobe(inode, offset, ref_ctr_offset);
if (!uprobe)
return -ENOMEM;
+   if (IS_ERR(uprobe))
+   return PTR_ERR(uprobe);
+
/*
 * We can race with uprobe_unregister()->delete_uprobe().
 * Check uprobe_is_active() and retry if it is false.
-- 
2.14.4



[PATCH v7 6/6] perf probe: Support SDT markers having reference counter (semaphore)

2018-07-30 Thread Ravi Bangoria
With this, perf buildid-cache will save SDT markers with a reference
counter in the probe cache, and perf probe will be able to probe markers
having a reference counter. E.g.:

  # readelf -n /tmp/tick | grep -A1 loop2
Name: loop2
... Semaphore: 0x10020036

  # ./perf buildid-cache --add /tmp/tick
  # ./perf probe sdt_tick:loop2
  # ./perf stat -e sdt_tick:loop2 /tmp/tick
hi: 0
hi: 1
hi: 2
^C
 Performance counter stats for '/tmp/tick':
 3  sdt_tick:loop2
   2.561851452 seconds time elapsed

Signed-off-by: Ravi Bangoria 
Acked-by: Masami Hiramatsu 
Acked-by: Srikar Dronamraju 
---
 tools/perf/util/probe-event.c | 39 
 tools/perf/util/probe-event.h |  1 +
 tools/perf/util/probe-file.c  | 34 ++--
 tools/perf/util/probe-file.h  |  1 +
 tools/perf/util/symbol-elf.c  | 46 ---
 tools/perf/util/symbol.h  |  7 +++
 6 files changed, 106 insertions(+), 22 deletions(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index f119eb628dbb..e86f8be89157 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -1819,6 +1819,12 @@ int parse_probe_trace_command(const char *cmd, struct 
probe_trace_event *tev)
tp->offset = strtoul(fmt2_str, NULL, 10);
}
 
+   if (tev->uprobes) {
+   fmt2_str = strchr(p, '(');
+   if (fmt2_str)
+   tp->ref_ctr_offset = strtoul(fmt2_str + 1, NULL, 0);
+   }
+
tev->nargs = argc - 2;
tev->args = zalloc(sizeof(struct probe_trace_arg) * tev->nargs);
if (tev->args == NULL) {
@@ -2012,6 +2018,22 @@ static int synthesize_probe_trace_arg(struct 
probe_trace_arg *arg,
return err;
 }
 
+static int
+synthesize_uprobe_trace_def(struct probe_trace_event *tev, struct strbuf *buf)
+{
+   struct probe_trace_point *tp = &tev->point;
+   int err;
+
+   err = strbuf_addf(buf, "%s:0x%lx", tp->module, tp->address);
+
+   if (err >= 0 && tp->ref_ctr_offset) {
+   if (!uprobe_ref_ctr_is_supported())
+   return -1;
+   err = strbuf_addf(buf, "(0x%lx)", tp->ref_ctr_offset);
+   }
+   return err >= 0 ? 0 : -1;
+}
+
 char *synthesize_probe_trace_command(struct probe_trace_event *tev)
 {
struct probe_trace_point *tp = &tev->point;
@@ -2041,15 +2063,17 @@ char *synthesize_probe_trace_command(struct 
probe_trace_event *tev)
}
 
/* Use the tp->address for uprobes */
-   if (tev->uprobes)
-   err = strbuf_addf(&buf, "%s:0x%lx", tp->module, tp->address);
-   else if (!strncmp(tp->symbol, "0x", 2))
+   if (tev->uprobes) {
+   err = synthesize_uprobe_trace_def(tev, &buf);
+   } else if (!strncmp(tp->symbol, "0x", 2)) {
/* Absolute address. See try_to_find_absolute_address() */
err = strbuf_addf(&buf, "%s%s0x%lx", tp->module ?: "",
  tp->module ? ":" : "", tp->address);
-   else
+   } else {
err = strbuf_addf(&buf, "%s%s%s+%lu", tp->module ?: "",
tp->module ? ":" : "", tp->symbol, tp->offset);
+   }
+
if (err)
goto error;
 
@@ -2633,6 +2657,13 @@ static void warn_uprobe_event_compat(struct 
probe_trace_event *tev)
 {
int i;
char *buf = synthesize_probe_trace_command(tev);
+   struct probe_trace_point *tp = &tev->point;
+
+   if (tp->ref_ctr_offset && !uprobe_ref_ctr_is_supported()) {
+   pr_warning("A semaphore is associated with %s:%s and "
+  "seems your kernel doesn't support it.\n",
+  tev->group, tev->event);
+   }
 
/* Old uprobe event doesn't support memory dereference */
if (!tev->uprobes || tev->nargs == 0 || !buf)
diff --git a/tools/perf/util/probe-event.h b/tools/perf/util/probe-event.h
index 45b14f020558..15a98c3a2a2f 100644
--- a/tools/perf/util/probe-event.h
+++ b/tools/perf/util/probe-event.h
@@ -27,6 +27,7 @@ struct probe_trace_point {
char*symbol;/* Base symbol */
char*module;/* Module name */
unsigned long   offset; /* Offset from symbol */
+   unsigned long   ref_ctr_offset; /* SDT reference counter offset */
unsigned long   address;/* Actual address of the trace point */
boolretprobe;   /* Return probe flag */
 };
diff --git a/tools/perf/util/probe-file.c b/tools/perf/util/probe-file.c
index b76088fadf3d..aac7817d9e14 100644
--- a/tools/perf/util/probe-file.c

[PATCH v7 3/6] Uprobes: Support SDT markers having reference count (semaphore)

2018-07-30 Thread Ravi Bangoria
Userspace Statically Defined Tracepoints[1] are dtrace-style markers
inside userspace applications. Applications like PostgreSQL, MySQL,
Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc.
have these markers embedded in them. These markers are added by
developers at important places in the code. Each marker source expands
to a single nop instruction in the compiled code, but there may be
additional overhead for computing the marker arguments, which expands
to a couple of instructions. If that overhead is significant, its
execution can be omitted by a runtime if() condition when no one is
tracing the marker:

    if (reference_counter > 0) {
        Execute marker instructions;
    }

The default value of the reference counter is 0. A tracer has to
increment the reference counter before tracing a marker and decrement
it when done with the tracing.

Implement the reference counter logic in core uprobe. Users will be
able to use it from trace_uprobe as well as from kernel modules. The
new trace_uprobe definition with a reference counter is now:

	<path>:<offset>[(ref_ctr_offset)]

where ref_ctr_offset is an optional field. For kernel modules, a new
variant of uprobe_register() has been introduced:

	uprobe_register_refctr(inode, offset, ref_ctr_offset, consumer)

No new variant of uprobe_unregister() is needed, because a uprobe is
assumed to have only one reference counter.
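Module-side usage would then look roughly like the C-style pseudocode sketch below. It is not runnable as-is (kernel context only); the inode and the 0x6e4/0x10036 offsets are illustrative, borrowed from the SDT example used elsewhere in this series, and would really come from the probed binary's .note.stapsdt section.

```c
/* Hypothetical consumer; handler signature per include/linux/uprobes.h. */
static int sdt_handler(struct uprobe_consumer *self, struct pt_regs *regs)
{
	pr_info("SDT marker hit\n");
	return 0;
}

static struct uprobe_consumer sdt_consumer = {
	.handler = sdt_handler,
};

/* Probe at offset 0x6e4, reference counter at offset 0x10036: */
ret = uprobe_register_refctr(inode, 0x6e4, 0x10036, &sdt_consumer);

/* Teardown is unchanged -- the uprobe remembers its single
 * ref_ctr_offset, so no _refctr variant of unregister is needed: */
uprobe_unregister(inode, 0x6e4, &sdt_consumer);
```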

[1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation

Note: the 'reference counter' is called a 'semaphore' in the original
Dtrace (or Systemtap, bcc and even ELF) documentation and code. But the
term 'semaphore' is misleading in this context. This is just a counter
used to hold the number of tracers tracing a marker. It is not really
used for any synchronization. So we refer to it as a 'reference
counter' in kernel / perf code.

Signed-off-by: Ravi Bangoria 
Reviewed-by: Masami Hiramatsu 
[Only trace_uprobe.c]
---
 include/linux/uprobes.h |   5 +
 kernel/events/uprobes.c | 232 ++--
 kernel/trace/trace.c|   2 +-
 kernel/trace/trace_uprobe.c |  38 +++-
 4 files changed, 267 insertions(+), 10 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index bb9d2084af03..103a48a48872 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -123,6 +123,7 @@ extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs);
 extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
 extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t);
 extern int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
+extern int uprobe_register_refctr(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc);
 extern int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool);
 extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
 extern int uprobe_mmap(struct vm_area_struct *vma);
@@ -160,6 +161,10 @@ uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
 {
	return -ENOSYS;
 }
+static inline int uprobe_register_refctr(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc)
+{
+	return -ENOSYS;
+}
 static inline int
 uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool add)
 {
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index c0418ba52ba8..ad92fed11526 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -73,6 +73,7 @@ struct uprobe {
struct uprobe_consumer  *consumers;
struct inode*inode; /* Also hold a ref to inode */
loff_t  offset;
+   loff_t  ref_ctr_offset;
unsigned long   flags;
 
/*
@@ -88,6 +89,15 @@ struct uprobe {
struct arch_uprobe  arch;
 };
 
+struct delayed_uprobe {
+   struct list_head list;
+   struct uprobe *uprobe;
+   struct mm_struct *mm;
+};
+
+static DEFINE_MUTEX(delayed_uprobe_lock);
+static LIST_HEAD(delayed_uprobe_list);
+
 /*
  * Execute out of line area: anonymous executable mapping installed
  * by the probed task to execute the copy of the original instruction
@@ -282,6 +292,154 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t
return 1;
 }
 
+static struct delayed_uprobe *
+delayed_uprobe_check(struct uprobe *uprobe, struct mm_struct *mm)
+{
+   struct delayed_uprobe *du;
+
+   list_for_each_entry(du, &delayed_uprobe_list, list)
+   if (du->uprobe == uprobe && du->mm == mm)
+   return du;
+   return NULL;
+}
+
+static int delayed_uprobe_add(struct uprobe *uprobe, struct mm_struct *mm)
+{
+   struct delayed_uprobe *du;
+
+   if (delayed_uprobe_check(uprobe, mm))
+  

[PATCH v7 2/6] Uprobe: Additional argument arch_uprobe to uprobe_write_opcode()

2018-07-30 Thread Ravi Bangoria
Add an additional argument, 'arch_uprobe', to uprobe_write_opcode().
We will need this in a later set of patches.

Signed-off-by: Ravi Bangoria 
---
 arch/arm/probes/uprobes/core.c | 2 +-
 arch/mips/kernel/uprobes.c | 2 +-
 include/linux/uprobes.h| 2 +-
 kernel/events/uprobes.c| 9 +
 4 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/arm/probes/uprobes/core.c b/arch/arm/probes/uprobes/core.c
index d1329f1ba4e4..bf992264060e 100644
--- a/arch/arm/probes/uprobes/core.c
+++ b/arch/arm/probes/uprobes/core.c
@@ -32,7 +32,7 @@ bool is_swbp_insn(uprobe_opcode_t *insn)
 int set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm,
 unsigned long vaddr)
 {
-   return uprobe_write_opcode(mm, vaddr,
+   return uprobe_write_opcode(auprobe, mm, vaddr,
   __opcode_to_mem_arm(auprobe->bpinsn));
 }
 
diff --git a/arch/mips/kernel/uprobes.c b/arch/mips/kernel/uprobes.c
index f7a0645ccb82..4aaff3b3175c 100644
--- a/arch/mips/kernel/uprobes.c
+++ b/arch/mips/kernel/uprobes.c
@@ -224,7 +224,7 @@ unsigned long arch_uretprobe_hijack_return_addr(
 int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm,
unsigned long vaddr)
 {
-   return uprobe_write_opcode(mm, vaddr, UPROBE_SWBP_INSN);
+   return uprobe_write_opcode(auprobe, mm, vaddr, UPROBE_SWBP_INSN);
 }
 
 void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 0a294e950df8..bb9d2084af03 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -121,7 +121,7 @@ extern bool is_swbp_insn(uprobe_opcode_t *insn);
 extern bool is_trap_insn(uprobe_opcode_t *insn);
 extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs);
 extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
-extern int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t);
+extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t);
 extern int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
 extern int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool);
 extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 471eac896635..c0418ba52ba8 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -299,8 +299,8 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t
  * Called with mm->mmap_sem held for write.
  * Return 0 (success) or a negative errno.
  */
-int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr,
-   uprobe_opcode_t opcode)
+int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
+   unsigned long vaddr, uprobe_opcode_t opcode)
 {
	struct page *old_page, *new_page;
	struct vm_area_struct *vma;
@@ -351,7 +351,7 @@ int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr,
  */
 int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr)
 {
-   return uprobe_write_opcode(mm, vaddr, UPROBE_SWBP_INSN);
+   return uprobe_write_opcode(auprobe, mm, vaddr, UPROBE_SWBP_INSN);
 }
 
 /**
@@ -366,7 +366,8 @@ int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned
 int __weak
 set_orig_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr)
 {
-   return uprobe_write_opcode(mm, vaddr, *(uprobe_opcode_t *)&auprobe->insn);
+   return uprobe_write_opcode(auprobe, mm, vaddr,
+   *(uprobe_opcode_t *)&auprobe->insn);
 }
 
 static struct uprobe *get_uprobe(struct uprobe *uprobe)
-- 
2.14.4



[PATCH v7 0/6] Uprobes: Support SDT markers having reference count (semaphore)

2018-07-30 Thread Ravi Bangoria
v7 changes:
 - Don't allow both zero and non-zero reference counter offset at
   the same time. It's painful and error prone.
 - Don't call delayed_uprobe_install() if vma->vm_mm does not have
   any breakpoint installed.

v6: https://lkml.org/lkml/2018/7/16/353


Description:
Userspace Statically Defined Tracepoints[1] are dtrace style markers
inside userspace applications. Applications like PostgreSQL, MySQL,
Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc
have these markers embedded in them. These markers are added by
developers at important places in the code. Each marker source expands
to a single nop instruction in the compiled code, but there may be
additional overhead for computing the marker arguments, which expands
to a couple of instructions. If this overhead is significant, execution
can be omitted at runtime with an if() condition when no one is tracing
the marker:

if (reference_counter > 0) {
Execute marker instructions;
}

The default value of the reference counter is 0. A tracer has to
increment the reference counter before tracing a marker and decrement
it when done with the tracing.

Currently, the perf tool has limited support for SDT markers, i.e. it
cannot trace markers surrounded by a reference counter. Also, it's
not easy to add reference counter logic in a userspace tool like perf,
so the basic idea of this patchset is to add the reference counter
logic in the uprobe infrastructure. Ex,[2]

  # cat tick.c
... 
for (i = 0; i < 100; i++) {
DTRACE_PROBE1(tick, loop1, i);
if (TICK_LOOP2_ENABLED()) {
DTRACE_PROBE1(tick, loop2, i); 
}
printf("hi: %d\n", i); 
sleep(1);
}   
... 

Here tick:loop1 is a marker without a reference counter, whereas
tick:loop2 is surrounded by a reference counter condition.

  # perf buildid-cache --add /tmp/tick
  # perf probe sdt_tick:loop1
  # perf probe sdt_tick:loop2

  # perf stat -e sdt_tick:loop1,sdt_tick:loop2 -- /tmp/tick
  hi: 0
  hi: 1
  hi: 2
  ^C
  Performance counter stats for '/tmp/tick':
 3  sdt_tick:loop1
 0  sdt_tick:loop2
 2.747086086 seconds time elapsed

Perf failed to record data for tick:loop2. Same experiment with this
patch series:

  # ./perf buildid-cache --add /tmp/tick
  # ./perf probe sdt_tick:loop2
  # ./perf stat -e sdt_tick:loop2 /tmp/tick
hi: 0
hi: 1
hi: 2
^C  
 Performance counter stats for '/tmp/tick':
 3  sdt_tick:loop2
   2.561851452 seconds time elapsed


Ravi Bangoria (6):
  Uprobes: Simplify uprobe_register() body
  Uprobe: Additional argument arch_uprobe to uprobe_write_opcode()
  Uprobes: Support SDT markers having reference count (semaphore)
  Uprobes/sdt: Prevent multiple reference counter for same uprobe
  trace_uprobe/sdt: Prevent multiple reference counter for same uprobe
  perf probe: Support SDT markers having reference counter (semaphore)

 arch/arm/probes/uprobes/core.c |   2 +-
 arch/mips/kernel/uprobes.c |   2 +-
 include/linux/uprobes.h|   7 +-
 kernel/events/uprobes.c| 315 +++--
 kernel/trace/trace.c   |   2 +-
 kernel/trace/trace_uprobe.c|  75 +-
 tools/perf/util/probe-event.c  |  39 -
 tools/perf/util/probe-event.h  |   1 +
 tools/perf/util/probe-file.c   |  34 -
 tools/perf/util/probe-file.h   |   1 +
 tools/perf/util/symbol-elf.c   |  46 --
 tools/perf/util/symbol.h   |   7 +
 12 files changed, 459 insertions(+), 72 deletions(-)

-- 
2.14.4



[PATCH v7 5/6] trace_uprobe/sdt: Prevent multiple reference counter for same uprobe

2018-07-30 Thread Ravi Bangoria
We assume there is only one reference counter per uprobe. Don't
allow the user to add multiple trace_uprobe entries having the
same inode+offset but different reference counters.

Ex,
  # echo "p:sdt_tick/loop2 /home/ravi/tick:0x6e4(0x10036)" > uprobe_events
  # echo "p:sdt_tick/loop2_1 /home/ravi/tick:0x6e4(0xf)" >> uprobe_events
  bash: echo: write error: Invalid argument

  # dmesg
  trace_kprobe: Reference counter offset mismatch.

There is one exception though:
when the user is replacing an old entry with a new one, we allow
this if the new entry does not conflict with any other existing
entries.

Signed-off-by: Ravi Bangoria 
---
 kernel/trace/trace_uprobe.c | 37 +++--
 1 file changed, 35 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index bf2be098eb08..be64d943d7ea 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -324,6 +324,35 @@ static int unregister_trace_uprobe(struct trace_uprobe *tu)
return 0;
 }
 
+/*
+ * Uprobe with multiple reference counter is not allowed. i.e.
+ * If inode and offset matches, reference counter offset *must*
+ * match as well. Though, there is one exception: If user is
+ * replacing old trace_uprobe with new one(same group/event),
+ * then we allow same uprobe with new reference counter as far
+ * as the new one does not conflict with any other existing
+ * ones.
+ */
+static struct trace_uprobe *find_old_trace_uprobe(struct trace_uprobe *new)
+{
+   struct trace_uprobe *tmp, *old = NULL;
+   struct inode *new_inode = d_real_inode(new->path.dentry);
+
+   old = find_probe_event(trace_event_name(&new->tp.call),
+   new->tp.call.class->system);
+
+   list_for_each_entry(tmp, &uprobe_list, list) {
+   if ((old ? old != tmp : true) &&
+   new_inode == d_real_inode(tmp->path.dentry) &&
+   new->offset == tmp->offset &&
+   new->ref_ctr_offset != tmp->ref_ctr_offset) {
+   pr_warn("Reference counter offset mismatch.");
+   return ERR_PTR(-EINVAL);
+   }
+   }
+   return old;
+}
+
 /* Register a trace_uprobe and probe_event */
 static int register_trace_uprobe(struct trace_uprobe *tu)
 {
@@ -333,8 +362,12 @@ static int register_trace_uprobe(struct trace_uprobe *tu)
	mutex_lock(&uprobe_lock);
 
/* register as an event */
-   old_tu = find_probe_event(trace_event_name(&tu->tp.call),
-   tu->tp.call.class->system);
+   old_tu = find_old_trace_uprobe(tu);
+   if (IS_ERR(old_tu)) {
+   ret = PTR_ERR(old_tu);
+   goto end;
+   }
+
if (old_tu) {
/* delete old event */
ret = unregister_trace_uprobe(old_tu);
-- 
2.14.4



[PATCH v7 1/6] Uprobes: Simplify uprobe_register() body

2018-07-30 Thread Ravi Bangoria
Simplify the uprobe_register() function body and let __uprobe_register()
handle everything. Also move dependent functions around to fix build
failures.

Signed-off-by: Ravi Bangoria 
---
 kernel/events/uprobes.c | 69 ++---
 1 file changed, 36 insertions(+), 33 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index ccc579a7d32e..471eac896635 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -840,13 +840,8 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
return err;
 }
 
-static int __uprobe_register(struct uprobe *uprobe, struct uprobe_consumer *uc)
-{
-   consumer_add(uprobe, uc);
-   return register_for_each_vma(uprobe, uc);
-}
-
-static void __uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
+static void
+__uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
 {
int err;
 
@@ -860,24 +855,46 @@ static void __uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *u
 }
 
 /*
- * uprobe_register - register a probe
+ * uprobe_unregister - unregister a already registered probe.
+ * @inode: the file in which the probe has to be removed.
+ * @offset: offset from the start of the file.
+ * @uc: identify which probe if multiple probes are colocated.
+ */
+void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
+{
+   struct uprobe *uprobe;
+
+   uprobe = find_uprobe(inode, offset);
+   if (WARN_ON(!uprobe))
+   return;
+
+   down_write(&uprobe->register_rwsem);
+   __uprobe_unregister(uprobe, uc);
+   up_write(&uprobe->register_rwsem);
+   put_uprobe(uprobe);
+}
+EXPORT_SYMBOL_GPL(uprobe_unregister);
+
+/*
+ * __uprobe_register - register a probe
  * @inode: the file in which the probe has to be placed.
  * @offset: offset from the start of the file.
  * @uc: information on howto handle the probe..
  *
- * Apart from the access refcount, uprobe_register() takes a creation
+ * Apart from the access refcount, __uprobe_register() takes a creation
  * refcount (thro alloc_uprobe) if and only if this @uprobe is getting
  * inserted into the rbtree (i.e first consumer for a @inode:@offset
  * tuple).  Creation refcount stops uprobe_unregister from freeing the
  * @uprobe even before the register operation is complete. Creation
  * refcount is released when the last @uc for the @uprobe
- * unregisters. Caller of uprobe_register() is required to keep @inode
+ * unregisters. Caller of __uprobe_register() is required to keep @inode
  * (and the containing mount) referenced.
  *
  * Return errno if it cannot successully install probes
  * else return 0 (success)
  */
-int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
+static int __uprobe_register(struct inode *inode, loff_t offset,
+			     struct uprobe_consumer *uc)
 {
struct uprobe *uprobe;
int ret;
@@ -904,7 +921,8 @@ int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *
	down_write(&uprobe->register_rwsem);
ret = -EAGAIN;
if (likely(uprobe_is_active(uprobe))) {
-   ret = __uprobe_register(uprobe, uc);
+   consumer_add(uprobe, uc);
+   ret = register_for_each_vma(uprobe, uc);
if (ret)
__uprobe_unregister(uprobe, uc);
}
@@ -915,6 +933,12 @@ int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *
goto retry;
return ret;
 }
+
+int uprobe_register(struct inode *inode, loff_t offset,
+   struct uprobe_consumer *uc)
+{
+   return __uprobe_register(inode, offset, uc);
+}
 EXPORT_SYMBOL_GPL(uprobe_register);
 
 /*
@@ -946,27 +970,6 @@ int uprobe_apply(struct inode *inode, loff_t offset,
return ret;
 }
 
-/*
- * uprobe_unregister - unregister a already registered probe.
- * @inode: the file in which the probe has to be removed.
- * @offset: offset from the start of the file.
- * @uc: identify which probe if multiple probes are colocated.
- */
-void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
-{
-   struct uprobe *uprobe;
-
-   uprobe = find_uprobe(inode, offset);
-   if (WARN_ON(!uprobe))
-   return;
-
-   down_write(&uprobe->register_rwsem);
-   __uprobe_unregister(uprobe, uc);
-   up_write(&uprobe->register_rwsem);
-   put_uprobe(uprobe);
-}
-EXPORT_SYMBOL_GPL(uprobe_unregister);
-
 static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm)
 {
struct vm_area_struct *vma;
-- 
2.14.4



Re: [PATCH] perf test 39 (Session topology) dumps core on s390

2018-05-27 Thread Ravi Bangoria
Hi Thomas,

On 05/24/2018 07:26 PM, Thomas Richter wrote:
> @@ -95,7 +98,7 @@ int test__session_topology(struct test *test __maybe_unused, int subtest __maybe
>  {
>   char path[PATH_MAX];
>   struct cpu_map *map;
> - int ret = -1;
> + int ret;

This is failing for me:

tests/topology.c: In function ‘test__session_topology’:
tests/topology.c:101:6: error: ‘ret’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
  int ret;
  ^
cc1: all warnings being treated as errors
  CC   util/intlist.o
mv: cannot stat 'tests/.topology.o.tmp': No such file or directory
/home/ravi/Workspace/linux/tools/build/Makefile.build:96: recipe for target 'tests/topology.o' failed
make[4]: *** [tests/topology.o] Error 1
make[4]: *** Waiting for unfinished jobs


Thanks,
Ravi



[PATCH] perf script powerpc: Python script for hypervisor call statistics

2018-06-05 Thread Ravi Bangoria
Add a Python script to show hypervisor call statistics. Ex,

  # perf record -a -e "{powerpc:hcall_entry,powerpc:hcall_exit}"
  # perf script -s scripts/python/powerpc-hcalls.py
hcall                      count   min(ns)   max(ns)   avg(ns)

H_RANDOM82   838  1164   904
H_PUT_TCE   47  1078  5928  2003
H_EOI  266  1336  3546  1654
H_ENTER 28  1646  4038  1952
H_PUT_TCE_INDIRECT 230  2166 18168  6109
H_IPI  238  1072  3232  1688
H_SEND_LOGICAL_LAN  42  5488 21366  7694
H_STUFF_TCE294   986  6210  3591
H_XIRR 266  2286  6990  3783
H_PROTECT   10  2196  3556  2555
H_VIO_SIGNAL   294  1028  2784  1311
H_ADD_LOGICAL_LAN_BUFFER53  1978  3450  2600
H_SEND_CRQ  77  1762  7240  2447

Signed-off-by: Ravi Bangoria 
---
 .../perf/scripts/python/bin/powerpc-hcalls-record  |   2 +
 .../perf/scripts/python/bin/powerpc-hcalls-report  |   2 +
 tools/perf/scripts/python/powerpc-hcalls.py| 200 +
 3 files changed, 204 insertions(+)
 create mode 100644 tools/perf/scripts/python/bin/powerpc-hcalls-record
 create mode 100644 tools/perf/scripts/python/bin/powerpc-hcalls-report
 create mode 100644 tools/perf/scripts/python/powerpc-hcalls.py

diff --git a/tools/perf/scripts/python/bin/powerpc-hcalls-record b/tools/perf/scripts/python/bin/powerpc-hcalls-record
new file mode 100644
index ..b7402aa9147d
--- /dev/null
+++ b/tools/perf/scripts/python/bin/powerpc-hcalls-record
@@ -0,0 +1,2 @@
+#!/bin/bash
+perf record -e "{powerpc:hcall_entry,powerpc:hcall_exit}" $@
diff --git a/tools/perf/scripts/python/bin/powerpc-hcalls-report b/tools/perf/scripts/python/bin/powerpc-hcalls-report
new file mode 100644
index ..dd32ad7465f6
--- /dev/null
+++ b/tools/perf/scripts/python/bin/powerpc-hcalls-report
@@ -0,0 +1,2 @@
+#!/bin/bash
+perf script $@ -s "$PERF_EXEC_PATH"/scripts/python/powerpc-hcalls.py
diff --git a/tools/perf/scripts/python/powerpc-hcalls.py b/tools/perf/scripts/python/powerpc-hcalls.py
new file mode 100644
index ..ff732118cae8
--- /dev/null
+++ b/tools/perf/scripts/python/powerpc-hcalls.py
@@ -0,0 +1,200 @@
+# SPDX-License-Identifier: GPL-2.0+
+#
+# Copyright (C) 2018 Ravi Bangoria, IBM Corporation
+#
+# Hypervisor call statistics
+
+import os
+import sys
+
+sys.path.append(os.environ['PERF_EXEC_PATH'] + \
+   '/scripts/python/Perf-Trace-Util/lib/Perf/Trace')
+
+from perf_trace_context import *
+from Core import *
+from Util import *
+
+# output: {
+#  opcode: {
+#  'min': minimum time nsec
+#  'max': maximum time nsec
+#  'time': average time nsec
+#  'cnt': counter
+#  } ...
+# }
+output = {}
+
+# d_enter: {
+#  cpu: {
+#  opcode: nsec
+#  } ...
+# }
+d_enter = {}
+
+hcall_table = {
+   4: 'H_REMOVE',
+   8: 'H_ENTER',
+   12: 'H_READ',
+   16: 'H_CLEAR_MOD',
+   20: 'H_CLEAR_REF',
+   24: 'H_PROTECT',
+   28: 'H_GET_TCE',
+   32: 'H_PUT_TCE',
+   36: 'H_SET_SPRG0',
+   40: 'H_SET_DABR',
+   44: 'H_PAGE_INIT',
+   48: 'H_SET_ASR',
+   52: 'H_ASR_ON',
+   56: 'H_ASR_OFF',
+   60: 'H_LOGICAL_CI_LOAD',
+   64: 'H_LOGICAL_CI_STORE',
+   68: 'H_LOGICAL_CACHE_LOAD',
+   72: 'H_LOGICAL_CACHE_STORE',
+   76: 'H_LOGICAL_ICBI',
+   80: 'H_LOGICAL_DCBF',
+   84: 'H_GET_TERM_CHAR',
+   88: 'H_PUT_TERM_CHAR',
+   92: 'H_REAL_TO_LOGICAL',
+   96: 'H_HYPERVISOR_DATA',
+   100: 'H_EOI',
+   104: 'H_CPPR',
+   108: 'H_IPI',
+   112: 'H_IPOLL',
+   116: 'H_XIRR',
+   120: 'H_MIGRATE_DMA',
+   124: 'H_PERFMON',
+   220: 'H_REGISTER_VPA',
+   224: 'H_CEDE',
+   228: 'H_CONFER',
+   232: 'H_PROD',
+   236: 'H_GET_PPP',
+   240: 'H_SET_PPP',
+   244: 'H_PURR',
+   248: 'H_PIC',
+   252: 'H_REG_CRQ',
+   256: 'H_FREE_CRQ',
+   260: 'H_VIO_SIGNAL',
+   264: 'H_SEND_CRQ',
+   272: 'H_COPY_RDMA',
+   276: 'H_REGISTER_LOGICAL_LAN',
+   280: 'H_FREE_LOGICAL_LAN',
+   284: 'H_ADD_LOGICAL_LAN_BUFFER',
+   288: 'H_SEND_LOGICAL_LAN',
+   292: 'H_BULK_REMOVE',
+   304: 'H_MULTICAST_CTRL',
+   308: 'H_SET_XDABR',
+   312: 'H_STUFF_TCE',
+   316: 'H_PUT_TCE_INDIRECT',
+   332: 'H_CHANGE_LOGICAL_LAN_MAC',
+   336: 'H_VTERM_PARTNER_INFO',
+   340: 'H_REGISTER_VTERM',
+   344: 'H_FREE_VTERM'

[PATCH 0/7] Uprobes: Support SDT markers having reference count (semaphore)

2018-06-06 Thread Ravi Bangoria
Why RFC again:

This series is different from earlier versions[1]. Earlier series
implemented this feature in trace_uprobe, while this one implements
the logic in core uprobe. A few reasons for this:
 1. One of the major reasons was the deadlock between uprobe_lock and
 mm->mmap_sem inside trace_uprobe_mmap(). That deadlock was not easy to
 fix because mm->mmap_sem is not in the control of trace_uprobe_mmap(),
 and it has to take uprobe_lock to loop over the trace_uprobe list. More
 details can be found at [2]. With this new approach, no deadlocks have
 been found so far.
 2. Many of the core uprobe functions and data structures would need to
 be exported to keep the earlier implementation simple. With this new
 approach, the reference counter logic is implemented in core uprobe and
 thus nothing needs to be exported.

Description:

Userspace Statically Defined Tracepoints[3] are dtrace style markers
inside userspace applications. Applications like PostgreSQL, MySQL,
Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc
have these markers embedded in them. These markers are added by
developers at important places in the code. Each marker source expands
to a single nop instruction in the compiled code, but there may be
additional overhead for computing the marker arguments, which expands
to a couple of instructions. If this overhead is significant, execution
can be omitted at runtime with an if() condition when no one is tracing
the marker:

if (reference_counter > 0) {
Execute marker instructions;
}

The default value of the reference counter is 0. A tracer has to
increment the reference counter before tracing a marker and decrement
it when done with the tracing.

Currently, the perf tool has limited support for SDT markers, i.e. it
cannot trace markers surrounded by a reference counter. Also, it's
not easy to add reference counter logic in a userspace tool like perf,
so the basic idea of this patchset is to add the reference counter
logic in the trace_uprobe infrastructure. Ex,[4]

  # cat tick.c
... 
for (i = 0; i < 100; i++) {
DTRACE_PROBE1(tick, loop1, i);
if (TICK_LOOP2_ENABLED()) {
DTRACE_PROBE1(tick, loop2, i); 
}
printf("hi: %d\n", i); 
sleep(1);
}   
... 

Here tick:loop1 is a marker without a reference counter, whereas
tick:loop2 is surrounded by a reference counter condition.

  # perf buildid-cache --add /tmp/tick
  # perf probe sdt_tick:loop1
  # perf probe sdt_tick:loop2

  # perf stat -e sdt_tick:loop1,sdt_tick:loop2 -- /tmp/tick
  hi: 0
  hi: 1
  hi: 2
  ^C
  Performance counter stats for '/tmp/tick':
 3  sdt_tick:loop1
 0  sdt_tick:loop2
 2.747086086 seconds time elapsed

Perf failed to record data for tick:loop2. Same experiment with this
patch series:

  # ./perf buildid-cache --add /tmp/tick
  # ./perf probe sdt_tick:loop2
  # ./perf stat -e sdt_tick:loop2 /tmp/tick
hi: 0
hi: 1
hi: 2
^C  
 Performance counter stats for '/tmp/tick':
 3  sdt_tick:loop2
   2.561851452 seconds time elapsed


Note:
 - 'reference counter' is called a 'semaphore' in the original Dtrace
   (or Systemtap, bcc and even in ELF) documentation and code. But the
   term 'semaphore' is misleading in this context. This is just a counter
   used to hold the number of tracers tracing a marker. It is not really
   used for any synchronization. So we refer to it as a 'reference
   counter' in kernel / perf code.
 - These patches still have one issue. If there are multiple instances of
   the same application running and the user wants to trace one particular
   instance, trace_uprobe updates the reference counter in all instances.
   This is not a problem on the user side, because the instruction is not
   replaced with a trap/int3 and thus the user will only see samples from
   the process of interest. But this is still more of a correctness issue.
   I'm working on a fix for this.

[1] https://lkml.org/lkml/2018/4/17/23
[2] https://lkml.org/lkml/2018/5/25/111
[3] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation
[4] https://github.com/iovisor/bcc/issues/327#issuecomment-200576506

Ravi Bangoria (7):
  Uprobes: Simplify uprobe_register() body
  Uprobes: Support SDT markers having reference count (semaphore)
  Uprobes/sdt: Fix multiple update of same reference counter
  trace_uprobe/sdt: Prevent multiple reference counter for same uprobe
  Uprobes/sdt: Prevent multiple reference counter for same uprobe
  Uprobes/sdt: Document about reference counter
  perf probe: Support SDT markers having reference counter (semaphore)

 Documentation/trace/uprobetracer.rst |  16 +-
 include/linux/uprobes.h  |   5 +
 kernel/events/uprobes.c  | 502 +++
 kernel/trace/trace.c |   2 +-
 kernel/trace/trace_uprobe.c  |  74 +-
 tools/perf/util/probe-event.c|  39 ++-
 tools/perf/util/probe-event.h|   1 +
 tools/

Re: [PATCH] perf stat: Fix shadow stats for clock events

2018-11-27 Thread Ravi Bangoria



On 11/16/18 7:05 PM, Jiri Olsa wrote:
> On Fri, Nov 16, 2018 at 09:58:43AM +0530, Ravi Bangoria wrote:
>> Commit 0aa802a79469 ("perf stat: Get rid of extra clock display
>> function") introduced scale and unit for clock events. Thus,
>> perf_stat__update_shadow_stats() now saves scaled values of
>> clock events in msecs, instead of original nsecs. But while
>> calculating values of shadow stats we still consider clock
>> event values in nsecs. This results in a wrong shadow stat
>> values. Ex,
>>
>>   # ./perf stat -e task-clock,cycles ls
>> 
>>   2.60 msec task-clock:u#0.877 CPUs utilized
>>  2,430,564  cycles:u# 1215282.000 GHz
>>
>> Fix this by saving original nsec values for clock events in
>> perf_stat__update_shadow_stats(). After patch:
>>
>>   # ./perf stat -e task-clock,cycles ls
>> 
>>   3.14 msec task-clock:u#0.839 CPUs utilized
>>  3,094,528  cycles:u#0.985 GHz
>>
>> Reported-by: Anton Blanchard 
>> Suggested-by: Jiri Olsa 
>> Fixes: 0aa802a79469 ("perf stat: Get rid of extra clock display function")
>> Signed-off-by: Ravi Bangoria 
> 
> Reviewed-by: Jiri Olsa 

Hi Arnaldo, Please pull this.

Thanks.



[PATCH] Uprobes: Fix kernel oops with delayed_uprobe_remove()

2018-11-14 Thread Ravi Bangoria
syzbot reported a kernel crash with delayed_uprobe_remove():
  https://lkml.org/lkml/2018/11/1/1244

Backtrace mentioned in the link points to a race between process
exit and uprobe_unregister(). Fix it by locking delayed_uprobe_lock
before calling delayed_uprobe_remove() from put_uprobe().

Reported-by: syzbot+cb1fb754b771caca0...@syzkaller.appspotmail.com
Signed-off-by: Ravi Bangoria 
---
 kernel/events/uprobes.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 96fb51f3994f..e527c4753d4f 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -572,7 +572,9 @@ static void put_uprobe(struct uprobe *uprobe)
 * gets called, we don't get a chance to remove uprobe from
 * delayed_uprobe_list from remove_breakpoint(). Do it here.
 */
+   mutex_lock(&delayed_uprobe_lock);
	delayed_uprobe_remove(uprobe, NULL);
+   mutex_unlock(&delayed_uprobe_lock);
kfree(uprobe);
}
 }
-- 
2.19.1



Re: mainline/master boot bisection: v4.20-rc5-79-gabb8d6ecbd8f on jetson-tk1

2018-12-10 Thread Ravi Bangoria
Hi,

Can you please provide more details? I don't understand how this patch
can cause a boot failure.

From the log found at
https://storage.kernelci.org/mainline/master/v4.20-rc5-79-gabb8d6ecbd8f/arm/multi_v7_defconfig+CONFIG_EFI=y+CONFIG_ARM_LPAE=y/lab-baylibre/boot-tegra124-jetson-tk1.html

23:21:06.680269  [7.500733] Unable to handle kernel NULL pointer 
dereference at virtual address 0064
23:21:06.680455  [7.508893] pgd = (ptrval)
23:21:06.721940  [7.511591] [0064] *pgd=ad7d8003, *pmd=f9d5d003
23:21:06.722241  [7.516500] Internal error: Oops: 207 [#1] SMP ARM
 ...
23:21:06.722724  [7.546706] CPU: 0 PID: 122 Comm: udevd Not tainted 
4.20.0-rc5 #1
23:21:06.722911  [7.552785] Hardware name: NVIDIA Tegra SoC (Flattened 
Device Tree)
23:21:06.765203  [7.559045] PC is at drm_plane_register_all+0x18/0x50
23:21:06.765493  [7.564094] LR is at drm_modeset_register_all+0xc/0x6c
23:21:06.765698  [7.569217] pc : []lr : []psr: 
a013
23:21:06.765882  [7.575470] sp : c3451c70  ip : 2d827000  fp : c1804c48
23:21:06.766053  [7.580680] r10:   r9 : ec9cc300  r8 : 
23:21:06.766229  [7.585893] r7 : bf193c80  r6 :   r5 : c3694224  r4 
: fffc
23:21:06.766403  [7.592404] r3 : 2000  r2 : 0002f000  r1 : eef92cf0  r0 
: c3694000
 ...
23:21:07.068237  [7.880215] [] (drm_plane_register_all) from 
[] (drm_modeset_register_all+0xc/0x6c)
23:21:07.068493  [7.889603] [] (drm_modeset_register_all) from 
[] (drm_dev_register+0x16c/0x1c4)
23:21:07.109960  [7.898915] [] (drm_dev_register) from 
[] (nouveau_platform_probe+0x54/0x8c [nouveau])
23:21:07.110285  [7.908750] [] (nouveau_platform_probe [nouveau]) 
from [] (platform_drv_probe+0x48/0x98)
23:21:07.110515  [7.918572] [] (platform_drv_probe) from 
[] (really_probe+0x228/0x2d0)
23:21:07.110706  [7.926832] [] (really_probe) from [] 
(driver_probe_device+0x60/0x174)
23:21:07.110893  [7.935093] [] (driver_probe_device) from 
[] (__driver_attach+0xd0/0xd4)
23:21:07.153794  [7.943528] [] (__driver_attach) from 
[] (bus_for_each_dev+0x74/0xb4)
23:21:07.154133  [7.951688] [] (bus_for_each_dev) from 
[] (bus_add_driver+0x18c/0x210)
23:21:07.154352  [7.959946] [] (bus_add_driver) from [] 
(driver_register+0x74/0x108)
23:21:07.154544  [7.968212] [] (driver_register) from 
[] (nouveau_drm_init+0x170/0x1000 [nouveau])
23:21:07.154739  [7.977692] [] (nouveau_drm_init [nouveau]) from 
[] (do_one_initcall+0x54/0x1fc)
23:21:07.197008  [7.986820] [] (do_one_initcall) from 
[] (do_init_module+0x64/0x1f4)
23:21:07.197344  [7.994906] [] (do_init_module) from [] 
(load_module+0x1ee8/0x23c8)
23:21:07.197553  [8.002907] [] (load_module) from [] 
(sys_finit_module+0xac/0xd8)
23:21:07.197751  [8.010722] [] (sys_finit_module) from 
[] (ret_fast_syscall+0x0/0x4c)
23:21:07.197935  [8.018884] Exception stack(0xc3451fa8 to 0xc3451ff0)


Both PC and LR are pointing to drm_* code. I don't see how this is
related to uprobes at all. Did I miss anything?

Thanks,
Ravi


On 12/7/18 9:30 AM, kernelci.org bot wrote:
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * This automated bisection report was sent to you on the basis  *
> * that you may be involved with the breaking commit it has  *
> * found.  No manual investigation has been done to verify it,   *
> * and the root cause of the problem may be somewhere else.  *
> * Hope this helps!  *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> 
> Bisection result for mainline/master (v4.20-rc5-79-gabb8d6ecbd8f) on 
> jetson-tk1
> 
>   Good:   cf76c364a1e1 Merge tag 'scsi-fixes' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
>   Bad:abb8d6ecbd8f Merge tag 'trace-v4.20-rc5' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
>   Found:  1aed58e67a6e Uprobes: Fix kernel oops with 
> delayed_uprobe_remove()
> 
> Details:
>   Good:   
> https://kernelci.org/boot/all/job/mainline/branch/master/kernel/v4.20-rc5-62-gcf76c364a1e1/
>   Bad:
> https://kernelci.org/boot/all/job/mainline/branch/master/kernel/v4.20-rc5-79-gabb8d6ecbd8f/
> 
> Checks:
>   revert: PASS
>   verify: PASS
> 
> Parameters:
>   Tree:   mainline
>   URL:
> http://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>   Branch: master
>   Target: jetson-tk1
>   Lab:lab-baylibre
>   Config: multi_v7_defconfig
>   Plan:   boot
> 
> Breaking commit found:
> 
> ---
> commit 1aed58e67a6ec1e7a18bfabe8ba6ec2d27c15636
> Author: Ravi Bangoria 
> Date:   Wed Dec 5 09:04:23 2018 +0530
> 
> Uprobes: Fix kernel oops with delayed_uprobe_remove()

[PATCH v2] Uprobes: Fix kernel oops with delayed_uprobe_remove()

2018-12-04 Thread Ravi Bangoria
There could be a race between task exit and probe unregister:

  exit_mm()
    mmput()
      __mmput()                    uprobe_unregister()
        uprobe_clear_state()         put_uprobe()
          delayed_uprobe_remove()      delayed_uprobe_remove()

put_uprobe() is calling delayed_uprobe_remove() without taking
delayed_uprobe_lock and thus the race sometimes results in a
kernel crash. Fix this by taking delayed_uprobe_lock before
calling delayed_uprobe_remove() from put_uprobe().

Detailed crash log can be found at:
  https://lkml.org/lkml/2018/11/1/1244

Reported-by: syzbot+cb1fb754b771caca0...@syzkaller.appspotmail.com
Fixes: 1cc33161a83d ("uprobes: Support SDT markers having reference count 
(semaphore)")
Signed-off-by: Ravi Bangoria 
---
 kernel/events/uprobes.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 322e97bbb437..abbd8da9ac21 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -572,7 +572,9 @@ static void put_uprobe(struct uprobe *uprobe)
 * gets called, we don't get a chance to remove uprobe from
 * delayed_uprobe_list from remove_breakpoint(). Do it here.
 */
+	mutex_lock(&delayed_uprobe_lock);
	delayed_uprobe_remove(uprobe, NULL);
+	mutex_unlock(&delayed_uprobe_lock);
kfree(uprobe);
}
 }
-- 
2.19.2



[PATCH v9 0/4] Uprobes: Support SDT markers having reference count (semaphore)

2018-08-19 Thread Ravi Bangoria
v8 -> v9:
 - Rebased to rostedt/for-next (Commit bb730b5833b5 to be precise)
 - Not including first two patches now. They are already pulled by
   Steven.
 - Change delayed_uprobe_remove() function as suggested by Oleg
 - Dump inode, offset, ref_ctr_offset, mm etc if we fail to update
   reference counter.
 - Rename delayed_uprobe_install() to delayed_ref_ctr_inc()
 - Use 'short d' (delta) in update_ref_ctr() in place of 'bool
   is_register'.

v8: https://lkml.org/lkml/2018/8/9/81

Future work:
 - Optimize uprobe_mmap()->delayed_ref_ctr_inc() by making
   delayed_uprobe_list per mm.

Description:
Userspace Statically Defined Tracepoints[1] are dtrace-style markers
inside userspace applications. Applications like PostgreSQL, MySQL,
Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc.
have these markers embedded in them. These markers are added by
developers at important places in the code. Each marker source expands
to a single nop instruction in the compiled code, but there may be
additional overhead for computing the marker arguments, which expands
to a couple of instructions. If this overhead is significant, it can be
skipped by a runtime if() condition when no one is tracing the marker:

    if (reference_counter > 0) {
        Execute marker instructions;
    }

The default value of the reference counter is 0. A tracer has to
increment the reference counter before tracing a marker and decrement
it when done with the tracing.

Currently, the perf tool has limited support for SDT markers, i.e. it
cannot trace markers surrounded by a reference counter. Also, it's not
easy to add reference counter logic in a userspace tool like perf, so
the basic idea of this patchset is to add the reference counter logic
in the uprobe infrastructure. Ex,[2]

  # cat tick.c
... 
	for (i = 0; i < 100; i++) {
		DTRACE_PROBE1(tick, loop1, i);
		if (TICK_LOOP2_ENABLED()) {
			DTRACE_PROBE1(tick, loop2, i);
		}
		printf("hi: %d\n", i);
		sleep(1);
	}
... 

Here tick:loop1 is a marker without a reference counter, whereas
tick:loop2 is surrounded by a reference counter condition.

  # perf buildid-cache --add /tmp/tick
  # perf probe sdt_tick:loop1
  # perf probe sdt_tick:loop2

  # perf stat -e sdt_tick:loop1,sdt_tick:loop2 -- /tmp/tick
  hi: 0
  hi: 1
  hi: 2
  ^C
  Performance counter stats for '/tmp/tick':
 3  sdt_tick:loop1
 0  sdt_tick:loop2
 2.747086086 seconds time elapsed

Perf failed to record data for tick:loop2. Same experiment with this
patch series:

  # ./perf buildid-cache --add /tmp/tick
  # ./perf probe sdt_tick:loop2
  # ./perf stat -e sdt_tick:loop2 /tmp/tick
hi: 0
hi: 1
hi: 2
^C  
 Performance counter stats for '/tmp/tick':
 3  sdt_tick:loop2
   2.561851452 seconds time elapsed

[1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation
[2] https://github.com/iovisor/bcc/issues/327#issuecomment-200576506

Ravi Bangoria (4):
  Uprobes: Support SDT markers having reference count (semaphore)
  Uprobes/sdt: Prevent multiple reference counter for same uprobe
  trace_uprobe/sdt: Prevent multiple reference counter for same uprobe
  perf probe: Support SDT markers having reference counter (semaphore)

 include/linux/uprobes.h   |   5 +
 kernel/events/uprobes.c   | 278 --
 kernel/trace/trace.c  |   2 +-
 kernel/trace/trace_uprobe.c   |  75 +++-
 tools/perf/util/probe-event.c |  39 +-
 tools/perf/util/probe-event.h |   1 +
 tools/perf/util/probe-file.c  |  34 +-
 tools/perf/util/probe-file.h  |   1 +
 tools/perf/util/symbol-elf.c  |  46 +--
 tools/perf/util/symbol.h  |   7 ++
 10 files changed, 453 insertions(+), 35 deletions(-)

-- 
2.14.4



[PATCH v9 2/4] Uprobes/sdt: Prevent multiple reference counter for same uprobe

2018-08-19 Thread Ravi Bangoria
We assume there is only one reference counter per uprobe. Don't
allow users to register multiple uprobes having the same inode+offset
but different reference counters.

Signed-off-by: Ravi Bangoria 
Acked-by: Srikar Dronamraju 
Reviewed-by: Oleg Nesterov 
---
 kernel/events/uprobes.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 35065febcb6c..ecee371a59c7 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -679,6 +679,16 @@ static struct uprobe *insert_uprobe(struct uprobe *uprobe)
return u;
 }
 
+static void
+ref_ctr_mismatch_warn(struct uprobe *cur_uprobe, struct uprobe *uprobe)
+{
+   pr_warn("ref_ctr_offset mismatch. inode: 0x%lx offset: 0x%llx "
+   "ref_ctr_offset(old): 0x%llx ref_ctr_offset(new): 0x%llx\n",
+   uprobe->inode->i_ino, (unsigned long long) uprobe->offset,
+   (unsigned long long) cur_uprobe->ref_ctr_offset,
+   (unsigned long long) uprobe->ref_ctr_offset);
+}
+
 static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset,
   loff_t ref_ctr_offset)
 {
@@ -698,6 +708,12 @@ static struct uprobe *alloc_uprobe(struct inode *inode, 
loff_t offset,
cur_uprobe = insert_uprobe(uprobe);
/* a uprobe exists for this inode:offset combination */
if (cur_uprobe) {
+   if (cur_uprobe->ref_ctr_offset != uprobe->ref_ctr_offset) {
+   ref_ctr_mismatch_warn(cur_uprobe, uprobe);
+   put_uprobe(cur_uprobe);
+   kfree(uprobe);
+   return ERR_PTR(-EINVAL);
+   }
kfree(uprobe);
uprobe = cur_uprobe;
}
@@ -1112,6 +1128,9 @@ static int __uprobe_register(struct inode *inode, loff_t 
offset,
uprobe = alloc_uprobe(inode, offset, ref_ctr_offset);
if (!uprobe)
return -ENOMEM;
+   if (IS_ERR(uprobe))
+   return PTR_ERR(uprobe);
+
/*
 * We can race with uprobe_unregister()->delete_uprobe().
 * Check uprobe_is_active() and retry if it is false.
-- 
2.14.4



[PATCH v9 1/4] Uprobes: Support SDT markers having reference count (semaphore)

2018-08-19 Thread Ravi Bangoria
Userspace Statically Defined Tracepoints[1] are dtrace-style markers
inside userspace applications. Applications like PostgreSQL, MySQL,
Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc.
have these markers embedded in them. These markers are added by
developers at important places in the code. Each marker source expands
to a single nop instruction in the compiled code, but there may be
additional overhead for computing the marker arguments, which expands
to a couple of instructions. If this overhead is significant, it can be
skipped by a runtime if() condition when no one is tracing the marker:

    if (reference_counter > 0) {
        Execute marker instructions;
    }

The default value of the reference counter is 0. A tracer has to
increment the reference counter before tracing a marker and decrement
it when done with the tracing.

Implement the reference counter logic in the core uprobe code. Users
will be able to use it from trace_uprobe as well as from kernel
modules. The new trace_uprobe definition with a reference counter is:

    <path>:<offset>[(ref_ctr_offset)]

where ref_ctr_offset is an optional field. For kernel modules, a new
variant of uprobe_register() has been introduced:

    uprobe_register_refctr(inode, offset, ref_ctr_offset, consumer)

No new variant of uprobe_unregister() is needed, because a uprobe is
assumed to have only one reference counter.

[1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation

Note: the 'reference counter' is called a 'semaphore' in the original
Dtrace (and Systemtap, bcc, and even ELF) documentation and code. But
the term 'semaphore' is misleading in this context. It is just a
counter used to hold the number of tracers tracing a marker; it is not
really used for any synchronization. So we call it a 'reference
counter' in kernel/perf code.

Signed-off-by: Ravi Bangoria 
Reviewed-by: Masami Hiramatsu 
[Only trace_uprobe.c]
Reviewed-by: Oleg Nesterov 
---
 include/linux/uprobes.h |   5 +
 kernel/events/uprobes.c | 259 ++--
 kernel/trace/trace.c|   2 +-
 kernel/trace/trace_uprobe.c |  38 ++-
 4 files changed, 293 insertions(+), 11 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index bb9d2084af03..103a48a48872 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -123,6 +123,7 @@ extern unsigned long uprobe_get_swbp_addr(struct pt_regs 
*regs);
 extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
 extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct 
*mm, unsigned long vaddr, uprobe_opcode_t);
 extern int uprobe_register(struct inode *inode, loff_t offset, struct 
uprobe_consumer *uc);
+extern int uprobe_register_refctr(struct inode *inode, loff_t offset, loff_t 
ref_ctr_offset, struct uprobe_consumer *uc);
 extern int uprobe_apply(struct inode *inode, loff_t offset, struct 
uprobe_consumer *uc, bool);
 extern void uprobe_unregister(struct inode *inode, loff_t offset, struct 
uprobe_consumer *uc);
 extern int uprobe_mmap(struct vm_area_struct *vma);
@@ -160,6 +161,10 @@ uprobe_register(struct inode *inode, loff_t offset, struct 
uprobe_consumer *uc)
 {
return -ENOSYS;
 }
+static inline int uprobe_register_refctr(struct inode *inode, loff_t offset, 
loff_t ref_ctr_offset, struct uprobe_consumer *uc)
+{
+   return -ENOSYS;
+}
 static inline int
 uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, 
bool add)
 {
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 919c1ce32beb..35065febcb6c 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -73,6 +73,7 @@ struct uprobe {
struct uprobe_consumer  *consumers;
struct inode*inode; /* Also hold a ref to inode */
loff_t  offset;
+   loff_t  ref_ctr_offset;
unsigned long   flags;
 
/*
@@ -88,6 +89,15 @@ struct uprobe {
struct arch_uprobe  arch;
 };
 
+struct delayed_uprobe {
+   struct list_head list;
+   struct uprobe *uprobe;
+   struct mm_struct *mm;
+};
+
+static DEFINE_MUTEX(delayed_uprobe_lock);
+static LIST_HEAD(delayed_uprobe_list);
+
 /*
  * Execute out of line area: anonymous executable mapping installed
  * by the probed task to execute the copy of the original instruction
@@ -282,6 +292,166 @@ static int verify_opcode(struct page *page, unsigned long 
vaddr, uprobe_opcode_t
return 1;
 }
 
+static struct delayed_uprobe *
+delayed_uprobe_check(struct uprobe *uprobe, struct mm_struct *mm)
+{
+   struct delayed_uprobe *du;
+
+	list_for_each_entry(du, &delayed_uprobe_list, list)
+   if (du->uprobe == uprobe && du->mm == mm)
+   return du;
+   return NULL;
+}
+
+static int delayed_uprobe_add(struct uprobe *uprobe, struct mm_struct *mm)
+{
+   struct delayed_uprobe *du;
+
+   if (delayed_uprobe

[PATCH v9 4/4] perf probe: Support SDT markers having reference counter (semaphore)

2018-08-19 Thread Ravi Bangoria
With this, perf buildid-cache will save SDT markers with a reference
counter in the probe cache, and perf probe will be able to probe
markers having a reference counter. Ex,

  # readelf -n /tmp/tick | grep -A1 loop2
Name: loop2
... Semaphore: 0x10020036

  # ./perf buildid-cache --add /tmp/tick
  # ./perf probe sdt_tick:loop2
  # ./perf stat -e sdt_tick:loop2 /tmp/tick
hi: 0
hi: 1
hi: 2
^C
 Performance counter stats for '/tmp/tick':
 3  sdt_tick:loop2
   2.561851452 seconds time elapsed

Signed-off-by: Ravi Bangoria 
Acked-by: Masami Hiramatsu 
Acked-by: Srikar Dronamraju 
---
 tools/perf/util/probe-event.c | 39 
 tools/perf/util/probe-event.h |  1 +
 tools/perf/util/probe-file.c  | 34 ++--
 tools/perf/util/probe-file.h  |  1 +
 tools/perf/util/symbol-elf.c  | 46 ---
 tools/perf/util/symbol.h  |  7 +++
 6 files changed, 106 insertions(+), 22 deletions(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index f119eb628dbb..e86f8be89157 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -1819,6 +1819,12 @@ int parse_probe_trace_command(const char *cmd, struct 
probe_trace_event *tev)
tp->offset = strtoul(fmt2_str, NULL, 10);
}
 
+   if (tev->uprobes) {
+   fmt2_str = strchr(p, '(');
+   if (fmt2_str)
+   tp->ref_ctr_offset = strtoul(fmt2_str + 1, NULL, 0);
+   }
+
tev->nargs = argc - 2;
tev->args = zalloc(sizeof(struct probe_trace_arg) * tev->nargs);
if (tev->args == NULL) {
@@ -2012,6 +2018,22 @@ static int synthesize_probe_trace_arg(struct 
probe_trace_arg *arg,
return err;
 }
 
+static int
+synthesize_uprobe_trace_def(struct probe_trace_event *tev, struct strbuf *buf)
+{
+	struct probe_trace_point *tp = &tev->point;
+   int err;
+
+   err = strbuf_addf(buf, "%s:0x%lx", tp->module, tp->address);
+
+   if (err >= 0 && tp->ref_ctr_offset) {
+   if (!uprobe_ref_ctr_is_supported())
+   return -1;
+   err = strbuf_addf(buf, "(0x%lx)", tp->ref_ctr_offset);
+   }
+   return err >= 0 ? 0 : -1;
+}
+
 char *synthesize_probe_trace_command(struct probe_trace_event *tev)
 {
	struct probe_trace_point *tp = &tev->point;
@@ -2041,15 +2063,17 @@ char *synthesize_probe_trace_command(struct 
probe_trace_event *tev)
}
 
/* Use the tp->address for uprobes */
-   if (tev->uprobes)
-	err = strbuf_addf(&buf, "%s:0x%lx", tp->module, tp->address);
-   else if (!strncmp(tp->symbol, "0x", 2))
+   if (tev->uprobes) {
+	err = synthesize_uprobe_trace_def(tev, &buf);
+   } else if (!strncmp(tp->symbol, "0x", 2)) {
/* Absolute address. See try_to_find_absolute_address() */
		err = strbuf_addf(&buf, "%s%s0x%lx", tp->module ?: "",
  tp->module ? ":" : "", tp->address);
-   else
+   } else {
		err = strbuf_addf(&buf, "%s%s%s+%lu", tp->module ?: "",
tp->module ? ":" : "", tp->symbol, tp->offset);
+   }
+
if (err)
goto error;
 
@@ -2633,6 +2657,13 @@ static void warn_uprobe_event_compat(struct 
probe_trace_event *tev)
 {
int i;
char *buf = synthesize_probe_trace_command(tev);
+	struct probe_trace_point *tp = &tev->point;
+
+   if (tp->ref_ctr_offset && !uprobe_ref_ctr_is_supported()) {
+   pr_warning("A semaphore is associated with %s:%s and "
+  "seems your kernel doesn't support it.\n",
+  tev->group, tev->event);
+   }
 
/* Old uprobe event doesn't support memory dereference */
if (!tev->uprobes || tev->nargs == 0 || !buf)
diff --git a/tools/perf/util/probe-event.h b/tools/perf/util/probe-event.h
index 45b14f020558..15a98c3a2a2f 100644
--- a/tools/perf/util/probe-event.h
+++ b/tools/perf/util/probe-event.h
@@ -27,6 +27,7 @@ struct probe_trace_point {
char*symbol;/* Base symbol */
char*module;/* Module name */
unsigned long   offset; /* Offset from symbol */
+   unsigned long   ref_ctr_offset; /* SDT reference counter offset */
unsigned long   address;/* Actual address of the trace point */
boolretprobe;   /* Return probe flag */
 };
diff --git a/tools/perf/util/probe-file.c b/tools/perf/util/probe-file.c
index b76088fadf3d..aac7817d9e14 100644
--- a/tools/perf/util/probe-file.c

[PATCH v9 3/4] trace_uprobe/sdt: Prevent multiple reference counter for same uprobe

2018-08-19 Thread Ravi Bangoria
We assume there is only one reference counter per uprobe. Don't
allow users to add multiple trace_uprobe entries having the same
inode+offset but different reference counters.

Ex,
  # echo "p:sdt_tick/loop2 /home/ravi/tick:0x6e4(0x10036)" > uprobe_events
  # echo "p:sdt_tick/loop2_1 /home/ravi/tick:0x6e4(0xf)" >> uprobe_events
  bash: echo: write error: Invalid argument

  # dmesg
  trace_kprobe: Reference counter offset mismatch.

There is one exception though:
when the user is trying to replace an old entry with a new one,
we allow this if the new entry does not conflict with any other
existing entries.

Signed-off-by: Ravi Bangoria 
Acked-by: Srikar Dronamraju 
Reviewed-by: Song Liu 
Reviewed-by: Oleg Nesterov 
---
 kernel/trace/trace_uprobe.c | 37 +++--
 1 file changed, 35 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index a7ef6c4ca16e..21d9fffaa096 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -324,6 +324,35 @@ static int unregister_trace_uprobe(struct trace_uprobe *tu)
return 0;
 }
 
+/*
+ * Uprobe with multiple reference counter is not allowed. i.e.
+ * If inode and offset matches, reference counter offset *must*
+ * match as well. Though, there is one exception: If user is
+ * replacing old trace_uprobe with new one(same group/event),
+ * then we allow same uprobe with new reference counter as far
+ * as the new one does not conflict with any other existing
+ * ones.
+ */
+static struct trace_uprobe *find_old_trace_uprobe(struct trace_uprobe *new)
+{
+   struct trace_uprobe *tmp, *old = NULL;
+   struct inode *new_inode = d_real_inode(new->path.dentry);
+
+	old = find_probe_event(trace_event_name(&new->tp.call),
+   new->tp.call.class->system);
+
+	list_for_each_entry(tmp, &uprobe_list, list) {
+   if ((old ? old != tmp : true) &&
+   new_inode == d_real_inode(tmp->path.dentry) &&
+   new->offset == tmp->offset &&
+   new->ref_ctr_offset != tmp->ref_ctr_offset) {
+   pr_warn("Reference counter offset mismatch.");
+   return ERR_PTR(-EINVAL);
+   }
+   }
+   return old;
+}
+
 /* Register a trace_uprobe and probe_event */
 static int register_trace_uprobe(struct trace_uprobe *tu)
 {
@@ -333,8 +362,12 @@ static int register_trace_uprobe(struct trace_uprobe *tu)
	mutex_lock(&uprobe_lock);
 
/* register as an event */
-	old_tu = find_probe_event(trace_event_name(&tu->tp.call),
-   tu->tp.call.class->system);
+   old_tu = find_old_trace_uprobe(tu);
+   if (IS_ERR(old_tu)) {
+   ret = PTR_ERR(old_tu);
+   goto end;
+   }
+
if (old_tu) {
/* delete old event */
ret = unregister_trace_uprobe(old_tu);
-- 
2.14.4



Re: [PATCH v9 0/4] Uprobes: Support SDT markers having reference count (semaphore)

2018-08-20 Thread Ravi Bangoria
Hi Song,

> root@virt-test:~# ~/a.out
> 11
> semaphore 0
> semaphore 0
> semaphore 2  <<<  when the uprobe is enabled

Yes, this happens when multiple vmas point to the same file portion.
Can you check /proc/`pgrep a.out`/maps.

The logic is simple. If we are going to patch an instruction, we
increment the reference counter; if we are going to unpatch an
instruction, we decrement it. In this case, we patched the instruction
twice and thus incremented the reference counter twice as well.

Ravi



Re: [PATCH v9 0/4] Uprobes: Support SDT markers having reference count (semaphore)

2018-08-20 Thread Ravi Bangoria
Hi Song,

> However, if I start a.out AFTER enabling the uprobe, there is something wrong:
> 
> root@virt-test:~# ~/a.out
> 11
> semaphore 0   <<< this should be non-zero, as the uprobe is already 
> enabled

Ok. I'm able to reproduce this. Digging deeper.

Ravi



Re: [PATCH v9 0/4] Uprobes: Support SDT markers having reference count (semaphore)

2018-08-21 Thread Ravi Bangoria
Hi Song,

On 08/21/2018 10:53 AM, Ravi Bangoria wrote:
> Hi Song,
> 
>> However, if I start a.out AFTER enabling the uprobe, there is something 
>> wrong:
>>
>> root@virt-test:~# ~/a.out
>> 11
>> semaphore 0   <<< this should be non-zero, as the uprobe is already 
>> enabled

In this testcase, semaphore variable is stored into .bss:

  $ nm test | grep semaphore
  10010c5e B semaphore
 
  $ readelf -SW ./test | grep "data\|bss"
[22] .data PROGBITS10010c58 000c58 04 00  
WA  0   0  1
[23] .bss  NOBITS  10010c5c 000c5c 04 00  
WA  0   0  2

I'm not so sure, but I guess .bss data initialization happens after
uprobe_mmap() is called, and thus you are seeing the semaphore as 0.

To verify this, if I force the semaphore into the .data section by
assigning a non-zero value to it:

  volatile short semaphore = 1;

 $ nm test | grep semaphore
 10010c5c D semaphore

 $ readelf -SW ./test | grep "data\|bss"
[22] .data PROGBITS10010c58 000c58 06 00  
WA  0   0  2
[23] .bss  NOBITS  10010c5e 000c5e 02 00  
WA  0   0  1 

increment/decrement works fine.

Ravi



Re: [PATCH v9 0/4] Uprobes: Support SDT markers having reference count (semaphore)

2018-08-21 Thread Ravi Bangoria


On 08/21/2018 01:04 PM, Naveen N. Rao wrote:
> Song Liu wrote:
>> I am testing the patch set with the following code:
>>
>> #include 
>> #include 
>>
>> volatile short semaphore = 0;
>>
>> int for_uprobe(int c)
>> {
>>     printf("%d\n", c + 10);
>>     return c + 1;
>> }
>>
>> int main(int argc, char *argv[])
>> {
>>     for_uprobe(argc);
>>     while (1) {
>>     sleep(1);
>>     printf("semaphore %d\n", semaphore);
>>     }
>> }
>>
>> I created a uprobe on function for_uprobe(), that uses semaphore as
>> reference counter:
>>
>>   echo "p:uprobe_1 /root/a.out:0x49a(0x1036)" >> uprobe_events
> 
> Is that even valid? That _looks_ like a semaphore, but I'm not quite sure 
> that it qualifies as an _SDT_ semaphore. Do you see this issue if you instead 
use the macros provided by <sys/sdt.h> to create SDT markers?
> 

Right. By default, SDT reference counters (semaphores) go into the
.probes section:

  [25] .probes   PROGBITS0060102c 00102c 04 00  WA  
0   0  2

which has PROGBITS set. So this works fine. And there are many other
things which are coded into <sys/sdt.h>. So the official way to use
SDT markers should be through that header.

Ravi



Re: [PATCH] trace_uprobe: support reference count in fd-based uprobe

2018-08-22 Thread Ravi Bangoria



On 08/22/2018 03:53 AM, Song Liu wrote:
> This patch applies on top of Ravi Bangoria's work that enables reference
> count for uprobe:
> 
>   https://lkml.org/lkml/2018/8/20/37
> 
> After Ravi's work, the effort to enable it in fd-based uprobe is straight
> forward. Highest 40 bits of perf_event_attr.config is used to stored offset
> of the reference count (semaphore).
> 
> Format information in /sys/bus/event_source/devices/uprobe/format/ is
> updated to reflect this new feature.

LGTM

Reviewed-and-tested-by: Ravi Bangoria 


> 
> Signed-off-by: Song Liu 
> Cc: Ravi Bangoria 
> Cc: Masami Hiramatsu 
> Cc: Oleg Nesterov 
> Cc: Srikar Dronamraju 
> Cc: Naveen N. Rao 
> Cc: Steven Rostedt (VMware) 
> Cc: Peter Zijlstra 
> Cc: Ingo Molnar 
> Cc: Teng Qin 
> ---
>  include/linux/trace_events.h|  3 +-
>  kernel/events/core.c| 49 ++---
>  kernel/trace/trace_event_perf.c |  7 +++--
>  kernel/trace/trace_probe.h  |  3 +-
>  kernel/trace/trace_uprobe.c |  4 ++-
>  5 files changed, 50 insertions(+), 16 deletions(-)
> 
> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> index 78a010e19ed4..4130a5497d40 100644
> --- a/include/linux/trace_events.h
> +++ b/include/linux/trace_events.h
> @@ -575,7 +575,8 @@ extern int bpf_get_kprobe_info(const struct perf_event 
> *event,
>  bool perf_type_tracepoint);
>  #endif
>  #ifdef CONFIG_UPROBE_EVENTS
> -extern int  perf_uprobe_init(struct perf_event *event, bool is_retprobe);
> +extern int  perf_uprobe_init(struct perf_event *event,
> +  unsigned long ref_ctr_offset, bool is_retprobe);
>  extern void perf_uprobe_destroy(struct perf_event *event);
>  extern int bpf_get_uprobe_info(const struct perf_event *event,
>  u32 *fd_type, const char **filename,
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 8f0434a9951a..75a0219be420 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -8363,30 +8363,39 @@ static struct pmu perf_tracepoint = {
>   *
>   * PERF_PROBE_CONFIG_IS_RETPROBE if set, create kretprobe/uretprobe
>   *   if not set, create kprobe/uprobe
> + *
> + * The following values specify a reference counter (or semaphore in the
> + * terminology of tools like dtrace, systemtap, etc.) Userspace Statically
> + * Defined Tracepoints (USDT). Currently, we use 40 bit for the offset.
> + *
> + * PERF_UPROBE_REF_CTR_OFFSET_BITS   # of bits in config as the offset
> + * PERF_UPROBE_REF_CTR_OFFSET_SHIFT  # of bits to shift left
>   */
>  enum perf_probe_config {
>   PERF_PROBE_CONFIG_IS_RETPROBE = 1U << 0,  /* [k,u]retprobe */
> + PERF_UPROBE_REF_CTR_OFFSET_BITS = 40,
> + PERF_UPROBE_REF_CTR_OFFSET_SHIFT = 64 - PERF_UPROBE_REF_CTR_OFFSET_BITS,
>  };
>  
>  PMU_FORMAT_ATTR(retprobe, "config:0");
> +#endif
>  
> -static struct attribute *probe_attrs[] = {
> +#ifdef CONFIG_KPROBE_EVENTS
> +static struct attribute *kprobe_attrs[] = {
> 	&format_attr_retprobe.attr,
>   NULL,
>  };
>  
> -static struct attribute_group probe_format_group = {
> +static struct attribute_group kprobe_format_group = {
>   .name = "format",
> - .attrs = probe_attrs,
> + .attrs = kprobe_attrs,
>  };
>  
> -static const struct attribute_group *probe_attr_groups[] = {
> -	&probe_format_group,
> +static const struct attribute_group *kprobe_attr_groups[] = {
> +	&kprobe_format_group,
>   NULL,
>  };
> -#endif
>  
> -#ifdef CONFIG_KPROBE_EVENTS
>  static int perf_kprobe_event_init(struct perf_event *event);
>  static struct pmu perf_kprobe = {
>   .task_ctx_nr= perf_sw_context,
> @@ -8396,7 +8405,7 @@ static struct pmu perf_kprobe = {
>   .start  = perf_swevent_start,
>   .stop   = perf_swevent_stop,
>   .read   = perf_swevent_read,
> - .attr_groups= probe_attr_groups,
> + .attr_groups= kprobe_attr_groups,
>  };
>  
>  static int perf_kprobe_event_init(struct perf_event *event)
> @@ -8428,6 +8437,24 @@ static int perf_kprobe_event_init(struct perf_event 
> *event)
>  #endif /* CONFIG_KPROBE_EVENTS */
>  
>  #ifdef CONFIG_UPROBE_EVENTS
> +PMU_FORMAT_ATTR(ref_ctr_offset, "config:63-24");
> +
> +static struct attribute *uprobe_attrs[] = {
> +	&format_attr_retprobe.attr,
> +	&format_attr_ref_ctr_offset.attr,
> + NULL,
> +};
> +
> +static struct attribute_group uprobe_format_group = {
> + .name = "format",
> + .attrs = uprobe_attrs,
> +};
> +
> +static const struct attribute_group *uprobe_attr_gr

Re: [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore)

2018-08-12 Thread Ravi Bangoria
Hi Song,

On 08/11/2018 01:27 PM, Song Liu wrote:
>> +
>> +static void delayed_uprobe_delete(struct delayed_uprobe *du)
>> +{
>> +   if (!du)
>> +   return;
> Do we really need this check?


It's not strictly necessary, but I would still like to keep it for safety.


> 
>> +	list_del(&du->list);
>> +   kfree(du);
>> +}
>> +
>> +static void delayed_uprobe_remove(struct uprobe *uprobe, struct mm_struct 
>> *mm)
>> +{
>> +   struct list_head *pos, *q;
>> +   struct delayed_uprobe *du;
>> +
>> +   if (!uprobe && !mm)
>> +   return;
> And do we really need this check?


Yes. delayed_uprobe_remove(uprobe=NULL, mm=NULL) is an invalid case. If
I remove this check, the code below (or more accurately, the code
suggested by Oleg) will remove all entries from delayed_uprobe_list. So
I will keep this check but put a comment above the function.


[...]
>> +
>> +   ret = get_user_pages_remote(NULL, mm, vaddr, 1,
>> +				   FOLL_WRITE, &page, &vma, NULL);
>> +   if (unlikely(ret <= 0)) {
>> +   /*
>> +* We are asking for 1 page. If get_user_pages_remote() 
>> fails,
>> +* it may return 0, in that case we have to return error.
>> +*/
>> +   ret = (ret == 0) ? -EBUSY : ret;
>> +   pr_warn("Failed to %s ref_ctr. (%d)\n",
>> +   d > 0 ? "increment" : "decrement", ret);
> This warning is not really useful. Seems this function has little information
> about which uprobe is failing here. Maybe we only need warning in the caller
> (or caller of caller).


Sure, I can move this warning to caller of this function but what are the
exact fields you would like to print with warning? Something like this is
fine?

pr_warn("ref_ctr %s failed for 0x%lx, 0x%lx, 0x%lx, 0x%p",
d > 0 ? "increment" : "decrement", inode->i_ino,
offset, ref_ctr_offset, mm);

More importantly, the reason I didn't print more info is because dmesg is
accessible to unprivileged users in many distros but uprobes are not. So
printing this information may be a security violation. No?


> 
>> +   return ret;
>> +   }
>> +
>> +   kaddr = kmap_atomic(page);
>> +   ptr = kaddr + (vaddr & ~PAGE_MASK);
>> +
>> +   if (unlikely(*ptr + d < 0)) {
>> +   pr_warn("ref_ctr going negative. vaddr: 0x%lx, "
>> +   "curr val: %d, delta: %d\n", vaddr, *ptr, d);
>> +   ret = -EINVAL;
>> +   goto out;
>> +   }
>> +
>> +   *ptr += d;
>> +   ret = 0;
>> +out:
>> +   kunmap_atomic(kaddr);
>> +   put_page(page);
>> +   return ret;
>> +}
>> +
>> +static int update_ref_ctr(struct uprobe *uprobe, struct mm_struct *mm,
>> + bool is_register)
> What's the reason of bool is_register here vs. short d in __update_ref_ctr()?
> Can we use short for both?


Yes, I can use short as well.


> 
>> +{
>> +   struct vm_area_struct *rc_vma;
>> +   unsigned long rc_vaddr;
>> +   int ret = 0;
>> +
>> +   rc_vma = find_ref_ctr_vma(uprobe, mm);
>> +
>> +   if (rc_vma) {
>> +   rc_vaddr = offset_to_vaddr(rc_vma, uprobe->ref_ctr_offset);
>> +   ret = __update_ref_ctr(mm, rc_vaddr, is_register ? 1 : -1);
>> +
>> +   if (is_register)
>> +   return ret;
>> +   }
> Mixing __update_ref_ctr() here and delayed_uprobe_add() in the same
> function is a little confusing (at least for me). How about we always use
> delayed uprobe for uprobe_mmap() and use non-delayed in other case(s)?


No. delayed_uprobe_add() is needed for the uprobe_register() case, to handle
the race between uprobe_register() and process creation.


[...]
>>
>> +static int delayed_uprobe_install(struct vm_area_struct *vma)
> This function name is confusing. How about we call it delayed_ref_ctr_incr()
> or something similar? Also, we should add comments to highlight that this vma
> is not the vma containing the uprobe, but the vma containing the ref_ctr.


Sure, I'll do that.


> 
>> +{
>> +   struct list_head *pos, *q;
>> +   struct delayed_uprobe *du;
>> +   unsigned long vaddr;
>> +   int ret = 0, err = 0;
>> +
>> +   mutex_lock(&delayed_uprobe_lock);
>> +   list_for_each_safe(pos, q, &delayed_uprobe_list) {
>> +   du = list_entry(pos, struct delayed_uprobe, list);
>> +
>> +   if (!valid_ref_ctr_vma(du->uprobe, vma))
>> +   continue;
>> +
>> +   vaddr = offset_to_vaddr(vma, du->uprobe->ref_ctr_offset);
>> +   ret = __update_ref_ctr(vma->vm_mm, vaddr, 1);
>> +   /* Record an error and continue. */
>> +   if (ret && !err)
>> +   err = ret;
> I think this is a good place (when ret != 0) to call pr_warn(). I guess we can
> print which mm get error for which uprobe (inode+offset).


__update_ref_ctr() is already printing a warning, so I didn't add anything here.
Re: [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore)

2018-08-13 Thread Ravi Bangoria
Hi Song,

On 08/13/2018 11:17 AM, Ravi Bangoria wrote:
>>> +
>>> +static void delayed_uprobe_remove(struct uprobe *uprobe, struct mm_struct 
>>> *mm)
>>> +{
>>> +   struct list_head *pos, *q;
>>> +   struct delayed_uprobe *du;
>>> +
>>> +   if (!uprobe && !mm)
>>> +   return;
>> And do we really need this check?
> 
> 
> Yes. delayed_uprobe_remove(uprobe=NULL, mm=NULL) is an invalid case. If I 
> remove
> this check, code below (or more accurately code suggested by Oleg) will remove
> all entries from delayed_uprobe_list. So I will keep this check but put a 
> comment
> above function.
> 

Sorry, my bad. Please ignore the above comment. Even though it saves us an
unnecessary loop over the entire delayed_uprobe_list when both uprobe and mm
are NULL, that case is not possible with the current code. Also, I'm not
dereferencing either of them. So, IMHO, it's fine to remove this check.

Thanks,
Ravi



Re: [PATCH v8 5/6] trace_uprobe/sdt: Prevent multiple reference counter for same uprobe

2018-08-13 Thread Ravi Bangoria
Hi Song,

On 08/11/2018 01:42 PM, Song Liu wrote:
> Do we really need this given we already have PATCH 4/6?
> uprobe_regsiter() can be called
> out of trace_uprobe, this patch won't catch all conflicts anyway.

Right, but it at least catches all conflicts happening via trace_uprobe.

I don't mind removing this patch, but I would like to get the opinion of
Oleg/Srikar/Steven/Masami.

Thanks,
Ravi



Re: [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore)

2018-08-13 Thread Ravi Bangoria
Hi Oleg,

On 08/13/2018 05:20 PM, Oleg Nesterov wrote:
> On 08/13, Ravi Bangoria wrote:
>>
>> On 08/11/2018 01:27 PM, Song Liu wrote:
>>>> +
>>>> +static void delayed_uprobe_delete(struct delayed_uprobe *du)
>>>> +{
>>>> +   if (!du)
>>>> +   return;
>>> Do we really need this check?
>>
>> Not necessary though, but I would still like to keep it for a safety.
> 
> Heh. I tried to ignore all minor problems in this version, but now that Song
> mentioned this unnecessary check...
> 
> Personally I really dislike the checks like this one.
> 
>   - It can confuse the reader who will try to understand the purpose
> 
>   - it can hide a bug if delayed_uprobe_delete(du) is actually called
> with du == NULL.
> 
> IMO, you should either remove it and let the kernel crash (to notice the
> problem), or turn it into
> 
>   if (WARN_ON(!du))
>   return;
> 
> which is self-documented and reports the problem without kernel crash.

Ok. I'll remove that check.

> 
>>>> +   rc_vma = find_ref_ctr_vma(uprobe, mm);
>>>> +
>>>> +   if (rc_vma) {
>>>> +   rc_vaddr = offset_to_vaddr(rc_vma, uprobe->ref_ctr_offset);
>>>> +   ret = __update_ref_ctr(mm, rc_vaddr, is_register ? 1 : -1);
>>>> +
>>>> +   if (is_register)
>>>> +   return ret;
>>>> +   }
>>> Mixing __update_ref_ctr() here and delayed_uprobe_add() in the same
>>> function is a little confusing (at least for me). How about we always use
>>> delayed uprobe for uprobe_mmap() and use non-delayed in other case(s)?
>>
>>
>> No. delayed_uprobe_add() is needed for uprobe_register() case to handle race
>> between uprobe_register() and process creation.
> 
> Yes.
> 
> But damn, process creation (exec) is trivial. We could add a new uprobe_exec()
> hook and avoid delayed_uprobe_install() in uprobe_mmap().

I'm sorry. I didn't get this.

> 
> Afaics, the really problematic case is dlopen() which can race with
> _register() too, right?

dlopen() should internally use mmap(), right? So what is the problem here? Can
you please elaborate?

Thanks,
Ravi



Re: [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore)

2018-08-13 Thread Ravi Bangoria



On 08/13/2018 06:47 PM, Oleg Nesterov wrote:
> On 08/13, Ravi Bangoria wrote:
>>
>>> But damn, process creation (exec) is trivial. We could add a new
>>> uprobe_exec() hook and avoid delayed_uprobe_install() in uprobe_mmap().
>>
>> I'm sorry. I didn't get this.
> 
> Sorry for confusion...
> 
> I meant, if only exec*() could race with _register(), we could add another
> uprobe hook which updates all (delayed) counters before return to user-mode.

Ok.

> 
>>> Afaics, the really problematic case is dlopen() which can race with
>>> _register() too, right?
>>
>> dlopen() should internally use mmap() right? So what is the problem here? Can
>> you please elaborate.
> 
> What I tried to say is that we can't avoid
> uprobe_mmap()->delayed_uprobe_install() because dlopen() can race with
> _register() too, just like exec.

Right :)

Thanks,
Ravi



Re: [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore)

2018-08-13 Thread Ravi Bangoria
Hi Song,

On 08/13/2018 10:42 PM, Song Liu wrote:
> On Mon, Aug 13, 2018 at 6:17 AM, Oleg Nesterov  wrote:
>> On 08/13, Ravi Bangoria wrote:
>>>
>>>> But damn, process creation (exec) is trivial. We could add a new 
>>>> uprobe_exec()
>>>> hook and avoid delayed_uprobe_install() in uprobe_mmap().
>>>
>>> I'm sorry. I didn't get this.
>>
>> Sorry for confusion...
>>
>> I meant, if only exec*( could race with _register(), we could add another 
>> uprobe
>> hook which updates all (delayed) counters before return to user-mode.
>>
>>>> Afaics, the really problematic case is dlopen() which can race with 
>>>> _register()
>>>> too, right?
>>>
>>> dlopen() should internally use mmap() right? So what is the problem here? 
>>> Can
>>> you please elaborate.
>>
>> What I tried to say is that we can't avoid 
>> uprobe_mmap()->delayed_uprobe_install()
>> because dlopen() can race with _register() too, just like exec.
>>
>> Oleg.
>>
> 
> How about we do delayed_uprobe_install() per file? Say we keep a list
> of delayed_uprobe
> in load_elf_binary(). Then we can install delayed_uprobe after loading
> all sections of the
> file.

I'm not sure if I totally understood the idea. But how can this approach
solve the dlopen() race with _register()?

Rather, making delayed_uprobe_list an mm_struct field seems like a simple and
effective idea to me. The only overhead would be the list_empty(mm->delayed_list)
check.

Please let me know if I misunderstood you.

Thanks,
Ravi



Re: [PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore)

2018-08-14 Thread Ravi Bangoria


> +static int delayed_uprobe_install(struct vm_area_struct *vma)
> +{
> + struct list_head *pos, *q;
> + struct delayed_uprobe *du;
> + unsigned long vaddr;
> + int ret = 0, err = 0;
> +
> + mutex_lock(&delayed_uprobe_lock);
> + list_for_each_safe(pos, q, &delayed_uprobe_list) {
> + du = list_entry(pos, struct delayed_uprobe, list);
> +
> + if (!valid_ref_ctr_vma(du->uprobe, vma))
> + continue;

I think we should compare mm here. I.e.:

if (du->mm != vma->vm_mm || !valid_ref_ctr_vma(du->uprobe, vma))
continue;

Otherwise things can get messed up.

> +
> + vaddr = offset_to_vaddr(vma, du->uprobe->ref_ctr_offset);
> + ret = __update_ref_ctr(vma->vm_mm, vaddr, 1);
> + /* Record an error and continue. */
> + if (ret && !err)
> + err = ret;
> + delayed_uprobe_delete(du);
> + }
> + mutex_unlock(&delayed_uprobe_lock);
> + return err;
> +}



Re: [PATCH] perf record: use unmapped IP for inline callchain cursors

2018-10-02 Thread Ravi Bangoria



On 10/02/2018 09:02 PM, Arnaldo Carvalho de Melo wrote:
> Em Tue, Oct 02, 2018 at 09:39:49AM +0200, Milian Wolff escreveu:
>> Only use the mapped IP to find inline frames, but keep
>> using the unmapped IP for the callchain cursor. This
>> ensures we properly show the unmapped IP when displaying
>> a frame we received via the dso__parse_addr_inlines API
>> for a module which does not contain sufficient debug symbols
>> to show the srcline.
> 
> So, the patch came mangled, I fixed it up, please check, I'm adding Ravi
> to the CC list, so that he can check as well and retest, right Ravi?
> 
> - Arnaldo
> 
> From d9ddee9653d5676130471de9bc688fc7ec7b4fc4 Mon Sep 17 00:00:00 2001
> From: Milian Wolff 
> Date: Tue, 2 Oct 2018 09:39:49 +0200
> Subject: [PATCH 1/1] perf record: use unmapped IP for inline callchain cursors
> 
> Only use the mapped IP to find inline frames, but keep using the unmapped IP
> for the callchain cursor. This ensures we properly show the unmapped IP when
> displaying a frame we received via the dso__parse_addr_inlines API for a
> module which does not contain sufficient debug symbols to show the srcline.
> 
> Before:
> 
>   $ perf record -e cycles:u --call-graph ls
>   $ perf script
>   ...
>   ls 12853  2735.563911:  43354 cycles:u:
>  17878 __GI___tunables_init+0x01d1d63a0118 (/usr/lib/ld-2.28.so)
>  19ee9 _dl_sysdep_start+0x01d1d63a02e9 (/usr/lib/ld-2.28.so)
>   3087 _dl_start+0x01d1d63a0287 (/usr/lib/ld-2.28.so)
>   2007 _start+0x01d1d63a0007 (/usr/lib/ld-2.28.so)
> 
> After:
> 
>   $ perf script
>   ...
>   ls 12853  2735.563911:  43354 cycles:u:
>   7f1714e46878 __GI___tunables_init+0x118 (/usr/lib/ld-2.28.so)
>   7f1714e48ee9 _dl_sysdep_start+0x2e9 (/usr/lib/ld-2.28.so)
>   7f1714e32087 _dl_start+0x287 (/usr/lib/ld-2.28.so)
>   7f1714e31007 _start+0x7 (/usr/lib/ld-2.28.so)

With current perf/urgent:

  $ sudo ./perf record --call-graph=dwarf -e probe_libc:inet_pton -- ping -6 -c 1 ::1
  $ sudo ./perf script
  ping  4109 [011] 767269.031273: probe_libc:inet_pton: (7fff944cb458)
  15b458 __inet_pton+0xd7920008 (inlined)
  10feb3 gaih_inet.constprop.7+0xd7920f43 (/usr/lib64/libc-2.26.
  110a13 __GI_getaddrinfo+0xd7920163 (inlined)
   13c752d6f _init+0xbfb (/usr/bin/ping)
   2371f generic_start_main.isra.0+0xd792013f (/usr/lib64/libc-2
   2391b __libc_start_main+0xd79200bb (/usr/lib64/libc-2.26.so)

With this patch:

  $ sudo ./perf script
  ping  4109 [011] 767269.031273: probe_libc:inet_pton: (7fff944cb458)
7fff944cb458 __inet_pton+0x8 (inlined)
7fff9447feb3 gaih_inet.constprop.7+0xf43 (/usr/lib64/libc-2.26.so)
7fff94480a13 __GI_getaddrinfo+0x163 (inlined)
   13c752d6f _init+0xbfb (/usr/bin/ping)
7fff9439371f generic_start_main.isra.0+0x13f (/usr/lib64/libc-2.26.so)
7fff9439391b __libc_start_main+0xbb (/usr/lib64/libc-2.26.so)

LGTM.

Tested-by: Ravi Bangoria 



Re: [PATCH v3 1/1] perf: Sharing PMU counters across compatible events

2018-09-27 Thread Ravi Bangoria
Hi Song,

On 09/25/2018 03:55 AM, Song Liu wrote:
> This patch tries to enable PMU sharing. To make perf event scheduling
> fast, we use special data structures.
> 
> An array of "struct perf_event_dup" is added to the perf_event_context,
> to remember all the duplicated events under this ctx. All the events
> under this ctx has a "dup_id" pointing to its perf_event_dup. Compatible
> events under the same ctx share the same perf_event_dup. The following
> figure shows a simplified version of the data structure.
> 
>   ctx ->  perf_event_dup -> master
>  ^
>  |
>  perf_event /|
>  |
>  perf_event /
> 

I've not gone through the patch in detail, but I was specifically
interested in scenarios where one perf instance is counting an event
system-wide and thus another perf instance fails to count the same
event for a specific workload, because that event can be counted
on only one hw counter.

Ex: https://lkml.org/lkml/2018/3/12/1011

Seems this patch does not solve this issue. Please let me know if
I'm missing anything.

Thanks,
Ravi



Re: [PATCH v2] perf test: S390 does not support watchpoints in test 22

2018-09-28 Thread Ravi Bangoria
Hi Thomas,

On 09/28/2018 01:13 PM, Thomas Richter wrote:
> diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
> index 54ca7d87236f..e83c15a95c43 100644
> --- a/tools/perf/tests/builtin-test.c
> +++ b/tools/perf/tests/builtin-test.c
> @@ -124,7 +124,7 @@ static struct test generic_tests[] = {
>   .desc = "Watchpoint",
>   .func = test__wp,
>   .subtest = {
> - .skip_if_fail   = false,
> + .skip_if_fail   = true,

Can you instead use ".is_supported"?

Thanks,
Ravi



Re: [PATCH v2] perf test: S390 does not support watchpoints in test 22

2018-09-28 Thread Ravi Bangoria
Hi Thomas,

On 09/28/2018 03:32 PM, Thomas-Mich Richter wrote:
> I can rework the patch to use the is_supported() member function. The
> down side is that the test does not show up in the list of executed tests
> anymore, unless some debug output is enabled:

Which should be fine, because s390 does that in all breakpoint-related tests.
But the more important reason to suggest the .is_supported field is that this
patch causes a regression on x86:

Before:
  $ sudo ./perf test 22
  22: Watchpoint:
  22.1: Read Only Watchpoint: Skip
  22.2: Write Only Watchpoint   : Ok
  22.3: Read / Write Watchpoint : Ok
  22.4: Modify Watchpoint   : Ok

After:
  $ git apply 1.patch
  $ make
  $ sudo ./perf test 22
  22: Watchpoint:
  22.1: Read Only Watchpoint: Skip
  22.2: Write Only Watchpoint   : Skip
  22.3: Read / Write Watchpoint : Skip
  22.4: Modify Watchpoint   : Skip

Intel hw does not support read-only watchpoints. If you use skip_if_fail=true,
all subsequent testcases are skipped as well.

Thanks,
Ravi



Re: [PATCH v3] perf test: S390 does not support watchpoints in test 22

2018-09-28 Thread Ravi Bangoria



On 09/28/2018 04:23 PM, Thomas Richter wrote:
> S390 does not support the perf_event_open system call for
> attribute type PERF_TYPE_BREAKPOINT. This results in test
> failure for test 22:
> 
> [root@s8360046 perf]# ./perf test 22
> 22: Watchpoint:
> 22.1: Read Only Watchpoint: FAILED!
> 22.2: Write Only Watchpoint   : FAILED!
> 22.3: Read / Write Watchpoint : FAILED!
> 22.4: Modify Watchpoint   : FAILED!
> [root@s8360046 perf]#
> 
> Add s390 support to avoid these tests being executed on
> s390 platform:
> 
> [root@s8360046 perf]# ./perf test 22
> [root@s8360046 perf]# ./perf test -v 22
> 22: Watchpoint: Disabled
> [root@s8360046 perf]#
> 
> Signed-off-by: Thomas Richter 

Acked-by: Ravi Bangoria 

Thanks,
Ravi



Re: [PATCH 4/6] perf report: Use the offset address to find inline frames

2018-09-30 Thread Ravi Bangoria
Hi Milian, 

This seems to have introduced a regression:

With acme/perf/urgent:

  $ ./perf record -e cycles:u --call-graph=dwarf ls
  $ ./perf script
  ls 13585 602082.534478:  28032 cycles:u: 
   1f1f4 __GI___tunables_init+0xd3dc00a4 (/usr/lib64/ld-2.26.so)
   20e2b _dl_sysdep_start+0xd3dc04ab (/usr/lib64/ld-2.26.so)
1ca7 _dl_start_final+0xd3dc00f7 (/usr/lib64/ld-2.26.so)
29b3 _dl_start+0xd3dc0553 (/usr/lib64/ld-2.26.so)
1437 _start+0xd3dc0017 (/usr/lib64/ld-2.26.so)

After reverting this patch:

  $ git revert d9c910d5f8c3a1858f115dc9d3b157df32da70a3
  [a/perf/urgent 72cada4e6b30] Revert "perf report: Use the offset address to find inline frames"

  $ ./perf script
  ls 13585 602082.534478:  28032 cycles:u: 
7fff9613f1f4 __GI___tunables_init+0xa4 (/usr/lib64/ld-2.26.so)
7fff96140e2b _dl_sysdep_start+0x4ab (/usr/lib64/ld-2.26.so)
7fff96121ca7 _dl_start_final+0xf7 (/usr/lib64/ld-2.26.so)
7fff961229b3 _dl_start+0x553 (/usr/lib64/ld-2.26.so)
7fff96121437 _start+0x17 (/usr/lib64/ld-2.26.so)

Thanks,
Ravi


On 09/28/2018 05:55 PM, Arnaldo Carvalho de Melo wrote:
> From: Milian Wolff 
> 
> To correctly find inlined frames, we have to use the file offset instead
> of the virtual memory address. This was already fixed for displaying
> srcline information while displaying in commit 2a9d5050dc84fa20 ("perf
> script: Show correct offsets for DWARF-based unwinding"). We just need
> to use the same corrected address also when trying to find inline
> frames.
> 
> This is another follow-up to commit 19610184693c ("perf script: Show
> virtual addresses instead of offsets").
> 
> Signed-off-by: Milian Wolff 
> Acked-by: Jiri Olsa 
> Cc: Jin Yao 
> Cc: Namhyung Kim 
> Cc: Sandipan Das 
> Fixes: 19610184693c ("perf script: Show virtual addresses instead of offsets")
> Link: http://lkml.kernel.org/r/20180926135207.30263-2-milian.wo...@kdab.com
> Signed-off-by: Arnaldo Carvalho de Melo 
> ---
>  tools/perf/util/machine.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index 0cb4f8bf3ca7..73a651f10a0f 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -2317,9 +2317,6 @@ static int unwind_entry(struct unwind_entry *entry, void *arg)
>   if (symbol_conf.hide_unresolved && entry->sym == NULL)
>   return 0;
>  
> - if (append_inlines(cursor, entry->map, entry->sym, entry->ip) == 0)
> - return 0;
> -
>   /*
>* Convert entry->ip from a virtual address to an offset in
>* its corresponding binary.
> @@ -2327,6 +2324,9 @@ static int unwind_entry(struct unwind_entry *entry, void *arg)
>   if (entry->map)
>   addr = map__map_ip(entry->map, entry->ip);
>  
> + if (append_inlines(cursor, entry->map, entry->sym, addr) == 0)
> + return 0;
> +
>   srcline = callchain_srcline(entry->map, entry->sym, addr);
>   return callchain_cursor_append(cursor, entry->ip,
>  entry->map, entry->sym,
> 



Re: [PATCH v7 3/6] Uprobes: Support SDT markers having reference count (semaphore)

2018-08-06 Thread Ravi Bangoria
Hi Oleg,

Sorry for bit late reply.


On 08/03/2018 04:54 PM, Oleg Nesterov wrote:
> Hi Ravi,
> 
> I was going to give up and ack this series, but it seems I noticed
> a bug...
> 
> On 07/31, Ravi Bangoria wrote:
>>
>> +static int delayed_uprobe_add(struct uprobe *uprobe, struct mm_struct *mm)
>> +{
>> +struct delayed_uprobe *du;
>> +
>> +if (delayed_uprobe_check(uprobe, mm))
>> +return 0;
>> +
>> +du  = kzalloc(sizeof(*du), GFP_KERNEL);
>> +if (!du)
>> +return -ENOMEM;
>> +
>> +du->uprobe = uprobe;
>> +du->mm = mm;
> 
> I am surprised I didn't notice this before...
> 
> So
>   du->mm = mm;
> 
> is fine, mm can't go away, uprobe_clear_state() does 
> delayed_uprobe_remove(NULL,mm).
> 
> But
>   du->uprobe = uprobe;
> 
> doesn't look right, uprobe can go away and it can be freed, its memory can be
> reused. We can't rely on remove_breakpoint(),


I'm sorry. I didn't get this. How can uprobe go away without calling:

uprobe_unregister()
  -> register_for_each_vma()
     -> remove_breakpoint()

And remove_breakpoint() will get called only when the last uprobe consumer is
going away _for that mm_. So, we can rely on remove_breakpoint() to remove
{uprobe, mm} from delayed_uprobe_list. Am I missing anything? It would be even
better if you could give me an example scenario.


> the application can unmap the probed page/vma.
> Yes we do not care about the application in this case, say, the next
> uprobe_mmap() can wrongly increment the counter, we do not care although this
> can lead to hard-to-debug problems. But, if nothing else, the kernel can crash
> if the freed memory is unmapped. So I think put_uprobe() should do
> delayed_uprobe_remove(uprobe, NULL) before kfree() and delayed_uprobe_remove()
> should be updated to handle the mm==NULL case.
> 
> Also. delayed_uprobe_add() should check the list and avoid duplicates.
> Otherwise the trivial
> 
>   for (;;)
>   munmap(mmap(uprobed_file));
> 
> will eat the memory until uprobe is unregistered.


I'm already calling delayed_uprobe_check(uprobe, mm) from delayed_uprobe_add(),
so we don't add the same {uprobe, mm} pair to delayed_uprobe_list multiple
times.

Is it the same check you are asking me to add, or something else?


> 
> 
>> +static bool valid_ref_ctr_vma(struct uprobe *uprobe,
>> +  struct vm_area_struct *vma)
>> +{
>> +unsigned long vaddr = offset_to_vaddr(vma, uprobe->ref_ctr_offset);
>> +
>> +return uprobe->ref_ctr_offset &&
>> +vma->vm_file &&
>> +file_inode(vma->vm_file) == uprobe->inode &&
>> +vma->vm_flags & VM_WRITE &&
>> +!(vma->vm_flags & VM_SHARED) &&
> 
>   vma->vm_flags & (VM_WRITE|VM_SHARED) == VM_WRITE &&
> 
> looks a bit better to me, but I won't insist.

Sure. I'll change it.

> 
>> +static int delayed_uprobe_install(struct vm_area_struct *vma)
>> +{
>> +struct list_head *pos, *q;
>> +struct delayed_uprobe *du;
>> +unsigned long vaddr;
>> +int ret = 0, err = 0;
>> +
>> +mutex_lock(&delayed_uprobe_lock);
>> +list_for_each_safe(pos, q, &delayed_uprobe_list) {
>> +du = list_entry(pos, struct delayed_uprobe, list);
>> +
>> +if (!valid_ref_ctr_vma(du->uprobe, vma))
>> +continue;
>> +
>> +vaddr = offset_to_vaddr(vma, du->uprobe->ref_ctr_offset);
>> +ret = __update_ref_ctr(vma->vm_mm, vaddr, 1);
>> +/* Record an error and continue. */
>> +err = ret & !err ? ret : err;
> 
> I try to avoid the cosmetic nits, but I simply can't look at this line ;)
> 
>   if (ret && !err)
>   err = ret;

This is neat. Will replace it.

> 
>> @@ -1072,7 +1281,14 @@ int uprobe_mmap(struct vm_area_struct *vma)
>>  struct uprobe *uprobe, *u;
>>  struct inode *inode;
>>
>> -if (no_uprobe_events() || !valid_vma(vma, true))
>> +if (no_uprobe_events())
>> +return 0;
>> +
>> +if (vma->vm_flags & VM_WRITE &&
>> +test_bit(MMF_HAS_UPROBES, >vm_mm->flags))
>> +delayed_uprobe_install(vma);
> 
> OK, so you also added the VM_WRITE check and I agree. But then I think we
> should also check VM_SHARED, just like valid_ref_ctr_vma() does?

Right. I'll add a check here.

Thanks for reviewing :)



Re: [PATCH v7 3/6] Uprobes: Support SDT markers having reference count (semaphore)

2018-08-07 Thread Ravi Bangoria
Hi Oleg,

>> I'm sorry. I didn't get this. How can uprobe go away without calling
>> uprobe_unregister()
>> -> register_for_each_vma()
>>-> remove_breakpoint()
>> And remove_breakpoint() will get called
> 
> assuming that _unregister() will find the same vma with the probed insn. But
> as I said, the application can munmap the probed page/vma.

Got it now :)

Thanks.



[PATCH v8 4/6] Uprobes/sdt: Prevent multiple reference counter for same uprobe

2018-08-08 Thread Ravi Bangoria
We assume there is only one reference counter for one uprobe. Don't
allow the user to register multiple uprobes having the same
inode+offset but different reference counters.

Signed-off-by: Ravi Bangoria 
---
 kernel/events/uprobes.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 61b0481ef417..492a0e005b4d 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -689,6 +689,12 @@ static struct uprobe *alloc_uprobe(struct inode *inode, 
loff_t offset,
cur_uprobe = insert_uprobe(uprobe);
/* a uprobe exists for this inode:offset combination */
if (cur_uprobe) {
+   if (cur_uprobe->ref_ctr_offset != uprobe->ref_ctr_offset) {
+   pr_warn("Reference counter offset mismatch.\n");
+   put_uprobe(cur_uprobe);
+   kfree(uprobe);
+   return ERR_PTR(-EINVAL);
+   }
kfree(uprobe);
uprobe = cur_uprobe;
}
@@ -1103,6 +1109,9 @@ static int __uprobe_register(struct inode *inode, loff_t 
offset,
uprobe = alloc_uprobe(inode, offset, ref_ctr_offset);
if (!uprobe)
return -ENOMEM;
+   if (IS_ERR(uprobe))
+   return PTR_ERR(uprobe);
+
/*
 * We can race with uprobe_unregister()->delete_uprobe().
 * Check uprobe_is_active() and retry if it is false.
-- 
2.14.4



[PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore)

2018-08-08 Thread Ravi Bangoria
Userspace Statically Defined Tracepoints[1] are dtrace style markers
inside userspace applications. Applications like PostgreSQL, MySQL,
Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc
have these markers embedded in them. These markers are added by developers
at important places in the code. Each marker source expands to a single
nop instruction in the compiled code, but there may be additional
overhead for computing the marker arguments, which expands to a couple of
instructions. If that overhead is significant, its execution can be
omitted by a runtime if() condition when no one is tracing the marker:

if (reference_counter > 0) {
Execute marker instructions;
}

The default value of the reference counter is 0. A tracer has to increment
the reference counter before tracing a marker and decrement it when done
with the tracing.

Implement the reference counter logic in core uprobe. User will be
able to use it from trace_uprobe as well as from kernel module. New
trace_uprobe definition with reference counter will now be:

<path>:<offset>[(ref_ctr_offset)]

where ref_ctr_offset is an optional field. For kernel module, new
variant of uprobe_register() has been introduced:

uprobe_register_refctr(inode, offset, ref_ctr_offset, consumer)

No new variant for uprobe_unregister() because it's assumed to have
only one reference counter for one uprobe.

[1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation

Note: 'reference counter' is called a 'semaphore' in the original Dtrace
(and Systemtap, bcc and even ELF) documentation and code. But the
term 'semaphore' is misleading in this context. This is just a counter
used to hold the number of tracers tracing a marker. It is not really
used for any synchronization. So we are referring to it as a 'reference
counter' in kernel / perf code.

Signed-off-by: Ravi Bangoria 
Reviewed-by: Masami Hiramatsu 
[Only trace_uprobe.c]
---
 include/linux/uprobes.h |   5 +
 kernel/events/uprobes.c | 246 ++--
 kernel/trace/trace.c|   2 +-
 kernel/trace/trace_uprobe.c |  38 ++-
 4 files changed, 280 insertions(+), 11 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index bb9d2084af03..103a48a48872 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -123,6 +123,7 @@ extern unsigned long uprobe_get_swbp_addr(struct pt_regs 
*regs);
 extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
 extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct 
*mm, unsigned long vaddr, uprobe_opcode_t);
 extern int uprobe_register(struct inode *inode, loff_t offset, struct 
uprobe_consumer *uc);
+extern int uprobe_register_refctr(struct inode *inode, loff_t offset, loff_t 
ref_ctr_offset, struct uprobe_consumer *uc);
 extern int uprobe_apply(struct inode *inode, loff_t offset, struct 
uprobe_consumer *uc, bool);
 extern void uprobe_unregister(struct inode *inode, loff_t offset, struct 
uprobe_consumer *uc);
 extern int uprobe_mmap(struct vm_area_struct *vma);
@@ -160,6 +161,10 @@ uprobe_register(struct inode *inode, loff_t offset, struct 
uprobe_consumer *uc)
 {
return -ENOSYS;
 }
+static inline int uprobe_register_refctr(struct inode *inode, loff_t offset, 
loff_t ref_ctr_offset, struct uprobe_consumer *uc)
+{
+   return -ENOSYS;
+}
 static inline int
 uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, 
bool add)
 {
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index c0418ba52ba8..61b0481ef417 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -73,6 +73,7 @@ struct uprobe {
struct uprobe_consumer  *consumers;
struct inode*inode; /* Also hold a ref to inode */
loff_t  offset;
+   loff_t  ref_ctr_offset;
unsigned long   flags;
 
/*
@@ -88,6 +89,15 @@ struct uprobe {
struct arch_uprobe  arch;
 };
 
+struct delayed_uprobe {
+   struct list_head list;
+   struct uprobe *uprobe;
+   struct mm_struct *mm;
+};
+
+static DEFINE_MUTEX(delayed_uprobe_lock);
+static LIST_HEAD(delayed_uprobe_list);
+
 /*
  * Execute out of line area: anonymous executable mapping installed
  * by the probed task to execute the copy of the original instruction
@@ -282,6 +292,157 @@ static int verify_opcode(struct page *page, unsigned long 
vaddr, uprobe_opcode_t
return 1;
 }
 
+static struct delayed_uprobe *
+delayed_uprobe_check(struct uprobe *uprobe, struct mm_struct *mm)
+{
+   struct delayed_uprobe *du;
+
+   list_for_each_entry(du, _uprobe_list, list)
+   if (du->uprobe == uprobe && du->mm == mm)
+   return du;
+   return NULL;
+}
+
+static int delayed_uprobe_add(struct uprobe *uprobe, struct mm_struct *mm)
+{
+   struct delayed_uprobe *du;
+
+   if (delayed_uprobe_check(uprobe, mm))
+  

[PATCH v8 1/6] Uprobes: Simplify uprobe_register() body

2018-08-08 Thread Ravi Bangoria
Simplify uprobe_register() function body and let __uprobe_register()
handle everything. Also move dependency functions around to fix build
failures.

Signed-off-by: Ravi Bangoria 
---
 kernel/events/uprobes.c | 69 ++---
 1 file changed, 36 insertions(+), 33 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index ccc579a7d32e..471eac896635 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -840,13 +840,8 @@ register_for_each_vma(struct uprobe *uprobe, struct 
uprobe_consumer *new)
return err;
 }
 
-static int __uprobe_register(struct uprobe *uprobe, struct uprobe_consumer *uc)
-{
-   consumer_add(uprobe, uc);
-   return register_for_each_vma(uprobe, uc);
-}
-
-static void __uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer 
*uc)
+static void
+__uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
 {
int err;
 
@@ -860,24 +855,46 @@ static void __uprobe_unregister(struct uprobe *uprobe, 
struct uprobe_consumer *u
 }
 
 /*
- * uprobe_register - register a probe
+ * uprobe_unregister - unregister a already registered probe.
+ * @inode: the file in which the probe has to be removed.
+ * @offset: offset from the start of the file.
+ * @uc: identify which probe if multiple probes are colocated.
+ */
+void uprobe_unregister(struct inode *inode, loff_t offset, struct 
uprobe_consumer *uc)
+{
+   struct uprobe *uprobe;
+
+   uprobe = find_uprobe(inode, offset);
+   if (WARN_ON(!uprobe))
+   return;
+
+   down_write(&uprobe->register_rwsem);
+   __uprobe_unregister(uprobe, uc);
+   up_write(&uprobe->register_rwsem);
+   put_uprobe(uprobe);
+}
+EXPORT_SYMBOL_GPL(uprobe_unregister);
+
+/*
+ * __uprobe_register - register a probe
  * @inode: the file in which the probe has to be placed.
  * @offset: offset from the start of the file.
  * @uc: information on howto handle the probe..
  *
- * Apart from the access refcount, uprobe_register() takes a creation
+ * Apart from the access refcount, __uprobe_register() takes a creation
  * refcount (thro alloc_uprobe) if and only if this @uprobe is getting
  * inserted into the rbtree (i.e first consumer for a @inode:@offset
  * tuple).  Creation refcount stops uprobe_unregister from freeing the
  * @uprobe even before the register operation is complete. Creation
  * refcount is released when the last @uc for the @uprobe
- * unregisters. Caller of uprobe_register() is required to keep @inode
+ * unregisters. Caller of __uprobe_register() is required to keep @inode
  * (and the containing mount) referenced.
  *
  * Return errno if it cannot successully install probes
  * else return 0 (success)
  */
-int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
+static int __uprobe_register(struct inode *inode, loff_t offset,
+struct uprobe_consumer *uc)
 {
struct uprobe *uprobe;
int ret;
@@ -904,7 +921,8 @@ int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *
	down_write(&uprobe->register_rwsem);
ret = -EAGAIN;
if (likely(uprobe_is_active(uprobe))) {
-   ret = __uprobe_register(uprobe, uc);
+   consumer_add(uprobe, uc);
+   ret = register_for_each_vma(uprobe, uc);
if (ret)
__uprobe_unregister(uprobe, uc);
}
@@ -915,6 +933,12 @@ int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *
goto retry;
return ret;
 }
+
+int uprobe_register(struct inode *inode, loff_t offset,
+   struct uprobe_consumer *uc)
+{
+   return __uprobe_register(inode, offset, uc);
+}
 EXPORT_SYMBOL_GPL(uprobe_register);
 
 /*
@@ -946,27 +970,6 @@ int uprobe_apply(struct inode *inode, loff_t offset,
return ret;
 }
 
-/*
- * uprobe_unregister - unregister a already registered probe.
- * @inode: the file in which the probe has to be removed.
- * @offset: offset from the start of the file.
- * @uc: identify which probe if multiple probes are colocated.
- */
-void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
-{
-   struct uprobe *uprobe;
-
-   uprobe = find_uprobe(inode, offset);
-   if (WARN_ON(!uprobe))
-   return;
-
-	down_write(&uprobe->register_rwsem);
-	__uprobe_unregister(uprobe, uc);
-	up_write(&uprobe->register_rwsem);
-   put_uprobe(uprobe);
-}
-EXPORT_SYMBOL_GPL(uprobe_unregister);
-
 static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm)
 {
struct vm_area_struct *vma;
-- 
2.14.4



[PATCH v8 5/6] trace_uprobe/sdt: Prevent multiple reference counter for same uprobe

2018-08-08 Thread Ravi Bangoria
We assume there is only one reference counter for one uprobe.
Don't allow the user to add multiple trace_uprobe entries having
the same inode+offset but a different reference counter.

Ex,
  # echo "p:sdt_tick/loop2 /home/ravi/tick:0x6e4(0x10036)" > uprobe_events
  # echo "p:sdt_tick/loop2_1 /home/ravi/tick:0x6e4(0xf)" >> uprobe_events
  bash: echo: write error: Invalid argument

  # dmesg
  trace_kprobe: Reference counter offset mismatch.

There is one exception though:
When the user is trying to replace the old entry with a new
one, we allow this if the new entry does not conflict with
any other existing entries.

Signed-off-by: Ravi Bangoria 
---
 kernel/trace/trace_uprobe.c | 37 +++--
 1 file changed, 35 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index bf2be098eb08..be64d943d7ea 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -324,6 +324,35 @@ static int unregister_trace_uprobe(struct trace_uprobe *tu)
return 0;
 }
 
+/*
+ * Uprobe with multiple reference counter is not allowed. i.e.
+ * If inode and offset matches, reference counter offset *must*
+ * match as well. Though, there is one exception: If user is
+ * replacing old trace_uprobe with new one(same group/event),
+ * then we allow same uprobe with new reference counter as far
+ * as the new one does not conflict with any other existing
+ * ones.
+ */
+static struct trace_uprobe *find_old_trace_uprobe(struct trace_uprobe *new)
+{
+   struct trace_uprobe *tmp, *old = NULL;
+   struct inode *new_inode = d_real_inode(new->path.dentry);
+
+	old = find_probe_event(trace_event_name(&new->tp.call),
+   new->tp.call.class->system);
+
+	list_for_each_entry(tmp, &uprobe_list, list) {
+   if ((old ? old != tmp : true) &&
+   new_inode == d_real_inode(tmp->path.dentry) &&
+   new->offset == tmp->offset &&
+   new->ref_ctr_offset != tmp->ref_ctr_offset) {
+   pr_warn("Reference counter offset mismatch.");
+   return ERR_PTR(-EINVAL);
+   }
+   }
+   return old;
+}
+
 /* Register a trace_uprobe and probe_event */
 static int register_trace_uprobe(struct trace_uprobe *tu)
 {
@@ -333,8 +362,12 @@ static int register_trace_uprobe(struct trace_uprobe *tu)
	mutex_lock(&uprobe_lock);
 
/* register as an event */
-	old_tu = find_probe_event(trace_event_name(&tu->tp.call),
-			tu->tp.call.class->system);
+   old_tu = find_old_trace_uprobe(tu);
+   if (IS_ERR(old_tu)) {
+   ret = PTR_ERR(old_tu);
+   goto end;
+   }
+
if (old_tu) {
/* delete old event */
ret = unregister_trace_uprobe(old_tu);
-- 
2.14.4



[PATCH v8 2/6] Uprobe: Additional argument arch_uprobe to uprobe_write_opcode()

2018-08-08 Thread Ravi Bangoria
Add an additional argument 'arch_uprobe' to uprobe_write_opcode().
We need this in a later set of patches.

Signed-off-by: Ravi Bangoria 
---
 arch/arm/probes/uprobes/core.c | 2 +-
 arch/mips/kernel/uprobes.c | 2 +-
 include/linux/uprobes.h| 2 +-
 kernel/events/uprobes.c| 9 +
 4 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/arm/probes/uprobes/core.c b/arch/arm/probes/uprobes/core.c
index d1329f1ba4e4..bf992264060e 100644
--- a/arch/arm/probes/uprobes/core.c
+++ b/arch/arm/probes/uprobes/core.c
@@ -32,7 +32,7 @@ bool is_swbp_insn(uprobe_opcode_t *insn)
 int set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm,
 unsigned long vaddr)
 {
-   return uprobe_write_opcode(mm, vaddr,
+   return uprobe_write_opcode(auprobe, mm, vaddr,
   __opcode_to_mem_arm(auprobe->bpinsn));
 }
 
diff --git a/arch/mips/kernel/uprobes.c b/arch/mips/kernel/uprobes.c
index f7a0645ccb82..4aaff3b3175c 100644
--- a/arch/mips/kernel/uprobes.c
+++ b/arch/mips/kernel/uprobes.c
@@ -224,7 +224,7 @@ unsigned long arch_uretprobe_hijack_return_addr(
 int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm,
unsigned long vaddr)
 {
-   return uprobe_write_opcode(mm, vaddr, UPROBE_SWBP_INSN);
+   return uprobe_write_opcode(auprobe, mm, vaddr, UPROBE_SWBP_INSN);
 }
 
 void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 0a294e950df8..bb9d2084af03 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -121,7 +121,7 @@ extern bool is_swbp_insn(uprobe_opcode_t *insn);
 extern bool is_trap_insn(uprobe_opcode_t *insn);
 extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs);
 extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
-extern int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t);
+extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t);
 extern int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
 extern int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool);
 extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 471eac896635..c0418ba52ba8 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -299,8 +299,8 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t
  * Called with mm->mmap_sem held for write.
  * Return 0 (success) or a negative errno.
  */
-int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr,
-   uprobe_opcode_t opcode)
+int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
+   unsigned long vaddr, uprobe_opcode_t opcode)
 {
struct page *old_page, *new_page;
struct vm_area_struct *vma;
@@ -351,7 +351,7 @@ int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr,
  */
 int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr)
 {
-   return uprobe_write_opcode(mm, vaddr, UPROBE_SWBP_INSN);
+   return uprobe_write_opcode(auprobe, mm, vaddr, UPROBE_SWBP_INSN);
 }
 
 /**
@@ -366,7 +366,8 @@ int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned
 int __weak
 set_orig_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr)
 {
-	return uprobe_write_opcode(mm, vaddr, *(uprobe_opcode_t *)&auprobe->insn);
+	return uprobe_write_opcode(auprobe, mm, vaddr,
+			*(uprobe_opcode_t *)&auprobe->insn);
 }
 
 static struct uprobe *get_uprobe(struct uprobe *uprobe)
-- 
2.14.4



[PATCH v8 6/6] perf probe: Support SDT markers having reference counter (semaphore)

2018-08-08 Thread Ravi Bangoria
With this, perf buildid-cache will save SDT markers with a reference
counter in the probe cache. perf probe will be able to probe markers
having a reference counter. Ex,

  # readelf -n /tmp/tick | grep -A1 loop2
Name: loop2
... Semaphore: 0x10020036

  # ./perf buildid-cache --add /tmp/tick
  # ./perf probe sdt_tick:loop2
  # ./perf stat -e sdt_tick:loop2 /tmp/tick
hi: 0
hi: 1
hi: 2
^C
 Performance counter stats for '/tmp/tick':
 3  sdt_tick:loop2
   2.561851452 seconds time elapsed

Signed-off-by: Ravi Bangoria 
Acked-by: Masami Hiramatsu 
Acked-by: Srikar Dronamraju 
---
 tools/perf/util/probe-event.c | 39 
 tools/perf/util/probe-event.h |  1 +
 tools/perf/util/probe-file.c  | 34 ++--
 tools/perf/util/probe-file.h  |  1 +
 tools/perf/util/symbol-elf.c  | 46 ---
 tools/perf/util/symbol.h  |  7 +++
 6 files changed, 106 insertions(+), 22 deletions(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index f119eb628dbb..e86f8be89157 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -1819,6 +1819,12 @@ int parse_probe_trace_command(const char *cmd, struct probe_trace_event *tev)
tp->offset = strtoul(fmt2_str, NULL, 10);
}
 
+   if (tev->uprobes) {
+   fmt2_str = strchr(p, '(');
+   if (fmt2_str)
+   tp->ref_ctr_offset = strtoul(fmt2_str + 1, NULL, 0);
+   }
+
tev->nargs = argc - 2;
tev->args = zalloc(sizeof(struct probe_trace_arg) * tev->nargs);
if (tev->args == NULL) {
@@ -2012,6 +2018,22 @@ static int synthesize_probe_trace_arg(struct probe_trace_arg *arg,
return err;
 }
 
+static int
+synthesize_uprobe_trace_def(struct probe_trace_event *tev, struct strbuf *buf)
+{
+	struct probe_trace_point *tp = &tev->point;
+   int err;
+
+   err = strbuf_addf(buf, "%s:0x%lx", tp->module, tp->address);
+
+   if (err >= 0 && tp->ref_ctr_offset) {
+   if (!uprobe_ref_ctr_is_supported())
+   return -1;
+   err = strbuf_addf(buf, "(0x%lx)", tp->ref_ctr_offset);
+   }
+   return err >= 0 ? 0 : -1;
+}
+
 char *synthesize_probe_trace_command(struct probe_trace_event *tev)
 {
	struct probe_trace_point *tp = &tev->point;
@@ -2041,15 +2063,17 @@ char *synthesize_probe_trace_command(struct probe_trace_event *tev)
}
 
/* Use the tp->address for uprobes */
-	if (tev->uprobes)
-		err = strbuf_addf(&buf, "%s:0x%lx", tp->module, tp->address);
-	else if (!strncmp(tp->symbol, "0x", 2))
+	if (tev->uprobes) {
+		err = synthesize_uprobe_trace_def(tev, &buf);
+	} else if (!strncmp(tp->symbol, "0x", 2)) {
 		/* Absolute address. See try_to_find_absolute_address() */
 		err = strbuf_addf(&buf, "%s%s0x%lx", tp->module ?: "",
 				  tp->module ? ":" : "", tp->address);
-	else
+	} else {
 		err = strbuf_addf(&buf, "%s%s%s+%lu", tp->module ?: "",
 				tp->module ? ":" : "", tp->symbol, tp->offset);
+   }
+
if (err)
goto error;
 
@@ -2633,6 +2657,13 @@ static void warn_uprobe_event_compat(struct probe_trace_event *tev)
 {
int i;
char *buf = synthesize_probe_trace_command(tev);
+	struct probe_trace_point *tp = &tev->point;
+
+   if (tp->ref_ctr_offset && !uprobe_ref_ctr_is_supported()) {
+   pr_warning("A semaphore is associated with %s:%s and "
+  "seems your kernel doesn't support it.\n",
+  tev->group, tev->event);
+   }
 
/* Old uprobe event doesn't support memory dereference */
if (!tev->uprobes || tev->nargs == 0 || !buf)
diff --git a/tools/perf/util/probe-event.h b/tools/perf/util/probe-event.h
index 45b14f020558..15a98c3a2a2f 100644
--- a/tools/perf/util/probe-event.h
+++ b/tools/perf/util/probe-event.h
@@ -27,6 +27,7 @@ struct probe_trace_point {
char*symbol;/* Base symbol */
char*module;/* Module name */
unsigned long   offset; /* Offset from symbol */
+   unsigned long   ref_ctr_offset; /* SDT reference counter offset */
unsigned long   address;/* Actual address of the trace point */
boolretprobe;   /* Return probe flag */
 };
diff --git a/tools/perf/util/probe-file.c b/tools/perf/util/probe-file.c
index b76088fadf3d..aac7817d9e14 100644
--- a/tools/perf/util/probe-file.c

[PATCH v8 0/6] Uprobes: Support SDT markers having reference count (semaphore)

2018-08-08 Thread Ravi Bangoria
v8 changes:
 - Call delayed_uprobe_remove(uprobe, NULL) from put_uprobe() as well.
 - Other minor optimizations.

v7: https://lkml.org/lkml/2018/7/31/20


Description:
Userspace Statically Defined Tracepoints[1] are dtrace style markers
inside userspace applications. Applications like PostgreSQL, MySQL,
Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc
have these markers embedded in them. These markers are added by developers
at important places in the code. Each marker source expands to a single
nop instruction in the compiled code, but there may be additional
overhead for computing the marker arguments, which expands to a couple of
instructions. When that overhead matters, execution of the marker can be
omitted by a runtime if() condition when no one is tracing the marker:

if (reference_counter > 0) {
Execute marker instructions;
}

The default value of the reference counter is 0. A tracer has to increment
the reference counter before tracing a marker and decrement it when
done with the tracing.

Currently, the perf tool has limited support for SDT markers, i.e. it
cannot trace markers surrounded by a reference counter. Also, it's
not easy to add reference counter logic in a userspace tool like perf,
so the basic idea for this patchset is to add the reference counter
logic in the uprobe infrastructure. Ex,[2]

  # cat tick.c
... 
for (i = 0; i < 100; i++) {
DTRACE_PROBE1(tick, loop1, i);
if (TICK_LOOP2_ENABLED()) {
DTRACE_PROBE1(tick, loop2, i); 
}
printf("hi: %d\n", i); 
sleep(1);
}   
... 

Here tick:loop1 is a marker without a reference counter, whereas tick:loop2
is surrounded by the reference counter condition.

  # perf buildid-cache --add /tmp/tick
  # perf probe sdt_tick:loop1
  # perf probe sdt_tick:loop2

  # perf stat -e sdt_tick:loop1,sdt_tick:loop2 -- /tmp/tick
  hi: 0
  hi: 1
  hi: 2
  ^C
  Performance counter stats for '/tmp/tick':
 3  sdt_tick:loop1
 0  sdt_tick:loop2
 2.747086086 seconds time elapsed

Perf failed to record data for tick:loop2. Same experiment with this
patch series:

  # ./perf buildid-cache --add /tmp/tick
  # ./perf probe sdt_tick:loop2
  # ./perf stat -e sdt_tick:loop2 /tmp/tick
hi: 0
hi: 1
hi: 2
^C  
 Performance counter stats for '/tmp/tick':
 3  sdt_tick:loop2
   2.561851452 seconds time elapsed

[1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation


Ravi Bangoria (6):
  Uprobes: Simplify uprobe_register() body
  Uprobe: Additional argument arch_uprobe to uprobe_write_opcode()
  Uprobes: Support SDT markers having reference count (semaphore)
  Uprobes/sdt: Prevent multiple reference counter for same uprobe
  trace_uprobe/sdt: Prevent multiple reference counter for same uprobe
  perf probe: Support SDT markers having reference counter (semaphore)

 arch/arm/probes/uprobes/core.c |   2 +-
 arch/mips/kernel/uprobes.c |   2 +-
 include/linux/uprobes.h|   7 +-
 kernel/events/uprobes.c| 329 +++--
 kernel/trace/trace.c   |   2 +-
 kernel/trace/trace_uprobe.c|  75 +-
 tools/perf/util/probe-event.c  |  39 -
 tools/perf/util/probe-event.h  |   1 +
 tools/perf/util/probe-file.c   |  34 -
 tools/perf/util/probe-file.h   |   1 +
 tools/perf/util/symbol-elf.c   |  46 --
 tools/perf/util/symbol.h   |   7 +
 12 files changed, 472 insertions(+), 73 deletions(-)

-- 
2.14.4



[PATCH 2/2] powerpc/perf: Fix mmcra corruption by bhrb_filter

2019-05-10 Thread Ravi Bangoria
Consider a scenario where user creates two events:

  1st event:
attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
attr.branch_sample_type = PERF_SAMPLE_BRANCH_ANY;
fd = perf_event_open(attr, 0, 1, -1, 0);

  This sets cpuhw->bhrb_filter to 0 and returns valid fd.

  2nd event:
attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
attr.branch_sample_type = PERF_SAMPLE_BRANCH_CALL;
fd = perf_event_open(attr, 0, 1, -1, 0);

  It overrides cpuhw->bhrb_filter to -1 and returns with error.

Now if power_pmu_enable() gets called by any path other than
power_pmu_add(), ppmu->config_bhrb(-1) will set mmcra to -1.

Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/perf/core-book3s.c | 6 --
 arch/powerpc/perf/power8-pmu.c  | 3 +++
 arch/powerpc/perf/power9-pmu.c  | 3 +++
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index b0723002a396..8eb5dc5df62b 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -1846,6 +1846,7 @@ static int power_pmu_event_init(struct perf_event *event)
int n;
int err;
struct cpu_hw_events *cpuhw;
+   u64 bhrb_filter;
 
if (!ppmu)
return -ENOENT;
@@ -1951,13 +1952,14 @@ static int power_pmu_event_init(struct perf_event *event)
err = power_check_constraints(cpuhw, events, cflags, n + 1);
 
if (has_branch_stack(event)) {
-   cpuhw->bhrb_filter = ppmu->bhrb_filter_map(
+   bhrb_filter = ppmu->bhrb_filter_map(
event->attr.branch_sample_type);
 
-   if (cpuhw->bhrb_filter == -1) {
+   if (bhrb_filter == -1) {
put_cpu_var(cpu_hw_events);
return -EOPNOTSUPP;
}
+   cpuhw->bhrb_filter = bhrb_filter;
}
 
put_cpu_var(cpu_hw_events);
diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index d12a2db26353..d10feef93b6b 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -29,6 +29,7 @@ enum {
 #define	POWER8_MMCRA_IFM1		0x0000000040000000UL
 #define	POWER8_MMCRA_IFM2		0x0000000080000000UL
 #define	POWER8_MMCRA_IFM3		0x00000000C0000000UL
+#define	POWER8_MMCRA_BHRB_MASK		0x00000000C0000000UL
 
 /*
  * Raw event encoding for PowerISA v2.07 (Power8):
@@ -243,6 +244,8 @@ static u64 power8_bhrb_filter_map(u64 branch_sample_type)
 
 static void power8_config_bhrb(u64 pmu_bhrb_filter)
 {
+   pmu_bhrb_filter &= POWER8_MMCRA_BHRB_MASK;
+
/* Enable BHRB filter in PMU */
mtspr(SPRN_MMCRA, (mfspr(SPRN_MMCRA) | pmu_bhrb_filter));
 }
diff --git a/arch/powerpc/perf/power9-pmu.c b/arch/powerpc/perf/power9-pmu.c
index 030544e35959..f3987915cadc 100644
--- a/arch/powerpc/perf/power9-pmu.c
+++ b/arch/powerpc/perf/power9-pmu.c
@@ -92,6 +92,7 @@ enum {
 #define POWER9_MMCRA_IFM1		0x0000000040000000UL
 #define POWER9_MMCRA_IFM2		0x0000000080000000UL
 #define POWER9_MMCRA_IFM3		0x00000000C0000000UL
+#define POWER9_MMCRA_BHRB_MASK		0x00000000C0000000UL
 
 /* Nasty Power9 specific hack */
 #define PVR_POWER9_CUMULUS		0x00002000
@@ -300,6 +301,8 @@ static u64 power9_bhrb_filter_map(u64 branch_sample_type)
 
 static void power9_config_bhrb(u64 pmu_bhrb_filter)
 {
+   pmu_bhrb_filter &= POWER9_MMCRA_BHRB_MASK;
+
/* Enable BHRB filter in PMU */
mtspr(SPRN_MMCRA, (mfspr(SPRN_MMCRA) | pmu_bhrb_filter));
 }
-- 
2.20.1



[PATCH 1/2] perf ioctl: Add check for the sample_period value

2019-05-10 Thread Ravi Bangoria
Add a check for the sample_period value sent from userspace. A negative
value does not make sense, and in powerpc arch code it could cause a
recursive PMI leading to a hang (reported when running perf-fuzzer).

Signed-off-by: Ravi Bangoria 
---
 kernel/events/core.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index abbd4b3b96c2..e44c90378940 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5005,6 +5005,9 @@ static int perf_event_period(struct perf_event *event, u64 __user *arg)
if (perf_event_check_period(event, value))
return -EINVAL;
 
+   if (!event->attr.freq && (value & (1ULL << 63)))
+   return -EINVAL;
+
	event_function_call(event, __perf_event_period, &value);
 
return 0;
-- 
2.20.1



Re: [PATCH 2/2] powerpc/perf: Fix mmcra corruption by bhrb_filter

2019-05-10 Thread Ravi Bangoria



On 5/11/19 8:12 AM, Ravi Bangoria wrote:
> Consider a scenario where user creates two events:
> 
>   1st event:
> attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
> attr.branch_sample_type = PERF_SAMPLE_BRANCH_ANY;
> fd = perf_event_open(attr, 0, 1, -1, 0);
> 
>   This sets cpuhw->bhrb_filter to 0 and returns valid fd.
> 
>   2nd event:
> attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
> attr.branch_sample_type = PERF_SAMPLE_BRANCH_CALL;
> fd = perf_event_open(attr, 0, 1, -1, 0);
> 
>   It overrides cpuhw->bhrb_filter to -1 and returns with error.
> 
> Now if power_pmu_enable() gets called by any path other than
> power_pmu_add(), ppmu->config_bhrb(-1) will set mmcra to -1.
> 
> Signed-off-by: Ravi Bangoria 

Fixes: 3925f46bb590 ("powerpc/perf: Enable branch stack sampling framework")



Re: [PATCH 1/2] perf ioctl: Add check for the sample_period value

2019-05-13 Thread Ravi Bangoria



On 5/13/19 2:26 PM, Peter Zijlstra wrote:
> On Mon, May 13, 2019 at 09:42:13AM +0200, Peter Zijlstra wrote:
>> On Sat, May 11, 2019 at 08:12:16AM +0530, Ravi Bangoria wrote:
>>> Add a check for sample_period value sent from userspace. Negative
>>> value does not make sense. And in powerpc arch code this could cause
>>> a recursive PMI leading to a hang (reported when running perf-fuzzer).
>>>
>>> Signed-off-by: Ravi Bangoria 
>>> ---
>>>  kernel/events/core.c | 3 +++
>>>  1 file changed, 3 insertions(+)
>>>
>>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>>> index abbd4b3b96c2..e44c90378940 100644
>>> --- a/kernel/events/core.c
>>> +++ b/kernel/events/core.c
>>> @@ -5005,6 +5005,9 @@ static int perf_event_period(struct perf_event 
>>> *event, u64 __user *arg)
>>> if (perf_event_check_period(event, value))
>>> return -EINVAL;
>>>  
>>> +   if (!event->attr.freq && (value & (1ULL << 63)))
>>> +   return -EINVAL;
>>
>> Well, perf_event_attr::sample_period is __u64. Would not be the site
>> using it as signed be the one in error?
> 
> You forgot to mention commit: 0819b2e30ccb9, so I guess this just makes
> it consistent and is fine.
> 

Yeah, I was about to reply :)



Re: [PATCH V2 1/3] perf parse-regs: Split parse_regs

2019-05-15 Thread Ravi Bangoria


On 5/15/19 1:49 AM, kan.li...@linux.intel.com wrote:
> From: Kan Liang 
> 
> The available registers for --int-regs and --user-regs may be different,
> e.g. XMM registers.
> 
> Split parse_regs into two dedicated functions for --int-regs and
> --user-regs respectively.
> 
> Modify the warning message. "--user-regs=?" should be applied to show
> the available registers for --user-regs.
> 
> Signed-off-by: Kan Liang 
> ---

For patch 1 and 2,
Tested-by: Ravi Bangoria 

Minor neat. Should we update document as well? May be something like:

  tools/perf/Documentation/perf-record.txt

  --user-regs::
  Similar to -I, but capture user registers at sample time. To list the available
  user registers use --user-regs=\?.

Ravi



Re: [PATCH]perf:Remove P8 HW events which are not supported

2019-03-18 Thread Ravi Bangoria



On 3/18/19 11:56 PM, Arnaldo Carvalho de Melo wrote:
> I added this:
> 
> Cc: Sukadev Bhattiprolu 
> Fixes: 2a81fa3bb5ed ("perf vendor events: Add power8 PMU events")
> 
> - Arnaldo
Sure. Thanks a lot Arnaldo!



[PATCH] perf c2c: Fix c2c report for empty numa node

2019-02-28 Thread Ravi Bangoria
perf c2c report fails if the system has an empty numa node (0 cpus):

  $ lscpu
  NUMA node0 CPU(s):   
  NUMA node1 CPU(s):   0-4

  $ sudo ./perf c2c report 
  node/cpu topology bugFailed setup nodes

Fix this.

Reported-by: Nageswara R Sastry 
Signed-off-by: Ravi Bangoria 
---
 tools/perf/util/cpumap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/cpumap.c b/tools/perf/util/cpumap.c
index 383674f448fc..517c3f37c613 100644
--- a/tools/perf/util/cpumap.c
+++ b/tools/perf/util/cpumap.c
@@ -261,7 +261,7 @@ struct cpu_map *cpu_map__dummy_new(void)
struct cpu_map *cpus = malloc(sizeof(*cpus) + sizeof(int));
 
if (cpus != NULL) {
-   cpus->nr = 1;
+   cpus->nr = 0;
cpus->map[0] = -1;
refcount_set(>refcnt, 1);
}
-- 
2.17.1



Re: [PATCH] perf c2c: Fix c2c report for empty numa node

2019-02-28 Thread Ravi Bangoria


On 2/28/19 9:52 PM, Jiri Olsa wrote:
> how about attached change (untested)?

LGTM. Would you mind sending a patch?

> 
> but I wonder there are some other hidden
> bugs wrt empty node
> 
> jirka
> 
> 
> ---
> diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
> index 4272763a5e96..9e6cc868bdb4 100644
> --- a/tools/perf/builtin-c2c.c
> +++ b/tools/perf/builtin-c2c.c
> @@ -2056,6 +2056,12 @@ static int setup_nodes(struct perf_session *session)
>   if (!set)
>   return -ENOMEM;
>  
> + nodes[node] = set;
> +
> + /* empty node, skip */
> + if (cpu_map__empty(map))
> + continue;
> +
>   for (cpu = 0; cpu < map->nr; cpu++) {
>   set_bit(map->map[cpu], set);
>  
> @@ -2064,8 +2070,6 @@ static int setup_nodes(struct perf_session *session)
>  
>   cpu2node[map->map[cpu]] = node;
>   }
> -
> - nodes[node] = set;
>   }
>  
>   setup_nodes_header();
> 



Re: [PATCH v4 3/4] perf stat: Support 'percore' event qualifier

2019-04-15 Thread Ravi Bangoria



On 4/12/19 7:29 PM, Jin Yao wrote:
> diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
> index 39c05f8..1e312c2 100644
> --- a/tools/perf/Documentation/perf-stat.txt
> +++ b/tools/perf/Documentation/perf-stat.txt
> @@ -43,6 +43,10 @@ report::
> param1 and param2 are defined as formats for the PMU in
> /sys/bus/event_source/devices//format/*
>  
> +   'percore' is a event qualifier that sums up the event counts for both
> +   hardware threads in a core.

s/both/all/  :

  $ lscpu | grep Thread
  Thread(s) per core:4


Apart from that, for the series:
Tested-by: Ravi Bangoria 



Re: [PATCH] perf c2c: Fix c2c report for empty numa node

2019-03-01 Thread Ravi Bangoria


On 3/1/19 3:56 PM, Jiri Olsa wrote:
> Ravi Bangoria reported that we fail with empty
> numa node with following message:
> 
>   $ lscpu
>   NUMA node0 CPU(s):
>   NUMA node1 CPU(s):   0-4
> 
>   $ sudo ./perf c2c report
>   node/cpu topology bugFailed setup nodes
> 
> Fixing this by detecting empty node and keeping
> its cpu set empty.
> 
> Reported-by: Ravi Bangoria 
> Link: http://lkml.kernel.org/n/tip-dyq5jo6pn1j3yqavb5ukj...@git.kernel.org
> Signed-off-by: Jiri Olsa 

Reported-by: Nageswara R Sastry 
Tested-by: Ravi Bangoria 



Re: [PATCH] Uprobes: Fix deadlock between delayed_uprobe_lock and fs_reclaim

2019-02-25 Thread Ravi Bangoria



On 2/8/19 2:03 PM, Ravi Bangoria wrote:
> 
> 
> On 2/6/19 7:06 PM, Oleg Nesterov wrote:
>> Ravi, I am on vacation till the end of this week, can't read your patch
>> carefully.
>>
>> I am not sure I fully understand the problem, but shouldn't we change
>> binder_alloc_free_page() to use mmput_async() ? Like it does if trylock
>> fails.
> 
> I don't understand binderfs code much so I'll let Sherry comment on this.

Sherry, Can you please comment on this.

Thanks,
Ravi



Re: [PATCH]perf:Remove P8 HW events which are not supported

2019-02-25 Thread Ravi Bangoria



On 2/7/19 3:09 PM, Mamatha Inamdar wrote:
> This patch is to remove following hardware events 
> from JSON file which are not supported on POWER8
> 

Acked-by: Ravi Bangoria 



Re: System crash with perf_fuzzer (kernel: 5.0.0-rc3)

2019-01-31 Thread Ravi Bangoria
Hi Jiri,

On 2/1/19 1:13 PM, Jiri Olsa wrote:
> On Thu, Jan 31, 2019 at 09:27:11AM +0100, Jiri Olsa wrote:
>> On Wed, Jan 30, 2019 at 07:36:48PM +0100, Jiri Olsa wrote:
>>
>> SNIP
>>
>>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>>> index 280a72b3a553..22ec63a0782e 100644
>>> --- a/kernel/events/core.c
>>> +++ b/kernel/events/core.c
>>> @@ -4969,6 +4969,26 @@ static void __perf_event_period(struct perf_event *event,
>>> }
>>>  }
>>>  
>>> +static int check_period(struct perf_event *event, u64 value)
>>> +{
>>> +   u64 sample_period_attr = event->attr.sample_period;
>>> +   u64 sample_period_hw   = event->hw.sample_period;
>>> +   int ret;
>>> +
>>> +   if (event->attr.freq) {
>>> +   event->attr.sample_freq = value;
>>> +   } else {
>>> +   event->attr.sample_period = value;
>>> +   event->hw.sample_period = value;
>>> +   }
>>
>> hm, I think we need to check the period without changing the event,
>> because we don't disable pmu, so it might get picked up by bts code
>>
>> will check
> 
> with attached patch I did not trigger the fuzzer crash
> for over a day now, could you guys try?

I ran fuzzer for couple of hours but I didn't see any crash with
your previous patch.

I'll try this newer one as well.

Thanks.



Re: System crash with perf_fuzzer (kernel: 5.0.0-rc3)

2019-02-01 Thread Ravi Bangoria



On 2/1/19 1:24 PM, Ravi Bangoria wrote:
> I ran fuzzer for couple of hours but I didn't see any crash with
> your previous patch.
> 
> I'll try this newer one as well.

I ran fuzzer for ~8 hrs and no lockup so far.

Thanks.



Re: [PATCH] perf script: fix crash when processing recorded stat data

2019-01-20 Thread Ravi Bangoria



On 1/21/19 12:44 AM, Tony Jones wrote:
> While updating Perf to work with Python3 and Python2 I noticed that the 
> stat-cpi script was dumping core.

[...]
> Fixes: 1fcd03946b52 ("perf stat: Update per-thread shadow stats")
> Signed-off-by: Tony Jones 

Tested-by: Ravi Bangoria 



Re: perf trace syscall table generation for powerpc with syscall.tbl

2019-01-08 Thread Ravi Bangoria
Hi Arnaldo,

Yes. I'm aware of it. Just that I was busy with something else so couldn't do 
it.

Thanks for reminding :). Will post a patch soon.

Ravi

On 1/8/19 10:34 PM, Arnaldo Carvalho de Melo wrote:
> Hi Ravi,
> 
>   I noticed that in:
> 
>   commit ab66dcc76d6ab8fae9d69d149ae38c42605e7fc5
>   Author: Firoz Khan 
>   Date:   Mon Dec 17 16:10:36 2018 +0530
> 
> powerpc: generate uapi header and system call table files
> 
> powerpc now generates its syscall tables headers from a syscall.tbl just
> like x86 and s390, could you please switch to using it, taking the x86
> and s390 scripts as a starting point, then test on your systems
> everything works?
> 
> Thanks,
> 
> - Arnaldo
> 



[PATCH 1/2] perf powerpc: Rework syscall table generation

2019-01-10 Thread Ravi Bangoria
Commit aff850393200 ("powerpc: add system call table generation support")
changed how the system call table is generated for powerpc. Incorporate these
changes into perf as well.

Signed-off-by: Ravi Bangoria 
---
 tools/perf/arch/powerpc/Makefile   |  15 +-
 .../perf/arch/powerpc/entry/syscalls/mksyscalltbl  |  22 +-
 tools/perf/arch/powerpc/entry/syscalls/syscall.tbl | 427 +
 3 files changed, 450 insertions(+), 14 deletions(-)
 create mode 100644 tools/perf/arch/powerpc/entry/syscalls/syscall.tbl

diff --git a/tools/perf/arch/powerpc/Makefile b/tools/perf/arch/powerpc/Makefile
index a111239df182..e58d00d62f02 100644
--- a/tools/perf/arch/powerpc/Makefile
+++ b/tools/perf/arch/powerpc/Makefile
@@ -14,18 +14,25 @@ PERF_HAVE_JITDUMP := 1
 out:= $(OUTPUT)arch/powerpc/include/generated/asm
 header32 := $(out)/syscalls_32.c
 header64 := $(out)/syscalls_64.c
-sysdef := $(srctree)/tools/arch/powerpc/include/uapi/asm/unistd.h
-sysprf := $(srctree)/tools/perf/arch/powerpc/entry/syscalls/
+syskrn := $(srctree)/arch/powerpc/kernel/syscalls/syscall.tbl
+sysprf := $(srctree)/tools/perf/arch/powerpc/entry/syscalls
+sysdef := $(sysprf)/syscall.tbl
 systbl := $(sysprf)/mksyscalltbl
 
 # Create output directory if not already present
 _dummy := $(shell [ -d '$(out)' ] || mkdir -p '$(out)')
 
 $(header64): $(sysdef) $(systbl)
-   $(Q)$(SHELL) '$(systbl)' '64' '$(CC)' $(sysdef) > $@
+   @(test -d ../../kernel -a -d ../../tools -a -d ../perf && ( \
+   (diff -B $(sysdef) $(syskrn) >/dev/null) \
+	|| echo "Warning: Kernel ABI header at '$(sysdef)' differs from latest version at '$(syskrn)'" >&2 )) || true
+   $(Q)$(SHELL) '$(systbl)' '64' $(sysdef) > $@
 
 $(header32): $(sysdef) $(systbl)
-   $(Q)$(SHELL) '$(systbl)' '32' '$(CC)' $(sysdef) > $@
+   @(test -d ../../kernel -a -d ../../tools -a -d ../perf && ( \
+   (diff -B $(sysdef) $(syskrn) >/dev/null) \
+	|| echo "Warning: Kernel ABI header at '$(sysdef)' differs from latest version at '$(syskrn)'" >&2 )) || true
+   $(Q)$(SHELL) '$(systbl)' '32' $(sysdef) > $@
 
 clean::
$(call QUIET_CLEAN, powerpc) $(RM) $(header32) $(header64)
diff --git a/tools/perf/arch/powerpc/entry/syscalls/mksyscalltbl b/tools/perf/arch/powerpc/entry/syscalls/mksyscalltbl
index ef52e1dd694b..6c58060aa03b 100755
--- a/tools/perf/arch/powerpc/entry/syscalls/mksyscalltbl
+++ b/tools/perf/arch/powerpc/entry/syscalls/mksyscalltbl
@@ -9,10 +9,9 @@
 # Changed by: Ravi Bangoria 
 
 wordsize=$1
-gcc=$2
-input=$3
+SYSCALL_TBL=$2
 
-if ! test -r $input; then
+if ! test -r $SYSCALL_TBL; then
echo "Could not read input file" >&2
exit 1
 fi
@@ -20,18 +19,21 @@ fi
 create_table()
 {
local wordsize=$1
-   local max_nr
+   local max_nr nr abi sc discard
+   max_nr=-1
+   nr=0
 
echo "static const char *syscalltbl_powerpc_${wordsize}[] = {"
-   while read sc nr; do
-   printf '\t[%d] = "%s",\n' $nr $sc
-   max_nr=$nr
+   while read nr abi sc discard; do
+   if [ "$max_nr" -lt "$nr" ]; then
+   printf '\t[%d] = "%s",\n' $nr $sc
+   max_nr=$nr
+   fi
done
echo '};'
echo "#define SYSCALLTBL_POWERPC_${wordsize}_MAX_ID $max_nr"
 }
 
-$gcc -m${wordsize} -E -dM -x c  $input\
-   |sed -ne 's/^#define __NR_//p' \
-   |sort -t' ' -k2 -nu\
+grep -E "^[[:digit:]]+[[:space:]]+(common|spu|nospu|${wordsize})" $SYSCALL_TBL \
+   |sort -k1 -n   \
|create_table ${wordsize}
diff --git a/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
new file mode 100644
index ..db3bbb8744af
--- /dev/null
+++ b/tools/perf/arch/powerpc/entry/syscalls/syscall.tbl
@@ -0,0 +1,427 @@
+# SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
+#
+# system call numbers and entry vectors for powerpc
+#
+# The format is:
+# <number> <abi> <name> <entry point> <compat entry point>
+#
+# The <abi> can be common, spu, nospu, 64, or 32 for this file.
+#
+0   nospu   restart_syscall  sys_restart_syscall
+1   nospu   exit             sys_exit
+2   nospu   fork             ppc_fork
+3   common  read             sys_read
+4   common  write            sys_write
+5   common  open             sys_open             compat_sys_open
+6   common  close            sys_close
+7   common  waitpid          sys_waitpid
+8   common  creat            sys_creat
+9   common  link             sys_link
+10  common  unlink           sys_unlink
+11 nosp

[PATCH 2/2] perf powerpc: Remove unistd.h

2019-01-10 Thread Ravi Bangoria
We use syscall.tbl to generate the system call table on powerpc.
unistd.h is no longer required now. Remove it.

Signed-off-by: Ravi Bangoria 
---
 tools/arch/powerpc/include/uapi/asm/unistd.h | 404 ---
 tools/perf/check-headers.sh  |   1 -
 2 files changed, 405 deletions(-)
 delete mode 100644 tools/arch/powerpc/include/uapi/asm/unistd.h

diff --git a/tools/arch/powerpc/include/uapi/asm/unistd.h b/tools/arch/powerpc/include/uapi/asm/unistd.h
deleted file mode 100644
index 985534d0b448..
--- a/tools/arch/powerpc/include/uapi/asm/unistd.h
+++ /dev/null
@@ -1,404 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
-/*
- * This file contains the system call numbers.
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- */
-#ifndef _UAPI_ASM_POWERPC_UNISTD_H_
-#define _UAPI_ASM_POWERPC_UNISTD_H_
-
-
-#define __NR_restart_syscall  0
-#define __NR_exit  1
-#define __NR_fork  2
-#define __NR_read  3
-#define __NR_write  4
-#define __NR_open  5
-#define __NR_close  6
-#define __NR_waitpid  7
-#define __NR_creat  8
-#define __NR_link  9
-#define __NR_unlink  10
-#define __NR_execve  11
-#define __NR_chdir  12
-#define __NR_time  13
-#define __NR_mknod  14
-#define __NR_chmod  15
-#define __NR_lchown  16
-#define __NR_break  17
-#define __NR_oldstat  18
-#define __NR_lseek  19
-#define __NR_getpid  20
-#define __NR_mount  21
-#define __NR_umount  22
-#define __NR_setuid  23
-#define __NR_getuid  24
-#define __NR_stime  25
-#define __NR_ptrace  26
-#define __NR_alarm  27
-#define __NR_oldfstat  28
-#define __NR_pause  29
-#define __NR_utime  30
-#define __NR_stty  31
-#define __NR_gtty  32
-#define __NR_access  33
-#define __NR_nice  34
-#define __NR_ftime  35
-#define __NR_sync  36
-#define __NR_kill  37
-#define __NR_rename  38
-#define __NR_mkdir  39
-#define __NR_rmdir  40
-#define __NR_dup  41
-#define __NR_pipe  42
-#define __NR_times  43
-#define __NR_prof  44
-#define __NR_brk  45
-#define __NR_setgid  46
-#define __NR_getgid  47
-#define __NR_signal  48
-#define __NR_geteuid  49
-#define __NR_getegid  50
-#define __NR_acct  51
-#define __NR_umount2  52
-#define __NR_lock  53
-#define __NR_ioctl  54
-#define __NR_fcntl  55
-#define __NR_mpx  56
-#define __NR_setpgid  57
-#define __NR_ulimit  58
-#define __NR_oldolduname  59
-#define __NR_umask  60
-#define __NR_chroot  61
-#define __NR_ustat  62
-#define __NR_dup2  63
-#define __NR_getppid  64
-#define __NR_getpgrp  65
-#define __NR_setsid  66
-#define __NR_sigaction  67
-#define __NR_sgetmask  68
-#define __NR_ssetmask  69
-#define __NR_setreuid  70
-#define __NR_setregid  71
-#define __NR_sigsuspend  72
-#define __NR_sigpending  73
-#define __NR_sethostname  74
-#define __NR_setrlimit  75
-#define __NR_getrlimit  76
-#define __NR_getrusage  77
-#define __NR_gettimeofday  78
-#define __NR_settimeofday  79
-#define __NR_getgroups  80
-#define __NR_setgroups  81
-#define __NR_select  82
-#define __NR_symlink  83
-#define __NR_oldlstat  84
-#define __NR_readlink  85
-#define __NR_uselib  86
-#define __NR_swapon  87
-#define __NR_reboot  88
-#define __NR_readdir  89
-#define __NR_mmap  90
-#define __NR_munmap  91
-#define __NR_truncate  92
-#define __NR_ftruncate  93
-#define __NR_fchmod  94
-#define __NR_fchown  95
-#define __NR_getpriority  96
-#define __NR_setpriority  97
-#define __NR_profil  98
-#define __NR_statfs  99
-#define __NR_fstatfs  100
-#define __NR_ioperm  101
-#define __NR_socketcall  102
-#define __NR_syslog  103
-#define __NR_setitimer  104
-#define __NR_getitimer  105
-#define __NR_stat  106
-#define __NR_lstat  107

Re: [PATCH] Uprobes: Fix kernel oops with delayed_uprobe_remove()

2018-12-02 Thread Ravi Bangoria
Hi Steve,

Please pull this patch.

Thanks.

On 11/15/18 6:13 PM, Oleg Nesterov wrote:
> On 11/15, Ravi Bangoria wrote:
>>
>> There could be a race between task exit and probe unregister:
>>
>>   exit_mm()                        uprobe_unregister()
>>     mmput()
>>       __mmput()                      put_uprobe()
>>         uprobe_clear_state()           delayed_uprobe_remove()
>>           delayed_uprobe_remove()
>>
>> put_uprobe() is calling delayed_uprobe_remove() without taking
>> delayed_uprobe_lock and thus the race sometimes results in a
>> kernel crash. Fix this by taking delayed_uprobe_lock before
>> calling delayed_uprobe_remove() from put_uprobe().
>>
>> Detailed crash log can be found at:
>>   https://lkml.org/lkml/2018/11/1/1244
> 
> Thanks, looks good,
> 
> Oleg.
> 



[PATCH] perf test: Add watchpoint test

2018-09-10 Thread Ravi Bangoria
We don't have a perf test available to test watchpoint functionality.
Add a simple set of tests:
 - Read only watchpoint
 - Write only watchpoint
 - Read / Write watchpoint
 - Runtime watchpoint modification

Ex on powerpc:
  $ sudo ./perf test 22
  22: Watchpoint:
  22.1: Read Only Watchpoint: Ok
  22.2: Write Only Watchpoint   : Ok
  22.3: Read / Write Watchpoint : Ok
  22.4: Modify Watchpoint   : Ok

Signed-off-by: Ravi Bangoria 
---
 tools/perf/tests/Build  |   1 +
 tools/perf/tests/builtin-test.c |   9 ++
 tools/perf/tests/tests.h|   3 +
 tools/perf/tests/wp.c   | 225 
 4 files changed, 238 insertions(+)
 create mode 100644 tools/perf/tests/wp.c

diff --git a/tools/perf/tests/Build b/tools/perf/tests/Build
index 6c108fa79ae3..0b2b8305c965 100644
--- a/tools/perf/tests/Build
+++ b/tools/perf/tests/Build
@@ -21,6 +21,7 @@ perf-y += python-use.o
 perf-y += bp_signal.o
 perf-y += bp_signal_overflow.o
 perf-y += bp_account.o
+perf-y += wp.o
 perf-y += task-exit.o
 perf-y += sw-clock.o
 perf-y += mmap-thread-lookup.o
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index d7a5e1b9aa6f..54ca7d87236f 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -120,6 +120,15 @@ static struct test generic_tests[] = {
.func = test__bp_accounting,
.is_supported = test__bp_signal_is_supported,
},
+   {
+   .desc = "Watchpoint",
+   .func = test__wp,
+   .subtest = {
+   .skip_if_fail   = false,
+   .get_nr = test__wp_subtest_get_nr,
+   .get_desc   = test__wp_subtest_get_desc,
+   },
+   },
{
.desc = "Number of exit events of a simple workload",
.func = test__task_exit,
diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h
index a9760e790563..8e26a4148f30 100644
--- a/tools/perf/tests/tests.h
+++ b/tools/perf/tests/tests.h
@@ -59,6 +59,9 @@ int test__python_use(struct test *test, int subtest);
 int test__bp_signal(struct test *test, int subtest);
 int test__bp_signal_overflow(struct test *test, int subtest);
 int test__bp_accounting(struct test *test, int subtest);
+int test__wp(struct test *test, int subtest);
+const char *test__wp_subtest_get_desc(int subtest);
+int test__wp_subtest_get_nr(void);
 int test__task_exit(struct test *test, int subtest);
 int test__mem(struct test *test, int subtest);
 int test__sw_clock_freq(struct test *test, int subtest);
diff --git a/tools/perf/tests/wp.c b/tools/perf/tests/wp.c
new file mode 100644
index ..f9f6d9ee5172
--- /dev/null
+++ b/tools/perf/tests/wp.c
@@ -0,0 +1,225 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include 
+#include "tests/tests.h"
+#include "arch-tests.h"
+#include "debug.h"
+#include "cloexec.h"
+
+#define WP_TEST_ASSERT_VAL(fd, text, val)   \
+do {\
+   long long count;\
+   wp_read(fd, &count, sizeof(long long)); \
+   TEST_ASSERT_VAL(text, count == val);\
+} while (0)
+
+volatile u64 data1;
+volatile u8 data2[3];
+
+static int wp_read(int fd, long long *count, int size)
+{
+   int ret = read(fd, count, size);
+
+   if (ret != size) {
+   pr_debug("failed to read: %d\n", ret);
+   return -1;
+   }
+   return 0;
+}
+
+static void get__perf_event_attr(struct perf_event_attr *attr, int wp_type,
+void *wp_addr, unsigned long wp_len)
+{
+   memset(attr, 0, sizeof(struct perf_event_attr));
+   attr->type   = PERF_TYPE_BREAKPOINT;
+   attr->size   = sizeof(struct perf_event_attr);
+   attr->config = 0;
+   attr->bp_type= wp_type;
+   attr->bp_addr= (unsigned long)wp_addr;
+   attr->bp_len = wp_len;
+   attr->sample_period  = 1;
+   attr->sample_type= PERF_SAMPLE_IP;
+   attr->exclude_kernel = 1;
+   attr->exclude_hv = 1;
+}
+
+static int __event(int wp_type, void *wp_addr, unsigned long wp_len)
+{
+   int fd;
+   struct perf_event_attr attr;
+
+   get__perf_event_attr(&attr, wp_type, wp_addr, wp_len);
+   fd = sys_perf_event_open(&attr, 0, -1, -1,
+perf_event_open_cloexec_flag());
+   if (fd < 0)
+   pr_debug("failed opening event %x\n", attr.bp_type);
+
+   return fd;
+}
+
+static int wp_ro_test(void)
+{
+   int fd;
+   unsigned long tmp, tmp1 = rand();
+
+   fd = __event(HW_BREAKPOINT_R, (void *)&data1, sizeof(data1));
+ 

Re: [PATCH v6 3/6] Uprobes: Support SDT markers having reference count (semaphore)

2018-07-23 Thread Ravi Bangoria
Hi Oleg,

On 07/23/2018 09:56 PM, Oleg Nesterov wrote:
> I have a mixed feeling about this series... I'll try to summarise my thinking
> tomorrow, but I do not see any obvious problem so far. Although I have some
> concerns about 5/6, I need to re-read it after sleep.

Sure.

> 
> 
> On 07/16, Ravi Bangoria wrote:
>>
>> +static int delayed_uprobe_install(struct vm_area_struct *vma)
>> +{
>> +struct list_head *pos, *q;
>> +struct delayed_uprobe *du;
>> +unsigned long vaddr;
>> +int ret = 0, err = 0;
>> +
>> +mutex_lock(&delayed_uprobe_lock);
>> +list_for_each_safe(pos, q, &delayed_uprobe_list) {
>> +du = list_entry(pos, struct delayed_uprobe, list);
>> +
>> +if (!du->uprobe->ref_ctr_offset ||
> 
> Is it possible to see ->ref_ctr_offset == 0 in delayed_uprobe_list?

I'll remove this check.

> 
>> @@ -1072,7 +1282,13 @@ int uprobe_mmap(struct vm_area_struct *vma)
>>  struct uprobe *uprobe, *u;
>>  struct inode *inode;
>>  
>> -if (no_uprobe_events() || !valid_vma(vma, true))
>> +if (no_uprobe_events())
>> +return 0;
>> +
>> +if (vma->vm_flags & VM_WRITE)
>> +delayed_uprobe_install(vma);
> 
> Obviously not nice performance-wise... OK, I do not know if it will actually
> hurt in practice and probably we can use the better data structures if 
> necessary.
> But can't we check MMF_HAS_UPROBES at least? I mean,
> 
>   if (vma->vm_flags & VM_WRITE && test_bit(MMF_HAS_UPROBES, &vma->vm_mm->flags))
>   delayed_uprobe_install(vma);
> 
> ?

Yes.

> 
> 
> Perhaps we can even add another flag later, MMF_HAS_DELAYED_UPROBES set by
> delayed_uprobe_add().

Yes, good idea.

Thanks for the review,
Ravi



Re: [PATCH v6 0/6] Uprobes: Support SDT markers having reference count (semaphore)

2018-07-20 Thread Ravi Bangoria
Oleg, Srikar, Masami, Song,

Any feedback please? :)

Thanks,
Ravi

On 07/16/2018 02:17 PM, Ravi Bangoria wrote:
> Userspace Statically Defined Tracepoints[1] are dtrace style markers
> inside userspace applications. Applications like PostgreSQL, MySQL,
> Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc
> have these markers embedded in them. These markers are added by developers
> at important places in the code. Each marker source expands to a single
> nop instruction in the compiled code, but there may be additional
> overhead for computing the marker arguments, which expands to a couple of
> instructions. If that overhead matters, execution of it can be
> omitted via a runtime if() condition when no one is tracing on the marker:
> 
> if (reference_counter > 0) {
> Execute marker instructions;
> }
> 
> Default value of reference counter is 0. Tracer has to increment the 
> reference counter before tracing on a marker and decrement it when
> done with the tracing.
> 
> Currently, the perf tool has limited support for SDT markers. I.e. it
> cannot trace markers surrounded by a reference counter. Also, it's
> not easy to add reference counter logic in a userspace tool like perf,
> so the basic idea for this patchset is to add the reference counter
> logic in the uprobe infrastructure. Ex,[2]
> 
>   # cat tick.c
> ... 
> for (i = 0; i < 100; i++) {
>   DTRACE_PROBE1(tick, loop1, i);
> if (TICK_LOOP2_ENABLED()) {
> DTRACE_PROBE1(tick, loop2, i); 
> }
> printf("hi: %d\n", i); 
> sleep(1);
> }   
> ... 
> 
> Here tick:loop1 is marker without reference counter where as tick:loop2
> is surrounded by reference counter condition.
> 
>   # perf buildid-cache --add /tmp/tick
>   # perf probe sdt_tick:loop1
>   # perf probe sdt_tick:loop2
> 
>   # perf stat -e sdt_tick:loop1,sdt_tick:loop2 -- /tmp/tick
>   hi: 0
>   hi: 1
>   hi: 2
>   ^C
>   Performance counter stats for '/tmp/tick':
>  3  sdt_tick:loop1
>  0  sdt_tick:loop2
>  2.747086086 seconds time elapsed
> 
> Perf failed to record data for tick:loop2. Same experiment with this
> patch series:
> 
>   # ./perf buildid-cache --add /tmp/tick
>   # ./perf probe sdt_tick:loop2
>   # ./perf stat -e sdt_tick:loop2 /tmp/tick
> hi: 0
> hi: 1
> hi: 2
> ^C  
>  Performance counter stats for '/tmp/tick':
>  3  sdt_tick:loop2
>2.561851452 seconds time elapsed
> 
> 
> v6 changes:
>  - Do not export struct uprobe outside of uprobe.c. Instead, use
>container_of(arch_uprobe) to regain uprobe in uprobe_write_opcode().
>  - Allow 0 as a special value for reference counter _offset_. I.e.
>two uprobes, one having ref_ctr_offset=0 and the other having
>non-zero ref_ctr_offset can coexists.
>  - If vma holding reference counter is not present while patching an
>instruction, we add that uprobe in delayed_uprobe_list. When
>appropriate mapping is created, we increment the reference counter
>and remove uprobe from delayed_uprobe_list. While doing all this,
>v5 was searching for all such uprobes in uprobe_tree which is not
>required. Also, uprobes are stored in rbtree with inode+offset as
>the key and v5 was searching uprobe based on inode+ref_ctr_offset
>which is wrong too. Fix this by directly looking at delayed_uprobe_list.
>  - Consider VM_SHARED vma as invalid for reference counter. Return
>false from valid_ref_ctr_vma() if vma->vm_flags has VM_SHARED set.
>  - No need to use FOLL_FORCE in get_user_pages_remote() while getting
>a page to update reference counter. Remove it.
>  - Do not mention usage of reference counter in Documentation/.
> 
> 
> v5 can be found at: https://lkml.org/lkml/2018/6/28/51
> 
> Ravi Bangoria (6):
>   Uprobes: Simplify uprobe_register() body
>   Uprobe: Additional argument arch_uprobe to uprobe_write_opcode()
>   Uprobes: Support SDT markers having reference count (semaphore)
>   trace_uprobe/sdt: Prevent multiple reference counter for same uprobe
>   Uprobes/sdt: Prevent multiple reference counter for same uprobe
>   perf probe: Support SDT markers having reference counter (semaphore)
> 
>  arch/arm/probes/uprobes/core.c |   2 +-
>  arch/mips/kernel/uprobes.c |   2 +-
>  include/linux/uprobes.h|   7 +-
>  kernel/events/uprobes.c| 358 -
>  kernel/trace/trace.c   |   2 +-
>  kernel/trace/trace_uprobe.c|  78 -
>  tools/perf/util/probe-event.c  |  39 -
>  tools/perf/util/probe-event.h  |   1 +
>  tools/perf/util/probe-file.c   |  34 +++-
>  tools/perf/util/probe-file.h   |   1 +
>  tools/perf/util/symbol-elf.c   |  46 --
>  tools/perf/util/symbol.h   |   7 +
>  12 files changed, 503 insertions(+), 74 deletions(-)
> 



Re: [PATCH] perf report powerpc: Fix crash if callchain is empty

2018-06-11 Thread Ravi Bangoria



On 06/11/2018 04:10 PM, Sandipan Das wrote:
> For some cases, the callchain provided by the kernel may be
> empty. So, the callchain ip filtering code will cause a crash
> if we do not check whether the struct ip_callchain pointer is
> NULL before accessing any members.
> 
> This can be observed on a powerpc64le system running Fedora 27
> as shown below.
> 
>   # perf record -b -e cycles:u ls
> 
> Before applying this patch:
> 
>   # perf report --branch-history
> 
>   perf: Segmentation fault
>    backtrace 
>   perf[0x1027615c]
>   linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x7fff856304d8]
>   perf(arch_skip_callchain_idx+0x44)[0x10257c58]
>   perf[0x1017f2e4]
>   perf(thread__resolve_callchain+0x124)[0x1017ff5c]
>   perf(sample__resolve_callchain+0xf0)[0x10172788]
>   ...
> 
> After applying this patch:
> 
>   # perf report --branch-history
> 
>   Samples: 25  of event 'cycles:u', Event count (approx.): 2306870
> Overhead  Source:Line         Symbol               Shared Object
>   +   11.60%  _init+35736     [.] _init            ls
>   +9.84%  strcoll_l.c:137     [.] __strcoll_l      libc-2.26.so
>   +9.16%  memcpy.S:175        [.] __memcpy_power7  libc-2.26.so
>   +9.01%  gconv_charset.h:54  [.] _nl_find_locale  libc-2.26.so
>   +8.87%  dl-addr.c:52        [.] _dl_addr         libc-2.26.so
>   +8.83%  _init+236           [.] _init            ls
>   ...
> 
> Reported-by: Ravi Bangoria 
> Signed-off-by: Sandipan Das 


Acked-by: Ravi Bangoria 



Re: [PATCH v5 06/10] Uprobes: Support SDT markers having reference count (semaphore)

2018-07-16 Thread Ravi Bangoria
Hi Song,

On 07/14/2018 05:20 AM, Song Liu wrote:
> 
> Hmm... what happens when we have multiple uprobes sharing the same
> reference counter? It feels equally complicate to me. Or did I miss any
> cases here?


As far as I can think of, it should be handled by default. No special
treatment needed for this.

...

> 
> This patch tries to resolve this imbalance by passing extra flag
> "restore_insn" to probe_event_disable().
> 
> Signed-off-by: Song Liu 


Ah cool. Seems this will make them balanced. But is it fine to change
uprobe_unregister()? It's already an ABI. No other way around?

Also, one more source of imbalance is this condition:

  if (is_register)
  flags |= VM_WRITE;

in valid_vma(). You need to take care of that as well.

Thanks,
Ravi



[PATCH v6 2/6] Uprobe: Additional argument arch_uprobe to uprobe_write_opcode()

2018-07-16 Thread Ravi Bangoria
Add an additional argument 'arch_uprobe' to uprobe_write_opcode().
We need this in a later set of patches.

Signed-off-by: Ravi Bangoria 
---
 arch/arm/probes/uprobes/core.c | 2 +-
 arch/mips/kernel/uprobes.c | 2 +-
 include/linux/uprobes.h| 2 +-
 kernel/events/uprobes.c| 9 +
 4 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/arm/probes/uprobes/core.c b/arch/arm/probes/uprobes/core.c
index d1329f1ba4e4..bf992264060e 100644
--- a/arch/arm/probes/uprobes/core.c
+++ b/arch/arm/probes/uprobes/core.c
@@ -32,7 +32,7 @@ bool is_swbp_insn(uprobe_opcode_t *insn)
 int set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm,
 unsigned long vaddr)
 {
-   return uprobe_write_opcode(mm, vaddr,
+   return uprobe_write_opcode(auprobe, mm, vaddr,
   __opcode_to_mem_arm(auprobe->bpinsn));
 }
 
diff --git a/arch/mips/kernel/uprobes.c b/arch/mips/kernel/uprobes.c
index f7a0645ccb82..4aaff3b3175c 100644
--- a/arch/mips/kernel/uprobes.c
+++ b/arch/mips/kernel/uprobes.c
@@ -224,7 +224,7 @@ unsigned long arch_uretprobe_hijack_return_addr(
 int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm,
unsigned long vaddr)
 {
-   return uprobe_write_opcode(mm, vaddr, UPROBE_SWBP_INSN);
+   return uprobe_write_opcode(auprobe, mm, vaddr, UPROBE_SWBP_INSN);
 }
 
 void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 0a294e950df8..bb9d2084af03 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -121,7 +121,7 @@ extern bool is_swbp_insn(uprobe_opcode_t *insn);
 extern bool is_trap_insn(uprobe_opcode_t *insn);
 extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs);
 extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
-extern int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t);
+extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t);
 extern int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
 extern int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool);
 extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 471eac896635..c0418ba52ba8 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -299,8 +299,8 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t
  * Called with mm->mmap_sem held for write.
  * Return 0 (success) or a negative errno.
  */
-int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr,
-   uprobe_opcode_t opcode)
+int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
+   unsigned long vaddr, uprobe_opcode_t opcode)
 {
struct page *old_page, *new_page;
struct vm_area_struct *vma;
@@ -351,7 +351,7 @@ int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr,
  */
 int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr)
 {
-   return uprobe_write_opcode(mm, vaddr, UPROBE_SWBP_INSN);
+   return uprobe_write_opcode(auprobe, mm, vaddr, UPROBE_SWBP_INSN);
 }
 
 /**
@@ -366,7 +366,8 @@ int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned
 int __weak
 set_orig_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr)
 {
-   return uprobe_write_opcode(mm, vaddr, *(uprobe_opcode_t *)&auprobe->insn);
+   return uprobe_write_opcode(auprobe, mm, vaddr,
+   *(uprobe_opcode_t *)&auprobe->insn);
 }
 
 static struct uprobe *get_uprobe(struct uprobe *uprobe)
-- 
2.14.4



[PATCH v6 1/6] Uprobes: Simplify uprobe_register() body

2018-07-16 Thread Ravi Bangoria
Simplify uprobe_register() function body and let __uprobe_register()
handle everything. Also move dependency functions around to fix build
failures.

Signed-off-by: Ravi Bangoria 
---
 kernel/events/uprobes.c | 69 ++---
 1 file changed, 36 insertions(+), 33 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index ccc579a7d32e..471eac896635 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -840,13 +840,8 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
return err;
 }
 
-static int __uprobe_register(struct uprobe *uprobe, struct uprobe_consumer *uc)
-{
-   consumer_add(uprobe, uc);
-   return register_for_each_vma(uprobe, uc);
-}
-
-static void __uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
+static void
+__uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
 {
int err;
 
@@ -860,24 +855,46 @@ static void __uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *u
 }
 
 /*
- * uprobe_register - register a probe
+ * uprobe_unregister - unregister a already registered probe.
+ * @inode: the file in which the probe has to be removed.
+ * @offset: offset from the start of the file.
+ * @uc: identify which probe if multiple probes are colocated.
+ */
+void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
+{
+   struct uprobe *uprobe;
+
+   uprobe = find_uprobe(inode, offset);
+   if (WARN_ON(!uprobe))
+   return;
+
+   down_write(&uprobe->register_rwsem);
+   __uprobe_unregister(uprobe, uc);
+   up_write(&uprobe->register_rwsem);
+   put_uprobe(uprobe);
+}
+EXPORT_SYMBOL_GPL(uprobe_unregister);
+
+/*
+ * __uprobe_register - register a probe
  * @inode: the file in which the probe has to be placed.
  * @offset: offset from the start of the file.
  * @uc: information on howto handle the probe..
  *
- * Apart from the access refcount, uprobe_register() takes a creation
+ * Apart from the access refcount, __uprobe_register() takes a creation
  * refcount (thro alloc_uprobe) if and only if this @uprobe is getting
  * inserted into the rbtree (i.e first consumer for a @inode:@offset
  * tuple).  Creation refcount stops uprobe_unregister from freeing the
  * @uprobe even before the register operation is complete. Creation
  * refcount is released when the last @uc for the @uprobe
- * unregisters. Caller of uprobe_register() is required to keep @inode
+ * unregisters. Caller of __uprobe_register() is required to keep @inode
  * (and the containing mount) referenced.
  *
  * Return errno if it cannot successully install probes
  * else return 0 (success)
  */
-int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
+static int __uprobe_register(struct inode *inode, loff_t offset,
+struct uprobe_consumer *uc)
 {
struct uprobe *uprobe;
int ret;
@@ -904,7 +921,8 @@ int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *
 down_write(&uprobe->register_rwsem);
ret = -EAGAIN;
if (likely(uprobe_is_active(uprobe))) {
-   ret = __uprobe_register(uprobe, uc);
+   consumer_add(uprobe, uc);
+   ret = register_for_each_vma(uprobe, uc);
if (ret)
__uprobe_unregister(uprobe, uc);
}
@@ -915,6 +933,12 @@ int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *
goto retry;
return ret;
 }
+
+int uprobe_register(struct inode *inode, loff_t offset,
+   struct uprobe_consumer *uc)
+{
+   return __uprobe_register(inode, offset, uc);
+}
 EXPORT_SYMBOL_GPL(uprobe_register);
 
 /*
@@ -946,27 +970,6 @@ int uprobe_apply(struct inode *inode, loff_t offset,
return ret;
 }
 
-/*
- * uprobe_unregister - unregister a already registered probe.
- * @inode: the file in which the probe has to be removed.
- * @offset: offset from the start of the file.
- * @uc: identify which probe if multiple probes are colocated.
- */
-void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
-{
-   struct uprobe *uprobe;
-
-   uprobe = find_uprobe(inode, offset);
-   if (WARN_ON(!uprobe))
-   return;
-
-   down_write(&uprobe->register_rwsem);
-   __uprobe_unregister(uprobe, uc);
-   up_write(&uprobe->register_rwsem);
-   put_uprobe(uprobe);
-}
-EXPORT_SYMBOL_GPL(uprobe_unregister);
-
 static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm)
 {
struct vm_area_struct *vma;
-- 
2.14.4



[PATCH v6 0/6] Uprobes: Support SDT markers having reference count (semaphore)

2018-07-16 Thread Ravi Bangoria
Userspace Statically Defined Tracepoints[1] are dtrace style markers
inside userspace applications. Applications like PostgreSQL, MySQL,
Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc
have these markers embedded in them. These markers are added by developers
at important places in the code. Each marker source expands to a single
nop instruction in the compiled code, but there may be additional
overhead for computing the marker arguments, which expands to a couple of
instructions. If that overhead matters, execution of it can be
omitted via a runtime if() condition when no one is tracing on the marker:

if (reference_counter > 0) {
Execute marker instructions;
}

Default value of reference counter is 0. Tracer has to increment the 
reference counter before tracing on a marker and decrement it when
done with the tracing.

Currently, the perf tool has limited support for SDT markers. I.e. it
cannot trace markers surrounded by a reference counter. Also, it's
not easy to add reference counter logic in a userspace tool like perf,
so the basic idea for this patchset is to add the reference counter
logic in the uprobe infrastructure. Ex,[2]

  # cat tick.c
... 
for (i = 0; i < 100; i++) {
DTRACE_PROBE1(tick, loop1, i);
if (TICK_LOOP2_ENABLED()) {
DTRACE_PROBE1(tick, loop2, i); 
}
printf("hi: %d\n", i); 
sleep(1);
}   
... 

Here tick:loop1 is marker without reference counter where as tick:loop2
is surrounded by reference counter condition.

  # perf buildid-cache --add /tmp/tick
  # perf probe sdt_tick:loop1
  # perf probe sdt_tick:loop2

  # perf stat -e sdt_tick:loop1,sdt_tick:loop2 -- /tmp/tick
  hi: 0
  hi: 1
  hi: 2
  ^C
  Performance counter stats for '/tmp/tick':
 3  sdt_tick:loop1
 0  sdt_tick:loop2
 2.747086086 seconds time elapsed

Perf failed to record data for tick:loop2. Same experiment with this
patch series:

  # ./perf buildid-cache --add /tmp/tick
  # ./perf probe sdt_tick:loop2
  # ./perf stat -e sdt_tick:loop2 /tmp/tick
hi: 0
hi: 1
hi: 2
^C  
 Performance counter stats for '/tmp/tick':
 3  sdt_tick:loop2
   2.561851452 seconds time elapsed


v6 changes:
 - Do not export struct uprobe outside of uprobe.c. Instead, use
   container_of(arch_uprobe) to regain uprobe in uprobe_write_opcode().
 - Allow 0 as a special value for reference counter _offset_. I.e.
   two uprobes, one having ref_ctr_offset=0 and the other having
   non-zero ref_ctr_offset can coexists.
 - If vma holding reference counter is not present while patching an
   instruction, we add that uprobe in delayed_uprobe_list. When
   appropriate mapping is created, we increment the reference counter
   and remove uprobe from delayed_uprobe_list. While doing all this,
   v5 was searching for all such uprobes in uprobe_tree which is not
   required. Also, uprobes are stored in rbtree with inode+offset as
   the key and v5 was searching uprobe based on inode+ref_ctr_offset
   which is wrong too. Fix this by directly looking at delayed_uprobe_list.
 - Consider VM_SHARED vma as invalid for reference counter. Return
   false from valid_ref_ctr_vma() if vma->vm_flags has VM_SHARED set.
 - No need to use FOLL_FORCE in get_user_pages_remote() while getting
   a page to update reference counter. Remove it.
 - Do not mention usage of reference counter in Documentation/.


v5 can be found at: https://lkml.org/lkml/2018/6/28/51

Ravi Bangoria (6):
  Uprobes: Simplify uprobe_register() body
  Uprobe: Additional argument arch_uprobe to uprobe_write_opcode()
  Uprobes: Support SDT markers having reference count (semaphore)
  trace_uprobe/sdt: Prevent multiple reference counter for same uprobe
  Uprobes/sdt: Prevent multiple reference counter for same uprobe
  perf probe: Support SDT markers having reference counter (semaphore)

 arch/arm/probes/uprobes/core.c |   2 +-
 arch/mips/kernel/uprobes.c |   2 +-
 include/linux/uprobes.h|   7 +-
 kernel/events/uprobes.c| 358 -
 kernel/trace/trace.c   |   2 +-
 kernel/trace/trace_uprobe.c|  78 -
 tools/perf/util/probe-event.c  |  39 -
 tools/perf/util/probe-event.h  |   1 +
 tools/perf/util/probe-file.c   |  34 +++-
 tools/perf/util/probe-file.h   |   1 +
 tools/perf/util/symbol-elf.c   |  46 --
 tools/perf/util/symbol.h   |   7 +
 12 files changed, 503 insertions(+), 74 deletions(-)

-- 
2.14.4



[PATCH v6 3/6] Uprobes: Support SDT markers having reference count (semaphore)

2018-07-16 Thread Ravi Bangoria
Userspace Statically Defined Tracepoints[1] are dtrace-style markers
inside userspace applications. Applications like PostgreSQL, MySQL,
Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc.
have these markers embedded in them. These markers are added by
developers at important places in the code. Each marker source expands
to a single nop instruction in the compiled code, but there may be
additional overhead for computing the marker arguments, which expands
to a couple of instructions. When this overhead matters, the runtime
can skip the marker with an if() condition when no one is tracing it:

if (reference_counter > 0) {
Execute marker instructions;
}

The default value of the reference counter is 0. A tracer has to
increment the reference counter before tracing the marker and
decrement it when done with the tracing.

Implement the reference counter logic in the core uprobe code. Users
will be able to use it from trace_uprobe as well as from kernel
modules. The new trace_uprobe definition with a reference counter is:

<path>:<offset>[(ref_ctr_offset)]

where ref_ctr_offset is an optional field. For kernel modules, a new
variant of uprobe_register() has been introduced:

uprobe_register_refctr(inode, offset, ref_ctr_offset, consumer)

There is no new variant of uprobe_unregister() because a uprobe is
assumed to have only one reference counter.

[1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation

Note: The 'reference counter' is called a 'semaphore' in the original
Dtrace (and Systemtap, bcc and even ELF) documentation and code. But
the term 'semaphore' is misleading in this context. It is just a
counter used to hold the number of tracers tracing a marker; it is not
really used for any synchronization. So we refer to it as a 'reference
counter' in kernel / perf code.

Signed-off-by: Ravi Bangoria 
---
 include/linux/uprobes.h |   5 +
 kernel/events/uprobes.c | 232 ++--
 kernel/trace/trace.c|   2 +-
 kernel/trace/trace_uprobe.c |  38 +++-
 4 files changed, 267 insertions(+), 10 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index bb9d2084af03..103a48a48872 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -123,6 +123,7 @@ extern unsigned long uprobe_get_swbp_addr(struct pt_regs 
*regs);
 extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
 extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct 
*mm, unsigned long vaddr, uprobe_opcode_t);
 extern int uprobe_register(struct inode *inode, loff_t offset, struct 
uprobe_consumer *uc);
+extern int uprobe_register_refctr(struct inode *inode, loff_t offset, loff_t 
ref_ctr_offset, struct uprobe_consumer *uc);
 extern int uprobe_apply(struct inode *inode, loff_t offset, struct 
uprobe_consumer *uc, bool);
 extern void uprobe_unregister(struct inode *inode, loff_t offset, struct 
uprobe_consumer *uc);
 extern int uprobe_mmap(struct vm_area_struct *vma);
@@ -160,6 +161,10 @@ uprobe_register(struct inode *inode, loff_t offset, struct 
uprobe_consumer *uc)
 {
return -ENOSYS;
 }
+static inline int uprobe_register_refctr(struct inode *inode, loff_t offset, 
loff_t ref_ctr_offset, struct uprobe_consumer *uc)
+{
+   return -ENOSYS;
+}
 static inline int
 uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, 
bool add)
 {
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index c0418ba52ba8..84da8512a974 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -73,6 +73,7 @@ struct uprobe {
struct uprobe_consumer  *consumers;
struct inode*inode; /* Also hold a ref to inode */
loff_t  offset;
+   loff_t  ref_ctr_offset;
unsigned long   flags;
 
/*
@@ -88,6 +89,15 @@ struct uprobe {
struct arch_uprobe  arch;
 };
 
+struct delayed_uprobe {
+   struct list_head list;
+   struct uprobe *uprobe;
+   struct mm_struct *mm;
+};
+
+static DEFINE_MUTEX(delayed_uprobe_lock);
+static LIST_HEAD(delayed_uprobe_list);
+
 /*
  * Execute out of line area: anonymous executable mapping installed
  * by the probed task to execute the copy of the original instruction
@@ -282,6 +292,154 @@ static int verify_opcode(struct page *page, unsigned long 
vaddr, uprobe_opcode_t
return 1;
 }
 
+static struct delayed_uprobe *
+delayed_uprobe_check(struct uprobe *uprobe, struct mm_struct *mm)
+{
+   struct delayed_uprobe *du;
+
+   list_for_each_entry(du, &delayed_uprobe_list, list)
+   if (du->uprobe == uprobe && du->mm == mm)
+   return du;
+   return NULL;
+}
+
+static int delayed_uprobe_add(struct uprobe *uprobe, struct mm_struct *mm)
+{
+   struct delayed_uprobe *du;
+
+   if (delayed_uprobe_check(uprobe, mm))
+   return 0;
+
+   du  = kzalloc(sizeof

[PATCH v6 4/6] trace_uprobe/sdt: Prevent multiple reference counter for same uprobe

2018-07-16 Thread Ravi Bangoria
We assume there is only one reference counter per uprobe. Don't
allow the user to add multiple trace_uprobe entries having the
same inode+offset but different reference counters.

Ex,
  # echo "p:sdt_tick/loop2 /home/ravi/tick:0x6e4(0x10036)" > uprobe_events
  # echo "p:sdt_tick/loop2_1 /home/ravi/tick:0x6e4(0xf)" >> uprobe_events
  bash: echo: write error: Invalid argument

  # dmesg
  trace_kprobe: Reference counter offset mismatch.

There are two exceptions though:
1) When the user is replacing the old entry with a new one, we
   allow it as long as the new entry does not conflict with any
   other existing entry.
2) Either of the reference counter offsets is 0 when comparing the
   new entry with the old one.

Signed-off-by: Ravi Bangoria 
---
 kernel/trace/trace_uprobe.c | 40 ++--
 1 file changed, 38 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index bf2be098eb08..9fafb183c49d 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -324,6 +324,38 @@ static int unregister_trace_uprobe(struct trace_uprobe *tu)
return 0;
 }
 
+/*
+ * Uprobe with multiple reference counter is not allowed. i.e.
+ * If inode and offset matches, reference counter offset *must*
+ * match as well. Though, there are two exceptions:
+ * 1) We are replacing old trace_uprobe with new one(same
+ *group/event) then we don't care as far as the new entry
+ *does not conflict with any other existing entry.
+ * 2) Any one of the reference counter offset is 0 while comparing
+ *new entry with old one.
+ */
+static struct trace_uprobe *find_old_trace_uprobe(struct trace_uprobe *new)
+{
+   struct trace_uprobe *tmp, *old = NULL;
+   struct inode *new_inode = d_real_inode(new->path.dentry);
+
+   old = find_probe_event(trace_event_name(&new->tp.call),
+   new->tp.call.class->system);
+
+   list_for_each_entry(tmp, &probe_list, list) {
+   if ((old ? old != tmp : true) &&
+   new_inode == d_real_inode(tmp->path.dentry) &&
+   new->offset == tmp->offset &&
+   new->ref_ctr_offset != tmp->ref_ctr_offset &&
+   new->ref_ctr_offset != 0 &&
+   tmp->ref_ctr_offset != 0) {
+   pr_warn("Reference counter offset mismatch.");
+   return ERR_PTR(-EINVAL);
+   }
+   }
+   return old;
+}
+
 /* Register a trace_uprobe and probe_event */
 static int register_trace_uprobe(struct trace_uprobe *tu)
 {
@@ -333,8 +365,12 @@ static int register_trace_uprobe(struct trace_uprobe *tu)
	mutex_lock(&uprobe_lock);
 
/* register as an event */
-   old_tu = find_probe_event(trace_event_name(&tu->tp.call),
-   tu->tp.call.class->system);
+   old_tu = find_old_trace_uprobe(tu);
+   if (IS_ERR(old_tu)) {
+   ret = PTR_ERR(old_tu);
+   goto end;
+   }
+
if (old_tu) {
/* delete old event */
ret = unregister_trace_uprobe(old_tu);
-- 
2.14.4



[PATCH v6 5/6] Uprobes/sdt: Prevent multiple reference counter for same uprobe

2018-07-16 Thread Ravi Bangoria
We assume there is only one reference counter per uprobe. Don't
allow the user to register multiple uprobes having the same
inode+offset but different reference counters.

Though, existing tools which already support SDT events create a
normal uprobe and update the reference counter on their own. Allow
0 as a special value for the reference counter offset, i.e. two
uprobes, one having ref_ctr_offset=0 and the other having a
non-zero ref_ctr_offset, can coexist. This gives the user the
flexibility to either depend on the kernel uprobe infrastructure
to maintain the reference counter, or to use a normal uprobe and
maintain the reference counter on their own.

Signed-off-by: Ravi Bangoria 
---
 kernel/events/uprobes.c | 52 +++--
 1 file changed, 50 insertions(+), 2 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 84da8512a974..563cc3e625b3 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -63,6 +63,8 @@ static struct percpu_rw_semaphore dup_mmap_sem;
 
 /* Have a copy of original instruction */
 #define UPROBE_COPY_INSN   0
+/* Reference counter offset is reloaded with non-zero value. */
+#define REF_CTR_OFF_RELOADED   1
 
 struct uprobe {
struct rb_node  rb_node;/* node in the rb tree */
@@ -476,9 +478,23 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, 
struct mm_struct *mm,
return ret;
 
	ret = verify_opcode(old_page, vaddr, &opcode);
-   if (ret <= 0)
+   if (ret < 0)
goto put_old;
 
+   /*
+* If instruction is already patched but reference counter offset
+* has been reloaded to non-zero value, increment the reference
+* counter and return.
+*/
+   if (ret == 0) {
+   if (is_register &&
+   test_bit(REF_CTR_OFF_RELOADED, &uprobe->flags)) {
+   WARN_ON(!uprobe->ref_ctr_offset);
+   ret = update_ref_ctr(uprobe, mm, true);
+   }
+   goto put_old;
+   }
+
/* We are going to replace instruction, update ref_ctr. */
if (!ref_ctr_updated && uprobe->ref_ctr_offset) {
ret = update_ref_ctr(uprobe, mm, is_register);
@@ -679,6 +695,30 @@ static struct uprobe *alloc_uprobe(struct inode *inode, 
loff_t offset,
cur_uprobe = insert_uprobe(uprobe);
/* a uprobe exists for this inode:offset combination */
if (cur_uprobe) {
+   /*
+* If inode+offset matches, ref_ctr_offset must match as
+* well. Though, 0 is a special value for ref_ctr_offset.
+*/
+   if (cur_uprobe->ref_ctr_offset != uprobe->ref_ctr_offset &&
+   cur_uprobe->ref_ctr_offset != 0 &&
+   uprobe->ref_ctr_offset != 0) {
+   pr_warn("Err: Reference counter mismatch.\n");
+   put_uprobe(cur_uprobe);
+   kfree(uprobe);
+   return ERR_PTR(-EINVAL);
+   }
+
+   /*
+* If existing uprobe->ref_ctr_offset is 0 and user is
+* registering same uprobe with non-zero ref_ctr_offset,
+* set new ref_ctr_offset to existing uprobe.
+*/
+
+   if (!cur_uprobe->ref_ctr_offset && uprobe->ref_ctr_offset) {
+   cur_uprobe->ref_ctr_offset = uprobe->ref_ctr_offset;
+   set_bit(REF_CTR_OFF_RELOADED, &cur_uprobe->flags);
+   }
+
kfree(uprobe);
uprobe = cur_uprobe;
}
@@ -971,6 +1011,7 @@ register_for_each_vma(struct uprobe *uprobe, struct 
uprobe_consumer *new)
bool is_register = !!new;
struct map_info *info;
int err = 0;
+   bool installed = false;
 
	percpu_down_write(&dup_mmap_sem);
info = build_map_info(uprobe->inode->i_mapping,
@@ -1000,8 +1041,10 @@ register_for_each_vma(struct uprobe *uprobe, struct 
uprobe_consumer *new)
if (is_register) {
/* consult only the "caller", new consumer. */
if (consumer_filter(new,
-   UPROBE_FILTER_REGISTER, mm))
+   UPROBE_FILTER_REGISTER, mm)) {
err = install_breakpoint(uprobe, mm, vma, 
info->vaddr);
+   installed = true;
+   }
		} else if (test_bit(MMF_HAS_UPROBES, &mm->flags)) {
if (!filter_chain(uprobe,
UPROBE_FILTER_UNREGISTER, mm))
@@ -1016,6 +1059,8 @@ register_for_each_vma(struct uprobe *uprobe, struct 
uprobe_consumer *new)
}
  out:
	percpu_up_write(&dup_mmap_sem);
+   if (installed)
+   clear_bit(RE

[PATCH v6 6/6] perf probe: Support SDT markers having reference counter (semaphore)

2018-07-16 Thread Ravi Bangoria
With this, perf buildid-cache will save SDT markers with reference
counter in probe cache. Perf probe will be able to probe markers
having reference counter. Ex,

  # readelf -n /tmp/tick | grep -A1 loop2
Name: loop2
... Semaphore: 0x10020036

  # ./perf buildid-cache --add /tmp/tick
  # ./perf probe sdt_tick:loop2
  # ./perf stat -e sdt_tick:loop2 /tmp/tick
hi: 0
hi: 1
hi: 2
^C
 Performance counter stats for '/tmp/tick':
 3  sdt_tick:loop2
   2.561851452 seconds time elapsed

Signed-off-by: Ravi Bangoria 
Acked-by: Masami Hiramatsu 
Acked-by: Srikar Dronamraju 
---
 tools/perf/util/probe-event.c | 39 
 tools/perf/util/probe-event.h |  1 +
 tools/perf/util/probe-file.c  | 34 ++--
 tools/perf/util/probe-file.h  |  1 +
 tools/perf/util/symbol-elf.c  | 46 ---
 tools/perf/util/symbol.h  |  7 +++
 6 files changed, 106 insertions(+), 22 deletions(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index f119eb628dbb..e86f8be89157 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -1819,6 +1819,12 @@ int parse_probe_trace_command(const char *cmd, struct 
probe_trace_event *tev)
tp->offset = strtoul(fmt2_str, NULL, 10);
}
 
+   if (tev->uprobes) {
+   fmt2_str = strchr(p, '(');
+   if (fmt2_str)
+   tp->ref_ctr_offset = strtoul(fmt2_str + 1, NULL, 0);
+   }
+
tev->nargs = argc - 2;
tev->args = zalloc(sizeof(struct probe_trace_arg) * tev->nargs);
if (tev->args == NULL) {
@@ -2012,6 +2018,22 @@ static int synthesize_probe_trace_arg(struct 
probe_trace_arg *arg,
return err;
 }
 
+static int
+synthesize_uprobe_trace_def(struct probe_trace_event *tev, struct strbuf *buf)
+{
+   struct probe_trace_point *tp = &tev->point;
+   int err;
+
+   err = strbuf_addf(buf, "%s:0x%lx", tp->module, tp->address);
+
+   if (err >= 0 && tp->ref_ctr_offset) {
+   if (!uprobe_ref_ctr_is_supported())
+   return -1;
+   err = strbuf_addf(buf, "(0x%lx)", tp->ref_ctr_offset);
+   }
+   return err >= 0 ? 0 : -1;
+}
+
 char *synthesize_probe_trace_command(struct probe_trace_event *tev)
 {
	struct probe_trace_point *tp = &tev->point;
@@ -2041,15 +2063,17 @@ char *synthesize_probe_trace_command(struct 
probe_trace_event *tev)
}
 
/* Use the tp->address for uprobes */
-   if (tev->uprobes)
-   err = strbuf_addf(&buf, "%s:0x%lx", tp->module, tp->address);
-   else if (!strncmp(tp->symbol, "0x", 2))
+   if (tev->uprobes) {
+   err = synthesize_uprobe_trace_def(tev, &buf);
+   } else if (!strncmp(tp->symbol, "0x", 2)) {
/* Absolute address. See try_to_find_absolute_address() */
		err = strbuf_addf(&buf, "%s%s0x%lx", tp->module ?: "",
  tp->module ? ":" : "", tp->address);
-   else
+   } else {
		err = strbuf_addf(&buf, "%s%s%s+%lu", tp->module ?: "",
tp->module ? ":" : "", tp->symbol, tp->offset);
+   }
+
if (err)
goto error;
 
@@ -2633,6 +2657,13 @@ static void warn_uprobe_event_compat(struct 
probe_trace_event *tev)
 {
int i;
char *buf = synthesize_probe_trace_command(tev);
+   struct probe_trace_point *tp = &tev->point;
+
+   if (tp->ref_ctr_offset && !uprobe_ref_ctr_is_supported()) {
+   pr_warning("A semaphore is associated with %s:%s and "
+  "seems your kernel doesn't support it.\n",
+  tev->group, tev->event);
+   }
 
/* Old uprobe event doesn't support memory dereference */
if (!tev->uprobes || tev->nargs == 0 || !buf)
diff --git a/tools/perf/util/probe-event.h b/tools/perf/util/probe-event.h
index 45b14f020558..15a98c3a2a2f 100644
--- a/tools/perf/util/probe-event.h
+++ b/tools/perf/util/probe-event.h
@@ -27,6 +27,7 @@ struct probe_trace_point {
char*symbol;/* Base symbol */
char*module;/* Module name */
unsigned long   offset; /* Offset from symbol */
+   unsigned long   ref_ctr_offset; /* SDT reference counter offset */
unsigned long   address;/* Actual address of the trace point */
boolretprobe;   /* Return probe flag */
 };
diff --git a/tools/perf/util/probe-file.c b/tools/perf/util/probe-file.c
index b76088fadf3d..aac7817d9e14 100644
--- a/tools/perf/util/probe-file.c

Re: [PATCH 42/46] perf script powerpc: Python script for hypervisor call statistics

2018-06-06 Thread Ravi Bangoria
Hi Paul,

On 06/06/2018 08:23 PM, Paul Clarke wrote:
> On 06/05/2018 12:50 PM, Arnaldo Carvalho de Melo wrote:
>> From: Ravi Bangoria 
>>
>> Add python script to show hypervisor call statistics. Ex,
>>
>>   # perf record -a -e "{powerpc:hcall_entry,powerpc:hcall_exit}"
>>   # perf script -s scripts/python/powerpc-hcalls.py
>> hcall  count   min(ns)   max(ns)   avg(ns)
>> 
>> H_RANDOM82   838  1164   904
>> H_PUT_TCE   47  1078  5928  2003
>> H_EOI  266  1336  3546  1654
>> H_ENTER 28  1646  4038  1952
>> H_PUT_TCE_INDIRECT 230  2166 18168  6109
>> H_IPI  238  1072  3232  1688
>> H_SEND_LOGICAL_LAN  42  5488 21366  7694
>> H_STUFF_TCE294   986  6210  3591
>> H_XIRR 266  2286  6990  3783
>> H_PROTECT   10  2196  3556  2555
>> H_VIO_SIGNAL   294  1028  2784  1311
>> H_ADD_LOGICAL_LAN_BUFFER53  1978  3450  2600
>> H_SEND_CRQ  77  1762  7240  2447
> 
> This translation from HCALL code to mnemonic is more generally useful.  Is 
> there a good way to make the "hcall_table_lookup" function more generally 
> available, like "syscall_name" in 
> scripts/python/Perf-Trace-Util/lib/Perf/Trace/Util.py ?  It's even simpler 
> than the syscall ID-to-name mapping, because the HCALL codes are constant, 
> unlike syscall IDs which vary between arches.
> 

Right, but I don't see any other python script for hcalls. So I
thought to add it in the script itself. Let me know if you are
planning to add any :) Then we can move it into a generic module.

Ravi



Re: [PATCH 0/7] Uprobes: Support SDT markers having reference count (semaphore)

2018-06-10 Thread Ravi Bangoria
Hi Oleg,

On 06/08/2018 10:06 PM, Oleg Nesterov wrote:
> Hello,
> 
> I am travelling till the end of the next week, can't read this version
> until I return. Just one question,
> 
> On 06/06, Ravi Bangoria wrote:
>>
>>  1. One of the major reason was the deadlock between uprobe_lock and
>>  mm->mmap inside trace_uprobe_mmap(). That deadlock was not easy to fix
> 
> Could you remind what exactly was wrong?
> 
> I can't find your previous email about this problem, but iirc you didn't
> explain the deadlock in details, just copied some traces from lockdep...

The deadlock is between mm->mmap_sem and uprobe_lock.

Some existing code path is taking these locks in following order:
uprobe_lock
  event_mutex
uprobe->register_rwsem
  dup_mmap_sem
mm->mmap_sem

I've introduced new function trace_uprobe_mmap() which gets called
from mmap_region() / vma_adjust() with mm->mmap_sem already acquired.
And it has to take uprobe_lock to loop over all trace_uprobes. i.e.
the sequence is:
mm->mmap_sem
  uprobe_lock

It's difficult to resolve because trace_uprobe_mmap() does not
have control over mm->mmap_sem.

Detailed trace from lockdep:

[  499.258006] ==
[  499.258205] WARNING: possible circular locking dependency detected
[  499.258409] 4.17.0-rc3+ #76 Not tainted
[  499.258528] --
[  499.258731] perf/6744 is trying to acquire lock:
[  499.258895] e4895f49 (uprobe_lock){+.+.}, at: 
trace_uprobe_mmap+0x78/0x130
[  499.259147]
[  499.259147] but task is already holding lock:
[  499.259349] 9ec93a76 (&mm->mmap_sem){}, at: 
vm_mmap_pgoff+0xe0/0x160
[  499.259597]
[  499.259597] which lock already depends on the new lock.
[  499.259597]
[  499.259848]
[  499.259848] the existing dependency chain (in reverse order) is:
[  499.260086]
[  499.260086] -> #4 (&mm->mmap_sem){}:
[  499.260277]__lock_acquire+0x53c/0x910
[  499.260442]lock_acquire+0xf4/0x2f0
[  499.260595]down_write_killable+0x6c/0x150
[  499.260764]copy_process.isra.34.part.35+0x1594/0x1be0
[  499.260967]_do_fork+0xf8/0x910
[  499.261090]ppc_clone+0x8/0xc
[  499.261209]
[  499.261209] -> #3 (&dup_mmap_sem){}:
[  499.261378]__lock_acquire+0x53c/0x910
[  499.261540]lock_acquire+0xf4/0x2f0
[  499.261669]down_write+0x6c/0x110
[  499.261793]percpu_down_write+0x48/0x140
[  499.261954]register_for_each_vma+0x6c/0x2a0
[  499.262116]uprobe_register+0x230/0x320
[  499.262277]probe_event_enable+0x1cc/0x540
[  499.262435]perf_trace_event_init+0x1e0/0x350
[  499.262587]perf_trace_init+0xb0/0x110
[  499.262750]perf_tp_event_init+0x38/0x90
[  499.262910]perf_try_init_event+0x10c/0x150
[  499.263075]perf_event_alloc+0xbb0/0xf10
[  499.263235]sys_perf_event_open+0x2a8/0xdd0
[  499.263396]system_call+0x58/0x6c
[  499.263516]
[  499.263516] -> #2 (&uprobe->register_rwsem){}:
[  499.263723]__lock_acquire+0x53c/0x910
[  499.263884]lock_acquire+0xf4/0x2f0
[  499.264002]down_write+0x6c/0x110
[  499.264118]uprobe_register+0x1ec/0x320
[  499.264283]probe_event_enable+0x1cc/0x540
[  499.264442]perf_trace_event_init+0x1e0/0x350
[  499.264603]perf_trace_init+0xb0/0x110
[  499.264766]perf_tp_event_init+0x38/0x90
[  499.264930]perf_try_init_event+0x10c/0x150
[  499.265092]perf_event_alloc+0xbb0/0xf10
[  499.265261]sys_perf_event_open+0x2a8/0xdd0
[  499.265424]system_call+0x58/0x6c
[  499.265542]
[  499.265542] -> #1 (event_mutex){+.+.}:
[  499.265738]__lock_acquire+0x53c/0x910
[  499.265896]lock_acquire+0xf4/0x2f0
[  499.266019]__mutex_lock+0xa0/0xab0
[  499.266142]trace_add_event_call+0x44/0x100
[  499.266310]create_trace_uprobe+0x4a0/0x8b0
[  499.266474]trace_run_command+0xa4/0xc0
[  499.266631]trace_parse_run_command+0xe4/0x200
[  499.266799]probes_write+0x20/0x40
[  499.266922]__vfs_write+0x6c/0x240
[  499.267041]vfs_write+0xd0/0x240
[  499.267166]ksys_write+0x6c/0x110
[  499.267295]system_call+0x58/0x6c
[  499.267413]
[  499.267413] -> #0 (uprobe_lock){+.+.}:
[  499.267591]validate_chain.isra.34+0xbd0/0x1000
[  499.267747]__lock_acquire+0x53c/0x910
[  499.267917]lock_acquire+0xf4/0x2f0
[  499.268048]__mutex_lock+0xa0/0xab0
[  499.268170]trace_uprobe_mmap+0x78/0x130
[  499.268335]uprobe_mmap+0x80/0x3b0
[  499.268464]mmap_region+0x290/0x660
[  499.268590]do_mmap+0x40c/0x500
[  499.268718]vm_mmap_pgoff+0x114/0x160
[  499.268870]ksys_mmap_pgoff+0xe8/0x2e0
[  499.269034]sys_mma

Re: [PATCH 0/7] Uprobes: Support SDT markers having reference count (semaphore)

2018-06-10 Thread Ravi Bangoria
Hi Masami,

>>> Hmm, it sounds simple... maybe we can increment refctr in 
>>> install_breakpoint/
>>> remove_breakpoint?
>>
>> Not really, it would be simpler if I can put it inside install_breakpoint().
>> Consider an mmap() case. Probed instruction resides in the text section 
>> whereas
>> reference counter resides in the data section. These sections gets mapped 
>> using
>> separate mmap() calls. So, when process mmaps the text section we will 
>> change the
>> instruction, but section holding the reference counter may not have been 
>> mapped
>> yet in the virtual memory. If so, we will fail to update the reference 
>> counter.
> 
> Got it. 
> In such case, maybe we can hook the target page mmapped and do 
> install_breakpoint()
> at that point. Since the instruction is protected by a refctr, unless mmap the
> page on where the refctr is, the program doesn't reach the tracepoint. Is 
> that right?
> 

You mean, when mmap(text) happens, save the target page somewhere and when
mmap(data) happens, update both instruction and ref_ctr?

This sounds feasible. Let me think on it.

Thanks for suggestion,
Ravi



Re: [PATCH] perf test: Add watchpoint test

2018-09-11 Thread Ravi Bangoria



On 09/10/2018 11:01 PM, Arnaldo Carvalho de Melo wrote:
> Em Mon, Sep 10, 2018 at 11:18:30AM -0300, Arnaldo Carvalho de Melo escreveu:
>> Em Mon, Sep 10, 2018 at 10:47:54AM -0300, Arnaldo Carvalho de Melo escreveu:
>>> Em Mon, Sep 10, 2018 at 12:31:54PM +0200, Jiri Olsa escreveu:
>>>> On Mon, Sep 10, 2018 at 03:28:11PM +0530, Ravi Bangoria wrote:
>>>>> Ex on powerpc:
>>>>>   $ sudo ./perf test 22
>>>>>   22: Watchpoint:
>>>>>   22.1: Read Only Watchpoint: Ok
>>>>>   22.2: Write Only Watchpoint   : Ok
>>>>>   22.3: Read / Write Watchpoint : Ok
>>>>>   22.4: Modify Watchpoint   : Ok
> 
>>>> cool, thanks!
> 
>>>> Acked-by: Jiri Olsa 
> 
>>> Thanks, applied.
> 
> Oops, fails when cross-building it to mips, I'll try to fix after lunch:


Sorry for the slightly late reply. Will send v2 with the fix.

Thanks
Ravi

> 
>   18   109.48 debian:experimental   : Ok   gcc (Debian 8.2.0-4) 8.2.0
>   19    42.66 debian:experimental-x-arm64   : Ok   aarch64-linux-gnu-gcc 
> (Debian 8.1.0-12) 8.1.0
>   20    22.33 debian:experimental-x-mips    : FAIL mips-linux-gnu-gcc (Debian 
> 8.1.0-12) 8.1.0
>   21    20.05 debian:experimental-x-mips64  : FAIL mips64-linux-gnuabi64-gcc 
> (Debian 8.1.0-12) 8.1.0
>   22    22.85 debian:experimental-x-mipsel  : FAIL mipsel-linux-gnu-gcc 
> (Debian 8.1.0-12) 8.1.0
> 
>   CC   /tmp/build/perf/tests/bp_account.o
>   CC   /tmp/build/perf/tests/wp.o
> tests/wp.c:5:10: fatal error: arch-tests.h: No such file or directory
>  #include "arch-tests.h"
>   ^~
> compilation terminated.
> mv: cannot stat '/tmp/build/perf/tests/.wp.o.tmp': No such file or directory
> make[4]: *** [/git/linux/tools/build/Makefile.build:97: 
> /tmp/build/perf/tests/wp.o] Error 1
> make[4]: *** Waiting for unfinished jobs
>   CC   /tmp/build/perf/util/record.o
>   CC   /tmp/build/perf/util/srcline.o
> make[3]: *** [/git/linux/tools/build/Makefile.build:139: tests] Error 2
> make[2]: *** [Makefile.perf:507: /tmp/build/perf/perf-in.o] Error 2
> make[2]: *** Waiting for unfinished jobs
> 



Re: [PATCH] perf test: Add watchpoint test

2018-09-11 Thread Ravi Bangoria


> While testing, I got curious, as a 'perf test' user, why one of the
> tests had the "Skip" result:
> 
> [root@seventh ~]# perf test watchpoint
> 22: Watchpoint:
> 22.1: Read Only Watchpoint: Skip
> 22.2: Write Only Watchpoint   : Ok
> 22.3: Read / Write Watchpoint : Ok
> 22.4: Modify Watchpoint   : Ok
> [root@seventh ~]#
> 
> I tried with 'perf test -v watchpoint' but that didn't help, perhaps you
> could add some message after the "Skip" telling why it skipped that
> test? I.e. hardware doesn't have that capability, kernel driver not yet
> supporting that, something else?

Sure, will add a message:

pr_debug("Hardware does not support read only watchpoints.");

Ravi



[PATCH v2] perf test: Add watchpoint test

2018-09-12 Thread Ravi Bangoria
We don't have perf test available to test watchpoint functionality.
Add simple set of tests:
 - Read only watchpoint
 - Write only watchpoint
 - Read / Write watchpoint
 - Runtime watchpoint modification

Ex on powerpc:
  $ sudo ./perf test 22
  22: Watchpoint:
  22.1: Read Only Watchpoint: Ok
  22.2: Write Only Watchpoint   : Ok
  22.3: Read / Write Watchpoint : Ok
  22.4: Modify Watchpoint   : Ok

Signed-off-by: Ravi Bangoria 
Acked-by: Jiri Olsa 
---
v1 -> v2:
 - Fix build failure for mips and other archs.
 - Show debug message if subtest is not supported.

 tools/perf/tests/Build  |   1 +
 tools/perf/tests/builtin-test.c |   9 ++
 tools/perf/tests/tests.h|   3 +
 tools/perf/tests/wp.c   | 229 
 4 files changed, 242 insertions(+)
 create mode 100644 tools/perf/tests/wp.c

diff --git a/tools/perf/tests/Build b/tools/perf/tests/Build
index 6c108fa79ae3..0b2b8305c965 100644
--- a/tools/perf/tests/Build
+++ b/tools/perf/tests/Build
@@ -21,6 +21,7 @@ perf-y += python-use.o
 perf-y += bp_signal.o
 perf-y += bp_signal_overflow.o
 perf-y += bp_account.o
+perf-y += wp.o
 perf-y += task-exit.o
 perf-y += sw-clock.o
 perf-y += mmap-thread-lookup.o
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index d7a5e1b9aa6f..54ca7d87236f 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -120,6 +120,15 @@ static struct test generic_tests[] = {
.func = test__bp_accounting,
.is_supported = test__bp_signal_is_supported,
},
+   {
+   .desc = "Watchpoint",
+   .func = test__wp,
+   .subtest = {
+   .skip_if_fail   = false,
+   .get_nr = test__wp_subtest_get_nr,
+   .get_desc   = test__wp_subtest_get_desc,
+   },
+   },
{
.desc = "Number of exit events of a simple workload",
.func = test__task_exit,
diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h
index a9760e790563..8e26a4148f30 100644
--- a/tools/perf/tests/tests.h
+++ b/tools/perf/tests/tests.h
@@ -59,6 +59,9 @@ int test__python_use(struct test *test, int subtest);
 int test__bp_signal(struct test *test, int subtest);
 int test__bp_signal_overflow(struct test *test, int subtest);
 int test__bp_accounting(struct test *test, int subtest);
+int test__wp(struct test *test, int subtest);
+const char *test__wp_subtest_get_desc(int subtest);
+int test__wp_subtest_get_nr(void);
 int test__task_exit(struct test *test, int subtest);
 int test__mem(struct test *test, int subtest);
 int test__sw_clock_freq(struct test *test, int subtest);
diff --git a/tools/perf/tests/wp.c b/tools/perf/tests/wp.c
new file mode 100644
index ..017a99317f94
--- /dev/null
+++ b/tools/perf/tests/wp.c
@@ -0,0 +1,229 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <stdlib.h>
+#include <sys/ioctl.h>
+#include <linux/hw_breakpoint.h>
+#include "tests.h"
+#include "debug.h"
+#include "cloexec.h"
+
+#define WP_TEST_ASSERT_VAL(fd, text, val)   \
+do {\
+   long long count;\
+   wp_read(fd, &count, sizeof(long long)); \
+   TEST_ASSERT_VAL(text, count == val);\
+} while (0)
+
+volatile u64 data1;
+volatile u8 data2[3];
+
+static int wp_read(int fd, long long *count, int size)
+{
+   int ret = read(fd, count, size);
+
+   if (ret != size) {
+   pr_debug("failed to read: %d\n", ret);
+   return -1;
+   }
+   return 0;
+}
+
+static void get__perf_event_attr(struct perf_event_attr *attr, int wp_type,
+void *wp_addr, unsigned long wp_len)
+{
+   memset(attr, 0, sizeof(struct perf_event_attr));
+   attr->type   = PERF_TYPE_BREAKPOINT;
+   attr->size   = sizeof(struct perf_event_attr);
+   attr->config = 0;
+   attr->bp_type= wp_type;
+   attr->bp_addr= (unsigned long)wp_addr;
+   attr->bp_len = wp_len;
+   attr->sample_period  = 1;
+   attr->sample_type= PERF_SAMPLE_IP;
+   attr->exclude_kernel = 1;
+   attr->exclude_hv = 1;
+}
+
+static int __event(int wp_type, void *wp_addr, unsigned long wp_len)
+{
+   int fd;
+   struct perf_event_attr attr;
+
+   get__perf_event_attr(&attr, wp_type, wp_addr, wp_len);
+   fd = sys_perf_event_open(&attr, 0, -1, -1,
+perf_event_open_cloexec_flag());
+   if (fd < 0)
+   pr_debug("failed opening event %x\n", attr.bp_type);
+
+   return fd;
+}
+
+static int wp_ro_test(void)
+{
+   int fd

Re: [PATCH] Uprobes: Fix kernel oops with delayed_uprobe_remove()

2018-11-14 Thread Ravi Bangoria
Hi Oleg,

On 11/14/18 9:36 PM, Oleg Nesterov wrote:
> On 11/14, Ravi Bangoria wrote:
>>
>> syzbot reported a kernel crash with delayed_uprobe_remove():
>>   https://lkml.org/lkml/2018/11/1/1244
>>
>> Backtrace mentioned in the link points to a race between process
>> exit and uprobe_unregister(). Fix it by locking delayed_uprobe_lock
>> before calling delayed_uprobe_remove() from put_uprobe().
> 
> The patch looks good to me, but could you update the changelog?
> 
> Please explain that the exiting task calls uprobe_clear_state() which
> can race with delayed_uprobe_remove(). IIUC this is the only problem
> solved by this patch, right?

Right. Is this better:

There could be a race between task exit and probe unregister:

  exit_mm()
  mmput()
  __mmput()                     uprobe_unregister()
    uprobe_clear_state()          put_uprobe()
      delayed_uprobe_remove()       delayed_uprobe_remove()

put_uprobe() is calling delayed_uprobe_remove() without taking
delayed_uprobe_lock and thus the race sometimes results in a
kernel crash. Fix this by taking delayed_uprobe_lock before
calling delayed_uprobe_remove() from put_uprobe().

Detailed crash log can be found at:
  https://lkml.org/lkml/2018/11/1/1244



[PATCH 1/2] perf stat: Use perf_evsel__is_clock() for clock events

2018-11-15 Thread Ravi Bangoria
We already have a function to check whether a given event is either
SW_CPU_CLOCK or SW_TASK_CLOCK. Utilize it.

Signed-off-by: Ravi Bangoria 
---
 tools/perf/util/stat-shadow.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index 8ad32763cfff..f0a8cec55c47 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -212,8 +212,7 @@ void perf_stat__update_shadow_stats(struct perf_evsel 
*counter, u64 count,
 
count *= counter->scale;
 
-   if (perf_evsel__match(counter, SOFTWARE, SW_TASK_CLOCK) ||
-   perf_evsel__match(counter, SOFTWARE, SW_CPU_CLOCK))
+   if (perf_evsel__is_clock(counter))
update_runtime_stat(st, STAT_NSECS, 0, cpu, count);
else if (perf_evsel__match(counter, HARDWARE, HW_CPU_CYCLES))
update_runtime_stat(st, STAT_CYCLES, ctx, cpu, count);
-- 
2.17.1



[RFC 2/2] perf stat: Fix shadow stats for clock events

2018-11-15 Thread Ravi Bangoria
Commit 0aa802a79469 ("perf stat: Get rid of extra clock display
function") introduced scale and unit for clock events. Thus,
perf_stat__update_shadow_stats() now saves the scaled values of
clock events in msecs instead of the original nsecs. But when
calculating shadow stats we still treat the saved clock event
values as nsecs. This results in wrong shadow stat values, e.g.:

  # ./perf stat -e task-clock,cycles ls

          2.62 msec task-clock:u     #     0.624 CPUs utilized
     2,501,536      cycles:u         # 1250768.000 GHz

Fix this by considering clock events's saved stats in msecs:

  # ./perf stat -e task-clock,cycles ls

          2.42 msec task-clock:u     #     0.754 CPUs utilized
     2,338,747      cycles:u         #     1.169 GHz

Note:
The problem with this approach is that we lose the fractional part
while converting nsecs to msecs. This results in slightly different
shadow stat values.

Reported-by: Anton Blanchard 
Fixes: 0aa802a79469 ("perf stat: Get rid of extra clock display function")
Signed-off-by: Ravi Bangoria 
---
 tools/perf/util/stat-shadow.c | 13 +++--
 tools/perf/util/stat.h|  2 +-
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index f0a8cec55c47..5b28b278a24e 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 #include 
+#include 
 #include "evsel.h"
 #include "stat.h"
 #include "color.h"
@@ -213,7 +214,7 @@ void perf_stat__update_shadow_stats(struct perf_evsel *counter, u64 count,
count *= counter->scale;
 
if (perf_evsel__is_clock(counter))
-   update_runtime_stat(st, STAT_NSECS, 0, cpu, count);
+   update_runtime_stat(st, STAT_MSECS, 0, cpu, count);
else if (perf_evsel__match(counter, HARDWARE, HW_CPU_CYCLES))
update_runtime_stat(st, STAT_CYCLES, ctx, cpu, count);
else if (perf_stat_evsel__is(counter, CYCLES_IN_TX))
@@ -873,10 +874,10 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
} else if (perf_evsel__match(evsel, HARDWARE, HW_STALLED_CYCLES_BACKEND)) {
print_stalled_cycles_backend(config, cpu, evsel, avg, out, st);
} else if (perf_evsel__match(evsel, HARDWARE, HW_CPU_CYCLES)) {
-   total = runtime_stat_avg(st, STAT_NSECS, 0, cpu);
+   total = runtime_stat_avg(st, STAT_MSECS, 0, cpu);
 
if (total) {
-   ratio = avg / total;
+   ratio = avg / (total * NSEC_PER_MSEC);
print_metric(config, ctxp, NULL, "%8.3f", "GHz", ratio);
} else {
print_metric(config, ctxp, NULL, NULL, "Ghz", 0);
@@ -972,14 +973,14 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
} else if (evsel->metric_expr) {
generic_metric(config, evsel->metric_expr, evsel->metric_events, evsel->name,
	       evsel->metric_name, avg, cpu, out, st);
-   } else if (runtime_stat_n(st, STAT_NSECS, 0, cpu) != 0) {
+   } else if (runtime_stat_n(st, STAT_MSECS, 0, cpu) != 0) {
char unit = 'M';
char unit_buf[10];
 
-   total = runtime_stat_avg(st, STAT_NSECS, 0, cpu);
+   total = runtime_stat_avg(st, STAT_MSECS, 0, cpu);
 
if (total)
-   ratio = 1000.0 * avg / total;
+   ratio = 1000.0 * avg / (total * NSEC_PER_MSEC);
if (ratio < 0.001) {
ratio *= 1000;
unit = 'K';
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 2f9c9159a364..941ad4b0836c 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -62,7 +62,7 @@ enum {
 
 enum stat_type {
STAT_NONE = 0,
-   STAT_NSECS,
+   STAT_MSECS, /* Milliseconds */
STAT_CYCLES,
STAT_STALLED_CYCLES_FRONT,
STAT_STALLED_CYCLES_BACK,
-- 
2.17.1



[PATCH] perf stat: Fix shadow stats for clock events

2018-11-15 Thread Ravi Bangoria
Commit 0aa802a79469 ("perf stat: Get rid of extra clock display
function") introduced scale and unit for clock events. Thus,
perf_stat__update_shadow_stats() now saves the scaled values of
clock events in msecs instead of the original nsecs. But when
calculating shadow stats we still treat the saved clock event
values as nsecs. This results in wrong shadow stat values, e.g.:

  # ./perf stat -e task-clock,cycles ls

          2.60 msec task-clock:u     #     0.877 CPUs utilized
     2,430,564      cycles:u         # 1215282.000 GHz

Fix this by saving original nsec values for clock events in
perf_stat__update_shadow_stats(). After patch:

  # ./perf stat -e task-clock,cycles ls

          3.14 msec task-clock:u     #     0.839 CPUs utilized
     3,094,528      cycles:u         #     0.985 GHz

Reported-by: Anton Blanchard 
Suggested-by: Jiri Olsa 
Fixes: 0aa802a79469 ("perf stat: Get rid of extra clock display function")
Signed-off-by: Ravi Bangoria 
---
 tools/perf/util/stat-shadow.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index f0a8cec55c47..3c22c58b3e90 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -209,11 +209,12 @@ void perf_stat__update_shadow_stats(struct perf_evsel *counter, u64 count,
int cpu, struct runtime_stat *st)
 {
int ctx = evsel_context(counter);
+   u64 count_ns = count;
 
count *= counter->scale;
 
if (perf_evsel__is_clock(counter))
-   update_runtime_stat(st, STAT_NSECS, 0, cpu, count);
+   update_runtime_stat(st, STAT_NSECS, 0, cpu, count_ns);
else if (perf_evsel__match(counter, HARDWARE, HW_CPU_CYCLES))
update_runtime_stat(st, STAT_CYCLES, ctx, cpu, count);
else if (perf_stat_evsel__is(counter, CYCLES_IN_TX))
-- 
2.17.1



Re: [PATCH] powerpc/xmon: Fix data-breakpoint

2016-11-22 Thread Ravi Bangoria
Thanks Michael,

On Tuesday 22 November 2016 05:03 PM, Michael Ellerman wrote:
> Ravi Bangoria  writes:
>
>> Xmon data-breakpoint feature is broken.
>>
>> Whenever a watchpoint match occurs, hw_breakpoint_handler() will
>> be called by do_break via the notifier chain mechanism. If the
>> watchpoint was registered by xmon, hw_breakpoint_handler() won't find
>> any associated perf_event and returns immediately with NOTIFY_STOP.
>> Similarly, do_break also returns without notifying xmon.
>>
>> Solve this by returning NOTIFY_DONE when hw_breakpoint_handler does not
>> find any perf_event associated with matched watchpoint.
> .. rather than NOTIFY_STOP, which tells the core code to continue
> calling the other breakpoint handlers including the xmon one.
>
> Right?

Yes.

> Also any idea when we broke this?

Hmm, not sure exactly. The code has been the same since it was merged in
2010, when hw_breakpoint support was added for server processors.

-Ravi

> cheers
>



Re: [PATCH v8 2/3] perf annotate: Support jump instruction with target as second operand

2016-12-05 Thread Ravi Bangoria
Hi Arnaldo,

Hmm, so it's difficult to find an example of this when we use debuginfo.
Because...

Jump__parse tries to look for two things 'offset' and 'target address'.

objdump with debuginfo will include offset in assembly f.e. annotate of
'smp_call_function_single' with perf.data and vmlinux I shared.

   │c016d6ac:   cmpwi  cr7,r9,0
   │c016d6b0: ↑ bne    cr7,c016d59c <.smp_call_function_single+0x8c>
   │c016d6b4:   addis  r10,r2,-15

objdump of same function with kcore.

   │c016d6ac:   cmpwi  cr7,r9,0
   │c016d6b0: ↓ bne    cr7,0xc016d59c
   │c016d6b4:   addis  r10,r2,-15

Annotating in the first case won't show any issue because we directly get
the offset. But in this case as well, we are parsing a wrong target
address into ops->target.addr.

While we don't have the offset in the second case, we use the target
address to find it. And thus it shows wrong output, something like:

   │   cmpwi  cr7,r9,0
   │ ↓ bne    3fe92afc
   │   addis  r10,r2,-15

BTW, we have a lot of such instructions in the kernel.

Thanks,
-Ravi


On Monday 05 December 2016 09:26 PM, Ravi Bangoria wrote:
> Arch like powerpc has jump instructions that includes target address
> as second operand. For example, 'bne  cr7,0xc00f6154'. Add
> support for such instruction in perf annotate.
>
> objdump o/p:
>   c00f6140:   ld r9,1032(r31)
>   c00f6144:   cmpdi  cr7,r9,0
>   c00f6148:   bne    cr7,0xc00f6154
>   c00f614c:   ld r9,2312(r30)
>   c00f6150:   std    r9,1032(r31)
>   c00f6154:   ld r9,88(r31)
>
> Corresponding perf annotate o/p:
>
> Before patch:
>  ld r9,1032(r31)
>  cmpdi  cr7,r9,0
>   v  bne    3ff09f2c
>  ld r9,2312(r30)
>  std    r9,1032(r31)
>   74:    ld r9,88(r31)
>
> After patch:
>  ld r9,1032(r31)
>  cmpdi  cr7,r9,0
>   v  bne    74
>  ld r9,2312(r30)
>  std    r9,1032(r31)
>   74:    ld r9,88(r31)
>
> Signed-off-by: Ravi Bangoria 
> ---
> Changes in v8:
>   - v7: https://lkml.org/lkml/2016/9/21/436
>   - Rebase to acme/perf/core
>   - Little change in patch description.
>   - No logical changes. (Cross arch annotate patches are in. This patch
> is for hardening annotate for powerpc.)
>
>  tools/perf/util/annotate.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
> index ea7e0de..590244e 100644
> --- a/tools/perf/util/annotate.c
> +++ b/tools/perf/util/annotate.c
> @@ -223,8 +223,12 @@ bool ins__is_call(const struct ins *ins)
>  static int jump__parse(struct arch *arch __maybe_unused, struct ins_operands 
> *ops, struct map *map __maybe_unused)
>  {
>   const char *s = strchr(ops->raw, '+');
> + const char *c = strchr(ops->raw, ',');
>
> - ops->target.addr = strtoull(ops->raw, NULL, 16);
> + if (c++ != NULL)
> + ops->target.addr = strtoull(c, NULL, 16);
> + else
> + ops->target.addr = strtoull(ops->raw, NULL, 16);
>
>   if (s++ != NULL)
>   ops->target.offset = strtoull(s, NULL, 16);


