[PATCH v3] tracing/kprobes: expose maxactive for kretprobe in kprobe_events

2017-04-03 Thread Alban Crequy
From: Alban Crequy <al...@kinvolk.io>

When a kretprobe is installed on a kernel function, there is a maximum
limit of how many calls in parallel it can catch (aka "maxactive"). A
kernel module could call register_kretprobe() and initialize maxactive
(see example in samples/kprobes/kretprobe_example.c).

But that is not exposed to userspace and it is currently not possible to
choose maxactive when writing to /sys/kernel/debug/tracing/kprobe_events

The default maxactive can be as low as 1 on single-core with a
non-preemptive kernel. This is too low and we need to increase it not
only for recursive functions, but for functions that sleep or resched.

This patch updates the format of the command that can be written to
kprobe_events so that maxactive can be optionally specified.

I need this for a bpf program attached to the kretprobe of
inet_csk_accept, which can sleep for a long time.

This patch includes a basic selftest:

> # ./ftracetest -v  test.d/kprobe/
> === Ftrace unit tests ===
> [1] Kprobe dynamic event - adding and removing[PASS]
> [2] Kprobe dynamic event - busy event check   [PASS]
> [3] Kprobe dynamic event with arguments   [PASS]
> [4] Kprobes event arguments with types[PASS]
> [5] Kprobe dynamic event with function tracer [PASS]
> [6] Kretprobe dynamic event with arguments[PASS]
> [7] Kretprobe dynamic event with maxactive[PASS]
>
> # of passed:  7
> # of failed:  0
> # of unresolved:  0
> # of untested:  0
> # of unsupported:  0
> # of xfailed:  0
> # of undefined(test bug):  0

BugLink: https://github.com/iovisor/bcc/issues/1072
Signed-off-by: Alban Crequy <al...@kinvolk.io>

---

Changes since v2:
- Explain the default maxactive value in the documentation.
  (Review from Steven Rostedt)

Changes since v1:
- Remove "(*)" from documentation. (Review from Masami Hiramatsu)
- Fix support for "r100" without the event name (Review from Masami Hiramatsu)
- Get rid of magic numbers within the code.  (Review from Steven Rostedt)
  Note that I didn't use KRETPROBE_MAXACTIVE_ALLOC since that patch is not
  merged.
- Return -E2BIG when maxactive is too big.
- Add basic selftest
---
 Documentation/trace/kprobetrace.txt|  5 ++-
 kernel/trace/trace_kprobe.c| 39 ++
 .../ftrace/test.d/kprobe/kretprobe_maxactive.tc| 39 ++
 3 files changed, 76 insertions(+), 7 deletions(-)
 create mode 100644 
tools/testing/selftests/ftrace/test.d/kprobe/kretprobe_maxactive.tc

diff --git a/Documentation/trace/kprobetrace.txt 
b/Documentation/trace/kprobetrace.txt
index 41ef9d8..25f3960 100644
--- a/Documentation/trace/kprobetrace.txt
+++ b/Documentation/trace/kprobetrace.txt
@@ -23,7 +23,7 @@ current_tracer. Instead of that, add probe points via
 Synopsis of kprobe_events
 -
   p[:[GRP/]EVENT] [MOD:]SYM[+offs]|MEMADDR [FETCHARGS] : Set a probe
-  r[:[GRP/]EVENT] [MOD:]SYM[+0] [FETCHARGS]: Set a return probe
+  r[MAXACTIVE][:[GRP/]EVENT] [MOD:]SYM[+0] [FETCHARGS] : Set a return probe
   -:[GRP/]EVENT: Clear a probe
 
  GRP   : Group name. If omitted, use "kprobes" for it.
@@ -32,6 +32,9 @@ Synopsis of kprobe_events
  MOD   : Module name which has given SYM.
  SYM[+offs]: Symbol+offset where the probe is inserted.
  MEMADDR   : Address where the probe is inserted.
+ MAXACTIVE : Maximum number of instances of the specified function that
+ can be probed simultaneously, or 0 for the default value
+ as defined in Documentation/kprobes.txt section 1.3.1.
 
  FETCHARGS : Arguments. Each probe can have up to 128 args.
   %REG : Fetch register REG
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index c5089c7..ae81f3c 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -25,6 +25,7 @@
 #include "trace_probe.h"
 
 #define KPROBE_EVENT_SYSTEM "kprobes"
+#define KRETPROBE_MAXACTIVE_MAX 4096
 
 /**
  * Kprobe event core functions
@@ -282,6 +283,7 @@ static struct trace_kprobe *alloc_trace_kprobe(const char 
*group,
 void *addr,
 const char *symbol,
 unsigned long offs,
+int maxactive,
 int nargs, bool is_return)
 {
struct trace_kprobe *tk;
@@ -309,6 +311,8 @@ static struct trace_kprobe *alloc_trace_kprobe(const char 
*group,
else
tk->rp.kp.pre_handler = kprobe_dispatcher;
 
+   tk->rp.maxactive = maxactive;
+
if (!event || !is_good_name(event)) {
ret = -EINVAL;
goto error;
@@ -598,8 +602,10 @@ static int cr

[PATCH v2] tracing/kprobes: expose maxactive for kretprobe in kprobe_events

2017-03-31 Thread Alban Crequy
When a kretprobe is installed on a kernel function, there is a maximum
limit of how many calls in parallel it can catch (aka "maxactive"). A
kernel module could call register_kretprobe() and initialize maxactive
(see example in samples/kprobes/kretprobe_example.c).

But that is not exposed to userspace and it is currently not possible to
choose maxactive when writing to /sys/kernel/debug/tracing/kprobe_events

The default maxactive can be as low as 1 on single-core with a
non-preemptive kernel. This is too low and we need to increase it not
only for recursive functions, but for functions that sleep or resched.

This patch updates the format of the command that can be written to
kprobe_events so that maxactive can be optionally specified.

I need this for a bpf program attached to the kretprobe of
inet_csk_accept, which can sleep for a long time.

This patch includes a basic selftest:

> # ./ftracetest -v  test.d/kprobe/
> === Ftrace unit tests ===
> [1] Kprobe dynamic event - adding and removing[PASS]
> [2] Kprobe dynamic event - busy event check   [PASS]
> [3] Kprobe dynamic event with arguments   [PASS]
> [4] Kprobes event arguments with types[PASS]
> [5] Kprobe dynamic event with function tracer [PASS]
> [6] Kretprobe dynamic event with arguments[PASS]
> [7] Kretprobe dynamic event with maxactive[PASS]
>
> # of passed:  7
> # of failed:  0
> # of unresolved:  0
> # of untested:  0
> # of unsupported:  0
> # of xfailed:  0
> # of undefined(test bug):  0

BugLink: https://github.com/iovisor/bcc/issues/1072
Signed-off-by: Alban Crequy <al...@kinvolk.io>

---

Changes since v1:
- Remove "(*)" from documentation. (Review from Masami Hiramatsu)
- Fix support for "r100" without the event name (Review from Masami Hiramatsu)
- Get rid of magic numbers within the code.  (Review from Steven Rostedt)
  Note that I didn't use KRETPROBE_MAXACTIVE_ALLOC since that patch is not
  merged.
- Return -E2BIG when maxactive is too big.
- Add basic selftest
---
 Documentation/trace/kprobetrace.txt|  4 ++-
 kernel/trace/trace_kprobe.c| 39 ++
 .../ftrace/test.d/kprobe/kretprobe_maxactive.tc| 39 ++
 3 files changed, 75 insertions(+), 7 deletions(-)
 create mode 100644 
tools/testing/selftests/ftrace/test.d/kprobe/kretprobe_maxactive.tc

diff --git a/Documentation/trace/kprobetrace.txt 
b/Documentation/trace/kprobetrace.txt
index 41ef9d8..7051a20 100644
--- a/Documentation/trace/kprobetrace.txt
+++ b/Documentation/trace/kprobetrace.txt
@@ -23,7 +23,7 @@ current_tracer. Instead of that, add probe points via
 Synopsis of kprobe_events
 -
   p[:[GRP/]EVENT] [MOD:]SYM[+offs]|MEMADDR [FETCHARGS] : Set a probe
-  r[:[GRP/]EVENT] [MOD:]SYM[+0] [FETCHARGS]: Set a return probe
+  r[MAXACTIVE][:[GRP/]EVENT] [MOD:]SYM[+0] [FETCHARGS] : Set a return probe
   -:[GRP/]EVENT: Clear a probe
 
  GRP   : Group name. If omitted, use "kprobes" for it.
@@ -32,6 +32,8 @@ Synopsis of kprobe_events
  MOD   : Module name which has given SYM.
  SYM[+offs]: Symbol+offset where the probe is inserted.
  MEMADDR   : Address where the probe is inserted.
+ MAXACTIVE : Maximum number of instances of the specified function that
+ can be probed simultaneously, or 0 for the default.
 
  FETCHARGS : Arguments. Each probe can have up to 128 args.
   %REG : Fetch register REG
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index c5089c7..ae81f3c 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -25,6 +25,7 @@
 #include "trace_probe.h"
 
 #define KPROBE_EVENT_SYSTEM "kprobes"
+#define KRETPROBE_MAXACTIVE_MAX 4096
 
 /**
  * Kprobe event core functions
@@ -282,6 +283,7 @@ static struct trace_kprobe *alloc_trace_kprobe(const char 
*group,
 void *addr,
 const char *symbol,
 unsigned long offs,
+int maxactive,
 int nargs, bool is_return)
 {
struct trace_kprobe *tk;
@@ -309,6 +311,8 @@ static struct trace_kprobe *alloc_trace_kprobe(const char 
*group,
else
tk->rp.kp.pre_handler = kprobe_dispatcher;
 
+   tk->rp.maxactive = maxactive;
+
if (!event || !is_good_name(event)) {
ret = -EINVAL;
goto error;
@@ -598,8 +602,10 @@ static int create_trace_kprobe(int argc, char **argv)
 {
/*
 * Argument syntax:
-*  - Add kprobe: p[:[GRP/]EVENT] [MOD:]KSYM[+OFFS]|KADDR [FETCHARGS]
-*  - Add kretprobe: r[:[GRP/]EVENT] [MOD:]KSYM[+0] [FETCHARGS

Re: [RFC PATCH tip/master 1/3] trace: kprobes: Show sum of probe/retprobe nmissed count

2017-03-31 Thread Alban Crequy
On Wed, Mar 29, 2017 at 7:22 AM, Masami Hiramatsu <mhira...@kernel.org> wrote:
> Show sum of probe and retprobe nmissed count in
> kprobe_profile, since retprobe can be missed even
> if the kprobe itself succeeeded.
> This explains user why their return probe didn't hit
> sometimes.
>
> Signed-off-by: Masami Hiramatsu <mhira...@kernel.org>

I tested this patch with my kretprobe on "inet_csk_accept" when there
are many processes waiting in the accept() syscall. I can now
successfully see the nmissed counter in
/sys/kernel/debug/tracing/kprobe_profile being incremented when the
kretprobe is missed.

Tested-by: Alban Crequy <al...@kinvolk.io>


> ---
>  kernel/trace/trace_kprobe.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> index 013f4e7..bbdc3de 100644
> --- a/kernel/trace/trace_kprobe.c
> +++ b/kernel/trace/trace_kprobe.c
> @@ -896,7 +896,7 @@ static int probes_profile_seq_show(struct seq_file *m, 
> void *v)
> seq_printf(m, "  %-44s %15lu %15lu\n",
>trace_event_name(>tp.call),
>trace_kprobe_nhit(tk),
> -  tk->rp.kp.nmissed);
> +  tk->rp.kp.nmissed + tk->rp.nmissed);
>
> return 0;
>  }
>
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH tip/master 2/3] kprobes: Allocate kretprobe instance if its free list is empty

2017-03-30 Thread Alban Crequy
On Thu, Mar 30, 2017 at 8:53 AM, Ingo Molnar  wrote:
>
> * Masami Hiramatsu  wrote:
>
>> > So this is something I missed while the original code was merged, but the 
>> > concept
>> > looks a bit weird: why do we do any "allocation" while a handler is 
>> > executing?
>> >
>> > That's fundamentally fragile. What's the maximum number of parallel
>> > 'kretprobe_instance' required per kretprobe - one per CPU?
>>
>> It depends on the place where we put the probe. If the probed function will 
>> be
>> blocked (yield to other tasks), then we need a same number of threads on
>> the system which can invoke the function. So, ultimately, it is same
>> as function_graph tracer, we need it for each thread.
>
> So then put it into task_struct (assuming there's no 
> kretprobe-inside-kretprobe
> nesting allowed). There's just no way in hell we should be calling any complex
> kernel function from kernel probes!

Some kprobes are called from an interruption context. We have a kprobe
on tcp_set_state() and this is sometimes called when the network card
receives a packet.

> I mean, think about it, a kretprobe can be installed in a lot of places, and 
> now
> we want to call get_free_pages() from it?? This would add a massive amount of
> fragility.
>
> Instrumentation must be _simple_, every patch that adds more complexity to the
> most fundamental code path of it should raise a red flag ...
>
> So let's make this more robust, ok?
>
> Thanks,
>
> Ingo

Thanks,
Alban
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1] tracing/kprobes: expose maxactive for kretprobe in kprobe_events

2017-03-28 Thread Alban Crequy
Thanks for the review,

On Tue, Mar 28, 2017 at 5:23 PM, Masami Hiramatsu <mhira...@kernel.org> wrote:
> On Tue, 28 Mar 2017 15:52:22 +0200
> Alban Crequy <alban.cre...@gmail.com> wrote:
>
>> When a kretprobe is installed on a kernel function, there is a maximum
>> limit of how many calls in parallel it can catch (aka "maxactive"). A
>> kernel module could call register_kretprobe() and initialize maxactive
>> (see example in samples/kprobes/kretprobe_example.c).
>>
>> But that is not exposed to userspace and it is currently not possible to
>> choose maxactive when writing to /sys/kernel/debug/tracing/kprobe_events
>
> I see, I found same issue last week...
>
>>
>> The default maxactive can be as low as 1 on single-core with a
>> non-preemptive kernel. This is too low and we need to increase it not
>> only for recursive functions, but for functions that sleep or resched.
>>
>> This patch updates the format of the command that can be written to
>> kprobe_events so that maxactive can be optionally specified.
>
> Good! this is completely same what I'm planning to add.
>
>>
>> I need this for a bpf program attached to the kretprobe of
>> inet_csk_accept, which can sleep for a long time.
>
> I'm also preparing another patch for using kmalloc(GFP_ATOMIC) in
> kretprobe_pre_handler(), since this manual way is sometimes hard to
> estimate how many instances needed. Anyway, since that may involve
> unwilling memory allocation, this patch also needed.

Where is that kretprobe_pre_handler()? I don't see any allocations in
pre_handler_kretprobe().

>> BugLink: https://github.com/iovisor/bcc/issues/1072
>
> Could you also add Lukasz to Cc list, since he had reported an issue
> related this one.
>
> https://www.spinics.net/lists/linux-trace/msg00448.html
>
> Please see my comments below.
>
>> Signed-off-by: Alban Crequy <al...@kinvolk.io>
>> ---
>>  Documentation/trace/kprobetrace.txt |  4 +++-
>>  kernel/trace/trace_kprobe.c | 34 +-
>>  2 files changed, 32 insertions(+), 6 deletions(-)
>>
>> diff --git a/Documentation/trace/kprobetrace.txt 
>> b/Documentation/trace/kprobetrace.txt
>> index 41ef9d8..655ca7e 100644
>> --- a/Documentation/trace/kprobetrace.txt
>> +++ b/Documentation/trace/kprobetrace.txt
>> @@ -23,7 +23,7 @@ current_tracer. Instead of that, add probe points via
>>  Synopsis of kprobe_events
>>  -
>>p[:[GRP/]EVENT] [MOD:]SYM[+offs]|MEMADDR [FETCHARGS]   : Set a probe
>> -  r[:[GRP/]EVENT] [MOD:]SYM[+0] [FETCHARGS]  : Set a return probe
>> +  r[MAXACTIVE][:[GRP/]EVENT] [MOD:]SYM[+0] [FETCHARGS]   : Set a return 
>> probe
>>-:[GRP/]EVENT  : Clear a probe
>>
>>   GRP : Group name. If omitted, use "kprobes" for it.
>> @@ -32,6 +32,8 @@ Synopsis of kprobe_events
>>   MOD : Module name which has given SYM.
>>   SYM[+offs]  : Symbol+offset where the probe is inserted.
>>   MEMADDR : Address where the probe is inserted.
>> + MAXACTIVE   : Maximum number of instances of the specified function that
>> +   can be probed simultaneously, or 0 for the default.(*)
>
> Here, yoy don't need (*), since [MAXACTIVE] is not a FETCHARGS.

Why not? (*) refers to "(*) only for return probe." and the maxactive
only makes sense for the kretprobe.

>>   FETCHARGS   : Arguments. Each probe can have up to 128 args.
>>%REG   : Fetch register REG
>> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
>> index 5f688cc..807e01c 100644
>> --- a/kernel/trace/trace_kprobe.c
>> +++ b/kernel/trace/trace_kprobe.c
>> @@ -282,6 +282,7 @@ static struct trace_kprobe *alloc_trace_kprobe(const 
>> char *group,
>>void *addr,
>>const char *symbol,
>>unsigned long offs,
>> +  int maxactive,
>>int nargs, bool is_return)
>>  {
>>   struct trace_kprobe *tk;
>> @@ -309,6 +310,8 @@ static struct trace_kprobe *alloc_trace_kprobe(const 
>> char *group,
>>   else
>>   tk->rp.kp.pre_handler = kprobe_dispatcher;
>>
>> + tk->rp.maxactive = maxactive;
>> +
>>   if (!event || !is_good_name(event)) {
>>   ret = -EINVAL;
>>   goto error;
>> @@ -598,8 +601,10 @@ static

[PATCH v1] tracing/kprobes: expose maxactive for kretprobe in kprobe_events

2017-03-28 Thread Alban Crequy
When a kretprobe is installed on a kernel function, there is a maximum
limit of how many calls in parallel it can catch (aka "maxactive"). A
kernel module could call register_kretprobe() and initialize maxactive
(see example in samples/kprobes/kretprobe_example.c).

But that is not exposed to userspace and it is currently not possible to
choose maxactive when writing to /sys/kernel/debug/tracing/kprobe_events

The default maxactive can be as low as 1 on single-core with a
non-preemptive kernel. This is too low and we need to increase it not
only for recursive functions, but for functions that sleep or resched.

This patch updates the format of the command that can be written to
kprobe_events so that maxactive can be optionally specified.

I need this for a bpf program attached to the kretprobe of
inet_csk_accept, which can sleep for a long time.

BugLink: https://github.com/iovisor/bcc/issues/1072
Signed-off-by: Alban Crequy <al...@kinvolk.io>
---
 Documentation/trace/kprobetrace.txt |  4 +++-
 kernel/trace/trace_kprobe.c | 34 +-
 2 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/Documentation/trace/kprobetrace.txt 
b/Documentation/trace/kprobetrace.txt
index 41ef9d8..655ca7e 100644
--- a/Documentation/trace/kprobetrace.txt
+++ b/Documentation/trace/kprobetrace.txt
@@ -23,7 +23,7 @@ current_tracer. Instead of that, add probe points via
 Synopsis of kprobe_events
 -
   p[:[GRP/]EVENT] [MOD:]SYM[+offs]|MEMADDR [FETCHARGS] : Set a probe
-  r[:[GRP/]EVENT] [MOD:]SYM[+0] [FETCHARGS]: Set a return probe
+  r[MAXACTIVE][:[GRP/]EVENT] [MOD:]SYM[+0] [FETCHARGS] : Set a return probe
   -:[GRP/]EVENT: Clear a probe
 
  GRP   : Group name. If omitted, use "kprobes" for it.
@@ -32,6 +32,8 @@ Synopsis of kprobe_events
  MOD   : Module name which has given SYM.
  SYM[+offs]: Symbol+offset where the probe is inserted.
  MEMADDR   : Address where the probe is inserted.
+ MAXACTIVE : Maximum number of instances of the specified function that
+ can be probed simultaneously, or 0 for the default.(*)
 
  FETCHARGS : Arguments. Each probe can have up to 128 args.
   %REG : Fetch register REG
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 5f688cc..807e01c 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -282,6 +282,7 @@ static struct trace_kprobe *alloc_trace_kprobe(const char 
*group,
 void *addr,
 const char *symbol,
 unsigned long offs,
+int maxactive,
 int nargs, bool is_return)
 {
struct trace_kprobe *tk;
@@ -309,6 +310,8 @@ static struct trace_kprobe *alloc_trace_kprobe(const char 
*group,
else
tk->rp.kp.pre_handler = kprobe_dispatcher;
 
+   tk->rp.maxactive = maxactive;
+
if (!event || !is_good_name(event)) {
ret = -EINVAL;
goto error;
@@ -598,8 +601,10 @@ static int create_trace_kprobe(int argc, char **argv)
 {
/*
 * Argument syntax:
-*  - Add kprobe: p[:[GRP/]EVENT] [MOD:]KSYM[+OFFS]|KADDR [FETCHARGS]
-*  - Add kretprobe: r[:[GRP/]EVENT] [MOD:]KSYM[+0] [FETCHARGS]
+*  - Add kprobe:
+*  p[:[GRP/]EVENT] [MOD:]KSYM[+OFFS]|KADDR [FETCHARGS]
+*  - Add kretprobe:
+*  r[MAXACTIVE][:[GRP/]EVENT] [MOD:]KSYM[+0] [FETCHARGS]
 * Fetch args:
 *  $retval : fetch return value
 *  $stack  : fetch stack address
@@ -619,6 +624,7 @@ static int create_trace_kprobe(int argc, char **argv)
int i, ret = 0;
bool is_return = false, is_delete = false;
char *symbol = NULL, *event = NULL, *group = NULL;
+   int maxactive = 0;
char *arg;
unsigned long offset = 0;
void *addr = NULL;
@@ -637,8 +643,26 @@ static int create_trace_kprobe(int argc, char **argv)
return -EINVAL;
}
 
-   if (argv[0][1] == ':') {
+   if (is_return && isdigit(argv[0][1]) && strchr([0][1], ':')) {
+   event = strchr([0][1], ':') + 1;
+   event[-1] = '\0';
+   ret = kstrtouint([0][1], 0, );
+   if (ret) {
+   pr_info("Failed to parse maxactive.\n");
+   return ret;
+   }
+   /* kretprobes instances are iterated over via a list. The
+* maximum should stay reasonable.
+*/
+   if (maxactive > 1024) {
+   pr_info("Maxactive is too big.\n");
+   return -EINVAL;
+