Re: [PATCH RFC net-next 11/14] tracing: allow eBPF programs to be attached to events

2014-07-02 Thread Alexei Starovoitov
On Tue, Jul 1, 2014 at 11:39 PM, Namhyung Kim  wrote:
> On Wed, Jul 2, 2014 at 3:14 PM, Alexei Starovoitov  wrote:
>>
>> Can manipulate what at compile time? Entry records of tracepoints are
>> hard coded based on the event. For the verifier it's easier to treat all
>> tracepoint events as if they received the same 'struct bpf_context'
>> of N arguments; then the same program can be attached to multiple
>> tracepoint events at the same time.
>
> I was thinking about perf creating a bpf program for filtering some
> events, like recording kfree_skb if protocol == xx.  So perf can
> calculate the offset and size of the protocol field and make
> appropriate insns for the filter.

When I say 'tracing filter' in patch 11/14, I really mean a
stap/dtrace-like facility for live debugging, where the tracing infra plays
a key role. In the end the programs are written in C with annotations,
and perf orchestrates compilation, insertion, attaching and printing results.
Your meaning of 'tracing filter' is the canonical one: a filter that says
whether an event should be recorded or not. And it makes sense.
When perf sees 'protocol==xx' on the command line it can generate
an eBPF program for it. In that case, does my earlier proposal for replacing
the predicate tree walker with eBPF programs in the kernel become obsolete?
If I understood correctly, you're proposing to teach perf to generate
eBPF programs for the existing command line interface and use them instead
of the predicate tree. This way the predicate tree can be removed, right?
In that case programs would need to access event records.

> Maybe it needs to pass the event format to the verifier somehow then.

The integer fields are easy to verify. The dynamic_array part is tricky, since
the 16-bit offset + 16-bit length accessors are very tracing specific.
I need to think it through.
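
To make the dynamic_array concern concrete, here is a minimal user-space
sketch of what such an accessor has to do. It assumes the locator is one
32-bit word with the offset in the low 16 bits and the length in the high
16 bits; the names dyn_loc, decode_data_loc, dyn_loc_in_bounds and dyn_array
are made up for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: the two halves of a dynamic-array locator. */
struct dyn_loc {
	uint16_t offset;	/* byte offset from the start of the record */
	uint16_t len;		/* length of the dynamic array in bytes */
};

/* Assumption: offset lives in the low half, length in the high half. */
static struct dyn_loc decode_data_loc(uint32_t loc)
{
	struct dyn_loc d = {
		.offset = (uint16_t)(loc & 0xffff),
		.len = (uint16_t)(loc >> 16),
	};
	return d;
}

/* The range check that a verifier or helper would have to enforce. */
static int dyn_loc_in_bounds(struct dyn_loc d, size_t record_size)
{
	return (size_t)d.offset + d.len <= record_size;
}

/* Return a pointer to the array inside the record, or NULL if out of range. */
static const uint8_t *dyn_array(const uint8_t *rec, size_t record_size,
				uint32_t loc, size_t *len)
{
	struct dyn_loc d = decode_data_loc(loc);

	assert(rec != NULL && len != NULL);
	if (!dyn_loc_in_bounds(d, record_size))
		return NULL;
	*len = d.len;
	return rec + d.offset;
}
```

The point of the sketch is that both halves of the locator are only known
at run time, so a static check of a fixed offset is not enough; some form
of run-time bounds check (or a dedicated helper) would be needed.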

> Your scenario looks like just calling a bpf program when it hits an
> event.  It could use event triggering for that purpose IMHO.

Sure. Calling an eBPF program can be one of the event trigger types.
On the other hand, eBPF programs themselves can replace the whole
triggering, filtering and recording code. We can have events that
do nothing or call eBPF programs. Then the programs walk all necessary
data structures, store stuff into maps, etc. Just look at the amount of
events that perf processes. Some of that can be done in the kernel by
a dynamic program.
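
As a rough illustration of "store stuff into maps" instead of recording
every event, the user-space sketch below counts events per key in a small
fixed-size hash table. It is not the eBPF map API; count_map, count_event
and lookup_count are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLOTS 64	/* must be a power of two */

/* Hypothetical stand-in for a map: key -> number of events seen. */
struct count_map {
	uint64_t key[SLOTS];
	uint64_t count[SLOTS];
	uint8_t used[SLOTS];
};

/* Multiplicative hash; the top 6 bits select one of the 64 slots. */
static size_t slot_of(uint64_t key)
{
	return (size_t)((key * 0x9E3779B97F4A7C15ULL) >> 58);
}

/* Called once per event: bump the counter for this key. -1 if the map is full. */
static int count_event(struct count_map *m, uint64_t key)
{
	size_t start = slot_of(key);

	assert(m != NULL);
	for (size_t i = 0; i < SLOTS; i++) {
		size_t s = (start + i) & (SLOTS - 1);

		if (!m->used[s]) {
			m->used[s] = 1;
			m->key[s] = key;
			m->count[s] = 1;
			return 0;
		}
		if (m->key[s] == key) {
			m->count[s]++;
			return 0;
		}
	}
	return -1;
}

/* Read side: what user space would fetch once, instead of N event records. */
static uint64_t lookup_count(const struct count_map *m, uint64_t key)
{
	size_t start = slot_of(key);

	assert(m != NULL);
	for (size_t i = 0; i < SLOTS; i++) {
		size_t s = (start + i) & (SLOTS - 1);

		if (!m->used[s])
			return 0;
		if (m->key[s] == key)
			return m->count[s];
	}
	return 0;
}
```

With this shape, a thousand events for the same key cost one map slot and
one read from user space, rather than a thousand records in the ring buffer.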
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC net-next 11/14] tracing: allow eBPF programs to be attached to events

2014-07-02 Thread Namhyung Kim
On Wed, Jul 2, 2014 at 3:14 PM, Alexei Starovoitov  wrote:
> On Tue, Jul 1, 2014 at 10:32 PM, Namhyung Kim  wrote:
>> On Fri, 27 Jun 2014 17:06:03 -0700, Alexei Starovoitov wrote:
>>> User interface:
>>> cat bpf_123 > /sys/kernel/debug/tracing/__event__/filter
>>>
>>> where 123 is an id of the eBPF program priorly loaded.
>>> __event__ is static tracepoint event.
>>> (kprobe events will be supported in the future patches)
>>>
>>> eBPF programs can call in-kernel helper functions to:
>>> - lookup/update/delete elements in maps
>>> - memcmp
>>> - trace_printk
>>
>> ISTR Steve doesn't like to use trace_printk() (at least for production
>> kernels) anymore.  And I'm not sure it'd work if there's no existing
>> trace_printk() on a system.
>
> Yes, I saw the big warning that trace_printk_init_buffers() emits.
> The idea here is to use eBPF programs for live kernel debugging.
> Instead of adding printk() and recompiling, just write a program,
> attach it to some event, and printk whatever is interesting.
> My only concern about printk() was that it dumps things into trace
> buffers (which is still better than dumping stuff to syslog), but now
> (since Andy almost convinced me to switch to an 'fd' based interface)
> we can have something seq_printk-like that prints into a special buffer,
> so that user space does 'read(ufd)' and receives whatever the program has
> printed. I think that would be much cleaner.
>
>>> + if (unlikely(ftrace_file->flags & FTRACE_EVENT_FL_FILTERED) &&  \
>>> + unlikely(ftrace_file->event_call->flags & TRACE_EVENT_FL_BPF)) { \
>>> + struct bpf_context __ctx;   \
>>> + \
>>> + populate_bpf_context(&__ctx, args, 0, 0, 0, 0, 0);  \
>>> + trace_filter_call_bpf(ftrace_file->filter, &__ctx); \
>>> + return; \
>>> + }   \
>>> + \
>>
>> Hmm..  But it seems the eBPF prog is not a filter - it'd always drop the
>> event.  And I think it's better to use a recorded entry rather than args
>> as a bpf_context so that tools like perf can manipulate it at compile
>> time based on the event format.
>
> Can manipulate what at compile time? Entry records of tracepoints are
> hard coded based on the event. For the verifier it's easier to treat all
> tracepoint events as if they received the same 'struct bpf_context'
> of N arguments; then the same program can be attached to multiple
> tracepoint events at the same time.

I was thinking about perf creating a bpf program for filtering some
events, like recording kfree_skb if protocol == xx.  So perf can
calculate the offset and size of the protocol field and make
appropriate insns for the filter.

Maybe it needs to pass the event format to the verifier somehow then.
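
A minimal user-space sketch of that idea, assuming the tool has already
obtained the offset and size of an integer field from the event format:
it loads the field from a raw record and compares it with a constant.
The names field_desc and field_equals, and the offsets used with them,
are hypothetical and not taken from any real event.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of one integer field of a recorded entry. */
struct field_desc {
	size_t offset;	/* byte offset inside the record */
	size_t size;	/* 1, 2, 4 or 8 bytes */
};

/* Return 1 if the field equals 'want', 0 otherwise or on a bad access. */
static int field_equals(const void *rec, size_t rec_size,
			struct field_desc f, uint64_t want)
{
	const uint8_t *p;
	uint64_t v;

	assert(rec != NULL);
	/* Bounds check: this is what knowing the event format buys us. */
	if (f.offset > rec_size || f.size > rec_size - f.offset)
		return 0;
	p = (const uint8_t *)rec + f.offset;

	/* Load with the field's own width so the host byte order does not matter. */
	switch (f.size) {
	case 1: { uint8_t t;  memcpy(&t, p, sizeof(t)); v = t; break; }
	case 2: { uint16_t t; memcpy(&t, p, sizeof(t)); v = t; break; }
	case 4: { uint32_t t; memcpy(&t, p, sizeof(t)); v = t; break; }
	case 8: { uint64_t t; memcpy(&t, p, sizeof(t)); v = t; break; }
	default: return 0;
	}
	return v == want;
}
```

Because offset and size are constants known when the program is generated,
the bounds check can in principle be done once up front against the event
format, which is the information the verifier would need to be given.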


> I thought about making the verifier specific for _every_ tracepoint event,
> but it complicates the user interface, since 'bpf_context' is now different
> for every program. I think args are much easier to deal with from a C
> programming point of view, since the program can go and fetch the same
> fields that the tracepoint 'fast_assign' macro does.
> Also, skipping buffer allocation and fast_assign gives a very sizable
> performance boost, since the program will access only what it needs to.
>
> The return value of the eBPF program is ignored, since I couldn't think
> of a use case for it. We can change it to be more 'filter' like and interpret
> the return value as true/false, whether to record this event or not. Thoughts?

Your scenario looks like just calling a bpf program when it hits an
event.  It could use event triggering for that purpose IMHO.

But for filtering, it needs to add checking of the return value.

Thanks,
Namhyung


Re: [PATCH RFC net-next 11/14] tracing: allow eBPF programs to be attached to events

2014-07-02 Thread Alexei Starovoitov
On Tue, Jul 1, 2014 at 10:32 PM, Namhyung Kim  wrote:
> On Fri, 27 Jun 2014 17:06:03 -0700, Alexei Starovoitov wrote:
>> User interface:
>> cat bpf_123 > /sys/kernel/debug/tracing/__event__/filter
>>
>> where 123 is an id of the eBPF program priorly loaded.
>> __event__ is static tracepoint event.
>> (kprobe events will be supported in the future patches)
>>
>> eBPF programs can call in-kernel helper functions to:
>> - lookup/update/delete elements in maps
>> - memcmp
>> - trace_printk
>
> ISTR Steve doesn't like to use trace_printk() (at least for production
> kernels) anymore.  And I'm not sure it'd work if there's no existing
> trace_printk() on a system.

Yes, I saw the big warning that trace_printk_init_buffers() emits.
The idea here is to use eBPF programs for live kernel debugging.
Instead of adding printk() and recompiling, just write a program,
attach it to some event, and printk whatever is interesting.
My only concern about printk() was that it dumps things into trace
buffers (which is still better than dumping stuff to syslog), but now
(since Andy almost convinced me to switch to an 'fd' based interface)
we can have something seq_printk-like that prints into a special buffer,
so that user space does 'read(ufd)' and receives whatever the program has
printed. I think that would be much cleaner.

>> + if (unlikely(ftrace_file->flags & FTRACE_EVENT_FL_FILTERED) &&  \
>> + unlikely(ftrace_file->event_call->flags & TRACE_EVENT_FL_BPF)) { \
>> + struct bpf_context __ctx;   \
>> + \
>> + populate_bpf_context(&__ctx, args, 0, 0, 0, 0, 0);  \
>> + trace_filter_call_bpf(ftrace_file->filter, &__ctx); \
>> + return; \
>> + }   \
>> + \
>
> Hmm..  But it seems the eBPF prog is not a filter - it'd always drop the
> event.  And I think it's better to use a recorded entry rather than args
> as a bpf_context so that tools like perf can manipulate it at compile
> time based on the event format.

Can manipulate what at compile time? Entry records of tracepoints are
hard coded based on the event. For the verifier it's easier to treat all
tracepoint events as if they received the same 'struct bpf_context'
of N arguments; then the same program can be attached to multiple
tracepoint events at the same time.
I thought about making the verifier specific for _every_ tracepoint event,
but it complicates the user interface, since 'bpf_context' is now different
for every program. I think args are much easier to deal with from a C
programming point of view, since the program can go and fetch the same
fields that the tracepoint 'fast_assign' macro does.
Also, skipping buffer allocation and fast_assign gives a very sizable
performance boost, since the program will access only what it needs to.
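
For readers wondering how one 'struct bpf_context' of six slots can be
filled from tracepoints with different argument lists, here is a
user-space sketch of the variadic approach suggested by the
populate_bpf_context(&__ctx, args, 0, 0, 0, 0, 0) call in the patch.
The body of the real helper is not shown in this thread, so
populate_ctx_sketch below is only a guess at the technique.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Same layout as in the patch: the first six tracepoint arguments. */
struct bpf_context {
	unsigned long arg1;
	unsigned long arg2;
	unsigned long arg3;
	unsigned long arg4;
	unsigned long arg5;
	unsigned long arg6;
};

/*
 * Hypothetical sketch: read six values from the variable argument list.
 * The caller pads with zeros, so six values are always available as long
 * as the tracepoint has at least one argument. Every value must be passed
 * as unsigned long for va_arg() to be well defined.
 */
static void populate_ctx_sketch(struct bpf_context *ctx, ...)
{
	va_list ap;

	assert(ctx != NULL);
	va_start(ap, ctx);
	ctx->arg1 = va_arg(ap, unsigned long);
	ctx->arg2 = va_arg(ap, unsigned long);
	ctx->arg3 = va_arg(ap, unsigned long);
	ctx->arg4 = va_arg(ap, unsigned long);
	ctx->arg5 = va_arg(ap, unsigned long);
	ctx->arg6 = va_arg(ap, unsigned long);
	va_end(ap);
}
```

A tracepoint with two arguments would then be captured as
populate_ctx_sketch(&ctx, a, b, 0UL, 0UL, 0UL, 0UL), leaving arg3..arg6
zero. Arguments beyond the sixth are simply never read.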

The return value of the eBPF program is ignored, since I couldn't think
of a use case for it. We can change it to be more 'filter' like and interpret
the return value as true/false, whether to record this event or not. Thoughts?
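
A small user-space sketch of the 'filter'-like alternative, where the
program's return value decides whether the event is recorded. The names
filter_prog_t, should_record and keep_if_arg1_set are hypothetical; the
posted patch ignores the return value and returns before recording.

```c
#include <assert.h>
#include <stddef.h>

struct bpf_context {
	unsigned long arg1;
	unsigned long arg2;
	unsigned long arg3;
	unsigned long arg4;
	unsigned long arg5;
	unsigned long arg6;
};

/* Hypothetical program type: non-zero means "record this event". */
typedef unsigned long (*filter_prog_t)(const struct bpf_context *ctx);

/* Decide whether the normal recording path should run for this event. */
static int should_record(filter_prog_t prog, const struct bpf_context *ctx)
{
	assert(ctx != NULL);
	if (prog == NULL)
		return 1;	/* no program attached: record as usual */
	return prog(ctx) != 0;
}

/* Example program: keep only events whose first argument is non-zero. */
static unsigned long keep_if_arg1_set(const struct bpf_context *ctx)
{
	return ctx->arg1 != 0;
}
```

In the tracepoint handler this would replace the unconditional 'return'
after the program call with something like
'if (!should_record(prog, &ctx)) return;', so that accepted events still
fall through to buffer reservation and fast_assign.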


Re: [PATCH RFC net-next 11/14] tracing: allow eBPF programs to be attached to events

2014-07-01 Thread Namhyung Kim
On Fri, 27 Jun 2014 17:06:03 -0700, Alexei Starovoitov wrote:
> User interface:
> cat bpf_123 > /sys/kernel/debug/tracing/__event__/filter
>
> where 123 is an id of the eBPF program priorly loaded.
> __event__ is static tracepoint event.
> (kprobe events will be supported in the future patches)
>
> eBPF programs can call in-kernel helper functions to:
> - lookup/update/delete elements in maps
> - memcmp
> - trace_printk

ISTR Steve doesn't like to use trace_printk() (at least for production
kernels) anymore.  And I'm not sure it'd work if there's no existing
trace_printk() on a system.

> - load_pointer
> - dump_stack


[SNIP]
> @@ -634,6 +635,15 @@ ftrace_raw_event_##call(void *__data, proto) \
>   if (ftrace_trigger_soft_disabled(ftrace_file))  \
>   return; \
>   \
> + if (unlikely(ftrace_file->flags & FTRACE_EVENT_FL_FILTERED) &&  \
> + unlikely(ftrace_file->event_call->flags & TRACE_EVENT_FL_BPF)) { \
> + struct bpf_context __ctx;   \
> + \
> + populate_bpf_context(&__ctx, args, 0, 0, 0, 0, 0);  \
> + trace_filter_call_bpf(ftrace_file->filter, &__ctx); \
> + return; \
> + }   \
> + \

Hmm..  But it seems the eBPF prog is not a filter - it'd always drop the
event.  And I think it's better to use a recorded entry rather than args
as a bpf_context so that tools like perf can manipulate it at compile
time based on the event format.

Thanks,
Namhyung


>   __data_size = ftrace_get_offsets_##call(&__data_offsets, args); \
>   \
>   entry = ftrace_event_buffer_reserve(&fbuffer, ftrace_file,  \


Re: [PATCH RFC net-next 11/14] tracing: allow eBPF programs to be attached to events

2014-07-01 Thread Alexei Starovoitov
On Tue, Jul 1, 2014 at 1:30 AM, Daniel Borkmann  wrote:
> On 06/28/2014 02:06 AM, Alexei Starovoitov wrote:
>>
>> User interface:
>> cat bpf_123 > /sys/kernel/debug/tracing/__event__/filter
>>
>> where 123 is an id of the eBPF program priorly loaded.
>> __event__ is static tracepoint event.
>> (kprobe events will be supported in the future patches)
>>
>> eBPF programs can call in-kernel helper functions to:
>> - lookup/update/delete elements in maps
>> - memcmp
>> - trace_printk
>> - load_pointer
>> - dump_stack
>
>
> Are there plans to let eBPF replace the generic event
> filtering framework in tracing?

Yes. The other patch, which replaces predicate tree walking with
eBPF programs, is pending on the eBPF split out of networking.


Re: [PATCH RFC net-next 11/14] tracing: allow eBPF programs to be attached to events

2014-07-01 Thread Daniel Borkmann

On 06/28/2014 02:06 AM, Alexei Starovoitov wrote:
> User interface:
> cat bpf_123 > /sys/kernel/debug/tracing/__event__/filter
>
> where 123 is an id of the eBPF program priorly loaded.
> __event__ is static tracepoint event.
> (kprobe events will be supported in the future patches)
>
> eBPF programs can call in-kernel helper functions to:
> - lookup/update/delete elements in maps
> - memcmp
> - trace_printk
> - load_pointer
> - dump_stack

Are there plans to let eBPF replace the generic event
filtering framework in tracing?

> Signed-off-by: Alexei Starovoitov
> ---
>   include/linux/ftrace_event.h   |5 +
>   include/trace/bpf_trace.h  |   29 +
>   include/trace/ftrace.h |   10 ++
>   include/uapi/linux/bpf.h   |5 +
>   kernel/trace/Kconfig   |1 +
>   kernel/trace/Makefile  |1 +
>   kernel/trace/bpf_trace.c   |  217 
>   kernel/trace/trace.h   |3 +
>   kernel/trace/trace_events.c|7 ++
>   kernel/trace/trace_events_filter.c |   72 +++-
>   10 files changed, 349 insertions(+), 1 deletion(-)
>   create mode 100644 include/trace/bpf_trace.h
>   create mode 100644 kernel/trace/bpf_trace.c



[PATCH RFC net-next 11/14] tracing: allow eBPF programs to be attached to events

2014-06-27 Thread Alexei Starovoitov
User interface:
cat bpf_123 > /sys/kernel/debug/tracing/__event__/filter

where 123 is an id of the eBPF program priorly loaded.
__event__ is static tracepoint event.
(kprobe events will be supported in the future patches)

eBPF programs can call in-kernel helper functions to:
- lookup/update/delete elements in maps
- memcmp
- trace_printk
- load_pointer
- dump_stack

Signed-off-by: Alexei Starovoitov 
---
 include/linux/ftrace_event.h   |5 +
 include/trace/bpf_trace.h  |   29 +
 include/trace/ftrace.h |   10 ++
 include/uapi/linux/bpf.h   |5 +
 kernel/trace/Kconfig   |1 +
 kernel/trace/Makefile  |1 +
 kernel/trace/bpf_trace.c   |  217 
 kernel/trace/trace.h   |3 +
 kernel/trace/trace_events.c|7 ++
 kernel/trace/trace_events_filter.c |   72 +++-
 10 files changed, 349 insertions(+), 1 deletion(-)
 create mode 100644 include/trace/bpf_trace.h
 create mode 100644 kernel/trace/bpf_trace.c

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index cff3106ffe2c..de313bd9a434 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -237,6 +237,7 @@ enum {
TRACE_EVENT_FL_WAS_ENABLED_BIT,
TRACE_EVENT_FL_USE_CALL_FILTER_BIT,
TRACE_EVENT_FL_TRACEPOINT_BIT,
+   TRACE_EVENT_FL_BPF_BIT,
 };
 
 /*
@@ -259,6 +260,7 @@ enum {
TRACE_EVENT_FL_WAS_ENABLED  = (1 << TRACE_EVENT_FL_WAS_ENABLED_BIT),
TRACE_EVENT_FL_USE_CALL_FILTER  = (1 << TRACE_EVENT_FL_USE_CALL_FILTER_BIT),
TRACE_EVENT_FL_TRACEPOINT   = (1 << TRACE_EVENT_FL_TRACEPOINT_BIT),
+   TRACE_EVENT_FL_BPF  = (1 << TRACE_EVENT_FL_BPF_BIT),
 };
 
 struct ftrace_event_call {
@@ -536,6 +538,9 @@ event_trigger_unlock_commit_regs(struct ftrace_event_file *file,
event_triggers_post_call(file, tt);
 }
 
+struct bpf_context;
+void trace_filter_call_bpf(struct event_filter *filter, struct bpf_context *ctx);
+
 enum {
FILTER_OTHER = 0,
FILTER_STATIC_STRING,
diff --git a/include/trace/bpf_trace.h b/include/trace/bpf_trace.h
new file mode 100644
index ..2122437f1317
--- /dev/null
+++ b/include/trace/bpf_trace.h
@@ -0,0 +1,29 @@
+/* Copyright (c) 2011-2014 PLUMgrid, http://plumgrid.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#ifndef _LINUX_KERNEL_BPF_TRACE_H
+#define _LINUX_KERNEL_BPF_TRACE_H
+
+/* For tracing filters save first six arguments of tracepoint events.
+ * On 64-bit architectures argN fields will match one to one to arguments passed
+ * to tracepoint events.
+ * On 32-bit architectures u64 arguments to events will be seen into two
+ * consecutive argN, argN+1 fields. Pointers, u32, u16, u8, bool types will
+ * match one to one
+ */
+struct bpf_context {
+   unsigned long arg1;
+   unsigned long arg2;
+   unsigned long arg3;
+   unsigned long arg4;
+   unsigned long arg5;
+   unsigned long arg6;
+};
+
+/* call from ftrace_raw_event_*() to copy tracepoint arguments into ctx */
+void populate_bpf_context(struct bpf_context *ctx, ...);
+
+#endif /* _LINUX_KERNEL_BPF_TRACE_H */
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 26b4f2e13275..ad4987ac68bb 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -17,6 +17,7 @@
  */
 
 #include <linux/ftrace_event.h>
+#include <trace/bpf_trace.h>
 
 /*
  * DECLARE_EVENT_CLASS can be used to add a generic function
@@ -634,6 +635,15 @@ ftrace_raw_event_##call(void *__data, proto)   \
if (ftrace_trigger_soft_disabled(ftrace_file))  \
return; \
\
+   if (unlikely(ftrace_file->flags & FTRACE_EVENT_FL_FILTERED) &&  \
+   unlikely(ftrace_file->event_call->flags & TRACE_EVENT_FL_BPF)) { \
+   struct bpf_context __ctx;   \
+   \
+   populate_bpf_context(&__ctx, args, 0, 0, 0, 0, 0);  \
+   trace_filter_call_bpf(ftrace_file->filter, &__ctx); \
+   return; \
+   }   \
+   \
__data_size = ftrace_get_offsets_##call(&__data_offsets, args); \
\
entry = ftrace_event_buffer_reserve(&fbuffer, ftrace_file,  \
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 03c65eedd3d5..d03b8b39e031 100644
--- a/include/uapi/linux/bpf.h
+++ 

[PATCH RFC net-next 11/14] tracing: allow eBPF programs to be attached to events

2014-06-27 Thread Alexei Starovoitov
User interface:
cat bpf_123 > /sys/kernel/debug/tracing/__event__/filter

where 123 is the id of a previously loaded eBPF program.
__event__ is a static tracepoint event.
(kprobe events will be supported in future patches)
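
For illustration only, the user interface above can be driven from a small C program instead of the shell. This is a minimal sketch, not part of the patch: format_bpf_filter() and attach_bpf_filter() are hypothetical helper names, and the sketch assumes the filter file accepts the plain string "bpf_<id>" exactly as in the command line above.

```c
#include <stdio.h>

/* Hypothetical helper (not in the patch): build the filter string
 * "bpf_<id>" for an already loaded eBPF program id.
 * Returns the string length, or -1 if buf is too small.
 */
static int format_bpf_filter(char *buf, size_t len, unsigned int prog_id)
{
	int n = snprintf(buf, len, "bpf_%u", prog_id);

	return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Hypothetical helper (not in the patch): write "bpf_<id>" into the
 * filter file of an event. The caller supplies filter_path.
 * Returns 0 on success, -1 on failure.
 */
static int attach_bpf_filter(const char *filter_path, unsigned int prog_id)
{
	char buf[32];
	FILE *f;
	int ok;

	if (format_bpf_filter(buf, sizeof(buf), prog_id) < 0)
		return -1;

	f = fopen(filter_path, "w");
	if (!f)
		return -1;

	ok = fputs(buf, f) >= 0;
	/* fclose() flushes the write, so its result matters too */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

A caller would pass the event's filter path, for example attach_bpf_filter("/sys/kernel/debug/tracing/__event__/filter", 123), with __event__ replaced by a real tracepoint event and 123 by the id returned when the program was loaded.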

eBPF programs can call in-kernel helper functions to:
- lookup/update/delete elements in maps
- memcmp
- trace_printk
- load_pointer
- dump_stack

Signed-off-by: Alexei Starovoitov <a...@plumgrid.com>
---
 include/linux/ftrace_event.h   |5 +
 include/trace/bpf_trace.h  |   29 +
 include/trace/ftrace.h |   10 ++
 include/uapi/linux/bpf.h   |5 +
 kernel/trace/Kconfig   |1 +
 kernel/trace/Makefile  |1 +
 kernel/trace/bpf_trace.c   |  217 
 kernel/trace/trace.h   |3 +
 kernel/trace/trace_events.c|7 ++
 kernel/trace/trace_events_filter.c |   72 +++-
 10 files changed, 349 insertions(+), 1 deletion(-)
 create mode 100644 include/trace/bpf_trace.h
 create mode 100644 kernel/trace/bpf_trace.c

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index cff3106ffe2c..de313bd9a434 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -237,6 +237,7 @@ enum {
TRACE_EVENT_FL_WAS_ENABLED_BIT,
TRACE_EVENT_FL_USE_CALL_FILTER_BIT,
TRACE_EVENT_FL_TRACEPOINT_BIT,
+   TRACE_EVENT_FL_BPF_BIT,
 };
 
 /*
@@ -259,6 +260,7 @@ enum {
TRACE_EVENT_FL_WAS_ENABLED  = (1 << TRACE_EVENT_FL_WAS_ENABLED_BIT),
TRACE_EVENT_FL_USE_CALL_FILTER  = (1 << TRACE_EVENT_FL_USE_CALL_FILTER_BIT),
TRACE_EVENT_FL_TRACEPOINT   = (1 << TRACE_EVENT_FL_TRACEPOINT_BIT),
+   TRACE_EVENT_FL_BPF  = (1 << TRACE_EVENT_FL_BPF_BIT),
 };
 
 struct ftrace_event_call {
@@ -536,6 +538,9 @@ event_trigger_unlock_commit_regs(struct ftrace_event_file *file,
event_triggers_post_call(file, tt);
 }
 
+struct bpf_context;
+void trace_filter_call_bpf(struct event_filter *filter, struct bpf_context *ctx);
+
 enum {
FILTER_OTHER = 0,
FILTER_STATIC_STRING,
diff --git a/include/trace/bpf_trace.h b/include/trace/bpf_trace.h
new file mode 100644
index ..2122437f1317
--- /dev/null
+++ b/include/trace/bpf_trace.h
@@ -0,0 +1,29 @@
+/* Copyright (c) 2011-2014 PLUMgrid, http://plumgrid.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#ifndef _LINUX_KERNEL_BPF_TRACE_H
+#define _LINUX_KERNEL_BPF_TRACE_H
+
+/* For tracing filters save first six arguments of tracepoint events.
+ * On 64-bit architectures argN fields will match one to one to arguments passed
+ * to tracepoint events.
+ * On 32-bit architectures u64 arguments to events will be seen into two
+ * consecutive argN, argN+1 fields. Pointers, u32, u16, u8, bool types will
+ * match one to one
+ */
+struct bpf_context {
+   unsigned long arg1;
+   unsigned long arg2;
+   unsigned long arg3;
+   unsigned long arg4;
+   unsigned long arg5;
+   unsigned long arg6;
+};
+
+/* call from ftrace_raw_event_*() to copy tracepoint arguments into ctx */
+void populate_bpf_context(struct bpf_context *ctx, ...);
+
+#endif /* _LINUX_KERNEL_BPF_TRACE_H */
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 26b4f2e13275..ad4987ac68bb 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -17,6 +17,7 @@
  */
 
 #include <linux/ftrace_event.h>
+#include <trace/bpf_trace.h>
 
 /*
  * DECLARE_EVENT_CLASS can be used to add a generic function
@@ -634,6 +635,15 @@ ftrace_raw_event_##call(void *__data, proto)   \
if (ftrace_trigger_soft_disabled(ftrace_file))  \
return; \
\
+   if (unlikely(ftrace_file->flags & FTRACE_EVENT_FL_FILTERED) &&  \
+   unlikely(ftrace_file->event_call->flags & TRACE_EVENT_FL_BPF)) { \
+   struct bpf_context __ctx;   \
+   \
+   populate_bpf_context(&__ctx, args, 0, 0, 0, 0, 0);  \
+   trace_filter_call_bpf(ftrace_file->filter, &__ctx); \
+   return; \
+   }   \
+   \
__data_size = ftrace_get_offsets_##call(&__data_offsets, args); \
\
entry = ftrace_event_buffer_reserve(&fbuffer, ftrace_file,  \
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 03c65eedd3d5..d03b8b39e031 100644