Re: [PATCH RFC net-next 11/14] tracing: allow eBPF programs to be attached to events
On Tue, Jul 1, 2014 at 11:39 PM, Namhyung Kim wrote:
> On Wed, Jul 2, 2014 at 3:14 PM, Alexei Starovoitov wrote:
>>
>> Can manipulate what at compile time? Entry records of tracepoints are
>> hard coded based on the event. For verifier it's easier to treat all
>> tracepoint events as they received the same 'struct bpf_context'
>> of N arguments then the same program can be attached to multiple
>> tracepoint events at the same time.
>
> I was thinking about perf creates a bpf program for filtering some
> events like recording kfree_skb if protocol == xx.  So perf can
> calculate the offset and size of the protocol field and make
> appropriate insns for the filter.

When I say 'tracing filter' in patch 11/14, I really mean a
stap/dtrace-like facility for live debugging, where the tracing infra
plays a key role. In the end the programs are written in C with
annotations, and perf orchestrates compilation, insertion, attaching
and printing of results.
Your meaning of 'tracing filter' is the canonical one: a filter that
says whether an event should be recorded or not. And it makes sense.
When perf sees 'protocol==xx' on the command line it can generate an
ebpf program for it. In that case, does my earlier proposal for
replacing the predicate tree walker with ebpf programs in the kernel
become obsolete?
If I understood correctly, you're proposing to teach perf to generate
ebpf programs for the existing command line interface and use them
instead of the predicate tree. This way the predicate tree can be
removed, right?
In that case programs would need to access event records.

> Maybe it needs to pass the event format to the verifier somehow then.

The integer fields are easy to verify. The dynamic_array part is
tricky, since the 16-bit offset + 16-bit length accessors are very
tracing specific. I need to think it through.

> Your scenario looks like just calling a bpf program when it hits a
> event.  It could use event triggering for that purpose IMHO.

Sure. Calling an ebpf program can be one of the event trigger types.
On the other side, ebpf programs themselves can replace the whole
triggering, filtering and recording code. We can have events that do
nothing or call ebpf programs. Then programs walk all necessary data
structures, store stuff into maps, etc.
Just look at the amount of events that perf processes. Some of that
processing can be done in the kernel by a dynamic program.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
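To make the "16-bit offset + 16-bit length" remark concrete, here is a minimal user-space sketch of what such an accessor has to do. It assumes the dynamic-array descriptor is a 32-bit word with the offset in the low 16 bits and the length in the high 16 bits; the helper names (data_loc_offset, data_loc_length, data_loc_ptr) are made up for illustration and are not part of the patch. The bounds check is the part a verifier would somehow have to reason about.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: low 16 bits = offset from record start,
 * high 16 bits = length in bytes. Hypothetical helper names. */
static inline uint16_t data_loc_offset(uint32_t loc)
{
	return (uint16_t)(loc & 0xffff);
}

static inline uint16_t data_loc_length(uint32_t loc)
{
	return (uint16_t)(loc >> 16);
}

/* Return a pointer to the array payload inside the record, or NULL
 * if the descriptor points outside the record. */
static const void *data_loc_ptr(const void *record, size_t record_size,
				uint32_t loc)
{
	size_t off = data_loc_offset(loc);
	size_t len = data_loc_length(loc);

	assert(record != NULL);
	if (off > record_size || len > record_size - off)
		return NULL;
	return (const unsigned char *)record + off;
}
```

For a fixed-size integer field the offset and size are compile-time constants, which is why those are the easy case; here both values come out of the record at run time.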
Re: [PATCH RFC net-next 11/14] tracing: allow eBPF programs to be attached to events
On Wed, Jul 2, 2014 at 3:14 PM, Alexei Starovoitov wrote:
> On Tue, Jul 1, 2014 at 10:32 PM, Namhyung Kim wrote:
>> On Fri, 27 Jun 2014 17:06:03 -0700, Alexei Starovoitov wrote:
>>> User interface:
>>> cat bpf_123 > /sys/kernel/debug/tracing/__event__/filter
>>>
>>> where 123 is an id of the eBPF program priorly loaded.
>>> __event__ is static tracepoint event.
>>> (kprobe events will be supported in the future patches)
>>>
>>> eBPF programs can call in-kernel helper functions to:
>>> - lookup/update/delete elements in maps
>>> - memcmp
>>> - trace_printk
>>
>> ISTR Steve doesn't like to use trace_printk() (at least for production
>> kernels) anymore.  And I'm not sure it'd work if there's no existing
>> trace_printk() on a system.
>
> yes. I saw big warning that trace_printk_init_buffers() emits.
> The idea here is to use eBPF programs for live kernel debugging.
> Instead of adding printk() and recompiling, just write a program,
> attach it to some event, and printk whatever is interesting.
> My only concern about printk() was that it dumps things into trace
> buffers (which is still better than dumping stuff to syslog), but now
> (since Andy almost convinced me to switch to 'fd' based interface)
> we can have seq_printk-like that prints into special buffer. So that
> user space does 'read(ufd)' and receives whatever program has
> printed. I think that would be much cleaner.
>
>>> +	if (unlikely(ftrace_file->flags & FTRACE_EVENT_FL_FILTERED) && \
>>> +	    unlikely(ftrace_file->event_call->flags & TRACE_EVENT_FL_BPF)) { \
>>> +		struct bpf_context __ctx; \
>>> + \
>>> +		populate_bpf_context(&__ctx, args, 0, 0, 0, 0, 0); \
>>> +		trace_filter_call_bpf(ftrace_file->filter, &__ctx); \
>>> +		return; \
>>> +	} \
>>> + \
>>
>> Hmm.. But it seems the eBPF prog is not a filter - it'd always drop the
>> event.  And I think it's better to use a recorded entry rather then args
>> as a bpf_context so that tools like perf can manipulate it at compile
>> time based on the event format.
>
> Can manipulate what at compile time? Entry records of tracepoints are
> hard coded based on the event. For verifier it's easier to treat all
> tracepoint events as they received the same 'struct bpf_context'
> of N arguments then the same program can be attached to multiple
> tracepoint events at the same time.

I was thinking about perf creating a bpf program for filtering some
events, like recording kfree_skb if protocol == xx.  So perf can
calculate the offset and size of the protocol field and make
appropriate insns for the filter.

Maybe it needs to pass the event format to the verifier somehow then.

> I thought about making verifier specific for _every_ tracepoint event,
> but it complicates the user interface, since 'bpf_context' is now different
> for every program. I think args are much easier to deal with from C
> programming point of view, since program can go a fetch the same
> fields that tracepoint 'fast_assign' macro does.
> Also skipping buffer allocation and fast_assign gives very sizable
> performance boost, since the program will access only what it needs to.
>
> The return value of eBPF program is ignored, since I couldn't think
> of use case for it. We can change it to be more 'filter' like and interpret
> return value as true/false, whether to record this event or not.
> Thoughts?

Your scenario looks like just calling a bpf program when it hits an
event.  It could use event triggering for that purpose IMHO.  But for
filtering, it needs to add checking of the return value.

Thanks,
Namhyung
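The difference between the two behaviours discussed above (run the program and always drop the event, versus use the program's return value as a record/drop decision) can be sketched in plain user-space C. Everything here except the struct bpf_context layout is hypothetical: prog_fn stands in for an attached program, the two event_recorded_* functions stand in for the tracepoint path, and treating arg1 as a protocol number is only an illustration of the 'protocol == xx' case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same six-argument layout as in the RFC patch. */
struct bpf_context {
	unsigned long arg1, arg2, arg3, arg4, arg5, arg6;
};

/* Hypothetical stand-in for an attached program. */
typedef unsigned int (*prog_fn)(const struct bpf_context *ctx);

/* RFC behaviour as described in the thread: the program runs, its
 * result is ignored, and the event is never recorded. */
static bool event_recorded_rfc(prog_fn prog, const struct bpf_context *ctx)
{
	assert(prog != NULL);
	(void)prog(ctx);
	return false;
}

/* Filter-like behaviour: a non-zero return value means "record". */
static bool event_recorded_filter(prog_fn prog, const struct bpf_context *ctx)
{
	assert(prog != NULL);
	return prog(ctx) != 0;
}

/* Illustrative program: keep events whose first argument is 0x0800. */
static unsigned int match_protocol(const struct bpf_context *ctx)
{
	return ctx->arg1 == 0x0800;
}
```

With match_protocol attached, event_recorded_filter() records only the matching events, while event_recorded_rfc() records none, which is the "it'd always drop the event" observation.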
Re: [PATCH RFC net-next 11/14] tracing: allow eBPF programs to be attached to events
On Tue, Jul 1, 2014 at 10:32 PM, Namhyung Kim wrote:
> On Fri, 27 Jun 2014 17:06:03 -0700, Alexei Starovoitov wrote:
>> User interface:
>> cat bpf_123 > /sys/kernel/debug/tracing/__event__/filter
>>
>> where 123 is an id of the eBPF program priorly loaded.
>> __event__ is static tracepoint event.
>> (kprobe events will be supported in the future patches)
>>
>> eBPF programs can call in-kernel helper functions to:
>> - lookup/update/delete elements in maps
>> - memcmp
>> - trace_printk
>
> ISTR Steve doesn't like to use trace_printk() (at least for production
> kernels) anymore.  And I'm not sure it'd work if there's no existing
> trace_printk() on a system.

Yes. I saw the big warning that trace_printk_init_buffers() emits.
The idea here is to use eBPF programs for live kernel debugging.
Instead of adding printk() and recompiling, just write a program,
attach it to some event, and printk whatever is interesting.
My only concern about printk() was that it dumps things into trace
buffers (which is still better than dumping stuff to syslog), but now
(since Andy almost convinced me to switch to an 'fd' based interface)
we can have a seq_printk-like helper that prints into a special buffer,
so that user space does 'read(ufd)' and receives whatever the program
has printed. I think that would be much cleaner.

>> +	if (unlikely(ftrace_file->flags & FTRACE_EVENT_FL_FILTERED) && \
>> +	    unlikely(ftrace_file->event_call->flags & TRACE_EVENT_FL_BPF)) { \
>> +		struct bpf_context __ctx; \
>> + \
>> +		populate_bpf_context(&__ctx, args, 0, 0, 0, 0, 0); \
>> +		trace_filter_call_bpf(ftrace_file->filter, &__ctx); \
>> +		return; \
>> +	} \
>> + \
>
> Hmm.. But it seems the eBPF prog is not a filter - it'd always drop the
> event.  And I think it's better to use a recorded entry rather then args
> as a bpf_context so that tools like perf can manipulate it at compile
> time based on the event format.

Can manipulate what at compile time? Entry records of tracepoints are
hard coded based on the event. For the verifier it's easier to treat
all tracepoint events as if they received the same 'struct bpf_context'
of N arguments, so that the same program can be attached to multiple
tracepoint events at the same time.
I thought about making the verifier specific for _every_ tracepoint
event, but it complicates the user interface, since 'bpf_context' would
then be different for every program. I think args are much easier to
deal with from a C programming point of view, since the program can go
and fetch the same fields that the tracepoint 'fast_assign' macro does.
Also, skipping buffer allocation and fast_assign gives a very sizable
performance boost, since the program will access only what it needs to.

The return value of the eBPF program is ignored, since I couldn't think
of a use case for it. We can change it to be more 'filter'-like and
interpret the return value as true/false: whether to record this event
or not.
Thoughts?
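As an illustration of the "same 'struct bpf_context' of N arguments" idea, here is a user-space sketch of copying a tracepoint's arguments into the six fixed slots. The real populate_bpf_context() lives in kernel/trace/bpf_trace.c, which is not shown in this thread, so populate_ctx_sketch() below is only an assumption about its general shape. In portable C every variadic argument has to be passed as unsigned long, hence the explicit 0UL padding.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Same six-argument layout as in the RFC patch. */
struct bpf_context {
	unsigned long arg1, arg2, arg3, arg4, arg5, arg6;
};

/* Hypothetical sketch: copy exactly six variadic arguments, each
 * passed as unsigned long, into the context. Callers pad unused
 * slots with 0UL. */
static void populate_ctx_sketch(struct bpf_context *ctx, ...)
{
	va_list ap;

	assert(ctx != NULL);
	va_start(ap, ctx);
	ctx->arg1 = va_arg(ap, unsigned long);
	ctx->arg2 = va_arg(ap, unsigned long);
	ctx->arg3 = va_arg(ap, unsigned long);
	ctx->arg4 = va_arg(ap, unsigned long);
	ctx->arg5 = va_arg(ap, unsigned long);
	ctx->arg6 = va_arg(ap, unsigned long);
	va_end(ap);
}

/* A program body that only looks at arg1 works unchanged for any
 * event whose first argument has the meaning it expects. */
static unsigned long first_arg_is_nonzero(const struct bpf_context *ctx)
{
	return ctx->arg1 != 0;
}
```

Because the layout never changes, the same program can be attached to events with one argument or with six; it is up to the program to know which slots are meaningful for the event it is attached to.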
Re: [PATCH RFC net-next 11/14] tracing: allow eBPF programs to be attached to events
On Fri, 27 Jun 2014 17:06:03 -0700, Alexei Starovoitov wrote:
> User interface:
> cat bpf_123 > /sys/kernel/debug/tracing/__event__/filter
>
> where 123 is an id of the eBPF program priorly loaded.
> __event__ is static tracepoint event.
> (kprobe events will be supported in the future patches)
>
> eBPF programs can call in-kernel helper functions to:
> - lookup/update/delete elements in maps
> - memcmp
> - trace_printk

ISTR Steve doesn't like to use trace_printk() (at least for production
kernels) anymore.  And I'm not sure it'd work if there's no existing
trace_printk() on a system.

> - load_pointer
> - dump_stack

[SNIP]
> @@ -634,6 +635,15 @@ ftrace_raw_event_##call(void *__data, proto) \
> 	if (ftrace_trigger_soft_disabled(ftrace_file)) \
> 		return; \
> \
> +	if (unlikely(ftrace_file->flags & FTRACE_EVENT_FL_FILTERED) && \
> +	    unlikely(ftrace_file->event_call->flags & TRACE_EVENT_FL_BPF)) { \
> +		struct bpf_context __ctx; \
> + \
> +		populate_bpf_context(&__ctx, args, 0, 0, 0, 0, 0); \
> +		trace_filter_call_bpf(ftrace_file->filter, &__ctx); \
> +		return; \
> +	} \
> + \

Hmm.. But it seems the eBPF prog is not a filter - it'd always drop the
event.  And I think it's better to use a recorded entry rather than args
as a bpf_context so that tools like perf can manipulate it at compile
time based on the event format.

Thanks,
Namhyung

> 	__data_size = ftrace_get_offsets_##call(&__data_offsets, args); \
> \
> 	entry = ftrace_event_buffer_reserve(&fbuffer, ftrace_file, \
Re: [PATCH RFC net-next 11/14] tracing: allow eBPF programs to be attached to events
On Tue, Jul 1, 2014 at 1:30 AM, Daniel Borkmann wrote:
> On 06/28/2014 02:06 AM, Alexei Starovoitov wrote:
>>
>> User interface:
>> cat bpf_123 > /sys/kernel/debug/tracing/__event__/filter
>>
>> where 123 is an id of the eBPF program priorly loaded.
>> __event__ is static tracepoint event.
>> (kprobe events will be supported in the future patches)
>>
>> eBPF programs can call in-kernel helper functions to:
>> - lookup/update/delete elements in maps
>> - memcmp
>> - trace_printk
>> - load_pointer
>> - dump_stack
>
>
> Are there plans to let eBPF replace the generic event
> filtering framework in tracing?

yes. the other patch that replaces predicate tree walking with eBPF
programs is pending on eBPF split out of networking.
Re: [PATCH RFC net-next 11/14] tracing: allow eBPF programs to be attached to events
On 06/28/2014 02:06 AM, Alexei Starovoitov wrote:
> User interface:
> cat bpf_123 > /sys/kernel/debug/tracing/__event__/filter
>
> where 123 is an id of the eBPF program priorly loaded.
> __event__ is static tracepoint event.
> (kprobe events will be supported in the future patches)
>
> eBPF programs can call in-kernel helper functions to:
> - lookup/update/delete elements in maps
> - memcmp
> - trace_printk
> - load_pointer
> - dump_stack

Are there plans to let eBPF replace the generic event
filtering framework in tracing?

> Signed-off-by: Alexei Starovoitov
> ---
>  include/linux/ftrace_event.h       |   5 +
>  include/trace/bpf_trace.h          |  29 +
>  include/trace/ftrace.h             |  10 ++
>  include/uapi/linux/bpf.h           |   5 +
>  kernel/trace/Kconfig               |   1 +
>  kernel/trace/Makefile              |   1 +
>  kernel/trace/bpf_trace.c           | 217
>  kernel/trace/trace.h               |   3 +
>  kernel/trace/trace_events.c        |   7 ++
>  kernel/trace/trace_events_filter.c |  72 +++-
>  10 files changed, 349 insertions(+), 1 deletion(-)
>  create mode 100644 include/trace/bpf_trace.h
>  create mode 100644 kernel/trace/bpf_trace.c
[PATCH RFC net-next 11/14] tracing: allow eBPF programs to be attached to events
User interface:
cat bpf_123 > /sys/kernel/debug/tracing/__event__/filter

where 123 is an id of the eBPF program priorly loaded.
__event__ is static tracepoint event.
(kprobe events will be supported in the future patches)

eBPF programs can call in-kernel helper functions to:
- lookup/update/delete elements in maps
- memcmp
- trace_printk
- load_pointer
- dump_stack

Signed-off-by: Alexei Starovoitov
---
 include/linux/ftrace_event.h       |   5 +
 include/trace/bpf_trace.h          |  29 +
 include/trace/ftrace.h             |  10 ++
 include/uapi/linux/bpf.h           |   5 +
 kernel/trace/Kconfig               |   1 +
 kernel/trace/Makefile              |   1 +
 kernel/trace/bpf_trace.c           | 217
 kernel/trace/trace.h               |   3 +
 kernel/trace/trace_events.c        |   7 ++
 kernel/trace/trace_events_filter.c |  72 +++-
 10 files changed, 349 insertions(+), 1 deletion(-)
 create mode 100644 include/trace/bpf_trace.h
 create mode 100644 kernel/trace/bpf_trace.c

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index cff3106ffe2c..de313bd9a434 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -237,6 +237,7 @@ enum {
 	TRACE_EVENT_FL_WAS_ENABLED_BIT,
 	TRACE_EVENT_FL_USE_CALL_FILTER_BIT,
 	TRACE_EVENT_FL_TRACEPOINT_BIT,
+	TRACE_EVENT_FL_BPF_BIT,
 };
 
 /*
@@ -259,6 +260,7 @@ enum {
 	TRACE_EVENT_FL_WAS_ENABLED = (1 << TRACE_EVENT_FL_WAS_ENABLED_BIT),
 	TRACE_EVENT_FL_USE_CALL_FILTER = (1 << TRACE_EVENT_FL_USE_CALL_FILTER_BIT),
 	TRACE_EVENT_FL_TRACEPOINT = (1 << TRACE_EVENT_FL_TRACEPOINT_BIT),
+	TRACE_EVENT_FL_BPF = (1 << TRACE_EVENT_FL_BPF_BIT),
 };
 
 struct ftrace_event_call {
@@ -536,6 +538,9 @@ event_trigger_unlock_commit_regs(struct ftrace_event_file *file,
 		event_triggers_post_call(file, tt);
 }
 
+struct bpf_context;
+void trace_filter_call_bpf(struct event_filter *filter, struct bpf_context *ctx);
+
 enum {
 	FILTER_OTHER = 0,
 	FILTER_STATIC_STRING,
diff --git a/include/trace/bpf_trace.h b/include/trace/bpf_trace.h
new file mode 100644
index ..2122437f1317
--- /dev/null
+++ b/include/trace/bpf_trace.h
@@ -0,0 +1,29 @@
+/* Copyright (c) 2011-2014 PLUMgrid, http://plumgrid.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#ifndef _LINUX_KERNEL_BPF_TRACE_H
+#define _LINUX_KERNEL_BPF_TRACE_H
+
+/* For tracing filters save first six arguments of tracepoint events.
+ * On 64-bit architectures argN fields will match one to one to arguments passed
+ * to tracepoint events.
+ * On 32-bit architectures u64 arguments to events will be seen into two
+ * consecutive argN, argN+1 fields. Pointers, u32, u16, u8, bool types will
+ * match one to one
+ */
+struct bpf_context {
+	unsigned long arg1;
+	unsigned long arg2;
+	unsigned long arg3;
+	unsigned long arg4;
+	unsigned long arg5;
+	unsigned long arg6;
+};
+
+/* call from ftrace_raw_event_*() to copy tracepoint arguments into ctx */
+void populate_bpf_context(struct bpf_context *ctx, ...);
+
+#endif /* _LINUX_KERNEL_BPF_TRACE_H */
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 26b4f2e13275..ad4987ac68bb 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -17,6 +17,7 @@
  */
 
 #include <linux/ftrace_event.h>
+#include <trace/bpf_trace.h>
 
 /*
  * DECLARE_EVENT_CLASS can be used to add a generic function
@@ -634,6 +635,15 @@ ftrace_raw_event_##call(void *__data, proto) \
 	if (ftrace_trigger_soft_disabled(ftrace_file)) \
 		return; \
 \
+	if (unlikely(ftrace_file->flags & FTRACE_EVENT_FL_FILTERED) && \
+	    unlikely(ftrace_file->event_call->flags & TRACE_EVENT_FL_BPF)) { \
+		struct bpf_context __ctx; \
+ \
+		populate_bpf_context(&__ctx, args, 0, 0, 0, 0, 0); \
+		trace_filter_call_bpf(ftrace_file->filter, &__ctx); \
+		return; \
+	} \
+ \
 	__data_size = ftrace_get_offsets_##call(&__data_offsets, args); \
 \
 	entry = ftrace_event_buffer_reserve(&fbuffer, ftrace_file, \
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 03c65eedd3d5..d03b8b39e031 100644
--- a/include/uapi/linux/bpf.h
+++