Re: [RFC PATCH tip 4/5] use BPF in tracing filters
(2013/12/09 3:22), Frank Ch. Eigler wrote:
>
> masami.hiramatsu.pt wrote:
>
>> [...]
>> Anyway, as far as I can see, there looks be two different models of
>> tracing in our mind.
>>
>> A) Fixed event based tracing: In this model, there are several fixed
>> "events" which well defined with fixed arguments. tracer handles these
>> events and only use limited arguments. It's like a packet stream
>> processing. ftrace, perf etc. are used this model.
>>
>> B) Flexible event-point tracing: In this model, each tracer(or even
>> trace user) can freely define their own event, there will be some fixed
>> tracing points defined, but arguments are defined by users. It's like a
>> debugger's breakpoint debugging. systemtap, ktap etc. are used this model.
>
> It may be more useful to think of it as a contrast along the
> hard-coded versus programmable axis. (perf, systemtap, and ktap can
> each reach to some extent across your "fixed" vs "flexible" line.
> Each has some dynamic and some static-tracepoint capability.)

Oh, I meant that B is not tend to share the defined event among
different tracing instances. Each instances defines new different
dynamic events and gets memories and registers freely.
OTOH, the Ftrace and LTT models are based on the fixed, shared and
well defined events. Even if a new dynamic event is defined, it will
be shared by every instances.

>
>> e.g. B model has a good flexibility and A model is easy to use for
>> beginners.
>
> I don't think it's the model that dictates ease-of-use, but the
> quality of implementation, logistics, documentation, and examples.

Of course, but it requires learning the new programming way.
And also, we need to know about the target source code for setting
up new events.
I know that the systemtap provides many pre-defined probepoints.
so, the systemtap may already have solved this kind of issue. ;)

Thank you,

--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu...@hitachi.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH tip 4/5] use BPF in tracing filters
masami.hiramatsu.pt wrote:

> [...]
> Anyway, as far as I can see, there looks be two different models of
> tracing in our mind.
>
> A) Fixed event based tracing: In this model, there are several fixed
> "events" which well defined with fixed arguments. tracer handles these
> events and only use limited arguments. It's like a packet stream
> processing. ftrace, perf etc. are used this model.
>
> B) Flexible event-point tracing: In this model, each tracer(or even
> trace user) can freely define their own event, there will be some fixed
> tracing points defined, but arguments are defined by users. It's like a
> debugger's breakpoint debugging. systemtap, ktap etc. are used this model.

It may be more useful to think of it as a contrast along the
hard-coded versus programmable axis. (perf, systemtap, and ktap can
each reach to some extent across your "fixed" vs "flexible" line.
Each has some dynamic and some static-tracepoint capability.)

> e.g. B model has a good flexibility and A model is easy to use for
> beginners.

I don't think it's the model that dictates ease-of-use, but the
quality of implementation, logistics, documentation, and examples.

- FChE
Re: [RFC PATCH tip 4/5] use BPF in tracing filters
(2013/12/06 19:05), Jovi Zhangwei wrote:
> On Fri, Dec 6, 2013 at 4:43 PM, Masami Hiramatsu
> <masami.hiramatsu...@hitachi.com> wrote:
>> (2013/12/05 14:11), Alexei Starovoitov wrote:
>>> On Wed, Dec 4, 2013 at 4:05 PM, Masami Hiramatsu
>>> <masami.hiramatsu...@hitachi.com> wrote:
>>>> (2013/12/04 10:11), Steven Rostedt wrote:
>>>>> On Wed, 04 Dec 2013 09:48:44 +0900
>>>>> Masami Hiramatsu <masami.hiramatsu...@hitachi.com> wrote:
>>>>>
>>>>>> fetch functions and actions. In that case, we can continue
>>>>>> to use current interface but much faster to trace.
>>>>>> Also, we can see what filter/arguments/actions are set
>>>>>> on each event.
>>>>>
>>>>> There's also the problem that the current filters work with the results
>>>>> of what is written to the buffer, not what is passed in by the trace
>>>>> point, as that isn't even displayed to the user.
>>>>
>>>> Agreed, so I've said I doubt this implementation is a good
>>>> shape to integrate. Ktap style is better, since it just gets
>>>> parameters from perf buffer entry (using event format).
>>>
>>> Are you saying always store all arguments into ring buffer and let
>>> filter run on it?
>>
>> Yes, it is what ftrace does. I doubt your way fits all of the existing
>> trace-event macros. However, I think just for dynamic events, you can
>> integrating the argument fetching and filtering.
>>
> Does this will affect the user interface of perf-probe argument fetching?
>
> I mean if use bpf backend, do we must need gcc to compile bpf source
> for perf-probe argument fetching? as we known, current argument
> fetching is go through kprobe_events/uprobe_events debugfs file, and
> ktap is based on this behavior.

No, I don't want to do that. Feeding binary code into the kernel
is not trusted nor controllable.
I'd just like to see the code which optimizing current
fetching/filtering methods, and that is possible.

Anyway, as far as I can see, there looks be two different models of
tracing in our mind.

A) Fixed event based tracing: In this model, there are several fixed
"events" which well defined with fixed arguments. tracer handles these
events and only use limited arguments. It's like a packet stream
processing. ftrace, perf etc. are used this model.

B) Flexible event-point tracing: In this model, each tracer(or even
trace user) can freely define their own event, there will be some fixed
tracing points defined, but arguments are defined by users. It's like a
debugger's breakpoint debugging. systemtap, ktap etc. are used this model.

Of course, both have pros/cons, and can share some fundamental
features. e.g. B model has a good flexibility and A model is easy to use
for beginners.
I think we'd better not integrate these two, but find the better way
to share each functionality.

Thank you,

--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu...@hitachi.com
Re: [RFC PATCH tip 4/5] use BPF in tracing filters
On Fri, Dec 6, 2013 at 4:43 PM, Masami Hiramatsu
<masami.hiramatsu...@hitachi.com> wrote:
> (2013/12/05 14:11), Alexei Starovoitov wrote:
>> On Wed, Dec 4, 2013 at 4:05 PM, Masami Hiramatsu
>> <masami.hiramatsu...@hitachi.com> wrote:
>>> (2013/12/04 10:11), Steven Rostedt wrote:
>>>> On Wed, 04 Dec 2013 09:48:44 +0900
>>>> Masami Hiramatsu <masami.hiramatsu...@hitachi.com> wrote:
>>>>
>>>>> fetch functions and actions. In that case, we can continue
>>>>> to use current interface but much faster to trace.
>>>>> Also, we can see what filter/arguments/actions are set
>>>>> on each event.
>>>>
>>>> There's also the problem that the current filters work with the results
>>>> of what is written to the buffer, not what is passed in by the trace
>>>> point, as that isn't even displayed to the user.
>>>
>>> Agreed, so I've said I doubt this implementation is a good
>>> shape to integrate. Ktap style is better, since it just gets
>>> parameters from perf buffer entry (using event format).
>>
>> Are you saying always store all arguments into ring buffer and let
>> filter run on it?
>
> Yes, it is what ftrace does. I doubt your way fits all of the existing
> trace-event macros. However, I think just for dynamic events, you can
> integrating the argument fetching and filtering.
>
Does this will affect the user interface of perf-probe argument fetching?

I mean if use bpf backend, do we must need gcc to compile bpf source
for perf-probe argument fetching? as we known, current argument
fetching is go through kprobe_events/uprobe_events debugfs file, and
ktap is based on this behavior.

Thanks.

Jovi.

>> It's slower, but it's cleaner, because of human readable? since ktap
>> arg1 matches first
>> argument of tracepoint is better than doing ctx->regs.di ? Sure.
>> si->arg1 is easy to fix.
>> With si->arg1 tweak the bpf will become architecture independent. It
>> will run through JIT on x86 and through interpreter everywhere else.
>> but for kprobes user have to specify 'var=cpu_register' during probe
>> creation… how is it better than doing the same in filter?
>
> Haven't you used perf-probe yet? It already supports such kind of
> translation from kernel local variable name to registers, offsets,
> and dereference. :) And kprobe-events can parse such arguments into
> method chain. See Documentation/trace/kprobetrace.txt and
> tools/perf/Documentation/perf-probe.txt for more detail.
> Anyway, I'd like to use the bpf for re-implementing fetch method. :)
>
> Thank you,
>
> --
> Masami HIRAMATSU
> IT Management Research Dept. Linux Technology Center
> Hitachi, Ltd., Yokohama Research Laboratory
> E-mail: masami.hiramatsu...@hitachi.com
Re: [RFC PATCH tip 4/5] use BPF in tracing filters
(2013/12/05 14:11), Alexei Starovoitov wrote:
> On Wed, Dec 4, 2013 at 4:05 PM, Masami Hiramatsu
> <masami.hiramatsu...@hitachi.com> wrote:
>> (2013/12/04 10:11), Steven Rostedt wrote:
>>> On Wed, 04 Dec 2013 09:48:44 +0900
>>> Masami Hiramatsu <masami.hiramatsu...@hitachi.com> wrote:
>>>
>>>> fetch functions and actions. In that case, we can continue
>>>> to use current interface but much faster to trace.
>>>> Also, we can see what filter/arguments/actions are set
>>>> on each event.
>>>
>>> There's also the problem that the current filters work with the results
>>> of what is written to the buffer, not what is passed in by the trace
>>> point, as that isn't even displayed to the user.
>>
>> Agreed, so I've said I doubt this implementation is a good
>> shape to integrate. Ktap style is better, since it just gets
>> parameters from perf buffer entry (using event format).
>
> Are you saying always store all arguments into ring buffer and let
> filter run on it?

Yes, it is what ftrace does. I doubt your way fits all of the existing
trace-event macros. However, I think just for dynamic events, you can
integrating the argument fetching and filtering.

> It's slower, but it's cleaner, because of human readable? since ktap
> arg1 matches first
> argument of tracepoint is better than doing ctx->regs.di ? Sure.
> si->arg1 is easy to fix.
> With si->arg1 tweak the bpf will become architecture independent. It
> will run through JIT on x86 and through interpreter everywhere else.
> but for kprobes user have to specify 'var=cpu_register' during probe
> creation… how is it better than doing the same in filter?

Haven't you used perf-probe yet? It already supports such kind of
translation from kernel local variable name to registers, offsets,
and dereference. :) And kprobe-events can parse such arguments into
method chain. See Documentation/trace/kprobetrace.txt and
tools/perf/Documentation/perf-probe.txt for more detail.
Anyway, I'd like to use the bpf for re-implementing fetch method. :)

Thank you,

--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu...@hitachi.com
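The "method chain" mentioned above can be pictured as a short list of fetch
steps: start from a register, then add an offset and dereference. The sketch
below is a user-space illustration only, not the kernel's implementation.
Every name in it (fake_regs, fetch_step, run_fetch_chain) is hypothetical,
and a real fetch would use a fault-tolerant read rather than a plain memcpy.
It corresponds roughly to what an argument such as +8(%di) expresses in the
kprobe_events syntax.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical register snapshot; a real probe handler receives the
 * architecture's own register structure instead. */
struct fake_regs {
    uintptr_t di;
    uintptr_t si;
};

/* One step of the chain: read a register, or dereference the current
 * value at a byte offset. */
enum fetch_kind {
    FETCH_REG_DI,
    FETCH_REG_SI,
    FETCH_DEREF
};

struct fetch_step {
    enum fetch_kind kind;
    ptrdiff_t offset;       /* only used by FETCH_DEREF */
};

/* Run the steps in order and return the final value. This sketch trusts
 * its pointers, so it must only be fed addresses of live objects. */
static uintptr_t run_fetch_chain(const struct fetch_step *steps, size_t n,
                                 const struct fake_regs *regs)
{
    uintptr_t val = 0;

    assert(steps != NULL && regs != NULL);
    for (size_t i = 0; i < n; i++) {
        switch (steps[i].kind) {
        case FETCH_REG_DI:
            val = regs->di;
            break;
        case FETCH_REG_SI:
            val = regs->si;
            break;
        case FETCH_DEREF: {
            uintptr_t loaded;
            const void *addr =
                (const void *)(val + (uintptr_t)steps[i].offset);

            /* Load one pointer-sized word from (value + offset). */
            memcpy(&loaded, addr, sizeof(loaded));
            val = loaded;
            break;
        }
        }
    }
    return val;
}
```

Because the chain is plain data, the tracer can print it back to the user,
which fits the earlier point about being able to see what arguments are set
on each event.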
Re: [RFC PATCH tip 4/5] use BPF in tracing filters
On Wed, Dec 4, 2013 at 4:05 PM, Masami Hiramatsu
<masami.hiramatsu...@hitachi.com> wrote:
> (2013/12/04 10:11), Steven Rostedt wrote:
>> On Wed, 04 Dec 2013 09:48:44 +0900
>> Masami Hiramatsu <masami.hiramatsu...@hitachi.com> wrote:
>>
>>> fetch functions and actions. In that case, we can continue
>>> to use current interface but much faster to trace.
>>> Also, we can see what filter/arguments/actions are set
>>> on each event.
>>
>> There's also the problem that the current filters work with the results
>> of what is written to the buffer, not what is passed in by the trace
>> point, as that isn't even displayed to the user.
>
> Agreed, so I've said I doubt this implementation is a good
> shape to integrate. Ktap style is better, since it just gets
> parameters from perf buffer entry (using event format).

Are you saying always store all arguments into ring buffer and let
filter run on it?
It's slower, but it's cleaner, because of human readable? since ktap
arg1 matches first
argument of tracepoint is better than doing ctx->regs.di ? Sure.
si->arg1 is easy to fix.
With si->arg1 tweak the bpf will become architecture independent. It
will run through JIT on x86 and through interpreter everywhere else.
but for kprobes user have to specify 'var=cpu_register' during probe
creation… how is it better than doing the same in filter?

I'm open to suggestions on how to improve the usability.

Thanks
Alexei
Re: [RFC PATCH tip 4/5] use BPF in tracing filters
(2013/12/04 10:11), Steven Rostedt wrote:
> On Wed, 04 Dec 2013 09:48:44 +0900
> Masami Hiramatsu <masami.hiramatsu...@hitachi.com> wrote:
>
>> (2013/12/03 13:28), Alexei Starovoitov wrote:
>>> Such filters can be written in C and allow safe read-only access to
>>> any kernel data structure.
>>> Like systemtap but with safety guaranteed by kernel.
>>>
>>> The user can do:
>>> cat bpf_program > /sys/kernel/debug/tracing/.../filter
>>> if tracing event is either static or dynamic via kprobe_events.
>>>
>>> The program can be anything as long as bpf_check() can verify its safety.
>>> For example, the user can create kprobe_event on dst_discard()
>>> and use logically following code inside BPF filter:
>>>       skb = (struct sk_buff *)ctx->regs.di;
>>>       dev = bpf_load_pointer(&skb->dev);
>>> to access 'struct net_device'
>>> Since its prototype is 'int dst_discard(struct sk_buff *skb);'
>>> 'skb' pointer is in 'rdi' register on x86_64
>>> bpf_load_pointer() will try to fetch 'dev' field of 'sk_buff'
>>> structure and will suppress page-fault if pointer is incorrect.
>>
>> Hmm, I doubt it is a good way to integrate with ftrace.
>> I prefer to use this for replacing current ftrace filter,
>
> I'm not sure how we can do that. Especially since the bpf is very arch
> specific, and the current filters work for all archs.

My idea is to use BPF for the arch specific optimization for ftrace
filter. For the other arch, filter works with current code.
So the ftrace holds filter_preds and compile it in BPF bytecode
if possible. And this backend optimization also can be done for
fetch methods.

>> fetch functions and actions. In that case, we can continue
>> to use current interface but much faster to trace.
>> Also, we can see what filter/arguments/actions are set
>> on each event.
>
> There's also the problem that the current filters work with the results
> of what is written to the buffer, not what is passed in by the trace
> point, as that isn't even displayed to the user.

Agreed, so I've said I doubt this implementation is a good
shape to integrate. Ktap style is better, since it just gets
parameters from perf buffer entry (using event format).

Thank you,

--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu...@hitachi.com
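The backend idea described above (keep the predicate list as the source of
truth, and use a compiled form only where one is available) can be sketched
as follows. This is a simplified model in plain C, not ftrace code: the
types pred, event_filter and compiled_filter_fn are hypothetical, and the
"compiled" filter is a hand-written function standing in for whatever a JIT
would emit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical predicate: compare one numeric field with a constant. */
enum pred_op { PRED_EQ, PRED_GT, PRED_LT };

struct pred {
    size_t field;           /* index into the event's field array */
    enum pred_op op;
    long val;
};

/* Optional fast path produced by an arch-specific backend. */
typedef bool (*compiled_filter_fn)(const long *fields);

struct event_filter {
    const struct pred *preds;       /* always kept, always printable */
    size_t n_preds;
    compiled_filter_fn compiled;    /* NULL when no backend exists */
};

static bool pred_match(const struct pred *p, const long *fields)
{
    long v = fields[p->field];

    switch (p->op) {
    case PRED_EQ: return v == p->val;
    case PRED_GT: return v > p->val;
    case PRED_LT: return v < p->val;
    }
    return false;
}

/* Generic path: walk the predicates; all of them must match. */
static bool filter_interpret(const struct event_filter *f,
                             const long *fields)
{
    for (size_t i = 0; i < f->n_preds; i++) {
        if (!pred_match(&f->preds[i], fields))
            return false;
    }
    return true;
}

/* Use the compiled form if present, otherwise fall back. */
static bool filter_match(const struct event_filter *f, const long *fields)
{
    assert(f != NULL && fields != NULL);
    if (f->compiled)
        return f->compiled(fields);
    return filter_interpret(f, fields);
}

/* Stand-in for generated code equivalent to "field 0 > 10". */
static bool prio_gt_10_compiled(const long *fields)
{
    return fields[0] > 10;
}
```

With this split the user-visible interface stays the predicate list, so an
architecture without a backend behaves exactly as before, only slower.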
Re: [RFC PATCH tip 4/5] use BPF in tracing filters
On Wed, 04 Dec 2013 09:48:44 +0900 Masami Hiramatsu wrote:

> (2013/12/03 13:28), Alexei Starovoitov wrote:
> > Such filters can be written in C and allow safe read-only access to
> > any kernel data structure.
> > Like systemtap but with safety guaranteed by kernel.
> >
> > The user can do:
> > cat bpf_program > /sys/kernel/debug/tracing/.../filter
> > if tracing event is either static or dynamic via kprobe_events.
> >
> > The program can be anything as long as bpf_check() can verify its safety.
> > For example, the user can create kprobe_event on dst_discard()
> > and use logically following code inside BPF filter:
> > skb = (struct sk_buff *)ctx->regs.di;
> > dev = bpf_load_pointer(&skb->dev);
> > to access 'struct net_device'
> > Since its prototype is 'int dst_discard(struct sk_buff *skb);'
> > 'skb' pointer is in 'rdi' register on x86_64
> > bpf_load_pointer() will try to fetch 'dev' field of 'sk_buff'
> > structure and will suppress page-fault if pointer is incorrect.
>
> Hmm, I doubt it is a good way to integrate with ftrace.
> I prefer to use this for replacing current ftrace filter,

I'm not sure how we can do that. Especially since the bpf is very arch
specific, and the current filters work for all archs.

> fetch functions and actions. In that case, we can continue
> to use current interface but much faster to trace.
> Also, we can see what filter/arguments/actions are set
> on each event.

There's also the problem that the current filters work with the results
of what is written to the buffer, not what is passed in by the trace
point, as that isn't even displayed to the user. For example,
sched_switch gets passed struct task_struct *prev, and *next, from that
we save prev_comm, prev_pid, prev_prio, prev_state, next_comm, next_prio
and next_state. These are expressed to the user by the format file of
the event:

	field:char prev_comm[32];	offset:16;	size:16;	signed:1;
	field:pid_t prev_pid;	offset:32;	size:4;	signed:1;
	field:int prev_prio;	offset:36;	size:4;	signed:1;
	field:long prev_state;	offset:40;	size:8;	signed:1;
	field:char next_comm[32];	offset:48;	size:16;	signed:1;
	field:pid_t next_pid;	offset:64;	size:4;	signed:1;
	field:int next_prio;	offset:68;	size:4;	signed:1;

And the filters can check "next_prio > 10" and what not. The bpf program
needs to access next->prio. There's nothing that shows the user what is
passed to the tracepoint, and from that, what structure member to use
from there. The user would be required to look at the source code of the
given kernel. A requirement not needed by the current implementation.

Also, there's results that can not be trivially converted. Taking a
quick look at some TRACE_EVENT() structures, I found bcache_bio that has
this:

	TP_fast_assign(
		__entry->dev		= bio->bi_bdev->bd_dev;
		__entry->sector		= bio->bi_sector;
		__entry->nr_sector	= bio->bi_size >> 9;
		blk_fill_rwbs(__entry->rwbs, bio->bi_rw, bio->bi_size);
	),

Where the blk_fill_rwbs() updates the status of the entry->rwbs based on
the bi_rw field. A filter must remain backward compatible to something
like:

	rwbs == "w" or rwbs =~ '*w*'

Now maybe we can make the filter code use some of the bpf if possible,
but to get the result, it still needs to write to the ring buffer, and
discard it if it is incorrect. Which will not make it any faster than
the original trace, but perhaps faster than the trace + current filter.

The speed up that was shown was because we were processing the
parameters of the trace point and not the result. That currently
requires the user to have full access to the source of the kernel they
are tracing.

-- Steve
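To make the point about filtering on recorded results concrete, the following is a minimal user-space sketch of evaluating "next_prio > 10" against a raw event record, using only the offset and size published in the format listing above (offset 68, size 4). It is not the ftrace filter implementation: `struct field_desc`, `read_signed_field()` and `filter_next_prio_gt()` are hypothetical names, and the sketch assumes the record is in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor mirroring one "field:" line of a format file. */
struct field_desc {
	const char *name;
	size_t offset;
	size_t size;
};

/*
 * Read a signed integer field of 1, 2, 4 or 8 bytes from a raw record.
 * Returns 0 on success, -1 if the field does not fit in the record or
 * has an unsupported size.
 */
static int read_signed_field(const unsigned char *rec, size_t rec_len,
			     const struct field_desc *f, int64_t *out)
{
	assert(rec != NULL && f != NULL && out != NULL);
	if (f->offset > rec_len || f->size > rec_len - f->offset)
		return -1;

	const unsigned char *p = rec + f->offset;
	switch (f->size) {
	case 1: { int8_t v;  memcpy(&v, p, sizeof(v)); *out = v; return 0; }
	case 2: { int16_t v; memcpy(&v, p, sizeof(v)); *out = v; return 0; }
	case 4: { int32_t v; memcpy(&v, p, sizeof(v)); *out = v; return 0; }
	case 8: { int64_t v; memcpy(&v, p, sizeof(v)); *out = v; return 0; }
	default: return -1;
	}
}

/*
 * Evaluate "next_prio > limit" on a sched_switch-style record.
 * The offset and size come from the format listing quoted above.
 * Returns 1 to keep the record, 0 to discard it.
 */
static int filter_next_prio_gt(const unsigned char *rec, size_t rec_len,
			       int64_t limit)
{
	static const struct field_desc next_prio = { "next_prio", 68, 4 };
	int64_t v;

	if (read_signed_field(rec, rec_len, &next_prio, &v) != 0)
		return 0;
	return v > limit;
}
```

The filter needs nothing beyond the format file: no knowledge of struct task_struct or of the kernel source. The price, as noted above, is that the record has to be written before it can be tested.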
Re: [RFC PATCH tip 4/5] use BPF in tracing filters
(2013/12/03 13:28), Alexei Starovoitov wrote:
> Such filters can be written in C and allow safe read-only access to
> any kernel data structure.
> Like systemtap but with safety guaranteed by kernel.
>
> The user can do:
> cat bpf_program > /sys/kernel/debug/tracing/.../filter
> if tracing event is either static or dynamic via kprobe_events.
>
> The program can be anything as long as bpf_check() can verify its safety.
> For example, the user can create kprobe_event on dst_discard()
> and use logically following code inside BPF filter:
> skb = (struct sk_buff *)ctx->regs.di;
> dev = bpf_load_pointer(&skb->dev);
> to access 'struct net_device'
> Since its prototype is 'int dst_discard(struct sk_buff *skb);'
> 'skb' pointer is in 'rdi' register on x86_64
> bpf_load_pointer() will try to fetch 'dev' field of 'sk_buff'
> structure and will suppress page-fault if pointer is incorrect.

Hmm, I doubt it is a good way to integrate with ftrace.
I prefer to use this for replacing current ftrace filter,
fetch functions and actions. In that case, we can continue
to use current interface but much faster to trace.
Also, we can see what filter/arguments/actions are set
on each event.

Thank you,

--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu...@hitachi.com