Re: [RFC PATCH tip 4/5] use BPF in tracing filters

2013-12-09 Thread Masami Hiramatsu
(2013/12/09 3:22), Frank Ch. Eigler wrote:
> 
> masami.hiramatsu.pt wrote:
> 
>> [...]
>> Anyway, as far as I can see, there seem to be two different models of
>> tracing in our minds.
>>
>> A) Fixed event based tracing: In this model, there are several fixed
>> "events" which are well defined with fixed arguments. The tracer handles
>> these events and only uses a limited set of arguments. It's like packet
>> stream processing. ftrace, perf, etc. use this model.
>>
>> B) Flexible event-point tracing: In this model, each tracer (or even
>> trace user) can freely define their own events; there will be some fixed
>> tracing points defined, but the arguments are defined by users. It's like
>> breakpoint debugging with a debugger. systemtap, ktap, etc. use this model.
> 
> It may be more useful to think of it as a contrast along the
> hard-coded versus programmable axis.  (perf, systemtap, and ktap can
> each reach to some extent across your "fixed" vs "flexible" line.
> Each has some dynamic and some static-tracepoint capability.)

Oh, I meant that B does not tend to share the defined events among
different tracing instances. Each instance defines its own new dynamic
events and fetches values from memory and registers freely.
OTOH, the Ftrace and LTT models are based on fixed, shared and
well-defined events. Even if a new dynamic event is defined,
it will be shared by every instance.
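A minimal C sketch of the shared model described above, under simplifying assumptions and not taken from ftrace: a dynamic event is defined once in a global table, and each tracing instance only keeps per-event enable flags, so a second instance defining the same event gets the existing definition back. `struct event_def`, `define_event()`, `enable_event()`, and `struct instance` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_EVENTS 8

/* Hypothetical global definition of a dynamic event (shared model):
 * there is exactly one copy, whichever instance created it. */
struct event_def {
    char name[32];
    char fetch_args[64];       /* argument spec given at definition time */
    bool used;
};

static struct event_def event_table[MAX_EVENTS];

/* A tracing instance holds only per-event enable flags, never definitions. */
struct instance {
    bool enabled[MAX_EVENTS];
};

/* Define a dynamic event, or return the existing one with the same name
 * (the new argument spec is then ignored). Returns -1 if the table is full. */
static int define_event(const char *name, const char *args)
{
    int free_slot = -1;

    for (int i = 0; i < MAX_EVENTS; i++) {
        if (event_table[i].used && strcmp(event_table[i].name, name) == 0)
            return i;          /* already defined: every instance shares it */
        if (!event_table[i].used && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return -1;
    snprintf(event_table[free_slot].name, sizeof(event_table[free_slot].name),
             "%s", name);
    snprintf(event_table[free_slot].fetch_args,
             sizeof(event_table[free_slot].fetch_args), "%s", args);
    event_table[free_slot].used = true;
    return free_slot;
}

/* Enabling is per instance; the shared definition itself is untouched. */
static void enable_event(struct instance *inst, int id)
{
    assert(id >= 0 && id < MAX_EVENTS);
    inst->enabled[id] = true;
}
```

Under model B, by contrast, each session would keep its own private `event_def` table with its own argument specs.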

> 
>> e.g. the B model has good flexibility and the A model is easy to use for
>> beginners.
> 
> I don't think it's the model that dictates ease-of-use, but the
> quality of implementation, logistics, documentation, and examples.

Of course, but it requires learning a new way of programming. And
also, we need to know the target source code to set up new events.
I know that systemtap provides many pre-defined probe points, so
systemtap may already have solved this kind of issue. ;)

Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu...@hitachi.com


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH tip 4/5] use BPF in tracing filters

2013-12-08 Thread Frank Ch. Eigler

masami.hiramatsu.pt wrote:

> [...]
> Anyway, as far as I can see, there seem to be two different models of
> tracing in our minds.
>
> A) Fixed event based tracing: In this model, there are several fixed
> "events" which are well defined with fixed arguments. The tracer handles
> these events and only uses a limited set of arguments. It's like packet
> stream processing. ftrace, perf, etc. use this model.
>
> B) Flexible event-point tracing: In this model, each tracer (or even
> trace user) can freely define their own events; there will be some fixed
> tracing points defined, but the arguments are defined by users. It's like
> breakpoint debugging with a debugger. systemtap, ktap, etc. use this model.

It may be more useful to think of it as a contrast along the
hard-coded versus programmable axis.  (perf, systemtap, and ktap can
each reach to some extent across your "fixed" vs "flexible" line.
Each has some dynamic and some static-tracepoint capability.)


> e.g. the B model has good flexibility and the A model is easy to use for
> beginners.

I don't think it's the model that dictates ease-of-use, but the
quality of implementation, logistics, documentation, and examples.


- FChE


Re: [RFC PATCH tip 4/5] use BPF in tracing filters

2013-12-06 Thread Masami Hiramatsu
(2013/12/06 19:05), Jovi Zhangwei wrote:
> On Fri, Dec 6, 2013 at 4:43 PM, Masami Hiramatsu
>  wrote:
>> (2013/12/05 14:11), Alexei Starovoitov wrote:
>>> On Wed, Dec 4, 2013 at 4:05 PM, Masami Hiramatsu
>>>  wrote:
>>>> (2013/12/04 10:11), Steven Rostedt wrote:
>>>>> On Wed, 04 Dec 2013 09:48:44 +0900
>>>>> Masami Hiramatsu  wrote:
>>>>>
>>>>>> fetch functions and actions. In that case, we can continue
>>>>>> to use the current interface but trace much faster.
>>>>>> Also, we can see what filters/arguments/actions are set
>>>>>> on each event.
>>>>>
>>>>> There's also the problem that the current filters work with the results
>>>>> of what is written to the buffer, not what is passed in by the trace
>>>>> point, as that isn't even displayed to the user.
>>>>
>>>> Agreed, that's why I said I doubt this implementation is in good
>>>> shape to integrate. The ktap style is better, since it just gets
>>>> parameters from the perf buffer entry (using the event format).
>>>
>>> Are you saying we should always store all arguments into the ring
>>> buffer and let the filter run on it?
>>
>> Yes, that is what ftrace does. I doubt your approach fits all of the existing
>> trace-event macros. However, I think that, just for dynamic events, you can
>> integrate the argument fetching and filtering.
>>
> Will this affect the user interface of perf-probe argument fetching?
>
> I mean, if we use the bpf backend, will we need gcc to compile bpf source
> for perf-probe argument fetching? As we know, current argument
> fetching goes through the kprobe_events/uprobe_events debugfs files, and
> ktap is based on this behavior.

No, I don't want to do that. Binary code fed into the kernel is
neither trusted nor controllable. I'd just like to see code that
optimizes the current fetching/filtering methods, and that is possible.

Anyway, as far as I can see, there seem to be two different models of
tracing in our minds.

A) Fixed event based tracing: In this model, there are several fixed
"events" which are well defined with fixed arguments. The tracer handles
these events and only uses a limited set of arguments. It's like packet
stream processing. ftrace, perf, etc. use this model.

B) Flexible event-point tracing: In this model, each tracer (or even
trace user) can freely define their own events; there will be some fixed
tracing points defined, but the arguments are defined by users. It's like
breakpoint debugging with a debugger. systemtap, ktap, etc. use this model.

Of course, both have pros/cons, and can share some fundamental features.
e.g. the B model has good flexibility and the A model is easy to use for beginners.

I think we'd better not integrate these two, but find a better way
to share each other's functionality.

Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu...@hitachi.com




Re: [RFC PATCH tip 4/5] use BPF in tracing filters

2013-12-06 Thread Jovi Zhangwei
On Fri, Dec 6, 2013 at 4:43 PM, Masami Hiramatsu
 wrote:
> (2013/12/05 14:11), Alexei Starovoitov wrote:
>> On Wed, Dec 4, 2013 at 4:05 PM, Masami Hiramatsu
>>  wrote:
>>> (2013/12/04 10:11), Steven Rostedt wrote:
>>>> On Wed, 04 Dec 2013 09:48:44 +0900
>>>> Masami Hiramatsu  wrote:
>>>>
>>>>> fetch functions and actions. In that case, we can continue
>>>>> to use the current interface but trace much faster.
>>>>> Also, we can see what filters/arguments/actions are set
>>>>> on each event.
>>>>
>>>> There's also the problem that the current filters work with the results
>>>> of what is written to the buffer, not what is passed in by the trace
>>>> point, as that isn't even displayed to the user.
>>>
>>> Agreed, that's why I said I doubt this implementation is in good
>>> shape to integrate. The ktap style is better, since it just gets
>>> parameters from the perf buffer entry (using the event format).
>>
>> Are you saying we should always store all arguments into the ring
>> buffer and let the filter run on it?
>
> Yes, that is what ftrace does. I doubt your approach fits all of the existing
> trace-event macros. However, I think that, just for dynamic events, you can
> integrate the argument fetching and filtering.
>
Will this affect the user interface of perf-probe argument fetching?

I mean, if we use the bpf backend, will we need gcc to compile bpf source
for perf-probe argument fetching? As we know, current argument
fetching goes through the kprobe_events/uprobe_events debugfs files, and
ktap is based on this behavior.

Thanks.

Jovi.

>> It's slower, but it's cleaner because it's human readable? Since ktap's
>> arg1, which matches the first argument of the tracepoint, is better than
>> doing ctx->regs.di? Sure. si->arg1 is easy to fix.
>> With the si->arg1 tweak, the bpf will become architecture independent. It
>> will run through the JIT on x86 and through the interpreter everywhere else.
>> But for kprobes, the user has to specify 'var=cpu_register' during probe
>> creation… how is that better than doing the same in the filter?
>
> Haven't you used perf-probe yet? It already supports this kind of
> translation from kernel local variable names to registers, offsets,
> and dereferences. :) And kprobe-events can parse such arguments into
> a method chain. See Documentation/trace/kprobetrace.txt and
> tools/perf/Documentation/perf-probe.txt for more details.
> Anyway, I'd like to use bpf for re-implementing the fetch methods. :)
>
> Thank you,
>
> --
> Masami HIRAMATSU
> IT Management Research Dept. Linux Technology Center
> Hitachi, Ltd., Yokohama Research Laboratory
> E-mail: masami.hiramatsu...@hitachi.com
>
>


Re: [RFC PATCH tip 4/5] use BPF in tracing filters

2013-12-06 Thread Masami Hiramatsu
(2013/12/05 14:11), Alexei Starovoitov wrote:
> On Wed, Dec 4, 2013 at 4:05 PM, Masami Hiramatsu
>  wrote:
>> (2013/12/04 10:11), Steven Rostedt wrote:
>>> On Wed, 04 Dec 2013 09:48:44 +0900
>>> Masami Hiramatsu  wrote:
>>>
>>>> fetch functions and actions. In that case, we can continue
>>>> to use the current interface but trace much faster.
>>>> Also, we can see what filters/arguments/actions are set
>>>> on each event.
>>>
>>> There's also the problem that the current filters work with the results
>>> of what is written to the buffer, not what is passed in by the trace
>>> point, as that isn't even displayed to the user.
>>
>> Agreed, that's why I said I doubt this implementation is in good
>> shape to integrate. The ktap style is better, since it just gets
>> parameters from the perf buffer entry (using the event format).
>
> Are you saying we should always store all arguments into the ring
> buffer and let the filter run on it?

Yes, that is what ftrace does. I doubt your approach fits all of the existing
trace-event macros. However, I think that, just for dynamic events, you can
integrate the argument fetching and filtering.
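To make the "store first, filter afterwards" order concrete, here is a minimal C sketch in the spirit of what is described for ftrace, not its actual code: the tracepoint arguments are copied into a reserved buffer slot, the filter predicate is evaluated on that recorded copy, and the slot is discarded (not committed) when the predicate fails. The toy non-wrapping buffer, `struct sample_entry`, `len_over_100()`, and `trace_sample()` are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fixed event layout, analogous to a static trace event entry. */
struct sample_entry {
    unsigned long skb;     /* recorded argument 1 */
    int len;               /* recorded argument 2 */
};

#define RB_SLOTS 16

/* A toy buffer (no wrap-around): reserve a slot, then commit or discard it. */
struct toy_ring {
    struct sample_entry slots[RB_SLOTS];
    size_t head;           /* number of committed entries */
};

/* Filter predicate evaluated on the *recorded* entry, not on the raw args. */
typedef bool (*entry_filter)(const struct sample_entry *e);

static bool len_over_100(const struct sample_entry *e)
{
    return e->len > 100;
}

/* Record the tracepoint arguments, then run the filter on the buffer copy.
 * Returns true if the entry was committed. */
static bool trace_sample(struct toy_ring *rb, unsigned long skb, int len,
                         entry_filter filter)
{
    assert(rb->head < RB_SLOTS);                      /* toy buffer is full */
    struct sample_entry *e = &rb->slots[rb->head];    /* reserve */

    e->skb = skb;                                     /* assign fields */
    e->len = len;

    if (filter && !filter(e))
        return false;      /* discard: head is not advanced */
    rb->head++;            /* commit */
    return true;
}
```

The trade-off raised in the thread is visible here: every argument is copied before the filter gets a chance to reject the event.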

> It's slower, but it's cleaner because it's human readable? Since ktap's
> arg1, which matches the first argument of the tracepoint, is better than
> doing ctx->regs.di? Sure. si->arg1 is easy to fix.
> With the si->arg1 tweak, the bpf will become architecture independent. It
> will run through the JIT on x86 and through the interpreter everywhere else.
> But for kprobes, the user has to specify 'var=cpu_register' during probe
> creation… how is that better than doing the same in the filter?

Haven't you used perf-probe yet? It already supports this kind of
translation from kernel local variable names to registers, offsets,
and dereferences. :) And kprobe-events can parse such arguments into
a method chain. See Documentation/trace/kprobetrace.txt and
tools/perf/Documentation/perf-probe.txt for more details.
Anyway, I'd like to use bpf for re-implementing the fetch methods. :)
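The "method chain" can be pictured as a list of fetch steps applied in order, for example reading a register and then dereferencing memory at an offset, which is what a kprobe-events argument of the form `+OFFS(%di)` expresses (kprobetrace.txt documents the `%REG` and `+|-offs(FETCHARG)` forms). The following C sketch is a userspace illustration only: `struct fake_regs`, `struct fetch_step`, and `run_fetch_chain()` are hypothetical, and unlike a kernel fetch method it does no fault handling, so every address it dereferences must be valid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical snapshot of some argument registers at the probe point. */
struct fake_regs {
    uintptr_t di, si, dx;
};

enum fetch_kind { FETCH_REG_DI, FETCH_REG_SI, FETCH_DEREF };

/* One step of a fetch chain: read a register, or dereference at an offset. */
struct fetch_step {
    enum fetch_kind kind;
    ptrdiff_t offset;      /* used only by FETCH_DEREF */
};

/* Run the chain; each step consumes the value produced by the previous one. */
static uintptr_t run_fetch_chain(const struct fake_regs *regs,
                                 const struct fetch_step *steps, size_t n)
{
    uintptr_t val = 0;

    for (size_t i = 0; i < n; i++) {
        switch (steps[i].kind) {
        case FETCH_REG_DI:
            val = regs->di;
            break;
        case FETCH_REG_SI:
            val = regs->si;
            break;
        case FETCH_DEREF:
            /* Userspace stand-in: no fault handling, address must be valid. */
            assert(val != 0);
            memcpy(&val, (const char *)val + steps[i].offset, sizeof(val));
            break;
        }
    }
    return val;
}
```

A two-step chain of `{ FETCH_REG_DI, 0 }` followed by `{ FETCH_DEREF, offsetof(struct some_type, field) }` corresponds to a `+OFFS(%di)` argument; perf-probe's role is to work out that offset from debuginfo so the user can write the variable name instead.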

Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu...@hitachi.com




Re: [RFC PATCH tip 4/5] use BPF in tracing filters

2013-12-04 Thread Alexei Starovoitov
On Wed, Dec 4, 2013 at 4:05 PM, Masami Hiramatsu
 wrote:
> (2013/12/04 10:11), Steven Rostedt wrote:
>> On Wed, 04 Dec 2013 09:48:44 +0900
>> Masami Hiramatsu  wrote:
>>
>>> fetch functions and actions. In that case, we can continue
>>> to use the current interface but trace much faster.
>>> Also, we can see what filters/arguments/actions are set
>>> on each event.
>>
>> There's also the problem that the current filters work with the results
>> of what is written to the buffer, not what is passed in by the trace
>> point, as that isn't even displayed to the user.
>
> Agreed, that's why I said I doubt this implementation is in good
> shape to integrate. The ktap style is better, since it just gets
> parameters from the perf buffer entry (using the event format).

Are you saying we should always store all arguments into the ring
buffer and let the filter run on it?
It's slower, but it's cleaner because it's human readable? Since ktap's
arg1, which matches the first argument of the tracepoint, is better than
doing ctx->regs.di? Sure. si->arg1 is easy to fix.
With the si->arg1 tweak, the bpf will become architecture independent. It
will run through the JIT on x86 and through the interpreter everywhere else.
But for kprobes, the user has to specify 'var=cpu_register' during probe
creation… how is that better than doing the same in the filter?
I'm open to suggestions on how to improve the usability.
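The `si->arg1` tweak can be read as arch-specific glue that copies the argument registers into an arch-neutral array, so the filter only ever refers to `arg1`, `arg2`, and so on. A minimal C sketch of that split follows. `struct fake_pt_regs`, `struct probe_args`, `fill_args_x86_64()`, and `filter_nonnull_arg1()` are hypothetical; only the register order (rdi, rsi, rdx, rcx, r8, r9 for the first six integer arguments) is taken from the x86_64 System V calling convention, which is also why `skb` lands in `rdi` in the dst_discard() example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register snapshot; field names echo x86_64 register names. */
struct fake_pt_regs {
    uint64_t di, si, dx, cx, r8, r9;
};

/* Arch-neutral view handed to the filter. */
struct probe_args {
    uint64_t arg[6];       /* arg[0] is "arg1", the first argument */
};

/* Arch-specific glue: x86_64 SysV passes the first six integer args in
 * rdi, rsi, rdx, rcx, r8, r9, in that order. */
static void fill_args_x86_64(const struct fake_pt_regs *regs,
                             struct probe_args *out)
{
    assert(regs != NULL && out != NULL);
    out->arg[0] = regs->di;
    out->arg[1] = regs->si;
    out->arg[2] = regs->dx;
    out->arg[3] = regs->cx;
    out->arg[4] = regs->r8;
    out->arg[5] = regs->r9;
}

/* Arch-neutral filter: keep only events whose first argument is non-NULL.
 * It never names a register, so it needs no change on other arches. */
static int filter_nonnull_arg1(const struct probe_args *a)
{
    return a->arg[0] != 0;
}
```

Another architecture would supply its own `fill_args_*()` glue, while `filter_nonnull_arg1()` stays unchanged.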

Thanks
Alexei


Re: [RFC PATCH tip 4/5] use BPF in tracing filters

2013-12-04 Thread Masami Hiramatsu
(2013/12/04 10:11), Steven Rostedt wrote:
> On Wed, 04 Dec 2013 09:48:44 +0900
> Masami Hiramatsu  wrote:
> 
>> (2013/12/03 13:28), Alexei Starovoitov wrote:
>>> Such filters can be written in C and allow safe read-only access to
>>> any kernel data structure.
>>> Like systemtap but with safety guaranteed by the kernel.
>>>
>>> The user can do:
>>> cat bpf_program > /sys/kernel/debug/tracing/.../filter
>>> if the tracing event is either static or dynamic via kprobe_events.
>>>
>>> The program can be anything as long as bpf_check() can verify its safety.
>>> For example, the user can create a kprobe_event on dst_discard()
>>> and use code logically equivalent to the following inside the BPF filter:
>>>   skb = (struct sk_buff *)ctx->regs.di;
>>>   dev = bpf_load_pointer(&skb->dev);
>>> to access 'struct net_device'.
>>> Since its prototype is 'int dst_discard(struct sk_buff *skb);',
>>> the 'skb' pointer is in the 'rdi' register on x86_64.
>>> bpf_load_pointer() will try to fetch the 'dev' field of the 'sk_buff'
>>> structure and will suppress the page fault if the pointer is incorrect.
>>
>> Hmm, I doubt this is a good way to integrate with ftrace.
>> I'd prefer to use this for replacing the current ftrace filter,
> 
> I'm not sure how we can do that. Especially since the bpf is very arch
> specific, and the current filters work for all archs.

My idea is to use BPF as an arch-specific optimization for the
ftrace filter. On other archs, the filter works with the current
code. So ftrace keeps the filter_preds and compiles them into
BPF bytecode if possible.
And this backend optimization can also be done for the fetch methods.
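One way to picture this "compile if possible, otherwise fall back" design is a filter object whose evaluator is chosen at setup time: a specialized fast path when the backend supports the predicates, or the generic predicate walker that works on every arch. In the C sketch below the fast path is just a hand-written function standing in for JIT-compiled BPF, and `struct pred`, `walk_preds()`, `single_gt()`, and `compile_preds()` are hypothetical names, not ftrace's `filter_pred` code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum pred_op { OP_EQ, OP_GT };

/* Hypothetical predicate: compare a recorded field at 'offset' with 'val'. */
struct pred {
    size_t offset;
    enum pred_op op;
    long val;
};

struct filter {
    const struct pred *preds;
    size_t npreds;
    bool (*eval)(const struct filter *f, const void *rec);
};

static long field_at(const void *rec, size_t offset)
{
    long v;
    memcpy(&v, (const char *)rec + offset, sizeof(v));
    return v;
}

/* Generic walker: works on every arch, ANDs all predicates together. */
static bool walk_preds(const struct filter *f, const void *rec)
{
    for (size_t i = 0; i < f->npreds; i++) {
        long v = field_at(rec, f->preds[i].offset);
        bool ok = f->preds[i].op == OP_EQ ? v == f->preds[i].val
                                          : v > f->preds[i].val;
        if (!ok)
            return false;
    }
    return true;
}

/* Stand-in for a compiled fast path: handles a single OP_GT predicate only. */
static bool single_gt(const struct filter *f, const void *rec)
{
    return field_at(rec, f->preds[0].offset) > f->preds[0].val;
}

/* "Compile if possible": pick the fast path when supported, else fall back. */
static void compile_preds(struct filter *f, bool backend_available)
{
    assert(f->preds != NULL || f->npreds == 0);
    if (backend_available && f->npreds == 1 && f->preds[0].op == OP_GT)
        f->eval = single_gt;
    else
        f->eval = walk_preds;
}
```

Because both paths are driven by the same predicate list, the user-visible filter stays the same whichever backend ends up running it.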

>> fetch functions and actions. In that case, we can continue
>> to use the current interface but trace much faster.
>> Also, we can see what filters/arguments/actions are set
>> on each event.
> 
> There's also the problem that the current filters work with the results
> of what is written to the buffer, not what is passed in by the trace
> point, as that isn't even displayed to the user.

Agreed, that's why I said I doubt this implementation is in good
shape to integrate. The ktap style is better, since it just gets
parameters from the perf buffer entry (using the event format).
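Getting parameters "from the perf buffer entry (using the event format)" means the consumer looks fields up by name in the event's format description, the same name/offset/size information that an event's `format` file lists, instead of knowing which register held what. A minimal C sketch under assumptions: `struct field_desc`, the two-field `sample_format`, and `get_field()` are hypothetical, and only 4- and 8-byte integer fields are handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical field descriptor, mirroring the offset/size pairs that an
 * event's format file lists for each field. */
struct field_desc {
    const char *name;
    size_t offset;
    size_t size;
};

/* Hypothetical format for a two-field event record. */
static const struct field_desc sample_format[] = {
    { "skbaddr", 0, 8 },
    { "len",     8, 4 },
};

/* Look a field up by name and copy its value out of the raw entry.
 * Returns false for an unknown field or an unsupported size. */
static bool get_field(const struct field_desc *fmt, size_t nfields,
                      const unsigned char *entry, const char *name,
                      int64_t *out)
{
    assert(out != NULL);
    for (size_t i = 0; i < nfields; i++) {
        if (strcmp(fmt[i].name, name) != 0)
            continue;
        if (fmt[i].size == 8) {
            memcpy(out, entry + fmt[i].offset, 8);
        } else if (fmt[i].size == 4) {
            int32_t v;
            memcpy(&v, entry + fmt[i].offset, 4);
            *out = v;          /* sign-extend to 64 bits */
        } else {
            return false;      /* unsupported size in this sketch */
        }
        return true;
    }
    return false;              /* no such field */
}
```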

Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu...@hitachi.com




Re: [RFC PATCH tip 4/5] use BPF in tracing filters

2013-12-04 Thread Masami Hiramatsu
(2013/12/04 10:11), Steven Rostedt wrote:
 On Wed, 04 Dec 2013 09:48:44 +0900
 Masami Hiramatsu masami.hiramatsu...@hitachi.com wrote:
 
 (2013/12/03 13:28), Alexei Starovoitov wrote:
 Such filters can be written in C and allow safe read-only access to
 any kernel data structure.
 Like systemtap but with safety guaranteed by kernel.

 The user can do:
 cat bpf_program  /sys/kernel/debug/tracing/.../filter
 if tracing event is either static or dynamic via kprobe_events.

 The program can be anything as long as bpf_check() can verify its safety.
 For example, the user can create kprobe_event on dst_discard()
 and use logically following code inside BPF filter:
   skb = (struct sk_buff *)ctx-regs.di;
   dev = bpf_load_pointer(skb-dev);
 to access 'struct net_device'
 Since its prototype is 'int dst_discard(struct sk_buff *skb);'
 'skb' pointer is in 'rdi' register on x86_64
 bpf_load_pointer() will try to fetch 'dev' field of 'sk_buff'
 structure and will suppress page-fault if pointer is incorrect.

 Hmm, I doubt it is a good way to integrate with ftrace.
 I prefer to use this for replacing current ftrace filter,
 
 I'm not sure how we can do that. Especially since the bpf is very arch
 specific, and the current filters work for all archs.

My idea is to use BPF for the arch specific optimization for
ftrace filter. For the other arch, filter works with current
code. So the ftrace holds filter_preds and compile it in
BPF bytecode if possible.
And this backend optimization also can be done for fetch methods.

 fetch functions and actions. In that case, we can continue
 to use current interface but much faster to trace.
 Also, we can see what filter/arguments/actions are set
 on each event.
 
 There's also the problem that the current filters work with the results
 of what is written to the buffer, not what is passed in by the trace
 point, as that isn't even displayed to the user.

Agreed, so I've said I doubt this implementation is a good
shape to integrate. Ktap style is better, since it just gets
parameters from perf buffer entry (using event format).

Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu...@hitachi.com


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH tip 4/5] use BPF in tracing filters

2013-12-04 Thread Alexei Starovoitov
On Wed, Dec 4, 2013 at 4:05 PM, Masami Hiramatsu
<masami.hiramatsu...@hitachi.com> wrote:
> (2013/12/04 10:11), Steven Rostedt wrote:
>> On Wed, 04 Dec 2013 09:48:44 +0900
>> Masami Hiramatsu <masami.hiramatsu...@hitachi.com> wrote:
>>
>>> fetch functions and actions. In that case, we can continue
>>> to use current interface but much faster to trace.
>>> Also, we can see what filter/arguments/actions are set
>>> on each event.
>>
>> There's also the problem that the current filters work with the results
>> of what is written to the buffer, not what is passed in by the trace
>> point, as that isn't even displayed to the user.
>
> Agreed, that's why I said I doubt this implementation is in good shape
> to integrate. The ktap style is better, since it just gets the
> parameters from the perf buffer entry (using the event format).

Are you saying we should always store all arguments into the ring buffer
and let the filter run on it?
It's slower, but it's cleaner because it's human readable? Because
ktap's arg1 matching the first argument of the tracepoint is better
than doing ctx->regs.di? Sure. si->arg1 is easy to fix.
With the si->arg1 tweak, BPF will become architecture independent. It
will run through the JIT on x86 and through the interpreter everywhere
else.
But for kprobes, the user has to specify 'var=cpu_register' during probe
creation… how is that better than doing the same in the filter?
I'm open to suggestions on how to improve the usability.

Thanks
Alexei


Re: [RFC PATCH tip 4/5] use BPF in tracing filters

2013-12-03 Thread Steven Rostedt
On Wed, 04 Dec 2013 09:48:44 +0900
Masami Hiramatsu <masami.hiramatsu...@hitachi.com> wrote:

> (2013/12/03 13:28), Alexei Starovoitov wrote:
> > Such filters can be written in C and allow safe read-only access to
> > any kernel data structure.
> > Like systemtap but with safety guaranteed by kernel.
> > 
> > The user can do:
> > cat bpf_program > /sys/kernel/debug/tracing/.../filter
> > if tracing event is either static or dynamic via kprobe_events.
> > 
> > The program can be anything as long as bpf_check() can verify its safety.
> > For example, the user can create kprobe_event on dst_discard()
> > and use logically following code inside BPF filter:
> >   skb = (struct sk_buff *)ctx->regs.di;
> >   dev = bpf_load_pointer(&skb->dev);
> > to access 'struct net_device'
> > Since its prototype is 'int dst_discard(struct sk_buff *skb);'
> > 'skb' pointer is in 'rdi' register on x86_64
> > bpf_load_pointer() will try to fetch 'dev' field of 'sk_buff'
> > structure and will suppress page-fault if pointer is incorrect.
> 
> Hmm, I doubt it is a good way to integrate with ftrace.
> I prefer to use this for replacing current ftrace filter,

I'm not sure how we can do that. Especially since the bpf is very arch
specific, and the current filters work for all archs.

> fetch functions and actions. In that case, we can continue
> to use current interface but much faster to trace.
> Also, we can see what filter/arguments/actions are set
> on each event.

There's also the problem that the current filters work with the results
of what is written to the buffer, not what is passed in by the trace
point, as that isn't even displayed to the user.

For example, sched_switch gets passed struct task_struct *prev and
*next; from those we save prev_comm, prev_pid, prev_prio, prev_state,
next_comm, next_pid and next_prio. These are shown to the user
by the event's format file:

field:char prev_comm[32];   offset:16;  size:16;  signed:1;
field:pid_t prev_pid;       offset:32;  size:4;   signed:1;
field:int prev_prio;        offset:36;  size:4;   signed:1;
field:long prev_state;      offset:40;  size:8;   signed:1;
field:char next_comm[32];   offset:48;  size:16;  signed:1;
field:pid_t next_pid;       offset:64;  size:4;   signed:1;
field:int next_prio;        offset:68;  size:4;   signed:1;

And the filters can check "next_prio > 10" and what not. The bpf
program needs to access next->prio. There's nothing that shows the user
what is passed to the tracepoint, and from that, what structure member
to use from there. The user would be required to look at the source
code of the given kernel. A requirement not needed by the current
implementation.
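For comparison, the existing filter can work from the format file alone. The sketch below, assuming the field-line layout shown above, parses one field: line with sscanf() and then evaluates the equivalent of next_prio > 10 against a raw recorded entry by offset and size. struct field_desc, parse_field(), read_signed() and filter_field_gt() are hypothetical helpers, not ftrace's parser; array fields are reduced to their base name and signed:1 is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical parsed form of one "field:" line of an event format file. */
struct field_desc {
	char name[64];
	unsigned offset, size;
	int is_signed;
};

/* Parse e.g. "field:int next_prio;  offset:68;  size:4;  signed:1;".
 * The name is the last word of the declaration, with any "[N]" dropped. */
static bool parse_field(const char *line, struct field_desc *fd)
{
	char decl[128];
	if (sscanf(line, " field:%127[^;]; offset:%u; size:%u; signed:%d;",
		   decl, &fd->offset, &fd->size, &fd->is_signed) != 4)
		return false;
	const char *name = strrchr(decl, ' ');
	snprintf(fd->name, sizeof(fd->name), "%s", name ? name + 1 : decl);
	char *bracket = strchr(fd->name, '[');
	if (bracket)
		*bracket = '\0';
	return true;
}

/* Read a signed integer field (assumes signed:1) from a raw record. */
static bool read_signed(const void *rec, const struct field_desc *fd, int64_t *out)
{
	const unsigned char *p = (const unsigned char *)rec + fd->offset;
	switch (fd->size) {
	case 1: { int8_t v;  memcpy(&v, p, sizeof(v)); *out = v; return true; }
	case 2: { int16_t v; memcpy(&v, p, sizeof(v)); *out = v; return true; }
	case 4: { int32_t v; memcpy(&v, p, sizeof(v)); *out = v; return true; }
	case 8: { int64_t v; memcpy(&v, p, sizeof(v)); *out = v; return true; }
	default: return false;
	}
}

/* What a text filter such as "next_prio > 10" boils down to: locate the
 * field via the format file, then compare the value stored in the entry. */
static bool filter_field_gt(const void *rec, const struct field_desc *fd, int64_t k)
{
	int64_t v;
	return read_signed(rec, fd, &v) && v > k;
}
```

Nothing here needs to know that the value originally came from next->prio; that knowledge stays inside the tracepoint, which is the point being made about not requiring the kernel source.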

Also, there are results that cannot be trivially converted. Taking a
quick look at some TRACE_EVENT() structures, I found bcache_bio that
has this:

TP_fast_assign(
        __entry->dev            = bio->bi_bdev->bd_dev;
        __entry->sector         = bio->bi_sector;
        __entry->nr_sector      = bio->bi_size >> 9;
        blk_fill_rwbs(__entry->rwbs, bio->bi_rw, bio->bi_size);
),

Here blk_fill_rwbs() fills in entry->rwbs based on the bi_rw
field. A filter must remain backward compatible with something
like:

rwbs == "w"  or rwbs =~ '*w*'


Now maybe we can make the filter code use some of the bpf if possible,
but to get the result, it still needs to write to the ring buffer and
discard the entry if it doesn't match. That will not make it any faster
than the original trace, but perhaps faster than the trace + current filter.
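The reserve, fill, then commit-or-discard flow described above can be sketched with a toy single-writer buffer. struct evbuf, evbuf_reserve(), evbuf_commit(), evbuf_discard() and trace_event() are hypothetical names, not the kernel ring-buffer API, and there is no wraparound or locking. The filled counter shows the cost under discussion: a rejected event is still written before it is thrown away.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical entry layout, loosely modelled on two sched_switch fields. */
struct sched_entry {
	int next_pid;
	int next_prio;
};

/* Toy single-writer buffer: no wraparound, no locking. */
struct evbuf {
	struct sched_entry slot[16];
	size_t head;      /* number of committed entries */
	size_t filled;    /* entries written, committed or not */
	bool reserved;    /* slot[head] handed out, not yet committed */
};

static struct sched_entry *evbuf_reserve(struct evbuf *b)
{
	if (b->reserved || b->head == sizeof(b->slot) / sizeof(b->slot[0]))
		return NULL;
	b->reserved = true;
	return &b->slot[b->head];
}

static void evbuf_commit(struct evbuf *b)
{
	assert(b->reserved);
	b->head++;
	b->reserved = false;
}

static void evbuf_discard(struct evbuf *b)
{
	assert(b->reserved);
	b->reserved = false;   /* slot[head] will simply be overwritten */
}

/* Record first, filter second: the entry is always filled in, and only
 * afterwards is it committed or discarded based on the recorded value. */
static bool trace_event(struct evbuf *b, int pid, int prio, int min_prio)
{
	struct sched_entry *e = evbuf_reserve(b);
	if (!e)
		return false;
	e->next_pid = pid;
	e->next_prio = prio;
	b->filled++;
	if (e->next_prio > min_prio) {
		evbuf_commit(b);
		return true;
	}
	evbuf_discard(b);
	return false;
}
```

Speeding up only the comparison leaves the fill step untouched, which is why this path cannot beat an unfiltered trace.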

The speedup that was shown came from processing the parameters of the
trace point rather than the result. That currently requires the user to
have full access to the source of the kernel they are tracing.

-- Steve


Re: [RFC PATCH tip 4/5] use BPF in tracing filters

2013-12-03 Thread Masami Hiramatsu
(2013/12/03 13:28), Alexei Starovoitov wrote:
> Such filters can be written in C and allow safe read-only access to
> any kernel data structure.
> Like systemtap but with safety guaranteed by kernel.
> 
> The user can do:
> cat bpf_program > /sys/kernel/debug/tracing/.../filter
> if tracing event is either static or dynamic via kprobe_events.
> 
> The program can be anything as long as bpf_check() can verify its safety.
> For example, the user can create kprobe_event on dst_discard()
> and use logically following code inside BPF filter:
>   skb = (struct sk_buff *)ctx->regs.di;
>   dev = bpf_load_pointer(&skb->dev);
> to access 'struct net_device'
> Since its prototype is 'int dst_discard(struct sk_buff *skb);'
> 'skb' pointer is in 'rdi' register on x86_64
> bpf_load_pointer() will try to fetch 'dev' field of 'sk_buff'
> structure and will suppress page-fault if pointer is incorrect.

Hmm, I doubt it is a good way to integrate with ftrace.
I prefer to use this for replacing current ftrace filter,
fetch functions and actions. In that case, we can continue
to use current interface but much faster to trace.
Also, we can see what filter/arguments/actions are set
on each event.
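The quoted patch text relies on bpf_load_pointer() suppressing the fault when a pointer is bad. How the patch does that inside the kernel is not shown in this thread; the sketch below is only a user-space analogue, assuming Linux with POSIX signals: a SIGSEGV/SIGBUS raised during one guarded load is turned into a false return via sigsetjmp()/siglongjmp(). safe_load_init() and safe_load_ptr() are hypothetical names, and the single global jump buffer makes it unsuitable for multithreaded use.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>   /* used by the usage example: mmap(PROT_NONE) */

static sigjmp_buf fault_jmp;
static volatile sig_atomic_t in_guarded_load;

/* Fault handler: jump back into safe_load_ptr() if a guarded load is in
 * progress, otherwise fall back to the default action (crash as usual). */
static void on_fault(int sig)
{
	if (in_guarded_load)
		siglongjmp(fault_jmp, 1);
	signal(sig, SIG_DFL);
	raise(sig);
}

/* Install the handler once before calling safe_load_ptr(). */
static void safe_load_init(void)
{
	struct sigaction sa;
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_fault;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGSEGV, &sa, NULL);
	sigaction(SIGBUS, &sa, NULL);
}

/* Read one pointer-sized word at addr. Returns false (and leaves *out
 * untouched) if the read faults, instead of killing the process. */
static bool safe_load_ptr(const void *addr, uintptr_t *out)
{
	assert(out != NULL);
	if (sigsetjmp(fault_jmp, 1)) {      /* 1: also restore the signal mask */
		in_guarded_load = 0;
		return false;
	}
	in_guarded_load = 1;
	uintptr_t v = *(const volatile uintptr_t *)addr;
	in_guarded_load = 0;
	*out = v;
	return true;
}
```

Reading from a PROT_NONE mapping returns false and the process keeps running, which mirrors the "suppress page-fault if pointer is incorrect" behaviour described in the quote.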

Thank you,


-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu...@hitachi.com

