Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.

2015-05-05 Thread Wang Nan
On 2015/5/6 12:56, Alexei Starovoitov wrote:
> On 5/5/15 9:46 PM, Wang Nan wrote:
>> Hi Alexei Starovoitov,
>>
>> Have you ever read this mail?
> 
> please don't top post.
> 
>>>> all makes sense and your use case fits quite well into existing
>>>> bpf+kprobe model. I'm not sure why you're calling it a 'problem'.
>>>> A problem of how to display that call stack from perf?
>>>> I would say it fits better as a sample than a trace.
>>>> If you dump it as a trace, it won't be easy to decipher, whereas if you
>>>> treat it as a sampling event, perf record/report facility will pick it up and
>>>> display it nicely. Meaning that one sample == lock_page/unlock_page
>>>> latency > N. Then existing sample_callchain flag should work.

>>>
>>> Quite well. Do we have an eBPF function like
>>>
>>> static int (*bpf_perf_sample)(const char *fmt, int fmt_size, ...) = BPF_FUNC_perf_sample
>>>
>>> so we can use it in the program probed in the body of __unlock_page() like that:
>>>
>>>   ...
>>>   if (latency > 0.5s)
>>>       bpf_perf_sample("page=%p, latency=%d", sizeof(...), page, latency);
> 
> No need for extra helper. There is already a return value from
> the program for this purpose.
> From kernel/trace/bpf_trace.c:
>  * Return: BPF programs always return an integer which is interpreted by
>  * kprobe handler as:
>  * 0 - return from kprobe (event is filtered out)
>  * 1 - store kprobe event into ring buffer
> 
> in your case the program attached to unlock_page() can return 1
> when it needs to store this event into ring buffer, so that perf can
> process it. If I'm not mistaken, the sample_callchain flag cannot be
> applied to kprobe events, but that's a general problem (not
> related to bpf) and can be addressed as such.
> 

That's great! Thanks for your response!
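
A minimal userspace sketch of the filter discussed above, for illustration only; it is not eBPF code and not taken from the patch set. It assumes timestamps are kept per page address in a small table (a real program would use a BPF map) and that the 0.5 s threshold from the example is used. The names on_lock_page, on_unlock_page, records and THRESHOLD_NS are hypothetical. Only the return convention (0 = filtered out, 1 = store the event) comes from the quoted kernel comment.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PAGES    64
#define THRESHOLD_NS 500000000ULL   /* 0.5 s, as in the example above */

struct lock_record {
    const void *page;    /* key: page address; NULL marks a free slot */
    uint64_t start_ns;   /* when the __lock_page() probe fired */
};

/* Stand-in for a BPF hash map keyed by page address. */
static struct lock_record records[MAX_PAGES];

/* Find the slot holding 'page'; find_slot(NULL) returns a free slot. */
static struct lock_record *find_slot(const void *page)
{
    size_t i;

    for (i = 0; i < MAX_PAGES; i++)
        if (records[i].page == page)
            return &records[i];
    return NULL;
}

/* Models the program probed in __lock_page(): remember the start time. */
int on_lock_page(const void *page, uint64_t now_ns)
{
    struct lock_record *rec;

    if (page == NULL)
        return 0;
    rec = find_slot(page);
    if (rec == NULL)
        rec = find_slot(NULL);      /* take a free slot, if any */
    if (rec != NULL) {
        rec->page = page;
        rec->start_ns = now_ns;
    }
    return 0;                       /* the lock side is never recorded */
}

/* Models the program probed at __unlock_page() entry.
 * Returns 1 (store event) only when the latency exceeds the threshold. */
int on_unlock_page(const void *page, uint64_t now_ns)
{
    struct lock_record *rec;
    uint64_t latency_ns;

    if (page == NULL)
        return 0;
    rec = find_slot(page);
    if (rec == NULL)
        return 0;                   /* no matching lock seen: filter out */
    latency_ns = now_ns - rec->start_ns;
    rec->page = NULL;               /* release the slot */
    return latency_ns > THRESHOLD_NS ? 1 : 0;
}
```

With this shape, the unlock-side program acts purely as a filter: the decision to record is carried by its return value, so no extra helper is needed.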


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.

2015-05-05 Thread Alexei Starovoitov

On 5/5/15 9:46 PM, Wang Nan wrote:
> Hi Alexei Starovoitov,
>
> Have you ever read this mail?

please don't top post.

>>> all makes sense and your use case fits quite well into existing
>>> bpf+kprobe model. I'm not sure why you're calling it a 'problem'.
>>> A problem of how to display that call stack from perf?
>>> I would say it fits better as a sample than a trace.
>>> If you dump it as a trace, it won't be easy to decipher, whereas if you
>>> treat it as a sampling event, perf record/report facility will pick it up and
>>> display it nicely. Meaning that one sample == lock_page/unlock_page
>>> latency > N. Then existing sample_callchain flag should work.
>>
>> Quite well. Do we have an eBPF function like
>>
>> static int (*bpf_perf_sample)(const char *fmt, int fmt_size, ...) = BPF_FUNC_perf_sample
>>
>> so we can use it in the program probed in the body of __unlock_page() like that:
>>
>>   ...
>>   if (latency > 0.5s)
>>       bpf_perf_sample("page=%p, latency=%d", sizeof(...), page, latency);

No need for extra helper. There is already a return value from
the program for this purpose.
From kernel/trace/bpf_trace.c:
 * Return: BPF programs always return an integer which is interpreted by
 * kprobe handler as:
 * 0 - return from kprobe (event is filtered out)
 * 1 - store kprobe event into ring buffer

in your case the program attached to unlock_page() can return 1
when it needs to store this event into ring buffer, so that perf can
process it. If I'm not mistaken, the sample_callchain flag cannot be
applied to kprobe events, but that's a general problem (not
related to bpf) and can be addressed as such.



Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.

2015-05-05 Thread Wang Nan
Hi Alexei Starovoitov,

Have you ever read this mail?

I'm very interested in triggering a perf sample from BPF code.
You said it is not a problem. Could you please give me some
further information?

Thank you.

On 2015/5/5 14:14, Wang Nan wrote:
> On 2015/5/5 13:49, Alexei Starovoitov wrote:
>> On 5/4/15 9:41 PM, Wang Nan wrote:
>>>
>>> That's great. Could you please append the description of 'llvm -s' into your
>>> README or comments? It has cost me a lot of time for dumping eBPF
>>> instructions so I decided to add it into perf...
>>
>> sure. it's just -filetype=asm flag to llc instead of -filetype=obj.
>> Eventually it will work as normal 'clang -S file.c' when few more
>> llvm commits are accepted upstream.
>>
>>>>> My colleague He Kuang is working on variable accessing. Probing inside a
>>>>> function body and accessing its local variable will be supported like this:
>>>>>
>>>>>   SEC("config") char _prog_config[] = "prog: func_name:1234 vara=localvara";
>>>>>   int prog(struct pt_regs *ctx, unsigned long vara) {
>>>>>     // vara is the value of localvara of function func_name
>>>>>   }
>>>>
>>>> that would be great. I'm not sure though how you can achieve that
>>>> without changing C front-end ?
>>>
>>> It's not very difficult. He is trying to generate the loader of vara
>>> as a prologue, then paste the prologue and the main eBPF program together.
>>> From the viewpoint of the kernel bpf verifier, there is only one param (ctx);
>>> the prologue program fetches the value of vara and puts it into the proper
>>> register, then the main program runs.
>>
>> got it. I think that's much cleaner than what I was proposing.
>> The only question is then:
>> char _prog_config[] = "prog: func_name:1234 vara=localvara"
>> should actually be something like "... r2=localvara", right?
>> since prologue would need to assign into r2.
>> Otherwise I don't see where you find out about 'vara' inside
>> compiled bpf code.
>>
> 
> I think the calling convention could teach us which var should go to which
> register. In the case of
>
>  SEC("config") char _prog_config[] = "prog: func_name:1234 vara=localvara varb=globalvarb";
>  int prog(struct pt_regs *ctx, unsigned long vara, unsigned long varb) { ... }
>
> llvm should compile 'prog' according to the calling convention. The body of that
> program should assume vara in r2 and varb in r3. The prologue also puts the
> vars into r2 and r3 according to the calling convention. Therefore, after pasting
> them together, the final program should run properly. There is no need to
> describe register numbers explicitly.
> What do you think?
> 
> 
>> Would be nice if this can be done without debug info.
>> Like in tracex2_kern.c I have:
>> SEC("kprobe/sys_write")
>> int bpf_prog(struct pt_regs *ctx)
>> {
>> long wr_size = ctx->dx; /* arg3 */
>>
>> with your prolog generator the above can be rewritten as:
>> SEC("kprobe/sys_write")
>> int bpf_prog(struct pt_regs *unused, int fd, char *buf, size_t wr_size)
>> {
>> /* use wr_size */
>>
>> that will improve ease of use a lot.
>>
> 
> It is possible if probing on the entry of a function. However, when probing
> inside a function body, there still needs to be a way to pass the variable list
> required by the program to perf so it can generate the correct prologue. We'd
> like to implement the generic one (list vars in config string) first, then add
> function parameter access as syntactic sugar.
> 
>>> Another possible solution is to change the protocol between kprobe and eBPF
>>> program: make kprobes call fetchers and pass the results to the eBPF program
>>> as a second param (group all varx together).
>>> A prologue may still be needed in this case to load each param into the
>>> correct register.
>>
>> you mean grouping varx together in some other struct and embedding it
>> together with pt_regs into new container struct?
>> doable, but your first approach is quite clean already. why bother.
>>
> 
> The second approach lets us reuse the fetcher code which is already in the
> kernel. Furthermore, if new types of fetchers appear (for example, a fetcher
> for PMU counters), we support them automatically.
> 
>>> Could you please consider the following problem?
>>>
>>> We find there are several __lock_page() calls that last a very long time. We
>>> are going to find the corresponding __unlock_page() so we can know what blocks
>>> them. We want to insert eBPF programs before io_schedule() in __lock_page(),
>>> and also add an eBPF program on the entry of __unlock_page(), so we can
>>> compute the interval between page locking and unlocking. If the time is longer
>>> than a threshold, let __unlock_page() trigger a perf sampling so we get its
>>> call stack. In this case, the eBPF program acts as a trace filter.
>>
>> all makes sense and your use case fits quite well into existing
>> bpf+kprobe model. I'm not sure why you're calling it a 'problem'.
>> A problem of how to display that call stack from perf?
>> I would say it fits better as a 

Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.

2015-05-05 Thread Brendan Gregg
On Thu, Apr 30, 2015 at 3:52 AM, Wang Nan <wangn...@huawei.com> wrote:
[...]
> An example is pasted at the bottom of this cover letter. In that
> example, mybpfprog is configured by string in config section, and will
> be probed at __alloc_pages_nodemask. sample_bpf.o is generated using:
>
>  $ $CLANG -I/usr/src/kernel/include -I/usr/src/kernel/usr/include -D__KERNEL__ \
>      -Wno-unused-value -Wno-pointer-sign \
>      -O2 -emit-llvm -c sample_bpf.c -o - | $LLC -march=bpf -filetype=obj -o \
>      sample_bpf.o
>
> And can be loaded using:
>
>  $ perf bpf sample_bpf.o
[...]
>   EXAMPLE
>  - sample_bpf.c -
>  #include 
>  #include 
>  #include 
>
>  #define SEC(NAME) __attribute__((section(NAME), used))
>
>  static int (*bpf_map_delete_elem)(void *map, void *key) =
>          (void *) BPF_FUNC_map_delete_elem;
>  static int (*bpf_trace_printk)(const char *fmt, int fmt_size, ...) =
>          (void *) BPF_FUNC_trace_printk;
>
>  struct bpf_map_def {
>          unsigned int type;
>          unsigned int key_size;
>          unsigned int value_size;
>          unsigned int max_entries;
>  };
>
>  struct pair {
>          u64 val;
>          u64 ip;
>  };
>
>  struct bpf_map_def SEC("maps") my_map = {
>          .type = BPF_MAP_TYPE_HASH,
>          .key_size = sizeof(long),
>          .value_size = sizeof(struct pair),
>          .max_entries = 100,
>  };
>
>  SEC("kprobe/kmem_cache_free")
>  int bpf_prog1(struct pt_regs *ctx)
>  {
>          long ptr = ctx->r14;
>          bpf_map_delete_elem(&my_map, &ptr);
>          return 0;
>  }
>
>  SEC("mybpfprog")
>  int bpf_prog_my(void *ctx)
>  {
>          char fmt[] = "Haha\n";
>          bpf_trace_printk(fmt, sizeof(fmt));
>          return 0;
>  }
>
>  char _license[] SEC("license") = "GPL";
>  u32 _version SEC("version") = LINUX_VERSION_CODE;
>  char _config[] SEC("config") = ""
>          "mybpfprog=__alloc_pages_nodemask\n";

Was this just some random eBPF code to test the perf framework? Or was
it to do something useful with
kmem_cache_free()/__alloc_pages_nodemask() tracing as well? It looks a
bit incomplete.

If it's just random code, I'd include a comment to state that,
otherwise it's a bit confusing. A complete example might be better;
eg, something like Alexei's tracex1, for a simple example of
bpf_trace_printk(), or sockex1, for a simple map example.

Brendan


Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.

2015-05-05 Thread Arnaldo Carvalho de Melo
On Fri, May 01, 2015 at 09:56:23AM -0700, Alexei Starovoitov wrote:
> Anyway, back to my original question about long term home.
> where to land 'perf/bpf' branch ?

I don't care, but for me to merge it, please go on addressing the
comments made in this thread (perf bpf command --args, etc) and at some
point provide a small patchset that implements the most basic stuff,
like, say, a "hello, world" style proggie, together with the
tools/perf/Documentation/perf-bpf.txt file, detailed instructions on how
to use the feature, i.e. what dependencies are needed, what kernel
options should be enabled, etc.

Nice warning/error messages for when the user doesn't have those options
enabled or doesn't have appropriate permissions, etc.

I.e. just by following what is in each changeset comment log I should be
able to test patch after patch.

After we get one such, say, 10-long patchkit with a very basic feature
of eBPF exposed via 'perf bpf', we can go to the next, and so on.

Try to use 'perf trace usleep 1', 'perf trace -a usleep 1' as non-root,
for instance, to see examples on how to inform the user about what is
needed to use the tool.

- Arnaldo


Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.

2015-05-05 Thread Wang Nan
On 2015/5/5 13:49, Alexei Starovoitov wrote:
> On 5/4/15 9:41 PM, Wang Nan wrote:
>>
>> That's great. Could you please append the description of 'llvm -s' into your
>> README or comments? It has cost me a lot of time for dumping eBPF
>> instructions so I decided to add it into perf...
> 
> sure. it's just -filetype=asm flag to llc instead of -filetype=obj.
> Eventually it will work as normal 'clang -S file.c' when few more
> llvm commits are accepted upstream.
> 
>>>> My colleague He Kuang is working on variable accessing. Probing inside a
>>>> function body and accessing its local variable will be supported like this:
>>>>
>>>>   SEC("config") char _prog_config[] = "prog: func_name:1234 vara=localvara";
>>>>   int prog(struct pt_regs *ctx, unsigned long vara) {
>>>>     // vara is the value of localvara of function func_name
>>>>   }
>>>
>>> that would be great. I'm not sure though how you can achieve that
>>> without changing C front-end ?
>>
>> It's not very difficult. He is trying to generate the loader of vara
>> as a prologue, then paste the prologue and the main eBPF program together.
>> From the viewpoint of the kernel bpf verifier, there is only one param (ctx);
>> the prologue program fetches the value of vara and puts it into the proper
>> register, then the main program runs.
> 
> got it. I think that's much cleaner than what I was proposing.
> The only question is then:
> char _prog_config[] = "prog: func_name:1234 vara=localvara"
> should actually be something like "... r2=localvara", right?
> since prologue would need to assign into r2.
> Otherwise I don't see where you find out about 'vara' inside
> compiled bpf code.
>

I think the calling convention could teach us which var should go to which
register. In the case of

 SEC("config") char _prog_config[] = "prog: func_name:1234 vara=localvara varb=globalvarb";
 int prog(struct pt_regs *ctx, unsigned long vara, unsigned long varb) { ... }

llvm should compile 'prog' according to the calling convention. The body of that
program should assume vara in r2 and varb in r3. The prologue also puts the
vars into r2 and r3 according to the calling convention. Therefore, after pasting
them together, the final program should run properly. There is no need to
describe register numbers explicitly.
What do you think?
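
A rough sketch of how a loader might derive register numbers from the config string, under the assumption that variables are bound in the order they are listed. The function parse_config and the struct var_binding are hypothetical names for illustration and are not part of perf. The limit of four variables follows from the eBPF calling convention, which passes arguments in r1 to r5, with r1 already holding ctx.

```c
#include <string.h>

#define MAX_VARS 4   /* eBPF passes arguments in r1..r5; r1 holds ctx */

struct var_binding {
    char param[32];    /* parameter name in the eBPF program, e.g. "vara" */
    char source[32];   /* variable in the probed function, e.g. "localvara" */
    int reg;           /* register the prologue must fill: 2 means r2 */
};

/* Parse "prog: func_name:1234 vara=localvara varb=globalvarb".
 * Returns the number of bindings, or -1 if the string is malformed. */
int parse_config(const char *config, struct var_binding *out)
{
    char buf[256];
    char *tok;
    int n = 0;

    if (strlen(config) >= sizeof(buf))
        return -1;
    strcpy(buf, config);

    tok = strtok(buf, " ");            /* program name, e.g. "prog:" */
    if (tok == NULL)
        return -1;
    tok = strtok(NULL, " ");           /* probe point, e.g. "func_name:1234" */
    if (tok == NULL)
        return -1;

    while ((tok = strtok(NULL, " ")) != NULL) {
        char *eq = strchr(tok, '=');

        if (eq == NULL || n == MAX_VARS)
            return -1;
        *eq = '\0';                    /* split "vara=localvara" in place */
        if (strlen(tok) >= sizeof(out[n].param) ||
            strlen(eq + 1) >= sizeof(out[n].source))
            return -1;
        strcpy(out[n].param, tok);
        strcpy(out[n].source, eq + 1);
        out[n].reg = n + 2;            /* first listed variable goes to r2 */
        n++;
    }
    return n;
}
```

For the two-variable string above this yields vara in r2 and varb in r3, which is the same assignment the compiler makes for the second and third parameters of prog.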


> Would be nice if this can be done without debug info.
> Like in tracex2_kern.c I have:
> SEC("kprobe/sys_write")
> int bpf_prog(struct pt_regs *ctx)
> {
> long wr_size = ctx->dx; /* arg3 */
> 
> with your prolog generator the above can be rewritten as:
> SEC("kprobe/sys_write")
> int bpf_prog(struct pt_regs *unused, int fd, char *buf, size_t wr_size)
> {
> /* use wr_size */
> 
> that will improve ease of use a lot.
>

It is possible if probing on the entry of a function. However, when probing
inside a function body, there still needs to be a way to pass the variable list
required by the program to perf so it can generate the correct prologue. We'd
like to implement the generic one (list vars in config string) first, then add
function parameter access as syntactic sugar.
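
To illustrate the function-entry case: on x86-64 the first six integer arguments arrive in di, si, dx, cx, r8 and r9, which is why the tracex2 example reads arg3 from ctx->dx. The sketch below models what a generated prologue would have to do for the n-th parameter. struct regs_model and fetch_arg are hypothetical stand-ins, not kernel or perf interfaces, and other architectures use different registers.

```c
/* Cut-down stand-in for x86-64 struct pt_regs: argument registers only. */
struct regs_model {
    unsigned long di, si, dx, cx, r8, r9;
};

/* Fetch the n-th (1-based) integer argument as seen at function entry.
 * Returns 0 on success, -1 if n is outside the register-passed range. */
int fetch_arg(const struct regs_model *regs, int n, unsigned long *value)
{
    switch (n) {
    case 1: *value = regs->di; break;
    case 2: *value = regs->si; break;
    case 3: *value = regs->dx; break;   /* arg3, e.g. wr_size of sys_write */
    case 4: *value = regs->cx; break;
    case 5: *value = regs->r8; break;
    case 6: *value = regs->r9; break;
    default: return -1;
    }
    return 0;
}
```

This mapping only holds at the entry of the function; inside the body the registers may have been reused, which is why the config-string variable list is needed there.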

>> Another possible solution is to change the protocol between kprobe and eBPF
>> program: make kprobes call fetchers and pass the results to the eBPF program
>> as a second param (group all varx together).
>> A prologue may still be needed in this case to load each param into the
>> correct register.
> 
> you mean grouping varx together in some other struct and embedding it
> together with pt_regs into new container struct?
> doable, but your first approach is quite clean already. why bother.
> 

The second approach lets us reuse the fetcher code which is already in the
kernel. Furthermore, if new types of fetchers appear (for example, a fetcher
for PMU counters), we support them automatically.
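
One possible shape for the second approach, sketched in userspace C. Everything here is an assumption made for illustration: struct probe_ctx, push_fetched and load_fetched are hypothetical names, and the kernel does not define such a container. The idea is that kprobe fetchers fill an array placed next to the registers, and the prologue copies entries from it into r2, r3 and so on.

```c
#include <stddef.h>

#define MAX_FETCHED 4

/* Cut-down stand-in for struct pt_regs; the real layout is per-architecture. */
struct regs_model {
    unsigned long di, si, dx;
};

/* Hypothetical container handed to the eBPF program instead of bare pt_regs. */
struct probe_ctx {
    struct regs_model regs;               /* kept first so ctx-> accesses still work */
    unsigned long fetched[MAX_FETCHED];   /* values produced by kprobe fetchers */
    unsigned int nr_fetched;
};

_Static_assert(offsetof(struct probe_ctx, regs) == 0,
               "regs must stay the first member");

/* Models the kprobe side: run a fetcher and append its result. */
int push_fetched(struct probe_ctx *ctx, unsigned long value)
{
    if (ctx->nr_fetched == MAX_FETCHED)
        return -1;
    ctx->fetched[ctx->nr_fetched++] = value;
    return 0;
}

/* Models the prologue: load the i-th fetched value into a "register". */
int load_fetched(const struct probe_ctx *ctx, unsigned int i, unsigned long *reg)
{
    if (i >= ctx->nr_fetched)
        return -1;
    *reg = ctx->fetched[i];
    return 0;
}
```

In this layout a new fetcher type only changes what gets pushed into the array; the prologue and the eBPF program stay the same.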

>> Could you please consider the following problem?
>>
>> We find there are several __lock_page() calls that last a very long time. We
>> are going to find the corresponding __unlock_page() so we can know what blocks
>> them. We want to insert eBPF programs before io_schedule() in __lock_page(),
>> and also add an eBPF program on the entry of __unlock_page(), so we can
>> compute the interval between page locking and unlocking. If the time is longer
>> than a threshold, let __unlock_page() trigger a perf sampling so we get its
>> call stack. In this case, the eBPF program acts as a trace filter.
> 
> all makes sense and your use case fits quite well into existing
> bpf+kprobe model. I'm not sure why you're calling it a 'problem'.
> A problem of how to display that call stack from perf?
> I would say it fits better as a sample than a trace.
> If you dump it as a trace, it won't be easy to decipher, whereas if you
> treat it as a sampling event, perf record/report facility will pick it up and
> display it nicely. Meaning that one sample == lock_page/unlock_page
> latency > N. Then existing sample_callchain flag should work.
> 

Quite well. Do we have an eBPF function like

static int (*bpf_perf_sample)(const 

Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.

2015-05-05 Thread Wang Nan
On 2015/5/5 13:49, Alexei Starovoitov wrote:
 On 5/4/15 9:41 PM, Wang Nan wrote:

 That's great. Could you please append the description of 'llvm -s' into your 
 README
 or comments? It has cost me a lot of time for dumping eBPF instructions so I 
 decide to
 add it into perf...
 
 sure. it's just -filetype=asm flag to llc instead of -filetype=obj.
 Eventually it will work as normal 'clang -S file.c' when few more
 llvm commits are accepted upstream.
 
 My collage He Kuang is working on variable accessing. Probing inside 
 function body
 and accessing its local variable will be supported like this:

SEC(config) char _prog_config[] = prog: func_name:1234 
 vara=localvara
int prog(struct pt_regs *ctx, unsigned long vara) {
   // vara is the value of localvara of function func_name
}

 that would be great. I'm not sure though how you can achieve that
 without changing C front-end ?

 It's not very difficult. He is trying to generate the loader of vara
 as prologue, then paste the prologue and the main eBPF program together.
  From the viewpoint of kernel bpf verifier, there is only one param (ctx); 
 the
 prologue program fetches the value of vara then put it into a propoer 
 register,
 then main program work.
 
 got it. I think that's much cleaner than what I was proposing.
 The only question is then:
 char _prog_config[] = prog: func_name:1234 vara=localvara
 should actually be something like ... r2=localvara, right?
 since prologue would need to assign into r2.
 Otherwise I don't see where you find out about 'vara' inside
 compiled bpf code.


I think the calling convention could teach us which var should go to which
register. In the case of

 SEC(config) char _prog_config[] = prog: func_name:1234 vara=localvara 
varb=globalvarb;
 int prog(struct pt_regs *ctx, unsigned long vara, unsigned long varb) { ... }

llvm should compile 'prog' according to calling convention. The body of that
program should assume vara in r2 and varb in r3. The prologue also puts the 
vars into
r2 and r3 according to calling convention. Therefore, after paste them 
together, the final
program should run properly. There is no need to describe register number 
explicitly.
What do you think?


 Would be nice if this can be done without debug info.
 Like in tracex2_kern.c I have:
 SEC(kprobe/sys_write)
 int bpf_prog(struct pt_regs *ctx)
 {
 long wr_size = ctx-dx; /* arg3 */
 
 with your prolog generator the above can be rewritten as:
 SEC(kprobe/sys_write)
 int bpf_prog(struct pt_regs *unused, int fd, char *buf, size_t wr_size)
 {
 /* use wr_size */
 
 that will improve ease of use a lot.


It is possible if probing on the entry of a function. However, when probing on
function body, there still need a way to pass variable list required by the
program to perf to let it generate correct prologue. We'd like to implement
the generic one (list vars in config string) first, then make function
parameters accessing as a syntax sugar.

 Another possible solution is to change the protocol between kprobe and eBPF
 program, making kprobes call the fetchers and pass the results to the eBPF program as
 a second param (group all varx together).
 A prologue may still be needed in this case to load each param into the correct
 register.
 
 you mean grouping varx together in some other struct and embedding it
 together with pt_regs into new container struct?
 doable, but your first approach is quite clean already. why bother.
 

The second approach lets us reuse the fetcher code that is already in the
kernel. Furthermore, if new types of fetchers appear (for example, a fetcher
for PMU counters), we support them automatically.

 Could you please consider the following problem?

 We find there are several __lock_page() calls that last a very long time. We are going
 to find the corresponding __unlock_page() so we can know what blocks them. We want to
 insert eBPF programs before io_schedule() in __lock_page(), and also add an eBPF program
 on the entry of __unlock_page(), so we can compute the interval between page locking and
 unlocking. If the time is longer than a threshold, let __unlock_page() trigger a perf sampling
 so we get its call stack. In this case, the eBPF program acts as a trace filter.
 
 all makes sense and your use case fits quite well into existing
 bpf+kprobe model. I'm not sure why you're calling it a 'problem'.
 A problem of how to display that call stack from perf?
 I would say it fits better as a sample than a trace.
 If you dump it as a trace, it won't be easy to decipher, whereas if you
 treat it as a sampling event, perf record/report facility will pick it up and
 display nicely. Meaning that one sample == lock_page/unlock_page
 latency > N. Then existing sample_callchain flag should work.
 

Quite well. Do we have an eBPF function like

static int (*bpf_perf_sample)(const char *fmt, int fmt_size, ...) = BPF_FUNC_perf_sample

so we can use it in the program probed in the body of __unlock_page() like that:

 ...
 if (latency > 0.5s)

Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.

2015-05-05 Thread Brendan Gregg
On Thu, Apr 30, 2015 at 3:52 AM, Wang Nan <wangn...@huawei.com> wrote:
[...]
 An example is pasted at the bottom of this cover letter. In that
 example, mybpfprog is configured by string in config section, and will
 be probed at __alloc_pages_nodemask. sample_bpf.o is generated using:

  $ $CLANG -I/usr/src/kernel/include -I/usr/src/kernel/usr/include -D__KERNEL__ \
      -Wno-unused-value -Wno-pointer-sign \
      -O2 -emit-llvm -c sample_bpf.c -o -| $LLC -march=bpf -filetype=obj -o \
      sample_bpf.o

 And can be loaded using:

  $ perf bpf sample_bpf.o
[...]
   EXAMPLE
  - sample_bpf.c -
  #include <uapi/linux/bpf.h>
  #include <linux/version.h>
  #include <uapi/linux/ptrace.h>

  #define SEC(NAME) __attribute__((section(NAME), used))

  static int (*bpf_map_delete_elem)(void *map, void *key) =
      (void *) BPF_FUNC_map_delete_elem;
  static int (*bpf_trace_printk)(const char *fmt, int fmt_size, ...) =
      (void *) BPF_FUNC_trace_printk;

  struct bpf_map_def {
      unsigned int type;
      unsigned int key_size;
      unsigned int value_size;
      unsigned int max_entries;
  };

  struct pair {
      u64 val;
      u64 ip;
  };

  struct bpf_map_def SEC("maps") my_map = {
      .type = BPF_MAP_TYPE_HASH,
      .key_size = sizeof(long),
      .value_size = sizeof(struct pair),
      .max_entries = 100,
  };

  SEC("kprobe/kmem_cache_free")
  int bpf_prog1(struct pt_regs *ctx)
  {
      long ptr = ctx->r14;
      bpf_map_delete_elem(&my_map, &ptr);
      return 0;
  }

  SEC("mybpfprog")
  int bpf_prog_my(void *ctx)
  {
      char fmt[] = "Haha\n";
      bpf_trace_printk(fmt, sizeof(fmt));
      return 0;
  }

  char _license[] SEC("license") = "GPL";
  u32 _version SEC("version") = LINUX_VERSION_CODE;
  char _config[] SEC("config") =
      "mybpfprog=__alloc_pages_nodemask\n";

Was this just some random eBPF code to test the perf framework? Or was
it to do something useful with
kmem_cache_free()/__alloc_pages_nodemask() tracing as well? It looks a
bit incomplete.

If it's just random code, I'd include a comment to state that,
otherwise it's a bit confusing. A complete example might be better;
eg, something like Alexei's tracex1, for a simple example of
bpf_trace_printk(), or sockex1, for a simple map example.

Brendan


Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.

2015-05-04 Thread Alexei Starovoitov

On 5/4/15 9:41 PM, Wang Nan wrote:


That's great. Could you please add a description of 'llvm -s' to your README
or comments? It cost me a lot of time to dump eBPF instructions, so I decided to
add it into perf...


sure. it's just -filetype=asm flag to llc instead of -filetype=obj.
Eventually it will work as normal 'clang -S file.c' when few more
llvm commits are accepted upstream.


My colleague He Kuang is working on variable access. Probing inside a function body
and accessing its local variables will be supported like this:

   SEC("config") char _prog_config[] = "prog: func_name:1234 vara=localvara"
   int prog(struct pt_regs *ctx, unsigned long vara) {
  // vara is the value of localvara of function func_name
   }


that would be great. I'm not sure though how you can achieve that
without changing C front-end ?


It's not very difficult. He is trying to generate the loader of vara
as a prologue, then paste the prologue and the main eBPF program together.
From the viewpoint of the kernel bpf verifier, there is only one param (ctx); the
prologue program fetches the value of vara, then puts it into a proper register;
then the main program runs.


got it. I think that's much cleaner than what I was proposing.
The only question is then:
char _prog_config[] = "prog: func_name:1234 vara=localvara"
should actually be something like "... r2=localvara", right?
since prologue would need to assign into r2.
Otherwise I don't see where you find out about 'vara' inside
compiled bpf code.

Would be nice if this can be done without debug info.
Like in tracex2_kern.c I have:
SEC("kprobe/sys_write")
int bpf_prog(struct pt_regs *ctx)
{
long wr_size = ctx->dx; /* arg3 */

with your prolog generator the above can be rewritten as:
SEC("kprobe/sys_write")
int bpf_prog(struct pt_regs *unused, int fd, char *buf, size_t wr_size)
{
/* use wr_size */

that will improve ease of use a lot.


Another possible solution is to change the protocol between kprobe and eBPF
program, making kprobes call the fetchers and pass the results to the eBPF program as
a second param (group all varx together).
A prologue may still be needed in this case to load each param into the correct
register.


you mean grouping varx together in some other struct and embedding it
together with pt_regs into new container struct?
doable, but your first approach is quite clean already. why bother.


Could you please consider the following problem?

We find there are several __lock_page() calls that last a very long time. We are going
to find the corresponding __unlock_page() so we can know what blocks them. We want to
insert eBPF programs before io_schedule() in __lock_page(), and also add an eBPF program
on the entry of __unlock_page(), so we can compute the interval between page locking and
unlocking. If the time is longer than a threshold, let __unlock_page() trigger a perf sampling
so we get its call stack. In this case, the eBPF program acts as a trace filter.


all makes sense and your use case fits quite well into existing
bpf+kprobe model. I'm not sure why you're calling it a 'problem'.
A problem of how to display that call stack from perf?
I would say it fits better as a sample than a trace.
If you dump it as a trace, it won't be easy to decipher, whereas if you
treat it as a sampling event, perf record/report facility will pick it up
and display nicely. Meaning that one sample == lock_page/unlock_page
latency > N. Then existing sample_callchain flag should work.
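
The filtering logic discussed above can be modelled in user space roughly as follows. This is a sketch under stated assumptions, not a real eBPF program: the small `table[]` stands in for a BPF hash map, timestamps are passed in by the caller instead of being read from the kernel, and the names `on_lock_page()` / `on_unlock_page()` are hypothetical. The return values follow the kprobe handler convention quoted elsewhere in this thread (0 = event filtered out, 1 = store the event).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLOTS 64  /* assumed capacity of the model table */

/* One entry of the model table: page address -> time the lock wait began. */
struct slot {
    const void *page;   /* NULL marks a free slot */
    uint64_t    lock_ns;
};

static struct slot table[SLOTS];

/* Linear scan; lookup(NULL) returns a free slot, if any. */
static struct slot *lookup(const void *page)
{
    for (size_t i = 0; i < SLOTS; i++)
        if (table[i].page == page)
            return &table[i];
    return NULL;
}

/* Models the program placed before io_schedule() in __lock_page():
 * remember when the wait started and never emit an event. */
static int on_lock_page(const void *page, uint64_t now_ns)
{
    struct slot *s;

    assert(page != NULL);
    s = lookup(page);
    if (!s)
        s = lookup(NULL);       /* take a free slot */
    if (s) {
        s->page = page;
        s->lock_ns = now_ns;
    }
    return 0;
}

/* Models the program on the entry of __unlock_page(): return 1 only when
 * the measured interval exceeds the threshold, so only those events are
 * recorded as samples. */
static int on_unlock_page(const void *page, uint64_t now_ns,
                          uint64_t threshold_ns)
{
    struct slot *s;
    uint64_t latency;

    assert(page != NULL);
    s = lookup(page);
    if (!s)
        return 0;               /* no matching lock was seen */
    latency = now_ns - s->lock_ns;
    s->page = NULL;             /* release the slot */
    return latency > threshold_ns ? 1 : 0;
}
```

In this model the eBPF side only decides whether an unlock event is kept; collecting the call stack for kept events is left to perf, as suggested above.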



Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.

2015-05-04 Thread Wang Nan
On 2015/5/5 11:02, Alexei Starovoitov wrote:
> On 5/2/15 12:19 AM, Wang Nan wrote:
>>
>> I'd like to do following works in the next version (based on my experience 
>> and feedbacks):
>>
>> 1. Safely clean up kprobe points after unloading;
>>
>> 2. Add subcommand space to 'perf bpf'. Current staff should be reside in 
>> 'perf bpf load';
>>
>> 3. Extract eBPF ELF walking and collecting work to a separated library to 
>> help others.
> 
> that's a good list.
> 
> The feedback for existing patches:
> patch 18 - since we're creating a generic library for bpf elf
> loading it would great to do the following:
> first try to load with
> attr.log_buf = NULL;
> attr.log_level = 0;
> then only if it fails, allocate a buffer and repeat with log_level = 1.
> The reason is that it's better to have fast program loading by default
> without any verbosity emitted by verifier.
> 

Will do.

> patch 19 - I think it's unnecessary.
> verifier already dumps it. so this '-v' flag can be translated into
> verbose loading.
> There is also .s output from llvm for those interested in bpf asm
> instructions.
> 

That's great. Could you please add a description of 'llvm -s' to your README
or comments? It cost me a lot of time to dump eBPF instructions, so I decided to
add it into perf...

>> My collage He Kuang is working on variable accessing. Probing inside 
>> function body
>> and accessing its local variable will be supported like this:
>>
>>   SEC("config") char _prog_config[] = "prog: func_name:1234 vara=localvara"
>>   int prog(struct pt_regs *ctx, unsigned long vara) {
>>  // vara is the value of localvara of function func_name
>>   }
> 
> that would be great. I'm not sure though how you can achieve that
> without changing C front-end ?

It's not very difficult. He is trying to generate the loader of vara
as a prologue, then paste the prologue and the main eBPF program together.
From the viewpoint of the kernel bpf verifier, there is only one param (ctx); the
prologue program fetches the value of vara, then puts it into a proper register;
then the main program runs.

Another possible solution is to change the protocol between kprobe and eBPF
program, making kprobes call the fetchers and pass the results to the eBPF program as
a second param (group all varx together).
A prologue may still be needed in this case to load each param into the correct
register.

> This type of feature is exactly the reason why we're trying to write
> our front-end.
> In general there are two ways to achieve 'restricted C' language:
> - start from clang and chop all features that are not supported.
>   I believe Jovi already tried to do that and it became very difficult.
> - start from simple front-end with minimal C and add all things one by
>   one. That's what we're trying to do. So far we have most of normal
>   syntax. The problem with our approach is that we cannot easily do
>   #include of existing .h files. We're working on that.
>   It's too experimental still. May be will be drop it and go back to
>   first approach.
> 
> The reason for extending front-end is your example above, where
> the user would want to write:
>int prog(struct pt_regs *ctx, unsigned long vara) {
> // use 'vara'
> but generated BPF should have only one 'ctx' pointer, since that's
> the only thing that verifier will accept. bpf/core and JITs expect
> only one argument, etc.
> So this func definition + 'vara' access can be compiled as ctx->si
> (if vara is actually in register) or
> bpf_probe_read(ctx->bp + magic_offset_from_debug_info)
> (if vara is on stack)
> or it can also be done via store_trace_args() but that will be slower
> and requires hacking kernel, whereas ctx->... style is pure userspace.
> Lot's of things to brainstorm. So please share your progress soon.
> 
>> And I want to discuss with you and others about:
>>
>>   1. How to make eBPF output its tracing and aggregation results to perf?
> 
> well, the output of bpf program is a data stored in maps. Each program
> needs a corresponding user space reader/printer/sorter of this data.
> Like tracex2 prints this data as histogram and tracex3 prints it as
> heatmap. We can standardize few things like this, but ideally we
> keep it up to user. So that user can write single file that consists
> of functions that are loaded as bpf into kernel and other functions
> that are executed in user space. llvm can jit first set to bpf and
> second set to x86. That's distant future though.
> So far samples/bpf/ style of kern.c+user.c worked quite well.
> 

Well, it looks like in your design BPF programs are used to produce aggregation
results. On my side, I want them to also act as trace filters.

Could you please consider the following problem?

We find there are several __lock_page() calls that last a very long time. We are going
to find the corresponding __unlock_page() so we can know what blocks them. We want to
insert eBPF programs before io_schedule() in __lock_page(), and also add an eBPF program
on the entry of __unlock_page(), so we 

Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.

2015-05-04 Thread Alexei Starovoitov

On 5/2/15 12:19 AM, Wang Nan wrote:


I'd like to do the following work in the next version (based on my experience and feedback):

1. Safely clean up kprobe points after unloading;

2. Add subcommand space to 'perf bpf'. The current stuff should reside in 'perf bpf load';

3. Extract the eBPF ELF walking and collecting work into a separate library to help others.


that's a good list.

The feedback for existing patches:
patch 18 - since we're creating a generic library for bpf elf
loading it would be great to do the following:
first try to load with
attr.log_buf = NULL;
attr.log_level = 0;
then only if it fails, allocate a buffer and repeat with log_level = 1.
The reason is that it's better to have fast program loading by default
without any verbosity emitted by verifier.
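
The two-step loading suggested for patch 18 could be sketched like this. It is a minimal illustration, not perf or kernel code: `prog_load_fn` is a hypothetical stand-in for whatever wrapper issues the BPF_PROG_LOAD call, `VERIFIER_LOG_SIZE` is an assumed buffer size, and the two `demo_*` loaders exist only so the sketch can be exercised without a kernel.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define VERIFIER_LOG_SIZE 65536  /* assumed size; not taken from the patches */

/* Hypothetical loader callback: returns an fd >= 0 on success, a negative
 * value on failure, and writes to log_buf only when log_level > 0. */
typedef int (*prog_load_fn)(const void *insns, size_t insn_cnt,
                            char *log_buf, size_t log_size, int log_level);

/* Try a quiet load first; only if that fails, retry with a log buffer and
 * log_level = 1. On final failure *log_out holds the log (caller frees). */
static int load_prog_quiet_first(prog_load_fn load, const void *insns,
                                 size_t insn_cnt, char **log_out)
{
    char *log;
    int fd;

    assert(load != NULL && log_out != NULL);
    *log_out = NULL;

    fd = load(insns, insn_cnt, NULL, 0, 0);   /* fast path, no verbosity */
    if (fd >= 0)
        return fd;

    log = malloc(VERIFIER_LOG_SIZE);
    if (!log)
        return fd;
    log[0] = '\0';

    fd = load(insns, insn_cnt, log, VERIFIER_LOG_SIZE, 1);
    if (fd >= 0) {
        free(log);                            /* retry succeeded */
        return fd;
    }
    *log_out = log;
    return fd;
}

/* Demo loaders, for illustration only. */
static int demo_accept(const void *insns, size_t insn_cnt,
                       char *log_buf, size_t log_size, int log_level)
{
    (void)insns; (void)insn_cnt; (void)log_buf; (void)log_size; (void)log_level;
    return 3;                                 /* pretend file descriptor */
}

static int demo_reject(const void *insns, size_t insn_cnt,
                       char *log_buf, size_t log_size, int log_level)
{
    (void)insns; (void)insn_cnt;
    if (log_level > 0 && log_buf && log_size > 0)
        snprintf(log_buf, log_size, "demo: rejected");
    return -1;
}
```

A caller would print *log_out (for example under '-v') when the returned value is negative, then free it.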

patch 19 - I think it's unnecessary.
verifier already dumps it. so this '-v' flag can be translated into
verbose loading.
There is also .s output from llvm for those interested in bpf asm
instructions.


My colleague He Kuang is working on variable access. Probing inside a function body
and accessing its local variables will be supported like this:

  SEC("config") char _prog_config[] = "prog: func_name:1234 vara=localvara"
  int prog(struct pt_regs *ctx, unsigned long vara) {
 // vara is the value of localvara of function func_name
  }


that would be great. I'm not sure though how you can achieve that
without changing C front-end ?
This type of feature is exactly the reason why we're trying to write
our front-end.
In general there are two ways to achieve 'restricted C' language:
- start from clang and chop all features that are not supported.
  I believe Jovi already tried to do that and it became very difficult.
- start from simple front-end with minimal C and add all things one by
  one. That's what we're trying to do. So far we have most of normal
  syntax. The problem with our approach is that we cannot easily do
  #include of existing .h files. We're working on that.
  It's too experimental still. Maybe we will drop it and go back to
  first approach.

The reason for extending front-end is your example above, where
the user would want to write:
   int prog(struct pt_regs *ctx, unsigned long vara) {
// use 'vara'
but generated BPF should have only one 'ctx' pointer, since that's
the only thing that verifier will accept. bpf/core and JITs expect
only one argument, etc.
So this func definition + 'vara' access can be compiled as ctx->si
(if vara is actually in register) or
bpf_probe_read(ctx->bp + magic_offset_from_debug_info)
(if vara is on stack)
or it can also be done via store_trace_args() but that will be slower
and requires hacking kernel, whereas ctx->... style is pure userspace.
Lots of things to brainstorm. So please share your progress soon.
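
The two userspace-only lowerings mentioned above (register access versus a read relative to the frame pointer) can be modelled roughly as below. This is a plain C sketch, not generated BPF: `struct model_regs` is not the kernel's struct pt_regs, `model_probe_read()` only imitates the role of bpf_probe_read() on a byte array, and `struct var_loc` is a hypothetical description of what debug info would provide.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Model of the two register fields used in the discussion. */
struct model_regs {
    unsigned long si;   /* register that may hold the variable */
    unsigned long bp;   /* frame pointer, modelled as an index into stack[] */
};

enum var_where { VAR_IN_REG_SI, VAR_ON_STACK };

/* Hypothetical location record, as debug info might describe 'vara'. */
struct var_loc {
    enum var_where where;
    long bp_offset;     /* only used for VAR_ON_STACK */
};

/* Stand-in for bpf_probe_read(): a bounds-checked copy from modelled memory. */
static int model_probe_read(void *dst, size_t size,
                            const unsigned char *stack, size_t stack_size,
                            long addr)
{
    if (addr < 0 || (size_t)addr > stack_size ||
        size > stack_size - (size_t)addr)
        return -1;
    memcpy(dst, stack + addr, size);
    return 0;
}

/* Resolve 'vara' either like ctx->si or like a read at ctx->bp + offset. */
static int fetch_var(const struct model_regs *regs, const struct var_loc *loc,
                     const unsigned char *stack, size_t stack_size,
                     unsigned long *out)
{
    assert(regs != NULL && loc != NULL && out != NULL);
    switch (loc->where) {
    case VAR_IN_REG_SI:
        *out = regs->si;
        return 0;
    case VAR_ON_STACK:
        return model_probe_read(out, sizeof(*out), stack, stack_size,
                                (long)regs->bp + loc->bp_offset);
    }
    return -1;
}
```

Either branch yields the value without any kernel change, which is the point of the ctx->... style; the store_trace_args() route is not modelled here.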


And I want to discuss with you and others about:

  1. How to make eBPF output its tracing and aggregation results to perf?


well, the output of a bpf program is data stored in maps. Each program
needs a corresponding user space reader/printer/sorter of this data.
Like tracex2 prints this data as histogram and tracex3 prints it as
heatmap. We can standardize few things like this, but ideally we
keep it up to user. So that user can write single file that consists
of functions that are loaded as bpf into kernel and other functions
that are executed in user space. llvm can jit first set to bpf and
second set to x86. That's distant future though.
So far samples/bpf/ style of kern.c+user.c worked quite well.
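
As a small illustration of the user-space "reader/printer" half described above, here is a sketch that prints already-collected bucket counts as a text histogram. It assumes the counts have been copied out of a map into a plain array and that bucket i covers values from 2^i to 2^(i+1) - 1; it is not the tracex2 code, and `stars_for()`, `print_log2_hist()` and `MAX_STARS` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_STARS 40  /* assumed width of the longest bar */

/* Scale one bucket's count to a bar length relative to the largest bucket.
 * Very large counts could overflow the multiplication; ignored in this sketch. */
static int stars_for(unsigned long count, unsigned long max_count)
{
    assert(count <= max_count);
    if (max_count == 0)
        return 0;
    return (int)(count * MAX_STARS / max_count);
}

/* Print one line per bucket: value range, count, and a bar of '*'. */
static void print_log2_hist(const unsigned long *buckets, int n, FILE *out)
{
    unsigned long max = 0;
    int i, s;

    assert(buckets != NULL && out != NULL && n > 0 && n <= 31);
    for (i = 0; i < n; i++)
        if (buckets[i] > max)
            max = buckets[i];

    for (i = 0; i < n; i++) {
        fprintf(out, "%10lu -> %-10lu : %-8lu |",
                1UL << i, (1UL << (i + 1)) - 1, buckets[i]);
        for (s = stars_for(buckets[i], max); s > 0; s--)
            fputc('*', out);
        fputs("|\n", out);
    }
}
```

The kernel-side program would only update the counters; how they are rendered stays entirely up to the user-space half.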



Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.

2015-05-02 Thread Wang Nan
On 2015/5/1 12:37, Alexei Starovoitov wrote:
> On 4/30/15 3:52 AM, Wang Nan wrote:
>> This series of patches is an approach to integrate eBPF with perf.
>> After applying these patches, users are allowed to use following
>> command to load eBPF program compiled by LLVM into kernel:
>>
>>   $ perf bpf sample_bpf.o
>>
>> The required BPF code and the loading procedure is similar to Alexei
>> Starovoitov's libbpf in sample/bpf, with following exceptions:
>>
>>   1. The section name are not required leading with 'kprobe/' or
>>  'kretprobe/'. Without such leading, any valid C var name can be use.
>>
>>   2. A 'config' section can be provided to describe the position and
>>  arguments of a program. Syntax is identical to 'perf probe'.
>>
>> An example is pasted at the bottom of this cover letter. In that
>> example, mybpfprog is configured by string in config section, and will
>> be probed at __alloc_pages_nodemask. sample_bpf.o is generated using:
>>
>>   $ $CLANG -I/usr/src/kernel/include -I/usr/src/kernel/usr/include 
>> -D__KERNEL__ \
>>  -Wno-unused-value -Wno-pointer-sign \
>>  -O2 -emit-llvm -c sample_bpf.c -o -| $LLC -march=bpf -filetype=obj -o \
>>  sample_bpf.o
>>
>> And can be loaded using:
>>
>>   $ perf bpf sample_bpf.o
>>
>> This series is only a limited functional. Following works are on the
>> todo list:
>>
>>   1. Unprobe kprobe stubs used by eBPF programs when unloading;
>>
>>   2. Enable eBPF programs to access local variables and arguments
>>  by utilizing debuginfo;
>>
>>   3. Output data in perf way.
>>
>> In this series:
>>
>> Patch 1/22 is a bugfix in perf probe, and may be triggered by following
>> patches;
>>
>> Patch 2-3/22 are preparation, add required macros and syscall
>> definition into perf source tree.
>>
>> Patch 4/22 add 'perf bpf' command.
>>
>> Patch 5-20/22 are labor works, which parse the ELF object file, collect
>> information in object files, create maps needed by programs, link map
>> and programs, config programs and load programs into kernel.
>>
>> Patch 21-22/22 are the final work. Patch 21 creates kprobe points which
>> will be used by eBPF programs, patch 22 creates perf file descriptors
>> then attach eBPF programs on them.
> 
> I'm very happy to see this work. Looks great. All patches are impressively 
> clean and concise.
> I think patches 1-3 are ready to go into Arnaldo's perf tree right now.
> 4 and above are clean and polished, but probably need to go into
> some 'staging area' like a branch of perf tree, since I suspect the
> user interface may change a little in the coming months and it's
> a bit too early to expose 'perf bpf' command to every perf user ?
> Arnaldo, Ingo, what do you guys think should be the arrangement?
> 'perf/bpf' branch in acme/linux.git or in tip/tip.git ?
> 
> I have few comments for patches 18 and 19, but let's figure out
> the long term plan first.
> 

Hi,

I'm very happy to see the positive feedback from you and others. I'm also
interested in how these patches can be merged into mainline. I'd like to keep
sending patches to this list so that you can all see my improvements, and let
the maintainers decide whether and how to merge them.

We are also doing some backporting work to make the eBPF patches work on our
older kernels. After that we will use eBPF in our profiling work.
I think this RFC series is only a starting point for our use of eBPF. Further
requirements are likely to arise during our real work.

I'd like to do the following in the next version (based on my experience and
the feedback):

1. Safely clean up kprobe points after unloading;

2. Add a subcommand space to 'perf bpf'. The current functionality should
   reside in 'perf bpf load';

3. Extract the eBPF ELF walking and collecting work into a separate library to
   help others.

My colleague He Kuang is working on variable access. Probing inside a function
body and accessing its local variables will be supported like this:

 SEC("config") char _prog_config[] = "prog: func_name:1234 vara=localvara";
 int prog(struct pt_regs *ctx, unsigned long vara) {
         // vara is the value of localvara of function func_name
         return 0;
 }
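
To make the shape of that config string concrete, here is a minimal userspace
sketch of how it could be split into its parts. It assumes the format is
"<program name>: <probe point> [<arg>=<variable> ...]", which is only inferred
from the single example above; struct prog_config and parse_prog_config() are
hypothetical names and are not taken from the patch series.

```c
#include <stdio.h>
#include <string.h>

#define MAX_ARGS  8
#define MAX_TOKEN 64

/* Hypothetical container for one parsed config entry. */
struct prog_config {
	char prog[MAX_TOKEN];            /* name of the eBPF function */
	char probe[MAX_TOKEN];           /* probe point, e.g. "func_name:1234" */
	char args[MAX_ARGS][MAX_TOKEN];  /* "name=variable" bindings */
	int nr_args;
};

/*
 * Split "<prog>: <probe> [<arg>=<var> ...]" into its parts.
 * Returns 0 on success, -1 if the string does not have the assumed shape.
 * Tokens longer than MAX_TOKEN - 1 characters and more than MAX_ARGS
 * bindings are not handled by this sketch.
 */
int parse_prog_config(const char *str, struct prog_config *cfg)
{
	const char *colon = strchr(str, ':');
	const char *p;
	int n;

	memset(cfg, 0, sizeof(*cfg));
	if (!colon || colon == str || (size_t)(colon - str) >= MAX_TOKEN)
		return -1;
	/* cfg was zeroed above, so the copied name stays NUL-terminated. */
	memcpy(cfg->prog, str, (size_t)(colon - str));

	/* First whitespace-separated token after the colon: the probe point. */
	p = colon + 1;
	if (sscanf(p, "%63s%n", cfg->probe, &n) != 1)
		return -1;
	p += n;

	/* Remaining tokens must be "name=variable" bindings. */
	while (cfg->nr_args < MAX_ARGS &&
	       sscanf(p, "%63s%n", cfg->args[cfg->nr_args], &n) == 1) {
		if (!strchr(cfg->args[cfg->nr_args], '='))
			return -1;
		cfg->nr_args++;
		p += n;
	}
	return 0;
}
```

Calling parse_prog_config() on the example string fills prog with "prog",
probe with "func_name:1234" and one argument binding "vara=localvara". The
probe point is kept as an opaque string here because interpreting it is the
job of the existing 'perf probe' parser.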

I'd also like to discuss the following with you and others:

 1. How can eBPF output its tracing and aggregation results to perf?

Thanks!

> We're also working in parallel on creating a new tracing language
> that together with llvm backend can be used as a single shared library
> that can be called from perf or anything else.
> Then clang compilation step will be gone and programs can be run
> as 'perf bpf file.bpf'.
> 
> Thanks!
> 




Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.

2015-05-01 Thread Ingo Molnar

* Alexei Starovoitov wrote:

> On 5/1/15 4:49 AM, Ingo Molnar wrote:
> >
> >* Peter Zijlstra  wrote:
> >
> >>On Thu, Apr 30, 2015 at 09:37:04PM -0700, Alexei Starovoitov wrote:
> >>>We're also working in parallel on creating a new tracing language
> >>>that together with llvm backend can be used as a single shared library
> >>>that can be called from perf or anything else.
> >>
> >>Gurgh, please also keep normal C an option. [...]
> >
> >Absolutely, I thought there was agreement on that when we started
> >merging all these eBPF patches ...
> >
> >It might be 'simplified C', in that it's just a subset of C, but
> >please don't re-do something that works, especially if it's used to
> >instrument a kernel that is written in C ...
> 
> of course. When did I say that I like 'bird' languages? :)
> By 'new' I mean that we're not trying to port existing tracing
> language like dtrace, systemtap, ktap to bpf.
> I believe dtrace would have been more widely adopted if it didn't
> invent new syntax. We're trying to do a C -- with ++.
> It's C where non-supported things like 'for', 'while', 'asm' are
> actively error-ed by front-end and additional syntactic
> sugar for things that too ugly/verbose in vanilla C are added.

Ok, sounds very good to me!

Thanks,

Ingo


Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.

2015-05-01 Thread Ingo Molnar

* Peter Zijlstra wrote:

> On Thu, Apr 30, 2015 at 09:37:04PM -0700, Alexei Starovoitov wrote:
> > We're also working in parallel on creating a new tracing language
> > that together with llvm backend can be used as a single shared library
> > that can be called from perf or anything else.
> 
> Gurgh, please also keep normal C an option. [...]

Absolutely, I thought there was agreement on that when we started 
merging all these eBPF patches ...

It might be 'simplified C', in that it's just a subset of C, but 
please don't re-do something that works, especially if it's used to 
instrument a kernel that is written in C ...

Thanks,

Ingo


Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.

2015-05-01 Thread Peter Zijlstra
On Thu, Apr 30, 2015 at 09:37:04PM -0700, Alexei Starovoitov wrote:
> We're also working in parallel on creating a new tracing language
> that together with llvm backend can be used as a single shared library
> that can be called from perf or anything else.

Gurgh, please also keep normal C an option. I never can remember how all
these fancy arse special case languages work and it's just too annoying /
frustrating to have to figure out how to do simple things every time you
need it to just work.


Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.

2015-05-01 Thread Ingo Molnar

* Wang Nan wrote:

> This series of patches is an approach to integrate eBPF with perf.

Very promising!

> After applying these patches, users are allowed to use following
> command to load eBPF program compiled by LLVM into kernel:
> 
>  $ perf bpf sample_bpf.o

Please leave room for a subcommand space, as most other perf
subcommands do, i.e. make it something like:

perf bpf add sample_bpf.o

or:

perf bpf run sample_bpf.o

or:

perf bpf load sample_bpf.o

So that future subcommands can be added:

perf bpf list
perf bpf del <...>
perf bpf enable <...>
perf bpf disable <...>
perf bpf help

and 'perf bpf' should probably display the help page by default, so if 
curious perf users stumble into the new subcommand, they get a basic 
idea about what it's all about.

I.e. you should think about the high level subcommand space right now, 
and pick proper names - because this is going to determine the future 
usability and the success of the tool to a large degree.
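
To illustrate the layout suggested above, here is a minimal sketch of a
table-driven subcommand dispatcher in which a bare 'perf bpf' falls back to
the help page. All identifiers (enum bpf_cmd, bpf_subcmds, bpf_parse_subcmd,
bpf_print_help) are hypothetical and not taken from perf; only the subcommand
names come from the list above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical subcommand identifiers. */
enum bpf_cmd {
	BPF_CMD_HELP,
	BPF_CMD_LOAD,
	BPF_CMD_LIST,
	BPF_CMD_DEL,
	BPF_CMD_ENABLE,
	BPF_CMD_DISABLE,
	BPF_CMD_UNKNOWN,
};

struct bpf_subcmd {
	const char *name;
	enum bpf_cmd cmd;
	const char *usage;
};

/* One table entry per subcommand, so adding one later is a one-line change. */
static const struct bpf_subcmd bpf_subcmds[] = {
	{ "load",    BPF_CMD_LOAD,    "load <object file>" },
	{ "list",    BPF_CMD_LIST,    "list" },
	{ "del",     BPF_CMD_DEL,     "del <...>" },
	{ "enable",  BPF_CMD_ENABLE,  "enable <...>" },
	{ "disable", BPF_CMD_DISABLE, "disable <...>" },
	{ "help",    BPF_CMD_HELP,    "help" },
};

#define NR_SUBCMDS (sizeof(bpf_subcmds) / sizeof(bpf_subcmds[0]))

/* argv[0] is the first word after 'perf bpf'; argc counts from there. */
enum bpf_cmd bpf_parse_subcmd(int argc, const char **argv)
{
	size_t i;

	if (argc < 1)
		return BPF_CMD_HELP;	/* bare 'perf bpf' shows the help page */

	for (i = 0; i < NR_SUBCMDS; i++)
		if (strcmp(argv[0], bpf_subcmds[i].name) == 0)
			return bpf_subcmds[i].cmd;

	return BPF_CMD_UNKNOWN;
}

/* Print the help page generated from the same table. */
void bpf_print_help(FILE *out)
{
	size_t i;

	fprintf(out, "usage: perf bpf <subcommand> [<args>]\n\n");
	for (i = 0; i < NR_SUBCMDS; i++)
		fprintf(out, "    %s\n", bpf_subcmds[i].usage);
}
```

With this layout, 'perf bpf load sample_bpf.o' resolves to BPF_CMD_LOAD, an
unrecognized word resolves to BPF_CMD_UNKNOWN so the caller can print the help
page together with an error, and the help text cannot drift out of sync with
the set of accepted names.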

Thanks,

Ingo


Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.

2015-05-01 Thread Alexei Starovoitov

On 5/1/15 4:49 AM, Ingo Molnar wrote:
>
> * Peter Zijlstra <pet...@infradead.org> wrote:
>
>> On Thu, Apr 30, 2015 at 09:37:04PM -0700, Alexei Starovoitov wrote:
>>> We're also working in parallel on creating a new tracing language
>>> that together with llvm backend can be used as a single shared library
>>> that can be called from perf or anything else.
>>
>> Gurgh, please also keep normal C an option. [...]
>
> Absolutely, I thought there was agreement on that when we started
> merging all these eBPF patches ...
>
> It might be 'simplified C', in that it's just a subset of C, but
> please don't re-do something that works, especially if it's used to
> instrument a kernel that is written in C ...


of course. When did I say that I like 'bird' languages? :)
By 'new' I mean that we're not trying to port existing tracing
language like dtrace, systemtap, ktap to bpf.
I believe dtrace would have been more widely adopted if it didn't
invent new syntax. We're trying to do a C -- with ++.
It's C where non-supported things like 'for', 'while', 'asm' are
actively error-ed by front-end and additional syntactic
sugar for things that too ugly/verbose in vanilla C are added.
Full C via clang will always be there, but it looks like it will have
a hard time, because full C has way too many things that are not
supported by the bpf VM. We're trying to act on the feedback that new users
are giving us. It's much friendlier when the compiler tells you right
away that 'for' is not supported than when the kernel verifier says that
there is a loop. A new thing is map[key] access, which is equivalent
to bpf_map_lookup(map, key) followed by
bpf_map_update(map, key, zero_value) if the lookup doesn't find
an element. It turned out that for tracing use cases this is a very common
pattern.
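
The lookup-then-insert pattern behind map[key] can be sketched in plain
userspace C as below. This is only a simulation under stated assumptions:
struct toy_map and the toy_map_*() functions are hypothetical stand-ins for a
BPF map and its lookup/update helpers, and the code runs as an ordinary
program rather than inside the bpf VM (which is why it can use a 'for' loop).

```c
#include <stddef.h>

#define TOY_MAP_SLOTS 16

/* Hypothetical toy map: a fixed-size array of key/value slots. */
struct toy_map {
	unsigned long keys[TOY_MAP_SLOTS];
	long values[TOY_MAP_SLOTS];
	int used;
};

/* Stand-in for the lookup helper: pointer to the value, or NULL on a miss. */
long *toy_map_lookup(struct toy_map *map, unsigned long key)
{
	int i;

	for (i = 0; i < map->used; i++)
		if (map->keys[i] == key)
			return &map->values[i];
	return NULL;
}

/* Stand-in for the update helper: insert or overwrite; -1 if the map is full. */
int toy_map_update(struct toy_map *map, unsigned long key, long value)
{
	long *slot = toy_map_lookup(map, key);

	if (slot) {
		*slot = value;
		return 0;
	}
	if (map->used == TOY_MAP_SLOTS)
		return -1;
	map->keys[map->used] = key;
	map->values[map->used] = value;
	map->used++;
	return 0;
}

/*
 * The expansion described for map[key]: look the key up and, on a miss,
 * insert a zero value and look it up again.
 */
long *toy_map_index(struct toy_map *map, unsigned long key)
{
	long *value = toy_map_lookup(map, key);

	if (!value) {
		if (toy_map_update(map, key, 0) < 0)
			return NULL;
		value = toy_map_lookup(map, key);
	}
	return value;
}
```

A per-key counter then becomes a single statement such as
(*toy_map_index(&map, key))++ once the NULL case for a full map has been
handled, which is the kind of verbosity the map[key] sugar is meant to hide.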

Anyway, back to my original question about a long-term home:
where should the 'perf/bpf' branch land?

I also agree on leaving room for subcommands after 'perf bpf'.
I'd especially like to see 'perf bpf list'.


Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.

2015-04-30 Thread Alexei Starovoitov

On 4/30/15 3:52 AM, Wang Nan wrote:

> This series of patches is an approach to integrate eBPF with perf.
> After applying these patches, users are allowed to use following
> command to load eBPF program compiled by LLVM into kernel:
>
>   $ perf bpf sample_bpf.o
>
> The required BPF code and the loading procedure is similar to Alexei
> Starovoitov's libbpf in sample/bpf, with following exceptions:
>
>   1. The section name are not required leading with 'kprobe/' or
>  'kretprobe/'. Without such leading, any valid C var name can be use.
>
>   2. A 'config' section can be provided to describe the position and
>  arguments of a program. Syntax is identical to 'perf probe'.
>
> An example is pasted at the bottom of this cover letter. In that
> example, mybpfprog is configured by string in config section, and will
> be probed at __alloc_pages_nodemask. sample_bpf.o is generated using:
>
>   $ $CLANG -I/usr/src/kernel/include -I/usr/src/kernel/usr/include -D__KERNEL__ \
>  -Wno-unused-value -Wno-pointer-sign \
>  -O2 -emit-llvm -c sample_bpf.c -o - | $LLC -march=bpf -filetype=obj -o \
>  sample_bpf.o
>
> And can be loaded using:
>
>   $ perf bpf sample_bpf.o
>
> This series is only a limited functional. Following works are on the
> todo list:
>
>   1. Unprobe kprobe stubs used by eBPF programs when unloading;
>
>   2. Enable eBPF programs to access local variables and arguments
>  by utilizing debuginfo;
>
>   3. Output data in perf way.
>
> In this series:
>
> Patch 1/22 is a bugfix in perf probe, and may be triggered by following
> patches;
>
> Patch 2-3/22 are preparation, add required macros and syscall
> definition into perf source tree.
>
> Patch 4/22 add 'perf bpf' command.
>
> Patch 5-20/22 are labor works, which parse the ELF object file, collect
> information in object files, create maps needed by programs, link map
> and programs, config programs and load programs into kernel.
>
> Patch 21-22/22 are the final work. Patch 21 creates kprobe points which
> will be used by eBPF programs, patch 22 creates perf file descriptors
> then attach eBPF programs on them.


I'm very happy to see this work. Looks great. All patches are 
impressively clean and concise.

I think patches 1-3 are ready to go into Arnaldo's perf tree right now.
4 and above are clean and polished, but probably need to go into
some 'staging area' like a branch of perf tree, since I suspect the
user interface may change a little in the coming months and it's
a bit too early to expose 'perf bpf' command to every perf user ?
Arnaldo, Ingo, what do you guys think should be the arrangement?
'perf/bpf' branch in acme/linux.git or in tip/tip.git ?

I have few comments for patches 18 and 19, but let's figure out
the long term plan first.

We're also working in parallel on creating a new tracing language
that together with llvm backend can be used as a single shared library
that can be called from perf or anything else.
Then clang compilation step will be gone and programs can be run
as 'perf bpf file.bpf'.

Thanks!


