Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.
On 2015/5/6 12:56, Alexei Starovoitov wrote:
> On 5/5/15 9:46 PM, Wang Nan wrote:
>> Hi Alexei Starovoitov,
>>
>> Have you ever read this mail?
>
> please don't top post.
>
>>>> all makes sense and your use case fits quite well into existing
>>>> bpf+kprobe model. I'm not sure why you're calling it a 'problem'.
>>>> A problem of how to display that call stack from perf?
>>>> I would say it fits better as a sample than a trace.
>>>> If you dump it as a trace, it won't be easy to decipher, whereas if you
>>>> treat it as a sampling event, perf record/report facility will pick it up
>>>> and display nicely. Meaning that one sample == lock_page/unlock_page
>>>> latency > N. Then existing sample_callchain flag should work.
>>>
>>> Quite well. Do we have an eBPF function like
>>>
>>> static int (*bpf_perf_sample)(const char *fmt, int fmt_size, ...) =
>>>	BPF_FUNC_perf_sample
>>>
>>> so we can use it in the program probed in the body of __unlock_page() like
>>> that:
>>>
>>> ...
>>> if (latency > 0.5s)
>>>	bpf_perf_sample("page=%p, latency=%d", sizeof(...), page, latency);
>
> No need for extra helper. There is already return value from
> the program for this purpose.
> From kernel/trace/bpf_trace.c:
>  * Return: BPF programs always return an integer which is interpreted by
>  * kprobe handler as:
>  * 0 - return from kprobe (event is filtered out)
>  * 1 - store kprobe event into ring buffer
>
> in your case the program attached to unlock_page() can return 1
> when it needs to store this event into ring buffer, so that perf can
> process it. If I'm not mistaken, the sample_callchain flag cannot be
> applied to kprobe events, but that's a general problem (not
> related to bpf) and can be addressed as such.
>

That's great! Thanks for your response!

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.
On 5/5/15 9:46 PM, Wang Nan wrote:
> Hi Alexei Starovoitov,
>
> Have you ever read this mail?

please don't top post.

>>> all makes sense and your use case fits quite well into existing
>>> bpf+kprobe model. I'm not sure why you're calling it a 'problem'.
>>> A problem of how to display that call stack from perf?
>>> I would say it fits better as a sample than a trace.
>>> If you dump it as a trace, it won't be easy to decipher, whereas if you
>>> treat it as a sampling event, perf record/report facility will pick it up
>>> and display nicely. Meaning that one sample == lock_page/unlock_page
>>> latency > N. Then existing sample_callchain flag should work.
>>
>> Quite well. Do we have an eBPF function like
>>
>> static int (*bpf_perf_sample)(const char *fmt, int fmt_size, ...) =
>>	BPF_FUNC_perf_sample
>>
>> so we can use it in the program probed in the body of __unlock_page() like
>> that:
>>
>> ...
>> if (latency > 0.5s)
>>	bpf_perf_sample("page=%p, latency=%d", sizeof(...), page, latency);

No need for extra helper. There is already return value from
the program for this purpose.
From kernel/trace/bpf_trace.c:
 * Return: BPF programs always return an integer which is interpreted by
 * kprobe handler as:
 * 0 - return from kprobe (event is filtered out)
 * 1 - store kprobe event into ring buffer

in your case the program attached to unlock_page() can return 1
when it needs to store this event into ring buffer, so that perf can
process it. If I'm not mistaken, the sample_callchain flag cannot be
applied to kprobe events, but that's a general problem (not
related to bpf) and can be addressed as such.
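Alexei's point is that no extra helper is needed, because the program's return value already acts as the filter: 0 drops the kprobe event and 1 stores it in the ring buffer. The sketch below is a user-space C model of the __lock_page()/__unlock_page() latency filter Wang Nan describes. It is not kernel BPF code, and it rests on several hypothetical assumptions. A fixed-size table stands in for a BPF hash map keyed by page address. The caller passes in timestamps, where a real program would read a clock, for example through bpf_ktime_get_ns(). The 0.5 s threshold comes from Wang Nan's pseudocode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical threshold: 0.5 s in nanoseconds, from Wang Nan's pseudocode. */
#define LATENCY_THRESHOLD_NS 500000000ULL
#define MAX_TRACKED_PAGES 64

/* One slot of a stand-in for a BPF hash map keyed by page address. */
struct lock_entry {
	const void *page;
	uint64_t lock_ts_ns;
	int used;
};

static struct lock_entry lock_table[MAX_TRACKED_PAGES];

static struct lock_entry *find_entry(const void *page)
{
	for (size_t i = 0; i < MAX_TRACKED_PAGES; i++)
		if (lock_table[i].used && lock_table[i].page == page)
			return &lock_table[i];
	return NULL;
}

/* Models the program placed before io_schedule() in __lock_page():
 * record when this page started waiting. Always returns 0, so the
 * lock side never stores an event. */
static int on_lock_page(const void *page, uint64_t now_ns)
{
	struct lock_entry *e = find_entry(page);

	for (size_t i = 0; e == NULL && i < MAX_TRACKED_PAGES; i++)
		if (!lock_table[i].used)
			e = &lock_table[i];
	if (e != NULL) { /* a full table simply drops the record */
		e->page = page;
		e->lock_ts_ns = now_ns;
		e->used = 1;
	}
	return 0;
}

/* Models the program on entry to __unlock_page(): return 1 (store the
 * event so perf can report it) only when the page stayed locked longer
 * than the threshold, otherwise 0 (event is filtered out). */
static int on_unlock_page(const void *page, uint64_t now_ns)
{
	struct lock_entry *e = find_entry(page);
	uint64_t latency_ns;

	if (e == NULL)
		return 0; /* no matching lock was recorded */
	assert(now_ns >= e->lock_ts_ns); /* timestamps must be monotonic */
	latency_ns = now_ns - e->lock_ts_ns;
	e->used = 0; /* forget the entry, like bpf_map_delete_elem() */
	return latency_ns > LATENCY_THRESHOLD_NS ? 1 : 0;
}
```

Only the unlock-side program ever returns 1. perf would therefore see one event per slow lock/unlock pair, which matches Alexei's "one sample == lock_page/unlock_page latency > N".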
Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.
Hi Alexei Starovoitov,

Have you ever read this mail? I'm very interested in triggering perf samples
from BPF code. You said it is not a problem. Could you please give me some
further information?

Thank you.

On 2015/5/5 14:14, Wang Nan wrote:
> On 2015/5/5 13:49, Alexei Starovoitov wrote:
>> On 5/4/15 9:41 PM, Wang Nan wrote:
>>>
>>> That's great. Could you please append the description of 'llvm -s' into
>>> your README or comments? It has cost me a lot of time for dumping eBPF
>>> instructions so I decided to add it into perf...
>>
>> sure. it's just -filetype=asm flag to llc instead of -filetype=obj.
>> Eventually it will work as normal 'clang -S file.c' when few more
>> llvm commits are accepted upstream.
>>
>>>>> My colleague He Kuang is working on variable accessing. Probing inside
>>>>> function body and accessing its local variable will be supported like
>>>>> this:
>>>>>
>>>>> SEC("config") char _prog_config[] = "prog: func_name:1234 vara=localvara"
>>>>> int prog(struct pt_regs *ctx, unsigned long vara) {
>>>>>	// vara is the value of localvara of function func_name
>>>>> }
>>>>
>>>> that would be great. I'm not sure though how you can achieve that
>>>> without changing C front-end ?
>>>
>>> It's not very difficult. He is trying to generate the loader of vara
>>> as prologue, then paste the prologue and the main eBPF program together.
>>> From the viewpoint of kernel bpf verifier, there is only one param (ctx);
>>> the prologue program fetches the value of vara then puts it into a proper
>>> register, then the main program works.
>>
>> got it. I think that's much cleaner than what I was proposing.
>> The only question is then:
>> char _prog_config[] = "prog: func_name:1234 vara=localvara"
>> should actually be something like "... r2=localvara", right?
>> since prologue would need to assign into r2.
>> Otherwise I don't see where you find out about 'vara' inside
>> compiled bpf code.
>>
>
> I think the calling convention could tell us which var should go to which
> register. In the case of
>
> SEC("config") char _prog_config[] = "prog: func_name:1234 vara=localvara varb=globalvarb";
> int prog(struct pt_regs *ctx, unsigned long vara, unsigned long varb) { ... }
>
> llvm should compile 'prog' according to calling convention. The body of that
> program should assume vara in r2 and varb in r3. The prologue also puts the
> vars into r2 and r3 according to calling convention. Therefore, after pasting
> them together, the final program should run properly. There is no need to
> describe register numbers explicitly.
> What do you think?
>
>
>> Would be nice if this can be done without debug info.
>> Like in tracex2_kern.c I have:
>> SEC("kprobe/sys_write")
>> int bpf_prog(struct pt_regs *ctx)
>> {
>>	long wr_size = ctx->dx; /* arg3 */
>>
>> with your prolog generator the above can be rewritten as:
>> SEC("kprobe/sys_write")
>> int bpf_prog(struct pt_regs *unused, int fd, char *buf, size_t wr_size)
>> {
>>	/* use wr_size */
>>
>> that will improve ease of use a lot.
>>
>
> It is possible if probing on the entry of a function. However, when probing
> in a function body, there still needs to be a way to pass the variable list
> required by the program to perf, to let it generate the correct prologue.
> We'd like to implement the generic one (list vars in config string) first,
> then add function parameter access as syntactic sugar.
>
>>> Another possible solution is to change the protocol between kprobe and eBPF
>>> program: make kprobes call fetchers and pass the results to the eBPF
>>> program as a second param (group all varx together).
>>> A prologue may still be needed in this case to load each param into the
>>> correct register.
>>
>> you mean grouping varx together in some other struct and embedding it
>> together with pt_regs into new container struct?
>> doable, but your first approach is quite clean already. why bother.
>>
>
> The second approach lets us reuse the fetcher code which is already in the
> kernel. Furthermore, if new types of fetchers appear (for example, a fetcher
> of PMU counters), we support them automatically.
>
>>> Could you please consider the following problem?
>>>
>>> We find there are several __lock_page() calls that last a very long time.
>>> We are going to find the corresponding __unlock_page() so we can know what
>>> blocks them. We want to insert eBPF programs before io_schedule() in
>>> __lock_page(), and also add an eBPF program on the entry of
>>> __unlock_page(), so we can compute the interval between page locking and
>>> unlocking. If the time is longer than a threshold, let __unlock_page()
>>> trigger a perf sampling so we get its call stack. In this case, the eBPF
>>> program acts as a trace filter.
>>
>> all makes sense and your use case fits quite well into existing
>> bpf+kprobe model. I'm not sure why you're calling it a 'problem'.
>> A problem of how to display that call stack from perf?
>> I would say it fits better as a
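Alexei's tracex2_kern.c remark in this thread makes a specific point. At a function's entry, the arguments already sit in well-defined registers. A generated prologue can therefore hand them to the program as ordinary parameters. The sketch below models that idea in plain user-space C rather than as eBPF instructions. Everything in it is hypothetical. fake_pt_regs is a trimmed stand-in for the x86-64 struct pt_regs that keeps only di, si and dx, the first three integer-argument registers of the x86-64 System V ABI. The "large writes to stdout" filter is invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed stand-in for x86-64 struct pt_regs: only the
 * registers that carry the first three integer arguments. */
struct fake_pt_regs {
	unsigned long di; /* arg1 */
	unsigned long si; /* arg2 */
	unsigned long dx; /* arg3 */
};

/* Program body in the style Alexei suggested: arguments arrive as
 * ordinary parameters and ctx is unused. The filter itself is made
 * up: keep writes of more than 4096 bytes to fd 1. */
static int bpf_prog_body(struct fake_pt_regs *unused, int fd,
			 const char *buf, size_t wr_size)
{
	(void)unused;
	(void)buf;
	return (fd == 1 && wr_size > 4096) ? 1 : 0;
}

/* What a generated prologue does, conceptually, at sys_write() entry:
 * load each argument from the saved registers, then call the body. */
static int bpf_prog_with_prologue(struct fake_pt_regs *ctx)
{
	assert(ctx != NULL);
	int fd = (int)ctx->di;
	const char *buf = (const char *)ctx->si;
	size_t wr_size = (size_t)ctx->dx;

	return bpf_prog_body(ctx, fd, buf, wr_size);
}
```

The body never touches ctx, which is the ease-of-use gain Alexei mentions. As Wang Nan notes, this direct mapping only holds at function entry. Inside a function body, local variables are not at fixed ABI locations.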
Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.
On Thu, Apr 30, 2015 at 3:52 AM, Wang Nan <wangn...@huawei.com> wrote:
[...]
> An example is pasted at the bottom of this cover letter. In that
> example, mybpfprog is configured by string in config section, and will
> be probed at __alloc_pages_nodemask. sample_bpf.o is generated using:
>
>  $ $CLANG -I/usr/src/kernel/include -I/usr/src/kernel/usr/include -D__KERNEL__ \
>	-Wno-unused-value -Wno-pointer-sign \
>	-O2 -emit-llvm -c sample_bpf.c -o -| $LLC -march=bpf -filetype=obj -o \
>	sample_bpf.o
>
> And can be loaded using:
>
>  $ perf bpf sample_bpf.o
[...]
> EXAMPLE
> - sample_bpf.c -
> #include <uapi/linux/bpf.h>
> #include <linux/version.h>
> #include <uapi/linux/ptrace.h>
>
> #define SEC(NAME) __attribute__((section(NAME), used))
>
> static int (*bpf_map_delete_elem)(void *map, void *key) =
>	(void *) BPF_FUNC_map_delete_elem;
> static int (*bpf_trace_printk)(const char *fmt, int fmt_size, ...) =
>	(void *) BPF_FUNC_trace_printk;
>
> struct bpf_map_def {
>	unsigned int type;
>	unsigned int key_size;
>	unsigned int value_size;
>	unsigned int max_entries;
> };
>
> struct pair {
>	u64 val;
>	u64 ip;
> };
>
> struct bpf_map_def SEC("maps") my_map = {
>	.type = BPF_MAP_TYPE_HASH,
>	.key_size = sizeof(long),
>	.value_size = sizeof(struct pair),
>	.max_entries = 100,
> };
>
> SEC("kprobe/kmem_cache_free")
> int bpf_prog1(struct pt_regs *ctx)
> {
>	long ptr = ctx->r14;
>	bpf_map_delete_elem(&my_map, &ptr);
>	return 0;
> }
>
> SEC("mybpfprog")
> int bpf_prog_my(void *ctx)
> {
>	char fmt[] = "Haha\n";
>	bpf_trace_printk(fmt, sizeof(fmt));
>	return 0;
> }
>
> char _license[] SEC("license") = "GPL";
> u32 _version SEC("version") = LINUX_VERSION_CODE;
> char _config[] SEC("config") = ""
>	"mybpfprog=__alloc_pages_nodemask\n";

Was this just some random eBPF code to test the perf framework? Or was it
to do something useful with kmem_cache_free()/__alloc_pages_nodemask()
tracing as well? It looks a bit incomplete. If it's just random code, I'd
include a comment to state that; otherwise it's a bit confusing.

A complete example might be better; e.g., something like Alexei's tracex1,
for a simple example of bpf_trace_printk(), or sockex1, for a simple map
example.

Brendan
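According to the cover letter, mybpfprog is "configured by string in config section" and probed at __alloc_pages_nodemask. The code below roughly illustrates what a loader has to do with that section. It is a hypothetical C parser for the newline-separated name=probe_point entries used in sample_bpf.c, not perf's actual parser. The struct name, field sizes and the -1 error convention are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result of parsing one "section=probe_point" entry. */
struct prog_probe {
	char section[64];  /* ELF section holding the program, e.g. "mybpfprog" */
	char probe[128];   /* where to attach it, e.g. "__alloc_pages_nodemask" */
};

/* Parses newline-separated "name=probe" entries such as
 * "mybpfprog=__alloc_pages_nodemask\n". Empty lines are skipped.
 * Returns the number of entries parsed, or -1 on a malformed entry
 * or when more than max entries are present. */
static int parse_config(const char *cfg, struct prog_probe *out, int max)
{
	int n = 0;

	assert(cfg != NULL && out != NULL);
	while (*cfg != '\0') {
		const char *eol = strchr(cfg, '\n');
		size_t len = eol ? (size_t)(eol - cfg) : strlen(cfg);

		if (len > 0) {
			const char *eq = memchr(cfg, '=', len);
			size_t name_len, probe_len;

			if (eq == NULL || n >= max)
				return -1;
			name_len = (size_t)(eq - cfg);
			probe_len = len - name_len - 1;
			if (name_len == 0 || probe_len == 0 ||
			    name_len >= sizeof(out[n].section) ||
			    probe_len >= sizeof(out[n].probe))
				return -1;
			memcpy(out[n].section, cfg, name_len);
			out[n].section[name_len] = '\0';
			memcpy(out[n].probe, eq + 1, probe_len);
			out[n].probe[probe_len] = '\0';
			n++;
		}
		cfg += len;
		if (*cfg == '\n')
			cfg++;
	}
	return n;
}
```

A loader would then look up the program in the section named by each entry and attach it at the given probe point. The sketch stops at the parsing step.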
Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.
On Fri, May 01, 2015 at 09:56:23AM -0700, Alexei Starovoitov wrote:
> Anyway, back to my original question about long term home.
> where to land 'perf/bpf' branch ?

I don't care, but for me to merge it, please go on addressing the comments
made in this thread (perf bpf command --args, etc) and at some point
provide a small patchset that implements the most basic stuff, like, say,
a "hello, world" style proggie, together with the
tools/perf/Documentation/perf-bpf.txt file and detailed instructions on how
to use the feature, i.e. what dependencies are needed, what kernel options
should be enabled, etc.

Nice warning/error messages for when the user doesn't have those options
enabled or doesn't have appropriate permissions, etc.

I.e. just by following what is in each changeset comment log I should be
able to test patch after patch.

After we get one such, say, 10-long patchkit with a very basic feature of
eBPF exposed via 'perf bpf', we can go to the next, and so on.

Try to use 'perf trace usleep 1', 'perf trace -a usleep 1' as non-root,
for instance, to see examples on how to inform the user about what is
needed to use the tool.

- Arnaldo
Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.
On 2015/5/5 13:49, Alexei Starovoitov wrote:
> On 5/4/15 9:41 PM, Wang Nan wrote:
>>
>> That's great. Could you please append the description of 'llvm -s' into
>> your README or comments? It has cost me a lot of time for dumping eBPF
>> instructions so I decided to add it into perf...
>
> sure. it's just -filetype=asm flag to llc instead of -filetype=obj.
> Eventually it will work as normal 'clang -S file.c' when few more
> llvm commits are accepted upstream.
>
>>>> My colleague He Kuang is working on variable accessing. Probing inside
>>>> function body and accessing its local variable will be supported like
>>>> this:
>>>>
>>>> SEC("config") char _prog_config[] = "prog: func_name:1234 vara=localvara"
>>>> int prog(struct pt_regs *ctx, unsigned long vara) {
>>>>	// vara is the value of localvara of function func_name
>>>> }
>>>
>>> that would be great. I'm not sure though how you can achieve that
>>> without changing C front-end ?
>>
>> It's not very difficult. He is trying to generate the loader of vara
>> as prologue, then paste the prologue and the main eBPF program together.
>> From the viewpoint of kernel bpf verifier, there is only one param (ctx);
>> the prologue program fetches the value of vara then puts it into a proper
>> register, then the main program works.
>
> got it. I think that's much cleaner than what I was proposing.
> The only question is then:
> char _prog_config[] = "prog: func_name:1234 vara=localvara"
> should actually be something like "... r2=localvara", right?
> since prologue would need to assign into r2.
> Otherwise I don't see where you find out about 'vara' inside
> compiled bpf code.
>

I think the calling convention could tell us which var should go to which
register. In the case of

SEC("config") char _prog_config[] = "prog: func_name:1234 vara=localvara varb=globalvarb";
int prog(struct pt_regs *ctx, unsigned long vara, unsigned long varb) { ... }

llvm should compile 'prog' according to calling convention. The body of that
program should assume vara in r2 and varb in r3. The prologue also puts the
vars into r2 and r3 according to calling convention. Therefore, after pasting
them together, the final program should run properly. There is no need to
describe register numbers explicitly.
What do you think?

> Would be nice if this can be done without debug info.
> Like in tracex2_kern.c I have:
> SEC("kprobe/sys_write")
> int bpf_prog(struct pt_regs *ctx)
> {
>	long wr_size = ctx->dx; /* arg3 */
>
> with your prolog generator the above can be rewritten as:
> SEC("kprobe/sys_write")
> int bpf_prog(struct pt_regs *unused, int fd, char *buf, size_t wr_size)
> {
>	/* use wr_size */
>
> that will improve ease of use a lot.
>

It is possible if probing on the entry of a function. However, when probing
in a function body, there still needs to be a way to pass the variable list
required by the program to perf, to let it generate the correct prologue.
We'd like to implement the generic one (list vars in config string) first,
then add function parameter access as syntactic sugar.

>> Another possible solution is to change the protocol between kprobe and eBPF
>> program: make kprobes call fetchers and pass the results to the eBPF
>> program as a second param (group all varx together).
>> A prologue may still be needed in this case to load each param into the
>> correct register.
>
> you mean grouping varx together in some other struct and embedding it
> together with pt_regs into new container struct?
> doable, but your first approach is quite clean already. why bother.
>

The second approach lets us reuse the fetcher code which is already in the
kernel. Furthermore, if new types of fetchers appear (for example, a fetcher
of PMU counters), we support them automatically.

>> Could you please consider the following problem?
>>
>> We find there are several __lock_page() calls that last a very long time.
>> We are going to find the corresponding __unlock_page() so we can know what
>> blocks them. We want to insert eBPF programs before io_schedule() in
>> __lock_page(), and also add an eBPF program on the entry of
>> __unlock_page(), so we can compute the interval between page locking and
>> unlocking. If the time is longer than a threshold, let __unlock_page()
>> trigger a perf sampling so we get its call stack. In this case, the eBPF
>> program acts as a trace filter.
>
> all makes sense and your use case fits quite well into existing
> bpf+kprobe model. I'm not sure why you're calling it a 'problem'.
> A problem of how to display that call stack from perf?
> I would say it fits better as a sample than a trace.
> If you dump it as a trace, it won't be easy to decipher, whereas if you
> treat it as a sampling event, perf record/report facility will pick it up
> and display nicely. Meaning that one sample == lock_page/unlock_page
> latency > N. Then existing sample_callchain flag should work.
>

Quite well. Do we have an eBPF function like

static int (*bpf_perf_sample)(const
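Wang Nan argues that the calling convention alone can decide which register each variable goes into. The argument can be made concrete. eBPF passes function arguments in r1 to r5, and ctx already occupies r1. The first variable listed in the config string therefore goes to r2, the second to r3, and so on, so at most four variables fit. The sketch below derives that binding from a config string in the format shown in the thread. bind_vars, struct var_binding and the field sizes are hypothetical illustrations, not perf code.

```c
#include <assert.h>
#include <string.h>

#define BPF_ARG_REGS 5 /* eBPF passes function arguments in r1..r5 */

struct var_binding {
	char param[32];  /* parameter name in the eBPF program, e.g. "vara" */
	char source[64]; /* variable in the probed function, e.g. "localvara" */
	int reg;         /* register the prologue must fill, e.g. 2 for r2 */
};

/* Parses a config string like
 *   "prog: func_name:1234 vara=localvara varb=globalvarb"
 * and binds the listed variables, in order, to r2, r3, ... because r1
 * already holds ctx. Returns the number of bindings, or -1 if the
 * string is malformed or lists more variables than r2..r5 can hold. */
static int bind_vars(const char *cfg, struct var_binding *out, int max_out)
{
	char buf[256];
	int n = 0;
	int idx = 0;

	assert(cfg != NULL && out != NULL);
	if (strlen(cfg) >= sizeof(buf))
		return -1;
	strcpy(buf, cfg);

	for (char *tok = strtok(buf, " "); tok != NULL;
	     tok = strtok(NULL, " "), idx++) {
		char *eq;

		if (idx < 2)
			continue; /* skip "prog:" and the probe point */
		eq = strchr(tok, '=');
		if (eq == NULL || eq == tok || eq[1] == '\0')
			return -1;
		if (n >= max_out || n + 2 > BPF_ARG_REGS)
			return -1; /* no argument register left */
		*eq = '\0';
		if (strlen(tok) >= sizeof(out[n].param) ||
		    strlen(eq + 1) >= sizeof(out[n].source))
			return -1;
		strcpy(out[n].param, tok);
		strcpy(out[n].source, eq + 1);
		out[n].reg = n + 2;
		n++;
	}
	return n;
}
```

This follows the generic "list vars in config string" approach, so the name=source pairs must appear in the same order as the program's parameters after ctx. No register numbers need to be written in the string, which is the point Wang Nan makes in reply to Alexei's "r2=localvara" suggestion.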
Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.
Hi Alexei Starovoitov, Have you ever read this mail? I'm very intrerested in triggering perf sample in BPF code. You said it is not a problem. Could you please give me some further information? Thank you. On 2015/5/5 14:14, Wang Nan wrote: On 2015/5/5 13:49, Alexei Starovoitov wrote: On 5/4/15 9:41 PM, Wang Nan wrote: That's great. Could you please append the description of 'llvm -s' into your README or comments? It has cost me a lot of time for dumping eBPF instructions so I decide to add it into perf... sure. it's just -filetype=asm flag to llc instead of -filetype=obj. Eventually it will work as normal 'clang -S file.c' when few more llvm commits are accepted upstream. My collage He Kuang is working on variable accessing. Probing inside function body and accessing its local variable will be supported like this: SEC(config) char _prog_config[] = prog: func_name:1234 vara=localvara int prog(struct pt_regs *ctx, unsigned long vara) { // vara is the value of localvara of function func_name } that would be great. I'm not sure though how you can achieve that without changing C front-end ? It's not very difficult. He is trying to generate the loader of vara as prologue, then paste the prologue and the main eBPF program together. From the viewpoint of kernel bpf verifier, there is only one param (ctx); the prologue program fetches the value of vara then put it into a propoer register, then main program work. got it. I think that's much cleaner than what I was proposing. The only question is then: char _prog_config[] = prog: func_name:1234 vara=localvara should actually be something like ... r2=localvara, right? since prologue would need to assign into r2. Otherwise I don't see where you find out about 'vara' inside compiled bpf code. I think the calling convention could teach us which var should go to which register. 
In the case of SEC(config) char _prog_config[] = prog: func_name:1234 vara=localvara varb=globalvarb; int prog(struct pt_regs *ctx, unsigned long vara, unsigned long varb) { ... } llvm should compile 'prog' according to calling convention. The body of that program should assume vara in r2 and varb in r3. The prologue also puts the vars into r2 and r3 according to calling convention. Therefore, after paste them together, the final program should run properly. There is no need to describe register number explicitly. What do you think? Would be nice if this can be done without debug info. Like in tracex2_kern.c I have: SEC(kprobe/sys_write) int bpf_prog(struct pt_regs *ctx) { long wr_size = ctx-dx; /* arg3 */ with your prolog generator the above can be rewritten as: SEC(kprobe/sys_write) int bpf_prog(struct pt_regs *unused, int fd, char *buf, size_t wr_size) { /* use wr_size */ that will improve ease of use a lot. It is possible if probing on the entry of a function. However, when probing on function body, there still need a way to pass variable list required by the program to perf to let it generate correct prologue. We'd like to implement the generic one (list vars in config string) first, then make function parameters accessing as a syntax sugar. Another possible solution is to change the protocol between kprobe and eBPF program, makes kprobes calls fetchers and passes them to eBPF program as a second param (group all varx together). A prologue may still need in this case to load each param into correct register. you mean grouping varx together in some other struct and embedding it together with pt_regs into new container struct? doable, but your first approach is quite clean already. why bother. The second approach makes us reuse the fetchers code which are already in kernel. Further more, if new type of fetchers are appear (for example, fetcher of PMU counter), we support it automatically. Could you please consider the following problem? 
We find there are serval __lock_page() calls last very long time. We are going to find corresponding __unlock_page() so we can know what blocks them. We want to insert eBPF programs before io_schedule() in __lock_page(), and also add eBPF program on the entry of __unlock_page(), so we can compute the interval between page locking and unlocking. If time is longer than a threshold, let __unlock_page() trigger a perf sampling so we get its call stack. In this case, eBPF program acts as a trace filter. all makes sense and your use case fits quite well into existing bpf+kprobe model. I'm not sure why you're calling a 'problem'. A problem of how to display that call stack from perf? I would say it fits better as a sample than a trace. If you dump it as a trace, it won't easy to decipher, whereas if you treat it a sampling event, perf record/report facility will pick it up and display nicely. Meaning that one sample == lock_page/unlock_page latency N. Then existing sample_callchain flag
Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.
On 2015/5/6 12:56, Alexei Starovoitov wrote: On 5/5/15 9:46 PM, Wang Nan wrote: Hi Alexei Starovoitov, Have you ever read this mail? please don't top post. all makes sense and your use case fits quite well into existing bpf+kprobe model. I'm not sure why you're calling a 'problem'. A problem of how to display that call stack from perf? I would say it fits better as a sample than a trace. If you dump it as a trace, it won't easy to decipher, whereas if you treat it a sampling event, perf record/report facility will pick it up and display nicely. Meaning that one sample == lock_page/unlock_page latency N. Then existing sample_callchain flag should work. Quite well. Do we have an eBPF function like static int (*bpf_perf_sample)(const char *fmt, int fmt_size, ...) = BPF_FUNC_perf_sample so we can use it in the program probed in the body of __unlock_page() like that: ... if (latency 0.5s) bpf_perf_sample(page=%p, latency=%d, sizeof(...), page, latency); No need for extra helper. There is already return value from the program for this purpose. From kernel/trace/bpf_trace.c: * Return: BPF programs always return an integer which is interpreted by * kprobe handler as: * 0 - return from kprobe (event is filtered out) * 1 - store kprobe event into ring buffer in your case the program attached to unlock_page() can return 1 when it needs to store this event into ring buffer, so that perf can process it. If I'm not mistaken, the sample_callchain flag cannot be applied to kprobe events, but that's a general program (not related to bpf) and can be addressed as such. That's great! Thanks to your response! -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.
On 5/5/15 9:46 PM, Wang Nan wrote: Hi Alexei Starovoitov, Have you ever read this mail? please don't top post. all makes sense and your use case fits quite well into existing bpf+kprobe model. I'm not sure why you're calling a 'problem'. A problem of how to display that call stack from perf? I would say it fits better as a sample than a trace. If you dump it as a trace, it won't easy to decipher, whereas if you treat it a sampling event, perf record/report facility will pick it up and display nicely. Meaning that one sample == lock_page/unlock_page latency N. Then existing sample_callchain flag should work. Quite well. Do we have an eBPF function like static int (*bpf_perf_sample)(const char *fmt, int fmt_size, ...) = BPF_FUNC_perf_sample so we can use it in the program probed in the body of __unlock_page() like that: ... if (latency 0.5s) bpf_perf_sample(page=%p, latency=%d, sizeof(...), page, latency); No need for extra helper. There is already return value from the program for this purpose. From kernel/trace/bpf_trace.c: * Return: BPF programs always return an integer which is interpreted by * kprobe handler as: * 0 - return from kprobe (event is filtered out) * 1 - store kprobe event into ring buffer in your case the program attached to unlock_page() can return 1 when it needs to store this event into ring buffer, so that perf can process it. If I'm not mistaken, the sample_callchain flag cannot be applied to kprobe events, but that's a general program (not related to bpf) and can be addressed as such. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.
Em Fri, May 01, 2015 at 09:56:23AM -0700, Alexei Starovoitov escreveu: Anyway, back to my original question about long term home. where to land 'perf/bpf' branch ? I don't care, but for me to merge it, please go on addressing the comments made in this thread (perf bpf command --args, etc) and at some point provide a small patchset that implements the most basic stuff, like, say, a hello, world style proggie, together with the tools/perf/Documentation/perf-bpf.txt file, detailed instructions on how to use the feature, i.e. what dependencies are needed, what kernel options should be enabled, etc. Nice warning/error messages for when the user doesn't have those options enabled or doesn't have appropriate permissions, etc. I.e. just by following what is in each changeset comment log I should be able to test patch after patch. After we get one such, say, 10-long patchkit with a very basic feature of eBPF exposed via 'perf bpf', we can go to the next, and so on. Try to use 'perf trace usleep 1', 'perf trace -a usleep 1' as non-root, for instance, to see examples on how to inform the user about what is needed to use the tool. - Arnaldo -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.
On 2015/5/5 13:49, Alexei Starovoitov wrote: On 5/4/15 9:41 PM, Wang Nan wrote: That's great. Could you please append the description of 'llvm -s' into your README or comments? It has cost me a lot of time for dumping eBPF instructions so I decide to add it into perf... sure. it's just -filetype=asm flag to llc instead of -filetype=obj. Eventually it will work as normal 'clang -S file.c' when few more llvm commits are accepted upstream. My collage He Kuang is working on variable accessing. Probing inside function body and accessing its local variable will be supported like this: SEC(config) char _prog_config[] = prog: func_name:1234 vara=localvara int prog(struct pt_regs *ctx, unsigned long vara) { // vara is the value of localvara of function func_name } that would be great. I'm not sure though how you can achieve that without changing C front-end ? It's not very difficult. He is trying to generate the loader of vara as prologue, then paste the prologue and the main eBPF program together. From the viewpoint of kernel bpf verifier, there is only one param (ctx); the prologue program fetches the value of vara then put it into a propoer register, then main program work. got it. I think that's much cleaner than what I was proposing. The only question is then: char _prog_config[] = prog: func_name:1234 vara=localvara should actually be something like ... r2=localvara, right? since prologue would need to assign into r2. Otherwise I don't see where you find out about 'vara' inside compiled bpf code. I think the calling convention could teach us which var should go to which register. In the case of SEC(config) char _prog_config[] = prog: func_name:1234 vara=localvara varb=globalvarb; int prog(struct pt_regs *ctx, unsigned long vara, unsigned long varb) { ... } llvm should compile 'prog' according to calling convention. The body of that program should assume vara in r2 and varb in r3. The prologue also puts the vars into r2 and r3 according to calling convention. 
Therefore, after paste them together, the final program should run properly. There is no need to describe register number explicitly. What do you think? Would be nice if this can be done without debug info. Like in tracex2_kern.c I have: SEC(kprobe/sys_write) int bpf_prog(struct pt_regs *ctx) { long wr_size = ctx-dx; /* arg3 */ with your prolog generator the above can be rewritten as: SEC(kprobe/sys_write) int bpf_prog(struct pt_regs *unused, int fd, char *buf, size_t wr_size) { /* use wr_size */ that will improve ease of use a lot. It is possible if probing on the entry of a function. However, when probing on function body, there still need a way to pass variable list required by the program to perf to let it generate correct prologue. We'd like to implement the generic one (list vars in config string) first, then make function parameters accessing as a syntax sugar. Another possible solution is to change the protocol between kprobe and eBPF program, makes kprobes calls fetchers and passes them to eBPF program as a second param (group all varx together). A prologue may still need in this case to load each param into correct register. you mean grouping varx together in some other struct and embedding it together with pt_regs into new container struct? doable, but your first approach is quite clean already. why bother. The second approach makes us reuse the fetchers code which are already in kernel. Further more, if new type of fetchers are appear (for example, fetcher of PMU counter), we support it automatically. Could you please consider the following problem? We find there are serval __lock_page() calls last very long time. We are going to find corresponding __unlock_page() so we can know what blocks them. We want to insert eBPF programs before io_schedule() in __lock_page(), and also add eBPF program on the entry of __unlock_page(), so we can compute the interval between page locking and unlocking. 
If time is longer than a threshold, let __unlock_page() trigger a perf sampling so we get its call stack. In this case, eBPF program acts as a trace filter. all makes sense and your use case fits quite well into existing bpf+kprobe model. I'm not sure why you're calling a 'problem'. A problem of how to display that call stack from perf? I would say it fits better as a sample than a trace. If you dump it as a trace, it won't easy to decipher, whereas if you treat it a sampling event, perf record/report facility will pick it up and display nicely. Meaning that one sample == lock_page/unlock_page latency N. Then existing sample_callchain flag should work. Quite well. Do we have an eBPF function like static int (*bpf_perf_sample)(const char *fmt, int fmt_size, ...) = BPF_FUNC_perf_sample so we can use it in the program probed in the body of __unlock_page() like that: ... if (latency 0.5s)
Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.
On Thu, Apr 30, 2015 at 3:52 AM, Wang Nan <wangn...@huawei.com> wrote:
[...]
> An example is pasted at the bottom of this cover letter. In that
> example, mybpfprog is configured by string in config section, and will
> be probed at __alloc_pages_nodemask. sample_bpf.o is generated using:
>
>  $ $CLANG -I/usr/src/kernel/include -I/usr/src/kernel/usr/include -D__KERNEL__ \
>         -Wno-unused-value -Wno-pointer-sign \
>         -O2 -emit-llvm -c sample_bpf.c -o -| $LLC -march=bpf -filetype=obj -o \
>         sample_bpf.o
>
> And can be loaded using:
>
>  $ perf bpf sample_bpf.o
[...]
> EXAMPLE - sample_bpf.c -
> #include <uapi/linux/bpf.h>
> #include <linux/version.h>
> #include <uapi/linux/ptrace.h>
>
> #define SEC(NAME) __attribute__((section(NAME), used))
>
> static int (*bpf_map_delete_elem)(void *map, void *key) =
>         (void *) BPF_FUNC_map_delete_elem;
> static int (*bpf_trace_printk)(const char *fmt, int fmt_size, ...) =
>         (void *) BPF_FUNC_trace_printk;
>
> struct bpf_map_def {
>         unsigned int type;
>         unsigned int key_size;
>         unsigned int value_size;
>         unsigned int max_entries;
> };
>
> struct pair {
>         u64 val;
>         u64 ip;
> };
>
> struct bpf_map_def SEC("maps") my_map = {
>         .type = BPF_MAP_TYPE_HASH,
>         .key_size = sizeof(long),
>         .value_size = sizeof(struct pair),
>         .max_entries = 100,
> };
>
> SEC("kprobe/kmem_cache_free")
> int bpf_prog1(struct pt_regs *ctx)
> {
>         long ptr = ctx->r14;
>         bpf_map_delete_elem(&my_map, &ptr);
>         return 0;
> }
>
> SEC("mybpfprog")
> int bpf_prog_my(void *ctx)
> {
>         char fmt[] = "Haha\n";
>         bpf_trace_printk(fmt, sizeof(fmt));
>         return 0;
> }
>
> char _license[] SEC("license") = "GPL";
> u32 _version SEC("version") = LINUX_VERSION_CODE;
> char _config[] SEC("config") = "mybpfprog=__alloc_pages_nodemask\n";

Was this just some random eBPF code to test the perf framework? Or was it meant to do something useful with kmem_cache_free()/__alloc_pages_nodemask() tracing as well? It looks a bit incomplete. If it's just random code, I'd include a comment to state that; otherwise it's a bit confusing.

A complete example might be better; e.g., something like Alexei's tracex1 for a simple example of bpf_trace_printk(), or sockex1 for a simple map example.

Brendan
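The `config` section in the sample above is a plain string of `program=probe-definition` lines. As a rough sketch of how a loader might split it, the C below cuts each line at the first `=`. It assumes exactly that one-entry-per-line format; the thread suggests the syntax was still evolving toward forms like `prog: func_name:1234 vara=localvara`. The names `struct config_entry` and `parse_bpf_config` are hypothetical and are not taken from perf.

```c
#include <string.h>

/* One entry of the config section: which program (ELF section name)
 * and the 'perf probe'-style definition of where to attach it.
 * The buffer sizes are arbitrary limits chosen for this sketch. */
struct config_entry {
    char prog[64];
    char probe[128];
};

/* Split a config blob of "prog=probe\n" lines into entries.
 * Empty lines are skipped. Returns the number of entries parsed, or -1
 * if a line has no '=', has an empty or over-long side, or if there are
 * more than 'max' entries. */
static int parse_bpf_config(const char *blob, struct config_entry *out, int max)
{
    int n = 0;
    const char *line = blob;

    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);

        if (len > 0) {
            const char *eq = memchr(line, '=', len);
            size_t name_len, def_len;

            if (eq == NULL || n >= max)
                return -1;
            name_len = (size_t)(eq - line);
            def_len = len - name_len - 1;
            if (name_len == 0 || name_len >= sizeof(out[n].prog) ||
                def_len == 0 || def_len >= sizeof(out[n].probe))
                return -1;
            memcpy(out[n].prog, line, name_len);
            out[n].prog[name_len] = '\0';
            memcpy(out[n].probe, eq + 1, def_len);
            out[n].probe[def_len] = '\0';
            n++;
        }
        if (eol == NULL)
            break;
        line = eol + 1;
    }
    return n;
}
```

For the sample's `"mybpfprog=__alloc_pages_nodemask\n"`, this yields one entry pairing section `mybpfprog` with probe point `__alloc_pages_nodemask`. The loader would then create the kprobe and attach the program from that section.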
Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.
On 5/4/15 9:41 PM, Wang Nan wrote:
> That's great. Could you please add a description of 'llvm -s' to your README or comments? It cost me a lot of time to dump eBPF instructions, so I decided to add it to perf...

sure. it's just -filetype=asm flag to llc instead of -filetype=obj.
Eventually it will work as normal 'clang -S file.c' when a few more
llvm commits are accepted upstream.

>>> My colleague He Kuang is working on variable accessing. Probing inside function body and accessing its local variable will be supported like this:
>>>
>>> SEC("config") char _prog_config[] = "prog: func_name:1234 vara=localvara"
>>> int prog(struct pt_regs *ctx, unsigned long vara) {
>>>   // vara is the value of localvara of function func_name
>>> }
>>
>> that would be great. I'm not sure though how you can achieve that
>> without changing C front-end ?
>
> It's not very difficult. He is trying to generate the loader of vara as a prologue, then paste the prologue and the main eBPF program together. From the viewpoint of the kernel bpf verifier, there is only one param (ctx); the prologue program fetches the value of vara and puts it into the proper register, then the main program runs.

got it. I think that's much cleaner than what I was proposing.
The only question is then:
char _prog_config[] = "prog: func_name:1234 vara=localvara"
should actually be something like "... r2=localvara", right?
since prologue would need to assign into r2.
Otherwise I don't see where you find out about 'vara' inside
compiled bpf code.

Would be nice if this can be done without debug info.
Like in tracex2_kern.c I have:
SEC("kprobe/sys_write")
int bpf_prog(struct pt_regs *ctx)
{
        long wr_size = ctx->dx; /* arg3 */
with your prolog generator the above can be rewritten as:
SEC("kprobe/sys_write")
int bpf_prog(struct pt_regs *unused, int fd, char *buf, size_t wr_size)
{
        /* use wr_size */
that will improve ease of use a lot.
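To make the prologue idea concrete, here is a minimal sketch for Alexei's sys_write example. It assumes x86_64, where the first three syscall arguments arrive in di, si and dx. The prologue loads those pt_regs fields through r1 (ctx) into r2 to r4, so a main program compiled as `bpf_prog(ctx, fd, buf, wr_size)` finds them where the BPF calling convention expects them. The verifier still sees a single ctx argument. `struct bpf_insn` and the opcode constants mirror `<linux/bpf.h>`. `struct pt_regs_x86_64`, `ldx_ctx` and `paste_prologue` are illustrative names, and this is not He Kuang's actual implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as struct bpf_insn in <linux/bpf.h>. */
struct bpf_insn {
    uint8_t code;       /* opcode */
    uint8_t dst_reg:4;  /* destination register */
    uint8_t src_reg:4;  /* source register */
    int16_t off;        /* signed offset */
    int32_t imm;        /* signed immediate */
};

/* Opcode bits, values as in <linux/bpf_common.h> and <linux/bpf.h>. */
#define BPF_LDX 0x01
#define BPF_MEM 0x60
#define BPF_DW  0x18

/* Field order of x86_64 struct pt_regs (assumption: x86_64 only). */
struct pt_regs_x86_64 {
    uint64_t r15, r14, r13, r12, bp, bx, r11, r10, r9, r8;
    uint64_t ax, cx, dx, si, di, orig_ax, ip, cs, flags, sp, ss;
};

/* dst = *(u64 *)(r1 + off), i.e. read one pt_regs field through ctx. */
static struct bpf_insn ldx_ctx(uint8_t dst, int16_t off)
{
    struct bpf_insn insn = {
        .code = BPF_LDX | BPF_MEM | BPF_DW,
        .dst_reg = dst,
        .src_reg = 1,   /* r1 holds ctx on program entry */
        .off = off,
        .imm = 0,
    };
    return insn;
}

/* Emit a prologue that moves syscall args 1..3 (di, si, dx on x86_64)
 * into r2..r4, followed by main_prog. Returns the total instruction
 * count, or -1 if 'out' cannot hold everything. */
static int paste_prologue(const struct bpf_insn *main_prog, int main_len,
                          struct bpf_insn *out, int out_cap)
{
    const int16_t arg_off[] = {
        (int16_t)offsetof(struct pt_regs_x86_64, di),
        (int16_t)offsetof(struct pt_regs_x86_64, si),
        (int16_t)offsetof(struct pt_regs_x86_64, dx),
    };
    const int nargs = (int)(sizeof(arg_off) / sizeof(arg_off[0]));

    if (main_len < 0 || nargs + main_len > out_cap)
        return -1;
    for (int i = 0; i < nargs; i++)
        out[i] = ldx_ctx((uint8_t)(2 + i), arg_off[i]);
    memcpy(&out[nargs], main_prog, (size_t)main_len * sizeof(*main_prog));
    return nargs + main_len;
}
```

BPF jumps are PC-relative, so branches inside the main program need no adjustment after pasting. Anything addressed by absolute instruction index, such as map relocations recorded against the ELF section, must be resolved before pasting or shifted by the prologue length. Probes inside a function body would still need the variable list from the config string, as discussed in this thread.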
> Another possible solution is to change the protocol between kprobe and the eBPF program: make kprobes call fetchers and pass the results to the eBPF program as a second param (grouping all varx together). A prologue may still be needed in this case to load each param into the correct register.

you mean grouping varx together in some other struct and embedding it
together with pt_regs into new container struct?
doable, but your first approach is quite clean already. why bother.

> Could you please consider the following problem?
>
> We find there are several __lock_page() calls that last a very long time. We are going to find the corresponding __unlock_page() so we can know what blocks them. We want to insert eBPF programs before io_schedule() in __lock_page(), and also add an eBPF program on the entry of __unlock_page(), so we can compute the interval between page locking and unlocking. If the time is longer than a threshold, let __unlock_page() trigger a perf sampling so we get its call stack. In this case, the eBPF program acts as a trace filter.

all makes sense and your use case fits quite well into existing
bpf+kprobe model.
I'm not sure why you're calling a 'problem'.
A problem of how to display that call stack from perf?
I would say it fits better as a sample than a trace.
If you dump it as a trace, it won't be easy to decipher, whereas
if you treat it as a sampling event, perf record/report facility
will pick it up and display nicely.
Meaning that one sample == lock_page/unlock_page latency > N.
Then existing sample_callchain flag should work.
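The lock_page/unlock_page use case maps onto the return-value convention of kprobe-attached BPF programs: 0 filters the event out and 1 stores it in the ring buffer for perf. The C below is a user-space model of that filtering logic, not an eBPF program. A small fixed array stands in for a BPF hash map keyed by page address, and timestamps are passed in rather than read with bpf_ktime_get_ns(). The 0.5 s threshold comes from the thread; `on_lock_wait`, `on_unlock` and `find_slot` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define SLOTS 64                          /* stand-in for max_entries */
#define LATENCY_THRESHOLD_NS 500000000ULL /* 0.5 s, as in the thread */

struct lock_slot {
    uint64_t page;     /* key: page address; 0 marks a free slot */
    uint64_t start_ns; /* value: when the lock wait began */
};

static struct lock_slot lock_map[SLOTS];

/* Linear lookup; find_slot(0) returns the first free slot. */
static struct lock_slot *find_slot(uint64_t page)
{
    for (int i = 0; i < SLOTS; i++)
        if (lock_map[i].page == page)
            return &lock_map[i];
    return NULL;
}

/* Models the program placed before io_schedule() in __lock_page(). */
static void on_lock_wait(uint64_t page, uint64_t now_ns)
{
    struct lock_slot *s;

    if (page == 0)
        return;
    s = find_slot(page);
    if (s == NULL)
        s = find_slot(0);
    if (s == NULL)
        return;           /* table full: drop, like a failed map update */
    s->page = page;
    s->start_ns = now_ns;
}

/* Models the program at __unlock_page() entry. Returns 1 (keep the
 * event) if the page was waited on longer than the threshold, else 0
 * (filter it out). */
static int on_unlock(uint64_t page, uint64_t now_ns)
{
    struct lock_slot *s;
    uint64_t latency;

    if (page == 0)
        return 0;
    s = find_slot(page);
    if (s == NULL)
        return 0;         /* no recorded wait for this page */
    latency = now_ns - s->start_ns;
    s->page = 0;          /* like bpf_map_delete_elem() */
    return latency > LATENCY_THRESHOLD_NS ? 1 : 0;
}
```

In a real program, `page` would come from the probed function's argument. Every event that returns 1 would reach perf, where Alexei suggests treating it as a sample so that perf record/report can show the call stack.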
Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.
On 2015/5/5 11:02, Alexei Starovoitov wrote:
> On 5/2/15 12:19 AM, Wang Nan wrote:
>>
>> I'd like to do the following work in the next version (based on my experience
>> and feedback):
>>
>> 1. Safely clean up kprobe points after unloading;
>>
>> 2. Add subcommand space to 'perf bpf'. Current stuff should reside in
>> 'perf bpf load';
>>
>> 3. Extract eBPF ELF walking and collecting work to a separate library to
>> help others.
>
> that's a good list.
>
> The feedback for existing patches:
> patch 18 - since we're creating a generic library for bpf elf
> loading it would be great to do the following:
> first try to load with
>   attr.log_buf = NULL;
>   attr.log_level = 0;
> then only if it fails, allocate a buffer and repeat with log_level = 1.
> The reason is that it's better to have fast program loading by default
> without any verbosity emitted by verifier.
>

Will do.

> patch 19 - I think it's unnecessary.
> verifier already dumps it. so this '-v' flag can be translated into
> verbose loading.
> There is also .s output from llvm for those interested in bpf asm
> instructions.
>

That's great. Could you please add a description of 'llvm -s' to your README or comments? It cost me a lot of time to dump eBPF instructions, so I decided to add it to perf...

>> My colleague He Kuang is working on variable accessing. Probing inside
>> function body and accessing its local variable will be supported like this:
>>
>> SEC("config") char _prog_config[] = "prog: func_name:1234 vara=localvara"
>> int prog(struct pt_regs *ctx, unsigned long vara) {
>>   // vara is the value of localvara of function func_name
>> }
>
> that would be great. I'm not sure though how you can achieve that
> without changing C front-end ?

It's not very difficult. He is trying to generate the loader of vara as a prologue, then paste the prologue and the main eBPF program together. From the viewpoint of the kernel bpf verifier, there is only one param (ctx); the prologue program fetches the value of vara and puts it into the proper register, then the main program runs.

Another possible solution is to change the protocol between kprobe and the eBPF program: make kprobes call fetchers and pass the results to the eBPF program as a second param (grouping all varx together). A prologue may still be needed in this case to load each param into the correct register.

> This type of feature is exactly the reason why we're trying to write
> our front-end.
> In general there are two ways to achieve 'restricted C' language:
> - start from clang and chop all features that are not supported.
>   I believe Jovi already tried to do that and it became very difficult.
> - start from simple front-end with minimal C and add all things one by
>   one. That's what we're trying to do. So far we have most of normal
>   syntax. The problem with our approach is that we cannot easily do
>   #include of existing .h files. We're working on that.
>   It's too experimental still. Maybe we will drop it and go back to
>   first approach.
>
> The reason for extending front-end is your example above, where
> the user would want to write:
>   int prog(struct pt_regs *ctx, unsigned long vara) {
>     // use 'vara'
> but generated BPF should have only one 'ctx' pointer, since that's
> the only thing that verifier will accept. bpf/core and JITs expect
> only one argument, etc.
> So this func definition + 'vara' access can be compiled as ctx->si
> (if vara is actually in register) or
> bpf_probe_read(ctx->bp + magic_offset_from_debug_info)
> (if vara is on stack)
> or it can also be done via store_trace_args() but that will be slower
> and requires hacking kernel, whereas ctx->... style is pure userspace.
> Lots of things to brainstorm. So please share your progress soon.
>
>> And I want to discuss with you and others about:
>>
>> 1. How to make eBPF output its tracing and aggregation results to perf?
>
> well, the output of bpf program is a data stored in maps. Each program
> needs a corresponding user space reader/printer/sorter of this data.
> Like tracex2 prints this data as histogram and tracex3 prints it as
> heatmap. We can standardize a few things like this, but ideally we
> keep it up to user. So that user can write single file that consists
> of functions that are loaded as bpf into kernel and other functions
> that are executed in user space. llvm can jit first set to bpf and
> second set to x86. That's distant future though.
> So far samples/bpf/ style of kern.c+user.c worked quite well.
>

Well, it looks like in your design BPF programs mostly produce aggregated results. On my side, I want them to also act as trace filters.

Could you please consider the following problem?

We find there are several __lock_page() calls that last a very long time. We are going to find the corresponding __unlock_page() so we can know what blocks them. We want to insert eBPF programs before io_schedule() in __lock_page(), and also add an eBPF program on the entry of __unlock_page(), so we can compute the interval between page locking and unlocking. If the time is longer than a threshold, let
Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.
On 5/2/15 12:19 AM, Wang Nan wrote:
> I'd like to do the following work in the next version (based on my experience and feedback):
>
> 1. Safely clean up kprobe points after unloading;
>
> 2. Add subcommand space to 'perf bpf'. Current stuff should reside in 'perf bpf load';
>
> 3. Extract eBPF ELF walking and collecting work to a separate library to help others.

that's a good list.

The feedback for existing patches:
patch 18 - since we're creating a generic library for bpf elf
loading it would be great to do the following:
first try to load with
  attr.log_buf = NULL;
  attr.log_level = 0;
then only if it fails, allocate a buffer and repeat with log_level = 1.
The reason is that it's better to have fast program loading by default
without any verbosity emitted by verifier.

patch 19 - I think it's unnecessary.
verifier already dumps it. so this '-v' flag can be translated into
verbose loading.
There is also .s output from llvm for those interested in bpf asm
instructions.

> My colleague He Kuang is working on variable accessing. Probing inside function body and accessing its local variable will be supported like this:
>
> SEC("config") char _prog_config[] = "prog: func_name:1234 vara=localvara"
> int prog(struct pt_regs *ctx, unsigned long vara) {
>   // vara is the value of localvara of function func_name
> }

that would be great. I'm not sure though how you can achieve that
without changing C front-end ?
This type of feature is exactly the reason why we're trying to write
our front-end.
In general there are two ways to achieve 'restricted C' language:
- start from clang and chop all features that are not supported.
  I believe Jovi already tried to do that and it became very difficult.
- start from simple front-end with minimal C and add all things one by
  one. That's what we're trying to do. So far we have most of normal
  syntax. The problem with our approach is that we cannot easily do
  #include of existing .h files. We're working on that.
  It's too experimental still. Maybe we will drop it and go back to
  first approach.

The reason for extending front-end is your example above, where
the user would want to write:
  int prog(struct pt_regs *ctx, unsigned long vara) {
    // use 'vara'
but generated BPF should have only one 'ctx' pointer, since that's
the only thing that verifier will accept. bpf/core and JITs expect
only one argument, etc.
So this func definition + 'vara' access can be compiled as ctx->si
(if vara is actually in register) or
bpf_probe_read(ctx->bp + magic_offset_from_debug_info)
(if vara is on stack)
or it can also be done via store_trace_args() but that will be slower
and requires hacking kernel, whereas ctx->... style is pure userspace.
Lots of things to brainstorm. So please share your progress soon.

> And I want to discuss with you and others about:
>
> 1. How to make eBPF output its tracing and aggregation results to perf?

well, the output of bpf program is a data stored in maps. Each program
needs a corresponding user space reader/printer/sorter of this data.
Like tracex2 prints this data as histogram and tracex3 prints it as
heatmap. We can standardize a few things like this, but ideally we
keep it up to user. So that user can write single file that consists
of functions that are loaded as bpf into kernel and other functions
that are executed in user space. llvm can jit first set to bpf and
second set to x86. That's distant future though.
So far samples/bpf/ style of kern.c+user.c worked quite well.
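Alexei's patch 18 suggestion is to load quietly first and retry with a verifier log only on failure. It can be sketched as below. The real call would be bpf(BPF_PROG_LOAD, ...) with the `log_buf`, `log_size` and `log_level` fields of `union bpf_attr` filled in. Here that call is hidden behind a hypothetical `prog_load_fn` callback, and `stub_load` is a fake loader so the pattern can be exercised without a kernel. The 64 KiB buffer size is an arbitrary choice for this sketch.

```c
#include <stdio.h>
#include <stdlib.h>

#define VERIFIER_LOG_SIZE (64 * 1024) /* arbitrary size for this sketch */

/* Hypothetical wrapper around bpf(BPF_PROG_LOAD, ...): returns a program
 * fd (>= 0) or a negative value; writes the verifier log into log_buf
 * only when log_level > 0. */
typedef int (*prog_load_fn)(void *prog, char *log_buf, unsigned int log_size,
                            unsigned int log_level);

/* Try a quiet load first; only if it fails, retry with the verifier log
 * enabled. On failure, *log_out receives the malloc'ed log (caller frees
 * it); otherwise *log_out is NULL. */
static int load_prog_lazy_log(prog_load_fn load, void *prog, char **log_out)
{
    char *log_buf;
    int fd;

    *log_out = NULL;
    fd = load(prog, NULL, 0, 0);          /* fast path: no verifier output */
    if (fd >= 0)
        return fd;

    log_buf = malloc(VERIFIER_LOG_SIZE);
    if (log_buf == NULL)
        return fd;                        /* keep the original error */
    log_buf[0] = '\0';
    fd = load(prog, log_buf, VERIFIER_LOG_SIZE, 1);
    if (fd >= 0) {                        /* succeeded on retry: log unused */
        free(log_buf);
        return fd;
    }
    *log_out = log_buf;
    return fd;
}

/* Fake loader for trying the pattern without a kernel: 'prog' points to
 * an int that is nonzero if the pretend verifier accepts the program. */
static int stub_load_calls;

static int stub_load(void *prog, char *log_buf, unsigned int log_size,
                     unsigned int log_level)
{
    stub_load_calls++;
    if (*(int *)prog)
        return 3;                         /* pretend program fd */
    if (log_level > 0 && log_buf != NULL && log_size > 0)
        snprintf(log_buf, log_size, "stub verifier: program rejected\n");
    return -1;
}
```

An accepted program costs one load call and produces no log. A rejected one costs two calls and hands the log back to the caller, which is where a '-v' style option could print it.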
Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.
On 2015/5/1 12:37, Alexei Starovoitov wrote:
> On 4/30/15 3:52 AM, Wang Nan wrote:
>> This series of patches is an approach to integrate eBPF with perf.
>> After applying these patches, users are allowed to use the following
>> command to load an eBPF program compiled by LLVM into the kernel:
>>
>>   $ perf bpf sample_bpf.o
>>
>> The required BPF code and the loading procedure are similar to Alexei
>> Starovoitov's libbpf in samples/bpf, with the following exceptions:
>>
>> 1. Section names are not required to start with 'kprobe/' or
>> 'kretprobe/'. Without such a prefix, any valid C var name can be used.
>>
>> 2. A 'config' section can be provided to describe the position and
>> arguments of a program. Syntax is identical to 'perf probe'.
>>
>> An example is pasted at the bottom of this cover letter. In that
>> example, mybpfprog is configured by string in config section, and will
>> be probed at __alloc_pages_nodemask. sample_bpf.o is generated using:
>>
>>   $ $CLANG -I/usr/src/kernel/include -I/usr/src/kernel/usr/include -D__KERNEL__ \
>>         -Wno-unused-value -Wno-pointer-sign \
>>         -O2 -emit-llvm -c sample_bpf.c -o -| $LLC -march=bpf -filetype=obj -o \
>>         sample_bpf.o
>>
>> And can be loaded using:
>>
>>   $ perf bpf sample_bpf.o
>>
>> This series has only limited functionality. The following items are on
>> the todo list:
>>
>> 1. Unprobe kprobe stubs used by eBPF programs when unloading;
>>
>> 2. Enable eBPF programs to access local variables and arguments
>> by utilizing debuginfo;
>>
>> 3. Output data in perf way.
>>
>> In this series:
>>
>> Patch 1/22 is a bugfix in perf probe, and may be triggered by following
>> patches;
>>
>> Patch 2-3/22 are preparation, adding required macros and syscall
>> definitions into the perf source tree.
>>
>> Patch 4/22 adds the 'perf bpf' command.
>>
>> Patch 5-20/22 are the labor work, which parse the ELF object file, collect
>> information in object files, create maps needed by programs, link maps
>> and programs, config programs and load programs into the kernel.
>>
>> Patch 21-22/22 are the final work. Patch 21 creates kprobe points which
>> will be used by eBPF programs; patch 22 creates perf file descriptors
>> and then attaches eBPF programs to them.
>
> I'm very happy to see this work. Looks great. All patches are impressively
> clean and concise.
> I think patches 1-3 are ready to go into Arnaldo's perf tree right now.
> 4 and above are clean and polished, but probably need to go into
> some 'staging area' like a branch of perf tree, since I suspect the
> user interface may change a little in the coming months and it's
> a bit too early to expose 'perf bpf' command to every perf user ?
> Arnaldo, Ingo, what do you guys think should be the arrangement?
> 'perf/bpf' branch in acme/linux.git or in tip/tip.git ?
>
> I have few comments for patches 18 and 19, but let's figure out
> the long term plan first.
>

Hi,

Very happy to see your and others' positive feedback. I'm also interested in how these patches can be merged into mainline. I'd like to keep sending patches to this list so you can all see my improvements, and let the maintainers decide whether and how to merge them.

Now we are also doing some backporting work to make the eBPF patches work on our older kernels. After that we will use eBPF in our profiling work. I think this RFC series is only a starting point for us to use eBPF. Further requirements should arise during our real work.

I'd like to do the following work in the next version (based on my experience and feedback):

1. Safely clean up kprobe points after unloading;

2. Add subcommand space to 'perf bpf'. Current stuff should reside in 'perf bpf load';

3. Extract eBPF ELF walking and collecting work to a separate library to help others.

My colleague He Kuang is working on variable accessing. Probing inside function body and accessing its local variable will be supported like this:

SEC("config") char _prog_config[] = "prog: func_name:1234 vara=localvara"
int prog(struct pt_regs *ctx, unsigned long vara) {
  // vara is the value of localvara of function func_name
}

And I want to discuss with you and others about:

1. How to make eBPF output its tracing and aggregation results to perf?

Thanks!

> We're also working in parallel on creating a new tracing language
> that together with llvm backend can be used as a single shared library
> that can be called from perf or anything else.
> Then clang compilation step will be gone and programs can be run
> as 'perf bpf file.bpf'.
>
> Thanks!
>
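On the question of getting results out, Alexei's answer elsewhere in this thread is that the data lives in maps and a per-program user-space reader prints it, as in the kern.c+user.c pairs in samples/bpf. Below is a minimal sketch of such a reader for a power-of-two histogram, in the spirit of tracex2. It assumes the per-bucket counts have already been copied out of the map, for example with BPF_MAP_LOOKUP_ELEM. `log2_bucket` and `print_log2_hist` are illustrative names, not code from samples/bpf.

```c
#include <stdint.h>
#include <stdio.h>

/* floor(log2(v)); values 0 and 1 both land in bucket 0. */
static int log2_bucket(uint64_t v)
{
    int b = 0;

    while (v >>= 1)
        b++;
    return b;
}

/* Print one line per non-empty bucket, with a bar scaled so the largest
 * bucket gets 'width' stars. Returns the number of lines printed. */
static int print_log2_hist(const uint64_t *counts, int nbuckets, int width,
                           FILE *out)
{
    uint64_t max = 0;
    int lines = 0;

    if (nbuckets > 64)
        nbuckets = 64;    /* a 64-bit value spans at most 64 buckets */
    for (int i = 0; i < nbuckets; i++)
        if (counts[i] > max)
            max = counts[i];
    if (max == 0)
        return 0;

    for (int i = 0; i < nbuckets; i++) {
        /* bucket 0 covers [0, 1]; bucket i covers [2^i, 2^(i+1) - 1] */
        unsigned long long lo = i == 0 ? 0 : 1ULL << i;
        unsigned long long hi = i == 0 ? 1 : (lo << 1) - 1; /* wraps to max at i = 63 */
        int stars = (int)((double)counts[i] * width / (double)max);

        if (counts[i] == 0)
            continue;
        fprintf(out, "%20llu -> %-20llu : %-10llu |", lo, hi,
                (unsigned long long)counts[i]);
        for (int s = 0; s < stars; s++)
            fputc('*', out);
        fprintf(out, "%*s|\n", width - stars, "");
        lines++;
    }
    return lines;
}
```

On the kernel side, the matching program would increment the counter at index `log2_bucket(value)` in an array or hash map. Keeping the bucketing rule identical on both sides is what lets the reader print meaningful ranges.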
Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.
* Alexei Starovoitov wrote:

> On 5/1/15 4:49 AM, Ingo Molnar wrote:
> >
> >* Peter Zijlstra wrote:
> >
> >>On Thu, Apr 30, 2015 at 09:37:04PM -0700, Alexei Starovoitov wrote:
> >>>We're also working in parallel on creating a new tracing language
> >>>that together with llvm backend can be used as a single shared library
> >>>that can be called from perf or anything else.
> >>
> >>Gurgh, please also keep normal C an option. [...]
> >
> >Absolutely, I thought there was agreement on that when we started
> >merging all these eBPF patches ...
> >
> >It might be 'simplified C', in that it's just a subset of C, but
> >please don't re-do something that works, especially if it's used to
> >instrument a kernel that is written in C ...
>
> of course. When did I say that I like 'bird' languages? :)
> By 'new' I mean that we're not trying to port existing tracing
> language like dtrace, systemtap, ktap to bpf.
> I believe dtrace would have been more widely adopted if it didn't
> invent new syntax. We're trying to do a C -- with ++.
> It's C where non-supported things like 'for', 'while', 'asm' are
> actively error-ed by front-end and additional syntactic
> sugar for things that too ugly/verbose in vanilla C are added.

Ok, sounds very good to me!

Thanks,

        Ingo
Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.
On 5/1/15 4:49 AM, Ingo Molnar wrote:
>
> * Peter Zijlstra wrote:
>
>> On Thu, Apr 30, 2015 at 09:37:04PM -0700, Alexei Starovoitov wrote:
>>> We're also working in parallel on creating a new tracing language
>>> that together with llvm backend can be used as a single shared library
>>> that can be called from perf or anything else.
>>
>> Gurgh, please also keep normal C an option. [...]
>
> Absolutely, I thought there was agreement on that when we started
> merging all these eBPF patches ...
>
> It might be 'simplified C', in that it's just a subset of C, but
> please don't re-do something that works, especially if it's used to
> instrument a kernel that is written in C ...

of course. When did I say that I like 'bird' languages? :)
By 'new' I mean that we're not trying to port an existing tracing
language like dtrace, systemtap or ktap to bpf.
I believe dtrace would have been more widely adopted if it didn't
invent new syntax. We're trying to do a C -- with ++.
It's C where non-supported things like 'for', 'while', 'asm' are
actively error-ed by the front-end, and additional syntactic
sugar is added for things that are too ugly/verbose in vanilla C.
Full C via clang will always be there, but it looks like it will have
a hard time, because full C has way too many things that are not
supported by the bpf VM. We're trying to act on feedback that new users
are giving us. It's much more friendly when the compiler tells you right
away that 'for' is not supported, instead of the kernel verifier saying
that there is a loop.
A new thing is map[key] access, which is equivalent to
bpf_map_lookup(map, key) followed by bpf_map_update(map, key, zero_value)
if the lookup doesn't find an element. It turned out that for tracing
use cases it's a very common pattern.

Anyway, back to my original question about the long-term home:
where should the 'perf/bpf' branch land?
I also agree on leaving room for additional arguments after 'perf bpf'.
In particular, I'd like to see 'perf bpf list'.
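The map[key] sugar described above expands to a lookup, an insert-zero-on-miss, and a second lookup. That sequence can be modelled in user space. This is a minimal sketch under stated assumptions. `struct toy_map`, `toy_lookup()`, `toy_update()` and `map_elem()` are hypothetical stand-ins, built on a small array scanned linearly. They are not the in-kernel `bpf_map_lookup_elem()`/`bpf_map_update_elem()` helpers, and they are not real hash maps.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAP_SLOTS 16   /* hypothetical fixed capacity */

/* Toy stand-in for a BPF hash map: u32 key -> u64 value. */
struct toy_map {
    bool               used[TOY_MAP_SLOTS];
    unsigned int       keys[TOY_MAP_SLOTS];
    unsigned long long vals[TOY_MAP_SLOTS];
};

/* Lookup stand-in: pointer to the stored value, or NULL if absent. */
static unsigned long long *toy_lookup(struct toy_map *m, unsigned int key)
{
    for (size_t i = 0; i < TOY_MAP_SLOTS; i++)
        if (m->used[i] && m->keys[i] == key)
            return &m->vals[i];
    return NULL;
}

/* Update stand-in: overwrite or insert a copy of val; -1 if full. */
static int toy_update(struct toy_map *m, unsigned int key,
                      unsigned long long val)
{
    unsigned long long *p = toy_lookup(m, key);
    if (p) {
        *p = val;
        return 0;
    }
    for (size_t i = 0; i < TOY_MAP_SLOTS; i++) {
        if (!m->used[i]) {
            m->used[i] = true;
            m->keys[i] = key;
            m->vals[i] = val;
            return 0;
        }
    }
    return -1;
}

/* What "map[key]" would expand to: look up; on a miss insert a zero
 * value and look up again. Returns NULL only when the map is full. */
static unsigned long long *map_elem(struct toy_map *m, unsigned int key)
{
    unsigned long long *p = toy_lookup(m, key);
    if (!p) {
        const unsigned long long zero_value = 0;
        if (toy_update(m, key, zero_value) < 0)
            return NULL;
        p = toy_lookup(m, key);   /* update copied the value into the map */
    }
    return p;
}
```

Usage, counting events per key: `(*map_elem(&m, key))++;`. The second lookup is needed because the update stores a copy of `zero_value`. The pointer to increment is the one that points into the map.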
Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.
* Peter Zijlstra wrote:

> On Thu, Apr 30, 2015 at 09:37:04PM -0700, Alexei Starovoitov wrote:
> > We're also working in parallel on creating a new tracing language
> > that together with llvm backend can be used as a single shared library
> > that can be called from perf or anything else.
>
> Gurgh, please also keep normal C an option. [...]

Absolutely, I thought there was agreement on that when we started
merging all these eBPF patches ...

It might be 'simplified C', in that it's just a subset of C, but
please don't re-do something that works, especially if it's used to
instrument a kernel that is written in C ...

Thanks,

        Ingo
Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.
On Thu, Apr 30, 2015 at 09:37:04PM -0700, Alexei Starovoitov wrote:
> We're also working in parallel on creating a new tracing language
> that together with llvm backend can be used as a single shared library
> that can be called from perf or anything else.

Gurgh, please also keep normal C an option. I never can remember how all
these fancy arse special case languages work and it's just too annoying /
frustrating to have to figure out how to do simple things every time you
need it to just work.
Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.
* Wang Nan wrote:

> This series of patches is an approach to integrate eBPF with perf.

Very promising!

> After applying these patches, users are allowed to use following
> command to load eBPF program compiled by LLVM into kernel:
>
> $ perf bpf sample_bpf.o

Please leave room for a subcommand space, as most other perf subcommands
do, i.e. make it something like:

        perf bpf add sample_bpf.o

or:

        perf bpf run sample_bpf.o

or:

        perf bpf load sample_bpf.o

So that future subcommands can be added:

        perf bpf list
        perf bpf del <...>
        perf bpf enable <...>
        perf bpf disable <...>
        perf bpf help

and 'perf bpf' should probably display the help page by default, so if
curious perf users stumble into the new subcommand, they get a basic idea
about what it's all about.

I.e. you should think about the high level subcommand space right now,
and pick proper names - because this is going to determine the future
usability and the success of the tool to a large degree.

Thanks,

        Ingo
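Below is a rough C sketch of the subcommand layout suggested above. A table maps subcommand names to handlers. Running 'perf bpf' with no subcommand, or with an unknown one, falls back to the help text. The handler names, the table and the return conventions are hypothetical. perf's real builtin commands use perf's own option-parsing helpers, which are not reproduced here. Only load, list and help are wired up, and del/enable/disable would be further table entries.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical handler type: argc/argv *after* the subcommand name. */
typedef int (*bpf_cmd_fn)(int argc, const char **argv);

static int cmd_help(int argc, const char **argv);

static int cmd_load(int argc, const char **argv)
{
    (void)argv;
    /* A real handler would open, parse and load each object file. */
    return argc > 0 ? 0 : -1;   /* needs at least one object file */
}

static int cmd_list(int argc, const char **argv)
{
    (void)argc;
    (void)argv;
    return 0;                   /* would print loaded programs */
}

static const struct bpf_cmd {
    const char *name;
    bpf_cmd_fn  fn;
    const char *summary;
} bpf_cmds[] = {
    { "load", cmd_load, "load eBPF object files"  },
    { "list", cmd_list, "list loaded programs"    },
    { "help", cmd_help, "show this help"          },
};

#define NR_BPF_CMDS (sizeof(bpf_cmds) / sizeof(bpf_cmds[0]))

static int cmd_help(int argc, const char **argv)
{
    (void)argc;
    (void)argv;
    puts("usage: perf bpf <command> [<args>]");
    for (size_t i = 0; i < NR_BPF_CMDS; i++)
        printf("   %-8s %s\n", bpf_cmds[i].name, bpf_cmds[i].summary);
    return 0;
}

/* argv[0] is the subcommand; with none given, show the help page. */
static int cmd_bpf(int argc, const char **argv)
{
    if (argc < 1)
        return cmd_help(0, NULL);
    for (size_t i = 0; i < NR_BPF_CMDS; i++)
        if (strcmp(argv[0], bpf_cmds[i].name) == 0)
            return bpf_cmds[i].fn(argc - 1, argv + 1);
    fprintf(stderr, "perf bpf: unknown command '%s'\n", argv[0]);
    cmd_help(0, NULL);
    return -1;
}
```

Keeping the names in one table means the help page and the dispatcher cannot drift apart when new subcommands are added.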
Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.
On 4/30/15 3:52 AM, Wang Nan wrote:
> This series of patches is an approach to integrate eBPF with perf.
> After applying these patches, users are allowed to use following command
> to load eBPF program compiled by LLVM into kernel:
>
>   $ perf bpf sample_bpf.o
>
> The required BPF code and the loading procedure is similar to Alexei
> Starovoitov's libbpf in sample/bpf, with following exceptions:
>
>  1. The section name are not required leading with 'kprobe/' or
>     'kretprobe/'. Without such leading, any valid C var name can be use.
>
>  2. A 'config' section can be provided to describe the position and
>     arguments of a program. Syntax is identical to 'perf probe'.
>
> An example is pasted at the bottom of this cover letter. In that example,
> mybpfprog is configured by string in config section, and will be probed
> at __alloc_pages_nodemask.
>
> sample_bpf.o is generated using:
>
>   $ $CLANG -I/usr/src/kernel/include -I/usr/src/kernel/usr/include -D__KERNEL__ \
>         -Wno-unused-value -Wno-pointer-sign \
>         -O2 -emit-llvm -c sample_bpf.c -o -| $LLC -march=bpf -filetype=obj -o \
>         sample_bpf.o
>
> And can be loaded using:
>
>   $ perf bpf sample_bpf.o
>
> This series is only a limited functional. Following works are on the
> todo list:
>
>  1. Unprobe kprobe stubs used by eBPF programs when unloading;
>
>  2. Enable eBPF programs to access local variables and arguments
>     by utilizing debuginfo;
>
>  3. Output data in perf way.
>
> In this series:
>
> Patch 1/22 is a bugfix in perf probe, and may be triggered by following
> patches;
>
> Patch 2-3/22 are preparation, add required macros and syscall definition
> into perf source tree.
>
> Patch 4/22 add 'perf bpf' command.
>
> Patch 5-20/22 are labor works, which parse the ELF object file, collect
> information in object files, create maps needed by programs, link map
> and programs, config programs and load programs into kernel.
>
> Patch 21-22/22 are the final work. Patch 21 creates kprobe points which
> will be used by eBPF programs, patch 22 creates perf file descriptors
> then attach eBPF programs on them.

I'm very happy to see this work. Looks great. All patches are impressively
clean and concise.
I think patches 1-3 are ready to go into Arnaldo's perf tree right now.
4 and above are clean and polished, but probably need to go into
some 'staging area' like a branch of perf tree, since I suspect the
user interface may change a little in the coming months and it's
a bit too early to expose 'perf bpf' command to every perf user ?
Arnaldo, Ingo, what do you guys think should be the arrangement?
'perf/bpf' branch in acme/linux.git or in tip/tip.git ?

I have few comments for patches 18 and 19, but let's figure out
the long term plan first.

We're also working in parallel on creating a new tracing language
that together with llvm backend can be used as a single shared library
that can be called from perf or anything else.
Then clang compilation step will be gone and programs can be run
as 'perf bpf file.bpf'.

Thanks!
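The cover letter relaxes the section-name rule: the 'kprobe/' and 'kretprobe/' prefixes become optional, and a 'config' section is added. A loader could check names along the lines of the hypothetical classifier below. It accepts the legacy prefixes used by the samples/bpf loader, treats `config` specially, and otherwise accepts any valid C identifier as a program section. The enum, the function names and the special-casing of `maps` and `license` are assumptions for illustration, not the series' actual code. In the samples/bpf programs, those two section names hold the map definitions and the license string.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

enum sec_kind {
    SEC_UNKNOWN,       /* not something the loader cares about      */
    SEC_CONFIG,        /* probe positions and arguments for programs */
    SEC_DATA,          /* map definitions or license string          */
    SEC_PROG,          /* program in a freely named section          */
    SEC_LEGACY_PROG,   /* "kprobe/..." or "kretprobe/..." section     */
};

/* True if s is a non-empty C identifier: [A-Za-z_][A-Za-z0-9_]* */
static bool is_c_ident(const char *s)
{
    if (!*s || !(isalpha((unsigned char)*s) || *s == '_'))
        return false;
    for (s++; *s; s++)
        if (!(isalnum((unsigned char)*s) || *s == '_'))
            return false;
    return true;
}

static enum sec_kind classify_section(const char *name)
{
    if (strcmp(name, "config") == 0)
        return SEC_CONFIG;
    if (strcmp(name, "maps") == 0 || strcmp(name, "license") == 0)
        return SEC_DATA;
    if (strncmp(name, "kprobe/", 7) == 0 ||
        strncmp(name, "kretprobe/", 10) == 0)
        return SEC_LEGACY_PROG;
    /* Relaxed rule: any valid C identifier names a program section. */
    return is_c_ident(name) ? SEC_PROG : SEC_UNKNOWN;
}
```

The reserved names are checked before the identifier test because "config", "maps" and "license" are themselves valid identifiers.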