Re: [RFC] perf uprobe: Skip prologue if program compiled without optimization
On Saturday 30 July 2016 08:34 AM, Masami Hiramatsu wrote:
> On Thu, 28 Jul 2016 20:01:51 +0530
> Ravi Bangoria wrote:
>
>> Function prologue prepares stack and registers before executing function
>> logic. When target program is compiled without optimization, function
>> parameter information is only valid after prologue. When we probe entrypc
>> of the function, and try to record function parameter, it contains
>> garbage value.
>
> Right! :)

Thanks Masami for review. I've sent patch with changes you suggested.
Please review it.

-Ravi
Re: [RFC] perf uprobe: Skip prologue if program compiled without optimization
On Thu, 28 Jul 2016 20:01:51 +0530
Ravi Bangoria wrote:

> Function prologue prepares stack and registers before executing function
> logic. When target program is compiled without optimization, function
> parameter information is only valid after prologue. When we probe entrypc
> of the function, and try to record function parameter, it contains
> garbage value.

Right! :)

> For example,
>   $ vim test.c
>     #include <stdio.h>
>
>     void foo(int i)
>     {
>        printf("i: %d\n", i);
>     }
>
>     int main()
>     {
>       foo(42);
>       return 0;
>     }
>
>   $ gcc -g test.c -o test
>   $ objdump -dl test | less
>     foo():
>     /home/ravi/test.c:4
>       400536: 55                    push   %rbp
>       400537: 48 89 e5              mov    %rsp,%rbp
>       40053a: 48 83 ec 10           sub    $0x10,%rsp
>       40053e: 89 7d fc              mov    %edi,-0x4(%rbp)
>     /home/ravi/test.c:5
>       400541: 8b 45 fc              mov    -0x4(%rbp),%eax
>     ...
>     ...
>     main():
>     /home/ravi/test.c:9
>       400558: 55                    push   %rbp
>       400559: 48 89 e5              mov    %rsp,%rbp
>     /home/ravi/test.c:10
>       40055c: bf 2a 00 00 00        mov    $0x2a,%edi
>       400561: e8 d0 ff ff ff        callq  400536 <foo>
>     /home/ravi/test.c:11
>
>   $ ./perf probe -x ./test 'foo i'
>   $ cat /sys/kernel/debug/tracing/uprobe_events
>     p:probe_test/foo /home/ravi/test:0x0536 i=-12(%sp):s32
>
>   $ ./perf record -e probe_test:foo ./test
>   $ ./perf script
>     test 5778 [001] 4918.562027: probe_test:foo: (400536) i=0
>
> Here variable 'i' is passed via stack which is pushed on stack at
> 0x40053e. But we are probing at 0x400536.
>
> To resolve this issues, we need to probe on next instruction after
> prologue. gdb and systemtap also does same thing. I've implemented
> this patch based on approach systemtap has used.
>
> After applying patch:
>
>   $ ./perf probe -x ./test 'foo i'
>   $ cat /sys/kernel/debug/tracing/uprobe_events
>     p:probe_test/foo /home/ravi/test:0x0541 i=-4(%bp):s32
>
>   $ ./perf record -e probe_test:foo ./test
>   $ ./perf script
>     test 6300 [001] 5877.879327: probe_test:foo: (400541) i=42

It is great!
And I think we also should give a notice message for users about
skipping prologue, so that they can understand why the probe point
is not on the function entry address ;)

> No need to skip prologue for optimized case since debug info is correct
> for each instructions for -O2 -g. For more details please visit:
>         https://bugzilla.redhat.com/show_bug.cgi?id=612253#c6
>
> Signed-off-by: Ravi Bangoria
> ---
>  tools/perf/util/probe-finder.c | 156 +
>  1 file changed, 156 insertions(+)
>
> diff --git a/tools/perf/util/probe-finder.c b/tools/perf/util/probe-finder.c
> index f2d9ff0..a788b9c2 100644
> --- a/tools/perf/util/probe-finder.c
> +++ b/tools/perf/util/probe-finder.c
> @@ -892,6 +892,161 @@ static int find_probe_point_lazy(Dwarf_Die *sp_die, struct probe_finder *pf)
>  	return die_walk_lines(sp_die, probe_point_lazy_walker, pf);
>  }
>
> +static bool var_has_loclist(Dwarf_Die *die)
> +{
> +	Dwarf_Attribute loc;
> +	int tag = dwarf_tag(die);
> +
> +	if (tag != DW_TAG_formal_parameter &&
> +	    tag != DW_TAG_variable)
> +		return false;
> +
> +	return (dwarf_attr_integrate(die, DW_AT_location, &loc) &&
> +		dwarf_whatform(&loc) == DW_FORM_sec_offset);
> +}
> +
> +/*
> + * For any object in given CU whose DW_AT_location is a location list,
> + * target program is compiled with optimization.
> + */

OK, anyway if it has loclist, we can ensure the variable is available at
that address.

> +static bool optimized_target(Dwarf_Die *die)
> +{
> +	Dwarf_Die tmp_die;
> +
> +	if (var_has_loclist(die))
> +		return true;
> +
> +	if (!dwarf_child(die, &tmp_die) && optimized_target(&tmp_die))
> +		return true;
> +
> +	if (!dwarf_siblingof(die, &tmp_die) && optimized_target(&tmp_die))
> +		return true;
> +
> +	return false;
> +}
> +
> +static bool get_entrypc_idx(Dwarf_Lines *lines, unsigned long nr_lines,
> +			    Dwarf_Addr pf_addr, unsigned long *entrypc_idx)
> +{
> +	unsigned long i;
> +	Dwarf_Addr addr;
> +
> +	for (i = 0; i < nr_lines; i++) {
> +		if (dwarf_lineaddr(dwarf_onesrcline(lines, i), &addr))
> +			return false;
> +
> +		if (addr == pf_addr) {
> +			*entrypc_idx = i;
> +			return true;
> +		}
> +	}
> +	return false;
> +}
> +
> +static bool get_postprologue_addr(unsigned long entrypc_idx,
> +				  Dwarf_Lines *lines,
> +				  unsigned long nr_lines,
[RFC] perf uprobe: Skip prologue if program compiled without optimization
Function prologue prepares stack and registers before executing function
logic. When target program is compiled without optimization, function
parameter information is only valid after prologue. When we probe entrypc
of the function, and try to record function parameter, it contains
garbage value.

For example,
  $ vim test.c
    #include <stdio.h>

    void foo(int i)
    {
       printf("i: %d\n", i);
    }

    int main()
    {
      foo(42);
      return 0;
    }

  $ gcc -g test.c -o test
  $ objdump -dl test | less
    foo():
    /home/ravi/test.c:4
      400536: 55                    push   %rbp
      400537: 48 89 e5              mov    %rsp,%rbp
      40053a: 48 83 ec 10           sub    $0x10,%rsp
      40053e: 89 7d fc              mov    %edi,-0x4(%rbp)
    /home/ravi/test.c:5
      400541: 8b 45 fc              mov    -0x4(%rbp),%eax
    ...
    ...
    main():
    /home/ravi/test.c:9
      400558: 55                    push   %rbp
      400559: 48 89 e5              mov    %rsp,%rbp
    /home/ravi/test.c:10
      40055c: bf 2a 00 00 00        mov    $0x2a,%edi
      400561: e8 d0 ff ff ff        callq  400536 <foo>
    /home/ravi/test.c:11

  $ ./perf probe -x ./test 'foo i'
  $ cat /sys/kernel/debug/tracing/uprobe_events
    p:probe_test/foo /home/ravi/test:0x0536 i=-12(%sp):s32

  $ ./perf record -e probe_test:foo ./test
  $ ./perf script
    test 5778 [001] 4918.562027: probe_test:foo: (400536) i=0

Here variable 'i' is passed via stack which is pushed on stack at
0x40053e. But we are probing at 0x400536.

To resolve this issues, we need to probe on next instruction after
prologue. gdb and systemtap also does same thing. I've implemented
this patch based on approach systemtap has used.

After applying patch:

  $ ./perf probe -x ./test 'foo i'
  $ cat /sys/kernel/debug/tracing/uprobe_events
    p:probe_test/foo /home/ravi/test:0x0541 i=-4(%bp):s32

  $ ./perf record -e probe_test:foo ./test
  $ ./perf script
    test 6300 [001] 5877.879327: probe_test:foo: (400541) i=42

No need to skip prologue for optimized case since debug info is correct
for each instructions for -O2 -g. For more details please visit:
        https://bugzilla.redhat.com/show_bug.cgi?id=612253#c6

Signed-off-by: Ravi Bangoria
---
 tools/perf/util/probe-finder.c | 156 +
 1 file changed, 156 insertions(+)

diff --git a/tools/perf/util/probe-finder.c b/tools/perf/util/probe-finder.c
index f2d9ff0..a788b9c2 100644
--- a/tools/perf/util/probe-finder.c
+++ b/tools/perf/util/probe-finder.c
@@ -892,6 +892,161 @@ static int find_probe_point_lazy(Dwarf_Die *sp_die, struct probe_finder *pf)
 	return die_walk_lines(sp_die, probe_point_lazy_walker, pf);
 }

+static bool var_has_loclist(Dwarf_Die *die)
+{
+	Dwarf_Attribute loc;
+	int tag = dwarf_tag(die);
+
+	if (tag != DW_TAG_formal_parameter &&
+	    tag != DW_TAG_variable)
+		return false;
+
+	return (dwarf_attr_integrate(die, DW_AT_location, &loc) &&
+		dwarf_whatform(&loc) == DW_FORM_sec_offset);
+}
+
+/*
+ * For any object in given CU whose DW_AT_location is a location list,
+ * target program is compiled with optimization.
+ */
+static bool optimized_target(Dwarf_Die *die)
+{
+	Dwarf_Die tmp_die;
+
+	if (var_has_loclist(die))
+		return true;
+
+	if (!dwarf_child(die, &tmp_die) && optimized_target(&tmp_die))
+		return true;
+
+	if (!dwarf_siblingof(die, &tmp_die) && optimized_target(&tmp_die))
+		return true;
+
+	return false;
+}
+
+static bool get_entrypc_idx(Dwarf_Lines *lines, unsigned long nr_lines,
+			    Dwarf_Addr pf_addr, unsigned long *entrypc_idx)
+{
+	unsigned long i;
+	Dwarf_Addr addr;
+
+	for (i = 0; i < nr_lines; i++) {
+		if (dwarf_lineaddr(dwarf_onesrcline(lines, i), &addr))
+			return false;
+
+		if (addr == pf_addr) {
+			*entrypc_idx = i;
+			return true;
+		}
+	}
+	return false;
+}
+
+static bool get_postprologue_addr(unsigned long entrypc_idx,
+				  Dwarf_Lines *lines,
+				  unsigned long nr_lines,
+				  Dwarf_Addr highpc,
+				  Dwarf_Addr *postprologue_addr)
+{
+	unsigned long i;
+	int entrypc_lno, lno;
+	Dwarf_Line *line;
+	Dwarf_Addr addr;
+	bool p_end;
+
+	/* entrypc_lno is actual source line number */
+	line = dwarf_onesrcline(lines, entrypc_idx);
+	if (dwarf_lineno(line, &entrypc_lno))
+		return false;
+
+	for (i = entrypc_idx; i < nr_lines; i++) {
+		line = dwarf_onesrcline(lines, i);
+
+		if (dwarf_lineaddr(line, &addr) ||
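The patch text above is cut off partway through get_postprologue_addr(), so the sketch below is not the patch's logic. It only illustrates one common line-table heuristic for finding a post-prologue address: starting from the function's entry row, take the first later row whose source line differs and whose address is still below the function's highpc. The struct line_row type, the foo_rows table and post_prologue_addr() are hypothetical stand-ins that avoid libdw so the example compiles on its own. The two foo() rows come from the objdump listing in the commit message; the highpc value 0x400558 is an assumption (the start of main()).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one DWARF line-table row (not a libdw type). */
struct line_row {
	uint64_t addr;
	int lineno;
};

/* Rows modelled on the objdump listing: foo() at lines 4-5, main() at 9-10. */
static const struct line_row foo_rows[] = {
	{ 0x400536, 4 },	/* foo() entry, prologue starts here */
	{ 0x400541, 5 },	/* first instruction of foo()'s body */
	{ 0x400558, 9 },	/* main() entry */
	{ 0x40055c, 10 },
};

/*
 * Find the first row after the entry row whose line number differs from
 * the entry's line number, without leaving the function [entry, highpc).
 * Returns true and stores the address in *out on success.
 */
static bool post_prologue_addr(const struct line_row *rows, size_t nr_rows,
			       uint64_t entry_addr, uint64_t highpc,
			       uint64_t *out)
{
	size_t i, entry_idx = nr_rows;

	/* Locate the row that corresponds to the function entry address. */
	for (i = 0; i < nr_rows; i++) {
		if (rows[i].addr == entry_addr) {
			entry_idx = i;
			break;
		}
	}
	if (entry_idx == nr_rows)
		return false;

	for (i = entry_idx + 1; i < nr_rows; i++) {
		if (rows[i].addr >= highpc)
			break;		/* walked past the end of the function */
		if (rows[i].lineno != rows[entry_idx].lineno) {
			*out = rows[i].addr;
			return true;
		}
	}
	return false;
}
```

Called with entry_addr 0x400536 and the assumed highpc 0x400558, this picks 0x400541, which matches the probe address shown in the "After applying patch" output. A real implementation would read the rows through libdw and would also have to cope with functions that have no second line row.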