Re: [PATCH 1/8] perf, tools: Support handling complete branch stacks as histograms
> > I considered this. For this example it doesn't make much difference
> > because the functions are so small.
> >
> > But for anything larger I really need the line numbers to make
> > sense of it.
> >
> > So I prefer to keep them. I'll look into some easy switch
> > to turn them off though.
>
> Oh, I'm not just removing line numbers - it also removed duplicates (f1
> and f2). But having both from/to entries, I'm not sure it's worth tho..

The duplicate removal is only for the LBRs. I think it's a sensible
default there.

What would be nice in the future would be to add some kind of annotation
support to the hist entries, so we could say "removed N iterations" and
display it (and possibly some more LBR information, like mispredict
rate). But that's more work and definitely would be a new patchkit.

> >> > +		if (sort__has_parent && !*parent &&
> >> > +		    symbol__match_regex(al.sym, &parent_regex))
> >> > +			*parent = al.sym;
> >> > +		else if (have_ignore_callees && root_al &&
> >> > +		  symbol__match_regex(al.sym, &ignore_callees_regex)) {
> >> > +			/* Treat this symbol as the root,
> >> > +			   forgetting its callees. */
> >> > +			*root_al = al;
> >> > +			callchain_cursor_reset(&callchain_cursor);
> >> > +		}
> >> > +		if (!symbol_conf.use_callchain)
> >> > +			return -EINVAL;
> >>
> >> This check already went away.
> >>
> >> And, to remove duplicates, I think we need to check last callchain
> >> cursor node wrt the callchain_param.key here.
> >
> > I don't understand the comment. I'm not modifying anything
> > that has been already added to the callchain. Just things
> > to be added in the future. So why would I need to check
> > or change the cursor?
>
> But didn't you already do it (with ips[first_call]) to remove overlaps
> between LBR and normal callchain?

I added the LBRs, but i didn't add the normal call entries yet.

> >> Also, by comparing 'from' address, I'd expect you add the from address
> >> alone but you add both of 'from' and 'to'. Do we really need to do
> >> that?
> >
> > Adding from and to makes it much clearer to the user what happens,
> > especially with conditional branches, so they can follow the
> > control flow.
>
> But it could be confusing too - esp. when it moves from LBR to normal
> callchains? Hmm.. maybe we can print them bit differently.

Yes that would be nice.

> >> And the first address saved in normal callchain is address of the
> >> function itself so it might be 'to' you need to check if sampled before
> >> any branch in a function.
> >
> > I'm checking against the CALL, not the target.
>
> Yeah, but I'm afraid that it'd always fail to find a match.

It seems to work as far as I can tell.

> >> > +		err = add_callchain_ip(machine, thread,
> >> > +				       parent, root_al,
> >> > +				       -1, be[i].from);
> >> > +		if (err == -EINVAL)
> >> > +			break;
> >> > +		if (err)
> >> > +			return err;
> >> > +	}
> >> > +	chain_nr -= nr;
> >>
> >> I'm not sure this line is needed.
> >
> > Without that i could exceed the limit.
>
> What limit?

The limit of max history entries.

-Andi

--
a...@linux.intel.com -- Speaking for myself only.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/8] perf, tools: Support handling complete branch stacks as histograms
Hi Andi,

On Wed, 12 Nov 2014 00:31:53 +0100, Andi Kleen wrote:
> Sorry for the long delay. Just revisiting that.
>
> On Wed, Oct 22, 2014 at 10:03:51AM +0900, Namhyung Kim wrote:
>> >            |          |          f2 tcall.c:5
>> >            |          |          f1 tcall.c:12
>> >            |          |          f1 tcall.c:12
>> >            |          |          f2 tcall.c:7
>> >            |          |          f2 tcall.c:5
>> >            |          |          f1 tcall.c:11
>>
>> I think it'd be better if it just prints function names as normal
>> callchain does (and optionally srcline with a switch) and duplicates
>> removed like below:
>>
>>     54.91%  tcall.c:6  [.] f2                      tcall
>>             |
>>             |--65.53%-- f2 tcall.c:5
>>             |          |
>>             |          |--70.83%-- f1
>>             |          |          main
>>             |          |          f1
>>             |          |          f2
>>             |          |          f1
>>             |          |          f2
>
> I considered this. For this example it doesn't make much difference
> because the functions are so small.
>
> But for anything larger I really need the line numbers to make
> sense of it.
>
> So I prefer to keep them. I'll look into some easy switch
> to turn them off though.

Oh, I'm not just removing line numbers - it also removed duplicates (f1
and f2). But having both from/to entries, I'm not sure it's worth tho..

>
>
>> > +		if (sort__has_parent && !*parent &&
>> > +		    symbol__match_regex(al.sym, &parent_regex))
>> > +			*parent = al.sym;
>> > +		else if (have_ignore_callees && root_al &&
>> > +		  symbol__match_regex(al.sym, &ignore_callees_regex)) {
>> > +			/* Treat this symbol as the root,
>> > +			   forgetting its callees. */
>> > +			*root_al = al;
>> > +			callchain_cursor_reset(&callchain_cursor);
>> > +		}
>> > +		if (!symbol_conf.use_callchain)
>> > +			return -EINVAL;
>>
>> This check already went away.
>>
>> And, to remove duplicates, I think we need to check last callchain
>> cursor node wrt the callchain_param.key here.
>
> I don't understand the comment. I'm not modifying anything
> that has been already added to the callchain. Just things
> to be added in the future. So why would I need to check
> or change the cursor?

But didn't you already do it (with ips[first_call]) to remove overlaps
between LBR and normal callchain?

>
>>
>> Also, by comparing 'from' address, I'd expect you add the from address
>> alone but you add both of 'from' and 'to'. Do we really need to do
>> that?
>
> Adding from and to makes it much clearer to the user what happens,
> especially with conditional branches, so they can follow the
> control flow.

But it could be confusing too - esp. when it moves from LBR to normal
callchains? Hmm.. maybe we can print them bit differently.

>
>
>> And the first address saved in normal callchain is address of the
>> function itself so it might be 'to' you need to check if sampled before
>> any branch in a function.
>
> I'm checking against the CALL, not the target.

Yeah, but I'm afraid that it'd always fail to find a match.

>
>>
>> > +		} else
>> > +			be[i] = branch->entries[branch->nr - i - 1];
>> > +	}
>> > +
>> > +	nr = remove_loops(be, nr);
>> > +
>> > +	for (i = 0; i < nr; i++) {
>> > +		err = add_callchain_ip(machine, thread, parent,
>> > +				       root_al,
>> > +				       -1, be[i].to);
>> > +		if (!err)
>> > +			err = add_callchain_ip(machine, thread,
>> > +					       parent, root_al,
>> > +					       -1, be[i].from);
>> > +		if (err == -EINVAL)
>> > +			break;
>> > +		if (err)
>> > +			return err;
>> > +	}
>> > +	chain_nr -= nr;
>>
>> I'm not sure this line is needed.
>
> Without that i could exceed the limit.

What limit? Let's say there's a callchains and LBR records below..

  callchain: f1 <- f2 <- f3 <- f4 <- f5
  LBR (f1<-f1) <- (f1<-f2)

So two entries are matched, we have nr = 2, first_call = 2 and chain_nr
= 5 right? So IIUC above code will print callchains like this:

  - f1 - f1 - f1 - f2 - f3

while I expect below (with duplicates for now):

  - f1 - f1 - f1 - f2 - f3 - f4 - f5

Do I miss something?

Thanks,
Namhyung
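The arithmetic in the example above can be checked with a small standalone C sketch. This is not the code from the patch: `build_chain`, `struct branch` and `subtract_lbr` are made-up names, and the sketch assumes the regular callchain entries are appended by a loop that runs from `first_call` up to `chain_nr`, which is not part of the quoted hunk. Under that assumption, applying `chain_nr -= nr` gives the shorter chain and skipping it gives the longer one.

```c
#include <assert.h>

/* Hypothetical stand-in for a branch entry; symbols are plain strings here. */
struct branch {
	const char *from;
	const char *to;
};

/*
 * Append the LBR entries ('to', then 'from', as in the quoted loop),
 * followed by the regular callchain starting at first_call.
 * ASSUMPTION: the regular entries are added by a loop bounded by
 * chain_nr; that loop is not shown in the thread.
 * Returns the number of entries written to out.
 */
static int build_chain(const struct branch *be, int nr,
		       const char **ips, int chain_nr, int first_call,
		       int subtract_lbr, const char **out, int out_max)
{
	int n = 0, i;

	assert(first_call <= chain_nr);

	for (i = 0; i < nr && n + 2 <= out_max; i++) {
		out[n++] = be[i].to;
		out[n++] = be[i].from;
	}

	if (subtract_lbr)
		chain_nr -= nr;	/* the disputed "chain_nr -= nr" */

	for (i = first_call; i < chain_nr && n < out_max; i++)
		out[n++] = ips[i];

	return n;
}
```

With the callchain f1..f5, the two LBR entries (f1<-f1) and (f1<-f2), and first_call = 2, calling this with `subtract_lbr` set yields five entries (f1 f1 f1 f2 f3); with it cleared it yields seven (f1 f1 f1 f2 f3 f4 f5).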
Re: [PATCH 1/8] perf, tools: Support handling complete branch stacks as histograms
Sorry for the long delay. Just revisiting that.

On Wed, Oct 22, 2014 at 10:03:51AM +0900, Namhyung Kim wrote:
> >            |          |          f2 tcall.c:5
> >            |          |          f1 tcall.c:12
> >            |          |          f1 tcall.c:12
> >            |          |          f2 tcall.c:7
> >            |          |          f2 tcall.c:5
> >            |          |          f1 tcall.c:11
>
> I think it'd be better if it just prints function names as normal
> callchain does (and optionally srcline with a switch) and duplicates
> removed like below:
>
>     54.91%  tcall.c:6  [.] f2                      tcall
>             |
>             |--65.53%-- f2 tcall.c:5
>             |          |
>             |          |--70.83%-- f1
>             |          |          main
>             |          |          f1
>             |          |          f2
>             |          |          f1
>             |          |          f2

I considered this. For this example it doesn't make much difference
because the functions are so small.

But for anything larger I really need the line numbers to make
sense of it.

So I prefer to keep them. I'll look into some easy switch
to turn them off though.

> > +		if (sort__has_parent && !*parent &&
> > +		    symbol__match_regex(al.sym, &parent_regex))
> > +			*parent = al.sym;
> > +		else if (have_ignore_callees && root_al &&
> > +		  symbol__match_regex(al.sym, &ignore_callees_regex)) {
> > +			/* Treat this symbol as the root,
> > +			   forgetting its callees. */
> > +			*root_al = al;
> > +			callchain_cursor_reset(&callchain_cursor);
> > +		}
> > +		if (!symbol_conf.use_callchain)
> > +			return -EINVAL;
>
> This check already went away.
>
> And, to remove duplicates, I think we need to check last callchain
> cursor node wrt the callchain_param.key here.

I don't understand the comment. I'm not modifying anything
that has been already added to the callchain. Just things
to be added in the future. So why would I need to check
or change the cursor?

>
> Also, by comparing 'from' address, I'd expect you add the from address
> alone but you add both of 'from' and 'to'. Do we really need to do
> that?

Adding from and to makes it much clearer to the user what happens,
especially with conditional branches, so they can follow the
control flow.

> And the first address saved in normal callchain is address of the
> function itself so it might be 'to' you need to check if sampled before
> any branch in a function.

I'm checking against the CALL, not the target.

>
> > +		} else
> > +			be[i] = branch->entries[branch->nr - i - 1];
> > +	}
> > +
> > +	nr = remove_loops(be, nr);
> > +
> > +	for (i = 0; i < nr; i++) {
> > +		err = add_callchain_ip(machine, thread, parent,
> > +				       root_al,
> > +				       -1, be[i].to);
> > +		if (!err)
> > +			err = add_callchain_ip(machine, thread,
> > +					       parent, root_al,
> > +					       -1, be[i].from);
> > +		if (err == -EINVAL)
> > +			break;
> > +		if (err)
> > +			return err;
> > +	}
> > +	chain_nr -= nr;
>
> I'm not sure this line is needed.

Without that i could exceed the limit.

-Andi
Re: [PATCH 1/8] perf, tools: Support handling complete branch stacks as histograms
Hi Andi,

On Fri, 26 Sep 2014 16:37:09 -0700, Andi Kleen wrote:
> From: Andi Kleen <a...@linux.intel.com>
>
> Currently branch stacks can be only shown as edge histograms for
> individual branches. I never found this display particularly useful.
>
> This implements an alternative mode that creates histograms over complete
> branch traces, instead of individual branches, similar to how normal
> callgraphs are handled. This is done by putting it in
> front of the normal callgraph and then using the normal callgraph
> histogram infrastructure to unify them.
>
> This way in complex functions we can understand the control flow
> that lead to a particular sample, and may even see some control
> flow in the caller for short functions.
>
> Example (simplified, of course for such simple code this
> is usually not needed):
>
> tcall.c:
>
> volatile a = 1, b = 10, c;
>
> __attribute__((noinline)) f2()
> {
> 	c = a / b;
> }
>
> __attribute__((noinline)) f1()
> {
> 	f2();
> 	f2();
> }
> main()
> {
> 	int i;
> 	for (i = 0; i < 100; i++)
> 		f1();
> }
>
> % perf record -b -g ./tsrc/tcall
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.044 MB perf.data (~1923 samples) ]
> % perf report --branch-history
> ...
>     54.91%  tcall.c:6  [.] f2                      tcall
>             |
>             |--65.53%-- f2 tcall.c:5
>             |          |
>             |          |--70.83%-- f1 tcall.c:11
>             |          |          f1 tcall.c:10
>             |          |          main tcall.c:18
>             |          |          main tcall.c:18
>             |          |          main tcall.c:17
>             |          |          main tcall.c:17
>             |          |          f1 tcall.c:13
>             |          |          f1 tcall.c:13
>             |          |          f2 tcall.c:7
>             |          |          f2 tcall.c:5
>             |          |          f1 tcall.c:12
>             |          |          f1 tcall.c:12
>             |          |          f2 tcall.c:7
>             |          |          f2 tcall.c:5
>             |          |          f1 tcall.c:11

I think it'd be better if it just prints function names as normal
callchain does (and optionally srcline with a switch) and duplicates
removed like below:

    54.91%  tcall.c:6  [.] f2                      tcall
            |
            |--65.53%-- f2 tcall.c:5
            |          |
            |          |--70.83%-- f1
            |          |          main
            |          |          f1
            |          |          f2
            |          |          f1
            |          |          f2

[SNIP]
> +static int add_callchain_ip(struct machine *machine,
> +			    struct thread *thread,
> +			    struct symbol **parent,
> +			    struct addr_location *root_al,
> +			    int cpumode,
> +			    u64 ip)
> +{
> +	struct addr_location al;
> +
> +	al.filtered = 0;
> +	al.sym = NULL;
> +	if (cpumode == -1)
> +		thread__find_cpumode_addr_location(thread, machine, MAP__FUNCTION, ip, &al);
> +	else
> +		thread__find_addr_location(thread, machine, cpumode, MAP__FUNCTION,
> +					   ip, &al);
> +	if (al.sym != NULL) {
> +		if (sort__has_parent && !*parent &&
> +		    symbol__match_regex(al.sym, &parent_regex))
> +			*parent = al.sym;
> +		else if (have_ignore_callees && root_al &&
> +		  symbol__match_regex(al.sym, &ignore_callees_regex)) {
> +			/* Treat this symbol as the root,
> +			   forgetting its callees. */
> +			*root_al = al;
> +			callchain_cursor_reset(&callchain_cursor);
> +		}
> +		if (!symbol_conf.use_callchain)
> +			return -EINVAL;

This check already went away.

And, to remove duplicates, I think we need to check last callchain
cursor node wrt the callchain_param.key here.

> +	}
> +
> +	return callchain_cursor_append(&callchain_cursor, ip, al.map, al.sym);
> +}
> +
> +#define CHASHSZ 127
> +#define CHASHBITS 7
> +#define NO_ENTRY 0xff
> +
> +#define PERF_MAX_BRANCH_DEPTH 127
> +
> +/* Remove loops. */
> +static int remove_loops(struct branch_entry *l, int nr)
> +{
> +	int i, j, off;
> +	unsigned char chash[CHASHSZ];
> +	memset(chash, NO_ENTRY, sizeof(chash));
> +
> +	BUG_ON(nr >= 256);
> +	for (i = 0; i < nr; i++) {
> +		int h = hash_64(l[i].from, CHASHBITS) % CHASHSZ;
> +
> +		/* no collision handling for now */
> +		if (chash[h] == NO_ENTRY) {
> +			chash[h] = i;
> +		} else if (l[chash[h]].from == l[i].from) {
> +			bool is_loop = true;
> +			/* check if it is a real loop */
> +			off = 0;
> +			for (j = chash[h]; j < i && i + off < nr; j++, off++)
> +
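The idea behind remove_loops() can be illustrated with a simplified standalone sketch. It is not the patch's implementation: the small hash table is replaced by a plain backward scan, only full repeats are dropped, and `struct branch` and `drop_repeated_runs` are hypothetical names. It only shows the principle, namely that when the run of 'from' addresses between an earlier entry and the current one repeats immediately, the repeat is removed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a branch record. */
struct branch {
	uint64_t from;
	uint64_t to;
};

/*
 * Drop immediately repeated runs of branches (loop iterations).
 * Simplified illustration: linear backward search instead of a hash.
 * Returns the new number of entries in l.
 */
static int drop_repeated_runs(struct branch *l, int nr)
{
	int i, j, k;

	assert(nr >= 0);

	for (i = 1; i < nr; i++) {
		int len;
		bool repeats = true;

		/* find the nearest earlier entry with the same 'from' */
		for (j = i - 1; j >= 0; j--)
			if (l[j].from == l[i].from)
				break;
		if (j < 0)
			continue;	/* first time this address is seen */

		len = i - j;
		if (i + len > nr)
			continue;	/* not enough entries for a full repeat */

		/* does the run [j, i) repeat at [i, i + len)? */
		for (k = 0; k < len; k++) {
			if (l[j + k].from != l[i + k].from) {
				repeats = false;
				break;
			}
		}
		if (!repeats)
			continue;

		/* close the gap left by the repeated run */
		memmove(&l[i], &l[i + len], (nr - (i + len)) * sizeof(*l));
		nr -= len;
		i--;	/* re-examine the entry that moved into slot i */
	}
	return nr;
}
```

For a 'from' sequence 1 2 1 2 1 2 3 this leaves 1 2 3, while 1 2 1 3 is left untouched because the second run does not match the first.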
Re: [PATCH 1/8] perf, tools: Support handling complete branch stacks as histograms
On Fri, Sep 26, 2014 at 04:37:09PM -0700, Andi Kleen wrote:
> From: Andi Kleen <a...@linux.intel.com>

SNIP

>
>  struct callchain_list {
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index b2ec38b..8ba32ce 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -12,6 +12,7 @@
>  #include <stdbool.h>
>  #include <symbol/kallsyms.h>
>  #include "unwind.h"
> +#include "linux/hash.h"
>
>  int machine__init(struct machine *machine, const char *root_dir, pid_t pid)
>  {
> @@ -1364,9 +1365,84 @@ struct branch_info *sample__resolve_bstack(struct perf_sample *sample,
>  	return bi;
>  }
>
> +static int add_callchain_ip(struct machine *machine,
> +			    struct thread *thread,
> +			    struct symbol **parent,
> +			    struct addr_location *root_al,
> +			    int cpumode,
> +			    u64 ip)
> +{
> +	struct addr_location al;
> +
> +	al.filtered = 0;
> +	al.sym = NULL;
> +	if (cpumode == -1)
> +		thread__find_cpumode_addr_location(thread, machine, MAP__FUNCTION, ip, &al);
> +	else
> +		thread__find_addr_location(thread, machine, cpumode, MAP__FUNCTION,
> +					   ip, &al);

this cpumode condition is new (wrt below comment)

> +	if (al.sym != NULL) {
> +		if (sort__has_parent && !*parent &&
> +		    symbol__match_regex(al.sym, &parent_regex))
> +			*parent = al.sym;
> +		else if (have_ignore_callees && root_al &&
> +		  symbol__match_regex(al.sym, &ignore_callees_regex)) {
> +			/* Treat this symbol as the root,
> +			   forgetting its callees. */
> +			*root_al = al;
> +			callchain_cursor_reset(&callchain_cursor);
> +		}
> +		if (!symbol_conf.use_callchain)
> +			return -EINVAL;

why is this condition here?

could you please split this change into
  - adding add_callchain_ip function
  - adding more functionality to add_callchain_ip function?

IMO it'd make it cleaner and easier to understand

thanks,
jirka
Re: [PATCH 1/8] perf, tools: Support handling complete branch stacks as histograms
On Fri, Sep 26, 2014 at 04:37:09PM -0700, Andi Kleen wrote:

SNIP

>  	OPT_BOOLEAN('x', "exclude-other", &symbol_conf.exclude_other,
>  		    "Only display entries with parent-match"),
> -	OPT_CALLBACK_DEFAULT('g', "call-graph", &report, "output_type,min_percent[,print_limit],call_order",
> -		     "Display callchains using output_type (graph, flat, fractal, or none) , min percent threshold, optional print limit, callchain order, key (function or address). "
> +	OPT_CALLBACK_DEFAULT('g', "call-graph", &report, "output_type,min_percent[,print_limit],call_order[,branch]",
> +		     "Display callchains using output_type (graph, flat, fractal, or none) , min percent threshold, optional print limit, callchain order, key (function or address), add branches. "
>  		     "Default: fractal,0.5,callee,function", &report_parse_callchain_opt, callchain_default_opt),
>  	OPT_BOOLEAN(0, "children", &symbol_conf.cumulate_callchain,
>  		    "Accumulate callchains of children and show total overhead as well"),
> diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
> index 08f0fbf..265457c 100644
> --- a/tools/perf/util/callchain.c
> +++ b/tools/perf/util/callchain.c
> @@ -61,6 +61,8 @@ parse_callchain_report_opt(const char *arg)
>  			callchain_param.key = CCKEY_FUNCTION;
>  		else if (!strncmp(tok, "address", strlen(tok)))
>  			callchain_param.key = CCKEY_ADDRESS;
> +		else if (!strncmp(tok, "branch", strlen(tok)))
> +			callchain_param.branch_callstack = 1;

this needs to be rebased to latest Namhyung's changes which got in..
could you please rebase on Arnaldo's perf/core?

jirka
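The `strncmp(tok, "branch", strlen(tok))` idiom in the callchain.c hunk compares only as many characters as the user typed, so any prefix of a keyword is accepted. A small standalone sketch of that behaviour follows. It is not perf's parser, which handles more fields; `struct opts`, `is_prefix` and `parse_opts` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum key_kind { KEY_FUNCTION, KEY_ADDRESS };

/* Hypothetical option state, loosely modelled on the fields in the hunk. */
struct opts {
	enum key_kind key;
	bool branch;
};

/* True if tok is a (non-empty) prefix of word, e.g. "br" for "branch". */
static bool is_prefix(const char *tok, const char *word)
{
	return !strncmp(tok, word, strlen(tok));
}

/* Parse a comma-separated option string; unknown tokens are ignored here. */
static void parse_opts(const char *arg, struct opts *o)
{
	char buf[128];
	char *tok;

	assert(arg && o);

	/* strtok() modifies its input, so work on a bounded copy */
	strncpy(buf, arg, sizeof(buf) - 1);
	buf[sizeof(buf) - 1] = '\0';

	o->key = KEY_FUNCTION;
	o->branch = false;

	/* strtok() never returns empty tokens, so strlen(tok) > 0 */
	for (tok = strtok(buf, ","); tok; tok = strtok(NULL, ",")) {
		if (is_prefix(tok, "function"))
			o->key = KEY_FUNCTION;
		else if (is_prefix(tok, "address"))
			o->key = KEY_ADDRESS;
		else if (is_prefix(tok, "branch"))
			o->branch = true;
	}
}
```

One consequence of prefix matching is that the order of the comparisons matters for short abbreviations: a single-letter token is claimed by the first keyword that starts with that letter.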