Re: [PATCHSET 0/4] perf report: Support folded callchain output (v4)
Hi Arnaldo,

On Wed, Nov 04, 2015 at 03:08:58PM -0300, Arnaldo Carvalho de Melo wrote:
> On Thu, Nov 05, 2015 at 12:34:57AM +0900, Namhyung Kim wrote:
> > Hi Arnaldo and Brendan,
> >
> > On Wed, Nov 04, 2015 at 11:51:31AM -0300, Arnaldo Carvalho de Melo wrote:
> > > On Tue, Nov 03, 2015 at 10:02:32PM -0800, Brendan Gregg wrote:
> > > > On Tue, Nov 3, 2015 at 5:54 PM, Namhyung Kim wrote:
> > > > > Ah, makes sense.  So it'd look like
> > > > >
> > > > >   $ perf report --stdio -g folded,count,info -F none -s comm
> > > > >   $ perf report --stdio -g folded,count,info -F none -s pid
> > > > >
> > > > > The output would be
> > > > >
> > > > >   809 swapper-0
> > > > > cpu_bringup_and_idle;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op
> > > >
> > > > Thanks, looks almost right: a couple of minor changes:
> > > >
> > > > 1. If perf already has the precedent of "PID:comm", instead of my
> > > > "comm-PID", then maybe it should use "PID:comm" for perf consistency.
> > > > Doesn't make much difference to me.
>
> > Right.  Actually I'd like to write it that way.. ;-)
>
> Well, those are two pieces of information: "comm" and "pid", so it would
> be nice that we could take this opportunity to remove it, i.e. just
> treat it as any other field and separate it via the designated
> separator, and only show the ones specified.

So do you want to change '-s pid' to print 'PID' part only?

> > > > 2. The second space, delimiting "PID:comm" (or comm) and the stack...
> > > > I'm nervous about using space as a delimiter any more than once, since
> > > > it can also appear in comm (eg, "java main") and frames (eg,
> > > > "JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*,
> > > > Thread*)" -- that's direct from "perf script"!). I'd consider making
> > > > it a semicolon:
>
> The C++ symbol names are the biggest challenge here for a single line in
> CSV ("comma" quoted) record :-\
>
> > Fair enough.
>
> > > > 809 swapper-0;cpu_bringup_and_idle;cpu_startup_entry;...
> > > >
> > > > So the output is "value key", and key is a semicolon delimited stack
> > > > with an optional comm or PID:comm frame at the start.
> > >
> > > Agreed, but then, we can have some sort of default and also be able to,
> > > using -F, specify what are the fields we want, and in which order, and I
> > > liked your suggestion of being able to specify "-F none" and that mean
> > > no hist line to be produced.
> > >
> > > Likewise, the way that each callchain line should be formatted should be
> > > programmable via the command line, via the -g option, no? Then script
> > > writers could use it in a way that doesn't requires further processing,
> > > as Brendan showed.
> >
> > Right.  So '-s <key1>[,<key2>,...] -g info' can control which info is
> > displayed along with the callchains.
>
> So you force the same selection of fields to be used for both the
> hist_entry and the callchains?

Yes.

> And why is that some of the fields will be selected via -s (comm, dso)
> and other fields will be selected via -g (count, this "info" thing)?

Because it affects how hist entries are aggregated..

> Why not be flexible and allow any set of fields to be used in both
> cases, without one being tied to the other?
>
> I.e. instead of:
>
>   -s <key1>[,<key2>,...] -g info
>
> We use:
>
>   -s <key1>[,<key2>,...] -g <keyA>[[,<keyB>],...]

But then we need to aggregate hist entries using all of key1, key2,
keyA, keyB and so on.  Otherwise callchain info with keyA and keyB
might be stale.  If so, we need to group hist entries again using key1
and key2 only for printing hist part.  For example, entries for
(1,2,A,B) and (1,2,C,D) should be shown as single entry for (1,2).

I think this 'info' part is only needed when hist entries are omitted
(i.e. -F none).  If so, no need to bother with new options..

> If one would want to have the same set for both, then yeah, a keyword
> for that would be interesting, reusing your "info":
>
>   -s <key1>[,<key2>,...] -g info
>
> Would mean:
>
>   -s <key1>[,<key2>,...] -g <key1>[[,<key2>],...]
>
> With both ... equal
>
> But "info" is way too vague, perhaps "hist_keys", or something more
> compact, like: "\-s", to reuse the semantic of regular expression groups
> (\1).

I prefer "hist_keys".

> >   $ perf report -s comm,dso -g folded,count,info -F none
> >   809 swapper;[kernel.vmlinux];cpu_bringup_and_idle;cpu_startup_entry;...
>
> > Note that the info part (swapper;[kernel.vmlinux]) is also separated
> > by a semicolon.  But I think it's ok since it's controlled by command
> > line, so script can know how many entries will be.
>
> > > But yeah, the value is the semicolon delimited stack all the way to the
> > > comm/PID:comm if there are more than one or if the user asks it to be
> > > there via a -g keyword, all the other counts/info are just relative to
> > > that, CSV or whatever other delimiter the user asks it to, and space is
> > > not an option, as we know it can appear in the middle of a
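To illustrate the regrouping concern raised in the reply above: if entries were aggregated on all of key1, key2, keyA and keyB, the hist part would need a second pass that merges them on key1 and key2 only. The following is a minimal sketch of that second pass, not perf's implementation. The function name `regroup` and the sample counts 5 and 3 are hypothetical; only the key tuples (1,2,A,B) and (1,2,C,D) come from the message.

```python
from collections import defaultdict


def regroup(entries, num_hist_keys):
    """Merge entries that share their first `num_hist_keys` keys.

    `entries` is an iterable of (keys, count) pairs, where `keys` is a
    tuple such as (key1, key2, keyA, keyB). The counts of entries that
    agree on the leading hist keys are summed.
    """
    totals = defaultdict(int)
    for keys, count in entries:
        totals[tuple(keys[:num_hist_keys])] += count
    return dict(totals)


# Hypothetical counts; the key tuples mirror the example in the message.
entries = [((1, 2, "A", "B"), 5), ((1, 2, "C", "D"), 3)]
merged = regroup(entries, num_hist_keys=2)
print(merged)
```

Both entries collapse into a single (1, 2) entry, which is the extra grouping step the message argues would be needed if the -s and -g key sets were independent.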
Re: [PATCHSET 0/4] perf report: Support folded callchain output (v4)
On Thu, Nov 05, 2015 at 12:34:57AM +0900, Namhyung Kim wrote:
> Hi Arnaldo and Brendan,
>
> On Wed, Nov 04, 2015 at 11:51:31AM -0300, Arnaldo Carvalho de Melo wrote:
> > On Tue, Nov 03, 2015 at 10:02:32PM -0800, Brendan Gregg wrote:
> > > On Tue, Nov 3, 2015 at 5:54 PM, Namhyung Kim wrote:
> > > > Ah, makes sense.  So it'd look like
> > > >
> > > >   $ perf report --stdio -g folded,count,info -F none -s comm
> > > >   $ perf report --stdio -g folded,count,info -F none -s pid
> > > >
> > > > The output would be
> > > >
> > > >   809 swapper-0
> > > > cpu_bringup_and_idle;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op
> > >
> > > Thanks, looks almost right: a couple of minor changes:
> > >
> > > 1. If perf already has the precedent of "PID:comm", instead of my
> > > "comm-PID", then maybe it should use "PID:comm" for perf consistency.
> > > Doesn't make much difference to me.

> Right.  Actually I'd like to write it that way.. ;-)

Well, those are two pieces of information: "comm" and "pid", so it would
be nice that we could take this opportunity to remove it, i.e. just
treat it as any other field and separate it via the designated
separator, and only show the ones specified.

> > > 2. The second space, delimiting "PID:comm" (or comm) and the stack...
> > > I'm nervous about using space as a delimiter any more than once, since
> > > it can also appear in comm (eg, "java main") and frames (eg,
> > > "JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*,
> > > Thread*)" -- that's direct from "perf script"!). I'd consider making
> > > it a semicolon:

The C++ symbol names are the biggest challenge here for a single line in
CSV ("comma" quoted) record :-\

> Fair enough.

> > > 809 swapper-0;cpu_bringup_and_idle;cpu_startup_entry;...
> > >
> > > So the output is "value key", and key is a semicolon delimited stack
> > > with an optional comm or PID:comm frame at the start.
> >
> > Agreed, but then, we can have some sort of default and also be able to,
> > using -F, specify what are the fields we want, and in which order, and I
> > liked your suggestion of being able to specify "-F none" and that mean
> > no hist line to be produced.
> >
> > Likewise, the way that each callchain line should be formatted should be
> > programmable via the command line, via the -g option, no? Then script
> > writers could use it in a way that doesn't requires further processing,
> > as Brendan showed.
>
> Right.  So '-s <key1>[,<key2>,...] -g info' can control which info is
> displayed along with the callchains.

So you force the same selection of fields to be used for both the
hist_entry and the callchains?

And why is that some of the fields will be selected via -s (comm, dso)
and other fields will be selected via -g (count, this "info" thing)?

Why not be flexible and allow any set of fields to be used in both
cases, without one being tied to the other?

I.e. instead of:

  -s <key1>[,<key2>,...] -g info

We use:

  -s <key1>[,<key2>,...] -g <keyA>[[,<keyB>],...]

If one would want to have the same set for both, then yeah, a keyword
for that would be interesting, reusing your "info":

  -s <key1>[,<key2>,...] -g info

Would mean:

  -s <key1>[,<key2>,...] -g <key1>[[,<key2>],...]

With both ... equal

But "info" is way too vague, perhaps "hist_keys", or something more
compact, like: "\-s", to reuse the semantic of regular expression groups
(\1).

>   $ perf report -s comm,dso -g folded,count,info -F none
>   809 swapper;[kernel.vmlinux];cpu_bringup_and_idle;cpu_startup_entry;...

> Note that the info part (swapper;[kernel.vmlinux]) is also separated
> by a semicolon.  But I think it's ok since it's controlled by command
> line, so script can know how many entries will be.

> > But yeah, the value is the semicolon delimited stack all the way to the
> > comm/PID:comm if there are more than one or if the user asks it to be
> > there via a -g keyword, all the other counts/info are just relative to
> > that, CSV or whatever other delimiter the user asks it to, and space is
> > not an option, as we know it can appear in the middle of a COMM:
>
> Yes, I think that we should use a given separator (using -t option)
> instead of hard-coded semicolon.  Although it'd be rare, it seems
> possible to use semicolons in the comm name too.

Well, we can have an option to specify what would be the separator for
the callchains.

- Arnaldo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [PATCHSET 0/4] perf report: Support folded callchain output (v4)
Hi Arnaldo and Brendan,

On Wed, Nov 04, 2015 at 11:51:31AM -0300, Arnaldo Carvalho de Melo wrote:
> On Tue, Nov 03, 2015 at 10:02:32PM -0800, Brendan Gregg wrote:
> > On Tue, Nov 3, 2015 at 5:54 PM, Namhyung Kim wrote:
> > > Ah, makes sense.  So it'd look like
> > >
> > >   $ perf report --stdio -g folded,count,info -F none -s comm
> > >   $ perf report --stdio -g folded,count,info -F none -s pid
> > >
> > > The output would be
> > >
> > >   809 swapper-0
> > > cpu_bringup_and_idle;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op
> >
> > Thanks, looks almost right: a couple of minor changes:
> >
> > 1. If perf already has the precedent of "PID:comm", instead of my
> > "comm-PID", then maybe it should use "PID:comm" for perf consistency.
> > Doesn't make much difference to me.

Right.  Actually I'd like to write it that way.. ;-)

> > 2. The second space, delimiting "PID:comm" (or comm) and the stack...
> > I'm nervous about using space as a delimiter any more than once, since
> > it can also appear in comm (eg, "java main") and frames (eg,
> > "JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*,
> > Thread*)" -- that's direct from "perf script"!). I'd consider making
> > it a semicolon:

Fair enough.

> > 809 swapper-0;cpu_bringup_and_idle;cpu_startup_entry;...
> >
> > So the output is "value key", and key is a semicolon delimited stack
> > with an optional comm or PID:comm frame at the start.
>
> Agreed, but then, we can have some sort of default and also be able to,
> using -F, specify what are the fields we want, and in which order, and I
> liked your suggestion of being able to specify "-F none" and that mean
> no hist line to be produced.
>
> Likewise, the way that each callchain line should be formatted should be
> programmable via the command line, via the -g option, no? Then script
> writers could use it in a way that doesn't requires further processing,
> as Brendan showed.

Right.  So '-s <key1>[,<key2>,...] -g info' can control which info is
displayed along with the callchains.

  $ perf report -s comm,dso -g folded,count,info -F none
  809 swapper;[kernel.vmlinux];cpu_bringup_and_idle;cpu_startup_entry;...

Note that the info part (swapper;[kernel.vmlinux]) is also separated
by a semicolon.  But I think it's ok since it's controlled by command
line, so script can know how many entries will be.

> But yeah, the value is the semicolon delimited stack all the way to the
> comm/PID:comm if there are more than one or if the user asks it to be
> there via a -g keyword, all the other counts/info are just relative to
> that, CSV or whatever other delimiter the user asks it to, and space is
> not an option, as we know it can appear in the middle of a COMM:

Yes, I think that we should use a given separator (using -t option)
instead of hard-coded semicolon.  Although it'd be rare, it seems
possible to use semicolons in the comm name too.

Thanks,
Namhyung

>
> [root@zoo ~]# perf report -s comm | grep '[a-zA-Z] [a-zA-Z]'
> # To display the perf.data header info, please use
> # --header/--header-only options.
> # Total Lost Samples: 0
> # Samples: 164K of event 'cycles:pp'
> # Event count (approx.): 34422160859
>      0.11%  DOM Worker
>      0.10%  JS Helper
>      0.01%  Qt bearer threa
>      0.00%  Socket Thread
>      0.00%  dconf worker
>      0.00%  JS Watchdog
> [root@zoo ~]#
>
> - Arnaldo
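The point that a script "can know how many entries will be" can be sketched as follows: since the number of sort keys is fixed by the -s argument on the command line, a consumer can peel that many leading fields off the key and treat the rest as the stack. This is only an illustration of the proposed output format, not code from the patchset. The function name `split_info_and_stack` and its parameters are hypothetical, and the sample line is the one from the message with the trailing "..." dropped.

```python
def split_info_and_stack(line, num_sort_keys, sep=";"):
    """Split a 'count key' line into (count, info fields, stack frames).

    `num_sort_keys` is the number of keys passed to -s, and `sep` is the
    field separator (a semicolon in the examples from the thread).
    """
    count_text, _, key = line.strip().partition(" ")
    fields = key.split(sep)
    if len(fields) < num_sort_keys:
        raise ValueError("line has fewer fields than sort keys")
    return int(count_text), fields[:num_sort_keys], fields[num_sort_keys:]


# -s comm,dso gives two info fields ahead of the call chain.
line = "809 swapper;[kernel.vmlinux];cpu_bringup_and_idle;cpu_startup_entry"
count, info, stack = split_info_and_stack(line, num_sort_keys=2)
print(count, info, stack)
```

The sketch also shows the weakness raised at the end of the message: if a comm itself contained the separator, the field count would shift and the split would silently go wrong, which is the motivation for a user-selectable separator.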
Re: [PATCHSET 0/4] perf report: Support folded callchain output (v4)
On Tue, Nov 03, 2015 at 10:02:32PM -0800, Brendan Gregg wrote:
> On Tue, Nov 3, 2015 at 5:54 PM, Namhyung Kim wrote:
> > Ah, makes sense.  So it'd look like
> >
> >   $ perf report --stdio -g folded,count,info -F none -s comm
> >   $ perf report --stdio -g folded,count,info -F none -s pid
> >
> > The output would be
> >
> >   809 swapper-0
> > cpu_bringup_and_idle;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op
>
> Thanks, looks almost right: a couple of minor changes:
>
> 1. If perf already has the precedent of "PID:comm", instead of my
> "comm-PID", then maybe it should use "PID:comm" for perf consistency.
> Doesn't make much difference to me.
> 2. The second space, delimiting "PID:comm" (or comm) and the stack...
> I'm nervous about using space as a delimiter any more than once, since
> it can also appear in comm (eg, "java main") and frames (eg,
> "JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*,
> Thread*)" -- that's direct from "perf script"!). I'd consider making
> it a semicolon:
>
> 809 swapper-0;cpu_bringup_and_idle;cpu_startup_entry;...
>
> So the output is "value key", and key is a semicolon delimited stack
> with an optional comm or PID:comm frame at the start.

Agreed, but then, we can have some sort of default and also be able to,
using -F, specify what are the fields we want, and in which order, and I
liked your suggestion of being able to specify "-F none" and that mean
no hist line to be produced.

Likewise, the way that each callchain line should be formatted should be
programmable via the command line, via the -g option, no? Then script
writers could use it in a way that doesn't requires further processing,
as Brendan showed.

But yeah, the value is the semicolon delimited stack all the way to the
comm/PID:comm if there are more than one or if the user asks it to be
there via a -g keyword, all the other counts/info are just relative to
that, CSV or whatever other delimiter the user asks it to, and space is
not an option, as we know it can appear in the middle of a COMM:

[root@zoo ~]# perf report -s comm | grep '[a-zA-Z] [a-zA-Z]'
# To display the perf.data header info, please use
# --header/--header-only options.
# Total Lost Samples: 0
# Samples: 164K of event 'cycles:pp'
# Event count (approx.): 34422160859
     0.11%  DOM Worker
     0.10%  JS Helper
     0.01%  Qt bearer threa
     0.00%  Socket Thread
     0.00%  dconf worker
     0.00%  JS Watchdog
[root@zoo ~]#

- Arnaldo
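A short sketch of why a space can delimit the count but nothing after it: assuming the "value key" line format proposed above (a count, one space, then a semicolon-delimited key), a script can split on the first space only and then on semicolons, so spaces inside a comm or a C++ frame survive. The function name `parse_folded_line` is hypothetical and the count 12 in the second sample is made up; the comm "java main" and the JavaCalls frame are the ones quoted in the thread.

```python
def parse_folded_line(line):
    """Split a 'value key' line into (count, fields).

    Only the first space is treated as a delimiter; everything after it
    is the key, which is split on semicolons.
    """
    count_text, _, key = line.strip().partition(" ")
    return int(count_text), key.split(";")


# Line in the format proposed in the thread (trailing "..." dropped).
simple = parse_folded_line("809 swapper-0;cpu_bringup_and_idle;cpu_startup_entry")
print(simple)

# Hypothetical line: a comm and a frame that both contain spaces.
frame = "JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)"
tricky = parse_folded_line("12 java main;" + frame)
print(tricky)
```

Had the comm and the stack been separated by a second space instead, "java main" could not be told apart from a comm "java" followed by a stack starting with "main".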
Re: [PATCHSET 0/4] perf report: Support folded callchain output (v4)
On Tue, Nov 3, 2015 at 5:54 PM, Namhyung Kim wrote:
> Hi Brendan,
>
> On Tue, Nov 03, 2015 at 01:33:43PM -0800, Brendan Gregg wrote:
>> On Tue, Nov 3, 2015 at 6:40 AM, Arnaldo Carvalho de Melo wrote:
>> > Em Tue, Nov 03, 2015 at 09:52:07PM +0900, Namhyung Kim escreveu:
>> >> Hello,
>> >>
>> >> This is what Brendan requested on the perf-users mailing list [1] to
>> >> support FlameGraphs [2] more efficiently.  This patchset adds a few
>> >> more callchain options to adjust the output for it.
>> >>
>> >>  * changes in v4)
>> >>   - add missing doc update
>> >>   - cleanup/fix callchain value print code
>> >>   - add Acked-by from Brendan and Jiri
>> >
>> > Do those Acked-by stand? Things changed, the values moved from the end
>> > of the line to the start, etc.
>> >
>> [...]
>>
>> I'd Ack this change as it's a useful addition. It doesn't quite
>> address the folded-only output, but it's a step in that direction. I
>> think having the value at the start of a line only makes sense for the
>> perf report output containing the hist summary lines, for consistency.
>
> Right, thanks!
>
>
>>
>> Here's how I'd shuffle the output of this patch (ignore word wrap
>> issues with this email):
>>
>> # ./perf report --stdio -g folded,count,caller -F pid | \
>>     awk '/^ / { n = $1 }
>>          /^[0-9]/ { split(n,a,":"); print a[2] "-" a[1] ";" $2,$1 }'
>> swapper-0;cpu_bringup_and_idle;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op 809
>> swapper-0;xen_start_kernel;x86_64_start_reservations;start_kernel;rest_init;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op 135
>> dd-30551;__GI___libc_read;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;urandom_read;extract_entropy_user;extract_buf;check_events;xen_hypercall_xen_version 63
>> dd-30551;__GI___libc_read;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;urandom_read;extract_entropy_user;extract_buf 54
>> dd-30551;__GI___libc_read;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;urandom_read;extract_entropy_user;extract_buf;memset_erms 3
>> dd-30551;xen_irq_enable_direct_end;check_events;xen_hypercall_xen_version 3
>>
>> So the output is folded stacks, prefixed by comm-PID. Shuffling the
>> summarized output is a lot better than doing a "perf script" dump and
>> re-processing call chains. (Note that since I'm using -F, I didn't
>> need --no-children;
>
> Nope.  The '-F pid' doesn't affect --children.  It doesn't show the
> children overhead column but we still have hist entries for
> (synthesized) children..
>
>   $ perf report --no-children | wc -l
>   998
>
>   $ perf report --no-children -F pid,dso,sym | wc -l
>   998
>
>   $ perf report --children | wc -l
>   3229
>
>   $ perf report --children -F pid,dso,sym | wc -l
>   3202
>
> So I think you still need to use --no-children (or set report.children
> config variable to false) for your script.

Ok, good to know, thanks.

>
>
>> and with "-g count", I didn't need --show-nr-samples.)
>
> Yes, I used -n/--show-nr-samples just to check the number is correct.
>
>
>>
>> I notice the fields (-F) option already has this precedent:
>>
>> - "comm": prints PID:comm
>> - "pid": prints PID
>
> It's opposite: "comm" prints comm, "pid" prints PID:comm. :)

Ah, right, sorry, I'd typed those the wrong way around. :)

>
>
>>
>> If these were added to -g, along with a no-hists, then the two types
>> of folded-only output could be generated using:
>>
>> perf report --stdio -g folded,count,comm,no-hists,caller
>> perf report --stdio -g folded,count,pid,no-hists,caller
>
> As I said, using fields like comm, pid requires to have same keys in
> --sort option.  So it's basically unreliable to use those specific
> field names in the -g option IMHO.  I suggested to use 'info' (yes, it
> needs better name) to print all sort keys.
>
>
>>
>> ... although "no-hists" doesn't hit me as intuitive. How about "-F
>> none" to specify zero columns? ie:
>>
>> perf report --stdio -g folded,count,comm,caller -F none
>> perf report --stdio -g folded,count,pid,caller -F none
>
> Ah, makes sense.  So it'd look like
>
>   $ perf report --stdio -g folded,count,info -F none -s comm
>   $ perf report --stdio -g folded,count,info -F none -s pid
>
> The output would be
>
>   809 swapper-0 cpu_bringup_and_idle;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op
>

Thanks, looks almost right: a couple of minor changes:

1. If perf already has the precedent of "PID:comm", instead of my
"comm-PID", then maybe it should use "PID:comm" for perf consistency.
Doesn't make much difference to me.

2. The second space, delimiting "PID:comm" (or comm) and the stack...
I'm nervous about using space as a delimiter any more than once, since
it can also appear in comm (eg, "java main") and frames (eg,
"JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*,
Thread*)" -- that's direct from "perf script"!). I'd consider
Re: [PATCHSET 0/4] perf report: Support folded callchain output (v4)
Hi Brendan,

On Tue, Nov 03, 2015 at 01:33:43PM -0800, Brendan Gregg wrote:
> On Tue, Nov 3, 2015 at 6:40 AM, Arnaldo Carvalho de Melo wrote:
> > Em Tue, Nov 03, 2015 at 09:52:07PM +0900, Namhyung Kim escreveu:
> >> Hello,
> >>
> >> This is what Brendan requested on the perf-users mailing list [1] to
> >> support FlameGraphs [2] more efficiently.  This patchset adds a few
> >> more callchain options to adjust the output for it.
> >>
> >>  * changes in v4)
> >>   - add missing doc update
> >>   - cleanup/fix callchain value print code
> >>   - add Acked-by from Brendan and Jiri
> >
> > Do those Acked-by stand? Things changed, the values moved from the end
> > of the line to the start, etc.
> >
> [...]
>
> I'd Ack this change as it's a useful addition. It doesn't quite
> address the folded-only output, but it's a step in that direction. I
> think having the value at the start of a line only makes sense for the
> perf report output containing the hist summary lines, for consistency.

Right, thanks!

>
> Here's how I'd shuffle the output of this patch (ignore word wrap
> issues with this email):
>
> # ./perf report --stdio -g folded,count,caller -F pid | \
>     awk '/^ / { n = $1 }
>          /^[0-9]/ { split(n,a,":"); print a[2] "-" a[1] ";" $2,$1 }'
> swapper-0;cpu_bringup_and_idle;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op 809
> swapper-0;xen_start_kernel;x86_64_start_reservations;start_kernel;rest_init;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op 135
> dd-30551;__GI___libc_read;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;urandom_read;extract_entropy_user;extract_buf;check_events;xen_hypercall_xen_version 63
> dd-30551;__GI___libc_read;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;urandom_read;extract_entropy_user;extract_buf 54
> dd-30551;__GI___libc_read;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;urandom_read;extract_entropy_user;extract_buf;memset_erms 3
> dd-30551;xen_irq_enable_direct_end;check_events;xen_hypercall_xen_version 3
>
> So the output is folded stacks, prefixed by comm-PID. Shuffling the
> summarized output is a lot better than doing a "perf script" dump and
> re-processing call chains. (Note that since I'm using -F, I didn't
> need --no-children;

Nope.  The '-F pid' doesn't affect --children.  It doesn't show the
children overhead column but we still have hist entries for
(synthesized) children..

  $ perf report --no-children | wc -l
  998

  $ perf report --no-children -F pid,dso,sym | wc -l
  998

  $ perf report --children | wc -l
  3229

  $ perf report --children -F pid,dso,sym | wc -l
  3202

So I think you still need to use --no-children (or set report.children
config variable to false) for your script.

> and with "-g count", I didn't need --show-nr-samples.)

Yes, I used -n/--show-nr-samples just to check the number is correct.

>
> I notice the fields (-F) option already has this precedent:
>
> - "comm": prints PID:comm
> - "pid": prints PID

It's opposite: "comm" prints comm, "pid" prints PID:comm. :)

>
> If these were added to -g, along with a no-hists, then the two types
> of folded-only output could be generated using:
>
> perf report --stdio -g folded,count,comm,no-hists,caller
> perf report --stdio -g folded,count,pid,no-hists,caller

As I said, using fields like comm, pid requires to have same keys in
--sort option.  So it's basically unreliable to use those specific
field names in the -g option IMHO.  I suggested to use 'info' (yes, it
needs better name) to print all sort keys.

>
> ... although "no-hists" doesn't hit me as intuitive. How about "-F
> none" to specify zero columns? ie:
>
> perf report --stdio -g folded,count,comm,caller -F none
> perf report --stdio -g folded,count,pid,caller -F none

Ah, makes sense.  So it'd look like

  $ perf report --stdio -g folded,count,info -F none -s comm
  $ perf report --stdio -g folded,count,info -F none -s pid

The output would be

  809 swapper-0 cpu_bringup_and_idle;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op

Thoughts?

Thanks,
Namhyung
Re: [PATCHSET 0/4] perf report: Support folded callchain output (v4)
On Tue, Nov 3, 2015 at 6:40 AM, Arnaldo Carvalho de Melo wrote:
> Em Tue, Nov 03, 2015 at 09:52:07PM +0900, Namhyung Kim escreveu:
>> Hello,
>>
>> This is what Brendan requested on the perf-users mailing list [1] to
>> support FlameGraphs [2] more efficiently.  This patchset adds a few
>> more callchain options to adjust the output for it.
>>
>>  * changes in v4)
>>   - add missing doc update
>>   - cleanup/fix callchain value print code
>>   - add Acked-by from Brendan and Jiri
>
> Do those Acked-by stand? Things changed, the values moved from the end
> of the line to the start, etc.
>
[...]

I'd Ack this change as it's a useful addition. It doesn't quite
address the folded-only output, but it's a step in that direction. I
think having the value at the start of a line only makes sense for the
perf report output containing the hist summary lines, for consistency.

Here's how I'd shuffle the output of this patch (ignore word wrap
issues with this email):

# ./perf report --stdio -g folded,count,caller -F pid | \
    awk '/^ / { n = $1 }
         /^[0-9]/ { split(n,a,":"); print a[2] "-" a[1] ";" $2,$1 }'
swapper-0;cpu_bringup_and_idle;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op 809
swapper-0;xen_start_kernel;x86_64_start_reservations;start_kernel;rest_init;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op 135
dd-30551;__GI___libc_read;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;urandom_read;extract_entropy_user;extract_buf;check_events;xen_hypercall_xen_version 63
dd-30551;__GI___libc_read;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;urandom_read;extract_entropy_user;extract_buf 54
dd-30551;__GI___libc_read;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;urandom_read;extract_entropy_user;extract_buf;memset_erms 3
dd-30551;xen_irq_enable_direct_end;check_events;xen_hypercall_xen_version 3

So the output is folded stacks, prefixed by comm-PID. Shuffling the
summarized output is a lot better than doing a "perf script" dump and
re-processing call chains. (Note that since I'm using -F, I didn't
need --no-children; and with "-g count", I didn't need
--show-nr-samples.)

I notice the fields (-F) option already has this precedent:

- "comm": prints PID:comm
- "pid": prints PID

If these were added to -g, along with a no-hists, then the two types
of folded-only output could be generated using:

perf report --stdio -g folded,count,comm,no-hists,caller
perf report --stdio -g folded,count,pid,no-hists,caller

... although "no-hists" doesn't hit me as intuitive. How about "-F
none" to specify zero columns? ie:

perf report --stdio -g folded,count,comm,caller -F none
perf report --stdio -g folded,count,pid,caller -F none

Brendan
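For readers less used to awk, the reshuffle in the one-liner can be sketched in Python. This is only an approximation under stated assumptions: hist entry lines start with whitespace and carry "PID:comm" as their first field, callchain lines start with a digit and look like "count stack", and `fold_report` is a name made up for this note. Like the awk version, it takes only the first whitespace-delimited field of the hist line, so a comm containing a space would be cut short.

```python
def fold_report(lines):
    """Turn 'perf report -g folded,count -F pid' text into
    'comm-PID;frame;frame count' lines (format assumed, see above)."""
    out = []
    owner = None
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and '#' header lines.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        if line[0].isspace():
            # Hist entry line: first field assumed to be "PID:comm".
            pid, _, comm = line.split()[0].partition(":")
            owner = f"{comm}-{pid}"
        elif line[0].isdigit() and owner is not None:
            # Callchain line: "<count> <frame;frame;...>".
            count, _, stack = line.partition(" ")
            out.append(f"{owner};{stack.strip()} {count}")
    return out


sample = [
    "     0:swapper",
    "809 cpu_bringup_and_idle;cpu_startup_entry",
    " 30551:dd",
    "63 __GI___libc_read;sys_read",
]
print("\n".join(fold_report(sample)))
```

One difference from the awk script: the whole remainder of a callchain line is kept as the stack, whereas `$2` in awk keeps only the text up to the next space.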
Re: [PATCHSET 0/4] perf report: Support folded callchain output (v4)
Em Tue, Nov 03, 2015 at 09:52:07PM +0900, Namhyung Kim escreveu:
> Hello,
>
> This is what Brendan requested on the perf-users mailing list [1] to
> support FlameGraphs [2] more efficiently.  This patchset adds a few
> more callchain options to adjust the output for it.
>
>  * changes in v4)
>   - add missing doc update
>   - cleanup/fix callchain value print code
>   - add Acked-by from Brendan and Jiri

Do those Acked-by stand? Things changed, the values moved from the end
of the line to the start, etc.

You said you would consider having a --no-hists, but I see nothing about
it in this patchkit.

Some more comments below.

- Arnaldo

>  * changes in v3)
>   - put the value before callchains
>   - fix compile error
>
>
> At first, 'folded' output mode was added.  The folded output puts the
> value, a space and all callchain nodes separated by semicolons.  For now
> it only supports --stdio, as the other UIs provide some way of folding
> and/or expanding callchains dynamically.
>
> The value can now be one of 'percent', 'period', or 'count'.  The
> percent is the current default output and the period is the raw number
> of sample periods.  The count is the number of samples for each callchain.
>
> Here's an example:
>
>   $ perf report --no-children --show-nr-samples --stdio -g folded,count
>   ...
>     39.93%  80  swapper  [kernel.vmlinux]  [k] intel_idle
>   57 intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary
>   23 intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;...
>
>
>   $ perf report --no-children --stdio -g percent

So, in this first one you show the percent in both

>   ...
>     39.93%  swapper  [kernel.vmlinux]  [k] intel_idle
>             |
>             ---intel_idle
>                cpuidle_enter_state
>                cpuidle_enter
>                call_cpuidle
>                cpu_startup_entry
>                |
>                |--28.63%-- start_secondary
>                |
>                 --11.30%-- rest_init
>
>
>   $ perf report --no-children --stdio --show-total-period -g period
>   ...

then here you _add_ the period to the hist_entry line, but...

>     39.93%  13018705  swapper  [kernel.vmlinux]  [k] intel_idle
>             |
>             ---intel_idle
>                cpuidle_enter_state
>                cpuidle_enter
>                call_cpuidle
>                cpu_startup_entry
>                |

_replace_ the percentage with the period in the callchains. Can't we
have the same effect in both? I.e. I would expect the 39.93% to simply
be replaced with that 13018705.

>                |--9334403-- start_secondary
>                |
>                 --3684302-- rest_init
>
>
>   $ perf report --no-children --stdio --show-nr-samples -g count
>   ...
>     39.93%  80  swapper  [kernel.vmlinux]  [k] intel_idle

Ditto for count

>             |
>             ---intel_idle
>                cpuidle_enter_state
>                cpuidle_enter
>                call_cpuidle
>                cpu_startup_entry
>                |
>                |--57-- start_secondary
>                |
>                 --23-- rest_init
>
>
> You can get it from 'perf/callchain-fold-v4' branch on my tree:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
>
> Any comments are welcome, thanks
> Namhyung
>
>
> [1] http://www.spinics.net/lists/linux-perf-users/msg02498.html
> [2] http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
>
>
> Namhyung Kim (4):
>   perf report: Support folded callchain mode on --stdio
>   perf callchain: Abstract callchain print function
>   perf callchain: Add count fields to struct callchain_node
>   perf report: Add callchain value option
>
>  tools/perf/Documentation/perf-report.txt | 13 +++--
>  tools/perf/builtin-report.c              |  4 +-
>  tools/perf/ui/browsers/hists.c           |  8 +--
>  tools/perf/ui/gtk/hists.c                |  8 +--
>  tools/perf/ui/stdio/hist.c               | 93 ++--
>  tools/perf/util/callchain.c              | 87 +-
>  tools/perf/util/callchain.h              | 24 -
>  tools/perf/util/util.c                   |  3 +-
>  8 files changed, 205 insertions(+), 35 deletions(-)
>
> --
> 2.6.2
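The relation between the 'percent' and 'period' views in the quoted examples can be checked with a little arithmetic. The sketch below assumes that a branch's percentage is the hist entry's overhead scaled by the branch's share of the entry's period; that is inferred from the sample numbers in this thread, not taken from the perf source, and `chain_percent` is a made-up helper name.

```python
def chain_percent(entry_percent, entry_period, chain_period):
    """Overhead of one callchain branch, assuming it is the entry's
    overhead scaled by the branch's share of the entry's period."""
    return entry_percent * chain_period / entry_period


# Numbers copied from the 'period' example in the cover letter.
entry_percent, entry_period = 39.93, 13018705
for name, period in (("start_secondary", 9334403), ("rest_init", 3684302)):
    print(f"{name}: {chain_percent(entry_percent, entry_period, period):.2f}%")
```

Under that assumption the two branches come out at 28.63% and 11.30%, matching the 'percent' example, and the two periods add up to the entry's 13018705.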
[PATCHSET 0/4] perf report: Support folded callchain output (v4)
Hello,

This is what Brendan requested on the perf-users mailing list [1] to
support FlameGraphs [2] more efficiently.  This patchset adds a few
more callchain options to adjust the output for it.

 * changes in v4)
  - add missing doc update
  - cleanup/fix callchain value print code
  - add Acked-by from Brendan and Jiri

 * changes in v3)
  - put the value before callchains
  - fix compile error


At first, 'folded' output mode was added.  The folded output puts the
value, a space and all callchain nodes separated by semicolons.  For now
it only supports --stdio, as the other UIs provide some way of folding
and/or expanding callchains dynamically.

The value can now be one of 'percent', 'period', or 'count'.  The
percent is the current default output and the period is the raw number
of sample periods.  The count is the number of samples for each callchain.

Here's an example:

  $ perf report --no-children --show-nr-samples --stdio -g folded,count
  ...
    39.93%  80  swapper  [kernel.vmlinux]  [k] intel_idle
  57 intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary
  23 intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;...


  $ perf report --no-children --stdio -g percent
  ...
    39.93%  swapper  [kernel.vmlinux]  [k] intel_idle
            |
            ---intel_idle
               cpuidle_enter_state
               cpuidle_enter
               call_cpuidle
               cpu_startup_entry
               |
               |--28.63%-- start_secondary
               |
                --11.30%-- rest_init


  $ perf report --no-children --stdio --show-total-period -g period
  ...
    39.93%  13018705  swapper  [kernel.vmlinux]  [k] intel_idle
            |
            ---intel_idle
               cpuidle_enter_state
               cpuidle_enter
               call_cpuidle
               cpu_startup_entry
               |
               |--9334403-- start_secondary
               |
                --3684302-- rest_init


  $ perf report --no-children --stdio --show-nr-samples -g count
  ...
    39.93%  80  swapper  [kernel.vmlinux]  [k] intel_idle
            |
            ---intel_idle
               cpuidle_enter_state
               cpuidle_enter
               call_cpuidle
               cpu_startup_entry
               |
               |--57-- start_secondary
               |
                --23-- rest_init


You can get it from 'perf/callchain-fold-v4' branch on my tree:

  git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git

Any comments are welcome, thanks
Namhyung


[1] http://www.spinics.net/lists/linux-perf-users/msg02498.html
[2] http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html


Namhyung Kim (4):
  perf report: Support folded callchain mode on --stdio
  perf callchain: Abstract callchain print function
  perf callchain: Add count fields to struct callchain_node
  perf report: Add callchain value option

 tools/perf/Documentation/perf-report.txt | 13 +++--
 tools/perf/builtin-report.c              |  4 +-
 tools/perf/ui/browsers/hists.c           |  8 +--
 tools/perf/ui/gtk/hists.c                |  8 +--
 tools/perf/ui/stdio/hist.c               | 93 ++--
 tools/perf/util/callchain.c              | 87 +-
 tools/perf/util/callchain.h              | 24 -
 tools/perf/util/util.c                   |  3 +-
 8 files changed, 205 insertions(+), 35 deletions(-)

--
2.6.2
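As a rough illustration of how the 'folded,count' output above might be consumed, the following sketch sums the per-callchain counts under each hist entry and pairs them with the entry's own sample count. It assumes the layout shown in the example (a hist line whose first field ends in '%' and whose second field is the sample count, followed by "count stack" lines); `chain_totals` is a hypothetical helper, not part of perf.

```python
def chain_totals(lines):
    """Return [(nr_samples, sum_of_chain_counts)] for each hist entry,
    assuming the 'folded,count' layout from the example above."""
    entries = []
    for line in lines:
        fields = line.split()
        # Skip blanks, '#' headers and the '...' elision marker.
        if not fields or fields[0].startswith("#") or fields[0] == "...":
            continue
        if fields[0].endswith("%"):
            # Hist entry line: overhead, nr_samples, comm, dso, symbol.
            entries.append([int(fields[1]), 0])
        elif fields[0].isdigit() and entries:
            # Folded callchain line: count, then the semicolon-joined stack.
            entries[-1][1] += int(fields[0])
    return [tuple(e) for e in entries]


example = [
    "  39.93%  80  swapper  [kernel.vmlinux]  [k] intel_idle",
    "57 intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary",
    "23 intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init",
]
print(chain_totals(example))
```

For the example entry the two chains (57 and 23) add up to the 80 samples on the hist line; the sums need not match if some callchains are filtered out of the report.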
Re: [PATCHSET 0/4] perf report: Support folded callchain output (v4)
On Tue, Nov 3, 2015 at 5:54 PM, Namhyung Kimwrote: > Hi Brendan, > > On Tue, Nov 03, 2015 at 01:33:43PM -0800, Brendan Gregg wrote: >> On Tue, Nov 3, 2015 at 6:40 AM, Arnaldo Carvalho de Melo >> wrote: >> > Em Tue, Nov 03, 2015 at 09:52:07PM +0900, Namhyung Kim escreveu: >> >> Hello, >> >> >> >> This is what Brendan requested on the perf-users mailing list [1] to >> >> support FlameGraphs [2] more efficiently. This patchset adds a few >> >> more callchain options to adjust the output for it. >> >> >> >> * changes in v4) >> >> - add missing doc update >> >> - cleanup/fix callchain value print code >> >> - add Acked-by from Brendan and Jiri >> > >> > Do those Acked-by stand? Things changed, the values moved from the end >> > of the line to the start, etc. >> > >> [...] >> >> I'd Ack this change as it's a useful addition. It doesn't quite >> address the folded-only output, but it's a step in that direction. I >> think having the value at the start of a line only makes sense for the >> perf report output containing the hist summary lines, for consistency. > > Right, thanks! 
> > >> >> Here's how I'd shuffle the output of this patch (ignore word wrap >> issues with this email): >> >> # ./perf report --stdio -g folded,count,caller -F pid | \ >> awk '/^ / { n = $1 } >> /^[0-9]/ { split(n,a,":"); print a[2] "-" a[1] ";" $2,$1 }' >> swapper-0;cpu_bringup_and_idle;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op >> 809 >> swapper-0;xen_start_kernel;x86_64_start_reservations;start_kernel;rest_init;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op >> 135 >> dd-30551;__GI___libc_read;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;urandom_read;extract_entropy_user;extract_buf;check_events;xen_hypercall_xen_version >> 63 >> dd-30551;__GI___libc_read;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;urandom_read;extract_entropy_user;extract_buf >> 54 >> dd-30551;__GI___libc_read;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;urandom_read;extract_entropy_user;extract_buf;memset_erms >> 3 >> dd-30551;xen_irq_enable_direct_end;check_events;xen_hypercall_xen_version 3 >> >> So the output is folded stacks, prefixed by comm-PID. Shuffling the >> summarized output is a lot better than doing a "perf script" dump and >> re-processing call chains. (Note that since I'm using -F, I didn't >> need --no-children; > > Nope. The '-F pid' doesn't affect --children. It doesn't show the > children overhead column but we still have hist entries for > (synthesized) children.. > > $ perf report --no-children | wc -l > 998 > > $ perf report --no-children -F pid,dso,sym | wc -l > 998 > > $ perf report --children | wc -l > 3229 > > $ perf report --children -F pid,dso,sym | wc -l > 3202 > > So I think you still need to use --no-children (or set report.children > config variable to false) for your script. Ok, good to know, thanks. > > >> and with "-g count", I didn't need --show-nr-samples.) > > Yes, I used -n/--show-nr-samples just to check the number is correct. 
>
>
>>
>> I notice the fields (-F) option already has this precedent:
>>
>> - "comm": prints PID:comm
>> - "pid": prints PID
>
> It's opposite: "comm" prints comm, "pid" prints PID:comm. :)

Ah, right, sorry, I'd typed those the wrong way around. :)

>
>
>>
>> If these were added to -g, along with a no-hists, then the two types
>> of folded-only output could be generated using:
>>
>> perf report --stdio -g folded,count,comm,no-hists,caller
>> perf report --stdio -g folded,count,pid,no-hists,caller
>
> As I said, using fields like comm, pid requires to have same keys in
> --sort option. So it's basically unreliable to use those specific
> field names in the -g option IMHO. I suggested to use 'info' (yes, it
> needs better name) to print all sort keys.
>
>
>>
>> ... although "no-hists" doesn't hit me as intuitive. How about "-F
>> none" to specify zero columns? ie:
>>
>> perf report --stdio -g folded,count,comm,caller -F none
>> perf report --stdio -g folded,count,pid,caller -F none
>
> Ah, makes sense. So it'd look like
>
>   $ perf report --stdio -g folded,count,info -F none -s comm
>   $ perf report --stdio -g folded,count,info -F none -s pid
>
> The output would be
>
>   809 swapper-0 cpu_bringup_and_idle;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op
>

Thanks, looks almost right: a couple of minor changes:

1. If perf already has the precedent of "PID:comm", instead of my
"comm-PID", then maybe it should use "PID:comm" for perf consistency.
Doesn't make much difference to me.

2. The second space, delimiting "PID:comm" (or comm) and the stack...
I'm nervous about using space as a delimiter any more than once, since
it can also appear in comm (eg, "java main") and frames (eg,
"JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*,
Thread*)" -- that's
Re: [PATCHSET 0/4] perf report: Support folded callchain output (v4)
On Tue, Nov 03, 2015 at 09:52:07PM +0900, Namhyung Kim wrote:
> Hello,
>
> This is what Brendan requested on the perf-users mailing list [1] to
> support FlameGraphs [2] more efficiently. This patchset adds a few
> more callchain options to adjust the output for it.
>
> * changes in v4)
> - add missing doc update
> - cleanup/fix callchain value print code
> - add Acked-by from Brendan and Jiri

Do those Acked-by stand? Things changed, the values moved from the end
of the line to the start, etc.

You said you would consider having a --no-hists, but I see nothing about
it in this patchkit.

Some more comments below.

- Arnaldo

> * changes in v3)
> - put the value before callchains
> - fix compile error
>
>
> At first, 'folded' output mode was added. The folded output puts the
> value, a space and all callchain nodes separated by semicolons. Now it
> only supports --stdio as other UI provides some way of folding and/or
> expanding callchains dynamically.
>
> The value can now be one of 'percent', 'period', or 'count'. The
> percent is current default output and the period is the raw number of
> sample periods. The count is the number of samples for each callchain.
>
> Here's an example:
>
>   $ perf report --no-children --show-nr-samples --stdio -g folded,count
>   ...
>   39.93% 80 swapper [kernel.vmlinux] [k] intel_idle
>   57 intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary
>   23 intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;...
>
>
>   $ perf report --no-children --stdio -g percent

So, in this first one you show the percent in both

>   ...
>   39.93% swapper [kernel.vmlinux] [k] intel_idle
>   |
>   ---intel_idle
>      cpuidle_enter_state
>      cpuidle_enter
>      call_cpuidle
>      cpu_startup_entry
>      |
>      |--28.63%-- start_secondary
>      |
>       --11.30%-- rest_init
>
>
>   $ perf report --no-children --stdio --show-total-period -g period
>   ...

then here you _add_ the period to the hist_entry line, but...
>   39.93% 13018705 swapper [kernel.vmlinux] [k] intel_idle
>   |
>   ---intel_idle
>      cpuidle_enter_state
>      cpuidle_enter
>      call_cpuidle
>      cpu_startup_entry
>      |

_replace_ the percentage with the period in the callchains. Can't we
have the same effect in both? I.e. I would expect the 39.93% to simply
be replaced with that 13018705.

>      |--9334403-- start_secondary
>      |
>       --3684302-- rest_init
>
>
>   $ perf report --no-children --stdio --show-nr-samples -g count
>   ...
>   39.93% 80 swapper [kernel.vmlinux] [k] intel_idle

Ditto for count

>   |
>   ---intel_idle
>      cpuidle_enter_state
>      cpuidle_enter
>      call_cpuidle
>      cpu_startup_entry
>      |
>      |--57-- start_secondary
>      |
>       --23-- rest_init
>
>
> You can get it from 'perf/callchain-fold-v4' branch on my tree:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
>
> Any comments are welcome, thanks
> Namhyung
>
>
> [1] http://www.spinics.net/lists/linux-perf-users/msg02498.html
> [2] http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
>
>
> Namhyung Kim (4):
>   perf report: Support folded callchain mode on --stdio
>   perf callchain: Abstract callchain print function
>   perf callchain: Add count fields to struct callchain_node
>   perf report: Add callchain value option
>
>  tools/perf/Documentation/perf-report.txt | 13 +++--
>  tools/perf/builtin-report.c | 4 +-
>  tools/perf/ui/browsers/hists.c | 8 +--
>  tools/perf/ui/gtk/hists.c | 8 +--
>  tools/perf/ui/stdio/hist.c | 93 ++--
>  tools/perf/util/callchain.c | 87 +-
>  tools/perf/util/callchain.h | 24 -
>  tools/perf/util/util.c | 3 +-
>  8 files changed, 205 insertions(+), 35 deletions(-)
>
> --
> 2.6.2
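
As a reader's aid, here is a minimal Python sketch of how a consumer might parse one line of the folded format described in the cover letter (the value, a space, then the callchain nodes separated by semicolons). The helper name parse_folded is hypothetical and nothing here comes from the patches. It splits on the first space only, which is one way to sidestep the concern raised earlier in the thread about frames that themselves contain spaces.

```python
def parse_folded(line):
    """Split one folded callchain line into (value, [frames]).

    Assumes the layout from the cover letter: "<value> <f1>;<f2>;...".
    The value is kept as a string because it may be a percent, a
    period, or a count depending on the -g option used.
    """
    # Split on the first space only, so spaces inside frames survive.
    value, _, stack = line.strip().partition(" ")
    frames = stack.split(";") if stack else []
    return value, frames


# First line is taken from the example output in the cover letter.
value, frames = parse_folded(
    "57 intel_idle;cpuidle_enter_state;cpuidle_enter;"
    "call_cpuidle;cpu_startup_entry;start_secondary"
)
print(value, frames[0], frames[-1])

# A made-up line using the C++ frame quoted earlier in the thread;
# the trailing sys_read frame is only there for illustration.
value, frames = parse_folded(
    "3 JavaCalls::call_helper(JavaValue*, methodHandle*, "
    "JavaCallArguments*, Thread*);sys_read"
)
print(value, len(frames))
```

This still breaks if a frame contains a semicolon, and it does not handle any extra fields (such as comm or PID) placed between the value and the stack; both of those depend on how the delimiter discussion in this thread is settled.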