Hello,

I tried the perf-hello.c file and got the results as you explained; it has been very helpful to me. Now, for PERF_COUNT_HW_BRANCH_MISSES, I am getting varying values on every run, unlike PERF_COUNT_HW_INSTRUCTIONS.
To get the count for only a given process, I am trying to set the perf_event_attr fields sample_type to PERF_SAMPLE_BRANCH_STACK and branch_sample_type to PERF_SAMPLE_BRANCH_USER. When I do so, perf_event_open fails with error number 5 (EIO). What may be the problem? (Some minimal perf_event_open sketches related to this are appended at the end of this message, below the quoted thread.)

Dhara Buch

On Mon, Jun 19, 2017 at 5:51 PM, Michael Petlan <mpet...@redhat.com> wrote:
> On Mon, 19 Jun 2017, dhara buch wrote:
> > Hello,
> >
> > Thanks a lot for the detailed explanation. I could understand the
> > working of operf and opreport.
> >
> > Unlike operf, ocount counts each occurrence of the monitored event.
> > In such a case, why does ocount also give varying values of
> > CPU_CLK_UNHALTED even when the monitored code doesn't change? Is it
> > because of background processes? If so, can we get the event count
> > for only the monitored command?
>
> Hi Dhara,
>
> It is partly because of the background processes, but not in the sense
> that ocount would also count their CPU consumption. If you don't use
> the '-s' option, ocount will count only what your command does.
>
> However, it counts either kernelspace, userspace, or both (the default
> if you are root). This can be configured by the event's flags:
>
> Both:
>   -e CPU_CLK_UNHALTED:0:1:1
>
> Only kernelspace (used by the command, i.e. syscalls, etc.):
>   -e CPU_CLK_UNHALTED:0:1:0
>
> Only userspace (i.e. syscalls are excluded):
>   -e CPU_CLK_UNHALTED:0:0:1
>
> (The format for ocount is event:umask:kernelspace:userspace; for operf
> it is event:sample_rate:umask:kernelspace:userspace.)
>
> The numbers are slightly different each run because the process does
> not live in a vacuum. It is usually dynamically linked, it calls
> libraries, the kernel switches contexts, etc. All these things cause
> the differences. Also, the CPU optimizes the execution with varying
> success, depending on what else runs alongside it. If the process is
> interrupted many times, the CPU consumption will be higher.
>
> I have a sample code which I linked dynamically and statically. While
> the dynamically linked executable always takes around 670,260 cycles,
> the statically linked one takes around 380,202 and seems to be
> steadier. Since it is very simple code, you can see that the
> linker-related stuff took more than 40% of the CPU cycles.
>
> Executing a binary is such a complex problem that it is no surprise
> that those measurements give different values every time.
>
> ........
>
> TIP:
>
> You can use either the papi library or the built-in kernel support for
> counting just a particular part of code, e.g. a function. I have
> attached one example which uses the perf_events kernel infrastructure.
> I have two functions for computing a factorial, one slightly less
> effective than the other. I count cycles and instructions for both.
> When executing repeatedly, you can see that the instruction count is
> the same all the time (at least on my laptop), but the cycle count
> differs, even for just the code in the functions, which is very
> simple.
>
> cc -o perf-hello perf-hello.c myperf.c
> ./perf-hello
>
> Ineffective method:
> PERF_COUNT_HW_CPU_CYCLES = 32192
> PERF_COUNT_HW_INSTRUCTIONS = 20681
>
> More effective method:
> PERF_COUNT_HW_CPU_CYCLES = 15931
> PERF_COUNT_HW_INSTRUCTIONS = 19103
>
> ........
>
> Cheers,
> Michael
>
> > Thank you,
> >
> > On Fri, Jun 16, 2017 at 4:52 PM, Michael Petlan <mpet...@redhat.com> wrote:
> > On Fri, 16 Jun 2017, dhara buch wrote:
> > > Hello,
> > >
> > > Many days ago you helped me a lot with using OProfile in a
> > > virtual machine.
> > > That time, I also had a query regarding getting information from
> > > the 'opreport --symbols --debuginfo' command.
> >
> > Hello,
> >
> > yeah, I remember...
> >
> > > Actually, I am profiling a file1.c file with the following
> > > commands...
> > >
> > > 1. gcc -g file1.c -o file1
> > >
> > > 2. operf ./file1 --events=CPU_CLK_UNHALTED
> > >
> > > 3. opreport --symbols --debuginfo
> > >
> > > The output contains various lines, of which one part looks as
> > > follows:
> > >
> > > samples  %        linenr info   image name   symbol name
> > > 2        12.5000  rsaimp.c:16   rsaimp       MM
> > > 2        12.5000  rsaimp.c:34   rsaimp       main
> > >
> > > When I execute commands 2 and 3 multiple times, each time I get
> > > output for symbol name MM (which is a user-defined function in the
> > > program), but I don't get output for main every time.
> >
> > This is kind of random, see below.
> >
> > > Additionally, the sample % also varies every time I run the operf
> > > command. I do not understand how operf works such that it gives
> > > different outputs for the same program.
> >
> > How _counting_ works:
> >
> > Each time an event happens in the CPU (in our case, CPU_CLK_UNHALTED
> > -- one CPU tick), its counter is incremented by one. This way, you
> > can get the total number of times the event happened:
> >
> > $ ocount ls
> > Events were actively counted for 1863267 nanoseconds.
> > Event counts (actual) for /bin/ls:
> > Event                  Count        % time counted
> > CPU_CLK_UNHALTED       2,672,309    100.00
> >
> > This means that 2.672 million CPU ticks were necessary to execute
> > `ls`.
> >
> > The tool for counting is *ocount*.
> >
> > How _profiling_ works:
> >
> > For profiling, events are counted the same way as when counting, but
> > a "sample" is taken only once per N event occurrences. A sample is a
> > piece of information about where in the code the event happened,
> > what the context was, potentially the call stack, etc. The sample is
> > saved into the oprofile_data directory structure. When you finish
> > profiling, opreport generates an overview from the samples recorded.
> >
> > Of course, taking a sample causes some overhead. Thus, you cannot
> > take one each time the event counter increments. For this purpose,
> > there is a "sample rate" value which divides the event counter and
> > takes a sample only once per "sample rate" occurrences. Thus, it is
> > pretty random which event occurrence is picked for a sample.
> >
> > Of course, your main() consumes CPU ticks and the counter is
> > incremented, but sometimes one of these increments produces a sample
> > and sometimes not; thus the function is sometimes missing.
> >
> > You have 2 samples on each of the lines in the report. That is a
> > really low number, so it is very probable that the samples sometimes
> > hit main() and sometimes not.
> >
> > $ operf ls
> > Profiling done.
> >
> > $ opreport
> > Using /root/oprofile_data/samples/ for samples directory.
> > CPU: Core 2, speed 2267 MHz (estimated)
> > Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with
> > a unit mask of 0x00 (Unhalted core cycles) count 100000
> > CPU_CLK_UNHALT...|
> >   samples|      %|
> > ------------------
> >        37 100.000 ls
> >         CPU_CLK_UNHALT...|
> >           samples|      %|
> >         ------------------
> >                27 72.9730 kallsyms
> >                 5 13.5135 libc-2.19.so
> >                 2  5.4054 ld-2.19.so
> >                 1  2.7027 ls
> >                 1  2.7027 ext4
> >                 1  2.7027 libattr.so.1.1.0
> >
> > You see that out of an estimated 2.5-3.5 million event occurrences
> > (which is the scale `ls` usually needs on my PC), I got only 37
> > samples.
> > The sample rate for each event is listed in `ophelp` as "min count".
> >
> > Why did I get 37 samples? My setting of the default event is:
> >
> > $ ophelp -d
> > CPU_CLK_UNHALTED:100000:0:1:1
> > -----------------^^^^^^
> >
> > This means that there is 1 sample per 100000 event occurrences.
> >
> > You can tweak this value a bit, but note that oprofile won't let you
> > set the value too low, for overhead reasons.
> >
> > Let's try 10000 (sampling 10 times more often):
> >
> > $ operf -e CPU_CLK_UNHALTED:10000 ls
> > WARNING: Lost samples detected! See
> > /root/oprofile_data/samples/operf.log for details.
> > Lowering the sampling rate may reduce or eliminate lost samples.
> > See the '--events' option description in the operf man page for
> > help.
> > Profiling done.
> >
> > $ opreport
> > Using /root/oprofile_data/samples/ for samples directory.
> > WARNING: Lost samples detected! See
> > /root/oprofile_data/samples/operf.log for details.
> > CPU: Core 2, speed 2267 MHz (estimated)
> > Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with
> > a unit mask of 0x00 (Unhalted core cycles) count 10000
> > CPU_CLK_UNHALT...|
> >   samples|      %|
> > ------------------
> >       429 100.000 ls
> >         CPU_CLK_UNHALT...|
> >           samples|      %|
> >         ------------------
> >               313 72.9604 kallsyms
> >                55 12.8205 ld-2.19.so
> >                34  7.9254 libc-2.19.so
> >                17  3.9627 ls
> >                 8  1.8648 ext4
> >                 2  0.4662 binfmt_misc
> >
> > You see that both operf and opreport warned about lost samples,
> > which means that the overhead in the kernel was too high and some
> > samples were lost. But probably not that many, since I got 429
> > samples instead of 37, which is very roughly 10 times more.
> >
> > Profiling is not for exact measurement of how many CPU ticks
> > happened in a function; it is rather designed for relative
> > comparison across your code.
> >
> > If you are getting too few samples, try decreasing the sample rate
> > as I did in the example. Have a look in `ophelp` for the lowest
> > value allowed for your event.
> >
> > Also please note that the minimum sample rates are rough and depend
> > on the load, so under low load, lower rates can be used without
> > issues. When you profile system-wide on all CPUs, there is a very
> > high chance that you will run into overhead and lose many samples...
> >
> > I hope I explained it a bit.
> >
> > Cheers,
> > Michael
> >
> > > Thanks,
> > >
> > > Dhara Buch
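........

Two notes on the branch-miss questions at the top of this thread, each with a minimal sketch against the perf_event_open(2) API. These are illustrative sketches, not the original attachments; the file names and workloads in them are made up for the example.

First, if the goal is only to count branch misses for one process in user space, no branch stack is needed: branch_sample_type applies only to branch-stack *sampling*, while plain counting can be restricted with the exclude_kernel and exclude_hv bits, the perf_event analogue of ocount's event:0:0:1 flags. Note that branch-miss counts will still vary between runs even for identical code, since the predictor's success depends on context switches, address layout, and whatever else the CPU ran recently, much like the cycle counts discussed above.

/* count-branch-misses.c -- a minimal sketch, not ocount/operf
 * internals. Counts PERF_COUNT_HW_BRANCH_MISSES for this process only,
 * user space only. No sampling and no branch stack is involved, so no
 * LBR hardware support is required. */
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

/* glibc provides no wrapper for this syscall */
static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

static volatile uint64_t sink;

static void work(void)              /* some branchy work to measure */
{
    uint64_t s = 0;
    for (int i = 0; i < 1000000; i++)
        if (i % 3 == 0)
            s += i;
    sink = s;
}

int main(void)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_BRANCH_MISSES;
    attr.disabled = 1;              /* enabled explicitly via ioctl below */
    attr.exclude_kernel = 1;        /* userspace only, like ...:0:0:1 */
    attr.exclude_hv = 1;

    int fd = perf_event_open(&attr, 0 /* this process */, -1, -1, 0);
    if (fd < 0) { perror("perf_event_open"); return 1; }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    work();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    uint64_t count;
    if (read(fd, &count, sizeof(count)) != sizeof(count)) {
        perror("read");
        return 1;
    }
    printf("PERF_COUNT_HW_BRANCH_MISSES = %llu\n",
           (unsigned long long)count);
    close(fd);
    return 0;
}

cc -o count-branch-misses count-branch-misses.c
./count-branch-misses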
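Second, on the PERF_SAMPLE_BRANCH_STACK error: the perf_event_open(2) man page documents that branch_sample_type must contain at least one branch-type bit (e.g. PERF_SAMPLE_BRANCH_ANY) in addition to privilege bits such as PERF_SAMPLE_BRANCH_USER, and that branch stacks need hardware LBR support plus a sampling setup (a sample period and an mmap'ed ring buffer to read the records). Whether that explains error number 5 here is hard to say without the exact code, but the sketch below shows a flag combination that is at least valid per the man page. It only verifies that the event opens; the ring-buffer handling needed to actually read the samples is omitted.

/* branch-stack-check.c -- a minimal sketch that just verifies whether
 * a branch-stack sampling event can be opened with a valid flag
 * combination. Reading the recorded branches would additionally need
 * an mmap'ed ring buffer, omitted here. */
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int main(void)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_BRANCH_MISSES;
    attr.disabled = 1;
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;
    /* Branch stacks are a sampling feature, so a sample period is
     * required. */
    attr.sample_period = 100000;
    attr.sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_BRANCH_STACK;
    /* A privilege bit plus at least one branch-type bit, per the man
     * page; PERF_SAMPLE_BRANCH_USER alone is not a complete mask. */
    attr.branch_sample_type = PERF_SAMPLE_BRANCH_USER |
                              PERF_SAMPLE_BRANCH_ANY;

    int fd = perf_event_open(&attr, 0, -1, -1, 0);
    if (fd < 0) {
        fprintf(stderr, "perf_event_open failed: %s (errno %d)\n",
                strerror(errno), errno);
        return 1;
    }
    printf("branch-stack sampling event opened successfully\n");
    close(fd);
    return 0;
}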
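Finally, since the perf-hello.c and myperf.c attachments mentioned in the TIP above are not part of this thread, here is a single-file sketch of the same idea, assuming two stand-in factorial implementations: CPU cycles and instructions are opened as one event group, so both counters are scheduled on the PMU together and cover exactly the same interval.

/* perf-two-counters.c -- a sketch of the idea behind perf-hello.c and
 * myperf.c (single file; the factorial functions are stand-ins).
 * Cycles and instructions are opened as one event group so that both
 * counters cover exactly the same instructions. */
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

static int open_counter(uint64_t config, int group_fd)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = config;
    attr.disabled = (group_fd == -1); /* only the leader starts disabled */
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;
    return (int)perf_event_open(&attr, 0, -1, group_fd, 0);
}

static uint64_t fact_recursive(uint64_t n)   /* less effective method */
{
    return n < 2 ? 1 : n * fact_recursive(n - 1);
}

static uint64_t fact_iterative(uint64_t n)   /* more effective method */
{
    uint64_t r = 1;
    while (n > 1)
        r *= n--;
    return r;
}

static volatile uint64_t sink;

static void measure(const char *label, uint64_t (*f)(uint64_t))
{
    int cyc = open_counter(PERF_COUNT_HW_CPU_CYCLES, -1);    /* leader */
    int ins = open_counter(PERF_COUNT_HW_INSTRUCTIONS, cyc);
    if (cyc < 0 || ins < 0) { perror("perf_event_open"); return; }

    ioctl(cyc, PERF_EVENT_IOC_RESET, 0);
    ioctl(ins, PERF_EVENT_IOC_RESET, 0);
    ioctl(cyc, PERF_EVENT_IOC_ENABLE, PERF_IOC_FLAG_GROUP);
    for (int i = 0; i < 1000; i++)
        sink = f(20);               /* 20! still fits in 64 bits */
    ioctl(cyc, PERF_EVENT_IOC_DISABLE, PERF_IOC_FLAG_GROUP);

    uint64_t cycles = 0, instructions = 0;
    if (read(cyc, &cycles, sizeof(cycles)) < 0 ||
        read(ins, &instructions, sizeof(instructions)) < 0)
        perror("read");
    printf("%s:\n", label);
    printf("PERF_COUNT_HW_CPU_CYCLES = %llu\n",
           (unsigned long long)cycles);
    printf("PERF_COUNT_HW_INSTRUCTIONS = %llu\n\n",
           (unsigned long long)instructions);
    close(ins);
    close(cyc);
}

int main(void)
{
    measure("Ineffective method", fact_recursive);
    measure("More effective method", fact_iterative);
    return 0;
}

Grouping is what makes the repeated-run comparison in the TIP meaningful: because the two events are counted over the same interval, a steady instruction count combined with a varying cycle count isolates the variation to the CPU's execution, not to the measurement itself.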