Hello,

I tried the perf-hello.c file and got the results you explained; it is very
helpful to me. However, for PERF_COUNT_HW_BRANCH_MISSES I get varying values
on every run, unlike PERF_COUNT_HW_INSTRUCTIONS.

To get the count for only the monitored process, I am trying to set the
perf_event_attr fields sample_type to PERF_SAMPLE_BRANCH_STACK and
branch_sample_type to PERF_SAMPLE_BRANCH_USER. With these settings,
perf_event_open fails with error no. 5 (EIO). What may be the problem?
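
In sketch form, the relevant setup is along these lines (a simplified sketch;
the sample period, mmap ring buffer, and full error handling are omitted):

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
    /* no glibc wrapper exists, so call the syscall directly */
    return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int main(void)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_BRANCH_MISSES;
    attr.sample_type = PERF_SAMPLE_BRANCH_STACK;        /* record branch stacks */
    attr.branch_sample_type = PERF_SAMPLE_BRANCH_USER;  /* userspace branches only */

    long fd = perf_event_open(&attr, 0, -1, -1, 0);     /* this process, any CPU */
    if (fd < 0)
        printf("perf_event_open failed: errno %d (%s)\n", errno, strerror(errno));
    return 0;
}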

Dhara Buch

On Mon, Jun 19, 2017 at 5:51 PM, Michael Petlan <mpet...@redhat.com> wrote:

> On Mon, 19 Jun 2017, dhara buch wrote:
> > Hello,
> > Thanks a lot for the detailed explanation. I could understand the
> working of operf and opreport.
> > Unlike operf, ocount counts each occurrence of the monitored event. In
> > that case, why does ocount also give varying values of CPU_CLK_UNHALTED
> > even when the monitored code doesn't change? Is it because of background
> > processes? If so, can we get the event count for only the monitored command?
>
> Hi Dhara,
>
> It is kind of because of the background processes, but not in the sense that
> ocount would also count their CPU consumption. If you don't use the '-s'
> option, ocount will count only what your command does.
>
> However, it counts either kernelspace, userspace, or both (the default if you
> are root).
> This can be configured by the event's flags:
>
> Both:
> -e CPU_CLK_UNHALTED:0:1:1
>
> Only kernelspace (used by the command, i.e. syscalls, etc):
> -e CPU_CLK_UNHALTED:0:1:0
>
> Only userspace (i.e. syscalls are excluded):
> -e CPU_CLK_UNHALTED:0:0:1
>
> (The format for ocount is event:umask:kernelspace:userspace;
> for operf it is event:sample_rate:umask:kernelspace:userspace.)
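>
> For example, to count only the userspace cycles of a run of your ./file1
> binary, the invocation would look like this:
>
> $ ocount -e CPU_CLK_UNHALTED:0:0:1 ./file1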
>
> The numbers are slightly different each run because the process does not live
> in a vacuum. It is usually dynamically linked, it calls libraries, the kernel
> switches contexts, etc. All these things cause the differences. Also, the CPU
> optimizes the execution with varying success, depending on what else runs
> alongside it. If the process is interrupted many times, the CPU consumption
> will be higher.
>
> I have some sample code which I linked both dynamically and statically. While
> the dynamically linked executable always takes around 670,260 cycles, the
> statically linked one takes around 380,202 and seems to be more steady. Since
> it is very simple code, you can see that the linker-related stuff took more
> than 40% of the CPU cycles.
>
> Executing a binary is such a complex process that it is no surprise that
> these measurements give different values every time.
>
> ........
>
> TIP:
>
> You can use either the PAPI library or the built-in kernel support for
> counting just a particular part of the code, e.g. a function. I have attached
> one example which uses the perf_events kernel interface. It has two functions
> for computing a factorial, one slightly less efficient than the other, and I
> count cycles and instructions for both. When you run it repeatedly, you can
> see that the instruction count is the same every time (at least on my laptop),
> but the cycle count differs even for these very simple functions.
>
> cc -o perf-hello perf-hello.c myperf.c
> ./perf-hello
>
> Ineffective method:
>         PERF_COUNT_HW_CPU_CYCLES = 32192
>         PERF_COUNT_HW_INSTRUCTIONS = 20681
>
> More effective method:
>         PERF_COUNT_HW_CPU_CYCLES = 15931
>         PERF_COUNT_HW_INSTRUCTIONS = 19103
>
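> In case the attachment doesn't make it through the list, the core of such a
> counter is roughly the following perf_event_open pattern (a simplified sketch,
> not the exact attached code; error checking is omitted):
>
> #define _GNU_SOURCE
> #include <stdio.h>
> #include <string.h>
> #include <unistd.h>
> #include <sys/ioctl.h>
> #include <sys/syscall.h>
> #include <linux/perf_event.h>
>
> static int open_counter(unsigned long long config)
> {
>     struct perf_event_attr attr;
>     memset(&attr, 0, sizeof(attr));
>     attr.type = PERF_TYPE_HARDWARE;
>     attr.size = sizeof(attr);
>     attr.config = config;
>     attr.disabled = 1;          /* start disabled, enable around the code */
>     attr.exclude_kernel = 1;    /* count userspace only */
>     return syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
> }
>
> static unsigned long long factorial(unsigned long long n)
> {
>     return n < 2 ? 1 : n * factorial(n - 1);
> }
>
> int main(void)
> {
>     int fd = open_counter(PERF_COUNT_HW_CPU_CYCLES);
>     long long count = 0;
>
>     ioctl(fd, PERF_EVENT_IOC_RESET, 0);
>     ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
>     factorial(20);                        /* the code section being measured */
>     ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
>
>     read(fd, &count, sizeof(count));
>     printf("PERF_COUNT_HW_CPU_CYCLES = %lld\n", count);
>     close(fd);
>     return 0;
> }
>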
> ........
>
> Cheers,
> Michael
>
>
> >
> > Thank you,
> >
> > On Fri, Jun 16, 2017 at 4:52 PM, Michael Petlan <mpet...@redhat.com>
> wrote:
> >       On Fri, 16 Jun 2017, dhara buch wrote:
> >       > Hello,
> >       >
> >       > Some time ago you helped me a lot with using OProfile in a virtual
> >       > machine. At that time, I also had a query regarding getting
> >       > information from the 'opreport --symbols --debuginfo' command.
> >
> >       Hello,
> >
> >       yeah, I remember...
> >
> >       >
> >       > Actually, I am profiling a file1.c file with the following commands...
> >       >
> >       > 1. gcc -g file1.c -o file1
> >       >
> >       > 2. operf ./file1 --events=CPU_CLK_UNHALTED
> >       >
> >       > 3. opreport --symbols --debuginfo
> >       >
> >       > The output contains various lines, two of which look as follows:
> >       >
> >       > samples  %        linenr info    image name    symbol name
> >       > 2        12.5000  rsaimp.c:16    rsaimp        MM
> >       > 2        12.5000  rsaimp.c:34    rsaimp        main
> >       >
> >       > When I execute commands 2 and 3 multiple times, each time I get
> >       > output for the symbol name MM (which is a user-defined function in
> >       > the program), but I don't get output for main every time.
> >       >
> >       This is kind of random, see below.
> >
> >       > Additionally, the sample % also varies every time I run the operf
> >       > command. I do not understand how operf gives different outputs for
> >       > the same program.
> >       >
> >
> >       How _counting_ works:
> >
> >       Each time an event happens in the CPU (in our case -- CPU_CLK_UNHALTED
> >       -- one CPU tick), its counter is incremented by one. This way, you can
> >       get the total number of times the event happened:
> >
> >       $ ocount ls
> >       Events were actively counted for 1863267 nanoseconds.
> >       Event counts (actual) for /bin/ls:
> >                 Event                    Count                    % time counted
> >                 CPU_CLK_UNHALTED         2,672,309                100.00
> >
> >       This means that there were 2.672 million CPU ticks necessary to
> >       execute `ls`.
> >
> >       The tool for counting is *ocount*.
> >
> >
> >       How _profiling_ works:
> >
> >       For profiling, events are counted the same way as when counting, but
> >       a "sample" is taken only once per N event occurrences. A sample is a
> >       piece of information about where in the code the event happened, what
> >       the context was, potentially the call-stack, etc. The sample is saved
> >       into the oprofile_data directory structure. When you finish profiling,
> >       opreport generates an overview from the samples recorded.
> >
> >       Of course, taking a sample causes some overhead. Thus, you cannot
> >       take one each time the event counter increments. For this purpose,
> >       there is a "sample rate" value which divides the event counter so that
> >       a sample is taken only once per "sample rate" occurrences. Thus, it is
> >       pretty random which event occurrence is picked for a sample.
> >
> >       Of course, your main() consumes CPU ticks and the counter is
> >       incremented, but sometimes one of these increments produces a sample
> >       and sometimes it does not, so the function may be missing from the
> >       report.
> >
> >       You have 2 samples on each of the lines in the report. That is a
> >       really low number, so it is very probable that sometimes the samples
> >       hit main() and sometimes not.
> >
> >       $ operf ls
> >       Profiling done.
> >
> >       $ opreport
> >       Using /root/oprofile_data/samples/ for samples directory.
> >       CPU: Core 2, speed 2267 MHz (estimated)
> >       Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a
> >       unit mask of 0x00 (Unhalted core cycles) count 100000
> >       CPU_CLK_UNHALT...|
> >         samples|      %|
> >       ------------------
> >        -->   37 100.000 ls
> >              CPU_CLK_UNHALT...|
> >                samples|      %|
> >                ------------------
> >                     27 72.9730 kallsyms
> >                      5 13.5135 libc-2.19.so
> >                      2  5.4054 ld-2.19.so
> >                      1  2.7027 ls
> >                      1  2.7027 ext4
> >                      1  2.7027 libattr.so.1.1.0
> >
> >       You see that out of an estimated 2.5-3.5 million event occurrences
> >       (which is the scale `ls` usually needs on my PC), I got only 37
> >       samples.
> >
> >       The minimum sample rate for each event is listed in `ophelp` as
> >       "min count".
> >
> >       Why did I get 37 samples? The default setting of my default event is:
> >
> >       $ ophelp -d
> >       CPU_CLK_UNHALTED:100000:0:1:1
> >       -------------------^^
> >
> >       This means that there is 1 sample per 100000 event occurrences.
> >
> >       You can tweak this value a bit, but note that oprofile won't let you
> >       set the value too low, for overhead reasons.
> >
> >       Let's try 10000 (sampling 10 times more often):
> >
> >       $ operf -e CPU_CLK_UNHALTED:10000 ls
> >       WARNING: Lost samples detected! See /root/oprofile_data/samples/operf.log for details.
> >       Lowering the sampling rate may reduce or eliminate lost samples.
> >       See the '--events' option description in the operf man page for help.
> >       Profiling done.
> >
> >       $ opreport
> >       Using /root/oprofile_data/samples/ for samples directory.
> >       WARNING: Lost samples detected! See /root/oprofile_data/samples/operf.log for details.
> >       CPU: Core 2, speed 2267 MHz (estimated)
> >       Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a
> >       unit mask of 0x00 (Unhalted core cycles) count 10000
> >       CPU_CLK_UNHALT...|
> >         samples|      %|
> >       ------------------
> >             429 100.000 ls
> >             CPU_CLK_UNHALT...|
> >               samples|      %|
> >               ------------------
> >                     313 72.9604 kallsyms
> >                      55 12.8205 ld-2.19.so
> >                      34  7.9254 libc-2.19.so
> >                      17  3.9627 ls
> >                       8  1.8648 ext4
> >                       2  0.4662 binfmt_misc
> >
> >       You see that both operf and opreport warned about lost samples, which
> >       means that the overhead in the kernel was too high and some samples
> >       were lost. But probably not that many, since you see that I got 429
> >       samples instead of 37, which is very roughly 10 times more.
> >
> >
> >       Profiling is not meant for exact measurement of how many CPU ticks
> >       happened in a function; rather, it is designed for relative comparison
> >       across your code.
> >
> >       If you are getting too few samples, try decreasing the sample rate as
> >       I did in the example. Have a look in `ophelp` for the lowest value
> >       allowed for your event.
> >
> >       Also, please note that the minimum sample rates are rough and depend
> >       on the load, so under low load, lower rates can be used without issues.
> >       When you profile system-wide on all CPUs, there is a very high chance
> >       that you'll run into overhead and lose many samples...
> >
> >       I hope I explained it a bit.
> >
> >       Cheers,
> >       Michael
> >
> >       > Thanks,
> >       >
> >       > Dhara Buch
> >       >
> >       >
> >
> >
> >
> >
>