> If this is true, then perf is entirely unsuitable for
> microoptimization, since microoptimization depends on having
> reproducible results.

Reproducibility can be misleading, however, if it is achieved by
simulating the CPU cache and instruction pipelines, as this can lead to
favoring the wrong micro-optimizations.

The variability of perf stat (and of other "real" measurements) can be
minimized by making multiple runs and taking the minimum, while also
keeping an eye on the variance across runs.
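
For example, a minimal timing harness along these lines (a rough
sketch, not from this thread; workload() stands in for the code under
test) takes the minimum over several runs and reports the variance.
Note that perf stat can also repeat the measurement itself via
"perf stat -r 10 <cmd>":

    /* Rough sketch: time a workload RUNS times, then report the
     * minimum and the variance. */
    #include <stdio.h>
    #include <time.h>

    static void workload(void) { /* code under test */ }

    static double now_sec(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    int main(void) {
        enum { RUNS = 10 };
        double min = 1e30, sum = 0.0, sum2 = 0.0;
        for (int i = 0; i < RUNS; i++) {
            double t0 = now_sec();
            workload();
            double t = now_sec() - t0;
            if (t < min) min = t;
            sum += t;
            sum2 += t * t;
        }
        double mean = sum / RUNS;
        double var = sum2 / RUNS - mean * mean;
        /* Use the minimum as the estimate; a large variance is the
         * warning sign discussed above. */
        printf("min %.6fs  mean %.6fs  variance %.3e\n", min, mean, var);
        return 0;
    }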

A high variance is often a sign that either the host machine is busy
(which can be dealt with separately) or that the code relies on
behaviors whose performance is "fragile", such as kernel context
switches, thread switches, mutexes, interrupts, I/O "coincidences" or
even race conditions.

For instance, a spinlock may never exceed its spin count and enter
sleep/wait states in a simulated, deterministic environment, but could
occasionally do so in a more realistic setting. This can lead to a very
different performance profile when it happens, and ultimately favor
different optimization strategies ("slower but more robust").
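
As a concrete illustration, here is a minimal sketch of such a bounded
spinlock in C (hypothetical code, not taken from SQLite or this
thread). Under a deterministic simulator the compare-and-swap may
always succeed within SPIN_LIMIT iterations, so the slow path is never
exercised and never shows up in the profile:

    #include <stdatomic.h>
    #include <sched.h>

    #define SPIN_LIMIT 1000   /* arbitrary spin count for the sketch */

    typedef struct { atomic_int locked; } spinlock_t;

    static void spin_lock(spinlock_t *l) {
        /* Fast path: spin on a compare-and-swap up to SPIN_LIMIT
         * times. */
        for (int i = 0; i < SPIN_LIMIT; i++) {
            int expected = 0;
            if (atomic_compare_exchange_weak(&l->locked, &expected, 1))
                return;
        }
        /* Slow path: rarely (or never) reached under simulation.  A
         * real implementation would block on a futex or condition
         * variable; yielding to the scheduler stands in for that
         * here. */
        for (;;) {
            int expected = 0;
            if (atomic_compare_exchange_weak(&l->locked, &expected, 1))
                return;
            sched_yield();
        }
    }

    static void spin_unlock(spinlock_t *l) {
        atomic_store(&l->locked, 0);
    }

When the slow path does trigger under real load, the resulting context
switches change the performance profile dramatically, which is exactly
the effect a fully deterministic simulation hides.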

Eric

On Fri, Dec 30, 2016 at 12:03 AM, Dominique Pellé <dominique.pe...@gmail.com> wrote:

> Bob Friesenhahn <bfrie...@simple.dallas.tx.us> wrote:
>
> > On Thu, 29 Dec 2016, Darko Volaric wrote:
> >
> >> What are you basing that theory on?
> >
> >
> > Perf is claimed to provide very good results but they are real
> > results based on real measurements.  Due to this, the measured
> > results are very different for the first time the program is
> > executed and the second time it is executed.  Any other factor on
> > the machine would impact perf results.
> >
> > It seems that cachegrind produces absolutely consistent results
> > which do not depend on I/O, multi-core, or VM artifacts.
>
> You're right. Consistency can matter more when measuring
> small improvements that add up.
>
> I just tried "valgrind --tool=cachegrind ..." and "perf stat ..."
> with the same command. The valgrind result was indeed more
> consistent across multiple runs.
>
> Regarding speed of measurement, the same command
> took 13.8 sec with cachegrind vs only 0.28 sec with "perf stat"
> and 0.27 sec with neither cachegrind nor perf stat. So
> perf stat has almost no overhead whereas cachegrind has a
> big overhead, making it impractical for measuring slow
> commands.
>
> Dominique
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
