> If this is true, then perf is entirely unsuitable for
> microoptimization, since microoptimization depends on having
> reproducible results.
Reproducibility can be misleading, however, if it is achieved by simulating
the CPU cache and instruction pipelines, as this can lead to favoring the
wrong microoptimizations. Perf stat variability (and that of other "real"
measurements) can be minimized by making multiple runs and taking the
minimum, but also by keeping an eye on the variance of realtime runs
(perf stat's -r/--repeat option can automate this, reporting the mean and
the spread across runs). A high variance is often a sign that either the
host machine is busy (which can be dealt with separately) or that the code
relies on behaviors whose performance is "fragile", such as kernel context
switches, thread switches, mutexes, interrupts, I/O "coincidences" or even
race conditions. For instance, a spinlock may never exceed its spin count
and enter its sleep/wait state in a simulated, deterministic environment,
but could occasionally do so in a more realistic setting, which could lead
to a very different performance profile when that happens, and ultimately
favor different optimization strategies ("slower but more robust"). See
the sketch at the end of this message.

Eric

On Fri, Dec 30, 2016 at 12:03 AM, Dominique Pellé
<dominique.pe...@gmail.com> wrote:

> Bob Friesenhahn <bfrie...@simple.dallas.tx.us> wrote:
>
> > On Thu, 29 Dec 2016, Darko Volaric wrote:
> >
> >> What are you basing that theory on?
> >
> > Perf is claimed to provide very good results, but they are real results
> > based on real measurements. Due to this, the measured results are very
> > different between the first time the program is executed and the second
> > time it is executed. Any other factor on the machine would impact perf
> > results.
> >
> > It seems that cachegrind produces absolutely consistent results which
> > do not depend on I/O, multi-core, or VM artifacts.
>
> You're right. Consistency can matter more when measuring
> small improvements that add up.
>
> I just tried "valgrind --tool=cachegrind ..." and "perf stat ..."
> with the same command. The valgrind result was indeed
> more consistent across multiple runs.
>
> Regarding speed of measurement, the same command
> took 13.8 sec with cachegrind vs only 0.28 sec with "perf stat"
> and 0.27 sec with neither cachegrind nor perf stat. So
> perf stat has almost no overhead, whereas cachegrind has a
> big overhead, making it impractical when measuring slow
> commands.
>
> Dominique
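P.S. To make the spinlock point concrete, here is a minimal C sketch of
the kind of adaptive lock I had in mind. It is illustrative only, not
taken from any real implementation: the spin budget MAX_SPINS and the
fallback to sched_yield() are assumptions of the sketch; a production
lock would typically block on a futex or condition variable instead.

/* Minimal sketch (illustrative) of an adaptive spinlock: spin up to a
 * fixed budget, then yield to the scheduler.  Under a simulated,
 * deterministic CPU the yield path may never run; on busy real hardware
 * it occasionally will, producing the kind of variance discussed above. */
#include <stdatomic.h>
#include <sched.h>          /* sched_yield(), POSIX */

#define MAX_SPINS 1000      /* hypothetical spin budget */

typedef struct { atomic_flag locked; } spinlock_t;
/* usage: spinlock_t l = { ATOMIC_FLAG_INIT }; */

static void spin_lock(spinlock_t *l)
{
    int spins = 0;
    /* test_and_set returns the previous value: loop while the lock
     * was already held by someone else */
    while (atomic_flag_test_and_set_explicit(&l->locked,
                                             memory_order_acquire)) {
        if (++spins >= MAX_SPINS) {
            /* Spin budget exhausted: stop burning CPU and let the
             * scheduler run another thread.  This is the branch that
             * a deterministic simulation may never exercise. */
            sched_yield();
            spins = 0;
        }
    }
}

static void spin_unlock(spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

Whether the yield branch is ever taken depends entirely on real
contention and scheduler timing, which is exactly the behavior a
deterministic tool like cachegrind cannot reproduce.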