Hi All,

I am attaching an image comparing runs of CALL_METHOD from the old Grand Unified Python Benchmark (GUPB) suite (https://hg.python.org/benchmarks), with ASLR enabled and with it disabled. You can see that the run-to-run variation was reduced significantly, from data scattered all over the place to just a single outlier out of 30 repeated runs. This effectively eliminated most of the variation for this micro-benchmark.
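Since ASLR can come back silently (for example after a reboot or on a freshly provisioned machine), a cheap pre-flight check in a benchmark driver avoids wasting a run. Below is a minimal sketch, assuming Linux and the standard procfs knob (the same file used by the commands further down); it is only an illustration of the idea, not part of perf:

#!/usr/bin/env python3
# Pre-flight check: warn if ASLR is still enabled before benchmarking.
# Linux-only; reads the standard procfs knob.

ASLR_KNOB = "/proc/sys/kernel/randomize_va_space"

def aslr_mode():
    """Return the ASLR mode: 0 = off, 1 = conservative, 2 = full (default)."""
    with open(ASLR_KNOB) as f:
        return int(f.read().strip())

if __name__ == "__main__":
    mode = aslr_mode()
    if mode != 0:
        print(f"warning: ASLR is enabled (mode {mode}); "
              f"expect higher run-to-run variation")
    else:
        print("ASLR is disabled; address layout is stable across runs")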
On a Linux system, you could do this as root:

    echo 0 > /proc/sys/kernel/randomize_va_space  # to disable
    echo 2 > /proc/sys/kernel/randomize_va_space  # to enable

If anyone still experiences run-to-run variation, I'd suggest reading on. Based on my observations in our labs, many factors can impact performance, including the environment (yes, even room temperature) and hardware components and related factors such as the platform, chipset, memory DIMMs, CPU generation and stepping, BIOS version, and kernel; the list goes on and on. That being said, would it be helpful if we worked together to identify the root cause, be it software or anything else? We could start with a specific micro-benchmark, with a specific goal for what to measure. After that, or in parallel once some baseline work is done, we could focus on the measurement process and methodology.

Is this helpful?

Thanks,
Peter

-----Original Message-----
From: Speed [mailto:speed-bounces+peter.xihong.wang=intel....@python.org] On Behalf Of Victor Stinner
Sent: Wednesday, March 15, 2017 11:11 AM
To: Antoine Pitrou <solip...@pitrou.net>
Cc: speed@python.org
Subject: Re: [Speed] Median +- MAD or Mean +- std dev?

2017-03-15 18:11 GMT+01:00 Antoine Pitrou <solip...@pitrou.net>:
> I would say keep it simple. mean/stddev is informative enough, no
> need to add or maintain options of dubious utility.

Ok. I added a message suggesting the use of perf stats to analyze results. Example of warnings for a benchmark result considered unstable, Python startup time measured by the new bench_command() function:

---
$ python3 -m perf show startup1.json
WARNING: the benchmark result may be unstable
* the standard deviation (6.08 ms) is 16% of the mean (39.1 ms)
* the minimum (23.6 ms) is 40% smaller than the mean (39.1 ms)

Try to rerun the benchmark with more runs, values and/or loops.
Run 'python3 -m perf system tune' command to reduce the system jitter.
Use perf stats to analyze results, or --quiet to hide warnings.

Median +- MAD: 40.7 ms +- 3.9 ms
---

Statistics of this result:

---
$ python3 -m perf stats startup1.json -q
Total duration: 37.2 sec
Start date: 2017-03-15 18:02:46
End date: 2017-03-15 18:03:27

Raw value minimum: 189 ms
Raw value maximum: 390 ms

Number of runs: 25
Total number of values: 75
Number of values per run: 3
Number of warmups per run: 1
Loop iterations per value: 8

Minimum: 23.6 ms (-42% of the median)
Median +- MAD: 40.7 ms +- 3.9 ms
Mean +- std dev: 39.1 ms +- 6.1 ms
Maximum: 48.7 ms (+20% of the median)
---

Victor
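As a footnote to the Median +- MAD versus Mean +- std dev question above: both summaries are easy to reproduce outside of perf with the standard library. A minimal sketch; the timing values are hypothetical stand-ins for the values perf records in its JSON file:

import statistics

def summarize(values):
    """Return (mean, stdev, median, mad) for a list of timings."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation
    median = statistics.median(values)
    # Median absolute deviation: median of absolute distances to the median.
    mad = statistics.median(abs(v - median) for v in values)
    return mean, stdev, median, mad

# Hypothetical timings in milliseconds, including one low outlier.
timings = [40.1, 40.7, 41.2, 39.8, 40.5, 23.6, 41.0]
mean, stdev, median, mad = summarize(timings)
print(f"Mean +- std dev: {mean:.1f} ms +- {stdev:.1f} ms")
print(f"Median +- MAD:   {median:.1f} ms +- {mad:.1f} ms")

On data like this, the single 23.6 ms outlier inflates the standard deviation while barely moving the MAD, which is exactly the robustness trade-off the thread is weighing.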