Hi All,

I am attaching an image comparing runs of the CALL_METHOD micro-benchmark from 
the old Grand Unified Python Benchmark (GUPB) suite 
(https://hg.python.org/benchmarks), with ASLR enabled and with ASLR disabled.
You can see that the run-to-run variation was reduced significantly: instead of 
data scattering all over the place, there is just one single outlier out of 30 
repeated runs.
Disabling ASLR effectively eliminated most of the variation for this 
micro-benchmark.

On a Linux system, you can toggle this as root:
echo 0 > /proc/sys/kernel/randomize_va_space   # disable ASLR
echo 2 > /proc/sys/kernel/randomize_va_space   # re-enable ASLR (full randomization)
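
For scripted runs, here is a minimal Python sketch of a small helper that does 
the same thing around a benchmark command. It is purely illustrative (it is not 
part of perf or the GUPB suite), and it has to run as root:

import subprocess
from contextlib import contextmanager
from pathlib import Path

ASLR = Path("/proc/sys/kernel/randomize_va_space")

@contextmanager
def aslr_disabled():
    """Disable ASLR for the duration of the block, then restore the old value."""
    old = ASLR.read_text().strip()
    ASLR.write_text("0\n")           # 0 = disabled
    try:
        yield
    finally:
        ASLR.write_text(old + "\n")  # restore previous setting (normally 2 = full randomization)

if __name__ == "__main__":
    # the command below is just a placeholder for whatever benchmark you run
    with aslr_disabled():
        subprocess.run(["python3", "-m", "perf", "timeit", "pass"], check=True)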

If anyone still experiences run-to-run variation, I'd suggest reading on:
Based on my observations in our labs, many factors can affect performance, 
including the environment (yes, even room temperature) and hardware or related 
components such as the platform, chipset, memory DIMMs, CPU generation and 
stepping, BIOS version, and kernel; the list goes on and on.

That being said, would it be helpful if we worked together to identify the root 
cause, be it software or anything else?  We could start with a specific 
micro-benchmark, with a specific goal for what to measure.
After that, or in parallel once some baseline work is done, we could then focus 
on the measurement process and methodology.

Is this helpful?

Thanks,

Peter


 
-----Original Message-----
From: Speed [mailto:speed-bounces+peter.xihong.wang=intel....@python.org] On 
Behalf Of Victor Stinner
Sent: Wednesday, March 15, 2017 11:11 AM
To: Antoine Pitrou <solip...@pitrou.net>
Cc: speed@python.org
Subject: Re: [Speed] Median +- MAD or Mean +- std dev?

2017-03-15 18:11 GMT+01:00 Antoine Pitrou <solip...@pitrou.net>:
> I would say keep it simple.  mean/stddev is informative enough, no 
> need to add or maintain options of dubious utility.

Ok. I added a message suggesting the use of perf stats to analyze results.

Example of warnings for a benchmark result considered unstable (Python startup 
time measured by the new bench_command() function):
---
$ python3 -m perf show startup1.json
WARNING: the benchmark result may be unstable
* the standard deviation (6.08 ms) is 16% of the mean (39.1 ms)
* the minimum (23.6 ms) is 40% smaller than the mean (39.1 ms)

Try to rerun the benchmark with more runs, values and/or loops.
Run 'python3 -m perf system tune' command to reduce the system jitter.
Use perf stats to analyze results, or --quiet to hide warnings.

Median +- MAD: 40.7 ms +- 3.9 ms
---
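
For what it's worth, the two warnings above are just ratios over the collected 
values. The following sketch shows how the same percentages can be reproduced 
from a list of timings; the timings in the example call are made up, and the 
thresholds perf uses to decide when to warn are not shown here:
---
import statistics

def instability_ratios(values_ms):
    """Report std dev as a share of the mean, and how far the minimum is below the mean."""
    mean = statistics.mean(values_ms)
    stdev = statistics.stdev(values_ms)
    minimum = min(values_ms)
    print(f"std dev ({stdev:.2f} ms) is {stdev / mean:.0%} of the mean ({mean:.1f} ms)")
    print(f"minimum ({minimum:.1f} ms) is {(mean - minimum) / mean:.0%} smaller than the mean")

# made-up timings in milliseconds, only to show the output format
instability_ratios([23.6, 36.5, 39.0, 40.7, 42.3, 48.7])
---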

Statistics of this result:
---
$ python3 -m perf stats startup1.json -q
Total duration: 37.2 sec
Start date: 2017-03-15 18:02:46
End date: 2017-03-15 18:03:27
Raw value minimum: 189 ms
Raw value maximum: 390 ms

Number of runs: 25
Total number of values: 75
Number of values per run: 3
Number of warmups per run: 1
Loop iterations per value: 8

Minimum: 23.6 ms (-42% of the median)
Median +- MAD: 40.7 ms +- 3.9 ms
Mean +- std dev: 39.1 ms +- 6.1 ms
Maximum: 48.7 ms (+20% of the median)
---
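
Since the thread is about choosing between the two summaries, here is a small 
sketch of how both can be computed from raw values. Python's statistics module 
has no MAD function, so the MAD (median absolute deviation) is computed by 
hand, and the timings passed in are made up:
---
import statistics

def summarize(values_ms):
    median = statistics.median(values_ms)
    # MAD = median of the absolute deviations from the median
    mad = statistics.median(abs(v - median) for v in values_ms)
    mean = statistics.mean(values_ms)
    stdev = statistics.stdev(values_ms)
    print(f"Median +- MAD:   {median:.1f} ms +- {mad:.1f} ms")
    print(f"Mean +- std dev: {mean:.1f} ms +- {stdev:.1f} ms")

summarize([23.6, 35.9, 38.4, 40.7, 43.0, 45.2, 48.7])
---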

Victor
_______________________________________________
Speed mailing list
Speed@python.org
https://mail.python.org/mailman/listinfo/speed