Hi,

Over the last few months, I have spent a lot of time on microbenchmarks. Probably too much time :-)

I found a great Linux config that gives a much more stable system and reliable microbenchmark results:
https://haypo-notes.readthedocs.org/microbenchmark.html

* isolate some CPU cores
* force the CPU to the performance governor
* disable ASLR
* block IRQs on the isolated CPU cores

With such a Linux config, the system load no longer impacts the benchmark results at all.
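To illustrate the kind of setup the list above describes, here is a small read-only sanity check sketched in Python. It is not part of the notes linked above; the isolated CPU number (7) and the exact sysfs/procfs paths are assumptions that depend on your machine and kernel (the governor file only exists if cpufreq is available).

# check_bench_config.py -- read-only sanity check of the benchmark setup.
# Assumptions: CPU 7 is the isolated core, cpufreq sysfs is present.
ISOLATED_CPU = 7

def read(path):
    with open(path) as f:
        return f.read().strip()

# isolcpus= is a kernel boot parameter, so look at the kernel command line.
print("kernel cmdline:", read("/proc/cmdline"))

# The isolated core should use the "performance" cpufreq governor.
print("governor:", read(
    "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_governor" % ISOLATED_CPU))

# 0 means ASLR is disabled.
print("ASLR:", read("/proc/sys/kernel/randomize_va_space"))

# Default IRQ affinity mask: it should exclude the isolated core.
print("default IRQ affinity:", read("/proc/irq/default_smp_affinity"))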
Over the last few days, I almost lost my mind trying to figure out why a very tiny change in C code made the code up to 8% slower. My main problem was getting reliable benchmark results, since running the same microbenchmark through perf.py gave me "random" results. I ended up running the underlying script bm_call_simple.py directly:

taskset -c 7 ./python ../benchmarks/performance/bm_call_simple.py -n 5 --timer perf_counter

Within a single run, the timing of each loop iteration is very stable. Example:

0.22682707803323865
0.22741253697313368
0.227521265973337
0.22750743699725717
0.22752994997426867
0.22753606992773712
0.22742654103785753
0.22750875598285347
0.22752253606449813
0.22718404198531061

Problem: each new run gives a different result. Example:

* run 1: 0.226...
* run 2: 0.255...
* run 3: 0.248...
* run 4: 0.258...
* etc.

I saw 3 groups of values: ~0.226, ~0.248, ~0.255. I didn't understand how running the same program could give such different results. The answer is the randomization of the Python hash function. Aaaaaaah! The last source of entropy in my microbenchmark!

The performance difference can be seen by forcing a specific hash seed:

PYTHONHASHSEED=2 => 0.254...
PYTHONHASHSEED=1 => 0.246...
PYTHONHASHSEED=5 => 0.228...

Sadly, neither perf.py nor timeit disables hash randomization for me. I hacked perf.py to set PYTHONHASHSEED=0, and magically the results became super stable! Multiple runs of the command:

$ taskset_isolated.py python3 perf.py ../default/python-ref ../default/python -b call_simple --fast

Output:

### call_simple ###
Min: 0.232621 -> 0.247904: 1.07x slower
Avg: 0.232628 -> 0.247941: 1.07x slower
Significant (t=-591.78)
Stddev: 0.00001 -> 0.00010: 13.7450x larger

### call_simple ###
Min: 0.232619 -> 0.247904: 1.07x slower
Avg: 0.232703 -> 0.247955: 1.07x slower
Significant (t=-190.58)
Stddev: 0.00029 -> 0.00011: 2.6336x smaller

### call_simple ###
Min: 0.232621 -> 0.247903: 1.07x slower
Avg: 0.232629 -> 0.247918: 1.07x slower
Significant (t=-5896.14)
Stddev: 0.00001 -> 0.00001: 1.3350x larger

Even with --fast, the result is *very* stable: look at the very small standard deviations. In 3 runs, I got exactly the same "1.07x", and the average timings agree up to 4 digits (+/- 1 on the last digit)! There is no need to use the ultra slow --rigorous option. That option is probably designed to hide the noise of a very unstable system, but with my Linux config it doesn't seem to be needed anymore, at least for this very specific microbenchmark.

Ok, now I can investigate why my change to the C code introduced a performance regression :-D

Victor
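For reference, a minimal sketch of what forcing PYTHONHASHSEED for a benchmark child process can look like. This is not the actual perf.py change; the CPU number and the script path are assumptions taken from the command lines above.

import os
import subprocess

# Copy the current environment and disable hash randomization in the child.
env = dict(os.environ)
env["PYTHONHASHSEED"] = "0"

# Pin the benchmark to the isolated core (7 here, an assumption) and run it
# with a fixed hash seed; the interpreter and script paths are placeholders.
subprocess.check_call(
    ["taskset", "-c", "7",
     "./python", "../benchmarks/performance/bm_call_simple.py",
     "-n", "5", "--timer", "perf_counter"],
    env=env)

Note that PYTHONHASHSEED=0 disables hash randomization entirely; any fixed non-zero seed would also make runs reproducible, just at a different constant timing.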
