On Tue, Apr 26, 2016 at 11:46 AM, Victor Stinner <victor.stin...@gmail.com> wrote:
> Hi,
>
> 2016-04-26 10:56 GMT+02:00 Armin Rigo <ar...@tunes.org>:
>> Hi,
>>
>> On 25 April 2016 at 08:25, Maciej Fijalkowski <fij...@gmail.com> wrote:
>>> The problem with disabled ASLR is that you change the measurement from
>>> a statistical distribution to a single draw from that distribution,
>>> taken repeatedly. There is no way around doing multiple runs and
>>> averaging them.
>>
>> You should mention that it is usually enough to do the following:
>> instead of running once with PYTHONHASHSEED=0, run five or ten times
>> with PYTHONHASHSEED in range(5 or 10). This way you get all the
>> benefits: not-too-long benchmarking, no randomness, but still some
>> statistically relevant sampling.
>
> I guess that the number of runs required to get a nice distribution
> depends on the size of the largest dictionary in the benchmark. I
> mean, the dictionaries that matter for performance.
>
> The best would be to handle this transparently in perf.py: either
> disable all sources of randomness, or run multiple processes to get a
> uniform distribution, rather than having only one sample for one
> specific config. Maybe it could be an option: by default, run multiple
> processes, but have an option to run only one process using
> PYTHONHASHSEED=0.
>
> By the way, timeit has a very similar issue. I'm quite sure that most
> Python developers run "python -m timeit ..." at least 3 times and take
> the minimum. "python -m timeit" could maybe be modified to also spawn
> child processes to get a better distribution, and to display the
> minimum, the average and the standard deviation (not only the minimum)?
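For illustration, a rough sketch of that idea (the benchmark command is a
placeholder, and this is nothing like the actual perf.py or timeit code):
run the benchmark in child processes under a few different hash seeds and
report min, mean and stdev instead of a single number.

    # rough sketch: run a benchmark in child processes with different
    # PYTHONHASHSEED values and report min/mean/stdev across them
    # (the benchmark command below is a placeholder)
    import os
    import statistics
    import subprocess
    import time

    BENCH = ["python", "bm_call_simple.py"]   # placeholder command

    timings = []
    for seed in range(5):
        env = dict(os.environ, PYTHONHASHSEED=str(seed))
        start = time.perf_counter()
        subprocess.run(BENCH, env=env, check=True)
        timings.append(time.perf_counter() - start)

    # a real harness would have each child time itself and report the
    # result, so interpreter startup is not part of the measurement
    print("min   %.6f s" % min(timings))
    print("mean  %.6f s" % statistics.mean(timings))
    print("stdev %.6f s" % statistics.stdev(timings))

Even this naive version gives you a distribution rather than a single point.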
Taking the minimum is a terrible idea anyway; none of the statistical
discussion makes sense if you do that.

> Well, the question is also whether it's a good thing to have such a
> really tiny microbenchmark like bm_call_simple in the Python benchmark
> suite. I spent 2 or 3 days analyzing CPython running bm_call_simple
> with the Linux perf tool, callgrind and cachegrind. I'm still unable to
> understand the link between my changes to the C code and the result.
> IMHO this specific benchmark depends on very low-level things like the
> CPU L1 cache. Maybe bm_call_simple helps in some very specific use
> cases, like trying to make Python function calls faster. But in other
> cases, it can be a source of noise, confusion and frustration...
>
> Victor

Maybe it's just a terrible benchmark (it surely is for PyPy, for example).

_______________________________________________
Speed mailing list
Speed@python.org
https://mail.python.org/mailman/listinfo/speed