STINNER Victor added the comment:

Serhiy Storchaka added the comment:
>> Sorry, I don't understand how running 1 iteration instead of 10 makes the 
>> benchmark less reliable. IMO the reliability is more impacted by the number

> Caches. Not high-level caching that can make the measurement senseless, but 
> low-level caching, for example memory caching, that can cause a small 
> difference (but this difference can be larger than the effect that you 
> measure). On every repetition you first run the setup code, and then run the 
> testing code in loops. After the first loop the memory cache is filled with 
> the used data and the next loops can be faster. On the next repetition, 
> running the setup code can evict this data from the memory cache, and the 
> next loop will need to load it back from slow memory. Thus on every 
> repetition the first loop is slower than the ones that follow. If you run 
> 10 or 100 loops the difference can be negligible, but if you run only one 
> loop, the result can differ by 10% or more.
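The cache effect described above can be observed with the stdlib timeit module itself; here is a minimal sketch (the statement and setup strings are only illustrative, not from the original discussion):

```python
import timeit
import statistics

SETUP = "data = list(range(100_000))"  # illustrative setup, re-run before each repetition
STMT = "sum(data)"                     # illustrative statement under test

# One loop per repetition: every measurement pays the cold-cache cost alone.
single = timeit.repeat(STMT, setup=SETUP, repeat=5, number=1)

# Ten loops per repetition: the first (cold-cache) loop is averaged with
# nine warm-cache loops, so the per-loop times tend to be more stable.
batched = [t / 10 for t in timeit.repeat(STMT, setup=SETUP, repeat=5, number=10)]

print("number=1  per-loop times:", single)
print("number=10 per-loop times:", batched)
print("relative spread (number=1): ",
      statistics.stdev(single) / statistics.mean(single))
print("relative spread (number=10):",
      statistics.stdev(batched) / statistics.mean(batched))
```

On a typical machine the number=1 timings show a noticeably larger relative spread, though the exact numbers depend on the hardware and system load.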

It seems like you give a time budget of less than 20 seconds to timeit
according to one of your previous messages. IMO reliability is
incompatible with a quick timeit command. If you want a reliable
benchmark, you need many more repetitions than just 5. perf uses
20x(1+3) by default (20 processes, each doing 1 warmup plus 3 samples):
it always runs the benchmark once to "warm up", but ignores that
timing. All parameters can be tuned on the command line (number of
processes, warmups, samples, etc.).
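The warmup-then-samples scheme can be mimicked with plain timeit; a sketch, assuming the 1-warmup / 3-samples counts mentioned above (run in a single process here, whereas perf additionally spawns ~20 fresh processes and aggregates their samples):

```python
import timeit

def measure(stmt, setup="pass", warmups=1, samples=3, number=1000):
    """Collect timing samples, discarding initial warmup runs
    (the same idea as perf's per-process warmup)."""
    timer = timeit.Timer(stmt, setup=setup)
    for _ in range(warmups):
        timer.timeit(number=number)  # run to warm caches, ignore the result
    return [timer.timeit(number=number) / number for _ in range(samples)]

# Illustrative use: three per-loop timing samples, warmup discarded.
samples = measure("sorted(range(1000))")
print("min:", min(samples), "mean:", sum(samples) / len(samples))
```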

Well, I'm not really interested in timeit in the stdlib anymore, since
it seems impossible to make significant enhancements without
bikeshedding. So I let you revert my change if you consider that it
makes timeit less reliable.


Python tracker <>