Serhiy Storchaka added the comment:
> Sorry, I don't understand how running 1 iteration instead of 10 makes the
> benchmark less reliable. IMO the reliability is more impacted by the number
> of repetitions (-r). I changed the default from 3 to 5 repetitions, so
> timeit should be *more* reliable in Python 3.7 than 3.6.
Caches. Not high-level caching that can make the measurement senseless, but
low-level caching, for example memory caching, that can cause a small
difference (though this difference can be larger than the effect that you
measure). On every repetition you first run the setup code, and then run the
tested code in loops. After the first loop the memory cache is filled with the
used data, so the following loops can be faster. On the next repetition,
running the setup code can evict this data from the memory cache, and the next
loop will need to load it back from slow memory. Thus on every repetition the
first loop is slower than the ones that follow. If you run 10 or 100 loops the
difference can be negligible, but if you run only one loop, the result can
differ by 10% or more.
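A rough sketch of this effect with the timeit module (illustrative setup and
statement chosen here for the example; the actual numbers depend on the
machine):

```python
import timeit

# Each repetition re-runs the setup, which can evict the measured data from
# CPU caches, so the first loop of every repetition pays a cache-refill cost.
setup = "data = list(range(100000))"
stmt = "sum(data)"

# One loop per repetition: every timing includes the cold-cache first pass.
single = timeit.repeat(stmt, setup, number=1, repeat=5)

# Many loops per repetition: the cold first pass is averaged over 100 loops,
# so the per-loop times are much closer to each other.
averaged = [t / 100 for t in timeit.repeat(stmt, setup, number=100, repeat=5)]
```

With number=1 the spread between repetitions is typically visibly larger than
with number=100, which is exactly the reliability loss described above.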
> $ python3.6 -m timeit 'pass'
> 100000000 loops, best of 3: 0.0339 usec per loop
This is a senseless example. 0.0339 usec is not the time of executing "pass",
it is the overhead of the iteration itself. You can't use timeit for measuring
the performance of code that takes such a small time; you just can't get a
reliable result for it. Even for code that takes an order of magnitude longer,
the result is not very reliable. Thus there is no need to worry about timings
much less than a microsecond.
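To sketch that point in code: the per-iteration overhead can be estimated by
timing "pass" itself, and any statement whose timing is close to that overhead
cannot be measured meaningfully (the statement 'x = 1' below is just an
illustrative example of such a tiny workload):

```python
import timeit

# Timing 'pass' measures only timeit's per-iteration loop overhead.
n = 1000000
overhead = min(timeit.repeat('pass', number=n, repeat=5)) / n

# A trivial statement: its measured time is dominated by that same overhead,
# so the difference between the two numbers is mostly noise.
trivial = min(timeit.repeat('x = 1', number=n, repeat=5)) / n
```

Both numbers come out in the tens of nanoseconds on typical hardware, which is
why comparing them tells you nothing about the cost of the statement itself.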
Python tracker <rep...@bugs.python.org>