Serhiy Storchaka added the comment:

> Sorry, I don't understand how running 1 iteration instead of 10 makes the 
> benchmark less reliable. IMO the reliability is more impacted by the number 
> of repetitions (-r). I changed the default from 3 to 5 repetitions, so 
> timeit should be *more* reliable in Python 3.7 than 3.6.

Caches. Not high-level caching that would make the measurement meaningless, 
but low-level caching, for example memory caching, which can cause a small 
difference (though this difference can be larger than the effect you are 
measuring). On every repetition you first run the setup code, and then run the 
tested code in a loop. After the first loop the memory cache is filled with 
the data used, so the following loops can be faster. On the next repetition, 
running the setup code can evict this data from the memory cache, and the 
first loop will need to load it back from slow memory. Thus on every 
repetition the first loop is slower than the following ones. If you run 10 or 
100 loops the difference can be negligible, but if you run only one loop, the 
result can differ by 10% or more.
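
A rough way to look at this effect is to compare the best per-loop time for 
one loop per repetition against many loops per repetition. The sketch below is 
only an illustration: the statement, the setup code and the helper name 
best_per_loop are my own assumptions, and the size of the difference (if any) 
depends on the machine and the workload.

```python
import timeit

def best_per_loop(stmt, setup, loops, repeat=5):
    """Best per-loop time in seconds (hypothetical helper for this sketch).

    Each repetition runs `setup` once and then `stmt` `loops` times,
    which is what the timeit module does for every repetition.
    """
    timer = timeit.Timer(stmt, setup=setup)
    totals = timer.repeat(repeat=repeat, number=loops)
    return min(totals) / loops

# Assumed example workload: the setup allocates data that the statement reads.
setup = "data = list(range(10000))"
stmt = "sum(data)"

one = best_per_loop(stmt, setup, loops=1)
many = best_per_loop(stmt, setup, loops=100)

print(f"1 loop per repetition:    {one * 1e6:.2f} usec per loop")
print(f"100 loops per repetition: {many * 1e6:.2f} usec per loop")
```

With one loop, every measured run directly follows the setup code, so nothing 
amortizes the cost of the first access to the data.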

> $ python3.6 -m timeit 'pass'
> 100000000 loops, best of 3: 0.0339 usec per loop

This is a meaningless example. 0.0339 usec is not the time of executing 
"pass", it is the overhead of the iteration. You can't use timeit for 
measuring the performance of code that takes so little time. You just can't 
get a reliable result for it. Even for code that takes an order of magnitude 
longer the result is not very reliable. Thus there is no need to worry about 
timings much less than 1 usec.
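
To see how much of such a timing is loop overhead, one can time "pass" next to 
a very small statement. This is only a sketch; the statement "x + 1" and the 
loop count are arbitrary choices of mine, and the numbers will vary between 
machines and Python versions.

```python
import timeit

loops = 1_000_000

# "pass" does no work, so this mostly measures the timing loop itself.
empty = min(timeit.repeat("pass", repeat=5, number=loops)) / loops

# A tiny statement (assumed example); its timing includes the same overhead.
tiny = min(timeit.repeat("x + 1", setup="x = 1", repeat=5, number=loops)) / loops

print(f"pass:  {empty * 1e6:.4f} usec per loop")
print(f"x + 1: {tiny * 1e6:.4f} usec per loop")
```

If the two numbers are of the same order, most of what is being measured for 
the small statement is the iteration, not the statement.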

