Serhiy Storchaka <storchaka+cpyt...@gmail.com> added the comment:

> ISTM that if a generator produces so much data that it is infeasible to fit 
> in memory, then it will also take a long time to loop over it and generate a 
> random value for each entry.

Good point!

$ ./python -m timeit -s 'from random import sample as s' 's(range(10**6), 50)'
10000 loops, best of 5: 25.6 usec per loop
$ ./python -m timeit -s 'from random import sample as s' 's(list(range(10**6)), 50)'
10 loops, best of 5: 31.5 msec per loop
$ ./python -m timeit -s 'from random import reservoir_sample as s' 's(range(10**6), 50)'
1 loop, best of 5: 328 msec per loop

$ ./python -m timeit -s 'from random import sample as s' 's(range(10**8), 50)'
10000 loops, best of 5: 26.9 usec per loop
$ ./python -m timeit -s 'from random import sample as s' 's(list(range(10**8)), 50)'
1 loop, best of 5: 3.41 sec per loop
$ ./python -m timeit -s 'from random import reservoir_sample as s' 's(range(10**8), 50)'
1 loop, best of 5: 36.5 sec per loop
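
For reference, reservoir_sample() here is the function proposed in this
issue, not an existing stdlib API. A minimal sketch of the classic
reservoir sampling (Algorithm R) it would presumably implement: the
randrange() call on every item past the first k is what the timings
above are paying for, whereas sample() on a range only needs k random
index lookups, since a range supports len() and O(1) indexing.

import random

def reservoir_sample(iterable, k):
    """Uniform sample of k items from an iterable of unknown length
    (classic Algorithm R).  Note the randrange() call on every item
    past the first k; that per-item cost dominates the timings above."""
    it = iter(iterable)
    reservoir = []
    # Fill the reservoir with the first k items.
    for _ in range(k):
        try:
            reservoir.append(next(it))
        except StopIteration:
            raise ValueError("sample larger than population")
    # Each later item at index i replaces a reservoir slot
    # with probability k/(i+1), which keeps the sample uniform.
    for i, item in enumerate(it, start=k):
        j = random.randrange(i + 1)
        if j < k:
            reservoir[j] = item
    return reservoir

Any variant that guarantees uniformity for input of unknown length has
to touch every item, so the linear cost is inherent to the technique,
not an artifact of this particular sketch.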

It is possible that a generator produces not so many items, but each
item takes so much memory that the whole sequence does not fit in
memory. But I suppose the generation time of such larger items will be
proportionally larger, so reservoir_sample() will be just as slow.

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue37682>
_______________________________________