New submission from Thomas Dybdahl Ahle <lob...@gmail.com>:

Given a generator `f()` we can use `random.sample(list(f()), 10)` to get a 
uniform sample of the values generated.
This is fine, and fast, as long as `list(f())` easily fits in memory.
However, if it doesn't, one has to implement the reservoir sampling algorithm 
as a pure Python function, which is much slower and not so easy to get right.
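
For reference, here is a minimal sketch of what such a pure Python function 
might look like, using the classic Algorithm R. The name `reservoir_sample` 
and its `rng` parameter are hypothetical, chosen only for illustration; they 
are not part of the `random` module.

```python
import random
from itertools import islice


def reservoir_sample(iterable, k, rng=random):
    """Return k items chosen uniformly at random from an iterable.

    Hypothetical helper (Algorithm R); only k items are held in memory.
    """
    it = iter(iterable)
    # Fill the reservoir with the first k items.
    reservoir = list(islice(it, k))
    if len(reservoir) < k:
        raise ValueError("Sample larger than population")
    # Item number i (0-based) replaces a random slot with probability k / (i + 1).
    for i, item in enumerate(it, start=k):
        j = rng.randrange(i + 1)  # uniform integer in [0, i]
        if j < k:
            reservoir[j] = item
    return reservoir


# Usage: sample 10 values from a generator without materializing it.
sample = reservoir_sample((x * x for x in range(1_000_000)), 10)
```

Note that the items come back in reservoir order, not in a uniformly random 
order, so the result would need a `random.shuffle` if the order matters.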

It seems that having a fast reservoir sampling implementation in 
`random.sample` to use for iterators would both be useful and make the API 
more predictable.

Currently, when passed an iterator, `random.sample` raises 
`TypeError: Population must be a sequence or set.`
This is inconsistent with most of the standard library, which accepts lists 
and iterators transparently.
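
A small reproduction of the current behaviour, for illustration. The 
generator `f` here is a stand-in, and the exact wording of the error message 
may differ between Python versions, so the snippet only checks the exception 
type.

```python
import random


def f():
    # Stand-in generator; any iterator without a length behaves the same way.
    yield from range(100)


try:
    random.sample(f(), 10)
except TypeError as exc:
    error = exc  # random.sample rejects the generator
else:
    error = None

print(type(error).__name__, error)

# The workaround mentioned above: materialize the whole population first.
workaround = random.sample(list(f()), 10)
```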

I apologize if this enhancement has already been discussed.
I wasn't able to find it.
If wanted, I can write up a pull request.
I believe questions like this one: 
https://stackoverflow.com/questions/12581437/python-random-sample-with-a-generator-iterable-iterator
make it clear that such functionality is wanted and non-obvious.

----------
components: Library (Lib)
messages: 348445
nosy: thomasahle
priority: normal
severity: normal
status: open
title: random.sample should support iterators
type: enhancement
versions: Python 2.7, Python 3.5, Python 3.6, Python 3.7, Python 3.8, Python 3.9

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue37682>
_______________________________________
