[Python-ideas] Re: Warn when iterating over an already exhausted generator

BoppreH via Python-ideas Tue, 13 Jun 2023 14:03:31 -0700

> In close to 10 years of experience with python I have never encountered 
> anything like this.


Here's a small selection of the StackOverflow questions from people who 
encountered this exact issue:

https://stackoverflow.com/questions/25336726/why-cant-i-iterate-twice-over-the-same-iterator-how-can-i-reset-the-iterator
https://stackoverflow.com/questions/10255273/iterating-on-a-file-doesnt-work-the-second-time?noredirect=1&lq=1
https://stackoverflow.com/questions/3906137/why-cant-i-call-read-twice-on-an-open-file
https://stackoverflow.com/questions/17777219/zip-variable-empty-after-first-use
https://stackoverflow.com/questions/42246819/loop-over-results-from-path-glob-pathlib
https://stackoverflow.com/questions/21715268/list-returned-by-map-function-disappears-after-one-use
https://stackoverflow.com/questions/14637154/performing-len-on-list-of-a-zip-object-clears-zip
https://stackoverflow.com/questions/44420135/filter-object-becomes-empty-after-iteration

Note that questions usually get few votes, and "what's wrong with my code" 
questions are especially poorly received, so getting even a couple of votes is 
a strong signal. The questions above range from 10 to 124 (!) votes, and have a 
combined 250k+ views.

These are the people I'd like to help.

> If you could give a full real-life scenario, then it might expose the problem 
> (if it exists) better.

Open a log file, count the number of lines, then find both the longest "error" 
entry and the number of unique "error" entries. Implemented in the most obvious 
way I can, using builtin functions, it has *two* such bugs (reusing the 
exhausted "f" and the exhausted "error_lines").

import re
error_regex = re.compile('^ERROR: ')

with open('logs.txt') as f:
    n_lines = len(list(f))
    error_lines = filter(error_regex.match, f)
    longest_error = max(error_lines, key=len, default='')
    n_unique_errors = len(set(error_lines))

print(f'{n_lines=}\n{longest_error=}\n{n_unique_errors=}')


Is it hard to fix? No, not at all, just store "list(f)" and replace "filter" with 
a longer list comprehension. Is it easy to spot? For an experienced developer, 
in this short example, with all the parts introduced together, yes. But having 
a natural solution silently give wrong answers is dangerous. At least having a 
warning would break the false sense of security.
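
For reference, a minimal sketch of that fix. The original snippet reports the 
right n_lines but longest_error='' and n_unique_errors=0, because "f" is 
already exhausted when "filter" reads from it. The function name 
"summarize_log" and the in-memory sample data are my own stand-ins for 
illustration (the sample replaces logs.txt so the snippet runs anywhere):

```python
import io
import re

error_regex = re.compile('^ERROR: ')

def summarize_log(f):
    # Materialize once: a file object is a one-shot iterator.
    lines = list(f)
    n_lines = len(lines)
    # A list, unlike a lazy filter object, can be traversed repeatedly.
    error_lines = [line for line in lines if error_regex.match(line)]
    longest_error = max(error_lines, key=len, default='')
    n_unique_errors = len(set(error_lines))
    return n_lines, longest_error, n_unique_errors

# Hypothetical sample data standing in for logs.txt.
sample = io.StringIO(
    'INFO: started\n'
    'ERROR: disk full\n'
    'ERROR: disk full\n'
    'ERROR: connection to database lost\n'
)
n_lines, longest_error, n_unique_errors = summarize_log(sample)
print(f'{n_lines=}\n{longest_error=}\n{n_unique_errors=}')
```

The trade-off is memory: the whole file is held in a list, which is fine for 
small logs but is exactly what lazy iterators are meant to avoid.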

> If I wanted sorted numbers, then ValueError wouldn’t help, because I do not 
> get sorted numbers.

I do want sorted numbers, but what can Python do in the face of broken code? 
There's a reason it raises errors for 1/0, str.invalid, and len(None). It's not 
"helpful" to the program, but it stops execution from continuing with a bad 
state.

I understand that backwards compatibility will probably prevent us from raising 
a new error. But a warning could help a lot of people.

I'm tempted to patch the Python interpreter and test some popular packages, to 
verify if doing this on purpose is as rare as I think it is.
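
Short of patching the interpreter, the idea can be approximated in pure Python. 
This is only a rough sketch under my own assumptions: the class name 
"WarnOnReuse" and the choice of RuntimeWarning are hypothetical, not part of 
any existing or proposed API. It wraps an iterator, remembers when 
StopIteration has been seen, and warns if iteration is started again:

```python
import warnings

class WarnOnReuse:
    """Hypothetical wrapper: warn when an exhausted iterator is iterated again."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._exhausted = False

    def __iter__(self):
        # A new loop is starting; warn if a previous one ran to the end.
        if self._exhausted:
            warnings.warn('iterating over an already exhausted iterator',
                          RuntimeWarning, stacklevel=2)
        return self

    def __next__(self):
        try:
            return next(self._it)
        except StopIteration:
            # Remember that the underlying iterator has run dry.
            self._exhausted = True
            raise

strings = WarnOnReuse(filter(bool, ['aa', '', 'bbb', 'c']))
longest = max(strings, key=len)       # first pass consumes the iterator

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    n_unique = len(set(strings))      # second pass: warns, result is still 0
```

The wrapper only notices exhaustion when an earlier consumer ran all the way to 
StopIteration, so a loop that breaks early and is resumed later does not 
trigger the warning.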

On Tue, Jun 13, 2023, at 6:50 PM, Dom Grigonis wrote:
> In close to 10 years of experience with python I have never encountered 
> anything like this.
> 
> If I need to use a list later I never do ANY assignments to it. Why would I?
> 
> In the last example I would:
> ```
> strings = ['aa', '', 'bbb', 'c']
> longest = max(filter(bool, strings), key=len)
> n_unique = len(set(strings))
> ```
> 
> And in initial example I don’t see why would I ever do this. It is very 
> unclear what is the scenario here:
> ```???
> numbers = (i for i in range(5))
> assert 5 not in numbers
> sorted(numbers)
> ```
> 1. If I wanted sorted numbers, then ValueError wouldn’t help, because I do 
> not get sorted numbers.
> 2. If I wanted unmodified list and if it was modified then it is an error, 
> your solution doesn’t work either.
> 3. If sorting is ok only on non-empty iterator, then just `assert sorted` 
> after sorting.
> 
> If you could give a full real-life scenario, then it might expose the problem 
> (if it exists) better.
> "There should be one-- and preferably only one --obvious way to do it."
> 
> There is either: something to be improved or you are not using that "one 
> obvious" way.
> 
>> On 13 Jun 2023, at 18:05, BoppreH via Python-ideas <python-ideas@python.org> 
>> wrote:
>> 
>> @ChrisA: Shadowing "iter()" would only help with Barry's example.
>> 
>> @Jonathan: Updating documentation is helpful, but I find an automated check 
>> better. Too often the most obvious way to accomplish something silently 
>> triggers this behavior:
>> 
>> strings = ['aa', '', 'bbb', 'c']
>> strings = filter(bool, strings) # Adding this step makes n_unique always 0.
>> longest = max(strings, key=len)
>> n_unique = len(set(strings))
>> 
>> I feel like a warning here would save time and prevent bugs, and that my 
>> is_exhausted proposal, if implemented directly in the generators, is an easy 
>> way to accomplish this.
>> 
>> And I have to say I'm surprised by the responses. Does nobody else hit bugs 
>> like this and wish they were automatically detected? To be clear, raising 
>> ValueError is just an example; logging a warning would already be helpful, 
>> like Go's race condition detector.
>> 
>> 
>> --
>> BoppreH
>> _______________________________________________
>> Python-ideas mailing list -- python-ideas@python.org
>> To unsubscribe send an email to python-ideas-le...@python.org
>> https://mail.python.org/mailman3/lists/python-ideas.python.org/
>> Message archived at 
>> https://mail.python.org/archives/list/python-ideas@python.org/message/KWBFRIK4AYKSRG3FZCGYXFQ6ER7TWL3H/
>> Code of Conduct: http://python.org/psf/codeofconduct/

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/TDM7X76DDQDNEVNW537OEUJTQ2QB6SFS/
Code of Conduct: http://python.org/psf/codeofconduct/
