On Wed, Oct 19, 2016 at 7:07 PM, Terry Reedy <tjre...@udel.edu> wrote:
> On 10/19/2016 12:38 AM, Nathaniel Smith wrote:
>
>> I'd like to propose that Python's iterator protocol be enhanced to add
>> a first-class notion of completion / cleanup.
>
> With respect to the standard iterator protocol, a very solid -1 from me.
> (I leave commenting specifically on __aiterclose__ to Yury.)
>
> 1. I consider the introduction of iterables and the new iterator protocol
> in 2.2, and their gradual replacement of lists in many situations, to be
> the greatest enhancement to Python since 1.3 (my first version). They are,
> to me, one of Python's greatest features, and the minimal nature of the
> protocol is an essential part of what makes them great.
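For concreteness, the "minimal nature of the protocol" being praised here really is minimal: a complete, working iterator needs only __iter__ and __next__. A quick illustrative sketch (the class name is made up):

```python
class Countdown:
    """A complete iterator under today's minimal protocol:
    just __iter__ and __next__, nothing else required."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n + 1

print(list(Countdown(3)))  # -> [3, 2, 1]
```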
Minimalism for its own sake isn't really a core Python value, and in
any case the minimalism ship has kinda sailed -- we effectively
already have send/throw/close as optional parts of the protocol
(they're most strongly associated with generators, but you're free to
add them to your own iterators, and e.g. 'yield from' will happily
work with that). This proposal is basically "we formalize and start
automatically calling the 'close' methods that are already there".

> 2. I think you greatly underestimate the negative impact, just as we did
> with changing str is bytes to str is unicode. The change itself, embodied
> in for loops, will break most non-trivial programs. You yourself note
> that there will have to be pervasive changes in the stdlib just to begin
> fixing the breakage.

The long-ish list of stdlib changes is about enabling the feature
everywhere, not about fixing backwards incompatibilities.

It is an important question, though, what programs will break and how
badly. To try to get a better handle on it, I've been playing a bit
with an instrumented version of CPython that logs whenever the same
iterator is passed to multiple 'for' loops. I'll write up the results
in more detail, but the summary so far is that there seem to be ~8
places in the stdlib that would need preserve() calls added, and ~3 in
django. Maybe 2-3 hours and 1 hour of work respectively to fix? It's
not a perfect measure, and the cost certainly isn't zero, but it's at
a completely different order of magnitude than the str changes. Among
other things, this is a transition that allows for gradual opt-in via
a __future__ import, plus fine-grained warnings pointing you at what
you need to fix -- neither of which was possible for str->unicode.

> 3. Though perhaps common for what you do, the need for the change is
> extremely rare in the overall Python world. Iterators depending on an
> external resource are rare (< 1%, I would think). Incomplete iteration
> is also rare (also < 1%, I think).
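To make the "'yield from' will happily work with that" claim concrete, here's a sketch (class and function names are made up for illustration) of a hand-written, non-generator iterator with an optional close() method, which PEP 380 delegation already forwards to:

```python
# A hand-written iterator that opts in to the *optional* close() part
# of the protocol -- no generator machinery involved.
class CloseableCounter:
    def __init__(self, n):
        self.n = n
        self.i = 0
        self.closed = False

    def __iter__(self):
        return self

    def __next__(self):
        if self.i >= self.n:
            raise StopIteration
        self.i += 1
        return self.i

    def close(self):
        # Optional method: not required by the iterator protocol, but
        # 'yield from' (PEP 380) forwards close() to the subiterator.
        self.closed = True

def delegate(it):
    yield from it

counter = CloseableCounter(10)
gen = delegate(counter)
print(next(gen))       # -> 1
gen.close()            # PEP 380 forwards this to counter.close()
print(counter.closed)  # -> True
```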
> And resources do not always need to be released immediately.

This could equally well be an argument that the change is fine -- e.g.
if you're always doing complete iteration, or just iterating over
lists and stuff, then it literally doesn't affect you at all either
way...

> 4. Previous proposals to officially augment the iterator protocol, even
> with optional methods, have been rejected, and I think this one should be
> too.
>
> a. Add .__len__ as an option. We added __length_hint__, which an iterator
> may implement, but which is not part of the iterator protocol. It is also
> ignored by bool().
>
> b., c. Add __bool__ and/or peek(). I posted a LookAhead wrapper class
> that implements both for most any iterable. I suspect that it is rarely
> used.
>
>> def read_newline_separated_json(path):
>>     with open(path) as file_handle:  # <-- with block
>>         for line in file_handle:
>>             yield json.loads(line)
>
> One problem with passing paths around is that it makes the receiving
> function hard to test. I think functions should at least optionally take
> an iterable of lines, and make the open part optional. But then closing
> should also be conditional.

Sure, that's all true, but this is the problem with tiny documentation
examples :-). The point here was to explain the surprising interaction
between generators and with blocks in the simplest way, not to
demonstrate the ideal solution to the problem of reading
newline-separated JSON.

Everything you want is still doable in a post-__iterclose__ world --
in particular, if you do

    for doc in read_newline_separated_json(lines_generator()):
        ...

then both iterators will be closed when the for loop exits. But if you
want to re-use the lines_generator, just write:

    it = lines_generator()
    for doc in read_newline_separated_json(preserve(it)):
        ...
    for more_lines in it:
        ...

> If the combination of 'with', 'for', and 'yield' do not work together,
> then do something else, rather than changing the meaning of 'for'.
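For anyone who hasn't read the original proposal: preserve() doesn't exist in any shipping Python, but its intended behavior can be sketched as a plain wrapper. (__iterclose__ is the hypothetical method the proposal would add; nothing calls it automatically today, so the demo below invokes it by hand.)

```python
# Sketch of what the proposed preserve() wrapper might look like:
# forward iteration, but make the close-at-end-of-for-loop a no-op so
# the underlying iterator survives the loop and can be re-used.
class preserve:
    def __init__(self, iterable):
        self._it = iter(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)

    def __iterclose__(self):
        pass  # deliberately do NOT close the underlying iterator

# No interpreter support yet, so exercise the intended behavior by hand:
it = iter(range(5))
p = preserve(it)
print(next(p))     # -> 0
p.__iterclose__()  # under the proposal, 'for' would call this on exit
print(next(it))    # -> 1: the underlying iterator is still usable
```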
> Moving responsibility for closing the file from 'with' to 'for' makes
> 'with' pretty useless, while overloading 'for' with something that is
> rarely needed. This does not strike me as the right solution to the
> problem.
>
>> for document in read_newline_separated_json(path):  # <-- outer for loop
>>     ...
>
> If the outer loop determines when the file should be closed, then why not
> open it there? What fails with
>
>     try:
>         lines = open(path)
>         gen = read_newline_separated_json(lines)
>         for doc in gen:
>             do_something(doc)
>     finally:
>         lines.close()
>         # and/or gen.throw(...) to stop the generator.

Sure, that works in this trivial case, but they aren't all trivial
:-). See the example from my first email about a WSGI-like interface
where response handlers are generators: in that use case, your
suggestion that we avoid all resource management inside generators
would translate to: "webapps can't open files". (Or database
connections, proxy requests, ... -- or at least, can't hold them open
while streaming out response data.)

Or, sticking to concrete examples, here's a toy-but-plausible
generator where the put-the-with-block-outside strategy seems rather
difficult to implement:

    # Yields all lines in all files in 'directory' that contain the
    # substring 'needle'
    def recursive_grep(directory, needle):
        for dirpath, _, filenames in os.walk(directory):
            for filename in filenames:
                with open(os.path.join(dirpath, filename)) as file_handle:
                    for line in file_handle:
                        if needle in line:
                            yield line

-n

--
Nathaniel J. Smith -- https://vorpus.org
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/