This is a very interesting proposal. I just wanted to share something I found in my quick search:
http://stackoverflow.com/questions/14797930/python-custom-iterator-close-a-file-on-stopiteration

Could you explain why the accepted answer there doesn't address this issue?

    class Parse(object):
        """A generator that iterates through a file"""
        def __init__(self, path):
            self.path = path

        def __iter__(self):
            with open(self.path) as f:
                yield from f

Best,
Neil

On Wednesday, October 19, 2016 at 12:39:34 AM UTC-4, Nathaniel Smith wrote:
>
> Hi all,
>
> I'd like to propose that Python's iterator protocol be enhanced to add a first-class notion of completion / cleanup.
>
> This is mostly motivated by thinking about the issues around async generators and cleanup. Unfortunately even though PEP 525 was accepted I found myself unable to stop pondering this, and the more I've pondered the more convinced I've become that the GC hooks added in PEP 525 are really not enough, and that we'll regret it if we stick with them, or at least with them alone :-/. The strategy here is pretty different -- it's an attempt to dig down and make a fundamental improvement to the language that fixes a number of long-standing rough spots, including async generators.
>
> The basic concept is relatively simple: just adding a '__iterclose__' method that 'for' loops call upon completion, even if that's via break or exception. But, the overall issue is fairly complicated + iterators have a large surface area across the language, so the text below is pretty long. Mostly I wrote it all out to convince myself that there wasn't some weird showstopper lurking somewhere :-). For a first pass discussion, it probably makes sense to mainly focus on whether the basic concept makes sense? The main rationale is at the top, but the details are there too for those who want them.
>
> Also, for *right* now I'm hoping -- probably unreasonably -- to try to get the async iterator parts of the proposal in ASAP, ideally for 3.6.0 or 3.6.1. (I know this is about the worst timing for a proposal like this, which I apologize for -- though async generators are provisional in 3.6, so at least in theory changing them is not out of the question.) So again, it might make sense to focus especially on the async parts, which are a pretty small and self-contained part, and treat the rest of the proposal as a longer-term plan provided for context. The comparison to PEP 525 GC hooks comes right after the initial rationale.
>
> Anyway, I'll be interested to hear what you think!
>
> -n
>
> ------------------
>
> Abstract
> ========
>
> We propose to extend the iterator protocol with a new ``__(a)iterclose__`` slot, which is called automatically on exit from ``(async) for`` loops, regardless of how they exit. This allows for convenient, deterministic cleanup of resources held by iterators without reliance on the garbage collector. This is especially valuable for asynchronous generators.
>
>
> Note on timing
> ==============
>
> In practical terms, the proposal here is divided into two separate parts: the handling of async iterators, which should ideally be implemented ASAP, and the handling of regular iterators, which is a larger but more relaxed project that can't start until 3.7 at the earliest. But since the changes are closely related, and we probably don't want to end up with async iterators and regular iterators diverging in the long run, it seems useful to look at them together.
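To make the question above concrete: if I understand the proposal correctly, the accepted answer's ``Parse`` class is exactly the pattern discussed under "Background and motivation" below -- a ``with`` block inside a generator -- so it still leaves the cleanup to the garbage collector whenever the loop exits early. A minimal sketch of the failure mode (``data.txt`` is just a hypothetical input file):

    class Parse(object):
        """A generator-based iterable that reads a file lazily."""
        def __init__(self, path):
            self.path = path

        def __iter__(self):
            with open(self.path) as f:
                yield from f

    def first_line(path):
        for line in Parse(path):
            return line  # leave the loop early; the generator created by
                         # __iter__ is left suspended inside its `with` block

    first_line("data.txt")

On CPython, reference counting usually collects the suspended generator right away, so the ``with`` block happens to fire promptly; on PyPy or Jython the file can stay open for an arbitrarily long time, and in any case the cleanup (and any exception it raises) happens outside the caller's control.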
>
>
> Background and motivation
> =========================
>
> Python iterables often hold resources which require cleanup. For example: ``file`` objects need to be closed; the `WSGI spec <https://www.python.org/dev/peps/pep-0333/>`_ adds a ``close`` method on top of the regular iterator protocol and demands that consumers call it at the appropriate time (though forgetting to do so is a `frequent source of bugs <http://blog.dscpl.com.au/2012/10/obligations-for-calling-close-on.html>`_); and PEP 342 (based on PEP 325) extended generator objects to add a ``close`` method to allow generators to clean up after themselves.
>
> Generally, objects that need to clean up after themselves also define a ``__del__`` method to ensure that this cleanup will happen eventually, when the object is garbage collected. However, relying on the garbage collector for cleanup like this causes serious problems in at least two cases:
>
> - In Python implementations that do not use reference counting (e.g. PyPy, Jython), calls to ``__del__`` may be arbitrarily delayed -- yet many situations require *prompt* cleanup of resources. Delayed cleanup produces problems like crashes due to file descriptor exhaustion, or WSGI timing middleware that collects bogus times.
>
> - Async generators (PEP 525) can only perform cleanup under the supervision of the appropriate coroutine runner. ``__del__`` doesn't have access to the coroutine runner; indeed, the coroutine runner might be garbage collected before the generator object. So relying on the garbage collector is effectively impossible without some kind of language extension. (PEP 525 does provide such an extension, but it has a number of limitations that this proposal fixes; see the "alternatives" section below for discussion.)
>
> Fortunately, Python provides a standard tool for doing resource cleanup in a more structured way: ``with`` blocks. For example, this code opens a file but relies on the garbage collector to close it::
>
>     def read_newline_separated_json(path):
>         for line in open(path):
>             yield json.loads(line)
>
>     for document in read_newline_separated_json(path):
>         ...
>
> and recent versions of CPython will point this out by issuing a ``ResourceWarning``, nudging us to fix it by adding a ``with`` block::
>
>     def read_newline_separated_json(path):
>         with open(path) as file_handle:      # <-- with block
>             for line in file_handle:
>                 yield json.loads(line)
>
>     for document in read_newline_separated_json(path):  # <-- outer for loop
>         ...
>
> But there's a subtlety here, caused by the interaction of ``with`` blocks and generators. ``with`` blocks are Python's main tool for managing cleanup, and they're a powerful one, because they pin the lifetime of a resource to the lifetime of a stack frame. But this assumes that someone will take care of cleaning up the stack frame... and for generators, this requires that someone ``close`` them.
>
> In this case, adding the ``with`` block *is* enough to shut up the ``ResourceWarning``, but this is misleading -- the file object cleanup here is still dependent on the garbage collector. The ``with`` block will only be unwound when the ``read_newline_separated_json`` generator is closed. If the outer ``for`` loop runs to completion then the cleanup will happen immediately; but if this loop is terminated early by a ``break`` or an exception, then the ``with`` block won't fire until the generator object is garbage collected.
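As a concrete illustration of that last point (a sketch, assuming a newline-delimited JSON file ``docs.jsonl``): today, the only way to make that ``with`` block unwind deterministically after an early exit is to close the generator explicitly.

    import json

    def read_newline_separated_json(path):
        with open(path) as file_handle:
            for line in file_handle:
                yield json.loads(line)

    gen = read_newline_separated_json("docs.jsonl")
    first = next(gen)   # generator is now suspended inside its `with` block
    gen.close()         # injects GeneratorExit at the yield; only now does
                        # the `with` block unwind and close the file

A bare ``break`` out of a ``for`` loop never makes that ``close()`` call; it just drops a reference and hopes the garbage collector gets there soon.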
>
> The correct solution requires that all *users* of this API wrap every ``for`` loop in its own ``with`` block::
>
>     with closing(read_newline_separated_json(path)) as genobj:
>         for document in genobj:
>             ...
>
> This gets even worse if we consider the idiom of decomposing a complex pipeline into multiple nested generators::
>
>     def read_users(path):
>         with closing(read_newline_separated_json(path)) as gen:
>             for document in gen:
>                 yield User.from_json(document)
>
>     def users_in_group(path, group):
>         with closing(read_users(path)) as gen:
>             for user in gen:
>                 if user.group == group:
>                     yield user
>
> In general if you have N nested generators then you need N+1 ``with`` blocks to clean up 1 file. And good defensive programming would suggest that any time we use a generator, we should assume the possibility that there could be at least one ``with`` block somewhere in its (potentially transitive) call stack, either now or in the future, and thus always wrap it in a ``with``. But in practice, basically nobody does this, because programmers would rather write buggy code than tiresome repetitive code. In simple cases like this there are some workarounds that good Python developers know (e.g. in this simple case it would be idiomatic to pass in a file handle instead of a path and move the resource management to the top level), but in general we cannot avoid the use of ``with``/``finally`` inside of generators, and thus dealing with this problem one way or another. When beauty and correctness fight then beauty tends to win, so it's important to make correct code beautiful.
>
> Still, is this worth fixing? Until async generators came along I would have argued yes, but that it was a low priority, since everyone seems to be muddling along okay -- but async generators make it much more urgent. Async generators cannot do cleanup *at all* without some mechanism for deterministic cleanup that people will actually use, and async generators are particularly likely to hold resources like file descriptors. (After all, if they weren't doing I/O, they'd be generators, not async generators.) So we have to do something, and it might as well be a comprehensive fix to the underlying problem. And it's much easier to fix this now when async generators are first rolling out, than it will be to fix it later.
>
> The proposal itself is simple in concept: add a ``__(a)iterclose__`` method to the iterator protocol, and have (async) ``for`` loops call it when the loop is exited, even if this occurs via ``break`` or exception unwinding. Effectively, we're taking the current cumbersome idiom (``with`` block + ``for`` loop) and merging them together into a fancier ``for``. This may seem non-orthogonal, but makes sense when you consider that the existence of generators means that ``with`` blocks actually depend on iterator cleanup to work reliably, plus experience showing that iterator cleanup is often a desirable feature in its own right.
>
>
> Alternatives
> ============
>
> PEP 525 asyncgen hooks
> ----------------------
>
> PEP 525 proposes a `set of global thread-local hooks managed by new ``sys.{get/set}_asyncgen_hooks()`` functions <https://www.python.org/dev/peps/pep-0525/#finalization>`_, which allow event loops to integrate with the garbage collector to run cleanup for async generators.
> In principle, this proposal and PEP 525 are complementary, in the same way that ``with`` blocks and ``__del__`` are complementary: this proposal takes care of ensuring deterministic cleanup in most cases, while PEP 525's GC hooks clean up anything that gets missed. But ``__aiterclose__`` provides a number of advantages over GC hooks alone:
>
> - The GC hook semantics aren't part of the abstract async iterator protocol, but are instead restricted `specifically to the async generator concrete type <XX find and link Yury's email saying this>`_. If you have an async iterator implemented using a class, like::
>
>       class MyAsyncIterator:
>           async def __anext__(self):
>               ...
>
>   then you can't refactor this into an async generator without changing its semantics, and vice-versa. This seems very unpythonic. (It also leaves open the question of what exactly class-based async iterators are supposed to do, given that they face exactly the same cleanup problems as async generators.) ``__aiterclose__``, on the other hand, is defined at the protocol level, so it's duck-type friendly and works for all iterators, not just generators.
>
> - Code that wants to work on non-CPython implementations like PyPy cannot in general rely on GC for cleanup. Without ``__aiterclose__``, it's more or less guaranteed that developers who develop and test on CPython will produce libraries that leak resources when used on PyPy. Developers who do want to target alternative implementations will either have to take the defensive approach of wrapping every ``for`` loop in a ``with`` block, or else carefully audit their code to figure out which generators might possibly contain cleanup code and add ``with`` blocks around those only. With ``__aiterclose__``, writing portable code becomes easy and natural.
>
> - An important part of building robust software is making sure that exceptions always propagate correctly without being lost. One of the most exciting things about async/await compared to traditional callback-based systems is that instead of requiring manual chaining, the runtime can now do the heavy lifting of propagating errors, making it *much* easier to write robust code. But, this beautiful new picture has one major gap: if we rely on the GC for generator cleanup, then exceptions raised during cleanup are lost. So, again, without ``__aiterclose__``, developers who care about this kind of robustness will either have to take the defensive approach of wrapping every ``for`` loop in a ``with`` block, or else carefully audit their code to figure out which generators might possibly contain cleanup code. ``__aiterclose__`` plugs this hole by performing cleanup in the caller's context, so writing more robust code becomes the path of least resistance.
>
> - The WSGI experience suggests that there exist important iterator-based APIs that need prompt cleanup and cannot rely on the GC, even in CPython. For example, consider a hypothetical WSGI-like API based around async/await and async iterators, where a response handler is an async generator that takes request headers + an async iterator over the request body, and yields response headers + the response body. (This is actually the use case that got me interested in async generators in the first place, i.e. this isn't hypothetical.)
>   If we follow WSGI in requiring that child iterators must be closed properly, then without ``__aiterclose__`` the absolute most minimalistic middleware in our system looks something like::
>
>       async def noop_middleware(handler, request_header, request_body):
>           async with aclosing(handler(request_header, request_body)) as aiter:
>               async for response_item in aiter:
>                   yield response_item
>
>   Arguably in regular code one can get away with skipping the ``with`` block around ``for`` loops, depending on how confident one is that one understands the internal implementation of the generator. But here we have to cope with arbitrary response handlers, so without ``__aiterclose__``, this ``with`` construction is a mandatory part of every middleware.
>
>   ``__aiterclose__`` allows us to eliminate the mandatory boilerplate and an extra level of indentation from every middleware::
>
>       async def noop_middleware(handler, request_header, request_body):
>           async for response_item in handler(request_header, request_body):
>               yield response_item
>
> So the ``__aiterclose__`` approach provides substantial advantages over GC hooks.
>
> This leaves open the question of whether we want a combination of GC hooks + ``__aiterclose__``, or just ``__aiterclose__`` alone. Since the vast majority of generators are iterated over using a ``for`` loop or equivalent, ``__aiterclose__`` handles most situations before the GC has a chance to get involved. The case where GC hooks provide additional value is in code that does manual iteration, e.g.::
>
>     agen = fetch_newline_separated_json_from_url(...)
>     while True:
>         document = await type(agen).__anext__(agen)
>         if document["id"] == needle:
>             break
>     # doesn't do 'await agen.aclose()'
>
> If we go with the GC-hooks + ``__aiterclose__`` approach, this generator will eventually be cleaned up by GC calling the generator ``__del__`` method, which then will use the hooks to call back into the event loop to run the cleanup code.
>
> If we go with the no-GC-hooks approach, this generator will eventually be garbage collected, with the following effects:
>
> - its ``__del__`` method will issue a warning that the generator was not closed (similar to the existing "coroutine never awaited" warning).
>
> - The underlying resources involved will still be cleaned up, because the generator frame will still be garbage collected, causing it to drop references to any file handles or sockets it holds, and then those objects' ``__del__`` methods will release the actual operating system resources.
>
> - But, any cleanup code inside the generator itself (e.g. logging, buffer flushing) will not get a chance to run.
>
> The solution here -- as the warning would indicate -- is to fix the code so that it calls ``__aiterclose__``, e.g. by using a ``with`` block::
>
>     async with aclosing(fetch_newline_separated_json_from_url(...)) as agen:
>         while True:
>             document = await type(agen).__anext__(agen)
>             if document["id"] == needle:
>                 break
>
> Basically in this approach, the rule would be that if you want to manually implement the iterator protocol, then it's your responsibility to implement all of it, and that now includes ``__(a)iterclose__``.
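For what it's worth, here is what "implementing all of it" might look like for a class-based async iterator under this proposal. This is only a sketch: ``open_connection()`` and the connection object's ``readline()``/``close()`` methods are hypothetical stand-ins for whatever I/O library is actually in use.

    import json

    class JsonLinesFromUrl:
        """Hypothetical async iterator that owns a network connection."""

        def __init__(self, url):
            self._url = url
            self._conn = None

        def __aiter__(self):
            return self

        async def __anext__(self):
            if self._conn is None:
                self._conn = await open_connection(self._url)  # hypothetical helper
            line = await self._conn.readline()
            if not line:
                raise StopAsyncIteration
            return json.loads(line)

        async def __aiterclose__(self):
            # Idempotent cleanup: safe to call more than once, and the
            # iterator is not expected to be usable afterwards.
            if self._conn is not None:
                conn, self._conn = self._conn, None
                await conn.close()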
>
> GC hooks add non-trivial complexity in the form of (a) new global interpreter state, (b) a somewhat complicated control flow (e.g., async generator GC always involves resurrection, so the details of PEP 442 are important), and (c) a new public API in asyncio (``await loop.shutdown_asyncgens()``) that users have to remember to call at the appropriate time. (This last point in particular somewhat undermines the argument that GC hooks provide a safe backup to guarantee cleanup, since if ``shutdown_asyncgens()`` isn't called correctly then I *think* it's possible for generators to be silently discarded without their cleanup code being called; compare this to the ``__aiterclose__``-only approach where in the worst case we still at least get a warning printed. This might be fixable.) All this considered, GC hooks arguably aren't worth it, given that the only people they help are those who want to manually call ``__anext__`` yet don't want to manually call ``__aiterclose__``. But Yury disagrees with me on this :-). And both options are viable.
>
>
> Always inject resources, and do all cleanup at the top level
> ------------------------------------------------------------
>
> It was suggested on python-dev (XX find link) that a pattern to avoid these problems is to always pass resources in from above, e.g. ``read_newline_separated_json`` should take a file object rather than a path, with cleanup handled at the top level::
>
>     def read_newline_separated_json(file_handle):
>         for line in file_handle:
>             yield json.loads(line)
>
>     def read_users(file_handle):
>         for document in read_newline_separated_json(file_handle):
>             yield User.from_json(document)
>
>     with open(path) as file_handle:
>         for user in read_users(file_handle):
>             ...
>
> This works well in simple cases; here it lets us avoid the "N+1 ``with`` blocks problem". But unfortunately, it breaks down quickly when things get more complex. Consider if instead of reading from a file, our generator was reading from a streaming HTTP GET request -- while handling redirects and authentication via OAUTH. Then we'd really want the sockets to be managed down inside our HTTP client library, not at the top level. Plus there are other cases where ``finally`` blocks embedded inside generators are important in their own right: db transaction management, emitting logging information during cleanup (one of the major motivating use cases for WSGI ``close``), and so forth. So this is really a workaround for simple cases, not a general solution.
>
>
> More complex variants of __(a)iterclose__
> -----------------------------------------
>
> The semantics of ``__(a)iterclose__`` are somewhat inspired by ``with`` blocks, but context managers are more powerful: ``__(a)exit__`` can distinguish between a normal exit versus exception unwinding, and in the case of an exception it can examine the exception details and optionally suppress propagation. ``__(a)iterclose__`` as proposed here does not have these powers, but one can imagine an alternative design where it did.
>
> However, this seems like unwarranted complexity: experience suggests that it's common for iterables to have ``close`` methods, and even to have ``__exit__`` methods that call ``self.close()``, but I'm not aware of any common cases that make use of ``__exit__``'s full power. I also can't think of any examples where this would be useful.
> And it seems unnecessarily confusing to allow iterators to affect flow control by swallowing exceptions -- if you're in a situation where you really want that, then you should probably use a real ``with`` block anyway.
>
>
> Specification
> =============
>
> This section describes where we want to eventually end up, though there are some backwards compatibility issues that mean we can't jump directly here. A later section describes the transition plan.
>
>
> Guiding principles
> ------------------
>
> Generally, ``__(a)iterclose__`` implementations should:
>
> - be idempotent,
> - perform any cleanup that is appropriate on the assumption that the iterator will not be used again after ``__(a)iterclose__`` is called. In particular, once ``__(a)iterclose__`` has been called then calling ``__(a)next__`` produces undefined behavior.
>
> And generally, any code which starts iterating through an iterable with the intention of exhausting it, should arrange to make sure that ``__(a)iterclose__`` is eventually called, whether or not the iterator is actually exhausted.
>
>
> Changes to iteration
> --------------------
>
> The core proposal is the change in behavior of ``for`` loops. Given this Python code::
>
>     for VAR in ITERABLE:
>         LOOP-BODY
>     else:
>         ELSE-BODY
>
> we desugar to the equivalent of::
>
>     _iter = iter(ITERABLE)
>     _iterclose = getattr(type(_iter), "__iterclose__", lambda obj: None)
>     try:
>         traditional-for VAR in _iter:
>             LOOP-BODY
>         else:
>             ELSE-BODY
>     finally:
>         _iterclose(_iter)
>
> where the "traditional-for statement" here is meant as a shorthand for the classic 3.5-and-earlier ``for`` loop semantics.
>
> Besides the top-level ``for`` statement, Python also contains several other places where iterators are consumed. For consistency, these should call ``__iterclose__`` as well using semantics equivalent to the above. This includes:
>
> - ``for`` loops inside comprehensions
> - ``*`` unpacking
> - functions which accept and fully consume iterables, like ``list(it)``, ``tuple(it)``, ``itertools.product(it1, it2, ...)``, and others.
>
>
> Changes to async iteration
> --------------------------
>
> We also make the analogous changes to async iteration constructs, except that the new slot is called ``__aiterclose__``, and it's an async method that gets ``await``\ed.
>
>
> Modifications to basic iterator types
> -------------------------------------
>
> Generator objects (including those created by generator comprehensions):
>
> - ``__iterclose__`` calls ``self.close()``
> - ``__del__`` calls ``self.close()`` (same as now), and additionally issues a ``ResourceWarning`` if the generator wasn't exhausted. This warning is hidden by default, but can be enabled for those who want to make sure they aren't inadvertently relying on CPython-specific GC semantics.
>
> Async generator objects (including those created by async generator comprehensions):
>
> - ``__aiterclose__`` calls ``self.aclose()``
> - ``__del__`` issues a ``RuntimeWarning`` if ``aclose`` has not been called, since this probably indicates a latent bug, similar to the "coroutine never awaited" warning.
>
> QUESTION: should file objects implement ``__iterclose__`` to close the file? On the one hand this would make this change more disruptive; on the other hand people really like writing ``for line in open(...): ...``, and if we get used to iterators taking care of their own cleanup then it might become very weird if files don't.
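To restate the desugaring in terms of code that runs today (a sketch): for a generator, the implicit ``__iterclose__`` call that the proposal adds to ``for`` loops is essentially what we currently have to spell out with ``contextlib.closing``.

    import json
    from contextlib import closing

    def read_newline_separated_json(path):
        with open(path) as file_handle:
            for line in file_handle:
                yield json.loads(line)

    def first_matching(path, needle):
        # Today's manual spelling of what the proposal would make automatic:
        # the generator is closed even though we leave the loop via `return`,
        # so its `with` block unwinds (and the file closes) immediately.
        with closing(read_newline_separated_json(path)) as gen:
            for document in gen:
                if document["id"] == needle:
                    return document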
>
>
> New convenience functions
> -------------------------
>
> The ``itertools`` module gains a new iterator wrapper that can be used to selectively disable the new ``__iterclose__`` behavior::
>
>     # QUESTION: I feel like there might be a better name for this one?
>     class preserve:
>         def __init__(self, iterable):
>             self._it = iter(iterable)
>
>         def __iter__(self):
>             return self
>
>         def __next__(self):
>             return next(self._it)
>
>         def __iterclose__(self):
>             # Swallow __iterclose__ without passing it on
>             pass
>
> Example usage (assuming that file objects implement ``__iterclose__``)::
>
>     with open(...) as handle:
>         # Iterate through the same file twice:
>         for line in itertools.preserve(handle):
>             ...
>         handle.seek(0)
>         for line in itertools.preserve(handle):
>             ...
>
> The ``operator`` module gains two new functions, with semantics equivalent to the following::
>
>     def iterclose(it):
>         if hasattr(type(it), "__iterclose__"):
>             type(it).__iterclose__(it)
>
>     async def aiterclose(ait):
>         if hasattr(type(ait), "__aiterclose__"):
>             await type(ait).__aiterclose__(ait)
>
> These are particularly useful when implementing the changes in the next section:
>
>
> __iterclose__ implementations for iterator wrappers
> ---------------------------------------------------
>
> Python ships a number of iterator types that act as wrappers around other iterators: ``map``, ``zip``, ``itertools.accumulate``, ``csv.reader``, and others. These iterators should define a ``__iterclose__`` method which calls ``__iterclose__`` in turn on their underlying iterators. For example, ``map`` could be implemented as::
>
>     class map:
>         def __init__(self, fn, *iterables):
>             self._fn = fn
>             self._iters = [iter(iterable) for iterable in iterables]
>
>         def __iter__(self):
>             return self
>
>         def __next__(self):
>             return self._fn(*[next(it) for it in self._iters])
>
>         def __iterclose__(self):
>             for it in self._iters:
>                 operator.iterclose(it)
>
> In some cases this requires some subtlety; for example, `itertools.tee <https://docs.python.org/3/library/itertools.html#itertools.tee>`_ should not call ``__iterclose__`` on the underlying iterator until it has been called on *all* of the clone iterators.
>
>
> Example / Rationale
> -------------------
>
> The payoff for all this is that we can now write straightforward code like::
>
>     def read_newline_separated_json(path):
>         for line in open(path):
>             yield json.loads(line)
>
> and be confident that the file will receive deterministic cleanup *without the end-user having to take any special effort*, even in complex cases. For example, consider this silly pipeline::
>
>     list(map(lambda key: key.upper(),
>              (doc["key"] for doc in read_newline_separated_json(path))))
>
> If our file contains a document where ``doc["key"]`` turns out to be an integer, then the following sequence of events will happen:
>
> 1. ``key.upper()`` raises an ``AttributeError``, which propagates out of the ``map`` and triggers the implicit ``finally`` block inside ``list``.
> 2. The ``finally`` block in ``list`` calls ``__iterclose__()`` on the map object.
> 3. ``map.__iterclose__()`` calls ``__iterclose__()`` on the generator comprehension object.
> 4. This injects a ``GeneratorExit`` exception into the generator comprehension body, which is currently suspended inside the comprehension's ``for`` loop body.
> 5. The exception propagates out of the ``for`` loop, triggering the ``for`` loop's implicit ``finally`` block, which calls ``__iterclose__`` on the generator object representing the call to ``read_newline_separated_json``.
> 6. This injects an inner ``GeneratorExit`` exception into the body of ``read_newline_separated_json``, currently suspended at the ``yield``.
> 7. The inner ``GeneratorExit`` propagates out of the ``for`` loop, triggering the ``for`` loop's implicit ``finally`` block, which calls ``__iterclose__()`` on the file object.
> 8. The file object is closed.
> 9. The inner ``GeneratorExit`` resumes propagating, hits the boundary of the generator function, and causes ``read_newline_separated_json``'s ``__iterclose__()`` method to return successfully.
> 10. Control returns to the generator comprehension body, and the outer ``GeneratorExit`` continues propagating, allowing the comprehension's ``__iterclose__()`` to return successfully.
> 11. The rest of the ``__iterclose__()`` calls unwind without incident, back into the body of ``list``.
> 12. The original ``AttributeError`` resumes propagating.
>
> (The details above assume that we implement ``file.__iterclose__``; if not then add a ``with`` block to ``read_newline_separated_json`` and essentially the same logic goes through.)
>
> Of course, from the user's point of view, this can be simplified down to just:
>
> 1. ``int.upper()`` raises an ``AttributeError``
> 2. The file object is closed.
> 3. The ``AttributeError`` propagates out of ``list``
>
> So we've accomplished our goal of making this "just work" without the user having to think about it.
>
>
> Transition plan
> ===============
>
> While the majority of existing ``for`` loops will continue to produce identical results, the proposed changes will produce backwards-incompatible behavior in some cases. Example::
>
>     def read_csv_with_header(lines_iterable):
>         lines_iterator = iter(lines_iterable)
>         for line in lines_iterator:
>             column_names = line.strip().split("\t")
>             break
>         for line in lines_iterator:
>             values = line.strip().split("\t")
>             record = dict(zip(column_names, values))
>             yield record
>
> This code used to be correct, but after this proposal is implemented will require an ``itertools.preserve`` call added to the first ``for`` loop.
>
> [QUESTION: currently, if you close a generator and then try to iterate over it then it just raises ``Stop(Async)Iteration``, so code that passes the same generator object to multiple ``for`` loops but forgets to use ``itertools.preserve`` won't see an obvious error -- the second ``for`` loop will just exit immediately. Perhaps it would be better if iterating a closed generator raised a ``RuntimeError``? Note that files don't have this problem -- attempting to iterate a closed file object already raises ``ValueError``.]
>
> Specifically, the incompatibility happens when all of these factors come together:
>
> - The automatic calling of ``__(a)iterclose__`` is enabled
> - The iterable did not previously define ``__(a)iterclose__``
> - The iterable does now define ``__(a)iterclose__``
> - The iterable is re-used after the ``for`` loop exits
>
> So the problem is how to manage this transition, and those are the levers we have to work with.
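For concreteness, here's roughly how the ``read_csv_with_header`` example above might be updated once the proposal lands. The ``preserve`` wrapper below is just a local sketch of the proposed ``itertools.preserve``, written so that it also runs harmlessly on today's Python (where ``__iterclose__`` is simply never called):

    class preserve:
        """Sketch of the proposed itertools.preserve wrapper."""

        def __init__(self, iterable):
            self._it = iter(iterable)

        def __iter__(self):
            return self

        def __next__(self):
            return next(self._it)

        def __iterclose__(self):
            # Swallow the close so the underlying iterator stays usable.
            pass

    def read_csv_with_header(lines_iterable):
        lines_iterator = iter(lines_iterable)
        for line in preserve(lines_iterator):   # don't let `break` close it
            column_names = line.strip().split("\t")
            break
        for line in lines_iterator:             # this loop is allowed to close it
            values = line.strip().split("\t")
            yield dict(zip(column_names, values))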
>
> First, observe that the only async iterables where we propose to add ``__aiterclose__`` are async generators, and there is currently no existing code using async generators (though this will start changing very soon), so the async changes do not produce any backwards incompatibilities. (There is existing code using async iterators, but using the new async for loop on an old async iterator is harmless, because old async iterators don't have ``__aiterclose__``.) In addition, PEP 525 was accepted on a provisional basis, and async generators are by far the biggest beneficiary of this PEP's proposed changes. Therefore, I think we should strongly consider enabling ``__aiterclose__`` for ``async for`` loops and async generators ASAP, ideally for 3.6.0 or 3.6.1.
>
> For the non-async world, things are harder, but here's a potential transition path:
>
> In 3.7:
>
> Our goal is that existing unsafe code will start emitting warnings, while those who want to opt-in to the future can do that immediately:
>
> - We immediately add all the ``__iterclose__`` methods described above.
> - If ``from __future__ import iterclose`` is in effect, then ``for`` loops and ``*`` unpacking call ``__iterclose__`` as specified above.
> - If the future is *not* enabled, then ``for`` loops and ``*`` unpacking do *not* call ``__iterclose__``. But they do call some other method instead, e.g. ``__iterclose_warning__``.
> - Similarly, functions like ``list`` use stack introspection (!!) to check whether their direct caller has ``__future__.iterclose`` enabled, and use this to decide whether to call ``__iterclose__`` or ``__iterclose_warning__``.
> - For all the wrapper iterators, we also add ``__iterclose_warning__`` methods that forward to the ``__iterclose_warning__`` method of the underlying iterator or iterators.
> - For generators (and files, if we decide to do that), ``__iterclose_warning__`` is defined to set an internal flag, and other methods on the object are modified to check for this flag. If they find the flag set, they issue a ``PendingDeprecationWarning`` to inform the user that in the future this sequence would have led to a use-after-close situation and the user should use ``preserve()``.
>
> In 3.8:
>
> - Switch from ``PendingDeprecationWarning`` to ``DeprecationWarning``
>
> In 3.9:
>
> - Enable the ``__future__`` unconditionally and remove all the ``__iterclose_warning__`` stuff.
>
> I believe that this satisfies the normal requirements for this kind of transition -- opt-in initially, with warnings targeted precisely to the cases that will be affected, and a long deprecation cycle.
>
> Probably the most controversial / risky part of this is the use of stack introspection to make the iterable-consuming functions sensitive to a ``__future__`` setting, though I haven't thought of any situation where it would actually go wrong yet...
>
>
> Acknowledgements
> ================
>
> Thanks to Yury Selivanov, Armin Rigo, and Carl Friedrich Bolz for helpful discussion on earlier versions of this idea.
>
> --
> Nathaniel J. Smith -- https://vorpus.org
>
> _______________________________________________
> Python-ideas mailing list
> python...@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
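(For what it's worth, I read the stack-introspection idea in the transition plan as something roughly like the sketch below. The ``CO_FUTURE_ITERCLOSE`` flag is purely hypothetical -- named here only for illustration -- but ``__future__`` imports do already work by setting compiler flags on the calling code object.)

    import sys

    CO_FUTURE_ITERCLOSE = 0x1000000  # hypothetical flag for the proposed future import

    def _caller_wants_iterclose(depth=2):
        """Sketch: how list() and friends might check their direct caller."""
        caller = sys._getframe(depth)
        return bool(caller.f_code.co_flags & CO_FUTURE_ITERCLOSE)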
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/