Hi Yury,

Thanks for the detailed comments! Replies inline below.

On Wed, Oct 19, 2016 at 8:51 AM, Yury Selivanov <yselivanov...@gmail.com> wrote:
> I'm -1 on the idea. Here's why:
>
>
> 1. Python is a very dynamic language with GC and that is one of its
> fundamental properties. This proposal might make GC of iterators more
> deterministic, but that is only one case.
>
> For instance, in some places in asyncio source code we have statements like
> this: "self = None". Why? When an exception occurs and we want to save it
> (for instance to log it), it holds a reference to the Traceback object.
> Which in turn references frame objects. Which means that a lot of objects
> in those frames will be alive while the exception object is alive. So in
> asyncio we go to great lengths to avoid unnecessary runs of GC, but this is
> an exception! Most of Python code out there today doesn't do this sorts of
> tricks.
>
> And this is just one example of how you can have cycles that require a run
> of GC. It is not possible to have deterministic GC in real life Python
> applications. This proposal addresses only *one* use case, leaving 100s of
> others unresolved.

Maybe I'm misunderstanding, but I think those 100s of other cases where you need deterministic cleanup are why 'with' blocks were invented, and in my experience they work great for that. Once you get in the habit, it's very easy and idiomatic to attach a 'with' to each file handle, socket, etc., at the point where you create it. So from where I stand, it seems like those 100s of unresolved cases actually are resolved?

The problem is that 'with' blocks are great, and generators are great, but when you put them together into the same language there's this weird interaction that emerges, where 'with' blocks inside generators don't really work for their intended purpose unless you're very careful and willing to write boilerplate. Adding deterministic cleanup to generators plugs this gap. Beyond that, I do think it's a nice bonus that other iterables can take advantage of the feature, but this isn't just a random "hey let's smush two constructs together to save a line of code" thing -- iteration is special because it's where generator call stacks and regular call stacks meet.
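For concreteness, here's a minimal sketch of the kind of interaction I mean -- the file name and pattern are made up, but the shape is the one that keeps coming up:

    def read_matching(path, pattern):
        # Looks like deterministic cleanup...
        with open(path) as f:
            for line in f:
                if pattern in line:
                    yield line

    # ...but if the consumer stops early, the 'with' block only runs when
    # the suspended generator happens to be finalized: promptly under
    # CPython's refcounting (if nothing else holds a reference), and
    # "eventually, maybe" on implementations that rely on GC.
    for line in read_matching("example.log", "ERROR"):
        print(line)
        break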
> IMO, while GC-related issues can be annoying to debug sometimes, it's not
> worth it to change the behaviour of iteration in Python only to slightly
> improve on this.
>
> 2. This proposal will make writing iterators significantly harder. Consider
> 'itertools.chain'. We will have to rewrite it to add the proposed
> __iterclose__ method. The Chain iterator object will have to track all of
> its iterators, call __iterclose__ on them when it's necessary (there are a
> few corner cases). Given that this object is implemented in C, it's quite a
> bit of work. And we'll have a lot of objects to fix.

When you say "make writing iterators significantly harder", is it fair to say that you're thinking mostly of what I'm calling "iterator wrappers"? For most day-to-day iterators, it's pretty trivial to either add a close method or not; the tricky cases are when you're trying to manage a collection of sub-iterators.

itertools.chain is a great challenge / test case here, because I think it's about as hard as this gets :-). It took me a bit to wrap my head around, but I think I've got it, and that it's not so bad actually.

Right now, chain's semantics are:

    # copied directly from the docs
    def chain(*iterables):
        for it in iterables:
            for element in it:
                yield element

In a post-__iterclose__ world, the inner for loop there will already handle closing each iterator as it's finished being consumed, and if the generator is closed early then the inner for loop will also close the current iterator. What we need to add is that if the generator is closed early, we should also close all the unprocessed iterators.

The first change is to replace the outer for loop with a while/pop loop, so that if an exception occurs we'll know which iterables remain to be processed:

    def chain(*iterables):
        iterables = list(iterables)  # *iterables gives a tuple; pop() needs a list
        try:
            while iterables:
                for element in iterables.pop(0):
                    yield element
            ...

Now, what do we do if an exception does occur? We need to call iterclose on all of the remaining iterables, but the tricky bit is that this might itself raise new exceptions. If this happens, we don't want to abort early; instead, we want to continue until we've closed all the iterables, and then raise a chained exception. Basically what we want is:

    def chain(*iterables):
        iterables = list(iterables)
        try:
            while iterables:
                for element in iterables.pop(0):
                    yield element
        finally:
            try:
                operators.iterclose(iter(iterables[0]))
            finally:
                try:
                    operators.iterclose(iter(iterables[1]))
                finally:
                    try:
                        operators.iterclose(iter(iterables[2]))
                    finally:
                        ...

but of course that's not valid syntax. Fortunately, it's not too hard to rewrite that into real Python -- but it's a little dense:

    def chain(*iterables):
        iterables = list(iterables)
        try:
            while iterables:
                for element in iterables.pop(0):
                    yield element
        # This is equivalent to the nested-finally chain above:
        except BaseException as last_exc:
            for iterable in iterables:
                try:
                    operators.iterclose(iter(iterable))
                except BaseException as new_exc:
                    if new_exc.__context__ is None:
                        new_exc.__context__ = last_exc
                    last_exc = new_exc
            raise last_exc

It's probably worth wrapping that bottom part into an iterclose_all() helper, since the pattern probably occurs in other cases as well. (Actually, now that I think about it, the map() example in the text should be doing this instead of what it's currently doing... I'll fix that.)

This doesn't strike me as fundamentally complicated, really -- the exception chaining logic makes it look scary, but basically it's just the current chain() plus a cleanup loop. I believe that this handles all the corner cases correctly. Am I missing something? And again, this strikes me as one of the worst cases -- the vast majority of iterators out there are not doing anything nearly this complicated with subiterators.
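Roughly what the iterclose_all() helper mentioned above might look like -- the name is a placeholder, and since operators.iterclose doesn't exist yet the __iterclose__ lookup is spelled out by hand so the sketch stands on its own:

    def iterclose_all(iterables):
        # Close every remaining iterable; keep going even if some of the
        # closes raise, and chain all the exceptions together at the end.
        last_exc = None
        for iterable in iterables:
            try:
                it = iter(iterable)
                # Stand-in for the proposal's iterclose helper: look up
                # __iterclose__ on the type and call it if present.
                close = getattr(type(it), "__iterclose__", None)
                if close is not None:
                    close(it)
            except BaseException as new_exc:
                if new_exc.__context__ is None:
                    new_exc.__context__ = last_exc
                last_exc = new_exc
        if last_exc is not None:
            raise last_exc

With that, chain's except clause shrinks to roughly "iterclose_all(iterables); raise".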
> We can probably update all iterators in standard library (in 3.7), but what
> about third-party code? It will take many years until you can say with
> certainty that most of Python code supports __iterclose__ / __aiterclose__.

Adding support to itertools, toolz.itertoolz, and generators (which are the most common way to implement iterator wrappers) will probably take care of 95% of uses, but yeah, there's definitely a long tail that will take time to shake out. The (extremely tentative) transition plan has __iterclose__ as opt-in until 3.9, so that's about 3.5 years from now. __aiterclose__ is a different matter of course, since there are very very few async iterator wrappers in the wild, and in general I think most people writing async iterators are watching async/await-related language developments very closely.

> 3. This proposal changes the behaviour of 'for' and 'async for' statements
> significantly. To do partial iteration you will have to use a special
> builtin function to guard the iterator from being closed. This is
> completely non-obvious to any existing Python user and will be hard to
> explain to newcomers.

It's true that it's non-obvious to existing users, but that's true of literally every change that we could ever make :-). That's why we have release notes, deprecation warnings, enthusiastic blog posts, etc. For newcomers... well, it's always difficult for those of us with more experience to put ourselves back in the mindset, but I don't see why this would be particularly difficult to explain? for loops consume their iterator; if you don't want that then here's how you avoid it. That's no more difficult to explain than what an iterator is in the first place, I don't think, and for me at least it's a lot easier to wrap my head around than the semantics of else blocks on for loops :-). (I always forget how those work.)
> 4. This proposal only addresses iteration with 'for' and 'async for'
> statements. If you iterate using a 'while' loop and 'next()' function, this
> proposal wouldn't help you. Also see the point #2 about third-party code.

True. If you're doing manual iteration, then you are still responsible for manual cleanup (if that's what you want), just like today. This seems fine to me -- I'm not sure why it's an objection to this proposal :-).
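To spell out what "just like today" looks like: if you're driving the iterator by hand and it needs cleanup, say so explicitly (the file name here is invented):

    from contextlib import closing

    def numbered_lines(path):
        with open(path) as f:
            for i, line in enumerate(f):
                yield i, line

    # Manual iteration, manual cleanup -- unchanged by the proposal:
    with closing(numbered_lines("example.log")) as it:
        while True:
            try:
                i, line = next(it)
            except StopIteration:
                break
            if i >= 10:
                break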
> 5. Asynchronous generators (AG) introduced by PEP 525 are finalized in a
> very similar fashion to synchronous generators. There is an API to help
> Python to call event loop to finalize AGs. asyncio in 3.6 (and other event
> loops in the near future) already uses this API to ensure that *all AGs in a
> long-running program are properly finalized* while it is being run.
>
> There is an extra loop method (`loop.shutdown_asyncgens`) that should be
> called right before stopping the loop (exiting the program) to make sure
> that all AGs are finalized, but if you forget to call it the world won't
> end. The process will end and the interpreter will shutdown, maybe issuing
> a couple of ResourceWarnings.

There is no law that says that the interpreter always shuts down after the event loop exits. We're talking about a fundamental language feature here; it shouldn't be dependent on the details of libraries and application shutdown tendencies :-(.

> No exception will pass silently in the current PEP 525 implementation.

Exceptions that occur inside a garbage-collected iterator will be printed to the console, or possibly logged according to whatever the event loop does with unhandled exceptions. And sure, that's better than nothing, if someone remembers to look at the console/logs. But they *won't* be propagated out to the containing frame, they can't be caught, etc. That's a really big difference.

> And if some AG isn't properly finalized a warning will be issued.

This actually isn't true of the code currently in asyncio master -- if the loop is already closed (either manually by the user or by its __del__ being called) when the AG finalizer executes, then the AG is silently discarded:

https://github.com/python/asyncio/blob/e3fed68754002000be665ad1a379a747ad9247b6/asyncio/base_events.py#L352

This isn't really an argument against the mechanism though, just a bug you should probably fix :-).

I guess it does point to my main dissatisfaction with the whole GC hook machinery, though. At this point I have spent many, many hours tracing through the details of this, catching edge cases -- first during the initial PEP process, where there were a few rounds of revision; then again over the last few days, when I first thought I'd found a bunch of bugs that turned out to be spurious because I'd missed one line in the PEP, plus one real bug that you already know about (the finalizer-called-from-wrong-thread issue); and then another hour carefully reading through the code again with PEP 442 open alongside, once I realized how subtle the resurrection and cyclic-reference issues are here. And now here's another minor bug for you. At this point I'm about 85% confident that it does actually function as described, or that we'll at least be able to shake out any remaining weird edge cases over the next 6-12 months as people use it.

But -- and I realize this is an aesthetic reaction as much as anything else -- this all feels *really* unpythonic to me. Looking at the Zen, the phrases that come to mind are "complicated" and "If the implementation is hard to explain, ...". The __(a)iterclose__ proposal definitely has its complexity as well, but it's a very different kind. The core is incredibly straightforward: "there is this method, for loops always call it". That's it. When you look at a for loop, you can be extremely confident about what's going to happen and when. Of course then there's the question of defining this method on all the diverse iterators that we have floating around -- I'm not saying it's trivial. But you can take them one at a time, and each individual case is pretty straightforward.
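To pin down what "for loops always call it" would mean in practice, here's the rough shape of the equivalence, written against a toy iterator so it can actually run today (the real desugaring in the draft is fussier about edge cases than this):

    class CountTo:
        """Toy iterator with cleanup, to show where __iterclose__ would fire."""
        def __init__(self, n):
            self.i, self.n = 0, n
        def __iter__(self):
            return self
        def __next__(self):
            if self.i >= self.n:
                raise StopIteration
            self.i += 1
            return self.i
        def __iterclose__(self):
            print("closed after", self.i)

    # Under the proposal,
    #     for x in CountTo(10):
    #         if x == 3:
    #             break
    # would behave roughly like:
    _it = iter(CountTo(10))
    try:
        for x in _it:            # today's loop semantics
            if x == 3:
                break
    finally:
        type(_it).__iterclose__(_it)   # the part the proposal makes automatic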
> The current AG finalization mechanism must stay even if this proposal gets
> accepted, as it ensures that even manually iterated AGs are properly
> finalized.

Like I said in the text, I don't find this very persuasive, since if you're manually iterating then you can just as well take manual responsibility for cleaning things up. But I could live with both mechanisms co-existing.

> 6. If this proposal gets accepted, I think we shouldn't introduce it in any
> form in 3.6. It's too late to implement it for both sync- and
> async-generators. Implementing it only for async-generators will only add
> cognitive overhead. Even implementing this only for async-generators will
> (and should!) delay 3.6 release significantly.

I certainly don't want to delay 3.6. I'm not as convinced as you that the async-generator code alone is so complicated that it would force a delay, but if it is then 3.6.1 is also an option worth considering.

> 7. To conclude: I'm not convinced that this proposal fully solves the issue
> of non-deterministic GC of iterators. It cripples iteration protocols to
> partially solve the problem for 'for' and 'async for' statements, leaving
> manual iteration unresolved. It will make it harder to write *correct*
> (async-) iterators. It introduces some *implicit* context management to
> 'for' and 'async for' statements -- something that IMO should be done by
> user with an explicit 'with' or 'async with'.

The goal isn't to "fully solve the problem of non-deterministic GC of iterators". That would require magic :-). The goal is to provide tools so that when users run into this problem, they have viable options to solve it. Right now, we don't have those tools, as evidenced by the fact that I've basically never seen code that does this "correctly". We can tell people that they should be using explicit 'with' on every generator that might contain cleanup code, but they don't and they won't, and as a result their code quality is suffering on several axes (portability across Python implementations, 'with' blocks inside generators that don't actually do anything except spuriously hide ResourceWarnings, etc.).

Adding __(a)iterclose__ to (async) for loops makes it easy and convenient to do the right thing in common cases; and in the less-usual case where you want to do manual iteration, then you can and should use a manual 'with' block too. The proposal is not trying to replace 'with' blocks :-).

As for implicitness, eh. If 'for' is defined to mean 'iterate and then close', then that's what 'for' means. If we make the change then there won't be anything more implicit about 'for' calling __iterclose__ than there is about 'for' calling __iter__ or __next__. Definitely this will take some adjustment for those who are used to the old system, but sometimes that's the price of progress ;-).

-n

--
Nathaniel J. Smith -- https://vorpus.org
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/