Nathaniel,

On 2016-10-19 5:02 PM, Nathaniel Smith wrote:

Hi Yury,

Thanks for the detailed comments! Replies inline below.

NP!


On Wed, Oct 19, 2016 at 8:51 AM, Yury Selivanov <yselivanov...@gmail.com> wrote:
I'm -1 on the idea.  Here's why:


1. Python is a very dynamic language with GC and that is one of its
fundamental properties.  This proposal might make GC of iterators more
deterministic, but that is only one case.

For instance, in some places in asyncio source code we have statements like
this: "self = None".  Why?  When an exception occurs and we want to save it
(for instance to log it), it holds a reference to the Traceback object.
Which in turn references frame objects.  Which means that a lot of objects
in those frames will be alive while the exception object is alive.  So in
asyncio we go to great lengths to avoid unnecessary runs of GC, but this is
an exception!  Most Python code out there today doesn't do these sorts of
tricks.
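
To make the 'self = None' point concrete, here is a minimal sketch of the cycle in question: saving a caught exception in a local traps the frame in a reference cycle (exception -> traceback -> frame -> local -> exception), so the frame's locals live until a cyclic-GC run. The `Resource` class and names here are purely illustrative:

```python
import gc
import weakref

class Resource:
    pass

refs = []

def handler():
    resource = Resource()              # stand-in for something heavy
    refs.append(weakref.ref(resource))
    try:
        raise ValueError("boom")
    except ValueError as exc:
        saved = exc  # cycle: saved -> exc -> __traceback__ -> frame -> saved
        # asyncio breaks such chains explicitly (the `self = None` trick);
        # without that, cleanup has to wait for the cyclic GC:
        # saved = None

handler()
assert refs[0]() is not None   # refcounting alone can't free the cycle
gc.collect()                   # only a cyclic-GC pass reclaims the frame
assert refs[0]() is None
```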

And this is just one example of how you can have cycles that require a run
of GC.  It is not possible to have deterministic GC in real life Python
applications.  This proposal addresses only *one* use case, leaving 100s of
others unresolved.
Maybe I'm misunderstanding, but I think those 100s of other cases
where you need deterministic cleanup are why 'with' blocks were
invented, and in my experience they work great for that. Once you get
in the habit, it's very easy and idiomatic to attach a 'with' to each
file handle, socket, etc., at the point where you create it. So from
where I stand, it seems like those 100s of unresolved cases actually
are resolved?

Not all code can be written with 'with' statements; see my example with 'self = None' in asyncio. Python code can be quite complex, involving classes with __del__ methods that do some cleanup, etc. Fundamentally, you cannot make GC of such objects deterministic.

IOW I'm not convinced that if we implement your proposal we'll fix 90% (or even 30%) of cases where non-deterministic and postponed cleanup is harmful.
The problem is that 'with' blocks are great, and generators are great,
but when you put them together into the same language there's this
weird interaction that emerges, where 'with' blocks inside generators
don't really work for their intended purpose unless you're very
careful and willing to write boilerplate.
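
Concretely, the gap looks like this (a minimal illustration, with a counter standing in for a real file handle):

```python
from contextlib import contextmanager

events = []

@contextmanager
def resource():
    events.append("open")
    try:
        yield
    finally:
        events.append("close")

def read_items():
    with resource():           # intended to guarantee cleanup...
        for i in range(5):
            yield i

gen = read_items()
for i in gen:
    if i == 2:
        break                  # ...but breaking out of the loop does NOT
                               # run the 'with' cleanup; the generator is
                               # merely suspended inside it

assert events == ["open"]      # resource still open after the loop
gen.close()                    # the explicit boilerplate the proposal
                               # would make automatic
assert events == ["open", "close"]
```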

Adding deterministic cleanup to generators plugs this gap. Beyond
that, I do think it's a nice bonus that other iterables can take
advantage of the feature, but this isn't just a random "hey let's
smush two constructs together to save a line of code" thing --
iteration is special because it's where generator call stacks and
regular call stacks meet.

Yes, I understand that your proposal really improves some things. OTOH it undeniably complicates the iteration protocol and requires a long period of deprecations, teaching users and library authors new semantics, etc.

We are only now beginning to see Python 3 gain traction. I don't want us to harm that by introducing another set of things to Python 3 that are significantly different from Python 2. DeprecationWarnings/future imports don't excite users either.

IMO, while GC-related issues can be annoying to debug sometimes, it's not
worth it to change the behaviour of iteration in Python only to slightly
improve on this.

2. This proposal will make writing iterators significantly harder. Consider
'itertools.chain'.  We will have to rewrite it to add the proposed
__iterclose__ method.  The Chain iterator object will have to track all of
its iterators, call __iterclose__ on them when it's necessary (there are a
few corner cases).  Given that this object is implemented in C, it's quite a
bit of work.  And we'll have a lot of objects to fix.
When you say "make writing iterators significantly harder", is it fair
to say that you're thinking mostly of what I'm calling "iterator
wrappers"? For most day-to-day iterators, it's pretty trivial to
either add a close method or not; the tricky cases are when you're
trying to manage a collection of sub-iterators.

Yes, mainly iterator wrappers. You'll also need to educate users (more on that below) to refactor their __del__ methods to __(a)iterclose__ in 3.6.

itertools.chain is a great challenge / test case here, because I think
it's about as hard as this gets :-). It took me a bit to wrap my head
around, but I think I've got it, and that it's not so bad actually.

Now imagine that being applied throughout the stdlib, plus some of it will have to be implemented in C. I'm not saying it's impossible; I'm saying that it will require additional effort for CPython and the ecosystem.
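
As a rough illustration of the work involved, a pure-Python chain-like wrapper under the proposed protocol might look something like this (`__iterclose__` is the proposal's hypothetical method name, not a real protocol today):

```python
class closing_chain:
    """Sketch of an itertools.chain-like iterator wrapper that
    forwards the proposed (hypothetical) __iterclose__ protocol."""

    def __init__(self, *iterables):
        self._pending = iter(iterables)
        self._current = None

    def __iter__(self):
        return self

    def __next__(self):
        while True:
            if self._current is None:
                # StopIteration from the outer iterator ends the chain.
                self._current = iter(next(self._pending))
            try:
                return next(self._current)
            except StopIteration:
                self._current = None

    def __iterclose__(self):
        # Close the active sub-iterator first, then any sub-iterators
        # that were never started -- one of the corner cases mentioned.
        try:
            if self._current is not None:
                close = getattr(self._current, "__iterclose__", None)
                if close is not None:
                    close()
        finally:
            for iterable in self._pending:
                it = iter(iterable)
                close = getattr(it, "__iterclose__", None)
                if close is not None:
                    close()
```

Even this simplified sketch has to track the current and pending sub-iterators and handle exceptions during close; the C version in itertools would be more work still.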

[..]

3. This proposal changes the behaviour of 'for' and 'async for' statements
significantly.  To do partial iteration you will have to use a special
builtin function to guard the iterator from being closed.  This is
completely non-obvious to any existing Python user and will be hard to
explain to newcomers.
It's true that it's non-obvious to existing users, but that's true of
literally every change that we could ever make :-). That's why we have
release notes, deprecation warnings, enthusiastic blog posts, etc.

We rarely, if ever, change the behavior of basic statements like 'for'.


For newcomers... well, it's always difficult for those of us with more
experience to put ourselves back in the mindset, but I don't see why
this would be particularly difficult to explain? for loops consume
their iterator; if you don't want that then here's how you avoid it.
That's no more difficult to explain than what an iterator is in the
first place, I don't think, and for me at least it's a lot easier to
wrap my head around than the semantics of else blocks on for loops
:-). (I always forget how those work.)

A lot of code that you find on Stack Overflow etc. will be broken. Porting code from Python 2 or from pre-3.6 Python 3 will be challenging. People are still struggling to understand 'dict.keys()'-like views in Python 3.


4. This proposal only addresses iteration with 'for' and 'async for'
statements.  If you iterate using a 'while' loop and 'next()' function, this
proposal wouldn't help you.  Also see point #2 about third-party code.
True. If you're doing manual iteration, then you are still responsible
for manual cleanup (if that's what you want), just like today. This
seems fine to me -- I'm not sure why it's an objection to this
proposal :-).

Right now we can implement the __del__ method to clean up iterators. That works both for partial iteration and for cases where people forget to close the iterator explicitly.

With your proposal, to achieve the same (and make the code compatible with the new for-loop semantics), users will have to implement both __iterclose__ and __del__.
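
For example, an iterator that today relies on __del__ alone would, under the proposal, need something like this (`__iterclose__` is again hypothetical, and `CursorIterator` is an invented name):

```python
class CursorIterator:
    """Sketch: deterministic cleanup via the proposed __iterclose__,
    plus a __del__ fallback for manual next()-based iteration."""

    def __init__(self, rows):
        self._rows = iter(rows)
        self.closed = False

    def __iter__(self):
        return self

    def __next__(self):
        if self.closed:
            raise StopIteration
        return next(self._rows)

    def __iterclose__(self):
        # Would be called automatically by for-loops under the proposal.
        self.closed = True

    def __del__(self):
        # Non-deterministic safety net for iteration outside for-loops
        # (e.g. a while loop calling next() by hand).
        if not self.closed:
            self.__iterclose__()
```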


5. Asynchronous generators (AG) introduced by PEP 525 are finalized in a
very similar fashion to synchronous generators.  There is an API to help
Python to call event loop to finalize AGs.  asyncio in 3.6 (and other event
loops in the near future) already uses this API to ensure that *all AGs in a
long-running program are properly finalized* while it is being run.

There is an extra loop method (`loop.shutdown_asyncgens`) that should be
called right before stopping the loop (exiting the program) to make sure
that all AGs are finalized, but if you forget to call it the world won't
end.  The process will end and the interpreter will shutdown, maybe issuing
a couple of ResourceWarnings.
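
The shutdown sequence described above looks like this in practice (`loop.shutdown_asyncgens` is the real 3.6 API; the generator and event names are illustrative):

```python
import asyncio

events = []

async def ticker():
    try:
        for i in range(5):
            yield i
    finally:
        events.append("finalized")   # cleanup that needs a running loop

async def main():
    async for i in ticker():
        if i == 1:
            break  # abandons the AG, suspended inside its try/finally

loop = asyncio.new_event_loop()
try:
    loop.run_until_complete(main())
    # Finalize any async generators still pending before exiting:
    loop.run_until_complete(loop.shutdown_asyncgens())
finally:
    loop.close()
```
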
There is no law that says that the interpreter always shuts down after
the event loop exits. We're talking about a fundamental language
feature here, it shouldn't be dependent on the details of libraries
and application shutdown tendencies :-(.

It's not about shutting down the interpreter or exiting the process. The majority of async applications just run the loop until they exit. The point of PEP 525, and of how finalization is handled in asyncio, is that AGs will be properly cleaned up the vast majority of the time (i.e. while the loop is running).

[..]
And if some AG isn't properly finalized a warning will be issued.
This actually isn't true of the code currently in asyncio master -- if
the loop is already closed (either manually by the user or by its
__del__ being called) when the AG finalizer executes, then the AG is
silently discarded:
   
https://github.com/python/asyncio/blob/e3fed68754002000be665ad1a379a747ad9247b6/asyncio/base_events.py#L352

This isn't really an argument against the mechanism though, just a bug
you should probably fix :-).

I don't think it's a bug. When the loop is closed, the hook will do nothing, so the asynchronous generator will be cleaned up by the interpreter. If it has an 'await' expression in its 'finally' block, the interpreter will issue a warning.

I'll add a comment explaining this.


I guess it does point to my main dissatisfaction with the whole GC
hook machinery, though. At this point I have spent many, many hours
tracing through the details of this catching edge cases -- first
during the initial PEP process, where there were a few rounds of
revision, then again the last few days when I first thought I found a
bunch of bugs that turned out to be spurious because I'd missed one
line in the PEP, plus one real bug that you already know about (the
finalizer-called-from-wrong-thread issue), and then I spent another
hour carefully reading through the code again with PEP 442 open
alongside once I realized how subtle the resurrection and cyclic
reference issues are here, and now here's another minor bug for you.

Yes, I agree it's not an easy thing to digest. The good thing is that asyncio has a reference implementation of PEP 525 support, so people can learn from it. I'll definitely add more comments to make the code easier to read.


At this point I'm about 85% confident that it does actually function
as described, or that we'll at least be able to shake out any
remaining weird edge cases over the next 6-12 months as people use it.
But -- and I realize this is an aesthetic reaction as much as anything
else -- this all feels *really* unpythonic to me. Looking at the Zen,
the phrases that come to mind are "complicated", and "If the
implementation is hard to explain, ...".

The __(a)iterclose__ proposal definitely has its complexity as well,
but it's a very different kind. The core is incredibly
straightforward: "there is this method, for loops always call it".
That's it. When you look at a for loop, you can be extremely confident
about what's going to happen and when. Of course then there's the
question of defining this method on all the diverse iterators that we
have floating around -- I'm not saying it's trivial. But you can take
them one at a time, and each individual case is pretty
straightforward.
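
That core rule can be stated in a few lines. Here is a sketch that emulates the proposed semantics with today's Python -- `__iterclose__` and `preserve` are the proposal's names, and `for_loop` is a hypothetical stand-in for what the `for` statement itself would do:

```python
def iterclose(it):
    # Under the proposal, for-loops would call this automatically.
    close = getattr(it, "__iterclose__", None)
    if close is not None:
        close()

class preserve:
    """Proposed opt-out wrapper: hides __iterclose__ so a for-loop
    leaves the underlying iterator open for later partial use."""
    def __init__(self, iterable):
        self._it = iter(iterable)
    def __iter__(self):
        return self
    def __next__(self):
        return next(self._it)

def for_loop(iterable, body):
    # Roughly what `for x in iterable: body(x)` would mean:
    it = iter(iterable)
    try:
        for x in it:
            body(x)
    finally:
        iterclose(it)
```

A plain loop closes its iterator even on break or exception; wrapping the iterable in `preserve(...)` opts out.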

The __(a)iterclose__ semantics are clear. What's not clear is how much harm changing the semantics of for-loops will do (and how to quantify the amount of good :)).

[..]

7. To conclude: I'm not convinced that this proposal fully solves the issue
of non-deterministic GC of iterators.  It cripples iteration protocols to
partially solve the problem for 'for' and 'async for' statements, leaving
manual iteration unresolved.  It will make it harder to write *correct*
(async-) iterators.  It introduces some *implicit* context management to
'for' and 'async for' statements -- something that IMO should be done by
user with an explicit 'with' or 'async with'.
The goal isn't to "fully solve the problem of non-deterministic GC of
iterators". That would require magic :-). The goal is to provide tools
so that when users run into this problem, they have viable options to
solve it. Right now, we don't have those tools, as evidenced by the
fact that I've basically never seen code that does this "correctly".
We can tell people that they should be using explicit 'with' on every
generator that might contain cleanup code, but they don't and they
won't, and as a result their code quality is suffering on several axes
(portability across Python implementations, 'with' blocks inside
generators that don't actually do anything except spuriously hide
ResourceWarnings, etc.).

Perhaps we should focus on teaching people that using 'with' statements inside (async) generators is a bad idea. What you should do instead is have a 'with' statement wrapping the code that uses the generator.
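
That pattern is already expressible with contextlib.closing today -- a sketch, with a counter standing in for a real file or socket:

```python
from contextlib import closing

events = []

def records(data):
    events.append("open")        # stands in for acquiring a resource
    try:
        for item in data:
            yield item
    finally:
        events.append("close")   # guaranteed by the outer 'with'

# Wrap the *use* of the generator, not the code inside it:
with closing(records([1, 2, 3])) as it:
    for x in it:
        if x == 2:
            break

assert events == ["open", "close"]   # closed despite the early break
```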

Yury
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
