Nathaniel Smith <n...@pobox.com> added the comment:

I think conceptually, cancelling all tasks and waiting for them to exit is the 
right thing to do. That way you run as much shutdown code as possible in 
regular context, and also avoid this problem – if there are no tasks, then 
there can't be any tasks blocked in __anext__/asend/athrow/aclose, so calling 
aclose is safe.

This is also the right way to handle it from Trio's perspective – we always 
wait for all tasks to exit before exiting our run loop. For asyncio it might be 
a bit more complicated, though, so we should think through the trade-offs.

> The only way (at least known to me) that that could be the case if some tasks 
> ignores cancellation.  

Looking at runners.py, it seems like asyncio.run doesn't just call 
task.cancel(), but actually waits for all tasks to exit. (See the call to 
gather() inside _cancel_all_tasks.) So I think that does guarantee that by the 
time we hit shutdown_asyncgens(), we can be certain that all tasks have exited. 
(If a task ignores cancellation, then that will cause _cancel_all_tasks to 
hang, so we never reach shutdown_asyncgens at all. Not ideal, but not really 
our problem either – if someone complains we just tell them they need to stop 
ignoring cancellation if they want their program to shut down cleanly.)
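To make the sequence concrete, here is a minimal sketch of the shutdown order that asyncio.run follows (the real implementation lives in asyncio/runners.py and handles more edge cases; this `run` helper is illustrative, not the actual code):

```python
import asyncio

def run(coro):
    """Simplified sketch of asyncio.run()'s shutdown sequence."""
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(coro)
    finally:
        # Cancel every leftover task and, crucially, *wait* for all of
        # them to finish -- this mirrors runners._cancel_all_tasks().
        tasks = asyncio.all_tasks(loop)
        if tasks:
            for task in tasks:
                task.cancel()
            loop.run_until_complete(
                asyncio.gather(*tasks, return_exceptions=True))
        # Only now, with zero tasks alive, is it safe to close async
        # generators: nothing can be blocked in __anext__/asend/athrow.
        loop.run_until_complete(loop.shutdown_asyncgens())
        loop.close()
```

If a task swallows its CancelledError and keeps sleeping, the gather() call never completes and we hang before ever reaching shutdown_asyncgens() – which is the behavior described above.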

So I think asyncio.run is actually totally fine with the 3.8.0 behavior. It's 
only explicit calls to shutdown_asyncgens that might run into this, and I think 
that's probably OK? If you're calling shutdown_asyncgens by hand then you're 
kind of taking responsibility for dealing with any subtle issues around 
shutdown.

And if we did allow aclose() to run at any time, then I worry that could cause 
serious problems. Users can call aclose() at will, and it's generally good 
practice to always call aclose() on your async generators explicitly. So it's 
entirely possible that users will accidentally call aclose() themselves while 
another task is blocked in __anext__. And in that case.... what do we do?

The aclose call has to consume the generator's frame, by definition. But in the 
meantime, we still have this __anext__ coroutine object hanging around 
pointing to the consumed frame, which the task scheduler thinks needs to be 
resumed at some point in response to some event, and that can cause all kinds 
of weird situations. The __anext__ might try to resume while the aclose() is 
running. The __anext__ might get stuck and never resume, because the code run 
by aclose() accidentally unregisters the event that was going to wake it up, 
so you just get a deadlock instead of a useful error. It might cause some 
other code entirely to crash – like if the __anext__ was blocked inside a call 
to 'await queue.get()', then aclose() interrupting that might corrupt the 
queue's internal state, so the next time some other task calls queue.put() they 
get an exception or other incorrect behavior. It all seems really tricky and 
messy to me.
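This guard is observable today: if one task is parked inside __anext__ and other code calls aclose() on the same generator, the interpreter refuses rather than yanking the frame out from under the task. A small demo (the `ticker` name is just for illustration):

```python
import asyncio

async def ticker():
    while True:
        await asyncio.sleep(3600)   # __anext__ suspends here
        yield 0

async def main():
    gen = ticker()
    # One task is blocked inside gen.__anext__()...
    consumer = asyncio.ensure_future(gen.__anext__())
    await asyncio.sleep(0)          # let it reach the sleep
    # ...while we call aclose() on the same generator.
    try:
        await gen.aclose()
        error = None
    except RuntimeError as exc:
        error = str(exc)            # the frame is NOT consumed under the task
    # Clean up properly: cancel the consumer first, then close.
    consumer.cancel()
    try:
        await consumer
    except asyncio.CancelledError:
        pass
    await gen.aclose()              # safe now: no task is inside the generator
    return error

print(asyncio.run(main()))
```

The RuntimeError ("asynchronous generator is already running") is exactly the 3.8.0 behavior under discussion: noisy, but far better than silently corrupting whatever the blocked __anext__ was awaiting on.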

So... this is super subtle and confusing, but I *think* the conclusion is that 
yeah, 3.8.0 is fine and there's no urgent action needed.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue38559>
_______________________________________