Hello,

First, I would like to say that I have no fundamental problem with this
PEP. While I agree with Nathaniel that the rationale given about the CSP
concurrency model seems a bit weak, the author is obviously expressing
his opinion there and I won't object to that.  However, I think the PEP
is desirable for other reasons.  Mostly, I hope that by making the
subinterpreters functionality available to pure Python programmers
(while it was formerly an advanced and arcane part of the C API), we
will spur a bunch of interesting third-party experiments,
including possibilities that we on python-dev have not thought about.

The appeal of the PEP for experimentation is manifold:
1) ability to concurrently run independent execution environments
   without spawning child processes (which on some platforms and in some
   situations may not be very desirable: for example on Windows where
   the cost of spawning is rather high; also, child processes may
   crash, and sometimes it is not easy for the parent to recover,
   especially if a synchronization primitive is left in an unexpected
   state)
2) the potential for parallelizing CPU-bound pure Python code
   in a single process, if a per-interpreter GIL is finally implemented
3) easier support for sharing large data between separate execution
   environments, without the hassle of setting up shared memory or the
   fragility of relying on fork() semantics

(and as I said, I hope people find other applications)
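
To make point 1 above concrete, here is roughly how I imagine basic
usage looking with the module proposed by the PEP (just a sketch: I am
going by the draft API, and the exact signatures may well change):

    import interpreters

    interp = interpreters.create()
    # Runs in the same process: no child process to spawn, monitor,
    # or clean up after if it crashes.
    interp.run("print('hello from a subinterpreter')")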

As for the argument that we already have asyncio and several other
packages, I actually think that combining these different concurrency
mechanisms would be interesting for complex applications (such as
distributed systems).  For that, however, I think the PEP as currently
written is a bit lacking, see below.

Now for the detailed comments.

* I think the module should indeed be provisional.  Experimentation may
  discover warts that call for a change in the API or semantics.  Let's
  not prevent ourselves from fixing those issues.

* The "association" timing seems quirky and potentially annoying: an
  interpreter only becomes associated with a channel the first time it
  calls recv() or send().  How about, instead, associating an
  interpreter with a channel as soon as that channel is given to it
  through `Interpreter.run(..., channels=...)` (or received through
  `recv()`)? 
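
  To illustrate the proposed timing (a sketch against the draft API;
  the exact form of the `channels` argument is my guess, not something
  I want to pin down here):

      import interpreters

      interp = interpreters.create()
      recv_ch, send_ch = interpreters.create_channel()

      # Proposed: the subinterpreter becomes associated with the
      # channel right here, when the channel is handed to it...
      interp.run("data = reader.recv()", channels={'reader': recv_ch})
      # ...rather than only once the code inside run() actually
      # calls recv().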

* How hard would it be, in the current implementation, to add buffering
  to channels?  It doesn't have to be infinite: you can choose a fixed
  buffer size (or make it configurable in the create() function, which
  allows passing 0 for unbuffered).  Like Nathaniel, I think unbuffered
  channels will quickly be annoying to work with (yes, you can create a
  helper thread... now you have one additional thread per channel,
  which isn't pretty -- especially with the GIL).
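
  Something like the following is what I have in mind (the `buffering`
  parameter and its spelling are hypothetical, of course):

      import interpreters

      # Hypothetical: a bounded buffer chosen at creation time;
      # 0 would mean unbuffered, as in the current proposal.
      recv_ch, send_ch = interpreters.create_channel(buffering=16)

      # With buffering > 0, send() returns immediately as long as
      # fewer than 16 objects are queued, instead of blocking until
      # a receiver is actually ready.
      send_ch.send(b"some data")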

* In the same vein, I think channels should allow adding readiness
  callbacks (that are called whenever a channel becomes ready for
  sending or receiving, respectively).  This would make it easy to plug
  them into an event loop or other concurrency systems (such as
  Future-based concurrency).  Note that each interpreter "associated"
  with a channel should be able to set its own readiness callback: so
  one callback per Python object representing the channel, but
  potentially multiple callbacks for the underlying channel primitive.

  (how would the callback be scheduled for execution in the right
  interpreter? perhaps using `_PyEval_AddPendingCall()` or a similar
  mechanism?)
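
  Roughly what I mean, with asyncio as the consumer (the
  `set_ready_callback()` method is entirely hypothetical):

      import asyncio
      import interpreters

      recv_ch, send_ch = interpreters.create_channel()
      loop = asyncio.get_event_loop()
      fut = loop.create_future()

      def on_ready(channel):
          # Hypothetical hook: invoked in the interpreter that set it,
          # once data is available, so this recv() will not block.
          loop.call_soon_threadsafe(fut.set_result, channel.recv())

      recv_ch.set_ready_callback(on_ready)   # hypothetical API
      # ... hand send_ch to another interpreter; awaiting `fut` then
      # resumes once that interpreter has sent something.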

* I think either `interpreters.get_main()` or `interpreters.is_main()`
  is desirable.  Inevitably, the slight differences between main and
  non-main interpreters will surface in non-trivial applications
  (finalization issues in distributed systems can really be hairy).  It
  seems this should be mostly costless to provide, so let's do it.
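
  The typical use would be as trivial as (either spelling is fine by
  me; get_current() is in the draft, if I read it correctly):

      import interpreters

      if interpreters.is_main():
          # e.g. only the main interpreter installs signal handlers,
          # runs the final cleanup, etc.
          ...
      # or, equivalently:
      if interpreters.get_current() == interpreters.get_main():
          ...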

* I do think a minimal synchronization primitive would be nice.
  Either a Lock (in the Python sense) or a Semaphore: both should be
  relatively easy to provide, by wrapping an OS-level synchronization
  primitive.  Then you can recreate all high-level synchronization
  primitives, like the threading and multiprocessing modules do (using
  a Lock or a Semaphore, respectively).

  (note you should be able to emulate a semaphore using blocking send()
  and recv() calls, but that's probably not very efficient, and
  efficiency is important)
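
  For the record, the emulation I allude to could look roughly like
  this; note it relies on the buffering requested above (with strictly
  unbuffered channels the initial token sends would block), and
  `buffering` is the same hypothetical parameter as before:

      import interpreters

      class ChannelSemaphore:
          """Counting semaphore emulated on top of a channel of tokens."""

          def __init__(self, value=1):
              self.recv_ch, self.send_ch = \
                  interpreters.create_channel(buffering=value)
              for _ in range(value):
                  self.send_ch.send(b"token")   # one token per free slot

          def acquire(self):
              self.recv_ch.recv()               # block until a token is free

          def release(self):
              self.send_ch.send(b"token")       # put a token back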

Of course, I hope these are all actionable before beta1 :-)  If not,
here is my priority list, in order of preference:

* High priority: fix association timing
* High priority: either buffering /or/ readiness callbacks
* Middle priority: get_main() /or/ is_main()
* Middle / low priority: a simple synchronization primitive

But I would stress that the more of these we provide, the more we
encourage people to experiment without pulling out too much of their
hair.

(also, of course, I hope other people read the PEP and provide feedback)

Best regards

Antoine.
