Thanks for the feedback, Antoine.  I've responded inline below and
will be making appropriate changes to the PEP.  One point I'd like to
reinforce before my comments is the PEP's emphasis on minimalism.

From PEP 554:

    This proposal is focused on enabling the fundamental capability of
    multiple isolated interpreters in the same Python process.  This is a
    new area for Python so there is relative uncertainty about the best
    tools to provide as companions to subinterpreters.  Thus we minimize
    the functionality we add in the proposal as much as possible.

I don't think anything you've mentioned really deviates much from
that, and making the module provisional helps.  I just want us to be
careful not to add stuff that we'll decide we want to remove later. :)

FYI, I'm already updating the PEP based on feedback from the other
email thread.  I'll let you know once all the updates are done.


On Sat, Apr 18, 2020 at 11:16 AM Antoine Pitrou <solip...@pitrou.net> wrote:
> First, I would like to say that I have no fundamental problem with this
> PEP. While I agree with Nathaniel that the rationale given about the CSP
> concurrency model seems a bit weak, the author is obviously expressing
> his opinion there and I won't object to that.  However, I think the PEP
> is desirable for other reasons.  Mostly, I hope that by making the
> subinterpreters functionality available to pure Python programmers
> (while it was formally an advanced and arcane part of the C API), we
> will spur a bunch of interesting third-party experimentations,
> including possibilities that we on python-dev have not thought about.

The experimentation angle is one I didn't consider all that much, but
you make a good point.

> The appeal of the PEP for experimentations is multiple:
> 1) ability to concurrently run independent execution environments
>    without spawning child processes (which on some platforms and in some
>    situations may not be very desirable: for example on Windows where
>    the cost of spawning is rather high; also, child processes may
>    crash, and sometimes it is not easy for the parent to recover,
>    especially if a synchronization primitive is left in an unexpected
>    state)
> 2) the potential for parallelizing CPU-bound pure Python code
>    in a single process, if a per-interpreter GIL is finally implemented
> 3) easier support for sharing large data between separate execution
>    environments, without the hassle of setting up shared memory or the
>    fragility of relying on fork() semantics
>
> (and as I said, I hope people find other applications)

These are covered in the PEP, though not together in the rationale,
etc.  Should I add explicit mention of experimentation as a motivation
in the abstract or rationale sections?  Would you like me to add a
dedicated paragraph/section covering experimentation?

> As for the argument that we already have asyncio and several other
> packages, I actually think that combining these different concurrency
> mechanisms would be interesting for complex applications (such as
> distributed systems).  For that, however, I think the PEP as currently
> written is a bit lacking, see below.

Yeah, that would be interesting.  What in particular would help make
subinterpreters and asyncio more cooperative?

> Now for the detailed comments.
>
> * I think the module should indeed be provisional.  Experimentation may
>   discover warts that call for a change in the API or semantics.  Let's
>   not prevent ourselves from fixing those issues.

Sounds good.

> * The "association" timing seems quirky and potentially annoying: an
>   interpreter only becomes associated with a channel the first time it
>   calls recv() or send().  How about, instead, associating an
>   interpreter with a channel as soon as that channel is given to it
>   through `Interpreter.run(..., channels=...)` (or received through
>   `recv()`)?

That seems fine to me.  I do not recall the exact reason for tying
association to recv() or send().  I only vaguely remember doing it
that way for a technical reason.  If I determine that reason then I'll
bring it up.  In the meantime I'll update the PEP to associate
interpreters when the channel end is sent.

FWIW, it may have been influenced by the automatic channel closing
when no interpreters are associated.  If interpreters are associated
when channel ends are sent (rather than when used) then interpreters
will have to be more careful about releasing channels.  That's just a
guess as to why I did it that way. :)

> * How hard would it be, in the current implementation, to add buffering
>   to channels?  It doesn't have to be infinite: you can choose a fixed
>   buffer size (or make it configurable in the create() function, which
>   allows passing 0 for unbuffered).  Like Nathaniel, I think unbuffered
>   channels will quickly be annoying to work with (yes, you can create a
>   helper thread... now you have one additional thread per channel,
>   which isn't pretty -- especially with the GIL).

Currently the low-level implementation supports "infinite" channel
buffering.  The restriction in the proposed high-level API was there to
allow us to go with a simpler low-level implementation.  However, I
don't think that is necessary at this point.  I'll update the PEP.
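To make the difference concrete, here's a rough sketch of buffered vs. unbuffered semantics, using `queue.Queue` as a stand-in for the proposed channel type (none of these names come from the PEP; a real unbuffered channel would be a strict rendezvous, which `Queue` can only approximate):

```python
import queue
import threading

# Buffered: send() never blocks (up to the buffer size), so a single
# thread can send before anyone is ready to receive.
ch = queue.Queue()          # stand-in for a buffered channel from create()
ch.put("hello")             # send() returns immediately
ch.put("world")
assert ch.get() == "hello"  # recv() drains in FIFO order
assert ch.get() == "world"

# Unbuffered (or tiny-buffered): the sender ends up blocked waiting on a
# receiver, so you need a helper thread per pending send -- the annoyance
# Antoine and Nathaniel point out above.
rendezvous = queue.Queue(maxsize=1)

def sender():
    rendezvous.put("ping")  # fills the single slot immediately
    rendezvous.put("pong")  # blocks until "ping" has been consumed

t = threading.Thread(target=sender)
t.start()
assert rendezvous.get() == "ping"
assert rendezvous.get() == "pong"
t.join()
```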

> * In the same vein, I think channels should allow adding readiness
>   callbacks (that are called whenever a channel becomes ready for
>   sending or receiving, respectively).  This would make it easy to plug
>   them into an event loop or other concurrency systems (such as
>   Future-based concurrency).  Note that each interpreter "associated"
>   with a channel should be able to set its own readiness callback: so
>   one callback per Python object representing the channel, but
>   potentially multiple callbacks for the underlying channel primitive.

Would this be as useful if we have buffered channels?  It sounds like
you wanted one or the other but not both.
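For reference, here is a toy, hypothetical sketch of what a per-end readiness callback might look like, again with a plain queue and threads standing in for real channel ends (all names here are invented, not from the PEP; the real mechanism would schedule the callback in the owning interpreter rather than call it synchronously):

```python
import queue
import threading

class RecvEnd:
    """Toy receiving end that fires a callback when data becomes ready.

    Stands in for a proposed channel recv end.  A real implementation
    would run the callback in the associated interpreter (e.g. via
    something like the pending-call machinery), not inline like this.
    """
    def __init__(self):
        self._q = queue.Queue()
        self._on_ready = None

    def set_ready_callback(self, cb):
        # One callback per channel-end object, as Antoine describes.
        self._on_ready = cb

    def _deliver(self, obj):
        # Invoked by the sending side.
        self._q.put(obj)
        if self._on_ready is not None:
            self._on_ready()        # "channel is ready for recv()"

    def recv_nowait(self):
        return self._q.get_nowait()

ready = threading.Event()
end = RecvEnd()
# In an asyncio program this could be loop.call_soon_threadsafe(...).
end.set_ready_callback(ready.set)

end._deliver(42)
ready.wait(timeout=1)
print(end.recv_nowait())            # -> 42
```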

>   (how would the callback be scheduled for execution in the right
>   interpreter? perhaps using `_PyEval_AddPendingCall()` or a similar
>   mechanism?)

Yeah, the pending call machinery has become my dear friend for several
parts of the low-level implementation for the PEP. :)

> * I think either `interpreters.get_main()` or `interpreters.is_main()`
> is desirable.  Inevitably, the slight differences between main and
>   non-main interpreters will surface in non-trivial applications
>   (finalization issues in distributed systems can really be hairy).  It
>   seems this should be mostly costless to provide, so let's do it.

In the PEP (https://www.python.org/dev/peps/pep-0554/#get-main) I have
this listed as a deferred functionality:

    for the basic functionality of a high-level API a get_main() function is
    not necessary. Furthermore, there is no requirement that a Python
    implementation have a concept of a main interpreter. So until there's
    a clear need we'll leave get_main() out.

My preference would be to leave it out, since it's much harder to
remove something later than to add it later.  However, it isn't a
major issue and is one of the deferred bits that I almost kept in the
PEP. :)  So I'll go ahead and add it to the proposed API.

> * I do think a minimal synchronization primitive would be nice.
>   Either a Lock (in the Python sense) or a Semaphore: both should be
>   relatively easy to provide, by wrapping an OS-level synchronization
>   primitive.  Then you can recreate all high-level synchronization
>   primitives, like the threading and multiprocessing modules do (using
>   a Lock or a Semaphore, respectively).
>
>   (note you should be able to emulate a semaphore using blocking send()
>   and recv() calls, but that's probably not very efficient, and
>   efficiency is important)

I'll address this specific ask in a separate post, to keep the
discussion focused.

> Of course, I hope these are all actionable before beta1 :-)  If not,
> here is my preferential priority list:
>
> * High priority: fix association timing
> * High priority: either buffering /or/ readiness callbacks
> * Middle priority: get_main() /or/ is_main()

These should be doable for beta1 since they're either trivial or
already done. :)

> * Middle / low priority: a simple synchronization primitive

This might be harder to get done for beta1.  That said, with a
provisional status we may be able to add it after beta1. :)

> But I would stress the more of these we provide, the more we encourage
> people to experiment without pulling too much of their hair.

Good point.  I think the emphasis on experimentation is valuable.

Thanks again,

-eric
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/2A2E3GYFI4VPJVJQN2JVWHVL54GXDJN6/
Code of Conduct: http://python.org/psf/codeofconduct/