Sorry to burst your bubble, but I have not followed any of the discussion and I am actually very worried about this topic. I don't think I will be able to make time for this before the 3.7b1 feature freeze.
On Tue, Dec 5, 2017 at 6:51 PM, Eric Snow <ericsnowcurren...@gmail.com> wrote: > Hi all, > > I've finally updated PEP 554. Feedback would be most welcome. The > PEP is in a pretty good place now and I hope to we're close to a > decision to accept it. :) > > In addition to resolving the open questions, I've also made the > following changes to the PEP: > > * put an API summary at the top and moved the full API description down > * add the "is_shareable()" function to indicate if an object can be shared > * added None as a shareable object > > Regarding the open questions: > > * "Leaking exceptions across interpreters" > > I chose to go with an approach that effectively creates a > traceback.TracebackException proxy of the original exception, wraps > that in a RuntimeError, and raises that in the calling interpreter. > Raising an exception that safely preserves the original exception and > traceback seems like the most intuitive behavior (to me, as a user). > The only alternative that made sense is to fully duplicate the > exception and traceback (minus stack frames) in the calling > interpreter, which is probably overkill and likely to be confusing. > > * "Initial support for buffers in channels" > > I chose to add a "SendChannel.send_buffer(obj)" method for this. > Supporting buffer objects from the beginning makes sense, opening good > experimentation opportunities for a valuable set of users. Supporting > buffer objects separately and explicitly helps set clear expectations > for users. I decided not to go with a separate class (e.g. > MemChannel) as it didn't seem like there's enough difference to > warrant keeping them strictly separate. > > FWIW, I'm still strongly in favor of support for passing (copies of) > bytes objects via channels. Passing objects to SendChannel.send() is > obvious. Limiting it, for now, to bytes (and None) helps us avoid > tying ourselves strongly to any particular implementation (it seems > like all the reservations were relative to the implementation). So I > do not see a reason to wait. > > * "Pass channels explicitly to run()?" > > I've applied the suggested solution (make "channels" an explicit > keyword argument). > > -eric > > > I've include the latest full text > (https://www.python.org/dev/peps/pep-0554/) below: > > +++++++++++++++++++++++++++++++++++++++++++++++++ > > PEP: 554 > Title: Multiple Interpreters in the Stdlib > Author: Eric Snow <ericsnowcurren...@gmail.com> > Status: Draft > Type: Standards Track > Content-Type: text/x-rst > Created: 2017-09-05 > Python-Version: 3.7 > Post-History: 07-Sep-2017, 08-Sep-2017, 13-Sep-2017, 05-Dec-2017 > > > Abstract > ======== > > CPython has supported multiple interpreters in the same process (AKA > "subinterpreters") since version 1.5. The feature has been available > via the C-API. [c-api]_ Subinterpreters operate in > `relative isolation from one another <Interpreter Isolation_>`_, which > provides the basis for an > `alternative concurrency model <Concurrency_>`_. > > This proposal introduces the stdlib ``interpreters`` module. The module > will be `provisional <Provisional Status_>`_. It exposes the basic > functionality of subinterpreters already provided by the C-API, along > with new functionality for sharing data between interpreters. > > > Proposal > ======== > > The ``interpreters`` module will be added to the stdlib. It will > provide a high-level interface to subinterpreters and wrap a new > low-level ``_interpreters`` (in the same was as the ``threading`` > module). See the `Examples`_ section for concrete usage and use cases. > > Along with exposing the existing (in CPython) subinterpreter support, > the module will also provide a mechanism for sharing data between > interpreters. This mechanism centers around "channels", which are > similar to queues and pipes. > > Note that *objects* are not shared between interpreters since they are > tied to the interpreter in which they were created. Instead, the > objects' *data* is passed between interpreters. See the `Shared data`_ > section for more details about sharing between interpreters. > > At first only the following types will be supported for sharing: > > * None > * bytes > * PEP 3118 buffer objects (via ``send_buffer()``) > > Support for other basic types (e.g. int, Ellipsis) will be added later. > > API summary for interpreters module > ----------------------------------- > > Here is a summary of the API for the ``interpreters`` module. For a > more in-depth explanation of the proposed classes and functions, see > the `"interpreters" Module API`_ section below. > > For creating and using interpreters: > > +------------------------------+---------------------------- > ------------------+ > | signature | description > | > +============================+=+============================ > ==================+ > | list_all() -> [Intepreter] | Get all existing interpreters. > | > +------------------------------+---------------------------- > ------------------+ > | get_current() -> Interpreter | Get the currently running interpreter. > | > +------------------------------+---------------------------- > ------------------+ > | create() -> Interpreter | Initialize a new (idle) Python > interpreter. | > +------------------------------+---------------------------- > ------------------+ > > | > > +-----------------------+----------------------------------- > ------------------+ > | signature | description > | > +=======================+=================================== > ==================+ > | class Interpreter(id) | A single interpreter. > | > +-----------------------+----------------------------------- > ------------------+ > | .id | The interpreter's ID (read-only). > | > +-----------------------+----------------------------------- > ------------------+ > | .is_running() -> Bool | Is the interpreter currently executing code? > | > +-----------------------+----------------------------------- > ------------------+ > | .destroy() | Finalize and destroy the interpreter. > | > +-----------------------+----------------------------------- > ------------------+ > | .run(src_str, /, \*, | | Run the given source code in the interpreter. > | > | channels=None) | | (This blocks the current thread until done.) > | > +-----------------------+----------------------------------- > ------------------+ > > For sharing data between interpreters: > > +--------------------------------+-------------------------- > ------------------+ > | signature | description > | > +================================+========================== > ==================+ > | is_shareable(obj) -> Bool | | Can the object's data be shared > | > | | | between interpreters? > | > +--------------------------------+-------------------------- > ------------------+ > | create_channel() -> | | Create a new channel for passing > | > | (RecvChannel, SendChannel) | | data between interpreters. > | > +--------------------------------+-------------------------- > ------------------+ > | list_all_channels() -> | Get all open channels. > | > | [(RecvChannel, SendChannel)] | > | > +--------------------------------+-------------------------- > ------------------+ > > | > > +-------------------------------+--------------------------- > --------------------+ > | signature | description > | > +===============================+=========================== > ====================+ > | class RecvChannel(id) | The receiving end of a channel. > | > +-------------------------------+--------------------------- > --------------------+ > | .id | The channel's unique ID. > | > +-------------------------------+--------------------------- > --------------------+ > | .interpreters | The list of associated interpreters. > | > +-------------------------------+--------------------------- > --------------------+ > | .recv() -> object | | Get the next object from the > channel, | > | | | and wait if none have been sent. > | > | | | Associate the interpreter with the > channel. | > +-------------------------------+--------------------------- > --------------------+ > | .recv_nowait(default=None) -> | | Like recv(), but return the > default | > | object | | instead of waiting. > | > +-------------------------------+--------------------------- > --------------------+ > | .close() | | No longer associate the current > interpreter | > | | | with the channel (on the receiving > end). | > +-------------------------------+--------------------------- > --------------------+ > > | > > +---------------------------+------------------------------- > ------------------+ > | signature | description > | > +===========================+=============================== > ==================+ > | class SendChannel(id) | The sending end of a channel. > | > +---------------------------+------------------------------- > ------------------+ > | .id | The channel's unique ID. > | > +---------------------------+------------------------------- > ------------------+ > | .interpreters | The list of associated interpreters. > | > +---------------------------+------------------------------- > ------------------+ > | .send(obj) | | Send the object (i.e. its data) to the > | > | | | receiving end of the channel and wait. > | > | | | Associate the interpreter with the > channel. | > +---------------------------+------------------------------- > ------------------+ > | .send_nowait(obj) | | Like send(), but Fail if not received. > | > +---------------------------+------------------------------- > ------------------+ > | .send_buffer(obj) | | Send the object's (PEP 3118) buffer to > the | > | | | receiving end of the channel and wait. > | > | | | Associate the interpreter with the > channel. | > +---------------------------+------------------------------- > ------------------+ > | .send_buffer_nowait(obj) | | Like send_buffer(), but fail if not > received. | > +---------------------------+------------------------------- > ------------------+ > | .close() | | No longer associate the current > interpreter | > | | | with the channel (on the sending end). > | > +---------------------------+------------------------------- > ------------------+ > > > Examples > ======== > > Run isolated code > ----------------- > > :: > > interp = interpreters.create() > print('before') > interp.run('print("during")') > print('after') > > Run in a thread > --------------- > > :: > > interp = interpreters.create() > def run(): > interp.run('print("during")') > t = threading.Thread(target=run) > print('before') > t.start() > print('after') > > Pre-populate an interpreter > --------------------------- > > :: > > interp = interpreters.create() > interp.run(tw.dedent(""" > import some_lib > import an_expensive_module > some_lib.set_up() > """)) > wait_for_request() > interp.run(tw.dedent(""" > some_lib.handle_request() > """)) > > Handling an exception > --------------------- > > :: > > interp = interpreters.create() > try: > interp.run(tw.dedent(""" > raise KeyError > """)) > except KeyError: > print("got the error from the subinterpreter") > > Synchronize using a channel > --------------------------- > > :: > > interp = interpreters.create() > r, s = interpreters.create_channel() > def run(): > interp.run(tw.dedent(""" > reader.recv() > print("during") > reader.close() > """), > reader=r)) > t = threading.Thread(target=run) > print('before') > t.start() > print('after') > s.send(b'') > s.close() > > Sharing a file descriptor > ------------------------- > > :: > > interp = interpreters.create() > r1, s1 = interpreters.create_channel() > r2, s2 = interpreters.create_channel() > def run(): > interp.run(tw.dedent(""" > fd = int.from_bytes( > reader.recv(), 'big') > for line in os.fdopen(fd): > print(line) > writer.send(b'') > """), > reader=r1, writer=s2) > t = threading.Thread(target=run) > t.start() > with open('spamspamspam') as infile: > fd = infile.fileno().to_bytes(1, 'big') > s.send(fd) > r.recv() > > Passing objects via marshal > --------------------------- > > :: > > interp = interpreters.create() > r, s = interpreters.create_fifo() > interp.run(tw.dedent(""" > import marshal > """), > reader=r) > def run(): > interp.run(tw.dedent(""" > data = reader.recv() > while data: > obj = marshal.loads(data) > do_something(obj) > data = reader.recv() > reader.close() > """), > reader=r) > t = threading.Thread(target=run) > t.start() > for obj in input: > data = marshal.dumps(obj) > s.send(data) > s.send(b'') > > Passing objects via pickle > -------------------------- > > :: > > interp = interpreters.create() > r, s = interpreters.create_channel() > interp.run(tw.dedent(""" > import pickle > """), > reader=r) > def run(): > interp.run(tw.dedent(""" > data = reader.recv() > while data: > obj = pickle.loads(data) > do_something(obj) > data = reader.recv() > reader.close() > """), > reader=r) > t = threading.Thread(target=run) > t.start() > for obj in input: > data = pickle.dumps(obj) > s.send(data) > s.send(b'') > > Running a module > ---------------- > > :: > > interp = interpreters.create() > main_module = mod_name > interp.run(f'import runpy; runpy.run_module({main_module!r})') > > Running as script (including zip archives & directories) > -------------------------------------------------------- > > :: > > interp = interpreters.create() > main_script = path_name > interp.run(f"import runpy; runpy.run_path({main_script!r})") > > Running in a thread pool executor > --------------------------------- > > :: > > interps = [interpreters.create() for i in range(5)] > with concurrent.futures.ThreadPoolExecutor(max_workers=len(interps)) > as pool: > print('before') > for interp in interps: > pool.submit(interp.run, 'print("starting"); print("stopping")' > print('after') > > > Rationale > ========= > > Running code in multiple interpreters provides a useful level of > isolation within the same process. This can be leveraged in a number > of ways. Furthermore, subinterpreters provide a well-defined framework > in which such isolation may extended. > > Nick Coghlan explained some of the benefits through a comparison with > multi-processing [benefits]_:: > > [I] expect that communicating between subinterpreters is going > to end up looking an awful lot like communicating between > subprocesses via shared memory. > > The trade-off between the two models will then be that one still > just looks like a single process from the point of view of the > outside world, and hence doesn't place any extra demands on the > underlying OS beyond those required to run CPython with a single > interpreter, while the other gives much stricter isolation > (including isolating C globals in extension modules), but also > demands much more from the OS when it comes to its IPC > capabilities. > > The security risk profiles of the two approaches will also be quite > different, since using subinterpreters won't require deliberately > poking holes in the process isolation that operating systems give > you by default. > > CPython has supported subinterpreters, with increasing levels of > support, since version 1.5. While the feature has the potential > to be a powerful tool, subinterpreters have suffered from neglect > because they are not available directly from Python. Exposing the > existing functionality in the stdlib will help reverse the situation. > > This proposal is focused on enabling the fundamental capability of > multiple isolated interpreters in the same Python process. This is a > new area for Python so there is relative uncertainly about the best > tools to provide as companions to subinterpreters. Thus we minimize > the functionality we add in the proposal as much as possible. > > Concerns > -------- > > * "subinterpreters are not worth the trouble" > > Some have argued that subinterpreters do not add sufficient benefit > to justify making them an official part of Python. Adding features > to the language (or stdlib) has a cost in increasing the size of > the language. So an addition must pay for itself. In this case, > subinterpreters provide a novel concurrency model focused on isolated > threads of execution. Furthermore, they provide an opportunity for > changes in CPython that will allow simulateous use of multiple CPU > cores (currently prevented by the GIL). > > Alternatives to subinterpreters include threading, async, and > multiprocessing. Threading is limited by the GIL and async isn't > the right solution for every problem (nor for every person). > Multiprocessing is likewise valuable in some but not all situations. > Direct IPC (rather than via the multiprocessing module) provides > similar benefits but with the same caveat. > > Notably, subinterpreters are not intended as a replacement for any of > the above. Certainly they overlap in some areas, but the benefits of > subinterpreters include isolation and (potentially) performance. In > particular, subinterpreters provide a direct route to an alternate > concurrency model (e.g. CSP) which has found success elsewhere and > will appeal to some Python users. That is the core value that the > ``interpreters`` module will provide. > > * "stdlib support for subinterpreters adds extra burden > on C extension authors" > > In the `Interpreter Isolation`_ section below we identify ways in > which isolation in CPython's subinterpreters is incomplete. Most > notable is extension modules that use C globals to store internal > state. PEP 3121 and PEP 489 provide a solution for most of the > problem, but one still remains. [petr-c-ext]_ Until that is resolved, > C extension authors will face extra difficulty to support > subinterpreters. > > Consequently, projects that publish extension modules may face an > increased maintenance burden as their users start using subinterpreters, > where their modules may break. This situation is limited to modules > that use C globals (or use libraries that use C globals) to store > internal state. For numpy, the reported-bug rate is one every 6 > months. [bug-rate]_ > > Ultimately this comes down to a question of how often it will be a > problem in practice: how many projects would be affected, how often > their users will be affected, what the additional maintenance burden > will be for projects, and what the overall benefit of subinterpreters > is to offset those costs. The position of this PEP is that the actual > extra maintenance burden will be small and well below the threshold at > which subinterpreters are worth it. > > > About Subinterpreters > ===================== > > Concurrency > ----------- > > Concurrency is a challenging area of software development. Decades of > research and practice have led to a wide variety of concurrency models, > each with different goals. Most center on correctness and usability. > > One class of concurrency models focuses on isolated threads of > execution that interoperate through some message passing scheme. A > notable example is `Communicating Sequential Processes`_ (CSP), upon > which Go's concurrency is based. The isolation inherent to > subinterpreters makes them well-suited to this approach. > > Shared data > ----------- > > Subinterpreters are inherently isolated (with caveats explained below), > in contrast to threads. So the same communicate-via-shared-memory > approach doesn't work. Without an alternative, effective use of > concurrency via subinterpreters is significantly limited. > > The key challenge here is that sharing objects between interpreters > faces complexity due to various constraints on object ownership, > visibility, and mutability. At a conceptual level it's easier to > reason about concurrency when objects only exist in one interpreter > at a time. At a technical level, CPython's current memory model > limits how Python *objects* may be shared safely between interpreters; > effectively objects are bound to the interpreter in which they were > created. Furthermore the complexity of *object* sharing increases as > subinterpreters become more isolated, e.g. after GIL removal. > > Consequently,the mechanism for sharing needs to be carefully considered. > There are a number of valid solutions, several of which may be > appropriate to support in Python. This proposal provides a single basic > solution: "channels". Ultimately, any other solution will look similar > to the proposed one, which will set the precedent. Note that the > implementation of ``Interpreter.run()`` can be done in a way that allows > for multiple solutions to coexist, but doing so is not technically > a part of the proposal here. > > Regarding the proposed solution, "channels", it is a basic, opt-in data > sharing mechanism that draws inspiration from pipes, queues, and CSP's > channels. [fifo]_ > > As simply described earlier by the API summary, > channels have two operations: send and receive. A key characteristic > of those operations is that channels transmit data derived from Python > objects rather than the objects themselves. When objects are sent, > their data is extracted. When the "object" is received in the other > interpreter, the data is converted back into an object. > > To make this work, the mutable shared state will be managed by the > Python runtime, not by any of the interpreters. Initially we will > support only one type of objects for shared state: the channels provided > by ``create_channel()``. Channels, in turn, will carefully manage > passing objects between interpreters. > > This approach, including keeping the API minimal, helps us avoid further > exposing any underlying complexity to Python users. Along those same > lines, we will initially restrict the types that may be passed through > channels to the following: > > * None > * bytes > * PEP 3118 buffer objects (via ``send_buffer()``) > > Limiting the initial shareable types is a practical matter, reducing > the potential complexity of the initial implementation. There are a > number of strategies we may pursue in the future to expand supported > objects and object sharing strategies. > > Interpreter Isolation > --------------------- > > CPython's interpreters are intended to be strictly isolated from each > other. Each interpreter has its own copy of all modules, classes, > functions, and variables. The same applies to state in C, including in > extension modules. The CPython C-API docs explain more. [caveats]_ > > However, there are ways in which interpreters share some state. First > of all, some process-global state remains shared: > > * file descriptors > * builtin types (e.g. dict, bytes) > * singletons (e.g. None) > * underlying static module data (e.g. functions) for > builtin/extension/frozen modules > > There are no plans to change this. > > Second, some isolation is faulty due to bugs or implementations that did > not take subinterpreters into account. This includes things like > extension modules that rely on C globals. [cryptography]_ In these > cases bugs should be opened (some are already): > > * readline module hook functions (http://bugs.python.org/issue4202) > * memory leaks on re-init (http://bugs.python.org/issue21387) > > Finally, some potential isolation is missing due to the current design > of CPython. Improvements are currently going on to address gaps in this > area: > > * interpreters share the GIL > * interpreters share memory management (e.g. allocators, gc) > * GC is not run per-interpreter [global-gc]_ > * at-exit handlers are not run per-interpreter [global-atexit]_ > * extensions using the ``PyGILState_*`` API are incompatible [gilstate]_ > > Existing Usage > -------------- > > Subinterpreters are not a widely used feature. In fact, the only > documented cases of wide-spread usage are > `mod_wsgi <https://github.com/GrahamDumpleton/mod_wsgi>`_and > `JEP <https://github.com/ninia/jep>`_. On the one hand, this case > provides confidence that existing subinterpreter support is relatively > stable. On the other hand, there isn't much of a sample size from which > to judge the utility of the feature. > > > Provisional Status > ================== > > The new ``interpreters`` module will be added with "provisional" status > (see PEP 411). This allows Python users to experiment with the feature > and provide feedback while still allowing us to adjust to that feedback. > The module will be provisional in Python 3.7 and we will make a decision > before the 3.8 release whether to keep it provisional, graduate it, or > remove it. > > > Alternate Python Implementations > ================================ > > I'll be soliciting feedback from the different Python implementors about > subinterpreter support. > > Multiple-interpter support in the major Python implementations: > > TBD > > * jython: yes [jython]_ > * ironpython: yes? > * pypy: maybe not? [pypy]_ > * micropython: ??? > > > "interpreters" Module API > ========================= > > The module provides the following functions: > > ``list_all()``:: > > Return a list of all existing interpreters. > > ``get_current()``:: > > Return the currently running interpreter. > > ``create()``:: > > Initialize a new Python interpreter and return it. The > interpreter will be created in the current thread and will remain > idle until something is run in it. The interpreter may be used > in any thread and will run in whichever thread calls > ``interp.run()``. > > > The module also provides the following class: > > ``Interpreter(id)``:: > > id: > > The interpreter's ID (read-only). > > is_running(): > > Return whether or not the interpreter is currently executing code. > Calling this on the current interpreter will always return True. > > destroy(): > > Finalize and destroy the interpreter. > > This may not be called on an already running interpreter. Doing > so results in a RuntimeError. > > run(source_str, /, *, channels=None): > > Run the provided Python source code in the interpreter. If the > "channels" keyword argument is provided (and is a mapping of > attribute names to channels) then it is added to the interpreter's > execution namespace (the interpreter's "__main__" module). If any > of the values are not are not RecvChannel or SendChannel instances > then ValueError gets raised. > > This may not be called on an already running interpreter. Doing > so results in a RuntimeError. > > A "run()" call is similar to a function call. Once it completes, > the code that called "run()" continues executing (in the original > interpreter). Likewise, if there is any uncaught exception then > it effectively (see below) propagates into the code where > ``run()`` was called. However, unlike function calls (but like > threads), there is no return value. If any value is needed, pass > it out via a channel. > > The big difference is that "run()" executes the code in an > entirely different interpreter, with entirely separate state. > The state of the current interpreter in the current OS thread > is swapped out with the state of the target interpreter (the one > that will execute the code). When the target finishes executing, > the original interpreter gets swapped back in and its execution > resumes. > > So calling "run()" will effectively cause the current Python > thread to pause. Sometimes you won't want that pause, in which > case you should make the "run()" call in another thread. To do > so, add a function that calls "run()" and then run that function > in a normal "threading.Thread". > > Note that the interpreter's state is never reset, neither before > "run()" executes the code nor after. Thus the interpreter > state is preserved between calls to "run()". This includes > "sys.modules", the "builtins" module, and the internal state > of C extension modules. > > Also note that "run()" executes in the namespace of the "__main__" > module, just like scripts, the REPL, "-m", and "-c". Just as > the interpreter's state is not ever reset, the "__main__" module > is never reset. You can imagine concatenating the code from each > "run()" call into one long script. This is the same as how the > REPL operates. > > Regarding uncaught exceptions, we noted that they are > "effectively" propagated into the code where ``run()`` was called. > To prevent leaking exceptions (and tracebacks) between > interpreters, we create a surrogate of the exception and its > traceback (see ``traceback.TracebackException``), wrap it in a > RuntimeError, and raise that. > > Supported code: source text. > > > API for sharing data > -------------------- > > Subinterpreters are less useful without a mechanism for sharing data > between them. Sharing actual Python objects between interpreters, > however, has enough potential problems that we are avoiding support > for that here. Instead, only mimimum set of types will be supported. > Initially this will include ``bytes`` and channels. Further types may > be supported later. > > The ``interpreters`` module provides a way for users to determine > whether an object is shareable or not: > > ``is_shareable(obj)``:: > > Return True if the object may be shared between interpreters. This > does not necessarily mean that the actual objects will be shared. > Insead, it means that the objects' underlying data will be shared in > a cross-interpreter way, whether via a proxy, a copy, or some other > means. > > This proposal provides two ways to do share such objects between > interpreters. > > First, shareable objects may be passed to ``run()`` as keyword arguments, > where they are effectively injected into the target interpreter's > ``__main__`` module. This is mainly intended for sharing meta-objects > (e.g. channels) between interpreters, as it is less useful to pass other > objects (like ``bytes``) to ``run``. > > Second, the main mechanism for sharing objects (i.e. their data) between > interpreters is through channels. A channel is a simplex FIFO similar > to a pipe. The main difference is that channels can be associated with > zero or more interpreters on either end. Unlike queues, which are also > many-to-many, channels have no buffer. > > ``create_channel()``:: > > Create a new channel and return (recv, send), the RecvChannel and > SendChannel corresponding to the ends of the channel. The channel > is not closed and destroyed (i.e. garbage-collected) until the number > of associated interpreters returns to 0. > > An interpreter gets associated with a channel by calling its "send()" > or "recv()" method. That association gets dropped by calling > "close()" on the channel. > > Both ends of the channel are supported "shared" objects (i.e. may be > safely shared by different interpreters. Thus they may be passed as > keyword arguments to "Interpreter.run()". > > ``list_all_channels()``:: > > Return a list of all open (RecvChannel, SendChannel) pairs. > > > ``RecvChannel(id)``:: > > The receiving end of a channel. An interpreter may use this to > receive objects from another interpreter. At first only bytes will > be supported. > > id: > > The channel's unique ID. > > interpreters: > > The list of associated interpreters: those that have called > the "recv()" or "__next__()" methods and haven't called "close()". > > recv(): > > Return the next object (i.e. the data from the sent object) from > the channel. If none have been sent then wait until the next > send. This associates the current interpreter with the channel. > > If the channel is already closed (see the close() method) > then raise EOFError. If the channel isn't closed, but the current > interpreter already called the "close()" method (which drops its > association with the channel) then raise ValueError. > > recv_nowait(default=None): > > Return the next object from the channel. If none have been sent > then return the default. Otherwise, this is the same as the > "recv()" method. > > close(): > > No longer associate the current interpreter with the channel (on > the receiving end) and block future association (via the "recv()" > method. If the interpreter was never associated with the channel > then still block future association. Once an interpreter is no > longer associated with the channel, subsequent (or current) send() > and recv() calls from that interpreter will raise ValueError > (or EOFError if the channel is actually marked as closed). > > Once the number of associated interpreters on both ends drops > to 0, the channel is actually marked as closed. The Python > runtime will garbage collect all closed channels, though it may > not be immediately. Note that "close()" is automatically called > in behalf of the current interpreter when the channel is no longer > used (i.e. has no references) in that interpreter. > > This operation is idempotent. Return True if "close()" has not > been called before by the current interpreter. > > > ``SendChannel(id)``:: > > The sending end of a channel. An interpreter may use this to send > objects to another interpreter. At first only bytes will be > supported. > > id: > > The channel's unique ID. > > interpreters: > > The list of associated interpreters (those that have called > the "send()" method). > > send(obj): > > Send the object (i.e. its data) to the receiving end of the > channel. Wait until the object is received. If the the > object is not shareable then ValueError is raised. Currently > only bytes are supported. > > If the channel is already closed (see the close() method) > then raise EOFError. If the channel isn't closed, but the current > interpreter already called the "close()" method (which drops its > association with the channel) then raise ValueError. > > send_nowait(obj): > > Send the object to the receiving end of the channel. If the other > end is not currently receiving then raise RuntimeError. Otherwise > this is the same as "send()". > > send_buffer(obj): > > Send a MemoryView of the object rather than the object. Otherwise > this is the same as send(). Note that the object must implement > the PEP 3118 buffer protocol. > > send_buffer_nowait(obj): > > Send a MemoryView of the object rather than the object. If the > other end is not currently receiving then raise RuntimeError. > Otherwise this is the same as "send_buffer()". > > close(): > > This is the same as "RecvChannel.close(), but applied to the > sending end of the channel. > > Note that ``send_buffer()`` is similar to how > ``multiprocessing.Connection`` works. [mp-conn]_ > > > Open Questions > ============== > > None > > > Open Implementation Questions > ============================= > > Does every interpreter think that their thread is the "main" thread? > -------------------------------------------------------------------- > > (This is more of an implementation detail that an issue for the PEP.) > > CPython's interpreter implementation identifies the OS thread in which > it was started as the "main" thread. The interpreter the has slightly > different behavior depending on if the current thread is the main one > or not. This presents a problem in cases where "main thread" is meant > to imply "main thread in the main interpreter" [main-thread]_, where > the main interpreter is the initial one. > > Disallow subinterpreters in the main thread? > -------------------------------------------- > > (This is more of an implementation detail that an issue for the PEP.) > > This is a specific case of the above issue. Currently in CPython, > "we need a main \*thread\* in order to sensibly manage the way signal > handling works across different platforms". [main-thread]_ > > Since signal handlers are part of the interpreter state, running a > subinterpreter in the main thread means that the main interpreter > can no longer properly handle signals (since it's effectively paused). > > Furthermore, running a subinterpreter in the main thread would > conceivably allow setting signal handlers on that interpreter, which > would likewise impact signal handling when that interpreter isn't > running or is running in a different thread. > > Ultimately, running subinterpreters in the main OS thread introduces > complications to the signal handling implementation. So it may make > the most sense to disallow running subinterpreters in the main thread. > Support for it could be considered later. The downside is that folks > wanting to try out subinterpreters would be required to take the extra > step of using threads. This could slow adoption and experimentation, > whereas without the restriction there's less of an obstacle. > > > Deferred Functionality > ====================== > > In the interest of keeping this proposal minimal, the following > functionality has been left out for future consideration. Note that > this is not a judgement against any of said capability, but rather a > deferment. That said, each is arguably valid. > > Interpreter.call() > ------------------ > > It would be convenient to run existing functions in subinterpreters > directly. ``Interpreter.run()`` could be adjusted to support this or > a ``call()`` method could be added:: > > Interpreter.call(f, *args, **kwargs) > > This suffers from the same problem as sharing objects between > interpreters via queues. The minimal solution (running a source string) > is sufficient for us to get the feature out where it can be explored. > > timeout arg to recv() and send() > -------------------------------- > > Typically functions that have a ``block`` argument also have a > ``timeout`` argument. It sometimes makes sense to do likewise for > functions that otherwise block, like the channel ``recv()`` and > ``send()`` methods. We can add it later if needed. > > get_main() > ---------- > > CPython has a concept of a "main" interpreter. This is the initial > interpreter created during CPython's runtime initialization. It may > be useful to identify the main interpreter. For instance, the main > interpreter should not be destroyed. However, for the basic > functionality of a high-level API a ``get_main()`` function is not > necessary. Furthermore, there is no requirement that a Python > implementation have a concept of a main interpreter. So until there's > a clear need we'll leave ``get_main()`` out. > > Interpreter.run_in_thread() > --------------------------- > > This method would make a ``run()`` call for you in a thread. Doing this > using only ``threading.Thread`` and ``run()`` is relatively trivial so > we've left it out. > > Synchronization Primitives > -------------------------- > > The ``threading`` module provides a number of synchronization primitives > for coordinating concurrent operations. This is especially necessary > due to the shared-state nature of threading. In contrast, > subinterpreters do not share state. Data sharing is restricted to > channels, which do away with the need for explicit synchronization. If > any sort of opt-in shared state support is added to subinterpreters in > the future, that same effort can introduce synchronization primitives > to meet that need. > > CSP Library > ----------- > > A ``csp`` module would not be a large step away from the functionality > provided by this PEP. However, adding such a module is outside the > minimalist goals of this proposal. > > Syntactic Support > ----------------- > > The ``Go`` language provides a concurrency model based on CSP, so > it's similar to the concurrency model that subinterpreters support. > ``Go`` provides syntactic support, as well several builtin concurrency > primitives, to make concurrency a first-class feature. Conceivably, > similar syntactic (and builtin) support could be added to Python using > subinterpreters. However, that is *way* outside the scope of this PEP! > > Multiprocessing > --------------- > > The ``multiprocessing`` module could support subinterpreters in the same > way it supports threads and processes. In fact, the module's > maintainer, Davin Potts, has indicated this is a reasonable feature > request. However, it is outside the narrow scope of this PEP. > > C-extension opt-in/opt-out > -------------------------- > > By using the ``PyModuleDef_Slot`` introduced by PEP 489, we could easily > add a mechanism by which C-extension modules could opt out of support > for subinterpreters. Then the import machinery, when operating in > a subinterpreter, would need to check the module for support. It would > raise an ImportError if unsupported. > > Alternately we could support opting in to subinterpreter support. > However, that would probably exclude many more modules (unnecessarily) > than the opt-out approach. > > The scope of adding the ModuleDef slot and fixing up the import > machinery is non-trivial, but could be worth it. It all depends on > how many extension modules break under subinterpreters. Given the > relatively few cases we know of through mod_wsgi, we can leave this > for later. > > Poisoning channels > ------------------ > > CSP has the concept of poisoning a channel. Once a channel has been > poisoned, and ``send()`` or ``recv()`` call on it will raise a special > exception, effectively ending execution in the interpreter that tried > to use the poisoned channel. > > This could be accomplished by adding a ``poison()`` method to both ends > of the channel. The ``close()`` method could work if it had a ``force`` > option to force the channel closed. Regardless, these semantics are > relatively specialized and can wait. > > Sending channels over channels > ------------------------------ > > Some advanced usage of subinterpreters could take advantage of the > ability to send channels over channels, in addition to bytes. Given > that channels will already be multi-interpreter safe, supporting then > in ``RecvChannel.recv()`` wouldn't be a big change. However, this can > wait until the basic functionality has been ironed out. > > Reseting __main__ > ----------------- > > As proposed, every call to ``Interpreter.run()`` will execute in the > namespace of the interpreter's existing ``__main__`` module. This means > that data persists there between ``run()`` calls. Sometimes this isn't > desireable and you want to execute in a fresh ``__main__``. Also, > you don't necessarily want to leak objects there that you aren't using > any more. > > Note that the following won't work right because it will clear too much > (e.g. ``__name__`` and the other "__dunder__" attributes:: > > interp.run('globals().clear()') > > Possible solutions include: > > * a ``create()`` arg to indicate resetting ``__main__`` after each > ``run`` call > * an ``Interpreter.reset_main`` flag to support opting in or out > after the fact > * an ``Interpreter.reset_main()`` method to opt in when desired > * ``importlib.util.reset_globals()`` [reset_globals]_ > > Also note that reseting ``__main__`` does nothing about state stored > in other modules. So any solution would have to be clear about the > scope of what is being reset. Conceivably we could invent a mechanism > by which any (or every) module could be reset, unlike ``reload()`` > which does not clear the module before loading into it. Regardless, > since ``__main__`` is the execution namespace of the interpreter, > resetting it has a much more direct correlation to interpreters and > their dynamic state than does resetting other modules. So a more > generic module reset mechanism may prove unnecessary. > > This isn't a critical feature initially. It can wait until later > if desirable. > > Support passing ints in channels > -------------------------------- > > Passing ints around should be fine and ultimately is probably > desirable. However, we can get by with serializing them as bytes > for now. The goal is a minimal API for the sake of basic > functionality at first. > > File descriptors and sockets in channels > ---------------------------------------- > > Given that file descriptors and sockets are process-global resources, > support for passing them through channels is a reasonable idea. They > would be a good candidate for the first effort at expanding the types > that channels support. They aren't strictly necessary for the initial > API. > > Integration with async > ---------------------- > > Per Antoine Pitrou [async]_:: > > Has any thought been given to how FIFOs could integrate with async > code driven by an event loop (e.g. asyncio)? I think the model of > executing several asyncio (or Tornado) applications each in their > own subinterpreter may prove quite interesting to reconcile multi- > core concurrency with ease of programming. That would require the > FIFOs to be able to synchronize on something an event loop can wait > on (probably a file descriptor?). > > A possible solution is to provide async implementations of the blocking > channel methods (``__next__()``, ``recv()``, and ``send()``). However, > the basic functionality of subinterpreters does not depend on async and > can be added later. > > Support for iteration > --------------------- > > Supporting iteration on ``RecvChannel`` (via ``__iter__()`` or > ``_next__()``) may be useful. A trivial implementation would use the > ``recv()`` method, similar to how files do iteration. Since this isn't > a fundamental capability and has a simple analog, adding iteration > support can wait until later. > > Channel context managers > ------------------------ > > Context manager support on ``RecvChannel`` and ``SendChannel`` may be > helpful. The implementation would be simple, wrapping a call to > ``close()`` like files do. As with iteration, this can wait. > > Pipes and Queues > ---------------- > > With the proposed object passing machanism of "channels", other similar > basic types aren't required to achieve the minimal useful functionality > of subinterpreters. Such types include pipes (like channels, but > one-to-one) and queues (like channels, but buffered). See below in > `Rejected Ideas` for more information. > > Even though these types aren't part of this proposal, they may still > be useful in the context of concurrency. Adding them later is entirely > reasonable. The could be trivially implemented as wrappers around > channels. Alternatively they could be implemented for efficiency at the > same low level as channels. > > interpreters.RunFailedError > --------------------------- > > As currently proposed, ``Interpreter.run()`` offers you no way to > distinguish an error coming from the subinterpreter from any other > error in the current interpreter. Your only option would be to > explicitly wrap your ``run()`` call in a > ``try: ... except RuntimeError:`` (since we wrap a proxy of the original > exception in a RuntimeError and raise that). > > If this is a problem in practice then would could add something like > ``interpreters.RunFailedError`` (subclassing RuntimeError) and raise that > in ``run()``. > > Return a lock from send() > ------------------------- > > When sending an object through a channel, you don't have a way of knowing > when the object gets received on the other end. One way to work around > this is to return a locked ``threading.Lock`` from ``SendChannel.send()`` > that unlocks once the object is received. > > This matters for buffered channels (i.e. queues). For unbuffered > channels it is a non-issue. So this can be dealt with once channels > support buffering. > > > Rejected Ideas > ============== > > Explicit channel association > ---------------------------- > > Interpreters are implicitly associated with channels upon ``recv()`` and > ``send()`` calls. They are de-associated with ``close()`` calls. The > alternative would be explicit methods. It would be either > ``add_channel()`` and ``remove_channel()`` methods on ``Interpreter`` > objects or something similar on channel objects. > > In practice, this level of management shouldn't be necessary for users. > So adding more explicit support would only add clutter to the API. > > Use pipes instead of channels > ----------------------------- > > A pipe would be a simplex FIFO between exactly two interpreters. For > most use cases this would be sufficient. It could potentially simplify > the implementation as well. However, it isn't a big step to supporting > a many-to-many simplex FIFO via channels. Also, with pipes the API > ends up being slightly more complicated, requiring naming the pipes. > > Use queues instead of channels > ------------------------------ > > The main difference between queues and channels is that queues support > buffering. This would complicate the blocking semantics of ``recv()`` > and ``send()``. Also, queues can be built on top of channels. > > "enumerate" > ----------- > > The ``list_all()`` function provides the list of all interpreters. > In the threading module, which partly inspired the proposed API, the > function is called ``enumerate()``. The name is different here to > avoid confusing Python users that are not already familiar with the > threading API. For them "enumerate" is rather unclear, whereas > "list_all" is clear. > > Alternate solutions to prevent leaking exceptions across interpreters > --------------------------------------------------------------------- > > In function calls, uncaught exceptions propagate to the calling frame. > The same approach could be taken with ``run()``. However, this would > mean that exception objects would leak across the inter-interpreter > boundary. Likewise, the frames in the traceback would potentially leak. > > While that might not be a problem currently, it would be a problem once > interpreters get better isolation relative to memory management (which > is necessary to stop sharing the GIL between interpreters). We've > resolved the semantics of how the exceptions propagate by raising a > RuntimeError instead, which wraps a safe proxy for the original > exception and traceback. > > Rejected possible solutions: > > * set the RuntimeError's __cause__ to the proxy of the original > exception > * reproduce the exception and traceback in the original interpreter > and raise that. > * convert at the boundary (a la ``subprocess.CalledProcessError``) > (requires a cross-interpreter representation) > * support customization via ``Interpreter.excepthook`` > (requires a cross-interpreter representation) > * wrap in a proxy at the boundary (including with support for > something like ``err.raise()`` to propagate the traceback). > * return the exception (or its proxy) from ``run()`` instead of > raising it > * return a result object (like ``subprocess`` does) [result-object]_ > (unecessary complexity?) > * throw the exception away and expect users to deal with unhandled > exceptions explicitly in the script they pass to ``run()`` > (they can pass error info out via channels); with threads you have > to do something similar > > > References > ========== > > .. [c-api] > https://docs.python.org/3/c-api/init.html#sub-interpreter-support > > .. _Communicating Sequential Processes: > > .. [CSP] > https://en.wikipedia.org/wiki/Communicating_sequential_processes > https://github.com/futurecore/python-csp > > .. [fifo] > https://docs.python.org/3/library/multiprocessing.html# > multiprocessing.Pipe > https://docs.python.org/3/library/multiprocessing.html# > multiprocessing.Queue > https://docs.python.org/3/library/queue.html#module-queue > http://stackless.readthedocs.io/en/2.7-slp/library/ > stackless/channels.html > https://golang.org/doc/effective_go.html#sharing > http://www.jtolds.com/writing/2016/03/go-channels-are-bad- > and-you-should-feel-bad/ > > .. [caveats] > https://docs.python.org/3/c-api/init.html#bugs-and-caveats > > .. [petr-c-ext] > https://mail.python.org/pipermail/import-sig/2016-June/001062.html > https://mail.python.org/pipermail/python-ideas/2016-April/039748.html > > .. [cryptography] > https://github.com/pyca/cryptography/issues/2299 > > .. [global-gc] > http://bugs.python.org/issue24554 > > .. [gilstate] > https://bugs.python.org/issue10915 > http://bugs.python.org/issue15751 > > .. [global-atexit] > https://bugs.python.org/issue6531 > > .. [mp-conn] > https://docs.python.org/3/library/multiprocessing.html# > multiprocessing.Connection > > .. [bug-rate] > https://mail.python.org/pipermail/python-ideas/2017- > September/047094.html > > .. [benefits] > https://mail.python.org/pipermail/python-ideas/2017- > September/047122.html > > .. [main-thread] > https://mail.python.org/pipermail/python-ideas/2017- > September/047144.html > https://mail.python.org/pipermail/python-dev/2017-September/149566.html > > .. [reset_globals] > https://mail.python.org/pipermail/python-dev/2017-September/149545.html > > .. [async] > https://mail.python.org/pipermail/python-dev/2017-September/149420.html > https://mail.python.org/pipermail/python-dev/2017-September/149585.html > > .. [result-object] > https://mail.python.org/pipermail/python-dev/2017-September/149562.html > > .. [jython] > https://mail.python.org/pipermail/python-ideas/2017-May/045771.html > > .. [pypy] > https://mail.python.org/pipermail/python-ideas/2017- > September/046973.html > > > Copyright > ========= > > This document has been placed in the public domain. > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ > guido%40python.org > -- --Guido van Rossum (python.org/~guido)
_______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com