Re: [Python-Dev] PEP 554 v4 (new interpreters module)

Guido van Rossum Wed, 06 Dec 2017 07:56:38 -0800

Sorry to burst your bubble, but I have not followed any of the discussion
and I am actually very worried about this topic. I don't think I will be
able to make time for this before the 3.7b1 feature freeze.


On Tue, Dec 5, 2017 at 6:51 PM, Eric Snow <ericsnowcurren...@gmail.com>
wrote:

> Hi all,
>
> I've finally updated PEP 554.  Feedback would be most welcome.  The
> PEP is in a pretty good place now and I hope we're close to a
> decision to accept it. :)
>
> In addition to resolving the open questions, I've also made the
> following changes to the PEP:
>
> * put an API summary at the top and moved the full API description down
> * added the "is_shareable()" function to indicate if an object can be shared
> * added None as a shareable object
>
> Regarding the open questions:
>
>  * "Leaking exceptions across interpreters"
>
> I chose to go with an approach that effectively creates a
> traceback.TracebackException proxy of the original exception, wraps
> that in a RuntimeError, and raises that in the calling interpreter.
> Raising an exception that safely preserves the original exception and
> traceback seems like the most intuitive behavior (to me, as a user).
> The only alternative that made sense is to fully duplicate the
> exception and traceback (minus stack frames) in the calling
> interpreter, which is probably overkill and likely to be confusing.
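
The wrapping approach described above can be tried out inside a single
interpreter.  This is only a sketch using the stdlib ``traceback`` module;
``run_and_wrap()`` and the ``snapshot`` attribute are hypothetical names,
not part of the PEP.

```python
import traceback


def run_and_wrap(func):
    """Call func() and re-raise any failure as a RuntimeError."""
    try:
        func()
    except Exception as exc:
        # Capture a printable summary of the exception and its traceback.
        snapshot = traceback.TracebackException.from_exception(exc)
        summary = "".join(snapshot.format_exception_only()).strip()
        wrapped = RuntimeError(summary)
        wrapped.snapshot = snapshot  # hypothetical attribute name
        raise wrapped from None


def failing():
    raise KeyError("spam")


try:
    run_and_wrap(failing)
except RuntimeError as err:
    message = str(err)
    snapshot = err.snapshot
```

The caller sees a RuntimeError whose message names the original exception,
and the captured snapshot can still be formatted later for a full report.
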
>
>  * "Initial support for buffers in channels"
>
> I chose to add a "SendChannel.send_buffer(obj)" method for this.
> Supporting buffer objects from the beginning makes sense, opening good
> experimentation opportunities for a valuable set of users.  Supporting
> buffer objects separately and explicitly helps set clear expectations
> for users.  I decided not to go with a separate class (e.g.
> MemChannel) as it didn't seem like there's enough difference to
> warrant keeping them strictly separate.
>
> FWIW, I'm still strongly in favor of support for passing (copies of)
> bytes objects via channels.  Passing objects to SendChannel.send() is
> obvious.  Limiting it, for now, to bytes (and None) helps us avoid
> tying ourselves strongly to any particular implementation (it seems
> like all the reservations were relative to the implementation).  So I
> do not see a reason to wait.
>
>  * "Pass channels explicitly to run()?"
>
> I've applied the suggested solution (make "channels" an explicit
> keyword argument).
>
> -eric
>
>
> I've included the latest full text
> (https://www.python.org/dev/peps/pep-0554/) below:
>
> +++++++++++++++++++++++++++++++++++++++++++++++++
>
> PEP: 554
> Title: Multiple Interpreters in the Stdlib
> Author: Eric Snow <ericsnowcurren...@gmail.com>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 2017-09-05
> Python-Version: 3.7
> Post-History: 07-Sep-2017, 08-Sep-2017, 13-Sep-2017, 05-Dec-2017
>
>
> Abstract
> ========
>
> CPython has supported multiple interpreters in the same process (AKA
> "subinterpreters") since version 1.5.  The feature has been available
> via the C-API. [c-api]_ Subinterpreters operate in
> `relative isolation from one another <Interpreter Isolation_>`_, which
> provides the basis for an
> `alternative concurrency model <Concurrency_>`_.
>
> This proposal introduces the stdlib ``interpreters`` module.  The module
> will be `provisional <Provisional Status_>`_.  It exposes the basic
> functionality of subinterpreters already provided by the C-API, along
> with new functionality for sharing data between interpreters.
>
>
> Proposal
> ========
>
> The ``interpreters`` module will be added to the stdlib.  It will
> provide a high-level interface to subinterpreters and wrap a new
> low-level ``_interpreters`` (in the same way as the ``threading``
> module).  See the `Examples`_ section for concrete usage and use cases.
>
> Along with exposing the existing (in CPython) subinterpreter support,
> the module will also provide a mechanism for sharing data between
> interpreters.  This mechanism centers around "channels", which are
> similar to queues and pipes.
>
> Note that *objects* are not shared between interpreters since they are
> tied to the interpreter in which they were created.  Instead, the
> objects' *data* is passed between interpreters.  See the `Shared data`_
> section for more details about sharing between interpreters.
>
> At first only the following types will be supported for sharing:
>
> * None
> * bytes
> * PEP 3118 buffer objects (via ``send_buffer()``)
>
> Support for other basic types (e.g. int, Ellipsis) will be added later.
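
As a rough illustration of the initial type restriction listed above, a
pure-Python stand-in for ``is_shareable()`` might look like the sketch below.
The exact-type check is an assumption (the PEP does not say how subclasses
are treated), and buffer objects are left out because they go through
``send_buffer()`` instead.

```python
# Hypothetical stand-in; the real check would live in the proposed module.
_SHAREABLE_TYPES = (type(None), bytes)


def is_shareable(obj):
    """Return True if obj is one of the initially supported types."""
    # Assumption: only exact types count, not subclasses.
    return type(obj) in _SHAREABLE_TYPES


checks = {
    "none": is_shareable(None),
    "bytes": is_shareable(b"spam"),
    "str": is_shareable("spam"),
    "int": is_shareable(42),
}
```
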
>
> API summary for interpreters module
> -----------------------------------
>
> Here is a summary of the API for the ``interpreters`` module.  For a
> more in-depth explanation of the proposed classes and functions, see
> the `"interpreters" Module API`_ section below.
>
> For creating and using interpreters:
>
> +------------------------------+----------------------------------------------+
> | signature                    | description                                  |
> +==============================+==============================================+
> | list_all() -> [Interpreter]  | Get all existing interpreters.               |
> +------------------------------+----------------------------------------------+
> | get_current() -> Interpreter | Get the currently running interpreter.       |
> +------------------------------+----------------------------------------------+
> | create() -> Interpreter      | Initialize a new (idle) Python interpreter.  |
> +------------------------------+----------------------------------------------+
>
> |
>
> +-----------------------+-----------------------------------------------------+
> | signature             | description                                         |
> +=======================+=====================================================+
> | class Interpreter(id) | A single interpreter.                               |
> +-----------------------+-----------------------------------------------------+
> | .id                   | The interpreter's ID (read-only).                   |
> +-----------------------+-----------------------------------------------------+
> | .is_running() -> Bool | Is the interpreter currently executing code?        |
> +-----------------------+-----------------------------------------------------+
> | .destroy()            | Finalize and destroy the interpreter.               |
> +-----------------------+-----------------------------------------------------+
> | .run(src_str, /, \*,  | | Run the given source code in the interpreter.     |
> |      channels=None)   | | (This blocks the current thread until done.)      |
> +-----------------------+-----------------------------------------------------+
>
> For sharing data between interpreters:
>
> +--------------------------------+--------------------------------------------+
> | signature                      | description                                |
> +================================+============================================+
> | is_shareable(obj) -> Bool      | | Can the object's data be shared          |
> |                                | | between interpreters?                    |
> +--------------------------------+--------------------------------------------+
> | create_channel() ->            | | Create a new channel for passing         |
> |   (RecvChannel, SendChannel)   | | data between interpreters.               |
> +--------------------------------+--------------------------------------------+
> | list_all_channels() ->         | Get all open channels.                     |
> |   [(RecvChannel, SendChannel)] |                                            |
> +--------------------------------+--------------------------------------------+
>
> |
>
> +-------------------------------+-----------------------------------------------+
> | signature                     | description                                   |
> +===============================+===============================================+
> | class RecvChannel(id)         | The receiving end of a channel.               |
> +-------------------------------+-----------------------------------------------+
> | .id                           | The channel's unique ID.                      |
> +-------------------------------+-----------------------------------------------+
> | .interpreters                 | The list of associated interpreters.          |
> +-------------------------------+-----------------------------------------------+
> | .recv() -> object             | | Get the next object from the channel,       |
> |                               | | and wait if none have been sent.            |
> |                               | | Associate the interpreter with the channel. |
> +-------------------------------+-----------------------------------------------+
> | .recv_nowait(default=None) -> | | Like recv(), but return the default         |
> |   object                      | | instead of waiting.                         |
> +-------------------------------+-----------------------------------------------+
> | .close()                      | | No longer associate the current interpreter |
> |                               | | with the channel (on the receiving end).    |
> +-------------------------------+-----------------------------------------------+
>
> |
>
> +---------------------------+-------------------------------------------------+
> | signature                 | description                                     |
> +===========================+=================================================+
> | class SendChannel(id)     | The sending end of a channel.                   |
> +---------------------------+-------------------------------------------------+
> | .id                       | The channel's unique ID.                        |
> +---------------------------+-------------------------------------------------+
> | .interpreters             | The list of associated interpreters.            |
> +---------------------------+-------------------------------------------------+
> | .send(obj)                | | Send the object (i.e. its data) to the        |
> |                           | | receiving end of the channel and wait.        |
> |                           | | Associate the interpreter with the channel.   |
> +---------------------------+-------------------------------------------------+
> | .send_nowait(obj)         | | Like send(), but fail if not received.        |
> +---------------------------+-------------------------------------------------+
> | .send_buffer(obj)         | | Send the object's (PEP 3118) buffer to the    |
> |                           | | receiving end of the channel and wait.        |
> |                           | | Associate the interpreter with the channel.   |
> +---------------------------+-------------------------------------------------+
> | .send_buffer_nowait(obj)  | | Like send_buffer(), but fail if not received. |
> +---------------------------+-------------------------------------------------+
> | .close()                  | | No longer associate the current interpreter   |
> |                           | | with the channel (on the sending end).        |
> +---------------------------+-------------------------------------------------+
>
>
> Examples
> ========
>
> Run isolated code
> -----------------
>
> ::
>
>    interp = interpreters.create()
>    print('before')
>    interp.run('print("during")')
>    print('after')
>
> Run in a thread
> ---------------
>
> ::
>
>    interp = interpreters.create()
>    def run():
>        interp.run('print("during")')
>    t = threading.Thread(target=run)
>    print('before')
>    t.start()
>    print('after')
>
> Pre-populate an interpreter
> ---------------------------
>
> ::
>
>    interp = interpreters.create()
>    interp.run(tw.dedent("""
>        import some_lib
>        import an_expensive_module
>        some_lib.set_up()
>        """))
>    wait_for_request()
>    interp.run(tw.dedent("""
>        some_lib.handle_request()
>        """))
>
> Handling an exception
> ---------------------
>
> ::
>
>    interp = interpreters.create()
>    try:
>        interp.run(tw.dedent("""
>            raise KeyError
>            """))
>    except RuntimeError:
>        print("got the error from the subinterpreter")
>
> Synchronize using a channel
> ---------------------------
>
> ::
>
>    interp = interpreters.create()
>    r, s = interpreters.create_channel()
>    def run():
>        interp.run(tw.dedent("""
>            reader.recv()
>            print("during")
>            reader.close()
>            """),
>            channels={'reader': r})
>    t = threading.Thread(target=run)
>    print('before')
>    t.start()
>    print('after')
>    s.send(b'')
>    s.close()
>
> Sharing a file descriptor
> -------------------------
>
> ::
>
>    interp = interpreters.create()
>    r1, s1 = interpreters.create_channel()
>    r2, s2 = interpreters.create_channel()
>    def run():
>        interp.run(tw.dedent("""
>            import os
>            fd = int.from_bytes(
>                    reader.recv(), 'big')
>            for line in os.fdopen(fd):
>                print(line)
>            writer.send(b'')
>            """),
>            channels={'reader': r1, 'writer': s2})
>    t = threading.Thread(target=run)
>    t.start()
>    with open('spamspamspam') as infile:
>        fd = infile.fileno().to_bytes(1, 'big')
>        s1.send(fd)
>        r2.recv()
>
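
One caveat about the file descriptor example: ``to_bytes(1, 'big')`` only
works for descriptors below 256.  A small sketch of a length-safe encoding
follows; ``encode_fd()`` and ``decode_fd()`` are hypothetical helper names,
not part of the proposal.

```python
def encode_fd(fd):
    """Encode a non-negative file descriptor as big-endian bytes."""
    # Use as many bytes as the value needs, but always at least one.
    length = max(1, (fd.bit_length() + 7) // 8)
    return fd.to_bytes(length, 'big')


def decode_fd(data):
    """Inverse of encode_fd()."""
    return int.from_bytes(data, 'big')


small = decode_fd(encode_fd(3))
large = decode_fd(encode_fd(1000))

try:
    (256).to_bytes(1, 'big')
    one_byte_overflows = False
except OverflowError:
    one_byte_overflows = True
```
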
> Passing objects via marshal
> ---------------------------
>
> ::
>
>    interp = interpreters.create()
>    r, s = interpreters.create_channel()
>    interp.run(tw.dedent("""
>        import marshal
>        """),
>        channels={'reader': r})
>    def run():
>        interp.run(tw.dedent("""
>            data = reader.recv()
>            while data:
>                obj = marshal.loads(data)
>                do_something(obj)
>                data = reader.recv()
>            reader.close()
>            """),
>            channels={'reader': r})
>    t = threading.Thread(target=run)
>    t.start()
>    for obj in input:
>        data = marshal.dumps(obj)
>        s.send(data)
>    s.send(b'')
>
> Passing objects via pickle
> --------------------------
>
> ::
>
>    interp = interpreters.create()
>    r, s = interpreters.create_channel()
>    interp.run(tw.dedent("""
>        import pickle
>        """),
>        channels={'reader': r})
>    def run():
>        interp.run(tw.dedent("""
>            data = reader.recv()
>            while data:
>                obj = pickle.loads(data)
>                do_something(obj)
>                data = reader.recv()
>            reader.close()
>            """),
>            channels={'reader': r})
>    t = threading.Thread(target=run)
>    t.start()
>    for obj in input:
>        data = pickle.dumps(obj)
>        s.send(data)
>    s.send(b'')
>
> Running a module
> ----------------
>
> ::
>
>    interp = interpreters.create()
>    main_module = mod_name
>    interp.run(f'import runpy; runpy.run_module({main_module!r})')
>
> Running as script (including zip archives & directories)
> --------------------------------------------------------
>
> ::
>
>    interp = interpreters.create()
>    main_script = path_name
>    interp.run(f"import runpy; runpy.run_path({main_script!r})")
>
> Running in a thread pool executor
> ---------------------------------
>
> ::
>
>    interps = [interpreters.create() for i in range(5)]
>    with concurrent.futures.ThreadPoolExecutor(
>            max_workers=len(interps)) as pool:
>        print('before')
>        for interp in interps:
>            pool.submit(interp.run, 'print("starting"); print("stopping")')
>        print('after')
>
>
> Rationale
> =========
>
> Running code in multiple interpreters provides a useful level of
> isolation within the same process.  This can be leveraged in a number
> of ways.  Furthermore, subinterpreters provide a well-defined framework
> in which such isolation may be extended.
>
> Nick Coghlan explained some of the benefits through a comparison with
> multi-processing [benefits]_::
>
>    [I] expect that communicating between subinterpreters is going
>    to end up looking an awful lot like communicating between
>    subprocesses via shared memory.
>
>    The trade-off between the two models will then be that one still
>    just looks like a single process from the point of view of the
>    outside world, and hence doesn't place any extra demands on the
>    underlying OS beyond those required to run CPython with a single
>    interpreter, while the other gives much stricter isolation
>    (including isolating C globals in extension modules), but also
>    demands much more from the OS when it comes to its IPC
>    capabilities.
>
>    The security risk profiles of the two approaches will also be quite
>    different, since using subinterpreters won't require deliberately
>    poking holes in the process isolation that operating systems give
>    you by default.
>
> CPython has supported subinterpreters, with increasing levels of
> support, since version 1.5.  While the feature has the potential
> to be a powerful tool, subinterpreters have suffered from neglect
> because they are not available directly from Python.  Exposing the
> existing functionality in the stdlib will help reverse the situation.
>
> This proposal is focused on enabling the fundamental capability of
> multiple isolated interpreters in the same Python process.  This is a
> new area for Python so there is relative uncertainty about the best
> tools to provide as companions to subinterpreters.  Thus we minimize
> the functionality we add in the proposal as much as possible.
>
> Concerns
> --------
>
> * "subinterpreters are not worth the trouble"
>
> Some have argued that subinterpreters do not add sufficient benefit
> to justify making them an official part of Python.  Adding features
> to the language (or stdlib) has a cost in increasing the size of
> the language.  So an addition must pay for itself.  In this case,
> subinterpreters provide a novel concurrency model focused on isolated
> threads of execution.  Furthermore, they provide an opportunity for
> changes in CPython that will allow simultaneous use of multiple CPU
> cores (currently prevented by the GIL).
>
> Alternatives to subinterpreters include threading, async, and
> multiprocessing.  Threading is limited by the GIL and async isn't
> the right solution for every problem (nor for every person).
> Multiprocessing is likewise valuable in some but not all situations.
> Direct IPC (rather than via the multiprocessing module) provides
> similar benefits but with the same caveat.
>
> Notably, subinterpreters are not intended as a replacement for any of
> the above.  Certainly they overlap in some areas, but the benefits of
> subinterpreters include isolation and (potentially) performance.  In
> particular, subinterpreters provide a direct route to an alternate
> concurrency model (e.g. CSP) which has found success elsewhere and
> will appeal to some Python users.  That is the core value that the
> ``interpreters`` module will provide.
>
> * "stdlib support for subinterpreters adds extra burden
>   on C extension authors"
>
> In the `Interpreter Isolation`_ section below we identify ways in
> which isolation in CPython's subinterpreters is incomplete.  Most
> notable is extension modules that use C globals to store internal
> state.  PEP 3121 and PEP 489 provide a solution for most of the
> problem, but one still remains. [petr-c-ext]_  Until that is resolved,
> C extension authors will face extra difficulty to support
> subinterpreters.
>
> Consequently, projects that publish extension modules may face an
> increased maintenance burden as their users start using subinterpreters,
> where their modules may break.  This situation is limited to modules
> that use C globals (or use libraries that use C globals) to store
> internal state.  For numpy, the reported-bug rate is one every 6
> months. [bug-rate]_
>
> Ultimately this comes down to a question of how often it will be a
> problem in practice: how many projects would be affected, how often
> their users will be affected, what the additional maintenance burden
> will be for projects, and what the overall benefit of subinterpreters
> is to offset those costs.  The position of this PEP is that the actual
> extra maintenance burden will be small and well below the threshold at
> which subinterpreters are worth it.
>
>
> About Subinterpreters
> =====================
>
> Concurrency
> -----------
>
> Concurrency is a challenging area of software development.  Decades of
> research and practice have led to a wide variety of concurrency models,
> each with different goals.  Most center on correctness and usability.
>
> One class of concurrency models focuses on isolated threads of
> execution that interoperate through some message passing scheme.  A
> notable example is `Communicating Sequential Processes`_ (CSP), upon
> which Go's concurrency is based.  The isolation inherent to
> subinterpreters makes them well-suited to this approach.
>
> Shared data
> -----------
>
> Subinterpreters are inherently isolated (with caveats explained below),
> in contrast to threads.  So the same communicate-via-shared-memory
> approach doesn't work.  Without an alternative, effective use of
> concurrency via subinterpreters is significantly limited.
>
> The key challenge here is that sharing objects between interpreters
> faces complexity due to various constraints on object ownership,
> visibility, and mutability.  At a conceptual level it's easier to
> reason about concurrency when objects only exist in one interpreter
> at a time.  At a technical level, CPython's current memory model
> limits how Python *objects* may be shared safely between interpreters;
> effectively objects are bound to the interpreter in which they were
> created.  Furthermore the complexity of *object* sharing increases as
> subinterpreters become more isolated, e.g. after GIL removal.
>
> Consequently, the mechanism for sharing needs to be carefully considered.
> There are a number of valid solutions, several of which may be
> appropriate to support in Python.  This proposal provides a single basic
> solution: "channels".  Ultimately, any other solution will look similar
> to the proposed one, which will set the precedent.  Note that the
> implementation of ``Interpreter.run()`` can be done in a way that allows
> for multiple solutions to coexist, but doing so is not technically
> a part of the proposal here.
>
> Regarding the proposed solution, "channels", it is a basic, opt-in data
> sharing mechanism that draws inspiration from pipes, queues, and CSP's
> channels. [fifo]_
>
> As simply described earlier by the API summary,
> channels have two operations: send and receive.  A key characteristic
> of those operations is that channels transmit data derived from Python
> objects rather than the objects themselves.  When objects are sent,
> their data is extracted.  When the "object" is received in the other
> interpreter, the data is converted back into an object.
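
The "data, not objects" point can be illustrated inside one interpreter.
The sketch below uses ``pickle`` (borrowed from the PEP's own examples) as a
stand-in for whatever extraction a channel would do; that choice is an
assumption made only for illustration.

```python
import pickle

original = {"spam": [1, 2, 3]}

# The bytes are what would travel through a channel.
data = pickle.dumps(original)

# The receiving side rebuilds a new, independent object from the data.
received = pickle.loads(data)

# Mutating the rebuilt object leaves the sender's object untouched.
received["spam"].append(4)
```
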
>
> To make this work, the mutable shared state will be managed by the
> Python runtime, not by any of the interpreters.  Initially we will
> support only one type of object for shared state: the channels provided
> by ``create_channel()``.  Channels, in turn, will carefully manage
> passing objects between interpreters.
>
> This approach, including keeping the API minimal, helps us avoid further
> exposing any underlying complexity to Python users.  Along those same
> lines, we will initially restrict the types that may be passed through
> channels to the following:
>
> * None
> * bytes
> * PEP 3118 buffer objects (via ``send_buffer()``)
>
> Limiting the initial shareable types is a practical matter, reducing
> the potential complexity of the initial implementation.  There are a
> number of strategies we may pursue in the future to expand supported
> objects and object sharing strategies.
>
> Interpreter Isolation
> ---------------------
>
> CPython's interpreters are intended to be strictly isolated from each
> other.  Each interpreter has its own copy of all modules, classes,
> functions, and variables.  The same applies to state in C, including in
> extension modules.  The CPython C-API docs explain more. [caveats]_
>
> However, there are ways in which interpreters share some state.  First
> of all, some process-global state remains shared:
>
> * file descriptors
> * builtin types (e.g. dict, bytes)
> * singletons (e.g. None)
> * underlying static module data (e.g. functions) for
>   builtin/extension/frozen modules
>
> There are no plans to change this.
>
> Second, some isolation is faulty due to bugs or implementations that did
> not take subinterpreters into account.  This includes things like
> extension modules that rely on C globals. [cryptography]_  In these
> cases bugs should be opened (some are already):
>
> * readline module hook functions (http://bugs.python.org/issue4202)
> * memory leaks on re-init (http://bugs.python.org/issue21387)
>
> Finally, some potential isolation is missing due to the current design
> of CPython.  Improvements are currently going on to address gaps in this
> area:
>
> * interpreters share the GIL
> * interpreters share memory management (e.g. allocators, gc)
> * GC is not run per-interpreter [global-gc]_
> * at-exit handlers are not run per-interpreter [global-atexit]_
> * extensions using the ``PyGILState_*`` API are incompatible [gilstate]_
>
> Existing Usage
> --------------
>
> Subinterpreters are not a widely used feature.  In fact, the only
> documented cases of widespread usage are
> `mod_wsgi <https://github.com/GrahamDumpleton/mod_wsgi>`_ and
> `JEP <https://github.com/ninia/jep>`_.  On the one hand, these cases
> provide confidence that existing subinterpreter support is relatively
> stable.  On the other hand, there isn't much of a sample size from which
> to judge the utility of the feature.
>
>
> Provisional Status
> ==================
>
> The new ``interpreters`` module will be added with "provisional" status
> (see PEP 411).  This allows Python users to experiment with the feature
> and provide feedback while still allowing us to adjust to that feedback.
> The module will be provisional in Python 3.7 and we will make a decision
> before the 3.8 release whether to keep it provisional, graduate it, or
> remove it.
>
>
> Alternate Python Implementations
> ================================
>
> I'll be soliciting feedback from the different Python implementors about
> subinterpreter support.
>
> Multiple-interpreter support in the major Python implementations:
>
> TBD
>
> * jython: yes [jython]_
> * ironpython: yes?
> * pypy: maybe not? [pypy]_
> * micropython: ???
>
>
> "interpreters" Module API
> =========================
>
> The module provides the following functions:
>
> ``list_all()``::
>
>    Return a list of all existing interpreters.
>
> ``get_current()``::
>
>    Return the currently running interpreter.
>
> ``create()``::
>
>    Initialize a new Python interpreter and return it.  The
>    interpreter will be created in the current thread and will remain
>    idle until something is run in it.  The interpreter may be used
>    in any thread and will run in whichever thread calls
>    ``interp.run()``.
>
>
> The module also provides the following class:
>
> ``Interpreter(id)``::
>
>    id:
>
>       The interpreter's ID (read-only).
>
>    is_running():
>
>       Return whether or not the interpreter is currently executing code.
>       Calling this on the current interpreter will always return True.
>
>    destroy():
>
>       Finalize and destroy the interpreter.
>
>       This may not be called on an already running interpreter.  Doing
>       so results in a RuntimeError.
>
>    run(source_str, /, *, channels=None):
>
>       Run the provided Python source code in the interpreter.  If the
>       "channels" keyword argument is provided (and is a mapping of
>       attribute names to channels) then it is added to the interpreter's
>       execution namespace (the interpreter's "__main__" module).  If any
>       of the values are not RecvChannel or SendChannel instances
>       then ValueError gets raised.
>
>       This may not be called on an already running interpreter.  Doing
>       so results in a RuntimeError.
>
>       A "run()" call is similar to a function call.  Once it completes,
>       the code that called "run()" continues executing (in the original
>       interpreter).  Likewise, if there is any uncaught exception then
>       it effectively (see below) propagates into the code where
>       ``run()`` was called.  However, unlike function calls (but like
>       threads), there is no return value.  If any value is needed, pass
>       it out via a channel.
>
>       The big difference is that "run()" executes the code in an
>       entirely different interpreter, with entirely separate state.
>       The state of the current interpreter in the current OS thread
>       is swapped out with the state of the target interpreter (the one
>       that will execute the code).  When the target finishes executing,
>       the original interpreter gets swapped back in and its execution
>       resumes.
>
>       So calling "run()" will effectively cause the current Python
>       thread to pause.  Sometimes you won't want that pause, in which
>       case you should make the "run()" call in another thread.  To do
>       so, add a function that calls "run()" and then run that function
>       in a normal "threading.Thread".
>
>       Note that the interpreter's state is never reset, neither before
>       "run()" executes the code nor after.  Thus the interpreter
>       state is preserved between calls to "run()".  This includes
>       "sys.modules", the "builtins" module, and the internal state
>       of C extension modules.
>
>       Also note that "run()" executes in the namespace of the "__main__"
>       module, just like scripts, the REPL, "-m", and "-c".  Just as
>       the interpreter's state is not ever reset, the "__main__" module
>       is never reset.  You can imagine concatenating the code from each
>       "run()" call into one long script.  This is the same as how the
>       REPL operates.
>
>       Regarding uncaught exceptions, we noted that they are
>       "effectively" propagated into the code where ``run()`` was called.
>       To prevent leaking exceptions (and tracebacks) between
>       interpreters, we create a surrogate of the exception and its
>       traceback (see ``traceback.TracebackException``), wrap it in a
>       RuntimeError, and raise that.
>
>       Supported code: source text.
>
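
The "one long script" behavior of repeated ``run()`` calls can be imitated
in a single interpreter with ``exec()`` and a persistent namespace.  This is
only an analogy: the ``run()`` function below is a hypothetical stand-in and
provides none of the isolation that a real subinterpreter would.

```python
import textwrap

# Stands in for the target interpreter's "__main__" module namespace.
main_namespace = {"__name__": "__main__"}


def run(source):
    """Execute source in the same namespace on every call."""
    exec(textwrap.dedent(source), main_namespace)


run("""
    counter = 0
    """)
run("""
    counter += 1
    """)

final_counter = main_namespace["counter"]
```

The second call sees the name defined by the first, just as successive REPL
inputs do.
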
>
> API for sharing data
> --------------------
>
> Subinterpreters are less useful without a mechanism for sharing data
> between them.  Sharing actual Python objects between interpreters,
> however, has enough potential problems that we are avoiding support
> for that here.  Instead, only a minimum set of types will be supported.
> Initially this will include ``bytes`` and channels.  Further types may
> be supported later.
>
> The ``interpreters`` module provides a way for users to determine
> whether an object is shareable or not:
>
> ``is_shareable(obj)``::
>
>    Return True if the object may be shared between interpreters.  This
>    does not necessarily mean that the actual objects will be shared.
>    Instead, it means that the objects' underlying data will be shared in
>    a cross-interpreter way, whether via a proxy, a copy, or some other
>    means.
>
> This proposal provides two ways to share such objects between
> interpreters.
>
> First, shareable objects may be passed to ``run()`` as keyword arguments,
> where they are effectively injected into the target interpreter's
> ``__main__`` module.  This is mainly intended for sharing meta-objects
> (e.g. channels) between interpreters, as it is less useful to pass other
> objects (like ``bytes``) to ``run``.
>
> Second, the main mechanism for sharing objects (i.e. their data) between
> interpreters is through channels.  A channel is a simplex FIFO similar
> to a pipe.  The main difference is that channels can be associated with
> zero or more interpreters on either end.  Unlike queues, which are also
> many-to-many, channels have no buffer.
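>
> The blocking, unbuffered behavior can be modeled with threads.  The
> following is only a sketch of the semantics, assuming a hypothetical
> ``RendezvousChannel`` class built on ``queue`` and ``threading``; it
> is not the proposed implementation and it does not cross interpreters.

```python
import queue
import threading


class RendezvousChannel:
    """Thread-based model of an unbuffered, many-to-many FIFO (hypothetical)."""

    def __init__(self):
        self._pending = queue.Queue()

    def send(self, data):
        received = threading.Event()
        self._pending.put((data, received))
        received.wait()  # block until a receiver has taken the item

    def recv(self):
        data, received = self._pending.get()  # block until something is sent
        received.set()  # release the matching sender
        return data


channel = RendezvousChannel()
results = []
receiver = threading.Thread(target=lambda: results.append(channel.recv()))
receiver.start()
channel.send(b"spam")  # returns only after the receiver took the item
receiver.join()
```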
>
> ``create_channel()``::
>
>    Create a new channel and return (recv, send), the RecvChannel and
>    SendChannel corresponding to the ends of the channel.  The channel
>    is not closed and destroyed (i.e. garbage-collected) until the number
>    of associated interpreters returns to 0.
>
>    An interpreter gets associated with a channel by calling its "send()"
>    or "recv()" method.  That association gets dropped by calling
>    "close()" on the channel.
>
>    Both ends of the channel are supported "shared" objects (i.e. may be
>    safely shared by different interpreters).  Thus they may be passed as
>    keyword arguments to "Interpreter.run()".
>
> ``list_all_channels()``::
>
>    Return a list of all open (RecvChannel, SendChannel) pairs.
>
>
> ``RecvChannel(id)``::
>
>    The receiving end of a channel.  An interpreter may use this to
>    receive objects from another interpreter.  At first only bytes will
>    be supported.
>
>    id:
>
>       The channel's unique ID.
>
>    interpreters:
>
>       The list of associated interpreters: those that have called
>       the "recv()" or "__next__()" methods and haven't called "close()".
>
>    recv():
>
>       Return the next object (i.e. the data from the sent object) from
>       the channel.  If none have been sent then wait until the next
>       send.  This associates the current interpreter with the channel.
>
>       If the channel is already closed (see the close() method)
>       then raise EOFError.  If the channel isn't closed, but the current
>       interpreter already called the "close()" method (which drops its
>       association with the channel) then raise ValueError.
>
>    recv_nowait(default=None):
>
>       Return the next object from the channel.  If none have been sent
>       then return the default.  Otherwise, this is the same as the
>       "recv()" method.
>
>    close():
>
>       No longer associate the current interpreter with the channel (on
>       the receiving end) and block future association (via the "recv()"
>       method).  If the interpreter was never associated with the channel
>       then still block future association.  Once an interpreter is no
>       longer associated with the channel, subsequent (or current) send()
>       and recv() calls from that interpreter will raise ValueError
>       (or EOFError if the channel is actually marked as closed).
>
>       Once the number of associated interpreters on both ends drops
>       to 0, the channel is actually marked as closed.  The Python
>       runtime will garbage collect all closed channels, though it may
>       not be immediately.  Note that "close()" is automatically called
>       on behalf of the current interpreter when the channel is no longer
>       used (i.e. has no references) in that interpreter.
>
>       This operation is idempotent.  Return True if "close()" has not
>       been called before by the current interpreter.
>
>
> ``SendChannel(id)``::
>
>    The sending end of a channel.  An interpreter may use this to send
>    objects to another interpreter.  At first only bytes will be
>    supported.
>
>    id:
>
>       The channel's unique ID.
>
>    interpreters:
>
>       The list of associated interpreters (those that have called
>       the "send()" method).
>
>    send(obj):
>
>       Send the object (i.e. its data) to the receiving end of the
>       channel.  Wait until the object is received.  If the
>       object is not shareable then ValueError is raised.  Currently
>       only bytes are supported.
>
>       If the channel is already closed (see the close() method)
>       then raise EOFError.  If the channel isn't closed, but the current
>       interpreter already called the "close()" method (which drops its
>       association with the channel) then raise ValueError.
>
>    send_nowait(obj):
>
>       Send the object to the receiving end of the channel.  If the other
>       end is not currently receiving then raise RuntimeError.  Otherwise
>       this is the same as "send()".
>
>    send_buffer(obj):
>
>       Send a MemoryView of the object rather than the object.  Otherwise
>       this is the same as send().  Note that the object must implement
>       the PEP 3118 buffer protocol.
>
>    send_buffer_nowait(obj):
>
>       Send a MemoryView of the object rather than the object.  If the
>       other end is not currently receiving then raise RuntimeError.
>       Otherwise this is the same as "send_buffer()".
>
>    close():
>
>       This is the same as "RecvChannel.close()", but applied to the
>       sending end of the channel.
>
> Note that ``send_buffer()`` is similar to how
> ``multiprocessing.Connection`` works. [mp-conn]_
>
>
> Open Questions
> ==============
>
> None
>
>
> Open Implementation Questions
> =============================
>
> Does every interpreter think that their thread is the "main" thread?
> --------------------------------------------------------------------
>
> (This is more of an implementation detail than an issue for the PEP.)
>
> CPython's interpreter implementation identifies the OS thread in which
> it was started as the "main" thread.  The interpreter then has slightly
> different behavior depending on if the current thread is the main one
> or not.  This presents a problem in cases where "main thread" is meant
> to imply "main thread in the main interpreter" [main-thread]_, where
> the main interpreter is the initial one.
>
> Disallow subinterpreters in the main thread?
> --------------------------------------------
>
> (This is more of an implementation detail than an issue for the PEP.)
>
> This is a specific case of the above issue.  Currently in CPython,
> "we need a main \*thread\* in order to sensibly manage the way signal
> handling works across different platforms".  [main-thread]_
>
> Since signal handlers are part of the interpreter state, running a
> subinterpreter in the main thread means that the main interpreter
> can no longer properly handle signals (since it's effectively paused).
>
> Furthermore, running a subinterpreter in the main thread would
> conceivably allow setting signal handlers on that interpreter, which
> would likewise impact signal handling when that interpreter isn't
> running or is running in a different thread.
>
> Ultimately, running subinterpreters in the main OS thread introduces
> complications to the signal handling implementation.  So it may make
> the most sense to disallow running subinterpreters in the main thread.
> Support for it could be considered later.  The downside is that folks
> wanting to try out subinterpreters would be required to take the extra
> step of using threads.  This could slow adoption and experimentation,
> whereas without the restriction there's less of an obstacle.
>
>
> Deferred Functionality
> ======================
>
> In the interest of keeping this proposal minimal, the following
> functionality has been left out for future consideration.  Note that
> this is not a judgement against any of said capability, but rather a
> deferment.  That said, each is arguably valid.
>
> Interpreter.call()
> ------------------
>
> It would be convenient to run existing functions in subinterpreters
> directly.  ``Interpreter.run()`` could be adjusted to support this or
> a ``call()`` method could be added::
>
>    Interpreter.call(f, *args, **kwargs)
>
> This suffers from the same problem as sharing objects between
> interpreters via queues.  The minimal solution (running a source string)
> is sufficient for us to get the feature out where it can be explored.
>
> timeout arg to recv() and send()
> --------------------------------
>
> Typically functions that have a ``block`` argument also have a
> ``timeout`` argument.  It sometimes makes sense to do likewise for
> functions that otherwise block, like the channel ``recv()`` and
> ``send()`` methods.  We can add it later if needed.
>
> get_main()
> ----------
>
> CPython has a concept of a "main" interpreter.  This is the initial
> interpreter created during CPython's runtime initialization.  It may
> be useful to identify the main interpreter.  For instance, the main
> interpreter should not be destroyed.  However, for the basic
> functionality of a high-level API a ``get_main()`` function is not
> necessary.  Furthermore, there is no requirement that a Python
> implementation have a concept of a main interpreter.  So until there's
> a clear need we'll leave ``get_main()`` out.
>
> Interpreter.run_in_thread()
> ---------------------------
>
> This method would make a ``run()`` call for you in a thread.  Doing this
> using only ``threading.Thread`` and ``run()`` is relatively trivial so
> we've left it out.
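>
> For example, a helper along these lines would do.  This is a sketch
> only: ``run_in_thread()`` and ``StandInInterpreter`` are hypothetical
> names, and the stand-in runs the source with ``exec()`` rather than in
> a real subinterpreter.

```python
import threading


def run_in_thread(interp, source):
    """Start interp.run(source) in a new thread and return the thread."""
    thread = threading.Thread(target=interp.run, args=(source,))
    thread.start()
    return thread


class StandInInterpreter:
    """Hypothetical stand-in: runs source in a private namespace via exec()."""

    def __init__(self):
        self.namespace = {"__name__": "__main__"}

    def run(self, source):
        exec(source, self.namespace)


interp = StandInInterpreter()
thread = run_in_thread(interp, "answer = 6 * 7")
thread.join()  # wait for the run() call to finish
```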
>
> Synchronization Primitives
> --------------------------
>
> The ``threading`` module provides a number of synchronization primitives
> for coordinating concurrent operations.  This is especially necessary
> due to the shared-state nature of threading.  In contrast,
> subinterpreters do not share state.  Data sharing is restricted to
> channels, which do away with the need for explicit synchronization.  If
> any sort of opt-in shared state support is added to subinterpreters in
> the future, that same effort can introduce synchronization primitives
> to meet that need.
>
> CSP Library
> -----------
>
> A ``csp`` module would not be a large step away from the functionality
> provided by this PEP.  However, adding such a module is outside the
> minimalist goals of this proposal.
>
> Syntactic Support
> -----------------
>
> The ``Go`` language provides a concurrency model based on CSP, so
> it's similar to the concurrency model that subinterpreters support.
> ``Go`` provides syntactic support, as well as several builtin concurrency
> primitives, to make concurrency a first-class feature.  Conceivably,
> similar syntactic (and builtin) support could be added to Python using
> subinterpreters.  However, that is *way* outside the scope of this PEP!
>
> Multiprocessing
> ---------------
>
> The ``multiprocessing`` module could support subinterpreters in the same
> way it supports threads and processes.  In fact, the module's
> maintainer, Davin Potts, has indicated this is a reasonable feature
> request.  However, it is outside the narrow scope of this PEP.
>
> C-extension opt-in/opt-out
> --------------------------
>
> By using the ``PyModuleDef_Slot`` introduced by PEP 489, we could easily
> add a mechanism by which C-extension modules could opt out of support
> for subinterpreters.  Then the import machinery, when operating in
> a subinterpreter, would need to check the module for support.  It would
> raise an ImportError if unsupported.
>
> Alternately we could support opting in to subinterpreter support.
> However, that would probably exclude many more modules (unnecessarily)
> than the opt-out approach.
>
> The scope of adding the ModuleDef slot and fixing up the import
> machinery is non-trivial, but could be worth it.  It all depends on
> how many extension modules break under subinterpreters.  Given the
> relatively few cases we know of through mod_wsgi, we can leave this
> for later.
>
> Poisoning channels
> ------------------
>
> CSP has the concept of poisoning a channel.  Once a channel has been
> poisoned, any ``send()`` or ``recv()`` call on it will raise a special
> exception, effectively ending execution in the interpreter that tried
> to use the poisoned channel.
>
> This could be accomplished by adding a ``poison()`` method to both ends
> of the channel.  The ``close()`` method could work if it had a ``force``
> option to force the channel closed.  Regardless, these semantics are
> relatively specialized and can wait.
>
> Sending channels over channels
> ------------------------------
>
> Some advanced usage of subinterpreters could take advantage of the
> ability to send channels over channels, in addition to bytes.  Given
> that channels will already be multi-interpreter safe, supporting them
> in ``RecvChannel.recv()`` wouldn't be a big change.  However, this can
> wait until the basic functionality has been ironed out.
>
> Resetting __main__
> ------------------
>
> As proposed, every call to ``Interpreter.run()`` will execute in the
> namespace of the interpreter's existing ``__main__`` module.  This means
> that data persists there between ``run()`` calls.  Sometimes this isn't
> desirable and you want to execute in a fresh ``__main__``.  Also,
> you don't necessarily want to leak objects there that you aren't using
> any more.
>
> Note that the following won't work right because it will clear too much
> (e.g. ``__name__`` and the other "__dunder__" attributes)::
>
>    interp.run('globals().clear()')
>
> Possible solutions include:
>
> * a ``create()`` arg to indicate resetting ``__main__`` after each
>   ``run`` call
> * an ``Interpreter.reset_main`` flag to support opting in or out
>   after the fact
> * an ``Interpreter.reset_main()`` method to opt in when desired
> * ``importlib.util.reset_globals()`` [reset_globals]_
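>
> To illustrate the difference from ``globals().clear()``, a reset could
> leave the dunder names alone.  A minimal sketch, assuming a
> hypothetical ``reset_namespace()`` helper (it is not one of the
> solutions listed above) applied to a throwaway module object:

```python
import types


def reset_namespace(namespace):
    """Remove every name except "__dunder__" ones from a module namespace."""
    for name in list(namespace):
        if not (name.startswith("__") and name.endswith("__")):
            del namespace[name]


# A throwaway module object stands in for an interpreter's __main__.
main = types.ModuleType("__main__")
main.spam = 1
reset_namespace(vars(main))
```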
>
> Also note that resetting ``__main__`` does nothing about state stored
> in other modules.  So any solution would have to be clear about the
> scope of what is being reset.  Conceivably we could invent a mechanism
> by which any (or every) module could be reset, unlike ``reload()``
> which does not clear the module before loading into it.  Regardless,
> since ``__main__`` is the execution namespace of the interpreter,
> resetting it has a much more direct correlation to interpreters and
> their dynamic state than does resetting other modules.  So a more
> generic module reset mechanism may prove unnecessary.
>
> This isn't a critical feature initially.  It can wait until later
> if desirable.
>
> Support passing ints in channels
> --------------------------------
>
> Passing ints around should be fine and ultimately is probably
> desirable.  However, we can get by with serializing them as bytes
> for now.  The goal is a minimal API for the sake of basic
> functionality at first.
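>
> One way to do that serialization, as a sketch: the helper names below
> are hypothetical, while ``int.to_bytes()`` and ``int.from_bytes()``
> are the existing builtin methods.

```python
def int_to_bytes(value):
    """Encode an int as big-endian two's-complement bytes."""
    # One extra bit leaves room for the sign.
    length = (value.bit_length() + 8) // 8
    return value.to_bytes(length, "big", signed=True)


def int_from_bytes(data):
    """Decode bytes produced by int_to_bytes()."""
    return int.from_bytes(data, "big", signed=True)


payload = int_to_bytes(-12345)      # bytes suitable for a channel
restored = int_from_bytes(payload)  # the original int again
```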
>
> File descriptors and sockets in channels
> ----------------------------------------
>
> Given that file descriptors and sockets are process-global resources,
> support for passing them through channels is a reasonable idea.  They
> would be a good candidate for the first effort at expanding the types
> that channels support.  They aren't strictly necessary for the initial
> API.
>
> Integration with async
> ----------------------
>
> Per Antoine Pitrou [async]_::
>
>    Has any thought been given to how FIFOs could integrate with async
>    code driven by an event loop (e.g. asyncio)?  I think the model of
>    executing several asyncio (or Tornado) applications each in their
>    own subinterpreter may prove quite interesting to reconcile multi-
>    core concurrency with ease of programming.  That would require the
>    FIFOs to be able to synchronize on something an event loop can wait
>    on (probably a file descriptor?).
>
> A possible solution is to provide async implementations of the blocking
> channel methods (``__next__()``, ``recv()``, and ``send()``).  However,
> the basic functionality of subinterpreters does not depend on async and
> can be added later.
>
> Support for iteration
> ---------------------
>
> Supporting iteration on ``RecvChannel`` (via ``__iter__()`` or
> ``__next__()``) may be useful.  A trivial implementation would use the
> ``recv()`` method, similar to how files do iteration.  Since this isn't
> a fundamental capability and has a simple analog, adding iteration
> support can wait until later.
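>
> Until then, users can get the same effect with a small generator.  The
> sketch below assumes ``recv()`` raises EOFError once the channel is
> closed, as described earlier; ``iter_channel()`` and
> ``StandInRecvChannel`` are hypothetical names used for illustration.

```python
def iter_channel(recv_channel):
    """Yield objects from recv() until the channel reports EOFError."""
    while True:
        try:
            item = recv_channel.recv()
        except EOFError:
            return
        yield item


class StandInRecvChannel:
    """Hypothetical stand-in that replays a fixed list of bytes objects."""

    def __init__(self, items):
        self._items = list(items)

    def recv(self):
        if not self._items:
            raise EOFError("channel closed")
        return self._items.pop(0)


items = list(iter_channel(StandInRecvChannel([b"a", b"b"])))
```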
>
> Channel context managers
> ------------------------
>
> Context manager support on ``RecvChannel`` and ``SendChannel`` may be
> helpful.  The implementation would be simple, wrapping a call to
> ``close()`` like files do.  As with iteration, this can wait.
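>
> In the meantime ``contextlib.closing()`` works with any object that
> has a ``close()`` method.  In this sketch ``StandInChannel`` is a
> hypothetical placeholder for either end of a channel.

```python
import contextlib


class StandInChannel:
    """Hypothetical placeholder exposing only close()."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


channel = StandInChannel()
with contextlib.closing(channel) as ch:
    pass  # send or receive here; close() is called on exit
```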
>
> Pipes and Queues
> ----------------
>
> With the proposed object passing mechanism of "channels", other similar
> basic types aren't required to achieve the minimal useful functionality
> of subinterpreters.  Such types include pipes (like channels, but
> one-to-one) and queues (like channels, but buffered).  See below in
> `Rejected Ideas` for more information.
>
> Even though these types aren't part of this proposal, they may still
> be useful in the context of concurrency.  Adding them later is entirely
> reasonable.  They could be trivially implemented as wrappers around
> channels.  Alternatively they could be implemented for efficiency at the
> same low level as channels.
>
> interpreters.RunFailedError
> ---------------------------
>
> As currently proposed, ``Interpreter.run()`` offers you no way to
> distinguish an error coming from the subinterpreter from any other
> error in the current interpreter.  Your only option would be to
> explicitly wrap your ``run()`` call in a
> ``try: ... except RuntimeError:`` (since we wrap a proxy of the original
> exception in a RuntimeError and raise that).
>
> If this is a problem in practice then we could add something like
> ``interpreters.RunFailedError`` (subclassing RuntimeError) and raise that
> in ``run()``.
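>
> A sketch of how such a subclass would let callers tell the two cases
> apart.  ``RunFailedError`` here is defined locally for illustration,
> and ``classify()`` and the two failing functions are hypothetical.

```python
class RunFailedError(RuntimeError):
    """Hypothetical: raised when code run in a subinterpreter fails."""


def classify(func):
    """Report where a failure came from, based on the exception type."""
    try:
        func()
    except RunFailedError:
        return "subinterpreter"
    except RuntimeError:
        return "local"
    return "ok"


def failing_run():
    # Stands in for a run() call whose script raised an exception.
    raise RunFailedError("ZeroDivisionError: division by zero")


def local_failure():
    # An unrelated RuntimeError raised in the current interpreter.
    raise RuntimeError("unrelated")


outcomes = [classify(failing_run), classify(local_failure)]
```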
>
> Return a lock from send()
> -------------------------
>
> When sending an object through a channel, you don't have a way of knowing
> when the object gets received on the other end.  One way to work around
> this is to return a locked ``threading.Lock`` from ``SendChannel.send()``
> that unlocks once the object is received.
>
> This matters for buffered channels (i.e. queues).  For unbuffered
> channels it is a non-issue.  So this can be dealt with once channels
> support buffering.
>
>
> Rejected Ideas
> ==============
>
> Explicit channel association
> ----------------------------
>
> Interpreters are implicitly associated with channels upon ``recv()`` and
> ``send()`` calls.  They are de-associated with ``close()`` calls.  The
> alternative would be explicit methods.  It would be either
> ``add_channel()`` and ``remove_channel()`` methods on ``Interpreter``
> objects or something similar on channel objects.
>
> In practice, this level of management shouldn't be necessary for users.
> So adding more explicit support would only add clutter to the API.
>
> Use pipes instead of channels
> -----------------------------
>
> A pipe would be a simplex FIFO between exactly two interpreters.  For
> most use cases this would be sufficient.  It could potentially simplify
> the implementation as well.  However, it isn't a big step to supporting
> a many-to-many simplex FIFO via channels.  Also, with pipes the API
> ends up being slightly more complicated, requiring naming the pipes.
>
> Use queues instead of channels
> ------------------------------
>
> The main difference between queues and channels is that queues support
> buffering.  This would complicate the blocking semantics of ``recv()``
> and ``send()``.  Also, queues can be built on top of channels.
>
> "enumerate"
> -----------
>
> The ``list_all()`` function provides the list of all interpreters.
> In the threading module, which partly inspired the proposed API, the
> function is called ``enumerate()``.  The name is different here to
> avoid confusing Python users that are not already familiar with the
> threading API.  For them "enumerate" is rather unclear, whereas
> "list_all" is clear.
>
> Alternate solutions to prevent leaking exceptions across interpreters
> ---------------------------------------------------------------------
>
> In function calls, uncaught exceptions propagate to the calling frame.
> The same approach could be taken with ``run()``.  However, this would
> mean that exception objects would leak across the inter-interpreter
> boundary.  Likewise, the frames in the traceback would potentially leak.
>
> While that might not be a problem currently, it would be a problem once
> interpreters get better isolation relative to memory management (which
> is necessary to stop sharing the GIL between interpreters).  We've
> resolved the semantics of how the exceptions propagate by raising a
> RuntimeError instead, which wraps a safe proxy for the original
> exception and traceback.
>
> Rejected possible solutions:
>
> * set the RuntimeError's __cause__ to the proxy of the original
>   exception
> * reproduce the exception and traceback in the original interpreter
>   and raise that.
> * convert at the boundary (a la ``subprocess.CalledProcessError``)
>   (requires a cross-interpreter representation)
> * support customization via ``Interpreter.excepthook``
>   (requires a cross-interpreter representation)
> * wrap in a proxy at the boundary (including with support for
>   something like ``err.raise()`` to propagate the traceback).
> * return the exception (or its proxy) from ``run()`` instead of
>   raising it
> * return a result object (like ``subprocess`` does) [result-object]_
>   (unnecessary complexity?)
> * throw the exception away and expect users to deal with unhandled
>   exceptions explicitly in the script they pass to ``run()``
>   (they can pass error info out via channels); with threads you have
>   to do something similar
>
>
> References
> ==========
>
> .. [c-api]
>    https://docs.python.org/3/c-api/init.html#sub-interpreter-support
>
> .. _Communicating Sequential Processes:
>
> .. [CSP]
>    https://en.wikipedia.org/wiki/Communicating_sequential_processes
>    https://github.com/futurecore/python-csp
>
> .. [fifo]
>    https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Pipe
>    https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Queue
>    https://docs.python.org/3/library/queue.html#module-queue
>    http://stackless.readthedocs.io/en/2.7-slp/library/stackless/channels.html
>    https://golang.org/doc/effective_go.html#sharing
>    http://www.jtolds.com/writing/2016/03/go-channels-are-bad-and-you-should-feel-bad/
>
> .. [caveats]
>    https://docs.python.org/3/c-api/init.html#bugs-and-caveats
>
> .. [petr-c-ext]
>    https://mail.python.org/pipermail/import-sig/2016-June/001062.html
>    https://mail.python.org/pipermail/python-ideas/2016-April/039748.html
>
> .. [cryptography]
>    https://github.com/pyca/cryptography/issues/2299
>
> .. [global-gc]
>    http://bugs.python.org/issue24554
>
> .. [gilstate]
>    https://bugs.python.org/issue10915
>    http://bugs.python.org/issue15751
>
> .. [global-atexit]
>    https://bugs.python.org/issue6531
>
> .. [mp-conn]
>    https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Connection
>
> .. [bug-rate]
>    https://mail.python.org/pipermail/python-ideas/2017-September/047094.html
>
> .. [benefits]
>    https://mail.python.org/pipermail/python-ideas/2017-September/047122.html
>
> .. [main-thread]
>    https://mail.python.org/pipermail/python-ideas/2017-September/047144.html
>    https://mail.python.org/pipermail/python-dev/2017-September/149566.html
>
> .. [reset_globals]
>    https://mail.python.org/pipermail/python-dev/2017-September/149545.html
>
> .. [async]
>    https://mail.python.org/pipermail/python-dev/2017-September/149420.html
>    https://mail.python.org/pipermail/python-dev/2017-September/149585.html
>
> .. [result-object]
>    https://mail.python.org/pipermail/python-dev/2017-September/149562.html
>
> .. [jython]
>    https://mail.python.org/pipermail/python-ideas/2017-May/045771.html
>
> .. [pypy]
>    https://mail.python.org/pipermail/python-ideas/2017-September/046973.html
>
>
> Copyright
> =========
>
> This document has been placed in the public domain.
> _______________________________________________
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/
> guido%40python.org
>



-- 
--Guido van Rossum (python.org/~guido)
