Re: [Python-ideas] New PEP 550: Execution Context

Nathaniel Smith Sat, 12 Aug 2017 16:36:32 -0700

I had an idea for an alternative API that exposes the same
functionality/semantics as the current draft, but that might have some
advantages. It would look like:


# a "context item" is an object that holds a context-sensitive value
# each call to create_context_item creates a new one
ci = sys.create_context_item()

# Set the value of this item in the current context
ci.set(value)

# Get the value of this item in the current context
value = ci.get()
value = ci.get(default)

# To support async libraries, we need some way to capture the whole context
# But an opaque token representing "all context item values" is enough
state_token = sys.current_context_state_token()
sys.set_context_state_token(state_token)
coro.cr_state_token = state_token
# etc.
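
To make the semantics concrete, here is a minimal pure-Python sketch. It is not Yury's prototype: `ContextItem`, `_state`, and the module-level token functions are hypothetical stand-ins for the `sys.*` names above, a plain `threading.local` holds the current state, and raising `LookupError` from a default-less `get()` is my assumption.

```python
import threading

_MISSING = object()
_state = threading.local()  # hypothetical stand-in for per-thread interpreter state


def _current():
    # The stored dict is never mutated in place, so it can double as a token.
    return getattr(_state, "values", {})


class ContextItem:
    """Holds one context-sensitive value; each instance is its own key."""

    def set(self, value):
        new_values = dict(_current())  # copy, so earlier tokens stay valid
        new_values[self] = value
        _state.values = new_values

    def get(self, default=_MISSING):
        value = _current().get(self, default)
        if value is _MISSING:
            raise LookupError("context item has no value in this context")
        return value


def current_context_state_token():
    return _current()


def set_context_state_token(token):
    _state.values = token


ci = ContextItem()
other = ContextItem()                # distinct from ci, no name to collide on
ci.set(None)                         # None is an ordinary value here
token = current_context_state_token()
ci.set("changed")
changed_value = ci.get()
set_context_state_token(token)       # go back to the captured state
restored_value = ci.get()
other_value = other.get("fallback")  # other was never set
```

Because the key is the item object itself, two libraries cannot collide on a name, and `None` needs no special meaning.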

The advantages are:
- Eliminates the current PEP's issues with namespace collision; every
context item is automatically distinct from all others.
- Eliminates the need for the None-means-del hack.
- Lets the interpreter hide the details of garbage collecting context values.
- Allows for more implementation flexibility. This could be
implemented directly on top of Yury's current prototype. But it could
also, for example, be implemented by storing the context values in a
flat array, where each context item is assigned an index when it's
allocated. In the current draft this is suggested as a possible
extension for particularly performance-sensitive users, but this way
we'd have the option of making everything fast without changing or
extending the API.
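
For illustration, the flat-array variant might look roughly like this. The names `IndexedContextItem` and `_State` are invented for the sketch, a tuple stands in for the array, and per-thread handling is left out.

```python
import itertools

_UNSET = object()
_next_index = itertools.count()


class _State:
    values = ()   # current context state: one slot per allocated item


class IndexedContextItem:
    def __init__(self):
        # Each item gets a fixed slot number when it is allocated.
        self._index = next(_next_index)

    def set(self, value):
        slots = list(_State.values)
        # Grow the array if this item was allocated after the state was built.
        slots.extend([_UNSET] * (self._index + 1 - len(slots)))
        slots[self._index] = value
        _State.values = tuple(slots)   # new tuple, so captured tokens are unaffected

    def get(self, default=None):
        values = _State.values
        # Lookup is a bounds check plus an index; no hashing is involved.
        if self._index < len(values) and values[self._index] is not _UNSET:
            return values[self._index]
        return default


a = IndexedContextItem()
b = IndexedContextItem()
b.set("second")
token = _State.values          # the opaque "all values" token is just the tuple
a.set("first")
a_value = a.get()
b_value = b.get()
_State.values = token          # restore the captured state
a_after_restore = a.get("unset")
```

The user-facing `set`/`get` calls are the same as in a dict-backed version, which is the point: the storage strategy stays an implementation detail.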

As precedent, this is basically the API that low-level thread-local
storage implementations use; see e.g. pthread_key_create,
pthread_getspecific, pthread_setspecific. (And allocating an index
in a table is how fast thread-local storage implementations work,
too.)

-n

On Fri, Aug 11, 2017 at 3:37 PM, Yury Selivanov <[email protected]> wrote:
> Hi,
>
> This is a new PEP to implement Execution Contexts in Python.
>
> The PEP is in-flight to python.org, and in the meanwhile can
> be read on GitHub:
>
> https://github.com/python/peps/blob/master/pep-0550.rst
>
> (it contains a few diagrams and charts, so please read it there.)
>
> Thank you!
> Yury
>
>
> PEP: 550
> Title: Execution Context
> Version: $Revision$
> Last-Modified: $Date$
> Author: Yury Selivanov <[email protected]>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 11-Aug-2017
> Python-Version: 3.7
> Post-History: 11-Aug-2017
>
>
> Abstract
> ========
>
> This PEP proposes a new mechanism to manage execution state--the
> logical environment in which a function, a thread, a generator,
> or a coroutine executes.
>
> A few examples of where having a reliable state storage is required:
>
> * Context managers like decimal contexts, ``numpy.errstate``,
>   and ``warnings.catch_warnings``;
>
> * Storing request-related data such as security tokens and request
>   data in web applications;
>
> * Profiling, tracing, and logging in complex and large code bases.
>
> The usual solution for storing state is to use a Thread-local Storage
> (TLS), implemented in the standard library as ``threading.local()``.
> Unfortunately, TLS does not work for isolating state of generators or
> asynchronous code because such code shares a single thread.
>
>
> Rationale
> =========
>
> Traditionally a Thread-local Storage (TLS) is used for storing the
> state.  However, the major flaw of using the TLS is that it works only
> for multi-threaded code.  It is not possible to reliably contain the
> state within a generator or a coroutine.  For example, consider
> the following generator::
>
>     def calculate(precision, ...):
>         with decimal.localcontext() as ctx:
>             # Set the precision for decimal calculations
>             # inside this block
>             ctx.prec = precision
>
>             yield calculate_something()
>             yield calculate_something_else()
>
> The decimal context uses a TLS to store its state, and because TLS is
> not aware of generators, the state can leak.  The above code will
> not work correctly if a user iterates over the ``calculate()``
> generator with different precisions in parallel::
>
>     g1 = calculate(100)
>     g2 = calculate(50)
>
>     items = list(zip(g1, g2))
>
>     # items[0] will be a tuple of:
>     #   first value from g1 calculated with 100 precision,
>     #   first value from g2 calculated with 50 precision.
>     #
>     # items[1] will be a tuple of:
>     #   second value from g1 calculated with 50 precision,
>     #   second value from g2 calculated with 50 precision.
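
The leak is easy to reproduce. Below is a runnable variant of the quoted example; the 1/3 division is my stand-in for `calculate_something()`, and the outer `localcontext()` is only there so the experiment does not disturb the caller's decimal context.

```python
import decimal
from decimal import Decimal


def calculate(precision):
    with decimal.localcontext() as ctx:
        ctx.prec = precision
        yield Decimal(1) / Decimal(3)
        yield Decimal(1) / Decimal(3)


def digits(value):
    # Number of significant digits actually computed.
    return len(value.as_tuple().digits)


with decimal.localcontext():      # keep the experiment from leaking further
    g1 = calculate(100)
    g2 = calculate(50)
    items = list(zip(g1, g2))
    g2.close()                    # g2 is still suspended inside its `with`

precisions = [(digits(a), digits(b)) for a, b in items]
# precisions == [(100, 50), (50, 50)]
```

The second value from g1 comes out with 50 digits because g2 installed its own context while g1 was suspended.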
>
> An even scarier example would be using decimals to represent money
> in an async/await application: decimal calculations can suddenly
> lose precision in the middle of processing a request.  Currently,
> bugs like this are extremely hard to find and fix.
>
> Another common need for web applications is to have access to the
> current request object, or security context, or, simply, the request
> URL for logging or submitting performance tracing data::
>
>     async def handle_http_request(request):
>         context.current_http_request = request
>
>         await ...
>         # Invoke your framework code, render templates,
>         # make DB queries, etc, and use the global
>         # 'current_http_request' in that code.
>
>         # This isn't currently possible to do reliably
>         # in asyncio out of the box.
>
> These examples are just a few out of many, where a reliable way to
> store context data is absolutely needed.
>
> The inability to use TLS for asynchronous code has led to a
> proliferation of ad-hoc solutions, which are supported only by
> code that was explicitly written to work with them.
>
> The status quo is that any library, including the standard
> library, that uses a TLS will likely not work as expected in
> asynchronous code or with generators (see [3]_ for an example issue).
>
> Some languages that have coroutines or generators recommend
> manually passing a ``context`` object to every function; see [1]_,
> which describes the pattern for Go.  This approach, however, has limited
> use for Python, where we have a huge ecosystem that was built to work
> with a TLS-like context.  Moreover, passing the context explicitly
> does not work at all for libraries like ``decimal`` or ``numpy``,
> which use operator overloading.
>
> .NET runtime, which has support for async/await, has a generic
> solution of this problem, called ``ExecutionContext`` (see [2]_).
> On the surface, working with it is very similar to working with a TLS,
> but the former explicitly supports asynchronous code.
>
>
> Goals
> =====
>
> The goal of this PEP is to provide a more reliable alternative to
> ``threading.local()``.  It should be explicitly designed to work with
> Python execution model, equally supporting threads, generators, and
> coroutines.
>
> An acceptable solution for Python should meet the following
> requirements:
>
> * Transparent support for code executing in threads, coroutines,
>   and generators with an easy to use API.
>
> * Negligible impact on the performance of the existing code or the
>   code that will be using the new mechanism.
>
> * Fast C API for packages like ``decimal`` and ``numpy``.
>
> Explicit is still better than implicit, hence the new APIs should only
> be used when there is no option to pass the state explicitly.
>
> With this PEP implemented, it should be possible to update a context
> manager like the below::
>
>     _local = threading.local()
>
>     @contextmanager
>     def context(x):
>         old_x = getattr(_local, 'x', None)
>         _local.x = x
>         try:
>             yield
>         finally:
>             _local.x = old_x
>
> to a more robust version that can be reliably used in generators
> and async/await code, with a simple transformation::
>
>     @contextmanager
>     def context(x):
>         old_x = get_execution_context_item('x')
>         set_execution_context_item('x', x)
>         try:
>             yield
>         finally:
>             set_execution_context_item('x', old_x)
>
>
> Specification
> =============
>
> This proposal introduces a new concept called Execution Context (EC),
> along with a set of Python APIs and C APIs to interact with it.
>
> EC is implemented using an immutable mapping.  Every modification
> of the mapping produces a new copy of it.  To illustrate what it
> means let's compare it to how we work with tuples in Python::
>
>     a0 = ()
>     a1 = a0 + (1,)
>     a2 = a1 + (2,)
>
>     # a0 is an empty tuple
>     # a1 is (1,)
>     # a2 is (1, 2)
>
> Manipulating an EC object would be similar::
>
>     a0 = EC()
>     a1 = a0.set('foo', 'bar')
>     a2 = a1.set('spam', 'ham')
>
>     # a0 is an empty mapping
>     # a1 is {'foo': 'bar'}
>     # a2 is {'foo': 'bar', 'spam': 'ham'}
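
A toy version of such a mapping, just to pin down the copy-on-modify behaviour. This `EC` class is a sketch of mine, not the PEP's implementation, and `as_dict()` exists only so the contents can be inspected.

```python
class EC:
    """Toy immutable mapping: `set` returns a new object."""

    def __init__(self, items=None):
        self._items = dict(items or {})

    def set(self, key, value):
        new_items = dict(self._items)   # shallow copy; the original is untouched
        new_items[key] = value
        return EC(new_items)

    def get(self, key, default=None):
        return self._items.get(key, default)

    def as_dict(self):
        return dict(self._items)


a0 = EC()
a1 = a0.set('foo', 'bar')
a2 = a1.set('spam', 'ham')
```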
>
> In CPython, every thread that can execute Python code has a
> corresponding ``PyThreadState`` object.  It encapsulates important
> runtime information like a pointer to the current frame, and is
> being used by the ceval loop extensively.  We add a new field to
> ``PyThreadState``, called ``exec_context``, which points to the
> current EC object.
>
> We also introduce a set of APIs to work with Execution Context.
> In this section we will only cover two functions that are needed to
> explain how Execution Context works.  See the full list of new APIs
> in the `New APIs`_ section.
>
> * ``sys.get_execution_context_item(key, default=None)``: lookup
>   ``key`` in the EC of the executing thread.  If not found,
>   return ``default``.
>
> * ``sys.set_execution_context_item(key, value)``: get the
>   current EC of the executing thread.  Add a ``key``/``value``
>   item to it, which will produce a new EC object.  Set the
>   new object as the current one for the executing thread.
>   In pseudo-code::
>
>       tstate = PyThreadState_GET()
>       ec = tstate.exec_context
>       ec2 = ec.set(key, value)
>       tstate.exec_context = ec2
>
> Note that some important implementation details and optimizations
> are omitted here, and will be covered in later sections of this PEP.
>
> Now let's see how Execution Contexts work with regular multi-threaded
> code, generators, and coroutines.
>
>
> Regular & Multithreaded Code
> ----------------------------
>
> For regular Python code, EC behaves just like a thread-local.  Any
> modification of the EC object produces a new one, which is immediately
> set as the current one for the thread state.
>
> .. figure:: pep-0550/functions.png
>    :align: center
>    :width: 90%
>
>    Figure 1.  Execution Context flow in a thread.
>
> As Figure 1 illustrates, if a function calls
> ``set_execution_context_item()``, the modification of the execution
> context will be visible to all subsequent calls and to the caller::
>
>     def set_foo():
>         set_execution_context_item('foo', 'spam')
>
>     set_execution_context_item('foo', 'bar')
>     print(get_execution_context_item('foo'))
>
>     set_foo()
>     print(get_execution_context_item('foo'))
>
>     # will print:
>     #   bar
>     #   spam
>
>
> Coroutines
> ----------
>
> Python :pep:`492` coroutines are used to implement cooperative
> multitasking.  For a Python end-user they are similar to threads,
> especially when it comes to sharing resources or modifying
> the global state.
>
> An event loop is needed to schedule coroutines.  Coroutines that
> are explicitly scheduled by the user are usually called Tasks.
> When a coroutine is scheduled, it can schedule other coroutines using
> an ``await`` expression.  In async/await world, awaiting a coroutine
> can be viewed as a different calling convention: Tasks are similar to
> threads, and awaiting on coroutines within a Task is similar to
> calling functions within a thread.
>
> By drawing a parallel between regular multithreaded code and
> async/await, it becomes apparent that any modification of the
> execution context within one Task should be visible to all coroutines
> scheduled within it.  Any execution context modifications, however,
> must not be visible to other Tasks executing within the same thread.
>
> To achieve this, a small set of modifications to the coroutine object
> is needed:
>
> * When a coroutine object is instantiated, it saves a reference to
>   the current execution context object to its ``cr_execution_context``
>   attribute.
>
> * Coroutine's ``.send()`` and ``.throw()`` methods are modified as
>   follows (in pseudo-C)::
>
>     if coro->cr_isolated_execution_context:
>         # Save a reference to the current execution context
>         old_context = tstate->exec_context
>
>         # Set our saved execution context as the current
>         # for the current thread.
>         tstate->exec_context = coro->cr_execution_context
>
>         try:
>             # Perform the actual `Coroutine.send()` or
>             # `Coroutine.throw()` call.
>             return coro->send(...)
>         finally:
>             # Save a reference to the updated execution context.
>             # We will need it later, when `.send()` or `.throw()`
>             # are called again.
>             coro->cr_execution_context = tstate->exec_context
>
>             # Restore thread's execution context to what it was before
>             # invoking this coroutine.
>             tstate->exec_context = old_context
>     else:
>         # Perform the actual `Coroutine.send()` or
>         # `Coroutine.throw()` call.
>         return coro->send(...)
>
> * ``cr_isolated_execution_context`` is a new attribute on coroutine
>   objects.  Set to ``True`` by default, it keeps any execution context
>   modifications made by the coroutine visible only to that
>   coroutine.
>
>   When the Python interpreter sees an ``await`` instruction, it flips
>   ``cr_isolated_execution_context`` to ``False`` for the coroutine
>   that is about to be awaited.  This makes any changes to the execution
>   context made by nested coroutine calls within a Task visible
>   throughout the Task.
>
>   Because the top-level coroutine (Task) cannot be scheduled with
>   ``await`` (in asyncio you need to call ``loop.create_task()`` or
>   ``asyncio.ensure_future()`` to schedule a Task), all execution
>   context modifications are guaranteed to stay within the Task.
>
> * We always work with ``tstate->exec_context``.  We use
>   ``coro->cr_execution_context`` only to store coroutine's execution
>   context when it is not executing.
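
The save/swap/restore step can be imitated in plain Python with a wrapper around a generator. Everything here is a hypothetical stand-in: `ThreadState` for `PyThreadState`, `set_item`/`get_item` for the two `sys` functions, and `ContextCarrier` for the modified coroutine object.

```python
class ThreadState:
    """Hypothetical stand-in for PyThreadState."""
    exec_context = {}


tstate = ThreadState()


def set_item(key, value):
    # Build a new mapping instead of mutating the current one.
    tstate.exec_context = {**tstate.exec_context, key: value}


def get_item(key, default=None):
    return tstate.exec_context.get(key, default)


class ContextCarrier:
    """Wraps a generator and applies the send() logic described above."""

    def __init__(self, gen, isolated=True):
        self.gen = gen
        self.exec_context = tstate.exec_context   # captured at creation
        self.isolated = isolated

    def send(self, value=None):
        if not self.isolated:
            return self.gen.send(value)
        old_context = tstate.exec_context
        tstate.exec_context = self.exec_context
        try:
            return self.gen.send(value)
        finally:
            # Remember our context for the next resume, then restore the caller's.
            self.exec_context = tstate.exec_context
            tstate.exec_context = old_context


def worker():
    set_item('key', 'inner')
    yield get_item('key')
    yield get_item('key')


set_item('key', 'outer')
carrier = ContextCarrier(worker())
first = carrier.send()          # worker sets and reads its own value
seen_outside = get_item('key')  # the caller's context is unchanged
set_item('key', 'changed')
second = carrier.send()         # worker still sees its own value
```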
>
> Figure 2 below illustrates how execution context mutations work with
> coroutines.
>
> .. figure:: pep-0550/coroutines.png
>    :align: center
>    :width: 90%
>
>    Figure 2.  Execution Context flow in coroutines.
>
> In the above diagram:
>
> * When "coro1" is created, it saves a reference to the current
>   execution context "2".
>
> * If it makes any change to the context, it will have its own
>   execution context branch "2.1".
>
> * When it awaits on "coro2", any subsequent changes that "coro2"
>   makes to the execution context are visible to "coro1", but not
>   outside of it.
>
> In code::
>
>     async def inner_foo():
>         print('inner_foo:', get_execution_context_item('key'))
>         set_execution_context_item('key', 2)
>
>     async def foo():
>         print('foo:', get_execution_context_item('key'))
>
>         set_execution_context_item('key', 1)
>         await inner_foo()
>
>         print('foo:', get_execution_context_item('key'))
>
>
>     set_execution_context_item('key', 'spam')
>     print('main:', get_execution_context_item('key'))
>
>     asyncio.get_event_loop().run_until_complete(foo())
>
>     print('main:', get_execution_context_item('key'))
>
> which will output::
>
>     main: spam
>     foo: spam
>     inner_foo: 1
>     foo: 2
>     main: spam
>
> Generator-based coroutines (generators decorated with
> ``types.coroutine`` or ``asyncio.coroutine``) behave exactly as
> native coroutines with regards to execution context management:
> their ``yield from`` expression is semantically equivalent to
> ``await``.
>
>
> Generators
> ----------
>
> Generators in Python, while similar to Coroutines, are used in a
> fundamentally different way.  They are producers of data, and
> they use ``yield`` expression to suspend/resume their execution.
>
> A crucial difference between ``await coro`` and ``yield value`` is
> that the former expression guarantees that the ``coro`` will be
> executed to the end, while the latter is producing ``value`` and
> suspending the generator until it gets iterated again.
>
> Generators share 99% of their implementation with coroutines, and
> thus have similar new attributes ``gi_execution_context`` and
> ``gi_isolated_execution_context``.  Similar to coroutines, generators
> save a reference to the current execution context when they are
> instantiated.  They have the same implementation of ``.send()`` and
> ``.throw()`` methods.
>
> The only difference is that
> ``gi_isolated_execution_context`` is always set to ``True``, and
> is never modified by the interpreter.  The ``yield from o`` expression
> in regular generators (those not decorated with ``types.coroutine``)
> is semantically equivalent to ``for v in o: yield v``.
>
> .. figure:: pep-0550/generators.png
>    :align: center
>    :width: 90%
>
>    Figure 3.  Execution Context flow in a generator.
>
> In the above diagram:
>
> * When "gen1" is created, it saves a reference to the current
>   execution context "2".
>
> * If it makes any change to the context, it will have its own
>   execution context branch "2.1".
>
> * When "gen2" is created, it saves a reference to the current
>   execution context for it -- "2.1".
>
> * Any subsequent execution context updates in "gen2" will only
>   be visible to "gen2".
>
> * Likewise, any context changes that "gen1" makes after it has
>   created "gen2" will not be visible to "gen2".
>
> In code::
>
>     def inner_foo():
>         for i in range(3):
>             print('inner_foo:', get_execution_context_item('key'))
>             set_execution_context_item('key', i)
>             yield i
>
>
>     def foo():
>         set_execution_context_item('key', 'spam')
>         print('foo:', get_execution_context_item('key'))
>
>         inner = inner_foo()
>
>         while True:
>             val = next(inner, None)
>             if val is None:
>                 break
>             yield val
>             print('foo:', get_execution_context_item('key'))
>
>     set_execution_context_item('key', 'ham')
>     print('main:', get_execution_context_item('key'))
>
>     list(foo())
>
>     print('main:', get_execution_context_item('key'))
>
> which will output::
>
>     main: ham
>     foo: spam
>     inner_foo: spam
>     foo: spam
>     inner_foo: 0
>     foo: spam
>     inner_foo: 1
>     foo: spam
>     main: ham
>
> As we see, any modification of the execution context in a generator
> is visible only to the generator itself.
>
> There is one use-case where it is desired for generators to affect
> the surrounding execution context: ``contextlib.contextmanager``
> decorator.  To make the following work::
>
>     @contextmanager
>     def context(x):
>         old_x = get_execution_context_item('x')
>         set_execution_context_item('x', x)
>         try:
>             yield
>         finally:
>             set_execution_context_item('x', old_x)
>
> we modified ``contextmanager`` to flip
> ``gi_isolated_execution_context`` flag to ``False`` on its generator.
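
The effect of flipping the flag can be imitated without interpreter support. In this sketch `tstate`, `set_item`, `get_item`, and `resume` are hypothetical stand-ins for the machinery in the draft; `resume` applies the save/restore step only when `isolated` is true. Unlike the draft, the toy `set_item` stores `None` rather than removing the key.

```python
class ThreadState:
    exec_context = {}


tstate = ThreadState()


def set_item(key, value):
    tstate.exec_context = {**tstate.exec_context, key: value}


def get_item(key, default=None):
    return tstate.exec_context.get(key, default)


def resume(gen, saved_context, isolated):
    """Advance `gen` one step; return the context it should resume with."""
    if not isolated:
        next(gen, None)
        return tstate.exec_context
    old_context = tstate.exec_context
    tstate.exec_context = saved_context
    try:
        next(gen, None)
        return tstate.exec_context
    finally:
        tstate.exec_context = old_context


def context(x):
    old_x = get_item('x')
    set_item('x', x)
    try:
        yield
    finally:
        set_item('x', old_x)


# Isolated (the default for generators): the caller does not see x.
gen = context(1)
saved = resume(gen, tstate.exec_context, isolated=True)
inside_isolated = get_item('x')
resume(gen, saved, isolated=True)      # let the generator finish

# Not isolated (what contextmanager asks for): the change is visible.
gen = context(1)
resume(gen, tstate.exec_context, isolated=False)
inside_shared = get_item('x')
resume(gen, tstate.exec_context, isolated=False)
after_shared = get_item('x')           # reset by the finally block
```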
>
>
> Greenlets
> ---------
>
> Greenlet is an alternative implementation of cooperative
> scheduling for Python.  Although greenlet package is not part of
> CPython, popular frameworks like gevent rely on it, and it is
> important that greenlet can be modified to support execution
> contexts.
>
> In a nutshell, greenlet design is very similar to design of
> generators.  The main difference is that for generators, the stack
> is managed by the Python interpreter.  Greenlet works outside of the
> Python interpreter, and manually saves some ``PyThreadState``
> fields and pushes/pops the C-stack.  Since Execution Context is
> implemented on top of ``PyThreadState``, it's easy to add
> transparent support of it to greenlet.
>
>
> New APIs
> ========
>
> Even though this PEP adds a number of new APIs, please keep in mind
> that most Python users will likely only ever use two of them:
> ``sys.get_execution_context_item()`` and
> ``sys.set_execution_context_item()``.
>
>
> Python
> ------
>
> 1. ``sys.get_execution_context_item(key, default=None)``: lookup
>    ``key`` for the current Execution Context.  If not found,
>    return ``default``.
>
> 2. ``sys.set_execution_context_item(key, value)``: set
>    ``key``/``value`` item for the current Execution Context.
>    If ``value`` is ``None``, the item will be removed.
>
> 3. ``sys.get_execution_context()``: return the current Execution
>    Context object: ``sys.ExecutionContext``.
>
> 4. ``sys.set_execution_context(ec)``: set the passed
>    ``sys.ExecutionContext`` instance as the current one for the current
>    thread.
>
> 5. ``sys.ExecutionContext`` object.
>
>    Implementation detail: ``sys.ExecutionContext`` wraps a low-level
>    ``PyExecContextData`` object.  ``sys.ExecutionContext`` has a
>    mutable mapping API, abstracting away the real immutable
>    ``PyExecContextData``.
>
>    * ``ExecutionContext()``: construct a new, empty, execution
>      context.
>
>    * ``ec.run(func, *args)`` method: run ``func(*args)`` in the
>      ``ec`` execution context.
>
>    * ``ec[key]``: lookup ``key`` in ``ec`` context.
>
>    * ``ec[key] = value``: assign ``key``/``value`` item to the ``ec``.
>
>    * ``ec.get()``, ``ec.items()``, ``ec.values()``, ``ec.keys()``, and
>      ``ec.copy()`` behave like the corresponding ``dict`` methods.
>
>
> C API
> -----
>
> C API is different from the Python one because it operates directly
> on the low-level immutable ``PyExecContextData`` object.
>
> 1. New ``PyThreadState->exec_context`` field, pointing to a
>    ``PyExecContextData`` object.
>
> 2. ``PyThreadState_SetExecContextItem`` and
>    ``PyThreadState_GetExecContextItem`` similar to
>    ``sys.set_execution_context_item()`` and
>    ``sys.get_execution_context_item()``.
>
> 3. ``PyThreadState_GetExecContext``: similar to
>    ``sys.get_execution_context()``.  Always returns a
>    ``PyExecContextData`` object.  If ``PyThreadState->exec_context``
>    is ``NULL``, a new, empty one will be created and assigned
>    to ``PyThreadState->exec_context``.
>
> 4. ``PyThreadState_SetExecContext``: similar to
>    ``sys.set_execution_context()``.
>
> 5. ``PyExecContext_New``: create a new empty ``PyExecContextData``
>    object.
>
> 6. ``PyExecContext_SetItem`` and ``PyExecContext_GetItem``.
>
> The exact layout of ``PyExecContextData`` is private, which allows
> us to switch to a different implementation later.  More on that
> in the `Implementation Details`_ section.
>
>
> Modifications in Standard Library
> =================================
>
> * ``contextlib.contextmanager`` was updated to flip the new
>   ``gi_isolated_execution_context`` attribute on the generator.
>
> * ``asyncio.events.Handle`` object now captures the current
>   execution context when it is created, and uses the saved
>   execution context to run the callback (with the
>   ``ExecutionContext.run()`` method).  This makes
>   ``loop.call_soon()`` run callbacks in the execution context
>   in which they were scheduled.
>
>   No modifications in ``asyncio.Task`` or ``asyncio.Future`` were
>   necessary.
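
The capture-then-run behaviour described for `Handle` could be sketched like this. The `Handle` below is a toy, not asyncio's class, and `tstate`, `set_item`, `get_item`, and `run_in_context` are hypothetical stand-ins (the last one imitating `ExecutionContext.run()`).

```python
class ThreadState:
    exec_context = {}


tstate = ThreadState()


def set_item(key, value):
    tstate.exec_context = {**tstate.exec_context, key: value}


def get_item(key, default=None):
    return tstate.exec_context.get(key, default)


def run_in_context(context, func, *args):
    """Call func(*args) with `context` installed as the current one."""
    old_context = tstate.exec_context
    tstate.exec_context = context
    try:
        return func(*args)
    finally:
        tstate.exec_context = old_context


class Handle:
    def __init__(self, callback, *args):
        self.callback = callback
        self.args = args
        self.context = tstate.exec_context   # captured when scheduled

    def run(self):
        return run_in_context(self.context, self.callback, *self.args)


set_item('request', 'A')
handle = Handle(get_item, 'request')   # scheduled while handling request A
set_item('request', 'B')               # the loop moves on to request B
seen_by_callback = handle.run()
seen_by_loop = get_item('request')
```

The callback sees the context it was scheduled in, while the code that eventually runs it keeps its own.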
>
> Some standard library modules like ``warnings`` and ``decimal``
> can be updated to use new execution contexts.  This will be considered
> in separate issues if this PEP is accepted.
>
>
> Backwards Compatibility
> =======================
>
> This proposal preserves 100% backwards compatibility.
>
>
> Performance
> ===========
>
> Implementation Details
> ----------------------
>
> The new ``PyExecContextData`` object is wrapping a ``dict`` object.
> Any modification requires creating a shallow copy of the dict.
>
> While working on the reference implementation of this PEP, we were
> able to optimize ``dict.copy()`` operation **5.5x**, see [4]_ for
> details.
>
> .. figure:: pep-0550/dict_copy.png
>    :align: center
>    :width: 100%
>
>    Figure 4.
>
> Figure 4 shows that the performance of immutable dict implemented
> with shallow copying is expectedly O(n) for the ``set()`` operation.
> However, this is tolerable until dict has more than 100 items
> (1 ``set()`` takes about a microsecond.)
>
> Judging by the number of modules that need EC in the standard
> library, it is likely that real-world Python applications will use
> significantly fewer than 100 execution context variables.
>
> The important point is that the cost of accessing a key in
> Execution Context is always O(1).
>
> If the performance of the ``set()`` operation is a major concern, we
> discuss alternative approaches with O(1) or near-O(1) ``set()``
> performance in the `Alternative Immutable Dict Implementation`_,
> `Faster C API`_, and `Copy-on-write Execution Context`_ sections.
>
>
> Generators and Coroutines
> -------------------------
>
> Using a microbenchmark for generators and coroutines from :pep:`492`
> ([12]_), it was possible to observe 0.5 to 1% performance degradation.
>
> asyncio echoserver microbenchmarks from the uvloop project [13]_
> showed 1-1.5% performance degradation for asyncio code.
>
> asyncpg benchmarks [14]_, which execute more code and are closer to a
> real-world application, did not exhibit any noticeable performance
> change.
>
>
> Overall Performance Impact
> --------------------------
>
> The total number of changed lines in the ceval loop is 2 -- in the
> ``YIELD_FROM`` opcode implementation.  Only performance of generators
> and coroutines can be affected by the proposal.
>
> This was confirmed by running Python Performance Benchmark Suite
> [15]_, which demonstrated that there is no difference between
> 3.7 master branch and this PEP reference implementation branch
> (full benchmark results can be found here [16]_.)
>
>
> Design Considerations
> =====================
>
> Alternative Immutable Dict Implementation
> -----------------------------------------
>
> Languages like Clojure and Scala use Hash Array Mapped Tries (HAMT)
> to implement high performance immutable collections [5]_, [6]_.
>
> Immutable mappings implemented with HAMT have O(log\ :sub:`32`\ N)
> performance for both ``set()`` and ``get()`` operations, which will
> be essentially O(1) for relatively small mappings in EC.
>
> To assess if HAMT can be used for Execution Context, we implemented
> it in CPython [7]_.
>
> .. figure:: pep-0550/hamt_vs_dict.png
>    :align: center
>    :width: 100%
>
>    Figure 5.  Benchmark code can be found here: [9]_.
>
> Figure 5 shows that HAMT indeed displays O(1) performance for all
> benchmarked dictionary sizes.  For dictionaries with fewer than 100
> items, HAMT is a bit slower than Python dict/shallow copy.
>
> .. figure:: pep-0550/lookup_hamt.png
>    :align: center
>    :width: 100%
>
>    Figure 6.  Benchmark code can be found here: [10]_.
>
> Figure 6 shows a comparison of lookup costs between Python dict
> and a HAMT immutable mapping.  HAMT lookup time is 30-40% worse
> than Python dict lookups on average, which is a very good result,
> considering how well Python dicts are optimized.
>
> Note that according to [8]_, the HAMT design can be further improved.
>
> The bottom line is that the current approach with implementing
> an immutable mapping with shallow-copying dict will likely perform
> adequately in real-life applications.  The HAMT solution is more
> future proof, however.
>
> The proposed API is designed in such a way that the underlying
> implementation of the mapping can be changed completely without
> affecting the Execution Context `Specification`_, which allows
> us to switch to HAMT at some point if necessary.
>
>
> Copy-on-write Execution Context
> -------------------------------
>
> The implementation of Execution Context in .NET is different from
> this PEP. .NET uses copy-on-write mechanism and a regular mutable
> mapping.
>
> One way to implement this in CPython would be to have two new
> fields in ``PyThreadState``:
>
> * ``exec_context`` pointing to the current Execution Context mapping;
> * ``exec_context_copy_on_write`` flag, set to ``0`` initially.
>
> The idea is that whenever we are modifying the EC, the copy-on-write
> flag is checked, and if it is set to ``1``, the EC is copied.
>
> Modifications to Coroutine and Generator ``.send()`` and ``.throw()``
> methods described in the `Coroutines`_ section will be almost the
> same, except that in addition to the ``gi_execution_context`` they
> will have a ``gi_exec_context_copy_on_write`` flag.  When a coroutine
> or a generator starts, the flag will be set to ``1``.  This will
> ensure that any modification of the EC performed within a coroutine
> or a generator will be isolated.
>
> This approach has one advantage:
>
> * For Execution Context that contains a large number of items,
>   copy-on-write is a more efficient solution than the shallow-copy
>   dict approach.
>
> However, we believe that copy-on-write disadvantages are more
> important to consider:
>
> * Copy-on-write behaviour for generators and coroutines makes
>   EC semantics less predictable.
>
>   With immutable EC approach, generators and coroutines always
>   execute in the EC that was current at the moment of their
>   creation.  Any modifications to the outer EC while a generator
>   or a coroutine is executing are not visible to them::
>
>     def generator():
>         yield 1
>         print(get_execution_context_item('key'))
>         yield 2
>
>     set_execution_context_item('key', 'spam')
>     gen = iter(generator())
>     next(gen)
>     set_execution_context_item('key', 'ham')
>     next(gen)
>
>   The above script will always print 'spam' with immutable EC.
>
>   With a copy-on-write approach, the above script will print 'ham'.
>   Now, consider that ``generator()`` was refactored to call some
>   library function, that uses Execution Context::
>
>     def generator():
>         yield 1
>         some_function_that_uses_decimal_context()
>         print(get_execution_context_item('key'))
>         yield 2
>
>   Now, the script will print 'spam', because
>   ``some_function_that_uses_decimal_context`` forced the EC to copy,
>   and ``set_execution_context_item('key', 'ham')`` line did not
>   affect the ``generator()`` code after all.
>
> * Similarly to the previous point, ``sys.ExecutionContext.run()``
>   method will also become less predictable, as
>   ``sys.get_execution_context()`` would still return a reference to
>   the current mutable EC.
>
>   We can't modify ``sys.get_execution_context()`` to return a shallow
>   copy of the current EC, because this would seriously harm
>   performance of ``loop.call_soon()`` and similar places, where
>   it is important to propagate the Execution Context.
>
> * Even though copy-on-write requires shallow-copying the execution
>   context object less frequently, copying will still take place
>   in coroutines and generators.  In that case, the HAMT approach will
>   perform better for medium to large sized execution contexts.
>
> All in all, we believe that the copy-on-write approach introduces
> very subtle corner cases that could lead to bugs that are
> exceptionally hard to discover and fix.
>
> The immutable EC solution in comparison is always predictable and
> easy to reason about.  Therefore we believe that any slight
> performance gain that the copy-on-write solution might offer is not
> worth it.
>
>
> Faster C API
> ------------
>
> Packages like numpy and standard library modules like decimal need
> to frequently query the global state for some local context
> configuration.  It is important that the APIs they use are as
> fast as possible.
>
> The proposed ``PyThreadState_SetExecContextItem`` and
> ``PyThreadState_GetExecContextItem`` functions need to get the
> current thread state with ``PyThreadState_GET()`` (fast) and then
> perform a hash lookup (relatively slow).  We can eliminate the hash
> lookup by adding three additional C API functions:
>
> * ``Py_ssize_t PyExecContext_RequestIndex(char *key_name)``:
>   a function similar to the existing ``_PyEval_RequestCodeExtraIndex``
>   introduced in :pep:`523`.  The idea is to request a unique index
>   that can later be used to look up context items.
>
>   The ``key_name`` can later be used by ``sys.ExecutionContext`` to
>   introspect items added with this API.
>
> * ``PyThreadState_SetExecContextIndexedItem(Py_ssize_t index, PyObject *val)``
>   and ``PyThreadState_GetExecContextIndexedItem(Py_ssize_t index)``
>   to set and get an item by its index, avoiding the cost of hash lookup.
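
The index-based scheme above can be modelled in pure Python. This is
only a sketch of the idea, not the proposed C API: the
``IndexedContext`` class and its method names are hypothetical, and a
Python list stands in for the flat C array.

```python
class IndexedContext:
    """Toy model of index-based context slots (hypothetical names)."""

    _key_names = []  # index -> key name, shared by all contexts

    @classmethod
    def request_index(cls, key_name):
        # Paid once per key, e.g. at module import time.
        cls._key_names.append(key_name)
        return len(cls._key_names) - 1

    def __init__(self):
        self._slots = []

    def set_indexed_item(self, index, value):
        # Grow the slot array lazily, then store by position (no hashing).
        if index >= len(self._slots):
            self._slots.extend([None] * (index + 1 - len(self._slots)))
        self._slots[index] = value

    def get_indexed_item(self, index):
        return self._slots[index] if index < len(self._slots) else None

    def items(self):
        # The key names are kept only so the slots can be introspected.
        return {self._key_names[i]: value
                for i, value in enumerate(self._slots)
                if value is not None}


DECIMAL_CTX = IndexedContext.request_index('decimal_context')

ctx = IndexedContext()
ctx.set_indexed_item(DECIMAL_CTX, 'prec=28')
print(ctx.get_indexed_item(DECIMAL_CTX))  # prec=28
print(ctx.items())                        # {'decimal_context': 'prec=28'}
```

The name is used once to allocate the slot; every later access is a
plain array index.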
>
>
> Why does setting a key to None remove the item?
> -----------------------------------------------
>
> Consider a context manager::
>
>     @contextmanager
>     def context(x):
>         old_x = get_execution_context_item('x')
>         set_execution_context_item('x', x)
>         try:
>             yield
>         finally:
>             set_execution_context_item('x', old_x)
>
> Because ``set_execution_context_item(key, None)`` removes the
> ``key``, the user doesn't need to write additional code to remove
> the ``key`` if it wasn't in the execution context already.
>
> An alternative design with ``del_execution_context_item()`` method
> would look like the following::
>
>     @contextmanager
>     def context(x):
>         not_there = object()
>         old_x = get_execution_context_item('x', not_there)
>         set_execution_context_item('x', x)
>         try:
>             yield
>         finally:
>             if old_x is not_there:
>                 del_execution_context_item('x')
>             else:
>                 set_execution_context_item('x', old_x)
>
>
> Can we fix ``PyThreadState_GetDict()``?
> ---------------------------------------
>
> ``PyThreadState_GetDict`` is a TLS, and some of its existing users
> might depend on it being just a TLS.  Changing its behaviour to follow
> the Execution Context semantics would break backwards compatibility.
>
>
> PEP 521
> -------
>
> :pep:`521` proposes an alternative solution to the problem:
> enhance the Context Manager Protocol with two new methods: ``__suspend__``
> and ``__resume__``.  To make it compatible with async/await,
> the Asynchronous Context Manager Protocol will also need to be
> extended with ``__asuspend__`` and ``__aresume__``.
>
> This allows implementing context managers like decimal context and
> ``numpy.errstate`` for generators and coroutines.
>
> The following code::
>
>     class Context:
>
>         def __enter__(self):
>             self.old_x = get_execution_context_item('x')
>             set_execution_context_item('x', 'something')
>
>         def __exit__(self, *err):
>             set_execution_context_item('x', self.old_x)
>
> would become this::
>
>     class Context:
>
>         def __enter__(self):
>             self.old_x = get_execution_context_item('x')
>             set_execution_context_item('x', 'something')
>
>         def __suspend__(self):
>             set_execution_context_item('x', self.old_x)
>
>         def __resume__(self):
>             set_execution_context_item('x', 'something')
>
>         def __exit__(self, *err):
>             set_execution_context_item('x', self.old_x)
>
> Besides complicating the protocol, the implementation will likely
> negatively impact performance of coroutines, generators, and any code
> that uses context managers, and will notably complicate the
> interpreter implementation.  It also does not solve the leaking state
> problem for greenlet/gevent.
>
> :pep:`521` also does not provide any mechanism to propagate state
> in a local context, like storing a request object in an HTTP request
> handler to have better logging.
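
No interpreter calls ``__suspend__`` and ``__resume__`` today, so the
protocol can only be emulated by hand. The sketch below does that: the
generator itself invokes the two hooks around its ``yield``, which is
what the interpreter would do under :pep:`521`. The ``_ec`` dict and
the helper functions are hypothetical stand-ins, not a real API.

```python
_ec = {}  # toy stand-in for the current execution context


def get_execution_context_item(key):
    return _ec.get(key)


def set_execution_context_item(key, value):
    # Setting None removes the key, mirroring the rule in the draft.
    if value is None:
        _ec.pop(key, None)
    else:
        _ec[key] = value


class Context:

    def __enter__(self):
        self.old_x = get_execution_context_item('x')
        set_execution_context_item('x', 'something')

    def __suspend__(self):
        set_execution_context_item('x', self.old_x)

    def __resume__(self):
        set_execution_context_item('x', 'something')

    def __exit__(self, *err):
        set_execution_context_item('x', self.old_x)


seen = []


def generator(cm):
    with cm:
        cm.__suspend__()  # emulated: would run just before the yield
        yield
        cm.__resume__()   # emulated: would run just after resumption
        seen.append(get_execution_context_item('x'))


gen = generator(Context())
next(gen)
outside = get_execution_context_item('x')  # value while gen is suspended
next(gen, None)                            # resume and finish the generator

print(outside)  # None
print(seen)     # ['something']
```

While the generator is suspended the caller sees no 'x', and the
generator sees it again once resumed.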
>
>
> Can Execution Context be implemented outside of CPython?
> --------------------------------------------------------
>
> Because async/await code needs an event loop to run it, an EC-like
> solution can be implemented in a limited way for coroutines.
>
> Generators, on the other hand, do not have an event loop or
> trampoline, making it impossible to intercept their ``yield`` points
> outside of the Python interpreter.
>
>
> Reference Implementation
> ========================
>
> The reference implementation can be found here: [11]_.
>
>
> References
> ==========
>
> .. [1] https://blog.golang.org/context
>
> .. [2] https://msdn.microsoft.com/en-us/library/system.threading.executioncontext.aspx
>
> .. [3] https://github.com/numpy/numpy/issues/9444
>
> .. [4] http://bugs.python.org/issue31179
>
> .. [5] https://en.wikipedia.org/wiki/Hash_array_mapped_trie
>
> .. [6] http://blog.higher-order.net/2010/08/16/assoc-and-clojures-persistenthashmap-part-ii.html
>
> .. [7] https://github.com/1st1/cpython/tree/hamt
>
> .. [8] https://michael.steindorfer.name/publications/oopsla15.pdf
>
> .. [9] https://gist.github.com/1st1/9004813d5576c96529527d44c5457dcd
>
> .. [10] https://gist.github.com/1st1/dbe27f2e14c30cce6f0b5fddfc8c437e
>
> .. [11] https://github.com/1st1/cpython/tree/pep550
>
> .. [12] https://www.python.org/dev/peps/pep-0492/#async-await
>
> .. [13] https://github.com/MagicStack/uvloop/blob/master/examples/bench/echoserver.py
>
> .. [14] https://github.com/MagicStack/pgbench
>
> .. [15] https://github.com/python/performance
>
> .. [16] https://gist.github.com/1st1/6b7a614643f91ead3edf37c4451a6b4c
>
>
> Copyright
> =========
>
> This document has been placed in the public domain.
> _______________________________________________
> Python-ideas mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/



-- 
Nathaniel J. Smith -- https://vorpus.org