This looks great. I only have two nits with the text.
First, why is the snapshot called a "dynamic snapshot"? What exactly is
dynamic about it?

Second, you use the word "mapping" a lot. Would you mind changing that to
"mapping object" in most places? Especially in the phrase "each call to
``locals()`` returns the *same* mapping". To me, without the word "object"
added, this *could* be interpreted as "a dict with the same key/value
pairs" (since "mapping" is also an abstract mathematical concept describing
anything that maps keys to values).

Other than that, go for it! (Assuming Nathaniel agrees, of course.)

--Guido

On Tue, May 21, 2019 at 7:54 AM Nick Coghlan <ncogh...@gmail.com> wrote:

> Hi folks,
>
> After a couple of years hiatus, I finally got back to working on the
> reference implementation for PEP 558, my proposal to resolve some
> weird interactions between closures and trace functions in CPython by
> formally defining the expected semantics of the locals() builtin at
> function scope, and then ensuring that CPython adheres to those
> clarified semantics.
>
> The full text of the PEP is included below, and the rendered version
> is available at https://www.python.org/dev/peps/pep-0558/
>
> The gist of the PEP is that:
>
> 1. The behaviour when no trace functions are installed and no frame
>    introspection APIs are invoked is fine, so nothing should change in
>    that regard (except that we elevate that behaviour from "the way
>    CPython happens to work" to "the way the language and library
>    reference says that Python implementations are supposed to work", and
>    make sure CPython continues to behave that way even when a trace hook
>    *is* installed)
> 2. If you get hold of a frame object in CPython (or another
>    implementation that emulates the CPython frame API), whether via a
>    trace hook or via a frame introspection API, then writing to the
>    returned mapping will update the actual function locals and/or closure
>    reference immediately, rather than relying on the
>    FastToLocals/LocalsToFast APIs
> 3. The LocalsToFast C API changes to always produce RuntimeError
>    (since there's no way for us to make it actually work correctly and
>    consistently in the presence of closures, and the trace hook
>    implementation won't need it any more given the write-through proxy
>    behaviour on frame objects' "f_locals" attribute)
>
> The reference implementation still isn't quite done yet, but it's far
> enough along that I'm happy with the semantics and C API updates
> proposed in the current version of the PEP.
>
> Cheers,
> Nick.
>
> P.S. I'm away this weekend, so I expect the reference implementation
> to be done late next week, and I'd be submitting the PEP to Nathaniel
> for formal pronouncement at that point. However, I'm posting this
> thread now so that there's more time for discussion prior to the 3.8b1
> deadline.
>
> ==============
> PEP: 558
> Title: Defined semantics for locals()
> Author: Nick Coghlan <ncogh...@gmail.com>
> BDFL-Delegate: Nathaniel J. Smith
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 2017-09-08
> Python-Version: 3.8
> Post-History: 2017-09-08, 2019-05-22
>
>
> Abstract
> ========
>
> The semantics of the ``locals()`` builtin have historically been
> underspecified and hence implementation dependent.
>
> This PEP proposes formally standardising on the behaviour of the CPython
> 3.6 reference implementation for most execution scopes, with some
> adjustments to the behaviour at function scope to make it more predictable
> and independent of the presence or absence of tracing functions.
>
>
> Rationale
> =========
>
> While the precise semantics of the ``locals()`` builtin are nominally
> undefined, in practice, many Python programs depend on it behaving exactly
> as it behaves in CPython (at least when no tracing functions are
> installed).
>
> Other implementations such as PyPy are currently replicating that
> behaviour, up to and including replication of local variable mutation
> bugs that can arise when a trace hook is installed [1]_.
>
> While this PEP considers CPython's current behaviour when no trace hooks
> are installed to be acceptable (and largely desirable), it considers the
> current behaviour when trace hooks are installed to be problematic, as it
> causes bugs like [1]_ *without* even reliably enabling the desired
> functionality of allowing debuggers like ``pdb`` to mutate local
> variables [3]_.
>
>
> Proposal
> ========
>
> The expected semantics of the ``locals()`` builtin change based on the
> current execution scope. For this purpose, the defined scopes of execution
> are:
>
> * module scope: top-level module code, as well as any other code executed
>   using ``exec()`` or ``eval()`` with a single namespace
> * class scope: code in the body of a ``class`` statement, as well as any
>   other code executed using ``exec()`` or ``eval()`` with separate local
>   and global namespaces
> * function scope: code in the body of a ``def`` or ``async def`` statement
>
> We also allow interpreters to define two "modes" of execution, with only
> the first mode being considered part of the language specification itself:
>
> * regular operation: the way the interpreter behaves by default
> * tracing mode: the way the interpreter behaves when a trace hook has been
>   registered in one or more threads via an implementation dependent
>   mechanism like ``sys.settrace`` ([4]_) in CPython's ``sys`` module or
>   ``PyEval_SetTrace`` ([5]_) in CPython's C API
>
> For regular operation, this PEP proposes elevating the current behaviour of
> the CPython reference implementation to become part of the language
> specification.
>
> For tracing mode, this PEP proposes changes to CPython's behaviour at
> function scope that bring the ``locals()`` builtin semantics closer to
> those used in regular operation, while also making the related frame API
> semantics clearer and easier for interactive debuggers to rely on.
>
> The proposed tracing mode changes also affect the semantics of frame object
> references obtained through other means, such as via a traceback, or via
> the ``sys._getframe()`` API.
>
>
> New ``locals()`` documentation
> ------------------------------
>
> The heart of this proposal is to revise the documentation for the
> ``locals()`` builtin to read as follows:
>
>     Return a dictionary representing the current local symbol table, with
>     variable names as the keys, and their currently bound references as
>     the values. This will always be the same dictionary for a given
>     runtime execution frame.
>
>     At module scope, as well as when using ``exec()`` or ``eval()`` with a
>     single namespace, this function returns the same namespace as
>     ``globals()``.
>
>     At class scope, it returns the namespace that will be passed to the
>     metaclass constructor.
>
>     When using ``exec()`` or ``eval()`` with separate local and global
>     namespaces, it returns the local namespace passed in to the function
>     call.
>
>     At function scope (including for generators and coroutines), it
>     returns a dynamic snapshot of the function's local variables and any
>     nonlocal cell references. In this case, changes made via the snapshot
>     are *not* written back to the corresponding local variables or
>     nonlocal cell references, and any such changes to the snapshot will be
>     overwritten if the snapshot is subsequently refreshed (e.g. by another
>     call to ``locals()``).
>
>     CPython implementation detail: the dynamic snapshot for the current
>     frame will be implicitly refreshed before each call to the trace
>     function when a trace function is active.
>
> For reference, the current documentation of this builtin reads as follows:
>
>     Update and return a dictionary representing the current local symbol
>     table. Free variables are returned by locals() when it is called in
>     function blocks, but not in class blocks.
>
>     Note: The contents of this dictionary should not be modified; changes
>     may not affect the values of local and free variables used by the
>     interpreter.
>
> (In other words: the status quo is that the semantics and behaviour of
> ``locals()`` are currently formally implementation defined, whereas the
> proposed state after this PEP is that the only implementation defined
> behaviour will be that encountered at function scope when a tracing
> function is defined, with the behaviour in all other cases being defined
> by the language and library references)
>
>
> Module scope
> ------------
>
> At module scope, as well as when using ``exec()`` or ``eval()`` with a
> single namespace, ``locals()`` must return the same object as
> ``globals()``, which must be the actual execution namespace (available as
> ``inspect.currentframe().f_locals`` in implementations that provide access
> to frame objects).
>
> Variable assignments during subsequent code execution in the same scope
> must dynamically change the contents of the returned mapping, and changes
> to the returned mapping must change the values bound to local variable
> names in the execution environment.
>
> The semantics at module scope are required to be the same in both tracing
> mode (if provided by the implementation) and in regular operation.
>
> To capture this expectation as part of the language specification, the
> following paragraph will be added to the documentation for ``locals()``:
>
>     At module scope, as well as when using ``exec()`` or ``eval()`` with a
>     single namespace, this function returns the same namespace as
>     ``globals()``.
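
To make the module-scope rule above concrete, here is a minimal sketch (not part of the PEP text) that runs a snippet through ``exec()`` with a single namespace and checks that ``locals()`` and ``globals()`` are the same object there, so writes through one are visible as ordinary names. The names ``namespace`` and ``snippet`` are made up for illustration.

```python
# Illustrative only: `namespace` and `snippet` are hypothetical names.
namespace = {}
snippet = (
    "same_object = locals() is globals()\n"
    "locals()['injected'] = 42\n"   # write through the returned mapping
    "seen = injected\n"             # ...and read it back as a plain name
)

# A single namespace argument means it serves as both globals and locals.
exec(snippet, namespace)

same_object = namespace["same_object"]
seen = namespace["seen"]
print(same_object, seen)
```

This only exercises the ``exec()`` with a single namespace case; top-level module code is expected to behave the same way.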
>
> This part of the proposal does not require any changes to the reference
> implementation - it is standardisation of the current behaviour.
>
>
> Class scope
> -----------
>
> At class scope, as well as when using ``exec()`` or ``eval()`` with
> separate global and local namespaces, ``locals()`` must return the
> specified local namespace (which may be supplied by the metaclass
> ``__prepare__`` method in the case of classes). As for module scope, this
> must be a direct reference to the actual execution namespace (available as
> ``inspect.currentframe().f_locals`` in implementations that provide access
> to frame objects).
>
> Variable assignments during subsequent code execution in the same scope
> must change the contents of the returned mapping, and changes to the
> returned mapping must change the values bound to local variable names in
> the execution environment.
>
> The mapping returned by ``locals()`` will *not* be used as the actual class
> namespace underlying the defined class (the class creation process will
> copy the contents to a fresh dictionary that is only accessible by going
> through the class machinery).
>
> For nested classes defined inside a function, any nonlocal cells
> referenced from the class scope are *not* included in the ``locals()``
> mapping.
>
> The semantics at class scope are required to be the same in both tracing
> mode (if provided by the implementation) and in regular operation.
>
> To capture this expectation as part of the language specification, the
> following two paragraphs will be added to the documentation for
> ``locals()``:
>
>     When using ``exec()`` or ``eval()`` with separate local and global
>     namespaces, [this function] returns the given local namespace.
>
>     At class scope, it returns the namespace that will be passed to the
>     metaclass constructor.
>
> This part of the proposal does not require any changes to the reference
> implementation - it is standardisation of the current behaviour.
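
The class-scope behaviour described above can be observed with a small sketch like the following. It is only an illustration: ``RecordingMeta``, ``captured`` and ``Example`` are hypothetical names, and the metaclass exists purely to keep a reference to the namespace that ``__prepare__`` hands out.

```python
captured = []  # hypothetical helper list, used only to observe the namespace


class RecordingMeta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        namespace = {}
        captured.append(namespace)
        return namespace


class Example(metaclass=RecordingMeta):
    attr = 1
    locals()["added_via_locals"] = 2   # write through the returned mapping
    body_namespace = locals()          # keep a reference for inspection


prepared = captured[0]
same_namespace = Example.body_namespace is prepared
added = Example.added_via_locals

# The class keeps a copy of the namespace, so later changes to the
# prepared mapping do not show up as class attributes.
prepared["late"] = 3
late_visible = hasattr(Example, "late")

print(same_namespace, added, late_visible)
```

The last check reflects the point that the mapping returned by ``locals()`` is not the namespace that ends up underlying the finished class.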
>
>
> Function scope
> --------------
>
> At function scope, interpreter implementations are granted significant
> freedom to optimise local variable access, and hence are NOT required to
> permit arbitrary modification of local and nonlocal variable bindings
> through the mapping returned from ``locals()``.
>
> Historically, this leniency has been described in the language
> specification with the words "The contents of this dictionary should not
> be modified; changes may not affect the values of local and free variables
> used by the interpreter."
>
> This PEP proposes to change that text to instead say:
>
>     At function scope (including for generators and coroutines), [this
>     function] returns a dynamic snapshot of the function's local variables
>     and any nonlocal cell references. In this case, changes made via the
>     snapshot are *not* written back to the corresponding local variables
>     or nonlocal cell references, and any such changes to the snapshot will
>     be overwritten if the snapshot is subsequently refreshed (e.g. by
>     another call to ``locals()``).
>
>     CPython implementation detail: the dynamic snapshot for the currently
>     executing frame will be implicitly refreshed before each call to the
>     trace function when a trace function is active.
>
> This part of the proposal *does* require changes to the CPython reference
> implementation, as while it accurately describes the behaviour in regular
> operation, the "write back" strategy currently used to support namespace
> changes from trace functions doesn't comply with it (and also causes the
> quirky behavioural problems mentioned in the Rationale).
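
As a rough illustration of the function-scope wording above, the sketch below checks two things in regular operation: the snapshot includes a nonlocal cell reference, and writing to the snapshot does not rebind the local variable. The function and variable names are invented for this example.

```python
def outer():
    y = 10  # becomes a nonlocal cell because inner() references it

    def inner():
        x = 1
        snapshot = locals()
        has_cell = "y" in snapshot   # the nonlocal cell reference is included
        snapshot["x"] = 99           # changes the snapshot only
        return x, y, has_cell

    return inner()


result = outer()
print(result)
```

The sketch deliberately avoids comparing two ``locals()`` calls for identity, since that is exactly the detail under discussion in this thread.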
>
>
> CPython Implementation Changes
> ==============================
>
> The current cause of CPython's tracing mode quirks (both the side effects
> from simply installing a tracing function and the fact that writing values
> back to function locals only works for the specific function being traced)
> is the way that locals mutation support for trace hooks is currently
> implemented: the ``PyFrame_LocalsToFast`` function.
>
> When a trace function is installed, CPython currently does the following
> for function frames (those where the code object uses "fast locals"
> semantics):
>
> 1. Calls ``PyFrame_FastToLocals`` to update the dynamic snapshot
> 2. Calls the trace hook (with tracing of the hook itself disabled)
> 3. Calls ``PyFrame_LocalsToFast`` to capture any changes made to the
>    dynamic snapshot
>
> This approach is problematic for a few different reasons:
>
> * Even if the trace function doesn't mutate the snapshot, the final step
>   resets any cell references back to the state they were in before the
>   trace function was called (this is the root cause of the bug report in
>   [1]_)
> * If the trace function *does* mutate the snapshot, but then does something
>   that causes the snapshot to be refreshed, those changes are lost (this is
>   one aspect of the bug report in [3]_)
> * If the trace function attempts to mutate the local variables of a frame
>   other than the one being traced (e.g. ``frame.f_back.f_locals``), those
>   changes will almost certainly be lost (this is another aspect of the bug
>   report in [3]_)
> * If a ``locals()`` reference is passed to another function, and *that*
>   function mutates the snapshot namespace, then those changes *may* be
>   written back to the execution frame *if* a trace hook is installed
>
> The proposed resolution to this problem is to take advantage of the fact
> that whereas functions typically access their *own* namespace using the
> language defined ``locals()`` builtin, trace functions necessarily use the
> implementation dependent ``frame.f_locals`` interface, as a frame reference
> is what gets passed to hook implementations.
>
> Instead of being a direct reference to the dynamic snapshot returned by
> ``locals()``, ``frame.f_locals`` will be updated to instead return a
> dedicated proxy type (implemented as a private subclass of the existing
> ``types.MappingProxyType``) that has two internal attributes not exposed as
> part of either the Python or public C API:
>
> * *mapping*: the dynamic snapshot that is returned by the ``locals()``
>   builtin
> * *frame*: the underlying frame that the snapshot is for
>
> ``__setitem__`` and ``__delitem__`` operations on the proxy will affect
> not only the dynamic snapshot, but *also* the corresponding fast local or
> cell reference on the underlying frame.
>
> The ``locals()`` builtin will be made aware of this proxy type, and
> continue to return a reference to the dynamic snapshot rather than to the
> write-through proxy.
>
> At the C API layer, ``PyEval_GetLocals()`` will implement the same
> semantics as the Python level ``locals()`` builtin, and a new
> ``PyFrame_GetPyLocals(frame)`` accessor API will be provided to allow the
> function level proxy bypass logic to be encapsulated entirely inside the
> frame implementation.
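
The split between a plain snapshot and a write-through proxy can be modelled in a few lines of pure Python. This is a toy model under stated assumptions, not the CPython implementation: ``ToyFrame``, ``fast_locals``, ``refresh_snapshot`` and ``ToyWriteThroughProxy`` are all hypothetical names, and the real proxy is described above as a private subclass of ``types.MappingProxyType`` working against the frame's actual storage.

```python
class ToyFrame:
    """Hypothetical stand-in for a frame; `fast_locals` models its real storage."""

    def __init__(self, **fast_locals):
        self.fast_locals = dict(fast_locals)
        self.snapshot = {}

    def refresh_snapshot(self):
        # Rough analogue of refreshing the dynamic snapshot from real state.
        self.snapshot.clear()
        self.snapshot.update(self.fast_locals)
        return self.snapshot


class ToyWriteThroughProxy:
    """Toy proxy holding the two pieces described above: mapping and frame."""

    def __init__(self, frame):
        self._frame = frame
        self._mapping = frame.refresh_snapshot()

    def __getitem__(self, key):
        return self._mapping[key]

    def __setitem__(self, key, value):
        self._mapping[key] = value            # update the snapshot...
        self._frame.fast_locals[key] = value  # ...and the real storage

    def __delitem__(self, key):
        del self._mapping[key]
        del self._frame.fast_locals[key]


frame = ToyFrame(x=1)

snapshot = frame.refresh_snapshot()
snapshot["x"] = 2                       # plain snapshot write
after_snapshot_write = frame.fast_locals["x"]

proxy = ToyWriteThroughProxy(frame)     # refreshes, discarding the write above
proxy["x"] = 3                          # write-through
after_proxy_write = frame.fast_locals["x"]

print(after_snapshot_write, after_proxy_write, snapshot["x"])
```

In this model, a write to the snapshot never reaches the frame's storage, while a write through the proxy updates both, which is the distinction the proposal draws between ``locals()`` and ``frame.f_locals``.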
>
> The C level equivalent of accessing ``pyframe.f_locals`` in Python will be
> a new ``PyFrame_GetLocalsAttr(frame)`` API. Like the Python level
> descriptor, the new API will implicitly refresh the dynamic snapshot at
> function scope before returning a reference to the write-through proxy.
>
> The ``PyFrame_LocalsToFast()`` function will be changed to always emit
> ``RuntimeError``, explaining that it is no longer a supported operation,
> and affected code should be updated to rely on the write-through tracing
> mode proxy instead.
>
>
> Design Discussion
> =================
>
> Ensuring ``locals()`` returns a shared snapshot at function scope
> -----------------------------------------------------------------
>
> The ``locals()`` builtin is a required part of the language, and in the
> reference implementation it has historically returned a mutable mapping
> with the following characteristics:
>
> * each call to ``locals()`` returns the *same* mapping
> * for namespaces where ``locals()`` returns a reference to something other
>   than the actual local execution namespace, each call to ``locals()``
>   updates the mapping with the current state of the local variables and
>   any referenced nonlocal cells
> * changes to the returned mapping *usually* aren't written back to the
>   local variable bindings or the nonlocal cell references, but write backs
>   can be triggered by doing one of the following:
>
>   * installing a Python level trace hook (write backs then happen whenever
>     the trace hook is called)
>   * running a function level wildcard import (requires bytecode
>     injection in Py3)
>   * running an ``exec`` statement in the function's scope (Py2 only, since
>     ``exec`` became an ordinary builtin in Python 3)
>
> The proposal in this PEP aims to retain the first two properties (to
> maintain backwards compatibility with as much code as possible) while
> ensuring that simply installing a trace hook can't enable rebinding of
> function locals via the ``locals()`` builtin (whereas enabling rebinding
> via ``frame.f_locals`` inside the tracehook implementation is fully
> intended).
>
>
> Keeping ``locals()`` as a dynamic snapshot at function scope
> ------------------------------------------------------------
>
> It would theoretically be possible to change the semantics of the
> ``locals()`` builtin to return the write-through proxy at function scope,
> rather than continuing to return a dynamic snapshot.
>
> This PEP doesn't (and won't) propose this as it's a backwards incompatible
> change in practice, even though code that relies on the current behaviour
> is technically operating in an undefined area of the language
> specification.
>
> Consider the following code snippet::
>
>     def example():
>         x = 1
>         locals()["x"] = 2
>         print(x)
>
> Even with a trace hook installed, that function will consistently print
> ``1`` on the current reference interpreter implementation::
>
>     >>> example()
>     1
>     >>> import sys
>     >>> def basic_hook(*args):
>     ...     return basic_hook
>     ...
>     >>> sys.settrace(basic_hook)
>     >>> example()
>     1
>
> Similarly, ``locals()`` can be passed to the ``exec()`` and ``eval()``
> builtins at function scope without risking unexpected rebinding of local
> variables.
>
> Provoking the reference interpreter into incorrectly mutating the local
> variable state requires a more complex setup where a nested function
> closes over a variable being rebound in the outer function, and due to the
> use of either threads, generators, or coroutines, it's possible for a
> trace function to start running for the nested function before the
> rebinding operation in the outer function, but finish running after the
> rebinding operation has taken place (in which case the rebinding will be
> reverted, which is the bug reported in [1]_).
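
The remark above about passing ``locals()`` to ``exec()`` at function scope can be checked with a short sketch. The names ``compute`` and ``scratch`` are made up for illustration; the point is that code run against the snapshot can read and write the mapping without rebinding the function's real local variable.

```python
def compute():
    x = 1
    scratch = locals()   # snapshot of the function's locals
    # The executed code rebinds "x" and adds "y" in the mapping only.
    exec("x = 2\ny = x + 1", globals(), scratch)
    return x, scratch["y"]


result = compute()
print(result)
```

The real ``x`` keeps its original value, while the value computed inside ``exec()`` is still available through the mapping that was passed in.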
>
> In addition to preserving the de facto semantics which have been in place
> since PEP 227 introduced nested scopes in Python 2.1, the other benefit of
> restricting the write-through proxy support to the implementation-defined
> frame object API is that it means that only interpreter implementations
> which emulate the full frame API need to offer the write-through
> capability at all, and that JIT-compiled implementations only need to
> enable it when a frame introspection API is invoked, or a trace hook is
> installed, not whenever ``locals()`` is accessed at function scope.
>
>
> What happens with the default args for ``eval()`` and ``exec()``?
> -----------------------------------------------------------------
>
> These are formally defined as inheriting ``globals()`` and ``locals()``
> from the calling scope by default.
>
> There isn't any need for the PEP to change these defaults, so it doesn't.
>
>
> Changing the frame API semantics in regular operation
> -----------------------------------------------------
>
> Earlier versions of this PEP proposed having the semantics of the frame
> ``f_locals`` attribute depend on whether or not a tracing hook was
> currently installed - only providing the write-through proxy behaviour
> when a tracing hook was active, and otherwise behaving the same as the
> ``locals()`` builtin.
>
> That was adopted as the original design proposal for a couple of key
> reasons, one pragmatic and one more philosophical:
>
> * Object allocations and method wrappers aren't free, and tracing functions
>   aren't the only operations that access frame locals from outside the
>   function. Restricting the changes to tracing mode meant that the
>   additional memory and execution time overhead of these changes would be
>   as close to zero in regular operation as we can possibly make them.
> * "Don't change what isn't broken": the current tracing mode problems are
>   caused by a requirement that's specific to tracing mode (support for
>   external rebinding of function local variable references), so it made
>   sense to also restrict any related fixes to tracing mode
>
> However, actually attempting to implement and document that dynamic
> approach highlighted the fact that it makes for a really subtle runtime
> state dependent behaviour distinction in how ``frame.f_locals`` works, and
> creates several new edge cases around how ``f_locals`` behaves as trace
> functions are added and removed.
>
> Accordingly, the design was switched to the current one, where
> ``frame.f_locals`` is always a write-through proxy, and ``locals()`` is
> always a dynamic snapshot, which is both simpler to implement and easier
> to explain.
>
> Regardless of how the CPython reference implementation chooses to handle
> this, optimising compilers and interpreters also remain free to impose
> additional restrictions on debuggers, by making local variable mutation
> through frame objects an opt-in behaviour that may disable some
> optimisations (just as the emulation of CPython's frame API is already an
> opt-in flag in some Python implementations).
>
>
> Historical semantics at function scope
> --------------------------------------
>
> The current semantics of mutating ``locals()`` and ``frame.f_locals`` in
> CPython are rather quirky due to historical implementation details:
>
> * actual execution uses the fast locals array for local variable bindings
>   and cell references for nonlocal variables
> * there's a ``PyFrame_FastToLocals`` operation that populates the frame's
>   ``f_locals`` attribute based on the current state of the fast locals
>   array and any referenced cells. This exists for three reasons:
>
>   * allowing trace functions to read the state of local variables
>   * allowing traceback processors to read the state of local variables
>   * allowing ``locals()`` to read the state of local variables
> * a direct reference to ``frame.f_locals`` is returned from ``locals()``,
>   so if you hand out multiple concurrent references, then all those
>   references will be to the exact same dictionary
> * the two common calls to the reverse operation, ``PyFrame_LocalsToFast``,
>   were removed in the migration to Python 3: ``exec`` is no longer a
>   statement (and hence can no longer affect function local namespaces),
>   and the compiler now disallows the use of ``from module import *``
>   operations at function scope
> * however, two obscure calling paths remain: ``PyFrame_LocalsToFast`` is
>   called as part of returning from a trace function (which allows
>   debuggers to make changes to the local variable state), and you can also
>   still inject the ``IMPORT_STAR`` opcode when creating a function
>   directly from a code object rather than via the compiler
>
> This proposal deliberately *doesn't* formalise these semantics as is,
> since they only make sense in terms of the historical evolution of the
> language and the reference implementation, rather than being deliberately
> designed.
>
>
> Implementation
> ==============
>
> The reference implementation update is in development as a draft pull
> request on GitHub ([6]_).
>
>
> Acknowledgements
> ================
>
> Thanks to Nathaniel J. Smith for proposing the write-through proxy idea in
> [1]_ and pointing out some critical design flaws in earlier iterations of
> the PEP that attempted to avoid introducing such a proxy.
>
>
> References
> ==========
>
> .. [1] Broken local variable assignment given threads + trace hook +
>    closure
>    (https://bugs.python.org/issue30744)
>
> .. [2] Clarify the required behaviour of ``locals()``
>    (https://bugs.python.org/issue17960)
>
> .. [3] Updating function local variables from pdb is unreliable
>    (https://bugs.python.org/issue9633)
>
> .. [4] CPython's Python API for installing trace hooks
>    (https://docs.python.org/dev/library/sys.html#sys.settrace)
>
> .. [5] CPython's C API for installing trace hooks
>    (https://docs.python.org/3/c-api/init.html#c.PyEval_SetTrace)
>
> .. [6] PEP 558 reference implementation
>    (https://github.com/python/cpython/pull/3640/files)
>
>
> Copyright
> =========
>
> This document has been placed in the public domain.
>
>
> --
> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
> _______________________________________________
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/guido%40python.org
>

--
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him/his **(why is my pronoun here?)*
<http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>