[Python-Dev] Re: Comments on PEP 558

Mark Shannon Wed, 28 Jul 2021 03:18:58 -0700


On 28/07/2021 1:03 am, Nick Coghlan wrote:

On Wed, 28 Jul 2021, 1:50 am Mark Shannon, <m...@hotpy.org<mailto:m...@hotpy.org>> wrote:


    Hi Nick,

    On 27/07/2021 2:29 pm, Nick Coghlan wrote:
     >
     >
     > The reference documentation should be precise as well, since that is
     > what other implementations will be following.
     >
     > What semantics do you feel are left unspecified?
     >
     >> The documentation needs to explain the behavior to the majority of
     >> users, but the PEP should be worded so that it can be implemented
     >> correctly from just reading the PEP.
     >>
     >> There is no reason to remove the sugested documentation changes, but
     >> they should be secondary to the more formal specification.
     >
     > I don't know what you want to see on this front.
     >
     > The current PEP seems complete to me, except in relation to the
     > specifics of the proxy behaviour, which it deliberately leaves
     > underspecified (mostly because my current implementation sucks in a
     > few areas, and I really don't want to enshrine those limitations in
     > the language spec).

    All the above were mostly just suggestions, the format of the PEP is up
    to you.

    But please, do not underspecify anything. I need to maintain this,
    as do
    the PyPy and MicroPython developers. We need this to be precise.


You still haven't said what you consider underspecified.


You said it was underspecified, not me. Quoting you from a few lines up
"which it deliberately leaves underspecified"

Perhaps read the implementation and tell me which parts of what thetests are covering you want to see replicated in the PEP text?



     >
     >> Resolving the issues with tracing mode behaviour
     >> ------------------------------------------------
     >>
     >> According to this section the `f_locals` object
    (_PyFastLocalsProxy_Type
     >> in C) will have *two* fields.
     >> It should only have *one*, the pointer to the underlying frame
    object.
     >
     > That's a good point - fast_refs doesn't reference the frame
    object, so
     > it's safe to store it on the frame object and reference it from the
     > individual proxies.
     >
     > Switching to that approach also has the advantage that every proxy
     > after the first is much cheaper, since they don't need to build the
     > fast refs map again.
     >
     >> In order to maximize backwards compatibility and, more importantly,
     >> avoid the synchronization issues that lead to obscure bugs, all the
     >> state of the `f_locals` object should be kept on the frame.
     >>
     >> Any changes to `f_locals` should be instantly visible to any
    thread both
     >> through other `f_locals` objects (for the same frame) and the
    underlying
     >> local variable (if it exists).
     >
     > [snip proxy method implementation sketch] >
     > This gets the gist of how the implementation works, but it isn't
    quite
     > that simple for a few reasons:
     >
     > 1. The proxy needs to offer the same algorithmic complexity as dict
     > APIs, so doing O(n) linear searches of C arrays and Python tuples
     > inside O(1) lookup operations isn't OK (hence the fastrefs mapping to
     > allow for O(1) resolution of variable names to cell references and
     > local variable array offsets)

    Linear searches are faster for small arrays, and there is nothing
    stopping you adding a helper map of names->indexes for large functions.
    Provided the mapping from name to local index is O(1), then the whole
    operation is O(1).

That helper map already exists in the fast refs mapping. Skipping it forsmall functions would be a future optimisation opportunity that doesn'tseem necessary in the initial implementation.

fastref is a mapping from names to values. I am suggesting a mappingfrom names to indexes in the locals array. That is not the same thing.The mapping from names to indexes is fixed at compile time and cannotget out of sync.

    You really don't need much extra machinery to maintain O(1) behavior
    (which no one cares about, they just care about overall performance).
Yes, the only extra machinery needed for working with individual keys isthe fast refs mapping.
It's other aspects of the mapping API that benefit from the continuingpresence of the frame cache (which is maintained on the frame, just asit is today, NOT separately in each proxy).
If you can give me a reliable O(1) implementation of"len(frame.f_locals)" that doesn't rely on up to date cache of:
* which local and cell variable names are currently bound to values
* which extra variables have been added via either proxies orPyEval_GetLocals()

1. You don't need a O(1) implementation of len(frame.f_locals); no onecares.

2. Your implementation isn't O(1) either, because you are ignoring thecosts of updating your cache.




     > 2. The proxy needs to deal with *SIX* kinds of possible key
    lookup, not two:
     >
     > * unbound local variable (numeric index in fast refs map, NULL in
    fast
     > locals slot)
     > * bound local variable (numeric index in fast refs map, object
    pointer
     > in fast locals slot)
     > * unbound cell variable (cell reference in fast refs map, cell
    contains NULL)
     > * bound cell variable (cell reference in fast refs map, cell contains
     > object pointer)
     > * not yet converted cell variables (numeric index in fast refs map,
     > possibly NULL initial value in fast locals slot)
     > * extra keys injected via a fast locals proxy or PyEval_GetLocals()

    There could be a million different cases internally, but that doesn't
    need to reflected in the API.

And it isn't, *except* in the form of potential frame cachedesynchronisation.


So make sure it is in sync all the time, by making the proxy stateless.

Execution of the frame code can move local variables between the boundand unbound states, and cell variables between the unconverted, unboundand bound states. Cell variable unbinding doesn't even require that codein the proxy execute - it can happen in other frames.


Yes I know.
Can we take it as read that I know how the VM works?

Actions through proxy objects will implicitly keep the cache up to date,but mutating the result of PyEval_GetLocals() for keys that correspondto local or cell variables will desynchronise it.


Make the proxy stateless and there can be no synchronization issues.

So the frame proxy implementation of len() has to choose between:
1. checking the binding state of every cell and local variable everytime it is called (making length reporting an O(N) operation); or2. keeping a cache on the frame of currently live name bindings andreporting the length of that (and offer ways to ensure the cache is upto date)
Since it also solves the problem of what to return fromPyEval_GetLocals() without breaking backwards compatibility, I went withoption 2.
     >
     > The combination of those 3 requirements is what lead me down the path
     > of using `f_locals` as a stateful cache that could be used both
    as the
     > return value for PyEval_GetLocals(), and to satisfy those mapping API
     > requirements that couldn't easily be met via the fast_refs mapping.

    Remove the state from the proxy and (almost) all of your problems go
    away.
The only state on the proxy is the fast refs mapping, and that's goingto move to the frame as you suggested. The frame cache is already on theframe so PyEval_GetLocals() can return a borrowed reference to it.
As noted above, I think trying to sketch out an O(1) implementation ofthe proxy's len() support that *doesn't* rely on a cache will providethe clearest explanation for why I ended up keeping the frame cachearound, and just changed the way I described it (i.e. it's mostlyreferred to as a cache now, and only occasionally still referred to as adynamic snapshot).
     >
     >
     > The PEP explains this: Python code knows structurally how locals() is
     > going to behave, but C extension code has no idea. Therefore, C
     > extension code needs to be able to request the behaviour it
    needs, and
     > let the implementation decide how best to satisfy that.

    Why do authors of C code have no idea what is going on, but authors of
    Python code do? That seems like an odd assumption.
C API bindings can be called from anywhere, while actual Python code islexically scoped in the file where it is written.


But the author of the code knows what context they are making the call.

    The C API has plenty of functions for querying the state of the VM, if
    that is what is needed.
The only query API I am aware of that allows a C extension to find outif it is running in function scope is to get the frame object, then thecode object, then check if it is optimised.
PyLocals_GetKind() avoids assuming that frame and code objects areavailable for external introspection.
    What exactly are the requirements of C extension authors?
To know whether PyLocals_Get() returns a direct reference to the localsfor the running frame or not.
     >
     >
    
https://www.python.org/dev/peps/pep-0558/#proposing-several-additions-to-the-stable-c-api-abi
    
<https://www.python.org/dev/peps/pep-0558/#proposing-several-additions-to-the-stable-c-api-abi>
     > summarises the outcome of that previous discussion.

    This discussion seems to date from when the design was quite different,
    is it relevant any more?
What makes you say that? If you think that section doesn't align withthe current design, then there's a significant miscommunication somewhere.

The discussion is from over a year ago, and the PEP appears to havechanged substantially since then.



     >
     >>
     >> `PyEval_GetLocals()` should be equivalent to `locals()`.
     >> It will have to return a new reference. It cannot return a borrowed
     >> reference as it would be returning freed memory.
     >> There is nothing to borrow the reference from, for a function scope.
     >
     > Yes, which is why it can't do that: that API is already defined as
     > returning a borrowed reference, and trying to change that would be a
     > significant backwards compatibility break.

    If you return a borrowed reference, you must borrow it from somewhere.
    Where are you borrowing this reference from?
    If you are borrowing from the frame, then the frame must have a strong
    reference to the proxy, which creates a cycle.

I'm borrowing it from the same place it has always been borrowed from:the f_locals slot on the frame.


That holds the frame cache, NOT the proxy, so there is no cycle.

You are claiming that: There is no cycle AND that it is safe to returnto a borrowed reference to the proxy.


Let me explain why that cannot happen without leaking memory.

In order to return a borrowed reference to the proxy there must be astrong reference to it somewhere, otherwise you are returning freed memory.The proxy must keep the frame alive, in order to work, so the frame mustbe strongly reachable from the proxy.


So:
Either the proxy must be strongly reachable from the frame: a cycle.

Or there is a strong reference to the proxy from an object that is notreachable from the frame: a leak.

The second case is a leak because the proxy will not be freed once theframe is freed, even if the caller of PyEval_GetLocals() has no strongreferences to the proxy.


If PyEval_GetLocals() returns a new reference, then all is good.



     >
     >> `PyFrame_GetLocals(PyFrameObject *)` should be equivalent to
     >> `frame.f_locals`.
     >
     >> The PEP doesn't explicitly state why `PyLocals_GetKind()` is
    needed, but
     >> I believe it is avoid unnecessary copying?
     >> Why is creating an extra copy an issue? It takes a microsecond
    or so to
     >> create the copy.
     >
     > As described in the PEP, the query API is for the benefit of C
     > extensions that want to do something different when PyLocals_Get()
     > returns something other than a direct reference to the frame's local
     > namespace.
     >
     > For example, it may try to get hold of the frame object to retrieve a
     > read/write proxy from it, and throw a custom error if that isn't
     > possible (as an implementation may support PyLocals_Get() without
     > supporting the frame introspection APIs).
     >
     >> If a function that returns a copy of the local namespace is really
     >> needed, then why not offer that functionality directly?
     >> E.g. `PyFrame_GetLocalsCopy()` which is roughly:
     >>
     >>       PyObject *locals = PyEval_GetLocals();
     >>       if (current_scope_is_function())
     >>           return locals;
     >>       return PyDict_Copy(locals); // This leaks, need to decref
    locals
     >
     > The proposed API does offer that functionality directly:
     > PyLocals_GetCopy() and PyFrame_GetLocalsCopy(f).

    Why two functions, not one?

One is in the stable ABI and doesn't require access to frame objects,but only works for the running frame.

The other is in the CPython API and works for any frame, but requiresaccess to frame objects.



     >
     >> Reducing the runtime overhead of trace hooks
     >> --------------------------------------------
     >>
     >> I'm confused by this part.
     >> Since `_PyFrame_FastToLocals` and friends do not alter the
    logical state
     >> of the frame object (or anything else), they should no-ops.
     >> It is `PyFrame_GetLocals()` that creates the proxy object, surely.
     >
     > _PyFrame_FastToLocals() still refreshes the frame cache that is used
     > as the return value from PyEval_GetLocals(), the same way it always
     > has.
     >
     > The frame cache is implicitly kept in sync by operations on frame
     > proxy objects, but there are still two ways for it to get out of
    sync:
     > * code execution inside the frame (this updates the fast locals
    array,
     > not the key/value cache stored in the C level f_locals attribute)
     > * direct manipulation of the frame cache by callers of
     > PyEval_GetLocals (the frame will ignore any attempts to bind or
    unbind
     > variables that way, but it can still throw off proxy operations that
     > rely on the cache)

    Just remove the state in the proxy and nothing will ever be out of sync.


The frame cache is stored on the frame, not separately in each proxy.

If you were to eliminate it, the operations that need it would just needto create it anew every time instead.

As I have said repeatedly, and even gave code examples for (which youpruned for some reason), there is no need for state in the proxy and noneed to cache the values in the fast locals array.

If you think I am mistaken and that the state must exist, you'll need toexplain why that code examples I gave would not work.



     >
     > The reason the frame cache still exists in the first place is because
     > there are a number of mapping APIs where the easiest way to meet the
     > algorithmic complexity expectations of the dict API is to use an
     > actual dict:
     >
     > = Length checking =
     >
     > The fast_refs mapping (and the frame metadata in general), includes
     > all variables defined for the frame, regardless of whether they're
     > currently bound or not.
     > The correct length for the fast locals proxy mapping only
    includes the
     > *currently bound* fast local and cell variables, together with any
     > extra keys that have been added.
     > The locals proxy keeps this an O(1) operation by reporting the
    size of
     > the frame cache, so it doesn't have to check which names are
    currently
     > bound every time.

    It may be O(1), but it is incorrect. You state earlier that the cache
    might be out of date.
    It is easy to implement anything in O(1) if it is allowed to be wrong ;)

Yes, I know. It's a trade-off, as most of the time both C and Pythontrace functions will be able to force a cache refresh when they start(e.g. by calling PyLocals_GetView() or retrieving f_locals from thePython level frame object), and there won't be any Python code running,and nothing will be messing with PyEval_GetLocals(), so the cache won'tgo out of sync.


You are trading away correctness for an illusion of performance.
That is a very bad trade.

The alternative is to NOT rely on the cache and instead check the stateof the bindings every time len() gets called, and every time someflavour of iteration is needed.


There is no overhead for iteration, and no one cares about len() being O(1).

That's trivial to implement as well (the intrinsically O(N) operationslike copying already work that way), it just makes nominally O(1)operations unexpectedly O(N) in the number of local and cell variableson the frame.



     >
     > = Mapping comparisons =
     >
     > Mapping comparisons have an O(1) early exit if their lengths don't
     > match, before falling back to a full O(N) key value comparison. The
     > locals proxy takes advantage of that by delegating comparison
     > operations to the frame cache rather than reimplementing them.
     >
     > = Iteration =
     >
     > Rather than reimplementing forward and reverse iteration and the
     > keys(), values(), and items() methods, the mapping proxy just
     > delegates these operations to the frame cache.
     >
     > Operations on the proxy other than these ones either don't rely
    on the
     > cache being up to date at all (because they can use the fast_refs
     > mapping instead, implicitly updating any affected cache entries along
     > the way), or else implicitly refresh the cache before relying on it
     > (e.g. copying the proxy mapping works by refreshing the frame cache,
     > then copying the cache).
     >
     > I assume there will be ways to improve the implementation to make
     > explicit frame cache refreshes less necessary over time (hence the
     > caveat along those lines in the final paragraph of
     >
    
https://www.python.org/dev/peps/pep-0558/#continuing-to-support-storing-additional-data-on-optimised-frames
    
<https://www.python.org/dev/peps/pep-0558/#continuing-to-support-storing-additional-data-on-optimised-frames>
     > ), but the existence of the PyEval_GetLocals() API means that we're
     > highly unlikely to ever get rid of the frame cache entirely.

    That's why PyEval_GetLocals() needs to return a new reference.
    This is a chance to fix a fundamentally broken API, let's not waste the
    opportunity.

You can't just change the refcounting semantics of existing APIs likethat, it's a major compatibility break.

PyEval_GetLocals() is already broken, which is a legitimate reason tochange it even without a PEP.Given there is a PEP as well, a small backwards compatibility breakseems fine.


PyLocals_Get() is the new API that offers the new locals() behaviour.

Making `PyEval_GetLocals()` always raise and adding a new function wouldbe a valid approach as well.


Cheers,
Mark.
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/MGD3BY4KHWLSZNKOWEQN4TQCSKIR63FT/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: Comments on PEP 558

Reply via email to