On Sat, Aug 21, 2021 at 8:52 PM Nick Coghlan <ncogh...@gmail.com> wrote:

>
> On Sun, 22 Aug 2021, 10:47 am Guido van Rossum, <gu...@python.org> wrote:
>
>>
>> Everything here is about locals() and f_locals in *function scope*. (I
>> use f_locals to refer to the f_locals field of frame objects as seen from
>> Python code.) And in particular, it is about what I'll call "extra
>> variables": the current CPython feature that you can add *new* variables to
>> f_locals that don't exist in the frame, for example:
>>
>> def foo():
>>     x = 1
>>     locals()["y"] = 2  # or sys._getframe()["y"] = 2
>>
>> My first reaction was to propose to drop this feature, but I realize it's
>> kind of important for debuggers to be able to execute arbitrary code in
>> function code -- assignments to locals should affect the frame, but it
>> should also be possible to create new variables (e.g. temporaries). So I
>> agree we should keep this.
>>
>
> I actually tried taking this feature out in one of the PEP 558 drafts, but
> actually doing so breaks the pdb test suite.
>

I wonder if we should reconsider this, given that so much of the complexity
of the competing PEPs is due to this issue of "extra" variables. We can fix
pdb, and other debuggers probably will be happy to make some changes
(debuggers often are closely tied to the implementation anyway -- e.g.
PyDev currently seems broken with 3.11). One way to fix it would be to have
the debugger use a mapping implementation that acts as a proxy for
f_locals, but stores extra variables in its own storage. Maybe this just
moves the problem, but I feel support for extra variables will always be a
bit of a wart, and many uses of f_locals don't need them.

Another thing I feel we should at least have a good second look at is the
"locals() returns a snapshot" behavior. This is the same in both PEPs but
it is inconsistent with module and class scopes, as well as different from
3.10.
I wonder if we're valuing "does it return the same type" too much over
"does it exhibit the same high-level (conceptual) behavior" here. Yes, a
snapshot is a dict, just like what you get from locals() in class and
module scopes. But no, that dict is not an alias for the actual contents of
the scope. If we made locals() return the same proxy that f_locals gives,
it's no longer a dict, but it has the same *conceptual* behavior (maybe:
"meaning") as for the other types of scopes.


> So apparently the key difference of opinion between Mark and Nick is about
>> f_locals, and what to do with extras. In Nick's proposal when you reference
>> f.f_locals twice in a row (for the same frame object f), you get the same
>> proxy object, whereas in Mark's proposal you get a different object each
>> time, but it doesn't matter, because the proxy has no state other than a
>> reference to the frame.
>>
>
> If PEP 558 is still giving that impression, I need to fix the wording -
> the proxy objects are ephemeral in both PEPs (the 558 text is slightly
> behind the implementation on that point, as the fast refs mapping is now
> stored on the frame object, so it only needs to be built once)
>

Ah, I didn't actually find a clear indication one way or another.  If your
proposal *also* makes f.f_locals return a new object on each use, the
difference between the two proposals really is entirely in the C API, as
you bring up below.

>
> In Mark's proposal, if you assign a value to an extra variable, it gets
>> stored in a hidden dict field on the frame, and when you read the proxy,
>> the contents of that hidden dict field gets included. This hidden dict
>> lazily created on the first store to an extra variable. (Mark shows
>> pseudo-code to clarify this; the hidden dict is stored as _extra_locals on
>> the frame.)
>>
>
> PEP 558 works essentially the same way, the difference is that it uses the
> existing locals dict storage rather than adding new storage just for
> optimised frames.
>

Oh, you're right. So then this doesn't really matter -- if there's no other
use for the C-level field f_locals in a function scope, then we might as
well use that to store the extras (assuming its NULL-ness isn't used as a
flag for some other purpose). To be clear, I *think* that for a function
scope where the f_locals property has never been used and locals() has
never been called, the C-level f_locals field is NULL.


> In Nick's proposal, there's a cache on the frame that stores both the
>> extras and the proper variables. This cache can get out of sync with the
>> contents of the proper variables when some bytecode is executed (for
>> performance reasons we don't want the bytecode to keep the cache up to date
>> on every store), so there's an operation to sync the frame cache
>> (sync_frame_cache(), it's not defined in which namespace this exists -- is
>> it a builtin or in sys?).
>>
>
> It's an extra method on the proxy objects. You only need it if you keep an
> old proxy object around - if you always retrieve a new proxy object after
> executing Python code, that proxy will refresh the cache when it needs to.
>

Ah, okay. We're getting closer to the heart of the matter -- the need for
this to exist feels like a pretty serious wart in your proposal.


> Frankly the description in Nick's PEP is hard to follow -- I am not 100%
>> sure what is meant by "the dynamic snapshot", and it's not quite clear
>> whether proper variables are copied into the cache (and if so, why).
>>
>
> Aye, Mark was a bit quicker with his PEP than I anticipated, so I've
> incorporated the implementation improvements arising from his last round of
> comments, but the PEP text hasn't been updated yet.
>

No problem, I think it is slowly getting clearer to me. (FWIW I really
liked the pseudo code Mark gave for the implementation, maybe you could add
something similar to your PEP).

Personally, I find Mark's proposed semantics for f_locals simpler --
>> there's no cache, only storage for extras, so there's nothing that can get
>> out of sync.
>>
>
> The wording in PEP 667 undersells the cost of that simplification:
>
> "Code that uses PyEval_GetLocals() will continue to operate safely, but
> will need to be changed to use PyEval_Locals() to restore functionality."
>
>
> Code that uses PyEval_GetLocals() will NOT continue to operate safely
> under PEP 667: all such code will raise an exception at runtime, and need
> to be rewritten to use a new API with different refcounting semantics.
>

Yeah, I did a double take too when I read what Mark wrote. He uses "safe"
in a very technical sense, meaning that you get a Python exception but not
a crash, and no leak or writing freed memory. And it's true, any caller to
PyEval_GetLocals() should check for errors, and there are several error
conditions that may occur. (The docs are incomplete, they say "returns NULL
if no frame is executing" but they fail to mention that it sets an
exception in that case and in other cases.)

But PyEval_Locals() is in the Stable ABI (though I have no idea why), so we
have essentially two options: keep it working, or make it return an error.
We can't delete it. And it returns a borrowed reference, which makes it
problematic to let it return a "f_locals proxy object" since those proxies
are not cached on the frame. So I think Mark's solution is viable, even
though his description is understated.


> That's essentially all code that accesses the frame locals from C, since
> we don't offer supported APIs for that other than PyEval_GetLocals()
> (directly accessing the f_locals field on the frame object is only
> "supported" in a very loose sense of the word, although PEP 558 mostly
> keeps that working, too)
>
> This means the real key difference between the two PEPs is that Mark is
> proposing a gratuitous compatibility break for PyEval_GetLocals() that also
> means that the algorithmic complexity characteristics of the proxy
> implementation will be completely off from those of a regular dict (e.g.
> len(proxy) will be O(n) in the number of variables defined on the frame
> rather than being O(1) after the proxy's initial cache update the way it is
> in PEP 558)
>

Maybe I care about the change in behavior to always return an error. But I
definitely don't care about the complexity characteristics of either
PyEval_GetLocals() or the f_locals proxy.


> If Mark's claim that PyEval_GetLocals() could not be fixed was true then I
> would be more sympathetic to his proposal, but I know it isn't true,
> because it still works fine in the PEP 558 implementation (it even
> immediately sees changes made via proxies, and proxies see changes to extra
> variables). The only truly unfixable public API is PyFrame_LocalsToFast().
>

You "fixed" it at a hefty price -- leaving open the possibility of having a
stale f_locals proxy object and the need to call sync_frame_cache() if it
could be stale. (IIRC you have some language about "if a fresh proxy is
always used you will never see out of sync data" but clearly the
possibility exists to end up with a stale proxy.)


> On the code complexity front, while the cache management in PEP 558 does
> incur a bit of extra complexity, it also offers a lot of code
> simplification as many mutable mapping API operations can be delegated to
> the cache instead of needing to be implemented directly against the fast
> locals array (e.g. the keys(), values() and items() views all interact with
> the cache rather than the underlying frame storage, so the implementation
> doesn't need proxy-specific types for those). For O(n) operations, the
> cache is refreshed every time, while for less than O(n) operations, the
> cache is refreshed if it is the first time that particular proxy instance
> has needed it.
>

I don't particularly care about code complexity of operations on the proxy
(even if the proxy was also used for locals() I wouldn't). The locals of a
function will never enter O(...) territory, people just don't write (or
even generate!) code with thousands of locals. I also don't particularly
care about the cost of writing the implementation, it's really not that
hard, and it won't be that much code. I do care about clear semantics that
are easy to explain, and Mark's version is extremely clear.


> While API clients *can* delve into the details of exactly when and how the
> cache gets refreshed, they can also adopt the simple principle of "if in
> doubt, request a new locals reference" and let the interpreter worry about
> the details.
>

I think it's not that easy. I'd much rather be able to pass a mapping to
some other piece of code (which doesn't need to know that it's a
frame-locals proxy, only that it may change over time) than having to pass
a frame with the instructions "if you need something from the frame-locals,
use frm.f_locals["varname"].

But... I also care about backwards compatibility, and I have a crazy idea
for making PyEval_GetLocals() work in a useful manner without compromising
the behavior of the f_locals proxy:

- Let's start your idea of using the C-level f_locals field to store the
"extra" variables.
- The Python-level f_locals proxy looks in the actual frame "fast" locals
and cells, and uses the C-level f_locals field only for extras
- However, PyEval_GetLocals() doesn't return the proxy.
- What PyEval_GetLocals() does: it calls PyEval_FastToLocals(), which makes
a pass over the frame locals and adds them to the C-level f_locals field;
then it returns that field.
- So the borrowed reference is owned by the frame, which is the same as
currently.
- The proxy only uses the f_locals field for extra variables. If a variable
is deleted in the frame but exists in the f_locals field, the proxy reports
it as deleted. (This requires some care but can be done, since we have the
mapping from proper variable names to frame locals or cells.)
- We'll still deprecate PyEval_GetLocals() and PyEval_FastToLocals(), but
unless the user turns the deprecation warning into an error, they will work
for another few releases. Eventually we'll make PyEval_GetLocals() always
return an error (similar to Mark's proposal), since it's in the stable ABI.
- For PyEval_LocalsToFast() I don't care too much whether we keep it (per
Mark's proposal) or make it return an error (per yours).

PS. The mapping from varname to position should be on the code object, not
on the frame. This is how Mark does it (though his implementation would
need to be extended to take cells into account).

-- 
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)*
<http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/LCHUOMW6SEB3CQ6ESIJ2ACKG6ZBAMN22/
Code of Conduct: http://python.org/psf/codeofconduct/

Reply via email to