Re: [Python-Dev] Idea: Dictionary references

Franklin? Lee Fri, 18 Dec 2015 14:46:16 -0800

On Fri, Dec 18, 2015 at 2:32 PM, Andrew Barnert via Python-Dev
<python-dev@python.org> wrote:

> (Also, either way, it seems more like a thread for -ideas than -dev...)

I said this early on in this thread!

Should I try to write up my idea as a single thing, instead of a bunch
of responses, and post it in -ideas?

Should I call them "parent scope" and "parent refcell"?

On Fri, Dec 18, 2015 at 7:56 AM, Steven D'Aprano <st...@pearwood.info> wrote:

> I'm not quite sure about this. In principle, every name lookup looks in
> four scopes, LEGB as you describe above:
>
> - locals
> - non-locals, a.k.a. enclosing or lexical scope(s)
> - globals (i.e. the module)
> - builtins
>
>
> although Python can (usually?) optimise away some of those lookups. The
> relationship of locals to enclosing scopes, and to globals in turn,
> involve actual nesting of indented blocks in Python, but that's not
> necessarily the case.

As I understand it, L vs. E vs. GB is decided at compile time.

That is, your exec example doesn't work for me in Python 3, because
every name's scope is fixed at compile time.

    x = 5
    def f():
        exec('x = 111')
        print(x)

    f()       # prints 5
    print(x)  # prints 5

This means that my idea only really works for GB lookups.
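
The compile-time decision can be seen on the code objects themselves,
without running anything. A minimal sketch follows; `f` mirrors the
example above and `g` is an extra function added only for contrast.
In CPython, `co_names` lists names looked up as globals/builtins and
`co_varnames` lists locals.

```python
x = 5

def f():
    exec('x = 111')  # the assignment happens in a mapping f never reads
    return x         # no assignment to x in f's body, so x is a global here

def g():
    x = 111          # an assignment in the body makes x local to g
    return x

# Inspect how the compiler classified the name "x" in each function.
f_is_global = 'x' in f.__code__.co_names and 'x' not in f.__code__.co_varnames
g_is_local = 'x' in g.__code__.co_varnames and 'x' not in g.__code__.co_names
print(f_is_global, g_is_local)  # True True
```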

> On Thu, Dec 17, 2015 at 09:30:24AM -0800, Andrew Barnert via Python-Dev wrote:

>> So, trying to generalize global vs. builtin to a general notion of
>> "nested scope" that isn't necessary for builtins and doesn't work for
>> anything else seems like overcomplicating things for no benefit.
>
> Well, putting aside the question of whether this is useful or not, and
> putting aside efficiency concerns, let's just imagine a hypothetical
> implementation where name lookups used ChainMaps instead of using
> separate LOAD_* lookups of special dicts. Then a function could set up a
> ChainMap:
>
> function.__scopes__ = ChainMap(locals, enclosing, globals, builtins)
>
> and a name lookup for (say) "x" would always be a simple:
>
> function.__scopes__["x"]
>
> Of course this would be harder to optimize, and hence probably slower,
> than the current arrangement,

This is where the ChainRefCell idea comes in.

If a ChainRefCell is empty, it asks its parent dicts for a value, in
order. If it finds one in parent n, it replaces parent n with a
refcell into parent n, and does the same for parents 0, 1, ..., n-1.
After that it never needs a hash lookup in those parents again, yet it
still notices if one of them later acquires the name. (Parents n+1 and
beyond are never asked for refcells, so we don't make unnecessary
refcells in `object` and `__builtin__`.)
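
The lookup rule above could be sketched in pure Python roughly as
follows. This is only an illustration under assumptions: `RefCell`,
`RefDict`, `getref` and the `_EMPTY` sentinel are hypothetical names
made up for the sketch, and a real version would live inside the C
dict implementation rather than in a dict subclass.

```python
_EMPTY = object()  # sentinel: "this cell currently has no value"


class RefCell:
    """Holds the current value of one key in one dict (hypothetical)."""

    def __init__(self, value=_EMPTY):
        self.value = value


class RefDict(dict):
    """Hypothetical dict that hands out RefCells and keeps them in sync.

    Only item assignment and deletion are tracked in this sketch;
    update(), pop(), etc. would need the same treatment.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._cells = {}

    def getref(self, key):
        # Create the cell on first request, even if the key is absent.
        if key not in self._cells:
            self._cells[key] = RefCell(self.get(key, _EMPTY))
        return self._cells[key]

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        if key in self._cells:
            self._cells[key].value = value

    def __delitem__(self, key):
        super().__delitem__(key)
        if key in self._cells:
            self._cells[key].value = _EMPTY


class ChainRefCell(RefCell):
    """A cell that falls back to its parents while it is empty."""

    def __init__(self, key, parents):
        super().__init__()
        self.key = key
        self.parents = list(parents)  # each entry: a RefDict or a RefCell

    def _has_value(self, parent):
        if isinstance(parent, RefCell):
            return parent.value is not _EMPTY  # no hash lookup needed
        return self.key in parent              # plain dict lookup

    def resolve(self):
        if self.value is not _EMPTY:
            return self.value
        for n, parent in enumerate(self.parents):
            if self._has_value(parent):
                break
        else:
            raise NameError(self.key)
        # Replace parents 0..n with refcells; parents after n stay dicts.
        for i in range(n + 1):
            if not isinstance(self.parents[i], RefCell):
                self.parents[i] = self.parents[i].getref(self.key)
        return self.parents[n].value


p0, p1, p2 = RefDict(), RefDict(x="from p1"), RefDict(x="from p2")
cell = ChainRefCell("x", [p0, p1, p2])
print(cell.resolve())  # from p1
p0["x"] = "from p0"    # p0 acquires the name later
print(cell.resolve())  # from p0
```

Note that p2 is never asked for a refcell here, which is the point of
stopping at parent n.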

Unfortunately, classes are more complicated than nested scopes.

1. We would skip the MRO if we defined classes as having only their
direct supers as parents. (Solution: Define classes as having all
supers as parents, and make non-recursive Refcell.resolve()
requests.) (Objects always have their class as a parent.)

2. Classes can replace their bases. (I have some ideas for this, but see #3.)

3. I get the impression that attribute lookups are already pretty optimized.

On Fri, Dec 18, 2015 at 2:32 PM, Andrew Barnert via Python-Dev
<python-dev@python.org> wrote:

> I think it kind of _has_ to optimize away, or at least tweak, some of those 
> things, rather than just acting as if globals and builtins were just two more 
> enclosing scopes. For example, global to builtins has to go through 
> globals()['__builtins__'], or act as if it does, or code that relies on, say, 
> the documented behavior of exec can be broken.

It would, or at least could, under my idea of treating __builtins__ as
a parent scope of globals() (though I'm not sure whether that would
hold for any other kind of nesting).

Each refcell in globals() will hold a reference either to __builtins__
(if it hasn't yet looked its name up there successfully) or to a
refcell in __builtins__ (once a lookup has succeeded). Since globals()
knows when globals()['__builtins__'] is modified, it can invalidate
all its refcells' parent cells (by making them hold references to the
new __builtins__).
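
A rough sketch of that invalidation step, again with made-up names
(`Cell`, `Builtins`, `GlobalCell`, `Globals`, `getref`) and with the
cells kept in a side dict rather than in a table parallel to the hash
table, so the swap below only walks the cells:

```python
_EMPTY = object()  # sentinel: "no value here"


class Cell:
    def __init__(self, value=_EMPTY):
        self.value = value


class Builtins(dict):
    """Hypothetical parent scope that hands out cells for its keys."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._cells = {}

    def getref(self, key):
        if key not in self._cells:
            self._cells[key] = Cell(self.get(key, _EMPTY))
        return self._cells[key]

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        if key in self._cells:
            self._cells[key].value = value


class GlobalCell(Cell):
    def __init__(self, key, builtins):
        super().__init__()
        self.key = key
        self.parent = builtins  # a Builtins dict, or a Cell inside one

    def resolve(self):
        if self.value is not _EMPTY:
            return self.value
        if isinstance(self.parent, Builtins):
            if self.key not in self.parent:
                raise NameError(self.key)
            # First successful lookup: remember the parent's cell.
            self.parent = self.parent.getref(self.key)
        if self.parent.value is _EMPTY:
            raise NameError(self.key)
        return self.parent.value


class Globals(dict):
    def __init__(self, builtins):
        super().__init__()
        self._cells = {}
        super().__setitem__("__builtins__", builtins)

    def getref(self, key):
        if key not in self._cells:
            cell = GlobalCell(key, self["__builtins__"])
            cell.value = self.get(key, _EMPTY)
            self._cells[key] = cell
        return self._cells[key]

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        if key in self._cells:
            self._cells[key].value = value
        if key == "__builtins__":
            # Invalidate: point every cell back at the new parent dict.
            for cell in self._cells.values():
                cell.parent = value


g = Globals(Builtins(len=len))
ref = g.getref("len")
print(ref.resolve() is len)            # True
g["__builtins__"] = Builtins(len=max)  # swap out the parent scope
print(ref.resolve() is max)            # True
```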

This will be O(len(table) + (# of refcells)), but swapping out
__builtins__ shouldn't be something you keep doing. Even if it is a
concern, I have More Ideas to remove the "len(table) +" (but with
Raymond Hettinger's compact dicts, it wouldn't be necessary). It would
be worse for classes, because it would require potentially many
notifications. (But it would also save future lookups. And I have More
Ideas.)

This idea (of the owner dict "knowing" about its changed parent) also
applies to general chained scopes, but flattenings like MRO would mess
it up. Again, though, More Ideas. And more importantly, from what I
understand of Victor's response, the current implementation would
probably be efficient enough, or more efficient.

> And you have to be able to modify the global scope after compile time and 
> have that modification be effective, which means you'd have to allow the same 
> things on locals and closures if they were to act the same.

Not sure what you mean, but since I demand (possibly empty) refcells
from globals() at compile time, they will always have the most updated
value from globals. Not so much from __builtins__, but each refcell in
globals will only have to make one successful lookup in __builtins__
(until it's swapped out).

> The first problem with this is that using closures keeps alive a ton of 
> garbage that can't be reclaimed for a long time. One solution to that is to 
> lift out the variables, and only keep alive the ones that are actually 
> referenced--but then you need some rule to decide variables are actually 
> referenced, and the easiest place to do that is at function compile time. 
> Which means that if you eval up new bindings or manipulate frame 
> environments, they may or may not get closures, and it gets very confusing. 
> It's simpler just to make them not work at all, at which point you've got 
> pretty much the same rules cellvars have in Python.

I don't know enough to confidently say whether it would be an
improvement to closures, but the refs concept I want for dict works
for pretty much any data structure. You just keep a second container
of pointers to RefCells, synced to the size of the original container.

For a dict, that means syncing a second table with the same hash
indices. For a resizable array, it means keeping an array of pointers
of the same size.

When an internal function refers to a local, it requests a refcell.
When the external function call dies, the array cleans up its
unexposed variables and releases its ref'd variables to the refcells
(which might be held by an unexposed variable and thus later get
DecRef'd anyway).
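
For the array case, a minimal sketch might look like this. The names
`LocalsArray`, `RefCell`, `getref` and `release` are invented for
illustration, and `release()` stands in for whatever would run when
the outer function call ends:

```python
class RefCell:
    """Reads through to its owner's slot until the owner releases it."""

    def __init__(self, owner, index):
        self._owner = owner
        self._index = index
        self._value = None

    def get(self):
        if self._owner is not None:
            return self._owner.values[self._index]
        return self._value


class LocalsArray:
    """Values plus a same-sized array of (mostly absent) cell pointers."""

    def __init__(self, size):
        self.values = [None] * size
        self.cells = [None] * size  # kept the same length as values

    def getref(self, index):
        # An inner function that refers to a local requests a cell.
        if self.cells[index] is None:
            self.cells[index] = RefCell(self, index)
        return self.cells[index]

    def release(self):
        # Hand each referenced value over to its cell, drop the rest.
        for index, cell in enumerate(self.cells):
            if cell is not None:
                cell._value = self.values[index]
                cell._owner = None
        self.values = []
        self.cells = []


frame = LocalsArray(3)
frame.values[:] = ["a", "b", "c"]
ref = frame.getref(1)    # only slot 1 is exposed
frame.values[1] = "B"    # the cell sees later rebinding
frame.release()          # slots 0 and 2 are dropped
print(ref.get())         # B
```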

The logic is pretty simple and doesn't need to "know" about closures.
It just piggybacks onto Python's refcounting. But it would mean that
inner functions create Python objects where none were created before
(though this might be solvable at compile time). And again, I don't
know enough to say it's an improvement.

> But you don't want to apply those rules at global scope; you need to be able 
> to build that scope iteratively. (Even a simple recursive function--or a 
> function that references itself to call set_enclosing--needs to be able to 
> defer looking for the name's scope until call time. Which, besides the 
> trivial "making the basics work", allows Python to do all kinds of other fun 
> stuff, like write a function that calls itself, then replace it with an 
> optimized version, and now all existing invocations of that function--like, 
> say, generators--recurse to the optimized version.)

My idea would allow that, with only one lookup at compile time. It
just creates cells that might never be used. (But by requesting such a
cell, the code is saying that it INTENDS to use it.)

>> So I think there's something interesting here, even if we don't want to
>> use it in production code, it would make for some nice experiments.
>
> Most this seems like it would be easier to experiment with by building a new 
> Python-like language than by hacking on Python.

I think it would be pretty much the same difficulty.
