> On Dec 18, 2015, at 04:56, Steven D'Aprano <st...@pearwood.info> wrote:
> 
>>> On Thu, Dec 17, 2015 at 09:30:24AM -0800, Andrew Barnert via Python-Dev 
>>> wrote:
>>> On Dec 17, 2015, at 07:38, Franklin? Lee <leewangzhong+pyt...@gmail.com> 
>>> wrote:
>>> 
>>> The nested dictionaries are only for nested scopes (and inner
>>> functions don't create nested scopes). Nested scopes will already
>>> require multiple lookups in parents.
>> 
>> I think I understand what you're getting at here, but it's a really 
>> confusing use of terminology. In Python, and in programming in 
>> general, nested scopes refer to exactly inner functions (and classes) 
>> being lexically nested and doing lookup through outer scopes. The fact 
>> that this is optimized at compile time to FAST vs. CELL vs. 
>> GLOBAL/NAME, cells are optimized at function-creation time, and only 
>> global and name have to be resolved at the last second doesn't mean 
>> that there's no scoping, or some other form of scoping besides 
>> lexical. The actual semantics are LEGB, even if L vs. E vs. GB and E 
>> vs. further-out E can be optimized.
> 
> In Python 2, the LOAD_NAME byte-code can return a local, even though it 
> normally doesn't:
> 
> py> x = "global"
> py> def spam():
> ...     exec "x = 'local'"
> ...     print x
> ...
> py> spam()
> local
> py> x == 'global'
> True
> 
> 
> If we look at the byte-code, we see that the lookup is *not* optimized 
> to inspect locals only (LOAD_FAST), but uses the regular LOAD_NAME that 
> normally gets used for globals and builtins:
> 
> py> import dis
> py> dis.dis(spam)
>  2           0 LOAD_CONST               1 ("x = 'local'")
>              3 LOAD_CONST               0 (None)
>              6 DUP_TOP
>              7 EXEC_STMT
> 
>  3           8 LOAD_NAME                0 (x)
>             11 PRINT_ITEM
>             12 PRINT_NEWLINE
>             13 LOAD_CONST               0 (None)
>             16 RETURN_VALUE
> 
> 
> 
>> What you're talking about here is global lookups falling back to 
>> builtin lookups. There's no more general notion of nesting or scoping 
>> involved, so why use those words?
> 
> I'm not quite sure about this. In principle, every name lookup looks in 
> four scopes, LEGB as you describe above:
> 
> - locals
> - non-locals, a.k.a. enclosing or lexical scope(s)
> - globals (i.e. the module)
> - builtins
> 
> 
> although Python can (usually?) optimise away some of those lookups.

I think it kind of _has_ to optimize away, or at least tweak, some of those 
things, rather than just acting as if globals and builtins were just two more 
enclosing scopes. For example, global to builtins has to go through 
globals()['__builtins__'], or act as if it does, or code that relies on, say, 
the documented behavior of exec can be broken. And you have to be able to 
modify the global scope after compile time and have that modification be 
effective, which means you'd have to allow the same things on locals and 
closures if they were to act the same.
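
For example, both points are easy to demonstrate from Python itself (a small 
sketch; the plain dicts here just stand in for real module namespaces):

# global-to-builtin fallback goes through __builtins__ in the globals dict,
# which is exactly what exec's documented behavior depends on:
ns = {"__builtins__": {"len": len}}
exec("n = len('abc')", ns)
print(ns["n"])                       # 3 -- len came from our substitute builtins

# and globals really are late-bound: a name can be supplied after compile time
mod = {}
exec("def f():\n    return g()\n", mod)
mod["g"] = lambda: "late-bound"
print(mod["f"]())                    # late-bound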

> The 
> relationship of locals to enclosing scopes, and to globals in turn, 
> involve actual nesting of indented blocks in Python, but that's not 
> necessarily the case. One might imagine a hypothetical capability for 
> manipulating scopes directly, e.g.:
> 
> def spam(): ...
> def ham(): ...
> set_enclosing(ham, spam)
> # like:
> # def spam():
> #     def ham(): ...

But that doesn't work; a closure has to link to a particular invocation of its 
outer function, not just to the function. Consider a trivial example:

def spam(): x=time()
def ham(): return x
set_enclosing(ham, spam)
ham()

There's no actual x value in scope. So you need something like this if you want 
to actually be able to call it:

def spam(helper):
    x=time()
    helper = bind_closure(helper, sys._getframe())
    return helper()
def ham(): return x
set_enclosing(ham, spam)
spam(ham)

Of course you could make that getframe implicit; the point is there has to be a 
frame from an invocation of spam, not just the function itself, to make lexical 
scoping (errr... dynamically-generated fake-lexical scoping?) useful.
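
In ordinary Python, that's exactly what nesting buys you: each call to spam 
creates a fresh cell, so each ham sees only its own invocation's x. A trivial 
sketch:

calls = 0

def spam():
    global calls
    calls += 1
    x = calls                  # per-invocation value
    def ham():
        return x
    return ham

h1, h2 = spam(), spam()
print(h1(), h2())              # 1 2 -- each ham is bound to its own frame's x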

> The adventurous or fool-hardy can probably do something like that now 
> with byte-code hacking :-)

Yeah; I actually played with something like this a few years ago. I did it 
directly in terms of creating cell and free vars, not circumventing the 
existing LEGB system, which means you have to modify not just ham, but spam, in 
that set_enclosing. (In fact, you also have to modify all functions lexically 
or faux-lexically enclosing spam or enclosed by ham, which my code didn't do, 
but there were lots of other ways to fake it...). You need a bit of 
ctypes.pythonapi, not just bytecode hacks, to do the bind_closure() hack (the 
cell constructor isn't callable from Python, and you can't even fake it with a 
wrapper around a cell because cell_contents is immutable from Python...), but 
it's all doable. Anyway, my original goal was to make it possible to get the 
effect of nonlocal in Python 2, by calling set_enclosing(spam, ham, 
force_cells=('eggs',)), which converted "eggs" to a freevar even if it was 
local (normally it only tried to convert globals), which mostly worked and only 
occasionally segfaulted. :)
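
(For what it's worth, that particular hack has gotten easier since: 
cell_contents became writable in 3.7 and types.CellType became constructible 
in 3.8, so a bind_closure-ish helper can be sketched without ctypes, as long 
as the target function already has the right freevars. All names below are 
made up:)

import types

def _outer():
    x = None                   # placeholder; makes x a freevar of inner
    def inner():
        return x
    return inner

_template = _outer()           # inner's code has co_freevars == ('x',)

def bind_closure(func, value):
    # rebuild func with a fresh cell holding value
    return types.FunctionType(
        func.__code__, func.__globals__, func.__name__,
        func.__defaults__, (types.CellType(value),),
    )

print(bind_closure(_template, 42)())   # 42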

At any rate, even in languages designed for this kind of hacking (like Scheme), 
the scopes are still described as nested scopes, and the intended behavior can 
even be defined in terms of "as if you took the sexpr/AST/whatever for ham and 
spam and nested and re-compiled them", so whatever hacks you actually do are 
just optimizations over that re-compile.

> Likewise, one might consider that builtins is a scope which in some 
> sense encloses the global scope. Consider it a virtual code block that 
> is outdented.
> 
> 
>> So, trying to generalize global vs. builtin to a general notion of 
>> "nested scope" that isn't necessary for builtins and doesn't work for 
>> anything else seems like overcomplicating things for no benefit.
> 
> Well, putting aside the question of whether this is useful or not, and 
> putting aside efficiency concerns, let's just imagine a hypothetical 
> implementation where name lookups used ChainMaps instead of using 
> separate LOAD_* lookups of special dicts. Then a function could set up a 
> ChainMap:

This is basically one of the two original ways of doing lexical scoping in 
Lisp: when a function is constructed, it stores a reference to the stack frame, 
and when that function is called, its frame stores a reference to the function, 
so you always have a linked chain of stack frames to walk through to do your 
lookups (and assignments).
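
A toy model of that frame chain (just to illustrate the lookup walk; nothing 
here is how CPython actually stores things):

class Frame:
    def __init__(self, parent=None, **bindings):
        self.bindings = dict(bindings)
        self.parent = parent          # the defining function's frame
    def lookup(self, name):
        frame = self
        while frame is not None:      # walk outward through the chain
            if name in frame.bindings:
                return frame.bindings[name]
            frame = frame.parent
        raise NameError(name)

outer_frame = Frame(x=1)
inner_frame = Frame(parent=outer_frame, y=2)
print(inner_frame.lookup("x"))        # 1 -- found by walking the chain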

The first problem with this is that using closures keeps alive a ton of garbage 
that can't be reclaimed for a long time. One solution to that is to lift out 
the variables, and only keep alive the ones that are actually referenced--but 
then you need some rule to decide which variables are actually referenced, and the 
easiest place to do that is at function compile time. Which means that if you 
eval up new bindings or manipulate frame environments, those bindings may or 
may not be visible to closures, and it gets very confusing. It's simpler just 
to make them not work
at all, at which point you've got pretty much the same rules cellvars have in 
Python.
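
You can see that lifting in Python directly: only the names inner actually 
references get cells, and everything else in outer's frame dies when outer 
returns:

def outer():
    x = 1
    big = list(range(10**6))     # never referenced by inner
    def inner():
        return x
    return inner

f = outer()
print(f.__code__.co_freevars)            # ('x',) -- only x was lifted into a cell
print(f.__closure__[0].cell_contents)    # 1; big is not kept alive by f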

But you don't want to apply those rules at global scope; you need to be able to 
build that scope iteratively. (Even a simple recursive function--or a function 
that references itself to call set_enclosing--needs to be able to defer looking 
for the name's scope until call time. Which, besides the trivial "making the 
basics work", allows Python to do all kinds of other fun stuff, like write a 
function that calls itself, then replace it with an optimized version, and now 
all existing invocations of that function--like, say, generators--recurse to 
the optimized version.) The traditional solution is to provide a way to declare 
two kinds of bindings: one, lexically scoped through the chain, and the other 
either flat-global or dynamically scoped. Then almost everything at the top 
level ends up global (or ends up don't-care) anyway. What Python does 
(special-case global lookup to be completely late and completely flat, assume 
the whole globals dict can live forever, and make top-level code actually use 
the globals dict as its locals) is just a simplification of that, 
which handles 99% of what you normally want.
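
The "replace it with an optimized version" bit is easy to see with a toy 
generator (Python 3 syntax; the second def rebinds the global name that the 
already-running generator looks up at call time):

def countdown(n):
    print("slow", n)
    if n:
        yield from countdown(n - 1)   # global lookup happens here, at call time
    else:
        yield "done"

g = countdown(2)                      # an existing invocation, not yet started

def countdown(n):                     # rebind the name to a "faster" version
    print("fast", n)
    yield "done"

print(list(g))                        # prints "slow 2", then "fast 1", then ['done']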

Then you add a module system, and you effectively need two levels of global, 
and then you've got Python's behavior exactly, so you might as well optimize it 
the same way as Python. :)

But maybe if you went back and found a different solution to the first problem, 
you could come up with different results. If you start with unlinkable stacks 
instead of adding them on later (or decouple the environments from the stacks, 
as you already did, and make them unlinkable ChainMaps), maybe there's a better 
way.

Or maybe you could just decide that the garbage isn't a problem--or, rather, 
that it's an application-level problem; at compile time, users can just say 
"bind everything", but they can also provide a list of names to bind (and can 
also bind things by copy instead), so they have a way to manage the garbage 
growth. (This is C++'s solution to closures; I'm not sure how well it would 
transplant to Python, or Lisp, but it might work--maybe with hooks to do some 
things at runtime that C++ can only do at compile time?)
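
(The closest thing Python already has to a by-copy capture list is probably 
the default-argument idiom -- a rough analogy, not a real capture mechanism:)

funcs = []
for i in range(3):
    funcs.append(lambda i=i: i)    # "capture i by copy" at definition time

print([f() for f in funcs])        # [0, 1, 2], not [2, 2, 2]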

> function.__scopes__ = ChainMap(locals, enclosing, globals, builtins)
> 
> and a name lookup for (say) "x" would always be a simple:
> 
> function.__scopes__["x"]
> 
> Of course this would be harder to optimize, and hence probably slower, 
> than the current arrangement, but I think it would allow some 
> interesting experiments with scoping rules:
> 
> ChainMap(locals, enclosing, globals, application_globals, builtins)
> 
> 
> You could implement dynamic scoping by inserting the caller's __scopes__ 
> ChainMap into the front of the called function's ChainMap.

I'd think you'd normally want to dynamically scope on a per-variable basis, not 
a per-scope basis--but maybe that's because most languages that have optional 
dynamic scoping do it that way, so I never imagined what I could do with the 
other? Any cool ideas?
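
Taking the whole-scope version literally, the experiment is only a few lines 
with collections.ChainMap (a sketch; assignment, deletion, and real lookup 
semantics are all glossed over):

from collections import ChainMap
import builtins

def make_scopes(local, enclosing, module):
    return ChainMap(local, enclosing, module, vars(builtins))

caller = make_scopes({"x": 1}, {}, {"y": 2})
callee = make_scopes({"z": 3}, {}, {})

# dynamic scoping as described above: the caller's maps go in front, so the
# callee sees the caller's bindings (a per-variable version would instead
# splice in just the chosen names)
dynamic = ChainMap(*(caller.maps + callee.maps))
print(dynamic["x"], dynamic["z"], dynamic["len"]("abc"))   # 1 3 3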

> And attribute 
> lookups would be something like this simplified scope:
> 
> ChainMap(self.__dict__, type(self).__dict__)

What about inheritance? You still need to get (base.__dict__ for base in 
type(self).__mro__) in there (and, for class attributes, the same for 
self.__mro__). But that would make things less dynamic (if a type's bases 
change at runtime, its subtypes aren't affected). You could link to your bases' 
ChainMaps instead of your entire MRO's dicts, but then I'm not sure how you 
linearize multiple inheritance. (Can you write a "C3ChainMap" that relies on a 
flag on each dict that says "if you see this flag you have to recompute the 
chain from your original inputs", which you set on __bases__ change?)
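
The straightforward version (chaining the MRO's dicts directly) is easy to 
write, at the cost of the dynamism mentioned above -- and with descriptors, 
metaclasses, and __slots__ all ignored:

from collections import ChainMap

class A:
    x = "from A"

class B(A):
    pass

def attr_chain(obj):
    # chain the instance dict with every dict along the MRO,
    # not just type(obj).__dict__
    return ChainMap(vars(obj), *(vars(c) for c in type(obj).__mro__))

b = B()
print(attr_chain(b)["x"])    # 'from A' -- found via the MRO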

> to say nothing of combinations of the two.

You mean implicit self, where you chain the self and its type in there ahead 
of locals? The hard part there isn't lookups, but bindings. Unless you want a 
"self x" declaration akin to "nonlocal x" (or, if that's the default for 
methods, a "local x"--maybe implicit declarations are less important a feature 
than a single lookup chain?).

> So I think there's something interesting here, even if we don't want to 
> use it in production code, it would make for some nice experiments.

Most of this seems like it would be easier to experiment with by building a new 
Python-like language than by hacking on Python. (Also, either way, it seems 
more like a thread for -ideas than -dev...)