On Fri, Dec 18, 2015 at 2:32 PM, Andrew Barnert via Python-Dev <python-dev@python.org> wrote:
> (Also, either way, it seems more like a thread for -ideas than -dev...) I said this early on in this thread! Should I try to write up my idea as a single thing, instead of a bunch of responses, and post it in -ideas? Should I call them "parent scope" and "parent refcell"? On Fri, Dec 18, 2015 at 7:56 AM, Steven D'Aprano <st...@pearwood.info> wrote: > I'm not quite sure about this. In principle, every name lookup looks in > four scopes, LEGB as you describe above: > > - locals > - non-locals, a.k.a. enclosing or lexical scope(s) > - globals (i.e. the module) > - builtins > > > although Python can (usually?) optimise away some of those lookups. The > relationship of locals to enclosing scopes, and to globals in turn, > involve actual nesting of indented blocks in Python, but that's not > necessarily the case. As I understand, L vs E vs GB is known at compile-time. That is, your exec example doesn't work for me in Python 3, because all names are scoped at compile-time. x = 5 def f(): exec('x = 111') print(x) f() #prints 5 print(x) #prints 5 This means that my idea only really works for GB lookups. > On Thu, Dec 17, 2015 at 09:30:24AM -0800, Andrew Barnert via Python-Dev wrote: >> So, trying to generalize global vs. builtin to a general notion of >> "nested scope" that isn't necessary for builtins and doesn't work for >> anything else seems like overcomplicating things for no benefit. > > Well, putting aside the question of whether this is useful or not, and > putting aside efficiency concerns, let's just imagine a hypothetical > implementation where name lookups used ChainMaps instead of using > separate LOAD_* lookups of special dicts. Then a function could set up a > ChainMap: > > function.__scopes__ = ChainMap(locals, enclosing, globals, builtins) > > and a name lookup for (say) "x" would always be a simple: > > function.__scopes__["x"] > > Of course this would be harder to optimize, and hence probably slower, > than the current arrangement, This is where the ChainRefCell idea comes in. If a ChainRefCell is empty, it would ask its parent dicts for a value. If it finds a value in parent n, it would replace parent n with a refcell into parent n, and similarly for parents 0, 1, ... n-1. It won't need to do hash lookups in those parents again, while allowing for those parents to acquire names. (This means parent n+1 won't need to create refcells, so we don't make unnecessary refcells in `object` and `__builtin__`.) Unfortunately, classes are more complicated than nested scopes. 1. We skip MRO if we define classes as having their direct supers as parents. (Solution: Define classes as having all supers as parents, and make non-recursive Refcell.resolve() requests.) (Objects have their class as a parent, always.) 2. Classes can replace their bases. (I have some ideas for this, but see #3.) 3. I get the impression that attribute lookups are already pretty optimized. On Fri, Dec 18, 2015 at 2:32 PM, Andrew Barnert via Python-Dev <python-dev@python.org> wrote: > I think it kind of _has_ to optimize away, or at least tweak, some of those > things, rather than just acting as if globals and builtins were just two more > enclosing scopes. For example, global to builtins has to go through > globals()['__builtins__'], or act as if it does, or code that relies on, say, > the documented behavior of exec can be broken. It would or could, in my idea of __builtins__ being a parent scope of globals() (though I'm not sure whether it'd be the case for any other kind of nesting). Each refcell in globals() will hold a reference to __builtins__ (if they didn't successfully look it up yet) or to a refcell in __builtins__ (if there was once a successful lookup). Since globals() knows when globals()['__builtins__'] is modified, it can invalidate all its refcells' parent cells (by making them hold references to the new __builtins__). This will be O(len(table) + (# of refcells)), but swapping out __builtins__ shouldn't be something you keep doing. Even if it is a concern, I have More Ideas to remove the "len(table) +" (but with Raymond Hettinger's compact dicts, it wouldn't be necessary). It would be worse for classes, because it would require potentially many notifications. (But it would also save future lookups. And I have More Ideas.) This idea (of the owner dict "knowing" about its changed parent) also applies to general chained scopes, but flattenings like MRO would mess it up. Again, though, More Ideas. And more importantly, from what I understand of Victor's response, the current implementation would probably be efficient enough, or more efficient. > And you have to be able to modify the global scope after compile time and > have that modification be effective, which means you'd have to allow the same > things on locals and closures if they were to act the same. Not sure what you mean, but since I demand (possibly empty) refcells from globals() at compile time, they will always have the most updated value from globals. Not so much from __builtins__, but each refcell in globals will only have to make one successful lookup in __builtins__ (until it's swapped out). > The first problem with this is that using closures keeps alive a ton of > garbage that can't be reclaimed for a long time. One solution to that is to > lift out the variables, and only keep alive the ones that are actually > referenced--but then you need some rule to decide variables are actually > referenced, and the easiest place to do that is at function compile time. > Which means that if you eval up new bindings or manipulate frame > environments, they may or may not get closures, and it gets very confusing. > It's simpler just to make them not work at all, at which point you've got > pretty much the same rules cellvars have in Python. I don't know enough to confidentally say whether it would be an improvement to closures, but the refs concept I want for dict works for pretty much any data structure. You just keep a second container of pointers to RefCells, synced to the size of the original container. For a dict, that means syncing a second table with the same hash indices. For a resizable array, it means keeping an array of pointers of the same size. When an internal function refers to a local, it requests a refcell. When the external function call dies, the array cleans up its unexposed variables and releases its ref'd variables to the refcells (which might be held by an unexposed variable and thus later get DecRef'd anyway). The logic is pretty simple and doesn't need to "know" about closures. It just piggybacks onto Python's refcounting. But it would mean that inner functions create Python objects where they didn't used to (but this might be solvable at compile-time). And again, I don't know enough to say it's an improvement. > But you don't want to apply those rules at global scope; you need to be able > to build that scope iteratively. (Even a simple recursive function--or a > function that references itself to call set_enclosing--needs to be able to > defer looking for the name's scope until call time. Which, besides the > trivial "making the basics work", allows Python to do all kinds of other fun > stuff, like write a function that calls itself, then replace it with an > optimized version, and now all existing invocations of that function--like, > say, generators--recurse to the optimized version.) My idea would allow that, with only one lookup at compile-time. It just creates cells that might never be used. (But by requesting such a cell, you're saying that it INTENDS to be used.) >> So I think there's something interesting here, even if we don't want to >> use it in production code, it would make for some nice experiments. > > Most this seems like it would be easier to experiment with by building a new > Python-like language than by hacking on Python. I think it would be pretty much the same difficulty. _______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com