Re: [Python-Dev] Idea: Dictionary references

2015-12-18 Thread Steven D'Aprano
On Thu, Dec 17, 2015 at 09:30:24AM -0800, Andrew Barnert via Python-Dev wrote:
> On Dec 17, 2015, at 07:38, Franklin? Lee  
> wrote:
> > 
> > The nested dictionaries are only for nested scopes (and inner
> > functions don't create nested scopes). Nested scopes will already
> > require multiple lookups in parents.
> 
> I think I understand what you're getting at here, but it's a really 
> confusing use of terminology. In Python, and in programming in 
> general, nested scopes refer to exactly inner functions (and classes) 
> being lexically nested and doing lookup through outer scopes. The fact 
> that this is optimized at compile time to FAST vs. CELL vs. 
> GLOBAL/NAME, cells are optimized at function-creation time, and only 
> global and name have to be resolved at the last second doesn't mean 
> that there's no scoping, or some other form of scoping besides 
> lexical. The actual semantics are LEGB, even if L vs. E vs. GB and E 
> vs. further-out E can be optimized.

In Python 2, the LOAD_NAME byte-code can return a local, even though it 
normally doesn't:

py> x = "global"
py> def spam():
... exec "x = 'local'"
... print x
...
py> spam()
local
py> x == 'global'
True


If we look at the byte-code, we see that the lookup is *not* optimized 
to inspect locals only (LOAD_FAST), but uses the regular LOAD_NAME that 
normally gets used for globals and builtins:

py> import dis
py> dis.dis(spam)
  2           0 LOAD_CONST               1 ("x = 'local'")
              3 LOAD_CONST               0 (None)
              6 DUP_TOP
              7 EXEC_STMT

  3           8 LOAD_NAME                0 (x)
             11 PRINT_ITEM
             12 PRINT_NEWLINE
             13 LOAD_CONST               0 (None)
             16 RETURN_VALUE



> What you're talking about here is global lookups falling back to 
> builtin lookups. There's no more general notion of nesting or scoping 
> involved, so why use those words?

I'm not quite sure about this. In principle, every name lookup looks in 
four scopes, LEGB as you describe above:

- locals
- non-locals, a.k.a. enclosing or lexical scope(s)
- globals (i.e. the module)
- builtins


although Python can (usually?) optimise away some of those lookups. The 
relationship of locals to enclosing scopes, and to globals in turn, 
involve actual nesting of indented blocks in Python, but that's not 
necessarily the case. One might imagine a hypothetical capability for 
manipulating scopes directly, e.g.:

def spam(): ...
def ham(): ...
set_enclosing(ham, spam)
# like:
# def spam():
#     def ham(): ...

The adventurous or fool-hardy can probably do something like that now 
with byte-code hacking :-)

Likewise, one might consider that builtins is a scope which in some 
sense encloses the global scope. Consider it a virtual code block that 
is outdented from the top-level scope :-)


> So, trying to generalize global vs. builtin to a general notion of 
> "nested scope" that isn't necessary for builtins and doesn't work for 
> anything else seems like overcomplicating things for no benefit.

Well, putting aside the question of whether this is useful or not, and 
putting aside efficiency concerns, let's just imagine a hypothetical 
implementation where name lookups used ChainMaps instead of using 
separate LOAD_* lookups of special dicts. Then a function could set up a 
ChainMap:

function.__scopes__ = ChainMap(locals, enclosing, globals, builtins)

and a name lookup for (say) "x" would always be a simple:

function.__scopes__["x"]

Of course this would be harder to optimize, and hence probably slower, 
than the current arrangement, but I think it would allow some 
interesting experiments with scoping rules:

ChainMap(locals, enclosing, globals, application_globals, builtins)


You could implement dynamic scoping by inserting the caller's __scopes__ 
ChainMap into the front of the called function's ChainMap. And attribute 
lookups would be something like this simplified scope:

ChainMap(self.__dict__, type(self).__dict__)

to say nothing of combinations of the two.
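
(For concreteness, here is a rough sketch of that idea using
collections.ChainMap; the make_scopes helper and the toy namespaces are just
for illustration, not a proposed API:)

    import builtins
    from collections import ChainMap

    def make_scopes(local_ns, enclosing_ns, global_ns):
        # Earlier maps shadow later ones, LEGB-style.
        return ChainMap(local_ns, enclosing_ns, global_ns, vars(builtins))

    scopes = make_scopes({'x': 1}, {'x': 2, 'y': 3}, {'z': 4})
    print(scopes['x'])    # 1 -- found in the "locals" map
    print(scopes['y'])    # 3 -- falls back to the "enclosing" map
    print(scopes['len'])  # <built-in function len> -- falls back to builtins

Writes are trickier: a plain ChainMap assigns into its first map only, which
is at least roughly what assignment-is-local already means.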

So I think there's something interesting here, even if we don't want to 
use it in production code, it would make for some nice experiments.


-- 
Steve


Re: [Python-Dev] Idea: Dictionary references

2015-12-18 Thread Andrew Barnert via Python-Dev
> On Dec 18, 2015, at 04:56, Steven D'Aprano  wrote:
> 
>>> On Thu, Dec 17, 2015 at 09:30:24AM -0800, Andrew Barnert via Python-Dev 
>>> wrote:
>>> On Dec 17, 2015, at 07:38, Franklin? Lee  
>>> wrote:
>>> 
>>> The nested dictionaries are only for nested scopes (and inner
>>> functions don't create nested scopes). Nested scopes will already
>>> require multiple lookups in parents.
>> 
>> I think I understand what you're getting at here, but it's a really 
>> confusing use of terminology. In Python, and in programming in 
>> general, nested scopes refer to exactly inner functions (and classes) 
>> being lexically nested and doing lookup through outer scopes. The fact 
>> that this is optimized at compile time to FAST vs. CELL vs. 
>> GLOBAL/NAME, cells are optimized at function-creation time, and only 
>> global and name have to be resolved at the last second doesn't mean 
>> that there's no scoping, or some other form of scoping besides 
>> lexical. The actual semantics are LEGB, even if L vs. E vs. GB and E 
>> vs. further-out E can be optimized.
> 
> In Python 2, the LOAD_NAME byte-code can return a local, even though it 
> normally doesn't:
> 
> py> x = "global"
> py> def spam():
> ... exec "x = 'local'"
> ... print x
> ...
> py> spam()
> local
> py> x == 'global'
> True
> 
> 
> If we look at the byte-code, we see that the lookup is *not* optimized 
> to inspect locals only (LOAD_FAST), but uses the regular LOAD_NAME that 
> normally gets used for globals and builtins:
> 
> py> import dis
> py> dis.dis(spam)
>   2           0 LOAD_CONST               1 ("x = 'local'")
>               3 LOAD_CONST               0 (None)
>               6 DUP_TOP
>               7 EXEC_STMT
> 
>   3           8 LOAD_NAME                0 (x)
>              11 PRINT_ITEM
>              12 PRINT_NEWLINE
>              13 LOAD_CONST               0 (None)
>              16 RETURN_VALUE
> 
> 
> 
>> What you're talking about here is global lookups falling back to 
>> builtin lookups. There's no more general notion of nesting or scoping 
>> involved, so why use those words?
> 
> I'm not quite sure about this. In principle, every name lookup looks in 
> four scopes, LEGB as you describe above:
> 
> - locals
> - non-locals, a.k.a. enclosing or lexical scope(s)
> - globals (i.e. the module)
> - builtins
> 
> 
> although Python can (usually?) optimise away some of those lookups.

I think it kind of _has_ to optimize away, or at least tweak, some of those 
things, rather than just acting as if globals and builtins were just two more 
enclosing scopes. For example, global to builtins has to go through 
globals()['__builtins__'], or act as if it does, or code that relies on, say, 
the documented behavior of exec can be broken. And you have to be able to 
modify the global scope after compile time and have that modification be 
effective, which means you'd have to allow the same things on locals and 
closures if they were to act the same.

> The 
> relationship of locals to enclosing scopes, and to globals in turn, 
> involve actual nesting of indented blocks in Python, but that's not 
> necessarily the case. One might imagine a hypothetical capability for 
> manipulating scopes directly, e.g.:
> 
> def spam(): ...
> def ham(): ...
> set_enclosing(ham, spam)
> # like:
> # def spam():
> #     def ham(): ...

But that doesn't work; a closure has to link to a particular invocation of its 
outer function, not just to the function. Consider a trivial example:

def spam(): x=time()
def ham(): return x
set_enclosing(ham, spam)
ham()

There's no actual x value in scope. So you need something like this if you want 
to actually be able to call it:

def spam(helper):
    x=time()
    helper = bind_closure(helper, sys._getframe())
    return helper()
def ham(): return x
set_enclosing(ham, spam)
spam(ham)

Of course you could make that getframe implicit; the point is there has to be a 
frame from an invocation of spam, not just the function itself, to make lexical 
scoping (errr... dynamically-generated fake-lexical scoping?) useful.

> The adventurous or fool-hardy can probably do something like that now 
> with byte-code hacking :-)

Yeah; I actually played with something like this a few years ago. I did it 
directly in terms of creating cell and free vars, not circumventing the 
existing LEGB system, which means you have to modify not just ham, but spam, in 
that set_enclosing. (In fact, you also have to modify all functions lexically 
or faux-lexically enclosing spam or enclosed by ham, which my code didn't do, 
but there were lots of other ways to fake it...). You need a bit of 
ctypes.pythonapi, not just bytecode hacks, to do the bind_closure() hack (the 
cell constructor isn't callable from Python, and you can't even fake it with a 
wrapper around a cell because cell_contents is immutable from Python...), but 
it's all doable. Anyway, my original goal was to make it possible to get the 

Re: [Python-Dev] Idea: Dictionary references

2015-12-18 Thread Franklin? Lee
On Fri, Dec 18, 2015 at 2:32 PM, Andrew Barnert via Python-Dev
 wrote:

> (Also, either way, it seems more like a thread for -ideas than -dev...)

I said this early on in this thread!

Should I try to write up my idea as a single thing, instead of a bunch
of responses, and post it in -ideas?

Should I call them "parent scope" and "parent refcell"?


On Fri, Dec 18, 2015 at 7:56 AM, Steven D'Aprano  wrote:

> I'm not quite sure about this. In principle, every name lookup looks in
> four scopes, LEGB as you describe above:
>
> - locals
> - non-locals, a.k.a. enclosing or lexical scope(s)
> - globals (i.e. the module)
> - builtins
>
>
> although Python can (usually?) optimise away some of those lookups. The
> relationship of locals to enclosing scopes, and to globals in turn,
> involve actual nesting of indented blocks in Python, but that's not
> necessarily the case.

As I understand, L vs E vs GB is known at compile-time.

That is, your exec example doesn't work for me in Python 3, because
all names are scoped at compile-time.

x = 5
def f():
    exec('x = 111')
    print(x)

f() #prints 5
print(x) #prints 5


This means that my idea only really works for GB lookups.

> On Thu, Dec 17, 2015 at 09:30:24AM -0800, Andrew Barnert via Python-Dev wrote:

>> So, trying to generalize global vs. builtin to a general notion of
>> "nested scope" that isn't necessary for builtins and doesn't work for
>> anything else seems like overcomplicating things for no benefit.
>
> Well, putting aside the question of whether this is useful or not, and
> putting aside efficiency concerns, let's just imagine a hypothetical
> implementation where name lookups used ChainMaps instead of using
> separate LOAD_* lookups of special dicts. Then a function could set up a
> ChainMap:
>
> function.__scopes__ = ChainMap(locals, enclosing, globals, builtins)
>
> and a name lookup for (say) "x" would always be a simple:
>
> function.__scopes__["x"]
>
> Of course this would be harder to optimize, and hence probably slower,
> than the current arrangement,

This is where the ChainRefCell idea comes in.

If a ChainRefCell is empty, it would ask its parent dicts for a value.
If it finds a value in parent n, it would replace parent n with a
refcell into parent n, and similarly for parents 0, 1, ... n-1. It
won't need to do hash lookups in those parents again, while allowing
for those parents to acquire names. (This means parent n+1 won't need
to create refcells, so we don't make unnecessary refcells in `object`
and `__builtin__`.)
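
(A minimal sketch of that resolution strategy; ChainRefCell and the
RefCell-as-(dict, key) pair are made up for illustration -- in C the refcell
would point at the entry's value slot instead of re-hashing:)

    _MISSING = object()

    class RefCell:
        # Stands in for a pointer into one dict entry.
        def __init__(self, mapping, key):
            self.mapping, self.key = mapping, key
        def resolve(self):
            return self.mapping.get(self.key, _MISSING)

    class ChainRefCell:
        def __init__(self, key, *parent_dicts):
            self.key = key
            # Each slot starts as a raw parent dict and is replaced by a
            # RefCell into that dict once we have searched it.
            self.parents = list(parent_dicts)

        def resolve(self):
            for i, parent in enumerate(self.parents):
                if isinstance(parent, RefCell):
                    value = parent.resolve()
                else:
                    value = parent.get(self.key, _MISSING)
                if value is not _MISSING:
                    # Cache refcells for this slot and every earlier one, so
                    # parents 0..n can still acquire the name later without
                    # another hash lookup, while parent n+1 onward never has
                    # to create a refcell at all.
                    for j in range(i + 1):
                        if not isinstance(self.parents[j], RefCell):
                            self.parents[j] = RefCell(self.parents[j], self.key)
                    return value
            raise KeyError(self.key)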

Unfortunately, classes are more complicated than nested scopes.

1. We skip MRO if we define classes as having their direct supers as
parents. (Solution: Define classes as having all supers as parents,
and make non-recursive Refcell.resolve() requests.) (Objects have
their class as a parent, always.)

2. Classes can replace their bases. (I have some ideas for this, but see #3.)

3. I get the impression that attribute lookups are already pretty optimized.


On Fri, Dec 18, 2015 at 2:32 PM, Andrew Barnert via Python-Dev
 wrote:

> I think it kind of _has_ to optimize away, or at least tweak, some of those 
> things, rather than just acting as if globals and builtins were just two more 
> enclosing scopes. For example, global to builtins has to go through 
> globals()['__builtins__'], or act as if it does, or code that relies on, say, 
> the documented behavior of exec can be broken.

It would or could, in my idea of __builtins__ being a parent scope of
globals() (though I'm not sure whether it'd be the case for any other
kind of nesting).

Each refcell in globals() will hold a reference to __builtins__ (if
they didn't successfully look it up yet) or to a refcell in
__builtins__ (if there was once a successful lookup). Since globals()
knows when globals()['__builtins__'] is modified, it can invalidate
all its refcells' parent cells (by making them hold references to the
new __builtins__).

This will be O(len(table) + (# of refcells)), but swapping out
__builtins__ shouldn't be something you keep doing. Even if it is a
concern, I have More Ideas to remove the "len(table) +" (but with
Raymond Hettinger's compact dicts, it wouldn't be necessary). It would
be worse for classes, because it would require potentially many
notifications. (But it would also save future lookups. And I have More
Ideas.)

This idea (of the owner dict "knowing" about its changed parent) also
applies to general chained scopes, but flattenings like MRO would mess
it up. Again, though, More Ideas. And more importantly, from what I
understand of Victor's response, the current implementation would
probably be efficient enough, or more efficient.


> And you have to be able to modify the global scope after compile time and 
> have that modification be effective, which means you'd have to allow the same 
> things on locals and closures if they were to act the same.

Not 

Re: [Python-Dev] Idea: Dictionary references

2015-12-17 Thread Maciej Fijalkowski
You can very easily implement this with version tags on the globals
dictionaries: the dictionaries have versions, and the guard that checks
whether everything is OK just checks the version tag on globals.

Generally speaking, such optimizations have been done in the past
(even in places like pypy, but also in literature) and as soon as we
have dynamic compilation (and FAT is a form of it), you can do such
tricks.

On Thu, Dec 17, 2015 at 3:48 PM, Steven D'Aprano  wrote:
> On Thu, Dec 17, 2015 at 12:53:13PM +0100, Victor Stinner quoted:
>> 2015-12-17 11:54 GMT+01:00 Franklin? Lee :
>
>> > Each function keeps an indirect, automagically updated
>> > reference to the current value of the names they use,
>
> Isn't that a description of globals()? If you want to look up a name
> "spam", you grab an indirect reference to it:
>
> globals()["spam"]
>
> which returns the current value of the name "spam".
>
>
>> > and will never need to look things up again.[*]
>
> How will this work?
>
> Naively, it sounds to me like Franklin is suggesting that on every
> global assignment, the interpreter will have to touch every single
> function in the module to update that name. Something like this:
>
> # on a global assignment
> spam = 23
>
> # the interpreter must do something like this:
> for function in module.list_of_functions:
> if "spam" in function.__code__.__global_names__:
> function.__code__.__global_names__["spam"] = spam
>
> As I said, that's a very naive way to implement this. Unless you have
> something much cleverer, I think this will be horribly slow.
>
> And besides, you *still* need to deal with the case that the name isn't
> a global at all, but in the built-ins namespace.
>
>
> --
> Steve


Re: [Python-Dev] Idea: Dictionary references

2015-12-17 Thread Steven D'Aprano
On Thu, Dec 17, 2015 at 12:53:13PM +0100, Victor Stinner quoted:
> 2015-12-17 11:54 GMT+01:00 Franklin? Lee :

> > Each function keeps an indirect, automagically updated
> > reference to the current value of the names they use, 

Isn't that a description of globals()? If you want to look up a name 
"spam", you grab an indirect reference to it:

globals()["spam"]

which returns the current value of the name "spam".


> > and will never need to look things up again.[*]

How will this work?

Naively, it sounds to me like Franklin is suggesting that on every 
global assignment, the interpreter will have to touch every single 
function in the module to update that name. Something like this:

# on a global assignment
spam = 23

# the interpreter must do something like this:
for function in module.list_of_functions:
if "spam" in function.__code__.__global_names__:
function.__code__.__global_names__["spam"] = spam

As I said, that's a very naive way to implement this. Unless you have 
something much cleverer, I think this will be horribly slow.

And besides, you *still* need to deal with the case that the name isn't 
a global at all, but in the built-ins namespace.


-- 
Steve


Re: [Python-Dev] Idea: Dictionary references

2015-12-17 Thread Franklin? Lee
(Previous thread was here, by the way:
https://mail.python.org/pipermail/python-dev/2015-December/142437.html)

On Thu, Dec 17, 2015 at 8:48 AM, Steven D'Aprano  wrote:
> On Thu, Dec 17, 2015 at 12:53:13PM +0100, Victor Stinner quoted:
>> 2015-12-17 11:54 GMT+01:00 Franklin? Lee :
>
>> > Each function keeps an indirect, automagically updated
>> > reference to the current value of the names they use,
>
> Isn't that a description of globals()? If you want to look up a name
> "spam", you grab an indirect reference to it:
>
> globals()["spam"]
>
> which returns the current value of the name "spam".

The *current value*. I'm proposing that we instead do this:

spamref = globals().getref('spam')

Every time we want to find the current, updated value of 'spam', we just do

spam = spamref.resolve()

which will skip the hash lookup and go directly to the value.

>> > and will never need to look things up again.[*]
>
> How will this work?
>
> Naively, it sounds to me like Franklin is suggesting that on every
> global assignment, the interpreter will have to touch every single
> function in the module to update that name.

A refcell holds a pointer to the location in the dict itself where the
value pointer is. When the value is updated in the dict, the refcell
will not need to be updated.

My original proposal wanted to keep cells in the "real" dict, and
update them. Like so:

NULL = object()  # sentinel standing in for a C-level NULL value pointer

class RefCell:
    __slots__ = ['value']

class ScopeDict(dict):
    def __getitem__(self, key):
        value = super().__getitem__(key).value  # may raise KeyError
        if value is NULL:
            raise KeyError(key)
        return value

    def __setitem__(self, key, value):
        if super().__contains__(key):
            super().__getitem__(key).value = value
        else:
            cell = RefCell()
            cell.value = value
            super().__setitem__(key, cell)

    def __delitem__(self, key):
        cell = super().__getitem__(key)  # may raise KeyError
        if cell.value is NULL:
            raise KeyError(key)
        cell.value = NULL


I realized later that this isn't necessary. Most dict operations don't
need to know about the indirection, so I make the inner dict a normal
dict (with a few more holes than normal). But this would show how you
can avoid manual updates for references.
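
(Concretely, the getref()/resolve() API from above could be bolted onto that
cell-based version like this -- just a sketch to make the API shape clear,
with a create_if_none flag as in my fuller pseudocode; the real thing would
of course live in C:)

    def _resolve(cell):
        if cell.value is NULL:
            raise KeyError('name is not currently bound')
        return cell.value
    RefCell.resolve = _resolve

    def _getref(self, key, create_if_none=True):
        try:
            return dict.__getitem__(self, key)   # the cell itself, not the value
        except KeyError:
            if not create_if_none:
                raise
            cell = RefCell()
            cell.value = NULL                    # pre-created, still empty
            dict.__setitem__(self, key, cell)
            return cell
    ScopeDict.getref = _getref

    ns = ScopeDict()
    spamref = ns.getref('spam')   # one hash lookup, ever
    ns['spam'] = 23
    print(spamref.resolve())      # 23
    ns['spam'] = 42
    print(spamref.resolve())      # 42 -- no further lookups needed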

> And besides, you *still* need to deal with the case that the name isn't
> a global at all, but in the built-ins namespace.

globals() to __builtin__ is a nesting relationship. At the bottom of
the following email, I have a pseudocode implementation which knows
how to deal with nested scopes.
https://mail.python.org/pipermail/python-dev/2015-December/142489.html


Re: [Python-Dev] Idea: Dictionary references

2015-12-17 Thread Victor Stinner
2015-12-17 16:38 GMT+01:00 Franklin? Lee :
> Yeah, maybe it's out of the scope of bytecode optimization. But I
> think this would make guards unnecessary, since once a name is found,
> there's a quick way to refer to it.

FAT Python requires and supports various kinds of guards:
* type of a function argument
* dictionary key
* function (func.__code__)

I guess that you are talking about the dictionary key guards. In fact,
there are 4 subtypes of dict guards:
* guard on builtins.__dict__['key']
* guard on globals()['key']
* guard on MyClass.__dict__['key']
* guard on dict['key']

The implementation of this guard's check currently relies on a new
"version" (a global dict version, incremented at each change) to avoid
a lookup if possible. The guard stores a strong reference to the key
and the value. If the value is different, the guard check returns
False.
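
(A rough pure-Python sketch of such a guard, with the version counter
simulated by a dict wrapper -- in FAT Python the counter lives in the dict
implementation itself, so this is only an illustration:)

    class VersionedDict(dict):
        # Simulates a per-dict version tag, bumped on every mutation.
        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            self.version = 0
        def __setitem__(self, key, value):
            super().__setitem__(key, value)
            self.version += 1
        def __delitem__(self, key):
            super().__delitem__(key)
            self.version += 1

    class DictKeyGuard:
        # Strong references to the key and the expected value, plus the
        # version seen when the guard was created.
        def __init__(self, d, key):
            self.d, self.key = d, key
            self.value = d[key]
            self.version = d.version
        def check(self):
            if self.d.version == self.version:
                return True                   # nothing changed: skip the lookup
            if self.d.get(self.key) is not self.value:
                return False                  # value changed: use the slow bytecode
            self.version = self.d.version     # same value: remember the new version
            return True

    g = VersionedDict(spam=len)
    guard = DictKeyGuard(g, 'spam')
    g['other'] = 2
    assert guard.check()          # unrelated change, 'spam' is untouched
    g['spam'] = print
    assert not guard.check()      # the specialized code must be abandoned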

I don't understand how you plan to avoid guards. The purpose of guards
is to respect the Python semantics by falling back to the "slow"
bytecode if something changes. So I don't understand your idea of
avoiding guards completely. Again, that's why I guess that it's
unrelated to FAT Python...

Or it's just that your idea is an alternative to dict versioning to
get efficient guards on dict keys, right?

> Guards would also have an issue
> with nested scopes. You have a note on your website about it:
> (https://faster-cpython.readthedocs.org/fat_python.html#call-pure-builtins)
>
> "The optimization is disabled when the builtin function is
> modified or if a variable with the same name is added to the global
> namespace of the function."

FAT Python doesn't emit a specialized version if it requires a builtin
function, but a local variable with the same name is found.

The check is done in the current function but also in the parent
namespaces, up to the global namespace. I'm talking about the static
analysis of the code.

If the specialized version is built, a guard is created on the builtin
namespace and another on the global namespace.

Sorry, I don't understand the "problem" you are referring to. Can you
maybe show an example of code where FAT Python doesn't work or where
your idea can help?

> RefCells "know" whether they're part of a living dict. (The dict marks
> them as such upon its death.) If they are not, they will decref their
> value upon their death. They do not hold a reference to their owner
> dict.

The dict contains the list of all of its "own" RefCell objects, right?

> If it's part of a living dict, we have a choice: the dict can be
> responsible for deletion, or the RefCell can be responsible for
> deletion. It doesn't really matter which design we go with.

I see your RefCell idea like dictionary views. Views are linked to the
dict. If the dict is modified, views are "updated" too.

It would be confusing to have a view disconnected from its container.

In short, RefCell is a view on a single dict entry, to be able to
"watch" a dict entry without the cost of a lookup? And a RefCell can
be created, even if the dict entry doesn't exist right?

Hum, creating many RefCell introduces an overhead somewhere. For
example, deleting a key has to update N RefCell objects linked to this
key, right? So del dict[x] takes O(number of RefCell), right?

> If a dict key is removed, the inner dict will still have the key
> (which is true in the current implementation), but the value will be
> decref'd and the value pointer will be NULL'd. The RefCell will not
> need to be updated, since (as part of a living dict) it's pointing at
> the pointer to the value, not the object itself.

Ok, so deleting a dict key always destroys the value, but the key may
stay alive longer than expected (until all RefCell objects are
destroyed). Usually, dict keys are constants, so keeping them alive
doesn't matter so much. It's rare to have large keys.

> If it is detached from its owner dict (due to owner death), it will
> own a strong reference to the value. This is necessarily the case,
> since things that have a (strong) reference to the RefCell expect to
> find the value (or lack of a value) there.

I don't  understand this part.

You said that deleting a key destroys the value. Destroying a dict
means clearing all keys, so destroying all values. No?

What is the use case of having a RefCell no more connected to the dict?

Victor


Re: [Python-Dev] Idea: Dictionary references

2015-12-17 Thread Franklin? Lee
On Thu, Dec 17, 2015 at 6:53 AM, Victor Stinner
 wrote:
> 2015-12-17 11:54 GMT+01:00 Franklin? Lee :
>> My suggestion should improve *all* function calls which refer to
>> outside names.
>
> Ok, I now think that you should stop hijacking the FAT Python thread.
> I start a new thread. IMHO your dictionary reference idea is completely
> unrelated to FAT Python.
>
> FAT Python is about guards and specialized bytecode.

Yeah, maybe it's out of the scope of bytecode optimization. But I
think this would make guards unnecessary, since once a name is found,
there's a quick way to refer to it.


>> Each function keeps an indirect, automagically updated
>> reference to the current value of the names they use, and will never
>> need to look things up again.[*]
>
> Indirections, nested dictionaries, creation of new "reference"
> objects... IMHO you are going to have major implementation issues :-/
> The design looks *very* complex. I'm quite sure that you are going to
> make namespace lookups *slower*.

The nested dictionaries are only for nested scopes (and inner
functions don't create nested scopes). Nested scopes will already
require multiple lookups in parents. I think this is strictly an
improvement, except perhaps in memory. Guards would also have an issue
with nested scopes. You have a note on your website about it:
(https://faster-cpython.readthedocs.org/fat_python.html#call-pure-builtins)

"The optimization is disabled when the builtin function is
modified or if a variable with the same name is added to the global
namespace of the function."

With a NestedRefCell, it would check globals() (a simple dereference
and `pointer != NULL`) and then check __builtin__. If it finds it in
__builtin__, it will save a reference to that. It will only do
repeated lookups in __builtin__ if each of the previous lookups fail.
As far as I know, these repeated lookups are already necessary, and
anything that can be used to avoid them (e.g. guards) can be used for
repeated failed lookups, too.

For non-nested scopes, it will look things up once, costing an extra
RefCell creation if necessary, and the only other costs are on
resizing, deletion of the dict, and working with a larger dict in
general.

The important parts of the design are pretty much in the code that I
posted. We keep an extra hash table for refs, and keep it the same
size as the original hash table, so that we pay a single lookup cost
to get the index in both.


> It reminds me Python before the with statement and PyPy garbage
> collector. Many applications relied on the exact behaviour of CPython
> garbage collector. For example, they expected that a file is written
> on disk when the last reference to the file object is destroyed. In
> PyPy, it wasn't (it isn't) true, the write can be delayed.

It should not affect the semantic. Things should still happen as they
used to, as far as I can figure. Or at least as far as the rules of
the interpreter are concerned. (That is, values might live a little
longer in PyPy, but can't be forced to live longer than they were
formerly allowed to.)


> I guess that with all your complex machinery for dict lookups,

The only cost to a normal getitem (again, other than from looking it
up in a bigger dict) is to make sure the return value isn't NULL. The
machinery is involved in function creation and resolution of names: On
function creation, get refs to each name used. When the name is used,
try to resolve the refs.


>you
> will have similar issues of object lifetime. It's unclear to me when
> and how "reference" objects are destroyed, nor when dict values are
> destroyed.

RefCells are ref-counted PyObjects. That is not an issue. A RefCell
will live while it is useful (= it has an outside reference) or while
it's not useful but its owner dict hasn't been resized/deleted yet (at
which time RefCells without outside references will be deleted).

RefCells "know" whether they're part of a living dict. (The dict marks
them as such upon its death.) If they are not, they will decref their
value upon their death. They do not hold a reference to their owner
dict.

If it's part of a living dict, we have a choice: the dict can be
responsible for deletion, or the RefCell can be responsible for
deletion. It doesn't really matter which design we go with.


> What happens if a dict key is removed and a reference object is still
> alive? Is the dict value immediatly destroyed? Does the reference
> object contain a strong or a weak reference to the value?

If a dict key is removed, the inner dict will still have the key
(which is true in the current implementation), but the value will be
decref'd and the value pointer will be NULL'd. The RefCell will not
need to be updated, since (as part of a living dict) it's pointing at
the pointer to the value, not the object itself.
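
(In code, the attached vs. detached states would look roughly like this;
Entry stands in for a dict entry slot, the names are made up, and the
refcounting obviously can't be shown in pure Python:)

    class Entry:
        # Stands in for a PyDictKeyEntry slot; value None plays the role of
        # a NULLed value pointer after `del d[key]`.
        __slots__ = ('key', 'value')
        def __init__(self, key, value):
            self.key, self.value = key, value

    class RefCell:
        __slots__ = ('entry', 'value')
        def __init__(self, entry):
            self.entry = entry   # attached: we point at the slot, not the object
            self.value = None

        def resolve(self):
            value = self.entry.value if self.entry is not None else self.value
            if value is None:
                raise KeyError   # key deleted (or never bound)
            return value

        def detach(self):
            # Called by the owning dict when it dies: from now on the cell
            # itself holds the (strong) reference to the current value.
            self.value = self.entry.value
            self.entry = None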

If it is detached from its owner dict (due to owner 

Re: [Python-Dev] Idea: Dictionary references

2015-12-17 Thread Andrew Barnert via Python-Dev
On Dec 17, 2015, at 07:38, Franklin? Lee  wrote:
> 
> The nested dictionaries are only for nested scopes (and inner
> functions don't create nested scopes). Nested scopes will already
> require multiple lookups in parents.

I think I understand what you're getting at here, but it's a really confusing 
use of terminology. In Python, and in programming in general, nested scopes 
refer to exactly inner functions (and classes) being lexically nested and doing 
lookup through outer scopes. The fact that this is optimized at compile time to 
FAST vs. CELL vs. GLOBAL/NAME, cells are optimized at function-creation time, 
and only global and name have to be resolved at the last second doesn't mean 
that there's no scoping, or some other form of scoping besides lexical. The 
actual semantics are LEGB, even if L vs. E vs. GB and E vs. further-out E can 
be optimized.

What you're talking about here is global lookups falling back to builtin 
lookups. There's no more general notion of nesting or scoping involved, so why 
use those words?

Also, reading your earlier post, it sounds like you're trying to treat 
attribute lookup as a special case of global lookup, only with a chain of 
superclasses beyond the class instead of just a single builtins. But they're 
totally different. Class lookup doesn't just look in a series of dicts, it 
calls __getattribute__ which usually calls __getattr__ which may or may not 
look in the __dict__s (which may not even exist) to find a descriptor and then 
calls its __get__ method to get the value. You'd have to somehow handle the 
case where the search only went through object.__getattribute__ and __getattr__ 
and found a result by looking in a dict, to make a RefCell to that dict which 
is marked in some way that says "I'm not a value, I'm a descriptor you have to 
call each time", and then apply some guards that will detect whether that class 
or any intervening class dict touched that key, whether the MRO changed, 
whether that class or any intervening class added or changed implementations 
for __getattribute__ or __getattr__, and probably more things I haven't thought 
of. What do those guards look like? (Also, you need a different set of rules to 
cache, and guard for, special method lookup--you could just ignore that, but I 
think those are the lookups that would benefit most from optimization.)

So, trying to generalize global vs. builtin to a general notion of "nested 
scope" that isn't necessary for builtins and doesn't work for anything else 
seems like overcomplicating things for no benefit.


> I think this is strictly an
> improvement, except perhaps in memory. Guards would also have an issue
> with nested scopes. You have a note on your website about it:
> (https://faster-cpython.readthedocs.org/fat_python.html#call-pure-builtins)


Re: [Python-Dev] Idea: Dictionary references

2015-12-17 Thread Franklin? Lee
On Thu, Dec 17, 2015 at 11:50 AM, Victor Stinner
 wrote:
> I don't understand how you plan to avoid guards. The purpose of guards
> is to respect the Python semantics by falling back to the "slow"
> bytecode if something changes. So I don't understand your idea of
> avoiding guards completely. Again, that's why I guess that it's
> unrelated to FAT Python...

Yeah, I guess it is. Maybe we could've moved to python-ideas. As far
as I understand, this concept can be put into CPython.

(Note to self: Look up how PyPy handles repeated lookups.)


>> Guards would also have an issue
>> with nested scopes. You have a note on your website about it:
>> (https://faster-cpython.readthedocs.org/fat_python.html#call-pure-builtins)
>>
>> "The optimization is disabled when the builtin function is
>> modified or if a variable with the same name is added to the global
>> namespace of the function."
>
> FAT Python doesn't emit a specialized version if it requires a builtin
> function, but a local variable with the same name is found.
>
> The check is done in the current function but also in the parent
> namespaces, up to the global namespace. I'm talking about the static
> analysis of the code.
>
> If the specialized version is built, a guard is created on the builtin
> namespace and another on the global namespace.
>
> Sorry, I don't understand the "problem" you are referring to. Can you
> maybe show an example of code where FAT Python doesn't work or where
> your idea can help?

If we have to look in scopes A, B, C in order, where the name is in C
but never in B, and there's no nesting relationship between B and C.
In that case, I do not create a refcell in B chained to C (because
there's no relationship), so I keep doing lookups in B. That's the
problem. For that, guards and versioning can prevent unnecessary
lookups in B. Though I think a better solution might be: If a name is
found in C, create empty refcells in B and A (i.e. in all previous
dicts).

(There's a nesting relationship between globals() and __builtin__, so
that's fine.)


>> RefCells "know" whether they're part of a living dict. (The dict marks
>> them as such upon its death.) If they are not, they will decref their
>> value upon their death. They do not hold a reference to their owner
>> dict.
>
> The dict contains the list of all of its "own" RefCell objects, right?

It contains a table of pointers. The pointers are to RefCell objects or NULL.

The refs table is exactly the same size as the internal hash table.
This makes indexing it efficient: to find the pointer to a refcell,
find the index of the key in the hash table, then use that SAME index
on the refs table. You never need to find a refcell without also
finding its hash index, so this is cheap.
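
(A toy illustration of that layout -- a single probe finds the index, and
the same index is then used for both tables; resizing and deletion are
ignored here, and the names are made up:)

    class RefCell:
        def __init__(self, owner, index):
            self.owner, self.index = owner, index
        def resolve(self):
            return self.owner._values[self.index]   # no hashing, just an index

    class RefTableDict:
        def __init__(self, size=8):
            self._keys = [None] * size    # stands in for the hash table slots
            self._values = [None] * size
            self._refs = [None] * size    # same size, same indices

        def _index(self, key):
            # Trivial linear probing, just to get a concrete slot index.
            i = hash(key) % len(self._keys)
            while self._keys[i] is not None and self._keys[i] != key:
                i = (i + 1) % len(self._keys)
            return i

        def __setitem__(self, key, value):
            i = self._index(key)
            self._keys[i] = key
            self._values[i] = value       # any refcell at _refs[i] sees the
                                          # new value with no extra work

        def getref(self, key):
            i = self._index(key)          # one probe serves both tables
            self._keys[i] = key           # claim the slot (pre-creation case)
            if self._refs[i] is None:
                self._refs[i] = RefCell(self, i)
            return self._refs[i]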


> In short, RefCell is a view on a single dict entry, to be able to
> "watch" a dict entry without the cost of a lookup? And a RefCell can
> be created, even if the dict entry doesn't exist right?

My "implementation", which had nesting and recursion in mind, had a
"create_if_none" parameter, which meant that the requester can ask for
it to be created even if the key didn't exist in the table.
Pre-creation is useful for functions which refer to globals() names
before they're defined. No-creation is useful in... I can only think
of nesting as a use (globals() -> __builtin__ shouldn't create empty
cells in __builtin__).

See `getref` in here:
https://mail.python.org/pipermail/python-dev/2015-December/142489.html


> Hum, creating many RefCell introduces an overhead somewhere. For
> example, deleting a key has to update N RefCell objects linked to this
> key, right? So del dict[x] takes O(number of RefCell), right?

There are no "outside" updates, except when a dict moves to a
different internal table or deletes its internal table. In that case,
the dict has to move and update each exposed RefCell.

For each dict, for each key, there is at most one RefCell. As long as
the dict is alive, that RefCell will hold a pointer to the pointer to
the value (enforced by the dict). When the dict dies, it makes the
Refcell point to the object, and tells the RefCell it's free (so it's
in charge of cleaning up its value).

Dict entries look like this.

typedef struct {
    /* Cached hash code of me_key. */
    Py_hash_t me_hash;
    PyObject *me_key;
    PyObject *me_value; /* This field is only meaningful for
                           combined tables */
} PyDictKeyEntry;

The internal table (which the ref table will sync with) is an array of
PyDictKeyEntrys.

(Raymond Hettinger made a design with a smaller table, and the hash
lookup would be into an array of indices. This would make synced
metadata tables both easier and smaller. See
https://mail.python.org/pipermail/python-dev/2012-December/123028.html
and latest relevant discussion
https://mail.python.org/pipermail/python-ideas/2015-December/037468.html
)

The refcell will hold this:

RefCell(&entry->me_value)

A pointer to the field, not to the 

Re: [Python-Dev] Idea: Dictionary references

2015-12-17 Thread Franklin? Lee
On Thu, Dec 17, 2015 at 12:30 PM, Andrew Barnert  wrote:
> On Dec 17, 2015, at 07:38, Franklin? Lee  
> wrote:
>>
>> The nested dictionaries are only for nested scopes (and inner
>> functions don't create nested scopes). Nested scopes will already
>> require multiple lookups in parents.
>
> I think I understand what you're getting at here, but it's a really confusing 
> use of terminology. In Python, and in programming in general, nested scopes 
> refer to exactly inner functions (and classes) being lexically nested and 
> doing lookup through outer scopes. The fact that this is optimized at compile 
> time to FAST vs. CELL vs. GLOBAL/NAME, cells are optimized at 
> function-creation time, and only global and name have to be resolved at the 
> last second doesn't mean that there's no scoping, or some other form of 
> scoping besides lexical. The actual semantics are LEGB, even if L vs. E vs. 
> GB and E vs. further-out E can be optimized.

Oh, I've never actually read the Python scoping rules spelled out. I
wasn't sure if there were other cases.

The other two cases I thought of as "nesting" were: object to its
class, and class to its superclasses.

> Also, reading your earlier post, it sounds like you're trying to treat 
> attribute lookup as a special case of global lookup, only with a chain of 
> superclasses beyond the class instead of just a single builtins. But they're 
> totally different. Class lookup doesn't just look in a series of dicts, it 
> calls __getattribute__ which usually calls __getattr__ which may or may not 
> look in the __dict__s (which may not even exist) to find a descriptor and 
> then calls its __get__ method to get the value. You'd have to somehow handle 
> the case where the search only went through object.__getattribute__ and 
> __getattr__ and found a result by looking in a dict, to make a RefCell to 
> that dict which is marked in some way that says "I'm not a value, I'm a 
> descriptor you have to call each time", and then apply some guards that will 
> detect whether that class or any intervening class dict touched that key, 
> whether the MRO changed, whether that class or any intervening class added or 
> changed implementations for __getattribute__ or __getattr__, and probably more 
> things I haven't thought of. What do those guards look like? (Also, you need a 
> different set of rules to cache, and guard for, special method lookup--you could 
> just ignore that, but I think those are the lookups that would benefit most from 
> optimization.)

Doesn't __getattr__ only get called if all the mro __dict__ lookups failed?

I forgot about __getattribute__. That might be the point at which refs
are optimized.

As for descriptors versus RefCells, I'm guessing that can be resolved,
as soon as I figure out how descriptors actually work... If
descriptors don't modify the __dict__, then RefCells shouldn't get
involved. If they do, then there's some unwrapping going on there, and
RefCells should fit right in (though whether they'll improve anything
is a different question). RefCells are just a shortcut for dict
lookups.

For guards, I think Victor Stinner's idea could supplement this.
Alternatively, in my other email, I said there could be a rule of,
"Create intermediate RefCells for anything BEFORE a successful
lookup." So if we look in A, B, C, D, and find it in C, then we create
and save RefCells in A, B, C, but not D (where D = object). This MIGHT
result in a lot of intermediate RefCells, but I'd guess most things
aren't looked up just once, and it's saying, "It's possible for B to
gain member B.x and catch me on my way to C.x."

> So, trying to generalize global vs. builtin to a general notion of "nested 
> scope" that isn't necessary for builtins and doesn't work for anything else 
> seems like overcomplicating things for no benefit.

Probably. The globals() and __builtin__ case is simpler than the class case.


Re: [Python-Dev] Idea: Dictionary references

2015-12-17 Thread Andrew Barnert via Python-Dev
On Thursday, December 17, 2015 11:19 AM, Franklin? Lee 
 wrote:


> ...
> as soon as I figure out how descriptors actually work...


I think you need to learn what LOAD_ATTR and the machinery around it actually 
does before I can explain why trying to optimize it like globals-vs.-builtins 
doesn't make sense. Maybe someone who's better at explaining than me can come 
up with something clearer than the existing documentation, but I can't.


Re: [Python-Dev] Idea: Dictionary references

2015-12-17 Thread Victor Stinner
On Friday, December 18, 2015, Andrew Barnert via Python-Dev
<python-dev@python.org> wrote:
>
> >> Builtins do two dict lookups.
> >
> > Two?
>
> Actually, I guess three: first you fail to find the name in globals, then
> you find __builtins__ in globals, then you find the name in __builtins__ or
> __builtins__.__dict__.
>

Getting builtins from globals is done when the frame object is created, and
the lookup is skipped in the common case, if I recall correctly. LOAD_NAME
does a lookup in the function globals; if the key doesn't exist (the common
case for builtins), a second lookup is done in the frame builtins. Open
Python/ceval.c and see the code. There is an optimisation for a fast lookup
in the two dicts.

Victor


Re: [Python-Dev] Idea: Dictionary references

2015-12-17 Thread Victor Stinner
2015-12-17 23:17 GMT+01:00 Andrew Barnert via Python-Dev
:
> Builtins do two dict lookups.
>
> So, the only thing you can optimize there is builtins. But maybe that's worth 
> it.

FYI I implemented an optimization in FAT Python to avoid lookups for
builtin functions, builtin functions are copied to code constants at
runtime:
https://faster-cpython.readthedocs.org/fat_python.html#copy-builtin-functions-to-constants

It's nothing new, it's the generalization of common hacks, like 'def
func(len=len): return len(3)'.

The optimization is restricted to loading builtin symbols which are
not expected to be modified ;-)

(Right now the optimization is disabled by default, because the
optimizer is unable to detect when builtins are modified in the
current function, and so it breaks the Python semantic.)

> Class attributes (including normal methods, @property, etc.) do two or more 
> dict lookups--first the instance, then the class, then each class on the 
> class's MRO.

Note: Types have an efficient cache for name lookups ;-) Thanks to
this cache, it's no longer an issue to have a deep hierarchy of classes.

Victor


Re: [Python-Dev] Idea: Dictionary references

2015-12-17 Thread Franklin? Lee
I already know that we can't use recursion, because it bypasses MRO.
I'm also not yet sure whether it makes sense to use refs for classes
in the first place.

As I understood it, an attribute will resolve in this order:
- __getattribute__ up the MRO. (raises AttributeError)
- __dict__ up the MRO. (raises KeyError)
- __getattr__ up the MRO. (raises AttributeError)


My new understanding:
- __getattribute__. (raises AttributeError)
    - (default implementation:) __dict__.__getitem__. (raises KeyError)
- __getattr__ up the MRO. (raises AttributeError)

If this is the case, then (the default) __getattribute__ will be
making the repeated lookups, and might be the one requesting the
refcells (for the ones it wants).


Descriptors seem to be implemented as:
Store a Descriptor object as an attribute. When a Descriptor is
accessed, if it is being accessed from its owner, then unbox it and
use its methods. Otherwise, it's a normal attribute.

Then Descriptors are in the dict, so MIGHT benefit from refcells. The
memory cost might be higher, though.


On Thu, Dec 17, 2015 at 5:17 PM, Andrew Barnert  wrote:
> On Dec 17, 2015, at 13:37, Andrew Barnert via Python-Dev 
>  wrote:
>>
>> On Thursday, December 17, 2015 11:19 AM, Franklin? Lee 
>>  wrote:
>>
>>
>>> ...
>>> as soon as I figure out how descriptors actually work...
>>
>>
>> I think you need to learn what LOAD_ATTR and the machinery around it 
>> actually does before I can explain why trying to optimize it like 
>> globals-vs.-builtins doesn't make sense. Maybe someone who's better at 
>> explaining than me can come up with something clearer than the existing 
>> documentation, but I can't.
>
> I take that back. First, it was harsher than I intended. Second, I think I 
> can explain things.

I appreciate it! Tracking function definitions in the source can make
one want to do something else.


> First, for non-attribute lookups:
>
> (Non-shared) locals just load and save from an array.
>
> Free variables and shared locals load and save by going through an extra 
> dereference on a cell object in an array.

In retrospect, of course they do.

It sounds like the idea is what's already used there, except the refs
are synced to the locals array instead of a hash table.


> Globals do a single dict lookup.

A single dict lookup per function definition per name used? That's
what I'm proposing.

For example, (and I only just remembered that comprehensions and gen
expressions create scope)

[f(x) for x in range(1)]

would look up the name `f` at most twice (once in globals(), once in
builtins() if needed), and will always have the latest version of `f`.

And if it's in a function, the refcell(s) would be saved by the function.


> Builtins do two dict lookups.

Two?


> Class attributes (including normal methods, @property, etc.) do two or more 
> dict lookups--first the instance, then the class, then each class on the 
> class's MRO. Then, if the result has a __get__ method, it's called with the 
> instance and class to get the actual value. This is how bound methods get 
> created, property lookup functions get called, etc. The result of the 
> descriptor call can't get cached (that would mean, for example, that every 
> time you access the same @property on an instance, you'd get the same value).

Yeah, I would only try to save in a dict lookup to get the descriptor,
and I'm not sure it's worth it.

(Victor's response says that class attributes are already efficient, though.)


> So, if the globals->builtins optimization is worth doing, don't tie it to 
> another optimization that's much more complicated and less useful like this, 
> or we'll never get your simple and useful idea.

Sure. I couldn't figure out where to even save the refcells for
attributes, so I only really saw an opportunity for name lookups.
Since locals and nonlocals don't require dict lookups, this means
globals() and __builtin__.


Re: [Python-Dev] Idea: Dictionary references

2015-12-17 Thread Andrew Barnert via Python-Dev
On Dec 17, 2015, at 13:37, Andrew Barnert via Python-Dev 
 wrote:
> 
> On Thursday, December 17, 2015 11:19 AM, Franklin? Lee 
>  wrote:
> 
> 
>> ...
>> as soon as I figure out how descriptors actually work...
> 
> 
> I think you need to learn what LOAD_ATTR and the machinery around it actually 
> does before I can explain why trying to optimize it like globals-vs.-builtins 
> doesn't make sense. Maybe someone who's better at explaining than me can come 
> up with something clearer than the existing documentation, but I can't.

I take that back. First, it was harsher than I intended. Second, I think I can 
explain things.

First, for non-attribute lookups:

(Non-shared) locals just load and save from an array.

Free variables and shared locals load and save by going through an extra 
dereference on a cell object in an array.

Globals do a single dict lookup.

Builtins do two dict lookups.

So, the only thing you can optimize there is builtins. But maybe that's worth 
it.
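
(For illustration, roughly what that fallback looks like at the Python level,
ignoring the __builtins__-in-globals indirection and the C details;
load_global here is just a made-up helper, not the real thing:)

    import builtins

    def load_global(frame_globals, name):
        # One dict lookup for a global...
        try:
            return frame_globals[name]
        except KeyError:
            pass
        # ...and a second lookup when it falls back to builtins.
        try:
            return vars(builtins)[name]
        except KeyError:
            raise NameError(name)

    print(load_global(globals(), 'len'))   # falls through to builtins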

Next, for attribute lookups (not counting special methods):

Everything calls __getattribute__. Assuming that's not overridden and uses the 
object implementation:

Instance attributes do one dict lookup.

Class attributes (including normal methods, @property, etc.) do two or more 
dict lookups--first the instance, then the class, then each class on the 
class's MRO. Then, if the result has a __get__ method, it's called with the 
instance and class to get the actual value. This is how bound methods get 
created, property lookup functions get called, etc. The result of the 
descriptor call can't get cached (that would mean, for example, that every time 
you access the same @property on an instance, you'd get the same value).

Dynamic attributes from a __getattr__ do all that plus whatever __getattr__ 
does.

If __getattribute__ is overloaded, it's entirely up to that implementation to 
do whatever it wants.

Things are similar for set and del: they call __setattr__/__delattr__, and the 
default versions of those look in the instance dict first, then look for a 
descriptor the same as with get except that they call a different method on the 
descriptor (and if it's not a descriptor, instead of using it, they ignore it 
and go back to the instance dict).

So, your mechanism can't significantly speed up method lookups, properties, or 
most other things. It could speed up lookups for class attributes that aren't 
descriptors, but only at the cost of increasing the size of every instance--and 
how often do those matter anyway?

A different mechanism that cached references to descriptors instead of to the 
resulting attributes could speed up method lookups, etc., but only by a very 
small amount, and with the same space cost.

A mechanism that didn't try to get involved with the instance dict, and just 
flattened out the MRO search once that failed (and was out of the way before 
the descriptor call or __getattr__ even entered the picture) might speed 
methods up in deeply nested hierarchies, and with only a per-class rather than 
a per-instance space cost. But how often do you have deeply-nested hierarchies? 
And the speedup still isn't going to be that big: You're basically turning 5 
dict lookups plus 2 method calls into 2 dict lookups plus 2 method calls. And 
it would still be much harder to guard than the globals dict: if any superclass 
changes its __bases__ or adds or removes a __getattribute__ or various other 
things, all of your references have to get re-computed. That's rare enough that 
the speed may not matter, but the code complexity probably does.

In short: if you can't cache the bound methods (and as far as I can tell, in 
general you can't--even though 99% of the time it would work), I don't think 
there's any other significant win here. 

So, if the globals->builtins optimization is worth doing, don't tie it to 
another optimization that's much more complicated and less useful like this, or 
we'll never get your simple and useful idea. 



Re: [Python-Dev] Idea: Dictionary references

2015-12-17 Thread Franklin? Lee
On Thu, Dec 17, 2015 at 6:41 PM, Franklin? Lee
 wrote:
> Then Descriptors are in the dict, so MIGHT benefit from refcells. The
> memory cost might be higher, though.

Might be worse than the savings, I mean.


Re: [Python-Dev] Idea: Dictionary references

2015-12-17 Thread Andrew Barnert via Python-Dev
On Dec 17, 2015, at 15:41, Franklin? Lee  wrote:
> 
> I already know that we can't use recursion, because it bypasses MRO.
> I'm also not yet sure whether it makes sense to use refs for classes
> in the first place.
> 
> As I understood it, an attribute will resolve in this order:
> - __getattribute__ up the MRO. (raises AttributeError)
> - __dict__ up the MRO. (raises KeyError)
> - __getattr__ up the MRO. (raises AttributeError)
> 
> 
> My new understanding:
> - __getattribute__. (raises AttributeError)
>     - (default implementation:) __dict__.__getitem__. (raises KeyError)
> - __getattr__ up the MRO. (raises AttributeError)

No, still completely wrong.

If __getattribute__ raises an AttributeError (or isn't found, but that only 
happens in special cases like somehow calling a method on a type that hasn't 
been constructed), that's the end of the line; there's no fallback, and 
everything else (IIRC: searching MRO dicts for data descriptors, searching the 
instance dict, searching MRO dicts for non-data descriptors or non-descriptors, 
special-method-lookup-and-call __getattr__, raise AttributeError... and then 
doing the appropriate descriptor call at the end if needed) happens inside the 
default __getattribute__ itself.
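
(For reference, a drastically simplified pure-Python rendition of that default
order; it ignores metaclasses, __slots__ subtleties and most error handling,
and simplified_getattribute is of course not the real code:)

    _MISSING = object()

    def simplified_getattribute(obj, name):
        cls = type(obj)
        # 1. Find the attribute, if any, in the MRO dicts.
        meta_attr = _MISSING
        for klass in cls.__mro__:
            if name in vars(klass):
                meta_attr = vars(klass)[name]
                break
        # 2. A data descriptor on the class wins over the instance dict.
        if meta_attr is not _MISSING and hasattr(type(meta_attr), '__set__'):
            return type(meta_attr).__get__(meta_attr, obj, cls)
        # 3. Then the instance dict.
        inst_dict = getattr(obj, '__dict__', {})
        if name in inst_dict:
            return inst_dict[name]
        # 4. Then a non-data descriptor (bound methods come from here) or a
        #    plain class attribute.
        if meta_attr is not _MISSING:
            if hasattr(type(meta_attr), '__get__'):
                return type(meta_attr).__get__(meta_attr, obj, cls)
            return meta_attr
        # 5. Only now does __getattr__ get a chance (looked up on the type).
        getter = getattr(cls, '__getattr__', None)
        if getter is not None:
            return getter(obj, name)
        raise AttributeError(name)

    print(simplified_getattribute([1, 2], 'append'))   # a bound method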

I was going to say that the only custom __getattribute__ you'll find in 
builtins or stdlib is on type, which does the exact same thing except when it 
calls a descriptor it does __get__(None, cls) instead of __get__(obj, 
type(obj)), and if you find any third-party __getattribute__ you should just 
assume it's going to do something crazy and don't bother trying to help it. But 
then I remembered that super must have a custom __getattribute__, so... you'd 
probably need to search the code for others.

> If this is the case, then (the default) __getattribute__ will be
> making the repeated lookups, and might be the one requesting the
> refcells (for the ones it wants).

Yes, the default and type __getattribute__ are what you'd want to optimize, if 
anything. And maybe special-method lookup.

> Descriptors seem to be implemented as:
>Store a Descriptor object as an attribute. When a Descriptor is
> accessed, if it is being accessed from its owner, then unbox it and
> use its methods. Otherwise, it's a normal attribute.

Depending on what you mean by "owner", I think you have that backward. If the 
instance itself stores a descriptor, it's just used as itself; if the 
instance's _type_ (or a supertype) stores one, it's called to get the instance 
attribute.

> Then Descriptors are in the dict, so MIGHT benefit from refcells. The
> memory cost might be higher, though.

Same memory cost. They're just objects whose type's dicts happen to have a 
__get__ method (just like iterables are just objects whose type's dicts happen 
to have an __iter__ method). The point is that you can't cache the result of 
the descriptor call, you can cache the descriptor itself but it will rarely 
help, and the builtin method cache probably already takes care of 99% of the 
cases where it would help, so I don't see what you're going to get here.

>> On Thu, Dec 17, 2015 at 5:17 PM, Andrew Barnert  wrote:
>>> On Dec 17, 2015, at 13:37, Andrew Barnert via Python-Dev 
>>>  wrote:
>>> 
>>> On Thursday, December 17, 2015 11:19 AM, Franklin? Lee 
>>>  wrote:
>>> 
>>> 
 ...
 as soon as I figure out how descriptors actually work...
>>> 
>>> 
>>> I think you need to learn what LOAD_ATTR and the machinery around it 
>>> actually does before I can explain why trying to optimize it like 
>>> globals-vs.-builtins doesn't make sense. Maybe someone who's better at 
>>> explaining than me can come up with something clearer than the existing 
>>> documentation, but I can't.
>> 
>> I take that back. First, it was harsher than I intended. Second, I think I 
>> can explain things.
> 
> I appreciate it! Tracking function definitions in the source can make
> one want to do something else.

The documentation is pretty good for this stuff (and getting better every 
year). You mainly want the data model chapter of the reference and the 
descriptor howto guide; the dis and inspect docs in the library can also be 
helpful. Together they'll answer most of what you need.

If they don't, maybe I will try to write up an explanation as a blog post, but 
I don't think it needs to get sent to the list (except for the benefit of core 
devs calling me out of I screw up, but they have better things to do with their 
time).

>> First, for non-attribute lookups:
>> 
>> (Non-shared) locals just load and save from an array.
>> 
>> Free variables and shared locals load and save by going through an extra 
>> dereference on a cell object in an array.
> 
> In retrospect, of course they do.
> 
> It sounds like the idea is what's already used there, except the refs
> are synced to the locals array instead of a hash table.

Yes, which is already faster than what you want to do.

More 

Re: [Python-Dev] Idea: Dictionary references

2015-12-17 Thread Chris Angelico
On Fri, Dec 18, 2015 at 2:37 PM, Andrew Barnert via Python-Dev
 wrote:
> If __getattribute__ raises an AttributeError (or isn't found, but that only 
> happens in special cases like somehow calling a method on a type that hasn't 
> been constructed)

Wow. How do you do that? Is it possible with pure Python?

ChrisA


Re: [Python-Dev] Idea: Dictionary references

2015-12-17 Thread Victor Stinner
2015-12-17 11:54 GMT+01:00 Franklin? Lee :
> My suggestion should improve *all* function calls which refer to
> outside names.

Ok, I now think that you should stop hijacking the FAT Python thread.
I start a new thread. IMHO your dictionary reference idea is completely
unrelated to FAT Python.

FAT Python is about guards and specialized bytecode.

> Each function keeps an indirect, automagically updated
> reference to the current value of the names they use, and will never
> need to look things up again.[*]

Indirections, nested dictionaries, creation of new "reference"
objects... IMHO you are going to have major implementation issues :-/
The design looks *very* complex. I'm quite sure that you are going to
make namespace lookups *slower*.

It reminds me Python before the with statement and PyPy garbage
collector. Many applications relied on the exact behaviour of CPython
garbage collector. For example, they expected that a file is written
on disk when the last reference to the file object is destroyed. In
PyPy, it wasn't (it isn't) true, the write can be delayed.

I guess that with all your complex machinery for dict lookups, you
will have similar issues of object lifetime. It's unclear to me when
and how "reference" objects are destroyed, nor when dict values are
destroyed.

What happens if a dict key is removed and a reference object is still
alive? Is the dict value immediatly destroyed? Does the reference
object contain a strong or a weak reference to the value?

Victor