Re: [Python-Dev] Idea: Dictionary references

2015-12-17 Thread Maciej Fijalkowski
You can very easily implement this with version tags on the globals
dictionaries - meaning that the dictionaries have versions, and the guard
checking whether everything is OK just checks the version tag on globals.
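The version-tag idea can be sketched in pure Python. Everything below is illustrative (`VersionedDict` and `GlobalGuard` are invented names; the real mechanism would live inside the C dict implementation): the dict bumps an integer on every mutation, and the guard compares that integer instead of repeating the lookup.

```python
class VersionedDict(dict):
    """Toy dict whose version tag is bumped on every mutation."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = 0

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version += 1

    def __delitem__(self, key):
        super().__delitem__(key)
        self.version += 1


class GlobalGuard:
    """Guard on one key: the fast path compares a single integer."""

    def __init__(self, namespace, key):
        self.namespace = namespace
        self.key = key
        self.cached_value = namespace[key]
        self.cached_version = namespace.version

    def check(self):
        if self.namespace.version == self.cached_version:
            return True          # nothing in the dict changed at all
        if self.namespace.get(self.key) is self.cached_value:
            self.cached_version = self.namespace.version
            return True          # dict changed, but not our key
        return False             # our key changed: deoptimize
```

The fast path is one integer comparison; only when the version changed does the guard fall back to a single lookup to see whether its own key was affected.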

Generally speaking, such optimizations have been done in the past
(even in places like PyPy, but also in the literature), and as soon as
you have dynamic compilation (and FAT is a form of it), you can do such
tricks.

On Thu, Dec 17, 2015 at 3:48 PM, Steven D'Aprano  wrote:
> On Thu, Dec 17, 2015 at 12:53:13PM +0100, Victor Stinner quoted:
>> 2015-12-17 11:54 GMT+01:00 Franklin? Lee :
>
>> > Each function keeps an indirect, automagically updated
>> > reference to the current value of the names they use,
>
> Isn't that a description of globals()? If you want to look up a name
> "spam", you grab an indirect reference to it:
>
> globals()["spam"]
>
> which returns the current value of the name "spam".
>
>
>> > and will never need to look things up again.[*]
>
> How will this work?
>
> Naively, it sounds to me like Franklin is suggesting that on every
> global assignment, the interpreter will have to touch every single
> function in the module to update that name. Something like this:
>
> # on a global assignment
> spam = 23
>
> # the interpreter must do something like this:
> for function in module.list_of_functions:
>     if "spam" in function.__code__.__global_names__:
>         function.__code__.__global_names__["spam"] = spam
>
> As I said, that's a very naive way to implement this. Unless you have
> something much cleverer, I think this will be horribly slow.
>
> And besides, you *still* need to deal with the case that the name isn't
> a global at all, but in the built-ins namespace.
>
>
> --
> Steve
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: 
> https://mail.python.org/mailman/options/python-dev/fijall%40gmail.com


Re: [Python-Dev] Idea: Dictionary references

2015-12-17 Thread Steven D'Aprano
On Thu, Dec 17, 2015 at 12:53:13PM +0100, Victor Stinner quoted:
> 2015-12-17 11:54 GMT+01:00 Franklin? Lee :

> > Each function keeps an indirect, automagically updated
> > reference to the current value of the names they use, 

Isn't that a description of globals()? If you want to look up a name 
"spam", you grab an indirect reference to it:

globals()["spam"]

which returns the current value of the name "spam".


> > and will never need to look things up again.[*]

How will this work?

Naively, it sounds to me like Franklin is suggesting that on every 
global assignment, the interpreter will have to touch every single 
function in the module to update that name. Something like this:

# on a global assignment
spam = 23

# the interpreter must do something like this:
for function in module.list_of_functions:
    if "spam" in function.__code__.__global_names__:
        function.__code__.__global_names__["spam"] = spam

As I said, that's a very naive way to implement this. Unless you have 
something much cleverer, I think this will be horribly slow.

And besides, you *still* need to deal with the case that the name isn't 
a global at all, but in the built-ins namespace.


-- 
Steve


[Python-Dev] pypi simple index

2015-12-17 Thread Carlos Barera
Hi,

I'm using install_requires in setup.py to specify a specific package my
project is dependent on.
When running python setup.py install, apparently the simple index is used,
as an older package is taken from PyPI, while on
https://pypi.python.org/pypi there's a newer package.
When installing directly using pip, the latest package is installed
successfully.

Several questions:
1. What's the difference between the pypi simple index and the general pypi
index?
2. Why is setup.py defaulting to the simple index?
3. How can I make the setup.py-triggered install use the main pypi index
instead of simple?

Thanks!
Carlos


Re: [Python-Dev] Idea: Dictionary references

2015-12-17 Thread Franklin? Lee
(Previous thread was here, by the way:
https://mail.python.org/pipermail/python-dev/2015-December/142437.html)

On Thu, Dec 17, 2015 at 8:48 AM, Steven D'Aprano  wrote:
> On Thu, Dec 17, 2015 at 12:53:13PM +0100, Victor Stinner quoted:
>> 2015-12-17 11:54 GMT+01:00 Franklin? Lee :
>
>> > Each function keeps an indirect, automagically updated
>> > reference to the current value of the names they use,
>
> Isn't that a description of globals()? If you want to look up a name
> "spam", you grab an indirect reference to it:
>
> globals()["spam"]
>
> which returns the current value of the name "spam".

The *current value*. I'm proposing that we instead do this:

spamref = globals().getref('spam')

Every time we want to find the current, updated value of 'spam', we just do

spam = spamref.resolve()

which will skip the hash lookup and go directly to the value.
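A toy model of the proposed API, to make the semantics concrete. `getref` and `resolve` are the proposal's names, not anything that exists in CPython; this sketch mirrors values into cells on assignment, whereas the real proposal would point each cell directly at the dict's value slot so no mirroring is needed.

```python
class RefCell:
    """Sketch of the proposed refcell: a stable handle to one name."""
    __slots__ = ('value',)

    def resolve(self):
        # Directly return the cached value; no hash lookup.
        return self.value


class RefDict(dict):
    """Toy namespace dict handing out refcells via getref()."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._cells = {}

    def getref(self, key):
        cell = self._cells.get(key)
        if cell is None:
            cell = self._cells[key] = RefCell()
            cell.value = super().get(key)  # None if not yet defined
        return cell

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        cell = self._cells.get(key)
        if cell is not None:
            cell.value = value  # keep the outstanding cell in sync
```

Note that `getref` also works before the name is defined, matching the idea that a function can take out refs to globals it will only see bound later.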

>> > and will never need to look things up again.[*]
>
> How will this work?
>
> Naively, it sounds to me like Franklin is suggesting that on every
> global assignment, the interpreter will have to touch every single
> function in the module to update that name.

A refcell holds a pointer to the location in the dict itself where the
value pointer is. When the value is updated in the dict, the refcell
will not need to be updated.

My original proposal wanted to keep cells in the "real" dict, and
update them. Like so:

class RefCell:
    __slots__ = ['value']

_NULL = object()  # sentinel standing in for a NULLed value pointer

class ScopeDict(dict):
    def __getitem__(self, key):
        value = dict.__getitem__(self, key).value  # may raise KeyError
        if value is _NULL:
            raise KeyError(key)
        return value

    def __setitem__(self, key, value):
        if dict.__contains__(self, key):
            dict.__getitem__(self, key).value = value
        else:
            cell = RefCell()
            cell.value = value
            dict.__setitem__(self, key, cell)

    def __delitem__(self, key):
        cell = dict.__getitem__(self, key)  # may raise KeyError
        if cell.value is _NULL:
            raise KeyError(key)
        cell.value = _NULL  # key and cell stay; only the value is cleared


I realized later that this isn't necessary. Most dict operations don't
need to know about the indirection, so I make the inner dict a normal
dict (with a few more holes than normal). But this would show how you
can avoid manual updates for references.

> And besides, you *still* need to deal with the case that the name isn't
> a global at all, but in the built-ins namespace.

globals() to __builtin__ is a nesting relationship. At the bottom of
the following email, I have a pseudocode implementation which knows
how to deal with nested scopes.
https://mail.python.org/pipermail/python-dev/2015-December/142489.html


Re: [Python-Dev] Idea: Dictionary references

2015-12-17 Thread Victor Stinner
2015-12-17 16:38 GMT+01:00 Franklin? Lee :
> Yeah, maybe it's out of the scope of bytecode optimization. But I
> think this would make guards unnecessary, since once a name is found,
> there's a quick way to refer to it.

FAT Python requires and supports various kinds of guards:
* type of a function argument
* dictionary key
* function (func.__code__)

I guess that you are talking about the dictionary key guards. In fact,
there are 4 subtypes of dict guards:
* guard on builtins.__dict__['key']
* guard on globals()['key']
* guard on MyClass.__dict__['key']
* guard on dict['key']

The implementation of this guard's check currently relies on a new
"version" (a global dict version, incremented at each change) to
avoid a lookup if possible. The guard stores a strong reference to the
key and the value. If the value is different, the guard check returns
False.

I don't understand how you plan to avoid guards. The purpose of guards
is to respect the Python semantics by falling back to the "slow"
bytecode if something changes. So I don't understand your idea of
avoiding guards completely. Again, that's why I guess that it's
unrelated to FAT Python...

Or is it just that your idea is an alternative to dict versioning to
get efficient guards on dict keys?

> Guards would also have an issue
> with nested scopes. You have a note on your website about it:
> (https://faster-cpython.readthedocs.org/fat_python.html#call-pure-builtins)
>
> "The optimization is disabled when the builtin function is
> modified or if a variable with the same name is added to the global
> namespace of the function."

FAT Python doesn't emit a specialized version if it requires a builtin
function, but a local variable with the same name is found.

The check is done in the current function but also in the parent
namespaces, up to the global namespace. I'm talking about the static
analysis of the code.

If the specialized version is built, a guard is created on the builtin
namespace and another on the global namespace.

Sorry, I don't understand the "problem" you are referring to. Can you
maybe show an example of code where FAT Python doesn't work or where
your idea can help?

> RefCells "know" whether they're part of a living dict. (The dict marks
> them as such upon its death.) If they are not, they will decref their
> value upon their death. They do not hold a reference to their owner
> dict.

The dict contains the list of all of its "own" RefCell objects, right?

> If it's part of a living dict, we have a choice: the dict can be
> responsible for deletion, or the RefCell can be responsible for
> deletion. It doesn't really matter which design we go with.

I see your RefCell idea like dictionary views. Views are linked to the
dict. If the dict is modified, views are "updated" too.

It would be confusing to have a view disconnected from its container.

In short, RefCell is a view on a single dict entry, to be able to
"watch" a dict entry without the cost of a lookup? And a RefCell can
be created, even if the dict entry doesn't exist right?

Hum, creating many RefCells introduces an overhead somewhere. For
example, deleting a key has to update N RefCell objects linked to this
key, right? So del dict[x] takes O(number of RefCells), right?

> If a dict key is removed, the inner dict will still have the key
> (which is true in the current implementation), but the value will be
> decref'd and the value pointer will be NULL'd. The RefCell will not
> need to be updated, since (as part of a living dict) it's pointing at
> the pointer to the value, not the object itself.

Ok, so deleting a dict key always destroys the value, but the key may
stay alive longer than expected (until all RefCell objects are
destroyed). Usually, dict keys are constants, so keeping them alive
doesn't matter so much. It's rare to have large keys.

> If it is detached from its owner dict (due to owner death), it will
> own a strong reference to the value. This is necessarily the case,
> since things that have a (strong) reference to the RefCell expect to
> find the value (or lack of a value) there.

I don't understand this part.

You said that deleting a key destroys the value. Destroying a dict
means clearing all keys, so destroying all values. No?

What is the use case of having a RefCell no more connected to the dict?

Victor


Re: [Python-Dev] Idea: Dictionary references

2015-12-17 Thread Franklin? Lee
On Thu, Dec 17, 2015 at 6:53 AM, Victor Stinner
 wrote:
> 2015-12-17 11:54 GMT+01:00 Franklin? Lee :
>> My suggestion should improve *all* function calls which refer to
>> outside names.
>
> Ok, I now think that you should stop hijacking the FAT Python thread.
> I start a new thread. IMHO your dictionary reference idea is completely
> unrelated to FAT Python.
>
> FAT Python is about guards and specialized bytecode.

Yeah, maybe it's out of the scope of bytecode optimization. But I
think this would make guards unnecessary, since once a name is found,
there's a quick way to refer to it.


>> Each function keeps an indirect, automagically updated
>> reference to the current value of the names they use, and will never
>> need to look things up again.[*]
>
> Indirections, nested dictionaries, creation of new "reference"
> objects... IMHO you are going to have major implementation issues :-/
> The design looks *very* complex. I'm quite sure that you are going to
> make namespace lookups *slower*.

The nested dictionaries are only for nested scopes (and inner
functions don't create nested scopes). Nested scopes will already
require multiple lookups in parents. I think this is strictly an
improvement, except perhaps in memory. Guards would also have an issue
with nested scopes. You have a note on your website about it:
(https://faster-cpython.readthedocs.org/fat_python.html#call-pure-builtins)

"The optimization is disabled when the builtin function is
modified or if a variable with the same name is added to the global
namespace of the function."

With a NestedRefCell, it would check globals() (a simple dereference
and `pointer != NULL`) and then check __builtin__. If it finds it in
__builtin__, it will save a reference to that. It will only do
repeated lookups in __builtin__ if each of the previous lookups fail.
As far as I know, these repeated lookups are already necessary, and
anything that can be used to avoid them (e.g. guards) can be used for
repeated failed lookups, too.
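A rough sketch of how such a NestedRefCell could behave. All class and attribute names here are hypothetical, and the cached builtin cell is a simple snapshot rather than a live slot pointer into `__builtin__`:

```python
class RefCell:
    """A cell whose value is None until the name is bound."""
    __slots__ = ('value',)

    def __init__(self, value=None):
        self.value = value


class NestedRefCell:
    """Hypothetical chained cell: check globals() first, fall back to
    builtins, and cache the builtin result once a lookup succeeds."""

    def __init__(self, global_cell, builtins_ns, name):
        self.global_cell = global_cell   # pre-created cell in globals()
        self.builtin_cell = None         # filled in lazily
        self.builtins_ns = builtins_ns
        self.name = name

    def resolve(self):
        value = self.global_cell.value       # simple deref + NULL check
        if value is not None:
            return value
        if self.builtin_cell is not None:    # found before: no lookup
            return self.builtin_cell.value
        try:
            # Repeated lookups happen only while all previous ones failed.
            builtin_value = self.builtins_ns[self.name]
        except KeyError:
            raise NameError(self.name) from None
        self.builtin_cell = RefCell(builtin_value)
        return builtin_value
```

If a global with the same name later appears, the globals cell becomes non-empty and shadows the cached builtin, which matches the semantics the quoted note is worried about.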

For non-nested scopes, it will look things up once, costing an extra
RefCell creation if necessary, and the only other costs are on
resizing, deletion of the dict, and working with a larger dict in
general.

The important parts of the design are pretty much in the code that I
posted. We keep an extra hash table for refs, and keep it the same
size as the original hash table, so that we pay a single lookup cost
to get the index in both.


> It reminds me Python before the with statement and PyPy garbage
> collector. Many applications relied on the exact behaviour of CPython
> garbage collector. For example, they expected that a file is written
> on disk when the last reference to the file object is destroyed. In
> PyPy, it wasn't (it isn't) true, the write can be delayed.

It should not affect the semantic. Things should still happen as they
used to, as far as I can figure. Or at least as far as the rules of
the interpreter are concerned. (That is, values might live a little
longer in PyPy, but can't be forced to live longer than they were
formerly allowed to.)


> I guess that with all your complex machinery for dict lookups,

The only cost to a normal getitem (again, other than from looking it
up in a bigger dict) is to make sure the return value isn't NULL. The
machinery is involved in function creation and resolution of names: On
function creation, get refs to each name used. When the name is used,
try to resolve the refs.


> you
> will have similar issues of object lifetime. It's unclear to me when
> and how "reference" objects are destroyed, nor when dict values are
> destroyed.

RefCells are ref-counted PyObjects. That is not an issue. A RefCell
will live while it is useful (= it has an outside reference) or while
it's not useful but its owner dict hasn't been resized/deleted yet (at
which time RefCells without outside references will be deleted).

RefCells "know" whether they're part of a living dict. (The dict marks
them as such upon its death.) If they are not, they will decref their
value upon their death. They do not hold a reference to their owner
dict.

If it's part of a living dict, we have a choice: the dict can be
responsible for deletion, or the RefCell can be responsible for
deletion. It doesn't really matter which design we go with.


> What happens if a dict key is removed and a reference object is still
> alive? Is the dict value immediatly destroyed? Does the reference
> object contain a strong or a weak reference to the value?

If a dict key is removed, the inner dict will still have the key
(which is true in the current implementation), but the value will be
decref'd and the value pointer will be NULL'd. The RefCell will not
need to be updated, since (as part of a living dict) it's pointing at
the pointer to the value, not the object itself.

If it is detached from its owner dict (due to owner death), it will
own a strong reference to the value. This is necessarily the case,
since things that have a (strong) reference to the RefCell expect to
find the value (or lack of a value) there.

[Python-Dev] pypi simple index

2015-12-17 Thread Carlos Barera
Hi,

I'm using install_requires in setup.py to specify a specific package my
project is dependent on.
When running python setup.py install, apparently the simple index is used,
as an older package is taken from PyPI, while on
https://pypi.python.org/pypi there's a newer package.
When installing directly using pip, the latest package is installed
successfully.
I noticed that the new package is only available as a wheel and older
versions of setup tools won't install wheels for install_requires.
However, upgrading setuptools didn't help.

Several questions:
1. What's the difference between the pypi simple index and the general pypi
index?
2. Why is setup.py defaulting to the simple index?
3. How can I make the setup.py-triggered install use the main pypi index
instead of simple?

Thanks!


Re: [Python-Dev] pypi simple index

2015-12-17 Thread Brett Cannon
PyPI questions are best directed towards the distutils-sig as they manage
PyPI and not python-dev.

On Thu, 17 Dec 2015 at 08:20 Carlos Barera  wrote:

> Hi,
>
> I'm using install_requires in setup.py to specify a specific package my
> project is dependent on.
> When running python setup.py install, apparently the simple index is used
> as an older package is taken from pypi. While in
> https://pypi.python.org/pypi, there's a newer package.
> When installing directly using pip, the latest package is installed
> successfully.
>
> Several questions:
> 1. What's the difference between the pypi simple index and the general
> pypi index?
> 2. Why is setup.py defaulting to the simple index?
> 3. How can I make the setup.py triggered install use the main pypi index
> instead of simple
>
> Thanks!
> Carlos


Re: [Python-Dev] Idea: Dictionary references

2015-12-17 Thread Andrew Barnert via Python-Dev
On Dec 17, 2015, at 07:38, Franklin? Lee  wrote:
> 
> The nested dictionaries are only for nested scopes (and inner
> functions don't create nested scopes). Nested scopes will already
> require multiple lookups in parents.

I think I understand what you're getting at here, but it's a really confusing 
use of terminology. In Python, and in programming in general, nested scopes 
refer to exactly inner functions (and classes) being lexically nested and doing 
lookup through outer scopes. The fact that this is optimized at compile time to 
FAST vs. CELL vs. GLOBAL/NAME, cells are optimized at function-creation time, 
and only global and name have to be resolved at the last second doesn't mean 
that there's no scoping, or some other form of scoping besides lexical. The 
actual semantics are LEGB, even if L vs. E vs. GB and E vs. further-out E can 
be optimized.

What you're talking about here is global lookups falling back to builtin 
lookups. There's no more general notion of nesting or scoping involved, so why 
use those words?

Also, reading your earlier post, it sounds like you're trying to treat 
attribute lookup as a special case of global lookup, only with a chain of 
superclasses beyond the class instead of just a single builtins. But they're 
totally different. Class lookup doesn't just look in a series of dicts, it 
calls __getattribute__ which usually calls __getattr__ which may or may not 
look in the __dict__s (which may not even exist) to find a descriptor and then 
calls its __get__ method to get the value. You'd have to somehow handle the 
case where the search only went through object.__getattribute__ and __getattr__ 
and found a result by looking in a dict, to make a RefCell to that dict which 
is marked in some way that says "I'm not a value, I'm a descriptor you have to 
call each time", and then apply some guards that will detect whether that class 
or any intervening class dict touched that key, whether the MRO changed, 
whether that class or any intervening class added or changed implementations for
__getattribute__ or __getattr__, and probably more things I haven't thought 
of. What do those guards look like? (Also, you need a different set of rules to 
cache, and guard for, special method lookup--you could just ignore that, but I 
think those are the lookups that would benefit most from optimization.)
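To make this machinery concrete, here is a much-simplified model of `object.__getattribute__`, ignoring metaclasses, `__slots__`, the `__getattr__` fallback, and special-method lookup (a sketch, not the real algorithm in full):

```python
_MISSING = object()  # sentinel for "not found anywhere in the MRO"

def simplified_getattribute(obj, name):
    """Rough model of object.__getattribute__; the real one has more cases."""
    # 1. Search the type's MRO for the name.
    meta_attr = _MISSING
    for klass in type(obj).__mro__:
        if name in vars(klass):
            meta_attr = vars(klass)[name]
            break
    meta_type = type(meta_attr)
    # 2. Data descriptors (defining __set__/__delete__) beat the instance dict.
    if meta_attr is not _MISSING and (hasattr(meta_type, '__set__')
                                      or hasattr(meta_type, '__delete__')):
        return meta_type.__get__(meta_attr, obj, type(obj))
    # 3. Then the instance __dict__.
    inst_dict = getattr(obj, '__dict__', {})
    if name in inst_dict:
        return inst_dict[name]
    # 4. Then non-data descriptors (e.g. plain functions) and class attributes.
    if meta_attr is not _MISSING:
        if hasattr(meta_type, '__get__'):
            return meta_type.__get__(meta_attr, obj, type(obj))
        return meta_attr
    raise AttributeError(name)
```

Note step 2: a descriptor found in a class dict must have its `__get__` re-invoked on every access, so a cell caching that dict slot would have to be marked "call me each time" - exactly the complication described above.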

So, trying to generalize global vs. builtin to a general notion of "nested 
scope" that isn't necessary for builtins and doesn't work for anything else 
seems like overcomplicating things for no benefit.


> I think this is strictly an
> improvement, except perhaps in memory. Guards would also have an issue
> with nested scopes. You have a note on your website about it:
> (https://faster-cpython.readthedocs.org/fat_python.html#call-pure-builtins)


Re: [Python-Dev] pypi simple index

2015-12-17 Thread Robert Collins
On 18 December 2015 at 06:13, Carlos Barera  wrote:

> Hi,
>
> I'm using install_requires in setup.py to specify a specific package my
> project is dependent on.
> When running python setup.py install, apparently the simple index is used
> as an older package is taken from pypi. While
>

What's happening here is that easy_install is being triggered - and it does
not support wheels. Use 'pip install .' instead.


> in https://pypi.python.org/pypi, there's a newer package.
> When installing directly using pip, the latest package is installed
> successfully.
> I noticed that the new package is only available as a wheel and older
> versions of setup tools won't install wheels for install_requires.
> However, upgrading setuptools didn't help.
>
> Several questions:
> 1. What's the difference between the pypi simple index and the general
> pypi index?
>

The '/simple' API is for machine consumption, /pypi is for humans; other
than that, there should not be any difference.


> 2. Why is setup.py defaulting to the simple index?
>

Because it is the only index :).


> 3. How can I make the setup.py triggered install use the main pypi index
> instead of simple
>

You can't - the issue is not the index being consulted, but your use of
'python setup.py install', which does not support wheels.

Cheers,
Rob


Re: [Python-Dev] Idea: Dictionary references

2015-12-17 Thread Franklin? Lee
On Thu, Dec 17, 2015 at 11:50 AM, Victor Stinner
 wrote:
> I don't understand how you plan to avoid guards. The purpose of guards
> is to respect the Python semantic by falling back to the "slow"
> bytecode if something changes. So I don't understand your idea of
> avoiding guards completely. Again, that's why I guess that it's
> unrelated to FAT Python...

Yeah, I guess it is. Maybe we could've moved to python-ideas. As far
as I understand, this concept can be put into CPython.

(Note to self: Look up how PyPy handles repeated lookups.)


>> Guards would also have an issue
>> with nested scopes. You have a note on your website about it:
>> (https://faster-cpython.readthedocs.org/fat_python.html#call-pure-builtins)
>>
>> "The optimization is disabled when the builtin function is
>> modified or if a variable with the same name is added to the global
>> namespace of the function."
>
> FAT Python doesn't emit a specialized version if it requires a builtin
> function, but a local variable with the same name is found.
>
> The check is done in the current function but also in the parent
> namespaces, up to the global namespace. I'm talking about the static
> analysis of the code.
>
> If the specialized version is built, a guard is created on the builtin
> namespace and another on the global namespace.
>
> Sorry, I don't understand the "problem" you are referring to. Can you
> maybe show an example of code where FAT Python doesn't work or where
> you idea can help?

Suppose we have to look in scopes A, B, C in order, where the name is in
C but never in B, and there's no nesting relationship between B and C.
In that case, I do not create a refcell in B chained to C (because
there's no relationship), so I keep doing lookups in B. That's the
problem. For that, guards and versioning can prevent unnecessary
lookups in B. Though I think a better solution might be: If a name is
found in C, create empty refcells in B and A (i.e. in all previous
dicts).

(There's a nesting relationship between globals() and __builtin__, so
that's fine.)


>> RefCells "know" whether they're part of a living dict. (The dict marks
>> them as such upon its death.) If they are not, they will decref their
>> value upon their death. They do not hold a reference to their owner
>> dict.
>
> The dict contains the list of all of its "own" RefCell objects, right?

It contains a table of pointers. The pointers are to RefCell objects or NULL.

The refs table is exactly the same size as the internal hash table.
This makes indexing it efficient: to find the pointer to a refcell,
find the index of the key in the hash table, then use that SAME index
on the refs table. You never need to find a refcell without also
finding its hash index, so this is cheap.
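A toy model of the two parallel tables (illustrative Python; the real tables would be C arrays, and real dicts perturb the hash rather than probing linearly):

```python
SIZE = 8
entries = [None] * SIZE  # hash-table slots: [key, value] or None
refs = [None] * SIZE     # parallel table: RefCell or None, SAME index

class RefCell:
    """Points at a table *slot*, not at a value, so value updates made
    through the dict are visible without ever touching the cell."""
    __slots__ = ('table', 'index')

    def __init__(self, table, index):
        self.table = table
        self.index = index

    def resolve(self):
        entry = self.table[self.index]
        return None if entry is None else entry[1]

def _index(key):
    i = hash(key) % SIZE
    while entries[i] is not None and entries[i][0] != key:
        i = (i + 1) % SIZE  # simplified linear probing
    return i

def setitem(key, value):
    entries[_index(key)] = [key, value]

def getref(key):
    i = _index(key)  # one hash lookup pays for BOTH tables
    if refs[i] is None:
        refs[i] = RefCell(entries, i)
    return refs[i]
```

After `setitem` changes a value, an existing cell sees the new value with no cell update, which is the point of keeping the two tables index-aligned.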


> In short, RefCell is a view on a single dict entry, to be able to
> "watch" a dict entry without the cost of a lookup? And a RefCell can
> be created, even if the dict entry doesn't exist right?

My "implementation", which had nesting and recursion in mind, had a
"create_if_none" parameter, which meant that the requester can ask for
it to be created even if the key didn't exist in the table.
Pre-creation is useful for functions which refer to globals() names
before they're defined. No-creation is useful in... I can only think
of nesting as a use (globals() -> __builtin__ shouldn't create empty
cells in __builtin__).

See `getref` in here:
https://mail.python.org/pipermail/python-dev/2015-December/142489.html


> Hum, creating many RefCells introduces an overhead somewhere. For
> example, deleting a key has to update N RefCell objects linked to this
> key, right? So del dict[x] takes O(number of RefCells), right?

There are no "outside" updates, except when a dict moves to a
different internal table or deletes its internal table. In that case,
the dict has to move and update each exposed RefCell.

For each dict, for each key, there is at most one RefCell. As long as
the dict is alive, that RefCell will hold a pointer to the pointer to
the value (enforced by the dict). When the dict dies, it makes the
RefCell point to the object, and tells the RefCell it's free (so it's
in charge of cleaning up its value).

Dict entries look like this.

typedef struct {
    /* Cached hash code of me_key. */
    Py_hash_t me_hash;
    PyObject *me_key;
    PyObject *me_value; /* This field is only meaningful for combined tables */
} PyDictKeyEntry;

The internal table (which the ref table will sync with) is an array of
PyDictKeyEntrys.

(Raymond Hettinger made a design with a smaller table, and the hash
lookup would be into an array of indices. This would make synced
metadata tables both easier and smaller. See
https://mail.python.org/pipermail/python-dev/2012-December/123028.html
and latest relevant discussion
https://mail.python.org/pipermail/python-ideas/2015-December/037468.html
)
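A sketch of that compact layout (illustrative Python; this is essentially the design CPython later adopted in 3.6): the sparse probe table holds small indices into a dense entries array, so a synced metadata table needs only one slot per entry rather than one per hash slot.

```python
SIZE = 8
indices = [None] * SIZE  # sparse probe table: slot -> entry number
entries = []             # dense array of [hash, key, value]
refs = []                # parallel dense metadata, one slot per entry

def _slot(key):
    i = hash(key) % SIZE
    while indices[i] is not None and entries[indices[i]][1] != key:
        i = (i + 1) % SIZE  # simplified linear probing
    return i

def setitem(key, value):
    i = _slot(key)
    if indices[i] is None:   # new key: append a dense entry
        indices[i] = len(entries)
        entries.append([hash(key), key, value])
        refs.append(None)    # refcell would be created lazily
    else:                    # existing key: update in place
        entries[indices[i]][2] = value

def getitem(key):
    i = _slot(key)
    if indices[i] is None:
        raise KeyError(key)
    return entries[indices[i]][2]
```

Because entries are dense and never move on updates, the refs list stays index-aligned with them for free.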

The refcell will hold this:

RefCell(&entry->me_value)

A pointer to the field, not to the object itself.

Re: [Python-Dev] Idea: Dictionary references

2015-12-17 Thread Franklin? Lee
On Thu, Dec 17, 2015 at 12:30 PM, Andrew Barnert  wrote:
> On Dec 17, 2015, at 07:38, Franklin? Lee  
> wrote:
>>
>> The nested dictionaries are only for nested scopes (and inner
>> functions don't create nested scopes). Nested scopes will already
>> require multiple lookups in parents.
>
> I think I understand what you're getting at here, but it's a really confusing 
> use of terminology. In Python, and in programming in general, nested scopes 
> refer to exactly inner functions (and classes) being lexically nested and 
> doing lookup through outer scopes. The fact that this is optimized at compile 
> time to FAST vs. CELL vs. GLOBAL/NAME, cells are optimized at 
> function-creation time, and only global and name have to be resolved at the 
> last second doesn't mean that there's no scoping, or some other form of 
> scoping besides lexical. The actual semantics are LEGB, even if L vs. E vs. 
> GB and E vs. further-out E can be optimized.

Oh, I've never actually read the Python scoping rules spelled out. I
wasn't sure if there were other cases.

The other two cases I thought of as "nesting" were: object to its
class, and class to its superclasses.

> Also, reading your earlier post, it sounds like you're trying to treat 
> attribute lookup as a special case of global lookup, only with a chain of 
> superclasses beyond the class instead of just a single builtins. But they're 
> totally different. Class lookup doesn't just look in a series of dicts, it 
> calls __getattribute__ which usually calls __getattr__ which may or may not 
> look in the __dict__s (which may not even exist) to find a descriptor and 
> then calls its __get__ method to get the value. You'd have to somehow handle 
> the case where the search only went through object.__getattribute__ and 
> __getattr__ and found a result by looking in a dict, to make a RefCell to 
> that dict which is marked in some way that says "I'm not a value, I'm a 
> descriptor you have to call each time", and then apply some guards that will 
> detect whether that class or any intervening class dict touched that key, 
> whether the MRO changed, whether that class or any intervening class added or 
> changed implementations for __getattribute__ or __getattr__, and probably
> more things I haven't thought of. What do those guards look like? (Also, you
> need a different set of rules to cache, and guard for, special method
> lookup--you could just ignore that, but I think those are the lookups that
> would benefit most from optimization.)

Doesn't __getattr__ only get called if all the mro __dict__ lookups failed?

I forgot about __getattribute__. That might be the point at which refs
are optimized.

As for descriptors versus RefCells, I'm guessing that can be resolved,
as soon as I figure out how descriptors actually work... If
descriptors don't modify the __dict__, then RefCells shouldn't get
involved. If they do, then there's some unwrapping going on there, and
RefCells should fit right in (though whether they'll improve anything
is a different question). RefCells are just a shortcut for dict
lookups.

For guards, I think Victor Stinner's idea could supplement this.
Alternatively, in my other email, I said there could be a rule of,
"Create intermediate RefCells for anything BEFORE a successful
lookup." So if we look in A, B, C, D, and find it in C, then we create
and save RefCells in A, B, C, but not D (where D = object). This MIGHT
result in a lot of intermediate RefCells, but I'd guess most things
aren't looked up just once, and it's saying, "It's possible for B to
gain member B.x and catch me on my way to C.x."

> So, trying to generalize global vs. builtin to a general notion of "nested 
> scope" that isn't necessary for builtins and doesn't work for anything else 
> seems like overcomplicating things for no benefit.

Probably. The globals() and __builtin__ case is simpler than the class case.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Idea: Dictionary references

2015-12-17 Thread Andrew Barnert via Python-Dev
On Thursday, December 17, 2015 11:19 AM, Franklin? Lee 
 wrote:


> ...
> as soon as I figure out how descriptors actually work...


I think you need to learn what LOAD_ATTR and the machinery around it actually 
does before I can explain why trying to optimize it like globals-vs.-builtins 
doesn't make sense. Maybe someone who's better at explaining than me can come 
up with something clearer than the existing documentation, but I can't.


Re: [Python-Dev] Idea: Dictionary references

2015-12-17 Thread Victor Stinner
Le vendredi 18 décembre 2015, Andrew Barnert via Python-Dev <
python-dev@python.org> a écrit :
>
> >> Builtins do two dict lookups.
> >
> > Two?
>
> Actually, I guess three: first you fail to find the name in globals, then
> you find __builtins__ in globals, then you find the name in __builtins__ or
> __builtins__.__dict__.
>

Getting builtins from globals is done when the frame object is created, and
the lookup is skipped in the common case, if I recall correctly. LOAD_NAME
does a lookup in the function's globals; if the key doesn't exist (the common
case for builtins), a second lookup is done in the frame's builtins. Open
Python/ceval.c and see the code. There is an optimisation for a fast lookup
in the two dicts.
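The two-step lookup Victor describes can be sketched in pure Python. This is a rough model of the LOAD_GLOBAL/LOAD_NAME semantics, not the actual ceval.c fast path:

```python
import builtins

def load_global(name, frame_globals):
    # Rough model of LOAD_GLOBAL: try the function's globals first,
    # then fall back to the builtins namespace.
    try:
        return frame_globals[name]
    except KeyError:
        try:
            return getattr(builtins, name)
        except AttributeError:
            raise NameError(name) from None

# A global shadows the builtin; otherwise the builtin is found.
assert load_global("len", {"len": "shadowed"}) == "shadowed"
assert load_global("len", {}) is len
```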

Victor


Re: [Python-Dev] Idea: Dictionary references

2015-12-17 Thread Victor Stinner
2015-12-17 23:17 GMT+01:00 Andrew Barnert via Python-Dev
:
> Builtins do two dict lookups.
>
> So, the only thing you can optimize there is builtins. But maybe that's worth 
> it.

FYI I implemented an optimization in FAT Python to avoid lookups for
builtin functions: builtin functions are copied to code constants at
runtime:
https://faster-cpython.readthedocs.org/fat_python.html#copy-builtin-functions-to-constants

It's nothing new, it's the generalization of common hacks, like 'def
func(len=len): return len(3)'.
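The hack generalizes like this; the speedup comes from the name becoming a fast local bound at definition time, instead of a global-then-builtin dict lookup on every call:

```python
def count_plain(items):
    # `len` is looked up in globals, then builtins, on every call.
    return len(items)

def count_hack(items, len=len):
    # `len` was captured at definition time and is now a fast local.
    return len(items)

assert count_plain("abc") == count_hack("abc") == 3
```

The downside, as noted below, is that rebinding the builtin later is no longer seen by the hacked function.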

The optimization is restricted to loading builtin symbols which are
not expected to be modified ;-)

(Right now the optimization is disabled by default, because the
optimizer is unable to detect when builtins are modified in the
current function, and so it breaks Python semantics.)

> Class attributes (including normal methods, @property, etc.) do two or more 
> dict lookups--first the instance, then the class, then each class on the 
> class's MRO.

Note: Types have an efficient cache for name lookups ;-) Thanks to
this cache, it's no longer an issue to have a deep hierarchy of classes.

Victor


Re: [Python-Dev] Idea: Dictionary references

2015-12-17 Thread Franklin? Lee
I already know that we can't use recursion, because it bypasses MRO.
I'm also not yet sure whether it makes sense to use refs for classes
in the first place.

As I understood it, an attribute will resolve in this order:
- __getattribute__ up the MRO. (raises AttributeError)
- __dict__ up the MRO. (raises KeyError)
- __getattr__ up the MRO. (raises AttributeError)


My new understanding:
- __getattribute__. (raises AttributeError)
- (default implementation:) __dict__.__getitem__. (raises KeyError)
- __getattr__ up the MRO. (raises AttributeError)

If this is the case, then (the default) __getattribute__ will be
making the repeated lookups, and might be the one requesting the
refcells (for the ones it wants).
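__getattr__ is indeed only a fallback: the default __getattribute__ calls it only after the normal lookup fails. A quick check:

```python
class A:
    def __getattr__(self, name):
        # Called only when the normal lookup raises AttributeError.
        return f"fallback:{name}"

a = A()
a.x = 1
assert a.x == 1                        # found in the instance dict; no fallback
assert a.missing == "fallback:missing" # normal lookup failed, __getattr__ ran
```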


Descriptors seem to be implemented as:
Store a Descriptor object as an attribute. When a Descriptor is
accessed, if it is being accessed from its owner, then unbox it and
use its methods. Otherwise, it's a normal attribute.

Then Descriptors are in the dict, so MIGHT benefit from refcells. The
memory cost might be higher, though.
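For reference, a descriptor really is just an ordinary object sitting in the class __dict__; the "unboxing" happens when it is reached through an instance. A small illustration of the protocol being discussed (not tied to the refcell idea):

```python
class Ten:
    def __get__(self, obj, objtype=None):
        return 10

class C:
    x = Ten()

c = C()
assert c.x == 10                          # __get__ is invoked on access
assert isinstance(C.__dict__["x"], Ten)   # the raw descriptor stays in the dict
```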


On Thu, Dec 17, 2015 at 5:17 PM, Andrew Barnert  wrote:
> On Dec 17, 2015, at 13:37, Andrew Barnert via Python-Dev 
>  wrote:
>>
>> On Thursday, December 17, 2015 11:19 AM, Franklin? Lee 
>>  wrote:
>>
>>
>>> ...
>>> as soon as I figure out how descriptors actually work...
>>
>>
>> I think you need to learn what LOAD_ATTR and the machinery around it 
>> actually does before I can explain why trying to optimize it like 
>> globals-vs.-builtins doesn't make sense. Maybe someone who's better at 
>> explaining than me can come up with something clearer than the existing 
>> documentation, but I can't.
>
> I take that back. First, it was harsher than I intended. Second, I think I 
> can explain things.

I appreciate it! Tracking function definitions in the source can make
one want to do something else.


> First, for non-attribute lookups:
>
> (Non-shared) locals just load and save from an array.
>
> Free variables and shared locals load and save by going through an extra 
> dereference on a cell object in an array.

In retrospect, of course they do.

It sounds like the idea is what's already used there, except the refs
are synced to the locals array instead of a hash table.


> Globals do a single dict lookup.

A single dict lookup per function definition per name used? That's
what I'm proposing.

For example, (and I only just remembered that comprehensions and gen
expressions create scope)

[f(x) for x in range(1)]

would look up the name `f` at most twice (once in globals(), once in
builtins() if needed), and will always have the latest version of `f`.

And if it's in a function, the refcell(s) would be saved by the function.
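The "always have the latest version" behaviour is just how globals already work today; each global reference sees rebinding immediately, so a refcell would only make the fetch cheaper, not change semantics:

```python
results = []

def f(x):
    return x + 1

def demo():
    global f
    for x in range(2):
        results.append(f(x))      # looked up in globals on each iteration
        f = lambda x: x * 100     # rebinding is seen by the next lookup

demo()
assert results == [1, 100]
```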


> Builtins do two dict lookups.

Two?


> Class attributes (including normal methods, @property, etc.) do two or more 
> dict lookups--first the instance, then the class, then each class on the 
> class's MRO. Then, if the result has a __get__ method, it's called with the 
> instance and class to get the actual value. This is how bound methods get 
> created, property lookup functions get called, etc. The result of the 
> descriptor call can't get cached (that would mean, for example, that every 
> time you access the same @property on an instance, you'd get the same value).

Yeah, I would only try to save in a dict lookup to get the descriptor,
and I'm not sure it's worth it.

(Victor's response says that class attributes are already efficient, though.)


> So, if the globals->builtins optimization is worth doing, don't tie it to 
> another optimization that's much more complicated and less useful like this, 
> or we'll never get your simple and useful idea.

Sure. I couldn't figure out where to even save the refcells for
attributes, so I only really saw an opportunity for name lookups.
Since locals and nonlocals don't require dict lookups, this means
globals() and __builtin__.


Re: [Python-Dev] Idea: Dictionary references

2015-12-17 Thread Andrew Barnert via Python-Dev
On Dec 17, 2015, at 13:37, Andrew Barnert via Python-Dev 
 wrote:
> 
> On Thursday, December 17, 2015 11:19 AM, Franklin? Lee 
>  wrote:
> 
> 
>> ...
>> as soon as I figure out how descriptors actually work...
> 
> 
> I think you need to learn what LOAD_ATTR and the machinery around it actually 
> does before I can explain why trying to optimize it like globals-vs.-builtins 
> doesn't make sense. Maybe someone who's better at explaining than me can come 
> up with something clearer than the existing documentation, but I can't.

I take that back. First, it was harsher than I intended. Second, I think I can 
explain things.

First, for non-attribute lookups:

(Non-shared) locals just load and save from an array.

Free variables and shared locals load and save by going through an extra 
dereference on a cell object in an array.

Globals do a single dict lookup.

Builtins do two dict lookups.

So, the only thing you can optimize there is builtins. But maybe that's worth 
it.
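The cell indirection for free variables mentioned above is visible from Python: a closure's __closure__ tuple holds the cell objects that the two scopes share.

```python
def make_counter():
    n = 0
    def inc():
        nonlocal n    # shared through a cell object, not a dict lookup
        n += 1
        return n
    return inc

counter = make_counter()
counter()
counter()
# The cell is reachable (and current) through the function object itself.
assert counter.__closure__[0].cell_contents == 2
```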

Next, for attribute lookups (not counting special methods):

Everything calls __getattribute__. Assuming that's not overridden and uses the 
object implementation:

Instance attributes do one dict lookup.

Class attributes (including normal methods, @property, etc.) do two or more 
dict lookups--first the instance, then the class, then each class on the 
class's MRO. Then, if the result has a __get__ method, it's called with the 
instance and class to get the actual value. This is how bound methods get 
created, property lookup functions get called, etc. The result of the 
descriptor call can't get cached (that would mean, for example, that every time 
you access the same @property on an instance, you'd get the same value).
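Why the descriptor call's result can't be cached, in a few lines: the same @property access on the same instance can yield a different value each time.

```python
import itertools

class Ticker:
    _counter = itertools.count()

    @property
    def tick(self):
        # A new value on every access; caching the result would be wrong.
        return next(self._counter)

t = Ticker()
assert t.tick == 0
assert t.tick == 1    # same attribute, same instance, different value
```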

Dynamic attributes from a __getattr__ do all that plus whatever __getattr__ 
does.

If __getattribute__ is overloaded, it's entirely up to that implementation to 
do whatever it wants.

Things are similar for set and del: they call __setattr__/__delattr__, and the 
default versions of those look in the instance dict first, then look for a 
descriptor the same as with get except that they call a different method on the 
descriptor (and if it's not a descriptor, instead of using it, they ignore it 
and go back to the instance dict).

So, your mechanism can't significantly speed up method lookups, properties, or 
most other things. It could speed up lookups for class attributes that aren't 
descriptors, but only at the cost of increasing the size of every instance--and 
how often do those matter anyway?

A different mechanism that cached references to descriptors instead of to the 
resulting attributes could speed up method lookups, etc., but only by a very 
small amount, and with the same space cost.

A mechanism that didn't try to get involved with the instance dict, and just 
flattened out the MRO search once that failed (and was out of the way before 
the descriptor call or __getattr__ even entered the picture) might speed 
methods up in deeply nested hierarchies, and with only a per-class rather than 
a per-instance space cost. But how often do you have deeply-nested hierarchies? 
And the speedup still isn't going to be that big: You're basically turning 5 
dict lookups plus 2 method calls into 2 dict lookups plus 2 method calls. And 
it would still be much harder to guard than the globals dict: if any superclass 
changes its __bases__ or adds or removes a __getattribute__ or various other 
things, all of your references have to get re-computed. That's rare enough that 
the speed may not matter, but the code complexity probably does.

If short: if you can't cache the bound methods (and as far as I can tell, in 
general you can't--even though 99% of the time it would work), I don't think 
there's any other significant win here. 

So, if the globals->builtins optimization is worth doing, don't tie it to 
another optimization that's much more complicated and less useful like this, or 
we'll never get your simple and useful idea. 



Re: [Python-Dev] Idea: Dictionary references

2015-12-17 Thread Franklin? Lee
On Thu, Dec 17, 2015 at 6:41 PM, Franklin? Lee
 wrote:
> Then Descriptors are in the dict, so MIGHT benefit from refcells. The
> memory cost might be higher, though.

Might be worse than the savings, I mean.


Re: [Python-Dev] Idea: Dictionary references

2015-12-17 Thread Andrew Barnert via Python-Dev
On Dec 17, 2015, at 15:41, Franklin? Lee  wrote:
> 
> I already know that we can't use recursion, because it bypasses MRO.
> I'm also not yet sure whether it makes sense to use refs for classes
> in the first place.
> 
> As I understood it, an attribute will resolve in this order:
> - __getattribute__ up the MRO. (raises AttributeError)
> - __dict__ up the MRO. (raises KeyError)
> - __getattr__ up the MRO. (raises AttributeError)
> 
> 
> My new understanding:
> - __getattribute__. (raises AttributeError)
>- (default implementation:) __dict__.__getitem__. (raises KeyError)
> - __getattr__ up the MRO. (raises AttributeError)

No, still completely wrong.

If __getattribute__ raises an AttributeError (or isn't found, but that only 
happens in special cases like somehow calling a method on a type that hasn't 
been constructed), that's the end of the line; there's no fallback. Everything 
else happens inside __getattribute__ itself (IIRC: searching the MRO dicts for 
data descriptors, searching the instance dict, searching the MRO dicts for 
non-data descriptors or non-descriptors, special-method lookup and call of 
__getattr__, raising AttributeError... and then doing the appropriate 
descriptor call at the end if needed).

I was going to say that the only custom __getattribute__ you'll find in 
builtins or stdlib is on type, which does the exact same thing except when it 
calls a descriptor it does __get__(None, cls) instead of __get__(obj, 
type(obj)), and if you find any third-party __getattribute__ you should just 
assume it's going to do something crazy and don't bother trying to help it. But 
then I remembered that super must have a custom __getattribute__, so... you'd 
probably need to search the code for others.

> If this is the case, then (the default) __getattribute__ will be
> making the repeated lookups, and might be the one requesting the
> refcells (for the ones it wants).

Yes, the default and type __getattribute__ are what you'd want to optimize, if 
anything. And maybe special-method lookup.

> Descriptors seem to be implemented as:
>Store a Descriptor object as an attribute. When a Descriptor is
> accessed, if it is being accessed from its owner, then unbox it and
> use its methods. Otherwise, it's a normal attribute.

Depending on what you mean by "owner", I think you have that backward. If the 
instance itself stores a descriptor, it's just used as itself; if the 
instance's _type_ (or a supertype) stores one, it's called to get the instance 
attribute.

> Then Descriptors are in the dict, so MIGHT benefit from refcells. The
> memory cost might be higher, though.

Same memory cost. They're just objects whose type's dicts happen to have a 
__get__ method (just like iterables are just objects whose type's dicts happen 
to have an __iter__ method). The point is that you can't cache the result of 
the descriptor call, you can cache the descriptor itself but it will rarely 
help, and the builtin method cache probably already takes care of 99% of the 
cases where it would help, so I don't see what you're going to get here.

>> On Thu, Dec 17, 2015 at 5:17 PM, Andrew Barnert  wrote:
>>> On Dec 17, 2015, at 13:37, Andrew Barnert via Python-Dev 
>>>  wrote:
>>> 
>>> On Thursday, December 17, 2015 11:19 AM, Franklin? Lee 
>>>  wrote:
>>> 
>>> 
 ...
 as soon as I figure out how descriptors actually work...
>>> 
>>> 
>>> I think you need to learn what LOAD_ATTR and the machinery around it 
>>> actually does before I can explain why trying to optimize it like 
>>> globals-vs.-builtins doesn't make sense. Maybe someone who's better at 
>>> explaining than me can come up with something clearer than the existing 
>>> documentation, but I can't.
>> 
>> I take that back. First, it was harsher than I intended. Second, I think I 
>> can explain things.
> 
> I appreciate it! Tracking function definitions in the source can make
> one want to do something else.

The documentation is pretty good for this stuff (and getting better every 
year). You mainly want the data model chapter of the reference and the 
descriptor howto guide; the dis and inspect docs in the library can also be 
helpful. Together they'll answer most of what you need.

If they don't, maybe I will try to write up an explanation as a blog post, but 
I don't think it needs to get sent to the list (except for the benefit of core 
devs calling me out of I screw up, but they have better things to do with their 
time).

>> First, for non-attribute lookups:
>> 
>> (Non-shared) locals just load and save from an array.
>> 
>> Free variables and shared locals load and save by going through an extra 
>> dereference on a cell object in an array.
> 
> In retrospect, of course they do.
> 
> It sounds like the idea is what's already used there, except the refs
> are synced to the locals array instead of a hash table.

Yes, which is already faster than what you want to do.


Re: [Python-Dev] Idea: Dictionary references

2015-12-17 Thread Chris Angelico
On Fri, Dec 18, 2015 at 2:37 PM, Andrew Barnert via Python-Dev
 wrote:
> If __getattribute__ raises an AttributeError (or isn't found, but that only 
> happens in special cases like somehow calling a method on a type that hasn't 
> been constructed)

Wow. How do you do that? Is it possible with pure Python?

ChrisA


Re: [Python-Dev] Idea: Dictionary references

2015-12-17 Thread Victor Stinner
2015-12-17 11:54 GMT+01:00 Franklin? Lee :
> My suggestion should improve *all* function calls which refer to
> outside names.

Ok, I now think that you should stop hijacking the FAT Python thread.
I start a new thread. IMHO your dictionary reference idea is completly
unrelated to FAT Python.

FAT Python is about guards and specialized bytecode.

> Each function keeps an indirect, automagically updated
> reference to the current value of the names they use, and will never
> need to look things up again.[*]

Indirections, nested dictionaries, creation of new "reference"
objects... IMHO you are going to have major implementation issues :-/
The design looks *very* complex. I'm quite sure that you are going to
make namespace lookups *slower*.

It reminds me Python before the with statement and PyPy garbage
collector. Many applications relied on the exact behaviour of CPython
garbage collector. For example, they expected that a file is written
on disk when the last reference to the file object is destroyed. In
PyPy, it wasn't (it isn't) true, the write can be delayed.

I guess that with all your complex machinery for dict lookups, you
will have similar issues of object lifetime. It's unclear to me when
and how "reference" objects are destroyed, nor when dict values are
destroyed.

What happens if a dict key is removed and a reference object is still
alive? Is the dict value immediatly destroyed? Does the reference
object contain a strong or a weak reference to the value?

Victor


Re: [Python-Dev] Third milestone of FAT Python

2015-12-17 Thread Franklin? Lee
On Wed, Dec 16, 2015 at 2:01 AM, Victor Stinner
 wrote:
> Le mercredi 16 décembre 2015, Franklin? Lee 
> a écrit :
>>
>> I am confident that the time overhead and the savings will beat the
>> versioning dict. The versioning dict method has to save a reference to
>> the variable value and a reference to the name, and regularly test
>> whether the dict has changed.
>
>
> The performance of guards matters less than the performance of regular usage
> of dict. If we have to make a choice, I prefer "slow" guard but no impact on
> regular dict methods. It's very important that enabling FAT mode doesn't
> kill performances. Remember that FAT python is a static optimizer and so can
> only optimize some patterns, not all Python code.
>
> In my current implementation, a lookup is only needed when aguard is checked
> if the dict was modified. The dict version doesn't change if a mutable
> object was modified in place for example. I didn't benchmark, but I expect
> that the lookup is avoided in most cases. You should try FAT python and
> implement statistics before going too far in your idea.

My suggestion should improve *all* function calls which refer to
outside names. Each function keeps an indirect, automagically updated
reference to the current value of the names they use, and will never
need to look things up again.[*] There wouldn't be a need to save
global names as default arguments (`def f(x, list=list):`).

[*]: Not exactly true. Nested scopes can cause an issue.

I'm not sure what happens if you redefine the __builtin__ name after
these functions are defined. My suggestion would not know that the
__builtin__ was switched out, since it saves a ref into the original
__builtin__.

I'm not sure about how to deal with nested scopes (for example, class
inheritance). I think the "chained RefCells" idea works.


Chained Refcell idea:

There are three cases where we can have nested scopes:
1. An object's __dict__ nests its class.
2. A class's __dict__ nests its superclasses.
3. globals() nests __builtin__.
4? Does a package nest its modules?
5? Does `from module import *` nest the module's __dict__?

(Nonlocal variables in nested functions, and probably nested classes,
are not a case of nested scope, since scope of each name is determined
during compilation of the function/class.)

RefCells of nested scopes will have a pointer to their value (or
NULL), and an array/linkedlist of pointers to refcells from their
parent dicts (or to their parent dicts if a refcell wasn't
successfully acquired yet).

When you request a RefCell from a nested Scope, it will return its
value if it exists. Otherwise, it requests refcells from each parent
(parents won't create refcells unless there's a value) until it gets
one.

When you ask a RefCell to resolve, it will check its own value, then
ask each parent for a value (creating intermediate refcells *if* value
exist). It will not need to do lookups in parents if it got a refcell
before (though the refcell might be null).

Problem: If you have class A(B, C), and you resolve a refcell for a
name which exists in C but not B, you will look things up through B's
dict every single time. It will fail, every single time. We can't skip
B, since B is allowed to get such a name later, but I don't want to
add refs to names that might never exist. This can be solved either
through versioning or by checking whether a dict is read-only (such as
for built-in types).

In fact, in the code I wrote at the end of this email,
RefCell.resolve() might even look things up in a shared ancestor
multiple times. However, this would be incorrect anyway, since it
doesn't respect Python's MRO resolution. So we can just fix that.
RefCell.resolve would need a `search_parents: bool` parameter.
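A toy model of the chained-RefCell idea sketched above. The names and structure are hypothetical, and it deliberately ignores MRO order and guards, which are exactly the problems discussed above:

```python
class RefCell:
    """Toy chained refcell: watches one key in one dict, with parent cells."""
    def __init__(self, scope, key):
        self.scope = scope     # the namespace dict this cell indexes into
        self.key = key
        self.parents = []      # cells for enclosing scopes (e.g. builtins)

    def resolve(self):
        if self.key in self.scope:
            return self.scope[self.key]
        for parent in self.parents:      # NOTE: naive order, not real MRO
            try:
                return parent.resolve()
            except KeyError:
                pass
        raise KeyError(self.key)

# globals() nesting __builtin__, as in case 3 above:
import builtins
fake_globals = {}
cell = RefCell(fake_globals, "len")
cell.parents.append(RefCell(vars(builtins), "len"))
assert cell.resolve() is len
fake_globals["len"] = "shadow"   # a later assignment is seen immediately
assert cell.resolve() == "shadow"
```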

>> I've read it again. By subclass, I mean that it implements the same
>> interface. But at the C level, I want to have it be a fork(?) of the
>> current dict implementation. As for `exec`, I think it might be okay
>> for it to be slower at the early stages of this game.
>
>
> Be careful, dict methods are hardcoded in the C code. If your type is not a
> subtype, there is a risk of crashes.

Not exactly, and this is important. Many functions are called via
pointer. It's like C++'s virtual methods, but more dynamic, since they
can be changed per object. See
https://github.com/python/cpython/blob/master/Objects/dict-common.h#L17.

For example, the lookup method for str-only dicts swaps itself for a
general object lookup method if it finds a non-string key. See
https://github.com/python/cpython/blob/master/Objects/dictobject.c#L549.
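A Python analogue of that self-swapping lookup pointer. The real mechanism is a C function pointer stored in the dict struct; this is only an illustrative sketch:

```python
class SwappingDict:
    """Sketch: start with a str-only fast path, swap to a general one."""
    def __init__(self):
        self._data = {}
        self._lookup = self._str_lookup   # per-object "function pointer"

    def _str_lookup(self, key):
        # Fast path, valid only while every key is a str.
        if not isinstance(key, str):
            self._lookup = self._general_lookup   # swap itself out
            return self._general_lookup(key)
        return self._data[key]

    def _general_lookup(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        if not isinstance(key, str):
            self._lookup = self._general_lookup   # non-str key: downgrade
        self._data[key] = value

    def __getitem__(self, key):
        return self._lookup(key)

d = SwappingDict()
d["a"] = 1
assert d["a"] == 1
d[1] = "x"            # first non-str key swaps in the general lookup
assert d[1] == "x"
assert d["a"] == 1
```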

I'm now suggesting that there be additional space in the dict object
itself to hold more function pointers (get_ref and
get_hash_table_index), which would slightly increase memory cost and
creation time. It won't have extra cost for running normal methods.
When the dict gets a request for a reference, it will swap in methods
that knows how to