Re: [Python-Dev] Idea: Dictionary references
You can very easily implement this with version tags on the globals
dictionaries - that is, the dictionaries have versions, and the guard
checking whether everything is ok just checks the version tag on
globals. Generally speaking, such optimizations have been done in the
past (even in places like PyPy, but also in the literature), and as soon
as we have dynamic compilation (and FAT is a form of it), you can do
such tricks.

On Thu, Dec 17, 2015 at 3:48 PM, Steven D'Aprano wrote:
> On Thu, Dec 17, 2015 at 12:53:13PM +0100, Victor Stinner quoted:
>> 2015-12-17 11:54 GMT+01:00 Franklin? Lee:
>>> Each function keeps an indirect, automagically updated
>>> reference to the current value of the names they use,
>
> Isn't that a description of globals()? If you want to look up a name
> "spam", you grab an indirect reference to it:
>
>     globals()["spam"]
>
> which returns the current value of the name "spam".
>
>>> and will never need to look things up again.[*]
>
> How will this work?
>
> Naively, it sounds to me like Franklin is suggesting that on every
> global assignment, the interpreter will have to touch every single
> function in the module to update that name. Something like this:
>
>     # on a global assignment
>     spam = 23
>
>     # the interpreter must do something like this:
>     for function in module.list_of_functions:
>         if "spam" in function.__code__.__global_names__:
>             function.__code__.__global_names__["spam"] = spam
>
> As I said, that's a very naive way to implement this. Unless you have
> something much cleverer, I think this will be horribly slow.
>
> And besides, you *still* need to deal with the case that the name isn't
> a global at all, but in the built-ins namespace.
> --
> Steve

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
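[Editor's note: a minimal pure-Python sketch of the version-tag idea
described above. The class and attribute names are illustrative, not an
existing API; CPython would keep the version counter inside the dict's C
struct rather than in a subclass.]

```python
class VersionedDict(dict):
    """Dict that bumps a version counter on every mutation (sketch only:
    update()/pop()/etc. are not covered here)."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = 0

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version += 1

    def __delitem__(self, key):
        super().__delitem__(key)
        self.version += 1

class GlobalsGuard:
    """Guard whose check is one int comparison when nothing changed."""
    def __init__(self, ns):
        self.ns = ns
        self.version = ns.version

    def check(self):
        # True -> the specialized fast path is still valid
        return self.ns.version == self.version

ns = VersionedDict(spam=1)
guard = GlobalsGuard(ns)
assert guard.check()        # no mutation yet: guard holds
ns["spam"] = 2              # any mutation bumps the version
assert not guard.check()    # guard fails -> fall back to slow bytecode
```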
Re: [Python-Dev] Idea: Dictionary references
On Thu, Dec 17, 2015 at 12:53:13PM +0100, Victor Stinner quoted:
> 2015-12-17 11:54 GMT+01:00 Franklin? Lee:
>> Each function keeps an indirect, automagically updated
>> reference to the current value of the names they use,

Isn't that a description of globals()? If you want to look up a name
"spam", you grab an indirect reference to it:

    globals()["spam"]

which returns the current value of the name "spam".

>> and will never need to look things up again.[*]

How will this work?

Naively, it sounds to me like Franklin is suggesting that on every
global assignment, the interpreter will have to touch every single
function in the module to update that name. Something like this:

    # on a global assignment
    spam = 23

    # the interpreter must do something like this:
    for function in module.list_of_functions:
        if "spam" in function.__code__.__global_names__:
            function.__code__.__global_names__["spam"] = spam

As I said, that's a very naive way to implement this. Unless you have
something much cleverer, I think this will be horribly slow.

And besides, you *still* need to deal with the case that the name isn't
a global at all, but in the built-ins namespace.

--
Steve
[Python-Dev] pypi simple index
Hi,

I'm using install_requires in setup.py to specify a specific package my
project is dependent on. When running python setup.py install,
apparently the simple index is used, as an older package is taken from
pypi, while in https://pypi.python.org/pypi there's a newer package.
When installing directly using pip, the latest package is installed
successfully.

Several questions:
1. What's the difference between the pypi simple index and the general
   pypi index?
2. Why is setup.py defaulting to the simple index?
3. How can I make the setup.py triggered install use the main pypi
   index instead of simple?

Thanks!
Carlos
Re: [Python-Dev] Idea: Dictionary references
(Previous thread was here, by the way:
https://mail.python.org/pipermail/python-dev/2015-December/142437.html)

On Thu, Dec 17, 2015 at 8:48 AM, Steven D'Aprano wrote:
> On Thu, Dec 17, 2015 at 12:53:13PM +0100, Victor Stinner quoted:
>> 2015-12-17 11:54 GMT+01:00 Franklin? Lee:
>>> Each function keeps an indirect, automagically updated
>>> reference to the current value of the names they use,
>
> Isn't that a description of globals()? If you want to look up a name
> "spam", you grab an indirect reference to it:
>
>     globals()["spam"]
>
> which returns the current value of the name "spam".

The *current value*. I'm proposing that we instead do this:

    spamref = globals().getref('spam')

Every time we want to find the current, updated value of 'spam', we
just do

    spam = spamref.resolve()

which will skip the hash lookup and go directly to the value.

>>> and will never need to look things up again.[*]
>
> How will this work?
>
> Naively, it sounds to me like Franklin is suggesting that on every
> global assignment, the interpreter will have to touch every single
> function in the module to update that name.

A refcell holds a pointer to the location in the dict itself where the
value pointer is. When the value is updated in the dict, the refcell
will not need to be updated.

My original proposal wanted to keep cells in the "real" dict, and
update them. Like so (pseudocode: `super()[key]` stands for the entry
in the underlying table, and NULL for a C-level null value pointer):

    class RefCell:
        __slots__ = ['value']

    class ScopeDict(dict):
        def __getitem__(self, key):
            value = super()[key].value  # may raise
            if value is NULL:
                raise KeyError(key)
            return value

        def __setitem__(self, key, value):
            if key in super():
                super()[key].value = value
            else:
                cell = super()[key] = RefCell()
                cell.value = value

        def __delitem__(self, key):
            cell = super()[key]  # may raise
            if cell.value is NULL:
                raise KeyError(key)
            cell.value = NULL

I realized later that this isn't necessary.
Most dict operations don't need to know about the indirection, so I
make the inner dict a normal dict (with a few more holes than normal).
But this would show how you can avoid manual updates for references.

> And besides, you *still* need to deal with the case that the name isn't
> a global at all, but in the built-ins namespace.

globals() to __builtin__ is a nesting relationship. At the bottom of
the following email, I have a pseudocode implementation which knows how
to deal with nested scopes.
https://mail.python.org/pipermail/python-dev/2015-December/142489.html
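[Editor's note: a runnable approximation of the pseudocode above, for
readers following along. `RefCell`, `ScopeDict` and `getref` are the
proposal's names, not an existing API; `None` stands in for the
C-level NULL value pointer.]

```python
class RefCell:
    """Holds the current value for one key; None plays the role of NULL."""
    __slots__ = ['value']
    def __init__(self):
        self.value = None

class ScopeDict:
    """Dict-like scope whose entries are RefCells, so lookups can be
    cached: callers hold a cell instead of repeating hash lookups."""
    def __init__(self):
        self._cells = {}

    def __getitem__(self, key):
        cell = self._cells[key]          # may raise KeyError
        if cell.value is None:
            raise KeyError(key)
        return cell.value

    def __setitem__(self, key, value):
        cell = self._cells.get(key)
        if cell is None:
            cell = self._cells[key] = RefCell()
        cell.value = value               # holders of the cell see this

    def __delitem__(self, key):
        cell = self._cells[key]          # may raise KeyError
        if cell.value is None:
            raise KeyError(key)
        cell.value = None                # keep the cell for existing holders

    def getref(self, key, create_if_none=True):
        """Return the cell for key, optionally pre-creating an empty one."""
        cell = self._cells.get(key)
        if cell is None and create_if_none:
            cell = self._cells[key] = RefCell()
        return cell

ns = ScopeDict()
ref = ns.getref('spam')      # cached once, e.g. at function creation
ns['spam'] = 23
assert ref.value == 23       # rebinding is visible through the cell
ns['spam'] = 42
assert ref.value == 42       # no second dict lookup needed
```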
Re: [Python-Dev] Idea: Dictionary references
2015-12-17 16:38 GMT+01:00 Franklin? Lee:
> Yeah, maybe it's out of the scope of bytecode optimization. But I
> think this would make guards unnecessary, since once a name is found,
> there's a quick way to refer to it.

FAT Python requires and supports various kinds of guards:

* type of a function argument
* dictionary key
* function (func.__code__)

I guess that you are talking about the dictionary key guards. In fact,
there are 4 subtypes of dict guards:

* guard on builtins.__dict__['key']
* guard on globals()['key']
* guard on MyClass.__dict__['key']
* guard on dict['key']

The implementation of the check of this guard currently relies on a
new "version" (a global dict version, incremented at each change) to
avoid a lookup if possible. The guard stores a strong reference to the
key and the value. If the value is different, the guard check returns
False.

I don't understand how you plan to avoid guards. The purpose of guards
is to respect the Python semantics by falling back to the "slow"
bytecode if something changes. So I don't understand your idea of
avoiding guards completely. Again, that's why I guess that it's
unrelated to FAT Python... Or is it just that your idea is an
alternative to dict versioning to get efficient guards on dict keys?

> Guards would also have an issue
> with nested scopes. You have a note on your website about it:
> (https://faster-cpython.readthedocs.org/fat_python.html#call-pure-builtins)
>
> "The optimization is disabled when the builtin function is
> modified or if a variable with the same name is added to the global
> namespace of the function."

FAT Python doesn't emit a specialized version if it requires a builtin
function but a local variable with the same name is found. The check
is done in the current function but also in the parent namespaces, up
to the global namespace. I'm talking about the static analysis of the
code.
If the specialized version is built, a guard is created on the builtin
namespace and another on the global namespace.

Sorry, I don't understand the "problem" you are referring to. Can you
maybe show an example of code where FAT Python doesn't work, or where
your idea can help?

> RefCells "know" whether they're part of a living dict. (The dict marks
> them as such upon its death.) If they are not, they will decref their
> value upon their death. They do not hold a reference to their owner
> dict.

The dict contains the list of all of its "own" RefCell objects, right?

> If it's part of a living dict, we have a choice: the dict can be
> responsible for deletion, or the RefCell can be responsible for
> deletion. It doesn't really matter which design we go with.

I see your RefCell idea as being like dictionary views. Views are
linked to the dict: if the dict is modified, views are "updated" too.
It would be confusing to have a view disconnected from its container.

In short, a RefCell is a view on a single dict entry, to be able to
"watch" a dict entry without the cost of a lookup? And a RefCell can
be created even if the dict entry doesn't exist, right?

Hum, creating many RefCells introduces an overhead somewhere. For
example, deleting a key has to update the N RefCell objects linked to
this key, right? So del dict[x] takes O(number of RefCells), right?

> If a dict key is removed, the inner dict will still have the key
> (which is true in the current implementation), but the value will be
> decref'd and the value pointer will be NULL'd. The RefCell will not
> need to be updated, since (as part of a living dict) it's pointing at
> the pointer to the value, not the object itself.

Ok, so deleting a dict key always destroys the value, but the key may
stay alive longer than expected (until all RefCell objects are
destroyed). Usually, dict keys are constants, so keeping them alive
doesn't matter so much. It's rare to have large keys.
> If it is detached from its owner dict (due to owner death), it will
> own a strong reference to the value. This is necessarily the case,
> since things that have a (strong) reference to the RefCell expect to
> find the value (or lack of a value) there.

I don't understand this part. You said that deleting a key destroys
the value. Destroying a dict means clearing all keys, so destroying
all values. No?

What is the use case of having a RefCell no longer connected to the
dict?

Victor
Re: [Python-Dev] Idea: Dictionary references
On Thu, Dec 17, 2015 at 6:53 AM, Victor Stinner wrote:
> 2015-12-17 11:54 GMT+01:00 Franklin? Lee:
>> My suggestion should improve *all* function calls which refer to
>> outside names.
>
> Ok, I now think that you should stop hijacking the FAT Python thread.
> I started a new thread. IMHO your dictionary reference idea is
> completely unrelated to FAT Python.
>
> FAT Python is about guards and specialized bytecode.

Yeah, maybe it's out of the scope of bytecode optimization. But I
think this would make guards unnecessary, since once a name is found,
there's a quick way to refer to it.

>> Each function keeps an indirect, automagically updated
>> reference to the current value of the names they use, and will never
>> need to look things up again.[*]
>
> Indirections, nested dictionaries, creation of new "reference"
> objects... IMHO you are going to have major implementation issues :-/
> The design looks *very* complex. I'm quite sure that you are going to
> make namespace lookups *slower*.

The nested dictionaries are only for nested scopes (and inner
functions don't create nested scopes). Nested scopes will already
require multiple lookups in parents. I think this is strictly an
improvement, except perhaps in memory.

Guards would also have an issue with nested scopes. You have a note on
your website about it:
(https://faster-cpython.readthedocs.org/fat_python.html#call-pure-builtins)

"The optimization is disabled when the builtin function is modified or
if a variable with the same name is added to the global namespace of
the function."

With a NestedRefCell, it would check globals() (a simple dereference
and `pointer != NULL`) and then check __builtin__. If it finds it in
__builtin__, it will save a reference to that. It will only do
repeated lookups in __builtin__ if each of the previous lookups fail.
As far as I know, these repeated lookups are already necessary, and
anything that can be used to avoid them (e.g. guards) can be used for
repeated failed lookups, too.
For non-nested scopes, it will look things up once, costing an extra
RefCell creation if necessary, and the only other costs are on
resizing, deletion of the dict, and working with a larger dict in
general.

The important parts of the design are pretty much in the code that I
posted. We keep an extra hash table for refs, and keep it the same
size as the original hash table, so that we pay a single lookup cost
to get the index into both.

> It reminds me of Python before the with statement, and of the PyPy
> garbage collector. Many applications relied on the exact behaviour of
> CPython's garbage collector. For example, they expected that a file is
> written to disk when the last reference to the file object is
> destroyed. In PyPy, it wasn't (it isn't) true; the write can be
> delayed.

It should not affect the semantics. Things should still happen as they
used to, as far as I can figure. Or at least as far as the rules of
the interpreter are concerned. (That is, values might live a little
longer in PyPy, but can't be forced to live longer than they were
formerly allowed to.)

> I guess that with all your complex machinery for dict lookups,

The only cost to a normal getitem (again, other than from looking it
up in a bigger dict) is to make sure the return value isn't NULL. The
machinery is involved in function creation and resolution of names: on
function creation, get refs to each name used; when the name is used,
try to resolve the refs.

> you will have similar issues of object lifetime. It's unclear to me
> when and how "reference" objects are destroyed, nor when dict values
> are destroyed.

RefCells are ref-counted PyObjects. That is not an issue. A RefCell
will live while it is useful (= it has an outside reference) or while
it's not useful but its owner dict hasn't been resized/deleted yet (at
which time RefCells without outside references will be deleted).

RefCells "know" whether they're part of a living dict. (The dict marks
them as such upon its death.)
If they are not, they will decref their value upon their death. They
do not hold a reference to their owner dict.

If it's part of a living dict, we have a choice: the dict can be
responsible for deletion, or the RefCell can be responsible for
deletion. It doesn't really matter which design we go with.

> What happens if a dict key is removed and a reference object is still
> alive? Is the dict value immediately destroyed? Does the reference
> object contain a strong or a weak reference to the value?

If a dict key is removed, the inner dict will still have the key
(which is true in the current implementation), but the value will be
decref'd and the value pointer will be NULL'd. The RefCell will not
need to be updated, since (as part of a living dict) it's pointing at
the pointer to the value, not the object itself.

If it is detached from its owner dict (due to owner death), it will
own a strong reference to the value. This is necessarily the case,
since things that have a (strong) reference to the RefCell expect to
find the value (or lack of a value) there.
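[Editor's note: the detach-on-death behaviour described above can be
sketched as follows. All names are hypothetical; a one-element list
stands in for the dict entry's value-pointer slot, and None for NULL.]

```python
class RefCell:
    """Points into a live dict's value slot, or owns the value after
    the owner dict dies and detaches it."""
    def __init__(self, slot):
        self.slot = slot       # mutable one-element list: the "value pointer"
        self.owned = None      # holds a strong reference after detach
        self.alive = True      # is the owner dict still alive?

    def resolve(self):
        value = self.slot[0] if self.alive else self.owned
        if value is None:
            raise KeyError('value was deleted')
        return value

    def detach(self):
        """Called by the dying owner dict: take a strong reference."""
        self.owned = self.slot[0]
        self.alive = False

slot = [23]                    # stand-in for the entry's value pointer
cell = RefCell(slot)
assert cell.resolve() == 23
slot[0] = 42                   # the dict updates the value in place
assert cell.resolve() == 42    # cell sees the update with no lookup
cell.detach()                  # owner dict dies
slot[0] = None                 # entry storage goes away
assert cell.resolve() == 42    # detached cell kept its own strong ref
```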
[Python-Dev] pypi simple index
Hi,

I'm using install_requires in setup.py to specify a specific package my
project is dependent on. When running python setup.py install,
apparently the simple index is used, as an older package is taken from
pypi, while in https://pypi.python.org/pypi there's a newer package.
When installing directly using pip, the latest package is installed
successfully.

I noticed that the new package is only available as a wheel, and older
versions of setuptools won't install wheels for install_requires.
However, upgrading setuptools didn't help.

Several questions:
1. What's the difference between the pypi simple index and the general
   pypi index?
2. Why is setup.py defaulting to the simple index?
3. How can I make the setup.py triggered install use the main pypi
   index instead of simple?

Thanks!
Re: [Python-Dev] pypi simple index
PyPI questions are best directed towards the distutils-sig, as they
manage PyPI, not python-dev.

On Thu, 17 Dec 2015 at 08:20 Carlos Barera wrote:
> Hi,
>
> I'm using install_requires in setup.py to specify a specific package my
> project is dependent on.
> When running python setup.py install, apparently the simple index is
> used, as an older package is taken from pypi. While in
> https://pypi.python.org/pypi, there's a newer package.
> When installing directly using pip, the latest package is installed
> successfully.
>
> Several questions:
> 1. What's the difference between the pypi simple index and the general
>    pypi index?
> 2. Why is setup.py defaulting to the simple index?
> 3. How can I make the setup.py triggered install use the main pypi
>    index instead of simple?
>
> Thanks!
> Carlos
Re: [Python-Dev] Idea: Dictionary references
On Dec 17, 2015, at 07:38, Franklin? Lee wrote:
>
> The nested dictionaries are only for nested scopes (and inner
> functions don't create nested scopes). Nested scopes will already
> require multiple lookups in parents.

I think I understand what you're getting at here, but it's a really
confusing use of terminology. In Python, and in programming in
general, nested scopes refer to exactly inner functions (and classes)
being lexically nested and doing lookup through outer scopes. The fact
that this is optimized at compile time to FAST vs. CELL vs.
GLOBAL/NAME, cells are optimized at function-creation time, and only
global and name have to be resolved at the last second doesn't mean
that there's no scoping, or some other form of scoping besides
lexical. The actual semantics are LEGB, even if L vs. E vs. GB and E
vs. further-out E can be optimized.

What you're talking about here is global lookups falling back to
builtin lookups. There's no more general notion of nesting or scoping
involved, so why use those words?

Also, reading your earlier post, it sounds like you're trying to treat
attribute lookup as a special case of global lookup, only with a chain
of superclasses beyond the class instead of just a single builtins.
But they're totally different. Class lookup doesn't just look in a
series of dicts; it calls __getattribute__, which usually calls
__getattr__, which may or may not look in the __dict__s (which may not
even exist) to find a descriptor and then calls its __get__ method to
get the value.
You'd have to somehow handle the case where the search only went
through object.__getattribute__ and __getattr__ and found a result by
looking in a dict, to make a RefCell to that dict which is marked in
some way that says "I'm not a value, I'm a descriptor you have to call
each time", and then apply some guards that will detect whether that
class or any intervening class dict touched that key, whether the MRO
changed, whether that class or any intervening class added or changed
implementations for __getattribute__ or __getattr__, and probably more
things I haven't thought of. What do those guards look like? (Also,
you need a different set of rules to cache, and guard for, special
method lookup--you could just ignore that, but I think those are the
lookups that would benefit most from optimization.)

So, trying to generalize global vs. builtin to a general notion of
"nested scope" that isn't necessary for builtins and doesn't work for
anything else seems like overcomplicating things for no benefit.

> I think this is strictly an
> improvement, except perhaps in memory. Guards would also have an issue
> with nested scopes. You have a note on your website about it:
> (https://faster-cpython.readthedocs.org/fat_python.html#call-pure-builtins)
Re: [Python-Dev] pypi simple index
On 18 December 2015 at 06:13, Carlos Barera wrote:
> Hi,
>
> I'm using install_requires in setup.py to specify a specific package my
> project is dependent on.
> When running python setup.py install, apparently the simple index is
> used, as an older package is taken from pypi. While

What's happening here is that easy_install is triggering - which does
not support wheels. Use 'pip install .' instead.

> in https://pypi.python.org/pypi, there's a newer package.
> When installing directly using pip, the latest package is installed
> successfully.
> I noticed that the new package is only available as a wheel, and older
> versions of setuptools won't install wheels for install_requires.
> However, upgrading setuptools didn't help.
>
> Several questions:
> 1. What's the difference between the pypi simple index and the general
>    pypi index?

The '/simple' API is for machine consumption, /pypi is for humans;
other than that there should not be any difference.

> 2. Why is setup.py defaulting to the simple index?

Because it is the only index :).

> 3. How can I make the setup.py triggered install use the main pypi
>    index instead of simple?

You can't - the issue is not the index being consulted, but your use
of 'python setup.py install', which does not support wheels.

Cheers,
Rob
Re: [Python-Dev] Idea: Dictionary references
On Thu, Dec 17, 2015 at 11:50 AM, Victor Stinner wrote:
> I don't understand how you plan to avoid guards. The purpose of guards
> is to respect the Python semantics by falling back to the "slow"
> bytecode if something changes. So I don't understand your idea of
> avoiding guards completely. Again, that's why I guess that it's
> unrelated to FAT Python...

Yeah, I guess it is. Maybe we could've moved to python-ideas. As far
as I understand, this concept can be put into CPython. (Note to self:
look up how PyPy handles repeated lookups.)

>> Guards would also have an issue
>> with nested scopes. You have a note on your website about it:
>> (https://faster-cpython.readthedocs.org/fat_python.html#call-pure-builtins)
>>
>> "The optimization is disabled when the builtin function is
>> modified or if a variable with the same name is added to the global
>> namespace of the function."
>
> FAT Python doesn't emit a specialized version if it requires a builtin
> function, but a local variable with the same name is found.
>
> The check is done in the current function but also in the parent
> namespaces, up to the global namespace. I'm talking about the static
> analysis of the code.
>
> If the specialized version is built, a guard is created on the builtin
> namespace and another on the global namespace.
>
> Sorry, I don't understand the "problem" you are referring to. Can you
> maybe show an example of code where FAT Python doesn't work or where
> your idea can help?

Suppose we have to look in scopes A, B, C in order, where the name is
in C but never in B, and there's no nesting relationship between B and
C. In that case, I do not create a refcell in B chained to C (because
there's no relationship), so I keep doing lookups in B. That's the
problem. For that, guards and versioning can prevent unnecessary
lookups in B. Though I think a better solution might be: if a name is
found in C, create empty refcells in B and A (i.e. in all previous
dicts).
(There's a nesting relationship between globals() and __builtin__, so
that's fine.)

>> RefCells "know" whether they're part of a living dict. (The dict marks
>> them as such upon its death.) If they are not, they will decref their
>> value upon their death. They do not hold a reference to their owner
>> dict.
>
> The dict contains the list of all of its "own" RefCell objects, right?

It contains a table of pointers. The pointers are to RefCell objects
or NULL. The refs table is exactly the same size as the internal hash
table. This makes indexing it efficient: to find the pointer to a
refcell, find the index of the key in the hash table, then use that
SAME index on the refs table. You never need to find a refcell without
also finding its hash index, so this is cheap.

> In short, a RefCell is a view on a single dict entry, to be able to
> "watch" a dict entry without the cost of a lookup? And a RefCell can
> be created even if the dict entry doesn't exist, right?

My "implementation", which had nesting and recursion in mind, had a
"create_if_none" parameter, which meant that the requester can ask for
it to be created even if the key didn't exist in the table.
Pre-creation is useful for functions which refer to globals() names
before they're defined. No-creation is useful in... I can only think
of nesting as a use (globals() -> __builtin__ shouldn't create empty
cells in __builtin__). See `getref` in here:
https://mail.python.org/pipermail/python-dev/2015-December/142489.html

> Hum, creating many RefCells introduces an overhead somewhere. For
> example, deleting a key has to update the N RefCell objects linked to
> this key, right? So del dict[x] takes O(number of RefCells), right?

There are no "outside" updates, except when a dict moves to a
different internal table or deletes its internal table. In that case,
the dict has to move and update each exposed RefCell. For each dict,
for each key, there is at most one RefCell.
As long as the dict is alive, that RefCell will hold a pointer to the
pointer to the value (enforced by the dict). When the dict dies, it
makes the RefCell point to the object, and tells the RefCell it's free
(so it's in charge of cleaning up its value).

Dict entries look like this:

    typedef struct {
        /* Cached hash code of me_key. */
        Py_hash_t me_hash;
        PyObject *me_key;
        PyObject *me_value; /* This field is only meaningful for combined tables */
    } PyDictKeyEntry;

The internal table (which the ref table will sync with) is an array of
PyDictKeyEntrys. (Raymond Hettinger made a design with a smaller
table, where the hash lookup would be into an array of indices. This
would make synced metadata tables both easier and smaller. See
https://mail.python.org/pipermail/python-dev/2012-December/123028.html
and the latest relevant discussion
https://mail.python.org/pipermail/python-ideas/2015-December/037468.html
)

The refcell will hold this:

    RefCell(&entry->me_value)

A pointer to the field, not to the object itself.
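[Editor's note: a pure-Python sketch of the synced refs-table idea
described above. The real thing would live in C beside the dict's entry
array; the naive index computation and absence of collision handling
here are purely illustrative.]

```python
class SyncedTables:
    """Keep a refs list exactly as long as the entries list, so the
    index found by a key lookup works for both tables (sketch)."""
    def __init__(self, size=8):
        self.entries = [None] * size   # stand-in for the PyDictKeyEntry array
        self.refs = [None] * size      # same length: one ref slot per entry

    def _index(self, key):
        # stand-in for the real open-addressing probe sequence
        return hash(key) % len(self.entries)

    def insert(self, key, value):
        i = self._index(key)
        self.entries[i] = (key, value)
        return i

    def ref_for(self, key):
        i = self._index(key)           # a single lookup serves both tables
        if self.refs[i] is None:
            self.refs[i] = [self.entries, i]   # cheap cell: (table, index)
        return self.refs[i]

t = SyncedTables()
i = t.insert('spam', 23)
cell = t.ref_for('spam')       # pays the hash lookup once
table, j = cell
assert i == j and table[j] == ('spam', 23)
```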
Re: [Python-Dev] Idea: Dictionary references
On Thu, Dec 17, 2015 at 12:30 PM, Andrew Barnert wrote:
> On Dec 17, 2015, at 07:38, Franklin? Lee wrote:
>>
>> The nested dictionaries are only for nested scopes (and inner
>> functions don't create nested scopes). Nested scopes will already
>> require multiple lookups in parents.
>
> I think I understand what you're getting at here, but it's a really
> confusing use of terminology. In Python, and in programming in
> general, nested scopes refer to exactly inner functions (and classes)
> being lexically nested and doing lookup through outer scopes. The fact
> that this is optimized at compile time to FAST vs. CELL vs.
> GLOBAL/NAME, cells are optimized at function-creation time, and only
> global and name have to be resolved at the last second doesn't mean
> that there's no scoping, or some other form of scoping besides
> lexical. The actual semantics are LEGB, even if L vs. E vs. GB and E
> vs. further-out E can be optimized.

Oh, I've never actually read the Python scoping rules spelled out. I
wasn't sure if there were other cases. The other two cases I thought
of as "nesting" were: object to its class, and class to its
superclasses.

> Also, reading your earlier post, it sounds like you're trying to treat
> attribute lookup as a special case of global lookup, only with a chain
> of superclasses beyond the class instead of just a single builtins.
> But they're totally different. Class lookup doesn't just look in a
> series of dicts; it calls __getattribute__, which usually calls
> __getattr__, which may or may not look in the __dict__s (which may not
> even exist) to find a descriptor and then calls its __get__ method to
> get the value.
> You'd have to somehow handle the case where the search only went
> through object.__getattribute__ and __getattr__ and found a result by
> looking in a dict, to make a RefCell to that dict which is marked in
> some way that says "I'm not a value, I'm a descriptor you have to call
> each time", and then apply some guards that will detect whether that
> class or any intervening class dict touched that key, whether the MRO
> changed, whether that class or any intervening class added or changed
> implementations for __getattribute__ or __getattr__, and probably more
> things I haven't thought of. What do those guards look like? (Also,
> you need a different set of rules to cache, and guard for, special
> method lookup--you could just ignore that, but I think those are the
> lookups that would benefit most from optimization.)

Doesn't __getattr__ only get called if all the mro __dict__ lookups
failed? I forgot about __getattribute__. That might be the point at
which refs are optimized.

As for descriptors versus RefCells, I'm guessing that can be resolved,
as soon as I figure out how descriptors actually work... If
descriptors don't modify the __dict__, then RefCells shouldn't get
involved. If they do, then there's some unwrapping going on there, and
RefCells should fit right in (though whether they'll improve anything
is a different question). RefCells are just a shortcut for dict
lookups.

For guards, I think Victor Stinner's idea could supplement this.
Alternatively, in my other email, I said there could be a rule of:
"Create intermediate RefCells for anything BEFORE a successful
lookup." So if we look in A, B, C, D, and find it in C, then we create
and save RefCells in A, B, C, but not D (where D = object). This MIGHT
result in a lot of intermediate RefCells, but I'd guess most things
aren't looked up just once, and it's saying, "It's possible for B to
gain member B.x and catch me on my way to C.x."

> So, trying to generalize global vs.
> builtin to a general notion of "nested scope" that isn't necessary
> for builtins and doesn't work for anything else seems like
> overcomplicating things for no benefit.

Probably. The globals() and __builtin__ case is simpler than the class
case.
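[Editor's note: a small example of why attribute access is not just a
chain of dict lookups, as Andrew argues above. The property below is a
descriptor found on the class, and its __get__ runs on every access,
so caching a single looked-up "value" in a RefCell would be wrong.]

```python
class Meters:
    def __init__(self, m):
        self._m = m

    @property
    def feet(self):             # a descriptor: __get__ runs per access
        return self._m * 3.28084

d = Meters(2)
assert 'feet' not in d.__dict__        # not in the instance dict
assert 'feet' in type(d).__dict__      # the class dict holds a property
assert abs(d.feet - 6.56168) < 1e-9    # access computes a fresh value
d._m = 3
assert abs(d.feet - 9.84252) < 1e-9    # caching the old value once would
                                       # have returned a stale result
```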
Re: [Python-Dev] Idea: Dictionary references
On Thursday, December 17, 2015 11:19 AM, Franklin? Lee wrote: > ... > as soon as I figure out how descriptors actually work... I think you need to learn what LOAD_ATTR and the machinery around it actually does before I can explain why trying to optimize it like globals-vs.-builtins doesn't make sense. Maybe someone who's better at explaining than me can come up with something clearer than the existing documentation, but I can't. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Idea: Dictionary references
On Friday, December 18, 2015, Andrew Barnert via Python-Dev <python-dev@python.org> wrote: > > >> Builtins do two dict lookups. > > > > Two? > > Actually, I guess three: first you fail to find the name in globals, then > you find __builtins__ in globals, then you find the name in __builtins__ or > __builtins__.__dict__. > Getting builtins from globals is done when the frame object is created, and the lookup is skipped in the common case, if I recall correctly. LOAD_NAME does a lookup on function globals; if the key doesn't exist (the common case for builtins), a second lookup is done in the frame's builtins. Open Python/ceval.c and see the code. There is an optimisation for fast lookups in the two dicts. Victor ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
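The two-dict fallback Victor describes amounts to roughly this in pure Python (a sketch of the bytecode's behaviour, not CPython's actual C fast path, which also avoids recomputing the hash between the two lookups):

```python
import builtins


def load_global(name, globals_ns, builtins_ns):
    """Rough pure-Python equivalent of the LOAD_GLOBAL lookup:
    try the module globals first, then fall back to builtins."""
    try:
        return globals_ns[name]
    except KeyError:
        pass
    try:
        return builtins_ns[name]  # the common case for names like 'len'
    except KeyError:
        raise NameError(f"name {name!r} is not defined") from None


# A module-global shadows the builtin; otherwise the builtin is found.
assert load_global("len", {"len": "shadowed"}, vars(builtins)) == "shadowed"
assert load_global("len", {}, vars(builtins)) is len
```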
Re: [Python-Dev] Idea: Dictionary references
2015-12-17 23:17 GMT+01:00 Andrew Barnert via Python-Dev: > Builtins do two dict lookups. > > So, the only thing you can optimize there is builtins. But maybe that's worth > it. FYI I implemented an optimization in FAT Python to avoid lookups for builtin functions; builtin functions are copied to code constants at runtime: https://faster-cpython.readthedocs.org/fat_python.html#copy-builtin-functions-to-constants It's nothing new; it's the generalization of common hacks, like 'def func(len=len): return len(3)'. The optimization is restricted to loading builtin symbols which are not expected to be modified ;-) (Right now the optimization is disabled by default, because the optimizer is unable to detect when builtins are modified in the current function, and so it breaks the Python semantics.) > Class attributes (including normal methods, @property, etc.) do two or more > dict lookups--first the instance, then the class, then each class on the > class's MRO. Note: Types have an efficient cache for name lookups ;-) Thanks to this cache, it's no longer an issue to have a deep hierarchy of classes. Victor ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
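The default-argument hack Victor generalizes, for reference: binding a builtin as a default captures it at definition time, turning a globals-then-builtins lookup into a fast local (array) access, at the cost of freezing the value — which is exactly the semantic hazard FAT Python's guards have to watch for:

```python
def count_slow(items):
    return len(items)           # LOAD_GLOBAL: globals dict, then builtins

def count_fast(items, len=len):
    return len(items)           # LOAD_FAST: 'len' captured at def time

assert count_fast([1, 2, 3]) == 3

# The downside: count_fast won't see a later change to the builtin.
import builtins
original = builtins.len
builtins.len = lambda x: -1     # monkeypatch the builtin
try:
    assert count_slow([1]) == -1   # the normal lookup sees the patch
    assert count_fast([1]) == 1    # the captured default does not
finally:
    builtins.len = original        # always restore the real builtin
```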
Re: [Python-Dev] Idea: Dictionary references
I already know that we can't use recursion, because it bypasses MRO. I'm also not yet sure whether it makes sense to use refs for classes in the first place. As I understood it, an attribute will resolve in this order: - __getattribute__ up the MRO. (raises AttributeError) - __dict__ up the MRO. (raises KeyError) - __getattr__ up the MRO. (raises AttributeError) My new understanding: - __getattribute__. (raises AttributeError) - (default implementation:) __dict__.__getitem__. (raises KeyError) - __getattr__ up the MRO. (raises AttributeError) If this is the case, then (the default) __getattribute__ will be making the repeated lookups, and might be the one requesting the refcells (for the ones it wants). Descriptors seem to be implemented as: Store a Descriptor object as an attribute. When a Descriptor is accessed, if it is being accessed from its owner, then unbox it and use its methods. Otherwise, it's a normal attribute. Then Descriptors are in the dict, so MIGHT benefit from refcells. The memory cost might be higher, though. On Thu, Dec 17, 2015 at 5:17 PM, Andrew Barnert wrote: > On Dec 17, 2015, at 13:37, Andrew Barnert via Python-Dev > wrote: >> >> On Thursday, December 17, 2015 11:19 AM, Franklin? Lee >> wrote: >> >> >>> ... >>> as soon as I figure out how descriptors actually work... >> >> >> I think you need to learn what LOAD_ATTR and the machinery around it >> actually does before I can explain why trying to optimize it like >> globals-vs.-builtins doesn't make sense. Maybe someone who's better at >> explaining than me can come up with something clearer than the existing >> documentation, but I can't. > > I take that back. First, it was harsher than I intended. Second, I think I > can explain things. I appreciate it! Tracking function definitions in the source can make one want to do something else. > First, for non-attribute lookups: > > (Non-shared) locals just load and save from an array.
> > Free variables and shared locals load and save by going through an extra > dereference on a cell object in an array. In retrospect, of course they do. It sounds like the idea is what's already used there, except the refs are synced to the locals array instead of a hash table. > Globals do a single dict lookup. A single dict lookup per function definition per name used? That's what I'm proposing. For example, (and I only just remembered that comprehensions and gen expressions create scope) [f(x) for x in range(1)] would look up the name `f` at most twice (once in globals(), once in builtins() if needed), and will always have the latest version of `f`. And if it's in a function, the refcell(s) would be saved by the function. > Builtins do two dict lookups. Two? > Class attributes (including normal methods, @property, etc.) do two or more > dict lookups--first the instance, then the class, then each class on the > class's MRO. Then, if the result has a __get__ method, it's called with the > instance and class to get the actual value. This is how bound methods get > created, property lookup functions get called, etc. The result of the > descriptor call can't get cached (that would mean, for example, that every > time you access the same @property on an instance, you'd get the same value). Yeah, I would only try to save in a dict lookup to get the descriptor, and I'm not sure it's worth it. (Victor's response says that class attributes are already efficient, though.) > So, if the globals->builtins optimization is worth doing, don't tie it to > another optimization that's much more complicated and less useful like this, > or we'll never get your simple and useful idea. Sure. I couldn't figure out where to even save the refcells for attributes, so I only really saw an opportunity for name lookups. Since locals and nonlocals don't require dict lookups, this means globals() and __builtin__. 
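The refcell idea for the globals case can be sketched like this. `RefCell` and `RefDict` are hypothetical names for illustration; the actual proposal would build this into the C dict implementation rather than subclassing in Python:

```python
class RefCell:
    """Indirect reference to one dict slot, shared by the dict and by
    any function that captured the cell at definition time."""
    _missing = object()

    def __init__(self):
        self.value = self._missing

    def resolve(self):
        if self.value is self._missing:
            raise NameError("name is not currently bound")
        return self.value


class RefDict(dict):
    """Dict that keeps its handed-out refcells in sync (sketch)."""
    def __init__(self, *args, **kwargs):
        self._cells = {}
        super().__init__(*args, **kwargs)

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        if key in self._cells:          # rebinding updates the cell too
            self._cells[key].value = value

    def __delitem__(self, key):
        super().__delitem__(key)
        if key in self._cells:          # deletion leaves a "dead" cell
            self._cells[key].value = RefCell._missing

    def get_ref(self, key):
        cell = self._cells.get(key)
        if cell is None:
            cell = self._cells[key] = RefCell()
            if key in self:
                cell.value = super().__getitem__(key)
        return cell


g = RefDict()
g["spam"] = 23
cell = g.get_ref("spam")     # a function would grab this once, at definition
assert cell.resolve() == 23
g["spam"] = 42               # rebinding: no later dict lookup needed
assert cell.resolve() == 42
```

Note the cell holds a strong reference to the value here; as discussed later in the thread, whether it should, and what happens on key deletion, is exactly the unresolved lifetime question.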
___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Idea: Dictionary references
On Dec 17, 2015, at 13:37, Andrew Barnert via Python-Dev wrote: > > On Thursday, December 17, 2015 11:19 AM, Franklin? Lee > wrote: > > >> ... >> as soon as I figure out how descriptors actually work... > > > I think you need to learn what LOAD_ATTR and the machinery around it actually > does before I can explain why trying to optimize it like globals-vs.-builtins > doesn't make sense. Maybe someone who's better at explaining than me can come > up with something clearer than the existing documentation, but I can't. I take that back. First, it was harsher than I intended. Second, I think I can explain things. First, for non-attribute lookups: (Non-shared) locals just load and save from an array. Free variables and shared locals load and save by going through an extra dereference on a cell object in an array. Globals do a single dict lookup. Builtins do two dict lookups. So, the only thing you can optimize there is builtins. But maybe that's worth it. Next, for attribute lookups (not counting special methods): Everything calls __getattribute__. Assuming that's not overridden and uses the object implementation: Instance attributes do one dict lookup. Class attributes (including normal methods, @property, etc.) do two or more dict lookups--first the instance, then the class, then each class on the class's MRO. Then, if the result has a __get__ method, it's called with the instance and class to get the actual value. This is how bound methods get created, property lookup functions get called, etc. The result of the descriptor call can't get cached (that would mean, for example, that every time you access the same @property on an instance, you'd get the same value). Dynamic attributes from a __getattr__ do all that plus whatever __getattr__ does. If __getattribute__ is overridden, it's entirely up to that implementation to do whatever it wants.
Things are similar for set and del: they call __setattr__/__delattr__, and the default versions of those look in the instance dict first, then look for a descriptor the same as with get except that they call a different method on the descriptor (and if it's not a descriptor, instead of using it, they ignore it and go back to the instance dict). So, your mechanism can't significantly speed up method lookups, properties, or most other things. It could speed up lookups for class attributes that aren't descriptors, but only at the cost of increasing the size of every instance--and how often do those matter anyway? A different mechanism that cached references to descriptors instead of to the resulting attributes could speed up method lookups, etc., but only by a very small amount, and with the same space cost. A mechanism that didn't try to get involved with the instance dict, and just flattened out the MRO search once that failed (and was out of the way before the descriptor call or __getattr__ even entered the picture) might speed methods up in deeply nested hierarchies, and with only a per-class rather than a per-instance space cost. But how often do you have deeply-nested hierarchies? And the speedup still isn't going to be that big: You're basically turning 5 dict lookups plus 2 method calls into 2 dict lookups plus 2 method calls. And it would still be much harder to guard than the globals dict: if any superclass changes its __bases__ or adds or removes a __getattribute__ or various other things, all of your references have to get re-computed. That's rare enough that the speed may not matter, but the code complexity probably does. In short: if you can't cache the bound methods (and as far as I can tell, in general you can't--even though 99% of the time it would work), I don't think there's any other significant win here.
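The lookup order Andrew walks through can be sketched in pure Python. This is a deliberately simplified model of the default `object.__getattribute__` (the real logic is `PyObject_GenericGetAttr` in Objects/object.c); it ignores `__slots__`, metaclasses, and `__set__`-only descriptors:

```python
_missing = object()  # sentinel: distinguishes "absent" from a stored None


def generic_getattr(obj, name):
    """Simplified sketch of object.__getattribute__'s priority order."""
    cls = type(obj)
    # 1. Find the attribute somewhere along the type's MRO.
    meta_attr = _missing
    for klass in cls.__mro__:
        if name in klass.__dict__:
            meta_attr = klass.__dict__[name]
            break
    meta_type = type(meta_attr)
    # 2. A data descriptor (__get__ and __set__) beats the instance dict.
    if (meta_attr is not _missing
            and hasattr(meta_type, "__set__") and hasattr(meta_type, "__get__")):
        return meta_type.__get__(meta_attr, obj, cls)
    # 3. Next, the instance __dict__.
    inst_dict = getattr(obj, "__dict__", {})
    if name in inst_dict:
        return inst_dict[name]
    # 4. Next, a non-data descriptor (e.g. a function becoming a bound method).
    if meta_attr is not _missing and hasattr(meta_type, "__get__"):
        return meta_type.__get__(meta_attr, obj, cls)
    # 5. Finally, a plain class attribute -- else AttributeError
    #    (at which point the real machinery would try __getattr__).
    if meta_attr is not _missing:
        return meta_attr
    raise AttributeError(name)


class C:
    x = 10                      # plain class attribute

    @property
    def p(self):                # data descriptor
        return self.n * 2

    def m(self):                # non-data descriptor (function)
        return "method"


c = C()
c.n = 5
c.__dict__["x"] = 99
c.__dict__["p"] = "shadow"
assert generic_getattr(c, "x") == 99        # instance dict beats plain attr
assert generic_getattr(c, "p") == 10        # property beats instance dict
assert generic_getattr(c, "m")() == "method"  # function bound via __get__
```

This makes Andrew's caching point visible: steps 2 and 4 call `__get__` on every access, so only the descriptor object itself, never its result, could safely be cached.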
So, if the globals->builtins optimization is worth doing, don't tie it to another optimization that's much more complicated and less useful like this, or we'll never get your simple and useful idea. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Idea: Dictionary references
On Thu, Dec 17, 2015 at 6:41 PM, Franklin? Lee wrote: > Then Descriptors are in the dict, so MIGHT benefit from refcells. The > memory cost might be higher, though. Might be worse than the savings, I mean. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Idea: Dictionary references
On Dec 17, 2015, at 15:41, Franklin? Lee wrote: > > I already know that we can't use recursion, because it bypasses MRO. > I'm also not yet sure whether it makes sense to use refs for classes > in the first place. > > As I understood it, an attribute will resolve in this order: > - __getattribute__ up the MRO. (raises AttributeError) > - __dict__ up the MRO. (raises KeyError) > - __getattr__ up the MRO. (raises AttributeError) > > > My new understanding: > - __getattribute__. (raises AttributeError) > - (default implementation:) __dict__.__getitem__. (raises KeyError) > - __getattr__ up the MRO. (raises AttributeError) No, still completely wrong. If __getattribute__ raises an AttributeError (or isn't found, but that only happens in special cases like somehow calling a method on a type that hasn't been constructed), that's the end of the line; there's no fallback, and everything else happens inside __getattribute__ itself (IIRC: searching MRO dicts for data descriptors, searching the instance dict, searching MRO dicts for non-data descriptors or non-descriptors, the special-method lookup-and-call of __getattr__, raising AttributeError... and then doing the appropriate descriptor call at the end if needed). I was going to say that the only custom __getattribute__ you'll find in builtins or stdlib is on type, which does the exact same thing except when it calls a descriptor it does __get__(None, cls) instead of __get__(obj, type(obj)), and if you find any third-party __getattribute__ you should just assume it's going to do something crazy and don't bother trying to help it. But then I remembered that super must have a custom __getattribute__, so... you'd probably need to search the code for others. > If this is the case, then (the default) __getattribute__ will be > making the repeated lookups, and might be the one requesting the > refcells (for the ones it wants). Yes, the default and type __getattribute__ are what you'd want to optimize, if anything. And maybe special-method lookup.
> Descriptors seem to be implemented as: > Store a Descriptor object as an attribute. When a Descriptor is > accessed, if it is being accessed from its owner, then unbox it and > use its methods. Otherwise, it's a normal attribute. Depending on what you mean by "owner", I think you have that backward. If the instance itself stores a descriptor, it's just used as itself; if the instance's _type_ (or a supertype) stores one, it's called to get the instance attribute. > Then Descriptors are in the dict, so MIGHT benefit from refcells. The > memory cost might be higher, though. Same memory cost. They're just objects whose type's dicts happen to have a __get__ method (just like iterables are just objects whose type's dicts happen to have an __iter__ method). The point is that you can't cache the result of the descriptor call; you can cache the descriptor itself, but it will rarely help, and the builtin method cache probably already takes care of 99% of the cases where it would help, so I don't see what you're going to get here. >> On Thu, Dec 17, 2015 at 5:17 PM, Andrew Barnert wrote: >>> On Dec 17, 2015, at 13:37, Andrew Barnert via Python-Dev >>> wrote: >>> >>> On Thursday, December 17, 2015 11:19 AM, Franklin? Lee >>> wrote: >>> >>> ... as soon as I figure out how descriptors actually work... >>> >>> >>> I think you need to learn what LOAD_ATTR and the machinery around it >>> actually does before I can explain why trying to optimize it like >>> globals-vs.-builtins doesn't make sense. Maybe someone who's better at >>> explaining than me can come up with something clearer than the existing >>> documentation, but I can't. >> >> I take that back. First, it was harsher than I intended. Second, I think I >> can explain things. > > I appreciate it! Tracking function definitions in the source can make > one want to do something else. The documentation is pretty good for this stuff (and getting better every year).
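Andrew's "owner" correction can be shown concretely: the descriptor protocol only fires when the descriptor lives in the *type's* dict, not the instance's. A minimal demo:

```python
class Ten:
    """A minimal non-data descriptor: always returns 10."""
    def __get__(self, obj, objtype=None):
        return 10


class WithClassDesc:
    attr = Ten()              # stored in the class dict


class Plain:
    pass


a = WithClassDesc()
assert a.attr == 10           # __get__ was called: descriptor protocol fired

b = Plain()
b.attr = Ten()                # stored in the instance dict
assert isinstance(b.attr, Ten)  # no __get__ call; you get the object itself
```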
You mainly want the data model chapter of the reference and the descriptor howto guide; the dis and inspect docs in the library can also be helpful. Together they'll answer most of what you need. If they don't, maybe I will try to write up an explanation as a blog post, but I don't think it needs to get sent to the list (except for the benefit of core devs calling me out if I screw up, but they have better things to do with their time). >> First, for non-attribute lookups: >> >> (Non-shared) locals just load and save from an array. >> >> Free variables and shared locals load and save by going through an extra >> dereference on a cell object in an array. > > In retrospect, of course they do. > > It sounds like the idea is what's already used there, except the refs > are synced to the locals array instead of a hash table. Yes, which is already faster than what you want to do.
Re: [Python-Dev] Idea: Dictionary references
On Fri, Dec 18, 2015 at 2:37 PM, Andrew Barnert via Python-Dev wrote: > If __getattribute__ raises an AttributeError (or isn't found, but that only > happens in special cases like somehow calling a method on a type that hasn't > been constructed) Wow. How do you do that? Is it possible with pure Python? ChrisA ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Idea: Dictionary references
2015-12-17 11:54 GMT+01:00 Franklin? Lee: > My suggestion should improve *all* function calls which refer to > outside names. Ok, I now think that you should stop hijacking the FAT Python thread. I start a new thread. IMHO your dictionary reference idea is completely unrelated to FAT Python. FAT Python is about guards and specialized bytecode. > Each function keeps an indirect, automagically updated > reference to the current value of the names they use, and will never > need to look things up again.[*] Indirections, nested dictionaries, creation of new "reference" objects... IMHO you are going to have major implementation issues :-/ The design looks *very* complex. I'm quite sure that you are going to make namespace lookups *slower*. It reminds me of Python before the with statement and the PyPy garbage collector. Many applications relied on the exact behaviour of CPython's garbage collector. For example, they expected that a file is written to disk when the last reference to the file object is destroyed. In PyPy, it wasn't (it isn't) true; the write can be delayed. I guess that with all your complex machinery for dict lookups, you will have similar issues of object lifetime. It's unclear to me when and how "reference" objects are destroyed, nor when dict values are destroyed. What happens if a dict key is removed and a reference object is still alive? Is the dict value immediately destroyed? Does the reference object contain a strong or a weak reference to the value? Victor ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
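Victor's strong-vs-weak question is the crux of the lifetime problem. A small illustration with the stdlib's weakref module (this shows the two behaviours a reference object could have; the proposal itself left the choice unspecified, and the deterministic collection shown here relies on CPython's reference counting):

```python
import weakref


class Value:
    """Placeholder for a dict value whose lifetime we want to observe."""


d = {"spam": Value()}

# Strong reference: the cell keeps the value alive past key deletion.
strong = d["spam"]
del d["spam"]
assert isinstance(strong, Value)   # still alive, pinned by the cell

# Weak reference: the value dies with its last strong reference.
d["eggs"] = Value()
weak = weakref.ref(d["eggs"])
assert weak() is not None
del d["eggs"]                      # CPython: refcount hits zero here
assert weak() is None              # the cell has gone "dead"
```

So a strong-referencing cell answers "is the dict value immediately destroyed?" with "no", while a weak-referencing cell keeps deletion semantics intact but must handle dead cells on every resolve.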
Re: [Python-Dev] Third milestone of FAT Python
On Wed, Dec 16, 2015 at 2:01 AM, Victor Stinner wrote: > On Wednesday, December 16, 2015, Franklin? Lee > wrote: >> >> I am confident that the time overhead and the savings will beat the >> versioning dict. The versioning dict method has to save a reference to >> the variable value and a reference to the name, and regularly test >> whether the dict has changed. > > > The performance of guards matters less than the performance of regular usage > of dict. If we have to make a choice, I prefer "slow" guard but no impact on > regular dict methods. It's very important that enabling FAT mode doesn't > kill performance. Remember that FAT Python is a static optimizer and so can > only optimize some patterns, not all Python code. > > In my current implementation, a lookup is only needed when a guard is checked > if the dict was modified. The dict version doesn't change if a mutable > object was modified in place for example. I didn't benchmark, but I expect > that the lookup is avoided in most cases. You should try FAT Python and > implement statistics before going too far in your idea. My suggestion should improve *all* function calls which refer to outside names. Each function keeps an indirect, automagically updated reference to the current value of the names they use, and will never need to look things up again.[*] There wouldn't be a need to save global names as default arguments (`def f(x, list=list):`). [*]: Not exactly true. Nested scopes can cause an issue. I'm not sure what happens if you redefine the __builtin__ name after these functions are defined. My suggestion would not know that the __builtin__ was switched out, since it saves a ref into the original __builtin__. I'm not sure about how to deal with nested scopes (for example, class inheritance). I think the "chained RefCells" idea works. Chained Refcell idea: There are several cases where we can have nested scopes: 1. An object's __dict__ nests its class. 2. A class's __dict__ nests its superclasses. 3.
globals() nests __builtin__. 4? Does a package nest its modules? 5?. Does `from module import *` nest the module's __dict__? (Nonlocal variables in nested functions, and probably nested classes, are not a case of nested scope, since scope of each name is determined during compilation of the function/class.) RefCells of nested scopes will have a pointer to their value (or NULL), and an array/linkedlist of pointers to refcells from their parent dicts (or to their parent dicts if a refcell wasn't successfully acquired yet). When you request a RefCell from a nested Scope, it will return its value if it exists. Otherwise, it requests refcells from each parent (parents won't create refcells unless there's a value) until it gets one. When you ask a RefCell to resolve, it will check its own value, then ask each parent for a value (creating intermediate refcells *if* value exist). It will not need to do lookups in parents if it got a refcell before (though the refcell might be null). Problem: If you have class A(B, C), and you resolve a refcell for a name which exists in C but not B, you will look things up through B's dict every single time. It will fail, every single time. We can't skip B, since B is allowed to get such a name later, but I don't want to add refs to names that might never exist. This can be solved either through versioning or by checking whether a dict is read-only (such as for built-in types). In fact, in the code I wrote at the end of this email, RefCell.resolve() might even look things up in a shared ancestor multiple times. However, this would be incorrect anyway, since it doesn't respect Python's MRO resolution. So we can just fix that. RefCell.resolve would need a `search_parents: bool` parameter. >> I've read it again. By subclass, I mean that it implements the same >> interface. But at the C level, I want to have it be a fork(?) of the >> current dict implementation. 
As for `exec`, I think it might be okay >> for it to be slower at the early stages of this game. > > > Be careful, dict methods are hardcoded in the C code. If your type is not a > subtype, there is risk of crashes. Not exactly, and this is important. Many functions are called via pointer. It's like C++'s virtual methods, but more dynamic, since they can be changed per object. See https://github.com/python/cpython/blob/master/Objects/dict-common.h#L17. For example, the lookup method for str-only dicts swaps itself for a general object lookup method if it finds a non-string key. See https://github.com/python/cpython/blob/master/Objects/dictobject.c#L549. I'm now suggesting that there be additional space in the dict object itself to hold more function pointers (get_ref and get_hash_table_index), which would slightly increase memory cost and creation time. It won't have extra cost for running normal methods. When the dict gets a request for a reference, it will swap in methods that know how to