Re: [Python-Dev] Instance variable access and descriptors
Hi,

On Tue, Jun 12, 2007 at 08:10:26PM +1200, Greg Ewing wrote:
> Rather than spend time tinkering with the lookup order,
> it might be more productive to look into implementing
> a cache for attribute lookups.

See patch #1700288.

Armin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Instance variable access and descriptors
Phillip J. Eby wrote:
> ...at the cost of slowing down access to properties and __slots__,
> by adding an *extra* dictionary lookup there.

Rather than spend time tinkering with the lookup order, it might be more productive to look into implementing a cache for attribute lookups. That would help with method lookups as well, which are probably more frequent than instance var accesses.

--
Greg
Re: [Python-Dev] Instance variable access and descriptors
On Tue, Jun 12, 2007 at 08:10:26PM +1200, Greg Ewing wrote:
> Phillip J. Eby wrote:
> > ...at the cost of slowing down access to properties and __slots__,
> > by adding an *extra* dictionary lookup there.
>
> Rather than spend time tinkering with the lookup order,
> it might be more productive to look into implementing
> a cache for attribute lookups. That would help with method
> lookups as well, which are probably more frequent than
> instance var accesses.

Was wondering the same; specifically, hijacking the PEP 280 celldict approach for this.

Downside: this would break code that tries to do PyDict_* calls on a class tp_dict. I haven't dug extensively, but I'm sure there are a few out there.

The main thing I like about that approach is that it avoids the staleness verification crap: a single lookup, it's there or it isn't. It would also be reusable for PEP 280.

If folks don't much like the hit from tracing back to a cell holding an actual value, it could always be implemented such that upon change, the change propagates out to the instances registered (iow, change a.__dict__, it notifies b.__dict__ of the change, etc., till it hits a point where the change doesn't need to go further).

~harring
Re: [Python-Dev] Instance variable access and descriptors
While you're at it, it would be nice to fix this ugly asymmetry I found in descriptors. It seems that a descriptor's __get__ is called even when accessed from a class rather than an instance, but __set__ is only invoked from instances, never from classes:

    class Descr(object):
        def __get__(self, obj, objtype):
            print "__get__ from instance %s, type %s" % (obj, objtype)
            return "foo"

        def __set__(self, obj, value):
            print "__set__ on instance %s, value %s" % (obj, value)

    class Foo(object):
        foo = Descr()

    print Foo.foo  # works

    ## doesn't work, goes directly to the class dict, not calling __set__
    Foo.foo = 123

Because of this problem, I may have to install properties into a class's metaclass to achieve the same effect that I expected to achieve with a simple descriptor :-(

On 10/06/07, Aahz wrote:
> On Sun, Jun 10, 2007, Eyal Lotem wrote:
[snip]
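The asymmetry described in this message still holds in current Python 3, and can be checked with a small sketch. The names Descr, Foo, Meta and Baz below are illustrative only; the descriptor records its calls in a list instead of printing them, so the behaviour can be asserted. The last part shows the metaclass workaround the message mentions, under the assumption that it is acceptable for the class itself to be treated as the "instance".

```python
class Descr:
    """Data descriptor that records every __get__/__set__ call."""

    def __init__(self):
        self.calls = []

    def __get__(self, obj, objtype=None):
        self.calls.append(("get", obj, objtype))
        return "foo"

    def __set__(self, obj, value):
        self.calls.append(("set", obj, value))


class Foo:
    foo = Descr()


descr = Foo.__dict__["foo"]

# Reading through the class calls __get__ with obj=None.
value = Foo.foo

# Assigning through the class rebinds the name in the class namespace;
# the descriptor's __set__ is never consulted.
Foo.foo = 123


# Workaround: put the descriptor on the metaclass, so that the class is
# the "instance" from the descriptor's point of view.
class Meta(type):
    bar = Descr()


class Baz(metaclass=Meta):
    pass


meta_descr = Meta.__dict__["bar"]
Baz.bar = 456  # dispatched to meta_descr.__set__(Baz, 456)
```

After running this, descr.calls holds a single "get" entry, Foo.__dict__["foo"] is the plain integer 123, and meta_descr.calls holds the "set" entry for Baz.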
Re: [Python-Dev] Instance variable access and descriptors
Hi Eyal,

On Sun, Jun 10, 2007 at 04:13:38AM +0300, Eyal Lotem wrote:
> I must be missing something, as I really see no reason to keep the
> existing semantics other than backwards compatibility (which can be
> achieved by introducing a __fastattr__ or such).
>
> Can you explain under which situations or find any example situation
> where the existing semantics are desirable?

The existing semantics are essential when dealing with metaclasses. Many of the descriptors of the 'type' class would stop working without them. For example, the fact that 'x.__class__' normally gives the type of 'x' for any object x relies on this. Reading the '__dict__' attribute of types is also based on this.

Before proposing changes, be sure you understand exactly how the following works:

    >>> object.__class__
    <type 'type'>
    >>> object.__dict__['__class__']
    <attribute '__class__' of 'object' objects>

    >>> class A(object):
    ...     pass
    ...
    >>> A.__dict__
    <dictproxy object at 0xb7c98e6c>
    >>> A.__dict__['__dict__']
    <attribute '__dict__' of 'A' objects>

A bientot,

Armin.
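For readers on Python 3, the session in Armin's message can be restated as a short checked sketch. The class A and the variable names are illustrative; the point is only that a data descriptor found on the metatype takes precedence over an entry of the same name in the object's own namespace, which is what makes 'object.__class__' and 'A.__dict__' work.

```python
import types


class A:
    pass


# 'object.__class__' is answered by a data descriptor found on the
# metatype's MRO (type -> object), called with 'object' as the instance.
class_descr = object.__dict__["__class__"]
class_of_object = class_descr.__get__(object, type)

# 'A.__dict__' works the same way: the descriptor in type's namespace
# wins over the unrelated '__dict__' entry stored in A's own namespace.
type_dict_descr = type.__dict__["__dict__"]
own_dict_descr = A.__dict__["__dict__"]
namespace_via_descr = dict(type_dict_descr.__get__(A, type))

# The entry in A's own namespace serves *instances* of A instead.
a = A()
a.x = 1
instance_dict = own_dict_descr.__get__(a, A)
```

If the object's own namespace were consulted first, 'A.__dict__' would find own_dict_descr rather than the mapping of A's attributes, which is the breakage the message warns about.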
Re: [Python-Dev] Instance variable access and descriptors
On Sun, Jun 10, 2007, Eyal Lotem wrote:
> Python, probably through the valid assumption that most attribute
> lookups go to the class, tries to look for the attribute in the class
> first, and in the instance, second.
>
> What Python currently does is quite peculiar!
> Here's a short description of PyObject_GenericGetAttr:
>
> A. Python looks for a descriptor in the _entire_ mro hierarchy
>    (len(mro) class/type check and dict lookups).
> B. If Python found a descriptor and it has both get and set functions
>    - it uses it to get the value and returns, skipping the next stage.
> C. If Python either did not find a descriptor, or found one that has
>    no setter, it will try a lookup in the instance dict.
> D. If Python failed to find it in the instance, it will use the
>    descriptor's getter, and if it has no getter it will use the
>    descriptor itself.

Guido, Ping, and I tried working on this at the sprint for PyCon 2003. We were unable to find any solution that did not affect critical-path timing. As other people have noted, the current semantics cannot be changed. I'll also echo other people and suggest that this discussion be moved to python-ideas if you want to continue pushing for a change in semantics.

I just did a Google for my notes from PyCon 2003 and it appears that I never sent them out (probably because they aren't particularly comprehensible). Here they are for the record (from 3/25/2003):

'''
CACHE_ATTR is the name used to describe a speedup (for new-style classes only) in attribute lookup by caching the location of attributes in the MRO. Some of the non-obvious bits of code:

* If a new-style class has any classic classes in its bases, we can't do attribute caching (we need weakrefs to the derived classes).

* If searching the MRO for an attribute discovers a data descriptor (has tp_descr_set), that overrides any attribute that might be in the instance; however, the existence of tp_descr_get still permits the instance to override its bases (but tp_descr_get is called if there is no instance attribute).

* We need to invalidate the cache for the updated attribute in all derived classes in the following cases:

    * an attribute is added to or deleted from the class or its base classes

    * an attribute has its status changed to or from being a data descriptor

This file uses Python pseudocode to describe changes necessary to implement CACHE_ATTR at the C level. Except for class Meta, these are all exact descriptions of the work being done. Except for class Meta the changes go into object.c (Meta goes into typeobject.c). The pseudocode looks somewhat C-like to ease the transformation.
'''

NULL = object()

def getattr(inst, name):
    isdata, where = lookup(inst.__class__, name)
    if isdata:
        # A data descriptor overrides the instance dict.
        descr = where[name]
        if hasattr(descr, "__get__"):
            return descr.__get__(inst)
        else:
            return descr
    value = inst.__dict__.get(name, NULL)
    if value is not NULL:
        return value
    if where is NULL:
        raise AttributeError
    descr = where[name]
    if hasattr(descr, "__get__"):
        value = descr.__get__(inst)
    else:
        value = descr
    return value

def setattr(inst, name, value):
    isdata, where = lookup(inst.__class__, name)
    if isdata:
        descr = where[name]
        descr.__set__(inst, value)
        return
    inst.__dict__[name] = value

def lookup(cls, name):
    if cls.__cache__ is not NULL:
        pair = cls.__cache__.get(name)
    else:
        # No cache for this class; None (not NULL) so the test below fails.
        pair = None
    if pair:
        return pair
    else:
        for c in cls.__mro__:
            where = c.__dict__
            if name in where:
                descr = where[name]
                isdata = hasattr(descr, "__set__")
                pair = isdata, where
                break
        else:
            pair = False, NULL
        if cls.__cache__ is not NULL:
            cls.__cache__[name] = pair
        return pair

'''
These changes go into typeobject.c; they are not a complete description of what happens during creation/updates, only the changes necessary to implement CACHE_ATTR.
'''

from types import ClassType

class Meta(type):
    def _invalidate(cls, name):
        if name in cls.__cache__:
            del cls.__cache__[name]
        for c in cls.__subclasses__():
            if name not in c.__dict__:
                c._invalidate(name)

    def _build_cache(cls, bases):
        for base in bases:
            if type(base.__class__) is ClassType:
                cls.__cache__ = NULL
                break
        else:
            cls.__cache__ = {}

    def __new__ (cls, bases):
        cls._build_cache(bases)

    def __setbases__(cls, bases):
        cls._build_cache(bases)

    def __setattr__(cls, name, value):
        if cls.__cache__ != NULL:
            old = cls.__dict__.get(name, NULL)
            wasdata = old != NULL and hasattr(old, "__set__")
            isdata = value != NULL and hasattr(value, "__set__")
            if wasdata != isdata or (old == NULL)
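The CACHE_ATTR notes are pseudocode for C-level changes and are cut off in this archive. As a rough illustration of the idea only, here is a minimal runnable Python 3 sketch. It is not the 2003 design or the later patch: the names lookup, invalidate, CachingMeta and mro_walks are hypothetical, the cache is a module-level dict holding strong references (the notes call for weakrefs), and invalidation conservatively walks all subclasses rather than stopping at ones that define the name themselves.

```python
_cache = {}    # (class, attribute name) -> (is_data, namespace or None)
mro_walks = 0  # number of cache misses, kept only for demonstration


def lookup(cls, name):
    """Return (is_data_descriptor, defining namespace) for a class attribute."""
    global mro_walks
    key = (cls, name)
    if key in _cache:
        return _cache[key]
    mro_walks += 1
    pair = (False, None)
    for klass in cls.__mro__:
        namespace = klass.__dict__
        if name in namespace:
            is_data = hasattr(type(namespace[name]), "__set__")
            pair = (is_data, namespace)
            break
    _cache[key] = pair
    return pair


def invalidate(cls, name):
    """Drop the cached entry for name in cls and in every subclass."""
    _cache.pop((cls, name), None)
    for sub in cls.__subclasses__():
        invalidate(sub, name)


class CachingMeta(type):
    """Metaclass that invalidates the cache when class attributes change."""

    def __setattr__(cls, name, value):
        super().__setattr__(name, value)
        invalidate(cls, name)

    def __delattr__(cls, name):
        super().__delattr__(name)
        invalidate(cls, name)


class Base(metaclass=CachingMeta):
    x = 1


class Derived(Base):
    pass


first = lookup(Derived, "x")   # walks the MRO and fills the cache
second = lookup(Derived, "x")  # served from the cache
walks_before_change = mro_walks

Base.x = property(lambda self: 2)  # plain attribute becomes a data descriptor
third = lookup(Derived, "x")       # cache was invalidated, so the MRO is walked again
```

The change of Base.x from a plain value to a property is exactly the "status changed to or from being a data descriptor" case the notes list as requiring invalidation in derived classes.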
[Python-Dev] Instance variable access and descriptors
Hi.

I was surprised to find in my profiling that instance variable access was pretty slow. I looked through the CPython code involved, and discovered something that really surprises me.

Python, probably through the valid assumption that most attribute lookups go to the class, tries to look for the attribute in the class first, and in the instance, second.

What Python currently does is quite peculiar! Here's a short description of PyObject_GenericGetAttr:

A. Python looks for a descriptor in the _entire_ mro hierarchy (len(mro) class/type check and dict lookups).
B. If Python found a descriptor and it has both get and set functions - it uses it to get the value and returns, skipping the next stage.
C. If Python either did not find a descriptor, or found one that has no setter, it will try a lookup in the instance dict.
D. If Python failed to find it in the instance, it will use the descriptor's getter, and if it has no getter it will use the descriptor itself.

I believe the average costs of A are much higher than of C, because there is just 1 instance dict to look through, and it is also typically smaller than the class dicts (which matters in the rare cases of worst-case hash lookup timings), while there are len(mro) dicts to look for a descriptor in.

This means that for simple instance variable lookups, Python is paying the full mro lookup price!

I believe that this should be changed, so that Python first looks for the attribute in the instance's dict and only then through the dict's mro.

This will have the following effects:

A. It will break code that uses instance.__dict__['var'] directly, when 'var' exists as a property with a __set__ in the class. I believe this is not significant.
B. It will simplify getattr's semantics. Python should _always_ give precedence to instance attributes over class ones, rather than have very weird special-cases (such as a property with a __set__).
C. It will greatly speed up instance variable access, especially when the class has a large mro.

I think obviously the code breakage is the worst problem. This could probably be addressed by a transition version in which Python warns about any instance attributes that existed in the mro as descriptors as well.

What do you think?
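The four lookup steps (A to D) listed in this message can be sketched in pure Python 3. This is an approximation for illustration, not CPython's implementation: generic_getattr and find_in_mro are hypothetical names, the instance is assumed to have a __dict__, and __getattr__ fallbacks and __slots__ are ignored.

```python
_MISSING = object()


def find_in_mro(cls, name):
    """Step A: search every class namespace along the MRO."""
    for klass in cls.__mro__:
        if name in klass.__dict__:
            return klass.__dict__[name]
    return _MISSING


def generic_getattr(obj, name):
    cls = type(obj)
    attr = find_in_mro(cls, name)
    attr_type = type(attr)
    has_get = attr is not _MISSING and hasattr(attr_type, "__get__")
    # Step B: a data descriptor (getter plus setter or deleter) wins outright.
    if has_get and (
        hasattr(attr_type, "__set__") or hasattr(attr_type, "__delete__")
    ):
        return attr_type.__get__(attr, obj, cls)
    # Step C: otherwise the instance dict gets its chance.
    instance_dict = vars(obj)
    if name in instance_dict:
        return instance_dict[name]
    # Step D: fall back to the getter, or to the class attribute itself.
    if has_get:
        return attr_type.__get__(attr, obj, cls)
    if attr is not _MISSING:
        return attr
    raise AttributeError(name)


class C:
    label = "class attribute"

    def __init__(self):
        self._x = 42

    @property
    def x(self):          # property is always a data descriptor
        return self._x

    def method(self):     # functions are non-data descriptors
        return "method"


c = C()
c.__dict__["x"] = 43              # cannot shadow the property
c.__dict__["method"] = "shadowed"  # can shadow the method
```

The sketch makes the cost argument visible: even for a plain instance variable such as _x, step A runs to the end of the MRO before step C is reached.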
Re: [Python-Dev] Instance variable access and descriptors
On 6/9/07, Eyal Lotem wrote:
> I believe that this should be changed, so that Python first looks for
> the attribute in the instance's dict and only then through the dict's
> mro.
[snip]
> What do you think?

Are you suggesting that the following code should print 43 instead of 42? ::

    >>> class C(object):
    ...     x = property(lambda self: self._x)
    ...     def __init__(self):
    ...         self._x = 42
    ...
    >>> c = C()
    >>> c.__dict__['x'] = 43
    >>> c.x
    42

If so, this is a pretty substantial backwards incompatibility, and you should probably post this to python-ideas first to hash things out. If people like it there, the right target is probably Python 3000, not Python 2.x.

STeVe
--
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy
Re: [Python-Dev] Instance variable access and descriptors
On 6/10/07, Steven Bethard wrote:
> On 6/9/07, Eyal Lotem wrote:
> > I believe that this should be changed, so that Python first looks for
> > the attribute in the instance's dict and only then through the dict's
> > mro.
>
> Are you suggesting that the following code should print 43 instead of 42? ::
>
>     >>> class C(object):
>     ...     x = property(lambda self: self._x)
>     ...     def __init__(self):
>     ...         self._x = 42
>     ...
>     >>> c = C()
>     >>> c.__dict__['x'] = 43
>     >>> c.x
>     42

On 6/9/07, Eyal Lotem wrote:
> Yes, I do suggest that.
>
> But it's important to notice that this is not a suggestion in order to
> improve Python, but one that makes it possible to get reasonable
> performance out of CPython. As such, I don't believe it should be done
> in Py3K.
>
> Firstly, like everything that breaks backwards compatibility, it is
> possible to have a transitional version that spits warnings for all
> problems (detect name clashes between properties and instance dict).

Sure, but then you're talking about really introducing this in Python 2.7, with 2.6 as a transitional version. So take a minute to look at the release timelines:

http://www.python.org/dev/peps/pep-0361/
    The initial 2.6 target is for April 2008.

http://www.python.org/dev/peps/pep-3000/
    I hope to have a first alpha release (3.0a1) out in the first half of 2007; it should take no more than a year from then before the first proper release, named Python 3.0

So I'm expecting Python 3.0 to come out *before* 2.7. Thus if you're proposing a backwards-incompatible change that would have to wait until 2.7 anyway, why not propose it for 3.0 where backwards-incompatible changes are more acceptable?

STeVe
--
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy
Re: [Python-Dev] Instance variable access and descriptors
At 12:23 AM 6/10/2007 +0300, Eyal Lotem wrote:
> A. It will break code that uses instance.__dict__['var'] directly,
>    when 'var' exists as a property with a __set__ in the class. I
>    believe this is not significant.
> B. It will simplify getattr's semantics. Python should _always_ give
>    precedence to instance attributes over class ones, rather than have
>    very weird special-cases (such as a property with a __set__).

Actually, these are features that are both used and desirable; I've been using them both since Python 2.2 (i.e., for many years now). I'm -1 on removing these features from any version of Python, even 3.0.

> C. It will greatly speed up instance variable access, especially when
>    the class has a large mro.

...at the cost of slowing down access to properties and __slots__, by adding an *extra* dictionary lookup there.

Note, by the way, that if you want to change attribute lookup semantics, you can always override __getattribute__ and make it work whatever way you like, without forcing everybody else to change *their* code.
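A minimal Python 3 sketch of that last suggestion, assuming a hypothetical mixin named InstanceFirst: overriding __getattribute__ gives one class hierarchy the instance-first read semantics without changing the language. It only affects reads (assignment still goes through the usual data-descriptor check), and since the override runs as Python code on every attribute access it is about semantics, not speed.

```python
class InstanceFirst:
    """Mixin: consult the instance dict before any class-level descriptor."""

    def __getattribute__(self, name):
        # Fetch the instance dict through the default machinery to avoid
        # recursing into this method.
        instance_dict = object.__getattribute__(self, "__dict__")
        if name in instance_dict:
            return instance_dict[name]
        return object.__getattribute__(self, name)


class C(InstanceFirst):
    x = property(lambda self: self._x)

    def __init__(self):
        self._x = 42


c = C()
before = c.x              # nothing shadows the property yet
c.__dict__["x"] = 43
after = c.x               # the instance dict entry now wins
```

With the default lookup the second read would still return 42; with the mixin it returns the value stored in the instance dict.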
Re: [Python-Dev] Instance variable access and descriptors
I agree with Phillip with regard to the semantics: they are desirable as they stand. However, there is a patch on the Python tracker, originally submitted by Armin Rigo, that adds an mro cache to speed up these sorts of cases. He saw ~20% speedups; others see less. It is currently just sitting there with no apparent activity. So if the overhead of mro lookups is that bothersome, it may be well worth your time to review the patch.

URL: http://sourceforge.net/tracker/index.php?func=detail&aid=1700288&group_id=5470&atid=305470

-Kevin
Re: [Python-Dev] Instance variable access and descriptors
I must be missing something, as I really see no reason to keep the existing semantics other than backwards compatibility (which can be achieved by introducing a __fastattr__ or such).

Can you explain under which situations or find any example situation where the existing semantics are desirable?

As for the mro cache - thanks for pointing it out - I think it can serve as a platform for another idea that, in conjunction with psyco, can possibly speed up CPython very significantly (I will create a thread about this soon).

Please note that speeding up the mro lookup solves only half of the problem (if it was solved - which it seems not to have been). The more important half of the problem remains, allow me to emphasize: ALL instance attribute accesses look up in both instance and class dicts, when they could look just in the instance dict. This is made worse by the fact that the class dict lookup is more expensive (with or without the mro cache).

Some code that accesses a lot of instance attributes in an inner loop can easily be sped up by a factor of 2 or more (depending on the size of the mro).

Eyal