Re: [Python-Dev] Instance variable access and descriptors

2007-06-13 Thread Armin Rigo
Hi,

On Tue, Jun 12, 2007 at 08:10:26PM +1200, Greg Ewing wrote:
> Rather than spend time tinkering with the lookup order,
> it might be more productive to look into implementing
> a cache for attribute lookups.

See patch #1700288.


Armin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Instance variable access and descriptors

2007-06-12 Thread Greg Ewing
Phillip J. Eby wrote:
> ...at the cost of slowing down access to properties and __slots__, by
> adding an *extra* dictionary lookup there.

Rather than spend time tinkering with the lookup order,
it might be more productive to look into implementing
a cache for attribute lookups. That would help with
method lookups as well, which are probably more
frequent than instance var accesses.

--
Greg


Re: [Python-Dev] Instance variable access and descriptors

2007-06-12 Thread Brian Harring
On Tue, Jun 12, 2007 at 08:10:26PM +1200, Greg Ewing wrote:
> Phillip J. Eby wrote:
> > ...at the cost of slowing down access to properties and __slots__, by
> > adding an *extra* dictionary lookup there.
>
> Rather than spend time tinkering with the lookup order,
> it might be more productive to look into implementing
> a cache for attribute lookups. That would help with
> method lookups as well, which are probably more
> frequent than instance var accesses.

Was wondering the same; specifically, hijacking the PEP 280 celldict
approach for this.

Downside, this would break code that tries to do PyDict_* calls on a 
class tp_dict; haven't dug extensively, but I'm sure there are a few 
out there.

Main thing I like about that approach is that it avoids the staleness
verification crap: a single lookup, and it's there or it isn't.  It would
also be reusable for PEP 280.

If folks don't much like the hit from tracing back to a cell holding
an actual value, we could always implement it so that, upon a change, the
change propagates out to registered instances (iow, change a.__dict__ and
it notifies b.__dict__ of the change, and so on, till it hits a point where
the change doesn't need to go further).
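The cell-based idea is easier to picture as code. The following is a rough Python 3 sketch under several assumptions. Cell and CellNamespace are hypothetical names, only single inheritance is modelled, and deletion is left out. Each namespace maps names directly to cells that it shares with its subclasses, so a lookup is one dict probe plus one dereference. When a change needs a new cell, it is pushed out to the registered subclasses that still share the old one.

```python
class Cell:
    """Holds one attribute value; shared by every namespace inheriting it."""

    __slots__ = ("value", "owner")

    def __init__(self, value, owner):
        self.value = value
        self.owner = owner  # the namespace that defined this name


class CellNamespace:
    """Hypothetical class namespace mapping name -> Cell (single inheritance)."""

    def __init__(self, base=None):
        self.children = []  # registered subclasses to notify on change
        # Copy the base's cell *references*, so later changes to an
        # inherited value are visible here without any extra lookup.
        self.cells = dict(base.cells) if base is not None else {}
        if base is not None:
            base.children.append(self)

    def get(self, name):
        cell = self.cells.get(name)  # one dict probe: it's there or it isn't
        if cell is None:
            raise AttributeError(name)
        return cell.value

    def set(self, name, value):
        cell = self.cells.get(name)
        if cell is not None and cell.owner is self:
            cell.value = value  # every subclass sharing this cell sees it
            return
        self._replace(name, cell, Cell(value, self))

    def _replace(self, name, old, new):
        # Stop at namespaces that define `name` themselves (no longer share `old`).
        if self.cells.get(name) is not old:
            return
        self.cells[name] = new
        for child in self.children:
            child._replace(name, old, new)


base = CellNamespace()
base.set("x", 1)
derived = CellNamespace(base)
base.set("x", 2)
print(derived.get("x"))  # 2: same cell, no walk up to the base
```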

~harring




Re: [Python-Dev] Instance variable access and descriptors

2007-06-11 Thread Gustavo Carneiro

While you're at it, it would be nice to fix this ugly asymmetry I found in
descriptors.  It seems that a descriptor's __get__ is called even when it is
accessed from a class rather than an instance, but __set__ is only invoked from
instances, never from classes:

class Descr(object):
    def __get__(self, obj, objtype):
        print("__get__ from instance %s, type %s" % (obj, objtype))
        return "foo"

    def __set__(self, obj, value):
        print("__set__ on instance %s, value %s" % (obj, value))

class Foo(object):
    foo = Descr()

print(Foo.foo)  # works

## doesn't work, goes directly to the class dict, not calling __set__
Foo.foo = 123

Because of this problem, I may have to install properties into a class's
metaclass to achieve the same effect that I expected to achieve with a simple
descriptor :-(
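Here is a Python 3 sketch of both the asymmetry and the metaclass workaround. RecordingDescr is a hypothetical name; it records its calls instead of printing them. Assigning through the class replaces the entry in Foo.__dict__, because type.__setattr__ only honours data descriptors that it finds on the metaclass. Placing the descriptor on the metaclass therefore routes Bar.foo = 123 through __set__.

```python
class RecordingDescr:
    """Data descriptor that logs every __get__/__set__ call."""

    def __init__(self):
        self.calls = []

    def __get__(self, obj, objtype=None):
        self.calls.append(("get", obj))
        return "foo"

    def __set__(self, obj, value):
        self.calls.append(("set", obj, value))


plain = RecordingDescr()


class Foo:
    foo = plain


Foo.foo        # __get__ runs with obj=None
Foo().foo = 1  # __set__ runs for an instance
Foo.foo = 123  # no __set__: the class dict entry is simply replaced

meta_descr = RecordingDescr()


class Meta(type):
    foo = meta_descr  # data descriptor on the metaclass


class Bar(metaclass=Meta):
    pass


Bar.foo = 123  # routed to meta_descr.__set__(Bar, 123)
```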



Re: [Python-Dev] Instance variable access and descriptors

2007-06-11 Thread Armin Rigo
Hi Eyal,

On Sun, Jun 10, 2007 at 04:13:38AM +0300, Eyal Lotem wrote:
> I must be missing something, as I really see no reason to keep the
> existing semantics other than backwards compatibility (which can be
> achieved by introducing a __fastattr__ or such).
>
> Can you explain under which situations or find any example situation
> where the existing semantics are desirable?

The existing semantics are essential when dealing with metaclasses.
Many of the descriptors of the 'type' class would stop working without
it.  For example, the fact that 'x.__class__' normally gives the type of
'x' for any object x relies on this.  Reading the '__dict__' attribute
of types is also based on this.  Before proposing changes, be sure you
understand exactly how the following works:

>>> object.__class__
<type 'type'>
>>> object.__dict__['__class__']
<attribute '__class__' of 'object' objects>

>>> class A(object):
...     pass
...
>>> A.__dict__
<dictproxy object at 0xb7c98e6c>
>>> A.__dict__['__dict__']
<attribute '__dict__' of 'A' objects>
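The same point can be restated in Python 3 terms, where dictproxy is spelled mappingproxy and classes are new-style by default. This version is a small script of checks rather than an interactive session. Both __class__ and __dict__ are data descriptors found on the type, which is why they win over anything stored under the same name in an instance or class dict. An instance-dict-first rule would break exactly these lookups. The helper is_data_descriptor is a hypothetical name used only for illustration.

```python
import types


def is_data_descriptor(attr):
    # An object counts as a data descriptor if its *type* defines
    # __set__ (or __delete__); the check is made on the type.
    t = type(attr)
    return hasattr(t, "__set__") or hasattr(t, "__delete__")


class A:
    pass


a = A()

# object.__dict__['__class__'] is a data descriptor, so it beats the instance dict.
a.__dict__["__class__"] = "bogus"
assert is_data_descriptor(object.__dict__["__class__"])
assert a.__class__ is A

# A's own namespace holds the '__dict__' descriptor meant for *instances* of A...
instance_dict_descr = A.__dict__["__dict__"]
assert is_data_descriptor(instance_dict_descr)

# ...but A.__dict__ still returns A's namespace, because type.__dict__['__dict__']
# (a data descriptor on the metaclass) takes precedence over A's own entry.
assert isinstance(A.__dict__, types.MappingProxyType)
assert is_data_descriptor(type.__dict__["__dict__"])
```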


À bientôt (see you soon),

Armin.


Re: [Python-Dev] Instance variable access and descriptors

2007-06-10 Thread Aahz
On Sun, Jun 10, 2007, Eyal Lotem wrote:
 
> Python, probably through the valid assumption that most attribute
> lookups go to the class, tries to look for the attribute in the class
> first, and in the instance, second.
>
> What Python currently does is quite peculiar!
> Here's a short description of PyObject_GenericGetAttr:
>
> A. Python looks for a descriptor in the _entire_ mro hierarchy
> (len(mro) class/type check and dict lookups).
> B. If Python found a descriptor and it has both get and set functions
> - it uses it to get the value and returns, skipping the next stage.
> C. If Python either did not find a descriptor, or found one that has
> no setter, it will try a lookup in the instance dict.
> D. If Python failed to find it in the instance, it will use the
> descriptor's getter, and if it has no getter it will use the
> descriptor itself.

Guido, Ping, and I tried working on this at the sprint for PyCon 2003.
We were unable to find any solution that did not affect critical-path
timing.  As other people have noted, the current semantics cannot be
changed.  I'll also echo other people and suggest that this discussion be
moved to python-ideas if you want to continue pushing for a change in
semantics.

I just did a Google for my notes from PyCon 2003 and it appears that I
never sent them out (probably because they aren't particularly
comprehensible).  Here they are for the record (from 3/25/2003):

'''
CACHE_ATTR is the name used to describe a speedup (for new-style classes
only) in attribute lookup by caching the location of attributes in the
MRO.  Some of the non-obvious bits of code:

* If a new-style class has any classic classes in its bases, we
can't do attribute caching (we need weakrefs to the derived
classes).

* If searching the MRO for an attribute discovers a data descriptor (has
tp_descr_set), that overrides any attribute that might be in the instance;
however, the existence of tp_descr_get still permits the instance to
override its bases (but tp_descr_get is called if there is no instance
attribute).

* We need to invalidate the cache for the updated attribute in all derived
classes in the following cases:

    * an attribute is added to or deleted from the class or its base classes

    * an attribute has its status changed to or from being a data
      descriptor

This file uses Python pseudocode to describe changes necessary to
implement CACHE_ATTR at the C level.  Except for class Meta, these are
all exact descriptions of the work being done.  Except for class Meta the
changes go into object.c (Meta goes into typeobject.c).  The pseudocode
looks somewhat C-like to ease the transformation.
'''

NULL = object()

def getattr(inst, name):
    isdata, where = lookup(inst.__class__, name)
    if isdata:
        descr = where[name]
        if hasattr(descr, '__get__'):
            return descr.__get__(inst, inst.__class__)
        else:
            return descr
    value = inst.__dict__.get(name, NULL)
    if value is not NULL:
        return value
    if where is NULL:
        raise AttributeError(name)
    descr = where[name]
    if hasattr(descr, '__get__'):
        value = descr.__get__(inst, inst.__class__)
    else:
        value = descr
    return value

def setattr(inst, name, value):
    isdata, where = lookup(inst.__class__, name)
    if isdata:
        descr = where[name]
        descr.__set__(inst, value)
        return
    inst.__dict__[name] = value

def lookup(cls, name):
    # A cache entry is an (isdata, where) pair; NULL means "not cached".
    if cls.__cache__ is not NULL:
        pair = cls.__cache__.get(name, NULL)
    else:
        pair = NULL
    if pair is not NULL:
        return pair
    for c in cls.__mro__:
        where = c.__dict__
        if name in where:
            descr = where[name]
            isdata = hasattr(descr, '__set__')
            pair = isdata, where
            break
    else:
        pair = False, NULL
    if cls.__cache__ is not NULL:
        cls.__cache__[name] = pair
    return pair


'''
These changes go into typeobject.c; they are not a complete
description of what happens during creation/updates, only the
changes necessary to implement CACHE_ATTRO.
'''

from types import ClassType

class Meta(type):
    def _invalidate(cls, name):
        if cls.__cache__ is not NULL and name in cls.__cache__:
            del cls.__cache__[name]
        for c in cls.__subclasses__():
            if name not in c.__dict__:
                Meta._invalidate(c, name)
    def _build_cache(cls, bases):
        for base in bases:
            if type(base) is ClassType:
                cls.__cache__ = NULL
                break
        else:
            cls.__cache__ = {}
    def __new__ (cls, bases):
        Meta._build_cache(cls, bases)
    def __setbases__(cls, bases):
        Meta._build_cache(cls, bases)
    def __setattr__(cls, name, value):
        if cls.__cache__ is not NULL:
            old = cls.__dict__.get(name, NULL)
            wasdata = old is not NULL and hasattr(old, '__set__')
            isdata = value is not NULL and hasattr(value, '__set__')
            if wasdata != isdata or (old is NULL)

[Python-Dev] Instance variable access and descriptors

2007-06-09 Thread Eyal Lotem
Hi.

I was surprised to find in my profiling that instance variable access
was pretty slow.

I looked through the CPython code involved, and discovered something
that really surprises me.

Python, probably through the valid assumption that most attribute
lookups go to the class, tries to look for the attribute in the class
first, and in the instance, second.

What Python currently does is quite peculiar!
Here's a short description of PyObject_GenericGetAttr:

A. Python looks for a descriptor in the _entire_ mro hierarchy
(len(mro) class/type check and dict lookups).
B. If Python found a descriptor and it has both get and set functions
- it uses it to get the value and returns, skipping the next stage.
C. If Python either did not find a descriptor, or found one that has
no setter, it will try a lookup in the instance dict.
D. If Python failed to find it in the instance, it will use the
descriptor's getter, and if it has no getter it will use the
descriptor itself.
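Steps A to D can be modelled in pure Python. This is a simplified sketch that assumes ordinary classes: there is no __getattr__ fallback, no instances built only from __slots__, and no metaclass tricks. The name generic_getattr is hypothetical; this is not the C function itself.

```python
_NOT_FOUND = object()


def generic_getattr(obj, name):
    """Simplified model of PyObject_GenericGetAttr's lookup order."""
    tp = type(obj)

    # A. Search the whole MRO for the name (one dict probe per class).
    descr = _NOT_FOUND
    for klass in tp.__mro__:
        if name in klass.__dict__:
            descr = klass.__dict__[name]
            break

    if descr is _NOT_FOUND:
        get, is_data = None, False
    else:
        descr_type = type(descr)
        get = getattr(descr_type, "__get__", None)
        is_data = hasattr(descr_type, "__set__") or hasattr(descr_type, "__delete__")

    # B. A data descriptor with a getter wins outright.
    if is_data and get is not None:
        return get(descr, obj, tp)

    # C. Otherwise the instance dict gets its chance.
    inst_dict = getattr(obj, "__dict__", {})
    if name in inst_dict:
        return inst_dict[name]

    # D. Fall back to the descriptor's getter, or the class attribute itself.
    if get is not None:
        return get(descr, obj, tp)
    if descr is not _NOT_FOUND:
        return descr
    raise AttributeError(name)


class Base:
    cls_attr = "class"

    def method(self):
        return "m"

    @property
    def prop(self):
        return "prop"


obj = Base()
obj.__dict__["prop"] = "shadow"           # ignored: property is a data descriptor
obj.__dict__["method"] = "instance wins"  # functions are non-data descriptors
```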


I believe the average cost of A is much higher than that of C, because
there is just one instance dict to look through, and it is also
typically smaller than the class dicts (which matters in the rare
worst-case timings of hash lookups), while there are len(mro) dicts to
search for a descriptor.

This means that for simple instance variable lookups, Python is paying
the full mro lookup price!
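This cost claim can be checked on a given interpreter with nothing more than timeit. The script below times reads of an instance attribute on a class with a shallow MRO and on one with a deep MRO. The depth of 30 and the helper name make_deep_class are arbitrary choices for illustration. Absolute numbers will vary, and so may the size of the gap: recent CPython versions cache type attribute lookups, which can shrink the MRO cost considerably.

```python
import timeit


def make_deep_class(depth):
    """Build a chain of `depth` trivial subclasses and return the most derived one."""
    cls = object
    for i in range(depth):
        cls = type("Level%d" % i, (cls,), {})
    return cls


class Shallow:
    pass


Deep = make_deep_class(30)

shallow, deep = Shallow(), Deep()
shallow.value = deep.value = 1

# Same statement, same instance attribute; only the MRO length differs.
t_shallow = min(timeit.repeat("o.value", globals={"o": shallow}, number=100_000, repeat=5))
t_deep = min(timeit.repeat("o.value", globals={"o": deep}, number=100_000, repeat=5))
print("shallow MRO (%d classes): %.4fs" % (len(Shallow.__mro__), t_shallow))
print("deep MRO    (%d classes): %.4fs" % (len(Deep.__mro__), t_deep))
```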

I believe that this should be changed, so that Python first looks for
the attribute in the instance's dict and only then through the class's
mro.

This will have the following effects:

A. It will break code that uses instance.__dict__['var'] directly,
when 'var' exists as a property with a __set__ in the class. I believe
this is not significant.
B. It will simplify getattr's semantics. Python should _always_ give
precedence to instance attributes over class ones, rather than have
very weird special-cases (such as a property with a __set__).
C. It will greatly speed up instance variable access, especially when
the class has a large mro.

I think obviously the code breakage is the worst problem. This could
probably be addressed by a transition version in which Python warns
about any instance attributes that existed in the mro as descriptors
as well.
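The transitional warning described here would have to find instance attributes that collide with a data descriptor somewhere in the MRO. Below is a minimal sketch using hypothetical helpers that are run on demand; an interpreter would more likely perform the check inside attribute assignment. FutureWarning is an assumed choice of category.

```python
import warnings


def shadowed_data_descriptors(obj):
    """Names in obj.__dict__ that a data descriptor in type(obj).__mro__ currently hides."""
    clashes = []
    for name in vars(obj):
        for klass in type(obj).__mro__:
            if name in klass.__dict__:
                attr_type = type(klass.__dict__[name])
                if hasattr(attr_type, "__set__") or hasattr(attr_type, "__delete__"):
                    clashes.append(name)
                break  # only the first class defining the name matters
    return clashes


def warn_about_clashes(obj):
    for name in shadowed_data_descriptors(obj):
        warnings.warn(
            "instance attribute %r is hidden by a data descriptor on %s; "
            "its meaning would change under instance-first lookup"
            % (name, type(obj).__name__),
            FutureWarning,
            stacklevel=2,
        )
```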

What do you think?


Re: [Python-Dev] Instance variable access and descriptors

2007-06-09 Thread Steven Bethard
On 6/9/07, Eyal Lotem [EMAIL PROTECTED] wrote:
> I believe that this should be changed, so that Python first looks for
> the attribute in the instance's dict and only then through the class's
> mro.
[snip]
> What do you think?

Are you suggesting that the following code should print 43 instead of 42?
::

>>> class C(object):
...     x = property(lambda self: self._x)
...     def __init__(self):
...         self._x = 42
...
>>> c = C()
>>> c.__dict__['x'] = 43
>>> c.x
42

If so, this is a pretty substantial backwards incompatibility, and you
should probably post this to python-ideas first to hash things out. If
people like it there, the right target is probably Python 3000, not
Python 2.x.

STeVe
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
--- Bucky Katt, Get Fuzzy


Re: [Python-Dev] Instance variable access and descriptors

2007-06-09 Thread Steven Bethard
> On 6/10/07, Steven Bethard [EMAIL PROTECTED] wrote:
> > On 6/9/07, Eyal Lotem [EMAIL PROTECTED] wrote:
> > > I believe that this should be changed, so that Python first looks for
> > > the attribute in the instance's dict and only then through the class's
> > > mro.
> >
> > Are you suggesting that the following code should print 43 instead of
> > 42?
> > ::
> >
> > >>> class C(object):
> > ...     x = property(lambda self: self._x)
> > ...     def __init__(self):
> > ...         self._x = 42
> > ...
> > >>> c = C()
> > >>> c.__dict__['x'] = 43
> > >>> c.x
> > 42

On 6/9/07, Eyal Lotem [EMAIL PROTECTED] wrote:
> Yes, I do suggest that.
> But it's important to notice that this is not a suggestion in order to
> improve Python, but one that makes it possible to get reasonable
> performance out of CPython. As such, I don't believe it should be done
> in Py3K.
>
> Firstly, like everything that breaks backwards compatibility, it is
> possible to have a transitional version that spits warnings for all
> problems (detect name clashes between properties and instance dict).

Sure, but then you're talking about really introducing this in Python
2.7, with 2.6 as a transitional version. So take a minute to look at
the release timelines:

http://www.python.org/dev/peps/pep-0361/
The initial 2.6 target is for April 2008.

http://www.python.org/dev/peps/pep-3000/
I hope to have a first alpha release (3.0a1) out in the first half of
2007; it should take no more than a year from then before the first
proper release, named Python 3.0

So I'm expecting Python 3.0 to come out *before* 2.7. Thus if you're
proposing a backwards-incompatible change that would have to wait
until 2.7 anyway, why not propose it for 3.0 where
backwards-incompatible changes are more acceptable?

STeVe


Re: [Python-Dev] Instance variable access and descriptors

2007-06-09 Thread Phillip J. Eby
At 12:23 AM 6/10/2007 +0300, Eyal Lotem wrote:
> A. It will break code that uses instance.__dict__['var'] directly,
> when 'var' exists as a property with a __set__ in the class. I believe
> this is not significant.
> B. It will simplify getattr's semantics. Python should _always_ give
> precedence to instance attributes over class ones, rather than have
> very weird special-cases (such as a property with a __set__).

Actually, these are features that are both used and desirable; I've 
been using them both since Python 2.2 (i.e., for many years 
now).  I'm -1 on removing these features from any version of Python, even 3.0.


> C. It will greatly speed up instance variable access, especially when
> the class has a large mro.

...at the cost of slowing down access to properties and __slots__, by 
adding an *extra* dictionary lookup there.
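The extra probe would land on properties and __slots__ specifically for a simple reason. Both are data descriptors stored in the class dict, so under the current rules a single MRO pass finds them and returns without ever touching the instance dict. The small check below illustrates this; the class name Point is arbitrary.

```python
class Point:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x, self.y = x, y

    @property
    def norm2(self):
        return self.x * self.x + self.y * self.y


slot = Point.__dict__["x"]      # a member descriptor created by __slots__
prop = Point.__dict__["norm2"]  # a property object

# Both types define __set__, which is what makes them data descriptors.
assert hasattr(type(slot), "__set__")
assert hasattr(type(prop), "__set__")

p = Point(3, 4)
print(p.norm2)  # 25
```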

Note, by the way, that if you want to change attribute lookup 
semantics, you can always override __getattribute__ and make it work 
whatever way you like, without forcing everybody else to change *their* code.
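Here is a minimal sketch of that escape hatch, assuming a hypothetical opt-in mixin called InstanceFirst. It consults the instance dict before doing the normal lookup, so Steven's C example in this thread yields 43 for a class that uses the mixin. Because the override is written in Python, it adds a method call to every attribute access. It changes the semantics for the opted-in class, but it is not a speedup.

```python
class InstanceFirst:
    """Hypothetical mixin: instance __dict__ entries win over everything, even properties."""

    def __getattribute__(self, name):
        # Fetch the instance dict without recursing into this method.
        try:
            inst_dict = object.__getattribute__(self, "__dict__")
        except AttributeError:  # e.g. a subclass that only uses __slots__
            inst_dict = {}
        if name in inst_dict:
            return inst_dict[name]
        # Otherwise fall back to the normal descriptor-aware lookup.
        return object.__getattribute__(self, name)


class C(InstanceFirst):
    x = property(lambda self: self._x)

    def __init__(self):
        self._x = 42


c = C()
c.__dict__["x"] = 43
print(c.x)  # 43 here; a plain object subclass would give 42
```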



Re: [Python-Dev] Instance variable access and descriptors

2007-06-09 Thread Kevin Jacobs [EMAIL PROTECTED]

I agree with Phillip with regard to the semantics.  They are semantically
desirable.  However, there is a patch to add a mro cache to speed up these
sorts of cases on the Python tracker, originally submitted by Armin Rigo.
He saw ~20% speedups, others see less.  It is currently just sitting there
with no apparent activity.  So if the overhead of mro lookups is that
bothersome, it may be well worth your time to review the patch.

URL:
http://sourceforge.net/tracker/index.php?func=detail&aid=1700288&group_id=5470&atid=305470

-Kevin




Re: [Python-Dev] Instance variable access and descriptors

2007-06-09 Thread Eyal Lotem
I must be missing something, as I really see no reason to keep the
existing semantics other than backwards compatibility (which can be
achieved by introducing a __fastattr__ or such).

Can you explain under which situations or find any example situation
where the existing semantics are desirable?

As for the mro cache - thanks for pointing it out - I think it can
serve as a platform for another idea that in conjunction with psyco,
can possibly speed up CPython very significantly (will create a thread
about this soon).

Please note that speeding up the mro lookup solves only half of the
problem (if it were solved, which it seems not to have been). The more
important half of the problem remains; allow me to emphasize:

ALL instance attribute accesses look in both the instance and the class
dicts, when they could look just in the instance dict. This is made
worse by the fact that the class dict lookup is more expensive (with
or without the mro cache).
Some code that accesses a lot of instance attributes in an inner loop
can easily be sped up by a factor of 2 or more (depending on the size
of the mro).

Eyal
