Re: Interning own classes like strings for speed and size?

2010-12-29 Thread Hrvoje Niksic
Christian Heimes li...@cheimes.de writes:

 You are right as long as you don't try to rebind the variable.

And then only if you assign through an instance, which doesn't make
sense for a class-level cache.  Assignment like Example2._cache = {}
would work.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Interning own classes like strings for speed and size?

2010-12-28 Thread Ulrich Eckhardt
Steven D'Aprano wrote:
 class InternedTuple(tuple):
 ... _cache = {}
 ... def __new__(cls, *args):
 ... t = super().__new__(cls, *args)
 ... return cls._cache.setdefault(t, t)

That looks good. The only thing that first bothered me is that it creates an 
object and then possibly discards it again. However, there is no way around 
that, since at least the key to the dict must be created for lookup. Since 
key and value are the same here, this is even for free.

What I also found was that with the above, I can't provide __eq__ and __ne__ 
that just check for identity. If I do, the lookup in setdefault() will never 
find an existing tuple and I will never save memory for a single object.


Thanks!

Uli

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Interning own classes like strings for speed and size?

2010-12-28 Thread Steven D'Aprano
On Tue, 28 Dec 2010 13:42:39 +0100, Ulrich Eckhardt wrote:

 Steven D'Aprano wrote:
 class InternedTuple(tuple):
 ... _cache = {}
 ... def __new__(cls, *args):
 ... t = super().__new__(cls, *args) 
 ... return cls._cache.setdefault(t, t)
 
 That looks good. The only thing that first bothered me is that it
 creates an object and then possibly discards it again. However, there is
 no way around that, since at least the key to the dict must be created
 for lookup. Since key and value are the same here, this is even for
 free.
 
 What I also found was that with the above, I can't provide __eq__ and
 __ne__ that just check for identity. If I do, the lookup in setdefault()
 will never find an existing tuple and I will never save memory for a
 single object.

If all you want is to save memory, you don't need to change the __eq__ 
method. But if you still want to, try this:


# Untested
class InternedTuple(tuple):
_cache = {}
def __new__(cls, *args):
t = super().__new__(cls, *args) 
return cls._cache.setdefault(args, t)
def __eq__(self, other):
return self is other
def __ne__(self, other):
return self is not other



-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Interning own classes like strings for speed and size?

2010-12-28 Thread Stefan Behnel

Steven D'Aprano, 28.12.2010 15:11:

On Tue, 28 Dec 2010 13:42:39 +0100, Ulrich Eckhardt wrote:


Steven D'Aprano wrote:

class InternedTuple(tuple):

... _cache = {}
... def __new__(cls, *args):
... t = super().__new__(cls, *args)
... return cls._cache.setdefault(t, t)


That looks good. The only thing that first bothered me is that it
creates an object and then possibly discards it again. However, there is
no way around that, since at least the key to the dict must be created
for lookup. Since key and value are the same here, this is even for
free.

What I also found was that with the above, I can't provide __eq__ and
__ne__ that just check for identity. If I do, the lookup in setdefault()
will never find an existing tuple and I will never save memory for a
single object.


If all you want is to save memory, you don't need to change the __eq__
method. But if you still want to, try this:

# Untested


Yep, that' the problem. ;)



class InternedTuple(tuple):
 _cache = {}
 def __new__(cls, *args):
 t = super().__new__(cls, *args)
 return cls._cache.setdefault(args, t)
 def __eq__(self, other):
 return self is other
 def __ne__(self, other):
 return self is not other


What Ulrich meant, was: doing this will actually kill the caching, because 
the first time the comparison is called is when looking up the tuple while 
adding it to the interning dict. Since the new tuple is, well, new, it will 
not be equal (read: identical) to any cached tuple, thus resulting in a new 
entry regardless of its content.


Stefan

--
http://mail.python.org/mailman/listinfo/python-list


Re: Interning own classes like strings for speed and size?

2010-12-28 Thread Christian Heimes
Am 28.12.2010 15:11, schrieb Steven D'Aprano:
 # Untested
 class InternedTuple(tuple):
 _cache = {}
 def __new__(cls, *args):
 t = super().__new__(cls, *args) 
 return cls._cache.setdefault(args, t)
 def __eq__(self, other):
 return self is other
 def __ne__(self, other):
 return self is not other

The code doesn't work, you have to change super().__new__(cls, *args) to
super().__new__(cls, args). Don't ask me why. I'm surprised, too.

 InternedTuple(1)
Traceback (most recent call last):
  File stdin, line 1, in module
  File stdin, line 4, in __new__
TypeError: 'int' object is not iterable


Also this code is going to use much more memory than an ordinary tuple
since every instance of InternedTuple has a __dict__ attribute. This
code works but I had to move the cache outside the class because of
__slots__.

_ituple_cache = {}

class ituple(tuple):
__slots__ = () # no __dict__

def __new__(cls, *args):
cached = _ituple_cache.get(args)
if cached is not None:
return cached
self = _ituple_cache[args] = super(ituple, cls).__new__(cls, args)
return self

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Interning own classes like strings for speed and size?

2010-12-28 Thread John Nagle

On 12/27/2010 3:05 AM, Ulrich Eckhardt wrote:

Hi!

I'm trying to solve a computational problem and of course speed and size is
important there.


Then you probably shouldn't be using CPython.


Apart from picking the right algorithm, I came across an
idea that could help speed up things and keep memory requirements down. What
I have is regions described by min and max coordinates. At first, I just
modeled these as a simple class containing two values for each axis.


There are more effective techniques for that.  In game physics
collision detection systems, it's common to use axis-ordered lists
of bounding boxes to partition objects.  This still works even when
the ranges overlap.  Look up I-Collide for a classic implementation.

 Am I looking in the wrong direction? Is there some better approach?
 Please
 don't tell me to use C, as I'm specifically interested in
 learning Python,
 I'm pretty sure I could have solved the problem quickly in
 C++ otherwise.

CPython is a naive interpreter which does dictionary
lookups for every variable and attribute access.  If you're doing
fast manipulation of linked data structures in CPython, performance
is going to suck.

Check out Shed Skin 0.7, which is a much faster Python
system.  It's limited, but for the kind of data structure work
you're doing, all the features you should need already work.

John Nagle


In a second step, I derived this class from tuple instead of object. Some
code then moved from __init__ to __new__ and some code that modified these
objects had to be changed to replace them instead. The upside to this is
that they can be used as keys in sets and dicts, which isn't the case for
mutable types[1].

What I'm now considering is to only allow a single instance of these objects
for each set of values, similar to interned strings. What I would gain is
that I could safely compare objects for identity instead of equality. What
I'm not yet sure is how much overhead that would give me and/or how to keep
it low. The idea is to store each instance in a set and after creating a new
object I would first look up an equal object in the global set and return
that instead, otherwise add the new one.

The problem I foresee is that if I define equality as identity, this lookup
when creating will never eliminate duplicates. If I only fall back to
equality comparison for non-identical objects, I would probably sacrifice
most of the gain. If I build a dict mapping between the values and the
actual objects, I would have doubled the required memory and uselessly store
the same values twice there.


Cheers!

Uli


[1] Somebody correct me if I'm wrong, but I believe I could have defined a
hashing function for the type and thus allowed its use in a set or dict,
right? However, goofing up because you accidentally modified an object and
changed its hash value is something I don't want to risk anyway.



--
http://mail.python.org/mailman/listinfo/python-list


Re: Interning own classes like strings for speed and size?

2010-12-28 Thread Hrvoje Niksic
Christian Heimes li...@cheimes.de writes:

 Also this code is going to use much more memory than an ordinary tuple
 since every instance of InternedTuple has a __dict__ attribute. This
 code works but I had to move the cache outside the class because of
 __slots__.

Wouldn't it work inside the class as well?  __slots__ applies to
instances, not to classes themselves.  In Python 3.1:

 class Foo(tuple):
...   __slots__ = ()
...   _cache = {}
... 
 Foo._cache# works as expected
{}
 Foo()._cache  # and so does this
{}
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Interning own classes like strings for speed and size?

2010-12-28 Thread Christian Heimes
Am 28.12.2010 21:16, schrieb Hrvoje Niksic:
 Christian Heimes li...@cheimes.de writes:
 
 Also this code is going to use much more memory than an ordinary tuple
 since every instance of InternedTuple has a __dict__ attribute. This
 code works but I had to move the cache outside the class because of
 __slots__.
 
 Wouldn't it work inside the class as well?  __slots__ applies to
 instances, not to classes themselves.  In Python 3.1:

You are right as long as you don't try to rebind the variable. I
recalled that class attributes of classes with __slots__ behave slightly
different than ordinary classes. For example you can't have a writeable
slot and class default values at the same time.

 class Example2(object):
... __slots__ = ()
... _cache = {}
...
 Example2()._cache
{}
 Example2()._cache = {}
Traceback (most recent call last):
  File stdin, line 1, in module
AttributeError: 'Example2' object attribute '_cache' is read-only

Christian

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Interning own classes like strings for speed and size?

2010-12-28 Thread Rami Chowdhury
On Wed, Dec 29, 2010 at 00:45, Christian Heimes li...@cheimes.de wrote:
 Am 28.12.2010 21:16, schrieb Hrvoje Niksic:
 Christian Heimes li...@cheimes.de writes:

 Also this code is going to use much more memory than an ordinary tuple
 since every instance of InternedTuple has a __dict__ attribute. This
 code works but I had to move the cache outside the class because of
 __slots__.

 Wouldn't it work inside the class as well?  __slots__ applies to
 instances, not to classes themselves.  In Python 3.1:

 You are right as long as you don't try to rebind the variable. I
 recalled that class attributes of classes with __slots__ behave slightly
 different than ordinary classes. For example you can't have a writeable
 slot and class default values at the same time.

 class Example2(object):
 ...     __slots__ = ()
 ...     _cache = {}
 ...
 Example2()._cache
 {}
 Example2()._cache = {}
 Traceback (most recent call last):
  File stdin, line 1, in module
 AttributeError: 'Example2' object attribute '_cache' is read-only

Forgive me if I'm misunderstanding, but if you want to cache instances
of a class, surely

 Example2()._cache = {}

would defeat the purpose, at least for that instance of Example2? The
slot seems writeable enough, after all

 Example2()._cache['foo'] = 'bar'

seems to work?


-- 
Rami Chowdhury
Never assume malice when stupidity will suffice. -- Hanlon's Razor
+44-7581-430-517 / +1-408-597-7068 / +88-0189-245544
-- 
http://mail.python.org/mailman/listinfo/python-list


Interning own classes like strings for speed and size?

2010-12-27 Thread Ulrich Eckhardt
Hi!

I'm trying to solve a computational problem and of course speed and size is 
important there. Apart from picking the right algorithm, I came across an 
idea that could help speed up things and keep memory requirements down. What 
I have is regions described by min and max coordinates. At first, I just 
modeled these as a simple class containing two values for each axis.

In a second step, I derived this class from tuple instead of object. Some 
code then moved from __init__ to __new__ and some code that modified these 
objects had to be changed to replace them instead. The upside to this is 
that they can be used as keys in sets and dicts, which isn't the case for 
mutable types[1].

What I'm now considering is to only allow a single instance of these objects 
for each set of values, similar to interned strings. What I would gain is 
that I could safely compare objects for identity instead of equality. What 
I'm not yet sure is how much overhead that would give me and/or how to keep 
it low. The idea is to store each instance in a set and after creating a new 
object I would first look up an equal object in the global set and return 
that instead, otherwise add the new one.

The problem I foresee is that if I define equality as identity, this lookup 
when creating will never eliminate duplicates. If I only fall back to 
equality comparison for non-identical objects, I would probably sacrifice 
most of the gain. If I build a dict mapping between the values and the 
actual objects, I would have doubled the required memory and uselessly store 
the same values twice there.

Am I looking in the wrong direction? Is there some better approach? Please 
don't tell me to use C, as I'm specifically interested in learning Python, 
I'm pretty sure I could have solved the problem quickly in C++ otherwise. 
Other suggestions?

Cheers!

Uli


[1] Somebody correct me if I'm wrong, but I believe I could have defined a 
hashing function for the type and thus allowed its use in a set or dict, 
right? However, goofing up because you accidentally modified an object and 
changed its hash value is something I don't want to risk anyway.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Interning own classes like strings for speed and size?

2010-12-27 Thread Daniel Fetchinson
 I'm trying to solve a computational problem and of course speed and size is
 important there. Apart from picking the right algorithm, I came across an
 idea that could help speed up things and keep memory requirements down. What
 I have is regions described by min and max coordinates. At first, I just
 modeled these as a simple class containing two values for each axis.

 In a second step, I derived this class from tuple instead of object. Some
 code then moved from __init__ to __new__ and some code that modified these
 objects had to be changed to replace them instead. The upside to this is
 that they can be used as keys in sets and dicts, which isn't the case for
 mutable types[1].

 What I'm now considering is to only allow a single instance of these objects
 for each set of values, similar to interned strings. What I would gain is
 that I could safely compare objects for identity instead of equality. What
 I'm not yet sure is how much overhead that would give me and/or how to keep
 it low. The idea is to store each instance in a set and after creating a new
 object I would first look up an equal object in the global set and return
 that instead, otherwise add the new one.

 The problem I foresee is that if I define equality as identity, this lookup
 when creating will never eliminate duplicates. If I only fall back to
 equality comparison for non-identical objects, I would probably sacrifice
 most of the gain. If I build a dict mapping between the values and the
 actual objects, I would have doubled the required memory and uselessly store
 the same values twice there.

 Am I looking in the wrong direction? Is there some better approach? Please
 don't tell me to use C, as I'm specifically interested in learning Python,
 I'm pretty sure I could have solved the problem quickly in C++ otherwise.
 Other suggestions?

 Cheers!

 Uli


 [1] Somebody correct me if I'm wrong, but I believe I could have defined a
 hashing function for the type and thus allowed its use in a set or dict,
 right? However, goofing up because you accidentally modified an object and
 changed its hash value is something I don't want to risk anyway.

I believe what you are looking for is (some variant of) the singleton pattern:

http://en.wikipedia.org/wiki/Singleton_pattern

How it's done in python see http://www.google.com/search?q=python+singleton

Cheers,
Daniel

-- 
Psss, psss, put it down! - http://www.cafepress.com/putitdown
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Interning own classes like strings for speed and size?

2010-12-27 Thread Ulrich Eckhardt
Daniel Fetchinson wrote:
 I believe what you are looking for is (some variant of) the singleton
 pattern:
 
 http://en.wikipedia.org/wiki/Singleton_pattern

Actually, no. What I want is the flyweight pattern instead:

http://en.wikipedia.org/wiki/Flyweight_pattern

...but thank you for the approach of looking for a suitable pattern!

Cheers!

Uli

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Interning own classes like strings for speed and size?

2010-12-27 Thread Daniel Fetchinson
 I believe what you are looking for is (some variant of) the singleton
 pattern:

 http://en.wikipedia.org/wiki/Singleton_pattern

 Actually, no. What I want is the flyweight pattern instead:

 http://en.wikipedia.org/wiki/Flyweight_pattern

Oh I see. I did not know about this pattern, but in my defense it
looks like a variant of the singleton pattern :)

Thanks! One always learns something new on python-list.

Cheers,
Daniel


-- 
Psss, psss, put it down! - http://www.cafepress.com/putitdown
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Interning own classes like strings for speed and size?

2010-12-27 Thread Terry Reedy

On 12/27/2010 6:05 AM, Ulrich Eckhardt wrote:

Hi!

I'm trying to solve a computational problem and of course speed and size is
important there. Apart from picking the right algorithm, I came across an
idea that could help speed up things and keep memory requirements down. What
I have is regions described by min and max coordinates.


What sort of numbers are the coordinates? If integers in a finite range, 
your problem is a lot simpler than if float of indefinite precision.


--
Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list


Re: Interning own classes like strings for speed and size?

2010-12-27 Thread Ulrich Eckhardt
Terry Reedy wrote:
 What sort of numbers are the coordinates? If integers in a finite range,
 your problem is a lot simpler than if float of indefinite precision.

Yes, indeed, I could optimize the amount of data required to store the data 
itself, but that would require application-specific handling of the data, 
which is actually not what I want to learn about. If it was that, I'd use a 
language where I have lower-level access to the system. ;)

Thanks nonetheless!

Uli

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Interning own classes like strings for speed and size?

2010-12-27 Thread Tim Delaney
On 27 December 2010 22:05, Ulrich Eckhardt dooms...@knuut.de wrote:


 What I'm now considering is to only allow a single instance of these
 objects
 for each set of values, similar to interned strings. What I would gain is
 that I could safely compare objects for identity instead of equality. What
 I'm not yet sure is how much overhead that would give me and/or how to keep
 it low. The idea is to store each instance in a set and after creating a
 new
 object I would first look up an equal object in the global set and return
 that instead, otherwise add the new one.

 The problem I foresee is that if I define equality as identity, this lookup
 when creating will never eliminate duplicates. If I only fall back to
 equality comparison for non-identical objects, I would probably sacrifice
 most of the gain. If I build a dict mapping between the values and the
 actual objects, I would have doubled the required memory and uselessly
 store
 the same values twice there.


The first thing to deal with the equality check. The way this is generally
done is to first do an identity check, then if that fails fall back to an
equality check. This gives you a fast path for the normal case, but still
gives full equality checks on a slow path.

Your assumption of double storage for a dict is somewhat flawed if I
understand you correctly. The mapping:

(value1, value2, ...) = my_object(value1, value2, ...)

*could* result in value1, value2, ... being created and stored twice (hence
the possibility of double storage) and the mapping tuple being stored + your
object. However, if the key and value are the same object, there is only a
single additional reference being stored (within the dict structure of
course).

The way you should probably deal with this is to always create one of your
objects for doing the lookup. Then your algorithm is:

new_object = my_object(value1, value2, ...)

try:
canonical = canonical_dict[new_object]
except KeyError:
canonical = canonical_dict[new_object] = new_object

You'd have to structure your __new__ appropriately to do it there, but it is
possible assuming that everything you need for equality testing is done in
__new__.

If you further want to reduce storage (if it's an issue) you could also
canonicalise the values themselves using a similar technique. You could even
use the same canonicalisation dictionary so long as you could ensure that
none of the different types compare equal (e.g. floats and integers). Note
that as an implementation detail the integers -5...256 are already interned,
but you can't rely on that (the range has changed over time).

Tim Delaney
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Interning own classes like strings for speed and size?

2010-12-27 Thread Steven D'Aprano
On Mon, 27 Dec 2010 12:05:10 +0100, Ulrich Eckhardt wrote:

 What I'm now considering is to only allow a single instance of these
 objects for each set of values, similar to interned strings. What I
 would gain is that I could safely compare objects for identity instead
 of equality. What I'm not yet sure is how much overhead that would give
 me and/or how to keep it low. The idea is to store each instance in a
 set and after creating a new object I would first look up an equal
 object in the global set and return that instead, otherwise add the new
 one.

Try this technique:

 class InternedTuple(tuple):
... _cache = {}
... def __new__(cls, *args):
... t = super().__new__(cls, *args)
... return cls._cache.setdefault(t, t)
... 
 
 
 t1 = InternedTuple((1.0, 2.0))
 t2 = InternedTuple((0.0, 0.0))
 t3 = InternedTuple((1.0, 2.0))
 
 t1 is t2
False
 t1 is t3
True
 t1 == t2
False
 t1 == t3
True



-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list