Re: Interning own classes like strings for speed and size?
Christian Heimes li...@cheimes.de writes: You are right as long as you don't try to rebind the variable. And then only if you assign through an instance, which doesn't make sense for a class-level cache. Assignment like Example2._cache = {} would work. -- http://mail.python.org/mailman/listinfo/python-list
Re: Interning own classes like strings for speed and size?
Steven D'Aprano wrote:

>>> class InternedTuple(tuple):
...     _cache = {}
...     def __new__(cls, *args):
...         t = super().__new__(cls, *args)
...         return cls._cache.setdefault(t, t)

That looks good. The only thing that first bothered me is that it creates an object and then possibly discards it again. However, there is no way around that, since at least the key for the dict lookup must be created anyway. Since key and value are the same here, we even get that for free. What I also found was that with the above, I can't provide __eq__ and __ne__ that just check for identity. If I do, the lookup in setdefault() will never find an existing tuple and I will never save memory for a single object.

Thanks!

Uli
-- http://mail.python.org/mailman/listinfo/python-list
Re: Interning own classes like strings for speed and size?
On Tue, 28 Dec 2010 13:42:39 +0100, Ulrich Eckhardt wrote:

Steven D'Aprano wrote:

>>> class InternedTuple(tuple):
...     _cache = {}
...     def __new__(cls, *args):
...         t = super().__new__(cls, *args)
...         return cls._cache.setdefault(t, t)

That looks good. The only thing that first bothered me is that it creates an object and then possibly discards it again. However, there is no way around that, since at least the key to the dict must be created for lookup. Since key and value are the same here, this is even for free. What I also found was that with the above, I can't provide __eq__ and __ne__ that just check for identity. If I do, the lookup in setdefault() will never find an existing tuple and I will never save memory for a single object.

If all you want is to save memory, you don't need to change the __eq__ method. But if you still want to, try this:

# Untested
class InternedTuple(tuple):
    _cache = {}
    def __new__(cls, *args):
        t = super().__new__(cls, *args)
        return cls._cache.setdefault(args, t)
    def __eq__(self, other):
        return self is other
    def __ne__(self, other):
        return self is not other

-- 
Steven
-- http://mail.python.org/mailman/listinfo/python-list
Re: Interning own classes like strings for speed and size?
Steven D'Aprano, 28.12.2010 15:11:

On Tue, 28 Dec 2010 13:42:39 +0100, Ulrich Eckhardt wrote:

Steven D'Aprano wrote:

>>> class InternedTuple(tuple):
...     _cache = {}
...     def __new__(cls, *args):
...         t = super().__new__(cls, *args)
...         return cls._cache.setdefault(t, t)

That looks good. The only thing that first bothered me is that it creates an object and then possibly discards it again. However, there is no way around that, since at least the key to the dict must be created for lookup. Since key and value are the same here, this is even for free. What I also found was that with the above, I can't provide __eq__ and __ne__ that just check for identity. If I do, the lookup in setdefault() will never find an existing tuple and I will never save memory for a single object.

If all you want is to save memory, you don't need to change the __eq__ method. But if you still want to, try this:

# Untested

Yep, that's the problem. ;)

class InternedTuple(tuple):
    _cache = {}
    def __new__(cls, *args):
        t = super().__new__(cls, *args)
        return cls._cache.setdefault(args, t)
    def __eq__(self, other):
        return self is other
    def __ne__(self, other):
        return self is not other

What Ulrich meant was: doing this will actually kill the caching, because the first time the comparison is called is when looking up the tuple while adding it to the interning dict. Since the new tuple is, well, new, it will not be equal (read: identical) to any cached tuple, thus resulting in a new entry regardless of its content.

Stefan
-- http://mail.python.org/mailman/listinfo/python-list
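[Editor's sketch, not part of the thread.] Stefan's explanation can be checked directly. The class name below is invented; the sketch also restores tuple.__hash__ explicitly, because Python 3 sets __hash__ to None once __eq__ is overridden — without that line the cache lookup would raise a TypeError rather than silently miss:

```python
class IdentityTuple(tuple):
    _cache = {}

    def __new__(cls, *args):
        t = super().__new__(cls, *args)
        # With identity-based __eq__, this lookup can never match a
        # previously cached tuple, so every call adds a fresh entry.
        return cls._cache.setdefault(t, t)

    def __eq__(self, other):
        return self is other

    def __ne__(self, other):
        return self is not other

    # Defining __eq__ disables inherited hashing in Python 3; restore it
    # so the instances remain usable as dict keys at all.
    __hash__ = tuple.__hash__

t1 = IdentityTuple((1.0, 2.0))
t2 = IdentityTuple((1.0, 2.0))
print(t1 is t2)                   # False: interning is defeated
print(len(IdentityTuple._cache))  # 2: one entry per construction
```

So the cache grows by one entry per equal-valued construction, which is exactly the behaviour Stefan describes.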
Re: Interning own classes like strings for speed and size?
Am 28.12.2010 15:11, schrieb Steven D'Aprano:

# Untested
class InternedTuple(tuple):
    _cache = {}
    def __new__(cls, *args):
        t = super().__new__(cls, *args)
        return cls._cache.setdefault(args, t)
    def __eq__(self, other):
        return self is other
    def __ne__(self, other):
        return self is not other

The code doesn't work; you have to change super().__new__(cls, *args) to super().__new__(cls, args). Don't ask me why. I'm surprised, too.

>>> InternedTuple(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in __new__
TypeError: 'int' object is not iterable

Also this code is going to use much more memory than an ordinary tuple, since every instance of InternedTuple has a __dict__ attribute. This code works, but I had to move the cache outside the class because of __slots__:

_ituple_cache = {}

class ituple(tuple):
    __slots__ = ()  # no __dict__

    def __new__(cls, *args):
        cached = _ituple_cache.get(args)
        if cached is not None:
            return cached
        self = _ituple_cache[args] = super(ituple, cls).__new__(cls, args)
        return self

-- http://mail.python.org/mailman/listinfo/python-list
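[Editor's sketch, not part of the thread.] Christian's point about the per-instance __dict__ overhead is easy to verify; the class names here are illustrative, contrasting a plain tuple subclass with a `__slots__ = ()` one like his ituple:

```python
import sys

class DictTuple(tuple):
    pass                # instances carry a per-instance __dict__

class SlotTuple(tuple):
    __slots__ = ()      # no __dict__, as in Christian's ituple

d = DictTuple((1, 2, 3))
s = SlotTuple((1, 2, 3))

d.extra = "allowed"     # the __dict__ makes arbitrary attributes possible
try:
    s.extra = "rejected"
except AttributeError:
    print("SlotTuple has nowhere to store instance attributes")

# The per-instance dict is the memory overhead Christian refers to.
print(sys.getsizeof(d.__dict__))
```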
Re: Interning own classes like strings for speed and size?
On 12/27/2010 3:05 AM, Ulrich Eckhardt wrote: Hi! I'm trying to solve a computational problem and of course speed and size is important there.

Then you probably shouldn't be using CPython.

Apart from picking the right algorithm, I came across an idea that could help speed up things and keep memory requirements down. What I have is regions described by min and max coordinates. At first, I just modeled these as a simple class containing two values for each axis.

There are more effective techniques for that. In game physics collision detection systems, it's common to use axis-ordered lists of bounding boxes to partition objects. This still works even when the ranges overlap. Look up I-Collide for a classic implementation.

Am I looking in the wrong direction? Is there some better approach? Please don't tell me to use C, as I'm specifically interested in learning Python, I'm pretty sure I could have solved the problem quickly in C++ otherwise.

CPython is a naive interpreter which does dictionary lookups for every variable and attribute access. If you're doing fast manipulation of linked data structures in CPython, performance is going to suck. Check out Shed Skin 0.7, which is a much faster Python system. It's limited, but for the kind of data structure work you're doing, all the features you should need already work.

John Nagle
-- http://mail.python.org/mailman/listinfo/python-list
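[Editor's sketch, not part of the thread.] John's pointer to axis-ordered lists of bounding boxes can be illustrated in a few lines. This is a minimal one-axis sweep (not I-Collide itself; the function name is invented): sort boxes by their minimum coordinate, keep a list of "active" intervals, and report overlapping pairs without comparing every pair against every other:

```python
def overlapping_pairs(boxes):
    """Report index pairs of overlapping (min_x, max_x) intervals.

    Touching endpoints count as an overlap. Sorting by min_x means an
    interval only needs to be checked against intervals still 'active'
    when it starts, not against all of them.
    """
    order = sorted(range(len(boxes)), key=lambda i: boxes[i][0])
    active, pairs = [], []
    for i in order:
        lo, _hi = boxes[i]
        # Drop intervals that ended before this one starts.
        active = [j for j in active if boxes[j][1] >= lo]
        pairs.extend((j, i) for j in active)
        active.append(i)
    return pairs

boxes = [(0.0, 2.0), (1.5, 3.0), (5.0, 6.0)]
print(overlapping_pairs(boxes))  # boxes 0 and 1 overlap; box 2 is alone
```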
Re: Interning own classes like strings for speed and size?
Christian Heimes li...@cheimes.de writes: Also this code is going to use much more memory than an ordinary tuple since every instance of InternedTuple has a __dict__ attribute. This code works but I had to move the cache outside the class because of __slots__.

Wouldn't it work inside the class as well? __slots__ applies to instances, not to classes themselves. In Python 3.1:

>>> class Foo(tuple):
...     __slots__ = ()
...     _cache = {}
...
>>> Foo._cache    # works as expected
{}
>>> Foo()._cache  # and so does this
{}

-- http://mail.python.org/mailman/listinfo/python-list
Re: Interning own classes like strings for speed and size?
Am 28.12.2010 21:16, schrieb Hrvoje Niksic:

Christian Heimes li...@cheimes.de writes: Also this code is going to use much more memory than an ordinary tuple since every instance of InternedTuple has a __dict__ attribute. This code works but I had to move the cache outside the class because of __slots__.

Wouldn't it work inside the class as well? __slots__ applies to instances, not to classes themselves. In Python 3.1:

You are right, as long as you don't try to rebind the variable. I recalled that class attributes of classes with __slots__ behave slightly differently than those of ordinary classes. For example, you can't have a writable slot and class default values at the same time.

>>> class Example2(object):
...     __slots__ = ()
...     _cache = {}
...
>>> Example2()._cache
{}
>>> Example2()._cache = {}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Example2' object attribute '_cache' is read-only

Christian
-- http://mail.python.org/mailman/listinfo/python-list
Re: Interning own classes like strings for speed and size?
On Wed, Dec 29, 2010 at 00:45, Christian Heimes li...@cheimes.de wrote:

Am 28.12.2010 21:16, schrieb Hrvoje Niksic:

Christian Heimes li...@cheimes.de writes: Also this code is going to use much more memory than an ordinary tuple since every instance of InternedTuple has a __dict__ attribute. This code works but I had to move the cache outside the class because of __slots__.

Wouldn't it work inside the class as well? __slots__ applies to instances, not to classes themselves. In Python 3.1:

You are right as long as you don't try to rebind the variable. I recalled that class attributes of classes with __slots__ behave slightly differently than those of ordinary classes. For example you can't have a writable slot and class default values at the same time.

>>> class Example2(object):
...     __slots__ = ()
...     _cache = {}
...
>>> Example2()._cache
{}
>>> Example2()._cache = {}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Example2' object attribute '_cache' is read-only

Forgive me if I'm misunderstanding, but if you want to cache instances of a class, surely Example2()._cache = {} would defeat the purpose, at least for that instance of Example2? The slot seems writeable enough, after all Example2()._cache['foo'] = 'bar' seems to work?

-- 
Rami Chowdhury
Never assume malice when stupidity will suffice. -- Hanlon's Razor
+44-7581-430-517 / +1-408-597-7068 / +88-0189-245544
-- http://mail.python.org/mailman/listinfo/python-list
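[Editor's sketch, not part of the thread.] The distinction the thread circles around — mutating the shared class-level dict works, rebinding it through an instance does not — can be shown in one short example (class name invented):

```python
class Cached(tuple):
    __slots__ = ()
    _cache = {}          # one dict, stored on the class

c = Cached((1, 2))
c._cache['key'] = 'value'        # mutating the shared dict is fine
print(Cached._cache)             # the class-level dict was updated

try:
    c._cache = {}                # rebinding through the instance fails:
except AttributeError as exc:    # __slots__ = () leaves nowhere to put it
    print("rebinding blocked:", exc)

Cached._cache = {}               # rebinding through the class still works
```

This matches the earlier observation that an assignment like Example2._cache = {} would work, while assigning through an instance does not.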
Interning own classes like strings for speed and size?
Hi!

I'm trying to solve a computational problem and of course speed and size are important there. Apart from picking the right algorithm, I came across an idea that could help speed up things and keep memory requirements down. What I have is regions described by min and max coordinates. At first, I just modeled these as a simple class containing two values for each axis.

In a second step, I derived this class from tuple instead of object. Some code then moved from __init__ to __new__, and some code that modified these objects had to be changed to replace them instead. The upside to this is that they can be used as keys in sets and dicts, which isn't the case for mutable types[1].

What I'm now considering is to only allow a single instance of these objects for each set of values, similar to interned strings. What I would gain is that I could safely compare objects for identity instead of equality. What I'm not yet sure about is how much overhead that would give me and/or how to keep it low. The idea is to store each instance in a set, and after creating a new object I would first look up an equal object in the global set and return that instead, otherwise add the new one. The problem I foresee is that if I define equality as identity, this lookup when creating will never eliminate duplicates. If I only fall back to equality comparison for non-identical objects, I would probably sacrifice most of the gain. If I build a dict mapping between the values and the actual objects, I would have doubled the required memory and uselessly stored the same values twice there.

Am I looking in the wrong direction? Is there some better approach? Please don't tell me to use C, as I'm specifically interested in learning Python; I'm pretty sure I could have solved the problem quickly in C++ otherwise. Other suggestions?

Cheers!

Uli

[1] Somebody correct me if I'm wrong, but I believe I could have defined a hashing function for the type and thus allowed its use in a set or dict, right? However, goofing up because you accidentally modified an object and changed its hash value is something I don't want to risk anyway.
-- http://mail.python.org/mailman/listinfo/python-list
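[Editor's sketch, not part of the thread.] The worry in the footnote can be made concrete. A mutable class can indeed be given a __hash__ and used in a set, and the failure mode is exactly the "goofing up" described (class name invented):

```python
class MutableRegion:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __eq__(self, other):
        return (self.lo, self.hi) == (other.lo, other.hi)

    def __hash__(self):
        return hash((self.lo, self.hi))

r = MutableRegion(0, 1)
s = {r}
print(r in s)   # True: hashable, works as a set element
r.hi = 2        # accidental mutation changes the hash in place
print(r in s)   # False: the set can no longer find its own element
```

The object is still stored in the set, but it now sits in the wrong hash bucket, which is why interning designs prefer immutable (tuple-based) value objects.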
Re: Interning own classes like strings for speed and size?
Am I looking in the wrong direction? Is there some better approach? Please don't tell me to use C, as I'm specifically interested in learning Python. Other suggestions?

I believe what you are looking for is (some variant of) the singleton pattern: http://en.wikipedia.org/wiki/Singleton_pattern For how it's done in Python, see http://www.google.com/search?q=python+singleton

Cheers,
Daniel
-- 
Psss, psss, put it down! - http://www.cafepress.com/putitdown
-- http://mail.python.org/mailman/listinfo/python-list
Re: Interning own classes like strings for speed and size?
Daniel Fetchinson wrote: I believe what you are looking for is (some variant of) the singleton pattern: http://en.wikipedia.org/wiki/Singleton_pattern Actually, no. What I want is the flyweight pattern instead: http://en.wikipedia.org/wiki/Flyweight_pattern ...but thank you for the approach of looking for a suitable pattern! Cheers! Uli -- http://mail.python.org/mailman/listinfo/python-list
Re: Interning own classes like strings for speed and size?
I believe what you are looking for is (some variant of) the singleton pattern: http://en.wikipedia.org/wiki/Singleton_pattern Actually, no. What I want is the flyweight pattern instead: http://en.wikipedia.org/wiki/Flyweight_pattern Oh I see. I did not know about this pattern, but in my defense it looks like a variant of the singleton pattern :) Thanks! One always learns something new on python-list. Cheers, Daniel -- Psss, psss, put it down! - http://www.cafepress.com/putitdown -- http://mail.python.org/mailman/listinfo/python-list
Re: Interning own classes like strings for speed and size?
On 12/27/2010 6:05 AM, Ulrich Eckhardt wrote: Hi! I'm trying to solve a computational problem and of course speed and size is important there. Apart from picking the right algorithm, I came across an idea that could help speed up things and keep memory requirements down. What I have is regions described by min and max coordinates. What sort of numbers are the coordinates? If integers in a finite range, your problem is a lot simpler than if float of indefinite precision. -- Terry Jan Reedy -- http://mail.python.org/mailman/listinfo/python-list
Re: Interning own classes like strings for speed and size?
Terry Reedy wrote: What sort of numbers are the coordinates? If integers in a finite range, your problem is a lot simpler than if float of indefinite precision. Yes, indeed, I could optimize the amount of data required to store the data itself, but that would require application-specific handling of the data, which is actually not what I want to learn about. If it was that, I'd use a language where I have lower-level access to the system. ;) Thanks nonetheless! Uli -- http://mail.python.org/mailman/listinfo/python-list
Re: Interning own classes like strings for speed and size?
On 27 December 2010 22:05, Ulrich Eckhardt dooms...@knuut.de wrote:

What I'm now considering is to only allow a single instance of these objects for each set of values, similar to interned strings. What I would gain is that I could safely compare objects for identity instead of equality. What I'm not yet sure is how much overhead that would give me and/or how to keep it low. The idea is to store each instance in a set and after creating a new object I would first look up an equal object in the global set and return that instead, otherwise add the new one. The problem I foresee is that if I define equality as identity, this lookup when creating will never eliminate duplicates. If I only fall back to equality comparison for non-identical objects, I would probably sacrifice most of the gain. If I build a dict mapping between the values and the actual objects, I would have doubled the required memory and uselessly store the same values twice there.

The first thing to deal with is the equality check. The way this is generally done is to first do an identity check, then if that fails fall back to an equality check. This gives you a fast path for the normal case, but still gives full equality checks on a slow path.

Your assumption of double storage for a dict is somewhat flawed, if I understand you correctly. The mapping: (value1, value2, ...) = my_object(value1, value2, ...) *could* result in value1, value2, ... being created and stored twice (hence the possibility of double storage) and the mapping tuple being stored + your object. However, if the key and value are the same object, there is only a single additional reference being stored (within the dict structure of course).

The way you should probably deal with this is to always create one of your objects for doing the lookup. Then your algorithm is:

new_object = my_object(value1, value2, ...)
try:
    canonical = canonical_dict[new_object]
except KeyError:
    canonical = canonical_dict[new_object] = new_object

You'd have to structure your __new__ appropriately to do it there, but it is possible assuming that everything you need for equality testing is done in __new__. If you further want to reduce storage (if it's an issue) you could also canonicalise the values themselves using a similar technique. You could even use the same canonicalisation dictionary so long as you could ensure that none of the different types compare equal (e.g. floats and integers). Note that as an implementation detail the integers -5...256 are already interned, but you can't rely on that (the range has changed over time).

Tim Delaney
-- http://mail.python.org/mailman/listinfo/python-list
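[Editor's sketch, not part of the thread.] Tim's try/except recipe, fleshed out into a runnable form with the lookup moved into __new__ as he suggests. The class name and values are illustrative stand-ins for Ulrich's region type; equality and hashing are inherited from tuple:

```python
# canonical_dict maps each distinct value-tuple to its canonical object.
canonical_dict = {}

class Box(tuple):
    """Illustrative stand-in for the region type in the thread."""
    def __new__(cls, *values):
        new_object = super().__new__(cls, values)
        try:
            return canonical_dict[new_object]
        except KeyError:
            canonical_dict[new_object] = new_object
            return new_object

a = Box(0.0, 1.0, 0.0, 1.0)
b = Box(0.0, 1.0, 0.0, 1.0)
print(a is b)              # True: b resolved to the canonical instance
print(len(canonical_dict)) # one entry, key and value are the same object
```

Since key and value are the same object, the dict adds only references, not a second copy of the data — Tim's point about the double-storage assumption.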
Re: Interning own classes like strings for speed and size?
On Mon, 27 Dec 2010 12:05:10 +0100, Ulrich Eckhardt wrote:

What I'm now considering is to only allow a single instance of these objects for each set of values, similar to interned strings. What I would gain is that I could safely compare objects for identity instead of equality. What I'm not yet sure is how much overhead that would give me and/or how to keep it low. The idea is to store each instance in a set and after creating a new object I would first look up an equal object in the global set and return that instead, otherwise add the new one.

Try this technique:

>>> class InternedTuple(tuple):
...     _cache = {}
...     def __new__(cls, *args):
...         t = super().__new__(cls, *args)
...         return cls._cache.setdefault(t, t)
...
>>> t1 = InternedTuple((1.0, 2.0))
>>> t2 = InternedTuple((0.0, 0.0))
>>> t3 = InternedTuple((1.0, 2.0))
>>> t1 is t2
False
>>> t1 is t3
True
>>> t1 == t2
False
>>> t1 == t3
True

-- 
Steven
-- http://mail.python.org/mailman/listinfo/python-list