Re: insert unique data in a list
On 12/14/09 2:24 AM, knifenomad wrote: On 12월14일, 오전10시19분, knifenomadknifeno...@gmail.com wrote: On 12월14일, 오전2시57분, mattiager...@gmail.com wrote: Il Sun, 13 Dec 2009 16:37:20 +, mattia ha scritto: How can I insert non-duplicate data in a list? I mean, is there a particular option in the creation of a list that permit me not to use something like: def append_unique(l, val): if val not in l: l.append(val) Thanks, Mattia Ok, so you all suggest to use a set. Now the second question, more interesting. Why can't I insert a list into a set? I mean, I have a function that returns a list. I call this function several times and maybe the list returned is the same as another one already returned. I usually put all this lists into another list. How can I assure that my list contains only unique lists? Using set does'n work (i.e. the python interpreter tells me: TypeError: unhashable type: 'list')... this makes the set type hashable. class Set(set): __hash__ = lambda self: id(self) s = Set() s.add(1) s.add(2) s.add(1) print s set([1, 2]) d = {} d[s] = 'voila' print d {Set([1,2]):'voila'} print d[s] 'voila'- 원본 텍스트 숨기기 - - 원본 텍스트 보기 - although it's not what you've asked about. it's intereting to make set hashable using __hash__. Thoughtless messing with __hash__ is seldom useful. s1 = Set([1]) s2 = Set([1]) s1 == s2 True d = {} d[s1] = 3 d[s2] = 5 d {Set([1]): 3, Set([1]): 5} Equality is kept for comparison, but what is it worth to hash them by id? On the original problem: You could turn you lists into tuples. This would be clean and correct. ciao - chris -- Christian Tismer :^)mailto:tis...@stackless.com tismerysoft GmbH : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9A :*Starship* http://starship.python.net/ 14109 Berlin : PGP key - http://wwwkeys.pgp.net/ work +49 30 802 86 56 mobile +49 173 24 18 776 fax +49 30 80 90 57 05 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ -- http://mail.python.org/mailman/listinfo/python-list
Re: insert unique data in a list
Il Sun, 13 Dec 2009 21:17:28 -0800, knifenomad ha scritto: On 12월14일, 오후12시42분, Steven D'Aprano ste...@remove.this.cybersource.com.au wrote: On Sun, 13 Dec 2009 17:19:17 -0800, knifenomad wrote: this makes the set type hashable. class Set(set): __hash__ = lambda self: id(self) That's a *seriously* broken hash function. key = voila d = { Set(key): 1 } d {Set(['i', 'a', 'l', 'o', 'v']): 1} d[ Set(key) ] Traceback (most recent call last): File stdin, line 1, in module KeyError: Set(['i', 'a', 'l', 'o', 'v']) -- Steven of course it is broken as long as it uses it's instance id. i added this to notify that unhashable can become hashable implementing __hash__ inside the class. which probably set to None by default. Ok, nice example, but I believe that using id() as the hash function can lead to unexpected collisions. -- http://mail.python.org/mailman/listinfo/python-list
Re: insert unique data in a list
On 12/15/2009 4:13 AM, mattia wrote: of course it is broken as long as it uses it's instance id. i added this to notify that unhashable can become hashable implementing __hash__ inside the class. which probably set to None by default. Ok, nice example, but I believe that using id() as the hash function can lead to unexpected collisions. For dict and set to work correctly, the hash function must conform to the contract that: - if A == B then hash(A) == hash(B) If the id() of two objects differ but their content equal (i.e. they are two equivalent, but distinct object), they should have the same hash. If id() is used for the hash of an arbitrary object, the contract will be broken unless you define A == B in terms of id(A) == id(B). -- http://mail.python.org/mailman/listinfo/python-list
Re: insert unique data in a list
On Mon, 14 Dec 2009 17:13:24 +, mattia wrote: Il Sun, 13 Dec 2009 21:17:28 -0800, knifenomad ha scritto: On 12월14일, 오후12시42분, Steven D'Aprano ste...@remove.this.cybersource.com.au wrote: On Sun, 13 Dec 2009 17:19:17 -0800, knifenomad wrote: this makes the set type hashable. class Set(set): __hash__ = lambda self: id(self) That's a *seriously* broken hash function. key = voila d = { Set(key): 1 } d {Set(['i', 'a', 'l', 'o', 'v']): 1} d[ Set(key) ] Traceback (most recent call last): File stdin, line 1, in module KeyError: Set(['i', 'a', 'l', 'o', 'v']) -- Steven of course it is broken as long as it uses it's instance id. i added this to notify that unhashable can become hashable implementing __hash__ inside the class. which probably set to None by default. Ok, nice example, but I believe that using id() as the hash function can lead to unexpected collisions. No, you have that backwards. Using id() as the hash function means that equal keys will hash unequal -- rather than unexpected collisions, it leads to unexpected failure-to-collide-when-it-should-collide. And it isn't a nice example, it is a terrible example. Firstly, the example fails to behave correctly. It simply doesn't work as advertised. Secondly, and much worse, it encourages people to do something dangerous without thinking about the consequences. If it is so easy to hash mutable objects, why don't built-in lists and sets don't have a __hash__ method? Do you think that the Python developers merely forgot? No, there is good reason why mutable items shouldn't be used as keys in dicts or in sets, and this example simply papers over the reasons why and gives the impression that using mutable objects as keys is easy and safe when it is neither. Using mutable objects as keys or set elements leads to surprising, unexpected behaviour and hard-to-find bugs. Consider the following set with lists as elements: L = [1, 2] s = Set() # set that allows mutable elements s.add(L) s.add([1, 2, 3]) So far so good. But what should happen now? L.append(3) The set now has two equal elements, which breaks the set invariant that it has no duplicates. Putting the problem of duplicates aside, there is another problem: L = [1, 2] s = Set([L]) L.append(3) There are two possibilities: either the hash function of L changes when the object changes, or it doesn't. Suppose it changes. Then the hash of L after the append will be different from the hash of L before the append, and so membership testing (L in s) will fail. Okay, suppose we somehow arrange matters so that the hash of the object doesn't change as the object mutates. This will solve the problem above: we can mutate L as often as we like, and L in s will continue to work correctly. But now the hash of an object doesn't just depend on it's value, but on its history. That means that two objects which are identical can hash differently, and we've already seen this is a problem. There is one final approach which could work: we give the object a constant hash function, so that all objects of that type hash identically. This means that the performance of searches and lookups in sets and dicts will fall to that of lists. There is no point in paying all the extra overhead of a dict to get behaviour as slow, or slower, than a list. In other words, no matter what you do, using mutable objects as keys or set elements leads to serious problems that need to be dealt with. It simply isn't true that all you need to do to make mutable objects usable in dicts or sets is to add a hash function. That part is trivial. -- Steven -- http://mail.python.org/mailman/listinfo/python-list
Re: insert unique data in a list
Il Mon, 14 Dec 2009 21:53:38 +, Steven D'Aprano ha scritto: On Mon, 14 Dec 2009 17:13:24 +, mattia wrote: Il Sun, 13 Dec 2009 21:17:28 -0800, knifenomad ha scritto: On 12월14일, 오후12시42분, Steven D'Aprano ste...@remove.this.cybersource.com.au wrote: On Sun, 13 Dec 2009 17:19:17 -0800, knifenomad wrote: this makes the set type hashable. class Set(set): __hash__ = lambda self: id(self) That's a *seriously* broken hash function. key = voila d = { Set(key): 1 } d {Set(['i', 'a', 'l', 'o', 'v']): 1} d[ Set(key) ] Traceback (most recent call last): File stdin, line 1, in module KeyError: Set(['i', 'a', 'l', 'o', 'v']) -- Steven of course it is broken as long as it uses it's instance id. i added this to notify that unhashable can become hashable implementing __hash__ inside the class. which probably set to None by default. Ok, nice example, but I believe that using id() as the hash function can lead to unexpected collisions. No, you have that backwards. Using id() as the hash function means that equal keys will hash unequal -- rather than unexpected collisions, it leads to unexpected failure-to-collide-when-it-should-collide. And it isn't a nice example, it is a terrible example. Firstly, the example fails to behave correctly. It simply doesn't work as advertised. Secondly, and much worse, it encourages people to do something dangerous without thinking about the consequences. If it is so easy to hash mutable objects, why don't built-in lists and sets don't have a __hash__ method? Do you think that the Python developers merely forgot? No, there is good reason why mutable items shouldn't be used as keys in dicts or in sets, and this example simply papers over the reasons why and gives the impression that using mutable objects as keys is easy and safe when it is neither. Using mutable objects as keys or set elements leads to surprising, unexpected behaviour and hard-to-find bugs. Consider the following set with lists as elements: L = [1, 2] s = Set() # set that allows mutable elements s.add(L) s.add([1, 2, 3]) So far so good. But what should happen now? L.append(3) The set now has two equal elements, which breaks the set invariant that it has no duplicates. Putting the problem of duplicates aside, there is another problem: L = [1, 2] s = Set([L]) L.append(3) There are two possibilities: either the hash function of L changes when the object changes, or it doesn't. Suppose it changes. Then the hash of L after the append will be different from the hash of L before the append, and so membership testing (L in s) will fail. Okay, suppose we somehow arrange matters so that the hash of the object doesn't change as the object mutates. This will solve the problem above: we can mutate L as often as we like, and L in s will continue to work correctly. But now the hash of an object doesn't just depend on it's value, but on its history. That means that two objects which are identical can hash differently, and we've already seen this is a problem. There is one final approach which could work: we give the object a constant hash function, so that all objects of that type hash identically. This means that the performance of searches and lookups in sets and dicts will fall to that of lists. There is no point in paying all the extra overhead of a dict to get behaviour as slow, or slower, than a list. In other words, no matter what you do, using mutable objects as keys or set elements leads to serious problems that need to be dealt with. It simply isn't true that all you need to do to make mutable objects usable in dicts or sets is to add a hash function. That part is trivial. I agree with you, and in fact I'm inserting tuples in my set. All the workaroun to use somehow mutable object are poor attemps to solve in a quick-and-dirty way a difficult problem like hashing. But I think that during the discussion we have slowly forgot the main topic of my question, that was insert unique objects in a container. Hash are good to retrieve items in constant time, and when we are dealing with collisions we have to provide some solutions, like chaining or open addressing. Note also that in hash collisions happen and the hash function is used to retrieve items, not to insert unique items. -- http://mail.python.org/mailman/listinfo/python-list
insert unique data in a list
How can I insert non-duplicate data in a list? I mean, is there a particular option in the creation of a list that permit me not to use something like: def append_unique(l, val): if val not in l: l.append(val) Thanks, Mattia -- http://mail.python.org/mailman/listinfo/python-list
Re: insert unique data in a list
mattia wrote: How can I insert non-duplicate data in a list? I mean, is there a particular option in the creation of a list that permit me not to use something like: def append_unique(l, val): if val not in l: l.append(val) Thanks, Mattia Unless the insertion order is important, you could use a set -- where a second insertion will have no effect. s = set() s.add(1) s.add(2) s.add(1) print s set([1, 2]) Gary Herron -- http://mail.python.org/mailman/listinfo/python-list
Re: insert unique data in a list
On Dec 13, 8:37 am, mattia ger...@gmail.com wrote: How can I insert non-duplicate data in a list? I mean, is there a particular option in the creation of a list that permit me not to use something like: def append_unique(l, val): if val not in l: l.append(val) Thanks, Mattia Check out the set() data type. ~Sean -- http://mail.python.org/mailman/listinfo/python-list
Re: insert unique data in a list
* mattia: How can I insert non-duplicate data in a list? I mean, is there a particular option in the creation of a list that permit me not to use something like: def append_unique(l, val): if val not in l: l.append(val) How about using a set instead? a = {1, 2, 3} a {1, 2, 3} a |= {4} a {1, 2, 3, 4} a |= {4} a {1, 2, 3, 4} _ Cheers, - Alf -- http://mail.python.org/mailman/listinfo/python-list
Re: insert unique data in a list
Il Sun, 13 Dec 2009 16:37:20 +, mattia ha scritto: How can I insert non-duplicate data in a list? I mean, is there a particular option in the creation of a list that permit me not to use something like: def append_unique(l, val): if val not in l: l.append(val) Thanks, Mattia Ok, so you all suggest to use a set. Now the second question, more interesting. Why can't I insert a list into a set? I mean, I have a function that returns a list. I call this function several times and maybe the list returned is the same as another one already returned. I usually put all this lists into another list. How can I assure that my list contains only unique lists? Using set does'n work (i.e. the python interpreter tells me: TypeError: unhashable type: 'list')... -- http://mail.python.org/mailman/listinfo/python-list
Re: insert unique data in a list
mattia wrote: Il Sun, 13 Dec 2009 16:37:20 +, mattia ha scritto: How can I insert non-duplicate data in a list? I mean, is there a particular option in the creation of a list that permit me not to use something like: def append_unique(l, val): if val not in l: l.append(val) Thanks, Mattia Ok, so you all suggest to use a set. Now the second question, more interesting. Why can't I insert a list into a set? I mean, I have a function that returns a list. I call this function several times and maybe the list returned is the same as another one already returned. I usually put all this lists into another list. How can I assure that my list contains only unique lists? Using set does'n work (i.e. the python interpreter tells me: TypeError: unhashable type: 'list')... Sets can contain *only* hashable objects, but lists are not hashable (since they are mutable). Perhaps, you could convert your list to a tuple first -- tuples *are* hashable. s = set() l = [1,2,3] s.add(l) Traceback (most recent call last): File stdin, line 1, in module TypeError: unhashable type: 'list' s.add(tuple(l)) s set([(1, 2, 3)]) -- http://mail.python.org/mailman/listinfo/python-list
Re: insert unique data in a list
On 13 Dec 2009 17:57:47 GMT mattia ger...@gmail.com wrote: Using set does'n work (i.e. the python interpreter tells me: TypeError: unhashable type: 'list')... Convert the lists to tuples before adding them. Tuples are hashable. /W -- INVALID? DE! -- http://mail.python.org/mailman/listinfo/python-list
Re: insert unique data in a list
On Dec 13, 11:37 am, mattia ger...@gmail.com wrote: How can I insert non-duplicate data in a list? I mean, is there a particular option in the creation of a list that permit me not to use something like: def append_unique(l, val): if val not in l: l.append(val) Thanks, Mattia You could also define a custom object that manages a custom ordered set class unique_set(object): def __init__(self,list): self.list = list def __add___(self,val): if val not in self.list self.list.append(val) return True return False unique_list = unique_set(['a','b','c']) unique_list.list ['a', 'b', 'c'] unique_list + 'd' True unique_list + 'e' True unique_list + 'd' False unique_list.list ['a', 'b', 'c', 'd', 'e'] I've used this on a few projects, it makes for wonderfully clean code, because you can look at your program as an overview without all the arithmetic behind it. hope it helps -- http://mail.python.org/mailman/listinfo/python-list
Re: insert unique data in a list
On Sun, 13 Dec 2009 10:48:11 -0800, Fire Crow wrote: You could also define a custom object that manages a custom ordered set [...] I've used this on a few projects, it makes for wonderfully clean code, because you can look at your program as an overview without all the arithmetic behind it. Which is all very good, but beware that your ordered set implementation is O(N) for insertions, which may be slow once the set grows large enough. Also, I'm not sure I like your abuse of the + operator to modify the object in place and return a flag. It is an API not shared by (as far as I can see) any other data type in Python. -- Steven -- http://mail.python.org/mailman/listinfo/python-list
Re: insert unique data in a list
On Sun, Dec 13, 2009 at 3:09 PM, Steven D'Aprano st...@remove-this-cybersource.com.au wrote: On Sun, 13 Dec 2009 10:48:11 -0800, Fire Crow wrote: You could also define a custom object that manages a custom ordered set [...] I've used this on a few projects, it makes for wonderfully clean code, because you can look at your program as an overview without all the arithmetic behind it. Which is all very good, but beware that your ordered set implementation is O(N) for insertions, which may be slow once the set grows large enough. Also, I'm not sure I like your abuse of the + operator to modify the object in place and return a flag. It is an API not shared by (as far as I can see) any other data type in Python. Could probably just abuse an odict as cleanly. The other option that leaps to mind is to use a bloom filter and a list. Geremy Condra -- http://mail.python.org/mailman/listinfo/python-list
Re: insert unique data in a list
Also, I'm not sure I like your abuse of the + operator to modify the object in place and return a flag. It is an API not shared by (as far as I can see) any other data type in Python. I agree it couuld be more consisten with other object apis, I also think that if every api has to conform to every other api nothing will ever get done. Heres a slightly more familiar version, it returns the value added or none to conform with other APIs. class unique_set(object): def __init__(self,list): self.list = list def __add___(self,val): if val not in self.list self.list.append(val) return val return None unique_list = unique_set(['a','b','c']) unique_list.list ['a', 'b', 'c'] unique_list + 'd' 'd' then a test opperand to verify if a value was inserted could be if unique_list + 'd': # done some stuff but it could also be used in cases like this unique_added = unique_list + 'd' or 'not added' -- http://mail.python.org/mailman/listinfo/python-list
Re: insert unique data in a list
On 12월14일, 오전2시57분, mattia ger...@gmail.com wrote: Il Sun, 13 Dec 2009 16:37:20 +, mattia ha scritto: How can I insert non-duplicate data in a list? I mean, is there a particular option in the creation of a list that permit me not to use something like: def append_unique(l, val): if val not in l: l.append(val) Thanks, Mattia Ok, so you all suggest to use a set. Now the second question, more interesting. Why can't I insert a list into a set? I mean, I have a function that returns a list. I call this function several times and maybe the list returned is the same as another one already returned. I usually put all this lists into another list. How can I assure that my list contains only unique lists? Using set does'n work (i.e. the python interpreter tells me: TypeError: unhashable type: 'list')... this makes the set type hashable. class Set(set): __hash__ = lambda self: id(self) s = Set() s.add(1) s.add(2) s.add(1) print s set([1, 2]) d = {} d[s] = 'voila' print d {Set([1,2]):'voila'} print d[s] 'voila' -- http://mail.python.org/mailman/listinfo/python-list
Re: insert unique data in a list
On 12월14일, 오전10시19분, knifenomad knifeno...@gmail.com wrote: On 12월14일, 오전2시57분, mattia ger...@gmail.com wrote: Il Sun, 13 Dec 2009 16:37:20 +, mattia ha scritto: How can I insert non-duplicate data in a list? I mean, is there a particular option in the creation of a list that permit me not to use something like: def append_unique(l, val): if val not in l: l.append(val) Thanks, Mattia Ok, so you all suggest to use a set. Now the second question, more interesting. Why can't I insert a list into a set? I mean, I have a function that returns a list. I call this function several times and maybe the list returned is the same as another one already returned. I usually put all this lists into another list. How can I assure that my list contains only unique lists? Using set does'n work (i.e. the python interpreter tells me: TypeError: unhashable type: 'list')... this makes the set type hashable. class Set(set): __hash__ = lambda self: id(self) s = Set() s.add(1) s.add(2) s.add(1) print s set([1, 2]) d = {} d[s] = 'voila' print d {Set([1,2]):'voila'} print d[s] 'voila'- 원본 텍스트 숨기기 - - 원본 텍스트 보기 - although it's not what you've asked about. it's intereting to make set hashable using __hash__. -- http://mail.python.org/mailman/listinfo/python-list
Re: insert unique data in a list
On Sun, Dec 13, 2009 at 5:19 PM, knifenomad knifeno...@gmail.com wrote: On 12월14일, 오전2시57분, mattia ger...@gmail.com wrote: Il Sun, 13 Dec 2009 16:37:20 +, mattia ha scritto: How can I insert non-duplicate data in a list? I mean, is there a particular option in the creation of a list that permit me not to use something like: def append_unique(l, val): if val not in l: l.append(val) Ok, so you all suggest to use a set. Now the second question, more interesting. Why can't I insert a list into a set? I mean, I have a function that returns a list. I call this function several times and maybe the list returned is the same as another one already returned. I usually put all this lists into another list. How can I assure that my list contains only unique lists? Using set does'n work (i.e. the python interpreter tells me: TypeError: unhashable type: 'list')... this makes the set type hashable. class Set(set): __hash__ = lambda self: id(self) Or you could just use frozenset and get the correct semantics: http://docs.python.org/library/stdtypes.html#frozenset Cheers, Chris -- http://blog.rebertia.com -- http://mail.python.org/mailman/listinfo/python-list
Re: insert unique data in a list
On 12/14/2009 11:49 AM, Fire Crow wrote: I also think that if every api has to conform to every other api nothing will ever get done. but if every object has its own distinct API, we will never finish reading the docs (assuming they do exist for each object). -- http://mail.python.org/mailman/listinfo/python-list
Re: insert unique data in a list
On Sun, 13 Dec 2009 17:19:17 -0800, knifenomad wrote: this makes the set type hashable. class Set(set): __hash__ = lambda self: id(self) That's a *seriously* broken hash function. key = voila d = { Set(key): 1 } d {Set(['i', 'a', 'l', 'o', 'v']): 1} d[ Set(key) ] Traceback (most recent call last): File stdin, line 1, in module KeyError: Set(['i', 'a', 'l', 'o', 'v']) -- Steven -- http://mail.python.org/mailman/listinfo/python-list
Re: insert unique data in a list
On 12월14일, 오후12시42분, Steven D'Aprano ste...@remove.this.cybersource.com.au wrote: On Sun, 13 Dec 2009 17:19:17 -0800, knifenomad wrote: this makes the set type hashable. class Set(set): __hash__ = lambda self: id(self) That's a *seriously* broken hash function. key = voila d = { Set(key): 1 } d {Set(['i', 'a', 'l', 'o', 'v']): 1} d[ Set(key) ] Traceback (most recent call last): File stdin, line 1, in module KeyError: Set(['i', 'a', 'l', 'o', 'v']) -- Steven of course it is broken as long as it uses it's instance id. i added this to notify that unhashable can become hashable implementing __hash__ inside the class. which probably set to None by default. -- http://mail.python.org/mailman/listinfo/python-list