Re: Removing duplicates from a list
drochom wrote:
> i suppose this one is faster (but in most cases efficiency doesn't matter)
>
>     def stable_unique(s):
>         e = {}
>         ret = []
>         for x in s:
>             if not e.has_key(x):
>                 e[x] = 1
>                 ret.append(x)
>         return ret

I'll repeat Peter Otten's link to Tim Peters's recipe here:

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52560/

Read the comments at the end; they talk about order-preserving lists. See Raymond Hettinger's response:

    def uniq(alist):    # Fastest order preserving
        set = {}
        return [set.setdefault(e,e) for e in alist if e not in set]

STeVe
--
http://mail.python.org/mailman/listinfo/python-list
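Hettinger's one-liner relies on a side effect: setdefault() inserts e into the dict while the comprehension runs, so later copies of e fail the `e not in set` test. The local name `set` also shadows the built-in type. A minimal Python 3 sketch of the same single-pass idea using a real set (the name uniq_seen is hypothetical, not part of the recipe):

```python
def uniq_seen(alist):
    # Order-preserving de-duplication in one pass (Python 3 sketch).
    # seen.add() returns None, so "x in seen or seen.add(x)" is falsy
    # exactly once per distinct x: the first time x is encountered.
    seen = set()
    return [x for x in alist if not (x in seen or seen.add(x))]


print(uniq_seen([0, 1, 2, 4, 2, 2, 3, 4, 1, 3, 2]))  # [0, 1, 2, 4, 3]
```

Like the dict version, it requires hashable items.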
Re: Removing duplicates from a list
Thanks for all the information. And now I understand the timeit module ;)

GC-Martijn
Re: Removing duplicates from a list
Rubinho wrote:
> I've a list with duplicate members and I need to make each entry unique.

hi,

another possibility (my newest discovery :) )

    >>> a = [1,2,2,4,2,1,3,4]
    >>> unique = dict.fromkeys(a).keys()
    >>> unique
    [1, 2, 3, 4]

regards
przemek
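The session above ran under Python 2, where dict key order is arbitrary, so the sorted-looking [1, 2, 3, 4] is incidental. In Python 3, keys() returns a view rather than a list, and since Python 3.7 dicts are guaranteed to preserve insertion order. A sketch of the same idea in Python 3 therefore keeps first-occurrence order:

```python
a = [1, 2, 2, 4, 2, 1, 3, 4]

# dict.fromkeys() makes one key per distinct item (all values are None).
# Dicts keep insertion order (guaranteed since Python 3.7), so each item
# stays where it first appeared; list() turns the keys into a list.
unique = list(dict.fromkeys(a))
print(unique)  # [1, 2, 4, 3]
```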
Re: Removing duplicates from a list
Look at the code below:

    def unique(s):
        return list(set(s))

    def unique2(keys):
        unique = []
        for i in keys:
            if i not in unique: unique.append(i)
        return unique

    tmp = [0,1,2,4,2,2,3,4,1,3,2]
    print tmp
    print unique(tmp)
    print unique2(tmp)

Output:

    [0, 1, 2, 4, 2, 2, 3, 4, 1, 3, 2]
    [0, 1, 2, 3, 4]
    [0, 1, 2, 4, 3]

As you can see, the end results are not the same. I need the result [0, 1, 2, 4, 3], not [0, 1, 2, 3, 4]. That's why I use unique2().
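If you want to keep the set conversion but still get [0, 1, 2, 4, 3], one option is to sort the distinct items by the position where each first occurs. This is a sketch; the helper name unique_by_index is hypothetical:

```python
def unique_by_index(s):
    # De-duplicate with a set, then restore first-occurrence order by
    # sorting on each item's first index in s (s must be a list).
    # list.index() scans from the start, so this is O(k * n) for k
    # distinct items in a list of length n.
    return sorted(set(s), key=s.index)


tmp = [0, 1, 2, 4, 2, 2, 3, 4, 1, 3, 2]
print(unique_by_index(tmp))  # [0, 1, 2, 4, 3]
```

For long lists with many distinct values, a single pass with a "seen" set or dict avoids the repeated index() scans.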
Re: Removing duplicates from a list
There wasn't any information about ordering...
Maybe I'll find something better that doesn't destroy the original ordering.

regards
przemek
Re: Removing duplicates from a list
I suppose this one is faster (but in most cases efficiency doesn't matter):

    def stable_unique(s):
        e = {}
        ret = []
        for x in s:
            if not e.has_key(x):
                e[x] = 1
                ret.append(x)
        return ret

cheers,
przemek
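dict.has_key() no longer exists in Python 3; the `in` operator does the same job and also works in Python 2. A hedged sketch of the same first-seen logic written as a generator (iter_unique is a hypothetical name), which accepts any iterable and produces results lazily:

```python
def iter_unique(iterable):
    # Yield each item the first time it appears (Python 3 sketch).
    # Set membership tests are O(1) on average, so the whole pass is
    # linear in the input length, like stable_unique.
    seen = set()
    for x in iterable:
        if x not in seen:
            seen.add(x)
            yield x


print(list(iter_unique([0, 1, 2, 4, 2, 2, 3, 4, 1, 3, 2])))  # [0, 1, 2, 4, 3]
print("".join(iter_unique("mississippi")))                   # misp
```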
Re: Removing duplicates from a list
Oh, thanks. I'm a newbie and I did this test (I don't know if this is the best way to do a small speed test):

    import timeit

    def unique2(keys):
        unique = []
        for i in keys:
            if i not in unique: unique.append(i)
        return unique

    def unique3(s):
        e = {}
        ret = []
        for x in s:
            if not e.has_key(x):
                e[x] = 1
                ret.append(x)
        return ret

    tmp = [0,1,2,4,2,2,3,4,1,3,2]

    s = """\
    try:
        str.__nonzero__
    except AttributeError:
        pass
    """
    t = timeit.Timer(stmt=s)
    print "%.2f usec/pass" % (100 * t.timeit(number=10)/10)
    print tmp
    print "%.2f usec/pass" % (100 * t.timeit(number=10)/10)
    print unique2(tmp)
    print "%.2f usec/pass" % (100 * t.timeit(number=10)/10)
    print unique3(tmp)
    print "%.2f usec/pass" % (100 * t.timeit(number=10)/10)

Output:

    5.80 usec/pass
    [0, 1, 2, 4, 2, 2, 3, 4, 1, 3, 2]
    7.51 usec/pass
    [0, 1, 2, 4, 3]
    6.93 usec/pass
    [0, 1, 2, 4, 3]
    6.45 usec/pass

---
your code unique2(s):
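As posted, every timing line calls t.timeit() on the statement s (the str.__nonzero__ snippet from the timeit documentation), so all four numbers measure that snippet rather than unique2 or unique3; the factor 100 also does not convert seconds into microseconds. A sketch of a harness that times the functions themselves, assuming Python 3.5+ for timeit's globals argument and porting unique3 from has_key() to `in`:

```python
import timeit


def unique2(keys):
    # Membership test on a list: a linear scan per item.
    unique = []
    for i in keys:
        if i not in unique:
            unique.append(i)
    return unique


def unique3(s):
    # Membership test on a dict: a hash lookup per item.
    e = {}
    ret = []
    for x in s:
        if x not in e:
            e[x] = 1
            ret.append(x)
    return ret


tmp = [0, 1, 2, 4, 2, 2, 3, 4, 1, 3, 2]
number = 10000

for func in (unique2, unique3):
    # globals= makes func and tmp visible to the timed statement.
    total = timeit.timeit("func(tmp)", globals={"func": func, "tmp": tmp},
                          number=number)
    # total is seconds for `number` calls; convert to microseconds per call.
    print(f"{func.__name__}: {total / number * 1e6:.2f} usec/pass -> {func(tmp)}")
```

On an 11-item list the difference is small; it grows with the number of distinct items.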
Re: Removing duplicates from a list
Thanks, nice job, but this benchmark is pretty deceptive. Try this (definitions of unique2 and unique3 as above):

    >>> import timeit
    >>> a = range(1000)
    >>> t = timeit.Timer('unique2(a)','from __main__ import unique2,a')
    >>> t2 = timeit.Timer('stable_unique(a)','from __main__ import stable_unique,a')
    >>> t2.timeit(2000)
    1.8392596235778456
    >>> t.timeit(2000)
    51.52945844819817

unique2 has quadratic complexity; unique3 has amortized linear complexity. What does that mean? It means that the speed of your algorithm strongly depends on len(unique2(a)): the more distinct elements a contains, the greater the difference in execution time between the two implementations.

regards
przemek
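One way to see the quadratic/linear difference without relying on wall-clock timings is to count equality comparisons. This sketch is only an illustration: the Counted wrapper class is hypothetical, and stable_unique is ported to Python 3 (`in` instead of has_key()). With n distinct items, unique2 compares each new item against everything already kept, i.e. 0 + 1 + ... + (n - 1) = n(n-1)/2 comparisons, while the dict only calls __eq__ when stored hash values match.

```python
class Counted:
    """Hypothetical wrapper that counts calls to __eq__."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        Counted.comparisons += 1
        return self.value == other.value

    def __hash__(self):
        return hash(self.value)


def unique2(keys):
    unique = []
    for i in keys:
        if i not in unique:  # compares i with every item kept so far
            unique.append(i)
    return unique


def stable_unique(s):
    e = {}
    ret = []
    for x in s:
        if x not in e:  # hash lookup; __eq__ runs only when hashes match
            e[x] = 1
            ret.append(x)
    return ret


n = 1000
items = [Counted(i) for i in range(n)]  # all distinct: worst case for unique2

Counted.comparisons = 0
unique2(items)
list_comparisons = Counted.comparisons

Counted.comparisons = 0
stable_unique(items)
dict_comparisons = Counted.comparisons

print("unique2:      ", list_comparisons)  # n*(n-1)/2 = 499500
print("stable_unique:", dict_comparisons)
```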
Removing duplicates from a list
I've a list with duplicate members and I need to make each entry unique.

I've come up with two ways of doing it and I'd like some input on what would be considered more pythonic (or at least best practice).

Method 1 (the traditional approach)

    for x in mylist:
        if mylist.count(x) > 1:
            mylist.remove(x)

Method 2 (not so traditional)

    mylist = set(mylist)
    mylist = list(mylist)

Converting to a set drops all the duplicates and converting back to a list, well, gets it back to a list, which is what I want.

I can't imagine one being much faster than the other except in the case of a huge list, and mine's going to typically have less than 1000 elements.

What do you think?

Cheers,
Robin
Re: Removing duplicates from a list
On Wed, 14 Sep 2005 04:38:35 -0700, Rubinho wrote:
> I've a list with duplicate members and I need to make each entry unique.
>
> I've come up with two ways of doing it and I'd like some input on what would be considered more pythonic (or at least best practice).
>
>     mylist = set(mylist)
>     mylist = list(mylist)
>
> Converting to a set drops all the duplicates and converting back to a list, well, gets it back to a list which is what I want.
>
> I can't imagine one being much faster than the other except in the case of a huge list and mine's going to typically have less than 1000 elements.
>
> What do you think?

Hi,

I would use set:

    mylist=list(set(mylist))

Thomas
--
Thomas Güttler, http://www.thomas-guettler.de/
E-Mail: guettli (*) thomas-guettler + de
Spam Catcher: [EMAIL PROTECTED]
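One caveat about the set approach: every item must be hashable. A sketch, assuming a list of lists as input (the sample data is made up), showing the failure and one workaround that converts the inner lists to tuples and back:

```python
rows = [[1, 2], [3, 4], [1, 2]]

try:
    list(set(rows))  # lists are unhashable, so this raises TypeError
except TypeError as exc:
    print("set() needs hashable items:", exc)

# Workaround: make each row a tuple (hashable), de-duplicate while keeping
# first-occurrence order, then turn the tuples back into lists.
unique_rows = [list(t) for t in dict.fromkeys(map(tuple, rows))]
print(unique_rows)  # [[1, 2], [3, 4]]
```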
Re: Removing duplicates from a list
Rubinho wrote:
> I've a list with duplicate members and I need to make each entry unique.
>
> I've come up with two ways of doing it and I'd like some input on what would be considered more pythonic (or at least best practice).
>
> Method 1 (the traditional approach)
>
>     for x in mylist:
>         if mylist.count(x) > 1:
>             mylist.remove(x)
>
> Method 2 (not so traditional)
>
>     mylist = set(mylist)
>     mylist = list(mylist)
>
> Converting to a set drops all the duplicates and converting back to a list, well, gets it back to a list which is what I want.
>
> I can't imagine one being much faster than the other except in the case of a huge list and mine's going to typically have less than 1000 elements.

I would imagine that 2 would be significantly faster. Method 1 uses 'count' which must make a pass through every element of the list, which would be slower than the efficient hashing that set does. I'm also not sure about removing an element whilst iterating; I think that's a no-no.

Will McGugan
--
http://www.willmcgugan.com
"".join({'*':'@','^':'.'}.get(c,0) or chr(97+(ord(c)-84)%26) for c in "jvyy*jvyyzpthtna^pbz")
Re: Removing duplicates from a list
Rubinho wrote:
> I've a list with duplicate members and I need to make each entry unique.
>
> I've come up with two ways of doing it and I'd like some input on what would be considered more pythonic (or at least best practice).
>
> Method 1 (the traditional approach)
>
>     for x in mylist:
>         if mylist.count(x) > 1:
>             mylist.remove(x)

That would be an odd tradition:

    >>> mylist = [1, 2, 1, 3, 2, 3]
    >>> for x in mylist:
    ...     if mylist.count(x) > 1:
    ...         mylist.remove(x)
    ...
    >>> mylist
    [2, 1, 2, 3] # oops!

See the "Unexpected Behavior Iterating over a Mutating Object" thread

http://mail.python.org/pipermail/python-list/2005-September/298993.html

for the most recent explanation.

Rather, the traditional approach for an algorithmic problem in Python is to ask Tim Peters; see his recipe at

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52560/

(which predates Python's set class).

Peter
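The loop goes wrong because remove() shifts the remaining items left while the for loop's internal index keeps advancing, so elements get skipped. A minimal fix, sketched here, is to iterate over a copy. It keeps the last occurrence of each value and is still quadratic, because count() and remove() each scan the list:

```python
mylist = [1, 2, 1, 3, 2, 3]

# Iterate over a snapshot (mylist[:]) so that remove() on the original
# list cannot make the loop skip elements.
for x in mylist[:]:
    if mylist.count(x) > 1:
        mylist.remove(x)  # removes the first remaining occurrence

print(mylist)  # [1, 2, 3]
```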
Re: Removing duplicates from a list
Peter Otten wrote:
> Rubinho wrote:
>
>> I've a list with duplicate members and I need to make each entry unique.
>>
>> I've come up with two ways of doing it and I'd like some input on what would be considered more pythonic (or at least best practice).
>>
>> Method 1 (the traditional approach)
>>
>>     for x in mylist:
>>         if mylist.count(x) > 1:
>>             mylist.remove(x)
>
> That would be an odd tradition:

By tradition I wasn't really talking Python tradition; what I meant was that the above pattern is similar to what would be generated by people used to traditional programming languages.

>     >>> mylist = [1, 2, 1, 3, 2, 3]
>     >>> for x in mylist:
>     ...     if mylist.count(x) > 1:
>     ...         mylist.remove(x)
>     ...
>     >>> mylist
>     [2, 1, 2, 3] # oops!

But you're absolutely right, it doesn't work! Oops indeed :)

I've gone with Thomas's suggestion above of:

    mylist=list(set(mylist))

Thanks,
Robin
Re: Removing duplicates from a list
[EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]
> I do this:
>
>     def unique(keys):
>         unique = []
>         for i in keys:
>             if i not in unique: unique.append(i)
>         return unique
>
> I don't know what is faster at the moment.

This is quadratic, O(n^2), in the length n of the list if all keys are unique.

Conversion to a set just might use a better sorting algorithm than this (i.e. n*log(n)), and throwing out duplicates (which, after sorting, are positioned next to each other) is O(n). If conversion to a set should turn out to be slower than O(n*log(n)) [depending on the implementation], then you are well advised to sort the list first (n*log(n)) and then throw out the duplicate keys with a single walk over the list. In this case you at least know what to expect for large n...

Regards,
Christian
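For reference, CPython's set is a hash table rather than a sorted structure, so list(set(s)) runs in linear time on average and needs hashable items. The sort-then-walk approach described above is still useful when items are orderable but not hashable, such as lists. A sketch using itertools.groupby (unique_sorted is a hypothetical name); note that the result comes out sorted, not in the original order:

```python
from itertools import groupby


def unique_sorted(s):
    # Sort (O(n log n)), then keep one key per run of equal neighbours (O(n)).
    # Items must be orderable but need not be hashable.
    return [key for key, _ in groupby(sorted(s))]


print(unique_sorted([0, 1, 2, 4, 2, 2, 3, 4, 1, 3, 2]))  # [0, 1, 2, 3, 4]
print(unique_sorted([[1, 2], [3, 4], [1, 2]]))           # [[1, 2], [3, 4]]
```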
Re: Removing duplicates from a list
I do this:

    def unique(keys):
        unique = []
        for i in keys:
            if i not in unique: unique.append(i)
        return unique

I don't know what is faster at the moment.
Re: Removing duplicates from a list
Rubinho wrote:
> I can't imagine one being much faster than the other except in the case of a huge list and mine's going to typically have less than 1000 elements.

To add to what others said, I'd imagine that the technique that's going to be fastest depends not only on the length of the list, but also on the estimated redundancy (i.e. a technique that gives good performance with a list that has only one or two elements duplicated might be painfully slow when there are 10-100 copies of each element).

There really is no substitute for profiling with representative data sets.
Re: Removing duplicates from a list
On Wed, 14 Sep 2005 13:28:58 +0100, Will McGugan wrote:

> Rubinho wrote:
>
>> I can't imagine one being much faster than the other except in the case of a huge list and mine's going to typically have less than 1000 elements.
>
> I would imagine that 2 would be significantly faster.

Don't imagine, measure. Resist the temptation to guess. Write some test functions and time the two different methods. But first test that the functions do what you expect: there is no point having a blindingly fast bug.

> Method 1 uses 'count' which must make a pass through every element of the list, which would be slower than the efficient hashing that set does.

But count passes through the list in C and is also very fast. Is that faster or slower than the hashing code used by sets? I don't know, and I'll bet you don't either.

--
Steven.
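A minimal sketch of "test first, then time" for the two methods from the original question. method1 and method2 are hypothetical wrappers (method1 runs Method 1 on a copy and iterates over a snapshot so remove() cannot skip elements), and the random test data is made up for illustration:

```python
import random
import timeit


def method1(mylist):
    # Method 1, made safe: work on a copy and iterate over a snapshot of it.
    result = list(mylist)
    for x in list(result):
        if result.count(x) > 1:
            result.remove(x)
    return result


def method2(mylist):
    # Method 2: round-trip through a set (order is not preserved).
    return list(set(mylist))


# 1. Correctness first: both must return exactly the distinct values.
for _ in range(200):
    data = [random.randint(0, 20) for _ in range(random.randint(0, 50))]
    assert sorted(method1(data)) == sorted(method2(data)) == sorted(set(data))

# 2. Then timing, on a list of the size mentioned in the question.
data = [random.randint(0, 100) for _ in range(1000)]
for func in (method1, method2):
    best = min(timeit.repeat(lambda: func(data), number=5, repeat=3))
    print(f"{func.__name__}: {best / 5 * 1e3:.3f} ms per call")
```

The amount of duplication in the test data affects the result, so it is worth repeating the timing with data that resembles the real input.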
Re: Removing duplicates from a list
Steven D'Aprano wrote:

> Don't imagine, measure. Resist the temptation to guess. Write some test functions and time the two different methods. But first test that the functions do what you expect: there is no point having a blindingly fast bug.

That is absolutely correct, although I think you do sometimes have to guess. Otherwise you would write multiple versions of every line of code.

> But count passes through the list in C and is also very fast. Is that faster or slower than the hashing code used by sets? I don't know, and I'll bet you don't either.

Sure. But if I'm not currently optimizing, I would go for the method with the best behaviour, which usually means hashing rather than searching. Even if it is actually slower, it's not likely to be _very_ slow.

Will McGugan
Re: Removing duplicates from a list
> I've a list with duplicate members and I need to make each entry unique.
>
> I've come up with two ways of doing it and I'd like some input on what would be considered more pythonic (or at least best practice).
>
> Method 1 (the traditional approach)
>
>     for x in mylist:
>         if mylist.count(x) > 1:
>             mylist.remove(x)
>
> Method 2 (not so traditional)
>
>     mylist = set(mylist)
>     mylist = list(mylist)
>
> Converting to a set drops all the duplicates and converting back to a list, well, gets it back to a list which is what I want.
>
> I can't imagine one being much faster than the other except in the case of a huge list and mine's going to typically have less than 1000 elements.
>
> What do you think?
>
> Cheers,
> Robin

Hi,

Try this:

    def unique(s):
        e = {}
        for x in s:
            if not e.has_key(x):
                e[x] = 1
        return e.keys()

Regards
Przemek
Re: Removing duplicates from a list
This works too, if speed isn't your thing...

    >>> a = [ 1,2,3,2,6,1,3,4,1,7,5,6,7]
    >>> a = dict( ( (i,None) for i in a)).keys()
    >>> a
    [1, 2, 3, 4, 5, 6, 7]
Re: Removing duplicates from a list
przemek drochomirecki wrote:
>     def unique(s):
>         e = {}
>         for x in s:
>             if not e.has_key(x):
>                 e[x] = 1
>         return e.keys()

This is basically identical in functionality to the code:

    def unique(s):
        return list(set(s))

And with the new-and-improved C implementation of sets coming in Python 2.5, there's even more of a reason to use them when you can.

STeVe