Thanks again. What you explain is reasonable. I tried the second method to unique the list. It turns out that Python just runs and runs without returning a result, maybe because it has to iterate over a long list in my example and is therefore slow.
>>> def average_polysemy(pos):
...     synset_list = list(wn.all_synsets(pos))
...     sense_number = 0
...     lemma_list = []
...     for synset in synset_list:
...         lemma_list.extend(synset.lemma_names)
...     unique_lemma_list = []
...     for w in lemma_list:
...         if not w in unique_lemma_list:
...             unique_lemma_list.append(w)
...     return unique_lemma_list
...     for lemma in unique_lemma_list:
...         sense_number_new = len(wn.synsets(lemma, pos))
...         sense_number = sense_number + sense_number_new
...     return sense_number/len(unique_lemma_list)
...
>>> average_polysemy('n')

On Sun, Sep 9, 2012 at 2:36 PM, Donald Stufft <donald.stu...@gmail.com> wrote:

> For a short list the difference is going to be negligible.
>
> For a long list the difference is that checking if an item is in a list
> requires iterating over the list internally to find it, but checking if an
> item is inside of a set uses a faster method that doesn't require iterating
> over the list. This doesn't matter if you have 20 or 30 items, but imagine
> if instead you have 50 million items. You're going to be iterating over the
> list a lot and that can introduce a significant slowdown.
>
> On the other hand, using a set is faster in that case, but because you are
> storing an additional copy of the data you are using more memory to store
> extra copies of everything.
>
> On Sunday, September 9, 2012 at 2:31 AM, John H. Li wrote:
>
> Thanks first, I could understand the second approach easily. The first
> approach is a bit puzzling. Why are seen=set() and seen.add(x) still
> necessary there if we can use unique.append(x) alone? Thanks for your
> enlightenment.
>
> On Sun, Sep 9, 2012 at 1:59 PM, Donald Stufft <donald.stu...@gmail.com> wrote:
>
> seen = set()
> uniqued = []
> for x in original:
>     if not x in seen:
>         seen.add(x)
>         uniqued.append(x)
>
> or
>
> uniqued = []
> for x in original:
>     if not x in uniqued:
>         uniqued.append(x)
>
> The difference is that option #1 is more efficient speed-wise but uses
> more memory (extraneous set hanging around), whereas the second is slower
> (``in`` is slower in lists than in sets) but uses less memory.
>
> On Sunday, September 9, 2012 at 1:56 AM, John H. Li wrote:
>
> Many thanks. If I want to keep the order, how can I deal with it?
> or we can list(set([1, 1, 2, 3, 4])) = [1,2,3,4]
>
> On Sun, Sep 9, 2012 at 1:47 PM, Donald Stufft <donald.stu...@gmail.com> wrote:
>
> If you don't need to retain order you can just use a set,
>
> set([1, 1, 2, 3, 4]) = set([1, 2, 3, 4])
>
> But sets don't retain order.
>
> On Sunday, September 9, 2012 at 1:43 AM, Token Type wrote:
>
> Is there a unique method in python to unique a list? thanks
> --
> http://mail.python.org/mailman/listinfo/python-list
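For what it is worth, here is a minimal sketch of how the set-based approach from the thread might be plugged into the polysemy calculation. It is only an illustration under some assumptions: the names unique_in_order, count_senses, toy_lemmas and toy_senses are hypothetical and not from the original post, and the toy dictionary merely stands in for WordNet so the example runs without NLTK. With NLTK loaded, count_senses could presumably be something like lambda lemma: len(wn.synsets(lemma, pos)), mirroring the call already used in the function above. Two things in the posted function also look worth checking: the first return appears to end the function before the averaging loop is reached, and on Python 2 the final / between two integers truncates, which is why the sketch converts to float.

```python
def unique_in_order(items):
    """Return the distinct items of `items`, keeping first-seen order."""
    seen = set()      # membership tests on a set are fast on average
    uniqued = []      # the list is what preserves the original order
    for x in items:
        if x not in seen:
            seen.add(x)
            uniqued.append(x)
    return uniqued


def average_polysemy(lemmas, count_senses):
    """Mean number of senses per distinct lemma.

    `count_senses` is a hypothetical callable (lemma -> int) supplied by
    the caller; it is an assumption of this sketch, not an NLTK API.
    """
    unique_lemmas = unique_in_order(lemmas)
    if not unique_lemmas:
        return 0.0
    total = sum(count_senses(lemma) for lemma in unique_lemmas)
    # float() avoids integer truncation under Python 2
    return total / float(len(unique_lemmas))


# Toy data standing in for WordNet (made up for illustration only).
toy_senses = {"bank": 3, "dog": 2, "run": 4}
toy_lemmas = ["bank", "dog", "bank", "run", "dog"]

print(unique_in_order(toy_lemmas))
print(average_polysemy(toy_lemmas, toy_senses.get))
```

The set only answers "have I seen this already?", while the list records the order, which is why both are kept. With the list-only version every membership test scans the whole list built so far, so the work grows roughly with the square of the number of lemmas; that would be consistent with the function seeming to run forever on all noun lemmas.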