On Thu, Mar 09, 2017 at 01:26:26AM +0530, Sri Kavi wrote:

> I wrote and submitted the following function:
>
>     def sublist_duplicates(lst):
>         sublists = []
>         for item in set(lst):
>             sublists.append([item] * lst.count(item))
>         return sublists
>
>     lst_of_duplicates = [1, 2, 10, 3, 4, 1, 's', 2, 3, 1, 4, 's']
>
>     print sublist_duplicates(lst_of_duplicates)
>
> A participant told me that that'll grind into a near-halt if there are
> many different elements, for example range(1_000_000) (1 million unique
> values, so it checks the whole list (length 1 million) for the count, a
> million times, a total of 1000 billion operations…
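[Editor's note: to make the slowdown described above concrete, here is a small, self-contained timing sketch. The input sizes chosen below are mine, not from the thread; the function body is the one quoted above.]

```python
import timeit

def sublist_duplicates(lst):
    # Quoted version: lst.count(item) walks the whole list once per
    # distinct item, so the total work grows quadratically with the
    # number of distinct elements.
    sublists = []
    for item in set(lst):
        sublists.append([item] * lst.count(item))
    return sublists

# With all-distinct elements, each doubling of the input size should
# roughly quadruple the running time.
for n in (1_000, 2_000, 4_000):
    lst = list(range(n))
    t = timeit.timeit(lambda: sublist_duplicates(lst), number=1)
    print(f"n={n:>5}: {t:.4f}s")
```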
Correct. lst.count(item) has to walk the entire list, counting elements.
It does that in C code, so it is fast, but still, if there are a lot of
elements to walk, it takes some time. Then you do it again, and again,
and again... For a list with N distinct items, your sublist_duplicates
function walks the entire list N times, touching N*N elements in total.
When N is large, N**2 is very much larger.

> I thought he was right. I wrote another version of the function, this
> time using collections.Counter.
>
>     from collections import Counter
>
>     def sublist_duplicates(lst):
>         counts = Counter(lst)
>         sublists = [[k] * counts[k] for k in counts]
>         return sublists
>
>     _lst = list(range(1000000))

[...]

> I found that the first version works better with a small list,

Yes, but for a small list the total time is likely to be so small that
it doesn't really matter.

> but the second version outperforms the first one when it's given a
> considerably big list of data.
>
> I'm asking for your comments on these two functions. I'm not even sure
> if this is how it should be coded.

I agree with your analysis.

-- 
Steve

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
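[Editor's note: for completeness, a runnable sketch putting the two versions from the thread side by side. The function names with suffixes and the `normalize` helper are mine, added only so the two results can be compared even though they may emit groups in different orders.]

```python
from collections import Counter

def sublist_duplicates_count(lst):
    # O(N**2): one full list scan per distinct element (first version).
    return [[item] * lst.count(item) for item in set(lst)]

def sublist_duplicates_counter(lst):
    # O(N): one scan to count, then one short pass over the distinct
    # keys (second version, using collections.Counter).
    counts = Counter(lst)
    return [[k] * counts[k] for k in counts]

def normalize(sublists):
    # Hypothetical helper: sort the groups into a canonical order so the
    # two versions can be compared regardless of iteration order.
    return sorted(sublists, key=repr)

lst_of_duplicates = [1, 2, 10, 3, 4, 1, 's', 2, 3, 1, 4, 's']

# Both versions produce the same groups, just possibly in a different order.
assert normalize(sublist_duplicates_count(lst_of_duplicates)) == \
       normalize(sublist_duplicates_counter(lst_of_duplicates))
print(normalize(sublist_duplicates_counter(lst_of_duplicates)))
```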