> I'm a unix guy. That's what we call a sort-uniq operation, after the > pipeline we'd use: sort datafile | uniq > uniq-lines.txt. So I google that > with python and ....
> As Jason Petrone wrote when he withdrew PEP 270 in > http://www.python.org/dev/peps/pep-0270/: > "creating a sequence without duplicates is just a matter of > choosing a different data structure: a set instead of a list." > At the time, sets.py was a nifty new thing. Since then, the set datatype > has > been added to python's base. > set() can consume a list of tuples, but not a list of lists, like the X you > showed us. You're job will be getting your massive list of lists into a > list of tuples. > This works, but for your very large arrays, may take large time: > X = [[1,2], [1,2], [3,4], [3,4]] > Y = set( [tuple(x) for x in X] ) > There may be faster methods. The map() function might help, but I really > don't know. Here's something to try: > Y = set( map(tuple, X ) > Or you can go old school route, from before the days of set(), that is: > http://code.activestate.com/recipes/52560-remove-duplicates-from-a-sequence/ > Best, > Mike Thanks for your reply, but perhaps there is misunderstanding: I don't want unique values, but unique sequence (block) of data that is repeated in array: A B C D D D A B C D D D A B C D D D |_________| |_________| |_________| | | | unique unique unique sequence sequence sequence data data data I tested your approach and won't say it's slow. It works great but that's not what I'm after. Thanks anyway Cheers _______________________________________________ python-win32 mailing list python-win32@python.org http://mail.python.org/mailman/listinfo/python-win32