Delaney, Timothy C (Timothy) <[EMAIL PROTECTED]> wrote:
The perennial "how do I remove duplicates from a list" topic came up on c.l.py, and in the discussion I mentioned the Java 1.5 LinkedHashSet and LinkedHashMap. I'd thought about proposing these before, but couldn't think of where to put them. It was pointed out that the obvious place would be the collections module.
For those who don't know, LinkedHashSet and LinkedHashMap are simply hashed sets and maps that iterate in the order that the keys were added to the set/map. I almost invariably use them for the above scenario - removing duplicates without changing order.
Does anyone else think it would be worthwhile adding these to collections, or should I just make a cookbook entry?
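For readers unfamiliar with the Java classes, here is a rough Python sketch of the behaviour described above: a hashed set that iterates in the order keys were first added. The class name InsertionOrderedSet is hypothetical and is not part of collections. The sketch omits removal, which the Java classes support by keeping a linked list through the entries.

```python
class InsertionOrderedSet(object):
    """Hypothetical sketch: a set that iterates in first-insertion order."""

    def __init__(self, iterable=()):
        self._seen = set()   # hash-based membership test
        self._order = []     # keys in the order they were first added
        for item in iterable:
            self.add(item)

    def add(self, item):
        # Re-adding an existing key leaves its original position unchanged.
        if item not in self._seen:
            self._seen.add(item)
            self._order.append(item)

    def __contains__(self, item):
        return item in self._seen

    def __iter__(self):
        return iter(self._order)

    def __len__(self):
        return len(self._order)


# Removing duplicates from a list without changing the order:
deduped = list(InsertionOrderedSet([3, 1, 3, 2, 1]))  # [3, 1, 2]
```

Note that this builds the whole structure before anything can be iterated, which is the cost the iterable-friendly approach discussed later in the thread tries to avoid.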
I guess I'm -0 on this.
Though I was the one that suggested that collections is the right place to put them, I'm not really certain how much we gain by including them. I too would only ever use them for removing duplicates from a list. But if we're trying to provide a solution to this problem, I'd rather see an iterable-friendly one. See a previous thread on this issue[1] where I suggest something like:
    def filterdups(iterable):
        seen = set()
        for item in iterable:
            if item not in seen:
                seen.add(item)
                yield item
Adding this to, say, itertools would cover all my use cases. And as long as you don't have too many duplicates, filterdups as above should keep memory consumption down better.
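To illustrate the "iterable-friendly" point, here is a small usage sketch of filterdups. Because it is a generator, it can consume any iterable lazily, including an endless one, and it only stores the distinct items seen so far. The sample inputs are made up for illustration.

```python
from itertools import cycle, islice


def filterdups(iterable):
    seen = set()  # distinct items encountered so far
    for item in iterable:
        if item not in seen:
            seen.add(item)
            yield item


# Works on a plain list, preserving first-seen order.
from_list = list(filterdups([3, 1, 3, 2, 1]))  # [3, 1, 2]

# Works lazily on an endless iterator: islice stops after three
# unique items, so the infinite cycle is never exhausted.
from_stream = list(islice(filterdups(cycle([1, 2, 1, 3])), 3))  # [1, 2, 3]
```

Items must be hashable, since they are stored in a set.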
I am -1 on adding LinkedHash*. While I can understand wanting to get rid of duplicates easily and wanting a good solution, Steven's snippet of code shows that rolling your own solution is easy.
Plus this can even be simplified down to a one-liner using itertools already::
    itertools.ifilterfalse(lambda item, _set=set(): (item in _set) or (_set.add(item) and False), iterable)
I don't think it is the prettiest solution, but it does show that coming up with one is not hard, and that you are not limited to a single solution that requires knowing some Python idiom (well, mine does, for the default arg to the lambda, but Steven's doesn't).
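For anyone puzzling over the default-argument idiom, here is the one-liner wrapped in a function so it can be run and inspected. The wrapper name unique_one_liner is hypothetical. The import fallback is there because ifilterfalse was renamed to filterfalse in Python 3.

```python
try:
    from itertools import ifilterfalse  # Python 2 name
except ImportError:
    from itertools import filterfalse as ifilterfalse  # Python 3 name


def unique_one_liner(iterable):
    # The default value set() is evaluated once, when the lambda is
    # created, so _set persists across calls to that lambda and acts
    # as its private "seen" store. A fresh lambda (and a fresh set)
    # is created on every call to this wrapper.
    #
    # Predicate result: True for an item already seen (filtered out);
    # otherwise _set.add(item) records it and returns None, so
    # "None and False" is falsy and the item is kept.
    return ifilterfalse(
        lambda item, _set=set(): (item in _set) or (_set.add(item) and False),
        iterable,
    )


letters = list(unique_one_liner("abracadabra"))  # ['a', 'b', 'r', 'c', 'd']
```

One caveat of the idiom: if the same lambda object were stored and reused for a second iterable, it would remember the items from the first.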
The last thing I want to happen is for Python's stdlib to grow every possible data structure out there, like Java's seems to have. I don't ever want to have to think about what *variant* of a data structure I should use, only what *type* of data structure (and no, I don't count collections.deque and Queue.Queue as an overlap, since the latter is meant more for thread communication, at least in my mind).
-Brett
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev