Steven Bethard wrote:
Delaney, Timothy C (Timothy) <[EMAIL PROTECTED]> wrote:

The perennial "how do I remove duplicates from a list" topic came up on
c.l.py and in the discussion I mentioned the java 1.5 LinkedHashSet and
LinkedHashMap. I'd thought about proposing these before, but couldn't
think of where to put them. It was pointed out that the obvious place
would be the collections module.

For those who don't know, LinkedHashSet and LinkedHashMap are simply
hashed sets and maps that iterate in the order that the keys were added
to the set/map. I almost invariably use them for the above scenario -
removing duplicates without changing order.

Does anyone else think it would be worthwhile adding these to
collections, or should I just make a cookbook entry?


I guess I'm -0 on this.

Though I was the one that suggested that collections is the right
place to put them, I'm not really certain how much we gain by
including them.  I too would only ever use them for removing
duplicates from a list.  But if we're trying to provide a solution to
this problem, I'd rather see an iterable-friendly one.  See a previous
thread on this issue[1] where I suggest something like:

def filterdups(iterable):
     seen = set()
     for item in iterable:
         if item not in seen:
             seen.add(item)
             yield item

Adding this to, say, itertools would cover all my use cases.  And as
long as you don't have too many duplicates, filterdups as above should
keep memory consumption down better.


I am -1 on adding LinkedHash*. While I can understand wanting to get rid of duplicates easily and wanting a good solutoin, Steven's snippet of code shows rolling your own solution is easy.


Plus this can even be simplified down to a one-liner using itertools already::

  itertools.ifilterfalse(lambda item, _set=set():
                           (item in _set) or (_set.add(item) and False),
                         iterable)

I don't think it is the prettiest solution, but it does show that coming up with one is not hard nor restricted to only a single solution that requires knowing some Python idiom (well, mine does for the default arg to the lambda, but Steven's doesn't).

The last thing I want to happen is for Python's stdlib to grow every possible data structure out there like Java seems to have. I don't ever want to have to think about what *variant* of a data structure I should use, only what *type* of data structure (and no, I don't count collections.deque and Queue.Queue an overlap since the latter is meant more for thread communication, at least in my mind).

-Brett
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Reply via email to