Ninereeds <[EMAIL PROTECTED]> writes:

> The dictionary version Chris suggests (and the essentially
> equivalent set-based approach) is doing essentially the same thing
> in a way, but using hashing rather than ordering to organise the
> list and spot duplicates. This is *not* O(n) due to the rate of
> collisions increasing as the hash table fills. If collisions are
> handled by building a linked list for each hash table entry, the
> performance overall is still O(n^2).
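For concreteness, the set-based version under discussion is presumably
something along these lines (I haven't seen the exact code upthread;
the function name is mine):

    def dedupe(items):
        # One linear pass; 'seen' is backed by the same open-addressed
        # hash table as dict, so each membership test and each add is
        # amortized O(1).
        seen = set()
        result = []
        for item in items:
            if item not in seen:
                seen.add(item)
                result.append(item)
        return result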
The quoted analysis doesn't apply to Python, which implements dict
storage as an open-addressed table and automatically (and
exponentially) grows the table when the number of entries approaches
2/3 of the table size. Assuming a good hash function, filling the dict
should yield amortized constant time for individual additions, so
building the dict or set is O(n) overall, not O(n^2).

> I suspect Python handles collisions using linked lists,

Why suspect? It's trivial to check. dictobject.c says:

    The basic lookup function used by all operations.
    This is based on Algorithm D from Knuth Vol. 3, Sec. 6.4.
    Open addressing is preferred over chaining since the link overhead
    for chaining would be substantial (100% with typical malloc
    overhead).
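To make the resizing behaviour concrete, here is a toy model of the
scheme dictobject.c describes. It's a sketch, not CPython's actual
implementation: the real probe sequence perturbs the index with
higher-order hash bits rather than stepping linearly, and the growth
policy differs in detail.

    class OpenAddressedSet:
        def __init__(self):
            self.size = 8
            self.used = 0
            self.table = [None] * self.size

        def _probe(self, key):
            # Start at hash(key) mod table size; step until we find
            # the key or an empty slot (linear probing, for
            # simplicity).
            i = hash(key) % self.size
            while self.table[i] is not None and self.table[i] != key:
                i = (i + 1) % self.size
            return i

        def add(self, key):
            i = self._probe(key)
            if self.table[i] is None:
                self.table[i] = key
                self.used += 1
                # Grow before the table is 2/3 full.
                if self.used * 3 >= self.size * 2:
                    self._resize()

        def _resize(self):
            # Exponential growth: every key gets rehashed, but since
            # the table at least doubles each time, the total
            # rehashing work over n insertions is O(n), i.e.
            # amortized O(1) per add.
            old = [k for k in self.table if k is not None]
            self.size *= 4
            self.used = len(old)
            self.table = [None] * self.size
            for k in old:
                self.table[self._probe(k)] = k

        def __contains__(self, key):
            return self.table[self._probe(key)] == key

    s = OpenAddressedSet()
    for x in [3, 1, 4, 1, 5, 9, 2, 6, 5]:
        s.add(x)
    print(4 in s, 7 in s)   # True False

Because the table is regrown before it reaches 2/3 full, probe chains
stay short, and the occasional full rehash amortizes to a constant
cost per insertion; the O(n^2) worry about chained collisions simply
doesn't arise.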