Ninereeds <[EMAIL PROTECTED]> writes:

> The dictionary version Chris suggests (and the essentially
> equivalent set-based approach) is doing essentially the same thing
> in a way, but using hashing rather than ordering to organise the
> list and spot duplicates. This is *not* O(n) due to the rate of
> collisions increasing as the hash table fills. If collisions are
> handled by building a linked list for each hash table entry, the
> performance overall is still O(n^2).
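For concreteness, the set-based version under discussion is presumably
something along these lines (I haven't seen the exact code upthread;
the function name is mine):

    def dedupe(items):
        # One linear pass; 'seen' is backed by the same open-addressed
        # hash table as dict, so each membership test and each add is
        # amortized O(1).
        seen = set()
        result = []
        for item in items:
            if item not in seen:
                seen.add(item)
                result.append(item)
        return result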
The quoted analysis doesn't apply to Python, which implements dict
storage as an open-addressed table and automatically (and
exponentially) grows the table when the number of entries approaches
2/3 of the table size. Assuming a good hash function, filling the dict
should yield amortized constant time for individual additions, so
building the dict or set is O(n) overall, not O(n^2).

> I suspect Python handles collisions using linked lists,

Why suspect? It's trivial to check. dictobject.c says:

    The basic lookup function used by all operations.
    This is based on Algorithm D from Knuth Vol. 3, Sec. 6.4.
    Open addressing is preferred over chaining since the link overhead
    for chaining would be substantial (100% with typical malloc
    overhead).
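To make the resizing behaviour concrete, here is a toy model of the
scheme dictobject.c describes. It's a sketch, not CPython's actual
implementation: the real probe sequence perturbs the index with
higher-order hash bits rather than stepping linearly, and the growth
policy differs in detail.

    class OpenAddressedSet:
        def __init__(self):
            self.size = 8
            self.used = 0
            self.table = [None] * self.size

        def _probe(self, key):
            # Start at hash(key) mod table size; step until we find
            # the key or an empty slot (linear probing, for
            # simplicity).
            i = hash(key) % self.size
            while self.table[i] is not None and self.table[i] != key:
                i = (i + 1) % self.size
            return i

        def add(self, key):
            i = self._probe(key)
            if self.table[i] is None:
                self.table[i] = key
                self.used += 1
                # Grow before the table is 2/3 full.
                if self.used * 3 >= self.size * 2:
                    self._resize()

        def _resize(self):
            # Exponential growth: every key gets rehashed, but since
            # the table at least doubles each time, the total
            # rehashing work over n insertions is O(n), i.e.
            # amortized O(1) per add.
            old = [k for k in self.table if k is not None]
            self.size *= 4
            self.used = len(old)
            self.table = [None] * self.size
            for k in old:
                self.table[self._probe(k)] = k

        def __contains__(self, key):
            return self.table[self._probe(key)] == key

    s = OpenAddressedSet()
    for x in [3, 1, 4, 1, 5, 9, 2, 6, 5]:
        s.add(x)
    print(4 in s, 7 in s)   # True False

Because the table is regrown before it reaches 2/3 full, probe chains
stay short, and the occasional full rehash amortizes to a constant
cost per insertion; the O(n^2) worry about chained collisions simply
doesn't arise.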