On 09/08/2012 23:26, Dave Angel wrote: > On 08/09/2012 06:03 PM, Andrew Cooper wrote: >> On 09/08/2012 22:34, Roman Vashkevich wrote: >>> Actually, they are different. >>> Put a dict.{iter}items() in an O(k^N) algorithm and make it a hundred >>> thousand entries, and you will feel the difference. >>> Dict uses hashing to get a value from the dict and this is why it's O(1). >>> >> Sligtly off topic, but looking up a value in a dictionary is actually >> O(n) for all other entries in the dict which suffer a hash collision >> with the searched entry. >> >> True, a sensible choice of hash function will reduce n to 1 in common >> cases, but it becomes an important consideration for larger datasets. >> >> ~Andrew > > I'm glad you're wrong for CPython's dictionaries. The only time the > lookup would degenerate to O[n] would be if the hash table had only one > slot. CPython sensibly increases the hash table size when it becomes > too small for efficiency. > > > Where have you seen dictionaries so poorly implemented? >
Different n, which I should have made more clear. I was using it for consistency with O() notation. My statement was O(n) where n is the number of hash collisions. The choice of hash algorithm (or several depending on the implementation) should specifically be chosen to reduce collisions to aid in efficient space utilisation and lookup times, but any implementation must allow for collisions. There are certainly runtime methods of improving efficiency using amortized operations. As for poor implementations, class Foo(object): ... def __hash__(self): return 0 I seriously found that in some older code I had the misfortune of reading. It didn't remain in that state for long. ~Andrew -- http://mail.python.org/mailman/listinfo/python-list