On 08/09/2012 06:53 PM, Chris Angelico wrote: > On Fri, Aug 10, 2012 at 8:26 AM, Dave Angel <d...@davea.name> wrote: >> On 08/09/2012 06:03 PM, Andrew Cooper wrote: >>> O(n) for all other entries in the dict which suffer a hash collision >>> with the searched entry. >>> >>> True, a sensible choice of hash function will reduce n to 1 in common >>> cases, but it becomes an important consideration for larger datasets. >> I'm glad you're wrong for CPython's dictionaries. The only time the >> lookup would degenerate to O[n] would be if the hash table had only one >> slot. CPython sensibly increases the hash table size when it becomes >> too small for efficiency. >> >> Where have you seen dictionaries so poorly implemented? > In vanilla CPython up to version (I think) 3.3, where it's possible to > DoS the hash generator. Hash collisions are always possible, just > ridiculously unlikely unless deliberately exploited. > > (And yes, I know an option was added to older versions to randomize > the hashes there too. It's not active by default, so "vanilla CPython" > is still vulnerable.) > > ChrisA
Thank you to you and others, who have corrected my over-general response. I was not intending to claim anything about either a deliberate DOS, nor a foolishly chosen hash function. But the message I was replying to seemed to me to claim that for large datasets, even a good hash algorithm would end up giving O(n) performance. -- DaveA -- http://mail.python.org/mailman/listinfo/python-list