On Mon, 03 Nov 2008 17:06:07 -0800, Aaron Brady wrote:

>> For all practical purposes, dicts have almost constant access time (at
>> least with any half-decent __hash__ method).
>
> Hash tables are slick, but their hash function is their weakest link.
>
>>>> [ hash( 2**x ) for x in range( 0, 256, 32 ) ]
> [1, 1, 1, 1, 1, 1, 1, 1]
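[A minimal sketch of the collision effect being discussed. The `[1, 1, ...]` output above comes from Python 2's long hash on a 32-bit build; a modern CPython hashes those ints differently, so this sketch instead uses a deliberately bad `__hash__` on a hypothetical `BadHash` subclass (not from the thread) to force every key into the same bucket chain.]

```python
class BadHash(int):
    """int subclass whose instances all hash to the same value,
    so every key lands in the same dict bucket chain."""
    def __hash__(self):
        return 1  # deliberate worst case: every instance collides

# All eight keys collide, so each lookup must walk the chain doing
# equality comparisons -- linear in the number of colliding keys --
# but the dict still returns the right answers.
d = {BadHash(2 ** x): x for x in range(0, 256, 32)}
print(d[BadHash(2 ** 64)])  # -> 64
```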
That's not the hash function of a dict. Dicts don't have a hash function:

>>> hash({})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: dict objects are unhashable

What you're showing is the hash function of ints, and you've discovered
that there are some collisions. This is hardly surprising. There are an
infinite number of ints, and only a finite number of values that they can
hash to, so there have to be some collisions.

It's worth looking at the values that collide:

>>> [ 2**x for x in range( 0, 256, 32 ) ]
[1, 4294967296L, 18446744073709551616L,
79228162514264337593543950336L,
340282366920938463463374607431768211456L,
1461501637330902918203684832716283019655932542976L,
6277101735386680763835789423207666416102355444464034512896L,
26959946667150639794667015087019630673637144422540572481103610249216L]

So if you have a dict with keys equal to those numbers, and do a lookup
on one of them, you'll get O(N) performance instead of O(1) *for those
keys that collide*. I think I can live with that. If you think you can
come up with a better hash function, feel free to subclass int.

> Such an index gives you linear-time performance in finding elements.

Ha! In any real application, all your keys are highly unlikely to
collide in that fashion.

> Plus, your worst-case insertion will cause a copy of the entire table.

Explain please.

> There aren't any mappings based on balanced trees, unfortunately.

I'm not sure why you think O(log N) is better than O(1). The major
advantage of tree structures is that, unlike hash tables, they are
ordered, not that they don't have collisions.

--
Steven
--
http://mail.python.org/mailman/listinfo/python-list
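[A sketch of the "worst-case insertion will cause a copy of the entire table" point that goes unexplained in the thread: when a dict outgrows its load factor, CPython allocates a larger table and rehashes every entry, so that one insert costs O(N), amortized to O(1) over many inserts. This assumes, as CPython's implementation does, that `sys.getsizeof` on a dict reflects its allocated table.]

```python
import sys

# Insert 1000 keys and record which insertions changed the dict's
# allocated size -- those are the resize/rehash ("copy the table")
# steps; all the other insertions are plain O(1).
d = {}
resizes = []
for i in range(1000):
    before = sys.getsizeof(d)
    d[i] = None
    if sys.getsizeof(d) != before:
        resizes.append(i)

# Only a handful of the 1000 insertions triggered a reallocation.
print(len(resizes))
```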