On Sat, Apr 10, 2010 at 4:40 PM, Antoine Pitrou <solip...@pitrou.net> wrote:
> Reid Kleckner <rnk <at> mit.edu> writes:
>>
>> I think you're right about the number of collisions, though. CPython
>> dicts use a pretty low load factor (2/3) to keep collision counts
>> down. One of the major benefits cited in the paper is the ability to
>> maintain performance in the face of higher load factors, so I may be
>> able to bump up the load factor to save memory. This would increase
>> collisions, but then that wouldn't matter, because resolving them
>> would only require looking within two consecutive cache lines.
>
> Why wouldn't it matter? Hash collisions still involve more CPU work,
> even if you're not accessing memory a lot.
Extra CPU work to avoid cache misses can certainly be a big win, but it's
never clear in advance whether any given memory access will actually be a
miss. Because Python's access patterns are rarely random, it may turn out
that all of the elements it accesses are already in cache, in which case
the extra CPU work dominates. If you have a random access pattern over a
large dataset, the dictionary will not fit in cache, and saving random
memory accesses at the expense of CPU will be a win.

Reid
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
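The collision claim in the quoted message can be made concrete with a toy
simulation. The sketch below is illustrative only: it uses plain linear
probing and a hypothetical `avg_probes` helper, not CPython's actual
perturbed probe sequence, but it shows the effect Reid describes: raising
the load factor from 2/3 toward 0.9 noticeably increases the average number
of slots examined per successful lookup, which is exactly the extra CPU work
a cache-line-bounded scheme bets it can afford.

```python
# Toy open-addressing table with linear probing. Illustrative only:
# CPython's dict uses a different (perturbed) probe sequence, and the
# table size and key distribution here are arbitrary choices.
import random

def avg_probes(num_slots, load_factor, trials=2000, seed=0):
    """Fill a table to the given load factor, then return the average
    number of slots probed to find an existing key."""
    rng = random.Random(seed)
    size = int(num_slots * load_factor)
    slots = [None] * num_slots
    keys = rng.sample(range(10**9), size)
    for k in keys:
        i = k % num_slots
        while slots[i] is not None:      # linear probing on collision
            i = (i + 1) % num_slots
        slots[i] = k
    total = 0
    sample = rng.sample(keys, min(trials, size))
    for k in sample:
        i = k % num_slots
        probes = 1
        while slots[i] != k:
            i = (i + 1) % num_slots
            probes += 1
        total += probes
    return total / len(sample)

low = avg_probes(4096, 2 / 3)   # CPython's traditional load factor
high = avg_probes(4096, 0.9)    # a hypothetical denser table
print(f"load 2/3: {low:.2f} avg probes; load 0.9: {high:.2f} avg probes")
```

At 8 bytes per slot pointer, a handful of extra probes can still land inside
one or two 64-byte cache lines, which is why a scheme that confines probing
to a small contiguous neighborhood can tolerate the denser table.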