On Sat, Apr 10, 2010 at 4:40 PM, Antoine Pitrou <solip...@pitrou.net> wrote:
> Reid Kleckner <rnk <at> mit.edu> writes:
>>
>> I think you're right about the number of collisions, though. CPython
>> dicts use a pretty low load factor (2/3) to keep collision counts
>> down. One of the major benefits cited in the paper is the ability to
>> maintain performance in the face of higher load factors, so I may be
>> able to bump up the load factor to save memory. This would increase
>> collisions, but then that wouldn't matter, because resolving them
>> would only require looking within two consecutive cache lines.
>
> Why wouldn't it matter? Hash collisions still involve more CPU work,
> even if you're not accessing memory a lot.
Extra CPU work to avoid cache misses can certainly be a big win, but it's
never clear in advance whether any given memory access will actually be a
miss. Because Python's access patterns are rarely random, it may turn out
that all of the elements it accesses are already in cache, in which case
the extra CPU work dominates. If you have a random access pattern over a
large dataset, the dictionary will not fit in cache, and saving random
memory accesses at the expense of CPU will be a win.

Reid
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
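The collision claim in the quoted message can be made concrete with a toy
simulation. The sketch below is illustrative only: it uses plain linear
probing and a hypothetical `avg_probes` helper, not CPython's actual
perturbed probe sequence, but it shows the effect Reid describes: raising
the load factor from 2/3 toward 0.9 noticeably increases the average number
of slots examined per successful lookup, which is exactly the extra CPU work
a cache-line-bounded scheme bets it can afford.

```python
# Toy open-addressing table with linear probing. Illustrative only:
# CPython's dict uses a different (perturbed) probe sequence, and the
# table size and key distribution here are arbitrary choices.
import random

def avg_probes(num_slots, load_factor, trials=2000, seed=0):
    """Fill a table to the given load factor, then return the average
    number of slots probed to find an existing key."""
    rng = random.Random(seed)
    size = int(num_slots * load_factor)
    slots = [None] * num_slots
    keys = rng.sample(range(10**9), size)
    for k in keys:
        i = k % num_slots
        while slots[i] is not None:      # linear probing on collision
            i = (i + 1) % num_slots
        slots[i] = k
    total = 0
    sample = rng.sample(keys, min(trials, size))
    for k in sample:
        i = k % num_slots
        probes = 1
        while slots[i] != k:
            i = (i + 1) % num_slots
            probes += 1
        total += probes
    return total / len(sample)

low = avg_probes(4096, 2 / 3)   # CPython's traditional load factor
high = avg_probes(4096, 0.9)    # a hypothetical denser table
print(f"load 2/3: {low:.2f} avg probes; load 0.9: {high:.2f} avg probes")
```

At 8 bytes per slot pointer, a handful of extra probes can still land inside
one or two 64-byte cache lines, which is why a scheme that confines probing
to a small contiguous neighborhood can tolerate the denser table.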