[issue18771] Reduce the cost of hash collisions for set objects

2013-09-15 Thread Roundup Robot
Roundup Robot added the comment: New changeset 9353c611f897 by Raymond Hettinger in branch 'default': Issue 18771: Make it possible to set the number linear probes at compile-time. http://hg.python.org/cpython/rev/9353c611f897 -- ___ Python tracker

[issue18771] Reduce the cost of hash collisions for set objects

2013-08-19 Thread Tal Einat
Tal Einat added the comment: The following benchmarks were run with the second version of the patch (so2.diff), once before and once after. OSX 10.8 on a Macbook Pro, 2.5 GHz Core i5, two cores each with 256kb L2 cache, 3MB L3 cache: 1.32% less time for first benchmark (7.07 before, 6.98

[issue18771] Reduce the cost of hash collisions for set objects

2013-08-19 Thread Tal Einat
Tal Einat added the comment: Because of the unusually low differences, I tried running the benchmarks again on the MacBook (OSX 10.8). The first time was actually with so.diff, this time was with so2.diff. 0.69% less time for first benchmark (7.30 before, 7.25 after) 8.08% less time for

[issue18771] Reduce the cost of hash collisions for set objects

2013-08-19 Thread Christian Heimes
Christian Heimes added the comment: Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz on Ubuntu 13.04 X86_64 with GCC 4.7.3 1MB L2 cache test: 11.33sec / 10.83sec (4.4% faster) 8Mb L3 cache test: 22.16sec / 19.67sec (11% faster) -- ___ Python tracker

[issue18771] Reduce the cost of hash collisions for set objects

2013-08-19 Thread Tal Einat
Tal Einat added the comment: Sorry to be pedantic, but while 19.67sec is about 11.2% less than 22.16sec, it is actually about 12.6% faster (22.16/19.67 - 1). Regardless, that's a good result! And I'm happy that the modified benchmark is useful. --

[issue18771] Reduce the cost of hash collisions for set objects

2013-08-19 Thread Christian Heimes
Christian Heimes added the comment: Oh *curse*, you are right. Why do I always confuse the order? :| It looks like the performance boost increases with CPU cache size as well as the CPU's generation. The i7-4770K box at work is less than a week old. The CPU is currently the fast desktop CPU

[issue18771] Reduce the cost of hash collisions for set objects

2013-08-19 Thread Raymond Hettinger
Raymond Hettinger added the comment: Wow, thanks for running these tests and posting the results :-) -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue18771 ___

[issue18771] Reduce the cost of hash collisions for set objects

2013-08-19 Thread Roundup Robot
Roundup Robot added the comment: New changeset a887596b841f by Raymond Hettinger in branch 'default': Issue18771: Reduce the cost of hash collisions for set objects. http://hg.python.org/cpython/rev/a887596b841f -- nosy: +python-dev ___ Python

[issue18771] Reduce the cost of hash collisions for set objects

2013-08-19 Thread Raymond Hettinger
Changes by Raymond Hettinger raymond.hettin...@gmail.com: -- resolution: - fixed status: open - closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue18771 ___

[issue18771] Reduce the cost of hash collisions for set objects

2013-08-19 Thread Christian Heimes
Christian Heimes added the comment: How about a Misc/NEWS and a Doc/whatsnew/3.4 entry, too? -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue18771 ___

[issue18771] Reduce the cost of hash collisions for set objects

2013-08-19 Thread STINNER Victor
STINNER Victor added the comment: This optimization cannot be applied to the dict type? -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue18771 ___

[issue18771] Reduce the cost of hash collisions for set objects

2013-08-19 Thread Raymond Hettinger
Raymond Hettinger added the comment: Victor, this should also work for dictionaries as well. Fundamentally, the concepts of loop-unrolling and exploiting cache locality should apply to any hash table implementation. That said, the dicts are more complex than sets due to 1) storing the

[issue18771] Reduce the cost of hash collisions for set objects

2013-08-18 Thread Raymond Hettinger
Raymond Hettinger added the comment: Also, cachegrind shows a 3% improvement in the branch misprediction rate (from 9.9% to 9.6%). ==82814== Mispred rate: 9.9% ( 9.4% + 15.5% ==82812== Mispred rate: 9.6% ( 9.1% + 14.1% )

[issue18771] Reduce the cost of hash collisions for set objects

2013-08-18 Thread STINNER Victor
STINNER Victor added the comment: Instead of only a second lookup, could you try for example 4 lookup and align j to fit in a cache line? Or do you expect more collisions with 4 contiguious lookup? -- ___ Python tracker rep...@bugs.python.org

[issue18771] Reduce the cost of hash collisions for set objects

2013-08-18 Thread Raymond Hettinger
Raymond Hettinger added the comment: Instead of only a second lookup, could you try for example 4 lookup and align j to fit in a cache line? Accessing 4 entries per probe is a tempting line of development, but will be subject to diminishing returns (second and third collisions aren't

[issue18771] Reduce the cost of hash collisions for set objects

2013-08-18 Thread Jesús Cea Avión
Changes by Jesús Cea Avión j...@jcea.es: -- nosy: +jcea ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue18771 ___ ___ Python-bugs-list mailing list

[issue18771] Reduce the cost of hash collisions for set objects

2013-08-18 Thread Raymond Hettinger
Raymond Hettinger added the comment: Attaching an updated patch (same algorithm but with a little more factoring to make the patch smaller). -- Added file: http://bugs.python.org/file31361/so2.diff ___ Python tracker rep...@bugs.python.org

[issue18771] Reduce the cost of hash collisions for set objects

2013-08-17 Thread Raymond Hettinger
New submission from Raymond Hettinger: I'm working on a patch for the lookkey() functions in Object/setobject.c. The core idea is to follow the probe sequence as usual but to also check an adjacent entry for each probe (i.e. check table[i mask] as usual and then check table[(i mask) ^ 1]

[issue18771] Reduce the cost of hash collisions for set objects

2013-08-17 Thread Christian Heimes
Changes by Christian Heimes li...@cheimes.de: -- nosy: +christian.heimes ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue18771 ___ ___

[issue18771] Reduce the cost of hash collisions for set objects

2013-08-17 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: How it will affect a performance of set(range(n))? -- nosy: +serhiy.storchaka ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue18771 ___

[issue18771] Reduce the cost of hash collisions for set objects

2013-08-17 Thread Raymond Hettinger
Changes by Raymond Hettinger raymond.hettin...@gmail.com: -- keywords: +patch Added file: http://bugs.python.org/file31345/so.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue18771 ___

[issue18771] Reduce the cost of hash collisions for set objects

2013-08-17 Thread Raymond Hettinger
Raymond Hettinger added the comment: Serhiy, I've posted a patch so you can examine the effects in detail. For something like set(range(n)), I would expect no change because the dataset is collision free due to Tim's design where hash(someint)==someint. That said, I'm trying to optimize the

[issue18771] Reduce the cost of hash collisions for set objects

2013-08-17 Thread STINNER Victor
STINNER Victor added the comment: Do you expect visible difference on a microbenchmark? Do you have benchmarks showing the speedup and showing that many collisions are no too much slower? -- ___ Python tracker rep...@bugs.python.org

[issue18771] Reduce the cost of hash collisions for set objects

2013-08-17 Thread Raymond Hettinger
Raymond Hettinger added the comment: I just made the patch a few minutes ago. Am just starting to work on benchmarks. I posted here so that you guys could help find the strengths and weaknesses of the approach. The theory is sound, but it is going to take a good deal of effort to isolate

[issue18771] Reduce the cost of hash collisions for set objects

2013-08-17 Thread Raymond Hettinger
Changes by Raymond Hettinger raymond.hettin...@gmail.com: -- assignee: - rhettinger priority: normal - low ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue18771 ___

[issue18771] Reduce the cost of hash collisions for set objects

2013-08-17 Thread STINNER Victor
STINNER Victor added the comment: The theory is sound, but it is going to take a good deal of effort to isolate the effects (either good or bad) in realistic benchmarks. Oh, it was just asking for a microbenchmark on set(). I don't care of a real benchmark (realistic use case) for such issue

[issue18771] Reduce the cost of hash collisions for set objects

2013-08-17 Thread Raymond Hettinger
Raymond Hettinger added the comment: Here's an excerpt of the patch that gives a pretty good idea of that is being changed (there are essentially three lines of new logic): static void set_insert_clean(PySetObject *so, PyObject *key, Py_hash_t hash) { setentry *table = so-table;

[issue18771] Reduce the cost of hash collisions for set objects

2013-08-17 Thread Raymond Hettinger
Changes by Raymond Hettinger raymond.hettin...@gmail.com: -- Removed message: http://bugs.python.org/msg195530 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue18771 ___

[issue18771] Reduce the cost of hash collisions for set objects

2013-08-17 Thread Raymond Hettinger
Raymond Hettinger added the comment: Here's an excerpt from the patch that gives a pretty good idea of what is being changed (the three lines of new logic are marked): static void set_insert_clean(PySetObject *so, PyObject *key, Py_hash_t hash) { setentry *table = so-table; setentry

[issue18771] Reduce the cost of hash collisions for set objects

2013-08-17 Thread STINNER Victor
STINNER Victor added the comment: + Cache line benefit: improves the odds that the adjacent probe will be on the same cache line (...) + Reduced loop overhead: the second lookup doesn't require a new computation of the index *i* (we just do a XOR 1) (...) With j=11, is the lookup at index 10

[issue18771] Reduce the cost of hash collisions for set objects

2013-08-17 Thread Raymond Hettinger
Raymond Hettinger added the comment: With j=11, is the lookup at index 10 in the same cache line? It might or it might not. The cache lines are typically 64 bytes which holds 4 key/hash pairs. So, the odds are 75% in favor of an adjacent entry being in the same cache line. If malloc where

[issue18771] Reduce the cost of hash collisions for set objects

2013-08-17 Thread Raymond Hettinger
Raymond Hettinger added the comment: Here's a simple benchmark to start with. On my machine (2.4Ghz Core2 Duo with 2MB L3 cache) and compiled with GCC-4.8, the benchmark shows a 6% speedup for sets that spill over L2 cache and 11% for sets that spill over the L3 cache. The theoretically

[issue18771] Reduce the cost of hash collisions for set objects

2013-08-17 Thread Raymond Hettinger
Changes by Raymond Hettinger raymond.hettin...@gmail.com: Added file: http://bugs.python.org/file31350/insert_clean.s ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue18771 ___

[issue18771] Reduce the cost of hash collisions for set objects

2013-08-17 Thread Raymond Hettinger
Raymond Hettinger added the comment: Attaching the detailed results of a CacheGrind analysis. Here are the high-points (all of which were expected): * The last level data cache miss-rate improved 12% (from 4.9% to 4.3%). * The first level data cache miss-rate improved 14% (from 5.5% to 4.7%).