Roundup Robot added the comment:
New changeset 9353c611f897 by Raymond Hettinger in branch 'default':
Issue 18771: Make it possible to set the number of linear probes at compile-time.
http://hg.python.org/cpython/rev/9353c611f897
--
___
Python tracker
Tal Einat added the comment:
The following benchmarks were run with the second version of the patch
(so2.diff), once before and once after.
OSX 10.8 on a MacBook Pro, 2.5 GHz Core i5, two cores each with 256KB L2 cache,
3MB L3 cache:
1.32% less time for first benchmark (7.07 before, 6.98
Tal Einat added the comment:
Because of the unusually low differences, I tried running the benchmarks again
on the MacBook (OSX 10.8). The first time was actually with so.diff, this time
was with so2.diff.
0.69% less time for first benchmark (7.30 before, 7.25 after)
8.08% less time for
Christian Heimes added the comment:
Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz on Ubuntu 13.04 X86_64 with GCC 4.7.3
1MB L2 cache test: 11.33sec / 10.83sec (4.4% faster)
8MB L3 cache test: 22.16sec / 19.67sec (11% faster)
Tal Einat added the comment:
Sorry to be pedantic, but while 19.67sec is about 11.2% less than 22.16sec, it
is actually about 12.6% faster (22.16/19.67 - 1).
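The two ways of quoting the change are easy to check directly. A quick calculation using Christian's 8MB-cache timings (only the two numbers are from the thread; everything else is arithmetic):

```python
before, after = 22.16, 19.67   # Christian's 8MB L3 cache timings, in seconds

less_time = (before - after) / before * 100   # reduction relative to the old time
faster = (before / after - 1) * 100           # speedup relative to the new time

print(f"{less_time:.1f}% less time")  # 11.2% less time
print(f"{faster:.1f}% faster")        # 12.7% faster
```

The two figures use different denominators, which is why they diverge as the speedup grows.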
Regardless, that's a good result! And I'm happy that the modified benchmark is
useful.
Christian Heimes added the comment:
Oh *curse*, you are right. Why do I always confuse the order? :|
It looks like the performance boost increases with CPU cache size as well as
the CPU's generation. The i7-4770K box at work is less than a week old. The CPU
is currently the fast desktop CPU
Raymond Hettinger added the comment:
Wow, thanks for running these tests and posting the results :-)
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18771
___
Roundup Robot added the comment:
New changeset a887596b841f by Raymond Hettinger in branch 'default':
Issue18771: Reduce the cost of hash collisions for set objects.
http://hg.python.org/cpython/rev/a887596b841f
--
nosy: +python-dev
Changes by Raymond Hettinger raymond.hettin...@gmail.com:
--
resolution: -> fixed
status: open -> closed
Christian Heimes added the comment:
How about a Misc/NEWS and a Doc/whatsnew/3.4 entry, too?
STINNER Victor added the comment:
Couldn't this optimization also be applied to the dict type?
Raymond Hettinger added the comment:
Victor, this should work for dictionaries as well. Fundamentally, the
concepts of loop-unrolling and exploiting cache locality should apply to any
hash table implementation.
That said, dicts are more complex than sets due to 1) storing the
Raymond Hettinger added the comment:
Also, cachegrind shows a 3% improvement in the branch misprediction rate (from
9.9% to 9.6%).
==82814== Mispred rate: 9.9% ( 9.4% + 15.5%
==82812== Mispred rate: 9.6% ( 9.1% + 14.1% )
STINNER Victor added the comment:
Instead of only a second lookup, could you try, for example, 4 lookups and
align j to fit in a cache line? Or do you expect more collisions with 4
contiguous lookups?
Raymond Hettinger added the comment:
Instead of only a second lookup, could you try for example 4 lookups
and align j to fit in a cache line?
Accessing 4 entries per probe is a tempting line of development, but will be
subject to diminishing returns (second and third collisions aren't
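Victor's 4-entry suggestion can be sketched as a pure-Python model (the function name and constants are illustrative, not part of any posted patch): align the probe index down to the start of its cache-line-sized group and scan all four entries in that group, assuming 16-byte entries and 64-byte cache lines.

```python
def cacheline_probes(j, mask, entries_per_line=4):
    """Hypothetical sketch of the 4-lookup idea: probe every slot in the
    cache-line-aligned group containing index j, assuming 64-byte lines
    and 16-byte set entries (4 entries per line)."""
    base = (j & mask) & ~(entries_per_line - 1)  # align down to the group start
    return [(base + k) & mask for k in range(entries_per_line)]

# With j=11, the whole group 8..11 would land in one cache line
# (given a 64-byte-aligned table).
print(cacheline_probes(11, mask=31))  # [8, 9, 10, 11]
```

The trade-off Raymond points to: scanning a full group quadruples the entries visited per probe step, so first-probe hits get no benefit while longer collision chains within one group become more likely.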
Changes by Jesús Cea Avión j...@jcea.es:
--
nosy: +jcea
Raymond Hettinger added the comment:
Attaching an updated patch (same algorithm but with a little more factoring to
make the patch smaller).
--
Added file: http://bugs.python.org/file31361/so2.diff
New submission from Raymond Hettinger:
I'm working on a patch for the lookkey() functions in Object/setobject.c. The
core idea is to follow the probe sequence as usual but to also check an
adjacent entry for each probe (i.e. check table[i & mask] as usual and then
check table[(i & mask) ^ 1]
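The proposal reads naturally as a generator. A pure-Python sketch of the probe order (not the C patch itself), using CPython's standard perturb-based recurrence:

```python
def probe_sequence(h, mask, steps=3):
    """Sketch of the proposed lookkey() probing: at each step of CPython's
    usual perturb-based sequence, also check the adjacent table entry
    obtained by toggling the index's lowest bit."""
    i = h & mask
    perturb = h
    for _ in range(steps):
        yield i        # table[i & mask], the usual probe
        yield i ^ 1    # table[(i & mask) ^ 1], the adjacent entry
        i = (5 * i + 1 + perturb) & mask   # CPython's probe recurrence
        perturb >>= 5                      # PERTURB_SHIFT

print(list(probe_sequence(20, mask=7)))  # [4, 5, 1, 0, 6, 7]
```

Because the second index differs only in the lowest bit, the pair of entries is 16 bytes apart and usually shares a cache line, so the extra check is nearly free.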
Changes by Christian Heimes li...@cheimes.de:
--
nosy: +christian.heimes
Serhiy Storchaka added the comment:
How will it affect the performance of set(range(n))?
--
nosy: +serhiy.storchaka
Changes by Raymond Hettinger raymond.hettin...@gmail.com:
--
keywords: +patch
Added file: http://bugs.python.org/file31345/so.diff
Raymond Hettinger added the comment:
Serhiy, I've posted a patch so you can examine the effects in detail.
For something like set(range(n)), I would expect no change because the dataset
is collision free due to Tim's design where hash(someint)==someint. That said,
I'm trying to optimize the
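The collision-free claim is easy to verify from Python: CPython hashes small non-negative ints to themselves, so the elements of set(range(n)) occupy distinct slots of the power-of-two table.

```python
# In CPython, hash(i) == i for small non-negative ints, so the elements of
# set(range(n)) map to distinct table slots (i & mask is unique while
# n <= mask + 1) and lookups never need the probe loop at all.
assert all(hash(i) == i for i in range(10_000))

# e.g. in a 32-slot table, the keys 0..31 fill 32 distinct slots:
mask = 31
assert len({hash(i) & mask for i in range(32)}) == 32
```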
STINNER Victor added the comment:
Do you expect a visible difference on a microbenchmark? Do you have
benchmarks showing the speedup and showing that many collisions are not
too much slower?
Raymond Hettinger added the comment:
I just made the patch a few minutes ago. Am just starting to work on
benchmarks. I posted here so that you guys could help find the strengths and
weaknesses of the approach.
The theory is sound, but it is going to take a good deal of effort to isolate
Changes by Raymond Hettinger raymond.hettin...@gmail.com:
--
assignee: -> rhettinger
priority: normal -> low
STINNER Victor added the comment:
The theory is sound, but it is going to take a good deal of effort to isolate
the effects (either good or bad) in realistic benchmarks.
Oh, I was just asking for a microbenchmark on set(). I don't care about
a real benchmark (realistic use case) for such an issue
Raymond Hettinger added the comment:
Here's an excerpt of the patch that gives a pretty good idea of what is being
changed (there are essentially three lines of new logic):
static void
set_insert_clean(PySetObject *so, PyObject *key, Py_hash_t hash)
{
setentry *table = so->table;
Changes by Raymond Hettinger raymond.hettin...@gmail.com:
--
Removed message: http://bugs.python.org/msg195530
Raymond Hettinger added the comment:
Here's an excerpt from the patch that gives a pretty good idea of what is being
changed (the three lines of new logic are marked):
static void
set_insert_clean(PySetObject *so, PyObject *key, Py_hash_t hash)
{
setentry *table = so->table;
setentry
STINNER Victor added the comment:
+ Cache line benefit: improves the odds that the adjacent probe will
be on the same cache line (...)
+ Reduced loop overhead: the second lookup doesn't require a new
computation of the index *i* (we just do a XOR 1) (...)
With j=11, is the lookup at index 10
Raymond Hettinger added the comment:
With j=11, is the lookup at index 10 in the same cache line?
It might or it might not. The cache lines are typically 64 bytes which holds 4
key/hash pairs. So, the odds are 75% in favor of an adjacent entry being in
the same cache line.
If malloc were
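Raymond's 75% figure follows from the sizes involved (assuming a 64-bit build with 16-byte setentry structs, 64-byte cache lines, and no 64-byte alignment guarantee from malloc): of the four positions an entry can occupy within a line, only the last has its 16-bytes-away neighbor in the next line. A quick check:

```python
ENTRY, LINE = 16, 64   # bytes: one setentry (key ptr + hash), one cache line

# Positions an entry can occupy within a cache line, and whether the
# adjacent entry (16 bytes away) still falls inside the same line.
offsets = range(0, LINE, ENTRY)
same = sum(off + ENTRY < LINE for off in offsets)
print(f"{same} of {len(offsets)} positions stay in-line: {same/len(offsets):.0%}")
```

This prints 3 of 4 positions, i.e. the 75% odds quoted above; 64-byte-aligned tables would make every XOR-1 pair share a line.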
Raymond Hettinger added the comment:
Here's a simple benchmark to start with.
On my machine (2.4GHz Core2 Duo with 2MB L3 cache) and compiled with GCC-4.8,
the benchmark shows a 6% speedup for sets that spill over L2 cache and 11% for
sets that spill over the L3 cache.
The theoretically
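The attached benchmark itself isn't reproduced in this thread, but a minimal microbenchmark in the same spirit might look like the following. Everything here is illustrative (function name, sizes, seed), not taken from so.diff:

```python
import random
import timeit

def bench(n, probes=1000, repeats=200):
    """Illustrative set-membership microbenchmark: build a set of n random
    ints (scattered hash values, unlike range(n)) and time a mix of hits
    and misses against it."""
    random.seed(18771)                       # issue number, for repeatability
    s = set(random.sample(range(n * 4), n))
    keys = random.sample(range(n * 4), probes)
    return timeit.timeit(lambda: [k in s for k in keys], number=repeats)

# A set of ~1M entries (16 bytes per slot, plus over-allocation) spills
# typical L2/L3 caches; compare against a small, cache-resident set:
# print(bench(1_000_000), bench(10_000))
```

Comparing per-lookup times for a cache-resident set against one that spills to RAM is what separates the cache effects this patch targets from plain CPU-bound lookup cost.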
Changes by Raymond Hettinger raymond.hettin...@gmail.com:
Added file: http://bugs.python.org/file31350/insert_clean.s
Raymond Hettinger added the comment:
Attaching the detailed results of a CacheGrind analysis. Here are the
high-points (all of which were expected):
* The last level data cache miss-rate improved 12% (from 4.9% to 4.3%).
* The first level data cache miss-rate improved 14% (from 5.5% to 4.7%).