[issue23282] Slightly faster set lookup

2015-01-25 Thread Raymond Hettinger

Raymond Hettinger added the comment:

I had intentionally moved the entry-key == key test to be after the hash test. 
 Unlike dicts where exact key matches are norm, sets are used for deduping 
(cased where keys are equal but not dups).

By moving the test inside the entry-key == hash test, we reduce the number of 
comparisons per loop for collisions and increase the predictability of the 
entry-key == key branch (once the hash matches, the chance of an equality 
match is very high.

The one part I'm looking at changing is moving the dummy test after the 
identity key test.  However, the dummy test inside the hash test is perfectly 
predictable (it basicly never matches) and may be cost free.

--
resolution:  - rejected
status: open - closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23282
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23282] Slightly faster set lookup

2015-01-25 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

What about just removing the dummy test inside the hash test? The benefit is 
smaller but statistically sygnificant and always positive.

$ ./python -m timeit -s a = list(range(10**6)); s1 = set(a); s2 = set(a) -- 
s1 = s2

Unpatched: 10 loops, best of 3: 39.3 msec per loop
Patched: 10 loops, best of 3: 38.7 msec per loop

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23282
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23282] Slightly faster set lookup

2015-01-23 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Good point Josh. Yes, it may slow down other cases, and there is a difference 
between sets and dicts.

1. If the key is not contained in the set, then if the first tested entry 
table[hash  mask] is empty, then the lookup is slowed down by one pointer 
comparison (2 comparisons instead of 1). If the entry is used then the lookup 
is not slow downed or even speed up (if need to test more than one additional 
entries).

2. If the same case is in the set, then if the first tested entry is the set 
then the lookup will use one comparison instead of 4. In other cases (the key 
is placed in other entry) the lookup will use at least one comparison less.

3. If the set contains key equal but not identical to tested key, then the 
lookup will use at least one comparison less.

Only first case can cause slowdown. If we test a lot of keys not existing in 
the set with small ratio between used and allocated entries numbers. May be the 
worst case is creating a copy of the set. For other operations I did not find 
significant difference.

$ ./python -m timeit -s s = set(range(10**4)) -- frozenset(s)
Unpatched: 1000 loops, best of 3: 658 usec per loop
Patched: 1000 loops, best of 3: 668 usec per loop

But issue23290 prevents this regression.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23282
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23282] Slightly faster set lookup

2015-01-23 Thread Josh Rosenberg

Josh Rosenberg added the comment:

Does it slow down other cases? Seems to me like dictionaries would have more 
existing key lookups than sets to justify the optimization, since they're 
used for attribute lookup and the like, and because you usually want the value 
associated with existing keys, where a set would usually be used in a might 
exist, might not scenario of membership testing.

--
nosy: +josh.r

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23282
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23282] Slightly faster set lookup

2015-01-22 Thread Raymond Hettinger

Changes by Raymond Hettinger raymond.hettin...@gmail.com:


--
assignee:  - rhettinger

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23282
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23282] Slightly faster set lookup

2015-01-20 Thread Serhiy Storchaka

New submission from Serhiy Storchaka:

Currently set_lookkey() first tests entry-key == NULL, then entry-hash == 
hash and entry-key != dummy, and only after that entry-key == key. Proposed 
patch optimizes the order of comparisons. entry-key == key is tested first as 
for dicts. And no need to test entry-key != dummy after entry-hash == hash if 
entry-hash of dummy key is set to -1.

Microbenchmark which demonstrates the best case (a lot of lookups of keys 
existing in the set).

$ ./python -m timeit -s a = list(range(10**6)); s1 = set(a); s2 = set(a) -- 
s1 = s2

Unpatched: 10 loops, best of 3: 39.4 msec per loop
Patched: 10 loops, best of 3: 35.3 msec per loop

--
components: Interpreter Core
files: set_faster_lookup.patch
keywords: patch
messages: 234367
nosy: pitrou, rhettinger, serhiy.storchaka
priority: normal
severity: normal
stage: patch review
status: open
title: Slightly faster set lookup
type: performance
versions: Python 3.5
Added file: http://bugs.python.org/file37789/set_faster_lookup.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23282
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com