[issue24681] Put most likely test first in set_add_entry()

2015-07-31 Thread Raymond Hettinger

Changes by Raymond Hettinger raymond.hettin...@gmail.com:


--
resolution:  - fixed
status: open - closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue24681
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24681] Put most likely test first in set_add_entry()

2015-07-31 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 9e3be159d023 by Raymond Hettinger in branch 'default':
Issue #24681:  Move the most likely test first in set_add_entry().
https://hg.python.org/cpython/rev/9e3be159d023

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue24681
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24681] Put most likely test first in set_add_entry()

2015-07-25 Thread STINNER Victor

Changes by STINNER Victor victor.stin...@gmail.com:


--
nosy:  -haypo

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue24681
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24681] Put most likely test first in set_add_entry()

2015-07-23 Thread Raymond Hettinger

Raymond Hettinger added the comment:

Sets are under no obligation to keep their code synced with dicts and have over 
time diverged in a number of ways.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue24681
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24681] Put most likely test first in set_add_entry()

2015-07-23 Thread Raymond Hettinger

Changes by Raymond Hettinger raymond.hettin...@gmail.com:


Added file: http://bugs.python.org/file39992/move_likely_first.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue24681
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24681] Put most likely test first in set_add_entry()

2015-07-23 Thread Roundup Robot

Roundup Robot added the comment:

New changeset bc80c783c4ab by Raymond Hettinger in branch 'default':
Issue #24681:  Move the store of so-table to the code block where it is used.
https://hg.python.org/cpython/rev/bc80c783c4ab

--
nosy: +python-dev

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue24681
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24681] Put most likely test first in set_add_entry()

2015-07-23 Thread Raymond Hettinger

Changes by Raymond Hettinger raymond.hettin...@gmail.com:


Added file: http://bugs.python.org/file39993/move_likely_first.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue24681
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24681] Put most likely test first in set_add_entry()

2015-07-23 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

The patch makes set_add_entry() inconsistent not only with dict, but with other 
set methods that use set_lookkey(). I don't know if there is some subtle bug 
here, but I feel myself slightly uncomfortable with it. Also the new code looks 
a little less readable to me. Current code is straightforward translation from 
Python to C (first check error, then check that the result still is actual, 
finally check the result itself), the new code forces you to stop and think, 
it's correctness is not so obvious.

Even if all good with the patch, please don't commit it right now. Wait at 
least a week and then commit. May be someone other will find a flaw in the 
patch.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue24681
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24681] Put most likely test first in set_add_entry()

2015-07-23 Thread Raymond Hettinger

Raymond Hettinger added the comment:

Serhiy, do you see anything actually wrong with the patch?  If it isn't broken 
in some clear cut way, I'm going to apply it shortly.  The comparison with dict 
internals is a red herring -- there is no promise need or precedent for making 
that code exactly the same (nor is there is restriction on PyPy or Jython which 
have already traveled down other paths not aligned with cpyhton dict internals).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue24681
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24681] Put most likely test first in set_add_entry()

2015-07-23 Thread Raymond Hettinger

Changes by Raymond Hettinger raymond.hettin...@gmail.com:


Removed file: http://bugs.python.org/file39992/move_likely_first.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue24681
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24681] Put most likely test first in set_add_entry()

2015-07-23 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

The condition that is determined by the PyObject_RichCompareBool() check is not 
that the element is already present, but that the element was present. Even if 
we will agree that such behavior change is appropriate, the inconsistency with 
dict makes it doubtful (originally set was implemented with dict).

I have no objections against tweaking so-table caching. I even think this part 
of the patch improves the code.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue24681
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24681] Put most likely test first in set_add_entry()

2015-07-22 Thread Raymond Hettinger

Raymond Hettinger added the comment:

FWIW, my approach is to look at the most important code
paths to see if there is any work being done that isn't
essential for the result being computed.

Next, I look at the generated assembly to estimate speed
by counting memory accesses (and whether they are cached
fresh accesses or stale random accesses) and I look at
the branches (and whether they are predictable or not).

The table=so-table assignment was being done for all code
paths but was only needed around the rich compare.  Here
is the before and after for the most important path
(the first lookup).  Note that the change saves one memory
spill and one reload.

Before:
---
_set_add_entry:
pushq   %r15
pushq   %r14
movq%rdx, %r14
pushq   %r13
pushq   %r12
movq%rdi, %r12
pushq   %rbp
movq%rsi, %rbp
pushq   %rbx
subq$56, %rsp
movq40(%rdi), %rax
addq$1, (%rsi)
movq%rax, 16(%rsp)-- spill
movq32(%r12), %rdx
movq%rdx, %r15
andq%r14, %r15
movq%r15, %rbx
salq$4, %rbx
addq16(%rsp), %rbx-- reload
movq(%rbx), %rcx
testq   %rcx, %rcx
je  L430


AFTER
-
_set_add_entry:
pushq   %r15
movq%rdx, %r15
pushq   %r14
pushq   %r13
pushq   %r12
movq%rdi, %r12
pushq   %rbp
movq%rsi, %rbp
pushq   %rbx
subq$56, %rsp
movq40(%rdi), %rdx
addq$1, (%rsi)-- no spill
movq%rdx, %r11
L428:
movq32(%r12), %rcx
movq%rcx, %r13
andq%r15, %r13
movq%r13, %rbx
salq$4, %rbx
addq%r11, %rbx -- from register
movq(%rbx), %r14
testq   %r14, %r14
je  L429


The code around the rich compare used to do memory
loads that weren't necessary for the most likely case
(since the 64-bit hash values match, it is very likely
that the comparison will report a match).

BEFORE
--

call_PyObject_RichCompareBool
movq24(%rsp), %rcx
movq(%rcx), %rdi
leaq-1(%rdi), %rdx
testq   %rdx, %rdx
movq%rdx, (%rcx)
je  L489
testl   %eax, %eax
js  L437  --- predictable error branch
movq40(%r12), %rdx--- memory load 
cmpq16(%rsp), %rdx--- memory load
jne L460 
cmpq(%rbx), %rcx  --- memory load  
jne L429  --- predictable restart branch
testl   %eax, %eax--- predictable found_active branch
jne L432  --- most common exit point
movq32(%r12), %rdx


AFTER
-

call_PyObject_RichCompareBool
movq16(%rsp), %rcx
movq(%rcx), %rdi
leaq-1(%rdi), %rdx
testq   %rdx, %rdx
movq%rdx, (%rcx)
je  L485
cmpl$0, %eax
jg  L431  -- common exit before the memory loads!
L490:
jne L434
movq40(%r12), %rdx--- memory load 
cmpq%rdx, 24(%rsp)--- memory load 
movq%rdx, %r11
jne L428
cmpq(%rbx), %rcx  --- memory load 
jne L428

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue24681
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24681] Put most likely test first in set_add_entry()

2015-07-22 Thread Raymond Hettinger

Raymond Hettinger added the comment:

 you have to provide a benchmark

Actually, I don't.  When making a small series of changes, benchmarking every 
step is waste of time and tends to trap you in local minimums and causes you to 
overfit to a particular processor, compiler, or benchmark.  The better process 
is to carefully work through what the code is telling the machine to do and 
evaluating whether those steps make sense.  This is done in conjunction with 
organizing the code in a more logical manner (i.e. only saving the so-table in 
the block where we're using it as a check to check if the rich comparison 
rearranged the table in a way the invalidated the entry pointer or made the 
current search invalid).  In general, less work is better, having related 
actions take place close together is better, making functions self-contained is 
better, etc.

If you want to team-up and help, your contribution is welcome.  General sniping 
isn't helpful at all.  I wrote all of this code and have maintained it for 13 
years -- this series of refactorings has been something I've been working 
towards for a long time.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue24681
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24681] Put most likely test first in set_add_entry()

2015-07-22 Thread Eric Snow

Eric Snow added the comment:

Thanks for the clear explanation, Raymond.  The approach you've described is 
useful in a number of circumstances.  Would you mind publishing (somewhere 
outside the tracker; devguide?) the specific steps you take and the tools you 
use?

--
nosy: +eric.snow

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue24681
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24681] Put most likely test first in set_add_entry()

2015-07-22 Thread STINNER Victor

STINNER Victor added the comment:

Since it looks like an optimization, can you please provide a benchmark?

--
nosy: +haypo

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue24681
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24681] Put most likely test first in set_add_entry()

2015-07-22 Thread Raymond Hettinger

Raymond Hettinger added the comment:

You can benchmark if you want.  I'm looking for a second pair of eyes to 
validate the correctness.  My goal is to put the tests and assignments in the 
most logical order.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue24681
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24681] Put most likely test first in set_add_entry()

2015-07-22 Thread R. David Murray

R. David Murray added the comment:

Victor, I'm hearing Raymond say that it isn't really about optimization, but 
about the logical organization of the code.  I think making things more 
*complicated* requires a benchmark justification, but it doesn't look to me 
like this change makes things more complicated (on the other hand I'm not 
*fluent* in C).

--
nosy: +r.david.murray

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue24681
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24681] Put most likely test first in set_add_entry()

2015-07-22 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

The patch changes behavior. With the patch it would be possible that after 
successful set.add(), the item will be not contained in the set. And this 
behavior is not consistent with dict behavior.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue24681
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24681] Put most likely test first in set_add_entry()

2015-07-22 Thread STINNER Victor

STINNER Victor added the comment:

 You can benchmark if you want.

No, you have to provide a benchmark if you want to modify the Python core to 
optimize it. Otherwise, modifying the code is pointless.

It's a general rule in Python to optimize code. We don't accept optimization 
changes if it has no effect on performances.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue24681
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24681] Put most likely test first in set_add_entry()

2015-07-22 Thread STINNER Victor

STINNER Victor added the comment:

For me, optimizing assembler is still an optimization. I give up, I just don't 
care of set performances.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue24681
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24681] Put most likely test first in set_add_entry()

2015-07-22 Thread Raymond Hettinger

Raymond Hettinger added the comment:

If PyObject_RichCompareBool() reports that a key is already present in the set, 
then set_add_entry() is under no further obligation to make an insertion.   As 
long as there is no risk of segfault, there is no more obligation to cater to 
lying or self-destructing __eq__ methods than there is to support objects that 
return randomized __hash__ values. 

The docs for set.add(...):  Add an element to a set. This has no effect if the 
element is already present.  The latter condition is determined by the 
PyObject_RichCompareBool() check.  If it reports a match, then it is reasonable 
to do nothing.

FWIW, dicts don't have a choice in this regard because they still have an 
implementation that depends on returning a currently valid entry which they 
can't do if the table is mutating during the search.  The set_add_entry() 
implementation has the advantage in that it is self-contained and need only 
report -1 is an error arose during the comparison or a 0 to indicate that no 
error arose.  Also, note that sets diverged from dicts that the outset in that 
they don't swallow exceptions like PyDict_GetItem() does.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue24681
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24681] Put most likely test first is set_add_entry()

2015-07-21 Thread Raymond Hettinger

New submission from Raymond Hettinger:

Since the *found_active* exit is like the *found_error* exit in that it makes 
no further use of *entry*, it can be moved before the table/entry_key check 
whose purpose is to make sure the *entry* pointer is still valid.  This change 
doesn't apply to lookkey() which makes downstream use of the entry pointer.  In 
constrast, set_add_entry() is fully self-contained now and only returns a 0 or 
-1 rather than a pointer into the set table.

This puts the most likely test case first, putting it ahead of the two memory 
reloads in table/entry_key check.

Also, add an else if to the initial freeslot check to make it match the 
corresponding else if in the linear probe loop.

--
assignee: serhiy.storchaka
components: Interpreter Core
files: better_test_order.diff
keywords: patch
messages: 247086
nosy: rhettinger, serhiy.storchaka
priority: normal
severity: normal
stage: patch review
status: open
title: Put most likely test first is set_add_entry()
versions: Python 3.6
Added file: http://bugs.python.org/file39972/better_test_order.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue24681
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com