[issue23553] Reduce the number of comparison for range checking.

2015-03-01 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Parenthesis around Py_SIZE() are redundant.

Are there any benchmarking results that show a speed up? Such microoptimization 
makes sense in tight loops, but optimized source code looks more cumbersome and 
errorprone.

--
nosy: +pitrou

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23553
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23553] Reduce the number of comparison for range checking.

2015-03-01 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 1e89094998b2 by Raymond Hettinger in branch 'default':
Issue #23553:  Use an unsigned cast to tighten-up the bounds checking logic.
https://hg.python.org/cpython/rev/1e89094998b2

--
nosy: +python-dev

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23553
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23553] Reduce the number of comparison for range checking.

2015-03-01 Thread Raymond Hettinger

Raymond Hettinger added the comment:

I think the source in listobject.c would be benefit from a well-named macro for 
this.  That would provide the most clarity.   For deques, I'll just put in the 
simple patch because it only applies to a place that is already doing unsigned 
arithmetic/comparisons.

FWIW, I don't usually use benchmarking on these kinds of changes.  The 
generated assembler is sufficiently informative.   Benchmarking each tiny 
change risks getting trapped in a local minimum.  Also, little timeit tests 
tend to branch the same way every time (which won't show the cost of prediction 
misses) and it tends to have all code and data in cache (so you don't see the 
effects of cache misses) and it risks tuning to a single processor (in my case 
a haswell).   Instead, I look at the code generated by GCC and CLang to see 
that it does less work.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23553
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23553] Reduce the number of comparison for range checking.

2015-03-01 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

My point is that if the benefit is too small (say  5% in microbenchmarks), it 
is not worth code churning. Actually my bar for microbenchmarks is higher, 
about 20%.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23553
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23553] Reduce the number of comparison for range checking.

2015-03-01 Thread Arfrever Frehtes Taifersar Arahesis

Changes by Arfrever Frehtes Taifersar Arahesis arfrever@gmail.com:


--
nosy: +Arfrever

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23553
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23553] Reduce the number of comparison for range checking.

2015-02-28 Thread Raymond Hettinger

New submission from Raymond Hettinger:

Python's core is full of bound checks like this one in Objects/listobject.c:

static PyObject *
list_item(PyListObject *a, Py_ssize_t i)
{
if (i  0 || i = Py_SIZE(a)) {
...

Abner Fog's high-level language optimization guide,  
http://www.agner.org/optimize/optimizing_cpp.pdf in section 14.2 Bounds 
Checking, shows a way to fold this into a single check:

-if (i  0 || i = Py_SIZE(a)) {
+if ((unsigned)i = (unsigned)(Py_SIZE(a))) {
 if (indexerr == NULL) {
 indexerr = PyUnicode_FromString(
 list index out of range);

The old generated assembly code looks like this:

_list_item:
subq$8, %rsp
testq   %rsi, %rsi
js  L227
cmpq16(%rdi), %rsi
jl  L228
L227:
... error reporting and exit  ...
L228:
movq24(%rdi), %rax
movq(%rax,%rsi,8), %rax
addq$1, (%rax)
addq$8, %rsp
ret

The new disassembly looks like this:

_list_item:
cmpl%esi, 16(%rdi)
ja  L227
... error reporting and exit  ...
L227:
movq24(%rdi), %rax
movq(%rax,%rsi,8), %rax
addq$1, (%rax)
ret

Note, the new code not only saves a comparison/conditional-jump pair, it also 
avoids the need to adjust %rsp on the way in and the way out for a net savings 
of four instructions along the critical path.

When we have good branch prediction, the current approach is very low cost; 
however, Abner Fog's recommendation is never more expensive, is sometimes 
cheaper, saves a possible misprediction, and reduces the total code generated.  
All in all, it is a net win.

I recommend we put in a macro of some sort so that this optimization gets 
expressed exactly once in the code and so that it has a good clear name with an 
explanation of what it does.

--
components: Interpreter Core
messages: 236928
nosy: rhettinger
priority: normal
severity: normal
status: open
title: Reduce the number of comparison for range checking.
type: performance
versions: Python 3.5

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23553
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23553] Reduce the number of comparison for range checking.

2015-02-28 Thread Raymond Hettinger

Changes by Raymond Hettinger raymond.hettin...@gmail.com:


--
keywords: +patch
Added file: http://bugs.python.org/file38281/size_t.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23553
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23553] Reduce the number of comparison for range checking.

2015-02-28 Thread Raymond Hettinger

Raymond Hettinger added the comment:

 But it wouldn't work for say off_t.

I'm only proposing a bounds checking macro for the Py_ssize_t case which is 
what all of our IndexError tests look for.

Also, please look at the attached deque fix.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23553
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23553] Reduce the number of comparison for range checking.

2015-02-28 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

It looks correct to me, but I would change type and introduce few new variables 
to get rid of casts.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23553
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23553] Reduce the number of comparison for range checking.

2015-02-28 Thread Raymond Hettinger

Raymond Hettinger added the comment:

Attaching a diff for the bounds checking Objects/listobject.c.
It looks like elsewhere in that file, (size_t) casts are done
for various reasons.

--
Added file: http://bugs.python.org/file38282/bounds_check_list.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23553
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23553] Reduce the number of comparison for range checking.

2015-02-28 Thread Raymond Hettinger

Raymond Hettinger added the comment:

 The type of i and Py_SIZE(a) is Py_ssize_t, so when casted to 
 unsigned int, highest bits are lost. The correct casting type is size_t.

Yes, I had just seen that a early today and deciding whether to substitute 
size_t for the unsigned cast or whether to just revert.   I believe size_t is 
guaranteed to hold any array index and that a cast from non-negative Py_ssize_t 
would not lose bits.

 But in CPython implementation sizes are signed (Py_ssize_t). 
 The problem with using this optimization (rather low-level 
 than high-level) is that we need to know unsigned version of
 the type of compared values.

Wouldn't size_t always work for Py_ssize_t?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23553
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23553] Reduce the number of comparison for range checking.

2015-02-28 Thread Raymond Hettinger

Raymond Hettinger added the comment:

Also attaching a bounds checking patch for deques.

--
Added file: http://bugs.python.org/file38283/bounds_check_deque.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23553
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23553] Reduce the number of comparison for range checking.

2015-02-28 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Yes, this is a technique commonly used in STL implementations. This is why 
sizes and indices in STL are unsigned.

But in CPython implementation sizes are signed (Py_ssize_t). The problem with 
using this optimization (rather low-level than high-level) is that we need to 
know unsigned version of the type of compared values.

 -if (i  0 || i = Py_SIZE(a)) {
 +if ((unsigned)i = (unsigned)(Py_SIZE(a))) {

Here is a bug. The type of i and Py_SIZE(a) is Py_ssize_t, so when casted to 
unsigned int, highest bits are lost. The correct casting type is size_t.

In changeset 5942fd9ab335 you introduced a bug.

--
nosy: +serhiy.storchaka

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23553
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23553] Reduce the number of comparison for range checking.

2015-02-28 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

 Wouldn't size_t always work for Py_ssize_t?

Yes. But it wouldn't work for say off_t.

The consistent way is always use size_t instead of Py_ssize_t. But this boat 
has sailed.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23553
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com