[issue38015] inline function generates slightly inefficient machine code

2019-09-17 Thread Greg Price


Greg Price  added the comment:

See followup at https://bugs.python.org/issue38205 and 
https://bugs.python.org/issue37812#msg352670 .

The patch in GH-15710 had a side effect of introducing a call to 
`Py_UNREACHABLE` inside a comma-expression.  A subsequent commit 3ab61473b 
changed `Py_UNREACHABLE` in a way that made that a syntax error; the issue 
wasn't caught then because the code is only compiled if `NSMALLNEGINTS + 
NSMALLPOSINTS <= 0`.  Detailed discussion on those threads.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38015] inline function generates slightly inefficient machine code

2019-09-09 Thread Ma Lin


Ma Lin  added the comment:

PR 15710 has been merged into the master, but the merge message is not shown 
here.
Commit: 
https://github.com/python/cpython/commit/6b519985d23bd0f0bd072b5d5d5f2c60a81a19f2

Maybe this issue can be closed.

--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38015] inline function generates slightly inefficient machine code

2019-09-09 Thread Benjamin Peterson


Benjamin Peterson  added the comment:

I agree with msg351348. The performance wins are very context-dependent and 
even then very small. It's not worth further disturbing the small int code. So, 
we should close this issue out.

--
nosy: +benjamin.peterson

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38015] inline function generates slightly inefficient machine code

2019-09-08 Thread Greg Price


Greg Price  added the comment:

(Just to help keep discussions together: some earlier discussion was on 
GH-15216 .)

Because is_small_int / IS_SMALL_INT is so small, there's not much cost in the 
source code to making it a macro (as GH-15710 did).

But I think it'd be a mistake to go a lot farther than that and convert 
significantly larger chunks of code like get_small_int to macros (as GH-15718 
would), unless the payoff were really commensurate.  It'd be better to focus 
optimization efforts where profiling shows a lot of time is really being spent.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38015] inline function generates slightly inefficient machine code

2019-09-07 Thread Sergey Fedoseev


Sergey Fedoseev  added the comment:

I use GCC 9.2.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38015] inline function generates slightly inefficient machine code

2019-09-07 Thread Ma Lin


Ma Lin  added the comment:

> This change produces tiny, but measurable speed-up for handling small ints

I didn't get measurable change, I run this command a dozen times and take the 
best result:

D:\dev\cpython\PCbuild\amd64\python.exe -m pyperf timeit -s "from collections 
import deque; consume = deque(maxlen=0).extend; r = range(256)" "consume(r)"  
--duplicate=1000

before: Mean +- std dev: 771 ns +- 16 ns
after:  Mean +- std dev: 770 ns +- 10 ns

Environment:
64-bit release build by MSVC 2017
CPU: i3 4160, System: latest Windows 10 64-bit

Check the machine code from godbolt.org, x64 MSVC v19.14 only saves one 
instruction:
movsxd  rax, ecx

x86-64 GCC 9.2 saves two instructions:
lea eax, [rdi+5]
cdqe

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38015] inline function generates slightly inefficient machine code

2019-09-06 Thread Ma Lin


Ma Lin  added the comment:

This range has not been changed since "preallocated small integer pool" was 
introduced:

#define NSMALLPOSINTS   257
#define NSMALLNEGINTS   5

The commit (Jan 2007):
https://github.com/python/cpython/commit/ddefaf31b366ea84250fc5090837c2b764a04102


Is it worth increase the range?
FYI, build with MSVC 2017, the `small_ints` size:

32-bit build:
sizeof(PyLongObject)16 bytes
sizeof(small_ints)4192 bytes

64-bit build:
sizeof(PyLongObject)32 bytes
sizeof(small_ints)8384 bytes

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38015] inline function generates slightly inefficient machine code

2019-09-06 Thread Sergey Fedoseev


Sergey Fedoseev  added the comment:

I added similar patch that replaces get_small_int() with macro version, since 
it also induces unnecessary casts and makes machine code less efficient.

Example assembly can be checked at https://godbolt.org/z/1SjG3E.

This change produces tiny, but measurable speed-up for handling small ints:

$ python -m pyperf timeit -s "from collections import deque; consume = 
deque(maxlen=0).extend; r = range(256)" "consume(r)" 
--compare-to=../cpython-master/venv/bin/python --duplicate=1000
/home/sergey/tmp/cpython-master/venv/bin/python: . 1.03 us 
+- 0.08 us
/home/sergey/tmp/cpython-dev/venv/bin/python: . 973 ns +- 
18 ns

Mean +- std dev: [/home/sergey/tmp/cpython-master/venv/bin/python] 1.03 us +- 
0.08 us -> [/home/sergey/tmp/cpython-dev/venv/bin/python] 973 ns +- 18 ns: 
1.05x faster (-5%)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38015] inline function generates slightly inefficient machine code

2019-09-06 Thread Sergey Fedoseev


Change by Sergey Fedoseev :


--
pull_requests: +15372
pull_request: https://github.com/python/cpython/pull/15718

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38015] inline function generates slightly inefficient machine code

2019-09-05 Thread Ma Lin


Ma Lin  added the comment:

Revert commit 5e63ab0 or use PR 15710, both are fine.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38015] inline function generates slightly inefficient machine code

2019-09-05 Thread Ma Lin


Change by Ma Lin :


--
keywords: +patch
pull_requests: +15365
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/15710

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38015] inline function generates slightly inefficient machine code

2019-09-02 Thread Ma Lin


Ma Lin  added the comment:

There will always be a new commit, replacing with a macro version also looks 
good.

I have no opinion, both are fine.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38015] inline function generates slightly inefficient machine code

2019-09-02 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

Do you see a way to salvage the commit?

If not, I'm fine with reverting it.  Its benefit was only aesthetic.

--
assignee:  -> rhettinger

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue38015] inline function generates slightly inefficient machine code

2019-09-02 Thread Ma Lin


New submission from Ma Lin :

Commit 5e63ab0 replaces macro with this inline function:

static inline int
is_small_int(long long ival)
{
return -NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS;
}

(by default, NSMALLNEGINTS is 5, NSMALLPOSINTS is 257)


However, when invoking this function, and `sizeof(value) < sizeof(long long)`, 
there is an unnecessary type casting.

For example, on 32-bit platform, if `value` is `Py_ssize_t`, it needs to be 
converted to 8-byte `long long` type.

The following assembly code is the beginning part of 
`PyLong_FromSsize_t(Py_ssize_t v)` function.
(32-bit x86 build generated by GCC 9.2, with `-m32 -O2` option)

Use macro before commit 5e63ab0:
mov eax, DWORD PTR [esp+4]
add eax, 5
cmp eax, 261
ja  .L2
sal eax, 4
add eax, OFFSET FLAT:small_ints
add DWORD PTR [eax], 1
ret
.L2:jmp PyLong_FromSsize_t_rest(int)

Use inlined function:
pushebx
mov eax, DWORD PTR [esp+8]
mov edx, 261
mov ecx, eax
mov ebx, eax
sar ebx, 31
add ecx, 5
adc ebx, 0
cmp edx, ecx
mov edx, 0
sbb edx, ebx
jc  .L7
cwde
sal eax, 4
add eax, OFFSET FLAT:small_ints+80
add DWORD PTR [eax], 1
pop ebx
ret
.L7:pop ebx
jmp PyLong_FromSsize_t_rest(int)

On 32-bit x86 platform, 8-byte `long long` is implemented in using two 
registers, so the machine code is much longer than macro version.

At least these hot functions are suffered from this:
  PyObject* PyLong_FromSsize_t(Py_ssize_t v)
  PyObject* PyLong_FromLong(long v)

Replacing the inline function with a macro version will fix this:
#define IS_SMALL_INT(ival) (-NSMALLNEGINTS <= (ival) && (ival) < NSMALLPOSINTS)

If you want to see assembly code generated by major compilers, you can paste 
attached file demo.c to https://godbolt.org/
- demo.c was original written by Greg Price.
- use `-m32 -O2` to generate 32-bit build.

--
components: Interpreter Core
files: demo.c
messages: 351052
nosy: Greg Price, Ma Lin, aeros167, mark.dickinson, rhettinger, sir-sigurd
priority: normal
severity: normal
status: open
title: inline function generates slightly inefficient machine code
versions: Python 3.9
Added file: https://bugs.python.org/file48583/demo.c

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com