[issue37837] add internal _PyLong_FromUnsignedChar() function

2019-08-27 Thread Greg Price
Greg Price added the comment: Ah OK, that makes sense of it then :)

> But the most important thing is that using PyLong_FromUnsignedLong() instead
> of _PyLong_FromUnsignedChar() on top of GH-15192 is producing the same
> results: striter_next() uses small_ints[] directly.

However that's

2019-08-25 Thread Sergey Fedoseev
Sergey Fedoseev added the comment: These last results are invalid :-) I thought that I was checking _PyLong_FromUnsignedChar() on top of GH-15192, but that wasn't true. So the correct results for the LTO build are:

    $ python -m perf timeit -s "from collections import deque; consume =

2019-08-25 Thread Greg Price
Greg Price added the comment: Very interesting, thanks! It looks like with LTO enabled, this optimization has no effect at all. This change adds significant complexity, and it seems like the hoped-for payoff is entirely in terms of performance on rather narrowly-focused microbenchmarks.

2019-08-24 Thread Sergey Fedoseev
Sergey Fedoseev added the comment:

    $ gcc -v 2>&1 | grep 'gcc version'
    gcc version 8.3.0 (Debian 8.3.0-19)

using ./configure --enable-optimizations --with-lto

    $ python -m perf timeit -s "from collections import deque; consume = deque(maxlen=0).extend; b = bytes(2**20)" "consume(b)"

2019-08-24 Thread Greg Price
Greg Price added the comment:

> Is there a particular reason to specifically call PyLong_FromSize_t? Seems
> like PyLong_FromLong is the natural default (and what we default to in the
> rest of the code), and it's what this ends up calling anyway.

Ah I see, the patch is meant to go on top

2019-08-24 Thread Greg Price
Greg Price added the comment: Oh also:

* What compiler, and what compilation flags, are you using in your benchmarking? That seems relevant :)

2019-08-24 Thread Greg Price
Greg Price added the comment: Hmm, I'm a bit confused because:

* Your patch at GH-15251 replaces a number of calls to PyLong_FromLong with calls to the new _PyLong_FromUnsignedChar.
* That function, in turn, just calls PyLong_FromSize_t.
* And that function begins:

    PyObject *
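
The call chain described in this comment can be mocked in a few lines to show why the wrapper alone saves nothing: it adds a hop and still ends in the same generic small-int range check. This is a hypothetical sketch, assuming (per Greg's remark that PyLong_FromLong is "what this ends up calling anyway") that small size_t values are forwarded to the long path, and assuming the default cache of -5 through 256. MockLong, range_checks, MOCK_FORWARD_LIMIT and the mock_* functions are illustrative names, not CPython code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for PyLongObject; not CPython's real layout. */
typedef struct {
    long value;
} MockLong;

/* Assumed defaults: cache the values -5 .. 256. */
#define NSMALLPOSINTS 257
#define NSMALLNEGINTS 5
/* Hypothetical cutoff below which the size_t path forwards to the long path. */
#define MOCK_FORWARD_LIMIT ((size_t)1 << 30)

static MockLong small_ints[NSMALLNEGINTS + NSMALLPOSINTS];
static int range_checks;  /* instrumentation: how many small-int checks ran */

/* Fill the cache once, before any lookup. */
static void init_small_ints(void)
{
    for (int i = 0; i < NSMALLNEGINTS + NSMALLPOSINTS; i++)
        small_ints[i].value = (long)i - NSMALLNEGINTS;
}

/* Mock of the generic long path, which owns the small-int check. */
static MockLong *mock_from_long(long v)
{
    range_checks++;
    if (-NSMALLNEGINTS <= v && v < NSMALLPOSINTS)
        return &small_ints[v + NSMALLNEGINTS];
    return NULL;  /* allocation elided in this sketch */
}

/* Mock of the size_t path: small values are simply forwarded. */
static MockLong *mock_from_size_t(size_t v)
{
    if (v < MOCK_FORWARD_LIMIT)
        return mock_from_long((long)v);
    return NULL;  /* multi-digit allocation elided in this sketch */
}

/* Mock of the proposed wrapper: one more hop, same check at the end. */
static MockLong *mock_from_unsigned_char(unsigned char c)
{
    return mock_from_size_t(c);
}
```

After init_small_ints(), a call such as mock_from_unsigned_char(42) still increments range_checks exactly once, because the wrapper cannot skip the check unless the compiler inlines the whole chain and folds it.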

2019-08-13 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: Maybe an even better idea would be to partially inline PyLong_FromLong(). If the check for small ints in PyLong_FromLong() were inlined, the compiler could optimize those checks. This would benefit all users of PyLong_FromLong() without code
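
A minimal sketch of the partial-inlining idea, under stated assumptions: only the small-int test lives in a static inline function (as it would in a header), while the allocation path stays out of line. MockLong, small_ints, mock_from_long, mock_from_long_slow and byte_to_long are hypothetical names, not CPython API, and the cache bounds assume the default -5 through 256. Once mock_from_long is inlined into a caller whose argument is an unsigned char, an optimizing compiler can usually prove 0 <= c <= 255 and fold the comparisons away, roughly the effect the proposed _PyLong_FromUnsignedChar() aims for by hand.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for PyLongObject; not CPython's real layout. */
typedef struct {
    long value;
} MockLong;

/* Assumed defaults: cache the values -5 .. 256. */
#define NSMALLPOSINTS 257
#define NSMALLNEGINTS 5

static MockLong small_ints[NSMALLNEGINTS + NSMALLPOSINTS];

/* Fill the cache once, before any lookup. */
static void init_small_ints(void)
{
    for (int i = 0; i < NSMALLNEGINTS + NSMALLPOSINTS; i++)
        small_ints[i].value = (long)i - NSMALLNEGINTS;
}

/* Slow path; in a real layout this would be an out-of-line function in a .c file. */
static MockLong *mock_from_long_slow(long v)
{
    /* Callers only reach this for values outside the cache. */
    assert(v < -NSMALLNEGINTS || v >= NSMALLPOSINTS);
    MockLong *obj = malloc(sizeof *obj);
    if (obj != NULL)
        obj->value = v;
    return obj;
}

/* Fast path meant to be inlined at every call site: only the small-int test. */
static inline MockLong *mock_from_long(long v)
{
    if (-NSMALLNEGINTS <= v && v < NSMALLPOSINTS)
        return &small_ints[v + NSMALLNEGINTS];
    return mock_from_long_slow(v);
}

/* Example caller: once mock_from_long is inlined here, the compiler knows
   0 <= c <= 255 and can drop both comparisons and the slow-path call. */
static MockLong *byte_to_long(unsigned char c)
{
    return mock_from_long(c);
}
```

The design choice here is that every caller benefits without a type-specific constructor; whether the comparisons actually disappear depends on the compiler inlining the fast path, which is also what link-time optimization can achieve across translation units.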

2019-08-13 Thread Sergey Fedoseev
Change by Sergey Fedoseev:
keywords: +patch
pull_requests: +14971
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/15251

2019-08-13 Thread Sergey Fedoseev
New submission from Sergey Fedoseev: When compiled with the default NSMALLPOSINTS, _PyLong_FromUnsignedChar() is significantly faster than the other PyLong_From*() functions:

    $ python -m perf timeit -s "from collections import deque; consume = deque(maxlen=0).extend; b = bytes(2**20)" "consume(b)"
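
A minimal sketch of why an unsigned-char-specific constructor can be cheaper, assuming the default cache bounds NSMALLPOSINTS = 257 and NSMALLNEGINTS = 5 (values -5 through 256). The MockLong type, the small_ints table, and the mock_get_small_int and mock_* helpers are hypothetical stand-ins, not CPython code and not the code from GH-15251. When NSMALLPOSINTS exceeds UCHAR_MAX, every unsigned char maps straight into the cache, so the preprocessor settles the range check that the generic path has to perform on every call.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for PyLongObject; not CPython's real layout. */
typedef struct {
    long value;
} MockLong;

/* Assumed defaults: cache the values -5 .. 256. */
#define NSMALLPOSINTS 257
#define NSMALLNEGINTS 5

static MockLong small_ints[NSMALLNEGINTS + NSMALLPOSINTS];

/* Fill the cache once, before any lookup. */
static void init_small_ints(void)
{
    for (int i = 0; i < NSMALLNEGINTS + NSMALLPOSINTS; i++)
        small_ints[i].value = (long)i - NSMALLNEGINTS;
}

/* Cache lookup; the caller must already know v is in range
   (the assert is only a debug-build sanity check). */
static MockLong *mock_get_small_int(long v)
{
    assert(-NSMALLNEGINTS <= v && v < NSMALLPOSINTS);
    return &small_ints[v + NSMALLNEGINTS];
}

/* Generic path: every call pays for the range check. */
static MockLong *mock_from_long(long v)
{
    if (-NSMALLNEGINTS <= v && v < NSMALLPOSINTS)
        return mock_get_small_int(v);
    return NULL;  /* a real implementation would allocate a new object here */
}

/* Hypothetical unsigned-char path: with the default cache size the check is
   settled by the preprocessor, so no run-time range check is needed. */
static MockLong *mock_from_unsigned_char(unsigned char c)
{
#if NSMALLPOSINTS > UCHAR_MAX
    return mock_get_small_int((long)c);
#else
    return mock_from_long((long)c);
#endif
}
```

After init_small_ints(), mock_from_unsigned_char(255) returns the same cached entry as mock_from_long(255), but without the comparisons; with a smaller NSMALLPOSINTS the sketch falls back to the generic path.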