[issue36127] Argument Clinic: inline parsing code for functions with keyword parameters

2019-08-31 Thread Serhiy Storchaka


Change by Serhiy Storchaka :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36127] Argument Clinic: inline parsing code for functions with keyword parameters

2019-03-16 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:


New changeset 1b0393d5b7842dcd9e933117d2d5404d15e2ad00 by Serhiy Storchaka in 
branch 'master':
bpo-36127: Fix compiler warning in _PyArg_UnpackKeywords(). (GH-12353)
https://github.com/python/cpython/commit/1b0393d5b7842dcd9e933117d2d5404d15e2ad00


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36127] Argument Clinic: inline parsing code for functions with keyword parameters

2019-03-15 Thread Serhiy Storchaka


Change by Serhiy Storchaka :


--
pull_requests: +12318

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36127] Argument Clinic: inline parsing code for functions with keyword parameters

2019-03-15 Thread STINNER Victor


Change by STINNER Victor :


--
pull_requests: +12311

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36127] Argument Clinic: inline parsing code for functions with keyword parameters

2019-03-14 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:


New changeset 3191391515824fa7f3c573d807f1034c6a28fab3 by Serhiy Storchaka in 
branch 'master':
bpo-36127: Argument Clinic: inline parsing code for keyword parameters. 
(GH-12058)
https://github.com/python/cpython/commit/3191391515824fa7f3c573d807f1034c6a28fab3


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36127] Argument Clinic: inline parsing code for functions with keyword parameters

2019-02-27 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

In addition, this change can allow us to get rid of large and complex functions 
_PyArg_ParseTupleAndKeywordsFast() and _PyArg_ParseStackAndKeywords(). The 
former is no longer used in CPython, and the latter is still used in few places 
to support some deprecated formatting codes for which I intentionally not 
implemented inlining. After getting rid of uses of such codes (the patch in 
progress) we could remove both these functions.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36127] Argument Clinic: inline parsing code for functions with keyword parameters

2019-02-27 Thread STINNER Victor


STINNER Victor  added the comment:

About the stack memory usage, in the past, I used a subfunction tagged with 
_Py_NO_INLINE to work on temporary stack but then drop it:

void function()
{
   subfunction(); /* use temporary stack */
   /* don't waste stack memory */
   ...
}

I'm not sure if such pattern could be used here for things like " PyObject 
*argsbuf[12];".

The problem is that argument parsing uses a lot of local variables allocated on 
the stack. In practice, it's more like:

void function(args)
{
   int x;
   parse_args(args, ); /* use temporary stack */
   /* don't waste stack memory */
   ...
}

I expect a long list of "" where arg is a local variable of function(). 
Well, that's basically the design of the current PyArg_ParseTuple() function 
family :-)

PyArg_ParseTuple() does its stuff in private and uses more stack memory, but 
once PyArg_ParseTuple() returns, the memory on the stack is "released" just 
because we exited the function.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36127] Argument Clinic: inline parsing code for functions with keyword parameters

2019-02-27 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

As for depending the optimization on the size of CPU cache, I have repeated 
mickrobenchmarks on the computer with 6 MiB cache and two computers with 512 
KiB caches (64- and 32-bit).

Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz (cache size: 6144 KB):

+-+--+--+
| Benchmark   | baseline | inline   |
+=+==+==+
| round_(4.2) | 113 ns   | 81.3 ns: 1.39x faster (-28%) |
+-+--+--+
| sum_(())| 83.8 ns  | 56.7 ns: 1.48x faster (-32%) |
+-+--+--+
| sum_(a) | 98.0 ns  | 72.1 ns: 1.36x faster (-26%) |
+-+--+--+
| 'abc'.split()   | 107 ns   | 83.1 ns: 1.29x faster (-22%) |
+-+--+--+
| b'abc'.split()  | 101 ns   | 75.4 ns: 1.34x faster (-25%) |
+-+--+--+
| 'abc'.split('-')| 123 ns   | 89.9 ns: 1.37x faster (-27%) |
+-+--+--+
| 'abc'.encode()  | 79.6 ns  | 59.2 ns: 1.34x faster (-26%) |
+-+--+--+
| b'abc'.decode() | 105 ns   | 84.7 ns: 1.24x faster (-20%) |
+-+--+--+
| int_(4.2)   | 88.9 ns  | 64.1 ns: 1.39x faster (-28%) |
+-+--+--+
| int_('5')   | 137 ns   | 108 ns: 1.28x faster (-22%)  |
+-+--+--+
| 42 .to_bytes(2, 'little')   | 113 ns   | 77.6 ns: 1.45x faster (-31%) |
+-+--+--+
| int_from_bytes(b'ab', 'little') | 83.4 ns  | 51.5 ns: 1.62x faster (-38%) |
+-+--+--+
| struct_i32_unpack_from(b'abcd') | 96.0 ns  | 71.6 ns: 1.34x faster (-25%) |
+-+--+--+
| re_word_match('a')  | 221 ns   | 180 ns: 1.22x faster (-18%)  |
+-+--+--+
| datetime_now()  | 282 ns   | 248 ns: 1.14x faster (-12%)  |
+-+--+--+

Not significant (1): zlib_compress(b'abc')

AMD Athlon(tm) 64 X2 Dual Core Processor 4600+ (cache size: 512 KB):

+-+--+-+
| Benchmark   | baseline | inline  |
+=+==+=+
| round_(4.2) | 391 ns   | 272 ns: 1.44x faster (-31%) |
+-+--+-+
| sum_(())| 212 ns   | 160 ns: 1.32x faster (-24%) |
+-+--+-+
| sum_(a) | 256 ns   | 211 ns: 1.21x faster (-18%) |
+-+--+-+
| 'abc'.split()   | 290 ns   | 233 ns: 1.25x faster (-20%) |
+-+--+-+
| b'abc'.split()  | 263 ns   | 226 ns: 1.16x faster (-14%) |
+-+--+-+
| 'abc'.split('-')| 316 ns   | 262 ns: 1.21x faster (-17%) |
+-+--+-+
| 'abc'.encode()  | 197 ns   | 154 ns: 1.28x faster (-22%) |
+-+--+-+
| b'abc'.decode() | 303 ns   | 250 ns: 1.21x faster (-18%) |
+-+--+-+
| int_(4.2)   | 234 ns   | 171 ns: 1.37x faster (-27%) |
+-+--+-+
| int_('5')   | 372 ns   | 310 ns: 1.20x faster (-17%) |
+-+--+-+
| 42 .to_bytes(2, 'little')   | 370 ns   | 245 ns: 1.51x faster (-34%) |
+-+--+-+
| int_from_bytes(b'ab', 'little') | 251 ns   | 167 ns: 1.50x faster (-33%) |

[issue36127] Argument Clinic: inline parsing code for functions with keyword parameters

2019-02-27 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

Good questions Josh!

The size of the python binary has been increased from 17494944 bytes (17085 
KiB) to 17657112 bytes (17243 KiB) -- by 1%.

I think that this change can not increase memory consumption, because the new 
code does not use the heap (the old code can allocate additional memory 
dynamically).

As for using the stack memory, it is not so clear. On one side, the new code 
allocates an array on the stack for references to all parameters, and this 
memory is left in use until you return from the function. On other side, the 
old code allocates a lot of variables and static-sized buffers, and creates 
several function frames, but this memory is released after the end of arguments 
parsing. I think that for non-recursive functions the new code has smaller 
stack memory consumption (while the function has less than several tens of 
parameters), but for recursive functions it can increase stack memory 
consumption. Although I do not know whether any of affected functions is 
recursive.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36127] Argument Clinic: inline parsing code for functions with keyword parameters

2019-02-26 Thread STINNER Victor


STINNER Victor  added the comment:

> How much bigger does the core interpreter + built-in extension modules get 
> when you make this change? How much more memory is used by real world 
> programs?

Well, any optimization is a matter of trade-off between memory and CPU. Last 
years, CPU are not really getting way faster (especially when you consider that 
Python is usually only able to use a single CPU thread), whereas computers are 
getting more and more RAM.


> It should be tested on a number of systems too; any losses to cache 
> unfriendliness would be highly dependent on the size of the CPU cache.

I prefer to no pay too much attention to assumptions on the hardware. I prefer 
to only trust benchmarks :-)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36127] Argument Clinic: inline parsing code for functions with keyword parameters

2019-02-26 Thread Josh Rosenberg


Josh Rosenberg  added the comment:

How much bigger does the core interpreter + built-in extension modules get when 
you make this change? How much more memory is used by real world programs?

I'm a little concerned when I see individual functions growing by 140 lines in 
the first file of the diff. Clearly the increase in memory usage wasn't enough 
to slow down the microbenchmarks (which can probably fit the entire hot code 
path in CPU on die cache), but I'd be worried about the aggregate effect on 
real world programs if we go from a handful of argument parsing code paths 
reused by every function (and therefore kept constantly hot) to hundreds (or 
thousands?) of unique argument parsing code paths, each one unique to a single 
function. It could easily look great when a single function is being called 
repeatedly, while looking much less great (possibly worse) when the many varied 
function calls are interleaved.

It should be tested on a number of systems too; any losses to cache 
unfriendliness would be highly dependent on the size of the CPU cache.

I'll grant, it doesn't seem like inlining positional argument parsing has 
caused problems, and it looks like we're still using _PyArg_UnpackKeywords, so 
argument parsing isn't completely devolved to each functions, but I think we 
need something more than microbenchmarks before we jump on this.

If my worries end up being unfounded, I'll be happy. Looks very cool if we can 
really get that sort of speed up for all function calls, not just positional 
args only functions. :-)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36127] Argument Clinic: inline parsing code for functions with keyword parameters

2019-02-26 Thread Josh Rosenberg


Change by Josh Rosenberg :


--
nosy: +josh.r

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36127] Argument Clinic: inline parsing code for functions with keyword parameters

2019-02-26 Thread Karthikeyan Singaravelan


Karthikeyan Singaravelan  added the comment:

That's a lot faster and will be great if they make it to next alpha :) Adding 
Ammar Askar since they did review for positional arguments.

--
nosy: +ammar2, xtreak

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36127] Argument Clinic: inline parsing code for functions with keyword parameters

2019-02-26 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

Some examples:

$ ./python -m perf timeit --compare-to=../cpython-release-baseline/python 
--duplicate=1000 -s "round_ = round" "round_(4.2)"
Mean +- std dev: [...] 110 ns +- 3 ns -> [...] 81.4 ns +- 2.2 ns: 1.35x faster 
(-26%)

$ ./python -m perf timeit --compare-to=../cpython-release-baseline/python 
--duplicate=1000 -s "sum_ = sum" "sum_(())"
Mean +- std dev: [...] 88.0 ns +- 6.5 ns -> [...] 57.6 ns +- 1.1 ns: 1.53x 
faster (-35%)

$ ./python -m perf timeit --compare-to=../cpython-release-baseline/python 
--duplicate=1000 -s "sum_ = sum; a = [1, 2]"  "sum_(a)"
Mean +- std dev: [...] 95.9 ns +- 2.1 ns -> [...] 70.6 ns +- 2.0 ns: 1.36x 
faster (-26%)

$ ./python -m perf timeit --compare-to=../cpython-release-baseline/python 
--duplicate=1000  "'abc'.split()"  
Mean +- std dev: [...] 102 ns +- 3 ns -> [...] 80.5 ns +- 2.1 ns: 1.26x faster 
(-21%)

$ ./python -m perf timeit --compare-to=../cpython-release-baseline/python 
--duplicate=1000  "b'abc'.split()"
Mean +- std dev: [...] 91.8 ns +- 2.3 ns -> [...] 75.1 ns +- 1.4 ns: 1.22x 
faster (-18%)

$ ./python -m perf timeit --compare-to=../cpython-release-baseline/python 
--duplicate=1000  "'abc'.split('-')"
Mean +- std dev: [...] 118 ns +- 2 ns -> [...] 89.2 ns +- 1.8 ns: 1.32x faster 
(-24%)

$ ./python -m perf timeit --compare-to=../cpython-release-baseline/python 
--duplicate=1000  "b'abc'.decode()"
Mean +- std dev: [...] 96.1 ns +- 3.6 ns -> [...] 78.9 ns +- 2.2 ns: 1.22x 
faster (-18%)

$ ./python -m perf timeit --compare-to=../cpython-release-baseline/python 
--duplicate=1000  "'abc'.encode()"
Mean +- std dev: [...] 72.4 ns +- 1.9 ns -> [...] 55.1 ns +- 1.8 ns: 1.31x 
faster (-24%)

$ ./python -m perf timeit --compare-to=../cpython-release-baseline/python 
--duplicate=1000 -s "int_= int"  "int(4.2)"
Mean +- std dev: [...] 105 ns +- 4 ns -> [...] 78.8 ns +- 1.9 ns: 1.33x faster 
(-25%)

$ ./python -m perf timeit --compare-to=../cpython-release-baseline/python 
--duplicate=1000 -s "int_= int"  "int('5')"
Mean +- std dev: [...] 154 ns +- 5 ns -> [...] 122 ns +- 4 ns: 1.26x faster 
(-21%)

$ ./python -m perf timeit --compare-to=../cpython-release-baseline/python 
--duplicate=1000  "42 .to_bytes(2, 'little')
Mean +- std dev: [...] 109 ns +- 3 ns -> [...] 72.4 ns +- 1.9 ns: 1.51x faster 
(-34%)

$ ./python -m perf timeit --compare-to=../cpython-release-baseline/python 
--duplicate=1000 "from_bytes = int.from_bytes" "from_bytes(b'ab', 'little')"
Mean +- std dev: [...] 138 ns +- 3 ns -> [...] 96.3 ns +- 3.0 ns: 1.43x faster 
(-30%)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36127] Argument Clinic: inline parsing code for functions with keyword parameters

2019-02-26 Thread Serhiy Storchaka


Change by Serhiy Storchaka :


--
keywords: +patch
pull_requests: +12083
stage:  -> patch review

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36127] Argument Clinic: inline parsing code for functions with keyword parameters

2019-02-26 Thread Serhiy Storchaka


New submission from Serhiy Storchaka :

This is a follow up of issue23867 and issue35582. The proposed PR makes 
Argument Clinic inlining parsing code for functions with keyword parameters, 
i.e. functions that use _PyArg_ParseTupleAndKeywordsFast() and 
_PyArg_ParseStackAndKeywords() now. This saves time for parsing format strings 
and calling few levels of functions.

--
assignee: serhiy.storchaka
components: Argument Clinic
messages: 336700
nosy: larry, serhiy.storchaka, vstinner
priority: normal
severity: normal
status: open
title: Argument Clinic: inline parsing code for functions with keyword 
parameters
type: performance
versions: Python 3.8

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com