On Wed, Jul 11, 2018 at 7:47 AM Victor Stinner <vstin...@redhat.com> wrote:
>
> 2018-07-10 14:59 GMT+02:00 INADA Naoki <songofaca...@gmail.com>:
> > PyObject_CallFunction(func, "n", 42);
> >
> > Currently, we create a temporary long object to pass the argument.
> > If there were a protocol for exposing the format used by PyArg_Parse*,
> > we could bypass the temporary Python object and call myfunc_impl directly.
>
> I'm not sure that it's worth it. It seems complex to implement.
>
Both my idea and PEP 580 are complicated. For the Python stdlib, I expect no significant benefit: we already bypass Python function calling with the typecheck + concrete function call idiom. But for Cython users, especially people using Cython on Jupyter, I expect there are many extension-to-extension calls.

Still, we don't have a realistic example demonstrating the real-world benefit of either proposal. Without an application benchmark, I think neither my idea nor PEP 580 should happen. That's why I have requested an application benchmark again and again.

PEP 576 seems like a minimalistic, straightforward way to allow FASTCALL for Cython and other third-party libraries. But if we accept PEP 576, it becomes harder to allow further optimization in the future. I expect the best balance lies somewhere between PEP 576 and PEP 580: maybe adding a new slot as a struct pointer with some flags, but without adding per-instance data. I'm not sure, though, because I'm not a data scientist; I don't know what the typical usage is or where the main bottleneck of such applications lies.

Jeroen seems to want us to discuss PEP 576 and 580, so I explained to him why we need an example application first.

> I proposed something simpler, but nobody tried to implement it.
> Instead of calling the long and complex PyArg_Parse...() functions,
> why not generate C code to parse arguments instead? The idea looks
> like "inlining" PyArg_Parse...() in its caller, but technically it
> means that Argument Clinic generates C code to parse arguments.

I believe Cython does this already. But I have a worry about it: if we do it for every function, it makes the Python binary fatter and consumes more CPU cache. Once the CPU cache starts thrashing, application performance degrades quickly. And benchmarking CPU cache efficiency is very difficult: the current Python benchmarks are too small. We benchmark an HTTP server, SQLAlchemy, JSON, and template engines individually, but a real application does all of them in a loop, and many processes share the L3 cache.
Even the L1 cache is shared by several processes through HyperThreading and context switches.

> PyArg_Parse...() is cheap and has been optimized, but on very fast
> functions (less than 100 ns), it might be significant. Well, to be
> sure, someone should run a benchmark :-)
>
> Victor

--
INADA Naoki <songofaca...@gmail.com>
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com