New submission from STINNER Victor <vstin...@python.org>:
The Fedora packaging has been modified to compile libpython with -fno-semantic-interposition flag: it makes Python up to 1.3x faster without having to touch any line of the C code! See pyperformance results: https://fedoraproject.org/wiki/Changes/PythonNoSemanticInterpositionSpeedup#Benefit_to_Fedora The main drawback is that -fno-semantic-interposition prevents to override Python symbols using a custom library preloaded by LD_PRELOAD. For example, override PyErr_Occurred() function. We (authors of the Fedora change) failed to find any use case for LD_PRELOAD. To be honest, I found *one* user in the last 10 years who used LD_PRELOAD to track memory allocations in Python 2.7. This use case is no longer relevant in Python 3 with PEP 445 which provides a supported C API to override Python memory allocators or to install hooks on Python memory allocators. Moreover, tracemalloc is a nice way to track memory allocations. Is there anyone aware of any special use of LD_PRELOAD for libpython? To be clear: -fno-semantic-interposition only impacts libpython. All other libraries still respect LD_PRELOAD. For example, it is still possible to override glibc malloc/free. Why -fno-semantic-interposition makes Python faster? There are multiple reasons. For of all, libpython makes a lot of function calls to libpython. Like really a lot, especially in the hot code paths. Without -fno-semantic-interposition, function calls to libpython requires to get through "interposition": for example "Procedure Linkage Table" (PLT) indirection on Linux. It prevents function inlining which has a major impact on performance (missed optimization). In short, even with PGO and LTO, libpython function calls have two performance "penalities": * indirect function calls (PLT) * no inlining I'm comparing Python performance of "statically linked Python" (Debian/Ubuntu choice: don't use ./configure --enable-shared, python is not linked to libpython) to "dynamically linked Python" (Fedora choice: use "./configure --enable-shared", python is dynamically linked to libpython). With -fno-semantic-interposition, function calls are direct and can be inlined when appropriate. You don't have to trust me, look at pyperformance benchmark results ;-) When using ./configure --enable-shared (libpython), the "python" binary is exactly one function call and that's all: int main(int argc, char **argv) { return Py_BytesMain(argc, argv); } So 100% of the time is only spent in libpython. For a longer rationale, see the accepted Fedora change: https://fedoraproject.org/wiki/Changes/PythonNoSemanticInterpositionSpeedup ---------- components: Build messages: 357856 nosy: inada.naoki, pablogsal, serhiy.storchaka, vstinner priority: normal severity: normal status: open title: Compile libpython with -fno-semantic-interposition type: performance versions: Python 3.9 _______________________________________ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue38980> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com