You missed the point of the PEP: "It becomes possible to experiment with more advanced optimizations in CPython than just micro-optimizations, like tagged pointers."
IMHO it's time to stop wasting our limited developer resources on micro-optimizations and micro-benchmarks, and instead think about overall Python performance and a major redesign of Python's internals, to find a way to make Python 2x faster overall rather than making a specific function 10% faster. I don't think that the performance of accessing namedtuple attributes is a known bottleneck of Python performance.

On Mon, Jun 29, 2020 at 11:37 PM, Raymond Hettinger <raymond.hettin...@gmail.com> wrote:
> $ python3.8 -m timeit -s 'from collections import namedtuple' -s
> 'Point=namedtuple("Point", "x y")' -s 'p=Point(10,20)' 'p.x; p.y; p.x; p.y;
> p.x; p.y'
> 2000000 loops, best of 5: 119 nsec per loop
>
> $ python3.9 -m timeit -s 'from collections import namedtuple' -s
> 'Point=namedtuple("Point", "x y")' -s 'p=Point(10,20)' 'p.x; p.y; p.x; p.y;
> p.x; p.y'
> 2000000 loops, best of 5: 152 nsec per loop

Measuring benchmarks which take less than 1 second requires being very careful. For a microbenchmark which takes around 100 ns, like this one, you are very close to the CPU limit and "everything" becomes important. Python performance depends on the C compiler, on compiler options, on how you run the microbenchmark, on whether --enable-shared is used, etc. Giving microbenchmark results without this information isn't helpful.

On Fedora 32, Python binaries are built by GCC with Link Time Optimization (LTO) and Profile Guided Optimization (PGO). I simply get the same performance between Python 3.8.3 and Python 3.9.0b3:

$ python3.9 -m pyperf timeit --compare-to=python3.8 -s 'from collections import namedtuple' -s 'Point=namedtuple("Point", "x y")' -s 'p=Point(10,20)' 'p.x; p.y; p.x; p.y; p.x; p.y'
python3.8: ..................... 138 ns +- 2 ns
python3.9: ..................... 136 ns +- 3 ns

Mean +- std dev: [python3.8] 138 ns +- 2 ns -> [python3.9] 136 ns +- 3 ns: 1.01x faster (-1%)

(A difference smaller than 10% on a microbenchmark is not significant.)
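As an aside, the "mean +- std dev" reporting that pyperf does can be approximated with the standard library alone. This is only a rough sketch of the idea (pyperf also does warmups, process spawning, and outlier handling, none of which is reproduced here): repeat the measurement several times and look at the spread, rather than trusting a single timeit run.

```python
import statistics
import timeit

setup = (
    "from collections import namedtuple\n"
    "Point = namedtuple('Point', 'x y')\n"
    "p = Point(10, 20)"
)
stmt = "p.x; p.y; p.x; p.y; p.x; p.y"

# timeit.repeat() returns one total time per run; divide by the
# loop count to get a per-loop timing for each run.
runs = timeit.repeat(stmt, setup=setup, repeat=5, number=100_000)
per_loop_ns = [t / 100_000 * 1e9 for t in runs]

print(f"mean {statistics.mean(per_loop_ns):.0f} ns "
      f"+- {statistics.stdev(per_loop_ns):.1f} ns")
```

If the standard deviation is a large fraction of the mean, the two interpreters being compared simply cannot be distinguished at that precision.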
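Since the results depend so heavily on how the interpreter was built, a quick sanity check before comparing numbers is to ask sysconfig about the build. A minimal sketch (these configuration variables are populated on POSIX builds and may be None on Windows):

```python
import sysconfig

# Whether this interpreter links against a shared libpython
# (configure --enable-shared): 1 = shared, 0 = static.
shared = sysconfig.get_config_var("Py_ENABLE_SHARED")

# Compiler and configure arguments used to build CPython;
# LTO/PGO and flags such as -fno-semantic-interposition show
# up here on distribution builds.
cc = sysconfig.get_config_var("CC")
config_args = sysconfig.get_config_var("CONFIG_ARGS")

print("shared libpython:", shared)
print("compiler:", cc)
print("configure args:", config_args)
```

Two interpreters built with different options (say, one with PGO and one without) are not comparable on a 100 ns microbenchmark.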
Whether a static inline function gets inlined is decided by the compiler and depends on many complex factors; I don't think there is any need to elaborate here. The idea of forcing inlining was discussed but rejected when the first C API macros were converted to static inline functions: https://bugs.python.org/issue35059 C compilers are now smart enough to emit the most efficient machine code.

By the way, if you configure Python with --enable-shared, function calls from libpython to libpython have to go through a procedure linkage table (PLT) indirection. Python 3.8 and 3.9 on Fedora 32, and Python 3.8 on RHEL 8, are built with -fno-semantic-interposition to avoid this indirection and so make Python faster. More about this linker flag: https://developers.redhat.com/blog/2020/06/25/red-hat-enterprise-linux-8-2-brings-faster-python-3-8-run-speeds/

Victor

--
Night gathers, and now my watch begins. It shall not end until my death.

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/5AAO45Y276AS5EZDDKTRP6QZ6K5SOOO6/
Code of Conduct: http://python.org/psf/codeofconduct/