Hi, PGO compilation is very slow. I tried very hard to avoid it.
I started annotating the C code with various GCC attributes like
"inline", "always_inline", "hot", etc. I also experimented with the
Linux likely/unlikely macros, which use __builtin_expect(). In the
end... my efforts were worthless. I still had a *major* issue
(benchmark *suddenly* 68% slower! WTF?) with code locality and I
decided to give up. You can still find some macros like
_Py_HOT_FUNCTION and _Py_NO_INLINE in Python ;-) (_Py_NO_INLINE is
used to reduce stack memory usage, that's a different story.)

My sad story with code placement:
https://vstinner.github.io/analysis-python-performance-issue.html

tl;dr Use PGO.

--

Since that time, I removed call_method from pyperformance to fix the
root issue: don't waste your time on micro-benchmarks ;-) ... But I
kept these micro-benchmarks in a different project:
https://github.com/vstinner/pymicrobench

For some specific needs (making a decision on a specific
optimization), micro-benchmarks are sometimes still useful ;-)

Victor

On Tue, Feb 26, 2019 at 23:31, Neil Schemenauer
<nas-pyt...@python.ca> wrote:
>
> On 2019-02-26, Raymond Hettinger wrote:
> > That said, I'm only observing the effect when building with the
> > Mac default Clang (Apple LLVM version 10.0.0 (clang-1000.11.45.5)).
> > When building with GCC 8.3.0, there is no change in performance.
>
> My guess is that the code in _PyEval_EvalFrameDefault() got changed
> enough that Clang started emitting a bit different machine code. If
> the conditional jumps are a bit different, I understand that could
> have a significant difference on performance.
>
> Are you compiling with --enable-optimizations (i.e. PGO)? In my
> experience, that is needed to get meaningful results. Victor also
> mentions that on his "how-to-get-stable-benchmarks" page. Building
> with PGO is really (really) slow so I suspect you are not doing it
> when bisecting. You can speed it up greatly by using a simpler
> command for PROFILE_TASK in Makefile.pre.in. E.g.
>
>     PROFILE_TASK=$(srcdir)/my_benchmark.py
>
> Now that you have narrowed it down to a single commit, it would be
> worth doing the comparison with PGO builds (assuming Clang supports
> that).
>
> > That said, it seems to be compiler specific and only affects the
> > Mac builds, so maybe we can decide that we don't care.
>
> I think the key question is if the ceval loop got a bit slower due
> to logic changes or if Clang just happened to generate a bit worse
> code due to source code details. A PGO build could help answer
> that. I suppose trying to compare machine code is going to produce
> too large of a diff.
>
> Could you try hoisting the eval_breaker expression, as suggested by
> Antoine:
>
> https://discuss.python.org/t/profiling-cpython-with-perf/940/2
>
> If you think a slowdown affects most opcodes, I think the DISPATCH
> change looks like the only cause. Maybe I missed something though.
>
> Also, maybe there would be some value in marking key branches as
> likely/unlikely if it helps Clang generate better machine code.
> Then, even if you compile without PGO (as many people do), you still
> get the better machine code.
>
> Regards,
>
>   Neil
> _______________________________________________
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: 
> https://mail.python.org/mailman/options/python-dev/vstinner%40redhat.com

-- 
Night gathers, and now my watch begins. It shall not end until my death.
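
A minimal sketch of the kind of script the PROFILE_TASK suggestion quoted above could point at. Only the file name my_benchmark.py comes from Neil's example; the Counter class, run_workload() and the iteration count are hypothetical, chosen just to exercise method calls in a tight loop. A profile collected from such a narrow workload will probably tune the build toward that workload, so treat this as something for quick comparisons while bisecting rather than for release builds.

```python
"""Hypothetical my_benchmark.py: a small workload to use as PROFILE_TASK."""
import time


class Counter:
    """Tiny class whose method is called in a hot loop (illustrative only)."""

    def __init__(self):
        self.total = 0

    def add(self, value):
        # Attribute load/store plus integer addition on every call.
        self.total += value
        return self.total


def run_workload(iterations):
    """Call Counter.add() `iterations` times and return the final total."""
    counter = Counter()
    for i in range(iterations):
        counter.add(i)
    return counter.total


def main(iterations=1_000_000):
    # Time the workload so the same script can double as a rough benchmark.
    start = time.perf_counter()
    total = run_workload(iterations)
    elapsed = time.perf_counter() - start
    print(f"total={total} elapsed={elapsed:.3f}s")


if __name__ == "__main__":
    main()
```

The workload is kept in run_workload() so it can be imported and reused from a separate timing harness without running main().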