I suggest re-posting this on discuss.python.org, since more of the active core devs will see it there.
On Wed, Jan 4, 2023 at 11:12 AM Daan De Meyer <daan.j.deme...@gmail.com> wrote:
> Hi,
>
> As part of the proposal to enable frame pointers by default in Fedora
> (https://fedoraproject.org/wiki/Changes/fno-omit-frame-pointer), we
> did some benchmarking to figure out the expected performance impact.
> The performance impact was generally minimal, except for the
> pyperformance benchmark suite where we noticed a more substantial
> difference between a system built with frame pointers and a system
> built without frame pointers. The results can be found here:
> https://github.com/DaanDeMeyer/fpbench (look at the mean difference
> column for the pyperformance results where the percentage is the
> slowdown compared to a system built without frame pointers). One of
> the biggest slowdowns was on the scimark_sparse_mat_mult benchmark
> which slowed down 9.5% when the system (including python) was built
> with frame pointers. Note that these benchmarks were run against
> Python 3.11 on a Fedora 37 x86_64 system (one built with frame
> pointers, another built without frame pointers). The system used to
> run the benchmarks was an Amazon EC2 machine.
>
> We did look a bit into the reasons behind this slowdown. I'll quote
> the investigation by Andrii on the Fesco issue thread here
> (https://pagure.io/fesco/issue/2817):
>
> > So I did look a bit at Python with and without frame pointers trying to
> > understand pyperformance regressions.
> >
> > First, perf data suggests that big chunk of CPU is spent in
> > _PyEval_EvalFrameDefault, so I looked specifically into it (also we had
> > to use DWARF mode for perf for apples-to-apples comparison, and a bunch
> > of stack traces weren't symbolized properly, which just again reminds
> > why having frame pointers is important).
> >
> > perf annotation of _PyEval_EvalFrameDefault didn't show any obvious hot
> > spots, the work seemed to be distributed pretty similarly with or
> > without frame pointers.
> > Also scrolling through _PyEval_EvalFrameDefault disassembly also showed
> > that instruction patterns between fp and no-fp versions are very similar.
> >
> > But just a few interesting observations.
> >
> > The size of _PyEval_EvalFrameDefault function specifically (and all the
> > other functions didn't change much in that regard) increased very
> > significantly from 46104 to 53592 bytes, which is a considerable 15%
> > increase. Looking deeper, I believe it's all due to more stack spills and
> > reloads due to one less register available to keep local variables in
> > registers instead of on the stack.
> >
> > Looking at _PyEval_EvalFrameDefault C code, it is a humongous one
> > function with gigantic switch statement that implements Python
> > instruction handling logic. So the function itself is big and it has a
> > lot of local state in different branches, which to me explained why
> > there is so much stack spill/load.
> >
> > Grepping for instruction of the form mov -0xf0(%rbp),%rcx or
> > mov 0x50(%rsp),%r10 (and their reverse variants), I see that there is a
> > substantial amount of stack spill/load in _PyEval_EvalFrameDefault
> > disassembly already in default no frame pointer variant (1870 out of
> > 11181 total instructions in that function, 16.7%), and it just increases
> > further in frame pointer version (2341 out of 11733 instructions, 20%).
> >
> > One more interesting observation. With no frame pointers, GCC generates
> > stack accesses using %rsp with small positive offsets, which results in
> > pretty compact binary instruction representation, e.g.:
> >
> >     0x00000000001cce40 <+44160>: 4c 8b 54 24 50    mov 0x50(%rsp),%r10
> >
> > This uses 5 bytes. But if frame pointers are enabled, GCC switches to
> > using %rbp-relative offsets, which are all negative.
> > And that seems to result in much bigger instructions, taking now 7 bytes
> > instead of 5:
> >
> >     0x00000000001d3969 <+53065>: 48 8b 8d 10 ff ff ff    mov -0xf0(%rbp),%rcx
> >
> > I found it pretty interesting. I'd imagine GCC should be capable to keep
> > using %rsp addressing just fine regardless of %rbp and save on
> > instruction sizes, but apparently it doesn't. Not sure why. But this
> > instruction increase, coupled with increase of number of spills/reloads,
> > actually explains huge increase in byte size of
> > _PyEval_EvalFrameDefault: (2341 - 1870) * 7 + 1870 * 2 = 7037 (2 extra
> > bytes for existing 1870 instructions that were switched from
> > %rsp+positive offset to %rbp + negative offset, plus 7 bytes for each of
> > new 471 instructions). I'm no compiler expert, but it would be nice for
> > someone from GCC community to check this as well (please CC relevant
> > folks, if you know them).
> >
> > In summary, to put it bluntly, there is just more work to do for CPU
> > saving/restoring state to/from stack. But I don't think
> > _PyEval_EvalFrameDefault example is typical of how application code is
> > written, nor is it, generally speaking, a good idea to do so much within
> > single gigantic function. So I believe it's more of an outlier than a
> > typical case.
>
> We have a few questions:
> - Is this slowdown when Python is built with frame pointers to be
>   expected? Has the Python community done any of their own experiments
>   with building Python with and without frame pointers?
> - Is there anything we can do to fix the slowdown when Python is built
>   with frame pointers?
> - Should we expect any change in benchmark results if we benchmark
>   against Python 3.12? Supposedly there are changes in Python 3.12
>   related to frame pointers so we're wondering if those changes might
>   affect these results in any way.
>
> Cheers,
>
> Daan De Meyer
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/LVRUY7KAJ5I532NHMDWJIS5H4HXSGBWD/
> Code of Conduct: http://python.org/psf/codeofconduct/
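
On the 5-byte versus 7-byte observation quoted above: as far as I understand the x86-64 encoding rules, the extra bytes come from the size of the displacement rather than its sign. An offset in the range -128..127 fits in a one-byte displacement, while -0xf0 (-240) needs a four-byte one. %rsp-based addressing additionally needs a SIB byte. The sketch below is a hand-written encoder for the single form `mov disp(%base), %dst` only. The function name `encode_mov_load` and the `REGS` table are mine, not from any assembler library, and it is only meant to reproduce the two byte sequences from the quoted disassembly.

```python
import struct

# Register numbers as used in x86-64 ModRM/SIB/REX fields.
REGS = {name: num for num, name in enumerate(
    ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
     "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"])}


def encode_mov_load(dst, base, disp):
    """Encode `mov disp(%base), %dst` (64-bit load, no index register).

    Hypothetical helper for illustration; it covers only this one form.
    """
    d, b = REGS[dst], REGS[base]
    # REX.W selects 64-bit operands; REX.R / REX.B extend dst / base.
    rex = 0x48 | ((d >> 3) << 2) | (b >> 3)
    if disp == 0 and (b & 7) != 5:
        # No displacement at all (not available for rbp/r13 as base).
        mod, disp_bytes = 0b00, b""
    elif -128 <= disp <= 127:
        # One-byte signed displacement.
        mod, disp_bytes = 0b01, struct.pack("<b", disp)
    else:
        # Four-byte signed displacement.
        mod, disp_bytes = 0b10, struct.pack("<i", disp)
    modrm = (mod << 6) | ((d & 7) << 3) | (b & 7)
    # rsp/r12 as base require a SIB byte (0x24: no index, base in rm).
    sib = b"\x24" if (b & 7) == 4 else b""
    return bytes([rex, 0x8B, modrm]) + sib + disp_bytes


if __name__ == "__main__":
    for dst, base, disp in [("r10", "rsp", 0x50), ("rcx", "rbp", -0xF0),
                            ("rcx", "rbp", -0x70), ("rcx", "rsp", 0xF0)]:
        code = encode_mov_load(dst, base, disp)
        print(f"mov {disp:#x}(%{base}),%{dst}: {code.hex(' ')} ({len(code)} bytes)")
```

Under these rules an %rbp-relative load with an offset in -128..-1 would take 4 bytes, and an %rsp-relative load with offset 0xf0 would take 8. So the growth seems to depend on how far each slot sits from the base register, which is worth keeping in mind when reading the size estimate in the quoted analysis.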
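
For anyone who wants to re-check the numbers in the quoted analysis, here is a small sketch that recomputes them. All figures are copied from the message above. The constant and function names are mine, and the growth estimate uses the same simplifying assumption as the quoted text: every existing spill/reload grows from 5 to 7 bytes and every new one costs 7 bytes.

```python
# Figures quoted from the analysis of _PyEval_EvalFrameDefault.
SIZE_NO_FP, SIZE_FP = 46104, 53592        # function size in bytes
SPILLS_NO_FP, INSNS_NO_FP = 1870, 11181   # spill/reload count, total instructions
SPILLS_FP, INSNS_FP = 2341, 11733
RSP_LEN, RBP_LEN = 5, 7                   # bytes per spill/reload instruction


def spill_share(spills, total):
    """Percentage of instructions that are stack spills/reloads."""
    return 100.0 * spills / total


def estimated_growth():
    """Estimated size increase in bytes, per the quoted formula."""
    new_spills = SPILLS_FP - SPILLS_NO_FP
    return new_spills * RBP_LEN + SPILLS_NO_FP * (RBP_LEN - RSP_LEN)


actual_growth = SIZE_FP - SIZE_NO_FP

if __name__ == "__main__":
    print(f"spill share without fp: {spill_share(SPILLS_NO_FP, INSNS_NO_FP):.1f}%")
    print(f"spill share with fp:    {spill_share(SPILLS_FP, INSNS_FP):.1f}%")
    print(f"estimated growth: {estimated_growth()} bytes")
    print(f"actual growth:    {actual_growth} bytes "
          f"({100.0 * actual_growth / SIZE_NO_FP:.1f}% of the no-fp size)")
```

The estimate of 7037 bytes accounts for most of the measured 7488-byte increase. The relative increase works out to roughly 16%, slightly above the 15% mentioned in the quoted text.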