I suggest re-posting this on discuss.python.org as more engaged active core
devs will pay attention to it there.

On Wed, Jan 4, 2023 at 11:12 AM Daan De Meyer <daan.j.deme...@gmail.com>
wrote:

> Hi,
>
> As part of the proposal to enable frame pointers by default in Fedora
> (https://fedoraproject.org/wiki/Changes/fno-omit-frame-pointer), we
> did some benchmarking to figure out the expected performance impact.
> The performance impact was generally minimal, except for the
> pyperformance benchmark suite where we noticed a more substantial
> difference between a system built with frame pointers and a system
> built without frame pointers. The results can be found here:
> https://github.com/DaanDeMeyer/fpbench (look at the mean difference
> column for the pyperformance results where the percentage is the
> slowdown compared to a system built without frame pointers). One of
> the biggest slowdowns was on the scimark_sparse_mat_mult benchmark
> which slowed down 9.5% when the system (including python) was built
> with frame pointers. Note that these benchmarks were run against
> Python 3.11 on a Fedora 37 x86_64 system (one built with frame
> pointers, another built without frame pointers). The system used to
> run the benchmarks was an Amazon EC2 machine.
>
> We did look a bit into the reasons behind this slowdown. I'll quote
> the investigation by Andrii on the Fesco issue thread here
> (https://pagure.io/fesco/issue/2817):
>
> > So I did look a bit at Python with and without frame pointers trying to
> > understand pyperformance > regressions.
>
> > First, perf data suggests that big chunk of CPU is spent in
> _PyEval_EvalFrameDefault,
> > so I looked specifically  into it (also we had to use DWARF mode for
> perf for apples-to-apples
> > comparison, and a bunch of stack traces weren't symbolized properly,
> which just again
> > reminds why having frame pointers is important).
>
> > perf annotation of _PyEval_EvalFrameDefault didn't show any obvious hot
> spots, the work
> > seemed to be distributed pretty similarly with or without frame
> pointers. Also scrolling through
> > _PyEval_EvalFrameDefault disassembly also showed that instruction
> patterns between fp
> > and no-fp versions are very similar.
>
> > But just a few interesting observations.
>
> > The size of _PyEval_EvalFrameDefault function specifically (and all the
> other functions didn't
> > change much in that regard) increased very significantly from 46104 to
> 53592 bytes, which is a
> > considerable 15% increase. Looking deeper, I believe it's all due to
> more stack spills and
> > reloads due to one less register available to keep local variables in
> registers instead of on the stack.
>
> > Looking at _PyEval_EvalFrameDefault C code, it is a humongous one
> function with gigantic switch
> > statement that implements Python instruction handling logic. So the
> function itself is big and it has
> > a lot of local state in different branches, which to me explained why
> there is so much stack spill/load.
>
> > Grepping for instruction of the form mov -0xf0(%rbp),%rcx or mov
> 0x50(%rsp),%r10 (and their reverse
> > variants), I see that there is a substantial amount of stack spill/load
> in _PyEval_EvalFrameDefault
> > disassembly already in default no frame pointer variant (1870 out of
> 11181 total instructions in that
> > function, 16.7%), and it just increases further in frame pointer version
> (2341 out of 11733 instructions, 20%).
>
> > One more interesting observation. With no frame pointers, GCC generates
> stack accesses using %rsp
> > with small positive offsets, which results in pretty compact binary
> instruction representation, e.g.:
>
> > 0x00000000001cce40 <+44160>: 4c 8b 54 24 50          mov
> 0x50(%rsp),%r10
>
> > This uses 5 bytes. But if frame pointers are enabled, GCC switches to
> using %rbp-relative offsets,
> > which are all negative. And that seems to result in much bigger
> instructions, taking now 7 bytes instead of 5:
>
> > 0x00000000001d3969 <+53065>: 48 8b 8d 10 ff ff ff    mov
> -0xf0(%rbp),%rcx
>
> > I found it pretty interesting. I'd imagine GCC should be capable to keep
> using %rsp addressing just fine
> > regardless of %rbp and save on instruction sizes, but apparently it
> doesn't. Not sure why. But this instruction
> > increase, coupled with increase of number of spills/reloads, actually
> explains huge increase in byte size of
> > _PyEval_EvalFrameDefault: (2341 - 1870) * 7 + 1870 * 2 = 7037 (2 extra
> bytes for existing 1870 instructions
> > that were switched from %rsp+positive offset to %rbp + negative offset,
> plus 7 bytes for each of new 471 instructions).
> > I'm no compiler expert, but it would be nice for someone from GCC
> community to check this as well (please CC
> > relevant folks, if you know them).
>
> > In summary, to put it bluntly, there is just more work to do for CPU
> saving/restoring state to/from stack. But I don't
> > think _PyEval_EvalFrameDefault example is typical of how application
> code is written, nor is it, generally speaking,
> > a good idea to do so much within single gigantic function. So I believe
> it's more of an outlier than a typical case.
>
> We have a few questions:
> - Is this slowdown when Python is built with frame pointers to be
> expected? Has the Python community done any of their own experiments
> with building Python with and without frame pointers?
> - Is there anything we can do to fix the slowdown when Python is built
> with frame pointers?
> - Should we expect any change in benchmark results if we benchmark
> against Python 3.12? Supposedly there are changes in Python 3.12
> related to frame pointers so we're wondering if those changes might
> affect these results in any way.
>
> Cheers,
>
> Daan De Meyer
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/LVRUY7KAJ5I532NHMDWJIS5H4HXSGBWD/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/ZYYEL43527Q2FEQ2WRETLJDODKBPZMRA/
Code of Conduct: http://python.org/psf/codeofconduct/

Reply via email to