On 18/6/20, 16:57, "Khem Raj" <[email protected]> wrote:
>
> On Thu, Jun 18, 2020 at 4:47 PM Andre McCurdy <[email protected]> wrote:
> >
> > On Thu, Jun 18, 2020 at 4:25 PM Khem Raj <[email protected]> wrote:
> > >
> > > On Monday, June 15, 2020 1:33:26 PM PDT Ryan Rowe wrote:
> > > > On 14/6/20, 18:05, "Mittal, Anuj" <[email protected]> wrote:
> > > > > On Fri, 2020-06-12 at 21:28 +0000, Ryan Rowe wrote:
> > > > > > Hello Alex,
> > > > > >
> > > > > > I’m investigating Python 3 performance issues on a Raspberry Pi
> > > > > > Yocto build; I appreciate any insights you can provide into the
> > > > > > problem.
> > > > > >
> > > > > > In my investigation, I noticed that PGO was disabled in all cases
> > > > > > due to a small bug. I fixed it in a patch submitted to OE-Core
> > > > > > (#139459). Even when PGO is indeed enabled, Yocto-compiled
> > > > > > Python 3.8.3 runs significantly slower than the same version
> > > > > > compiled on Raspbian.
> > > > > >
> > > > > > In your patch,
> > > > > > 0001-Makefile.pre-use-qemu-wrapper-when-gathering-profile.patch,
> > > > > > I see that you override the default PROFILE_TASK, which did not
> > > > > > explicitly specify test suites, with a command that explicitly
> > > > > > provides test suites. How did you decide on these tests? The
> > > > > > standard PGO command runs 43 tests, while you specify 7. When I
> > > > > > compile Python 3.8.3 on Raspbian, I see no intersection between
> > > > > > the 43 tests run by default and the 7 you specify. Additionally,
> > > > > > the default module for PROFILE is test while you use
> > > > > > test.regrtest.
> > > > >
> > > > > We used to run pybench and then switched to regrtest:
> > > > >
> > > > > https://git.yoctoproject.org/cgit/cgit.cgi/poky/commit/?id=d9f7b9d3ad44195e68b2c1b09e3eb42e623c9a20
> > > > >
> > > > > It looks like the PROFILE_TASK value was changed recently:
> > > > >
> > > > > https://github.com/python/cpython/commit/2406672984e4c1b18629e615edad52928a72ffcc#diff-45e8b91057f0c5b60efcb5944125b585
> > > > >
> > > > > If the performance is actually degrading, maybe we should change
> > > > > it to something more useful. Do you know how much time the default
> > > > > set of tasks takes to run in qemu?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Anuj
> > > >
> > > > Thanks for looking into this. It took me about 20 minutes to run the
> > > > PGO tests and I did notice a significant improvement in Python
> > > > runtime. However, that is compared against a non-PGO build. I have
> > > > not compared the existing PGO arguments against the new upstream
> > > > arguments.
> > > >
> > > > We've come to realize that our performance issues are not due to
> > > > Python, but in fact a much deeper rooted issue. Simple C code takes
> > > > 2-3 times longer to run on our image based on meta-raspberrypi's
> > > > raspberrypi4 machine than on stock Raspbian.
> > > >
> > > > On a side note, it seems that CPython now exposes PROFILE_TASK as a
> > > > configuration option, so we can override that variable with our
> > > > desired profiling arguments rather than modifying the Makefile
> > > > directly with a patch.
> > >
> > > The patch
> > > 0001-Makefile.pre-use-qemu-wrapper-when-gathering-profile.patch
> > > seems to hardcode what tests to run, perhaps it will be better to use
> > > PROFILE_TASK

We can use the default PROFILE_TASK. However, it sounds like Ross had
reason to switch from pybench to regrtest, mainly execution time. In his
commit, he notes "also upstream have removed it from Python and instead
use test.regrtest --pgo to profile the interpreter." This no longer seems
to be true, as upstream now uses test rather than test.regrtest. That
said, the default tests do take 20 minutes to run, which is considerably
longer than the current explicit tests.

> > >
> > > When 3.5 -> 3.7 upgrade was done in
> > >
> > > https://git.openembedded.org/openembedded-core/commit/?id=02714c105426b0d687620913c1a7401b386428b6
> > >
> > > it dropped using PYTHON3_PROFILE_TASK silently, among large swath of
> > > changes this patch carried. I guess we have not checked the py3
> > > runtime performance to detect this regression.
> >
> > Are we sure there is a regression? Ryan posted a follow up saying
> > everything was slower in his tests, not just python.

In case anyone is curious, I did find out the issue. The CPU governor was
powersave rather than ondemand. Silly me, I only checked the min and max
freq, not that they were being used. And a quirk of the OS prevented any
of my benchmarks from printing the observed clock speed during the test;
they printed just empty strings. With this fixed, and when compiling with
upstream PGO in Yocto, I do observe performance comparable to regular
upstream Python 3.8 compiled with PGO on Raspbian.

> regression is disabling it with e53ebf29

Yes, that's correct. This inadvertently disabled PGO entirely. I can do
some tests tomorrow to determine how much performance is lost by running
PGO with these explicit test suites rather than the upstream defaults. I
did notice a performance gain when using PGO, but that was measured
against a non-PGO build.

> > >
> > > so it will be good to reinstate the variable to choose what tests one
> > > wants to run with defaults being whatever is optimal for autobuilder.
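
For anyone who wants to rule out the same governor problem on their own
board, here is a minimal sketch of the check I should have done first. It
assumes the standard Linux cpufreq sysfs layout under
/sys/devices/system/cpu; the function name read_cpufreq and the dictionary
keys are only illustrative and do not come from any tool mentioned in this
thread.

```python
from pathlib import Path

# Standard Linux cpufreq sysfs root (assumption: the kernel exposes cpufreq here).
CPUFREQ_ROOT = Path("/sys/devices/system/cpu")

# Illustrative key names mapped to the sysfs files they are read from.
FIELDS = (
    ("governor", "scaling_governor"),
    ("cur_khz", "scaling_cur_freq"),
    ("min_khz", "scaling_min_freq"),
    ("max_khz", "scaling_max_freq"),
)


def read_cpufreq(root=CPUFREQ_ROOT):
    """Return {cpu_name: {field: value or None}} for every CPU with a cpufreq dir."""
    report = {}
    for cpufreq_dir in sorted(Path(root).glob("cpu[0-9]*/cpufreq")):
        entry = {}
        for key, filename in FIELDS:
            try:
                entry[key] = (cpufreq_dir / filename).read_text().strip()
            except OSError:
                # File missing or unreadable: record None instead of an empty string.
                entry[key] = None
        report[cpufreq_dir.parent.name] = entry
    return report


if __name__ == "__main__":
    for cpu, info in read_cpufreq().items():
        print(cpu, info)
```

Printing the governor and the current frequency next to the min and max
makes it obvious when the clock is pinned low, and recording None for a
file that cannot be read avoids the silent empty strings I ran into.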
> > >
> > > > Thanks,
> > > > Ryan
> > > >
> > > > > > For reference, here are the results of a simple CPU-bound test.
> > > > > > These tests were run on the same Raspberry Pi 4 with the same SD
> > > > > > card.
> > > > > >
> > > > > > python3 -m timeit -r 10 --setup '
> > > > > > def fib(n):
> > > > > >     if n < 2:
> > > > > >         return n
> > > > > >     if n == 2:
> > > > > >         return 1
> > > > > >     return fib(n - 1) + fib(n - 2)
> > > > > > ' '[fib(n) for n in range(20)]'
> > > > > >
> > > > > > # Yocto Python 3.8.3
> > > > > > # 10 loops, best of 10: 28.9 msec per loop
> > > > > > # 10 loops, best of 10: 29.3 msec per loop
> > > > > > # 10 loops, best of 10: 27.9 msec per loop
> > > > > > # 10 loops, best of 10: 30.4 msec per loop
> > > > > > # Average result: 31.625 msec per loop
> > > > > >
> > > > > > # Raspbian Python 3.8.3
> > > > > > # 50 loops, best of 10: 7.73 msec per loop
> > > > > > # 50 loops, best of 10: 7.72 msec per loop
> > > > > > # 50 loops, best of 10: 7.67 msec per loop
> > > > > > # 50 loops, best of 10: 7.74 msec per loop
> > > > > > # Average result: 7.715 msec per loop
> > > > > >
> > > > > > # Raspbian speedup: 4.09x
> > > > > >
> > > > > > Best,
> > > > > > Ryan Rowe
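
As a sanity check on the numbers quoted above, the averages can be
recomputed from the four runs listed per platform. This is only a sketch
using the values exactly as quoted; the variable names are illustrative.
The Raspbian average matches, but the four listed Yocto runs average to
about 29.1 msec rather than the quoted 31.625, which would put the ratio
nearer 3.8x than 4.09x. One of the quoted figures may have been mistyped;
the thread does not say which.

```python
from statistics import mean

# Per-run timings in msec, copied from the quoted benchmark output.
yocto_msec = [28.9, 29.3, 27.9, 30.4]
raspbian_msec = [7.73, 7.72, 7.67, 7.74]

yocto_avg = mean(yocto_msec)
raspbian_avg = mean(raspbian_msec)

# How many times faster the Raspbian build is, based on the listed runs.
speedup = yocto_avg / raspbian_avg

print(f"Yocto average:    {yocto_avg:.3f} msec per loop")
print(f"Raspbian average: {raspbian_avg:.3f} msec per loop")
print(f"Raspbian speedup: {speedup:.2f}x")
```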
