On Thu, Jun 18, 2020 at 4:47 PM Andre McCurdy <[email protected]> wrote:
>
> On Thu, Jun 18, 2020 at 4:25 PM Khem Raj <[email protected]> wrote:
> >
> > On Monday, June 15, 2020 1:33:26 PM PDT Ryan Rowe wrote:
> > > On 14/6/20, 18:05, "Mittal, Anuj" <[email protected]> wrote:
> > >
> > > > On Fri, 2020-06-12 at 21:28 +0000, Ryan Rowe wrote:
> > > >
> > > > > Hello Alex,
> > > > >
> > > > >
> > > > >
> > > > > I’m investigating Python 3 performance issues on a Raspberry Pi Yocto
> > > > > build; I’d appreciate any insights you can provide into the problem.
> > > > >
> > > > >
> > > > >
> > > > > In my investigation, I noticed that PGO was disabled in all cases due
> > > > > to a small bug, which I fixed in a patch submitted to OE-Core (#139459).
> > > > > Even with PGO enabled, Python 3.8.3 compiled by Yocto runs significantly
> > > > > slower than the same version compiled on Raspbian.
> > > > >
> > > > >
> > > > >
> > > > > In your patch, 0001-Makefile.pre-use-qemu-wrapper-when-gathering-
> > > > > profile.patch, I see that you override the default PROFILE_TASK,
> > > > > which did not explicitly specify test suites, to a command that
> > > > > explicitly provides test suites. How did you decide on these tests?
> > > > > The standard PGO command runs 43 tests, while you specify 7. When I
> > > > > compile Python 3.8.3 on Raspbian, I see no intersection between the
> > > > > 43 tests run by default and the 7 you specify. Additionally, the
> > > > > default module in PROFILE_TASK is test, while you use test.regrtest.
> > > >
> > > >
> > > >
> > > > We used to run pybench and then switched to regrtest:
> > > >
> > > >
> > > >
> > > > https://git.yoctoproject.org/cgit/cgit.cgi/poky/commit/?id=d9f7b9d3ad44195e68b2c1b09e3eb42e623c9a20
> > > >
> > > >
> > > > The PROFILE_TASK value it looks like was changed recently:
> > > >
> > > >
> > > >
> > > > https://github.com/python/cpython/commit/2406672984e4c1b18629e615edad52928a72ffcc#diff-45e8b91057f0c5b60efcb5944125b585
> > > >
> > > >
> > > > If the performance is actually degrading, maybe we should change it to
> > > > something more useful. Do you know how much time the default set of
> > > > tasks takes to run in qemu?
> > > >
> > > >
> > > >
> > > > Thanks,
> > > >
> > > >
> > > >
> > > > Anuj
> > >
> > >
> > > Thanks for looking into this. It took me about 20 minutes to run the PGO
> > > tests and I did notice a significant improvement in Python runtime.
> > > However, that is compared against a non-PGO build. I have not compared
> > > the existing PGO arguments against the new upstream arguments.
> > >
> > > We've come to realize that our performance issues are not due to Python
> > > but to a much more deeply rooted problem: simple C code takes 2-3 times
> > > longer to run on our image, based on meta-raspberrypi's raspberrypi4
> > > machine, than on stock Raspbian.
> > >
> > > On a side note, it seems that CPython now exposes PROFILE_TASK as a
> > > configuration option, so we can override that variable with our desired
> > > profiling arguments rather than patching the Makefile directly.
> > >
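A minimal sketch, in Python for illustration, of what overriding PROFILE_TASK at configure time could look like. It assumes, as described above, that CPython's configure script accepts PROFILE_TASK as a variable (autoconf-generated configure scripts accept VAR=VALUE arguments). The helper name `configure_argv` and the `-m test --pgo` task string are hypothetical examples, not what the recipe currently uses.

```python
import shlex


def configure_argv(profile_task, extra_args=()):
    """Build a hypothetical CPython ./configure command line.

    --enable-optimizations turns on the PGO build; PROFILE_TASK=... is
    passed as a VAR=VALUE argument, assuming configure accepts it.
    """
    return [
        "./configure",
        "--enable-optimizations",
        *extra_args,
        f"PROFILE_TASK={profile_task}",
    ]


if __name__ == "__main__":
    # Example task string only; choose whatever test set suits the target.
    cmd = configure_argv("-m test --pgo")
    # shlex.join (Python 3.8+) quotes the argument that contains spaces.
    print(shlex.join(cmd))
```

Run as-is, it prints `./configure --enable-optimizations 'PROFILE_TASK=-m test --pgo'`. Passing the value through configure keeps the test selection out of a Makefile patch, which is the point Ryan raises.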
> >
> > The patch 0001-Makefile.pre-use-qemu-wrapper-when-gathering-profile.patch
> > seems to hardcode which tests to run; perhaps it would be better to use
> > PROFILE_TASK.
> >
> > When the 3.5 -> 3.7 upgrade was done in
> >
> > https://git.openembedded.org/openembedded-core/commit/?id=02714c105426b0d687620913c1a7401b386428b6
> >
> > it silently dropped the use of PYTHON3_PROFILE_TASK, among the large swath
> > of changes that patch carried. I guess we have not been checking py3
> > runtime performance, so this regression went undetected.
>
> Are we sure there is a regression? Ryan posted a follow up saying
> everything was slower in his tests, not just python.

The regression is that it was disabled with e53ebf29.

>
> > so it would be good to reinstate the variable, letting one choose which
> > tests to run, with the default being whatever is optimal for the autobuilder.
> >
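The "variable with a sensible default" idea can be sketched in Python as follows. In the actual recipe this would presumably be a BitBake weak default (`?=`) that local.conf can override; here a plain mapping stands in for the BitBake datastore, `profile_task_argv` is a hypothetical helper, and the default task string is a placeholder rather than a tuned autobuilder setting.

```python
import os
import shlex

# Hypothetical default; the real default should be whatever works best
# on the autobuilder.
DEFAULT_PROFILE_TASK = "-m test --pgo"


def profile_task_argv(python="python3", env=None):
    """Return the argv used to gather the PGO profile.

    A non-empty PYTHON3_PROFILE_TASK overrides the default, roughly like
    BitBake's "?=" weak assignment.
    """
    env = os.environ if env is None else env
    task = env.get("PYTHON3_PROFILE_TASK") or DEFAULT_PROFILE_TASK
    return [python, *shlex.split(task)]


if __name__ == "__main__":
    print(profile_task_argv(env={}))
    print(profile_task_argv(
        env={"PYTHON3_PROFILE_TASK": "-m test.regrtest --pgo test_grammar"}))
```

The design choice is that users who care about a particular workload can name their own tests, while everyone else gets the autobuilder default without touching the recipe.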
> > > Thanks,
> > > Ryan
> > >
> > >
> > > > >
> > > > >
> > > > > For reference, here are the results of a simple CPU-bound test. These
> > > > > tests were run on the same Raspberry Pi 4 with the same SD card.
> > > > >
> > > > >
> > > > >
> > > > > python3 -m timeit -r 10 --setup '
> > > > > def fib(n):
> > > > >     if n < 2:
> > > > >         return n
> > > > >     if n == 2:
> > > > >         return 1
> > > > >     return fib(n - 1) + fib(n - 2)
> > > > > ' '[fib(n) for n in range(20)]'
> > > > >
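The same benchmark can also be driven from a script through the standard timeit module, which makes it easier to repeat identically on both images. This is a minimal sketch: the command line autoranges the loop count (hence 10 loops on one image and 50 on the other), whereas this version fixes `number` so both runs do the same amount of work. `best_msec_per_loop` is a hypothetical helper name.

```python
import timeit

# Same setup as the command-line run: a naive recursive Fibonacci.
SETUP = """
def fib(n):
    if n < 2:
        return n
    if n == 2:
        return 1
    return fib(n - 1) + fib(n - 2)
"""

STMT = "[fib(n) for n in range(20)]"


def best_msec_per_loop(number=10, repeat=10):
    """Best (minimum) time per loop in milliseconds, like `timeit -r`."""
    totals = timeit.repeat(stmt=STMT, setup=SETUP, repeat=repeat, number=number)
    return min(totals) / number * 1000.0


if __name__ == "__main__":
    print(f"{best_msec_per_loop():.2f} msec per loop")
```

Taking the minimum over repeats, as the command line does, reduces noise from other activity on the board.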
> > > > >
> > > > >
> > > > > # Yocto Python 3.8.3
> > > > > # 10 loops, best of 10: 28.9 msec per loop
> > > > > # 10 loops, best of 10: 29.3 msec per loop
> > > > > # 10 loops, best of 10: 27.9 msec per loop
> > > > > # 10 loops, best of 10: 30.4 msec per loop
> > > > > # Average result: 29.125 msec per loop
> > > > >
> > > > >
> > > > >
> > > > > # Raspbian Python 3.8.3
> > > > > # 50 loops, best of 10: 7.73 msec per loop
> > > > > # 50 loops, best of 10: 7.72 msec per loop
> > > > > # 50 loops, best of 10: 7.67 msec per loop
> > > > > # 50 loops, best of 10: 7.74 msec per loop
> > > > > # Average result: 7.715 msec per loop
> > > > >
> > > > >
> > > > >
> > > > > # Raspbian speedup: 3.78x
> > > > >
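To check the summary lines, here is a small sketch that recomputes the averages and the speedup ratio from the four best-of-10 timings listed for each image. The values are copied from the runs above; nothing new is measured.

```python
from statistics import mean

# Best-of-10 timings (msec per loop) copied from the four runs per image.
yocto = [28.9, 29.3, 27.9, 30.4]
raspbian = [7.73, 7.72, 7.67, 7.74]

yocto_avg = mean(yocto)        # about 29.125
raspbian_avg = mean(raspbian)  # about 7.715
speedup = yocto_avg / raspbian_avg

print(f"Yocto average:    {yocto_avg:.3f} msec per loop")
print(f"Raspbian average: {raspbian_avg:.3f} msec per loop")
print(f"Raspbian speedup: {speedup:.2f}x")
```

With these inputs the script reports averages of 29.125 and 7.715 msec per loop and a speedup of 3.78x.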
> > > > >
> > > > >
> > > > > Best,
> > > > > Ryan Rowe
> > > > >
> > >
> > >
> >
> >
> >
> >
> > 
-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#139677): 
https://lists.openembedded.org/g/openembedded-core/message/139677
-=-=-=-=-=-=-=-=-=-=-=-
