On Monday, June 15, 2020 1:33:26 PM PDT Ryan Rowe wrote:
> On 14/6/20, 18:05, "Mittal, Anuj" <[email protected]> wrote:
> 
> > On Fri, 2020-06-12 at 21:28 +0000, Ryan Rowe wrote:
> > 
> > > Hello Alex,
> > >
> > > I’m investigating Python 3 performance issues on a Raspberry Pi Yocto
> > > build; I would appreciate any insights you can provide into the problem.
> > >
> > > In my investigation, I noticed that PGO was disabled in all cases due
> > > to a small bug, which I fixed in a patch submitted to OE-Core (#139459).
> > > Even with PGO enabled, Python 3.8.3 built with Yocto runs significantly
> > > slower than the same version compiled on Raspbian.
> > >
> > > In your patch, 0001-Makefile.pre-use-qemu-wrapper-when-gathering-profile.patch,
> > > I see that you override the default PROFILE_TASK, which did not
> > > explicitly specify test suites, with a command that explicitly lists
> > > test suites. How did you decide on these tests? The standard PGO
> > > command runs 43 tests, while you specify 7. When I compile Python 3.8.3
> > > on Raspbian, I see no intersection between the 43 tests run by default
> > > and the 7 you specify. Additionally, the default module for
> > > PROFILE_TASK is test, while you use test.regrtest.
> >
> > We used to run pybench and then switched to regrtest:
> >
> > https://git.yoctoproject.org/cgit/cgit.cgi/poky/commit/?id=d9f7b9d3ad44195e68b2c1b09e3eb42e623c9a20
> >
> > It looks like the PROFILE_TASK value was changed recently upstream:
> >
> > https://github.com/python/cpython/commit/2406672984e4c1b18629e615edad52928a72ffcc#diff-45e8b91057f0c5b60efcb5944125b585
> >
> > If the performance is actually degrading, maybe we should change it to
> > something more useful. Do you know how much time the default set of
> > tasks takes to run in qemu?
> >
> > Thanks,
> > Anuj
> 
> 
> Thanks for looking into this. It took me about 20 minutes to run the PGO
> tests and I did notice a significant improvement in Python runtime.
> However, that is compared against a non-PGO build. I have not compared
> the existing PGO arguments against the new upstream arguments.
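One rough way to answer the timing question is to time the profiling workload directly. The sketch below is illustrative only: it runs natively with the current interpreter rather than under the qemu wrapper used during the build, it assumes the `test` package is installed on the image, and it assumes `-m test --pgo` is the upstream 3.8 default PROFILE_TASK. The names `DEFAULT_PROFILE_TASK`, `profile_task_argv` and `time_command` are hypothetical.

```python
import shlex
import subprocess
import sys
import time

# Assumed upstream default PROFILE_TASK for CPython 3.8 (hypothetical constant name).
DEFAULT_PROFILE_TASK = "-m test --pgo"


def profile_task_argv(task=DEFAULT_PROFILE_TASK):
    """Turn a PROFILE_TASK string into an argv for the running interpreter."""
    return [sys.executable, *shlex.split(task)]


def time_command(argv):
    """Run argv to completion; return (exit status, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    completed = subprocess.run(argv)
    return completed.returncode, time.perf_counter() - start
```

For example, `status, seconds = time_command(profile_task_argv())` would time the assumed default profiling run, and passing a different task string makes it possible to compare it against the 7-test list from the patch.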
> 
> We've come to realize that our performance issues are not due to Python
> but to a much deeper-rooted issue: simple C code takes 2-3 times longer
> to run on our image based on meta-raspberrypi's raspberrypi4 machine
> than on stock Raspbian.
> 
> On a side note, it seems that CPython now exposes PROFILE_TASK as a
> configuration option, so we can override that variable with our
> desired profiling arguments rather than modifying the Makefile
> directly with a patch.
> 
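As a minimal sketch of the configure-time override described above, the snippet below builds a `configure` command line that enables PGO with `--enable-optimizations` and passes PROFILE_TASK as a `VAR=VALUE` argument, which autoconf-generated configure scripts accept. It assumes the CPython release being built reads PROFILE_TASK at configure time; the helper name `configure_argv` and the example task string are illustrative, and in a recipe the same assignment would go into the configure arguments rather than a Python script.

```python
import shlex


def configure_argv(profile_task, extra_args=()):
    """Return a CPython configure command line that sets PROFILE_TASK."""
    return [
        "./configure",
        "--enable-optimizations",  # turns on the PGO build
        f"PROFILE_TASK={profile_task}",  # passed as a VAR=VALUE argument
        *extra_args,
    ]


# Print the command with shell quoting, e.g. for pasting into a terminal.
print(shlex.join(configure_argv("-m test --pgo")))
```

Keeping the task string as a single value is what makes it easy to choose a different set of profiling tests without carrying a Makefile patch.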

The patch 0001-Makefile.pre-use-qemu-wrapper-when-gathering-profile.patch
seems to hardcode which tests to run; perhaps it would be better to use
PROFILE_TASK.

When the 3.5 -> 3.7 upgrade was done in

https://git.openembedded.org/openembedded-core/commit/?id=02714c105426b0d687620913c1a7401b386428b6

it silently dropped the use of PYTHON3_PROFILE_TASK, among the large swath of
changes that patch carried. I guess we have not been checking py3 runtime
performance, so this regression went undetected.

So it would be good to reinstate the variable, letting one choose which tests
to run, with the default being whatever is optimal for the autobuilder.

> Thanks,
> Ryan
> 
> 
> > >
> > > For reference, here are the results of a simple CPU-bound test. Both
> > > sets of runs used the same Raspberry Pi 4 and the same SD card.
> > >
> > > python3 -m timeit -r 10 --setup '
> > > def fib(n):
> > >     if n < 2:
> > >         return n
> > >     if n == 2:
> > >         return 1
> > >     return fib(n - 1) + fib(n - 2)
> > > ' '[fib(n) for n in range(20)]'
> > >
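The same measurement can be reproduced from Python with the standard `timeit` module, which may make it easier to script repeated comparisons between images. This is a sketch, not part of the original report: `best_per_loop_msec` is a hypothetical helper that mirrors the command-line behaviour of choosing the loop count with `Timer.autorange()` and reporting the fastest of the repeats, divided by the loop count.

```python
import timeit

# Setup code equivalent to the --setup string of the command above.
SETUP = """
def fib(n):
    if n < 2:
        return n
    if n == 2:
        return 1
    return fib(n - 1) + fib(n - 2)
"""
STMT = "[fib(n) for n in range(20)]"


def best_per_loop_msec(repeat=10):
    """Return (loops, best msec per loop), roughly like `python3 -m timeit -r REPEAT`."""
    timer = timeit.Timer(STMT, setup=SETUP)
    # autorange() increases the loop count until one repetition takes at
    # least 0.2 s; the command-line interface picks "N loops" the same way.
    loops, _ = timer.autorange()
    times = timer.repeat(repeat=repeat, number=loops)
    return loops, min(times) / loops * 1000.0
```

Calling `best_per_loop_msec()` on each image gives numbers comparable to the command's output; the loop count can differ between machines, which is why the two reports show 10 and 50 loops.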
> > > # Yocto Python 3.8.3
> > > # 10 loops, best of 10: 28.9 msec per loop
> > > # 10 loops, best of 10: 29.3 msec per loop
> > > # 10 loops, best of 10: 27.9 msec per loop
> > > # 10 loops, best of 10: 30.4 msec per loop
> > > # Average result: 29.125 msec per loop
> > >
> > > # Raspbian Python 3.8.3
> > > # 50 loops, best of 10: 7.73 msec per loop
> > > # 50 loops, best of 10: 7.72 msec per loop
> > > # 50 loops, best of 10: 7.67 msec per loop
> > > # 50 loops, best of 10: 7.74 msec per loop
> > > # Average result: 7.715 msec per loop
> > >
> > > # Raspbian speedup: 3.78x
> > >
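A quick way to double-check the summary lines is to recompute them from the individual runs. The snippet below copies the four per-loop times listed for each build and derives the two averages and their ratio; it checks only the arithmetic, not the measurements themselves.

```python
from statistics import mean

# Per-loop times (msec) copied from the four runs listed for each build.
yocto_ms = [28.9, 29.3, 27.9, 30.4]
raspbian_ms = [7.73, 7.72, 7.67, 7.74]

yocto_avg = mean(yocto_ms)
raspbian_avg = mean(raspbian_ms)
speedup = yocto_avg / raspbian_avg  # how many times faster Raspbian is

print(f"Yocto average:    {yocto_avg:.3f} msec per loop")
print(f"Raspbian average: {raspbian_avg:.3f} msec per loop")
print(f"Raspbian speedup: {speedup:.2f}x")
```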
> > > Best,
> > > Ryan Rowe
> > > 
> 

View/Reply Online (#139675): https://lists.openembedded.org/g/openembedded-core/message/139675