On Thu, Aug 15, 2019 at 05:05:48PM +0200, Martin Jansa wrote:
> On Tue, Aug 13, 2019 at 10:04:08AM +0100, Richard Purdie wrote:
> > On Mon, 2019-08-12 at 20:26 +0000, Peter Kjellerstedt wrote:
> > > Comparing that build to a corresponding do-nothing build with Thud,
> > > the time difference matches those three minutes where I have no idea
> > > what bitbake is doing now that it didn’t need to do before…
> > >
> > > Hopefully these time degradations can be solved, because the current
> > > state of bitbake is barely usable. We also need to look into possible
> > > ways to improve the cooker output when it is running the setscene
> > > tasks so it makes some kind of sense again.
> >
> > We talked on irc and you pointed at the commit things started to go
> > wrong. Just to summarise things for the benefit of the list, this is
> > some quick testing I did:
> >
> > "bitbake -p; time bitbake core-image-minimal -n"
> >
> > 30.0s 6c7c0cefd34067311144a1d4c01986fe0a4aef26
> > 30.6s a0d941c787cf3ef030d190903279d311bc05d752
> > 40.3s 7df31ff36892c2f9c65326b06b4c70
> > 42.2s a0542ed3ff700eca35f9195f743c9e28bcd50f3e
> > 45.4s 9983b07fffd19082abded7c3f15cc77d306dd69c
> > 76.9s master-next
>
> I know I'm late to the party, but it took really long for the test to finish..
>
> I'm not using poky, so reproduce this testing on our builds I've used
> bitbake revisions matching with poky revision RP was testing:
>
> + oe-core 18cdc087fd5da30e2b31f3d4e81b153cd36ca844 for older bitbake, because
> current master isn't compatible with old bitbake
> (bb.data_smart.ExpansionError: Failure expanding variable OE_IMPORTED[:=],
> expression was ${@oe_import(d)} which triggered exception AttributeError:
> module 'bb.siggen' has no attribute 'SignatureGeneratorUniHashMixIn')
> and oe-core 23db236a054ee7a989cdbbcb42ad5c6eefd4a6ae for newer bitbake (to
> possibly eliminate impact of metadata changes)
>
> tested on 72core (Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz) with 380GB RAM
>
> nothing in TMPDIR/SSTATE_DIR, no SSTATE_MIRRO, no Hash Equivalence Server
> configured
>
> on a build with few more layers:
> Parsing of 3238 .bb files complete (0 cached, 3238 parsed). 7632 targets,
> 1927 skipped, 54 masked, 0 errors.
>
> but first doing just core-image-minimal like RP did:
>
> time poky bitbake
> oe-core
> 2m8.191s 6c7c0cefd34067311144a1d4c01986fe0a4aef26
> 18d4a31fdcec1f0e5d2199d6142f0ce833fca1a7
> 18cdc087fd5da30e2b31f3d4e81b153cd36ca844
> N/A a0d941c787cf3ef030d190903279d311bc05d752 doesn't exist
> in poky/poky-contrib 18cdc087fd5da30e2b31f3d4e81b153cd36ca844
> 2m17.053s 7df31ff36892c2f9c65326b06b4c70
> 1f630fdf0260db08541d3ca9f25f852931c19905
> 18cdc087fd5da30e2b31f3d4e81b153cd36ca844
> 2m18.515s a0542ed3ff700eca35f9195f743c9e28bcd50f3e
> f43778c2d19e70d4befd483b06aaf247fc65c799
> 23db236a054ee7a989cdbbcb42ad5c6eefd4a6ae
> 2m22.220s 9983b07fffd19082abded7c3f15cc77d306dd69c
> 8c26b451f22193ef1c544e2017cc84515566c1b8
> 2m38.185s master-next
> fbcb89dc3dbabfc80310e9a4ebe72d919300a32e
> cache.py:446: ResourceWarning: unclosed <socket.socket fd=10,
> family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0,
> laddr=('127.0.0.1', 41117)>
> value = pickled.load()
> started showing with this revision
> 2m17.991s master-next + fixups
> f7cd14f90463957b3e4be6d3876def98b78f1424
> 2m9.651s master-next + "Holding off tasks %s" out
>
> now world
> 253m36.637s 7df31ff36892c2f9c65326b06b4c70
> 1f630fdf0260db08541d3ca9f25f852931c19905
> 18cdc087fd5da30e2b31f3d4e81b153cd36ca844
> 174m13.324s a0542ed3ff700eca35f9195f743c9e28bcd50f3e
> f43778c2d19e70d4befd483b06aaf247fc65c799
> 23db236a054ee7a989cdbbcb42ad5c6eefd4a6ae
> this is time when killed at "NOTE: Executing Tasks" without showing any
> progress on the tasks (unlike other tests) and only one bitbake process is
> running
> 633m27.475s master-next
> fbcb89dc3dbabfc80310e9a4ebe72d919300a32e
> this is time when killed at (1417 from 71749) - running very slowly unlike
> other bitbake revisions where the task number changes relatively quickly -
> about 10 tasks per second
> 89m13.992s master-next + "Holding off tasks %s" out
> 89m59.201s master-next updated today
> 85fe627fdb6510f0942917964386fad9d8c479c8
>
> Interestingly old 1f630fdf0260db08541d3ca9f25f852931c19905 is 3 times
> slower than recent master-next.
>
> I'll send -P output next.So 1f630fdf0260db08541d3ca9f25f852931c19905 was already "bad", because this commit introduced that logger.debug(2, "Holding off tasks %s" % pprint.pformat(self.holdoff_tasks)) Here is -P output from it: 5min (til "Executing Tasks" message): profile.log.processed.1f630fdf0260db08541d3ca9f25f852931c19905.return.at.Executing.Tasks http://paste.ubuntu.com/p/Nw6n5hTPmF/ 5hours (whole dry run): profile.log.processed.1f630fdf0260db08541d3ca9f25f852931c19905.whole.dry.run http://paste.ubuntu.com/p/sydFGvYTxQ/ profile.log.processed.1f630fdf0260db08541d3ca9f25f852931c19905.worker.log.processed http://paste.ubuntu.com/p/RMGbm9DyZz/ So for good old baseline I need to go few commits back, to e.g. 18d4a31f RP was testing before: 1f630fdf runqueue: Enable dynamic task adjustment to hash equivalency a0d941c7 siggen: Convert to use self.unitaskhashes 1f326f2c siggen: Add new unitaskhashes data variable which is cached b707d0cb runqueue: Improve scenequeue processing logic 19a6e356 cache: Add SimpleCache class 7bb79099 siggen: Import unihash code from OE-Core 7aec8632 hashserv: SQL Optimizations 18d4a31f fetch2/wget: avoid 'maximum recursion depth' RuntimeErrors when handling 403 codes 94m19.081s profile.log.processed.18d4a31fdcec1f0e5d2199d6142f0ce833fca1a7.whole.dry.run http://paste.ubuntu.com/p/Jnvv8GJNKB/ And here is profile with latest master from yesterday: 90mins: profile.log.processed.a5e084a266f63c2fd370122327615e49beaeb94e.whole.dry.run http://paste.ubuntu.com/p/YhMCHPmg56/ So good news is that current master is a bit faster than "good old" 18d4a31fdcec1f0e5d2199d6142f0ce833fca1a7. But still significantly slower on my fast builder compared to world tests on poky + meta-oe, not sure if it's all just because the size of metadata in my builds (with 71K tasks). Even the dry running the tasks was about 10 tasks/second while RP was seeing around 200 tasks/second. Will try to bump BB_NUMBER_THREADS from 8 to 72. I've already disabled rm_work and PRserv, but that doesn't seem to have any significant impact on performance here. Other than resolving the PRserv terminating and then warnings about lost connection. Cheers, > > > > > So basically the original changes showed a 25% hit but the performance > > of -next is dire. > > > > This is with no hash equiv server configured. > > > > It will vary depending on the target used (numbers with -sato for the > > above would be interesting for comparision) and how much was or is in > > sstate, they type of sstate mirror configured and so on. > > > > I really need to focus on getting the new code functioning correctly > > before we attempt to optimise but if nobody tests the new code due to > > performance problems we have a different issue. We also have a scaling > > problem with the hash server itself I need to fix to stop the > > autobuilder throwing weird errors. I'm therefore a bit challenged on > > where to start with it all :/. > > > > Cheers, > > > > Richard > > > > > > > > -- > > _______________________________________________ > > Openembedded-core mailing list > > [email protected] > > http://lists.openembedded.org/mailman/listinfo/openembedded-core > > -- > Martin 'JaMa' Jansa jabber: [email protected] -- Martin 'JaMa' Jansa jabber: [email protected]
signature.asc
Description: Digital signature
-- _______________________________________________ Openembedded-core mailing list [email protected] http://lists.openembedded.org/mailman/listinfo/openembedded-core
