Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
On Fri, Aug 16, 2019 at 07:22:55PM +0200, Martin Jansa wrote: > On Fri, Aug 16, 2019 at 04:54:48PM +0100, richard.pur...@linuxfoundation.org > wrote: > > On Fri, 2019-08-16 at 17:00 +0200, Martin Jansa wrote: > > > > Will try to bump BB_NUMBER_THREADS from 8 to 72. > > > > > > I've tried to remove icecc.bbclass inherit (because it does things > > > while parsing and RP probably doesn't have it inherited), but that > > > didn't save much time. > > > > > > All 3 tests were with bitbake > > > 18d4a31fdcec1f0e5d2199d6142f0ce833fca1a7 > > > 94m19.081s 8 BB_NUMBER_THREADS and icecc > > > 82m59.574s 8 BB_NUMBER_THREADS no icecc > > > 68m3.556s72 BB_NUMBER_THREADS no icecc > > > > > > Still don't know how to get to sub 10min world runs RP was seeing, > > > but at least it's as slow as it was before runqeueu changes (or even > > > a bit faster in lastest master). > > > > Just thinking out loud, other things which can influence timings: > > > > Is SSTATE_DIR on NFS or local disk? > > SSTATE_DIR is empty directory on local disk > /dev/mapper/vg00-jenkins on /jenkins type ext4 > (rw,noatime,nobarrier,commit=6000,stripe=128,data=ordered) > > > Are sstate mirrors configured? > > None, normally I have SSTATE_MIRRORS over sshfs in this case, but I've > removed it before any performance testing. > > > Is there an existing build or not, if so, how much is valid? > > Nothing, I remove whole TMPDIR and cache before each run. Then let it > recreate the cache before starting the profile: > bitbake -p; time bitbake world -P -n > > > Underlying filesystem of the build? > > ext4, everything is pretty much generic Ubuntu 18.04 > > There is plenty of ram, I'll try to test this from tmpfs as well. I did the same "bitbake -p; time bitbake world -P -n" on the same build with thud, warrior and zeus: thud/profile.log.processed: 2803894450 function calls (2803191143 primitive calls) in 3340.784 seconds thud/profile-worker.log.processed: 27005515 function calls (26944030 primitive calls) in 3243.139 seconds warrior/profile.log.processed: 2804294486 function calls (2803446296 primitive calls) in 3604.226 seconds warrior/profile-worker.log.processed: 27649679 function calls (27591220 primitive calls) in 3503.772 seconds zeus/profile.log.processed: 2327031298 function calls (2326396186 primitive calls) in 4433.851 seconds zeus/profile-worker.log.processed: 28829453 function calls (28771730 primitive calls) in 4331.257 seconds thud/time:real55m41.595s warrior/time:real60m4.954s zeus/time:real72m23.053s with multilib disabled I got zeus-no-multilib/profile.log.processed: 1232798107 function calls (1232447521 primitive calls) in 1773.164 seconds zeus-no-multilib/profile-worker.log.processed: 15561565 function calls (1044 primitive calls) in 1716.234 seconds real29m33.658s Which seems reasonable as the number of tasks was cut in half. Let me know if it's worth uploading these profiles somewhere. Cheers, > > Your build seems especially slow at executing through the tasks which > > is effectively a test on how fast the system can fork() and return in > > some ways. It would be interesting to block dry-run on the server side, > > skip the fork and see how the numbers compare. > > As discussed on IRC, it's slower than yours (8 minutes from 68), but the > most significant chunk of time is lost somewhere else. > > > My build manages some parts of the tasklist faster than others, perhaps > > because there are more "free" tasks to execute at some points in the > > task graph than others. > > > > Also, I have some patches which improve performance a bit which I'm > > still testing. > > Thanks for all the work on this! > > Cheers, > -- > Martin 'JaMa' Jansa jabber: martin.ja...@gmail.com -- Martin 'JaMa' Jansa jabber: martin.ja...@gmail.com signature.asc Description: Digital signature -- ___ Openembedded-core mailing list Openembedded-core@lists.openembedded.org http://lists.openembedded.org/mailman/listinfo/openembedded-core
Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
On Fri, Aug 16, 2019 at 04:54:48PM +0100, richard.pur...@linuxfoundation.org wrote: > On Fri, 2019-08-16 at 17:00 +0200, Martin Jansa wrote: > > > Will try to bump BB_NUMBER_THREADS from 8 to 72. > > > > I've tried to remove icecc.bbclass inherit (because it does things > > while parsing and RP probably doesn't have it inherited), but that > > didn't save much time. > > > > All 3 tests were with bitbake > > 18d4a31fdcec1f0e5d2199d6142f0ce833fca1a7 > > 94m19.081s 8 BB_NUMBER_THREADS and icecc > > 82m59.574s 8 BB_NUMBER_THREADS no icecc > > 68m3.556s72 BB_NUMBER_THREADS no icecc > > > > Still don't know how to get to sub 10min world runs RP was seeing, > > but at least it's as slow as it was before runqeueu changes (or even > > a bit faster in lastest master). > > Just thinking out loud, other things which can influence timings: > > Is SSTATE_DIR on NFS or local disk? SSTATE_DIR is empty directory on local disk /dev/mapper/vg00-jenkins on /jenkins type ext4 (rw,noatime,nobarrier,commit=6000,stripe=128,data=ordered) > Are sstate mirrors configured? None, normally I have SSTATE_MIRRORS over sshfs in this case, but I've removed it before any performance testing. > Is there an existing build or not, if so, how much is valid? Nothing, I remove whole TMPDIR and cache before each run. Then let it recreate the cache before starting the profile: bitbake -p; time bitbake world -P -n > Underlying filesystem of the build? ext4, everything is pretty much generic Ubuntu 18.04 There is plenty of ram, I'll try to test this from tmpfs as well. > Your build seems especially slow at executing through the tasks which > is effectively a test on how fast the system can fork() and return in > some ways. It would be interesting to block dry-run on the server side, > skip the fork and see how the numbers compare. As discussed on IRC, it's slower than yours (8 minutes from 68), but the most significant chunk of time is lost somewhere else. > My build manages some parts of the tasklist faster than others, perhaps > because there are more "free" tasks to execute at some points in the > task graph than others. > > Also, I have some patches which improve performance a bit which I'm > still testing. Thanks for all the work on this! Cheers, -- Martin 'JaMa' Jansa jabber: martin.ja...@gmail.com signature.asc Description: Digital signature -- ___ Openembedded-core mailing list Openembedded-core@lists.openembedded.org http://lists.openembedded.org/mailman/listinfo/openembedded-core
Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
On Fri, 2019-08-16 at 17:00 +0200, Martin Jansa wrote: > > Will try to bump BB_NUMBER_THREADS from 8 to 72. > > I've tried to remove icecc.bbclass inherit (because it does things > while parsing and RP probably doesn't have it inherited), but that > didn't save much time. > > All 3 tests were with bitbake > 18d4a31fdcec1f0e5d2199d6142f0ce833fca1a7 > 94m19.081s 8 BB_NUMBER_THREADS and icecc > 82m59.574s 8 BB_NUMBER_THREADS no icecc > 68m3.556s72 BB_NUMBER_THREADS no icecc > > Still don't know how to get to sub 10min world runs RP was seeing, > but at least it's as slow as it was before runqeueu changes (or even > a bit faster in lastest master). Just thinking out loud, other things which can influence timings: Is SSTATE_DIR on NFS or local disk? Are sstate mirrors configured? Is there an existing build or not, if so, how much is valid? Underlying filesystem of the build? Your build seems especially slow at executing through the tasks which is effectively a test on how fast the system can fork() and return in some ways. It would be interesting to block dry-run on the server side, skip the fork and see how the numbers compare. My build manages some parts of the tasklist faster than others, perhaps because there are more "free" tasks to execute at some points in the task graph than others. Also, I have some patches which improve performance a bit which I'm still testing. Cheers, Richard -- ___ Openembedded-core mailing list Openembedded-core@lists.openembedded.org http://lists.openembedded.org/mailman/listinfo/openembedded-core
Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
> Will try to bump BB_NUMBER_THREADS from 8 to 72. I've tried to remove icecc.bbclass inherit (because it does things while parsing and RP probably doesn't have it inherited), but that didn't save much time. All 3 tests were with bitbake 18d4a31fdcec1f0e5d2199d6142f0ce833fca1a7 94m19.081s 8 BB_NUMBER_THREADS and icecc 82m59.574s 8 BB_NUMBER_THREADS no icecc 68m3.556s72 BB_NUMBER_THREADS no icecc Still don't know how to get to sub 10min world runs RP was seeing, but at least it's as slow as it was before runqeueu changes (or even a bit faster in lastest master). On Fri, Aug 16, 2019 at 12:24 PM Martin Jansa wrote: > On Thu, Aug 15, 2019 at 05:05:48PM +0200, Martin Jansa wrote: > > On Tue, Aug 13, 2019 at 10:04:08AM +0100, Richard Purdie wrote: > > > On Mon, 2019-08-12 at 20:26 +, Peter Kjellerstedt wrote: > > > > Comparing that build to a corresponding do-nothing build with Thud, > > > > the time difference matches those three minutes where I have no idea > > > > what bitbake is doing now that it didn’t need to do before… > > > > > > > > Hopefully these time degradations can be solved, because the current > > > > state of bitbake is barely usable. We also need to look into possible > > > > ways to improve the cooker output when it is running the setscene > > > > tasks so it makes some kind of sense again. > > > > > > We talked on irc and you pointed at the commit things started to go > > > wrong. Just to summarise things for the benefit of the list, this is > > > some quick testing I did: > > > > > > "bitbake -p; time bitbake core-image-minimal -n" > > > > > > 30.0s 6c7c0cefd34067311144a1d4c01986fe0a4aef26 > > > 30.6s a0d941c787cf3ef030d190903279d311bc05d752 > > > 40.3s 7df31ff36892c2f9c65326b06b4c70 > > > 42.2s a0542ed3ff700eca35f9195f743c9e28bcd50f3e > > > 45.4s 9983b07fffd19082abded7c3f15cc77d306dd69c > > > 76.9s master-next > > > > I know I'm late to the party, but it took really long for the test to > finish.. > > > > I'm not using poky, so reproduce this testing on our builds I've used > > bitbake revisions matching with poky revision RP was testing: > > > > + oe-core 18cdc087fd5da30e2b31f3d4e81b153cd36ca844 for older bitbake, > because current master isn't compatible with old bitbake > (bb.data_smart.ExpansionError: Failure expanding variable OE_IMPORTED[:=], > expression was ${@oe_import(d)} which triggered exception AttributeError: > module 'bb.siggen' has no attribute 'SignatureGeneratorUniHashMixIn') > > and oe-core 23db236a054ee7a989cdbbcb42ad5c6eefd4a6ae for newer bitbake > (to possibly eliminate impact of metadata changes) > > > > tested on 72core (Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz) with 380GB > RAM > > > > nothing in TMPDIR/SSTATE_DIR, no SSTATE_MIRRO, no Hash Equivalence > Server configured > > > > on a build with few more layers: > > Parsing of 3238 .bb files complete (0 cached, 3238 parsed). 7632 > targets, 1927 skipped, 54 masked, 0 errors. > > > > but first doing just core-image-minimal like RP did: > > > > time pokybitbake > oe-core > > 2m8.191s 6c7c0cefd34067311144a1d4c01986fe0a4aef26 > 18d4a31fdcec1f0e5d2199d6142f0ce833fca1a7 > 18cdc087fd5da30e2b31f3d4e81b153cd36ca844 > > N/A a0d941c787cf3ef030d190903279d311bc05d752doesn't > exist in poky/poky-contrib > 18cdc087fd5da30e2b31f3d4e81b153cd36ca844 > > 2m17.053s 7df31ff36892c2f9c65326b06b4c70 > 1f630fdf0260db08541d3ca9f25f852931c19905 > 18cdc087fd5da30e2b31f3d4e81b153cd36ca844 > > 2m18.515s a0542ed3ff700eca35f9195f743c9e28bcd50f3e > f43778c2d19e70d4befd483b06aaf247fc65c799 > 23db236a054ee7a989cdbbcb42ad5c6eefd4a6ae > > 2m22.220s 9983b07fffd19082abded7c3f15cc77d306dd69c > 8c26b451f22193ef1c544e2017cc84515566c1b8 > > 2m38.185s master-next > fbcb89dc3dbabfc80310e9a4ebe72d919300a32e > > cache.py:446: ResourceWarning: unclosed family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, > laddr=('127.0.0.1', 41117)> > > value = pickled.load() > > started showing with this revision > > 2m17.991s master-next + fixups > f7cd14f90463957b3e4be6d3876def98b78f1424 > > 2m9.651s master-next + "Holding off tasks %s" out > > > > now world > > 253m36.637s 7df31ff36892c2f9c65326b06b4c70 > 1f630fdf0260db08541d3ca9f25f852931c19905 > 18cdc087fd5da30e2b31f3d4e81b153cd36ca844 > > 174m13.324s a0542ed3ff700eca35f9195f743c9e28bcd50f3e > f43778c2d19e70d4befd483b06aaf247fc65c799 > 23db236a054ee7a989cdbbcb42ad5c6eefd4a6ae > >this is time when killed at "NOTE: Executing Tasks" without showing > any progress on the tasks (unlike other tests) and only one bitbake process > is running > > 633m27.475s master-next > fbcb89dc3dbabfc80310e9a4ebe72d919300a32e > >this is time when killed at (1417 from 71749) - running very slowly > unlike other bitbake revisions where the task number changes relatively > quickly - about 10 tasks per second > > 89m13.992smaster-next + "Holding off tasks %s"
Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
On Thu, Aug 15, 2019 at 05:05:48PM +0200, Martin Jansa wrote: > On Tue, Aug 13, 2019 at 10:04:08AM +0100, Richard Purdie wrote: > > On Mon, 2019-08-12 at 20:26 +, Peter Kjellerstedt wrote: > > > Comparing that build to a corresponding do-nothing build with Thud, > > > the time difference matches those three minutes where I have no idea > > > what bitbake is doing now that it didn’t need to do before… > > > > > > Hopefully these time degradations can be solved, because the current > > > state of bitbake is barely usable. We also need to look into possible > > > ways to improve the cooker output when it is running the setscene > > > tasks so it makes some kind of sense again. > > > > We talked on irc and you pointed at the commit things started to go > > wrong. Just to summarise things for the benefit of the list, this is > > some quick testing I did: > > > > "bitbake -p; time bitbake core-image-minimal -n" > > > > 30.0s 6c7c0cefd34067311144a1d4c01986fe0a4aef26 > > 30.6s a0d941c787cf3ef030d190903279d311bc05d752 > > 40.3s 7df31ff36892c2f9c65326b06b4c70 > > 42.2s a0542ed3ff700eca35f9195f743c9e28bcd50f3e > > 45.4s 9983b07fffd19082abded7c3f15cc77d306dd69c > > 76.9s master-next > > I know I'm late to the party, but it took really long for the test to finish.. > > I'm not using poky, so reproduce this testing on our builds I've used > bitbake revisions matching with poky revision RP was testing: > > + oe-core 18cdc087fd5da30e2b31f3d4e81b153cd36ca844 for older bitbake, because > current master isn't compatible with old bitbake > (bb.data_smart.ExpansionError: Failure expanding variable OE_IMPORTED[:=], > expression was ${@oe_import(d)} which triggered exception AttributeError: > module 'bb.siggen' has no attribute 'SignatureGeneratorUniHashMixIn') > and oe-core 23db236a054ee7a989cdbbcb42ad5c6eefd4a6ae for newer bitbake (to > possibly eliminate impact of metadata changes) > > tested on 72core (Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz) with 380GB RAM > > nothing in TMPDIR/SSTATE_DIR, no SSTATE_MIRRO, no Hash Equivalence Server > configured > > on a build with few more layers: > Parsing of 3238 .bb files complete (0 cached, 3238 parsed). 7632 targets, > 1927 skipped, 54 masked, 0 errors. > > but first doing just core-image-minimal like RP did: > > time pokybitbake > oe-core > 2m8.191s 6c7c0cefd34067311144a1d4c01986fe0a4aef26 > 18d4a31fdcec1f0e5d2199d6142f0ce833fca1a7 > 18cdc087fd5da30e2b31f3d4e81b153cd36ca844 > N/A a0d941c787cf3ef030d190903279d311bc05d752doesn't exist > in poky/poky-contrib 18cdc087fd5da30e2b31f3d4e81b153cd36ca844 > 2m17.053s 7df31ff36892c2f9c65326b06b4c70 > 1f630fdf0260db08541d3ca9f25f852931c19905 > 18cdc087fd5da30e2b31f3d4e81b153cd36ca844 > 2m18.515s a0542ed3ff700eca35f9195f743c9e28bcd50f3e > f43778c2d19e70d4befd483b06aaf247fc65c799 > 23db236a054ee7a989cdbbcb42ad5c6eefd4a6ae > 2m22.220s 9983b07fffd19082abded7c3f15cc77d306dd69c > 8c26b451f22193ef1c544e2017cc84515566c1b8 > 2m38.185s master-next > fbcb89dc3dbabfc80310e9a4ebe72d919300a32e > cache.py:446: ResourceWarning: unclosed family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, > laddr=('127.0.0.1', 41117)> > value = pickled.load() > started showing with this revision > 2m17.991s master-next + fixups > f7cd14f90463957b3e4be6d3876def98b78f1424 > 2m9.651s master-next + "Holding off tasks %s" out > > now world > 253m36.637s 7df31ff36892c2f9c65326b06b4c70 > 1f630fdf0260db08541d3ca9f25f852931c19905 > 18cdc087fd5da30e2b31f3d4e81b153cd36ca844 > 174m13.324s a0542ed3ff700eca35f9195f743c9e28bcd50f3e > f43778c2d19e70d4befd483b06aaf247fc65c799 > 23db236a054ee7a989cdbbcb42ad5c6eefd4a6ae >this is time when killed at "NOTE: Executing Tasks" without showing any > progress on the tasks (unlike other tests) and only one bitbake process is > running > 633m27.475s master-next > fbcb89dc3dbabfc80310e9a4ebe72d919300a32e >this is time when killed at (1417 from 71749) - running very slowly unlike > other bitbake revisions where the task number changes relatively quickly - > about 10 tasks per second > 89m13.992smaster-next + "Holding off tasks %s" out > 89m59.201smaster-next updated today > 85fe627fdb6510f0942917964386fad9d8c479c8 > > Interestingly old 1f630fdf0260db08541d3ca9f25f852931c19905 is 3 times > slower than recent master-next. > > I'll send -P output next. So 1f630fdf0260db08541d3ca9f25f852931c19905 was already "bad", because this commit introduced that logger.debug(2, "Holding off tasks %s" % pprint.pformat(self.holdoff_tasks)) Here is -P output from it: 5min (til "Executing Tasks" message): profile.lo
Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
On Thu, 2019-08-15 at 14:56 +0100, richard.pur...@linuxfoundation.org wrote: > On Thu, 2019-08-15 at 14:56 +0200, Alexander Kanavin wrote: > What is really odd is this on both traces: > >524436 405.1020.001 405.2660.001 > /home/alexander/development/poky/bitbake/lib/bb/cooker.py:264(notific > ations) > > What that means is that cooker recieved notification of 524436 file > changes whilst it was parsing/preparing runqueue. > > I would be very interested to understand which files its being > notified > of changes to as I suspect there is something that shouldn't be > there. This is a bit of a false lead. At least locally I found them to be profiles of the individual tasks being written out which is only triggered on profiled builds. We should write those files to a different directory to avoid this confusion of the profiles... Cheers, Richard -- ___ Openembedded-core mailing list Openembedded-core@lists.openembedded.org http://lists.openembedded.org/mailman/listinfo/openembedded-core
Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
On Tue, Aug 13, 2019 at 10:04:08AM +0100, Richard Purdie wrote: > On Mon, 2019-08-12 at 20:26 +, Peter Kjellerstedt wrote: > > Comparing that build to a corresponding do-nothing build with Thud, > > the time difference matches those three minutes where I have no idea > > what bitbake is doing now that it didn’t need to do before… > > > > Hopefully these time degradations can be solved, because the current > > state of bitbake is barely usable. We also need to look into possible > > ways to improve the cooker output when it is running the setscene > > tasks so it makes some kind of sense again. > > We talked on irc and you pointed at the commit things started to go > wrong. Just to summarise things for the benefit of the list, this is > some quick testing I did: > > "bitbake -p; time bitbake core-image-minimal -n" > > 30.0s 6c7c0cefd34067311144a1d4c01986fe0a4aef26 > 30.6s a0d941c787cf3ef030d190903279d311bc05d752 > 40.3s 7df31ff36892c2f9c65326b06b4c70 > 42.2s a0542ed3ff700eca35f9195f743c9e28bcd50f3e > 45.4s 9983b07fffd19082abded7c3f15cc77d306dd69c > 76.9s master-next I know I'm late to the party, but it took really long for the test to finish.. I'm not using poky, so reproduce this testing on our builds I've used bitbake revisions matching with poky revision RP was testing: + oe-core 18cdc087fd5da30e2b31f3d4e81b153cd36ca844 for older bitbake, because current master isn't compatible with old bitbake (bb.data_smart.ExpansionError: Failure expanding variable OE_IMPORTED[:=], expression was ${@oe_import(d)} which triggered exception AttributeError: module 'bb.siggen' has no attribute 'SignatureGeneratorUniHashMixIn') and oe-core 23db236a054ee7a989cdbbcb42ad5c6eefd4a6ae for newer bitbake (to possibly eliminate impact of metadata changes) tested on 72core (Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz) with 380GB RAM nothing in TMPDIR/SSTATE_DIR, no SSTATE_MIRRO, no Hash Equivalence Server configured on a build with few more layers: Parsing of 3238 .bb files complete (0 cached, 3238 parsed). 7632 targets, 1927 skipped, 54 masked, 0 errors. but first doing just core-image-minimal like RP did: timepokybitbake oe-core 2m8.191s6c7c0cefd34067311144a1d4c01986fe0a4aef26 18d4a31fdcec1f0e5d2199d6142f0ce833fca1a7 18cdc087fd5da30e2b31f3d4e81b153cd36ca844 N/A a0d941c787cf3ef030d190903279d311bc05d752doesn't exist in poky/poky-contrib 18cdc087fd5da30e2b31f3d4e81b153cd36ca844 2m17.053s 7df31ff36892c2f9c65326b06b4c70 1f630fdf0260db08541d3ca9f25f852931c19905 18cdc087fd5da30e2b31f3d4e81b153cd36ca844 2m18.515s a0542ed3ff700eca35f9195f743c9e28bcd50f3e f43778c2d19e70d4befd483b06aaf247fc65c799 23db236a054ee7a989cdbbcb42ad5c6eefd4a6ae 2m22.220s 9983b07fffd19082abded7c3f15cc77d306dd69c 8c26b451f22193ef1c544e2017cc84515566c1b8 2m38.185s master-next fbcb89dc3dbabfc80310e9a4ebe72d919300a32e cache.py:446: ResourceWarning: unclosed value = pickled.load() started showing with this revision 2m17.991s master-next + fixups f7cd14f90463957b3e4be6d3876def98b78f1424 2m9.651smaster-next + "Holding off tasks %s" out now world 253m36.637s 7df31ff36892c2f9c65326b06b4c70 1f630fdf0260db08541d3ca9f25f852931c19905 18cdc087fd5da30e2b31f3d4e81b153cd36ca844 174m13.324s a0542ed3ff700eca35f9195f743c9e28bcd50f3e f43778c2d19e70d4befd483b06aaf247fc65c799 23db236a054ee7a989cdbbcb42ad5c6eefd4a6ae this is time when killed at "NOTE: Executing Tasks" without showing any progress on the tasks (unlike other tests) and only one bitbake process is running 633m27.475s master-next fbcb89dc3dbabfc80310e9a4ebe72d919300a32e this is time when killed at (1417 from 71749) - running very slowly unlike other bitbake revisions where the task number changes relatively quickly - about 10 tasks per second 89m13.992s master-next + "Holding off tasks %s" out 89m59.201s master-next updated today 85fe627fdb6510f0942917964386fad9d8c479c8 Interestingly old 1f630fdf0260db08541d3ca9f25f852931c19905 is 3 times slower than recent master-next. I'll send -P output next. > > So basically the original changes showed a 25% hit but the performance > of -next is dire. > > This is with no hash equiv server configured. > > It will vary depending on the target used (numbers with -sato for the > above would be interesting for comparision) and how much was or is in > sstate, they type of sstate mirror configured and so on. > > I really need to focus on getting the new code functioning correctly > before we attempt to optimise but if nobody tests the new code due to > performance problems we have a different issue. We also have a scaling > pro
Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
On Thu, 2019-08-15 at 14:56 +0200, Alexander Kanavin wrote: > On Wed, 14 Aug 2019 at 23:25, > wrote: > > Right, it still definitely needs work. Its a balancing act between > > sorting out the execution bugs in the code and figuring out the > > performance problem. > > > > If anyone wants to experiment, the way I'd debug this is to run the > > before and after cases with the -P option to bitbake. If you want > > to > > exit early just make the code return where it prints "Executing > > tasks" > > or whatever makes sense as Ctrl+C won't write the profile data. > > > > You want the profile.log.processed file. > > > > So save that file with the "before" commit, then save it afterwards > > and > > look at those files and see where its spending more time. > > > > If someone generates those two files I'll happily take a look, I'm > > kind > > of used to reading them. There are four sets of output in there, > > same > > data but different sorting/types, each has its uses. > > And here they are (with task spinning included this time): > http://sensi.org/~ak/tmp/profile.log.processed.before > http://sensi.org/~ak/tmp/profile.log.processed.after > > Hope you find it useful! How many tasks are in that build? What is really odd is this on both traces: 524436 405.1020.001 405.2660.001 /home/alexander/development/poky/bitbake/lib/bb/cooker.py:264(notifications) What that means is that cooker recieved notification of 524436 file changes whilst it was parsing/preparing runqueue. I would be very interested to understand which files its being notified of changes to as I suspect there is something that shouldn't be there. If we fixed that, it would take 406s off the 1800s for starters in both cases. I don't see that locally in my trace, my build layout may be different. Other things of note: Time in next_buildable_task went up 338 -> 440s (probably through the set manipulation e.g. method 'difference' of 'set' objects) New function update_holdofftasks() took 173s build_taskdepdata is also eating lots of time suddenly. So yes, useful, thanks. You may want to add some debug to that nofitications() function in cooker and see which files are changing though. Cheers, Richard -- ___ Openembedded-core mailing list Openembedded-core@lists.openembedded.org http://lists.openembedded.org/mailman/listinfo/openembedded-core
Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
On Wed, 14 Aug 2019 at 23:25, wrote: > Right, it still definitely needs work. Its a balancing act between > sorting out the execution bugs in the code and figuring out the > performance problem. > > If anyone wants to experiment, the way I'd debug this is to run the > before and after cases with the -P option to bitbake. If you want to > exit early just make the code return where it prints "Executing tasks" > or whatever makes sense as Ctrl+C won't write the profile data. > > You want the profile.log.processed file. > > So save that file with the "before" commit, then save it afterwards and > look at those files and see where its spending more time. > > If someone generates those two files I'll happily take a look, I'm kind > of used to reading them. There are four sets of output in there, same > data but different sorting/types, each has its uses. > And here they are (with task spinning included this time): http://sensi.org/~ak/tmp/profile.log.processed.before http://sensi.org/~ak/tmp/profile.log.processed.after Hope you find it useful! Alex -- ___ Openembedded-core mailing list Openembedded-core@lists.openembedded.org http://lists.openembedded.org/mailman/listinfo/openembedded-core
Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
On Wed, 2019-08-14 at 22:27 +0200, Alexander Kanavin wrote: > On Wed, 14 Aug 2019 at 14:55, > wrote: > > You followed up mentioning this wasn't with master-next. I think > > there > > is a patch in -next which will help with the empty task spin so > > both > > together might get us back to more normal numbers. > > > > As all of these patches are now in master, I re-ran the test with > that (209f89ab8ed51ac2867ca8f749336af1ee24ab25), but without > including the spinning task part, pressing ctrl-c just as it starts. > The outcome is 9m42s, compared to 2m9s for the baseline > (6c7c0cefd34067311144a1d4c01986fe0a4aef26). So the worst is fixed, > but the slowdown is still noticeable. Right, it still definitely needs work. Its a balancing act between sorting out the execution bugs in the code and figuring out the performance problem. If anyone wants to experiment, the way I'd debug this is to run the before and after cases with the -P option to bitbake. If you want to exit early just make the code return where it prints "Executing tasks" or whatever makes sense as Ctrl+C won't write the profile data. You want the profile.log.processed file. So save that file with the "before" commit, then save it afterwards and look at those files and see where its spending more time. If someone generates those two files I'll happily take a look, I'm kind of used to reading them. There are four sets of output in there, same data but different sorting/types, each has its uses. Cheers, Richard -- ___ Openembedded-core mailing list Openembedded-core@lists.openembedded.org http://lists.openembedded.org/mailman/listinfo/openembedded-core
Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
On Wed, 14 Aug 2019 at 14:55, wrote: > You followed up mentioning this wasn't with master-next. I think there > is a patch in -next which will help with the empty task spin so both > together might get us back to more normal numbers. > As all of these patches are now in master, I re-ran the test with that (209f89ab8ed51ac2867ca8f749336af1ee24ab25), but without including the spinning task part, pressing ctrl-c just as it starts. The outcome is 9m42s, compared to 2m9s for the baseline (6c7c0cefd34067311144a1d4c01986fe0a4aef26). So the worst is fixed, but the slowdown is still noticeable. Alex -- ___ Openembedded-core mailing list Openembedded-core@lists.openembedded.org http://lists.openembedded.org/mailman/listinfo/openembedded-core
Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
On Wed, Aug 14, 2019 at 7:57 AM Peter Kjellerstedt wrote: > > > -Original Message- > > From: richard.pur...@linuxfoundation.org > > > > Sent: den 14 augusti 2019 14:56 > > To: Alexander Kanavin > > Cc: Peter Kjellerstedt ; Khem Raj > > ; OE-core > c...@lists.openembedded.org> > > Subject: Re: [OE-core] Long delays with latest bitbake (was: [PATCH > > 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS > > recursively) > > > > On Wed, 2019-08-14 at 14:08 +0200, Alexander Kanavin wrote: > > > On Wed, 14 Aug 2019 at 13:36, > > > wrote: > > > > On Wed, 2019-08-14 at 13:25 +0200, Alexander Kanavin wrote: > > > > > On Tue, 13 Aug 2019 at 21:18, Richard Purdie < > > > > > richard.pur...@linuxfoundation.org> wrote: > > > > > > I had a glance at the profile output from master-next and the > > > > > > problem > > > > > > wasn't where I thought it would be, it was in the scheduler > > > > code. > > > > > > That > > > > > > is good as those classes are effectively independent of the > > > > other > > > > > > changes and hence are a separate fix. > > > > > > > > > > > > I've put a patch in -next which takes the above test to 36s > > > > which > > > > > > is > > > > > > close to the older bitbake. > > > > > > > > > > > > Could be interesting to see how it looks for others and > > > > different > > > > > > workloads. > > > > > > > > > > I just tried the same test I did yesterday with > > > > > ab56d466452148e5fce330d279d13e2495eceb1f. Unfortunately it > > > > doesn't > > > > > seem to improve things much: bitbake is stuck at "NOTE: Executing > > > > > Tasks" for 15 minutes now. > > > > > > > > This might sound slightly crazy but can you try commenting out this > > > > line in runqueue.py: > > > > > > > > logger.debug(2, "Holding off tasks %s" % > > > > pprint.pformat(self.holdoff_tasks)) > > > > > > > > ? > > > > > > Even crazier is the outcome: it helped! > > > > Cool, I think I can explain it. > > > > The holdoff_tasks list can contain a list of nearly all the tasks at > > some points in execution. Even though the debug messages aren't being > > printed on the console, they are being sent over the internal IPC bus > > between the cooker, UI and other event handlers. Obviously for small > > task lists its not a problem, for large ones its multiple 4k chunks > > over pipes which isn't going to be fast. > > > > We have done a lot of optimisation in the past but its all too easy to > > trend on something like this and upset things :/. > > > > > The whole thing completed after 15m49secons (with much of the time > > > going to the empty task spin), that's some 3 minutes slower, but > > > certainly it's usable again. > > > > You followed up mentioning this wasn't with master-next. I think there > > is a patch in -next which will help with the empty task spin so both > > together might get us back to more normal numbers. > > > > Cheers, > > > > Richard > > I can just confirm that removing the debug line removes almost all of > the delays I was seeing. Here are some time statistics from my builds: > > 6c7c0cef: 02:50 > 7df31ff3: 06:13 (first attempt) > a0542ed3: 06:32 (master) > 19a88c68: 43:19* (~yesterday's master-next) > b0a0e4a6: 06:55 (master-next) > no debug: 03:04 (master-next + removal of "Holding off tasks" debug) > > * I aborted this build after about half of the 12540 tasks were done... > > One thing I have noticed while I was doing the timings is that I do > not seem to be able to kill the bitbake server with bitbake -m after > the builds have completed, but have to resort to using kill -9 most > of the time... me too. My world build time came down to < 15 mins as well, Last night it was at 45 mins and still parsing before I stopped it. > > //Peter > -- ___ Openembedded-core mailing list Openembedded-core@lists.openembedded.org http://lists.openembedded.org/mailman/listinfo/openembedded-core
Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
> -Original Message- > From: richard.pur...@linuxfoundation.org > > Sent: den 14 augusti 2019 14:56 > To: Alexander Kanavin > Cc: Peter Kjellerstedt ; Khem Raj > ; OE-core c...@lists.openembedded.org> > Subject: Re: [OE-core] Long delays with latest bitbake (was: [PATCH > 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS > recursively) > > On Wed, 2019-08-14 at 14:08 +0200, Alexander Kanavin wrote: > > On Wed, 14 Aug 2019 at 13:36, > > wrote: > > > On Wed, 2019-08-14 at 13:25 +0200, Alexander Kanavin wrote: > > > > On Tue, 13 Aug 2019 at 21:18, Richard Purdie < > > > > richard.pur...@linuxfoundation.org> wrote: > > > > > I had a glance at the profile output from master-next and the > > > > > problem > > > > > wasn't where I thought it would be, it was in the scheduler > > > code. > > > > > That > > > > > is good as those classes are effectively independent of the > > > other > > > > > changes and hence are a separate fix. > > > > > > > > > > I've put a patch in -next which takes the above test to 36s > > > which > > > > > is > > > > > close to the older bitbake. > > > > > > > > > > Could be interesting to see how it looks for others and > > > different > > > > > workloads. > > > > > > > > I just tried the same test I did yesterday with > > > > ab56d466452148e5fce330d279d13e2495eceb1f. Unfortunately it > > > doesn't > > > > seem to improve things much: bitbake is stuck at "NOTE: Executing > > > > Tasks" for 15 minutes now. > > > > > > This might sound slightly crazy but can you try commenting out this > > > line in runqueue.py: > > > > > > logger.debug(2, "Holding off tasks %s" % > > > pprint.pformat(self.holdoff_tasks)) > > > > > > ? > > > > Even crazier is the outcome: it helped! > > Cool, I think I can explain it. > > The holdoff_tasks list can contain a list of nearly all the tasks at > some points in execution. Even though the debug messages aren't being > printed on the console, they are being sent over the internal IPC bus > between the cooker, UI and other event handlers. Obviously for small > task lists its not a problem, for large ones its multiple 4k chunks > over pipes which isn't going to be fast. > > We have done a lot of optimisation in the past but its all too easy to > trend on something like this and upset things :/. > > > The whole thing completed after 15m49secons (with much of the time > > going to the empty task spin), that's some 3 minutes slower, but > > certainly it's usable again. > > You followed up mentioning this wasn't with master-next. I think there > is a patch in -next which will help with the empty task spin so both > together might get us back to more normal numbers. > > Cheers, > > Richard I can just confirm that removing the debug line removes almost all of the delays I was seeing. Here are some time statistics from my builds: 6c7c0cef: 02:50 7df31ff3: 06:13 (first attempt) a0542ed3: 06:32 (master) 19a88c68: 43:19* (~yesterday's master-next) b0a0e4a6: 06:55 (master-next) no debug: 03:04 (master-next + removal of "Holding off tasks" debug) * I aborted this build after about half of the 12540 tasks were done... One thing I have noticed while I was doing the timings is that I do not seem to be able to kill the bitbake server with bitbake -m after the builds have completed, but have to resort to using kill -9 most of the time... //Peter -- ___ Openembedded-core mailing list Openembedded-core@lists.openembedded.org http://lists.openembedded.org/mailman/listinfo/openembedded-core
Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
On Wed, 2019-08-14 at 14:08 +0200, Alexander Kanavin wrote: > On Wed, 14 Aug 2019 at 13:36, > wrote: > > On Wed, 2019-08-14 at 13:25 +0200, Alexander Kanavin wrote: > > > On Tue, 13 Aug 2019 at 21:18, Richard Purdie < > > > richard.pur...@linuxfoundation.org> wrote: > > > > I had a glance at the profile output from master-next and the > > > > problem > > > > wasn't where I thought it would be, it was in the scheduler > > code. > > > > That > > > > is good as those classes are effectively independent of the > > other > > > > changes and hence are a separate fix. > > > > > > > > I've put a patch in -next which takes the above test to 36s > > which > > > > is > > > > close to the older bitbake. > > > > > > > > Could be interesting to see how it looks for others and > > different > > > > workloads. > > > > > > I just tried the same test I did yesterday with > > > ab56d466452148e5fce330d279d13e2495eceb1f. Unfortunately it > > doesn't > > > seem to improve things much: bitbake is stuck at "NOTE: Executing > > > Tasks" for 15 minutes now. > > > > This might sound slightly crazy but can you try commenting out this > > line in runqueue.py: > > > > logger.debug(2, "Holding off tasks %s" % > > pprint.pformat(self.holdoff_tasks)) > > > > ? > > Even crazier is the outcome: it helped! Cool, I think I can explain it. The holdoff_tasks list can contain a list of nearly all the tasks at some points in execution. Even though the debug messages aren't being printed on the console, they are being sent over the internal IPC bus between the cooker, UI and other event handlers. Obviously for small task lists its not a problem, for large ones its multiple 4k chunks over pipes which isn't going to be fast. We have done a lot of optimisation in the past but its all too easy to trend on something like this and upset things :/. > The whole thing completed after 15m49secons (with much of the time > going to the empty task spin), that's some 3 minutes slower, but > certainly it's usable again. You followed up mentioning this wasn't with master-next. I think there is a patch in -next which will help with the empty task spin so both together might get us back to more normal numbers. Cheers, Richard -- ___ Openembedded-core mailing list Openembedded-core@lists.openembedded.org http://lists.openembedded.org/mailman/listinfo/openembedded-core
Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
On Wed, Aug 14, 2019 at 02:08:01PM +0200, Alexander Kanavin wrote: > On Wed, 14 Aug 2019 at 13:36, wrote: > > > On Wed, 2019-08-14 at 13:25 +0200, Alexander Kanavin wrote: > > > On Tue, 13 Aug 2019 at 21:18, Richard Purdie < > > > richard.pur...@linuxfoundation.org> wrote: > > > > I had a glance at the profile output from master-next and the > > > > problem > > > > wasn't where I thought it would be, it was in the scheduler code. > > > > That > > > > is good as those classes are effectively independent of the other > > > > changes and hence are a separate fix. > > > > > > > > I've put a patch in -next which takes the above test to 36s which > > > > is > > > > close to the older bitbake. > > > > > > > > Could be interesting to see how it looks for others and different > > > > workloads. > > > > > > I just tried the same test I did yesterday with > > > ab56d466452148e5fce330d279d13e2495eceb1f. Unfortunately it doesn't > > > seem to improve things much: bitbake is stuck at "NOTE: Executing > > > Tasks" for 15 minutes now. > > > > This might sound slightly crazy but can you try commenting out this > > line in runqueue.py: > > > > logger.debug(2, "Holding off tasks %s" % > > pprint.pformat(self.holdoff_tasks)) > > > > ? > > > > Even crazier is the outcome: it helped! The whole thing completed after > 15m49secons (with much of the time going to the empty task spin), that's > some 3 minutes slower, but certainly it's usable again. > > I have not enabled the hash server at any point. Indeed, commenting that debug statement out removes few minutes of the bitbake preparation times. Here is a time stamped output from master branch: 2019-08-14_14:29:24 2019-08-14_14:29:35 Initialising tasks...done. 2019-08-14_14:32:41 Checking sstate mirror object availability...done. 2019-08-14_14:32:41 Sstate summary: Wanted 436 Found 426 Missed 10 Current 2407 (97% match, 99% complete) 2019-08-14_14:32:41 NOTE: Executing Tasks 2019-08-14_14:32:52 NOTE: Setscene tasks completed And with the comment removed: 2019-08-14_14:35:10 2019-08-14_14:35:21 Initialising tasks...done. 2019-08-14_14:35:30 Checking sstate mirror object availability...done. 2019-08-14_14:35:30 Sstate summary: Wanted 436 Found 426 Missed 10 Current 2407 (97% match, 99% complete) 2019-08-14_14:35:30 NOTE: Executing Tasks And just in case back to master branch state: 2019-08-14_14:41:32 2019-08-14_14:41:43 Initialising tasks...done. 2019-08-14_14:45:02 Checking sstate mirror object availability...done. 2019-08-14_14:45:02 Sstate summary: Wanted 436 Found 426 Missed 10 Current 2407 (97% match, 99% complete) 2019-08-14_14:45:02 NOTE: Executing Tasks I did not flush caches in between and I stopped the builds once bitbake started doing stuff, so I think sstate cache was completely buffered in memory from file system and IO delays did not affect the timings. Change to poky was exactly: --- a/bitbake/lib/bb/runqueue.py +++ b/bitbake/lib/bb/runqueue.py @@ -2216,7 +2216,7 @@ class RunQueueExecute: for dep in self.sqdata.sq_covered_tasks[tid]: if dep not in self.runq_complete: self.holdoff_tasks.add(dep) -logger.debug(2, "Holding off tasks %s" % pprint.pformat(self.holdoff_tasks)) +# logger.debug(2, "Holding off tasks %s" % pprint.pformat(self.holdoff_tasks)) def process_possible_migrations(self): changes = False -Mikko -- ___ Openembedded-core mailing list Openembedded-core@lists.openembedded.org http://lists.openembedded.org/mailman/listinfo/openembedded-core
Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
On Wed, 14 Aug 2019 at 14:08, Alexander Kanavin wrote: > > This might sound slightly crazy but can you try commenting out this >> line in runqueue.py: >> >> logger.debug(2, "Holding off tasks %s" % >> pprint.pformat(self.holdoff_tasks)) >> >> ? >> > > Even crazier is the outcome: it helped! The whole thing completed after > 15m49secons (with much of the time going to the empty task spin), that's > some 3 minutes slower, but certainly it's usable again. > > I have not enabled the hash server at any point. > I need to clarify: this was with current master (18b8e2e10494d3b0e18743a1bf9e99038da4229c), not master-next. Alex -- ___ Openembedded-core mailing list Openembedded-core@lists.openembedded.org http://lists.openembedded.org/mailman/listinfo/openembedded-core
Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
On Wed, 14 Aug 2019 at 13:36, wrote: > On Wed, 2019-08-14 at 13:25 +0200, Alexander Kanavin wrote: > > On Tue, 13 Aug 2019 at 21:18, Richard Purdie < > > richard.pur...@linuxfoundation.org> wrote: > > > I had a glance at the profile output from master-next and the > > > problem > > > wasn't where I thought it would be, it was in the scheduler code. > > > That > > > is good as those classes are effectively independent of the other > > > changes and hence are a separate fix. > > > > > > I've put a patch in -next which takes the above test to 36s which > > > is > > > close to the older bitbake. > > > > > > Could be interesting to see how it looks for others and different > > > workloads. > > > > I just tried the same test I did yesterday with > > ab56d466452148e5fce330d279d13e2495eceb1f. Unfortunately it doesn't > > seem to improve things much: bitbake is stuck at "NOTE: Executing > > Tasks" for 15 minutes now. > > This might sound slightly crazy but can you try commenting out this > line in runqueue.py: > > logger.debug(2, "Holding off tasks %s" % > pprint.pformat(self.holdoff_tasks)) > > ? > Even crazier is the outcome: it helped! The whole thing completed after 15m49secons (with much of the time going to the empty task spin), that's some 3 minutes slower, but certainly it's usable again. I have not enabled the hash server at any point. Alex -- ___ Openembedded-core mailing list Openembedded-core@lists.openembedded.org http://lists.openembedded.org/mailman/listinfo/openembedded-core
Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
On Wed, 2019-08-14 at 13:25 +0200, Alexander Kanavin wrote: > On Tue, 13 Aug 2019 at 21:18, Richard Purdie < > richard.pur...@linuxfoundation.org> wrote: > > I had a glance at the profile output from master-next and the > > problem > > wasn't where I thought it would be, it was in the scheduler code. > > That > > is good as those classes are effectively independent of the other > > changes and hence are a separate fix. > > > > I've put a patch in -next which takes the above test to 36s which > > is > > close to the older bitbake. > > > > Could be interesting to see how it looks for others and different > > workloads. > > I just tried the same test I did yesterday with > ab56d466452148e5fce330d279d13e2495eceb1f. Unfortunately it doesn't > seem to improve things much: bitbake is stuck at "NOTE: Executing > Tasks" for 15 minutes now. This might sound slightly crazy but can you try commenting out this line in runqueue.py: logger.debug(2, "Holding off tasks %s" % pprint.pformat(self.holdoff_tasks)) ? This was with the hash server disabled, right? Cheers, Richard -- ___ Openembedded-core mailing list Openembedded-core@lists.openembedded.org http://lists.openembedded.org/mailman/listinfo/openembedded-core
Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
On Tue, 13 Aug 2019 at 21:18, Richard Purdie < richard.pur...@linuxfoundation.org> wrote: > I had a glance at the profile output from master-next and the problem > wasn't where I thought it would be, it was in the scheduler code. That > is good as those classes are effectively independent of the other > changes and hence are a separate fix. > > I've put a patch in -next which takes the above test to 36s which is > close to the older bitbake. > > Could be interesting to see how it looks for others and different > workloads. > I just tried the same test I did yesterday with ab56d466452148e5fce330d279d13e2495eceb1f. Unfortunately it doesn't seem to improve things much: bitbake is stuck at "NOTE: Executing Tasks" for 15 minutes now. Alex -- ___ Openembedded-core mailing list Openembedded-core@lists.openembedded.org http://lists.openembedded.org/mailman/listinfo/openembedded-core
Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
On Tue, Aug 13, 2019 at 12:18 PM Richard Purdie wrote: > > On Tue, 2019-08-13 at 10:04 +0100, Richard Purdie wrote: > > On Mon, 2019-08-12 at 20:26 +, Peter Kjellerstedt wrote: > > > Comparing that build to a corresponding do-nothing build with Thud, > > > the time difference matches those three minutes where I have no > > > idea > > > what bitbake is doing now that it didn’t need to do before… > > > > > > Hopefully these time degradations can be solved, because the > > > current > > > state of bitbake is barely usable. We also need to look into > > > possible > > > ways to improve the cooker output when it is running the setscene > > > tasks so it makes some kind of sense again. > > > > We talked on irc and you pointed at the commit things started to go > > wrong. Just to summarise things for the benefit of the list, this is > > some quick testing I did: > > > > "bitbake -p; time bitbake core-image-minimal -n" > > > > 30.0s 6c7c0cefd34067311144a1d4c01986fe0a4aef26 > > 30.6s a0d941c787cf3ef030d190903279d311bc05d752 > > 40.3s 7df31ff36892c2f9c65326b06b4c70 > > 42.2s a0542ed3ff700eca35f9195f743c9e28bcd50f3e > > 45.4s 9983b07fffd19082abded7c3f15cc77d306dd69c > > 76.9s master-next > > > > So basically the original changes showed a 25% hit but the > > performance > > of -next is dire. > > > > This is with no hash equiv server configured. > > > > It will vary depending on the target used (numbers with -sato for the > > above would be interesting for comparision) and how much was or is in > > sstate, they type of sstate mirror configured and so on. > > > > I really need to focus on getting the new code functioning correctly > > before we attempt to optimise but if nobody tests the new code due to > > performance problems we have a different issue. We also have a > > scaling > > problem with the hash server itself I need to fix to stop the > > autobuilder throwing weird errors. I'm therefore a bit challenged on > > where to start with it all :/. > > I had a glance at the profile output from master-next and the problem > wasn't where I thought it would be, it was in the scheduler code. That > is good as those classes are effectively independent of the other > changes and hence are a separate fix. > > I've put a patch in -next which takes the above test to 36s which is > close to the older bitbake. > > Could be interesting to see how it looks for others and different > workloads. > will try it tonight > Cheers, > > Richard > -- ___ Openembedded-core mailing list Openembedded-core@lists.openembedded.org http://lists.openembedded.org/mailman/listinfo/openembedded-core
Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
On Tue, 2019-08-13 at 10:04 +0100, Richard Purdie wrote: > On Mon, 2019-08-12 at 20:26 +, Peter Kjellerstedt wrote: > > Comparing that build to a corresponding do-nothing build with Thud, > > the time difference matches those three minutes where I have no > > idea > > what bitbake is doing now that it didn’t need to do before… > > > > Hopefully these time degradations can be solved, because the > > current > > state of bitbake is barely usable. We also need to look into > > possible > > ways to improve the cooker output when it is running the setscene > > tasks so it makes some kind of sense again. > > We talked on irc and you pointed at the commit things started to go > wrong. Just to summarise things for the benefit of the list, this is > some quick testing I did: > > "bitbake -p; time bitbake core-image-minimal -n" > > 30.0s 6c7c0cefd34067311144a1d4c01986fe0a4aef26 > 30.6s a0d941c787cf3ef030d190903279d311bc05d752 > 40.3s 7df31ff36892c2f9c65326b06b4c70 > 42.2s a0542ed3ff700eca35f9195f743c9e28bcd50f3e > 45.4s 9983b07fffd19082abded7c3f15cc77d306dd69c > 76.9s master-next > > So basically the original changes showed a 25% hit but the > performance > of -next is dire. > > This is with no hash equiv server configured. > > It will vary depending on the target used (numbers with -sato for the > above would be interesting for comparision) and how much was or is in > sstate, they type of sstate mirror configured and so on. > > I really need to focus on getting the new code functioning correctly > before we attempt to optimise but if nobody tests the new code due to > performance problems we have a different issue. We also have a > scaling > problem with the hash server itself I need to fix to stop the > autobuilder throwing weird errors. I'm therefore a bit challenged on > where to start with it all :/. I had a glance at the profile output from master-next and the problem wasn't where I thought it would be, it was in the scheduler code. That is good as those classes are effectively independent of the other changes and hence are a separate fix. I've put a patch in -next which takes the above test to 36s which is close to the older bitbake. Could be interesting to see how it looks for others and different workloads. Cheers, Richard -- ___ Openembedded-core mailing list Openembedded-core@lists.openembedded.org http://lists.openembedded.org/mailman/listinfo/openembedded-core
Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
On Tue, 13 Aug 2019 at 11:04, Richard Purdie < richard.pur...@linuxfoundation.org> wrote: > We talked on irc and you pointed at the commit things started to go > wrong. Just to summarise things for the benefit of the list, this is > some quick testing I did: > > "bitbake -p; time bitbake core-image-minimal -n" > > 30.0s 6c7c0cefd34067311144a1d4c01986fe0a4aef26 > 30.6s a0d941c787cf3ef030d190903279d311bc05d752 > 40.3s 7df31ff36892c2f9c65326b06b4c70 > 42.2s a0542ed3ff700eca35f9195f743c9e28bcd50f3e > 45.4s 9983b07fffd19082abded7c3f15cc77d306dd69c > 76.9s master-next > > So basically the original changes showed a 25% hit but the performance > of -next is dire. > Sadly with larger sets the performance hit is much worse. I enabled all of the layers in meta-oe, and ran this with empty sstate and build directories: bitbake -p; time bitbake -n -k world The outcomes are: 6c7c0cefd34067311144a1d4c01986fe0a4aef26 12m51,848s (this is the baseline; the bulk of the time goes towards going through the task list without executing the tasks - the 'spinning task counter" part) 9983b07fffd19082abded7c3f15cc77d306dd69c 66m29,563s So there could be something quadratically-growing going on with these new commits :( I didn't dare to try master-next after seeing above how it hits core-image-minimal and master already regressing to over an hour :-( Alex -- ___ Openembedded-core mailing list Openembedded-core@lists.openembedded.org http://lists.openembedded.org/mailman/listinfo/openembedded-core
Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)
On Mon, 2019-08-12 at 20:26 +, Peter Kjellerstedt wrote: > Comparing that build to a corresponding do-nothing build with Thud, > the time difference matches those three minutes where I have no idea > what bitbake is doing now that it didn’t need to do before… > > Hopefully these time degradations can be solved, because the current > state of bitbake is barely usable. We also need to look into possible > ways to improve the cooker output when it is running the setscene > tasks so it makes some kind of sense again. We talked on irc and you pointed at the commit things started to go wrong. Just to summarise things for the benefit of the list, this is some quick testing I did: "bitbake -p; time bitbake core-image-minimal -n" 30.0s 6c7c0cefd34067311144a1d4c01986fe0a4aef26 30.6s a0d941c787cf3ef030d190903279d311bc05d752 40.3s 7df31ff36892c2f9c65326b06b4c70 42.2s a0542ed3ff700eca35f9195f743c9e28bcd50f3e 45.4s 9983b07fffd19082abded7c3f15cc77d306dd69c 76.9s master-next So basically the original changes showed a 25% hit but the performance of -next is dire. This is with no hash equiv server configured. It will vary depending on the target used (numbers with -sato for the above would be interesting for comparision) and how much was or is in sstate, they type of sstate mirror configured and so on. I really need to focus on getting the new code functioning correctly before we attempt to optimise but if nobody tests the new code due to performance problems we have a different issue. We also have a scaling problem with the hash server itself I need to fix to stop the autobuilder throwing weird errors. I'm therefore a bit challenged on where to start with it all :/. Cheers, Richard -- ___ Openembedded-core mailing list Openembedded-core@lists.openembedded.org http://lists.openembedded.org/mailman/listinfo/openembedded-core