Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)

2019-08-19 Thread Martin Jansa
On Fri, Aug 16, 2019 at 07:22:55PM +0200, Martin Jansa wrote:
> On Fri, Aug 16, 2019 at 04:54:48PM +0100, richard.pur...@linuxfoundation.org 
> wrote:
> > On Fri, 2019-08-16 at 17:00 +0200, Martin Jansa wrote:
> > > > Will try to bump BB_NUMBER_THREADS from 8 to 72.
> > > 
> > > I've tried to remove icecc.bbclass inherit (because it does things
> > > while parsing and RP probably doesn't have it inherited), but that
> > > didn't save much time.
> > > 
> > > All 3 tests were with bitbake
> > > 18d4a31fdcec1f0e5d2199d6142f0ce833fca1a7
> > > 94m19.081s  8 BB_NUMBER_THREADS and icecc
> > > 82m59.574s  8 BB_NUMBER_THREADS no icecc
> > > 68m3.556s72 BB_NUMBER_THREADS no icecc
> > > 
> > > Still don't know how to get to sub 10min world runs RP was seeing,
> > > but at least it's as slow as it was before runqeueu changes (or even
> > > a bit faster in lastest master).
> > 
> > Just thinking out loud, other things which can influence timings:
> > 
> > Is SSTATE_DIR on NFS or local disk?
> 
> SSTATE_DIR is empty directory on local disk
> /dev/mapper/vg00-jenkins on /jenkins type ext4 
> (rw,noatime,nobarrier,commit=6000,stripe=128,data=ordered)
> 
> > Are sstate mirrors configured?
> 
> None, normally I have SSTATE_MIRRORS over sshfs in this case, but I've
> removed it before any performance testing.
> 
> > Is there an existing build or not, if so, how much is valid?
> 
> Nothing, I remove whole TMPDIR and cache before each run. Then let it
> recreate the cache before starting the profile:
> bitbake -p; time bitbake world -P -n
> 
> > Underlying filesystem of the build?
> 
> ext4, everything is pretty much generic Ubuntu 18.04
> 
> There is plenty of ram, I'll try to test this from tmpfs as well.

I did the same "bitbake -p; time bitbake world -P -n" on the same build with 
thud, warrior and zeus:

thud/profile.log.processed: 2803894450 function calls (2803191143 
primitive calls) in 3340.784 seconds
thud/profile-worker.log.processed: 27005515 function calls (26944030 
primitive calls) in 3243.139 seconds
warrior/profile.log.processed: 2804294486 function calls (2803446296 
primitive calls) in 3604.226 seconds
warrior/profile-worker.log.processed: 27649679 function calls (27591220 
primitive calls) in 3503.772 seconds
zeus/profile.log.processed: 2327031298 function calls (2326396186 
primitive calls) in 4433.851 seconds
zeus/profile-worker.log.processed: 28829453 function calls (28771730 
primitive calls) in 4331.257 seconds

thud/time:real55m41.595s
warrior/time:real60m4.954s
zeus/time:real72m23.053s

with multilib disabled I got
zeus-no-multilib/profile.log.processed: 1232798107 function calls 
(1232447521 primitive calls) in 1773.164 seconds
zeus-no-multilib/profile-worker.log.processed: 15561565 function calls 
(1044 primitive calls) in 1716.234 seconds
real29m33.658s

Which seems reasonable as the number of tasks was cut in half.

Let me know if it's worth uploading these profiles somewhere.

Cheers,

> > Your build seems especially slow at executing through the tasks which
> > is effectively a test on how fast the system can fork() and return in
> > some ways. It would be interesting to block dry-run on the server side,
> > skip the fork and see how the numbers compare.
> 
> As discussed on IRC, it's slower than yours (8 minutes from 68), but the
> most significant chunk of time is lost somewhere else.
> 
> > My build manages some parts of the tasklist faster than others, perhaps
> > because there are more "free" tasks to execute at some points in the
> > task graph than others.
> > 
> > Also, I have some patches which improve performance a bit which I'm
> > still testing.
> 
> Thanks for all the work on this!
> 
> Cheers,
> -- 
> Martin 'JaMa' Jansa jabber: martin.ja...@gmail.com



-- 
Martin 'JaMa' Jansa jabber: martin.ja...@gmail.com


signature.asc
Description: Digital signature
-- 
___
Openembedded-core mailing list
Openembedded-core@lists.openembedded.org
http://lists.openembedded.org/mailman/listinfo/openembedded-core


Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)

2019-08-16 Thread Martin Jansa
On Fri, Aug 16, 2019 at 04:54:48PM +0100, richard.pur...@linuxfoundation.org 
wrote:
> On Fri, 2019-08-16 at 17:00 +0200, Martin Jansa wrote:
> > > Will try to bump BB_NUMBER_THREADS from 8 to 72.
> > 
> > I've tried to remove icecc.bbclass inherit (because it does things
> > while parsing and RP probably doesn't have it inherited), but that
> > didn't save much time.
> > 
> > All 3 tests were with bitbake
> > 18d4a31fdcec1f0e5d2199d6142f0ce833fca1a7
> > 94m19.081s  8 BB_NUMBER_THREADS and icecc
> > 82m59.574s  8 BB_NUMBER_THREADS no icecc
> > 68m3.556s72 BB_NUMBER_THREADS no icecc
> > 
> > Still don't know how to get to sub 10min world runs RP was seeing,
> > but at least it's as slow as it was before runqeueu changes (or even
> > a bit faster in lastest master).
> 
> Just thinking out loud, other things which can influence timings:
> 
> Is SSTATE_DIR on NFS or local disk?

SSTATE_DIR is empty directory on local disk
/dev/mapper/vg00-jenkins on /jenkins type ext4 
(rw,noatime,nobarrier,commit=6000,stripe=128,data=ordered)

> Are sstate mirrors configured?

None, normally I have SSTATE_MIRRORS over sshfs in this case, but I've
removed it before any performance testing.

> Is there an existing build or not, if so, how much is valid?

Nothing, I remove whole TMPDIR and cache before each run. Then let it
recreate the cache before starting the profile:
bitbake -p; time bitbake world -P -n

> Underlying filesystem of the build?

ext4, everything is pretty much generic Ubuntu 18.04

There is plenty of ram, I'll try to test this from tmpfs as well.

> Your build seems especially slow at executing through the tasks which
> is effectively a test on how fast the system can fork() and return in
> some ways. It would be interesting to block dry-run on the server side,
> skip the fork and see how the numbers compare.

As discussed on IRC, it's slower than yours (8 minutes from 68), but the
most significant chunk of time is lost somewhere else.

> My build manages some parts of the tasklist faster than others, perhaps
> because there are more "free" tasks to execute at some points in the
> task graph than others.
> 
> Also, I have some patches which improve performance a bit which I'm
> still testing.

Thanks for all the work on this!

Cheers,
-- 
Martin 'JaMa' Jansa jabber: martin.ja...@gmail.com


signature.asc
Description: Digital signature
-- 
___
Openembedded-core mailing list
Openembedded-core@lists.openembedded.org
http://lists.openembedded.org/mailman/listinfo/openembedded-core


Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)

2019-08-16 Thread richard . purdie
On Fri, 2019-08-16 at 17:00 +0200, Martin Jansa wrote:
> > Will try to bump BB_NUMBER_THREADS from 8 to 72.
> 
> I've tried to remove icecc.bbclass inherit (because it does things
> while parsing and RP probably doesn't have it inherited), but that
> didn't save much time.
> 
> All 3 tests were with bitbake
> 18d4a31fdcec1f0e5d2199d6142f0ce833fca1a7
> 94m19.081s  8 BB_NUMBER_THREADS and icecc
> 82m59.574s  8 BB_NUMBER_THREADS no icecc
> 68m3.556s72 BB_NUMBER_THREADS no icecc
> 
> Still don't know how to get to sub 10min world runs RP was seeing,
> but at least it's as slow as it was before runqeueu changes (or even
> a bit faster in lastest master).

Just thinking out loud, other things which can influence timings:

Is SSTATE_DIR on NFS or local disk?
Are sstate mirrors configured?
Is there an existing build or not, if so, how much is valid?
Underlying filesystem of the build?

Your build seems especially slow at executing through the tasks which
is effectively a test on how fast the system can fork() and return in
some ways. It would be interesting to block dry-run on the server side,
skip the fork and see how the numbers compare.

My build manages some parts of the tasklist faster than others, perhaps
because there are more "free" tasks to execute at some points in the
task graph than others.

Also, I have some patches which improve performance a bit which I'm
still testing.

Cheers,

Richard


-- 
___
Openembedded-core mailing list
Openembedded-core@lists.openembedded.org
http://lists.openembedded.org/mailman/listinfo/openembedded-core


Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)

2019-08-16 Thread Martin Jansa
> Will try to bump BB_NUMBER_THREADS from 8 to 72.

I've tried to remove icecc.bbclass inherit (because it does things while
parsing and RP probably doesn't have it inherited), but that didn't save
much time.

All 3 tests were with bitbake 18d4a31fdcec1f0e5d2199d6142f0ce833fca1a7
94m19.081s  8 BB_NUMBER_THREADS and icecc
82m59.574s  8 BB_NUMBER_THREADS no icecc
68m3.556s72 BB_NUMBER_THREADS no icecc

Still don't know how to get to sub 10min world runs RP was seeing, but at
least it's as slow as it was before runqeueu changes (or even a bit faster
in lastest master).

On Fri, Aug 16, 2019 at 12:24 PM Martin Jansa 
wrote:

> On Thu, Aug 15, 2019 at 05:05:48PM +0200, Martin Jansa wrote:
> > On Tue, Aug 13, 2019 at 10:04:08AM +0100, Richard Purdie wrote:
> > > On Mon, 2019-08-12 at 20:26 +, Peter Kjellerstedt wrote:
> > > > Comparing that build to a corresponding do-nothing build with Thud,
> > > > the time difference matches those three minutes where I have no idea
> > > > what bitbake is doing now that it didn’t need to do before…
> > > >
> > > > Hopefully these time degradations can be solved, because the current
> > > > state of bitbake is barely usable. We also need to look into possible
> > > > ways to improve the cooker output when it is running the setscene
> > > > tasks so it makes some kind of sense again.
> > >
> > > We talked on irc and you pointed at the commit things started to go
> > > wrong. Just to summarise things for the benefit of the list, this is
> > > some quick testing I did:
> > >
> > > "bitbake -p; time bitbake core-image-minimal -n"
> > >
> > > 30.0s 6c7c0cefd34067311144a1d4c01986fe0a4aef26
> > > 30.6s a0d941c787cf3ef030d190903279d311bc05d752
> > > 40.3s 7df31ff36892c2f9c65326b06b4c70
> > > 42.2s a0542ed3ff700eca35f9195f743c9e28bcd50f3e
> > > 45.4s 9983b07fffd19082abded7c3f15cc77d306dd69c
> > > 76.9s master-next
> >
> > I know I'm late to the party, but it took really long for the test to
> finish..
> >
> > I'm not using poky, so reproduce this testing on our builds I've used
> > bitbake revisions matching with poky revision RP was testing:
> >
> > + oe-core 18cdc087fd5da30e2b31f3d4e81b153cd36ca844 for older bitbake,
> because current master isn't compatible with old bitbake
> (bb.data_smart.ExpansionError: Failure expanding variable OE_IMPORTED[:=],
> expression was ${@oe_import(d)} which triggered exception AttributeError:
> module 'bb.siggen' has no attribute 'SignatureGeneratorUniHashMixIn')
> > and oe-core 23db236a054ee7a989cdbbcb42ad5c6eefd4a6ae for newer bitbake
> (to possibly eliminate impact of metadata changes)
> >
> > tested on 72core (Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz) with 380GB
> RAM
> >
> > nothing in TMPDIR/SSTATE_DIR, no SSTATE_MIRRO, no Hash Equivalence
> Server configured
> >
> > on a build with few more layers:
> > Parsing of 3238 .bb files complete (0 cached, 3238 parsed). 7632
> targets, 1927 skipped, 54 masked, 0 errors.
> >
> > but first doing just core-image-minimal like RP did:
> >
> > time  pokybitbake
>  oe-core
> > 2m8.191s  6c7c0cefd34067311144a1d4c01986fe0a4aef26
> 18d4a31fdcec1f0e5d2199d6142f0ce833fca1a7
> 18cdc087fd5da30e2b31f3d4e81b153cd36ca844
> > N/A   a0d941c787cf3ef030d190903279d311bc05d752doesn't
> exist in poky/poky-contrib
> 18cdc087fd5da30e2b31f3d4e81b153cd36ca844
> > 2m17.053s 7df31ff36892c2f9c65326b06b4c70
> 1f630fdf0260db08541d3ca9f25f852931c19905
> 18cdc087fd5da30e2b31f3d4e81b153cd36ca844
> > 2m18.515s a0542ed3ff700eca35f9195f743c9e28bcd50f3e
> f43778c2d19e70d4befd483b06aaf247fc65c799
> 23db236a054ee7a989cdbbcb42ad5c6eefd4a6ae
> > 2m22.220s 9983b07fffd19082abded7c3f15cc77d306dd69c
> 8c26b451f22193ef1c544e2017cc84515566c1b8
> > 2m38.185s master-next
>  fbcb89dc3dbabfc80310e9a4ebe72d919300a32e
> > cache.py:446: ResourceWarning: unclosed  family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0,
> laddr=('127.0.0.1', 41117)>
> >   value = pickled.load()
> > started showing with this revision
> > 2m17.991s master-next + fixups
> f7cd14f90463957b3e4be6d3876def98b78f1424
> > 2m9.651s  master-next + "Holding off tasks %s" out
> >
> > now world
> > 253m36.637s   7df31ff36892c2f9c65326b06b4c70
> 1f630fdf0260db08541d3ca9f25f852931c19905
> 18cdc087fd5da30e2b31f3d4e81b153cd36ca844
> > 174m13.324s   a0542ed3ff700eca35f9195f743c9e28bcd50f3e
> f43778c2d19e70d4befd483b06aaf247fc65c799
> 23db236a054ee7a989cdbbcb42ad5c6eefd4a6ae
> >this is time when killed at "NOTE: Executing Tasks" without showing
> any progress on the tasks (unlike other tests) and only one bitbake process
> is running
> > 633m27.475s   master-next
>  fbcb89dc3dbabfc80310e9a4ebe72d919300a32e
> >this is time when killed at (1417 from 71749) - running very slowly
> unlike other bitbake revisions where the task number changes relatively
> quickly - about 10 tasks per second
> > 89m13.992smaster-next + "Holding off tasks 

Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)

2019-08-16 Thread Martin Jansa
On Thu, Aug 15, 2019 at 05:05:48PM +0200, Martin Jansa wrote:
> On Tue, Aug 13, 2019 at 10:04:08AM +0100, Richard Purdie wrote:
> > On Mon, 2019-08-12 at 20:26 +, Peter Kjellerstedt wrote:
> > > Comparing that build to a corresponding do-nothing build with Thud,
> > > the time difference matches those three minutes where I have no idea
> > > what bitbake is doing now that it didn’t need to do before…
> > >  
> > > Hopefully these time degradations can be solved, because the current
> > > state of bitbake is barely usable. We also need to look into possible
> > > ways to improve the cooker output when it is running the setscene
> > > tasks so it makes some kind of sense again.
> > 
> > We talked on irc and you pointed at the commit things started to go
> > wrong. Just to summarise things for the benefit of the list, this is
> > some quick testing I did:
> > 
> > "bitbake -p; time bitbake core-image-minimal -n"
> > 
> > 30.0s 6c7c0cefd34067311144a1d4c01986fe0a4aef26
> > 30.6s a0d941c787cf3ef030d190903279d311bc05d752
> > 40.3s 7df31ff36892c2f9c65326b06b4c70 
> > 42.2s a0542ed3ff700eca35f9195f743c9e28bcd50f3e 
> > 45.4s 9983b07fffd19082abded7c3f15cc77d306dd69c 
> > 76.9s master-next
> 
> I know I'm late to the party, but it took really long for the test to finish..
> 
> I'm not using poky, so reproduce this testing on our builds I've used
> bitbake revisions matching with poky revision RP was testing:
> 
> + oe-core 18cdc087fd5da30e2b31f3d4e81b153cd36ca844 for older bitbake, because 
> current master isn't compatible with old bitbake 
> (bb.data_smart.ExpansionError: Failure expanding variable OE_IMPORTED[:=], 
> expression was ${@oe_import(d)} which triggered exception AttributeError: 
> module 'bb.siggen' has no attribute 'SignatureGeneratorUniHashMixIn')
> and oe-core 23db236a054ee7a989cdbbcb42ad5c6eefd4a6ae for newer bitbake (to 
> possibly eliminate impact of metadata changes)
> 
> tested on 72core (Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz) with 380GB RAM
> 
> nothing in TMPDIR/SSTATE_DIR, no SSTATE_MIRRO, no Hash Equivalence Server 
> configured
> 
> on a build with few more layers:
> Parsing of 3238 .bb files complete (0 cached, 3238 parsed). 7632 targets, 
> 1927 skipped, 54 masked, 0 errors.
> 
> but first doing just core-image-minimal like RP did:
> 
> time  pokybitbake 
> oe-core
> 2m8.191s  6c7c0cefd34067311144a1d4c01986fe0a4aef26
> 18d4a31fdcec1f0e5d2199d6142f0ce833fca1a7
> 18cdc087fd5da30e2b31f3d4e81b153cd36ca844
> N/A   a0d941c787cf3ef030d190903279d311bc05d752doesn't exist 
> in poky/poky-contrib  18cdc087fd5da30e2b31f3d4e81b153cd36ca844
> 2m17.053s 7df31ff36892c2f9c65326b06b4c70  
> 1f630fdf0260db08541d3ca9f25f852931c19905
> 18cdc087fd5da30e2b31f3d4e81b153cd36ca844
> 2m18.515s a0542ed3ff700eca35f9195f743c9e28bcd50f3e
> f43778c2d19e70d4befd483b06aaf247fc65c799
> 23db236a054ee7a989cdbbcb42ad5c6eefd4a6ae
> 2m22.220s 9983b07fffd19082abded7c3f15cc77d306dd69c
> 8c26b451f22193ef1c544e2017cc84515566c1b8
> 2m38.185s master-next 
> fbcb89dc3dbabfc80310e9a4ebe72d919300a32e
> cache.py:446: ResourceWarning: unclosed  family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, 
> laddr=('127.0.0.1', 41117)>
>   value = pickled.load()
> started showing with this revision
> 2m17.991s master-next + fixups
> f7cd14f90463957b3e4be6d3876def98b78f1424
> 2m9.651s  master-next + "Holding off tasks %s" out
> 
> now world
> 253m36.637s   7df31ff36892c2f9c65326b06b4c70  
> 1f630fdf0260db08541d3ca9f25f852931c19905
> 18cdc087fd5da30e2b31f3d4e81b153cd36ca844
> 174m13.324s   a0542ed3ff700eca35f9195f743c9e28bcd50f3e
> f43778c2d19e70d4befd483b06aaf247fc65c799
> 23db236a054ee7a989cdbbcb42ad5c6eefd4a6ae
>this is time when killed at "NOTE: Executing Tasks" without showing any 
> progress on the tasks (unlike other tests) and only one bitbake process is 
> running
> 633m27.475s   master-next 
> fbcb89dc3dbabfc80310e9a4ebe72d919300a32e
>this is time when killed at (1417 from 71749) - running very slowly unlike 
> other bitbake revisions where the task number changes relatively quickly - 
> about 10 tasks per second
> 89m13.992smaster-next + "Holding off tasks %s" out
> 89m59.201smaster-next updated today   
> 85fe627fdb6510f0942917964386fad9d8c479c8
> 
> Interestingly old 1f630fdf0260db08541d3ca9f25f852931c19905 is 3 times
> slower than recent master-next.
> 
> I'll send -P output next.


So 1f630fdf0260db08541d3ca9f25f852931c19905 was already "bad", because
this commit introduced that
logger.debug(2, "Holding off tasks %s" % pprint.pformat(self.holdoff_tasks))

Here is -P output from it:

5min (til "Executing Tasks" message):

Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)

2019-08-15 Thread Richard Purdie
On Thu, 2019-08-15 at 14:56 +0100, richard.pur...@linuxfoundation.org
wrote:
> On Thu, 2019-08-15 at 14:56 +0200, Alexander Kanavin wrote:
> What is really odd is this on both traces:
> 
>524436  405.1020.001  405.2660.001
> /home/alexander/development/poky/bitbake/lib/bb/cooker.py:264(notific
> ations)
> 
> What that means is that cooker recieved notification of 524436 file
> changes whilst it was parsing/preparing runqueue.
> 
> I would be very interested to understand which files its being
> notified
> of changes to as I suspect there is something that shouldn't be
> there.

This is a bit of a false lead. At least locally I found them to be
profiles of the individual tasks being written out which is only
triggered on profiled builds.

We should write those files to a different directory to avoid this
confusion of the profiles...

Cheers,

Richard

-- 
___
Openembedded-core mailing list
Openembedded-core@lists.openembedded.org
http://lists.openembedded.org/mailman/listinfo/openembedded-core


Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)

2019-08-15 Thread Martin Jansa
On Tue, Aug 13, 2019 at 10:04:08AM +0100, Richard Purdie wrote:
> On Mon, 2019-08-12 at 20:26 +, Peter Kjellerstedt wrote:
> > Comparing that build to a corresponding do-nothing build with Thud,
> > the time difference matches those three minutes where I have no idea
> > what bitbake is doing now that it didn’t need to do before…
> >  
> > Hopefully these time degradations can be solved, because the current
> > state of bitbake is barely usable. We also need to look into possible
> > ways to improve the cooker output when it is running the setscene
> > tasks so it makes some kind of sense again.
> 
> We talked on irc and you pointed at the commit things started to go
> wrong. Just to summarise things for the benefit of the list, this is
> some quick testing I did:
> 
> "bitbake -p; time bitbake core-image-minimal -n"
> 
> 30.0s 6c7c0cefd34067311144a1d4c01986fe0a4aef26
> 30.6s a0d941c787cf3ef030d190903279d311bc05d752
> 40.3s 7df31ff36892c2f9c65326b06b4c70 
> 42.2s a0542ed3ff700eca35f9195f743c9e28bcd50f3e 
> 45.4s 9983b07fffd19082abded7c3f15cc77d306dd69c 
> 76.9s master-next

I know I'm late to the party, but it took really long for the test to finish..

I'm not using poky, so reproduce this testing on our builds I've used
bitbake revisions matching with poky revision RP was testing:

+ oe-core 18cdc087fd5da30e2b31f3d4e81b153cd36ca844 for older bitbake, because 
current master isn't compatible with old bitbake (bb.data_smart.ExpansionError: 
Failure expanding variable OE_IMPORTED[:=], expression was ${@oe_import(d)} 
which triggered exception AttributeError: module 'bb.siggen' has no attribute 
'SignatureGeneratorUniHashMixIn')
and oe-core 23db236a054ee7a989cdbbcb42ad5c6eefd4a6ae for newer bitbake (to 
possibly eliminate impact of metadata changes)

tested on 72core (Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz) with 380GB RAM

nothing in TMPDIR/SSTATE_DIR, no SSTATE_MIRRO, no Hash Equivalence Server 
configured

on a build with few more layers:
Parsing of 3238 .bb files complete (0 cached, 3238 parsed). 7632 targets, 1927 
skipped, 54 masked, 0 errors.

but first doing just core-image-minimal like RP did:

timepokybitbake 
oe-core
2m8.191s6c7c0cefd34067311144a1d4c01986fe0a4aef26
18d4a31fdcec1f0e5d2199d6142f0ce833fca1a7
18cdc087fd5da30e2b31f3d4e81b153cd36ca844
N/A a0d941c787cf3ef030d190903279d311bc05d752doesn't exist 
in poky/poky-contrib  18cdc087fd5da30e2b31f3d4e81b153cd36ca844
2m17.053s   7df31ff36892c2f9c65326b06b4c70  
1f630fdf0260db08541d3ca9f25f852931c19905
18cdc087fd5da30e2b31f3d4e81b153cd36ca844
2m18.515s   a0542ed3ff700eca35f9195f743c9e28bcd50f3e
f43778c2d19e70d4befd483b06aaf247fc65c799
23db236a054ee7a989cdbbcb42ad5c6eefd4a6ae
2m22.220s   9983b07fffd19082abded7c3f15cc77d306dd69c
8c26b451f22193ef1c544e2017cc84515566c1b8
2m38.185s   master-next 
fbcb89dc3dbabfc80310e9a4ebe72d919300a32e
cache.py:446: ResourceWarning: unclosed 
  value = pickled.load()
started showing with this revision
2m17.991s   master-next + fixups
f7cd14f90463957b3e4be6d3876def98b78f1424
2m9.651smaster-next + "Holding off tasks %s" out

now world
253m36.637s 7df31ff36892c2f9c65326b06b4c70  
1f630fdf0260db08541d3ca9f25f852931c19905
18cdc087fd5da30e2b31f3d4e81b153cd36ca844
174m13.324s a0542ed3ff700eca35f9195f743c9e28bcd50f3e
f43778c2d19e70d4befd483b06aaf247fc65c799
23db236a054ee7a989cdbbcb42ad5c6eefd4a6ae
   this is time when killed at "NOTE: Executing Tasks" without showing any 
progress on the tasks (unlike other tests) and only one bitbake process is 
running
633m27.475s master-next 
fbcb89dc3dbabfc80310e9a4ebe72d919300a32e
   this is time when killed at (1417 from 71749) - running very slowly unlike 
other bitbake revisions where the task number changes relatively quickly - 
about 10 tasks per second
89m13.992s  master-next + "Holding off tasks %s" out
89m59.201s  master-next updated today   
85fe627fdb6510f0942917964386fad9d8c479c8

Interestingly old 1f630fdf0260db08541d3ca9f25f852931c19905 is 3 times
slower than recent master-next.

I'll send -P output next.

> 
> So basically the original changes showed a 25% hit but the performance
> of -next is dire.
> 
> This is with no hash equiv server configured.
> 
> It will vary depending on the target used (numbers with -sato for the
> above would be interesting for comparision) and how much was or is in
> sstate, they type of sstate mirror configured and so on.
> 
> I really need to focus on getting the new code functioning correctly
> before we attempt to optimise but if nobody tests the new code due to
> performance problems we have a different issue. We also have a scaling
> 

Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)

2019-08-15 Thread richard . purdie
On Thu, 2019-08-15 at 14:56 +0200, Alexander Kanavin wrote:
> On Wed, 14 Aug 2019 at 23:25, 
> wrote:
> > Right, it still definitely needs work. Its a balancing act between
> > sorting out the execution bugs in the code and figuring out the
> > performance problem.
> > 
> > If anyone wants to experiment, the way I'd debug this is to run the
> > before and after cases with the -P option to bitbake. If you want
> > to
> > exit early just make the code return where it prints "Executing
> > tasks"
> > or whatever makes sense as Ctrl+C won't write the profile data.
> > 
> > You want the profile.log.processed file.
> > 
> > So save that file with the "before" commit, then save it afterwards
> > and
> > look at those files and see where its spending more time.
> > 
> > If someone generates those two files I'll happily take a look, I'm
> > kind
> > of used to reading them. There are four sets of output in there,
> > same
> > data but different sorting/types, each has its uses.
> 
> And here they are (with task spinning included this time):
> http://sensi.org/~ak/tmp/profile.log.processed.before
> http://sensi.org/~ak/tmp/profile.log.processed.after
> 
> Hope you find it useful!

How many tasks are in that build?

What is really odd is this on both traces:

   524436  405.1020.001  405.2660.001 
/home/alexander/development/poky/bitbake/lib/bb/cooker.py:264(notifications)

What that means is that cooker recieved notification of 524436 file
changes whilst it was parsing/preparing runqueue.

I would be very interested to understand which files its being notified
of changes to as I suspect there is something that shouldn't be there.

If we fixed that, it would take 406s off the 1800s for starters in both
cases. I don't see that locally in my trace, my build layout may be
different.

Other things of note:

Time in next_buildable_task went up 338 -> 440s
(probably through the set manipulation e.g. method 'difference' of
'set' objects)
New function update_holdofftasks() took 173s

build_taskdepdata is also eating lots of time suddenly.

So yes, useful, thanks. You may want to add some debug to that
nofitications() function in cooker and see which files are changing
though.

Cheers,

Richard

-- 
___
Openembedded-core mailing list
Openembedded-core@lists.openembedded.org
http://lists.openembedded.org/mailman/listinfo/openembedded-core


Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)

2019-08-15 Thread Alexander Kanavin
On Wed, 14 Aug 2019 at 23:25,  wrote:

> Right, it still definitely needs work. Its a balancing act between
> sorting out the execution bugs in the code and figuring out the
> performance problem.
>
> If anyone wants to experiment, the way I'd debug this is to run the
> before and after cases with the -P option to bitbake. If you want to
> exit early just make the code return where it prints "Executing tasks"
> or whatever makes sense as Ctrl+C won't write the profile data.
>
> You want the profile.log.processed file.
>
> So save that file with the "before" commit, then save it afterwards and
> look at those files and see where its spending more time.
>
> If someone generates those two files I'll happily take a look, I'm kind
> of used to reading them. There are four sets of output in there, same
> data but different sorting/types, each has its uses.
>

And here they are (with task spinning included this time):
http://sensi.org/~ak/tmp/profile.log.processed.before
http://sensi.org/~ak/tmp/profile.log.processed.after

Hope you find it useful!

Alex
-- 
___
Openembedded-core mailing list
Openembedded-core@lists.openembedded.org
http://lists.openembedded.org/mailman/listinfo/openembedded-core


Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)

2019-08-14 Thread richard . purdie
On Wed, 2019-08-14 at 22:27 +0200, Alexander Kanavin wrote:
> On Wed, 14 Aug 2019 at 14:55, 
> wrote:
> > You followed up mentioning this wasn't with master-next. I think
> > there
> > is a patch in -next which will help with the empty task spin so
> > both
> > together might get us back to more normal numbers.
> > 
> 
> As all of these patches are now in master, I re-ran the test with
> that (209f89ab8ed51ac2867ca8f749336af1ee24ab25), but without
> including the spinning task part, pressing ctrl-c just as it starts.
> The outcome is 9m42s, compared to 2m9s for the baseline
> (6c7c0cefd34067311144a1d4c01986fe0a4aef26). So the worst is fixed,
> but the slowdown is still noticeable.

Right, it still definitely needs work. Its a balancing act between
sorting out the execution bugs in the code and figuring out the
performance problem.

If anyone wants to experiment, the way I'd debug this is to run the
before and after cases with the -P option to bitbake. If you want to
exit early just make the code return where it prints "Executing tasks"
or whatever makes sense as Ctrl+C won't write the profile data.

You want the profile.log.processed file.

So save that file with the "before" commit, then save it afterwards and
look at those files and see where its spending more time.

If someone generates those two files I'll happily take a look, I'm kind
of used to reading them. There are four sets of output in there, same
data but different sorting/types, each has its uses.

Cheers,

Richard

-- 
___
Openembedded-core mailing list
Openembedded-core@lists.openembedded.org
http://lists.openembedded.org/mailman/listinfo/openembedded-core


Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)

2019-08-14 Thread Alexander Kanavin
On Wed, 14 Aug 2019 at 14:55,  wrote:

> You followed up mentioning this wasn't with master-next. I think there
> is a patch in -next which will help with the empty task spin so both
> together might get us back to more normal numbers.
>

As all of these patches are now in master, I re-ran the test with that
(209f89ab8ed51ac2867ca8f749336af1ee24ab25), but without including the
spinning task part, pressing ctrl-c just as it starts. The outcome is
9m42s, compared to 2m9s for the baseline
(6c7c0cefd34067311144a1d4c01986fe0a4aef26).
So the worst is fixed, but the slowdown is still noticeable.

Alex
-- 
___
Openembedded-core mailing list
Openembedded-core@lists.openembedded.org
http://lists.openembedded.org/mailman/listinfo/openembedded-core


Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)

2019-08-14 Thread Khem Raj
On Wed, Aug 14, 2019 at 7:57 AM Peter Kjellerstedt
 wrote:
>
> > -Original Message-
> > From: richard.pur...@linuxfoundation.org
> > 
> > Sent: den 14 augusti 2019 14:56
> > To: Alexander Kanavin 
> > Cc: Peter Kjellerstedt ; Khem Raj
> > ; OE-core  > c...@lists.openembedded.org>
> > Subject: Re: [OE-core] Long delays with latest bitbake (was: [PATCH
> > 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS
> > recursively)
> >
> > On Wed, 2019-08-14 at 14:08 +0200, Alexander Kanavin wrote:
> > > On Wed, 14 Aug 2019 at 13:36, 
> > > wrote:
> > > > On Wed, 2019-08-14 at 13:25 +0200, Alexander Kanavin wrote:
> > > > > On Tue, 13 Aug 2019 at 21:18, Richard Purdie <
> > > > > richard.pur...@linuxfoundation.org> wrote:
> > > > > > I had a glance at the profile output from master-next and the
> > > > > > problem
> > > > > > wasn't where I thought it would be, it was in the scheduler
> > > > code.
> > > > > > That
> > > > > > is good as those classes are effectively independent of the
> > > > other
> > > > > > changes and hence are a separate fix.
> > > > > >
> > > > > > I've put a patch in -next which takes the above test to 36s
> > > > which
> > > > > > is
> > > > > > close to the older bitbake.
> > > > > >
> > > > > > Could be interesting to see how it looks for others and
> > > > different
> > > > > > workloads.
> > > > >
> > > > > I just tried the same test I did yesterday with
> > > > > ab56d466452148e5fce330d279d13e2495eceb1f. Unfortunately it
> > > > doesn't
> > > > > seem to improve things much: bitbake is stuck at "NOTE: Executing
> > > > > Tasks" for 15 minutes now.
> > > >
> > > > This might sound slightly crazy but can you try commenting out this
> > > > line in runqueue.py:
> > > >
> > > > logger.debug(2, "Holding off tasks %s" %
> > > > pprint.pformat(self.holdoff_tasks))
> > > >
> > > > ?
> > >
> > > Even crazier is the outcome: it helped!
> >
> > Cool, I think I can explain it.
> >
> > The holdoff_tasks list can contain a list of nearly all the tasks at
> > some points in execution. Even though the debug messages aren't being
> > printed on the console, they are being sent over the internal IPC bus
> > between the cooker, UI and other event handlers. Obviously for small
> > task lists its not a problem, for large ones its multiple 4k chunks
> > over pipes which isn't going to be fast.
> >
> > We have done a lot of optimisation in the past but its all too easy to
> > trend on something like this and upset things :/.
> >
> > > The whole thing completed after 15m49secons (with much of the time
> > > going to the empty task spin), that's some 3 minutes slower, but
> > > certainly it's usable again.
> >
> > You followed up mentioning this wasn't with master-next. I think there
> > is a patch in -next which will help with the empty task spin so both
> > together might get us back to more normal numbers.
> >
> > Cheers,
> >
> > Richard
>
> I can just confirm that removing the debug line removes almost all of
> the delays I was seeing. Here are some time statistics from my builds:
>
> 6c7c0cef: 02:50
> 7df31ff3: 06:13  (first attempt)
> a0542ed3: 06:32  (master)
> 19a88c68: 43:19* (~yesterday's master-next)
> b0a0e4a6: 06:55  (master-next)
> no debug: 03:04  (master-next + removal of "Holding off tasks" debug)
>
> * I aborted this build after about half of the 12540 tasks were done...
>
> One thing I have noticed while I was doing the timings is that I do
> not seem to be able to kill the bitbake server with bitbake -m after
> the builds have completed, but have to resort to using kill -9 most
> of the time...

me too.  My world build time came down to < 15 mins as well, Last
night it was at 45 mins and still parsing before I stopped it.

>
> //Peter
>
-- 
___
Openembedded-core mailing list
Openembedded-core@lists.openembedded.org
http://lists.openembedded.org/mailman/listinfo/openembedded-core


Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)

2019-08-14 Thread Peter Kjellerstedt
> -Original Message-
> From: richard.pur...@linuxfoundation.org
> 
> Sent: den 14 augusti 2019 14:56
> To: Alexander Kanavin 
> Cc: Peter Kjellerstedt ; Khem Raj
> ; OE-core  c...@lists.openembedded.org>
> Subject: Re: [OE-core] Long delays with latest bitbake (was: [PATCH
> 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS
> recursively)
> 
> On Wed, 2019-08-14 at 14:08 +0200, Alexander Kanavin wrote:
> > On Wed, 14 Aug 2019 at 13:36, 
> > wrote:
> > > On Wed, 2019-08-14 at 13:25 +0200, Alexander Kanavin wrote:
> > > > On Tue, 13 Aug 2019 at 21:18, Richard Purdie <
> > > > richard.pur...@linuxfoundation.org> wrote:
> > > > > I had a glance at the profile output from master-next and the
> > > > > problem
> > > > > wasn't where I thought it would be, it was in the scheduler
> > > code.
> > > > > That
> > > > > is good as those classes are effectively independent of the
> > > other
> > > > > changes and hence are a separate fix.
> > > > >
> > > > > I've put a patch in -next which takes the above test to 36s
> > > which
> > > > > is
> > > > > close to the older bitbake.
> > > > >
> > > > > Could be interesting to see how it looks for others and
> > > different
> > > > > workloads.
> > > >
> > > > I just tried the same test I did yesterday with
> > > > ab56d466452148e5fce330d279d13e2495eceb1f. Unfortunately it
> > > doesn't
> > > > seem to improve things much: bitbake is stuck at "NOTE: Executing
> > > > Tasks" for 15 minutes now.
> > >
> > > This might sound slightly crazy but can you try commenting out this
> > > line in runqueue.py:
> > >
> > > logger.debug(2, "Holding off tasks %s" %
> > > pprint.pformat(self.holdoff_tasks))
> > >
> > > ?
> >
> > Even crazier is the outcome: it helped!
> 
> Cool, I think I can explain it.
> 
> The holdoff_tasks list can contain a list of nearly all the tasks at
> some points in execution. Even though the debug messages aren't being
> printed on the console, they are being sent over the internal IPC bus
> between the cooker, UI and other event handlers. Obviously for small
> task lists its not a problem, for large ones its multiple 4k chunks
> over pipes which isn't going to be fast.
> 
> We have done a lot of optimisation in the past but its all too easy to
> trend on something like this and upset things :/.
> 
> > The whole thing completed after 15m49secons (with much of the time
> > going to the empty task spin), that's some 3 minutes slower, but
> > certainly it's usable again.
> 
> You followed up mentioning this wasn't with master-next. I think there
> is a patch in -next which will help with the empty task spin so both
> together might get us back to more normal numbers.
> 
> Cheers,
> 
> Richard

I can just confirm that removing the debug line removes almost all of 
the delays I was seeing. Here are some time statistics from my builds:

6c7c0cef: 02:50
7df31ff3: 06:13  (first attempt)
a0542ed3: 06:32  (master)
19a88c68: 43:19* (~yesterday's master-next)
b0a0e4a6: 06:55  (master-next)
no debug: 03:04  (master-next + removal of "Holding off tasks" debug)

* I aborted this build after about half of the 12540 tasks were done...

One thing I have noticed while I was doing the timings is that I do 
not seem to be able to kill the bitbake server with bitbake -m after 
the builds have completed, but have to resort to using kill -9 most 
of the time... 

//Peter

-- 
___
Openembedded-core mailing list
Openembedded-core@lists.openembedded.org
http://lists.openembedded.org/mailman/listinfo/openembedded-core


Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)

2019-08-14 Thread richard . purdie
On Wed, 2019-08-14 at 14:08 +0200, Alexander Kanavin wrote:
> On Wed, 14 Aug 2019 at 13:36, 
> wrote:
> > On Wed, 2019-08-14 at 13:25 +0200, Alexander Kanavin wrote:
> > > On Tue, 13 Aug 2019 at 21:18, Richard Purdie <
> > > richard.pur...@linuxfoundation.org> wrote:
> > > > I had a glance at the profile output from master-next and the
> > > > problem
> > > > wasn't where I thought it would be, it was in the scheduler
> > code.
> > > > That
> > > > is good as those classes are effectively independent of the
> > other
> > > > changes and hence are a separate fix.
> > > > 
> > > > I've put a patch in -next which takes the above test to 36s
> > which
> > > > is
> > > > close to the older bitbake.
> > > > 
> > > > Could be interesting to see how it looks for others and
> > different
> > > > workloads.
> > > 
> > > I just tried the same test I did yesterday with
> > > ab56d466452148e5fce330d279d13e2495eceb1f. Unfortunately it
> > doesn't
> > > seem to improve things much: bitbake is stuck at "NOTE: Executing
> > > Tasks" for 15 minutes now.
> > 
> > This might sound slightly crazy but can you try commenting out this
> > line in runqueue.py:
> > 
> > logger.debug(2, "Holding off tasks %s" %
> > pprint.pformat(self.holdoff_tasks))
> > 
> > ?
> 
> Even crazier is the outcome: it helped! 

Cool, I think I can explain it.

The holdoff_tasks list can contain a list of nearly all the tasks at
some points in execution. Even though the debug messages aren't being
printed on the console, they are being sent over the internal IPC bus
between the cooker, UI and other event handlers. Obviously for small
task lists its not a problem, for large ones its multiple 4k chunks
over pipes which isn't going to be fast.

We have done a lot of optimisation in the past but its all too easy to
trend on something like this and upset things :/.

> The whole thing completed after 15m49secons (with much of the time
> going to the empty task spin), that's some 3 minutes slower, but
> certainly it's usable again.

You followed up mentioning this wasn't with master-next. I think there
is a patch in -next which will help with the empty task spin so both
together might get us back to more normal numbers.

Cheers,

Richard

-- 
___
Openembedded-core mailing list
Openembedded-core@lists.openembedded.org
http://lists.openembedded.org/mailman/listinfo/openembedded-core


Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)

2019-08-14 Thread Mikko.Rapeli
On Wed, Aug 14, 2019 at 02:08:01PM +0200, Alexander Kanavin wrote:
> On Wed, 14 Aug 2019 at 13:36,  wrote:
> 
> > On Wed, 2019-08-14 at 13:25 +0200, Alexander Kanavin wrote:
> > > On Tue, 13 Aug 2019 at 21:18, Richard Purdie <
> > > richard.pur...@linuxfoundation.org> wrote:
> > > > I had a glance at the profile output from master-next and the
> > > > problem
> > > > wasn't where I thought it would be, it was in the scheduler code.
> > > > That
> > > > is good as those classes are effectively independent of the other
> > > > changes and hence are a separate fix.
> > > >
> > > > I've put a patch in -next which takes the above test to 36s which
> > > > is
> > > > close to the older bitbake.
> > > >
> > > > Could be interesting to see how it looks for others and different
> > > > workloads.
> > >
> > > I just tried the same test I did yesterday with
> > > ab56d466452148e5fce330d279d13e2495eceb1f. Unfortunately it doesn't
> > > seem to improve things much: bitbake is stuck at "NOTE: Executing
> > > Tasks" for 15 minutes now.
> >
> > This might sound slightly crazy but can you try commenting out this
> > line in runqueue.py:
> >
> > logger.debug(2, "Holding off tasks %s" %
> > pprint.pformat(self.holdoff_tasks))
> >
> > ?
> >
> 
> Even crazier is the outcome: it helped! The whole thing completed after
> 15m49secons (with much of the time going to the empty task spin), that's
> some 3 minutes slower, but certainly it's usable again.
> 
> I have not enabled the hash server at any point.

Indeed, commenting that debug statement out removes few minutes of the
bitbake preparation times.

Here is a time stamped output from master branch:

2019-08-14_14:29:24  
2019-08-14_14:29:35  Initialising tasks...done.
2019-08-14_14:32:41  Checking sstate mirror object availability...done.
2019-08-14_14:32:41  Sstate summary: Wanted 436 Found 426 Missed 10 Current 
2407 (97% match, 99% complete)
2019-08-14_14:32:41  NOTE: Executing Tasks
2019-08-14_14:32:52  NOTE: Setscene tasks completed

And with the comment removed:

2019-08-14_14:35:10  
2019-08-14_14:35:21  Initialising tasks...done.
2019-08-14_14:35:30  Checking sstate mirror object availability...done.
2019-08-14_14:35:30  Sstate summary: Wanted 436 Found 426 Missed 10 Current 
2407 (97% match, 99% complete)
2019-08-14_14:35:30  NOTE: Executing Tasks

And just in case back to master branch state:

2019-08-14_14:41:32  
2019-08-14_14:41:43  Initialising tasks...done.
2019-08-14_14:45:02  Checking sstate mirror object availability...done.
2019-08-14_14:45:02  Sstate summary: Wanted 436 Found 426 Missed 10 Current 
2407 (97% match, 99% complete)
2019-08-14_14:45:02  NOTE: Executing Tasks

I did not flush caches in between and I stopped the builds once bitbake started 
doing stuff, so I think sstate cache was completely buffered in memory
from file system and IO delays did not affect the timings.

Change to poky was exactly:

--- a/bitbake/lib/bb/runqueue.py
+++ b/bitbake/lib/bb/runqueue.py
@@ -2216,7 +2216,7 @@ class RunQueueExecute:
 for dep in self.sqdata.sq_covered_tasks[tid]:
 if dep not in self.runq_complete:
 self.holdoff_tasks.add(dep)
-logger.debug(2, "Holding off tasks %s" % 
pprint.pformat(self.holdoff_tasks))
+# logger.debug(2, "Holding off tasks %s" % 
pprint.pformat(self.holdoff_tasks))
 
 def process_possible_migrations(self):
 changes = False

-Mikko
-- 
___
Openembedded-core mailing list
Openembedded-core@lists.openembedded.org
http://lists.openembedded.org/mailman/listinfo/openembedded-core


Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)

2019-08-14 Thread Alexander Kanavin
On Wed, 14 Aug 2019 at 14:08, Alexander Kanavin 
wrote:

>
> This might sound slightly crazy but can you try commenting out this
>> line in runqueue.py:
>>
>> logger.debug(2, "Holding off tasks %s" %
>> pprint.pformat(self.holdoff_tasks))
>>
>> ?
>>
>
> Even crazier is the outcome: it helped! The whole thing completed after
> 15m49secons (with much of the time going to the empty task spin), that's
> some 3 minutes slower, but certainly it's usable again.
>
> I have not enabled the hash server at any point.
>

I need to clarify: this was with current master
(18b8e2e10494d3b0e18743a1bf9e99038da4229c), not master-next.

Alex
-- 
___
Openembedded-core mailing list
Openembedded-core@lists.openembedded.org
http://lists.openembedded.org/mailman/listinfo/openembedded-core


Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)

2019-08-14 Thread Alexander Kanavin
On Wed, 14 Aug 2019 at 13:36,  wrote:

> On Wed, 2019-08-14 at 13:25 +0200, Alexander Kanavin wrote:
> > On Tue, 13 Aug 2019 at 21:18, Richard Purdie <
> > richard.pur...@linuxfoundation.org> wrote:
> > > I had a glance at the profile output from master-next and the
> > > problem
> > > wasn't where I thought it would be, it was in the scheduler code.
> > > That
> > > is good as those classes are effectively independent of the other
> > > changes and hence are a separate fix.
> > >
> > > I've put a patch in -next which takes the above test to 36s which
> > > is
> > > close to the older bitbake.
> > >
> > > Could be interesting to see how it looks for others and different
> > > workloads.
> >
> > I just tried the same test I did yesterday with
> > ab56d466452148e5fce330d279d13e2495eceb1f. Unfortunately it doesn't
> > seem to improve things much: bitbake is stuck at "NOTE: Executing
> > Tasks" for 15 minutes now.
>
> This might sound slightly crazy but can you try commenting out this
> line in runqueue.py:
>
> logger.debug(2, "Holding off tasks %s" %
> pprint.pformat(self.holdoff_tasks))
>
> ?
>

Even crazier is the outcome: it helped! The whole thing completed after
15m49secons (with much of the time going to the empty task spin), that's
some 3 minutes slower, but certainly it's usable again.

I have not enabled the hash server at any point.

Alex
-- 
___
Openembedded-core mailing list
Openembedded-core@lists.openembedded.org
http://lists.openembedded.org/mailman/listinfo/openembedded-core


Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)

2019-08-14 Thread richard . purdie
On Wed, 2019-08-14 at 13:25 +0200, Alexander Kanavin wrote:
> On Tue, 13 Aug 2019 at 21:18, Richard Purdie <
> richard.pur...@linuxfoundation.org> wrote:
> > I had a glance at the profile output from master-next and the
> > problem
> > wasn't where I thought it would be, it was in the scheduler code.
> > That
> > is good as those classes are effectively independent of the other
> > changes and hence are a separate fix.
> > 
> > I've put a patch in -next which takes the above test to 36s which
> > is
> > close to the older bitbake.
> > 
> > Could be interesting to see how it looks for others and different
> > workloads.
> 
> I just tried the same test I did yesterday with
> ab56d466452148e5fce330d279d13e2495eceb1f. Unfortunately it doesn't
> seem to improve things much: bitbake is stuck at "NOTE: Executing
> Tasks" for 15 minutes now.

This might sound slightly crazy but can you try commenting out this
line in runqueue.py:

logger.debug(2, "Holding off tasks %s" % pprint.pformat(self.holdoff_tasks))

?

This was with the hash server disabled, right?

Cheers,

Richard

-- 
___
Openembedded-core mailing list
Openembedded-core@lists.openembedded.org
http://lists.openembedded.org/mailman/listinfo/openembedded-core


Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)

2019-08-14 Thread Alexander Kanavin
On Tue, 13 Aug 2019 at 21:18, Richard Purdie <
richard.pur...@linuxfoundation.org> wrote:

> I had a glance at the profile output from master-next and the problem
> wasn't where I thought it would be, it was in the scheduler code. That
> is good as those classes are effectively independent of the other
> changes and hence are a separate fix.
>
> I've put a patch in -next which takes the above test to 36s which is
> close to the older bitbake.
>
> Could be interesting to see how it looks for others and different
> workloads.
>

I just tried the same test I did yesterday with
ab56d466452148e5fce330d279d13e2495eceb1f. Unfortunately it doesn't seem to
improve things much: bitbake is stuck at "NOTE: Executing Tasks" for 15
minutes now.

Alex
-- 
___
Openembedded-core mailing list
Openembedded-core@lists.openembedded.org
http://lists.openembedded.org/mailman/listinfo/openembedded-core


Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)

2019-08-13 Thread Khem Raj
On Tue, Aug 13, 2019 at 12:18 PM Richard Purdie
 wrote:
>
> On Tue, 2019-08-13 at 10:04 +0100, Richard Purdie wrote:
> > On Mon, 2019-08-12 at 20:26 +, Peter Kjellerstedt wrote:
> > > Comparing that build to a corresponding do-nothing build with Thud,
> > > the time difference matches those three minutes where I have no
> > > idea
> > > what bitbake is doing now that it didn’t need to do before…
> > >
> > > Hopefully these time degradations can be solved, because the
> > > current
> > > state of bitbake is barely usable. We also need to look into
> > > possible
> > > ways to improve the cooker output when it is running the setscene
> > > tasks so it makes some kind of sense again.
> >
> > We talked on irc and you pointed at the commit things started to go
> > wrong. Just to summarise things for the benefit of the list, this is
> > some quick testing I did:
> >
> > "bitbake -p; time bitbake core-image-minimal -n"
> >
> > 30.0s 6c7c0cefd34067311144a1d4c01986fe0a4aef26
> > 30.6s a0d941c787cf3ef030d190903279d311bc05d752
> > 40.3s 7df31ff36892c2f9c65326b06b4c70
> > 42.2s a0542ed3ff700eca35f9195f743c9e28bcd50f3e
> > 45.4s 9983b07fffd19082abded7c3f15cc77d306dd69c
> > 76.9s master-next
> >
> > So basically the original changes showed a 25% hit but the
> > performance
> > of -next is dire.
> >
> > This is with no hash equiv server configured.
> >
> > It will vary depending on the target used (numbers with -sato for the
> > above would be interesting for comparision) and how much was or is in
> > sstate, they type of sstate mirror configured and so on.
> >
> > I really need to focus on getting the new code functioning correctly
> > before we attempt to optimise but if nobody tests the new code due to
> > performance problems we have a different issue. We also have a
> > scaling
> > problem with the hash server itself I need to fix to stop the
> > autobuilder throwing weird errors. I'm therefore a bit challenged on
> > where to start with it all :/.
>
> I had a glance at the profile output from master-next and the problem
> wasn't where I thought it would be, it was in the scheduler code. That
> is good as those classes are effectively independent of the other
> changes and hence are a separate fix.
>
> I've put a patch in -next which takes the above test to 36s which is
> close to the older bitbake.
>
> Could be interesting to see how it looks for others and different
> workloads.
>

will try it tonight

> Cheers,
>
> Richard
>
-- 
___
Openembedded-core mailing list
Openembedded-core@lists.openembedded.org
http://lists.openembedded.org/mailman/listinfo/openembedded-core


Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)

2019-08-13 Thread Richard Purdie
On Tue, 2019-08-13 at 10:04 +0100, Richard Purdie wrote:
> On Mon, 2019-08-12 at 20:26 +, Peter Kjellerstedt wrote:
> > Comparing that build to a corresponding do-nothing build with Thud,
> > the time difference matches those three minutes where I have no
> > idea
> > what bitbake is doing now that it didn’t need to do before…
> >  
> > Hopefully these time degradations can be solved, because the
> > current
> > state of bitbake is barely usable. We also need to look into
> > possible
> > ways to improve the cooker output when it is running the setscene
> > tasks so it makes some kind of sense again.
> 
> We talked on irc and you pointed at the commit things started to go
> wrong. Just to summarise things for the benefit of the list, this is
> some quick testing I did:
> 
> "bitbake -p; time bitbake core-image-minimal -n"
> 
> 30.0s 6c7c0cefd34067311144a1d4c01986fe0a4aef26
> 30.6s a0d941c787cf3ef030d190903279d311bc05d752
> 40.3s 7df31ff36892c2f9c65326b06b4c70 
> 42.2s a0542ed3ff700eca35f9195f743c9e28bcd50f3e 
> 45.4s 9983b07fffd19082abded7c3f15cc77d306dd69c 
> 76.9s master-next
> 
> So basically the original changes showed a 25% hit but the
> performance
> of -next is dire.
> 
> This is with no hash equiv server configured.
> 
> It will vary depending on the target used (numbers with -sato for the
> above would be interesting for comparision) and how much was or is in
> sstate, they type of sstate mirror configured and so on.
> 
> I really need to focus on getting the new code functioning correctly
> before we attempt to optimise but if nobody tests the new code due to
> performance problems we have a different issue. We also have a
> scaling
> problem with the hash server itself I need to fix to stop the
> autobuilder throwing weird errors. I'm therefore a bit challenged on
> where to start with it all :/.

I had a glance at the profile output from master-next and the problem
wasn't where I thought it would be, it was in the scheduler code. That
is good as those classes are effectively independent of the other
changes and hence are a separate fix.

I've put a patch in -next which takes the above test to 36s which is
close to the older bitbake.

Could be interesting to see how it looks for others and different
workloads.

Cheers,

Richard

-- 
___
Openembedded-core mailing list
Openembedded-core@lists.openembedded.org
http://lists.openembedded.org/mailman/listinfo/openembedded-core


Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)

2019-08-13 Thread Alexander Kanavin
On Tue, 13 Aug 2019 at 11:04, Richard Purdie <
richard.pur...@linuxfoundation.org> wrote:

> We talked on irc and you pointed at the commit things started to go
> wrong. Just to summarise things for the benefit of the list, this is
> some quick testing I did:
>
> "bitbake -p; time bitbake core-image-minimal -n"
>
> 30.0s 6c7c0cefd34067311144a1d4c01986fe0a4aef26
> 30.6s a0d941c787cf3ef030d190903279d311bc05d752
> 40.3s 7df31ff36892c2f9c65326b06b4c70
> 42.2s a0542ed3ff700eca35f9195f743c9e28bcd50f3e
> 45.4s 9983b07fffd19082abded7c3f15cc77d306dd69c
> 76.9s master-next
>
> So basically the original changes showed a 25% hit but the performance
> of -next is dire.
>

Sadly with larger sets the performance hit is much worse. I enabled all of
the layers in meta-oe, and ran this with empty sstate and build directories:
bitbake -p; time bitbake -n -k world

The outcomes are:

6c7c0cefd34067311144a1d4c01986fe0a4aef26
12m51,848s
(this is the baseline; the bulk of the time goes towards going through the
task list without executing the tasks - the 'spinning task counter" part)

9983b07fffd19082abded7c3f15cc77d306dd69c
66m29,563s

So there could be something quadratically-growing going on with these new
commits :(

I didn't dare to try master-next after seeing above how it hits
core-image-minimal and master already regressing to over an hour :-(

Alex
-- 
___
Openembedded-core mailing list
Openembedded-core@lists.openembedded.org
http://lists.openembedded.org/mailman/listinfo/openembedded-core


Re: [OE-core] Long delays with latest bitbake (was: [PATCH 1/7] insane.bbclass: in file-rdeps do not look into RDEPENDS recursively)

2019-08-13 Thread Richard Purdie
On Mon, 2019-08-12 at 20:26 +, Peter Kjellerstedt wrote:
> Comparing that build to a corresponding do-nothing build with Thud,
> the time difference matches those three minutes where I have no idea
> what bitbake is doing now that it didn’t need to do before…
>  
> Hopefully these time degradations can be solved, because the current
> state of bitbake is barely usable. We also need to look into possible
> ways to improve the cooker output when it is running the setscene
> tasks so it makes some kind of sense again.

We talked on irc and you pointed at the commit things started to go
wrong. Just to summarise things for the benefit of the list, this is
some quick testing I did:

"bitbake -p; time bitbake core-image-minimal -n"

30.0s 6c7c0cefd34067311144a1d4c01986fe0a4aef26
30.6s a0d941c787cf3ef030d190903279d311bc05d752
40.3s 7df31ff36892c2f9c65326b06b4c70 
42.2s a0542ed3ff700eca35f9195f743c9e28bcd50f3e 
45.4s 9983b07fffd19082abded7c3f15cc77d306dd69c 
76.9s master-next

So basically the original changes showed a 25% hit but the performance
of -next is dire.

This is with no hash equiv server configured.

It will vary depending on the target used (numbers with -sato for the
above would be interesting for comparision) and how much was or is in
sstate, they type of sstate mirror configured and so on.

I really need to focus on getting the new code functioning correctly
before we attempt to optimise but if nobody tests the new code due to
performance problems we have a different issue. We also have a scaling
problem with the hash server itself I need to fix to stop the
autobuilder throwing weird errors. I'm therefore a bit challenged on
where to start with it all :/.

Cheers,

Richard



-- 
___
Openembedded-core mailing list
Openembedded-core@lists.openembedded.org
http://lists.openembedded.org/mailman/listinfo/openembedded-core