Re: [yocto] Build time data

2012-04-19 Thread Koen Kooi

On 13 Apr 2012, at 10:45, Richard Purdie wrote the following:

 On Thu, 2012-04-12 at 07:34 -0700, Darren Hart wrote:
 
 On 04/12/2012 07:08 AM, Björn Stenberg wrote:
 Darren Hart wrote:
 /dev/md0/build  ext4 
 noauto,noatime,nodiratime,commit=6000
 
 A minor detail: 'nodiratime' is a subset of 'noatime', so there is no
 need to specify both.
 
 Excellent, thanks for the tip.
 
 Note the key here is that for a system with large amounts of memory, you
 can effectively keep the build in memory due to the long commit time.
 
 All the tests I've done show we are not IO bound anyway.

Consider this scenario:

OS disk on spinning rust (sda1, /)
BUILDDIR on spinning rust (sdb1, /OE)
WORKDIR on SSD (sdc1, /OE/build/tmp/work)
SD card in USB reader (sde1)

When I do the following during a build, all CPUs enter IO wait and the 
build grinds to a halt:

cd /media ; xz -d -c foo.img.xz | pv -s 3488M > /dev/sde

That only touches the OS disk and the SD card, but for some reason the 3.2.8 
kernel stops IO to the OE disks as well. do_patch for my kernel recipe has been 
running for more than an hour now; it usually completes in less than 5 minutes (a 
few hundred patches applied with a custom patcher, git-am).
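(A hedged workaround sketch, assuming the stall comes from dirty page cache writeback for the SD card: writing at idle IO priority and bypassing the page cache may keep the build disks responsive. The image name and size are taken from the command above.)

   # decompress as before, but write the SD card with O_DIRECT at idle IO priority
   xz -d -c foo.img.xz | pv -s 3488M | ionice -c3 dd of=/dev/sde bs=4M iflag=fullblock oflag=direct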

regards,

Koen


Re: [yocto] Build time data

2012-04-19 Thread Joshua Immanuel
Hello,

On Fri, 2012-04-13 at 09:45 +0100, Richard Purdie wrote:
 There are undoubtedly ways we can improve performance but I think
 we've done the low hanging fruit and we need some fresh ideas.

Is there a way to integrate distcc into Yocto so that we could distribute
the build across machines?
-- 
Joshua Immanuel
HiPro IT Solutions Private Limited
http://hipro.co.in




Re: [yocto] Build time data

2012-04-19 Thread Richard Purdie
On Thu, 2012-04-19 at 18:18 +0530, Joshua Immanuel wrote:
 Hello,
 
 On Fri, 2012-04-13 at 09:45 +0100, Richard Purdie wrote:
  There are undoubtedly ways we can improve performance but I think
  we've done the low hanging fruit and we need some fresh ideas.
 
 Is there a way to integrate distcc into Yocto so that we could distribute
 the build across machines?

See icecream.bbclass, but compiling is not the bottleneck; it's configure,
install and packaging...
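(For reference, a minimal local.conf sketch for trying the class; it assumes the icecc daemons are already running on the participating machines, and variable names and defaults may differ between releases.)

   # distribute compile steps via icecream
   INHERIT += "icecream"
   # parallelism used for the distributed compile jobs
   ICECC_PARALLEL_MAKE = "-j 32"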

Cheers,

Richard



Re: [yocto] Build time data

2012-04-19 Thread Samuel Stirtzel
2012/4/19 Richard Purdie richard.pur...@linuxfoundation.org:
 On Thu, 2012-04-19 at 18:18 +0530, Joshua Immanuel wrote:
 Hello,

 On Fri, 2012-04-13 at 09:45 +0100, Richard Purdie wrote:
  There are undoubtedly ways we can improve performance but I think
  we've done the low hanging fruit and we need some fresh ideas.

  Is there a way to integrate distcc into Yocto so that we could distribute
  the build across machines?

 See icecream.bbclass, but compiling is not the bottleneck; it's configure,
 install and packaging...

Multi-threaded package managers come to mind, as does multi-threaded
bzip2 (see [1]).
Maybe multi-threaded autotools / cmake too, but that is future talk
(and a headache for the developers).
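(A rough illustration of the pbzip2 point, not a measurement from this thread; the image filename is just an example.)

   # single-threaded compression
   time bzip2 -k core-image-minimal.ext3
   # parallel compression, -p selects the number of threads
   time pbzip2 -k -p$(nproc) core-image-minimal.ext3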



 Cheers,

 Richard



[1] http://compression.ca/pbzip2/

-- 
Regards
Samuel


Re: [yocto] Build time data

2012-04-19 Thread Chris Tapp
On 18 Apr 2012, at 21:55, Darren Hart wrote:

snip

 A couple of things to keep in mind here. The minimal build is very
 serialized in comparison to something like a sato build. If you want to
 optimize your build times, look at the bbmatrix* scripts shipped with
 poky to find the sweet spot for your target image and your build system.
 I suspect you will find your BB_NUMBER_THREADS and PARALLEL_MAKE
 settings are too high for your system. I'd start with them at 8 and 8,
 or 8 and 6 respectively.

I've run a few of the matrix variants (it's going to take a few days to get a 
full set). 8 and 16 BitBake threads give the same results (within a few seconds) 
for PARALLEL_MAKE values in the range 6 to 12.

I tried a core-image-sato build and it completed in 61m/244m/40m, which is much 
closer to your 50m than I thought I would get.

One thing I noticed during the build was that gettext-native seemed slow. Doing 
a 'clean' on it and re-baking shows that it takes over 4 minutes to build, with 
most of the time (2m38s) spent in 'do_configure'. It also seems as if this 
is on the critical path, as nothing else was getting scheduled while it was 
building. There seems to be a lot of 'nothing' going on during the do_configure 
phase (i.e. very little CPU use). Or, to put it another way, 2.5% of the build 
time is taken up configuring this package!
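(One way to confirm where the time goes; a sketch only. The buildstats class ships with poky, but the exact directory layout under tmp/buildstats varies, so the path below is illustrative.)

   # in conf/local.conf: record per-task start/stop times
   INHERIT += "buildstats"
   # after a build, each task gets a file recording its elapsed time, e.g.:
   cat tmp/buildstats/<image-and-timestamp>/gettext-native-*/do_configure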

 IPK is faster than RPM. This is what I use on most of my builds.

Makes no noticeable difference in my testing so far, but I'll stick with IPK 
from now on.
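(For reference, the backend is selected in local.conf; a minimal sketch, with the first listed class being the one used to build the rootfs.)

   # use ipk packaging instead of rpm
   PACKAGE_CLASSES = "package_ipk"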

snip

 Run the ubuntu server kernel to eliminate some scheduling overhead.
 Reducing the parallel settings mentioned above should help here too.

I'm running 11.x server as you mentioned this before ;-)

Chris Tapp

opensou...@keylevel.com
www.keylevel.com



Re: [yocto] Build time data

2012-04-18 Thread Chris Tapp
On 12 Apr 2012, at 23:56, Darren Hart wrote:
 Get back to us with times, and we'll build up a wiki page.

Some initial results / comments:

I'm running on:
 - i7 3820 (quad core, hyper-threading, 3.6GHz)
 - 16GB RAM (1600MHz XMP profile)
 - Asus P9X79 Pro motherboard
 - Ubuntu 11.10 x86_64 server installed on a 60GB OCZ Vertex 3 SSD on a 3Gb/s 
interface
 - Two 60GB OCZ Vertex 3s as RAID-0 on 6Gb/s interfaces.

The following results use a DL_DIR on the OS SSD (pre-populated) - I'm not 
interested in the speed of the internet, especially as I've only got a 
relatively slow connection ;-)

Poky-6.0.1 is also installed on the OS SSD.

I've done a few builds of core-image-minimal:

1) Build dir on the OS SSD
2) Build dir on the SSD RAID + various bits of tuning.

The results are basically the same, so it seems as if the SSD RAID makes no 
difference. Benchmarking it does show twice the read/write performance of the 
OS SSD, as expected. Disabling journalling and increasing the commit time to 
6000 also made no significant difference to the build times, which were (to the 
nearest minute):

Real   : 42m
User   : 133m
System : 19m

These times were starting from nothing, and seem to fit with your 30 minutes 
on 3 times as many cores! BTW, BB_NUMBER_THREADS was set to 16 and 
PARALLEL_MAKE to 12.

I also tried rebuilding the kernel:
   bitbake -c clean linux-yocto
   rm -rf the sstate bits for the above
   bitbake linux-yocto

and got the following times:

Real   : 39m
User   : 105m
System : 16m

Which kind of fits with an observation. The minimal build had something like 
1530 tasks to complete. The first 750 to 800 of these flew past with all 8 
'cores' running at just about 100% all the time. Load average (short term) was 
about 19, so plenty ready to run. However, round about the time python-native, 
the kernel, libxslt and gettext kicked in, the CPU usage dropped right off - to the 
point that the short term load average dropped below 3. It did pick up again 
later on (after the kernel was completed) before slowing down again towards the 
end (when it would seem reasonable to expect that less can run in parallel).

It seems as if some of these bits (or others around this time) aren't making 
use of parallel make or there is a queue of dependent tasks that needs to be 
serialized.

The kernel build is a much bigger part of the build than I was expecting, but 
this is only a small image. However, it looks as if the main compilation phase 
completes very early on and a lot of time is then spent building the modules 
(in a single thread, it seems) and in packaging - which leads me to ask if RPM 
is the best option (speed wise)? I don't use the packages myself (though 
understand they are needed internally), so I can use the fastest (if there is 
one).

Is there anything else I should be considering to improve build times? As I 
said above, this is just a rough-cut at some benchmarking and I plan to do some 
more, especially if there are other things to try and/or any other information 
that would be useful.

Still, it's looking much, much faster than my old build system :-)

Chris Tapp

opensou...@keylevel.com
www.keylevel.com


Re: [yocto] Build time data

2012-04-18 Thread Chris Tapp
On 18 Apr 2012, at 20:41, Chris Tapp wrote:

 On 12 Apr 2012, at 23:56, Darren Hart wrote:
 Get back to us with times, and we'll build up a wiki page.
 
 snip
 
 I also tried rebuilding the kernel:
   bitbake -c clean linux-yocto
   rm -rf the sstate bits for the above
   bitbake linux-yocto
 
 and got the following times:

(CORRECT TIMES INSERTED):

 Real   : 11m
 User   : 15m
 System : 2m


The comments about low load averages during kernel build still stand.

Chris Tapp

opensou...@keylevel.com
www.keylevel.com


Re: [yocto] Build time data

2012-04-18 Thread Darren Hart


On 04/18/2012 12:41 PM, Chris Tapp wrote:
 On 12 Apr 2012, at 23:56, Darren Hart wrote:
 Get back to us with times, and we'll build up a wiki page.
 
 Some initial results / comments:
 
 I'm running on:
  - i7 3820 (quad core, hyper-threading, 3.6GHz)
  - 16GB RAM (1600MHz XMP profile)
  - Asus P9X79 Pro motherboard
  - Ubuntu 11.10 x86_64 server installed on a 60GB OCZ Vertex 3 SSD on a 3Gb/s 
 interface
  - Two 60GB OCZ Vertex 3s as RAID-0 on 6Gb/s interfaces.
 
 The following results use a DL_DIR on the OS SSD (pre-populated) -
 I'm
not interested in the speed of the internet, especially as I've only got
a relatively slow connection ;-)
 
 Poky-6.0.1 is also installed on the OS SSD.
 
 I've done a few builds of core-image-minimal:
 
 1) Build dir on the OS SSD
 2) Build dir on the SSD RAID + various bits of tuning.
 
 The results are basically the same, so it seems as if the SSD RAID
makes no difference. Benchmarking it does show twice the read/write
performance of the OS SSD, as expected. Disabling journalling and
increasing the commit time to 6000 also made no significant difference
to the build times, which were (to the nearest minute):


That is not surprising. With 4 cores and a very serialized build target,
I would not expect your SSD to be the bottleneck.

 
 Real   : 42m
 User   : 133m
 System : 19m
 
 These time were starting from nothing, and seem to fit with your 30
minutes with 3 times as many cores! BTW, BB_NUMBER_THREADS was set to 16
and PARALLEL_MAKE to 12.

A couple of things to keep in mind here. The minimal build is very
serialized in comparison to something like a sato build. If you want to
optimize your build times, look at the bbmatrix* scripts shipped with
poky to find the sweet spot for your target image and your build system.
I suspect you will find your BB_NUMBER_THREADS and PARALLEL_MAKE
settings are too high for your system. I'd start with them at 8 and 8,
or 8 and 6 respectively.
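(For anyone following along, these are plain local.conf settings; a sketch with the values suggested above.)

   # number of BitBake tasks to run in parallel
   BB_NUMBER_THREADS = "8"
   # passed to make; controls per-recipe compile parallelism
   PARALLEL_MAKE = "-j 8"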

 
 I also tried rebuilding the kernel:
bitbake -c clean linux-yocto
rm -rf the sstate bits for the above
bitbake linux-yocto
 
 and got the following times:
 
 Real   : 39m
 User   : 105m
 System : 16m
 
 Which kind of fits with an observation. The minimal build had
something like 1530 stages to complete. The first 750 to 800 of these
flew past with all 8 'cores' running at just about 100% all the time.
Load average (short term) was about 19, so plenty ready to run. However,
round about the time python-native, the kernel, libxslt, gettext kicked
in the cpu usage dropped right off - to the point that the short term
load average dropped below 3. It did pick up again later on (after the
kernel was completed) before slowing down again towards the end (when it
would seem reasonable to expect that less can run in parallel).
 
 It seems as if some of these bits (or others around this time)
 aren't
making use of parallel make or there is a queue of dependent tasks that
needs to be serialized.
 
 The kernel build is a much bigger part of the build than I was
expecting, but this is only a small image. However, it looks as if the
main compilation phase completes very early on and a lot of time is then
spent building the modules (in a single thread, it seems) and in
packaging - which leads me to ask if RPM is the best option (speed
wise)? I don't use the packages myself (though understand they are
needed internally), so I can use the fastest (if there is one).

IPK is faster than RPM. This is what I use on most of my builds.

 
 Is there anything else I should be considering to improve build
 times?

Run the ubuntu server kernel to eliminate some scheduling overhead.
Reducing the parallel settings mentioned above should help here too.

Welcome to Ubuntu 11.10 (GNU/Linux 3.0.0-16-server x86_64)

dvhart@rage:~
$ uname -r
3.0.0-16-server


As I said above, this is just a rough-cut at some benchmarking and I
plan to do some more, especially if there are other things to try and/or
any other information that would be useful.
 
 Still, it's looking much, much faster than my old build system :-)
 
 Chris Tapp
 
 opensou...@keylevel.com
 www.keylevel.com

-- 
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel


Re: [yocto] Build time data

2012-04-17 Thread Martin Jansa
On Fri, Apr 13, 2012 at 07:51:51AM +0200, Martin Jansa wrote:
 On Thu, Apr 12, 2012 at 04:37:00PM -0700, Flanagan, Elizabeth wrote:
  On Thu, Apr 12, 2012 at 7:12 AM, Darren Hart dvh...@linux.intel.com wrote:
  
  
  
  
   On 04/12/2012 01:00 AM, Martin Jansa wrote:
On Thu, Apr 12, 2012 at 01:05:00PM +0530, Joshua Immanuel wrote:
Darren,
   
On Wed, 2012-04-11 at 17:30 -0700, Darren Hart wrote:
I run on a beast with 12 cores, 48GB of RAM, OS and sources on
a G2 Intel SSD, with two Seagate Barracudas in a RAID0 array
for my /build partition. I run a headless Ubuntu 11.10 (x86_64)
installation running the 3.0.0-16-server kernel. I can build
core-image-minimal in < 30 minutes and core-image-sato in < 50
minutes from scratch.
   
why not use so much RAM for WORKDIR in tmpfs? I bought 16GB just to
be able to do my builds in tmpfs and keep only more permanent data
on RAID.
  
   We've done some experiments with tmpfs, adding Beth on CC. If I recall
   correctly, my RAID0 array with the mount options I specified
   accomplishes much of what tmpfs does for me without the added setup.
  
  
  This should be the case in general. For the most part, if you have a decent
  RAID setup (We're using RAID10 on the ab) with fast disks you should be
  able to hit tmpfs speed (or close to it). I've done some experiments with
  this and what I found was maybe a 5 minute difference, sometimes, from a
  clean build between tmpfs and RAID10.
 
 5 minutes on very small image like core-image-minimal (30 min) is 1/6 of
 that time :).. 
 
 I have much bigger images and even bigger ipk feed, so to rebuild from
 scratch takes about 24 hours for one architecture..
 
 And my system is very slow compared to yours, I've found my measurement
 of core-image-minimal-with-mtdutils around 95 mins
 http://patchwork.openembedded.org/patch/17039/
 but this was with Phenom II X4 965, 4GB RAM, RAID0 (3 SATA2 disks) for
 WORKDIR, RAID5 (the same 3 SATA2 disks) BUILDDIR (raid as mdraid), now
 I have Bulldozer AMD FX(tm)-8120, 16GB RAM, still the same RAID0 but
 different motherboard..
 
 Problem with tmpfs is that no RAM is big enough to build whole feed in
 one go, so I have to build in steps (e.g. bitbake gcc for all machines
 with the same architecture, then cleanup WORKDIR and switch to another
 arch, then bitbake small-image, bigger-image, qt4-x11-free, ...).
 qt4-x11-free is able to eat 15GB tmpfs almost completely.
 
  I discussed this during Yocto Developer Day. Let me boil it down a bit to
  explain some of what I did on the autobuilders.
  
  Caveat first though. I would avoid using autobuilder time as representative
  of prime yocto build time. The autobuilder hosts a lot of different
  services that sometimes impact build time and this can vary depending on
  what else is going on on the machine.
  
  There are four places, in general, where you want to look at optimizing
  outside of dependency issues. CPU, disk, memory, build process. What I
  found was that the most useful of these in getting the autobuilder time
  down was disk and build process.
  
  With disk, spreading it across the RAID saved us not only a bit of time,
  but also helped us avoid trashed disks. More disk thrash == higher failure
  rate. So far this year we've seen two disk failures that have resulted in
  almost zero autobuilder downtime.
 
 True for RAID10, but for WORKDIR itself RAID0 is cheaper, and even a higher
 failure rate is not a big issue for WORKDIR.. you just have to cleansstate the
 tasks that were hit in the middle of the build..
 
  The real time saver however ended up being maintaining sstate across build
  runs. Even with our sstate on nfs, we're still seeing a dramatic decrease
  in build time.
  
  I would be interested in seeing what times you get with tmpfs. I've done
  tmpfs builds before and have seen good results, but bang for the buck did
  end up being a RAID array.
 
 I'll check if core-image-minimal can be built with just 15GB tmpfs,
 otherwise I would have to build it in 2 steps and the time wont be
 precise.

It was enough with rm_work, so here are my results:

The difference is much smaller than I expected, but again these are
very small images (next time I'll try to do just qt4 builds).

Fastest is TMPDIR on tmpfs (BUILDDIR is not important - same times with
BUILDDIR also in tmpfs and on a SATA2 disk).

raid0 is only about 4% slower.

A single SATA2 disk is slowest, but only a bit slower than raid5; that
could be caused by bug #2314, as I had to run the build twice..

And all times are just from the first successful build; it could be
different with an average over 10 builds..

All builds were on:
AMD FX(tm)-8120 Eight-Core Processor
16G DDR3-1600 RAM
standalone SATA2 disk ST31500341AS
mdraid on 3 older SATA2 disks HDS728080PLA380

bitbake:
commit 4219e2ea033232d95117211947b751bdb5efafd4
Author: Saul Wold s...@linux.intel.com
Date:   Tue Apr 10 17:57:15 

Re: [yocto] Build time data

2012-04-13 Thread Darren Hart

On 04/12/2012 10:51 PM, Martin Jansa wrote:

 And my system is very slow compared to yours, I've found my
 measurement of core-image-minimal-with-mtdutils around 95 mins 
 http://patchwork.openembedded.org/patch/17039/ but this was with
 Phenom II X4 965, 4GB RAM, RAID0 (3 SATA2 disks) for WORKDIR, RAID5
 (the same 3 SATA2 disks) BUILDDIR (raid as mdraid), now I have
 Bulldozer AMD FX(tm)-8120, 16GB RAM, still the same RAID0 but 
 different motherboard..

Why RAID5 for BUILDDIR? The write overhead of RAID5 is very high. The
savings RAID5 affords you are more significant with more disks, but with
3 disks it's only 1 disk better than RAID10, with a lot more overhead.

I spent some time outlining all this a while back:
http://www.dvhart.com/2011/03/qnap_ts419p_configuration_raid_levels_and_throughput/

Here's the relevant bit:

RAID 5 distributes parity across all the drives in the array, this
parity calculation is both compute intensive and IO intensive. Every
write requires the parity calculation, and data must be written to
every drive.



--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel


Re: [yocto] Build time data

2012-04-13 Thread Martin Jansa
On Thu, Apr 12, 2012 at 11:08:19PM -0700, Darren Hart wrote:
 
 On 04/12/2012 10:51 PM, Martin Jansa wrote:
 
  And my system is very slow compared to yours, I've found my
  measurement of core-image-minimal-with-mtdutils around 95 mins 
  http://patchwork.openembedded.org/patch/17039/ but this was with
  Phenom II X4 965, 4GB RAM, RAID0 (3 SATA2 disks) for WORKDIR, RAID5
  (the same 3 SATA2 disks) BUILDDIR (raid as mdraid), now I have
  Bulldozer AMD FX(tm)-8120, 16GB RAM, still the same RAID0 but 
  different motherboard..
 
 Why RAID5 for BUILDDIR? The write overhead of RAID5 is very high. The
 savings RAID5 alots you is more significant with more disks, but with
 3 disks it's only 1 disk better than RAID10, with a lot more overhead.

Because RAID10 needs at least 4 drives and all my SATA ports are
already used, and it's also on my /home partition.. please note that this
is not some company build server, just my desktop where it happens I do
a lot of builds for a community distribution for smartphones,
http://shr-project.org

The server we have available for builds is _much_ slower than this,
especially IO (some virtualized host on a busy server), but it has much
better network bandwidth.. :).

Cheers,
 
 I spent some time outlining all this a while back:
 http://www.dvhart.com/2011/03/qnap_ts419p_configuration_raid_levels_and_throughput/
 
 Here's the relevant bit:
 
 RAID 5 distributes parity across all the drives in the array, this
 parity calculation is both compute intensive and IO intensive. Every
 write requires the parity calculation, and data must be written to
 every drive.
 
 
 
 - --
 Darren Hart
 Intel Open Source Technology Center
 Yocto Project - Linux Kernel

-- 
Martin 'JaMa' Jansa jabber: martin.ja...@gmail.com




Re: [yocto] Build time data

2012-04-13 Thread Wolfgang Denk
Dear Darren Hart,

In message 4f87c2d3.8020...@linux.intel.com you wrote:

  Phenom II X4 965, 4GB RAM, RAID0 (3 SATA2 disks) for WORKDIR, RAID5
  (the same 3 SATA2 disks) BUILDDIR (raid as mdraid), now I have
  Bulldozer AMD FX(tm)-8120, 16GB RAM, still the same RAID0 but 
  different motherboard..
 
 Why RAID5 for BUILDDIR? The write overhead of RAID5 is very high. The
 savings RAID5 alots you is more significant with more disks, but with
 3 disks it's only 1 disk better than RAID10, with a lot more overhead.

Indeed, RAID5 with just 3 devices makes little sense - especially
when running on the same drives as the RAID0 workdir.

 I spent some time outlining all this a while back:
 http://www.dvhart.com/2011/03/qnap_ts419p_configuration_raid_levels_and_throughput/

Well, such data from a 4 spindle array are not telling much. When you
are asking for I/O performance on RAID arrays, you want to distribute
the load over _many_ spindles. Do your comparisons on an 8 or 16 (or more)
spindle setup, and the results will be much different. Also, your
test of copying huge files is just one usage mode: strictly
sequential access. But what we see with OE / Yocto builds is
completely different. Here you will see a huge number of small and
even tiny data transfers.

Classical recommendations for performance optimization of RAID
arrays (which usually tune for such big, sequential accesses
only), like using big stripe sizes and huge read-ahead etc., turn out
to be counter-productive here.  It makes no sense to have, for
example, a stripe size of 256 kB or more when 95% or more of your disk
accesses write less than 4 kB.
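(An illustration of the stripe-size point only; the device names, RAID level and the 64 KB chunk are assumptions, not a recommendation from the thread.)

   # create a 3-disk RAID0 array with a small chunk size for many small writes
   mdadm --create /dev/md0 --level=0 --raid-devices=3 --chunk=64 /dev/sdb1 /dev/sdc1 /dev/sdd1
   mkfs.ext4 /dev/md0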

 Here's the relevant bit:
 
 RAID 5 distributes parity across all the drives in the array, this
 parity calculation is both compute intensive and IO intensive. Every
 write requires the parity calculation, and data must be written to
 every drive.

But did you look at a real system?  I never found the CPU load of the
parity calculations to be a bottleneck.  I'd rather have the CPU spend
cycles on computing parity than run it with all cores idle
because it's waiting for I/O to complete.  I found that for the
workloads we have (software builds like Yocto etc.) a multi-spindle
software RAID array outperforms all other solutions (and especially
the h/w RAID controllers I have had access to so far - these don't even
come close to the same number of IOPS).

OH - and BTW: if you care about reliability, then don't use RAID5.
Go for RAID6.  Yes, it's more expensive, but it's also much less
painful when you have to rebuild the array in case of a disk failure.
I've seen too many cases where a second disk would fail during the
rebuild to ever go with RAID5 for big systems again - restoring
several TB of data from tape ain't no fun.

See also the RAID wiki for specific performance optimizations on such
RAID arrays.

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: w...@denx.de
Never put off until tomorrow what you can put off indefinitely.


Re: [yocto] Build time data

2012-04-13 Thread Richard Purdie
On Thu, 2012-04-12 at 07:34 -0700, Darren Hart wrote:
 
 On 04/12/2012 07:08 AM, Björn Stenberg wrote:
  Darren Hart wrote:
  /dev/md0/build  ext4 
  noauto,noatime,nodiratime,commit=6000
  
  A minor detail: 'nodiratime' is a subset of 'noatime', so there is no
  need to specify both.
 
 Excellent, thanks for the tip.

Note the key here is that for a system with large amounts of memory, you
can effectively keep the build in memory due to the long commit time.

All the tests I've done show we are not IO bound anyway.


  Yet for all the combined horsepower, I am unable to match your time
  of 30 minutes for core-image-minimal. I clock in at around 37 minutes
  for a qemux86-64 build with ipk output:
  
  -- NOTE: Tasks Summary: Attempted 1363 tasks of which 290 didn't
  need to be rerun and all succeeded.
  
  real36m32.118s user214m39.697s sys 108m49.152s --
  
  These numbers also show that my build is running less than 9x
  realtime, indicating that 80% of my cores sit idle most of the time.
 
 Yup, that sounds about right. The build has a linear component to it,
 and anything above about 12 just doesn't help. In fact the added
 scheduling overhead seems to hurt.

  This confirms what ps xf says during the builds: Only rarely is
  bitbake running more than a handful tasks at once, even with
  BB_NUMBER_THREADS at 64. And many of these tasks are in turn running
  sequential loops on a single core.
  
  I'm hoping to find time soon to look deeper into this issue and
  suggest remedies. It my distinct feeling that we should be able to
  build significantly faster on powerful machines.
  
 
 Reducing the dependency chains that result in the linear component of
 the build (forcing serialized execution) is one place we've focused, and
 could probably still use some attention. CC'ing RP as he's done a lot there.

The minimal build is about our worst case single threaded build as it is
highly dependency ordered. We've already done a lot of work looking at
the single thread of core dependencies, and this is for example why we
have gettext-minimal-native, which unlocked some of the core path
dependencies. When you look at what we build, there is a reason for most
of it, unfortunately. There are emails from me on the mailing list about
what I looked at and found; I tried to keep a record of it somewhere at
least. You can get some wins with things like ASSUME_PROVIDED +=
"git-native".

For something like a sato build you should see more parallelism. 

I do also have some small gains in some pending patches:

http://git.yoctoproject.org/cgit.cgi/poky-contrib/commit/?h=rpurdie/t2id=2023801e25d81e8cffb643eac259c18b9fecda0b
http://git.yoctoproject.org/cgit.cgi/poky-contrib/commit/?h=rpurdie/t2id=ecf5f5de8368fdcf90c3d38eafc689d6d265514b
http://git.yoctoproject.org/cgit.cgi/poky-contrib/commit/?h=rpurdie/t2id=2190a51ffac71c9d19305601f8a3a46e467b745a

which look at speeding up do_package, do_package_write_rpm and do_rootfs
(with rpm). These were developed too late for 1.2 and are in some cases
only partially complete, but they show some ways we can squeeze some
extra performance out of the system.

There are undoubtedly ways we can improve performance but I think we've
done the low hanging fruit and we need some fresh ideas.

Cheers,

Richard




Re: [yocto] Build time data

2012-04-13 Thread Björn Stenberg
Darren Hart wrote:
 One thing that comes to mind is the parallel settings, BB_NUMBER_THREADS
 and PARALLEL_MAKE. I noticed a negative impact if I increased these
 beyond 12 and 14 respectively. I tested this with bb-matrix
 (scripts/contrib/bb-perf/bb-matrix.sh). The script is a bit fickle, but
 can provide useful results and killer 3D surface plots of build time
 with BB and PM on the axis.

Very nice! I ran a batch overnight with permutations of 8,12,16,24,64 cores:

BB PM %e %S %U %P %c %w %R %F %M %x
8 8 2288.96 2611.37 10773.53 584% 810299 18460161 690464859 0 1715456 0
8 12 2198.40 2648.57 10846.28 613% 839750 18559413 690563187 0 1982864 0
8 16 2157.26 2672.79 10943.59 631% 898599 18487946 690761197 0 1715440 0
8 24 2125.15 2916.33 11199.27 664% 89 18412764 690856116 0 1715440 0
8 64 2189.14 7084.14 12906.95 913% 1491503 18646891 699897733 0 1715440 0
12 8 2277.66 2625.82 10805.21 589% 691752 18596208 690998433 0 1715440 0
12 12 2194.04 2664.01 10934.65 619% 714997 18717017 691199925 0 1715440 0
12 16 2183.95 2736.33 11162.30 636% 1090270 18359128 690559327 0 1715440 0
12 24 2120.46 2907.63 11229.50 666% 829783 18644293 690729638 0 1715312 0
12 64 2171.58 6767.09 12822.86 902% 1524683 18634668 690904549 0 1867456 0
16 8 2294.59 2691.74 10813.69 588% 771621 18637582 686712129 0 1715344 0
16 12 2201.51 2704.54 11017.23 623% 753662 18590533 699231236 0 1715424 0
16 16 2154.54 2692.31 11023.28 636% 809586 18557781 691014487 0 1715440 0
16 24 2130.33 2932.18 11259.09 666% 905669 18531776 691082307 0 2030992 0
16 64 2184.01 6954.71 12922.39 910% 1467774 18800203 701770099 0 1715440 0
24 8 2284.88 2645.88 10854.89 590% 833061 18523938 691067170 0 1715328 0
24 12 2203.72 2696.96 11033.10 623% 931443 18457749 691187723 0 2016368 0
24 16 2176.02 2727.94 3.33 636% 940044 18420200 690959670 0 1715440 0
24 24 2170.38 2938.80 11643.10 671% 1023328 18641215 686665448 15 1715440 0
24 64 2200.02 7188.60 12902.42 913% 1509158 18924772 690615091 66 1715440 0
64 8 2309.40 2702.33 10952.18 591% 753168 18687309 690927732 10 1867440 0
64 12 2230.80 2765.98 11131.22 622% 875495 18744802 691213524 28 1715216 0
64 16 2182.22 2786.22 11180.86 640% 881328 18724987 691020084 109 1768576 0
64 24 2136.20 3001.36 11238.81 666% 898320 18646384 691239254 46 1715312 0
64 64 2189.73 7154.10 12846.99 913% 1416830 18781801 690890798 41 1715424 0

What it shows is that BB_NUMBER_THREADS makes no difference at all in this 
range. As for PARALLEL_MAKE, it shows 24 is better than 16 but 64 is too high, 
incurring a massive scheduling penalty. I wonder if newer kernel versions have 
become more efficient. In hindsight, I should have included 32 and 48 cores in 
the test.

Unfortunately I was unable to produce plots with bb-matrix-plot.sh. It gave me 
pretty png files, but missing any plotted data:

# ../../poky/scripts/contrib/bb-perf/bb-matrix-plot.sh
 line 0: Number of grid points must be in [2:1000] - not changed!

  Warning: Single isoline (scan) is not enough for a pm3d plot.
   Hint: Missing blank lines in the data file? See 'help pm3d' and FAQ.
  Warning: Single isoline (scan) is not enough for a pm3d plot.
   Hint: Missing blank lines in the data file? See 'help pm3d' and FAQ.
  Warning: Single isoline (scan) is not enough for a pm3d plot.
   Hint: Missing blank lines in the data file? See 'help pm3d' and FAQ.
  Warning: Single isoline (scan) is not enough for a pm3d plot.
   Hint: Missing blank lines in the data file? See 'help pm3d' and FAQ.

Result: http://imgur.com/mfgWb

-- 
Björn


Re: [yocto] Build time data

2012-04-13 Thread Koen Kooi

On 13 Apr 2012, at 11:56, Tomas Frydrych wrote the following:

 On 12/04/12 01:30, Darren Hart wrote:
 Next up is storage. 
 
 Indeed. In my experience by far the biggest limiting factor in the
 builds is getting io bound. If you are not running a dedicated build
 machine, it is well worth using a dedicated disk for the poky tmp dir;
 assuming you have cpu time left, this leaves the machine completely
 usable for other things.
 
 
 Now RAM, you will want about 2 GB of RAM per core, with a minimum of 4GB.
 
 My experience does not bear this out at all; building Yocto on a 6 core
 hyper-threaded desktop machine I have never seen the system memory
 use get significantly over the 2GB mark (out of 8GB available), doing a
 Yocto build using 10 cores/threads.

Try building webkit or asio: the linker will use ~1.5GB per object, so for 
asio you need PARALLEL_MAKE * 1.5 GB of RAM to avoid swapping to disk.
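(A rough worked example of that estimate, assuming the ~1.5 GB per linker process figure above; the -j value is illustrative.)

   # with PARALLEL_MAKE = "-j 8", worst case during the link-heavy phase:
   #   8 concurrent linker processes * ~1.5 GB each ~= 12 GB of RAM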


Re: [yocto] Build time data

2012-04-13 Thread Darren Hart


On 04/13/2012 01:47 AM, Björn Stenberg wrote:
 Darren Hart wrote:
 One thing that comes to mind is the parallel settings, BB_NUMBER_THREADS
 and PARALLEL_MAKE. I noticed a negative impact if I increased these
 beyond 12 and 14 respectively. I tested this with bb-matrix
 (scripts/contrib/bb-perf/bb-matrix.sh). The script is a bit fickle, but
 can provide useful results and killer 3D surface plots of build time
 with BB and PM on the axis.
 
 Very nice! I ran a batch overnight with permutations of 8,12,16,24,64 cores:
 
 BB PM %e %S %U %P %c %w %R %F %M %x
 8 8 2288.96 2611.37 10773.53 584% 810299 18460161 690464859 0 1715456 0
 8 12 2198.40 2648.57 10846.28 613% 839750 18559413 690563187 0 1982864 0
 8 16 2157.26 2672.79 10943.59 631% 898599 18487946 690761197 0 1715440 0
 8 24 2125.15 2916.33 11199.27 664% 89 18412764 690856116 0 1715440 0
 8 64 2189.14 7084.14 12906.95 913% 1491503 18646891 699897733 0 1715440 0
 12 8 2277.66 2625.82 10805.21 589% 691752 18596208 690998433 0 1715440 0
 12 12 2194.04 2664.01 10934.65 619% 714997 18717017 691199925 0 1715440 0
 12 16 2183.95 2736.33 11162.30 636% 1090270 18359128 690559327 0 1715440 0
 12 24 2120.46 2907.63 11229.50 666% 829783 18644293 690729638 0 1715312 0
 12 64 2171.58 6767.09 12822.86 902% 1524683 18634668 690904549 0 1867456 0
 16 8 2294.59 2691.74 10813.69 588% 771621 18637582 686712129 0 1715344 0
 16 12 2201.51 2704.54 11017.23 623% 753662 18590533 699231236 0 1715424 0
 16 16 2154.54 2692.31 11023.28 636% 809586 18557781 691014487 0 1715440 0
 16 24 2130.33 2932.18 11259.09 666% 905669 18531776 691082307 0 2030992 0
 16 64 2184.01 6954.71 12922.39 910% 1467774 18800203 701770099 0 1715440 0
 24 8 2284.88 2645.88 10854.89 590% 833061 18523938 691067170 0 1715328 0
 24 12 2203.72 2696.96 11033.10 623% 931443 18457749 691187723 0 2016368 0
 24 16 2176.02 2727.94 3.33 636% 940044 18420200 690959670 0 1715440 0
 24 24 2170.38 2938.80 11643.10 671% 1023328 18641215 686665448 15 1715440 0
 24 64 2200.02 7188.60 12902.42 913% 1509158 18924772 690615091 66 1715440 0
 64 8 2309.40 2702.33 10952.18 591% 753168 18687309 690927732 10 1867440 0
 64 12 2230.80 2765.98 11131.22 622% 875495 18744802 691213524 28 1715216 0
 64 16 2182.22 2786.22 11180.86 640% 881328 18724987 691020084 109 1768576 0
 64 24 2136.20 3001.36 11238.81 666% 898320 18646384 691239254 46 1715312 0
 64 64 2189.73 7154.10 12846.99 913% 1416830 18781801 690890798 41 1715424 0
 
 What it shows is that BB_NUMBER_THREADS makes no difference at all in this 
 range. As for PARALLEL_MAKE, it shows 24 is better than 16 but 64 is too 
 high, incurring a massive scheduling penalty. I wonder if newer kernel 
 versions have become more efficient. In hindsight, I should have included 32 
 and 48 cores in the test.
 
 Unfortunately I was unable to produce plots with bb-matrix-plot.sh. It gave 
 me pretty png files, but missing any plotted data:


Right, gnuplot likes evenly spaced values of BB and PM. So you could
have done: 8,12,16,24,28,32 (anything above that is going to go down
anyway). Unfortunately, the gaps force the plot to generate spikes at
the interpolated points. I'm open to ideas on how to make it compatible
with arbitrary gaps and avoid the spikes.


Perhaps I should rewrite this with python matplotlib and scipy and use
the interpolate module. This is non-trivial, so not something I'll get
to quickly.

 
 # ../../poky/scripts/contrib/bb-perf/bb-matrix-plot.sh
  line 0: Number of grid points must be in [2:1000] - not changed!
 
   Warning: Single isoline (scan) is not enough for a pm3d plot.
Hint: Missing blank lines in the data file? See 'help pm3d' and 
 FAQ.
   Warning: Single isoline (scan) is not enough for a pm3d plot.
Hint: Missing blank lines in the data file? See 'help pm3d' and 
 FAQ.
   Warning: Single isoline (scan) is not enough for a pm3d plot.
Hint: Missing blank lines in the data file? See 'help pm3d' and 
 FAQ.
   Warning: Single isoline (scan) is not enough for a pm3d plot.
Hint: Missing blank lines in the data file? See 'help pm3d' and 
 FAQ.
 
 Result: http://imgur.com/mfgWb
 

-- 
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel


Re: [yocto] Build time data

2012-04-12 Thread Darren Hart


On 04/11/2012 09:39 PM, Bob Cochran wrote:
 On 04/11/2012 08:30 PM, Darren Hart wrote:
 SSDs are one way to
 go, but we've been known to chew through them and they aren't priced as
 consumables.
 
 Hi Darren,
 
 Could you please elaborate on been known to chew through them?
 
 Are you running into an upper limit on write / erase cycles?  Are you 
 encountering hard (or soft) failures?

Some have reported early physical disk failure. Due to the cost of SSDs,
not a lot of people seem to be trying it out. I *believe* the current
generation of SSDs would perform admirably, but I haven't tested that. I
know Deny builds with SSDs, perhaps he would care to comment?


-- 
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel


Re: [yocto] Build time data

2012-04-12 Thread Joshua Immanuel
Darren,

On Wed, 2012-04-11 at 17:30 -0700, Darren Hart wrote:
 I run on a beast with 12 cores, 48GB of RAM, OS and sources on a G2
 Intel SSD, with two Seagate Barracudas in a RAID0 array for my /build
 partition. I run a headless Ubuntu 11.10 (x86_64) installation running
 the 3.0.0-16-server kernel. I can build core-image-minimal in < 30
 minutes and core-image-sato in < 50 minutes from scratch. 

wow. Can I get a shell? :D
-- 
Joshua Immanuel
HiPro IT Solutions Private Limited
http://hipro.co.in




Re: [yocto] Build time data

2012-04-12 Thread Joshua Immanuel
On Thu, 2012-04-12 at 10:00 +0200, Martin Jansa wrote:
  On Wed, 2012-04-11 at 17:30 -0700, Darren Hart wrote:
   I run on a beast with 12 cores, 48GB of RAM, OS and sources on 
   a G2 Intel SSD, with two Seagate Barracudas in a RAID0 array for
   my /build partition. I run a headless Ubuntu 11.10 (x86_64)
   installation running the 3.0.0-16-server kernel. I can build
   core-image-minimal in < 30 minutes and core-image-sato in < 50
   minutes from scratch. 
 
 why not use so much RAM for WORKDIR in tmpfs? I bought 16GB just to be
 able to do my builds in tmpfs and keep only more permanent data on
 RAID. 

+1 

I tried using tmpfs for WORKDIR on my T420, which has 8GB of RAM. (In
India, the maximum single-slot DDR3 RAM we can get is 4GB.) Obviously, this
is not sufficient :( Maybe I shouldn't use the laptop for build
purposes.

Moreover, every time I build an image with Yocto, the temperature peaks at 87
degrees Celsius. Hoping that my HDD does not die.

-- 
Joshua Immanuel
HiPro IT Solutions Private Limited
http://hipro.co.in




Re: [yocto] Build time data

2012-04-12 Thread Björn Stenberg
Darren Hart wrote:
 /dev/md0  /build  ext4
 noauto,noatime,nodiratime,commit=6000

A minor detail: 'nodiratime' is a subset of 'noatime', so there is no need to 
specify both.
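(So the entry could be trimmed to something like the sketch below; the device, mount point and long commit interval are from Darren's setup quoted above.)

   /dev/md0   /build   ext4   noauto,noatime,commit=6000   0 0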

 I run on a beast with 12 cores, 48GB of RAM, OS and sources on a G2
 Intel SSD, with two Seagate Barracudas in a RAID0 array for my /build
 partition. I run a headless Ubuntu 11.10 (x86_64) installation running
 the 3.0.0-16-server kernel. I can build core-image-minimal in < 30
 minutes and core-image-sato in < 50 minutes from scratch.

I'm guessing those are rather fast cores? I build on a different type of beast: 
64 cores at 2.1GHz and 128 GB ram. The OS is on a single SSD and the build dir 
(and sources) is on a RAID0 array of Intel 520 SSDs. Kernel is the same ubuntu 
3.0.0-16-server as yours.

Yet for all the combined horsepower, I am unable to match your time of 30 
minutes for core-image-minimal. I clock in at around 37 minutes for a 
qemux86-64 build with ipk output:

--
NOTE: Tasks Summary: Attempted 1363 tasks of which 290 didn't need to be rerun 
and all succeeded.

real    36m32.118s
user    214m39.697s
sys     108m49.152s
--

These numbers also show that my build is running at less than 9x realtime, 
indicating that 80% of my cores sit idle most of the time. This confirms what 
ps xf says during the builds: only rarely is bitbake running more than a 
handful of tasks at once, even with BB_NUMBER_THREADS at 64. And many of these 
tasks are in turn running sequential loops on a single core.

I'm hoping to find time soon to look deeper into this issue and suggest 
remedies. It is my distinct feeling that we should be able to build significantly 
faster on powerful machines.

-- 
Björn


Re: [yocto] Build time data

2012-04-12 Thread Darren Hart



On 04/12/2012 01:00 AM, Martin Jansa wrote:
 On Thu, Apr 12, 2012 at 01:05:00PM +0530, Joshua Immanuel wrote:
 Darren,
 
 On Wed, 2012-04-11 at 17:30 -0700, Darren Hart wrote:
 I run on a beast with 12 cores, 48GB of RAM, OS and sources on
 a G2 Intel SSD, with two Seagate Barracudas in a RAID0 array
 for my /build partition. I run a headless Ubuntu 11.10 (x86_64)
 installation running the 3.0.0-16-server kernel. I can build
 core-image-minimal in < 30 minutes and core-image-sato in < 50
 minutes from scratch.
 
 why not use so much RAM for WORKDIR in tmpfs? I bought 16GB just to
 be able to do my builds in tmpfs and keep only more permanent data
 on RAID.

We've done some experiments with tmpfs, adding Beth on CC. If I recall
correctly, my RAID0 array with the mount options I specified
accomplishes much of what tmpfs does for me without the added setup.
With a higher commit interval, the kernel doesn't try to sync the
dcache with the disks as frequently (eg not even once during a build),
so it's effectively writing to memory (although there is still plenty
of IO occurring).

The other reason is that while 48GB is plenty for a single build, I
often run many builds in parallel, sometimes in virtual machines when
I need to reproduce or test something on different hosts.

For example:
https://picasaweb.google.com/lh/photo/7PCrqXQqxL98SAY1ecNzDdMTjNZETYmyPJy0liipFm0?feat=directlink


--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel


Re: [yocto] Build time data

2012-04-12 Thread Darren Hart


On 04/12/2012 03:43 PM, Chris Tapp wrote:
 On 12 Apr 2012, at 15:34, Darren Hart wrote:


 On 04/12/2012 07:08 AM, Björn Stenberg wrote:
 Darren Hart wrote:
 /dev/md0/build  ext4 
 noauto,noatime,nodiratime,commit=6000

 A minor detail: 'nodiratime' is a subset of 'noatime', so there is no
 need to specify both.

 Excellent, thanks for the tip.


 I run on a beast with 12 cores, 48GB of RAM, OS and sources on a
 G2 Intel SSD, with two Seagate Barracudas in a RAID0 array for my
 /build partition. I run a headless Ubuntu 11.10 (x86_64)
 installation running the 3.0.0-16-server kernel. I can build
 core-image-minimal in < 30 minutes and core-image-sato in < 50
 minutes from scratch.

 I'm guessing those are rather fast cores? 

 They are:
 model name   : Intel(R) Xeon(R) CPU   X5680  @ 3.33GHz
 
 Nice, but well out of my budget - I've got to make do with what one of your 
 CPUs costs for the whole system ;-)
 

 I build on a different type
 of beast: 64 cores at 2.1GHz and 128 GB ram. The OS is on a single
 SSD and the build dir (and sources) is on a RAID0 array of Intel 520
 SSDs. Kernel is the same ubuntu 3.0.0-16-server as yours.

 Now that I think about it, my downloads are on the RAID0 array too.

 One thing that comes to mind is the parallel settings, BB_NUMBER_THREADS
 and PARALLEL_MAKE. I noticed a negative impact if I increased these
 beyond 12 and 14 respectively. I tested this with bb-matrix
 (scripts/contrib/bb-perf/bb-matrix.sh). The script is a bit fickle, but
 can provide useful results and killer 3D surface plots of build time
 with BB and PM on the axis. Can't seem to find a plot image at the
 moment for some reason...


 Yet for all the combined horsepower, I am unable to match your time
 of 30 minutes for core-image-minimal. I clock in at around 37 minutes
 for a qemux86-64 build with ipk output:

 -- NOTE: Tasks Summary: Attempted 1363 tasks of which 290 didn't
 need to be rerun and all succeeded.

 real36m32.118s user214m39.697s sys 108m49.152s --

 These numbers also show that my build is running less than 9x
 realtime, indicating that 80% of my cores sit idle most of the time.

 Yup, that sounds about right. The build has a linear component to it,
 and anything above about 12 just doesn't help. In fact the added
 scheduling overhead seems to hurt.

 This confirms what ps xf says during the builds: Only rarely is
 bitbake running more than a handful tasks at once, even with
 BB_NUMBER_THREADS at 64. And many of these tasks are in turn running
 sequential loops on a single core.

 I'm hoping to find time soon to look deeper into this issue and
 suggest remedies. It my distinct feeling that we should be able to
 build significantly faster on powerful machines.


 Reducing the dependency chains that result in the linear component of
 the build (forcing serialized execution) is one place we've focused, and
 could probably still use some attention. CC'ing RP as he's done a lot there.
 
 Current plan for a 'budget' system is:
 
 DX79TO motherboard, i7 3820, 16GB RAM, a pair of 60GB OCZ Vertex III's in 
 RAID-0 for downloads / build, SATA HD for OS (Ubuntu 11.10 x86_64).
 
 That'll give me a 2.7x boost just on CPU and the SSDs (and maybe some 
 over-clocking) will give some more.
 
 Not sure if SSDs in RAID-0 will give any boost, so I'll run some tests.
 
 Thanks to all for the comments in this thread.

Get back to us with times, and we'll build up a wiki page.

 
 Chris Tapp
 
 opensou...@keylevel.com
 www.keylevel.com

-- 
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel


Re: [yocto] Build time data

2012-04-12 Thread Flanagan, Elizabeth
On Thu, Apr 12, 2012 at 7:12 AM, Darren Hart dvh...@linux.intel.com wrote:




 On 04/12/2012 01:00 AM, Martin Jansa wrote:
  On Thu, Apr 12, 2012 at 01:05:00PM +0530, Joshua Immanuel wrote:
  Darren,
 
  On Wed, 2012-04-11 at 17:30 -0700, Darren Hart wrote:
  I run on a beast with 12 cores, 48GB of RAM, OS and sources on
  a G2 Intel SSD, with two Seagate Barracudas in a RAID0 array
  for my /build partition. I run a headless Ubuntu 11.10 (x86_64)
  installation running the 3.0.0-16-server kernel. I can build
  core-image-minimal in < 30 minutes and core-image-sato in < 50
  minutes from scratch.
 
  why not use so much RAM for WORKDIR in tmpfs? I bought 16GB just to
  be able to do my builds in tmpfs and keep only more permanent data
  on RAID.

 We've done some experiments with tmpfs, adding Beth on CC. If I recall
 correctly, my RAID0 array with the mount options I specified
 accomplishes much of what tmpfs does for me without the added setup.


This should be the case in general. For the most part, if you have a decent
RAID setup (We're using RAID10 on the ab) with fast disks you should be
able to hit tmpfs speed (or close to it). I've done some experiments with
this and what I found was maybe a 5 minute difference, sometimes, from a
clean build between tmpfs and RAID10.

I discussed this during Yocto Developer Day. Let me boil it down a bit to
explain some of what I did on the autobuilders.

Caveat first though. I would avoid using autobuilder time as representative
of prime yocto build time. The autobuilder hosts a lot of different
services that sometimes impact build time and this can vary depending on
what else is going on on the machine.

There are four places, in general, where you want to look at optimizing
outside of dependency issues. CPU, disk, memory, build process. What I
found was that the most useful of these in getting the autobuilder time
down was disk and build process.

With disk, spreading it across the RAID saved us not only a bit of time,
but also helped us avoid trashed disks. More disk thrash == higher failure
rate. So far this year we've seen two disk failures that have resulted in
almost zero autobuilder downtime.

The real time saver however ended up being maintaining sstate across build
runs. Even with our sstate on nfs, we're still seeing a dramatic decrease
in build time.
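(For reference, a local.conf sketch of keeping sstate across builds; the cache paths and the NFS mount are placeholders rather than the actual autobuilder configuration.)

   # keep the shared-state cache outside TMPDIR so it survives clean builds
   SSTATE_DIR = "/srv/sstate-cache"
   # optionally also pull from a shared (e.g. NFS-mounted) cache; PATH is literal
   SSTATE_MIRRORS = "file://.* file:///mnt/nfs/sstate/PATH"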

I would be interested in seeing what times you get with tmpfs. I've done
tmpfs builds before and have seen good results, but bang for the buck did
end up being a RAID array.


With a higher commit interval, the kernel doesn't try to sync the
 dcache with the disks as frequently (eg not even once during a build),
 so it's effectively writing to memory (although there is still plenty
 of IO occurring).

 The other reason is that while 48GB is plenty for a single build, I
 often run many builds in parallel, sometimes in virtual machines when
 I need to reproduce or test something on different hosts.

 For example:

 https://picasaweb.google.com/lh/photo/7PCrqXQqxL98SAY1ecNzDdMTjNZETYmyPJy0liipFm0?feat=directlink


 - --
 Darren Hart
 Intel Open Source Technology Center
 Yocto Project - Linux Kernel




-- 
Elizabeth Flanagan
Yocto Project
Build and Release


Re: [yocto] Build time data

2012-04-12 Thread Martin Jansa
On Thu, Apr 12, 2012 at 04:37:00PM -0700, Flanagan, Elizabeth wrote:
 On Thu, Apr 12, 2012 at 7:12 AM, Darren Hart dvh...@linux.intel.com wrote:
 
 
 
 
  On 04/12/2012 01:00 AM, Martin Jansa wrote:
   On Thu, Apr 12, 2012 at 01:05:00PM +0530, Joshua Immanuel wrote:
   Darren,
  
   On Wed, 2012-04-11 at 17:30 -0700, Darren Hart wrote:
   I run on a beast with 12 cores, 48GB of RAM, OS and sources on
   a G2 Intel SSD, with two Seagate Barracudas in a RAID0 array
   for my /build partition. I run a headless Ubuntu 11.10 (x86_64)
   installation running the 3.0.0-16-server kernel. I can build
   core-image-minimal in < 30 minutes and core-image-sato in < 50
   minutes from scratch.
  
   why not use so much RAM for WORKDIR in tmpfs? I bought 16GB just to
   be able to do my builds in tmpfs and keep only more permanent data
   on RAID.
 
  We've done some experiments with tmpfs, adding Beth on CC. If I recall
  correctly, my RAID0 array with the mount options I specified
  accomplishes much of what tmpfs does for me without the added setup.
 
 
 This should be the case in general. For the most part, if you have a decent
 RAID setup (We're using RAID10 on the ab) with fast disks you should be
 able to hit tmpfs speed (or close to it). I've done some experiments with
 this and what I found was maybe a 5 minute difference, sometimes, from a
 clean build between tmpfs and RAID10.

5 minutes on a very small image like core-image-minimal (30 min) is 1/6 of
that time :).. 

I have much bigger images and an even bigger ipk feed, so rebuilding from
scratch takes about 24 hours for one architecture..

And my system is very slow compared to yours; I've found my measurement
of core-image-minimal-with-mtdutils around 95 mins
http://patchwork.openembedded.org/patch/17039/
but this was with a Phenom II X4 965, 4GB RAM, RAID0 (3 SATA2 disks) for
WORKDIR, RAID5 (the same 3 SATA2 disks) for BUILDDIR (both as mdraid); now
I have a Bulldozer AMD FX(tm)-8120, 16GB RAM, still the same RAID0 but a
different motherboard..

The problem with tmpfs is that no amount of RAM is big enough to build the
whole feed in one go, so I have to build in steps (e.g. bitbake gcc for all
machines with the same architecture, then clean up WORKDIR and switch to
another arch, then bitbake small-image, bigger-image, qt4-x11-free, ...).
qt4-x11-free is able to eat a 15GB tmpfs almost completely.
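(For anyone wanting to reproduce the comparison, a sketch of the tmpfs setup; the size and mount point are examples, and TMPDIR in local.conf then points at the mount.)

   # mount a 15 GB tmpfs and point the build's TMPDIR at it
   sudo mkdir -p /mnt/tmpfs-build
   sudo mount -t tmpfs -o size=15g tmpfs /mnt/tmpfs-build
   # in conf/local.conf:
   #   TMPDIR = "/mnt/tmpfs-build"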

 I discussed this during Yocto Developer Day. Let me boil it down a bit to
 explain some of what I did on the autobuilders.
 
 Caveat first though. I would avoid using autobuilder time as representative
 of prime yocto build time. The autobuilder hosts a lot of different
 services that sometimes impact build time and this can vary depending on
 what else is going on on the machine.
 
 There are four places, in general, where you want to look at optimizing
 outside of dependency issues: CPU, disk, memory, and build process. What I
 found was that the most useful of these in getting the autobuilder time
 down were disk and build process.
 
 With disk, spreading it across the RAID saved us not only a bit of time,
 but also helped us avoid trashed disks. More disk thrash == higher failure
 rate. So far this year we've seen two disk failures that have resulted in
 almost zero autobuilder downtime.

True for RAID10, but for WORKDIR itself RAID0 is cheaper, and even with a
higher failure rate that's not a big issue for WORKDIR.. you just have to
cleansstate the tasks which were hit in the middle of the build..
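
(A hedged example of that cleanup step -- the recipe name is purely
illustrative:)

  bitbake -c cleansstate some-recipe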

 The real time saver however ended up being maintaining sstate across build
 runs. Even with our sstate on nfs, we're still seeing a dramatic decrease
 in build time.
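
(For anyone who wants to reuse sstate across builds like this, a minimal
local.conf sketch -- the directory and mirror URL below are assumptions:)

  SSTATE_DIR = "/srv/sstate-cache"
  SSTATE_MIRRORS = "file://.* file:///mnt/nfs/sstate-cache/PATH"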
 
 I would be interested in seeing what times you get with tmpfs. I've done
 tmpfs builds before and have seen good results, but the best bang for the
 buck ended up being a RAID array.

I'll check if core-image-minimal can be built with just a 15GB tmpfs,
otherwise I would have to build it in 2 steps and the time won't be
precise.

 With a higher commit interval, the kernel doesn't try to sync the
  dcache with the disks as frequently (eg not even once during a build),
  so it's effectively writing to memory (although there is still plenty
  of IO occurring).
 
  The other reason is that while 48GB is plenty for a single build, I
  often run many builds in parallel, sometimes in virtual machines when
  I need to reproduce or test something on different hosts.
 
  For example:
 
  https://picasaweb.google.com/lh/photo/7PCrqXQqxL98SAY1ecNzDdMTjNZETYmyPJy0liipFm0?feat=directlink

-- 
Martin 'JaMa' Jansa jabber: martin.ja...@gmail.com


signature.asc
Description: Digital signature
___
yocto mailing list
yocto@yoctoproject.org
https://lists.yoctoproject.org/listinfo/yocto


[yocto] Build time data

2012-04-11 Thread Chris Tapp
Is there a page somewhere that gives a rough idea of how quickly a full build 
runs on various systems?

I need a faster build platform, but want to get a reasonable price / 
performance balance ;-)

I'm looking at something like an i7-2700K but am not yet tied...

Chris Tapp

opensou...@keylevel.com
www.keylevel.com



___
yocto mailing list
yocto@yoctoproject.org
https://lists.yoctoproject.org/listinfo/yocto


Re: [yocto] Build time data

2012-04-11 Thread Bob Cochran

On 04/11/2012 04:42 PM, Chris Tapp wrote:

Is there a page somewhere that gives a rough idea of how quickly a full build 
runs on various systems?

I need a faster build platform, but want to get a reasonable price / 
performance balance ;-)

I'm looking at something like an i7-2700K but am not yet tied...

Chris Tapp

opensou...@keylevel.com
www.keylevel.com





I haven't seen one, but it would be great to have this on the wiki where 
everyone could post what they're seeing and using.


Maybe the autobuilder has some useful statistics 
(http://autobuilder.yoctoproject.org:8010/)?  Of course, you'll have to 
be careful to determine whether anything else was running at the time of 
the build.


On a related note, I have been wondering whether I would get the bang 
for the buck with an SSD for my build machines.  I would guess that 
building embedded Linux images isn't a typical use pattern for an SSD. I 
wonder if the long write and erase durations for FLASH technology would 
show its ugly face during a poky build.  I would think that the embedded 
micro inside the SSD managing the writes might get taxed to the limit 
trying to slice the data.  I would appreciate anyone's experience with 
SSDs on build machines.
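
(If anyone wants to check wear on their own drive, a rough sketch using
smartmontools -- the exact attribute names differ by vendor, so treat the
grep pattern as an assumption:)

  # print the vendor SMART attributes and look for wear-related counters
  smartctl -A /dev/sda | egrep -i 'wear|erase|lifetime'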

___
yocto mailing list
yocto@yoctoproject.org
https://lists.yoctoproject.org/listinfo/yocto


Re: [yocto] Build time data

2012-04-11 Thread Darren Hart


On 04/11/2012 01:42 PM, Chris Tapp wrote:
 Is there a page somewhere that gives a rough idea of how quickly a full build 
 runs on various systems?
 
 I need a faster build platform, but want to get a reasonable price / 
 performance balance ;-)
 
 I'm looking at something like an i7-2700K but am not yet tied...
 


We really do need to get some pages up on this as it comes up a lot.

Currently Yocto Project builds scale well up to about 12 cores, so the first
step is to get as many cores as you can. Sacrifice some speed for cores
if you have to. If you can do dual-socket, do it. If not, try for a
six-core.
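
(A hedged local.conf sketch for actually using those cores -- the numbers
are assumptions, usually set to roughly the core count:)

  BB_NUMBER_THREADS = "12"
  PARALLEL_MAKE = "-j 12"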

Next up is storage. We read and write a LOT of data. SSDs are one way to
go, but we've been known to chew through them and they aren't priced as
consumables. You can get about 66% of the performance of a single SSD
with a pair of good quality SATA2 or better drives configured in RAID0
(no redundancy). Ideally, you would have your OS and sources on an SSD
and use a RAID0 array to build on. This data is all recreatable, so it's
OK if you lose a disk and therefore ALL of your build data.
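
(A minimal sketch of creating such an array -- the device names are
assumptions:)

  # two-disk RAID0, no redundancy, for the expendable build partition
  mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdb /dev/sdc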

Now RAM, you will want about 2 GB of RAM per core, with a minimum of 4GB.

Finally, software. Be sure to run a server kernel which is optimized
for throughput as opposed to interactivity (like Desktop kernels). This
implies CONFIG_PREEMPT_NONE=y. You'll want a 64-bit kernel to avoid the
performance penalty inherent with 32bit PAE kernels - and you will want
lots of memory. You can save some IO by mounting your
its-ok-if-i-lose-all-my-data build partition as follows:

/dev/md0        /build          ext4
noauto,noatime,nodiratime,commit=6000

Also drop the journal from it when you format it. Just don't power
off your machine without properly shutting down!
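
(A hedged sketch tying the above together -- the device name and mount point
are assumptions; the last line just confirms the kernel's preemption model:)

  mkfs.ext4 -O ^has_journal /dev/md0            # journal-less ext4; data is expendable
  mount /build                                  # picks up the fstab options above
  grep CONFIG_PREEMPT /boot/config-$(uname -r)  # expect CONFIG_PREEMPT_NONE=y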

That should get you some pretty good build times.

I run on a beast with 12 cores, 48GB of RAM, OS and sources on a G2
Intel SSD, with two Seagate Barracudas in a RAID0 array for my /build
partition. I run a headless Ubuntu 11.10 (x86_64) installation running
the 3.0.0-16-server kernel. I can build core-image-minimal in < 30
minutes and core-image-sato in < 50 minutes from scratch.

Hopefully that gives you some ideas to get started.

-- 
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel
___
yocto mailing list
yocto@yoctoproject.org
https://lists.yoctoproject.org/listinfo/yocto


Re: [yocto] Build time data

2012-04-11 Thread Osier-mixon, Jeffrey
Excellent topic for a wiki page.

On Wed, Apr 11, 2012 at 5:30 PM, Darren Hart dvh...@linux.intel.com wrote:


 On 04/11/2012 01:42 PM, Chris Tapp wrote:
 Is there a page somewhere that gives a rough idea of how quickly a full 
 build runs on various systems?

 I need a faster build platform, but want to get a reasonable price / 
 performance balance ;-)

 I'm looking at something like an i7-2700K but am not yet tied...



 We really do need to get some pages up on this as it comes up a lot.

 Currently Yocto Project builds scale well up to about 12 cores, so the first
 step is to get as many cores as you can. Sacrifice some speed for cores
 if you have to. If you can do dual-socket, do it. If not, try for a
 six-core.

 Next up is storage. We read and write a LOT of data. SSDs are one way to
 go, but we've been known to chew through them and they aren't priced as
 consumables. You can get about 66% of the performance of a single SSD
 with a pair of good quality SATA2 or better drives configured in RAID0
 (no redundancy). Ideally, you would have your OS and sources on an SSD
 and use a RAID0 array to build on. This data is all recreatable, so it's
 OK if you lose a disk and therefore ALL of your build data.

 Now RAM, you will want about 2 GB of RAM per core, with a minimum of 4GB.

 Finally, software. Be sure to run a server kernel which is optimized
 for throughput as opposed to interactivity (like Desktop kernels). This
 implies CONFIG_PREEMPT_NONE=y. You'll want a 64-bit kernel to avoid the
 performance penalty inherent with 32bit PAE kernels - and you will want
 lots of memory. You can save some IO by mounting your
 its-ok-if-i-lose-all-my-data build partition as follows:

 /dev/md0        /build          ext4
 noauto,noatime,nodiratime,commit=6000

 Also drop the journal from it when you format it. Just don't power
 off your machine without properly shutting down!

 That should get you some pretty good build times.

 I run on a beast with 12 cores, 48GB of RAM, OS and sources on a G2
 Intel SSD, with two Seagate Barracudas in a RAID0 array for my /build
 partition. I run a headless Ubuntu 11.10 (x86_64) installation running
 the 3.0.0-16-server kernel. I can build core-image-minimal in < 30
 minutes and core-image-sato in < 50 minutes from scratch.

 Hopefully that gives you some ideas to get started.

 --
 Darren Hart
 Intel Open Source Technology Center
 Yocto Project - Linux Kernel
 ___
 yocto mailing list
 yocto@yoctoproject.org
 https://lists.yoctoproject.org/listinfo/yocto



-- 
Jeff Osier-Mixon http://jefro.net/blog
Yocto Project Community Manager @Intel http://yoctoproject.org
___
yocto mailing list
yocto@yoctoproject.org
https://lists.yoctoproject.org/listinfo/yocto


Re: [yocto] Build time data

2012-04-11 Thread Bob Cochran

On 04/11/2012 08:30 PM, Darren Hart wrote:

SSDs are one way to
go, but we've been known to chew through them and they aren't priced as
consumables.


Hi Darren,

Could you please elaborate on "been known to chew through them"?

Are you running into an upper limit on write / erase cycles?  Are you 
encountering hard (or soft) failures?


Thanks,

Bob


___
yocto mailing list
yocto@yoctoproject.org
https://lists.yoctoproject.org/listinfo/yocto