Re: [yocto] Build time data
On 13 Apr 2012, at 10:45, Richard Purdie wrote:

On Thu, 2012-04-12 at 07:34 -0700, Darren Hart wrote:

On 04/12/2012 07:08 AM, Björn Stenberg wrote:

Darren Hart wrote: /dev/md0 /build ext4 noauto,noatime,nodiratime,commit=6000

A minor detail: 'nodiratime' is a subset of 'noatime', so there is no need to specify both.

Excellent, thanks for the tip. Note the key here is that for a system with large amounts of memory, you can effectively keep the build in memory due to the long commit time. All the tests I've done show we are not IO bound anyway.

Consider this scenario:

- OS disk on spinning rust (sda1, /)
- BUILDDIR on spinning rust (sdb1, /OE)
- WORKDIR on SSD (sdc1, /OE/build/tmp/work)
- SD card in USB reader (sde1)

When I do the following during a build, all CPUs will enter IO wait and the build grinds to a halt:

    cd /media ; xz -d -c foo.img.xz | pv -s 3488M > /dev/sde

That only touches the OS disk and the SD card, but for some reason the 3.2.8 kernel stops IO to the OE disks as well. do_patch for my kernel recipe has been taking more than an hour now; it usually completes in less than 5 minutes (a few hundred patches applied with a custom patcher, git-am).

regards,

Koen
Re: [yocto] Build time data
Hello,

On Fri, 2012-04-13 at 09:45 +0100, Richard Purdie wrote:

There are undoubtedly ways we can improve performance but I think we've done the low hanging fruit and we need some fresh ideas.

Is there a way to integrate distcc in Yocto so that we could distribute the build across machines?

-- Joshua Immanuel HiPro IT Solutions Private Limited http://hipro.co.in
Re: [yocto] Build time data
On Thu, 2012-04-19 at 18:18 +0530, Joshua Immanuel wrote:

Hello,

On Fri, 2012-04-13 at 09:45 +0100, Richard Purdie wrote:

There are undoubtedly ways we can improve performance but I think we've done the low hanging fruit and we need some fresh ideas.

Is there a way to integrate distcc in Yocto so that we could distribute the build across machines?

See icecream.bbclass, but compiling is not the bottleneck; it's configure, install and packaging...

Cheers, Richard
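[Editor's note: for the curious, enabling the class is roughly the following in conf/local.conf. The exact class name (icecc vs icecream) and the ICECC_PATH variable are recalled from meta/classes and may differ per release, so verify against your tree before relying on them.]

    # distribute compile jobs over an icecream cluster (sketch)
    INHERIT += "icecc"
    # location of the icecc wrapper, if it is not on PATH (hypothetical path)
    ICECC_PATH = "/usr/bin/icecc"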
Re: [yocto] Build time data
2012/4/19 Richard Purdie richard.pur...@linuxfoundation.org:

On Thu, 2012-04-19 at 18:18 +0530, Joshua Immanuel wrote:

Is there a way to integrate distcc in Yocto so that we could distribute the build across machines?

See icecream.bbclass, but compiling is not the bottleneck; it's configure, install and packaging...

Cheers, Richard

Multi-threaded package managers come to my mind, and also multi-threaded bzip2 (see [1]). Maybe multi-threaded autotools / cmake, but that will be future talk (and a headache for the developers).

[1] http://compression.ca/pbzip2/

-- Regards Samuel
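[Editor's note: as a feel for the pbzip2 [1] gain outside the build system proper, the drop-in usage is below; a sketch, with the tarball path hypothetical and -p8 assuming 8 CPUs.]

    # single-threaded baseline
    tar -cf - tmp/deploy/images | bzip2 -9 > images.tar.bz2
    # pbzip2 drop-in: -p sets the processor count, output stays bzip2-compatible
    tar -cf - tmp/deploy/images | pbzip2 -9 -p8 > images.tar.bz2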
Re: [yocto] Build time data
On 18 Apr 2012, at 21:55, Darren Hart wrote:

snip

A couple of things to keep in mind here. The minimal build is very serialized in comparison to something like a sato build. If you want to optimize your build times, look at the bbmatrix* scripts shipped with poky to find the sweet spot for your target image and your build system. I suspect you will find your BB_NUMBER_THREADS and PARALLEL_MAKE settings are too high for your system. I'd start with them at 8 and 8, or 8 and 6 respectively.

I've run a few of the matrix variants (it's going to take a few days to get a full set). 8 and 16 threads give the same results (within a few seconds) for parallel make values in the range 6 to 12.

I tried a core-image-sato build and it completed in 61m/244m/40m, which is much closer to your 50m than I thought I would get.

One thing I noticed during the build was that gettext-native seemed slow. Doing a 'clean' on it and re-baking shows that it takes over 4 minutes to build, with most of the time (2m38) being spent in 'do_configure'. It also seems as if this is on the critical path, as nothing else was getting scheduled while it was building. There seems to be a lot of 'nothing' going on during the do_configure phase (i.e. very little CPU use). Or, to put it another way, 2.5% of the build time is taken up configuring this package!

IPK is faster than RPM. This is what I use on most of my builds.

Makes no noticeable difference in my testing so far, but I'll stick with IPK from now on.

snip

Run the ubuntu server kernel to eliminate some scheduling overhead. Reducing the parallel settings mentioned above should help here too.

I'm running 11.x server as you mentioned this before ;-)

Chris Tapp opensou...@keylevel.com www.keylevel.com
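[Editor's note: to reproduce the gettext-native measurement on any recipe, something like the following works; a sketch, where cleansstate assumes a BitBake new enough to carry that task, and the buildstats class (if inherited) records per-task wall times under tmp/buildstats/.]

    # rebuild one recipe from scratch and time it
    bitbake -c cleansstate gettext-native
    time bitbake gettext-native
    # optional, in conf/local.conf, for per-task timing records:
    # INHERIT += "buildstats"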
Re: [yocto] Build time data
On 12 Apr 2012, at 23:56, Darren Hart wrote:

Get back to us with times, and we'll build up a wiki page.

Some initial results / comments. I'm running on:

- i7 3820 (quad core, hyper-threading, 3.6GHz)
- 16GB RAM (1600MHz XMP profile)
- Asus P9X79 Pro motherboard
- Ubuntu 11.10 x86_64 server installed on a 60GB OCZ Vertex 3 SSD on a 3Gb/s interface
- Two 60GB OCZ Vertex 3s as RAID-0 on 6Gb/s interfaces.

The following results use a DL_DIR on the OS SSD (pre-populated) - I'm not interested in the speed of the internet, especially as I've only got a relatively slow connection ;-) Poky-6.0.1 is also installed on the OS SSD.

I've done a few builds of core-image-minimal:

1) Build dir on the OS SSD
2) Build dir on the SSD RAID + various bits of tuning.

The results are basically the same, so it seems as if the SSD RAID makes no difference. Benchmarking it does show twice the read/write performance of the OS SSD, as expected. Disabling journalling and increasing the commit time to 6000 also made no significant difference to the build times, which were (to the nearest minute):

Real : 42m
User : 133m
System : 19m

These times were starting from nothing, and seem to fit with your 30 minutes with 3 times as many cores! BTW, BB_NUMBER_THREADS was set to 16 and PARALLEL_MAKE to 12.

I also tried rebuilding the kernel:

- bitbake -c clean linux-yocto
- rm -rf the sstate bits for the above
- bitbake linux-yocto

and got the following times:

Real : 39m
User : 105m
System : 16m

Which kind of fits with an observation. The minimal build had something like 1530 stages to complete. The first 750 to 800 of these flew past with all 8 'cores' running at just about 100% all the time. Load average (short term) was about 19, so plenty ready to run. However, round about the time python-native, the kernel, libxslt and gettext kicked in, the CPU usage dropped right off, to the point that the short-term load average dropped below 3. It did pick up again later on (after the kernel was completed) before slowing down again towards the end (when it would seem reasonable to expect that less can run in parallel). It seems as if some of these bits (or others around this time) aren't making use of parallel make, or there is a queue of dependent tasks that needs to be serialized.

The kernel build is a much bigger part of the build than I was expecting, but this is only a small image. However, it looks as if the main compilation phase completes very early on and a lot of time is then spent building the modules (in a single thread, it seems) and in packaging - which leads me to ask if RPM is the best option (speed wise)? I don't use the packages myself (though I understand they are needed internally), so I can use the fastest (if there is one).

Is there anything else I should be considering to improve build times?

As I said above, this is just a rough cut at some benchmarking and I plan to do some more, especially if there are other things to try and/or any other information that would be useful. Still, it's looking much, much faster than my old build system :-)

Chris Tapp opensou...@keylevel.com www.keylevel.com
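[Editor's note: the packaging question has a one-line answer in configuration terms; switching the backend is done in conf/local.conf (PACKAGE_CLASSES is the stock variable, package_ipk selects opkg-style packages):]

    # conf/local.conf: build ipk packages instead of rpm
    PACKAGE_CLASSES = "package_ipk"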
Re: [yocto] Build time data
On 18 Apr 2012, at 20:41, Chris Tapp wrote:

On 12 Apr 2012, at 23:56, Darren Hart wrote:

Get back to us with times, and we'll build up a wiki page.

snip

I also tried rebuilding the kernel:

- bitbake -c clean linux-yocto
- rm -rf the sstate bits for the above
- bitbake linux-yocto

and got the following times (CORRECT TIMES INSERTED):

Real : 11m
User : 15m
System : 2m

The comments about low load averages during the kernel build still stand.

Chris Tapp opensou...@keylevel.com www.keylevel.com
Re: [yocto] Build time data
On 04/18/2012 12:41 PM, Chris Tapp wrote:

On 12 Apr 2012, at 23:56, Darren Hart wrote:

Get back to us with times, and we'll build up a wiki page.

snip

The results are basically the same, so it seems as if the SSD RAID makes no difference. Benchmarking it does show twice the read/write performance of the OS SSD, as expected. Disabling journalling and increasing the commit time to 6000 also made no significant difference to the build times, which were (to the nearest minute):

That is not surprising. With 4 cores and a very serialized build target, I would not expect your SSD to be the bottleneck.

Real : 42m
User : 133m
System : 19m

These times were starting from nothing, and seem to fit with your 30 minutes with 3 times as many cores! BTW, BB_NUMBER_THREADS was set to 16 and PARALLEL_MAKE to 12.

A couple of things to keep in mind here. The minimal build is very serialized in comparison to something like a sato build. If you want to optimize your build times, look at the bbmatrix* scripts shipped with poky to find the sweet spot for your target image and your build system. I suspect you will find your BB_NUMBER_THREADS and PARALLEL_MAKE settings are too high for your system. I'd start with them at 8 and 8, or 8 and 6 respectively.

snip

The kernel build is a much bigger part of the build than I was expecting, but this is only a small image. However, it looks as if the main compilation phase completes very early on and a lot of time is then spent building the modules (in a single thread, it seems) and in packaging - which leads me to ask if RPM is the best option (speed wise)? I don't use the packages myself (though I understand they are needed internally), so I can use the fastest (if there is one).

IPK is faster than RPM. This is what I use on most of my builds.

Is there anything else I should be considering to improve build times?

Run the ubuntu server kernel to eliminate some scheduling overhead. Reducing the parallel settings mentioned above should help here too.

    Welcome to Ubuntu 11.10 (GNU/Linux 3.0.0-16-server x86_64)
    dvhart@rage:~ $ uname -r
    3.0.0-16-server

snip

-- Darren Hart Intel Open Source Technology Center Yocto Project - Linux Kernel
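[Editor's note: spelled out, Darren's suggested starting point is two lines in conf/local.conf:]

    BB_NUMBER_THREADS = "8"
    PARALLEL_MAKE = "-j 8"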
Re: [yocto] Build time data
On Fri, Apr 13, 2012 at 07:51:51AM +0200, Martin Jansa wrote:

On Thu, Apr 12, 2012 at 04:37:00PM -0700, Flanagan, Elizabeth wrote:

snip

I would be interested in seeing what times you get with tmpfs. I've done tmpfs builds before and have seen good results, but bang for the buck did end up being a RAID array.

I'll check if core-image-minimal can be built with just 15GB tmpfs, otherwise I would have to build it in 2 steps and the time won't be precise.

It was enough with rm_work, so here are my results. The difference is much smaller than I expected, but again these are very small images (next time I'll try to do just qt4 builds):

- Fastest is TMPDIR on tmpfs (BUILDDIR is not important: same times with BUILDDIR also in tmpfs and on a SATA2 disk).
- RAID0 is only about 4% slower.
- A single SATA2 disk is slowest, but only a bit slower than RAID5; that could be caused by bug #2314, as I had to run that build twice..

All times were just from the first successful build; it could be different with an average over 10 builds.. And all builds were on:

- AMD FX(tm)-8120 Eight-Core Processor
- 16G DDR3-1600 RAM
- standalone SATA2 disk ST31500341AS
- mdraid on 3 older SATA2 disks HDS728080PLA380

bitbake: commit 4219e2ea033232d95117211947b751bdb5efafd4 Author: Saul Wold s...@linux.intel.com Date: Tue Apr 10 17:57:15
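[Editor's note: for anyone repeating the tmpfs experiment, the setup is a single mount; a sketch, with the 15GB size from above and the WORKDIR path borrowed from Koen's layout earlier in the thread.]

    # put only WORKDIR on tmpfs; BUILDDIR, DL_DIR and sstate stay on disk
    mount -t tmpfs -o size=15g tmpfs /OE/build/tmp/work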
Re: [yocto] Build time data
On 04/12/2012 10:51 PM, Martin Jansa wrote:

And my system is very slow compared to yours. I've found my measurement of core-image-minimal-with-mtdutils, around 95 mins, at http://patchwork.openembedded.org/patch/17039/ but this was with a Phenom II X4 965, 4GB RAM, RAID0 (3 SATA2 disks) for WORKDIR, RAID5 (the same 3 SATA2 disks) for BUILDDIR (raid as mdraid). Now I have a Bulldozer AMD FX(tm)-8120, 16GB RAM, still the same RAID0 but a different motherboard..

Why RAID5 for BUILDDIR? The write overhead of RAID5 is very high. The saving RAID5 affords you is more significant with more disks, but with 3 disks it's only 1 disk better than RAID10, with a lot more overhead. I spent some time outlining all this a while back: http://www.dvhart.com/2011/03/qnap_ts419p_configuration_raid_levels_and_throughput/

Here's the relevant bit: RAID 5 distributes parity across all the drives in the array; this parity calculation is both compute intensive and IO intensive. Every write requires the parity calculation, and data must be written to every drive.

-- Darren Hart Intel Open Source Technology Center Yocto Project - Linux Kernel
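[Editor's note: for reference, creating the kind of disposable two-disk RAID0 build array discussed in this thread is a few commands; a sketch with hypothetical device names.]

    # RAID0 build array: fast, no redundancy, contents are recreatable
    mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdb1 /dev/sdc1
    mkfs.ext4 /dev/md0
    mount -o noatime,commit=6000 /dev/md0 /build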
Re: [yocto] Build time data
On Thu, Apr 12, 2012 at 11:08:19PM -0700, Darren Hart wrote:

snip

Why RAID5 for BUILDDIR? The write overhead of RAID5 is very high. The saving RAID5 affords you is more significant with more disks, but with 3 disks it's only 1 disk better than RAID10, with a lot more overhead.

Because RAID10 needs at least 4 drives and all my SATA ports are already used, and also it's on my /home partition.. Please note that this is not some company build server, just my desktop where it happens I do a lot of builds for a community distribution for smartphones http://shr-project.org

The server we have available for builds is _much_ slower than this, especially IO (some virtualized host on a busy server), but it has much better network bandwidth.. :)

Cheers,

snip

-- Martin 'JaMa' Jansa jabber: martin.ja...@gmail.com
Re: [yocto] Build time data
Dear Darren Hart,

In message 4f87c2d3.8020...@linux.intel.com you wrote:

Phenom II X4 965, 4GB RAM, RAID0 (3 SATA2 disks) for WORKDIR, RAID5 (the same 3 SATA2 disks) for BUILDDIR (raid as mdraid). Now I have a Bulldozer AMD FX(tm)-8120, 16GB RAM, still the same RAID0 but a different motherboard..

Why RAID5 for BUILDDIR? The write overhead of RAID5 is very high. The saving RAID5 affords you is more significant with more disks, but with 3 disks it's only 1 disk better than RAID10, with a lot more overhead.

Indeed, RAID5 with just 3 devices makes little sense - especially when running on the same drives as the RAID0 workdir.

I spent some time outlining all this a while back: http://www.dvhart.com/2011/03/qnap_ts419p_configuration_raid_levels_and_throughput/

Well, such data from a 4-spindle array are not telling much. When you are asking for I/O performance on RAID arrays, you want to distribute load over _many_ spindles. Do your comparisons on an 8 or 16 (or more) spindle setup, and the results will be much different.

Also, your test of copying huge files is just one usage mode: strictly sequential access. What we see with OE / Yocto builds is completely different. Here you will see a huge number of small and even tiny data transfers. Classical recommendations for performance optimization of RAID arrays (which usually tune for such big, sequential accesses only), like using big stripe sizes and huge read-ahead, turn out to be counter-productive here. It makes no sense to have, for example, a stripe size of 256 kB or more when 95% or more of your disk accesses write less than 4 kB.

Here's the relevant bit: RAID 5 distributes parity across all the drives in the array; this parity calculation is both compute intensive and IO intensive. Every write requires the parity calculation, and data must be written to every drive.

But did you look at a real system? I never found the CPU load of the parity calculations to be a bottleneck. I would rather have the CPU spend cycles on computing parity than run with all cores idle because it's waiting for I/O to complete. I found that for the workloads we have (software builds like Yocto etc.) a multi-spindle software RAID array outperforms all other solutions (and especially the h/w RAID controllers I have had access to so far - these don't even come close to the same number of IOPS).

Oh, and BTW: if you care about reliability, then don't use RAID5. Go for RAID6. Yes, it's more expensive, but it's also much less painful when you have to rebuild the array in case of a disk failure. I've seen too many cases where a second disk would fail during the rebuild to ever go with RAID5 for big systems again - restoring several TB of data from tape ain't no fun.

See also the RAID wiki for specific performance optimizations on such RAID arrays.

Best regards, Wolfgang Denk

-- DENX Software Engineering GmbH, MD: Wolfgang Denk Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: w...@denx.de Never put off until tomorrow what you can put off indefinitely.
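[Editor's note: a sketch of the many-spindle, small-chunk mdraid setup being advocated here; the device names and the 64 KiB chunk size are illustrative assumptions, not tested values.]

    # RAID6 across 8 spindles; small chunk size, since most build writes are tiny
    mdadm --create /dev/md1 --level=6 --raid-devices=8 --chunk=64 /dev/sd[b-i]1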
Re: [yocto] Build time data
On Thu, 2012-04-12 at 07:34 -0700, Darren Hart wrote:

On 04/12/2012 07:08 AM, Björn Stenberg wrote:

snip

Yet for all the combined horsepower, I am unable to match your time of 30 minutes for core-image-minimal. I clock in at around 37 minutes for a qemux86-64 build with ipk output:

NOTE: Tasks Summary: Attempted 1363 tasks of which 290 didn't need to be rerun and all succeeded.

    real 36m32.118s
    user 214m39.697s
    sys  108m49.152s

These numbers also show that my build is running less than 9x realtime, indicating that 80% of my cores sit idle most of the time.

Yup, that sounds about right. The build has a linear component to it, and anything above about 12 just doesn't help. In fact the added scheduling overhead seems to hurt.

This confirms what ps xf says during the builds: only rarely is bitbake running more than a handful of tasks at once, even with BB_NUMBER_THREADS at 64. And many of these tasks are in turn running sequential loops on a single core. I'm hoping to find time soon to look deeper into this issue and suggest remedies. It is my distinct feeling that we should be able to build significantly faster on powerful machines.

Reducing the dependency chains that result in the linear component of the build (forcing serialized execution) is one place we've focused, and could probably still use some attention. CC'ing RP as he's done a lot there.

The minimal build is about our worst case single-threaded build, as it is highly dependency ordered. We've already done a lot of work looking at the single thread of core dependencies; this is, for example, why we have gettext-minimal-native, which unlocked some of the core path dependencies. When you look at what we build, there is a reason for most of it, unfortunately. There are emails from me about what I looked at and found on the mailing list; I tried to keep a record of it somewhere at least.

You can get some wins with things like ASSUME_PROVIDED += "git-native". For something like a sato build you should see more parallelism.

I do also have some small gains in some pending patches:

http://git.yoctoproject.org/cgit.cgi/poky-contrib/commit/?h=rpurdie/t2&id=2023801e25d81e8cffb643eac259c18b9fecda0b
http://git.yoctoproject.org/cgit.cgi/poky-contrib/commit/?h=rpurdie/t2&id=ecf5f5de8368fdcf90c3d38eafc689d6d265514b
http://git.yoctoproject.org/cgit.cgi/poky-contrib/commit/?h=rpurdie/t2&id=2190a51ffac71c9d19305601f8a3a46e467b745a

which look at speeding up do_package, do_package_write_rpm and do_rootfs (with rpm). They were developed too late for 1.2 and are in some cases only partially complete, but they show some ways we can squeeze some extra performance out of the system.

There are undoubtedly ways we can improve performance but I think we've done the low hanging fruit and we need some fresh ideas.

Cheers, Richard
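[Editor's note: that shortcut, spelled out as a conf/local.conf line, using the standard OE syntax:]

    # trust the host's git instead of building git-native first
    ASSUME_PROVIDED += "git-native"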
Re: [yocto] Build time data
Darren Hart wrote:

One thing that comes to mind is the parallel settings, BB_NUMBER_THREADS and PARALLEL_MAKE. I noticed a negative impact if I increased these beyond 12 and 14 respectively. I tested this with bb-matrix (scripts/contrib/bb-perf/bb-matrix.sh). The script is a bit fickle, but can provide useful results and killer 3D surface plots of build time with BB and PM on the axis.

Very nice! I ran a batch overnight with permutations of 8, 12, 16, 24 and 64 threads:

    BB PM %e      %S      %U       %P   %c      %w       %R        %F  %M      %x
    8  8  2288.96 2611.37 10773.53 584% 810299  18460161 690464859 0   1715456 0
    8  12 2198.40 2648.57 10846.28 613% 839750  18559413 690563187 0   1982864 0
    8  16 2157.26 2672.79 10943.59 631% 898599  18487946 690761197 0   1715440 0
    8  24 2125.15 2916.33 11199.27 664% 89      18412764 690856116 0   1715440 0
    8  64 2189.14 7084.14 12906.95 913% 1491503 18646891 699897733 0   1715440 0
    12 8  2277.66 2625.82 10805.21 589% 691752  18596208 690998433 0   1715440 0
    12 12 2194.04 2664.01 10934.65 619% 714997  18717017 691199925 0   1715440 0
    12 16 2183.95 2736.33 11162.30 636% 1090270 18359128 690559327 0   1715440 0
    12 24 2120.46 2907.63 11229.50 666% 829783  18644293 690729638 0   1715312 0
    12 64 2171.58 6767.09 12822.86 902% 1524683 18634668 690904549 0   1867456 0
    16 8  2294.59 2691.74 10813.69 588% 771621  18637582 686712129 0   1715344 0
    16 12 2201.51 2704.54 11017.23 623% 753662  18590533 699231236 0   1715424 0
    16 16 2154.54 2692.31 11023.28 636% 809586  18557781 691014487 0   1715440 0
    16 24 2130.33 2932.18 11259.09 666% 905669  18531776 691082307 0   2030992 0
    16 64 2184.01 6954.71 12922.39 910% 1467774 18800203 701770099 0   1715440 0
    24 8  2284.88 2645.88 10854.89 590% 833061  18523938 691067170 0   1715328 0
    24 12 2203.72 2696.96 11033.10 623% 931443  18457749 691187723 0   2016368 0
    24 16 2176.02 2727.94 3.33     636% 940044  18420200 690959670 0   1715440 0
    24 24 2170.38 2938.80 11643.10 671% 1023328 18641215 686665448 15  1715440 0
    24 64 2200.02 7188.60 12902.42 913% 1509158 18924772 690615091 66  1715440 0
    64 8  2309.40 2702.33 10952.18 591% 753168  18687309 690927732 10  1867440 0
    64 12 2230.80 2765.98 11131.22 622% 875495  18744802 691213524 28  1715216 0
    64 16 2182.22 2786.22 11180.86 640% 881328  18724987 691020084 109 1768576 0
    64 24 2136.20 3001.36 11238.81 666% 898320  18646384 691239254 46  1715312 0
    64 64 2189.73 7154.10 12846.99 913% 1416830 18781801 690890798 41  1715424 0

What it shows is that BB_NUMBER_THREADS makes no difference at all in this range. As for PARALLEL_MAKE, it shows 24 is better than 16 but 64 is too high, incurring a massive scheduling penalty. I wonder if newer kernel versions have become more efficient. In hindsight, I should have included 32 and 48 in the test.

Unfortunately I was unable to produce plots with bb-matrix-plot.sh. It gave me pretty PNG files, but missing any plotted data:

    # ../../poky/scripts/contrib/bb-perf/bb-matrix-plot.sh
    line 0: Number of grid points must be in [2:1000] - not changed!
    Warning: Single isoline (scan) is not enough for a pm3d plot.
    Hint: Missing blank lines in the data file? See 'help pm3d' and FAQ.
    (the warning repeats four times)

Result: http://imgur.com/mfgWb

-- Björn
Re: [yocto] Build time data
On 13 Apr 2012, at 11:56, Tomas Frydrych wrote:

On 12/04/12 01:30, Darren Hart wrote:

Next up is storage.

Indeed. In my experience, by far the biggest limiting factor in builds is getting IO bound. If you are not running a dedicated build machine, it is well worth using a dedicated disk for the poky tmp dir; assuming you have CPU time left, this leaves the machine completely usable for other things.

Now RAM, you will want about 2 GB of RAM per core, with a minimum of 4GB.

My experience does not bear this out at all; building Yocto on a 6-core hyper-threaded desktop machine, I have never seen the system memory use get significantly over the 2GB mark (out of 8GB available), doing a Yocto build using 10 cores/threads.

Try building webkit or asio: the linker will use ~1.5GB per object, so for asio you need PARALLEL_MAKE * 1.5 GB of RAM to avoid swapping to disk.
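[Editor's note: to put a number on that (illustrative arithmetic, not a measurement): with PARALLEL_MAKE = "-j 8", eight concurrent ~1.5 GB link jobs need roughly 8 x 1.5 GB = 12 GB of RAM, which already exceeds the 8 GB desktop described above.]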
Re: [yocto] Build time data
On 04/13/2012 01:47 AM, Björn Stenberg wrote:

Darren Hart wrote:

One thing that comes to mind is the parallel settings, BB_NUMBER_THREADS and PARALLEL_MAKE. I noticed a negative impact if I increased these beyond 12 and 14 respectively. I tested this with bb-matrix (scripts/contrib/bb-perf/bb-matrix.sh). The script is a bit fickle, but can provide useful results and killer 3D surface plots of build time with BB and PM on the axis.

Very nice! I ran a batch overnight with permutations of 8, 12, 16, 24 and 64 threads:

snip

What it shows is that BB_NUMBER_THREADS makes no difference at all in this range. As for PARALLEL_MAKE, it shows 24 is better than 16 but 64 is too high, incurring a massive scheduling penalty. I wonder if newer kernel versions have become more efficient. In hindsight, I should have included 32 and 48 in the test.

Unfortunately I was unable to produce plots with bb-matrix-plot.sh. It gave me pretty PNG files, but missing any plotted data:

Right, gnuplot likes evenly spaced values of BB and PM. So you could have done: 8,12,16,24,28,32 (anything above that is going to go down anyway). Unfortunately, the gaps force the plot to generate spikes at the interpolated points. I'm open to ideas on how to make it compatible with arbitrary gaps and avoid the spikes. Perhaps I should rewrite this with python matplotlib and scipy and use the interpolate module. This is non-trivial, so not something I'll get to quickly.

    # ../../poky/scripts/contrib/bb-perf/bb-matrix-plot.sh
    line 0: Number of grid points must be in [2:1000] - not changed!
    Warning: Single isoline (scan) is not enough for a pm3d plot.
    Hint: Missing blank lines in the data file? See 'help pm3d' and FAQ.
    (the warning repeats four times)

Result: http://imgur.com/mfgWb

-- Darren Hart Intel Open Source Technology Center Yocto Project - Linux Kernel
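[Editor's note: short of a matplotlib rewrite, gnuplot's own dgrid3d resamples scattered points onto an even grid, which is exactly what the 'Single isoline' warnings are complaining about; a hedged sketch, assuming a data file with BB, PM and elapsed time in columns 1-3.]

    gnuplot <<'EOF'
    set dgrid3d 30,30              # resample scattered samples onto a 30x30 grid
    set pm3d
    set xlabel "BB_NUMBER_THREADS"
    set ylabel "PARALLEL_MAKE"
    set zlabel "elapsed (s)"
    set terminal png
    set output "bb-matrix.png"
    splot 'bb-matrix.dat' using 1:2:3 with lines notitle
    EOF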
Re: [yocto] Build time data
On 04/11/2012 09:39 PM, Bob Cochran wrote:

On 04/11/2012 08:30 PM, Darren Hart wrote:

SSDs are one way to go, but we've been known to chew through them and they aren't priced as consumables.

Hi Darren,

Could you please elaborate on "been known to chew through them"? Are you running into an upper limit on write / erase cycles? Are you encountering hard (or soft) failures?

Some have reported early physical disk failure. Due to the cost of SSDs, not a lot of people seem to be trying it out. I *believe* the current generation of SSDs would perform admirably, but I haven't tested that. I know Deny builds with SSDs, perhaps he would care to comment?

-- Darren Hart Intel Open Source Technology Center Yocto Project - Linux Kernel
Re: [yocto] Build time data
Darren,

On Wed, 2012-04-11 at 17:30 -0700, Darren Hart wrote:

I run on a beast with 12 cores, 48GB of RAM, OS and sources on a G2 Intel SSD, with two Seagate Barracudas in a RAID0 array for my /build partition. I run a headless Ubuntu 11.10 (x86_64) installation running the 3.0.0-16-server kernel. I can build core-image-minimal in 30 minutes and core-image-sato in 50 minutes from scratch.

wow. Can I get a shell? :D

-- Joshua Immanuel HiPro IT Solutions Private Limited http://hipro.co.in
Re: [yocto] Build time data
On Thu, 2012-04-12 at 10:00 +0200, Martin Jansa wrote:

On Wed, 2012-04-11 at 17:30 -0700, Darren Hart wrote:

snip

why not use so much RAM for WORKDIR in tmpfs? I bought 16GB just to be able to do my builds in tmpfs and keep only more permanent data on RAID.

+1

I tried using tmpfs for WORKDIR on my T420, which has 8GB of RAM. (In India, the maximum single-slot DDR3 RAM we can get is 4GB.) Obviously, this is not sufficient :( Maybe I shouldn't use the laptop for build purposes. Moreover, every time I build an image in Yocto, the temperature peaks at 87 degrees Celsius. Hoping that my HDD does not die.

-- Joshua Immanuel HiPro IT Solutions Private Limited http://hipro.co.in
Re: [yocto] Build time data
Darren Hart wrote:

/dev/md0 /build ext4 noauto,noatime,nodiratime,commit=6000

A minor detail: 'nodiratime' is a subset of 'noatime', so there is no need to specify both.

I run on a beast with 12 cores, 48GB of RAM, OS and sources on a G2 Intel SSD, with two Seagate Barracudas in a RAID0 array for my /build partition. I run a headless Ubuntu 11.10 (x86_64) installation running the 3.0.0-16-server kernel. I can build core-image-minimal in 30 minutes and core-image-sato in 50 minutes from scratch.

I'm guessing those are rather fast cores? I build on a different type of beast: 64 cores at 2.1GHz and 128 GB ram. The OS is on a single SSD and the build dir (and sources) is on a RAID0 array of Intel 520 SSDs. Kernel is the same ubuntu 3.0.0-16-server as yours.

Yet for all the combined horsepower, I am unable to match your time of 30 minutes for core-image-minimal. I clock in at around 37 minutes for a qemux86-64 build with ipk output:

NOTE: Tasks Summary: Attempted 1363 tasks of which 290 didn't need to be rerun and all succeeded.

    real 36m32.118s
    user 214m39.697s
    sys  108m49.152s

These numbers also show that my build is running less than 9x realtime, indicating that 80% of my cores sit idle most of the time. This confirms what ps xf says during the builds: only rarely is bitbake running more than a handful of tasks at once, even with BB_NUMBER_THREADS at 64. And many of these tasks are in turn running sequential loops on a single core.

I'm hoping to find time soon to look deeper into this issue and suggest remedies. It is my distinct feeling that we should be able to build significantly faster on powerful machines.

-- Björn
Re: [yocto] Build time data
On 04/12/2012 01:00 AM, Martin Jansa wrote:

On Thu, Apr 12, 2012 at 01:05:00PM +0530, Joshua Immanuel wrote:

snip

why not use so much RAM for WORKDIR in tmpfs? I bought 16GB just to be able to do my builds in tmpfs and keep only more permanent data on RAID.

We've done some experiments with tmpfs, adding Beth on CC. If I recall correctly, my RAID0 array with the mount options I specified accomplishes much of what tmpfs does for me without the added setup. With a higher commit interval, the kernel doesn't try to sync the dcache with the disks as frequently (e.g. not even once during a build), so it's effectively writing to memory (although there is still plenty of IO occurring).

The other reason is that while 48GB is plenty for a single build, I often run many builds in parallel, sometimes in virtual machines when I need to reproduce or test something on different hosts. For example: https://picasaweb.google.com/lh/photo/7PCrqXQqxL98SAY1ecNzDdMTjNZETYmyPJy0liipFm0?feat=directlink

-- Darren Hart Intel Open Source Technology Center Yocto Project - Linux Kernel
Re: [yocto] Build time data
On 04/12/2012 03:43 PM, Chris Tapp wrote:

On 12 Apr 2012, at 15:34, Darren Hart wrote:

snip

I run on a beast with 12 cores, 48GB of RAM, OS and sources on a G2 Intel SSD, with two Seagate Barracudas in a RAID0 array for my /build partition. I run a headless Ubuntu 11.10 (x86_64) installation running the 3.0.0-16-server kernel. I can build core-image-minimal in 30 minutes and core-image-sato in 50 minutes from scratch.

I'm guessing those are rather fast cores?

They are:

    model name : Intel(R) Xeon(R) CPU X5680 @ 3.33GHz

Nice, but well out of my budget - I've got to make do with what one of your CPUs costs for the whole system ;-)

I build on a different type of beast: 64 cores at 2.1GHz and 128 GB ram. The OS is on a single SSD and the build dir (and sources) is on a RAID0 array of Intel 520 SSDs. Kernel is the same ubuntu 3.0.0-16-server as yours.

Now that I think about it, my downloads are on the RAID0 array too. One thing that comes to mind is the parallel settings, BB_NUMBER_THREADS and PARALLEL_MAKE. I noticed a negative impact if I increased these beyond 12 and 14 respectively. I tested this with bb-matrix (scripts/contrib/bb-perf/bb-matrix.sh). The script is a bit fickle, but can provide useful results and killer 3D surface plots of build time with BB and PM on the axis. Can't seem to find a plot image at the moment for some reason...

snip

Reducing the dependency chains that result in the linear component of the build (forcing serialized execution) is one place we've focused, and could probably still use some attention. CC'ing RP as he's done a lot there.

Current plan for a 'budget' system is: DX79TO motherboard, i7 3820, 16GB RAM, a pair of 60GB OCZ Vertex III's in RAID-0 for downloads / build, SATA HD for OS (Ubuntu 11.10 x86_64). That'll give me a 2.7x boost just on CPU, and the SSDs (and maybe some over-clocking) will give some more. Not sure if SSDs in RAID-0 will give any boost, so I'll run some tests. Thanks to all for the comments in this thread.

Get back to us with times, and we'll build up a wiki page.

Chris Tapp opensou...@keylevel.com www.keylevel.com

-- Darren Hart Intel Open Source Technology Center Yocto Project - Linux Kernel
Re: [yocto] Build time data
On Thu, Apr 12, 2012 at 7:12 AM, Darren Hart dvh...@linux.intel.com wrote:

snip

We've done some experiments with tmpfs, adding Beth on CC. If I recall correctly, my RAID0 array with the mount options I specified accomplishes much of what tmpfs does for me without the added setup.

This should be the case in general. For the most part, if you have a decent RAID setup (we're using RAID10 on the ab) with fast disks, you should be able to hit tmpfs speed (or close to it). I've done some experiments with this, and what I found was maybe a 5 minute difference, sometimes, from a clean build between tmpfs and RAID10.

I discussed this during Yocto Developer Day. Let me boil it down a bit to explain some of what I did on the autobuilders. Caveat first though: I would avoid using autobuilder time as representative of prime Yocto build time. The autobuilder hosts a lot of different services that sometimes impact build time, and this can vary depending on what else is going on on the machine.

There are four places, in general, where you want to look at optimizing outside of dependency issues: CPU, disk, memory and build process. What I found was that the most useful of these in getting the autobuilder time down was disk and build process. With disk, spreading it across the RAID saved us not only a bit of time, but also helped us avoid trashed disks. More disk thrash == higher failure rate. So far this year we've seen two disk failures that have resulted in almost zero autobuilder downtime.

The real time saver however ended up being maintaining sstate across build runs. Even with our sstate on NFS, we're still seeing a dramatic decrease in build time.

I would be interested in seeing what times you get with tmpfs. I've done tmpfs builds before and have seen good results, but bang for the buck did end up being a RAID array.

snip

-- Elizabeth Flanagan Yocto Project Build and Release
Re: [yocto] Build time data
On Thu, Apr 12, 2012 at 04:37:00PM -0700, Flanagan, Elizabeth wrote:

snip

This should be the case in general. For the most part, if you have a decent RAID setup (we're using RAID10 on the ab) with fast disks, you should be able to hit tmpfs speed (or close to it). I've done some experiments with this, and what I found was maybe a 5 minute difference, sometimes, from a clean build between tmpfs and RAID10.

5 minutes on a very small image like core-image-minimal (30 min) is 1/6 of that time :).. I have much bigger images and an even bigger ipk feed, so a rebuild from scratch takes about 24 hours for one architecture.. And my system is very slow compared to yours; I've found my measurement of core-image-minimal-with-mtdutils, around 95 mins, at http://patchwork.openembedded.org/patch/17039/ but this was with a Phenom II X4 965, 4GB RAM, RAID0 (3 SATA2 disks) for WORKDIR, RAID5 (the same 3 SATA2 disks) for BUILDDIR (raid as mdraid). Now I have a Bulldozer AMD FX(tm)-8120, 16GB RAM, still the same RAID0 but a different motherboard..

The problem with tmpfs is that no RAM is big enough to build the whole feed in one go, so I have to build in steps (e.g. bitbake gcc for all machines with the same architecture, then clean up WORKDIR and switch to another arch, then bitbake small-image, bigger-image, qt4-x11-free, ...). qt4-x11-free is able to eat 15GB of tmpfs almost completely.

snip

With disk, spreading it across the RAID saved us not only a bit of time, but also helped us avoid trashed disks. More disk thrash == higher failure rate. So far this year we've seen two disk failures that have resulted in almost zero autobuilder downtime.

True for RAID10, but for WORKDIR itself RAID0 is cheaper, and even a higher failure rate is not a big issue for WORKDIR.. you just have to cleansstate the tasks which were hit in the middle of a build..

The real time saver however ended up being maintaining sstate across build runs. Even with our sstate on NFS, we're still seeing a dramatic decrease in build time. I would be interested in seeing what times you get with tmpfs. I've done tmpfs builds before and have seen good results, but bang for the buck did end up being a RAID array.

I'll check if core-image-minimal can be built with just 15GB tmpfs, otherwise I would have to build it in 2 steps and the time won't be precise.

snip

-- Martin 'JaMa' Jansa jabber: martin.ja...@gmail.com
[yocto] Build time data
Is there a page somewhere that gives a rough idea of how quickly a full build runs on various systems? I need a faster build platform, but want to get a reasonable price / performance balance ;-)

I'm looking at something like an i7-2700K but am not yet tied...

Chris Tapp opensou...@keylevel.com www.keylevel.com
Re: [yocto] Build time data
On 04/11/2012 04:42 PM, Chris Tapp wrote:

Is there a page somewhere that gives a rough idea of how quickly a full build runs on various systems? I need a faster build platform, but want to get a reasonable price / performance balance ;-) I'm looking at something like an i7-2700K but am not yet tied...

I haven't seen one, but it would be great to have this on the wiki where everyone could post what they're seeing. Maybe the autobuilder has some useful statistics (http://autobuilder.yoctoproject.org:8010/)? Of course, you'll have to be careful to determine whether anything else was running at the time of the build.

On a related note, I have been wondering whether I would get the bang for the buck with an SSD for my build machines. I would guess that building embedded Linux images isn't a typical use pattern for an SSD. I wonder if the long write/erase durations for flash technology would show their ugly face during a poky build. I would think that the embedded micro inside the SSD managing the writes might get taxed to the limit trying to slice the data. I would appreciate anyone's experience with SSDs on build machines.
Re: [yocto] Build time data
On 04/11/2012 01:42 PM, Chris Tapp wrote:

Is there a page somewhere that gives a rough idea of how quickly a full build runs on various systems? I need a faster build platform, but want to get a reasonable price / performance balance ;-) I'm looking at something like an i7-2700K but am not yet tied...

We really do need to get some pages up on this as it comes up a lot.

Currently Yocto Project builds scale well up to about 12 cores, so the first step is to get as many cores as you can. Sacrifice some speed for cores if you have to. If you can do dual-socket, do it. If not, try for a six core.

Next up is storage. We read and write a LOT of data. SSDs are one way to go, but we've been known to chew through them and they aren't priced as consumables. You can get about 66% of the performance of a single SSD with a pair of good quality SATA2 or better drives configured in RAID0 (no redundancy). Ideally, you would have your OS and sources on an SSD and use a RAID0 array to build on. This data is all recreatable, so it's OK if you lose a disk and therefore ALL of your build data.

Now RAM: you will want about 2 GB of RAM per core, with a minimum of 4GB.

Finally, software. Be sure to run a server kernel, which is optimized for throughput as opposed to interactivity (like desktop kernels). This implies CONFIG_PREEMPT_NONE=y. You'll want a 64-bit kernel to avoid the performance penalty inherent with 32-bit PAE kernels - and you will want lots of memory.

You can save some IO by mounting your its-ok-if-i-lose-all-my-data build partition as follows:

    /dev/md0 /build ext4 noauto,noatime,nodiratime,commit=6000

As well as dropping the journal when you format it. Just don't power off your machine without properly shutting down!

That should get you some pretty good build times. I run on a beast with 12 cores, 48GB of RAM, OS and sources on a G2 Intel SSD, with two Seagate Barracudas in a RAID0 array for my /build partition. I run a headless Ubuntu 11.10 (x86_64) installation running the 3.0.0-16-server kernel. I can build core-image-minimal in 30 minutes and core-image-sato in 50 minutes from scratch.

Hopefully that gives you some ideas to get started.

-- Darren Hart Intel Open Source Technology Center Yocto Project - Linux Kernel
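[Editor's note: a sketch of that format-and-mount recipe, assuming the RAID0 device is /dev/md0 as in the fstab line above; 'nodiratime' is dropped since 'noatime' already implies it, per the correction earlier in the thread.]

    # format without a journal; everything on this array is recreatable
    mkfs.ext4 -O "^has_journal" -L build /dev/md0
    # /etc/fstab entry (noauto: mount it by hand after boot)
    /dev/md0  /build  ext4  noauto,noatime,commit=6000  0  0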
Re: [yocto] Build time data
Excellent topic for a wiki page.

On Wed, Apr 11, 2012 at 5:30 PM, Darren Hart dvh...@linux.intel.com wrote:

snip

-- Jeff Osier-Mixon http://jefro.net/blog Yocto Project Community Manager @Intel http://yoctoproject.org
Re: [yocto] Build time data
On 04/11/2012 08:30 PM, Darren Hart wrote:

SSDs are one way to go, but we've been known to chew through them and they aren't priced as consumables.

Hi Darren,

Could you please elaborate on "been known to chew through them"? Are you running into an upper limit on write / erase cycles? Are you encountering hard (or soft) failures?

Thanks, Bob