Re: wanna-build / how to sort packages on buildds?
Ingo Jürgensmann i...@2011.bluespice.org writes:

> On Sun, 1 May 2011 01:36:38 +0200, Andreas Barth wrote:
>
>> Sometimes we have a few packages we don't want to build on certain buildds. Sometimes this is because this package needs lots of ram. Or it takes quite long and would waste the parallel building a machine supports. Or whatever else. Of course a package could be in more than one category.
>
> Yes, you're facing basically the same problem I tried to address in 2000/2001 when doing my renderserver and later for what Multibuild was intended to do as well. ;-)
>
>> Now, what I would like to do is to write that down in a central file with categories.
>
> I would recommend using a database, really.
>
>> That is, to mark packages as "builds only with more than one gigabyte of ram". And to mark buildds as "has 6 cores, only ... ram" - so that I don't need to copy entries from buildd to buildd, but just say that "new machine is the same class as ...", and that's it.
>
> Another category would be fast disk/raid. There are some packages with lots of disk accesses. When you can schedule those packages to a buildd that has faster disk access, e.g. by having multiple spindles for faster seeks, you can minimize build times as well. We faced that problem on m68k, particularly with IDE vs SCSI disks on Amigas, as IDE was dog slow. Another example there was the faster disks on Amigas vs slower SCSI disks in Apple machines.
>
>> Now my question is just: How to do that efficiently? I.e. what would such a configuration file look like, and the code to distribute the package to the most fitting buildd(s)? (I.e. it's better to waste 5 out of 6 cores than to not build a package at all, but a package needing at least 1g ram can't build on a buildd with only 512mb - but no package should starve in the end.)
>>
>> Ideas? Suggestions? Code?
>
> Look at my update-buildd.net from Buildd.net, which I used to collect data from the buildds such as RAM, kernel, uptime, used swap and such (http://buildd.net/cgi/hostpackages.cgi?unstable_arch=m68ksearchtype=arrakis). I store this information in the database, along with the build times of the packages. With this dataset it should be possible to have wanna-build schedule packages in such a way as to minimize the build times, because you can decide which buildd is the most suitable for the next package.

I think different groups of factors have to be considered:

1) absolute requirements

I think there are only 2 absolute requirements:

- ram size
- disk size

And all buildds currently have enough disk space, I think.

In the past we also had some sources that would crash one buildd but not the other. No way to track that ahead of time though. But it should be possible to report this to wanna-build.

Absolute requirements are absolute. If a buildd doesn't meet the requirement then wanna-build must never schedule the package to build there. (Note: The buildd will just give it back with the current setup, so no biggy if wanna-build gets it wrong.)

2) important features

The most relevant feature I think is multiple cores and support of DEB_BUILD_OPTIONS=parallel=x. This would be an attribute of both the buildd and the source and one should try to match them. Build sources which support parallel building preferably on systems with multiple cores.

The I/O speed and the source's need for it could be another such feature. But I'm not sure (other than the m68k special case) this is relevant to such a degree that it makes sense tracking this specifically.

Important features would be anything we can figure out and point to as having a major influence on the build speed. And imho this should be like N times faster to warrant the effort to track this for sources.

3) general performance

Buildds are different and build times will differ accordingly. I don't think this can be properly quantified ahead of time, and there are many hidden factors interacting that would be impossible to quantify with reasonable effort.

But I think this can be measured and extrapolated just fine. Keep a database of build times and do some statistical analysis to rate the buildd speed in general and for specific sources. With that you have a good approximation of the time a source will need to build on each buildd. Use that as weight when deciding where to build a source.

Unlike items in 2), which would have to be manually tracked, this would encompass any and all factors, including unknown ones, in approximation. Some care would have to be taken that factors aren't weighted twice, once from 2) and once here.

The build times for a parallel building source will differ greatly for single and multi core systems. The difference in weight this produces might already be sufficient so that those sources prefer the multi core systems (after a few versions). So tracking important features manually might be wasted effort altogether.

My suggestion would be to implement something for 1) and 3) and see how that goes.
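A minimal sketch of the "1) plus 3)" suggestion, written in Python for brevity rather than in wanna-build's Perl. Every name and number here (the buildd inventory, the requirement table, the build-time history, the use of a median) is an assumption made up for illustration; nothing is taken from wanna-build itself.

```python
from statistics import median

# Hypothetical buildd inventory: name -> resources.
BUILDDS = {
    "fast6": {"ram_mb": 4096, "disk_gb": 200},
    "small": {"ram_mb": 512, "disk_gb": 80},
}

# Hypothetical absolute requirements per source; a missing key means "no limit".
REQUIREMENTS = {
    "gcc-4.5": {"ram_mb": 1024},
}

# Hypothetical history: (source, buildd) -> past build times in seconds.
HISTORY = {
    ("gcc-4.5", "fast6"): [5400, 5600, 5200],
    ("hello", "fast6"): [40, 45],
    ("hello", "small"): [120, 110, 130],
}


def meets_requirements(source, buildd):
    """Group 1): every absolute requirement must be satisfied."""
    needs = REQUIREMENTS.get(source, {})
    have = BUILDDS[buildd]
    return all(have.get(key, 0) >= value for key, value in needs.items())


def estimated_time(source, buildd):
    """Group 3): median of past build times, or None if never built there."""
    times = HISTORY.get((source, buildd))
    return median(times) if times else None


def rank_buildds(source):
    """Return eligible buildds, fastest expected build first.

    Buildds without any history sort last but stay eligible, so a source
    can still be scheduled before measurements exist.
    """
    eligible = [b for b in BUILDDS if meets_requirements(source, b)]

    def sort_key(buildd):
        estimate = estimated_time(source, buildd)
        return (estimate is None, estimate or 0, buildd)

    return sorted(eligible, key=sort_key)


print(rank_buildds("gcc-4.5"))  # only buildds with at least 1024 MB of RAM
print(rank_buildds("hello"))    # both eligible, ranked by measured build time
```

The hard filter and the measured weight are kept separate on purpose: the first can only remove buildds, the second can only reorder them, so a missing or misleading measurement never makes a source unbuildable.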
Re: wanna-build / how to sort packages on buildds?
Roger Leigh rle...@codelibre.net writes:

> I just wanted to add that if you would like more statistics reporting for this purpose, I'll be happy to add that to sbuild. Currently we only really report build time and disc space. If you want additional data such as number of cores used, memory/swap usage and other resource usage, I'll be happy to add them to the sbuild summary stats. Actually measuring those might be a bit trickier though, especially on machines running parallel builds.

Reporting the cumulative CPU time used and the wall clock time should be helpful in detecting parallel builds. The closer the CPU time gets to wall clock * num cores, the better it would be to build the package on multi core systems.

MfG
Goswin

--
To UNSUBSCRIBE, email to debian-devel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/87vcxtuj8b.fsf@frosties.localnet
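The CPU-time heuristic above can be written down as a small calculation. This is only a sketch in Python: the function names and the 0.5 cut-off are assumptions for illustration, and sbuild does not necessarily report these figures in this form.

```python
def parallel_utilisation(cpu_seconds, wall_seconds, cores):
    """Fraction of the machine's CPU capacity a build actually used.

    1.0 means cpu time == wall clock * cores (perfectly parallel);
    roughly 1/cores means the build was effectively single-threaded.
    """
    if wall_seconds <= 0 or cores <= 0:
        raise ValueError("wall_seconds and cores must be positive")
    return cpu_seconds / (wall_seconds * cores)


def looks_parallel(cpu_seconds, wall_seconds, cores, threshold=0.5):
    """Heuristic: call a build parallel if it used more than `threshold`
    of the machine. The default is an arbitrary illustrative cut-off."""
    return parallel_utilisation(cpu_seconds, wall_seconds, cores) > threshold


# A build on 6 cores that took 600 s of wall clock and 3000 s of CPU time.
print(looks_parallel(3000, 600, 6))
# A build on the same machine that used only 580 s of CPU time.
print(looks_parallel(580, 600, 6))
```

As noted in the quoted text, the figures are harder to trust on a machine that runs several builds at once, because the wall clock time of one build is then stretched by the others.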
Re: wanna-build / how to sort packages on buildds?
On Saturday, April 30, 2011 07:36:38 PM Andreas Barth wrote:

> Hi,
>
> I have a problem I need to solve in perl within wanna-build:
>
> Sometimes we have a few packages we don't want to build on certain buildds. Sometimes this is because this package needs lots of ram. Or it takes quite long and would waste the parallel building a machine supports. Or whatever else. Of course a package could be in more than one category.
>
> Now, what I would like to do is to write that down in a central file with categories.
>
> That is, to mark packages as "builds only with more than one gigabyte of ram". And to mark buildds as "has 6 cores, only ... ram" - so that I don't need to copy entries from buildd to buildd, but just say that "new machine is the same class as ...", and that's it.
>
> Now my question is just: How to do that efficiently? I.e. what would such a configuration file look like, and the code to distribute the package to the most fitting buildd(s)? (I.e. it's better to waste 5 out of 6 cores than to not build a package at all, but a package needing at least 1g ram can't build on a buildd with only 512mb - but no package should starve in the end.)
>
> Ideas? Suggestions? Code?

If one could do something like:

wb gb libieee1284 mod-wsgi nflog-bindings zinnia . ia64 . !caballero

that would be a HUGE win.

My suggestion would be to start with something simple and declarative like that and then build the back end to automatically sort out the list of candidate buildds after that.

Scott K

Archive: http://lists.debian.org/201105021332.04065.deb...@kitterman.com
Re: wanna-build / how to sort packages on buildds?
* Scott Kitterman (deb...@kitterman.com) [110502 19:32]:

> If one could do something like:
>
> wb gb libieee1284 mod-wsgi nflog-bindings zinnia . ia64 . !caballero

Good idea. I'll consider how to do that.

Andi

Archive: http://lists.debian.org/20110502173620.gp15...@mails.so.argh.org
Re: wanna-build / how to sort packages on buildds?
]] Andreas Barth

Hi,

| Now my question is just: How to do that efficiently? I.e. what would such
| a configuration file look like, and the code to distribute the
| package to the most fitting buildd(s)? (I.e. it's better to waste 5
| out of 6 cores than to not build a package at all, but a package
| needing at least 1g ram can't build on a buildd with only 512mb - but
| no package should starve in the end.)
|
| Ideas? Suggestions? Code?

Sounds like a variant of the knapsack problem. I'd suggest something like:

- Have a mapping for buildds from resources to a value (this can just be a perl hash); this defines cores, amount of memory, etc.

- Each package has a minimum requirement for cpu, memory, etc., stored in a hash. Store all the packages in a list.

- Sort the list, either according to a score which is a mix of cpu and memory and whatever other factors you want, or first along the cpu axis, then along the memory axis, etc. I suspect CPU and memory requirements are correlated, but not perfectly.

- Assign packages to buildds on a first-match basis. That means you get the hardest packages done first. The match has to make sure the buildd can actually build the package in question, of course.

Regards,
--
Tollef Fog Heen
UNIX is user friendly, it's just picky about who its friends are

Archive: http://lists.debian.org/87d3k2c49x@qurzaw.varnish-software.com
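The four steps above can be sketched as follows. This is Python rather than the Perl hashes the message mentions, and the buildd names, package requirements and the scoring formula are all invented for illustration.

```python
# Hypothetical buildd resources (the "mapping from resources to a value").
BUILDDS = {
    "big": {"cores": 6, "ram_mb": 8192},
    "mid": {"cores": 2, "ram_mb": 2048},
    "tiny": {"cores": 1, "ram_mb": 512},
}

# Hypothetical minimum requirements per package.
PACKAGES = [
    {"name": "hello", "cores": 1, "ram_mb": 128},
    {"name": "gcc-4.5", "cores": 1, "ram_mb": 1024},
    {"name": "qt4-x11", "cores": 2, "ram_mb": 1024},
]


def score(pkg):
    """Mix of cpu and memory demand; the weighting is an arbitrary assumption."""
    return pkg["cores"] * 1024 + pkg["ram_mb"]


def can_build(buildd, pkg):
    """The match must ensure the buildd can actually build the package."""
    return all(buildd[key] >= pkg[key] for key in ("cores", "ram_mb"))


def assign(packages, buildds):
    """Hardest package first; each takes the first idle buildd that fits."""
    free = dict(buildds)  # buildds still idle in this round
    assignment, waiting = {}, []
    for pkg in sorted(packages, key=score, reverse=True):
        match = next((n for n, b in free.items() if can_build(b, pkg)), None)
        if match is None:
            waiting.append(pkg["name"])  # retried in the next round
        else:
            assignment[pkg["name"]] = match
            del free[match]
    return assignment, waiting


print(assign(PACKAGES, BUILDDS))
```

Packages that find no idle buildd in one round are returned in `waiting` rather than dropped, which is one simple way to respect the "no package should starve" condition: they go back into the list for the next round.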
Re: wanna-build / how to sort packages on buildds?
On Sun, 1 May 2011 01:36:38 +0200, Andreas Barth wrote:

> Sometimes we have a few packages we don't want to build on certain buildds. Sometimes this is because this package needs lots of ram. Or it takes quite long and would waste the parallel building a machine supports. Or whatever else. Of course a package could be in more than one category.

Yes, you're facing basically the same problem I tried to address in 2000/2001 when doing my renderserver and later for what Multibuild was intended to do as well. ;-)

> Now, what I would like to do is to write that down in a central file with categories.

I would recommend using a database, really.

> That is, to mark packages as "builds only with more than one gigabyte of ram". And to mark buildds as "has 6 cores, only ... ram" - so that I don't need to copy entries from buildd to buildd, but just say that "new machine is the same class as ...", and that's it.

Another category would be fast disk/raid. There are some packages with lots of disk accesses. When you can schedule those packages to a buildd that has faster disk access, e.g. by having multiple spindles for faster seeks, you can minimize build times as well. We faced that problem on m68k, particularly with IDE vs SCSI disks on Amigas, as IDE was dog slow. Another example there was the faster disks on Amigas vs slower SCSI disks in Apple machines.

> Now my question is just: How to do that efficiently? I.e. what would such a configuration file look like, and the code to distribute the package to the most fitting buildd(s)? (I.e. it's better to waste 5 out of 6 cores than to not build a package at all, but a package needing at least 1g ram can't build on a buildd with only 512mb - but no package should starve in the end.)
>
> Ideas? Suggestions? Code?

Look at my update-buildd.net from Buildd.net, which I used to collect data from the buildds such as RAM, kernel, uptime, used swap and such (http://buildd.net/cgi/hostpackages.cgi?unstable_arch=m68ksearchtype=arrakis). I store this information in the database, along with the build times of the packages. With this dataset it should be possible to have wanna-build schedule packages in such a way as to minimize the build times, because you can decide which buildd is the most suitable for the next package.

--
Ciao... // Fon: 0381-2744150
Ingo \X/ http://blog.windfluechter.net
gpg pubkey: http://www.juergensmann.de/ij_public_key.

Archive: http://lists.debian.org/f423f24d7f17a3abe30510a870357...@muaddib.hro.localnet
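To make the database idea concrete, here is a small sketch using Python's built-in sqlite3 module with an in-memory database. The table layout, column names, buildd names and timings are all assumptions for illustration; this is not the Buildd.net schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE buildd (
        name   TEXT PRIMARY KEY,
        ram_mb INTEGER NOT NULL,
        cores  INTEGER NOT NULL
    );
    CREATE TABLE build_log (
        source  TEXT NOT NULL,
        version TEXT NOT NULL,
        buildd  TEXT NOT NULL REFERENCES buildd(name),
        seconds INTEGER NOT NULL
    );
""")

# Hypothetical machines and measurements.
conn.executemany("INSERT INTO buildd VALUES (?, ?, ?)",
                 [("buildd-a", 256, 1), ("buildd-b", 1024, 2)])
conn.executemany("INSERT INTO build_log VALUES (?, ?, ?, ?)", [
    ("hello", "2.6-1", "buildd-a", 300),
    ("hello", "2.7-1", "buildd-a", 340),
    ("hello", "2.6-1", "buildd-b", 120),
    ("bash", "4.1-3", "buildd-a", 5000),
])


def fastest_buildds(conn, source, min_ram_mb=0):
    """Buildds that have built `source` before and have enough RAM,
    ordered by their average recorded build time (fastest first)."""
    rows = conn.execute("""
        SELECT b.name, AVG(l.seconds) AS avg_seconds
        FROM buildd AS b
        JOIN build_log AS l ON l.buildd = b.name
        WHERE l.source = ? AND b.ram_mb >= ?
        GROUP BY b.name
        ORDER BY avg_seconds
    """, (source, min_ram_mb))
    return rows.fetchall()


print(fastest_buildds(conn, "hello"))
print(fastest_buildds(conn, "hello", min_ram_mb=512))
```

Keeping machine attributes and build logs in separate tables means a new buildd needs one row, and its ranking then follows from whatever builds it reports.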
Re: wanna-build / how to sort packages on buildds?
On Sun, May 01, 2011 at 01:36:38AM +0200, Andreas Barth wrote:

> I have a problem I need to solve in perl within wanna-build:
>
> Sometimes we have a few packages we don't want to build on certain buildds. Sometimes this is because this package needs lots of ram. Or it takes quite long and would waste the parallel building a machine supports. Or whatever else. Of course a package could be in more than one category.
>
> Now, what I would like to do is to write that down in a central file with categories.

I would have to echo the sentiment that storing this information in the database is probably a better idea.

I just wanted to add that if you would like more statistics reporting for this purpose, I'll be happy to add that to sbuild. Currently we only really report build time and disc space. If you want additional data such as number of cores used, memory/swap usage and other resource usage, I'll be happy to add them to the sbuild summary stats. Actually measuring those might be a bit trickier though, especially on machines running parallel builds.

Regards,
Roger

--
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.
Re: wanna-build / how to sort packages on buildds?
* Ingo Jürgensmann (i...@2011.bluespice.org) [110501 11:55]:

> On Sun, 1 May 2011 01:36:38 +0200, Andreas Barth wrote:
>
>> Now, what I would like to do is to write that down in a central file with categories.
>
> I would recommend using a database, really.

Sorry, but that's not at all the answer to *this* part of the question. This question is "what would a normalized view of the data look like?". (How to store the attributes is another question, but that comes later.)

Andi

Archive: http://lists.debian.org/20110501103636.gm2...@mails.so.argh.org
Re: wanna-build / how to sort packages on buildds?
* Roger Leigh (rle...@codelibre.net) [110501 12:02]:

> I just wanted to add that if you would like more statistics reporting for this purpose, I'll be happy to add that to sbuild.

I only worry about the ~20-40 packages that are currently sitting in some no_auto_build on the buildds. Not more but also not less.

I could easily write a file with

buildd-name: gcc-4.5, gcc-snapshot, gmic, imagemagick, qt4-x11, ghc, # at least 1g
   more packages, # fpu-emulation is too slow

but I consider that too ugly.

> Currently we only really report build time and disc space. If you want additional data such as number of cores used, memory/swap usage and other resource usage, I'll be happy to add them to the sbuild summary stats. Actually measuring those might be a bit trickier though, especially on machines running parallel builds.

Thanks for the offer.

Andi

Archive: http://lists.debian.org/20110501104631.gc15...@mails.so.argh.org
wanna-build / how to sort packages on buildds?
Hi,

I have a problem I need to solve in perl within wanna-build:

Sometimes we have a few packages we don't want to build on certain buildds. Sometimes this is because this package needs lots of ram. Or it takes quite long and would waste the parallel building a machine supports. Or whatever else. Of course a package could be in more than one category.

Now, what I would like to do is to write that down in a central file with categories.

That is, to mark packages as "builds only with more than one gigabyte of ram". And to mark buildds as "has 6 cores, only ... ram" - so that I don't need to copy entries from buildd to buildd, but just say that "new machine is the same class as ...", and that's it.

Now my question is just: How to do that efficiently? I.e. what would such a configuration file look like, and the code to distribute the package to the most fitting buildd(s)? (I.e. it's better to waste 5 out of 6 cores than to not build a package at all, but a package needing at least 1g ram can't build on a buildd with only 512mb - but no package should starve in the end.)

Ideas? Suggestions? Code?

Andi

Archive: http://lists.debian.org/20110430233638.gz15...@mails.so.argh.org
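One possible shape for such a central configuration, sketched in Python rather than Perl. The class names, buildd names, category names, package names and keys are all hypothetical; the same structure could live in a file or in database tables. Hard requirements exclude a buildd, while soft preferences only reorder the candidates, so wasting 5 of 6 cores remains possible when nothing better is available.

```python
# Machine classes: a new buildd only needs a one-line entry in BUILDDS.
CLASSES = {
    "big": {"cores": 6, "ram_mb": 4096},
    "small": {"cores": 1, "ram_mb": 512},
}
BUILDDS = {"alpha": "big", "beta": "big", "gamma": "small"}

# Package categories. "hard" must be met; "soft" is only a preference.
CATEGORIES = {
    "needs-1g": {"hard": {"ram_mb": 1024}},
    "long-serial": {"soft": {"max_cores": 1}},
}
PACKAGES = {
    "bigpkg": ["needs-1g"],
    "slowpkg": ["long-serial"],
    "bigslowpkg": ["needs-1g", "long-serial"],  # more than one category
}


def merged(package):
    """Combine the hard and soft settings of all categories of a package."""
    hard, soft = {}, {}
    for category in PACKAGES.get(package, []):
        hard.update(CATEGORIES[category].get("hard", {}))
        soft.update(CATEGORIES[category].get("soft", {}))
    return hard, soft


def candidates(package):
    """Buildds allowed to build `package`, preferred ones first."""
    hard, soft = merged(package)
    specs = {name: CLASSES[cls] for name, cls in BUILDDS.items()}
    allowed = [n for n, s in specs.items()
               if s["ram_mb"] >= hard.get("ram_mb", 0)]
    max_cores = soft.get("max_cores")
    preferred = [n for n in allowed
                 if max_cores is None or specs[n]["cores"] <= max_cores]
    # Soft preferences only reorder: other allowed machines stay as fallback.
    return preferred + [n for n in allowed if n not in preferred]


for name in ("bigpkg", "slowpkg", "bigslowpkg"):
    print(name, candidates(name))
```

Here `bigslowpkg` would rather run on a single-core machine, but the only single-core class lacks the RAM, so it falls back to the six-core buildds instead of never being built.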