Re: wanna-build / how to sort packages on buildds?

2011-05-02 Thread Goswin von Brederlow
Ingo Jürgensmann i...@2011.bluespice.org writes:

 On Sun, 1 May 2011 01:36:38 +0200, Andreas Barth wrote:

 Sometimes we have a few packages we don't want to build on a certain
 buildds. Sometimes this is because this package needs lots of ram. Or
 it takes quite long and would waste the parallel building a machine
 supports. Or whatever else. Of course a package could be in more than
 one category.

 Yes, you're facing basically the same problem I tried to address in
 2000/2001 when doing my renderserver and later for what Multibuild was
 intended to do as well. ;-)

 Now, what I would like to do is to write that down in a central file
 with categories.

 I would recommend to use a database, really.

 That is, to mark packages as builds only with more than one gigabyte
 of ram. And to mark buildds as has 6 cores, only ... ram - so
 that I don't need to copy entries from buildd to buildd, but just say
 that new machine is the same class as ..., and that's it.

 Another category would be fast disk/raid. There are some packages
 with lots of disk accesses. When you can schedule those packages to a
 buildd that has faster disk access like in having multiple spindles
 for faster seeks, you can minimize build times as well. We faced that
 problem on m68k particularly on IDE vs SCSI disks on Amigas, as IDE
 was dog slow. Another example there was the faster disks on Amigas vs
 slower SCSI disks in Apple machines.

 Now my question is just: How to do that efficient? I.e. how would such
 a configuration file look like, and how the code to distribute the
 package on the most fitting buildd(s)? (I.e. it's better to waste 5
 out of 6 cores than to not build a package at all, but a package
 needing at least 1g ram can't build on a buildd with only 512mb - but
 no package should starve in the end.)
 Ideas? Suggestions? Code?

 Look at my update-buildd.net from Buildd.net, which I used to collect
 data from the buildds such as RAM, kernel, uptime, used swap and such
 (http://buildd.net/cgi/hostpackages.cgi?unstable_arch=m68ksearchtype=arrakis).
 I store this information into the database and also the build times of
 the packages. With this dataset it should be possible to have the
 wanna-buildd schedule packages in such a way to minimize the build
 times because you can decide which buildd is the most suitable buildd
 for the next package.

I think different groups of factors have to be considered:

1) absolute requirements

I think there are only 2 absolute requirements:
  - ram size
  - disk size
And all buildds currently have enough disk space I think.

In the past we also had some sources that would crash one buildd but not
the other. No way to track that ahead of time though. But it should be
possible to report this to wanna-build.

Absolute requirements are absolute. If a buildd doesn't meet a
requirement then wanna-build must never schedule the package to build
there. (Note: The buildd will just give it back with the current setup
so no biggie if wanna-build gets it wrong.)
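
A minimal sketch of such a hard filter, in Python for illustration
only (wanna-build itself is Perl). The Buildd and Source records, their
field names and the sample values are all hypothetical; nothing here
reflects wanna-build's real data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Buildd:
    # Hypothetical record: resources a buildd offers.
    name: str
    ram_mb: int
    disk_gb: int


@dataclass(frozen=True)
class Source:
    # Hypothetical record: absolute minimums a source needs (0 = no limit).
    name: str
    min_ram_mb: int = 0
    min_disk_gb: int = 0


def eligible_buildds(source, buildds):
    """Return only the buildds that meet every absolute requirement."""
    return [
        b for b in buildds
        if b.ram_mb >= source.min_ram_mb and b.disk_gb >= source.min_disk_gb
    ]


# Invented sample data.
buildds = [Buildd("small", 512, 40), Buildd("big", 4096, 200)]
heavy = Source("heavy-package", min_ram_mb=1024)

print([b.name for b in eligible_buildds(heavy, buildds)])
```

Anything that survives this filter is a candidate; preferences such as
core count or measured speed would only rank the survivors.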

2) important features

The most relevant feature I think is multiple cores and support of
DEB_BUILD_OPTIONS=parallel=x. This would be an attribute of both the
buildd and the source and one should try to match them: build sources
which support parallel building preferably on systems with multiple
cores.

The I/O speed and a source's need for it could be another such
feature. But I'm not sure (other than the m68k special case) this is
relevant to such a degree that it makes sense to track it
specifically.

Important features would be anything we can figure out and point to as
having a major influence on the build speed. And imho this should be
like N times faster to warrant the effort to track this for sources.

3) general performance

Buildds are different and build times will differ accordingly. I don't
think this can be properly quantified ahead of time and there are many
hidden factors interacting that would be impossible to quantify with
reasonable effort. But I think this can be measured and extrapolated
just fine. Keep a database of build times and do some statistical
analysis to rate the buildd speed in general and for specific
sources. With that you have a good approximation of the time a source
will need to build on each buildd. Use that as a weight when deciding
where to build a source.
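
One way this could be sketched, in Python for illustration only. The
history tuples are invented sample data, and the method (a per-buildd
slowness factor taken as a median of ratios) is just one simple
assumption about what "statistical analysis" might mean here, not an
existing wanna-build feature.

```python
from collections import defaultdict
from statistics import median

# (source, buildd, wall-clock seconds); invented sample data.
history = [
    ("foo", "fast", 100), ("foo", "slow", 300),
    ("bar", "fast", 50), ("bar", "slow", 150),
    ("baz", "fast", 200),
]


def buildd_factors(history):
    """Relative slowness of each buildd (1.0 = typical machine)."""
    times = defaultdict(dict)  # source -> buildd -> list of seconds
    for source, buildd, secs in history:
        times[source].setdefault(buildd, []).append(secs)
    ratios = defaultdict(list)
    for per_buildd in times.values():
        if len(per_buildd) < 2:
            continue  # one buildd only: says nothing about relative speed
        typical = {b: median(t) for b, t in per_buildd.items()}
        baseline = median(typical.values())
        for buildd, secs in typical.items():
            ratios[buildd].append(secs / baseline)
    return {b: median(r) for b, r in ratios.items()}


def estimate(source, buildd, history):
    """Estimated build time in seconds, or None without enough data."""
    factors = buildd_factors(history)
    # Scale each past build of this source to a "factor 1.0" machine.
    normalised = [secs / factors[b] for s, b, secs in history
                  if s == source and b in factors]
    if not normalised or buildd not in factors:
        return None
    return median(normalised) * factors[buildd]


print(estimate("baz", "slow", history))
```

Here "baz" has only ever been built on the fast machine, yet the
factors learned from "foo" and "bar" allow a guess for the slow one.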

Unlike items in 2), which would have to be manually tracked, this would
encompass any and all factors including unknown ones in
approximation. Some care would have to be taken that factors aren't
weighted twice, once from 2) and once here.

The build times for a parallel building source will differ greatly for
single and multi core systems. The difference in weight this produces
might already be sufficient so that those sources prefer the multi core
systems (after a few versions). So tracking important features manually
might be wasted effort altogether.



My suggestion would be to implement something for 1) and 3) and see how
that goes. 

Re: wanna-build / how to sort packages on buildds?

2011-05-02 Thread Goswin von Brederlow
Roger Leigh rle...@codelibre.net writes:

 I just wanted to add that if you would like more statistics reporting
 for this purpose, I'll be happy to add that to sbuild.  Currently we
 only really report build time and disc space.  If you want additional
 data such as number of cores used, memory/swap usage and other resource
 usage, I'll be happy to add them to the sbuild summary stats.  Actually
 measuring those might be a bit trickier though, especially on machines
 running parallel builds.

Reporting the cumulative CPU time used and the wall clock time should be
helpful in detecting parallel builds. The closer the CPU time gets to
wall clock time * number of cores, the more it pays to build the package
on multi core systems.
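
As a rough illustration in Python: the ratio below is simply CPU time
divided by (wall clock time * cores). The function name and the sample
numbers are made up, and sbuild does not report these values today, as
noted above.

```python
def parallel_efficiency(cpu_seconds, wall_seconds, cores):
    """Fraction of the available core time that the build actually used.

    Close to 1.0: the build kept all cores busy.
    Close to 1/cores: the build was effectively serial.
    """
    if wall_seconds <= 0 or cores <= 0:
        raise ValueError("wall_seconds and cores must be positive")
    return cpu_seconds / (wall_seconds * cores)


# Invented figures for a 4-core buildd.
print(parallel_efficiency(cpu_seconds=380, wall_seconds=100, cores=4))
print(parallel_efficiency(cpu_seconds=100, wall_seconds=100, cores=4))
```

The figures have to be per build. On a machine running several builds
at once, system-wide CPU counters would mix the builds together.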

MfG
Goswin


-- 
To UNSUBSCRIBE, email to debian-devel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/87vcxtuj8b.fsf@frosties.localnet



Re: wanna-build / how to sort packages on buildds?

2011-05-02 Thread Scott Kitterman
On Saturday, April 30, 2011 07:36:38 PM Andreas Barth wrote:
 Hi,
 
 I have a problem I need to solve in perl within wanna-build:
 
 Sometimes we have a few packages we don't want to build on a certain
 buildds. Sometimes this is because this package needs lots of ram. Or
 it takes quite long and would waste the parallel building a machine
 supports. Or whatever else. Of course a package could be in more than
 one category.
 
 Now, what I would like to do is to write that down in a central file
 with categories.
 
 That is, to mark packages as builds only with more than one gigabyte
 of ram. And to mark buildds as has 6 cores, only ... ram - so
 that I don't need to copy entries from buildd to buildd, but just say
 that new machine is the same class as ..., and that's it.
 
 Now my question is just: How to do that efficient? I.e. how would such
 a configuration file look like, and how the code to distribute the
 package on the most fitting buildd(s)? (I.e. it's better to waste 5
 out of 6 cores than to not build a package at all, but a package
 needing at least 1g ram can't build on a buildd with only 512mb - but
 no package should starve in the end.)
 
 Ideas? Suggestions? Code?

If one could do something like:

wb gb libieee1284 mod-wsgi nflog-bindings zinnia . ia64 . !caballero

that would be a HUGE win.  My suggestion would be to start with something 
simple and declarative like that and then build the back end to automatically 
sort out the list of candidate buildd's after that.

Scott K


-- 
Archive: http://lists.debian.org/201105021332.04065.deb...@kitterman.com



Re: wanna-build / how to sort packages on buildds?

2011-05-02 Thread Andreas Barth
* Scott Kitterman (deb...@kitterman.com) [110502 19:32]:
 If one could do something like:
 
 wb gb libieee1284 mod-wsgi nflog-bindings zinnia . ia64 . !caballero

good idea. I'll consider how to do that.


Andi


-- 
Archive: http://lists.debian.org/20110502173620.gp15...@mails.so.argh.org



Re: wanna-build / how to sort packages on buildds?

2011-05-01 Thread Tollef Fog Heen
]] Andreas Barth 

Hi,

| Now my question is just: How to do that efficient? I.e. how would such
| a configuration file look like, and how the code to distribute the
| package on the most fitting buildd(s)? (I.e. it's better to waste 5
| out of 6 cores than to not build a package at all, but a package
| needing at least 1g ram can't build on a buildd with only 512mb - but
| no package should starve in the end.)
| 
| Ideas? Suggestions? Code?

Sounds like a variant of the knapsack problem.

I'd suggest something like:

- Have a mapping for buildds from resources to a value (this can just be
  a perl hash), this defines cores, amount of memory, etc.

- Each package has a minimum requirement for cpu, memory, etc, stored in a
  hash.  Store all the packages in a list.

- Sort the list, either according to a score which is a mix of cpu and
  memory and whatever other factors you want or first along the cpu
  axis, then along the memory axis, etc.  I suspect CPU and memory
  requirements are correlated, but not perfectly.

- Assign packages to buildds on a first-match basis.  That means you get
  the hardest packages done first.  The match has to make sure the
  buildd can actually build the package in question, of course.
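
The steps above can be sketched as follows, in Python rather than Perl
for illustration. All names and numbers are invented, the score
weighting is arbitrary, and two things are assumptions beyond the
description: each buildd takes one package at a time, and buildds are
tried smallest first so that large machines stay free for packages
that need them.

```python
RESOURCES = ("cores", "ram_mb")

# Hypothetical buildd resources and package minimum requirements.
buildds = {
    "small": {"cores": 1, "ram_mb": 512},
    "medium": {"cores": 2, "ram_mb": 2048},
    "big": {"cores": 6, "ram_mb": 8192},
}
packages = [
    {"name": "tiny-tool", "cores": 1, "ram_mb": 128},
    {"name": "huge-compiler", "cores": 4, "ram_mb": 1024},
    {"name": "image-lib", "cores": 1, "ram_mb": 1024},
]


def score(entry):
    """Mix of CPU and memory; the 1024 weight is an arbitrary choice."""
    return entry["cores"] * 1024 + entry["ram_mb"]


def assign(packages, buildds):
    """Hardest package first, each to the first free buildd that fits."""
    free = sorted(buildds, key=lambda name: score(buildds[name]))
    assignment = {}
    for pkg in sorted(packages, key=score, reverse=True):
        for name in free:
            if all(buildds[name][r] >= pkg[r] for r in RESOURCES):
                assignment[pkg["name"]] = name
                free.remove(name)  # safe: we leave the loop right away
                break
    return assignment


print(assign(packages, buildds))
```

Packages that find no free buildd are simply left out of the result
and would wait for the next round.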

Regards,
-- 
Tollef Fog Heen
UNIX is user friendly, it's just picky about who its friends are


-- 
Archive: http://lists.debian.org/87d3k2c49x@qurzaw.varnish-software.com



Re: wanna-build / how to sort packages on buildds?

2011-05-01 Thread Ingo Jürgensmann

On Sun, 1 May 2011 01:36:38 +0200, Andreas Barth wrote:


Sometimes we have a few packages we don't want to build on a certain
buildds. Sometimes this is because this package needs lots of ram. Or
it takes quite long and would waste the parallel building a machine
supports. Or whatever else. Of course a package could be in more than
one category.


Yes, you're facing basically the same problem I tried to address in 
2000/2001 when doing my renderserver and later for what Multibuild was 
intended to do as well. ;-)



Now, what I would like to do is to write that down in a central file
with categories.


I would recommend to use a database, really.


That is, to mark packages as builds only with more than one gigabyte
of ram. And to mark buildds as has 6 cores, only ... ram - so
that I don't need to copy entries from buildd to buildd, but just say
that new machine is the same class as ..., and that's it.


Another category would be fast disk/raid. There are some packages 
with lots of disk accesses. When you can schedule those packages to a 
buildd that has faster disk access like in having multiple spindles for 
faster seeks, you can minimize build times as well. We faced that 
problem on m68k particularly on IDE vs SCSI disks on Amigas, as IDE was 
dog slow. Another example there was the faster disks on Amigas vs slower 
SCSI disks in Apple machines.


Now my question is just: How to do that efficient? I.e. how would such
a configuration file look like, and how the code to distribute the
package on the most fitting buildd(s)? (I.e. it's better to waste 5
out of 6 cores than to not build a package at all, but a package
needing at least 1g ram can't build on a buildd with only 512mb - but
no package should starve in the end.)
Ideas? Suggestions? Code?


Look at my update-buildd.net from Buildd.net, which I used to collect 
data from the buildds such as RAM, kernel, uptime, used swap and such 
(http://buildd.net/cgi/hostpackages.cgi?unstable_arch=m68ksearchtype=arrakis). 
I store this information into the database and also the build times of 
the packages. With this dataset it should be possible to have the 
wanna-buildd schedule packages in such a way to minimize the build times 
because you can decide which buildd is the most suitable buildd for the 
next package.


--
Ciao...//  Fon: 0381-2744150
  Ingo   \X/   http://blog.windfluechter.net

gpg pubkey: http://www.juergensmann.de/ij_public_key.


--
Archive: 
http://lists.debian.org/f423f24d7f17a3abe30510a870357...@muaddib.hro.localnet



Re: wanna-build / how to sort packages on buildds?

2011-05-01 Thread Roger Leigh
On Sun, May 01, 2011 at 01:36:38AM +0200, Andreas Barth wrote:
 I have a problem I need to solve in perl within wanna-build:
 
 Sometimes we have a few packages we don't want to build on a certain
 buildds. Sometimes this is because this package needs lots of ram. Or
 it takes quite long and would waste the parallel building a machine
 supports. Or whatever else. Of course a package could be in more than
 one category.
 
 Now, what I would like to do is to write that down in a central file
 with categories.

I would have to echo the sentiment that storing this information in
the database is probably a better idea.

I just wanted to add that if you would like more statistics reporting
for this purpose, I'll be happy to add that to sbuild.  Currently we
only really report build time and disc space.  If you want additional
data such as number of cores used, memory/swap usage and other resource
usage, I'll be happy to add them to the sbuild summary stats.  Actually
measuring those might be a bit trickier though, especially on machines
running parallel builds.


Regards,
Roger

-- 
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?   http://gutenprint.sourceforge.net/
   `-GPG Public Key: 0x25BFB848   Please GPG sign your mail.




Re: wanna-build / how to sort packages on buildds?

2011-05-01 Thread Andreas Barth
* Ingo Jürgensmann (i...@2011.bluespice.org) [110501 11:55]:
 On Sun, 1 May 2011 01:36:38 +0200, Andreas Barth wrote:

 Now, what I would like to do is to write that down in a central file
 with categories.

 I would recommend to use a database, really.

Sorry, but that's not at all the answer to *this* part of the
question. This question is "what would a normalized view of the data
look like?". (How to store attributes is another question, but
that's for later.)


Andi


-- 
Archive: http://lists.debian.org/20110501103636.gm2...@mails.so.argh.org



Re: wanna-build / how to sort packages on buildds?

2011-05-01 Thread Andreas Barth
* Roger Leigh (rle...@codelibre.net) [110501 12:02]:
 I just wanted to add that if you would like more statistics reporting
 for this purpose, I'll be happy to add that to sbuild.

I only worry about the ~20-40 packages that are currently sitting in
some no_auto_build on the buildds. Not more but also not less.

I could easily write a file with

  buildd-name: gcc-4.5, gcc-snapshot, gmic, imagemagick,
  qt4-x11, ghc, # at least 1g
  more packages, # fpu-emulation is too slow

but I consider that too ugly.


 Currently we
 only really report build time and disc space.  If you want additional
 data such as number of cores used, memory/swap usage and other resource
 usage, I'll be happy to add them to the sbuild summary stats.  Actually
 measuring those might be a bit trickier though, especially on machines
 running parallel builds.

Thanks for the offer.



Andi


-- 
Archive: http://lists.debian.org/20110501104631.gc15...@mails.so.argh.org



wanna-build / how to sort packages on buildds?

2011-04-30 Thread Andreas Barth
Hi,

I have a problem I need to solve in perl within wanna-build:

Sometimes we have a few packages we don't want to build on a certain
buildds. Sometimes this is because this package needs lots of ram. Or
it takes quite long and would waste the parallel building a machine
supports. Or whatever else. Of course a package could be in more than
one category.

Now, what I would like to do is to write that down in a central file
with categories.

That is, to mark packages as builds only with more than one gigabyte
of ram. And to mark buildds as has 6 cores, only ... ram - so
that I don't need to copy entries from buildd to buildd, but just say
that new machine is the same class as ..., and that's it.

Now my question is just: How to do that efficient? I.e. how would such
a configuration file look like, and how the code to distribute the
package on the most fitting buildd(s)? (I.e. it's better to waste 5
out of 6 cores than to not build a package at all, but a package
needing at least 1g ram can't build on a buildd with only 512mb - but
no package should starve in the end.)
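
For illustration only, one possible shape for such a configuration,
written as Python data rather than Perl. Every class, buildd and
package name is invented, and the keys ("ram_mb", "cores",
"min_ram_mb", "parallel") are assumptions. RAM is treated as a hard
limit, while wasted cores only affect the ordering.

```python
# Machine classes: a new buildd only needs to name its class.
CLASSES = {
    "small": {"ram_mb": 512, "cores": 1},
    "big": {"ram_mb": 4096, "cores": 6},
}
BUILDDS = {"buildd-a": "small", "buildd-b": "big", "buildd-c": "big"}

# Only packages with special needs are listed; the rest use DEFAULTS.
PACKAGES = {
    "needs-ram": {"min_ram_mb": 1024},
    "long-serial": {"parallel": False},
}
DEFAULTS = {"min_ram_mb": 0, "parallel": True}


def candidates(package, idle):
    """Idle buildds able to build the package, best fit first."""
    req = {**DEFAULTS, **PACKAGES.get(package, {})}
    eligible = [b for b in idle
                if CLASSES[BUILDDS[b]]["ram_mb"] >= req["min_ram_mb"]]

    def wasted_cores(buildd):
        cores = CLASSES[BUILDDS[buildd]]["cores"]
        return 0 if req["parallel"] else cores - 1

    # Soft preference only: a wasteful buildd is still a candidate.
    return sorted(eligible, key=lambda b: (wasted_cores(b), b))


everyone = list(BUILDDS)
print(candidates("needs-ram", everyone))
print(candidates("long-serial", everyone))
print(candidates("long-serial", ["buildd-b"]))
```

The last call shows the "waste 5 out of 6 cores" case: when only a big
machine is idle, the serial package still gets it instead of waiting.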

Ideas? Suggestions? Code?



Andi


-- 
Archive: http://lists.debian.org/20110430233638.gz15...@mails.so.argh.org