I don't mean to argue with you about this, but I don't really find these points convincing. After all, we manage to make it work even though I'm pretty sure we've got way more code than PLT Scheme. (To quote Kevin Bourrillion, Google's Java library curator: "If code were ice cream, we would have ... a whole lot of ice cream.")
I think what you're really saying is that it's impractical to test all of PLT Scheme with one central testing service that tries to do the whole thing at once. I can definitely believe that. I think the key is making the process more modular: a continuous build with "all of PLT Scheme" as its target is orders of magnitude too big to be very useful, IMO.

One way to chop things up would be to have separate builds for different components: a mzscheme build, a mred build, a drscheme build (or maybe break DrScheme down further into smaller pieces), and potentially several other builds for different collects. Each sub-build runs the test suites appropriate to it and measures code coverage only in the files it cares about (there's a rough sketch of what I mean in the P.S. below). In practice, even though a test suite written to cover, say, a mred component might incidentally exercise some mzlib code, that coverage isn't very high-quality and probably shouldn't count as mzlib tests anyway. Sub-builds can run independently on different machines, set up by individual groups. When I was working on planet, for instance, I could have set up my own sub-project just for it and had the system run on my local machine. (I do effectively the same thing on my current project.)

One way to think about it: suppose you're monitoring the output of the tests and you get a message saying some tests have failed. Do you care? If you don't, you need to think about making better targets, and only monitor the targets for which you can unhesitatingly say yes. This will incidentally make it a lot easier for smaller projects to get up and running.

Coverage really is an important metric for test suites; without it, you get the warm fuzzies of seeing that all the tests passed, but no sense of how much assurance you can actually derive from them. It is worth investing effort into measuring it.

-jacob

(Part of my current job is to advocate good testing practices and help other teams set up good testing infrastructure for their projects, so if I come off like an evangelist, that's why. :) )

On Fri, May 22, 2009 at 10:08 AM, Eli Barzilay <e...@barzilay.org> wrote:
> On May 22, Jacob Matthews wrote:
>> On Fri, May 22, 2009 at 10:00 AM, Eli Barzilay <e...@barzilay.org> wrote:
>> > On May 22, John Clements wrote:
>> >>
>> >> Well, if you're volunteering, what I'd really like is a way to do
>> >> coverage testing across multiple files; the current green/red
>> >> mechanism doesn't scale well.
>> >
>> > In any case, measuring coverage for the tests is not practical ATM.
>>
>> Out of curiosity: Why not?
>
> (a) errortrace is adding a good runtime factor -- and the tests take a
>     considerable time (compiling in errortrace mode can work too, but
>     even that is horribly expensive)
>
> (b) There's a *lot* of code, so keeping track of all expressions will
>     be a problem
>
> (c) code is executed one test suite at a time, so it will require
>     running it, collecting the results, etc, then combining them all.
>
> --
>           ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
>                     http://www.barzilay.org/                 Maze is Life!
>
> _________________________________________________
>   For list-related administrative tasks:
>   http://list.cs.brown.edu/mailman/listinfo/plt-dev
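
P.S. To make the sub-build idea a little more concrete, here's the kind of glue I have in mind, as plain #lang scheme. It's only a sketch: run-suite-with-coverage is a made-up placeholder for whatever actually instruments and runs one test file (errortrace or otherwise; I'm not addressing Eli's point (a) about its cost here). The part I care about is that each sub-build filters the raw coverage data down to its own directory and reduces it to a single number it can track over time.

#lang scheme

;; Rough sketch of a per-component coverage roll-up.  The interesting
;; part is the bookkeeping, not the instrumentation:
;; run-suite-with-coverage is a made-up stand-in for "run one test file
;; under some instrumentation and report, for each instrumented
;; expression, which source file it lives in and whether it was hit".

;; run-suite-with-coverage : path -> (listof (cons path boolean))
;; Hypothetical hook -- not an existing API.
(define (run-suite-with-coverage test-file)
  (error 'run-suite-with-coverage
         "stand-in: hook this up to real instrumentation"))

;; in-component? : path path -> boolean
;; Is `file' somewhere under the component's source directory?
(define (in-component? component-dir file)
  (and (regexp-match (regexp-quote (path->string component-dir))
                     (path->string file))
       #t))

;; component-coverage : path (listof path) -> number
;; Run each of the component's test suites, keep only the entries for
;; files inside the component, and reduce everything to one fraction.
(define (component-coverage component-dir test-files)
  (define total 0)
  (define hits 0)
  (for* ([test-file (in-list test-files)]
         [entry (in-list (run-suite-with-coverage test-file))])
    (when (in-component? component-dir (car entry))
      (set! total (add1 total))
      (when (cdr entry)
        (set! hits (add1 hits)))))
  (if (zero? total) 0 (exact->inexact (/ hits total))))

Point (c) then becomes a matter of each sub-build publishing its number (or the raw per-file data) somewhere a dashboard can pick it up, rather than one run having to see everything at once.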