Kirrily Robert wrote:
> We've got a situation where we have a suite of tests for a web app. It
> starts off testing the lib/ and whatnot, but eventually gets to the point
> where it uses Test::WWW::Mechanize to go fetch stuff from the
> developer's sandbox website and do a sanity check on the web application
> itself.
>
> The problem is that all the developer sandbox websites run on one server
> that's groaning under the strain. It's in the process of being replaced
> but we're not there yet. The upshot of this is that on a good day, the
> web tests take ages to run, and on a bad day they time out.
>
> It's got to the point where the developers just kind of mentally tune
> out failures in the web tests, and I'm worried about the "broken window"
> effect.
>
> Any suggestions for how to work around this? All I've got so far is the
> idea of splitting out the web tests into another directory, and treating
> them as "functional tests" that developers would typically run less
> often than the unit tests.
Ahh, the "one test server" problem. Each developer has a perfectly fine and highly overpowered computer sitting on their desk which is relegated to be, essentially, a dumb terminal. Maybe you run ssh into the dev box, a web browser and maybe an editor. What a tragic waste of resources. Instead, each developer's machine should be capable of running a complete copy of the sandbox website. Then the tests fire up the sandbox on the local machine and run against that. No strained single test server to worry about. Individual devs can work isolated from other devs. They can futz around with the sandbox as much as they like without worrying about breaking everybody else. This requires that A) the setup of the web site be automated B) the code not contain all sorts of hard-coded absolute paths C) the dev machines contain the software necessary to run the site A and B themselves have many other benefits outside testing. In general you'll want to move any hard-coded values out of the code and into a config file. Incidentally they also allow a single dev to run multiple sandboxes for multiple branches of code they're working on. C is a little trickier. If the devs are using the same basic OS as the servers then its not so bad. Just install the appropriate packages and go. You can even make your project a package and declare all its dependencies. But if the developers are using Operating System A (just for example, Windows) and the servers are using Operating System B (let's say Linux) then life gets a little tricker, but not impossible. If its just a few hold outs, then they can use the now not-so-heavily loaded central testing machine and everyone else can use their dev machines. If most of your software is platform agnostic (Apache, a SQL database, Perl...) then your devs can install it. You can even go so far as to include all dependent source and the means to automatically build it in your repository. Another route is to go the "lite" software route. Instead of testing with Apache and PostgreSQL, test with HTTP::Server::Simple and SQLite. Easier to install and configure. The downside is you're not testing against your real production environment so something still should test against a staging server. If your software isn't platform agnostic, consider something like VMWare images. At this point I wave my hands like so and throw a ninja flash bomb *POOF!* One major difference is that you're going from a homogeneous testing environment -- one server, one install, one version of the dependent software, one environment -- to a heterogeneous one. Many different environments, versions, operating systems, etc. The homogeneous environment is a seductive one. Its simple and easy to maintain. You don't have to worry about different developers getting different results because they're using different versions of the software. You know that the machine the code was tested on and the production server match because they're built the same way and there's only one to worry about. But it is an inflexible and all-or-nothing approach. For an example let's look the great buggaboo of the homogeneous testing system: upgrades. Let's say you're using Perl 5.6.2. This means EVERYONE is using 5.6.2. Every developer, every system one a single version of Perl. This means everyone is coding for the same bugs, quirks and undocumented features of that particular version of Perl. As long as it all works on that one version nobody is thinking there's anything wrong. 
If your software isn't platform agnostic, consider something like VMWare images. At this point I wave my hands like so and throw a ninja flash bomb *POOF!*

One major difference is that you're going from a homogeneous testing environment -- one server, one install, one version of the dependent software, one environment -- to a heterogeneous one: many different environments, versions, operating systems, etc.

The homogeneous environment is a seductive one. It's simple and easy to maintain. You don't have to worry about different developers getting different results because they're using different versions of the software. You know that the machine the code was tested on and the production server match, because they're built the same way and there's only one to worry about. But it is an inflexible, all-or-nothing approach.

For an example, let's look at the great bugaboo of the homogeneous testing system: upgrades. Let's say you're using Perl 5.6.2. This means EVERYONE is using 5.6.2. Every developer, every system, on a single version of Perl. This means everyone is coding for the same bugs, quirks and undocumented features of that particular version of Perl. As long as it all works on that one version, nobody thinks there's anything wrong. So everyone continues to write code with subtle mistakes that are more and more specific to that version of Perl.

Now you want to upgrade to 5.8.8. With just one test server there's nothing to do but upgrade it and see what happens, affecting everyone at once. With great dread and trepidation the upgrade is done and KERBLAM! Failures everywhere. All that code that was slightly wrong but just happened to work on 5.6 no longer works on 5.8. New warnings, fixes to bugs you were depending on, module upgrades, undocumented features revealed to be bugs and fixed, experimental features gone.

Now what? Your test server is broken. Nobody can test anything. How do you fix the code to do the upgrade without breaking the test server? The answer is you don't. You rapidly downgrade so people can get work done. Then maybe, if you're really dedicated, you come in after work, upgrade the test server, fix as much as you can, then downgrade again before anyone comes in the next day. More likely you just never upgrade. And then you wake up one day to find yourself running Perl 5.5.4, MySQL 3.22 and Apache 1.3, all on a Redhat 7.2 box, deep at the bottom of a steep pit of upgrades.

Another, similar, example is what happens when someone wants to run an experiment. Maybe they want to try a new database, Postgres instead of MySQL. Maybe they want to try Perl compiled differently. Maybe they want to try a different web server. Sorry, can't do it. It would require changing the test server. And anyway your code is so tied to a single environment, a single set of dependencies and a single version of them, that it will be very difficult to code the flexibility back in.

The heterogeneous environment avoids all this. Different developers can freely use different versions of dependent software and different environments. Inflexibility is immediately spotted and destroyed. A dev can experiment on their own box as they like. Individuals are using slightly different versions and incrementally discovering what breaks from version to version, rather than all in one big upgrade leap. The version ball keeps getting moved forward.

The danger is too much flexibility. It's great that your software works on Oracle, MySQL, SQL Server, SQLite, PostgreSQL and DB2, but if it's an in-house app and all you ever use in production is Postgres, then all that extra work might have been a waste. Maintaining portability to two distinct systems, maybe three, is enough.

The other danger is never testing on the same environment as the production server. This is why you need a staging server: a server configured just like the production server, where the software is installed and tested before it moves onto the production server.

That's the Big Upgrade Plan. There are all sorts of social and technical things to overcome. Meanwhile, here's a cheap hack: run the full test suite only on commit. Store and display the results with something like Test::TAP::HTMLMatrix. Here's an example: http://smoke.pugscode.org/

This is not ideal, but each commit is tested, the results are saved and displayed, and a new failure can easily be traced back to the commit and the developer who made it so they can fix it immediately.
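For what it's worth, the hook end of that cheap hack can be quite small. Here's a rough sketch assuming a Subversion repository (the scratch and results paths are invented for illustration): it exports the committed revision, runs prove over it, and drops the output somewhere a web page -- or something like Test::TAP::HTMLMatrix -- can pick it up.

    #!/usr/bin/perl
    # Subversion post-commit hook: svn passes the repository path and the
    # revision number as arguments.
    use strict;
    use warnings;

    my($repos, $rev) = @ARGV;

    # Hooks run with an empty environment, so set a PATH explicitly.
    $ENV{PATH} = "/usr/local/bin:/usr/bin:/bin";

    my $work = "/var/tmp/smoke/r$rev";      # hypothetical scratch directory
    my $out  = "/var/www/smoke/r$rev.txt";  # hypothetical results location

    # Export exactly the revision that was committed so a failure can be
    # pinned on it later.
    system("svn", "export", "-q", "-r", $rev, "file://$repos", $work) == 0
        or die "svn export of r$rev failed\n";

    # Run the whole suite, web tests included, and keep the output.
    chdir $work or die "can't chdir to $work: $!\n";
    system("prove -r t/ > $out 2>&1");

    # Don't die if the tests fail -- the point is that the result is
    # recorded and visible, not that the commit is rejected.

In practice you'd probably have the hook just record the revision and let a separate smoke box do the actual run, so commits don't block on the slow web tests.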