[Zope-dev] Re: New test runner work

2005-08-24 Thread Jim Fulton

Stuart Bishop wrote:

Jim Fulton wrote:


A large proportion of our tests use a relational database. Some of them want
an empty database, some of them want just the schema created but no data,
some of them want the schema created and the data. Some of them need the
component architecture, and some of them don't. Some of them need one or
more Twisted servers running, some of them don't.

Note that we mix and match. We have 4 different types of database fixture
(none, empty, schema, populated), 2 different types of database connection
mechanisms (psycopgda, psycopg), 2 types of CA fixture (none, loaded), and
(currently) 4 states of external daemons needed. If we were to arrange this
in layers, it would take 56 different layers, and this will double every
time we add a new daemon, or add more database templates (e.g. fat for lots
of sample data to go with the existing thin).

As a way of supporting this better, instead of specifying a layer a test
could specify the list of resources it needs:

import unittest
import testresources as r

class FooTest(unittest.TestCase):
    resources = [r.LaunchpadDb, r.Librarian, r.Component]
    [...]

class BarTest(unittest.TestCase):
    resources = [r.EmptyDb]

class BazTest(unittest.TestCase):
    resources = [r.LaunchpadDb, r.Librarian]


This is pretty much how layers work.  Layers can be arranged in
a DAG (much like a traditional multiple-inheritance class graph).
So, you can model each resource as a layer and specific combinations
of resources as layers.  The test runner will attempt to run the layers
in an order that minimizes set-up and tear-down of layers.



So my example could be modeled using layers like:

import unittest
import layers as l

class FooLayer(l.LaunchpadDb, l.Librarian, l.Component): pass
class FooTest(unittest.TestCase):
    layer = 'FooLayer'
    [...]

class BarLayer(l.LaunchpadDb, l.Librarian, l.Component): pass
class BarTest(unittest.TestCase):
    layer = 'BarLayer'
    [...]

class BazLayer(l.LaunchpadDb, l.Librarian): pass
class BazTest(unittest.TestCase):
    layer = 'BazLayer'
    [...]

In general I would need to define a layer for each test case (because the
number of combinations makes it impractical to explode all the possible
combinations into a tree of layers, if for no other reason than naming them).


That's too bad. Perhaps layers don't fit your need then.


If I tell the test runner to run all the tests, will the LaunchpadDb,
Librarian and Component layers each be initialized just once?


If all of the tests means these 3, then yes.


If I tell the test runner to run the Librarian layer tests, will all three
tests be run?


No, no tests will be run.  None of the tests are in the librarian layer.
They are in layers built on the librarian layer.


What happens if I go and define a new test:

class LibTest(unittest.TestCase):
    layer = 'l.Librarian'
    [...]

If I run all the tests, will the Librarian setup/teardown be run once (by
running the tests in the order LibTest, BazTest, FooTest, BarTest and
initializing the Librarian layer before the LaunchpadDb layer)?


Yes

I expect not, as 'layer' indicates a hierarchy which isn't as useful to me as a set
of resources.


I don't follow this.
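Jim's "Yes" to the LibTest question can be illustrated with a toy simulation. This is a sketch, not zope.testing's actual algorithm: the layer classes mirror Stuart's example, while `setup_order` and `run` are hypothetical helpers. Each layer is set up together with everything it is built on (bases first, taken from the reversed `__mro__`), and a layer stays up for as long as the following tests still need it.

```python
from collections import Counter

# Hypothetical layer classes mirroring the example above; real layers
# would also carry setUp/tearDown class methods.
class Librarian: pass
class LaunchpadDb: pass
class Component: pass
class FooLayer(LaunchpadDb, Librarian, Component): pass
class BarLayer(LaunchpadDb, Librarian, Component): pass
class BazLayer(LaunchpadDb, Librarian): pass


def setup_order(layer):
    """The layer plus everything it is built on, bases first.

    __mro__ lists every class before its bases, so reversing it puts
    each base ahead of the layers built on it.
    """
    return [cls for cls in reversed(layer.__mro__) if cls is not object]


def run(tests):
    """Simulate a run of (test name, layer) pairs.

    Returns the order the tests ran in and how often each layer was set up.
    """
    # Simple heuristic: layers with fewer ingredients run first, so the
    # shared bases stay up while later layers add to them.
    ordered = sorted(tests, key=lambda t: (len(setup_order(t[1])), t[1].__name__))
    active = []          # layers currently set up, in set-up order
    setups = Counter()
    for _name, layer in ordered:
        needed = setup_order(layer)
        # Tear down (here: just forget) layers this test does not need.
        active = [cls for cls in active if cls in needed]
        # Set up whatever is still missing, bases first.
        for cls in needed:
            if cls not in active:
                active.append(cls)
                setups[cls.__name__] += 1
    return [name for name, _ in ordered], setups


order, setups = run([
    ("FooTest", FooLayer),
    ("BarTest", BarLayer),
    ("BazTest", BazLayer),
    ("LibTest", Librarian),
])
print(order)                                       # ['LibTest', 'BazTest', 'BarTest', 'FooTest']
print(setups["Librarian"], setups["LaunchpadDb"])  # 1 1
```

FooTest and BarTest use equivalent layers, so their relative order is arbitrary here; what matters is that Librarian and LaunchpadDb are each set up only once.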


If layers don't work this way, it might be possible to emulate resources
somehow:


If each test *really* has a unique set of resources, then perhaps
layers don't fit.


class ResourceTest(unittest.TestCase):
    @property
    def layer(self):
        return type(optimize_order(self.resources))

However, optimize_order would need to know about all the other tests, so it
would really be the responsibility of the test runner (so it would need to be
customized/overridden), and the test runner would need to support the layer
attribute possibly being a class rather than a string.


Layers can be classes.   In fact, I typically use classes with class
methods for setUp and tearDown.
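Since layers can be classes, one way to avoid hand-naming a layer for every combination (the naming concern raised earlier) is to build the combined layer class on the fly, much as the ResourceTest idea suggests. A minimal sketch, assuming the resources are independent layer classes; `layer_for` is a hypothetical helper that caches one class per distinct set of resources, so tests with the same resources share a layer and the runner can group them.

```python
import unittest
from functools import lru_cache

# Hypothetical, independent resource layers (none is a base of another).
class LaunchpadDb: pass
class Librarian: pass
class Component: pass


def layer_for(*resources):
    """Return the shared layer class for a set of resources (sketch)."""
    return _layer_for(frozenset(resources))


@lru_cache(maxsize=None)
def _layer_for(resources):
    # Sort by name so equal sets always get the same bases and name.
    bases = tuple(sorted(resources, key=lambda cls: cls.__name__))
    name = "".join(cls.__name__ for cls in bases) + "Layer"
    return type(name, bases, {})


class FooTest(unittest.TestCase):
    layer = layer_for(LaunchpadDb, Librarian, Component)


class BazTest(unittest.TestCase):
    layer = layer_for(LaunchpadDb, Librarian)


print(FooTest.layer.__name__)                                   # ComponentLaunchpadDbLibrarianLayer
print(layer_for(Component, Librarian, LaunchpadDb) is FooTest.layer)  # True
```

If one resource were a base of another in the same set, `type()` could raise a TypeError, because the sorted bases might not admit a consistent MRO; the sketch assumes that does not happen.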




Ah, so the layer specifies additional per-test setUp and tearDown
that is used in addition to the test's own setUp and tearDown.  This
sounds reasonable.



But what to call them? setUpPerTest? The pretest and posttest names I used
are a bit sucky.


*shrug* testSetUp?
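A sketch of what such a layer might look like, assuming class-method hooks as Jim describes and the testSetUp/testTearDown names suggested above. The `run_layer` driver is a hypothetical stand-in for the test runner: it sets the layer up once, wraps every test in the per-test hooks, and tears the layer down at the end. The per-test hooks run around the test's own setUp and tearDown, not instead of them.

```python
import unittest


class DatabaseLayer:
    """Hypothetical layer: expensive work per layer, cheap work per test."""

    log = []

    @classmethod
    def setUp(cls):
        cls.log.append("setUp")         # e.g. create the database from a dump

    @classmethod
    def tearDown(cls):
        cls.log.append("tearDown")      # e.g. drop the database

    @classmethod
    def testSetUp(cls):
        cls.log.append("testSetUp")     # e.g. reset the connection pool

    @classmethod
    def testTearDown(cls):
        cls.log.append("testTearDown")  # e.g. issue a rollback


class FooTest(unittest.TestCase):
    layer = DatabaseLayer

    def test_one(self):
        pass

    def test_two(self):
        pass


def run_layer(layer, suite):
    """Hypothetical driver: layer hooks wrap each test's own setUp/tearDown."""
    result = unittest.TestResult()
    layer.setUp()
    try:
        for test in suite:
            layer.testSetUp()
            try:
                test(result)
            finally:
                layer.testTearDown()
    finally:
        layer.tearDown()
    return result


suite = unittest.defaultTestLoader.loadTestsFromTestCase(FooTest)
result = run_layer(FooTest.layer, suite)
print(result.testsRun, DatabaseLayer.log)
# 2 ['setUp', 'testSetUp', 'testTearDown', 'testSetUp', 'testTearDown', 'tearDown']
```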




On another note, enforcing isolation of tests has been a continuous problem
for us. For example, a developer registering a utility or otherwise mucking
around with the global environment and forgetting to reset things in
tearDown. This goes unnoticed for a while, and other tests get written that
actually depend on this corruption. But at some point, the order the tests
are run changes for some reason and suddenly test 500 starts failing. It
turns out the global state has been screwed, and you have the fun task of
tracking down which of the preceding 499 tests screwed it. I think this is
a use case for some sort of global posttest hook.


How so?



In order to diagnose the problem I describe (which has happened far too
often!), you would add a posttest check that is run after each test. The
first test that fails due to this check is the 

[Zope-dev] Re: New test runner work

2005-08-23 Thread Stuart Bishop
Jim Fulton wrote:

 I'll note that I'm working on a newer test runner that I hope to use
 in Zope 2.9 and 3.2.  The new test runner is a nearly complete rewrite to
 provide:
 
 - A more flexible test runner that can be used for a variety of projects.
   The current test runner has been forked for ZODB, Zope 3, and Zope 2.
   That's why the Zope 3 version has features that are lacking in the Zope 2
   version.
 
 - Support for layers of tests, so that it can handle unit tests and
   functional tests.
 
 - A slightly better UI.
 
 - Tests (of the test runner itself :)
 
 See:
 
 http://svn.zope.org/zope.testing/trunk/src/zope/testing/testrunner.txt?view=log
 
 http://svn.zope.org/zope.testing/trunk/src/zope/testing/testrunner.py?view=log

Hi Jim.

I've been looking over this - fixing tests seems to take up a significant
amount of our time, so I might have some interesting use cases.

A large proportion of our tests use a relational database. Some of them want
an empty database, some of them want just the schema created but no data,
some of them want the schema created and the data. Some of them need the
component architecture, and some of them don't. Some of them need one or
more Twisted servers running, some of them don't.

Note that we mix and match. We have 4 different types of database fixture
(none, empty, schema, populated), 2 different types of database connection
mechanisms (psycopgda, psycopg), 2 types of CA fixture (none, loaded), and
(currently) 4 states of external daemons needed. If we were to arrange this
in layers, it would take 56 different layers, and this will double every
time we add a new daemon, or add more database templates (e.g. fat for lots
of sample data to go with the existing thin).
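Multiplying the four axes directly gives 4 * 2 * 2 * 4 = 64, not 56. One reading reproduces 56, but it is an assumption, not something the message states: with no database there is nothing to connect to, so the connection axis collapses for the "none" fixture, and the 4 daemon states are taken to be every on/off combination of 2 daemons. `count_layers` is a hypothetical helper.

```python
from itertools import product

DB_FIXTURES = ["none", "empty", "schema", "populated"]
CONNECTIONS = ["psycopgda", "psycopg"]
CA_FIXTURES = ["none", "loaded"]


def count_layers(daemons=2):
    """Count distinct fixture combinations under the assumptions above."""
    # Assumption: each daemon is either running or not, so n daemons give
    # 2**n states (4 states for the 2 daemons implied by the message).
    daemon_states = range(2 ** daemons)
    combos = set()
    for db, conn, ca, state in product(DB_FIXTURES, CONNECTIONS, CA_FIXTURES, daemon_states):
        if db == "none":
            conn = None  # assumption: no database, so no connection mechanism
        combos.add((db, conn, ca, state))
    return len(combos)


print(count_layers())   # 56
print(count_layers(3))  # 112
```

Under this reading each extra daemon doubles the count, as the message says.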

As a way of supporting this better, instead of specifying a layer a test
could specify the list of resources it needs:

import unittest
import testresources as r

class FooTest(unittest.TestCase):
    resources = [r.LaunchpadDb, r.Librarian, r.Component]
    [...]

class BarTest(unittest.TestCase):
    resources = [r.EmptyDb]

class BazTest(unittest.TestCase):
    resources = [r.LaunchpadDb, r.Librarian]

The resources are pretty much identical to the current layers, in that
(after the test runner does some sorting fu), the run order can be optimized
to avoid setting up and tearing down resources unnecessarily. This would be
a big win for us - currently, we specify 'resources' by simply calling
various setup and teardown methods in the test case:

class FooTest(unittest.TestCase):
    def setUp(self):
        LaunchpadTestSetup().setUp()
        LibrarianTestSetup().setUp()
        FunctionalTestSetup().setUp()

    def tearDown(self):
        FunctionalTestSetup().tearDown()
        LibrarianTestSetup().tearDown()
        LaunchpadTestSetup().tearDown()
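The "sorting fu" could be as simple as a greedy pass. A sketch under stated assumptions: each test is reduced to the set of resource names it needs, `plan` is a hypothetical helper, and a resource is torn down as soon as the next test doesn't need it. Greedy choice of the next test is a heuristic and is not guaranteed to find the order with the fewest set-ups.

```python
def plan(tests):
    """Greedily order tests so consecutive tests share resources (sketch).

    tests maps a test name to the set of resource names it needs.
    Returns the run order and the total number of resource set-ups.
    """
    remaining = {name: frozenset(res) for name, res in tests.items()}
    active = frozenset()
    order, setups = [], 0
    while remaining:
        # Next: the test whose resources differ least from what is up now
        # (ties broken by name, so the result is deterministic).
        name = min(remaining, key=lambda n: (len(active ^ remaining[n]), n))
        needed = remaining.pop(name)
        setups += len(needed - active)  # set up only what is missing
        active = needed                 # anything else has been torn down
        order.append(name)
    return order, setups


tests = {
    "FooTest": {"LaunchpadDb", "Librarian", "Component"},
    "BarTest": {"EmptyDb"},
    "BazTest": {"LaunchpadDb", "Librarian"},
}
print(plan(tests))                          # (['BarTest', 'BazTest', 'FooTest'], 4)
print(sum(len(r) for r in tests.values()))  # 6
```

Setting everything up per test, as the boilerplate above does, costs 6 resource set-ups for these three tests; the greedy order needs 4.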

Some other nice things could be done with the resources:

- If the setUp raises NotImplementedError (or whatever), tests using this
resource are skipped (and reported as skipped). This nicely handles tests
that should only be run in particular environments (Win32, Internet
connection, python.net installed etc.)

- If the setUp raises another exception, all tests using this resource fail.
The common case we see is 'database in use', where PostgreSQL does not let
us destroy or use as a template a database that has open connections to it.
Also useful for general sanity checking of the environment - no point
running the tests if we know they are going to fail or have skewed results.

- A resource should have pretest and posttest hooks. pretest is used for
lightweight resource-specific initialization (e.g. setUp creates a fresh
database from a dump and pretest initializes the connection pool). posttest
can be used to ensure tests cleaned up properly or other housekeeping (e.g.
issuing a rollback). This could also apply to layers in the current
environment. This eliminates tedious boilerplate from test cases.

- A resource could provide useful data to the test runner. For example, if a
resource says it doesn't use or lock any shared system resources, the test
runner could decide to run tests in parallel. A less blue-sky use would be
specifying a dependency on another resource.
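The skip, fail and pretest/posttest behaviour proposed in the list above can be sketched as follows. Everything here is hypothetical: the `Resource` protocol, the example resources and `run_with_resource` only illustrate the proposed semantics, with NotImplementedError from setUp meaning "skip" and any other exception meaning "fail every test that uses this resource".

```python
class Resource:
    """Hypothetical resource protocol: all hooks default to doing nothing."""
    def setUp(self): pass
    def tearDown(self): pass
    def pretest(self): pass
    def posttest(self): pass


class Win32Only(Resource):
    def setUp(self):
        raise NotImplementedError("needs Win32")


class BusyDatabase(Resource):
    def setUp(self):
        raise RuntimeError("database in use")


class RollbackAfterEachTest(Resource):
    def __init__(self):
        self.log = []

    def pretest(self):
        self.log.append("begin")

    def posttest(self):
        self.log.append("rollback")


def run_with_resource(resource, tests):
    """Run (name, callable) tests that share one resource; return outcomes."""
    try:
        resource.setUp()
    except NotImplementedError as e:
        # Environment lacks something: skip, and say why.
        return {name: "skipped: %s" % e for name, _ in tests}
    except Exception as e:
        # Resource is broken: every test using it fails without running.
        return {name: "failed: %s" % e for name, _ in tests}
    outcomes = {}
    try:
        for name, test in tests:
            resource.pretest()
            try:
                test()
                outcomes[name] = "passed"
            except Exception as e:
                outcomes[name] = "failed: %s" % e
            finally:
                resource.posttest()
    finally:
        resource.tearDown()
    return outcomes


tests = [("test_a", lambda: None), ("test_b", lambda: None)]
print(run_with_resource(Win32Only(), tests))
# {'test_a': 'skipped: needs Win32', 'test_b': 'skipped: needs Win32'}
print(run_with_resource(BusyDatabase(), tests))
# {'test_a': 'failed: database in use', 'test_b': 'failed: database in use'}
rollback = RollbackAfterEachTest()
print(run_with_resource(rollback, tests), rollback.log)
# {'test_a': 'passed', 'test_b': 'passed'} ['begin', 'rollback', 'begin', 'rollback']
```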

On another note, enforcing isolation of tests has been a continuous problem
for us. For example, a developer registering a utility or otherwise mucking
around with the global environment and forgetting to reset things in
tearDown. This goes unnoticed for a while, and other tests get written that
actually depend on this corruption. But at some point, the order the tests
are run changes for some reason and suddenly test 500 starts failing. It
turns out the global state has been screwed, and you have the fun task of
tracking down which of the preceding 499 tests screwed it. I think this is
a use case for some sort of global posttest hook. Perhaps this would be best
done by allowing people to write wrappers around the one-true-testrunner?
This seems to be the simplest way of allowing 

[Zope-dev] Re: New test runner work

2005-08-23 Thread Jim Fulton

Stuart Bishop wrote:

Jim Fulton wrote:



I'll note that I'm working on a newer test runner that I hope to use
in Zope 2.9 and 3.2.  The new test runner is a nearly complete rewrite to
provide:

- A more flexible test runner that can be used for a variety of projects.
 The current test runner has been forked for ZODB, Zope 3, and Zope 2.
 That's why the Zope 3 version has features that are lacking in the Zope 2
 version.

- Support for layers of tests, so that it can handle unit tests and
 functional tests.

- A slightly better UI.

- Tests (of the test runner itself :)

See:

http://svn.zope.org/zope.testing/trunk/src/zope/testing/testrunner.txt?view=log

http://svn.zope.org/zope.testing/trunk/src/zope/testing/testrunner.py?view=log



Hi Jim.

I've been looking over this - fixing tests seems to take up a significant
amount of our time, so I might have some interesting use cases.

A large proportion of our tests use a relational database. Some of them want
an empty database, some of them want just the schema created but no data,
some of them want the schema created and the data. Some of them need the
component architecture, and some of them don't. Some of them need one or
more Twisted servers running, some of them don't.

Note that we mix and match. We have 4 different types of database fixture
(none, empty, schema, populated), 2 different types of database connection
mechanisms (psycopgda, psycopg), 2 types of CA fixture (none, loaded), and
(currently) 4 states of external daemons needed. If we were to arrange this
in layers, it would take 56 different layers, and this will double every
time we add a new daemon, or add more database templates (e.g. fat for lots
of sample data to go with the existing thin).

As a way of supporting this better, instead of specifying a layer a test
could specify the list of resources it needs:

import unittest
import testresources as r

class FooTest(unittest.TestCase):
    resources = [r.LaunchpadDb, r.Librarian, r.Component]
    [...]

class BarTest(unittest.TestCase):
    resources = [r.EmptyDb]

class BazTest(unittest.TestCase):
    resources = [r.LaunchpadDb, r.Librarian]



This is pretty much how layers work.  Layers can be arranged in
a DAG (much like a traditional multiple-inheritance class graph).
So, you can model each resource as a layer and specific combinations
of resources as layers.  The test runner will attempt to run the layers
in an order that minimizes set-up and tear-down of layers.

...


Some other nice things could be done with the resources:

- If the setUp raises NotImplementedError (or whatever), tests using this
resource are skipped (and reported as skipped). This nicely handles tests
that should only be run in particular environments (Win32, Internet
connection, python.net installed etc.)


That's a good idea.


- If the setUp raises another exception, all tests using this resource fail.
The common case we see is 'database in use', where PostgreSQL does not let
us destroy or use as a template a database that has open connections to it.
Also useful for general sanity checking of the environment - no point
running the tests if we know they are going to fail or have skewed results.


Good.


- A resource should have pretest and posttest hooks. pretest is used for
lightweight resource-specific initialization (e.g. setUp creates a fresh
database from a dump and pretest initializes the connection pool). posttest
can be used to ensure tests cleaned up properly or other housekeeping (e.g.
issuing a rollback). This could also apply to layers in the current
environment. This eliminates tedious boilerplate from test cases.


Ah, so the layer specifies additional per-test setUp and tearDown
that is used in addition to the test's own setUp and tearDown.  This
sounds reasonable.


- A resource could provide useful data to the test runner. For example, if a
resource says it doesn't use or lock any shared system resources, the test
runner could decide to run tests in parallel.



A less blue-sky use would be specifying a dependency on another resource.


This is handled by layers now. Layers have __bases__ -- layers are
built on other layers. That's why they are called layers. :)


On another note, enforcing isolation of tests has been a continuous problem
for us. For example, a developer registering a utility or otherwise mucking
around with the global environment and forgetting to reset things in
tearDown. This goes unnoticed for a while, and other tests get written that
actually depend on this corruption. But at some point, the order the tests
are run changes for some reason and suddenly test 500 starts failing. It
turns out the global state has been screwed, and you have the fun task of
tracking down which of the preceding 499 tests screwed it. I think this is
a use case for some sort of global posttest hook.


How so?

Perhaps this would be best done by allowing people to write wrappers
around the one-true-testrunner?


or we could simply provide such a