Re: Automated testing - design and interfaces

2005-11-24 Thread Ian Jackson
Anthony Towns writes (Re: Automated testing - design and interfaces):
 On Mon, Nov 21, 2005 at 06:22:37PM +, Ian Jackson wrote:
  This is no good because we want the test environment to be able to
  tell which tests failed, so the test cases have to be enumerated in
  the test metadata file.
 
 Uh, having to have 1000 scripts, or even just 1000 entries in a metadata
 file, just to run 1000 tests is a showstopper imho. Heck, identifying
 testcases by number mightn't be particularly desirable in some instances,
 if a clearer alternative like, say, "test case failed: add 1, add 2,
 del 1, ch 2" is possible.

Sorry, as Robert Collins points out, I didn't mean `enumerate'.  I
meant `identify'.  That is, the test environment needs to see the
results of individual tests and not just a human-only-readable report
file.

I agree with you about numbers.  If you let tests have names, people
can still use numbers as names if they insist, so it's sufficient for
the system to support names.
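For concreteness (the test names here are invented), a stanza and the
per-test results the environment sees might look something like:

  Tests: add-user del-user change-password
  Restrictions: needs-root

  add-user: PASS
  del-user: FAIL
  change-password: PASS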

  You can't check that the binary works _when the .deb is installed_
  without installing it.
 
 That's okay. You can't check that the binary works _on the user's system_
 without installing it on the user's system either. For Debian's purposes,
 being able to run the tests with minimal setup seems crucial.

That's true.  Of course the user's system is a moveable feast.  One
goal of my design is to allow testing on a minimal setup.

  Also, a `Restriction' isn't right because if the test has neither of
  those Restrictions then presumably it can do either but how would it
  know which ?
 
 It would have to not care which; which it could do by expecting the
 test harness to put the binaries in the PATH, or provide an environment
 variable like INSTALL_ROOT=$(pwd)/debian/tmp .

Right.  So you're effectively adding a new bit to the spec to support
that.  I don't want to go there right now but this is definitely
something we want to allow room for in the future.  The way I would
imagine extending it to cover this case would be to invent a new
header (which the old test-runner would be updated to treat as
harmless)
 Features: test-in-build
which would mean that this was supported.  You could say
 Restrictions: test-in-build
to mean that _only_ that was supported.  And of course you'd have to
define exactly what the feature meant (including the INSTALL_ROOT
thing, etc.).
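Purely as a sketch of that future extension (test names invented, exact
semantics still to be defined), the two cases might read:

 Tests: smoke
 Features: test-in-build

for a test that works either way, and

 Tests: unit
 Restrictions: test-in-build

for a test that can only be run against the build tree.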

 Having test case dependencies is fairly useful; in any event the language
 "Even integration tests can be represented like this: if one package's
 tests Depend on the other's" is wrong if tests depend on other packages,
 not on other packages' tests. You'll want Conflicts: as well as Depends:
 in that case too.

Ah, I see what you mean.  Yes, that language is wrong.

Adding `Conflicts' is an obvious extension but I don't propose to
implement it in my first cut.

 It would probably be quite useful to be able to write tests like:
   for mta in exim4 smail sendmail qmail; do
  install $mta
  # actual test
  uninstall $mta
   done
 too, to ensure that packages that depend on one of a number of packages
 actually work with all of them.

Quite so.  I'm not sure if my test-runner will get clever enough for
that but the information and interfaces it needs will be there.

Ian.




Re: Automated testing - design and interfaces

2005-11-23 Thread Anthony Towns
On Mon, Nov 21, 2005 at 06:22:37PM +, Ian Jackson wrote:
  Note that it's often better to have a single script run many tests, so
  you probably want to allow tests to pass back some summary information,
  or include the last ten lines of its output or similar. Something like:
foo FAIL:
  FAILURE: testcase 231
  FAILURE: testcase 289
  FAILURE: testcase 314
  3/512 test cases failed
 This is no good because we want the test environment to be able to
 tell which tests failed, so the test cases have to be enumerated in
 the test metadata file.

Uh, having to have 1000 scripts, or even just 1000 entries in a metadata
file, just to run 1000 tests is a showstopper imho. Heck, identifying
testcases by number mightn't be particularly desirable in some instances,
if a clearer alternative like, say, "test case failed: add 1, add 2,
del 1, ch 2" is possible.

 You can't check that the binary works _when the .deb is installed_
 without installing it.

That's okay. You can't check that the binary works _on the user's system_
without installing it on the user's system either. For Debian's purposes,
being able to run the tests with minimal setup seems crucial.

 Also, a `Restriction' isn't right because if the test has neither of
 those Restrictions then presumably it can do either but how would it
 know which ?

It would have to not care which; which it could do by expecting the
test harness to put the binaries in the PATH, or provide an environment
variable like INSTALL_ROOT=$(pwd)/debian/tmp .
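For example, a test that genuinely doesn't care might look something
like this (INSTALL_ROOT and the binary name 'frobnicate' are just
placeholders for the sketch):

  #!/bin/sh
  # Works whether the harness installed the package (binaries on PATH)
  # or pointed INSTALL_ROOT at the build tree's debian/tmp.
  set -e
  if [ -n "$INSTALL_ROOT" ]; then
      PATH="$INSTALL_ROOT/usr/bin:$PATH"
  fi
  frobnicate --help >/dev/null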

 No, I mean that if the tests live (say) in
 build/foo-1.0/debian/tests/x then build/foo-1.0/debian/tests/control
 could say
  Depends: bar
 which would mean bar would have to be installed, effectively making it
 an integration test.

Having test case dependencies is fairly useful; in any event the language
"Even integration tests can be represented like this: if one package's
tests Depend on the other's" is wrong if tests depend on other packages,
not on other packages' tests. You'll want Conflicts: as well as Depends:
in that case too.

It would probably be quite useful to be able to write tests like:

for mta in exim4 smail sendmail qmail; do
   install $mta
   # actual test
   uninstall $mta
done

too, to ensure that packages that depend on one of a number of packages
actually work with all of them.
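Spelled out a little (assuming apt-get as the install mechanism and a
hypothetical 'run-mail-test' script as the actual test; it obviously
needs root and a disposable testbed):

  #!/bin/sh
  set -e
  for mta in exim4 smail sendmail qmail; do
      apt-get -y install "$mta"
      ./run-mail-test "$mta"        # the actual test
      apt-get -y remove "$mta"
  done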

Cheers,
aj




Re: Automated testing - design and interfaces

2005-11-23 Thread Robert Collins
On Wed, 2005-11-23 at 18:16 +1000, Anthony Towns wrote:
 On Mon, Nov 21, 2005 at 06:22:37PM +, Ian Jackson wrote:
   Note that it's often better to have a single script run many tests, so
   you probably want to allow tests to pass back some summary information,
   or include the last ten lines of its output or similar. Something like:
 foo FAIL:
   FAILURE: testcase 231
   FAILURE: testcase 289
   FAILURE: testcase 314
   3/512 test cases failed
  This is no good because we want the test environment to be able to
  tell which tests failed, so the test cases have to be enumerated in
  the test metadata file.

Replying to two messages here ...
I don't think we have to enumerate the tests in advance. Sure, the test
runner needs to be able to identify and [possibly] categorise the tests,
but explicit enumeration is quite orthogonal. A number of python
unittest runners will scan directories and classes for their tests, and
the report from users is consistently that this is easier to use.
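A rough illustration of discovery-by-scanning (the directory layout and
output format here are assumptions, not a spec proposal):

  #!/bin/sh
  # Run every executable in debian/tests/ and report one result per script.
  for t in debian/tests/*; do
      [ -x "$t" ] || continue
      name=$(basename "$t")
      if "$t" >"$name.log" 2>&1; then
          echo "$name: PASS"
      else
          echo "$name: FAIL (output in $name.log)"
      fi
  done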

 Uh, having to have 1000 scripts, or even just 1000 entries in a metadata
 file, just to run 1000 tests is a showstopper imho. Heck, identifying
 testcases by number mightn't be particularly desirable in some instances,
 if a clearer alternative like, say, "test case failed: add 1, add 2,
 del 1, ch 2" is possible.

A very nice feature in the xUnit world is that tests can be identified
by either their path (inside the language namespace) or by a comment
written by the author. At runtime you can choose which to see. I don't
think we need the ability for runtime selection, but having a heuristic
that works and is overridable would be nice.

I.e. by default you might get
tests/layout/documentation_in_usr_share_doc.sh: PASS

But inside that test you could say:
test_name "Documentation is installed in /usr/share/doc"

and the output becomes
Documentation is installed in /usr/share/doc: PASS
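One way that could be plumbed into a shell test (the helper names are
invented, purely illustrative):

  #!/bin/sh
  TEST_NAME=$(basename "$0")       # default: identify the test by its path
  test_name () { TEST_NAME="$*"; }
  report ()    { echo "$TEST_NAME: $1"; }

  test_name "Documentation is installed in /usr/share/doc"
  if [ -d /usr/share/doc/somepackage ]; then   # 'somepackage' is a stand-in
      report PASS
  else
      report FAIL
  fi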

I've written a project that is somewhat related to this:
http://www.robertcollins.net/unittest/subunit/

It's a Python implementation of a cross-process test running protocol.
This lets a sub-process run 0 to many tests, identify them and provide
pass/fail/error status and traceback or other diagnostics. As the driver
is Python it's not appropriate here, but I think the basic
protocol/concept might be useful.
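The shape of the idea is a line-based protocol on the child's stdout,
something like this (illustrative only, not necessarily subunit's exact
syntax):

  test: add 1
  success: add 1
  test: del 1
  failure: del 1 [
    expected 0 rows, got 1
  ]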

  You can't check that the binary works _when the .deb is installed_
  without installing it.
 
 That's okay. You can't check that the binary works _on the user's system_
 without installing it on the user's system either. For Debian's purposes,
 being able to run the tests with minimal setup seems crucial.

Yup


 It would probably be quite useful to be able to write tests like:
 
   for mta in exim4 smail sendmail qmail; do
  install $mta
  # actual test
  uninstall $mta
   done
 
 too, to ensure that packages that depend on one of a number of packages
 actually work with all of them.

Yes, that would be excellent.

Rob

-- 
GPG key available at: http://www.robertcollins.net/keys.txt.




Re: Automated testing - design and interfaces

2005-11-21 Thread Ian Jackson
Anthony Towns writes (Re: Automated testing - design and interfaces):
 On Thu, Nov 17, 2005 at 06:43:32PM +, Ian Jackson wrote:
The source package provides a test metadata file debian/tests/
control. This is a file containing zero or more RFC822-style
stanzas, along these lines:
Tests: fred bill bongo
Restrictions: needs-root breaks-computer
This means execute debian/tests/fred, debian/tests/bill, etc.,
 
 Seems like:
 
   debian/tests/bar:
 #!/bin/sh
 # Restrictions: needs-root trashes-system
 # Requires: foo

Urgh.  I'm really not a fan of those files which mix up different
languages.  We'll end up with a complicated scheme for separating out
the test metadata from other stuff appearing in the comments at the
top of files (Emacs and vim modes, #! lines, different comment
syntaxes in different languages, etc.)

Also, we want to be able to share the actual tests - that is, the meat
of the work - with non-Debian systems.  So we should separate out the
metadata (which describes when the test should be run and where it is,
and is Debian-specific) from the actual tests (which need not be
Debian-specific).

  Is the Depends: line meant to refer to other Debian packages (and
 thus be a lower-level version of Restrictions:) or is it meant to
 indicate test interdependencies? If it's meant to be for Debian
 packages, maybe
   # Restrictions: deb:xvncviewer
 might be better.

Yes, Depends is semantically much like Restrictions but refers to a
Debian package (which must be installed on the test system).  However,
Depends might have version numbers etc. - it's just like a Depends
field.  I don't want to try to mix that with the simple syntax of
Restrictions.

IMO it's better to have two fields if the structure (and hence the
syntax) of the information is going to be significantly different,
even if there's a certain similarity to the semantics.
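So a single stanza might end up carrying both, e.g. (package names and
versions invented):

  Tests: roundtrip
  Restrictions: needs-root
  Depends: bar (>= 1.2-3), libbaz2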

 Note that it's often better to have a single script run many tests, so
 you probably want to allow tests to pass back some summary information,
 or include the last ten lines of its output or similar. Something like:
 
   foo FAIL:
 FAILURE: testcase 231
 FAILURE: testcase 289
 FAILURE: testcase 314
 3/512 test cases failed

This is no good because we want the test environment to be able to
tell which tests failed, so the test cases have to be enumerated in
the test metadata file.

You do have a point about not necessarily starting a separate process
for each test.  An earlier version of my draft had something like
  Test: .../filename+
where the + meant to execute filename and it would print
   138: PASS
   231: FAIL
   289: FAIL
   314: SKIP: no X11
or some similar standard format.
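Such a script could be as simple as this (case names and commands are
invented; only the output format matters):

  #!/bin/sh
  run_case () {                 # run_case NAME COMMAND...
      name="$1"; shift
      if "$@" >/dev/null 2>&1; then
          echo "$name: PASS"
      else
          echo "$name: FAIL"
      fi
  }
  run_case 138 true
  run_case 231 false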

A basic test could be simply running the binary and checking the
result status (or other variants of this). Eventually every
package would have to be changed to include at least one test.
 
 These sorts of tests are better done as part of debian/rules, I would've
 thought -- the advantage of that is that the problems get caught even
 when users rebuild the package themselves, and you don't need to worry
 about special test infrastructure like you're talking about for the
 simple case.

You can't check that the binary works _when the .deb is installed_
without installing it.

Ideally eventually where possible the upstream regression tests
could be massaged so that they test the installed version. Whether
this is possible and how best to achieve it has to be decided on a
per-package basis.
 
 Having
   Restrictions: package-installed
 and
   Restrictions: build-tree

Hrm, that's an interesting idea.  I really think that concentrating on
testing as-installed is going to yield much richer results - that is,
more test failures :-).  So I want to provide that interface straight
away.

Also, a `Restriction' isn't right because if the test has neither of
those Restrictions then presumably it can do either but how would it
know which ?

Even integration tests can be represented like this: if one
package's tests Depend on the other's, then they are effectively
integration tests. The actual tests can live in whichever package
is most convenient.
 
 Going from build/foo-1.0/debian/tests/x to
 projects/bar-3.14/debian/tests/y seems difficult.

No, I mean that if the tests live (say) in
build/foo-1.0/debian/tests/x then build/foo-1.0/debian/tests/control
could say
 Depends: bar
which would mean bar would have to be installed, effectively making it
an integration test.

 Anyway, something that can be run with minimal amounts of setup seems
 most likely to be most useful: so running as part of the build without
 installing the package, running without anything special installed but the
 package being tested and a script that parses the control information,
 stuff that can be run on a user's system without root privs and without
 trashing the system, etc.

My idea is that the test runner will do `something

Re: Automated testing - design and interfaces

2005-11-20 Thread Robert Collins
On Thu, 2005-11-17 at 14:36 -0800, Steve Langasek wrote:
 [let's get this over to a technical list like it was supposed to be ;)]

  Following your exit status based approach you could add to stanzas
  something like:
 
Expected-Status: 0
 
  I consider the above requirement to be the very minimum for a test interface.
  What follows is optional (IMHO).
 
 FWIW, I don't see that there's a clear advantage to having the test harness
 *expect* non-zero exit values (or non-empty output as you also suggested).
 It may make it easier to write tests by putting more of the logic in the
 test harness, but in exchange it makes it harder to debug a test failure
 because the debugger has to figure out how "correct" and "incorrect" are
 defined for each test, instead of just getting into the meat of seeing why
 the test returned non-zero.  Likewise, expecting successful tests to be
 silent means that you can rely on any output being error output that can be
 used for debugging a test failure.

Right. Splitting it into two bits ...

With respect to exit codes, there is generally only one way to succeed,
but many ways to fail. So reserving 0 for 'test succeeded' in ALL cases
makes writing front ends, or running the tests interactively, much
easier. It's certainly possible to provide a $lang function that can
invert the relationship for you, for 'expected failure' results. One of
the things about expected failures is their granularity: is a test
expected to fail because 'file FOO is missing', or because 'something
went wrong'? The former test is best written as an explicit check, where
you invert the sense in the test script. It's best because this means
that when the behaviour of the failing logic alters - for better or
worse - the test flags that it has changed. Whereas a handwaving
'something's broken' style of expected failure really does not help code
maintenance at all. So while it can be useful in the test interface to
have an explicit code for 'expected failure', I think it is actually
best to just write the test to catch the precise failure, and report
success.
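A sketch of what that looks like in practice ('frobnicate' and the exact
error text are stand-ins):

  #!/bin/sh
  # The operation is expected to fail *because FOO is missing*; anything
  # else - including unexpected success - counts as a test failure.
  if output=$(frobnicate --needs-foo 2>&1); then
      echo "FAIL: frobnicate unexpectedly succeeded"
      exit 1
  fi
  case "$output" in
      *"file FOO is missing"*) exit 0 ;;
      *) echo "FAIL: failed, but not because FOO is missing: $output"; exit 1 ;;
  esac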

As for silence, yes, noise is generally not helpful, although
long-running test suites can usefully give *some* feedback (a . per 100
tests, say) to remind people it's still running.

Rob

-- 
GPG key available at: http://www.robertcollins.net/keys.txt.




Re: Automated testing - design and interfaces

2005-11-18 Thread Stefano Zacchiroli
On Thu, Nov 17, 2005 at 02:36:06PM -0800, Steve Langasek wrote:
 FWIW, I don't see that there's a clear advantage to having the test harness
 *expect* non-zero exit values (or non-empty output as you also suggested).

As I understood it, the proposed approach is about standardizing and
easing the way we perform tests on Debian packages. Since it often
happens that both positive and negative tests are needed, having a way
to describe externally (wrt the tests) their expected behaviour seems to
me a good way to factor the expectation code into the tester. It may
also help in writing interfaces for showing test results, which would be
more informative than a simple success/failure message.

I agree, though, that my "very minimum" assertion was too strong.

Cheers.

-- 
Stefano Zacchiroli -*- Computer Science PhD student @ Uny Bologna, Italy
[EMAIL PROTECTED],debian.org,bononia.it} -%- http://www.bononia.it/zack/
If there's any real truth it's that the entire multidimensional infinity
of the Universe is almost certainly being run by a bunch of maniacs. -!-




Re: Automated testing - design and interfaces

2005-11-17 Thread Steve Langasek
[let's get this over to a technical list like it was supposed to be ;)]

On Thu, Nov 17, 2005 at 10:43:34PM +0100, Stefano Zacchiroli wrote:
 On Thu, Nov 17, 2005 at 06:43:32PM +, Ian Jackson wrote:
This means execute debian/tests/fred, debian/tests/bill, etc.,
each with no arguments, expecting exit status 0 and no stderr. The

 Having been involved in various unit testing packages, I find the above
 expectations too constraining.  The first thing you will need, after
 requiring all tests not to fail, is to be able to distinguish tests that
 need to succeed vs tests that need to fail. Only the misbehaviour of
 tests wrt their expected result should be reported as test failures. I
 thus propose to add

 Following your exit status based approach you could add to stanzas
 something like:

   Expected-Status: 0

 I consider the above requirement to be the very minimum for a test interface.
 What follows is optional (IMHO).

FWIW, I don't see that there's a clear advantage to having the test harness
*expect* non-zero exit values (or non-empty output as you also suggested).
It may make it easier to write tests by putting more of the logic in the
test harness, but in exchange it makes it harder to debug a test failure
because the debugger has to figure out how "correct" and "incorrect" are
defined for each test, instead of just getting into the meat of seeing why
the test returned non-zero.  Likewise, expecting successful tests to be
silent means that you can rely on any output being error output that can be
used for debugging a test failure.

-- 
Steve Langasek   Give me a lever long enough and a Free OS
Debian Developer   to set it on, and I can move the world.
[EMAIL PROTECTED]   http://www.debian.org/




Re: Automated testing - design and interfaces

2005-11-17 Thread Anthony Towns
Bcc'ed to -project; followups to -devel.

On Thu, Nov 17, 2005 at 06:43:32PM +, Ian Jackson wrote:
 Note that the point is to be able to test the _actual package_, _as
 installed_ (eg on a testbed system).  This is much better than testing
 the package from the source tree during build time because it will
 detect packaging mistakes as well as program source problems and as we
 know packaging mistakes of one kind or another are one of the main
 causes of problems.

Mostly it's just different -- testing at build time lets you do unit
tests before putting the objects together and stripping them, which
gives you the opportunity to catch other bugs. One isn't better than the
other; though doing both is better than either or neither.

Other useful tests are things like "install all of optional", which
will catch unspecified Conflicts: relations, and "do a partial upgrade
from stable to this package in unstable", which will tell you if some
dependencies aren't strict enough. Looking through Contents-* files will
also let you catch unspecified dependencies.
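The "install all of optional" one could be approximated with something
like this (tool choice and options are assumptions, and it wants a
throwaway testbed, not a real system):

  #!/bin/sh
  apt-cache dumpavail \
    | grep-dctrl -n -s Package -F Priority optional - \
    | xargs apt-get -y install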

Having multiple machines to do tests can be worthwhile too -- if you want
to test firewalls or that there aren't any listening ports on default
installs, eg.

   The source package provides a test metadata file debian/tests/
   control. This is a file containing zero or more RFC822-style
   stanzas, along these lines:
 Tests: fred bill bongo
 Restrictions: needs-root breaks-computer
   This means execute debian/tests/fred, debian/tests/bill, etc.,

Seems like:

  debian/tests/bar:
    #!/bin/sh
    # Restrictions: needs-root trashes-system
    # Requires: foo

  foo FAIL: ...
  bar SKIP: foo failed

would make more sense than a separate file describing the tests? Is the
Depends: line meant to refer to other Debian packages (and thus be
a lower-level version of Restrictions:) or is it meant to indicate
test interdependencies? If it's meant to be for Debian packages, maybe

  # Restrictions: deb:xvncviewer

might be better.
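If the metadata lives in comments like that, the script that parses the
control information could be nearly trivial, e.g. (field names as in the
sketch above, GNU sed assumed):

  #!/bin/sh
  # Pull the comment-style headers back out of a test script.
  sed -n 's/^# \(Restrictions\|Requires\): /\1: /p' debian/tests/bar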

Note that it's often better to have a single script run many tests, so
you probably want to allow tests to pass back some summary information,
or include the last ten lines of its output or similar. Something like:

  foo FAIL:
FAILURE: testcase 231
FAILURE: testcase 289
FAILURE: testcase 314
3/512 test cases failed
  bar FAIL: (341123 other lines, then:)
xxx


x
Aborted (core dumped)
  baz SKIP: foo failed
  quux PASS

maybe.

   Any unknown thing in Restrictions, or any unknown field in the
   RFC822 stanza, causes the tester core to skip the test with a
   message like `test environment does not support blames-canada
   restriction of test simpsons'.

You mean southpark, surely?

   A basic test could be simply running the binary and checking the
   result status (or other variants of this). Eventually every
   package would have to be changed to include at least one test.

These sorts of tests are better done as part of debian/rules, I would've
thought -- the advantage of that is that the problems get caught even
when users rebuild the package themselves, and you don't need to worry
about special test infrastructure like you're talking about for the
simple case.

   Ideally eventually where possible the upstream regression tests
   could be massaged so that they test the installed version. Whether
   this is possible and how best to achieve it has to be decided on a
   per-package basis.

Having

  Restrictions: package-installed

and

  Restrictions: build-tree

might be worth thinking about so that it's easy to do both sorts of
testing.

   Even integration tests can be represented like this: if one
   package's tests Depend on the other's, then they are effectively
   integration tests. The actual tests can live in whichever package
   is most convenient.

Going from build/foo-1.0/debian/tests/x to
projects/bar-3.14/debian/tests/y seems difficult.

Adding deb:otherpkg or deb:libotherpkg-dbg to the Restrictions:
seems more plausible?

Anyway, something that can be run with minimal amounts of setup seems
most likely to be most useful: so running as part of the build without
installing the package, running without anything special installed but the
package being tested and a script that parses the control information,
stuff that can be run on a user's system without root privs and without
trashing the system, etc.

If there's going to be a debian/rules check command, debian/tests/*
probably should just be a suggested standard, or vice-versa --
minimising the number of required interfaces would likely make things
more flexible. Being able to add upstream tests by nothing more than
symlinking them into debian/tests might be a worthwhile goal, perhaps.
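e.g., in the simplest case something like (paths invented):

  ln -s ../../tests/run-regression-suite debian/tests/upstream

plus a one-line stanza (or its in-script equivalent) declaring it.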

 From: Ian Jackson [EMAIL PROTECTED]

 Ian.
 (wearing both my Debian and Ubuntu hats)

Heh.

Cheers,
aj


