Re: Automated testing - design and interfaces
Anthony Towns writes ("Re: Automated testing - design and interfaces"):
> On Mon, Nov 21, 2005 at 06:22:37PM +, Ian Jackson wrote:
> > This is no good because we want the test environment to be able to tell which tests failed, so the test cases have to be enumerated in the test metadata file.
> Uh, having to have 1000 scripts, or even just 1000 entries in a metadata file, just to run 1000 tests is a showstopper imho. Heck, identifying testcases by number mightn't be particularly desirable in some instances, if a clearer alternative like, say, "test case failed: add 1, add 2, del 1, ch 2" is possible.

Sorry, as Robert Collins points out, I didn't mean `enumerate'. I meant `identify'. That is, the test environment needs to see the results of individual tests and not just a human-only-readable report file.

I agree with you about numbers. If you let tests have names, people can always write numbers in if they insist, so it's sufficient for the system to support names.

> > You can't check that the binary works _when the .deb is installed_ without installing it.
> That's okay. You can't check that the binary works _on the user's system_ without installing it on the user's system either. For Debian's purposes, being able to run the tests with minimal setup seems crucial.

That's true. Of course the user's system is a moveable feast. One goal of my design is to allow testing on a minimal setup.

> > Also, a `Restriction' isn't right because if the test has neither of those Restrictions then presumably it can do either but how would it know which ?
> It would have to not care which; which it could do by expecting the test harness to put the binaries in the PATH, or provide an environment variable like INSTALL_ROOT=$(pwd)/debian/tmp .

Right. So you're effectively adding a new bit to the spec to support that. I don't want to go there right now, but this is definitely something we want to allow room for in the future.

The way I would imagine extending it to cover this case would be to invent a new header (which the old test-runner would be updated to treat as harmless)
  Features: test-in-build
which would mean that this was supported. You could say
  Restrictions: test-in-build
to mean that _only_ that was supported. And of course you'd have to define exactly what the feature meant (including the INSTALL_ROOT thing, etc.).

> Having test case dependencies is fairly useful; in any event the language
> > Even integration tests can be represented like this: if one package's tests Depend on the other's
> is wrong if tests depend on other packages, not on other packages' tests. You'll want Conflicts: as well as Depends: in that case too.

Ah, I see what you mean. Yes, that language is wrong. Adding `Conflicts' is an obvious extension, but I don't propose to implement it in my first cut.

> It would probably be quite useful to be able to write tests like:
>   for mta in exim4 smail sendmail qmail; do
>     install $mta
>     # actual test
>     uninstall $mta
>   done
> too, to ensure that packages that depend on one of a number of packages actually work with all of them.

Quite so. I'm not sure if my test-runner will get clever enough for that, but the information and interfaces it needs will be there.

Ian.
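[As a rough illustration of the pieces discussed above, a debian/tests/control stanza using named tests might look something like the sketch below. The test names, package name and version are purely illustrative, and Features: test-in-build is the hypothetical future extension Ian mentions, not part of the current draft.]

  Tests: smoke-run config-parse upgrade-preserves-data
  Depends: foo (>= 1.2-1)
  Features: test-in-build
  Restrictions: needs-root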
Re: Automated testing - design and interfaces
On Mon, Nov 21, 2005 at 06:22:37PM +, Ian Jackson wrote:
> > Note that it's often better to have a single script run many tests, so you probably want to allow tests to pass back some summary information, or include the last ten lines of its output or similar. Something like:
> >   foo                  FAIL:
> >     FAILURE: testcase 231
> >     FAILURE: testcase 289
> >     FAILURE: testcase 314
> >     3/512 test cases failed
> This is no good because we want the test environment to be able to tell which tests failed, so the test cases have to be enumerated in the test metadata file.

Uh, having to have 1000 scripts, or even just 1000 entries in a metadata file, just to run 1000 tests is a showstopper imho. Heck, identifying testcases by number mightn't be particularly desirable in some instances, if a clearer alternative like, say, "test case failed: add 1, add 2, del 1, ch 2" is possible.

> You can't check that the binary works _when the .deb is installed_ without installing it.

That's okay. You can't check that the binary works _on the user's system_ without installing it on the user's system either. For Debian's purposes, being able to run the tests with minimal setup seems crucial.

> Also, a `Restriction' isn't right because if the test has neither of those Restrictions then presumably it can do either but how would it know which ?

It would have to not care which; which it could do by expecting the test harness to put the binaries in the PATH, or provide an environment variable like INSTALL_ROOT=$(pwd)/debian/tmp .

> No, I mean that if the tests live (say) in build/foo-1.0/debian/tests/x then build/foo-1.0/debian/tests/control could say Depends: bar which would mean bar would have to be installed, effectively making it an integration test.

Having test case dependencies is fairly useful; in any event the language

> Even integration tests can be represented like this: if one package's tests Depend on the other's

is wrong if tests depend on other packages, not on other packages' tests. You'll want Conflicts: as well as Depends: in that case too.

It would probably be quite useful to be able to write tests like:

  for mta in exim4 smail sendmail qmail; do
    install $mta
    # actual test
    uninstall $mta
  done

too, to ensure that packages that depend on one of a number of packages actually work with all of them.

Cheers,
aj
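[A slightly fleshed-out sketch of that loop, under the assumption that the test runs as root with network access so packages can be installed and removed; apt-get stands in for the pseudo install/uninstall commands, and the single check inside the loop is only a placeholder for whatever the package under test actually needs to verify.]

  #!/bin/sh
  set -e
  for mta in exim4 smail sendmail qmail; do
      apt-get install -y "$mta"       # one possible realisation of "install $mta"
      # actual test for the package under test goes here; as a stand-in,
      # check that the sendmail(8) compatibility interface is present
      test -x /usr/sbin/sendmail
      apt-get remove -y "$mta"        # one possible realisation of "uninstall $mta"
  done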
Re: Automated testing - design and interfaces
On Wed, 2005-11-23 at 18:16 +1000, Anthony Towns wrote:
> On Mon, Nov 21, 2005 at 06:22:37PM +, Ian Jackson wrote:
> > > Note that it's often better to have a single script run many tests, so you probably want to allow tests to pass back some summary information, or include the last ten lines of its output or similar. Something like:
> > >   foo                  FAIL:
> > >     FAILURE: testcase 231
> > >     FAILURE: testcase 289
> > >     FAILURE: testcase 314
> > >     3/512 test cases failed
> > This is no good because we want the test environment to be able to tell which tests failed, so the test cases have to be enumerated in the test metadata file.

Replying to two messages here ...

I don't think we have to enumerate the tests in advance. Sure, the test runner needs to be able to identify and [possibly] categorise the tests, but explicit enumeration is quite orthogonal. A number of Python unittest runners will scan directories and classes for their tests, and the report from users is consistently that this is easier to use.

> Uh, having to have 1000 scripts, or even just 1000 entries in a metadata file, just to run 1000 tests is a showstopper imho. Heck, identifying testcases by number mightn't be particularly desirable in some instances, if a clearer alternative like, say, "test case failed: add 1, add 2, del 1, ch 2" is possible.

A very nice feature in the xUnit world is that tests can be identified by either their path (inside the language namespace) or by a comment written by the author. At runtime you can choose which to see. I don't think we need the ability for runtime selection, but having a heuristic that works and is overridable would be nice. I.e. by default you might get

  tests/layout/documentation_in_usr_share_doc.sh: PASS

but inside that test you could say

  test_name Documentation is installed in /usr/share/doc

and the output becomes

  Documentation is installed in /usr/share/doc: PASS

I've written a project that is somewhat related to this:
  http://www.robertcollins.net/unittest/subunit/
It's a Python implementation of a cross-process test running protocol. This lets a sub-process run 0 to many tests, identify them and provide pass/fail/error status and traceback or other diagnostics. As the driver is Python it's not appropriate here, but I think the basic protocol/concept might be useful.

> > You can't check that the binary works _when the .deb is installed_ without installing it.
> That's okay. You can't check that the binary works _on the user's system_ without installing it on the user's system either. For Debian's purposes, being able to run the tests with minimal setup seems crucial.

Yup.

> It would probably be quite useful to be able to write tests like:
>   for mta in exim4 smail sendmail qmail; do
>     install $mta
>     # actual test
>     uninstall $mta
>   done
> too, to ensure that packages that depend on one of a number of packages actually work with all of them.

Yes, that would be excellent.

Rob

--
GPG key available at: http://www.robertcollins.net/keys.txt.
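[A tiny sketch of how a shell test might use such an overridable name, assuming a hypothetical test_name helper exported by the runner; the fallback definition only keeps the script runnable on its own, and the path and package name are illustrative.]

  #!/bin/sh
  # tests/layout/documentation_in_usr_share_doc.sh
  if ! type test_name >/dev/null 2>&1; then
      # Fallback so the script still runs outside the hypothetical harness.
      test_name() { echo "name: $*"; }
  fi
  test_name "Documentation is installed in /usr/share/doc"
  # Exit status 0 is reported as PASS, anything else as FAIL.
  [ -d /usr/share/doc/foo ]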
Re: Automated testing - design and interfaces
Anthony Towns writes ("Re: Automated testing - design and interfaces"):
> On Thu, Nov 17, 2005 at 06:43:32PM +, Ian Jackson wrote:
> > The source package provides a test metadata file debian/tests/control. This is a file containing zero or more RFC822-style stanzas, along these lines:
> >   Tests: fred bill bongo
> >   Restrictions: needs-root breaks-computer
> > This means execute debian/tests/fred, debian/tests/bill, etc.,
> Seems like:
>   debian/tests/bar:
>     #!/bin/sh
>     # Restrictions: needs-root trashes-system
>     # Requires: foo

Urgh. I'm really not a fan of those files which mix up different languages. We'll end up with a complicated scheme for separating out the test metadata from other stuff appearing in the comments at the top of files (Emacs and vim modes, #! lines, different comment syntaxes in different languages, etc.)

Also, we want to be able to share the actual tests - that is, the meat of the work - with non-Debian systems. So we should separate out the metadata (which describes when the test should be run and where it is, and is Debian-specific) from the actual tests (which need not be Debian-specific).

> Is the Depends: line meant to refer to other Debian packages (and thus be a lower level version of Restrictions:) or is it meant to indicate test interdependencies? If it's meant to be for Debian packages, maybe
>   # Restrictions: deb:xvncviewer
> might be better.

Yes, Depends is semantically much like Restrictions but refers to a Debian package (which must be installed on the test system). However, Depends might have version numbers etc. - it's just like a Depends field. I don't want to try to mix that with the simple syntax of Restrictions. IMO it's better to have two fields if the structure (and hence the syntax) of the information is going to be significantly different, even if there's a certain similarity to the semantics.

> Note that it's often better to have a single script run many tests, so you probably want to allow tests to pass back some summary information, or include the last ten lines of its output or similar. Something like:
>   foo                  FAIL:
>     FAILURE: testcase 231
>     FAILURE: testcase 289
>     FAILURE: testcase 314
>     3/512 test cases failed

This is no good because we want the test environment to be able to tell which tests failed, so the test cases have to be enumerated in the test metadata file.

You do have a point about not necessarily starting a single process for each test. An earlier version of my draft had something like
  Test: .../filename+
where the + meant to execute filename and it would print
  138: PASS
  231: FAIL
  289: FAIL
  314: SKIP: no X11
or some similar standard format.

> > A basic test could be simply running the binary and checking the result status (or other variants of this). Eventually every package would have to be changed to include at least one test.
> These sorts of tests are better done as part of debian/rules, I would've thought -- the advantage of that is that the problems get caught even when users rebuild the package themselves, and you don't need to worry about special test infrastructure like you're talking about for the simple case.

You can't check that the binary works _when the .deb is installed_ without installing it.

> > Ideally eventually where possible the upstream regression tests could be massaged so that they test the installed version. Whether this is possible and how best to achieve it has to be decided on a per-package basis.
> Having Restrictions: package-installed and Restrictions: build-tree

Hrm, that's an interesting idea.

I really think that concentrating on testing as-installed is going to yield much richer results - that is, more test failures :-). So I want to provide that interface straight away.

Also, a `Restriction' isn't right because if the test has neither of those Restrictions then presumably it can do either but how would it know which ?

> > Even integration tests can be represented like this: if one package's tests Depend on the other's, then they are effectively integration tests. The actual tests can live in whichever package is most convenient.
> Going from build/foo-1.0/debian/tests/x to projects/bar-3.14/debian/tests/y seems difficult.

No, I mean that if the tests live (say) in build/foo-1.0/debian/tests/x then build/foo-1.0/debian/tests/control could say
  Depends: bar
which would mean bar would have to be installed, effectively making it an integration test.

> Anyway, something that can be run with minimal amounts of setup seems most likely to be most useful: so running as part of the build without installing the package, running without anything special installed but the package being tested and a script that parses the control information, stuff that can be run on a user's system without root privs and without trashing the system, etc.

My idea is that the test runner will do `something
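[A small sketch of a single script emitting per-case results in the "id: RESULT" style described in the earlier "filename+" draft; the case identifiers, the foo binary and its options are illustrative only.]

  #!/bin/sh
  report() { echo "$1: $2"; }                 # emits lines like "231: FAIL"

  if foo --version >/dev/null 2>&1; then      # case 138: the binary runs at all
      report 138 PASS
  else
      report 138 FAIL
  fi

  if [ -n "$DISPLAY" ]; then                  # case 314 needs X11
      if foo --probe-display >/dev/null 2>&1; then
          report 314 PASS
      else
          report 314 FAIL
      fi
  else
      report 314 "SKIP: no X11"
  fi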
Re: Automated testing - design and interfaces
On Thu, 2005-11-17 at 14:36 -0800, Steve Langasek wrote:
> [let's get this over to a technical list like it was supposed to be ;)]
> > Following your exit status based approach you could add to stanzas something like:
> >   Expected-Status: 0
> > I found the above requirement the very minimum for a test interface. What follows is optional (IMHO).
> FWIW, I don't see that there's a clear advantage to having the test harness *expect* non-zero exit values (or non-empty output as you also suggested). It may make it easier to write tests by putting more of the logic in the test harness, but in exchange it makes it harder to debug a test failure because the debugger has to figure out how "correct" and "incorrect" are defined for each test, instead of just getting into the meat of seeing why the test returned non-zero. Likewise, expecting successful tests to be silent means that you can rely on any output being error output that can be used for debugging a test failure.

Right. Splitting it into two bits ...

With respect to exit code, there is generally only one way to succeed, but many ways to fail. So reserving 0 for 'test succeeded' in ALL cases makes writing front ends, or running the tests interactively, much easier. It's certainly possible to provide a $lang function that can invert the relationship for you, for 'expected failure' results.

One of the things about expected failures is their granularity: is a test expected to fail because 'file FOO is missing', or because 'something went wrong'? The former test is best written as an explicit check, where you invert the sense in the test script. It's best because this means that when the behaviour of the failing logic alters - for better or worse - you get flagged by the test that it has changed. Whereas a handwaving 'something's broken' style expected failure really does not help code maintenance at all. So while it can be useful in the test interface to have an explicit code for 'expected failure', I think it is actually best to just write the test to catch the precise failure, and report success.

As for silence, yes, noise is generally not helpful, although long-running test suites can usefully give *some* feedback (a "." per 100 tests, say) to remind people it's still running.

Rob

--
GPG key available at: http://www.robertcollins.net/keys.txt.
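[A minimal shell sketch of the kind of inverting helper mentioned above; the name expect_failure is made up for illustration, and the grep call merely stands in for whatever precise failure a real test would pin down.]

  # Run a command that is expected to fail; the test itself still reports
  # success (exit 0) when the expected failure happens.
  expect_failure() {
      if "$@"; then
          echo "unexpected success: $*" >&2
          return 1
      fi
      return 0
  }

  # Example: this pattern must NOT be present, so grep failing counts as a pass.
  expect_failure grep -q "TODO: remove before release" /usr/share/doc/foo/README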
Re: Automated testing - design and interfaces
On Thu, Nov 17, 2005 at 02:36:06PM -0800, Steve Langasek wrote:
> FWIW, I don't see that there's a clear advantage to having the test harness *expect* non-zero exit values (or non-empty output as you also suggested).

As I understood it, the proposed approach is about standardizing and easing the way we perform tests on Debian packages. Since it often happens that both positive and negative tests are needed, having a way to describe externally (wrt the tests) their expected behaviour seems to me a good way to factor the expectation code into the tester. It may also help in writing interfaces for showing test results which will be more informative than a simple success/failure message.

I agree though that my "very minimum" assertion was too strong.

Cheers.

--
Stefano Zacchiroli -*- Computer Science PhD student @ Uny Bologna, Italy
[EMAIL PROTECTED],debian.org,bononia.it} -%- http://www.bononia.it/zack/
"If there's any real truth it's that the entire multidimensional infinity of the Universe is almost certainly being run by a bunch of maniacs." -!-
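[For instance, a negative test could be declared externally in the debian/tests/control stanza format from Ian's draft roughly as below; the test name is illustrative, and Expected-Status is the field proposed in this sub-thread, not part of the original draft.]

  Tests: reject-malformed-config
  Expected-Status: 1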
Re: Automated testing - design and interfaces
[let's get this over to a technical list like it was supposed to be ;)]

On Thu, Nov 17, 2005 at 10:43:34PM +0100, Stefano Zacchiroli wrote:
> On Thu, Nov 17, 2005 at 06:43:32PM +, Ian Jackson wrote:
> > This means execute debian/tests/fred, debian/tests/bill, etc., each with no arguments, expecting exit status 0 and no stderr. The
> Having been involved in various unit testing packages I found the above expectations too constraining. The first thing you will need after requiring all tests not to fail is to be able to distinguish: tests that need to succeed vs tests that need to fail. Only the misbehaviour of tests wrt their expected result should be reported as test failures. I thus propose to add
> Following your exit status based approach you could add to stanzas something like:
>   Expected-Status: 0
> I found the above requirement the very minimum for a test interface. What follows is optional (IMHO).

FWIW, I don't see that there's a clear advantage to having the test harness *expect* non-zero exit values (or non-empty output as you also suggested). It may make it easier to write tests by putting more of the logic in the test harness, but in exchange it makes it harder to debug a test failure because the debugger has to figure out how "correct" and "incorrect" are defined for each test, instead of just getting into the meat of seeing why the test returned non-zero. Likewise, expecting successful tests to be silent means that you can rely on any output being error output that can be used for debugging a test failure.

--
Steve Langasek                   Give me a lever long enough and a Free OS
Debian Developer                   to set it on, and I can move the world.
[EMAIL PROTECTED]                                   http://www.debian.org/
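[As a small illustration of the "silence on success" point, a test-runner fragment along these lines could treat any output as diagnostic material to be shown only on failure; the test path, log handling and column layout are purely illustrative.]

  #!/bin/sh
  log=$(mktemp)
  if ./debian/tests/fred >"$log" 2>&1; then
      echo "fred                 PASS"
  else
      echo "fred                 FAIL:"
      tail -n 10 "$log" | sed 's/^/    /'   # whatever the test printed is debugging output
  fi
  rm -f "$log"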
Re: Automated testing - design and interfaces
Bcc'ed to -project; followups to -devel.

On Thu, Nov 17, 2005 at 06:43:32PM +, Ian Jackson wrote:
> Note that the point is to be able to test the _actual package_, _as installed_ (eg on a testbed system). This is much better than testing the package from the source tree during build time because it will detect packaging mistakes as well as program source problems and as we know packaging mistakes of one kind or another are one of the main causes of problems.

Mostly it's just different -- testing at build time lets you do unit tests before putting the objects together and stripping them, which gives you the opportunity to catch other bugs. One isn't better than the other, though doing both is better than doing either alone or neither.

Other useful tests are things like "install all of optional", which will catch unspecified Conflicts: relations, and "do a partial upgrade from stable to this package in unstable", which will tell you if some dependencies aren't strict enough. Looking through Contents-* files will also let you catch unspecified dependencies. Having multiple machines to do tests can be worthwhile too -- if you want to test firewalls, or that there aren't any listening ports on default installs, eg.

> The source package provides a test metadata file debian/tests/control. This is a file containing zero or more RFC822-style stanzas, along these lines:
>   Tests: fred bill bongo
>   Restrictions: needs-root breaks-computer
> This means execute debian/tests/fred, debian/tests/bill, etc.,

Seems like:

  debian/tests/bar:
    #!/bin/sh
    # Restrictions: needs-root trashes-system
    # Requires: foo

  foo                  FAIL: ...
  bar                  SKIP: foo failed

would make more sense than a separate file describing the tests?

Is the Depends: line meant to refer to other Debian packages (and thus be a lower level version of Restrictions:) or is it meant to indicate test interdependencies? If it's meant to be for Debian packages, maybe

  # Restrictions: deb:xvncviewer

might be better.

Note that it's often better to have a single script run many tests, so you probably want to allow tests to pass back some summary information, or include the last ten lines of its output or similar. Something like:

  foo                  FAIL:
    FAILURE: testcase 231
    FAILURE: testcase 289
    FAILURE: testcase 314
    3/512 test cases failed
  bar                  FAIL:
    (341123 other lines, then:)
    xxx x
    Aborted (core dumped)
  baz                  SKIP: foo failed
  quux                 PASS

maybe.

> Any unknown thing in Restrictions, or any unknown field in the RFC822 stanza, causes the tester core to skip the test with a message like `test environment does not support blames-canada restriction of test simpsons'.

You mean southpark, surely?

> A basic test could be simply running the binary and checking the result status (or other variants of this). Eventually every package would have to be changed to include at least one test.

These sorts of tests are better done as part of debian/rules, I would've thought -- the advantage of that is that the problems get caught even when users rebuild the package themselves, and you don't need to worry about special test infrastructure like you're talking about for the simple case.

> Ideally eventually where possible the upstream regression tests could be massaged so that they test the installed version. Whether this is possible and how best to achieve it has to be decided on a per-package basis.

Having Restrictions: package-installed and Restrictions: build-tree might be worth thinking about so that it's easy to do both sorts of testing.

> Even integration tests can be represented like this: if one package's tests Depend on the other's, then they are effectively integration tests. The actual tests can live in whichever package is most convenient.

Going from build/foo-1.0/debian/tests/x to projects/bar-3.14/debian/tests/y seems difficult. Adding deb:otherpkg or deb:libotherpkg-dbg to the Restrictions: seems more plausible?

Anyway, something that can be run with minimal amounts of setup seems most likely to be most useful: so running as part of the build without installing the package, running without anything special installed but the package being tested and a script that parses the control information, stuff that can be run on a user's system without root privs and without trashing the system, etc.

If there's going to be a debian/rules check command, debian/tests/* probably should just be a suggested standard, or vice-versa -- minimising the number of required interfaces would likely make things more flexible. Being able to add upstream tests by nothing more than symlinking them into debian/tests might be a worthwhile goal, perhaps.

> From: Ian Jackson [EMAIL PROTECTED]
> Ian.
> (wearing both my Debian and Ubuntu hats)

Heh.

Cheers,
aj
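[A sketch of what one of the self-describing test scripts suggested above might look like in full; the comment field names follow aj's example, while the package name foo and the commands in the body are purely illustrative.]

  #!/bin/sh
  # Restrictions: needs-root trashes-system
  # Requires: foo
  set -e
  dpkg -s foo >/dev/null          # the Requires'd package has to be installed
  rm -rf /var/cache/foo/*         # destructive step, hence trashes-system
  foo --regenerate-cache          # illustrative invocations of the package under test
  foo --check
  # Exiting 0 (and staying silent) is what the harness counts as a pass.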