The crux of this is #1 and #2 in the first part below and "HOW CAN
THIS POSSIBLY BE A GOOD THING?" in the second part.

Using TODO tests instead of normal tests in the examples below has a
small benefit, one that can be achieved in other ways, and a real
cost: false passes and confusion.

On 20/09/06, Michael G Schwern <[EMAIL PROTECTED]> wrote:
It comes down to this:

SKIP tests are for when it'll never work.  Or if you don't care.

TODO tests allow the author to be notified when something that was broken 
starts working again.  It doesn't matter what that thing is.

If a module you depend on breaks and you want to know when it works again so you can
start depending on it again, use a TODO test.  While it's broken, treat it like any other
known bug (wrap it in a TODO test) or fix it by using something else.  It's not the end of
the world if some users get an "unexpectedly succeeded" when it starts working
again (though the Test::Harness output for bonus TODO tests could use some work to make
it look less like a failure).
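The TODO mechanism described above looks like this in Test::More (a
minimal sketch; Fribble and its parse() function are hypothetical
stand-ins for the broken dependency):

```perl
use Test::More tests => 1;
use Fribble;    # hypothetical broken dependency

TODO: {
    local $TODO = "Fribble mis-parses nested lists; known upstream bug";

    # This fails today, but the harness counts the suite as passing.
    # Once Fribble is fixed, the harness reports this test as
    # "unexpectedly succeeded" instead.
    is( Fribble::parse("[[a]]"), "a", "nested lists parse" );
}
```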

Using TODO tests in this case is bad all round -

1 It hides the failure if the user has a bad version of Fribble
2 It hides the failure if the user has a good version of Fribble but
we broke the part of Foo that talks to Fribble
3 It outputs confusing messages when the user actually has nothing to
worry about

I agree #3 is not the end of the world but #1 and #2 _are_ (for
whatever "the end of the world" means in testing terms).

As for using this as a tool to find out when Fribble is fixed, I don't
get it. Do I regularly upgrade versions of Fribble and then run Foo's
test suite afterwards to see if I get unexpected passes? Then do I go
digging in the test output to make sure that ALL of the TODOs passed,
not just some? My time would be much better spent filing a bug against
Fribble, sending a patch for Fribble's test suite (this can include
TODO tests) and waiting for my bug report to be closed.

If the developer of Fribble doesn't like my tests or if I'm super
eager, I can put them in my_fribble_tests.t and whenever I download a
new version of Fribble do

cd Fribble-x.y
make test
perl -Mblib ../my_fribble_tests.t

and see if they all pass. No TODOs involved and you don't even have to
install a possibly broken module to get an answer.

When the TODO test starts passing you can up your MakeMaker version dependency 
on that module and remove the TODO.
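Bumping the MakeMaker dependency once the fix ships might look like
this in Makefile.PL (the module name and version number are
illustrative):

```perl
use ExtUtils::MakeMaker;

WriteMakefile(
    NAME      => 'Foo',
    VERSION   => '0.01',
    PREREQ_PM => {
        # Hypothetical: 1.24 is the first Fribble release with the
        # fix, so the TODO test can be deleted once this is required.
        'Fribble' => '1.24',
    },
);
```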


Let me give a real world example.

I recently worked in a shop where lots of modules were out of date.  For example,
they were using CGI.pm 2.x.  I wanted to upgrade to the latest for some features,
bug fixes, etc.  I asked if there was any reason why not and got some "oh, don't
do that.  It'll break things."  "Like what?", I asked.  "I don't remember.  We
tried it a few years ago."

If, when someone encountered a CGI.pm bug, they had thrown in a TODO test for it,
I'd have known what the problems were.  As it was, all I could do was upgrade, run
the tests and hope they were thorough enough to catch whatever problem CGI.pm
caused.

I'm assuming you mean throw a test for CGI into Foo (the shop's
module). If you mean throw a test for CGI into CGI then I have no
argument with that but it's not the case we've been talking about.

This example misses a couple of points that were in the original
discussion because this shop doesn't appear to be publishing a module
on CPAN but anyhow...

So we're adding tests to Foo. Why are we making them TODO? Clearly Foo
has a crucial dependency on CGI.pm, so they should be real tests that
fail loudly. If we make them TODO tests then we have a situation
where, when we run Foo's test suite

- a perfectly clear run means something is broken (the CGI TODO tests
are failing and thus silent)
- a run with real failures also means something is broken
- a run with unexpected passes means that everything might be OK,
depending on how many unexpected passes we got

HOW CAN THIS POSSIBLY BE A GOOD THING? Compare it with the case where
we make them real tests

- a perfectly clear run means nothing is broken
- a run with failures means something is broken

which is exactly how life should be.
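A real test against the dependency keeps those two outcomes distinct.
A minimal sketch using a function CGI.pm actually provides (what Foo
would really test depends on how it uses CGI.pm):

```perl
use Test::More tests => 2;

BEGIN { use_ok('CGI') }

# A real test: it fails loudly if the installed CGI.pm (or Foo's use
# of it) is broken, and passes silently when everything works.
is( CGI::escapeHTML('<b>'), '&lt;b&gt;',
    'escapeHTML behaves as Foo expects' );
```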

Another example, same shop.  They're using 5.6.  I'm trying to upgrade to 5.8 but 
all sorts of things are failing.  One issue is with their Unicode code.  They have 
all sorts of Unicode code to work around 5.6 weirdness.  Now that I'm using 5.8
I'm getting test failures.  Is it because they're relying on 5.6 bugs or because 
things are really broken?  Is a given bit of weird code in there to work around a 
5.6 bug or does it really need to work that way?  I don't know.

If people had written TODO tests for the perl bugs they encountered in 5.6 I'd 
have known what was a bug workaround and what was a real problem.

Well really they should have written and submitted tests for Perl but
that's beside the point.

They should also write real tests to ensure that their workarounds
still work (just in case they disappear when the bugs are fixed).

Finally, if they want, they could write TODO tests which would be
identical to the ones they should have submitted to the Perl
developers, but in this case they would actually have some work to do
when they start passing (i.e. change their code to work with 5.8),
unlike the other examples where the todo item is "cross off this todo
item".
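That split, one real test pinning the workaround's behaviour and one
TODO test for the underlying bug, might be sketched like this (the
workaround, fixup(), and the bug itself are purely illustrative):

```perl
use Test::More tests => 2;

# Stand-in for the real workaround code in Foo.
sub fixup { my $s = shift; return $s }

# Real test: the workaround must keep behaving as documented,
# whatever perl version we run on.
is( fixup("abc"), "abc", "workaround output unchanged" );

TODO: {
    local $TODO = "works around a perl bug; when this passes natively, "
                . "fix Foo, delete fixup() and delete this TODO";

    # An assertion standing in for the behaviour Foo would prefer to
    # rely on directly (illustrative, not a real 5.6 bug).
    ok( "\x{263A}" =~ /\w/, "native behaviour without the workaround" );
}
```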

Note I would not make anything a SKIP test either in the examples you've given.

F


I never did manage to get 5.8 working.
